Lead Investigator: Bruce Perkins, University of Toronto
Vivli Data Request: 5209
Funding Source: The study is not directly funded. Analysis will be conducted by a PhD student (L. Erik Lovblom) who is funded by a CIHR doctoral award. The lead researcher (B Perkins) is a co-supervisor for the student.
Potential Conflicts of Interest: There are no conflicts of interest. Dr. Perkins discloses that he was a PI on the EASE-2 study and has served as an advisor to Boehringer Ingelheim.

Summary of the Proposed Research:

We have designed a three-step project to identify the critical components of a diabetic ketoacidosis (DKA) mitigation strategy intended for use in clinical practice and research. DKA is a serious complication of diabetes that occurs when blood sugar and acidic substances called ketones build up to dangerous levels in the body. Recently, a new group of oral drugs that treat diabetes by lowering blood glucose levels, called Sodium Glucose Linked Transporter (SGLT) inhibitors, has been introduced. Used in conjunction with insulin, they may help people with type 1 diabetes (T1D) manage their diabetes, but they may also increase the risk of DKA. Our mitigation strategy is intended to reduce the risk of DKA in the general T1D population, in the T1D population using SGLT inhibition, and in large-scale trials of the efficacy and safety of SGLT inhibition.

The study design includes three separate objectives: the first two call for new statistical analyses of existing clinical trials and observational studies, and the third calls for the development of a mitigation tool by a multidisciplinary team of patients, caregivers, and healthcare providers. The first two objectives are meant to inform the third, and one of those first two objectives applies to this data request to Boehringer Ingelheim through

Our objective for this request is to define the optimal diagnostic thresholds for beta-hydroxybutyrate (BHB) levels to be used in mitigation strategies for patients using and not using SGLT inhibition therapy. BHB is a chemical that is used for energy by some cells in the body when sugar levels are low, and it can be checked using standard ketone tests as a marker of DKA. These optimal thresholds for BHB will therefore be helpful in predicting future DKA incidence in patients with T1D. We intend to analyze the data on the 1707 participants who were followed for 6-12 months in the EASE-2 and EASE-3 phase three clinical trials. Using “diagnostic study methods”, we will determine if there is a threshold level of BHB or ketones (obtained by the finger-stick method for obtaining a capillary blood sample from the finger) that identifies people at high risk of subsequent DKA events. The implication of this research is that the mitigation strategy can incorporate well-defined and validated thresholds to monitor for times when individual patients will be at elevated risk of DKA.

Statistical Analysis Plan:

The primary analysis of this second objective will use time-dependent receiver operating characteristic (ROC) curve methods. This analysis technique generalizes the concepts of cross-sectional diagnostic performance indicators (such as sensitivity and specificity) to a longitudinal setting and is able to account for censoring. It is also able to accommodate time-varying index test results; as such, we will evaluate the predictive ability of both baseline and time-varying BHB. Time-dependent area under the curve (AUC) values will be generated for each ROC curve (spanning baseline, t0, to follow-up time ti, where “i” indexes event times), and the survival concordance index (C-index) will be reported. The C-index is interpreted as the probability that predictions for a random pair of participants are concordant with their observed future outcomes, and it can be calculated as a weighted average of the time-dependent AUC values. Optimal diagnostic thresholds will be identified as the point on each ROC curve closest to the point of perfect discrimination (100% sensitivity and 100% specificity). The advantage of this overall analysis approach is that it yields estimates of diagnostic performance that can be useful for patients and clinicians. The effects of covariates on diagnostic performance, such as sex, insulin pump use, and empagliflozin dose, will be assessed using ROC-regression. In contrast to the primary analysis, the secondary analysis will use a non-hypothesis-driven approach and will investigate prediction of DKA using the machine-learning techniques described in Section 3.10 below.
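The threshold-selection step can be illustrated with a minimal sketch. It is not the planned time-dependent analysis: it assumes a single fixed follow-up horizon with no censoring before that horizon, so that each participant is simply a case or a non-case. The BHB values, outcomes, and function names below are hypothetical and chosen only for illustration.

```python
import math

def roc_points(marker, event_by_horizon):
    """Return (threshold, sensitivity, specificity) for every candidate cutoff.

    A participant tests positive when marker >= threshold.
    Assumes no censoring before the horizon (a simplification).
    """
    n_cases = sum(event_by_horizon)
    n_noncases = len(event_by_horizon) - n_cases
    points = []
    for cutoff in sorted(set(marker)):
        true_pos = sum(1 for m, e in zip(marker, event_by_horizon) if e and m >= cutoff)
        true_neg = sum(1 for m, e in zip(marker, event_by_horizon) if not e and m < cutoff)
        points.append((cutoff, true_pos / n_cases, true_neg / n_noncases))
    return points

def closest_to_perfect(points):
    """Pick the ROC point nearest to sensitivity = 1 and specificity = 1."""
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))

# Hypothetical capillary BHB values (mmol/L) and DKA status by the horizon.
bhb = [0.1, 0.2, 0.3, 0.6, 0.8, 1.2, 1.5, 0.4, 0.9, 0.5]
dka = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]

points = roc_points(bhb, dka)
best_threshold, best_sens, best_spec = closest_to_perfect(points)
print(best_threshold, round(best_sens, 3), round(best_spec, 3))
```

In a time-dependent analysis, the sensitivity and specificity at each horizon would instead come from estimators that handle censoring, but the distance criterion applied to each curve is the same.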
Power calculation. Based on published results of the EASE program, and using an adapted method of Hanley and McNeil, we investigated the power to detect a conservative area under the curve (AUC) of 0.70 from a receiver operating characteristic (ROC) curve analysis of the diagnostic validity of BHB measurements. During the program, 76 participants with certain or potential cases of DKA were observed. This number of cases yields power >0.99 to detect an AUC of 0.70.
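As a rough check on this figure, the sketch below applies the standard Hanley and McNeil (1982) standard-error formula for an AUC with a normal approximation. It is not the adapted method used in the proposal. The null AUC of 0.50, the two-sided alpha of 0.05, and the number of non-cases (taken here as 1707 - 76 = 1631) are assumptions made for illustration only.

```python
import math

def hanley_mcneil_se(auc, n_cases, n_noncases):
    """Standard error of an AUC from the Hanley and McNeil (1982) formula."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_noncases - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_noncases)
    return math.sqrt(variance)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power_to_detect_auc(auc_alt, n_cases, n_noncases, auc_null=0.5, z_alpha=1.959964):
    """Approximate power of a two-sided test of auc_null against auc_alt."""
    se_null = hanley_mcneil_se(auc_null, n_cases, n_noncases)
    se_alt = hanley_mcneil_se(auc_alt, n_cases, n_noncases)
    return normal_cdf((auc_alt - auc_null - z_alpha * se_null) / se_alt)

# Assumed inputs: 76 cases (from the text) and 1631 non-cases (1707 - 76).
se_alt = hanley_mcneil_se(0.70, 76, 1631)
power = power_to_detect_auc(0.70, 76, 1631)
print(round(se_alt, 4), power > 0.99)
```

Under these assumptions the approximate power exceeds 0.99, which agrees with the value stated above.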

Sensitivity analysis based on machine-learning techniques. Machine learning is a form of artificial intelligence that mines complex data for patterns and can be applied to two main types of tasks: image recognition and clinical prediction. The focus of this sub-section of the grant is the latter, with the goal of identifying adults with T1D who are at increased risk of DKA. Traditional analytic techniques (e.g., logistic regression, linear regression) work well when there is a relatively small number of important variables with linear relationships with the outcome. They may not work well when the outcome is rare (e.g., DKA), when continuous variables with non-linear relationships (such as laboratory parameters) are included, or when time-varying variables (such as changing serum ketone values) are included. Unpublished data from Dr. Mike Fralick’s recently completed PhD identified potential risk factors for DKA among adults with type 2 diabetes (T2D) who received an SGLT2 inhibitor in routine care. The study included a cohort of approximately 110,000 adults in the US, all of whom had at least 85 baseline variables (e.g., age, sex, insulin use, serum creatinine, hemoglobin A1C). Consistent with past literature, the study found that DKA in the subsequent 365 days was relatively rare (N=194 events, 4 per 1,000 person-years). Using gradient boosted trees, one of the most popular machine learning techniques, the study identified that two of the strongest risk factors for DKA had non-linear relationships with risk. The first was creatinine, which had a “U-shaped” relationship with risk of DKA, such that adults with a very low or very high creatinine had a higher risk of DKA. The second was hemoglobin A1C, which had a “stepped-shape” relationship with the risk of DKA, such that adults with an A1C below 9% had risk approaching 0%, while those with A1C of 10-11% had 2% risk, and those with A1C of 11-12% had 6% risk of DKA.
However, this dataset lacked data on ketone levels and the model only considered baseline variables. Building from this work, we plan to use machine learning to identify risk factors for DKA among adults with T1D who received empagliflozin in the data-rich EASE program trial.
We will use machine learning to identify risk factors for DKA using two models: a static model and a time-varying model. In the static model we will include only the baseline variables at the time of randomization (e.g., age, sex, comorbid conditions, baseline A1C). In the time-varying model we will consider the variables at the time of randomization as well as the laboratory values as they change over time (e.g., serum ketones). Gradient boosting will be used for both models because it effectively handles missing data and has been shown to have good predictive performance across a wide range of problems. Since the outcome of DKA is binary, we will apply a Bernoulli distribution, which is a special case of the binomial distribution. The model hyperparameters will be selected using a grid search over the number of trees, tree depth, lambda values (shrinkage factor), and bag fraction (random subset fraction), choosing the combination that minimizes the log-loss between observed outcomes and predicted probabilities. To implement gradient boosting we will use the xgboost package, which is freely available in R.
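The selection criterion can be sketched as follows. This Python fragment only illustrates the Bernoulli log-loss and the enumeration of a hyperparameter grid; it does not fit any model and is not the xgboost workflow planned in R. The grid values, the parameter names, and the `select_best` helper are hypothetical.

```python
import math
from itertools import product

def log_loss(observed, predicted, eps=1e-15):
    """Mean Bernoulli log-loss between 0/1 outcomes and predicted probabilities."""
    total = 0.0
    for y, p in zip(observed, predicted):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(observed)

# Hypothetical grid; the values actually searched are not given in the text.
grid = {
    "n_trees": [100, 500],
    "max_depth": [2, 4],
    "shrinkage": [0.01, 0.1],
    "bag_fraction": [0.5, 0.8],
}

# Every combination of the grid values, one dict per candidate setting.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]

def select_best(candidates, score_fn):
    """Return the candidate with the lowest score.

    score_fn is assumed to fit a model with the given settings and return its
    held-out log-loss; it is supplied by the caller and not defined here.
    """
    return min(candidates, key=score_fn)

# An uninformative prediction of 0.5 for every participant scores ln(2).
baseline_loss = log_loss([1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5])
print(len(candidates), round(baseline_loss, 4))
```

Any fitted model should score below the ln(2) baseline to be considered informative; with a rare outcome such as DKA, a more demanding reference is the log-loss obtained by predicting the overall event rate for everyone.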

Requested Studies:

Empagliflozin as Adjunctive to Insulin Therapy Over 26 Weeks in Patients With T1DM (EASE-3)
Sponsor: Boehringer Ingelheim
Study ID: NCT02580591

Empagliflozin as Adjunctive to InSulin thErapy Over 52 Weeks in Patients With Type 1 Diabetes Mellitus (EASE-2)
Sponsor: Boehringer Ingelheim
Study ID: NCT02414958

Public Disclosures:

Song, C., Dhaliwal, S., Bapat, P., Scarr, D., Bakhsh, A., Budram, D., Verhoeff, N.J., Weisman, A., Fralick, M., Ivers, N.M. and Cherney, D.Z., 2023. Point-of-care Capillary Blood Ketone Measurements and the Prediction of Future Ketoacidosis Risk in Type 1 Diabetes. Diabetes Care. doi: 10.2337/dc23-0840