Risk stratification and responder identification for glucagon-like peptide-1 receptor agonists (GLP-1 RA) and sodium-glucose cotransporter 2 inhibitors (SGLT2i) in Type 2 Diabetes Mellitus (T2DM): a machine learning facilitated post-hoc analysis of clinical trials

Lead Investigator: Linong Ji, Peking University People’s Hospital
Title of Proposed Research: Risk stratification and responder identification for glucagon-like peptide-1 receptor agonists (GLP-1 RA) and sodium-glucose cotransporter 2 inhibitors (SGLT2i) in Type 2 Diabetes Mellitus (T2DM): a machine learning facilitated post-hoc analysis of clinical trials
Vivli Data Request: 7844
Funding Source: Beijing Nova Cross program (Z211100002121169) to Xiantong Zou: Precision medicine in diabetes facilitated by artificial intelligence
Potential Conflicts of Interest: None

 

Summary of the Proposed Research:

Type 2 diabetes mellitus (T2DM) is a chronic disease characterized by high blood glucose that affects approximately 537 million adults (20-79 years) worldwide. The number of people living with T2DM is still rising rapidly in many countries, placing a heavy burden on health systems globally. T2DM may also lead to kidney failure, heart attack and stroke, which are the main causes of premature death in patients with T2DM. Current diabetes management therefore prioritizes preventing cardio-renal complications, that is, the kidney disease and heart disease related to T2DM. Several prescription medicines are used in conjunction with diet and exercise to reduce blood sugar levels in adults with T2DM, and one class of non-insulin medication, glucagon-like peptide-1 receptor agonists (GLP-1RA), has shown a robust protective effect on the cardio-renal system. Compared with GLP-1RA, sodium-glucose cotransporter 2 inhibitors (SGLT2i) appear to have similar heart-protective benefits and greater kidney-protective benefits. Both drug classes are now recommended for patients with T2DM at high cardiovascular or kidney risk. However, not all patients with T2DM develop heart and kidney complications, so it is important to identify precisely who is likely to progress to these complications, because these patients may need the two drug classes more than others. In addition, in current clinical practice it is unknown which patients should use GLP-1RA and which should use SGLT2i. Machine learning (ML) is a type of artificial intelligence (AI) that allows software to learn from data and become more accurate at predicting outcomes. Testing the generalizability of machine learning methods requires multiple datasets. In this study, we aim to use machine learning methods and multiple cohorts to develop tools that help clinicians identify patients at high risk of progressing to cardio-renal complications.
We will develop two machine learning models, one using only baseline features and one with additional information on responses to drugs in the early phase of treatment. Using these tools, we will also predict the final responses to SGLT2i and GLP-1RA in cohorts treated with these drugs. By developing these models and identifying the responders to GLP-1RA and SGLT2i, we hope to help clinicians find the most suitable patients and choose the appropriate drug.

 

Statistical Analysis Plan:

(1) Data preparation and variable selection
To develop models, we will use ACCORD as the derivation cohort and HARMONY, CANVAS and CREDENCE as external validation cohorts. In each cohort, continuous variables will be standardized and skewed variables will be log-transformed. Missing data will be handled by multiple imputation. In the derivation cohort, demographic data (such as age, sex, and country), physical examination findings (such as body weight, height, BMI, waist circumference, pulse, SBP, DBP), blood and urine tests (such as fasting plasma glucose, HbA1c, fasting insulin, fasting c-peptide, TG, TC, HDL, LDL, hs-CRP, UACR, uric acid), physical activity, smoking status, family history of cardiovascular disease, and other medications will be selected as independent predictors. Variables with missing data in more than 20% of samples or with high collinearity (Pearson coefficient > 0.7) will first be excluded.
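The screening and scaling rules above can be illustrated with a minimal sketch. It is not the investigators' pipeline: the function names, the toy columns, and the rule for which member of a collinear pair to drop (here, the one listed later) are assumptions made for illustration, and multiple imputation is not shown.

```python
import math

def missing_fraction(values):
    """Share of entries that are missing (encoded as None)."""
    return sum(v is None for v in values) / len(values)

def pearson(x, y):
    """Pearson correlation over pairs where both values are observed.
    Assumes neither column is constant."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

def screen_variables(data, max_missing=0.20, max_abs_r=0.70):
    """Drop variables with more than max_missing missing values, then
    greedily drop any variable correlated above max_abs_r with one
    already kept (assumed tie-break: the earlier column wins)."""
    kept = [name for name, col in data.items()
            if missing_fraction(col) <= max_missing]
    selected = []
    for name in kept:
        if all(abs(pearson(data[name], data[other])) <= max_abs_r
               for other in selected):
            selected.append(name)
    return selected

def log_transform(values):
    """Natural log for skewed, strictly positive variables."""
    return [None if v is None else math.log(v) for v in values]

def standardize(values):
    """Z-score using the sample standard deviation of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / (len(observed) - 1))
    return [None if v is None else (v - mean) / sd for v in values]

# Hypothetical toy data: five patients, four candidate predictors.
toy = {
    "weight": [60, 70, 80, 90, 100],
    "bmi": [21, 24, 28, 31, 35],                        # collinear with weight
    "hba1c": [7.1, 8.4, 6.9, 9.0, 7.5],
    "fasting_insulin": [None, None, 12.0, 9.5, 11.0],   # 40% missing
}
print(screen_variables(toy))
```

In this toy example the insulin column is removed for missingness and BMI for collinearity with body weight; in practice the choice between two collinear variables would more likely rest on clinical relevance.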
(2) Development and Validation of baseline risk models
First, we will develop baseline risk models with pretreatment variables selected in the derivation cohort, the ACCORD trial. ACCORD showed that intensive glycemic therapy did not reduce the risk of 3-point major adverse cardiovascular events (3p-MACE) or composite renal outcomes, except for the progression of albuminuria [14], so we will combine the intensive and standard glycemic control arms for the cardiovascular disease and renal composite outcomes. Because intensive glycemic therapy reduced the risk of progression of albuminuria [15], we will add the treatment arm as a variable when deriving the model for the albuminuria progression endpoint. Several machine learning methods will be applied, such as random forest, k-nearest neighbours (kNN), support vector machine (SVM), naïve Bayes, Elastic Net, XGBoost and, if appropriate, artificial neural network (ANN). Models will be calibrated and validated using 10-fold cross-validation. Model performance will be assessed by the area under the receiver operating characteristic curve (AUROC) and the C-statistic, and the model with the best performance will be selected as the final “baseline risk model”. The cut-off point of this model will be determined using the Youden index. This model will be externally tested in the placebo arms of all validation cohorts, since the outcome may be altered by SGLT2i/GLP-1RA treatment.
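As an illustration of the two performance quantities named above, the sketch below computes the AUROC by pairwise ranking (which equals the C-statistic for a binary outcome) and picks the threshold that maximizes the Youden index J = sensitivity + specificity - 1. The function names and the toy scores are hypothetical, and treating the outcome as binary rather than time-to-event is an assumption of this sketch.

```python
def roc_points(y_true, scores):
    """For each distinct score used as a threshold, return
    (threshold, sensitivity, specificity), calling a patient
    high risk when score >= threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points

def youden_cutoff(y_true, scores):
    """Threshold maximizing J = sensitivity + specificity - 1."""
    best = max(roc_points(y_true, scores), key=lambda p: p[1] + p[2] - 1)
    return best[0]

def auroc(y_true, scores):
    """Probability that a randomly chosen event case has a higher
    predicted risk than a randomly chosen non-event case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks for seven patients (1 = event occurred).
y = [0, 0, 0, 1, 0, 1, 1]
risk = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
print(auroc(y, risk), youden_cutoff(y, risk))
```

Within 10-fold cross-validation, the same quantities would be computed on each held-out fold and then summarized across folds.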
(3) Application 1: Cardio-renal Risk stratification within T2DM
We hypothesise that, even among the high-cardiovascular-risk patients enrolled in clinical trials, powerful predictive models can still distinguish patients who have greater cardio-renal risk and should be prioritized for SGLT2i/GLP-1RA therapy. In this part, we hope our model can help guide drug choice for patients at the start of treatment, so only baseline information will be used to predict outcomes. Based on the predictions of our model, all patients in the validation cohorts, i.e. HARMONY and CANVAS, will be separated into high-risk and low-risk groups using the optimal cut-off point. Treatment effect heterogeneity between the high-risk and low-risk groups will be tested on the absolute scale by estimating absolute risk differences using logistic regression, and on the relative scale by using likelihood-ratio tests to compare models with and without an interaction term.
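The absolute-scale comparison can be illustrated with simple arithmetic on event counts. Note that this sketch swaps in a plain two-by-two calculation with a Wald test for the logistic regression named in the plan, and it does not cover the relative-scale likelihood-ratio test; the function names and counts are hypothetical.

```python
import math

def risk_difference(events_trt, n_trt, events_ctl, n_ctl):
    """Absolute risk difference (treated minus control) and its
    large-sample standard error."""
    p_t = events_trt / n_trt
    p_c = events_ctl / n_ctl
    se = math.sqrt(p_t * (1 - p_t) / n_trt + p_c * (1 - p_c) / n_ctl)
    return p_t - p_c, se

def absolute_heterogeneity(high, low):
    """Compare risk differences between the high-risk and low-risk
    groups. Each argument is (events_trt, n_trt, events_ctl, n_ctl).
    Returns both risk differences, their difference, a Wald z
    statistic and a two-sided normal p-value."""
    rd_high, se_high = risk_difference(*high)
    rd_low, se_low = risk_difference(*low)
    diff = rd_high - rd_low
    z = diff / math.sqrt(se_high ** 2 + se_low ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rd_high, rd_low, diff, z, p_value

# Hypothetical counts: (events on drug, n on drug, events on placebo, n on placebo).
high_risk = (30, 200, 50, 200)
low_risk = (10, 200, 12, 200)
print(absolute_heterogeneity(high_risk, low_risk))
```

A more negative risk difference in the high-risk group than in the low-risk group would suggest a larger absolute benefit of treatment in patients the model flags as high risk.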
(4) Development and Validation of dynamic risk models
Dynamic risk models may improve prediction accuracy since they incorporate not only baseline characteristics but also a patient’s early responses to treatment. We will add interim risk factors to the baseline risk model to estimate updated outcome risks at different points in the disease course. The same methods as in section (2) will be used to develop the model. The refined model, including baseline and subsequent information, will be named the “Dynamic Risk Model” and will be externally tested on all participants in the validation cohorts, regardless of glucose-lowering strategy. To assess the advantage of the new model, we will compare its performance with that of our previously established baseline risk model and with other well-established cardiovascular risk scores, such as SCORE (Systematic Coronary Risk Evaluation) and the 10-year atherosclerotic cardiovascular disease (ASCVD) risk algorithm based on the Pooled Cohort Equations [8,9], in our validation populations.
(5) Application 2: Drug Responder identification
The biggest challenge for clinicians is to identify, at the start of treatment, the specific patients who will respond to SGLT2i or GLP-1RA, since either of them can be chosen to prevent cardio-renal risks according to current guidelines. In our study, the eventual “responder” will be defined as a patient in a GLP-1RA or SGLT2i treated arm who was predicted at baseline to have a cardio-renal event but did not experience one by the end of the trial. The proportion and patient characteristics of GLP-1RA responders and SGLT2i responders will be presented as an exploratory analysis.
It would facilitate clinical decisions if we could identify drug responders at the early stages of treatment. We will incorporate information from different stages of treatment into our dynamic risk models. Using patients’ information at baseline, the 1st follow-up and the 2nd follow-up, our model will estimate each patient’s risk of the outcome 3 times (recorded as risk0, risk1 and risk2) in a dynamic manner. A patient estimated to have a decreased risk during the disease course (i.e. a 30% risk reduction from baseline) will be defined as an ‘interim responder’. The concordance rate between final responders and interim responders will be measured, and the characteristics of interim responders will be presented as an exploratory analysis.
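The two responder definitions and the concordance rate can be sketched as follows. Several details are assumptions not stated in the plan: the 30% reduction is read as a relative reduction of at least 30% from risk0, reaching it at either follow-up is taken to be sufficient, and concordance is computed as simple proportion agreement. Function names and patient values are hypothetical.

```python
def is_final_responder(predicted_high_risk, had_event):
    """Treated patient predicted at baseline to have an event who
    did not experience one by the end of the trial."""
    return predicted_high_risk and not had_event

def is_interim_responder(risk0, later_risks, min_reduction=0.30):
    """True when any follow-up risk (risk1, risk2) is lower than risk0
    by at least min_reduction on the relative scale (assumed reading)."""
    return any((risk0 - r) / risk0 >= min_reduction for r in later_risks)

def concordance_rate(final_flags, interim_flags):
    """Proportion of patients classified the same way by both definitions."""
    agree = sum(f == i for f, i in zip(final_flags, interim_flags))
    return agree / len(final_flags)

# Hypothetical treated patients:
# (predicted high risk at baseline, had event, risk0, risk1, risk2)
patients = [
    (True, False, 0.40, 0.35, 0.24),
    (True, True, 0.30, 0.29, 0.27),
    (True, False, 0.20, 0.19, 0.18),
    (True, True, 0.50, 0.30, 0.45),
]
final = [is_final_responder(high, event) for high, event, *_ in patients]
interim = [is_interim_responder(r0, (r1, r2)) for _, _, r0, r1, r2 in patients]
print(final, interim, concordance_rate(final, interim))
```

Simple agreement is only one option; a chance-corrected measure such as Cohen's kappa could be substituted without changing the responder definitions.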

Requested Studies:

A Long Term, Randomised, Double Blind, Placebo-controlled Study to Determine the Effect of Albiglutide, When Added to Standard Blood Glucose Lowering Therapies, on Major Cardiovascular Events in Patients With Type 2 Diabetes Mellitus
Data Contributor: GlaxoSmithKline
Study ID: NCT02465515
Sponsor ID: GLP116174

Action to Control Cardiovascular Risk in Diabetes (ACCORD)
Data Contributor: BioLINCC (a data-sharing platform funded by the National Institutes of Health)
Study ID: NCT00000620
Sponsor ID: 123

A Randomized, Double-blind, Event-driven, Placebo-controlled, Multicenter Study of the Effects of Canagliflozin on Renal and Cardiovascular Outcomes in Subjects With Type 2 Diabetes Mellitus and Diabetic Nephropathy
Data Contributor: Johnson & Johnson
Study ID: NCT02065791
Sponsor ID: CR103517

A Randomized, Multicenter, Double-Blind, Parallel, Placebo-Controlled Study of the Effects of JNJ-28431754 on Cardiovascular Outcomes in Adult Subjects With Type 2 Diabetes Mellitus
Data Contributor: Johnson & Johnson
Study ID: NCT01032629
Sponsor ID: CR016627

A Randomized, Multicenter, Double-Blind, Parallel, Placebo-Controlled Study of the Effects of Canagliflozin on Renal Endpoints in Adult Subjects With Type 2 Diabetes Mellitus
Data Contributor: Johnson & Johnson
Study ID: NCT01989754
Sponsor ID: CR102647

Public Disclosures:

HUANG, Q., ZOU, X., BOYKO, E.J. and JI, L., 2024. 925-P: Using Machine Learning to Predict Dynamic Cardiovascular Risks and Identify the Treatment Responses of Canagliflozin in Type 2 Diabetes Mellitus. Diabetes, 73(Supplement_1). doi: 10.2337/db24-925-P