Reproducible Machine Learning (ML) for Tumor Growth Inhibition (TGI) - can ML methods support the validity of conventional TGI metrics?

Lead Investigator: Andreas Meid, Heidelberg University Hospital
Title of Proposal Research: Reproducible Machine Learning (ML) for Tumor Growth Inhibition (TGI) – can ML methods support the validity of conventional TGI metrics?
Vivli Data Request: 6543
Funding Source: None
Potential Conflicts of Interest: None

Summary of the Proposed Research:

Reproducibility is a cornerstone of science and provides the basis for successful replication: results that can be reproduced under the same conditions support applying the methodology and design in an independent setting. This is all the more important for machine learning (ML) techniques, which are often considered a “black box”. Here, it is fundamentally necessary to demonstrate reproducibility and consistency with established approaches before such advanced methods can be validly applied to study further questions and potentially improve the prediction of overall survival (OS) in model-informed drug development. Only then can medical science and patient care profit from such analyses.

To this end, this project aims to reproduce conventional approaches to modeling tumor growth inhibition (TGI) as well as ML methods that have already been published. In particular, conventional (maximum-likelihood-based) regression analyses will be complemented by four ML methods (lasso, boosting, random forest, and kernel machine) to investigate their comparative predictive performance, to explore the interactions between predictor variables, and to incorporate nonlinear relationships for the prediction of OS.

The project is an exploratory analysis of the OAK randomized controlled trial (RCT) comparing OS with atezolizumab versus docetaxel. All patients included in the primary population (425 patients each randomly assigned to receive atezolizumab or docetaxel) will be considered for predictive models for the co-primary endpoint OS. Patient characteristics at study baseline will be considered as candidate predictors in each methodological approach (conventional regression, lasso, boosting, random forest, and kernel machine).

• What design and methods have you chosen and why (in brief):
The prediction models are based on analyses of event times (i.e., overall survival). Among standard procedures, semi-parametric Cox regression and parametric, so-called accelerated failure time (AFT) models are used here, the latter with link functions for the event times that correspond to an assumed distribution (e.g., Weibull, exponential, or lognormal). Performance measures for these classes of models include prediction error curves and the Brier score.
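As an illustration of the Brier score for censored event times, the following is a minimal sketch in plain Python and not the implementation used in the project. It assumes the commonly used inverse-probability-of-censoring weighting, in which the weights come from a Kaplan-Meier estimate of the censoring distribution. The function names, the toy data, and the predicted survival probabilities are hypothetical, and tied event and censoring times are not handled.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring distribution G.

    Returns a function G(t, left=False) giving G(t), or G(t-) when left=True.
    Simplification (assumption): event and censoring times do not tie.
    """
    steps = []  # (censoring time, value of G from that time onward)
    g = 1.0
    for u in sorted({t for t, e in zip(times, events) if e == 0}):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def G(t, left=False):
        value = 1.0
        for u, g_u in steps:
            if u < t or (u == t and not left):
                value = g_u
        return value

    return G


def brier_score(times, events, surv_pred, horizon):
    """Censoring-weighted Brier score at a single time horizon.

    surv_pred[i] is the predicted probability that patient i survives
    beyond `horizon`.
    """
    G = censoring_survival(times, events)
    total = 0.0
    for t, e, s in zip(times, events, surv_pred):
        if t <= horizon and e == 1:
            # Death observed before the horizon: true status is 0.
            total += (0.0 - s) ** 2 / G(t, left=True)
        elif t > horizon:
            # Still under observation at the horizon: true status is 1.
            total += (1.0 - s) ** 2 / G(horizon)
        # Patients censored before the horizon receive weight 0.
    return total / len(times)


# Toy data (hypothetical): 1 = death observed, 0 = censored.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
predicted = [0.2, 0.4, 0.6, 0.8]  # hypothetical predictions of S(2.5 | x_i)
score = brier_score(times, events, predicted, horizon=2.5)
print(round(score, 3))
```

Evaluating this score over a grid of horizons yields a prediction error curve; lower values indicate better predictions.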

The lasso, which is also considered, sits between standard and machine-learning procedures because its shrinkage parameter, which penalizes the effect estimates of the independent variables, is chosen by (data-driven) cross-validation. The lasso is applicable to both Cox and AFT models.
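How cross-validation can pick the shrinkage parameter may be sketched as follows. To stay short and dependency-free, the example swaps in an ordinary linear lasso on an uncensored, simulated outcome (think of log(OS) without censoring, which a real analysis could not assume) instead of a Cox or AFT lasso. All names, the simulated data, and the candidate values of lambda are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X b for the current b
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta


def cv_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the smallest k-fold cross-validated squared error."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            for i in fold:
                pred = sum(b * x for b, x in zip(beta, X[i]))
                err += (y[i] - pred) ** 2
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


# Simulated data (hypothetical): only the first of three covariates matters.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [2.0 * row[0] + rng.gauss(0, 0.5) for row in X]

best_lambda = cv_lambda(X, y, lambdas=[0.01, 0.1, 1.0, 10.0])
beta = lasso_fit(X, y, best_lambda)
print(best_lambda, [round(b, 2) for b in beta])
```

With a very large lambda every coefficient is shrunk to exactly zero, which corresponds to a model without covariates; smaller values retain progressively more of them.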

Among the pure machine-learning procedures, Cox boosting and random survival forests are implemented. The latter require tuning of several hyperparameters, including the number of variables randomly selected as candidates for splitting a node and the number of random splits considered for each candidate splitting variable.
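The role of these two hyperparameters can be illustrated with a sketch of how a single node might be split. This is a simplified illustration, not the algorithm of any particular package: it assumes the log-rank statistic as the splitting criterion, and the names `mtry` and `nsplit`, the function names, and the toy data are assumptions chosen for the example.

```python
import random


def logrank(times, events, in_left):
    """Two-sample log-rank statistic for the groups defined by in_left."""
    num, var = 0.0, 0.0
    for u in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for t in times if t >= u)  # at risk overall
        n1 = sum(1 for t, g in zip(times, in_left) if t >= u and g)
        d = sum(1 for t, e in zip(times, events) if t == u and e == 1)
        d1 = sum(1 for t, e, g in zip(times, events, in_left)
                 if t == u and e == 1 and g)
        num += d1 - d * n1 / n  # observed minus expected events on the left
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num * num / var if var > 0 else 0.0


def best_split(X, times, events, mtry, nsplit, rng):
    """Search mtry random variables and up to nsplit random cut points each."""
    best = (0.0, None, None)  # (statistic, variable index, cut point)
    for j in rng.sample(range(len(X[0])), mtry):
        values = sorted({row[j] for row in X})
        cuts = values[:-1]  # cutting at the maximum would leave one side empty
        if not cuts:
            continue
        for c in rng.sample(cuts, min(nsplit, len(cuts))):
            left = [row[j] <= c for row in X]
            stat = logrank(times, events, left)
            if stat > best[0]:
                best = (stat, j, c)
    return best


# Toy data (hypothetical): variable 0 separates short from long survivors,
# variable 1 carries no information about survival.
X = [[i % 2, ((7 * i) % 40) / 40] for i in range(40)]
times = [(1.0 if i % 2 else 5.0) + 0.01 * i for i in range(40)]
events = [1] * 40

stat, var_index, cut = best_split(X, times, events, mtry=2, nsplit=10,
                                  rng=random.Random(1))
print(var_index, cut, round(stat, 1))
```

A smaller `mtry` makes the individual trees more diverse, and a smaller `nsplit` reduces the cost of each node search, at the price of possibly missing the best cut point.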

Statistical Analysis Plan:

Four ML methods will be implemented in the project for covariate selection and for log(hazard) or log(OS) predictions with censoring, for which the aforementioned baseline patient characteristics will be tested as candidate predictors. Concerning variable selection with each of the ML methods, three options will be examined: (1) excluding all covariates from the model, (2) retaining only the important covariates based on the tuning parameter (lambda), and (3) including all prespecified covariates. By this, one can compare the predictive value of covariate information in each respective modeling approach (conventional regression, lasso, boosting, random forest, and kernel machine). Variable importance measures will assess the importance of particular predictors in each modeling approach separately. The prediction models based on the ML approaches will be trained with internal validation based on bootstrap samples drawn with replacement, each of the same size as the original dataset, with 50 cross-validation repetitions (B = 50 samples). Prediction performance will be assessed by the Brier score.
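The resampling scheme can be sketched as below. This is a minimal illustration under stated assumptions: each of the B = 50 bootstrap samples has the size of the original dataset, the model is fitted on the sample, and it is scored on the patients not drawn into that sample (out-of-bag evaluation is an assumption here, since the plan does not spell out the evaluation set). The `fit` and `score` arguments and the toy model are hypothetical placeholders for a survival model and the Brier score.

```python
import random


def bootstrap_validate(data, fit, score, B=50, seed=0):
    """Fit on B bootstrap samples and score each fit on its out-of-bag records."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(B):
        drawn = [rng.randrange(n) for _ in range(n)]  # with replacement, size n
        in_bag = set(drawn)
        train = [data[i] for i in drawn]
        test = [data[i] for i in range(n) if i not in in_bag]
        if not test:
            continue  # every record was drawn; nothing left to evaluate on
        scores.append(score(fit(train), test))
    return scores


# Hypothetical placeholder model: predict the training mean, score by mean
# squared error. A real analysis would fit a survival model and use the
# Brier score instead.
def fit_mean(train):
    return sum(train) / len(train)


def mean_squared_error(model, test):
    return sum((x - model) ** 2 for x in test) / len(test)


data = [float(i) for i in range(100)]
scores = bootstrap_validate(data, fit_mean, mean_squared_error, B=50)
print(len(scores), round(sum(scores) / len(scores), 1))
```

Running the same loop once per variable selection option (no covariates, selected covariates, all covariates) would give three score distributions that can be compared directly.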

Of note, the original text included the word “we”, which raised questions about the research team. The grammatical form “we” may refer to the wider research environment. To be clear, only the named person will have access to the data and will analyze them alone and independently in accordance with the proposal. For scientific queries, expert opinions, or recommendations, colleagues who can provide independent advice will sometimes be consulted. Among them are the working group of Dr. Meid (specifically Lucas Wirbka) and long-term cooperation partners at the Technical University of Dortmund, Faculty of Statistics, working group “Statistical Methods for Big Data” (specifically Prof. Andreas Groll and Alexander Gerharz).
In addition, the specific phrase “By this, we can compare the predictive value […]” is to be understood quite neutrally. Alternatively, it could equivalently be written as “By this, one can compare the predictive value […]”.

Requested Studies:

A Phase III, Open-Label, Multicenter, Randomized Study to Investigate the Efficacy and Safety of Atezolizumab (Anti-PD-L1 Antibody) Compared With Docetaxel in Patients With Non-Small Cell Lung Cancer After Failure With Platinum Containing Chemotherapy
Data Contributor: Roche
Study ID: NCT02008227
Sponsor ID: GO28915

Public Disclosure:

Meid, A., Gerharz, A., Groll, A. Machine learning for tumor growth inhibition: Interpretable predictive models for transparency and reproducibility. CPT: Pharmacometrics & Systems Pharmacology. 2022 Feb 01. doi: 10.1002/psp4.12761