Celgene: Classified Mixed Model Prediction for Pancreatic Cancer Research

Lead Researcher : Thuan Nguyen 

Institution : Oregon Health & Science University 

Funding Source : None 

Potential Conflicts of Interest : None 

Data Sharing Agreement Date : 1/22/2016 

Lay Summary 

Many practical problems involve prediction at the subject level (e.g., personalized medicine) or at the level of a small sub-population (e.g., a small community). In such cases, prediction accuracy can improve substantially if we can identify the class to which a new subject belongs. A random effect (e.g., a subject-specific effect) is introduced in the model and used to identify the class or cluster of a new observation. Such a random effect can also capture unobserved information that the available covariates do not. With this approach, the new subject is potentially associated with a random effect corresponding to the same class in the training data, so that mixed model prediction can be used to make the best prediction. We propose a new method, called classified mixed model prediction (CMMP), which helps identify such a class in the context of mixed effects models. The purpose of identifying the cluster is to increase prediction accuracy. We develop CMMP both for prediction of mixed effects and for prediction of future observations, and we consider scenarios in which there may or may not be a “match” for the new subject among the training-data subjects. Theoretical and empirical studies have been carried out to examine the properties of CMMP and to compare it with existing methods, e.g., regression-based prediction (RP). The results show that, even if no actual match exists between the class of the new observations and the classes in the training data, CMMP still improves prediction accuracy. This methodological work is at its final stage and is ready for submission to a statistical journal. We have found that our proposed method (CMMP) could improve on the results of the reference paper (Von Hoff et al., 2013) in predicting the primary and secondary end points.
In that paper, the authors used standard statistical methods, e.g., Kaplan-Meier estimation, the log-rank test, Cox proportional-hazards regression, and the chi-square test. The multi-center structure of the trial, with 151 community and academic centers in 11 countries across North America (63%), Eastern Europe (15%), Australia (14%), and Western Europe (9%), was not taken into consideration in that analysis. In addition, the authors included other covariates (age, sex, Karnofsky performance-status score, primary tumor location, liver metastases, level of CA19-9, and region) in the final models for overall survival and progression-free survival. This raises a question: are these the optimal, parsimonious models that provide the best prediction of the outcomes of interest? If we carelessly include all variables in the final predictive model, we add unnecessary complexity. As a consequence, the parameter estimates may be unreliable when the sample size is only moderately large relative to the model dimension. Model selection is a natural step in finding such models. Our approach will incorporate a model selection strategy called the Fence Methods to arrive at a more reasonable final predictive model. The Fence Methods are designed for unconventional data, e.g., clustered data, to which mixed effects models are applied. See Jiang et al. (2008, 2009, 2010) and Nguyen et al. (2012, 2014) for more details on the Fence Methods. It is therefore natural to combine this model selection procedure with the CMMP approach in predicting the survival-time outcomes. Instead of the Cox proportional-hazards model, we propose to use a frailty model, which includes both fixed and random effects. This model is often used in survival analysis when a random effect (e.g., a subject-specific effect) is taken into account. “Subject” here is defined as a community center (151 of them).
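For reference, one common way to write a shared frailty model with a center-level random effect is shown below. This is the standard textbook form, not a specification taken from this proposal; the symbols h_0, u_i, x_ij, and beta are notation introduced here for illustration only.

```latex
h_{ij}(t \mid u_i) = h_0(t)\, u_i \exp\!\left(x_{ij}^{\top}\beta\right),
\qquad i = 1, \dots, 151
```

Here h_ij is the hazard for patient j in center i, h_0 is the baseline hazard, x_ij holds the fixed covariates (treatment indicator and patient characteristics), and u_i > 0 is the frailty shared by all patients in center i. Setting u_i = 1 for every center recovers the Cox proportional-hazards model.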
Under this frailty model, we will start with the full model, in which the treatment effect and all of the covariates mentioned above are included. Using the Fence Methods, we expect to obtain a parsimonious model. Out of the 861 patients, we will randomly select 10% to form the test set. The rest will serve as the training data set, to which the CMMP approach will be applied to predict the survival-time outcomes. The mean squared prediction error (MSPE) will be computed for both approaches, CMMP and RP.

Statistical Methodology Research Team

Jiming Jiang, Ph.D., is a Professor of Statistics at the University of California, Davis. He is best known for his research in mixed effects models, small area estimation, and model selection, and for his books Linear and Generalized Linear Mixed Models and Their Applications (Springer 2007) and Large Sample Techniques for Statistics (Springer 2010).
J. Sunil Rao, Ph.D., is a Professor and Director of the Division of Biostatistics at the University of Miami. His areas of research interest include mixed model selection and prediction, high-dimensional model selection, bump hunting, modeling of health disparity data, and cancer genomic modeling.
Thuan Nguyen, M.D., Ph.D., is an Associate Professor of Biostatistics in the School of Public Health, Oregon Health & Science University. She received her M.D. in 1995 from Hue University, School of Medicine, Vietnam, and her Ph.D. in Biostatistics in 2008 from the University of California, Davis. Her areas of research interest include mixed effects models, statistical genetics, model selection, small area estimation, and longitudinal data analysis. Dr. Nguyen has multiple publications in major journals in statistics and biostatistics, including the Annals of Statistics, Journal of the American Statistical Association, Biostatistics, and Statistics in Medicine.
We will collaborate on the findings with Professor Motomi Mori, Director of the Biostatistics Shared Resource of the Knight Cancer Institute at OHSU, along with researchers at the Brenden-Colson Center for Pancreatic Care at OHSU. We plan to work with these experts in interpreting the results of our statistical analysis using the proposed method. We propose to meet with them for an initial consultation, in the hope of obtaining suggestions that we can incorporate into our analysis plan. We would also like to request an on-site meeting right after our analysis work is done, so that they can help us make the findings more insightful. Between the first and last on-site meetings, we would like to maintain the collaboration through monthly, if not weekly, email or phone communication. The goal is to keep the work on the right track clinically and scientifically. With their expertise in pancreatic cancer research, we hope to deliver our work to our intended audience in a more insightful and impactful way.

Research Design 

In this project, researchers will work to advance statistical methods, theory, and application. We propose research in five major areas: (I) Classified mixed model prediction (CMMP); (II) The E-MS algorithm: Model selection with incomplete data (E-MS); (III) Predictive model selection (PMS); (IV) Mixed model analysis after model selection (MAMS); and (V) Application and software development.

Studies Selected and Study Populations 

CMMP is designed to enhance prediction accuracy at the patient level. Class/cluster identification helps reduce prediction errors. The new method that we have developed applies naturally to the pancreatic cancer data. For example, a mixed effects model, more specifically a frailty model, can be built. The fixed covariates would include the treatment indicator [nab-paclitaxel-gemcitabine (NP-GEM) vs. gemcitabine (GEM)] and the variables used to stratify the patients. As for the random effects, which define the groups among the training data, one possibility is the community and academic centers in different countries (in which case the countries could be treated as fixed effects). The naïve prediction would be to use the mean survival time for each treatment group. With our classified mixed model prediction method, we expect to provide better prediction of individual survival time.
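The contrast between the naïve treatment-group mean and a prediction that borrows a center-level random effect can be sketched on simulated data. The example below is a minimal illustration under strong assumptions, not the CMMP procedure itself. It uses an uncensored continuous outcome and a linear random-intercept model rather than a frailty model. The center of each held-out patient is taken as known, so the class-identification step of CMMP is not shown. Variance components come from a simple method-of-moments estimate. All function names, parameter values, and data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, variance


def simulate(num_centers=30, per_center=20, seed=0):
    """Return simulated (center, treatment, outcome) rows; not trial data."""
    rng = random.Random(seed)
    rows = []
    for center in range(num_centers):
        alpha = rng.gauss(0.0, 2.0)  # center-level random effect
        for _ in range(per_center):
            treat = rng.randint(0, 1)  # 0/1 treatment indicator
            outcome = 8.0 + 1.5 * treat + alpha + rng.gauss(0.0, 1.0)
            rows.append((center, treat, outcome))
    return rows


def holdout_split(rows, seed=1):
    """Randomly hold out 10% of the rows as a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 10
    return shuffled[n_test:], shuffled[:n_test]


def fit(train):
    """Estimate treatment-group means and shrunken center effects."""
    by_treat = defaultdict(list)
    for _, treat, outcome in train:
        by_treat[treat].append(outcome)
    group_mean = {t: mean(ys) for t, ys in by_treat.items()}

    # Residuals from the treatment-group means, grouped by center.
    resid = defaultdict(list)
    for center, treat, outcome in train:
        resid[center].append(outcome - group_mean[treat])
    center_mean = {c: mean(rs) for c, rs in resid.items()}

    # Method-of-moments variance components (an assumption of this sketch).
    ss_within = sum((r - center_mean[c]) ** 2
                    for c, rs in resid.items() for r in rs)
    df_within = sum(len(rs) - 1 for rs in resid.values())
    sigma2_e = ss_within / df_within
    mean_inv_n = mean([1.0 / len(rs) for rs in resid.values()])
    sigma2_a = max(0.0, variance(list(center_mean.values()))
                   - sigma2_e * mean_inv_n)

    # Shrink each center's mean residual toward zero.
    effect = {}
    for c, rs in resid.items():
        shrink = len(rs) * sigma2_a / (sigma2_e + len(rs) * sigma2_a)
        effect[c] = shrink * center_mean[c]
    return group_mean, effect


def mspe(predicted, observed):
    """Mean squared prediction error over paired values."""
    return mean([(p - o) ** 2 for p, o in zip(predicted, observed)])


rows = simulate()
train, test = holdout_split(rows)
group_mean, effect = fit(train)
observed = [y for _, _, y in test]
naive_pred = [group_mean[t] for _, t, _ in test]
mixed_pred = [group_mean[t] + effect.get(c, 0.0) for c, t, _ in test]
mspe_naive = mspe(naive_pred, observed)
mspe_mixed = mspe(mixed_pred, observed)
print(f"MSPE naive: {mspe_naive:.3f}  MSPE mixed: {mspe_mixed:.3f}")
```

With only a binary treatment covariate, the group means coincide with an ordinary least squares fit, so the naïve predictor here also serves as a stand-in for regression prediction. A real analysis would replace the method-of-moments step with a fitted mixed or frailty model and would need to handle censoring.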

Statistical Analysis Plan (SAP) 

Publication Plan 

Our anticipated publication plan includes publications in three journal categories: I. Statistical/Biostatistical methodology journals, such as Journal of the American Statistical Association and Biometrika. II. Applied Statistics/Biostatistics journals, such as Biometrics, Biostatistics, and Statistics in Medicine. III. Journals in medical research, such as Cancer Research.

Research Team 

Beth A. Vorderstrasse, Oregon Health & Science University
Jiming Jiang, University of California, Davis
J. Sunil Rao, University of Miami
Jie Fan, University of Miami

Other Information 

Primary and Secondary Endpoints for this Research 

We use CMMP to predict (i) overall survival (primary end point); (ii) progression-free survival (secondary end point); and (iii) overall response rate (secondary end point).

Publication Citation 

The publication citation will be added after the research is published.