Comparing methods of selecting and adjusting for confounding factors in clinical trial

Lead Investigator: Weituo Zhang, Shanghai Jiao Tong University School of Medicine
Title of Proposal Research: Comparing methods of selecting and adjusting for confounding factors in clinical trial
Vivli Data Request: 9866
Funding Source: National Natural Science Foundation of China
Potential Conflicts of Interest: None

Summary of the Proposed Research:

In observational studies (studies in which researchers collect information from participants or analyze data that were already collected) that use large-scale routine electronic healthcare records, there are often numerous high-dimensional confounding factors. "High-dimensional" refers to datasets with a large number of features (variables), sometimes approaching or even exceeding the number of patients. In medicine, these features can include patient demographics, individual disease conditions, clinical measurements, and imaging characteristics. A confounding factor is a variable that influences both the supposed cause (for example, the treatment a patient receives) and the effect (the outcome). If it is not accounted for, it can mask true causal effects or produce spurious associations (misleading correlations between two variables). Researchers therefore use statistical methods to select and adjust for confounders in order to obtain unbiased estimates of treatment effects. Examples of these methods include multivariable regression analysis, propensity score analysis, the least absolute shrinkage and selection operator (LASSO), high-dimensional propensity scores (hdPS), and methods that incorporate machine learning, such as targeted maximum likelihood estimation (TMLE).
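
To make confounding adjustment concrete, here is a minimal sketch of one of the listed approaches, propensity score analysis, implemented as inverse probability weighting (IPW) in Python with NumPy. It assumes simulated data with a single measured confounder `c` and a true treatment effect of 1.0; the helper names `fit_logistic` and `ipw_ate` are hypothetical, and the propensity model is fitted by plain Newton-Raphson rather than by any particular statistics library.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Logistic regression of z on X (X includes an intercept column), fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information (negative Hessian)
        beta += np.linalg.solve(hess, grad)
    return beta

def ipw_ate(y, z, covariates):
    """Average treatment effect estimated by inverse probability weighting on the propensity score."""
    X = np.column_stack([np.ones(len(z)), covariates])
    ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, z)))   # estimated propensity scores
    w1 = z / ps                  # weights for treated patients
    w0 = (1 - z) / (1 - ps)      # weights for control patients
    # Normalized (Hajek) weighted mean outcome in each arm
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

rng = np.random.default_rng(0)
n = 10_000
c = rng.normal(size=n)                               # hypothetical measured confounder
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-c)))        # treatment more likely when c is high
y = 1.0 * z + 2.0 * c + rng.normal(size=n)           # true treatment effect = 1.0

naive = y[z == 1].mean() - y[z == 0].mean()          # unadjusted difference in means
adjusted = ipw_ate(y, z, c)
print(f"unadjusted={naive:.2f}  IPW-adjusted={adjusted:.2f}")
```

Because treated patients in this simulation tend to have higher `c`, and `c` also raises the outcome, the unadjusted difference should be well above 1.0, while the weighted estimate should land close to 1.0.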

In this study, we will generate a set of virtual confounding variables with distinct distributional characteristics based on the baseline information in the trial database. Using random generation techniques, we will introduce these variables and, according to the distributional profile of each virtual confounder, selectively omit records from the original dataset. This process will produce new datasets with different distributions and corresponding types and degrees of bias. We will then apply the statistical methods described above to correct for confounding in each dataset. Because the original data come from a randomized trial, the effect size reported in the original study serves as the reference value: we will compare the drug efficacy estimates obtained from each confounder selection and adjustment method, across datasets with different types and levels of bias, with this reference. We will quantify the practical performance of each method using regression-performance metrics such as the mean squared error (MSE, the average squared difference between the estimates and the reference value) and the root mean squared error (RMSE), along with additional indicators of estimation accuracy. Our aim is to identify the best confounding adjustment method for obtaining unbiased estimates under different confounding scenarios.
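
The simulation design can be sketched as follows, under several assumptions that are not part of the proposal: the "trial" is simulated rather than taken from NCT00762034, the outcome is continuous rather than a survival endpoint, the virtual confounder `v` is built as a prognostic baseline variable `b` plus random noise, and selective omission keeps treated patients more often when `v` is high and control patients more often when `v` is low. The full-data difference in means plays the role of the original trial estimate, and MSE and RMSE are computed across replicates relative to it. All function names are hypothetical, and only two approaches (no adjustment and multivariable regression adjustment) are shown.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_trial(n, effect=1.0):
    """Hypothetical stand-in for randomized trial data: arm z, baseline b, outcome y."""
    z = rng.integers(0, 2, size=n)                   # 1:1 randomization
    b = rng.normal(size=n)                           # prognostic baseline variable
    y = effect * z + 2.0 * b + rng.normal(size=n)
    return z, b, y

def virtual_confounder(b):
    """Hypothetical virtual confounder: baseline information plus random noise."""
    return b + rng.normal(scale=0.5, size=b.shape)

def selective_omission(z, v, strength):
    """Keep treated patients more often when v is high, controls more often when v is low."""
    logit = strength * v * np.where(z == 1, 1.0, -1.0)
    return rng.random(len(z)) < 1.0 / (1.0 + np.exp(-logit))

def naive_estimate(y, z):
    """Unadjusted difference in mean outcome between arms."""
    return y[z == 1].mean() - y[z == 0].mean()

def regression_estimate(y, z, v):
    """Multivariable (OLS) adjustment: coefficient of z in y ~ 1 + z + v."""
    X = np.column_stack([np.ones(len(z)), z, v])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

z, b, y = simulate_trial(10_000)
reference = naive_estimate(y, z)    # unbiased on the full data because z is randomized

errors = {"naive": [], "regression": []}
for _ in range(200):
    v = virtual_confounder(b)
    keep = selective_omission(z, v, strength=1.5)
    zs, vs, ys = z[keep], v[keep], y[keep]
    errors["naive"].append(naive_estimate(ys, zs) - reference)
    errors["regression"].append(regression_estimate(ys, zs, vs) - reference)

results = {}
for name, err in errors.items():
    mse = float(np.mean(np.square(err)))
    results[name] = (mse, float(np.sqrt(mse)))
    print(f"{name:>10}: MSE={mse:.3f}  RMSE={np.sqrt(mse):.3f}")
```

Because omission here depends only on the treatment arm and `v`, adjusting for `v` removes the induced bias in this setup, so the regression-adjusted RMSE should be much smaller than the unadjusted one. Other omission rules (for example, ones driven by variables that are then hidden from the analyst) would be needed to probe the harder scenarios the study describes.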

In addition, the results of this study will be useful for practical clinical research. Our research team investigates scientific questions related to lung cancer, including its etiology and clinical trials of targeted therapies. We have therefore requested this dataset, from a high-quality lung cancer clinical trial, to evaluate methods for selecting and adjusting for confounding factors, so that we can quickly choose appropriate adjustment methods for future lung cancer research.

Requested Studies:

Randomized, Open-Label, Phase 3 Study of Pemetrexed Plus Carboplatin and Bevacizumab Followed by Maintenance Pemetrexed and Bevacizumab Versus Paclitaxel Plus Carboplatin and Bevacizumab Followed by Maintenance Bevacizumab in Patients With Stage IIIB or IV Nonsquamous Non-Small Cell Lung Cancer
Data Contributor: Lilly
Study ID: NCT00762034
Sponsor ID: 9707