Two-sample testing for clinical trials data with missing values

Lead Investigator: Dean Bodenham, Imperial College London
Title of Proposal Research: Two-sample testing for clinical trials data with missing values
Vivli Data Request: 8957
Funding Source: All researchers are employees of Imperial College London. There is no specific funding allocated to this project. As part of our employment, we undertake research on a variety of projects.
Potential Conflicts of Interest: None

Summary of the Proposed Research:

Two-sample testing is a crucial statistical tool used in clinical trials to compare the effects of different treatments or interventions. It assesses whether two groups, typically a control group and an experimental group, differ statistically, providing evidence of treatment efficacy or of differences in outcomes. These statistical tests help ensure the validity of clinical trial results, allowing medical researchers to draw conclusions with confidence. Ultimately, this contributes to medical advancements, enhancing our understanding of disease treatment and patient care, and supporting evidence-based healthcare decisions. Without two-sample testing, proving the efficacy of new therapies would be considerably more challenging. Two popular statistical two-sample tests are Student’s t-test and the Wilcoxon-Mann-Whitney test.

Missing data in clinical trials can occur for various reasons, such as patients dropping out, not taking the medication as directed, or not returning for follow-up appointments. Missing data presents a significant challenge in statistical analyses because it can introduce bias, reduce statistical power (i.e. the sensitivity of the statistical test to detecting an effect when there actually is one), and undermine the validity of the results. Missing data can distort the true effect of a treatment or intervention, which can result in misleading conclusions. If not handled appropriately, it can compromise the reliability of the trial findings, affecting subsequent clinical decision making and potential advancements in healthcare. Therefore, it is critical to develop strategies to prevent, monitor and manage missing data in clinical trials. Two of the most popular methods for handling missing data are to ignore the missing data or to impute (guess) the missing values; both of these strategies carry risks.
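The risks of these two strategies can be illustrated with a small numerical sketch. The example below is written in Python purely for illustration and uses made-up numbers, not trial data; the function name mean_impute is hypothetical. It assumes the two largest outcomes are the ones that go missing, and shows that both ignoring the missing values and replacing them with the observed mean leave the estimated mean below the true mean, while mean imputation additionally shrinks the sample variance.

```python
import statistics

# Hypothetical complete outcomes (illustrative numbers only, not trial data).
full = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Assumption: the two largest outcomes are missing, e.g. due to dropout.
observed = full[:8]
n_missing = len(full) - len(observed)


def mean_impute(values, n_missing):
    """Fill each missing value with the mean of the observed values."""
    fill = statistics.mean(values)
    return list(values) + [fill] * n_missing


imputed = mean_impute(observed, n_missing)

true_mean = statistics.mean(full)
complete_case_mean = statistics.mean(observed)  # strategy 1: ignore missing values
imputed_mean = statistics.mean(imputed)         # strategy 2: mean imputation

print("true mean:          ", true_mean)
print("complete-case mean: ", complete_case_mean)
print("mean-imputed mean:  ", imputed_mean)
print("observed variance:  ", statistics.variance(observed))
print("imputed variance:   ", statistics.variance(imputed))
```

In this constructed example both strategies underestimate the mean, and the imputed sample looks less variable than the observed one, which can make a subsequent test appear more certain than the data justify.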

The requested study investigates the efficacy of a drug that could possibly treat diabetes, and its data contain several separate treatment groups. In each treatment group, each subject is given the same treatment, either a placebo (no drug) or the experimental drug at a certain dosage level. For each treatment group, one sample consists of baseline (pre-treatment) measurements, and the second sample consists of post-treatment measurements. However, in each treatment group, some post-treatment measurements are missing. In our research, we shall analyse the requested study data using a newly developed statistical two-sample test. This two-sample test is robust to missing data and does not rely on imputation, giving a statistical result with a guaranteed false positive rate (i.e. the probability of declaring an effect when there is actually none is kept below a pre-specified threshold). The first goal of our research is to confirm the statistical conclusions reached in other statistical analyses of this clinical trial which relied on imputing (guessing) the missing values. The second goal of our research is to use this example to demonstrate the effectiveness of our new statistical test for analysing clinical trials data that contain missing values. This will provide a powerful new statistical tool for clinical trials researchers that will enable more robust conclusions, fostering the development of more effective and safer treatments.

Statistical Analysis Plan:

The NCT01874431 clinical trial has already grouped patients into different treatment groups, namely a placebo group and seven treatment groups. The variable of interest is the urinary albumin-to-creatinine ratio (UACR). The rationale behind selecting this study is that each treatment group’s sample of post-treatment UACR measurements contains missing values. This is because some patients dropped out of the trial before it was completed, and so no post-treatment measurements were taken for those patients. In each group, the number of missing values is less than 15% of the total number of patients who started the trial in that group. Our proposed statistical two-sample test can analyse data where the samples contain missing values.

For each treatment group, we shall compare the UACR levels at the start of the clinical trial with the UACR levels at the end of the clinical trial via a novel statistical two-sample test. The first sample contains the UACR values at the start of the trial, and the second sample contains the UACR values at the end of the trial. Several patients from each group dropped out of the clinical trial before it was completed, so each second sample will contain missing values. Our proposed two-sample test will compare the two UACR samples for each treatment group, while taking the number of missing values into account.
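To make the idea of a test that accounts for the number of missing values more concrete, the following minimal sketch shows one conservative way to do this with a Wilcoxon-Mann-Whitney-type statistic. It is an illustration under stated assumptions and is not presented as the proposed test itself: it assumes only the second sample has missing values, uses a normal approximation without tie correction, and all function names are hypothetical. It is written in Python for illustration, although the analysis itself will be done in R. The sketch bounds the statistic over every possible value the missing measurements could have taken, and rejects only if the result is significant in the least favourable case.

```python
import math


def mann_whitney_u(x, y):
    """Count pairs (xi, yj) with xi < yj, counting ties as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def two_sided_p(u, n, m):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    mean = n * m / 2.0
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def u_bounds_with_missing(x, y_observed, n_missing):
    """Smallest and largest U over all possible values of the missing y's.

    Each missing y adds between 0 (it lies below every x) and len(x)
    (it lies above every x) to the statistic.
    """
    u_obs = mann_whitney_u(x, y_observed)
    return u_obs, u_obs + len(x) * n_missing


def robust_decision(x, y_observed, n_missing, alpha=0.05):
    """Reject only if the least favourable completion is still significant."""
    n = len(x)
    m = len(y_observed) + n_missing
    lo, hi = u_bounds_with_missing(x, y_observed, n_missing)
    null_mean = n * m / 2.0
    if lo <= null_mean <= hi:
        # The range of possible statistics reaches the null mean: do not reject.
        return False, 1.0
    worst_p = max(two_sided_p(lo, n, m), two_sided_p(hi, n, m))
    return worst_p < alpha, worst_p


# Made-up numbers for illustration only (not trial data).
baseline = list(range(1, 11))
post_observed = list(range(11, 20))
reject, worst_p = robust_decision(baseline, post_observed, n_missing=1)
print("reject:", reject, "worst-case p-value:", worst_p)
```

Because the decision is based on the least favourable completion of the missing values, the sketch never rejects when some completion of the data would not, at the cost of losing power as the number of missing values grows.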

The statistical analysis will be done using the R programming language.

Requested Studies:

A Randomized, Double-blind, Placebo-controlled, Multi-center Study to Assess the Safety and Efficacy of Different Oral Doses of BAY94-8862 in Subjects With Type 2 Diabetes Mellitus and the Clinical Diagnosis of Diabetic Nephropathy
Data Contributor: Bayer
Study ID: NCT01874431
Sponsor ID: 16243

Public Disclosures:

Zeng Y., Adams N.M., Bodenham D.A. 2024. On two-sample testing for data with arbitrarily missing values. arXiv preprint. DOI: 10.48550/arXiv.2403.15327