**Lead Investigator:** Jennifer Lund, University of North Carolina at Chapel Hill

**Title of Proposal Research:** Improving Colon Cancer Therapy Decisions by Extending Trial Representation

**Vivli Data Request:** 7848

**Funding Source:** This work has been funded by the Patient-Centered Outcomes Research Institute (PCORI). This is a non-governmental agency – our funding is through a contract (ME-2017C3-9337).

**Potential Conflicts of Interest:** None

**Summary of the Proposed Research:**

Colon cancer is the third most common cancer diagnosed among men and women in the United States. Treatment for colon cancer often includes surgery followed by chemotherapy. When choosing a chemotherapy, doctors and patients generally discuss its trade-offs, including its benefits – like reducing the chance that the cancer will come back – and its harms – like increasing the chance of an adverse event that leads to an unplanned hospital visit.

The benefits and harms of chemotherapy are carefully studied in clinical trials. However, enrolling in a clinical trial can be difficult, and as a result, cancer patients who participate are often different from those who do not. Studies have shown that fewer than 5% of all cancer patients enroll in cancer treatment trials, and specific groups of patients, including adults over the age of 70, are less likely to participate at all. Because of this, it can be difficult for doctors to tell their older patients about the benefits and harms of a given treatment, as people like them were not included in large enough numbers in the clinical trials.

In this study, our goal is to generate better information about the benefits and harms of treatment that doctors and a wide variety of colon cancer patients, including older patients, can use to decide which chemotherapy to choose. We will compare the benefits and harms of two different chemotherapy treatments for colon cancer and examine whether a shorter period of treatment (such as 3 months) can provide the same benefits as a longer period of treatment (such as 6 months), but with less harm. For this study, we will use information from colon cancer patients who participated in a large clinical trial of chemotherapy and from colon cancer patients who were not treated in a clinical trial but were instead treated by a doctor in a routine clinical setting. The results of this study have the potential to help doctors and patients choose the chemotherapy with the most benefit and least harm and, in turn, improve patients' overall health and quality of life.

**Statistical Analysis Plan:**

Effect measure of interest

Our overarching goal is to model the per-protocol effects of three adjuvant chemotherapy strategies using data from the MOSAIC trial, which enrolled stage II and III colon cancer patients randomized to receive 6 months of folinic acid, fluorouracil, and oxaliplatin (FOLFOX) or 6 months of 5-fluorouracil (5FU). Specifically, we are interested in the effects of these three strategies on all-cause mortality and on the risk of grade 3-4 toxicities. We will estimate absolute risk differences for mortality at 1, 3, and 5 years and for grade 3-4 toxicities at 1, 3, 6, and 12 months.

Statistical analysis methods

We will use the complement of the Kaplan-Meier estimator (i.e., cumulative incidence, one minus the Kaplan-Meier survival probability) to estimate absolute risk differences for the outcomes of interest, comparing the three adjuvant chemotherapy strategies. For adjustment, we will combine two state-of-the-art epidemiological and statistical methods: inverse odds of sampling weighting (Westreich, AJE, 2017) and the parametric g-formula (Robins and Hernán, Advances in Longitudinal Data Analysis, 2009), the latter of which can flexibly estimate a range of treatment effects. Using these methods, we will reanalyze the MOSAIC data while accounting for time-varying confounding and selection bias. Because we are interested in time-varying treatment protocols (e.g., 3 months versus 6 months of FOLFOX), the parametric g-formula is preferred over other methods, as it has been shown to be less sensitive to sparse data. Researchers in our group have separately applied inverse odds of sampling weights and the parametric g-formula in studies of HIV and cardiovascular disease, as well as in occupational epidemiology, but to our knowledge, the two methods have not previously been combined or applied in oncology.
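To make the risk measure concrete, the sketch below computes risk by a horizon t as one minus the Kaplan-Meier survival estimate and takes the difference between two strategies. It is a minimal illustration rather than the study's analysis code: the function names are hypothetical, and the optional `weights` argument (for example, inverse odds of sampling weights) is our assumption about how weighting could enter the estimator.

```python
import numpy as np


def km_risk(time, event, t, weights=None):
    """Risk by time t as 1 - S(t), with S the (optionally weighted) Kaplan-Meier estimator.

    time    : follow-up time for each patient
    event   : 1 if the outcome (e.g., death) occurred at `time`, 0 if censored
    weights : optional per-patient weights (hypothetical use: sampling weights)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    w = np.ones_like(time) if weights is None else np.asarray(weights, dtype=float)
    surv = 1.0
    # Multiply conditional survival over each distinct event time up to t
    for u in np.unique(time[(event == 1) & (time <= t)]):
        at_risk = w[time >= u].sum()                   # still under follow-up just before u
        events = w[(time == u) & (event == 1)].sum()   # (weighted) events at u
        surv *= 1.0 - events / at_risk
    return 1.0 - surv


def km_risk_difference(time_a, event_a, time_b, event_b, t, weights_a=None, weights_b=None):
    """Absolute risk difference at horizon t: risk under strategy A minus risk under B."""
    return km_risk(time_a, event_a, t, weights_a) - km_risk(time_b, event_b, t, weights_b)
```

With four patients followed to times 1, 2, 3, and 4 and deaths at times 1 and 3, `km_risk(..., t=3)` steps survival to 0.75 and then 0.375, giving a risk of 0.625.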

Because we want to make inferences about the target population of patients treated within the US Oncology network, we must account for potential effect measure modification. To do so, we will estimate inverse odds of sampling weights (Westreich, AJE, 2017). These weights will be used to “standardize” the treatment effects estimated in the MOSAIC trial to the US Oncology population. We will include all measured risk factors for the outcomes of interest in our inverse odds of sampling models, as these factors can contribute to effect measure modification on either the absolute or the relative scale.
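A minimal sketch of inverse odds of sampling weights, assuming a logistic model for trial membership (S=1 for MOSAIC participants, S=0 for the US Oncology cohort) fit to the stacked data, so that each trial participant receives the weight P(S=0 | X) / P(S=1 | X). The function names, the Newton-Raphson fitter, and the stabilizing factor n_trial / n_target (which rescales the weights to average roughly 1 in the trial) are illustrative assumptions, not the study's code.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Unpenalized logistic regression via Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta += np.linalg.solve(hess, X.T @ (y - p))   # Newton step
    return beta


def iosw(X_trial, X_target, stabilize=True):
    """Inverse odds of sampling weights for trial rows (hypothetical helper)."""
    X = np.vstack([X_trial, X_target])
    s = np.r_[np.ones(len(X_trial)), np.zeros(len(X_target))]  # 1 = in trial
    X1 = np.column_stack([np.ones(len(X)), X])                  # add intercept
    beta = fit_logistic(X1, s)
    p_trial = 1.0 / (1.0 + np.exp(-X1[: len(X_trial)] @ beta))  # P(S=1 | X) for trial rows
    w = (1.0 - p_trial) / p_trial                               # odds of NOT being in the trial
    if stabilize:
        w *= len(X_trial) / len(X_target)                       # marginal P(S=1) / P(S=0)
    return w
```

For example, with a binary covariate equal to 1 in 2 of 4 trial patients and 5 of 6 target patients, the unstabilized weights are 0.5 for trial patients with the covariate equal to 0 and 2.5 for those with it equal to 1, which moves the trial's weighted covariate mean from 0.5 to the target's 5/6.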

Planned adjustment for covariates

Our analyses will consider the following covariates:

Demographics: age, sex, and race (White, Black, Other).

Tumor features: tumor size, number of positive lymph nodes (0, 1-3, 4+), American Joint Committee on Cancer (AJCC) stage, tumor location (right vs. left, or colon vs. sigmoid/other), differentiation (well, moderate, poor), obstruction or perforation, and carcinoembryonic antigen (CEA) level.

Health status: body mass index; Eastern Cooperative Oncology Group performance status (0, 1, 2).

Chemotherapy completion: number of cycles; any dose reductions, delays, or omissions at each cycle; and the relative dose intensity of the regimen received.

Time-varying confounding and selection bias will be addressed in all analyses. Even in clinical trials, per-protocol estimates may be confounded by post-randomization factors that influence treatment compliance (e.g., adverse effects) and are independently associated with mortality. For the Aim 2 and 3 analyses, we will treat adverse events (grade 3-4 toxicities) and Eastern Cooperative Oncology Group (ECOG) performance status as potential time-varying confounders affected by prior treatment.

We propose a novel analytic strategy to estimate the effectiveness of various treatment protocols versus historical (5FU) and contemporary (FOLFOX delivery) comparators on all-cause mortality using a combination of inverse odds of sampling weights and the parametric g-formula. We will execute these analyses using the following three steps.

1. Probability modeling (Trial data only). Using the person-cycle dataset for the trial population, we will first fit pooled logistic models (i.e., logistic models fit to all person-period observations) to estimate the log-odds of each time-varying confounder (i.e., grade 3-4 adverse events and ECOG performance status >2) and each outcome (death and censoring/drop-out) in each person-cycle.

2. Monte Carlo sampling (Trial data only, using inverse odds of sampling weights). Next, we will take the baseline trial population (n=2246 in MOSAIC) and resample it 100,000 times to create a “pseudo-population”. We will use weighted resampling to incorporate the inverse odds of sampling weights so that the baseline characteristics of a given pseudo-population reflect those of each of the annual routine care cohorts. The size of the resample balances the need to minimize simulation error against computational constraints.
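The Step 1 model fit and the weighted resampling in Step 2 can be sketched as follows, under simplifying assumptions: patients are expanded to one row per cycle, a single pooled logistic model for death is fit with the cycle number and baseline covariates as predictors, and the baseline population is then resampled with probability proportional to the inverse odds of sampling weights. In the actual analysis, separate pooled models would be fit for each time-varying confounder and for censoring, and the design would also include treatment and lagged confounder values; all function and argument names here are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def to_person_cycle(last_cycle, died, baseline):
    """Expand one row per patient into one row per patient-cycle.

    Design columns: [intercept, cycle, baseline covariates...]; y = 1 only in the
    cycle in which the patient died (hypothetical simplified layout).
    """
    rows, y = [], []
    for k, d, b in zip(last_cycle, died, baseline):
        for t in range(1, k + 1):
            rows.append(np.r_[1.0, t, b])
            y.append(1.0 if (d == 1 and t == k) else 0.0)
    return np.array(rows), np.array(y)


def fit_pooled_logistic(X, y, n_iter=25):
    """Unpenalized logistic regression (Newton-Raphson) fit to all person-cycle rows at once."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, X.T @ (y - p))
    return beta


def weighted_resample(baseline, weights, n_pseudo=100_000, seed=0):
    """Draw the pseudo-population with probability proportional to the sampling weights."""
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    idx = rng.choice(len(baseline), size=n_pseudo, replace=True, p=p / p.sum())
    return np.asarray(baseline)[idx]
```

Because each resampled row is drawn with probability proportional to its weight, the pseudo-population's baseline covariate distribution approximates the weighted (target-standardized) trial distribution, up to Monte Carlo error.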

We will create a separate pseudo-population for each treatment protocol of interest (e.g., full adherence to 6 months of FOLFOX). In each pseudo-population, we will use the parameter estimates from Step 1, the baseline covariates, and the treatment protocol of interest to simulate time-varying confounder and outcome data, as described in detail by Keil, Epidemiology, 2017. Starting at time t=1 (the first cycle), we will calculate the probability that each time-varying covariate (confounders and outcomes) takes the value 1. Using these probabilities, we will then assign each pseudo-patient a value for every time-varying covariate by drawing from a Bernoulli distribution. After this step is performed at t=1, it is repeated at t=2 using the model coefficients from Step 1, the values of the time-varying confounders simulated at t=1, and the treatment protocol of interest. The process continues for t=3, 4, and so on until death or the end of follow-up at 5 years, whichever comes first.
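The forward simulation described above can be sketched as below. This is a minimal illustration under assumed model forms, not the analysis code: it uses one binary time-varying confounder (a grade 3-4 adverse event indicator) and one outcome (death), coefficient vectors ordered as [intercept, cycle, treatment, adverse event, baseline covariates...], and a hypothetical `treat(t)` function returning the treatment the protocol assigns at cycle t. Each cycle draws the confounder first and then death, using the confounder value just simulated; once a pseudo-patient dies, no later death is recorded for them.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate_protocol(baseline, beta_ae, beta_death, treat, n_cycles, seed=0):
    """Monte Carlo g-formula sketch; returns each pseudo-patient's cycle of death (0 = alive).

    baseline   : (n, p) array of baseline covariates for the pseudo-population
    beta_ae    : hypothetical pooled-logistic coefficients for the adverse event model
    beta_death : hypothetical pooled-logistic coefficients for the death model
    treat      : hypothetical protocol, treat(t) -> 0/1 treatment at cycle t
    """
    rng = np.random.default_rng(seed)
    n = baseline.shape[0]
    alive = np.ones(n, dtype=bool)
    ae_prev = np.zeros(n)                  # no adverse event before cycle 1
    death_cycle = np.zeros(n, dtype=int)
    for t in range(1, n_cycles + 1):
        a_t = float(treat(t))              # treatment fixed by the protocol, not simulated
        # Confounder model: depends on cycle, protocol treatment, prior AE, and baseline
        X_ae = np.column_stack([np.ones(n), np.full(n, t), np.full(n, a_t), ae_prev, baseline])
        ae = rng.binomial(1, expit(X_ae @ beta_ae)).astype(float)
        # Outcome model: uses the AE value simulated in this same cycle
        X_d = np.column_stack([np.ones(n), np.full(n, t), np.full(n, a_t), ae, baseline])
        died = (rng.random(n) < expit(X_d @ beta_death)) & alive
        death_cycle[died] = t
        alive &= ~died
        ae_prev = ae
    return death_cycle
```

For example, if cycles are assumed to last 2 weeks, a hypothetical protocol of full adherence to 6 months of FOLFOX could be written as `treat=lambda t: 1 if t <= 12 else 0`.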

3. Estimation of effectiveness (Based on pseudo-populations). We will repeat Step 2 for each of the treatment protocols of interest. Because each pseudo-population emulates a closed cohort, we can estimate cumulative risk by counting the number of simulated deaths by a given time and dividing that number by 100,000. To compare the effectiveness of alternative treatment protocols, we will then estimate cumulative risk differences for mortality at 1, 3, and 5 years. Ninety-five percent confidence intervals for the risk difference (and ratio) estimates will be generated using the nonparametric bootstrap: we will re-run Steps 1-3 on 400 random samples drawn with replacement and take the standard deviation of the resulting cumulative risk difference estimates as an approximation of the standard error.
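The risk and interval calculations can be sketched as follows. Because the pseudo-population is a closed cohort, cumulative risk is the share of pseudo-patients with a simulated death by cycle k, which equals the death count divided by 100,000 when the pseudo-population has 100,000 members. The bootstrap helper assumes the data are a NumPy array with one row per trial patient and that `estimator` re-runs the whole pipeline on each resample; both function names are hypothetical, and the interval uses a normal approximation (estimate ± 1.96 × bootstrap SE).

```python
import numpy as np


def cumulative_risk(death_cycle, k):
    """Proportion of pseudo-patients who died at or before cycle k (0 = no simulated death)."""
    d = np.asarray(death_cycle)
    return np.mean((d >= 1) & (d <= k))


def bootstrap_ci(data, estimator, n_boot=400, seed=0):
    """Nonparametric bootstrap: re-run `estimator` on resamples of patients drawn with replacement.

    Returns (point estimate, lower 95% limit, upper 95% limit), using the standard
    deviation of the bootstrap estimates as the standard error.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    point = estimator(data)
    reps = np.array([estimator(data[rng.integers(0, n, n)]) for _ in range(n_boot)])
    se = reps.std(ddof=1)
    return point, point - 1.96 * se, point + 1.96 * se
```

Under these assumptions, a mortality risk difference at cycle k between two simulated protocols is `cumulative_risk(death_a, k) - cumulative_risk(death_b, k)`, and `estimator` would return that difference after refitting Step 1 and re-simulating Step 2 on the resampled patients.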

Power to detect a clinically important effect, or the precision of the effect estimate given the sample size available

The MOSAIC trial was designed to have 90% power using a two-sided alpha of 0.05 to detect a 6% difference in 3-year disease-free survival, which has been shown to be a good proxy for 5-year overall survival in adjuvant colon cancer trials.

Our analyses use stabilized inverse odds of sampling weights, creating a weighted trial population. Statistical power for the intention-to-treat effect estimates in the weighted trial data will be slightly reduced compared with the original trial. In the MOSAIC trial, the intention-to-treat (ITT) estimate for disease-free survival (DFS) was a hazard ratio of 0.82 (0.72, 0.93), corresponding to a confidence limit ratio (CLR; upper limit divided by lower limit) of 1.29. Under varying scenarios of inverse odds of sampling weights, we estimate that the precision of the same point estimate could range from a CLR of 1.44 to 1.68. Because the goal of our analysis is to estimate the effects of alternative chemotherapy approaches in a target population of interest (i.e., the US Oncology population), which has a fixed sample size, we do not include formal power calculations here.
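The confidence limit ratio quoted above is simple arithmetic: 0.93 / 0.72 ≈ 1.29. Assuming Wald-type 95% intervals on the log hazard ratio scale, log(CLR) = 2 × 1.96 × SE(log HR), so each CLR can be converted to an implied standard error. The sketch below performs this conversion; the function names are hypothetical, and comparing the implied standard errors is only one way to express the precision loss from weighting.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def confidence_limit_ratio(lower, upper):
    """CLR = upper confidence limit / lower confidence limit."""
    return upper / lower


def log_se_from_clr(clr, z=Z_95):
    """SE of the log ratio implied by a Wald-type CI, since log(CLR) = 2 * z * SE."""
    return math.log(clr) / (2 * z)
```

For example, `confidence_limit_ratio(0.72, 0.93)` returns about 1.29, and `log_se_from_clr` gives implied standard errors of about 0.065, 0.093, and 0.132 for CLRs of 1.29, 1.44, and 1.68, respectively.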

Planned sensitivity analyses

In sensitivity analyses, we will vary the model specifications used in the parametric g-formula and include potential interaction terms in the time-varying outcome models.

Planned subgroup analyses

Although estimates will likely be imprecise, we plan to stratify our results by age group and stage, as these are clinically relevant subgroups of interest.

Handling of missing data

Briefly, missing data can occur in both the MOSAIC trial and the US Oncology data; however, based on preliminary published data, missingness for demographic, tumor, and clinical characteristics is expected to be low. To address missing data, we propose to first use multiple imputation by chained equations and then compare those results with complete case analyses to assess the sensitivity of our findings to the missing-at-random assumption.
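The proposal does not state how estimates will be combined across imputed datasets. A common choice, assumed here, is Rubin's rules: the pooled estimate is the mean of the m per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below is a minimal illustration with a hypothetical function name.

```python
import numpy as np


def rubins_rules(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                   # pooled point estimate
    w_bar = u.mean()                   # average within-imputation variance
    b = q.var(ddof=1)                  # between-imputation variance
    total = w_bar + (1.0 + 1.0 / m) * b
    return q_bar, total
```

For instance, three imputations with estimates 1, 2, and 3 and within-imputation variances of 0.1 give a pooled estimate of 2 and a total variance of 0.1 + (4/3) × 1 ≈ 1.43.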

**Requested Studies:**

Multicenter International Study of Oxaliplatin/ 5FU-LV in the Adjuvant Treatment of Colon Cancer

Data Contributor: Sanofi

Study ID: NCT00275210

Sponsor ID: EFC3313

The Ontada/US Oncology iKnowMed Electronic Health Record (EHR) database is an oncology-specific, integrated, web-based EHR system that captures outpatient encounter data for patients treated at >400 community oncology practices. Overall, iKnowMed captures about 10% of newly diagnosed cancer patients in the US (≈750,000) annually and includes over 1,000 physicians from community-based oncology practices from 2008-2020. In the proposed study, we will statistically transport the treatment effects observed within the MOSAIC trial to a target population of patients whose treatment data are captured within the iKnowMed database. Specifically, we will use de-identified, patient-level data on all stage II/III colon cancer patients treated within the US Oncology network from 2008-2020. This data source is ideal for this study because it contains accurate and detailed information from clinical practice settings on patient demographics, clinical variables, functional assessments (ECOG), tumor characteristics, and therapy delivery (e.g., dose, timing of administration, regimen). The iKnowMed data have been used to address an array of cancer-related research questions.

Data Contributor: I WILL BRING MY OWN

Sponsor ID: Ontada/US Oncology iKnowMed data

**Public Disclosures:**

Lund, J.L., Webster-Clark, M.A., Westreich, D., Sanoff, H.K., Robert, N., Frytak, J.R., Boyd, M., Shmuel, S., Stürmer, T. and Keil, A.P., 2024. Visualizing External Validity: Graphical Displays to Inform the Extension of Treatment Effects from Trials to Clinical Practice. Epidemiology, 35(2), pp.241-251. doi: 10.1097/EDE.0000000000001694