**Lead Investigator:** Shirley Wang, Brigham and Women’s Hospital

**Title of Proposed Research:** Understanding effectiveness of new drugs in older adults shortly after market entry

**Vivli Data Request:** 5950

**Funding Source:** National Institute on Aging

**Potential Conflicts of Interest:** In the last 3 years, Dr. Wang has received salary support from Boehringer Ingelheim, Novartis Pharmaceuticals, and Johnson & Johnson for unrelated work.

**Summary of the Proposed Research:**

When a new drug enters the market, comparative effectiveness evidence often consists solely of the randomized clinical trials (RCTs) that led to regulatory approval. Older adults are major consumers of drugs and other therapeutics, yet they are underrepresented in randomized clinical trials. During the critical early period after a new drug enters the market, evidence from RCTs may not reflect the average experience of the types of patients actually treated, and too little experience has accrued in longitudinal healthcare databases for robust evidence generation.

At this critical juncture, combining evidence can enhance understanding of net benefit of new drugs in the older, sicker populations who are actually treated in practice. However, appropriate methods for integrating pre-market RCT and early post-market comparative effectiveness evidence to guide clinical practice have not yet been identified. We will explore these issues in two case studies. For each case study, we will obtain individual level data from pre-market RCTs and create observational cohorts comprised of initiators of the new drug and comparator using Medicare data. We will use complex weighting techniques and discrete event simulation to re-distribute baseline characteristics from RCT participants to match the distribution of characteristics observed in routine care of older adults in Medicare during early as well as later post-market experience with the new drugs.

This project will produce a framework for combining pre-market RCT and early post-market evidence as a means to accelerate understanding of treatment effectiveness in older adults with multiple comorbidities. Use of this framework will provide early insights and clinical guidance to geriatricians on use of new drugs in their patients shortly after market entry. The methods used in this project are designed to provide early evidence that reflects average effectiveness in the types (and subtypes) of patients actually treated as part of routine care, as opposed to average effectiveness in participants of a trial. The impact of this project will therefore be particularly profound for new medications that target older, sicker patients typically underrepresented in trials. The output of this project will be made public as publications in peer-reviewed journals.

**Statistical Analysis Plan:**

We will conduct analyses of the selected RCTs and conduct parallel studies using large observational healthcare claims data from Medicare. The analyses in the RCTs will use variables defined and collected by the original investigators. The observational cohorts will use claims-based algorithms to define similar clinical concepts for patients treated in routine care.

The observational cohorts will include new initiators of dabigatran or warfarin following a specified washout period (e.g. 365 days) during which the patient has continuous enrollment and no recorded use of either study drug. We will require a window of continuous enrollment prior to the cohort entry date in order to capture baseline characteristics, identified via diagnoses and procedures during healthcare encounters. Observational cohorts will be restricted to patients with relevant indications; for example, the observational counterpart to the RE-LY trial will be restricted to patients with at least 2 diagnoses of atrial fibrillation during the year prior to the index date. Inclusion-exclusion criteria will be defined to create a “trial-eligible” population identified from claims data for patients treated as part of routine care. Patients will be followed from the index date for a pre-specified duration, akin to the ‘intention to treat’ analysis of randomized clinical trials, and also followed from the index date until medication discontinuation, add-on, or switch for ‘as treated’ analyses.

For each case study (RE-LY and RE-COVER II), we will define relevant benefit and safety outcomes based on claims algorithms which have previously been validated in other studies using Medicare or similar data sources. These claims-based algorithms will be chosen to parallel the primary efficacy and safety outcomes of the trials. Baseline risk factors for these outcomes that were measured in the RCTs will also be measured in the observational cohorts using claims-based algorithms. Examples include the conditions that make up the CHA2DS2-VASc risk score for stroke or the HAS-BLED risk score for major bleeding in patients with atrial fibrillation. All risk factors will be assessed prior to the index date for cohort entry to maintain clear temporality and avoid adjustment for intermediate variables.

Adjusting for confounding in observational data:

Unlike the cohort of patients participating in clinical trials, patients identified from observational data sources will not be randomly assigned to alternative therapeutic agents. Confounding in observational studies occurs when there are imbalances in risk factors for an outcome between patients exposed to treatment A compared to treatment B. In order to obtain an estimate of the average treatment effect for elderly patients in whom there is “common support”, we will:

1. Identify a cohort of patients who initiate either the new to market drug or an appropriate comparator

2. Identify benefit and safety outcomes of interest during follow up (on therapy or intention to treat).

3. Estimate a propensity score, using logistic regression to predict the probability of initiating the new drug versus the comparator given known risk factors for the outcome of interest.

4. Exclude patients with propensity scores outside the overlapping region for initiators of the new drug and the comparator.

5. Calculate stabilized inverse probability of treatment weights (S-IPTW) to create pseudo-populations in which the distribution of baseline characteristics for initiators of the new drug and the comparator match the overall distribution in the population with observed clinical equipoise in treatment selection.

6. Estimate treatment effects (e.g. rate difference) within the weighted population that are not confounded by measured patient characteristics included in the weights.

We will exclude patients who have propensity scores outside the overlapping region because for these patients, the clinical community is observed to be clear regarding which treatment is appropriate (e.g. contraindication in a subgroup). Our interest is in average treatment effectiveness and safety in real world patients for whom there is observed empirical equipoise, as represented by the regions of the propensity score distribution where there is overlap between initiators of the new drug and the comparator.
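The restriction to the overlapping region can be illustrated with a minimal sketch. It assumes one common definition of overlap (the range between the larger of the two group minima and the smaller of the two group maxima of the propensity score); the proposal does not specify the exact rule, and the function names `overlap_region` and `trim_to_overlap` are hypothetical.

```python
def overlap_region(ps, exposed):
    """Return (low, high) bounds of the propensity score range shared by both groups.

    ps      : list of propensity scores, one per patient
    exposed : list of 1 (new drug initiator) or 0 (comparator initiator)
    Assumption: overlap is defined by the min/max rule, which is only one of
    several trimming conventions.
    """
    new_drug = [p for p, e in zip(ps, exposed) if e == 1]
    comparator = [p for p, e in zip(ps, exposed) if e == 0]
    low = max(min(new_drug), min(comparator))
    high = min(max(new_drug), max(comparator))
    return low, high


def trim_to_overlap(ps, exposed):
    """Return indices of patients whose propensity score lies inside the overlap."""
    low, high = overlap_region(ps, exposed)
    return [i for i, p in enumerate(ps) if low <= p <= high]


# Toy data: four comparator initiators followed by four new drug initiators.
ps = [0.05, 0.2, 0.4, 0.6, 0.3, 0.5, 0.8, 0.95]
exposed = [0, 0, 0, 0, 1, 1, 1, 1]
kept = trim_to_overlap(ps, exposed)
```

In this toy example the patients with scores 0.05, 0.2, 0.8 and 0.95 fall outside the shared range and would be characterized separately rather than included in the weighted analysis.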

Approach 1: Stabilized inverse probability of treatment weighting

Among patients within the region of observed clinical equipoise, the S-IPTW will be calculated with the marginal probability of the treatment initiated as the numerator and the conditional probability (i.e. the propensity score) for the treatment initiated as the denominator. If we let exposure to the new drug E equal 1 for patients who initiate the new drug and 0 for patients who initiate the comparator, PE equal the marginal probability of exposure, and PS equal the propensity score, the S-IPTW can be calculated using the following formula: S-IPTW = E*PE/PS + (1-E)*(1-PE)/(1-PS).
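As a minimal sketch of the formula above, the weights can be computed directly from the exposure indicator and the propensity score. The function name `stabilized_iptw` is hypothetical, and the sketch assumes the propensity scores have already been estimated and trimmed to the overlapping region.

```python
def stabilized_iptw(exposed, ps):
    """Compute S-IPTW = E*PE/PS + (1-E)*(1-PE)/(1-PS) for each patient.

    exposed : list of 1 (new drug) or 0 (comparator), the E in the formula
    ps      : list of propensity scores, each strictly between 0 and 1
    """
    if any(p <= 0.0 or p >= 1.0 for p in ps):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    pe = sum(exposed) / len(exposed)  # marginal probability of exposure (PE)
    return [
        e * pe / p + (1 - e) * (1 - pe) / (1 - p)
        for e, p in zip(exposed, ps)
    ]


# Toy example: two initiators of the new drug and two of the comparator.
weights = stabilized_iptw([1, 1, 0, 0], [0.8, 0.4, 0.5, 0.2])
```

Here PE is 0.5, so the four weights are 0.5/0.8, 0.5/0.4, 0.5/0.5 and 0.5/0.8. A new drug initiator with a low propensity score (0.4) is up-weighted because similar patients were more often given the comparator.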

1.A. Adjusting RCT data to represent target population

Due to differences in the distribution of baseline characteristics between patients who participate in the randomized clinical trials leading to regulatory approval and patients who are treated following market entry (routine care patients), in the presence of treatment effect heterogeneity, simply combining individual level data from each source may not provide an estimate of the parameter of interest, that is, the average treatment effect for the population actually treated in routine care.

In order to utilize RCT data to estimate the average treatment effect among routine care patients, we will re-weight the individual RCT participants such that the overall distribution of baseline characteristics matches that of the routine care patients. We will do this by:

1. Identifying all baseline characteristics recorded from the RCT (including characteristics used for inclusion/exclusion criteria) which can be captured using claims based algorithms in observational data.

2. Estimating a propensity score, using logistic regression to predict the probability of being treated in routine care versus a participant in a trial given baseline characteristics.

3. Identifying and characterizing patients falling outside the region of propensity score overlap.

4. Calculating standardized morbidity ratio weights (SMR) to create a pseudo-population of RCT participants whose distribution of measured baseline characteristics matches the distribution in the population with observed clinical equipoise in treatment selection.

SMR weights can be calculated by assigning a weight of 1 to all routine care patients and assigning all RCT patients a weight equal to the propensity score odds (i.e. PS/(1-PS)). A similar strategy can be employed to re-weight routine care patients to the distribution of characteristics observed in the RCT data. By re-weighting RCT participants to have baseline characteristics distributed similarly to what is observed in the real world, we can integrate evidence from RCT and observational studies to accelerate understanding regarding effectiveness of new drugs when used in routine care populations.
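The SMR weighting rule described above can be sketched in a few lines. Here PS is the estimated probability of being a routine care patient rather than a trial participant; the function names `smr_weights` and `weighted_mean` are hypothetical, and the toy data are illustrative only.

```python
def smr_weights(routine_care, ps):
    """Assign weight 1 to routine care patients and PS/(1-PS) to RCT participants.

    routine_care : list of 1 (routine care patient) or 0 (RCT participant)
    ps           : probability of being in routine care given baseline covariates
    """
    return [1.0 if r == 1 else p / (1.0 - p) for r, p in zip(routine_care, ps)]


def weighted_mean(values, weights):
    """Weighted mean of a baseline characteristic, used to check balance."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy example: one routine care patient followed by two RCT participants.
routine_care = [1, 0, 0]
ps = [0.9, 0.5, 0.8]
weights = smr_weights(routine_care, ps)

# Age of the two RCT participants, before and after re-weighting.
rct_ages = [65, 80]
rct_weights = weights[1:]
reweighted_mean_age = weighted_mean(rct_ages, rct_weights)
```

The RCT participant who most resembles routine care patients (PS of 0.8) receives a weight of 4, so the re-weighted mean age of the trial participants shifts from 72.5 to 77.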

1.B. Compare estimates from RCT data, observational data, and combination

Using the methods described above, we will be able to obtain estimates of treatment effectiveness from:

1. RCT participants: the original RCT population,

2. Patients treated as part of routine care: patients for whom there is observed clinical equipoise regarding treatment with a new drug versus an older comparator in longitudinal healthcare databases (within 3, 6, 12, 24 months after the new drug enters the market),

3. Re-weighted RCT: the RCT population re-weighted to match the distribution of baseline characteristics for patients treated as part of routine care

4. Re-weighted routine care patients: patients treated as part of routine care re-weighted to match the distribution of baseline characteristics in RCT participants

5. Re-weighted RCT + patients treated as part of routine care: combination of 2 and 3

Numbers 1 and 4 represent average treatment effectiveness in the RCT eligible population whereas 2, 3, and 5 represent average treatment effectiveness in the target population. If we compare estimates from the original RCT population to estimates obtained from the patients in observational data who would have been eligible to participate in the RCT (1 and 4) and obtain very similar results, this would imply that simply re-weighting individual RCT participants to obtain a distribution of baseline characteristics similar to that of the population that is observed to be treated could help guide geriatricians shortly after a new drug enters the market, before much data has accrued in longitudinal databases. If the estimates are different in a clinically meaningful way, this would raise questions regarding efficacy of the new drug in ideal conditions versus effectiveness of the drug in routine clinical practice as well as questions regarding potential for residual bias or misclassification in observational data. By comparing the point estimates and precision of estimates for 2, 3, and 5 at intervals after the new drug enters the market, we can gauge the added value of combining evidence from both sources of data during the early time immediately post market entry when there is limited experience with the drug captured in longitudinal healthcare databases. Benefit and safety outcomes will be evaluated for each case study.

Approach 2: Discrete Event Simulation

2.A. Developing and validating outcome prediction models using individual level data for RCT participants

We will first develop and validate outcome prediction models that describe the relationship between outcomes of interest (e.g., stroke, myocardial infarction, major bleeding) and patient-level covariates available in the RCT data. We will randomly divide the RCT population into a development and a validation dataset of equal size. In the development set, we will use regression models to regress the outcomes of interest (e.g., stroke) on covariates, such as patient characteristics, co-morbidities, use of concomitant medications, and history of events, stratified by assigned treatment. Specification of the statistical model will be based on the type of outcome and the availability of covariates in the Medicare data. The estimated statistical models will then be used to predict the occurrence of outcomes in the validation dataset. Performance of the estimated prediction models will be assessed using statistics that provide measures of discrimination (e.g., c-statistics) and calibration (e.g., Hosmer-Lemeshow test).
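Two pieces of this step, the random equal split and the c-statistic, can be sketched with the standard library alone. The function names `split_half` and `c_statistic` are hypothetical, the seed is arbitrary, and the c-statistic shown is the simple pairwise version for a binary outcome (it ignores censoring, which a time-to-event analysis would need to handle).

```python
import random


def split_half(patient_ids, seed=0):
    """Randomly divide patients into development and validation sets of equal size."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def c_statistic(predicted, observed):
    """Proportion of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as one half."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


development, validation = split_half(range(10))

# Toy validation data: predicted risks and observed binary outcomes.
c = c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

In the toy data, three of the four event/non-event pairs are ordered correctly, giving a c-statistic of 0.75.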

2.B. Developing a discrete event simulation model that uses the outcome prediction models and patient information in the RCT to replicate the RCT results

In the next step, we will develop a discrete event simulation model that uses the prediction models developed in the previous step and information on patient characteristics from the RCT to replicate the RCT results. Building this simulation model involves three main steps: (1) generating a hypothetical cohort of patients with covariate distributions that match the RCT population; (2) designing a model structure based on possible pathways and health states that patients may experience over time; and (3) incorporating outcome prediction models in the discrete event simulation model to define probability of outcomes based on patient-level covariates. The simulation model will then estimate expected outcomes of RCT patients conditional on their characteristics, assuming that the outcome prediction models and model structure reflect, with sufficient accuracy, the true relationship among covariates and outcomes. If the model structure and assumptions are specified correctly, the model will accurately replicate the overall RCT outcomes for a cohort of patients similar to RCT participants.

2.C. Simulating expected outcomes for patients in the Medicare population using the discrete event simulation model.

We will use the discrete event simulation model developed among RCT patients in step 2.B to predict outcomes in the Medicare population. To do this, we will modify the baseline covariates of the simulated cohort to mimic the Medicare population that is likely to use these medications. The distribution of covariates will be based on analysis of the Medicare population using standard statistical approaches.

The estimated results in step 2.C yield what the results of the RCT would have been had it been conducted in a population similar to that in the observational study. Differences between the predicted Medicare results and predicted RCT results therefore quantify heterogeneity of treatment effect (HTE) and provide a reasonable expectation about the extent to which the RCT results will generalize to the Medicare population.
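The logic of steps 2.B and 2.C can be illustrated with a deliberately simplified sketch. Everything specific in it is an assumption made for illustration: the two outcomes, the single covariate (age), the exponential time-to-event model, the coefficient values in `COEF`, and all function names. In the project itself, the event rates would come from the outcome prediction models fitted in the RCT development set.

```python
import math
import random

# Hypothetical log-rate coefficients (events per patient-year); not from any trial.
COEF = {
    "stroke": {"intercept": -4.0, "age_per_decade": 0.3, "new_drug": -0.4},
    "major_bleed": {"intercept": -3.5, "age_per_decade": 0.2, "new_drug": -0.1},
}


def annual_rate(outcome, age, new_drug):
    """Event rate per patient-year from a log-linear model centered at age 70."""
    c = COEF[outcome]
    log_rate = (c["intercept"] + c["age_per_decade"] * (age - 70) / 10
                + c["new_drug"] * new_drug)
    return math.exp(log_rate)


def simulate_patient(age, new_drug, horizon_years, rng):
    """Sample a time to each competing event and return the earliest one.

    Returns (event_name or None, follow-up time in years)."""
    times = {o: rng.expovariate(annual_rate(o, age, new_drug)) for o in COEF}
    first = min(times, key=times.get)
    if times[first] > horizon_years:
        return None, horizon_years  # no event before end of follow-up
    return first, times[first]


def simulate_cohort(ages, new_drug, horizon_years=2.0, seed=0):
    """Simulate first events for a cohort; return event counts and patient-years."""
    rng = random.Random(seed)
    counts = {o: 0 for o in COEF}
    patient_years = 0.0
    for age in ages:
        event, t = simulate_patient(age, new_drug, horizon_years, rng)
        patient_years += t
        if event is not None:
            counts[event] += 1
    return counts, patient_years


# Same model, two covariate distributions: a trial-like and a Medicare-like cohort.
rct_like_ages = [68] * 500
medicare_like_ages = [80] * 500
rct_counts, rct_py = simulate_cohort(rct_like_ages, new_drug=1)
medicare_counts, medicare_py = simulate_cohort(medicare_like_ages, new_drug=1)
```

Running the same model on both cohorts, for the new drug and for the comparator, gives predicted rates whose differences across cohorts are the kind of contrast described above.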

Effect measure of interest: In this project, we are primarily focusing on absolute treatment effects, specifically rate differences per 100 patient years and associated confidence intervals. We are secondarily interested in rate ratios with confidence intervals.
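A minimal sketch of these effect measures follows. It assumes unweighted event counts and patient-years and uses standard large-sample (Wald) confidence intervals, which the proposal does not specify; in a weighted pseudo-population, a robust or bootstrap variance estimator would typically be needed instead. The function names are hypothetical.

```python
import math


def rate_difference_per_100py(events_new, py_new, events_comp, py_comp, z=1.96):
    """Rate difference per 100 patient-years with a Wald confidence interval."""
    rd = events_new / py_new - events_comp / py_comp
    se = math.sqrt(events_new / py_new ** 2 + events_comp / py_comp ** 2)
    return 100 * rd, 100 * (rd - z * se), 100 * (rd + z * se)


def rate_ratio(events_new, py_new, events_comp, py_comp, z=1.96):
    """Rate ratio with a confidence interval computed on the log scale."""
    rr = (events_new / py_new) / (events_comp / py_comp)
    se_log = math.sqrt(1 / events_new + 1 / events_comp)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Toy counts: 30 events in 2000 patient-years on the new drug,
# 50 events in 2000 patient-years on the comparator.
rd, rd_low, rd_high = rate_difference_per_100py(30, 2000, 50, 2000)
rr, rr_low, rr_high = rate_ratio(30, 2000, 50, 2000)
```

With these toy counts the rates are 1.5 and 2.5 per 100 patient-years, so the rate difference is -1.0 per 100 patient-years and the rate ratio is 0.6.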

Methods to control for bias: Restriction, stabilized inverse probability of treatment weighting, standardized morbidity ratio weighting

Assumptions:

• There are no unmeasured confounders beyond the measured confounders included in the weights used to standardize the populations (exchangeability assumption). The absence of unmeasured confounding cannot be tested directly, but its potential impact can be evaluated indirectly with sensitivity analyses that vary the strength of unmeasured confounding.

• The counterfactual outcome for each patient under the observed exposure is the observed outcome. (consistency assumption)

• There is a non-zero probability of being exposed for every covariate pattern observed in the data (positivity assumption). This can be evaluated in the data and issues with positivity can be alleviated via restriction to patients who fall within overlapping regions and use of stabilized weights. We will check to see that stabilized weights have a mean of 1.0 and reasonable range.

• Models to generate the weights are specified correctly. Robustness to different specifications of the models can be evaluated.

• Measured patient characteristics from RCT and observational studies capture similar clinical concepts.
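The check on stabilized weights mentioned under the positivity assumption could look like the following sketch. The thresholds `mean_tolerance` and `max_allowed` are hypothetical placeholders, since the proposal does not define what counts as a reasonable range, and the function name `summarize_weights` is likewise an assumption.

```python
def summarize_weights(weights, mean_tolerance=0.05, max_allowed=10.0):
    """Summarize stabilized weights and flag departures from expectations.

    mean_tolerance and max_allowed are illustrative thresholds only.
    """
    mean_w = sum(weights) / len(weights)
    return {
        "mean": mean_w,
        "min": min(weights),
        "max": max(weights),
        "mean_near_one": abs(mean_w - 1.0) <= mean_tolerance,
        "within_range": max(weights) <= max_allowed,
    }


well_behaved = summarize_weights([0.9, 1.0, 1.1, 1.0])
extreme = summarize_weights([0.5, 30.0])
```

A mean far from 1.0 or a very large maximum weight would suggest positivity problems or a misspecified weight model and would prompt a closer look at the patients involved.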

Analysis of subgroups:

In addition to confounding, measurement error, or differences in adherence, there could be true differences in treatment effect within subpopulations of patients with different risk factors. The limitations of subgroup analyses defined by single characteristics are well known. They include concerns regarding multiple testing, being underpowered to detect heterogeneity in treatment effect when it exists, and being an inadequate means of capturing complexity in the source of treatment heterogeneity. Because stratification on multiple characteristics simultaneously can result in very small numbers within each stratum, we plan to explore heterogeneity in treatment effect along the scale of a summary baseline disease risk score and of the predicted individual treatment effect (based on counterfactuals), as well as differences in effects between patients treated in routine care who were and were not trial eligible.

**Requested Studies:**

Randomized Evaluation of Long Term Anticoagulant Therapy (RE-LY) Comparing the Efficacy and Safety of Two Blinded Doses of Dabigatran Etexilate With Open Label Warfarin for the Prevention of Stroke and Systemic Embolism in Patients With Non-valvular Atrial Fibrillation: Prospective, Multi-centre, Parallel-group, Non-inferiority Trial (RE-LY Study)

Sponsor: Boehringer-Ingelheim

Study ID: NCT00262600

Sponsor ID: 1160.26

A Phase III, Randomised, Double Blind, Parallel-group Study of the Efficacy and Safety of Oral Dabigatran Etexilate (150 mg Bid) Compared to Warfarin (INR 2.0-3.0) for 6 Month Treatment of Acute Symptomatic Venous Thromboembolism, Following Initial Treatment (5-10 Days) With a Parenteral Anticoagulant Approved for This Indication (RE-COVER II)

Sponsor: Boehringer-Ingelheim

Study ID: NCT00680186

Sponsor ID: 1160.46

**Public Disclosures:**

- Shin, H., Wang, S., Kim, D.H., Alt, E., Mahesri, M., Bessette, L.G., Schneeweiss, S. and NajafZadeh, M., 2023. MSR40 Predicting Treatment Effects of a New-to-Market Drug in Clinical Practice Based on Phase III Randomized Trial Results. Value in Health, 26(6), p.S285. Doi: 10.1016/j.jval.2023.03.1576
- Shin, H., Wang, S.V., Kim, D.H., Alt, E., Mahesri, M., Bessette, L.G., Schneeweiss, S. and Najafzadeh, M. Predicting treatment effects of a new‐to‐market drug in clinical practice based on phase III randomized trial results. Clinical Pharmacology & Therapeutics. Doi: 10.1002/cpt.2983