What are the re-identification risk scores of publicly available anonymised clinical trial datasets?

Lead Investigator: Aryelly Rodriguez, The University of Edinburgh
Title of Proposed Research: What are the re-identification risk scores of publicly available anonymised clinical trial datasets?
Vivli Data Request: 7400
Funding Source: None
Potential Conflicts of Interest: None


Summary of the Proposed Research:

There is increasing pressure for anonymised datasets from clinical trials to be shared across the scientific community. Some anonymised datasets are now publicly available for secondary research. However, we do not know whether they pose a privacy risk to the patients involved. The re-identification risk score is the estimated probability of any given individual being re-identified from an anonymised/de-identified dataset. It depends on the variables available in the dataset, the number of observations in the dataset and the strategy used to attack the dataset (prosecutor or journalist scenario). El-Emam’s three derived risk metrics (equations) can be used to calculate re-identification risk scores for an entire anonymised dataset under the prosecutor and journalist scenarios, using information in the anonymised dataset. These equations only generate numbers; they do not aim to actually re-identify individuals in the datasets. We aim to collect a broad random sample of publicly available, anonymised clinical trial datasets and calculate their re-identification risk scores. Step 1: We will contact data holders and request access to their anonymised datasets following the data owners’ local procedures. Step 2: Re-identification risk scores will be calculated for each dataset, using the three equations. Step 3: We will investigate which characteristics of the datasets are associated with increased or decreased risk scores, compare the risk scores and their usability, and discuss our findings. To the best of our knowledge, this will be the first study to apply these re-identification risk scores across a range of clinical trial datasets.
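The summary does not state the equations themselves, so the following is only an illustrative sketch of how dataset-level scores of this kind might be computed under the prosecutor scenario. It assumes the commonly cited form of the three metrics: records are grouped into equivalence classes that share the same quasi-identifier values, the risk of a record is 1 divided by the size of its class, and the three scores are the proportion of records whose risk exceeds a threshold, the maximum risk, and the average risk. The function name, the column names `age_band` and `sex`, the threshold value and the sample records are all hypothetical; the journalist scenario, which needs class sizes from a wider population, is not covered.

```python
from collections import Counter


def prosecutor_risk(records, quasi_identifiers, threshold):
    """Sketch of three dataset-level risk scores (assumed prosecutor-scenario form)."""
    if not records:
        raise ValueError("records must not be empty")

    # An equivalence class is the set of records sharing the same
    # combination of quasi-identifier values.
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    class_sizes = Counter(keys)
    n = len(keys)

    # Each record in a class of size f is assumed to carry risk 1/f.
    records_above = sum(f for f in class_sizes.values() if 1 / f > threshold)

    return {
        # Share of records whose individual risk exceeds the threshold.
        "proportion_above_threshold": records_above / n,
        # Risk of the records in the smallest equivalence class.
        "maximum_risk": 1 / min(class_sizes.values()),
        # Mean of 1/f over all records, i.e. number of classes divided by n.
        "average_risk": len(class_sizes) / n,
    }


# Hypothetical toy data: five records, two quasi-identifiers.
sample = [
    {"age_band": "30-39", "sex": "F"},
    {"age_band": "30-39", "sex": "F"},
    {"age_band": "30-39", "sex": "F"},
    {"age_band": "40-49", "sex": "M"},
    {"age_band": "40-49", "sex": "F"},
]
scores = prosecutor_risk(sample, ["age_band", "sex"], threshold=0.5)
```

In this toy example the two records that are unique on their quasi-identifiers drive the maximum risk to 1, which shows how such scores depend on the variables available and on the number of observations, without any attempt to re-identify an individual.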


Requested Studies:

Immunogenicity and Safety Study of GSK Biologicals’ Quadrivalent Influenza Vaccine (GSK2282512A) When Administered in Children
Data Contributor: GlaxoSmithKline
Study ID: NCT01198756
Sponsor ID: 113314

A Dose-ranging Study of Vilanterol (VI) Inhalation Powder in Children Aged 5-11 Years With Asthma on a Background of Inhaled Corticosteroid Therapy
Data Contributor: GlaxoSmithKline
Study ID: NCT01573767
Sponsor ID: 106853

A Clinical Outcomes Study to Compare the Effect of Fluticasone Furoate/Vilanterol Inhalation Powder 100/25mcg With Placebo on Survival in Subjects With Moderate Chronic Obstructive Pulmonary Disease (COPD) and a History of or at Increased Risk for Cardiovascular Disease
Data Contributor: GlaxoSmithKline
Study ID: NCT01313676
Sponsor ID: HZC113782

A Randomised, Double-blind, Placebo-controlled, Incomplete Block, 4-period Crossover, Study to Investigate the Effects of 5-day Repeat Inhaled Doses of Fluticasone Propionate (BID, 50-2000 mcg) on Airway Responsiveness to Adenosine 5-monophosphate (AMP) Challenge When Delivered After the Last Dose in Mild Asthmatic Subjects.
Data Contributor: GlaxoSmithKline
Study ID: NCT00400855
Sponsor ID: SIG103337

A Multi-Center, Open-label, Randomized Study to Evaluate the Long Term Effectiveness of Levetiracetam as Monotherapy in Comparison With Oxcarbazepine in Subjects With Newly or Recently Diagnosed Partial Epilepsy
Data Contributor: UCB
Study ID: NCT01498822
Sponsor ID: N01367