Identification of Research Common Data Elements in HIV/AIDS using data science methods

Lead Investigator: Vojtech Huser, National Library of Medicine/NIH
Title of Research Proposal: Identification of Research Common Data Elements in HIV/AIDS using data science methods
Vivli Data Request: 3244
Funding Source: Government Funding: National Institutes of Health
Potential Conflicts of Interest: None

Summary of the Proposed Research:

In recent years, many new efforts have emerged that try to maximize the value of data collected during human clinical trials. Secondary analyses of individual trials or aggregated meta-analyses of multiple comparable trials can generate additional clinical discoveries or lead to novel hypotheses. Modern trials include patient consent to generate de-identified patient-level datasets at trial completion and to make these data available for secondary research use. HIV/AIDS interventional trials and observational studies are also following this trend toward data science infrastructure that unlocks new uses for data from completed studies. A key component that allows this data integration across studies is the development of research Common Data Elements (CDEs) that are incorporated into the study design and data collection.

This project aims to support this modern trend by analyzing research CDEs applicable to the HIV/AIDS domain. The study has three major aims.

The first aim is to describe the current state of the art in obtaining data from past HIV/AIDS studies by analyzing relevant trial results data sharing platforms. This allows for the extraction of studies and the development of research protocols from past trial results. A further goal is to extract common data elements from the results and convert study formats into a common format.

The second aim is to analyze Common Data Elements found in Electronic Health Records (EHRs) and to compare them with the CDEs found in trial data from the first aim. This aim also includes the analysis of EHR repositories to find CDEs that can be utilized in clinical studies. The broader goal is to show that EHR data elements can be compared to trial CDEs and used in clinical trials to test clinical hypotheses.

By combining the results of the first two aims, the third aim is to recommend an optimal data representation format that allows data scientists to easily integrate HIV/AIDS research datasets across studies. Using this information, a further goal of the project is to design clinical research informatics solutions that provide optimal syntactic formats and semantic terminology bindings for sharing trial data. This also allows for analysis of the overlap between research study and EHR data elements and for comparison of existing trial results platforms.

The project builds on National Library of Medicine (NLM) efforts to be the epicenter for NIH data science. It utilizes existing NLM expertise in routine healthcare terminologies and clinical research informatics. In line with the NLM mission to efficiently provide high-quality information to the public and to researchers, we use these studies to ultimately facilitate the development of an informatics tool for conducting more efficient studies.

Statistical Analysis Plan

The statistical plan includes counts of trial information sharing and an analysis of Common Data Elements across the interventional trial data (Vivli data) and observational study information. Further analysis includes secondary analyses of individual trials or aggregated meta-analyses of multiple comparable trials, which can generate additional clinical discoveries or lead to novel hypotheses. This project offers a comprehensive analysis of the data elements found in clinical trials and observational studies. It focuses on comparing present data with missing data, provides an understanding of information sharing, and develops a plan for sharing a uniform group of data elements.

Vivli data will be used to analyze the interventional clinical trial data and determine the type and number of data elements it contains. This will be done by counting the data elements and categorizing each by type. The analysis will be performed without removing Vivli data from the secure research environment, and we will not try to identify or link the data to any individual participant.
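
The counting and categorization step described above can be illustrated with a minimal Python sketch. It assumes each study's data dictionary can be reduced to a list of (variable name, declared type) pairs; the variable names, the type labels, and the function name are hypothetical and are not taken from any Vivli data package.

```python
from collections import Counter

# Hypothetical data-dictionary rows: (variable name, declared type).
# Illustrative values only; not drawn from any requested study.
data_dictionary = [
    ("USUBJID", "text"),
    ("AGE", "numeric"),
    ("SEX", "coded"),
    ("CD4", "numeric"),
    ("VISITDT", "date"),
]


def count_elements_by_type(rows):
    """Return the number of data elements per declared type."""
    return Counter(element_type for _, element_type in rows)


type_counts = count_elements_by_type(data_dictionary)
total_elements = sum(type_counts.values())

for element_type, count in sorted(type_counts.items()):
    print(f"{element_type}: {count}")
print(f"total: {total_elements}")
```

Only metadata (element names and types) is touched in this sketch, so no participant-level values would need to leave the secure research environment.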

The SAP involves analyzing each study/platform separately. Because we are conducting a meta-analysis of the data elements that are published, not of the specific results, we can examine the type and number of data elements separately in each platform. A comparative analysis across the platforms will then show which data elements are common and which are uncommon, removing the need to physically combine the data.
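
The comparative step can be sketched in the same way. The example below assumes that only the set of element names is exported from each study, and it treats an element as "common" when it appears in more than half of the studies; that threshold, the study labels, and the element names are illustrative assumptions, not part of the SAP.

```python
from collections import Counter

# Hypothetical element names per study. In practice each set would be
# derived separately, on the platform where that study's data resides.
elements_by_study = {
    "study_a": {"AGE", "SEX", "CD4", "HIVRNA"},
    "study_b": {"AGE", "SEX", "CD4"},
    "study_c": {"AGE", "SEX", "WEIGHT"},
}


def element_frequency(studies):
    """Count in how many studies each element name appears."""
    counts = Counter()
    for names in studies.values():
        counts.update(set(names))  # set() guards against duplicate names
    return counts


def split_common_uncommon(studies, min_fraction=0.5):
    """Split element names by whether they occur in more than
    min_fraction of the studies (an assumed, adjustable cutoff)."""
    counts = element_frequency(studies)
    threshold = min_fraction * len(studies)
    common = sorted(n for n, c in counts.items() if c > threshold)
    uncommon = sorted(n for n, c in counts.items() if c <= threshold)
    return common, uncommon


common, uncommon = split_common_uncommon(elements_by_study)
print("common:", common)
print("uncommon:", uncommon)
```

Because the comparison operates on per-study summaries, the underlying datasets never have to be merged.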

When selecting studies, we looked for completed or no longer active late-phase HIV-related trials, with preference given to interventional over observational studies. These criteria covered both studies directly related to HIV treatments and therapies and studies of treatments for HIV-positive patients, regardless of whether the treatment itself was for HIV. This allows for comparison of the data collected by studies on HIV patients and of how those data differ depending on the treatment type.

Requested Studies:

A Phase 3, Randomized, Open-Label Study of Lopinavir/Ritonavir (LPV/r) Tablets 800/200 Milligram (mg) Once-Daily (QD) Versus 400/100 mg Twice-Daily (BID) When Coadministered With Nucleoside/Nucleotide Reverse Transcriptase Inhibitors (NRTIs) in Antiretroviral-Experienced, Human Immunodeficiency Virus Type 1 (HIV-1) Infected Subjects

Sponsor: Abbott
Study ID: NCT00358917
Sponsor ID: M06-802

A Randomized, Open-Label Study of 800 Mg Lopinavir/200 Mg Ritonavir QD in Combination With Tenofovir and Emtricitabine Vs. 400 Mg Lopinavir/100 Mg Ritonavir BID in Combination With Tenofovir and Emtricitabine in HIV-Infected Antiretroviral Naïve Subjects

Sponsor: Abbott
Study ID: NCT00043966
Sponsor ID: M02-418

A Phase 3, Randomized, Open-label, Study of Lopinavir/Ritonavir Tablets Versus Soft Gel Capsules and Once Daily Versus Twice Daily Administration, When Coadministered With NRTIs in Antiretroviral Naive HIV-1 Infected Subjects

Sponsor: Abbott
Study ID: NCT00262522
Sponsor ID: M05-730

A Randomized, Open-label Study of Lopinavir/Ritonavir 400/100 mg Tablet Twice Daily + Co-formulated Emtricitabine/Tenofovir Disoproxil Fumarate 200/300 mg Once Daily Versus Lopinavir/Ritonavir 400/100 mg Tablet Twice Daily + Raltegravir 400 mg Twice Daily in Antiretroviral Naive, HIV-1 Infected Subjects

Sponsor: Abbott
Study ID: NCT00711009
Sponsor ID: M10-336

Safety and Immunogenicity of 13-Valent Pneumococcal Conjugate Vaccine (13vPnC) in HIV-Infected Subjects 6 Years of Age or Older Who Are Naive to Pneumococcal Vaccine

Sponsor: Pfizer Inc.
Study ID: NCT00962780

Safety & Immunogenicity of 13vPnC in HIV-Infected Subjects Aged 18 or Older Who Were Previously Immunized With 23vPS

Sponsor: Pfizer Inc.
Study ID: NCT00963235

Summary of Results

Our goal in this project was to identify common data elements (CDEs) and to assess the feasibility of such identification through the acquisition and use of shared data. This included assessing the acquisition process and the data format. Vivli has a multistep review process in which a submitted request is reviewed first by the platform and then by each data supplier of the requested studies. Each data supplier conducts its own internal review, independent of the others, and answering questions and editing the request to obtain approval can be a lengthy process. A request can be approved by every data supplier, rejected by every supplier, or approved by some and not others, leading to partial acquisition of the requested data. Some data suppliers do not list their studies; these studies must be requested specifically by NCT ID, otherwise they are not visible to requestors.

Once access is granted, each data package is provided by the individual supplier, so study datasets can be added at different times rather than all at once, and with different documentation. Most studies we accessed had valuable supporting material in the data package, including data dictionaries and case report forms, though one study did not provide a data dictionary, making its data unusable. No standard format is required for the data: of the 6 studies we gained access to, 3 used CDISC standards and the other 3 used a custom format.

The data packages are housed in a remote environment, from which results can be downloaded only if patient counts are 15 or above and only after review by Vivli staff. The review process takes 2-7 days. However, users can download valuable supporting study materials with data supplier approval. The lack of standardization in the request process and in the data provided, as well as limitations in the research environment, make the use of the provided data challenging. While we learned many lessons from the project, we were unable to meet the overall project goal of identifying CDEs due to the challenges of the platform.