Lead Investigator: Sam Royston, Replica Health
Title of Proposed Research: Merging Big Datasets for Training Blood Glucose Prediction Models
Vivli Data Request: 10450
Funding Source: None
Potential Conflicts of Interest: None
Summary of the Proposed Research:
Project Background
This research focuses on defining a standardized data format and providing code that parses several large datasets into it. The format will be tailored for training machine learning models.
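As a minimal sketch of what one row of such a standardized format might look like, the Python below defines a hypothetical unified record and a parser that maps a hypothetical raw row into it. The raw field names (`subject`, `time`, `glucose`, `unit`), the choice of mg/dL as the common unit, and the `source:subject` identifier scheme are illustrative assumptions, not the final format and not the actual layout of the requested studies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional

MMOL_TO_MG_DL = 18.0  # approximate glucose conversion factor (1 mmol/L is about 18 mg/dL)

@dataclass(frozen=True)
class GlucoseRecord:
    """One row of a hypothetical unified format: one subject at one timestamp."""
    source: str                             # originating study, e.g. "T1-DEXI"
    subject_id: str                         # "<source>:<subject>" so IDs never collide across studies
    timestamp: datetime
    glucose_mg_dl: Optional[float]          # CGM reading in mg/dL; None if missing
    insulin_units: Optional[float] = None   # optional covariates, left empty in this sketch
    carbs_g: Optional[float] = None

def to_record(source: str, raw: Dict[str, str]) -> GlucoseRecord:
    """Map a hypothetical raw row (all string values) into the unified record."""
    value = raw.get("glucose", "")
    glucose: Optional[float] = None
    if value != "":
        glucose = float(value)
        if raw.get("unit", "mg/dL") == "mmol/L":
            glucose *= MMOL_TO_MG_DL  # normalize every source to mg/dL
    return GlucoseRecord(
        source=source,
        subject_id=f"{source}:{raw['subject']}",
        timestamp=datetime.fromisoformat(raw["time"]),
        glucose_mg_dl=glucose,
    )
```

For instance, under these assumptions a row reported as 6.5 mmol/L would be stored as 117.0 mg/dL, and an empty glucose field would be kept as an explicit missing value rather than dropped.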
Necessity of the Research
There is significant unexplored potential in improving blood glucose prediction models. Despite advances in technology, current models often struggle with missing data (for example, gaps in continuous glucose monitor readings) and imbalanced data (for example, far fewer hypoglycemic readings than in-range readings), which can lead to less accurate predictions. Addressing these challenges could lead to more reliable and effective tools for managing diabetes, potentially benefiting millions of people.
Impact on Public Health
Diabetes affects over 460 million people worldwide, according to the International Diabetes Federation. Improved blood glucose prediction models could enhance daily management for these individuals, reducing the risk of complications such as heart disease, kidney failure, and vision loss. This research could lead to tools that make diabetes management more personalized and effective, improving the quality of life for those living with the condition.
Contribution to Medical Science
This research aims to contribute to medical science by developing and standardizing benchmark datasets for blood glucose prediction. Benchmark datasets are critical for ensuring that models can be fairly compared and validated, which helps identify the most effective approaches. A standardized dataset will allow more accurate comparisons between models, so that claimed advances in the field rest on solid evidence.
Research Design and Methods
The research will involve a literature review to identify existing datasets used in blood glucose prediction, as well as datasets and methods from other fields that have successfully addressed similar challenges, such as imbalanced data. We will then merge and preprocess these datasets into a comprehensive benchmark dataset, handling missing and imbalanced data carefully so that the result is robust and representative.
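The merging and preprocessing steps described above could look roughly like the following sketch, which uses only the Python standard library. It assumes readings arrive as `(timestamp, mg/dL)` pairs per subject, snaps them onto a 5-minute grid, linearly interpolates gaps of at most three missing points (15 minutes), leaves longer gaps marked as missing, and computes inverse-frequency weights over the commonly used glycemic ranges (below 70, 70 to 180, above 180 mg/dL) as one possible response to class imbalance. The grid interval, the gap threshold, and all function names are hypothetical choices for illustration, not a settled pipeline.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

STEP = timedelta(minutes=5)  # assumed CGM sampling interval
MAX_GAP_STEPS = 3            # assumption: interpolate gaps of up to 3 missing points (15 min)

Reading = Tuple[datetime, float]
GridSeries = List[Tuple[datetime, Optional[float]]]

def to_grid(readings: List[Reading]) -> GridSeries:
    """Snap readings onto a regular 5-minute grid; empty slots become None."""
    if not readings:
        return []
    readings = sorted(readings)
    start, end = readings[0][0], readings[-1][0]
    by_slot: Dict[int, float] = {}
    for t, g in readings:
        by_slot[round((t - start) / STEP)] = g  # a later reading in the same slot wins
    n_slots = round((end - start) / STEP) + 1
    return [(start + i * STEP, by_slot.get(i)) for i in range(n_slots)]

def fill_short_gaps(values: List[Optional[float]], max_gap: int = MAX_GAP_STEPS) -> List[Optional[float]]:
    """Linearly interpolate interior runs of None up to max_gap long; longer or edge runs stay None."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] is None:
            j += 1
        gap = j - i  # out[i:j] is a run of missing values
        if i > 0 and j < len(out) and gap <= max_gap:
            left, right = out[i - 1], out[j]
            for k in range(gap):
                out[i + k] = left + (k + 1) / (gap + 1) * (right - left)
        i = j
    return out

def merge_sources(sources: Dict[str, Dict[str, List[Reading]]]) -> Dict[str, GridSeries]:
    """Combine {source: {subject: readings}} into one {"source:subject": gridded series} mapping."""
    merged: Dict[str, GridSeries] = {}
    for source, subjects in sources.items():
        for subject, readings in subjects.items():
            grid = to_grid(readings)
            filled = fill_short_gaps([g for _, g in grid])
            merged[f"{source}:{subject}"] = [(t, g) for (t, _), g in zip(grid, filled)]
    return merged

def range_label(g: float) -> str:
    """Commonly used consensus glycemic ranges, in mg/dL."""
    if g < 70:
        return "hypo"
    if g <= 180:
        return "in_range"
    return "hyper"

def inverse_frequency_weights(glucose: List[Optional[float]]) -> Dict[str, float]:
    """Weight each range by total / (n_ranges * count), so rarer ranges count more in training."""
    counts: Dict[str, int] = {}
    for g in glucose:
        if g is not None:
            label = range_label(g)
            counts[label] = counts.get(label, 0) + 1
    total = sum(counts.values())
    return {label: total / (len(counts) * c) for label, c in counts.items()}
```

Leaving long gaps as missing, rather than interpolating across them, avoids inventing glucose dynamics that were never observed; downstream code can then skip or mask those windows.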
We will also develop a tutorial to guide researchers in training models on these merged datasets. This tutorial will emphasize the importance of standardization, ensuring that models are trained and evaluated under consistent conditions. This approach will allow apples-to-apples comparisons between models, making it easier to identify which models truly advance the field.
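A tutorial of this kind might fix the evaluation protocol in code so that every model is scored identically. The sketch below is an assumption rather than a settled protocol: it splits each subject's gridded series chronologically, scores predictions 30 minutes ahead (6 steps at an assumed 5-minute interval) with root-mean-square error, skips windows that contain missing values, and uses a last-value persistence forecast as a reference baseline. The split fraction, history length, horizon, and function names are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Series = List[Optional[float]]  # glucose values on a regular grid; None marks missing

HORIZON = 6       # assumption: 6 steps = 30 minutes at 5-minute sampling
HISTORY_LEN = 12  # assumption: models see the previous hour of readings
TRAIN_FRAC = 0.8  # assumption: first 80% of each subject's series is training data

def chronological_split(series: Series, train_frac: float = TRAIN_FRAC) -> Tuple[Series, Series]:
    """Split one subject's series by time, so test data always follows training data."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def persistence_forecast(history: Sequence[float]) -> float:
    """Naive baseline: predict that glucose stays at its last observed value."""
    return history[-1]

def evaluate(
    series_by_subject: Dict[str, Series],
    model: Callable[[Sequence[float]], float],
    history_len: int = HISTORY_LEN,
    horizon: int = HORIZON,
    train_frac: float = TRAIN_FRAC,
) -> float:
    """RMSE of `model` over all complete test windows, using identical splits for every model."""
    sq_err, n = 0.0, 0
    for subject in sorted(series_by_subject):
        _, test = chronological_split(series_by_subject[subject], train_frac)
        # history is test[end - history_len : end]; the target lies `horizon` steps after its last value
        for end in range(history_len, len(test) - horizon + 1):
            history = test[end - history_len:end]
            target = test[end - 1 + horizon]
            if target is None or any(v is None for v in history):
                continue  # skip windows touching missing data
            err = model(history) - target
            sq_err += err * err
            n += 1
    return math.sqrt(sq_err / n) if n else float("nan")
```

Any candidate model that maps a history window to a single forecast could be passed in place of `persistence_forecast` and compared directly against the baseline under the same splits. The training portion returned by `chronological_split` is where a learned model would be fitted; the persistence baseline needs no fitting.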
Relevance to Science and Public Health
By addressing the challenges of missing and imbalanced data and providing a standardized dataset, this research will help to accelerate progress in blood glucose prediction. This is essential for developing tools that can improve diabetes management, ultimately benefiting public health on a global scale.
Requested Studies:
Type 1 Diabetes EXercise Initiative: The Effect of Exercise on Glycemic Control in Type 1 Diabetes Study
Data Contributor: Jaeb Center for Health Research Foundation, Inc.
Study ID: T1-DEXI
Sponsor ID: T1-DEXI
Type 1 Diabetes EXercise Initiative Pediatric Study (T1DexiP): The Effect of Exercise on Glycemic Control in Youth with Type 1 Diabetes
Data Contributor: Jaeb Center for Health Research Foundation, Inc.
Study ID: T1-DEXIP
Sponsor ID: T1-DEXIP