Using Artificial Intelligence to Personalize Treatment Selection and Trajectory Monitoring of Bipolar Disorder

Lead Investigator: Gustavo Turecki, McGill University
Title of Proposal Research: Using Artificial Intelligence to Personalize Treatment Selection and Trajectory Monitoring of Bipolar Disorder
Vivli Data Request: 7811
Funding Source: Government: IMADAPT ERA PERMED 2020 grant
Commercial: Aifred Health
Additional Contracts or Consultancies: The funded researchers are paid via contracts, as the Aifred Health research team is being subcontracted by the Douglas Mental Health University Institute.
Potential Conflicts of Interest: The research team members, excluding the primary investigator, work for a medical technology startup (Aifred Health) that will use the results of this study towards the future development of a clinical decision aid that the company may later commercialize. As part of the ERA-PERMED grant, the Aifred Health research team is being subcontracted by the Douglas Mental Health University Institute. Conflicts of interest are managed by requiring that all publications, as well as the direction of the research project, pass through Dr. Turecki, who has no interest in the company. All conflicts have also been declared as part of the successful local REB application. For this study, we do not foresee major potential conflicts of interest since we are only doing secondary data analysis, which will not produce any immediate commercial product.

Summary of the Proposed Research:

Bipolar disorder is a mood disorder characterized by extreme mood swings that cycle between emotional lows (depression) and highs (mania or hypomania). In bipolar disorder type 1, the mania is often severe with psychotic features and requires hospitalization. In bipolar disorder type 2, periods of elevated mood are milder and referred to as hypomania. An estimated 2.8% of the US population lives with bipolar disorder, with 82.9% of them having serious levels of impairment (figures from the National Institute of Mental Health [NIMH]). It is a serious and persistent mental illness (SPMI) that causes tremendous suffering to patients and their loved ones, with a 10-30 times greater risk of suicide in bipolar disorder as compared to the general population. Bipolar disorder also poses a significant economic burden to healthcare systems. For example, in the USA, the total cost burden is $202.1 billion, which breaks down to an estimated $81,559 per person.

Bipolar disorder is highly clinically complex and there is a dearth of strategies to effectively deal with failure of first-line treatments. Only one third of bipolar patients will respond to initial treatment, and the majority of patients must try several treatment options or combinations, over months or years, before identifying one that works best for them. Even if the initial treatment selection is successful, there is an 80-90% chance of lifetime recurrence, with 49% of recurrences occurring within a 2-year period, increasing the risk of prolonged functional impairment and suicide. Therefore, it is crucial to track the trajectory of the patient longitudinally to identify risk of relapse or switches towards manic or depressive phases, in order to intervene as quickly as possible. Improving our ability to select the most effective treatment for individual patients would mark a significant advancement in the treatment of bipolar disorder.

Deep learning is a process that can identify complex patterns in clinical trial data to embed knowledge of treatments within an artificial neural network (ANN). Our research aims to train an ANN with clinical trial data in order to find patterns in the characteristics of patients who respond best to a specific treatment and to identify how their condition will vary over time. We have previously demonstrated that this approach can successfully generate differential treatment benefit predictions in unipolar major depression. Now, we aim to apply this method to bipolar depression, as similar datasets exist for this condition and there are similar challenges in treatment selection and morbidity as in unipolar major depression. The aim of the present analysis will be to predict remission probability via a machine learning model of individual subject data, with a view to identifying predictors of individual patient response in bipolar depression. In addition to predicting remission probability for the current depressive episode, our project will seek to identify predictors and to generate predictive models of switch from bipolar depression to manic and hypomanic phases, or of a lack of sustained remission (i.e., return to a depressive state soon after remission is achieved). The ultimate aim of this research is to develop an evidence-based approach to personalizing treatments for bipolar disorder while monitoring the trajectory of the patient to assist in the prevention of relapse or recurrence.

Statistical Analysis Plan:

Treatment efficacy will be measured by changes in normalized versions of standardized rating scales for bipolar disorder as well as rating scales for patient function when available. In this project, we have two main outcomes: prediction of bipolar depression remission and prediction of mood switches (e.g., from depression to mania).

Model Evaluation and Bias Control
We will evaluate our ANN using the following standard metrics: sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), the Receiver Operating Characteristic (ROC) curve, the Area Under the Curve (AUC), and the F1 score. Samples within a data set are sorted into categories based on passing a set threshold; for example, a treatment is classified as effective for a given patient only if it passes a high predetermined threshold (i.e., meeting the usual criteria for response or remission). As we increase the threshold, we can increase the separation between categories and thus the specificity and selectivity of the ANN. Sensitivity is the proportion of truly positive cases that the ANN correctly identifies; specificity is the proportion of truly negative cases that it correctly identifies. The ROC curve is determined from the sensitivity and specificity: it plots sensitivity against 1 - specificity as the decision threshold is varied, and so shows how responsive the ANN is to changes in this threshold as we optimize it. The area underneath the ROC curve (AUC) is the standard for comparing different iterations of a classifier; the algorithm with the largest AUC is generally considered superior to the other trained algorithms. The NPV is related to the specificity: it is the proportion of negative predictions that are correct, i.e., that correspond to patients without a certain condition. The PPV (otherwise known as precision in the machine learning community) is the proportion of positive predictions that are correct. To illustrate why these metrics matter, imagine a rare illness. An algorithm can predict the negative case (i.e., not ill) for all patients and retain a very good accuracy while never detecting the illness; these metrics reveal the true performance, so that we can verify the model is still able to predict the rare illness when necessary.
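
As an illustration of the metrics described above, the following minimal Python sketch computes them from true labels and predicted scores. It is not the project's evaluation code: the function names, the 0.5 decision threshold, and the toy data are assumptions made for this example, and the AUC is computed with the pairwise-ranking definition rather than by tracing the ROC curve.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Count true/false positives and negatives at a decision threshold (assumed 0.5)."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return NaN instead of raising when a metric is undefined (e.g. no positive predictions)."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_score, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),   # recall: positives correctly identified
        "specificity": safe_div(tn, tn + fp),   # negatives correctly identified
        "ppv": safe_div(tp, tp + fp),           # precision: positive predictions that are correct
        "npv": safe_div(tn, tn + fn),           # negative predictions that are correct
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return safe_div(wins, len(positives) * len(negatives))


# Rare-illness illustration: predicting "not ill" for everyone looks accurate
# but has zero sensitivity.
rare_labels = [0] * 99 + [1]
always_negative = [0.0] * 100
rare_metrics = classification_metrics(rare_labels, always_negative)
```

In the rare-illness example, accuracy alone would suggest a strong model, whereas sensitivity exposes that the single ill patient is never detected.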

To promote the accuracy of our ANN in identifying key patient characteristics associated with treatment response, we will use k-fold cross-validation. We will further assess the predictive accuracy of the ANN by comparing its prediction error on the training data set with its error on the validation data set. The ideal scenario is to keep the validation error as close to the training error as possible while still lowering the training error. A large divergence of the validation error from the training error indicates overfitting and a less generalizable model.
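
The splitting scheme behind k-fold validation can be sketched as follows. This is a minimal illustration under assumed names (`k_fold_indices`, `generalization_gap`) and an assumed fixed random seed; it shows only how samples are partitioned and how the training/validation gap might be tracked, not how the ANN itself is trained.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training_indices, validation_indices) for each of k folds.

    Every sample appears in exactly one validation fold. The seed is an
    assumption of this sketch, used only to make the shuffle repeatable.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def generalization_gap(train_error, validation_error):
    """Difference between validation and training error; a large positive gap suggests overfitting."""
    return validation_error - train_error


# Example: 10 patients split into 5 folds of 2 validation samples each.
splits = list(k_fold_indices(n_samples=10, k=5))
```

In a full pipeline, the model would be refit on each training split and its error on the matching validation split recorded, so that the gap can be averaged across folds.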

Deep learning techniques used to train ANNs typically employ a frequentist statistical approach to establish patterns and associations in data. Our training strategy will focus on the frequentist approach in order to identify replicable links between patient characteristics (inputs) and effective treatments/side-effects (outputs). This frequentist method can introduce a bias: ANNs will readily learn to make strong associations between inputs and outputs that are abundant in the data while downplaying those between rare inputs or outputs. For example, the data may abound with patient classifications based on gender but lack information on their socioeconomic status; the ANN will thus succeed in linking effective treatments and gender but not socioeconomic status. This may happen despite the fact that socioeconomic status may be as strong a predictor of treatment success as gender. We will identify this potential bias by use of precision and recall metrics and correct any imbalance by use of a “regularizing term”. A regularizing term is an additional restriction on the model aimed at penalizing the model more for incorrect classification of classes that are not abundant.
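
The proposal does not specify the form of the regularizing term. One common way to penalize errors on rare classes more heavily is to weight each class in the loss by its inverse frequency; the sketch below assumes that choice purely for illustration, and the function names and the binary-outcome setting are likewise assumptions.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (n_classes * class_count), so rarer classes weigh more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(labels, probabilities, weights, eps=1e-12):
    """Weighted binary cross-entropy.

    `probabilities` holds the predicted probability of class 1 for each sample.
    Errors on a class are scaled by that class's weight.
    """
    total = 0.0
    weight_sum = 0.0
    for label, prob in zip(labels, probabilities):
        prob_of_true_class = prob if label == 1 else 1.0 - prob
        total += -weights[label] * math.log(max(prob_of_true_class, eps))
        weight_sum += weights[label]
    return total / weight_sum


# Toy imbalanced outcome: 9 non-remitters (0) and 1 remitter (1).
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)

# A model that always leans towards the majority class.
majority_leaning = [0.1] * 10
weighted_loss = weighted_cross_entropy(labels, majority_leaning, weights)
unweighted_loss = weighted_cross_entropy(labels, majority_leaning, {0: 1.0, 1: 1.0})
```

With these weights, the same majority-leaning predictions incur a noticeably higher loss than under uniform weighting, which is the intended pressure towards learning the rare class.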

A potential issue requiring vigilance with deep learning of ANNs is the problem of “overfitting”. Long-term training of an ANN may cause it to memorize and thus fixate on certain patterns from the training data. Overfitting of the ANN is problematic because overfit patterns: 1) are not representative of actual “learning” but rather “memorization”; and 2) may not generalize to new data sets. To counter overfitting, we will employ a common “dropout strategy”. Here we will randomly remove variables from the training data (specifically, the input features of patient characteristics) during different training rounds. For example, a patient with characteristics ABC can be used in three training rounds (one with only A retained and BC removed, one with only B retained and AC removed, and one with only C retained and AB removed) to train the ANN to recognize these three features as being linked to an effective treatment. Using the dropout method, the ANN can learn more distinct and similarly representative patterns between patient characteristics and effective treatments rather than fixating on a select few. We can further avoid the problem of overfitting by employing a “snapshot learning” procedure. This common procedure in the field of deep learning enables us to iteratively push several ANNs towards learning together more general patterns in the data instead of memorizing data points.
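
The input-feature dropout described above can be sketched as a random mask applied to a patient's feature vector on each training round. This is a simplified illustration: the function name, the use of 0.0 as the "removed" value, and the drop probability are assumptions, and deep learning libraries typically also rescale the retained values, which is omitted here.

```python
import random


def drop_input_features(features, drop_probability, rng):
    """Return a copy of `features` with each entry independently zeroed.

    Each feature is removed (set to 0.0) with probability `drop_probability`.
    """
    return [
        0.0 if rng.random() < drop_probability else value
        for value in features
    ]


# A hypothetical patient with three characteristics A, B, C.
patient = [1.0, 2.0, 3.0]
rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Different training rounds see different subsets of the same patient's features.
training_rounds = [drop_input_features(patient, 0.5, rng) for _ in range(3)]
```

Because the mask changes on every round, the network cannot rely on any single characteristic always being present.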

Statistical power calculations and levels of significance

We will follow a “10X standard” to determine the amount of data our ANN requires to understand and identify patterns in the clinical trial data. This rule-of-thumb in deep learning recommends using roughly 10 times more data points than trainable parameters (i.e. the weights and biases of connections between nodes at each layer of the ANN) embedded within the ANN. As such, following the 10X standard should provide us with more than enough data for accurate prediction.
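
To make the rule of thumb concrete, the sketch below counts the weights and biases of a fully connected network and multiplies by 10. The layer sizes are hypothetical and chosen only for the arithmetic; the proposal does not state the architecture.

```python
def count_mlp_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    Between consecutive layers of sizes n_in and n_out there are
    n_in * n_out weights plus n_out biases.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def required_samples(layer_sizes, factor=10):
    """Apply the 10X rule of thumb: `factor` data points per trainable parameter."""
    return factor * count_mlp_parameters(layer_sizes)


# Hypothetical network: 20 input features, one hidden layer of 16 nodes, 1 output.
example_layers = [20, 16, 1]
n_parameters = count_mlp_parameters(example_layers)   # (20*16 + 16) + (16*1 + 1)
n_required = required_samples(example_layers)
```

For this hypothetical network the count is 353 trainable parameters, so the rule of thumb would suggest roughly 3,530 data points.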

Estimating a sample size that will ensure satisfactory performance of a deep learning model is difficult, since the distribution of data is critical to how well an optimization algorithm can separate features between the individual classes. However, extrapolating the methodology of Liu et al. (2014), assuming gamma to be 0.10, we can estimate how many samples we would need for each individual class with 50(20-1) = 950 samples per population [22]. Thus, if we wanted to include 20 different treatments, we would need 950 * 20 = 19,000 total samples. This gives us a good estimate of how many samples our classifier will need to obtain a power value of around 80%.

Analysis of subgroups

Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP) can identify general patterns (i.e., “natural clusters”) in the training data. Once the ANN can associate categories to individual patients, PCA, t-SNE, and UMAP will reduce the large number of features for each patient to a few dimensions to simplify identification of patterns. These techniques simplify the visualization and identification by the investigators of the patterns learned by the ANN during training.
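
As a small illustration of the dimensionality reduction idea, the sketch below finds the first principal component of a data set by power iteration on its covariance matrix and projects each patient onto it. It is a teaching sketch only: the function names and toy data are assumptions, the fixed starting vector would fail if it happened to be orthogonal to the leading component, and a real analysis would more likely rely on an established implementation such as scikit-learn's PCA.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(rows, iterations=200):
    """Return (direction, scores): the leading component and each row's projection on it."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    vector = [1.0 / math.sqrt(d)] * d  # fixed starting direction (assumption of this sketch)
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break  # no variance in the data; keep the current direction
        vector = [x / norm for x in product]
    scores = [sum(row[j] * vector[j] for j in range(d)) for row in centered]
    return vector, scores


# Toy data: two features where the second is exactly twice the first.
toy_rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_principal_component(toy_rows)
```

Here both features collapse onto a single axis, so one score per patient captures all of the variation in the toy data.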

Handling of missing data

To address missing data we will employ the MICE (multiple imputation by chained equations) package in Python (Van Buuren, Stef, and Karin Oudshoorn. Flexible multivariate imputation by MICE. Leiden: TNO, 1999.). Multiple imputation allows the use of other variables to predict missing values for each patient, iterating and updating multiple times to achieve convergence. We validate the imputations performed by examining the distributions for each variable’s values before and after imputation, only accepting imputations where the distributions on a variable level remain essentially unchanged. In previous work, we only performed imputation on variables where we had at least 50% of the data; this figure will likely remain very similar, but is considered in context with the data available and the performance of MICE on each individual dataset.
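
The two checks described above, the minimum share of observed data and the before/after comparison of distributions, can be sketched as follows. This does not perform the imputation itself. The function names, the use of mean and standard deviation as the summary of a distribution, and the tolerance of 0.25 standard deviations are all assumptions made for illustration; the proposal does not define "essentially unchanged" numerically.

```python
import statistics


def observed_fraction(values):
    """Share of entries that are present (missing values are represented as None)."""
    return sum(v is not None for v in values) / len(values)


def eligible_for_imputation(values, min_observed=0.5):
    """Impute only variables with at least `min_observed` of their data present."""
    return observed_fraction(values) >= min_observed


def distribution_shift(before, after):
    """Absolute change in mean and standard deviation caused by imputation."""
    observed = [v for v in before if v is not None]
    mean_shift = abs(statistics.mean(after) - statistics.mean(observed))
    sd_shift = abs(statistics.stdev(after) - statistics.stdev(observed))
    return mean_shift, sd_shift


def imputation_acceptable(before, after, tolerance=0.25):
    """Accept if mean and SD move by at most `tolerance` observed SDs (hypothetical criterion)."""
    observed = [v for v in before if v is not None]
    scale = statistics.stdev(observed)
    mean_shift, sd_shift = distribution_shift(before, after)
    return mean_shift <= tolerance * scale and sd_shift <= tolerance * scale


# Toy variable with one missing entry, and two candidate imputed versions.
raw = [1.0, 2.0, None, 4.0, 5.0]
plausible = [1.0, 2.0, 3.0, 4.0, 5.0]
implausible = [1.0, 2.0, 100.0, 4.0, 5.0]
```

In this toy case the plausible imputation passes the check while the implausible one is rejected, mirroring the rule of accepting only imputations that leave the variable's distribution essentially unchanged.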

Deep learning techniques can also train ANNs on incomplete data sets using the “dropout method” described previously. This means that while the model is being trained, certain variables are removed from the input set to give the model the ability to form several representations of the same class using different indicating factors. To illustrate, let us imagine that responders to drug X are generally female, over 50, do not have diabetes, and have low cholesterol; the model would learn to associate response to this drug to different combinations of 3 of those 4 predictors. As such, even if some patient records are missing one of the four factors, patients from those records could still be correctly identified as belonging to the class of patients who respond to drug X.
Overall, the dropout method enables us to group together large and incomplete data sets with heterogeneous patient features and treatments.

Requested Studies:

Safety and Efficacy of Olanzapine (LY170053) in the Long-term Treatment for Patients With Bipolar I Disorder, Depressed
Data Contributor: Lilly
Study ID: NCT00618748
Sponsor ID: 11682

Efficacy and Safety of Olanzapine in the Treatment of Patients With Bipolar I Disorder, Depressed: A Randomized, Double-Blind Comparison With Placebo
Data Contributor: Lilly
Study ID: NCT00510146
Sponsor ID: 11218

Olanzapine/Fluoxetine Combination Versus Lamotrigine in the Treatment of Bipolar I Depression
Data Contributor: Lilly
Study ID: NCT00485771
Sponsor ID: 7980

Bipolar Depression Assessment Study on Tx Response
Data Contributor: Lilly
Study ID: NCT00191399
Sponsor ID: 9370

Olanzapine Versus Divalproex and Placebo in the Treatment of Mild to Moderate Mania Associated With Bipolar I Disorder
Data Contributor: Lilly
Study ID: NCT00094549
Sponsor ID: 7029

A Randomized, Double-Blind Study of Depakote Monotherapy, Olanzapine Monotherapy, and Combination Therapy of Depakote Plus Olanzapine in Stable Subjects During the Maintenance Phase of Bipolar Illness
Data Contributor: AbbVie
Study ID: NCT00071253
Sponsor ID: M02-551

An Inpatient Study of the Effectiveness and Safety of Depakote ER in the Treatment of Mania/Bipolar Disorder
Data Contributor: AbbVie
Study ID: NCT00060905
Sponsor ID: M02-540

Olanzapine Versus Divalproex in the Treatment of Acute Mania
Data Contributor: Lilly
Study ID: F1D-US-HGHQ
Sponsor ID: F1D-US-HGHQ

Prevention Of Bipolar Relapse With Olanzapine And Other Mood-Stabilizers. A Prospective Observational Study (PROTECT)
Data Contributor: Lilly
Study ID: F1D-SB-B018
Sponsor ID: F1D-SB-B018

A Controlled Trial of the Efficacy of Rapid Initial Dose Escalation of Olanzapine to Treat Acute Behavioral Agitation in Schizophrenia and Bipolar I Disorder
Data Contributor: Lilly
Study ID: F1D-US-HGIY
Sponsor ID: F1D-US-HGIY

Olanzapine Versus Lithium in Relapse Prevention in Bipolar Disorder
Data Contributor: Lilly
Study ID: F1D-MC-HGHT
Sponsor ID: F1D-MC-HGHT

A Double-Blind Randomized Comparison of the Efficacy and Safety of Short-Acting Intramuscular Olanzapine, Short-Acting Intramuscular Lorazepam and Intramuscular Placebo in Acutely Agitated Patients Diagnosed with Mania Associated with Bipolar Disorder
Data Contributor: Lilly
Study ID: F1D-MC-HGHW
Sponsor ID: F1D-MC-HGHW

Olanzapine Versus Placebo in the Prevention of Relapse in Bipolar Disorder
Data Contributor: Lilly
Study ID: F1D-MC-HGHL
Sponsor ID: F1D-MC-HGHL

Placebo-Controlled Olanzapine Monotherapy in the Treatment of Bipolar I Depression
Data Contributor: Lilly
Study ID: F1D-MC-HGGY
Sponsor ID: F1D-MC-HGGY

Olanzapine Versus Placebo in the Treatment of Bipolar Disorder, Manic or Mixed
Data Contributor: Lilly
Study ID: F1D-MC-HGGW
Sponsor ID: F1D-MC-HGGW

Placebo- and Haloperidol-Controlled Double-Blind Trial of Olanzapine in Patients with Manic or Mixed Episode of Bipolar I Disorder
Data Contributor: Lilly
Study ID: F1D-JE-BMAC
Sponsor ID: F1D-JE-BMAC

A European observational study of health outcomes associated with treatment for mania in Bipolar Disorder.
Data Contributor: Lilly
Study ID: F1D-EW-HGKV
Sponsor ID: F1D-EW-HGKV

A Randomized, Double-Blind, Placebo-Controlled, Phase 3 Study to Evaluate the Efficacy and Safety of Once a Day, TAK-375 (Ramelteon) Tablet for Sublingual Administration (TAK-375SL Tablet) in the Treatment of Acute Depressive Episodes Associated With Bipolar I Disorder in Adult Patients Who Are on Lithium and/or Valproate
Data Contributor: Takeda
Study ID: NCT01467700
Sponsor ID: TAK-375SL_201

A Randomized, Double-Blind, Placebo-Controlled, Phase 3 Study to Evaluate the Efficacy and Safety of Once a Day, TAK-375SL as an Adjunctive Therapy to Treatment-as-Usual in the Maintenance Treatment of Bipolar I Disorder in Adult Patients
Data Contributor: Takeda
Study ID: NCT01467713
Sponsor ID: TAK-375SL_203

A Randomized, Double-Blind, Placebo-Controlled, Phase 3 Study to Evaluate the Efficacy and Safety of Once a Day, TAK-375 (Ramelteon) Tablet for Sublingual Administration (TAK-375SL Tablet) as an Adjunctive Therapy in the Treatment of Acute Depressive Episodes Associated With Bipolar 1 Disorder in Adult Subjects
Data Contributor: Takeda
Study ID: NCT01677182
Sponsor ID: TAK-375SL_301

Olanzapine Added to Mood Stabilizers in the Treatment of Bipolar Disorder
Data Contributor: Lilly
Study ID: F1D-MC-HGFU
Sponsor ID: F1D-MC-HGFU

Olanzapine Versus Placebo in the Treatment of Mania Associated with Bipolar I Disorder
Data Contributor: Lilly
Study ID: F1D-MC-HGEH
Sponsor ID: F1D-MC-HGEH

A Multicenter, Randomized, Double-Blind, Placebo-Controlled Study of Aripiprazole in the Treatment of Patients With Bipolar I Disorder With a Major Depressive Episode. CN138-146 LT is the 26-week Open Label Extension Phase of the Above Titled Protocol, CN138-146 ST.
Data Contributor: Otsuka
Study ID: NCT00094432
Sponsor ID: CN138-146

Public Disclosures:

Mehltretter, J., Fratila, R., Perlman, K., Tunteng, J.-F., Popescu, C., Turecki, G., & Benrimoh, D. (2024). Using Artificial Intelligence to Personalize Initial Treatment Selection in Bipolar Depression. Zenodo. DOI: 10.5281/zenodo.12747378