Machine learning for personalized drug response prediction for bladder and prostate cancer patients

Lead Investigator: Marianna Kruithof-de Julio, University of Bern
Title of Proposal Research: Machine learning for personalized drug response prediction for bladder and prostate cancer patients
Vivli Data Request: 7312
Funding Source: Swiss National Foundation and contract at University of Bern
Potential Conflicts of Interest: None

Summary of the Proposed Research:

Prostate cancer (PCa) is the most common cancer and the second leading cause of cancer-associated death in men. The survival of PCa patients is mostly determined by the extent of the tumor. If the cancer is confined to the prostate gland, median survival can be expected to exceed 5 years. If PCa has spread to distant organs, current therapies are not curative, and median survival drops to 1 to 3 years.

Currently, PCa can be successfully treated surgically while it is still in its first, androgen-dependent phase. Follow-up with androgen deprivation therapy contains the cancer and reduces the possibility of metastasis. However, once the cancer becomes androgen independent, or “castration resistant,” androgen deprivation therapy is no longer effective.

Urothelial carcinoma of the bladder (BlCa) is the fifth most common cancer in the Western world. BlCa can be classified as non-muscle invasive (NMIBC) or muscle invasive (MIBC). Patients with low grade NMIBC have a good prognosis but show frequent recurrence. Those with high grade NMIBC show an even higher rate of recurrence and progression. Patients with NMIBC are followed up by cystoscopy and cytology every 3-6 months for at least 5 years.

Unfortunately, cystoscopy is invasive, and cytology has a low sensitivity, ranging from 20% to 53%. As a result of the need for this procedure-based, long-term follow-up, bladder cancer management costs more per patient lifetime than any other cancer. MIBC is a highly aggressive disease with a 60% 5-year overall survival, presumably due to early metastatic dissemination.

As of today, there are no preemptive diagnostic tests to identify an increased risk of developing PCa or BlCa, and patients are still treated with standard of care drugs. Standard of care therapy does not work for every patient. The development of machine learning algorithms for automatic prediction of drug response for new patients and new drugs will allow us to select the most promising drugs for each patient for further experimental evaluation. We will be able to determine therapy response based on multiple parameters defined by the study of the individual tumor, including genomic and transcriptomic sequencing. This will have a huge impact on the patient population by reducing the number of patients who undergo surgery.

Statistical Analysis Plan:

We will use the obtained dataset for pre-training a machine learning regression model (a Neural Network) to predict cell viability based on genomic and transcriptomic profiles of the patient as well as on the molecular properties of the drug. We will also experiment with using anonymized patient metadata as additional features for prediction. We will experiment with training a model on the organoid dataset alone as well as in combination with the cell line dataset obtained from the Genomics of Drug Sensitivity in Cancer Project.
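The input described above can be pictured as one row per (patient, drug) measurement, formed by joining the patient's profile features with the drug's molecular features. The following is a minimal sketch under stated assumptions, not the investigators' implementation: the function name `build_pair_features`, the identifiers, and all feature values are hypothetical.

```python
def build_pair_features(patient_features, drug_features, measurements):
    """Build regression inputs and targets from (patient, drug, viability) triples.

    Each input row is the patient's feature vector followed by the drug's
    feature vector; the target is the measured cell viability.
    """
    rows, targets = [], []
    for patient_id, drug_id, viability in measurements:
        rows.append(patient_features[patient_id] + drug_features[drug_id])
        targets.append(viability)
    return rows, targets


# Hypothetical toy data: 3 expression-like features per patient,
# 2 molecular descriptors per drug.
patient_features = {"P1": [0.2, 1.5, 0.0], "P2": [0.9, 0.1, 1.0]}
drug_features = {"drugA": [310.4, 2.1], "drugB": [455.0, 3.7]}
measurements = [("P1", "drugA", 0.82), ("P1", "drugB", 0.35), ("P2", "drugA", 0.60)]

X, y = build_pair_features(patient_features, drug_features, measurements)
```

A regression model, whether a neural network or a simpler baseline, would then be trained to map each row of `X` to the corresponding value in `y`.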

The total dataset will be separated into 80% training data, 10% validation data for tuning the parameters of the model, and 10% testing data for testing the model. We will stratify the training, validation, and testing subsets according to the metadata to make sure that each subset contains a similar distribution of cancer types and other metadata. We will use cross-validation to train and evaluate the model on at least 5 different training/validation/test splits. For evaluation of the regression model, we will use mean squared error and Pearson’s correlation coefficient (r).
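The split and the two evaluation measures described above can be sketched as follows. This is an illustrative example that assumes stratification on a single label (cancer type); the function names and the toy labels are hypothetical, and the actual analysis may stratify on several metadata fields.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists with similar label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical cancer-type labels, one per sample.
labels = ["colorectal"] * 50 + ["breast"] * 30 + ["lung"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
```

Repeating the call with different `seed` values gives different stratified splits, which is one simple way to obtain the repeated training/validation/test splits mentioned above.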

We are aware that the requested dataset does not contain prostate or bladder cancer organoids. Therefore, we will use the dataset exclusively for pre-training the model. We will then fine-tune the pre-trained model using our prostate organoid drug screen data. As described above, we will split our prostate patient-derived organoid (PDO) dataset into at least 10 training/testing splits in a cross-validation setting, and then fine-tune and test the model on each split using mean squared error and Pearson’s correlation coefficient (r) as evaluation measures.
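The cross-validation loop described above can be sketched as below. The fine-tuning step itself is not shown: `fit_and_predict` is a hypothetical callback standing in for "fine-tune on the training indices, predict on the test indices", and the example uses a trivial mean predictor with made-up viability values only so that the sketch runs end to end.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(y, fit_and_predict, k=10, seed=0):
    """Return the mean squared error of `fit_and_predict` on each fold."""
    scores = []
    for train, test in k_fold_splits(len(y), k, seed):
        preds = fit_and_predict(train, test)
        mse = sum((y[i] - p) ** 2 for i, p in zip(test, preds)) / len(test)
        scores.append(mse)
    return scores


# Hypothetical viability values for 40 measurements.
viability = [0.1 * (i % 10) for i in range(40)]


def mean_predictor(train, test):
    """Stand-in model: predict the mean viability of the training fold."""
    mean = sum(viability[i] for i in train) / len(train)
    return [mean] * len(test)


fold_mse = cross_validate(viability, mean_predictor, k=10)
```

If several measurements come from the same patient, it may be preferable to assign whole patients to folds rather than individual measurements, so that no patient appears in both training and testing data; the sketch above does not do this.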

We are aware of the risk that prostate and bladder cancer data may differ significantly in their profiles and drug response from the other cancer types in the requested study. Before training the model, we will explore this difference by, for example, plotting the requested study data together with our prostate and bladder data in a 2D space using dimensionality reduction techniques such as Uniform Manifold Approximation and Projection (UMAP). We will experiment with different normalization and batch correction techniques (e.g. ComBat-seq) to obtain a feature space in which the requested study data and our organoid data are compatible.
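To illustrate what bringing two datasets into a compatible feature space can mean, the sketch below standardizes one feature separately within each dataset of origin. This is a deliberately simple stand-in and is not ComBat-seq; the function name `standardize_per_batch` and the expression values are hypothetical.

```python
import statistics


def standardize_per_batch(values, batches):
    """Z-score one feature separately within each batch (dataset of origin)."""
    grouped = {}
    for value, batch in zip(values, batches):
        grouped.setdefault(batch, []).append(value)

    # Per-batch mean and population standard deviation.
    stats = {
        batch: (statistics.fmean(vs), statistics.pstdev(vs))
        for batch, vs in grouped.items()
    }

    corrected = []
    for value, batch in zip(values, batches):
        mean, std = stats[batch]
        # A constant feature within a batch carries no signal; map it to 0.
        corrected.append((value - mean) / std if std > 0 else 0.0)
    return corrected


# Hypothetical expression values of one gene, shifted between two datasets.
expression = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch = ["requested", "requested", "requested", "ours", "ours", "ours"]
corrected = standardize_per_batch(expression, batch)
```

Per-batch standardization removes any difference in mean and scale between datasets, including real biological differences when the datasets cover different cancer types, which is one reason dedicated batch correction methods are usually evaluated as well.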

After a model has been successfully trained and tested, we will perform error analysis, explore evaluation measures for each metadata type separately, and try to isolate the patient data for which automatic viability prediction was or was not successful. We will also explore the genomic, transcriptomic, and clinical features that have the highest importance scores in the model to understand whether the model captures any biologically meaningful dimensions for prediction.

As baseline models, we will use simple regression models trained with our prostate and bladder organoid data alone, such as K-nearest neighbors, Support Vector Regression, Decision Tree, and Random Forest. We will compare the baseline models with the Neural Network model pre-trained with the requested study data. This comparison will allow us to conclude whether pre-training a model on the requested study data gives us an advantage over using our dataset alone.
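As an illustration of the simplest of the baselines named above, the sketch below implements K-nearest neighbors regression with Euclidean distance. It is a minimal example under stated assumptions: the function name `knn_predict`, the two-dimensional feature vectors, and the viability values are hypothetical, and a real analysis would more likely use an established library implementation.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its target.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical feature vectors forming two clusters, with viability targets.
train_x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
train_y = [0.9, 0.8, 1.0, 0.2, 0.1, 0.3]

pred_low = knn_predict(train_x, train_y, (0.05, 0.05), k=3)
pred_high = knn_predict(train_x, train_y, (1.05, 1.05), k=3)
```

Scoring such a baseline with the same evaluation measures and the same cross-validation splits as the Neural Network keeps the comparison between the two like for like.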

Requested Studies:

Genomic, Transcriptomic, and Drug Screening Data from a Pan-cancer Organoid Cohort with Source Tumor Samples
Data Contributor: Tempus Labs, Inc.
Study ID: T21.01
Sponsor ID: T21.01

Summary of Results:

The data from Tempus study ‘T21.01 – Genomic, Transcriptomic, and Drug Screening Data from a Pan-cancer Organoid Cohort with Source Tumor Samples’ did not contribute to our machine learning model for predicting precision therapy for prostate and pancreatic cancer patients. We did not use the Tempus Labs dataset in any publications or presentations.
We could not use the Tempus Labs dataset because, owing to its relatively small size, adding it to the other available datasets had no measurable impact on the performance of the prediction model we developed.