Abstract

BACKGROUND: Clinical trials are controlled patient studies that aim to objectively assess the effectiveness of treatment interventions. However, the average effectiveness observed at the group level does not directly apply to individual patients. Cancer clinical trials that include molecular profiling of baseline tumor samples enable the discovery of molecular features associated with treatment response and the development of prediction models that help discriminate responders from non-responders in each treatment arm. Despite these opportunities, predictive modeling of such data faces many challenges, notably small sample sizes and large feature spaces. Advances in machine learning and artificial intelligence (AI) may provide new solutions to these challenges. However, for machine learning and AI to have an impact, data need to be carefully curated, high-quality, standardized, and easily accessible to and understood by data scientists, who may not have the domain knowledge.

METHODS: We created a Python package, “ClinicalOmicsDB”, to address the challenges of data accessibility and to promote the development and application of machine learning methods to omics data from clinical trial samples with treatment response information. The package makes data readily analyzable by data scientists so that they can develop, apply, and optimize algorithms for predicting treatment responses and discovering novel biomarkers. To promote a two-way dialogue, we have also developed several Jupyter Notebook tutorials for biologists and clinicians who wish to gain expertise in machine learning. Omics data from clinical studies were downloaded from Gene Expression Omnibus (GEO), and responses were determined based on clinical trial primary endpoints. Currently, the package has datasets from 22 breast cancer clinical trials, including a total of 5050 patients (Table 1).
It will be continuously expanded to include additional trials for breast cancer and other cancer types.

RESULTS: To evaluate the package's utility, we built machine learning models to predict response to neoadjuvant chemotherapy consisting of four cycles of 5-fluorouracil/epirubicin/cyclophosphamide (FEC) followed by four cycles of docetaxel/capecitabine in US Oncology clinical trial 02-103 [GSE42822]. The best-performing model was a Random Forest Classifier, which had an AUC of 0.817. To assess the generalizability of machine learning models built with the package, we trained a Random Forest Classifier on the GSE25055 breast cancer dataset and applied it to a different breast cancer dataset, GSE20194, which yielded an AUC of 0.648. These results suggest that applying machine learning to clinical omics datasets can provide predictive and generalizable models that could be implemented in clinical settings for future breast cancer patients.

CONCLUSION: We are expanding the database for data scientists, biologists, and clinicians working on novel biotechnology-derived therapies, to facilitate the implementation of precision medicine approaches for future patients. As more people contribute new data to the package, we will work towards improving the clinical trial data sharing policies and practices of pharmaceutical and private companies.

Table 1. Available breast cancer datasets in ClinicalOmicsDB. The current database has 22 breast cancer clinical trials with 5050 total patients. Therapy indicates the cytotoxic and/or targeted treatments used in each clinical trial. The database will be continuously expanded to include additional trials for breast cancer and other cancer types.

Citation Format: Chang In Moon, Byron Jia, Bing Zhang. ClinicalomicsDB - Bridging the gap between clinical omics data and machine learning [abstract]. In: Proceedings of the 2022 San Antonio Breast Cancer Symposium; 2022 Dec 6-10; San Antonio, TX.
Philadelphia (PA): AACR; Cancer Res 2023;83(5 Suppl):Abstract nr P2-12-02.
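
For readers less familiar with the AUC values reported in the abstract, the metric can be read as the probability that a model scores a randomly chosen responder higher than a randomly chosen non-responder. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not part of the ClinicalOmicsDB package and are not drawn from any of its datasets.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (responder, non-responder) pairs
    in which the responder receives the higher score.

    Ties count as half a correctly ranked pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one responder and one non-responder")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # responder ranked above non-responder
            elif p == n:
                wins += 0.5   # tie: count as half
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (hypothetical, not real trial data).
toy_labels = [1, 1, 1, 0, 0, 0, 0]                 # 1 = responder, 0 = non-responder
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]   # hypothetical predicted probabilities

print(roc_auc(toy_labels, toy_scores))
```

In this toy case, 11 of the 12 responder/non-responder pairs are ranked correctly, so the AUC is 11/12 (about 0.917). An AUC of 0.5 corresponds to random ranking, which is the baseline against which the reported 0.817 and 0.648 can be compared.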

