Abstract

Background: Machine learning (ML) methods still have limited applicability in personalized oncology because few clinically annotated molecular profiles are available. This prevents sufficient training of ML classifiers that could be used to improve molecular diagnostics.

Methods: We reviewed published datasets of high-throughput gene expression profiles from cancer patients with known responses to chemotherapy treatments. We searched the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) repositories.

Results: We identified data collections suitable for building ML models that predict responses to certain chemotherapeutic schemes. We identified 26 datasets, ranging from 41 to 508 cases per dataset. All identified datasets were checked for ML applicability and robustness using leave-one-out cross-validation. Twenty-three datasets with balanced numbers of treatment responder and non-responder cases were found suitable for ML.

Conclusions: We collected a database of gene expression profiles associated with clinical responses to chemotherapy for 2786 individual cancer cases. Seven of the datasets included RNA sequencing data (645 cases), and the others included microarray expression profiles. The cases represented breast cancer, lung cancer, low-grade glioma, endothelial carcinoma, multiple myeloma, adult leukemia, pediatric leukemia and kidney tumors. Chemotherapeutics included taxanes, bortezomib, vincristine, trastuzumab, letrozole, tipifarnib, temozolomide, busulfan and cyclophosphamide.
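
The leave-one-out check mentioned in the Results can be illustrated with a minimal sketch. The abstract does not say which classifier was used, so the nearest-centroid rule below is only a stand-in. The function names and the toy two-gene profiles are likewise assumptions made for illustration, not data from the study.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, sample):
    """Return the label whose class centroid is closest to `sample`.

    Stand-in classifier (assumption): the source does not name its ML method.
    """
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        # Per-gene mean over all training profiles of this class.
        centroids[label] = [mean(col) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))


def leave_one_out_accuracy(profiles, labels):
    """Hold out each case once, train on the rest, and score the held-out case."""
    correct = 0
    for i in range(len(profiles)):
        train_x = profiles[:i] + profiles[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, profiles[i]) == labels[i]:
            correct += 1
    return correct / len(profiles)


# Toy expression values for two hypothetical genes (made up for illustration).
toy_profiles = [
    [1.0, 5.0], [1.2, 4.8], [0.9, 5.1],   # responders
    [4.0, 1.0], [4.2, 0.8], [3.9, 1.1],   # non-responders
]
toy_labels = ["responder"] * 3 + ["non-responder"] * 3

loo_accuracy = leave_one_out_accuracy(toy_profiles, toy_labels)
print(f"Leave-one-out accuracy: {loo_accuracy:.2f}")
```

Because each case is held out exactly once, the procedure uses every profile for both training and testing, which matters when a dataset contains only a few dozen cases.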

Highlights

  • Machine learning (ML) methods still have limited applicability in personalized oncology due to low numbers of available clinically annotated molecular profiles

  • We curated the Gene Expression Omnibus (GEO) [34], Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) [35] and The Cancer Genome Atlas (TCGA) [36] repositories to extract cancer gene expression profiles associated with the clinical outcomes of chemotherapeutic treatments

  • Datasets were selected using the following criteria:
      – at least 40 gene expression profiles are present
      – data were obtained for the same cancer type and using the same experimental platform
      – every profile is linked with the clinical history of the case
      – all cancers were treated with at least one common drug or chemotherapy regimen
      – treatment outcomes are available, so that every case can be classified as either responder or non-responder
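
The selection criteria listed above can be read as a simple filter over candidate datasets. The sketch below is one hedged way to express that filter. The `CandidateDataset` fields, the `meets_inclusion_criteria` helper and the example accessions are hypothetical names introduced here for illustration; only the threshold of 40 profiles comes from the text.

```python
from dataclasses import dataclass

MIN_PROFILES = 40  # "at least 40 gene expression profiles present"


@dataclass
class CandidateDataset:
    """Hypothetical summary record for one candidate dataset (illustrative only)."""
    accession: str                    # placeholder identifier, not a real accession
    n_profiles: int                   # number of gene expression profiles
    single_cancer_type: bool          # all profiles come from the same cancer type
    single_platform: bool             # all profiles use the same experimental platform
    clinical_history_linked: bool     # every profile is linked with a clinical history
    common_treatment: bool            # at least one common drug or regimen
    response_labels_available: bool   # each case can be labeled responder/non-responder


def meets_inclusion_criteria(ds: CandidateDataset) -> bool:
    """Return True only if the dataset satisfies every listed criterion."""
    return (
        ds.n_profiles >= MIN_PROFILES
        and ds.single_cancer_type
        and ds.single_platform
        and ds.clinical_history_linked
        and ds.common_treatment
        and ds.response_labels_available
    )


# Made-up candidates: the second fails on size, the third on missing outcomes.
candidates = [
    CandidateDataset("EXAMPLE-A", 41, True, True, True, True, True),
    CandidateDataset("EXAMPLE-B", 25, True, True, True, True, True),
    CandidateDataset("EXAMPLE-C", 120, True, True, True, True, False),
]

selected = [ds.accession for ds in candidates if meets_inclusion_criteria(ds)]
print(selected)
```

Keeping each criterion as a separate boolean field makes it easy to report why a given dataset was excluded.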

Summary

Introduction

Machine learning (ML) methods still have limited applicability in personalized oncology because few clinically annotated molecular profiles are available. This prevents sufficient training of ML classifiers that could be used to improve molecular diagnostics. A personalized approach provides important advantages in clinical oncology in terms of improved patient survival and lower drug toxicities [1, 2]. The percentage of US patients with cancer estimated to benefit from personalized prescriptions of targeted therapeutics was only 0.7% in 2006, and it increased to ~5% in 2018 [4]. This progress could be more significant if more companion diagnostic tests were available for the standardly used cancer drugs. Cancer gene expression data can be used per se or can be normalized to the available profiles of healthy human tissues [7].

