Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology.

Victor Tkachev, Constantin Borisov, Anton Buzdin, Maxim Sorokin, Nicolas Borisov, Andrew Garazha

doi:10.3390/ijms21030713

Open Access

Abstract

(1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs, due to a shortage of case histories in which clinical outcome is supplemented by high-throughput molecular data. This shortage causes overtraining and makes most ML methods highly vulnerable. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods: linear support vector machine (SVM), k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments on 21 high-throughput gene expression datasets (41–235 samples per dataset) representing a total of 1778 cancer patients with known responses to chemotherapy treatments. FloWPS substantially improved the classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP): the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from the 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by examining the importance of different features for different ML methods on the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.
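
The idea of data trimming as sample-specific feature removal that avoids extrapolation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `trim_features` and `project`, the parameter `m`, and the exact flanking rule (keep a feature only if at least `m` training values lie on each side of the sample's value) are assumptions made for illustration, and the toy numbers are invented.

```python
from typing import List, Sequence


def trim_features(train: Sequence[Sequence[float]],
                  sample: Sequence[float],
                  m: int = 1) -> List[int]:
    """Return the indices of features kept for this particular sample.

    Assumed rule (illustrative only): feature j is kept if at least m
    training values lie strictly below and at least m strictly above
    sample[j], so a classifier using it interpolates rather than
    extrapolates along that feature axis.
    """
    kept = []
    for j, value in enumerate(sample):
        column = [row[j] for row in train]
        below = sum(1 for x in column if x < value)
        above = sum(1 for x in column if x > value)
        if below >= m and above >= m:
            kept.append(j)
    return kept


def project(rows: Sequence[Sequence[float]],
            kept: Sequence[int]) -> List[List[float]]:
    """Restrict every row to the kept feature indices."""
    return [[row[j] for j in kept] for row in rows]


# Toy data: 4 training samples, 3 features (hypothetical values).
train = [
    [1.0, 10.0, 0.2],
    [2.0, 12.0, 0.4],
    [3.0, 14.0, 0.6],
    [4.0, 16.0, 0.8],
]
# Feature 1 of the new sample lies outside the training range.
sample = [2.5, 20.0, 0.5]

kept = trim_features(train, sample, m=1)
trimmed_train = project(train, kept)   # training set for this sample only
trimmed_sample = [sample[j] for j in kept]
print(kept)
```

Because the kept feature set depends on the sample being classified, any downstream classifier would be fitted on `trimmed_train` separately for each sample. Raising `m` makes the trimming stricter, since more flanking training points are required before a feature is retained.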

Highlights

A personalized approach in oncology has proven helpful for increasing the efficacy of drug prescription in many cancers [1,2]
Floating window projective separator (FloWPS) substantially improved the classifier quality for all global machine learning (ML) methods (SVM, random forest (RF), binomial naïve Bayes (BNB), adaptive boosting (ADA), multi-layer perceptron (MLP)): the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from the 0.61–0.88 range to 0.70–0.94
For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology

Summary

Introduction

A personalized approach in oncology has proven helpful for increasing the efficacy of drug prescription in many cancers [1,2]. It is based on finding specific biomarkers, which can be mutations, protein levels or patterns of gene expression [3]. Agnostic drug scoring approaches, including machine learning (ML) methods, can offer an even wider spectrum of opportunities through non-hypothesis-driven direct linkage of specific molecular features with clinical outcomes, such as responsiveness to certain types of treatment [7,8]. High-throughput transcriptomic data, including microarray and next-generation sequencing gene expression profiles, can be used to build such classifiers/predictors of clinical response to a certain type of treatment. The direct use of ML to personalize prediction of clinical outcomes is problematic, due to the lack of sufficient numbers of preceding clinically annotated cases supplemented with high-throughput molecular data (~thousands or tens of thousands of cases per treatment scheme) [23].

Journal: International Journal of Molecular Sciences
Publication Date: Jan 22, 2020
Citations: 25
License type: CC BY 4.0
