Pan-cancer classification by regularized multi-task learning

Sk Md Mosaddek Hossain, Sumanta Ray, Anirban Mukhopadhyay, Lutfunnesa Khatun

doi:10.1038/s41598-021-03554-8

Open Access

https://doi.org/10.1038/s41598-021-03554-8

Journal: Scientific Reports	Publication Date: Dec 1, 2021
Citations: 10	License type: open-access

Affiliation: Aliah University, University of Kalyani

Abstract

Classifying pan-cancer samples from gene expression patterns is a crucial challenge for the accurate diagnosis and treatment of cancer patients. Machine learning algorithms are established tools for downstream analysis and for capturing deviations in gene expression patterns across diverse diseases. In this work, we developed PC-RMTL, a pan-cancer classification model based on regularized multi-task learning (RMTL), to classify 21 cancer types and adjacent normal samples using RNA-Seq data obtained from TCGA. PC-RMTL outperforms five state-of-the-art classification algorithms: SVM with a linear kernel (SVM-Lin), SVM with a radial basis function kernel (SVM-RBF), random forest (RF), k-nearest neighbours (kNN), and decision trees (DT). PC-RMTL achieves 96.07% accuracy and an MCC score of 95.80% on a completely unseen independent test set. The only real competitor is SVM-Lin, which nearly matches the prediction accuracy of PC-RMTL, but only when the complete feature set is provided for training; otherwise, PC-RMTL outperforms all other classification models. To the best of our knowledge, this is a significant improvement over existing work in pan-cancer classification, which has failed to reliably distinguish many cancer types from one another. We also compared the expression patterns of the top discriminating genes across the cancers and performed a functional enrichment analysis, which reveals several notable facts about distinguishing pan-cancer samples.
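The two headline metrics above can be illustrated with a minimal sketch. It assumes the multiclass generalization of the Matthews correlation coefficient (the form computed from per-class true and predicted counts); the abstract does not state which MCC variant the authors used, and the function names and toy labels below are hypothetical, not taken from the paper or from TCGA.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC from per-class counts (assumed variant, see lead-in)."""
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_counts = Counter(y_true)                        # true occurrences per class
    p_counts = Counter(y_pred)                        # predictions per class
    classes = set(t_counts) | set(p_counts)
    numerator = c * s - sum(p_counts[k] * t_counts[k] for k in classes)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # Convention: return 0.0 when the score is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0


# Toy labels for illustration only.
y_true = ["type_a", "type_a", "type_b", "type_b", "normal", "normal"]
y_pred = ["type_a", "type_a", "type_b", "normal", "normal", "normal"]
print(accuracy(y_true, y_pred), multiclass_mcc(y_true, y_pred))
```

Unlike accuracy, MCC takes the full distribution of true and predicted classes into account, which is one reason it is often reported alongside accuracy when class sizes differ.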

Highlights

Our approach is the first to explicitly address how to learn the feature representation of samples from multiple cancer types simultaneously
We demonstrate that PC-RMTL provides better prediction accuracy than the competing methods when using the differentially expressed (DE) genes and smaller sets of features identified through the coefficients of the trained SVM-Lin and the minimum redundancy maximal relevance (MRMR) feature selection algorithm
This provides sound evidence that PC-RMTL can be used for classification when expression values are available for only a small number of genes
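As a rough illustration of the MRMR idea mentioned in the highlights, the sketch below greedily selects features that share high mutual information with the class label (relevance) and low average mutual information with the features already chosen (redundancy). It assumes discretized expression values and the difference form of the criterion (relevance minus mean redundancy); the page does not say which MRMR variant or implementation the authors used, and all names and data here are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (nxy / n) * log(nxy * n / (px[x] * py[y]))
        for (x, y), nxy in joint.items()
    )


def mrmr_select(features, labels, k):
    """Greedy MRMR over a dict mapping feature name -> discrete values per sample."""
    relevance = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() gives a deterministic pick when two features tie exactly.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: gene_b duplicates gene_a, so it adds no new information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 0],
    "gene_b": [0, 0, 0, 0, 1, 1, 1, 0],
    "gene_c": [0, 0, 1, 0, 1, 1, 0, 1],
}
print(mrmr_select(features, labels, k=2))
```

In this toy case the duplicate feature is skipped in favour of a less relevant but less redundant one, which is the behaviour that makes MRMR useful for building small, non-overlapping gene sets.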

Summary

Introduction

We identified the key discriminating DE genes in the pan-cancer classification task using the coefficients (weights) of the trained SVM-Lin model. We demonstrate that PC-RMTL provides better prediction accuracy than the competing methods, both with the DE genes and with smaller sets of features (genes) identified through the coefficients (weights) of the trained SVM-Lin model and the MRMR feature selection algorithm.
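Ranking genes by the weights of a trained linear model can be sketched as follows. The example assumes a weight matrix with one row per class and one column per gene has already been extracted from a trained linear SVM (for instance, the `coef_` attribute of scikit-learn's `LinearSVC` has this shape for multiclass problems). Scoring each gene by its largest absolute weight across classes is an assumption made here for illustration; the summary does not specify how the authors aggregated the coefficients, and the weights and gene names are hypothetical.

```python
def rank_genes_by_weight(weights, gene_names, top_k):
    """Return the top_k genes ranked by their largest absolute weight over classes."""
    scores = {
        gene: max(abs(class_row[j]) for class_row in weights)
        for j, gene in enumerate(gene_names)
    }
    # Sort by descending score; break exact ties alphabetically for determinism.
    ranked = sorted(gene_names, key=lambda g: (-scores[g], g))
    return ranked[:top_k]


# Hypothetical weight matrix: 3 classes (rows) x 4 genes (columns).
weights = [
    [0.90, -0.05, 0.10, 0.00],
    [-0.20, 0.02, 0.75, 0.01],
    [0.10, -0.03, -0.40, 0.60],
]
gene_names = ["gene_a", "gene_b", "gene_c", "gene_d"]

top_genes = rank_genes_by_weight(weights, gene_names, top_k=2)
print(top_genes)
```

A reduced feature set built this way can then be used to retrain and compare classifiers, which is the setting in which the summary reports PC-RMTL outperforming the other methods.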

Results

Conclusion
