Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data.

Tianjie Chen, Md Faisal Kabir

doi:10.1371/journal.pone.0302947

Abstract

In recent years, researchers have demonstrated the effectiveness and speed of machine learning-based cancer diagnosis models. However, the results generated by machine learning models are difficult to explain, especially for models that use complex, high-dimensional data such as RNA sequencing data. In this study, we propose binarilization as a novel way to treat RNA sequencing data and use it to construct explainable cancer prediction models. We tested the proposed data processing technique on five different models (neural network, random forest, XGBoost, support vector machine, and decision tree) using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Because our datasets are imbalanced, we evaluated all models using metrics suited to imbalanced classification: the geometric mean, Matthews correlation coefficient, F-measure, and area under the receiver operating characteristic curve. Our approach achieved comparable performance while relying on fewer features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of the data binarilization technique to improve the performance and explainability of RNA sequencing-based cancer prediction models.
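The abstract does not state the exact binarilization rule, so the Python sketch below is illustrative only. It assumes a hypothetical per-gene median threshold (a sample gets 1 for a gene when its expression exceeds that gene's median across samples, else 0) and then computes the four imbalance-aware metrics named above from plain label lists. All function names (`binarize`, `g_mean`, `mcc`, `f_measure`, `roc_auc`) are hypothetical helpers, not the authors' code.

```python
import math
from statistics import median


def binarize(matrix):
    """Map each gene (column) to 1 if a sample exceeds that gene's median, else 0.

    Assumption: the median threshold is a hypothetical stand-in, not the
    paper's documented binarilization criterion.
    """
    n_genes = len(matrix[0])
    thresholds = [median(row[j] for row in matrix) for j in range(n_genes)]
    return [[1 if row[j] > thresholds[j] else 0 for j in range(n_genes)]
            for row in matrix]


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient (0.0 when any marginal is empty)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_measure(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy expression matrix: 4 samples x 2 genes (made-up numbers).
    expr = [[5.0, 0.1], [0.2, 3.0], [4.1, 2.5], [0.3, 0.0]]
    print(binarize(expr))

    # Toy imbalanced labels: 3 positives, 5 negatives.
    y_true = [1, 0, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
    print(g_mean(y_true, y_pred), mcc(y_true, y_pred), f_measure(y_true, y_pred))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.6, 0.8]))
```

In a real pipeline, any data-dependent threshold like the median would be computed on the training samples only and then applied to the test samples, so that no information leaks from the evaluation data into the features.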

Journal: PLOS ONE	Publication Date: May 10, 2024
License type: CC BY 4.0

Similar Papers

Application of machine learning-based surrogate models for urban flood depth modeling in Ho Chi Minh City, Vietnam
Thanh Quang Dang ... Duong Tran Anh
Applied Soft Computing | VOL. 150
10 Nov 2023

Using a machine learning-based risk prediction model to analyze the coronary artery calcification score and predict coronary heart disease and risk assessment
Yue Huang ... Ying Zhang
Computers in Biology and Medicine | VOL. 151
15 Nov 2022

A novel study to classify breath inhalation and breath exhalation using audio signals from heart and trachea
Ahmet Reşit Kavsaoğlu ... Eftal Sehirli
Biomedical Signal Processing and Control | VOL. 80
13 Oct 2022

Performance evaluation of ML models for preoperative prediction of HER2-low BC based on CE-CBBCT radiomic features: A prospective study.
Xianfei Chen ... Xueli Liang
Medicine | VOL. 103
14 Jun 2024