Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods

Zishuang Zhang, Zhi-Ping Liu

doi:10.1186/s12920-021-00957-4

Abstract

Background: Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification with different feature selection techniques.

Methods: We use six different recursive feature elimination methods to select the gene signatures of HCC from TCGA liver cancer data. The genes shared by the six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation for feature selection in machine learning methods. We use several methods to validate the screened biomarkers.

Results: In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are taken as the identified biomarkers. We give a statistical interpretation of the machine-learning feature selection process using AIC. Furthermore, the features selected by backward stepwise logistic regression under the minimum-AIC criterion are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable robust biomarker discovery method.

Conclusions: The gene subsets selected by the RFE-CV of the six classifiers contain different numbers of features, yet they overlap. The AIC values in model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets make better biological sense. The quality of feature selection is improved by intersecting the biomarkers selected by different classifiers. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
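For reference, the AIC-guided backward selection described in the abstract can be written out as follows. This is the standard textbook definition of AIC applied to a logistic model with an intercept, offered as a sketch only. The symbols S, k, and \hat{L} are notation introduced here for illustration, and this excerpt does not state the authors' exact formulation.

```latex
% Assumed notation: S is a candidate gene subset, k = |S| + 1 counts the
% fitted coefficients plus the intercept, and \hat{L}(S) is the maximized
% likelihood of the logistic model restricted to the genes in S.
\mathrm{AIC}(S) = 2k - 2\ln \hat{L}(S), \qquad k = |S| + 1

% One backward step: drop the gene whose removal gives the lowest AIC.
j^{*} = \arg\min_{j \in S_t} \mathrm{AIC}\bigl(S_t \setminus \{j\}\bigr), \qquad
S_{t+1} = S_t \setminus \{j^{*}\}

% Stop as soon as no single removal lowers the criterion.
\text{stop when } \mathrm{AIC}(S_{t+1}) \ge \mathrm{AIC}(S_t)
```

A lower AIC rewards a better fit while penalizing each additional gene, which is why it can serve as a statistical reading of the elimination process.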

Highlights

Hepatocellular carcinoma (HCC) is one of the most common cancers
Classification and feature selection: we obtain the six best classification subsets corresponding to the six machine learning algorithms
We find that there are many overlaps between the features selected by random forest (RF) and the features selected by the four classifiers. Six of the features selected by naïve Bayes (NB) appear among the features selected by RF
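The overlap analysis described in these highlights can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the gene names and the three subsets below are hypothetical placeholders rather than results from the paper, and the function names are invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical selected subsets, one per classifier. The gene names are
# placeholders for illustration only, not genes reported in the paper.
selected = {
    "RF": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "NB": {"GENE_A", "GENE_B", "GENE_E"},
    "SVM": {"GENE_A", "GENE_B", "GENE_C"},
}


def robust_biomarkers(subsets):
    """Return the genes that appear in every classifier's subset."""
    sets = list(subsets.values())
    return set.intersection(*sets) if sets else set()


def pairwise_overlap(subsets):
    """Count shared genes for each pair of classifiers (names sorted)."""
    return {
        (a, b): len(subsets[a] & subsets[b])
        for a, b in combinations(sorted(subsets), 2)
    }


def support_counts(subsets):
    """Count in how many subsets each gene was selected."""
    return Counter(gene for genes in subsets.values() for gene in genes)


if __name__ == "__main__":
    print("shared by all:", sorted(robust_biomarkers(selected)))
    for pair, n_shared in pairwise_overlap(selected).items():
        print(pair, n_shared)
    print(support_counts(selected).most_common())
```

The full intersection corresponds to the robust biomarker set, while the support counts give a softer ranking of genes by how many classifiers selected them.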

Summary

Introduction

The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers, but these methods have not investigated the robustness of identification with different feature selection techniques. Using effective molecular biomarkers is one of the most efficient ways of realizing early cancer diagnosis. The availability of high-throughput omics data provides both an unprecedented opportunity and a challenge for discovering diagnostic biomarkers for HCC. The TCGA cohort study provides valuable data resources for the search for cancer biomarkers [4].

Journal: BMC medical genomics	Publication Date: Aug 1, 2021
Citations: 17	License type: open-access

Similar Papers

An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data.
Ying Zhang ... Wenbin Liang
BioMed Research International | VOL. 2018
30 Aug 2018

Robust biomarker screening from gene expression data by stable machine learning-recursive feature elimination methods
Lingyu Li ... Zhi-Ping Liu
Computational Biology and Chemistry | VOL. 100
29 Jul 2022

Performance Examination and Feature Selection on Sybil User Data using Recursive Feature Elimination
International Journal of Innovative Technology and Exploring Engineering | VOL. 8
01 Oct 2019

Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture.
Akash Kishore ... Akshaya Mohan
Medical & Biological Engineering & Computing | VOL. 61
02 Aug 2023
