Abstract

Accurate screening of cancer biomarkers contributes to health assessment, drug screening, and targeted therapy in precision medicine. The rapid development of high-throughput sequencing technology has identified abundant genomic biomarkers, but most of them are limited to single-cancer analysis. This paper proposes FRL, an integrative feature selection algorithm that combines Fisher score, Recursive feature elimination, and Logistic regression to explore potential cancer genomic biomarkers on cancer subsets. Fisher score is first used to calculate the weights of genes and rapidly reduce the dimensionality. Recursive feature elimination and Logistic regression are then jointly employed to extract the optimal subset. FRL achieves greater classification precision than GEO2R, the current differential expression analysis tool based on the Limma algorithm. Compared with five traditional feature selection algorithms, FRL exhibits excellent performance in accuracy (ACC) and F1-score and greatly improves computational efficiency. On high-noise datasets such as esophageal cancer, the ACC of FRL is 30% higher than the average ACC achieved by the other traditional algorithms. Biomarkers found in multiple studies are more reliable and reproducible, and reveal a stronger association with potential clinical value, than those found in a single analysis. Through literature review and spatial analyses of gene functional enrichment and functional pathways, we therefore conduct cluster analysis on 10 diverse cancers with high mortality and form a potential biomarker module comprising 19 genes. All genes in this module can serve as potential biomarkers that provide more information on the overall oncogenesis mechanism for detecting diverse cancers at an early stage and that assist targeted anticancer therapies for further developments in precision medicine.
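The two-stage selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fisher_score` and `frl_like_select`, the cut-offs `n_prefilter` and `n_final`, the particular Fisher score formula, and the synthetic data are all assumptions made for illustration, and the sketch relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression


def fisher_score(X, y):
    """Per-feature Fisher score: between-class over within-class scatter.

    Assumed form; the paper may use a different variant.
    """
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def frl_like_select(X, y, n_prefilter=50, n_final=5):
    """Hypothetical two-stage selection in the spirit of FRL.

    Stage 1 keeps the genes with the highest Fisher scores.
    Stage 2 runs recursive feature elimination with logistic regression
    on the surviving genes and returns their original column indices.
    """
    scores = fisher_score(X, y)
    prefiltered = np.argsort(scores)[::-1][:n_prefilter]
    rfe = RFE(LogisticRegression(max_iter=1000),
              n_features_to_select=n_final, step=1)
    rfe.fit(X[:, prefiltered], y)
    return np.sort(prefiltered[rfe.support_])


# Synthetic stand-in for an expression matrix (not real GEO data):
# 60 samples x 500 genes, where only genes 0-4 differ between classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 500))
X[y == 1, :5] += 3.0

selected = frl_like_select(X, y)
print(selected)
```

The cheap filter step is what makes the wrapper step affordable: recursive elimination refits the classifier once per removed gene, so running it on a few dozen pre-filtered genes rather than the full profile keeps the cost manageable.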

Highlights

  • Cancers are genomic diseases that cause uncontrolled abnormal cell growth through the constant accumulation of certain genetic mutations [1]

  • All datasets are retrieved from the Gene Expression Omnibus (GEO), a public repository, and can be downloaded from the National Center for Biotechnology Information website

  • We present an integrative feature selection algorithm called FRL, which combines Fisher score, Recursive feature elimination, and Logistic regression
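
For reference, one common form of the Fisher score used to weight a gene is given here. The excerpt does not state which variant the authors use, so this form and its notation are assumptions: K is the number of classes, n_k the number of samples in class k, \mu_{j,k} and \sigma_{j,k}^2 the mean and variance of gene j within class k, and \mu_j the mean of gene j over all samples.

```latex
F(j) = \frac{\sum_{k=1}^{K} n_k \left( \mu_{j,k} - \mu_j \right)^2}
            {\sum_{k=1}^{K} n_k \, \sigma_{j,k}^2}
```

A gene scores highly when its class means lie far apart relative to its spread within each class, which is why ranking by this score can serve as a fast first filter.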


Summary

Introduction

Cancers are genomic diseases that cause uncontrolled abnormal cell growth through the constant accumulation of certain genetic mutations [1]. Genes that present specific regulation signals to activate the corresponding signaling pathways in cancers are called genomic biomarkers and can be tested with DNA chips [2]. Precision medicine is defined as patient-targeted treatment based on the characteristics of genetic abnormalities and biomarkers. Driven by the popularity of precision medicine [5], the goal of targeted cancer therapies is to track and address biomarkers from multidimensional gene expression data [6]. DNA chips obtain gene expression data by simultaneously tracking the expression of a large number of genes. A gene expression profile is characterized by small sample sizes, high dimensionality, and large amounts

