Machine Learning-Based Identification of Colon Cancer Candidate Diagnostics Genes.

Saraswati Koppad, Annappa Basava, Georgios V Gkoutos, Animesh Acharjee, Katrina Nash

doi:10.3390/biology11030365

Abstract

Simple Summary: We developed a predictive approach using different machine learning methods to identify a number of genes that can potentially serve as novel diagnostic biomarkers for colon cancer.

Background: Colorectal cancer (CRC) is the third leading cause of cancer-related death and the fourth most commonly diagnosed cancer worldwide. Owing to a lack of diagnostic biomarkers and an incomplete understanding of the underlying molecular mechanisms, CRC mortality continues to rise. CRC occurrence and progression are dynamic processes: the expression levels of specific molecules vary across CRC stages, which makes early detection and diagnosis challenging and the identification of accurate, meaningful CRC biomarkers more pressing. High-throughput sequencing technologies have been used to explore novel gene expression, targeted treatments, and colon cancer pathogenesis. These approaches are now routinely applied and produce large datasets whose analysis increasingly depends on machine learning (ML) algorithms, which have proven computationally efficient for identifying relevant variables in such high-dimensional data.

Methods: We developed a novel ML-based experimental design to study CRC gene associations. Six machine learning methods were employed as classifiers to identify genes that can serve as CRC diagnostics, using gene expression and clinical datasets. Accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic curve (AUROC) were computed to evaluate the differentially expressed genes (DEGs) for CRC diagnosis. Gene ontology enrichment analyses of these DEGs were performed, and the predicted gene signatures were linked with miRNAs.
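The five evaluation metrics named in the Methods have standard definitions in terms of the confusion matrix (true/false positives and negatives). The sketch below is not the authors' code; it assumes binary labels (1 = tumour, 0 = normal), classifier scores in [0, 1], and a decision threshold of 0.5, and it computes AUROC as the probability that a randomly chosen tumour sample scores higher than a randomly chosen normal sample, which equals the area under the ROC curve. The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) with label 1 = tumour (positive) and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, scores, threshold=0.5) -> Dict[str, float]:
    """Threshold the scores (assumed 0.5), then derive the five metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)  # recall / true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "auroc": auroc(y_true, scores),
    }
```

For the toy labels [1, 1, 1, 0, 0, 0] and scores [0.9, 0.8, 0.4, 0.6, 0.2, 0.1], accuracy, sensitivity, specificity, and F1 all come out at 2/3, while AUROC is 8/9, because AUROC depends only on how the scores rank, not on the threshold. The pairwise AUROC loop is quadratic in sample count, which is acceptable for cohorts of GEO size; a rank-based formula scales better.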
Results: We evaluated six machine learning classification methods (Adaboost, ExtraTrees, logistic regression, naïve Bayes classifier, random forest, and XGBoost) across different combinations of training and test sets drawn from GEO datasets. The accuracy and AUROC of each combination of training data, test data, and algorithm were used as comparison metrics. Random forest (RF) models consistently outperformed the other models. In total, 34 genes were identified and used for pathway and gene set enrichment analysis. Further mapping of these 34 genes to miRNAs identified notable miRNA hub genes.

Conclusions: We identified 34 genes that can serve, with high accuracy, as a diagnostic panel for CRC.
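The Results describe comparing models by accuracy and AUROC over combinations of training and test datasets. The abstract does not specify the combination scheme, so the sketch below assumes one plausible reading: each GEO dataset is held out in turn as the test set, and models are trained on every non-empty subset of the remaining datasets (three combinations per test set, nine in total for three datasets). To stay dependency-free, it substitutes a simple nearest-centroid scorer for the six classifiers used in the paper. `fit_nearest_centroid`, `cross_dataset_auroc`, and the data layout are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def auroc(y, scores):
    """P(random tumour sample outscores random normal sample); ties count 0.5."""
    pos = [s for t, s in zip(y, scores) if t == 1]
    neg = [s for t, s in zip(y, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in classifier; a higher score means closer to the tumour centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [mean(col) for col in zip(*rows)]

    c_tumour, c_normal = centroid(1), centroid(0)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: dist2(x, c_normal) - dist2(x, c_tumour)


def cross_dataset_auroc(datasets, fitters):
    """Mean test AUROC per model over all (training subset, held-out test set) pairs.

    Assumed scheme, not confirmed by the paper.
    datasets: name -> (X, y), X a list of feature rows, y a list of 0/1 labels.
    fitters:  model name -> function(X, y) returning a scoring function x -> float.
    """
    scores = {name: [] for name in fitters}
    names = sorted(datasets)
    for test in names:
        X_test, y_test = datasets[test]
        rest = [n for n in names if n != test]
        for k in range(1, len(rest) + 1):
            for train in combinations(rest, k):
                # Pool the samples of every dataset in this training combination.
                X = [row for n in train for row in datasets[n][0]]
                y = [label for n in train for label in datasets[n][1]]
                for model, fit in fitters.items():
                    score = fit(X, y)
                    scores[model].append(auroc(y_test, [score(x) for x in X_test]))
    return {model: mean(values) for model, values in scores.items()}
```

To compare real models, each entry in `fitters` could wrap a scikit-learn estimator, for example by fitting `RandomForestClassifier` and returning a function that reads the tumour-class column of `predict_proba`. Under this sketch, the model with the highest mean AUROC would correspond to the consistently best performer reported above.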

Highlights

Colorectal cancer (CRC) is the third most common cause of death due to cancer and the fourth most commonly diagnosed cancer worldwide [1,2].
We used three gene expression datasets (GSE44861, GSE20916, GSE113513), available from the GEO database [24], and applied six different machine learning methods (Adaboost, ExtraTrees, logistic regression, naïve Bayes, random forest, and XGBoost) to identify genes that can be used as diagnostic markers.
For each of the three GEO datasets, the respective differentially expressed genes (DEGs) were used as features for the six classification models.
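The highlights state that each dataset's DEGs served as classifier features. The sketch below shows one simplified way to build such a feature set, under assumptions not taken from the paper. Genes are called differentially expressed purely on an absolute log2 fold-change threshold; a full DEG analysis would also test statistical significance. Expression values are assumed to be on a linear, non-negative scale; for log2-transformed microarray data, the log fold change would instead be the difference of group means. So that a model trained on one dataset can be applied to another, only genes called DE in every dataset are kept, as a shared, consistently ordered feature set. All function names and thresholds are hypothetical.

```python
import math
from statistics import mean


def log2_fold_changes(expr, labels, pseudocount=1.0):
    """Per-gene log2((mean tumour + pc) / (mean normal + pc)); labels: 1 tumour, 0 normal."""
    lfc = {}
    for gene, values in expr.items():
        tumour = mean(v for v, t in zip(values, labels) if t == 1)
        normal = mean(v for v, t in zip(values, labels) if t == 0)
        lfc[gene] = math.log2((tumour + pseudocount) / (normal + pseudocount))
    return lfc


def select_degs(expr, labels, min_abs_lfc=1.0):
    """Genes whose absolute log2 fold change reaches the (hypothetical) threshold."""
    return {g for g, v in log2_fold_changes(expr, labels).items() if abs(v) >= min_abs_lfc}


def shared_degs(datasets, min_abs_lfc=1.0):
    """Genes called DE in every dataset, sorted so feature columns line up across datasets."""
    deg_sets = [select_degs(expr, labels, min_abs_lfc) for expr, labels in datasets.values()]
    return sorted(set.intersection(*deg_sets))


def feature_matrix(expr, genes):
    """Samples x genes matrix restricted to the selected DEGs."""
    n_samples = len(next(iter(expr.values())))
    return [[expr[g][i] for g in genes] for i in range(n_samples)]
```

Here `expr` maps each gene to its per-sample values, in the same sample order as `labels`. The resulting matrix can be passed directly to any of the classifiers.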

Summary

Introduction

Colorectal cancer (CRC) is the third most common cause of death due to cancer and the fourth most commonly diagnosed cancer worldwide [1,2]. Recent studies have used gene microarrays and high-throughput sequencing technologies to explore novel differentially expressed genes in colon cancer [10]. High-throughput sequencing has also been used to study targeted treatments and colon cancer pathogenesis. Such approaches are now routinely applied and produce large datasets whose analysis increasingly depends on machine learning (ML) algorithms, which have proven computationally efficient for identifying relevant variables in high-dimensional data.


Journal: Biology; Publication Date: Feb 25, 2022
Citations: 22; License type: CC BY 4.0

Similar Papers

Weighted gene co-expression network analysis combined with machine learning validation to identify key hub biomarkers in colorectal cancer.
Chenchen Guo ... Quanguo Liu
Functional & Integrative Genomics | VOL. 23
28 Dec 2022

Identification of hub genes and potential molecular mechanisms in MSS/MSI classifier primary colorectal cancer based on multiple datasets
Xia Qiao ... Xu Zhang
Discover Oncology | VOL. 15
18 Jul 2024

Comparison Between Statistical Model and Machine Learning Methods for Predicting the Risk of Renal Function Decline Using Routine Clinical Data in Health Screening.
Xia Cao ... Binfang Yang
Risk Management and Healthcare Policy | VOL. 15
01 Apr 2022

Application of machine learning model in predicting the likelihood of blood transfusion after hip fracture surgery.
Xiao Chen ... Ruixin Tang
Aging Clinical and Experimental Research | VOL. 35
21 Sep 2023
