Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer

Neha Shree Maurya, Aakash Chawade, Sandeep Kushwaha, Ashutosh Mani

doi:10.1038/s41598-021-92692-0



Abstract


Colorectal cancer (CRC) is a common cause of cancer-related deaths worldwide. The CRC mRNA gene expression dataset from The Cancer Genome Atlas (TCGA), containing 644 CRC tumor and 51 normal samples and 23,186 features (genes), was pre-processed to identify significant differentially expressed genes (DEGs). The feature selection techniques least absolute shrinkage and selection operator (LASSO) and Relief were used, together with class balancing, to obtain features (genes) of high importance. The CRC dataset was then classified with the machine learning (ML) algorithms random forest (RF), K-nearest neighbour (KNN), and artificial neural network (ANN). There were 2933 significant DEGs, of which 1832 were upregulated and 1101 downregulated. LASSO performed better than Relief in classifying tumor and normal samples: with LASSO-selected features, RF, KNN, and ANN all reached an accuracy of 100%, whereas with Relief-selected features they reached 79.5%, 85.05%, and 100%, respectively. LASSO and the DEGs had 38 features in common; of these, only 5 genes, namely VSTM2A, NR5A2, TMEM236, GDLN, and ETFDH, showed statistically significant results in survival analysis. Functional review and analysis of the selected genes narrowed these 5 genes down to 2, VSTM2A and TMEM236. TMEM236 was significantly differentially expressed and its expression was markedly reduced in the dataset, which warrants its assessment as a novel biomarker for CRC diagnosis.
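
The abstract credits LASSO with selecting the most useful genes. As a rough illustration of how LASSO discards features, the sketch below fits a squared-loss LASSO by cyclic coordinate descent with soft-thresholding on synthetic data. It is a minimal sketch under stated assumptions, not the authors' pipeline: the exact LASSO variant, penalty value and software used in the study are not given here, a classification study would more typically use an L1-penalized logistic model, and the helper names (`standardize`, `soft_threshold`, `lasso_coordinate_descent`) and the synthetic "genes" are hypothetical.

```python
import random
import statistics


def standardize(columns):
    """Center each feature column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col) or 1.0  # guard against constant columns
        out.append([(v - mu) / sd for v in col])
    return out


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimise (1/2n)||y - b0 - X beta||^2 + lam * ||beta||_1 by cyclic coordinate descent.

    `columns` holds standardized feature columns (one list per gene).
    Returns the intercept and the list of coefficients.
    """
    n, p = len(y), len(columns)
    b0 = statistics.fmean(y)          # intercept equals mean(y) because X is centered
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]     # residuals while every beta is zero
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # correlation of feature j with the partial residual (its own term added back)
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, resid)) / n
            denom = sum(x * x for x in col) / n
            new = soft_threshold(rho, lam) / denom
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - x * delta for x, r in zip(col, resid)]
                beta[j] = new
    return b0, beta


# Synthetic demo (hypothetical data): 1 = tumor, 0 = normal.
rng = random.Random(0)
n = 200
y = [i % 2 for i in range(n)]
raw = [[3.0 * yi + rng.gauss(0.0, 1.0) for yi in y]]           # gene 0: informative
raw += [[rng.gauss(0.0, 1.0) for _ in y] for _ in range(4)]    # genes 1-4: pure noise
cols = standardize(raw)

b0, beta = lasso_coordinate_descent(cols, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("selected gene indices:", selected)
```

Genes whose coefficients are driven exactly to zero are dropped; raising `lam` keeps fewer genes, which is the property that makes LASSO usable as a feature selector.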

Highlights

Colorectal cancer (CRC) is very common in many countries and is one of the major causes of death worldwide[1]
A total of 695 samples (644 CRC tumor and 51 normal) were collected from The Cancer Genome Atlas (TCGA) database (Fig. 2)
The dimensionality of the CRC gene expression dataset was reduced, and the data were further analyzed with Principal Component Analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE)
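
PCA and t-SNE are used here to reduce the dimensionality of the expression matrix. t-SNE is too involved for a short example, but the first principal component can be sketched with plain power iteration on the sample covariance matrix. This assumes a small, dense samples-by-genes matrix held in Python lists; the function name `first_principal_component` and the synthetic data are hypothetical, and real analyses would normally rely on an established implementation such as R's `prcomp`.

```python
import math
import random


def first_principal_component(samples, n_iter=100):
    """Return (loading vector, per-sample scores) for PC1 using power iteration.

    `samples` is a list of rows: one row per sample, one column per gene.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (p x p) with the n - 1 denominator.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):
        # Multiply by the covariance matrix and renormalise; v converges to
        # the eigenvector with the largest eigenvalue.
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores


# Synthetic demo (hypothetical data): gene 0 differs between groups, genes 1-3 are noise.
rng = random.Random(1)
rows, labels = [], []
for label, shift in (("tumor", 5.0), ("normal", 0.0)):
    for _ in range(50):
        rows.append([shift + rng.gauss(0.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(3)])
        labels.append(label)

loading, scores = first_principal_component(rows)
tumor_mean = sum(s for s, lab in zip(scores, labels) if lab == "tumor") / labels.count("tumor")
normal_mean = sum(s for s, lab in zip(scores, labels) if lab == "normal") / labels.count("normal")
print("PC1 loading:", [round(x, 3) for x in loading])
print("mean PC1 score, tumor vs normal:", round(tumor_mean, 2), round(normal_mean, 2))
```

The sign of a principal component is arbitrary, so only the separation between the two group means on PC1 is meaningful, not which group scores positive.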

Summary

Introduction

Colorectal cancer (CRC) is very common in many countries and is one of the major causes of death worldwide[1]. Sun et al.[1] used GEO datasets and applied the Robust Rank Aggregation method to identify significant differentially expressed genes (DEGs). They found 494 significant DEGs, of which 282 were downregulated and 212 upregulated. Another study, by Su et al.[4], used both miRNA and mRNA datasets from GEO to identify … The studies mentioned above used only traditional R/Bioconductor approaches to find the genes responsible for CRC progression, and such approaches can give inconsistent results. In this context, alternative methods may provide better and more consistent results. Gene expression data can be classified with machine learning (ML) algorithms to find significant features.
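
One of the ML classifiers used in this study is K-nearest neighbour (KNN). The minimal sketch below labels each held-out sample by a majority vote of its k closest training samples (Euclidean distance) and reports accuracy. The function names, `k=5`, and the synthetic two-gene signal are assumptions for illustration only. Note that with 644 tumor and 51 normal samples, always predicting "tumor" already gives about 92.7% accuracy (644/695), which is one reason class balancing matters when accuracy is the headline metric.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Label `x` by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of held-out samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Synthetic demo (hypothetical data): genes 0 and 1 are shifted in tumor samples.
rng = random.Random(42)
data = [([rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0), rng.gauss(0.0, 1.0)], "tumor")
        for _ in range(60)]
data += [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], "normal")
         for _ in range(60)]
rng.shuffle(data)
train, test = data[:80], data[80:]
train_X, train_y = [x for x, _ in train], [y for _, y in train]
test_X, test_y = [x for x, _ in test], [y for _, y in test]

acc = accuracy(train_X, train_y, test_X, test_y, k=5)
print(f"held-out accuracy: {acc:.2f}")
```

An odd `k` avoids tied votes in a two-class problem; in practice `k` would be tuned by cross-validation rather than fixed.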



Journal: Scientific Reports
Publication Date: Jul 12, 2021
Citations: 34
License type: open-access

Similar Papers

Abstract PO-167: Predictive genetic risk factors and prognostic nomogram for colorectal cancer in Native Hawaiian population
Yuanyuan Fu ... Devin Takahashi
Cancer Epidemiology, Biomarkers & Prevention | VOL. 31 | 01 Jan 2021

Abstract 5865: Analysis of RNA sequencing data to advance our understanding of colorectal cancer health disparity in Native Hawaiians
Yuanyuan Fu ... Peiwen Fei
Cancer Research | VOL. 82 | 15 Jun 2022

Construction of a pyroptosis-related lncRNAs signature for predicting prognosis and immunotherapy response in glioma
Qianrong Huang ... Fangzhou Guo
Medicine | VOL. 102 | 10 Feb 2023

Comparison of ischemic stroke diagnosis models based on machine learning
Wan-Xia Yang ... Jian-Qin Xie
Frontiers in Neurology | VOL. 13 | 05 Dec 2022
