Abstract
Changes in DNA methylation have proven useful for cancer biomarker discovery, cancer classification, and the development of potential treatments. Existing methods use either individual differentially methylated CpG sites or groups of combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, this kind of methylation signal mapping has limitations. To address them, in this study we introduced a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method to interpret DNA methylation biologically and accurately by integrating DNA methylation data with the corresponding TCGA gene expression data. We demonstrated the framework on uterine cervical cancer. First, we pre-filtered outliers from the data set and then predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with the Empirical Bayes test using Limma. We then applied a deep learning method, “nnet”, to classify the cervical cancer labels from those DEGs and computed the classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760; 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with false discovery rate (FDR) < 0.001. After the deep learning analysis, we obtained an average classification accuracy () on the uterine cervical cancer labels, which is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using . We reported five top in-degree genes (, , , and ) and five top out-degree genes (, , , and ). Finally, we performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using the WebGestalt (WEB-based Gene SeT AnaLysis Toolkit) tool.
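As a minimal sketch of the regression step, the Python code below fits ordinary least squares separately for each gene and returns fitted (predicted) expression values. It assumes, for illustration only, a single methylation predictor per gene (for example, one CpG beta value) measured on the same samples as the expression data. The function names and toy values are hypothetical, and this is not the authors' implementation.

```python
from statistics import mean


def fit_ols(x, y):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("methylation values have zero variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def predict_expression(methylation, expression):
    """Regress expression on methylation gene by gene and return fitted values.

    Both arguments map a gene name to a list of per-sample values
    (assumed to be in the same sample order).
    """
    predicted = {}
    for gene, beta in methylation.items():
        a, b = fit_ols(beta, expression[gene])
        predicted[gene] = [a + b * v for v in beta]
    return predicted


# Hypothetical toy data: one gene measured in five samples
methylation = {"GENE_A": [0.1, 0.2, 0.4, 0.6, 0.8]}
expression = {"GENE_A": [9.0, 8.6, 7.8, 7.0, 6.2]}
print(fit_ols(methylation["GENE_A"], expression["GENE_A"]))  # approximately (9.4, -4.0)
print(predict_expression(methylation, expression))
```

The toy data were chosen to lie exactly on a line, so the fitted values reproduce the observed expression; in real data the negative slope would simply indicate that higher methylation goes with lower expression for that gene.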
In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach to better interpreting DNA methylation and gene expression data in disease studies.
Highlights
DNA methylation has been found to be a promising biomarker for cancer detection and cancer classification
Through linear regression and differential expression analysis, we obtained 6287 differentially expressed genes (DEGs) with false discovery rate (FDR) < 0.001 using Limma, listed together with their computed t-scores, p-values, and FDRs
The list of all DEGs obtained by differential expression analysis (Empirical Bayes test using Limma), with FDR-corrected p-values, is provided in a supplementary file, Additional file 1: Table S1
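The FDR filter described in the highlights can be sketched as follows. The source does not name the correction method, so this sketch assumes the Benjamini-Hochberg procedure (the default `adjust.method` of limma's `topTable`). It takes hypothetical per-gene t-scores and p-values as input rather than computing limma's moderated statistics, and the function names are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, fdr_cutoff=0.001):
    """Keep genes whose FDR is below the cutoff.

    results: list of (gene, t_score, p_value) tuples.
    Returns (gene, t_score, p_value, fdr) tuples for the retained genes.
    """
    fdr = benjamini_hochberg([p for _, _, p in results])
    return [(gene, t, p, q)
            for (gene, t, p), q in zip(results, fdr)
            if q < fdr_cutoff]


# Hypothetical t-scores and p-values for four genes
results = [("GENE_A", -8.1, 0.0001), ("GENE_B", 6.3, 0.0004),
           ("GENE_C", 2.4, 0.02), ("GENE_D", 0.7, 0.5)]
for row in select_degs(results):
    print(row)
```

Taking the running minimum from the largest p-value downward keeps the adjusted values monotone in the raw p-values, which is the standard step-up form of the procedure.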
Summary
DNA methylation has been found to be a promising biomarker for cancer detection and cancer classification. DNA methylation is vital for normal development. It plays a very important role in a number of key processes, including genomic imprinting, X-chromosome inactivation, and repression of repetitive element transcription and transposition, and it is implicated in various diseases, including cancer [1]. To interpret DNA methylation data biologically, two kinds of analysis are available: (i) finding single differentially methylated genes (CpG sites) [2,3] and (ii) finding differentially methylated regions (DMRs) [4,5,6]. Each of these analyses addresses only a single task. To interpret DNA methylation data correctly, it is important to incorporate different factors so that the data can serve multiple purposes, such as predicting gene expression from DNA methylation, differential expression analysis, cancer classification [7], hub gene finding, and others.
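Hub gene finding, which the Abstract describes as in-degree and out-degree analysis, amounts to counting incoming and outgoing edges per gene in a directed network. The sketch below assumes the network is already available as a list of (source, target) gene pairs; how the edges are built and which tool the authors used are not specified here, and the gene names and function names are hypothetical.

```python
from collections import Counter


def _top_k(degrees, k):
    """Sort (gene, degree) pairs by degree, highest first; ties broken by name."""
    return sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def top_hubs(edges, k=5):
    """Rank genes by in-degree and out-degree in a directed gene network.

    edges: iterable of (source_gene, target_gene) pairs.
    Returns (top_in, top_out), each a list of up to k (gene, degree) pairs.
    """
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1  # edge leaves the source gene
        in_degree[target] += 1   # edge enters the target gene
    return _top_k(in_degree, k), _top_k(out_degree, k)


# Hypothetical toy network
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G2", "G3"), ("G4", "G3"), ("G5", "G2")]
top_in, top_out = top_hubs(edges, k=2)
print(top_in)   # [('G3', 3), ('G2', 2)]
print(top_out)  # [('G1', 3), ('G2', 1)]
```

With `k=5`, the same function would return the five top in-degree and five top out-degree genes, mirroring the reporting format used in the Abstract.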