Abstract
Background: Many studies have sought gene signatures that can distinguish cancer patients from normal individuals. However, how to extract robust gene features remains an open question.
Methods: In this work, a gene signature selection strategy for TCGA data is proposed that integrates gene expression data, methylation data, and prior knowledge about cancer biomarkers. Unlike traditional integration methods, the strategy uses the expanded 450 K methylation data instead of the original 450 K array data and weights the reported biomarkers during feature selection. A fuzzy rule based classification method and a cross validation strategy were applied in model construction for performance evaluation.
Results: The selected gene features achieved prediction accuracy close to 100% in cross validation with the fuzzy rule based classification model on 6 cancers from TCGA. The cross validation performance of the proposed model was similar to that of other integrative models or the RNA-seq only model, while its prediction performance on independent data was clearly better than that of the other 5 models. The gene signatures extracted with the fuzzy rule based integrative feature selection strategy were more robust and had the potential to yield better prediction results.
Conclusion: The results indicate that integrating the expanded methylation data covers more genes and has greater capacity to retrieve signature genes than the original 450 K methylation data. Integrating the reported biomarkers is also a promising way to improve performance. The PTCHD3 gene was selected as a discriminating gene in 3 of the 6 cancers, which suggests that it may play an important role in cancer risk and merits intensive investigation.
Highlights
Many studies have sought gene signatures that can distinguish cancer patients from normal individuals.
The strategy mainly includes two steps. First, an integrative analysis was performed on the RNA-seq data and the expanded DNA methylation profile; the methylation profile was obtained with a newly developed expanding algorithm [19] and included ~18 times more CpG sites than the 450 K methylation array data. Second, the candidate gene features were further selected based on their combined performance with the reported biomarkers.
Compared with the Differentially Methylated Genes (DMGs) from the original 450 K methylation array data, the expanded methylation landscape provided more DMGs.
Summary
Many studies have sought gene signatures that can distinguish cancer patients from normal individuals, but how to extract robust gene features remains an open question. For The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov//), which provides multi-omics data, integrating gene expression and DNA methylation profiles could improve molecular subtype classification [15]. The strategy mainly includes two steps. First, an integrative analysis was performed on the RNA-seq data and the expanded DNA methylation profile; the methylation profile was obtained with a newly developed expanding algorithm [19] and included ~18 times more CpG sites than the 450 K methylation array data. Second, the candidate gene features were further selected based on their combined performance with the reported biomarkers.
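The two-step strategy can be illustrated with a minimal sketch. Several parts of it are assumptions made for illustration, not the authors' implementation: the integrative analysis is approximated by intersecting differentially expressed genes with DMGs, a leave-one-out nearest-centroid classifier stands in for the fuzzy rule based classifier, and all gene names, expression values, and function names (`nearest_centroid_loo_accuracy`, `select_signature`) are hypothetical.

```python
from statistics import mean


def nearest_centroid_loo_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Stand-in for the paper's fuzzy rule based classifier (assumption).
    Each class needs at least two samples so a centroid survives hold-out.
    """
    correct = 0
    classes = sorted(set(labels))
    for i in range(len(samples)):
        # Build per-class centroids from every sample except the held-out one.
        centroids = {}
        for cls in classes:
            rows = [s for j, s in enumerate(samples) if j != i and labels[j] == cls]
            centroids[cls] = [mean(r[f] for r in rows) for f in features]
        held_out = [samples[i][f] for f in features]
        predicted = min(
            classes,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(held_out, centroids[c])),
        )
        correct += predicted == labels[i]
    return correct / len(samples)


def select_signature(samples, labels, deg, dmg, biomarkers, top_k=1):
    """Rank candidate genes by their combined performance with known biomarkers."""
    # Step 1 (assumed form of integration): keep genes supported by both
    # expression and methylation evidence, excluding the biomarkers themselves.
    candidates = sorted((set(deg) & set(dmg)) - set(biomarkers))
    # Step 2: score each candidate together with the reported biomarkers.
    scored = [
        (nearest_centroid_loo_accuracy(samples, labels, list(biomarkers) + [g]), g)
        for g in candidates
    ]
    # Highest accuracy first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scored[:top_k]]


# Hypothetical toy data: expression values per gene for six samples.
samples = [
    {"BM1": 5.0, "GENE_A": 9.0, "GENE_B": 2.0, "GENE_C": 7.0},
    {"BM1": 3.0, "GENE_A": 8.5, "GENE_B": 6.0, "GENE_C": 7.5},
    {"BM1": 4.0, "GENE_A": 9.5, "GENE_B": 3.0, "GENE_C": 8.0},
    {"BM1": 3.5, "GENE_A": 2.0, "GENE_B": 5.5, "GENE_C": 1.0},
    {"BM1": 4.5, "GENE_A": 1.5, "GENE_B": 2.5, "GENE_C": 1.5},
    {"BM1": 3.0, "GENE_A": 2.5, "GENE_B": 3.5, "GENE_C": 2.0},
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

deg = {"GENE_A", "GENE_B", "GENE_C"}  # differentially expressed genes (hypothetical)
dmg = {"GENE_A", "GENE_B"}            # differentially methylated genes (hypothetical)
biomarkers = ["BM1"]                  # previously reported biomarker (hypothetical)

selected = select_signature(samples, labels, deg, dmg, biomarkers)
print(selected)  # prints ['GENE_A']
```

In this toy setting, GENE_C is dropped in the first step because it lacks methylation support, and GENE_A is preferred in the second step because it separates the two classes cleanly when combined with the biomarker. A larger DMG set, such as the one the expanded methylation profile is reported to provide, would simply widen the candidate pool entering the second step.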