Abstract

Breast cancer is a deadly disease that occurs worldwide and is the most common cancer in women. Its detection remains a major challenge from both computational and biological points of view. Next Generation Sequencing (NGS) techniques have greatly accelerated the mapping of human genomes. Advanced NGS techniques have revealed that multiple genetic molecules are involved in breast cancer and its subtypes. However, the large volume of data produced by NGS techniques is difficult to study because of its high dimensionality and complexity. Thus, the integrated study of multi-omics data is one of the major challenges in medical science. This motivated us to study NGS-based high-throughput expression data of miRNAs and mRNAs together with the Beta values of DNA methylation of the corresponding mRNAs. First, these datasets, which together comprise 33564 features for 305 patients in five classes, viz. Luminal A, Luminal B, HER2-enriched, Basal-like and Control, are analysed in an integrated fashion using a deep learning technique to classify the breast cancer subtypes. Second, the results of the deep learning technique are analysed further to identify the deeply connected features, i.e. miRNA, mRNA or DNA methylation, that are pivotal in classifying breast cancer subtypes and play a crucial role in the formation of the disease. For this purpose, a deep learning technique called a stacked autoencoder is used to encode the features into a low-dimensional space, and the encoded data are then fed to five well-known classifiers for classification. Moreover, the same encoded data are used to select potential features after multiplication with the original data and Bonferroni correction of the p-values produced by a one-sample t-test.
The results have been validated quantitatively and through biological significance analysis, in which the oncogene TP53 and the tumor suppressor gene BRCA1 were identified. These genes are known to play a crucial role in breast cancer. The datasets, code and supplementary materials of this work are available online at http://www.nitttrkol.ac.in/indrajit/projects/integrated-analysis-breastcancer-subtypes/.
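
The Bonferroni step mentioned in the feature selection can be illustrated with a minimal sketch. It is not the authors' code: the function name `bonferroni_select`, the threshold `alpha=0.05`, the feature names and the toy p-values are all assumptions made for illustration, and the p-values are taken as given rather than computed from a one-sample t-test.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return (name, adjusted_p) pairs that stay significant after Bonferroni correction.

    p_values: dict mapping a feature name to its raw p-value.
    alpha: family-wise significance level (0.05 is an assumed default).
    """
    m = len(p_values)  # number of hypotheses tested
    selected = []
    for name, p in p_values.items():
        adjusted = min(1.0, p * m)  # Bonferroni-adjusted p-value, capped at 1
        if adjusted < alpha:
            selected.append((name, adjusted))
    # Most significant features first
    return sorted(selected, key=lambda pair: pair[1])


# Toy p-values for hypothetical features (not results from the study)
toy_p_values = {
    "feature_a": 0.0004,
    "feature_b": 0.0200,
    "feature_c": 0.0090,
    "feature_d": 0.6000,
}
selected = bonferroni_select(toy_p_values)
print(selected)
```

With four tests, each raw p-value is multiplied by four before being compared with `alpha`, so a feature that looks significant on its own (such as the hypothetical `feature_b`) can drop out once the correction is applied.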
