Abstract

Breast cancer, one of the leading causes of cancer death, demands utmost attention. Recently, next-generation sequencing techniques capable of capturing gene expression data have been used successfully to detect breast cancer. The proposed work identifies a small set of biomarker genes for the molecular stratification of breast cancer subtypes. In this work, we propose Triphasic DeepBRCA, a novel deep learning framework for breast cancer subtype detection and biomarker discovery. In the first phase, an autoencoder extracts a compact representation of the gene expression data. In the second phase, this representation is provided as input to a supervised feed-forward neural network that classifies breast cancer subtypes. In the third phase, the proposed Biomarker Gene Discovery Algorithm (BGDA) leverages the second-phase neural network classifier to estimate the relevance of individual genes. Next, the Wilcoxon rank-sum test with False Discovery Rate (FDR) correction is applied to identify the most differentiating genes. Using the TCGA BRCA RNA-Seq data, the proposed framework enabled us to discover a set of 54 most-variant genes. Using 10-fold cross-validation, we obtained a mean accuracy of 0.899 ± 0.04 at the 95% confidence level. We also validated our results on the METABRIC dataset. Gene Set Analysis revealed statistically enriched pathways. Heatmaps of the expression levels and t-SNE visualization reveal that these genes collectively distinguish among the different breast cancer subtypes. Further, a prognostic evaluation of the 54 biomarkers revealed that more than 30 of them are significantly linked to prognostic outcome.
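The two statistical steps named in the abstract, the Wilcoxon rank-sum test with FDR correction and the mean accuracy with a 95% confidence interval over 10 folds, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each subtype is compared against all remaining samples (one-vs-rest), uses the normal approximation of the rank-sum statistic without tie correction, takes Benjamini-Hochberg as the FDR correction, and reads "± 0.04 at 95% confidence" as a t-based interval over the fold accuracies. The function names (`differential_genes`, `mean_with_ci`, and so on) and the toy gene values are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks of `values`; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value, normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2                # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_genes(expr, labels, target, alpha=0.05):
    """Genes whose expression differs between `target` and all other subtypes.

    Assumption (hypothetical helper): one-vs-rest comparison; `expr` maps a
    gene name to expression values aligned with `labels`.
    """
    genes = list(expr)
    pvals = []
    for gene in genes:
        inside = [v for v, lab in zip(expr[gene], labels) if lab == target]
        outside = [v for v, lab in zip(expr[gene], labels) if lab != target]
        pvals.append(rank_sum_pvalue(inside, outside))
    qvals = benjamini_hochberg(pvals)
    return [g for g, q in zip(genes, qvals) if q < alpha]


def mean_with_ci(fold_scores, t_crit=2.262):
    """Mean score and half-width of a t-based 95% confidence interval.

    Assumption: t_crit = 2.262 is the 0.975 quantile of Student's t with
    9 degrees of freedom, so the default only fits 10 folds.
    """
    mean = statistics.mean(fold_scores)
    half_width = t_crit * statistics.stdev(fold_scores) / math.sqrt(len(fold_scores))
    return mean, half_width


if __name__ == "__main__":
    # Hypothetical toy data: 5 "LumA" and 5 "Basal" samples, two genes.
    labels = ["LumA"] * 5 + ["Basal"] * 5
    expr = {
        "GENE_A": [8.1, 7.9, 8.4, 8.0, 8.3, 2.1, 1.9, 2.5, 2.2, 2.0],
        "GENE_B": [5.0, 4.8, 5.2, 5.1, 4.9, 5.0, 5.3, 4.7, 5.1, 4.9],
    }
    print(differential_genes(expr, labels, "LumA"))  # ['GENE_A']
    print(mean_with_ci([0.8, 1.0] * 5))
```

In a real pipeline, `scipy.stats.ranksums` computes the same normal-approximation statistic, and `scipy.stats.false_discovery_control` (SciPy 1.11 and later) applies the Benjamini-Hochberg adjustment; the hand-written versions above only make each step explicit.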

Highlights

  • Breast cancer is a complex and heterogeneous disorder marked by molecular, cellular, and clinical variations resulting in the unrestricted growth of abnormal cells

  • Molecular subtyping of breast cancer has established itself as a promising approach for devising a clinical strategy, which in turn requires identifying a small set of biomarker genes for the molecular stratification of breast cancer subtypes

  • We propose Triphasic DeepBRCA, a three-phase deep learning framework for breast cancer subtype classification and biomarker discovery


Summary

Introduction

Breast cancer is a complex and heterogeneous disorder marked by molecular, cellular, and clinical variations resulting in the unrestricted growth of abnormal cells. It develops mainly because of somatic mutations in certain genes [2], although in a small fraction of cases the cause may be hereditary. The intrinsic heterogeneity of breast cancer has led to its classification into clinically and prognostically important subtypes [5]. Several approaches exist for stratifying breast cancer [6]; for example, it may be categorized by TNM (Tumor, Node, Metastasis) staging.
