Abstract
Breast cancer (BC) is one of the most common tumors and a leading cause of cancer death in women. However, the pathogenesis of BC remains unclear, and the atlas of BC-associated risk factors is far from complete. In this study, we constructed a BC-specific coordinately regulatory network (CRN) to prioritize potential BC-associated protein-coding genes (PCGs) and non-coding RNAs (ncRNAs). We integrated transcriptome data from 813 BC samples in The Cancer Genome Atlas (TCGA) with eight types of regulatory relationships to construct the BC-specific CRN, which includes 387 transcription factors (TFs), 174 microRNAs (miRNAs), 407 long non-coding RNAs (lncRNAs), and 905 PCGs. The random walk with restart (RWR) method was then applied to the CRN, using known BC-associated factors as seeds, to prioritize potential BC-associated risk factors. Leave-one-out cross-validation (LOOCV) on the BC-specific CRN achieved an area under the curve (AUC) of 0.92. The performances of a common CRN, a common protein–protein interaction (PPI) network, and a BC-specific PPI network were also evaluated, demonstrating that the context-specific CRN prioritizes BC risk factors. Functional analysis of the top 100 ranked risk factors in the candidate list revealed that these factors were significantly enriched in cancer-related functions and had significant semantic similarity with BC-related gene ontology (GO) terms. Differential expression analysis and survival analysis showed that the prioritized risk factors were significantly associated with BC progression and prognosis. In summary, we provide a computational method to predict reliable BC-associated risk factors, which should help improve the understanding of the pathology of BC and benefit disease diagnosis and prognosis.
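The seed-based prioritization step can be illustrated with a minimal random walk with restart sketch. This is not the authors' implementation: the toy network, the node labels (TF_a, miR_b, lnc_c, PCG_d, PCG_e), the undirected treatment of edges, and the restart probability of 0.7 are all assumptions made only for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping each node to the set of its neighbours
               (treated as undirected and unweighted here, an assumption).
    seeds:     nodes where the walker starts and restarts.
    restart:   probability of jumping back to the seeds at each step
               (0.7 is an illustrative value, not taken from the paper).
    """
    nodes = sorted(adjacency)
    seeds = [s for s in seeds if s in adjacency]
    if not seeds:
        raise ValueError("at least one seed must be present in the network")

    # Initial distribution: probability mass split equally among the seeds.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart term: return to the seed distribution.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node spreads its mass evenly over its neighbours.
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                nxt[n] += (1.0 - restart) * p[n]  # isolated node keeps its mass
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network with placeholder node labels.
edges = [("TF_a", "miR_b"), ("TF_a", "PCG_d"), ("miR_b", "PCG_d"),
         ("PCG_d", "lnc_c"), ("lnc_c", "PCG_e")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

seeds = ["TF_a"]  # stands in for a known disease-associated factor
scores = random_walk_with_restart(adjacency, seeds)

# Candidates are the non-seed nodes, ranked by descending score.
ranking = sorted((n for n in scores if n not in seeds),
                 key=scores.get, reverse=True)
print(ranking)
```

Nodes closer to the seeds accumulate more probability mass, so the ranking reflects network proximity to the known factors. In a leave-one-out setting, each known factor would in turn be removed from the seed set and its rank among the candidates recorded.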
Highlights
Breast cancer (BC), a type of cancer developing from breast tissue, is the most frequently occurring cancer and one of the leading causes of cancer-related deaths among women (Siegel et al, 2019).
We obtained the interactions provided in the experimental module, and the prediction score should be no less than 0.95. Long non-coding RNA (lncRNA)–transcription factor (TF) and lncRNA–protein-coding gene (PCG) regulations were downloaded from LncReg and LncRNA2Target (v2.0) (Jiang et al, 2015).
Although great progress has been achieved in identifying risk factors of BC development in the last decades, the comprehensive landscape of genetic contribution to BC etiology remains to be further elucidated (Skol et al, 2016; Sun et al, 2017).
Summary
Breast cancer (BC), a type of cancer developing from breast tissue, is the most frequently occurring cancer and one of the leading causes of cancer-related deaths among women (Siegel et al, 2019). Many studies have been conducted to dissect the pathogenesis of BC, and multiple risk factors for the development of BC have been identified in the last decades. Epidemiological data demonstrated that 50% of BCs occurred in women aged 50 to 69 years, and that both BRCA1 and BRCA2 mutations conferred a 60 to 80% lifetime risk for the development of BC (Matsen and Neumayer, 2013). With the advances in RNA-sequencing techniques, non-coding RNAs (ncRNAs), especially microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), have been confirmed to be related to the pathology of BC (Bhan et al, 2017; Xu et al, 2017). Yan et al (2008) identified differentially expressed (DE) miRNAs in BC and suggested that miR-21 overexpression contributed to the poor prognosis of BC patients. Although great progress has been made in identifying genetic risk factors of BC, the genetic contribution to BC etiology remains to be elucidated (Skol et al, 2016).