Abstract

Background: Breast cancer is an invasive disease with complex molecular mechanisms. Prognosis-related biomarkers are still urgently needed to predict outcomes of breast cancer patients.

Methods: Original data were downloaded from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). The analyses were performed using Perl 5.32 and R 4.1.1 (x64).

Results: In this study, 1086 differentially expressed genes (DEGs) were identified in the TCGA cohort; 523 DEGs were shared between the TCGA and GSE10886 cohorts. Non-negative matrix factorization (NMF) clustering identified eight subtypes with significant differences in overall survival (OS) and progression-free survival (PFS) (P < 0.01). Univariate Cox analysis and least absolute shrinkage and selection operator (LASSO) regression analysis were used to develop a risk score based on 17 DEGs; this score separated breast cancer patients into low- and high-risk groups with significant differences in survival (P < 0.01) and showed good predictive performance (entire TCGA cohort: 1-year area under the curve [AUC] = 0.729, 3-year AUC = 0.778, 5-year AUC = 0.781). A nomogram prediction model was constructed using the NMF clusters, the risk score, and clinical characteristics. The model was associated with the tumor microenvironment. Furthermore, DEGs in high-risk breast cancer were enriched in the histidine metabolism (normalized enrichment score [NES] = 1.49, P < 0.05), protein export (NES = 1.58, P < 0.05), and steroid hormone biosynthesis (NES = 1.56, P < 0.05) pathways.

Conclusions: We established a comprehensive model that can predict prognosis and guide treatment.

Highlights

  • Breast cancer is the most commonly diagnosed malignancy and the leading cause of cancer-related death in females [1]

  • A total of 1086 differentially expressed genes (DEGs) were found in The Cancer Genome Atlas (TCGA)-BRCA cohort by comparing breast cancer tissue (1096 tumor samples) with normal tissue (112 normal breast samples) (Figure 2A, 2B)

  • The results showed that CDK1 was the most significant gene with a connectivity degree of 318, followed by CCNA2, BUB1, CCNB1, and TOP2A


Summary

Introduction

Breast cancer is the most commonly diagnosed malignancy and the leading cause of cancer-related death in females [1]. The Oncotype DX 21-gene test can estimate the risk of tumor recurrence and predict chemotherapy response in patients with ER-positive breast cancer. The MammaPrint signature and PAM50 can improve prognostic prediction in breast cancer patients. Despite their high power, these tools consider only gene status, and a model that also takes additional factors into account is urgently needed. In this study, univariate Cox analysis and least absolute shrinkage and selection operator (LASSO) regression analysis were used to develop a risk score based on 17 DEGs; this score separated breast cancer patients into low- and high-risk groups with significant differences in survival (P < 0.01) and showed good predictive performance (entire TCGA cohort: 1-year area under the curve [AUC] = 0.729, 3-year AUC = 0.778, 5-year AUC = 0.781). The resulting comprehensive model can predict prognosis and guide treatment.
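As an illustration of how a LASSO-Cox risk score of this kind is typically applied, the sketch below computes a linear predictor (the sum of coefficient times expression over the signature genes) and splits patients into low- and high-risk groups. It is a minimal example under stated assumptions, not the study's implementation: the gene names, coefficients, and expression values are hypothetical placeholders, and the median cutoff is an assumed convention that the text above does not specify.

```python
from statistics import median

# Hypothetical coefficients for a three-gene toy signature.
# These are NOT the 17 DEGs or the coefficients from the study.
COEFFICIENTS = {
    "GENE_A": 0.42,
    "GENE_B": -0.31,
    "GENE_C": 0.18,
}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Return the linear predictor: sum of coefficient * expression."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(cohort, coefficients=COEFFICIENTS):
    """Score every patient and split the cohort at the median score.

    The median cutoff is an assumption made for this sketch.
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
    cutoff = median(scores.values())
    groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
    return scores, cutoff, groups


# Made-up expression values for four hypothetical patients.
cohort = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
    "P2": {"GENE_A": 0.5, "GENE_B": 3.0, "GENE_C": 1.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 1.0},
    "P4": {"GENE_A": 3.0, "GENE_B": 0.5, "GENE_C": 2.0},
}

scores, cutoff, groups = stratify(cohort)
for pid in sorted(cohort):
    print(pid, round(scores[pid], 3), groups[pid])
```

In practice the coefficients would come from the fitted LASSO-Cox model, and the resulting groups would then be compared with Kaplan-Meier curves and time-dependent AUC values such as those reported above.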
