This study aimed to explore the correlation between primary tumors (PT) and paired metastatic lymph nodes (LN) in breast cancer and to develop a predictive model for forecasting patient prognosis. We obtained single-cell and bulk transcriptome data from the Gene Expression Omnibus (GEO) database. In addition, mRNA transcriptomic data from 112 normal tissues and 1066 breast cancer samples, together with survival, clinical, and mutation data for breast cancer patients, were acquired from The Cancer Genome Atlas (TCGA). Using a machine learning integration framework that combined ten distinct algorithms, we constructed and validated a prognostic model, the Lymph Node Metastasis-Related Scores (LMRS), based on 26 differentially expressed genes and trained on eight TCGA datasets. The model achieved a high C-index across the validation sets, indicating stability and effectiveness, and outperformed 64 models from other studies. Cytolytic activity and T cell co-stimulation were downregulated in the high-LMRS group, as were immune cell populations including B cells, CD8+ T cells, immature dendritic cells (iDCs), and tumor-infiltrating lymphocytes (TILs). Likewise, most immune checkpoints tended to decrease as LMRS increased. Finally, we selected the hub biomarkers PGK1 and HSP90 for pathological verification. Both showed higher expression in PT and LN than in normal tissues and benign tumors, and higher expression in LN than in PT. This comprehensive analysis sheds light on gene expression differences between PT and LN in breast cancer and yields a multigene prognostic model with high accuracy for clinical prognosis prediction.
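The C-index (concordance index) used to judge the model measures how often a patient with a higher predicted risk experiences the event earlier than a patient with a lower predicted risk. Below is a minimal sketch of Harrell's C-index, given as an illustration and not as the authors' evaluation code. The function name `harrell_c_index` is hypothetical, the sketch assumes a higher score means higher risk (shorter survival), and pairs with tied follow-up times are simply skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Hypothetical helper: Harrell's concordance index for right-censored data.

    times: observed follow-up times
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: assumed convention, higher score = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter (or equal) observed time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if times differ and the shorter time is an observed
        # event; tied times are skipped in this simplified sketch.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [5, 10, 15, 20]
events = [1, 1, 0, 1]  # the patient followed for 15 units is censored
print(harrell_c_index(times, events, [0.9, 0.7, 0.3, 0.1]))  # 1.0
print(harrell_c_index(times, events, [0.1, 0.7, 0.3, 0.9]))  # 0.2
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In the toy data, only five of the six pairs are comparable because the censored patient cannot be ordered against a patient with longer follow-up.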