Abstract

Background Lung squamous cell carcinoma (LSCC) is frequently diagnosed worldwide and has a poor prognosis. The current study aims to improve the prediction of LSCC prognosis by integrating multiomics data, including transcriptome, copy number variation, and mutation data, so as to predict patients' survival and discover new therapeutic targets.
Methods RNA-Seq, SNP, and CNV data, together with clinical follow-up information for LSCC patients, were downloaded from The Cancer Genome Atlas (TCGA), and the samples were randomly divided into two groups: a training set and a validation set. In the training set, genes related to prognosis were integrated with genes showing copy number differences or SNP differences, features were extracted using random forests, and robust biomarkers were then screened. In addition, a gene-related prognostic model was established and further verified in the test set and a GEO validation set.
Results We obtained a total of 804 prognosis-related genes and, among genomic variants, 535 genes with copy number amplification, 621 genes with copy number deletion, and 388 significantly mutated genes; notably, these genomic variant genes were closely related to tumor development. Integrating genomic variants with prognostic genes yielded 51 candidate genes, and 5 characteristic genes (HIST1H2BH, SERPIND1, COL22A1, LCE3C, and ADAMTS17) were screened through random forest feature selection; many of those genes have been reported to be related to LSCC progression. Cox regression analysis was performed to establish a 5-gene signature that could serve as an independent prognostic factor for LSCC patients and could stratify risk samples in the training set, test set, and external validation set (p < 0.01); the 5-year area under the curve (AUC) exceeded 0.67 in both the training set and the validation set.
Conclusion In the current study, a 5-gene signature was constructed as a novel prognostic marker to predict the survival of LSCC patients. The present findings provide new diagnostic and prognostic biomarkers and therapeutic targets for LSCC treatment.
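
As an illustration of how a Cox-based gene signature is commonly turned into a risk stratification, the sketch below computes a linear risk score (the sum of coefficient times expression over the five signature genes) and splits samples at the median score. The coefficient values are hypothetical placeholders, since the abstract does not report the fitted values, and the median cutoff is an assumption rather than the authors' stated threshold.

```python
from statistics import median

# Hypothetical Cox coefficients, for illustration only.
# The abstract names the five genes but does not report fitted coefficients.
HYPOTHETICAL_COEFS = {
    "HIST1H2BH": 0.12,
    "SERPIND1": 0.08,
    "COL22A1": 0.15,
    "LCE3C": -0.10,
    "ADAMTS17": 0.20,
}


def risk_score(expression, coefs=HYPOTHETICAL_COEFS):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def stratify(samples, coefs=HYPOTHETICAL_COEFS):
    """Label each sample 'high' or 'low' risk relative to the median score.

    The median cutoff is an assumption; other thresholds are possible.
    """
    scores = {sid: risk_score(expr, coefs) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    return {sid: ("high" if score > cutoff else "low") for sid, score in scores.items()}
```

The resulting high-risk and low-risk groups would then typically be compared with Kaplan-Meier curves and a log-rank test in each cohort.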

Highlights

  • The incidence and mortality of lung cancer have been increasing annually worldwide over the past few decades [1], making lung cancer a leading cause of cancer death in men and the second most frequent cause of cancer death in women, behind breast cancer [2]

  • We found that the 5-gene signature is involved in important biological processes and pathways of lung squamous cell carcinoma (LSCC), and GSEA showed similar results, suggesting that the 5-gene signature could effectively predict the prognostic risk of patients with LSCC

  • The Cancer Genome Atlas (TCGA) RNA-Seq FPKM data contained a total of 553 samples, and the clinical follow-up information contained 758 samples; the SNP 6.0 array copy number variation data, downloaded from UCSC, contained 501 samples; the mutation annotation (MAF) data, downloaded using the GDC client, contained 178 samples; the standardized expression profile and clinical information of GSE42127 [19], downloaded from GEO, contained 176 samples, of which 43 had clinical follow-up information; the download date was June 5, 2019

Summary

Introduction

The incidence and mortality of lung cancer have been increasing annually worldwide over the past few decades [1], making lung cancer a leading cause of cancer death in men and the second most frequent cause of cancer death in women, behind breast cancer [2]. Research findings on lung adenocarcinoma and LSCC remain limited; given this gap, the study of LSCC is urgent and necessary. Multiomics data from projects such as The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET) have the potential to inform effective treatments. The current study aims to improve the prediction of LSCC prognosis by integrating multiomics data, including transcriptome, copy number variation, and mutation data, so as to predict patients’ survival and discover new therapeutic targets. Genes related to prognosis were integrated with genes showing copy number differences or SNP differences, features were extracted using random forests, and robust biomarkers were screened. The present findings provide new diagnostic and prognostic biomarkers and therapeutic targets for LSCC treatment.
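
The integration step described above can be sketched as a set operation: keep the prognostic genes that also carry at least one genomic variant (amplification, deletion, or significant mutation). The summary does not spell out the exact rule, so treating the integration as an intersection is an assumption here; the function name `candidate_genes` and the gene labels are placeholders, not the study's data.

```python
def candidate_genes(prognostic, amplified, deleted, mutated):
    """Return prognostic genes that also appear in any genomic-variant list.

    Assumes 'integration' means intersecting prognostic genes with the
    union of the three variant gene sets; the source does not state this.
    """
    variant = set(amplified) | set(deleted) | set(mutated)
    return sorted(set(prognostic) & variant)


# Placeholder gene labels, not the study's gene lists.
prognostic = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
amplified = {"GENE_A", "GENE_X"}
deleted = {"GENE_C"}
mutated = {"GENE_Y", "GENE_C"}

candidates = candidate_genes(prognostic, amplified, deleted, mutated)
print(candidates)
```

The candidate list produced this way would then be passed to a feature-selection method such as a random forest to rank genes before fitting the Cox model.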
