Abstract

Identifying reliable predictive biomarkers and new therapeutic targets is a critical step toward significantly improving patient outcomes. Here, we developed a multi-step bioinformatics strategy that mines large omics and clinical data sets to build a prognostic scoring system for predicting the overall survival (OS) of lung adenocarcinoma (LuADC) patients. We first identified 1327 significantly and robustly deregulated genes, 600 of which were significantly associated with the OS of LuADC patients. Gene co-expression network analysis revealed the biological functions of these 600 genes in normal lung and LuADCs; they were enriched for cell cycle-related processes, blood vessel development, cell-matrix adhesion and metabolic processes. Finally, we combined a multiple resampling method with Cox regression analysis to identify a 27-gene signature associated with OS, and then created a prognostic scoring system based on this signature. This scoring system robustly predicted OS of LuADC patients in 100 sampling test sets and was further validated in four independent LuADC cohorts. In addition, our signature was significantly superior to existing published prognostic gene signatures in predicting OS of LuADC patients. In summary, our multi-omics and clinical data integration study created a 27-gene prognostic risk score that can predict OS of LuADC patients independent of age, gender and clinical stage. This score could guide therapeutic selection and allow stratification in clinical trials.
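
As an illustration of how a Cox-derived gene signature can be turned into a prognostic score, the sketch below uses a weighted sum of expression values and splits patients at the median score. This is a common construction, not necessarily the one used in this study: the abstract does not give the formula, and the gene names, coefficients, expression values and median cutoff here are all hypothetical.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Weighted sum of expression values.

    The weights stand in for Cox regression coefficients (assumed form;
    the study's actual scoring formula is not stated in the abstract).
    """
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)

def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical three-gene signature; a real signature would use fitted coefficients.
coefficients = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}

# Hypothetical normalized expression values for four patients.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
    "P2": {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 1.0},
    "P4": {"GENE_A": 3.0, "GENE_B": 0.5, "GENE_C": 2.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify(scores)
print(groups)
```

A positive coefficient raises the score as expression increases (HR > 1), while a negative coefficient lowers it (HR < 1), so a single number summarizes the whole signature for each patient.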

Highlights

  • Lung cancer is the leading cause of cancer-related death worldwide [1]. Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, and adenocarcinoma is its most common subtype

  • This resulted in a set of 1982 probe IDs (1374 down-regulated and 608 up-regulated) represented by 1327 unique genes (884 down-regulated and 543 up-regulated), which were consistently deregulated in all three datasets (Figure 1; Supporting Information Supplementary Table 1)

  • The effects of high or low expression levels on overall survival (OS) were examined using Kaplan-Meier survival curves and the log-rank test. This analysis identified 600 of the 1327 genes as significantly associated with OS: 406 genes had a hazard ratio (HR) < 1 and 194 genes had an HR > 1 (Supporting Information Supplementary Table 2)
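
The per-gene survival comparison described in the last highlight can be sketched as a two-group log-rank test. The version below is a minimal standard-library implementation under stated assumptions: patients are split into high and low expression groups at the median (the cutoff actually used is not given here), and the expression values, follow-up times and event flags are invented toy data.

```python
import math
from statistics import median

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events are 1 (death observed) or 0 (censored).
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    # Walk through each distinct time at which at least one death occurred.
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        n = n_a + n_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))

# Toy data (hypothetical): one gene's expression, follow-up in months, death flag.
expression = [5.1, 4.8, 6.0, 5.5, 4.9, 1.0, 1.2, 0.8, 1.5, 1.1]
months = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
died = [1] * 10

cutoff = median(expression)  # assumed median split into high/low expression
high = [i for i, x in enumerate(expression) if x > cutoff]
low = [i for i, x in enumerate(expression) if x <= cutoff]

chi2, p_value = logrank_test(
    [months[i] for i in high], [died[i] for i in high],
    [months[i] for i in low], [died[i] for i in low],
)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

In a genome-wide screen this test would be repeated for each candidate gene, so some form of multiple-testing control is usually applied to the resulting p-values.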

Summary

Introduction

Lung cancer is the leading cause of cancer-related death worldwide [1]. Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, and adenocarcinoma is its most common subtype. Patient stratification based on histopathological markers, immunohistochemistry and other molecular factors has been evaluated to improve treatment decisions in lung adenocarcinoma (LuADC) patients [4,5,6]. The availability of large cancer genomic data sets allows for unbiased approaches to identify multi-gene signatures important in tumor progression. A number of gene signatures derived from microarray analysis show promise for prognosis or prediction of response to therapy in NSCLC [11,12,13,14]. However, these signatures were based either on incomplete genome annotation or solely on existing knowledge. A new comprehensive and unbiased genome-wide screen for genes associated with lung cancer prognosis is therefore warranted.
