Abstract

Precise biomarker development is a key step in disease management. However, most published biomarkers were derived from relatively small numbers of samples using supervised approaches. Recent advances in unsupervised machine learning promise to leverage very large datasets to better predict disease biomarkers. The denoising autoencoder (DA) is an unsupervised deep learning algorithm and a stochastic variant of the autoencoder. Its principle is to force the hidden layer of the autoencoder to capture more robust features by reconstructing a clean input from a corrupted one. Here, we applied a DA model to integrated transcriptomic data from 13 published lung cancer studies, comprising 1916 human lung tissue samples. Using DA, we discovered a molecular signature composed of multiple genes for lung adenocarcinoma (ADC). In independent validation cohorts, the proposed signature proved to be an effective classifier of lung cancer histological subtypes. It also predicted clinical outcome in lung ADC independently of traditional prognostic factors. More importantly, it showed superior prognostic power compared with other published prognostic genes. Our study suggests that unsupervised learning is helpful for biomarker development in the era of precision medicine.
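To make the DA principle concrete, the following is a minimal sketch of a single-hidden-layer denoising autoencoder on toy binary data. It is not the model used in the study: the masking-noise corruption, tied weights, cross-entropy loss, layer sizes, learning rate, and all names (`corrupt`, `DenoisingAutoencoder`, `demo`) are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def corrupt(x, rate, rng):
    """Masking noise: zero each feature independently with probability `rate`."""
    return [0.0 if rng.random() < rate else xi for xi in x]


class DenoisingAutoencoder:
    """Toy DA with one hidden layer and tied weights (decoder reuses W)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_visible)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden   # hidden-layer bias
        self.c = [0.0] * n_visible  # reconstruction bias

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + ci)
                for i, ci in enumerate(self.c)]

    def loss(self, x):
        """Cross-entropy between a clean input and its reconstruction."""
        z = self.decode(self.encode(x))
        eps = 1e-12
        return -sum(xi * math.log(zi + eps) + (1 - xi) * math.log(1 - zi + eps)
                    for xi, zi in zip(x, z))

    def train_step(self, x, rate, lr):
        # Encode the CORRUPTED input, but score the output against the CLEAN one.
        x_tilde = corrupt(x, rate, self.rng)
        h = self.encode(x_tilde)
        z = self.decode(h)
        # Gradient of the cross-entropy w.r.t. the output pre-activations.
        dz = [zi - xi for zi, xi in zip(z, x)]
        # Backpropagate to the hidden pre-activations (before touching W).
        dh = [sum(self.W[j][i] * dz[i] for i in range(len(x))) * h[j] * (1 - h[j])
              for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(x)):
                # Tied weights: decoder term plus encoder term.
                self.W[j][i] -= lr * (dz[i] * h[j] + dh[j] * x_tilde[i])
            self.b[j] -= lr * dh[j]
        for i in range(len(x)):
            self.c[i] -= lr * dz[i]


def demo(epochs=500, rate=0.3, lr=0.2):
    """Train on two toy patterns; return mean clean-input loss before and after."""
    data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
    model = DenoisingAutoencoder(n_visible=6, n_hidden=2)
    before = sum(model.loss(x) for x in data) / len(data)
    for _ in range(epochs):
        for x in data:
            model.train_step(x, rate, lr)
    after = sum(model.loss(x) for x in data) / len(data)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("mean reconstruction loss before/after training:", before, after)
```

The key design point is in `train_step`: the hidden layer only ever sees the corrupted vector, while the loss compares the reconstruction with the clean vector, so features that merely copy individual inputs are penalized. In a transcriptomic setting each input feature would correspond to a gene's normalized expression value, and the per-gene weights of a hidden node are what one would inspect when relating nodes to phenotypes.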

Highlights

  • Lung cancer is the most frequently diagnosed cancer and the leading cause of cancer death worldwide [1,2]

  • We identified important denoising autoencoder (DA) hidden nodes that were related to clinical phenotypes and constructed a multi-gene molecular signature from these nodes

  • We obtained 13 lung cancer transcriptome datasets from the Gene Expression Omnibus (GEO) database [36], which were all based on the Affymetrix Human Genome U133 Plus 2.0 Array (Table S1)


Summary

Introduction

Lung cancer is the most frequently diagnosed cancer and the leading cause of cancer death worldwide [1,2]. Combinations of mRNA, microRNA, and DNA sequencing with copy number, methylation, and proteome analyses have revealed a comprehensive molecular profile of lung ADC [12]. Based on these molecular profiling data and clinical phenotype data, many biomarker sets have been identified that improve the diagnosis or prognosis of lung ADC [15,16,17,18,19,20,21,22,23]. We proposed another set of 13 ion channel genes as an overall diagnostic biomarker set to differentiate lung cancer subtypes [17]. These studies provide a foundation for classification, outcome prediction, and treatment guidance of lung ADC.
