Abstract

Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data.

Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression.

Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.
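To make the first stage of this framework concrete, the following is a minimal NumPy sketch of a one-hidden-layer denoising autoencoder with 50 hidden units, followed by a ranking of the genes that contribute most to each hidden unit. It is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the masking-noise rate, sigmoid activations, squared-error loss, learning rate and epoch count are arbitrary choices, and using the absolute encoder weight as a gene's "contribution" is an assumption, since the abstract does not define the measure. The names `train_dae`, `encode` and `top_genes_per_unit` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden=50, mask_frac=0.1, lr=0.05, epochs=300, seed=0):
    """Full-batch gradient descent for a one-hidden-layer denoising autoencoder.

    X is (n_samples, n_genes) with values scaled to [0, 1].
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, g = X.shape
    W1 = rng.normal(0.0, 0.01, (g, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.01, (n_hidden, g))
    b2 = np.zeros(g)
    losses = []
    for _ in range(epochs):
        # Corrupt the input with masking noise: zero a random subset of entries.
        Xc = X * (rng.random(X.shape) >= mask_frac)
        H = sigmoid(Xc @ W1 + b1)   # hidden representation
        R = sigmoid(H @ W2 + b2)    # reconstruction
        err = R - X                 # compare against the CLEAN input
        losses.append(float((err ** 2).sum(axis=1).mean()))
        # Backpropagate the squared-error loss (mean over samples).
        dZ2 = (2.0 / n) * err * R * (1.0 - R)
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (Xc.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2, "losses": losses}

def encode(X, model):
    """Embed uncorrupted samples into the hidden-unit space."""
    return sigmoid(X @ model["W1"] + model["b1"])

def top_genes_per_unit(W1, k=10):
    """Indices of the k genes with the largest absolute encoder weight per unit.

    Treating |weight| as the gene's contribution is an assumption.
    """
    order = np.argsort(-np.abs(W1), axis=0)  # sort genes within each hidden unit
    return order[:k].T                       # shape: (n_hidden, k)

# Synthetic stand-in for an expression matrix: 60 "patients" x 100 "genes".
data_rng = np.random.default_rng(1)
Z = data_rng.normal(size=(60, 3))            # three latent factors
A = data_rng.normal(size=(3, 100))
c = data_rng.normal(size=100)                # per-gene baseline
X = sigmoid(Z @ A + c)

model = train_dae(X)
embedding = encode(X, model)                 # (60, 50) low-dimensional embedding
top_genes = top_genes_per_unit(model["W1"], k=10)
```

In a pipeline like the one described, `embedding` would be the input to hierarchical clustering, and the gene indices in `top_genes` would be candidate features for the downstream random-forest classifier and support-vector-machine regression.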

Highlights

  • The pathogenesis of asthma is a complex process involving multiple genes and pathways

  • We argue that (1) this simple structure can retrieve components that have biological relevance and are explainable, (2) the hidden units produce clearer patterns than the raw data to categorize patients into heterogeneous groups, and (3) components of the clinically relevant hidden units may contain genes that are functionally associated with the pathogenesis of asthma, serving as potential sources for biomarker discovery

  • Hidden units associate with transcriptomic endotype of asthma (TEA) clusters: by encoding the original data into the hidden vector space, the model produced a sparse embedding space; we could observe distinct patterns related to clinical traits (Fig. 2a)

Summary

Introduction

The pathogenesis of asthma is a complex process involving multiple genes and pathways. Among asthma patients identified as severe, subpopulations with diverse pathogenicity may exist that respond differently to medications [3]. Several researchers have investigated aspects of asthma heterogeneity and tried to identify subgroups based on different types of indicators. Yan et al. [7] identified three transcriptomic endotypes of asthma (TEAs) using unsupervised clustering on gene expression of induced sputum of asthma patients, demonstrating the predictive potential of molecular profiles on disease phenotypes. Each of these studies tried to interpret the identified subgroups by investigating how they associate with disease phenotypes, but did not explicitly evaluate their association with disease severity. As many asthma subgroups contain a non-trivial proportion of severe patients, work is needed to further characterize specific genes or pathways that lead to more severe phenotypes within each patient subgroup.
