Transcriptomic analyses have advanced the understanding of complex disease pathophysiology, including that of chronic obstructive pulmonary disease (COPD). However, the identification of relevant biological causative factors has been limited by the difficulty of integrating high-dimensional data. COPD is characterized by lung destruction and inflammation, with smoke exposure being a major risk factor. To define novel biological mechanisms in COPD, we applied unsupervised and supervised interpretable machine learning analyses to single-cell RNA sequencing data from the gold-standard mouse smoke exposure model to identify significant latent factors (context-specific co-expression modules) impacting pathophysiology. The machine learning transcriptomic signatures, coupled to protein networks, uncovered a reduction in network complexity and novel biological alterations in actin-associated gelsolin (GSN), which was transcriptionally linked to disease state. GSN was altered in airway epithelial cells in the mouse model and in human COPD. GSN was increased in plasma from COPD patients, and smoke exposure resulted in enhanced GSN release from airway cells from COPD patients. This method provides insights into the rewiring of transcriptional networks associated with COPD pathogenesis and provides a novel analytical platform for other diseases.