Abstract
Idiopathic pulmonary fibrosis is a progressive fibrotic lung disease with a variable clinical trajectory. Decline in forced vital capacity (FVC) is the main indicator of progression; however, missingness prevents long-term analysis of patterns in lung function. We aimed to identify distinct clusters of lung function trajectory among patients with idiopathic pulmonary fibrosis using machine learning techniques.

We did a secondary analysis of longitudinal data on FVC collected from a cohort of patients with idiopathic pulmonary fibrosis from the PROFILE study, a multicentre, prospective, observational cohort study. We evaluated the performance of conventional and machine learning techniques in imputing missing data and then analysed the fully imputed dataset by unsupervised clustering using self-organising maps. We compared anthropometric features, genomic associations, serum biomarkers, and clinical outcomes between clusters. We also replicated the analysis on data from an independent cohort of patients with idiopathic pulmonary fibrosis, obtained from the Chicago Consortium.

415 (71%) of 581 participants recruited into the PROFILE study were eligible for further analysis. An unsupervised machine learning algorithm had the lowest imputation error among tested methods, and self-organising maps identified four distinct clusters (1-4), a finding that was confirmed by sensitivity analysis. Cluster 1 comprised 140 (34%) participants and was associated with a disease trajectory showing a linear decline in FVC over 3 years. Cluster 2 comprised 100 (24%) participants and was associated with a trajectory showing an initial improvement in FVC followed by a decline. Cluster 3 comprised 113 (27%) participants and was associated with a trajectory showing an initial decline in FVC before subsequent stabilisation. Cluster 4 comprised 62 (15%) participants and was associated with a trajectory showing stable lung function. Median survival was shortest in cluster 1 (2·87 years [IQR 2·29-3·40]) and cluster 3 (2·23 years [1·75-3·84]), followed by cluster 2 (4·74 years [3·96-5·73]), and was longest in cluster 4 (5·56 years [5·18-6·62]). Baseline FEV1 to FVC ratio and concentrations of the biomarker SP-D were significantly higher in clusters 1 and 3. Similar lung function clusters with some shared anthropometric features were identified in the replication cohort.

Using a data-driven unsupervised approach, we identified four clusters of lung function trajectory with distinct clinical and biochemical features. Enriching or stratifying longitudinal spirometric data into clusters might optimise evaluation of intervention efficacy during clinical trials and patient management.

Funded by the National Institute for Health and Care Research, the Medical Research Council, and GlaxoSmithKline.
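The two-step pipeline described above (impute missing FVC values, then cluster the completed trajectories with a self-organising map) can be illustrated with a minimal sketch. This is not the study's implementation: the abstract does not name the imputation algorithm, so linear interpolation over visits is used here purely as a stand-in, the data are synthetic, and every function name, parameter value, and trajectory shape below is a hypothetical choice made for illustration.

```python
import numpy as np


def interpolate_missing(traj):
    """Fill NaNs in each row by linear interpolation over visit index.

    Stand-in imputer only; the study compared several methods and the
    abstract does not specify which algorithm performed best.
    """
    filled = traj.copy()
    visits = np.arange(traj.shape[1])
    for row in filled:  # each row is a view, so edits land in `filled`
        known = ~np.isnan(row)
        row[~known] = np.interp(visits[~known], visits[known], row[known])
    return filled


def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small self-organising map; returns one weight vector per node."""
    rng = np.random.default_rng(seed)
    n_nodes = rows * cols
    # Grid position of each node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen trajectories.
    weights = data[rng.choice(len(data), n_nodes, replace=False)].copy()
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1 - frac)                 # decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3    # shrinking neighbourhood
            x = data[idx]
            # Best-matching unit: node whose weights are closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_clusters(data, weights):
    """Label each trajectory with the index of its nearest SOM node."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


# Synthetic trajectories (arbitrary units), loosely shaped like the four
# patterns named in the abstract. These are NOT study data.
rng = np.random.default_rng(42)
t = np.linspace(0, 3, 7)  # seven hypothetical visits over 3 years
shapes = np.stack([
    100 - 8 * t,                       # linear decline
    100 + 6 * t - 3 * t ** 2,          # early rise, then fall
    100 - 15 * (1 - np.exp(-2 * t)),   # early fall, then plateau
    np.full_like(t, 100.0),            # stable
])
fvc = np.repeat(shapes, 20, axis=0) + rng.normal(0, 1.5, (80, 7))

# Knock out roughly 20% of values at random, always keeping the baseline visit.
mask = rng.random(fvc.shape) < 0.2
mask[:, 0] = False
observed = np.where(mask, np.nan, fvc)

filled = interpolate_missing(observed)
weights = train_som(filled)
labels = assign_clusters(filled, weights)
```

A 2 × 2 map is chosen here only so that the number of nodes matches the four clusters reported; in practice the map size and the number of clusters would be selected and checked by sensitivity analysis, as the abstract indicates the authors did.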