Missing Data Research Articles

Chronic obstructive pulmonary disease affects nearly 400 million worldwide - over a million in the United Kingdom - and is the third leading cause of death. However, there is limited understanding of what prompts a diagnosis, how long this takes from symptom onset and the different approaches to clinical management by primary care professionals. Map out the clinical management and National Health Service contacts from symptom presentation to chronic obstructive pulmonary disease diagnosis and first acute exacerbation of chronic obstructive pulmonary disease in three time periods; construct risk prediction for first acute exacerbation of chronic obstructive pulmonary disease. Retrospective cohort study and cross-sectional survey. Primary care. Patients with incident chronic obstructive pulmonary disease aged > 35 years in England. None. First acute exacerbation of chronic obstructive pulmonary disease. Clinical Practice Research Datalink Aurum; new online survey. Forty thousand five hundred and seventy-seven patients were diagnosed between April 2006 and March 2007 (cohort 1), 48,249 between April 2016 and March 2017 (cohort 2) and 4752 between March and August 2020 (cohort 3). The mean (standard deviation) age was 68.3 years (12.0); 47.3% were female. Around three-quarters were diagnosed in primary care, with a slight fall in cohort 3. Compliance with National Institute for Health and Care Excellence diagnostic guidelines was slightly higher in cohorts 2 and 3 for all patients; 35.8% (10.0% in the year before diagnosis) had all four elements met for all cohorts combined. Multilevel modelling showed considerable between-practice variation in spirometry. The survey on the charity website had 156 responses by chronic obstructive pulmonary disease patients. Many respondents had not heard of the condition, hoped the symptoms would go away and identified various healthcare-related barriers to earlier diagnosis. Clinical Practice Research Datalink analysis showed notable changes in post-diagnosis prescribing from cohort 1 to 2, such as increases in long-acting muscarinic antagonist (21.7-46.3%). Triple therapy rose from 2.9% in cohort 2 to 11.1% in cohort 3. Documented pulmonary rehabilitation rose from just 0.8% in cohort 1 to 13.7% in cohort 2 and 20.9% in cohort 3. For all patients combined, the median time to first acute exacerbation of chronic obstructive pulmonary disease in patients who had one was 1.4 years in cohorts 1 and 2. Acute exacerbation of chronic obstructive pulmonary disease prediction models identified some consistent predictors, such as age, deprivation, severity, comorbidities, post-diagnosis spirometry and annual review. Models without post-diagnosis general practitioner actions had a c-statistic of around 0.70; the highest c-statistic was 0.81, for cohort 2 with post-diagnosis general practitioner actions and 6-month follow-up. All models had good calibration. The three most important predictors in terms of their population attributable risks were being a current smoker and offered smoking cessation advice (32.8%), disease severity (30.6%) and deprivation (15.4%). The highest population attributable risks for variables with adjusted hazard ratios < 1 were chronic obstructive pulmonary disease review (-27.3%) and flu vaccination (-26.6%). Symptom recording and chronic obstructive pulmonary disease diagnosis vary between practice; predicted forced expiratory volume in 1 second had many missing values. There has been some improvement over time in chronic obstructive pulmonary disease diagnosis and management, with large changes in prescribing, though patient and system barriers to further improvement exist. Data available to general practitioners cannot generate risk prediction models with sufficient accuracy. It will be important to expand the COVID-era cohort with longer follow-up and augment general practitioner data for better prediction. This study is registered as Researchregistry.com: researchregistry4762. This award was funded by the National Institute for Health and Care Research (NIHR) Health and Social Care Delivery Research programme (NIHR award ref: 17/99/72) and is published in full in Health and Social Care Delivery Research; Vol. 12, No. 43. See the NIHR Funding and Awards website for further award information.

Read full abstract

Abstract Introduction: Among the Hispanic/Latinx population in the US, four of five leading causes of death are attributed to smoking. Few evidence-based interventions have been developed to improve smoking cessation for this population. Our team has demonstrated the efficacy of a culturally targeted, extended self-help intervention for Spanish-speaking smokers, estimating a smoking abstinence rate of 33% at 24 months, compared to 24% for usual care. Further efforts are needed to enhance this efficacious, low-cost, scalable intervention approach. Machine learning (ML) is one approach to increase understanding of the predictors of treatment outcomes and inform strategies to improve the intervention. Toward that goal, this secondary data analysis utilized the decision tree (DT) model to predict self-reported 7-day point prevalence abstinence at 18-month follow-up for participants receiving our intervention. Method: Data from participants who reported smoking status at an 18-month follow-up (N=332 of 714 enrolled, 36% abstinent) were randomly split (80:20) into training and test datasets based on smoking status. We entered demographics, psychosocial, and smoking variables collected at baseline as predictors in the model. In addition to handling missing values and normalizing numeric features, we employed the Synthetic Minority Over-sampling Technique (SMOTE) combined with Tomek links (SMOTE-Tomek) to improve inter-class separability. Recursive Feature Elimination (RFE) using DT was implemented to identify the most relevant predictors. The cross-validation (CV) pipeline incorporated preprocessing, feature selection, handling class-imbalance, over-sampling, and model training with a DT as the interpretable classifier. Hyperparameters were tuned using stratified K-fold (K=5) CV to optimize parameters using Grid search. F1 score was selected as the primary metric as it accounts for each class performance by combining the precision and recall metrics. Finally, the performance of the classifier was evaluated in 20% of the unseen dataset. Results: The DT classifier for the CV achieved the F1 score of 0.73 [95% CI 0.66, 0.77], revealing a reasonably good performance in identifying smokers at 18-months. RFE selected familism, age, affect, and confidence as the most relevant predictors of smoking status. Individuals who had lower familism scores (&lt;138), combined with higher age (&gt;50), were more likely to be smokers at 18 months. Individuals who had higher familism scores (&gt;138) combined with higher negative affect (&gt;42) were more likely to be smokers. Conclusion: This study provides the first step toward personalized care for smoking cessation among the Hispanic/Latinx population and demonstrates the potential of DT classifier to predict cessation outcomes among individuals who received and completed a culturally and linguistically targeted smoking cessation intervention. Our future analysis will compare this model with conventional statistical modeling approaches and other ML algorithms to identify the best-performing parsimonious model. Citation Format: Ranjita Poudel, K. Ruwani M. Fernando, Matthew B. Schabath, Steven K. Sutton, Thomas H. Brandon, Issam El Naqa, Vani N. Simmons. A machine learning approach to predicting smoking cessation outcomes among Spanish-speaking smokers who completed a culturally targeted intervention [abstract]. In: Proceedings of the 17th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2024 Sep 21-24; Los Angeles, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2024;33(9 Suppl):Abstract nr B019.

Read full abstract

Missing Data Research Articles

Related Topics

Articles published on Missing Data

A probabilistic framework for identifying anomalies in urban air quality data.

What happens between first symptoms and first acute exacerbation of COPD - observational study of routine data and patient survey.

Temporal Autoregressive Matrix Factorization for High-Dimensional Time Series Prediction of OSS.

Preinjury and Event-Related Characteristics of Pediatric Firearm Injuries: The American College of Surgeons Firearm Study, United States, March 2021‒February 2022.

Bayesian merged utilization of GRAPPA and SENSE (BMUGS) for in-plane accelerated reconstruction increases fMRI detection power

Classification of Air Pollutant Index on Data with Outliers and Imbalance Class Problem

Predicting the Optimal Treatment for Diseases Using Whale Optimization Algorithm

청소년이 인식하는 혐오표현에 관한 연구

A New Robust Lunar Landing Selection Method Using the Bayesian Optimization of Extreme Gradient Boosting Model (BO-XGBoost)

An automated approach to predict diabetic patients using KNN imputation and effective data mining techniques

Spatial interpolation of global DEM using federated deep learning

Gender-based differences in the association of self-reported sleep duration with cardiovascular disease and diabetes

Disparities in the diagnostic efficacy of radiomics models in predicting various degrees of cognitive impairment in patients with cerebral small vessel disease

Validation of the Mediating Effect of Parenting Stress in the Relationship between Work-Family Strains and Gains and Marital Satisfaction for Fathers in Korea.

Comparative Analysis of Imputation Methods for Enhancing Predictive Accuracy in Data Models

Implementing Stacking with Cross - Validation for Heart Disease Prediction.

Reconstructing daytime and nighttime MODIS land surface temperature in desert areas using multi-channel singular spectrum analysis

CMFHMDA: a prediction framework for human disease-microbe associations based on cross-domain matrix factorization.

A Method to Handle the Missing Values in Multi-Criteria Sorting Problems Based on Dominance Rough Sets

Abstract B019: A machine learning approach to predicting smoking cessation outcomes among Spanish-speaking smokers who completed a culturally targeted intervention

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Missing Data Research Articles

Related Topics

Articles published on Missing Data

A probabilistic framework for identifying anomalies in urban air quality data.

What happens between first symptoms and first acute exacerbation of COPD - observational study of routine data and patient survey.

Temporal Autoregressive Matrix Factorization for High-Dimensional Time Series Prediction of OSS.

Preinjury and Event-Related Characteristics of Pediatric Firearm Injuries: The American College of Surgeons Firearm Study, United States, March 2021‒February 2022.

Bayesian merged utilization of GRAPPA and SENSE (BMUGS) for in-plane accelerated reconstruction increases fMRI detection power

Classification of Air Pollutant Index on Data with Outliers and Imbalance Class Problem

Predicting the Optimal Treatment for Diseases Using Whale Optimization Algorithm

청소년이 인식하는 혐오표현에 관한 연구

A New Robust Lunar Landing Selection Method Using the Bayesian Optimization of Extreme Gradient Boosting Model (BO-XGBoost)

An automated approach to predict diabetic patients using KNN imputation and effective data mining techniques

Spatial interpolation of global DEM using federated deep learning

Gender-based differences in the association of self-reported sleep duration with cardiovascular disease and diabetes

Disparities in the diagnostic efficacy of radiomics models in predicting various degrees of cognitive impairment in patients with cerebral small vessel disease

Validation of the Mediating Effect of Parenting Stress in the Relationship between Work-Family Strains and Gains and Marital Satisfaction for Fathers in Korea.

Comparative Analysis of Imputation Methods for Enhancing Predictive Accuracy in Data Models

Implementing Stacking with Cross - Validation for Heart Disease Prediction.

Reconstructing daytime and nighttime MODIS land surface temperature in desert areas using multi-channel singular spectrum analysis

CMFHMDA: a prediction framework for human disease-microbe associations based on cross-domain matrix factorization.

A Method to Handle the Missing Values in Multi-Criteria Sorting Problems Based on Dominance Rough Sets

Abstract B019: A machine learning approach to predicting smoking cessation outcomes among Spanish-speaking smokers who completed a culturally targeted intervention