Abstract

Background
Although measles is still rare in the United States (U.S.), recent outbreaks signal a resurgence. Because the rarity of measles events makes accurate prediction difficult, we used machine learning (ML) algorithms to predict measles cases at the U.S. county level.

Methods
The main outcome was the occurrence of ≥1 measles case in a U.S. county. Two ML prediction models, HDBSCAN (a clustering algorithm) and XGBoost (a gradient boosting algorithm), were developed and compared with traditional logistic regression. We included 28 predictors in the following categories: sociodemographics, population statistics, measles vaccination coverage, healthcare access, and exposure to measles via international air travel. The models were trained on 2014 case data and validated on 2018 case data. Models were compared using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and F2 score (a combined measure of sensitivity and PPV).

Results
There were 667 measles cases in the U.S. in 2014 and 375 in 2018. Through published sources, we identified the U.S. county for 635 (95.2%) cases in 2014 and 366 (97.6%) cases in 2018, corresponding to 81/3143 (2.6%) counties in 2014 and 64/3143 (2.0%) counties in 2018 with ≥1 measles case. HDBSCAN had the highest sensitivity (0.92) but the lowest AUC (0.68) and PPV (0.04) (Table). XGBoost had the highest F2 score (0.49), the best balance of sensitivity (0.72) and specificity (0.94), and an AUC of 0.92. Logistic regression had a high AUC (0.91) and specificity (1.00) but the lowest sensitivity (0.16).

Conclusion
Machine learning approaches outperformed logistic regression in sensitivity for predicting counties with measles cases, an important criterion when preventing or preparing for future outbreaks. XGBoost or logistic regression could be considered to maximize specificity.
Prioritizing sensitivity versus specificity may depend on county resources, priorities, and measles risk. Different modeling approaches could be considered to optimize surveillance efforts and develop effective interventions for a timely response.

Disclosures
Stephanie Kujawski, PhD, MPH, Merck & Co., Inc. (Employee, Shareholder)
Boshu Ru, PhD, Merck & Co., Kenilworth, NJ (NYSE: MRK) (Employee, Shareholder)
Amar K. Das, MD, PhD, Merck (Employee)
Richard Baumgartner, PhD, Merck (Employee)
Shuang Lu, MBA, MS, Merck (Employee)
Matthew Pillsbury, PhD, Merck & Co. (Employee, Shareholder)
Joseph Lewnard, PhD, Merck (Consultant, Grant/Research Support)
James H. Conway, MD, FAAP, GSK (Advisor or Review Panel member); Merck (Advisor or Review Panel member); Moderna (Advisor or Review Panel member); Pfizer (Advisor or Review Panel member); Sanofi Pasteur (Research Grant or Support)
Manjiri D. Pawaskar, PhD, Merck & Co., Inc. (Employee, Shareholder)
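
The Methods compare models by AUC, sensitivity, specificity, PPV, and F2 score. The sketch below is illustrative only, not the authors' code. It computes these metrics from county-level labels (1 = county with ≥1 measles case) and model outputs. F2 is the F-beta score with beta = 2, F2 = 5 · PPV · sensitivity / (4 · PPV + sensitivity), which weights sensitivity more heavily than PPV. AUC is computed with the rank-based (Mann-Whitney) formulation. A small helper also shows why PPV stays low when only about 2% of counties have cases, even for a fairly accurate model. All function names, the 0.45 threshold, and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics for binary labels (1 = county with >=1 measles case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0  # a.k.a. precision
    # F2 = F-beta with beta = 2: (1 + 4) * PPV * sens / (4 * PPV + sens).
    denom = 4.0 * ppv + sensitivity
    f2 = 5.0 * ppv * sensitivity / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f2": f2}


def rank_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive county outscores random negative county); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative county")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: expected PPV given sensitivity, specificity, and outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical toy data: 10 counties, 2 of which had cases.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.5, 0.2]
    y_pred = [1 if s >= 0.45 else 0 for s in scores]  # hypothetical cutoff
    m = classification_metrics(y_true, y_pred)
    print(m["sensitivity"], m["specificity"], round(m["f2"], 3))  # -> 0.5 0.75 0.455
    print(rank_auc(y_true, scores))  # -> 0.875
    # Even 90% sensitivity and 90% specificity give a low PPV at 2% prevalence.
    print(round(ppv_from_prevalence(0.9, 0.9, 0.02), 3))  # -> 0.155
```

AUC summarizes ranking quality across all thresholds, whereas sensitivity, specificity, PPV, and F2 depend on the chosen cutoff. This is why a model can show a high AUC alongside low sensitivity at its operating threshold.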
