Abstract

To continue closing the gap between predictive modeling and its real-world application, we report a new data-to-prediction pipeline that advances the state-of-the-art predictive performance for body mass index (BMI) classifications by integrating siloed claims databases via a common data model. This study adapted the ensemble-based methodology of the baseline prediction model and focused on removing the silos between the claims databases. We applied the Super Learner machine learning algorithm (SLA) to a combined dataset consisting of 50% data from the Optum Date of Death database and 50% data from the IBM MarketScan Commercial Claims and Encounters (CCAE) database. In the feature engineering process, we omitted the commonly used one-hot-encoding step and used multi-categorical variables directly. The resulting models were optimized via a standard cross-validation scheme, and their performance was evaluated on a holdout test set. Sociodemographic and clinical characteristics were used with (denoted as SLA1) and without (denoted as SLA2) baseline BMI values to predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m²). The newly implemented SLA1 performed similarly to the previous model, with an area under the receiver operating characteristic curve (ROC AUC) of approximately 88% for all BMI classifications, specificity ranging from 90% to 96%, and accuracy ranging from 88% to 93%. The new SLA2, in contrast, achieved consistently better performance than the previous model on all metrics across all BMI classes. In particular, its ROC AUC reached 77-79%, up from the previously reported level of 73%. Its specificity improved from 71-86% to 76-90%, its accuracy from 73-80% to 77-86%, and its recall (i.e., sensitivity) from 60-76% to 64-78%. This study demonstrates consistent improvements in the prediction of BMI classifications using integrated databases in a common data model for the generation of real-world evidence.
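For readers less familiar with the reported metrics, the following minimal Python sketch shows how sensitivity (recall), specificity, accuracy, and ROC AUC could be computed for one BMI classification threshold. It is an illustration only, not the study's pipeline: the function names, the 0.5 decision cutoff, and the toy BMI values and model scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def bmi_labels(bmi_values: Sequence[float], threshold: float) -> List[int]:
    """Binary label: 1 if BMI (kg/m^2) is at or above the threshold, else 0."""
    return [1 if b >= threshold else 0 for b in bmi_values]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # recall: share of true positives found
        "specificity": tn / (tn + fp),  # share of true negatives found
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: observed BMI values and model scores for BMI >= 30.
observed_bmi = [24.0, 31.5, 36.2, 28.9, 41.0, 33.3]
model_scores = [0.10, 0.80, 0.65, 0.55, 0.90, 0.40]

y_true = bmi_labels(observed_bmi, threshold=30.0)
y_pred = [1 if s >= 0.5 else 0 for s in model_scores]  # assumed 0.5 cutoff

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, model_scores)
print(metrics, auc)
```

The same functions could be applied to the ≥ 35 and ≥ 40 kg/m² classifications by changing `threshold`. Note that sensitivity, specificity, and accuracy depend on the chosen cutoff, whereas ROC AUC summarizes ranking quality across all cutoffs.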
