Abstract

The prevalence of metabolic syndrome is rapidly increasing in the United States. We hypothesized that prediction models using data obtained during pregnancy can accurately predict the future development of metabolic syndrome. This study aimed to develop machine learning models to predict the development of metabolic syndrome using factors ascertained in nulliparous pregnant individuals.

This was a secondary analysis of a prospective cohort study (Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be Heart Health Study [nuMoM2b-HHS]). Data were collected from October 2010 to October 2020 and analyzed from July 2023 to October 2023. Participants had in-person visits 2 to 7 years after their first delivery. The primary outcome was metabolic syndrome, defined by the National Cholesterol Education Program Adult Treatment Panel III criteria and measured 2 to 7 years after delivery. A total of 127 variables obtained during pregnancy were evaluated. The data set was randomly split into a training set (70%) and a test set (30%). We developed a random forest model and a lasso regression model using these variables and compared their areas under the receiver operating characteristic curve. Starting from the model with the higher area under the receiver operating characteristic curve, we developed models with fewer variables, selected by SHAP (SHapley Additive exPlanations) values, and compared them with the original model. The final model was to be the one with fewer variables and a noninferior area under the receiver operating characteristic curve.

A total of 4225 individuals met the inclusion criteria; the mean (standard deviation) age was 27.0 (5.6) years. Of these, 754 (17.8%) developed metabolic syndrome.
The area under the receiver operating characteristic curve of the random forest model was 0.878 (95% confidence interval, 0.846-0.909), which was higher than the 0.850 of the lasso model (95% confidence interval, 0.811-0.888; P<.001). Therefore, random forest models using fewer variables were developed. The random forest model with the top 3 variables (high-density lipoprotein, insulin, and high-sensitivity C-reactive protein) was chosen as the final model because its area under the receiver operating characteristic curve was 0.867 (95% confidence interval, 0.839-0.895), which was not inferior to that of the original model (P=.08). The area under the receiver operating characteristic curve of the final model in the test set was 0.847 (95% confidence interval, 0.821-0.873). An online application of the final model was developed (https://kawakita.shinyapps.io/metabolic/). We developed a model that can accurately predict the development of metabolic syndrome within 2 to 7 years after delivery.
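
The evaluation steps described above (a random 70%/30% split, then an area under the receiver operating characteristic curve with a 95% confidence interval) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the function names `train_test_split`, `auc`, and `bootstrap_ci` are hypothetical, and the percentile bootstrap is a stand-in, since the abstract does not state how the confidence intervals were computed. It does not reproduce the random forest, lasso, or SHAP steps.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly split rows into (train, test); 70%/30% by default."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic cohort: roughly 18% positives, one noisy risk score per person.
rng = random.Random(42)
rows = []
for _ in range(300):
    y = 1 if rng.random() < 0.18 else 0
    rows.append((y, rng.gauss(1.0 if y else 0.0, 1.0)))

train, test = train_test_split(rows)
test_labels = [y for y, _ in test]
test_scores = [s for _, s in test]
print("test AUC:", auc(test_labels, test_scores))
print("95% CI:", bootstrap_ci(test_labels, test_scores))
```

In a real analysis the scores would come from a model fitted on the training set only, and the test set would be used once, for the final estimate.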
