Abstract

Objective: To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy.

Methods: Indicators related to GDM were identified through a literature review and expert discussion. Pregnant women who attended medical institutions for antenatal examinations from November 2017 to August 2018 were selected, and the collected indicators were analyzed retrospectively. Using Python, the indicators were classified and modeled with a random forest regression algorithm, and the performance of the prediction model was evaluated.

Results: We obtained 4806 analyzable samples from 1625 pregnant women. Of these, 3265 samples containing all 67 indicators formed data set F1, and all 4806 samples, restricted to the 38 indicators they had in common, formed data set F2. F1 and F2 were each used to train the random forest algorithm. The F1 model had an overall predictive accuracy of 93.10%, an area under the receiver operating characteristic curve (AUC) of 0.66, and a predictive accuracy for GDM-positive cases of 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The F2 prediction model thus performed better than the F1 model. To explore the impact of the indicators omitted from F2 on GDM prediction, data set F3 was built from the 3265 samples of F1 restricted to the 38 indicators of F2. After training, the F3 model had an overall predictive accuracy of 91.60%, an AUC of 0.58, and a predictive accuracy for positive cases of 15.85%.

Conclusions: A model for predicting GDM from several groups of input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model performed well and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, applying the random forest algorithm to the early prediction of GDM places certain requirements on the proportions of negative and positive cases in the sample data set.
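
The reported gap between overall accuracy and accuracy on positive cases (for example, 93.10% versus 37.10% for the F1 model) is consistent with a data set in which negative cases dominate. The following minimal Python sketch uses made-up labels, not the study's data, to show how the three metrics can diverge. It assumes that "predictive accuracy of positive cases" means the share of true GDM cases that the model flags (sensitivity), which the abstract does not define. The function names and the 95:5 class split are illustrative only.

```python
from typing import Sequence


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def positive_case_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly positive samples predicted positive (assumed meaning)."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(1 for p in preds_for_positives if p == 1) / len(preds_for_positives)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 95 negative and 5 positive cases.
y_true = [0] * 95 + [1] * 5
# Hypothetical model output: only one of the five positives is flagged.
y_pred = [0] * 95 + [1, 0, 0, 0, 0]
scores = [0.1] * 95 + [0.9, 0.1, 0.1, 0.1, 0.1]

print(f"overall accuracy:       {overall_accuracy(y_true, y_pred):.2f}")        # 0.96
print(f"positive-case accuracy: {positive_case_accuracy(y_true, y_pred):.2f}")  # 0.20
print(f"AUC:                    {auc(y_true, scores):.2f}")                     # 0.60
```

In this toy case the model is right on 96% of samples while finding only one in five positives, which is why overall accuracy alone can be a misleading summary when positive cases are rare.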

Highlights

  • Gestational diabetes mellitus (GDM) refers to abnormal glucose tolerance and persistent high blood glucose concentration during pregnancy

  • Applying the random forest algorithm to the early prediction of GDM places certain requirements on the proportions of negative and positive cases in the sample data set

  • The following indicators related to gestational diabetes were identified: physical examination results; medical history (hypertension, diabetes, heart disease, histories of abnormal pregnancy, polycystic ovary syndrome (PCOS), etc.); personal history; family history; specialist examination results; and laboratory indicators


Summary

Introduction

Gestational diabetes mellitus (GDM) refers to abnormal glucose tolerance and persistently high blood glucose concentration during pregnancy. Compared with pregnant women without GDM, women with GDM have a 7-fold increased risk of developing type 2 diabetes after delivery.[2] The risk of metabolism-related diseases such as obesity and type 2 diabetes in their offspring also increases significantly.[3] With updated GDM diagnostic criteria, a growing number of pregnant women of advanced maternal age, and lifestyle changes, the global prevalence of GDM has risen to 5.40–7.71% and continues to increase year by year.[4,5] Because of its serious consequences and high incidence, GDM has attracted wide attention from researchers all over the world.

