This study aimed to predict preterm birth (PTB) in multiparous women and to compare machine learning approaches with traditional logistic regression. A population-based cohort study was conducted using data from the Ontario Better Outcomes Registry and Network (BORN). The cohort included all multiparous women who delivered a singleton birth at 20–42 weeks' gestation in an Ontario hospital between April 1, 2012 and March 31, 2014. The primary outcome was PTB < 37 weeks, with spontaneous PTB analyzed as a secondary outcome. Stepwise logistic regression and the Boruta machine learning algorithm were used to select important variables for the first and second trimesters. To build the prediction models, the whole data set was divided into two independent parts: two-thirds for training the classifiers (logistic regression, random forests, decision trees, and artificial neural networks) and one-third for model validation. The training data set was then balanced using random oversampling. The best hyperparameters were obtained by tenfold cross-validation. The performance of all models was evaluated by sensitivity, specificity, positive predictive value, negative predictive value, and the area under the receiver operating characteristic curve (AUC). The cohort included 145,846 births, of which 8125 (5.57%) were preterm. In first-trimester models, the strongest predictors of PTB were previous PTB, preexisting diabetes, and abnormal pregnancy-associated plasma protein-A. In the testing data set, the highest predictive ability was seen for artificial neural networks, with an AUC of 68.8% (95% CI 67.6–70.1%). In second-trimester models, the addition of infant sex, attendance at first-trimester appointment, medication exposure, and abnormal alpha-fetoprotein concentrations increased the AUC to 72.1% (95% CI 71.1–73.1%) with logistic regression.
With the inclusion of the variable for complications during pregnancy, the AUC increased to 80.5% (95% CI 79.6–81.5%) using logistic regression. For both overall and spontaneous PTB, models in both the first and second trimesters yielded negative predictive values of 97%. Overall, machine learning and logistic regression produced similar performance for prediction of PTB. For overall and spontaneous PTB, both first- and second-trimester models provided negative predictive values of approximately 97%, higher than that of fetal fibronectin.
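
The data-handling steps named in the abstract (a two-thirds/one-third split, random oversampling applied to the training part only, and confusion-matrix metrics) can be illustrated with a minimal sketch. It assumes plain Python and uses hypothetical helper names (`split_two_thirds`, `random_oversample`, `confusion_metrics`) that do not come from the study. The toy records and the single-feature rule are placeholders for the BORN data and the fitted classifiers, neither of which is reproduced here.

```python
import random


def split_two_thirds(rows, seed=0):
    """Shuffle rows and return (train, test), with about two-thirds in train."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def random_oversample(train, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match in size.

    Each row is a (features, label) pair with label 0 or 1.
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for row in train:
        by_label[row[1]].append(row)
    minority, majority = sorted(by_label.values(), key=len)
    if not minority:  # nothing to resample from
        return list(train)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, PPV, and NPV from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy records (hypothetical, not BORN data): one binary feature, 0/1 outcome.
    records = [
        ({"previous_ptb": int(i % 10 == 0)}, int(i % 18 == 0)) for i in range(180)
    ]
    train, test = split_two_thirds(records, seed=42)
    # Only the training part is balanced; the test part keeps its natural prevalence.
    balanced_train = random_oversample(train, seed=42)
    print(len(train), len(balanced_train), len(test))

    # Stand-in "model": flag a record as PTB when previous_ptb is 1.
    y_true = [label for _, label in test]
    y_pred = [features["previous_ptb"] for features, _ in test]
    print(confusion_metrics(y_true, y_pred))
```

Balancing only the training part matters for interpretation: predictive values depend on outcome prevalence, so metrics computed on an oversampled test set would not reflect the 5.57% PTB rate in the cohort.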