Explainable multimodal machine learning model for classifying pregnancy drug safety

Guy Shtar, Maya Berlin, Elkana Kohn, Bracha Shapira, Lior Rokach, Matitiahu Berkovitch, Jonathan Wren

doi:10.1093/bioinformatics/btab769

Abstract

Teratogenic drugs can cause severe fetal malformation and therefore have a critical impact on the health of the fetus, yet the teratogenic risks of most approved drugs are unknown. This article proposes an explainable machine learning model for classifying pregnancy drug safety based on multimodal data and suggests an orthogonal ensemble for modeling multimodal data. To train the proposed model, we created a set of labeled drugs by processing over 100,000 textual responses collected by a large teratology information service. Structured textual information is incorporated into the model by applying clustering analysis to textual features. We report an area under the receiver operating characteristic curve (AUC) of 0.891 using cross-validation and an AUC of 0.904 for cross-expert validation. Our findings suggest that two drugs, Varenicline and Mebeverine, are safe during pregnancy, and that Meloxicam, an NSAID, carries a higher risk; according to existing data, the safety of these three drugs during pregnancy is unknown. We also present a web-based application that enables physicians to examine a specific drug and its risk factors. The code and data are available from https://github.com/goolig/drug_safety_pregnancy_prediction.git. Supplementary data are available at Bioinformatics online.
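AUC can be read as a ranking probability: the chance that a randomly chosen risky drug receives a higher score than a randomly chosen safe one. The minimal sketch below, using only the Python standard library, computes AUC this way (the Mann-Whitney formulation, with ties counting as one half). The `late_fusion` helper and the toy labels and scores are hypothetical. They illustrate one simple way to combine per-modality predictions by averaging, and they are not the orthogonal ensemble described in the article.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def late_fusion(modality_scores):
    """Hypothetical combiner (not the paper's method): average each
    drug's risk score across the per-modality models."""
    return [sum(col) / len(col) for col in zip(*modality_scores)]


# Toy data: label 1 = risky, 0 = safe; one score list per modality.
labels = [1, 1, 0, 0]
chem_scores = [0.9, 0.4, 0.5, 0.1]   # hypothetical chemical-structure model
text_scores = [0.7, 0.8, 0.3, 0.2]   # hypothetical text-feature model

print(roc_auc(labels, chem_scores))
print(roc_auc(labels, late_fusion([chem_scores, text_scores])))
```

On this toy data, the chemical-structure scores misrank one pair (0.4 versus 0.5), giving an AUC of 0.75, while the averaged scores rank every risky drug above every safe one. A pairwise loop is quadratic in the number of drugs; for larger sets a rank-based computation or a library routine would be preferable.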
