Background and aimAvailable evidence suggests elevated serum prolactin (PRL) levels in olanzapine (OLZ)-treated patients with schizophrenia. However, machine learning (ML)-based comprehensive evaluations of the influence of pathophysiological and pharmacological factors on PRL levels in OLZ-treated patients are rare. We aimed to forecast the PRL level in OLZ-treated patients and mine pharmacovigilance information on PRL-related adverse events by integrating ML and electronic health record (EHR) data.MethodsData were extracted from an EHR system to construct an ML dataset in 672×384 matrix format after preprocessing, which was subsequently randomly divided into a derivation cohort for model development and a validation cohort for model validation (8:2). The eXtreme gradient boosting (XGBoost) algorithm was used to build the ML models, the importance of the features and predictive behaviors of which were illustrated by SHapley Additive exPlanations (SHAP)-based analyses. The sequential forward feature selection approach was used to generate the optimal feature subset. The co-administered drugs that might have influenced PRL levels during OLZ treatment as identified by SHAP analyses were then compared with evidence from disproportionality analyses by using OpenVigil FDA.ResultsThe 15 features that made the greatest contributions, as ranked by the mean (|SHAP value|), were identified as the optimal feature subset. The features were gender_male, co-administration of risperidone, age, co-administration of aripiprazole, concentration of aripiprazole, concentration of OLZ, progesterone, co-administration of sulpiride, creatine kinase, serum sodium, serum phosphorus, testosterone, platelet distribution width, α-L-fucosidase, and lipoprotein (a). The XGBoost model after feature selection delivered good performance on the validation cohort with a mean absolute error of 0.046, mean squared error of 0.0036, root-mean-squared error of 0.060, and mean relative error of 11%. Risperidone and aripiprazole exhibited the strongest associations with hyperprolactinemia and decreased blood PRL according to the disproportionality analyses, and both were identified as co-administered drugs that influenced PRL levels during OLZ treatment by SHAP analyses.ConclusionsMultiple pathophysiological and pharmacological confounders influence PRL levels associated with effective treatment and PRL-related side-effects in OLZ-treated patients. Our study highlights the feasibility of integration of ML and EHR data to facilitate the detection of PRL levels and pharmacovigilance signals in OLZ-treated patients.