Background and objective: Because bipolar disorder (BD) and major depressive disorder (MDD) share genetic risk factors and present with similar neuropsychological symptoms, the two are at high risk of being misdiagnosed, which is associated with ineffective treatment and worse outcomes. We aimed to develop a machine learning (ML)-based diagnostic system that uses electronic medical records (EMR) data to mimic the clinical reasoning of human physicians in differentiating patients with MDD from patients with BD (especially BD depressive episodes) who are about to be admitted to a hospital, and hence to reduce the misdiagnosis of BD as MDD on admission. In addition, we examined to what extent our ML model could be made interpretable by quantifying and visualizing the features that drive its predictions.

Methods: We identified 16,311 patients admitted to a hospital in western China between 2009 and 2018 with a recorded main diagnosis of MDD or BD, and established three sub-cohorts with different combinations of features for each of the MDD-BD cohort and the MDD-BD depressive episodes cohort. Four ML algorithms (logistic regression, extreme gradient boosting (XGBoost), random forest, and support vector machine) and four train-test splits were used to train and validate the diagnostic models. Explainability methods (SHAP and Break Down) were used to analyze the contribution of each feature at both the population level and the individual level, including feature importance, feature interaction, and the effect of each feature on the prediction for a specific subject.

Results: The XGBoost algorithm provided the best test performance for separating patients with BD from those with MDD (AUC 0.838 (0.810–0.867), PPV 0.810, and NPV 0.834). Core predictors included symptoms (mood-up, exciting, bad sleep, loss of interest, talking, mood-down, provoke), along with age, job, myocardial enzyme markers (creatine kinase, hydroxybutyrate dehydrogenase), a diabetes-associated marker (glucose), a bone function marker (alkaline phosphatase), a non-enzymatic antioxidant (uric acid), markers of immune function and inflammation (white blood cell count, lymphocyte count, basophil percentage, monocyte count), a cardiovascular function marker (low-density lipoprotein), a renal marker (total protein), a liver biochemistry marker (indirect bilirubin), and vital signs such as pulse. For separating patients with BD depressive episodes from those with MDD, the test AUC was 0.777 (0.732–0.822), with PPV 0.576 and NPV 0.899. Additional validation in models built with self-reported symptoms removed from the feature set showed a test AUC of 0.701 (0.666–0.736) for differentiating BD from MDD, and an AUC of 0.564 (0.515–0.614) for distinguishing patients in BD depressive episodes from patients with MDD. Validation in datasets that retained patients with comorbidities showed an AUC of 0.826 (0.806–0.846).

Conclusion: The diagnostic system accurately identified patients with BD in various clinical scenarios, and the differences in patterns of peripheral markers between BD and MDD could enrich our understanding of the potential pathophysiological mechanisms underlying the two disorders.
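For readers less familiar with the reported metrics, the following is a minimal sketch of how AUC, PPV, and NPV can be computed from predicted scores. It is not the study's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and no confidence intervals are computed.

```python
from typing import Sequence, Tuple


def ppv_npv(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (PPV, NPV) for binary labels; here 1 = BD and 0 = MDD (assumed coding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # BD predicted as BD
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # MDD predicted as BD
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # MDD predicted as MDD
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # BD predicted as MDD
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    ppv, npv = ppv_npv(y_true, y_pred)
    print(f"AUC={roc_auc(y_true, scores):.3f} PPV={ppv:.3f} NPV={npv:.3f}")
```

AUC is threshold-free, whereas PPV and NPV depend on both the chosen threshold and the proportion of BD cases in the cohort, so they are best read alongside the cohort composition.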