We set out to build a machine learning model, using data from a large community birth cohort, to predict which individuals would develop bipolar disorder (BD). A total of 3748 subjects were assessed at birth and at 11, 15, 18, and 22 years of age. At each follow-up visit before diagnosis (from birth up to 18 years), we used the elastic net algorithm with 10-fold cross-validation to predict which individuals would have developed BD by the endpoint (22 years). We then used the best-performing model to identify subgroups of subjects at higher and lower risk of developing BD and analyzed the clinical differences between them. Within the cohort, 107 (2.8%) individuals presented with BD type I, 26 (0.6%) with BD type II, and 87 (2.3%) with BD not otherwise specified. The frequency of female individuals was 58.82% (n = 150) in the BD sample and 53.02% (n = 1868) in the unaffected population. The model using variables assessed at the 18-year follow-up visit achieved the best performance: AUC 0.82 (CI 0.75-0.88), balanced accuracy 0.75, sensitivity 0.72, and specificity 0.77. The most important predictors of BD at the 18-year follow-up visit were suicide risk, generalized anxiety disorder, parental physical abuse, and financial problems. In addition, the high-risk subgroup showed a high frequency of drug use and depressive symptoms. We developed a risk calculator for BD that incorporates both demographic and clinical variables from a 22-year birth cohort. Our findings support previous studies in high-risk samples showing the importance of suicide risk and generalized anxiety disorder prior to the onset of BD, and they highlight the role of social factors and adverse life events.
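The performance figures reported above follow standard definitions: sensitivity is the true-positive rate, specificity is the true-negative rate, balanced accuracy is their mean, and AUC is the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. The minimal sketch below, written with the Python standard library only, shows how these quantities are conventionally computed from binary labels, predicted classes, and risk scores. It is not the authors' pipeline; the function names are hypothetical, it does not fit the elastic net model, and it does not compute the confidence interval.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels coded 1 = BD, 0 = unaffected."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity; less misleading than plain accuracy
    when cases are rare, as BD is in a community cohort."""
    return (sensitivity + specificity) / 2


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, non-case) pairs in which the case has the
    higher score (Mann-Whitney form); tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the cohort.
    y_true = [1, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 1]
    scores = [0.9, 0.4, 0.35, 0.1, 0.8]
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(sens, spec, auc(y_true, scores))
    # Plugging in the reported sensitivity and specificity:
    print(balanced_accuracy(0.72, 0.77))
```

With the reported sensitivity (0.72) and specificity (0.77), the mean is about 0.745. Because both inputs are themselves rounded, this agrees with the reported balanced accuracy of 0.75.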