Background
The biological, clinical and social factors that underpin the aetiology of psychotic disorders are known to overlap between different ICD-10/DSM-5 diagnostic categories. A transdiagnostic approach to investigating clinical phenotype may enable a better understanding of pathophysiology at the individual patient level. We applied natural language processing (NLP) tools to electronic health record (EHR) data from patients presenting with an ICD-10 diagnosis of unipolar depression to determine whether symptoms at diagnosis could predict subsequent onset of a bipolar or psychotic disorder.
Methods
Data were obtained from 20,582 adults presenting with unipolar depression (ICD-10 F32 or F33, excluding F32.3 and F33.3) to the South London and Maudsley (SLaM) NHS Foundation Trust between April 2006 and March 2018. NLP techniques were used to extract data on 21 mood and affective symptoms from free-text clinical assessments documented in the period from 3 months before to 3 months after the date of diagnosis of unipolar depression. We obtained descriptive analyses of demographics and symptom prevalence. Symptoms were categorised into four groups: 1. Depressive (low mood, anhedonia, feelings of guilt, hopelessness, helplessness, psychomotor retardation, worthlessness, tearfulness, low energy), 2. Manic (elation, grandiosity, pressured speech, flight of ideas), 3. Biological (insomnia, disturbed sleep, low appetite, weight loss, poor concentration) and 4. Emotional/behavioural (mood instability, agitation, irritability). The symptom network structure was estimated using the Enhanced Least Absolute Shrinkage and Selection Operator procedure.
We assessed network stability via a case-dropping bootstrap procedure. We investigated associations between each of the four symptom groups and clinical outcomes using multivariable Cox regression to predict five-year risk of bipolar disorder (ICD-10 F30/F31) or a psychotic disorder (ICD-10 F2*).
Results
Of all patients presenting with unipolar depression, 19,569 (95.1%) had at least one documented depressive symptom, 16,199 (78.7%) had at least one biological symptom, 10,006 (48.6%) had at least one emotional/behavioural symptom, and 1,372 (6.67%) had at least one manic symptom. Patients with at least one manic symptom were significantly more likely to be male (OR: 1.25, 95% CI 1.12–1.40, p < 0.001) and less likely to be of Black (OR: 0.80, 0.68–0.93, p = 0.004) or Other ethnicity (OR: 0.78, 0.66–0.91, p = 0.003). Elation was the most commonly reported manic symptom (3.17%). Network analysis revealed that the presence of manic symptoms was associated with co-occurrence of agitation, irritability and mood instability. Agitation was the most central symptom in terms of strength, betweenness and expected influence. The resulting network remained stable after dropping up to 33% of cases from the sample. A total of 1,861 (9.04%) patients who initially presented with unipolar depression subsequently developed mania/bipolar disorder or a psychotic disorder within 5 years. The presence of at least one manic (HR: 1.71, 1.50–1.97), biological (HR: 1.33, 1.16–1.53) or emotional/behavioural (HR: 1.91, 1.73–2.13) symptom was associated with significantly increased risk of onset of a bipolar or psychotic disorder.
Discussion
We found that patients with unipolar depression have a heterogeneous clinical phenotype, with a substantial proportion (9.04%) going on to develop a bipolar or psychotic disorder within 5 years. Symptoms extracted from the EHR using NLP were predictive of subsequent onset of a bipolar or psychotic disorder.
A transdiagnostic approach to defining clinical phenotype may help to better predict subsequent clinical outcomes.
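As an illustration of the symptom grouping described in the Methods, the sketch below shows one way the "at least one symptom per group" indicators and their prevalence could be derived from per-patient symptom lists. It is a minimal sketch, not the study's pipeline: the function names (`group_indicators`, `group_prevalence`), the group keys and the input format (a list of symptom names per patient) are assumptions made for this example, and the symptom names are taken from the four groups listed in the Methods.

```python
# Symptom groups as listed in the Methods (21 symptoms in total).
SYMPTOM_GROUPS = {
    "depressive": [
        "low mood", "anhedonia", "feelings of guilt", "hopelessness",
        "helplessness", "psychomotor retardation", "worthlessness",
        "tearfulness", "low energy",
    ],
    "manic": ["elation", "grandiosity", "pressured speech", "flight of ideas"],
    "biological": [
        "insomnia", "disturbed sleep", "low appetite", "weight loss",
        "poor concentration",
    ],
    "emotional_behavioural": ["mood instability", "agitation", "irritability"],
}


def group_indicators(patient_symptoms):
    """Return, per group, whether at least one of its symptoms was documented.

    `patient_symptoms` is a hypothetical input format: an iterable of
    symptom names extracted for a single patient.
    """
    present = set(patient_symptoms)
    return {
        group: any(symptom in present for symptom in symptoms)
        for group, symptoms in SYMPTOM_GROUPS.items()
    }


def group_prevalence(cohort):
    """Proportion of patients with at least one symptom in each group.

    `cohort` is a list of per-patient symptom lists (hypothetical format).
    """
    if not cohort:
        raise ValueError("cohort must contain at least one patient")
    counts = {group: 0 for group in SYMPTOM_GROUPS}
    for patient_symptoms in cohort:
        for group, flag in group_indicators(patient_symptoms).items():
            counts[group] += int(flag)
    return {group: count / len(cohort) for group, count in counts.items()}


# Toy data for illustration only; these are not patients from the study.
toy_cohort = [
    ["low mood", "insomnia"],
    ["elation", "agitation"],
    ["anhedonia"],
    [],
]
print(group_prevalence(toy_cohort))
```

Binary indicators of this kind are the sort of covariates that could then be entered into a survival model such as the multivariable Cox regression described above.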