Background: There is increasing interest in suicide surveillance solutions to identify non-fatal suicidal and self-harming behaviours in the Australian community that are not currently captured through national administrative datasets.

Objective: The aim of the present study was to develop machine learning models to classify self-harm related behaviours using unstructured clinical note text from New South Wales (NSW) Ambulance data and compare their performance via traditional methods.

Methods: Primary data were derived from NSW Ambulance electronic medical records (eMRs) for potential self-harm related NSW Ambulance attendances for the period 2013–2019. Data included paramedic clinical notes detailing the nature of the attendance, clinical outcome, and narrative information. We assessed sensitivity, specificity, positive predictive value, negative predictive value, F-score, and the Matthews correlation coefficient (MCC) for four algorithms (Support Vector Machine, random forest, decision tree, and logistic regression).

Results: The performance of these algorithms was compared using the MCC measure. In a test sample of 3157 ambulance attendances (1349 self-harm related behaviours and 1808 unrelated), the MCC for classification of self-harm related behaviour ranged from +0.681 to +0.730. The Support Vector Machine (sensitivity = 82.7%, specificity = 89.6%, MCC = 0.730) and the logistic regression (sensitivity = 83.1%, specificity = 89.3%, MCC = 0.727) models performed best.

Conclusions: This study demonstrates that machine learning models can be applied to paramedic notes within unstructured medical records to classify self-harm related behaviours. The resulting model could be used to complement current manual abstraction of self-harm behaviours and provide more timely approximations to be used for self-harm surveillance.
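For readers unfamiliar with the evaluation measures named in the abstract, the following minimal sketch shows how sensitivity, specificity, positive and negative predictive value, F-score, and the MCC can be derived from a binary confusion matrix. The function name `binary_metrics` and the four counts are assumptions made for illustration. The counts were chosen to be roughly consistent with the reported test-sample sizes (1349 self-harm related, 1808 unrelated) and the rounded Support Vector Machine sensitivity and specificity; they are not study data and are not expected to reproduce the published MCC exactly.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary classification metrics from confusion-matrix counts.

    Assumes every marginal total is non-zero (both classes are present in
    the data and both are predicted at least once).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of PPV and sensitivity
    # Matthews correlation coefficient: ranges from -1 to +1, where 0
    # corresponds to chance-level agreement.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts (an assumption, not study data): 1349 positive and
# 1808 negative attendances, split to approximate the rounded percentages.
metrics = binary_metrics(tp=1116, fp=188, tn=1620, fn=233)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity or specificity taken alone, the MCC uses all four cells of the confusion matrix, which is one reason it is often preferred as a single summary measure when the two classes differ in size.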