Falls in older people are common and morbid. Prediction models can help identifying individuals at higher fall risk. Electronic health records (EHR) offer an opportunity to develop automated prediction tools that may help to identify fall-prone individuals and lower clinical workload. However, existing models primarily utilise structured EHR data and neglect information in unstructured data. Using machine learning and natural language processing (NLP), we aimed to examine the predictive performance provided by unstructured clinical notes, and their incremental performance over structured data to predict falls. We used primary care EHR data of people aged 65 or over. We developed three logistic regression models using the least absolute shrinkage and selection operator: one using structured clinical variables (Baseline), one with topics extracted from unstructured clinical notes (Topic-based) and one by adding clinical variables to the extracted topics (Combi). Model performance was assessed in terms of discrimination using the area under the receiver operating characteristic curve (AUC), and calibration by calibration plots. We used 10-fold cross-validation to validate the approach. Data of 35,357 individuals were analysed, of which 4,734 experienced falls. Our NLP topic modelling technique discovered 151 topics from the unstructured clinical notes. AUCs and 95% confidence intervals of the Baseline, Topic-based and Combi models were 0.709 (0.700-0.719), 0.685 (0.676-0.694) and 0.718 (0.708-0.727), respectively. All the models showed good calibration. Unstructured clinical notes are an additional viable data source to develop and improve prediction models for falls compared to traditional prediction models, but the clinical relevance remains limited.
Read full abstract