Automated screening of natural language in electronic health records for the diagnosis septic shock is feasible and outperforms an approach based on explicit administrative codes

Joris Vermassen, Kirsten Colpaert, Liesbet De Bus, Pieter Depuydt, Johan Decruyenaere

doi:10.1016/j.jcrc.2020.01.007

Abstract

Purpose: Identification of patients for epidemiologic research through administrative coding has important limitations. We investigated the feasibility of a search based on natural language processing (NLP) on the text sections of electronic health records for identification of patients with septic shock.

Materials and methods: Results of an explicit search strategy (using explicit concept retrieval) and a combined search strategy (using both explicit and implicit concept retrieval) were compared to hospital ICD-9 based administrative coding and to our department's own prospectively compiled infection database.

Results: Of 8911 patients admitted to the medical or surgical ICU, 1023 (11.5%) suffered from septic shock according to the combined search strategy. This was significantly more than those identified by the explicit strategy (518, 5.8%), by hospital administrative coding (549, 5.8%) or by our own prospectively compiled database (609, 6.8%) (p < .001). Sensitivity and specificity of the automated combined search strategy were 72.7% (95% CI 69.0%–76.2%) and 93.0% (95% CI 92.4%–93.6%), compared to 56.0% (95% CI 52.0%–60.0%) and 97.5% (95% CI 97.1%–97.8%) for hospital administrative coding.

Conclusions: An automated search strategy based on a combination of explicit and implicit concept retrieval is feasible to screen electronic health records for septic shock and outperforms an explicit approach based on administrative coding.
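
For readers who want to see how sensitivity and specificity figures of this kind are obtained from a 2x2 comparison against a reference standard, a minimal sketch follows. The counts in it are hypothetical and are not study data. The Wilson score interval is an assumption made for illustration, since the abstract does not state which confidence interval method was used, and the function names are illustrative only.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    )
    return centre - half_width, centre + half_width


def screening_performance(tp, fp, fn, tn):
    """Sensitivity and specificity of a screening strategy, each with a CI.

    tp/fn: reference-positive patients the strategy did / did not flag.
    fp/tn: reference-negative patients the strategy did / did not flag.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": (sensitivity, wilson_interval(tp, tp + fn)),
        "specificity": (specificity, wilson_interval(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (NOT taken from the study).
    result = screening_performance(tp=80, fp=50, fn=20, tn=850)
    for name, (value, (low, high)) in result.items():
        print(f"{name}: {value:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The trade-off reported above, where the combined strategy gains sensitivity at the cost of some specificity relative to administrative coding, corresponds to more true positives (higher `tp`) alongside more false positives (higher `fp`) in such a table.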

Full Text