Abstract

Background: Deriving structured data from unstructured clinical notes in electronic health records (EHRs) requires natural language processing and clinical expertise, which is often costly and frequently a one-off investment. We implemented SemEHR, a semantic search system that reduces the expertise and effort required in this context. We aimed to use it to characterise and select patients for projects such as the UK Department of Health 100,000 Genomes Project.

Methods: Built upon the off-the-shelf toolkits Bio-YODIE and CogStack, SemEHR integrates heterogeneous EHR documents and identifies contextualised (negation, temporality, and experiencer) mentions of a wide range of biomedical concepts from terminologies including SNOMED CT, ICD-10, LOINC, and Drug Ontology. Text mining and semantic techniques are incorporated to derive a longitudinal patient panorama, combining structured profiles and unstructured records, available through semantic search interfaces.

Findings: We deployed SemEHR in various UK hospital EHRs, including the South London and Maudsley NHS Foundation Trust, where 46 million concept mentions were identified from 18 million documents. In a liver disease study, SemEHR identified 94 of 100 patients manually annotated as hepatitis C positive. In an HIV study, SemEHR identified 21 of 23 true positives in a 1000-patient cohort. At King's College Hospital, SemEHR is being used to recruit patients into the 100,000 Genomes Project, where ontological associations are integrated to match recruitment criteria and populate complex phenotype models. A preliminary evaluation suggests that the tool is able to validate previously submitted cases and is very fast in searching phenotypes.

Interpretation: Using SemEHR, a query such as “find patients with a family history of hepatitis C”, which previously might have required the user to have natural language processing expertise, becomes a simple search, for which SemEHR retrieves a relevant patient cohort, populates patient-level summaries, and provides a link to each mention in the original source. Results and feedback from the multiple studies have demonstrated its efficiency: work that previously took weeks or months can, in some cases, be done within minutes.

Funding: Medical Research Council (MC_PC_14089); NIHR Biomedical Research Centre for Mental Health, Biomedical Research Unit for Dementia; the European Union's Horizon 2020 (No 644753 KConnect); Wellcome Trust Seed Award in Science (109823/Z/15/Z); National Institute for Health Research University College London Hospital's Biomedical Research Centre; Arthritis Research UK; British Heart Foundation; Cancer Research UK; Chief Scientist Office; Economic and Social Research Council; Engineering and Physical Sciences Research Council; National Institute for Social Care and Health Research; Wellcome Trust (grant MR/K006584/1).

