Learning relevance models for patient cohort retrieval.

Travis R Goodwin,Sanda M Harabagiu

doi:10.1093/jamiaopen/ooy010

Travis R Goodwin, Sanda M Harabagiu

Open Access

https://doi.org/10.1093/jamiaopen/ooy010

Copy DOI

Journal: JAMIA open	Publication Date: Sep 28, 2018
Citations: 13	License type: CC BY-NC 4.0

Affiliation: The University of Texas at Dallas

Abstract

ObjectiveWe explored how judgements provided by physicians can be used to learn relevance models that enhance the quality of patient cohorts retrieved from Electronic Health Records (EHRs) collections.MethodsA very large number of features were extracted from patient cohort descriptions as well as EHR collections. The features were used to investigate retrieving (1) neurology-specific patient cohorts from the de-identified Temple University Hospital electroencephalography (EEG) Corpus as well as (2) the more general cohorts evaluated in the TREC Medical Records Track (TRECMed) from the de-identified hospital records provided by the University of Pittsburgh Medical Center. The features informed a learning relevance model (LRM) that took advantage of relevance judgements provided by physicians. The LRM implements a pairwise learning-to-rank framework, which enables our learning patient cohort retrieval (L-PCR) system to learn from physicians’ feedback.Results and DiscussionWe evaluated the L-PCR system against state-of-the-art traditional patient cohort retrieval systems, and observed a 27% improvement when operating on EEGs and a 53% improvement when operating on TRECMed EHRs, showing the promise of the L-PCR system. We also performed extensive feature analyses to reveal the most effective strategies for representing cohort descriptions as queries, encoding EHRs, and measuring cohort relevance.ConclusionThe L-PCR system has significant promise for reliably retrieving patient cohorts from EHRs in multiple settings when trained with relevance judgments. When provided with additional cohort descriptions, the L-PCR system will continue to learn, thus offering a potential solution to the performance barriers of current cohort retrieval systems.

Full Text