Abstract

e13613
Background: Cohort selection for specialized clinical trials is a cardinal pillar of evidence-based medicine; however, it is also the most difficult, complex, time-consuming, and expensive step in conducting such trials. Determining the efficacy of a new treatment (or intervention) requires finding eligible patients who meet the inclusion and exclusion criteria. In specialized scenarios, complex criteria may even require researchers to perform time-consuming manual reviews and analyses of electronic health records (EHRs) to shortlist qualified patients. The major contribution of this research is a set of novel semantic models for building clinical research cohorts by enabling semantic search over EHRs represented as knowledge graphs (KGs). Methods: We present the design of a novel cohort retrieval system that satisfies the inclusion and exclusion criteria of an oncology clinical research study. Knowledge graphs and semantic models: We construct KGs that interconnect different data sources stored in a data lake, and we develop semantic models that enable semantic search over the processed data. In particular, we designed and constructed an oncology KG that enables semantics-driven cohort selection. The cohort retrieval system uses a semantics-driven dynamic query engine to generate and execute cohort selection queries on heterogeneous EHR data. Results: We obtained real-world data for 21,000 oncology patients and constructed KGs for five cancer types (ICD-10 codes in parentheses): colon (C18), lung (C34), breast (C50), prostate (C61), and multiple myeloma (C90). The cohort-building scenarios were designed to represent a mix of criterion types and combinations, including both inclusion and exclusion criteria, conjunctions, disjunctions, and numeric ranges.
We evaluated our cohort builder against ten well-known competency scenarios, and a team of experts validated its results for these scenarios by querying the graph directly. In all ten competency scenarios, the cohort builder retrieved the patients matching the specified criteria with 100% accuracy. The average time to build a cohort across all graphs for these scenarios was under 10 seconds, compared with days when patients are searched for manually in EHR systems. Conclusions: Our query engine is not tightly coupled to the architecture of our data lake; its flexible architecture allows it to be integrated easily with other enterprise data lakes or EMR systems. In the future, we plan to extend the range of supported inclusion and exclusion criteria to provide interoperability with existing clinical trial knowledge. We also aim to empirically compare the efficiency of cohort selection queries over a knowledge graph with that of classical database query approaches.
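The abstract does not say which query language the dynamic query engine emits. The following is a minimal sketch of how the criterion types named above could be compiled into a single cohort selection query. It assumes the KG is stored as RDF and queried with SPARQL 1.1. The criterion types covered are conjunctions, disjunctions, numeric ranges, and exclusion criteria.

Everything specific in the sketch is hypothetical and is not the authors' schema or engine:

- the `ex:` vocabulary (`ex:Patient`, `ex:hasDiagnosis`, `ex:icd10Code`, `ex:ageAtDiagnosis`);
- the criterion classes;
- `build_cohort_query`.

Matching diagnosis codes exactly is also an assumption.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Sequence, Tuple, Union

# Hypothetical vocabulary: the abstract does not publish the KG schema.
PREFIX = "PREFIX ex: <http://example.org/onco#>"


@dataclass(frozen=True)
class HasDiagnosis:
    """Patient has a diagnosis node whose ICD-10 code literal equals `icd10`."""
    icd10: str


@dataclass(frozen=True)
class InRange:
    """A numeric patient property (e.g. age) lies in [low, high], inclusive."""
    prop: str
    low: float
    high: float


@dataclass(frozen=True)
class AnyOf:
    """Disjunction: at least one of the sub-criteria must hold."""
    options: Tuple["Criterion", ...]


Criterion = Union[HasDiagnosis, InRange, AnyOf]


def _pattern(c: Criterion, ids: Iterator[int]) -> str:
    """Translate one criterion into the body of a SPARQL group graph pattern."""
    if isinstance(c, HasDiagnosis):
        v = f"?dx{next(ids)}"  # fresh variable so patterns never collide
        return f'?patient ex:hasDiagnosis {v} . {v} ex:icd10Code "{c.icd10}" .'
    if isinstance(c, InRange):
        v = f"?val{next(ids)}"
        return (f"?patient ex:{c.prop} {v} . "
                f"FILTER({v} >= {c.low} && {v} <= {c.high})")
    if isinstance(c, AnyOf):
        # Each alternative becomes its own group; UNION joins the groups.
        return " UNION ".join("{ " + _pattern(o, ids) + " }" for o in c.options)
    raise TypeError(f"unsupported criterion: {c!r}")


def build_cohort_query(include: Sequence[Criterion],
                       exclude: Sequence[Criterion] = ()) -> str:
    """AND together inclusion criteria; wrap each exclusion in FILTER NOT EXISTS."""
    ids = itertools.count()
    lines = [PREFIX, "SELECT DISTINCT ?patient WHERE {",
             "  ?patient a ex:Patient ."]
    lines += ["  " + _pattern(c, ids) for c in include]
    lines += ["  FILTER NOT EXISTS { " + _pattern(c, ids) + " }" for c in exclude]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Colon OR lung cancer, aged 18-75 at diagnosis, excluding multiple myeloma.
    print(build_cohort_query(
        include=[AnyOf((HasDiagnosis("C18"), HasDiagnosis("C34"))),
                 InRange("ageAtDiagnosis", 18, 75)],
        exclude=[HasDiagnosis("C90")],
    ))
```

Each exclusion criterion maps to its own `FILTER NOT EXISTS` group, which keeps it independent of the inclusion patterns. Criteria can therefore be added or dropped without rewriting the rest of the query. Sending the generated string to an RDF store or SPARQL endpoint is not shown.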