Abstract

Retrieval-based Clinical Decision Support (ReCDS) can aid clinical workflow by providing relevant literature and similar patients for a given patient. However, the development of ReCDS systems has been severely obstructed by the lack of diverse patient collections and publicly available large-scale patient-level annotation datasets. In this paper, we collect a novel dataset of patient summaries and relations called PMC-Patients to benchmark two ReCDS tasks: Patient-to-Article Retrieval (ReCDS-PAR) and Patient-to-Patient Retrieval (ReCDS-PPR). Specifically, we extract patient summaries from PubMed Central articles using simple heuristics and utilize the PubMed citation graph to define patient-article relevance and patient-patient similarity. PMC-Patients contains 167k patient summaries with 3.1 M patient-article relevance annotations and 293k patient-patient similarity annotations, which is the largest-scale resource for ReCDS and also one of the largest patient collections. Human evaluation and analysis show that PMC-Patients is a diverse dataset with high-quality annotations. We also implement and evaluate several ReCDS systems on the PMC-Patients benchmarks to show its challenges and conduct several case studies to show the clinical utility of PMC-Patients.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call