Abstract

e14079 Background: Low patient recruitment is one of the main reasons clinical trials fail. Identifying eligible patients from electronic health records (EHRs) can help trials reach their accrual targets. Trial2Patient, a scalable system we developed for matching patients to clinical trials, generates patient cohorts through ontology reasoning. Efficient cohort definition first requires an attribute ontology for eligibility criteria and entity categorization. To meet this requirement, we constructed an ontology platform for lung cancer trials. Methods: We classified 128 non-small cell lung cancer (NSCLC) and 38 small cell lung cancer trials into therapy groups. Of the 166 trials examined, 110 were immuno-oncology therapy-based, 48 were targeted therapy-based, and 8 were chemotherapy or device trials. We manually analyzed the eligibility criteria of each trial to identify entities from all trials as well as indication-specific and therapy group-specific entities. To incorporate semi-automated, natural language processing (NLP)-assisted named entity recognition (NER) into the future cohort definition process, we trained NLP and deep learning models for NER and ontology encoding. Attributes generated from 50 processed NSCLC trials were evaluated against our manually curated attributes. The ontology generated from lung cancer trials was tested on 74 prostate cancer trials for generalizability. Results: We constructed an ontology for lung cancer trials that is generalizable to prostate cancer and other cancer clinical trials. A total of 507 attributes were extracted, and entities were categorized into 8 groups. Attributes generated by the NLP and deep learning models showed high consistency and accuracy when compared with the manually extracted attributes. The average precision, recall, and F1 values for the 15 most commonly appearing entities (disease, histology, targeted therapy, immunotherapy, radiotherapy, neoadjuvant therapy, age, gender, test, vitals, value, drug, gene, mutation, problem) were 0.873, 0.769, and 0.805, respectively. Conclusions: We contribute a clinical trial ontology platform for lung cancer and prostate cancer trial recruitment. This platform can be expanded to other solid tumors or hematologic malignancies for clinical trial analysis and can also be applied to generate synthetic control arm cohorts. We believe NLP-assisted NER can be successfully incorporated into future work on large-scale clinical trial cohort definition.
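The reported averages can be read as per-entity scores averaged across entity types. The abstract does not state the matching rule or the averaging scheme, so the following is only a minimal sketch under two assumptions: an extracted attribute counts as correct when its (trial, entity type, text) tuple exactly matches a manually curated one, and the averages are unweighted (macro) means over entity types. The function names and the toy annotations are hypothetical and are not taken from Trial2Patient.

```python
from collections import Counter


def per_entity_scores(gold, predicted):
    """Return {entity_type: (precision, recall, f1)}.

    gold and predicted are sets of (trial_id, entity_type, text) tuples.
    Exact tuple match is an assumed matching rule, not the authors' stated one.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for item in predicted:
        if item in gold:
            tp[item[1]] += 1  # extracted and present in the manual curation
        else:
            fp[item[1]] += 1  # extracted but not in the manual curation
    for item in gold:
        if item not in predicted:
            fn[item[1]] += 1  # curated manually but missed by the model

    scores = {}
    for etype in set(tp) | set(fp) | set(fn):
        p_den = tp[etype] + fp[etype]
        r_den = tp[etype] + fn[etype]
        p = tp[etype] / p_den if p_den else 0.0
        r = tp[etype] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[etype] = (p, r, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 over entity types."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical toy annotations for two trials (illustrative only).
gold = {
    ("T1", "age", "18 years"),
    ("T1", "gene", "EGFR"),
    ("T2", "gene", "ALK"),
    ("T2", "drug", "osimertinib"),
}
predicted = {
    ("T1", "age", "18 years"),
    ("T1", "gene", "EGFR"),
    ("T2", "gene", "KRAS"),
    ("T2", "drug", "osimertinib"),
    ("T2", "drug", "cisplatin"),
}

scores = per_entity_scores(gold, predicted)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))
```

Under macro averaging, the averaged F1 is generally not the harmonic mean of the averaged precision and recall. The reported 0.805 is likewise lower than the harmonic mean of 0.873 and 0.769 (about 0.818), which is consistent with, though does not confirm, per-entity averaging.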
