Abstract

1539

Background: Clinical trial eligibility increasingly requires information found in next-generation sequencing (NGS) tests. The lack of structured NGS results hinders automated trial matching on this criterion, which may deter some sites from opening biomarker-driven trials. We developed a machine learning (ML) tool that infers the presence of NGS results in the electronic health record (EHR), facilitating clinical trial matching.

Methods: The Flatiron Health EHR-derived database contains patient-level pathology and genetic counseling reports from community oncology practices. An internal team of clinical experts reviewed a random sample of patients across this network to label whether each patient had been NGS tested. A supervised ML model was trained by scanning documents in the EHR and extracting n-gram features from text snippets surrounding relevant keywords (e.g., 'Lung biomarker', 'Biomarker negative'). Using k-fold cross-validation, we found that an L2-regularized logistic regression was able to classify patients' NGS testing status. The model's offline performance on a 20% hold-out test set was measured with standard classification metrics: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In an online setting, we integrated the tool into Flatiron's clinical trial matching software, OncoTrials, by including in each patient's profile an indicator of "likely NGS tested" or "unlikely NGS tested" based on the classifier's prediction. For patients inferred as tested, the tool linked users to a test report view in the EHR. In this online setting, we measured the sensitivity and specificity of the model after user review in two community oncology practices.

Results: The NGS testing status inference model was characterized using a test sample of 15,175 patients. The model's sensitivity and specificity (95% CI) were 91.3% (90.2, 92.3) and 96.2% (95.8, 96.5), respectively; PPV was 84.5% (83.2, 85.8) and NPV was 98.0% (97.7, 98.2). In the validation sample (N = 200, drawn from 2 distinct care sites), users identified NGS testing status with a sensitivity of 95.2% (88.3, 98.7).

Conclusions: This machine learning model facilitates screening for potential patient enrollment in biomarker-driven trials by automatically surfacing patients with NGS test results, at high sensitivity and specificity, in a trial matching application. This tool could mitigate a key barrier to participation in biomarker-driven trials for community clinics.
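
The feature-extraction and evaluation steps described in Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`keyword_snippets`, `ngram_counts`, `classification_metrics`), the tokenization rule, the window size, and the maximum n-gram length are all assumptions made for illustration. Only the two example keywords come from the abstract.

```python
import re
from collections import Counter

# Only these two keywords are named in the abstract; the real list is unknown.
KEYWORDS = ["lung biomarker", "biomarker negative"]


def keyword_snippets(text, keywords=KEYWORDS, window=5):
    """Return token windows of up to `window` tokens on each side of a keyword.

    Tokenization (lowercased alphanumeric runs) and `window` are assumptions.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    snippets = []
    for kw in keywords:
        kw_tokens = kw.split()
        n = len(kw_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == kw_tokens:
                start = max(0, i - window)
                snippets.append(tokens[start:i + n + window])
    return snippets


def ngram_counts(snippets, max_n=2):
    """Count all 1..max_n word n-grams across the snippets (max_n is assumed)."""
    counts = Counter()
    for snip in snippets:
        for n in range(1, max_n + 1):
            for i in range(len(snip) - n + 1):
                counts[" ".join(snip[i:i + n])] += 1
    return counts


def classification_metrics(tp, fp, tn, fn):
    """Compute the four metrics named in Methods from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # tested patients correctly flagged
        "specificity": tn / (tn + fp),  # untested patients correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who were tested
        "npv": tn / (tn + fn),          # cleared patients who were untested
    }


if __name__ == "__main__":
    note = "Pathology note. Lung biomarker panel sent for sequencing."
    features = ngram_counts(keyword_snippets(note, window=2))
    print(features.most_common(5))
    # Illustrative counts only; these are not the study's data.
    print(classification_metrics(tp=90, fp=5, tn=95, fn=10))
```

In a setup like the one the abstract describes, the resulting n-gram counts would be fed as features to an L2-regularized logistic regression. The sketch stops short of model fitting because the abstract does not specify the training configuration.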
