Purpose: Electronic health records (EHRs) contain a vast amount of clinical data. Improved, automated classification approaches have the potential to identify patient cohorts for research accurately and efficiently. We evaluated whether a rule-based natural language processing (NLP) algorithm applied to clinical notes classified proliferative diabetic retinopathy (PDR) and non-proliferative diabetic retinopathy (NPDR) severity more accurately than International Classification of Diseases, 9th or 10th Revision (ICD) codes.

Design: Cross-sectional study.

Subjects: De-identified EHR data from an academic medical center were used to identify 2366 patients aged ≥18 years with diabetes mellitus, diabetic retinopathy, and available clinical notes.

Methods: From these 2366 patients, 306 randomly selected patients (100 in the training set, 206 in the test set) underwent chart review by ophthalmologists to establish the gold standard. ICD codes were extracted from the EHR. The notes algorithm identified positive mentions of PDR and NPDR severity in clinical notes. The PDR and NPDR severity classifications produced by ICD codes and by the notes algorithm were each compared to the gold standard. The entire diabetic retinopathy cohort (N=2366) was then classified for the presence or absence of PDR using ICD codes and the notes algorithm.

Main Outcome Measures: Sensitivity, specificity, positive predictive value (PPV), negative predictive value, and F1 score for the notes algorithm and for ICD codes, each measured against the chart-review gold standard.

Results: For PDR classification of the test set patients, the notes algorithm performed better than ICD codes on all metrics. Specifically, the notes algorithm had significantly higher sensitivity (90.5% [95% CI 85.7, 94.9] vs 68.4% [60.4, 75.3]) but similar PPV (98.0% [95.4, 100] vs 94.7% [90.3, 98.3]). The F1 score was 0.941 [0.910, 0.966] for the notes algorithm compared to 0.794 [0.734, 0.842] for ICD codes.
For PDR classification, ICD-10 codes performed better than ICD-9 codes (F1 score 0.836 [0.771, 0.878] vs 0.596 [0.222, 0.692]). For NPDR severity classification, the notes algorithm performed similarly to ICD codes, although the small sample size limited this comparison.

Conclusions: The notes algorithm outperformed ICD codes for PDR classification. These findings demonstrate the potential of a rule-based NLP algorithm applied to clinical notes to make cohort selection for research more efficient and more accurate.
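The outcome measures named above all follow from the four cells of a confusion matrix against the gold standard. The sketch below is only an illustration of those standard definitions, not the study's analysis code: the names `Metrics`, `classification_metrics`, and `f1_from_sensitivity_ppv` are hypothetical, the label lists are toy data, and confidence intervals are not computed because the abstract does not state how they were derived.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def classification_metrics(gold: Sequence[bool], predicted: Sequence[bool]) -> Metrics:
    """Compare binary predictions (e.g. PDR present/absent) to gold-standard labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    return Metrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


def f1_from_sensitivity_ppv(sensitivity: float, ppv: float) -> float:
    """F1 is the harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not study data.
    gold = [True, True, True, True, False, False, False, False]
    pred = [True, True, True, False, True, False, False, False]
    print(classification_metrics(gold, pred))

    # Consistency check using the point estimates reported in the Results.
    print(round(f1_from_sensitivity_ppv(0.905, 0.980), 3))  # notes algorithm: 0.941
    print(round(f1_from_sensitivity_ppv(0.684, 0.947), 3))  # ICD codes: 0.794
```

Because F1 is the harmonic mean of sensitivity and PPV, the reported point estimates can be cross-checked: the sensitivity and PPV given for the notes algorithm and for ICD codes reproduce the reported F1 scores of 0.941 and 0.794.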