Abstract

Background
Cognitive test results from electronic health records (EHRs) are key information for assessing the severity and progression of mild cognitive impairment (MCI) and Alzheimer’s disease (AD). However, such information is often recorded in unstructured clinical notes rather than in an administrative database. We developed and validated a natural language processing (NLP) system to extract cognitive test results from clinical notes in the Veterans Affairs (VA) Healthcare System.

Method
An NLP system was developed using regular expression‐based rules in Python to extract results for the six tests used most frequently in the VA: Mini‐Mental State Exam (MMSE), Montreal Cognitive Assessment (MoCA), Saint Louis University Mental Status Examination (SLUMS), Mini‐cog, Boston Naming Test (BNT), and Benton Visual Retention Test (BVRT). The system extracted test results from each note in two steps: (1) it searched for a test name, or variations and abbreviations of the test name, and, if successful, (2) it searched for quantitative results (e.g., 12/30 for MMSE, 74th percentile for BNT) and/or descriptive results (e.g., “borderline impairment” for BVRT) within 5 words or one sentence before or after the test name. To balance system performance and speed, we developed 3‐8 extraction rules per test based on a manual review of 30‐50 notes for each test. We further validated NLP performance on 6 held‐out datasets (50‐200 notes/test). We automatically sampled the development and held‐out notes by searching for the test name and its variations/abbreviations to increase the number of positive cases, i.e., notes that contained the above test results.

Result
The NLP system achieved positive predictive values (PPV) of 0.72‐0.92, recall of 0.96‐1.00, and F1 scores of 0.83‐0.95 on the held‐out validation sets (Table 1). In addition, it demonstrated scalable performance (200,000 notes processed in 7 min), allowing extraction from millions of notes for ∼1 million patients within hours.

Conclusion
Rule‐based NLP can extract cognitive test results with adequate performance and scalable capability from clinical notes of patients with MCI or AD within the VA Healthcare System.
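
The two‐step extraction described in the Method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual rules, so the name patterns, the score pattern, and the function name `extract_scores` below are hypothetical, and only two of the six tests and only the 5‐word window (not the one‐sentence window or descriptive results) are covered.

```python
import re

# Hypothetical test-name patterns (step 1). The real system used 3-8 rules
# per test; these two are illustrative assumptions only.
TEST_NAME_PATTERNS = {
    "MMSE": re.compile(
        r"\b(?:MMSE|mini[-\s]?mental(?:\s+state)?(?:\s+exam(?:ination)?)?)\b",
        re.IGNORECASE,
    ),
    "MoCA": re.compile(
        r"\b(?:MoCA|montreal\s+cognitive\s+assessment)\b",
        re.IGNORECASE,
    ),
}

# Hypothetical quantitative-result pattern (step 2): "score/maximum".
SCORE_PATTERN = re.compile(r"\b(\d{1,2})\s*/\s*(\d{1,2})\b")

WINDOW_WORDS = 5  # search window on each side of the test name


def extract_scores(note):
    """Return a list of (test, score, maximum) tuples found in one note."""
    results = []
    for test, name_re in TEST_NAME_PATTERNS.items():
        # Step 1: locate each mention of the test name or a variant.
        for mention in name_re.finditer(note):
            # Step 2: look for a score within WINDOW_WORDS words after the
            # mention, then, if none is found, within WINDOW_WORDS before it.
            after = " ".join(note[mention.end():].split()[:WINDOW_WORDS])
            before = " ".join(note[:mention.start()].split()[-WINDOW_WORDS:])
            for context in (after, before):
                score = SCORE_PATTERN.search(context)
                if score:
                    results.append(
                        (test, int(score.group(1)), int(score.group(2)))
                    )
                    break
    return results
```

For example, `extract_scores("MoCA: 18/30. MMSE score was 24/30.")` returns one tuple per test, while a note that names a test without reporting a score yields an empty list. A word window this simple can pick up a score belonging to a neighbouring test, which is one reason a production rule set would need more specific patterns per test.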