Abstract

Background: The presence of erosive disease influences diagnosis, management, and prognosis in inflammatory arthritis (IA). Research of IA in large datasets is limited by a lack of methods for identifying erosions.

Objectives: To develop methods for identifying articular erosions in radiology reports from veterans with IA.

Methods: Included veterans had ≥2 ICD codes for ankylosing spondylitis (AS), psoriatic arthritis (PsA), or rheumatoid arthritis (RA) between 2005 and 2019 in the Veterans Affairs Corporate Data Warehouse. Chart review and annotation of radiology notes produced the reference standard and identified erosion terms that informed classification rule development. A rule-based natural language processing (NLP) model was created and revised in training snippets. The NLP method was validated in an independent reference sample of IA patients at the snippet and patient levels. Method development proceeded in the following steps:

1. Radiology notes
   a. Select note titles potentially relevant to IA: 35,141 note titles
   b. Extract notes with titles potentially related to IA: 2,926,113 radiology notes
2. Possible meaningful terms
   a. Compile list of root terms that may indicate erosion: 11 root terms (e.g. ero*, pencil*cup, irreg*)
   b. Query radiology notes for root term variations: 1,178 variations (e.g. erosion, erotic, erode)
   c. Select possible meaningful terms: 179 possible terms (e.g. erosion, erode)
3. Annotation
   a. Extract snippets (text containing 30 words before and after a meaningful term) containing possible meaningful terms: 5,000 snippets from radiology notes
   b. Classify snippets according to 1) meaningful term, 2) relevance to joint, 3) attribution to IA, and 4) affirmation: 4,068 classifications across 1,017 snippets (annotated in rounds of 50-417 snippets for NLP training and testing)
4. Rule development
   a. Identify meaningful terms representing erosion: 6 terms (pencil*cup, erosion, erosive, etc.)
   b. Exclude erosive processes irrelevant to joint(s): 28 irrelevant processes (e.g. gastric erosion)
   c. Exclude articular erosive processes not attributed to IA: 5 non-IA processes (e.g. infection)
   d. Classify as affirmed/negated (erosion present/absent): 83 affirmation/negation rules
5. NLP training: design and revise the NLP model until accuracy ≥90%; 6 rounds, 817 snippets (AS 417, RA 200, PsA 200)
6. NLP testing: test the NLP model; 200 snippets (AS 100, RA 50, PsA 50)
7. Patient classification
   a. Develop rules for classifying patients with discordant snippets: 5 rules developed in 368 patients
   b. Build reference sample (patients classified as erosive or non-erosive via chart review): 30 IA patients (10 AS, 10 RA, 10 PsA)
8. NLP validation: validate the NLP model in the reference sample at the snippet level; 149 snippets (29 AS, 76 RA, 44 PsA)
9. Method validation: validate the methods (NLP + patient classification) at the patient level; 30 IA patients (reference sample)

Results: Among 168,667 veterans with IA, the mean age was 63.1 years and 90.3% were male. Method development involved radiology note and erosion term selection, rule development, NLP model building, and method validation. The NLP model accuracy was 94.6% at the snippet level and 90.0% at the patient level for all IA patients.

Accuracy of methods.

Conclusion: The methods accurately identify erosions from radiology reports of veterans with IA. They may facilitate a broad range of research involving cohort identification and disease severity stratification.
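The abstract describes the rule-based NLP pipeline only at a high level: snippet extraction with 30 words of context on each side of a candidate term, joint-relevance and IA-attribution filters, affirmation/negation rules, and patient-level aggregation. The Python sketch below illustrates how such a pipeline could be structured. It is not the study's implementation; the term lists, exclusion contexts, negation cues, and aggregation rule are invented placeholders standing in for the 179 candidate terms, 28 non-joint exclusions, 5 non-IA exclusions, 83 affirmation/negation rules, and 5 patient-level rules developed and validated in the study.

```python
# Minimal sketch of a rule-based erosion classifier (illustrative only; not the
# study's code). All lists below are placeholder assumptions.

MEANINGFUL_TERMS = ("erosion", "erosions", "erosive", "erode", "eroded", "pencil-in-cup")
NON_JOINT_CONTEXTS = ("gastric", "duodenal", "corneal")      # e.g. gastric erosion
NON_IA_CAUSES = ("infection", "osteomyelitis", "tumor")      # articular erosion not from IA
NEGATION_CUES = ("no ", "without ", "no evidence of ", "negative for ")


def extract_snippets(note_text, window=30):
    """Yield (term, snippet) pairs with `window` words of context before and after each hit."""
    words = note_text.split()
    for i, word in enumerate(words):
        hit = next((t for t in MEANINGFUL_TERMS if t in word.lower()), None)
        if hit:
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            yield hit, " ".join(words[lo:hi])


def classify_snippet(term, snippet):
    """Label one snippet as erosion_present, erosion_absent, or irrelevant."""
    text = snippet.lower()
    if any(ctx in text for ctx in NON_JOINT_CONTEXTS):
        return "irrelevant"      # erosive process not involving a joint
    if any(cause in text for cause in NON_IA_CAUSES):
        return "irrelevant"      # articular erosion not attributed to IA
    if any(cue + root in text for cue in NEGATION_CUES for root in ("erosion", "erosive")):
        return "erosion_absent"  # negated mention
    return "erosion_present"


def classify_patient(snippet_labels):
    """Aggregate snippet-level labels into a patient-level call (simplified discordance rule)."""
    relevant = [lab for lab in snippet_labels if lab != "irrelevant"]
    if not relevant:
        return "unclassified"
    # Simplified rule: any affirmed erosion outweighs negated mentions.
    return "erosive" if "erosion_present" in relevant else "non-erosive"


if __name__ == "__main__":
    note = "No evidence of erosion or joint space narrowing in the hands."
    labels = [classify_snippet(t, s) for t, s in extract_snippets(note)]
    print(classify_patient(labels))  # -> non-erosive
```

In practice, a clinical NLP system of this kind would rely on the full validated rule set and more robust context and negation handling than the simple substring checks shown here.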
