Abstract
Purpose: Data mining algorithms using electronic health records (EHRs) are useful in large-scale population-wide studies to classify etiology and comorbidities (Casey et al., 2016). Here, we apply this approach to developmental language disorder (DLD), a prevalent communication disorder whose risk factors and epidemiology remain largely unknown.

Method: We first created a reliable system for manually identifying DLD in EHRs based on speech-language pathologist (SLP) diagnostic expertise. We then developed and validated an automated algorithmic procedure, the Automated Phenotyping Tool for identifying DLD cases in health systems data (APT-DLD), which assigns a DLD status to patients within EHRs on the basis of ICD (International Statistical Classification of Diseases and Related Health Problems) codes. APT-DLD was validated in a discovery sample (N = 973) using expert SLP manual phenotype coding as a gold-standard comparison and then applied and further validated in a replication sample of N = 13,652 EHRs.

Results: In the discovery sample, APT-DLD correctly classified 98% of DLD cases, as measured by concordance with manually coded records in the training set, indicating that APT-DLD successfully mimics a comprehensive chart review. The output of APT-DLD was also validated against independently conducted SLP clinician coding in a subset of records, with a positive predictive value of 95% for cases classified as DLD. We also applied APT-DLD to the replication sample, where it achieved a positive predictive value of 90% in relation to SLP clinician classification of DLD.

Conclusions: APT-DLD is a reliable, valid, and scalable tool for identifying DLD cohorts in EHRs. This new method has promising public health implications for future large-scale epidemiological investigations of DLD and may inform EHR data mining algorithms for other communication disorders.

Supplemental Material S1. Developmental language disorder (DLD) manual chart review rubric.
Supplemental Material S2. Intercoder reliability for research assistant coders and SLP coders for 10% of the discovery sample.
Supplemental Material S3. Determining the DLD phenotype among EHRs retrieved from a broad search for LD symptoms.

Walters, C. E., Jr., Nitin, R., Margulis, K., Boorom, O., Gustavson, D. E., Bush, C. T., Davis, L. K., Below, J. E., Cox, N. J., Camarata, S. M., & Gordon, R. L. (2020). Automated Phenotyping Tool for identifying developmental language disorder cases in health systems data (APT-DLD): A new research algorithm for deployment in large-scale electronic health record systems. Journal of Speech, Language, and Hearing Research. https://doi.org/10.1044/2020_JSLHR-19-00397
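
Illustrative note: the two validation metrics named in the Results, concordance with manual coding and positive predictive value (PPV) against clinician coding, can be sketched as below. This is a minimal illustration, not the authors' code. The function names and the toy labels are hypothetical, and concordance is taken here to mean simple per-record agreement, which is an assumption about how the paper defines it.

```python
def concordance(algorithm_labels, manual_labels):
    """Fraction of records where algorithm and manual coding agree.

    Assumption: concordance means simple per-record agreement.
    """
    if len(algorithm_labels) != len(manual_labels):
        raise ValueError("label lists must have the same length")
    if not manual_labels:
        raise ValueError("no records to compare")
    agree = sum(a == m for a, m in zip(algorithm_labels, manual_labels))
    return agree / len(manual_labels)


def positive_predictive_value(algorithm_labels, clinician_labels):
    """Among records the algorithm flags as cases, the fraction the clinician confirms."""
    if len(algorithm_labels) != len(clinician_labels):
        raise ValueError("label lists must have the same length")
    confirmed = [c for a, c in zip(algorithm_labels, clinician_labels) if a]
    if not confirmed:
        raise ValueError("algorithm flagged no cases; PPV is undefined")
    return sum(confirmed) / len(confirmed)


# Hypothetical toy labels (True = classified as DLD); not data from the study.
algorithm = [True, True, True, False, False, True, False, True]
reviewer = [True, True, False, False, False, True, False, True]

print(concordance(algorithm, reviewer))                # 7 of 8 records agree
print(positive_predictive_value(algorithm, reviewer))  # 4 of 5 flagged cases confirmed
```

PPV depends only on the records the algorithm flags as cases, so it can be estimated by having a clinician review a subset of flagged records, without reviewing every chart in the sample.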