Determining Multiple Sclerosis Phenotype from Electronic Medical Records.

Richard E Nelson, Kristin Knippenberg, Scott L DuVall, Aaron W C Kamauu, Jorie Butler, Joanne LaFleur

doi:10.18553/jmcp.2016.22.12.1377

Abstract

Multiple sclerosis (MS), a central nervous system disease in which nerve signals are disrupted by scarring and demyelination, is classified into phenotypes depending on the patterns of cognitive or physical impairment progression: relapsing-remitting MS (RRMS), primary-progressive MS (PPMS), secondary-progressive MS (SPMS), or progressive-relapsing MS (PRMS). The phenotype is important in managing the disease and determining appropriate treatment. The ICD-9-CM code 340.0 is uninformative about MS phenotype, which increases the difficulty of studying the effects of phenotype on disease.

The objective of this study was to identify MS phenotype using natural language processing (NLP) techniques on progress notes and other clinical text in the electronic medical record (EMR).

Patients with at least 2 ICD-9-CM codes for MS (340.0) from 1999 through 2010 were identified from nationwide EMR data in the Department of Veterans Affairs. Clinical experts were interviewed for possible keywords and phrases denoting MS phenotype in order to develop a data dictionary for NLP. For each patient, NLP was used to search EMR clinical notes recorded since the first MS diagnosis date for these keywords and phrases. Mentions of phenotype-related keywords and phrases were analyzed in context to remove those that were negated (e.g., "not relapsing-remitting") or unrelated to MS (e.g., "RR" meaning "respiratory rate"). One thousand mentions of MS phenotype were validated, and all records of 150 patients were reviewed for missed mentions.

There were 7,756 MS patients identified by ICD-9-CM code 340.0. MS phenotype was identified for 2,854 (36.8%) patients, with 1,836 (64.3%) of those having just 1 phenotype mentioned in their EMR clinical notes: 1,118 (39.2%) RRMS, 325 (11.4%) PPMS, 374 (13.1%) SPMS, and 19 (0.7%) PRMS. A total of 747 patients (26.2%) had 2 phenotypes, the most common combination being RRMS and SPMS in 459 patients (16.1%). A total of 213 patients (7.5%) had 3 phenotypes, and 58 patients (2.0%) had 4 phenotypes mentioned in their EMR clinical notes. Positive predictive value of phenotype identification was 93.8%, with sensitivity of 94.0%.

Phenotype was documented for slightly more than one third of MS patients, an important but disappointing finding that sets a limit on studying the effects of phenotype on MS in general. However, for cases where the phenotype was documented, NLP accurately identified the phenotypes. Having multiple phenotypes documented is consistent with disease progression. The most common misidentifications arose from ambiguity in notes written while clinicians were still trying to determine phenotype. This study brings attention to the need for care providers to document MS phenotype more consistently and provides a solution for capturing phenotype from clinical text.

This study was funded by Anolinx and F. Hoffmann-La Roche. Nelson serves as a consultant for Anolinx. Kamauu is owner of Anolinx, which has received multiple research grants from pharmaceutical and biotechnology companies. LaFleur has received a Novartis grant for ongoing work. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the U.S. government.

Study concept and design were contributed by Butler, LaFleur, Kamauu, DuVall, and Nelson. DuVall collected the data, and interpretation was performed by Nelson, DuVall, and Kamauu, along with Butler, LaFleur, and Knippenberg. The manuscript was written primarily by Nelson, along with Knippenberg, with assistance from the other authors, and was revised by Knippenberg, Nelson, and DuVall, along with the other authors.
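
As an illustration of the keyword search with context filtering described in the abstract, the following is a minimal Python sketch and not the authors' implementation. The keyword patterns, the negation cues, the 40-character context window, and the function names `find_phenotypes` and `patient_phenotypes` are all assumptions made for this example; the study's actual data dictionary and NLP tooling are not specified here.

```python
import re

# Hypothetical keyword dictionary (assumption): the study's real dictionary
# was built from clinical expert interviews and is not reproduced here.
# Bare "RR" is deliberately excluded because it often means "respiratory rate".
PHENOTYPE_PATTERNS = {
    "RRMS": r"\b(?:RRMS|relapsing[- ]remitting)\b",
    "PPMS": r"\b(?:PPMS|primary[- ]progressive)\b",
    "SPMS": r"\b(?:SPMS|secondary[- ]progressive)\b",
    "PRMS": r"\b(?:PRMS|progressive[- ]relapsing)\b",
}

# Assumed negation cues. A cue counts only if no sentence break ('.' or ';')
# separates it from the phenotype mention.
NEGATION = re.compile(
    r"\b(?:not|no|denies|without|ruled out)\b[^.;]*$", re.IGNORECASE
)

WINDOW = 40  # characters of preceding context to inspect (assumption)


def find_phenotypes(note: str) -> set[str]:
    """Return phenotype labels mentioned, and not negated, in one note."""
    found = set()
    for label, pattern in PHENOTYPE_PATTERNS.items():
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            context = note[max(0, match.start() - WINDOW):match.start()]
            if NEGATION.search(context):
                continue  # negated mention, e.g. "not relapsing-remitting"
            found.add(label)
    return found


def patient_phenotypes(notes: list[str]) -> set[str]:
    """Union of phenotype labels across all of a patient's notes."""
    labels = set()
    for note in notes:
        labels |= find_phenotypes(note)
    return labels
```

Under these assumptions, a patient whose notes mention RRMS early and secondary-progressive MS later would be assigned both labels, which mirrors the multiple-phenotype patients reported in the results.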

Full Text