Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing
It is our great pleasure to welcome you to the 2013 ACM International Workshop on Mining Unstructured Big Data using Natural Language Processing, held at the ACM International Conference on Information and Knowledge Management (CIKM 2013). Unstructured text data is heterogeneous and available in different formats, such as text documents, scientific publications, web pages, and customer comments. The availability of many large unstructured text datasets both enables and challenges researchers to discover and explore valuable information and knowledge via different techniques. Mining semantics with Natural Language Processing (NLP) methodologies is an important approach to uncovering the "latent knowledge/semantics" of unstructured text data. In the past decade, a number of NLP-based features have been used successfully to enhance the performance of text mining and information retrieval systems, but challenges remain. For instance, the computational cost of most NLP algorithms is high, so they can hardly be applied directly to large-scale text data in online systems. In this workshop, we bring together different but closely related research communities, i.e., "NLP", "Text Mining" and "IR" researchers, to investigate the opportunities and challenges in semantic mining. Nine very interesting papers, covering semantic analysis, social media mining, real-time information extraction, and other topics, will be presented. The workshop offers both the NLP and text mining research communities an opportunity to clarify, from their research experience, the opportunities and challenges in NLP-based semantic mining for big unstructured text data. We also encourage attendees to attend the keynote presentation, "HathiTrust Data, Opportunities and Challenges for Text Mining and NLP", by Dr. Beth A. Plale, Director of the Data to Insight Center and Professor at the School of Informatics and Computing, Indiana University.
HathiTrust is a partnership of academic and research institutions, offering a collection of millions of titles digitized from libraries around the world, plus effective API access. We hope that you will find this program interesting and thought-provoking and that the workshop will provide you with a valuable opportunity to share ideas with other researchers and practitioners from institutions around the world.
- Research Article
60
- 10.1111/ajt.14099
- Jan 4, 2017
- American Journal of Transplantation
Big Data, Predictive Analytics, and Quality Improvement in Kidney Transplantation: A Proof of Concept.
- Research Article
31
- 10.1093/jamiaopen/ooac006
- Jan 7, 2022
- JAMIA open
Objective: To evaluate whether a natural language processing (NLP) algorithm could be adapted to extract, with acceptable validity, markers of residential instability (ie, homelessness and housing insecurity) from electronic health records (EHRs) of 3 healthcare systems. Materials and methods: We included patients 18 years and older who received care at 1 of 3 healthcare systems from 2016 through 2020 and had at least 1 free-text note in the EHR during this period. We conducted the study independently; the NLP algorithm logic and method of validity assessment were identical across sites. The approach to the development of the gold standard for assessment of validity differed across sites. Using the EntityRuler module of spaCy 2.3 Python toolkit, we created a rule-based NLP system made up of expert-developed patterns indicating residential instability at the lead site and enriched the NLP system using insight gained from its application at the other 2 sites. We adapted the algorithm at each site then validated the algorithm using a split-sample approach. We assessed the performance of the algorithm by measures of positive predictive value (precision), sensitivity (recall), and specificity. Results: The NLP algorithm performed with moderate precision (0.45, 0.73, and 1.0) at 3 sites. The sensitivity and specificity of the NLP algorithm varied across 3 sites (sensitivity: 0.68, 0.85, and 0.96; specificity: 0.69, 0.89, and 1.0). Discussion: The performance of this NLP algorithm to identify residential instability in 3 different healthcare systems suggests the algorithm is generally valid and applicable in other healthcare systems with similar EHRs. Conclusion: The NLP approach developed in this project is adaptable and can be modified to extract types of social needs other than residential instability from EHRs across different healthcare systems.
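The three validity measures named in this abstract can be computed from a confusion matrix against the gold standard. The following is a minimal sketch, not the study's code: the `ConfusionCounts` type, the `validity_measures` function, and the example counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing algorithm output with a gold standard."""
    tp: int  # algorithm positive, gold standard positive
    fp: int  # algorithm positive, gold standard negative
    fn: int  # algorithm negative, gold standard positive
    tn: int  # algorithm negative, gold standard negative


def validity_measures(c: ConfusionCounts) -> dict[str, float]:
    """Return PPV (precision), sensitivity (recall), and specificity."""
    return {
        "ppv": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts for illustration only; not taken from the study.
example = ConfusionCounts(tp=17, fp=3, fn=3, tn=27)
measures = validity_measures(example)
```

The sketch does not guard against empty denominators (for example, a validation sample with no gold-standard positives); a production version would need to decide how to report those cases.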
- Research Article
1
- 10.1200/jco.2020.38.15_suppl.2043
- May 20, 2020
- Journal of Clinical Oncology
2043 Background: Electronic health records (EHR) are used for retrospective cancer outcomes analysis. Sites and timing of recurrence are not captured in structured EHR data. Novel computerized methods are necessary to use unstructured longitudinal EHR data for large scale studies. Methods: We previously developed a neural network-based NLP algorithm to identify no recurrence vs. metastatic recurrence cases by analyzing physician notes, pathology and radiology reports in Stanford’s breast cancer database, Oncoshare (Cohort A). To validate this algorithm for local vs. distant recurrence, we identified a distinct Oncoshare cohort (Cohort B). Cases were manually curated for longitudinal development of local or distant recurrence and metastatic sites. A two-sided t-test was used to compare mean probabilities between local and distant recurrence cases. Next, we combined cases in Cohorts A and B to train and validate a novel NLP classifier that identifies metastatic site. The combined cohort was randomly divided into training and validation sets. Sensitivity and specificity were calculated for the NLP algorithm’s ability to detect metastatic sites compared to manual curation. Results: In Cohort B: 350 metastatic cases were identified. Mean probability for local and distant recurrence was 0.43 and 0.79, respectively and differed significantly for patients with local vs. distant recurrence (p<0.01). In Cohorts A and B: 632 metastatic cases were used for determination of sites. Sensitivity and specificity were highest for detection of peritoneal metastasis followed by liver, lung, skin, bone and central nervous system (table). Conclusions: This NLP algorithm is a scalable tool that uses unstructured EHR data to capture breast cancer recurrence, distinguishing local from distant recurrence and identifying metastatic site. This method may facilitate analysis of large datasets and correlation of outcomes with metastatic site. [Table: see text]
- Research Article
64
- 10.1016/j.ijrobp.2021.01.044
- Feb 3, 2021
- International journal of radiation oncology, biology, physics
Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer
- Research Article
43
- 10.1186/s12911-019-0780-5
- Apr 1, 2019
- BMC Medical Informatics and Decision Making
Background: Osteoporosis has become an important public health issue. Most of the population, particularly elderly people, are at some degree of risk of osteoporosis-related fractures. Accurate identification and surveillance of patient populations with fractures has a significant impact on reduction of cost of care by preventing future fractures and their corresponding complications. Methods: In this study, we developed a rule-based natural language processing (NLP) algorithm for identification of twenty skeletal site-specific fractures from radiology reports. The rule-based NLP algorithm was based on regular expressions developed using MedTagger, an NLP tool of the Apache Unstructured Information Management Architecture (UIMA) pipeline to facilitate information extraction from clinical narratives. Radiology notes were retrieved from the Mayo Clinic electronic health records data warehouse. We developed rules for identifying each fracture type according to physicians’ knowledge and experience, and refined these rules via verification with physicians. This study was approved by the institutional review board (IRB) for human subject research. Results: We validated the NLP algorithm using the radiology reports of a community-based cohort at Mayo Clinic with the gold standard constructed by medical experts. The micro-averaged results of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score of the proposed NLP algorithm are 0.930, 1.0, 1.0, 0.941, 0.961, respectively. The F1-score is 1.0 for 8 fractures, and above 0.9 for a total of 17 out of 20 fractures (85%). Conclusions: The results verified the effectiveness of the proposed rule-based NLP algorithm in automatic identification of osteoporosis-related skeletal site-specific fractures from radiology reports. The NLP algorithm could be utilized to accurately identify the patients with fractures and those who are also at high risk of future fractures due to osteoporosis.
Appropriate care interventions to those patients, not only the most at-risk patients but also those with emerging risk, would significantly reduce future fractures.
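To make the idea of regular-expression rules for site-specific fractures concrete, here is a minimal sketch using Python's standard `re` module rather than MedTagger. The patterns, the two example sites, and the simplified negation check are assumptions made for illustration; the study's actual rules are not given in the abstract.

```python
import re

# Hypothetical patterns for two skeletal sites; the study's actual MedTagger
# rules are not given in the abstract.
SITE_PATTERNS = {
    "hip": re.compile(r"\b(?:hip|femoral neck)\b", re.IGNORECASE),
    "wrist": re.compile(r"\b(?:wrist|distal radius)\b", re.IGNORECASE),
}
FRACTURE = re.compile(r"\bfractur(?:e|es|ed)\b", re.IGNORECASE)
# Simplified negation cue: a negating word earlier in the same sentence.
NEGATION = re.compile(r"\b(?:no|without|negative for)\b", re.IGNORECASE)


def find_fracture_sites(report: str) -> set[str]:
    """Return the sites with an affirmed fracture mention in a report."""
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        match = FRACTURE.search(sentence)
        if match is None:
            continue
        # Skip the sentence if a negation cue precedes the fracture term.
        if NEGATION.search(sentence[: match.start()]):
            continue
        for site, pattern in SITE_PATTERNS.items():
            if pattern.search(sentence):
                found.add(site)
    return found


report = (
    "Acute fracture of the left femoral neck. "
    "No fracture of the distal radius is seen."
)
sites = find_fracture_sites(report)
```

In this example only the hip is returned, because the wrist mention is negated. Real radiology language (historical fractures, hedged findings, negation after the term) needs far richer rules, which is why the authors refined theirs with physicians.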
- Research Article
129
- 10.1111/jgs.15411
- Jul 4, 2018
- Journal of the American Geriatrics Society
Objectives: To examine the value of unstructured electronic health record (EHR) data (free-text notes) in identifying a set of geriatric syndromes. Design: Retrospective analysis of unstructured EHR notes using a natural language processing (NLP) algorithm. Setting: Large multispecialty group. Participants: Older adults (N=18,341; average age 75.9, 58.9% female). Measurements: We compared the number of geriatric syndrome cases identified using structured claims and structured and unstructured EHR data. We also calculated these rates using a population-level claims database as a reference and identified comparable epidemiological rates in peer-reviewed literature as a benchmark. Results: Using insurance claims data resulted in a geriatric syndrome prevalence ranging from 0.03% for lack of social support to 8.3% for walking difficulty. Using structured EHR data resulted in similar prevalence rates, ranging from 0.03% for malnutrition to 7.85% for walking difficulty. Incorporating unstructured EHR notes, enabled by applying the NLP algorithm, identified considerably higher rates of geriatric syndromes: absence of fecal control (2.1%, 2.3 times as much as structured claims and EHR data combined), decubitus ulcer (1.4%, 1.7 times as much), dementia (6.7%, 1.5 times as much), falls (23.6%, 3.2 times as much), malnutrition (2.5%, 18.0 times as much), lack of social support (29.8%, 455.9 times as much), urinary retention (4.2%, 3.9 times as much), vision impairment (6.2%, 7.4 times as much), weight loss (19.2%, 2.9 times as much), and walking difficulty (36.34%, 3.4 times as much). The geriatric syndrome rates extracted from structured data were substantially lower than published epidemiological rates, although adding the NLP results considerably closed this gap. Conclusion: Claims and structured EHR data give an incomplete picture of burden related to geriatric syndromes. Geriatric syndromes are likely to be missed if unstructured data are not analyzed.
Pragmatic NLP algorithms can assist with identifying individuals at high risk of experiencing geriatric syndromes and improving coordination of care for older adults.
- Research Article
- 10.1200/jco.2025.43.16_suppl.e23294
- Jun 1, 2025
- Journal of Clinical Oncology
e23294 Background: PD-L1 testing has become crucial for guiding immunotherapy in mNSCLC and evidence suggests increasing adoption of PD-L1 testing in the community oncology setting. This study evaluated current real-world PD-L1 testing patterns in The US Oncology Network to identify opportunities for augmenting personalized medicine in mNSCLC care. Methods: This observational study included adults with mNSCLC, diagnosed with de novo Stage IV disease or progressed from an earlier stage, who initiated first-line (1L) treatment between 11/01/2022 and 08/31/2024. Data were sourced from iKnowMed electronic health records (EHR). PD-L1 testing documentation was captured from structured EHR fields and supplemented using a validated natural language processing (NLP) algorithm for unstructured records. The NLP results were compared to manual abstraction (gold standard) for a stratified random sample (by clinic and clinic location) of 100 patients without evidence of PD-L1 records in structured records. The sensitivity, specificity, and F1 score of the NLP algorithm were assessed relative to abstraction to evaluate the accuracy and precision of the model. PD-L1 testing patterns were assessed descriptively. Results: Among 2,148 study-eligible patients, 75% (n = 1,607) had PD-L1 testing documented in structured EHR fields at any time. Among patients with structured PD-L1 documentation (n = 1,607), 42% were diagnosed with Stage IV disease and rates of other biomarker testing ranged from 84% (for ROS1) to 91% (for EGFR). Among patients confirmed through abstraction to lack PD-L1 testing (n = 36), 86% were diagnosed with Stage IV disease and rates of other biomarker testing ranged from 42% (for ALK) to 56% (for EGFR). In a sample of 100 patients without evidence of PD-L1 records in structured data, the NLP algorithm performance was 89% for sensitivity (95% confidence interval [CI] 79%-95%), 86% for specificity (95% CI 71%-95%), and 90% for F1 score.
By applying the NLP algorithm for all 541 patients without structured PD-L1 records, an additional 313 patients with PD-L1 tests were identified, resulting in PD-L1 testing across an estimated 89% (n = 1,920) of the overall population. Conclusions: In a contemporary sample of community oncology patients with mNSCLC, approximately 90% received PD-L1 testing. Leveraging information in unstructured data using a validated NLP algorithm increased capture of PD-L1 testing. As the highest PD-L1 testing rate published to date, this result may reflect the proportion of patients for whom PD-L1 testing is clinically appropriate, given that some patients may decline therapy and/or select hospice care. Future research should investigate how community oncology practices successfully implemented PD-L1 testing and apply these learnings to forthcoming actionable biomarkers.
- Research Article
9
- 10.3389/fdgth.2021.777905
- Dec 22, 2021
- Frontiers in Digital Health
Introduction: The Food and Drug Administration Center for Biologics Evaluation and Research conducts post-market surveillance of biologic products to ensure their safety and effectiveness. Studies have found that common vaccine exposures may be missing from structured data elements of electronic health records (EHRs), instead being captured in clinical notes. This impacts monitoring of adverse events following immunizations (AEFIs). For example, COVID-19 vaccines have been regularly administered outside of traditional medical settings. We developed a natural language processing (NLP) algorithm to mine unstructured clinical notes for vaccinations not captured in structured EHR data. Methods: A random sample of 1,000 influenza vaccine administrations, representing 995 unique patients, was extracted from a large U.S. EHR database. NLP techniques were used to detect administrations from the clinical notes in the training dataset [80% (N = 797) of patients]. The algorithm was applied to the validation dataset [20% (N = 198) of patients] to assess performance. Full medical charts for 28 randomly selected administration events in the validation dataset were reviewed by clinicians. The NLP algorithm was then applied across the entire dataset (N = 995) to quantify the number of additional events identified. Results: A total of 3,199 administrations were identified in the structured data and clinical notes combined. Of these, 2,740 (85.7%) were identified in the structured data, while the NLP algorithm identified 1,183 (37.0%) administrations in clinical notes; 459 were not also captured in the structured data. This represents a 16.8% increase in the identification of vaccine administrations compared to using structured data alone.
The validation of 28 vaccine administrations confirmed 27 (96.4%) as “definite” vaccine administrations; 18 (64.3%) had evidence of a vaccination event in the structured data, while 10 (35.7%) were found solely in the unstructured notes. Discussion: We demonstrated the utility of an NLP algorithm to identify vaccine administrations not captured in structured EHR data. NLP techniques have the potential to improve detection of vaccine administrations not otherwise reported without increasing the analysis burden on physicians or practitioners. Future applications could include refining estimates of vaccine coverage and detecting other exposures, population characteristics, and outcomes not reliably captured in structured EHR data.
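The 16.8% figure in this abstract is the count of NLP-only administrations relative to the structured-data count (459 of 2,740). A minimal sketch of that bookkeeping with Python sets follows; the `incremental_capture` function and the toy event identifiers are hypothetical, and only the final two lines use numbers reported in the abstract.

```python
def incremental_capture(structured: set[str], nlp: set[str]) -> dict[str, float]:
    """Summarize how many events NLP adds beyond structured data."""
    combined = structured | nlp   # events found by either source
    nlp_only = nlp - structured   # events found only in clinical notes
    return {
        "combined": len(combined),
        "nlp_only": len(nlp_only),
        "pct_increase": 100 * len(nlp_only) / len(structured),
    }


# Toy event identifiers, hypothetical and for illustration only.
structured_events = {"e1", "e2", "e3", "e4", "e5"}
nlp_events = {"e4", "e5", "e6"}
toy = incremental_capture(structured_events, nlp_events)

# The same arithmetic with the counts reported in the abstract.
reported_increase = round(100 * 459 / 2740, 1)
reported_total = 2740 + 459
```

With the reported counts, the combined total is 3,199 and the relative increase rounds to 16.8%, matching the abstract.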
- Research Article
70
- 10.1186/s13326-020-00231-z
- Nov 16, 2020
- Journal of biomedical semantics
Background: Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations. Methods: Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies’ objectives were categorized by way of induction. These results were used to define recommendations. Results: Two thousand three hundred fifty-five unique studies were identified. Two hundred fifty-six studies reported on the development of NLP algorithms for mapping free text to ontology concepts. Seventy-seven described development and evaluation. Twenty-two studies did not perform a validation on unseen data and 68 studies did not perform external validation. Of 23 studies that claimed that their algorithm was generalizable, 5 tested this by external validation.
A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed. Conclusion: We found many heterogeneous approaches to the reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
- Research Article
22
- 10.1093/ehjqcco/qcad021
- Mar 30, 2023
- European heart journal. Quality of care & clinical outcomes
This study aimed to develop and apply natural language processing (NLP) algorithms to identify recurrent atrial fibrillation (AF) episodes following rhythm control therapy initiation using electronic health records (EHRs). We included adults with new-onset AF who initiated rhythm control therapies (ablation, cardioversion, or antiarrhythmic medication) within two US integrated healthcare delivery systems. A code-based algorithm identified potential AF recurrence using diagnosis and procedure codes. An automated NLP algorithm was developed and validated to capture AF recurrence from electrocardiograms, cardiac monitor reports, and clinical notes. Compared with the reference standard cases confirmed by physicians' adjudication, the F-scores, sensitivity, and specificity were all above 0.90 for the NLP algorithms at both sites. We applied the NLP and code-based algorithms to patients with incident AF (n=22 970) during the 12 months after initiating rhythm control therapy. Applying the NLP algorithms, the percentages of patients with AF recurrence for sites 1 and 2 were 60.7% and 69.9% (ablation), 64.5% and 73.7% (cardioversion), and 49.6% and 55.5% (antiarrhythmic medication), respectively. In comparison, the percentages of patients with code-identified AF recurrence for sites 1 and 2 were 20.2% and 23.7% for ablation, 25.6% and 28.4% for cardioversion, and 20.0% and 27.5% for antiarrhythmic medication, respectively. When compared with a code-based approach alone, this study's high-performing automated NLP method identified significantly more patients with recurrent AF. The NLP algorithms could enable efficient evaluation of treatment effectiveness of AF therapies in large populations and help develop tailored interventions.
- Research Article
- 10.1055/a-2405-3703
- Sep 19, 2024
- American journal of perinatology
Distinguishing between medically indicated induction of labor (iIOL) and elective induction of labor (eIOL) is a daunting process for researchers. We aimed to develop a Natural Language Processing (NLP) algorithm to identify eIOLs from electronic health records (EHRs) within a large integrated health care system. We used structured and unstructured data from Kaiser Permanente Southern California's EHRs of patients who were <35 years old and had singleton deliveries between 37 and 40 gestational weeks. Induction of labor (IOL) pregnancies were identified if there was evidence of an IOL diagnosis code, procedure code, or documentation in a delivery flowsheet or progress note. A comprehensive NLP algorithm was developed and refined through an iterative process of chart reviews and adjudications, where IOL-associated reasons (medically indicated vs. elective induction) were reviewed. The final algorithm was applied to discern the indications of IOLs performed during the study period. A total of 332,163 eligible pregnancies were identified between January 1, 2008, and December 31, 2022. Of these eligible pregnancies, 68,541 (20.6%) were IOL, of which 6,824 (10.0%) were eIOL. Validation of the NLP process against 300 randomly selected pregnancies (100 eIOL, iIOL, and non-IOL cases each) yielded a positive predictive value of 83.0% and 88.0% for eIOL and iIOL, respectively. The rates of eIOL among the maternal age groups ranged between 9.6 and 10.3%, except for the <20 years group (12.2%). Non-Hispanic White individuals had the highest rate of eIOL (13.2%), while non-Hispanic Asian/Pacific Islanders had the lowest rate of eIOL (7.8%). The rate of eIOL increased from 1.0% in the 37-week gestational age (GA) group to 20.6% in the 40-week GA group. Findings suggest that the developed NLP algorithm effectively identifies eIOL. It can be utilized to support eIOL-related pharmacoepidemiological studies, fill in knowledge gaps, and provide content more relevant to researchers. 
· An NLP algorithm was developed to identify indications of IOL. · The study algorithm was successfully implemented within a large integrated health care system. · The study algorithm can be utilized to support eIOL-related studies.
- Research Article
5
- 10.2196/69132
- May 2, 2025
- JMIR AI
Background: Asthma-related symptoms are significant predictors of asthma exacerbation. Most of these symptoms are documented in clinical notes in a free-text format, and effective methods for capturing asthma-related symptoms from unstructured data are lacking. Objective: The study aims to develop a natural language processing (NLP) algorithm for identifying symptoms associated with asthma from clinical notes within a large integrated health care system. Methods: We analyzed unstructured clinical notes within 2 years before a visit with asthma diagnosis in 2013-2018 and 2021-2022 to identify 4 common asthma-related symptoms. Related terms and phrases were initially compiled from publicly available resources and then refined through clinician input and chart review. A rule-based NLP algorithm was iteratively developed and refined via multiple rounds of chart review followed by adjudication. Subsequently, transformer-based deep learning algorithms were trained using the same manually annotated datasets. A hybrid NLP algorithm was then generated by combining rule-based and transformer-based algorithms. The hybrid NLP algorithm was finally applied to the implementation notes. Results: A total of 11,374,552 eligible clinical notes with 128,211,793 sentences were analyzed. After applying the hybrid algorithm to implementation notes, at least 1 asthma-related symptom was identified in 1,663,450 out of 127,763,086 (1.3%) sentences and 858,350 out of 11,364,952 (7.55%) notes, respectively. Cough was the most frequently identified at both the sentence (1,363,713/127,763,086, 1.07%) and note (660,685/11,364,952, 5.81%) levels, while chest tightness was the least frequent at both the sentence (141,733/127,763,086, 0.11%) and note (64,251/11,364,952, 0.57%) levels. The frequency of multiple symptoms ranged from 0.03% (36,057/127,763,086) to 0.38% (484,050/127,763,086) at the sentence level and 0.10% (10,954/11,364,952) to 1.85% (209,805/11,364,952) at the note level.
Validation against 1600 manually annotated clinical notes yielded a positive predictive value ranging from 96.53% (wheezing) to 97.42% (chest tightness) at the sentence level and 96.76% (wheezing) to 97.42% (chest tightness) at the note level. Sensitivity ranged from 93.9% (dyspnea) to 95.95% (cough) at the sentence level and 96% (chest tightness) to 99.07% (cough) at the note level. All 4 symptoms had F1-scores greater than 0.95 at both the sentence and note levels, regardless of NLP algorithms. Conclusions: The developed NLP algorithms could effectively capture asthma-related symptoms from unstructured clinical notes. These algorithms could be used to facilitate early asthma detection and predict exacerbation risk.
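The abstract reports results at two levels, sentences and notes. One plausible reading, assumed here, is that a note counts as positive for a symptom when any of its sentences is. The sketch below illustrates only that aggregation step with a keyword tagger; the term lists, function names, and example notes are hypothetical, and the study's rule-based and transformer-based components are not reproduced.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules for the four symptoms named in the abstract;
# the study's actual term lists are not given.
SYMPTOM_RULES = {
    "cough": re.compile(r"\bcough(?:s|ing)?\b", re.IGNORECASE),
    "wheezing": re.compile(r"\bwheez(?:e|es|ing)\b", re.IGNORECASE),
    "dyspnea": re.compile(r"\b(?:dyspnea|shortness of breath)\b", re.IGNORECASE),
    "chest tightness": re.compile(r"\bchest tightness\b", re.IGNORECASE),
}


def tag_sentence(sentence: str) -> set[str]:
    """Return the symptoms whose rule matches this sentence."""
    return {name for name, rule in SYMPTOM_RULES.items() if rule.search(sentence)}


def aggregate_to_notes(sentences: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Roll sentence-level tags up to notes: any positive sentence counts."""
    note_symptoms: dict[str, set[str]] = defaultdict(set)
    for note_id, sentence in sentences:
        note_symptoms[note_id] |= tag_sentence(sentence)
    return dict(note_symptoms)


# Hypothetical (note_id, sentence) pairs for illustration only.
sentences = [
    ("note-1", "Patient reports coughing at night."),
    ("note-1", "Mild wheezing on exam."),
    ("note-2", "Follow-up for ankle sprain."),
]
by_note = aggregate_to_notes(sentences)
```

This tagger ignores negation ("denies cough") and context, which is exactly the kind of gap that iterative chart review and a learned model are meant to close.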
- Research Article
19
- 10.1200/cci.17.00069
- Feb 20, 2018
- JCO Clinical Cancer Informatics
Purpose: To compare the accuracy and reliability of a natural language processing (NLP) algorithm with manual coding by radiologists, and the combination of the two methods, for the identification of patients whose computed tomography (CT) reports raised the concern for lung cancer. Methods: An NLP algorithm was developed using Clinical Text Analysis and Knowledge Extraction System (cTAKES) with the Yale cTAKES Extensions and trained to differentiate between language indicating benign lesions and lesions concerning for lung cancer. A random sample of 450 chest CT reports performed at Veterans Affairs Connecticut Healthcare System between January 2014 and July 2015 was selected. A reference standard was created by the manual review of reports to determine if the text stated that follow-up was needed for concern for cancer. The NLP algorithm was applied to all reports and compared with case identification using the manual coding by the radiologists. Results: A total of 450 reports representing 428 patients were analyzed. NLP had higher sensitivity and lower specificity than manual coding (77.3% v 51.5% and 72.5% v 82.5%, respectively). NLP and manual coding had similar positive predictive values (88.4% v 88.9%), and NLP had a higher negative predictive value than manual coding (54% v 38.5%). When NLP and manual coding were combined, sensitivity increased to 92.3%, with a decrease in specificity to 62.85%. Combined NLP and manual coding had a positive predictive value of 87.0% and a negative predictive value of 75.2%. Conclusion: Our NLP algorithm was more sensitive than manual coding of CT chest reports for the identification of patients who required follow-up for suspicion of lung cancer. The combination of NLP and manual coding is a sensitive way to identify patients who need further workup for lung cancer.
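The abstract does not state how the two methods were combined, but the reported pattern (sensitivity up, specificity down) is what an either-method-flags rule would produce. The sketch below assumes that OR rule and uses six hypothetical reports; none of the flags or resulting values come from the study.

```python
def sensitivity_specificity(predicted: list[bool], truth: list[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of predicted flags against truth."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives


# Hypothetical flags for six reports; True means "concerning for cancer".
truth = [True, True, True, True, False, False]
nlp = [True, True, False, False, True, False]
manual = [False, True, True, False, False, False]

# Assumed combination rule: a report is flagged if either method flags it.
combined = [a or b for a, b in zip(nlp, manual)]

nlp_result = sensitivity_specificity(nlp, truth)
manual_result = sensitivity_specificity(manual, truth)
combined_result = sensitivity_specificity(combined, truth)
```

Under an OR rule, every true case caught by either method is caught by the combination, so sensitivity cannot fall; every false alarm from either method is also kept, so specificity cannot rise.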
- Research Article
25
- 10.1002/pds.4919
- Dec 3, 2019
- Pharmacoepidemiology and Drug Safety
The objective was to develop a natural language processing (NLP) algorithm to identify vaccine-related anaphylaxis from plain-text clinical notes, and to implement the algorithm at five health care systems in the Vaccine Safety Datalink. The NLP algorithm was developed using an internal NLP tool and training dataset of 311 potential anaphylaxis cases from Kaiser Permanente Southern California (KPSC). We applied the algorithm to the notes of another 731 potential cases (423 from KPSC; 308 from other sites) with relevant codes (ICD-9-CM diagnosis codes for anaphylaxis, vaccine adverse reactions, and allergic reactions; Healthcare Common Procedure Coding System codes for epinephrine administration). NLP results were compared against a reference standard of chart reviewed and adjudicated cases. The algorithm was then separately applied to the notes of 6 427 359 KPSC vaccination visits (9 402 194 vaccine doses) without relevant codes. At KPSC, NLP identified 12 of 16 true vaccine-related cases and achieved a sensitivity of 75.0%, specificity of 98.5%, positive predictive value (PPV) of 66.7%, and negative predictive value of 99.0% when applied to notes of patients with relevant diagnosis codes. NLP did not identify the five true cases at other sites. When NLP was applied to the notes of KPSC patients without relevant codes, it captured eight additional true cases confirmed by chart review and adjudication. The current study demonstrated the potential to apply rule-based NLP algorithms to clinical notes to identify anaphylaxis cases. Increasing the size of training data, including clinical notes from all participating study sites in the training data, and preprocessing the clinical notes to handle special characters could improve the performance of the NLP algorithms. We recommend adding an NLP process followed by manual chart review in future vaccine safety studies to improve sensitivity and efficiency.
- Research Article
42
- 10.1302/0301-620x.102b7.bjj-2019-1574.r1
- Jul 1, 2020
- The Bone & Joint Journal
Natural Language Processing (NLP) offers an automated method to extract data from unstructured free text fields for arthroplasty registry participation. Our objective was to investigate how accurately NLP can be used to extract structured clinical data from unstructured clinical notes when compared with manual data extraction. A group of 1,000 randomly selected clinical and hospital notes from eight different surgeons were collected for patients undergoing primary arthroplasty between 2012 and 2018. In all, 19 preoperative, 17 operative, and two postoperative variables of interest were manually extracted from these notes. An NLP algorithm was created to automatically extract these variables from a training sample of these notes, and the algorithm was tested on a random test sample of notes. Performance of the NLP algorithm was measured in Statistical Analysis System (SAS) by calculating the accuracy of the variables collected, the ability of the algorithm to collect the correct information when it was indeed in the note (sensitivity), and the ability of the algorithm to not collect a certain data element when it was not in the note (specificity). The NLP algorithm performed well at extracting variables from unstructured data in our random test dataset (accuracy = 96.3%, sensitivity = 95.2%, and specificity = 97.4%). It performed better at extracting data that were in a structured, templated format such as range of movement (ROM) (accuracy = 98%) and implant brand (accuracy = 98%) than data that were entered with variation depending on the author of the note such as the presence of deep-vein thrombosis (DVT) (accuracy = 90%). The NLP algorithm used in this study was able to identify a subset of variables from randomly selected unstructured notes in arthroplasty with an accuracy above 90%. For some variables, such as objective exam data, the accuracy was very high.
Our findings suggest that automated algorithms using NLP can help orthopaedic practices retrospectively collect information for registries and quality improvement (QI) efforts. Cite this article: Bone Joint J 2020;102-B(7 Supple B):99-104.