Natural Language Processing Pipeline Research Articles

Pediatric asthma is a heterogeneous disease; however, current characterizations of its subtypes are limited. Machine learning (ML) methods are well-suited for identifying subtypes. In particular, deep neural networks can learn patient representations by leveraging longitudinal information captured in electronic health records (EHRs) while considering future outcomes. However, the traditional approach for subtype analysis requires large amounts of EHR data, which may contain protected health information, raising concerns about patient privacy. Federated learning is a key technology for addressing these privacy concerns while preserving the accuracy and performance of ML algorithms. Federated learning could enable multisite development and implementation of ML algorithms to facilitate the translation of artificial intelligence into clinical practice. The aim of this study is to develop a research protocol for implementation of federated ML across a large clinical research network to identify and discover pediatric asthma subtypes and their progression over time. This mixed methods study uses data and clinicians from the OneFlorida+ clinical research network, which is a large regional network covering linked and longitudinal patient-level real-world data (RWD) of over 20 million patients from Florida, Georgia, and Alabama in the United States. To characterize the subtypes, we will use OneFlorida+ data from 2011 to 2023 and develop a research-grade pediatric asthma computable phenotype and clinical natural language processing pipeline to identify pediatric patients with asthma aged 2-18 years. We will then apply federated learning to characterize pediatric asthma subtypes and their temporal progression. Using the Promoting Action on Research Implementation in Health Services framework, we will conduct focus groups with practicing pediatric asthma clinicians within the OneFlorida+ network to investigate the clinical utility of the subtypes. With a user-centered design, we will create prototypes to visualize the subtypes in the EHR to best assist with the clinical management of children with asthma. OneFlorida+ data from 2011 to 2023 have been collected for 411,628 patients aged 2-18 years along with 11,156,148 clinical notes. We expect to complete the computable phenotyping within the first year of the project, followed by subtyping during the second and third years, and will then perform the focus groups and establish the user-centered design in the fourth and fifth years of the project. Pediatric asthma subtypes incorporating RWD from diverse populations could improve patient outcomes by moving the field closer to precision pediatric asthma care. Our privacy-preserving federated learning methodology and qualitative implementation work will address several challenges of applying ML to large, multicenter RWD. DERR1-10.2196/57981.
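The abstract does not say which federated learning algorithm the protocol will use. As a rough illustration of the general idea only, the sketch below assumes simple federated averaging: each site trains locally and shares model weights and a sample count, never patient records, and a coordinator combines the weights in proportion to each site's sample count. The function name, the weight layout, and the two example sites are hypothetical.

```python
from typing import Dict, List, Tuple

# A model is represented here as named lists of floats (hypothetical layout).
Weights = Dict[str, List[float]]


def federated_average(site_updates: List[Tuple[Weights, int]]) -> Weights:
    """Average per-site weights, weighting each site by its sample count.

    Only weights and counts are exchanged; raw EHR data stays at each site.
    """
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("at least one site must contribute samples")

    # Start from zeros with the same shape as the first site's weights.
    first_weights = site_updates[0][0]
    averaged = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    for weights, n in site_updates:
        share = n / total  # this site's fraction of all samples
        for name, vals in weights.items():
            for i, v in enumerate(vals):
                averaged[name][i] += share * v
    return averaged


# Hypothetical example: two sites with 100 and 300 patients.
site_a = ({"layer": [1.0, 2.0]}, 100)
site_b = ({"layer": [3.0, 4.0]}, 300)
global_weights = federated_average([site_a, site_b])
```

In this toy case the larger site contributes three quarters of the combined weights. A real deployment would repeat this over many training rounds and would typically add further safeguards that are outside the scope of this sketch.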

e13624 Background: Cancer staging is instrumental in driving clinical management and trial enrollment, but staging data are generally unreliable and unstructured in the electronic health record (EHR). Advances in natural language processing (NLP) may facilitate clinical staging and documentation [1], but challenges to real-world implementation include (1) automatically identifying appropriate patients and reports from the EHR and (2) developing an unbiased dataset for training and validation [2]. We describe our institution’s novel approach to overcome these barriers while building an in-house NLP pipeline for clinical tumor staging of non-small-cell lung cancer (NSCLC). Methods: We identified patients by searching our EHR (Epic) for a molecular analysis test ordered specifically for pathological diagnoses of NSCLC at our institution. We used the test order date as the diagnosis proxy date (DPD). For each patient, we extracted imaging reports up to 16 weeks before and 6 weeks after the DPD. To derive primary tumor size, we analyzed the CT Chest or PET/CT report closest to the DPD using an oncology-trained NLP text extraction and labeling tool (John Snow Labs). We cleaned all extracted tumor size entities and identified the largest measurement linked to the lungs. We compared primary tumor measurements from the NLP pipeline to those in a preexisting, manually compiled cancer registry (CNEXT). We manually analyzed discrepancies through chart review. Results: 542 patients with a DPD between 11/2016 and 9/2023 were processed through the NLP pipeline. Of 443 patients with valid values in both the pipeline and CNEXT, 53% (234) were exact matches, and 20% (90) had a close match (within 5 mm), yielding a 73% accuracy rate for values within 5 mm. When mismatched values were manually reviewed, several cases in CNEXT were found to have a DPD differing by more than 3 months and tumor sizes derived from external reports. When these cases were excluded, 320 of the remaining 349 patients had valid values in both the pipeline and the updated manual review. In this refined population, 66% (213) were exact matches, and 15% (48) had a close match, yielding an 82% accuracy rate for values within 5 mm. Conclusions: To our knowledge, this is the first report of a pathology-based method to automatically and reliably identify patients with NSCLC and their relevant imaging reports directly from the EHR. We used a prebuilt NLP tool to derive primary tumor sizes with relatively high accuracy and found that adding flags for timeline discrepancies and external reports can further improve validity. As we near completion of analogous pipelines for node and metastasis staging, we will develop methodology to identify subgroups of patients that can be clinically staged with near-perfect accuracy, ultimately aiming to substantially limit manual staging of uncomplicated cases. 1. Puts 2023. 2. Wang 2022.
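The report-selection and comparison steps described in the Methods and Results can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the window bounds and the 5 mm tolerance are treated as inclusive, and the NLP extraction step itself is omitted.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

WEEKS_BEFORE = 16  # window before the diagnosis proxy date (from the abstract)
WEEKS_AFTER = 6    # window after the diagnosis proxy date (from the abstract)
CLOSE_MM = 5.0     # tolerance for a "close" match; inclusive bound is an assumption


def closest_report_in_window(dpd: date, report_dates: List[date]) -> Optional[date]:
    """Return the report date nearest the DPD within the window, or None."""
    start = dpd - timedelta(weeks=WEEKS_BEFORE)
    end = dpd + timedelta(weeks=WEEKS_AFTER)
    in_window = [d for d in report_dates if start <= d <= end]
    if not in_window:
        return None
    # Ties resolve to the first listed date; the abstract does not specify this.
    return min(in_window, key=lambda d: abs((d - dpd).days))


def classify_match(pipeline_mm: float, registry_mm: float) -> str:
    """Label a pipeline-vs-registry tumor size pair."""
    diff = abs(pipeline_mm - registry_mm)
    if diff == 0:
        return "exact"
    if diff <= CLOSE_MM:
        return "close"
    return "mismatch"


def accuracy_within_tolerance(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of pairs that are exact or close matches."""
    hits = sum(1 for p, r in pairs if classify_match(p, r) != "mismatch")
    return hits / len(pairs)


# Hypothetical data, for illustration only.
dpd = date(2020, 6, 1)
reports = [date(2020, 1, 1), date(2020, 5, 20), date(2020, 6, 10), date(2020, 9, 1)]
chosen = closest_report_in_window(dpd, reports)

pairs = [(30.0, 30.0), (22.0, 25.0), (40.0, 51.0), (15.0, 15.0)]
accuracy = accuracy_within_tolerance(pairs)
```

Here the January and September reports fall outside the window, and the June 10 report is chosen because it is 9 days from the DPD versus 12 days for May 20. Three of the four made-up pairs are exact or close matches.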


Articles published on Natural Language Processing Pipeline

Enhancing Aortic Aneurysm Surveillance: Transformer Natural Language Processing for Flagging and Measuring in Radiology Reports

Using Artificial Intelligence to Support Informed Decision-Making on BRAF Mutation Testing.

Improving surveillance of emergency department activity through natural language processing

An innovative linked Electronic Health Record derived data ecosystem for understanding the complexities of health and ageing.

A Comprehensive Natural Language Processing Pipeline for the Chronic Lupus Disease.

Utilizing natural language processing to analyze student narrative reflections for medical curriculum improvement.

Development of a natural language processing pipeline for assessment of cardiovascular risk in myeloproliferative neoplasms.

Use of Natural Language Processing to Extract and Classify Papillary Thyroid Cancer Features From Surgical Pathology Reports

Improving biomedical entity linking for complex entity mentions with LLM-based text simplification.

Demystifying large language models in second language development research

Representation of Social Determinants of Health terminology in medical subject headings: impact of added terms.

Combining Federated Machine Learning and Qualitative Methods to Investigate Novel Pediatric Asthma Subtypes: Protocol for a Mixed Methods Study.

Clinical efficacy of pre-trained large language models through the lens of aphasia

Early evaluation of a natural language processing tool to improve access to educational resources for surgical patients.

Advancing equity in breast cancer care: natural language processing for analysing treatment outcomes in under-represented populations

Age modulates the predictive value of self-reported sleepiness for all-cause mortality risk: insights from a comprehensive national database of veterans.

Age modifies the association between severe sleep apnea and all-cause mortality

Natural language processing pipeline to extract prostate cancer-related information from clinical notes.

Novel approach to implementing natural language processing for clinical staging of non-small-cell lung cancer.

Implementing a natural language processing pipeline to expedite clinical chart review to assess germline genetic testing in veterans with breast cancer.
