Challenges In Natural Language Processing Research Articles

Achieving the Sustainable Development Goals (SDGs) requires collaboration among various stakeholders, particularly governments and non-state actors (NSAs). This collaboration both produces and depends on a continually growing volume of documents that government officials must analyze and process systematically. Artificial Intelligence and Natural Language Processing (NLP) could thus offer valuable support for progress towards SDG targets, including automating government budget tagging, classifying NSA requests and initiatives, and helping uncover possibilities for matching these two categories of activities. Many non-English-speaking countries, including Indonesia, however, have limited NLP resources, such as domain-specific pre-trained language models (PTLMs). This makes it difficult to automate document processing and improve the efficacy of SDG-related government efforts. The presented study introduces IndoGovBERT, a Bidirectional Encoder Representations from Transformers (BERT)-based PTLM built with domain-specific corpora drawn from the Indonesian government’s public and internal documents. The model is intended to automate various laborious SDG document processing tasks performed by the Indonesian government. Different approaches to PTLM development known from the literature are examined in the context of typical government settings. The methodology that is most effective in terms of the resulting model performance, and also most efficient in terms of the computational resources required, is identified and used to develop the IndoGovBERT model. The developed model is then evaluated in several text classification and similarity assessment experiments, where it is compared with four Indonesian general-purpose language models, a non-transformer approach (the Multilabel Topic Model, MLTM), and a Multilingual BERT model. Results obtained in all experiments highlight the superior capability of the IndoGovBERT model for Indonesian government SDG document processing. This suggests that the proposed PTLM development methodology could be adopted to build high-performance specialized PTLMs for governments around the globe that face SDG document processing and other NLP challenges similar to those addressed in the presented study.
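As a rough illustration of the similarity assessment mentioned in the abstract, the sketch below matches an NSA request to the closest government budget item by cosine similarity between embedding vectors. The abstract does not state which similarity measure or embeddings the study used, so cosine similarity, the function names (`cosine_similarity`, `best_match`), and the three-dimensional toy vectors are all assumptions made for illustration; in practice the vectors would come from a language model encoder.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def best_match(query_vec, candidates):
    """Return the candidate name whose vector is most similar to the query."""
    return max(candidates, key=lambda name: cosine_similarity(query_vec, candidates[name]))

# Hypothetical toy embeddings (not data from the study).
budget_items = {
    "clean_water_program": [0.9, 0.1, 0.0],
    "school_construction": [0.1, 0.8, 0.3],
}
nsa_request = [0.8, 0.2, 0.1]

match = best_match(nsa_request, budget_items)
```

With these toy vectors the request lands on the water program, since its vector points in nearly the same direction as the request vector.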

Efficiently treating cardiac patients before the onset of a heart attack relies on the precise prediction of heart disease. Identifying and detecting the risk factors for heart disease, such as diabetes mellitus, Coronary Artery Disease (CAD), hyperlipidemia, hypertension, smoking, familial CAD history, obesity, and medications, is critical for developing effective preventative and management measures. Although Electronic Health Records (EHRs) have emerged as valuable resources for identifying these risk factors, their unstructured format poses challenges for cardiologists in retrieving relevant information. This research proposed employing transfer learning techniques to automatically extract heart disease risk factors from EHRs. Transfer learning, a deep learning technique, has demonstrated significant performance in various clinical natural language processing (NLP) applications, particularly in heart disease risk prediction. This study explored the application of transformer-based language models, specifically pre-trained architectures such as BERT (Bidirectional Encoder Representations from Transformers), RoBERTa, BioClinicalBERT, XLNet, and BioBERT, for heart disease detection and extraction of related risk factors from clinical notes, using the i2b2 dataset. These transformer models are pre-trained on an extensive corpus of medical literature and clinical records to gain a deep understanding of contextualized language representations. The adapted models are then fine-tuned using annotated datasets specific to heart disease, such as the i2b2 dataset, enabling them to learn patterns and relationships within the domain. These models have demonstrated superior performance in extracting semantic information from EHRs, automating high-performance heart disease risk factor identification, and performing downstream NLP tasks within the clinical domain. This study fine-tuned five widely used transformer-based models, namely BERT, RoBERTa, BioClinicalBERT, XLNet, and BioBERT, using the 2014 i2b2 clinical NLP challenge dataset. The fine-tuned models surpass conventional approaches in predicting the presence of heart disease risk factors. The RoBERTa model achieved the highest performance, with a micro F1-score of 94.27%, while the BERT, BioClinicalBERT, XLNet, and BioBERT models provided competitive performance, with micro F1-scores of 93.73%, 94.03%, 93.97%, and 93.99%, respectively. Finally, a simple ensemble of the five transformer-based models was proposed, which outperformed most existing methods in heart disease risk factor identification, achieving a micro F1-score of 94.26%. This study demonstrated the efficacy of transfer learning using transformer-based models in enhancing risk prediction and facilitating early intervention for heart disease prevention.
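To make the reported metric concrete, the following is a minimal sketch of how a micro-averaged F1 score is computed over multi-label predictions and how a simple ensemble could combine several models. The abstract does not describe how the ensemble was built, so majority voting is an assumption here, and the function names (`micro_f1`, `majority_vote`) and the toy label sets are hypothetical rather than taken from the study.

```python
from collections import Counter

def micro_f1(gold, pred):
    """Micro-averaged F1 over parallel lists of label sets (one set per document)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def majority_vote(model_preds):
    """Keep a label for a document if more than half of the models predict it."""
    n_models = len(model_preds)
    n_docs = len(model_preds[0])
    combined = []
    for i in range(n_docs):
        votes = Counter(label for preds in model_preds for label in preds[i])
        combined.append({label for label, count in votes.items() if count > n_models / 2})
    return combined

# Hypothetical toy data: three documents, three models.
gold = [{"hypertension", "smoker"}, {"diabetes"}, {"cad", "obesity"}]
model_a = [{"hypertension", "smoker"}, {"diabetes"}, {"cad"}]
model_b = [{"hypertension"}, {"diabetes", "obesity"}, {"cad", "obesity"}]
model_c = [{"hypertension", "smoker"}, set(), {"cad", "obesity"}]

ensemble = majority_vote([model_a, model_b, model_c])
ensemble_score = micro_f1(gold, ensemble)
model_a_score = micro_f1(gold, model_a)
```

Micro-averaging pools true positives, false positives, and false negatives across all labels before computing precision and recall, so frequent risk factors weigh more heavily than rare ones. An ensemble is not guaranteed to beat its best member, which is consistent with the figures reported above (94.26% for the ensemble against 94.27% for RoBERTa).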

Articles published on Challenges In Natural Language Processing

Speech-Based Techniques for Emotion Detection in Natural Arabic Audio Files

Review of Language Structures and NLP Techniques for Chinese, Japanese, and English

IndoGovBERT: A Domain-Specific Language Model for Processing Indonesian Government SDG Documents

Disambiguating Clinical Abbreviations by One-to-All Classification: Algorithm Development and Validation Study.

Scientific landscape on opportunities and challenges of large language models and natural language processing

Fast Hybrid Approach for Thai News Summarization

Extracting Features from Text Flows based on Semantic Similarity for Text Classification: an Approach Inspired by Audio Analysis

Text-to-text generative approach for enhanced complex word identification

Towards explainable fake news detection and automated content credibility assessment: Polish internet and digital media use-case

Chimp Optimization Algorithm with Deep Learning-Driven Fine-grained Emotion Recognition in Arabic Corpus

Navigating the currents of natural language processing: A comprehensive overview of modern techniques and applications

Artificial intelligence and management education: A conceptualization of human-machine interaction

STVANet: A spatio-temporal visual attention framework with large kernel attention mechanism for citywide traffic dynamics prediction

Comparative Analysis of Deep Learning Models for Part of Speech Tagging in the Malay Language

Comprehensive analysis of natural language processing

Language Threshold for Multilingual Sentiment Analysis System

Large language models for biomedicine: foundations, opportunities, challenges, and best practices.

Adapting transformer-based language models for heart disease detection and risk factors extraction

Waste Pollution Classification in Indonesian Language using DistilBERT

Challenges of Natural Language Processing from a Linguistic Perspective
