Extraction of Class Candidates from Scenario in Software Requirements Specifications
The development of software applications involves translating software requirement specifications (SRS) into structured models that guide system design. Among these, sequence diagrams are essential for visualizing dynamic interactions, but their manual construction from natural language descriptions is often error-prone and time-consuming. This study proposes an automated method for extracting sequence diagram elements, namely classes, subclasses, and attributes, from scenario sections of SRS documents. The approach leverages Natural Language Processing (NLP) techniques, combining Bidirectional Encoder Representations from Transformers (BERT) for contextual embeddings and a Support Vector Machine (SVM) for classification. Noun phrases are identified and classified into UML-relevant entities using this hybrid model. To evaluate performance, two datasets, SIData and SILo, were used, each exhibiting distinct textual styles and domain characteristics. The system’s effectiveness was assessed using standard evaluation metrics such as precision, recall, and F1-score. Results indicate that the method is capable of capturing contextual relationships between extracted elements, although its performance varies across datasets, suggesting the need for further refinement. Overall, the study contributes toward automating early software design phases and reducing manual modeling effort.
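As a rough illustration of the candidate-extraction step described above, the following sketch groups POS-tagged tokens into noun-phrase candidates, the kind of span a BERT + SVM pipeline would then classify as class, subclass, or attribute. The tag set, example sentence, and grouping rule are simplified assumptions, not the paper's actual pipeline.

```python
# Toy noun-phrase candidate extractor: groups consecutive determiner/
# adjective/noun tokens into candidate phrases for downstream classification.
def extract_candidates(tagged_tokens):
    """tagged_tokens: list of (word, pos) pairs with simplified POS tags."""
    candidates, current = [], []
    for word, pos in tagged_tokens:
        if pos in ("DET", "ADJ", "NOUN"):
            current.append((word, pos))
        else:
            if any(p == "NOUN" for _, p in current):
                candidates.append(" ".join(w for w, _ in current))
            current = []
    if any(p == "NOUN" for _, p in current):
        candidates.append(" ".join(w for w, _ in current))
    return candidates

sentence = [("The", "DET"), ("customer", "NOUN"), ("places", "VERB"),
            ("an", "DET"), ("online", "ADJ"), ("order", "NOUN")]
print(extract_candidates(sentence))  # ['The customer', 'an online order']
```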
- Research Article
7
- 10.47813/2782-5280-2024-3-1-0311-0320
- Mar 2, 2024
- Информатика. Экономика. Управление - Informatics. Economics. Management
First developed in 2018 by Google researchers, Bidirectional Encoder Representations from Transformers (BERT) represents a breakthrough in natural language processing (NLP). BERT achieved state-of-the-art results across a range of NLP tasks while using a single transformer-based neural network architecture. This work reviews BERT's technical approach, performance when published, and significant research impact since release. We provide background on BERT's foundations like transformer encoders and transfer learning from universal language models. Core technical innovations include deeply bidirectional conditioning and a masked language modeling objective during BERT's unsupervised pretraining phase. For evaluation, BERT was fine-tuned and tested on eleven NLP tasks ranging from question answering to sentiment analysis via the GLUE benchmark, achieving new state-of-the-art results. Additionally, this work analyzes BERT's immense research influence as an accessible technique surpassing specialized models. BERT catalyzed adoption of pretraining and transfer learning for NLP. Quantitatively, over 10,000 papers have extended BERT and it is integrated widely across industry applications. Future directions based on BERT scale towards billions of parameters and multilingual representations. In summary, this work reviews the method, performance, impact and future outlook for BERT as a foundational NLP technique.
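The masked language modeling objective mentioned in the review can be sketched in a few lines. The 15% selection rate and the 80/10/10 mask/random/keep split below follow the original BERT paper; the toy vocabulary is purely illustrative.

```python
import random

def mask_for_mlm(tokens, mask_rate=0.15, seed=0):
    """BERT-style masked-LM corruption: ~15% of positions are selected;
    of those, 80% become [MASK], 10% a random vocabulary token, and
    10% are left unchanged. The model is trained to predict the originals
    at the selected positions."""
    rng = random.Random(seed)
    vocab = ["the", "cat", "sat", "mat", "dog"]  # toy vocabulary
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # position the loss is computed on
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = "[MASK]"
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: keep the original token (the 10% "unchanged" case)
    return corrupted, targets
```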
- Research Article
- 10.52783/jisem.v10i51s.10392
- May 30, 2025
- Journal of Information Systems Engineering and Management
Automatic Short Answer Grading (ASAG) has gained increasing importance in educational technology, where accurate and scalable assessment solutions are needed. Recent advances in Natural Language Processing (NLP) have introduced powerful Transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), Text-to-Text Transfer Transformer (T5), and Generative Pre-trained Transformer 3 (GPT-3), which have demonstrated state-of-the-art performance across various text-based tasks. This paper presents a comparative study of these three models in the context of ASAG, evaluating their effectiveness, accuracy, and efficiency. BERT’s bidirectional encoding, T5’s text-to-text framework, and GPT-3’s autoregressive generation are explored in depth to assess their ability to understand, grade, and generate feedback on short answers. We utilize standard ASAG datasets and multiple evaluation metrics, including accuracy, precision, recall, and F1-score, to measure their performance. The comparative analysis reveals that while all three models exhibit strong capabilities, they vary in handling complex language and ambiguous student responses, with trade-offs in computational cost and scalability. This study highlights the strengths and weaknesses of each model in ASAG and offers insights into their practical applications in educational settings. Introduction: The automation of grading has become a focal point in modern education systems, driven by the increasing demand for scalable and efficient assessment solutions (Sahu & Bhowmick, 2015). With the proliferation of online learning platforms, digital classrooms, and remote education, the ability to automatically grade short-answer questions has gained significant importance (Gomaa & Fahmy, 2020). 
Automatic Short Answer Grading (ASAG) seeks to evaluate student responses by comparing them to model answers, often assessing the content’s correctness, relevance, and linguistic features—critical components for evaluating students’ understanding and knowledge retention (Busatta & Brancher, 2018). Traditional ASAG approaches typically employed rule-based systems, statistical models, and early machine learning algorithms that relied heavily on predefined keywords, templates, or handcrafted features (Tulu et al., 2021). While effective for straightforward, fact-based questions, these systems struggled to capture the complexity and variability of natural language, resulting in reduced grading accuracy—especially for creative or ambiguous responses (Sychev et al., 2019). Consequently, such methods often required significant manual intervention, limiting their scalability and applicability in dynamic educational settings (Muftah & Aziz, 2013). The advent of deep learning, particularly in the field of Natural Language Processing (NLP), has marked a transformative shift in ASAG (Gaddipati et al., 2020). Neural network-based models have demonstrated a remarkable capacity to learn and generalize from large datasets, enabling a more nuanced understanding of language (Wang et al., 2019). This has led to the development of more robust ASAG systems capable of handling a broader spectrum of student responses, ranging from factual answers to complex explanations (Roy et al., 2016). A pivotal advancement in NLP is the introduction of the Transformer architecture, which has revolutionized how language models are designed and trained (Vaswani et al., 2017). Transformers excel in processing sequential data through self-attention mechanisms that capture long-range dependencies and contextual relationships within text. 
This architectural innovation has significantly enhanced performance across a variety of NLP tasks, such as machine translation, sentiment analysis, and question answering (Peters et al., 2018), making Transformer-based models particularly suitable for enhancing ASAG systems (Raffel et al., 2020). In this paper, we focus on three prominent Transformer-based models—BERT, T5, and GPT-3—each representing a distinct approach to language understanding and processing. These models have set new benchmarks across numerous NLP tasks, and their potential application in ASAG is substantial. Objectives: The goal of this study is to conduct a comparative analysis of these three Transformer models—BERT, T5, and GPT-3—in the context of ASAG. We evaluate their performance on standard ASAG datasets using multiple evaluation metrics, such as accuracy, precision, recall, and F1-score. Additionally, we analyze the computational efficiency and scalability of these models to determine their practicality for deployment in large-scale educational environments. Methods: By providing a comprehensive comparison, this study seeks to shed light on the strengths and weaknesses of each model and their suitability for different types of ASAG tasks. Moreover, we aim to offer insights that can guide future research and development in this area, ultimately contributing to the creation of more effective and reliable automated grading systems. Results: The results of our comparative analysis of BERT, T5, and GPT-3 in the context of Automatic Short Answer Grading (ASAG) reveal important insights into the strengths and limitations of these Transformer models. This section discusses the implications of our findings and the practical considerations for deploying these models in educational settings, and identifies potential avenues for future research.
Conclusions: In conclusion, this study provides a comprehensive comparative analysis of BERT, T5, and GPT-3 for ASAG, highlighting their strengths, limitations, and practical considerations. The insights gained from this research contribute to the ongoing development and refinement of automated grading systems, with the potential to enhance educational assessment and support in diverse learning environments.
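The evaluation metrics this study relies on (precision, recall, F1) reduce to simple counts over a labeled test set. A minimal reference implementation, with a hypothetical two-label grading example:

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 from gold and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical grading run: gold labels vs. a model's predictions.
gold = ["correct", "wrong", "correct", "correct"]
pred = ["correct", "correct", "wrong", "correct"]
print(precision_recall_f1(gold, pred, "correct"))
```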
- Research Article
- 10.52783/jisem.v10i51s.10376
- May 30, 2025
- Journal of Information Systems Engineering and Management
- Research Article
10
- 10.1007/s11227-023-05319-8
- May 7, 2023
- The Journal of Supercomputing
Since the spread of the coronavirus disease in 2019 (hereafter referred to as COVID-19), millions of people worldwide have been affected by the pandemic, which has significantly changed our habits in various ways. In the effort to eradicate the disease, great help came from unprecedentedly fast vaccine development along with the adoption of strict preventive measures such as lockdowns. Worldwide provisioning of vaccines was thus crucial to achieving maximum immunization of the population. However, the fast development of vaccines, driven by the urge to limit the pandemic, caused skeptical reactions in a large share of the population. More specifically, people’s hesitancy in getting vaccinated was an additional obstacle in fighting COVID-19. To ameliorate this scenario, it is important to understand people’s sentiments about vaccines in order to take proper actions to better inform the population. As a matter of fact, people continuously update their feelings and sentiments on social media, so a proper analysis of those opinions is an important challenge for providing proper information and avoiding misinformation. More in detail, sentiment analysis (Wankhade et al. in Artif Intell Rev 55(7):5731–5780, 2022. https://doi.org/10.1007/s10462-022-10144-1) is a powerful technique in natural language processing that enables the identification and classification of people’s feelings (mainly) in text data. It involves the use of machine learning algorithms and other computational techniques to analyze large volumes of text and determine whether they express positive, negative, or neutral sentiment. Sentiment analysis is widely used in industries such as marketing, customer service, and healthcare, among others, to gain actionable insights from customer feedback, social media posts, and other forms of unstructured textual data.
In this paper, sentiment analysis is used to elaborate on people’s reactions to COVID-19 vaccines in order to provide useful insights that improve the correct understanding of their proper usage and possible advantages. Specifically, a framework that leverages artificial intelligence (AI) methods is proposed for classifying tweets based on their polarity values. We analyzed Twitter data related to COVID-19 vaccines after the most appropriate pre-processing. More specifically, we identified the word clouds of negative, positive, and neutral words using an artificial intelligence tool to determine the sentiment of tweets. After this pre-processing step, we performed classification using the BERT + NBSVM model to classify people’s sentiments about vaccines. The reason for combining bidirectional encoder representations from transformers (BERT) with Naive Bayes and support vector machine (NBSVM) can be understood by considering the limitation of BERT-based approaches, which only leverage encoder layers, resulting in lower performance on short texts like the ones used in our analysis. This limitation can be ameliorated by Naive Bayes and support vector machine approaches, which achieve higher performance in short-text sentiment analysis. Thus, we took advantage of both BERT features and NBSVM features to define a flexible framework for our vaccine sentiment identification goal. Moreover, we enrich our results with a spatial analysis of the data, using geo-coding, visualization, and spatial correlation analysis to suggest the most suitable vaccination centers to users based on the sentiment analysis outcomes. In principle, we do not need a distributed architecture to run our experiments, as the available public data are not massive; however, we discuss a high-performance architecture to be used if the collected data scales up dramatically.
We compared our approach with state-of-the-art methods using the most widely used metrics: accuracy, precision, recall, and F-measure. The proposed BERT + NBSVM outperformed alternative models, achieving 73% accuracy, 71% precision, 88% recall, and 73% F-measure for the classification of positive sentiments, and 73% accuracy, 71% precision, 74% recall, and 73% F-measure for the classification of negative sentiments. These promising results are discussed in the next sections. The use of artificial intelligence methods and social media analysis can lead to a better understanding of people’s reactions and opinions about any trending topic. However, for health-related topics like COVID-19 vaccines, proper sentiment identification could be crucial for implementing public health policies. More in detail, the availability of useful findings on user opinions about vaccines can help policymakers design proper strategies and implement ad hoc vaccination protocols according to people’s feelings, in order to provide better public service. To this end, we leveraged geospatial information to support effective recommendations for vaccination centers.
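The NBSVM component combined with BERT here follows the general recipe of weighting features by Naive Bayes log-count ratios before a linear classifier. A toy sketch of those ratios; the add-one smoothing and tiny vocabulary are illustrative assumptions, not the paper's configuration:

```python
import math

def nb_log_count_ratios(docs, labels, vocab):
    """Naive Bayes log-count ratio r_w = log((p_w/|p|) / (q_w/|q|)) with
    add-one smoothing, where p/q are smoothed word counts in the positive/
    negative class. NBSVM scales each document's features by r before
    feeding them to a linear SVM."""
    p = {w: 1.0 for w in vocab}  # smoothed positive-class counts
    q = {w: 1.0 for w in vocab}  # smoothed negative-class counts
    for doc, y in zip(docs, labels):
        counts = p if y == 1 else q
        for w in doc:
            if w in counts:
                counts[w] += 1
    p_sum, q_sum = sum(p.values()), sum(q.values())
    return {w: math.log((p[w] / p_sum) / (q[w] / q_sum)) for w in vocab}

ratios = nb_log_count_ratios(
    [["good", "vaccine"], ["bad", "vaccine"]], [1, 0],
    ["good", "bad", "vaccine"],
)
print(ratios)
```

Words frequent in positive documents get positive ratios, words frequent in negative documents get negative ratios, and class-neutral words land near zero.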
- Abstract
4
- 10.1016/j.joca.2020.02.488
- Apr 1, 2020
- Osteoarthritis and Cartilage
BERT model fine-tuning for text classification in knee OA radiology reports
- Research Article
3
- 10.11591/eei.v13i2.6301
- Apr 1, 2024
- Bulletin of Electrical Engineering and Informatics
Recent research has focused on opinion mining from public sentiments using natural language processing (NLP) and machine learning (ML) techniques. Transformer-based models, such as bidirectional encoder representations from transformers (BERT), excel at extracting semantic information but are resource-intensive. Google’s recent model FNet (“mixing tokens with Fourier transforms”) replaces BERT’s attention mechanism with a non-parameterized Fourier transform, aiming to reduce training time without compromising performance. This study fine-tuned the FNet model on a publicly available Kaggle hotel review dataset and investigated the performance of this dataset in both the FNet and BERT architectures, along with conventional machine learning models such as long short-term memory (LSTM) and support vector machine (SVM). Results revealed that FNet significantly reduces training time by almost 20% and memory utilization by nearly 60% compared to BERT. The highest test accuracy achieved by FNet in this experiment was 80.27%, which is nearly 97.85% of BERT’s performance with identical parameters.
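FNet's parameter-free token mixing can be shown with a naive DFT. The sketch below applies the transform along the hidden dimension, then along the sequence dimension, and keeps the real part, as in the FNet design; real implementations use an FFT rather than this O(n^2) loop:

```python
import cmath

def dft(vec):
    """Naive 1-D discrete Fourier transform (O(n^2); FNet uses an FFT)."""
    n = len(vec)
    return [sum(vec[k] * cmath.exp(-2j * cmath.pi * i * k / n)
                for k in range(n))
            for i in range(n)]

def fnet_mixing(x):
    """FNet token mixing: DFT along the hidden dimension, then along the
    sequence dimension, keeping only the real part. Unlike self-attention,
    there are no learned weights."""
    # DFT over the hidden dimension (each token's feature vector)
    hidden_mixed = [dft(row) for row in x]
    # DFT over the sequence dimension (each feature column)
    cols = list(zip(*hidden_mixed))
    seq_mixed = [dft(list(col)) for col in cols]
    # transpose back and keep the real part
    return [[seq_mixed[j][i].real for j in range(len(seq_mixed))]
            for i in range(len(x))]

print(fnet_mixing([[1.0, 0.0], [0.0, 0.0]]))
```

Because the transform has no parameters, the mixing sublayer trains faster and uses less memory than attention, which is the trade-off the abstract quantifies.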
- Research Article
- 10.70102/afts.2025.1833.176
- Oct 30, 2025
- Archives for Technical Sciences
Sentiment analysis has emerged as an important activity in natural language processing (NLP) applications, where data analysis is in high demand in the modern world. The BERT (Bidirectional Encoder Representations from Transformers) algorithm has proved extremely effective in sentiment analysis tasks, with potential far exceeding that of conventional algorithms; unlocking that potential, however, requires fine-tuning its hyperparameters. Optimizing BERT’s various hyperparameters (e.g., the learning rate, batch size, dropout rate, and number of attention heads) is no small feat due to the complicated interactions among them. In this paper, the Salp Swarm Algorithm (SSA) is used as a bio-inspired metaheuristic optimization technique to optimize the fine-tuning process. Through SSA’s highly efficient search of the multidimensional search space, BERT hyperparameters are systematically optimized for sentiment classification tasks. A benchmark dataset for sentiment analysis (Sentiment140) is used to evaluate the proposed model. The novelty of the presented model is that it dynamically adjusts its search behaviour in response to performance signals, identifying better-performing parameter sets than conventional methods and thereby exploiting the BERT algorithm to produce high-performing configurations. Extensive evaluations against three baseline search strategies, namely manual tuning, grid search, and random search, are conducted on the Sentiment140 benchmark dataset, demonstrating the superiority of the proposed SSA-BERT optimization technique. The SSA-BERT model achieved a maximum accuracy of 96.4 percent, far better than manual tuning, grid search, and random search (65.0 percent, 69.5 percent, and 72.0 percent, respectively).
It also performed better than other existing BERT models reported in related literature, which showed accuracy levels between 46.4 and 75.7 percent across different benchmarks.
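A minimal sketch of the Salp Swarm Algorithm's leader/follower updates, applied to a toy objective. Treating validation loss as the objective and hyperparameter ranges as the bounds would mirror the paper's use; the population size, iteration count, and sphere objective below are illustrative assumptions.

```python
import math
import random

def ssa_minimize(objective, dim, bounds, n_salps=10, iters=50, seed=42):
    """Minimal Salp Swarm Algorithm sketch (after Mirjalili et al., 2017).
    For hyperparameter tuning, `objective` would score a BERT fine-tuning run
    and `bounds` would span the hyperparameter ranges."""
    rng = random.Random(seed)
    lb, ub = bounds
    salps = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_salps)]
    food = min(salps, key=objective)[:]           # best position found so far
    for l in range(1, iters + 1):
        c1 = 2 * math.exp(-(4 * l / iters) ** 2)  # exploration shrinks over time
        for i in range(n_salps):
            if i == 0:  # leader moves around the food source
                for j in range(dim):
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    salps[i][j] = (food[j] + step if rng.random() < 0.5
                                   else food[j] - step)
            else:       # followers take the midpoint with the salp ahead
                salps[i] = [(a + b) / 2 for a, b in zip(salps[i], salps[i - 1])]
            salps[i] = [min(max(x, lb), ub) for x in salps[i]]  # clamp to bounds
        best = min(salps, key=objective)
        if objective(best) < objective(food):
            food = best[:]
    return food

sphere = lambda x: sum(v * v for v in x)
print(ssa_minimize(sphere, 2, (-5.0, 5.0)))
```

On the toy sphere objective the chain reliably drifts toward the origin; the shrinking coefficient c1 is what trades early exploration for late exploitation.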
- Research Article
- 10.48084/etasr.10625
- Jun 4, 2025
- Engineering, Technology & Applied Science Research
Text classification is a fundamental task in Natural Language Processing (NLP) with a wide range of applications such as sentiment analysis, document classification, and content recommendation. Traditional approaches like Naive Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF) relied on feature engineering and lacked contextual understanding. Deep learning brought transformer models to text classification, such as Bidirectional Encoder Representations from Transformers (BERT), which understands words in their bidirectional context. In this article, we utilize a pre-trained BERT model fine-tuned on the Reuters-21578 dataset to classify news articles. We aim to measure the performance of transfer learning against common machine learning models and non-fine-tuned BERT. The fine-tuned model achieves 91.77% accuracy, significantly outperforming the non-fine-tuned BERT and performing better than classical classifiers such as NB, SVM, and RF. The results show that fine-tuning allows BERT to contextualize domain-specific intricacies, resulting in improved classification performance. We also address the computational trade-offs associated with transformer models, highlighting the need for optimal deployment methods. Thus, this study further enables the use of fine-tuned BERT in automatic news classification and is of significant value for information retrieval and content personalization.
- Book Chapter
1
- 10.1007/978-3-030-79757-7_11
- Jan 1, 2021
Question Answering (QA) has become one of the most popular natural language processing (NLP) and information retrieval applications. For use in QA systems, this paper presents a question classification technique based on NLP and Bidirectional Encoder Representations from Transformers (BERT). We performed an experimental investigation of BERT for question classification on the TREC-6 dataset and a Thai sentence dataset. We propose an improved processing technique called “More Than Words – BERT” (MTW-BERT), a special set of NLP annotation tags combining Part-Of-Speech tagging and Named Entity Recognition, enabling the model to learn both the pattern of grammatical tag sequences and the recognized entities together as input before classifying text with a BERT model. Experimental results showed that MTW-BERT outperformed existing classification methods and achieved new state-of-the-art performance on question classification for the TREC-6 dataset with 99.20% accuracy. In addition, MTW-BERT was also applied to question classification for Thai sentences in the wh-question category. The proposed technique achieved a remarkable accuracy of 87.50% on Thai wh-classification. Keywords: Classification, BERT-based model, NLP Tagging, Thai Sentence Analysis
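The core MTW-BERT idea of merging POS and NER tags into the input can be sketched as a simple annotation step. The tag format and examples below are hypothetical stand-ins, not the paper's exact scheme:

```python
def mtw_annotate(tokens, pos_tags, ent_tags):
    """Interleave each word with its POS tag and (when present) its
    recognized-entity tag, so the encoder sees the grammatical pattern
    and the entities together in one input string."""
    return " ".join(
        f"{w}/{p}/{e}" if e != "O" else f"{w}/{p}"
        for w, p, e in zip(tokens, pos_tags, ent_tags)
    )

question = mtw_annotate(
    ["Who", "founded", "Google"],
    ["WP", "VBD", "NNP"],    # Penn Treebank-style POS tags
    ["O", "O", "ORG"],       # NER tags; "O" marks no entity
)
print(question)  # Who/WP founded/VBD Google/NNP/ORG
```

The annotated string would then be tokenized and fed to the BERT classifier in place of the raw question.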
- Research Article
- 10.2174/0118722121300281240823174052
- Nov 1, 2025
- Recent Patents on Engineering
Advanced technologies on the internet create an environment for information exchange among communities. However, some individuals exploit these environments to spread false news. False News, or Fake News (FN), refers to misleading information deliberately crafted to harm the reputation of individuals, products, or services. Identifying FN is a challenging issue for the research community. Many researchers have proposed approaches for FN detection using Machine Learning (ML) and Natural Language Processing (NLP) techniques. In this patent article, we propose a combined approach for FN detection, leveraging both ML and NLP techniques. We first extract all terms from the dataset after applying appropriate preprocessing techniques. A Feature Selection Algorithm (FSA) is then employed to identify the most important features based on their scores. These selected features are used to represent the dataset documents as vectors. The term weight measure determines the significance of each term in the vector representation. These document vectors are combined with vector representations obtained through an NLP technique. Specifically, we use the Bidirectional Encoder Representations from Transformers (BERT) model to represent the document vectors. The BERT small case model is employed to generate features, which are then used to create the document vectors. The combined vector, comprising ML-based document vector representations and NLP-based vector representations, is fed into various ML algorithms. These algorithms are used to build a model for classification. Our combined approach for FN detection achieved the highest accuracy of 96.72% using the Random Forest algorithm, with document vectors that included content-based features of size 4000 concatenated with outputs from the 9th to 12th BERT encoder layers.
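The combined representation described above (selected term weights concatenated with BERT encoder-layer outputs) amounts to vector concatenation. A toy sketch with an illustrative tf-idf-style weighting and toy-sized layer vectors, not the study's 4000-dimensional content features or 768-dimensional BERT layers:

```python
import math

def tfidf_vector(doc_terms, corpus, vocab):
    """Toy term weighting (tf * smoothed idf) over a selected vocabulary,
    standing in for the paper's feature-selected term weights."""
    n = len(corpus)
    vec = []
    for term in vocab:
        tf = doc_terms.count(term)
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n) / (1 + df)) + 1
        vec.append(tf * idf)
    return vec

def combine(content_vec, bert_layers):
    """Concatenate the content-based vector with per-layer BERT vectors,
    mirroring the combined ML + NLP representation fed to the classifiers."""
    out = list(content_vec)
    for layer in bert_layers:
        out.extend(layer)
    return out

corpus = [["fake", "news"], ["real", "news"]]
doc_vec = tfidf_vector(["fake", "news", "fake"], corpus, ["fake", "news"])
combined = combine(doc_vec, [[0.1, 0.2], [0.3, 0.4]])  # toy "layer" outputs
print(len(combined))
```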
- Research Article
2
- 10.3390/analytics3020014
- Jun 18, 2024
- Analytics
In this work, we evaluated the efficacy of Google’s Pathways Language Model (GooglePaLM) in analyzing sentiments expressed in product reviews. Although conventional Natural Language Processing (NLP) techniques such as the rule-based Valence Aware Dictionary for Sentiment Reasoning (VADER) and the long-sequence Bidirectional Encoder Representations from Transformers (BERT) model are effective, they frequently encounter difficulties when dealing with intricate linguistic features like sarcasm and contextual nuances commonly found in customer feedback. We performed a sentiment analysis on Amazon’s fashion review datasets using the VADER, BERT, and GooglePaLM models, respectively, and compared the results based on evaluation metrics such as precision, recall, accuracy, correct positive prediction, and correct negative prediction. We used the default values of the VADER and BERT models and slightly fine-tuned GooglePaLM with a temperature of 0.0 and an N-value of 1. We observed that GooglePaLM performed better, with correct positive and negative prediction values of 0.91 and 0.93, respectively, followed by BERT and VADER. We concluded that large language models surpass traditional rule-based systems for natural language processing tasks.
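A rule-based scorer in the spirit of VADER can be sketched with a valence lexicon and simple negation flipping. The lexicon and thresholds below are illustrative; VADER itself additionally handles intensifiers, punctuation, and capitalization:

```python
def lexicon_sentiment(tokens, lexicon, negators=("not", "never", "no")):
    """Sum lexicon valences over the tokens, flipping the sign of any
    valenced word that immediately follows a negator, then threshold
    the total into a three-way label."""
    score = 0.0
    for i, tok in enumerate(tokens):
        valence = lexicon.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in negators:
            valence = -valence
        score += valence
    if score > 0.05:
        return "positive"
    if score < -0.05:
        return "negative"
    return "neutral"

lex = {"great": 3.0, "terrible": -2.5}  # toy valence lexicon
print(lexicon_sentiment(["not", "great", "at", "all"], lex))  # negative
```

This brittleness on sarcasm and context is exactly what the abstract contrasts with the large-language-model approach.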
- Research Article
24
- 10.1186/s12911-022-01819-4
- Jul 1, 2022
- BMC medical informatics and decision making
Background: Since no effective therapies exist for Alzheimer’s disease (AD), prevention through lifestyle changes and interventions has become more critical. Analyzing electronic health records (EHRs) of patients with AD can help us better understand lifestyle’s effect on AD. However, lifestyle information is typically stored in clinical narratives. Thus, the objective of the study was to compare different natural language processing (NLP) models on classifying lifestyle statuses (e.g., physical activity and excessive diet) from clinical texts in English. Methods: Based on the collected concept unique identifiers (CUIs) associated with lifestyle status, we extracted all related EHRs for patients with AD from the Clinical Data Repository (CDR) of the University of Minnesota (UMN). We automatically generated labels for the training data using a rule-based NLP algorithm. We conducted weak supervision for pre-trained Bidirectional Encoder Representations from Transformers (BERT) models and three traditional machine learning models as baselines on the weakly labeled training corpus. These models include the BERT base model, PubMedBERT (abstracts + full text), PubMedBERT (abstracts only), Unified Medical Language System (UMLS) BERT, BioBERT, Bio-clinical BERT, logistic regression, support vector machine, and random forest. We performed two case studies, physical activity and excessive diet, to validate the effectiveness of the BERT models in classifying lifestyle status; all models were evaluated and compared on a developed Gold Standard Corpus (GSC) for the two case studies. The rule-based model used for weak supervision was also tested on the GSC for comparison. Results: The UMLS BERT model achieved the best performance for classifying physical activity status, with precision, recall, and F1 scores of 0.93, 0.93, and 0.92, respectively. For classifying excessive diet, the Bio-clinical BERT model showed the best performance, with precision, recall, and F1 scores of 0.93, 0.93, and 0.93, respectively. Conclusion: The proposed approach, leveraging weak supervision, could significantly increase the sample size required for training deep learning models. By comparison with traditional machine learning models, the study also demonstrates the high performance of BERT models for classifying lifestyle status for Alzheimer’s disease in clinical notes.
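The weak-supervision step (a rule-based labeler generating training labels from cue phrases) can be sketched minimally as below. The cue phrases and label names here are invented for illustration, not the study's actual rules or CUIs.

```python
def weak_label(note, rules):
    # Rule-based NLP labeler: assign a lifestyle status label
    # if any cue phrase for that label appears in the note
    text = note.lower()
    for label, cues in rules.items():
        if any(c in text for c in cues):
            return label
    return "unknown"

# Hypothetical cue phrases, not the study's real lexicon
rules = {
    "physical_activity": ["walks daily", "exercises", "jogging"],
    "excessive_diet": ["overeats", "high calorie intake"],
}
labels = [weak_label(n, rules) for n in [
    "Patient exercises three times a week.",
    "Reports high calorie intake at dinner.",
    "No relevant history.",
]]
```

Labels produced this way are noisy, which is why the study still evaluates all models against a manually built Gold Standard Corpus.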
- Research Article
19
- 10.1145/3533430
- May 18, 2023
- ACM Transactions on Internet Technology
Computational Linguistics (CL) associated with the Internet of Multimedia Things (IoMT)-enabled multimedia computing applications brings several research challenges, such as real-time speech understanding, deep fake video detection, emotion recognition, home automation, and so on. Due to the emergence of machine translation, CL solutions have increased tremendously for different natural language processing (NLP) applications. Nowadays, NLP-enabled IoMT is essential for its success. Sarcasm detection, a recently emerging artificial intelligence (AI) and NLP task, aims at discovering sarcastic, ironic, and metaphoric information implied in texts that are generated in the IoMT. It has drawn much attention from the AI and IoMT research community. The advance of sarcasm detection and NLP techniques will provide a cost-effective, intelligent way to work together with machine devices and high-level human-to-device interactions. However, existing sarcasm detection approaches neglect the hidden stance behind texts and are thus insufficient to exploit the full potential of the task. Indeed, the stance, i.e., whether the author of a text is in favor of, against, or neutral toward the proposition or target discussed in the text, largely determines the text’s actual sarcasm orientation. To fill the gap, in this research, we propose a new task: stance-level sarcasm detection (SLSD), where the goal is to uncover the author’s latent stance and, based on it, to identify the sarcasm polarity expressed in the text. We then propose an integral framework, which consists of Bidirectional Encoder Representations from Transformers (BERT) and a novel stance-centered graph attention network (SCGAT). Specifically, BERT is used to capture the sentence representation, and SCGAT is designed to capture stance information on a specific target. Extensive experiments are conducted on a Chinese sarcasm sentiment dataset we created and the SemEval-2018 Task 3 English sarcasm dataset.
The experimental results prove the effectiveness of the SCGAT framework over state-of-the-art baselines by a large margin.
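The core operation of a graph attention layer such as SCGAT's is a softmax-weighted aggregation of neighbor representations around a target node. The sketch below shows only that aggregation step with made-up two-dimensional vectors; it is not the SCGAT architecture, whose attention scores are learned rather than supplied directly.

```python
import math

def attention_weights(scores):
    # Softmax over unnormalized attention scores, as in graph attention
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(target_vec, neighbor_vecs, scores):
    # Combine the stance target's vector with an attention-weighted
    # sum of its neighbors' (context words') vectors
    w = attention_weights(scores)
    dim = len(target_vec)
    ctx = [sum(w[i] * v[d] for i, v in enumerate(neighbor_vecs)) for d in range(dim)]
    return [t + c for t, c in zip(target_vec, ctx)]

# With equal scores, the context term is simply the neighbor mean
out = aggregate([0.0, 1.0], [[1.0, 0.0], [3.0, 0.0]], [0.0, 0.0])
```

In the full framework, the vectors fed into such a layer would come from BERT's sentence encoding, and the resulting stance-aware representation feeds the sarcasm classifier.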
- Research Article
30
- 10.1016/j.ijmedinf.2022.104736
- Mar 7, 2022
- International journal of medical informatics
A hybrid model to identify fall occurrence from electronic health records
- Research Article
12
- 10.1016/j.artmed.2024.102889
- May 5, 2024
- Artificial Intelligence In Medicine
Background: Pretraining large-scale neural language models on raw texts has made a significant contribution to improving transfer learning in natural language processing. With the introduction of transformer-based language models, such as bidirectional encoder representations from transformers (BERT), the performance of information extraction from free text has improved significantly in both the general and medical domains. However, it is difficult to train specific BERT models to perform well in domains for which few databases of a high quality and large size are publicly available. Objective: We hypothesized that this problem could be addressed by oversampling a domain-specific corpus and using it for pretraining with a larger corpus in a balanced manner. In the present study, we verified our hypothesis by developing pretraining models using our method and evaluating their performance. Methods: Our proposed method was based on the simultaneous pretraining of models with knowledge from distinct domains after oversampling. We conducted three experiments in which we generated (1) English biomedical BERT from a small biomedical corpus, (2) Japanese medical BERT from a small medical corpus, and (3) enhanced biomedical BERT pretrained with complete PubMed abstracts in a balanced manner. We then compared their performance with those of conventional models. Results: Our English BERT pretrained using both general and small medical domain corpora performed sufficiently well for practical use on the biomedical language understanding evaluation (BLUE) benchmark. Moreover, our proposed method was more effective than the conventional methods for each biomedical corpus of the same corpus size in the general domain. Our Japanese medical BERT outperformed the other BERT models built using a conventional method for almost all the medical tasks. The model demonstrated the same trend as that of the first experiment in English. Further, our enhanced biomedical BERT model, which was not pretrained on clinical notes, achieved superior clinical and biomedical scores on the BLUE benchmark with an increase of 0.3 points in the clinical score and 0.5 points in the biomedical score. These scores were above those of the models trained without our proposed method. Conclusions: Well-balanced pretraining using oversampling instances derived from a corpus appropriate for the target task allowed us to construct a high-performance BERT model.
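The oversampling idea (repeating the small domain corpus until both domains contribute equally to the pretraining mix) can be sketched as follows. This is a simplified illustration with toy document lists, not the paper's actual pipeline.

```python
import random

def balanced_mix(general, domain, seed=0):
    # Oversample the small domain corpus until both domains
    # contribute the same number of documents, then interleave
    rng = random.Random(seed)
    full, extra = divmod(len(general), len(domain))
    oversampled = domain * full + rng.sample(domain, extra)
    mixed = general + oversampled
    rng.shuffle(mixed)
    return mixed

general = [f"gen{i}" for i in range(10)]  # large general-domain corpus
domain = ["med0", "med1", "med2"]         # small domain-specific corpus
mixed = balanced_mix(general, domain)
```

Shuffling after oversampling keeps domain documents spread through the pretraining stream rather than clustered, so each batch sees a balanced mixture of both domains.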