Recent years have been an active testing ground for artificial neural networks in language understanding, a central aspect of NLP. Emerging NLP technologies are largely motivated by the growing need to cope with the demands of different NLP tasks: processing and analyzing large text samples, uncovering complex language behavior, and extracting valuable information from unstructured text. Deep learning has proven to be one of the most successful areas of machine learning for NLP (Natural Language Processing), thanks to its ability to learn features on its own from enormous amounts of data. In NLP tasks such as language modelling, text classification, emotion analysis, and machine translation, RNNs, CNNs, and transformer-based models have been used in new ways. While the long-standing difficulties of NLP are generally agreed upon, technological progress also gives rise to unexpected challenges. Two factors, the expanding collections of large text datasets and the resulting need for more accurate and time-efficient NLP models, are therefore giving rise to new kinds of deep learning models and techniques. This paper surveys the most recent achievements of neural architectures for natural language processing applications. It introduces current models and approaches in NLP, highlights their strengths and weaknesses, and identifies areas for future research. The paper first examines the hard tasks in NLP, together with the importance of continually improving the architectures responsible for tackling them. It then discusses recent breakthroughs in deep learning models, namely RNNs, CNNs, transformer-based models, and attention mechanisms.
Finally, this paper covers the evolving frontier of NLP research, including transfer learning, self-supervised learning, and multimodal learning. It also underlines the current shortcomings of existing NLP models and identifies the themes where research needs to be re-evaluated. Through this review of deep learning architectures for NLP, the article offers a broad overview of recent advances in deep learning and is intended as a valuable resource for researchers, practitioners, and students in the field of NLP.
- # Natural Language Processing
- # Natural Language Processing Tasks
- # Transformer-based Models
- # Approaches In Natural Language Processing
- # Field Of Natural Language Processing
- # Natural Language Processing Models
- # Natural Language Processing Research
- # Advancement In Deep Learning
- # Self-supervised Learning
- # Deep Learning
- Research Article
- 10.1162/coli_r_00388
- Oct 29, 2020
- Computational Linguistics
Like any other science, research in natural language processing (NLP) depends on the ability to draw correct conclusions from experiments. A key tool for this is statistical significance testing: We use it to judge whether a result provides meaningful, generalizable findings or should be taken with a pinch of salt. When comparing new methods against others, performance metrics often differ by only small amounts, so researchers turn to significance tests to show that improved models are genuinely better. Unfortunately, this reasoning often fails because we choose inappropriate significance tests or carry them out incorrectly, making their outcomes meaningless. Or, the test we use may fail to indicate a significant result when a more appropriate test would find one. NLP researchers must avoid these pitfalls to ensure that their evaluations are sound and ultimately avoid wasting time and money through incorrect conclusions. This book guides NLP researchers through the whole process of significance testing, making it easy to select the right kind of test by matching canonical NLP tasks to specific significance testing procedures. As well as being a handbook for researchers, the book provides theoretical background on significance testing, includes new methods that solve problems with significance tests in the world of deep learning and multidataset benchmarks, and describes the open research problems of significance testing for NLP. The book focuses on the task of comparing one algorithm with another. At the core of this is the p-value, the probability that a difference at least as extreme as the one we observed could occur by chance. If the p-value falls below a predetermined threshold, the result is declared significant. Leaving aside the fundamental limitation of turning the validity of results into a binary question with an arbitrary threshold, to be a valid statistical significance test, the p-value must be computed in the right way.
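To make the definition of the p-value concrete, the following is a minimal sketch of a paired randomization (sign-flip) test for comparing two systems on the same test examples. It is an illustration written for this summary, not a procedure taken from the book; the function name `paired_permutation_test`, its parameters, and the toy scores are all assumptions.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided paired randomization test on per-example scores.

    Hypothetical helper: estimates the probability of a mean difference at
    least as extreme as the observed one if the two systems were
    interchangeable (the null hypothesis).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("a paired test needs the same examples for both systems")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis each per-example difference is equally
        # likely to carry either sign, so flip the signs at random.
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs)
        # A small tolerance guards against floating-point ties.
        if abs(resampled) / n >= observed - 1e-12:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Toy per-example correctness (1 = correct, 0 = wrong) for two systems.
system_a = [1] * 20
system_b = [0] * 20
p_value = paired_permutation_test(system_a, system_b)
print(f"estimated p-value: {p_value:.4f}")
```

The sketch assumes per-example scores that are paired and exchangeable under the null hypothesis. Whether that assumption holds depends on the task and the metric, which is the kind of choice the book is described as guiding.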
The book describes the two crucial properties of an appropriate significance test: The test must be both valid and powerful. Validity refers to the avoidance of type 1 errors, in which the result is incorrectly declared significant. Common mistakes that lead to type 1 errors include deploying tests that make incorrect assumptions, such as independence between data points. The power of a test refers to its ability to detect a significant result and therefore to avoid type 2 errors. Here, knowledge of the data and experiment must be used to choose a test that makes the correct assumptions. There is a trade-off between validity and power, but for the most common NLP tasks (language modeling, sequence labeling, translation, etc.), there are clear choices of tests that provide a good balance. Beginning with a detailed background on significance testing, the book then shows the reader how to carry out tests for specific NLP tasks. There is a mix of styles, with the first four chapters providing reference material that will be extremely useful to both new and experienced researchers. Here, it is easy to find the material related to a given NLP task. The next two chapters discuss more recent research into the application of significance tests to deep neural networks and for testing across multiple datasets. Alongside open research questions, these later chapters provide clear guidelines on how to apply the proposed methods. It is this mix of background material and reference guidelines that I believe makes this book so compelling and nicely self-contained. The introduction in Chapter 1 motivates the need for a comprehensive textbook and outlines challenges that the later chapters address more deeply. The theoretical background material begins in Chapter 2, which introduces core concepts, including hypothesis testing, type 1 and type 2 errors, validity and power, and p-values.
The reader does not need to have any prior knowledge of statistical significance tests to follow this part. However, experienced readers could still benefit from reading this chapter, as concepts such as p-values are widely misunderstood and misused (Amrhein, Greenland, and McShane 2019). The significance tests themselves are introduced in Chapter 3, categorized into parametric and nonparametric tests. The chapter begins with the intuitively simple paired z-test, then builds up to more commonly-applied techniques, showing the connections and assumptions that each test makes. Step-by-step algorithms help the reader to implement each test. Although the chapter does cite uses of tests in NLP research, the main purpose is to present the theory behind each test and point out their differences. Chapter 4 provides perhaps the most handy part of the book for reference: a correspondence between common NLP tasks and statistical tests. Each task is discussed in terms of the evaluation metrics used, then a decision tree is introduced to guide the reader toward a choice between a parametric test, bootstrap or randomization test, or sampling-free nonparametric test. Section 4.3 then links each NLP evaluation measure to a specific significance test, presenting a large table that helps readers identify which test they need for a specific task. Particular considerations for each task are also pointed out to provide more detail about the appropriate options. The final part of this chapter describes the issue of p-hacking, in which dataset sizes are increased until a significance threshold is reached without consideration for biases in the data (discussed, for example, in Hofmann [2015]). The chapter proposes a simple solution to ensure robust significance testing with large datasets. Where Chapter 4 presents well-established methods, Chapter 5 introduces the current research question of how best to apply statistical significance testing to deep learning.
Non-convex loss functions, stochastic optimization, random initialization, and a multitude of hyperparameters limit the conclusions we can draw from a single test run of a deep neural network (DNN). This chapter, which is based on the authors’ ACL paper (Dror, Shlomov, and Reichart 2019), explains how the comparison process can be overhauled to provide more meaningful evaluations. Beginning by explaining the difficulties of evaluating DNNs, the chapter then introduces criteria for a comparison framework, then discusses the limitations of current methods. Reimers and Gurevych (2018) have previously tackled this problem, but their approach has limited power and does not provide a confidence score. Other works, such as Clark et al. (2011), compare DNNs using a collection of statistics, such as the mean or standard deviation of performance across runs. This book shows how such an approach violates the assumptions of the significance tests. The authors propose almost stochastic dominance as the basis for a better alternative. The chapter explains how to use the proposed method, evaluates it in an empirical case study, and finally analyzes the errors made by each testing approach. Large NLP models are often tested across a range of datasets, which presents another problem for standard significance testing. Chapter 6 discusses the challenges of assessing two questions: (1) On how many datasets does algorithm A outperform algorithm B? (2) On which datasets does A outperform B? Applying standard significance tests individually to each dataset and counting the number of significant results is likely to overestimate the total number of significant results, as this chapter explains. The authors then present a new framework for replicability analysis, based on partial conjunction testing, and discuss two variants (Bonferroni and Fisher) for when the datasets are independent or dependent.
They introduce a method based on Benjamini and Heller (2008) to count the number of datasets where one method outperforms another, then show how to use the Holm procedure (Holm 1979) to identify which datasets these are. Chapter 6 provides a lot of detailed background on the proposed replicability analysis framework, and the later sections again link the process to specific NLP case studies, and step-by-step summaries help the reader to apply the methodology. Extensive empirical results illustrate the very substantial differences in outcomes between the proposed approach and standard procedures. The final two chapters present open problems and conclude, showing that the topic has many interesting research questions of its own, such as problems when performing cross-validation, and the limited statistical power of replicability analysis. Overall, I highly recommend this book to a wide range of NLP researchers, from new students to seasoned experts who wish to ensure that they compare methods effectively. The book is excellent as both an introduction to the topic of significance testing and as a reference to use when evaluating your results. For anyone with further interest in the topic, it also points the way to future work. If one could level any criticism at this book at all, it is that it does not deeply discuss the basic flaws of significance testing or what the alternatives might be. For now, though, significance testing is an integral part of NLP research and this book provides a great resource for researchers who wish to perform it correctly and painlessly.
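The multi-dataset problem described in the review can be illustrated with the textbook Holm step-down procedure. This is a minimal sketch of the standard procedure only, not of the book's full replicability framework; the function name `holm_reject` and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where Holm's procedure rejects.

    Hypothetical helper implementing the standard step-down rule: visit the
    p-values in ascending order and compare the k-th smallest (k = 0, 1, ...)
    with alpha / (m - k), stopping at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # every larger p-value is retained as well
    return rejected


# Made-up per-dataset p-values for "A outperforms B" on four datasets.
per_dataset_p = [0.01, 0.04, 0.03, 0.005]
naive = [p <= 0.05 for p in per_dataset_p]
corrected = holm_reject(per_dataset_p)
print("naive count:", sum(naive))
print("Holm count:", sum(corrected))
```

With these illustrative numbers, testing each dataset separately at 0.05 flags all four datasets, while the Holm correction keeps only two. This is the overestimation effect the review attributes to counting individually significant results.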
- Dissertation
- 10.32657/10356/182221
- Jan 1, 2025
Deep learning has become increasingly popular due to its remarkable ability to learn high-dimensional feature representations. Numerous algorithms and models have been developed to enhance the application of deep learning across various real-world tasks, including image classification, natural language processing, and autonomous driving. However, deep learning models are susceptible to backdoor threats, where an attacker manipulates the training process or data to cause incorrect predictions on malicious samples containing specific triggers, while maintaining normal performance on benign samples. With the advancement of deep learning, including evolving training schemes and the need for large-scale training data, new threats in the backdoor domain continue to emerge. Conversely, backdoors can also be leveraged to protect deep learning models, such as through watermarking techniques. In this thesis, we conduct an in-depth investigation into backdoor techniques from three novel perspectives. In the first part of this thesis, we demonstrate that emerging deep learning training schemes can introduce new backdoor risks. Specifically, pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety of downstream language tasks, significantly accelerating the development of language models. However, the pre-trained model becomes a single point of failure for these downstream models. We propose a novel task-agnostic backdoor attack against pre-trained NLP models, wherein the adversary does not need prior information about the downstream tasks when implanting the backdoor into the pre-trained model. Any downstream models transferred from this malicious model will inherit the backdoor, even after extensive transfer learning, revealing the severe vulnerability of pre-trained foundation models to backdoor attacks. In the second part of this thesis, we develop novel backdoor attack methods suited to new threat scenarios. 
The rapid expansion of deep learning models necessitates large-scale training data, much of which is unlabeled and outsourced to third parties for annotation. To ensure data security, most datasets are read-only for training samples, preventing the addition of input triggers. Consequently, attackers can only achieve data poisoning by uploading malicious annotations. In this practical scenario, all existing data poisoning methods that add triggers to the input are infeasible. Therefore, we propose new backdoor attack methods that involve poisoning only the labels without modifying any input samples. In the third part of this thesis, we utilize the backdoor technique to proactively protect our deep learning models, specifically for intellectual property protection. Considering the complexity of deep learning tasks, generating a well-trained deep learning model requires substantial computational resources, training data, and expertise. Therefore, it is essential to protect these assets and prevent copyright infringement. Inspired by backdoor attacks that can induce specific behaviors in target models through carefully designed samples, several watermarking methods have been proposed to protect the intellectual property of deep learning models. Model owners can train their models to produce unique outputs for certain crafted samples and use these samples for ownership verification. While various extraction techniques have been designed for supervised deep learning models, challenges arise when applying them to deep reinforcement learning models due to differences in model features and scenarios. Therefore, we propose a novel watermarking scheme to protect deep reinforcement learning models from unauthorized distribution. 
Instead of using spatial watermarks as in conventional deep learning models, we design temporal watermarks that minimize potential impact and damage to the protected deep reinforcement learning model while achieving high-fidelity ownership verification. In summary, this thesis investigates the evolving landscape of backdoor threats during the development of deep learning techniques and the use of backdoors for beneficial purposes in intellectual property protection.
- Research Article
- 10.54254/2755-2721/77/20240674
- Jul 16, 2024
- Applied and Computational Engineering
This paper provides a comprehensive review of the evolution and advancements in deep learning models for Natural Language Processing (NLP). It explores the transition from statistical models to neural networks, highlighting the paradigm shift towards data-driven methodologies and the implications for NLP tasks. The emergence of neural network architectures, such as Recurrent Neural Networks (RNNs) and transformer-based models like BERT and GPT, has revolutionized language understanding and generation. Furthermore, the integration of deep learning in traditional NLP tasks, such as part-of-speech tagging and named entity recognition, has led to significant improvements in accuracy and efficiency. The paper also discusses the quantitative analysis of deep learning models, including performance metrics, computational efficiency, and mathematical modeling of language tasks. Case studies and applications, including sentiment analysis, machine translation, and automated content generation, exemplify the transformative impact of deep learning in NLP.
- Research Article
- 10.52783/jisem.v10i51s.10376
- May 30, 2025
- Journal of Information Systems Engineering and Management
Automatic Short Answer Grading (ASAG) has gained increasing importance in educational technology, where accurate and scalable assessment solutions are needed. Recent advances in Natural Language Processing (NLP) have introduced powerful Transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), Text-to-Text Transfer Transformer (T5), and Generative Pre-trained Transformer 3 (GPT-3), which have demonstrated state-of-the-art performance across various text-based tasks. This paper presents a comparative study of these three models in the context of ASAG, evaluating their effectiveness, accuracy, and efficiency. BERT’s bidirectional encoding, T5’s text-to-text framework, and GPT-3’s autoregressive generation are explored in depth to assess their ability to understand, grade, and generate feedback on short answers. We utilize standard ASAG datasets and multiple evaluation metrics, including accuracy, precision, recall, and F1-score, to measure their performance. The comparative analysis reveals that while all three models exhibit strong capabilities, they vary in handling complex language and ambiguous student responses, with trade-offs in computational cost and scalability. This study highlights the strengths and weaknesses of each model in ASAG and offers insights into their practical applications in educational settings. Introduction: The automation of grading has become a focal point in modern education systems, driven by the increasing demand for scalable and efficient assessment solutions (Sahu & Bhowmick, 2015). With the proliferation of online learning platforms, digital classrooms, and remote education, the ability to automatically grade short-answer questions has gained significant importance (Gomaa & Fahmy, 2020). 
Automatic Short Answer Grading (ASAG) seeks to evaluate student responses by comparing them to model answers, often assessing the content’s correctness, relevance, and linguistic features—critical components for evaluating students’ understanding and knowledge retention (Busatta & Brancher, 2018). Traditional ASAG approaches typically employed rule-based systems, statistical models, and early machine learning algorithms that relied heavily on predefined keywords, templates, or handcrafted features (Tulu et al., 2021). While effective for straightforward, fact-based questions, these systems struggled to capture the complexity and variability of natural language, resulting in reduced grading accuracy—especially for creative or ambiguous responses (Sychev et al., 2019). Consequently, such methods often required significant manual intervention, limiting their scalability and applicability in dynamic educational settings (Muftah & Aziz, 2013). The advent of deep learning, particularly in the field of Natural Language Processing (NLP), has marked a transformative shift in ASAG (Gaddipati et al., 2020). Neural network-based models have demonstrated a remarkable capacity to learn and generalize from large datasets, enabling a more nuanced understanding of language (Wang et al., 2019). This has led to the development of more robust ASAG systems capable of handling a broader spectrum of student responses, ranging from factual answers to complex explanations (Roy et al., 2016). A pivotal advancement in NLP is the introduction of the Transformer architecture, which has revolutionized how language models are designed and trained (Vaswani et al., 2017). Transformers excel in processing sequential data through self-attention mechanisms that capture long-range dependencies and contextual relationships within text. 
This architectural innovation has significantly enhanced performance across a variety of NLP tasks, such as machine translation, sentiment analysis, and question answering (Peters et al., 2018), making Transformer-based models particularly suitable for enhancing ASAG systems (Raffel et al., 2020). In this paper, we focus on three prominent Transformer-based models (BERT, T5, and GPT-3), each representing a distinct approach to language understanding and processing. These models have set new benchmarks across numerous NLP tasks, and their potential application in ASAG is substantial. Objectives: The goal of this study is to conduct a comparative analysis of these three Transformer models (BERT, T5, and GPT-3) in the context of ASAG. We evaluate their performance on standard ASAG datasets using multiple evaluation metrics, such as accuracy, precision, recall, and F1-score. Additionally, we analyze the computational efficiency and scalability of these models to determine their practicality for deployment in large-scale educational environments. Methods: By providing a comprehensive comparison, this study seeks to shed light on the strengths and weaknesses of each model and their suitability for different types of ASAG tasks. Moreover, we aim to offer insights that can guide future research and development in this area, ultimately contributing to the creation of more effective and reliable automated grading systems. Results: The results of our comparative analysis of BERT, T5, and GPT-3 in the context of Automatic Short Answer Grading (ASAG) reveal important insights into the strengths and limitations of these Transformer models. This section discusses the implications of our findings, the practical considerations for deploying these models in educational settings, and identifies potential avenues for future research.
Conclusions: In conclusion, this study provides a comprehensive comparative analysis of BERT, T5, and GPT-3 for ASAG, highlighting their strengths, limitations, and practical considerations. The insights gained from this research contribute to the ongoing development and refinement of automated grading systems, with the potential to enhance educational assessment and support in diverse learning environments.
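The evaluation metrics named in this abstract can be sketched for the simplest case, where each answer is graded as either correct or incorrect. The abstract does not say how the ASAG datasets are labeled, so the binary labels, the function name `binary_grading_metrics`, and the toy data below are assumptions made only for illustration.

```python
def binary_grading_metrics(gold, predicted, positive="correct"):
    """Accuracy, precision, recall, and F1 for binary grade labels.

    Hypothetical helper: `gold` holds the human grades and `predicted`
    the model's grades, in the same order.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: four student answers with human and model grades.
human = ["correct", "correct", "incorrect", "incorrect"]
model = ["correct", "incorrect", "correct", "incorrect"]
metrics = binary_grading_metrics(human, model)
print(metrics)
```

Datasets that use multi-class or numeric grades would need averaged variants of these metrics or a different measure altogether.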
- Research Article
- 10.1145/3691631
- Jan 21, 2025
- ACM Transactions on Software Engineering and Methodology
With the development of Deep Learning, Natural Language Processing (NLP) applications have reached or even exceeded human-level capabilities in certain tasks. Although NLP applications have shown good performance, they can still have bugs like traditional software and even lead to serious consequences. Inspired by Lego blocks and syntax structure analysis, we propose an assembling test generation method for NLP applications or models and implement it in NLPLego. The key idea of NLPLego is to assemble the sentence skeleton and adjuncts in order by simulating the building of Lego blocks to generate multiple grammatically and semantically correct sentences based on one seed sentence. The sentences generated by NLPLego have derivation relations and different degrees of variation. These characteristics make it well-suited for integration with metamorphic testing theory, addressing the challenge of test oracle absence in NLP application testing. To validate NLPLego, we conduct experiments on three commonly used NLP tasks (i.e., machine reading comprehension, sentiment analysis, and semantic similarity measures), focusing on the efficiency of test generation and the quality and effectiveness of generated tests. We select five advanced NLP models and one popular industrial NLP software as the tested subjects. Given seed tests from SQuAD 2.0, SST, and QQP, NLPLego successfully detects 1,732, 3,140, and 261,879 incorrect behaviors with around 93.1% precision in three tasks, respectively. The experiment results show that NLPLego can efficiently generate high-quality tests for multiple NLP tasks to detect erroneous behaviors effectively. In the case study, we analyze the testing results provided by NLPLego to obtain intuitive representations of the different NLP capabilities of the tested subjects. The case study confirms that NLPLego can provide developers with clarity on the direction to improve NLP models or applications, laying the foundation for enhancing performance.
- Research Article
- 10.26662/ijiert.v8i3.pp74-83
- Oct 17, 2024
- International Journal of Innovations in Engineering Research and Technology
Natural Language Processing (NLP) has witnessed remarkable advancements over the past few decades, transforming the way machines understand and interact with human language. This survey provides a comprehensive overview of the key techniques and methodologies that have propelled the field forward, highlighting both traditional approaches and contemporary innovations. We begin by discussing foundational NLP techniques such as tokenization, part-of-speech tagging, and syntactic parsing, which laid the groundwork for understanding language structure. The evolution of statistical methods, including Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs), is explored as a significant advancement in the probabilistic modeling of language. The survey then delves into the rise of machine learning approaches, particularly supervised and unsupervised learning, which have revolutionized various NLP tasks such as sentiment analysis, named entity recognition, and machine translation. We examine the impact of deep learning, focusing on architectures like Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Convolutional Neural Networks (CNNs) that have enabled significant improvements in performance across a range of applications. The introduction of transformer models, particularly the attention mechanism and BERT (Bidirectional Encoder Representations from Transformers), marks a paradigm shift in how contextual information is captured, leading to state-of-the-art results in numerous NLP benchmarks. In addition to technical advancements, the survey addresses the challenges that persist in NLP, including issues of bias in language models, the necessity for large annotated datasets, and the importance of explainability in AI systems. We discuss ongoing research efforts aimed at mitigating these challenges, including techniques for domain adaptation, few-shot learning, and unsupervised representation learning. 
This survey aims to provide researchers and practitioners with a clear understanding of the trajectory of NLP techniques, illustrating how traditional methods have evolved into sophisticated deep learning models. We conclude by highlighting future directions for research in NLP, emphasizing the need for interdisciplinary approaches that integrate linguistics, cognitive science, and ethical considerations to build more robust, fair, and interpretable NLP systems. Through this comprehensive survey, we seek to inspire further exploration and innovation in the field of Natural Language Processing, paving the way for applications that can better understand and generate human language in diverse contexts.
- Research Article
21
- 10.1162/coli_a_00420
- Dec 7, 2021
- Computational Linguistics
Natural Language Processing and Computational Linguistics
- Research Article
10
- 10.1142/s1793830923300047
- Dec 20, 2023
- Discrete Mathematics, Algorithms and Applications
In recent years, deep learning has revolutionized natural language processing (NLP) by enabling the development of models that can learn complex representations of language data, leading to significant improvements in performance across a wide range of NLP tasks. Deep learning models for NLP typically use large amounts of data to train deep neural networks, allowing them to learn the patterns and relationships in language data. This is in contrast to traditional NLP approaches, which rely on hand-engineered features and rules to perform NLP tasks. The ability of deep neural networks to learn hierarchical representations of language data, handle variable-length input sequences, and perform well on large datasets makes them well-suited for NLP applications. Driven by the exponential growth of textual data and the increasing demand for condensed, coherent, and informative summaries, text summarization has been a critical research area in the field of NLP. Applying deep learning to text summarization refers to the use of deep neural networks to perform text summarization tasks. In this survey, we begin with a review of text summarization tasks that have been prominent in recent years, including extractive, abstractive, and multi-document summarization. Next, we discuss the main deep learning-based models and their experimental results on these tasks. The paper also covers datasets and data representation for summarization tasks. Finally, we delve into the opportunities and challenges associated with summarization tasks and their corresponding methodologies, aiming to inspire future research efforts to advance the field further. A goal of our survey is to explain how these methods differ in their requirements, as understanding them is essential for choosing a technique suited to a specific setting. This survey aims to provide a comprehensive review of existing techniques, evaluation methodologies, and practical applications of automatic text summarization.
- Research Article
- 10.63163/jpehss.v4i1.1260
- Mar 31, 2026
- Physical Education, Health and Social Sciences
Text classification is a crucial task in Natural Language Processing (NLP). The purpose of text classification research is to classify text into pre-defined classes automatically. Low-resource languages still receive less attention in NLP tasks due to the scarcity of publicly annotated datasets and computational resources. Balochi, a low-resource language with a 2500-year history and cultural significance, has likewise received little attention in the development of NLP applications. This research study implements a text classification task in Balochi and compares machine learning, deep learning, and transformer-based models. An unlabelled Balochi dataset of approximately 5.5k sentences was collected, and various pre-processing techniques, including tokenization, stop-word removal, and text normalization, were applied. The experimental results show that, among machine learning models, the SGD classifier achieved the highest accuracy of 98.83%. Among deep learning models, the BiLSTM achieved the highest accuracy of 98%. However, the transformer-based model, the pre-trained XLM-RoBERTa, performed exceptionally well, achieving 99% accuracy on the Balochi classification task. These research findings provide a foundation for future multilingual pre-trained models for low-resource languages and aim to develop consistent Balochi language models for NLP applications.
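The pre-processing steps named in the abstract above (text normalization, tokenization, stop-word removal) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the stop-word list is an English placeholder standing in for a real Balochi list, which the abstract does not provide.

```python
import re
import unicodedata

# Hypothetical placeholder list; a real system would use a Balochi stop-word list.
STOP_WORDS = {"the", "is", "a", "of"}


def normalize(text):
    """Apply Unicode NFKC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split text into word tokens; \\w matches Unicode letters and digits."""
    return re.findall(r"\w+", text)


def preprocess(text, stop_words=STOP_WORDS):
    """Normalize, tokenize, and drop stop words, in that order."""
    return [tok for tok in tokenize(normalize(text)) if tok not in stop_words]


if __name__ == "__main__":
    print(preprocess("The  cat is ON the mat"))
```

Normalizing before tokenizing keeps the stop-word lookup case-insensitive; the order of the three steps is a design choice of this sketch and is not stated in the abstract.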
- Research Article
3
- 10.52783/jas.v11i1.1432
- Jan 1, 2020
- JOURNAL OF ALGEBRAIC STATISTICS
Natural language processing (NLP) has become an indispensable tool across many disciplines, and deep learning models have shown promising early results in improving the accuracy and efficiency of NLP-related tasks. A comparative study of deep learning models for NLP is invaluable for gaining insight into the strengths and weaknesses of different models and approaches, and for determining which models are most successful at particular NLP tasks. Several deep learning models for NLP tasks including sentiment analysis, named entity recognition, and machine translation are compared and contrasted in this article's literature review. This research examines popular benchmarks and data sets for evaluating and comparing deep learning models for NLP. The strengths and weaknesses of various models and approaches are also highlighted throughout the examination. In addition to a discussion of recent advancements in the field, such as pretrained language models and attention mechanisms, the article also details the many challenges and limitations of comparing deep learning models for NLP and how they stack up against one another. The report concludes with a discussion of directions in which further study of the topic may go. There is a need to construct more interpretable and multilingual deep learning models, and there is also a need to explore cross-modal learning and domain-specific models. Taken as a whole, a study comparing different deep learning models for NLP might have far-reaching effects on the creation of new NLP applications and the enhancement of current ones. This is due to its capacity to aid in the creation of more precise and efficient models for natural language processing and to shed light on the relative merits of existing approaches.
- Book Chapter
- 10.1016/b978-0-323-95502-7.00172-x
- Jan 1, 2025
- Reference Module in Life Sciences
Text Mining: Text Representation
- Research Article
99
- 10.1145/3529755
- Dec 3, 2022
- ACM Computing Surveys
Despite their success, deep networks are used as black-box models with outputs that are not easily explainable during the learning and the prediction phases. This lack of interpretability is significantly limiting the adoption of such models in domains where decisions are critical such as the medical and legal fields. Recently, researchers have been interested in developing methods that help explain individual decisions and decipher the hidden representations of machine learning models in general and deep networks specifically. While there has been a recent explosion of work on Explainable Artificial Intelligence (ExAI) on deep models that operate on imagery and tabular data, textual datasets present new challenges to the ExAI community. Such challenges can be attributed to the lack of input structure in textual data, the use of word embeddings that add to the opacity of the models and the difficulty of the visualization of the inner workings of deep models when they are trained on textual data. Lately, methods have been developed to address the aforementioned challenges and present satisfactory explanations on Natural Language Processing (NLP) models. However, such methods are yet to be studied in a comprehensive framework where common challenges are properly stated and rigorous evaluation practices and metrics are proposed. Motivated to democratize ExAI methods in the NLP field, we present in this work a survey that studies model-agnostic as well as model-specific explainability methods on NLP models. Such methods can either develop inherently interpretable NLP models or operate on pre-trained models in a post hoc manner. We make this distinction and we further decompose the methods into three categories according to what they explain: (1) word embeddings (input level), (2) inner workings of NLP models (processing level), and (3) models’ decisions (output level). We also detail the different evaluation approaches for interpretability methods in the NLP field.
Finally, we present a case-study on the well-known neural machine translation in an appendix, and we propose promising future research directions for ExAI in the NLP field.
- Research Article
- 10.56975/jetir.v12i12.573042
- Jan 1, 2025
- Journal of Emerging Technologies and Innovative Research
Sentiment analysis, a crucial task of Natural Language Processing (NLP), becomes more challenging in the presence of code-mixed slang, where English blends with internet shorthand and colloquial expressions. Traditional monolingual NLP techniques are often inadequate for handling such irregular and informally structured text. The fundamental purpose of sentiment analysis is to identify and extract the emotional tone of text data. Sentiment analysis now faces additional difficulties as code-mixed languages, like Hinglish, a combination of Hindi and English, become more common. Conventional methods mainly handle monolingual data, which leaves code-mixed scenarios unexplored. A comprehensive approach to sentiment analysis on Hinglish is presented in this research article, which addresses problems such as inconsistent transliteration, a lack of standardized grammar, and the dearth of annotated datasets. We illustrate the potential of our method to efficiently analyze sentiments in code-mixed languages by building a solid dataset and utilizing cutting-edge machine learning and deep learning models. Our results make a substantial contribution to the developing field of multilingual natural language processing. Social media, an omnipresent web-based platform, has become a primary forum for discussion and expression, leading to the evolution of a "pseudo-language" in multilingual regions like India. This new linguistic phenomenon often involves code-mixing, where speakers seamlessly interchange languages within utterances, posing significant challenges for NLP research. This paper focuses on developing methods for sentiment analysis of such code-mixed social media text, specifically involving Indian languages. Sentiment analysis is a well-established area of NLP, but it is hard to apply to Indian language texts that are informal and code-mixed.
This work addresses important issues like inconsistent orthography, non-standard grammar, unpredictable abbreviations, and a lack of annotated resources by presenting a thorough framework for sentiment analysis on slang-rich code-mixed data. In order to capture the linguistic variability typical of code-mixed online communication, we create three new corpora from Twitter and Facebook. In order to achieve sentiment classification, the suggested pipeline combines code-mix complexity analysis, sentence boundary detection, word-level language identification, and part-of-speech tagging. Experiments show that deep learning models perform significantly better than traditional machine learning techniques, especially BiLSTM architectures and transformer-based systems like BERT. The results contribute to the wider development of multilingual and sociolinguistic NLP by demonstrating the efficacy of sophisticated neural models for sentiment processing in highly informal, linguistically hybrid social media text.
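Two of the pipeline stages mentioned above, word-level language identification and code-mix complexity analysis, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tiny lexicons, the function names, and the mixing score (a simple ratio similar in spirit to code-mixing indices used in the literature) are not taken from the paper, which does not specify its method in the abstract.

```python
from collections import Counter

# Hypothetical toy lexicons; a real system would use a trained tagger.
HINDI_LEXICON = {"bahut", "accha", "hai", "nahi"}
ENGLISH_LEXICON = {"movie", "was", "good", "the", "not"}


def tag_languages(tokens):
    """Assign 'hi', 'en', or 'other' to each token by lexicon lookup."""
    tags = []
    for tok in tokens:
        word = tok.lower()
        if word in HINDI_LEXICON:
            tags.append("hi")
        elif word in ENGLISH_LEXICON:
            tags.append("en")
        else:
            tags.append("other")
    return tags


def mixing_score(tags):
    """Return 100 * (1 - share of the dominant language) over tagged tokens.

    Tokens tagged 'other' are ignored; a monolingual utterance scores 0.0.
    """
    counts = Counter(t for t in tags if t != "other")
    tagged = sum(counts.values())
    if tagged == 0:
        return 0.0
    return 100.0 * (1 - max(counts.values()) / tagged)


if __name__ == "__main__":
    tokens = ["movie", "bahut", "accha", "hai", ":)"]
    tags = tag_languages(tokens)
    print(list(zip(tokens, tags)), mixing_score(tags))
```

A higher score means the utterance is split more evenly between languages; a downstream sentiment classifier could use the per-token tags or the score as additional features.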
- Research Article
4
- 10.1007/s43681-024-00606-3
- Nov 27, 2024
- AI and Ethics
Natural Language Processing (NLP) research on AI Safety and social bias in AI has focused on safety for humans and social bias against human minorities. However, some AI ethicists have argued that the moral significance of nonhuman animals has been ignored in AI research. Therefore, the purpose of this study is to investigate whether there is speciesism, i.e., discrimination against nonhuman animals, in NLP research. First, we explain why nonhuman animals are relevant in NLP research. Next, we survey the findings of existing research on speciesism in NLP researchers, data, and models and further investigate this problem in this study. The findings of this study suggest that speciesism exists among researchers, in data, and in models. Specifically, our survey and experiments show that (a) NLP researchers, even those who study social bias in AI, do not recognize speciesism or speciesist bias; (b) in NLP data, speciesist bias is inherent in the annotations of the datasets used to evaluate NLP models; (c) OpenAI GPTs, recent NLP models, exhibit speciesist bias by default. Finally, we discuss how we can reduce speciesism in NLP research.