Adversarial Evaluation of Large Language Models for Building Robust Offensive Language Detection in Moroccan Arabic

Abstract

Offensive language detection is crucial for ensuring safe and inclusive digital environments. Identifying harmful content protects users and supports healthier online interactions. Despite advances in transformer-based models, particularly Large Language Models (LLMs), their application to this task remains underexplored for low-resource languages such as Moroccan Arabic, especially compared with high-resource languages. This study evaluates the performance of various open- and closed-source LLMs for offensive language detection in Moroccan Darija. The evaluated models include general-purpose LLMs such as LLaMA, Mistral, and Gemma, as well as Arabic-focused models such as ArabianGPT, Falcon Arabic, and Atlas-Chat. We also experiment with reasoning models such as DeepSeek and GPT-4. Beyond traditional evaluation metrics, we investigate the robustness of these LLMs and examine the impact of adversarial training on their performance. Moreover, we contribute to the field by creating a large, high-quality dataset. Our evaluation revealed that GPT-4o Mini achieved the best overall performance, reaching an F1-score of 88%. However, robustness testing under black-box and white-box adversarial attacks exposed notable vulnerabilities, with attack success rates reaching 30%, thereby highlighting the need for enhancement. Despite the complex morphology and linguistic variability of Moroccan Darija, adversarial training resulted in a notable improvement in both overall model performance and robustness against adversarial attacks, yielding an average increase of 20.89% in resistance to attacks. Furthermore, this approach enabled GPT-4o Mini to achieve an F1-score of 91%, surpassing the current state-of-the-art performance by 6%. These results highlight the importance of incorporating adversarial approaches in low-resource dialectal settings to effectively address linguistic variability and data scarcity.
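
The abstract reports two kinds of figures: an F1-score on clean inputs and an attack success rate under adversarial perturbation. The sketch below shows one common way to compute both. It is not the authors' evaluation code: the function names and toy labels are hypothetical, and the attack success rate is assumed to mean the fraction of originally correct predictions that flip to incorrect after the attack, which may differ from the paper's definition.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the `positive` class (here: 1 = offensive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def attack_success_rate(
    y_true: Sequence[int], clean_pred: Sequence[int], adv_pred: Sequence[int]
) -> float:
    """Assumed definition: among examples the model classified correctly
    before the attack, the share it misclassifies after the attack."""
    attacked = [(t, a) for t, c, a in zip(y_true, clean_pred, adv_pred) if c == t]
    if not attacked:
        return 0.0
    flipped = sum(1 for t, a in attacked if a != t)
    return flipped / len(attacked)


# Hypothetical toy data, not taken from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
clean_pred = [1, 1, 0, 0, 0, 1, 1, 0]   # predictions on the original texts
adv_pred = [0, 1, 0, 0, 1, 1, 1, 0]     # predictions on the perturbed texts

clean_f1 = f1_score(y_true, clean_pred)
asr = attack_success_rate(y_true, clean_pred, adv_pred)
print(f"clean F1 = {clean_f1:.2f}, attack success rate = {asr:.2f}")
```

Restricting the denominator to originally correct predictions keeps the attack from being credited with errors the model would have made anyway.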

Similar Papers
  • Research Article
  • 10.55041/ijsrem36608
Exploring Vulnerabilities and Threats in Large Language Models: Safeguarding Against Exploitation and Misuse
  • Aug 10, 2024
  • INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Mr Aarush Varma + 1 more

This research paper delves into the inherent vulnerabilities and potential threats posed by large language models (LLMs), focusing on their implications across diverse applications such as natural language processing and data privacy. The study aims to identify and analyze these risks comprehensively, emphasizing the importance of mitigating strategies to prevent exploitation and misuse in LLM deployments. In recent years, LLMs have revolutionized fields like automated content generation, sentiment analysis, and conversational agents, yet their immense capabilities also raise significant security concerns. Vulnerabilities such as bias amplification, adversarial attacks, and unintended data leakage can undermine trust and compromise user privacy. Through a systematic examination of these challenges, this paper proposes safeguarding measures crucial for responsibly harnessing the potential of LLMs while minimizing associated risks. It underscores the necessity of rigorous security protocols, including robust encryption methods, enhanced authentication mechanisms, and continuous monitoring frameworks. Furthermore, the research discusses regulatory implications and ethical considerations surrounding LLM usage, advocating for transparency, accountability, and stakeholder engagement in policy-making and deployment practices. By synthesizing insights from current literature and real-world case studies, this study provides a comprehensive framework for stakeholders (developers, policymakers, and users) to navigate the complex landscape of LLM security effectively. Ultimately, this research aims to inform future advancements in LLM technology, ensuring its safe and beneficial integration into various domains while mitigating potential risks to individuals and society as a whole.
Keywords— Adversarial attacks on LLMs, Bias in LLMs, Data privacy in LLMs, Ethical considerations LLMs, Exploitation of LLMs, Large Language Models (LLMs), Misuse of LLMs, Mitigation strategies for LLMs, Natural Language Processing (NLP), Regulatory frameworks LLMs, Responsible deployment of LLMs, Risks of LLMs, Security implications of LLMs, Threats to LLMs, Vulnerabilities in LLMs.

  • Research Article
  • Citations: 2
  • 10.3724/2096-7004.di.2024.0035
AWeCita: Generating Answer with Appropriate and Well-grained Citations Using LLMs
  • Dec 1, 2024
  • Data Intelligence
  • Suifeng Zhao + 6 more

Large language models (LLMs) excel in various Natural Language Processing tasks but struggle with hallucinations, leading to potentially misleading responses. Researchers have extensively explored LLMs’ citation practices. However, existing efforts often overlook the crucial aspects of the appropriateness and granularity of citation, which are vital for mitigating hallucination and enhancing interpretability. To bridge this gap and improve the quality of citations, we propose the Generating Answers with Appropriate and Well-grained Citations using LLMs task (AWeCita), with a focus on citing appropriately and at a fine granularity. Based on the traditional evaluation metrics of answer accuracy and citation correctness, we introduce two new evaluation metrics, citation appropriateness and citation granularity, to assess LLMs’ performance on this task more comprehensively and accurately. We conduct a series of exploratory experiments on the ASQA and ELI5 datasets. The experimental results show that AWeCita outperforms traditional tasks on the citation granularity metric, and most of our methods show a certain advantage in citation appropriateness; however, the improvement towards well-grained citation affects the quote-level citation correctness.

  • Research Article
  • 10.1177/20420986251405082
Transformer-based models for ADR detection: cross-drug validation and benchmarking against large language models
  • Dec 18, 2025
  • Therapeutic Advances in Drug Safety
  • Minjung Kim + 5 more

Background: Adverse drug reactions (ADRs) are harmful side effects of medications. Social media provides real-time, patient-generated data, though its unstructured format presents challenges. Natural language processing and transfer learning offer promising solutions. Objective: This study aimed to evaluate whether transformer-based models fine-tuned on a general ADR dataset can effectively classify ADRs from tweets related to glucagon-like peptide-1 (GLP-1) receptor agonists and to benchmark their performance against state-of-the-art large language models (LLMs). Design: This study employed a machine learning approach using transformer-based language models to classify ADRs in social media. Methods: BERT (bidirectional encoder representations from transformers)-base, BERTweet-base, and GPT-2 (Generative Pre-Trained Transformer-2) models were fine-tuned using Sarker and SIDER (Side Effect Resource) datasets for ADR classification. The test dataset comprised 396 tweets mentioning GLP-1 receptor agonists that were categorized as personal experiences. Model performance was primarily evaluated using the F1 score, which was used to select the optimal model. In addition, the fine-tuned transformer models were benchmarked against state-of-the-art LLMs, including ChatGPT 4o, ChatGPT 4o-mini, and Gemini 2.5 Flash. Results: Among 396 tweets, 116 (29.3%) were classified as ADRs and 280 (70.7%) as non-ADRs. Among the transformer-based models, BERTweet-base achieved the highest performance (accuracy: 0.835, F1: 0.729), outperforming both BERT-base (accuracy: 0.826, F1: 0.679) and GPT-2 (accuracy: 0.766, F1: 0.628). Among the LLMs, ChatGPT 4o-mini demonstrated the best results (accuracy: 0.970, F1: 0.948), followed by Gemini 2.5 Flash (accuracy: 0.954, F1: 0.919) and ChatGPT 4o (accuracy: 0.936, F1: 0.895). Overall, LLMs substantially outperformed the fine-tuned transformer-based models. Conclusion: Fine-tuned transformer-based models demonstrated reasonable performance in ADR detection from GLP-1 receptor agonist tweets, with BERTweet-base performing best. However, state-of-the-art LLMs, particularly ChatGPT 4o-mini, substantially outperformed these models, highlighting their potential for pharmacovigilance tasks.

  • Research Article
  • Citations: 17
  • 10.1145/3770084
A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages
  • Oct 7, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Sathvik Joel + 2 more

Large Language Models (LLMs) have shown remarkable capabilities in code generation for popular programming languages. However, their performance in Low-Resource Programming Languages (LRPLs) and Domain-Specific Languages (DSLs) remains a critical challenge. This gap affects millions of developers - with Rust alone having 3.5 million users - who are currently unable to fully leverage LLM capabilities. LRPLs and DSLs face unique challenges, including severe data scarcity and, for DSLs, highly specialized syntax and semantics that are poorly represented in general-purpose datasets. Addressing these challenges is crucial as LRPLs and DSLs significantly enhance development efficiency in specialized domains and applications, including financial and scientific works. While several surveys on LLMs for software engineering and code exist, none comprehensively address the challenges and opportunities specific to LRPLs and DSLs. Our survey fills this gap by providing a systematic review of the current state, methodologies, and challenges in leveraging LLMs for code generation in LRPL and DSL. We filtered 111 papers from over 27,000 published studies from 2020 – 2024 to understand the capabilities and limitations of LLMs in these specialized domains. We also expanded our literature search to include 5 recent papers from 2024 – 2025. We report LLMs used, benchmarks, and metrics to evaluate code generation in LRPLs and DSLs, as well as strategies used to enhance LLM performance, and the collected datasets and curation methods in this context. We identified four main evaluation techniques used in the literature, along with several metrics to assess code generation in LRPL and DSL. We categorized the methods used for LLM improvement into six main groups and summarized the novel methods and architectures proposed by the researchers. We also classified different approaches used for data collection and preparation. 
While different techniques, metrics, and datasets are used, there is a lack of a standard approach and a benchmark dataset to evaluate code generation in several LRPLs and DSLs. We discuss several distinctions of the studied approaches with the ones used in high-resource programming languages (HRPLs), as well as several challenges unique to these languages, especially DSLs. The challenges stem from the scarcity of data, the unique requirements, and specialized domains, which often need expertise guidelines or domain-specific tools. Accordingly, we provide insights into different research opportunities for the studied aspects. This survey serves as a comprehensive resource for researchers and practitioners working at the intersection of LLMs, software engineering, and specialized programming languages, providing a foundation for future advancements in LRPL and DSL code generation. A GitHub repository was created to organize the papers of this survey at https://github.com/jie-jw-wu/Survey-CodeLLM4LowResource-DSL .

  • Research Article
  • Citations: 50
  • 10.1162/coli_a_00561
LLM-based NLG Evaluation: Current Status and Challenges
  • Jun 24, 2025
  • Computational Linguistics
  • Mingqi Gao + 5 more

Evaluating natural language generation (NLG) is a vital but challenging problem in natural language processing. Traditional evaluation metrics mainly capturing content (e.g., n-gram) overlap between system outputs and references are far from satisfactory, and large language models (LLMs) such as ChatGPT have demonstrated great potential in NLG evaluation in recent years. Various automatic evaluation methods based on LLMs have been proposed, including metrics derived from LLMs, prompting LLMs, fine-tuning LLMs, and human–LLM collaborative evaluation. In this survey, we first give a taxonomy of LLM-based NLG evaluation methods, and discuss their pros and cons, respectively. Lastly, we discuss several open problems in this area and point out future research directions.

  • Research Article
  • Citations: 1
  • 10.54254/2755-2721/93/20240922
Research on adversarial attack and defense of large language models
  • Nov 8, 2024
  • Applied and Computational Engineering
  • Jidong Yang + 3 more

Abstract. Large language models (LLMs) have made excellent progress in text and image understanding and generation. However, with the wide range of applications of these models in various industries, the issue of their security, especially the defense against adversarial attacks, has become a focus of research. This study focuses on exploring the adversarial attacks faced by LLMs and their defense strategies, especially the design and optimization of defense mechanisms. Through literature review and case studies, this paper analyzes in detail the white-box and black-box attack patterns against LLMs, including model inversion, backdoor attacks, and token-based strategies. In response to these attacks, this paper proposes a series of defense strategies, including preventive measures such as data augmentation, adversarial training and model regularization, as well as real-time attack detection and response strategies such as anomaly detection and adversarial sample detection techniques. The core of this research is to improve the robustness and trustworthiness of LLMs, providing the necessary guarantees for their integration and sustainability in multiple industrial applications. In addition, this paper proposes future research directions, highlighting the importance of developing advanced defense systems, promoting interdisciplinary research and exploring new applications for LLMs. This research provides valuable insights into understanding and improving the security defense mechanisms of LLMs, which is essential for maintaining the security and user trust of these models.

  • Research Article
  • Citations: 3
  • 10.48084/etasr.10331
Evaluation of Arabic Large Language Models on Moroccan Dialect
  • Jun 4, 2025
  • Engineering, Technology & Applied Science Research
  • Faisal Qarah + 1 more

Large Language Models (LLMs) have shown outstanding performance in many Natural Language Processing (NLP) tasks for high-resource languages, especially English, primarily because most of them were trained on widely available text resources. As a result, many low-resource languages, such as Arabic and African languages and their dialects, are not well studied, raising concerns about whether LLMs can perform fairly across them. Therefore, evaluating the performance of LLMs for low-resource languages and diverse dialects is crucial. This study investigated the performance of LLMs in Moroccan Arabic, a low-resource dialect spoken by approximately 30 million people. The performance of 14 Arabic pre-trained models was evaluated on the Moroccan dialect, employing 11 datasets across various NLP tasks such as text classification, sentiment analysis, and offensive language detection. The evaluation results showed that MARBERTv2 achieved the highest overall average F1-score of 83.47, while the second-best model, DarijaBERT-mix, had an average F1-score of 83.38. These findings provide valuable insights into the effectiveness of current LLMs for low-resource languages, particularly the Moroccan dialect.

  • Research Article
  • Citations: 3
  • 10.1186/s42400-024-00338-1
Ctta: a novel chain-of-thought transfer adversarial attacks framework for large language models
  • Jun 1, 2025
  • Cybersecurity
  • Xinxin Yue + 3 more

Recent studies have indicated that large language models (LLMs) remain susceptible to adversarial attacks, despite enhanced robustness through the chain-of-thought (CoT) capability. However, this capability also introduces the potential for more covert and effective adversarial attack methods. This paper proposes a CoT Transfer Adversarial attack framework (CTTA) for general LLMs. Initially, we utilize a pre-trained model based on the transformer architecture and fine-tune it on various tasks to serve as a surrogate model. Subsequently, different levels of adversarial attack algorithms are utilized, and the generated adversarial samples are used as transfer samples. A thought chain-based adversarial transfer attack framework is constructed using transfer samples and thought chain techniques. Finally, various indicators are utilized to assess the performance of the general LLMs in response to this attack. The results demonstrate that the attack framework surpasses current state-of-the-art research. Numerous experiments on LLMs with varying performance and parameter sizes have validated the effectiveness, stability, and generalizability of this attack. The model’s error response and the superiority of this attack are thoroughly examined using attention by gradient technology, confirming the security threats posed by LLMs when leveraging CoT capability. This has significant implications for enhancing the security and robustness of LLMs.

  • Research Article
  • 10.52731/iee.v12.i1.848
Offensive Language Detection on Social Media Using Three Language Models and Three Datasets
  • Jan 1, 2026
  • Information Engineering Express
  • Zhenming Li + 1 more

There are more and more offensive posts on Social Media nowadays. Those posts are harmful and should be treated seriously. The most efficient way to detect offensive posts is to fine-tune a Large Language Model (LLM) on an offensive language dataset. In our research, we focus on maximizing the capacity of LLMs on offensive language detection tasks on Social Media. We select three LLMs with different attributes (DeepMoji, Bert, and HateBert) and three offensive language datasets (OLID, Curious Cat, and Ask FM). We mainly discuss achieving the best performance by configuring the LLMs and datasets. Experimental results show that simply fine-tuning an LLM with larger data cannot always achieve the best performance. The combination of LLMs was effective, especially the combination of DeepMoji and HateBert.

  • Research Article
  • Citations: 2
  • 10.59490/dgo.2025.969
Performance Analysis of LLMs for Abstractive Summarization of Brazilian Legislative Documents
  • May 20, 2025
  • Conference on Digital Government Research
  • Danilo C.G De Lucena + 5 more

Legislative documents present substantial obstacles to summarization due to their complex argument structures and specialized terminology. This research investigates the application of Large Language Models (LLMs) in summarizing Brazilian legislative proposals from the Chamber of Deputies, examining a dataset of over 56 thousand texts from 2013 to 2023. The paper explores three main summarization methodologies: extractive, abstractive, and hybrid, with an emphasis on abstractive summarization using LLMs. The performance of the LLM LLAMA2-13b is assessed using metrics such as ROUGE, BLEU, METEOR, BERTScore, and BERTopic, compared against reference summaries. The results show that LLMs can generate coherent and informative summaries, with positive evaluation metric results. Notably, the study reveals that traditional summary evaluation metrics may not be adequate for evaluating LLMs in summarization tasks. On the other hand, metrics based on pre-trained models like BERT provide a more effective evaluation of this innovative automatic summarization approach.

  • Research Article
  • 10.1093/ndt/gfae069.792
#2924 Comparison of large language models and traditional natural language processing techniques in predicting arteriovenous fistula failure
  • May 23, 2024
  • Nephrology Dialysis Transplantation
  • Suman Lama + 6 more

Background and Aims: Large language models (LLMs) have gained significant attention in the field of natural language processing (NLP), marking a shift from traditional techniques like Term Frequency-Inverse Document Frequency (TF-IDF). We developed a traditional NLP model to predict arteriovenous fistula (AVF) failure within the next 30 days using clinical notes. The goal of this analysis was to investigate whether LLMs would outperform traditional NLP techniques on this prediction task. Method: We defined AVF failure as a change in status from active to permanently unusable or temporarily unusable. We used data from a large kidney care network from January 2021 to December 2021. Two models were created, one using an LLM and one using the traditional TF-IDF technique. We used “distilbert-base-uncased”, a distilled version of the BERT base model [1], and compared its performance with traditional TF-IDF-based NLP techniques. The dataset was randomly divided into 60% training, 20% validation, and 20% test sets. The test data, comprising unseen patients’ data, was used to evaluate the performance of the models. Both models were evaluated using metrics such as area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity. Results: The incidence of 30-day AVF failure was 2.3% in the population. The LLM-based and traditional models showed similar overall performance, as summarized in Table 1. Notably, LLMs showed marginally better performance on certain evaluation metrics. Both models had the same AUROC of 0.64 on test data. The accuracy and balanced accuracy for LLMs were 72.9% and 59.7%, respectively, compared to 70.9% and 59.6% for the traditional TF-IDF approach. In terms of specificity, LLMs scored 73.2%, slightly higher than the 71.2% observed for traditional NLP methods. However, LLMs had a lower sensitivity of 46.1% compared to 48% for traditional NLP. It is also worth noting that training the LLM took considerably longer than TF-IDF and used more computational resources, such as graphics processing unit (GPU) instances in cloud-based services, leading to higher cost. Conclusion: In our study, we found that advanced LLMs perform comparably to traditional TF-IDF modeling techniques in predicting AVF failure. Both models demonstrated identical AUROC. While specificity was higher for LLMs than for traditional NLP, sensitivity was higher for traditional NLP than for LLMs. The LLM was fine-tuned with a limited dataset, which could have influenced its performance to be similar to that of traditional NLP methods. This finding suggests that while LLMs may excel in certain scenarios, such as performing in-depth sentiment analysis of patient data for complex tasks, their effectiveness is highly dependent on the specific use case. It is crucial to weigh the benefits against the resources required for LLMs, as they can be significantly more resource-intensive and costly compared to traditional TF-IDF methods. This highlights the importance of a use-case-driven approach in selecting the appropriate NLP technique for healthcare applications.
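
The balanced accuracy figures reported above can be checked from the stated sensitivity and specificity, since balanced accuracy is the mean of the two. The short sketch below performs that check; the function and variable names are illustrative and not taken from the study.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy is the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Percentages as reported in the abstract.
llm_bal_acc = balanced_accuracy(sensitivity=46.1, specificity=73.2)
tfidf_bal_acc = balanced_accuracy(sensitivity=48.0, specificity=71.2)

# The abstract reports 59.7% (LLM) and 59.6% (TF-IDF).
print(f"LLM: {llm_bal_acc:.2f}%  TF-IDF: {tfidf_bal_acc:.2f}%")
```

Both values agree with the reported figures up to rounding. With a 2.3% event rate, plain accuracy is dominated by the majority class, which is why balanced accuracy is the more informative of the two numbers here.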

  • Preprint Article
  • 10.31234/osf.io/wmjns_v1
Beyond the Numbers: Using Large Language Models to Analyze Student Feedback in Large Psychology Courses
  • Jul 23, 2025
  • Mariana Teles

Background: Psychology instructors struggle to analyze qualitative student feedback in large courses where traditional Likert-scale evaluations fail to capture student experience complexity. Current approaches are either too time-consuming or lack contextual understanding for actionable insights. Objective: To develop and validate a framework using open-source Large Language Models (LLMs) to analyze student feedback in psychology courses, comparing LLM insights with traditional evaluation metrics. Method: We implemented Facebook's BART LLM using zero-shot classification on open-ended course evaluations from a large psychology course comparing traditional lecture with active learning formats across two semesters. Data included 270 evaluations yielding 678 responses, analyzed using four learning categories. Results: LLM analysis revealed striking discrepancies with traditional metrics. While Likert-scale responses showed minimal differences between formats (Cohen's d = 0.16-0.33), LLM analysis revealed large, significant effects across all dimensions (Cohen's d = 0.94-1.13, p < .001). Validation confirmed reliability through moderate correlations with related Likert items (r = 0.29-0.44). Conclusion: LLM analysis demonstrated superior sensitivity in detecting teaching approach differences, capturing qualitative distinctions that numerical ratings miss while addressing challenges of qualitative data volume and analysis time. Teaching Implications: This methodology enables instructors to analyze hundreds of responses in minutes using accessible tools, providing practical evidence-based teaching improvement insights.

  • Research Article
  • Citations: 9
  • 10.1609/aaai.v37i13.26879
Exploring Social Biases of Large Language Models in a College Artificial Intelligence Course
  • Jun 26, 2023
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Skylar Kolisko + 1 more

Large neural network-based language models play an increasingly important role in contemporary AI. Although these models demonstrate sophisticated text generation capabilities, they have also been shown to reproduce harmful social biases contained in their training data. This paper presents a project that guides students through an exploration of social biases in large language models. As a final project for an intermediate college course in Artificial Intelligence, students developed a bias probe task for a previously-unstudied aspect of sociolinguistic or sociocultural bias they were interested in exploring. Through the process of constructing a dataset and evaluation metric to measure bias, students mastered key technical concepts, including how to run contemporary neural networks for natural language processing tasks; construct datasets and evaluation metrics; and analyze experimental results. Students reported their findings in an in-class presentation and a final report, recounting patterns of predictions that surprised, unsettled, and sparked interest in advocating for technology that reflects a more diverse set of backgrounds and experiences. Through this project, students engage with and even contribute to a growing body of scholarly work on social biases in large language models.

  • Research Article
  • Citations: 13
  • 10.1016/j.jbi.2024.104707
On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models
  • Aug 13, 2024
  • Journal of Biomedical Informatics
  • Majid Afshar + 4 more


  • Research Article
  • 10.1109/access.2025.3616181
Is This the Best Prompt? Scoring Prompts for Arabic NLP Across LLMs
  • Jan 1, 2025
  • IEEE Access
  • Dania Refai + 2 more

Large language models (LLMs) demonstrate impressive capabilities across a range of natural language processing (NLP) tasks. However, they are highly sensitive to prompt design, which significantly affects their ability to align outputs with user intent. Poorly crafted prompts can result in misleading or irrelevant responses. Nevertheless, selecting the most effective prompt from several candidates remains an open challenge. Despite the growing importance of prompt engineering, there is no comprehensive framework to systematically evaluate prompts across multiple dimensions, such as similarity, performance, efficiency, and consistency, particularly in scenarios where performance can be traded off against computational cost or consistency. In this study, we propose a novel scoring framework to evaluate handcrafted prompts across four essential dimensions: Similarity, performance, efficiency (measured by latency, input tokens, and output tokens), and consistency. Considering Arabic, a relatively low-resource, morphologically rich language, as a case study, we evaluated this framework on six diverse text classification tasks: Dialect identification, sentiment analysis, offensive language detection, stance detection, emotion detection, and sarcasm detection. Our methodology assesses prompts across multiple LLMs (GPT-4o mini, LLaMA, ALLAM, and Claude 3.5 Haiku), providing valuable insights into model-specific and task-specific performance patterns. Results demonstrate that no single prompt universally excels across all dimensions; rather, optimal prompts vary based on specific task requirements and evaluation priorities. The proposed framework enables the identification of the most effective prompts for each application context while revealing important trade-offs between performance metrics. 
By addressing the unique challenges of Arabic NLP, this research not only advances prompt engineering for underrepresented languages but also provides a systematic and adaptable methodology for prompt evaluation that can enhance LLM performance across a range of linguistic contexts, diverse domains, tasks, and various model architectures.
