Articles published on Argument mining
110 Search results
- Research Article
- 10.1145/3777009
- Nov 14, 2025
- ACM Computing Surveys
- Farid Ariai + 2 more
Natural Language Processing (NLP) is revolutionising the way both professionals and laypersons operate in the legal field. The considerable potential for NLP in the legal sector, especially in developing computational assistance tools for various legal processes, has captured the interest of researchers for years. This survey follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework, reviewing 154 studies, with a final selection of 131 after manual filtering. It explores foundational concepts related to NLP in the legal domain, illustrating the unique aspects and challenges of processing legal texts, such as extensive document lengths, complex language, and limited open legal datasets. We provide an overview of NLP tasks specific to legal text, such as Document Summarisation, Named Entity Recognition, Question Answering, Argument Mining, Text Classification, and Judgement Prediction. Furthermore, we analyse both developed legal-oriented language models, and approaches for adapting general-purpose language models to the legal domain. Additionally, we identify sixteen open research challenges, including the detection and mitigation of bias in artificial intelligence applications, the need for more robust and interpretable models, and improving explainability to handle the complexities of legal language and reasoning.
- Research Article
- 10.1177/19462174251344764
- Jun 30, 2025
- Argument & Computation
- Ekaterina Sviridova + 2 more
Argumentation is the process of creating arguments for and against competing claims. Computational argumentation involves different ways of analyzing and reasoning upon arguments and their relations. More precisely, Argument Mining is the research field aiming at automatically identifying and classifying argument structures in text. The research field is mainly focussed on the extraction of explicit argument structures (i.e., claims and premises connected by support and attack relations). However, an even more challenging task consists in extracting implicit argument structures in text (e.g., enthymemes). These structures are particularly valuable to then address argument reasoning, e.g., on incomplete and uncertain information, to finally compute the set of acceptable arguments, i.e., argument justification and skepticism. In this paper, we present and compare current approaches and available datasets for the novel task of Implicit Argument Mining. Future work perspectives are discussed to pave the way to further studies in this direction.
- Research Article
- 10.1145/3746170
- Jun 25, 2025
- ACM Transactions on the Web
- Salim Hafid + 6 more
Web claims, seen as assertions shared on the web and eligible for fact-checking, are at the heart of online discourse. They have been studied extensively on a variety of downstream tasks such as fact-checking, claim retrieval, bias detection, argument mining or viewpoint discovery. On the other hand, claims originating from scientific publications have also been the subject of several downstream NLP tasks. However, research carried out so far has yet to focus on scientific web claims, which are scientific claims made on the web (e.g., on social media and news articles). The process of detecting and fact-checking a claim from the web can be very different depending on whether the claim is scientific or not, thus making it crucial for the developed datasets, methods, and models to make a distinction between the two. With this work, we aim at understanding what makes this distinction necessary, by analyzing the linguistic differences between scientific and non-scientific claims on the web, and the impact those differences have on existing downstream tasks. To do so, we manually annotate 1,524 web claims from established benchmarks for fact-checking-related tasks, and we run statistical tests to analyze and compare linguistic features of each group. We find that scientific claims on the web use more analytical speech, but also use more sentiment-related speech, more expressions of physical motion, and have distinct parts of speech (PoS) and punctuation styles. We also conduct experiments showing that BERT-based language models perform worse on scientific web claims by up to 17 F1 points for several downstream tasks. To understand why, we develop a novel methodology to map predictive tokens of language models to explainable linguistic features and find that language models fail to detect a specific subset of predictive features of scientific web claims.
We conclude by stating that language models aimed at studying scientific web claims ought to be trained on scientific web discourse, as opposed to being trained only on generic web discourse or only on scientific text from scientific publications.
- Research Article
- 10.1007/s10115-025-02500-8
- Jun 21, 2025
- Knowledge and Information Systems
- Somaye Moslemnejad + 1 more
Argument relation classification (ARC) between argument components (ACs) has made significant progress in recent years. However, many existing approaches either rely heavily on external knowledge or on linguistic information encoded in Pre-trained Language Models (PLMs) or large language models, often neglecting the extraction of fine-grained, semantic information within ACs. This information is essential for developing strategies tailored to the specific challenges of ARC tasks. To address this, we propose leveraging Frame Semantic Parsing (FSP), an open-source transformer, to extract semantic frames. These frames, consisting of triggers and arguments along with their roles, represent the semantic relationships within ACs. We then design two types of prompt templates: one for triggers and arguments, and another for frames and roles, to generate conceptual information that facilitates ARC. Finally, we utilize the RoBERTa PLM, training it with the two types of prompt templates using a Siamese network architecture, which encodes two inputs separately, with multi-head attention. Extensive experiments across six domain-specific argument mining datasets demonstrate that our FSP–ARC approach yields competitive results compared to four state-of-the-art baselines in terms of accuracy, precision, recall, and macro F1 score.
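The two prompt-template types described in this abstract can be sketched as follows. The frame, trigger, and role values are hypothetical stand-ins for the output of a frame-semantic parser, and the exact template wording is not from the paper:

```python
def trigger_argument_prompt(trigger, arguments):
    """Template type 1: verbalise a frame trigger and its arguments."""
    return f"The trigger '{trigger}' connects the arguments: {', '.join(arguments)}."

def frame_role_prompt(frame, roles):
    """Template type 2: verbalise a frame name and the roles it assigns."""
    assigned = "; ".join(f"'{span}' as {role}" for span, role in roles.items())
    return f"In the '{frame}' frame, {assigned}."

# Hypothetical frame for the argument component "Smoking causes cancer."
p1 = trigger_argument_prompt("causes", ["smoking", "cancer"])
p2 = frame_role_prompt("Causation", {"smoking": "Cause", "cancer": "Effect"})
```

In the paper's setup, a pair of such verbalised prompts (one per argument component) would feed the two branches of the Siamese RoBERTa encoder.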
- Research Article
- 10.1162/coli_a_00553
- Jun 19, 2025
- Computational Linguistics
- Jianzhu Bao + 5 more
Argumentation is a fundamental human activity that involves reasoning and persuasion, which also serves as the basis for the development of AI systems capable of complex reasoning. In NLP, to better understand human argumentation, argument structure analysis aims to identify argument components, such as claims and premises, and their relations from free text. It encompasses a variety of divergent tasks, such as end-to-end argument mining, argument pair extraction, and argument quadruplet extraction. Existing methods are usually tailored to only one specific argument structure analysis task, overlooking the inherent connections among different tasks. We observe that the fundamental goal of these tasks is similar: identifying argument components and their interrelations. Motivated by this, we present a unified generative framework for argument structure analysis (UniASA). It can uniformly address multiple argument structure analysis tasks in a sequence-to-sequence manner. Further, we enhance UniASA with a multi-view learning strategy based on subtask decomposition. We conduct experiments on seven datasets across three tasks. The results indicate that UniASA can address these tasks uniformly and achieve performance that is either superior to or comparable with the previous state-of-the-art methods. Also, we show that UniASA can be effectively integrated with large language models, such as Llama, through fine-tuning or in-context learning.
- Research Article
- 10.1609/icwsm.v19i1.35938
- Jun 7, 2025
- Proceedings of the International AAAI Conference on Web and Social Media
- Heba Al Heraki + 1 more
This study presents an analysis of digital polarization on the topic of the Hijab by examining YouTube comments in Arabic. Employing a novel dataset of around 10K annotated comments, this research investigates the digital discourse using seven labels: Stance, Use of Sarcasm, Argumentation, Cordiality, Offensiveness, Hopefulness, and Apparent Gender of Commenters. The findings reveal significant insights into gender dynamics and the prevalence of specific rhetorical strategies within the debate. This study contributes to the broader field of polarization and argument mining, offering a unique lens on the intersection of digital culture and societal issues in the Arab context.
- Research Article
- 10.55041/ijsrem47073
- May 7, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Rudrendra Bahadur Singh
The goal of argument mining (AM) is to identify the argumentative structures in a document. Prior approaches necessitate a number of subtasks, including component classification, relation classification, and span identification. Therefore, rule-based postprocessing is required for these methods to extract argumentative structures from each subtask's output. This method increases the model's complexity and broadens the hyperparameter search space. We suggest a straightforward yet effective technique based on a text-to-text generation strategy employing a pretrained encoder-decoder language model to overcome this challenge. Our approach eliminates the requirement for task-specific postprocessing and hyperparameter optimisation by producing argumentatively annotated text for spans, components, and relations all at once. Additionally, as it is a simple text-to-text generation method, we may readily modify our strategy to fit different kinds of argumentative frameworks. Experimental findings show that our strategy works well, achieving state-of-the-art performance on three distinct benchmark datasets: the Cornell eRulemaking Corpus (CDCP), AbstRCT and the Argument-annotated Essays Corpus (AAEC).
Keywords: Argument Mining, Text-to-Text Generation, T5
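As a rough illustration of the single-pass output such a text-to-text approach produces, the toy parser below recovers components and relations from generated, argumentatively annotated text. The tag syntax is invented for this sketch and is not the paper's actual markup:

```python
import re

# Toy tag format: <AC id=N type=T rel=R target=M>span</AC>; rel/target optional.
TAG = re.compile(r"<AC id=(\d+) type=(\w+)(?: rel=(\w+) target=(\d+))?>(.*?)</AC>")

def parse_annotated(text):
    """Recover argument components and relations from annotated output text."""
    components, relations = {}, []
    for cid, ctype, rel, target, span in TAG.findall(text):
        components[int(cid)] = (ctype, span)
        if rel:  # relation fields are absent on root claims
            relations.append((int(cid), rel, int(target)))
    return components, relations

generated = ("<AC id=1 type=claim>School uniforms should be mandatory</AC> because "
             "<AC id=2 type=premise rel=support target=1>they reduce peer pressure</AC>.")
comps, rels = parse_annotated(generated)
# comps maps ids to (type, span); rels holds (source, relation, target) triples
```

The point of the abstract's approach is that spans, component types, and relations all come out of one generation pass, so recovering the structure reduces to deterministic decoding like this rather than per-subtask rule-based postprocessing.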
- Research Article
- 10.1177/19462174251330980
- Apr 22, 2025
- Argument & Computation
- Xiaoou Wang + 2 more
The need for automated fact-checking has become urgent with the rise of misleading content on social media. Recently, Fake News Classification (FNC) has evolved to incorporate justifications provided by fact-checkers to explain their decisions. In this work, we argue that an argumentative representation of fact-checkers’ justifications can improve the precision and explainability of FNC systems. To address this challenging task, we present LIARArg, a novel linguistic resource composed of 2,832 news and their justifications. LIARArg extends the 6-label FNC dataset LIAR-PLUS with argumentation structures, leading to the first FNC dataset annotated with argument components (claim and premise) and fine-grained relations (attack, support, partial support and partial attack). To integrate argumentation in FNC, we propose a novel joint learning method combining, for the first time, Argument Mining and FNC which outperforms state-of-the-art approaches, especially for news with intermediate truthfulness labels. Besides, our experimental setting demonstrates that fine-grained relations allow an extra performance boost. We also show that the argumentative representation of human justifications can be exploited in a Chain-of-Thought manner both in prompts and model output, paving a promising avenue for research in explainable fact-checking. Finally, our fully automated pipeline shows that integrating argumentation into FNC is not only feasible but also effective.
- Research Article
- 10.1038/s41598-025-98554-3
- Apr 17, 2025
- Scientific Reports
- Eunhye Kim + 2 more
The announcement of LK-99 as a potential room-temperature, ambient-pressure superconductor sparked widespread debate across both traditional news outlets and social media platforms. This study investigates public perceptions and argumentation patterns surrounding LK-99 by applying sentiment analysis and computational argument mining to a diverse dataset. We analyzed 797 YouTube videos, 71,096 comments, and 1,329 news articles collected between 2023 and 2024. Our results reveal distinct sentiment trajectories: while news articles and YouTube posts exhibit fluctuating yet predominantly positive tones, user comments consistently maintain a more negative sentiment. Discourse analysis shows that structured argumentation—especially reasoning based on expert opinions, observable signs, and anticipated consequences—is prevalent in professionally curated content, whereas a significant proportion of user comments lack identifiable argumentation schemes. Moreover, channel-level analysis indicates that non-expert channels, despite their limited specialization in science, attract higher audience engagement than traditional science channels. These findings highlight the complexities of digital science communication and underscore the need for adaptive strategies that bridge the gap between expert evidence and public discourse. Our study provides practical recommendations to enhance public understanding of scientific advancements in digital spaces.
- Research Article
- 10.52783/jisem.v10i26s.4237
- Mar 28, 2025
- Journal of Information Systems Engineering and Management
- Rudrendra Bahadur Singh
Argumentation Mining (AM) is a specialized branch of Natural Language Processing (NLP) that extracts arguments from text and maps out their relationships. While machine learning has been extensively explored for AM subtasks, there is still a gap in structuring these methods to spot common patterns across different applications. This study, based on a review of 64 research papers, breaks down how AM is applied across various domains, ranging from user-generated texts, English texts, and speech to debates, legal documents, and scientific or medical texts. Among these, text takes the lead as the most researched area. Support Vector Machines (SVM), Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Long Short-Term Memory (BiLSTM), and Convolutional Neural Networks (CNN) are among the machine learning models that dominate the field. The effectiveness of these models varies depending on the type of text: some excel on user-generated text, whereas others perform better with scientific or medical data. The study highlights the need to further explore less-researched areas, especially machine learning applications in legal, medical, scientific, and English texts, and to critically examine how large language models and deep learning stack up against traditional methods. By mapping these insights, the goal is to help researchers pick the right approach for specific AM tasks, ultimately pushing the field forward.
- Research Article
- 10.1136/bmjhci-2024-101017
- Oct 1, 2024
- BMJ Health & Care Informatics
- Shuang Wang + 2 more
Background: Research commentaries have the potential for evidence appraisal in emphasising, correcting, shaping and disseminating scientific knowledge. Objectives: To identify the appropriate bibliographic source for capturing commentary information, this study compares comment data...
- Research Article
- 10.1016/j.ijar.2024.109267
- Aug 10, 2024
- International Journal of Approximate Reasoning
- Federico M Schmidt + 2 more
Identifying arguments within a text: Categorizing errors and their impact in arguments' relation prediction
- Research Article
- 10.1007/s10506-024-09415-9
- Aug 3, 2024
- Artificial Intelligence and Law
- Kilian Lüders + 1 more
Proportionality is a central and globally spread argumentation technique in public law. This article provides a conceptual introduction to proportionality and argues that such a domain-specific form of argumentation is particularly interesting for argument mining. As a major contribution of this article, we share a new dataset for which proportionality has been annotated. The dataset consists of 300 German Federal Constitutional Court decisions annotated at the sentence level (54,929 sentences). In addition to separating textual parts, a fine-grained system of proportionality categories was used. Finally, we used these data for a classification task. We built classifiers that predict whether or not proportionality is invoked in a sentence. We employed several models, including neural and deep learning models and transformers. A BERT-BiLSTM-CRF model performed best.
- Research Article
- 10.1007/s11023-024-09658-0
- May 4, 2024
- Minds and Machines
- Jonas Aaron Carstens + 1 more
While early optimists have seen online discussions as potential spaces for deliberation, the reality of many online spaces is characterized by incivility and irrationality. Increasingly, AI tools are considered as a solution to foster deliberative discourse. Against the backdrop of previous research, we show that AI tools for online discussions heavily focus on the deliberative norms of rationality and civility. In the operationalization of those norms for AI tools, the complex deliberative dimensions are simplified, and the focus lies on the detection of argumentative structures in argument mining or verbal markers of supposedly uncivil comments. If the fairness of such tools is considered, the focus lies on data bias and an input–output frame of the problem. We argue that looking beyond bias and analyzing such applications through a sociotechnical frame reveals how they interact with social hierarchies and inequalities, reproducing patterns of exclusion. The current focus on verbal markers of incivility and argument mining risks excluding minority voices and privileges those who have more access to education. Finally, we present a normative argument why examining AI tools for online discourses through a sociotechnical frame is ethically preferable, as ignoring the predictable negative effects we describe would present a form of objectionable indifference.
- Research Article
- 10.3233/aac-230008
- Mar 22, 2024
- Argument & Computation
- Gil Rocha + 3 more
Available corpora for Argument Mining differ along several axes, and one of the key differences is the presence (or absence) of discourse markers to signal argumentative content. Exploring effective ways to use discourse markers has received wide attention in various discourse parsing tasks, from which it is well-known that discourse markers are strong indicators of discourse relations. To improve the robustness of Argument Mining systems across different genres, we propose to automatically augment a given text with discourse markers such that all relations are explicitly signaled. Our analysis unveils that popular language models taken out-of-the-box fail on this task; however, when fine-tuned on a new heterogeneous dataset that we construct (including synthetic and real examples), they perform considerably better. We demonstrate the impact of our approach on an Argument Mining downstream task, evaluated on different corpora, showing that language models can be trained to automatically fill in discourse markers across different corpora, improving the performance of a downstream model in some, but not all, cases. Our proposed approach can further be employed as an assistant tool for better discourse understanding.
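The augmentation idea in this abstract, making an implicit relation explicit by inserting a discourse marker, can be caricatured with a toy rule. The marker inventory and joining rule here are invented for illustration; the paper instead fine-tunes language models to choose and place markers:

```python
# Minimal marker inventory for the sketch; real marker choice is learned.
MARKERS = {"support": "because", "attack": "however,"}

def signal_relation(head, dependent, relation):
    """Join two argument spans with a marker that signals their relation."""
    marker = MARKERS[relation]
    return f"{head} {marker} {dependent[0].lower()}{dependent[1:]}"

augmented = signal_relation("We should ban plastic bags",
                            "They pollute the oceans", "support")
# augmented == "We should ban plastic bags because they pollute the oceans"
```

A fixed table like this is exactly what the abstract argues against relying on: out-of-the-box models fail at marker insertion, and learned, context-sensitive placement is what makes the augmented text useful for downstream Argument Mining.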
- Research Article
- 10.1016/j.ipm.2024.103707
- Mar 10, 2024
- Information Processing and Management
- Shuang Wang + 2 more
Scientific commentaries are dealing with uncertainty and complexity in science
- Research Article
- 10.1007/s00521-023-09250-0
- Dec 18, 2023
- Neural Computing and Applications
- Tiezheng Mao + 3 more
Our research focuses on extracting exchanged views from dialogical documents through argument pair extraction (APE). The objective of this process is to facilitate comprehension of complex argumentative discourse by finding the related arguments. APE comprises two stages: argument mining and argument matching. Researchers typically employ sequence labeling models for mining arguments and text matching models to calculate the relationships between them, thereby generating argument pairs. However, these approaches fail to capture long-distance contextual information and struggle to fully comprehend the complex structure of arguments. In our work, we propose the context-aware heterogeneous graph matching (HGMN) model for the APE task. First, we design a graph schema specifically tailored to argumentative texts, along with a heterogeneous graph attention network that effectively captures context information and structural information of arguments. Moreover, the text matching between arguments is converted into a graph matching paradigm, and a multi-granularity graph matching model is proposed to handle the intricate relationships between arguments at various levels of granularity. In this way, the semantics of arguments are modeled structurally, capturing the complicated correlations between arguments. Extensive experiments are conducted to evaluate the HGMN model, including comparisons with existing methods and the GPT series of large language models (LLMs). The results demonstrate that HGMN outperforms the state-of-the-art method.
- Research Article
- 10.1111/lnc3.12505
- Dec 15, 2023
- Language and Linguistics Compass
- Anna Lindahl + 1 more
Argumentation has long been studied in a number of disciplines, including several branches of linguistics. In recent years, computational processing of argumentation has been added to the list, reflecting a general interest from the field of natural language processing (NLP) in building natural language understanding systems for increasingly intricate language phenomena. Computational argumentation analysis – referred to as argumentation mining in the NLP literature – requires large amounts of real-world text with manually analyzed argumentation. This process is known as annotation in the NLP literature and such annotated datasets are used both as "gold standards" for assessing the quality of NLP applications and as training data for the machine learning algorithms underlying most state-of-the-art approaches to NLP. Argumentation annotation turns out to be complex, both because argumentation can be complex in itself and because it does not come across as a unitary phenomenon in the literature. In this survey we review how argumentation has been studied in other fields, how it has been annotated in NLP and what has been achieved so far. We conclude with a description of some important current and future issues to be resolved.
- Research Article
- 10.1016/j.jbi.2023.104555
- Nov 24, 2023
- Journal of Biomedical Informatics
- Vera Davydova + 2 more
Data and models for stance and premise detection in COVID-19 tweets: Insights from the Social Media Mining for Health (SMM4H) 2022 shared task
- Research Article
- 10.3389/frai.2023.1278796
- Nov 17, 2023
- Frontiers in artificial intelligence
- Abdullah Al Zubaer + 2 more
Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the models' performance considering diverse prompt formulation and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our results statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT-3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model.
Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.
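The example-selection step this abstract describes, retrieving the most semantically similar labelled examples for the prompt, can be sketched in a few lines. The toy two-dimensional vectors below stand in for real sentence embeddings, and the sentence texts and labels are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_prompt(query, query_vec, pool, k=2):
    """Prepend the k labelled examples most similar to the query sentence."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex["vec"]), reverse=True)
    shots = "\n".join(f"Sentence: {ex['text']}\nLabel: {ex['label']}"
                      for ex in ranked[:k])
    return f"{shots}\nSentence: {query}\nLabel:"

pool = [
    {"text": "The applicant was denied a hearing.", "label": "premise", "vec": [0.9, 0.1]},
    {"text": "Accordingly, there has been a violation.", "label": "conclusion", "vec": [0.1, 0.9]},
    {"text": "The proceedings lasted four years.", "label": "premise", "vec": [0.8, 0.2]},
]
prompt = build_prompt("The detention lasted six months.", [0.85, 0.15], pool, k=2)
```

In the study's actual pipeline the vectors would come from OpenAI or sentence-transformer embedding models, and the completion after the final "Label:" is what the GPT model is asked to produce.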