Abstract
Stance detection (StD) aims to detect an author’s stance towards a certain topic and has become a key component in applications like fake news detection, claim validation, or argument search. However, while stance is easily detected by humans, machine learning (ML) models clearly fall short on this task. Given the major differences in dataset sizes and framings of StD (e.g. number of classes and inputs), ML models trained on a single dataset usually generalize poorly to other domains. Hence, we introduce a StD benchmark that allows ML models to be compared across a wide variety of heterogeneous StD datasets and evaluated for generalizability and robustness. Moreover, the framework is designed for easy integration of new datasets and probing methods for robustness. Among several baseline models, we define a model that learns from all ten StD datasets of various domains in a multi-dataset learning (MDL) setting and present new state-of-the-art results on five of the datasets. Yet, the models still perform well below human capabilities, and even simple perturbations of the original test samples (adversarial attacks) severely hurt the performance of MDL models. Deeper investigation suggests overfitting on dataset biases as the main reason for the decreased robustness. Our analysis emphasizes the need to focus on robustness and de-biasing strategies in multi-task learning approaches. To foster research on this important topic, we release the dataset splits, code, and fine-tuned weights.
Highlights
Stance detection (StD) represents a well-established task in natural language processing and is often framed with two inputs: (1) a topic of a discussion and (2) a comment made by an author.
(2) In an in-depth analysis with adversarial attacks, we show that Transfer Learning (TL) and multi-dataset learning (MDL) for StD generally improve the performance of machine learning (ML) models, but drastically reduce their robustness compared to single-dataset learning (SDL) models.
We show (2) by comparing BERT (SDL) to BERT (MDL) (+4 pp) and MT-DNN (SDL) to MT-DNN (MDL) (+1.8 pp). The former comparison indicates that learning from similar datasets (i.e. MDL) has a higher impact than TL for StD.
Summary
Stance detection (StD) represents a well-established task in natural language processing and is often framed with two inputs: (1) a topic of a discussion and (2) a comment made by an author. Given these two inputs, the aim is to find out whether the author is in favor of or against the topic. In SemEval-2016 Task 6 [30], the second input is a short tweet and the goal is to detect whether the author has made a positive or negative comment towards a given controversial topic:

Topic: Climate Change is a Real Concern
Tweet: Gone are the days where we would get temperatures of Min -2 and Max 5 in Cape Town
Stance: FAVOR

The number of samples varies drastically between datasets (for our setup: from 2,394 to 75,385).
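The two-input framing above can be sketched as a BERT-style sequence-pair encoding, where topic and comment are joined into one input with special tokens. The function below is a minimal, illustrative stand-in (whitespace tokenization instead of a real subword tokenizer) and is not taken from the paper's released code.

```python
# Minimal sketch of the two-input StD framing (illustrative, not the paper's code).
# A BERT-style model receives topic and comment as one sequence pair:
#   [CLS] topic tokens [SEP] comment tokens [SEP]

STANCE_LABELS = ["FAVOR", "AGAINST", "NONE"]  # label set of SemEval-2016 Task 6

def encode_pair(topic: str, comment: str) -> list[str]:
    """Whitespace-tokenized stand-in for a real subword tokenizer."""
    return ["[CLS]", *topic.split(), "[SEP]", *comment.split(), "[SEP]"]

tokens = encode_pair(
    "Climate Change is a Real Concern",
    "Gone are the days where we would get temperatures of Min -2 and Max 5 in Cape Town",
)
```

A classifier head over the `[CLS]` position would then predict one of the stance labels; heterogeneous datasets differ mainly in this label set and in the form of the two inputs.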