MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting

Bailey Kuehl, Arman Cohan, Anne Lauscher, David Jurgens, Kyle Lo

doi:10.48448/5ksy-5549

Abstract

Citation context analysis (CCA) is an important task in natural language processing that studies how and why scholars discuss each other's work. Despite decades of study, computational methods for CCA have largely relied on overly simplistic assumptions about how authors cite, which ignore several important phenomena. For instance, scholarly papers often contain rich discussions of cited work that span multiple sentences and express multiple intents concurrently. Yet recent work often treats CCA as a single-sentence, single-label classification task, and thus many datasets used to develop modern computational approaches fail to capture this interesting discourse. To address this research gap, we highlight three understudied phenomena for CCA and release MULTICITE, a new dataset of 12.6K citation contexts from 1.2K computational linguistics papers that fully models these phenomena. Not only is MULTICITE the largest collection of expert-annotated citation contexts to date, it also contains multi-sentence, multi-label citation contexts annotated throughout entire full paper texts. We demonstrate how MULTICITE can enable the development of new computational methods on three important CCA tasks. We release our code and dataset at https://github.com/allenai/multicite.
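To make the contrast between the two settings concrete, the following is a minimal Python sketch of how a multi-sentence, multi-label citation context might be represented, and of what is lost when it is collapsed into the single-sentence, single-label setting. The class `CitationContext`, the helper `to_single_label`, the label set `INTENTS`, and the example sentences are all hypothetical illustrations. They are not MultiCite's released data format or its official intent taxonomy.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical intent label set, for illustration only
# (not necessarily MultiCite's actual taxonomy).
INTENTS = ["background", "motivation", "uses", "extends",
           "similarities", "differences", "future_work"]


@dataclass
class CitationContext:
    """A citation context that may span several sentences and carry several intents."""
    cited_paper_id: str
    sentences: list[str]                      # one or more consecutive sentences
    intents: set[str] = field(default_factory=set)

    def multi_hot(self) -> list[int]:
        """Encode the intents as a multi-hot vector over INTENTS (a multi-label target)."""
        return [1 if label in self.intents else 0 for label in INTENTS]


def to_single_label(ctx: CitationContext) -> tuple[str, str]:
    """Collapse to the single-sentence, single-label setting.

    Keeps only the first sentence and one intent (the alphabetically first),
    discarding every other sentence and label.
    """
    if not ctx.intents:
        raise ValueError("context has no intent labels")
    return ctx.sentences[0], sorted(ctx.intents)[0]


# Hypothetical example: one citation discussed across two sentences with two intents.
ctx = CitationContext(
    cited_paper_id="paper-123",
    sentences=["We build on the parser of [CIT].",
               "Unlike [CIT], however, we do not use beam search."],
    intents={"uses", "differences"},
)
print(ctx.multi_hot())        # [0, 0, 1, 0, 0, 1, 0]
print(to_single_label(ctx))   # ('We build on the parser of [CIT].', 'differences')
```

Here the collapsed version drops the second sentence and one of the two intents. Because the surviving label is picked arbitrarily (alphabetically, in this sketch), it need not even match the surviving sentence. The multi-hot vector is the usual target encoding for multi-label classification.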

Similar Papers

How Does Castells's The Rise of the Network Society Contribute to Research in Human Geography? A Citation Content and Context Analysis
Feng Zhen ... Xia Wang
The Professional Geographer | VOL. 72
12 Jul 2019

Word Embedding for Bengali Language using Domain-related Corpus
Ashutosh Bandyopadhyay ... Jayashree Nair
26 Apr 2023

Automatic Extraction of Comprehensive Drug Safety Information from Adverse Drug Event Narratives in the Korea Adverse Event Reporting System Using Natural Language Processing Techniques.
Siun Kim ... Yesol Hong
Drug Safety | VOL. 46
17 Jun 2023

Multi-Task Text Classification using Graph Convolutional Networks for Large-Scale Low Resource Language
Mounika Marreddy ... Radhika Mamidi
18 Jul 2022