Scholarly context not found: one in five articles suffers from reference rot.

Martin Klein, Robert Sanderson, Harihar Shankar, Ke Zhou, Lyudmila Balakireva, Herbert Van De Sompel, Richard Tobin

doi:10.1371/journal.pone.0115253

Open Access

https://doi.org/10.1371/journal.pone.0115253

Journal: PLoS ONE	Publication Date: Dec 26, 2014
Citations: 129	License type: CC BY 4.0

Affiliation: Los Alamos National Laboratory, University of Edinburgh

Abstract

The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. But the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten. We suggest that, in order to safeguard the long-term integrity of the web-based scholarly record, robust solutions to combat the reference rot problem are required. In conclusion, we provide a brief insight into the directions that are explored in this regard in the context of the Hiberlink project.
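The two per-reference checks described above (is the URI still responsive, and does an archive hold a snapshot close to the time of referencing) might be sketched roughly as follows. This is an illustration only, not the authors' procedure: the function names, the rule that any 4xx/5xx status or missing response counts as link rot, and the 14-day window used to call a snapshot "representative" are all assumptions made for this sketch. The inputs are passed in as plain values so that no network access is needed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass
class ReferenceCheck:
    link_rot: bool                      # URI no longer responsive on the live web
    has_representative_snapshot: bool   # an archived copy exists near the reference date


def is_link_rotten(http_status: Optional[int]) -> bool:
    """Assumption: no response at all, or any 4xx/5xx status, counts as link rot."""
    return http_status is None or http_status >= 400


def has_snapshot_near(reference_date: date,
                      snapshot_dates: Sequence[date],
                      window_days: int = 14) -> bool:
    """Assumption: a snapshot is 'representative' if it was captured within
    window_days before or after the date the resource was referenced."""
    window = timedelta(days=window_days)
    return any(abs(snapshot - reference_date) <= window
               for snapshot in snapshot_dates)


def check_reference(http_status: Optional[int],
                    reference_date: date,
                    snapshot_dates: Sequence[date],
                    window_days: int = 14) -> ReferenceCheck:
    """Combine the live-web check and the archive check for one reference."""
    return ReferenceCheck(
        link_rot=is_link_rotten(http_status),
        has_representative_snapshot=has_snapshot_near(
            reference_date, snapshot_dates, window_days),
    )


# Hypothetical example: the URI now returns 404 and the only archived
# snapshot was captured long after the article referenced it.
result = check_reference(404, date(2010, 5, 1), [date(2012, 1, 1)])
```

Note that a check of this kind only detects link rot and missing snapshots. Content drift on a URI that still responds cannot be detected from a status code alone, which is why the archive check matters even for responsive links.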

Highlights

Referencing sources is a fundamental part of the scholarly discourse
This growth pattern is related to the PubMed Central (PMC) submission policy that changed from voluntary to mandatory in 2008 [47] and which resulted in a dramatic growth in submissions from there onwards [48]
We explored a more abstract network consisting of the corpora as sources of URI references and the top level domains (TLDs) for each of those references as targets

Summary

Introduction

Reference Rot in Web-Based Scholarly Communication

Referencing sources is a fundamental part of the scholarly discourse. There is an expectation that referenced sources can and should be checked by others, to allow a correct interpretation of information that is being communicated and to support reproducibility of results. This credo continues as scholarly communication transitions from being a paper-based to a web-based endeavor. Whereas references in the paper-based era were purely textual, in the web era they include HTTP URIs - from here on referred to as URIs - that provide convenient and immediate access to referenced resources on the web. This immediacy is one of the web’s transformative characteristics that is inherited by web-based scholarly communication and that allows for a dramatic increase in the speed of knowledge dissemination. Web-based scholarly communication also inherits some of the rather frustrating characteristics of the web, and, in this paper, we focus on one: reference rot.



