Abstract

This study examines electronic theses and dissertations (ETDs) deposited in an institutional repository between 2011 and 2015 to determine the degree to which they suffer from reference rot, that is, linkrot plus content drift. The authors converted and examined 664 doctoral dissertations, extracting 11,437 links; overall, 77% of links were active and 23% exhibited linkrot. A stratified random sample of 49 ETDs yielded 990 active links, which were then checked for content drift against mementos found in the Wayback Machine. Mementos were found for 77% of links, and approximately half of these, 492 of 990, exhibited content drift. The results emphasize the need for broader awareness of this problem and are intended to stimulate action on the preservation front.

Highlights

  • A significant proportion of material in institutional repositories is comprised of electronic theses and dissertations (ETDs), providing academic librarians with a rich testbed for deepening our understanding of new paradigms in scholarly publishing and their implications for long-term digital preservation

  • We proceeded in phases: first downloading ETDs from Spectrum and converting them to a text format that could be examined for patterns; then extracting links from each and testing them programmatically for linkrot; and finally drawing a stratified random sample of active URLs and visiting them to determine whether content drift had taken place

  • For links that had no memento in Wayback, content drift assessment was based on the presence of an observable date on the currently active page (including a copyright date) and/or other details that matched our extracted snippet information
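
The phases listed above (extract links from converted text, test them for linkrot, draw a stratified sample of active links) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual tooling: the URL pattern, the function names, the rule that treats 2xx and 3xx responses as "active", and the choice of strata are all assumptions made for this example.

```python
import random
import re
import urllib.error
import urllib.request

# Assumed pattern: an http(s) URL runs until whitespace or a closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>)\]]+")


def extract_links(text):
    """Return unique URLs in order of first appearance, trimming trailing punctuation."""
    links = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")
        if url not in links:
            links.append(url)
    return links


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a HEAD request, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def classify_status(status_code):
    """Assumed rule: 2xx/3xx counts as active; anything else counts as linkrot."""
    if status_code is not None and 200 <= status_code < 400:
        return "active"
    return "linkrot"


def stratified_sample(items_by_stratum, fraction, seed=0):
    """Draw the same fraction from every stratum (at least one item when non-empty)."""
    rng = random.Random(seed)
    sample = {}
    for stratum, items in items_by_stratum.items():
        k = min(len(items), max(1, round(len(items) * fraction)))
        sample[stratum] = rng.sample(items, k)
    return sample
```

A status-code check alone can misjudge links: some servers reject HEAD requests, and some return 200 for "page not found" placeholders, so results from a sketch like this would still need spot checks by hand.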

Summary

Introduction

A significant proportion of material in institutional repositories is comprised of electronic theses and dissertations (ETDs), providing academic librarians with a rich testbed for deepening our understanding of new paradigms in scholarly publishing and their implications for long-term digital preservation. While academic libraries have long collected and preserved hard copy theses and dissertations of the parent institution, the shift to mandatory electronic deposit of this material has conferred new obligations and curatorial functions not previously incorporated into library workflows.

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017

In addition to linkrot (where a link sends the user to a webpage which is no longer available), there are webpages that remain available, but whose contents have undergone change over time, known as content drift. This dual phenomenon of linkrot plus content drift has been characterized as reference rot by the Hiberlink project team,[2] and has important implications for digital preservation.

Following a successful pilot project, the Graduate Studies Office ceased accepting paper manuscripts, and mandated electronic deposit of all theses and dissertations into Spectrum as of spring 2011.
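
To make the distinction between linkrot and content drift concrete, the sketch below labels a single reference given whether it is reachable and, when available, the text of an archived snapshot and of the live page. It is only an illustration: the study describes visiting links to judge drift, whereas this example substitutes an automated text-similarity comparison, and the function names, labels, and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(archived_text, current_text):
    """Similarity in [0, 1] between an archived snapshot and the live page text."""
    return SequenceMatcher(None, archived_text, current_text).ratio()


def classify_reference(is_reachable, archived_text=None, current_text=None,
                       threshold=0.9):
    """Label one link; `threshold` is an arbitrary cut-off chosen for illustration."""
    if not is_reachable:
        return "linkrot"  # the page is no longer available
    if archived_text is None or current_text is None:
        return "unknown"  # nothing archived to compare against
    if similarity(archived_text, current_text) < threshold:
        return "content drift"  # reachable, but the content has changed
    return "intact"


# Reference rot covers both failure modes.
REFERENCE_ROT = {"linkrot", "content drift"}
```

For instance, `classify_reference(True, old_text, new_text) in REFERENCE_ROT` would flag a page that still loads but no longer resembles what was cited.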

