Digital humanities and web archives: Possible new paths for combining datasets

Niels Brügger

doi:10.1007/s42803-021-00038-z

Abstract

This article discusses why web archives should make their collections available as data, and not only as sources viewed through the Wayback Machine’s interface, where only individual web pages are displayed. Doing so will help unlock the full potential of the treasure trove that web archives constitute, and thereby open the way for methods from the wider field of digital humanities. Based on a case study of the entire Danish web domain .dk, the article discusses the methodological challenges involved in combining large datasets extracted from web archives, namely metadata about the size of websites and data about hyperlinks from the same websites. The aim is to answer the following two questions: 1) How can two different types of datasets extracted from a web archive, in this case the Danish Netarkivet, be combined? 2) What can the result of such a combination teach us about the structural characteristics of the Danish web domain from 2006 to 2015? The article shows that it is indeed possible to go beyond the Wayback Machine as the prime interface to web archives by combining two distinct datasets, and that such a venture can provide valuable knowledge about the overall structure of the Danish web domain. Two findings stand out: websites of the same size tend to constitute isolated ‘link islands’, and big websites are also the most important in the hyperlink network, which is more clearly the case in 2015 than in 2006.
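To make the idea of combining the two datasets concrete, the following is a minimal sketch and not the procedure used in the article. It assumes that the size metadata can be reduced to one size value per website and that the hyperlink data can be reduced to source and target pairs. The domain names, the size values, the one-megabyte threshold, and the helper names `size_class` and `combine` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy data; none of it is taken from Netarkivet.
# Dataset 1: size metadata per website (domain -> size in bytes).
site_sizes = {
    "a.dk": 5_000,
    "b.dk": 8_000,
    "c.dk": 2_000_000,
    "d.dk": 3_500_000,
}

# Dataset 2: hyperlinks between websites as (source, target) pairs.
links = [
    ("a.dk", "b.dk"),
    ("b.dk", "a.dk"),
    ("a.dk", "c.dk"),
    ("c.dk", "d.dk"),
    ("d.dk", "c.dk"),
    ("b.dk", "x.dk"),  # target has no record in the size dataset
]


def size_class(size_bytes, threshold=1_000_000):
    """Assign a coarse size class; the threshold is an arbitrary assumption."""
    return "big" if size_bytes >= threshold else "small"


def combine(site_sizes, links):
    """Join each link with the size class of its source and its target.

    Links whose source or target has no size record are dropped, and the
    number of dropped links is returned so the loss can be inspected.
    """
    combined, dropped = [], 0
    for source, target in links:
        if source in site_sizes and target in site_sizes:
            combined.append((
                source,
                target,
                size_class(site_sizes[source]),
                size_class(site_sizes[target]),
            ))
        else:
            dropped += 1
    return combined, dropped


combined, dropped = combine(site_sizes, links)

# Share of links that stay within a single size class.
same_class = sum(1 for _, _, src_cls, tgt_cls in combined if src_cls == tgt_cls)
same_class_share = same_class / len(combined)

# Number of inlinks received by each size class.
inlinks_by_class = Counter(tgt_cls for _, _, _, tgt_cls in combined)

print(f"dropped links: {dropped}")
print(f"share of links within one size class: {same_class_share:.2f}")
print(f"inlinks by size class: {dict(inlinks_by_class)}")
```

In a sketch like this, a high share of links within one size class would be one simple indication of the ‘link islands’ pattern, and the inlink counts per class give a rough view of which size class is most linked to. Reporting the number of dropped links matters because the two datasets need not cover exactly the same set of websites.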

Journal: International Journal of Digital Humanities
Publication Date: May 28, 2021
Citations: 3

