Abstract

Web archives are not direct traces of the web; they are direct traces of crawlers. By design, the structure of web archives limits our capacity to explore the memory of the web. These structural issues induce temporal discontinuities such as inconsistency, redundancy and blindness. In this paper, we address the question of re-injecting continuity into large corpora of web archives. We introduce the notions of persistences (series of time-stable snapshots of archived web pages) and continuity spaces (networks of time-consistent persistences). We demonstrate how, on the basis of a quality score, persistences can be used to select subsets of web archives within which in-depth historical analysis can be conducted at scale. We then propose a new visualization approach, called web cernes, to reconstruct the multi-level temporal evolution of an archived community of web sites. Finally, we apply our framework to study the history of the firsttuesday movement: a constellation of entrepreneurial web sites that acted in the interest of the economic growth of the web in the early 2000s.
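The sketch below gives one concrete reading of a persistence as a series of time-stable snapshots. It assumes, purely for illustration, that a capture is "time-stable" with the previous one when both share the same content digest. Under that assumption, a persistence is a maximal run of consecutive identical captures of a single page. The `Snapshot` and `Persistence` types, the `persistences` function and the digest criterion are all hypothetical. The paper's own stability criterion and its quality score are not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby


@dataclass(frozen=True)
class Snapshot:
    url: str
    captured_at: datetime
    digest: str  # hypothetical: hash of the archived page content


@dataclass(frozen=True)
class Persistence:
    url: str
    digest: str
    start: datetime  # first capture in the run
    end: datetime    # last capture in the run
    snapshots: int   # number of captures in the run


def persistences(snapshots):
    """Split the captures of ONE page into runs of identical content.

    Assumption: two consecutive captures are "time-stable" when their
    digests are equal. This is an illustrative criterion only.
    """
    ordered = sorted(snapshots, key=lambda s: s.captured_at)
    runs = []
    # groupby only merges *consecutive* items with the same key, so a page
    # that changes and later reverts yields separate persistences.
    for digest, group in groupby(ordered, key=lambda s: s.digest):
        group = list(group)
        runs.append(Persistence(group[0].url, digest,
                                group[0].captured_at, group[-1].captured_at,
                                len(group)))
    return runs


# Usage with made-up captures of a single page.
captures = [
    Snapshot("http://example.org/", datetime(2001, 1, 1), "a"),
    Snapshot("http://example.org/", datetime(2001, 2, 1), "a"),
    Snapshot("http://example.org/", datetime(2001, 3, 1), "b"),
    Snapshot("http://example.org/", datetime(2001, 4, 1), "a"),
]
runs = persistences(captures)
for p in runs:
    print(p.digest, p.start.date(), p.end.date(), p.snapshots)
```

Here the four captures collapse into three persistences (two "a" captures, then "b", then "a" again). Relying on `groupby` over time-sorted captures keeps the content revert in February to April as a separate run rather than merging it with the first one.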
