Exploiting Multimedia in Creating and Analysing Multimedia Web Archives

Jonathon Hare, Kirk Martinez, Wendy Hall, Paul Lewis, David Dupplaw

doi:10.3390/fi6020242

Abstract

The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by human-kind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.

Highlights

Community memories embodied on the web and within social media are inherently multimedia in nature and contain data and information in many different auditory, visual and textual forms.
Multimedia analysis is typically very computationally expensive, especially compared to simple text analysis. This presents many challenges when working with multimedia web archives, especially if we want to integrate multimedia analysis into the decision making that happens inside an adaptive crawl.
To deal with the problem of computational complexity in the ARCOMEM crawlers, the real-time decision making required for dynamically controlling the crawl is performed by fast text analysis techniques, which determine whether the outlinks of a document should be followed based on the relevance of the document to a crawl specification.

Summary

Introduction

Community memories embodied on the web and within social media are inherently multimedia in nature and contain data and information in many different auditory, visual and textual forms. Multimedia analysis is typically very computationally expensive, especially compared to simple text analysis. This presents many challenges when working with multimedia web archives, especially if we want to integrate multimedia analysis into the decision making that happens inside an adaptive crawl. Within the EU funded ARCOMEM (ARchiving COmmunity MEMories) project [1], we have developed adaptive web and social-web crawlers that produce highly focussed web archives relating to specific entities (people, places, and organisations), topics, opinions and events (ETOEs) [2]. To deal with the problem of computational complexity in the ARCOMEM crawlers, the real-time decision making required for dynamically controlling the crawl is performed by fast text analysis techniques, which determine whether the outlinks of a document should be followed based on the relevance of the document to a crawl specification (itself consisting of sets of ETOEs). We conclude with some remarks about the future outlook of our research on multimedia analysis with respect to web crawling and web analytics.
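The outlink decision described above can be illustrated with a minimal sketch. It is not the ARCOMEM implementation: the `CrawlSpec` class, the keyword-overlap scoring rule, the single-word terms and the 0.5 threshold are all assumptions chosen only to show the shape of a cheap, text-only relevance check that gates whether a document's outlinks are followed.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CrawlSpec:
    """Hypothetical crawl specification: one set of lowercase keywords per ETOE type."""
    entities: set = field(default_factory=set)
    topics: set = field(default_factory=set)
    opinions: set = field(default_factory=set)
    events: set = field(default_factory=set)

    def all_terms(self):
        # Union of every keyword in the specification.
        return self.entities | self.topics | self.opinions | self.events


def relevance(text, spec):
    """Fraction of crawl-spec terms that occur in the text (an assumed scoring rule)."""
    terms = spec.all_terms()
    if not terms:
        return 0.0
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(terms & tokens) / len(terms)


def outlinks_to_follow(text, outlinks, spec, threshold=0.5):
    """Return all outlinks if the document is relevant enough, otherwise none."""
    return list(outlinks) if relevance(text, spec) >= threshold else []


# Example with made-up data.
spec = CrawlSpec(entities={"olympics", "london"}, topics={"athletics"}, events={"final"})
page = "London hosts the athletics final this evening."
links = ["http://example.org/a", "http://example.org/b"]
print(outlinks_to_follow(page, links, spec))
```

Because the check only tokenises text and intersects two sets, it is cheap enough to run on every fetched document, which is the property the crawler needs; the more expensive multimedia analysis can then be applied outside the real-time crawl loop.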

Current Trends in Multimedia Analysis

The ARCOMEM Approach

Intelligently Harvesting and Sampling the Web

Use Cases for Multimedia Analysis in Archiving Community Memories

Aggregating Social Commentary

Measuring the Temporal Pulse of Social Multimedia

Recognising Social Events in Social Media Streams

Recognising People

Recognising Organisations

Recognising Places

Measuring Opinion and Sentiment

Conclusions and Outlook

Findings

ARCOMEM

Journal: Future Internet	Publication Date: Apr 24, 2014
Citations: 45	License type: CC BY 4.0
