Abstract

There is an abundance of semi-structured reports on events being written and made available on the World Wide Web on a daily basis. These reports are primarily meant for human use. A recent movement is the addition of RDF metadata to make automatic processing by computers easier. A fine example of this movement is the open government data initiative which, by representing data from spreadsheets and textual reports in RDF, strives to speed up the creation of geographical mashups and visual analytic applications. In this paper, we present a newly linked dataset and the method we used to automatically translate semi-structured reports on the Web to an RDF event model. We demonstrate how the semantic representation layer makes it possible to easily analyze and visualize the aggregated reports to answer domain questions through a SPARQL client for the R statistical programming language. We showcase our method on piracy attack reports issued by the International Chamber of Commerce (ICC-CCS). Our pipeline includes conversion of the reports to RDF, linking their parts to external resources from the linked open data cloud and exposing them to the Web.

Highlights

  • Governmental and commercial organisations collect a wealth of information, from census to trade data and from pollution to crime

  • We first present a new dataset on the Web of Data, Linked Open Piracy (LOP), which describes maritime piracy events, and detail its construction

  • We expose descriptions of piracy attacks at sea published to the Web by the International Chamber of Commerce’s International Maritime Bureau (ICC-CCS IMB) and the US National Geospatial-Intelligence Agency (NGA) as Linked Data RDF.

Summary

Introduction

Governmental and commercial organisations collect a wealth of information, from census to trade data and from pollution to crime. Like most open government data, such as the data processed into http://www.data.gov, the piracy reports are published in a human-readable format. The format and type of publication of the IMB piracy reports (a fixed pattern per year of publication, with daily updates to the web page) make them an ideal test case for automatic RDF event extraction; the topic of the reports is of contemporary socio-economic concern [3] and is related to research questions that go beyond what classic data mining can answer. The added benefit of using SEM as a model for Open Government Data is evaluated by answering complex domain questions derived from authorities in the domain of piracy analysis, UNITAR UNOSAT and the ICC-CCS IMB. A copy of the code discussed can be found online at http://www.few.vu.nl/~wrvhage/LOP/LOP_code_JoDS.zip
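To make the idea of translating a semi-structured report into an RDF event description more concrete, the following is a minimal sketch, not the code from the archive above. It assumes a report has already been parsed into a few fields and serialises it as N-Triples using terms we take to be from the Simple Event Model (sem:Event, sem:eventType, sem:hasPlace, sem:hasTimeStamp). The `Report` class, the `http://example.org/piracy/` namespace, the `slug` helper and the sample record are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass

# Namespace assumed for the Simple Event Model (SEM); treat it as an assumption.
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
# Hypothetical base namespace, not the one used by the actual dataset.
BASE = "http://example.org/piracy/"


@dataclass
class Report:
    """Fields assumed to have been extracted from one report (hypothetical)."""
    report_id: str
    date: str        # ISO 8601 date, e.g. "2010-01-05"
    place: str
    event_type: str


def slug(text):
    """Turn free text into a URI-friendly token: lowercase, spaces to underscores."""
    return "_".join(text.lower().split())


def to_ntriples(report):
    """Return the report as a list of N-Triples lines describing one event."""
    event = f"<{BASE}event/{slug(report.report_id)}>"
    return [
        f"{event} <{RDF_TYPE}> <{SEM}Event> .",
        f"{event} <{SEM}eventType> <{BASE}type/{slug(report.event_type)}> .",
        f"{event} <{SEM}hasPlace> <{BASE}place/{slug(report.place)}> .",
        f'{event} <{SEM}hasTimeStamp> "{report.date}"^^<{XSD_DATE}> .',
    ]


# Invented sample record, for illustration only.
example = Report("2010-001", "2010-01-05", "Gulf of Aden", "Hijacked")
triples = to_ntriples(example)
for line in triples:
    print(line)
```

Once events are expressed as triples like these and loaded into a triple store, they can be aggregated with SPARQL queries, which is the step the paper performs from R.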

ICC-CCS IMB Website
NGA WTS Reports
Mappings
Data Preparation
Results
Weapons Analysis
The R SPARQL Package
Rebuilding UNOSAT Reports
Visualizing IMB Highlights
Additional Questions
Related Work
Conclusions and Future Work