Abstract

In the area of Linked Open Data (LOD), meaningful and high-performance interlinking of different datasets remains an ongoing challenge. The necessary tasks are supported by established standards and software, e.g. for the transformation, storage, interlinking and publication of data. Our use case, Swissbib, is a well-known provider of bibliographic data in Switzerland, representing various libraries and library networks. This article presents a case study from the project linked.swissbib.ch, which focuses on preparing and publishing the Swissbib data as LOD. Data available in Marc21 XML is extracted from the Swissbib system and transformed into an RDF/XML representation. From the approximately 21 million monolithic records, the author information is extracted and interlinked with authority files from the Virtual International Authority File (VIAF) and DBpedia. The links are used to extract additional data from the counterpart corpora. The data is then pushed into an Elasticsearch index to make it accessible to other components. As a demonstrator, a search portal was developed that presents the additional data and the generated links to users. In addition, a REST interface was developed to enable access by other applications as well. A main obstacle in this project is the amount of data and the need for day-to-day (partial) updates. In the current situation, the data in Swissbib and in the external corpora is too large to be processed by established linking tools: the resulting memory footprint prevents these tools from functioning correctly. Triple stores are likewise unwieldy, incurring massive overhead for import and update operations. Hence, we developed procedures for extracting and shaping the data into a more suitable form, e.g. data is reduced to the necessary properties and blocked. For this purpose, we used sorted N-Triples as an intermediate data format.
Our preliminary results show this method to be very promising: our approach established 30,773 links to DBpedia and 20,714 links to VIAF; both link sets show high precision and could be generated with a reasonable expenditure of time.
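The sorted-N-Triples approach described above can be illustrated as a streaming merge-join: once both corpora are reduced to (blocking key, entity URI) pairs and sorted, candidate links can be found in a single pass with constant memory, avoiding the footprint problems of in-memory linking tools. The sketch below is illustrative only; the blocking keys and identifiers are invented, and the project's actual tooling is not shown.

```python
def merge_join(left, right):
    """Stream-join two *sorted* lists of (blocking_key, uri) pairs.

    Because both inputs are sorted by key, matching needs only one
    sequential pass -- the idea behind sorting the reduced N-Triples
    instead of loading full RDF dumps into memory.
    """
    i = j = 0
    links = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit the cross product of all pairs sharing this key
            # (one "block"), then advance past the block on both sides.
            i2 = i
            while i2 < len(left) and left[i2][0] == lk:
                j2 = j
                while j2 < len(right) and right[j2][0] == lk:
                    links.append((left[i2][1], right[j2][1]))
                    j2 += 1
                i2 += 1
            i, j = i2, j2
    return links

# Hypothetical reduced data: normalized author names as blocking keys.
swissbib = sorted([("einstein albert", "sb:123"), ("curie marie", "sb:456")])
viaf = sorted([("curie marie", "viaf:789"), ("einstein albert", "viaf:75121530"),
               ("turing alan", "viaf:41887917")])

links = merge_join(swissbib, viaf)
```

In practice the sorted pair files would be streamed from disk rather than held in lists, which is what keeps the memory footprint flat regardless of corpus size.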

Highlights

  • Linked Open Data (LOD) have been an issue for several years and organizations from all over the world are making their data available to the public by means of LOD

  • We focus on linking the person data with the Virtual International Authority File (VIAF) and DBpedia

  • We work with the RDF dumps offered by DBpedia and VIAF, which we use to enrich our data with links and to include some of the information identified while linking to them



Introduction

Linked Open Data (LOD) have been an issue for several years, and organizations from all over the world are making their data available to the public by means of LOD. This issue has gained a certain importance within libraries (Pohl, 2010; Baker et al, 2011) and other cultural heritage institutions (Mayr et al, 2016). In 2014, the LOD cloud contained 570 interlinked corpora. Smith-Yoshimura published a survey analyzing the project activities of 112 linked data projects (Smith-Yoshimura, 2016). The analysis emphasizes some of the recurring issues library projects face.


