Abstract

The Dutch Historical Censuses (1795–1971) contain statistics that describe almost two centuries of history in the Netherlands. These censuses were conducted once every 10 years (with some exceptions) from 1795 to 1971. Researchers have used their wealth of demographic, occupational, and housing information to answer fundamental questions in social and economic history. However, accessing these data has traditionally been a time-consuming and knowledge-intensive task. In this paper, we describe the outcomes of the CEDAR project, which make access to the digitized assets of the Dutch Historical Censuses easier, faster, and more reliable. This is achieved by using the data publishing paradigm of Linked Data from the Semantic Web. We use a digitized sample of 2,288 census tables to produce a linked dataset of more than 6.8 million statistical observations. The dataset is modeled using the RDF Data Cube, Open Annotation, and PROV vocabularies. The contributions of representing this dataset as Linked Data are: (1) a uniform database interface for efficient querying of census data; (2) a standardized and reproducible data harmonization workflow; and (3) an augmentation of the dataset through richer connections to related resources on the Web.

Highlights

  • The Dutch historical censuses were conducted 17 times from 1795 until 1971, once every 10 years

  • The data collected in the 1795–1971 period is of special interest to historians and social scientists because of three facts: (1) it is based on counting the whole Dutch population, instead of sampling; (2) it provides an unprecedented level of detail, hardly comparable to modern censuses due to privacy regulations; and (3) the survey microdata from which the aggregations were originally built is almost entirely lost

  • The Dutch historical census dataset has a history of its own: many people have devoted lifelong efforts to improving access to the most important collection of historical statistics about the past of the Netherlands

Summary

Introduction

The Dutch historical censuses were conducted 17 times from 1795 until 1971, once every 10 years. They have since been digitized as 300,000 scanned images in various projects between the CBS, the IISH and several institutes of the Royal Netherlands Academy of Arts and Sciences (KNAW), such as Data Archiving and Networked Services (DANS) and the Netherlands Interdisciplinary Demographic Institute (NIDI). These projects have translated part of these scans, by manual input, into more structured formats, resulting in a collection of 507 machine-readable Excel spreadsheets containing 2,288 census tables. The CEDAR RDF Database contains the final database of the harmonized Dutch historical censuses, encoded using the Resource Description Framework (RDF) in two variants: a complete RDF conversion of the 2,288 tables, partially harmonized, of the 1795–1971 period (the “CEDAR” variant); and a partial RDF conversion of 140 tables, fully harmonized, of the 1859–1920 period (the “CEDAR-mini” variant). This database, its variants, and their associated documentation and metadata are available online in several forms: through a SPARQL endpoint, as a deposit in the DANS archiving system EASY, and as a website.

Research Data Journal for the Humanities and Social Sciences (2018) 1–24
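
To make the RDF encoding mentioned in the introduction more concrete, the following minimal sketch shows how a single census count could be written as one RDF Data Cube observation in N-Triples. Only the `qb:` (Data Cube), `rdf:` and `xsd:` namespaces are real; the `http://example.org/census/` namespace, the dimension names, the identifiers and the population figure are assumptions made up for illustration and are not taken from the actual CEDAR dataset.

```python
# Real, standard namespaces.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical namespace: NOT the namespace used by the CEDAR project.
EX = "http://example.org/census/"


def observation_triples(obs_id, dataset_id, dimensions, population):
    """Return N-Triples lines describing one statistical observation.

    `dimensions` maps a (hypothetical) dimension name to a code value,
    e.g. {"sex": "female"}; `population` is the counted number of people.
    """
    subject = f"<{EX}observation/{obs_id}>"
    triples = [
        # The resource is a qb:Observation that belongs to a dataset.
        f"{subject} <{RDF_TYPE}> <{QB}Observation> .",
        f"{subject} <{QB}dataSet> <{EX}dataset/{dataset_id}> .",
    ]
    # One triple per dimension, in a stable (sorted) order.
    for name, value in sorted(dimensions.items()):
        triples.append(
            f"{subject} <{EX}dimension/{name}> <{EX}code/{name}/{value}> ."
        )
    # The measured value as a typed integer literal.
    triples.append(
        f'{subject} <{EX}measure/population> "{population}"^^<{XSD_INTEGER}> .'
    )
    return triples


# Illustrative values only; this is not a real census figure.
example = observation_triples(
    obs_id="example-1",
    dataset_id="example-table",
    dimensions={"municipality": "amsterdam", "sex": "female"},
    population=1234,
)
for line in example:
    print(line)
```

Modeling each table cell as a separate observation with explicit dimensions is what lets a single query interface range over tables that originally had very different layouts.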
