Abstract

Record linkage is the process of identifying and linking records that refer to the same entities across several databases. In this paper we integrate three historical data sources to study the Canadian soldiers and casualties of World War I: records of soldiers who served in the Canadian Expeditionary Force (CEF), records of CEF casualties, and the Canadian census of 1901. We link the soldiers dataset to the casualties dataset to identify the soldiers who died in the war. In addition, we link the soldiers dataset to the 1901 census to enrich the available attributes. The goal is to generate longitudinal data about the Canadian soldiers that would allow researchers to perform a systematic analysis of who lived and who died. The imprecision of historical data, together with the unavailability of expert links and the limited number of attributes, makes the linkage process challenging. We present a methodology for integrating the three data sources and a preliminary analysis of the resulting longitudinal data.

