Abstract

Record linkage is the process of identifying records that correspond to the same entity across datasets. Linking historical data allows researchers to better characterize topics such as population mobility, the impacts of local and national events, and generational change. Most record linkage algorithms rely on string similarity (e.g., edit distance between names); however, some expected changes are not captured by standard text similarity metrics (e.g., name changes after marriage). The recently available 1901 and 1911 national census records for Ireland have limited, non-standardized fields and contain the typical errors associated with digitizing and formatting handwritten records. These issues, coupled with the high frequency of common names, are among the reasons traditional methods struggle. Such methods often consider only pairwise information and do not incorporate household or relationship information across records (e.g., parents, siblings). The original census records, however, correspond to households, which allows us to explore incorporating this additional structure into traditional record linkage methods. In this paper, we describe an initial labeling procedure for a subset of County Carlow, Ireland, and compare approaches for including household information in both supervised and unsupervised record linkage techniques.
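
To illustrate why string similarity alone can miss true matches, the following minimal sketch computes a normalized edit-distance similarity for two name pairs. It is not the paper's method: the function names and the example surnames are hypothetical and are not drawn from the census data. It assumes a plain Levenshtein distance normalized by the longer string's length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical surname pairs (illustrative only, not census records).
typo_pair = ("Murphy", "Murphey")    # transcription variant of one name
marriage_pair = ("Murphy", "Byrne")  # surname changed after marriage

print(similarity(*typo_pair))
print(similarity(*marriage_pair))
```

The transcription variant scores high while the married-name pair scores low even though both pairs could refer to the same person, which is the kind of case where household or relationship information may help.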
