Evaluation of a Binary Semi-supervised Classification Technique for Probabilistic Record Linkage.

J Stausberg, D Nasseh

doi:10.3414/me14-01-0087

Abstract

Record linkage is the process of merging data from different data sources. In a medical environment, stricter privacy requirements demand that clear-text attributes such as first name or date of birth be transformed into one-way encrypted pseudonyms. Automated or privacy-preserving record linkage may require a binary classification that decides whether two records refer to the same entity. Classification is the last of the four main phases of the record linkage process: preprocessing, indexing, matching, and classification. How the choice of binary classification technique depends on project specifications, in particular data quality, has not yet been studied extensively. The aim of this work is to introduce and evaluate an automatable semi-supervised binary classification system for record linkage that can compete with or even surpass advanced automated techniques from the domain of unsupervised classification. This work describes the rationale leading to the model, its final implementation, and a comparison of its classification performance with that of an advanced active learning approach from the domain of unsupervised learning. The performance of both systems was measured on a broad variety of artificial test sets (n = 400), based on real patient data, each with distinct and unique characteristics. Classification performance, measured as F-measure, was close for both methods on test sets with the maximum defined data quality (0.996 for semi-supervised classification, 0.993 for unsupervised classification), but diverged progressively as data quality decreased, dropping to 0.964 for semi-supervised classification and 0.803 for unsupervised classification.
Aside from supplying a viable model for semi-supervised classification in automated probabilistic record linkage, the tests conducted on a large number of test sets suggest that semi-supervised techniques might generally be capable of outperforming unsupervised techniques, especially on data of lower quality.
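
The abstract reports classification performance as F-measure without stating the weighting. As a minimal sketch, assuming the balanced F1 variant (the harmonic mean of precision and recall), the score for a binary match/non-match classification could be computed as below. The function name and the counts are hypothetical illustrations and are not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the balanced variant is meant; the abstract does not say
    which weighting the authors used.
    """
    if true_positives == 0:
        # No correctly classified matches: define the score as 0
        # to avoid dividing by zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for record pairs classified as "same entity".
score = f_measure(true_positives=90, false_positives=5, false_negatives=10)
print(round(score, 3))
```

Because the measure ignores true negatives, it suits record linkage, where the vast majority of candidate record pairs are non-matches.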

Journal: Methods of Information in Medicine
Publication Date: Jan 1, 2016
Citations: 3

