Abstract

Introduction
The Tasmanian Data Linkage Unit (TDLU) undertook a complex data linkage project in 2019, linking public and private pathology data to five disparate health datasets. Having linked pathology data previously, the unit was aware of the challenges of linking a large dataset covering a fourteen-year time span. The aim of this study was to use data linkage to develop a Tasmanian dataset to quantify the burden and distribution of chronic kidney disease, including identifying barriers to dialysis treatment services.

Objectives and Approach
A cohort was selected from public and private providers of pathology services in Tasmania from 2004 to 2017 to support the establishment of a comprehensive researchable dataset. A linkage plan was developed that included detailed processes for cleaning and de-duplicating the pathology data prior to linkage. The larger private pathology dataset comprised 3.9 million records, to which data cleaning strategies were applied. De-duplication generated an extensive clerical review workload, and methods to reduce this work were devised and implemented as part of the linkage process.

Results
De-duplication based on exact matches reduced the dataset from 3.9 million records to just over 520,000 individuals. Internal linkage of the dataset resulted in approximately 47,000 ‘groups’ eligible for review. Structured Query Language (SQL) queries were constructed to reduce this workload, and the number of groups eligible for review decreased by 42%. Further analysis determined an appropriate ‘cut-off’ threshold for clerical review, and an estimate of the false positive links remaining was calculated.

Conclusion / Implications
Methods of reducing the amount of manual clerical review can be incorporated into a linkage design when there is a thorough understanding of the characteristics and content of the dataset to be linked. The methods used for this linkage project will be utilised for future projects using pathology data.
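
The two workload-reduction ideas described above, exact-match de-duplication followed by a cut-off that limits which linked groups go to clerical review, can be illustrated with a minimal sketch. This is not the TDLU's implementation: the abstract does not list the matching fields, the scoring method, or the threshold value, so the field names (`surname`, `given`, `dob`, `sex`), the `min_score` values, and the `cutoff` of 0.9 are all hypothetical. The sketch also assumes that groups scoring at or above the cut-off are accepted without review, which the abstract does not state.

```python
# Illustrative sketch only; all field names, scores and the cut-off are assumptions.

# Hypothetical demographic fields used as the exact-match key.
KEY_FIELDS = ("surname", "given", "dob", "sex")

records = [
    {"id": 1, "surname": "SMITH", "given": "JOHN", "dob": "1950-01-02", "sex": "M"},
    {"id": 2, "surname": "SMITH", "given": "JOHN", "dob": "1950-01-02", "sex": "M"},
    {"id": 3, "surname": "SMYTH", "given": "JOHN", "dob": "1950-01-02", "sex": "M"},
    {"id": 4, "surname": "NGUYEN", "given": "ANH", "dob": "1972-11-30", "sex": "F"},
]


def dedupe_exact(rows):
    """Collapse rows whose key fields match exactly, keeping the first one seen."""
    first_seen = {}
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        first_seen.setdefault(key, row)  # later exact duplicates are ignored
    return list(first_seen.values())


def split_for_review(groups, cutoff):
    """Separate linked groups into auto-accepted and clerical-review lists.

    Assumption: each group carries the lowest pairwise match score among its
    members, and groups at or above the cut-off need no manual review.
    """
    accepted = [g for g in groups if g["min_score"] >= cutoff]
    to_review = [g for g in groups if g["min_score"] < cutoff]
    return accepted, to_review


unique_records = dedupe_exact(records)

# Hypothetical output of an internal (within-dataset) probabilistic linkage.
groups = [
    {"group_id": "A", "members": [1, 3], "min_score": 0.82},
    {"group_id": "B", "members": [4], "min_score": 0.97},
]
accepted, to_review = split_for_review(groups, cutoff=0.9)
```

In this toy data, records 1 and 2 collapse into one individual, while the near-match `SMYTH` record survives exact de-duplication and is left for the internal linkage step, where the low-scoring group `A` is routed to clerical review.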

