Abstract
Objective
An innovative large-scale automated method has been developed to produce de-identified linkable data. The objective is to create a wide pool of ready-to-use data to enable faster and wider collaborative analysis for the public good.

Approach
A configurable automated pipeline prepares data for onward linkage at location, person, business and classification level through:
- Big data profiling pre- and post-processing, for overview of variables and characteristics
- Flagging potentially sensitive/identifiable variables
- Generalisable linkage methods for large-scale data, to enable the addition of unique IDs for onward linkage of de-identified data
- De-identification, hashing and redaction mechanisms, to remove and/or obscure sensitive/identifiable variables
- Automated production of metadata, capturing linkage quality and transformations across the data journey
- Quality assurance checks, including measure of linkage quality, assurance of variable derivations and redactions, and consistency checks on remaining data.

Results
The pipeline enables a configurable automated approach to producing de-identified, linkable, ready-to-use data in a traceable and fully documented manner. To achieve this we have: overcome scalability issues in working with big data; implemented automation at various levels; and enabled standardisation across data types and data structures to deliver a consistent recognisable final product.

Conclusions and Implications
Building an automated pipeline to enable onward linkage of de-identified administrative data was a complex process that has resulted in positive change around how our organisation operates and distributes data. This represents an important step towards future integration of linkage within the platform and the basis of future innovation in the area.
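
As an illustration of the de-identification, hashing and redaction step listed in the approach, the following is a minimal sketch only. The abstract does not say which hashing scheme, column names or configuration format the pipeline uses, so everything here is an assumption: the keyed HMAC-SHA256 hash, the `SECRET_KEY` value, the `HASH_COLUMNS` and `REDACT_COLUMNS` settings, and the `hash_value` and `deidentify` helpers are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key (assumption); a real key would be stored securely
# and never alongside the data.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical configuration (assumption): columns to hash so they can still
# act as linkage keys, and columns to redact (drop) entirely.
HASH_COLUMNS = {"person_id"}
REDACT_COLUMNS = {"name", "address"}


def hash_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 hex digest of a normalised identifier."""
    # Normalise first so trivially different spellings hash to the same value.
    normalised = value.strip().upper()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict) -> dict:
    """Hash linkage keys, drop redacted columns, keep everything else."""
    out = {}
    for column, value in record.items():
        if column in REDACT_COLUMNS:
            continue  # redaction: the variable is removed from the output
        if column in HASH_COLUMNS:
            out[column + "_hashed"] = hash_value(str(value))
        else:
            out[column] = value
    return out


if __name__ == "__main__":
    a = deidentify({"person_id": " ab123 ", "name": "A. Person", "region": "North"})
    b = deidentify({"person_id": "AB123", "name": "Another Person", "region": "South"})
    # Same underlying identifier, so the hashed keys match and records can be linked.
    print(a["person_id_hashed"] == b["person_id_hashed"])
```

In this sketch, two records that share an identifier receive the same hashed key when the same secret is used, so they remain linkable after the directly identifying columns have been dropped. A keyed hash is chosen here over a plain hash because an unkeyed digest of a low-entropy identifier can be reversed by enumeration; whether the pipeline described in the abstract takes this approach is not stated.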