Abstract

Objectives: To determine the risk of misidentification when using a “Hidden In Plain Sight (HIPS)” Named Entity Recognition (NER) de-identification methodology applied to Scottish healthcare data within The Industrial Centre for Artificial Intelligence Research in Digital Diagnostics (iCAIRD) Safe Haven Artificial Intelligence Platform (SHAIP).
Approach: Rather than the traditional redaction of potentially identifiable information in routinely collected healthcare data, our HIPS methodology uses an NER “find and replace” approach to de-identification that keeps the structure of the text intact. This preserves context, which is key to interpreting free-text information and to potential Artificial Intelligence applications.
To our knowledge, these methods have not previously been tested on Scottish healthcare data. We therefore assessed the potential risk of misidentification when using HIPS on structured Scottish data deployed in SHAIP as part of the iCAIRD programme.
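The contrast between redaction and a HIPS-style “find and replace” can be illustrated with a minimal sketch. It is not the iCAIRD implementation: it assumes that an NER model (not shown) has already produced character-offset spans, and the `Entity` type, the `SURROGATES` pools, and the function names are all hypothetical.

```python
import random
from typing import NamedTuple


class Entity(NamedTuple):
    start: int  # inclusive character offset of the detected span
    end: int    # exclusive character offset of the detected span
    label: str  # entity type, e.g. "FORENAME" or "SURNAME" (assumed labels)


# Hypothetical surrogate pools; a real system would need far larger lists.
SURROGATES = {
    "FORENAME": ["Morag", "Callum", "Eilidh"],
    "SURNAME": ["Fraser", "Munro", "Sinclair"],
}


def redact(text: str, entities: list[Entity]) -> str:
    """Traditional redaction: replace each span with a visible placeholder."""
    out, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e.start):
        out.append(text[cursor:ent.start])
        out.append(f"[{ent.label}]")
        cursor = ent.end
    out.append(text[cursor:])
    return "".join(out)


def hips_replace(text: str, entities: list[Entity], rng: random.Random) -> str:
    """HIPS-style replacement: swap each span for a surrogate of the same type."""
    out, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e.start):
        out.append(text[cursor:ent.start])
        pool = SURROGATES.get(ent.label)
        # Spans with no surrogate pool are left unchanged in this sketch.
        out.append(rng.choice(pool) if pool else text[ent.start:ent.end])
        cursor = ent.end
    out.append(text[cursor:])
    return "".join(out)


note = "Seen by Dr Jane Smith today."
spans = [Entity(11, 15, "FORENAME"), Entity(16, 21, "SURNAME")]
print(redact(note, spans))
print(hips_replace(note, spans, random.Random(0)))
```

Because the surrogate text reads like ordinary prose, the sentence structure survives, and any real name that the NER step misses is not visibly distinguishable from the surrogates around it.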
Results: Five individual cohorts, with a total of 169,964 patients, were included. For each cohort, the HIPS approach was applied and its output was then compared with actual patient information from within the same region to determine the risk of misidentification. The following fields were included: Forename, Surname, Previous Name, Gender, Date of Birth (DOB), and Postcode.
Across the five cohorts and the varying combinations of identifiable data fields, there were a total of 94 instances of potential misidentification (0.06%). Of these, 85 (90.4%) were for the combination of Gender, Date of Birth and Postcode. Amongst the 169,964 patients, there were only 3 instances (0.002%) of potential misidentification on Forename/Surname/DOB and 5 instances (0.003%) on Forename/Surname/Postcode.
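One way to picture the comparison is as a count of surrogate records that coincide with a real record on a chosen combination of fields. The sketch below is an illustration only, not the authors' procedure: it assumes exact string matching, hypothetical field names, and toy records. It also recomputes the reported percentages from the reported counts.

```python
from typing import Iterable, Mapping, Sequence

Record = Mapping[str, str]


def count_collisions(
    surrogates: Iterable[Record],
    real: Iterable[Record],
    fields: Sequence[str],
) -> int:
    """Count surrogate records that match some real record on every field."""
    real_keys = {tuple(r[f] for f in fields) for r in real}
    return sum(tuple(s[f] for f in fields) in real_keys for s in surrogates)


def pct(count: int, total: int, digits: int) -> float:
    """Express count/total as a percentage rounded to `digits` places."""
    return round(100 * count / total, digits)


# Toy records with invented values (hypothetical field names).
real_population = [
    {"gender": "F", "dob": "1950-01-01", "postcode": "ZZ1 1AA"},
    {"gender": "M", "dob": "1962-07-15", "postcode": "ZZ2 2BB"},
]
surrogate_records = [
    {"gender": "F", "dob": "1950-01-01", "postcode": "ZZ1 1AA"},  # collides
    {"gender": "M", "dob": "1962-07-15", "postcode": "ZZ9 9ZZ"},  # does not
]
print(count_collisions(surrogate_records, real_population,
                       ["gender", "dob", "postcode"]))

# Rates recomputed from the counts reported in the abstract.
TOTAL_PATIENTS = 169_964
print(pct(94, TOTAL_PATIENTS, 2))  # all potential misidentifications
print(pct(85, 94, 1))              # share that were Gender/DOB/Postcode
print(pct(3, TOTAL_PATIENTS, 3))   # Forename/Surname/DOB
print(pct(5, TOTAL_PATIENTS, 3))   # Forename/Surname/Postcode
```

The recomputed values agree with the figures quoted above (0.06%, 90.4%, 0.002% and 0.003%).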
Conclusions: The iCAIRD NER HIPS methodology provides an acceptably low misidentification rate. Further work is now required to determine its recall and precision. Benefits of this approach include retaining the structure of free text and making any leaked identifiable data harder to detect.

