Abstract

In recent years, sharing electronic medical records (EMRs) with researchers outside the institutions that hold them has become increasingly important. To preserve the privacy of both the patients and the institutions involved, the EMRs to be shared must first be de-identified. Although de-identification has been studied worldwide with positive outcomes, notably in the i2b2 (Informatics for Integrating Biology and the Bedside) shared tasks of 2006 and 2014, it is not yet a solved problem and still requires further investigation. In this paper, we propose an automatic de-identification solution based on a multilevel hybrid semi-supervised learning paradigm, with a key focus on correctly identifying protected health information (PHI) in EMRs. Like existing work, our approach is hybrid: it combines a machine learning-based method using a conditional random fields (CRF) model with a rule-based method in a post-processing phase that handles PHI types requiring disambiguation. However, our approach is more general and practical. First, it takes the structural complexity of each EMR into account, so that each section is treated appropriately for its structure (structured, semi-structured, or unstructured), which leads to more accurate PHI identification. Second, each EMR is examined at three levels of granularity: the token level in the supervised learning phase, the entity level in the rule-based post-processing phase, and the section level, together with its structural complexity, in the semi-supervised learning phase. Examining each EMR at these different levels of detail gives our approach a deeper view of each record and makes it more effective. Third, our solution operates in a self-training manner, so it can start from a small annotated data set and become more effective as new EMRs arrive over time.
When evaluated on the i2b2 data set and compared with related work, our solution achieves better F-measure values for the AGE, LOCATION, and PHONE PHI types and comparable values for the other PHI types.
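The self-training loop and the token/entity split described above can be illustrated with a minimal sketch. This is not the authors' implementation: a toy majority-label tagger stands in for the CRF model, and the two regular-expression rules (PHONE and AGE), the confidence threshold, the fixed confidence for unseen tokens, and all names (`MajorityTagger`, `post_process`, `self_train`) are hypothetical. Section-level handling of structural complexity is omitted.

```python
import re
from collections import Counter, defaultdict

# Hypothetical entity-level rules applied after the statistical (token-level) tagger.
PHONE_RE = re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.]\d{4}$")
AGE_RE = re.compile(r"^\d{1,3}$")
AGE_CUES = {"yo", "y/o", "year-old", "years-old"}
UNSEEN_CONFIDENCE = 0.5  # assumed, arbitrary confidence for tokens never seen in training


class MajorityTagger:
    """Toy stand-in for the CRF: tags each token with its most frequent training label."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, sentences):
        self.counts = defaultdict(Counter)
        for sentence in sentences:
            for token, label in sentence:
                self.counts[token.lower()][label] += 1
        return self

    def predict(self, tokens):
        """Return one (label, confidence) pair per token; unseen tokens default to "O"."""
        out = []
        for token in tokens:
            counts = self.counts.get(token.lower())
            if counts:
                label, n = counts.most_common(1)[0]
                out.append((label, n / sum(counts.values())))
            else:
                out.append(("O", UNSEEN_CONFIDENCE))
        return out


def post_process(tokens, predictions):
    """Override token predictions where a rule fires; rule hits get confidence 1.0."""
    result = list(predictions)
    for i, token in enumerate(tokens):
        if PHONE_RE.match(token):
            result[i] = ("PHONE", 1.0)
        elif AGE_RE.match(token) and i + 1 < len(tokens) and tokens[i + 1].lower() in AGE_CUES:
            result[i] = ("AGE", 1.0)
    return result


def self_train(labeled, unlabeled, threshold=0.9, rounds=3):
    """Add confidently tagged sentences to the training set and retrain each round."""
    train, pool = list(labeled), list(unlabeled)
    model = MajorityTagger().fit(train)
    for _ in range(rounds):
        unsure = []
        for tokens in pool:
            tagged = post_process(tokens, model.predict(tokens))
            if min((conf for _, conf in tagged), default=0.0) >= threshold:
                train.append([(tok, label) for tok, (label, _) in zip(tokens, tagged)])
            else:
                unsure.append(tokens)
        if len(unsure) == len(pool):
            break  # no sentence passed the threshold, so retraining would change nothing
        pool = unsure
        model = MajorityTagger().fit(train)
    return model, train, pool


# Usage: a tiny annotated seed set plus a few unannotated sentences.
labeled = [
    [("Call", "O"), ("Dr.", "O"), ("Smith", "DOCTOR")],
    [("Patient", "O"), ("is", "O"), ("67", "AGE"), ("yo", "O")],
]
unlabeled = [
    ["Call", "Dr.", "Smith", "617-555-0123"],
    ["Patient", "is", "45", "yo"],
    ["Seen", "by", "Jones"],
]
model, train, pool = self_train(labeled, unlabeled)
```

In this toy run, the phone number and the age "45" never appear in the seed set, but the rules label them confidently, so both sentences join the training set and the retrained tagger now recognizes that phone number directly. "Seen by Jones" stays in the pool because every token is unseen and no rule fires, which mirrors how a self-training system defers records it cannot yet tag reliably.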