Automatic Detecting Documents Containing Personal Health Information

Yunli Wang, Matthew S Keays, Hongyu Liu, Liqiang Geng, Yonghua You

doi:10.1007/978-3-642-02976-9_46

Abstract

With the increasing use of computers and the Internet, personal health information (PHI) is distributed across multiple institutions, often scattered over multiple devices and stored in diverse formats. Non-traditional medical records such as emails and e-documents containing PHI are at high risk of privacy leakage. Locating and managing PHI in such a distributed environment is therefore a challenge. The goal of this study is to classify electronic documents as PHI or non-PHI. A supervised machine learning method was used for this text categorization task. Three classifiers (SVM, decision tree, and Naive Bayes) were tested on three data sets. Lexical, semantic, and syntactic features and their combinations were compared in terms of their effectiveness in classifying PHI documents. The results show that combining semantic and/or syntactic features with lexical features is more effective than using lexical features alone for PHI classification. The supervised machine learning method is effective in classifying documents as PHI or non-PHI.
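
To make the classification task concrete, the following is a minimal sketch of one of the three classifier families named in the abstract, a Naive Bayes model over lexical (bag-of-words) features. It is not the authors' implementation: the class name `NaiveBayesText`, the `tokenize` helper, and the six training sentences are all invented for illustration, and the semantic and syntactic features compared in the study, as well as the SVM and decision tree classifiers, are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens (lexical features)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Hypothetical multinomial Naive Bayes over word counts, add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary is the union of all words seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(doc):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented toy training documents, for illustration only (not the study's data).
train_docs = [
    "Patient John was diagnosed with diabetes and prescribed insulin",
    "Your blood test results show elevated cholesterol, please see the doctor",
    "Mary has an appointment for her asthma treatment on Monday",
    "The quarterly sales meeting is moved to Friday afternoon",
    "Please review the attached budget report before the conference",
    "The software release is scheduled for next week",
]
train_labels = ["PHI", "PHI", "PHI", "non-PHI", "non-PHI", "non-PHI"]

model = NaiveBayesText().fit(train_docs, train_labels)
prediction_phi = model.predict("The doctor prescribed insulin for the patient")
prediction_other = model.predict("The budget meeting is on Friday")
print(prediction_phi, prediction_other)
```

With only lexical features, the decision rests entirely on which words co-occur with each label in the training set. The abstract reports that adding semantic and/or syntactic features improves on this lexical-only baseline.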
