
Abstract

Vast numbers of medical documents contain rich knowledge that can facilitate a broad range of medical research and clinical studies. One important application is the automatic categorization of medical documents into specific categories. However, these documents usually contain the names and identities of patients and doctors, which may not be disclosed because of patient privacy and regulations concerning medical data. In this article, we address two issues: automatic name entity detection and automatic classification of medical reports. We present a name entity recognition system, MD_NER_NCL, and a text document classification system, C_IME_RPT, for medical report processing and categorization. MD_NER_NCL contains an innovative segmentation algorithm, called HBE segmentation, that segments a medical text document into Heading, Body and Ending parts, and a statistical reasoning process that utilizes knowledge of three entity lists: a people name prefix list, a people name suffix list, and a false positive prefix list. C_IME_RPT is based on Self Organizing Maps (SOM) and a machine learning process. Both systems have been evaluated using Independent Medical Examination (IME) reports provided by medical professionals. MD_NER_NCL significantly improved on the well-known text analysis software OpenNLP in people name entity detection. C_IME_RPT attained an 89.9% classification accuracy, which is very good for clinical record classification. We also present an in-depth empirical study of the parameters associated with the SOM learning process and text mining, and of their effects on classification results.

Keywords: Vector Space Model; Name Entity Recognition; Entity Recognition; Medical Document; Training Document

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
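As a rough illustration of how the three entity lists named in the abstract might be used together, the following minimal sketch flags single capitalized tokens that follow a name prefix or precede a name suffix, and skips tokens that follow a false positive prefix. It is not the MD_NER_NCL method: the abstract describes a statistical reasoning process and HBE segmentation, neither of which is modeled here, and the list contents and the function name `find_name_candidates` are hypothetical.

```python
import re

# Hypothetical example lists; the abstract does not give the actual list contents.
NAME_PREFIXES = {"dr", "mr", "mrs", "ms"}
NAME_SUFFIXES = {"md", "phd", "jr"}
FALSE_POSITIVE_PREFIXES = {"st", "mt"}  # e.g. "St. Mary Hospital" is a place, not a person

TOKEN = re.compile(r"[A-Za-z]+")


def find_name_candidates(text):
    """Return capitalized tokens that follow a name prefix or precede a name suffix.

    A simple rule-based stand-in for illustration only; it handles
    single-token names and ignores any statistical weighting.
    """
    tokens = TOKEN.findall(text)
    titles = NAME_PREFIXES | NAME_SUFFIXES
    found = []
    for i, tok in enumerate(tokens):
        # Only capitalized tokens that are not themselves titles can be names.
        if not tok[0].isupper() or tok.lower() in titles:
            continue
        prev_tok = tokens[i - 1].lower() if i > 0 else ""
        next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        # A false positive prefix vetoes the token regardless of other evidence.
        if prev_tok in FALSE_POSITIVE_PREFIXES:
            continue
        if prev_tok in NAME_PREFIXES or next_tok in NAME_SUFFIXES:
            found.append(tok)
    return found


if __name__ == "__main__":
    sample = (
        "Dr. Smith examined the patient at St. Mary Hospital. "
        "Report signed by Jones, MD."
    )
    print(find_name_candidates(sample))
```

In this sketch the false positive list acts as a hard veto. A statistical approach such as the one the abstract mentions would presumably weigh such evidence rather than apply it as a fixed rule.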
