Abstract

Text classification is the process of assigning pre-defined category labels to documents based on what a classifier has learned from training examples. This paper investigates the partially supervised classification approach in the medical field. The approaches evaluated include Rocchio, Naïve Bayesian (NB), Spy, Support Vector Machine (SVM), and Expectation Maximization (EM), as well as combinations of these methods. The experimental results showed that, with a small set of training samples, the combinations that use EM in step 2 always produce better results than those that use SVM. We also found that reducing the features based on tf-idf values decreases the classification performance dramatically. In contrast, reducing the features based on their frequencies improves the classification performance significantly while also increasing efficiency, although it may require some experimentation.
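As a rough illustration of reducing features based on their frequencies, the following minimal Python sketch drops terms that appear in too few documents. The abstract does not say how frequency is measured or which cut-off was used, so the use of document frequency, the `min_df` threshold, the function names, and the toy reports are all assumptions made for this example.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def prune_by_frequency(docs, min_df=2):
    """Keep only terms found in at least min_df documents.

    min_df is a hypothetical threshold; the abstract notes that choosing
    it may require some experimentation.
    """
    df = document_frequencies(docs)
    vocab = {term for term, count in df.items() if count >= min_df}
    pruned = [[t for t in tokens if t in vocab] for tokens in docs]
    return pruned, vocab


# Toy, made-up report snippets (not data from the paper).
docs = [
    "colonoscopy completed to caecum".split(),
    "colonoscopy incomplete poor preparation".split(),
    "polyp removed colonoscopy completed".split(),
]
pruned, vocab = prune_by_frequency(docs, min_df=2)
print(sorted(vocab))
print(pruned)
```

Raising `min_df` shrinks the vocabulary further, which speeds up training but risks discarding rare terms that are informative.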

Highlights

  • Classification is a form of data analysis that extracts models describing important data classes [10]

  • Comparing the classification performance obtained by ROC, Naïve Bayesian (NB), and Spy when the Support Vector Machine (SVM) method is used for step two, we found that ROC achieved the best results in terms of accuracy and F-measure regardless of the number of training samples used, followed by Spy

  • The considerable number of strong features is evident in the classification accuracy obtained by ROC-EM (95.9%), for example, which is very competitive with those obtained by ROC-EM and S-EM (95.89% and 95.89% respectively), and in the excellent F-measure obtained by ROC-EM and S-EM (93.39% and 91.29% respectively), which is much better than the values obtained by the same techniques in Table 8 (85.90 and 85.88 respectively)
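The highlights report both accuracy and F-measure, and the two can diverge when one class is much larger than the other. The sketch below computes both from confusion-matrix counts. It assumes the balanced F-measure (F1), which the source does not state explicitly, and the counts are invented for illustration rather than taken from the paper.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all documents that were labelled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 55 positive documents out of 1000.
tp, fp, fn, tn = 40, 5, 15, 940
acc = accuracy(tp, fp, fn, tn)
f1 = f_measure(tp, fp, fn)
print(f"accuracy = {acc:.3f}, F-measure = {f1:.3f}")
```

With these counts the accuracy is dominated by the large negative class, while the F-measure reflects only how well the positive class is recovered, which is why the two figures are reported side by side.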


Summary

Introduction

Classification is a form of data analysis that extracts models describing important data classes [10]. The extracted models are called classifiers, which are used to predict categorical class labels. The medical field has recently received great attention regarding the analysis of medical data available in electronic form. Medical data are either unstructured or semi-structured, which makes them difficult to analyze using traditional data mining techniques. Medical staff need automatic classification methods to analyze and categorize this huge amount of data. The Gastroenterology unit of a local hospital in the UK had just such a problem: it had collected electronic reports on thousands of colonoscopy procedures but could not answer simple questions, such as the percentage of successful colonoscopies undertaken [34]. The aim of colonoscopy is to check for medical problems such as bleeding, colon cancer, polyps, and colitis [6].
