A feature selection for Korean web document clustering

Heum Park, Young-Gi Kim, Hyuk-Chul Kwon

doi:10.1109/iecon.2004.1432224

Abstract

This paper presents a comparative study of feature selection methods for Korean Web document clustering. First, we examined how term features and co-links of Web documents affect clustering performance. We clustered Web documents using native term features, co-links, and both combined, and compared the resulting clusters with the originally assigned categories. Next, we selected term features for each category from training documents using χ², information gain (IG), and mutual information (MI), and applied these features to separate experimental documents. In addition, we propose a new method, max feature selection, which selects the terms that have the maximum count for a category in each experimental document, assigns each term its χ² (or MI or IG) value in place of its term frequency in the document, and then clusters the documents. In the results, χ² performed better than IG or MI, although the difference was slight. When we applied max feature selection, however, clustering performance improved notably. Max feature selection is a simple but effective means of feature space reduction and performs strongly for Korean Web document clustering.
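The abstract names χ², IG, and MI but does not give their formulas. As a rough illustration only, the sketch below computes the three scores for one term and one category from a 2x2 document-count table, using the definitions commonly found in the text categorization literature. The function names and the count parameters `a`, `b`, `c`, `d` are assumptions made for this example and are not taken from the paper; the authors' exact formulations may differ.

```python
import math

# Assumed 2x2 contingency counts for one (term, category) pair:
#   a: documents in the category that contain the term
#   b: documents outside the category that contain the term
#   c: documents in the category that do not contain the term
#   d: documents outside the category that do not contain the term


def chi_square(a, b, c, d):
    """Chi-square statistic for term/category dependence."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of dependence
    return n * (a * d - c * b) ** 2 / denom


def mutual_information(a, b, c, d):
    """Pointwise mutual information log2(P(t, c) / (P(t) * P(c)))."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log2(a * n / ((a + b) * (a + c)))


def _entropy(counts):
    """Shannon entropy (in bits) of a list of counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log2(x / total) for x in counts if x > 0)


def information_gain(a, b, c, d):
    """Binary-category information gain: H(C) - H(C | term present/absent)."""
    n = a + b + c + d
    h_c = _entropy([a + c, b + d])
    h_c_given_t = ((a + b) / n) * _entropy([a, b]) + ((c + d) / n) * _entropy([c, d])
    return h_c - h_c_given_t
```

Under these assumed definitions, a term that appears in every document of a category and nowhere else gets a high score from all three measures, while a term spread evenly across categories scores zero. Ranking terms per category by one of these scores and keeping the top ones is one plausible reading of the per-category selection step described above.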

Similar Papers

A Comparative Study of Feature Selection Methods for Improving Korean Web Document Clustering Performance
Young-Gi Kim
Journal of the Korean Society for Library and Information Science | VOL. 39
01 Mar 2005

Filtering Methods for Feature Selection in Web-Document Clustering
Heum Park ... Hyuk-Chul Kwon
01 Jan 2007

Research on Feature Selection and kNN Classification Method in Chinese Text Classification
Chao Xiao ... Ping Wu
01 Jan 2015

A Category Resolve Power-Based Feature Selection Method
Yan Xu
Journal of Software | VOL. 19
30 Jun 2008
