Abstract

The decision tree is a flexible and useful classification tool, but it runs into problems on high-dimensional data. Most current decision tree algorithms select only the single "best" feature when splitting a node; because all remaining features are ignored, classification accuracy suffers. To address this, this paper applies a cluster tree to text categorization. Unlike familiar decision trees (e.g., CART, C4.5), the cluster tree uses clustering results as the splitting rule, so more features are taken into account. The choice of clustering algorithm is clearly critical to the cluster tree, so a text clustering algorithm is proposed to improve its performance. Experiments show that the cluster tree handles the high-dimensionality problem and outperforms C4.5 and CART on text data; in some cases it even outperforms LibSVM, arguably the most powerful tool for text categorization.
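The core idea above can be sketched in code. The following is a minimal toy illustration, not the paper's actual algorithm: internal nodes partition their samples by a clustering result (a tiny hand-rolled k-means stands in for the paper's proposed text clustering algorithm), and leaves predict by majority vote. All hyperparameters (`k`, `max_depth`, `min_size`) are illustrative assumptions.

```python
# Sketch of a "cluster tree" classifier. Assumption: the real paper's
# text-clustering step is replaced here by a plain k-means for illustration.
import numpy as np
from collections import Counter

def kmeans(X, k, iters=20, seed=0):
    """Tiny Lloyd's k-means; stand-in for the paper's clustering algorithm."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each sample to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # move each center to the mean of its assigned samples
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

class ClusterTree:
    """Internal nodes split samples by a clustering result (all features
    participate) rather than by a single best-feature threshold."""
    def __init__(self, max_depth=3, k=2, min_size=4):
        self.max_depth, self.k, self.min_size = max_depth, k, min_size
        self.centers, self.children = None, {}

    def fit(self, X, y, depth=0):
        self.label = Counter(y.tolist()).most_common(1)[0][0]  # majority class
        pure = len(set(y.tolist())) == 1
        if depth < self.max_depth and len(y) >= self.min_size and not pure:
            centers, assign = kmeans(X, self.k, seed=depth)
            if len(set(assign.tolist())) > 1:  # clustering actually split the node
                self.centers = centers
                for j in set(assign.tolist()):
                    child = ClusterTree(self.max_depth, self.k, self.min_size)
                    child.fit(X[assign == j], y[assign == j], depth + 1)
                    self.children[j] = child
        return self

    def predict_one(self, x):
        node = self
        while node.centers is not None:
            # route the sample to the child of its nearest cluster center
            j = int(((node.centers - x) ** 2).sum(axis=1).argmin())
            if j not in node.children:
                break
            node = node.children[j]
        return node.label
```

On two well-separated Gaussian blobs, the root's k-means split already separates the classes, so the child nodes are pure and become leaves; a single-feature split would need the discriminative direction to align with one axis, while the clustering split uses all features jointly.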
