Abstract

The decision tree is a flexible and useful classification tool, but it runs into problems on high-dimensional data. Most current decision tree algorithms select only the single "best" feature when splitting a node; because all remaining features are ignored, classification accuracy suffers. To address this, this paper applies a cluster tree to text categorization. Unlike familiar decision trees (e.g., CART, C4.5), the cluster tree uses clustering results as the splitting rule, so more features are taken into account. The choice of clustering algorithm is clearly critical to the cluster tree, so a text clustering algorithm is proposed to improve its performance. Experiments show that the cluster tree handles the high-dimensionality problem and outperforms C4.5 and CART on text data; in some cases it even outperforms LibSVM, arguably the most powerful tool for text categorization.
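The core idea above can be sketched in code. The following is a minimal toy illustration, not the paper's actual algorithm: internal nodes partition their samples by a clustering result (a tiny hand-rolled k-means stands in for the paper's proposed text clustering algorithm), and leaves predict by majority vote. All hyperparameters (`k`, `max_depth`, `min_size`) are illustrative assumptions.

```python
# Sketch of a "cluster tree" classifier. Assumption: the real paper's
# text-clustering step is replaced here by a plain k-means for illustration.
import numpy as np
from collections import Counter

def kmeans(X, k, iters=20, seed=0):
    """Tiny Lloyd's k-means; stand-in for the paper's clustering algorithm."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each sample to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # move each center to the mean of its assigned samples
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

class ClusterTree:
    """Internal nodes split samples by a clustering result (all features
    participate) rather than by a single best-feature threshold."""
    def __init__(self, max_depth=3, k=2, min_size=4):
        self.max_depth, self.k, self.min_size = max_depth, k, min_size
        self.centers, self.children = None, {}

    def fit(self, X, y, depth=0):
        self.label = Counter(y.tolist()).most_common(1)[0][0]  # majority class
        pure = len(set(y.tolist())) == 1
        if depth < self.max_depth and len(y) >= self.min_size and not pure:
            centers, assign = kmeans(X, self.k, seed=depth)
            if len(set(assign.tolist())) > 1:  # clustering actually split the node
                self.centers = centers
                for j in set(assign.tolist()):
                    child = ClusterTree(self.max_depth, self.k, self.min_size)
                    child.fit(X[assign == j], y[assign == j], depth + 1)
                    self.children[j] = child
        return self

    def predict_one(self, x):
        node = self
        while node.centers is not None:
            # route the sample to the child of its nearest cluster center
            j = int(((node.centers - x) ** 2).sum(axis=1).argmin())
            if j not in node.children:
                break
            node = node.children[j]
        return node.label
```

On two well-separated Gaussian blobs, the root's k-means split already separates the classes, so the child nodes are pure and become leaves; a single-feature split would need the discriminative direction to align with one axis, while the clustering split uses all features jointly.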
