Abstract
Hierarchical classification has become a popular research topic, particularly for text categorization on the web. A large web corpus can have a hierarchy with hundreds of thousands of topics, so the task is commonly handled with a flat classification approach that induces a binary classifier only for the leaf-node classes. However, this approach suffers from low prediction accuracy because the training data is imbalanced. In this paper, we propose two novel strategies: (i) “Top-Level Pruning” to narrow down the candidate classes, and (ii) “Exclusive Top-Level Training Policy” to build more effective classifiers by utilizing the top-level data. Experiments on the Wikipedia dataset show that our system consistently outperforms the traditional flat approach on all hierarchical classification metrics.
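To make the idea of narrowing the candidate classes concrete, the following is a minimal sketch of one way top-level pruning could work. It is not the system described above: the abstract does not specify how pruning is performed. The toy hierarchy, the keyword sets, the keyword-overlap scoring that stands in for trained binary classifiers, and the names `prune_candidates` and `predict_leaf` are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each leaf class maps to its top-level ancestor.
LEAF_TO_TOP: Dict[str, str] = {
    "Football": "Sports",
    "Tennis": "Sports",
    "Physics": "Science",
    "Chemistry": "Science",
}

# Hypothetical keyword sets standing in for trained classifiers (assumption).
TOP_KEYWORDS: Dict[str, Set[str]] = {
    "Sports": {"match", "player", "score"},
    "Science": {"experiment", "theory", "energy"},
}
LEAF_KEYWORDS: Dict[str, Set[str]] = {
    "Football": {"goal", "player", "match"},
    "Tennis": {"racket", "serve", "match"},
    "Physics": {"energy", "theory", "particle"},
    "Chemistry": {"reaction", "experiment", "molecule"},
}


def score(tokens: Set[str], keywords: Set[str]) -> int:
    """Toy classifier score: number of keywords present in the document."""
    return len(tokens & keywords)


def prune_candidates(tokens: Set[str], k: int = 1) -> List[str]:
    """Keep only the leaves that sit under the k best-scoring top-level classes."""
    ranked = sorted(TOP_KEYWORDS, key=lambda t: score(tokens, TOP_KEYWORDS[t]), reverse=True)
    kept = set(ranked[:k])
    return [leaf for leaf, top in LEAF_TO_TOP.items() if top in kept]


def predict_leaf(text: str, k: int = 1) -> str:
    """Score only the pruned leaf candidates and return the best one."""
    tokens = set(text.lower().split())
    candidates = prune_candidates(tokens, k)
    return max(candidates, key=lambda leaf: score(tokens, LEAF_KEYWORDS[leaf]))


if __name__ == "__main__":
    print(predict_leaf("the player hit a serve with the racket in the match"))
```

In this sketch the leaf classifiers are evaluated only for classes under the retained top-level nodes, so the number of candidates shrinks from all leaves to a small subset. The parameter `k` (also an assumption) controls how aggressive the pruning is.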