Abstract

Although exploiting hierarchical structure information can improve hierarchical text classification, existing methods suffer from error propagation, especially in large-scale, deep hierarchies. In this paper, we define the concept of a path-based semantic vector for representing categories, on the basis of which prior information provided by the training set can be employed in a classifier-independent way to reduce and further eliminate classification errors. In particular, we first propose an occurrence-probability-based strategy for hierarchical text classification, which helps limit the error rate efficiently. Co-occurrence probability is then introduced to correct classification errors that occur at higher levels of the hierarchy. Extensive experiments show that our hierarchical classification strategies perform well on the ODP dataset, even at deep levels of the hierarchy.
