Abstract

Large-scale classification of structured data where classes are organized in a hierarchy is an important area of research. Top-down approaches that leverage the hierarchy during the learning and prediction phases are efficient for solving large-scale hierarchical classification problems. However, the accuracy of top-down approaches is poor due to error propagation, i.e., prediction errors made at higher levels of the hierarchy cannot be corrected at lower levels. One of the main reasons for errors at the higher levels is the presence of inconsistent nodes, which are introduced by the arbitrary process by which domain experts create these hierarchies. In this paper, we propose two data-driven approaches (local and global) for hierarchical structure modification that identify and flatten inconsistent nodes present in the hierarchy. Our extensive empirical evaluation of the proposed approaches on several image and text datasets with varying distributions of features, classes and training instances per class shows improved classification performance over competing hierarchical modification approaches. Specifically, we see an improvement of up to 7% in Macro-F1 score with our best approach over the best top-down baseline. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison with the flat approach. Further, our experimental evaluation shows that combining the predictions from the flat and hierarchical models in an ensemble setting results in better classification performance.
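
To make the flattening operation concrete, the following is a minimal sketch and not the paper's implementation. It assumes the hierarchy is stored as a child-to-parent dictionary (the root appears only as a value), and it takes the node to flatten as given, since the abstract does not specify how the local or global approach decides which nodes are inconsistent. The function name `flatten_node` and the toy class names are hypothetical.

```python
def flatten_node(parent, node):
    """Return a new child->parent map with `node` removed and its
    children re-attached to the parent of `node`.

    Assumption: `parent` maps every non-root node to its parent.
    """
    if node not in parent:
        raise ValueError("cannot flatten the root or an unknown node")
    if node not in parent.values():
        raise ValueError("only internal nodes can be flattened")

    grandparent = parent[node]
    flattened = {}
    for child, p in parent.items():
        if child == node:
            continue  # drop the flattened node itself
        # children of the flattened node move up one level
        flattened[child] = grandparent if p == node else p
    return flattened


# Toy hierarchy (hypothetical): root -> animals -> pets -> {cats, dogs}
hierarchy = {
    "animals": "root",
    "vehicles": "root",
    "pets": "animals",
    "cats": "pets",
    "dogs": "pets",
    "cars": "vehicles",
}
flat = flatten_node(hierarchy, "pets")
```

After the call, `cats` and `dogs` hang directly under `animals`, so a top-down classifier no longer has to make a decision at `pets` before it reaches them.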
