Abstract

The consequences of the choice of hierarchy in hierarchical multi-label classification (HMLC) have not previously been considered in detail. Three hierarchy-related factors in HMLC are examined here: hierarchy structure, class location in the hierarchy, and class distribution in feature space. Four general model groups are found in HMLC modeling: "non-informative", "semi-informative", "comparable", and "hierarchical". Studies of synthetic and real data show that the choice of hierarchy used in the modeling is important in setting the relative error rates of false positives and false negatives. The choice of hierarchy depends on the relative consequences of the false positive and false negative errors produced by the resulting model. A low false negative error rate results from use of a "comparable" HMLC model with a hierarchy designed to maximize intergroup separation. A low false positive error rate results from use of a "hierarchical" HMLC model using any hierarchy. Modest differences in accuracy and F1 measure occur between the best-performing HMLC models built on several external and internal hierarchies for a complex, multiclass dataset. On the Dalbergia data, HMLC methods using "comparable" and "hierarchical" models with the phylogenetic hierarchies examined slightly outperform a conventional classification using the same classifier.

Brief

Studies of synthetic and real data show that the structure of the hierarchy in hierarchical multi-label classification (HMLC) is important in setting the relative error rates of false positives and false negatives. A low false negative error rate results from use of a "comparable" HMLC model with a hierarchy designed to maximize inter-class separation. A low false positive error rate results from use of a "hierarchical" HMLC model with any hierarchy.
