Abstract

This paper focuses on a new type of taxonomy called a supervised taxonomy (ST). Supervised taxonomies are generated using background information about class labels in addition to distance metrics, and they capture class-uniform regions in a dataset. A hierarchical agglomerative clustering algorithm called STAXAC, which generates STs, is proposed and its properties are analyzed. Experimental results show that STAXAC produces purer taxonomies than the neighbor-joining (NJ) algorithm, a very popular taxonomy generation algorithm. We also introduce novel measures and algorithms that assess classification complexity and class modality, and we show that STs can serve as the main input to an effective data-editing tool that enhances the accuracy of k-nearest neighbor classifiers. Our experimental evaluation demonstrates that assessing the classification complexity of an ST provides a good estimate of the difficulty of the classification problem at hand. Moreover, we provide a class modality discovery tool (CMD) that, based on a domain expert's notion of what constitutes a “noteworthy” subclass, determines whether specific classes in the dataset are zero-modal, unimodal, or multi-modal.
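To make the idea of "class-uniform regions" and "purer taxonomies" concrete, here is a minimal sketch under stated assumptions. It is not STAXAC and does not reproduce the authors' algorithm. It is a hypothetical label-aware agglomerative procedure in which every cluster starts as a single point and two clusters merge only when they are mutual nearest neighbors under single linkage and share a class label (this merge rule is our assumption). The names `class_uniform_regions`, `purity`, and `regions_per_class` are illustrative. Purity is computed here as the fraction of points carrying their cluster's majority label. Counting regions per class is only a crude stand-in for class modality, because the CMD tool described above additionally relies on an expert-defined notion of a noteworthy subclass.

```python
from collections import Counter
from math import dist


def linkage(a, b, points):
    """Single-linkage distance: closest pair of points across two clusters."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def class_uniform_regions(points, labels):
    """Hypothetical sketch (not STAXAC): merge mutually nearest clusters only
    when they share a class label, so every cluster stays class-pure."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Nearest other cluster for each cluster (lowest index wins ties).
        nearest = []
        for x in range(len(clusters)):
            best_y, best_d = None, float("inf")
            for y in range(len(clusters)):
                if y != x:
                    d = linkage(clusters[x], clusters[y], points)
                    if d < best_d:
                        best_y, best_d = y, d
            nearest.append((best_y, best_d))
        # Eligible merges: mutual nearest neighbours with the same class label.
        candidates = [
            (d, x, y)
            for x, (y, d) in enumerate(nearest)
            if x < y
            and nearest[y][0] == x
            and labels[clusters[x][0]] == labels[clusters[y][0]]
        ]
        if not candidates:
            break  # no class-preserving merge remains
        _, x, y = min(candidates)  # closest eligible pair; x < y
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def purity(clusters, labels):
    """Fraction of points that carry their cluster's majority label."""
    majority = sum(
        Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters
    )
    return majority / sum(len(c) for c in clusters)


def regions_per_class(clusters, labels):
    """Number of pure regions per class (a crude stand-in for modality)."""
    return Counter(labels[c[0]] for c in clusters)


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (20, 0)]
labels = ["a", "a", "a", "b", "b", "a"]
regions = class_uniform_regions(points, labels)
print(regions)                             # [[0, 1, 2], [3, 4], [5]]
print(purity(regions, labels))             # 1.0
print(regions_per_class(regions, labels))  # Counter({'a': 2, 'b': 1})
```

In this toy example the isolated `a` point at x = 20 stays in its own region because the `b` region lies between it and the other `a` points, so class `a` shows up as having two regions. A real modality tool would still need a rule for deciding whether such a small region is noteworthy.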
