Improving classification models with taxonomy information

Luca Cagliero, Paolo Garza

doi:10.1016/j.datak.2013.01.005

Abstract

Classification is an established data mining problem that has been widely investigated by the research community. Since raw data is often unsuitable for training a classifier as it is, several preprocessing steps are commonly integrated into the data mining and knowledge discovery process before classification is applied. This paper investigates the usefulness of integrating taxonomy information into classifier construction. In particular, it presents a general-purpose strategy to improve structured data classification accuracy by enriching data with semantics-based knowledge provided by a taxonomy (i.e., a set of is-a hierarchies) built over data items. The proposed approach may be particularly useful to experts who can directly access or easily infer meaningful taxonomy models over the analyzed data. To demonstrate the benefit of utilizing taxonomies in contemporary classification methods, we also present a generalized version of a state-of-the-art associative classifier, which includes generalized (high-level) rules in the classification model. Experiments show the effectiveness of the proposed approach in improving the accuracy of state-of-the-art classifiers, both associative and non-associative.

Full Text