Subgroup discovery in data sets with multi-dimensional responses

Lan Umek,Blaz Zupan

doi:10.3233/ida-2011-0481

Abstract

Most of the present subgroup discovery approaches aim at finding subsets of attribute-value data with unusual distribution of a single output variable. In general, real-life problems may be described with richer, multi-dimensional descriptions of the outcome. The discovery task in such domains is to find subsets of data instances with similar outcome description that are separable from the rest of the instances in the input space. We have developed a technique that directly addresses this problem and uses a combination of agglomerative clustering to find subgroup candidates in the space of output attributes, and predictive modeling to score and describe these candidates in the input attribute space. Experiments with the proposed method on a set of synthetic and on a real social survey data set demonstrate its ability to discover relevant and interesting subgroups from the data with multi-dimensional fesponses.

Full Text