Abstract
Subgroup Discovery is a supervised, exploratory data mining paradigm that aims to identify subsets of a dataset that show interesting behaviour with respect to some designated target attribute. The way in which such distributional differences are quantified varies with the target attribute type. This work concerns continuous targets, which are important in many practical applications. For such targets, differences are often quantified using z-score and similar measures that compare simple statistics such as the mean and variance of the subset and the data. However, most distributions are not fully determined by their mean and variance alone. As a result, measures of distributional difference solely based on such simple statistics will miss potentially interesting subgroups. This work proposes methods to recognise distributional differences in a much broader sense. To this end, density estimation is performed using histogram and kernel density estimation techniques. In the spirit of Exceptional Model Mining, the proposed methods are extended to deal with multiple continuous target attributes, such that comparisons are not restricted to univariate distributions, but are available for joint distributions of any dimensionality. The methods can be incorporated easily into existing Subgroup Discovery frameworks, so no new frameworks are developed.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have