Abstract
Subgroup discovery (SD) and exceptional model mining (EMM), its generalization to more complex targets, are two mature fields at the frontier of data mining and machine learning. More precisely, EMM aims to find coherent subgroups of a dataset where multiple targets interact in an unusual way. Correlation model classes have already been defined to discover interesting subgroups when dealing with two numerical targets. However, in this supervised setting, the two numerical targets are fixed before the subgroup search. To make unsupervised exploration possible, we propose to search for arbitrary subsets of numerical targets whose correlation is exceptional for an automatically found subgroup. This involves solving two challenges: the definition of a model that evaluates the interest of a subgroup for a subset of numerical targets, and the definition of a pattern language that enumerates both subgroups and targets and lends itself to effective search strategies. We propose an integrated solution to both challenges. We introduce the problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A rank-correlated subgroup is identified by both conditions on descriptive attributes, whether numeric or nominal, and a pattern on numeric attributes that captures (positive or negative) rank correlations based on a generalization of Kendall’s τ. We define a new branch-and-bound algorithm that exploits pruning properties based on two upper bounds and a closure property. An empirical study on several datasets demonstrates the efficiency and the effectiveness of the algorithm.
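To make the notion of an exceptional rank correlation concrete, the following minimal sketch computes the standard pairwise Kendall's τ (the tau-a variant) for two numerical targets, once on a subgroup and once on the whole dataset. It is an illustration only: the toy rows, the attribute names `group`, `x`, and `y`, and the helper `correlation_gap` are hypothetical, and the plain difference of the two τ values is a stand-in, not the quality measure or the generalized τ defined in the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if prod > 0:
            score += 1      # concordant pair
        elif prod < 0:
            score -= 1      # discordant pair (ties contribute 0)
    return score / (n * (n - 1) / 2)


def correlation_gap(rows, predicate, a, b):
    """Hypothetical helper: tau on the subgroup minus tau on all rows."""
    sub = [r for r in rows if predicate(r)]
    tau_sub = kendall_tau([r[a] for r in sub], [r[b] for r in sub])
    tau_all = kendall_tau([r[a] for r in rows], [r[b] for r in rows])
    return tau_sub, tau_all, tau_sub - tau_all


# Toy data (assumed): one nominal descriptor, two numerical targets.
rows = [
    {"group": "a", "x": 1, "y": 10},
    {"group": "a", "x": 2, "y": 20},
    {"group": "a", "x": 3, "y": 30},
    {"group": "b", "x": 4, "y": 3},
    {"group": "b", "x": 5, "y": 2},
    {"group": "b", "x": 6, "y": 1},
]

tau_sub, tau_all, gap = correlation_gap(
    rows, lambda r: r["group"] == "a", "x", "y"
)
print(tau_sub, tau_all, gap)
```

In this toy dataset the subgroup `group == "a"` is perfectly concordant on `x` and `y` while the dataset as a whole is mostly discordant, which is the kind of contrast a rank-correlated subgroup is meant to capture. The setting described above additionally searches over which subset of numerical targets to correlate, which this two-target sketch does not attempt.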