Abstract

An important subproblem in supervised tasks such as decision tree induction and subgroup discovery is finding an interesting binary feature (such as a node split or a subgroup refinement) based on a numeric or nominal attribute, with respect to some discrete or continuous target variable. Often one faces a trade-off between the expressiveness of such features and the ability to traverse the feature search space efficiently. In this article, we present efficient algorithms to mine binary features that optimize a given convex quality measure. For numeric attributes, we propose an algorithm that finds an optimal interval, whereas for nominal attributes, we give an algorithm that finds an optimal value set. By restricting the search to features that lie on a convex hull in a coverage space, we can significantly reduce computation time. We present general theoretical results on the cardinality of convex hulls in coverage spaces of arbitrary dimension and perform a complexity analysis of our algorithms. In the important case of a binary target, we show that these algorithms run in time linear in the number of examples. We further provide algorithms for additive quality measures, which have linear runtime regardless of the target type; additive measures are particularly relevant to feature discovery in subgroup discovery. Experiments show that our algorithms perform well and that their additional expressive power leads to higher-quality results.
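
To illustrate the kind of search described here, the following Python sketch finds an optimal value set for a nominal attribute with a binary target under a convex quality measure. It relies on the well-known observation that, for a convex measure, the optimum lies on the convex hull in coverage space and is therefore attained by a prefix of the attribute values sorted by their positive rate. The quality measure used (weighted relative accuracy) and all function names are illustrative assumptions, not the exact interface or algorithm from the article.

# A minimal sketch (not the authors' exact algorithm) of the binary-target case:
# under a convex quality measure, an optimal value set of a nominal attribute is
# a prefix of the attribute values sorted by positive rate, i.e. a vertex of the
# convex hull in coverage space. The measure (WRAcc) and all names below are
# illustrative assumptions.

from collections import defaultdict
from typing import Hashable, Sequence, Set, Tuple


def wracc(p: int, n: int, P: int, N: int) -> float:
    """Weighted relative accuracy of a feature covering p positives and n negatives."""
    total, covered = P + N, p + n
    if covered == 0:
        return 0.0
    return (covered / total) * (p / covered - P / total)


def best_value_set(values: Sequence[Hashable], target: Sequence[int]) -> Tuple[Set, float]:
    """Find a value set of a nominal attribute maximizing WRAcc for a binary target."""
    P = sum(target)
    N = len(target) - P

    # Count positives and negatives covered by each attribute value.
    cover = defaultdict(lambda: [0, 0])
    for v, y in zip(values, target):
        cover[v][0 if y == 1 else 1] += 1

    # Sort values by decreasing positive rate; for a convex measure the optimum
    # is attained at some prefix of this ordering, so one linear scan suffices.
    order = sorted(cover, key=lambda v: cover[v][0] / (cover[v][0] + cover[v][1]),
                   reverse=True)

    best_set: Set = set()
    best_q = float("-inf")
    p = n = 0
    chosen = []
    for v in order:
        p += cover[v][0]
        n += cover[v][1]
        chosen.append(v)
        q = wracc(p, n, P, N)
        if q > best_q:
            best_q, best_set = q, set(chosen)
    return best_set, best_q


if __name__ == "__main__":
    attribute = ["a", "a", "b", "b", "b", "c", "c", "d"]
    labels    = [ 1,   1,   1,   0,   0,   0,   1,   0 ]
    print(best_value_set(attribute, labels))  # ({'a'}, 0.125) for this toy data

The numeric-interval case discussed in the abstract rests on the same convex-hull argument, applied to candidate interval endpoints rather than value subsets.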
