Abstract

In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias in Boolean domains. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((ln(1/δ) + [2^p + p ln n]) / ε) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. For implementing the MIN-FEATURES bias, the paper presents five algorithms that identify a subset of features sufficient to construct a hypothesis consistent with the training examples. FOCUS-1 is a straightforward algorithm that returns a minimal and sufficient subset of features in quasi-polynomial time. FOCUS-2 does the same task as FOCUS-1 but is empirically shown to be substantially faster than FOCUS-1. Finally, the Simple-Greedy, Mutual-Information-Greedy, and Weighted-Greedy algorithms are three greedy heuristics that trade optimality for computational efficiency. Experimental studies are presented that compare these exact and approximate algorithms to two well-known algorithms, ID3 and FRINGE, in learning situations where many irrelevant features are present. These experiments show that, contrary to expectations, the ID3 and FRINGE algorithms do not implement good approximations of MIN-FEATURES. The sample complexity and generalization performance of the FOCUS algorithms are substantially better than either ID3 or FRINGE on learning problems where the MIN-FEATURES bias is appropriate. These experiments also show that, among our three heuristics, the Weighted-Greedy algorithm provides an excellent approximation to the FOCUS algorithms.
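To make the MIN-FEATURES bias concrete, the sketch below enumerates feature subsets in order of increasing size and returns the first subset over which the training sample stays consistent, i.e., no two examples agree on every selected feature yet carry different labels. This is only a hedged illustration of the bias, not the paper's FOCUS-1 or FOCUS-2 implementation (which prune this search); the function names, the data layout, and the toy example are assumptions introduced here.

```python
from itertools import combinations

def is_sufficient(examples, subset):
    """A feature subset is sufficient if no two training examples agree on
    every feature in the subset but disagree on the class label."""
    seen = {}
    for features, label in examples:
        key = tuple(features[i] for i in subset)
        if key in seen and seen[key] != label:
            return False
        seen[key] = label
    return True

def min_features(examples, n):
    """Return a smallest sufficient feature subset by brute-force search,
    enumerating subsets of the n features in order of increasing size."""
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if is_sufficient(examples, subset):
                return subset
    return tuple(range(n))  # only reached if the sample itself is inconsistent

# Toy example (assumed): target concept is x0 AND x1; x2 and x3 are irrelevant.
examples = [
    ((1, 1, 0, 1), 1),
    ((1, 0, 1, 1), 0),
    ((0, 1, 1, 0), 0),
    ((1, 1, 1, 0), 1),
]
print(min_features(examples, 4))  # -> (0, 1)
```

The exhaustive search makes the cost of the bias visible: with p relevant features out of n, it may examine on the order of n^p subsets, which is why the paper pursues the faster FOCUS-2 and the greedy approximations.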
