Abstract

Greedy search algorithms are a practical approach to approximating the solution of optimal subset selection problems, such as selecting the optimal model inputs from a set of candidate features. When a submodularity property holds for the feature selection metric, an efficient ‘lazy’ implementation can be employed. Recently, the authors have shown that a lazy implementation of Forward Selection Component Analysis (Lazy FSCA) yields comparable performance to FSCA, even though its selection metric (variance explained) is not submodular. In this paper we empirically investigate the sensitivity of Lazy FSCA to submodularity violations and provide a comparative assessment of several theoretical greedy search performance bounds that apply to FSCA. Using a diverse range of real-world datasets and extensive Monte Carlo simulations, we show that even though violations of submodularity can be frequent, the impact on variance explained tends to be minimal. This holds even when there are large deviations in the sequence of selected variables, which can arise with highly correlated datasets. In addition, the available performance bounds are shown to be very conservative and a poor reflection of the true performance of FSCA/Lazy FSCA.
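To make the lazy greedy idea concrete, the following is a minimal sketch (not the authors' implementation; all function names are hypothetical) of lazy greedy forward selection with variance explained as the selection metric. Stale marginal gains are kept in a max-heap and only the top candidate is re-evaluated; this shortcut is exact when the metric is submodular (gains can only shrink as the subset grows) and is a heuristic otherwise, which is precisely the situation Lazy FSCA operates in.

```python
import heapq
import numpy as np

def variance_explained(X, subset):
    """Fraction of the total variance of X captured by regressing
    every column of X on the columns indexed by `subset`.
    Assumes X is column-centered."""
    if not subset:
        return 0.0
    Z = X[:, subset]
    # Least-squares projection of X onto the span of the selected columns.
    coeffs, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return np.sum((Z @ coeffs) ** 2) / np.sum(X ** 2)

def lazy_greedy_select(X, k):
    """Lazy greedy selection of k features. Stale gains act as upper
    bounds under submodularity; when the metric is not submodular
    (as for variance explained), the result may deviate from plain
    greedy selection."""
    n_features = X.shape[1]
    selected, current = [], 0.0
    # Initial marginal gains, negated because heapq is a min-heap.
    heap = [(-variance_explained(X, [j]), j) for j in range(n_features)]
    heapq.heapify(heap)
    while len(selected) < k and heap:
        _, j = heapq.heappop(heap)
        # Re-evaluate only the top candidate's marginal gain.
        fresh = variance_explained(X, selected + [j]) - current
        if not heap or fresh >= -heap[0][0]:
            # Fresh gain still beats every (stale) gain in the heap: accept.
            selected.append(j)
            current += fresh
        else:
            # Otherwise push the refreshed gain back and try the next candidate.
            heapq.heappush(heap, (-fresh, j))
    return selected, current
```

The saving comes from the acceptance test: under submodularity, a refreshed gain that still exceeds every stale gain in the heap must be the true maximizer, so most candidates are never re-evaluated at each step.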
