Lazy FSCA for unsupervised variable selection

Federico Zocco,Marco Maggipinto,Gian Antonio Susto,Seán Mcloone

doi:10.1016/j.engappai.2023.106624

Abstract

Various unsupervised greedy selection methods have been proposed as computationally tractable approximations to the NP-hard subset selection problem. These methods rely on sequentially selecting the variables that best improve performance with respect to a selection criterion. Theoretical results exist that provide performance bounds and enable ‘lazy greedy’ efficient implementations for selection criteria that satisfy a diminishing returns property known as submodularity. Recently, the authors introduced Forward Selection Component Analysis (FSCA) which uses variance explained as its selection criterion. While variance explained is not a submodular criterion, FSCA has been shown to be highly effective for applications such as measurement plan optimization. Motivated by the desire to achieve a more computationally efficient and scalable algorithm implementation, in this paper a ‘lazy’ implementation of FSCA (L-FSCA) is proposed, which, although not equivalent to FSCA due to the absence of submodularity, has the potential to yield comparable performance while being up to an order of magnitude faster to compute. The efficacy of L-FSCA is demonstrated by performing a systematic comparison with FSCA and five other unsupervised variable selection methods from the literature using simulated and real-world case studies. Experimental results confirm that L-FSCA yields almost identical performance to FSCA while reducing computation time by between 22% and 94% for the case studies considered.

Full Text