Information Entropy based Indices for Variable Selection Performance Assessment

Q Peter He

doi:10.1016/b978-0-444-64241-7.50375-x

Abstract

Variable selection plays an important role in data-driven modeling and other applications, and becomes increasingly important as big data become ubiquitous. In the last few years, many different variable selection methods have been reported. However, how to directly evaluate the performance of variable selection methods has received limited attention. The common criteria used to assess variable selection performance either measure the effects of variable selection indirectly, such as through the prediction performance of a model, or require ground truth of variable relevancy, which is usually unavailable, incomplete, or unverified in industrial applications. To address this limitation, two information-entropy-based consistency indices are proposed to directly evaluate the performance of variable selection methods: one does not require ground truth of variable relevancy; the other can make use of such information if available. A simulated case study (with ground truth) and an industrial case study (without ground truth) are provided to compare the proposed indices with the existing methods.

Full Text