Abstract

Unsupervised feature selection (UFS) is used in application domains such as data mining, pattern recognition, and machine learning. UFS follows three basic approaches, namely filter, wrapper, and hybrid (a combination of filter and wrapper), to select relevant and non-redundant features. A filter method does not guarantee an optimal solution, whereas a wrapper approach is computationally expensive. Hybrid methods are known to give a better trade-off between the filter and wrapper strategies. However, the practical applicability of these schemes is largely restricted to numerical datasets, and they are not well suited to mixed datasets. Therefore, there is a need for a UFS scheme that can handle both numerical and non-numerical features directly. In this paper, a robust and efficient two-phase UFS method is proposed, consisting of feature ranking (FR) and feature selection (FS). The proposed FR uses entropy and mutual information to produce maximally informative and non-redundant ranked features from a high-dimensional mixed dataset. The proposed FS then applies the k-prototype clustering algorithm with an improved Calinski-Harabasz criterion-based selection methodology to choose the optimal features. Experiments on real-life data substantiate that the proposed approach provides a better subset of features than existing state-of-the-art approaches.
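To make the feature-ranking idea concrete, the following is a minimal sketch of how entropy and mutual information can be combined to rank features in a mixed dataset. It is not the method proposed in the paper: the greedy score (entropy minus mean mutual information with features already ranked), the equal-width `discretize` helper, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def discretize(values, bins=2):
    """Equal-width binning (an assumption) so numeric columns become discrete."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def rank_features(columns):
    """Greedy ranking: prefer high entropy and low mean MI with ranked features."""
    remaining = dict(columns)
    ranked = []
    while remaining:
        def score(name):
            relevance = entropy(remaining[name])
            if not ranked:
                return relevance
            redundancy = sum(
                mutual_information(remaining[name], columns[r]) for r in ranked
            ) / len(ranked)
            return relevance - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        del remaining[best]
    return ranked


# Hypothetical toy data: two categorical columns (one a duplicate) and one numeric.
data = {
    "colour": ["red", "blue", "red", "blue"],
    "colour_copy": ["red", "blue", "red", "blue"],
    "size": discretize([1.0, 1.2, 3.9, 4.1]),
}
ranked = rank_features(data)
```

In this toy example the duplicated column is ranked last, because its mutual information with `colour` cancels its entropy, while `size` shares no information with `colour` and is ranked ahead of it.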
