Abstract

In this paper, we propose a generative graphical model for robust unsupervised feature selection. To deal with outliers, the model assumes that the data are independently and identically sampled from a finite mixture of Student's t-distributions, which serve as the building block for robust clustering and outlier detection. Random variables representing the saliency of each feature are included in the model for feature selection. As a result, the model is expected to perform unsupervised clustering, feature selection and outlier detection simultaneously. Inference is carried out by a tree-structured variational Bayes (VB) algorithm, and feature selection is realised by estimating the feature saliencies. The fully Bayesian treatment of the model provides automatic model selection. Experimental studies show that the developed algorithm compares favourably against existing unsupervised Bayesian feature selection algorithms in terms of commonly used internal and external cluster validity indices, both in controlled experimental settings and on benchmark data sets. The controlled experiments also show that the developed algorithm is capable of exposing outliers and accurately finding the optimal number of components (model selection).
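
To illustrate why a Student's t building block lends itself to outlier detection, the following minimal sketch uses the standard Gaussian scale-mixture view of the t-distribution, in which each point x carries a latent scale u with posterior mean E[u | x] = (nu + d) / (nu + delta^2), where delta^2 is the squared Mahalanobis distance and d the dimension. This is not the paper's tree-structured VB algorithm: it covers a single component with known parameters, and the function name `expected_scale`, the synthetic data and the choice nu = 3 are assumptions made only for illustration.

```python
import numpy as np


def expected_scale(x, mean, cov, nu):
    """Posterior mean E[u | x] of the latent scale in the Gaussian
    scale-mixture form of a Student's t-distribution (hypothetical helper)."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mean
    # Squared Mahalanobis distance of every row of x from the mean.
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return (nu + d) / (nu + maha)


# Synthetic data (assumed): 200 standard-normal inliers plus one far-away point.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 2))
outlier = np.array([[8.0, 8.0]])
X = np.vstack([inliers, outlier])

u = expected_scale(X, mean=np.zeros(2), cov=np.eye(2), nu=3.0)
print("smallest inlier weight:", u[:-1].min())
print("outlier weight:", u[-1])
```

Points far from the component mean receive a small expected scale, so they contribute little to the parameter estimates and can be flagged as outlier candidates; in a full mixture the same quantity would be computed per component.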
