The goal of clustering, a form of unsupervised learning, is to determine the intrinsic structure of unlabeled data. Feature selection for clustering improves grouping performance by removing irrelevant features. Typical feature selection algorithms select a common feature subset for all the clusters. Consequently, clusters embedded in different feature subspaces cannot be identified. In this paper, we introduce a probabilistic model based on a Gaussian mixture to solve this problem. In particular, the feature relevance for an individual cluster is treated as a probability, which is represented by a localized feature saliency and estimated through the Expectation Maximization (EM) algorithm during the clustering process. In addition, the number of clusters is determined simultaneously by integrating a Minimum Message Length (MML) criterion. Experiments carried out on both synthetic and real-world datasets illustrate the performance of the proposed approach in finding clusters embedded in feature subspaces.

1. Introduction. Clustering is the unsupervised classification of data objects into groups (clusters) such that objects in one group are similar to one another and dissimilar from those in other groups. Applications of data clustering are found in many fields, such as information discovery, text mining, web analysis, image grouping, medical diagnosis, and bioinformatics. Many clustering algorithms have been proposed in the literature (8). Broadly, they can be categorized into two groups: hierarchical or partitional. A clustering algorithm typically considers all available features of the dataset in an attempt to learn as much as possible from the data. In practice, however, some features can be irrelevant and thus hinder clustering performance. Feature selection, which chooses the best feature subset for clustering, can be applied to solve this problem.
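To ground the discussion, the following is a minimal sketch of the standard EM procedure for a Gaussian mixture with diagonal covariances, the base model on which the proposed approach builds. It is illustrative only: all names are hypothetical, and it omits the paper's contributions (the per-cluster feature-saliency probabilities and the MML-based selection of the number of clusters).

```python
import numpy as np

def em_gmm_diag(X, k, n_iter=100, seed=0):
    """EM for a Gaussian mixture with diagonal covariances.

    Illustrative sketch only: the full model in the paper additionally
    estimates a localized feature-saliency probability per cluster and
    determines k via a Minimum Message Length criterion.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize means from random data points, unit variances, uniform weights.
    mu = X[rng.choice(n, k, replace=False)]
    var = np.ones((k, d))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(cluster j | x_i),
        # computed in log space for numerical stability.
        log_p = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(axis=2)
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2
        var = np.maximum(var, 1e-6)  # guard against variance collapse
    return pi, mu, var, r

if __name__ == "__main__":
    # Two well-separated 2-D Gaussian clusters.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (100, 2)),
                   rng.normal(5, 1, (100, 2))])
    pi, mu, var, r = em_gmm_diag(X, k=2)
    print(np.round(np.sort(mu[:, 0]), 1))
```

In the saliency-based extension described later, each coordinate of the E-step likelihood is replaced by a mixture of a cluster-specific density and a common (irrelevant-feature) density, weighted by the saliency probability, and those weights are updated in the M-step alongside the parameters shown here.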
Feature selection has been extensively studied in the supervised learning scenario (1-3), where class labels are available for judging the performance improvement contributed by a feature selection algorithm. For unsupervised learning, feature selection is a very difficult problem due to the lack of class labels, and it has received considerable attention recently. The algorithm proposed in (4) measures feature similarity by an information compression index. In (5), the relevant features are detected using a distance-based entropy measure. (6) evaluates the cluster quality over different feature subsets by normalizing cluster separability or likelihood using a cross-projection method. In (7), feature saliency is defined as a probability and estimated by the Expectation Maximization (EM) algorithm using Gaussian mixture models. A variational Bayesian ap-