Abstract

In many real-world data mining applications, such as energy load balancing in wireless sensor networks, the data points follow a balanced distribution, i.e., each class contains approximately the same number of instances, and the clustering result should reflect this balance. In many data sets, especially high-dimensional ones, this balanced structure is obscured in the original feature space by noisy and redundant features. Feature selection can therefore be applied to pick a few informative features that reveal the balanced structure of the data. Feature selection is a fundamental problem in machine learning and has attracted considerable attention in recent years. However, conventional feature selection methods usually focus on selecting the most discriminative features while ignoring the balance property of the data. To tackle this problem, we propose a novel unsupervised feature selection method for balanced clustering that can reveal the intrinsic balanced structure of data. In our method, a balanced regularization term is introduced to favor features that help produce balanced clusters. We then derive an Alternating Direction Method of Multipliers (ADMM) algorithm to optimize the resulting objective function. Finally, experiments are conducted on six benchmark data sets, including Yale and 20NG, comparing our method with other state-of-the-art unsupervised feature selection methods from the literature. The experimental results show that our method not only achieves better clustering performance but also yields a more balanced clustering structure.
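The abstract does not say how the balance of a clustering result is measured. As a hedged illustration only, and not the authors' criterion, the sketch below shows two generic ways to quantify balance from cluster labels: the normalized entropy of the cluster sizes and the ratio of the smallest to the largest cluster. The function names `cluster_balance_entropy` and `size_ratio` and the `n_clusters` parameter are hypothetical.

```python
import math
from collections import Counter


def cluster_balance_entropy(labels, n_clusters=None):
    """Normalized entropy of cluster sizes (hypothetical helper, not from the paper).

    Returns 1.0 for perfectly equal cluster sizes and smaller values as the
    assignment becomes more skewed. If n_clusters is given, it is used as the
    number of clusters, so empty clusters lower the score instead of being
    silently ignored.
    """
    counts = Counter(labels)
    k = n_clusters if n_clusters is not None else len(counts)
    if k <= 1:
        return 0.0  # balance is undefined for a single cluster
    n = sum(counts.values())
    # Shannon entropy of the empirical cluster-size distribution
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Divide by the maximum possible entropy, log(k), to map into [0, 1]
    return entropy / math.log(k)


def size_ratio(labels):
    """Smallest cluster size divided by largest (1.0 means perfectly balanced)."""
    sizes = Counter(labels).values()
    return min(sizes) / max(sizes)


if __name__ == "__main__":
    balanced = [0, 0, 1, 1, 2, 2]
    skewed = [0, 0, 0, 0, 1, 2]
    print(cluster_balance_entropy(balanced), size_ratio(balanced))
    print(cluster_balance_entropy(skewed), size_ratio(skewed))
```

For the balanced labels both scores are (up to floating-point rounding) 1.0, while the skewed labels give a size ratio of 0.25 and an entropy score below 1.0. Entropy reflects the whole size distribution, whereas the size ratio only looks at the two extreme clusters.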
