Abstract

The rapid development of big data era incurs the generation of huge amount of data day by day in various fields. Due to the large-scale and high-dimensional characteristics of these data, it is often difficult to achieve better decision-making in practical applications. Therefore, an efficient big data analytical method is urgently necessary. For feature engineering, feature selection seems to be an important research topic which is anticipated to select “excellent” features from candidate ones. The implementation of feature selection can not only achieve the purpose of dimensionality reduction, but also improve the computational efficiency and result performance of the model. In many classification tasks, researchers found that data seem to be usually close to each other if they are from the same class; thus, local compactness is of great importance for the evaluation of a feature. Based on this discovery, we propose a fast unsupervised feature selection algorithm, named Compactness Score (CSUFS), to select desired features. To prove the superiority of the proposed algorithm, several public data sets are considered with extensive experiments being performed. The experiments are presented by applying feature subsets selected through several different algorithms to the clustering task. The performance of clustering tasks is indicated by two well-known evaluation metrics, while the efficiency is reflected by the corresponding running time. As demonstrated, our proposed algorithm is more accurate and efficient compared with existing ones.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call