An Efficient Approach for Clustering US Census Data Based on Cluster Similarity Using Rough Entropy on Categorical Data

G Sreenivasulu, N Sambasiva Rao, S Viswanadha Raju

doi:10.1007/978-981-13-0586-3_37

Abstract

Clustering, the grouping of similar data points, is one of the major problems in data mining. In categorical clustering, data labeling has been acknowledged as an important method: points that were not labeled earlier are assigned to clusters through the data labeling process. Although many approaches exist in the numerical domain, very few algorithms are available for categorical data. In the categorical domain, the most challenging issue is allocating all unlabeled data points to the proper clusters. In this paper, a method is proposed for labeling similar data points and maintaining them in proper clusters. We use the US Census data set, which was derived from the US Census 1990 raw data set collected as part of the 1990 census and has 68 categorical attributes. The proposal is to allocate each unlabeled data point to its corresponding cluster through data labeling. This is useful for understanding demographic surveys of the public. The method has two advantages: (1) the proposed algorithm exhibits high execution efficiency, and (2) it can produce high-quality clusters. The proposed algorithm is empirically validated on the US Census data set and is shown to be considerably more efficient than previous work while attaining high-quality results.
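To make the data labeling step concrete, the following is a minimal sketch of assigning an unlabeled categorical point to the most similar existing cluster. The abstract does not specify the rough-entropy-based similarity, so this sketch substitutes a simple value-frequency similarity as a stand-in. The function names (`cluster_profiles`, `similarity`, `label_point`) and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter


def cluster_profiles(clusters):
    """Summarize each cluster as per-attribute value counts plus its size.

    `clusters` maps a cluster name to a non-empty list of equal-length
    tuples of categorical values (an assumed input format).
    """
    profiles = {}
    for name, rows in clusters.items():
        n_attrs = len(rows[0])
        counts = [Counter(row[i] for row in rows) for i in range(n_attrs)]
        profiles[name] = (counts, len(rows))
    return profiles


def similarity(point, profile):
    """Average relative frequency of the point's values within a cluster.

    This is an illustrative stand-in, not the paper's rough entropy measure.
    """
    counts, size = profile
    return sum(counts[i][v] / size for i, v in enumerate(point)) / len(point)


def label_point(point, profiles):
    """Return the name of the cluster with the highest similarity."""
    return max(profiles, key=lambda name: similarity(point, profiles[name]))


if __name__ == "__main__":
    # Toy labeled clusters with two categorical attributes each.
    clusters = {
        "c1": [("a", "x"), ("a", "y")],
        "c2": [("b", "y"), ("b", "y")],
    }
    profiles = cluster_profiles(clusters)
    print(label_point(("a", "y"), profiles))
```

Because each cluster is summarized once as value counts, labeling a new point costs time proportional to the number of clusters times the number of attributes, independent of how many points are already clustered. That is one plausible way a labeling approach could remain efficient on a data set with 68 attributes.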

Full Text