Abstract

Traditional data clustering algorithms lack comprehensive performance: their clustering purity is low, their clustering time is long, and their clustering results do not agree well with the original data distribution. Therefore, a multidimensional discrete big data clustering algorithm based on a dynamic grid was proposed. First, the multidimensional discrete big data were preprocessed. Principal component analysis (PCA) was used to reduce the dimension of the data, and the concept of entropy was introduced to divide the attributes into key and noncritical attributes so that the key attributes could be extracted. Based on the preprocessing results, a dynamic grid was partitioned, and the OptiGrid data clustering algorithm was then applied to the partitioned grid to cluster the data. The experimental results show that the clustering purity of the proposed algorithm is between 95% and 100%, which is significantly higher than that of the traditional algorithms. The multidimensional discrete big data clustering algorithm based on a dynamic grid therefore has better comprehensive performance, produces cluster shapes closer to the original data distribution, achieves higher clustering purity, and executes more efficiently.
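The two preprocessing steps named above, PCA-based dimension reduction and entropy-based separation of key from noncritical attributes, can be sketched in Python with NumPy. The abstract does not state the exact entropy criterion, so the rule used here is an assumption: each attribute's Shannon entropy is computed from an equal-width histogram, and attributes at or above a hypothetical `threshold` are treated as key attributes. All function names are illustrative, not the authors' code.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # center every feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def attribute_entropy(col, bins=10):
    """Shannon entropy (in bits) of one attribute after equal-width binning."""
    counts, _ = np.histogram(col, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def split_key_attributes(Z, threshold, bins=10):
    """Hypothetical rule: attributes with entropy >= threshold are 'key'."""
    ent = np.array([attribute_entropy(Z[:, j], bins) for j in range(Z.shape[1])])
    key = np.flatnonzero(ent >= threshold)
    noncritical = np.flatnonzero(ent < threshold)
    return key, noncritical, ent
```

A typical call chain is `Z = pca_reduce(X, k)` followed by `split_key_attributes(Z, threshold)`; the key attribute indices then select the columns passed on to grid partitioning. Whether high or low entropy should mark a "key" attribute depends on the paper's criterion, so the comparison direction is a choice to revisit.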

Highlights

  • Because of the shortcomings of the above methods, a multidimensional discrete big data clustering algorithm based on a dynamic grid was put forward

  • The results show that the proposed algorithm effectively solves the problems of the existing methods, so it has higher comprehensive performance

  • To verify the effectiveness of the multidimensional discrete big data clustering algorithm based on a dynamic grid, its clustering shape, efficiency, and accuracy were compared experimentally with those of the data clustering methods in Reference [2], Reference [3], and Reference [4], and the results were analyzed
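The abstract reports clustering accuracy as clustering purity (95% to 100% for the proposed algorithm). Under the usual definition, purity is the fraction of points that belong to the majority true class of their assigned cluster. The paper's evaluation code is not given, so the function below is a minimal sketch assuming that standard definition; `cluster_purity` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def cluster_purity(cluster_labels, true_labels):
    """Purity = (1/N) * sum over clusters of the size of that cluster's majority class."""
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("need at least one labeled point")
    by_cluster = defaultdict(Counter)
    for c, t in zip(cluster_labels, true_labels):
        by_cluster[c][t] += 1  # count true classes inside each cluster
    majority_total = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority_total / len(true_labels)
```

For example, `cluster_purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])` gives 5/6: the majority class of cluster 0 covers 2 of its 3 points, and that of cluster 1 covers all 3. Purity alone rewards over-splitting (one point per cluster scores 1.0), so it is usually read alongside the number of clusters produced.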

Summary

Introduction

With the rapid development of information technology, the Internet, and cloud computing, the amount of information is growing explosively. Reference [2] proposed a data clustering method based on the K-means algorithm, which extracted a large number of data samples from massive data. Reference [3] proposed a data clustering method based on rapid regional evolution, which was able to reduce the dimension of the data. Because of the shortcomings of these methods, a multidimensional discrete big data clustering algorithm based on a dynamic grid was put forward. The algorithm partitions the grid in each dimension according to the neighborhoods of the data points and adjusts the grid structure dynamically. The results show that the proposed algorithm effectively solves the problems of the existing methods and therefore has higher comprehensive performance.
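The per-dimension grid division described above can be illustrated with a small sketch. The following assumptions are not from the paper: a cut is placed in a dimension wherever two consecutive sorted values are more than a hypothetical neighborhood radius `eps` apart; every occupied cell of the resulting grid becomes a cluster if it holds at least a hypothetical `min_pts` points; and sparser cells are marked as noise. Cutting one-dimensional projections at empty regions is loosely in the spirit of OptiGrid's cutting planes at low-density points, but this is neither the OptiGrid algorithm nor the authors' implementation.

```python
import numpy as np
from collections import defaultdict

def dynamic_edges(values, eps):
    """Cut one dimension wherever consecutive sorted values differ by more than eps.

    The cut points come from the data, so the grid adapts to the data
    distribution instead of using fixed equal-width intervals.
    """
    v = np.sort(values)
    gap = np.flatnonzero(np.diff(v) > eps)
    return (v[gap] + v[gap + 1]) / 2.0  # cut at the middle of each gap

def dynamic_grid_cluster(X, eps, min_pts):
    """Label points by occupied grid cell; cells with < min_pts points become noise (-1)."""
    X = np.asarray(X, dtype=float)
    edges = [dynamic_edges(X[:, d], eps) for d in range(X.shape[1])]
    # Interval index of every point in every dimension -> one cell tuple per point.
    cell_idx = np.column_stack(
        [np.searchsorted(edges[d], X[:, d]) for d in range(X.shape[1])]
    )
    members = defaultdict(list)
    for i, cell in enumerate(map(tuple, cell_idx)):
        members[cell].append(i)
    labels = np.full(len(X), -1, dtype=int)
    next_label = 0
    for cell in sorted(members):  # deterministic label order
        pts = members[cell]
        if len(pts) >= min_pts:  # dense cell -> new cluster
            labels[pts] = next_label
            next_label += 1
    return labels
```

Because the cut points are recomputed from the data, the grid changes whenever the data change, unlike a fixed equal-width grid. Axis-parallel cuts of this kind cannot separate clusters whose projections overlap in every dimension.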

Overall Flow of Multidimensional Discrete Big Data Clustering Algorithm Based on Dynamic Grid
Multidimensional Discrete Big Data Processing
Dimension Reduction
Attribute Extraction
Dynamic Grid Generation
OptiGrid Data Clustering
Experimental Test Analysis
Experimental Environment and Data Set
Synthetic Data Set
Data Set in Real Environment
Cluster Shape
Cluster Purity
Execution Efficiency
Findings
Conclusions