Abstract

This paper addresses how to turn stored data into useful information, extract deeper information about process operation, and use that information to improve process monitoring. Working within the research topic of large-range linear accelerometer detection technology and data mining, it adapts traditional data mining methods to the characteristics of different detection objects and proposes new methods for analyzing and processing detection data and for fault diagnosis and prediction. From an analysis of density-based and grid-based algorithms, the paper derives an equivalence rule between dense-cell recognition and density-based object search, and on that basis proposes CLIGRID, a clustering algorithm that is both grid-based and density-based. CLIGRID clusters in stages and selects seed objects to expand each cluster, which reduces the number of region queries and the I/O overhead and so speeds up clustering. To address two weaknesses of DBSCAN, the difficulty of choosing its parameters and its poor handling of clusters with large density differences, the paper also proposes an improved DBSCAN algorithm. It uses the grid to map the density distribution into a density matrix, determines the density levels automatically, and then runs a hierarchical clustering process over the multiple density levels, giving finer clustering results at each level and avoiding the loss of clustering quality caused by a single global epsilon value. Comparative results show that CLIGRID is significantly more efficient than DBSCAN and that its execution time grows approximately linearly with the number of data points.
Experiments show that when the data set contains fewer than 10,000 points, the algorithm is slightly less efficient than DBSCAN: representative core units make up a large proportion (64%) of all dense units, and the seed objects in those units take part in both stages of clustering, which increases running time. As the data set grows and the proportion of core units falls (26%), the efficiency gain from grid clustering gradually becomes apparent.
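
The general idea of combining a grid with a density criterion can be illustrated with a short Python sketch. It is not the CLIGRID algorithm itself, whose staging and seed-selection rules are not given in this abstract. It only shows the common pattern the abstract builds on: map points to grid cells, count the points per cell (a sparse form of the density matrix), mark cells above a threshold as dense, and grow clusters outward from seed cells through adjacent dense cells. The function name `grid_density_clusters` and the parameters `cell_size` and `min_pts` are assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import product


def grid_density_clusters(points, cell_size, min_pts):
    """Label points by grouping adjacent dense grid cells.

    points: iterable of equal-length numeric tuples.
    cell_size: edge length of a grid cell (illustrative parameter).
    min_pts: minimum point count for a cell to be dense (illustrative).
    Returns one label per point; -1 marks points in non-dense cells.
    """
    points = list(points)
    # Map every point to the integer coordinates of its grid cell.
    cells = [tuple(int(x // cell_size) for x in p) for p in points]
    # Sparse density matrix: cell coordinates -> number of points.
    density = Counter(cells)
    dense = {c for c, n in density.items() if n >= min_pts}

    dim = len(cells[0]) if cells else 0
    # Offsets to all neighbouring cells (excluding the cell itself).
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    cell_label = {}
    next_label = 0
    for seed in sorted(dense):
        if seed in cell_label:
            continue
        # Expand a new cluster from this seed cell, breadth first.
        cell_label[seed] = next_label
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(a + b for a, b in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return [cell_label.get(c, -1) for c in cells]


sample = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),        # dense cell (0, 0)
    (1.1, 0.5), (1.2, 0.6), (1.3, 0.7),        # adjacent dense cell (1, 0)
    (10.1, 10.1), (10.2, 10.2), (10.3, 10.3),  # separate dense cell
    (5.5, 5.5),                                # isolated point
]
labels = grid_density_clusters(sample, cell_size=1.0, min_pts=3)
```

In this sketch each point is touched once to build the cell counts, and all neighbourhood work is done on cells rather than on individual points, which is the usual reason grid-based methods need fewer region queries than a point-wise density search.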
