Abstract
Clustering algorithms have a wide range of applications in data mining, pattern recognition, and machine learning, and they are an important part of data mining technology. The emergence of massive data has greatly broadened the application of data mining technology, and cluster analysis is a basic operation of big data processing. A clustering algorithm groups similar elements into one class and assigns elements with large differences to different classes. To address the computational complexity of the density clustering algorithm, this paper proposes an improved algorithm, W-DBSCAN, which uses the Warshall algorithm to reduce that complexity. In the density clustering algorithm, data with high similarity are density-connected. This paper constructs an <i>n</i>×<i>n</i> matrix in which element (<i>x</i>, <i>y</i>) is marked as 1 when data <i>x</i> and data <i>y</i> are directly reachable, and then computes the reachability matrix of this matrix using the Warshall algorithm. The problem of finding density connections is thereby transformed into the problem of computing a reachability matrix, which reduces the complexity of the algorithm.
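The matrix construction and the Warshall update described above can be written compactly as follows. This is only a sketch of the standard Warshall recurrence: the symbols d (distance), ε (threshold), and the superscript (k) are assumed notation that does not appear in the source, and treating a distance exactly equal to the threshold as 0 is also an assumption.

```latex
% Assumed notation: d(x, y) is the distance between data x and y,
% \varepsilon is the distance threshold, n is the number of data.
\[
a^{(0)}_{xy} =
\begin{cases}
1, & d(x, y) < \varepsilon \\
0, & \text{otherwise}
\end{cases}
\qquad
a^{(k)}_{ij} = a^{(k-1)}_{ij} \lor \bigl( a^{(k-1)}_{ik} \land a^{(k-1)}_{kj} \bigr),
\quad k = 1, \dots, n
\]
```

After the last step, the entry for (<i>i</i>, <i>j</i>) is 1 exactly when <i>j</i> can be reached from <i>i</i> through a chain of directly reachable data, which is the reachability matrix the abstract refers to.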
Highlights
Clustering algorithms have a wide range of applications in data mining, pattern recognition and machine learning
Feng Zhenhua et al. [7] proposed a greedy improved DBSCAN algorithm in which the user inputs parameters and a greedy strategy is used to find the radius parameters and discover the clusters, with the final clustering result generated by combining the clusters; Wang Zhaofeng et al. [8] proposed a dynamic selection method for DBSCAN parameters based on K-means; Cai Yue et al. [9] proposed an improved DBSCAN algorithm for text clustering that uses the least squares method to reduce the dimension of the text vectors and builds a cluster relationship tree so that the algorithm can cluster text data adaptively
If the distance between two data points is less than the threshold, the two points are density-reachable and the pair is marked as 1; if the distance is greater than the threshold, there is no density-reachability relationship between them and the pair is marked as 0. Once all pairs have been computed, the result is a 0-1 similarity matrix of the data; the Warshall algorithm is then used to find the transitive closure of this matrix, which gives the maximal density-connected sets of density clustering
Summary
Clustering is the task of partitioning data into different classes, maximizing intra-class similarity and minimizing inter-class similarity. The density clustering algorithm solves the problem that K-means [12] does not adapt to all data, but for beginners. Based on the above research, a density clustering algorithm based on Warshall [13] is proposed. If the distance between two data points is less than the threshold, the two points are density-reachable and the pair is marked as 1; if the distance is greater than the threshold, there is no density-reachability relationship between them and the pair is marked as 0. Once all pairs have been computed, the result is a 0-1 similarity matrix of the data. The Warshall algorithm is then used to find the transitive closure of this matrix, which gives the maximal density-connected sets of density clustering. The proposed algorithm is called W-DBSCAN.
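The procedure summarized above can be illustrated with a minimal Python sketch. It follows only what the summary states: threshold the pairwise distances into a 0-1 matrix, take the transitive closure with Warshall's algorithm, and read the clusters off the closure. The function names, the use of Euclidean distance, and the parameter name `eps` are assumptions made for illustration, and the sketch does not model a minimum-points (core point) condition as in standard DBSCAN, since the summary does not describe one. It should not be read as the authors' implementation.

```python
import math


def direct_reachability(points, eps):
    """Build the 0-1 matrix: entry [x][y] is 1 when the (assumed Euclidean)
    distance between points x and y is less than the threshold eps."""
    n = len(points)
    return [
        [1 if math.dist(points[x], points[y]) < eps else 0 for y in range(n)]
        for x in range(n)
    ]


def warshall_closure(matrix):
    """Return the transitive closure of a 0-1 matrix using Warshall's algorithm."""
    n = len(matrix)
    reach = [row[:] for row in matrix]  # copy so the input is left unchanged
    for k in range(n):
        for i in range(n):
            if reach[i][k]:  # i reaches k, so i also reaches whatever k reaches
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def clusters_from_closure(reach):
    """Assign one label per group of mutually reachable points."""
    n = len(reach)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        for j in range(n):
            if reach[i][j]:
                labels[j] = next_label
        labels[i] = next_label  # covers the case eps <= 0, where reach[i][i] is 0
        next_label += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (0, 2), (10, 10), (10, 11)]
    closure = warshall_closure(direct_reachability(pts, eps=1.5))
    print(clusters_from_closure(closure))
```

In this toy input, the first and third points are farther apart than the threshold, so they are not directly reachable, but the closure links them through the second point and all three receive the same label. The two distant points form a second cluster.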