Abstract
With the rapid development of network and database technology, computers can now store large‐scale, massive data. However, traditional data analysis and processing tools, such as management information systems, can only process these data superficially, and their capacity for deeper analysis is unsatisfactory. The gap between the ability to supply data and the ability to analyze it is becoming increasingly prominent, so there is an urgent need for automated techniques that can process data in depth. Data mining technology emerged in response. Cluster analysis, an important topic in data mining, divides data into natural groups and describes the characteristics of each group. It is a basic method of data mining and knowledge discovery. As an unsupervised technique, cluster analysis classifies data without prior knowledge or guidance. With appropriate algorithms, it can uncover hidden, valuable information, improve the quality of data analysis and interpretation, and provide a scientific basis for other data analysis and sorting tools that further process or interpret the data. This paper first briefly introduces the principle, development, and methods of cluster analysis and describes its applications. It then explains the principle of the K‐means clustering algorithm, analyzes the advantages and disadvantages of the basic K‐means algorithm, and reviews several existing improvements. Finally, it proposes an improved K‐means clustering algorithm and a cluster analysis model based on it, and gives the corresponding algorithm flow and implementation.
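The principle of basic K-means can be made concrete with a minimal Python sketch of the standard (Lloyd-style) iteration. It alternates two steps: assign each point to its nearest center, then move each center to the mean of its points. Neither step increases the sum-of-squared-errors criterion $E = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2$. This is the textbook baseline, not the improved algorithm the paper proposes. The function names `kmeans`, `assign`, and `sse`, the purely random initialization, and the `seed` parameter are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def sse(points: Sequence[Point], centers: Sequence[Point], labels: Sequence[int]) -> float:
    """Sum-of-squared-errors criterion E for a given clustering."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Basic K-means; k must be chosen in advance."""
    rng = random.Random(seed)
    # Random initial centers (hypothetical choice): the result depends on them.
    centers = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster keeps its old center
        if new_centers == centers:
            break  # no center moved: a (possibly only local) optimum of E
        centers = new_centers
    return centers, assign(points, centers)


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(pts, k=2)
print(labels, sse(pts, centers, labels))
```

Because the initial centers are drawn at random, different `seed` values can converge to different local optima on less well-separated data. This is the sensitivity to initial centers and the tendency to settle in a local optimum that the summary lists among the algorithm's defects.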
Highlights
Human society has entered a period of rapid development.
Since the 1990s, with the development of information technology and Internet technology, people have accumulated very rich data in various fields of production and life. These massive data have promoted the development of database technology and made it easy for people to obtain large amounts of data.
Data mining technology emerged in response, gradually became a research hot spot in computer science, attracted many experts and scholars, and has shown strong vitality.
Summary
Human society has entered a period of rapid development. Especially since the 1990s, with the development of information technology and Internet technology, people have accumulated very rich data in various fields of production and life. These massive data have promoted the development of database technology and made it easy for people to obtain large amounts of data. The K-means clustering algorithm is one of the most commonly used partition-based clustering algorithms and uses the sum-of-squared-errors criterion function as its clustering criterion. It is simple, fast, and efficient, and it scales well to large data sets. However, the algorithm has the following defects: the clustering result is sensitive to the choice of initial centers, the number of clusters K must be specified in advance, the algorithm easily falls into a local optimum, and it can only find spherical clusters. By selecting an appropriate mapping function and exploiting its nonlinear mapping ability, high-dimensional data that are not linearly separable become linearly separable after being mapped into the new space, which improves clustering performance. Because the proposed model is nonconvex, however, the solution process easily falls into a local optimum. Here, P is the probability of an individual being selected, and f_i is the fitness value of individual i.
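The summary's last sentence defines P and f_i, but the equation they belong to is missing. Assume it is the standard fitness-proportional (roulette-wheel) selection used in genetic algorithms. Under that assumption, individual i in a population of N is selected with probability $P_i = f_i / \sum_{j=1}^{N} f_j$. The sketch below implements only that assumed rule. The names `selection_probabilities` and `roulette_select` are hypothetical, and fitness values are assumed non-negative with a positive sum.

```python
import bisect
import itertools
import random
from typing import List, Sequence


def selection_probabilities(fitness: Sequence[float]) -> List[float]:
    """Assumed rule P_i = f_i / sum_j f_j (fitness-proportional selection)."""
    if any(f < 0 for f in fitness) or sum(fitness) <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(fitness: Sequence[float], n: int, seed: int = 0) -> List[int]:
    """Draw n individual indices with replacement, each with probability P_i."""
    rng = random.Random(seed)
    # Cumulative probabilities mark the slice boundaries on the wheel.
    cumulative = list(itertools.accumulate(selection_probabilities(fitness)))
    picks = []
    for _ in range(n):
        r = rng.random() * cumulative[-1]  # spin: uniform point on [0, total)
        # bisect_right finds the slice [c_{i-1}, c_i) containing r;
        # zero-fitness individuals have empty slices and are never chosen.
        picks.append(bisect.bisect_right(cumulative, r))
    return picks


print(selection_probabilities([1.0, 2.0, 3.0, 4.0]))
print(roulette_select([1.0, 2.0, 3.0, 4.0], n=5))
```

In this sketch, fitter individuals get proportionally larger slices of the wheel but weaker ones still have a nonzero chance. That chance is what lets a genetic search escape the local optima mentioned above. Python's built-in `random.choices(range(len(f)), weights=f, k=n)` draws from the same distribution in one call. The explicit cumulative wheel is shown only to mirror the formula.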