Abstract
Clustering seeks the natural structure of a dataset and partitions it into groups, or clusters. As an unsupervised pattern classification method, it is widely used in data mining, pattern recognition, image processing, and related fields. However, many existing clustering algorithms suffer from low efficiency, poor clustering accuracy, and high sensitivity to noise points, and they cannot handle complex big data properly. To address these problems, we first propose an improved K-means algorithm (Grid-K-means). In this algorithm, operations on dynamically changing grids replace operations on individual data points. This improves clustering efficiency and reduces the number of initial parameters that must be set manually. In addition, using the grids with the highest density to determine the initial cluster centers yields more accurate and stable clustering results. Building on the idea of treating each grid as a weighted representative point of the dataset, we then introduce a new clustering validity index (BCVI) to better evaluate the quality of clustering results. BCVI can quickly determine the optimal number of clusters, especially for large-scale datasets. Experiments on five simulated datasets (including two large-sample datasets) demonstrate that Grid-K-means is faster and more accurate than traditional algorithms. The clustering results are also evaluated with BCVI and six other existing clustering validity indexes, and the experiments show that BCVI outperforms the traditional indexes in processing speed and stability.
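The grid idea can be illustrated with a minimal sketch. The abstract does not say how the grid is built or how it changes dynamically, so this sketch assumes a fixed uniform grid with a hypothetical `cell_size` parameter. The function names are illustrative and are not the authors' code, and BCVI is not reproduced because the abstract does not define it. The sketch shows only the two ideas the abstract names: (1) each non-empty cell acts as one representative point weighted by its point count, and (2) the centroids of the k densest cells seed K-means.

```python
import math
from collections import defaultdict


def grid_summarize(points, cell_size):
    """Assign points to a fixed uniform grid (an assumption; the paper's grid is
    dynamic) and return one (centroid, weight) pair per non-empty cell."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)
    reps = []
    for members in cells.values():
        dim = len(members[0])
        centroid = tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
        reps.append((centroid, len(members)))
    return reps


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grid_kmeans(points, k, cell_size, max_iter=100):
    """Hypothetical grid-based weighted K-means with density-based seeding."""
    reps = grid_summarize(points, cell_size)
    if k > len(reps):
        raise ValueError("k exceeds the number of non-empty grid cells")
    # Seed with the centroids of the k densest cells (highest point counts).
    reps.sort(key=lambda r: r[1], reverse=True)
    centers = [c for c, _ in reps[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step runs over cell representatives, not individual points.
        labels = [min(range(k), key=lambda j: sq_dist(c, centers[j])) for c, _ in reps]
        # Update step: count-weighted mean of the representatives in each cluster.
        new_centers = []
        for j in range(k):
            members = [(c, w) for (c, w), lab in zip(reps, labels) if lab == j]
            total = sum(w for _, w in members)
            if total == 0:
                new_centers.append(centers[j])  # keep the old center if the cluster is empty
                continue
            dim = len(centers[j])
            new_centers.append(tuple(sum(c[i] * w for c, w in members) / total for i in range(dim)))
        if new_centers == centers:
            break
        centers = new_centers
    return centers, reps, labels
```

Because a cell's centroid weighted by its count equals the sum of its points, the weighted update gives the same mean as averaging the underlying points. Each assignment pass, however, costs on the order of (number of cells × k) rather than (number of points × k). The trade-off is that all points in one cell receive the same label. Naive top-k seeding can also pick two adjacent dense cells from the same cluster, and this sketch does not guard against that.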