Abstract

Validity evaluation assesses the quality of a clustering result under different measurement criteria. A variety of assessment methods have been introduced in pattern recognition and computer vision applications. Although mining information from massive data is widely recognized as essential, most validity indices provide only a single partitioning scheme for clustering validation. Moreover, conventional evaluation algorithms are sensitive to the density and dimension of the dataset, which leads to assessment failure. In this paper, a normalization-based validity index (NbVI) is proposed for the validity evaluation of adaptive K-means clustering from a multi-solution perspective. Following the principle of high compactness within clusters and high separation between clusters, NbVI seeks the maximum relative ratio between the normalized inter-distance and the normalized intra-distance. The experimental results demonstrate that the proposed NbVI method exhibits excellent performance when clustering density-unbalanced datasets in multi-solution applications. Moreover, NbVI validation shows high versatility across different clustering algorithms.

Highlights

  • The advent of emerging technologies such as the Internet of Things (IoT) and fifth-generation mobile networks (5G) has promoted the development of science and technology and has had a significant impact on how people live

  • In this paper, the normalization-based validity index (NbVI) method is developed for the quality assessment of K-means clustering

  • The normalization-based technique keeps the validity index from being dominated by either the intra-distance or the inter-distance, and enables it to provide optimal multi-solution results for the validity evaluation of K-means clustering


Summary

Introduction

The advent of emerging technologies such as the Internet of Things (IoT) and fifth-generation mobile networks (5G) has promoted the development of science and technology and has had a significant impact on how people live. Identifying useful information in large datasets through categorization techniques is therefore important. Clustering algorithms attempt to partition unlabeled data into clusters. According to how clusters are defined, clustering algorithms can be categorized into four main types [5]: (1) partitional clustering; (2) hierarchical clustering; (3) density-based clustering; (4) grid-based clustering. Different clustering algorithms use distinct criteria to discover the optimal partitioning scheme of the dataset. The K-means algorithm, the most representative and well-known clustering method, is a powerful approach to partitional clustering [6], [7]. Recent work has focused on improving K-means clustering quality by introducing various criteria. Liu et al. proposed an improved path-based clustering algorithm by mining the centroid of data points [8].
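To make the notions of partitional clustering, intra-distance, and inter-distance concrete, the following is a minimal sketch in plain Python. It is not the NbVI method: the normalization used by NbVI is not given in this excerpt, so the sketch only computes an unnormalized inter/intra ratio for several candidate values of K. The helper names (`farthest_first`, `kmeans`, `intra_distance`, `inter_distance`, `make_blobs`), the farthest-first initialization, the particular definitions of the two distances, and the synthetic data are all assumptions chosen for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic initialization (an assumption, not from the paper):
    start at the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, labels


def intra_distance(points, centroids, labels):
    """Mean distance from each point to its assigned centroid (compactness)."""
    return sum(dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)


def inter_distance(centroids):
    """Smallest distance between any two centroids (separation); needs k >= 2."""
    return min(dist(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:])


def make_blobs(centers, n_per_blob=30, spread=0.5, seed=1):
    """Synthetic 2-D Gaussian blobs, used only for this illustration."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
            for cx, cy in centers for _ in range(n_per_blob)]


points = make_blobs([(0, 0), (10, 0), (5, 9)])
scores = {}
for k in range(2, 7):
    centroids, labels = kmeans(points, k)
    # Unnormalized ratio: larger means more separated and more compact.
    scores[k] = inter_distance(centroids) / intra_distance(points, centroids, labels)
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On this well-separated synthetic data the ratio is expected to peak at K = 3, the number of generated blobs. Because the raw ratio can be dominated by whichever of the two distances has the larger scale, a single maximum is all it offers; this is the limitation that motivates a normalized, multi-solution index.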

