Abstract

Clustering is one of the most commonly used techniques in data mining. Its main goal is to group objects into clusters so that the objects in each cluster are more similar to one another than to objects in other clusters. A clustering solution is evaluated by applying validity indices. These indices measure the quality of the solution and are classified as either internal indices, which compute quality from the clustered data alone, or external indices, which measure quality using external information such as class labels. Most indices in the literature identify their optimal result through a graphical representation, which can be interpreted imprecisely. This paper presents a new external validity index based on the chi-squared statistical test, named Chi Index, whose results are accurate and require no further interpretation. Chi Index was analyzed using the results of 3 clustering methods on 47 public datasets. The results indicate a better hit rate and a lower error percentage than 15 external validity indices from the literature.
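As background for the idea of an external index built on the chi-squared test, the following minimal sketch computes the standard Pearson chi-squared statistic of the cluster-by-class contingency table. It is not the Chi Index itself, whose definition the abstract does not give; the function name `chi_squared_statistic` and the toy labels are assumptions made purely for illustration.

```python
from collections import Counter


def chi_squared_statistic(cluster_labels, class_labels):
    """Pearson chi-squared statistic of the cluster-by-class contingency table.

    Illustrative only: this is the textbook statistic, not the paper's Chi Index.
    """
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label sequences must have the same length")

    n = len(cluster_labels)
    # Observed counts for each (cluster, class) pair; missing pairs count as 0.
    observed = Counter(zip(cluster_labels, class_labels))
    cluster_totals = Counter(cluster_labels)
    class_totals = Counter(class_labels)

    stat = 0.0
    for cluster, cluster_total in cluster_totals.items():
        for cls, class_total in class_totals.items():
            # Expected count if cluster membership were independent of class.
            expected = cluster_total * class_total / n
            stat += (observed[(cluster, cls)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: clusters that match the classes exactly,
# and clusters that are unrelated to the classes.
matching = chi_squared_statistic([0, 0, 1, 1], ["a", "a", "b", "b"])
unrelated = chi_squared_statistic([0, 0, 1, 1], ["a", "b", "a", "b"])
```

A larger statistic means the cluster assignment depends more strongly on the class labels. In this toy case the matching partition scores higher than the unrelated one, which scores zero.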
