Abstract

Data clustering can be applied to both numerical and categorical attributes. Numerical attributes are those whose values can be expressed as numbers. This work is not directly devoted to that category of attributes. The primary goal of the research is to compare the clustering process, the quality of the resulting clusters, and the clustering results of selected algorithms on numerical data sets (obtained using encoding techniques) and on qualitative data sets. Experiments were performed on three data sets. The algorithms used to group the observations into classes were the k-means algorithm and the k-modes algorithm. Three distance measures were used with the k-means algorithm: the Euclidean measure, the Manhattan (city-block) measure, and the Chebyshev measure. For the structures of the three selected data sets, the k-modes algorithm yields better clustering quality and thus assigns observations to classes more effectively. It also does so in a much shorter time than the k-means algorithm. The key qualifier here, however, is data set structure.
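
As an illustration only, the three k-means distance measures named in the abstract (Euclidean, Manhattan or city-block, and Chebyshev) can be sketched in a few lines of Python. The function names are hypothetical, and the simple matching dissimilarity shown for categorical data is the one commonly associated with k-modes; the abstract does not state which dissimilarity the authors used, so treat that part as an assumption.

```python
from typing import Hashable, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def matching_dissimilarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Number of attributes on which two categorical records differ.

    Assumption: this is the simple matching measure often used with
    k-modes, not necessarily the one used in the paper.
    """
    return sum(x != y for x, y in zip(a, b))


# Two numeric points and two categorical records (made-up values).
p, q = (0.0, 0.0), (3.0, 4.0)
r, s = ("red", "small", "round"), ("red", "large", "round")

numeric_distances = (euclidean(p, q), manhattan(p, q), chebyshev(p, q))
categorical_distance = matching_dissimilarity(r, s)
```

For the same pair of points the three numeric measures generally give different values, which is one reason the choice of measure can change which cluster center an observation is assigned to.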
