Comparison between Two Algorithms for Computing the Weighted Generalized Affinity Coefficient in the Case of Interval Data

Áurea Sousa, Helena Bacelar-Nicolau, João Cabral, Osvaldo Silva, Leonor Bacelar-Nicolau

doi:10.3390/stats6040068


Open Access

https://doi.org/10.3390/stats6040068


Journal: Stats
Publication Date: Oct 13, 2023
Citations: 1
License type: CC BY 4.0

Affiliation: University of the Azores, University of Lisbon

Abstract

Building on the affinity coefficient between two discrete probability distributions proposed by Matusita, Bacelar-Nicolau introduced the affinity coefficient in a cluster analysis context and extended it to different types of data, including complex and heterogeneous data within the scope of symbolic data analysis (SDA). In this study, we refer to the most significant partitions obtained using hierarchical cluster analysis (h.c.a.) of two well-known datasets taken from the literature on complex (symbolic) data analysis. The h.c.a. is based on the weighted generalized affinity coefficient for the case of interval data and on probabilistic aggregation criteria from a VL parametric family. To calculate the values of this coefficient, two alternative algorithms were used and compared. Both algorithms were able to detect clusters of macrodata (data aggregated into groups of interest) that were consistent with those reported in the literature, but one performed better than the other in some specific cases. Moreover, both approaches allow for the treatment of large microdatabases (non-aggregated data) after their transformation into macrodata.

Full Text