Abstract

Many indexes have been proposed in the literature for comparing two crisp data partitions, such as those resulting from two different classification attempts, two different clustering solutions, or the comparison of a predicted labeling against a true one. Most implementations of these indexes have a computational cost of O(N^2) (where N is the number of data points), which may limit their use on very large datasets or their integration into computationally intensive validation strategies. Furthermore, their extension to fuzzy partitions is not obvious. In this paper we analyze efficient algorithms to compute many classical indexes (most notably the Jaccard coefficient and the Rand index) in O(d^2 + N) (where d is the number of different classes/clusters) and propose a straightforward procedure to extend their computation to fuzzy partitions. The fuzzy extension is based on a pseudo-count concept and provides a natural framework for including memberships in the computation of binary similarity indexes, not limited to the ones reviewed here. Results on simulated data using the Jaccard coefficient highlight a higher consistency of the proposed fuzzy extension with respect to its crisp counterpart.
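
As an illustration of the crisp case only, the following minimal sketch shows one common way to reach the O(d^2 + N) cost: build the contingency table of the two labelings in a single pass, then derive the pair counts from its cells and marginals. It is not taken from the paper; the function names (`pair_counts`, `jaccard_index`, `rand_index`) and the convention of returning 1.0 when a denominator is zero are assumptions made for this example. The pseudo-count fuzzy extension is not sketched here.

```python
from collections import Counter


def comb2(k):
    """Number of unordered pairs that can be drawn from k items."""
    return k * (k - 1) // 2


def pair_counts(labels_a, labels_b):
    """Return (n11, n10, n01, n00) pair counts for two crisp labelings.

    n11: pairs placed together in both partitions
    n10: pairs together in A but separated in B
    n01: pairs separated in A but together in B
    n00: pairs separated in both partitions
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)

    # One O(N) pass: non-zero contingency cells and the two marginals.
    joint = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # At most d^2 non-zero cells, so these sums cost O(d^2).
    n11 = sum(comb2(c) for c in joint.values())
    same_a = sum(comb2(c) for c in rows.values())
    same_b = sum(comb2(c) for c in cols.values())

    n10 = same_a - n11
    n01 = same_b - n11
    n00 = comb2(n) - n11 - n10 - n01
    return n11, n10, n01, n00


def jaccard_index(labels_a, labels_b):
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    denom = n11 + n10 + n01
    # Assumed convention: 1.0 when no pair is co-clustered in either partition.
    return n11 / denom if denom else 1.0


def rand_index(labels_a, labels_b):
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    total = n11 + n10 + n01 + n00
    # Assumed convention: 1.0 when there are fewer than two points.
    return (n11 + n00) / total if total else 1.0
```

For example, with `a = [0, 0, 1, 1]` and `b = [0, 0, 1, 2]`, `pair_counts(a, b)` gives `(1, 1, 0, 4)`, so the Jaccard coefficient is 1/2 and the Rand index is 5/6. No pair of points is ever enumerated explicitly, which is what avoids the quadratic cost in N.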
