Abstract

Procedures that evaluate the results of clustering algorithms are known as clustering validation (CV) indexes. CV indexes are usually grouped into two broad classes, external and internal, depending on whether or not a ground truth or optimal clustering is known in advance. Traditional cluster validation indexes become impractical, or even impossible, to compute when the data set is very large. In this paper, we focus on the external validation of clusterings of large data sets. To address CV in a big data context, we propose MR_F-measure, a parallel model based on MapReduce for computing an external clustering validation index, the F-measure. The experimental results show that MR_F-measure scales very well with increasing data set sizes.
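
To illustrate how an F-measure computation can be split into map and reduce steps, the following is a minimal single-machine sketch, not the MR_F-measure implementation itself, whose details the abstract does not give. It assumes the common class-weighted clustering F-measure (for each ground-truth class, take the best F score over all clusters, then weight by class size), and it assumes each record is a (class label, cluster label) pair. All function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_partition(records):
    """Map step (assumed): count (class, cluster) pairs within one data split."""
    return Counter(records)


def reduce_counts(left, right):
    """Reduce step (assumed): merge partial contingency counts from two splits."""
    return left + right


def f_measure(contingency):
    """Class-weighted F-measure computed from a {(class, cluster): count} table."""
    class_sizes = Counter()
    cluster_sizes = Counter()
    for (cls, clu), n in contingency.items():
        class_sizes[cls] += n
        cluster_sizes[clu] += n

    total = sum(class_sizes.values())
    score = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = contingency.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no overlap, so F for this pair is 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_i / total) * best
    return score


def mr_f_measure(splits):
    """Run the map step on every split, reduce the partial counts, then score."""
    partials = map(map_partition, splits)
    contingency = reduce(reduce_counts, partials, Counter())
    return f_measure(contingency)


if __name__ == "__main__":
    # Two splits of (class label, cluster label) pairs; illustrative data only.
    splits = [
        [("a", 0), ("a", 0)],
        [("b", 0), ("b", 1)],
    ]
    print(mr_f_measure(splits))
```

Under these assumptions, only the small contingency table of (class, cluster) counts has to be gathered in one place; the per-record work is confined to the map step, which is the part that can be distributed across splits.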
