Abstract
In many disciplines, evaluating algorithms for processing massive data is a challenging research issue. However, different algorithms can produce different or even conflicting evaluation results, and this phenomenon has not been fully investigated. This paper proposes a solution scheme for the evaluation of clustering algorithms that reconciles such differences and conflicts. Specifically, we propose and develop a model, called decision-making support for evaluation of clustering algorithms (DMSECA), which evaluates clustering algorithms by merging expert wisdom in order to reconcile differences in their evaluation performance for information fusion during a complex decision-making process. The proposed model is tested and verified in an experimental study using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, comprising a total of 18,310 instances and 313 attributes. The model generates a list of algorithm priorities to produce an optimal ranking scheme, which can satisfy the decision preferences of all the participants. The results indicate that our model is an effective tool for selecting the most appropriate clustering algorithm for a given data set, and that it can reconcile different or even conflicting evaluation performance to reach a group agreement in a complex decision-making environment.
Highlights
Clustering is widely applied in the initial stage of big data analysis to divide large data sets into smaller sections, so the data can be comprehended and mastered with successive analytic operations [1,2,3]. The processing of massive data relies on the selection of an appropriate clustering algorithm, and the evaluation of clustering algorithms remains an active and significant issue in many subjects, such as fuzzy sets, genomics, data mining, computer science, machine learning, business intelligence, and financial analysis [1,4,5,6].
We present an experiment on 20 UCI data sets. It is designed to test and verify our proposed decision-making support for evaluation of clustering algorithms (DMSECA) model for performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.
The experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model can be verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets.
Summary
Clustering is widely applied in the initial stage of big data analysis to divide large data sets into smaller sections, so the data can be comprehended and mastered with successive analytic operations [1,2,3]. The processing of massive data relies on the selection of an appropriate clustering algorithm, and the evaluation of clustering algorithms remains an active and significant issue in many subjects, such as fuzzy sets, genomics, data mining, computer science, machine learning, business intelligence, and financial analysis [1,4,5,6]. Clustering algorithms, which are unsupervised pattern-learning algorithms requiring no a priori information, partition the original data space into smaller sections with high intergroup dissimilarities and intragroup similarities. Clustering can be used to process various types of massive data to uncover unknown correlations, hidden patterns, and other potentially useful information. Naldi et al. [11] pointed out that different clustering algorithms sometimes produce different data partitions, and different algorithms can produce different or even conflicting results. Therefore, the evaluation of clustering algorithms remains a significant task and a challenging problem.
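The core idea of merging conflicting per-measure evaluations into a single priority list can be illustrated with a minimal sketch. The example below uses a simple Borda count as a stand-in aggregation rule (the paper's actual four MCDM methods are not specified here), and the algorithm names and measure scores are hypothetical, invented purely for illustration.

```python
# Minimal sketch: merging conflicting per-measure rankings of clustering
# algorithms into one priority list via a Borda count. This is a stand-in
# for the MCDM aggregation used by DMSECA; all scores below are hypothetical.

# External-measure values for three algorithms (higher is better);
# note that the measures disagree on which algorithm is best.
scores = {
    "Adjusted Rand Index": {"k-means": 0.72, "EM": 0.65, "DBSCAN": 0.58},
    "NMI":                 {"k-means": 0.61, "EM": 0.70, "DBSCAN": 0.55},
    "Purity":              {"k-means": 0.80, "EM": 0.78, "DBSCAN": 0.83},
}

def borda_rank(scores):
    """Merge per-measure rankings into one algorithm priority list."""
    points = {}
    for measure_scores in scores.values():
        # Rank algorithms by score within this measure, best first.
        ranked = sorted(measure_scores, key=measure_scores.get, reverse=True)
        for position, algo in enumerate(ranked):
            # The best algorithm earns n-1 points, the worst earns 0.
            points[algo] = points.get(algo, 0) + (len(ranked) - 1 - position)
    # Final priority list: algorithms ordered by total points.
    return sorted(points, key=points.get, reverse=True)

print(borda_rank(scores))  # → ['k-means', 'EM', 'DBSCAN']
```

Even though each measure prefers a different algorithm, the aggregated ranking reconciles the disagreement into one priority list, which is the kind of group-agreement output the DMSECA model is designed to produce.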