Abstract

Ensemble learning combines multiple learning algorithms to achieve better predictive performance than any individual learner. Despite its advantages, ensemble learning raises several issues that need attention, one of which is finding a set of diverse base learners. Recently, clustering has been used, as an alternative to bagging, to generate diverse base learners. The main advantages of cluster-based ensemble learners are their robustness and versatility. The key parameters for implementing a clustering algorithm are the number of clusters and the distance metric. The contribution of this study is to compare four distance metrics, namely the Euclidean, Manhattan, Chebyshev, and Canberra distances, in the clustering method for ensemble generation and to evaluate them in terms of accuracy, purity, and diversity. The methodology is tested on 10 benchmark UCI datasets. The results show that the Chebyshev and Canberra distances achieved higher accuracy than both the Euclidean and Manhattan distances, while the Chebyshev distance yielded the highest purity and diversity values of the four.
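To make the comparison concrete, the sketch below shows the step the study varies: assigning samples to cluster centroids under each of the four distance metrics, with each resulting cluster intended to train one base learner of the ensemble. This is a minimal illustration, not the authors' implementation; the helper `assign_clusters`, the toy data, and the choice of a simple centroid-based assignment loop are all assumptions, and SciPy's name for the Manhattan distance is `cityblock`.

```python
import numpy as np
from scipy.spatial.distance import cdist

# The four distance metrics compared in the study, under SciPy's names
# (Manhattan is called "cityblock" in scipy.spatial.distance).
METRICS = ["euclidean", "cityblock", "chebyshev", "canberra"]

def assign_clusters(X, centroids, metric):
    """Assign each sample in X to its nearest centroid under `metric`.

    This is the assignment step of a generic centroid-based clustering
    loop; the distance metric is the parameter the study compares.
    """
    distances = cdist(X, centroids, metric=metric)  # shape: (n_samples, k)
    return distances.argmin(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))  # toy data standing in for a UCI dataset
    centroids = X[rng.choice(100, size=3, replace=False)]  # k = 3 initial centroids
    for metric in METRICS:
        labels = assign_clusters(X, centroids, metric)
        # Each cluster's samples would train one base learner of the ensemble.
        print(metric, np.bincount(labels, minlength=3))
```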
