Granular computing-based approach for classification towards reduction of bias in ensemble learning

Han Liu, Mihaela Cocea

doi:10.1007/s41066-016-0034-1

Abstract

Machine learning has become a powerful approach in practical applications such as decision making, sentiment analysis and ontology engineering. Ensemble learning, which combines different learning algorithms or models, has become increasingly popular as a way to improve overall performance in machine learning tasks. Popular ensemble approaches include Bagging and Boosting, both of which reach the final classification by voting. In both cases, voting can produce an incorrect classification because of bias in the way the voting takes place. To reduce this bias, this paper proposes a probabilistic approach to voting, set in the context of granular computing, with the aim of improving overall classification accuracy. An experimental study using 15 data sets from the UCI repository is reported to validate the proposed approach. The results show that probabilistic voting is effective in increasing accuracy by reducing the bias in voting. The paper contributes a theoretical and empirical analysis of the causes of bias in voting, towards advancing ensemble learning through the use of probabilistic voting.

Highlights

Machine learning has become an increasingly powerful approach in real applications, such as decision making (Das et al 2016; Xu and Wang 2016), sentiment analysis (Liu 2012; Pedrycz and Chen 2016) and ontology engineering (Pedrycz and Chen 2016; Roussey et al 2011)
The Random Forests and Adaboost methods are used in this experimental study because they are popular examples of Bagging and Boosting, respectively, in practical applications
We have discussed, in the context of granular computing, how the current deterministic ways of voting in ensemble learning methods are biased by the assumptions of data completeness and sample representativeness, which are rarely met, especially in the context of big data
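
The contrast between deterministic and probabilistic voting can be illustrated with a minimal sketch. The text here does not spell out the exact voting scheme, so the code below assumes one plausible reading: each class receives a probability equal to its share of the (optionally weighted) votes, and the final class is drawn at random according to those probabilities instead of always taking the top-voted class. The function names (majority_vote, class_probabilities, probabilistic_vote) and the example labels are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Deterministic voting: always return the class with the most votes."""
    return Counter(votes).most_common(1)[0][0]


def class_probabilities(votes, weights=None):
    """Turn base-classifier votes into a probability per class.

    Assumption: a class's probability is its share of the total vote weight.
    With weights=None every vote counts equally (Bagging-style); passing
    per-classifier weights gives a weighted, Boosting-style variant.
    """
    if weights is None:
        weights = [1.0] * len(votes)
    totals = {}
    for label, weight in zip(votes, weights):
        totals[label] = totals.get(label, 0.0) + weight
    total = sum(totals.values())
    return {label: weight / total for label, weight in totals.items()}


def probabilistic_vote(votes, weights=None, rng=None):
    """Probabilistic voting (assumed form): sample the final class in
    proportion to its probability, so a minority class can still be chosen."""
    rng = rng or random.Random()
    probs = class_probabilities(votes, weights)
    labels = list(probs)
    return rng.choices(labels, weights=[probs[l] for l in labels], k=1)[0]


if __name__ == "__main__":
    votes = ["spam", "spam", "ham", "spam", "ham"]  # five base classifiers
    print(majority_vote(votes))        # always the top-voted class
    print(class_probabilities(votes))  # share of votes per class
    print(probabilistic_vote(votes, rng=random.Random(0)))  # sampled class
```

Under this reading, a class that narrowly loses the vote is still selected some of the time, which is one way of not treating a small margin in an incomplete or unrepresentative sample as decisive.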

Summary

Introduction

Machine learning has become an increasingly powerful approach in real applications, such as decision making (Das et al 2016; Xu and Wang 2016), sentiment analysis (Liu 2012; Pedrycz and Chen 2016) and ontology engineering (Pedrycz and Chen 2016; Roussey et al 2011). Machine learning tasks fall into two groups. Classification and regression are supervised learning tasks, in which the training data are labelled. Association and clustering are unsupervised learning tasks, in which the training data are unlabelled.

Objectives

Results

Conclusion