Decision trees are commonly used to learn and extract classification rules from data. The fuzzy rule based decision tree (FRDT) is a representative approach owing to its robustness and generalization ability. However, FRDT does not scale well to large data sets. One solution to this problem is parallel computing, for which Map-Reduce is a proven, effective model. Ensemble learning is another effective strategy, one that can significantly improve the generalization ability of machine learning systems. The objective of this paper is to develop a fuzzy rule-base based decision tree that builds on both parallel computing and ensemble learning. First, we implement a parallel fusing fuzzy rule based classification system via Map-Reduce (MR-FFRCS) to show how fuzzy rules can be extracted from data in parallel and evaluated in an ensemble learning manner. Then, taking MR-FFRCS as a fundamental module, we propose a parallel fuzzy rule-base based decision tree (MR-FRBDT) that improves on the original FRDT algorithm. The experimental studies focus mainly on feasibility and parallelism. In a comparison with FRDT on 23 UCI benchmark data sets, the proposed MR-FRBDT algorithm, which has fewer parameters, is effective and able to handle large-scale data sets. Furthermore, numerical experiments on several large-scale data sets verify that the parallel design reduces computing time and avoids memory restrictions.
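To make the idea of extracting fuzzy rules in parallel more concrete, the following is a minimal sketch of a map step and a reduce step over data partitions. It is not the MR-FFRCS procedure, which the abstract does not specify. It assumes features scaled to [0, 1], three triangular fuzzy labels per feature, and a simple best-matching-label heuristic loosely in the spirit of Wang-Mendel rule generation. All names (`CENTERS`, `map_partition`, `reduce_rules`, `extract_rules`) are hypothetical, and the map step runs sequentially here rather than on a cluster.

```python
from collections import defaultdict
from functools import reduce

# Assumption: features lie in [0, 1] and share three triangular fuzzy labels.
CENTERS = (0.0, 0.5, 1.0)  # low, medium, high


def membership(x, center, width=0.5):
    """Triangular membership degree of x for the label centered at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width)


def best_label(x):
    """Return (index, degree) of the fuzzy label that x matches best."""
    degrees = [membership(x, c) for c in CENTERS]
    idx = max(range(len(CENTERS)), key=degrees.__getitem__)
    return idx, degrees[idx]


def new_rule_base():
    """Empty rule base: antecedent -> class label -> accumulated weight."""
    return defaultdict(lambda: defaultdict(float))


def map_partition(partition):
    """Map step: turn one data partition into a partial rule base."""
    rules = new_rule_base()
    for features, label in partition:
        antecedent, weight = [], 1.0
        for x in features:
            idx, degree = best_label(x)
            antecedent.append(idx)
            weight *= degree  # rule weight = product of membership degrees
        rules[tuple(antecedent)][label] += weight
    return rules


def reduce_rules(merged, partial):
    """Reduce step: merge a partial rule base by summing class weights."""
    for antecedent, class_weights in partial.items():
        for label, weight in class_weights.items():
            merged[antecedent][label] += weight
    return merged


def extract_rules(partitions):
    """Run map over all partitions, reduce, then keep the heaviest class per rule."""
    partials = map(map_partition, partitions)  # distributed in a real deployment
    merged = reduce(reduce_rules, partials, new_rule_base())
    return {ant: max(cw, key=cw.get) for ant, cw in merged.items()}
```

Calling `extract_rules` with a list of partitions, each a list of `(features, label)` pairs, returns a dictionary that maps each antecedent (a tuple of fuzzy label indices) to a class. Summing weights in the reduce step lets evidence from all partitions resolve conflicting consequents, which is one simple way to combine per-partition results; the paper's ensemble evaluation may differ.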