Abstract

Ensemble learning has been widely used in various fields. However, in a big data environment, too many base classifiers lengthen the classification time of the ensemble classifier, while too few base classifiers lower its classification accuracy. Therefore, the multi-objective teaching-learning-based optimization (MO-TLBO) algorithm is used to carry out ensemble pruning of random forest (RF), improving both the classification accuracy and the classification speed of RF. The MO-TLBO algorithm maximizes classification accuracy and minimizes classification time, and it can find a sub-forest with higher classification accuracy and faster classification speed. In addition, because ensemble pruning of RF via the MO-TLBO algorithm requires vast computational time in a big data environment, a vote set is constructed to speed up the fitness evaluation process. On the Spark platform, the RF improved by the MO-TLBO algorithm (MO-TLBO-RF) is parallelized based on data parallelism, and a Shuffle optimization strategy is proposed to reduce the number of Shuffles during the execution of parallel MO-TLBO-RF. The proposed MO-TLBO-RF is applied to rolling bearing fault diagnosis. The experimental results show that the algorithm can obtain an RF with high fault diagnosis accuracy and fast fault diagnosis speed. The results also show that the vote set and the parallelization of MO-TLBO-RF greatly reduce the ensemble pruning time.
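The abstract says a vote set is constructed to speed up fitness evaluation but does not define it. The following is a minimal sketch, assuming the vote set is a cached matrix of each tree's predicted labels on a validation set, so that scoring a candidate sub-forest (a boolean mask over the trees) needs only a majority vote over stored rows instead of re-running the trees. Classification time is approximated by summing hypothetical per-tree prediction times (`tree_times`); the names `build_vote_set`, `evaluate` and `dominates` are illustrative, not the authors' code.

```python
import numpy as np


def build_vote_set(tree_predictions):
    """Stack each tree's cached predictions on a validation set.

    tree_predictions: list of 1-D integer label arrays, one per tree, same length.
    Returns an (n_trees, n_samples) array, used here as the "vote set".
    """
    return np.vstack(tree_predictions)


def evaluate(mask, vote_set, y_true, tree_times):
    """Return (accuracy, time) for the sub-forest selected by a boolean mask.

    Accuracy comes from a majority vote over the cached predictions, so no tree
    is re-run. Time is approximated (an assumption) by summing per-tree times.
    """
    selected = vote_set[mask]
    if selected.shape[0] == 0:
        return 0.0, float("inf")  # an empty sub-forest is treated as invalid
    n_classes = int(max(vote_set.max(), y_true.max())) + 1
    # counts[c, j] = number of selected trees that vote for class c on sample j
    counts = np.stack([(selected == c).sum(axis=0) for c in range(n_classes)])
    majority = counts.argmax(axis=0)  # ties go to the smallest class label
    accuracy = float((majority == y_true).mean())
    time_cost = float(tree_times[mask].sum())
    return accuracy, time_cost


def dominates(a, b):
    """Pareto dominance on (accuracy, time): maximize accuracy, minimize time."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better
```

Because each candidate costs only an array reduction, many masks can be scored per generation; the trade-off is that the vote set must be rebuilt whenever the trees or the validation set change.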

Highlights

  • Ensemble learning combines multiple base classifiers to form an ensemble classifier, which has been widely used in biology, transportation, energy, industry, medicine, and other fields [1]–[5].

  • To reduce the enormous computational time of ensemble pruning of random forest (RF) via the multi-objective teaching-learning-based optimization (MO-TLBO) algorithm in a big data environment, the RF improved by the MO-TLBO algorithm is parallelized on Spark according to data parallelism, a Shuffle optimization strategy is proposed, and a vote set is constructed.

  • To evaluate the effectiveness of the MO-TLBO algorithm, three swarm intelligence optimization algorithms are used for ensemble pruning of RF: RF improved by the multi-objective genetic algorithm (MO-GA-RF), RF improved by the multi-objective whale optimization algorithm (MO-WOA-RF), and MO-TLBO-RF.
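The highlights state that MO-TLBO-RF is parallelized on Spark according to data parallelism, without further detail. The sketch below is plain Python, not Spark: it only illustrates the data-parallel idea under the assumption that a cached vote set (each tree's predicted labels on validation samples) is split by samples, each partition counts its correctly classified samples for a candidate mask, and the counts are summed. In PySpark a roughly analogous pattern would be `RDD.mapPartitions` followed by `reduce`; this is not the paper's Shuffle optimization strategy, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

import numpy as np


def partition_samples(vote_set, y_true, n_parts):
    """Split the samples (columns of the vote set) into roughly equal partitions,
    the way a data-parallel engine spreads a dataset across workers."""
    index_parts = np.array_split(np.arange(y_true.shape[0]), n_parts)
    return [(vote_set[:, idx], y_true[idx]) for idx in index_parts]


def count_correct(mask, votes_part, y_part, n_classes):
    """Map step: majority vote of the selected trees on one partition only."""
    selected = votes_part[mask]
    counts = np.stack([(selected == c).sum(axis=0) for c in range(n_classes)])
    return int((counts.argmax(axis=0) == y_part).sum())


def parallel_accuracy(mask, partitions, n_classes, workers=4):
    """Score one sub-forest: map over partitions, then sum per-partition counts.

    Only one integer per partition is aggregated, so no per-sample data moves.
    """
    n_samples = sum(y_part.shape[0] for _, y_part in partitions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_part = pool.map(
            lambda part: count_correct(mask, part[0], part[1], n_classes), partitions
        )
        correct = reduce(lambda a, b: a + b, per_part, 0)
    return correct / n_samples
```

Here `ThreadPoolExecutor` merely stands in for cluster workers; the sketch shows the map-then-aggregate structure, not a measured speedup.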


Summary

INTRODUCTION

Ensemble learning combines multiple base classifiers to form an ensemble classifier, which has been widely used in biology, transportation, energy, industry, medicine, and other fields [1]–[5]. Existing studies use multi-objective meta-heuristic algorithms to effectively improve the classification accuracy and reduce the size of the ensemble classifier, but they do not take the classification time of the ensemble classifier as an objective. An MO-TLBO algorithm whose two objectives are maximizing classification accuracy and minimizing classification time is proposed, and a crossover operator with an adaptive crossover rate is designed to better find the best combination of base classifiers. To reduce the enormous computational time of ensemble pruning of RF via the MO-TLBO algorithm in a big data environment, the RF improved by the MO-TLBO algorithm is parallelized on Spark according to data parallelism, a Shuffle optimization strategy is proposed, and a vote set is constructed.
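The introduction mentions a crossover operator with an adaptive crossover rate but does not give its formula. The following is a minimal sketch, assuming candidate sub-forests are boolean masks over the trees, the teacher is a non-dominated solution, and the rate decays linearly from `cr_max` to `cr_min` over the iterations. The schedule, its default values, and every function name here are hypothetical, not taken from the paper.

```python
import numpy as np


def adaptive_crossover_rate(t, t_max, cr_max=0.9, cr_min=0.1):
    """Hypothetical schedule: decay linearly from cr_max (t = 0) to cr_min (t = t_max)."""
    frac = min(max(t / t_max, 0.0), 1.0)
    return cr_min + (cr_max - cr_min) * (1.0 - frac)


def teacher_crossover(learner, teacher, cr, rng):
    """Binary crossover of a learner's tree mask toward the teacher's mask.

    Each bit is taken from the teacher with probability cr; one random bit is
    always taken from the teacher, and the child keeps at least one tree.
    """
    n = learner.shape[0]
    take = rng.random(n) < cr
    take[rng.integers(n)] = True
    child = np.where(take, teacher, learner)
    if not child.any():
        child[rng.integers(n)] = True  # never return an empty sub-forest
    return child


def dominates(a, b):
    """Pareto dominance on (accuracy, time): maximize accuracy, minimize time."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def accept(child_fitness, learner_fitness):
    """Greedy TLBO-style selection: keep the child unless the learner dominates it."""
    return not dominates(learner_fitness, child_fitness)
```

A linearly decreasing rate is only one plausible choice: learners move strongly toward the teacher in early iterations and make smaller adjustments later. The paper's actual adaptation rule may differ.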

THE CLASSIC TLBO ALGORITHM
PERFORMANCE ANALYSIS OF MODEL TRAINING AND FAULT DIAGNOSIS
CONCLUSION
