MapReduce-based adaptive random forest algorithm for multi-label classification

Qinghua Wu, Haihui Wang, Xuesong Yan, Xiaobo Liu

doi:10.1007/s00521-018-3900-8

Abstract

Because real-world data have complex characteristics, researchers in data mining have proposed multi-label learning as a way to extract knowledge from such data in the era of big data. The complexity of big-data structures means that traditional single-label learning methods can no longer meet the needs of technological development, and the importance of multi-label learning is becoming increasingly evident. The random forest (RF) algorithm is widely regarded as one of the most effective classification algorithms. In this study, we improved the traditional decision tree algorithm and extended the traditional RF method into an adaptive RF (ARF) method for multi-label classification, and we verified the effectiveness of the proposed method experimentally. However, RF may be unable to classify massive data within a short time, whereas Apache Hadoop is well suited to data-intensive tasks. We therefore adapted the MapReduce programming model to the proposed ARF method. The method was implemented on a cloud platform, and experiments verified the time efficiency of the parallel model.
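The abstract does not specify the ARF algorithm or the modified MapReduce job, so the following is only a minimal sketch, under stated assumptions, of the general pattern it describes: each mapper grows a few randomized trees on its own data partition, a reducer merges the partial forests, and prediction takes a per-label majority vote. Assumptions (not from the paper): trees are depth-1 stumps chosen by Hamming error on a random feature subset; Python's built-in `map` and `functools.reduce` stand in for Hadoop; the names `map_train`, `reduce_forests`, `train_stump`, and `predict` are hypothetical. The adaptive mechanism of ARF and the authors' improved decision tree are not reproduced.

```python
# Hypothetical sketch: NOT the authors' ARF. Hadoop is simulated with
# Python's built-in map() and functools.reduce().
import math
import random
from functools import reduce

# A "tree" here is a depth-1 stump: (feature, threshold, left_labels, right_labels).

def majority_labels(rows, n_labels):
    """Per-label majority vote over the label vectors of `rows` (ties -> 0)."""
    if not rows:
        return [0] * n_labels
    return [1 if 2 * sum(y[j] for _, y in rows) > len(rows) else 0
            for j in range(n_labels)]

def hamming_errors(rows, labels):
    """Total number of label positions where `labels` disagrees with each row."""
    return sum(sum(a != b for a, b in zip(y, labels)) for _, y in rows)

def train_stump(sample, n_features, n_labels, rng):
    """Grow one multi-label stump, trying only a random subset of features."""
    k = max(1, int(math.sqrt(n_features)))  # RF-style feature subsampling
    best = None
    for f in rng.sample(range(n_features), k):
        values = sorted({x[f] for x, _ in sample})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [r for r in sample if r[0][f] <= t]
            right = [r for r in sample if r[0][f] > t]
            yl = majority_labels(left, n_labels)
            yr = majority_labels(right, n_labels)
            err = hamming_errors(left, yl) + hamming_errors(right, yr)
            if best is None or err < best[0]:
                best = (err, f, t, yl, yr)
    if best is None:  # no usable split: fall back to a single leaf
        leaf = majority_labels(sample, n_labels)
        return (0, math.inf, leaf, leaf)
    _, f, t, yl, yr = best
    return (f, t, yl, yr)

def map_train(task):
    """Map phase: one mapper grows `n_trees` stumps on bootstrap samples of its partition."""
    partition, n_trees, seed = task
    rng = random.Random(seed)
    n_features, n_labels = len(partition[0][0]), len(partition[0][1])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(partition) for _ in partition]  # bootstrap sample
        forest.append(train_stump(sample, n_features, n_labels, rng))
    return forest

def reduce_forests(a, b):
    """Reduce phase: the global forest is the union of the partial forests."""
    return a + b

def predict(forest, x):
    """Per-label majority vote over every stump in the forest (ties -> 0)."""
    votes = [yl if x[f] <= t else yr for f, t, yl, yr in forest]
    return [1 if 2 * sum(v[j] for v in votes) > len(votes) else 0
            for j in range(len(votes[0]))]

if __name__ == "__main__":
    # Toy data: two features, two labels; label vectors are [1, 0] or [0, 1].
    data = [([0.1, 1.0], [1, 0]), ([0.2, 0.9], [1, 0]), ([0.3, 1.1], [1, 0]),
            ([0.9, 0.1], [0, 1]), ([0.8, 0.2], [0, 1]), ([1.0, 0.0], [0, 1])]
    partitions = [data[0::2], data[1::2]]  # stand-in for HDFS input splits
    partial = map(map_train, [(p, 5, seed) for seed, p in enumerate(partitions)])
    forest = reduce(reduce_forests, partial)
    print(len(forest), predict(forest, [0.15, 0.95]))
```

Because each mapper touches only its own partition and the reducer merely concatenates trees, the map phase parallelizes without communication between mappers; this is the property that makes forest training a natural fit for MapReduce, whatever tree learner is plugged in.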

Full Text