Abstract

As a representative ensemble machine learning method, the Random Forest (RF) algorithm has been widely used in diverse applications owing to its fast learning speed and high classification accuracy. Research on RF can be classified into two categories: (1) improving the classification accuracy and (2) decreasing the number of trees in a forest. However, most papers on improving the performance of RF have focused on classification accuracy; only a few have focused on reducing the number of trees in a forest. In this paper, we propose a new Covariance-Based Dynamic RF algorithm, called C-DRF. Compared to previous works, the proposed C-DRF algorithm reduces the number of trees while ensuring good-enough classification accuracy. Specifically, by computing the covariance between the number of trees in a forest and the F-measure at each iteration, the proposed algorithm determines whether to increase the number of trees composing the forest. To evaluate the performance of the proposed C-DRF algorithm, we compared its learning time, test time, and memory usage with those of the original RF algorithm on datasets from different application areas. While achieving the same or higher classification accuracy, the proposed C-DRF algorithm improves on the original RF algorithm by as much as 58.68% in learning time, 47.91% in test time, and 68.06% in memory usage on average. As a practical application area, we also show that the proposed C-DRF algorithm is more efficient than state-of-the-art RF algorithms in the Network Intrusion Detection (NID) area.
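The covariance-based stopping rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (`cdrf_tree_count`, `evaluate_f1`), the sliding-window size, and the threshold value are assumptions chosen for clarity.

```python
def covariance(xs, ys):
    """Population covariance between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def cdrf_tree_count(evaluate_f1, window=3, threshold=1e-4, max_trees=100):
    """Grow the forest one tree per iteration and stop once the covariance
    between forest size and F-measure over the last `window` iterations
    falls below `threshold`, i.e. adding trees no longer improves F-measure.

    `evaluate_f1(n)` is assumed to train a forest of n trees and return
    its F-measure on a validation set.
    """
    sizes, scores = [], []
    for n in range(1, max_trees + 1):
        sizes.append(n)
        scores.append(evaluate_f1(n))
        if len(sizes) >= window:
            # Near-zero (or negative) covariance means F-measure has
            # stopped rising with forest size: keep the current n trees.
            if covariance(sizes[-window:], scores[-window:]) < threshold:
                return n
    return max_trees


# Toy usage: an F-measure curve that saturates at 0.9 after 4 trees.
n_trees = cdrf_tree_count(lambda n: min(0.9, 0.5 + 0.1 * n))
```

With this saturating toy curve, the loop halts shortly after the F-measure plateaus, so the forest stays small; in practice `evaluate_f1` would wrap actual tree training and evaluation.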

Highlights

  • As one of the classification modeling approaches, decision tree learning has been widely used in various fields such as statistics, data mining, and machine learning

  • We show that the proposed C-DRF algorithm reduces the learning time, memory usage, and test time compared to other Random Forest (RF) algorithms in various applications such as network intrusion detection

  • By analyzing the covariance between the number of trees in a forest and the F-measure at each iteration, the proposed C-DRF algorithm composes a forest with the minimum number of trees while ensuring good-enough classification accuracy

Introduction

As one of the classification modeling approaches, decision tree learning has been widely used in various fields such as statistics, data mining, and machine learning. The proposed C-DRF algorithm reduces the number of trees composing a forest while keeping the classification accuracy close to that of the original RF algorithm [12]. Our contributions are as follows: (1) to the best of our knowledge, we propose the first RF learning algorithm that uses covariance to generate the minimum number of trees while keeping the classification accuracy close to that of the original RF algorithm; (2) we show that the proposed algorithm reduces the number of trees in a forest while maintaining accuracy close to the original RF algorithm [12, 19]; and (3) we show that the proposed C-DRF algorithm reduces the learning time, memory usage, and test time compared to other RF algorithms in various applications such as network intrusion detection.

Related Works
C-DRF Algorithm
Complexity Analysis
Experimental Evaluation
Findings
Discussion
Conclusion