Abstract
The random forest classifier is widely used across many fields because of its accuracy and robustness. The random forest algorithm was originally developed for multi-dimensional vectorial data, where features can be sampled directly during decision tree construction. In discrete sequence classification, an explicit feature set is not readily available, so a feature extraction algorithm must be applied before the random forest is built. However, such a predefined feature subset may limit the diversity of the decision trees, since the full set of candidate features consists of all subsequences. As a result, the predictive accuracy of the constructed random forest classifier may be reduced. To address this, we propose a new algorithm that builds a random forest directly by adaptively choosing features from the set of all subsequences. To improve running efficiency, a count-suffix tree is used for fast frequency counting of subsequences, which accelerates the generation of each randomized decision tree. Experimental results on 15 real datasets show that our method can outperform state-of-the-art classification algorithms in terms of predictive accuracy. The source code of our method is available at: https://github.com/JiaqiWang-dlut/RSForest.
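As a rough illustration of the frequency-counting idea, the following Python sketch builds a small counting trie over all suffixes of a set of sequences and looks up how many sequences contain a given pattern. It is not taken from the RSForest code. The names `Node`, `build_count_trie`, and `support`, as well as the `max_len` depth cap, are hypothetical. The sketch assumes that subsequences are contiguous and that "frequency" means the number of sequences containing the pattern, and it uses an uncompressed trie in place of a true count-suffix tree for brevity.

```python
class Node:
    """Trie node: children by symbol, plus a per-sequence occurrence count."""
    __slots__ = ("children", "count", "last_seq")

    def __init__(self):
        self.children = {}
        self.count = 0       # number of sequences containing this node's path
        self.last_seq = -1   # id of the last sequence that reached this node


def build_count_trie(sequences, max_len):
    """Insert every suffix of every sequence, truncated to max_len symbols."""
    root = Node()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for symbol in seq[start:start + max_len]:
                node = node.children.setdefault(symbol, Node())
                # Count each sequence at most once per node.
                if node.last_seq != seq_id:
                    node.last_seq = seq_id
                    node.count += 1
    return root


def support(root, pattern):
    """Number of sequences containing `pattern` (0 if absent or longer than max_len)."""
    node = root
    for symbol in pattern:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count


# Hypothetical toy data, for illustration only.
sequences = ["abab", "bab", "ca"]
root = build_count_trie(sequences, max_len=3)
print(support(root, "ab"))   # 2
print(support(root, "a"))    # 3
print(support(root, "abc"))  # 0
```

Once the trie is built, each lookup costs time proportional to the pattern length rather than to the size of the dataset, which is what makes it cheap to evaluate many candidate subsequences while growing a tree.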