Abstract
Most understandable classifiers are based on contrast patterns, which can be accurately mined from decision trees. Nevertheless, tree diversity must be ensured to mine a representative pattern collection. In this paper, we perform an experimental comparison of different diversity generation procedures. We compare the diversity generated by each procedure based on the number of total, unique, and minimal patterns extracted from the induced trees for different minimal support thresholds. This comparison, together with an accuracy and abstention experiment, shows that Random Forest and Bagging generate the most diverse and accurate pattern collections. Additionally, we study the influence of data type on the results, finding that Random Forest is best for categorical data and Bagging for numerical data. The comparison includes most known diversity generation procedures and three new deterministic procedures introduced here. These deterministic procedures outperform the existing deterministic method, but are still outperformed by random procedures.
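As a rough illustration of the three diversity counts mentioned above (total, unique, and minimal patterns), the following sketch computes them for a small hand-made pattern collection. It is not the procedure used in the paper. It assumes, for illustration only, that a pattern is a set of item strings, that two patterns are duplicates when their item sets are equal, and that a pattern is minimal when no other pattern in the collection is a proper subset of it. The function name `summarize_patterns` and the example items are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumption: a pattern is modelled as a set of item strings such as "color=red".
Pattern = FrozenSet[str]


def summarize_patterns(patterns: Iterable[Pattern]) -> Tuple[int, int, int]:
    """Return (total, unique, minimal) counts for a pattern collection.

    Assumption: a pattern is minimal if no other pattern in the
    collection is a proper subset of it.
    """
    all_patterns = list(patterns)
    unique = set(all_patterns)  # duplicates mined from several trees collapse here
    minimal = {p for p in unique if not any(q < p for q in unique)}
    return len(all_patterns), len(unique), len(minimal)


# Hypothetical patterns, as if collected from several induced trees.
mined = [
    frozenset({"color=red"}),
    frozenset({"color=red", "size=small"}),
    frozenset({"color=red"}),
    frozenset({"shape=round", "size=small"}),
]

total, unique, minimal = summarize_patterns(mined)
print(total, unique, minimal)
```

Under these assumptions, a procedure that yields many total patterns but few unique or minimal ones is mostly rediscovering the same patterns, which is why the three counts are reported separately.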