Abstract

Imbalanced dataset classification is a challenging problem, since many classifiers are sensitive to class distribution so that the classifiers’ prediction has bias towards majority class. Hellinger Distance has been proven that it is skew-insensitive and the decision trees that employ Hellinger Distance as a splitting criterion have shown better performance than other decision trees based on Information Gain. We propose a new decision tree induction classifier (HeDEx) based on Hellinger Distance that is randomized ensemble trees selecting both attribute and split-point at random. We also propose hyperplane as a decision surface for HeDEx to improve the performance. A new pattern-based oversampling method is also proposed in this paper to reduce the bias towards majority class. The patterns are detected from HeDEx and the new instances generated are applied after verification process using Hellinger Distance Decision Trees. Our experiments show that the proposed methods show performance improvements on imbalanced datasets over the state-of-the-art Hellinger Distance Decision Trees.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.