Abstract

Class imbalance and noise pose significant challenges in many real-world classification tasks. An uneven distribution of samples typically biases Support Vector Machine (SVM) classifiers towards the majority class, and noise in the samples compounds the problem. To address both class imbalance and noise, we introduce two fuzzy-based methods. The first uses intuitionistic fuzzy membership and yields the Robust Energy-based Intuitionistic Fuzzy Least Squares Twin Support Vector Machine (IF-RELSTSVM), a model designed for class imbalance learning. IF-RELSTSVM assigns intuitionistic fuzzy scores to the samples of both classes, which significantly attenuates the detrimental effects of noise and outliers. A distinctive attribute of the model is its ability to handle noisy data points whether they lie close to or far from the hyperplane. The second method introduces a novel hyperplane-based fuzzy membership, in which fuzzy memberships are calculated through a projection-based approach. On this basis we formulate the Robust Energy-based Fuzzy Least Squares Twin Support Vector Machine (F-RELSTSVM), also aimed at class imbalance learning. We evaluate the proposed IF-RELSTSVM and F-RELSTSVM algorithms on several benchmark and synthetic datasets, using the Area Under the ROC Curve (AUC) as the performance metric. The experimental results indicate that these algorithms outperform the baseline models on the majority of the datasets tested. Statistical analyses further support the significance of the proposed methods and their suitability for settings characterized by noise and class imbalance.
In a case study on credit card fraud detection, the F-RELSTSVM algorithm achieves an average AUC of 90.84%, outperforming comparable algorithms and highlighting the practical applicability of our approaches to challenging datasets.
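
For readers unfamiliar with the metric, the following is a minimal Python sketch of how AUC can be computed from classifier scores, using its pairwise interpretation: the probability that a randomly chosen minority (positive) sample is scored higher than a randomly chosen majority (negative) sample, with ties counted as one half. This is not the authors' evaluation code. The function name `auc_score` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auc_score(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Assumption: label 1 marks the positive (minority) class and any other
    label marks the negative class. Ties contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: 2 positives, 6 negatives.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05]

model_auc = auc_score(labels, scores)               # 11 of 12 pairs ranked correctly
constant_auc = auc_score(labels, [0.0] * len(labels))  # every pair tied

print(f"model AUC:    {model_auc:.4f}")
print(f"constant AUC: {constant_auc:.4f}")
```

A classifier that gives every sample the same score would reach 75% accuracy on this toy data by always predicting the majority class, yet its AUC is only 0.5. This is one reason AUC is a common choice for class imbalance learning.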
