A Monte Carlo fuzzy logistic regression framework against imbalance and separation

Duygu İçen, Georgios Charizanos, Haydar Demirhan

doi:10.1016/j.ins.2023.119893

Open Access

Journal: Information Sciences
Publication Date: Nov 13, 2023
Citations: 1
License type: cc-by

Affiliation: Hacettepe University, RMIT University

Abstract

This article proposes a new fuzzy logistic regression framework that achieves high classification performance under imbalance and separation while retaining the interpretability of classical logistic regression. Separation and imbalance are two core problems in logistic regression that can produce biased coefficient estimates and inaccurate predictions. Existing research on fuzzy logistic regression focuses primarily on possibilistic models rather than on models that use a logit link function, whose inverse converts log-odds to probabilities. At the same time, little attention has been given to separation and imbalance. Our study addresses these challenges by proposing new methods for fuzzifying binary variables and for classifying subjects by comparison against a fuzzy threshold. We use combinations of fuzzy and crisp predictors, outputs, and coefficients to determine which combinations perform better under imbalance and separation. Numerical experiments with synthetic and real datasets demonstrate the usefulness and superiority of the proposed framework, with seven crisp machine learning models implemented as benchmarks. The proposed framework performs consistently well on datasets with imbalance or separation and equally well when these issues are absent. In contrast, the benchmark machine learning methods are significantly affected by imbalanced datasets.
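To make the separation problem concrete, the following is a minimal Python sketch of classical (crisp) logistic regression, not the authors' fuzzy framework. It shows the inverse of the logit link mapping log-odds to probabilities, crisp classification against a 0.5 threshold, and what complete separation does to the maximum likelihood slope: it keeps growing the longer the optimizer runs instead of converging. The helper names (`inv_logit`, `fit_logistic`, `classify`), the toy data, and the use of plain gradient ascent (rather than the Newton or IRLS fitting that statistical packages typically use) are illustrative assumptions.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a log-odds value eta to a probability."""
    # Numerically stable evaluation of 1 / (1 + exp(-eta)).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def fit_logistic(xs, ys, steps, lr=0.1):
    """Hypothetical helper: crisp one-predictor logistic regression fitted by
    plain gradient ascent on the mean log-likelihood. Returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            r = y - inv_logit(b0 + b1 * x)  # residual on the probability scale
            g0 += r
            g1 += r * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def classify(prob, threshold=0.5):
    """Crisp classification: predict 1 when the fitted probability reaches the threshold."""
    return 1 if prob >= threshold else 0


# Hypothetical toy data with complete separation:
# every x < 0 has y = 0 and every x > 0 has y = 1.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

for steps in (100, 1_000, 10_000):
    b0, b1 = fit_logistic(xs, ys, steps)
    print(f"steps={steps:>6}  slope={b1:.2f}")
```

Running it prints a slope that increases with the number of iterations while the fitted probabilities still classify every training point correctly; because the likelihood has no finite maximizer under separation, any reported coefficient depends on when the optimizer stops. The fuzzy threshold proposed in the article would replace the crisp `threshold` comparison, which this sketch does not attempt to reproduce.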

Full Text