Machine Learning-based Approach on Dealing with Binary Classification Problem in Imbalanced Financial Data

Dennis Dennis, Alexander A.S Gunawan, Russell Kenael Azaria, Ivan Reginald Budianto

doi:10.1109/ismode53584.2022.9742834

Abstract

Imbalanced data are notoriously difficult to handle, because choosing an appropriate treatment requires a thorough understanding of the data. Such data nevertheless occur in many fields, especially in finance and banking. Many problems involving financial data can be framed as binary classification problems, and some of them have severe consequences when cases are not identified correctly. This research investigates how machine learning models can be used to deal with imbalanced data, specifically in binary classification problems. We use imbalanced insurance and credit card datasets. The procedure starts with feature selection, in which irrelevant columns are removed from the datasets. It then applies the pure SMOTE algorithm and two SMOTE variants (K-Means SMOTE and Borderline SMOTE) as oversampling methods, and Near-Miss and All-KNN as undersampling methods. The algorithms are implemented using scikit libraries. Lastly, PCA is used for dimensionality reduction, and Logistic Regression serves as the machine learning model, with cross-validation used to select the best hyperparameter. The procedure produces five Logistic Regression models that differ in how they handle the imbalance, and these models are compared. The results show that the oversampling methods work better than the undersampling methods, with K-Means SMOTE and Borderline SMOTE performing better than pure SMOTE. This indicates that machine learning can be used to deal with binary classification problems in imbalanced financial data.
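
The abstract gives no implementation details, so the following is only a minimal sketch of the interpolation idea behind basic SMOTE, written in plain Python rather than with the library implementations the authors used. The function name `smote`, its parameters, and the toy data are hypothetical and chosen for illustration. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class
    samples and their k nearest minority-class neighbours (basic SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 3 minority samples versus 8 majority samples.
minority = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8)]
majority_count = 8
new_points = smote(minority, n_synthetic=majority_count - len(minority), k=2)
balanced_minority = minority + new_points
```

After this step the minority class has as many samples as the majority class. The variants named in the abstract change which minority samples are chosen as starting points (for example, samples near the class boundary, or samples in selected clusters), while the undersampling methods instead remove majority-class samples.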
