Autobin: A predictive approach towards automatic binning using data splitting

Tanja Verster

doi:10.37920/sasj.2018.52.2.3

Abstract

Binning is known by many names: discretisation, classing, grouping and quantisation. It entails mapping continuous or categorical data into discrete bins. Binning is an important pre-processing step in most predictive models and is considered a basic data preparation step in building a credit scorecard. Credit scorecards are mathematical models that attempt to provide a quantitative estimate of the probability that a customer will display a defined behaviour (e.g. default) with respect to their current credit position with a lender. Among the practical advantages of binning are the removal of the effects of outliers and a way to handle missing values. Many binning methods exist, but they are often time-consuming to carry out. We propose a new method, Autobin, that is based on data splitting and maximising a cross-validation form of the predicted log-likelihood. Autobin has the advantage of being nearly automatic and requires very little by way of tuning parameters. In a limited simulation study, Autobin was found to outperform its competitors.
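
The abstract does not spell out the Autobin algorithm, so the following is only a rough sketch of the general idea it names: choosing a binning by data splitting and a cross-validated predictive log-likelihood. It is not the author's method. The equal-frequency bins, the Laplace smoothing of bin event rates, the fold assignment, the candidate bin counts, the synthetic data and all function names are assumptions made for illustration.

```python
import math
import random
from bisect import bisect_right


def quantile_edges(xs, n_bins):
    """Interior cut points giving roughly equal-frequency bins (assumed scheme)."""
    s = sorted(xs)
    edges = [s[(len(s) * i) // n_bins] for i in range(1, n_bins)]
    return sorted(set(edges))  # drop duplicate cut points


def fit_bin_rates(xs, ys, edges):
    """Laplace-smoothed event rate per bin, estimated on the training part."""
    events = [0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for x, y in zip(xs, ys):
        b = bisect_right(edges, x)
        counts[b] += 1
        events[b] += y
    # Smoothing keeps every rate strictly between 0 and 1, so log() is safe.
    return [(e + 1) / (c + 2) for e, c in zip(events, counts)]


def mean_log_lik(xs, ys, edges, rates):
    """Mean Bernoulli log-likelihood of held-out outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = rates[bisect_right(edges, x)]
        total += math.log(p) if y == 1 else math.log(1 - p)
    return total / len(xs)


def cv_score(xs, ys, n_bins, n_folds=5):
    """Average held-out log-likelihood over n_folds train/validation splits."""
    scores = []
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        valid = [i for i in range(len(xs)) if i % n_folds == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        vx, vy = [xs[i] for i in valid], [ys[i] for i in valid]
        edges = quantile_edges(tx, n_bins)
        rates = fit_bin_rates(tx, ty, edges)
        scores.append(mean_log_lik(vx, vy, edges, rates))
    return sum(scores) / n_folds


def choose_n_bins(xs, ys, candidates=(1, 2, 3, 5, 8, 12)):
    """Pick the bin count with the highest cross-validated log-likelihood."""
    scores = {k: cv_score(xs, ys, k) for k in candidates}
    return max(scores, key=scores.get), scores


# Synthetic example: the default rate jumps from 0.1 to 0.6 at x = 0.5.
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(2000)]
ys = [1 if rng.random() < (0.1 if x < 0.5 else 0.6) else 0 for x in xs]
best_k, scores = choose_n_bins(xs, ys)
print(best_k, scores)
```

Scoring on held-out data rather than on the training data is what penalises overly fine binnings: very narrow bins fit the training event rates closely but predict new observations poorly, which lowers the held-out log-likelihood.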

Journal: South African Statistical Journal
Publication Date: Jan 1, 2018
Citations: 5
License type: cc-by-nc-nd
