Split and Conquer Method in Penalized Logistic Regression with Lasso (Application on Credit Scoring Data)

F Shofiyah, A Sofro

doi:10.1088/1742-6596/1108/1/012107

Abstract

Handling big data has become a major challenge in recent years, and new approaches are needed to deal with it. One statistical strategy that can be used for this problem is the split and conquer method. In this paper, we focus on non-Gaussian data, i.e., data following a binomial distribution, and discuss the implementation of the method on credit scoring data. The results show that there are 5 important independent variables in the credit scoring data. The first variable is the percentage of total balances on credit cards and private lines of credit, excluding real estate, divided by the number of credit limits. The second variable is the age of the debtor. The third variable is the number of times the debtor has been 30-59 days late on a payment in the last 2 years. The fourth variable is the number of times the debtor has been 90 days or more late on a payment. The last variable is the number of times the debtor has been 60-89 days late on a payment in the last 2 years.
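The abstract does not spell out the steps of the procedure, so the following is only a minimal sketch of how a split and conquer fit of lasso-penalized logistic regression might look. It assumes standardized features, a fixed penalty `lam`, variable selection by majority vote across blocks, and a plain average of the block estimates in place of a weighted combination. The function names (`fit_lasso_logistic`, `split_and_conquer`) and all parameters are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    # Proximal operator of the L1 (lasso) penalty.
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent.

    Assumes roughly standardized features so the fixed step size lr is stable.
    The intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                grad[j] += r * xi[j]
        b0 -= lr * g0 / n
        beta = [soft_threshold(b - lr * g / n, lr * lam)
                for b, g in zip(beta, grad)]
    return b0, beta


def split_and_conquer(X, y, k, lam, min_votes=None):
    """Fit the lasso on k disjoint blocks, then combine the block results.

    Assumption: a variable is kept if at least min_votes blocks give it a
    nonzero coefficient; kept coefficients are averaged over all k blocks.
    """
    if min_votes is None:
        min_votes = (k + 1) // 2  # simple majority
    p = len(X[0])
    # Block b takes every k-th observation starting at position b.
    fits = [fit_lasso_logistic(X[b::k], y[b::k], lam)[1] for b in range(k)]
    votes = [sum(1 for beta in fits if beta[j] != 0.0) for j in range(p)]
    selected = [j for j in range(p) if votes[j] >= min_votes]
    combined = [sum(beta[j] for beta in fits) / k if j in selected else 0.0
                for j in range(p)]
    return selected, combined
```

Calling `split_and_conquer(X, y, k=4, lam=0.1)` on a list of feature rows `X` and 0/1 labels `y` returns the indices of the variables retained by the vote together with the combined coefficient vector. In practice the penalty and the number of blocks would be tuned to the data rather than fixed as here.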

Full Text