Abstract
Cluster analysis is a technique commonly used to group objects and then further analysis is carried out to obtain a model, named cluster integration. This process can be continued with various analyzes, including path analyzes, discriminant analyzes, logistics, etc. In this chapter, the author discusses the reason to use dummy variables in this type of cluster analysis. Dummy variables are the main way that categorical variables are included as predictors in modeling. With statistical models such as linear regression, one of the dummy variables needs to be excluded, otherwise the predictor variables are perfectly correlated. Thus, usually if a categorical variable can take k values, we only need k-1 dummy variables, the k-th variable being redundant, it does not bring any new information. When more dummy variables than needed are used this is known as dummy variable trapping. The advantage to use dummy variables is that they are simple to use and the decision making process is easier to manage. The novelty in this chapter is the perspective of the dummy variable technique using cluster analysis in statistical modeling. The data used in this study is an assessment of the provision of credit risk at a bank in Indonesia. All analyzes were carried out using software R.
Highlights
The application of cluster analysis is commonly used to group objects
We will explain the technical perspective of dummy variables using cluster analysis in statistical modeling, such as regression analysis, path analysis, and discriminant analysis
The coefficient of total determination of the Cluster integration logistic regression analysis model with 3 groups is 0.8923, so it can be concluded that the diversity of data that can be explained by the model is 89.23% while the remaining 10.17% is explained by variables outside the model
Summary
The application of cluster analysis is commonly used to group objects. Cluster analysis can be used to group objects and further analysis is carried out to obtain a model, namely cluster integration. Cluster integration can be continued with various analyzes, including path analysis, discriminant analysis, logistics, etc. In cluster integration with path analysis, it aims to group homogeneous objects into one group, the goal is that the resulting residual variance is homogeneous in addition to maximizing the adjusted R2 value. In cluster integration with discriminant analysis, the benefits of cluster analysis generated can maximize the accuracy, sensitivity, and specificity of the model. We will explain the technical perspective of dummy variables using cluster analysis in statistical modeling, such as regression analysis, path analysis, and discriminant analysis
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have