The Application of Agglomerative Clustering in Customer Credit Receipt of Fashion and Shoe Retail

Michael Abadi Santoso, Gloria Virginia, Budi Susanto

doi:10.9744/jirae.3.1.37-44

Abstract

Agglomerative clustering is a data mining method that groups data into clusters arranged as a tree. To build such trees, we used two agglomerative methods: Single Linkage and Complete Linkage. Finding the nearest items to merge into one cluster also requires a measure of similarity or distance between them; we used Euclidean Distance and Cosine Similarity to measure the distance between two points. The accuracy of the clustering depends heavily on the pre-processing stage, which also affects the results obtained. We therefore conducted the research in several stages, with pre-processing consisting of ETL, normalization, and pivoting. The ETL process consisted of removing outliers with the IQR method, data cleaning, and data filtering. For normalization, we compared the Min-Max and Altman Z-Score methods to determine which produced the best normalized values. The results show that the highest accuracy, an average purity of 0.4, was achieved using Complete Linkage with Min-Max normalization and Euclidean Distance. By contrast, combining Z-Score normalization with Cosine Similarity yielded a markedly lower average purity of around 0.11. We also found that the system could not predict customers' purchasing preferences for the next period. Finally, the results indicate that the company's transactional data are not well suited to clustering.
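The pipeline summarized above can be sketched on toy data as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' implementation. The data and labels are hypothetical. Outliers are removed with Tukey's fences at 1.5 × IQR, a multiplier the abstract does not state. The z-score shown is the ordinary standard score (mean 0, standard deviation 1), used here as an assumed stand-in for the paper's Z-Score step. The ETL cleaning, filtering, and pivoting steps are omitted. Clustering is naive bottom-up merging that stops at a requested number of clusters, which is equivalent to cutting the dendrogram at that level. Cosine Similarity is turned into a distance as 1 − cosine, and purity is the share of points that carry the majority true label of their cluster.

```python
import math
from collections import Counter
from statistics import mean, pstdev, quantiles


def iqr_keep(rows, k=1.5):
    """Indices of rows whose every feature lies inside Tukey's IQR fences."""
    fences = []
    for col in zip(*rows):
        q1, _, q3 = quantiles(col, n=4)
        spread = k * (q3 - q1)
        fences.append((q1 - spread, q3 + spread))
    return [i for i, r in enumerate(rows)
            if all(lo <= v <= hi for v, (lo, hi) in zip(r, fences))]


def min_max(rows):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]


def z_score(rows):
    """Standard score per column: (x - mean) / population std dev (assumed variant)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def euclidean(a, b):
    return math.dist(a, b)


def cosine_distance(a, b):
    """1 - cosine similarity; a zero vector gets distance 1.0 (no direction)."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)


def agglomerative(points, n_clusters, dist=euclidean, linkage="complete"):
    """Merge the closest pair of clusters until n_clusters remain."""
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    # Single linkage: distance of the nearest member pair; complete: farthest pair.
    agg = min if linkage == "single" else max
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = agg(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def purity(clusters, labels):
    """Fraction of points carrying the majority true label of their cluster."""
    hits = sum(max(Counter(labels[i] for i in c).values()) for c in clusters)
    return hits / sum(len(c) for c in clusters)


# Hypothetical toy data: two customer groups plus one extreme outlier.
raw = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
       [8.0, 9.0], [8.2, 8.8], [7.9, 9.1], [100.0, 2.0]]
truth = ["A", "A", "A", "B", "B", "B", "A"]

keep = iqr_keep(raw)                      # drops row 6 (the 100.0 outlier)
rows = min_max([raw[i] for i in keep])
labels = [truth[i] for i in keep]
clusters = agglomerative(rows, 2, euclidean, "complete")
print(keep, clusters, purity(clusters, labels))
```

Swapping `euclidean` for `cosine_distance`, `"complete"` for `"single"`, or `min_max` for `z_score` gives the kinds of combinations compared in the study. Because every merge rescans all cluster pairs, this sketch is only practical for small samples.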
