Abstract

Agglomerative clustering is a data mining method that builds clusters in the form of a tree (a hierarchy). In this research we used two agglomerative methods, Single Linkage and Complete Linkage. Deciding which items are nearest, and should therefore be merged into one cluster, requires a distance or similarity measure; we used Euclidean Distance and Cosine Similarity to measure the similarity between two points. Clustering accuracy depends heavily on the pre-processing stage, which also affects the results obtained. We therefore conducted the research in several pre-processing stages: ETL, normalization, and pivoting. The ETL process consisted of removing outliers using the IQR method, data cleaning, and data filtering. For normalization, we used the Min-Max and Altman Z-Score methods to find the best normalized values. The results show that the highest accuracy occurs when using Complete Linkage with Min-Max normalization and Euclidean Distance, with an average purity of 0.4. A significant difference is observed when using the Z-Score and Cosine Similarity methods, where the average purity is around 0.11. We also found that the system could not predict customers' preferences in buying goods for the next period. A further finding is that transactional data from a company are not well suited to clustering.
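The combination the abstract reports as most accurate (Min-Max normalization, Euclidean Distance, Complete Linkage, scored by purity) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy data, and the labels are all hypothetical, the IQR outlier removal and pivoting steps are omitted, and the purity definition used here (fraction of points carrying the majority label of their cluster) is the common textbook one, assumed rather than taken from the paper.

```python
import math

def min_max(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def agglomerate(points, k, dist=euclidean, linkage="complete"):
    """Merge the two closest clusters until k remain.

    Complete linkage scores a pair of clusters by their farthest members,
    single linkage by their nearest members. Returns lists of point indices.
    """
    pick = max if linkage == "complete" else min
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = pick(dist(points[a], points[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def purity(clusters, labels):
    """Fraction of points that carry the majority label of their cluster."""
    total = sum(len(c) for c in clusters)
    hits = 0
    for c in clusters:
        counts = {}
        for idx in c:
            counts[labels[idx]] = counts.get(labels[idx], 0) + 1
        hits += max(counts.values())
    return hits / total

# Hypothetical toy data: two well-separated groups with known labels.
data = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
        [8.0, 80.0], [8.5, 82.0], [7.9, 79.0]]
labels = ["a", "a", "a", "b", "b", "b"]

scaled = min_max(data)
clusters = agglomerate(scaled, k=2, dist=euclidean, linkage="complete")
score = purity(clusters, labels)
print(score)  # purity of this toy run: 1.0
```

Passing `linkage="single"` or `dist=cosine_distance` switches to the other configurations named in the abstract. Note that Min-Max scaling maps at least one value per column to zero, so a row can become all zeros, where cosine similarity is undefined; this sketch does not guard against that case.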

