Investigating the impact of data scaling on the k-nearest neighbor algorithm

Muasir Pagan, Muhammad Zarlis, Ade Candra

doi:10.11591/csit.v4i2.p135-142

Open Access

Abstract

This study investigates how data scaling affects the performance of the k-nearest neighbor (KNN) algorithm on ten datasets from various domains. Three commonly used scaling techniques (min-max normalization, Z-score standardization, and decimal scaling) are evaluated in terms of the KNN algorithm's accuracy, precision, recall, F1-score, runtime, and memory usage. The study aims to clarify where each technique is applicable and effective, to identify each technique's strengths, weaknesses, and suitability for specific types of data, and to aid the design and implementation of machine learning systems. The results show that data scaling significantly affects KNN performance and that the choice of scaling method can have important implications for practical applications. Moreover, the relative performance of the three techniques varies across datasets, suggesting that the scaling technique should be chosen according to the specific characteristics of the data. Overall, the study provides a comprehensive analysis of the impact of data scaling on KNN performance and can help practitioners and researchers in the machine learning community make informed decisions when designing and implementing machine learning systems.
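To make the three scaling techniques named in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It uses the textbook definitions of min-max normalization, Z-score standardization (with the population standard deviation), and decimal scaling, plus a plain Euclidean-distance, majority-vote KNN. The function names, the toy dataset, and the choice of k = 3 are all assumptions made for illustration; none of them come from the paper.

```python
import math
from collections import Counter

# Each fit_* function learns its parameters from one training column and
# returns a function that rescales a single value. All names are hypothetical.

def fit_identity(column):
    return lambda v: v  # baseline: no scaling

def fit_min_max(column):
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against constant columns
    return lambda v: (v - lo) / span

def fit_z_score(column):
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return lambda v: (v - mean) / std

def fit_decimal_scaling(column):
    # Divide by the smallest power of ten that brings every |value| below 1.
    max_abs = max(abs(v) for v in column)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return lambda v: v / (10 ** j)

def fit_scalers(rows, fit):
    # One scaler per feature column, fitted on training rows only.
    return [fit(list(col)) for col in zip(*rows)]

def transform(row, scalers):
    return [scale(v) for scale, v in zip(scalers, row)]

def knn_predict(train_X, train_y, query, k=3):
    # Euclidean distance to every training point, then majority vote.
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy data (invented): the first feature has a large range but carries no
# class signal; the second has a small range and separates the classes.
train_X = [[1000, 0.10], [5000, 0.20], [9000, 0.15],
           [1200, 0.90], [5200, 0.80], [8800, 0.85]]
train_y = ["A", "A", "A", "B", "B", "B"]
query = [5100, 0.18]

predictions = {}
for name, fit in [("none", fit_identity), ("min-max", fit_min_max),
                  ("z-score", fit_z_score), ("decimal", fit_decimal_scaling)]:
    scalers = fit_scalers(train_X, fit)
    scaled_train = [transform(row, scalers) for row in train_X]
    predictions[name] = knn_predict(scaled_train, train_y,
                                    transform(query, scalers))
    print(name, predictions[name])
```

On this toy data the unscaled run is dominated by the large-range feature and predicts class B, while all three scaled runs predict class A. This illustrates only why distance-based methods such as KNN are sensitive to feature scale; it says nothing about which technique performs best on the study's ten datasets.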
