Explore the role and emphasis of K-Means, Decision Tree and Distance Based algorithms in data exception detection

Xiaotian Chen

doi:10.1088/1742-6596/2634/1/012050

Abstract

K-Means, Decision Tree, and Distance-Based algorithms are three important approaches to classifying data. They differ in method and in emphasis, so they are typically applied in different scenarios. The K-Means algorithm partitions data (for example, two- or three-dimensional points) into several clusters to facilitate subsequent processing and analysis. A Decision Tree makes decisions by means of a tree structure; it is an important classification and regression method in data mining and serves as a predictive model expressed as a tree (either binary or multiway). The Distance-Based algorithm is a common anomaly detection method applicable to many data domains, and it defines outliers by their nearest-neighbor distance. This paper explores the core ideas of these three algorithms through an example of data anomaly detection, and discusses and illustrates their commonalities and differences through a case of data exception handling.
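
As a rough illustration of the nearest-neighbor definition of an outlier mentioned above, the following Python sketch scores each point by the distance to its k-th nearest neighbor and flags points whose score exceeds a cutoff. It is not taken from the paper: the function names, the sample points, the choice of Euclidean distance, and the values of `k` and `threshold` are all assumptions made for the example.

```python
import math


def knn_distance_scores(points, k=1):
    """For each point, return the Euclidean distance to its k-th nearest neighbor.

    Hypothetical helper for illustration; brute force, O(n^2) comparisons.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k=1, threshold=2.0):
    """Flag a point as an outlier when its k-NN distance exceeds the threshold.

    The threshold is an assumed, hand-picked cutoff, not a value from the paper.
    """
    return [score > threshold for score in knn_distance_scores(points, k)]


# Assumed toy data: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
flags = flag_outliers(points, k=1, threshold=2.0)
print(flags)
```

With this toy data only the distant point is flagged, because each point on the square has a neighbor at distance 1. In practice the cutoff would have to be chosen from the data, for instance from the distribution of the scores.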

Full Text