Abstract

Data mining frequently encounters abnormal data, that is, data in a data set that is inconsistent with most of the other data or deviates from the normal behavior pattern. In this paper, the KNN (k-Nearest Neighbor) algorithm, the Local Outlier Factor algorithm, and the Isolation Forest algorithm are used to process the MIT-BIH arrhythmia data set. The KNN algorithm is a distance-based anomaly detection algorithm, but it may classify normal data as abnormal when its parameters are poorly chosen. The improvement proposed in this paper is to add weights to the distance to reduce the probability of such misclassification. The Isolation Forest algorithm partitions the data according to its features and then predicts whether each data point is abnormal or normal. The improvement proposed in this paper is to first select the features of the data so that the algorithm partitions the data more accurately, thereby improving the detection effect. For visual display of the test results, this paper uses the Receiver Operating Characteristic (ROC) curve, which intuitively shows the detection effect of each algorithm.

Keywords: Anomaly detection algorithm; MIT-BIH arrhythmia dataset; Receiver operating characteristic curve
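
As an illustration of the distance-weighting idea and the ROC-based evaluation mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the paper's implementation. The abstract does not say how the weights are applied, so the sketch assumes per-feature weights inside a Euclidean distance and scores each point by its mean distance to its k nearest neighbors. The function names (`weighted_distance`, `knn_anomaly_scores`, `roc_auc`), the weight values, and the toy points are all hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with one (assumed) weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_anomaly_scores(points, k, weights=None):
    """Score each point by its mean weighted distance to its k nearest neighbors.

    A larger score means the point lies farther from its neighbors and is
    therefore more likely to be abnormal.
    """
    if weights is None:
        weights = [1.0] * len(points[0])  # unweighted KNN distance
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            weighted_distance(p, q, weights)
            for j, q in enumerate(points)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


def roc_auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen abnormal point (label 1)
    receives a higher score than a randomly chosen normal point (label 0),
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
labels = [0, 0, 0, 0, 1]  # 1 marks the abnormal point

scores = knn_anomaly_scores(points, k=2, weights=[1.0, 0.5])
auc = roc_auc(scores, labels)
```

On this toy data the distant point receives the largest score, so the scores separate the two classes perfectly. On real ECG features, the choice of k and of the weights would need to be tuned, which is the sensitivity to parameter selection that the abstract describes.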
