Abstract

Outlier detection is an important and challenging problem in industrial automation, where data are often collected in large amounts but with little labeled information. Many models have been proposed in the literature for real-time outlier detection on data streams. However, most existing outlier detection algorithms still have two main limitations: (1) they need a large amount of memory to store data, and (2) they detect outliers poorly on high-dimensional data in application scenarios. In this paper, we propose a new algorithm, called CELOF, that effectively overcomes both limitations. In CELOF, we first use information entropy to construct a new index weight calculation method, which distinguishes the influence of different indexes and improves detection accuracy on multi-dimensional data. Next, we design a new reachable distance factor discrimination method to extract the information in the original data, and we propose a new outlier detection strategy based on it, which greatly reduces the amount of data that must be stored. Experimental results show that CELOF improves accuracy by 15% on average compared with state-of-the-art algorithms, and that its running time is less than 1% of that of the original LOF. Additionally, our comprehensive experiments use different real data sets for simulation, and the results show that our algorithm can be widely used in different practical application scenarios without any prior information about the data or its distribution.
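
The abstract does not give the CELOF weighting formula, so the following is only a minimal sketch of the standard entropy weight method, which may differ from the one the paper constructs. The function names `entropy_weights` and `weighted_distance`, the min-max scaling step, and the use of the weights inside a Euclidean distance are all assumptions made for illustration.

```python
import math


def entropy_weights(rows):
    """Standard entropy weight method (assumed here; not necessarily CELOF's formula).

    rows: list of samples, each a list of numeric index (attribute) values.
    Returns one weight per index; the weights sum to 1.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples to compute entropy")
    m = len(rows[0])
    diversities = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            # A constant index cannot separate samples, so it gets no weight.
            diversities.append(0.0)
            continue
        # Min-max scale to [0, 1], then turn the column into proportions.
        scaled = [(v - lo) / span for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Normalized Shannon entropy in [0, 1]; terms with p == 0 contribute 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)
    d_sum = sum(diversities)
    if d_sum == 0:
        # Every index is constant: fall back to uniform weights.
        return [1.0 / m] * m
    return [d / d_sum for d in diversities]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (hypothetical use of the weights)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
```

Under this scheme, an index whose values are spread unevenly across samples has lower entropy and receives a larger weight, while a constant index receives zero weight. A distance weighted this way could then be used wherever an LOF-style method computes distances between points.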
