Abstract
Outlier detection aims to identify objects whose characteristics differ from those of normal observations. Most existing approaches detect outliers from a global perspective: they effectively detect global outliers and most clustered outliers, but they cannot detect local outliers when the normal samples form clusters with different densities. Methods based on local outlier factors can effectively detect local outliers, but as the number of outliers increases, clustered outliers become more frequent and detection performance degrades. We propose an outlier detection method based on a density–distance decision graph that detects local, global and clustered outliers simultaneously. First, kernel density estimation and local reachable distance are combined to calculate the local density. The ratio of the density of an instance's neighbors to the density of the instance itself is taken as its degree of local outlierness. Then, we propose a metric named density lifting distance as the degree of global outlierness, calculated from the distances between an instance and its k nearest neighbors with higher density. The density ratio and the density lifting distance are combined to draw the density–distance decision graph, and the product of the two metrics is taken as the final outlier score. Comprehensive experiments were conducted on 8 synthetic datasets and 16 real-world datasets, comparing the proposed method with 12 state-of-the-art methods. The results show that the proposed method works well both when the samples form clusters with different densities and when the percentage of outliers varies, and that it outperforms the tested state-of-the-art methods in terms of AUC.
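The scoring pipeline described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the details below are assumptions rather than the authors' definitions: a Gaussian kernel applied to LOF-style reachability distances, a single bandwidth `h` set to the mean k-distance, plain averages for both metrics, and a fallback to ordinary nearest-neighbor distances for the densest point. The function name `outlier_scores` and the parameters `k` and `eps` are hypothetical.

```python
import math

def outlier_scores(points, k=3, eps=1e-12):
    """Illustrative score: density ratio * density lifting distance."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbors of each point (excluding itself), nearest first.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor.
    kdist = [dist[i][knn[i][-1]] for i in range(n)]
    # Assumed bandwidth: mean k-distance (1.0 if all points coincide).
    h = (sum(kdist) / n) or 1.0

    # Local density: Gaussian kernel over reachability distances
    # (assumed form; eps keeps the later division well defined).
    density = []
    for i in range(n):
        total = 0.0
        for j in knn[i]:
            reach = max(dist[i][j], kdist[j])
            total += math.exp(-0.5 * (reach / h) ** 2)
        density.append(total / k + eps)

    scores = []
    for i in range(n):
        # Degree of local outlierness: mean neighbor density over own density.
        ratio = sum(density[j] for j in knn[i]) / (k * density[i])
        # Degree of global outlierness: mean distance to the k nearest
        # points that have strictly higher density.
        higher = sorted(dist[i][j] for j in range(n) if density[j] > density[i])[:k]
        if not higher:
            # Assumption: the densest point falls back to its plain k-NN distances.
            higher = [dist[i][j] for j in knn[i]]
        lift = sum(higher) / len(higher)
        # Final score: product of the two metrics.
        scores.append(ratio * lift)
    return scores
```

Plotting `ratio` against `lift` for every point would give a decision graph in the spirit of the one the abstract mentions; points far from the origin on either axis are outlier candidates. The sketch builds the full distance matrix, so it is quadratic in the number of points and only suitable for small examples.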
More From: Engineering Applications of Artificial Intelligence