Outlier detection: How to Select k for k-nearest-neighbors-based outlier detectors

Jiawei Yang, Xu Tan, Sylwan Rahardja

doi:10.1016/j.patrec.2023.08.020

Abstract

Unsupervised k-nearest-neighbor-based outlier detectors play a vital role in data science research. However, their performance depends on the choice of the parameter k, and autonomous selection of the optimal k is poorly documented in the literature because it is very challenging. Conventional methods prove ineffective and lack universality because they fail to account for application and detector factors simultaneously. This article proposes neighborhood consistency, a new concept that addresses the problem of selecting the optimal k by considering both application and detector factors. This concept is used to develop a method termed k finder based on neighborhood consistency (KFC). KFC does not rely on any extra parameter and has linear time complexity. Simulations show that KFC outperforms the baselines and generalizes well across different datasets and detectors. The implementation of the proposed methods is available at www.OutlierNet.com for reproducibility.

Full Text