Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning

Jorge Meira, Carlos Eiras-Franco, Verónica Bolón-Canedo, Goreti Marreiros, Amparo Alonso-Betanzos

doi:10.1016/j.ins.2022.06.035

Open Access

Abstract

This paper presents LSHAD, an anomaly detection (AD) method based on Locality-Sensitive Hashing (LSH) that is capable of handling large-scale datasets. The resulting algorithm is highly parallelizable, and its implementation in Apache Spark further extends its ability to process very large datasets. Moreover, the algorithm incorporates an automatic hyperparameter tuning mechanism, so users do not have to perform costly manual tuning. LSHAD is novel because hyperparameter automation and distributed execution are both uncommon in AD techniques. Experiments with LSHAD across a variety of datasets indicate state-of-the-art AD performance while handling much larger datasets than state-of-the-art alternatives. In addition, an evaluation of the trade-off between AD performance and scalability shows that our method offers significant advantages over competing methods.
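
The abstract does not describe how LSHAD actually scores anomalies. As a rough illustration of the general idea behind LSH-based anomaly detection, and explicitly not the authors' LSHAD algorithm, the sketch below hashes points with Euclidean (p-stable) LSH functions h(x) = floor((a·x + b) / w) and treats points that land in sparsely populated buckets across several hash tables as more anomalous. The function names, the scoring rule (one minus the bucket's share of the dataset, averaged over tables) and the parameters `num_tables`, `num_hashes` and `width` are assumptions chosen for this example; it includes no hyperparameter autotuning and no distribution.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Illustrative sketch only (assumed design, not LSHAD itself).
# A hash function is a random projection vector `a` plus an offset `b`.
HashFn = Tuple[List[float], float]


def make_tables(dim: int, num_tables: int, num_hashes: int,
                width: float, seed: int = 0) -> List[List[HashFn]]:
    """Draw `num_tables` groups of p-stable (Gaussian) LSH functions."""
    rng = random.Random(seed)
    tables = []
    for _ in range(num_tables):
        funcs = []
        for _ in range(num_hashes):
            a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian projection
            b = rng.uniform(0.0, width)                     # random offset in [0, w]
            funcs.append((a, b))
        tables.append(funcs)
    return tables


def bucket_key(x: Sequence[float], funcs: List[HashFn],
               width: float) -> Tuple[int, ...]:
    """Concatenate h(x) = floor((a . x + b) / w) over one table's functions."""
    return tuple(
        math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)
        for a, b in funcs
    )


def lsh_anomaly_scores(data: List[Sequence[float]], num_tables: int = 16,
                       num_hashes: int = 2, width: float = 1.0,
                       seed: int = 0) -> List[float]:
    """Score in [0, 1): higher means the point shares buckets with fewer others."""
    n = len(data)
    tables = make_tables(len(data[0]), num_tables, num_hashes, width, seed)
    scores = [0.0] * n
    for funcs in tables:
        keys = [bucket_key(x, funcs, width) for x in data]
        counts = Counter(keys)  # bucket occupancy for this table
        for i, key in enumerate(keys):
            # Alone in its bucket -> 1 - 1/n; bucket holds everything -> 0.
            scores[i] += 1.0 - counts[key] / n
    return [s / num_tables for s in scores]


if __name__ == "__main__":
    # A tight 7x7 grid of "normal" points plus one distant point.
    cluster = [(0.01 * i, 0.01 * j) for i in range(7) for j in range(7)]
    data = cluster + [(10.0, 10.0)]
    scores = lsh_anomaly_scores(data)
    top = max(range(len(data)), key=scores.__getitem__)
    print("most anomalous point:", data[top], "score:", round(scores[top], 3))
```

In a sketch like this, each hash table is built and counted independently, and bucket occupancy is a simple group-by aggregation, so the work splits naturally across workers in a data-parallel engine. The bucket width `width` (together with `num_tables` and `num_hashes`) is the kind of hyperparameter that strongly affects results and that an automatic tuning mechanism would have to choose.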