Abstract

Purpose The main aim of this study is to build a robust approach that detects outliers in datasets accurately. To this end, a novel approach is introduced to determine the likelihood that an object differs extremely from the general behavior of the entire dataset.
Design/methodology/approach This paper proposes a novel two-level approach that integrates bagging and voting techniques for anomaly detection problems. The proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), uses the Local Outlier Factor (LOF) as the base algorithm and improves its detection rate by means of ensemble methods.
Findings Several experiments were performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF method. According to the results, on average, the BV-LOF approach significantly outperformed LOF on 9 of the 10 datasets.
Research limitations In the BV-LOF approach, the base algorithm is applied to each data subset multiple times, with a different neighborhood size (k) in each case and with different ensemble sizes (T). In this study, the value ranges of k and T were set to [1–100]; however, these ranges can be changed according to the dataset handled and the problem addressed.
Practical implications The proposed method can be applied to datasets from different domains (e.g. health, finance, and manufacturing) without requiring any prior information. Since the BV-LOF method includes two-level ensemble operations, it may require more computational time than single-level ensemble methods; however, this drawback can be overcome by parallelization and by using a suitable data structure such as an R*-tree or a KD-tree.
Originality/value The proposed approach (BV-LOF) investigates multiple neighborhood sizes (k), which allows it to find instances with different local densities and thereby increases the likelihood of detecting outliers that LOF may miss. It also brings benefits such as easy implementation, improved capability, higher applicability, and interpretability.
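
The two-level idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the data subsets are drawn or how the votes are combined, so the sketch assumes random feature subsets for the bagging level and a plain average of LOF scores across k values for the voting level. The names `lof_scores` and `bv_lof_sketch`, the defaults `n_rounds=10` and `k_values=(5, 10, 15)`, and the toy data are all hypothetical. The LOF routine is also simplified to use exactly k neighbours and ignores ties at the k-distance.

```python
import math
import random


def lof_scores(points, k):
    """Simplified LOF: scores near 1 suggest inliers, larger scores outliers.

    Assumption: exactly k neighbours are used per point (ties ignored).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])  # distance to the k-th neighbour

    # Local reachability density; the small constant guards against
    # division by zero when points coincide.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neighbours[i])
        lrd.append(k / (reach + 1e-12))

    # LOF is the mean ratio of the neighbours' density to the point's own.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def bv_lof_sketch(points, n_rounds=10, k_values=(5, 10, 15), seed=0):
    """Two-level ensemble sketch (hypothetical, not the paper's exact method)."""
    rng = random.Random(seed)
    n_features = len(points[0])
    totals = [0.0] * len(points)

    for _ in range(n_rounds):
        # Level 1 (bagging, assumed): project onto a random feature subset.
        size = rng.randint(max(1, n_features // 2), n_features)
        features = rng.sample(range(n_features), size)
        projected = [tuple(p[f] for f in features) for p in points]

        # Level 2 (voting, assumed): accumulate LOF scores over several k.
        for k in k_values:
            for i, score in enumerate(lof_scores(projected, k)):
                totals[i] += score

    runs = n_rounds * len(k_values)
    return [t / runs for t in totals]


# Toy data: 40 points around the origin plus one distant point (index 40).
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
data.append((10.0, 10.0))

scores = bv_lof_sketch(data)
top = max(range(len(data)), key=scores.__getitem__)
print("most outlying index:", top, "score:", round(scores[top], 2))
```

Averaging is only one way to combine the runs; a rank-based or majority vote would fit the same structure. Because every (subset, k) run is independent, the inner loops are a natural place for the parallelization mentioned under practical implications.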
