Abstract

The detection of anomalous data patterns is one of the most prominent machine learning use cases in industrial applications. Unfortunately very often there are no ground truth labels available and therefore it is good practice to combine different unsupervised base learners with the hope to improve the overall predictive quality. Here one of the challenges is to combine base learners that are accurate and divers at the same time, where another challenge is to enable model explainability. In this paper we present BHAD, a fast unsupervised Bayesian histogram anomaly detector, which scales linearly with the sample size and the number of attributes and is shown to have very competitive accuracy compared to other analyzed anomaly detectors. For the problem of model explainability in unsupervised outlier ensembles we introduce a generic model explanation approach using a supervised surrogate model. For the problem of ensemble construction we propose a greedy model selection approach using the mutual information of two score distributions as a similarity measure. Finally we give a detailed description of a real fraud detection application from the corporate insurance domain using an outlier ensemble, we share various feature engineering ideas as well as discuss practical challenges.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.