Abstract

The ability to collect and store ever more massive data, often unlabeled, has been accompanied by the need to process them efficiently in order to extract relevant information and possibly design solutions based on it. In many situations, the vast majority of observations exhibit the same behavior, while a small proportion deviates from it. Detecting these outlying observations, also called anomalies, is now one of the major challenges in machine learning applications (e.g. fraud detection or predictive maintenance). We propose a novel methodology for outlier/anomaly detection that learns a scoring function on the feature space, so that observations can be ranked by degree of abnormality. The scoring function is built by maximizing an empirical performance criterion that takes the form of a two-sample linear rank statistic. We show that bipartite ranking algorithms can thus be used to learn nearly optimal scoring functions with provable theoretical guarantees. We illustrate our methodology with numerical experiments based on openly available code.
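To make the criterion more concrete, here is a minimal Python sketch of a generic empirical two-sample linear rank statistic evaluated for a given scoring function s. It is not the authors' implementation. The names `ranks`, `linear_rank_statistic` and `empirical_auc`, the default score-generating function phi(u) = u, and the choice of which sample plays which role are all illustrative assumptions. The scores of both samples are pooled and ranked, and phi(R_i/(N+1)) is summed over the first sample. The statistic therefore grows as s ranks that sample's observations higher. With phi(u) = u this is a rescaled Wilcoxon rank-sum statistic, which is an affine function of the empirical AUC, the criterion that many bipartite ranking algorithms maximize.

```python
from typing import Any, Callable, Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def _pooled_ranks(score: Callable[[Any], float],
                  sample_a: Sequence[Any],
                  sample_b: Sequence[Any]) -> list[float]:
    """Rank the scores s(x) of both samples jointly; sample_a comes first."""
    pooled = [score(x) for x in sample_a] + [score(x) for x in sample_b]
    return ranks(pooled)


def linear_rank_statistic(score: Callable[[Any], float],
                          sample_a: Sequence[Any],
                          sample_b: Sequence[Any],
                          phi: Callable[[float], float] = lambda u: u) -> float:
    """Sum of phi(R_i / (N + 1)) over sample_a, where R_i is the rank of
    s(x_i) among the N pooled scores (phi is an assumed score-generating function)."""
    r = _pooled_ranks(score, sample_a, sample_b)
    n_total = len(r)
    return sum(phi(r[i] / (n_total + 1)) for i in range(len(sample_a)))


def empirical_auc(score: Callable[[Any], float],
                  sample_a: Sequence[Any],
                  sample_b: Sequence[Any]) -> float:
    """Fraction of (a, b) pairs with s(a) > s(b), ties counting 1/2,
    computed from the rank sum of sample_a (Mann-Whitney identity)."""
    r = _pooled_ranks(score, sample_a, sample_b)
    n, m = len(sample_a), len(sample_b)
    rank_sum_a = sum(r[:n])
    return (rank_sum_a - n * (n + 1) / 2) / (n * m)
```

For example, with the identity as scoring function, `linear_rank_statistic(lambda x: x, [3.0, 4.0], [1.0, 2.0])` evaluates to 1.4 (up to floating-point rounding), and `empirical_auc` returns 1.0 on the same samples because every observation of the first sample outranks every observation of the second. Passing a different `phi` changes how much weight each region of the ranking receives in the criterion.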
