Abstract

Dufrenois [1] recently proposed a new Fisher-type contrast measure to extract a target population from a dataset contaminated by outliers. Although mathematically sound, this work has several shortcomings in both its formalism and its range of application. First, we propose to re-express the problem in the formalism of proximal support vector machines, as introduced by Mangasarian and Wild [2]. This change is far from cosmetic, since it yields a formulation well suited to solving the problem. Another limiting factor is that the method's performance relies on the assumption that the densities of the target and outlier populations differ. This assumption can easily prove over-optimistic for real-world datasets, making the method unreliable, at least when applied directly. In addition, computing the decision boundary is a time-consuming part of the algorithm, since it requires solving a generalized eigenvalue problem (GEP). The method is therefore limited to medium-sized datasets. In this paper, we propose strategies to overcome these shortcomings and to fully exploit the approach. First, we show that, under some conditions, generating appropriate artificial outliers keeps the problem within the constraints of the method and thus broadens its conditions of use. Second, we show that the GEP can be advantageously replaced by a conjugate gradient (CG) solution, which significantly decreases the computational cost. Finally, the proposed algorithm is compared with recent novelty detectors on synthetic and real datasets.
