Abstract

We consider functional outlier detection from a geometric perspective, specifically for functional datasets drawn from a functional manifold that is defined by the data’s modes of variation in shape, translation, and phase. Based on this manifold, we develop a conceptualization of functional outlier detection that is more widely applicable and realistic than previously proposed taxonomies. Our theoretical and experimental analyses demonstrate several important advantages of this perspective. It considerably improves theoretical understanding and allows complex functional outlier scenarios to be described and analyzed consistently and in full generality, by differentiating between structurally anomalous outlier data that are off-manifold and distributionally outlying data that are on-manifold but at its margins. It also improves the practical feasibility of functional outlier detection: we show that simple manifold-learning methods can be used to reliably infer and visualize the geometric structure of functional datasets. We further show that standard outlier-detection methods requiring tabular data inputs can be applied to functional data very successfully by simply using the vector-valued representations learned by manifold-learning methods as input features. Our experiments on synthetic and real datasets demonstrate that this approach leads to outlier-detection performance at least on par with existing functional-data-specific methods in a large variety of settings, without the highly specialized, complex methodology and narrow domain of application these methods often entail.

Highlights

  • We demonstrate that procedures based on this perspective simplify and improve functional outlier detection in practice. This suggests a principled yet flexible approach for applying well-established, highly performant standard outlier-detection methods such as local outlier factors (LOF) [6] to functional data, based on embedding coordinates obtained via manifold learning or dimension-reduction methods.

  • In addition to applying LOF to 5D embeddings and directly to the functional data, we investigate the performance of four “functional data”-specific outlier-detection methods: directional outlyingness (DO) [14,34], total variational depth (TV) [44], elastic depth (ED_amp, ED_pha) [9], and the approach based on translation, phase, and amplitude boxplots (AP_BOX) presented by Xie et al. [15].

  • We note that LOF applied directly to functional data distances yielded results very similar to those of LOF applied to their 5D embeddings.
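The comparison in the highlights above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy curves (phase-shifted sines plus two linear, structurally different curves), the grid-based L2 distance, the use of classical (Torgerson) MDS as the embedding method, the helper name `classical_mds`, and the choice of 10 neighbors for LOF. Only the overall idea, LOF applied to 5D embedding coordinates and to the distance matrix directly, follows the text.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)

# Hypothetical toy data: 40 phase-shifted sine curves (on-manifold inliers)
# and 2 linear curves with a structurally different shape (off-manifold).
phases = rng.uniform(-0.1, 0.1, size=40)
inliers = np.sin(2 * np.pi * (t[None, :] + phases[:, None]))
outliers = np.vstack([4.0 * t - 2.0, 2.0 - 4.0 * t])
curves = np.vstack([inliers, outliers])
curves = curves + rng.normal(scale=0.02, size=curves.shape)  # small observation noise

# Grid approximation of the pairwise L2 distances between curves.
diffs = curves[:, None, :] - curves[None, :, :]
dist = np.sqrt(np.mean(diffs ** 2, axis=2))


def classical_mds(dist, n_components):
    """Classical (Torgerson) MDS from a matrix of pairwise distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


# Variant 1: LOF on 5D embedding coordinates (tabular input).
embedding = classical_mds(dist, n_components=5)
lof_emb = LocalOutlierFactor(n_neighbors=10).fit(embedding)
scores_emb = -lof_emb.negative_outlier_factor_  # larger = more outlying

# Variant 2: LOF applied directly to the functional distance matrix.
lof_dist = LocalOutlierFactor(n_neighbors=10, metric="precomputed").fit(dist)
scores_dist = -lof_dist.negative_outlier_factor_

# Indices of the two most outlying curves under each variant.
top_emb = sorted(int(i) for i in np.argsort(scores_emb)[-2:])
top_dist = sorted(int(i) for i in np.argsort(scores_dist)[-2:])
print(top_emb, top_dist)
```

In this toy setting the two variants should rank the curves very similarly, because a 5D embedding preserves most of the distance structure of such low-dimensional data. On real functional data, the embedding method, its dimension, and the number of neighbors would all need to be chosen with care.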


Summary

Introduction

To cut through the confusion, we propose a geometric perspective on functional outlier detection based on the well-known “manifold hypothesis” [4,5]. This refers to the assumption that ostensibly complex, high-dimensional data lie on a much simpler, lower-dimensional manifold embedded in the observation space, and that this manifold’s structure can be learned and represented in a low-dimensional space, often called the embedding space. Functional data (FD) usually contain shape and translation as well as phase variation, i.e., both “vertical” and “horizontal” variability. These different kinds of variability make it difficult to precisely define and differentiate the various forms of functional outliers and to develop methods that can “catch them all”, which has made outlier detection a highly investigated research topic in functional data analysis (FDA). Arribas-Gil and Romo [2] argued that the outlier taxonomy proposed by Hubert et al. [3] can be made more precise in terms of expectation functions f(t) and g(t), with f(t) a “common” process; see Figure 1.

