Abstract
Detecting outliers in infinite-dimensional functional vector spaces is a common need in industry, where the data may originate from physical measurements or numerical simulations. An automatic and unsupervised outlier identification process can help ensure the quality of a dataset (trimming), validate the results of industrial simulation codes, or detect specific phenomena or anomalies. This paper focuses on data originating from expensive simulation codes, in order to take into account the realistic case where only a limited quantity of information about the studied process is available. A detection methodology based on different features, such as the h-mode depth or dynamic time warping, is proposed to evaluate outlyingness in both the magnitude and shape senses. Theoretical examples are used to identify pertinent feature combinations and to showcase the quality of the detection method with respect to state-of-the-art detection methodologies. Finally, we show the practical interest of the method in an industrial context through a nuclear thermal-hydraulic use case, and how it can serve as a tool to perform sensitivity analysis on functional data.
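To make the two features named in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes curves sampled on a common time grid, an unnormalized Gaussian kernel for the h-mode depth, a bandwidth h set to the 15th percentile of the pairwise L2 distances (a common convention, taken here as an assumption), and a dynamic time warping (DTW) distance measured against the pointwise median curve. The function names and the toy dataset are hypothetical.

```python
import numpy as np

def l2_distances(curves, t):
    """Pairwise L2 distances between curves sampled on the common grid t."""
    sq = (curves[:, None, :] - curves[None, :, :]) ** 2
    # Trapezoidal rule along the time axis.
    integral = np.sum(0.5 * (sq[..., 1:] + sq[..., :-1]) * np.diff(t), axis=-1)
    return np.sqrt(integral)

def hmode_depth(curves, t, quantile=15.0):
    """h-mode depth: sum of Gaussian kernel values of the distances to all curves.

    Assumption: the bandwidth h is the given percentile of the pairwise distances,
    and the kernel is left unnormalized (the self term contributes 1).
    """
    dist = l2_distances(curves, t)
    upper = np.triu_indices(len(curves), k=1)
    h = np.percentile(dist[upper], quantile)
    return np.exp(-0.5 * (dist / h) ** 2).sum(axis=1)

def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Toy dataset (hypothetical): 20 regular curves, one magnitude outlier, one shape outlier.
t = np.linspace(0.0, 1.0, 50)
offsets = np.linspace(-0.2, 0.2, 20)
normal = np.sin(2 * np.pi * t)[None, :] + offsets[:, None]
magnitude_outlier = np.sin(2 * np.pi * t) + 3.0   # index 20
shape_outlier = np.sin(6 * np.pi * t)             # index 21
curves = np.vstack([normal, magnitude_outlier, shape_outlier])

depth = hmode_depth(curves, t)                    # low depth suggests outlyingness
reference = np.median(curves, axis=0)             # assumed reference curve for DTW
dtw_feature = np.array([dtw_distance(c, reference) for c in curves])
```

On this toy dataset the two injected curves receive the lowest h-mode depths, and the magnitude outlier has the largest DTW distance to the median curve. How such features are combined into a detection rule is the subject of the paper and is not reproduced here.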
Highlights
This paper deals with the problem of finding outliers, i.e. data that differ distinctly from other elements of the considered dataset, when they belong to functional infinite-dimensional vector spaces
The other main kind of outliers are shape outliers. This type of functional outlier is significantly more difficult to detect, but several techniques able to deal with them have been developed in recent years
It is possible that retaining a higher number of modes in this case could allow better detection capabilities, but this procedure greatly aggravates the curse of dimensionality (even if this subject is not treated in the paper by Hyndman (2009)), and it no longer allows the results to be visualized
Summary
This paper deals with the problem of finding outliers, i.e. data that differ distinctly from other elements of the considered dataset, when they belong to functional infinite-dimensional vector spaces. The most basic form these objects can take is that of one-dimensional real functions, which might represent the evolution of a physical parameter of interest over time. These data are normally generated through an actual empirical measurement or a simulation code. As functional data are infinite-dimensional by nature, functional data analysis methods always rely on a dimension reduction technique, whether implicitly or explicitly. This is the case in the context of classification (Chamroukhi and Nguyen, 2019), clustering (Slaets et al., 2012), and landmark research and registration (Ieva et al., 2011) of functional data. By providing more synthetic descriptors of functional observations, functional data analysis methods allow a more practical treatment of data thanks to the available multivariate tools.
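As an illustration of the dimension reduction step mentioned above, the following Python sketch projects discretized curves onto their first principal modes. Principal component analysis is used here only as one common choice, not necessarily the technique used in the paper. The function name `pca_scores` and the synthetic curves are hypothetical.

```python
import numpy as np

def pca_scores(curves, n_modes=2):
    """Project discretized curves (one per row) onto their first principal modes."""
    centered = curves - curves.mean(axis=0)
    # Rows of vt are orthonormal modes; singular values s measure their weight.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_modes] * s[:n_modes]
    explained = (s[:n_modes] ** 2).sum() / (s ** 2).sum()
    return scores, vt[:n_modes], explained

# Synthetic curves (hypothetical): sinusoids with varying amplitude and phase.
t = np.linspace(0.0, 1.0, 100)
amplitudes = np.linspace(0.5, 1.5, 30)
phases = np.linspace(-0.3, 0.3, 30)
curves = np.array([a * np.sin(2 * np.pi * t + p) for a, p in zip(amplitudes, phases)])

# Each 100-point curve is summarized by two scores, usable with multivariate tools.
scores, modes, explained = pca_scores(curves, n_modes=2)
```

Because every curve in this example is a combination of one sine and one cosine, two modes capture essentially all of the variance. Real datasets generally need a trade-off between the number of retained modes and the dimensionality of the resulting descriptors.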