Abstract

With accurate data, governments can make well-informed decisions to keep people safer during pandemics such as COVID-19. In such events, data reliability is crucial, and outlier detection is therefore an important and even unavoidable issue. Outliers are often considered the most interesting observations, because the fact that they differ from the data majority may lead to relevant findings in the subject area. Outlier detection has also been addressed in the context of multivariate functional data, that is, smooth functions of several characteristics, often derived from measurements at different time points (Hubert et al. in Stat Methods Appl 24(2):177–202, 2015b). Here the underlying data are regarded as compositions, with the compositional parts forming the multivariate information, and thus only relative information in terms of log-ratios between these parts is considered relevant for the analysis. The multivariate functional data therefore have to be derived as smooth functions from this relative information. Subsequently, already established multivariate functional outlier detection procedures can be used, but for interpretation purposes the functional data need to be presented in an appropriate space. The methodology is illustrated with publicly available data on the COVID-19 pandemic to find countries displaying outlying trends.

Highlights

  • The crisis caused by COVID-19 in almost all areas of life has revealed that accurate data collection is a challenge that cannot be resolved because of political or logistical problems

  • Many countries report the number of cases, deaths, tests, and further parameters related to the COVID-19 pandemic regularly over time, and the data are accessible in public data repositories

  • The source of information for the analysis would then be not the number of cases, deaths, tests, etc., for a particular day in a particular country, but the (log-)ratios between these numbers. This is what is done in compositional data analysis, and outlier detection in this context will focus on atypical behaviour in the multivariate information of such (log-)ratios


Summary

12.1 Introduction

The crisis caused by COVID-19 in almost all areas of life has revealed that accurate data collection is a challenge that cannot be resolved because of political or logistical problems. Many countries report the number of cases, deaths, tests, and further parameters (variables) related to the COVID-19 pandemic regularly over time, and the data are accessible in public data repositories. Instead of directly considering the reported numbers (represented by the functions), one could focus on analysing relative information. This can be done by taking (log-)ratios between the variables. The source of information for the analysis would then be not the number of cases, deaths, tests, etc., for a particular day in a particular country, but the (log-)ratios between these numbers. This is what is done in compositional data analysis, and outlier detection in this context will focus on atypical behaviour in the multivariate information of such (log-)ratios. In this paper we consider a new method for the detection of outliers in the compositional functional data setting.
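
As a minimal sketch of what taking (log-)ratios between the variables can look like in practice, the following Python snippet computes pairwise log-ratios and the centred log-ratio (clr) for one country on one day. The counts are hypothetical, the function names are illustrative and do not come from the chapter, and the clr is shown only as one common choice in compositional data analysis; this section does not state which log-ratio coordinates the authors use.

```python
import math


def pairwise_log_ratios(parts):
    """Return log(x_a / x_b) for every unordered pair of named parts.

    Assumes all parts are strictly positive; zero counts would need
    separate treatment before any log-ratio can be taken.
    """
    names = list(parts)
    return {
        (a, b): math.log(parts[a] / parts[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def clr(parts):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logs = {name: math.log(value) for name, value in parts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {name: value - mean_log for name, value in logs.items()}


# Hypothetical reported numbers for one country on one day.
day = {"cases": 1200.0, "deaths": 30.0, "tests": 48000.0}

# The same composition reported at ten times the scale.
scaled = {name: 10 * value for name, value in day.items()}

log_ratios = pairwise_log_ratios(day)
log_ratios_scaled = pairwise_log_ratios(scaled)
clr_day = clr(day)
```

Multiplying every part by a common factor leaves all log-ratios unchanged, which is why only relative information enters the analysis; the clr coefficients additionally sum to zero by construction.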

12.1.1 Compositional Data Analysis Concepts
12.1.2 Functional Data
12.2 Smoothing for CODA Time Series
12.3 Outlier Detection in Compositional FDA
12.4 Application to COVID-19 Data
12.5 Summary and Conclusions
Methods
