Abstract

Sensor networks in real-world environments, such as smart cities or ambient intelligence platforms, provide applications with large and heterogeneous sets of data streams. Detecting outliers (observations that do not conform to an expected behavior) has therefore become a crucial task for establishing and maintaining secure and reliable databases on platforms of this kind. However, the procedures that build accurate models of erratic observations must operate with low complexity in terms of storage and computational time, in order to accommodate the limited processing and storage capabilities of the sensor nodes in these environments. In this work, we analyze three binary classifiers for outlier detection with low memory consumption and computational time, each built on a statistical prediction model: ARIMA (Auto-Regressive Integrated Moving Average), GAM (Generalized Additive Model), and LOESS (LOcal RegrESSion). As a result, we provide (1) the best classifier and settings for detecting outliers, based on the ARIMA model, and (2) two real-world classified datasets as ground truths for future research.
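The classifiers described above share one idea: a prediction model forecasts each observation, and a reading is labeled an outlier when it falls too far from its forecast. The Python sketch below illustrates that idea for an ARIMA(1,1,0) model with drift, written without external libraries. The model order, the simple least-squares fit on first differences, the threshold of k = 3 residual standard deviations, the replacement of flagged readings by their forecasts, the synthetic series, and the helper names (`fit_arima_110`, `forecast_next`, `classify`) are all illustrative assumptions, not the settings or code evaluated in the paper.

```python
from statistics import fmean, pstdev

# Hypothetical sketch: ARIMA(1,1,0) with drift as the prediction model of a
# binary outlier classifier. Order, fitting method and threshold are assumptions.


def fit_arima_110(train):
    """Fit d_t - mu = phi * (d_{t-1} - mu) + e_t on first differences d_t."""
    d = [b - a for a, b in zip(train, train[1:])]
    mu = fmean(d)                    # drift: mean first difference
    c = [v - mu for v in d]          # centered differences
    num = sum(c[t] * c[t - 1] for t in range(1, len(c)))
    den = sum(c[t - 1] ** 2 for t in range(1, len(c)))
    phi = num / den if den else 0.0  # least-squares AR(1) coefficient
    # In-sample one-step residuals set the noise scale used by the threshold.
    resid = [c[t] - phi * c[t - 1] for t in range(1, len(c))]
    return mu, phi, pstdev(resid)


def forecast_next(history, mu, phi):
    """One-step-ahead forecast from the last two stored values."""
    last_diff = history[-1] - history[-2]
    return history[-1] + mu + phi * (last_diff - mu)


def classify(train, stream, k=3.0):
    """Label each new reading True (outlier) or False (normal)."""
    mu, phi, sigma = fit_arima_110(train)
    history = list(train)
    labels = []
    for x in stream:
        pred = forecast_next(history, mu, phi)
        is_outlier = abs(x - pred) > k * sigma
        labels.append(is_outlier)
        # Store the forecast instead of a flagged reading so that one spike
        # does not distort the following forecasts.
        history.append(pred if is_outlier else x)
    return labels


def signal(t):
    # Hypothetical sensor series: linear trend plus a small periodic wiggle.
    return 10 + 0.5 * t + (0.0, 0.3, -0.3)[t % 3]


train = [signal(t) for t in range(60)]
stream = [signal(t) for t in range(60, 75)]
stream[7] += 8.0                     # injected spike
labels = classify(train, stream)
print([i for i, flag in enumerate(labels) if flag])  # → [7]
```

Storing the forecast in place of a flagged reading keeps a single spike from inflating the residuals of the next two forecasts. A deployed classifier would more likely fit the model with a dedicated library and tune the order and the threshold for each dataset and goal.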

Highlights

  • In the last two decades, technological innovation has led to intelligent environments [1] such as smart homes [2], smart hospitals [3], or even smart cities [4]

  • The three statistical prediction models we evaluate in the present work are ARIMA (Auto-Regressive Integrated Moving Average), GAM (Generalized Additive Model), and LOESS (LOcal RegrESSion), each oriented to particular kinds of time series

  • In terms of accuracy, ARIMA achieves the best scores for both Research question 1 (RQ1) and Research question 2 (RQ2); however, the model settings must differ depending on the specific goal, since the parameterization for RQ1 is not valid for RQ2 and vice versa
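LOESS can drive the same kind of prediction-based binary classifier. As an illustration only, the Python sketch below implements a minimal LOESS (local linear regression with tricube weights, without the robustness iterations of the full method) and flags points whose residual lies more than k robust standard deviations (1.4826 × MAD) from the median residual. The span `frac = 0.5`, the threshold `k = 3.5`, the synthetic data, and the function names are assumptions chosen for the example, not the parameterization used in the paper.

```python
import math
from statistics import median

# Hypothetical sketch: a minimal LOESS (local linear fit, tricube weights, no
# robustness iterations) used as the prediction model of an outlier classifier.


def loess_fit(xs, ys, frac=0.5):
    """Return the LOESS fitted value at every x in xs."""
    q = max(2, math.ceil(frac * len(xs)))  # neighbors per local fit
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[q - 1] or 1.0  # q-th nearest distance
        # Tricube weights; points at distance >= h get weight 0.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        xm = sum(wi * x for wi, x in zip(w, xs)) / sw
        ym = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xm) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xm) * (y - ym) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx else 0.0
        fitted.append(ym + slope * (x0 - xm))  # weighted line evaluated at x0
    return fitted


def loess_outliers(xs, ys, frac=0.5, k=3.5):
    """Flag points whose residual is more than k robust SDs from the median."""
    resid = [y - f for y, f in zip(ys, loess_fit(xs, ys, frac))]
    med = median(resid)
    mad = median([abs(r - med) for r in resid])
    scale = 1.4826 * mad or 1e-12  # MAD-based estimate of the residual SD
    return [abs(r - med) > k * scale for r in resid]


xs = list(range(41))
ys = [2 + 0.1 * t + (0.0, 0.3, -0.3)[t % 3] for t in xs]  # hypothetical data
ys[20] += 4.0                                            # injected spike
flags = loess_outliers(xs, ys)
print([i for i, f in enumerate(flags) if f])  # → [20]
```

Because this LOESS omits robustness iterations, the spike pulls the fitted values of its neighbors slightly upward; a larger span dilutes that effect at the cost of following the series less closely.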


Summary

Introduction

In the last two decades, technological innovation has led to intelligent environments [1] such as smart homes [2], smart hospitals [3], or even smart cities [4]. IT (Information Technology) infrastructures such as sensors support decision-making by providing end users with real-time information about the environment, leveraging interconnected devices in a large number of domains. Outliers are observations that deviate from other observations in a sample and do not conform to an expected pattern or to the other items in a dataset [9]. They may correspond either to inconsistent data or to correct data that may point to missing values. The existing works described in the following subsections apply different approaches to detect outliers, which can be grouped mainly into (a) statistical techniques and (b) data-mining or machine-learning algorithms. Both types can be further divided into unsupervised and supervised methods. In unsupervised approaches, observations are not labeled beforehand, that is, they have not been previously classified as outliers or not.
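As a concrete example of an unsupervised statistical technique (one that needs no labeled observations), the sketch below computes modified z-scores from the median and the median absolute deviation (MAD) and labels a reading as an outlier when its score exceeds 3.5, a commonly used cutoff. This rule is not one of the prediction-based classifiers studied in the paper; the readings and the function names are hypothetical.

```python
from statistics import median

# Hypothetical example of an unsupervised statistical rule: modified z-scores
# based on the median and the median absolute deviation (MAD).


def modified_z_scores(values):
    """Return 0.6745 * (x - median) / MAD for every value."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values equal the median, so MAD is zero and
        # there is no robust spread to compare against: report no outliers.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def label_outliers(values, threshold=3.5):
    """Label each unlabeled reading True (outlier) or False (normal)."""
    return [abs(z) > threshold for z in modified_z_scores(values)]


readings = [21.0, 21.4, 20.9, 21.2, 35.7, 21.1, 20.8, 21.3]  # hypothetical data
print(label_outliers(readings))
# → [False, False, False, False, True, False, False, False]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme reading barely moves them, so the outlier cannot mask itself by inflating the spread estimate.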
