Analysis of Machine Learning Algorithms for Anomaly Detection on Edge Devices.

Aleks Huč, Jakob Šalej, Mira Trebar

doi:10.3390/s21144946


Open Access

https://doi.org/10.3390/s21144946


Journal: Sensors (Basel, Switzerland)	Publication Date: Jul 20, 2021
Citations: 9	License type: CC BY 4.0

Affiliation: University of Ljubljana

Abstract

The Internet of Things (IoT) consists of small devices or a network of sensors that continuously generate huge amounts of data. These devices usually have limited resources, in either computing power or memory, which means that raw data are transferred to central systems or the cloud for analysis. Lately, the idea of moving intelligence to the IoT has become feasible, with machine learning (ML) moved to edge devices. The aim of this study is to provide an experimental analysis of processing a large imbalanced dataset (DS2OS), split into a training dataset (80%) and a test dataset (20%). The training dataset was reduced by randomly selecting a smaller number of samples to create new datasets Di (i = 1, 2, 5, 10, 15, 20, 40, 60, 80%). These datasets were then used with several machine learning algorithms to identify the size at which the performance metrics saturate and classification results stop improving, with an F1 score of 0.95 or higher; this happened at 20% of the training dataset. Two further solutions for reducing the number of samples to provide a balanced dataset are given. In the first, datasets DRi consist of all anomalous samples in seven classes and a reduced majority class (‘NL’) with i = 0.1, 0.2, 0.5, 1, 2, 5, 10, 15, 20 percent of randomly selected samples. In the second, datasets DCi are generated from representative samples determined by clustering the training dataset. All three dataset reduction methods showed comparable performance results. Further evaluation of training times and memory usage on a Raspberry Pi 4 shows that it is possible to run ML algorithms with limited-size datasets on edge devices.
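The two random reduction schemes described in the abstract (Di and DRi) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic data, and the anomaly labels `A0` to `A6` are hypothetical stand-ins for DS2OS, and only the 80/20 split, the percentage-based sampling, and the ‘NL’ majority label come from the text. The clustering-based DCi variant is not covered here.

```python
import random
from collections import Counter

def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def random_subset(train, percent, seed=0):
    """D_i: keep a random `percent` % of all training samples."""
    rng = random.Random(seed)
    k = max(1, round(len(train) * percent / 100))
    return rng.sample(train, k)

def reduce_majority(train, percent, majority_label="NL", seed=0):
    """DR_i: keep every anomalous sample plus `percent` % of the majority class."""
    rng = random.Random(seed)
    majority = [s for s in train if s[1] == majority_label]
    anomalies = [s for s in train if s[1] != majority_label]
    k = max(1, round(len(majority) * percent / 100))
    return anomalies + rng.sample(majority, k)

# Synthetic, heavily imbalanced (feature, label) pairs; NOT real DS2OS data.
# The seven anomaly labels A0..A6 are hypothetical placeholders.
gen = random.Random(42)
data = [(gen.random(), "NL") for _ in range(9700)]
data += [(gen.random(), f"A{i % 7}") for i in range(300)]

train, test = split_train_test(data)   # 80% / 20%
d20 = random_subset(train, 20)         # D_20: 20% of the training set
dr1 = reduce_majority(train, 1)        # DR_1: all anomalies + 1% of 'NL'

print(len(train), len(test), len(d20))
print(Counter(label for _, label in dr1))
```

Random sampling of the whole training set keeps the original class imbalance, whereas reducing only the majority class shifts the class ratio toward the anomalies, which is the sense in which the abstract calls the DRi datasets balanced.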

Highlights

Balanced datasets (DCi): selected clusters of representative samples from the training dataset.
A detailed performance evaluation of machine learning (ML) algorithms for these models was performed with 5-fold cross-validation.
The detailed analysis is based on measurements of accuracy, F1 score, and confusion matrices to give a comparison.
The described performance metrics are shown as a comparison of results to identify which approach is viable for implementation as edge computing.
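The metrics named in these highlights (accuracy, F1 score, and the confusion matrix) can be computed from predicted and true labels as in the sketch below. The labels and predictions are made up for illustration, and macro averaging of the per-class F1 scores is an assumption, since the text here does not state which averaging the authors used.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def macro_f1(cm):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for i in range(len(cm)):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                 # true class i, predicted otherwise
        fp = sum(row[i] for row in cm) - tp  # predicted class i, true otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical labels: 'NL' (normal) plus two placeholder anomaly classes.
labels = ["NL", "A0", "A1"]
y_true = ["NL", "NL", "NL", "NL", "A0", "A0", "A1", "A1"]
y_pred = ["NL", "NL", "NL", "A0", "A0", "A0", "A1", "NL"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm)
print(accuracy(cm), macro_f1(cm))
```

On an imbalanced dataset, accuracy is dominated by the majority class, so a per-class measure such as F1 is generally more informative about how well the rare anomalous classes are detected.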

Summary

Introduction

Fog, edge, and mist computing are well-known paradigms introduced in the Internet of Things (IoT) [1]. By keeping processing closer to the edge of the network, requirements such as low latency, privacy, and location awareness can be met more easily, with the added benefit that raw data are not sent to the cloud. The authors of [1] provide an overview of fog computing and other related paradigms, such as edge computing, mist computing, and mobile computing.

Objectives

Results

Conclusion