Robustness Evaluations of Sustainable Machine Learning Models against Data Poisoning Attacks in the Internet of Things

Corey Dunn, Nour Moustafa, Benjamin Turnbull

doi:10.3390/su12166434

Abstract

With the increasing popularity of Internet of Things (IoT) platforms, their cyber security has become a highly active area of research. One key technology underpinning smart IoT systems is machine learning, which classifies and predicts events from large-scale data in IoT networks. Machine learning is susceptible to cyber attacks, particularly data poisoning attacks, which inject false data during training and degrade model performance. Developing trustworthy machine learning models that remain resilient and sustainable against data poisoning attacks in IoT networks is an ongoing research challenge. We studied the effects of data poisoning attacks on four machine learning models (gradient boosting machine, random forest, naive Bayes, and feed-forward deep learning) to determine how far the models can be trusted and considered reliable in real-world IoT settings. In the training phase, a label modification function is developed to manipulate legitimate input classes. The function is applied at data poisoning rates of 5%, 10%, 20%, and 30%, which allows the poisoned models to be compared and their performance degradation to be shown. The models were evaluated using the ToN_IoT and UNSW NB-15 datasets, as they include a wide variety of recent legitimate and attack vectors. The experimental results revealed that the models’ accuracy and detection rates degrade if the number of normal training observations is not significantly larger than the number of poisoned observations. At data poisoning rates of 30% or greater, model performance is significantly degraded.
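The label modification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' function: the name `poison_labels`, the binary class set `(0, 1)`, the fixed seed, and the uniform random choice of which samples to flip are all assumptions made for illustration.

```python
import random


def poison_labels(labels, rate, classes=(0, 1), seed=0):
    """Return a copy of `labels` with a fraction `rate` of entries flipped.

    Hypothetical helper: each selected label is replaced by a different
    class drawn from `classes`, so every selected sample is mislabelled.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    poisoned = list(labels)
    n_flip = round(rate * len(poisoned))
    # Pick distinct positions to poison, then swap each to another class.
    for i in rng.sample(range(len(poisoned)), n_flip):
        alternatives = [c for c in classes if c != poisoned[i]]
        poisoned[i] = rng.choice(alternatives)
    return poisoned


# Toy training labels (assumed): 60 normal (0) and 40 attack (1) records.
clean = [0] * 60 + [1] * 40
for rate in (0.05, 0.10, 0.20, 0.30):
    poisoned = poison_labels(clean, rate)
    changed = sum(a != b for a, b in zip(clean, poisoned))
    print(f"poisoning rate {rate:.0%}: {changed} of {len(clean)} labels changed")
```

Only the training labels are modified in this sketch; a model trained on the poisoned labels would then be compared with one trained on the clean labels, using the same untouched test data.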

Highlights

With an estimated 50 billion active devices by the end of 2020, the Internet of Things (IoT) is one of the fastest developing fields in computing [1].
This is best highlighted in the models trained on the ToN_IoT dataset; between 0% and 20%, the more reliable models showed a drop in accuracy and precision directly proportional to the percentage of poisoned data in the training set.
Based on building shallow and deep learning models using two datasets, the results demonstrated that increasing the number of manipulated labels led to reduced accuracy and increased false positive rates under cross-validation (i.e., k = 10 for the four machine learning models using the two datasets).
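To make the evaluation terms in the highlights concrete, the following sketch shows one way to build k = 10 cross-validation folds and to compute accuracy, detection rate, and false positive rate from binary predictions. The helper names `kfold_indices` and `binary_metrics`, the contiguous unshuffled folds, and the use of label 1 for the attack class are assumptions for illustration, not the paper's implementation.

```python
def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index lists for k contiguous folds.

    Hypothetical helper: folds are not shuffled or stratified here.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, detection rate (recall), and false positive rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Toy usage with assumed labels: one hit, one miss, one false alarm, one true negative.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
print([len(test) for _, test in kfold_indices(25, k=10)])
```

In a poisoning experiment, the metrics would be averaged over the ten folds for each poisoning rate, so that the degradation can be compared across models.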

Summary

Introduction

With an estimated 50 billion active devices by the end of 2020, the Internet of Things (IoT) is one of the fastest developing fields in computing [1]. IoT ecosystems include three main components: small devices that contain sensors and actuators, network communications, and cloud-based storage and processors [17]. The consumer-facing devices are the cyber-physical component, composed of processors, sensors, and/or actuators. These devices utilise minimal processing, are often low-cost, and are designed to rely on network connectivity. The cloud represents the second half of an IoT ecosystem and is often forgotten, as little is known about how data are collected, analysed, and output. This half is vulnerable, and the implementation of its different systems is not open to public analysis.

Objectives

Methods

Results

Discussion

Conclusion