A Novel Approach of Ensemble Learning with Feature Reduction for Classification of Binary and Multiclass IoT Data

Mr. Vijay M. Khadse et al.

doi:10.17762/turcomat.v12i6.4811

Abstract

The number of network- and sensor-enabled devices in Internet of Things (IoT) domains is growing rapidly, producing huge volumes of data. These data contain important information that can be used in areas such as science, industry, medicine, and even social life. To make an IoT system smart, the only solution is to turn to machine learning. Many machine learning algorithms have been introduced for handling such large amounts of IoT data, but it is very difficult to find the best-suited algorithm for a given problem in the IoT domain. This study combines three ensemble models and proposes a new model termed the “hybrid model”. Features are extracted from raw datasets drawn from diverse IoT domains using principal component analysis (PCA), linear discriminant analysis (LDA), and Isomap for classification problems. The classifiers are compared in terms of accuracy, area under the curve (AUC), and F1 score. The experimental results show that the hybrid model with PCA and the stacking ensemble technique with PCA have better overall performance than the other ensemble techniques on binary-class and multiclass datasets, respectively.
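The three evaluation measures named in the abstract can be illustrated with a minimal pure-Python sketch for the binary case. The function names, the toy labels, and the 0.5 decision threshold are illustrative assumptions, not taken from the paper; AUC is computed here as the fraction of positive/negative pairs that the scores rank correctly, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 score for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (hypothetical values for illustration only).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_binary(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Accuracy and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank the samples, which is one reason the three are often reported together.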

Highlights

The Internet of Things (IoT) is one of the most widely spreading fields in every aspect of human life (Singh and Singh, 2015)
This study considers principal component analysis (PCA), linear discriminant analysis (LDA), and Isomap as feature reduction techniques to improve model performance on diverse multi-domain binary and multiclass IoT datasets
This comparative study investigated the possibility of applying bagging, boosting, stacking and hybrid ensemble algorithms with PCA, LDA and Isomap to improve the performance on IoT sensor datasets
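As a rough illustration of the comparison described in these highlights, the following sketch pairs PCA with bagging, boosting, and stacking ensembles in scikit-learn. Everything specific here is an assumption made for the example: the synthetic dataset standing in for IoT sensor data, the choice of base learners, the number of components, and the hyperparameters. The hybrid model is not reproduced, because this summary does not say how the three ensembles are combined.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary dataset standing in for IoT sensor readings (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Three ensemble families; base learners and sizes are illustrative choices.
ensembles = {
    "bagging": BaggingClassifier(n_estimators=25, random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
            ("logreg", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

results = {}
for name, clf in ensembles.items():
    # Scale, reduce to 5 principal components (assumed), then classify.
    model = make_pipeline(StandardScaler(), PCA(n_components=5), clf)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
        "f1": f1_score(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Swapping `PCA` for `sklearn.manifold.Isomap` or `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` in the pipeline would give the other two reduction variants; note that LDA yields at most one component for a binary problem, so `n_components` would need adjusting.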

Summary

Introduction

The Internet of Things (IoT) is one of the most widely spreading fields in every aspect of human life (Singh and Singh, 2015). IoT devices generate huge amounts of data in every field of application. The data generated by IoT systems are mostly continuous values. Continuous data have an advantage over categorical data: they are naturally ordered, and similarity and distance functions can be defined on them (Boriah et al., 2008; Wilson and Martinez, 1997). Raw data generated by IoT devices need to be abstracted. Ensemble models are widely used in ML and pattern recognition applications because they can significantly improve accuracy compared with base algorithms.

Methods

Results

Conclusion