Abstract

The Internet of Things (IoT) and sensors are becoming increasingly popular, especially for monitoring large, ambient environments. Applications that embrace IoT and sensors often require mining the data feeds, collected at frequent intervals, for intelligence. Although such sensor data are massive, most of the contents are identical and repetitive; for example, human traffic in a park at night. Most traditional classification algorithms were formulated decades ago and were not designed to handle such sensor data effectively. Hence, the performance of the learned model is often poor because of the small granularity in classification and the sporadic patterns in the data. To improve the quality of data mining from IoT data, a new pre-processing methodology based on subspace similarity detection is proposed. Our method integrates well with traditional data mining algorithms and anomaly detection methods. The pre-processing method is flexible enough to handle similar kinds of sporadic sensor data, which occur in many ambient sensing applications. The proposed methodology is evaluated through extensive experiments with a collection of classical data mining models. The proposed method is shown to improve the precision rate.

Highlights

  • The infrastructure of the Internet of Things (IoT) is being established rapidly, driven by the hype around smart cities worldwide

  • Heterogeneity Activity Recognition Data Set: The Heterogeneity Human Activity Recognition (HHAR) dataset, collected from smartphones and smartwatches, was devised to benchmark human activity recognition algorithms

  • This paper reports a subspace similarity detection model based on subspace-attribute probability calculation; the computation uses an anomaly detection method

Summary

Introduction

The infrastructure of the Internet of Things (IoT) is being established rapidly, driven by the hype around smart cities worldwide. The proposed pre-processing part of a classification model calculates the probability between the subspace of the source sequential data and the target data. This model transforms the subsequent input data into probability data over a period set by the length of the sliding window, which controls the time interval, i.e., the resolution. The contributions of this paper are as follows: (i) a pre-processing method suitable for sensor data that may have persistent and redundant data values is proposed; the method converts the original data values into probabilities computed from the similarity of subspaces; (ii) the pre-processing method allows the size of the subspace and the length of the sliding window to be set, so it can accommodate the time-segment analysis required by a real task.
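To make the idea of turning raw readings into windowed probabilities more concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a single discretised attribute, estimates P(label | value) from co-occurrence counts in the training data, and averages those estimates over each sliding window. The function names `fit_value_label_counts` and `window_probabilities` and the toy labels are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def fit_value_label_counts(values, labels):
    """Count how often each (discretised) sensor value co-occurs with each label."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    return counts


def window_probabilities(values, counts, classes, window):
    """Replace raw readings with one row of class probabilities per sliding window.

    Assumption (not from the paper): the probability for a class is the mean of
    the empirical P(class | value) over the readings in the window; values never
    seen in training contribute 0.
    """
    rows = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        row = []
        for cls in classes:
            total = 0.0
            for value in chunk:
                seen = counts.get(value)
                n = sum(seen.values()) if seen else 0
                total += seen[cls] / n if n else 0.0
            row.append(total / window)
        rows.append(row)
    return rows


# Toy example: value 0 was always "idle"; value 1 was "busy" half of the time.
train_values = [0, 0, 1, 1]
train_labels = ["idle", "idle", "busy", "idle"]
counts = fit_value_label_counts(train_values, train_labels)
rows = window_probabilities([0, 1, 1], counts, ["idle", "busy"], window=2)
```

In this sketch a longer window smooths the probabilities over a longer time span, while a shorter one keeps finer temporal resolution. This mirrors the role the text gives to the sliding-window length.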

Literature Review
Problem Definitions
Reconstruct Training Data Table
Reconstruct Test Data Table
Step 3
Datasets
Comparison of Pre-Processing Methods
Evaluation Criteria
Parameters Setting
Result and Analysis
Findings
Conclusions