Abstract
To process large amounts of sensor stream data rapidly, it is effective to extract and use samples that reflect the characteristics and patterns of the data stream well. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random sampling in the stream environment. We first analyze the uniformity confidence of KSample and derive two uniformity confidence degradation problems: (1) initial degradation, in which the uniformity confidence decreases rapidly in the initial stage, and (2) continuous degradation, in which the uniformity confidence decreases gradually in the later stages. We note that the initial degradation is caused by the sample range limitation and the past sample invariance, and the continuous degradation by the sampling range increase. For each cause, we present a corresponding solution: the sample range extension for the sample range limitation, the past sample change for the past sample invariance, and the use of the UC-window for the sampling range increase. By incorporating these solutions, we propose a novel sampling method, named UC-KSample, which largely improves the uniformity confidence. Experimental results show that UC-KSample improves the uniformity confidence over KSample by 2.2 times on average and always keeps the uniformity confidence higher than the user-specified threshold. We also note that the sampling accuracy of UC-KSample is higher than that of KSample for both numeric sensor data and text data. The uniformity confidence is an important sampling metric in sensor data streams, and this is the first attempt to apply uniformity confidence to KSample. We believe that the proposed UC-KSample is an excellent approach that retains an advantage of KSample, namely dynamic sampling over a fixed sampling ratio, while improving the uniformity confidence.
Highlights
A data stream refers to a continuous form of data that is constantly generated.[1,2] In particular, when a stream has a large capacity, such as sensor data,[3] its real-time processing is very costly.[4]
Among the fixed sampling ratio methods, we focus on KSample, which has the characteristics of random sampling for sensor stream data.
As shown in the figure, the uniformity confidence of UC-KSample is always higher than those of KSample and Naive UC-KSample, and it always stays above the threshold of 70%.
Summary
A data stream refers to a continuous form of data that is constantly generated.[1,2] In particular, when a stream has a large capacity, such as sensor data,[3] its real-time processing is very costly.[4] The past sample invariance arises because the elements already stored in the slot are sent to the secondary storage and cannot be changed. To solve these problems, in UC-KSample, we first alleviate the sample range limitation property by including the already sampled data in the sample extraction range, which increases the number of samples that can be generated. The continuous degradation occurs because of the sampling range increase, in which the range of streams to be sampled becomes greater than the range of streams that can be considered. To solve this problem, we present the concept of the UC-window in UC-KSample and explain how to determine the size of the UC-window to guarantee a certain uniformity confidence. Section "Experimental evaluation" presents the results of the experimental evaluation, and section "Conclusion" concludes the article.
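The summary does not define uniformity confidence formally. The sketch below assumes the definition commonly used in the reservoir sampling literature: the number of distinct samples a sampler can actually produce, divided by the number of distinct samples of the same size that are statistically possible, expressed as a percentage. The function names are hypothetical, and `frozen_block_samples` is a deliberately simplified toy model of a sampler whose past choices can never be revised; it is not the KSample or UC-KSample algorithm.

```python
from math import comb


def uniformity_confidence(achievable: int, stream_len: int, sample_size: int) -> float:
    """Percentage of statistically possible samples that a sampler can produce.

    Assumed definition (not stated in the summary): achievable samples
    divided by C(stream_len, sample_size), times 100.
    """
    possible = comb(stream_len, sample_size)
    return 100.0 * achievable / possible


def frozen_block_samples(stream_len: int, sample_size: int) -> int:
    """Count achievable samples in a toy model of past sample invariance.

    Toy assumption: the stream is cut into `sample_size` consecutive blocks
    of equal length, one element is kept per block, and a kept element is
    never replaced once its block has passed.
    """
    if sample_size <= 0 or stream_len % sample_size != 0:
        raise ValueError("stream_len must be a positive multiple of sample_size")
    block_len = stream_len // sample_size
    return block_len ** sample_size


if __name__ == "__main__":
    # Fixed sampling ratio of 0.5 while the stream grows.
    for n in (4, 8, 16):
        k = n // 2
        uc = uniformity_confidence(frozen_block_samples(n, k), n, k)
        print(f"stream={n:2d} sample={k} uniformity confidence={uc:.2f}%")
```

Under these toy assumptions, the ratio falls as the stream grows at a fixed sampling ratio, which is the kind of effect the degradation analysis targets. A sampler that could still revise earlier choices would reach more of the possible samples, which is the intuition behind the sample range extension and the past sample change.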
More From: International Journal of Distributed Sensor Networks