Abstract
Recent developments in cloud computing and the Internet of Things have enabled smart environments, in terms of both monitoring and actuation. Unfortunately, this often results in unsustainable cloud-based solutions, whereby, in the interest of simplicity, a wealth of raw (unprocessed) data are pushed from sensor nodes to the cloud. Herein, we advocate the use of machine learning at sensor nodes to perform essential data-cleaning operations, to avoid the transmission of corrupted (often unusable) data to the cloud. Starting from a public pollution dataset, we investigate how two machine learning techniques (kNN and missForest) may be embedded on Raspberry Pi to perform data imputation, without impacting the data collection process. Our experimental results demonstrate the accuracy and computational efficiency of edge-learning methods for filling in missing data values in corrupted data series. We find that kNN and missForest correctly impute up to 40% of randomly distributed missing values, with a density distribution of values that is indistinguishable from the benchmark. We also show a trade-off analysis for the case of bursty missing values, with recoverable blocks of up to 100 samples. Computation times are shorter than sampling periods, allowing for data imputation at the edge in a timely manner.
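To make the imputation idea concrete, the following is a minimal sketch of kNN-based imputation in pure Python (standard library only), chosen because it would run unchanged on a Raspberry Pi. It is not the authors' implementation. The function names `knn_impute` and `mask_at_random`, the default `k`, and the rescaled Euclidean distance over jointly observed features are all illustrative assumptions; the abstract does not state which metric or parameters were used.

```python
import math
import random


def mask_at_random(rows, fraction, seed=0):
    """Return a copy of rows with a given fraction of cells set to None.

    Hypothetical helper that mimics randomly distributed missing values.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    hidden = set(rng.sample(cells, int(fraction * len(cells))))
    return [
        [None if (i, j) in hidden else v for j, v in enumerate(row)]
        for i, row in enumerate(rows)
    ]


def knn_impute(rows, k=3):
    """Fill None cells with the mean of that feature over the k nearest rows.

    Distance is Euclidean over the features both rows have observed,
    rescaled by the share of features compared (an assumed, illustrative
    choice). Cells with no usable neighbour are left as None.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row that observed feature j.
                if m == i or other[j] is None:
                    continue
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                squared = sum((a - b) ** 2 for a, b in shared)
                distance = math.sqrt(squared * len(row) / len(shared))
                candidates.append((distance, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


if __name__ == "__main__":
    # Synthetic two-feature readings, used only for demonstration.
    readings = [[float(t), 2.0 * t + 1.0] for t in range(20)]
    corrupted = mask_at_random(readings, fraction=0.4, seed=42)
    repaired = knn_impute(corrupted, k=3)
    for before, after in list(zip(corrupted, repaired))[:5]:
        print(before, "->", after)
```

Distances are computed on the originally observed values only, so a filled-in cell never influences later imputations. A library implementation would normally be preferable where the dependency is available on the device.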
Highlights
Smart environments find themselves at the intersection of the Internet of Things (IoT) and cloud computing, and are capable of gathering information on the surroundings, as well as manipulating it in order to accommodate certain conditions [1].
We advocate for the use of solutions involving edge computing, a paradigm proposed for solving IoT and localized computation needs [8,9,10].
We tackled the task of dealing with missing data within IoT smart environments at the edge for a set scenario.
Summary
Smart environments sit at the intersection of the Internet of Things (IoT) and cloud computing. They can gather information on their surroundings (monitoring) and act on those surroundings to accommodate certain conditions (actuation) [1]. One challenge that arises is managing the big IoT data generated by these systems [2]. We advocate for solutions involving edge computing, a paradigm proposed for solving IoT and localized computation needs [8,9,10]. With this approach, part of the processing is done at the edge, close to the data source, which in turn yields cost savings related to data transmission, latency, and bandwidth usage, among other benefits. Examples of tasks in an edge computing scenario include task-based resource allocation [11], service scheduling for power distribution [12], task offloading mechanisms [13], local sentiment analysis [14], and charging and discharging networking system algorithms for electric vehicles [15].