Abstract

We propose a framework that couples complex event processing (CEP) with predictive analytics to predict simple and complex events on Internet of Things (IoT) data streams. The data is consumed through a REST service that provides traffic data for around 2,000 locations in the city of Madrid, Spain. Predicting complex events helps users understand the future state of road traffic and make informed decisions. The framework uses WSO2 Siddhi as the CEP engine and InfluxDB as persistent storage, and the data is delivered to the CEP through a high-speed Apache Kafka messaging pipeline. This data is used to build predictive models inside the CEP that help users derive meaningful insights. In such event analytics engines, however, events are created by rules that are triggered when the streaming data exceeds a certain threshold. Calculating this threshold is therefore essential, because it is the means by which simple and complex events are generated. We propose a novel two-fold approach for finding thresholds in such large datasets, using unsupervised learning to estimate them. The first phase uses Node-RED and serverless computing to compute the thresholds and supply them back to the CEP for prediction. The machine learning models run on a cloud service, and the predictions or thresholds are returned to the CEP through REST services. The second phase not only computes the thresholds but also applies novel hypothesis testing techniques, together with a windowing mechanism on the data streams, to implement clustering and supply the result back to the CEP. This phase relies on statistical techniques to detect changes in the distribution of the data. Such changes trigger retraining of the machine learning models, and the results are fed back into the CEP for use in event generation. We also include a statistical analysis of the dataset used.
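
The second phase described above (windowing, a hypothesis test for distribution change, and clustering to derive a threshold) can be illustrated with a minimal sketch. The abstract does not name the test or the clustering algorithm, so everything specific here is an assumption: a two-sample Kolmogorov-Smirnov test at the 5% level (asymptotic critical value), a two-cluster 1-D k-means whose midpoint serves as the threshold, tumbling windows of 200 readings, and the hypothetical function names `ks_statistic`, `ks_drift`, `kmeans_threshold`, and `process_stream`. It uses only the Python standard library and does not represent the Siddhi, Kafka, or Node-RED integration.

```python
import random
from statistics import mean


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_drift(a, b, c_alpha=1.358):
    """True if the windows differ at roughly the 5% level (assumed test)."""
    n, m = len(a), len(b)
    return ks_statistic(a, b) > c_alpha * ((n + m) / (n * m)) ** 0.5


def kmeans_threshold(values, iters=50):
    """Two-cluster 1-D k-means; the threshold is the midpoint of the centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def process_stream(readings, window=200):
    """Retrain the threshold whenever a new window drifts from the reference."""
    reference = readings[:window]
    threshold = kmeans_threshold(reference)
    retrain_points = []
    for start in range(window, len(readings) - window + 1, window):
        current = readings[start:start + window]
        if ks_drift(reference, current):
            reference = current
            threshold = kmeans_threshold(reference)
            retrain_points.append(start)
    return threshold, retrain_points


if __name__ == "__main__":
    # Synthetic traffic intensities (made-up numbers, not the Madrid data).
    random.seed(0)
    calm = [random.gauss(random.choice([100, 300]), 15) for _ in range(600)]
    busy = [random.gauss(random.choice([400, 900]), 30) for _ in range(400)]
    threshold, retrain_points = process_stream(calm + busy)
    print("threshold:", round(threshold, 1), "retrained at:", retrain_points)
```

In a deployment along the lines of the abstract, the returned threshold would be pushed back to the CEP rules, and retraining would run only when the test flags a change rather than on every window.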
