PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams

Rebecca Tickle, Mohammad Mesgarpour, Grazziela P Figueredo, Isaac Triguero, Robert I John

doi:10.1007/s12559-019-09638-y

Abstract

Hot spot identification is a relevant problem in a wide variety of areas such as health care, energy or transportation. A hot spot is defined as a region of high likelihood of occurrence of a particular event. To identify hot spots, location data for those events is required, which is typically collected by telematics devices. These sensors are constantly gathering information, generating very large volumes of data. Current state-of-the-art solutions are capable of identifying hot spots from big static batches of data by means of variations of clustering or instance selection techniques that pre-process the original input data, providing the most relevant locations. However, these approaches neglect to address changes in hot spots over time. This paper presents a dynamic bio-inspired approach to detect hot spots in big data streams. This computational intelligence method is designed and applied to the transportation sector as a case study to identify incidents on the roads caused by heavy goods vehicles. Inspired by the idea of pheromones, we adapt an immune-based algorithm to account for the temporary nature of hot spots, and implement it using Apache Spark Streaming. Experimental results on real datasets with up to 4.5 million data points, provided by a telematics company, show that the algorithm is capable of quickly processing large streaming batches of data, as well as successfully adapting over time to detect hot spots. The outcome of this method is twofold: it reduces data storage requirements and demonstrates resilience to sudden changes in the input data (concept drift).

Highlights

Hot spot identification is a very relevant problem in a wide variety of areas such as health care, energy or transportation
Experimental results on real datasets with up to 4.5 million data points, provided by a telematics company, show that the algorithm is capable of quickly processing large streaming batches of data, as well as successfully adapting over time to detect hot spots
In this work we have presented an approach for vehicle hot spot identification in data streams, adapting an existing instance selection method, SeleSup, with a pheromone-based mechanism that ensures the hot spots found reflect the recent incident distribution

Summary

Introduction

Hot spot identification (HSID) is a relevant problem in a wide variety of areas such as health care, energy or transportation. Current state-of-the-art solutions are capable of identifying hot spots from big static batches of data by means of variations of clustering or instance selection techniques that pre-process the original input data, providing the most relevant locations. However, these approaches neglect to address changes in hot spots over time. HSID methods can also be applied commercially, for example, by using mobile phone data to determine the most frequently visited places and provide targeted marketing interventions. While these examples mostly belong to unrelated disciplines, their commonality is that the establishment of a set of hot spots relies on location data. In some cases, the deciding criterion could be the physical distance between the locations of events; in others, additional constraints may be required when determining whether a specific event contributes to a hot spot or not.
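
To make the idea of distance-based, time-aware hot spots more concrete, the following is a minimal sketch in Python. It is not the PAS3-HSID algorithm or SeleSup, and it does not use Apache Spark Streaming. It only illustrates, under assumed names and parameter values, how a pheromone-like level could decay with each incoming mini-batch and be reinforced by nearby events. The function names (haversine_m, update_hot_spots) and all parameters (radius_m, evaporation, deposit, min_level) are hypothetical choices made for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def update_hot_spots(hot_spots, batch, radius_m=200.0, evaporation=0.5,
                     deposit=1.0, min_level=0.1):
    """Process one mini-batch of (lat, lon) events.

    hot_spots is a list of dicts {"loc": (lat, lon), "level": float}.
    All parameter values are illustrative assumptions, not values from the paper.
    """
    # 1. Evaporation: every existing hot spot loses part of its level,
    #    so spots that stop receiving events fade over time.
    for spot in hot_spots:
        spot["level"] *= (1.0 - evaporation)

    # 2. Reinforcement: an event within radius_m of its nearest hot spot
    #    strengthens that spot; otherwise it seeds a new candidate spot.
    for event in batch:
        nearest = min(hot_spots,
                      key=lambda s: haversine_m(s["loc"], event),
                      default=None)
        if nearest is not None and haversine_m(nearest["loc"], event) <= radius_m:
            nearest["level"] += deposit
        else:
            hot_spots.append({"loc": event, "level": deposit})

    # 3. Pruning: drop spots whose level has decayed below the threshold.
    return [s for s in hot_spots if s["level"] >= min_level]


# Usage: two nearby events merge into one spot, a distant event creates another,
# and both fade away if no further events arrive.
spots = update_hot_spots([], [(52.9500, -1.1500), (52.9501, -1.1500)])
spots = update_hot_spots(spots, [(53.5000, -2.0000)])
for _ in range(4):
    spots = update_hot_spots(spots, [])
```

In this sketch only the current set of hot spots is kept between batches, not the raw events, which loosely mirrors the reduced storage requirement mentioned in the abstract. The physical distance criterion could be replaced or extended with additional constraints where the application requires them.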
