With the advancement of social sensing technologies, digital maps have evolved rapidly to integrate enriched semantic layers from heterogeneous data sources. Current generations of digital maps are often crowdsourced, allow interactive route planning, and may contain live updates, such as traffic congestion states. Within this context, we believe that the next generation of maps will extract Events of Interest (EoI) from crowdsourced data and display them at different spatial scales based on their significance. This paper introduces Hadath, a scalable and efficient system that extracts social events from unstructured data streams, e.g., Twitter. Hadath applies natural language processing and multi-dimensional clustering techniques to extract relevant events of interest at different map scales, and to infer the spatio-temporal scope of detected events. Hadath also implements a hierarchical in-memory spatio-temporal indexing scheme to allow efficient and scalable access to raw data, as well as to extracted clusters of events. Data packets are first processed to discover events at a local scale; the proper spatio-temporal scope and the significance of detected events at a global scale are then determined. As a result, live events can be displayed at different spatio-temporal resolutions, allowing a smooth and unique browsing experience. Finally, to validate the proposed system, we conducted experiments on real-time and historical social media streams.