Urban Traffic Event Detection using Twitter Data

Rahul Deb Das, Ross S Purves

doi:10.5167/uzh-161971

Abstract

Understanding traffic events is important for urban policy making and transport management. Traffic events may relate to traffic congestion, transportation infrastructure issues, or parking issues, to name a few. Currently, traffic events are monitored through static sensors, e.g., CCTV cameras and loop detectors, which have limited spatial coverage and high maintenance costs. We therefore adopt the concept of citizens as sensors and develop a cost-effective model to understand urban traffic events from unstructured and informal tweets. Existing work has so far attempted to classify tweets as either traffic or non-traffic [1], [3], [4]. Most state-of-the-art approaches use geotagged tweets to identify traffic events [2]; these account for only 1%-3% of all tweets, so much useful information in ungeotagged tweets may be lost. Other work has explored a number of abstract topics related to urban transportation and environment, but without retrieving any spatial information from the tweets [5], [6]. In contrast to these earlier works, the main contribution of this research is to explore ungeotagged tweets to detect traffic events and to develop a novel framework (Fig. 1, 2) that not only categorizes traffic-related tweets but also retrieves the locations of the traffic events from the tweet content. The model has been tested in the city of Mumbai, India, where people use local place names that are often informal and hard to detect with traditional named entity recognition systems.
For tweet categorization, the tweets were manually labelled as either traffic or non-traffic, and a binary Decision Tree (DT) classifier was trained on a bag-of-words model, achieving 0.65 precision and 0.57 recall. In the next phase, to detect the locations of the traffic events, we developed a hybrid georeferencing model that combines a supervised model with a number of spatial rules that can handle informal place names and vernacular geographical aspects. The proposed georeferencing model consists of a pre-trained StanfordNER in the top tier and two spatial rule-based layers in the subsequent tiers (Fig. 3). The rules are based on spatial prepositions, object types and vernacular place names in India. Out of 1143 annotated place names, the model correctly retrieves 691 (about 60%). To disambiguate and geocode the place names, OpenStreetMap has been used. This work shows that Twitter can be useful for detecting urban events in Mumbai. One of the challenges in georeferencing traffic event locations lies in the way people mention place names. The same place may be mentioned differently by different people, or it may be absent from the gazetteer (e.g., OpenStreetMap), which makes toponym recognition and disambiguation difficult. In this work, the toponyms retrieved from the tweet content are mapped to precise geo-coordinates to indicate traffic locations. However, traffic events can stretch along a street segment or over a region. Future work will look into understanding the spatial extent of the affected area from other contextual cues and spatial relationships retrieved from the tweet content.
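As a rough illustration of how rule-based tiers built on spatial prepositions and object types might sit alongside a named entity recogniser, the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not list the actual rules, so the preposition list, the object-type list, the regular expressions and the function name `extract_place_names` are all assumptions made for illustration. The `ner` argument is a placeholder for the pre-trained StanfordNER tier, and the sketch simply pools the candidates from all tiers, which may differ from how the tiers interact in the paper.

```python
import re

# Hypothetical rule vocabularies (assumptions, not taken from the paper).
PREPOSITIONS = ("near", "at", "towards", "opposite", "outside")
OBJECT_TYPES = ("Road", "Rd", "Marg", "Flyover", "Bridge", "Junction", "Naka", "Station")

_PREP_ALT = "|".join(PREPOSITIONS)
_TYPE_ALT = "|".join(OBJECT_TYPES)
_CAP_WORD = r"[A-Z][A-Za-z]*"                    # one capitalised token
_CAP_RUN = rf"{_CAP_WORD}(?:\s+{_CAP_WORD})*"    # run of capitalised tokens

# Rule tier A: a spatial preposition followed by a run of capitalised tokens.
PREPOSITION_RULE = re.compile(rf"\b(?i:{_PREP_ALT})\s+({_CAP_RUN})")
# Rule tier B: capitalised tokens ending in a known object type, e.g. "LBS Marg".
OBJECT_TYPE_RULE = re.compile(rf"\b((?:{_CAP_WORD}\s+)+(?:{_TYPE_ALT}))\b")


def extract_place_names(tweet, ner=None):
    """Return candidate place names found in a tweet, in order, without duplicates.

    `ner` is an optional callable standing in for a pre-trained NER tier;
    it takes the tweet text and returns a list of location strings.
    """
    candidates = []
    if ner is not None:
        candidates.extend(ner(tweet))
    candidates.extend(m.group(1) for m in PREPOSITION_RULE.finditer(tweet))
    candidates.extend(m.group(1) for m in OBJECT_TYPE_RULE.finditer(tweet))

    seen, result = set(), []
    for name in candidates:
        key = name.lower()
        if key not in seen:          # case-insensitive de-duplication
            seen.add(key)
            result.append(name)
    return result


if __name__ == "__main__":
    print(extract_place_names("Heavy traffic near Sion Circle, avoid the route"))
    print(extract_place_names("Accident on LBS Marg, vehicles moving slowly"))
```

In a fuller pipeline, each returned candidate would then be looked up in a gazetteer such as OpenStreetMap for disambiguation and geocoding; that step is omitted here.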

Full Text