Abstract

Traditional detection of transport-related issues is often limited by sparse physical sensor coverage, and reporting incidents to the emergency response system is labor intensive. Tweet text from social media has been mined to identify complaints about road transportation issues such as traffic, accidents, and potholes. Keyword-based approaches have previously been used to identify and segregate tweets related to different issues, but these methods depend solely on manually supplied seed keywords, and such keyword sets are not sufficient to cover all tweets. To overcome this issue, a novel approach is proposed that captures semantic context through dense word embeddings by employing the word2vec model. However, segregating tweets on the basis of semantically similar keywords may suffer from pragmatic ambiguity. To handle this, the Word2Vec model has been applied to match semantically similar tweets with respect to each category. Furthermore, hotspots have been identified for each category. Because geo-tagged tweets are scarce, we propose a hybrid method that combines Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and Regular Expressions (RE) to extract location information from the tweet text. Because no ground-truth dataset is available, model feasibility has been validated against existing data records (i.e., those published by official government accounts and reported in news media), and the evaluation results indicate that the approach identifies a few additional hotspots beyond those in the existing reports.

Highlights

  • In India, four major tier-1 cities (Mumbai, Delhi, Kolkata, and Bengaluru) lose 22 billion dollars annually due to congestion

  • In this paper, we introduce a framework that identifies incidents caused by non-recurrent events from social media

  • The proposed framework can be divided into five major components: collecting data from multiple sources, data preprocessing, identifying semantically similar keywords for the different categories, removing pragmatic ambiguity, and content-based location identification for finding vulnerable areas
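As a rough illustration of the content-based location identification component, the sketch below covers only a regular-expression step. The preposition list, the place-name suffixes, and the function name `extract_locations` are assumptions made for this example and are not taken from the paper; the NER and POS parts of the hybrid method are not shown.

```python
import re

# Hypothetical cue pattern: a preposition followed by up to three
# capitalized words and a place-type suffix. Both word lists are
# illustrative assumptions, not the paper's actual rules.
LOCATION_CUE = re.compile(
    r"\b(?i:at|near|on|in|towards|opposite)\s+"
    r"((?:[A-Z][\w.]*\s+){0,3}"
    r"(?:Road|Rd|Street|Marg|Nagar|Chowk|Flyover|Bridge|Junction|Circle))\b"
)


def extract_locations(tweet_text):
    """Return candidate location phrases found in the tweet text."""
    # findall returns the single capturing group, i.e. the place phrase.
    return LOCATION_CUE.findall(tweet_text)


if __name__ == "__main__":
    for tweet in (
        "Huge pothole near MG Road causing a jam",
        "Accident at Silk Board Junction, avoid the area",
        "Stuck in traffic again",
    ):
        print(tweet, "->", extract_locations(tweet))
```

A pattern like this only catches locations that follow a fixed cue, which is presumably why the paper pairs regular expressions with NER and POS tagging.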

Summary

INTRODUCTION

In India, four major tier-1 cities (Mumbai, Delhi, Kolkata, and Bengaluru) lose 22 billion dollars annually due to congestion. It is mainly induced by non-recurrent events such as accidents, adverse road conditions, construction on roads, potholes, adverse weather conditions, and inadequate drainage. It might be due to the restriction imposed by Twitter on tweet length, i.e., a 140-character limit. This makes text classification and information extraction a challenging problem. This paper presents a methodology to crawl, pre-process, and filter freely available tweets. These tweets are analyzed to extract information about non-recurrent events by using deep learning and Natural Language Processing (NLP) techniques. The main contribution of this work can be summarized as follows: 1) Semantically similar keywords: We propose and apply an adaptive semi-supervised method for tweets that leverages dense word embeddings to identify semantically similar keywords for non-recurrent events.
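The idea of growing a seed keyword list through dense word embeddings can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the three-dimensional vectors in `TOY_EMBEDDINGS` are made up, the 0.8 similarity threshold and the round limit are arbitrary, and `expand_keywords` is a hypothetical helper. In practice the vectors would come from a word2vec model trained on the tweet corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_keywords(seeds, embeddings, threshold=0.8, max_rounds=3):
    """Iteratively add words whose vector is close to any current keyword."""
    keywords = {s for s in seeds if s in embeddings}
    for _ in range(max_rounds):
        added = set()
        for word, vec in embeddings.items():
            if word in keywords:
                continue
            if any(cosine(vec, embeddings[k]) >= threshold for k in keywords):
                added.add(word)
        if not added:  # nothing new this round, so stop early
            break
        keywords |= added
    return keywords


# Made-up toy vectors for illustration only.
TOY_EMBEDDINGS = {
    "pothole": [0.90, 0.10, 0.00],
    "crater": [0.85, 0.15, 0.05],
    "traffic": [0.10, 0.90, 0.00],
    "jam": [0.15, 0.85, 0.10],
    "cricket": [0.00, 0.10, 0.95],
}

if __name__ == "__main__":
    print(sorted(expand_keywords(["pothole"], TOY_EMBEDDINGS)))
    print(sorted(expand_keywords(["traffic"], TOY_EMBEDDINGS)))
```

Repeating the expansion over several rounds lets newly found keywords pull in further neighbours, at the cost of possible drift away from the category; the threshold controls that trade-off.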

RELATED WORK
METHODOLOGY
EXPERIMENTS AND RESULTS
MODEL FEASIBILITY
Findings
CONCLUSION