Abstract

Road transportation is the backbone of modern societies, yet every year it costs over a million lives and trillions of dollars to the global economy. Social media such as Twitter have become an increasingly important source of information in many dimensions of smart societies. Automatic detection of road traffic events by mining Twitter data is one such area, with a great many applications and enormous potential, although it faces major challenges in the management and analysis of big data (volume, velocity, variety, and veracity). Various approaches to the subject have been proposed in recent years, but the methods and outcomes are still in their infancy. This paper proposes a method for automatic detection of road traffic-related events from tweets in the Saudi dialect using machine learning and big data technologies. First, we build and train a classifier using three machine learning algorithms, Naive Bayes, Support Vector Machine, and logistic regression, to filter tweets into relevant and irrelevant. Subsequently, we train further classifiers to detect multiple types of events, including accident, roadwork, road closure, road damage, traffic condition, fire, weather, and social events. The results from the analysis of one million tweets show that our method is able to detect road traffic events, as well as their location and time, automatically and without any prior knowledge of the events. To the best of our knowledge, this is the first work on traffic event detection from Arabic tweets using machine learning and the Apache Spark big data platform.
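The first stage described above, filtering tweets into relevant and irrelevant, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library rather than Apache Spark, it shows only Naive Bayes (one of the three algorithms named), and the class name `NaiveBayesFilter`, the whitespace tokenizer, and the English toy tweets are all hypothetical stand-ins for the Saudi-dialect data and preprocessing, which the abstract does not describe.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Saudi-dialect tweets
    # would need Arabic-specific normalization, which is not shown here.
    return text.lower().split()


class NaiveBayesFilter:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of tweets
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, in English for readability only.
train_texts = [
    "accident on king fahd road heavy traffic",
    "road closed due to roadwork near exit",
    "traffic jam after accident on highway",
    "great coffee with friends this morning",
    "watching the football match tonight",
    "new phone arrived today very happy",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3

model = NaiveBayesFilter().fit(train_texts, train_labels)
print(model.predict("accident causing traffic on the road"))
```

Under the same assumptions, the second stage (event-type detection) could reuse this class with labels such as accident, roadwork, or road closure in place of the two relevance labels, applied only to tweets that pass the relevance filter.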
