Obtaining accurate speed and travel time information is a challenge for researchers, geographers, and transportation agencies. In the past, traffic data were usually acquired and disseminated by government agencies through fixed-location sensors. High costs, infrastructure demands, and low coverage levels of these sensor devices require agencies and researchers to look beyond the traditional approaches. With the emergence of smartphones and navigation apps, location-based and crowdsourced Big Data are receiving increased attention. In this regard, location-based big data (LocBigData) collected from probe vehicles and road users can be used to provide speed and travel time information in different locations. Examining the quality of crowdsourced data is essential for researchers and agencies before using them. This study assessed the quality of Waze speed data from surface streets and conducted a case study in Sevierville, Tennessee. Typically, examining the quality of these data in surface streets and arterials is more challenging than freeways data. This research used Bluetooth speed data as the ground truth, which is independent of Waze data. In this study, three steps of methodology were used. In the first step, Waze speed data was compared to Bluetooth data in terms of accuracy, mean difference, and distribution similarity. In the second step, a k-means algorithm was used to categorize Waze data quality, and a multinomial logistics regression model was performed to explore the significant factors that impact data quality. Finally, in the third step, machine learning techniques were conducted to predict the data quality in different conditions. The result of the comparison showed a similar pattern and a slight difference between datasets, which verified the quality of Waze speed data. The statistical model indicates that that Waze speed data are more accurate in peak hours than in night hours. Also, the traffic speed, traffic volume, and segment length have a significant association on the accuracy of Waze data on surface streets. Finally, the result of machine learning prediction showed that a KNN method performed the highest prediction accuracy of 84.5% and 82.9% of the time for training and test datasets, respectively. Overall, the study results suggest that Waze speed data is a promising data source for surface streets.
Read full abstract