Identifying Public Transit Commuters Based on Both the Smartcard Data and Survey Data: A Case Study in Xiamen, China

Shichao Sun, Dongyuan Yang

doi:10.1155/2018/9693272

Open Access

Abstract

Understanding the travel patterns of public transit commuters is important for improving service quality, promoting public transit use, and better planning the public transit system. Smartcard data, with their wide coverage and relative abundance, provide new opportunities to study public transit riders’ behaviors and travel patterns at much lower cost than conventional data sources. However, the major limitation of smartcard data is the absence of cardholders’ social attributes, so the data alone cannot clearly distinguish public transit commuters or explain the mechanisms behind their travel behaviors. This study employed a machine learning approach, the Naive Bayesian Classifier (NBC), to identify public transit commuters based on both smartcard data and survey data, demonstrated in Xiamen, China. Whereas existing methods struggle to validate the accuracy of their identification results, the adopted machine learning algorithm allows accuracy to be checked directly. The classifier was trained and tested on survey data obtained from 532 valid questionnaires. The accuracy rate for identifying public transit commuters was 92% on the test instances. Then, with a low computational load, the classifier identified public transit commuters in the smartcard data without requiring assumptions about their travel regularity. Nearly 290,000 cardholders were classified as public transit commuters. Statistics such as the average first boarding time and the travel frequency during peak hours on workdays were obtained. Finally, the smartcard data were fused with bus location data to reveal the spatial distributions of the home and work locations of these public transit commuters, which could be used to improve public transit planning and operations.

Highlights

Public transit systems have long been regarded as an effective way to mitigate the growing urban congestion, exhaust emissions, and energy consumption caused by the excessive use of private automobiles [1, 2].
An original machine learning algorithm called the Naive Bayesian Classifier (NBC) was adopted in this paper to identify public transit commuters based on both smartcard data and survey data.
Since identifying public transit commuters in smartcard data is not an easy task in most Chinese metropolises, this paper extended the application of the NBC approach and used it to estimate an attribute of cardholders instead of trip purpose.
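
The classification step named in the highlights can be illustrated with a minimal sketch of a categorical Naive Bayes classifier. This is not the authors' implementation. The two features (a first-boarding-time bin and a bin for the number of workdays with trips), the label names, the add-one (Laplace) smoothing, and every data row below are assumptions made up for illustration; the features actually used in the paper are not stated in this section.

```python
import math
from collections import Counter


class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # value_counts[c][j][v]: number of class-c rows whose feature j equals v
        self.value_counts = {
            c: [Counter() for _ in range(n_features)] for c in self.classes
        }
        # values[j]: distinct values of feature j seen during training
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def log_scores(self, row):
        """Return log prior + sum of log likelihoods for each class."""
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / total)
            for j, v in enumerate(row):
                num = self.value_counts[c][j][v] + 1
                den = self.class_counts[c] + len(self.values[j])
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical labeled survey respondents (invented toy data):
# (first boarding time bin, workdays per week with at least one trip)
survey_rows = [
    ("am_peak", "4-5"), ("am_peak", "4-5"), ("am_peak", "2-3"), ("off_peak", "4-5"),
    ("off_peak", "0-1"), ("off_peak", "2-3"), ("am_peak", "0-1"), ("off_peak", "0-1"),
]
survey_labels = ["commuter"] * 4 + ["other"] * 4

model = NaiveBayes().fit(survey_rows, survey_labels)

# Hypothetical unlabeled cardholders summarized from smartcard records
cardholders = {"card_A": ("am_peak", "4-5"), "card_B": ("off_peak", "0-1")}
predicted = {card: model.predict(features) for card, features in cardholders.items()}
print(predicted)
```

In this sketch the labeled survey rows play the role of the training data and the unlabeled cardholder summaries play the role of the smartcard data; holding out part of the labeled rows would give an accuracy check of the kind the abstract describes.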

Summary

Introduction

Public transit systems have long been regarded as an effective way to mitigate the growing urban congestion, exhaust emissions, and energy consumption caused by the excessive use of private automobiles [1, 2]. To continuously improve performance and promote public transit use, the authorities in Chinese metropolises have worked for years to gain a better understanding of passengers’ travel characteristics [5, 6, 7]. In this context, mining the travel patterns of public transit commuters has received much attention from researchers [1, 4, 8, 9, 10, 11]. Smartcard data can provide more abundant and higher-quality travel data at lower cost, allowing travel patterns to be analyzed from precise observations of individuals’ smartcard usage over a period of time [10, 13, 14, 15, 16, 17, 18].

Methods

Results

Conclusion