Public transport authorities generate large quantities of data as part of their daily operations, including vehicle positions, arrival times, and dynamic routing options. This information feeds back into the system, informing route and timetable planning, the modelling of passenger demand growth, and the evaluation of operational performance. The General Transit Feed Specification (GTFS) is a data format that allows public transport data to be consumed by a wide variety of software applications. However, widespread consumption of GTFS data is hindered by location-specific data extensions, non-human-readable formats, and the need for error cleaning. This paper describes a flexible dataset of actual bus arrival and departure times, created with a pipeline for GTFS Realtime feeds designed to address these challenges. The paper describes the pipeline, verifies the quality of the data, and presents an output of 25 months of actual bus arrival and departure times for Sydney, Australia. We conclude by discussing the relevance of the pipeline outputs in general, and the sample data specifically, to researchers and practitioners.