Abstract

Encouraging the use of public transport is essential to combat congestion and pollution in urban environments. To achieve this, arrival time predictions must become more reliable, an improvement that passengers frequently request. The development of accurate predictive algorithms requires good-quality data, which is often not available. Here we demonstrate a method to synthesise data using a reference curve approach derived from very limited real-world data without reliable ground truth. This approach allows the controlled introduction of artefacts and noise to simulate their impact on prediction accuracy. To illustrate these impacts, a recurrent neural network next-step prediction is used to compare different scenarios in two different UK cities. The results show that a realistic data synthesis is possible, allowing for controlled testing of predictive algorithms. They also highlight the importance of reliable data transmission for obtaining such data from real-world sources. Our main contribution is the demonstration of a synthetic data generator for public transport data, which can be used to compensate for low data quality. We further show that, when high-quality data is limited, this data generator can be used to develop and enhance predictive algorithms in the context of urban bus networks by mixing synthetic and real data.

• Method to generate synthetic bus journeys based on very limited and unreliable data.
• Insights into data quality challenges in public transport systems.
• Algorithms can be improved by training on a mixture of synthetic and real-world data.
• The demonstrated data synthesis could be the base for a public transport benchmarking dataset.
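To make the idea of synthesising journeys from a reference curve with controlled noise and artefacts more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the shape of the reference curve, the noise model, or the artefact types, so everything here is an assumption made for illustration: the linear `reference_curve`, the Gaussian position noise, the random message drops standing in for transmission loss, and all function names and parameter values.

```python
import random


def reference_curve(t, duration, route_length):
    """Hypothetical reference curve: distance along the route (m) at time t (s).

    Linear progress is assumed purely for illustration; a real curve would be
    derived from observed journeys.
    """
    frac = min(max(t / duration, 0.0), 1.0)
    return route_length * frac


def synthesise_journey(duration=1800.0, route_length=9000.0, interval=30.0,
                       noise_sd=15.0, drop_prob=0.1, seed=0):
    """Sample the reference curve at a fixed reporting interval, add Gaussian
    position noise, and randomly drop reports to mimic lost transmissions."""
    rng = random.Random(seed)
    journey = []
    steps = int(duration // interval)
    for i in range(steps + 1):
        t = i * interval
        if rng.random() < drop_prob:
            continue  # simulated lost message (assumed artefact type)
        pos = reference_curve(t, duration, route_length)
        pos += rng.gauss(0.0, noise_sd)  # assumed Gaussian position noise
        pos = min(max(pos, 0.0), route_length)  # keep position on the route
        journey.append((t, pos))
    return journey


# A clean journey (no noise, no drops) reproduces the reference curve exactly,
# which gives a controlled baseline to compare degraded variants against.
clean = synthesise_journey(noise_sd=0.0, drop_prob=0.0)
degraded = synthesise_journey(noise_sd=25.0, drop_prob=0.3, seed=1)
```

Because the noise level and drop probability are explicit parameters, a predictor could be evaluated on `clean` and `degraded` variants of the same journey to isolate the effect of each artefact, which is the kind of controlled comparison the abstract describes.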
