Abstract

While individual-level data are key for epidemiology, social simulation, economics, and various other fields, data owners are increasingly required to protect the personally identifiable information in their data. Simple data de-identification or ‘data masking’ measures are limited, because they both reduce the utility of the dataset and are not sufficient to protect individual confidentiality. This paper details the creation of a synthetic trip dataset in transportation, using smart card data as the case study. It discusses and compares two machine learning methods, a Generative Adversarial Network and a Bayesian Network, for modelling and generating this dataset. The synthetic data retain important utility of the real dataset, e.g., the origin, destination, and time of travel, while no synthetic data point represents a real trip in the original dataset. The synthetic dataset can be used in various applications, including microsimulation of public transport systems, analysis of travel behaviours, model predictive control of transit flows, and evaluation of transport policies.

