Abstract

Household surveys provide immense value in the fields of transportation and urban planning. However, even the most well-funded surveying agencies rely on sampling methods to estimate the nature of the true population, and the collected microdata is frequently aggregated, or limited in volume and detail, to protect the privacy of respondents. Population synthesis provides a means to scale this microdata to represent larger regions for use in microsimulation. Despite their accuracy and widespread adoption, traditional synthesis algorithms for reweighting microdata samples scale poorly with the number of variables and geographic regions being modeled and can fail to converge with smaller sample sizes. Several generative models have been proposed to address these shortcomings, but they lack features such as sub-region modeling and the ability to simultaneously generate both individuals and households. This work proposes an extension to recent generative approaches that can generate synthetic populations with both individual-level and household-level variables. It uses a two-part Variational Autoencoder (VAE) and Conditional VAE (CVAE) to learn a distribution of latent variables in the general population and then uses them to generate new samples, which can help in synthesizing smaller, traditionally under-sampled groups. The approach is benchmarked against a state-of-the-art open-source population synthesizer. In addition, the VAE/CVAE model is tested with progressively smaller amounts of training data. Findings indicate that the VAE/CVAE model creates more accurate populations in less time than the traditional synthesizer on small- to medium-dimensional datasets (4–16 variables). The VAE/CVAE also performs well with few (n = 100) training samples, with diminishing returns from additional training samples.
