Creating a surrogate commuter network from Australian Bureau of Statistics census data

Kristopher M Fair, Cameron Zachreson, Mikhail Prokopenko

doi:10.1038/s41597-019-0137-z

Abstract

Between the 2011 and 2016 national censuses, the Australian Bureau of Statistics changed its anonymity policy compliance system for the distribution of census data. The new method has resulted in dramatic inconsistencies when comparing low-resolution data to aggregated high-resolution data. Hence, aggregated totals do not match true totals, and the mismatch gets worse as the data resolution gets finer. Here, we address several aspects of this inconsistency with respect to the 2016 usual-residence to place-of-work travel data. We introduce a re-sampling system that rectifies many of the artifacts introduced by the new ABS protocol, ensuring a higher level of consistency across partition sizes. We offer a surrogate high-resolution 2016 commuter dataset that reduces the difference between the aggregated and true commuter totals from ~34% to only ~7%, which is on the order of the discrepancy across partition resolutions in data from earlier years.

Highlights

High-resolution commuter network information, as well as general information describing population distributions[1], is a major factor in the computational modeling of diffusion phenomena in various contexts: demographic[2], epidemiological[3,4,5,6], economic[7], ecological[8] and so on.
Privacy constraints on released Census data, in the presence of intricate dependencies between population and employment distributions in relatively small, highly urbanized, but spatially spread countries, such as Australia, coupled with changes in data protocols across census years, present specific challenges in reconstructing commuter networks with sufficiently high fidelity[1,9,10,11,12]. These challenges manifest in two ways. The first of these pertains to individual microdata, which is organized by household to capture information about both the individual and housing unit.
We pre-processed all data provided by the Australian Bureau of Statistics (ABS) to remove the edges that link to non-geographic regions such as “Migratory/offshore/shipping” and “No usual address”.
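
The pre-processing step described above can be illustrated with a minimal sketch. It assumes the commuter network is held as a list of (residence, workplace, count) edges and that non-geographic regions can be recognised by a substring of their label. The function names, the matching rule, and the sample edges are hypothetical and are not taken from the authors' code or from ABS data.

```python
# Illustrative sketch only: the edge layout, the substring-matching rule and
# the sample values below are assumptions, not the authors' implementation.

# Markers for the non-geographic categories named in the text.
NON_GEOGRAPHIC_MARKERS = ("migratory", "no usual address")


def is_geographic(region_label):
    """Return True if the label does not look like a non-geographic category."""
    label = region_label.lower()
    return not any(marker in label for marker in NON_GEOGRAPHIC_MARKERS)


def remove_non_geographic_edges(edges):
    """Keep only edges whose residence and workplace are both geographic regions.

    Each edge is a (residence_label, workplace_label, commuter_count) tuple.
    """
    return [
        (residence, workplace, count)
        for residence, workplace, count in edges
        if is_geographic(residence) and is_geographic(workplace)
    ]


# Hypothetical edges with made-up counts, used only to exercise the filter.
sample_edges = [
    ("Region A", "Region B", 120),
    ("No usual address", "Region B", 15),
    ("Region A", "Migratory/offshore/shipping", 4),
]

cleaned_edges = remove_non_geographic_edges(sample_edges)
print(cleaned_edges)
```

Only the edge between the two ordinary regions survives the filter. Substring matching is used here so that label variants would also be caught, but an exact-match lookup against a known list of category labels may be safer in practice.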

Background & Summary

High-resolution commuter network information, as well as general information describing population distributions[1], is a major factor in the computational modeling of diffusion phenomena in various contexts: demographic[2], epidemiological[3,4,5,6], economic[7], ecological[8] and so on. To understand this result in more detail, it is helpful to note that the spatial distribution of the working population is very heterogeneous, with an exponentially larger fraction of the working population employed within the central business districts of major cities.

Methods

Findings

Code Availability