This study explores the scalability of machine learning models for estimating walking and cycling volumes across the extensive New South Wales (NSW) Six Cities Region in Australia using mobile phone and crowdsourced data. Previous research has focused on localized applications and has not addressed the complexities of larger networks. This study addresses that gap by identifying challenges that arise at this scale, such as the scarcity and limited representativeness of observed count data, gaps in the crowdsourced and mobile phone data, and inconsistencies in link-level volume estimates. We propose and demonstrate strategies to address these challenges, including enhancing the geographical diversity of observed count data and employing an extensive cross-validation approach in model training and testing. Drawing on various auxiliary datasets, the study demonstrates that these strategies improve model performance. The findings offer transportation modelers, policymakers, and urban planners a framework for supporting sustainable transportation infrastructure and policies with data-driven methods.