Abstract
This research developed a method for evaluating and integrating emerging sources (Strava, StreetLight, and Bikeshare) of bicycle activity data with conventional demand data (permanent counts, short-duration counts) using traditional (Poisson) and advanced machine learning techniques. First, a literature review was conducted, along with cataloging and evaluating available third-party data sources and existing applications. Next, six sites (Boulder, Charlotte, Dallas, Portland, Bend, and Eugene) that represented a variety of contexts (urban, suburban) and geographical diversity were selected. Of these, Boulder, Charlotte and Dallas constituted the basic sites, where one year of data (i.e., 2019) was used for modeling. Portland, Bend, and Eugene in Oregon were considered enhanced sites, where three years of data (2017-2019) were used for model estimation. Demographic, network, count and emerging data were gathered for these sites. Using these data, Poisson and Random Forest models were estimated. The model estimation process was designed to allow for comparison of the relative accuracy and value added by different data sources and modeling techniques. Three sets of models were specified – All City Pooled, Oregon Pooled and city-specific models. In general, the three data sources (static, Strava, and StreetLight) appeared to be complementary to one another; that is, adding any two data sources together tended to outperform each data source on its own. Low-volume sites proved challenging, with the best-performing models still demonstrating considerable prediction error. City-specific models generally displayed better model fit and prediction performance. Using Strava or StreetLight counts to predict annual average daily bicycle traffic (AADBT) without static adjustment variables increased expected prediction error by a factor of about 1.4 (i.e., a 40 % increase in %RMSE). That rule of thumb figure of 1.4 times was only slightly lower when combining Strava plus StreetLight without static variables (1.3x). Tests of transferability showed that transferring the model specifications without reestimating the model parameters resulted in 10-50 % increase in error rate across models. Performance of machine learning models was comparable to count models. The findings from this study indicate that rather than replacing conventional bike data sources and count programs, big data sources like Strava and StreetLight actually make the old “small” data even more important.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.