Bike-sharing systems can contribute to sustainable urban mobility. Despite this potential, their planning and operation are not free of difficulties. The main operational problem of bike-sharing systems is the unbalanced distribution of bicycles over the service region, which results in zones where bicycles are scarce and zones where they accumulate. To provide an acceptable level of service, the operator needs to carry out repositioning movements, which are costly. Existing bike-sharing repositioning optimization methods rely on estimates of the expected number of requests and returns at each location, so errors in these predictions translate directly into suboptimal repositioning solutions. For this reason, developing methodologies that accurately forecast bike-sharing usage is an issue of great concern. This paper addresses the problem using machine learning regression methods, which yield usage predictions from inputs such as historical usage and meteorological data. Three machine learning regression techniques (random forest, gradient boosting, and artificial neural networks) are analyzed and applied to a case study based on the New York City bike-sharing system. The paper describes the variables of the models and their calibration processes. Results are analyzed and compared to determine which of the three techniques is the most adequate, and under what conditions. Comparisons are made not only in terms of accuracy but also with respect to the applicability of the algorithms. Results indicate that, given the similar accuracy of all methods, the simpler calibration process of the random forest technique makes it advisable for most applications.
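As a rough illustration of how historical usage and meteorological data might be arranged as regression inputs, the following minimal sketch builds feature rows from lagged hourly counts plus a temperature value and scores a set of predictions. The function names (`make_samples`, `rmse`), the feature layout, the number of lags, and the sample numbers are all assumptions made for illustration and are not taken from the paper. The abstract does not state which accuracy measure was used, so root mean squared error appears here only as one common choice, and the repeat-the-last-count predictor is merely a stand-in for a fitted regressor.

```python
from math import sqrt


def make_samples(counts, temps, n_lags=3):
    """Turn hourly series into (features, target) pairs.

    Each feature row holds the previous n_lags counts followed by the
    temperature of the hour being predicted (a hypothetical layout).
    """
    if len(counts) != len(temps):
        raise ValueError("counts and temps must have the same length")
    X, y = [], []
    for t in range(n_lags, len(counts)):
        X.append(list(counts[t - n_lags:t]) + [temps[t]])
        y.append(counts[t])
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return sqrt(squared / len(y_true))


# Made-up illustrative numbers, not data from the case study.
n_lags = 3
counts = [12, 15, 9, 20, 22, 18, 25]
temps = [14.0, 15.5, 16.0, 18.0, 19.5, 19.0, 21.0]
X, y = make_samples(counts, temps, n_lags=n_lags)

# Placeholder predictor: repeat the most recent observed count.
# A fitted regression model would replace this step.
baseline = [row[n_lags - 1] for row in X]
baseline_rmse = rmse(y, baseline)
```

Under these assumptions, any of the three regression techniques could be fitted on `X` and `y`, and their errors compared using the same `rmse` function.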