Understanding and Predicting Ride-Hailing Fares in Madrid: A Combination of Supervised and Unsupervised Techniques

Tulio Silveira-Santos, Anestis Papanikolaou, Thais Rangel, Jose Manuel Vassallo

doi:10.3390/app13085147

Open Access

https://doi.org/10.3390/app13085147

Journal: Applied Sciences
Publication Date: Apr 20, 2023
Citations: 1
License type: CC BY 4.0

Affiliation: Universidad Politécnica de Madrid

Abstract

App-based ride-hailing mobility services are becoming increasingly popular in cities worldwide. However, the key drivers that explain how the balance between supply and demand sets final prices remain largely unknown. This research aims to understand and predict the behavior of ride-hailing fares by employing statistical and supervised machine learning approaches (such as Linear Regression, Decision Tree, and Random Forest). The data used for model calibration cover a ten-month period and were downloaded from the Uber Application Programming Interface for the city of Madrid. The findings show that the Random Forest model is the most appropriate for this type of prediction, achieving the best performance metrics. To further understand the patterns of the prediction errors, the unsupervised technique of cluster analysis (using the k-means clustering method) was applied to explore the variation of the discrepancy between predicted and observed Uber fares. The analysis identified a small share of observations with high prediction errors (only 1.96%), which are caused by unexpected surges due to imbalances between supply and demand (usually occurring at major events, peak times, weekends, holidays, or when there is a taxi strike). This study helps policymakers understand pricing, demand for services, and pricing schemes in the ride-hailing market.
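
The two-stage workflow described in the abstract (compare supervised regressors, then cluster the prediction errors with k-means) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of the Uber API data, and the feature names (`distance_km`, `hour`, `is_weekend`), the simulated surge multiplier, and the choice of two clusters are all hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in data (assumption: NOT the Madrid Uber data from the paper).
distance_km = rng.uniform(1, 20, n)
hour = rng.integers(0, 24, n)
is_weekend = rng.integers(0, 2, n)
# Rare, unobserved surge multiplier that the models cannot see as a feature.
surge = np.where(rng.random(n) < 0.02, rng.uniform(1.5, 3.0, n), 1.0)
fare = (3.0 + 1.2 * distance_km + 0.5 * is_weekend) * surge + rng.normal(0, 0.5, n)

X = np.column_stack([distance_km, hour, is_weekend])
X_train, X_test, y_train, y_test = train_test_split(
    X, fare, test_size=0.3, random_state=0
)

# Stage 1: fit the three supervised models named in the abstract.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
        "pred": pred,
    }
    print(f"{name}: MAE={results[name]['mae']:.3f}, R2={results[name]['r2']:.3f}")

# Stage 2: cluster the absolute errors of the lowest-MAE model with k-means.
best = min(results, key=lambda k: results[k]["mae"])
errors = np.abs(y_test - results[best]["pred"]).reshape(-1, 1)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(errors)

# The cluster with the larger centroid holds the high-error observations.
high = int(np.argmax(kmeans.cluster_centers_.ravel()))
share_high = float(np.mean(kmeans.labels_ == high))
print(f"Best model: {best}; share of high-error observations: {share_high:.2%}")
```

In this sketch the high-error cluster is driven by the simulated surge rows, which mirrors the paper's interpretation only loosely; the numbers it prints say nothing about the reported results.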

Full Text