Abstract: Estimating the duration of taxi trips in New York City (NYC) is difficult because urban transportation networks are complex and many variables affect travel time. In this study, we present a novel approach to this problem that combines machine learning techniques with a wide range of features derived from taxi trip data. We first collect a large set of historical NYC taxi trip records containing pick-up and drop-off locations, timestamps, and trip durations. We preprocess the data to handle outliers, missing values, and spatial and temporal inconsistencies. We then engineer a broad set of features, including geographic coordinates, time of day, day of the week, and weather conditions, to capture the spatial, temporal, and contextual aspects of each trip. Using gradient boosting methods, we build a prediction model that captures the complex patterns in the data. We tune the model's hyperparameters to improve performance and use cross-validation to ensure robustness. In addition, we apply ensemble techniques to improve prediction accuracy and reduce model bias. To evaluate the proposed approach, we run extensive experiments on a held-out test set and compare our model against several baseline methods commonly used for trip-time prediction. The results show that our approach outperforms these baselines, with lower prediction errors and higher accuracy. We also perform interpretability analyses to identify the variables that most influence the predicted trip time. Our results demonstrate the potential of feature engineering and machine learning (ML) approaches for accurate and reliable taxi trip duration prediction in NYC.
The proposed method helps taxi service companies predict trip durations more accurately and improves the customer experience by providing more precise travel time estimates. Our approach may also serve as a starting point for future research on urban transportation prediction, enabling greater efficiency and better planning in urban mobility networks.
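
As an illustration of the spatial and temporal feature engineering step summarised in the abstract, the following is a minimal sketch using only the Python standard library. The abstract does not give the exact feature definitions, so the function names, the dictionary keys, the use of the haversine distance, and the sample coordinates are all assumptions made for this example. Weather features are omitted because they would require an external data source.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_features(pickup_time, pickup, dropoff):
    """Build a feature dict for one trip.

    The key names are hypothetical; they are not taken from the paper.
    pickup and dropoff are (latitude, longitude) tuples in degrees.
    """
    ts = datetime.fromisoformat(pickup_time)
    return {
        # Spatial features: raw coordinates plus straight-line distance.
        "pickup_lat": pickup[0],
        "pickup_lon": pickup[1],
        "dropoff_lat": dropoff[0],
        "dropoff_lon": dropoff[1],
        "distance_km": haversine_km(*pickup, *dropoff),
        # Temporal features: time of day and day of the week.
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday = 0, Sunday = 6
        "is_weekend": ts.weekday() >= 5,
    }


# Made-up example trip in Manhattan on a Monday evening.
features = trip_features(
    "2016-03-14 17:24:55",
    pickup=(40.7679, -73.9822),
    dropoff=(40.7656, -73.9646),
)
print(features)
```

A feature table built this way could then be passed to a gradient boosting regressor with trip duration as the target. The straight-line distance is only a rough proxy for the driven route, which is one reason contextual features such as time of day are included.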