The spatial heterogeneity and temporal variability of urban traffic make traffic emissions difficult to infer. To address this challenge, this study introduces a novel geographical context-based approach that combines high-resolution taxi GPS data with multidimensional contextual factors, including road network attributes, points of interest (POI), weather, and population density. Compared with conventional macroscopic estimation techniques, the method can improve the precision of traffic emissions inference. To address missing observations in the taxi data, three ensemble machine learning algorithms are used to infer traffic speed: Random Forest, Gradient Boosting Decision Trees (GBDT), and eXtreme Gradient Boosting (XGBoost). These algorithms handle large volumes of taxi GPS data efficiently, with comparatively low computational time and model complexity. The framework builds a localized model for each road segment that accounts for both the geographical and the external features characterizing the urban environment, which supports a finer-grained understanding of traffic dynamics. A comparative analysis is conducted to assess the method's performance. The results indicate that incorporating multidimensional urban features improves traffic speed inference. Among the ensemble models, Random Forest performs best when the missing rate is low or the sample size is small, whereas XGBoost performs best for higher missing rates or larger sample sizes. A feature-importance analysis for traffic speed further shows that road network features are the most influential, followed by temporal, spatial, POI, and weather features.
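The missing-data step can be illustrated with a minimal scikit-learn sketch. It assumes a feature matrix and a speed vector for a single road segment, hides a random fraction of speed records to simulate a given missing rate, fits each ensemble regressor on the remaining records, and scores the imputed speeds by mean absolute error. The helper names `make_models` and `evaluate_imputation`, the feature names, the synthetic data, and the hyperparameters are all hypothetical and not taken from the study; XGBoost is included only when the `xgboost` package is installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def make_models(seed=0):
    """Ensemble regressors named in the abstract; XGBoost is added only if installed."""
    models = {
        "RandomForest": RandomForestRegressor(n_estimators=200, random_state=seed),
        "GBDT": GradientBoostingRegressor(n_estimators=200, random_state=seed),
    }
    try:
        from xgboost import XGBRegressor  # optional third-party dependency
        models["XGBoost"] = XGBRegressor(n_estimators=200, random_state=seed)
    except ImportError:
        pass
    return models


def evaluate_imputation(X, y, missing_rate, seed=0):
    """Hide a random fraction of speed records, fit each model on the rest,
    and report the mean absolute error on the hidden (simulated-missing) records."""
    rng = np.random.default_rng(seed)
    missing = rng.random(len(y)) < missing_rate  # True = treated as missing
    scores, importances = {}, None
    for name, model in make_models(seed).items():
        model.fit(X[~missing], y[~missing])
        scores[name] = mean_absolute_error(y[missing], model.predict(X[missing]))
        if name == "RandomForest":
            importances = model.feature_importances_  # impurity-based, sums to 1
    return scores, importances, missing


# Synthetic records for ONE hypothetical road segment (not data from the study).
rng = np.random.default_rng(42)
n = 500
hour = rng.integers(0, 24, n)
weekday = rng.integers(0, 7, n)
rain_mm = rng.uniform(0, 5, n)
temp_c = rng.uniform(-5, 35, n)
is_peak = np.isin(hour, [7, 8, 17, 18])
speed = 40 - 15 * is_peak - 3 * rain_mm + rng.normal(0, 2, n)  # km/h
X = np.column_stack([hour, weekday, rain_mm, temp_c])

scores, importances, missing = evaluate_imputation(X, speed, missing_rate=0.2)
for name, mae in scores.items():
    print(f"{name}: MAE = {mae:.2f} km/h")
print(dict(zip(["hour", "weekday", "rain_mm", "temp_c"], importances.round(3))))
```

Repeating the call for each segment and over a range of `missing_rate` values gives the kind of comparison the abstract describes. scikit-learn's impurity-based `feature_importances_` is only one of several importance measures and may differ from the one used in the study.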
Finally, the inferred traffic speed and volume are fed into the COPERT model to estimate emissions from road traffic across a large urban network. Unlike methods that rely on complex multi-source data for emission estimation, our approach uses simple, readily accessible data and enables precise emission estimation at large spatial and temporal scales.
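The emission step can be sketched as a per-segment calculation: hot emissions over one interval equal traffic volume times segment length times a speed-dependent emission factor. The sketch assumes a rational-function emission-factor curve of the general shape used for COPERT hot emission factors. The coefficients, the single vehicle category, the segment records, and the helper names `EFCoefficients` and `segment_emission` are hypothetical placeholders, not values or code from COPERT or from the study.

```python
from dataclasses import dataclass


@dataclass
class EFCoefficients:
    """Coefficients of a speed-dependent emission-factor curve (g/km).
    The values used below are PLACEHOLDERS, not COPERT table values."""
    alpha: float
    beta: float
    gamma: float
    delta: float
    epsilon: float
    zeta: float
    eta: float
    reduction: float = 0.0  # fractional reduction factor, 0 = none

    def ef(self, v_kmh: float) -> float:
        """Emission factor in g/km at mean speed v_kmh (must be > 0)."""
        if v_kmh <= 0:
            raise ValueError("speed must be positive")
        num = self.alpha * v_kmh**2 + self.beta * v_kmh + self.gamma + self.delta / v_kmh
        den = self.epsilon * v_kmh**2 + self.zeta * v_kmh + self.eta
        return num / den * (1.0 - self.reduction)


def segment_emission(volume_veh: float, length_km: float, speed_kmh: float,
                     coeffs: EFCoefficients) -> float:
    """Hot emissions (g) on one segment over one interval: volume x length x EF(speed)."""
    return volume_veh * length_km * coeffs.ef(speed_kmh)


# Hypothetical inferred (volume in vehicles, length in km, speed in km/h) for one hour.
segments = [(1200, 0.8, 20.0), (600, 1.5, 10.0), (900, 0.5, 45.0)]
# Placeholder curve EF(v) = 0.5 + 10 / v, chosen only to keep the arithmetic readable.
placeholder = EFCoefficients(alpha=0.0, beta=0.0, gamma=0.5, delta=10.0,
                             epsilon=0.0, zeta=0.0, eta=1.0)
total_g = sum(segment_emission(q, l, v, placeholder) for q, l, v in segments)
print(f"Total hot emissions: {total_g:.1f} g")  # Total hot emissions: 2635.0 g
```

A real application would look up coefficients by pollutant, vehicle category, fuel type, and emission standard, and would split each segment's volume according to fleet composition before summing.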