Abstract

The development of the “causal” forest by Wager and Athey (J Am Stat Assoc 113(523): 1228–1242, 2018) represents a significant advance in the area of explanatory/causal machine learning. However, this approach has not yet been widely applied to geographically referenced data, which present a unique issue: the random split of the test and training sets in the typical causal forest design fractures the spatial fabric of geographic data. To address this issue, we use a simulated dataset with known properties for average treatment effects and conditional average treatment effects to compare the performance of causal forest (CF) models across different definitions of the test/train split. We also develop a new “spatial” T-learner that can be implemented using predictive methods like random forest to provide estimates of heterogeneous treatment effects across all units. Our results show that all of the machine learning models outperform traditional ordinary least squares regression at identifying the true average treatment effect, but are not significantly different from one another. We then apply the preferred causal forest model to analyse the treatment effect of the construction of the Valley Metro light rail (tram) system on on-road CO2 emissions per capita at the block group level in Maricopa County, Arizona. We find that the neighbourhoods most likely to benefit from treatment are those with higher pre-treatment proportions of transit and pedestrian commuting and lower proportions of auto commuting.
