System identification, sensitivity analysis, optimization, and control require a large number of model evaluations. Accurate simulators are too slow for these applications. Fast emulators meet this demand for efficiency by sacrificing unneeded accuracy for speed. There are many strategies for developing emulators, but selecting one remains subjective. Herein we compare the performance of two kinds of emulators: mechanistic emulators that use knowledge of the simulator's equations, and purely data-driven emulators that use matrix factorization. We borrow simulators from urban water management, because more stringent performance criteria on water utilities have made emulation a crucial tool within this field. Results suggest that naive data-driven emulation outperforms mechanistic emulation. We discuss scenarios in which mechanistic emulation seems favorable, namely extrapolation in time and dealing with sparse and unevenly sampled data. We also point to advances in machine learning that have not yet permeated the environmental science community.