Abstract

Can we improve machine-learning (ML) emulators with synthetic data? If data are scarce or expensive to source and a physical model is available, statistically generated data may be useful for augmenting training sets cheaply. Here we explore the use of copula-based models for generating synthetically augmented datasets in weather and climate by testing the method on a toy physical model of downwelling longwave radiation and a corresponding neural network emulator. Results show that with copula-augmented datasets, the mean absolute error (MAE) of predictions is reduced by up to 62 % (from 1.17 to 0.44 W m−2).
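The 62 % figure is the relative reduction in MAE between the two values quoted above:

```latex
\frac{\mathrm{MAE}_{\text{baseline}} - \mathrm{MAE}_{\text{augmented}}}{\mathrm{MAE}_{\text{baseline}}}
  = \frac{1.17 - 0.44}{1.17}
  = \frac{0.73}{1.17}
  \approx 0.624 \approx 62\,\%
```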

Highlights

  • The use of machine learning (ML) in weather and climate is becoming increasingly popular (Huntingford et al., 2019; Reichstein et al., 2019)

  • When it comes to training ML models for weather and climate applications, two main strategies may be identified: one in which input and output pairs are directly provided, and a second in which inputs are provided but the corresponding outputs are generated through a physical model

  • The method is demonstrated with a toy model of downwelling radiation as the physical model (Sect. 2.4) and a simple feed-forward neural network (FNN) as the ML emulator (Sect. 2.5)
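The second strategy in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code. `physical_model` is a hypothetical placeholder, not the toy radiation model. A multivariate normal fitted to the real inputs stands in for the copula generator. `augment_training_set` is an illustrative name.

```python
import numpy as np


def physical_model(x):
    """Hypothetical placeholder for a physical model (NOT the paper's toy radiation model)."""
    return np.sin(x[:, 0]) + x[:, 1] ** 2


def augment_training_set(x_real, n_synthetic, rng):
    """Strategy two: synthesize extra inputs, then label all inputs with the physical model."""
    # Stand-in generator (assumption): a multivariate normal fitted to the real inputs.
    mean = x_real.mean(axis=0)
    cov = np.cov(x_real, rowvar=False)
    x_syn = rng.multivariate_normal(mean, cov, size=n_synthetic)
    # Real inputs first, synthetic inputs appended after them.
    x_all = np.vstack([x_real, x_syn])
    # Outputs come from the physical model, so synthetic inputs need no observed labels.
    y_all = physical_model(x_all)
    return x_all, y_all
```

The augmented pairs `(x_all, y_all)` would then be used to train the emulator in place of the real pairs alone. In a copula-based workflow, the stand-in generator would be replaced by a sampler drawn from the fitted copula.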


Introduction

The use of machine learning (ML) in weather and climate is becoming increasingly popular (Huntingford et al., 2019; Reichstein et al., 2019). […] (…; Krasnopolsky and Lin, 2012; Rasp and Lerch, 2018). When it comes to training ML models for weather and climate applications, two main strategies may be identified: one in which input and output pairs are directly provided (e.g. both come from observations), and a second in which inputs are provided but the corresponding outputs are generated through a physical model (e.g. parameterization schemes or even a whole weather and climate model).

Given estimated models $C$ and $F_1, \ldots, F_n$ for the copula and the marginal distributions, respectively, we can generate synthetic data as follows:
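The sketch below shows one common form of this generation step. It assumes a Gaussian copula for $C$ and empirical marginals for $F_1, \ldots, F_n$. These are illustrative choices, not necessarily the authors'. The recipe draws $U = (U_1, \ldots, U_n) \sim C$ and then sets $X_i = F_i^{-1}(U_i)$. The function names `fit_gaussian_copula` and `sample_gaussian_copula` are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_gaussian_copula(x):
    """Estimate a Gaussian copula and empirical marginals from data x of shape (m, n)."""
    m = x.shape[0]
    # Pseudo-observations: ranks scaled into (0, 1) so the normal quantile stays finite.
    u = (stats.rankdata(x, axis=0) - 0.5) / m
    z = stats.norm.ppf(u)
    # The correlation of the normal scores parameterizes the Gaussian copula C.
    corr = np.corrcoef(z, rowvar=False)
    # Sorted columns define the empirical marginals F_i (and their inverses).
    sorted_cols = np.sort(x, axis=0)
    return corr, sorted_cols


def sample_gaussian_copula(corr, sorted_cols, size, rng):
    """Draw `size` synthetic rows: U ~ C, then X_i = F_i^{-1}(U_i)."""
    m, n = sorted_cols.shape
    # 1. Correlated standard normals with the copula's correlation matrix.
    z = rng.multivariate_normal(np.zeros(n), corr, size=size)
    # 2. Map to uniforms, giving U ~ C.
    u = stats.norm.cdf(z)
    # 3. Inverse empirical marginals via linear interpolation of the sorted data.
    probs = (np.arange(m) + 0.5) / m
    return np.column_stack(
        [np.interp(u[:, i], probs, sorted_cols[:, i]) for i in range(n)]
    )
```

The dependence structure is captured in normal-score space, independently of each variable's marginal shape. Because `np.interp` clamps to the end points, sampled values stay within the observed range of each variable. A parametric marginal would be needed to extrapolate beyond that range.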

