Comment on gmd-2021-218

doi:10.5194/gmd-2021-218-rc3

Abstract

The use of statistical models to study the impact of weather on crop yield continues to grow. Unfortunately, this type of application is characterised by datasets with a very limited number of samples (typically one sample per year). In general, statistical inference uses three datasets: the training dataset to optimise the model parameters, the validation dataset to select the best model, and the testing dataset to evaluate the model's generalisation ability. Splitting the overall database into three datasets is impossible in crop yield modelling. The leave-one-out cross-validation method, or simply leave-one-out (LOO), was introduced to facilitate statistical modelling when the database is limited. However, with LOO the model choice is made using the testing dataset, which can be misleading because it favours unnecessarily complex models. The nested cross-validation approach was introduced in machine learning to avoid this problem by truly using three datasets, especially for problems with limited databases. In this study, we propose one particular implementation of nested cross-validation, called the leave-two-out method (LTO), to choose the best model with an optimal model complexity (using the validation dataset) and to estimate the true model quality (using the testing dataset). Two applications are considered: Robusta coffee in Cu M'gar (Dak Lak, Vietnam) and grain maize over 96 French departments. In both cases, LOO is misleading because it chooses overly complex models; LTO indicates that simpler models actually perform better when a reliable generalisation test is considered. The simple models obtained using the LTO approach have reasonable yield anomaly forecasting skills for both crops studied. The LTO approach can also be used in seasonal forecasting applications. We suggest that the LTO method should become a standard procedure for statistical crop modelling.

Highlights

Many approaches are available to study the impact of climate/weather variables on crop yield
We suggest that the leave-two-out (LTO) method should become a standard procedure for statistical crop modelling
The chosen model is not independent of the testing dataset, and the obtained testing score may be unreliable. This is not a problem if there are many available samples, but a small sample size can cause several issues: the model can overfit the training dataset, the complexity of the chosen model may be inadequate, and the assessment of its generalisation ability may be wrong. This is a frequent mistake in crop yield modelling, where over-complex models are used that cannot be calibrated with a limited number of samples

Summary

Introduction

Many approaches are available to study the impact of climate/weather variables on crop yield. This is not a problem if there are many available samples, but a small sample size can cause several issues: the model can overfit the training dataset, the complexity of the chosen model may be inadequate, and the assessment of its generalisation ability may be wrong. This is a frequent mistake in crop yield modelling, where over-complex models are used that cannot be calibrated with a limited number of samples. The LTO will be used here to obtain a reliable assessment of the model generalisation ability, compare the performances of different predictive models, and determine the optimal complexity of the statistical crop models. This approach will be tested in two real-world applications: Robusta coffee in Cu M’gar (a district of Dak Lak province in Vietnam) from 2000 to 2018 and grain maize over 96 departments (i.e., administrative units) in France for the 1989-2010 period. The following sections of this study will (1) introduce the databases used for statistical crop models, (2) describe the role of the three datasets in statistical inference, (3) introduce the two cross-validation approaches, (4) evaluate and select the “best model” by using the LOO and LTO approaches, (5) estimate the Robusta coffee yield anomalies in Cu M’gar (Dak Lak, Vietnam), and (6) assess the seasonal yield anomaly forecasts for grain maize in France.
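The difference between the two cross-validation approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic (20 samples standing in for 20 years), model complexity is represented by the degree of a polynomial fit, and the names `fit_predict`, `loo_rmse`, `loo_select` and `lto_select` are hypothetical. The nested loop is one common way to implement nested cross-validation in which every inner fit leaves two samples out; the exact LTO procedure of the study may differ.

```python
import numpy as np


def fit_predict(x_train, y_train, x_new, degree):
    """Fit a polynomial of the given degree and predict at x_new."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.polyval(coeffs, x_new)


def loo_rmse(x, y, degree):
    """Leave-one-out RMSE: each sample is predicted by a model fitted without it."""
    n = len(x)
    sq_errors = []
    for i in range(n):
        keep = np.arange(n) != i
        pred = fit_predict(x[keep], y[keep], x[i], degree)
        sq_errors.append((pred - y[i]) ** 2)
    return float(np.sqrt(np.mean(sq_errors)))


def loo_select(x, y, degrees):
    """Plain LOO: the same held-out errors choose the model AND report its score."""
    scores = {d: loo_rmse(x, y, d) for d in degrees}
    best = min(scores, key=scores.get)
    return best, scores[best]


def lto_select(x, y, degrees):
    """Nested scheme: the outer loop holds out a test sample, an inner LOO picks the degree."""
    n = len(x)
    chosen, sq_errors = [], []
    for t in range(n):
        keep = np.arange(n) != t
        x_in, y_in = x[keep], y[keep]
        # Inner fits use n - 2 samples: the test sample and one validation sample are left out.
        best, _ = loo_select(x_in, y_in, degrees)
        # Assumption: the chosen degree is refitted on the n - 1 non-test samples.
        pred = fit_predict(x_in, y_in, x[t], best)
        chosen.append(best)
        sq_errors.append((pred - y[t]) ** 2)
    return chosen, float(np.sqrt(np.mean(sq_errors)))


# Synthetic data (illustrative only): a linear signal plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 0.8 * x + rng.normal(0.0, 0.3, 20)
degrees = [1, 2, 3, 4, 5]

loo_best, loo_score = loo_select(x, y, degrees)
lto_chosen, lto_score = lto_select(x, y, degrees)
print("LOO: chosen degree", loo_best, "RMSE", round(loo_score, 3))
print("LTO: degree chosen per test sample", lto_chosen, "RMSE", round(lto_score, 3))
```

In `loo_select`, the reported score is the minimum over the candidate models of the held-out errors, so those errors have already been used for selection. In `lto_select`, the test sample plays no part in choosing the degree, so its error is an independent estimate of generalisation ability.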

Coffee yield database

Grain maize yield database

Statistical yield models

Model complexity and overfitting

Training, validation and testing datasets

Measuring the quality of statistical yield models

Leave-One-Out

Reliability model assessment

Conclusions and perspectives

Findings

Acknowledgements
