Generalizability of Gene Expression Programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran

Jalal Shiri, Ali Ashraf Sadraddini, Amir Hossein Nazemi, Ozgur Kisi, Gorka Landeras, Ahmad Fakheri Fard, Pau Marti

doi:10.1016/j.jhydrol.2013.10.034

Abstract

When dealing with climatic variables, the performance assessment of many Artificial Intelligence (AI) and/or data mining applications is based on a single assignment of the data set to training and test sets. Further, this assignment is very often defined according to a local and temporal criterion, i.e. the models are trained and tested using data from the same station. Under this procedure, the performance of the models outside the training location cannot be inferred. The present work evaluates the performance of Gene Expression Programming (GEP) based models for estimating reference evapotranspiration (ET0) according to temporal and spatial criteria and data set scanning procedures in coastal environments of Iran. The accuracy differences between the local and the external performance depend on the specific climatic trends of the test stations, as well as on the input combination used to feed the models. When relying on a suitable input selection, externally trained models might be a valid alternative to locally trained ones, which would be a crucial advantage in places where only limited climatic variables are available. K-fold testing is a good choice for avoiding the partially valid conclusions that can result from model assessments based on a simple data set assignment. Further, calibration of the GEP model may not be needed if enough climatic data are available at other stations for external model application. The performance of the GEP model fluctuates chronologically and spatially. A suitable assessment of the model should therefore consider a complete local and/or external scan of the data set used.
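The local and external scanning procedures described above can be illustrated with a minimal sketch. It is not the authors' implementation. The record layout, the station names S1 to S3, the synthetic values and the helper names are all assumptions made for illustration, and a least-squares line stands in for the GEP model. The local scan holds out each year of one station in turn; the external scan holds out each station in turn and trains on the others.

```python
import math
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical record layout: (station, year, predictor x, observed ET0).
Record = Tuple[str, int, float, float]


def temporal_folds(records: List[Record], station: str) -> Iterator[tuple]:
    """Local scan: within one station, hold out each year in turn."""
    local = [r for r in records if r[0] == station]
    for year in sorted({r[1] for r in local}):
        train = [r for r in local if r[1] != year]
        test = [r for r in local if r[1] == year]
        yield year, train, test


def spatial_folds(records: List[Record]) -> Iterator[tuple]:
    """External scan: hold out each station in turn, train on the rest."""
    for station in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != station]
        test = [r for r in records if r[0] == station]
        yield station, train, test


def fit_linear(train: List[Record]) -> Callable[[float], float]:
    """Stand-in for GEP (assumption): least-squares line ET0 = a + b * x."""
    xs = [r[2] for r in train]
    ys = [r[3] for r in train]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def rmse(obs: List[float], est: List[float]) -> float:
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def scan(folds: Iterator[tuple]) -> Dict[object, float]:
    """Train and test on every fold; return the test RMSE per held-out key."""
    scores = {}
    for key, train, test in folds:
        model = fit_linear(train)
        scores[key] = rmse([r[3] for r in test], [model(r[2]) for r in test])
    return scores


# Synthetic data (not from the paper): S3 has a different offset than S1 and S2,
# mimicking a station whose climatic trend differs from the training stations.
offsets = {"S1": 0.5, "S2": 0.5, "S3": 1.5}
records = [
    (s, year, x, offsets[s] + 1.5 * x)
    for s in offsets
    for year in (2001, 2002, 2003)
    for x in (1.0, 2.0, 3.0)
]

local_s3 = scan(temporal_folds(records, "S3"))  # locally trained and tested
spatial = scan(spatial_folds(records))          # externally trained
print("local scan at S3:", local_s3)
print("external scan:", spatial)
```

In this toy setting the externally trained model is less accurate at S3 than the locally trained one because S3 departs from the other stations. Reporting the score of every fold, rather than that of a single split, is what exposes this kind of variation across years and stations.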

Full Text