Abstract
The way that available data are divided into training, testing, and validation subsets can significantly influence the performance of an artificial neural network (ANN). Despite numerous studies, no systematic approach has been developed for the optimal division of data for ANN models. This paper presents two methodologies for dividing data into representative subsets: a genetic algorithm (GA) and a self-organizing map (SOM). These two methods are compared with the conventional approach commonly used in the literature, which involves an arbitrary division of the data. A case study is presented in which ANN models developed using each data division technique are used to forecast salinity in the River Murray at Murray Bridge (South Australia) 14 days in advance. When tested on a validation data set from July 1992 to March 1998, the models developed using the GA and SOM data division techniques reduced the RMS error by 24.2% and 9.9%, respectively, relative to the conventional data division method. It was also found that a SOM could be used to diagnose why an ANN model has performed poorly, provided that the poor performance is primarily related to the data themselves rather than to the choice of the ANN's parameters or architecture.
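The general idea behind SOM-based data division can be illustrated with a short sketch. It clusters the samples on a small self-organizing map and then draws each of the training, testing, and validation subsets from every cluster, so that all three subsets cover the same regions of the input space. This is a minimal illustration under our own assumptions and is not the authors' procedure. The 4x4 map, the linearly decaying learning rate and neighbourhood width, the 60/20/20 per-cluster split, and the names `train_som`, `best_matching_units`, and `som_divide` are all hypothetical choices. Inputs are assumed to be already standardized.

```python
import numpy as np


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM on `data` (n_samples x n_features).

    Hypothetical settings: linear decay of learning rate and Gaussian
    neighbourhood width, weights initialised from random data points.
    """
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One weight vector per map neuron, initialised from sampled data points.
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    # (row, col) position of each neuron on the map grid.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            x = data[i]
            # Best-matching unit: neuron whose weights are closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighbourhood around the BMU, measured on the grid.
            dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its neighbours toward the sample.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def best_matching_units(data, weights):
    """Index of the closest neuron for every sample (its SOM cluster)."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


def som_divide(data, split=(0.6, 0.2, 0.2), seed=0, **som_kwargs):
    """Split sample indices into (train, test, valid), stratified by SOM cluster.

    Assumed rule: within each cluster, floor(n * split[1]) samples go to
    testing, floor(n * split[2]) to validation, and the rest to training,
    so every non-empty cluster contributes at least one training sample.
    """
    rng = np.random.default_rng(seed)
    weights = train_som(data, seed=seed, **som_kwargs)
    units = best_matching_units(data, weights)
    parts = ([], [], [])
    for u in np.unique(units):
        members = rng.permutation(np.flatnonzero(units == u))
        n_test = int(len(members) * split[1])
        n_valid = int(len(members) * split[2])
        parts[1].append(members[:n_test])
        parts[2].append(members[n_test:n_test + n_valid])
        parts[0].append(members[n_test + n_valid:])
    return tuple(np.sort(np.concatenate(p)) for p in parts)
```

For example, `train_idx, test_idx, valid_idx = som_divide(X)` returns three disjoint index arrays that together cover every row of `X`. Sampling within clusters, rather than splitting the record arbitrarily, is what keeps rare regions of the input space from ending up in only one subset.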