Impacts of Sample Design for Validation Data on the Accuracy of Feedforward Neural Network Classification

Giles Foody

doi:10.3390/app7090888

Abstract

Validation data are often used to evaluate the performance of a trained neural network and to select the network deemed optimal for the task at hand. Optimality is commonly assessed with a measure such as overall classification accuracy, which is often calculated directly from a confusion matrix showing the counts of cases in the validation set with particular labelling properties. The sample design used to form the validation set can, however, influence the estimated magnitude of the accuracy. The validation set is commonly formed with a stratified sample to give balanced classes, but it may also be formed via random sampling, which reflects class abundance. It is suggested that if the ultimate aim is to accurately classify a dataset in which the classes vary in abundance, a validation set formed via random, rather than stratified, sampling is preferred. This is illustrated with the classification of simulated and remotely-sensed datasets. With both datasets, statistically significant differences in the accuracy with which the data could be classified arose from the use of validation sets formed via random and stratified sampling (z = 2.7 and 1.9 for the simulated and real datasets respectively, for both p < 0.05). The classifications that used a stratified sample in validation were less accurate, a result of cases of an abundant class being commissioned into a rarer class. Simple means to address the issue are suggested.
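The calculation of overall accuracy from a confusion matrix, and the way the validation sample design alters it, can be sketched in Python. This is a minimal illustration, not the author's method: the two-class problem, the 90/10 class abundance, the per-class accuracies and the helper names (`overall_accuracy`, `users_accuracy`, `confusion_from_sample`) are hypothetical, and the confusion matrices are built from expected counts rather than from the output of a trained network. Rows are assumed to be predicted classes and columns reference classes.

```python
# Overall accuracy from a confusion matrix, and how the validation sample
# design changes the estimate for the *same* (hypothetical) classifier.
# Assumed convention: rows = predicted class, columns = reference class.

def overall_accuracy(cm):
    """Proportion of all cases that lie on the main diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def users_accuracy(cm, k):
    """Proportion of cases labelled k that truly belong to k (1 - commission error)."""
    return cm[k][k] / sum(cm[k])

def confusion_from_sample(n_per_class, producers_acc):
    """Expected two-class confusion matrix: reference class j has n_per_class[j]
    cases, a fraction producers_acc[j] of which is labelled correctly; the
    remainder is labelled as the other class."""
    cm = [[0, 0], [0, 0]]
    for j in range(2):
        correct = round(n_per_class[j] * producers_acc[j])
        cm[j][j] = correct
        cm[1 - j][j] = n_per_class[j] - correct
    return cm

# Hypothetical classifier: 80% of abundant-class and 95% of rare-class cases
# labelled correctly. Hypothetical population: 90% abundant, 10% rare.
producers = [0.80, 0.95]          # [abundant, rare]
random_sample = [900, 100]        # random sampling reflects class abundance
stratified_sample = [500, 500]    # stratified sampling balances the classes

cm_random = confusion_from_sample(random_sample, producers)
cm_strat = confusion_from_sample(stratified_sample, producers)

print(cm_random, overall_accuracy(cm_random))   # [[720, 5], [180, 95]] 0.815
print(cm_strat, overall_accuracy(cm_strat))     # [[400, 25], [100, 475]] 0.875
# Rare-class user's accuracy in the random sample: abundant cases commissioned into it.
print(round(users_accuracy(cm_random, 1), 3))   # 0.345
```

For the same hypothetical classifier, the balanced (stratified) sample reports a higher overall accuracy than the random sample because it gives the rare, well-classified class far more weight than its abundance warrants; the low user's accuracy of the rare class in the random sample reflects abundant-class cases commissioned into it.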

Highlights

Artificial neural networks are widely used for supervised classification applications.
The overall trend expected would be for the accuracy with which the abundant class is classified to increase while the accuracy of the classification of the rarer class would decline. As such, it is hypothesized that the use of a stratified sample design may not be ideal, as its use relative to a randomly-defined validation dataset would be associated with a decrease in overall accuracy, arising noticeably through a decrease in the accuracy for the abundant class as a result of an increase in the commission of cases of the abundant class into the set of rarer classes.
Relative to the classification obtained with a stratified validation sample, the use of a validation set formed by random sampling resulted in a higher overall accuracy on the testing set.

Summary

Introduction

Artificial neural networks are widely used for supervised classification applications. In a supervised classification with a conventional feedforward neural network, it is common for part of the training sample to be used for validation purposes [1,2,10,33,34,35]. In this data-splitting approach, part of the training set is used in the normal way to provide examples of the classes from which the classifier may learn rules to classify cases of unknown membership, while the remaining part forms a validation set used to evaluate the trained network and to select the network deemed optimal.

Because a validation set formed by random sampling reflects class abundance, its overall accuracy is dominated by the abundant class(es). The overall trend expected from using such a validation set would be for the accuracy with which the abundant class is classified to increase while the accuracy of the classification of the rarer class would decline. As such, it is hypothesized that the use of a stratified sample design may not be ideal, as its use relative to a randomly-defined validation dataset would be associated with a decrease in overall accuracy, arising noticeably through a decrease in the accuracy for the abundant class(es) as a result of an increase in the commission of cases of the abundant class(es) into the set of rarer classes.
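The selection effect hypothesized above can be sketched with two candidate networks. This is a hypothetical, deterministic illustration rather than the author's experiment: the networks are summarized only by assumed per-class accuracies, accuracies are computed from expected counts, and `two_proportion_z` is the standard z-test for two accuracies estimated from independent samples, used here as an assumption because the abstract does not state which z-test was applied. All names and numbers are illustrative.

```python
import math

def expected_accuracy(n_per_class, producers_acc):
    """Overall accuracy that a set with the given class composition would report
    for a network with the given per-class accuracies (diagonal / total)."""
    correct = sum(round(n * a) for n, a in zip(n_per_class, producers_acc))
    return correct / sum(n_per_class)

def select_network(candidates, n_per_class):
    """Return the candidate name with the highest validation accuracy."""
    return max(candidates, key=lambda name: expected_accuracy(n_per_class, candidates[name]))

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two accuracies from independent samples."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Hypothetical candidate networks: per-class accuracy [abundant, rare].
candidates = {
    "net_a": [0.95, 0.70],   # favours the abundant class
    "net_b": [0.80, 0.95],   # commissions many abundant cases into the rare class
}

random_val = [900, 100]      # random validation sample: reflects 90/10 abundance
stratified_val = [500, 500]  # stratified validation sample: balanced classes

chosen_random = select_network(candidates, random_val)     # "net_a"
chosen_strat = select_network(candidates, stratified_val)  # "net_b"

# Evaluate each chosen network on a testing set that reflects class abundance
# (assumed independent testing sets of the same composition).
test_set = [1800, 200]
n_test = sum(test_set)
acc_random = expected_accuracy(test_set, candidates[chosen_random])
acc_strat = expected_accuracy(test_set, candidates[chosen_strat])
z = two_proportion_z(acc_random, n_test, acc_strat, n_test)
print(chosen_random, acc_random, chosen_strat, acc_strat, round(z, 2))
```

With the randomly sampled validation set, the network that favours the abundant class is selected; with the stratified one, the network that commissions abundant cases into the rare class wins, and it is the less accurate of the two on a testing set that reflects class abundance. If both networks were tested on the same cases, a test for related samples such as McNemar's would be more appropriate than the independent-samples z-test assumed here.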

Journal: Applied Sciences
Publication Date: Aug 30, 2017
Citations: 12
License type: CC BY 4.0

Similar Papers

Effect of sampling design on abundance estimates of benthic invertebrates in environmental monitoring studies
Hn Cabral ... Ag Murta
Marine Ecology Progress Series | VOL. 276
01 Jan 2004

A new model selection strategy in time series forecasting with artificial neural networks: IHTS
Serkan Aras ... İpek Deveci Kocakoç
Neurocomputing | VOL. 174
19 Oct 2015

A transfer learning approach to space debris classification using observational light curve data
James Allworth ... Mitch Bryson
Acta Astronautica | VOL. 181
27 Jan 2021

Attention Mechanism and Depthwise Separable Convolution Aided 3DCNN for Hyperspectral Remote Sensing Image Classification
Wenmei Li ... Yu Wang
Remote Sensing | VOL. 14
05 May 2022
