Abstract

In deep learning, practitioners strive to build high-quality deep neural networks (DNNs) to improve prediction accuracy. It is well known that the quality of training data strongly affects the quality of DNN models, since every DNN model is obtained by training on such data. However, no systematic study has been reported on how the quality of training data affects the quality of DNN models. To study the relationship between data quality and model quality, this paper considers four aspects of data quality: Skewed Classes, Sample Complexity, Label Quality, and Noisy Data. We design experiments on MNIST and CIFAR-10 to determine how each of the four aspects influences the quality of DNN models. The Pearson and Spearman correlation coefficients are used to evaluate these influences. Experimental results show that all four aspects of data quality have a significant impact on the quality of DNN models. That is, a decrease in data quality in any of these four aspects reduces the accuracy of the DNN models.
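
The kind of analysis described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes one plausible way to lower Label Quality (replacing a fraction of labels with a different class) and shows how the Pearson and Spearman coefficients could relate a degradation level to a quality measure. The names `flip_labels`, `pearson`, `ranks`, and `spearman` are hypothetical helpers introduced for illustration. The quality measure used here is label agreement with the clean labels, computed directly from synthetic labels; in an actual study it would be the test accuracy of a DNN trained on each degraded dataset.

```python
import math
import random


def flip_labels(labels, noise_rate, num_classes, seed=0):
    """Return a copy of labels with a fraction noise_rate changed to another class.

    Assumption: uniform random label flipping; the paper may degrade labels differently.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return noisy


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Synthetic 10-class labels (placeholder data, not MNIST or CIFAR-10).
clean = [i % 10 for i in range(1000)]
noise_rates = [0.0, 0.1, 0.2, 0.3, 0.4]

# Stand-in quality measure: fraction of labels that still match the clean ones.
agreement = []
for rate in noise_rates:
    noisy = flip_labels(clean, rate, num_classes=10)
    agreement.append(sum(a == b for a, b in zip(clean, noisy)) / len(clean))

print("Pearson: ", pearson(noise_rates, agreement))
print("Spearman:", spearman(noise_rates, agreement))
```

Pearson measures linear association, while Spearman only requires a monotonic one, so reporting both guards against a relationship that is consistent in direction but not linear.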
