Data Augmentation policies and heuristics effects over dataset imbalance for developing plant identification systems based on Deep Learning: A case study.

Luciano Araújo Dourado Filho,Rodrigo Tripodi Calumby

doi:10.5335/rbca.v14i2.13487

Abstract

Data augmentation (DA) is a widely known strategy for effectiveness improvement in computer vision models such as Deep Convolutional Neural Networks (DCNN). Although it enables improving model generalization by increasing data diversity, in this work we propose to investigate its effects with respect to two different sources of dataset imbalance (i.e., Content and Sampling imbalance) in a plant species recognition task. We systematically evaluated several techniques to generate the augmented datasets used to train the DCNN models that enabled a thorough investigation over the effects of DA in terms of imbalance attenuation. The results allowed inferring that data augmentation enables mitigating the negative effects related to underrepresentation mainly caused by the dataset imbalance.

Full Text