Abstract

Small objects are widely found in different applications such as traffic signs and to segment those objects make it difficult to extract features due to the small number of pixels. Previous research has been done to show how error prone the semantic segmentation networks to small objects in variance of application such as medical images and remote sensing and how it leads to class imbalance. However, small object segmentation seems to be tricky and making the network struggle. Recently there are small amount of research has been done in the effect of the feature extraction backbone to the small object datasets. In this paper we investigate the effect of different backbone feature extraction such as AlexNet, VGGNet, GoogleNet on an imbalanced small objects dataset after grouping them by shape and colour in the Fully Convolutional Networks (FCN). We measure the performance on PASCAL VOC and Malaysian Traffic Sign Dataset (MTSD) showing the pixel accuracy, mean accuracy per class, mean IoU and frequency weighted IoU for each backbone and FCN. The results show that VGGNet as a backbone with Cross Entropy (CE) combined with Dice Loss (DL) achieves the highest score in mean IoU for imbalanced dataset but not for balanced dataset. However, in the imbalanced dataset major classes have a higher probability to confuse with minor classes due to the class imbalance. In conclusion we investigate different backbone networks with grouped labels dataset in shape and colour and we recommend using VGGNet FCN with CE combined with DL for imbalanced datasets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call