MarsNet: Multi-Label Classification Network for Images of Various Sizes

Ju-Youn Park, Jong-Hwan Kim, Dukyoung Lee, Yewon Hwang

doi:10.1109/access.2020.2969217

Abstract

Since the Convolutional Neural Network (CNN) surfaced and fascinated the world, many researchers have exploited CNNs for image classification, object detection, semantic segmentation, etc. However, conventional CNNs have a pyramidal structure and were designed to process images of the same size. Although some CNNs can accept images of various sizes, performance degrades for images smaller than those used for training. In this paper, we propose MarsNet, a CNN-based end-to-end network for multi-label classification that can accept inputs of various sizes. To allow the network to accept such images, the dilated residual network (DRN) is modified to produce higher-resolution feature maps, and horizontal vertical pooling (HVP) is newly designed to efficiently aggregate positional information from the feature maps. Furthermore, a multi-label scoring module and a threshold estimation module are employed to serve the purpose of multi-label classification. We verify the effectiveness of the proposed network through two distinct experiments. First, we verify our model by inspecting and classifying multiple types of defects that occur in the PCB screen printer, using solder paste inspection (SPI) datasets. Second, we verify our network using the VOC 2007 dataset. Our network is pioneering in that no prior research in the SPI field has attempted multi-label classification of defects while also accepting input images of various sizes.

Highlights

Ever since the idea of deep learning emerged, research on it has boomed because of its tremendous benefits
It is important to note that the Solder Paste Inspection (SPI) task is to classify multiple types of defects that occur in the screen printer by observing the entire PCB image at once
For SPI tasks with images of various sizes, the best performance was achieved by pooling the high-resolution feature maps of mDRN using horizontal vertical pooling (HVP) and applying the threshold estimation module

Summary

INTRODUCTION

Ever since the idea of deep learning emerged, research on it has boomed because of its tremendous benefits.

The low-resolution problem arises because CNNs generally adopt a structure that reduces the size of feature maps in a pyramidal fashion to offset the drawback of using an enormous number of parameters. We build a modified version of DRN, mDRN, to resolve this low-resolution problem by adding more dilated convolutional layers, which yield feature maps that are half the input image size in each dimension.

The output value of the j-th node in the fully connected layer, f_ij = f_j(X_i), represents the score of the corresponding class for the input X_i. The following sigmoid cross entropy loss is used for multi-label classification to train the proposed network: L(X, Y) = −(1/N) …, where σ(·) is the sigmoid function. Based on the above equation, the proposed MarsNet selects each class j whose score f_ij from the multi-label scoring module is greater than the corresponding threshold θ_j from the threshold estimation module.
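To make the scoring and selection steps concrete, here is a minimal pure-Python sketch of a sigmoid cross entropy loss over class scores f_ij, followed by per-class threshold selection. It is an illustration under stated assumptions, not the authors' implementation: the loss expression in the text above is truncated, so the standard form (binary cross entropy summed over classes and averaged over the N samples) is assumed here. The function names `sigmoid_cross_entropy` and `select_classes` are hypothetical, and the thresholds are fixed example values, whereas in MarsNet they come from the threshold estimation module.

```python
import math


def sigmoid_cross_entropy(scores, labels):
    """Assumed standard form: per-class binary cross entropy, summed over
    classes and averaged over the N samples.

    scores[i][j] is the raw score f_ij; labels[i][j] is y_ij in {0, 1}.
    """
    n = len(scores)
    total = 0.0
    for f_i, y_i in zip(scores, labels):
        for f, y in zip(f_i, y_i):
            # Numerically stable rewrite of
            #   -[y * log(sigmoid(f)) + (1 - y) * log(1 - sigmoid(f))]
            total += max(f, 0.0) - f * y + math.log1p(math.exp(-abs(f)))
    return total / n


def select_classes(sample_scores, thresholds):
    """Return the class indices j whose score f_ij exceeds theta_j."""
    return [j for j, (f, t) in enumerate(zip(sample_scores, thresholds)) if f > t]


# Hypothetical scores for two images and three classes.
scores = [[2.0, -1.0, 0.3], [-0.5, 1.5, 0.0]]
labels = [[1, 0, 0], [0, 1, 1]]
thresholds = [0.5, 0.0, 0.3]  # illustrative constants, one per class

loss = sigmoid_cross_entropy(scores, labels)
predicted = [select_classes(s, thresholds) for s in scores]
print(loss, predicted)
```

Because each class is compared against its own threshold, an image can receive zero, one, or several labels, which is what distinguishes this selection rule from taking a single arg-max class.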

EXPERIMENTS

SPI IMAGE DATASET

Findings

CONCLUSION