Abstract

The deeper a network, the more computing resources it requires, and this is especially true for semantic segmentation networks. We know only that, within a certain range, more convolutional layers perform better; we do not know whether the number of convolutional layers in the backbone network actually meets the needs of the segmentation task, or whether the learning ability of the shallow layers is fully exploited. In this paper, we first propose introducing the number of convolutional layers as a hyper-parameter into the deep learning training task. Based on prior knowledge, we design loss weights for the deep supervision branches of a basic semantic segmentation network as a linear or quadratic function of the number of convolutional layers. The deep supervision layers serve as a tool that outputs losses from convolutional layers at different depths during training and are disabled at test time. For medium-sized networks, our method improves the MIoU scores on the PSV and CamVid datasets by an average of 1.37% and 0.94%, respectively, with the Lovász-Softmax loss function. On shallow networks, our approach yields significant improvements, raising MIoU scores by nearly 1% on the PSV dataset and nearly 8% on the CamVid dataset. Experimental results on the Cityscapes and CamVid datasets further show that deep supervision branches with loss weights stably improve the model's learning ability without increasing the number of model parameters or changing the backbone network structure.
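
As a rough illustration of the weighting scheme described above, the sketch below combines the losses of several deep supervision branches using weights that grow linearly or quadratically with the number of convolutional layers preceding each branch. The function name, the normalization by the deepest branch, and the monotonically increasing weights are our own assumptions for illustration; the abstract does not fix the exact coefficients. At test time the supervision branches would simply be omitted.

    def deep_supervision_loss(branch_losses, branch_depths, scheme="linear"):
        """Weighted sum of deep-supervision branch losses.

        branch_losses: per-branch segmentation losses (e.g. Lovász-Softmax),
                       ordered from shallow to deep; plain floats or framework
                       scalar tensors both work.
        branch_depths: number of convolutional layers preceding each branch.
        scheme:        'linear' or 'quadratic' dependence of weight on depth
                       (the two relationships named in the paper).
        """
        max_depth = max(branch_depths)
        total = 0
        for loss, depth in zip(branch_losses, branch_depths):
            w = depth / max_depth        # linear: deeper branches weigh more
            if scheme == "quadratic":
                w = w ** 2               # quadratic emphasizes the deepest branch
            total = total + w * loss
        return total

    # Hypothetical example: three supervision branches after 4, 8, and 16
    # convolutional layers of the backbone.
    losses = [0.9, 0.7, 0.5]
    print(deep_supervision_loss(losses, [4, 8, 16], scheme="quadratic"))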
