Abstract

Real-time semantic segmentation is a key research topic in applications of artificial intelligence such as autonomous driving and intelligent robotics. However, real-time semantic segmentation models still consume substantial storage space and computing resources. Knowledge distillation, an efficient model compression method, is widely used across computer vision. In this paper, we propose a novel knowledge distillation framework based on a generative adversarial network structure that combines spatial consistency and temporal consistency. The teacher network in this framework combines a CNN branch and a transformer branch to improve the spatial consistency of the lightweight real-time segmentation network used as the student. In addition, we use the inter-frame relationships obtained from an optical flow network and a semantic segmentation network over consecutive frames as a temporal consistency constraint on the student network. Spatial consistency and temporal consistency are then coupled into spatio-temporal consistency knowledge, and the main purpose of our distillation method is to transfer this knowledge from the teacher network to the student network. The distilled student network processes each frame independently at inference time, and the distillation framework takes no part in inference, so our method adds no computational cost to the student network during inference while narrowing the real-time semantic segmentation performance gap between large and compact models. With our method, we obtain a high-performance, efficient lightweight model. Finally, we verify the effectiveness of the proposed method on the CamVid and Cityscapes datasets.
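
The two consistency constraints described above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: it assumes a common choice for each term, namely a per-pixel KL divergence between temperature-softened teacher and student class distributions for spatial distillation, and a squared difference between the student's predictions at frame t and its previous-frame predictions warped into frame t by optical flow for temporal consistency. The pixel-major data layout, the (dx, dy) flow convention, nearest-neighbour warping, and all function names are hypothetical, and the adversarial (discriminator) term and the CNN/transformer teacher are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical layout: maps[y][x] is a list of C per-class scores for pixel (y, x).
Map = List[List[List[float]]]
# Hypothetical flow convention: flow[y][x] = (dx, dy) points from frame t into frame t-1.
Flow = List[List[Tuple[float, float]]]


def softmax(scores: Sequence[float], T: float = 1.0) -> List[float]:
    """Numerically stable softmax of scores / T over the class dimension."""
    m = max(scores)
    exps = [math.exp((s - m) / T) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_kd_loss(student: Map, teacher: Map, T: float = 1.0) -> float:
    """Spatial term: mean over pixels of KL(teacher || student) on softened
    distributions, scaled by T*T (a common distillation convention)."""
    total, n = 0.0, 0
    for s_row, t_row in zip(student, teacher):
        for s_pix, t_pix in zip(s_row, t_row):
            p_s = softmax(s_pix, T)
            p_t = softmax(t_pix, T)
            total += sum(pt * (math.log(pt) - math.log(max(ps, 1e-12)))
                         for pt, ps in zip(p_t, p_s) if pt > 0)
            n += 1
    return T * T * total / n


def warp_nearest(maps: Map, flow: Flow) -> Map:
    """Backward warp: output[y][x] = maps[y + dy][x + dx], rounded to the
    nearest pixel and clamped to the image border (assumed simplification)."""
    H, W = len(maps), len(maps[0])
    out = []
    for y in range(H):
        row = []
        for x in range(W):
            dx, dy = flow[y][x]
            sx = min(max(round(x + dx), 0), W - 1)
            sy = min(max(round(y + dy), 0), H - 1)
            row.append(maps[sy][sx])
        out.append(row)
    return out


def temporal_consistency_loss(student_t: Map, student_prev: Map, flow: Flow) -> float:
    """Temporal term: mean squared distance between the student's class
    distributions at frame t and its frame t-1 distributions warped into frame t."""
    probs_t = [[softmax(p) for p in row] for row in student_t]
    probs_prev = [[softmax(p) for p in row] for row in student_prev]
    warped = warp_nearest(probs_prev, flow)
    total, n = 0.0, 0
    for row_t, row_w in zip(probs_t, warped):
        for a, b in zip(row_t, row_w):
            total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
            n += 1
    return total / n


if __name__ == "__main__":
    random.seed(0)
    C, H, W = 3, 4, 5
    prev = [[[random.gauss(0, 1) for _ in range(C)] for _ in range(W)] for _ in range(H)]
    # Frame t is frame t-1 shifted one pixel to the right, so every pixel's
    # correspondence in the previous frame lies at dx = -1.
    cur = [[row[max(x - 1, 0)] for x in range(W)] for row in prev]
    flow = [[(-1, 0)] * W for _ in range(H)]
    print(temporal_consistency_loss(cur, prev, flow))  # prints 0.0
```

In training, terms like these would typically be added to the usual cross-entropy loss with weighting coefficients. Because they are computed only during training, they leave the student's per-frame inference cost unchanged, which matches the property claimed above.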
