Abstract

Object detection comprises three subtasks: predicting target position, class, and confidence. Mainstream object detection models pursue internal structural refinement, and the subtasks share nearly the same structure, i.e., a task-coupled design. Task coupling reduces the number of training parameters, but the network structure cannot be tuned for each task separately, which can limit model performance. We design a task-decoupled object detection network (YOLOD) based on YOLOv5, in which the network is decoupled immediately after the backbone. By observing the loss convergence of each subtask, we design three separate branch structures and control the branch sizes so that the model retains relatively few training parameters. We also make several experimental adjustments to YOLOD to accelerate convergence. In addition, we append image contour information to the original three-channel input to assist training and improve detection accuracy. Experiments show that the modified model is smaller and that the accuracy gain is largest on the small-scale variant: without introducing any attention-based modules, YOLOD-S improves mAP by 1.1% on the MS COCO dataset and 2.29% on the VOC dataset, and the larger YOLOD-L reaches 48.8% mAP on COCO.
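The two core ideas of the abstract, decoupling the three subtask heads right after the backbone and appending a contour channel to the RGB input, can be illustrated with a minimal sketch. This is not the paper's implementation: the backbone is replaced by a trivial pooling-plus-projection stand-in, the contour map is a crude gradient-magnitude edge estimate, and all weight shapes (feature size 256, 80 classes) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def edge_channel(rgb):
    # Crude contour map: gradient magnitude of the grayscale image,
    # normalized to [0, 1]. A stand-in for the paper's contour information.
    gray = rgb.mean(axis=2)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)

# Hypothetical 4-channel input: original RGB plus the contour channel.
img = rng.random((64, 64, 3))
x = np.concatenate([img, edge_channel(img)[..., None]], axis=2)  # (64, 64, 4)

# Stand-in "backbone": global average pool + linear projection to a feature vector.
W_backbone = rng.standard_normal((4, 256))
feat = x.mean(axis=(0, 1)) @ W_backbone  # (256,)

# Decoupled heads: each subtask gets its own, independently sized branch,
# so the structure of one branch can be tuned without touching the others.
num_classes = 80
W_box = rng.standard_normal((256, 4))            # position branch -> (x, y, w, h)
W_cls = rng.standard_normal((256, num_classes))  # classification branch
W_obj = rng.standard_normal((256, 1))            # confidence (objectness) branch

box = feat @ W_box
cls = feat @ W_cls
obj = feat @ W_obj

print(box.shape, cls.shape, obj.shape)  # (4,) (80,) (1,)
```

In the coupled design, all three outputs would come from one shared tail; here each branch has its own weights, which is what lets the paper size and shape the branches separately based on how each subtask's loss converges.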
