Abstract The instance segmentation of traffic targets in complex road scenes is one of the most challenging tasks in autonomous driving. Unlike the bounding box localization for object detection and the category perception mask for semantic segmentation, instance segmentation requires accurate identification of each object under each category and more precise segmentation and positioning of these target objects. Although instance segmentation has apparent advantages, methods, for instance segmentation in complex road scenes, still need to be discovered. In this paper, we proposed an efficient instance segmentation method TTIS-YOLO(Traffic Target Instance Segmentation - YOLO) based on YOLOV5-7.0 for traffic object segmentation of complex road scenes. Our main work is as follows: to propose a MECSP(Multiscale Efficient Cross Stage Partial Network) module, which has fewer parameters, better cross-layer information exchange, and feature representation capabilities. Propose an Efficient Bidirectional Cross Scale Connection optimization method that enables the network to perform more detailed and efficient feature fusion without losing original information, refining the mask flow. WIoU Loss is used as the loss function of positioning and segmentation, and the positioning performance of the model is effectively improved through the strategy of dynamically allocating gradient gains. Experiments have shown that our proposed TTIS-YOLO outperforms baseline models and other mainstream instances segmentation algorithms such as Mask RCNN, YOLACT, SOLO, and SOLOV2 with the highest segmentation accuracy and fastest inference speed. Our proposed TTIS-YOLO-S achieves the best balance between segmentation accuracy and inference speed. Compared to the baseline model, the AP50 and recall values on the Cityscapes validation set increased by 1.7% and 0.9%, respectively, with a parameter reduction of 20.6%, inference speed of 78.1fps on GeForce RTX 3090Ti. Meanwhile, TTIS-YOLO-L achieved the highest segmentation accuracy, with an AP50 value of 27%, and the model parameter quantity decreased by 35.4% compared to the baseline model.
Read full abstract