Abstract

Object detection is one of the core tasks of computer vision. Object detection algorithms usually rely on deep convolutional neural networks, which demand substantial computing power; this greatly limits their application on devices with limited computing capabilities, such as mobile and embedded platforms. Among current object detection algorithms, the you only look once (YOLO) series balances speed and accuracy and is one of the most widely used approaches. In this article, TRC-YOLO is proposed, which improves mean average precision (mAP) and real-time detection speed while reducing model size. In TRC-YOLO, the convolution kernels of YOLO v4-tiny are pruned, and an expansive (dilated) convolution layer is introduced into the residual module of the network to form an hourglass Cross Stage Partial ResNet (CSPResNet) structure. A receptive field block (RFB) that simulates human vision is also added, enlarging the receptive field of the model and strengthening the feature-extraction ability of the network. In addition, the convolutional block attention module (CBAM), which combines channel attention and spatial attention, is applied to enhance the effective features of the model and reduce the negative impact of noise. The resulting TRC-YOLO model is 17.8 MB, which is 5.9 MB smaller than YOLO v4-tiny, and its computational cost is 2.983 billion floating-point operations (BFLOPs), 3.834 BFLOPs less than YOLO v4-tiny. TRC-YOLO achieves real-time performance of 36.9 frames per second on a Jetson Xavier NX, and its mAP on the PASCAL VOC dataset is 66.4% (3.83% higher than YOLO v4-tiny). On the MS COCO dataset, TRC-YOLO reaches an mAP of 37.7%, which is 1.9% higher than that of the baseline model.
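Two of the building blocks named in the abstract can be made concrete with short sketches. First, a minimal PyTorch sketch of a CBAM-style block (channel attention followed by spatial attention, as in Woo et al., 2018). The reduction ratio of 16 and the 7x7 spatial kernel are common defaults from the CBAM paper, not values confirmed by this abstract:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: squeeze spatial dims with average and max
    pooling, score each channel with a shared two-layer MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)          # (N, C, 1, 1) gate

class SpatialAttention(nn.Module):
    """Spatial attention: pool across channels, then a 7x7 conv
    produces a per-pixel gate."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)   # reweight channels
        return x * self.sa(x)  # reweight spatial positions
```

Applied inside a detector, the block simply gates a feature map in place, e.g. `refined = CBAM(256)(features)` for a 256-channel tensor.

Second, a hypothetical sketch of the hourglass residual idea: a 1x1 squeeze, a dilated 3x3 convolution that enlarges the receptive field with no more parameters than a plain 3x3, and a 1x1 expansion back to the input width. The squeeze ratio, dilation rate, and activation here are illustrative assumptions, not the paper's exact CSPResNet configuration:

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Hypothetical hourglass residual block: squeeze channels,
    apply a dilated 3x3 conv, expand back, add the shortcut."""
    def __init__(self, channels, squeeze=2, dilation=2):
        super().__init__()
        mid = channels // squeeze
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),      # squeeze
            nn.BatchNorm2d(mid),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(mid, mid, 3, padding=dilation,      # dilated conv:
                      dilation=dilation, bias=False),     # wider receptive field
            nn.BatchNorm2d(mid),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),      # expand back
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut
```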
