One-shot multi-object tracking using CNN-based networks with spatial-channel attention mechanism

Guofa Li, Xin Chen, Mingjun Li, Wenbo Li, Shen Li, Gang Guo, Huaizhi Wang, Hao Deng

doi:10.1016/j.optlastec.2022.108267

Abstract

Deep learning has driven substantial progress in multi-object tracking and underpins the current state-of-the-art models. Despite this progress, false detections (false positives, "FP") and missed detections (false negatives, "FN") caused by inaccurate tracking remain poorly addressed, especially in extremely crowded driving scenes. To address these problems, we develop a new online one-shot multi-object tracking system based on convolutional neural networks with a spatial-channel attention mechanism. First, we propose a feature combination module (FCM) that uses dilated convolution to obtain different receptive fields and thereby adapt to target deformation, rather than introducing a large number of parameters to handle target scale changes as the recent feature pyramid network does. Second, we design an attention mechanism network (AM-Net) that lets the model dynamically focus on the parts of the input that help with the task and ignore irrelevant information. Finally, we introduce a combination of triplet loss and online instance matching loss (TOIM Loss) to distinguish similar instances within a class. The proposed method is evaluated on three commonly used multi-object tracking datasets: 2DMOT15, MOT16, and MOT20. The results show that it is generally superior to the compared models.
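
The abstract's point that dilated convolution widens the receptive field without adding parameters can be illustrated with a short calculation. The sketch below is not the authors' FCM; it only applies the standard relation between kernel size, dilation rate, and the span a kernel covers. The function names and the dilation rates (1, 2, 3) are assumptions chosen for illustration, since the abstract does not state which rates the module uses.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by a dilated kernel along one axis.

    Standard relation: k_eff = k + (k - 1) * (d - 1).
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def weights_per_filter(kernel_size: int) -> int:
    """Learnable weights in one 2D filter slice; independent of dilation."""
    return kernel_size * kernel_size


if __name__ == "__main__":
    # Hypothetical dilation rates; the abstract does not specify them.
    for d in (1, 2, 3):
        span = effective_kernel_size(3, d)
        print(f"dilation={d}: covers {span}x{span}, weights={weights_per_filter(3)}")
```

Under these assumptions, a 3x3 kernel covers a 3x3, 5x5, or 7x7 window as the dilation rate grows from 1 to 3, while the number of weights per filter slice stays at 9. This is the trade-off the abstract contrasts with adding parameters to handle scale changes.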

Full Text