Multi-object tracking via deep feature fusion and association analysis

Hui Li,Yapeng Liu,Xiaoguo Liang,Yongfeng Yuan,Yuanzhi Cheng,Guanglei Zhang,Shinichi Tamura

doi:10.1016/j.engappai.2023.106527

Abstract

We describe a tracking-by-detection framework for multi-object tracking (MOT). It first detects the objects of interest in each frame of the video, followed by identifying association with the object detected in the previous frame. A deep association network is described to perform object feature matching in the arbitrary two frames to infer association degree of objects, and then similarity matrix loss is used to calculate association between each object in different frames to achieve an accurate tracking. The novelty of the work lies in the design of a multi-scale fusion strategy by gradually concatenating sub-networks of low-resolution feature maps in parallel to the main network of high-resolution feature maps, in the construction of a deeper backbone network which can enhance the semantic information of object features, and in the use of a siamese network for training a pair of discontinuous frames. The main advantage of our framework is that it avoids missing detection and partial detection. It is particularly suitable for solving the problem of object ID switch caused by occlusion, entering and leaving of objects. Our method is evaluated and demonstrated on the publicly available MOT15, MOT16, MOT17 and MOT20 benchmark datasets. Compared with the state-of-the-art methods, our method achieves better tracking performance, and is therefore, more suited for MOT tasks.

Full Text