Abstract
This paper proposes a graph convolutional network (GCN)-based multi-object tracking (MOT) algorithm that estimates the affinity between nodes. The algorithm consists of a module for extracting the initial features and a module for updating them. The feature extraction module uses the pose feature of each object so that tracking remains correct even when the object is partially occluded. Unlike previous graph neural network (GNN)-based MOT methods, the proposed method is based on a GCN and includes a new feature update mechanism in which, at each layer, features are updated by combining the output of the neural network with the node similarity between the tracker and detection nodes. The node feature is updated by aggregating the updated edge feature and the connection strength between the tracker and the detection. In each GCN layer, the three networks for node update, edge update, and edge classification are designed with a minimal number of parameters, enabling faster tracking than other GCN-based MOT methods. The entire network is trained end-to-end with an affinity loss. Experimental results on the MOT16 and MOT17 challenge datasets show that the proposed method achieves tracking accuracy and speed superior or comparable to those of state-of-the-art methods, including GCN-based MOT methods.
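The update scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`gcn_layer`, `affinity`), the single-matrix "networks", the use of cosine similarity as node similarity, multiplication as the way the network output and similarity are combined, and softmax-normalized similarity as the connection strength are all assumptions made for illustration, since the text above does not specify them.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return a_n @ b_n.T

def gcn_layer(trackers, detections, edges, w_edge, w_node):
    """One hypothetical layer: edge update followed by node update.

    trackers: (T, D), detections: (N, D), edges: (T, N, E)
    w_edge: (2*D + E, E) stands in for the edge update network (assumed form)
    w_node: (D + E, D) stands in for the node update network (assumed form)
    """
    T, N = edges.shape[:2]
    D = trackers.shape[1]
    sim = cosine_similarity(trackers, detections)  # (T, N) node similarity

    # Edge update: network output combined with node similarity
    # (combination by multiplication is an assumption).
    t_exp = np.broadcast_to(trackers[:, None, :], (T, N, D))
    d_exp = np.broadcast_to(detections[None, :, :], (T, N, D))
    edge_in = np.concatenate([t_exp, d_exp, edges], axis=-1)
    new_edges = np.tanh(edge_in @ w_edge) * sim[..., None]

    # Connection strength: similarity normalized per tracker and per detection
    # (softmax normalization is an assumption).
    strength_t = softmax(sim, axis=1)
    strength_d = softmax(sim, axis=0)

    # Node update: aggregate updated edge features weighted by strength.
    agg_t = np.einsum("tn,tne->te", strength_t, new_edges)
    agg_d = np.einsum("tn,tne->ne", strength_d, new_edges)
    new_trackers = np.tanh(np.concatenate([trackers, agg_t], axis=1) @ w_node)
    new_detections = np.tanh(np.concatenate([detections, agg_d], axis=1) @ w_node)
    return new_trackers, new_detections, new_edges

def affinity(edges, w_cls):
    """Edge classification: sigmoid score per tracker-detection pair."""
    return 1.0 / (1.0 + np.exp(-(edges @ w_cls)[..., 0]))

# Toy example: 3 trackers, 5 detections, feature size 4, edge size 2.
rng = np.random.default_rng(0)
T, N, D, E = 3, 5, 4, 2
trk = rng.normal(size=(T, D))
det = rng.normal(size=(N, D))
edg = rng.normal(size=(T, N, E))
w_e = rng.normal(size=(2 * D + E, E)) * 0.1
w_n = rng.normal(size=(D + E, D)) * 0.1
w_c = rng.normal(size=(E, 1))

trk2, det2, edg2 = gcn_layer(trk, det, edg, w_e, w_n)
aff = affinity(edg2, w_c)  # (T, N) affinity matrix
```

In a tracker, an affinity matrix of this kind would typically be passed to an assignment step that links each tracker to at most one detection. Stacking several such layers and training all weights against a loss on `aff` would correspond to the end-to-end affinity loss mentioned above.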
Highlights
Object tracking can be primarily divided into single-object tracking (SOT) and multi-object tracking (MOT)
In experiments on various MOT benchmark datasets, the proposed method is lighter than other graph convolutional network (GCN)-based state-of-the-art methods and achieves significantly improved tracking results
Detailed quantitative evaluation results were uploaded to the MOT benchmark dataset (MOT-BD) website, where they can be accessed
Summary
Object tracking can be broadly divided into single-object tracking (SOT) and multi-object tracking (MOT). In practical applications such as video surveillance, autonomous vehicles, and robot navigation, MOT, which tracks multiple objects simultaneously, is receiving more attention than SOT, which tracks only one object. The tracking-by-detection paradigm, the most common approach in MOT, depends largely on two factors. The first is object detection performance. The object must be accurately detected in every frame so that tracks are not broken or incorrectly linked during subsequent tracking operations. Various high-performance CNN-based object detectors [1]–[4] have recently been introduced, and the tracker degradation caused by erroneous object detection has been resolved to a certain extent. However, object detectors can still produce false detections or miss objects owing to object occlusion or camera shake.