Abstract
Multi-target tracking (MTT) is one of the fundamental problems in video analysis and monitoring. In the tracking-by-detection framework, data association is one of the most important and difficult problems. In this paper, we propose a framework that learns the appearance features of a target in an end-to-end fashion by fusing high-level and low-level semantic information. A high-order feature map is abstracted from the high-order appearance relationship of each target between the current frame and the previous frames, and a similarity matrix is used to describe the high-order features of the targets. The best matching relationships between targets are obtained using hierarchical data association and the Hungarian algorithm. The proposed method, called Multi-target Tracking Based on High-order Appearance Feature Fusion (MTT-HAFF), can handle a large number of input sequences, local association failures, and identity switches caused by unreliable detections. The results show that the proposed algorithm is robust when tracking targets through long-term occlusion.
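The abstract obtains the best matches by running the Hungarian algorithm on a similarity matrix. Below is a minimal sketch of that single step. It assumes similarities lie in [0, 1], with tracks as rows and detections as columns. It converts each similarity to a cost of 1 - s, solves the assignment with a pure-Python Kuhn-Munkres (Hungarian) routine, and then discards pairs that fall below a gating threshold. The names `hungarian` and `associate` and the `min_sim=0.5` threshold are hypothetical. The sketch does not reproduce the hierarchical stages of MTT-HAFF's association.

```python
import math
from typing import List, Tuple


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment (Kuhn-Munkres with potentials).

    Requires rows <= cols; returns assignment[row] = column.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    if n > m:
        raise ValueError("need rows <= cols; transpose the matrix first")
    u = [0.0] * (n + 1)  # row potentials (1-based)
    v = [0.0] * (m + 1)  # column potentials (1-based)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j, 0 = free
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def associate(similarity: List[List[float]], min_sim: float = 0.5
              ) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """similarity[t][d] scores track t against detection d (hypothetical helper)."""
    n_tracks = len(similarity)
    n_dets = len(similarity[0]) if n_tracks else 0
    if n_tracks == 0 or n_dets == 0:
        return [], list(range(n_tracks)), list(range(n_dets))
    # The Hungarian routine minimizes, so convert similarity to cost.
    cost = [[1.0 - s for s in row] for row in similarity]
    transposed = n_tracks > n_dets
    if transposed:  # keep rows <= cols
        cost = [list(col) for col in zip(*cost)]
    pairs = list(enumerate(hungarian(cost)))
    if transposed:  # rows were detections; swap back to (track, detection)
        pairs = [(t, d) for d, t in pairs]
    # Gate: an optimal pair with low similarity is still treated as unmatched.
    matches = sorted((t, d) for t, d in pairs if similarity[t][d] >= min_sim)
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_t = [t for t in range(n_tracks) if t not in matched_t]
    unmatched_d = [d for d in range(n_dets) if d not in matched_d]
    return matches, unmatched_t, unmatched_d
```

For example, with similarity `[[0.9, 0.8], [0.85, 0.1]]`, a greedy matcher would pair track 0 with detection 0 and be left with the weak 0.1 pair. The optimal assignment instead returns `[(0, 1), (1, 0)]`. `scipy.optimize.linear_sum_assignment` solves the same assignment problem. The hand-written routine only keeps the sketch dependency-free.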
Highlights
The proposed MTT-HAFF framework extracts features and calculates similarity scores by processing the appearance sequence of each target from the previous frames together with the newly detected targets in the current frame.
Extensive experiments on the MOT17 dataset show that our algorithm achieves a Multi-Object Tracking Accuracy (MOTA) of 48.8%, an IDF1 of 50.1%, and a Mostly Lost (ML) rate of 30.1%.
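The metrics quoted above follow standard definitions. MOTA is the CLEAR MOT accuracy, 1 - (FN + FP + IDSW) / GT, summed over all frames. IDF1 is the identity F1 score, 2·IDTP / (2·IDTP + IDFP + IDFN). Mostly Lost is the share of ground-truth trajectories that are tracked for less than 20% of their length. The sketch below computes each metric from raw counts. The function names are hypothetical, and the counts in the checks are toy values, not the paper's results.

```python
from typing import Sequence


def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: penalizes misses, false positives and ID switches."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: harmonic mean of ID precision and ID recall."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def mostly_lost_ratio(coverages: Sequence[float], threshold: float = 0.2) -> float:
    """Fraction of GT trajectories whose tracked-frame coverage is below 20%."""
    if not coverages:
        return 0.0
    return sum(1 for c in coverages if c < threshold) / len(coverages)
```

Unlike MOTA, IDF1 rewards keeping the same identity over time. This is why the two metrics can diverge for trackers that suffer many identity switches.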
Summary
Tracking is an essential component of various vision applications, e.g., robotics with trajectory planning and decision making [1], [2] and intelligent video surveillance [3], [4]. These applications must detect, identify, and track targets, and must monitor restricted areas such as museums, zoos, military facilities, and prisons against approach or entry. MTT based on deep learning generally uses a pre-trained model to represent the appearance features of a target. We use these appearance features to build the MTT-HAFF network and to estimate the similarity relationship of a target across a pair of frames. OCN and the gradual recursive network (GRN) are added to the architecture to reduce spatial redundancy and to pass information efficiently between low-level and high-level semantic features. This module expands the receptive field over the original pixel space and captures more global information, which strengthens image recognition and improves MTT performance. Additional representative appearance features are extracted by adding the SRN module after the convolutional layer. This lets the network learn spatial relationships between pairs of pixels separated by a given distance in the image more effectively.
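MTT-HAFF learns how similar a target is across a pair of frames end-to-end. As a hand-crafted stand-in, the sketch below assumes that each track stores a list of appearance vectors from previous frames. It scores a new detection by the mean cosine similarity between that detection and the track's stored history. The averaging rule and the names `cosine` and `similarity_matrix` are hypothetical and are not the learned similarity described here. The output is a track-by-detection matrix, the kind of similarity matrix that the Hungarian association step mentioned in the abstract operates on.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def similarity_matrix(track_histories: List[List[Vector]],
                      detections: List[Vector]) -> List[List[float]]:
    """S[t][d] = mean cosine similarity of detection d to each stored
    appearance of track t (hypothetical stand-in for a learned score)."""
    matrix = []
    for history in track_histories:
        row = []
        for det in detections:
            scores = [cosine(past, det) for past in history]
            row.append(sum(scores) / len(scores) if scores else 0.0)
        matrix.append(row)
    return matrix
```

Averaging over the whole history makes a track's score less sensitive to a single corrupted frame, such as a frame with partial occlusion. Weighting recent frames more heavily would be an equally plausible choice.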