Target Adaptive Tracking Based on GOTURN Algorithm with Convolutional Neural Network and Data Fusion

Zhengze Li, Jiancheng Xu

doi:10.1155/2021/4276860

Abstract

With the advent of the artificial intelligence era, target adaptive tracking technology has developed rapidly in the fields of human-computer interaction, intelligent monitoring, and autonomous driving. To address the low tracking accuracy and poor robustness of the current Generic Object Tracking Using Regression Network (GOTURN) tracking algorithm, this paper takes the convolutional neural network, the most popular network in the current target-tracking field, as the basic network structure and proposes an improved GOTURN target-tracking algorithm based on a residual attention mechanism and the fusion of spatiotemporal context information for data fusion. The algorithm feeds the target template, prediction area, and search area to the network at the same time to extract a general feature map, and predicts the location of the tracking target in the current frame through the fully connected layers. In addition, a residual attention mechanism network is added to the target template network structure to enhance the feature expression ability of the network and improve the overall performance of the algorithm. A large number of experiments on current mainstream target-tracking test data sets show that the proposed tracking algorithm significantly improves the overall performance of the original tracking algorithm.

Highlights

Vision is an important way for humans to observe the world
About 75% of the information that humans learn from the external world comes from the human visual system
The network structure of the Generic Object Tracking Using Regression Network (GOTURN) algorithm is relatively simple. The twin network of GOTURN uses the first five layers of CaffeNet, which is pretrained on ImageNet. These are followed by three fully connected layers with 4096 nodes each, and the output layer after the fully connected layers has 4 nodes, which output the coordinates of the upper-left and lower-right corners of the tracking target. The authors of the GOTURN algorithm first analysed video sequences and found that the relationship of the tracking target between the previous frame and the original video frame follows a Laplacian distribution.
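The Laplacian relationship between consecutive frames can be illustrated with a minimal Python sketch. It perturbs a previous-frame box, given as (x1, y1, x2, y2) to match the four output nodes, into a plausible next-frame box by drawing the centre shift and the size change from Laplace distributions. The function names `sample_laplace` and `shift_box`, the scale parameters `b_shift` and `b_scale`, and the clamping range are illustrative assumptions and are not taken from the text above.

```python
import math
import random


def sample_laplace(rng, mu=0.0, b=1.0):
    """Draw one sample from Laplace(mu, b) by inverse-CDF sampling."""
    u = rng.random() - 0.5            # uniform in [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)          # avoid log(0) at the lower edge
    return mu - b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def shift_box(prev_box, rng, b_shift=0.2, b_scale=1.0 / 15.0,
              scale_range=(0.6, 1.4)):
    """Perturb a previous-frame box (x1, y1, x2, y2) into a next-frame box.

    b_shift, b_scale and scale_range are assumed values for illustration.
    """
    x1, y1, x2, y2 = prev_box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2.0, y1 + h / 2.0

    # Centre shift is drawn relative to the box size (zero-mean Laplace).
    cx += w * sample_laplace(rng, 0.0, b_shift)
    cy += h * sample_laplace(rng, 0.0, b_shift)

    # Size change is a Laplace sample around 1, clamped to a safe range.
    lo, hi = scale_range
    w *= min(max(sample_laplace(rng, 1.0, b_scale), lo), hi)
    h *= min(max(sample_laplace(rng, 1.0, b_scale), lo), hi)

    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed for repeatable output
    print(shift_box((10.0, 20.0, 50.0, 80.0), rng))
```

Because a Laplace distribution is sharply peaked at its mean, most sampled boxes stay close to the previous one and large jumps are rare, which matches the assumption of small motion between adjacent frames.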

Summary

Introduction

Vision is an important way for humans to observe the world. About 75% of the information that humans learn from the external world comes from the human visual system. Computer vision, which takes the human visual system as its reference model, integrates computer science and engineering, physics, signal processing, applied mathematics and statistics, neurophysiology, and psychology [1]. Target tracking refers to the detection, recognition, and tracking of targets in a video image sequence to obtain information such as the target’s speed, position, and movement trajectory [2]. Each frame of video can be processed as a picture to obtain the position coordinates of the target in the image. The positions of the moving target in successive images are then connected according to their coordinate values, so as to obtain the moving trajectory of the target object in the entire video stream [3]. According to the methods they use, target-tracking algorithms can be roughly divided into several categories.

Computational Intelligence and Neuroscience

[Figure: network structure with conv layers, FC layers, and softmax classification]

Simulation Results and Performance Analysis

[Figure: tracking result; legend: GOTURN, IGOTURN]