Abstract

Convolutional neural networks (CNNs) can significantly improve image recognition accuracy through their powerful learned features, but the low-level layers of the network also contain important feature information. To achieve more stable and efficient multi-target tracking, we use these deep features and fuse information from the shallow and deep layers to make the features more expressive. Deformable convolution is also introduced to handle the deformation caused by camera motion. As time advances, we predict each target's position with a motion model, which lets us rule out positions the target cannot physically reach and further refines the association between multiple targets. We use an end-to-end association method to reduce the complexity of the algorithm. We evaluated the method on the open-source MOT17 dataset and obtained remarkable results.
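
The motion-based filtering mentioned above can be illustrated with a minimal sketch. The abstract does not say which motion model or distance threshold is used, so this example assumes a constant-velocity model and a simple Euclidean gate; the function names `predict_position` and `gate_detections` and the parameter `max_distance` are hypothetical and chosen only for illustration.

```python
from math import hypot

# Assumption: constant-velocity motion; the paper's actual model is not given.
def predict_position(position, velocity, dt=1.0):
    """Predict a target's (x, y) centre after dt frames."""
    return (position[0] + velocity[0] * dt, position[1] + velocity[1] * dt)

# Assumption: a fixed Euclidean radius stands in for "physically reachable".
def gate_detections(position, velocity, detections, max_distance, dt=1.0):
    """Keep only detections within max_distance of the predicted position."""
    px, py = predict_position(position, velocity, dt)
    return [d for d in detections if hypot(d[0] - px, d[1] - py) <= max_distance]

# One track moving right at 5 px/frame, and two candidate detections.
candidates = gate_detections(
    position=(100.0, 50.0),
    velocity=(5.0, 0.0),
    detections=[(106.0, 51.0), (300.0, 200.0)],
    max_distance=20.0,
)
print(candidates)
```

Only detections that survive the gate would then be passed to the association step, which shrinks the set of track-to-detection pairs that need to be scored.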
