Abstract

Artificial intelligence has been widely studied in recent years for intelligent surveillance analysis and security problems. Although many multimedia security approaches based on deep learning network models have been proposed, their performance still faces challenges that deserve in-depth research. On the one hand, the high computational complexity of current deep learning methods makes them hard to apply in real-time scenarios. On the other hand, it is difficult to obtain the specific features of a video by fine-tuning the network online with only the object state in the first frame, which fails to capture the rich appearance variations of the object. To address these two issues, this paper proposes an effective object tracking method with learning attention that localizes the object and reduces training time within an adversarial learning framework. First, a prediction network is designed to track the object in video sequences. The object positions in the first ten frames are used to fine-tune the prediction network, which can fully mine the specific features of an object. Second, the prediction network is integrated into a generative adversarial network framework, which randomly generates masks to capture object appearance variations by adaptively dropping out input features. Third, we present a spatial attention mechanism to improve tracking performance. The proposed network can identify the mask that maintains the most robust features of the object over a long temporal span. Extensive experiments on two large-scale benchmarks demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.

Highlights

  • Multimedia content is now widely shared over the Internet due to the rapid development of network technologies and the advent of high-end devices

  • Artificial intelligence has been widely studied for solving a variety of difficult problems using deep learning network models, such as convolutional neural networks for steganalysis and forensics, and generative adversarial networks for coverless steganography

  • We evaluate all the trackers on 50 video sequences using the one-pass evaluation with distance precision and overlap success metrics
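
The two metrics named in the last highlight can be illustrated with a minimal sketch. It assumes boxes are given as (x, y, width, height) tuples, a 20-pixel threshold for distance precision, and 21 evenly spaced overlap thresholds for the success curve; these values follow common tracking-benchmark practice and are assumptions, not settings confirmed by this summary. All function names are hypothetical.

```python
import math


def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return math.hypot(ax - bx, ay - by)


def overlap_ratio(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def distance_precision(predicted, ground_truth, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels.

    The 20-pixel default is an assumed, conventional value.
    """
    errors = [center_error(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(e <= threshold for e in errors) / len(errors)


def overlap_success(predicted, ground_truth, threshold=0.5):
    """Fraction of frames whose overlap ratio exceeds `threshold`."""
    overlaps = [overlap_ratio(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


def success_auc(predicted, ground_truth, steps=21):
    """Mean success rate over evenly spaced thresholds in [0, 1].

    This approximates the area under the success plot; `steps` is assumed.
    """
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [overlap_success(predicted, ground_truth, t) for t in thresholds]
    return sum(rates) / len(rates)


# One-pass evaluation on a toy two-frame sequence: the tracker is
# initialized once and its per-frame outputs are compared to ground truth.
predicted = [(0, 0, 10, 10), (0, 0, 10, 10)]
ground_truth = [(0, 0, 10, 10), (100, 100, 10, 10)]
dp = distance_precision(predicted, ground_truth)
os_rate = overlap_success(predicted, ground_truth)
```

In this toy sequence the tracker is exact in the first frame and lost in the second, so both metrics come out at one half. Reporting the area under the success curve rather than a single threshold avoids tying the comparison to one arbitrary overlap cutoff.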

Summary

Introduction

Multimedia content (in particular image and video data) is being widely shared over the Internet due to the rapid development of network technologies and the advent of high-end devices. Emerging technologies such as Cloud, Fog, Edge, SDN, Big Data, Internet of Things (IoT), and Deep Learning provide scalability, flexibility, agility, and ubiquity in terms of data acquisition, data storage, data management, and communications. Surveillance technology for intelligent multimedia hiding and forensics has been a hot topic in the multimedia security community. It is the basis of advanced video processing tasks such as follow-up steganography [1], data hiding [2], JPEG compression [3], and object recognition [4], and it is a necessary prerequisite for implementing high-level intelligent behavior analysis.

