Abstract

The existing body of work on video object tracking (VOT) algorithms has studied various image conditions, such as occlusion, clutter, and object shape, which influence video quality and affect tracking performance. Nonetheless, there is no clear distinction between the performance reduction caused by scene-dependent challenges, such as occlusion and clutter, and the effect of authentic in-capture and post-capture distortions. Despite the plethora of VOT methods in the literature, detailed studies analyzing tracker performance on videos with authentic in-capture and post-capture distortions are lacking. To address this issue, we introduce a new dataset of authentically distorted surveillance videos (AD-SVD). This dataset contains 4476 videos with different authentic distortions and surveillance activities. Furthermore, we provide benchmarking results for ten state-of-the-art visual object trackers (from the VOT 2017–2018 challenges) on the proposed dataset. In addition, this study develops an approach for performance prediction and quality-aware feature selection for single-object tracking in authentically distorted surveillance videos. The method predicts the performance of a VOT algorithm with high accuracy. Then, the probability of obtaining the reference output is maximized without executing the tracking algorithms. We also propose a framework to reduce the computational resources (time and video storage space) required by video trackers. We achieve this by balancing processing time against tracking accuracy, predicting the performance over a range of spatial resolutions. This approach can reduce the execution time by up to 34% with a slight decrease in performance of 3%.

Highlights

  • Video object tracking (VOT) is one of the most studied areas in computer vision and multimedia processing

  • Although benchmarks of thermal VOT algorithms have recently been proposed [67], [68], [69] that address challenges such as real-world scenarios and deformable, blurry targets, this work focuses on authentically distorted visible-light surveillance videos

  • We proposed and tested a performance prediction approach for single-object tracking in authentically distorted surveillance videos

Summary

INTRODUCTION

Video object tracking (VOT) is one of the most studied areas in computer vision and multimedia processing. Previous studies on the impact of distortions on the performance of machine vision algorithms have addressed tasks such as object and face detection [17], dermoscopy [18], and face recognition in long-wave infrared (LWIR) images [19]–[21]. These approaches are usually based on natural scene statistics (NSS) or deep quality-relevant features that account for post-capture distortions such as blur, additive noise, and uneven illumination. The resources we aim to reduce include execution time and the disk space required for storage, which we lower by predicting the VOT algorithm performance and determining the optimal spatial scale at which to process a video. This approach complements our previous work, in which we demonstrated the impact of authentic distortions on state-of-the-art video trackers and developed a quality-aware tracker for post-capture distortions [55], [56]. The remainder of this paper is organized as follows: Section II presents the proposed AD-SVD dataset and the benchmarking of video trackers, Section III describes the details of our video tracker performance prediction method, Section IV discusses our proposed method for video tracker execution time reduction, Section V analyzes the experimental results, and Section VI concludes the paper.
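
The idea of choosing a spatial scale from predicted performance can be illustrated with a minimal sketch. This is not the paper's method: the function name `choose_scale`, the score dictionary, and the 3% tolerance default are assumptions made for illustration, and the scores are made-up numbers. Given a predicted tracker score for each candidate scale, the sketch picks the smallest scale whose prediction stays within a relative tolerance of the full-resolution prediction, so that the tracker can run on less data.

```python
from typing import Mapping


def choose_scale(predicted: Mapping[float, float], max_drop: float = 0.03) -> float:
    """Return the smallest spatial scale whose predicted tracking score
    stays within `max_drop` (relative) of the full-resolution prediction.

    `predicted` maps a scale factor (1.0 = original resolution) to a
    predicted tracker score. Both are hypothetical inputs for illustration;
    scores and `max_drop` are assumed to be non-negative.
    """
    if 1.0 not in predicted:
        raise ValueError("predicted must include the full-resolution scale 1.0")
    reference = predicted[1.0]
    # Lowest predicted score we are willing to accept at a reduced resolution.
    threshold = reference * (1.0 - max_drop)
    acceptable = [s for s, score in predicted.items() if score >= threshold]
    # Under the non-negativity assumption, scale 1.0 always qualifies.
    return min(acceptable)


# Made-up predictions for one video (not results from the paper).
example = {1.0: 0.60, 0.75: 0.59, 0.5: 0.585, 0.25: 0.40}
chosen = choose_scale(example)
print(chosen)
```

In this toy setup the predictions would come from a trained regressor evaluated on quality features of the video, so no tracker needs to be executed to pick the scale. A stricter or looser `max_drop` trades accuracy for processing time.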

AUTHENTICALLY DISTORTED SURVEILLANCE VIDEOS DATASET
SUPPORT VECTOR MACHINE REGRESSION MODEL
VIDEO TRACKER EXECUTION TIME REDUCTION
Findings
CONCLUSION