Abstract

The use of video data in machine tasks has become increasingly prevalent, as deep learning and computer vision require large volumes of video for object detection, object tracking, and related tasks. However, the features machines require differ from those consumed by humans, so a new approach is needed to encode and compress video for machine consumption. Video coding for machines (VCM) has received considerable attention, with many approaches compressing features rather than the video itself. A key challenge in this process is repacking the features efficiently and effectively. This paper proposes a distance-based patch tiling and intra-block quilting method that, guided by a statistical analysis of feature characteristics along the channel dimension, repacks feature sequences into a form better suited to existing video codecs. Experimental results demonstrate that our method achieves a 65.54% BD-rate gain over benchmark methods. This research has significant implications for improving the efficiency of video coding for machine applications; future work could explore feature dimensionality reduction and combination with neural network (NN) codecs to further optimize the repacking of features for compression.
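To make the repacking idea concrete, the sketch below tiles the channels of a feature tensor into a single 2D frame that a conventional video codec could consume, ordering channels by mutual similarity so that neighbouring tiles are alike. The greedy nearest-neighbour ordering by L2 distance is an illustrative stand-in, not the paper's actual distance-based tiling or intra-block quilting algorithm; the function name and parameters are hypothetical.

```python
import numpy as np

def repack_channels_by_distance(features, cols):
    """Illustrative sketch: order feature channels by similarity and tile
    them into one 2D frame. features has shape (C, H, W); cols is the
    number of tiles per row. Ordering is a greedy nearest-neighbour
    chain under L2 distance, a simplification of distance-based tiling."""
    c, h, w = features.shape
    remaining = list(range(c))
    order = [remaining.pop(0)]                  # start the chain at channel 0
    while remaining:
        last = features[order[-1]]
        # place next the unplaced channel closest to the last-placed one
        dists = [np.linalg.norm(features[i] - last) for i in remaining]
        order.append(remaining.pop(int(np.argmin(dists))))

    rows = -(-c // cols)                        # ceil(c / cols)
    frame = np.zeros((rows * h, cols * w), dtype=features.dtype)
    for k, idx in enumerate(order):
        r, col = divmod(k, cols)
        frame[r * h:(r + 1) * h, col * w:(col + 1) * w] = features[idx]
    return frame, order

# toy feature map: 6 channels of 4x4, tiled as a 2x3 grid of patches
feats = np.random.rand(6, 4, 4).astype(np.float32)
frame, order = repack_channels_by_distance(feats, cols=3)
print(frame.shape)  # (8, 12)
```

The codec then encodes `frame` as an ordinary intra picture; the channel permutation `order` must be signalled so the decoder can restore the original channel layout before the machine task runs.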
