Abstract

Recent studies on Video Coding for Machines (VCM) have achieved remarkable results. However, practical visual-analytics applications must transmit not only the visual features used for machine analysis but also the video textures needed for human monitoring and decision-making. To this end, this paper proposes a human-machine friendly video compression scheme (HMFVC) that serves both human viewing and machine analysis. First, we propose a learned semantic representation (LSR) method that extracts semantic information between temporally neighboring frames. LSR can be used both in signal reconstruction for human viewing and in visual analysis for machine understanding. Second, building on LSR, we design an end-to-end optimized video compression framework that jointly optimizes visual quality for human perception, analysis accuracy for machines, and compression efficiency. Finally, we develop an HMFVC codec that achieves higher action recognition accuracy and better reconstruction quality than both traditional codecs and learned video compression approaches. Specifically, compared with x265, HMFVC saves 77% of the bitrate while matching the analysis performance obtained on the original videos. To our knowledge, HMFVC is the first end-to-end optimized video compression scheme that serves both humans and machines, and it offers a promising framework for future human-machine friendly video compression approaches.
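The joint optimization described above can be illustrated with a weighted sum of a rate term, a reconstruction distortion term, and an analysis loss. The following is a minimal sketch under stated assumptions, not the objective used by HMFVC: the abstract does not give the loss, so the additive form, the use of mean squared error and cross-entropy, and the names `joint_loss`, `lambda_d`, and `lambda_a` are all hypothetical.

```python
import math

def mse(original, reconstructed):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

def joint_loss(bits_per_pixel, original, reconstructed, logits, label,
               lambda_d=1.0, lambda_a=1.0):
    """Hypothetical rate + distortion + analysis objective.

    bits_per_pixel : estimated rate of the compressed representation
    lambda_d       : assumed weight on reconstruction distortion (human viewing)
    lambda_a       : assumed weight on the recognition loss (machine analysis)
    """
    distortion = mse(original, reconstructed)
    analysis = cross_entropy(logits, label)
    return bits_per_pixel + lambda_d * distortion + lambda_a * analysis

# Example: perfect reconstruction, an uninformative two-class prediction.
loss = joint_loss(0.1, [0.0, 1.0], [0.0, 1.0], [0.0, 0.0], label=0)
```

In a sketch like this, raising `lambda_a` relative to `lambda_d` would favor recognition accuracy over pixel fidelity at a given bitrate; how the actual framework balances these terms is not stated in the abstract.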
