Abstract

Advancements in artificial intelligence have significantly increased the use of images and videos in machine analysis algorithms, predominantly neural networks. However, the traditional methods of compressing, storing and transmitting media have been optimized for human viewers rather than machines. Current research in coding images and videos for machine analysis has evolved in two distinct paths. The first is characterized by End-to-End (E2E) learned codes, which show promising results in image coding but have yet to match the performance of leading Conventional Video Codecs (CVC) and suffer from a lack of interoperability. The second path optimizes CVC, such as the Versatile Video Coding (VVC) standard, for machine-oriented reconstruction. Although CVC-based approaches enjoy widespread hardware and software compatibility and interoperability, they often fall short in machine task performance, especially at lower bitrates. This paper proposes a novel hybrid codec for machines named NN-VVC, which combines the advantages of an E2E-learned image codec and a CVC to achieve high performance in both image and video coding for machines. Our experiments show that the proposed system achieved up to − 43.20% and − 26.8% Bjøntegaard Delta rate reduction over VVC for image and video data, respectively, when evaluated on multiple different datasets and machine vision tasks according to the common test conditions designed by the VCM study group in MPEG standardization activities. Furthermore, to improve reconstruction quality, we introduce a human-focused branch into our codec, enhancing the visual appeal of reconstructions intended for human supervision of the machine-oriented main branch.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.