Abstract

With the development of deep learning technology and the abundance of sensors, machine vision applications that utilize vast amounts of image/video data are rapidly increasing in the autonomous vehicle, video surveillance and smart city fields. However, achieving a more compact image/video representation and lower latency solutions is challenging for such machine-based applications. Therefore, it is essential to develop a more efficient video coding standard for machine vision applications. Currently, the Moving Picture Experts Group (MPEG) is developing a new standard called video coding for machines (VCM) with two tracks, each mainly dealing with compression of the input image/video (Track 2) and compression of the features extracted from it (Track 1). In this paper, an enhanced multiscale feature compression (E-MSFC) method is proposed to efficiently compress multiscale features generated by a feature pyramid network (FPN), which is the backbone network of machine vision networks specified in the VCM evaluation framework. The proposed E-MSFC reduces the feature channels to be included in a single feature map and compresses the feature map using versatile video coding (VVC), the latest video standard, rather than the single stream feature compression (SSFC) module in the existing MSFC. In addition, the performance of the E-MSFC is further enhanced by adding a bottom-up structure to the multiscale feature fusion (MSFF) module, which performs the channel-wise reduction in the E-MSFC. Experimental results reveal that the proposed E-MSFC significantly outperforms the VCM image anchor with a BD-rate gain of up to 85.94%, which includes an additional gain of 0.96% achieved by the MSFF with the bottom-up structure.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call