Abstract

The rapid advancement of multimedia and imaging technologies has resulted in increasingly diverse visual and semantic data. A wide range of applications, such as remote-assisted driving, requires the joint storage and transmission of diverse visual and semantic data. However, existing works insufficiently exploit the redundancy between different types of data. In this paper, we propose a unified framework that jointly compresses a broad spectrum of visual and semantic data, including images, point clouds, segmentation maps, object attributes, and relations. We develop a unifying process that embeds the representations of these data into a joint embedding graph according to their categories, enabling flexible handling of joint compression tasks across various visual and semantic data. To fully leverage the redundancy between different data types, we further introduce an embedding-based adaptive joint encoding process and a Semantic Adaptation Module that efficiently encode diverse data based on the learned embeddings in the joint embedding graph. Experiments on the Cityscapes, MSCOCO, and KITTI datasets demonstrate the superiority of our framework, highlighting promising steps toward scalable multimedia processing.
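
To make the two main components concrete, the snippet below is a minimal, hypothetical sketch of how per-modality representations could be projected into a shared embedding space (the "joint embedding graph") and then adaptively encoded by a Semantic Adaptation Module that shares information across nodes. It is not the paper's implementation: all class names, layer choices, and dimensions are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's implementation): project heterogeneous
# modality features into one embedding space, treat them as graph nodes, and
# refine/encode each node conditioned on the others. Names and dimensions are
# illustrative assumptions only.
import torch
import torch.nn as nn


class JointEmbeddingGraph(nn.Module):
    """Projects per-category features (image, point cloud, segmentation,
    attributes, ...) into a shared embedding space and stacks them as nodes."""

    def __init__(self, dims, embed_dim=128):
        super().__init__()
        # One projection head per data category.
        self.proj = nn.ModuleDict({k: nn.Linear(d, embed_dim) for k, d in dims.items()})

    def forward(self, feats):
        # feats: dict category -> tensor of shape (num_items, feature_dim)
        nodes = [self.proj[k](v) for k, v in feats.items()]
        return torch.cat(nodes, dim=0)  # joint node matrix (N_total, embed_dim)


class SemanticAdaptationModule(nn.Module):
    """Illustrative stand-in: cross-node attention lets each node's code
    exploit redundancy with related items from other modalities."""

    def __init__(self, embed_dim=128, heads=4, code_dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        self.to_code = nn.Linear(embed_dim, code_dim)  # toy "compressed" code

    def forward(self, nodes):
        x = nodes.unsqueeze(0)          # (1, N, D)
        ctx, _ = self.attn(x, x, x)     # share information across all nodes
        return self.to_code(ctx.squeeze(0))  # per-node code for entropy coding


if __name__ == "__main__":
    dims = {"image": 256, "point_cloud": 192, "segmentation": 64, "attributes": 32}
    feats = {k: torch.randn(5, d) for k, d in dims.items()}  # 5 items per modality
    graph = JointEmbeddingGraph(dims)
    sam = SemanticAdaptationModule()
    codes = sam(graph(feats))
    print(codes.shape)  # torch.Size([20, 64])
```

In this toy setup the cross-attention step plays the role of the adaptive joint encoding: each modality's code is computed in the context of the others, which is where inter-modality redundancy would be removed.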
