Abstract
RGB-D data, as homogeneous cross-modal data, exhibits strong correlations between its modalities. However, current research exploits cross-modal contextual information only in a unidirectional pattern, leaving bidirectional relationships unexplored in the compression field. We therefore propose a joint RGB-D compression scheme that combines Bi-directional Cross-modal Prior Transfer (Bi-CPT) modules with a Bi-directional Cross-modal Enhanced Entropy (Bi-CEE) model. The Bi-CPT module is designed for compact representations of cross-modal features, effectively eliminating spatial and modality redundancies at different granularity levels. In contrast to traditional entropy models, the proposed Bi-CEE model not only achieves spatial-channel contextual adaptation by partitioning RGB and depth features but also incorporates information from the other modality as a prior to improve the accuracy of probability estimation for the latent variables. Furthermore, the model enables parallel multi-stage processing to accelerate coding. Experimental results demonstrate the superiority of the proposed framework over existing compression schemes, both in rate-distortion performance and in downstream tasks, including surface reconstruction and semantic segmentation. The source code will be available at https://github.com/xyy7/Learning-based-RGB-D-Image-Compression .
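The abstract's core idea, latents of each modality split into slices whose entropy parameters are conditioned on already-decoded slices of both modalities, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the learned parameter network is replaced by a simple numpy stand-in (`entropy_params`), and the names `bidirectional_stage` and the two-slice channel split are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latents for the two modalities: (channels, H, W).
y_rgb = rng.standard_normal((8, 4, 4))
y_depth = rng.standard_normal((8, 4, 4))

def entropy_params(context):
    """Stand-in for a learned network mapping context features to (mu, sigma)."""
    mu = context.mean(axis=0, keepdims=True)
    sigma = context.std(axis=0, keepdims=True) + 1e-6  # keep sigma positive
    return mu, sigma

def bidirectional_stage(decoded_a, decoded_b, spatial_shape):
    """Estimate entropy parameters for the next slice of modality A,
    conditioned on the already-decoded slices of BOTH modalities."""
    prior = decoded_a + decoded_b
    context = (np.concatenate(prior, axis=0) if prior
               else np.zeros((1,) + spatial_shape))
    return entropy_params(context)

# Decode stage by stage; within a stage the two modalities are
# independent of each other, so they could be processed in parallel.
slices = [slice(0, 4), slice(4, 8)]
decoded_rgb, decoded_depth = [], []
for sl in slices:
    mu_r, sig_r = bidirectional_stage(decoded_rgb, decoded_depth, (4, 4))
    mu_d, sig_d = bidirectional_stage(decoded_depth, decoded_rgb, (4, 4))
    # In a real codec, (mu, sigma) would parameterize a Gaussian used by
    # the arithmetic coder for this slice of each modality.
    decoded_rgb.append(y_rgb[sl])
    decoded_depth.append(y_depth[sl])
```

Because each stage conditions on slices from both modalities while the two modalities within a stage do not depend on each other, the per-stage parameter estimation can run in parallel, which is the multi-stage acceleration the abstract refers to.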
Published in: ACM Transactions on Multimedia Computing, Communications, and Applications