RGB-D data consist of two pixel-aligned modalities of the same scene and therefore exhibit strong cross-modal correlations. However, current compression research exploits cross-modal contextual information in only one direction, leaving bidirectional relationships unexplored. Thus, we propose a joint RGB-D compression scheme built on Bi-directional Cross-modal Prior Transfer (Bi-CPT) modules and a Bi-directional Cross-modal Enhanced Entropy (Bi-CEE) model. The Bi-CPT module produces compact representations of cross-modal features, effectively eliminating spatial and modality redundancies at multiple granularity levels. In contrast to traditional entropy models, our proposed Bi-CEE model not only achieves spatial-channel contextual adaptation by partitioning the RGB and depth features but also incorporates information from the other modality as a prior to improve the accuracy of probability estimation for the latent variables. Furthermore, this model enables parallel multi-stage processing to accelerate coding. Experimental results demonstrate the superiority of our proposed framework over existing compression schemes, in terms of both rate-distortion performance and downstream tasks, including surface reconstruction and semantic segmentation. The source code will be available at https://github.com/xyy7/Learning-based-RGB-D-Image-Compression.
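
To make the bidirectional conditioning concrete, below is a minimal PyTorch sketch in which each modality's Gaussian entropy parameters are predicted from its own hyperprior together with the other modality's latents serving as a cross-modal prior. All names (CrossModalParamNet, BiCEESketch), channel sizes, and the single-stage Gaussian formulation are illustrative assumptions, not the paper's actual implementation; the real Bi-CEE model additionally performs spatial-channel partitioning and parallel multi-stage processing, which are omitted here.

# Minimal sketch of a bidirectional cross-modal entropy model in PyTorch.
# Module names, channel sizes, and the Gaussian-conditional formulation
# are illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn

class CrossModalParamNet(nn.Module):
    """Predicts Gaussian entropy parameters (mean, scale) for one modality's
    latents, conditioned on its own hyperprior and the other modality's
    latents, which act as a cross-modal prior."""
    def __init__(self, latent_ch=192, hyper_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(hyper_ch + latent_ch, 256, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(256, 2 * latent_ch, 3, padding=1),  # -> (mean, scale)
        )

    def forward(self, hyper, cross_latent):
        mean, scale = self.net(torch.cat([hyper, cross_latent], dim=1)).chunk(2, dim=1)
        return mean, nn.functional.softplus(scale)  # scale must be positive

class BiCEESketch(nn.Module):
    """Bidirectional variant: RGB parameters use depth latents as a prior,
    and depth parameters use RGB latents as a prior."""
    def __init__(self, latent_ch=192, hyper_ch=128):
        super().__init__()
        self.rgb_from_depth = CrossModalParamNet(latent_ch, hyper_ch)
        self.depth_from_rgb = CrossModalParamNet(latent_ch, hyper_ch)

    def forward(self, y_rgb, y_depth, h_rgb, h_depth):
        rgb_params = self.rgb_from_depth(h_rgb, y_depth)
        depth_params = self.depth_from_rgb(h_depth, y_rgb)
        return rgb_params, depth_params

# Toy usage: 1x192x16x16 latents and 1x128x16x16 hyperpriors per modality.
model = BiCEESketch()
y_rgb, y_d = torch.randn(1, 192, 16, 16), torch.randn(1, 192, 16, 16)
h_rgb, h_d = torch.randn(1, 128, 16, 16), torch.randn(1, 128, 16, 16)
(rgb_mu, rgb_sigma), (d_mu, d_sigma) = model(y_rgb, y_d, h_rgb, h_d)

In a sketch like this, the predicted mean and scale would parameterize the conditional distribution of each quantized latent for arithmetic coding; conditioning each modality on the other is what distinguishes the bidirectional scheme from the unidirectional pattern criticized above.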