Point clouds offer realistic three-dimensional (3-D) representations of objects and scenes at the expense of large data volumes. To represent such data compactly in real-world applications, Video-based Point Cloud Compression (V-PCC) projects the point cloud texture onto two-dimensional (2-D) attribute maps, accompanied by occupancy maps, before applying lossy video compression. Unfortunately, the coding artifacts introduced in the decoded attribute maps degrade the quality of the reconstructed point cloud, thereby compromising the immersive experience. This paper proposes a deep learning-based attribute map enhancement method that fully leverages the guidance of the occupancy map. The design philosophy is that cross-modality guidance from occupancy carries critical information for enhancing the attribute maps. Therefore, instead of treating attribute and occupancy as two separate signal sources, the proposed framework treats occupancy as an indispensable auxiliary and explicitly provides the model with abundant clues by conducting local feature modification and global dependency aggregation. Moreover, the proposed framework is compatible with existing V-PCC bitstreams and can be readily incorporated into the standardized decoder pipeline. Extensive evaluations show the effectiveness of the proposed framework in attribute enhancement, achieving the equivalent of 6.0% Bjontegaard Delta-rate (BD-rate) savings.
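To make the occupancy-guided design concrete, the following is a minimal sketch (not the paper's actual architecture) of how a decoder-side enhancement block might combine occupancy-conditioned local feature modulation with attention-based global dependency aggregation. It assumes a PyTorch setting; the module name, channel sizes, and modulation scheme are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class OccupancyGuidedBlock(nn.Module):
    """Hypothetical block: modulates attribute features locally using
    occupancy-derived scale/shift maps (local feature modification), then
    aggregates global dependencies with self-attention."""
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        # Local feature modification: predict per-pixel affine parameters
        # (scale, shift) from the binary occupancy map.
        self.occ_encoder = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, 3, padding=1),
        )
        # Global dependency aggregation over all spatial positions.
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, attr_feat: torch.Tensor, occ: torch.Tensor) -> torch.Tensor:
        b, c, h, w = attr_feat.shape
        scale, shift = self.occ_encoder(occ).chunk(2, dim=1)
        local = attr_feat * (1 + scale) + shift      # occupancy-conditioned modulation
        seq = local.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        seq = self.norm(seq + self.attn(seq, seq, seq, need_weights=False)[0])
        return seq.transpose(1, 2).reshape(b, c, h, w)

# Usage: enhance features of a decoded attribute map patch with its occupancy map.
block = OccupancyGuidedBlock(channels=64)
attr = torch.randn(1, 64, 32, 32)                    # decoded attribute features
occ = torch.randint(0, 2, (1, 1, 32, 32)).float()    # binary occupancy map
out = block(attr, occ)                               # enhanced features, same shape
```

Because the block only post-processes decoded maps, a module of this kind leaves the bitstream untouched, which is consistent with the claimed compatibility with standardized V-PCC decoders.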