TSF-Net3D: TSF-Net for 3D Point Cloud Attribute Compression Artifact Removal
The Transformer-based Spatial and Frequency-Decomposed Feature Fusion Network (TSF-Net) has shown great potential as a learned in-loop filter in Versatile Video Coding (VVC). By fusing pixel-domain and frequency-decomposed features with a channel-wise transformer in a multi-scale deep-learning architecture, TSF-Net achieved remarkable success in removing video compression artifacts. In this article, building on the potential of TSF-Net, we extend the work to the 3D domain of point clouds and propose a new framework called TSF-Net3D. More specifically, we incorporate sparse convolution (SparseConv) to process point clouds and deploy TSF-Net3D as a post-processing block in Geometry-based Point Cloud Compression (G-PCC) to enhance the quality of the color attributes of the reconstructed frame. Implementation-wise, TSF-Net3D differs from TSF-Net on two fronts: (1) TSF-Net3D uses pixel information only, without frequency-decomposed features; (2) TSF-Net3D processes point clouds at three scales with two-level feature fusion, whereas TSF-Net processes features at only two scales with single-level feature fusion. We evaluate TSF-Net3D on the 8iVFBv2 dataset, and our experimental results demonstrate that the proposed method achieves a significant YUV Bjøntegaard Delta bitrate (BD-rate) saving of up to -13.12% over the G-PCC (TMC13v21) RAHT baseline while also outperforming other state-of-the-art methods.
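To make the described design concrete, the sketch below illustrates a three-scale SparseConv encoder-decoder with two-level feature fusion, following the structure the abstract outlines. It is a minimal sketch, not the authors' implementation: the choice of MinkowskiEngine as the SparseConv backend, the class name `TSFNet3DSketch`, the channel widths, kernel sizes, and the residual output head are all illustrative assumptions.

```python
# Minimal sketch of a three-scale SparseConv network with two-level feature
# fusion, per the abstract's description. NOT the authors' code: the library
# (MinkowskiEngine), layer widths, and residual head are assumptions.
import torch.nn as nn
import MinkowskiEngine as ME


class TSFNet3DSketch(nn.Module):
    def __init__(self, ch=64, D=3):
        super().__init__()

        def conv(cin, cout, stride=1):
            return ME.MinkowskiConvolution(
                cin, cout, kernel_size=3, stride=stride, dimension=D)

        def up(cin, cout):
            return ME.MinkowskiConvolutionTranspose(
                cin, cout, kernel_size=2, stride=2, dimension=D)

        self.relu = ME.MinkowskiReLU()
        # Encoder: features at full, 1/2, and 1/4 resolution (three scales).
        self.enc1 = conv(3, ch)               # 3 input channels: decoded YUV
        self.enc2 = conv(ch, ch, stride=2)
        self.enc3 = conv(ch, ch, stride=2)
        # Decoder: fusion level 1 (1/4 -> 1/2), fusion level 2 (1/2 -> full).
        self.up2, self.fuse2 = up(ch, ch), conv(2 * ch, ch)
        self.up1, self.fuse1 = up(ch, ch), conv(2 * ch, ch)
        self.head = conv(ch, 3)               # predict a YUV correction

    def forward(self, x: ME.SparseTensor) -> ME.SparseTensor:
        f1 = self.relu(self.enc1(x))
        f2 = self.relu(self.enc2(f1))
        f3 = self.relu(self.enc3(f2))
        # Fusion level 1: upsampled 1/4-scale features meet 1/2-scale features.
        d2 = self.relu(self.fuse2(ME.cat(self.up2(f3), f2)))
        # Fusion level 2: upsampled 1/2-scale features meet full-scale features.
        d1 = self.relu(self.fuse1(ME.cat(self.up1(d2), f1)))
        # Residual formulation: refine, rather than replace, the decoded
        # attributes (an assumption common to restoration networks).
        return self.head(d1) + x
```

In use, `x` would be an `ME.SparseTensor` whose coordinates are the voxelized point positions of the G-PCC-reconstructed frame and whose features are the decoded YUV attributes; as a post-processing block, the refined features would simply be written back onto the same point set.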