Abstract

Video-based point cloud compression (V-PCC, ISO/IEC 23090-5) is the state-of-the-art international standard for compressing dynamic point clouds, developed by the Moving Picture Experts Group (MPEG). It achieves good rate-distortion (RD) performance by compressing dynamic point clouds through 2D video coding. In brief, V-PCC first converts the 3D input point cloud into a set of 2D patches, and a packing process then maps the patches onto a 2D grid. This allows the patches to be compressed with existing video coding standards. Besides RD performance, complexity is another vital factor in performance evaluations. In the V-PCC encoder, one of the most computationally intensive modules is the grid-based refining segmentation (G-RS); its self-time accounts for 15.9% of the total encoding time on average, and up to 48.2%, which can hinder real-time V-PCC applications. This paper therefore proposes a fast G-RS method that adaptively selects the voxels that need refining segmentation. More concretely, the proposed method classifies voxels based on the projection plane indices (PPIs) of the 3D points and applies the refining process only to the selected voxels. Experimental results demonstrate that, compared with the test model for category 2 (TMC2) version 12.0 reference software, the proposed method reduces the complexity of the refining steps in G-RS by 60.7% and 62.5% on average under the random access (RA) and all-intra (AI) configurations, respectively, without loss of coding efficiency.
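The core idea of selecting voxels by PPI distribution can be illustrated with a minimal sketch. This is not the paper's actual algorithm or the TMC2 implementation; the function name, array layout, and the "all points share one PPI" uniformity criterion are illustrative assumptions.

```python
import numpy as np

def classify_voxels(ppi, voxel_ids):
    """Illustrative sketch (not the paper's exact method): flag a voxel
    for refining segmentation only when the projection plane indices
    (PPIs) of its points are NOT all identical.

    ppi       : (N,) array of per-point projection plane indices
    voxel_ids : (N,) array mapping each point to a voxel id
    Returns a dict {voxel_id: needs_refining (bool)}.
    """
    needs_refining = {}
    for v in np.unique(voxel_ids):
        ppis_in_voxel = ppi[voxel_ids == v]
        # A voxel whose points already agree on a single PPI cannot be
        # changed by smoothing, so the refining pass can skip it.
        uniform = np.unique(ppis_in_voxel).size == 1
        needs_refining[v] = not uniform
    return needs_refining
```

Under this sketch, only the non-uniform ("edge") voxels incur the refining cost, which is the source of the reported complexity savings.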

Highlights

  • Advances in three-dimensional (3D) capture technologies have opened a new chapter in 3D sensing beyond virtual/augmented reality (VR/AR) content creation to smart factories, robots, and automated driving applications

  • According to the video-based point cloud compression (V-PCC) common test conditions (CTCs) [3], each 3D point cloud frame consists of 800,000-2,900,000 3D points, and each 3D point is stored with 10 bits for the geometry information and 8 bits for the color (RGB) information

  • We introduce the notion of a uniformity index that indicates whether the projection plane index (PPI) distribution within a voxel is uniform, enabling a direct edge-voxel and no-edge-voxel decision


Summary

Introduction

Advances in three-dimensional (3D) capture technologies have opened a new chapter in 3D sensing, extending beyond virtual/augmented reality (VR/AR) content creation to smart factories, robotics, and automated driving applications. In V-PCC, the first step converts a 3D point cloud into 2D video sequences (i.e., attribute and geometry videos [8], [9]) plus additional metadata, such as an occupancy map and auxiliary patch information, which are essential for interpreting the video sequences. This approach has been studied for many years and makes it possible to leverage existing video coding standards to compress the 2D videos derived from a point cloud [7], [11], [30], [31]. More detailed descriptions of V-PCC can be found in the literature [6], [33].
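The patch-packing step described above can be pictured with a toy sketch. This is NOT the TMC2 packing algorithm; it is a hypothetical first-fit placement of patch bounding boxes into an occupancy grid, with all names and grid sizes chosen for illustration.

```python
import numpy as np

def pack_patches(patch_sizes, grid_w=64, grid_h=64):
    """Toy first-fit packing sketch (not the TMC2 algorithm): place each
    patch's (height, width) bounding box at the first raster-scan
    position where it does not overlap previously placed patches."""
    occupancy = np.zeros((grid_h, grid_w), dtype=bool)
    placements = []
    for (h, w) in patch_sizes:
        placed = False
        for y in range(grid_h - h + 1):
            for x in range(grid_w - w + 1):
                if not occupancy[y:y + h, x:x + w].any():
                    occupancy[y:y + h, x:x + w] = True
                    placements.append((y, x))
                    placed = True
                    break
            if placed:
                break
        if not placed:
            # Grid too small; a real encoder would grow the 2D grid.
            placements.append(None)
    return placements, occupancy
```

The resulting occupancy map plays the same role as in V-PCC: it records which grid samples carry patch data so the decoder can interpret the packed video frames.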

