Abstract
Research in light field (LF) processing has increased substantially over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with a bitrate reduction of a factor of 4 for the same quality. At least equally important is the fact that our method inherently has desirable functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
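As a rough illustration of how a set of kernels can define a continuous approximation, the following is a minimal sketch for the 2-D image case, not the authors' implementation. It assumes a common mixture-of-experts form: each kernel is a 2-D Gaussian with a full ("steered") covariance, the gate is the normalized kernel response, and each expert is a local plane. The field names (`pi`, `mu`, `cov`, `level`, `slope`) and all numeric values are hypothetical.

```python
import math

def gaussian_2d(x, mu, cov):
    """Density of a 2-D Gaussian with mean mu and full 2x2 covariance cov at point x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), using the closed-form 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def smoe_reconstruct(x, kernels):
    """Evaluate the (assumed) gated mixture at one continuous coordinate x."""
    # Soft gate: each kernel's prior-weighted response; normalized below.
    # Note: far from every kernel the responses can underflow to zero; this
    # sketch does not guard against that case.
    responses = [k["pi"] * gaussian_2d(x, k["mu"], k["cov"]) for k in kernels]
    total = sum(responses)
    value = 0.0
    for r, k in zip(responses, kernels):
        dx, dy = x[0] - k["mu"][0], x[1] - k["mu"][1]
        # Expert: a local plane anchored at the kernel centre.
        expert = k["level"] + k["slope"][0] * dx + k["slope"][1] * dy
        value += (r / total) * expert
    return value

# Two hypothetical kernels with correlated (off-diagonal) covariances.
KERNELS = [
    {"pi": 0.5, "mu": (0.25, 0.5), "cov": ((0.004, 0.002), (0.002, 0.004)),
     "level": 0.2, "slope": (0.0, 0.1)},
    {"pi": 0.5, "mu": (0.75, 0.5), "cov": ((0.004, 0.002), (0.002, 0.004)),
     "level": 0.8, "slope": (0.0, 0.0)},
]

# Each sample depends only on its own coordinate and the kernel list, so the
# samples of a view can be computed independently and at any resolution.
row = [smoe_reconstruct((i / 8.0, 0.5), KERNELS) for i in range(9)]
```

Because `smoe_reconstruct` takes a continuous coordinate, the same call serves for in-between positions, which is the sense in which such a model offers interpolation and pixel-parallel reconstruction in this sketch.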
Highlights
Virtual reality (VR) for camera-captured scenes is fundamentally different and more complex compared to VR consumption of computer-generated (CG) scenes
A light field (LF) video with 10 × 10 viewpoints, in full HD at 30 fps yields 6,220,800,000 pixels per second! Sparse representations such as Steered Mixture-of-Experts (SMoE) are hugely beneficial for such higher-dimensional modalities as a single kernel can span over a large number of pixels spread out over five dimensions simultaneously
Four rate-distortion (RD) points for each of the three High Efficiency Video Coding (HEVC) configurations and for the minibatch SMoE method were selected in the lowest bitrate range, as this range was assumed to cover the highest variance in Mean Opinion Scores (MOS)
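The pixel rate quoted in the highlights can be checked with a few lines of arithmetic. The sketch below assumes that "full HD" means 1920 × 1080 pixels per view; the function name `lf_pixel_rate` is introduced here for illustration only.

```python
def lf_pixel_rate(views_x, views_y, width, height, fps):
    """Pixels per second for an LF video captured on a views_x by views_y viewpoint grid."""
    return views_x * views_y * width * height * fps

# 10 x 10 viewpoints, 1920 x 1080 pixels per view (assumed full HD), 30 frames per second.
rate = lf_pixel_rate(10, 10, 1920, 1080, 30)
print(f"{rate:,}")  # prints 6,220,800,000
```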
Summary
Virtual reality (VR) for camera-captured scenes is fundamentally different and more complex compared to VR consumption of computer-generated (CG) scenes (e.g., as in gaming). The second strategy relies on known hybrid transform/difference-coding techniques commonly used for video. Following this philosophy, scenes are represented by coding a minimal set of 2-D images, and reconstructing the missing ones by view synthesis. The reconstruction is not truly pixel-level parallel due to the intra-coding techniques. These systems do not cope well with irregularly-sampled data and heterogeneous camera setups in scene acquisition systems. We focus on LF images and video, and on the reconstruction performance with coded model parameters as an LF compression tool. Other applications of this representation include super-resolution, denoising, and segmentation.