Abstract

Depth-image-based rendering (DIBR) is a commonly used method for synthesizing additional views from the video-plus-depth (V+D) format. A critical issue with DIBR-based view synthesis is the lack of information behind foreground objects. This lack manifests as disocclusions (holes) next to the foreground objects in rendered virtual views, a consequence of the virtual camera “seeing” behind the foreground object. The disocclusions are larger in the extrapolation case, i.e., the single-camera case. Texture synthesis methods (inpainting methods) aim to fill these disocclusions by producing plausible texture content. However, virtual views inevitably exhibit both spatial and temporal inconsistencies at the filled disocclusion areas, depending on the scene content. In this paper, we propose a layered depth image (LDI) approach that improves spatio-temporal consistency. In the process of LDI generation, depth information is used to classify the foreground and background in order to form a static scene sprite from a set of neighboring frames. Occlusions in the LDI are then identified and filled using inpainting, such that no disocclusions appear when the LDI data is rendered to a virtual view. In addition to the depth information, optical flow is computed to extract the stationary parts of the scene and to classify the occlusions in the inpainting process. Experimental results demonstrate that spatio-temporal inconsistencies are significantly reduced using the proposed method. Furthermore, subjective and objective qualities are improved compared to state-of-the-art reference methods.

Highlights

  • Three-dimensional television (3DTV) and free viewpoint television (FTV) technologies are actively pursued in both research and compression standardization, which is supported by the development of enabling display technologies [1]

  • The proposed hole classification based on optical flow ensures temporal consistency at holes next to static objects by reusing previously inpainted information

  • The computational time was reduced compared to reference methods by using the hole classification and reusing the inpainted textures to fill the holes at static objects

Summary

Introduction

Three-dimensional television (3DTV) and free viewpoint television (FTV) technologies are actively pursued in both research and compression standardization, which is supported by the development of enabling display technologies [1]. Background sprites are generated from temporal frame information, after which disocclusions are filled and updated using the true texture, and inpainting is applied to fill the remaining holes in virtual views [29,30,31,32]. However, reusing the synthesized background from the sprite and inpainting the remaining holes at the moving foreground causes spatial inconsistencies. Another sprite-based hole-filling method, presented in [33,34,35], extracts the static background information using depth classification by probability analysis, a Gaussian mixture model, and the structural similarity index. The proposed method produces a layered depth image by identifying the occlusions and inpainting them in the original view using temporal information and motion estimation.
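
The idea of forming a static background sprite from neighboring frames by depth classification can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the single fixed depth threshold, and the convention that larger depth values mean farther from the camera are all assumptions made for illustration, and the optical-flow and inpainting steps are omitted. Pixels that stay uncovered in every frame are reported as holes that would still need inpainting.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]


def build_background_sprite(
    textures: List[Image],
    depths: List[Image],
    depth_threshold: float,
) -> List[List[Optional[float]]]:
    """Merge background pixels from several frames into one sprite.

    Assumption: larger depth means farther away, and any pixel with
    depth >= depth_threshold is classified as background.
    Sprite entries stay None where no frame revealed the background.
    """
    height, width = len(textures[0]), len(textures[0][0])
    sprite: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    sprite_depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]

    for texture, depth in zip(textures, depths):
        for y in range(height):
            for x in range(width):
                d = depth[y][x]
                if d < depth_threshold:
                    continue  # foreground pixel, never copied into the sprite
                known = sprite_depth[y][x]
                # Keep the farthest background sample seen so far.
                if known is None or d > known:
                    sprite[y][x] = texture[y][x]
                    sprite_depth[y][x] = d
    return sprite


def remaining_holes(sprite: List[List[Optional[float]]]) -> List[Tuple[int, int]]:
    """Return (row, column) positions that no frame could fill."""
    return [
        (y, x)
        for y, row in enumerate(sprite)
        for x, value in enumerate(row)
        if value is None
    ]


# Two one-row frames: a foreground object (depth 1) moves one pixel right.
frame_textures = [[[10, 99, 99, 13, 14]], [[10, 11, 99, 99, 14]]]
frame_depths = [[[9, 1, 1, 9, 9]], [[9, 9, 1, 1, 9]]]

example_sprite = build_background_sprite(frame_textures, frame_depths, depth_threshold=5)
print(example_sprite)
print(remaining_holes(example_sprite))
```

In this toy example the background behind the object is partly recovered because the object moves between the two frames; only the column covered in both frames remains a hole. A real system would also need to handle camera motion and noisy depth, which this sketch ignores.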

Static scene sprite construction
Foreground-background boundary extraction
Results and analysis
Conclusions
