We describe a system for robustly estimating synthetic depth maps from unconstrained images and videos, for semi-automatic conversion into stereoscopic 3D. Currently, this conversion is either performed automatically or done manually by rotoscopers. Automatic conversion is the least labor intensive, but it makes user intervention and error correction difficult. Manual conversion is the most accurate, but it is time consuming and costly. A semi-automatic method blends the merits of both, allowing for faster yet accurate conversion. The user marks strokes on the image, or on several keyframes for video, that give a rough estimate of the depths in those regions. The depths of the remaining pixels are then determined automatically, producing depth maps that are used to generate stereoscopic 3D content, with Depth Image Based Rendering synthesizing the artificial views. Depth map estimation can be considered a multi-label segmentation problem in which each class is a depth. For video, we allow the user to label only the first frame, and we propagate the strokes to subsequent frames using computer vision techniques. We combine the merits of two well-respected segmentation algorithms, Graph Cuts and Random Walks: the diffusion of Random Walks, together with the edge preservation of Graph Cuts, is expected to yield smooth depth maps with well-preserved object boundaries. Compared with a similar framework, our system generates good-quality content that is more suitable for perception.
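To illustrate the Random Walks diffusion step, the following is a minimal sketch and not the system's actual implementation. It assumes a single 1-D scanline of intensities, a Gaussian edge weight with a hypothetical parameter `beta`, and a simple Gauss-Seidel relaxation in place of a sparse linear solver. The function name `propagate_depth` and its arguments are illustrative. Graph Cuts and the video stroke propagation are not shown.

```python
import math


def propagate_depth(intensity, seeds, beta=50.0, iters=2000):
    """Diffuse sparse user depth strokes along a 1-D scanline.

    intensity: list of pixel intensities in [0, 1]
    seeds:     dict mapping pixel index -> user-assigned depth in [0, 1]
    beta:      assumed edge-sensitivity parameter (illustrative value)
    """
    n = len(intensity)
    # Weight between pixel i and i+1: close to 1 in flat regions,
    # close to 0 across a strong intensity edge.
    w = [math.exp(-beta * (intensity[i] - intensity[i + 1]) ** 2)
         for i in range(n - 1)]
    # Seeded pixels keep their depth; the rest start at mid-range.
    depth = [seeds.get(i, 0.5) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i in seeds:
                continue  # user strokes act as fixed boundary conditions
            num = den = 0.0
            if i > 0:
                num += w[i - 1] * depth[i - 1]
                den += w[i - 1]
            if i < n - 1:
                num += w[i] * depth[i + 1]
                den += w[i]
            if den > 0:
                # Each unlabeled pixel becomes the weighted mean of its neighbors.
                depth[i] = num / den
    return depth


# Two flat regions separated by one strong edge, one stroke in each region.
scanline = [0.1] * 5 + [0.9] * 5
depth = propagate_depth(scanline, {0: 0.2, 9: 0.8})
```

In this toy case the depth stays close to each stroke's value within its own region and changes abruptly at the intensity edge, because the near-zero weight there blocks the diffusion.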