Abstract

We propose a new method for modeling an indoor scene from a single color image. With our system, the user only needs to drag a few semantic bounding boxes around the objects of interest. The system then automatically retrieves the most similar 3D models from the ShapeNet repository and aligns them with the corresponding objects. To achieve this, each 3D model is represented as a group of view-dependent representations generated from a set of synthesized views. We alternate between object segmentation and 3D model retrieval, based on the observation that a good segmentation of the objects of interest significantly improves the accuracy of model retrieval and makes it robust to cluttered backgrounds and occlusions, while in turn the retrieved 3D models can guide object segmentation. All objects of interest are segmented simultaneously under a unified multi-labeling framework that fully exploits the correspondences between the objects of interest and the retrieved model images. In addition, we propose a new method for estimating the scene layout of the input image from the segmentation masks, which helps compose the resulting scene and markedly improves the modeling result. We verify the effectiveness of our approach through experiments on a variety of indoor images and comparisons against related methods.
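
To make the alternation described above concrete, the following is a rough Python sketch of the loop's control flow only. The shape-database layout (one matrix of synthesized-view features per model), the injected `extract` and `segment` callables, and all function names are illustrative assumptions, not the paper's actual interfaces.

```python
import numpy as np

# Hypothetical sketch of the alternating segmentation/retrieval loop
# summarized in the abstract. Feature extraction, the joint multi-label
# segmentation step, and the shape database are simplified stand-ins.

def box_to_mask(image_shape, box):
    """Initialize an object mask from a user-drawn bounding box."""
    mask = np.zeros(image_shape, dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask

def retrieve_model(obj_feat, shape_db):
    """Return the model whose closest synthesized view (one feature
    row per view) best matches the segmented object's feature."""
    best, best_dist = None, np.inf
    for name, view_feats in shape_db.items():  # (n_views, dim) each
        dist = np.linalg.norm(view_feats - obj_feat, axis=1).min()
        if dist < best_dist:
            best, best_dist = name, dist
    return best

def alternate(image_shape, boxes, shape_db, extract, segment, iters=5):
    """Alternate model retrieval and segmentation until masks stabilize.

    extract(mask) -> feature vector for the masked object region.
    segment(boxes, models, masks) -> refined masks, using the retrieved
    model views as shape priors in a joint multi-label segmentation.
    """
    masks = [box_to_mask(image_shape, b) for b in boxes]
    models = [None] * len(boxes)
    for _ in range(iters):
        models = [retrieve_model(extract(m), shape_db) for m in masks]
        new_masks = segment(boxes, models, masks)
        if all(np.array_equal(a, b) for a, b in zip(masks, new_masks)):
            break  # converged: segmentation no longer changes
        masks = new_masks
    return models, masks
```

Injecting `extract` and `segment` as callables keeps the sketch agnostic to the actual feature and segmentation machinery, which the paper realizes with view-dependent representations and a unified multi-labeling formulation.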
