Abstract

This article investigates a data-driven approach to semantic scene understanding that requires neither pixelwise annotation nor classifier training. The proposed framework parses a target image in two steps: first, it retrieves exemplars (that is, references) for the target from an image database in which all images are unsegmented but annotated with tags; second, it recovers the target's pixel labels by propagating semantics from those references. The authors present a novel framework in which the two steps are mutually conditional and bootstrapped under a probabilistic Expectation-Maximization (EM) formulation. In the first step, the system selects references by jointly matching their appearance and their semantics (that is, the assigned labels) with the target. The second step is carried out on a combinatorial graphical representation whose vertices are superpixels extracted from the target and its selected references. The potentials for assigning labels to a target vertex are then derived from the graph edges connecting that vertex to its spatial neighbors in the target and to similar vertices in the references. The framework also applies naturally to image annotation on new test images. In the experiments, the authors validate their approach on two public databases and demonstrate performance superior to state-of-the-art methods on both semantic segmentation and image annotation.
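The abstract gives no code, but the alternating structure it describes (reference retrieval conditioned on the current label estimate, followed by label propagation from the retrieved references) can be sketched roughly as below. This is a minimal illustration, not the authors' method: every function and variable name is hypothetical, and the simple similarity scores and belief update merely stand in for the paper's joint appearance/semantics matching and superpixel-graph potentials.

```python
import numpy as np

def em_label_transfer(target_feats, db_feats, db_tags, n_refs=5, n_iters=5):
    """Hypothetical sketch of the alternating procedure: reference selection
    and label propagation condition on each other, in the spirit of the EM
    formulation described above. Not the authors' implementation."""
    n_labels = db_tags.shape[1]
    # Start with a uniform label belief over the target's superpixels.
    beliefs = np.full((target_feats.shape[0], n_labels), 1.0 / n_labels)
    refs = np.arange(min(n_refs, db_feats.shape[0]))
    for _ in range(n_iters):
        # Step 1: score database images by appearance similarity AND by
        # agreement between their tags and the current label beliefs.
        appearance = db_feats @ target_feats.mean(axis=0)
        semantics = db_tags @ beliefs.mean(axis=0)
        refs = np.argsort(appearance + semantics)[-n_refs:]
        # Step 2: propagate labels from the selected references' tags;
        # this crude update stands in for the superpixel-graph potentials.
        beliefs = 0.5 * beliefs + 0.5 * db_tags[refs].mean(axis=0)
        beliefs /= beliefs.sum(axis=1, keepdims=True)
    # Hard label per target superpixel, plus the final reference set.
    return beliefs.argmax(axis=1), refs
```

Under these assumptions, a call such as `em_label_transfer(superpixel_features, database_features, database_tags)` would return one label index per target superpixel together with the indices of the final exemplar set, mirroring the mutual bootstrapping of retrieval and labeling described in the abstract.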

