Abstract

Automatically recognizing the visual phrases in an image is a challenging problem in computer vision. In this paper, we propose a method to discover and identify visual phrases by automatically analyzing the 3D spatial geometric structure of an image. The method has two steps: (1) learning a 3D spatial geometric model, and (2) recognizing visual phrases. For the first step, we propose 3D spatial geometric models (3DSG) that jointly capture both the features of the objects and the 3D spatial layout among the objects in a visual phrase. In the second step, we cast visual phrase recognition as verification, measuring the similarity of spatial configuration between a given visual pattern and a 3DSG model. This formulation lets our method precisely determine whether a given visual pattern belongs to a specific 3DSG model by maximizing the joint probability of the visual pattern and the 3DSG model. Experiments on several datasets show that our model outperforms state-of-the-art models both in modeling 3D spatial geometric structure and in recognizing visual phrases. The results also demonstrate that modeling the 3D spatial configuration between objects can significantly improve deeper image understanding.
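
The abstract does not specify the internal form of a 3DSG model, so the following is only an illustrative sketch of the "recognition as verification" idea under loudly stated assumptions. It assumes a hypothetical two-object phrase model whose spatial term is an independent Gaussian over the 3D offset between the two objects, and whose appearance term is simply the log of each object's detection confidence. The joint log-probability of a visual pattern under each candidate model is computed, the best model is selected, and the pattern is accepted only if that score clears a threshold. All names (`PairwiseGaussian`, `PhraseModel`, `joint_log_prob`, `verify`) and the threshold are hypothetical, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PairwiseGaussian:
    # Hypothetical spatial term: independent Gaussian over the 3D offset
    # (object centre minus subject centre) for one visual phrase.
    mean: Vec3
    std: Vec3

    def log_pdf(self, offset: Vec3) -> float:
        total = 0.0
        for x, mu, sigma in zip(offset, self.mean, self.std):
            z = (x - mu) / sigma
            total += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
        return total


@dataclass
class PhraseModel:
    # Hypothetical stand-in for a 3DSG model: two object roles plus a spatial term.
    name: str
    subject: str
    obj: str
    spatial: PairwiseGaussian


# detections maps a category to (confidence in (0, 1], 3D centre).
Detections = Dict[str, Tuple[float, Vec3]]


def joint_log_prob(model: PhraseModel, detections: Detections) -> float:
    """Appearance (log detection confidences) plus 3D spatial log-likelihood."""
    if model.subject not in detections or model.obj not in detections:
        return -math.inf  # the pattern cannot instantiate this phrase
    s_conf, s_pos = detections[model.subject]
    o_conf, o_pos = detections[model.obj]
    offset = tuple(o - s for o, s in zip(o_pos, s_pos))
    return math.log(s_conf) + math.log(o_conf) + model.spatial.log_pdf(offset)


def verify(models: List[PhraseModel], detections: Detections,
           threshold: float) -> Tuple[Optional[str], float]:
    """Pick the model maximizing the joint probability, then accept or reject it."""
    best = max(models, key=lambda m: joint_log_prob(m, detections))
    score = joint_log_prob(best, detections)
    return (best.name if score >= threshold else None), score


if __name__ == "__main__":
    models = [
        # y axis points up; the horse centre sits below the rider's centre.
        PhraseModel("person riding horse", "person", "horse",
                    PairwiseGaussian((0.0, -0.8, 0.0), (0.3, 0.3, 0.3))),
        PhraseModel("person next to horse", "person", "horse",
                    PairwiseGaussian((1.0, -0.3, 0.0), (0.3, 0.3, 0.3))),
    ]
    pattern = {"person": (0.9, (0.0, 1.5, 5.0)), "horse": (0.8, (0.1, 0.7, 5.05))}
    print(verify(models, pattern, threshold=-2.0))
```

In this toy setup the horse lies almost directly below the person, so "person riding horse" has the higher joint log-probability and passes the threshold. A configuration that fits no model well (for example, the horse several metres to the side) is rejected, which is what turns recognition into a verification decision.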
