Abstract

Studying the spatial organization of objects in images is fundamental to improving both the understanding of a sensed scene and the explainability of the perceived similarity between images. This leads to the fundamental problem of handling spatial relations: given two objects depicted in an image, or two parts of an object, how can their spatial configuration be extracted and described efficiently? Dedicated descriptors already exist for this task, such as the efficient force histogram. In this article, we introduce the Force Banner, which extends the force histogram to two dimensions by using a panel of forces (attraction and repulsion), so as to gain expressiveness and model richer spatial information. This descriptor can be used as an intermediate representation of the image dedicated to the spatial configuration, and fed to a classical 2D Convolutional Neural Network (CNN) to benefit from its strong performance. As an illustration, we use it to solve a classification problem aiming to discriminate simple spatial relations across configurations of varying complexity. Experimental results obtained on datasets of images with various shapes highlight the interest of this approach, in particular for complex spatial configurations.

Highlights

  • In recent years, taking spatial relationships into account in image analysis processes has been a hot topic studied by the computer vision community and, more generally, by the pattern recognition community

  • We propose to combine the advantages of traditional approaches, i.e. relative position descriptors, with those of Convolutional Neural Networks (CNNs) to address the problem of recognizing spatial relations

  • Rather than trying to learn a spatial relationship directly from the initial image space, as is the case with standard CNN-based approaches, we propose in Section III an intermediate representation of the image, capturing relative position information between a pair of objects, and we train the CNN to recognize the spatial relationship from this rich representation

Summary

INTRODUCTION

In recent years, taking spatial relationships into account in image analysis processes has been a hot topic studied by the computer vision community and, more generally, by the pattern recognition community. Deep learning based strategies, such as Convolutional Neural Networks (CNNs), have been proposed in the computer vision community to efficiently exploit the discriminative aspects of local features in images for various tasks. Such models have led to outstanding results in image classification tasks, but one of their inherent downsides is precisely their weak ability to take spatial information into account, because images are represented as orderless collections of local features. Rather than trying to learn a spatial relationship directly from the initial image space, as is the case with standard CNN-based approaches, we propose in Section III an intermediate representation of the image, capturing relative position information between a pair of objects, and we train the CNN to recognize the spatial relationship from this rich representation.
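To make the idea of such an intermediate representation concrete, the following is a minimal sketch of a force histogram between two binary object masks, and of stacking histograms over a panel of force exponents into a 2D array. This is an illustrative assumption of how such a descriptor can be built (a naive pairwise formulation; the function names, the exponent panel, and the bin count are hypothetical choices, not the paper's exact definitions):

```python
import numpy as np

def force_histogram(mask_a, mask_b, r=0.0, n_bins=180):
    """Naive O(|A|*|B|) sketch of a force histogram between two binary masks.

    For each pair of pixels (a in A, b in B), a force 1/d**r is accumulated
    into the bin of the direction angle from a to b. r = 0 weights all pairs
    equally ("constant force"); larger r makes nearby pairs dominate,
    mimicking gravitational attraction.
    """
    ya, xa = np.nonzero(mask_a)
    yb, xb = np.nonzero(mask_b)
    # pairwise displacement vectors from every A-pixel to every B-pixel
    dx = xb[None, :] - xa[:, None]
    dy = -(yb[None, :] - ya[:, None])   # flip sign: image y-axis points down
    d = np.hypot(dx, dy)
    valid = d > 0                        # ignore coincident pixels
    angles = np.arctan2(dy, dx)          # direction angle in (-pi, pi]
    forces = 1.0 / d[valid] ** r
    bins = ((angles[valid] + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=forces, minlength=n_bins)
    return hist / hist.sum()             # normalize to a distribution

def force_banner(mask_a, mask_b, rs=np.linspace(0.0, 2.0, 16), n_bins=180):
    """Stack histograms over a panel of force exponents into a 2D array,
    which can then be fed to a standard 2D CNN as a one-channel image."""
    return np.stack([force_histogram(mask_a, mask_b, r, n_bins) for r in rs])
```

For example, if object B lies directly to the right of object A, each row of the resulting banner peaks at the bin corresponding to angle 0, while the variation across rows (exponents) encodes how the relation changes when near versus far pixel pairs are emphasized.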

RELATED WORKS
PROPOSED APPROACH FOR THE RECOGNITION OF SPATIAL RELATIONS
Towards the Force Banner
Chosen CNN model
EXPERIMENTAL STUDY
Experimental protocol
Results and discussion
CONCLUSION
