Abstract

Image matching, or comparing images in order to obtain a measure of their similarity, is a fundamental aspect of many problems in computer vision, including object and scene recognition, content-based image retrieval, stereo correspondence, motion tracking, texture classification and video data mining. It is a complex problem that remains challenging due to partial occlusions, image deformations, and viewpoint or lighting changes that may occur across different images (Grauman & Darrell, 2005). Image matching can be defined as “the process of bringing two images geometrically into agreement so that corresponding pixels in the two images correspond to the same physical region of the scene being imaged” (Dai & Lu, 1999). According to this definition, the image matching problem is solved by transforming (e.g., translating, rotating, scaling) one of the images so that its similarity with the other image is maximised in some sense. The 3D nature of real-world scenarios makes this solution difficult to achieve, especially because images can be taken from arbitrary viewpoints and under different illumination conditions. Alternatively, the similarity may be computed over global features derived from the original images. However, this is not the most efficient solution. Moreover, such global statistics usually cannot cope with real-world scenarios because they often fail to give adequate descriptions of the local structures or discriminating features present in the image (Grauman & Darrell, 2005). Another solution to the image matching problem is to describe the image using a set of distinguished regions (Matas et al., 2002). These regions must possess some invariant and stable property in order to be detected with high repeatability in images taken from arbitrary viewpoints. The matching between two images is then posed as a search in the correspondence space established between the associated sets of distinguished regions.
If each region is described by a vector of image pixels, then cross-correlation can be used to obtain a similarity value between two regions (Mikolajczyk & Schmid, 2005). However, due to the high dimensionality of such vectors, generating the correlation space typically carries a high computational cost. To reduce this complexity, the number of tentative correspondences can be limited by computing local invariant descriptors for the distinguished regions (Matas et al., 2002; Grauman & Darrell, 2005). These descriptors can also be employed to estimate the similarity value between two regions. In this paper, we adopt an approach which describes the image using a set of distinguished regions and exploits local invariant descriptors to estimate the similarity value between two distinguished regions belonging to different images. Thus, there are four
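The cross-correlation similarity mentioned above can be illustrated with a short sketch. This is not the paper's implementation; it is a minimal example (assuming NumPy and equally sized patches) of normalized cross-correlation, which flattens each region into a pixel vector, centres it, and measures the cosine similarity of the centred vectors, making the score invariant to affine brightness changes:

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equally sized patches.

    Flattens each patch into a pixel vector, subtracts its mean, and
    returns the cosine similarity of the centred vectors (range [-1, 1]).
    """
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # a flat patch carries no structure to correlate
    return float(np.dot(a, b) / denom)

# A patch correlates perfectly with a brightness/contrast-shifted copy,
# since centring and normalisation cancel the affine intensity change.
patch = np.array([[10, 20], [30, 40]])
print(round(ncc(patch, patch * 2 + 5), 6))  # -> 1.0
```

Note that the vectors compared here have as many dimensions as the region has pixels, which is exactly the cost the local invariant descriptors are introduced to avoid.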
