Abstract

Aggregating a set of local features has become one of the most common approaches to representing multimedia data such as 2D images and 3D models. The success of Bag-of-Features (BF) aggregation [2] prompted several extensions to BF, namely VLAD [12], Fisher Vector (FV) coding [22], and Super Vector (SV) coding [34]. They all learn a small number of codewords, or representative local features, by clustering a large set of local features. The set of local features extracted from a media item (e.g., an image) is then encoded by considering the distribution of features around the codewords: BF uses frequency, VLAD and FV use displacement vectors, and SV uses a combination of both. In doing so, these encoding algorithms assume that the feature space is linear around each codeword. Consequently, even if the set of features forms a non-linear manifold, its non-linearity is ignored, potentially degrading the quality of the aggregated features. In this paper, we propose a novel feature aggregation algorithm called Diffusion-on-Manifold (DM) that uses diffusion distance to take into account the structure of the non-linear manifold formed by the set of local features. With 3D shape retrieval in mind, we also propose a local 3D shape feature defined for oriented point sets. Experiments in a shape-based 3D model retrieval scenario show that DM aggregation yields better retrieval accuracy than the existing aggregation algorithms we compared against, including VLAD, FV, and SV.
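To make the difference between frequency-based and displacement-based encoding concrete, the following is a minimal sketch in plain Python, not the implementation used in the paper. The function names, the hand-picked codewords, and the toy 2D features are assumptions made for illustration; in practice the codewords would be learned by clustering, and the encoded vectors would usually be normalized, which is omitted here.

```python
from math import dist


def nearest_codeword(feature, codewords):
    """Return the index of the codeword closest to `feature` (Euclidean)."""
    return min(range(len(codewords)), key=lambda k: dist(feature, codewords[k]))


def encode_bf(features, codewords):
    """Bag-of-Features: count the features assigned to each codeword."""
    hist = [0] * len(codewords)
    for f in features:
        hist[nearest_codeword(f, codewords)] += 1
    return hist


def encode_vlad(features, codewords):
    """VLAD: per codeword, sum the displacement vectors (feature - codeword)."""
    dim = len(codewords[0])
    acc = [[0.0] * dim for _ in codewords]
    for f in features:
        k = nearest_codeword(f, codewords)
        for d in range(dim):
            acc[k][d] += f[d] - codewords[k][d]
    # Concatenate the per-codeword sums into a single vector.
    return [v for row in acc for v in row]


# Hypothetical toy data: two codewords and three local features in 2D.
codewords = [(0.0, 0.0), (10.0, 10.0)]
features = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0)]

bf = encode_bf(features, codewords)      # [2, 1]
vlad = encode_vlad(features, codewords)  # [1.0, 2.0, -1.0, 0.0]
```

Both encoders measure a feature's relation to a codeword with straight-line Euclidean geometry, which is the linearity assumption the abstract refers to.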
