Abstract

The generalized Hough transform, when applied to object detection, recognition, and pose estimation, can be susceptible to spurious voting, depending on the choice of Hough space and the hypotheses being voted for. This often necessitates additional computational steps such as non-maxima suppression and geometric consistency checks, which can be costly and prevent voting-based methods from being precise and scalable to large numbers of target classes and crowded scenes. In this paper, we propose an efficient and refined Hough transform for simultaneous detection, recognition, and exact pose estimation, which can efficiently accommodate up to several tens of co-visible query instances and several thousands of visually similar classes. Specifically, we match SURF features from a given query image to a database of model features with known poses, and, in contrast to existing techniques, for each matched pair we analytically compute a concise set of 6-degrees-of-freedom pose hypotheses under which the geometric relationship of the correspondence remains invariant. We also introduce an indirect but equivalent representation for these correspondence-specific poses, termed feature-aligning affine transformations, which results in a Hough voting scheme as cheap and refined as line drawing in raster grids. Owing to minimized voting redundancy, we obtain a very sparse and stable Hough image, from which instances and poses can be read off directly without dedicated steps of non-maxima suppression and geometric verification. In experiments on the extensive Grocery Products dataset, our method significantly outperforms the state-of-the-art at near-real-time overall cost.
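The voting pipeline described above — one analytically derived hypothesis set per feature correspondence, accumulated in a quantized (class, pose) space whose peaks are read off directly — can be illustrated with a minimal sketch. This is a toy simplification for exposition only: the function names are hypothetical, the pose space is collapsed to a single bin index rather than the paper's 6-DoF feature-aligning affine representation, and hypotheses are given as precomputed inputs.

```python
import numpy as np

def hough_vote(matches, n_classes, n_pose_bins):
    """Accumulate one vote per feature correspondence.

    matches: list of (class_id, pose_bin) hypotheses, one per matched
    feature pair (the paper derives these analytically per match;
    here they are simply supplied as input).
    """
    accumulator = np.zeros((n_classes, n_pose_bins), dtype=np.int32)
    for class_id, pose_bin in matches:
        accumulator[class_id, pose_bin] += 1
    return accumulator

def read_off_peaks(accumulator, min_votes):
    """Return (class_id, pose_bin) cells with at least min_votes support.

    With a sparse, low-redundancy accumulator, peaks can be read off
    directly, without a non-maxima-suppression pass.
    """
    peaks = np.argwhere(accumulator >= min_votes)
    return [tuple(int(v) for v in p) for p in peaks]

# Toy example: three correspondences agree on (class 2, pose bin 5),
# while one spurious match votes elsewhere and falls below threshold.
matches = [(2, 5), (2, 5), (2, 5), (0, 1)]
acc = hough_vote(matches, n_classes=4, n_pose_bins=8)
print(read_off_peaks(acc, min_votes=3))  # [(2, 5)]
```

The point of the sketch is the control flow: because each correspondence contributes a small, precise hypothesis set, the accumulator stays sparse enough that thresholding alone recovers the instances.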
