Abstract

Large-scale Real-time Object Identification Based on Analytic Features

Stephan Hasler1*, Heiko Wersing1 and Edgar Korner1
1 Honda Research Institute Europe GmbH, Germany

Inspired by the findings that columns in inferotemporal cortex respond to complex visual features generalizing over retinal position and scale (Tanaka, Ann. Rev. of Neurosc., 1996) and that objects are then represented by the combined activation of such columns (Tsunoda et al., Nature Neurosc., 2001), we previously developed a framework to select a set of analytic SIFT descriptors (Hasler et al., Proc. of ICANN, 2007) that is dedicated to 3D object recognition. In this work we embed this representation in an online system that is able to robustly identify a large number of pre-trained objects. In contrast to related work, we do not restrict the objects' pose to characteristic views but rotate them freely in hand in front of a cluttered background.

To tackle this unconstrained setting we use the following processing steps: Stereo images are acquired with cameras mounted on a pan-tilt unit. Disparity is used to select and track a region of interest based on closest proximity. To remove background clutter we learn a foreground mask, using depth information as the initial hypothesis (Denecke et al., Neurocomputing, 2009). Then analytic shape features and additional color features are extracted. Finally, the identification is performed by a simple classifier. To our knowledge, this is the first system that can robustly identify 126 hand-held objects in real-time.

The used type of representation differs strongly from the standard SIFT framework proposed by Lowe (Int. J. of Comp. Vision, 2004). First, we extract SIFT descriptors at each foreground position in the attended image region. Thus, parts can be found to be analytic that would not have passed the usual keypoint criteria. Second, we do not store the constellation of object parts but keep only the maximum response per feature. This results in a simple combinatorial object representation in accordance with biology, but it depends on a good figure-ground segregation. Third, we match the local descriptors against an alphabet of visual features. This alphabet is rather small (usually several hundred features) and the result of a supervised selection strategy favoring object-specific parts that can be invariantly detected in several object poses. The selection method is dynamic in that it selects more features for objects with stronger variations in appearance. We draw a direct comparison to the SIFT framework using the COIL100 database as a toy problem.

Despite the quite simple object representation, our system shows a very high performance in distinguishing the 126 objects in the realistic online setting. We underline this with tests on an offline database acquired under the same conditions. With a nearest-neighbor classifier (NNC) we obtain an error rate of 25 percent using analytic features only. When adding an RGB histogram as a complementary feature channel, this error rate drops to 15 percent for the NNC and to 10.35 percent using a single-layer perceptron. Considering the high difficulty of the database, with a baseline NNC error rate of 85 percent on the gray-scale images compared to 10 percent for the COIL100, these results mark a major step towards invariant identification of 3D objects.

Conference: Bernstein Conference on Computational Neuroscience, Frankfurt am Main, Germany, 30 Sep - 2 Oct, 2009.
Presentation Type: Poster Presentation
Topic: Abstracts
Citation: Hasler S, Wersing H and Korner E (2009). Large-scale Real-time Object Identification Based on Analytic Features. Front. Comput. Neurosci. Conference Abstract: Bernstein Conference on Computational Neuroscience. doi: 10.3389/conf.neuro.10.2009.14.010
Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters. The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, is published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers' terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received: 25 Aug 2009; Published Online: 25 Aug 2009.
* Correspondence: Stephan Hasler, Honda Research Institute Europe GmbH, Offenbach, Germany, stephan.hasler@honda-ri.de
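The core of the representation described in the abstract can be captured in a few lines of NumPy. The fragment below is only a sketch under assumptions, not the authors' implementation: descriptor extraction, disparity-based segmentation, and the supervised alphabet selection are assumed to happen elsewhere, and all names (object_code, identify, alphabet, rgb_hist) and array shapes are placeholders chosen for illustration.

```python
import numpy as np

def object_code(descriptors, alphabet, rgb_hist):
    """Illustrative sketch: combine analytic shape responses with a color histogram.

    descriptors : (P, D) SIFT-like descriptors, one per foreground position
    alphabet    : (F, D) selected analytic features (the visual alphabet)
    rgb_hist    : (B,)   RGB histogram of the segmented foreground
    """
    responses = descriptors @ alphabet.T   # (P, F) similarity of every part to every feature
    shape_code = responses.max(axis=0)     # keep only the maximum response per feature
    return np.concatenate([shape_code, rgb_hist])

def identify(query_code, train_codes, train_labels):
    """Nearest-neighbor classification over the combined feature vectors."""
    dists = np.linalg.norm(train_codes - query_code, axis=1)
    return train_labels[int(np.argmin(dists))]
```

In this reading, the only learned ingredients are the alphabet (selected in a supervised way to favor object-specific, pose-invariant parts) and the stored training codes; the classifier itself stays deliberately simple.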

Highlights

  • The recognition of objects under real-world conditions is a difficult problem

  • Recognition error rates are reported as a function of the number of objects used

  • This should help to predict the scalability of the approach towards a larger number of objects (see the sketch below)
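One way such a scalability curve could be computed is sketched below. This is not the authors' evaluation code; it assumes that every image has already been reduced to a combined feature vector (codes) with a class label (labels), and it reports the leave-one-out nearest-neighbor error for random subsets of the object classes.

```python
import numpy as np

def error_vs_num_objects(codes, labels, object_counts, seed=0):
    """Illustrative only: leave-one-out NNC error for random subsets of the object classes."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    curve = {}
    for n in object_counts:
        chosen = rng.choice(classes, size=n, replace=False)
        mask = np.isin(labels, chosen)
        X, y = codes[mask], labels[mask]
        errors = 0
        for i in range(len(X)):                  # leave-one-out over all views
            d = np.linalg.norm(X - X[i], axis=1)
            d[i] = np.inf                        # exclude the query itself
            errors += int(y[np.argmin(d)] != y[i])
        curve[n] = errors / len(X)
    return curve

# Hypothetical usage, e.g. for subsets of the 126-object database:
# print(error_vs_num_objects(codes, labels, object_counts=[10, 50, 126]))
```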


Summary

Introduction

The recognition of objects under real-world conditions is a difficult problem. Because of this, most approaches limit the complexity by using only a few objects, by restricting the pose to canonical views, or by providing controlled background conditions. Other parts-based approaches, like the one we use here, leave out spatial information by determining only the maximum response of an alphabet of features to an image [4, 5]. These approaches implicitly assume that only a single object is in view, so that no binding is necessary. To balance this more general type of representation, the parts themselves have to be more specific and meaningful. Related approaches typically use a rather large alphabet of features that is trained in an unsupervised fashion, and they additionally represent spatial relations. This more complex and slower processing is not reflected in a gain in performance, as we report a similar performance for an even higher number of objects.
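The NumPy fragment below is a minimal sketch of this kind of representation, with random placeholders for the alphabet and the per-position descriptors (both shapes are assumptions). It makes the point in the text explicit: the resulting code records only the strongest response of each alphabet feature, so the spatial constellation of the parts plays no role and no binding is required.

```python
import numpy as np

rng = np.random.default_rng(0)
alphabet = rng.standard_normal((200, 128))   # hypothetical alphabet of 200 feature vectors
parts = rng.standard_normal((50, 128))       # descriptors from 50 foreground positions

code = (parts @ alphabet.T).max(axis=0)      # maximum response per alphabet feature

# Shuffling the per-position descriptors leaves the code unchanged:
# the representation encodes which parts respond, not where they are.
shuffled = (parts[rng.permutation(len(parts))] @ alphabet.T).max(axis=0)
assert np.allclose(code, shuffled)
```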

System
Results
Conclusion
