Abstract

3D object recognition has been a cutting-edge research topic since the popularization of depth cameras. These cameras enhance the perception of the environment and so are particularly suitable for autonomous robot navigation applications. Advanced deep learning approaches for 3D object recognition are based on complex algorithms and demand powerful hardware resources. However, autonomous robots and powered wheelchairs have limited resources, which makes it difficult to run these algorithms in real time. We propose instead to use a 3D voxel-based extension of the 2D histogram of oriented gradients (3DVHOG) as a handcrafted object descriptor for 3D object recognition, in combination with a pose normalization method for rotational invariance and a supervised object classifier. The goal is to reduce the overall complexity and the system hardware requirements, and thus enable a feasible real-time hardware implementation. This article compares the 3DVHOG object recognition rates with those of other 3D recognition approaches, using the ModelNet10 object data set as a reference. We analyze the recognition accuracy of 3DVHOG for a variety of voxel grid selections, different numbers of neurons (Nh) in the single hidden layer feedforward neural network, and feature dimensionality reduction using principal component analysis. The experimental results show that the 3DVHOG descriptor achieves a recognition accuracy of 84.91% with a total processing time of 21.4 ms. Although this accuracy is lower than that of current state-of-the-art deep learning approaches, it is close to them while enabling real-time performance.

Highlights

  • Over the last decade, object recognition through visual cameras has been a fundamental computer vision research question

  • We evaluate a 3D handcrafted object descriptor that was developed by Dupre and Argyriou[2] as an extension of the original 2D histogram of oriented gradients (HOG)[3] to support volumetric 3D data (3DVHOG)

  • We propose a pose normalization method that combines principal component analysis (PCA) pose normalization with the standard deviation of the data (PCA-STD).[38]
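
To illustrate the idea behind PCA-based pose normalization, the following is a minimal sketch, assuming the object is given as an (N, 3) array of occupied voxel or point coordinates. It covers only the plain PCA alignment step; the standard-deviation component of PCA-STD is not described in this section and is not reproduced here. The function name `pca_align` is hypothetical and does not come from the article.

```python
import numpy as np

def pca_align(points):
    """Rotate an (N, 3) point set so its principal axes lie along x, y, z.

    Illustrative only: plain PCA alignment, not the article's PCA-STD method.
    """
    # Center the object at the origin.
    centered = points - points.mean(axis=0)
    # 3x3 covariance of the coordinates (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the axis of largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]
    # Project the points onto the principal axes.
    return centered @ axes
```

After alignment, the largest spread of the object lies along the first axis regardless of its original orientation. Each axis is still ambiguous in sign (the object may appear mirrored), which is one reason plain PCA alignment is usually combined with an additional disambiguation rule.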


Summary

Introduction

Object recognition through visual cameras has been a fundamental computer vision research question. The introduction of consumer depth cameras in recent years has led to an extension of computer vision from 2D to 3D data, enabling real-world visual perception. A powered wheelchair must measure its relative distance (d) to the surrounding objects while at the same time distinguishing the caregiver from other objects (Figure 1). Although depth data processing and 3D object recognition are simple tasks for human perception, they are a huge challenge for computer vision because of limitations in 3D image data acquisition and the computational power required for real-time 3D data processing.[1] These limitations must be evaluated in advance to choose a proper depth data processing approach.
