Abstract

This survey presents a complete landscape of joint object detection and pose estimation methods that use monocular vision. We describe traditional approaches, which pair a descriptor or model with an estimation method. The descriptors and models include chordiograms, shape-aware deformable parts models, bag of boundaries, distance transform templates, natural 3D markers, and facet features; the estimation methods include iterative clustering estimation, probabilistic networks, and iterative genetic matching. We outline hybrid approaches that use handcrafted feature extraction followed by estimation with deep learning methods. We investigate and compare, wherever possible, pure deep learning approaches to this problem, both single-stage and multi-stage. We detail the accuracy measures and metrics in use and, to give a clear overview, discuss the characteristics of the relevant datasets. We also highlight the trends that have prevailed from the infancy of this problem until now.

Highlights

  • Object detection is the process of finding instances of real-world objects in images or videos

  • We look at different types of approaches for joint object detection and pose estimation

  • Expectation Maximization (EM) finds the maximum likelihood parameters in statistical models. Iterative clustering estimation (ICE) is conceptually similar: it is the manifestation of EM that considers local features that map to an object instance and evaluates pose hypotheses.
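
To make the EM idea in the last highlight concrete, here is a minimal sketch of EM fitting a two-component 1D Gaussian mixture. It is a generic textbook illustration, not ICE and not any method from the survey; the function names, the initialisation, and the toy data are assumptions made for this example only.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    # Assumed initialisation: means at the extremes, shared variance, equal weights.
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])

        # M-step: re-estimate the parameters that maximise the expected likelihood.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component

    return weights, means, variances


# Toy data: two well-separated groups centred near 1 and near 5.
data = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
weights, means, variances = em_two_gaussians(data)
```

On this toy data the two estimated means settle close to the two group centres. In a pose estimation setting the hidden assignments would instead link local features to object instances, as the highlight describes for ICE.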


Summary

Introduction

Object detection is the process of finding instances of real-world objects in images or videos. Object detection algorithms typically use extracted features and learning algorithms to recognize instances of an object category. The position and orientation of an object are together called its pose, and determining the pose of an object in an image is called pose estimation. A common task across applications of pose estimation is to determine the position and orientation of each object in the scene with respect to some coordinate system. Many works address object detection and pose estimation separately. This work aims to cover the works that treat the two problems simultaneously.
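
As a small illustration of what a pose is, the sketch below represents it as a rotation matrix plus a translation vector and uses it to map a point from the object's own frame into a reference frame such as the camera's. The helper names and the numbers are assumptions chosen for the example, and the rotation is restricted to a single axis for brevity; a full 3D pose allows rotation about all three axes.

```python
import math


def rotation_z(yaw):
    """3x3 rotation matrix for a rotation of `yaw` radians about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_pose(rotation, translation, point):
    """Map a point from the object frame to the reference frame: R @ p + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


# Hypothetical pose: object rotated 90 degrees about z and shifted by (1, 2, 3).
R = rotation_z(math.pi / 2)
t = [1.0, 2.0, 3.0]

# A point one unit along the object's own x axis.
point_in_object = [1.0, 0.0, 0.0]
point_in_reference = apply_pose(R, t, point_in_object)
```

Here the point lands at approximately (1, 3, 3) in the reference frame. Pose estimation is the inverse problem: recovering the rotation and translation from image evidence.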

