Abstract

Pose estimation of recognized objects is fundamental to tasks such as robotic grasping and manipulation. The need for reliable grasping imposes stringent accuracy requirements on pose estimation in cluttered, occluded scenes in dynamic environments. Modern methods use large sets of training data to learn features and object templates in order to find correspondences between models and observed data. However, these methods require extensive annotation of ground-truth poses. An alternative is to use algorithms, such as PERCH (PErception via SeaRCH), that seek an optimal explanation of the observed scene in a space of possible rendered versions. While PERCH offers strong guarantees on accuracy, its initial formulation suffers from poor scalability owing to its high runtime. In this work, we present PERCH 2.0, a deliberative approach that takes advantage of GPU acceleration and RGB data by formulating pose estimation as a single-shot, fully parallel search. We show that PERCH 2.0 achieves a speedup of two orders of magnitude (∼100X) over hierarchical PERCH by evaluating thousands of candidate poses in parallel. In addition, we propose a combined deliberative and discriminative framework for 6-DoF pose estimation that does not require any ground-truth pose annotations. On the YCB-Video dataset, PERCH 2.0 achieves higher accuracy than DenseFusion, a state-of-the-art, end-to-end, learning-based approach. We also demonstrate that our work directly extends deliberative pose estimation methods such as PERCH to new domains, such as conveyor picking, which were previously infeasible due to high runtime. Our code is available at https://sbpl-cruz.github.io/perception/
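To make the single-shot, fully parallel idea concrete, the sketch below scores a batch of rendered pose hypotheses against an observed depth image in one GPU pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the PyTorch framing, the function and variable names, and the clamped per-pixel depth-discrepancy cost are all simplifications introduced here for exposition.

```python
# Minimal sketch of parallel render-and-score pose evaluation.
# Assumption: some batched renderer has already produced one depth
# image per candidate pose; only the scoring step is shown.
import torch

def score_poses(observed_depth: torch.Tensor,
                rendered_depths: torch.Tensor,
                max_cost: float = 0.02) -> torch.Tensor:
    """Score N candidate poses at once.

    observed_depth:  (H, W) depth image of the scene.
    rendered_depths: (N, H, W) renderings, one per candidate pose.
    Returns an (N,) cost per pose; lower means a better explanation.
    """
    # Compare only pixels where the hypothesis renders the object.
    valid = rendered_depths > 0                                # (N, H, W)
    diff = (rendered_depths - observed_depth.unsqueeze(0)).abs()
    # Clamp the per-pixel cost so occluded regions do not dominate.
    per_pixel = torch.where(valid, diff.clamp(max=max_cost),
                            torch.zeros_like(diff))
    # Normalize by each hypothesis's rendered footprint.
    return per_pixel.sum(dim=(1, 2)) / valid.sum(dim=(1, 2)).clamp(min=1)

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    observed = torch.rand(480, 640, device=device)             # stand-in scene
    candidates = torch.rand(4096, 480, 640, device=device)     # stand-in renders
    costs = score_poses(observed, candidates)
    best_pose_idx = costs.argmin().item()                      # best hypothesis
```

Because every hypothesis is scored by independent per-pixel operations, thousands of candidates batch naturally onto a GPU, which is the source of the speedup over evaluating hypotheses sequentially through a hierarchical search.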
