Abstract

A key challenge in object manipulation using prosthetic hands is grasp detection and pose estimation, especially in cluttered scenes. Vision-based robotic grasping solutions typically use only conventional frame-based video cameras, whose output has high spatiotemporal redundancy and is therefore costly to process on mobile platforms with low processing power, such as prostheses. Event-based dynamic vision sensors (DVS), on the other hand, have low spatiotemporal redundancy, but their low resolution results in poor object segmentation and detection performance. In this paper, we outline a novel hybrid solution inspired by the two-streams hypothesis of the neural processing of vision, using both a frame-based video camera and a DVS to counter the pitfalls of each system. Computationally efficient object detection on the frame-based camera highlights regions of interest (ROIs) for the DVS, and we then perform pose estimation by computing the smallest axis of the DVS events generated in each ROI. The proposed approach allows us to rapidly determine the required wrist rotation and a suitable grasp type for picking up objects with a prosthetic hand. Results on a laptop show that our method matches the accuracy of a conventional solution that employs only a frame-based video camera, while achieving 77.29% faster inference.
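
The pose estimation step can be illustrated with a minimal sketch. The abstract does not specify how the smallest axis is computed, so the sketch below assumes one plausible reading: the minor principal axis of the event coordinates inside the ROI, obtained from an eigendecomposition of their covariance matrix. The function name `minor_axis_angle`, the `(x_min, y_min, x_max, y_max)` ROI format, and the degree convention are hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np


def minor_axis_angle(events_xy, roi):
    """Estimate the orientation of the smallest (minor) axis of DVS events in an ROI.

    events_xy: array-like of shape (N, 2) holding event pixel coordinates (x, y).
    roi: (x_min, y_min, x_max, y_max), an assumed bounding-box format.
    Returns (angle_deg, eigenvalues), with the angle in [0, 180) measured
    from the image x-axis and eigenvalues in ascending order.
    """
    x_min, y_min, x_max, y_max = roi
    pts = np.asarray(events_xy, dtype=float)

    # Keep only the events that fall inside the ROI from the frame-based detector.
    inside = (
        (pts[:, 0] >= x_min) & (pts[:, 0] <= x_max)
        & (pts[:, 1] >= y_min) & (pts[:, 1] <= y_max)
    )
    pts = pts[inside]
    if len(pts) < 2:
        raise ValueError("need at least two events inside the ROI")

    # Covariance of the event coordinates (assumption: PCA-style axis estimate).
    cov = np.cov(pts, rowvar=False)

    # eigh returns eigenvalues in ascending order, so column 0 is the minor axis.
    eigvals, eigvecs = np.linalg.eigh(cov)
    minor = eigvecs[:, 0]

    # An axis has no sign, so fold the direction into [0, 180) degrees.
    angle_deg = float(np.degrees(np.arctan2(minor[1], minor[0])) % 180.0)
    return angle_deg, eigvals
```

Under this reading, an object elongated along the image x-axis yields a minor axis near 90 degrees, which a controller could then map to a wrist rotation. That mapping, and any grasp-type selection, is outside the scope of this sketch.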
