Abstract

The Oxford English Dictionary defines a robot as "a machine capable of carrying out a complex series of actions automatically [...]". So the key point of a robot is that it can act. However, to be able to act, a robot must first perceive: What is around me? Where am I? And where are the things I want to act on? Since the early days of robotics, the prime perception channel for answering these questions has been vision: robot vision. Nowadays this is even more true than before, since RGB-D cameras provide direct geometric perception. Remarkably, these devices provide "vision" in two senses. Seen from the outside, they extend classical RGB images, in which geometry must be inferred indirectly, with a direct sense of the depth of the scene geometry. Seen from the inside, they are themselves vision sensors, based on a conventional camera and structured light.

Since the objective in robotics is to make robots act, two further points related (at least to some degree) to robot vision are worth stating here. First, the final output is usually metric 3D information, which is needed as input for robot motion. Second, since motion involves time, robot perception and robot vision must be accomplished in real time, which is the theme of the JRTIP journal.

This special issue on Robot Vision reports on recent progress in using real-time image processing to address the above three questions of robotic perception. The call for papers for this special issue received a total of 26 manuscripts. Based on thorough reviews by three reviewers per manuscript, seven high-quality papers were selected for inclusion in this issue; they are briefly introduced below.

The first two papers aim at perceiving what is around a robot. In their paper 'Dense real-time mapping of object-class semantics from RGB-D', Stückler et al. reconstruct a 3D environment model from a moving RGB-D camera and simultaneously label parts of this model with semantic categories such as ground, furniture, palette, or human. Kriegel et al. focus on specific unknown objects in the environment in their paper 'Efficient next-best-scan planning for autonomous 3D surface reconstruction of unknown objects', a good example of how a robot's ability to act is itself used for perception.

The question 'Where am I?' is addressed by Asadi et al. in their paper 'Delayed fusion for real-time vision-aided inertial navigation'. The paper discusses the fact that, by the time an image has been processed by computer vision, a moving robot has already travelled beyond the pose shown in that image, and how this delay is incorporated into sensor-fusion algorithms.

Three papers aim at finding objects around a robot and determining their pose, thereby answering the third question. Orts-Escolano et al. concentrate on the performance of the first feature-extraction stages and speed them up using GPGPU processing in their paper 'Real-time 3D semi-local surface patch extraction using GPGPU'. Wang et al. propose a novel global object descriptor that combines color and shape in their paper 'Textured/textureless object recognition and pose estimation using RGB-D image'. Finally, in their paper 'Advances in real-time object tracking—extensions for robust object tracking with a Monte Carlo particle filter', Mörwald et al. emphasize the tracking view on object pose estimation and propose how the tracker can improve the reliability and accuracy of the determined pose.

U. Frese
University of Bremen, Enrique-Schmidt-Str. 5, 28215 Bremen, Germany
e-mail: ufrese@informatik.uni-bremen.de

