Abstract

This paper explores ways of combining vision and touch for the purpose of object recognition. In particular, it focuses on scenarios in which there are few tactile training samples (as these are usually costly to obtain) and in which vision is artificially impaired. Whilst machine vision is a widely studied field, and machine touch has received some attention recently, the fusion of both modalities remains a relatively unexplored area. It has been suggested that, in the human brain, there exist shared multi-sensorial representations of objects; these provide robustness when one or more senses are absent or unreliable. Modern robotic systems can likewise benefit from multi-sensorial input, particularly in contexts where one or more of the sensors perform poorly. In this paper, a recently proposed tactile recognition model was extended by integrating a simple vision system in three different ways: vector concatenation (of the vision and tactile feature vectors), averaging of the object-label posteriors, and taking the product of the object-label posteriors. The approaches are compared in terms of overall recognition accuracy and in terms of how quickly learning occurs, measured by the number of training samples required. The conclusions reached are: (1) the most accurate system is the "posterior product"; (2) multi-modal recognition achieves higher accuracy than either modality alone when all visual and tactile training data are pooled together; and (3) under visual impairment, multi-modal recognition "learns faster", i.e. it requires fewer training samples to reach the same accuracy as either modality alone.
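
For concreteness, the sketch below illustrates the three fusion strategies named above. It is a minimal illustration only: the NumPy-based functions, the equal weighting in the posterior average, and the toy class posteriors are assumptions made for this sketch, not the authors' implementation.

    import numpy as np

    def fuse_concatenate(vision_feat, tactile_feat):
        # Early fusion: a single classifier is trained on the joined
        # visual and tactile feature vectors.
        return np.concatenate([vision_feat, tactile_feat])

    def fuse_posterior_average(p_vision, p_tactile):
        # Late fusion: average the per-class posteriors of the two
        # unimodal classifiers (equal weights assumed here).
        p = 0.5 * (p_vision + p_tactile)
        return p / p.sum()

    def fuse_posterior_product(p_vision, p_tactile, eps=1e-12):
        # Late fusion: multiply the per-class posteriors and renormalise,
        # which treats the modalities as conditionally independent given
        # the object label.
        p = (p_vision + eps) * (p_tactile + eps)
        return p / p.sum()

    # Toy example: posteriors over four object labels from each modality.
    p_v = np.array([0.60, 0.20, 0.15, 0.05])   # vision classifier
    p_t = np.array([0.30, 0.50, 0.10, 0.10])   # tactile classifier
    print(fuse_posterior_average(p_v, p_t))
    print(fuse_posterior_product(p_v, p_t))    # typically sharper than either alone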

Highlights

  • It seems evident that the presence of multiple sensors, capable of capturing complementary information about the environment, is a desirable feature of modern robots [11, 18]

  • This paper focuses on single-touch (non-grasping) object recognition

  • A system was proposed for the purpose of visuo-tactile object recognition, by extending a recent tactile recognition model [7] and integrating it with a simple visual model


Summary

Background

It seems evident that the presence of multiple sensors, capable of capturing complementary information about the environment, is a desirable feature of modern robots [11, 18]. In the field of machine vision, object recognition is now so well understood that, in some cases, artificial systems have surpassed human accuracy [13]. Tactile recognition studies that rely on 3D object representations, such as point clouds or voxel spaces, reach accuracies of 80% in some cases for 45 objects and only 10 touches, but they require 3D models of the objects in advance. This paper focuses on a narrower scope: single-touch (non-grasping) object recognition. Pezzementi et al. [30] apply a predefined exploration routine with a single finger contact to learn object models based on histograms of features (being the closest in data-collection methodology to the work presented in this paper). It was shown that single-touch object recognition is possible even with a low-resolution sensor [7]. That model is extended here to account for visual information, comparing three different approaches to such multi-modal integration.
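
A histogram-of-features tactile classifier of the kind mentioned above can be pictured with the generic bag-of-features sketch below. The codebook size, the k-means quantiser, the nearest-neighbour classifier and the use of scikit-learn are illustrative assumptions; this is not the pipeline of Pezzementi et al. [30] nor the tactile model of [7].

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import KNeighborsClassifier

    def build_codebook(patches, n_words=32, seed=0):
        # Quantise local tactile patches (rows = flattened taxel windows)
        # into a small codebook of "tactile words".
        return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(patches)

    def touch_histogram(patches, codebook):
        # Represent one touch (a set of local patches) as a normalised
        # histogram of codebook-word occurrences.
        words = codebook.predict(patches)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)

    # Toy usage with random data standing in for real tactile readings.
    rng = np.random.default_rng(0)
    all_patches = rng.normal(size=(500, 16))                  # 500 patches, 16 taxels each
    codebook = build_codebook(all_patches)

    touches = [rng.normal(size=(20, 16)) for _ in range(10)]  # 10 single touches
    labels = [i % 2 for i in range(10)]                       # two object classes
    X = np.stack([touch_histogram(t, codebook) for t in touches])
    clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
    print(clf.predict(X[:2]))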

