Forming Object Concept Using Bayesian Network

Tomoaki Nakamura,Takayuki Nagai

doi:10.5772/10075

Abstract

As the recent developments in humanoid robotics, there is growing interest in object recognition and learning, since they are essential tasks for robots to work in our surrounding environments. Most frameworks for recognition and learning are based only on visual features. It seems that those are insufficient for ’understanding’ of objects, since each object has its own intended use leading to the function, which is the key to object concept (Landau et al., 1998; Stark et al., 1996). Of course, appearance is deeply related with functions, since many objects have certain forms resulting from their functions. This fact is especially-pronounced in hand tools. Thus the visual learning and recognition of hand tools may succeed to some extent. However, such classification does not give any information on their functions. The important point is not classification in its own right but rather inference of the function through the classification. We believe that must be the basis of ’understanding’, which we call object concept. Therefore objects must be learned, i.e. categorized, and recognized through their functions. In this chapter objects (hand tools) are modeled as the relationship between appearance and functions. The proposed approach uses the model, which relates appearance and functions, for learning and recognizing objects. The appearance is defined as a visual feature of the object, while the function is defined as certain changes in work objects caused by a tool. Each function is represented by a feature vector which quantifies the changes in the work object. Then the function is abstracted from these feature vectors using the Bayesian learning approach (Attias, 1999). All information can be obtained by observing the scene, in which a man uses the hand tool. For the model of object concept, Bayesian Network is utilized. The conditional probability tables, which are parameters of the model, are estimated by applying EM algorithm to the observed visual features and function information. This process can be seen as the learning of objects based on their functions. Since the function and appearance are stochastically connected in the model, inference of unseen object’s function is possible as well as recognizing its category. Related works are roughly classified into three categories. One of these is an attempt to recognize objects through their functions (Rivlin et al., 1995; Stark et al., 1996; Woods et al., 1995). Although those works share the same idea with us, the authors do not consider the learning process of object function. Thus the function of each object must be defined and programmed manually. Secondly, unsupervised visual categorization of objects has been studied extensively (Fergus et al., 2003; Sivic et al., 2005). However, function is not taken into consideration. Thirdly, there has been research on object recognition through human action (Kojima et al., 2004). The authors relate object recognition with human action, which represents how to use it, rather than the object function itself. In (Ogura et al., 2005), authors have reported the 6

Full Text