Abstract

Robots that can interact with and serve humans, especially in domestic environments, are expected to spread widely in the near future. Such domestic service robots require a fundamental capability called mobile manipulation, and many humanoid robots have therefore been developed with this ability (1–5). Recently, competitions such as RoboCup@Home (6), the Mobile Manipulation Challenge (7), and the Semantic Robot Vision Challenge (8) have been proposed to evaluate such robots. Because these tasks are carried out by domestic service robots, natural interaction such as spoken instruction should be used for mobile manipulation. Here, we focus on mobile manipulation driven by natural speech instructions such as “Bring me X”, where X is an out-of-vocabulary (OOV) word. Realizing this task requires the integration of navigation, manipulation, speech recognition, and image recognition.
Image and speech recognition are difficult, especially when novel objects are involved. For example, each home has its own specific objects, and new products can be brought into the home. It is impossible to register the names and images of all these objects with the robot in advance. Hence, we propose a method for learning novel objects with a simple procedure. The robot on which the proposed learning method is implemented is intended for use in a private domestic environment, so the procedure for teaching objects to the robot must be simple. For example, the user says, “This object is X” (X is the name of the object) and shows the object to the robot (Fig. 1: Left). With this procedure, a user can easily teach the robot many objects. The user then orders the robot to bring something, for example by saying “Bring me X” (Fig. 1: Right). As mentioned earlier, such extended manipulation tasks are necessary for domestic service robots.
However, there are three problems in teaching novel objects to a robot. The first problem is speech recognition of an object’s name. In conventional methods, the phoneme sequences of names must be registered in an internal dictionary, but it is impossible to register all objects in advance. The second problem is speech synthesis. To interact with humans, a robot must utter the name of the recognized object, as in “Is it X?” However, conventional robot utterance systems cannot utter a word that is not registered in the dictionary. Even if the phoneme sequence of an OOV word can be recognized,
