Abstract
Actions are fundamental for perception, leading to law-like relations that can be described by the theory of Sensory Motor Contingencies (SMCs) [1]. We propose that actions play a key role not only in perception, but also in the development of more complex cognitive capabilities, e.g. the definition of object concepts and action plans. Once SMCs have been learned, their mastery can readily lead to goal-oriented behavior. We want to exploit the concept of SMCs to learn object affordances [2] and use them for grasping in a real-world robot system. The robot will learn which features of an object are relevant for grasping and choose its motor actions accordingly. For this purpose, we suggest a novel architecture that combines unsupervised learning of Sigma-Pi neurons [3] with reinforcement learning (RL). Two steps are necessary for successful grasping. First, the position of the target object has to be identified, and its relation to the position of the hand used for grasping has to be determined. Multiplying the co-activations of the input units coding for hand and object position and summing over the products yields an invariant output representing the distance between the two entities; this computation can be carried out by Sigma-Pi neurons. The second step is to learn the motor actions that move the hand to the object. For this purpose, classical reinforcement learning, e.g. SARSA, is applied. A traditional approach would be to first train the Sigma-Pi layer in a self-organizing fashion [3] and then, in a second step, use the learned relations as a basis for RL; a self-organized percept is thereby adaptively paired with a suitable action. We propose an alternative method that learns both the relation between object and hand and the movement of the hand towards the object in a single-step procedure. In RL, the prediction error between value estimates of neighboring states is computed and used to modulate the learning of action weights that encode both the value function and the action strategy (Q-values). However, after a successful action the prediction error can be used not only to update the Q-values, but also to adapt the weights of the Sigma-Pi neurons in the lower layer. These neurons, each associated with an action, thereby learn the action-relevant input manifold as a law-like relation. A similar approach has been successfully applied to learn action-relevant features of stimuli [4]. We are currently working on a proof-of-concept simulation comparing the one-step approach with the traditional two-step procedure.
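For concreteness, the computations described above can be sketched in equations as follows. The notation is ours and is meant only as an illustration, not as the formulation used in [1-4]:

```latex
% Sigma-Pi unit k: sum over products of hand- and object-position inputs
s_k \;=\; \sum_{i,j} w^{k}_{ij}\, x^{\mathrm{hand}}_{i}\, x^{\mathrm{obj}}_{j}

% SARSA prediction error and Q-value update
\delta \;=\; r + \gamma\, Q(s', a') - Q(s, a), \qquad
Q(s, a) \;\leftarrow\; Q(s, a) + \alpha\, \delta

% Single-step variant: the same error also adapts the Sigma-Pi weights
% of the selected action a
\Delta w^{a}_{ij} \;\propto\; \delta\, x^{\mathrm{hand}}_{i}\, x^{\mathrm{obj}}_{j}
```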
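A minimal simulation sketch of the single-step procedure is given below. All names, problem sizes, and learning rates are assumptions made for illustration; in particular, the Q-value of an action is taken directly as the output of that action's Sigma-Pi unit, so action weights and Sigma-Pi weights coincide here, which is a simplification of the proposed architecture.

```python
# Sketch of the one-step procedure (all names, sizes, and rates are assumptions):
# a 1-D "reaching" world in which population-coded hand and object positions feed
# Sigma-Pi units, one per action, whose outputs serve as Q-values for SARSA.
# The SARSA prediction error adapts the Sigma-Pi weights directly, so relational
# (perceptual) learning and action learning happen in a single step.

import numpy as np

rng = np.random.default_rng(0)

N = 11                      # number of discrete positions (population size)
ACTIONS = (-1, +1)          # move hand left or right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def pop_code(pos):
    """One-hot population code for a position in [0, N)."""
    x = np.zeros(N)
    x[pos] = 1.0
    return x

# One Sigma-Pi weight matrix per action: w[a][i, j] multiplies
# hand unit i with object unit j; the unit output is a weighted sum of products.
w = [rng.normal(0.0, 0.01, size=(N, N)) for _ in ACTIONS]

def sigma_pi(a, hand, obj):
    """Q(s, a): sum over products of hand and object input activations."""
    return float(pop_code(hand) @ w[a] @ pop_code(obj))

def policy(hand, obj):
    """Epsilon-greedy action selection on the Sigma-Pi outputs."""
    if rng.random() < EPS:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax([sigma_pi(a, hand, obj) for a in range(len(ACTIONS))]))

for _ in range(500):
    obj = int(rng.integers(N))
    hand = int(rng.integers(N))
    a = policy(hand, obj)
    for _ in range(50):
        hand2 = int(np.clip(hand + ACTIONS[a], 0, N - 1))
        done = hand2 == obj
        r = 1.0 if done else -0.01
        a2 = policy(hand2, obj)
        # SARSA prediction error between neighboring state-action value estimates.
        target = r if done else r + GAMMA * sigma_pi(a2, hand2, obj)
        delta = target - sigma_pi(a, hand, obj)
        # Single-step learning: the same error adapts the Sigma-Pi weights of the
        # chosen action, i.e. the action-relevant hand-object relation itself.
        w[a] += ALPHA * delta * np.outer(pop_code(hand), pop_code(obj))
        if done:
            break
        hand, a = hand2, a2
```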