Abstract
For a domestic personal robot, personalized services are as important as predesigned tasks, because the robot needs to adjust the home state based on the operator's habits. An operator's habits are composed of cues, behaviors, and rewards. This article introduces behavioral footprints to describe the operator's behaviors in a house, and applies inverse reinforcement learning to extract the operator's habits, represented by a reward function. We implemented the proposed approach with a mobile robot on indoor temperature adjustment, and compared it with a baseline method that recorded all the cues and behaviors of the operator. The results show that the proposed approach allows the robot to reveal the operator's habits accurately and adjust the environment state accordingly.
Highlights
A personal robot is designed to provide standard services in different scenarios
The operator's behaviors are described by the environment state changes they cause, namely the behavioral footprints
To evaluate the learned reward functions, two indexes are adopted: the accuracy of the reward function, rA, computed by comparing the robot's evaluation of the agreeability of home states with the true values provided by the operator, and the accuracy of robot actions, rD, indicated by the ratio of disagreement on actions between the robot and the operator
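The two indexes can be illustrated with a minimal sketch. The exact formulas are not given in this text, so the code below assumes that agreeability is a binary label per home state, that rA is the fraction of states on which the robot and the operator agree, and that rD is the fraction of states on which their chosen actions differ. The function names and the sample data are hypothetical.

```python
from typing import Hashable, Sequence


def reward_accuracy(robot_labels: Sequence[bool],
                    operator_labels: Sequence[bool]) -> float:
    """Assumed form of rA: share of home states where the robot's
    agreeability judgment matches the operator's true value."""
    if not robot_labels or len(robot_labels) != len(operator_labels):
        raise ValueError("label sequences must be non-empty and equal in length")
    matches = sum(r == o for r, o in zip(robot_labels, operator_labels))
    return matches / len(robot_labels)


def action_disagreement(robot_actions: Sequence[Hashable],
                        operator_actions: Sequence[Hashable]) -> float:
    """Assumed form of rD: share of home states where the robot's
    action differs from the operator's action."""
    if not robot_actions or len(robot_actions) != len(operator_actions):
        raise ValueError("action sequences must be non-empty and equal in length")
    differing = sum(r != o for r, o in zip(robot_actions, operator_actions))
    return differing / len(robot_actions)


if __name__ == "__main__":
    # Hypothetical data for four home states (not results from the paper).
    r_a = reward_accuracy([True, False, True, True],
                          [True, True, True, False])
    r_d = action_disagreement(["heat", "cool", "hold", "hold"],
                              ["heat", "hold", "hold", "hold"])
    print(f"rA = {r_a:.2f}, rD = {r_d:.2f}")
```

Under these assumptions, a higher rA and a lower rD both indicate that the learned reward function tracks the operator's habits more closely.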
Summary
A personal robot is designed to provide standard services in different scenarios. By incorporating a door recognition and manipulation algorithm, the robot can open various kinds of doors in different houses in exactly the same way. This strategy, combined with commands from the operator, allows the robot to complete each task consistently in different environments. However, the robot may need to open a door to different extents, as some operators like it fully open, while others prefer it half open. This kind of state adjustment, if designed offline, requires a considerable amount of manual work. The robot must therefore be personalized by having it learn the operator's habits, so that it can adjust to the habits of each operator.