Abstract
The stochastic multi-armed bandit problem is a standard model for the exploration–exploitation trade-off in sequential decision problems. In clinical trials, which are sensitive to outlier data, the goal is to learn a risk-averse policy that balances exploration, exploitation, and safety. In this paper, we present a risk-averse multi-armed bandit algorithm for a decision-making problem based on the social engagement behaviors of children with Autism Spectrum Disorder (ASD). The algorithm is applied while children interact with a humanoid robot and imitate a sequence of the robot's movements. The proposed algorithm builds on the Best Empirical Sampled Average algorithm, with Entropic Value-at-Risk as the risk measure, to select the sequence of movements that best improves the social engagement behaviors of children with ASD while they imitate the robot's movements. We provide a detailed experimental analysis comparing the performance of our proposed algorithm with several well-known risk-averse multi-armed bandit algorithms on synthetic scenarios and on our real-world problem. The experimental results show that the proposed algorithm outperforms its competitors in terms of robustness, risk avoidance, and cumulative regret, promoting the social engagement behaviors of children with ASD when imitating a robot's movements.
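For context, Entropic Value-at-Risk is conventionally defined through the moment-generating function of the loss; the following is the standard textbook definition, not a formula reproduced from this paper. For a loss random variable $X$ with finite exponential moments and confidence level $1-\alpha$,
$$
\mathrm{EVaR}_{1-\alpha}(X) \;=\; \inf_{z > 0} \left\{ \frac{1}{z} \ln\!\left( \frac{\mathbb{E}\!\left[e^{zX}\right]}{\alpha} \right) \right\},
$$
which upper-bounds both Value-at-Risk and Conditional Value-at-Risk at the same confidence level, making it a comparatively conservative choice of risk measure for safety-sensitive arm selection.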