Abstract

Reinforcement learning offers an account of how people represent and learn large-scale environments for spatial navigation. However, it is unclear how “reachable” spaces are represented during manual tasks such as chopping vegetables. To investigate the mechanisms underlying the learning of reachable space, we drew inspiration from a recent study (de Cothi et al., 2022) to develop a haptic maze task in which human participants reached to a target while avoiding invisible haptic obstacles in the environment. Participants reached with a robotic handle, which generated contact forces to simulate the maze boundary, the obstacle walls, and the floor that supported the hand. Participants could not see their hand position, while the target and maze boundary remained visible. Each experiment variant comprised two conditions: maze obstacles were invisible in the first condition and visible in the second. Audio feedback was given when the target was found. We tested participants in 25 unique mazes, with 10 trials in each maze. Two experimental procedures were used: one with a fixed target and varying starting locations (18 participants), and one with a fixed starting location and varying targets (10 participants). We simulated three reinforcement learning models and compared their likelihoods: model-based (MB), model-free (MF), and successor representation (SR). MB-simulated agents learned a map of the maze and planned the shortest route to the target. MF-simulated agents cached the expected total future reward for each action, updated it from experience, and pursued the highest-reward actions. SR-simulated agents generated a “cognitive” (predictive) map between grid locations (states) and integrated it with the target to plan actions. Results suggested that humans use a combination of MB and MF learning to learn reachable spaces. MB agents had higher likelihoods (better represented human behaviour) than MF agents for earlier trials, but MF agents had higher likelihoods for later trials. Meanwhile, SR agents were significantly worse at predicting participant behaviour.
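To make the three agent types concrete, the sketch below shows one possible core computation for each on a toy grid. It is an illustration under stated assumptions, not the study's implementation: the 3x3 maze, the discount factor `GAMMA`, the learning rate `alpha`, the uniform random policy used for the SR, and all function names are hypothetical. Breadth-first search stands in for MB shortest-route planning, a single Q-learning update stands in for MF caching, and the closed-form SR matrix M = (I - γT)^-1 is combined with a one-hot target reward to obtain state values.

```python
import numpy as np
from collections import deque

# Hypothetical 3x3 maze; 1 marks an obstacle cell (not one of the study's mazes).
GRID = np.array([[0, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 0.9  # assumed discount factor

def step(state, action):
    """Deterministic move; the agent stays put if blocked by a wall or boundary."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < GRID.shape[0] and 0 <= nc < GRID.shape[1] and GRID[nr, nc] == 0:
        return (nr, nc)
    return state

def mb_shortest_path_length(start, target):
    """Model-based sketch: breadth-first search over a known map."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        state, dist = queue.popleft()
        if state == target:
            return dist
        for a in range(4):
            nxt = step(state, a)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target unreachable

def mf_q_update(Q, state, action, reward, next_state, alpha=0.1):
    """Model-free sketch: one Q-learning update of a cached action value."""
    td_target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += alpha * (td_target - Q[state][action])

def sr_matrix(states):
    """SR sketch under a uniform random policy: M = (I - gamma * T)^-1."""
    idx = {s: i for i, s in enumerate(states)}
    T = np.zeros((len(states), len(states)))
    for s in states:
        for a in range(4):
            T[idx[s], idx[step(s, a)]] += 0.25
    M = np.linalg.inv(np.eye(len(states)) - GAMMA * T)
    return M, idx

# Usage on the toy maze: start in one corner, target in the opposite corner.
states = [tuple(map(int, rc)) for rc in np.argwhere(GRID == 0)]
start, target = (0, 0), (2, 2)

path_len = mb_shortest_path_length(start, target)

Q = {s: [0.0] * 4 for s in states}
mf_q_update(Q, (1, 2), 1, 1.0, target)  # moving down from (1, 2) reaches the target

M, idx = sr_matrix(states)
R = np.zeros(len(states))
R[idx[target]] = 1.0  # one-hot reward at the target
V = M @ R             # SR values: predictive map combined with the target
```

In this sketch the SR values `V` fall off with distance from the target, so an agent that greedily moves toward higher-valued neighbouring states would approach the target without running an explicit search.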
