Abstract

In any given task, there are at least two broad classes of strategies that can be used to select actions. Model-based strategies simulate a model to select actions that maximize expected performance, while model-free, policy-based strategies learn mappings between stimuli and response options that are specific to training examples (i.e., no generalization). A natural trade-off exists between these strategies: model-based strategies outperform model-free strategies under conditions of stimulus variability and limited training, while abundant training on few stimuli favors model-free strategies. These ideas predict when specificity should occur: in experimental conditions in which policies outperform predictive models. We tested this prediction using a motion extrapolation task that supports both model- and policy-based strategies. We predicted that participants would transition to policy-based strategies, and exhibit specificity in learning, only under conditions in which policy-based strategies were more reliable. We quantified these conditions (e.g., stimulus set sizes and training durations) by simulating the performance of a model-based Kalman filter and a policy-based Q-learning algorithm. For 200 training trials, the simulations predict a transition to a policy-based strategy with 4 paths (4P) but not with 20 paths (20P). The task required selecting the reemergence location (one of 20 bins) of a dot that traveled along a circular arc of variable curvature and orientation before disappearing behind an occluder. Separate groups were trained in the 4P and 20P conditions. We monitored observers' extrapolation strategies through no-feedback pre-test and post-test sessions and test sessions interleaved among the training sessions. While both groups performed similarly during the pre-test, relying on default models, a transition to a policy-based strategy occurred only in the 4P condition, resulting in improved performance on trained stimuli but decreased performance on novel stimuli.
The results comprise the first quantitative prediction and successful test of when specificity in learning will occur, and show that specificity should result whenever response policies are the most reliable way to satisfy task demands.
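The predicted trade-off can be illustrated with a toy simulation. This is a sketch only, not the authors' actual Kalman-filter or Q-learning models: the model-free learner is approximated by a tally that responds with the modal feedback bin observed for each trained path, and the model-based strategy is approximated by a fixed, training-independent accuracy `model_acc`. The feedback-noise distribution, the value `model_acc = 0.85`, and all names below are illustrative assumptions.

```python
import random
from collections import Counter

N_BINS = 20  # possible reemergence locations
# Assumed trial-to-trial jitter of the feedback bin: (offset, probability)
NOISE = [(-2, 0.1), (-1, 0.2), (0, 0.4), (1, 0.2), (2, 0.1)]

def sample_offset(rng):
    """Draw a feedback offset from the assumed noise distribution."""
    r, cum = rng.random(), 0.0
    for off, p in NOISE:
        cum += p
        if r < cum:
            return off
    return 0

def policy_accuracy(n_paths, n_trials, rng):
    """Stand-in for a tabular, model-free learner: tally noisy feedback
    bins per path during training, respond with the modal bin at test."""
    true_bin = [rng.randrange(N_BINS) for _ in range(n_paths)]
    tallies = [Counter() for _ in range(n_paths)]
    for p in range(n_paths):
        for _ in range(n_trials // n_paths):  # trials split evenly
            obs = (true_bin[p] + sample_offset(rng)) % N_BINS
            tallies[p][obs] += 1
    hits = sum(tallies[p].most_common(1)[0][0] == true_bin[p]
               for p in range(n_paths))
    return hits / n_paths

def compare(n_paths, n_trials, model_acc=0.85, n_runs=200, seed=0):
    """Average policy accuracy over runs; model_acc is a fixed stand-in
    for the model-based strategy's training-independent accuracy."""
    rng = random.Random(seed)
    avg = sum(policy_accuracy(n_paths, n_trials, rng)
              for _ in range(n_runs)) / n_runs
    return avg, model_acc
```

With 200 training trials, `compare(4, 200)` gives the policy 50 noisy samples per path, so its modal response becomes reliable and overtakes the fixed model-based accuracy; `compare(20, 200)` gives only 10 samples per path, leaving the policy below it. This mirrors the predicted 4P/20P crossover, under the assumed noise level.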
