Abstract

This paper constructs a learning probabilistic automaton (PA) model with operant conditioning (OC) responses, which is used to simulate the Skinner pigeon experiment. The PA model with OC is a form of animal learning in that it allows an agent to adapt its actions to gain maximally from the environment while being rewarded only for correct performance. The learning mechanism is achieved by designing an action-selection probability that is updated using reward and punishment information from the environment; the agent then selects an action at random according to this probability. We apply our model to the Skinner pigeon experiment, specifically the button-pecking task. The pigeon learns this task in stages. In simulation, our model also acquires the task in a similar manner.
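
The mechanism described above (sample an action from a probability vector, then adjust that vector using reward or punishment) can be illustrated with a minimal sketch. The abstract does not state the update rule, so the code below assumes a standard linear reward-penalty scheme from the learning-automata literature; the action labels, the learning rate `lr`, and the function names `update` and `simulate` are all hypothetical and are not taken from the paper.

```python
import random


def update(probs, action, rewarded, lr=0.1):
    """Assumed linear reward-penalty update (not the paper's stated rule).

    Returns a new probability vector that still sums to 1.
    """
    n = len(probs)
    new_probs = []
    for i, p in enumerate(probs):
        if rewarded:
            # Reward: shift probability mass toward the chosen action.
            if i == action:
                new_probs.append(p + lr * (1.0 - p))
            else:
                new_probs.append(p * (1.0 - lr))
        else:
            # Punishment: shrink the chosen action and spread the
            # released mass evenly over the remaining actions.
            if i == action:
                new_probs.append(p * (1.0 - lr))
            else:
                new_probs.append(p * (1.0 - lr) + lr / (n - 1))
    return new_probs


def simulate(trials=500, seed=0, lr=0.1):
    """Run a toy button-pecking task and return the final probabilities."""
    rng = random.Random(seed)
    # Hypothetical action set; only pecking the button is rewarded.
    actions = ["peck_button", "turn", "preen"]
    probs = [1.0 / len(actions)] * len(actions)
    for _ in range(trials):
        # Select an action at random according to the current probabilities.
        chosen = rng.choices(range(len(actions)), weights=probs)[0]
        rewarded = actions[chosen] == "peck_button"
        probs = update(probs, chosen, rewarded, lr)
    return dict(zip(actions, probs))


if __name__ == "__main__":
    print(simulate())
```

Under this assumed rule, the probability of the rewarded action drifts upward on average at every trial, so the simulated agent comes to peck the button almost exclusively. The staged acquisition reported for the pigeon would require the paper's actual state structure, which this sketch does not attempt to reproduce.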
