Abstract

Faced with the ever-increasing complexity of their application domains, artificial learning agents can now scale up to process overwhelming amounts of data. However, this comes at the cost of encoding and processing an increasing amount of redundant information. This work explores how learning systems applied in partially observable domains can selectively focus on the information most likely to be related to the causal interactions among transitioning states. A temporal difference displacement criterion is defined to implement adaptive masking of the observations. It can significantly improve the convergence of temporal difference algorithms applied to partially observable Markov processes, as shown by experiments on a variety of machine learning problems, ranging from visually complex domains such as Atari games to simple textbook control problems such as CartPole. The proposed framework can be added to most RL algorithms, since it affects only the observation process: it selects the parts of the observation that are most promising for explaining the dynamics of the environment and thereby reduces the dimension of the observation space.
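
To illustrate the claim that masking affects only the observation process, here is a minimal Python sketch, not the paper's implementation. The abstract does not specify the temporal difference displacement criterion, so the relevance scores and the `top_k_mask` selection rule below are hypothetical placeholders; `apply_mask` and `td0_update` are likewise illustrative names. The learner is a standard linear TD(0) update that runs unchanged on the reduced observation.

```python
from typing import List, Sequence


def apply_mask(obs: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Keep only the observation components selected by the mask."""
    if len(obs) != len(mask):
        raise ValueError("obs and mask must have the same length")
    return [x for x, keep in zip(obs, mask) if keep]


def top_k_mask(scores: Sequence[float], k: int) -> List[bool]:
    """Hypothetical selection rule: keep the k highest-scoring components."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(scores))]


def td0_update(w: List[float], x: Sequence[float], reward: float,
               x_next: Sequence[float], alpha: float = 0.1,
               gamma: float = 0.99) -> float:
    """One linear TD(0) step on masked features.

    Updates w in place and returns the TD error.
    """
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v
    for i, xi in enumerate(x):
        w[i] += alpha * delta * xi
    return delta


# Placeholder relevance scores (an assumption, not the paper's criterion).
scores = [0.9, 0.1, 0.7, 0.05]
mask = top_k_mask(scores, k=2)

# Two consecutive 4-dimensional observations, reduced to 2 dimensions.
obs, obs_next = [1.0, 5.0, 2.0, -3.0], [0.5, 4.0, 1.0, 7.0]
x, x_next = apply_mask(obs, mask), apply_mask(obs_next, mask)

w = [0.0] * len(x)  # the value function only sees the masked observation
delta = td0_update(w, x, reward=1.0, x_next=x_next)
```

Because the mask is applied before the learner sees the data, the same pattern could in principle sit in front of other RL algorithms, at the cost of having to choose or learn the scores that drive the mask.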
