Abstract

This paper is concerned with the autonomous learning of plans in probabilistic domains without a priori domain-specific knowledge. In contrast to existing reinforcement learning algorithms that generate only reactive plans, and existing probabilistic planning algorithms that require a substantial amount of a priori knowledge in order to plan, a two-stage bottom-up process is devised: first, reinforcement learning/dynamic programming is applied, without the use of a priori domain-specific knowledge, to acquire a reactive plan; then, explicit plans are extracted from the reactive plan. Several options for plan extraction are examined, each based on a beam search that performs temporal projection in a restricted fashion, guided by the value functions resulting from reinforcement learning/dynamic programming. Some completeness and soundness results are given. Examples in several domains are discussed that together demonstrate the working of the proposed model.
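The two-stage process the abstract describes can be illustrated with a minimal sketch; this is not the authors' implementation, only an assumption-laden stand-in. Value iteration plays the role of the reinforcement learning/dynamic programming stage, and a simple beam search that projects state distributions forward and scores them by expected value under the learned value function plays the role of plan extraction. The one-dimensional chain world, the slip probability, the horizon, and the beam width are all illustrative assumptions.

```python
# Hedged sketch of the two-stage process described in the abstract.
# Stage 1: dynamic programming (value iteration) yields a value function V,
# i.e., a reactive plan (act greedily with respect to V).
# Stage 2: beam search performs restricted temporal projection, guided by V,
# to extract an explicit (open-loop) plan.
# The chain world, NOISE, GAMMA, horizon, and beam width are assumptions.

N_STATES = 6          # states 0..5; state 5 is the goal
ACTIONS = (-1, +1)    # move left / move right
NOISE = 0.1           # probability an action slips (moves the other way)
GAMMA = 0.9

def transitions(s, a):
    """Return [(next_state, probability), ...] for the assumed chain world."""
    intended = min(max(s + a, 0), N_STATES - 1)
    slipped = min(max(s - a, 0), N_STATES - 1)
    return [(intended, 1.0 - NOISE), (slipped, NOISE)]

def reward(s):
    return 1.0 if s == N_STATES - 1 else 0.0

# --- Stage 1: value iteration (the RL/DP stage, no domain knowledge) ---
V = [0.0] * N_STATES
for _ in range(200):
    V = [reward(s) + GAMMA * max(sum(p * V[s2] for s2, p in transitions(s, a))
                                 for a in ACTIONS)
         for s in range(N_STATES)]

# --- Stage 2: value-guided beam search extracting an explicit plan ---
def extract_plan(start, horizon=6, beam_width=3):
    """Beam search over action sequences; each candidate is scored by the
    expected value, under V, of its projected state distribution."""
    beam = [((), {start: 1.0})]  # (action sequence, state distribution)
    for _ in range(horizon):
        candidates = []
        for seq, dist in beam:
            for a in ACTIONS:
                # Temporal projection: push the distribution one step forward.
                new_dist = {}
                for s, p in dist.items():
                    for s2, p2 in transitions(s, a):
                        new_dist[s2] = new_dist.get(s2, 0.0) + p * p2
                candidates.append((seq + (a,), new_dist))
        # Restrict the projection: keep only the beam_width best sequences.
        candidates.sort(key=lambda c: -sum(p * V[s] for s, p in c[1].items()))
        beam = candidates[:beam_width]
    return beam[0][0]

print("V:", [round(v, 3) for v in V])
print("extracted plan from state 0:", extract_plan(0))
```

Note the design point the abstract emphasizes: stage 2 never searches blindly, since each projected state distribution is scored by the value function produced in stage 1, which is what keeps the beam search tractable despite probabilistic outcomes.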
