BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

Zachary Lipton, Li Deng, Jianfeng Gao, Faisal Ahmed, Lihong Li, Xiujun Li

doi:10.1609/aaai.v32i1.11946

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 27, 2018
Citations: 40

Affiliation: Carnegie Mellon University, Citadel, Microsoft Research (United Kingdom), Google (United States)

#Successful Episodes #Thompson Sampling #Efficient Exploration #Dialogue Systems #Neural Network #Common Strategies #Deep Learning #Task-Oriented Dialogue Systems #Replay Buffer #Q-learning Agents

Abstract

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.
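The two ideas in the abstract, Thompson sampling from a weight posterior and spiking the replay buffer, can be illustrated with a minimal sketch. It is not the authors' implementation: the class `GaussianLinearQ`, the function `spike_replay_buffer`, the linear Q-function, and all parameter names are hypothetical, and a toy linear model with a factorized Gaussian over its weights stands in for a full Bayes-by-Backprop network. Training of the posterior parameters is omitted.

```python
import math
import random
from collections import deque


class GaussianLinearQ:
    """Toy linear Q-function with an independent Gaussian over each weight.

    Hypothetical stand-in for a Bayes-by-Backprop network: every weight has
    a mean `mu` and a parameter `rho`, with sigma = log(1 + exp(rho)).
    """

    def __init__(self, n_features, n_actions, rho_init=-3.0, seed=0):
        self.rng = random.Random(seed)
        self.mu = [[0.0] * n_features for _ in range(n_actions)]
        self.rho = [[rho_init] * n_features for _ in range(n_actions)]

    def sample_weights(self):
        # One Monte Carlo draw: w = mu + sigma * eps, with eps ~ N(0, 1).
        return [
            [m + math.log1p(math.exp(r)) * self.rng.gauss(0.0, 1.0)
             for m, r in zip(mu_row, rho_row)]
            for mu_row, rho_row in zip(self.mu, self.rho)
        ]

    def thompson_action(self, state):
        # Thompson sampling: sample weights once, then act greedily
        # with respect to the Q-values those sampled weights produce.
        weights = self.sample_weights()
        q_values = [sum(w * s for w, s in zip(row, state)) for row in weights]
        return max(range(len(q_values)), key=q_values.__getitem__)


def spike_replay_buffer(buffer, successful_episodes):
    """Pre-fill the buffer with transitions from a few successful episodes."""
    for episode in successful_episodes:
        buffer.extend(episode)
    return buffer


# Usage: pick an action for one state, and seed a buffer with one episode.
agent = GaussianLinearQ(n_features=3, n_actions=2)
action = agent.thompson_action([1.0, 0.0, 0.5])

# Each transition is (state, action, reward, next_state, done); the values
# here are placeholders, not data from the paper.
demo_episode = [("s0", 0, 0.0, "s1", False), ("s1", 1, 1.0, "s2", True)]
buffer = spike_replay_buffer(deque(maxlen=1000), [demo_episode])
```

Because exploration comes from the randomness of the sampled weights, actions whose value estimates are still uncertain get tried more often than under a fixed ε-greedy schedule, and the seeded buffer gives Q-learning some rewarded transitions to learn from before the agent finds any on its own.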
