Abstract

The 20 Questions (Q20) game is a well-known game that encourages deductive reasoning and creativity. In the game, the answerer first thinks of an object, such as a famous person or a kind of animal. The questioner then tries to guess the object by asking up to 20 questions. In a Q20 game system, the user is the answerer and the system acts as the questioner, which needs a good question selection strategy to identify the correct object and win the game. However, the optimal question selection policy is hard to derive because the game environment is complex and volatile. In this paper, we propose a novel policy-based Reinforcement Learning (RL) method that enables the questioner agent to learn the optimal question selection policy through continuous interactions with users. To facilitate training, we also propose using a reward network to estimate a more informative reward. Compared to previous methods, our RL method is robust to noisy answers and does not rely on a Knowledge Base of objects. Experimental results show that our RL method clearly outperforms an entropy-based engineering system and has competitive performance in a noise-free simulation environment.

Highlights

  • The 20 Question Game (Q20 Game) is a classic game that requires deductive reasoning and creativity

  • Wu et al. (2018) further improve the relevance table with many engineering tricks. Because these table-based methods select questions greedily and update model parameters only by rules, their models are very sensitive to noisy answers from users, which are common in real-world Q20 games

  • Our contributions can be summarized as follows: (1) We propose a novel Reinforcement Learning (RL) framework to learn the optimal policy of question selection in the Q20 game without any dependencies on the existing Knowledge Base (KB) of target objects


Summary

Introduction

The 20 Question Game (Q20 Game) is a classic game that requires deductive reasoning and creativity. At the beginning of the game, the answerer thinks of a target object and keeps it concealed. The questioner tries to figure out the target object by asking questions about it, and the answerer honestly answers each question with a simple “Yes”, “No” or “Unknown”. The questioner wins the game if the target object is found within 20 questions. In a Q20 game system, the user is the answerer and the system acts as the questioner, which needs a good question selection strategy to win the game. Hyped as a game that can “read your mind”, Q20 has been played since the 19th century and was brought to the screen in the 1950s by the TV show Twenty Questions. Burgener’s program (Burgener, 2006) further popularized Q20 as an electronic game in 1988, and modern virtual assistants such as Microsoft XiaoIce and Amazon Alexa incorporate the game into their systems to demonstrate their intelligence.
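The game protocol described above can be illustrated with a minimal sketch. It is a toy example and not the paper's RL method or the entropy-based system it is compared against: the object table `OBJECTS`, the question list, and the functions `answer_entropy` and `play` are all hypothetical names and data introduced here for illustration. The questioner greedily asks the question whose answers are most evenly split over the remaining candidates, and the simulated answerer always answers honestly (the “Unknown” answer is not used in this toy data).

```python
import math

# Hypothetical toy data (not from the paper): object -> {question: answer}.
OBJECTS = {
    "cat":      {"Is it an animal?": "Yes", "Can it fly?": "No",  "Is it a pet?": "Yes"},
    "eagle":    {"Is it an animal?": "Yes", "Can it fly?": "Yes", "Is it a pet?": "No"},
    "airplane": {"Is it an animal?": "No",  "Can it fly?": "Yes", "Is it a pet?": "No"},
    "rock":     {"Is it an animal?": "No",  "Can it fly?": "No",  "Is it a pet?": "No"},
}
QUESTIONS = ["Is it an animal?", "Can it fly?", "Is it a pet?"]


def answer_entropy(question, candidates):
    """Shannon entropy (in bits) of the answers to `question` over `candidates`."""
    counts = {}
    for obj in candidates:
        answer = OBJECTS[obj][question]
        counts[answer] = counts.get(answer, 0) + 1
    total = len(candidates)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def play(target, max_questions=20):
    """Simulate one game against an honest answerer who is thinking of `target`.

    Returns (guess, asked), where guess is None if the questioner could not
    narrow the candidates down to a single object within the question budget.
    """
    candidates = set(OBJECTS)
    remaining = list(QUESTIONS)
    asked = []
    while len(candidates) > 1 and remaining and len(asked) < max_questions:
        # Greedy choice: the question with the most evenly split answers.
        question = max(remaining, key=lambda q: answer_entropy(q, candidates))
        remaining.remove(question)
        answer = OBJECTS[target][question]  # honest, noise-free answer
        asked.append((question, answer))
        # Keep only the objects consistent with the answer just received.
        candidates = {o for o in candidates if OBJECTS[o][question] == answer}
    guess = next(iter(candidates)) if len(candidates) == 1 else None
    return guess, asked


if __name__ == "__main__":
    guess, asked = play("eagle")
    for question, answer in asked:
        print(f"{question} {answer}")
    print("Guess:", guess)
```

Because this sketch discards every candidate that contradicts a single answer, one wrong answer from the user would eliminate the true target for good, which illustrates why greedy, table-based questioners are sensitive to noisy answers.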
