Abstract
This article proposes a novel Mahjong game model, LsAc*-MJ, designed to address the challenges of data scarcity, difficulty in exploiting contextual information, and the heavy computational cost of zero-shot self-play learning. The model is evaluated on Japanese Mahjong. LsAc*-MJ employs long short-term memory (LSTM) neural networks, using hidden states to store and propagate contextual historical information and thereby improve decision accuracy. The paper also introduces an optimized Advantage Actor-Critic (A2C) algorithm that incorporates an experience replay mechanism to strengthen the model's decision-making and to mitigate the convergence difficulties caused by strongly correlated training data. In addition, the paper presents a two-stage training approach for self-play deep reinforcement learning guided by expert knowledge, which improves training efficiency. Extensive ablation experiments and performance comparisons show that, relative to other typical deep reinforcement learning models on the RLCard platform, LsAc*-MJ consumes less computation and time, trains more efficiently, decides faster on average, achieves a higher win rate, and exhibits stronger decision-making ability.
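To make the abstract's ingredients concrete, the following is a minimal sketch, not the authors' code, of how an LSTM-based actor-critic with experience replay can be wired together in PyTorch: an LSTM trunk whose hidden state carries game context, separate policy and value heads for the A2C update, and a replay buffer sampled at random to break correlations between consecutive transitions. All names, dimensions, and hyperparameters (e.g. `obs_dim=34`, the loss coefficients) are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of the components described in the abstract; not the
# paper's implementation. Assumes precomputed discounted returns per sample.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class LstmActorCritic(nn.Module):
    """LSTM trunk with separate policy (actor) and value (critic) heads."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.policy = nn.Linear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim); hidden_state propagates context
        out, hidden_state = self.lstm(obs_seq, hidden_state)
        last = out[:, -1]  # features at the final time step
        return self.policy(last), self.value(last).squeeze(-1), hidden_state


# Replay buffer of (observation sequence, action, discounted return) tuples.
buffer: deque = deque(maxlen=10_000)


def a2c_update(net, optimizer, batch_size=32):
    """One A2C step on a random minibatch drawn from the replay buffer."""
    if len(buffer) < batch_size:
        return
    batch = random.sample(list(buffer), batch_size)
    obs = torch.stack([b[0] for b in batch])             # (B, T, obs_dim)
    actions = torch.tensor([b[1] for b in batch])        # (B,)
    returns = torch.tensor([b[2] for b in batch], dtype=torch.float32)

    logits, values, _ = net(obs)
    advantage = returns - values.detach()                # A(s,a) = R - V(s)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    policy_loss = -(chosen * advantage).mean()           # actor: policy gradient
    value_loss = F.mse_loss(values, returns)             # critic: regression to R
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()
    loss = policy_loss + 0.5 * value_loss - 0.01 * entropy

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


if __name__ == "__main__":
    net = LstmActorCritic(obs_dim=34, n_actions=34)  # e.g. 34 Mahjong tile types
    opt = torch.optim.Adam(net.parameters(), lr=3e-4)
    # Fill the buffer with dummy transitions, then run one update.
    for _ in range(64):
        buffer.append((torch.randn(8, 34), random.randrange(34), random.random()))
    a2c_update(net, opt)
```

Note that vanilla A2C is on-policy; sampling old transitions from a buffer, as the abstract describes, trades some on-policy correctness for decorrelated minibatches and better sample reuse.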