HORSE-CFR: Hierarchical opponent reasoning for safe exploitation counterfactual regret minimization

Shijia Wang, Jiao Wang, Bangyan Song

doi:10.1016/j.eswa.2024.125697

Abstract

Game decision-making algorithms based on opponent modeling relax the assumption that the opponent is rational, and so have the potential to achieve higher payoffs than Nash equilibrium strategies. Existing opponent modeling methods, however, struggle to reconcile computational complexity with robustness, which makes it difficult to reach high-payoff decisions from limited historical interactions in imperfect information games. This paper introduces the HORSE-CFR algorithm, which combines Hierarchical Opponent Reasoning (HOR) and Safe Exploitation Counterfactual Regret Minimization (SE-CFR) to make decision-making in imperfect information games more robust. HOR combines neural networks with Bayesian theory to accelerate reasoning, improve interpretability, and reduce modeling errors. SE-CFR balances profitability against conservatism by integrating opponent modeling-based strategy adaptation into a constrained linear binary optimization framework. In experiments against various opponents, HORSE-CFR outperformed Nash equilibrium strategies, increasing payoffs by 16.4% in Leduc Hold’em and 36.8% in the Transit game. It also improved payoffs by more than 9.0% in both games compared with the best-known opponent modeling-based safe adaptive algorithm.
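
The Bayesian side of opponent reasoning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' HOR module: the opponent types (`tight`, `loose`), their action probabilities, and the function name `posterior_over_types` are all hypothetical, chosen only to show how a belief over opponent types might be updated from a few observed actions.

```python
from typing import Dict, List


def posterior_over_types(
    prior: Dict[str, float],
    likelihoods: Dict[str, Dict[str, float]],
    observed: List[str],
) -> Dict[str, float]:
    """Update a belief over opponent types with Bayes' rule, one action at a time.

    prior:       P(type) before any observation (hypothetical values).
    likelihoods: P(action | type) for each type (hypothetical values).
    observed:    sequence of actions the opponent was seen to take.
    """
    belief = dict(prior)
    for action in observed:
        # Unnormalized posterior: P(type) * P(action | type).
        unnormalized = {t: belief[t] * likelihoods[t][action] for t in belief}
        total = sum(unnormalized.values())
        if total == 0.0:
            raise ValueError(f"action {action!r} is impossible under every type")
        belief = {t: v / total for t, v in unnormalized.items()}
    return belief


# Hypothetical two-type opponent model for a poker-like game.
prior = {"tight": 0.5, "loose": 0.5}
likelihoods = {
    "tight": {"fold": 0.6, "call": 0.3, "raise": 0.1},
    "loose": {"fold": 0.1, "call": 0.4, "raise": 0.5},
}

belief = posterior_over_types(prior, likelihoods, ["raise", "raise", "call"])
print(belief)
```

After two raises and a call, the belief shifts strongly toward the `loose` type, which shows how a small interaction history can already inform which opponent to adapt against. How HORSE-CFR turns such a belief into a safely exploitative strategy is described in the paper itself and is not reproduced here.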

Full Text