Reinforcement learning tree-based planning methods have been gaining popularity in recent years due to their success in single-agent domains where a perfect simulator model is available, for example, the strategic board games Go and chess. This paper aims to extend tree search algorithms to the multiagent setting with a decentralized structure, addressing the scalability issues and the exponential growth of computational resources that this setting entails. The proposed dynamic tree search combines forward planning with direct temporal-difference updates and markedly outperforms conventional tabular algorithms such as Q-learning and state-action-reward-state-action (SARSA). Future state transitions and rewards are predicted with a model built and learned from real interactions between the agents and the environment. The algorithm is evaluated in the hunter–pursuit cooperative game against stochastic and intelligent evaders. Dynamic tree search adapts single-agent tree search learning methods to the multiagent setting and is shown to be a significant improvement over conventional temporal-difference techniques.
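To illustrate the core idea of combining a model learned from real experience with forward planning and direct temporal-difference backups, the following minimal Python sketch shows one possible single-agent variant. It is not the paper's exact algorithm: the class name, the deterministic last-transition model, the planning depth, and all hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict

class ModelBasedTDPlanner:
    """Illustrative sketch: tabular Q-learning augmented with shallow
    forward planning over a model learned from real transitions."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)   # direct TD estimates of Q(s, a)
        self.model = {}               # (s, a) -> (r, s'), last observed transition

    def act(self, state):
        # Epsilon-greedy over planned values rather than raw Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.plan_value(state, a))

    def plan_value(self, state, action, depth=2):
        # Forward planning: unroll the learned model a few steps,
        # falling back to the TD estimate where the model has no data.
        if depth == 0 or (state, action) not in self.model:
            return self.q[(state, action)]
        r, next_state = self.model[(state, action)]
        best_next = max(self.plan_value(next_state, a, depth - 1)
                        for a in self.actions)
        return r + self.gamma * best_next

    def update(self, state, action, reward, next_state):
        # Learn the model from the real interaction ...
        self.model[(state, action)] = (reward, next_state)
        # ... and apply a direct temporal-difference (Q-learning) backup.
        td_target = reward + self.gamma * max(self.q[(next_state, a)]
                                              for a in self.actions)
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In the decentralized structure described above, each hunter would presumably hold its own such planner over its local observations; the details of coordination between agents are specific to the paper's method and are not modeled here.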