In practical decision-making dialogues, reinforcement learning methods struggle with delayed and sparse reward feedback, and in some cases receive no rewards at all. These issues impede efficient learning of dialogue strategies and degrade model performance. To address this challenge, this paper introduces the Multi-Agent Curiosity Reward Model (MACRM) for task-oriented dialogue systems. First, for the dialogue reward mechanism, a forward dynamics model generates curiosity rewards, which are combined with the extrinsic rewards fed back by the dialogue environment to mitigate the reward sparsity caused by inadequate agent exploration. Second, for the dialogue strategy training mechanism, an exploration-exploitation approach inspired by how organisms explore is adopted: the agent explores the dialogue environment thoroughly in the early stages and exploits the learned knowledge in later stages, thereby balancing exploration against exploitation and improving the efficiency of dialogue strategy learning. To assess the proposed model's effectiveness, experiments are conducted on the MultiWOZ corpus under three reward settings: (1) extrinsic rewards only, (2) curiosity rewards only, and (3) a combination of both. The experimental results show that agents equipped with MACRM learn dialogue strategies faster than those relying on a single exploratory reward method, effectively alleviating the reward sparsity and delay problems in practical decision-making scenarios.
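The abstract's reward mechanism can be illustrated with a minimal sketch: a forward dynamics model predicts the next dialogue-state embedding, its prediction error serves as a curiosity bonus added to the extrinsic reward, and a decaying curiosity weight mimics the explore-then-exploit schedule. This is not the paper's actual implementation; the linear forward model, the dimensions, and the decay schedule below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 8, 4

# Assumed forward dynamics model: a single random linear map that
# predicts the next state embedding from the current state and action.
# (The paper's model is a learned network; this stands in for it.)
W = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, STATE_DIM))

def predict_next_state(state, action_onehot):
    """Forward model f(s, a) -> predicted next-state embedding."""
    return np.concatenate([state, action_onehot]) @ W

def curiosity_reward(state, action_onehot, next_state, eta=0.5):
    """Intrinsic reward = scaled squared prediction error of the model.
    Poorly predicted (novel) transitions yield a larger bonus."""
    pred = predict_next_state(state, action_onehot)
    return eta * float(np.sum((next_state - pred) ** 2))

def annealed_eta(step, eta0=0.5, decay=0.999):
    """Explore-then-exploit schedule (assumed exponential decay):
    the curiosity weight shrinks as training progresses."""
    return eta0 * decay ** step

def combined_reward(r_ext, state, action_onehot, next_state, step=0):
    """Extrinsic reward from the environment plus the curiosity bonus."""
    eta = annealed_eta(step)
    return r_ext + curiosity_reward(state, action_onehot, next_state, eta)

# Example transition: the same transition earns a smaller bonus late in training.
s = rng.normal(size=STATE_DIM)
a = np.eye(ACTION_DIM)[1]
s_next = rng.normal(size=STATE_DIM)
print(combined_reward(0.0, s, a, s_next, step=0))
print(combined_reward(0.0, s, a, s_next, step=5000))
```

Under this sketch, an agent visiting well-modeled dialogue states receives little intrinsic reward, while surprising transitions are rewarded, which is what counteracts sparse extrinsic feedback early in training.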