Abstract

In multi-agent systems, the Dynamic Distributed Constraint Optimisation Problem (D-DCOP) framework is pivotal, allowing for the decomposition of global objectives into agent constraints. Proactive agent behaviour is crucial in such systems, enabling agents to anticipate future changes and adapt accordingly. Existing approaches, such as Proactive Dynamic DCOP (PD-DCOP) algorithms, often necessitate a predefined environment model. We address the problem of enabling proactive agent behaviour in D-DCOPs where the dynamics model of the environment is unknown. Specifically, we propose an approach where agents learn local autoregressive models from observations, predicting future states to inform decision-making. To achieve this, we present a temporal experience-sharing message-passing algorithm that leverages dynamic agent connections and a distance metric to collate training data. Our approach outperformed baseline methods in a search-and-extinguish task using the RoboCup Rescue Simulator, achieving lower total building damage. The experimental results align with prior work on the significance of decision-switching costs and demonstrate improved performance when the switching cost is combined with a learned model.
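To make the learned-dynamics idea concrete, the sketch below shows one plausible form of the per-agent model described above: an AR(p) model over a scalar local state, fitted by ordinary least squares. This is a minimal illustration under assumed simplifications (a one-dimensional state and a fixed lag order); the function names and parameters are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_ar_model(observations: np.ndarray, order: int = 3) -> np.ndarray:
    """Fit AR(p) coefficients to a 1-D series of local state observations
    via ordinary least squares. Hypothetical stand-in for an agent's
    locally learned dynamics model; assumes a scalar state."""
    # Each row of X holds the p lagged values preceding the target in y.
    X = np.column_stack(
        [observations[i:len(observations) - order + i] for i in range(order)]
    )
    y = observations[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_next(history: np.ndarray, coeffs: np.ndarray) -> float:
    """One-step-ahead prediction of the next local state from the
    most recent p observations."""
    order = len(coeffs)
    return float(history[-order:] @ coeffs)

# Example: an agent predicts its next local state from past observations.
obs = np.array([0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.2])
coeffs = fit_ar_model(obs, order=3)
print(predict_next(obs, coeffs))
```

In a D-DCOP setting, such one-step predictions could feed into each agent's local utility, for example combined with a decision-switching cost term as the abstract suggests; the experience-sharing step would then supply additional training series from nearby agents to fit these coefficients.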
