Demand for personalized products has made high-variety, small-batch production mainstream. Manufacturing enterprises are therefore pursuing self-organizing manufacturing systems, typified by multi-agent-based manufacturing systems (MAMS), to support collaborative dynamic scheduling among items of manufacturing equipment. Compared with traditional scheduling methods such as meta-heuristic algorithms and dispatching rules, reinforcement learning (RL), whether single-agent RL (SARL) or multi-agent RL (MARL), combines fast response with robust performance in dynamic environments. However, SARL-based methods suffer from the curse of dimensionality in the action space, while current MARL-based methods fail to optimize collaborative scheduling decisions among heterogeneous manufacturing equipment. These limitations directly hinder effective dynamic scheduling in the MAMS. We therefore present a novel MARL-based approach to the dynamic flexible job-shop scheduling problem in the MAMS (DFJSP-MAMS), with the objective of minimizing mean tardiness. First, the heterogeneous manufacturing equipment in the MAMS, including the warehouse, machines, buffers, and vehicles, is encapsulated as agents. Second, a partially observable Markov game is constructed to specify the collaborative scheduling process among these agents and to transform the DFJSP-MAMS into a MARL task. Third, an action space consisting of several weight variables is designed for each agent, so that the agent selects weights for dispatching rules. A policy network is then built for each agent to determine the weights and scheduling decisions. Finally, the multi-agent deep deterministic policy gradient algorithm is employed to train these agents to optimize collaborative scheduling decisions among the heterogeneous equipment.
Experimental results demonstrate that the proposed approach outperforms single dispatching rules, the SARL-based method, and other MARL-based methods under 36 distinct manufacturing environment conditions.
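
The idea of an action made of weight variables over dispatching rules, together with the mean-tardiness objective, can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration only: the abstract does not say which dispatching rules are used or how the weights are combined, so the three rules (shortest processing time, earliest due date, first in first out), the min-max normalization, the weighted-sum scoring, and the names `Job`, `select_job`, and `mean_tardiness` are all hypothetical. In the proposed approach the weights would come from each agent's policy network; here they are passed in by hand.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; field names are illustrative assumptions."""
    name: str
    processing_time: float
    due_date: float
    arrival_time: float


def _normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale values to [0, 1]; all zeros if every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_job(jobs: Sequence[Job], weights: Sequence[float]) -> Job:
    """Pick the job with the lowest weighted score over three rules.

    weights = (w_spt, w_edd, w_fifo). This weighted sum is an assumed
    combination scheme, not the one defined in the paper.
    """
    if not jobs:
        raise ValueError("jobs must not be empty")
    if len(weights) != 3:
        raise ValueError("expected three weights: (w_spt, w_edd, w_fifo)")
    w_spt, w_edd, w_fifo = weights
    spt = _normalize([j.processing_time for j in jobs])   # shorter is better
    edd = _normalize([j.due_date for j in jobs])          # earlier is better
    fifo = _normalize([j.arrival_time for j in jobs])     # earlier is better
    scores = [w_spt * s + w_edd * e + w_fifo * f
              for s, e, f in zip(spt, edd, fifo)]
    best_index = min(range(len(jobs)), key=scores.__getitem__)
    return jobs[best_index]


def mean_tardiness(completion_times: Sequence[float],
                   due_dates: Sequence[float]) -> float:
    """Average of max(0, completion - due date) over all jobs."""
    if not completion_times or len(completion_times) != len(due_dates):
        raise ValueError("inputs must be non-empty and of equal length")
    tardiness = [max(0.0, c - d) for c, d in zip(completion_times, due_dates)]
    return sum(tardiness) / len(tardiness)


if __name__ == "__main__":
    queue = [
        Job("A", processing_time=5, due_date=20, arrival_time=0),
        Job("B", processing_time=2, due_date=30, arrival_time=1),
        Job("C", processing_time=8, due_date=10, arrival_time=2),
    ]
    # Putting all weight on one rule reproduces that single dispatching rule.
    print(select_job(queue, (1.0, 0.0, 0.0)).name)
    print(select_job(queue, (0.0, 1.0, 0.0)).name)
    print(mean_tardiness([12, 8, 30], [10, 10, 25]))
```

With all weight on a single rule, the selection reduces to that rule alone, which is why a continuous weight vector can cover single dispatching rules as special cases while also allowing blends between them.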