Heterogeneity among factories in distributed manufacturing significantly expands the solution space, complicating optimisation. Traditional centralised scheduling methods lack the scalability to adapt to varying factory scales. This paper proposes an end-to-end decentralised scheduling framework based on deep reinforcement learning (DRL) for the dynamic distributed heterogeneous permutation flowshop scheduling problem (DDHPFSP) with random job arrivals. The framework adopts a multi-agent architecture in which each factory operates as an independent agent, enabling efficient, robust, and scalable scheduling. Specifically, the DDHPFSP is formulated as a partially observable Markov decision process (POMDP), with a state space that reflects factory heterogeneity and permutation characteristics, and a tailored reward function that mitigates sparse rewards and high reward variance. An end-to-end policy network with a dual-layer architecture is developed, incorporating a feature extraction network that captures the intrinsic relationships between jobs and heterogeneous factories, thereby enhancing the agent's self-learning and policy evolution. Moreover, a backward swap search (BSS) method based on greedy heuristics refines the pre-scheduling plan during the online phase with minimal computation time. Experimental results demonstrate that the framework outperforms the best comparison method by 39.76% on 540 baseline instances and by 59.95% on 2430 generalisation instances. Furthermore, the framework's effectiveness improves by 68.9% with the introduction of the BSS method.
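To illustrate the flavour of this greedy online refinement step, the Python sketch below implements a generic backward adjacent-swap pass over a single factory's job permutation, scored with the standard permutation-flowshop makespan recurrence; the function names, the single-pass adjacent-swap neighbourhood, and the random problem data are illustrative assumptions, and the paper's actual BSS procedure may differ.

import random

def makespan(seq, proc):
    """Makespan of a permutation flowshop schedule.

    seq  : job order (list of job indices)
    proc : proc[j][m] = processing time of job j on machine m
    """
    if not seq:
        return 0
    n_machines = len(proc[0])
    # c[m] holds the completion time on machine m of the latest job so far
    c = [0] * n_machines
    for j in seq:
        c[0] += proc[j][0]
        for m in range(1, n_machines):
            # a job starts on machine m only after it finishes on m-1
            # and machine m has finished the previous job
            c[m] = max(c[m], c[m - 1]) + proc[j][m]
    return c[-1]

def backward_swap_search(seq, proc):
    """Illustrative greedy backward swap search (not the paper's exact BSS).

    Scans the sequence from tail to head, swapping adjacent jobs and
    keeping each swap only if it strictly reduces the makespan.
    """
    seq = list(seq)
    best = makespan(seq, proc)
    for i in range(len(seq) - 2, -1, -1):   # backward scan over adjacent pairs
        seq[i], seq[i + 1] = seq[i + 1], seq[i]
        cand = makespan(seq, proc)
        if cand < best:
            best = cand                      # keep the improving swap
        else:
            seq[i], seq[i + 1] = seq[i + 1], seq[i]  # undo non-improving swap
    return seq, best

# Usage: 5 jobs on 3 machines with random processing times (hypothetical data)
random.seed(0)
proc = [[random.randint(1, 9) for _ in range(3)] for _ in range(5)]
plan = [0, 1, 2, 3, 4]                       # pre-scheduling plan for one factory
improved, cmax = backward_swap_search(plan, proc)
print(improved, cmax)

A single greedy backward pass evaluates only n-1 candidate swaps, each costing one O(n x m) makespan recomputation, which is consistent with the abstract's emphasis on keeping online computation time minimal.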