Wafer processing is the most expensive, time-consuming, and complex stage of semiconductor manufacturing, and it varies significantly with the orders placed by customers (agents). Optimising the wafer processing flow in a multi-agent scenario can meet customised requirements, speed up delivery, and reduce costs. This work models wafer processing as a multi-agent job shop scheduling problem (MAJSP) with release dates, where the objective is to minimise the total weighted makespan of the agents. Two deep reinforcement learning (DRL)-based methods are proposed to address the dynamic and static scheduling scenarios of the MAJSP. In the dynamic scenario, the statuses of orders and production resources can change at any moment, so a DRL method called Graph Transformer Network (GTN) is proposed to rapidly generate high-quality solutions. In the static scenario, the production plan can be formulated from predetermined demand and resource conditions, so a novel hybrid method (GTN-DABC) that combines GTN with the discrete artificial bee colony (DABC) algorithm is proposed to provide high-quality production plans for manufacturers within an acceptable computation time. Experimental results demonstrate that the proposed GTN outperforms existing heuristics and that GTN-DABC is more competitive than other meta-heuristics.
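
To make the objective concrete, the following is a minimal sketch of how the total weighted makespan could be evaluated for one candidate schedule. It is not the GTN or GTN-DABC method. It assumes that an agent's makespan is the latest completion time among that agent's jobs, that a solution is encoded as a sequence of job indices decoded greedily into a schedule, and that each agent has a fixed weight. The names `Job`, `total_weighted_makespan`, `weights`, and `sequence`, and the toy instance, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    agent: str    # customer (agent) that owns the job (hypothetical field)
    release: int  # release date: earliest start of the first operation
    ops: list     # ordered (machine, duration) pairs


def total_weighted_makespan(jobs, weights, sequence):
    """Decode an operation sequence and return (objective, per-agent makespans).

    The k-th occurrence of job index j in `sequence` schedules the k-th
    operation of jobs[j] at the earliest time allowed by the job's previous
    operation (or release date) and by the machine's availability.
    """
    machine_free = {}                           # machine -> time it becomes idle
    job_ready = [job.release for job in jobs]   # job -> earliest next start
    next_op = [0] * len(jobs)                   # job -> index of next operation
    agent_makespan = {agent: 0 for agent in weights}

    for j in sequence:
        machine, duration = jobs[j].ops[next_op[j]]
        start = max(job_ready[j], machine_free.get(machine, 0))
        end = start + duration
        machine_free[machine] = end
        job_ready[j] = end
        next_op[j] += 1
        agent = jobs[j].agent
        agent_makespan[agent] = max(agent_makespan[agent], end)

    total = sum(weights[a] * agent_makespan[a] for a in weights)
    return total, agent_makespan


# Toy instance: two agents, three jobs, two machines (illustrative data only).
jobs = [
    Job("A", 0, [("M1", 3), ("M2", 2)]),
    Job("A", 1, [("M2", 2), ("M1", 1)]),
    Job("B", 2, [("M1", 2), ("M2", 3)]),
]
weights = {"A": 2, "B": 1}
sequence = [0, 1, 2, 0, 1, 2]

total, per_agent = total_weighted_makespan(jobs, weights, sequence)
print(total, per_agent)  # 20 {'A': 6, 'B': 8}
```

Under these assumptions, a search procedure would compare candidate sequences by this value, and a release date simply delays the earliest possible start of a job's first operation.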