• The manufacturing equipment in the workshop is modeled as equipment agents supported by edge computing nodes, yielding a multi-agent manufacturing system capable of online scheduling and scheduling-strategy optimization.
• A self-organizing negotiation mechanism based on the contract net protocol (CNP) is proposed to guide cooperation and competition among multiple agents and to support intelligent decision-making.
• The decision-making module of each equipment agent, called the artificial intelligence (AI) scheduler, is built on a multi-layer perceptron and enables the manufacturing equipment to generate an optimal production strategy from the perceived workshop state.
• An adaptive learning strategy for the AI scheduler based on the proximal policy optimization (PPO) algorithm is proposed to improve its decision-making performance under order and resource disturbances.
• Experimental results show that the proposed method outperforms other optimization-based approaches and dispatching rules in production-strategy learning and disturbance handling.

Personalized orders challenge the existing production paradigm, creating an urgent need for workshops that can respond dynamically and adjust themselves. Traditional dispatching rules and heuristic algorithms address production planning and control by generating schedules. However, these methods perform poorly in a changeable workshop environment subject to many stochastic disturbances of orders and resources. Recently, the potential of artificial intelligence (AI) algorithms for solving dynamic scheduling problems has attracted researchers' attention. This paper therefore presents a multi-agent manufacturing system based on deep reinforcement learning (DRL) that integrates a self-organization mechanism with a self-learning strategy.
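The self-organizing negotiation highlighted above rests on the contract net protocol, whose basic round is: a manager announces a task, capable agents submit bids, and the manager awards the task to the best bidder. The following is a minimal sketch of one such round, not the paper's improved CNP. It assumes, purely for illustration, that each equipment agent bids its estimated completion time (current queue end plus processing time) and that the earliest completion time wins; `EquipmentAgent`, `Bid`, and `contract_net_round` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    agent_id: str
    completion_time: float  # estimated finish time offered by the bidder


class EquipmentAgent:
    """Hypothetical equipment agent that bids on announced operations."""

    def __init__(self, agent_id, capabilities, busy_until=0.0):
        self.agent_id = agent_id
        self.capabilities = capabilities  # operation type -> processing time
        self.busy_until = busy_until      # time at which the current queue is cleared

    def bid(self, operation, now):
        # Decline (return no bid) if this machine cannot process the operation.
        if operation not in self.capabilities:
            return None
        start = max(now, self.busy_until)
        return Bid(self.agent_id, start + self.capabilities[operation])

    def award(self, operation, now):
        # Commit to the operation: the machine is busy until it finishes.
        start = max(now, self.busy_until)
        self.busy_until = start + self.capabilities[operation]
        return self.busy_until


def contract_net_round(operation, agents, now=0.0):
    """One CNP round: announce, collect bids, award to the best bidder.

    Assumption: 'best' means earliest estimated completion time.
    """
    bids = [b for b in (a.bid(operation, now) for a in agents) if b is not None]
    if not bids:
        return None  # no capable agent; the task stays unassigned
    winner = min(bids, key=lambda b: b.completion_time)
    agent = next(a for a in agents if a.agent_id == winner.agent_id)
    agent.award(operation, now)
    return winner
```

For example, with `M1` (milling, 5.0, busy until 3.0) and an idle `M2` (milling, 7.0), the first milling task goes to `M2` (finish at 7.0) and the next one to `M1` (finish at 8.0). In the system described here, the fixed bidding rule would be replaced by the learned AI scheduler's decision.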
First, each piece of manufacturing equipment in the workshop is modeled as an equipment agent supported by an edge computing node, and an improved contract net protocol (CNP) guides cooperation and competition among the agents so that personalized orders are completed efficiently. Second, a multi-layer perceptron is used to build the decision-making module inside each equipment agent, called the AI scheduler. Based on the perceived workshop state, the AI scheduler generates an optimal production strategy to perform task allocation. Third, using sample trajectories collected during scheduling, the AI scheduler is periodically trained and updated with the proximal policy optimization (PPO) algorithm to improve its decision-making performance. Finally, verification experiments on the multi-agent manufacturing system testbed consider dynamic events such as stochastic job insertions and unpredictable machine failures. The results show that the proposed method obtains scheduling solutions that satisfy various performance metrics and handles resource and task disturbances efficiently and autonomously.
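To make the AI scheduler and its PPO update more concrete, here is a minimal pure-Python sketch under stated assumptions that do not come from the paper: the workshop state is a short numeric feature vector, the multi-layer perceptron has a single tanh hidden layer, and its softmax output is a probability distribution over a discrete set of candidate actions (for instance, which queued task to accept). `mlp_policy`, `init_weights`, and `ppo_clip_objective` are hypothetical names. The objective is the standard PPO clipped surrogate, the mean over collected samples of min(r·A, clip(r, 1 − ε, 1 + ε)·A), where r is the ratio of the new to the old policy's probability for the action actually taken and A is that sample's advantage estimate.

```python
import math
import random


def init_weights(n_in, n_hidden, n_out, seed=0):
    """Hypothetical small random initialization for a one-hidden-layer MLP."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return (w1, b1, w2, b2)


def mlp_policy(state, weights):
    """AI scheduler sketch: tanh hidden layer, then softmax over candidate actions."""
    w1, b1, w2, b2 = weights
    hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    """PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A).

    new_probs / old_probs: probability of the action taken in each sample
    under the current policy and under the policy that collected it.
    """
    terms = []
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

As a usage example, `mlp_policy([0.3, 0.7, 0.1], init_weights(3, 8, 4))` returns four action probabilities that sum to 1. A trajectory step with old probability 0.4, new probability 0.6, and advantage 1.0 contributes 1.2 rather than 1.5, because the clip caps how far one update can move the policy. Maximizing this objective requires its gradient with respect to the MLP weights, which in practice an automatic-differentiation library provides; that step is omitted from this sketch.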