Abstract Simultaneously reducing network energy consumption and delay is an active research topic. This paper addresses the problem by designing a novel multi-objective data transmission optimization algorithm based on deep reinforcement learning. A three-layer back-propagation (BP) neural network learns from historical state and action sequences to improve the accuracy of environmental state prediction, helping the agent make better routing decisions in complex network environments. On this basis, Q-Learning is used to compute routes for transmission demands, aggregating traffic onto fewer links and routers to reduce both energy consumption and delay. To enhance the efficiency and robustness of the algorithm, a new reward mechanism is designed based on traffic demand and link state. The algorithm divides candidate links into three levels for path selection, so that better solutions can be found while feasibility is guaranteed. The Pareto set is continuously updated over successive state steps to approximate the optimal solution, and the Euclidean distance to a reference point is used to measure the joint optimization of the two objectives. Simulation results show that the proposed algorithm outperforms existing algorithms in reducing energy consumption and network delay.
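The multi-objective selection step described above can be illustrated with a minimal sketch. The function names and the two-tuple (energy, delay) representation below are assumptions for illustration, not the paper's actual implementation: a Pareto set of non-dominated solutions is maintained as new candidates arrive, and the solution closest (in Euclidean distance) to a reference point is taken as the current best trade-off.

```python
import math

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_pareto_set(pareto, candidate):
    """Add candidate (energy, delay) if non-dominated; drop points it dominates."""
    if any(dominates(p, candidate) for p in pareto):
        return pareto
    return [p for p in pareto if not dominates(candidate, p)] + [candidate]

def distance_to_reference(point, reference=(0.0, 0.0)):
    """Euclidean distance from an (energy, delay) point to the reference point."""
    return math.dist(point, reference)

# Hypothetical (energy, delay) values produced over successive state steps.
pareto = []
for sol in [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 6.0)]:
    pareto = update_pareto_set(pareto, sol)

# Pick the Pareto point nearest the (ideal) reference point (0, 0).
best = min(pareto, key=distance_to_reference)  # → (8.0, 7.0)
```

Using the origin as the reference point is one common choice when both objectives are minimized; the paper's concrete reference point may differ.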