The continuous growth of energy consumption in metro systems has raised public concern. Meanwhile, both operators and passengers seek efficient metro services, especially during peak hours. This paper designs a bi-objective optimization model and a new solution approach that together reduce the energy consumption of metro trains and shorten passenger waiting time. First, considering complex routes, a multi-particle train operation model is established with punctuality, parking accuracy, energy efficiency and safety as objectives; it minimizes traction energy consumption by searching for the optimal force coefficients and coasting positions. Suitable dwell times and headways of metro trains are then found through a timetable model that balances regenerative energy with transfer and non-transfer passenger waiting times. Second, a two-layer solution approach based on deep reinforcement learning (DRL) and the non-dominated sorting genetic algorithm II (NSGA-II) is introduced to solve the model. Finally, a comparison shows that the proposed model reduces the energy consumption of metro trains by 9.16% while decreasing passenger waiting time by 15.88%. Further experiments demonstrate the superiority of the developed model and solution approach.
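
As an illustration of the bi-objective trade-off described above, the following minimal sketch shows the Pareto-dominance test that underlies non-dominated sorting in NSGA-II. It is not the paper's implementation: the names `dominates` and `pareto_front` are hypothetical, and the candidate (energy, waiting time) values are made up purely for demonstration. Both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# Each candidate timetable is summarized as (energy, waiting_time).
# Both objectives are minimized. Values below are illustrative only.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in both objectives
    and strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (traction energy, total passenger waiting time)
candidates: List[Point] = [
    (100.0, 60.0),
    (90.0, 70.0),
    (95.0, 65.0),
    (110.0, 75.0),  # worse than (100.0, 60.0) in both objectives
    (90.0, 80.0),   # same energy as (90.0, 70.0) but longer waiting time
]

front = pareto_front(candidates)
print(front)
```

The surviving points form the first non-dominated front: none of them can be improved in energy without worsening waiting time, or vice versa. A full NSGA-II additionally ranks the remaining fronts and uses crowding distance to preserve diversity, which this sketch omits.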