The urgent goals of carbon peaking and carbon neutrality demand greater industrial sustainability. The energy-efficient hybrid flow shop scheduling problem (EEHFSP) has attracted increasing attention in recent years. This paper studies a new EEHFSP with uniform machines that minimizes total tardiness, total energy cost, and carbon trading cost. Time-of-use tariffs and power-down strategies are adopted simultaneously. A novel multi-objective mixed-integer nonlinear programming model is proposed for the problem. To solve the model, we propose a non-dominated sorting genetic algorithm II driven by Q-learning and general variable neighborhood search (GVNS), denoted QVNS-NSGA-II. The novelty of the algorithm is that Q-learning is incorporated into GVNS to guide adaptive operator selection throughout the shaking and local search processes. A distinguishing feature is that the states and actions of Q-learning are set as the neighborhood structures and the local search operators, respectively. The Q-learning-driven GVNS is embedded into NSGA-II to strengthen its exploration and exploitation capabilities. Experimental results show that the proposed QVNS-NSGA-II outperforms NSGA-II, improved Jaya, and modified MOEA/D in terms of the quantity and quality of Pareto solutions and in computational efficiency. A sensitivity analysis also yields several managerial implications. The proposed approach can be applied to improve sustainability and productivity for hybrid flow shop manufacturers.
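To illustrate the operator-selection idea, the following is a minimal sketch, not the implementation used in this paper. It assumes a toy single-machine total-tardiness objective in place of the multi-objective EEHFSP, two shaking neighborhoods as Q-learning states, two local search operators as actions, a binary improvement reward, and an epsilon-greedy policy. All function names and parameter values are hypothetical, and the embedding into NSGA-II is omitted.

```python
import random

def total_tardiness(seq, proc, due):
    """Toy single-machine objective (assumption: stands in for the EEHFSP objectives)."""
    t, tard = 0, 0
    for j in seq:
        t += proc[j]
        tard += max(0, t - due[j])
    return tard

# Shaking neighborhoods: the Q-learning states in this sketch.
def shake_swap(seq, rng):
    s = seq[:]
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def shake_insert(seq, rng):
    s = seq[:]
    i, j = rng.sample(range(len(s)), 2)
    s.insert(j, s.pop(i))
    return s

# Local search operators: the Q-learning actions in this sketch.
def ls_adjacent_swap(seq, cost):
    best, best_c = seq, cost(seq)
    for i in range(len(seq) - 1):
        s = seq[:]
        s[i], s[i + 1] = s[i + 1], s[i]
        c = cost(s)
        if c < best_c:
            best, best_c = s, c
    return best

def ls_insertion(seq, cost):
    best, best_c = seq, cost(seq)
    for i in range(len(seq)):
        for j in range(len(seq)):
            if i == j:
                continue
            s = seq[:]
            s.insert(j, s.pop(i))
            c = cost(s)
            if c < best_c:
                best, best_c = s, c
    return best

def q_gvns(seq, cost, iters=200, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Variable neighborhood search with Q-learning choosing the local search operator."""
    rng = random.Random(seed)
    shakes = [shake_swap, shake_insert]
    searches = [ls_adjacent_swap, ls_insertion]
    q = [[0.0] * len(searches) for _ in shakes]  # Q[state][action]
    best, best_c = seq[:], cost(seq)
    state = 0
    for _ in range(iters):
        # Epsilon-greedy: explore a random operator, else exploit the best Q-value.
        if rng.random() < epsilon:
            action = rng.randrange(len(searches))
        else:
            action = q[state].index(max(q[state]))
        cand = searches[action](shakes[state](best, rng), cost)
        cand_c = cost(cand)
        improved = cand_c < best_c
        reward = 1.0 if improved else 0.0  # assumed binary reward
        # VNS rule: return to the first neighborhood on improvement, else move on.
        next_state = 0 if improved else (state + 1) % len(shakes)
        # One-step Q-learning update.
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        if improved:
            best, best_c = cand, cand_c
        state = next_state
    return best, best_c, q
```

Calling `q_gvns(list(range(n)), lambda s: total_tardiness(s, proc, due))` with hypothetical lists `proc` and `due` of length `n` returns the best sequence found, its tardiness, and the learned Q-table. In a multi-objective setting the reward would more plausibly depend on Pareto dominance rather than on a scalar improvement, which is an assumption this sketch does not model.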