The increasing uptake of renewable energy sources introduces greater uncertainty into energy scheduling, which in turn makes energy demand management in the smart grid more difficult. Accurate forecasting plays a significant role in energy management, and in some cases it involves predictions for different consumers or geographic regions. Traditionally, these prediction problems have been solved independently, ignoring knowledge potentially shared among them that could improve overall performance. In this manuscript, we propose a multi-swarm multi-tasking ensemble learning (MSMTEL) framework for energy demand forecasting across multiple cities. The proposed method comprises single-task pretraining, multi-task optimization (MTO), and ensemble learning (EL). For each prediction task, several subtasks are generated based on predefined parameters of the forecasting model. Each subtask uses an independent deep neural network (DNN) as its predictor, which is pretrained individually. Subsequently, we adapt dynamic multi-swarm particle swarm optimization (DMS-PSO) into a customized multi-swarm PSO (CMS-PSO) algorithm to implement MTO. Each subswarm in CMS-PSO searches for the optimal model knowledge (pretrained DNN weights and biases) of the source tasks (all subtasks) to be reused by the target subtask. Finally, instead of selecting the best forecast among the subtasks, the results of the subtasks belonging to the same energy prediction problem are combined to yield the EL result, where the weight coefficients that determine each subtask's contribution to EL are optimized by PSO. MSMTEL is evaluated against single-task learning (STL) and MTO to demonstrate its superior performance.
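
The final EL step, in which PSO optimizes the weight coefficients that combine subtask forecasts, can be illustrated with a minimal sketch. This is not the implementation used in this work. It assumes a mean-squared-error fitness, non-negative weights that sum to one, and a plain global-best PSO with conventional hyperparameters, none of which are specified above. All function names, parameter values, and the toy data are hypothetical.

```python
import random


def ensemble(weights, preds):
    """Weighted sum of subtask forecasts; preds[k][t] is subtask k at step t."""
    n_steps = len(preds[0])
    return [sum(w * p[t] for w, p in zip(weights, preds)) for t in range(n_steps)]


def mse(weights, preds, target):
    """Assumed fitness: mean squared error of the combined forecast."""
    combined = ensemble(weights, preds)
    return sum((c - y) ** 2 for c, y in zip(combined, target)) / len(target)


def normalize(w):
    """Clip to non-negative values and rescale to sum to 1 (assumed constraint)."""
    w = [max(x, 0.0) for x in w]
    s = sum(w)
    return [x / s for x in w] if s > 0 else [1.0 / len(w)] * len(w)


def pso_weights(preds, target, n_particles=20, n_iters=100, seed=0,
                inertia=0.7, c1=1.5, c2=1.5):
    """Global-best PSO over ensemble weights (hyperparameters are assumptions)."""
    rng = random.Random(seed)
    dim = len(preds)
    # Particle 0 starts at equal weights, so the result is never worse than
    # simple averaging; the remaining particles start at random weights.
    pos = [[1.0 / dim] * dim]
    pos += [normalize([rng.random() for _ in range(dim)])
            for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [mse(p, preds, target) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
            # Move the particle, then project back onto the weight constraint.
            pos[i] = normalize([x + v for x, v in zip(pos[i], vel[i])])
            cost = mse(pos[i], preds, target)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost


# Toy example: three hypothetical subtask forecasts for one prediction problem.
target = [10.0, 12.0, 14.0, 13.0]
preds = [
    [11.0, 13.0, 15.0, 14.0],
    [9.5, 11.0, 13.5, 12.0],
    [10.0, 12.5, 13.0, 13.5],
]
weights, cost = pso_weights(preds, target)
print("weights:", weights)
print("ensemble MSE:", cost)
```

Seeding one particle with equal weights is a design choice of this sketch. It guarantees that the optimized ensemble is at least as accurate on the fitting data as simple averaging, which gives a convenient sanity check. In practice the weights would be fitted on held-out validation data rather than on the evaluation data.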