Transfer learning (TL) is a technique used in energy systems to improve the accuracy of short-term load forecasting (STLF) when data are scarce. The selection of transfer domains is decisive for the accuracy of TL. Traditional transfer domain selection algorithms, which are based on linear and nonlinear analysis, ignore the probability distributions of the load series in the target and source domains, which can result in negative transfer. This paper proposes a transfer domain selection algorithm that combines the Wasserstein distance (WD) and the maximal information coefficient (MIC), termed the WM algorithm. The WM algorithm is used to determine the transfer domains for training DSSFA-LSTM, a decomposition-based forecasting model that was developed in our previous work. TL is then used to predict the short-term load of the target domain, yielding the WM-DSSFA-LSTM-TL model. The experimental results show that the WM algorithm can effectively reduce the risk of negative transfer by measuring the similarity between time series variables in terms of both nonlinear correlation and probability distribution. In the case studies, the WM-DSSFA-LSTM-TL model did not experience negative transfer, and its reliability was better than that of advanced forecasting models, including LSTM, Informer, and Autoformer. In the ELP case, WM-DSSFA-LSTM-TL achieved the highest goodness of fit; compared with LSTM, Informer, and Autoformer, its R² score was higher by 0.76, 0.96, and 0.63, respectively.
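To make the distribution-based part of the selection concrete, the following is a minimal sketch of ranking candidate source domains by their empirical one-dimensional Wasserstein distance to the target load series. It is an illustration under stated assumptions, not the WM algorithm itself: the function names, the domain names, and the load values are hypothetical, the series are assumed to be of equal length and on a comparable scale, and the MIC term is omitted because the way the two measures are combined is not specified here.

```python
from typing import Dict, List, Sequence, Tuple


def wasserstein_1d(x: Sequence[float], y: Sequence[float]) -> float:
    """Empirical 1-Wasserstein distance between two equal-length 1-D samples.

    For equal-size samples with uniform weights, the distance equals the
    mean absolute difference between the sorted values.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal length")
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


def rank_sources_by_wd(
    target: Sequence[float],
    sources: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank candidate source domains by WD to the target, closest first.

    Assumption: a smaller WD means a more similar load distribution.
    The MIC component of the WM algorithm is intentionally not modelled.
    """
    scores = [(name, wasserstein_1d(target, series)) for name, series in sources.items()]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical load values, made up purely for illustration.
target = [10.0, 12.0, 15.0, 11.0]
sources = {
    "source_a": [11.0, 13.0, 16.0, 12.0],  # target shifted by a constant
    "source_b": [30.0, 45.0, 20.0, 50.0],  # very different distribution
}

ranking = rank_sources_by_wd(target, sources)
for name, distance in ranking:
    print(f"{name}: WD = {distance:.2f}")
```

In this toy setting the source whose values follow a distribution close to the target's ranks first. A fuller version would also score each candidate with MIC against the target and combine both measures before selecting the transfer domains.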