Accurate building energy prediction has become a key factor in achieving energy-saving goals. Traditional methods for multi-step building energy prediction typically rely on recursive or direct strategies for time series forecasting, which may neglect correlations within the data sequence and allow errors to accumulate across prediction steps. To address these problems, this paper proposes the Temporal Feature Decomposition Fusion Network (TFDFNet), an encoder-decoder model for multi-step prediction of building energy consumption. In the encoder, feature fusion layers account for the influence of different feature sequences on the predicted sequence, and historical load sequences are decomposed into hierarchical sequence structures to improve the interpretability and predictability of the data. The decoder is a simple and efficient multilayer perceptron (MLP) network that decodes the encoded information into the prediction results. Experimental results show that the proposed model achieves higher prediction accuracy and more stable convergence than five comparison methods, suggesting that a simple model design can deliver excellent multi-step building energy consumption prediction.
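The abstract does not specify TFDFNet's decomposition method, fusion operation, or layer sizes, so the following is only a hypothetical sketch of the general pipeline it describes, not the authors' implementation. It assumes moving-average trend/residual decomposition (the kind used in Autoformer and DLinear) applied hierarchically with two kernel sizes, fusion by concatenating the load components with exogenous features before a linear projection, and an MLP decoder that emits every horizon step in a single forward pass. All names (`hierarchical_decompose`, `DirectMultiStepSketch`), kernel sizes, and the synthetic data are illustrative, and the weights are random and untrained.

```python
import numpy as np


def moving_average_decompose(x, kernel):
    """Split a 1-D series into a moving-average trend and a residual (assumed method)."""
    pad = kernel // 2
    # Replicate the edge values so the trend has the same length as the input.
    padded = np.concatenate([np.full(pad, x[0]), x, np.full(kernel - 1 - pad, x[-1])])
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    return trend, x - trend


def hierarchical_decompose(x, kernels=(25, 7)):
    """Repeatedly decompose the residual; the returned components sum back to x."""
    components, residual = [], x
    for k in kernels:
        smooth, residual = moving_average_decompose(residual, k)
        components.append(smooth)
    components.append(residual)
    return np.stack(components)  # shape: (len(kernels) + 1, len(x))


class DirectMultiStepSketch:
    """Hypothetical encoder (decompose + fuse) and MLP decoder with untrained weights."""

    def __init__(self, lookback, n_exog, horizon, hidden=64, kernels=(25, 7), seed=0):
        rng = np.random.default_rng(seed)
        self.kernels = kernels
        n_inputs = (len(kernels) + 1 + n_exog) * lookback
        self.w_enc = rng.normal(0.0, 0.02, (n_inputs, hidden))
        self.w_dec1 = rng.normal(0.0, 0.02, (hidden, hidden))
        self.w_dec2 = rng.normal(0.0, 0.02, (hidden, horizon))

    def predict(self, load, exog):
        comps = hierarchical_decompose(load, self.kernels)
        # Fusion (assumed): stack load components with exogenous rows and flatten.
        fused = np.concatenate([comps, exog]).reshape(-1)
        h = np.maximum(fused @ self.w_enc, 0.0)   # encoder projection + ReLU
        h = np.maximum(h @ self.w_dec1, 0.0)      # MLP decoder hidden layer
        return h @ self.w_dec2                    # all horizon steps at once


lookback, horizon = 96, 24
t = np.arange(lookback)
load = 50 + 10 * np.sin(2 * np.pi * t / 24) + np.random.default_rng(1).normal(0, 1, lookback)
temperature = 20 + 5 * np.sin(2 * np.pi * (t - 6) / 24)  # synthetic exogenous feature

model = DirectMultiStepSketch(lookback, n_exog=1, horizon=horizon)
forecast = model.predict(load, temperature[None, :])
print(forecast.shape)  # (24,)
```

Because the decoder's output layer produces all 24 steps together, no prediction is fed back as an input, which is the mechanism through which recursive strategies accumulate error. In practice the weights would be learned by minimizing a loss such as mean squared error over historical windows.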