To address the large number of feature factors and the obscured feature relationships in historical power consumption big data, this study proposes a short-term load forecasting method that combines feature mining of power consumption big data with a bidirectional long short-term memory (BiLSTM) network, Bayesian optimization (BO), and an attention mechanism (AT). The method jointly considers the global features of the power consumption data in space and the local features in time. First, the Cen-CK-means clustering method is used to cluster users' electricity consumption data, and statistical, combination, and time-category features are extracted over multiple time scales according to the meteorological factors related to load. Second, BO and the BiLSTM network are combined to extract the temporal and spatial characteristics of the load data itself. Meanwhile, the AT is introduced to automatically assign weights to the hidden-layer states of the BiLSTM; this distinguishes the importance of the different time steps in the load series, which effectively reduces the loss of historical information and highlights the key historical time points. Finally, taking the first type of load as an example, compared with the SVP, RBPNN, BiLSTM, and BO-BiLSTM algorithms, the proposed method reduces MAPE by 1.05%, 1.75%, 0.52%, and 0.26%, respectively; reduces RMSE by 186.61, 154.93, 91.88, and 15.76 MW, respectively; and increases R² by 0.04, 0.07, 0.03, and 0.03, respectively. For the one-week forecast horizon, MAPE decreases by 1.97%, 2.44%, 1.21%, and 0.6%, respectively; RMSE decreases by 271.18, 305.7, 183.13, and 97.91 MW, respectively; and R² increases by 0.12, 0.08, 0.04, and 0.03, respectively.
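
As a reading aid, the attention weighting of hidden states and the three reported metrics can be sketched in NumPy. This is a minimal illustration, not the study's implementation: the additive (tanh) scoring function, the parameter names `w`, `b`, and `v`, and the function names are all assumptions, since the abstract does not specify them. The metric definitions are the standard ones for MAPE, RMSE, and R².

```python
import numpy as np


def attention_pool(hidden, w, b, v):
    """Weight per-time-step hidden states by attention and sum them.

    hidden : (T, d) array, one (Bi)LSTM hidden state per time step.
    w, b, v: hypothetical learnable parameters of an additive scoring
             function (an assumption; shapes (d, k), (k,), (k,)).
    Returns the (d,) context vector and the (T,) attention weights.
    """
    scores = np.tanh(hidden @ w + b) @ v       # (T,) one raw score per time step
    scores = scores - scores.max()             # shift for a numerically stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()  # (T,) non-negative, sums to 1
    context = weights @ hidden                 # (d,) weighted sum of hidden states
    return context, weights


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the load (e.g. MW)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Toy usage with random stand-ins for hidden states and parameters.
rng = np.random.default_rng(0)
T, d, k = 24, 8, 4                             # 24 time steps, 8 hidden units
hidden = rng.normal(size=(T, d))
context, weights = attention_pool(
    hidden, rng.normal(size=(d, k)), np.zeros(k), rng.normal(size=k)
)
```

Time steps with larger weights contribute more to the context vector, which is the sense in which the attention mechanism highlights key historical time points.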