In recent years, reducing carbon emissions has become a paramount objective for industries worldwide, and enterprises are actively pursuing low-carbon transformations. Power systems play a pivotal role in this context because they are primary drivers of national development, which makes efficient energy scheduling and utilization a critical concern. The convergence of smart grid technology and artificial intelligence has brought transformer load forecasting to the forefront of enterprise power demand management. Traditional forecasting methods based on regression analysis and support vector machines are ill-equipped to handle the growing complexity and diversity of load forecasting requirements. This paper presents a BERT-based power load forecasting method that leverages natural language processing and image processing techniques to improve the accuracy and efficiency of transformer load forecasting in smart grids. In the proposed approach, BERT is first used for data preprocessing, analysis, and feature extraction on long-term historical load data from power grid transformers. The BERT architecture is then trained and fine-tuned over multiple rounds on the preprocessed training datasets. Finally, the trained BERT model is used to predict the transformer load, and its predictions are compared with those of a long short-term memory (LSTM) model and with the actual composite values. The experimental results show that, compared with the LSTM method, the BERT-based model achieves higher short-term power load prediction accuracy and stronger feature extraction capability. The proposed scheme can therefore provide valuable support for resource management in power dispatching departments and offer theoretical guidance for carbon reduction initiatives.
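As a rough illustration of the data-preparation and comparison steps outlined above, the following minimal sketch shows one common way to turn a historical load series into supervised training pairs and to score predictions against actual values. It is not the implementation used in this work: the min-max normalization, the window lengths, the RMSE and MAPE metrics, and all function names (`min_max_scale`, `sliding_windows`, `rmse`, `mape`) are assumptions made for illustration, and the load values are made-up numbers.

```python
from math import sqrt


def min_max_scale(series):
    """Scale a load series to [0, 1]; assumed normalization, not from the paper."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        # A constant series carries no range information; map it to zeros.
        return [0.0] * len(series), lo, hi
    return [(x - lo) / span for x in series], lo, hi


def sliding_windows(series, history, horizon=1):
    """Split a series into (input window, target window) pairs.

    `history` past readings form the model input and the next `horizon`
    readings form the target. Both lengths are hypothetical parameters.
    """
    pairs = []
    for start in range(len(series) - history - horizon + 1):
        inputs = series[start:start + history]
        targets = series[start + history:start + history + horizon]
        pairs.append((inputs, targets))
    return pairs


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted loads."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


if __name__ == "__main__":
    # Illustrative transformer load readings (arbitrary units, made up).
    load = [310.0, 295.0, 330.0, 360.0, 340.0, 320.0]
    scaled, lo, hi = min_max_scale(load)
    for inputs, targets in sliding_windows(scaled, history=3, horizon=1):
        print(inputs, "->", targets)
```

Under these assumptions, the same windowed pairs would be fed to both the BERT-based model and the LSTM baseline, and the two sets of predictions would be scored with `rmse` and `mape` against the held-out actual values so that the comparison uses identical inputs and metrics.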