Electricity is a vital resource for societal and economic activity, and accurate load forecasting can reduce costs and improve energy efficiency. However, existing approaches do not fully extract and exploit the information relevant to load forecasting: latent features remain undiscovered, and models fail to capture long-term dependencies effectively. To address these issues, this paper proposes a novel network architecture that integrates machine learning methods with deep learning techniques. A hybrid feature extraction strategy composed of gradient-boosted regression trees, Fisher Score, and mutual information provides a comprehensive feature representation. The network combines a multi-head convolutional neural network with a bi-directional long short-term memory (BiLSTM) network that uses lag parameters. This architecture learns multiple levels and aspects of features simultaneously, which further enhances the BiLSTM model's ability to capture long-term dependencies. In multiple experiments on datasets from Maine, Singapore, New South Wales (Australia), and European countries, our method outperformed the comparison models on three of the five datasets.
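The abstract does not spell out how the lag parameters enter the BiLSTM. As a minimal sketch, assuming they define how many past load values form each input window, the series can be turned into supervised (window, target) pairs as below. The function name `make_lagged_windows`, the parameters `n_lags` and `horizon`, and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_lagged_windows(
    load: Sequence[float], n_lags: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Turn a load series into (window, target) pairs.

    Each window holds `n_lags` consecutive observations; the target is the
    value `horizon` steps after the last observation in the window.
    This is an assumed reading of "lag parameters", not the paper's method.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    windows: List[List[float]] = []
    targets: List[float] = []
    # Last start index that still leaves room for the window and the target.
    last_start = len(load) - n_lags - horizon
    for start in range(last_start + 1):
        end = start + n_lags
        windows.append(list(load[start:end]))
        targets.append(load[end + horizon - 1])
    return windows, targets


# Made-up hourly load values, used only to show the shapes involved.
hourly_load = [310.0, 305.0, 298.0, 301.0, 320.0, 345.0]
X, y = make_lagged_windows(hourly_load, n_lags=3)
```

With three lags and a one-step horizon, the six sample values yield three windows of length three. In a setup like the one described, such windows would typically be reshaped into a (samples, time steps, features) array before being passed to the convolutional and recurrent layers.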