In recent years, many deep learning-based methods have been applied to Channel State Information (CSI) feedback in massive MIMO systems. Transformer-based networks leverage global self-attention mechanisms that effectively capture long-range correlations among antennas, while Convolutional Neural Networks (CNNs) excel at extracting local information. To combine the advantages of both, this paper proposes an Efficient Feature Aggregation Network, called EFANet, which hybridizes CNNs and Transformers. Specifically, we propose a Refined Window Multi-head Self-Attention (RW-MSA) that integrates a Convolutional Embedding Unit (CEU) with Window Multi-head Self-Attention (W-MSA) to reduce information loss across windows and achieve efficient feature aggregation. Additionally, we develop a Local Enhanced Feedforward Network (LEFN) to further integrate local information in the CSI matrix and model the detailed features of different regions. Finally, a Compensation Unit (CU) is designed to further compensate global and local features in the CSI matrix. Through these designs, global and local features interact fully, reducing information loss. Extensive experiments show that the proposed method achieves better CSI reconstruction performance while reducing computational complexity.
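
The sketch below illustrates, under stated assumptions, the RW-MSA idea described above: a convolutional embedding step applied before window-partitioned multi-head self-attention, so that local mixing lets information flow across window boundaries. Module and parameter names (e.g., `ConvEmbeddingUnit`, `RefinedWindowMSA`, window size, head count) are illustrative and not taken from the paper's code.

```python
# Minimal, hypothetical sketch of a CEU + W-MSA combination; not the authors' implementation.
import torch
import torch.nn as nn


class ConvEmbeddingUnit(nn.Module):
    """Depthwise convolution that mixes features across neighboring positions,
    including positions that fall into different attention windows."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)

    def forward(self, x):          # x: (B, C, H, W)
        return x + self.dwconv(x)  # residual connection keeps original features


class RefinedWindowMSA(nn.Module):
    """Window multi-head self-attention preceded by the convolutional embedding unit."""
    def __init__(self, dim, window_size=4, num_heads=4):
        super().__init__()
        self.window_size = window_size
        self.ceu = ConvEmbeddingUnit(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.ceu(x)                        # local mixing across window borders
        B, C, H, W = x.shape
        ws = self.window_size
        # Partition the feature map into non-overlapping ws x ws windows.
        x = x.view(B, C, H // ws, ws, W // ws, ws)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, C)
        x, _ = self.attn(x, x, x)              # self-attention within each window
        # Reverse the window partition back to (B, C, H, W).
        x = x.view(B, H // ws, W // ws, ws, ws, C)
        x = x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return x


if __name__ == "__main__":
    # A 2-channel 32x32 CSI matrix (real/imaginary parts), batch of 1.
    csi = torch.randn(1, 2, 32, 32)
    block = nn.Sequential(nn.Conv2d(2, 64, 1), RefinedWindowMSA(64))
    print(block(csi).shape)  # torch.Size([1, 64, 32, 32])
```

In this sketch the depthwise convolution plays the role of the CEU, injecting neighborhood information before the feature map is split into windows, which is one way to reduce the information loss between windows that the abstract refers to.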