Automatic detection of curtain wall frames is a crucial part of building automation. However, current curtain wall installation robots rely mainly on manual assistance for positioning, which results in low efficiency and poor accuracy. To address these drawbacks, we propose an edge information fusion perception network (EIFP-Net) for RGB-D curtain wall frame detection. Specifically, a cross-modal context fusion module (CCFM) fuses RGB and depth features so that the two modalities complement each other, while simultaneously capturing multi-scale context information. In addition, an edge feature sensing module (EFSM) is derived from the RGB branch to extract edge features. Within this module, a differential enhancement module (DEM) is introduced to enhance the edge information. Finally, a multi-scale progressive refinement decoder (MPRD) refines the fused features, using the edge information as guidance to capture comprehensive features. Experimental results on the constructed curtain wall frame dataset show that the proposed EIFP-Net outperforms state-of-the-art detection models, achieving 85.94%, 88.29%, 96.84%, 86.81%, and 87.10% on the five evaluation metrics of precision, recall, accuracy, mIoU, and F1-score, respectively.
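For readers unfamiliar with the five evaluation metrics, the following is a minimal sketch of how they are commonly computed for a pixel-wise binary task. It is not the evaluation code used for EIFP-Net: the function name `binary_segmentation_metrics`, the flattened 0/1 mask representation, and the choice to average IoU over the two classes (frame and background) are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def binary_segmentation_metrics(
    pred: Sequence[int], target: Sequence[int]
) -> Dict[str, float]:
    """Compute five common metrics from flattened binary masks.

    Assumption: 1 marks a frame pixel and 0 marks background.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Confusion-matrix counts over all pixels.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)

    # Assumption: mIoU is the mean of the frame and background IoU.
    iou_frame = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    miou = (iou_frame + iou_background) / 2

    # F1 is the harmonic mean of precision and recall.
    f1 = safe_div(2 * precision * recall, precision + recall)

    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "miou": miou,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy masks, not data from the curtain wall frame dataset.
    toy_pred = [1, 1, 0, 0, 1, 0]
    toy_target = [1, 0, 0, 0, 1, 1]
    for name, value in binary_segmentation_metrics(toy_pred, toy_target).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions the F1-score is fully determined by precision and recall, and the reported 87.10% agrees with the harmonic mean of the reported 85.94% precision and 88.29% recall.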