Knowledge-driven building extraction method exhibits a restricted adaptability scope and is vulnerable to external factors that affect its extraction accuracy. On the other hand, data-driven building extraction method lacks interpretability, heavily relies on extensive training data, and may result in extraction outcomes with building boundary blur issues. The integration of pre-existing knowledge with data-driven learning is essential for the intelligent identification and extraction of buildings from high-resolution aerial images. To overcome the limitations of current deep learning building extraction networks in effectively leveraging prior knowledge of aerial images, a geometric significance-aware deep mutual learning network (GSDMLNet) is proposed. Firstly, the GeoSay algorithm is utilized to derive building geometric significance feature maps as prior knowledge and integrate them into the deep learning network to enhance the targeted extraction of building features. Secondly, a bi-directional guidance attention module (BGAM) is developed to facilitate deep mutual learning between the building feature map and the building geometric significance feature map within the dual-branch network. Furthermore, the deployment of an enhanced flow alignment module (FAM++) is utilized to produce high-resolution, robust semantic feature maps with strong interpretability. Ultimately, a multi-objective loss function is crafted to refine the network’s performance. Experimental results demonstrate that the GSDMLNet excels in building extraction tasks within densely populated and diverse urban areas, reducing misidentification of shadow-obscured regions and color-similar terrains lacking building structural features. This approach effectively ensures the precise acquisition of urban building information in aerial images.