In the dynamic urban landscape, understanding the distribution of buildings is paramount. Extracting and delineating building footprints from high-resolution images, captured by aerial platforms or satellites, is essential but challenging to accomplish manually, due to the abundance of high-resolution data. Automation becomes imperative, yet it introduces complexities related to handling diverse data sources and the computational demands of advanced algorithms. The innovative solution proposed in this paper addresses some intricate challenges occurring when integrating deep learning and data fusion on Earth Observed imagery. By merging RGB orthophotos with Digital Surface Models, deriving from the same aerial high-resolution surveys, an integrated consistent four-band dataset is generated. This unified approach, focused on the extraction of height information through stereoscopy utilizing a singular source, facilitates enhanced pixel-to-pixel data fusion. Employing DeepLabv3 algorithms, a state-of-the-art semantic segmentation network for multi-scale context, pixel-based segmentation on the integrated dataset was performed, excelling in capturing intricate details, particularly when enhanced by the additional height information deriving from the Digital Surface Models acquired over urban landscapes. Evaluation over a 21 km2 area in Turin, Italy, featuring diverse building frameworks, showcases how the proposed approach leads towards superior accuracy levels and building boundary refinement. Notably, the methodology discussed in the present article, significantly reduces training time compared to conventional approaches like U-Net, overcoming inherent challenges in high-resolution data automation. By establishing the effectiveness of leveraging DeepLabv3 algorithms on an integrated dataset for precise building footprint segmentation, the present contribution holds promise for applications in 3D modelling, Change detection and urban planning. An approach favouring the application of deep learning strategies on integrated high-resolution datasets can then guide decision-making processes facilitating urban management tasks.