High-resolution remote sensing images capture surface information in fine detail. Exploiting their broad spatial and temporal coverage to recognize urban buildings as individual objects is an active research topic in remote sensing image analysis and application. However, current research still lacks effective techniques for converting surface elements quickly and accurately from remote sensing image space to geographic information space. This paper combines the complementary strengths of image processing and deep learning to study object-level extraction of urban buildings from high-resolution remote sensing images, based on building element information data. In the experiments, the method achieved a precision of 0.9481 and a recall of 0.9733, indicating that it can effectively extract buildings at the urban scale. The work also extends the theoretical basis of object recognition methods for thematic features in remote sensing, building on the idea of object-based image analysis.
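For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how precision and recall are conventionally computed from object-level match counts. The function name `precision_recall` and the counts used below are hypothetical, chosen only for illustration; they are not the paper's data, and the paper's own matching criterion for deciding when an extracted building counts as correct is not specified here.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from object-level match counts.

    tp: extracted buildings that match a reference building
    fp: extracted buildings with no matching reference building
    fn: reference buildings that were not extracted
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not taken from the paper).
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.4f}, recall={r:.4f}")
```

Under these definitions, a high recall means few reference buildings are missed, while a high precision means few extracted objects are spurious.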