High-resolution remote sensing images contain abundant building information and are an important data source for building extraction, which is of great significance to farmland preservation. However, ground features in farmland are complex, and buildings are scattered and may be obscured by clouds or vegetation, which leads to low extraction accuracy with existing methods. To address these problems, this paper proposes an attention-enhanced U-Net method for extracting buildings within farmland from Google and WorldView-2 remote sensing images. First, a ResNet unit is adopted as the basic building block of the U-Net encoder, a spatial and channel attention module is introduced between the ResNet unit and the max-pooling layer, and a multi-scale fusion module is added to improve the U-Net network. Second, buildings in the WorldView-2 and Google images are extracted under farmland boundary constraints. Third, boundary optimization and fusion are applied to the building extraction results from the WorldView-2 and Google images. Fourth, a case experiment is performed in which the proposed method is compared with semantic segmentation models such as FCN8, U-Net, Attention_UNet, and DeepLabv3+. The experimental results indicate that the proposed method achieves higher accuracy and better building extraction within farmland: the accuracy is 97.47%, the F1 score is 85.61%, the recall is 93.02%, and the intersection over union (IoU) is 74.85%. Hence, buildings within farming areas can be effectively extracted, which is conducive to farmland preservation.
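
The four reported metrics can all be derived from the pixel-level confusion counts of a binary building mask. The following is a minimal sketch, not the evaluation code used in this paper. It assumes that "accuracy" means overall pixel accuracy and that buildings are the positive class (label 1). The function names `confusion_counts` and `segmentation_metrics` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN for two flattened binary masks (1 = building)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, t in zip(pred, truth):
        if p and t:
            counts["tp"] += 1      # building predicted, building present
        elif p:
            counts["fp"] += 1      # building predicted, background present
        elif t:
            counts["fn"] += 1      # background predicted, building present
        else:
            counts["tn"] += 1      # background predicted, background present
    return counts


def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, F1, and IoU for the positive class.

    Assumption: a metric whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        "iou": tp / (tp + fp + fn) if (tp + fp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy 8-pixel masks, made up for illustration only (not data from the paper).
    predicted = [1, 1, 1, 0, 0, 0, 1, 0]
    reference = [1, 1, 0, 0, 0, 0, 1, 1]
    metrics = segmentation_metrics(**confusion_counts(predicted, reference))
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Under these definitions, F1 and IoU for the same class are linked by IoU = F1 / (2 - F1). The reported F1 score of 85.61% gives 0.8561 / 1.1439 ≈ 0.748, which agrees with the reported IoU of 74.85% to within rounding.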