Rural building automatic extraction technology is of great significance for rural planning and disaster assessment; however, existing methods face the dilemma of scarce sample data and large regional differences in rural buildings. To solve this problem, this study constructed an image dataset of typical Chinese rural buildings, including nine typical geographical regions, such as the Northeast and North China Plains. Additionally, an improved remote sensing image rural building extraction network called AGSC-Net was designed. Based on an encoder–decoder structure, the model integrates multiple attention gate (AG) modules and a context collaboration network (CC-Net). The AG modules realize focused expression of building-related features through feature selection. The CC-Net module models the global dependency between different building instances, providing complementary localization and scale information to the decoder. By embedding AG and CC-Net modules between the encoder and decoder, the model can capture multiscale semantic information on building features. Experiments show that, compared with other models, AGSC-Net achieved the best quantitative metrics on two rural building datasets, verifying the accuracy of the extraction results. This study provides an effective example for automatic extraction in complex rural scenes and lays the foundation for related monitoring and planning applications.