Automatic building extraction from very high-resolution remote sensing images is of great significance in several application domains, such as emergency information analysis and intelligent city construction. In recent years, with the development of deep learning technology, convolutional neural networks (CNNs) have made considerable progress in improving the accuracy of building extraction from remote sensing imagery. However, most existing methods require numerous parameters and large amounts of computing and storage resources. This affects their efficiency and limits their practical application. In this study, to balance the accuracy and amount of computation required for building extraction, a novel efficient lightweight residual network (ELRNet) with an encoder-decoder structure is proposed for building extraction. ELRNet consists of a series of downsampling blocks and lightweight feature extraction modules (LFEMs) for the encoder and an appropriate combination of LFEMs and upsampling blocks for the decoder. The key to the proposed ELRNet is the LFEM which has depthwise-factorised convolution incorporated in its design. In addition, the effective channel attention (ECA) added to LFEM, performs local cross-channel interactions, thereby fully extracting the relevant information between channels. The performance of ELRNet was evaluated on the public WHU Building dataset, achieving 88.24% IoU with 2.92 GFLOPs and 0.23 million parameters. The proposed ELRNet was compared with six state-of-the-art baseline networks (SegNet, U-Net, ENet, EDANet, ESFNet, and ERFNet). The results show that ELRNet offers a better tradeoff between accuracy and efficiency in the automatic extraction of buildings in very highresolution remote sensing images. This code is publicly available on GitHub (https://github.com/GaoAi/ELRNet).