Convolutional neural networks (CNNs) have enabled impressive improvements in the semantic segmentation of very high-resolution (VHR) remote sensing images. The success of semantic segmentation depends on an effective receptive field (RF) large enough to cover the entire object. Popular methods to enlarge the effective RF include dilated filters, subsampling operations, and stacking layers. Unfortunately, these methods are either inefficient or prone to gridding artifacts. Moreover, because object sizes vary greatly in remote sensing images, a fixed RF size cannot accommodate both small and large objects. To tackle these problems, we propose adaptive effective receptive field convolution (AERFC) for VHR remote sensing images. AERFC adaptively controls the sampling locations of the convolution and automatically adjusts the effective RF without significantly increasing the number of parameters or the computational cost. Thus, AERFC reduces the training difficulty, lowers the risk of overfitting, and preserves fine details in VHR images. AERFC is further integrated with spatial pyramid pooling (SPP) to aggregate diverse multiscale features and exploit contextual information. Quantitative and qualitative experimental results on four benchmark data sets show that AERFC outperforms state-of-the-art methods.