Sea–land segmentation is of great significance for autonomous coastline monitoring and is a fundamental research topic in the remote sensing community. Because remote sensing images contain diverse content and easily confused sea–land boundaries, achieving precise sea–land segmentation in complex scenarios remains challenging. Although existing deep learning-based methods have shown promising performance, two problems remain unresolved: excessive computational load and insufficient use of hierarchical features. In this paper, we address these problems by developing an efficient and lightweight convolutional neural network (CNN) termed E-Net. On the one hand, the proposed network adopts a novel E-shaped architecture that reforms the conventional U-shaped codec structure to make full use of hierarchical features at different depths, so that segmentation accuracy can be significantly improved without excessive computational load. On the other hand, a contextual aggregation attention mechanism (CA2M) is designed to further facilitate efficient aggregation and transmission of contextual information, so that fuzzy and irregular sea–land boundaries can be accurately distinguished. Extensive experiments show that our approach not only produces superior sea–land segmentation results but also demonstrates promising computational efficiency. Specifically, the proposed E-Net achieves state-of-the-art sea–land segmentation performance, with a mean Intersection over Union (mIoU) of 92.78% and 93.62% on the SLSD and HRSC2016 datasets, respectively, while running at 108.032 frames per second (FPS) with as few as 52.287G floating-point operations (FLOPs).
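For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how it can be computed for a two-class sea–land mask. The function name `mean_iou`, the label convention (0 for sea, 1 for land), and the choice to average only over classes that appear in the prediction or the ground truth are assumptions made for illustration; the exact evaluation protocol used for E-Net is not specified here.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int = 2) -> float:
    """Mean Intersection over Union between two label masks.

    Assumed (hypothetical) label convention: 0 = sea, 1 = land.
    Classes absent from both masks are skipped when averaging.
    """
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel belongs to both the predicted and true region of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the predicted region of p and the true region of t.
                union[p] += 1
                union[t] += 1

    ious = [intersection[c] / union[c] for c in range(num_classes) if union[c] > 0]
    if not ious:
        raise ValueError("masks contain no labelled pixels")
    return sum(ious) / len(ious)


# Toy 2x4 masks: one sea pixel is misclassified as land.
example_pred = [[0, 0, 1, 1],
                [0, 1, 1, 1]]
example_target = [[0, 0, 0, 1],
                  [0, 1, 1, 1]]
example_score = mean_iou(example_pred, example_target)
```

In this toy case the sea class has an IoU of 3/4 and the land class an IoU of 4/5, so the mean is 0.775. A reported mIoU of 92.78% corresponds to the same per-class average, typically accumulated over all pixels of a test set.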