Landcover classification is an important application in remote sensing, but it is always a challenge to distinguish different features with similar characteristics or large-scale differences. Some deep learning networks, such as UperNet, PSPNet, and DANet, use pyramid pooling and attention mechanisms to improve their abilities in multi-scale features extraction. However, due to the neglect of low-level features contained in the underlying network and the information differences between feature maps, it is difficult to identify small-scale objects. Thus, we propose a novel image segmentation network, named HFENet, for mining multi-level semantic information. Like the UperNet, HFENet adopts a top-down horizontal connection architecture while includes two improved modules, the HFE and the MFF. According to the characteristics of different levels of semantic information, HFE module reconstructs the feature extraction part by introducing an attention mechanism and pyramid pooling module to fully mine semantic information. With the help of a channel attention mechanism, MFF module up-samples and re-weights the feature maps to fuse them and enhance the expression ability of multi-scale features. Ablation studies and comparative experiments between HFENet and seven state-of-the-art models (U-Net, DeepLabv3+, PSPNet, FCN, UperNet, DANet and SegNet) are conducted with a self-labeled GF-2 remote sensing image dataset (MZData) and two open datasets landcover.ai and WHU building dataset. The results show that HFENet on three datasets with six evaluation metrics (mIoU, FWIoU, PA, mP, mRecall and mF1) are better than the other models and the mIoU is improved 7.41–10.60% on MZData, 1.17–11.57% on WHU building dataset and 0.93–4.31% on landcover.ai. HFENet can perform better in the task of refining the semantic segmentation of remote sensing images.