Semantic segmentation plays an important role in very high resolution (VHR) image understanding. Despite the potentials of the deep convolutional network in improving performance by end-to-end feature learning, each model has its limitations, and it is hard to discriminate complex features purely by a single model. Ensemble learning is promising for integrating the strengths of different models, however, the ensemble of deep models is challenging due to the huge amount of parameters and computation of the deep model itself as well as the difficulty in capturing complementarity between different models. To tackle these problems, a head-level ensemble network (HENet) is proposed in this letter, which reduces model complexity by sharing feature extraction networks and improves complementarity between models by novel cooperative learning (CL). Experiments on ISPRS 2-D semantic labeling benchmark demonstrate the effectiveness and advantage of the proposed method.