Knowledge Distillation (KD) aims to distill the dark knowledge of a high-capacity teacher network into a student network, which improves the capacity of the student network and has been successfully applied to semantic segmentation. However, standard knowledge distillation approaches represent the teacher's supervisory signal only as dark knowledge, while ignoring the impact of network architecture during distillation. In this paper, we find that a student network whose architecture is more similar to that of the teacher network obtains a larger performance gain from distillation. Therefore, a more general paradigm for knowledge distillation is to distill both the soft labels and the structure of the teacher network. We propose a novel Structural Distillation (SD) method that introduces structural similarity constraints into vanilla knowledge distillation. We leverage Neural Architecture Search (NAS) to search for an optimal student architecture for semantic segmentation in a well-designed search space, so that the student mimics the given teacher in terms of both soft labels and network structure. Experimental results show that the proposed method outperforms both NAS with conventional knowledge distillation and human-designed methods, and achieves state-of-the-art performance on the Cityscapes dataset under various platform-aware latency constraints. Furthermore, the best architecture discovered on Cityscapes also transfers well to the PASCAL VOC 2012 dataset.
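
As a purely illustrative sketch of the combined objective described above (the abstract does not give the exact formulation, so the loss terms, encodings $A_S, A_T$, and weights $\lambda_{\mathrm{KD}}, \lambda_{\mathrm{SD}}$ below are assumptions for exposition), the student would be trained with the segmentation loss plus a soft-label distillation term and a structural similarity term against the teacher:

\[
\mathcal{L}_{\text{student}}
  = \mathcal{L}_{\text{seg}}\big(y, p_S\big)
  + \lambda_{\mathrm{KD}}\, \mathcal{L}_{\mathrm{KD}}\big(p_T, p_S\big)
  + \lambda_{\mathrm{SD}}\, \mathcal{L}_{\mathrm{struct}}\big(A_S, A_T\big),
\]

where $p_S$ and $p_T$ denote the student's and teacher's soft predictions, $A_S$ and $A_T$ denote encodings of the student and teacher architectures compared by the structural similarity constraint, and $\lambda_{\mathrm{KD}}, \lambda_{\mathrm{SD}}$ are trade-off weights (all hypothetical notation).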