Accurate and timely crack assessment, covering the presence, location and geometric features of cracks, is crucial for evaluating concrete wind towers; early identification of cracks is therefore a critical step in promptly evaluating structural integrity. This study proposes an ad-hoc encoder–decoder network based on DeepLabv3+ with depthwise separable convolutions to automatically segment cracks from real-world images captured from various concrete wind towers. Combining the improved DeepLabv3+ with the lightweight MobileNet v2 base network makes the model suitable as a benchmark owing to its high performance and universality. Four experiments were conducted to determine the model design choices and the crack feature measurement capability: (1) six parametric tests using various pre-trained base networks and optimisers, (2) an investigation of the influence of complex background noise (i.e., handwriting script) on crack segmentation performance, (3) comparative studies with cutting-edge pixel-wise segmentation models and (4) crack feature measurement (i.e., length and width). The results demonstrate that DeepLabv3+ with MobileNet v2 can potentially be applied for efficient and accurate crack segmentation in concrete wind towers with complex backgrounds.
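
To illustrate why depthwise separable convolutions keep the network lightweight, the following minimal sketch compares the weight count of a standard convolution with that of its depthwise separable counterpart. It is not the model used in this study: the function names and the layer size (3×3 kernel, 256 input and 256 output channels) are assumptions chosen purely for illustration, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


def reduction_ratio(k: int, c_in: int, c_out: int) -> float:
    """Separable-to-standard weight ratio; algebraically 1/c_out + 1/k**2."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    k, c_in, c_out = 3, 256, 256
    print("standard conv weights:      ", standard_conv_params(k, c_in, c_out))
    print("depthwise separable weights:", depthwise_separable_params(k, c_in, c_out))
    print("ratio:                      ", round(reduction_ratio(k, c_in, c_out), 4))
```

For this assumed layer the separable form needs roughly 11.5% of the weights of the standard convolution, which is the kind of saving that makes a MobileNet v2 base network attractive for efficient segmentation.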