Manual inspection of cracks in the glass curtains of high-rise buildings is inefficient, subjective, and potentially life-threatening for both inspectors and pedestrians. Advances in deep-learning-based image instance segmentation and in compact unmanned aerial vehicles (UAVs) make automatic, accurate, and safe glass curtain crack inspection achievable. A novel approach is proposed that segments glass curtain cracks efficiently and accurately by combining a convolutional neural network (CNN), region of interest alignment (RoIAlign), and an iterative up-sampling method. The proposed approach achieves an accuracy of 99.89%, a precision of 91.1%, a recall of 72.1%, an F1-score of 79.7%, and a mean intersection over union (mIoU) of 78.7%. Comparative studies examine its performance relative to SOLO, Swin, Mask R-CNN, and ConvNeXt V2. The results show that the proposed network outperforms these networks and detects and segments a variety of crack types more accurately in the complex reflective environments of glass curtains.
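For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and mIoU are commonly computed from a predicted and a ground-truth binary crack mask. It uses the standard pixel-wise definitions and is not taken from the paper: the function name `segmentation_metrics`, the toy masks, and the choice to average IoU over the crack and background classes are all assumptions, and the authors' exact evaluation protocol (for example, per-image versus dataset-level averaging) may differ.

```python
import numpy as np


def segmentation_metrics(pred, target):
    """Pixel-wise metrics for binary masks (1 = crack, 0 = background).

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    # Confusion-matrix counts over all pixels.
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    tn = int(np.sum(~pred & ~target))

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero for empty classes.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)

    # Assumption: mIoU is the mean of the crack and background IoU.
    iou_crack = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    miou = (iou_crack + iou_background) / 2

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miou": miou,
    }


# Toy 4x4 example: a vertical crack with one missed and one spurious pixel.
target = np.array([[0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 1, 0, 0]])
pred = np.array([[0, 1, 0, 0],
                 [0, 1, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0]])

metrics = segmentation_metrics(pred, target)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy (0.875) is higher than precision and recall (both 0.75) because background pixels dominate the mask. The same class imbalance is one plausible reason a crack segmentation model can report very high accuracy alongside noticeably lower recall.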