Unmanned aerial vehicles (UAVs) with high-resolution optical and infrared (IR) imaging have been introduced in recent years to perform inexpensive and fast inspections in the operation and maintenance of solar power plants, reducing both the labor required and the on-site inspection time. Even though UAVs can acquire images extremely quickly, the analysis of those images is still a time-consuming procedure that should be performed by a trained professional. A computer vision approach may therefore be used to accelerate image analysis. In this work, a dataset of IR images was created from a 10-MW solar power plant, and a comparative analysis between the mask region-based convolutional neural network (mask R-CNN) and U-Net was performed in two experiments. For defective module segmentation, the mask R-CNN algorithm achieved a mean average precision of 0.96 at an intersection over union (IoU) threshold of 0.50, using data augmentation. For the segmentation and classification of failure type, the algorithm reached 0.88 with the same evaluation metric and data augmentation. When compared with U-Net in terms of IoU, the mask R-CNN performed better, reaching 0.87 and 0.83 in the first and second experiments, respectively.
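As a rough illustration of the IoU metric referred to above, the following minimal sketch computes the IoU between a predicted and a reference binary mask and checks it against the 0.50 threshold. It is not the evaluation code used in this work: the function names `mask_iou` and `matches_at_threshold`, the nested-list mask representation, the toy masks, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
def mask_iou(pred, target):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return intersection / union


def matches_at_threshold(pred, target, threshold=0.50):
    """True if the predicted mask overlaps the reference with IoU >= threshold."""
    return mask_iou(pred, target) >= threshold


# Toy 4x4 example (hypothetical data): the reference covers 4 pixels,
# the prediction covers the same 4 pixels plus 2 extra ones.
target = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou = mask_iou(pred, target)  # intersection = 4, union = 6
print(f"IoU = {iou:.3f}, match at 0.50: {matches_at_threshold(pred, target)}")
```

In a mean average precision computation at IoU = 0.50, a check of this kind decides whether a predicted instance counts as a true positive; the precision-recall integration over ranked detections is omitted here.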