Accurate diagnosis of compound faults in industrial robots can greatly benefit maintenance management. In the noisy environments where industrial robots actually operate, the mixed and weak fault features are easily overwhelmed, which poses a major challenge for compound fault diagnosis. Meanwhile, existing studies rely on large deep learning models to obtain adequate denoising and fault diagnosis performance. Such models demand high computational cost and large numbers of data samples, neither of which is always available. To address both challenges, this study proposes an integrated approach comprising two compact Transformer networks for accurate compound fault diagnosis of industrial robots. First, the feedback current signals collected from a six-axis industrial robot are transformed into time-frequency images via the continuous wavelet transform (CWT). Second, a novel deep learning algorithm, called compact Uformer, is proposed to denoise the time-frequency images. Third, the denoised time-frequency images are fed into a compact convolutional Transformer (CCT) for compound fault diagnosis. An experimental study was conducted on a real-world industrial robot compound fault dataset. The results show that, compared with state-of-the-art algorithms, the proposed method achieves satisfactory compound fault diagnosis accuracy on data collected in a noisy environment.
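
As an illustration of the first stage only (turning a current signal into a time-frequency image via the CWT), a minimal sketch is given below. It is not the implementation used in this study: the complex Morlet wavelet, the sampling rate `fs`, the frequency grid, the synthetic 50 Hz test signal, and the helper names `morlet_cwt` and `to_image` are all assumptions made for the example. The compact Uformer and CCT stages are not shown.

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Magnitude of an FFT-based CWT with a complex Morlet wavelet (assumed wavelet)."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    # Angular frequency of each FFT bin, in radians per sample.
    omega = 2.0 * np.pi * np.fft.fftfreq(n)
    sig_hat = np.fft.fft(signal)
    scalogram = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Fourier transform of the scaled wavelet, kept on positive frequencies only.
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        # Scale-dependent energy normalization.
        psi_hat = psi_hat * np.sqrt(2.0 * np.pi * s)
        # Multiplication in the frequency domain equals circular convolution in time.
        scalogram[i] = np.abs(np.fft.ifft(sig_hat * psi_hat))
    return scalogram

def to_image(scalogram):
    """Min-max scale a scalogram to [0, 1] so it can be treated as a grayscale image."""
    lo, hi = scalogram.min(), scalogram.max()
    return (scalogram - lo) / (hi - lo + 1e-12)

# Hypothetical stand-in for a feedback current signal: a 50 Hz tone plus noise.
fs = 1000.0                                   # assumed sampling rate in Hz
t = np.arange(1000) / fs
rng = np.random.default_rng(0)
current = np.sin(2.0 * np.pi * 50.0 * t) + 0.3 * rng.standard_normal(t.size)

# Map a grid of analysis frequencies (Hz) to Morlet scales (in samples).
freqs = np.linspace(10.0, 200.0, 64)
scales = 6.0 * fs / (2.0 * np.pi * freqs)

image = to_image(morlet_cwt(current, scales))      # shape: (64, 1000)
peak_freq = freqs[np.argmax(image.mean(axis=1))]   # row with the most energy
```

Each row of `image` corresponds to one analysis frequency and each column to one time sample. In the pipeline described above, an image of this kind would be the input to the denoising network.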