Abstract The three-phase rectifier is a core component of photovoltaic power generation systems, and its failure can reduce energy conversion efficiency and output quality. Existing diagnosis methods that operate directly on time-domain fault signals from three-phase rectifier circuits have proven difficult to improve further. To address this issue, this paper proposes CWT-ViT, a fault detection method for three-phase rectifier devices based on the Vision Transformer. The method converts time-domain fault signals into images using the Continuous Wavelet Transform (CWT) and feeds these images into a Vision Transformer model, whose self-attention mechanism and fully connected layers extract and learn the image features of rectifier faults. A Simulink simulation model of a three-phase controlled bridge rectifier circuit is built, and faults are injected into it to collect fault signals. Fault diagnosis experiments show that the proposed method achieves a prediction accuracy of 98.6%. It outperforms four established classification models: AlexNet, RepVGG, ResNet, and GoogLeNet. Ablation experiments further analyze the contribution of each module to the diagnosis process. More precise and efficient fault diagnosis can reduce downtime and maintenance costs for deployed equipment and improve the stability of photovoltaic power generation systems, making the method a candidate solution for the intelligent operation and maintenance of photovoltaic power generation.
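The signal-to-image step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the mother wavelet, the scale range, or the image size, so the complex Morlet wavelet with w0 = 6, the four analysis frequencies, the 1 kHz sampling rate, and the 50 Hz test tone are all assumptions chosen for illustration, and the function names `morlet`, `cwt_scalogram`, and `to_grayscale` are hypothetical.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (assumed choice; small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, scales, dt=1.0, w0=6.0):
    """Return |CWT| as a list of rows, one row per scale.

    Uses direct convolution, which is fine for a short illustration
    but too slow for real datasets.
    """
    n = len(signal)
    rows = []
    for s in scales:
        # Truncate the wavelet at about +/- 4 scale widths.
        half = int(math.ceil(4.0 * s / dt))
        # Samples of conj(psi(k * dt / s)) / sqrt(s) for k in [-half, half].
        kernel = [morlet(k * dt / s, w0).conjugate() / math.sqrt(s)
                  for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            acc = 0j
            for j, kv in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:  # zero padding outside the record
                    acc += signal[idx] * kv
            row.append(abs(acc) * dt)
        rows.append(row)
    return rows


def to_grayscale(rows):
    """Scale magnitudes to integers in 0..255 so the scalogram can be stored as an image."""
    peak = max(max(r) for r in rows) or 1.0
    return [[int(round(255 * v / peak)) for v in r] for r in rows]


# Illustrative input: a 50 Hz sine sampled at 1 kHz (assumed values, not simulation data).
fs = 1000.0
signal = [math.sin(2 * math.pi * 50.0 * i / fs) for i in range(400)]

# For the Morlet wavelet, scale s corresponds to a frequency of about w0 / (2 * pi * s).
frequencies = [25.0, 50.0, 100.0, 200.0]
scales = [6.0 / (2 * math.pi * f) for f in frequencies]

scalogram = cwt_scalogram(signal, scales, dt=1.0 / fs)
image = to_grayscale(scalogram)  # one row per scale, one column per time sample
```

In a full pipeline, an image like this would typically be resized to the input resolution of the Vision Transformer before classification; that resizing step and the resolution are likewise assumptions here.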