Sound source localization has become one of the main methods for locating fault sources in applications where the fault is dangerous to approach or difficult to detect. To locate a target fault sound source more intuitively, accurately, and in real time in practical applications, an audio-visual localization system is developed. The system is built around a 64-channel microphone array, with the microphones evenly distributed on a circle, and integrates video acquisition. It uses GCC-PHAT to estimate the relative time delay of the source signal between each pair of microphones, and then applies the SRP-PHAT algorithm to obtain the weighted-sum beamformer output. A spatial grid search strategy based on a spherical-coordinate spatial contraction method is proposed, in which the direction whose beam yields the maximum output power is taken as the sound source direction. Using the same source signal, the proposed method is compared with the direct Newton iterative method and with an SRP maximum search that scans Cartesian coordinates. The simulation results show that the position estimated by the proposed algorithm, in both Cartesian and spherical coordinates, is closer to the true source position than the estimates of the other two methods, and that its computation time is the shortest. In addition, in a field experiment on locating a corona discharge source, the system accurately locates the pollution discharge point on an insulator; the result is consistent with infrared detection and localization, and the system shows good robustness. The proposed localization method therefore offers a new option for sound source localization applications.
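
The GCC-PHAT delay estimate mentioned above can be illustrated for a single microphone pair. The following is a minimal sketch of the textbook technique, not the authors' implementation: the names `dft`, `gcc_phat`, `fs`, and `eps` are hypothetical, the delay is resolved only to whole samples, and a naive DFT is used purely to keep the example dependency-free (a practical system would presumably use an FFT routine).

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short demo frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, fs, eps=1e-12):
    """Estimate the delay of `sig` relative to `ref`, in seconds.

    A positive result means `sig` arrives later than `ref`.
    `fs` is the sampling rate in Hz (an assumed parameter).
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    a = list(sig) + [0.0] * (n - len(sig))
    b = list(ref) + [0.0] * (n - len(ref))
    spec_a, spec_b = dft(a), dft(b)

    # Cross-power spectrum, normalized by its magnitude (the PHAT weighting),
    # so that only phase information contributes to the correlation peak.
    cross = [x * y.conjugate() for x, y in zip(spec_a, spec_b)]
    weighted = [c / (abs(c) + eps) for c in cross]

    # Back to the lag domain; the peak index gives the delay in samples.
    cc = [v.real for v in dft(weighted, inverse=True)]
    peak = max(range(n), key=lambda i: cc[i])
    lag = peak if peak < n // 2 else peak - n  # map upper half to negative lags
    return lag / fs


if __name__ == "__main__":
    fs = 16000                      # assumed sampling rate, Hz
    ref = [0.0] * 32
    ref[3] = 1.0                    # impulse at sample 3
    sig = [0.0] * 32
    sig[8] = 1.0                    # same impulse, 5 samples later
    print("estimated delay (s):", gcc_phat(sig, ref, fs))
```

In this toy case the two impulses are offset by 5 samples, so the estimate corresponds to 5 / `fs` seconds. In an SRP-PHAT setting, correlations of this kind would be computed for many microphone pairs and summed at the lags implied by each candidate direction; that accumulation step is not shown here.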