Purpose Current visualization-based software defect prediction suffers from large training sample sizes, low-quality data samples, inefficient classical models, attention mechanisms with high computational complexity and high graphics processing unit (GPU) occupancy. To address these problems, this study proposes a software defect prediction method termed recurrent criss-cross attention for weighted activation functions of recurrent SE-ResNet (RCCA-WRSR). First, following code visualization, the activation functions of the SE-ResNet model are replaced with a weighted combination of ReLU and ELU to improve model convergence. Additionally, an SE module is added before it to filter feature information, eliminating low-weight features and yielding an improved residual network model, WRSR. To focus more on contextual information and to establish connections between a pixel and the pixels outside its criss-cross path, the visualized RGB images are fed into a model that incorporates a fused RCCA module for defect prediction.
Design/methodology/approach Software defect prediction based on code visualization is a recent technique that visualizes code as images and then applies an attention mechanism to extract image features for defect prediction. However, current visualization-based defect prediction faces several challenges: the training sample size is large and the sample quality is low, the classical models in use are inefficient, and existing attention mechanisms have high computational complexity and high GPU occupancy.
Findings Experiments on ten open-source Java data sets from PROMISE, with comparison against five existing methods, show that the proposed approach achieves an F-measure of 0.637 in predicting 16 cross-version projects, representing a 6.1% improvement.
Originality/value RCCA-WRSR is a new visualization-based software defect prediction method built on recurrent criss-cross attention and an improved residual network. The method effectively enhances the performance of software defect prediction.
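Illustrative sketch The weighted combination of ReLU and ELU mentioned under Purpose can be pictured with a minimal scalar example. The abstract does not give the weighting scheme, so the convex form w * ReLU(x) + (1 - w) * ELU(x), the weight `w`, the ELU parameter `alpha`, and the function name `weighted_relu_elu` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def elu(x: float, alpha: float = 1.0) -> float:
    """Standard ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def weighted_relu_elu(x: float, w: float = 0.5, alpha: float = 1.0) -> float:
    """Hypothetical weighted activation: w * ReLU(x) + (1 - w) * ELU(x).

    The weight w and its default value are assumptions; the source only
    states that ReLU and ELU are combined with weights.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * relu(x) + (1.0 - w) * elu(x, alpha)


if __name__ == "__main__":
    # Positive inputs pass through unchanged; negative inputs keep a small,
    # bounded negative response contributed by the ELU term.
    for value in (-2.0, -0.5, 0.0, 1.5):
        print(value, weighted_relu_elu(value, w=0.5))
```

Under this assumed form, setting `w = 1.0` recovers plain ReLU and `w = 0.0` recovers plain ELU, so the weight controls how much negative-side response the activation retains.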