This paper proposes an automatic visual inspection method for detecting fractures of clevises in the catenary systems of high-speed railways, using catenary images captured by an inspection vehicle. First, the clevises are extracted from the catenary image using a detector based on a convolutional neural network, the faster region-based convolutional neural network (Faster R-CNN). Because the structure of catenary systems varies little and the contextual information near a catenary fitting may correlate strongly with its category, the original Faster R-CNN architecture is modified to exploit the contextual information around each region of interest for object recognition. A crack detection process is then applied to recognize fractures in the extracted clevises. To detect the cracks, an edge map of the clevis sub-image is generated using a region-scalable fitting model. The areas where cracks are most likely to occur are projected from a standard clevis image onto the clevis sub-image by shape context matching followed by computation of an affine transformation matrix. The cracks are then recognized by calculating the wavelet entropy inside these areas and applying morphological filtering. Experimental results show that the modified Faster R-CNN architecture achieves better clevis extraction results than the original architecture and several other state-of-the-art object detection models. The detection is not affected by changes in the scale, texture, and grayscale of the clevises caused by variations in shooting distance, shooting angle, and illumination. The proposed method detects clevis fractures accurately and reliably, and its performance meets the strict requirements of catenary system maintenance.
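To make the wavelet entropy step concrete, the following is a minimal sketch of one common definition: the Shannon entropy of the relative energies of the wavelet sub-bands of an image patch. The abstract does not specify the wavelet, the number of decomposition levels, or the exact entropy formula, so the Haar wavelet, the default `levels=2`, and the function names `haar_dwt2` and `wavelet_entropy` are all illustrative assumptions rather than the method used in the paper.

```python
import numpy as np


def haar_dwt2(a):
    """One level of an orthonormal 2-D Haar transform (assumed wavelet).

    Returns the approximation band and the three detail bands.
    """
    # Crop to even height and width so that pixels can be paired.
    a = a[: a.shape[0] // 2 * 2, : a.shape[1] // 2 * 2]
    # Pair adjacent columns: scaled sum (low-pass) and difference (high-pass).
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    # Pair adjacent rows of each result in the same way.
    ll = (lo[0::2] + lo[1::2]) / np.sqrt(2)
    lh = (lo[0::2] - lo[1::2]) / np.sqrt(2)
    hl = (hi[0::2] + hi[1::2]) / np.sqrt(2)
    hh = (hi[0::2] - hi[1::2]) / np.sqrt(2)
    return ll, lh, hl, hh


def wavelet_entropy(patch, levels=2):
    """Shannon entropy of the relative sub-band energies of a patch.

    `patch` is a 2-D grayscale array at least 2**levels pixels on each side.
    """
    approx = np.asarray(patch, dtype=float)
    energies = []
    for _ in range(levels):
        approx, lh, hl, hh = haar_dwt2(approx)
        energies.extend([np.sum(lh**2), np.sum(hl**2), np.sum(hh**2)])
    energies.append(np.sum(approx**2))  # energy left in the coarsest band

    e = np.array(energies)
    total = e.sum()
    if total == 0:
        return 0.0  # an all-zero patch carries no energy
    p = e / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)))


# Synthetic example: a uniform patch versus one with a thin dark line.
flat = np.full((16, 16), 200.0)
cracked = flat.copy()
cracked[:, 7] = 20.0  # one-pixel-wide vertical "crack"
print(wavelet_entropy(flat), wavelet_entropy(cracked))
```

Under this definition a uniform patch concentrates all of its energy in the approximation band and scores zero, while a thin crack-like line moves energy into the detail bands and raises the entropy. A threshold on this value inside each projected area, which is also an assumption here, would then yield a binary map for the morphological filtering stage.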