Multi-scale defect detection on bearing surfaces is challenging because defects vary in size and shape and because the backgrounds contain character-induced noise. Introducing attention mechanisms into convolutional neural networks has been shown to often improve feature extraction. However, most existing detectors apply attention at only one level of abstraction, which limits their ability to extract all defect features. To address this issue, and inspired by natural language processing, we present a novel visual hierarchical attention detector for multi-scale defect localization and classification. The detector exploits the texture, semantic, and instance features of defects through a hierarchical attention mechanism, enabling multi-scale defect detection in bearing images with complex backgrounds. First, a multi-attentional backbone uses spatial attention mechanisms to extract the spatial and semantic information of defects. Second, a feature pyramid network based on semantic alignment and channel attention aligns the semantic information of the extracted features and concentrates on the important layers of the feature pyramid. Third, a region proposal network based on semantic-guided anchoring uses instance-level attention to refine the anchors. Furthermore, the detector is optimized with the distance-intersection over union (DIoU) loss to improve defect localization. Experiments on an industrial bearing surface defect image dataset show that the proposed method outperforms the baseline by 6.03% in mean average precision (mAP) and 8.76% in mean average recall (mAR), achieving 0.7838 mAP and 0.9588 mAR, respectively.
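For readers unfamiliar with the distance-intersection over union loss mentioned above, the following is a minimal sketch of its standard form, 1 - IoU + d^2 / c^2, where d is the distance between the two box centres and c is the diagonal of the smallest box enclosing both. The function name `diou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabiliser are assumptions made for illustration; the abstract does not describe how the authors implement the loss.

```python
def diou_loss(box_a, box_b, eps=1e-9):
    """Illustrative DIoU loss for two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    This is a sketch of the standard formula, not the authors' code.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)

    # Squared distance between the two box centres.
    dx = (ax1 + ax2) / 2.0 - (bx1 + bx2) / 2.0
    dy = (ay1 + ay2) / 2.0 - (by1 + by2) / 2.0
    center_dist_sq = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both inputs.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    diag_sq = enc_w * enc_w + enc_h * enc_h

    return 1.0 - iou + center_dist_sq / (diag_sq + eps)


if __name__ == "__main__":
    predicted = (10.0, 10.0, 50.0, 40.0)
    ground_truth = (12.0, 14.0, 48.0, 44.0)
    print(diou_loss(predicted, ground_truth))
```

Unlike a plain IoU loss, the centre-distance term still gives a useful signal when the predicted and ground-truth boxes do not overlap, which is one reason this loss is often chosen for localization.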