Capsule networks, which have achieved notable success in machine vision, have also attracted wide attention in machinery fault diagnosis. However, because mechanical vibration signals are non-stationary and diverse, and because different fault features exhibit dual-scale characteristics, existing single-scale capsule networks often fail to fully mine the vital discriminative features in the data, which hinders accurate identification of mechanical faults. This paper proposes an attention-based dual-scale feature fusion capsule network for mechanical fault diagnosis. First, dual-scale features are extracted from grayscale images obtained from vibration signals, using convolutional layers with kernels of different sizes. Second, an attention-based two-branch network is designed to calculate the weights of the features at each scale, and the dual-scale features are fused according to these weights. Finally, the fused features are fed into the capsule layers, and the model is optimized with a classification loss and a reconstruction loss to classify and identify mechanical faults. The proposed method is evaluated on a rolling bearing experimental dataset and a motor fault dataset, and the comparison results confirm its effectiveness and superiority, indicating that it has the potential to be a useful tool for detecting mechanical faults.
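
The attention-weighted fusion step described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: the function name `fuse_dual_scale`, the scoring vectors `w_small` and `w_large`, the use of global average pooling, and the softmax over the two branches are all assumptions made for illustration, and the feature maps are random stand-ins for the outputs of the small-kernel and large-kernel convolutional branches.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_dual_scale(feat_small, feat_large, w_small, w_large):
    """Fuse two (C, H, W) feature maps with one attention weight per scale.

    w_small and w_large are hypothetical (C,) scoring vectors standing in
    for the learned parameters of the two attention branches.
    """
    # Global average pooling: one descriptor value per channel, shape (C,)
    desc_small = feat_small.mean(axis=(1, 2))
    desc_large = feat_large.mean(axis=(1, 2))
    # Each branch maps its descriptor to a single scalar score
    scores = np.array([desc_small @ w_small, desc_large @ w_large])
    # Softmax across the two scales gives weights that sum to 1
    weights = softmax(scores)
    # Weighted sum of the two scales gives the fused feature map
    fused = weights[0] * feat_small + weights[1] * feat_large
    return fused, weights

# Random stand-ins for the two convolutional branch outputs (assumed shapes)
rng = np.random.default_rng(0)
feat_small = rng.standard_normal((8, 16, 16))
feat_large = rng.standard_normal((8, 16, 16))
w_small = rng.standard_normal(8)
w_large = rng.standard_normal(8)

fused, weights = fuse_dual_scale(feat_small, feat_large, w_small, w_large)
print(fused.shape, weights)
```

In this sketch the fused map keeps the shape of the branch outputs, so it could be passed to the capsule layers unchanged. A per-channel or per-position weighting would be an equally plausible reading of the fusion step.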