The safe operation of mechanical equipment is a basic requirement of industrial manufacturing. Rolling bearings, a key component of such equipment, are prone to unforeseen failures because they run for long periods under complex conditions, and failures of any severity cause economic losses. Reliable fault diagnosis of rolling bearings is therefore crucial. To address the insufficient feature recognition of convolutional neural networks under strong background noise, a deep feature extraction network that integrates a multiscale convolutional neural network (MSCNN) and a bidirectional gated recurrent unit (BiGRU) is proposed. The MSCNN and the BiGRU extract multiscale features and temporal features, respectively, from noisy vibration signals, and an attention mechanism module assigns different weights to the fused features so that the important features are selected. Furthermore, to handle the mismatch in feature distribution between the source domain and the target domain under variable working conditions, transfer learning is introduced into the proposed network. The difference in feature distribution between the two domains is measured by a multi-level distance formula, and the result is added to the loss function, so that backpropagation of the loss aligns the feature distributions of the source and target domains. Finally, the model uses the softmax function as the classifier for rolling bearing fault diagnosis. Experimental comparison and analysis show that the proposed model transfers well across working conditions and achieves higher fault identification accuracy than the methods it is compared with.
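
The idea of adding a distribution distance to the loss can be illustrated with a small sketch. The abstract does not specify the multi-level distance formula, so the code below assumes, purely for illustration, a maximum mean discrepancy (MMD) with Gaussian kernels, computed at several feature levels and summed. The function names, the kernel bandwidths, and the `trade_off` weight are hypothetical and are not taken from the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from two batches."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of squared MMD, averaged over several bandwidths.

    Assumption: MMD stands in for the paper's unspecified distance formula.
    """
    total = 0.0
    for bw in bandwidths:
        total += (
            mean_kernel(source, source, bw)
            + mean_kernel(target, target, bw)
            - 2.0 * mean_kernel(source, target, bw)
        )
    return total / len(bandwidths)


def total_loss(classification_loss, level_features, trade_off=1.0):
    """Classification loss plus the distance summed over feature levels.

    level_features: list of (source_batch, target_batch) pairs, one pair
    per network level at which the two domains are compared.
    """
    alignment = sum(mmd_squared(src, tgt) for src, tgt in level_features)
    return classification_loss + trade_off * alignment


if __name__ == "__main__":
    # Toy 2-D features; a real model would use learned feature maps.
    source_batch = [[0.0, 0.0], [0.1, 0.1]]
    shifted_target = [[3.0, 3.0], [3.1, 3.1]]
    print("distance, same distribution:", mmd_squared(source_batch, source_batch))
    print("distance, shifted target:", mmd_squared(source_batch, shifted_target))
    print("total loss:", total_loss(0.7, [(source_batch, shifted_target)]))
```

In an actual training loop the same quantity would be computed on tensors inside an automatic-differentiation framework, so that minimizing the combined loss pulls the source and target feature distributions together while the classifier is trained on labeled source data.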