Identifying ships in Synthetic Aperture Radar (SAR) imagery is critical for effective maritime surveillance. Deep learning has significantly improved the accuracy of SAR ship classification and recognition. However, extracting features that distinguish ship categories in SAR images remains challenging, particularly as the number of categories increases, and high recognition accuracy depends on extracting and exploiting such discriminative features effectively. To address this, we propose DCN-MSFF-TR, a novel recognition model inspired by the Transformer encoder–decoder architecture. Our approach integrates a deformable convolution (DCN) module into the backbone network to enhance feature extraction. In addition, we apply the Transformer's self-attention at multiple scales of the feature hierarchy and fuse the resulting representations at appropriate levels using a feature pyramid strategy. Each layer can thus draw on both its own information and features synthesized from other layers, which strengthens the feature representation. Extensive evaluations on the OpenSARShip-3-Complex and OpenSARShip-6-Complex datasets demonstrate the effectiveness of our method: DCN-MSFF-TR achieves average recognition accuracies of 78.1% and 66.7% on the three-class and six-class datasets, respectively, outperforming existing recognition models.
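To make the multi-scale idea concrete, the following is a minimal NumPy sketch, not the DCN-MSFF-TR implementation. It assumes single-head self-attention applied independently at each pyramid level, random matrices standing in for learned projections, and a simple top-down fusion by nearest-neighbour upsampling and addition. The deformable convolution backbone is omitted, and all function names are hypothetical.

```python
import numpy as np


def self_attention(x, rng):
    """Single-head scaled dot-product self-attention over spatial positions.

    x has shape (C, H, W). The projection matrices are random stand-ins
    for learned weights (an assumption made for illustration only).
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                      # (N, C), N = H * W
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(c)                       # (N, N)
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    out = tokens + weights @ v                          # residual connection
    return out.T.reshape(c, h, w)


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fuse_top_down(levels):
    """Top-down pyramid fusion.

    levels is ordered fine to coarse; each level halves H and W and all
    levels share the same channel count (assumed here for simplicity).
    """
    fused = [levels[-1]]                                # start at the coarsest level
    for feat in reversed(levels[:-1]):
        fused.append(feat + upsample2x(fused[-1]))      # add upsampled coarser features
    return fused[::-1]                                  # back to fine-to-coarse order


def multi_scale_attention_fusion(levels, seed=0):
    """Apply self-attention at every scale, then fuse the results top-down."""
    rng = np.random.default_rng(seed)
    attended = [self_attention(f, rng) for f in levels]
    return fuse_top_down(attended)


# Toy pyramid with three scales and 8 channels.
demo_rng = np.random.default_rng(1)
pyramid = [demo_rng.standard_normal((8, s, s)) for s in (16, 8, 4)]
fused = multi_scale_attention_fusion(pyramid)
print([f.shape for f in fused])  # [(8, 16, 16), (8, 8, 8), (8, 4, 4)]
```

After fusion, each level keeps its own resolution while also carrying information propagated from the coarser levels, which is the sense in which a layer can use both its own features and those of other layers.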