The development of Deep Neural Networks (DNNs) has driven profound advances in computer vision. Despite their substantial capabilities, DNNs are susceptible to adversarial attacks, which can introduce significant errors and hinder their deployment in real-world scenarios. Black-box attacks rely only on a model's input–output mapping, without access to its parameters or gradients; because of the disparities between the substitute and target models, they typically exhibit reduced success rates. To address these challenges, we propose the Frequency Domain Transformation (FDT) method, which employs the Discrete Cosine Transform (DCT) to map the input image into the frequency domain and innovatively applies a grid mask there to generate adversarial examples. This transformation diminishes the spatial correlation among image pixels and offers a fresh perspective for enhancing the transferability of adversarial examples. Experiments on the ImageNet dataset show that adversarial examples crafted with frequency-domain transformations transfer better than those produced by spatial-domain transformations. In single-model attacks on six classification models, FDT achieved an average success rate of 82.9%, a 14.4% improvement over the spectrum simulation attack (SSA) method. In ensemble-model attacks with momentum-based techniques, the average success rate of the proposed strategy was 94.4%, a 5.7% improvement over SSA.
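The core transformation described above can be sketched as follows. This is an illustrative approximation, not the authors' exact FDT implementation: the function names (`frequency_domain_transform`, `grid_mask`), the 2-D DCT construction via separable 1-D transforms, and the grid cell size and keep probability are all assumptions introduced here for clarity.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(x):
    # 2-D DCT (type-II, orthonormal) as two separable 1-D transforms.
    return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(X):
    # Inverse 2-D DCT, recovering the spatial-domain signal.
    return idct(idct(X, axis=0, norm="ortho"), axis=1, norm="ortho")

def grid_mask(h, w, cell=8, keep_prob=0.5, rng=None):
    # Random binary mask over a coarse grid of `cell` x `cell` blocks,
    # upsampled to (h, w); cell size and keep probability are assumed here.
    rng = np.random.default_rng(rng)
    gh, gw = -(-h // cell), -(-w // cell)  # ceil division
    coarse = (rng.random((gh, gw)) < keep_prob).astype(np.float64)
    return np.kron(coarse, np.ones((cell, cell)))[:h, :w]

def frequency_domain_transform(image, cell=8, keep_prob=0.5, rng=None):
    """Map an H x W x C image into the DCT domain, apply a grid mask to
    its spectrum, and transform back to the spatial domain. The masked
    result would then feed a gradient-based attack on a substitute model."""
    h, w, c = image.shape
    mask = grid_mask(h, w, cell, keep_prob, rng)
    out = np.empty_like(image, dtype=np.float64)
    for ch in range(c):
        spectrum = dct2(image[..., ch])   # forward DCT per channel
        out[..., ch] = idct2(spectrum * mask)  # mask, then invert
    return out
```

Because the orthonormal DCT is invertible, setting `keep_prob=1.0` (an all-ones mask) reconstructs the input exactly; smaller keep probabilities zero out random frequency blocks, producing the diversified inputs from which transferable adversarial perturbations are computed.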