Hyperspectral image classification (HSIC) is a rapidly developing field that utilizes deep learning methods. However, the reliance on convolutional neural networks (CNNs) for spectral–spatial feature extraction presents certain limitations. Specifically, the fixed-position convolutional kernels in CNNs hinder their ability to fully capture the spectral information around the central spatial pixel, so critical differences between features are overlooked. To address this issue, a decoupled image- and frequency-domain spectral–spatial framework for HSIC was developed in this study. The method incorporates multiscale learnable convolutional attention in both the image and frequency domains to refine the discriminative features of the different feature distributions. Additionally, a novel frequency-domain information enhancement module was designed to extract structural shape and texture details under the semantic constraints of the frequency phase, complementing the image domain to improve the extracted feature maps. Furthermore, a simple and efficient hierarchical feature representation module was introduced to effectively extract both local and global information from the fused features. Experimental results obtained on three open datasets and a real hyperspectral image acquired by the Gaofen-5 satellite demonstrate that the proposed method outperforms other state-of-the-art HSIC methods.
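To make the notion of frequency-domain amplitude and phase concrete, the following is a minimal NumPy sketch of splitting a hyperspectral patch into per-band Fourier amplitude and phase and recombining them. It is an illustration under stated assumptions, not the enhancement module proposed in this study: the function names, the 9 × 9 × 30 patch size, and the random input are all hypothetical.

```python
import numpy as np

def amplitude_phase(patch):
    """Split a (H, W, bands) patch into per-band 2-D FFT amplitude and phase."""
    spectrum = np.fft.fft2(patch, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)

def recombine(amplitude, phase):
    """Map an amplitude/phase pair back to the image domain."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.fft.ifft2(spectrum, axes=(0, 1)).real

def phase_only(patch):
    """Reconstruct with unit amplitude so that only phase information remains."""
    _, phase = amplitude_phase(patch)
    return recombine(np.ones_like(phase), phase)

# Hypothetical input: a random 9x9 spatial patch with 30 spectral bands.
rng = np.random.default_rng(0)
patch = rng.random((9, 9, 30))

amp, pha = amplitude_phase(patch)
restored = recombine(amp, pha)   # recovers the patch up to floating-point error
structure = phase_only(patch)    # phase-only reconstruction of the same shape
```

The decomposition is lossless, so the amplitude and phase can be processed separately and then recombined. A phase-only reconstruction is a common way to inspect how much spatial structure the phase alone retains.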