Hyperspectral image (HSI) quality is degraded during sensor imaging by a mixture of Gaussian noise, impulse noise, stripes, and dead lines, which weakens performance in downstream applications. To improve HSI quality, methods based on convolutional neural networks (CNNs) have been widely applied to restore clean data from observed data. However, the architectures of these methods lack spectral and spatial constraints, and their convolution operators have limited receptive fields and inflexible inference. Thus, in this study, we propose an efficient end-to-end transformer, named HSI denoising transformer (Hider), for mixed HSI noise removal. First, a U-shaped 3-D transformer architecture is built for multiscale feature aggregation. Second, a multihead global spectral attention module within the spectral transformer block is designed to extract information from different spectral patterns. Finally, an additional locally enhanced cross-spatial attention module within the spatial-spectral transformer block is constructed to model long-range spatial relationships while avoiding the high computational complexity of global spatial self-attention. By imposing global spectral correlation and spatial self-similarity constraints on the transformer, Hider aims to capture long-range spatial contextual information and cluster objects with the same spectral pattern for HSI denoising. To verify the effectiveness and efficiency of Hider, we conducted extensive experiments on simulated and real data. The denoising results on both simulated and real-world datasets show that Hider achieves better quantitative metrics and visual quality than other state-of-the-art methods.
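To make the idea of attention along the spectrum concrete, the following is a minimal NumPy sketch of multihead self-attention computed over the spectral (band) axis. It assumes features are laid out as (bands, pixels, channels) and that every pixel attends over all bands. The function name `spectral_mhsa`, the projection matrices, and the tensor layout are illustrative assumptions, not Hider's actual implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spectral_mhsa(x, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical multihead self-attention along the spectral axis.

    x: array of shape (bands, pixels, channels).
    Each pixel attends over all bands, so each attention map is (bands x bands).
    Returns the projected output (bands, pixels, channels) and the attention
    weights (pixels, heads, bands, bands).
    """
    d, n, c = x.shape
    assert c % num_heads == 0, "channels must be divisible by num_heads"
    hd = c // num_heads

    # Linear projections to queries, keys, and values: (d, n, c).
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Split channels into heads: (d, n, c) -> (n, heads, d, hd).
    def split(t):
        return t.reshape(d, n, num_heads, hd).transpose(1, 2, 0, 3)

    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention over the band axis: (n, heads, d, d).
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(hd), axis=-1)
    out = attn @ v  # (n, heads, d, hd)

    # Merge heads back into channels: (n, heads, d, hd) -> (d, n, c).
    out = out.transpose(2, 0, 1, 3).reshape(d, n, c)
    return out @ w_o, attn


# Usage with random features: 31 bands, 64 pixels (e.g., an 8x8 patch), 16 channels.
rng = np.random.default_rng(0)
bands, pixels, channels, heads = 31, 64, 16, 4
x = rng.standard_normal((bands, pixels, channels))
weights = [rng.standard_normal((channels, channels)) / np.sqrt(channels) for _ in range(4)]
y, attn = spectral_mhsa(x, *weights, num_heads=heads)
print(y.shape, attn.shape)  # (31, 64, 16) (64, 4, 31, 31)
```

In this sketch each attention map has size bands x bands per pixel and head, so the total cost grows linearly with the number of pixels. Global spatial self-attention, by contrast, would build a (pixels x pixels) map whose cost grows quadratically with image size, which is the complexity concern the locally enhanced cross-spatial attention module is meant to avoid.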