Contemporary deep learning techniques have substantially advanced hyperspectral image (HSI) classification. However, large-scale remote sensing data contain many irrelevant tokens, which makes long-range modeling inefficient. To address this problem, this study introduces the Group-Sensitive Selective Perception Transformer (GSAT), a framework built on the Vision Transformer (ViT) for HSI classification. GSAT makes three main contributions. First, a Group-Sensitive Pixel Group Mapping (PGM) module partitions pixels into distinct groups and lets the self-attention mechanism operate within each group, capturing local dependencies across spectral channels. This grouping strengthens the model's spatial awareness and reduces computational complexity. Second, a Sensitivity Selection Framework (SSF) module counters the harmful effect of redundant tokens by selecting the tokens most relevant to classification, which suppresses distracting information and strengthens the model's representations. The SSF also refines local representations through multi-scale feature selection, so that features are captured across several scales. Third, by combining global self-attention with local feature extraction, GSAT represents both the global and the local features of HSI data. This combination improves classification accuracy and makes the model more robust in complex scenes, particularly in urban mapping, where it substantially outperforms previous deep learning methods.
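As an illustration only, the two mechanisms described above can be sketched in NumPy. This is not the authors' implementation: `grouped_self_attention` and `select_top_k_tokens` are hypothetical names, groups are assumed to be contiguous, equal-sized runs of tokens, attention uses identity query/key/value projections instead of learned ones, and token relevance is assumed to be dot-product similarity to a query token with a fixed top-k cutoff. For n tokens split into G groups, within-group attention computes G·(n/G)² = n²/G scores instead of n², which is the kind of saving the grouping aims at.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_self_attention(tokens: np.ndarray, num_groups: int) -> np.ndarray:
    """Hypothetical stand-in for PGM-style grouping: attention is computed only
    among tokens in the same group. Assumes contiguous, equal-sized groups and
    identity Q/K/V projections (no learned weights)."""
    n, d = tokens.shape
    if n % num_groups != 0:
        raise ValueError("number of tokens must be divisible by num_groups")
    groups = tokens.reshape(num_groups, n // num_groups, d)   # (G, n/G, d)
    scores = groups @ groups.transpose(0, 2, 1) / np.sqrt(d)  # (G, n/G, n/G)
    out = softmax(scores, axis=-1) @ groups                   # (G, n/G, d)
    return out.reshape(n, d)


def select_top_k_tokens(tokens: np.ndarray, query: np.ndarray, k: int):
    """Hypothetical stand-in for SSF-style selection: keep the k tokens whose
    dot-product similarity to `query` is highest, in their original order."""
    scores = tokens @ query                    # (n,) relevance score per token
    idx = np.sort(np.argsort(scores)[-k:])     # indices of the k largest scores
    return tokens[idx], idx


# Usage: 16 tokens (e.g. pixels of a patch) with 8 spectral features each.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
y = grouped_self_attention(x, num_groups=4)
# Hypothetical query: the token of the centre pixel being classified.
kept, idx = select_top_k_tokens(y, y[8], k=5)
print(y.shape, kept.shape)  # (16, 8) (5, 8)
```

With 16 tokens in 4 groups, the sketch computes 4 × 4² = 64 attention scores rather than 16² = 256, and changing a token in one group leaves the outputs of every other group untouched.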
Through its group-sensitive and selective perception mechanisms, GSAT alleviates the inefficiency of traditional deep learning approaches on large remote sensing images and markedly improves HSI classification performance. Experiments on six standard HSI datasets confirm the superior performance of the proposed method, especially in urban mapping, where it exceeds prior deep learning techniques. GSAT thus offers a new perspective on hyperspectral image classification, and its group-sensitive pixel group mapping and selective perception mechanisms represent a significant advance in hyperspectral image processing that may support further progress in the field.