In recent years, grassland monitoring has shifted from traditional field surveys to remote-sensing-based methods, but the desired level of accuracy has not yet been obtained. Multi-temporal hyperspectral data contain valuable information about species and growth season differences, making it a promising tool for grassland classification. Transformer networks can directly extract long-sequence features, which is superior to other commonly used analysis methods. This study aims to explore the transformer network's potential in the field of multi-temporal hyperspectral data by fine-tuning it and introducing it into high-powered grassland detection tasks. Subsequently, the multi-temporal hyperspectral classification of grassland samples using the transformer network (MHCgT) is proposed. To begin, a total of 16,800 multi-temporal hyperspectral data were collected from grassland samples at different growth stages over several years using a hyperspectral imager in the wavelength range of 400-1000 nm. Second, the MHCgT network was established, with a hierarchical architecture, which generates a multi-resolution representation that is beneficial for grass hyperspectral time series' classification. The MHCgT employs a multi-head self-attention mechanism to extract features, avoiding information loss. Finally, an ablation study of MHCgT and comparative experiments with state-of-the-art methods were conducted. The results showed that the proposed framework achieved a high accuracy rate of 98.51% in identifying grassland multi-temporal hyperspectral which outperformed CNN, LSTM-RNN, SVM, RF, and DT by 6.42-26.23%. Moreover, the average classification accuracy of each species was above 95%, and the August mature period was easier to identify than the June growth stage. Overall, the proposed MHCgT framework shows great potential for precisely identifying multi-temporal hyperspectral species and has significant applications in sustainable grassland management and species diversity assessment.