Maize is susceptible to pest infestation, and without accurate early detection, maize production can decline significantly. Hyperspectral imaging is well suited to precise disease detection because it captures information related to the internal chemical characteristics of vegetation. However, the large amount of redundant information in hyperspectral data makes it difficult to extract significant features. To address these problems, this study proposes an attention-based spatial-spectral joint network model for the hyperspectral detection of pest-infected maize. The model combines 3D and 2D convolutional layers that extract features from both the spatial and spectral domains, improving its ability to discriminate between classes in hyperspectral images. The model also embeds an attention mechanism that emphasizes informative spatial and spectral-wise information, which improves feature representation and strengthens the model's feature extraction. Experimental results demonstrate the effectiveness of the proposed model across different field scenarios, with overall accuracies (OAs) of 99.24% and 97.4% on close-up and middle-shot hyperspectral images, respectively. Even with limited training data, the proposed model outperforms the other models compared and achieves OAs of 98.29% and 92.18%. These results confirm the validity of the proposed model and show that it detects pest-infected maize efficiently. The proposed model has the potential to be deployed on mobile platforms such as field robots to monitor and detect infected maize automatically.
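The abstract does not specify the form of the attention mechanism. The sketch below illustrates the general idea of weighting "spectral-wise" and spatial information in a hyperspectral patch, assuming a squeeze-and-excitation style band gate followed by a simple parameter-free spatial gate. The function names, the reduction ratio `r`, the patch size, and the random weights are all hypothetical choices for illustration; this is not the authors' architecture, and in a trained network the weights would be learned alongside the 3D and 2D convolutional layers.

```python
import numpy as np


def spectral_attention(cube, w1, w2):
    """Reweight the bands of a hyperspectral patch of shape (H, W, B).

    Hypothetical squeeze-and-excitation style gate: global average pooling
    over space, a two-layer bottleneck, then a sigmoid weight per band.
    """
    squeezed = cube.mean(axis=(0, 1))            # (B,) one descriptor per band
    hidden = np.maximum(0.0, w1 @ squeezed)      # (B // r,) ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # (B,) band weights in (0, 1)
    return cube * gate, gate                     # broadcast over H and W


def spatial_attention(cube):
    """Hypothetical spatial gate: sigmoid of the per-pixel mean over bands.

    A simplification; practical spatial attention usually applies a learned
    convolution to pooled maps before the sigmoid.
    """
    pooled = cube.mean(axis=2, keepdims=True)    # (H, W, 1) per-pixel summary
    gate = 1.0 / (1.0 + np.exp(-pooled))         # (H, W, 1) pixel weights
    return cube * gate, gate                     # broadcast over the band axis


# Toy 9 x 9 patch with 30 bands; random values stand in for real reflectance.
rng = np.random.default_rng(0)
H, W, B, r = 9, 9, 30, 4
cube = rng.random((H, W, B))
w1 = rng.standard_normal((B // r, B)) * 0.1      # hypothetical learned weights
w2 = rng.standard_normal((B, B // r)) * 0.1

out, band_w = spectral_attention(cube, w1, w2)
out, pix_w = spatial_attention(out)
print(out.shape, band_w.shape, pix_w.shape)
```

Applying the spectral gate first and the spatial gate second is one possible ordering; the output keeps the input shape, so such a block can sit between convolutional stages without changing the rest of the network.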