Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have achieved strong results in remote sensing image classification, and multispectral land cover classification in particular can benefit from the rich spectral information these models extract. This paper proposes a hierarchical convolutional recurrent neural network (HCRNN) that combines CNN and RNN modules for pixel-level classification of multispectral remote sensing images. In the HCRNN model, the original 13-band information from Sentinel-2 is transformed into a 1D multispectral sequence by a fully connected layer and then reshaped into a 3D multispectral feature matrix. 2D-CNN features are extracted from this matrix and used as inputs to the corresponding hierarchical RNN, with the feature information at each level adapted to the same convolution size. This structure leverages the advantages of both CNNs and RNNs to extract temporal and spatial features from the spectral data, enabling high-precision pixel-level classification of multispectral remote sensing images. Experimental results show that the overall accuracy of the HCRNN model on the Sentinel-2 dataset reaches 97.62%, an improvement of 1.78% over the RNN model. The study also examines changes in forest cover in Laibin City, Guangxi Zhuang Autonomous Region: the forest-covered area was 7997.1016 km², 8990.4149 km², and 8103.0020 km² in 2017, 2019, and 2021, respectively, an overall trend of a small increase.
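
The reported forest cover trend can be checked with simple arithmetic. The following Python sketch is not part of the study; it only recomputes the changes between the three reported years from the figures given above. The names `forest_area_km2` and `area_change` are illustrative and were chosen for this example.

```python
from decimal import Decimal

# Forest cover areas (km^2) for Laibin City, copied from the reported figures.
forest_area_km2 = {
    2017: Decimal("7997.1016"),
    2019: Decimal("8990.4149"),
    2021: Decimal("8103.0020"),
}


def area_change(start_year, end_year, areas=forest_area_km2):
    """Return (absolute change in km^2, relative change in %) between two years."""
    delta = areas[end_year] - areas[start_year]
    percent = delta / areas[start_year] * 100
    return delta, percent


# Compare each consecutive pair of years, then the full 2017-2021 span.
for start, end in [(2017, 2019), (2019, 2021), (2017, 2021)]:
    delta, percent = area_change(start, end)
    print(f"{start} -> {end}: {delta:+.4f} km^2 ({percent:+.2f}%)")
```

Under these figures, the area rises by 993.3133 km² from 2017 to 2019 and falls by 887.4129 km² from 2019 to 2021, leaving a net gain of 105.9004 km² (about 1.32%) over the whole period, which is consistent with the small overall increase stated above.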