Urban green spaces are an indispensable part of urban ecology: they act as the city’s “purifier” and play a crucial role in sustainable urban development. Fine-grained classification of urban green spaces is therefore an important task in urban planning and management. Traditional methods for this task rely heavily on expert knowledge and often require substantial time and cost. To address this, our study presents a multi-label image classification model based on MobileViT. The model integrates a Triplet Attention module and an LSTM module to improve label prediction while remaining lightweight enough for standalone operation on mobile devices. Experiments on our UGS dataset show that the proposed approach outperforms the baseline by 1.64%, 3.25%, 3.67%, and 2.71% in mAP, F1, precision, and recall, respectively. This indicates that the model can uncover latent dependencies among labels and thereby improve multi-label image classification performance. The study provides a practical solution for the intelligent, fine-grained classification of urban green spaces, which is of significant value for their management and planning.
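To make the reported metrics concrete, the following minimal sketch shows one common way to compute precision, recall, and F1 for multi-label predictions. It assumes micro-averaging over all image-label pairs and a decision threshold of 0.5; the abstract does not state which averaging scheme or threshold the study uses, and the function name `multilabel_prf` and the toy data are hypothetical.

```python
from typing import List, Tuple


def multilabel_prf(
    y_true: List[List[int]],
    y_score: List[List[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all image-label pairs."""
    tp = fp = fn = 0
    for truth_row, score_row in zip(y_true, y_score):
        for truth, score in zip(truth_row, score_row):
            pred = 1 if score >= threshold else 0
            if pred == 1 and truth == 1:
                tp += 1  # label correctly predicted as present
            elif pred == 1 and truth == 0:
                fp += 1  # label predicted but absent
            elif pred == 0 and truth == 1:
                fn += 1  # label present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: two images, three hypothetical green-space labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_score = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.7]]
p, r, f1 = multilabel_prf(y_true, y_score)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case there are two true positives, one false positive, and one false negative, so all three scores equal 2/3. mAP is computed differently, by averaging per-label average precision over ranked scores, and is not covered by this sketch.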