Identifying drivable areas between orchard rows is crucial for intelligent agricultural equipment. However, challenges remain in this field’s accuracy, real-time performance, and generalization of deep learning models. This study proposed the SwinLabNet model in the context of jujube orchards, an innovative network model that utilized a lightweight CNN-transformer hybrid architecture. This approach optimized feature extraction and contextual information capture, effectively addressing long-range dependencies, global information acquisition, and detailed boundary processing. After training on the jujube orchard dataset, the SwinLabNet model demonstrated significant performance advantages: training accuracy reached 97.24%, the mean Intersection over Union (IoU) was 95.73%, and the recall rate was as high as 98.36%. Furthermore, the model performed exceptionally well on vegetable datasets, highlighting its generalization capability across different crop environments. This study successfully applied the SwinLabNet model in orchard environments, providing essential support for developing intelligent agricultural equipment, advancing the identification of drivable areas between rows, and laying a solid foundation for promoting and applying intelligent agrarian technologies.