Precise segmentation of Zanthoxylum bungeanum clusters is crucial for developing picking robots. This study proposes an improved Mask R-CNN model for segmenting Zanthoxylum bungeanum clusters in natural environments. First, the Swin-Transformer network was introduced as the backbone feature extraction network to strengthen the model's feature extraction capability. Second, the SK attention mechanism was used to fuse detail information from the low-level feature map of the feature pyramid network (FPN) into the mask branch, supplementing the image detail features. Finally, the distance intersection over union (DIoU) loss function replaced the original bounding box loss function of Mask R-CNN. The model was trained and tested on a self-constructed Zanthoxylum bungeanum cluster dataset. Experiments showed that the improved Mask R-CNN model achieved a detection mAP50 (box) of 84.0% and a segmentation mAP50 (mask) of 77.2%, improvements of 5.8% and 4.6%, respectively, over the baseline Mask R-CNN model. Compared with conventional instance segmentation models such as YOLACT, Mask Scoring R-CNN, and SOLOv2, the improved Mask R-CNN model also achieved higher segmentation precision. This study can provide valuable technical support for the development of Zanthoxylum bungeanum picking robots.
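
To illustrate the bounding box loss named in the abstract, the following is a minimal sketch of the standard DIoU loss for a single pair of axis-aligned boxes, written in plain Python. It is not the authors' implementation: the function name `diou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example. The loss is one minus the IoU, plus the squared distance between box centers divided by the squared diagonal of the smallest box enclosing both.

```python
def diou_loss(box_a, box_b, eps=1e-9):
    """DIoU loss for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; box format and eps are assumptions.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    acx, acy = (ax1 + ax2) / 2.0, (ay1 + ay2) / 2.0
    bcx, bcy = (bx1 + bx2) / 2.0, (by1 + by2) / 2.0
    center_dist_sq = (acx - bcx) ** 2 + (acy - bcy) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    ex1, ey1 = min(ax1, bx1), min(ay1, by1)
    ex2, ey2 = max(ax2, bx2), max(ay2, by2)
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    return 1.0 - iou + center_dist_sq / (diag_sq + eps)
```

Unlike a plain IoU loss, which is constant for any pair of non-overlapping boxes, the center-distance term keeps the loss sensitive to how far apart the predicted and ground-truth boxes are, so it still provides a useful training signal when they do not overlap.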