Abstract

Detecting small green apples in real, complex orchard environments remains a great challenge because the fruit is small and its skin color closely resembles the background. In this paper, we present a focal bottleneck transformer network (FBoT-Net) whose focal bottleneck transformer module combines high-level semantic information, which has strong representation ability, with both global and local feature information. We replace the multi-head self-attention in the final three bottleneck blocks of ResNet with a focal transformer layer. Specifically, window-based focal multi-head self-attention is introduced to capture fine-grained local information around each window and coarse-grained global features via window-wise attention. Experimental results show that the average precision for small- and large-scale objects reaches 47.3% and 34.2%, respectively, on the SmallApple and Pascal VOC datasets. Overall, FBoT-Net slightly outperforms BoTNet and other state-of-the-art methods on SmallApple, and it also generalizes well to the public Pascal VOC dataset.
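
The abstract describes swapping the attention in the last bottleneck blocks of ResNet for a window-based focal self-attention that mixes fine-grained tokens inside each window with coarse-grained tokens summarizing the whole feature map. The sketch below is a minimal, illustrative interpretation of such a block, not the authors' implementation; the module names (FocalSelfAttention, FocalBottleneck), the window size, and the pooling size are assumptions made for this example.

# Minimal sketch (not the authors' code) of a bottleneck block whose spatial
# operation is a window-based self-attention over local fine-grained tokens
# plus pooled, coarse-grained global tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalSelfAttention(nn.Module):
    def __init__(self, dim, num_heads=4, window_size=7, pool_size=7):
        super().__init__()
        self.window_size = window_size
        self.pool_size = pool_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                                   # x: (B, C, H, W)
        B, C, H, W = x.shape
        w = self.window_size
        # Pad so H and W are divisible by the window size.
        x = F.pad(x, (0, (-W) % w, 0, (-H) % w))
        Hp, Wp = x.shape[-2:]
        # Fine-grained tokens: partition the feature map into w x w windows.
        windows = x.view(B, C, Hp // w, w, Wp // w, w)
        windows = windows.permute(0, 2, 4, 3, 5, 1)          # (B, nH, nW, w, w, C)
        q = windows.reshape(-1, w * w, C)                    # queries per window
        # Coarse-grained tokens: a pooled global summary shared by every window.
        coarse = F.adaptive_avg_pool2d(x, self.pool_size)    # (B, C, p, p)
        coarse = coarse.flatten(2).transpose(1, 2)           # (B, p*p, C)
        coarse = coarse.unsqueeze(1).expand(B, q.shape[0] // B, -1, -1)
        coarse = coarse.reshape(q.shape[0], -1, C)
        # Keys/values = local window tokens + global coarse tokens.
        kv = torch.cat([q, coarse], dim=1)
        out, _ = self.attn(q, kv, kv)
        # Reverse the window partition and drop any padding.
        out = out.view(B, Hp // w, Wp // w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        return out.reshape(B, C, Hp, Wp)[:, :, :H, :W]


class FocalBottleneck(nn.Module):
    """Bottleneck block with its spatial layer swapped for focal attention."""
    def __init__(self, in_ch, mid_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.attn = FocalSelfAttention(mid_ch)
        self.expand = nn.Conv2d(mid_ch, in_ch, 1, bias=False)
        self.norm = nn.BatchNorm2d(in_ch)

    def forward(self, x):
        out = self.expand(self.attn(self.reduce(x)))
        return F.relu(self.norm(out) + x)


if __name__ == "__main__":
    block = FocalBottleneck(in_ch=256, mid_ch=64)
    print(block(torch.randn(2, 256, 28, 28)).shape)  # torch.Size([2, 256, 28, 28])

In this simplified form, each window's queries attend jointly to their own fine-grained tokens and to a pooled global summary, which is the intuition behind combining local and global information in one attention step.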
