For small target objects on fast-moving conveyor belts, traditional vision detection algorithms paired with conventional robotic arms struggle to capture the long- and short-range pixel dependencies crucial for accurate detection, leading to high miss rates and low precision. In this study, we integrate the EMA (efficient multi-scale attention) mechanism with the C2f (channel-to-pixel) module of the original YOLOv8, together with a FasterNet module designed around partial-convolution concepts. This fusion yields the Faster-EMA-Net module, which greatly strengthens the ability of the algorithm, and of the robotic system built on it, to extract pixel dependencies for small targets and improves perception of dynamic small objects. Furthermore, by incorporating a small-target semantic information enhancement layer into the multi-scale feature fusion network, we extract more expressive features for small targets, thereby boosting detection accuracy. We also improve the loss function to address the long training time and subpar small-target performance of the original YOLOv8. Experiments demonstrate that our attention-based visual detection algorithm effectively raises accuracy and recall for fast-moving small targets, meeting the demands of real industrial scenarios. Our approach to target detection with industrial robotic arms is both practical and state-of-the-art.
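To illustrate the partial-convolution idea underlying the FasterNet module mentioned above, the following is a minimal NumPy sketch, not the paper's implementation: only the first few channels are convolved while the rest pass through unchanged, which is what cuts redundant computation. The channel split, 3x3 kernel size, and shapes here are illustrative assumptions.

```python
import numpy as np

def partial_conv(x, weight, n_conv):
    """Sketch of partial convolution (PConv): convolve only the first
    n_conv channels of x with 3x3 kernels (zero padding, stride 1);
    the remaining channels are passed through as an identity map.

    x:      (C, H, W) input feature map
    weight: (n_conv, n_conv, 3, 3) kernels for the convolved channel slice
    """
    C, H, W = x.shape
    out = x.copy()  # untouched channels carry through unchanged
    pad = np.pad(x[:n_conv], ((0, 0), (1, 1), (1, 1)))  # zero-pad spatial dims
    for c in range(n_conv):
        acc = np.zeros((H, W))
        for k in range(n_conv):
            for i in range(3):
                for j in range(3):
                    acc += weight[c, k, i, j] * pad[k, i:i + H, j:j + W]
        out[c] = acc
    return out
```

Because only `n_conv` of the `C` channels enter the convolution, the multiply-accumulate cost shrinks by roughly `(n_conv / C)**2` relative to a full convolution, which is why PConv-style blocks suit latency-sensitive conveyor-belt detection.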