Ensemble of ghost convolution block with nested transformer encoder for dense object recognition

Ponduri Vasanthi, Laavanya Mohan

doi:10.1016/j.bspc.2023.105645

Abstract

Technological advancement and innovation are proceeding at a rapid pace, and in the computerized era recognition models have shown outstanding performance in object detection. Densely packed object detection, however, remains one of the greatest open challenges because of the computational cost of redundant feature maps, the diverse shapes of objects, and their alignment in various directions. This paper proposes a GCB (Ghost Convolution Block) together with a nested transformer encoder block in the feature refinement network to overcome these difficulties. The GCB reduces the cost of computing redundant feature maps by using DW (depth-wise separable) convolution, while the nested transformer encoder block extracts in-depth information from diversely shaped and misaligned objects through the query, key, and value parameters of the MHSA (Multi-Head Self-Attention) mechanism. We performed quantitative evaluations on the VOC, GWHD (Global Wheat Head Detection), and SKU-110K data sets and carried out an ablation study using the YOLOv5 model with the GCB and GCBTR (Ghost Convolution Block-based Transformer) modules. Compared with other conventional models, our model achieved 84.2% mAP, 80% precision, 78.1% recall, and 79% F1-score on VOC; 81.8% mAP, 91.6% precision, 73.1% recall, and 81.3% F1-score on SKU-110K; and 95.7% mAP, 94.4% precision, and 90.2% recall on GWHD. These results indicate that the proposed model outperforms existing models, including the baseline YOLOv5, in detecting dense objects.
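The reported F1-scores can be cross-checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the dictionary layout are illustrative choices and do not come from the paper. The abstract reports no F1-score for GWHD, so the value computed for that data set is only what the standard formula implies, not a result stated by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract.
reported = {
    "VOC": (80.0, 78.1),
    "SKU-110K": (91.6, 73.1),
    "GWHD": (94.4, 90.2),  # no F1-score is reported for this data set
}

# F1 implied by the standard formula, rounded to one decimal place.
implied_f1 = {name: round(f1_score(p, r), 1) for name, (p, r) in reported.items()}

for name, value in implied_f1.items():
    print(f"{name}: F1 = {value}%")
```

Under this definition the implied values are 79.0% for VOC and 81.3% for SKU-110K, which agree with the reported figures, and 92.3% for GWHD.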

Full Text