Abstract

Accurate object counting plays a crucial role in industrial applications. However, it remains a formidable challenge in computer vision, particularly for images containing tiny, similar, and stacked objects. This paper proposes AMSA-CAFF Net, an encoder–decoder convolutional neural network that accurately counts objects and regresses high-quality density maps from X-ray images of highly dense microscopic components. AMSA-CAFF Net has three major components: an adaptive multi-scale feature aggregation (AMSA) module that encodes adaptive multi-scale features in the encoder, a channel-wise adaptive feature fusion (CAFF) module that ensures smooth feature fusion, and a continuous feature enhancement mechanism that promotes information integration in the decoder. In addition, datasets dedicated to industrial object counting are still lacking. To address this gap, we construct a novel large-scale object counting dataset, the X-ray-based industrial electronic component counting dataset (XRAY-IECCD), which consists of 1,460 X-ray images of 10 types of components with a total of 2,915,126 point-annotated objects. To the best of our knowledge, XRAY-IECCD is the first dataset for counting tiny, dense, multi-class objects with a substantial number of annotations. We evaluate AMSA-CAFF Net on three datasets: XRAY-IECCD, the crowd counting benchmark ShanghaiTech, and the object counting benchmark CARPK. The proposed framework outperforms state-of-the-art approaches on the XRAY-IECCD and CARPK datasets, highlighting its effectiveness. It also achieves competitive results on the ShanghaiTech dataset, which demonstrates the generalization capability of our method.
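
To illustrate how point annotations relate to density-map regression, the sketch below builds a density map in which each annotated point contributes a normalized Gaussian, so the map sums to the object count. It is a generic illustration of density-map counting, not the paper's pipeline: the function names, the fixed kernel width `sigma`, and the toy coordinates are all assumptions, since the abstract does not specify how the ground-truth maps are generated.

```python
import math


def density_map(points, height, width, sigma=1.5):
    """Build a density map from (row, col) point annotations.

    Assumption: a fixed-width Gaussian per point, normalized over the
    image grid so that every point contributes exactly 1 to the sum.
    """
    density = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        # Unnormalized Gaussian weights centred on the annotated point.
        weights = [
            [
                math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2))
                for c in range(width)
            ]
            for r in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalize so this point adds a total mass of 1 to the map.
        for r in range(height):
            for c in range(width):
                density[r][c] += weights[r][c] / total
    return density


def count_from_density(density):
    """Estimate the object count as the integral (sum) of the density map."""
    return sum(map(sum, density))


# Hypothetical toy example: three annotated points on a 10 x 10 image.
toy_points = [(2, 3), (5, 5), (7, 1)]
toy_density = density_map(toy_points, height=10, width=10)
toy_count = count_from_density(toy_density)
```

A network trained to regress such maps yields a count by summing its predicted map in the same way, which is why map quality and counting accuracy are closely linked.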
