Keypoint detection is an important research topic in target recognition and classification. This paper studies keypoint detection in images of Amur tigers and proposes a target keypoint detection method based on heterogeneous convolutional neural networks. Because monitoring devices have limited storage capacity and the task demands high accuracy, we propose a heterogeneous convolution called SHetConv, which is composed of group convolution and standard convolution. We use two kinds of SHetConv: one to reduce the computational cost, measured in FLOPs (the number of floating-point operations), and one to increase the receptive field. To further improve the effectiveness of the model, we propose a feature fusion module that makes full use of the semantic and spatial information in images. We evaluate the algorithm on the Tiger Pose Keypoint, CIFAR-10 and MPII datasets. The experimental results show that our method achieves higher accuracy, recall and $F_1$-score than other state-of-the-art keypoint detection methods, while the number of parameters and FLOPs are substantially reduced. Specifically, the parameter count and FLOPs of our model (scaled network + fusion module + shet2) are 0.14 and 0.143 times those of the large HRNet-W48 model, respectively, and its $F_1$-score is 0.3% higher.
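
To make the cost argument concrete, the following minimal sketch compares the weight and multiply-accumulate counts of a standard convolution with those of a group convolution. It is an illustration only: the abstract does not specify how SHetConv arranges its group and standard kernels, so this is not the SHetConv layout itself, and the function names `conv_params` and `conv_macs` as well as the channel sizes are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution layer (bias ignored).

    With `groups` > 1, each output channel only sees c_in / groups input
    channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def conv_macs(c_in, c_out, k, h_out, w_out, groups=1):
    """Multiply-accumulate operations for one forward pass.

    Every weight is applied once per output position.
    """
    return conv_params(c_in, c_out, k, groups) * h_out * w_out


# Illustrative sizes (assumed, not taken from the paper).
standard = conv_params(64, 64, 3)            # 64 * 64 * 9 weights
grouped = conv_params(64, 64, 3, groups=4)   # 16 * 64 * 9 weights
ratio = grouped / standard                   # cost relative to standard conv
```

Under these assumed sizes, the group convolution with four groups needs one quarter of the weights and operations of the standard convolution, which is the kind of saving that mixing group and standard kernels trades against representational capacity.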