Deep Learning-Based Object Detection Improvement for Fine-Grained Birds

Kuihe Yang, Ziying Song

doi:10.1109/access.2021.3076429

Open Access

https://doi.org/10.1109/access.2021.3076429

Abstract

When object detection algorithms are applied to bird protection projects, they face several problems: large model parameter counts, high similarity between bird species, and samples limited to a single scene. To further improve the detection accuracy and stability of the object detection model, a multi-object detection algorithm for fine-grained birds is proposed. First, the algorithm introduces depthwise separable convolution into the feature extraction layers of YOLOv3. The convolution is split into two steps, a depthwise convolution and a pointwise convolution, which separates intra-channel convolution from inter-channel convolution. While maintaining high detection accuracy, this greatly reduces the number of model parameters and the amount of computation. Finally, Focal loss is added to the loss function to address the severe imbalance between positive and negative samples. By reducing the weight of the many easy background examples, the algorithm focuses more on detecting foreground classes. Experimental results on the bird data set show that the mean average precision (mAP) of the algorithm is 2.71% higher than that of YOLOv3, the number of parameters is 79.88% lower than that of the YOLOv3 base model, and the number of frames per second (FPS) is 19.98% higher than that of YOLOv3. The algorithm thus greatly reduces the number of model parameters and the amount of computation while also improving detection speed and mAP.
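The parameter saving from splitting a convolution into depthwise and pointwise steps can be illustrated with a small weight count. This is a minimal sketch, not the authors' implementation: the layer sizes (`c_in`, `c_out`, `k`) are hypothetical, biases are ignored, and the per-layer reduction it computes is not the same quantity as the 79.88% whole-model figure reported above.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel (intra-channel)
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels (inter-channel)
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 256, 512, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.4f}")
```

For a 3 x 3 kernel the ratio is dominated by the 1/k² term, so a single layer of this kind needs roughly one ninth of the weights of its standard counterpart.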

Highlights

The Hengshui Lake Wetland Bird Sanctuary project in China uses high-definition cameras to photograph birds, including grey jays, egrets, gulls, and black-billed gulls
The DC-YOLO model of Zou and Liang [7] is based on the deep learning object detection algorithm YOLOv3 and proposes two improvements: (1) replacing the convolutional layers in the original network with dilated convolution to maintain a larger receptive field and higher resolution, improving the model's accuracy on small objects; and (2) updating the confidence score of each detection box by calculating a scale factor, then removing detection boxes whose score falls below the threshold
After introducing depthwise separable convolution and Focal loss, we find that depthwise separable convolution yields the more pronounced improvement in the performance of the object detection model
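The effect of Focal loss mentioned in the highlights can be sketched for a single binary prediction. This is a sketch under stated assumptions: it uses the commonly cited form FL(p_t) = -α_t (1 - p_t)^γ log(p_t), and the values γ = 2 and α = 0.25 are assumed defaults, not settings confirmed by this paper.

```python
import math


def binary_cross_entropy(p, y):
    """Cross-entropy for one prediction p in (0, 1) with label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, 1e-12), 1.0))


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one prediction; gamma and alpha are assumed values."""
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)             # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background example (predicted 0.05 for label 0) versus
# a hard foreground example (predicted 0.3 for label 1).
easy_bg = binary_focal_loss(0.05, 0)
hard_fg = binary_focal_loss(0.3, 1)
print(easy_bg / binary_cross_entropy(0.05, 0))  # strongly down-weighted
print(hard_fg / binary_cross_entropy(0.3, 1))   # down-weighted far less
```

The factor (1 - p_t)^γ shrinks toward zero for well-classified examples, which is why the many easy background samples contribute little to the total loss.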

Summary

INTRODUCTION

The Hengshui Lake Wetland Bird Sanctuary project in China uses high-definition cameras to photograph birds, including grey jays, egrets, gulls, and black-billed gulls. To accurately count the birds around a transmission line and promptly drive them away so that the line keeps operating normally, Zou and Liang [7] designed the DC-YOLO model. This model is based on the deep learning object detection algorithm YOLOv3 and proposes two improvements: (1) replacing the convolutional layers in the original network with dilated convolution to maintain a larger receptive field and higher resolution, improving the model's accuracy on small objects; and (2) updating the confidence score of each detection box by calculating a scale factor, then removing detection boxes whose score falls below the threshold. In the algorithm proposed here, Focal loss reduces the weight of the large number of easy background examples, so the algorithm focuses more on detecting foreground classes.
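How dilated convolution keeps a larger receptive field without reducing resolution can be seen from the standard size formulas. This is a minimal sketch, not the DC-YOLO code: the feature map size of 52 and the dilation rates are hypothetical values chosen for illustration.

```python
def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel whose taps are spaced `dilation` apart."""
    return k + (k - 1) * (dilation - 1)


def conv_output_size(n, k, stride=1, padding=0, dilation=1):
    """Output length along one axis for an input of length n."""
    k_eff = effective_kernel_size(k, dilation)
    return (n + 2 * padding - k_eff) // stride + 1


# A 3 x 3 kernel with dilation 2 covers a 5 x 5 window using the same 9 weights.
print(effective_kernel_size(3, 2))

# With padding equal to the dilation rate, a 52 x 52 map keeps its resolution
# (52 is an assumed feature map size).
print(conv_output_size(52, 3, stride=1, padding=2, dilation=2))
```

Because the wider window comes from spacing the kernel taps rather than from striding or pooling, the feature map is not downsampled, which matches the stated aim of preserving resolution for small objects.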

RELATED WORK

BACKGROUND

Findings

CONCLUSION