Abstract

Detection of small moving objects is an important research area with applications including monitoring of flying insects, studying their foraging behavior, using insect pollinators to monitor flowering and pollination of crops, surveillance of honeybee colonies, and tracking movement of honeybees. However, due to the lack of distinctive shape and textural details on small objects, direct application of modern object detection methods based on convolutional neural networks (CNNs) shows considerably lower performance. In this paper we propose a method for the detection of small moving objects in videos recorded using unmanned aerial vehicles equipped with standard video cameras. The main steps of the proposed method are video stabilization, background estimation and subtraction, frame segmentation using a CNN, and thresholding the segmented frame. However, training a CNN requires a large labeled dataset. Manual labeling of small moving objects in videos is very difficult and time-consuming, and such labeled datasets do not exist at the moment. To circumvent this problem, we propose training a CNN using synthetic videos generated by adding small blob-like objects to video sequences with real-world backgrounds. The experimental results on detection of flying honeybees show that by using a combination of classical computer vision techniques and CNNs, as well as synthetic training sets, the proposed approach overcomes the problems associated with direct application of CNNs to the given problem and achieves an average F1-score of 0.86 in tests on real-world videos.
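The synthetic training videos described above are generated by adding small blob-like objects to real-world backgrounds. A minimal NumPy sketch of that idea is shown below; the function name, the Gaussian blob shape, and all parameter values are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def add_synthetic_blob(frame, center, sigma=1.5, amplitude=40.0):
    """Add a small Gaussian-shaped blob (a stand-in for a flying insect)
    to a grayscale frame at the given (row, col) center."""
    h, w = frame.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    blob = amplitude * np.exp(
        -((rows - center[0]) ** 2 + (cols - center[1]) ** 2) / (2 * sigma ** 2)
    )
    # Clip so the synthetic frame stays in the valid 8-bit intensity range.
    return np.clip(frame + blob, 0, 255)

rng = np.random.default_rng(0)
# Stand-in for a frame from a real-world background video sequence.
background = rng.uniform(60, 100, size=(64, 64))
frame = add_synthetic_blob(background, center=(32, 40))
```

Since the blob position is known at generation time, the corresponding ground-truth label comes for free, which is the point of the synthetic approach.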

Highlights

  • Convolutional neural networks (CNNs) have improved state of the art results on tasks of object detection in images and videos [1,2]

  • Many algorithms for moving object detection exist [6], but due to the presence of motion in the background, their direct application is limited and results in a large number of false positive detections. To filter out these false positives, we propose using a CNN trained on groups of consecutive frames; it learns a representation of the appearance and motion of small objects and outputs confidence maps of the presence of moving objects in the middle frame of each group of frames given to it as input

  • The pixelwise mean of the frames in a temporal window can be regarded as background estimation, since small moving objects are filtered out by time averaging the frames in the window



Introduction

Convolutional neural networks (CNNs) have improved state-of-the-art results on tasks of object detection in images and videos [1,2]. However, their performance drops considerably when they are applied directly to small objects. The main reason for this discrepancy is the lack of distinctive shape and texture on small objects, which precludes learning useful representations and results in worse detection performance. The pixelwise mean of the frames in a temporal window can be regarded as a background estimate, since small moving objects are filtered out by time-averaging the frames in the window. In this step, we essentially fit a Gaussian probability distribution, characterized by its mean and standard deviation, to the values of each pixel over a window of previous frames. We then subtract the estimated pixel-wise mean from each frame and divide the result by the estimated pixel-wise standard deviation.
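The background estimation and normalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions (the function name, window length, and the small epsilon added to avoid division by zero are not from the paper):

```python
import numpy as np

def normalize_frame(frame, window):
    """Per-pixel Gaussian background model: subtract the mean of a
    window of previous frames and divide by the standard deviation.

    frame:  (H, W) current grayscale frame
    window: (T, H, W) stack of previous frames
    """
    mean = window.mean(axis=0)        # pixel-wise background estimate
    std = window.std(axis=0) + 1e-6   # epsilon avoids division by zero
    return (frame - mean) / std

rng = np.random.default_rng(1)
# Static background observed over 30 frames with sensor noise.
window = rng.normal(100.0, 5.0, size=(30, 48, 48))
frame = window.mean(axis=0).copy()
frame[20, 20] += 50.0                 # a small moving object appears
scores = normalize_frame(frame, window)
```

After normalization, static background pixels cluster near zero while the moving object stands out as a large deviation, which is what makes the subsequent segmentation and thresholding steps tractable.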
