Abstract
Moving object detection and tracking from image sequences have been extensively studied in a variety of fields. Nevertheless, observing geometric attributes and identifying the detected objects for further investigation of moving behavior have drawn less attention. The focus of this study is to determine the trajectories and heights of moving objects and to recognize them using a monocular camera configuration. This paper presents a scheme for moving object recognition with three-dimensional (3D) observation using a faster region-based convolutional neural network (Faster R-CNN) with a stationary and rotating Pan Tilt Zoom (PTZ) camera and close-range photogrammetry. The camera motion effects are first eliminated to detect objects that actually move, and a moving object recognition process is then employed to recognize the object classes and to facilitate the estimation of their geometric attributes. This information can further contribute to the investigation of object moving behavior. To evaluate the effectiveness of the proposed scheme quantitatively, an experiment with an indoor synthetic configuration is conducted first; outdoor real-life data are then used to verify the feasibility based on recall, precision, and F1 index. The experiments verify the effectiveness of the proposed method in both laboratory and real environments. The proposed approach estimates the height and speed of the recognized moving objects, including pedestrians and vehicles, and shows promising results with acceptable errors and application potential using existing PTZ camera images at a very low cost.
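The abstract names recall, precision, and F1 index as evaluation measures. As a reminder of how these are commonly computed from detection counts, here is a minimal sketch; the function name `detection_scores` and the counts passed to it are illustrative assumptions, not values or code from the study.

```python
def detection_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts. Undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the call.
precision, recall, f1 = detection_scores(tp=8, fp=2, fn=2)
```

With these made-up counts, precision and recall are both 8/10, so their harmonic mean is also approximately 0.8.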
Highlights
In the field of computer vision, detecting and tracking moving objects has been widely studied for decades
A detector-agnostic procedure was developed by integrating both unsupervised and supervised (deep learning convolutional neural networks (CNN)) techniques to extract the detected and verified targets through the fusion and data association steps [2]
This study focuses on the spatial information processing of object geometry estimation in Pan Tilt Zoom (PTZ)
Summary
In the field of computer vision, detecting and tracking moving objects has been widely studied for decades. Segmentation-based methods, such as mean shift clustering, graph-cuts, and active contours, divide the images into perceptually similar regions. Supervised classification methods, such as support vector machines, neural networks, and adaptive boosting techniques, are trained to detect the features of the objects [21]. A more intuitive approach is background subtraction, whose algorithms can be categorized into recursive and non-recursive methods [24]. These algorithms can provide more comprehensive object information by finding the variations from the image background model, provided that a precise background is known [25,26,27]. 3D scene flow has been introduced to form a dense 3D motion field for object detection, but stereo or multiple camera configurations are typically required to obtain depth information of the scene [28,29].
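To make the idea of recursive background subtraction concrete, the sketch below maintains a running-average background model and flags pixels that deviate from it. It is a generic textbook illustration, not the method of this study: it assumes a stationary camera and grayscale frames, and the function name `running_average_foreground`, the update rate `alpha`, and the `threshold` are all illustrative choices.

```python
import numpy as np


def running_average_foreground(frames, alpha=0.05, threshold=25.0):
    """Yield one boolean foreground mask per grayscale frame.

    The background is updated recursively from the previous estimate,
    so no buffer of past frames is kept.
    """
    background = None
    for frame in frames:
        frame = frame.astype(np.float64)
        if background is None:
            background = frame.copy()  # the first frame initializes the model
        # Pixels far from the background estimate are marked as foreground.
        mask = np.abs(frame - background) > threshold
        # Recursive update: blend the new frame into the background estimate.
        background = (1.0 - alpha) * background + alpha * frame
        yield mask


# Synthetic example: a static scene, then a bright 2x2 object appears.
static = np.full((8, 8), 100, dtype=np.uint8)
moving = static.copy()
moving[2:4, 3:5] = 200
masks = list(running_average_foreground([static] * 4 + [moving]))
```

In this synthetic sequence only the four pixels of the inserted object exceed the threshold in the final mask. With a rotating PTZ camera the background itself shifts between frames, which is why camera motion has to be compensated before a model of this kind can be applied.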