Abstract

Object detection and tracking are important areas of research in computer vision. Computer vision solutions to object detection are typically single-frame solutions. To perform tracking by detection, these solutions run object detection on a per-frame basis and therefore discard any temporal information from previous frames. Many multi-object tracking solutions report average precision on video datasets but do not evaluate the temporal qualities of their output. In video, detecting objects is not the only concern: the temporal motion attributes of an object's path, such as its velocity, acceleration, and jerk, matter as well. Many implementations of tracking-by-detection systems have run into the problem of motion smoothing for bounding box paths. This paper focuses on quantifying the smoothness of detected object paths within a temporal window. We propose two smoothness metrics from the field of biokinematics and adapt them for use with detections. Using these metrics, we evaluate the ground truth and two object detectors that were popular at the time of experimentation, YOLOv3 and RetinaNet, on the entire MOT17 dataset. The results show that the metrics are useful for determining object smoothness and provide an additional way to evaluate an algorithm's performance in object tracking. The experiments also demonstrate that YOLOv3 produces smoother bounding boxes than RetinaNet. All supplemental graphs and data are provided in the appendix.
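
As a rough illustration of the motion attributes mentioned above, the sketch below estimates velocity, acceleration, and jerk of a bounding box path by finite differences of the box centers, and summarizes the path with a mean squared jerk value. This is only a minimal sketch under stated assumptions: the box layout `(x_min, y_min, x_max, y_max)`, the function names, and the choice of mean squared jerk as a summary are all hypothetical and are not the two biokinematics metrics the paper adapts, which the abstract does not name.

```python
from typing import List, Sequence, Tuple

# Assumed (hypothetical) box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def box_centers(boxes: Sequence[Box]) -> List[Point]:
    """Return the center point of each bounding box."""
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]


def finite_difference(points: Sequence[Point], dt: float) -> List[Point]:
    """Forward difference of a 2D sequence; the result is one element shorter."""
    return [
        ((bx - ax) / dt, (by - ay) / dt)
        for (ax, ay), (bx, by) in zip(points, points[1:])
    ]


def motion_derivatives(boxes: Sequence[Box], fps: float):
    """Estimate velocity, acceleration, and jerk of the box-center path."""
    dt = 1.0 / fps
    centers = box_centers(boxes)
    velocity = finite_difference(centers, dt)
    acceleration = finite_difference(velocity, dt)
    jerk = finite_difference(acceleration, dt)
    return velocity, acceleration, jerk


def mean_squared_jerk(boxes: Sequence[Box], fps: float) -> float:
    """Illustrative smoothness summary: lower values mean a smoother path."""
    _, _, jerk = motion_derivatives(boxes, fps)
    if not jerk:
        raise ValueError("need at least 4 boxes to estimate jerk")
    return sum(jx * jx + jy * jy for jx, jy in jerk) / len(jerk)


# A box moving at constant velocity versus one that jitters vertically.
smooth_path = [(float(i), 0.0, float(i + 10), 10.0) for i in range(6)]
jittery_path = [
    (float(i), float(i % 2), float(i + 10), 10.0 + (i % 2)) for i in range(6)
]
```

On these two synthetic paths, the constant-velocity path yields zero mean squared jerk, while the jittering path yields a strictly positive value. Note that raw jerk depends on frame rate and pixel scale, so values are only comparable between paths recorded at the same frame rate and resolution.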

