Abstract

In video annotation, instead of annotating every frame of a trajectory, usually only a sparse set of annotations is provided by the user: typically its endpoints plus some key intermediate frames, interpolating the remaining annotations between these key frames in order to reduce the cost of the video labeling. While a number of video annotation tools have been proposed, some of which are freely available, and bounding box interpolation is mainly based on image processing techniques whose performance is highly dependent on image quality, occlusions, etc. We propose an alternative method to interpolate bounding box annotations, based on cubic splines and the geometric properties of the elements involved, rather than image processing techniques. The algorithm proposed is compared with other bounding box interpolation methods described in the literature, using a set of selected videos modeling different types of object and camera motion. Experiments show that the accuracy of the interpolated bounding boxes is higher than the accuracy of the other evaluated methods, especially when considering rigid objects. The main goal of this paper is related with the bounding box interpolation step, and we believe that our design can be integrated seamlessly with any annotation tool already developed.

Highlights

  • The growth in image and video processing demands larger quantities of annotated training data

  • 4 Results and discussion In order to test the interpolation schema described in this paper, the proposed algorithm has been implemented in Python, using OpenCV

  • For the cubic spline interpolation, we used the functions implemented in the SciPy library [19]

Read more

Summary

Introduction

The growth in image and video processing demands larger quantities of annotated training data. To take into account this effect, Yuen et al [14] pointed out that ‘a constant velocity in space does not project to a constant velocity in the image plane, due to perspective effects.’ In their work, they proposed an alternative method to project the coordinates of the annotated object to the image plane for any time between two key frames. This improvement allows us to use cubic splines to interpolate the spatial bounding box coordinates, instead of using linear interpolation, to model object trajectories.

Reconstruction for a single segment
Reconstruction from multiple segments
Results and discussion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call