Abstract

We describe a new algorithm for distinguishing human actions in videos, called the differential geometric trajectory cloud (DGTC) method, which captures both fine- and large-scale structure of the covariant-transformed spatio-temporal optical flow field. We show the utility of our algorithm in the context of a content-based video retrieval (CBVR) system, where specific frames from a full-length video (or separate video shots in a database) are identified that contain a queried human action. In the DGTC method, the local geometry of the spatio-temporal covariant eigenspace curves, unique to each human action, is characterized by the Frenet–Serret basis equations, thereby specifying the local time-averaged curvature and torsion, as well as providing a means for defining a mean osculating hyperplane for the entire trajectory. To classify a human action from a query, our system uses an adaptive distance metric between the covariant-transformed query trajectory and each of the trajectories from all of the actions in the training set. Based upon the separation between the query and each class, the distance uses either large- or small-scale information about the trajectory: for large separations, the distance is the separation between trajectory cloud centroids, while for small and intermediate separations the distance is based upon the mean hyperplane orientation obtained from the time-averaged curvature and torsion of the trajectory. Our system can function in real time and has an accuracy greater than 93% for multiple action recognition within video repositories. We also demonstrate the use of our CBVR system in locating specific frame positions of trained actions in two full-length feature films.
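To make the two-scale classification idea concrete, the sketch below is a minimal, hypothetical illustration (not the authors' implementation) of estimating time-averaged curvature and torsion of a 3-D trajectory from the standard Frenet–Serret relations, and of an adaptive distance that compares trajectory-cloud centroids for well-separated classes and falls back to the (curvature, torsion) summary, which fixes the mean osculating plane, for nearby ones. Array shapes, the threshold, and the specific comparison of curvature/torsion pairs are assumptions made for illustration only.

```python
# Hypothetical sketch (not the authors' implementation): time-averaged curvature and
# torsion of a 3-D eigenspace trajectory, plus a two-scale adaptive distance.
import numpy as np

def mean_curvature_torsion(traj):
    """traj: (T, 3) array of trajectory points.
    Uses the standard formulas kappa = |r' x r''| / |r'|^3 and
    tau = (r' x r'') . r''' / |r' x r''|^2, averaged over time."""
    d1 = np.gradient(traj, axis=0)   # first derivative r'
    d2 = np.gradient(d1, axis=0)     # second derivative r''
    d3 = np.gradient(d2, axis=0)     # third derivative r'''
    cross = np.cross(d1, d2)
    speed = np.linalg.norm(d1, axis=1)
    cross_norm = np.linalg.norm(cross, axis=1)
    eps = 1e-12                      # guard against division by zero
    curvature = cross_norm / (speed**3 + eps)
    torsion = np.einsum('ij,ij->i', cross, d3) / (cross_norm**2 + eps)
    return curvature.mean(), torsion.mean()

def adaptive_distance(query, ref, far_threshold=1.0):
    """Illustrative two-scale distance: centroid separation when the classes are
    far apart, otherwise a distance in the (kappa, tau) plane. The threshold and
    weighting are assumptions, not values from the paper."""
    centroid_gap = np.linalg.norm(query.mean(axis=0) - ref.mean(axis=0))
    if centroid_gap > far_threshold:
        return centroid_gap
    kq, tq = mean_curvature_torsion(query)
    kr, tr = mean_curvature_torsion(ref)
    return np.hypot(kq - kr, tq - tr)
```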
