Human body motion capture from multi-image video sequences

Nicola D'Apuzzo

doi:10.1117/12.473089

Abstract

In this paper is presented a method to capture the motion of the human body from multi image video sequences without using markers. The process is composed of five steps: acquisition of video sequences, calibration of the system, surface measurement of the human body for each frame, 3-D surface tracking and tracking of key points. The image acquisition system is currently composed of three synchronized progressive scan CCD cameras and a frame grabber which acquires a sequence of triplet images. Self calibration methods are applied to gain exterior orientation of the cameras, the parameters of internal orientation and the parameters modeling the lens distortion. From the video sequences, two kinds of 3-D information are extracted: a three-dimensional surface measurement of the visible parts of the body for each triplet and 3-D trajectories of points on the body. The approach for surface measurement is based on multi-image matching, using the adaptive least squares method. A full automatic matching process determines a dense set of corresponding points in the triplets. The 3-D coordinates of the matched points are then computed by forward ray intersection using the orientation and calibration data of the cameras. The tracking process is also based on least squares matching techniques. Its basic idea is to track triplets of corresponding points in the three images through the sequence and compute their 3-D trajectories. The spatial correspondences between the three images at the same time and the temporal correspondences between subsequent frames are determined with a least squares matching algorithm. The results of the tracking process are the coordinates of a point in the three images through the sequence, thus the 3-D trajectory is determined by computing the 3-D coordinates of the point at each time step by forward ray intersection. Velocities and accelerations are also computed. The advantage of this tracking process is twofold: it can track natural points, without using markers; and it can track local surfaces on the human body. In the last case, the tracking process is applied to all the points matched in the region of interest. The result can be seen as a vector field of trajectories (position, velocity and acceleration). The last step of the process is the definition of selected key points of the human body. A key point is a 3-D region defined in the vector field of trajectories, whose size can vary and whose position is defined by its center of gravity. The key points are tracked in a simple way: the position at the next time step is established by the mean value of the displacement of all the trajectories inside its region. The tracked key points lead to a final result comparable to the conventional motion capture systems: 3-D trajectories of key points which can be afterwards analyzed and used for animation or medical purposes.

Full Text