Abstract
Recognition of human actions from videos can be improved when depth information is available, because depth helps separate foreground motion from the background. Single image depth estimation (SIDE) is commonly used in the analysis of weather-degraded images. In this study, the idea of SIDE is extended to human action recognition (HAR) on datasets for which no depth information is available. Several depth-based HAR algorithms exist, but all of them rely on the depth information supplied with the dataset. Other methods use a depth motion map, which refers to the depth of motion in the temporal direction. Here, a new depth-based end-to-end deep network is proposed for HAR in which depth is estimated frame by frame and the estimated depth is processed in place of the RGB frame. Because colour information is not required for estimating motion, a single-channel depth map is used to estimate motion in the video, which makes the system computationally efficient. The proposed method is tested and verified on three benchmark datasets, namely JHMDB, HMDB51 and UCF101, and it outperforms the existing state-of-the-art methods for HAR on all three.
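The efficiency argument for a single-channel input can be illustrated with a minimal sketch. It is not the proposed network: `estimate_depth_placeholder` is a hypothetical stand-in for a learned SIDE model (it only averages the colour channels), and the clip size and the 7x7, 64-filter first layer are assumed values chosen for illustration.

```python
import numpy as np

def estimate_depth_placeholder(frame_rgb):
    """Hypothetical stand-in for a learned SIDE model.

    Returns an (H, W) map. It is only the channel mean, which is NOT a
    real depth estimate; it just produces an output of the right shape.
    """
    return frame_rgb.astype(np.float32).mean(axis=-1)

def clip_to_depth_stack(clip_rgb):
    """Convert a (T, H, W, 3) RGB clip into a (T, H, W, 1) depth stack."""
    depth = np.stack([estimate_depth_placeholder(f) for f in clip_rgb])
    return depth[..., np.newaxis]

def conv_weight_count(in_channels, out_channels, kernel):
    """Number of weights in one 2D convolution layer, ignoring biases."""
    return in_channels * out_channels * kernel * kernel

# Assumed clip size: 16 frames of 112x112 RGB.
clip = np.zeros((16, 112, 112, 3), dtype=np.uint8)
depth_clip = clip_to_depth_stack(clip)

# Assumed first layer: 64 filters of size 7x7.
rgb_weights = conv_weight_count(3, 64, 7)
depth_weights = conv_weight_count(1, 64, 7)
```

Under these assumptions the depth stack holds one third as many input values as the RGB clip, and the first convolution layer needs one third as many weights. Later layers are unaffected, so the overall saving depends on the architecture actually used.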