Abstract

In the last few years, the volume of video data has grown exponentially. Specialised online sites such as YouTube and Netflix attract large audiences who upload, access, and actively interact with video content. Furthermore, millions of surveillance cameras have been installed around the world to monitor shopping centres, universities, parks, streets, and public places in general. It is therefore becoming indispensable to manage and interpret this massive amount of video data efficiently and automatically. Computer vision is the science concerned with processing images and videos, and the main goal of this thesis is to contribute towards efficiently managing and interpreting video data via action analysis and video summarisation.

Action analysis using computer vision techniques is essential, given that the majority of available videos contain human actions. It is a broad topic covering several areas, including action recognition, joint action segmentation and recognition, and action assessment. For action recognition, several techniques have been proposed; among them, two schools of thought have recently gained attention. On one hand, traditional video encoders and their variants, such as the popular Bag of Visual Words and the Fisher Vector representation, remain the main reference for action recognition. On the other hand, statistical modelling of actions via Riemannian manifolds offers an interesting alternative to traditional video encoders. We provide a detailed analysis of the performance of these two schools of thought under the same set of features across several datasets. The analysis also investigates when these methods break down and how their performance degrades under challenging conditions likely to be encountered in uncontrolled situations. To address the joint action segmentation and recognition problem, we propose two hierarchical systems in which a given video is processed as a sequence of overlapping temporal windows. Both systems require fewer parameters to be optimised and avoid the custom dynamic programming formulations used in previous work. The last action analysis problem this thesis addresses is action assessment, which consists of judging how well people perform actions and is still in its early stages. Learning to assess actions automatically can be valuable: catwalk competitions, for example, rely on human judging, which may be highly subjective, yet to date no one has applied computer vision techniques to automatically assess how well someone strides down the catwalk.

Action analysis is not the only way to process video information. Video summarisation, an active area of research within the computer vision community, aims to provide a concise and informative summary of a video instead of requiring tedious manual review of hours of footage. We present a novel approach to video summarisation based on a Bag-of-visual-Textures representation, which is computationally efficient and effective. The approach can be used for both short and long videos; on long videos, the proposed system considerably reduces the amount of footage with only minor degradation in information content.
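
As a rough illustration of the traditional video encoders mentioned above, the sketch below encodes each video as a Bag-of-Visual-Words histogram over local spatio-temporal descriptors and trains a linear classifier on the histograms. The random descriptors, vocabulary size, and classifier here are placeholder choices for illustration only, not the pipeline used in the thesis.

```python
# Minimal Bag-of-Visual-Words sketch, assuming each video has already been
# reduced to a set of local spatio-temporal descriptors; the descriptors
# below are random placeholders rather than real features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical training data: 20 videos, each with 200 local 64-D descriptors.
train_descriptors = [rng.normal(size=(200, 64)) for _ in range(20)]
train_labels = np.arange(20) % 3            # 3 hypothetical action classes

# 1. Learn a visual vocabulary (codebook) by clustering the pooled descriptors.
codebook_size = 32
kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(train_descriptors))

def bovw_histogram(descriptors: np.ndarray) -> np.ndarray:
    """Encode one video as an L1-normalised histogram of codeword assignments."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=codebook_size).astype(float)
    return hist / hist.sum()

# 2. Encode every video and train a linear classifier on the histograms.
X_train = np.array([bovw_histogram(d) for d in train_descriptors])
clf = LinearSVC().fit(X_train, train_labels)

# 3. A new video is classified by encoding it the same way.
test_video = rng.normal(size=(180, 64))
print(clf.predict(bovw_histogram(test_video).reshape(1, -1)))
```

A Fisher Vector encoder would replace the hard codeword assignment with soft assignments to a Gaussian mixture model, but the surrounding pipeline would remain the same.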
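
The summarisation idea can likewise be sketched loosely: each frame is described by a bag-of-textures histogram, and a frame is kept for the summary only when its signature departs sufficiently from the last kept frame. The patch descriptor, codebook size, and threshold below are illustrative assumptions and not the method proposed in the thesis.

```python
# Loose keyframe-style summarisation sketch using a bag-of-textures frame
# signature; all parameters are placeholder choices for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

def patch_descriptors(frame: np.ndarray, patch: int = 16) -> np.ndarray:
    """Describe each non-overlapping patch by a small gradient-magnitude histogram."""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    descs = []
    for r in range(0, frame.shape[0] - patch + 1, patch):
        for c in range(0, frame.shape[1] - patch + 1, patch):
            hist, _ = np.histogram(mag[r:r + patch, c:c + patch],
                                   bins=8, range=(0, mag.max() + 1e-6))
            descs.append(hist / (hist.sum() + 1e-6))
    return np.array(descs)

# Placeholder greyscale "video": 100 random 64x64 frames.
frames = [rng.integers(0, 256, size=(64, 64)) for _ in range(100)]

# Build a texture codebook from a sample of patches.
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(
    np.vstack([patch_descriptors(f) for f in frames[::10]]))

def frame_signature(frame: np.ndarray) -> np.ndarray:
    """Encode one frame as a normalised histogram of texture codewords."""
    words = kmeans.predict(patch_descriptors(frame))
    hist = np.bincount(words, minlength=16).astype(float)
    return hist / hist.sum()

# Keep a frame only when its signature departs enough from the last kept one.
summary, last = [0], frame_signature(frames[0])
for i, f in enumerate(frames[1:], start=1):
    sig = frame_signature(f)
    chi2 = 0.5 * np.sum((sig - last) ** 2 / (sig + last + 1e-9))
    if chi2 > 0.1:
        summary.append(i)
        last = sig
print(f"kept {len(summary)} of {len(frames)} frames")
```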
