Abstract

General object and activity recognition is a fundamental and extensively studied problem in computer vision. Traditional approaches include model-based and appearance template-based methods. More recently, local visual feature-based models, inspired by methods from the text retrieval literature, have proven successful at recognizing objects and activities with large within-class geometric variability. This approach raises several challenges, notably feature selection and target modeling with the selected features. This thesis proposes a local-global visual feature-based framework for general object and activity recognition, with novel methods that address these problems: (1) Combinatorial and statistical methods for selecting informative parts to build statistical models for part-based object recognition. First, a combinatorial optimization formulation is used for clustering on a weighted multipartite graph. Second, a statistical method for selecting discriminative parts from positive images is used to localize objects. (2) An entropy-based vocabulary selection method for “bag-of-words” models for activity recognition. (3) A method that integrates spatial and temporal information with appearance features for human activity recognition, modeling human motion by the distribution of local motion features and their spatial-temporal arrangement. The effectiveness of the proposed methods is demonstrated on several object recognition and activity recognition data sets, including human facial expressions and hand gestures. The thesis also presents a framework that applies the Discrete Fourier Transform to detect salient regions in images and video sequences; this framework generalizes previous saliency detection methods and extends them to video.
