Semantic Video Research Articles

This paper addresses event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions, using an embedding between video features and term vectors. In our proposed embedding, which we call Video2vec, the correlations between the words are utilized to learn a more effective representation by optimizing a joint objective that balances descriptiveness and predictability. We show how learning the Video2vec embedding with a multimodal predictability loss, including appearance, motion and audio features, results in a more predictable representation. We also propose an event-specific variant of Video2vec that learns a more accurate representation for the words that are indicative of the event by introducing a term-sensitive descriptiveness loss. Our experiments on three challenging collections of web videos from the NIST TRECVID Multimedia Event Detection and Columbia Consumer Videos datasets demonstrate: i) the advantages of Video2vec over representations using attributes or alternative embeddings, ii) the benefit of fusing video modalities by an embedding over common strategies, and iii) the complementarity of term-sensitive descriptiveness and multimodal predictability for event recognition. By improving the predictability of present-day audio-visual video features while at the same time maximizing their semantic descriptiveness, Video2vec leads to state-of-the-art accuracy for both few- and zero-example recognition of events in video.
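
The joint objective mentioned in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the squared-error form of both losses, the names `fit_embedding` and `predict_terms`, the toy data, and the plain gradient-descent solver. Video features are the columns of `X`, term vectors are the columns of `Y`, and a k-dimensional code `S` is learned so that terms can be rebuilt from it (descriptiveness) and it can be predicted from the features (predictability).

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def axpby(A, B, alpha=1.0, beta=1.0):
    """Element-wise alpha * A + beta * B."""
    return [[alpha * a + beta * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sq_norm(A):
    return sum(a * a for row in A for a in row)

def joint_loss(X, Y, A, S, W, lam):
    # Descriptiveness: how well the term vectors Y are rebuilt from the codes S.
    descriptiveness = sq_norm(axpby(Y, matmul(W, S), 1.0, -1.0))
    # Predictability: how well the codes S are predicted from the features X.
    predictability = sq_norm(axpby(S, matmul(A, X), 1.0, -1.0))
    return descriptiveness + lam * predictability

def fit_embedding(X, Y, k, lam=1.0, lr=0.01, iters=500, seed=0):
    """Gradient descent on ||Y - W S||^2 + lam * ||S - A X||^2 (assumed form)."""
    rng = random.Random(seed)
    d, n, m = len(X), len(X[0]), len(Y)

    def init(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    A, S, W = init(k, d), init(k, n), init(m, k)
    losses = [joint_loss(X, Y, A, S, W, lam)]
    for _ in range(iters):
        R_d = axpby(matmul(W, S), Y, 1.0, -1.0)   # m x n residual W S - Y
        R_p = axpby(S, matmul(A, X), 1.0, -1.0)   # k x n residual S - A X
        g_W = matmul(R_d, transpose(S))           # half the gradient w.r.t. W
        g_S = axpby(matmul(transpose(W), R_d), R_p, 1.0, lam)  # half w.r.t. S
        g_A = matmul(R_p, transpose(X))           # gradient w.r.t. A is -2*lam*g_A
        W = axpby(W, g_W, 1.0, -2.0 * lr)
        S = axpby(S, g_S, 1.0, -2.0 * lr)
        A = axpby(A, g_A, 1.0, 2.0 * lr * lam)
        losses.append(joint_loss(X, Y, A, S, W, lam))
    return A, W, losses

def predict_terms(A, W, x):
    """Term scores for an unseen video: embed with A, then decode with W."""
    column = [[v] for v in x]
    return [row[0] for row in matmul(W, matmul(A, column))]

# Hypothetical toy data: 3 feature dimensions, 4 videos, 4 description terms.
X = [[1.0, 0.9, 0.1, 0.0],
     [0.0, 0.1, 0.9, 1.0],
     [0.5, 0.4, 0.5, 0.6]]
Y = [[1, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]

A, W, losses = fit_embedding(X, Y, k=2)
scores = predict_terms(A, W, [0.95, 0.05, 0.45])
```

In a zero-example setting, the scores returned by `predict_terms` could be compared with the terms of a textual event description; how that comparison is done in the paper is not specified in this abstract.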

Visual attention influenced by example images and predefined targets is widely studied in both the cognitive and computer vision fields. Nevertheless, semantics, known to be related to high-level human perception, have a great influence on the top-down attention process. Understanding the impact of semantics on visual attention is beneficial for providing psychological and computational guidance for many real-world applications, e.g., semantic video retrieval. In this paper, we study the mechanisms of attention control and the computational modeling of saliency detection for dynamic scenes under semantic-instructed viewing conditions. We start by establishing the REMoT dataset, to the best of our knowledge the first video eye-tracking dataset with semantic instructions. We collect the fixation locations of subjects when they are given four kinds of instructions with different levels of noise. The fixation behavior analysis on REMoT shows that the process of semantic-instructed attention can be explained with the long-term memory and short-term memory of the human visual system. Inspired by this finding, we propose a memory-guided probabilistic model to exploit semantic-instructed top-down attention. The experience of attention distribution to similar scenes in long-term memory is simulated by a linear mapping of global scene features. An HMM-like conditional probabilistic chain is constructed to model the dynamic fixation patterns among neighboring frames in short-term memory. Then, a generative saliency model is constructed that probabilistically combines the top-down and bottom-up modules for semantic-instructed saliency detection. We compare our model to state-of-the-art models on REMoT and on the widely used RSD dataset. Experimental results show that our model achieves significant improvements not only in predicting visual attention under correct instructions, but also in detecting saliency for free viewing.
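
The three ingredients named in the abstract above (a linear mapping from scene features, an HMM-like chain over neighboring frames, and a probabilistic combination of top-down and bottom-up cues) can be sketched on a toy grid of image cells. This is only an illustration under assumptions that are not from the paper: the function names `linear_prior`, `filter_fixations` and `mix`, the clipping of negative scores, the standard HMM forward update, and the weighted-mixture combination are all hypothetical choices.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def linear_prior(scene_features, weights):
    """Long-term memory stand-in: one linear score per cell, clipped at zero."""
    scores = [sum(w * f for w, f in zip(row, scene_features)) for row in weights]
    return normalize([max(s, 0.0) for s in scores])

def forward_step(prev_belief, transition, evidence):
    """One HMM forward update: propagate the belief, then weight by evidence."""
    n = len(prev_belief)
    predicted = [sum(prev_belief[j] * transition[j][i] for j in range(n))
                 for i in range(n)]
    return normalize([predicted[i] * evidence[i] for i in range(n)])

def filter_fixations(frame_evidence, transition, prior):
    """Short-term memory stand-in: fixation belief per frame, chained in time."""
    belief = normalize(prior)
    beliefs = []
    for evidence in frame_evidence:
        belief = forward_step(belief, transition, evidence)
        beliefs.append(belief)
    return beliefs

def mix(top_down, bottom_up, weight=0.5):
    """Weighted mixture of a top-down and a bottom-up saliency distribution."""
    return normalize([weight * t + (1.0 - weight) * b
                      for t, b in zip(top_down, bottom_up)])

# Hypothetical example: 3 cells, 2 global scene features, 2 frames.
prior = linear_prior([1.0, 0.0], [[2.0, 0.0], [1.0, 5.0], [1.0, 0.0]])
transition = [[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]]          # fixations tend to stay in the same cell
evidence = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]
top_down = filter_fixations(evidence, transition, prior)
saliency = [mix(td, [1 / 3, 1 / 3, 1 / 3]) for td in top_down]
```

Each entry of `saliency` is a probability distribution over the grid cells for one frame; a uniform bottom-up map is used here purely as a placeholder.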

Articles published on Semantic Video

Multi-label semantic concept detection in videos using fusion of asymmetrically trained deep convolutional neural networks and foreground driven concept co-occurrence matrix

Learning explicit video attributes from mid-level representation for video captioning

Video eCommerce++: Toward Large Scale Online Video Advertising

Hierarchical Latent Concept Discovery for Video Event Detection

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

Unsupervised Commercials Identification in Videos

Multi-Modal Visual Features-Based Video Shot Boundary Detection

Video2vec Embeddings Recognize Events When Examples Are Scarce

Hierarchically Supervised Deconvolutional Network for Semantic Video Segmentation

Guest Editorial: Analysis and Retrieval of Events/Actions and Workflows in Video Streams

RDF-powered semantic video annotation tools with concept mapping to Linked Data for next-generation video indexing: a comprehensive review

TagBook: A Semantic Video Representation Without Supervision for Event Detection

Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding

Semantic video labeling by developmental visual agents

Video Summary Based on F-Sift, Tamura Textural and Middle Level Semantic Feature

State of the Art: A Summary of Semantic Image and Video Retrieval Techniques

A framework for dynamic restructuring of semantic video analysis systems based on learning attention control

Local Features and a Two-Layer Stacking Architecture for Semantic Concept Detection in Video

On semantic-instructed attention: From video eye-tracking dataset to memory-guided probabilistic saliency model

Fuzzy reasoning framework to improve semantic video interpretation
