High-level Semantic Video Research Articles

Recent works on video captioning mainly learn the mapping from low-level visual features to language descriptions directly, without explicitly representing high-level semantic video concepts (e.g., objects and actions in the annotated sentences). To bridge this semantic gap, we propose a novel video attribute representation learning algorithm for video concept understanding and use the learned explicit video attribute representation to improve video captioning performance. First, inspired by the success of the spectrogram in audio processing, we propose a novel mid-level video representation named “video response map” (VRM), by which a frame sequence can be represented as a single image. Video attribute representation learning can therefore be converted to a well-studied multi-label image classification problem. Then, in the caption prediction step, the learned video attribute feature is integrated with the single-frame feature to improve the previous sequence-to-sequence language generation model by adjusting the LSTM (Long Short-Term Memory) input units. The proposed video captioning framework can both handle variable frame inputs and utilize high-level semantic video attribute features. Experimental results on video captioning tasks show that the proposed method, using only RGB frames as input without extra video or text training data, achieves performance competitive with state-of-the-art methods. Furthermore, extensive experimental evaluations on the UCF-101 action classification benchmark demonstrate the representation capability of the proposed VRM.
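The abstract does not say how the VRM is computed, so the following is only a rough sketch of the spectrogram analogy it mentions, not the authors' method. It assumes each frame has already been reduced to a vector of responses (a hypothetical input), stacks those vectors as columns of a 2D map, and resamples the time axis so that videos of any length yield a map of the same width. The function name `response_map` and the nearest-neighbor resampling are illustrative choices.

```python
def response_map(frame_responses, width):
    """Build a fixed-width 2D map from a variable-length frame sequence.

    frame_responses: list of per-frame response vectors (assumed input).
    width: number of time columns in the output map.
    Returns a list of rows: one row per response channel, one column per time step.
    """
    if not frame_responses or width <= 0:
        raise ValueError("need at least one frame and a positive width")
    n_frames = len(frame_responses)
    n_channels = len(frame_responses[0])
    # Nearest-neighbor resampling along time: column j maps to a source frame index.
    columns = [frame_responses[min(n_frames - 1, (j * n_frames) // width)]
               for j in range(width)]
    # Transpose so that rows are channels and columns are time, like a spectrogram.
    return [[col[c] for col in columns] for c in range(n_channels)]


if __name__ == "__main__":
    video = [[1, 2], [3, 4], [5, 6]]   # 3 frames, 2 response channels each
    for row in response_map(video, 6):
        print(row)
```

Under this assumption, the fixed-size map could be passed to any image classifier with one independent output per attribute, which is the multi-label setting the abstract refers to.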

Despite much work on Universal Multimedia Experience (UME), existing video adaptation approaches cannot yet be considered truly user-centric, mostly because they handle semantic user preferences poorly. These works concentrate mainly on lower-level user preferences; they neither consider fine-grained object-level adaptation nor evaluate different adaptation options based on predicted user expectations. Moreover, they do not provide content owners with property rights that let them restrict the types of modifications made to the video content. To address these shortcomings, we propose the Personalized vIdeo Adaptation Framework (PIAF) for high-level semantic video adaptation. PIAF is a fully integrated framework providing all the requirements for semantic video adaptation. It defines a video annotation model and a user profile model comprising semantic constraints that are delineated in a consistent way, based on the MPEG-7 and MPEG-21 standards. At the heart of the framework, the Adaptation Decision Taking Engine (ADTE) computes utility values for different adaptation options, considering each shot separately. The corresponding utility function assesses the possible choices using multiple parameters that capture different dimensions of a multimedia experience: the amount of modified content, modifications to key objects and shots with respect to the semantic integrity of the original content, the expected processing cost of the adaptation, and the anticipated visual and temporal quality of the adapted content. Furthermore, the ADTE can deal with intellectual property issues by selecting an adaptation plan of good quality that also satisfies constraints specified by the content owner. This paper places significant emphasis on the theoretical details of the utility function and the computation of the adaptation plan. It also presents the results and evaluation of the adaptation process in both simulation and a user study.
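The per-shot decision step described above can be illustrated with a minimal sketch. The abstract does not give the actual utility function, so this example assumes a simple weighted sum over the four listed dimensions, with every score normalized to the range 0 to 1. The names `AdaptationOption`, `utility`, and `choose_plan`, the equal weights, and the representation of owner constraints as a set of forbidden option names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationOption:
    name: str
    modified_content: float   # share of content modified, 0..1 (lower is better)
    semantic_damage: float    # harm to key objects/shots, 0..1 (lower is better)
    processing_cost: float    # expected cost, 0..1 (lower is better)
    quality: float            # anticipated visual/temporal quality, 0..1 (higher is better)


# Assumed equal weights; the paper's real weighting is not given in the abstract.
WEIGHTS = {"modified_content": 0.25, "semantic_damage": 0.25,
           "processing_cost": 0.25, "quality": 0.25}


def utility(option, weights=WEIGHTS):
    """Weighted sum in which the 'lower is better' terms are inverted."""
    return (weights["modified_content"] * (1 - option.modified_content)
            + weights["semantic_damage"] * (1 - option.semantic_damage)
            + weights["processing_cost"] * (1 - option.processing_cost)
            + weights["quality"] * option.quality)


def choose_plan(shots, forbidden=frozenset()):
    """Pick the highest-utility option per shot, skipping owner-forbidden ones."""
    plan = {}
    for shot_id, options in shots.items():
        allowed = [o for o in options if o.name not in forbidden]
        if not allowed:
            raise ValueError(f"no permitted adaptation for shot {shot_id!r}")
        plan[shot_id] = max(allowed, key=utility).name
    return plan


if __name__ == "__main__":
    shots = {"shot1": [AdaptationOption("drop_object", 0.2, 0.1, 0.3, 0.9),
                       AdaptationOption("blur_region", 0.1, 0.3, 0.2, 0.8)]}
    print(choose_plan(shots))
    print(choose_plan(shots, forbidden={"drop_object"}))
```

Each shot is evaluated independently, mirroring the abstract's statement that the ADTE considers each shot separately; a real engine would presumably also enforce user-profile constraints, which this sketch leaves out.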

Articles published on High-level Semantic Video

Learning explicit video attributes from mid-level representation for video captioning

Personalized video adaptation framework (PIAF): high-level semantic adaptation

The thinking eye is only half the story: High-level semantic video surveillance

A Visual Attention Based Region-of-Interest Determination Framework for Video Sequences
