Video summarization (VS) technology is developing along multiple directions. However, shortening a long video into several diverse and concise versions still poses challenges, such as generating multiple versions and variable-length summaries according to user requirements. In this context, it is critical to extract the spatial–temporal features of video frames. To address these issues, this paper proposes a multi-flexible video summarization scheme using a property-constraint decision. Because user requirements are difficult to transform directly into the factors of a summarization algorithm, this paper employs property constraints as a bridge between user requirements and the video summarization algorithm. Specifically, existing research results on summary-oriented properties are aggregated and analyzed: a property-constraints library is first established, user requirements are then translated and mapped into VS property constraints, and finally a property-constraints tree is constructed for flexibly deciding among VS versions. In addition, a hybrid cascade bidirectional network (CB-ConvLSTM), based on the Convolutional LSTM Network (ConvLSTM), is designed to extract the spatial–temporal features of the video; on this basis, the VS generator is configured. The goal of this scheme, which combines the VS property constraints and CB-ConvLSTM, is to "analyze once, satisfy multiple factors, generate multiple levels". Verification experiments and a comparative analysis are conducted on benchmark datasets to evaluate the proposed algorithm. The results indicate that it is highly rational, effective, and usable.
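To make the property-constraint idea concrete, the sketch below is a minimal, hypothetical illustration of the pipeline the abstract describes: user requirements are looked up in a small property-constraints library, merged into a set of constraints, and a simple greedy decision applies those constraints to per-frame importance scores. All property names, parameters (`max_ratio`, `min_shot_gap`, `score_floor`), and values are illustrative assumptions, not the authors' actual library or algorithm.

```python
# Toy property-constraints library (assumed): each summary-oriented property
# maps to a parameter of the summary generator.
PROPERTY_LIBRARY = {
    "brief":     {"max_ratio": 0.05},   # keep at most 5% of the frames
    "standard":  {"max_ratio": 0.15},
    "diverse":   {"min_shot_gap": 30},  # enforce spacing between kept frames
    "highlight": {"score_floor": 0.8},  # keep only frames above a threshold
}

def map_requirements(user_requirements):
    """Translate user requirements into a merged set of property constraints."""
    constraints = {}
    for req in user_requirements:
        constraints.update(PROPERTY_LIBRARY.get(req, {}))
    return constraints

def decide_summary(frame_scores, constraints):
    """Greedily pick frame indices that satisfy the merged constraints."""
    n = len(frame_scores)
    floor = constraints.get("score_floor", 0.0)
    gap = constraints.get("min_shot_gap", 1)
    budget = max(1, int(n * constraints.get("max_ratio", 0.15)))
    # Rank frames by importance, then keep those that respect the score
    # floor and the spacing constraint, up to the length budget.
    ranked = sorted(range(n), key=lambda i: frame_scores[i], reverse=True)
    kept = []
    for i in ranked:
        if len(kept) >= budget:
            break
        if frame_scores[i] < floor:
            continue
        if all(abs(i - j) >= gap for j in kept):
            kept.append(i)
    return sorted(kept)

# 100 synthetic per-frame importance scores (in practice these would come
# from a spatial-temporal model such as the CB-ConvLSTM described above).
scores = [0.1, 0.9, 0.3, 0.95, 0.2, 0.85, 0.4, 0.6, 0.7, 0.05] * 10
summary = decide_summary(scores, map_requirements(["brief", "diverse"]))
print(summary)
```

Requesting a different property set (e.g. `["standard", "highlight"]`) yields a different version of the summary from the same scores, which is the "analyze once, generate multiple levels" behavior the scheme aims for.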