Video Summarization Using Deep Learning and Optimization Approaches: A Systematic Review
The burgeoning volume of video data has intensified the imperative for advanced mechanisms to enable efficient storage, navigation, indexing, retrieval, and fluid content dissemination. Despite extensive scholarly efforts in video summarization, there persists a critical need to consolidate recent innovations, delineate ongoing challenges, trace emerging paradigms, standardize evaluative frameworks, and curate benchmark datasets for rigorous performance appraisal. This survey provides a comprehensive analysis of contemporary summarization methodologies, spotlighting transformative advancements and paradigmatic shifts over the past two decades that have redefined the domain. It systematically classifies core approaches, synthesizes pivotal insights, and underscores significant milestones. Video summarization condenses voluminous footage into its most semantically rich segments, a functionality indispensable for applications such as surveillance, where continuous Closed-Circuit Television (CCTV) monitoring underpins security and incident tracking. Yet, processing protracted video content remains computationally demanding and time-intensive, a challenge compounded when integrating multiple perspectives, thus emphasizing the centrality of Multi-View Summarization (MVS). This study elucidates the theoretical underpinnings, technical intricacies, and practical implications of both single-view and multi-view summarization, with particular emphasis on deep learning architectures and optimization-driven strategies. Through a systematic review of recent developments, the paper aims to inform future research, unlock new opportunities, and contribute to the evolution of more robust and adaptive video summarization frameworks.
- Book Chapter
66
- 10.1201/9781003277224-2
- Aug 15, 2022
It was only up till recent times that computer science and its were sufficient for the application in basic principles. With the in the field of artificial intelligence, the subset Deep learning is towards substantial research and advances, creating diverse We cannot consider deep learning to be an individual approach; it is a collective term which comprises fields from contrasting to be associated with the common spine: deep learning. Basis for strong approach in deep learning lies in cognizance of the of deep learning. Implementations span many fields and typically rely on not one but several algorithms to reach the goal. Deep learning architectures have advanced rapidly in recent years and continue to be refined on demand, which implies that the architecture is dynamic. A few of the most improved architectures are: recurrent neural networks (RNNs); long short-term memory (LSTM)/gated recurrent unit (GRU); convolutional neural networks (CNNs); deep belief networks (DBN) and deep stacking networks (DSNs); and open source software options for deep learning. The area of implementation for deep learning in problem solving is vast. Feed-forward networks are very effective, and recurrent networks can also be a good solution for deep learning problems. A deep learning framework can be implemented in software packages for the creation of neural networks. The framework needs to be implemented on a standardized scale and therefore requires industry experts. In simple terms, the entire framework is based on diagnosing the problem and then evaluating it. It is evident that the architecture and framework of deep learning are vast and are expanding into every field where implementation is possible. Therefore the deep learning architecture and framework would be presented step by step.
All of the aforesaid architectures (recurrent neural networks, long short-term memory/gated recurrent units, convolutional neural networks, deep belief and deep stacking networks) as well as the open source options would be simplified and illustrated.
- Research Article
37
- 10.1016/j.neucom.2018.12.040
- Dec 28, 2018
- Neurocomputing
Video summarization via spatio-temporal deep architecture
- Research Article
1
- 10.32628/cseit24104109
- Jul 9, 2024
- International Journal of Scientific Research in Computer Science, Engineering and Information Technology
Combining machine learning (ML) and deep learning (DL) approaches has brought substantial progress to medical image classification, an essential component of medical diagnostics. This paper examines in depth the development, methodology, and applications of ML and DL in medical image classification. Traditional machine learning techniques, such as support vector machines and decision trees, relied on handcrafted features and laid the groundwork for early achievements in the field. The introduction of deep learning, and more specifically convolutional neural networks (CNNs), has since transformed the field by making it possible to extract features automatically and obtain better performance. The article reviews several deep learning architectures, including ResNet, VGG, and Inception, and highlights their contributions to tasks such as disease classification, organ segmentation, and tumor identification. It also addresses problems such as the lack of data, interpretability, and computational demands, and discusses solutions such as data augmentation, transfer learning, and model optimization. In addition, the review considers ethical concerns and the need for rigorous validation to ensure clinical applicability. Through a comparative analysis of current research, the study highlights the transformative influence of machine learning and deep learning on medical imaging, and the ongoing need for innovation and cross-disciplinary cooperation to improve diagnostic accuracy and patient outcomes.
- Research Article
21
- 10.1016/j.jtumed.2020.10.008
- Nov 10, 2020
- Journal of Taibah University Medical Sciences
Investigating the learning approaches of students in nursing education
- Research Article
62
- 10.1007/s11042-021-10977-y
- May 15, 2021
- Multimedia Tools and Applications
The volume of video data generated has grown exponentially over the years, and video summarization has emerged as a process that can facilitate efficient storage, quick browsing, indexing, fast retrieval, and quick sharing of content. Given the vast literature on different aspects of video summarization approaches and techniques, a need has arisen to summarize and organize recent research findings, future research focus and trends, challenges, performance measures and evaluation, and datasets for testing and validation. This paper investigates the existing video summarization frameworks and presents a comprehensive view of the existing approaches and techniques. It highlights recent advances in the techniques and discusses the paradigm shift that has occurred in the area over the last two decades, leading to considerable improvement. Attempts are made to consolidate the most significant findings, from the basic summarization structure to the classification of summarization techniques and noteworthy contributions in the area. Additionally, the existing datasets for video summarization and evaluation are enumerated and categorized by domain. The present study would be helpful in assimilating important research findings and data for ready reference, identifying groundwork, and exploring potential directions for further research.
- Book Chapter
- 10.1007/978-3-030-76167-7_11
- Jan 1, 2021
Forecasting has become an important task in many domains where future estimates are useful, such as economics, weather, transportation, environment, sales and production, finance, sports, and health care. It predicts future values from present and past data, which requires processing a large volume of historical data to analyze trends and identify behavior. Every day, petabytes of data are generated by devices, sensors, and social media, which helps in developing forecasting applications. Forecasting can be performed with different approaches, including qualitative and quantitative methods. Traditional approaches are not suited to processing such huge volumes of data, which results in inefficient forecasting models. Hence, deep neural network architectures, i.e., deep learning techniques (DLT), are more suitable for processing, analyzing, and forecasting. DLT also supports nonlinear, multivariate, and multistep data in developing efficient forecasting models. The main motivation of this chapter is to outline the use of deep learning techniques for developing forecasting models in various domains. The chapter focuses on deep learning architectures such as fully connected neural networks (FCNN), recursive neural networks, convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), and deep reinforcement learning, covering forecasting models with various parameters and the applications of these methods.
- Research Article
1
- 10.21315/eimj2022.14.4.7
- Dec 27, 2022
- Education in Medicine Journal
There are minimal published data on the relationship between personality traits and learning approaches among medical students. This study explored the cause-effect relationship between personality traits and learning approaches among medical students. A cross-sectional study was conducted on medical students, who responded to the USM Personality Inventory and the Learning Approach Inventory to measure personality traits and learning approaches, respectively. Structural equation modelling was performed with AMOS 24 to test the cause-effect relationship between personality traits and learning approaches. Conscientiousness had a positive direct effect on the deep learning approach, while neuroticism had a negative direct effect on the deep and strategic learning approaches. Extraversion, openness, and agreeableness had no significant link to or effect on any learning approach. The strategic learning approach had a positive direct effect on the deep learning approach and acted as a mediator for surface learners toward the deep learning approach. The surface learning approach had a negative direct effect on the deep learning approach. There was a significant relationship between specific personality traits and learning approaches: conscientiousness and neuroticism had significant relationships with the deep and strategic learning approaches. These findings enable medical educators to better understand the influence of personality traits on medical students’ approaches to learning tasks and their implications for instructional strategies.
- Research Article
1574
- 10.3390/electronics8030292
- Mar 5, 2019
- Electronics
In recent years, deep learning has garnered tremendous success in a variety of application domains. This new field of machine learning has been growing rapidly and has been applied to most traditional application domains, as well as some new areas that present more opportunities. Different methods have been proposed based on different categories of learning, including supervised, semi-supervised, and unsupervised learning. Experimental results show state-of-the-art performance using deep learning when compared to traditional machine learning approaches in the fields of image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bioinformatics, natural language processing, cybersecurity, and many others. This paper presents a brief survey of the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network (DNN). The survey goes on to cover the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), the Auto-Encoder (AE), the Deep Belief Network (DBN), the Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). Additionally, we discuss recent developments, such as advanced variant DL techniques based on these DL approaches. This work considers most of the papers published after 2012, the point from which the history of deep learning is taken to begin. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We also include recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. Some surveys have been published on DL using neural networks, along with a survey on Reinforcement Learning (RL). However, those papers have not discussed individual advanced techniques for training large-scale deep learning models or the recently developed methods for generative models.
- Research Article
19
- 10.3390/app13106065
- May 15, 2023
- Applied Sciences
There is an abundance of digital video content due to the cloud’s phenomenal growth and security footage; it is therefore essential to summarize these videos in data centers. This paper offers innovative approaches to the problem of key frame extraction for the purpose of video summarization. Our approach includes the extraction of feature variables from the bit streams of coded videos, followed by optional stepwise regression for dimensionality reduction. Once the features are extracted and their dimensionality is reduced, we apply innovative frame-level temporal subsampling techniques, followed by training and testing using deep learning architectures. The frame-level temporal subsampling techniques are based on cosine similarity and the PCA projections of feature vectors. We create three different learning architectures by utilizing LSTM networks, 1D-CNN networks, and random forests. The four most popular video summarization datasets, namely, TVSum, SumMe, OVP, and VSUMM, are used to evaluate the accuracy of the proposed solutions. This includes the precision, recall, F-score measures, and computational time. It is shown that the proposed solutions, when trained and tested on all subjective user summaries, achieved F-scores of 0.79, 0.74, 0.88, and 0.81, respectively, for the aforementioned datasets, showing clear improvements over prior studies.
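The abstract above says only that the temporal subsampling is "based on cosine similarity"; it does not give the exact rule. As a rough illustration of how such a step might work, the following minimal Python sketch keeps a frame only when its feature vector differs enough from the last kept frame. The function names, the greedy keep-last comparison, and the `threshold` value of 0.95 are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def subsample_frames(
    features: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[int]:
    """Return indices of frames to keep.

    Hypothetical greedy rule: always keep the first frame, then keep a frame
    only if its similarity to the most recently kept frame is below `threshold`.
    """
    if not features:
        return []
    kept = [0]
    for i in range(1, len(features)):
        if cosine_similarity(features[kept[-1]], features[i]) < threshold:
            kept.append(i)
    return kept


# Toy per-frame feature vectors (made up for illustration only).
frames = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],  # nearly identical to frame 0, so it is dropped
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],   # nearly identical to frame 2, so it is dropped
    [0.0, 0.0, 1.0],
]
print(subsample_frames(frames))  # [0, 2, 4]
```

A higher threshold keeps more frames, since fewer pairs count as near-duplicates, and a lower one subsamples more aggressively. In a pipeline like the one described, the kept frames would then be passed on to the learning stage.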
- Research Article
10
- 10.1186/s13634-024-01139-x
- May 15, 2024
- EURASIP Journal on Advances in Signal Processing
Person re-identification (ReID) aims to find a person of interest across multiple non-overlapping cameras. It is considered an essential step for person tracking applications, which are vital for surveillance. Person ReID can be investigated using either image-based or video-based methods. Video-based person ReID is considered more discriminating and realistic than image-based ReID because of the large amount of information extracted for each person. Different deep learning techniques have been used for video-based ReID. In this survey, recently published articles are reviewed according to the video-based ReID system pipeline: deep feature learning, deep metric learning, and deep learning approaches. The deep feature learning approaches are categorized into spatial and temporal approaches, while deep metric learning is divided into metric and metric learning approaches. The deep learning approaches are differentiated into supervised, unsupervised, weakly-supervised, and one-shot learning. The architectures of the state-of-the-art deep learning approaches are analyzed in detail, and their performance on four benchmark datasets is compared.
- Research Article
2
- 10.4103/sjhs.sjhs_106_19
- Jan 1, 2019
- Saudi Journal for Health Sciences
Background: In problem-based learning (PBL) curricula implemented around the world, it is assumed that students adopt a deep learning approach to studying and aim to gain a profound understanding of the subjects being studied. However, it is not clear which PBL components initiate or deter deep learning, to what extent this happens, and why. Aim: This study explored to what extent students used a deep or surface learning approach in PBL and whether this differs across years. We also investigated which PBL components students perceived to be hindrances to deep or surface learning. Methods: The study took place at Sulaiman Al Rajhi Medical College, Qassim, Kingdom of Saudi Arabia. A mixed-methods approach was applied. A validated questionnaire and semi-structured focus group interviews were conducted sequentially. Results: On a scale of 1–5, first-, second-, and third-year students reported mean (M) deep learning scores of M = 3.55, M = 3.41, and M = 3.55, respectively, and mean surface learning scores of M = 2.88, M = 2.78, and M = 2.89, respectively. The differences for both deep and surface learning across the years were statistically nonsignificant. According to students, they study main learning objectives deeply and minor objectives superficially, as indicated by tutors; they are stimulated toward deep learning by interesting topics during self-study; and examinations drive them toward deep or surface learning depending on the question format and the necessity to pass. Conclusions: The results of this study confirm that students' perceptions of PBL components affect their approaches to deep and surface learning. These effects are not entirely negative or positive. Students seem to frequently employ a deep learning approach in PBL throughout the 3 years.
These conclusions will allow program administrators/educationalists to constructively design curricula around learners' perceptions of PBL tutors, topics, and examinations.
- Research Article
7
- 10.3389/fdata.2022.1106776
- Jan 9, 2023
- Frontiers in Big Data
With the massive expansion of videos on the internet, searching through millions of them has become quite challenging. Smartphones, recording devices, and file sharing are all examples of ways to capture massive amounts of real time video. In smart cities, there are many surveillance cameras, which has created a massive volume of video data whose indexing, retrieval, and administration is a difficult problem. Exploring such results takes time and degrades the user experience. In this case, video summarization is extremely useful. Video summarization allows for the efficient storing, retrieval, and browsing of huge amounts of information from video without sacrificing key features. This article presents a classification and analysis of video summarization approaches, with a focus on real-time video summarization (RVS) domain techniques that can be used to summarize videos. The current study will be useful in integrating essential research findings and data for quick reference, laying the preliminaries, and investigating prospective research directions. A variety of practical uses, including aberrant detection in a video surveillance system, have made successful use of video summarization in smart cities.
- Research Article
75
- 10.1007/s10462-023-10444-0
- Mar 15, 2023
- Artificial Intelligence Review
One of the critical multimedia analysis problems in today’s digital world is video summarization (VS). Many VS methods based on deep learning have been suggested. Nevertheless, these are inefficient at processing, extracting, and deriving information from long-duration videos in minimal time. A detailed analysis of numerous deep learning approaches is carried out to determine the root of the problems that different deep learning methods face in identifying and summarizing the essential activities in such videos. Various deep learning techniques are examined for their ability to detect events and to summarize multiple activities. Keyframe selection, event detection, categorization, and activity feature summarization correspond to each activity. The limitations related to each category are also discussed in depth. Concerns about detecting low activity using deep networks on various types of public datasets are also discussed. Viable strategies are suggested to evaluate and improve the generated video summaries on such datasets. Moreover, potential applications recommended in the literature are listed. Various deep learning tools for experimental analysis are also discussed in the paper. Future directions are presented for further research in VS using deep learning strategies.
- Research Article
8
- 10.1152/advan.00196.2023
- Apr 11, 2024
- Advances in physiology education
This study aimed to compare the impact of the partially flipped physiology classroom (PFC) and the traditional lecture-based classroom (TLC) on students' learning approaches. The study was conducted over 5 months at Xiangya School of Medicine from February to July 2022 and comprised 71 students majoring in clinical medicine. The experimental group (n = 32) received PFC teaching, whereas the control group (n = 39) received TLC. The Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) was used to assess the impact of the different teaching methods on students' learning approaches. After the PFC, students scored significantly higher on the deep learning approach (Z = -3.133, P < 0.05). Conversely, after the TLC, students scored significantly higher on the surface learning approach (Z = -2.259, P < 0.05). After the course, students in the PFC group scored significantly higher in deep learning strategy than those in the TLC group (Z = -2.196, P < 0.05). The PFC model had a positive impact on deep learning motive and strategy, leading to an improvement in the deep approach, which is beneficial for the long-term development of students. In contrast, the TLC model only improved the surface learning approach. The study implies that educators should consider implementing PFC to enhance students' learning approaches.
NEW & NOTEWORTHY: In this article, we compare the impact of the partially flipped classroom (PFC) and the traditional lecture classroom (TLC) in a physiology course on medical students' learning approaches. We found that the PFC benefited students by significantly enhancing their deep learning motive, strategy, and approach. However, the TLC model only improved the surface learning motive and approach.
- Research Article
63
- 10.3390/computers8010004
- Jan 1, 2019
- Computers
We describe the sentiment analysis experiments that were performed on the Lithuanian Internet comment dataset using traditional machine learning approaches, namely Naïve Bayes Multinomial (NBM) and Support Vector Machine (SVM), and deep learning approaches, namely Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). The traditional machine learning techniques were used with features based on lexical, morphological, and character information. The deep learning approaches were applied on top of two types of word embeddings (Word2Vec continuous bag-of-words with negative sampling, and FastText). Both traditional and deep learning approaches had to solve the positive/negative/neutral sentiment classification task on the balanced and full dataset versions. The best deep learning results (an accuracy of 0.706) were achieved on the full dataset with CNN applied on top of the FastText embeddings, replaced emoticons, and eliminated diacritics. The traditional machine learning approaches demonstrated the best performance (an accuracy of 0.735) on the full dataset with the NBM method, replaced emoticons, restored diacritics, and lemma unigrams as features. Although traditional machine learning approaches were superior to the deep learning methods, deep learning demonstrated good results when applied on the small datasets.