Will You Ever Become Popular? Learning to Predict Virality of Dance Clips

Abstract

Dance challenges now go viral in video communities such as TikTok. Once a challenge becomes popular, thousands of short-form videos are uploaded within a couple of days. Virality prediction for dance challenges is therefore of great commercial value and has a wide range of applications, such as smart recommendation and popularity promotion. In this article, a novel multi-modal framework that integrates skeletal, holistic appearance, facial and scenic cues is proposed for comprehensive dance virality prediction. To model body movements, we propose a pyramidal skeleton graph convolutional network (PSGCN) that hierarchically refines spatio-temporal skeleton graphs. Meanwhile, we introduce a relational temporal convolutional network (RTCN) to exploit appearance dynamics with non-local temporal relations. Finally, an attentive fusion approach is proposed to adaptively aggregate predictions from different modalities. To validate our method, we introduce a large-scale viral dance video (VDV) dataset, which contains over 4,000 dance clips of eight viral dance challenges. Extensive experiments on the VDV dataset demonstrate the effectiveness of our approach. Furthermore, we show that short video applications such as multi-dimensional recommendation and action feedback can be derived from our model.
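
The attentive fusion step can be illustrated with a minimal sketch. The abstract does not specify how the attention weights are computed, so the code below assumes a plain softmax over one scalar score per modality; the function names, modality keys and numeric values are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_fusion(predictions, attention_scores):
    """Fuse per-modality virality predictions with softmax attention weights.

    predictions      -- dict: modality name -> predicted virality score
    attention_scores -- dict: modality name -> unnormalized attention score
                        (assumed here to come from some learned scoring layer)
    Returns the fused prediction and the normalized weight of each modality.
    """
    names = list(predictions)
    weights = softmax([attention_scores[n] for n in names])
    fused = sum(w * predictions[n] for w, n in zip(weights, names))
    return fused, dict(zip(names, weights))

# Hypothetical per-modality outputs for one clip.
preds = {"skeleton": 0.8, "appearance": 0.6, "face": 0.4, "scene": 0.2}
scores = {"skeleton": 0.0, "appearance": 0.0, "face": 0.0, "scene": 0.0}
fused, weights = attentive_fusion(preds, scores)
```

With equal attention scores the fusion reduces to a plain average; raising the score of one modality shifts the fused prediction toward that modality's output.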

Similar Papers
  • Book Chapter
  • 10.1007/978-981-19-8915-5_28
IeSTGCN: A Mining Model of Skeleton Spatio-temporal Graph
  • Jan 1, 2022
  • Guojun Mao + 1 more

Traditional methods of human action recognition focus on internal links between human joints, that is, local neighbor connections. However, some external links that do not exist in the skeleton graph are also important for identifying human behaviors, such as hands and feet moving in harmony. To capture richer spatial features in a skeleton graph, it is necessary to add external links and distinguish them from internal links. Therefore, this paper designs two different adjacency matrices to characterize the internal links and external links of the human body respectively, and assigns them different edge weights and feature weights that are learned autonomously during the convolution process. Furthermore, a spatio-temporal graph convolution network called ieSTGCN is proposed. It consists of two modules: a graph convolution network supporting internal and external links (ieGCN) and a temporal convolution network on human joints (joTCN). Experiments on the Kinetics and NTU-RGB+D datasets demonstrate that the model obtains better recognition accuracy than some benchmark models.
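
The idea of keeping separate adjacency matrices for internal and external links can be sketched as follows. This is an illustrative toy layer under stated assumptions, not the ieGCN module itself: the row normalization, the additive combination of the two branches, the three-joint graph and all weight values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_rows(adj):
    """Divide each row by its sum so that neighbors are averaged."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def dual_adjacency_graph_conv(features, adj_internal, adj_external,
                              w_internal, w_external):
    """One graph-convolution step with separate weights per link type.

    Each branch aggregates neighbor features over its own adjacency matrix
    and applies its own feature weights; the two results are summed.
    """
    h_int = matmul(matmul(normalize_rows(adj_internal), features), w_internal)
    h_ext = matmul(matmul(normalize_rows(adj_external), features), w_external)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(h_int, h_ext)]

# Toy graph with three joints: 0 = hand, 1 = elbow, 2 = foot.
adj_internal = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # bone hand-elbow plus self-loops
adj_external = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # non-bone link hand-foot
features = [[1.0], [2.0], [3.0]]                   # one feature per joint
out = dual_adjacency_graph_conv(features, adj_internal, adj_external,
                                w_internal=[[1.0]], w_external=[[0.5]])
```

In this toy example the hand node receives information from the foot only through the external branch, which is the kind of long-range cue the internal skeleton links cannot carry.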

  • Research Article
  • 10.55011/staiqc.2022.2102
Optimized Skeleton graph based CNN for Human Abnormal Detection in Video Streams
  • Jan 1, 2022
  • Sparklinglight Transactions on Artificial Intelligence and Quantum Computing
  • Bhagya Jyothi K + 1 more

Human Action Recognition (HAR) is the process of understanding human actions and behavior. HAR has a broad range of applications and has attracted increasing attention in various domains of computer vision. Detecting abnormal actions in video streams is vital for ensuring security in both outdoor and indoor spaces. Furthermore, abnormal actions are rare, which makes the supervision process more challenging. In this research, a skeleton graph-based Convolutional Neural Network (CNN), denoted Skeleton graph_CNN, is devised for human abnormal activity detection; it builds on classical convolution and skeleton graph generation. The recognition step classifies human actions into normal and abnormal classes. Abnormal actions are detected from the recognized outcome with Skeleton graph_CNN, which outputs the various human actions. The Skeleton graph_CNN generates a skeleton-shaped human structure by connecting the joints within each frame and across consecutive frames. The method is evaluated on the IITB-Corridor dataset, achieving a testing accuracy of 0.961, a sensitivity of 0.956, and a specificity of 0.960.

  • Research Article
  • Cite Count Icon 6
  • 10.1088/1361-6501/ad73f1
Temporal convolutional network with soft threshold and contractile self-attention mechanism for remaining useful life prediction of rolling bearings
  • Sep 5, 2024
  • Measurement Science and Technology
  • Hao Ma + 4 more

Remaining useful life (RUL) prediction is an effective approach to prevent system failures and reduce maintenance expenditures. Due to its wide receptive field and its avoidance of future information leakage, the temporal convolutional network (TCN) is widely applied for RUL estimation of bearings. However, the predictive performance of TCN is limited by the loss of degradation features and the breakdown of continuity in timing information. To overcome these defects, a hybrid temporal convolutional network with soft threshold and contractile self-attention mechanism (HTCN-SC) is proposed. Firstly, the adaptive threshold is determined by the contractile self-attention mechanism with higher interpretability, which captures the contribution of different features to the estimation of RUL. Then, the soft threshold is employed to activate the degradation features. On the one hand, the degradation features with obvious negative values produced by the dilated causal convolution are fully preserved. On the other hand, the noise components that are given low weights are completely suppressed compared to the original TCN. Finally, a parallel branch composed of one-dimensional convolutional networks is used to supplement the continuity of the time series. Degradation signals from different working conditions and bearings are employed to verify the performance of the HTCN-SC. The results indicate that HTCN-SC, with accurate RUL estimation and generalization ability, is an effective tool for rolling bearing health monitoring.
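
The soft-threshold operation mentioned above has a standard closed form, sign(x) * max(|x| - tau, 0), sketched below. In the abstract the threshold is produced adaptively by the self-attention mechanism; the fixed `tau` and the sample values here are hypothetical and only illustrate the shrinkage behavior.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become exactly 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def soft_threshold_sequence(values, tau):
    """Apply the soft threshold elementwise to a feature sequence."""
    return [soft_threshold(v, tau) for v in values]

# Hypothetical feature values: small entries act like noise, large ones like signal.
features = [-2.0, -0.3, 0.1, 0.4, 1.0]
activated = soft_threshold_sequence(features, tau=0.5)
```

Unlike ReLU, this activation keeps strongly negative values (shifted by `tau`) instead of zeroing them, which matches the abstract's remark that features with obvious negative values are preserved while low-magnitude noise is suppressed.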

  • Research Article
  • Cite Count Icon 6
  • 10.3390/electronics12051115
Deep-Learning-Based Sequence Causal Long-Term Recurrent Convolutional Network for Data Fusion Using Video Data
  • Feb 24, 2023
  • Electronics
  • Daehyeon Jeon + 1 more

The purpose of AI-based schemes in intelligent systems is to advance and optimize system performance. Most intelligent systems adopt sequential data types derived from such systems. Real-time video data, for example, are continuously updated as a sequence to make necessary predictions for efficient system performance. The majority of deep-learning-based network architectures such as long short-term memory (LSTM), data fusion, two streams, and temporal convolutional network (TCN) for sequence data fusion are generally used to enhance robust system efficiency. In this paper, we propose a deep-learning-based neural network architecture for non-fix data that uses both a causal convolutional neural network (CNN) and a long-term recurrent convolutional network (LRCN). Causal CNNs and LRCNs incorporate convolutional layers for feature extraction, so both architectures are capable of processing sequential data such as time series or video data that can be used in a variety of applications. Both architectures also extract features from the input sequence data to reduce the dimensionality of the data and capture the important information, and learn hierarchical representations for effective sequence processing tasks. We have also adopted the concept of a series compact convolutional recurrent neural network (SCCRNN), a type of neural network architecture designed for processing sequential data that combines convolutional and recurrent layers compactly, reducing the number of parameters and memory usage while maintaining high accuracy. The architecture is challenge-able and suitable for continuously incoming sequence video data, which allowed us to bring together the advantages of both LSTM-based networks and CNN-based networks.
To verify this method, we evaluated it through a sequence learning model with the network parameters and memory required in real environments, based on the UCF-101 dataset, an action recognition data set of realistic action videos collected from YouTube with 101 action categories. The results show that the proposed sequence causal long-term recurrent convolutional network (SCLRCN) provides a performance improvement of approximately 12% or more compared with the existing models (LRCN and TCN).

  • Research Article
  • Cite Count Icon 1
  • 10.1158/1538-7445.am2022-5053
Abstract 5053: Artificial intelligence (AI)-based multimodal framework predicts androgen-deprivation therapy (ADT) outcomes in non-metastatic castration resistant prostate cancer (nmCRPC) from SPARTAN
  • Jun 15, 2022
  • Cancer Research
  • Pooya Mobadersany + 12 more

Objective: AI has demonstrated great promise in learning sophisticated features and relations in data that would otherwise remain hidden to the human eye. Here, we developed a proprietary AI-based multimodal approach to integrate clinical, digitized hematoxylin-eosin (H&E), and radiology bone scan (rBS) data for outcome prediction in ADT-treated nmCRPC patients. Identifying prostate cancer patients who may not benefit from ADT could improve the medical management of this disease beyond current definitive therapy. Methods: Patients in the ADT+placebo arm from SPARTAN clinical trial on nmCRPC with available clinical, H&E, and rBS were used (n=154). These patients were randomly divided into 70% (n=107) discovery and 30% (n=47) hold-out test datasets. Using the discovery set, we developed and trained a multimodal approach that combines survival convolutional neural networks (SCNNs) [1] and Cox proportional-hazards model (CPH) to learn ADT outcomes for overall survival (OS) and time to PSA progression (TTP) from the integration of imaging data and 11 traditional clinical features (e.g., tumor stage, Gleason score, PSA). The ability of the trained framework in predicting outcomes and risk stratification was evaluated on the hold-out set. Bootstrap analysis with Wilcoxon signed rank test was used to determine the significance of the multimodal framework’s performance improvement compared to clinical CPH. Results: The multimodal framework was predictive of ADT outcomes for OS and TTP in nmCRPC patients. In SPARTAN’s hold-out set, the multimodal framework significantly improved the predictive power of clinical CPH by 14% to 16% across both outcomes (Wilcoxon signed rank P<0.0001). In particular, the multimodal framework’s concordance index (c-index) was 0.72 for OS and 0.73 for TTP, while clinical CPH’s c-index was 0.62 for OS, and 0.64 for TTP.
Further, multimodal framework significantly stratified high- from low-risk nmCRPC patients for OS and TTP (log-rank P= 0.0049-0.0072), while clinical CPH failed to stratify risk for OS (log-rank P= 0.2891). Conclusion: AI-based framework that learns from the integration of different data types improves outcome prediction in ADT-treated nmCRPC. The multimodal approach demonstrates promise in treatment decision support for the early use of androgen receptor-directed therapy and patient selection for clinical trials with novel treatment combinations. Reference: 1. Mobadersany, Pooya, et al. "Predicting cancer outcomes from histology and genomics using convolutional networks." Proceedings of the National Academy of Sciences 115.13 (2018): E2970-E2979. Conflict of interest statement P.M., J.L., D.G., C.A., S.M., S.B., M.K.Y., K.T., N.H., J.Z., J.G., N.K., and S.S.F.Y., are employees of Janssen Pharmaceutical, LLC. Citation Format: Pooya Mobadersany, Justin Lucas, Darshana Govind, Clemente Aguilar-Bonavides, Sharon McCarthy, Sabine Brookman-May, Margaret K. Yu, Ken Tian, Natalie Hutnick, Jose Zamalloa, Joel Greshock, Najat Khan, Stephen S.F. Yip. Artificial intelligence (AI)-based multimodal framework predicts androgen-deprivation therapy (ADT) outcomes in non-metastatic castration resistant prostate cancer (nmCRPC) from SPARTAN [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 5053.
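
For readers unfamiliar with the concordance index reported in this abstract, the sketch below computes Harrell's c-index for right-censored survival data. It is a generic illustration, not the study's evaluation code; the patient times, event flags and risk scores are made-up toy values.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored data.

    times  -- observed follow-up time per patient
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means earlier expected event)
    A pair (i, j) is comparable when patient i had the event before time j.
    Ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.72 and 0.73 versus 0.62 and 0.64.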

  • Research Article
  • Cite Count Icon 6
  • 10.1088/1361-6501/ad8add
A multi-scale temporal convolutional capsule network with parameter-free attention module-dynamic routing for intelligent diagnosis of rolling bearing
  • Nov 19, 2024
  • Measurement Science and Technology
  • Yulin Jin + 3 more

We proposed a multi-scale temporal convolutional capsule network model coupled with a parameter-free attention module and dynamic routing mechanism to analyze complex vibration signals for diagnosing the health of bearings. The proposed method utilizes a capsule network as the fundamental architecture. Instead of a convolutional neural network, a temporal convolutional network is employed. Additionally, a multi-scale feature fusion module is integrated into the capsule network structure to dynamically extract various layers of features from fault samples, enhancing the discriminatory capability of abnormal data. Subsequently, the parameter-free attention module and dynamic routing mechanism are employed to construct digital capsules. This allows the smallest unit capsule in a single layer to carry more information, enhance the similarity between the instance primary capsule and the fault capsule, reduce the interference of irrelevant features to the model, and improve the accuracy of fault type recognition. Finally, a multi-scale temporal convolutional capsule network model that integrates feature extraction and pattern recognition is established to perform end-to-end diagnosis of the bearing. Experimental findings suggest that the proposed method outperforms other deep learning methods in terms of accuracy and robustness. It can provide a theoretical basis and implementation path for the detection and diagnosis of train wheelset bearing time series abnormal data.

  • Research Article
  • Cite Count Icon 8
  • 10.3934/era.2023135
Systemic risk prediction based on Savitzky-Golay smoothing and temporal convolutional networks
  • Jan 1, 2023
  • Electronic Research Archive
  • Xite Yang + 4 more

Based on data from January 2007 to December 2021, this paper selects 14 representative systemic risk indicators from four levels: the extreme risk of financial institutions, the contagion effect between financial systems, the volatility and instability of financial markets, and liquidity and credit risk. By constructing a Savitzky-Golay-TCN deep convolutional neural network, the systemic risk indicators of China's financial market are predicted, and their accuracy and reliability are analyzed. The research found that: 1) the Savitzky-Golay-TCN deep convolutional neural network has a strong generalization ability, and its prediction effect on all indices is stable. 2) Compared with the three control models (time-series convolutional network (TCN), convolutional neural network (CNN), and long short-term memory (LSTM)), the Savitzky-Golay-TCN deep convolutional neural network has excellent prediction accuracy, and its average prediction accuracy for all indices has increased. 3) The Savitzky-Golay-TCN deep convolutional neural network can better monitor financial market changes and effectively predict systemic risk.

  • Research Article
  • Cite Count Icon 25
  • 10.1186/s12938-024-01244-w
Automatic detection of epilepsy from EEGs using a temporal convolutional network with a self-attention layer
  • Jun 1, 2024
  • BioMedical Engineering OnLine
  • Leen Huang + 4 more

Background: Over 60% of epilepsy patients globally are children, whose early diagnosis and treatment are critical for their development and can substantially reduce the disease’s burden on both families and society. Numerous algorithms for automated epilepsy detection from EEGs have been proposed. Yet, the occurrence of epileptic seizures during an EEG exam cannot always be guaranteed in clinical practice. Models that exclusively use seizure EEGs for detection risk artificially enhanced performance metrics. Therefore, there is a pressing need for a universally applicable model that can perform automatic epilepsy detection in a variety of complex real-world scenarios. Method: To address this problem, we have devised a novel technique employing a temporal convolutional neural network with self-attention (TCN-SA). Our model comprises two primary components: a TCN for extracting time-variant features from EEG signals, followed by a self-attention (SA) layer that assigns importance to these features. By focusing on key features, our model achieves heightened classification accuracy for epilepsy detection. Results: The efficacy of our model was validated on a pediatric epilepsy dataset we collected and on the Bonn dataset, attaining accuracies of 95.50% on our dataset, and 97.37% (A vs. E) and 93.50% (B vs. E) on the Bonn dataset. When compared with other deep learning architectures (temporal convolutional neural network, self-attention network, and standardized convolutional neural network) using the same datasets, our TCN-SA model demonstrated superior performance in the automated detection of epilepsy. Conclusion: The proven effectiveness of the TCN-SA approach substantiates its potential as a valuable tool for the automated detection of epilepsy, offering significant benefits in diverse and complex real-world clinical settings.

  • Research Article
  • Cite Count Icon 26
  • 10.1109/access.2022.3219490
Convolution-Bidirectional Temporal Convolutional Network for Protein Secondary Structure Prediction
  • Jan 1, 2022
  • IEEE Access
  • Yunqing Zhang + 2 more

As a basic feature extraction method, convolutional neural networks suffer from some information loss when dealing with sequence problems, and a temporal convolutional network can compensate for this problem. However, ordinary temporal convolutional networks cannot deal well with protein secondary structure prediction because of their one-way analysis. Therefore, we propose an integrated deep learning model called Convolutional-Bidirectional Temporal Convolutional Network for 3-state and 8-state protein secondary structure predictions, based on a convolutional neural network and bidirectional temporal convolutional networks. Because the model combines the advantages of the convolutional neural network and the bidirectional temporal convolutional network, it can not only capture the local correlation in the amino acid sequence but also analyse the long-distance interaction in the amino acid sequence. Therefore, this model can effectively improve the accuracy of protein secondary structure predictions. The experimental results show that the combination of convolutional neural network and bidirectional temporal convolutional network is effective for predicting protein secondary structure.

  • Research Article
  • Cite Count Icon 16
  • 10.3233/jifs-210970
DPTCN: A novel deep CNN model for short text classification
  • Dec 16, 2021
  • Journal of Intelligent & Fuzzy Systems
  • Shujuan Yu + 4 more

In text classification, an important branch of Natural Language Processing (NLP), extracting useful text information and effective long-range associations has always been a bottleneck. With the great effort of deep learning researchers, deep Convolutional Neural Networks (CNNs) have made remarkable achievements in Computer Vision but remain controversial in NLP tasks. In this paper, we propose a novel deep CNN named Deep Pyramid Temporal Convolutional Network (DPTCN) for short text classification, which mainly consists of a concatenated embedding layer, causal convolution, 1/2 max-pooling down-sampling and residual blocks. Our work was highly inspired by two well-designed models: one is the temporal convolutional network for sequential modeling; the other is the deep pyramid CNN for text categorization. Their applicability and pertinence showed us how to build a model for a special domain. In the experiments, we evaluate the proposed model on 7 datasets against 6 models and analyze the impact of three different embedding methods. The results indicate that our work is a good attempt to apply a word-level deep convolutional network to short text classification.

  • Research Article
  • Cite Count Icon 7
  • 10.20965/jaciii.2024.p0552
Basketball Sports Posture Recognition Technology Based on Improved Graph Convolutional Neural Network
  • May 20, 2024
  • Journal of Advanced Computational Intelligence and Intelligent Informatics
  • Jinmao Tong + 1 more

Basketball has rapidly developed in recent years. Analysis of various moves in basketball can provide technical references for professional players and assist referees in judging games. Traditional technology can no longer provide modern basketball players with theoretical support. Therefore, using intelligent methods to recognize human body postures in basketball is a relatively innovative approach. To recognize the basketball postures of players more accurately, this work proposes a basketball stance recognition model based on enhanced graph convolutional networks (GCN), that is, a model built on an enhanced GCN and the spatial temporal graph convolutional network (ST-GCN). This model combines the respective advantages of the GCN and the temporal convolutional network and can handle graph-structured data with time-series relationships. The ST-GCN is derived by realizing the convolution operation on the graph structure and establishing a spatiotemporal graph convolution model for the posture sequence of a person’s body. A dataset of technical basketball actions is constructed to verify the effectiveness of the ST-GCN model. The experimental findings indicate that the final recognition accuracy of the ST-GCN model for basketball postures was approximately 95.58%, whereas the final recognition accuracy of the long short term memory + multiview re-observation skeleton action recognition (LSTM+MV+AC) model was about 93.65%.

  • Research Article
  • Cite Count Icon 5
  • 10.1190/geo2024-0417.1
Nash-multitask learning-semisupervised temporal convolutional network method for prestack three-parameter inversion
  • Jan 7, 2025
  • Geophysics
  • Yingtian Liu + 6 more

Deep-learning techniques have been widely used in prestack three-parameter inversions to address ill-posed problems. Among these techniques, multitask learning (MTL) methods can simultaneously train multiple tasks, enhancing model generalization and predictive performance. However, existing MTL methods typically adopt heuristic or nonheuristic approaches to jointly update the gradient of each task, which often leads to gradient conflicts between different tasks, reducing inversion accuracy. To address this issue, we develop a semisupervised temporal convolutional network (STCN) method based on Nash equilibrium, referred to as the Nash-MTL-STCN method. First, temporal convolutional networks with noncausal convolution and convolutional neural networks (CNNs) are used as multitask layers to extract shared features from partial angle stack seismic data, with CNNs serving as the single-task layer. Subsequently, a feature mechanism is used to extract shared features in the multitask layer through hierarchical processing, and the gradient combination of these shared features is treated as a Nash game for the optimization of strategy and joint updates. This approach maximizes the overall utility of the three-parameter inversion while alleviating gradient conflicts. In addition, to enhance the generalization and stability of the network, we incorporate geophysical forward modeling and low-frequency constraints into the network. Experimental results demonstrate that our method resolves the gradient conflict issue associated with conventional MTL methods with constant weights and achieves higher precision than four widely used nonheuristic MTL methods. Further experiments using field data also validate the effectiveness of our method.

  • Research Article
  • Cite Count Icon 64
  • 10.1016/j.neunet.2023.12.016
GT-LSTM: A spatio-temporal ensemble network for traffic flow prediction
  • Dec 10, 2023
  • Neural Networks: The Official Journal of the International Neural Network Society
  • Yong Luo + 4 more


  • Conference Article
  • Cite Count Icon 42
  • 10.1109/iros45743.2020.9341327
Multiple Trajectory Prediction with Deep Temporal and Spatial Convolutional Neural Networks
  • Oct 24, 2020
  • Jan Strohbeck + 6 more

Automated vehicles need not only to perceive their environment, but also to predict the possible future behavior of all detected traffic participants in order to safely navigate complex scenarios and avoid critical situations, ranging from merging on highways to crossing urban intersections. Due to the availability of datasets with large numbers of recorded trajectories of traffic participants, deep-learning-based approaches can be used to model the behavior of road users. This paper proposes a convolutional network that operates on rasterized actor-centric images which encode the static and dynamic actor-environment. We predict multiple possible future trajectories for each traffic actor, which include position, velocity, acceleration, orientation, yaw rate and position uncertainty estimates. To make better use of the past movement of the actor, we propose to employ temporal convolutional networks (TCNs) and rely on uncertainties estimated from the previous object tracking stage. We evaluate our approach on the public "Argoverse Motion Forecasting" dataset, on which it won the first prize at the Argoverse Motion Forecasting Challenge, as presented at the NeurIPS 2019 workshop on "Machine Learning for Autonomous Driving".

  • Conference Article
  • Cite Count Icon 6
  • 10.1145/3339363.3339377
Investigating Deep Neural Networks for Gravitational Wave Detection in Advanced LIGO Data
  • May 24, 2019
  • Alexander Schmitt + 3 more

Until the first detection of a Gravitational Wave (GW) in September 2015 at the Laser Interferometer Gravitational-Wave Observatory (LIGO), it was unclear whether Einstein's theory (E=mc²) was true and whether GWs existed. A huge investment in highly advanced optical and electronic equipment was made to build huge detectors capable of recording something that was, until then, only a theory. The LIGO detectors are special because they can detect a change in length of one ten-thousandth the width of a proton. These constructions are huge and require large financial and technological investments to achieve this precision.
