Frame-level Features Research Articles

Group activity recognition (GAR) is an increasingly popular topic in computer vision, and many methods achieve strong recognition performance. However, these methods typically require fine-grained extraction of per-person features and a large network architecture to aggregate individual features or reason about relationships between persons. To reduce the heavy annotation burden and high training costs, weak supervision has emerged as a promising approach: only coarse-grained labels are used during network training. This paradigm poses two key challenges. First, it is limited in its ability to model temporal relationships among individual persons; second, it tends to focus on less relevant information, which leads to suboptimal optimization of the network parameters. Both challenges result in erroneous judgments about temporal information and in training inefficiencies. To address them within the weak supervision paradigm, we propose a novel Temporal Contrastive and Spatial Enhancement Coarse-Grained Network (TCSE-CGN) for GAR. TCSE-CGN comprises two simple yet effective streams, the Spatial Enhancement Stream and the Temporal Contrastive Stream. Features are extracted from only a few RGB frames. Half of the extracted feature is sent to the Spatial Enhancement Stream, where an attention mechanism enhances it so that the network automatically learns more representative information. The remaining half is sent to the Temporal Contrastive Stream, which uses contrastive learning to model temporal relationships among all RGB frame-level features; specifically, the network is guided to learn the hidden semantic temporal information in inter-frame sequences. Network parameters are optimized using a combination of a standard cross-entropy loss and a novel temporal contrastive loss. Comprehensive experiments on two widely used datasets, the Volleyball dataset and the Collective dataset, demonstrate the effectiveness of TCSE-CGN. Results show that TCSE-CGN performs competitively with other works that require more supervision and a larger architecture.
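The two-stream idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the form of the temporal contrastive loss, so the version below is an assumption: an InfoNCE-style loss in which each frame's positive is the next frame and all other frames act as negatives. The function names, the channel-wise split, and the temperature value are hypothetical and are not taken from the TCSE-CGN paper.

```python
import math


def split_channels(frame_features):
    """Split each frame's feature vector into two halves along the channel axis.

    Returns (spatial_half, temporal_half). With an odd channel count the
    temporal half receives the extra channel.
    """
    half = len(frame_features[0]) // 2
    spatial = [f[:half] for f in frame_features]
    temporal = [f[half:] for f in frame_features]
    return spatial, temporal


def cosine(u, v):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def temporal_contrastive_loss(temporal, temperature=0.1):
    """Assumed InfoNCE-style loss over frame-level features.

    For frame t the positive is frame t + 1; every other frame is a negative.
    The loss is averaged over the first T - 1 frames.
    """
    num_frames = len(temporal)
    total = 0.0
    for t in range(num_frames - 1):
        # Similarities of frame t to every other frame (frame t itself is skipped).
        logits = [
            cosine(temporal[t], temporal[s]) / temperature
            for s in range(num_frames)
            if s != t
        ]
        # Because index t was skipped, frame t + 1 sits at position t in `logits`.
        positive = logits[t]
        # Numerically stable log-sum-exp for the denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - positive
    return total / (num_frames - 1)


if __name__ == "__main__":
    # Toy features: 6 frames, 4 channels each (illustrative values only).
    frames = [[1.0, 2.0, math.cos(0.3 * t), math.sin(0.3 * t)] for t in range(6)]
    spatial_half, temporal_half = split_channels(frames)
    print(temporal_contrastive_loss(temporal_half))
```

Under this assumed loss, a sequence whose neighbouring frames have similar features scores lower than the same frames in scrambled order, which is one way a network could be pushed to encode inter-frame ordering. In a full model this term would be added to the classification loss; the weighting between the two is not stated in the abstract.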


Background: Ulcerative Colitis (UC) endoscopic disease severity assessment is crucial in drug development and impacts clinical outcomes. We developed ArgesUCEIS, a set of three AI algorithms, to estimate UCEIS component scores that quantify the degree of bleeding, ulceration/erosion, and vascular obliteration in endoscopy videos. Our goal is to validate the locked algorithms that automate UCEIS component scoring using prospective UC clinical trial data.

Methods: We trained three distinct AI algorithms (ArgesUCEIS) to estimate the degree of bleeding, ulceration/erosion, and vascular obliteration, respectively, based on human readings of endoscopy videos from a UC clinical trial (UNIFI: NCT02407236, Phase 3, 965 subjects, 3128 videos), where 20% of the total data was kept as a holdout set. The AI-model training had two steps: 1) training a feature extraction module using self-supervised learning (SSL); 2) training a small supervised transformer network with an attention-based classifier on the SSL features to estimate the corresponding UCEIS component scores, with the attention layer potentially identifying representative frames. We validated the locked AI models for UCEIS component scores using an independent UC trial (QUASAR: NCT04033445, Phase 2b induction study, 313 subjects, 615 videos), scoring baseline and week 12 endoscopic visits. AI model scores were compared to human-read UCEIS component scores using AUC, accuracy, and F1 score. Additionally, a Spearman correlation analysis assessed the relationship between UCEIS component model scores and human-read Mayo endoscopic subscores (MES) at both visit weeks.

Results: The locked AI models were evaluated on independent QUASAR data for all three component models, showing moderate to high performance in AUC, accuracy, and F1 score, comparable to the UNIFI holdout set results (Table 1), which indicates their generalizability. Figure 1 shows representative ("high attention") frames for each model, providing insight into the frame-level features used by ArgesUCEIS to estimate disease severity. Spearman correlation analysis indicated small to moderate correlations of the bleeding, ulceration/erosion, and vascular AI models with human-read MES at the baseline (0.22, 0.48, 0.37) and week 12 (0.50, 0.66, 0.61) visits, respectively, all with p-values below 0.05.

Conclusion: ArgesUCEIS, a set of three AI models for estimating UCEIS component scores, was validated using prospective UC clinical trial data and holdout data. The models showed good performance, with moderate to high AUC and accuracy, and automate endoscopic disease severity assessment. They can be used alongside or independently of MES. They have the potential to improve quality and efficiency and to enhance our understanding of a drug's endoscopic impact in UC clinical trials.
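The role of the attention layer described in this abstract, weighting frame-level features and surfacing "high attention" frames, can be illustrated with a small sketch. This is not the ArgesUCEIS architecture: it assumes a single query vector scored against each frame by a scaled dot product, and the function names, the query, and the toy feature values are all hypothetical. In a trained model the query would be learned and the features would come from the SSL feature extractor.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, query):
    """Pool per-frame feature vectors into one video-level vector.

    Each frame is scored by a scaled dot product with `query` (an assumed,
    normally learned, vector); the scores are turned into weights by softmax.
    Returns (pooled_vector, weights).
    """
    dim = len(query)
    scores = [
        sum(x * q for x, q in zip(frame, query)) / math.sqrt(dim)
        for frame in frame_features
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


def top_attention_frames(weights, k=1):
    """Indices of the k frames with the largest attention weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Toy frame-level features for a 3-frame video (illustrative values only).
    frames = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    query = [1.0, 1.0]
    pooled, weights = attention_pool(frames, query)
    print(pooled, weights, top_attention_frames(weights, k=1))
```

The pooled vector would feed a classifier head that outputs the component score, while the indices returned by `top_attention_frames` correspond to the frames one might display as representative.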


Articles published on Frame-level Features

TCEDN: A Lightweight Time-Context Enhanced Depression Detection Network.

Spatiotemporal Decouple-and-Squeeze Contrastive Learning for Semisupervised Skeleton-Based Action Recognition.

Micro-expression recognition based on a novel GCN-transformer cooperation model for IoT-eHealth

FTAN: Frame-to-frame temporal alignment network with contrastive learning for few-shot action recognition

Deep Fake Video Detection

Improved Convolutional Neural Network–Time-Delay Neural Network Structure with Repeated Feature Fusions for Speaker Verification

Resformer: Local Frame-Level Feature and Global Segment-Level Feature Joint Learning for Speaker Verification

DLLBVS: Design of a High-Efficiency Deep Learning based Low-BER Video Streaming Model for High-Noise Wireless Networks

Review Paper on Deepfake Video Detection using Neural Networks

Temporal Correlation Vision Transformer for Video Person Re-Identification

Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

SF-TMN: SlowFast temporal modeling network for surgical phase recognition.

Temporal Contrastive and Spatial Enhancement Coarse Grained Network for Weakly Supervised Group Activity Recognition

Fisher ratio-based multi-domain frame-level feature aggregation for short utterance speaker verification

P294 ArgesUCEIS: A high-performance, generalizable AI models for scoring Ulcerative Colitis Endoscopic Index of Severity (UCEIS) component scores in endoscopy videos

Involving Distinguished Temporal Graph Convolutional Networks for Skeleton-Based Temporal Action Segmentation

Detection of deep fakes using deep learning

Video-based person re-identification with complementary local and global features using a graph transformer.

Multidimensional Refinement Graph Convolutional Network With Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition.
