Past Frames Research Articles

Computer-aided ultrasound (US) imaging is an important prerequisite for early clinical diagnosis and treatment. Because US image quality is poor and tumor boundaries are blurry, recent memory-based video object segmentation (VOS) models achieve frame-level segmentation by performing intensive similarity matching among past frames, which inevitably introduces computational redundancy. Furthermore, the attention mechanism used in recent models allocates the same level of attention across all spatial-temporal memory features without distinguishing among them, which may degrade accuracy. In this paper, we first build a larger annotated benchmark dataset for breast lesion segmentation in ultrasound videos, and then propose a lightweight clip-level VOS framework that achieves higher segmentation accuracy while maintaining speed. The Inner-Outer Clip Retformer is proposed to extract spatial-temporal tumor features in parallel. Specifically, the Outer Clip Retformer extracts tumor movement features from past video clips to locate the tumor in the current clip, while the Inner Clip Retformer extracts detailed features of the current tumor to produce more accurate segmentation results. A Clip Contrastive loss function is further proposed to align the extracted tumor features along both the spatial and temporal dimensions to improve segmentation accuracy. In addition, the Global Retentive Memory is proposed to maintain complementary tumor features with lower computing resources, which helps generate coherent temporal movement features. In this way, our model significantly improves spatial-temporal perception without a large increase in parameters, achieving more accurate segmentation results while maintaining a faster segmentation speed. Finally, we conduct extensive experiments to evaluate our proposed model on several video object segmentation datasets; the results show that our framework outperforms state-of-the-art segmentation methods.
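
The redundancy described above comes from frame-level similarity matching against every stored past frame. The following is a minimal, generic sketch of such a memory read-out (dot-product similarity followed by a softmax-weighted sum), not the Retformer proposed in the abstract. The function names `memory_read` and `matching_cost` are hypothetical, and features are plain Python lists for illustration only.

```python
import math
from typing import List, Sequence


def memory_read(
    query: Sequence[float],
    mem_keys: Sequence[Sequence[float]],
    mem_values: Sequence[Sequence[float]],
) -> List[float]:
    """Read from memory with dot-product similarity and softmax weighting.

    Generic illustration of memory-based matching (an assumption for
    this sketch), not the method proposed in the abstract.
    """
    # Similarity between the query and every stored key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in mem_keys]
    # Numerically stable softmax over the similarity scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the stored values.
    dim = len(mem_values[0])
    return [sum(w * v[d] for w, v in zip(weights, mem_values)) for d in range(dim)]


def matching_cost(num_past_frames: int, pixels_per_frame: int) -> int:
    """Number of query-key comparisons when every query pixel is matched
    against every pixel of every past frame held in memory."""
    return num_past_frames * pixels_per_frame * pixels_per_frame
```

In this sketch the comparison count grows linearly with the number of stored frames, which is the kind of cost a clip-level or sparse memory design tries to limit.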

Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that efficiently and effectively performs VOS by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) to adaptively memorize informative past frames according to dynamic temporal changes in video frames. Furthermore, we introduce an attentive local memory reader (ALMR) to quickly retrieve relevant information using a subset of memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded by the subset of memory, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependence from adjacent frames, thereby effectively increasing the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance with real-time speed on six popular VOS benchmarks. Furthermore, our ASM can be applied to existing memory-based methods as generic plugins to achieve significant performance improvements. More importantly, our method exhibits robustness in handling sparse videos with low frame rates.
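
The idea of memorizing past frames according to temporal change can be illustrated with a small sketch. It assumes a simple rule that is not taken from the paper: a frame is stored only when its feature vector differs from the most recently stored frame by more than a threshold. The names `select_memory_frames`, `threshold`, and `max_size` are hypothetical, and the actual ASMC criterion may differ.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_memory_frames(
    features: Sequence[Sequence[float]],
    threshold: float = 0.1,
    max_size: int = 5,
) -> List[int]:
    """Return the indices of the frames kept in a sparse memory.

    Assumed rule (illustrative only): always keep frame 0, then keep a
    frame when it differs from the last kept frame by more than
    `threshold`. When the memory exceeds `max_size`, drop the oldest
    kept frame other than frame 0.
    """
    if not features:
        return []
    kept = [0]
    for idx in range(1, len(features)):
        if mean_abs_diff(features[idx], features[kept[-1]]) > threshold:
            kept.append(idx)
            if len(kept) > max_size:
                del kept[1]  # evict the oldest non-reference frame
    return kept
```

With this rule, near-duplicate consecutive frames are skipped, so the memory size follows how much the video changes instead of how many frames it has.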

Articles published on Past Frames

Scalable video transformer for full-frame video prediction

Video object detection via space–time feature aggregation and result reuse

Wavelet-Driven Spatiotemporal Predictive Learning: Bridging Frequency and Time Variations

Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation

An Anarchist Archaeology of Equality: Pasts and Futures Against Hierarchy

Cascaded Inner-Outer Clip Retformer for Ultrasound Video Object Segmentation

Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation

Efficient Long-Short Temporal Attention network for unsupervised Video Object Segmentation

Video anomaly detection based on cross-frame prediction mechanism and spatio-temporal memory-enhanced pseudo-3D encoder

TSDTVOS: Target-guided spatiotemporal dual-stream transformers for video object segmentation

Online Multi-Face Tracking With Multi-Modality Cascaded Matching

Rendezvous in time: an attention-based temporal fusion approach for surgical triplet recognition

MUNet: Motion uncertainty-aware semi-supervised video object segmentation

Monocular-Vision-Based Moving Target Geolocation Using Unmanned Aerial Vehicle

Global video object segmentation with spatial constraint module

Multiple object tracking with appearance feature prediction and similarity fusion

Video Frame Prediction by Joint Optimization of Direct Frame Synthesis and Optical-Flow Estimation

Coherence-aware context aggregator for fast video object segmentation

Robust voice activity detection based on weighted average of long-term quadratic Renyi and differential entropies

Autoregressive Predictive Coding: A Comprehensive Study
