
Related Topics

  • Video Scene
  • Video Segmentation
  • Video Sequences
  • Video Shot
  • Semantic Video

Articles published on Video Object

1684 search results, sorted by recency
  • Research Article
  • 10.1016/j.dsp.2026.106006
OVSMMFA-Net: An object variation sensitive and multi-direction mamba based feature aggregation network for video object detection
  • May 1, 2026
  • Digital Signal Processing
  • Tingting Yao + 5 more

  • Research Article
  • 10.1145/3803013
A Simple Switchable Framework for Open-Vocabulary Video Instance Segmentation
  • Apr 20, 2026
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Feng Zhu + 2 more

Recently, the challenging task of Open-Vocabulary Video Instance Segmentation (OVVIS) has been proposed. The OVVIS task requires simultaneously classifying, segmenting, and tracking objects in videos from an open set of categories, including novel categories unseen during training. Previous approaches typically rely on universal object proposals, memory-induced tracking, and open-vocabulary classification, which are often incompatible with established VIS and open-vocabulary segmentation methods. Observing that recent VIS methods share a common architecture decomposed into a segmenter and a tracker, we design a simple yet effective Switchable Open-vocabulary VIS (SOV) framework. SOV consists of an Open-Vocabulary Segmenter and a Dual Memory Tracker. The segmenter incorporates a frozen CLIP vision encoder as the backbone to enhance generalization on novel categories. The Dual Memory Tracker is training-free and utilizes a dual-memory mechanism to enhance tracking robustness. Moreover, the tracker can easily be switched for other trackers. Benefiting from this design, SOV can inherit advantages from state-of-the-art VIS methods. To further optimize training efficiency, we propose a progressive "Long-Image, Short-Video" training pipeline. This strategy decouples the training process into an extensive image-level pre-training phase followed by a rapid video-level adaptation phase, significantly accelerating convergence while effectively bridging the domain gap between static images and dynamic videos. Our method outperforms previous methods by large margins on various benchmarks while maintaining faster inference speeds. Specifically, SOV achieves 38.0 mAP on the LV-VIS validation set. It also achieves strong zero-shot performance on popular VIS datasets (YTVIS19 50.9 mAP, YTVIS21 45.2 mAP, OVIS 23.1 mAP), comparable to fully-supervised methods. 
To further validate the flexibility of our switchable architecture, we extend SOV with the state-of-the-art CTVIS tracker, which yields improved performance (51.3 mAP) on YTVIS19. Code is available in the supplementary material.
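The abstract describes the Dual Memory Tracker only as training-free and dual-memory. As a rough illustration of that idea, the sketch below associates new detection embeddings with existing tracks by blending similarity to a short-term memory (each track's embedding from the latest frame) with similarity to a long-term memory (an exponential moving average of its past embeddings). The class names, the EMA update, the greedy matching, and the `alpha`, `weight_short`, and `threshold` parameters are all hypothetical choices for this sketch, not SOV's implementation.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class Track:
    track_id: int
    short_mem: list  # embedding from the most recent frame
    long_mem: list   # exponential moving average of past embeddings


@dataclass
class DualMemoryTracker:
    alpha: float = 0.9         # EMA decay for long-term memory (hypothetical)
    weight_short: float = 0.5  # blend between short- and long-term similarity
    threshold: float = 0.5     # minimum similarity to continue an existing track
    tracks: list = field(default_factory=list)
    next_id: int = 0

    def _score(self, track, emb):
        return (self.weight_short * cosine(track.short_mem, emb)
                + (1 - self.weight_short) * cosine(track.long_mem, emb))

    def update(self, embeddings):
        """Return a track ID for each detection embedding in the current frame."""
        # Score every (track, detection) pair and match greedily, best pair first.
        pairs = sorted(
            ((self._score(t, e), ti, di)
             for ti, t in enumerate(self.tracks)
             for di, e in enumerate(embeddings)),
            reverse=True)
        ids = [None] * len(embeddings)
        used_tracks = set()
        for score, ti, di in pairs:
            if score < self.threshold:
                break  # all remaining pairs score even lower
            if ti in used_tracks or ids[di] is not None:
                continue
            track, emb = self.tracks[ti], embeddings[di]
            track.short_mem = list(emb)
            track.long_mem = [self.alpha * m + (1 - self.alpha) * x
                              for m, x in zip(track.long_mem, emb)]
            used_tracks.add(ti)
            ids[di] = track.track_id
        # Unmatched detections start new tracks; nothing here is trained.
        for di, emb in enumerate(embeddings):
            if ids[di] is None:
                self.tracks.append(Track(self.next_id, list(emb), list(emb)))
                ids[di] = self.next_id
                self.next_id += 1
        return ids
```

For example, after `update([[1.0, 0.0], [0.0, 1.0]])` returns `[0, 1]`, a next frame with the two objects listed in swapped order, `update([[0.1, 1.0], [1.0, 0.05]])`, returns `[1, 0]`. The long-term memory is what would let a track survive a single frame where its appearance briefly drifts.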

  • Research Article
  • 10.1109/tpami.2026.3684742
Learning When and How to Update Memory for Video Object Segmentation.
  • Apr 16, 2026
  • IEEE transactions on pattern analysis and machine intelligence
  • Shengye Qiao + 4 more

Recent progress in semi-supervised video object segmentation has largely hinged on memory-based methods. However, when faced with increasingly tough challenges emerging in complex scenarios, such as fundamental semantic transformations and severe spatial deformations, the fixed-interval memory update mechanism usually adopted in these memory-based methods is insufficient to align with the pivotal moments of object change. This inflexible mechanism motivates us to design an adaptive memory update mechanism that responds to the semantic-spatial changes of target objects. To this end, we propose a novel Change-Sensitive Network (CSNet) that learns when and how to update memory to effectively address intricate challenges in complex scenarios. Specifically, we first design an Adaptive Perception-Capture module with a hierarchical contrastive learning loss that determines the moments at which to update memory by measuring the extent of object changes, thus dividing entire videos into object-change clips. To further extract and highlight object changes and assist in segmenting frames after a change occurs, we construct Dynamic Memory Update modules that redefine how memory is updated by smoothly retaining the object prototypes within clips and dynamically amplifying the object variations across clips. Extensive experiments demonstrate that the proposed CSNet exhibits clear superiority on eight datasets spanning three categories: common, complex, and long-video datasets.
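To make the contrast between fixed-interval and change-driven memory updates concrete, the sketch below replaces CSNet's learned Adaptive Perception-Capture module with a simple hand-set rule: write to memory only when the cosine distance between the current frame's object feature and the feature stored at the last update exceeds a threshold. Consecutive update frames then bound the "object-change clips". The per-frame feature vectors, the threshold, and all function names are hypothetical; CSNet learns this decision rather than thresholding.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def fixed_interval_updates(num_frames, interval):
    """Baseline: write to memory every `interval` frames, regardless of content."""
    return list(range(0, num_frames, interval))


def change_triggered_updates(features, threshold):
    """Write to memory only when the object has changed enough since the last update."""
    if not features:
        return []
    updates = [0]  # the annotated first frame always seeds the memory
    for t in range(1, len(features)):
        if cosine_distance(features[t], features[updates[-1]]) > threshold:
            updates.append(t)
    return updates


def split_into_clips(num_frames, updates):
    """Consecutive update frames delimit half-open (start, end) clips."""
    bounds = updates + [num_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(updates))]
```

With seven frames whose object feature jumps from `[1, 0]` to `[0, 1]` at frame 4, the change-triggered rule updates at frames `[0, 4]`, exactly at the change, while a fixed interval of 3 updates at `[0, 3, 6]` and misses it.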

  • Research Article
  • 10.1016/j.knosys.2026.115572
MTTrack: A joint mamba-transformer framework with memory enhancement for real-time satellite remote sensing video object tracking
  • Apr 1, 2026
  • Knowledge-Based Systems
  • Guocai Du + 4 more

  • Research Article
  • 10.1016/j.knosys.2026.115426
Enabling nearshore cross-modal video object detector to learn more accurate spatial and temporal information
  • Apr 1, 2026
  • Knowledge-Based Systems
  • Yuanlin Zhao + 5 more

  • Research Article
  • Cited by 1
  • 10.1109/tmi.2025.3627954
Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2.
  • Apr 1, 2026
  • IEEE transactions on medical imaging
  • Yuwen Chen + 7 more

Manual annotation of volumetric medical images, such as magnetic resonance imaging (MRI) and computed tomography (CT), is a labor-intensive and time-consuming process. Recent advancements in foundation models for video object segmentation, such as Segment Anything Model 2 (SAM 2), offer an opportunity to significantly speed up the annotation process: one or a few slices are annotated manually, and the target masks are then propagated across the entire volume. However, the performance of SAM 2 in this context varies. Our experiments show that relying on a single memory bank and attention module is prone to error propagation, particularly at boundary regions where the target is present in the previous slice but absent in the current one. To address this problem, we propose Short-Long Memory SAM 2 (SLM-SAM 2), a novel architecture that integrates distinct short-term and long-term memory banks with separate attention modules to improve segmentation accuracy. We evaluate SLM-SAM 2 on four public datasets covering organs, bones, and muscles across MRI, CT, and ultrasound videos. We show that the proposed method markedly outperforms the default SAM 2, achieving an average Dice Similarity Coefficient improvement of 0.14 and 0.10 in the scenarios where 5 volumes and 1 volume, respectively, are available for the initial adaptation. SLM-SAM 2 also exhibits stronger resistance to over-propagation, reducing the time required to correct propagated masks by 60.575% per volume compared to SAM 2, a notable step toward more accurate automated annotation of medical images for segmentation model development.
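The core architectural idea here, separate short-term and long-term memory banks each read by its own attention, can be sketched with plain scaled dot-product attention. In this sketch the short-term bank is a fixed-length FIFO of recently propagated slices and the long-term bank holds only manually annotated slices. The parameter-free attention, the equal-weight fusion of the two outputs, and the names `ShortLongMemory`, `attend`, and `short_len` are assumptions for illustration; SLM-SAM 2 uses learned attention modules inside SAM 2.

```python
from collections import deque

import numpy as np


def attend(query, memory):
    """Scaled dot-product attention of query tokens (N, D) over memory tokens (M, D)."""
    scores = query @ memory.T / np.sqrt(query.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ memory


class ShortLongMemory:
    def __init__(self, short_len=3):
        self.short = deque(maxlen=short_len)  # most recent slices, oldest evicted first
        self.long = []                        # manually annotated slices, never evicted

    def add_annotated(self, tokens):
        self.long.append(tokens)
        self.short.append(tokens)

    def add_propagated(self, tokens):
        # In this sketch, propagated (possibly wrong) slices enter only the
        # short-term bank, so they are eventually evicted rather than kept forever.
        self.short.append(tokens)

    def read(self, query):
        if not self.long:
            raise ValueError("add at least one annotated slice first")
        short_out = attend(query, np.concatenate(list(self.short), axis=0))
        long_out = attend(query, np.concatenate(self.long, axis=0))
        return 0.5 * (short_out + long_out)  # equal-weight fusion (an assumption)
```

Keeping the annotated slices in their own bank means the current slice always has an uncorrupted reference to attend to, however many propagated slices have passed through the short-term FIFO.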

  • Research Article
  • 10.3390/app16062934
ZoomPatch: An Adaptive PTZ Scheduling Framework for Small Object Video Analytics
  • Mar 18, 2026
  • Applied Sciences
  • Shutong Chen + 2 more

Accurate detection of small objects in video analytics is limited by low pixel resolution and insufficient visual cues. While software-based enhancements often fail to recover missing details, Pan–Tilt–Zoom (PTZ) cameras can physically increase spatial resolution through optical zoom. However, mechanical latency and configuration complexity hinder their real-time applicability. We propose ZoomPatch, a real-time video analytics framework tailored for small object detection. ZoomPatch actively schedules PTZ adjustments to capture optically enhanced subframes of regions of interest (ROIs) and fuses inference results back into the global reference frame. Specifically, it introduces a dynamic Cycle Length Proposer to adapt analysis cycles based on scene motion, and a Mixed Integer Linear Programming (MILP)-based Configuration Decider to determine the optimal sequence of pan, tilt, and zoom adjustments under time budget constraints. Simulation-based experimental evaluations across diverse workloads demonstrate that ZoomPatch significantly outperforms fixed-perspective, super-resolution (SR), and greedy baselines. Notably, in the detection task using YOLOv10, ZoomPatch improves the F1-score from 0.33 to 0.47 (a 42% increase) compared to the fixed-perspective baseline. Furthermore, ZoomPatch yields performance gains of 30% and 7% over the SR baseline (0.36) and the greedy baseline (0.44), respectively.
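The Configuration Decider's problem, choosing which ROIs to visit and in what order so that camera travel plus capture time fits a budget, can be illustrated at toy scale. The sketch below swaps the paper's MILP formulation for exhaustive search over ordered subsets, which is only feasible for a handful of ROIs. The travel-time model (pan and tilt motors moving simultaneously at a fixed hypothetical speed), the fixed dwell time, the per-ROI "value", and the function names are all assumptions.

```python
from itertools import permutations


def move_time(a, b, deg_per_sec=90.0):
    """Hypothetical PTZ travel time: pan and tilt move simultaneously."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) / deg_per_sec


def best_schedule(rois, budget, start=(0.0, 0.0), dwell=0.1):
    """Choose and order ROIs to maximize total value within a time budget (seconds).

    rois: list of (name, (pan_deg, tilt_deg), value). Brute force, so small inputs only.
    """
    best_value, best_order = 0.0, ()
    for r in range(1, len(rois) + 1):
        for order in permutations(rois, r):
            t, pos, value = 0.0, start, 0.0
            for _, target, v in order:
                t += move_time(pos, target) + dwell  # travel, then capture a subframe
                pos, value = target, value + v
            if t <= budget and value > best_value:
                best_value, best_order = value, order
    return best_value, [name for name, _, _ in best_order]
```

With ROIs `A` at (10, 0) worth 3, `B` at (-80, 0) worth 5, and `C` at (20, 5) worth 2, a 0.6 s budget yields `(5.0, ["A", "C"])`: `B` is the single most valuable ROI, but reaching it alone takes about 0.99 s. A MILP solver explores the same space far more efficiently than enumerating permutations.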

  • Research Article
  • 10.3791/69299
Behavioral Engagement Assessment in University Classrooms via Deep Learning-based Video Object Detection.
  • Mar 17, 2026
  • Journal of visualized experiments : JoVE
  • Miaomiao Feng + 1 more

This study aims to assess students' learning engagement in university classrooms using deep learning-based video object detection. To do so, the research first used correlation analysis to identify seven classroom behaviors showing a strong positive correlation with learning engagement, which serve as indicators of students' learning engagement. It then collected 30 synchronized videos of real classroom teaching from 6 classes at Shandong University of Science and Technology (SDUST) and divided them into a training set and a test set. After the seven behaviors were manually annotated in the training data, a machine learning model was trained on this set in a supervised manner. Once trained, the model generated initial annotations for the remaining unlabeled data. To achieve more accurate and efficient classroom behavior recognition, this study selected two representative algorithms, Faster R-CNN and YOLOv5s, for behavior detection experiments. Based on a comparison of their detection accuracy and time cost, YOLOv5s was selected for classroom behavior detection. Finally, this study used the focus group method to assign a score to each behavior and develop a three-level learning engagement scoring model. Using automatically measured behavioral data, the model enables real-time, automatic assessment of learning engagement at both the individual and class levels.
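The abstract does not list the seven behaviors, their focus-group scores, or the cut-offs between the three engagement levels, so every number and behavior name below is a hypothetical placeholder. The sketch only shows how per-behavior scores and detection counts from the detector could be combined into individual and class-level ratings on a three-level scale.

```python
# Hypothetical behavior scores on a 1-5 scale; the study's actual seven
# behaviors and focus-group scores are not given in the abstract.
BEHAVIOR_SCORES = {
    "listening": 2, "note_taking": 3, "reading": 3, "discussing": 4,
    "answering": 5, "raising_hand": 5, "standing_to_speak": 5,
}


def student_score(counts, num_frames):
    """Mean per-frame score; frames with none of the behaviors contribute 0."""
    if num_frames == 0:
        return 0.0
    return sum(BEHAVIOR_SCORES[b] * n for b, n in counts.items()) / num_frames


def engagement_level(score, low=1.5, high=3.0):
    """Map a score onto three levels using hypothetical cut-offs."""
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


def class_report(per_student):
    """per_student: {student_id: (behavior_counts, frames_observed)}."""
    scores = {s: student_score(c, n) for s, (c, n) in per_student.items()}
    class_mean = sum(scores.values()) / len(scores)
    return ({s: engagement_level(v) for s, v in scores.items()},
            engagement_level(class_mean))
```

A student detected note-taking in 6 of 8 frames and raising a hand in the other 2 scores 3.5 ("high"); one detected only listening in 4 of 10 frames scores 0.8 ("low"); the class average of 2.15 falls in the "medium" band.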

  • Research Article
  • 10.1016/j.media.2025.103904
Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank.
  • Mar 1, 2026
  • Medical image analysis
  • Chenxiao Zhang + 2 more

  • Research Article
  • 10.1016/j.neunet.2026.108808
Few-shot video object segmentation in X-ray angiography using local matching and spatio-temporal consistency loss.
  • Mar 1, 2026
  • Neural networks : the official journal of the International Neural Network Society
  • Lin Xi + 2 more

High-quality, densely annotated data serve as a crucial foundation for developing robust X-ray angiography segmentation models. However, obtaining per-object pixel-level annotations in the medical domain is both expensive and time-consuming, often requiring close collaboration between clinical experts and developers. This paper aims to reduce the annotation costs of X-ray angiography videos by leveraging few-shot video object segmentation (FSVOS), which separates target objects from the background using only a single annotated frame during inference. We introduce a novel FSVOS model that employs a local matching strategy to restrict the search space to the most relevant neighboring pixels. Rather than relying on inefficient standard im2col-like implementations (e.g., spatial convolutions, depthwise convolutions and feature-shifting mechanisms) or hardware-specific CUDA kernels (e.g., deformable and neighborhood attention), which often suffer from limited portability across non-CUDA devices, we reorganize the local sampling process from a direction-based sampling perspective. Specifically, we implement a non-parametric sampling mechanism that enables dynamically varying sampling regions. This approach provides the flexibility to adapt to diverse spatial structures without the computational cost of parametric layers or the need for model retraining. To further enhance feature coherence across frames, we design a supervised spatio-temporal contrastive learning scheme that enforces consistency in feature representations. In addition, we introduce a publicly available benchmark dataset for multi-object segmentation in X-ray angiography videos (MOSXAV), featuring detailed, manually labeled segmentation ground truth. 
Extensive experiments on the CADICA, XACV, and MOSXAV datasets show that our proposed FSVOS method outperforms current state-of-the-art video segmentation methods in terms of segmentation accuracy and generalization capability (i.e., seen and unseen categories). This work offers enhanced flexibility and potential for a wide range of clinical applications. Code is available at: https://github.com/xilin-x/XRAVOS.
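A minimal sketch of direction-based local matching, assuming per-pixel feature maps for the query and reference frames and a soft reference mask. For each offset (dy, dx) in a square window, the reference features are shifted by that offset; each query pixel compares itself with the shifted copy, and a softmax over directions weights the correspondingly shifted reference mask. It uses only array shifts and element-wise products, with no custom kernels and no learned parameters. The fixed square window, the temperature, and the names `shift` and `local_match` are assumptions; the paper's mechanism additionally supports dynamically varying sampling regions, and this is not the XRAVOS code.

```python
import numpy as np


def shift(x, dy, dx, r):
    """Return y with y[..., i, j] = x[..., i + dy, j + dx], zero outside the image."""
    h, w = x.shape[-2:]
    pad = [(0, 0)] * (x.ndim - 2) + [(r, r), (r, r)]
    padded = np.pad(x, pad)  # constant zero padding
    return padded[..., r + dy : r + dy + h, r + dx : r + dx + w]


def local_match(query, ref, ref_mask, radius=1, temperature=1.0):
    """Propagate ref_mask to the query frame, matching each query pixel only
    against reference pixels within `radius` (one shifted copy per direction).

    query, ref: (C, H, W) features; ref_mask: (H, W) soft mask in [0, 1].
    """
    valid = np.ones(ref_mask.shape)
    sims, masks = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sim = (query * shift(ref, dy, dx, radius)).sum(axis=0) / temperature
            # Directions that leave the image get zero weight in the softmax.
            sim = np.where(shift(valid, dy, dx, radius) > 0, sim, -np.inf)
            sims.append(sim)
            masks.append(shift(ref_mask, dy, dx, radius))
    sims = np.stack(sims)  # (K, H, W) with K = (2 * radius + 1) ** 2 directions
    weights = np.exp(sims - sims.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * np.stack(masks)).sum(axis=0)
```

Because the zero-offset direction is always inside the image, every pixel has at least one finite similarity, so the softmax is well defined even at borders and corners. When each pixel's feature matches only its own position in the reference frame, the predicted mask reproduces the reference mask exactly.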

  • Research Article
  • 10.1016/j.imavis.2026.105945
STSim-Mamb: A spatiotemporal similarity learning framework for unsupervised video object segmentation
  • Mar 1, 2026
  • Image and Vision Computing
  • Maojin Sun + 1 more

  • Research Article
  • 10.1016/j.knosys.2026.115323
Adaptive region encoding for efficient video object detection in edge computing
  • Mar 1, 2026
  • Knowledge-Based Systems
  • Lisha Gao + 6 more

  • Research Article
  • 10.1016/j.image.2025.117456
Video object segmentation based on feature compression and attention correction
  • Mar 1, 2026
  • Signal Processing: Image Communication
  • Zhiqiang Hou + 5 more

  • Research Article
  • 10.3390/technologies14030142
SoccerDETR: Real-Time Soccer Object Detection via Visual State Space Models with Semantic-Aware Feature Fusion
  • Feb 27, 2026
  • Technologies
  • Dongyang Zhou + 1 more

Real-time object detection in soccer videos presents significant challenges due to the dynamic nature of matches, varying object scales, and the stringent requirement for efficient processing. In this work, we define real-time detection as achieving an inference speed of at least 30 frames per second (FPS), the minimum requirement for smooth video processing and live broadcast applications. While transformer-based detectors have achieved remarkable accuracy, their quadratic computational complexity limits their real-time applications. In this paper, we propose SoccerDETR, a novel real-time detection framework that integrates MobileMamba-based visual state space models with an efficient transformer encoder for soccer object detection. Our approach introduces four key innovations: (1) a MobileMamba backbone leveraging selective state space modeling to achieve linear computational complexity while maintaining global receptive fields; (2) a Semantic-aware Dynamic Feature Fusion Module (SDFM) that adaptively aggregates multi-scale features through progressive semantic injection; (3) a Spatial-Channel Synergistic Attention (SCSA) mechanism that explores the synergistic effects between spatial and channel attention for enhanced feature representation; and (4) a Separable Dynamic Decoder that employs dynamic convolution attention to replace traditional cross-attention, significantly reducing computational overhead. Additionally, we design a Scale-Aware Focal Loss (SAFL) that addresses the class imbalance and scale variation problems inherent in soccer scenarios. Extensive experiments on the Soccana and SoccerNet datasets demonstrate that SoccerDETR achieves state-of-the-art performance with 94.2% mAP@50 on Soccana and 91.8% mAP@50 on SoccerNet, while maintaining a real-time inference speed of 78 FPS on a single NVIDIA RTX 4090 GPU with a batch size of 1 and an input resolution of 640 × 640. 
Our method outperforms existing approaches by 2.3–5.7% in mAP while being 1.5–3.2× faster, demonstrating the effectiveness of state space models for efficient sports video object detection. Comprehensive ablation studies validate the effectiveness of each proposed component, and cross-dataset experiments demonstrate strong generalization capability.
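The abstract names a Scale-Aware Focal Loss but does not give its formula. The sketch starts from the standard binary focal loss introduced with RetinaNet, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), which down-weights easy, well-classified examples, and multiplies it by a hypothetical scale term that gives small boxes more weight than large ones. The `scale_weight` formula, its `beta` parameter, and the default `alpha` and `gamma` values are assumptions, not SoccerDETR's SAFL.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Standard binary focal loss for one prediction p = P(y = 1)."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def scale_weight(box_area, image_area, beta=1.0):
    """Hypothetical scale term: 1 + beta for a vanishingly small box, 1 for a full-image box."""
    return 1.0 + beta * (1.0 - math.sqrt(box_area / image_area))


def scale_aware_focal_loss(p, y, box_area, image_area,
                           alpha=0.25, gamma=2.0, beta=1.0):
    """Focal loss re-weighted so that small objects (e.g. a distant ball) count more."""
    return (scale_weight(box_area, image_area, beta)
            * focal_loss(p, y, alpha=alpha, gamma=gamma))
```

With `gamma=0` the focal term reduces to an α-weighted cross-entropy; with the default `gamma=2`, a confident correct prediction (p = 0.9) contributes far less than an uncertain one (p = 0.5), and at equal confidence a 16 × 16 box in a 640 × 640 image is weighted more heavily than a 320 × 320 one.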

  • Research Article
  • 10.1007/s11042-026-21444-x
An improved semi-supervised video object segmentation and tracking algorithm for real-time applications
  • Feb 26, 2026
  • Multimedia Tools and Applications
  • Han Wu + 1 more

  • Research Article
  • 10.1145/3790093
Few-Shot Learning in Video and 3D Object Detection: A Survey
  • Feb 23, 2026
  • ACM Computing Surveys
  • Md Meftahul Ferdaus + 4 more

Few-shot learning (FSL) and data-efficient learning paradigms enable object detection models to recognize novel classes from minimally annotated examples, addressing the high cost of data labeling. This systematic survey examines recent advances in few-shot, semi-supervised, sparsely-supervised, and weakly-supervised approaches for video and 3D object detection, focusing on developments enabled by foundation models and vision-language model integration. For video object detection, techniques including tube proposals, temporal matching networks, motion-guided approaches, and temporal consistency-based semi-supervised methods utilize spatiotemporal relationships for efficient novel class adaptation, with recent architectures raising average precision in few-shot scenarios from 33 to 48. For 3D object detection, specialized approaches address point cloud sparsity and texture limitations through uncertainty-aware methods, geometric learning, and multimodal fusion, with sparsely-supervised techniques achieving competitive performance using only 2% of annotations, enabling practical deployment in autonomous driving and robotics. The survey analyzes methodological advances including meta-learning, transfer learning, pseudo-label generation, contrastive instance mining, and foundation model integration across applications spanning autonomous driving, surveillance, robotics, industrial control, and medical imaging. By examining developments across multiple supervision paradigms, this work highlights data-efficient learning’s potential for minimizing annotation requirements and enabling robust real-world deployment across temporal, spatial, and multimodal domains.
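One widely used meta-learning building block for recognizing a novel class from a handful of examples is prototype matching, as in prototypical networks: average the few labeled (support) embeddings of each class into a prototype, then assign a new embedding to the nearest prototype. The sketch below assumes embeddings have already been produced by a pretrained backbone. It illustrates the general idea only; it is not a specific detector from the survey, and the function names and example classes are hypothetical.

```python
import math


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_prototypes(support):
    """support: {class_name: [embedding, ...]} with only a few shots per class."""
    return {cls: mean(embs) for cls, embs in support.items()}


def classify(embedding, prototypes):
    """Nearest prototype by squared Euclidean distance, plus softmax probabilities."""
    dists = {cls: math.dist(embedding, proto) ** 2 for cls, proto in prototypes.items()}
    m = min(dists.values())
    exp = {cls: math.exp(-(d - m)) for cls, d in dists.items()}  # shifted for stability
    z = sum(exp.values())
    probs = {cls: v / z for cls, v in exp.items()}
    return min(dists, key=dists.get), probs
```

Adding a novel class only requires computing one more prototype from its few support embeddings; no weights are retrained, which is what makes this style of approach attractive when annotations are scarce.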

  • Research Article
  • 10.1016/j.neunet.2026.108705
TransUTD: Underwater cross-domain collaborative spatial-temporal transformer detector.
  • Feb 10, 2026
  • Neural networks : the official journal of the International Neural Network Society
  • Bingxun Zhao + 3 more

  • Research Article
  • 10.1007/s10278-026-01855-w
Network for Real-time Laryngeal Lesions Video Object Detection.
  • Feb 6, 2026
  • Journal of imaging informatics in medicine
  • Yan Wang + 8 more

Early and accurate diagnosis of nasopharyngeal-laryngeal tumors is critical for improving patient prognosis. Deep learning methods have achieved significant progress in the automatic detection of lesions in static endoscopic images. However, during nasopharyngeal-laryngeal endoscopy, the quality of endoscopic videos often suffers from motion blur, uneven exposure, and reflective artifacts, which adversely affect the performance of existing static image detectors. Therefore, we propose a novel two-stage video lesion detection network, DynSTPN, to address the challenge of lesion detection in complex scenarios. First, in the prompt generation network stage, we design a dynamic prompt generator that generates discriminative prompts based on spatio-temporal feature representations of reference frames to mitigate quality degradation in inference frames. Second, in the object detection network stage, we introduce an adaptive differentiable gating mechanism to integrate the reference frames' prompt information, dynamically adjusting how strongly the reference frames enhance the inference frame. Experiments were conducted on two datasets: the self-constructed four-category nasopharyngeal-laryngeal lesion video object detection (NLLVOD) dataset and the publicly available ImageNet VID dataset. Compared to state-of-the-art (SOTA) methods, DynSTPN achieved the best balance between detection accuracy and efficiency on the VID dataset. On the NLLVOD dataset, DynSTPN achieved a superior detection accuracy of 79.6% and a speed of 29.4 FPS, meeting the real-time requirements for clinical applications. These results significantly outperform the SOTA static image detector YOLOv12-M. Experimental results demonstrate that DynSTPN effectively leverages information from video reference frames to enhance detection performance, achieving superior accuracy compared to SOTA image/video methods, thereby offering enhanced clinical applicability.
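A differentiable gate for this kind of fusion can be sketched as a per-channel sigmoid computed from the inference-frame features and the reference-frame prompt, which then scales how much of the prompt is added to the inference features. Because the sigmoid is differentiable, the gate could be trained end to end and learn to suppress unhelpful references (gate near 0) or pass useful ones (gate near 1). The single linear layer, the random initialization, the residual-addition form, and the name `GatedPromptFusion` are assumptions for illustration, not DynSTPN's actual module.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedPromptFusion:
    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical learnable parameters: randomly initialized here, trained in practice.
        self.w = rng.normal(scale=0.1, size=(2 * channels, channels))
        self.b = np.zeros(channels)

    def __call__(self, inf_feat, prompt):
        """inf_feat, prompt: (N, C) token features of the inference frame and the
        prompt derived from reference frames. Returns (fused features, gate)."""
        gate = sigmoid(np.concatenate([inf_feat, prompt], axis=1) @ self.w + self.b)
        return inf_feat + gate * prompt, gate
```

A strongly negative bias closes the gate, so the output falls back to the inference-frame features alone; a strongly positive bias opens it fully, adding the whole prompt. The trained weights let the gate choose between these extremes per token and channel, for example according to how blurred or over-exposed the current frame is.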

  • Research Article
  • 10.1016/j.ecoinf.2026.103674
Towards automated bycatch monitoring: Optimizing and evaluating multi-object tracking of salmon in pollock trawls
  • Feb 1, 2026
  • Ecological Informatics
  • Moses Lurbur + 2 more

  • Research Article
  • 10.1007/s11263-025-02700-3
Practical Video Object Detection via Feature Selection and Aggregation
  • Jan 30, 2026
  • International Journal of Computer Vision
  • Yuheng Shi + 2 more
