Video Object Research Articles

Referring video object segmentation (RVOS) aims to segment the target object described by a language expression in a video sequence. Typical multimodal Transformer-based RVOS approaches process video sequences frame by frame to reduce the high computational cost. However, this restricts performance, because the lack of inter-frame interaction hampers temporal coherence modeling and spatio-temporal representation learning of the referred object. Besides, insufficient cross-modal interaction results in weak correlation between visual and linguistic features, which makes the target information harder to decode and limits model performance. In this paper, we propose a bidirectional correlation-driven inter-frame interaction Transformer, dubbed BIFIT, to address these issues in RVOS. Specifically, we design a lightweight, plug-and-play inter-frame interaction module in the Transformer decoder to efficiently learn the spatio-temporal features of the referred object. This lets the model decode object information in the video sequence more precisely and generate more accurate segmentation results. Moreover, a bidirectional multi-level vision-language interaction module is implemented before the multimodal Transformer to strengthen the correlation between the linguistic features and the multi-level visual features. This helps the language queries decode more precise object information from the visual features and ultimately improves segmentation performance. Extensive experiments on four benchmarks validate the superiority of BIFIT over state-of-the-art methods and the effectiveness of the proposed modules. The code is available at https://github.com/LANMNG/BIFIT.
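BIFIT's own modules are in the linked repository and are not reproduced here. As a rough illustration only, the following sketch shows the two kinds of interaction the abstract describes, expressed as plain attention:

- each frame's object query attends to the queries of all frames (inter-frame interaction);
- word features and pixel features attend to each other in both directions (bidirectional vision-language interaction).

It rests on several simplifying assumptions: single-head attention, identity query/key/value projections, a single visual level, and visual and linguistic features already projected to a shared dimension. The function names `attend`, `inter_frame_interaction` and `bidirectional_interaction` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are feature vectors of equal dimension


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity Q/K/V
    projections (a simplifying assumption) plus a residual connection."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        mixed = [sum(w * c[i] for w, c in zip(weights, context)) for i in range(d)]
        out.append([qi + mi for qi, mi in zip(q, mixed)])  # residual add
    return out


def inter_frame_interaction(frame_queries: Matrix) -> Matrix:
    # Hypothetical: one object-query vector per frame. Each frame's query
    # attends to the queries of all T frames, so information flows across time.
    return attend(frame_queries, frame_queries)


def bidirectional_interaction(words: Matrix, pixels: Matrix) -> Tuple[Matrix, Matrix]:
    # Hypothetical: language-to-vision and vision-to-language attention,
    # both computed from the original (pre-update) features.
    return attend(words, pixels), attend(pixels, words)


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(inter_frame_interaction(frames))
```

Computing both directions from the pre-update features keeps the two directions symmetric. A real module would add learned projections, multiple heads and normalization.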

This project presents an object detection system that uses a Convolutional Neural Network (CNN) to accurately identify and classify objects in videos. The purpose of CNN-based object detection is to enhance technologies such as security cameras and smart devices by enabling them to identify and understand objects in videos. Object detection using CNNs is a fascinating field in computer vision. Detection can be difficult because variations in orientation, lighting and background can produce completely different videos of the very same object. With advances in deep learning and neural networks, such problems can now be tackled in real time without devising various hand-crafted heuristics. The project "Object detection using CNN for video streaming" detects objects efficiently with a CNN-based algorithm and applies that algorithm to image or video data. In this project, we develop a technique to identify objects using the pre-trained deep learning model MobileNet with the Single Shot MultiBox Detector (SSD). This algorithm is used for real-time detection, including webcam input, and detects the objects in a video stream. We therefore use an object detection module that detects what is in the video stream. To implement it, we combine MobileNet and the SSD framework into a fast and efficient deep learning-based object detection method. The main purpose of our research is to evaluate the accuracy of the SSD object detection method and the importance of the pre-trained deep learning model MobileNet. The experimental results show that the Average Precision (AP) of the algorithm for the classes car, person and chair is 99.76%, 97.76% and 71.07%, respectively. In short, the main objective of the project is to establish how accurately objects are detected.
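The abstract reports per-class Average Precision but does not state which AP protocol was used. The sketch below assumes PASCAL VOC-style all-point interpolated AP, with detections counted as true positives when their intersection over union (IoU) with a not-yet-matched ground-truth box is at least 0.5. Both the protocol and the 0.5 threshold are assumptions, and `iou` and `average_precision` are illustrative names, not functions from the project.

```python
from typing import List, Tuple

BoxXYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: BoxXYXY, b: BoxXYXY) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP for a single class (assumed protocol).

    detections: (confidence, is_true_positive) pairs for every prediction
    of this class; num_gt: number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing from right to left (the "envelope").
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum interpolated precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, three ranked detections (true positive, false positive, true positive) against two ground-truth objects yield an AP of 0.5 × 1 + 0.5 × 2/3 ≈ 0.833. Under this protocol, percentages like those in the abstract correspond to AP × 100.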
Existing methods include the Region-Based Convolutional Neural Network (R-CNN) and You Only Look Once (YOLO). R-CNN has not reached real-time speed, even though the system has been updated and new versions have been released. YOLO, while popular, struggles to detect objects grouped close together, especially small ones. To avoid these drawbacks, we propose a model that combines the Single Shot MultiBox Detector (SSD), an algorithm used for real-time detection, with the MobileNet architecture.
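SSD is fast because it predicts class scores and box offsets in a single forward pass. These predictions are made relative to a fixed set of default (anchor) boxes tiled over several feature maps of different resolutions. The finer maps with small boxes are what let it cover small, closely spaced objects.

The sketch below generates those default boxes using the scale rule from the original SSD paper, with s_min = 0.2, s_max = 0.9 and scales spaced linearly across the feature maps. The aspect ratios, the map sizes and the `next_scale` passed for the last map are illustrative assumptions. The project's MobileNet-SSD configuration may use different values, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def default_box_scales(num_maps: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Scale s_k for each of num_maps feature maps, spaced linearly."""
    if num_maps == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]


def default_boxes(fmap_size: int, scale: float, next_scale: float,
                  aspect_ratios: Sequence[float] = (1.0, 2.0, 0.5)) -> List[Box]:
    """Default boxes tiled over a square fmap_size x fmap_size feature map."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centers sit at the middle of each feature-map cell.
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
            # Extra square box at the geometric mean of this and the next scale.
            extra = math.sqrt(scale * next_scale)
            boxes.append((cx, cy, extra, extra))
    return boxes


if __name__ == "__main__":
    scales = default_box_scales(6)
    print([round(s, 2) for s in scales])
    print(len(default_boxes(2, scales[0], scales[1])))
```

With six feature maps the scales run from 0.2 to 0.9 in steps of 0.14. A 2 × 2 map with three aspect ratios plus the extra square box yields 16 default boxes.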

Articles published on Video Object

Visual Semantic Segmentation Based on Few/Zero-Shot Learning: An Overview

Evaluating quality of motion for unsupervised video object segmentation

Bidirectional correlation-driven inter-frame interaction Transformer for referring video object segmentation

Object Detection Using CNN

Multi-View Inconsistency Analysis for Video Object-Level Splicing Localization

Video object segmentation via couple streams and feature memory

On-the-fly point annotation for fast medical video labeling.

SENSE: Hyperspectral video object tracker via fusing material and motion cues

A Quantum Evolutionary Learning Tracker for Video

Flow-Edge-Net: Video Saliency Detection Based on Optical Flow and Edge-Weighted Balance Loss

Ultimate pose estimation: A comparative study

Kernel based local matching network for video object segmentation

QDETRv: Query-Guided DETR for One-Shot Object Localization in Videos

Multi-Modal Prompting for Open-Vocabulary Video Visual Relationship Detection

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Context Enhanced Transformer for Single Image Object Detection in Video Data

Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation

Cascade transformers with dynamic attention for video question answering

Detection of an in-housed pig using modified YOLOv5 model

A comparison of deep learning-based object detection for unmanned aerial vehicle
