ImageNet VID Research Articles

Object detection has been studied intensively for years, but fast, accurate detection in real-world scenes remains very challenging. To overcome the drawbacks of single-stage detectors, we aim to detect objects precisely in both static and temporal scenes in real time. First, we design a novel anchor-offset detection as a dual refinement mechanism; it comprises an anchor refinement, a feature location refinement, and a deformable detection head. This detection mode simultaneously performs two-step regression and captures accurate object features. Based on anchor-offset detection, we develop a dual refinement network (DRNet) for high-performance static detection, in which a multi-deformable head is further designed to leverage contextual information for describing objects. For temporal detection in videos, we develop temporal refinement networks (TRNet) and temporal dual refinement networks (TDRNet) by propagating the refinement information across time. We also propose a soft refinement strategy to temporally match object motion with the previous refinement. The proposed methods are evaluated on the PASCAL VOC, COCO, and ImageNet VID datasets. Extensive comparisons on static and temporal detection verify the superiority of DRNet, TRNet, and TDRNet. Our approaches run at a fairly fast speed while achieving significantly enhanced detection accuracy: 84.4% mAP on VOC 2007, 83.6% mAP on VOC 2012, 69.4% mAP on VID 2017, and 42.4% AP on COCO. Finally, given these encouraging results, our methods are applied to online underwater object detection and grasping with an autonomous system. Code is publicly available at https://github.com/SeanChenxy/TDRN.
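The two-step regression mentioned in the abstract can be illustrated with a minimal sketch. It assumes a common SSD-style box encoding (center offsets scaled by the box size, log-scale width and height, and "variance" factors of 0.1 and 0.2); the abstract does not state the encoding actually used, so the function names `decode` and `two_step_regression` and all numeric defaults are hypothetical. The feature location refinement and the deformable detection head are not modeled here.

```python
import math


def decode(box, offsets, variances=(0.1, 0.2)):
    """Apply (dx, dy, dw, dh) offsets to a (cx, cy, w, h) box.

    Assumption: SSD-style encoding; not confirmed by the abstract.
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = offsets
    return (
        cx + dx * variances[0] * w,       # shift center x relative to width
        cy + dy * variances[0] * h,       # shift center y relative to height
        w * math.exp(dw * variances[1]),  # rescale width
        h * math.exp(dh * variances[1]),  # rescale height
    )


def two_step_regression(anchor, refine_offsets, detect_offsets):
    """Step 1 refines the anchor; step 2 regresses from the refined anchor."""
    refined_anchor = decode(anchor, refine_offsets)
    return decode(refined_anchor, detect_offsets)


# Usage: a default anchor nudged right twice, once per regression step.
anchor = (0.5, 0.5, 0.2, 0.2)
final_box = two_step_regression(anchor, (1.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
```

The point of the sketch is that the second regression starts from the output of the first rather than from the original anchor, so each step only needs to predict a small correction.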

Articles published on ImageNet VID

Video Object Detection Using Event-Aware Convolutional LSTM and Object Relation Networks

Automatic tracking of the dairy goat in the surveillance video

Local Attention Sequence Model for Video Object Detection

A feature temporal attention based interleaved network for fast video object detection

Video object detection with a convolutional regression tracker

Real-time and accurate object detection in compressed video by long short-term feature aggregation

MINet: Meta-Learning Instance Identifiers for Video Object Detection

Video object detection algorithm based on dynamic combination of sparse feature propagation and dense feature aggregation

Video object detection for autonomous driving: Motion-aid feature calibration

Single Shot Video Object Detector

SCNet: Scale-aware coupling-structure network for efficient video object detection

Temporal Context Enhanced Feature Aggregation for Video Object Detection

Joint Anchor-Feature Refinement for Real-Time Accurate Object Detection in Images and Videos

Motion Context Network for Weakly Supervised Object Detection in Videos

Video Object Detection Guided by Object Blur Evaluation

Video Object Detection With Two-Path Convolutional LSTM Pyramid

Visual tracking based on semantic and similarity learning

Detect or Track: Towards Cost-Effective Video Object Detection/Tracking

Video Object Detection by Aggregating Features across Adjacent Frames

Object Detection in Videos by High Quality Object Linking
