Interactive 3D Annotation of Objects in Moving Videos from Sparse Multi-view Frames

Kotaro Oomori, Wataru Kawabe, Takeo Igarashi, Keita Higuchi, Fabrice Matulic

doi:10.1145/3626476

Abstract

Segmenting and determining the 3D bounding boxes of objects of interest in RGB videos is an important task for a variety of applications such as augmented reality, navigation, and robotics. Supervised machine learning techniques are commonly used for this, but they need training datasets: sets of images with associated 3D bounding boxes manually defined by human annotators using a labelling tool. However, precisely placing 3D bounding boxes can be difficult using conventional 3D manipulation tools on a 2D interface. To alleviate that burden, we propose a novel technique with which 3D bounding boxes can be created by simply drawing 2D bounding rectangles on multiple frames of a video sequence showing the object from different angles. The method uses reconstructed dense 3D point clouds from the video and computes tightly fitting 3D bounding boxes of desired objects selected by back-projecting the 2D rectangles. We show concrete application scenarios of our interface, including training dataset creation and editing 3D spaces and videos. An evaluation comparing our technique with a conventional 3D annotation tool shows that our method results in higher accuracy. We also confirm that the bounding boxes created with our interface have a lower variance, likely yielding more consistent labels and datasets.
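The selection step described in the abstract (back-projecting 2D rectangles onto a reconstructed point cloud, then fitting a box) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera model with known intrinsics `K` and per-frame pose `(R, t)`, keeps only the points whose projections fall inside every drawn rectangle, and fits an axis-aligned box for simplicity, whereas the paper's tightly fitting boxes may well be oriented. All function names and the toy scene are hypothetical.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project Nx3 world points to pixels with a pinhole model x ~ K (R X + t)."""
    cam = points @ R.T + t                      # world -> camera coordinates
    depth = cam[:, 2]
    safe = np.where(depth > 0, depth, 1.0)      # placeholder depth for points behind the camera
    uv = (cam @ K.T)[:, :2] / safe[:, None]     # perspective division
    return uv, depth

def select_points(points, views):
    """Keep points that project inside the rectangle of every annotated view.

    views: list of (K, R, t, (u_min, v_min, u_max, v_max)) tuples (assumed layout).
    """
    keep = np.ones(len(points), dtype=bool)
    for K, R, t, (u0, v0, u1, v1) in views:
        uv, depth = project_points(points, K, R, t)
        inside = (
            (depth > 0)
            & (uv[:, 0] >= u0) & (uv[:, 0] <= u1)
            & (uv[:, 1] >= v0) & (uv[:, 1] <= v1)
        )
        keep &= inside                          # intersection of the viewing frusta
    return points[keep]

def fit_aabb(points):
    """Axis-aligned bounding box (a simplifying assumption, not the paper's fit)."""
    return points.min(axis=0), points.max(axis=0)

# Toy scene: a unit cube centered at (0, 0, 5) plus two clutter points.
corners = np.array([[x, y, z] for x in (-0.5, 0.5)
                    for y in (-0.5, 0.5)
                    for z in (4.5, 5.5)])
obj = np.vstack([corners, [[0.0, 0.0, 5.0]]])
clutter = np.array([[0.0, 0.0, 9.0],    # behind the object as seen from view 1
                    [3.0, 0.0, 5.0]])   # off to the side as seen from view 1
cloud = np.vstack([obj, clutter])

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# View 1: camera at the origin looking along +z.
R1, t1 = np.eye(3), np.zeros(3)
# View 2: camera at (5, 0, 5) looking along -x, i.e. a side view of the cube.
R2 = np.array([[0.0, 0.0, 1.0],
               [0.0, 1.0, 0.0],
               [-1.0, 0.0, 0.0]])
t2 = np.array([-5.0, 0.0, 5.0])

rect = (260.0, 180.0, 380.0, 300.0)     # rectangle drawn in both views
views = [(K, R1, t1, rect), (K, R2, t2, rect)]

selected = select_points(cloud, views)
lo, hi = fit_aabb(selected)
print("selected points:", len(selected))
print("box min:", lo, "box max:", hi)
```

A single rectangle only constrains a frustum, so the point behind the object survives view 1 and is rejected only by the side view. This is why rectangles from several viewpoints are needed before the box becomes tight.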

More From: Proceedings of the ACM on Human-Computer Interaction

Similar Papers

MLOD: A multi-view 3D object detection based on robust feature fusion method
Jian Deng ... Krzysztof Czarnecki
01 Oct 2019

Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
Shuran Song ... Jianxiong Xiao
01 Jun 2016

An accurate approach for obtaining spatiotemporal information of vehicle loads on bridges based on 3D bounding box reconstruction with computer vision
Jinsong Zhu ... Teng Shi
Measurement | VOL. 181
29 May 2021

Temporally consistent caption detection in videos using a spatiotemporal 3D method
Dong-Qing Zhang ... Sitaram Bhagavathy
01 Nov 2009
