Abstract

Autonomous vehicles operate in highly dynamic environments, necessitating an accurate assessment of which aspects of a scene are moving and where they are moving to. A popular approach to 3D motion estimation, termed scene flow, is to employ 3D point cloud data from consecutive LiDAR scans, although such approaches have been limited by the small size of real-world, annotated LiDAR data. In this work, we introduce a new large-scale dataset for scene flow estimation derived from corresponding tracked 3D objects, which is ~1,000× larger than previous real-world datasets in terms of the number of annotated frames. We demonstrate how previous works were bounded by the amount of real LiDAR data available, suggesting that larger datasets are required to achieve state-of-the-art predictive performance. Furthermore, we show how previous heuristics such as down-sampling heavily degrade performance, motivating a new class of models that are tractable on the full point cloud. To address this issue, we introduce the FastFlow3D architecture, which provides real-time inference on the full point cloud. Additionally, we design human-interpretable metrics that better capture real-world aspects by accounting for ego-motion and providing breakdowns per object type. We hope that this dataset may provide new opportunities for developing real-world scene flow systems.
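
As a rough illustration of what accounting for ego-motion entails, the following sketch (our own, not the paper's implementation; the function names, the per-point displacement representation of flow, and the 4×4 ego-transform convention are all assumptions) cancels the apparent motion of static points before scoring and breaks the error down per object type:

```python
import numpy as np

def compensate_ego_motion(points_t0, raw_flow, ego_transform):
    """Subtract the apparent flow induced by the AV's own motion.

    points_t0    : (N, 3) LiDAR points at the earlier frame
    raw_flow     : (N, 3) measured displacement vectors between frames
    ego_transform: (4, 4) rigid transform mapping frame-t0 coordinates
                   into frame-t1 coordinates (the vehicle's ego-motion)
    """
    homog = np.concatenate([points_t0, np.ones((len(points_t0), 1))], axis=1)
    # How far a static point would appear to move purely due to ego-motion.
    ego_induced = (homog @ ego_transform.T)[:, :3] - points_t0
    return raw_flow - ego_induced

def endpoint_error_by_type(pred_flow, gt_flow, object_types):
    """Mean L2 endpoint error in meters, broken down per object type."""
    errors = np.linalg.norm(pred_flow - gt_flow, axis=1)
    return {t: float(errors[object_types == t].mean())
            for t in np.unique(object_types)}
```

With this compensation, a stationary point has zero ground-truth flow, so the metric is not dominated by the AV's own speed, and the per-type breakdown can surface performance differences between, e.g., vehicles and pedestrians.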

Highlights

  • Motion is a prominent cue that enables humans to navigate complex environments [1]

  • Recognizing the limitations of previous works, we develop a new baseline model architecture, FastFlow3D, that is tractable on the complete point cloud and able to run in real time (i.e., < 100 ms) on an autonomous vehicle (AV)

  • We discuss dataset statistics and how they affect our selection of evaluation metrics

Summary

INTRODUCTION

Motion is a prominent cue that enables humans to navigate complex environments [1]. Likewise, understanding and predicting the 3D motion field of a scene – termed the scene flow – provides an important signal that enables autonomous vehicles (AVs) to understand and navigate highly dynamic environments [2]. Progress on learned scene flow, however, has been limited by the small size of annotated, real-world LiDAR datasets. We address these shortcomings by introducing a large-scale dataset for scene flow geared towards AVs. We derive per-point labels for motion estimation by bootstrapping from tracked objects densely annotated in a scene from a recently released large-scale AV dataset [15]. We suspect that the degree to which semi-supervised learning addresses this problem may have strong implications for the real-world application of scene flow in AVs. We hope that the resulting dataset presented in this paper may open the opportunity for qualitatively new forms of learned scene flow models.
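
The label construction is detailed in the section "From tracked boxes to flow annotations" below. As a minimal sketch of the underlying idea, assuming each annotated object moves rigidly between consecutive frames and that box poses are given as 4×4 box-to-world transforms (names and signatures here are illustrative, not the paper's API):

```python
import numpy as np

def flow_labels_from_boxes(points_t1, box_pose_t0, box_pose_t1, dt=0.1):
    """Per-point flow labels for one tracked, rigidly moving box.

    points_t1   : (N, 4) homogeneous world-frame points inside the box at t1
    box_pose_t0 : (4, 4) box-to-world transform at the previous frame
    box_pose_t1 : (4, 4) box-to-world transform at the current frame
    dt          : time between frames in seconds (e.g., 0.1 s at 10 Hz)
    """
    # Transport each point back to where the rigid motion says it was at t0:
    #   p_t0 = T_t0 @ inv(T_t1) @ p_t1
    rigid_motion = box_pose_t0 @ np.linalg.inv(box_pose_t1)
    points_t0 = points_t1 @ rigid_motion.T
    # Flow is the displacement per unit time; points outside any tracked box
    # would be labeled as static (zero flow) under this scheme.
    return (points_t1[:, :3] - points_t0[:, :3]) / dt
```

A convenient property of this construction is that flow labels inherit the quality and scale of the underlying tracks, so a densely annotated tracking dataset translates directly into a large flow dataset.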

Benchmarks for scene flow estimation
Datasets for tracking in AVs
Models for learning scene flow
CONSTRUCTING A SCENE FLOW DATASET
Problem definition
From tracked boxes to flow annotations
EVALUATION METRICS FOR SCENE FLOW
FASTFLOW3D: A SCALABLE BASELINE MODEL
RESULTS
A large-scale dataset for scene flow
A scalable model baseline for scene flow
Generalizing to unlabeled moving objects
DISCUSSION