Abstract
To improve vision-aided state estimation on size, weight, and power (SWaP)-constrained robotic platforms, we describe our unsupervised, deep convolutional-deconvolutional sensor fusion network, Multi-Hypothesis DeepEfference (MHDE). MHDE learns to intelligently combine noisy, heterogeneous sensor data to predict several probable hypotheses for the dense, pixel-level correspondence between a source image and an unseen target image. We show how our multi-hypothesis formulation provides increased robustness against dynamic, heteroscedastic sensor and motion noise, computing hypothesis image mappings and predictions at 76–357 Hz depending on the number of hypotheses generated. MHDE fuses noisy, heterogeneous sensory inputs through two parallel, inter-connected architectural pathways and n (1–20 in this work) multi-hypothesis-generating sub-pathways that produce n global correspondence estimates between a source and a target image. We evaluated MHDE on the KITTI Odometry dataset, benchmarked it against the vision-only DeepMatching and Deformable Spatial Pyramids algorithms, and demonstrated a significant runtime decrease and a performance increase relative to the next-best-performing method.
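As a rough illustration of the multi-hypothesis idea (a minimal sketch, not the authors' released code), the fragment below assumes a PyTorch-style setup in which each of n hypothesis sub-pathways emits a dense pixel-offset (correspondence) field; the source image is warped by each field and only the photometrically best hypothesis per sample contributes to the unsupervised loss. The names warp and winner_take_all_loss, and the normalized-flow convention, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def warp(source, flow):
    """Warp a source image toward the target using a dense
    pixel-offset field (normalized units) via bilinear sampling."""
    b, _, h, w = source.shape
    # Build a normalized sampling grid in [-1, 1] for grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2).to(source)
    # flow is (b, 2, h, w); add it to the base grid per pixel.
    grid = base + flow.permute(0, 2, 3, 1)
    return F.grid_sample(source, grid, align_corners=True)

def winner_take_all_loss(source, target, flows):
    """Photometric winner-take-all over n hypothesis flows:
    only the best hypothesis per sample receives gradient."""
    errors = torch.stack(
        [F.l1_loss(warp(source, f), target, reduction="none")
             .mean(dim=(1, 2, 3))
         for f in flows], dim=1)        # shape (b, n_hyp)
    best, _ = errors.min(dim=1)          # pick the winning hypothesis
    return best.mean()
```

Under this reading, penalizing only the minimum-error hypothesis lets different sub-pathways specialize on different noise or motion regimes, which is one plausible mechanism behind the robustness claim above.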
Highlights
Owing both to limitations in the speed and quality of their sensors and to restrictive on-board computational capabilities, current state-of-the-art (SOA) size, weight, and power (SWaP)-constrained autonomous robotic systems must trade operational speed for safety and robustness.
While increased performance under noise-free conditions was an unintended benefit of the multi-hypothesis formulation, the central contribution of this work is the handling of noise-contaminated input data.
We demonstrated unsupervised learning of correspondence between static grayscale images in a deep sensorimotor fusion network operating on noisy sensor data.
Summary
Owing both to limitations in the speed and quality of their sensors and to restrictive on-board computational capabilities, current state-of-the-art (SOA) size, weight, and power (SWaP)-constrained autonomous robotic systems must trade operational speed for safety and robustness. This trade-off is especially pronounced in dynamic, GPS- and communications-denied environments where robust navigation must be performed using only on-board sensors and computational resources. In visual odometry (VO), a crucial early step is finding a correspondence mapping between scene elements perceived on the imaging plane in sequentially captured image frames; this is referred to as the correspondence problem (see [2] for a review).
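For context, the sketch below shows a conventional sparse solution to the correspondence problem using OpenCV's ORB features and brute-force descriptor matching, the kind of front end a classical VO pipeline would feed into pose estimation. This is a baseline illustration, not MHDE's dense, learned approach; the function name sparse_correspondences is an assumption for this example.

```python
import cv2

def sparse_correspondences(frame_a, frame_b, max_matches=200):
    """Classical sparse correspondence between two sequential
    grayscale frames: detect ORB keypoints, match descriptors,
    and return corresponding pixel coordinate pairs."""
    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    # Keep the strongest matches as (x, y) -> (x, y) pixel pairs.
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt)
            for m in matches[:max_matches]]
```

Such sparse matches cover only detected keypoints; MHDE instead predicts a dense, pixel-level correspondence field and hedges against noise by emitting multiple hypotheses.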