Abstract

Research on autonomous cars, which intensified in the 1990s, is becoming one of the main research paths in the automotive industry. Recent works use Rapidly-exploring Random Trees to explore the state space along a given reference path and to compute the minimum-time collision-free path in real time. These methods do not require good approximations of the reference path, can cope with discontinuous routes, can navigate in realistic traffic scenarios, and derive their power from an extensive computational effort directed at improving the quality of the trajectory from step to step. In this paper, we focus on re-engineering an existing state-of-the-art sequential algorithm to obtain a CUDA-based GPGPU (General Purpose Graphics Processing Units) implementation. To do that, we show how to partition the original algorithm among several working threads running on the GPU, how to propagate information among threads, and how to synchronize those threads. We also show in detail how to organize memory transfers between the CPU and the GPU (and among different CUDA kernels) so that planning times are optimized and the available memory is not exceeded while storing massive amounts of fused data. To sum up, in our application the GPU is used for all main operations, the entire application is developed in the CUDA language, and specific attention is paid to concurrency, synchronization, and data communication. We run experiments on several real scenarios, comparing the GPU implementation with the CPU one in terms of the quality of the generated paths and of computation (wall-clock) times. The results of our experiments show that embedded GPUs can be used as an enabler for real-time applications of computationally expensive planning approaches.
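The partition, propagate, and synchronize pattern mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' CUDA implementation: it uses Python worker threads in place of GPU threads, and the names `candidate_cost`, `worker`, `best_candidate`, the `safe_distance` parameter, and the penalty weight are all hypothetical. Each worker evaluates a strided slice of candidate states (similar in spirit to a grid-stride loop), returns its local best, and the partial results are reduced to a single minimum once all workers have finished.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def candidate_cost(candidate, obstacles, safe_distance=1.0):
    """Hypothetical cost: travel time plus a penalty for being near an obstacle."""
    x, y, travel_time = candidate
    nearest = min(
        (math.hypot(x - ox, y - oy) for ox, oy in obstacles),
        default=math.inf,
    )
    # Assumed penalty weight (100.0); the paper's actual cost terms are not given here.
    penalty = max(0.0, safe_distance - nearest) * 100.0
    return travel_time + penalty


def worker(worker_id, num_workers, candidates, obstacles):
    """Evaluate every num_workers-th candidate and return the local (cost, index) minimum."""
    best = (math.inf, -1)
    for i in range(worker_id, len(candidates), num_workers):
        best = min(best, (candidate_cost(candidates[i], obstacles), i))
    return best


def best_candidate(candidates, obstacles, num_workers=4):
    """Partition the candidates among workers, wait for all of them, then reduce."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Collecting every result from map() acts as the synchronization point.
        partials = list(
            pool.map(
                lambda w: worker(w, num_workers, candidates, obstacles),
                range(num_workers),
            )
        )
    # Final reduction over the per-worker minima.
    return min(partials)
```

Calling `best_candidate` with a list of `(x, y, travel_time)` tuples and a list of `(x, y)` obstacle positions returns the `(cost, index)` pair of the cheapest candidate. On a GPU the same structure would typically map to one kernel that computes per-candidate costs followed by a parallel reduction, with the transfer of candidates and obstacles to device memory as an additional step that this sketch does not model.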

Highlights

  • Autonomous driving systems are becoming a reality in our daily lives, and researchers and companies propose new techniques and improvements at a high rate

  • We focus our attention on how to re-engineer the randomized sample-based algorithm presented by Schwesinger et al. [10], on a CUDA

  • Notice that higher Tlookahead values are better suited to highway routes, where starting distance may have a higher priority than minimum obstacle distances. Conversely, smaller Tlookahead values are better for parking maneuvers, where speed is drastically reduced and obstacles are usually motionless


Summary

A Smart Many-Core Implementation of a Motion

Gianpiero Cabodi 1, Paolo Camurati 1, Alessandro Garbo 1,†, Michele Giorelli 2, Stefano Quer 1,*,† and Francesco Savarese 1,†
Received: 26 November 2018; Accepted: 29 January 2019; Published: 2 February 2019

Introduction
Contributions
Path Planning
GPGPU and Parallel Programming Basic Notions
Path Planning Methodologies
GPU-Based Path Planning Strategies
Terminology
The Algorithm
Migration to a Parallel Environment
High Level Tool Structure
Data Structures and GPU Memory
High Level Algorithm
CONCURRENTPLANNING CYCLE
Function DRAWSAMPLE
Function EXPANDKERNEL
Function COMPUTECOSTKERNEL
Experimental Analysis
Operating Scenarios
Evaluation Metrics
Original Algorithm Parameter Setting
Time Comparison
Conclusions and Future Works