Abstract

Visual Odometry (VO) systems are widely used to determine the position and orientation of a robot or camera in an unknown environment. They are deployed on resource-constrained platforms, such as drones and Virtual Reality (VR) or Augmented Reality (AR) headsets. VO systems that harness modern Systems-on-Chip (SoCs) with an integrated Field Programmable Gate Array (FPGA) have the potential to improve overall system performance. This paper explores the FPGA acceleration of sparse VO kernels using High-Level Synthesis (HLS), as this kind of VO system is designed for use with low-power SoCs. We show that both the computational overhead and the data transfer overhead between the CPU cores of the SoC and the accelerators on the FPGA need to be optimized to obtain better end-to-end performance. This is a result of the additional data movement incurred when using an FPGA accelerator, and also of the sparse computational nature of the kernels involved, whose memory access patterns may be predictable or random. However, state-of-the-art HLS tools are not yet able to perform the required optimizations automatically, because they usually assume that the kernels to be accelerated have dense computational patterns with regular memory access. In this paper we propose three potentially generic methods to reduce the data transfer between the CPU and the customised hardware kernels on the FPGA: (a) approximation based on domain-specific knowledge, (b) image compression, and (c) on-the-fly computation. We present a case study applying these methods to SVO, a state-of-the-art sparse VO system with a semi-direct front-end. We demonstrate that the proposed methods reduce data transfer overhead and improve end-to-end performance, and that they can be applied not only with standard Xilinx HLS tools but also with other state-of-the-art HLS tools, such as HeteroFlow.
Compared to the baseline performance of the original SVO software on an Arm CPU, the proposed methods enable the HLS and HeteroFlow designs to achieve speedups of 2.4x and 2.14x, respectively, without noticeable accuracy loss. The HLS and HeteroFlow designs also improve energy efficiency by 1.85x and 1.89x, respectively, on the SoC system used. Compared to the SVO software baseline running on an Intel Xeon CPU, the proposed methods enable the HLS and HeteroFlow designs to achieve 8.2x and 8.3x improvements in energy efficiency, respectively.
