Scalable shared-memory architecture to solve the Knapsack 0/1 problem

Fernando A Escobar,Anthony Kolar,Naim Harb,Filipe Vinci Dos Santos,Carlos Valderrama

doi:10.1016/j.micpro.2017.04.001

Fernando A Escobar, Anthony Kolar + Show 3 more

https://doi.org/10.1016/j.micpro.2017.04.001

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

Dynamic Programming (DP) is used to solve combinatorial optimization problems and constitutes one of the 13 High Performance Computing (HPC) patterns. DP suffers from irregular, data-dependent memory accesses that deteriorates performance. The Knapsack 0/1 belongs to the simplest DP algorithms which is called Serial Monadic and has been treated in software with cache-efficient algorithms as well as parallel threads, OpenMP or MPI.In this paper we propose a shared memory, parametrizable architecture to compute the DP matrix for the Knapsack 0/1. Our system has a parallel runtime of Θ(mC/q) for a knapsack of capacity C with m items and q operators. Using additional off-chip space and DMA transfers it can solve knapsacks of any size. The architecture is implemented on the ZYNQ-7020 System On Chip (SoC) that contains a dual-core ARM plus Artix FPGA fabric. Under such architecture we make use of 64-bit High Performance ports for off-chip transfers and asymmetric 32-bit write/64-bit read BRAMs to minimize data exchange times. We also exploit computation synchronization to minimize BRAM address propagation and reduce routing congestion. We present results for a base system with 70 Processing Elements (PEs) capable of solving problems with a maximum item weight ωmax=1024. For more complex instances we configure the architecture with 58 PEs and ωmax=6144, where a single BRAM is shared among 13 computing units. We thus solve problems with 6 × bigger weights than previous works, attain a 16 × speed-up versus an optimized software on an Intel Xeon E5 and get the highest efficiency per core versus other architectures. We achieve between 2.4−3.3× acceleration versus previous FPGA solutions.

Full Text