Abstract
Graph computation problems that exhibit irregular memory access patterns are known to perform poorly on multiprocessor architectures. Although recent studies use FPGA technology to tackle the memory wall problem of graph computation by adopting a massively multi-threaded architecture, the achieved performance is still far below the optimal memory performance because of long memory access latency. In this paper, we address the memory wall problem by taking advantage of the sequential streaming bandwidth of external DRAM. First, we present an edge-streaming model that streams edges from external DRAM while making random accesses to the set of vertices held in on-chip SRAM, leading to full utilization of external memory bandwidth in burst mode. Second, we propose an on-chip distributed, off-chip shared memory architecture with a high-performance shuffle network that shuffles intermediate results in real time, which significantly reduces the requirement for intermediate buffers and saves off-chip memory bandwidth. We further use PageRank as a case study to validate the effectiveness of the proposed architecture. Evaluation results on an ML605 board show that our architecture achieves up to a 4× improvement in performance-to-bandwidth ratio over previously published FPGA-based implementations.
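As a software illustration of the edge-streaming idea only, the following minimal Python sketch runs PageRank by scanning the edge list sequentially (standing in for burst reads from external DRAM) while all random accesses go to small per-vertex arrays (standing in for on-chip SRAM). The function name, the damping factor of 0.85, the iteration count, and the handling of dangling vertices are assumptions made for this sketch; it does not model the shuffle network or any other part of the hardware design.

```python
def pagerank_edge_streaming(edges, num_vertices, damping=0.85, iterations=20):
    """PageRank driven by sequential passes over an edge list.

    `edges` is a list of (src, dst) pairs and models the DRAM-resident
    edge stream. `rank`, `acc`, and `out_degree` are per-vertex arrays
    and model the vertex data kept in on-chip memory.
    All names and defaults here are illustrative assumptions.
    """
    # One sequential pass over the edges to count out-degrees.
    out_degree = [0] * num_vertices
    for src, _dst in edges:
        out_degree[src] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        acc = [0.0] * num_vertices
        # Stream every edge in order; only the vertex arrays are
        # accessed at random positions.
        for src, dst in edges:
            acc[dst] += rank[src] / out_degree[src]

        # Assumption: rank held by vertices with no out-edges is
        # redistributed uniformly so that the ranks keep summing to 1.
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        rank = [base + damping * a for a in acc]
    return rank


if __name__ == "__main__":
    # A directed 3-cycle: by symmetry every vertex ends with the same rank.
    print(pagerank_edge_streaming([(0, 1), (1, 2), (2, 0)], 3))
```

The inner loop never seeks within `edges`, so the edge data is consumed in storage order. That access pattern is what lets the edge stream be read at burst bandwidth, provided the vertex arrays fit in fast memory.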