Abstract

With computing systems becoming ubiquitous, numerous extremely large data sets are becoming available for analysis. Often the collected data have complex, graph-based structures, which makes them difficult to process with traditional tools. Moreover, the irregularities in the data sets, and in the analysis algorithms, hamper the scaling of performance on large distributed high-performance systems, which are optimized for locality exploitation and regular data structures. In this paper we present a system design approach that enables efficient execution of applications with irregular memory access patterns on a distributed, many-core architecture based on off-the-shelf cores. We introduce a set of hardware and software components that provide a distributed global address space, fine-grained synchronization, and latency hiding of remote accesses through multithreading. We implemented an FPGA prototype to explore the design with a set of typical irregular kernels. Finally, we present an analytical model that highlights the benefits of the approach and helps identify the bottlenecks in the prototype. The experimental evaluation on graph-based applications demonstrates the scalability of the architecture across different configurations of the whole system.
