Abstract

MapReduce has become an indispensable framework that lets programmers develop big data applications at scale without having to deal with the complex underlying system details. However, such a framework is missing on the Sunway many-core processor that powers the world-leading supercomputer Sunway TaihuLight. This paper fills the gap by implementing an efficient MapReduce framework on the Sunway many-core processor that takes full advantage of its architectural features, such as local device memory and register communication. Specifically, we propose three schemes that partition the map and reduce computation across the MPE (management processing element) and the CPEs (computing processing elements) of the Sunway processor in different ways. In the CPE-accelerated Dynamic Partitioning Scheme (CDPS) in particular, the CPEs within each CPE pair can switch their processing roles between map and reduce dynamically, which effectively improves load balance at runtime. In addition, we use the local device memory and register communication on each CPE to improve the efficiency of data access and communication. The experimental results demonstrate that our MapReduce framework based on CDPS achieves an average speedup of 36.9× across representative benchmarks.
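To make the idea of dynamic role switching within a worker pair more concrete, the following is a minimal Python sketch, not the authors' CDPS implementation. It simulates a single pair of workers on a word-count job; the function `run_pair`, the backlog-based switching rule, and the `reduce_batch` parameter are hypothetical assumptions for illustration. On real hardware the CPEs would exchange data through local device memory and register communication rather than Python queues.

```python
from collections import Counter, deque


def run_pair(chunks, reduce_batch=4):
    """Toy simulation of one worker pair whose roles switch between map and reduce.

    Hypothetical sketch: the backlog-based policy below is an illustrative
    assumption, not the CDPS algorithm described in the paper.
    """
    map_queue = deque(chunks)   # unprocessed input chunks
    shuffle_queue = deque()     # intermediate (word, 1) pairs
    totals = Counter()          # reduce output
    roles = [None, None]        # current role of each worker in the pair
    switches = 0                # number of map<->reduce role changes

    while map_queue or shuffle_queue:
        for w in range(2):
            # Serve the stage with the larger backlog; ties go to reduce so
            # intermediate data does not accumulate in limited local memory.
            if shuffle_queue and len(shuffle_queue) >= len(map_queue):
                role = "reduce"
            elif map_queue:
                role = "map"
            else:
                break  # no work left for this worker
            if roles[w] is not None and roles[w] != role:
                switches += 1
            roles[w] = role

            if role == "map":
                # Map one chunk: emit (word, 1) for every word in it.
                for word in map_queue.popleft().split():
                    shuffle_queue.append((word, 1))
            else:
                # Reduce up to reduce_batch intermediate pairs.
                for _ in range(min(reduce_batch, len(shuffle_queue))):
                    word, count = shuffle_queue.popleft()
                    totals[word] += count
    return totals, switches
```

For example, `run_pair(["the quick brown fox", "jumps over the lazy dog", "the end"])` returns word counts with `"the"` counted 3 times, and both workers change roles during the run as the map and reduce backlogs shift. A fixed split (one permanent mapper, one permanent reducer) would leave one worker idle whenever its stage runs dry, which is the kind of imbalance dynamic role switching aims to avoid.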
