Abstract

Conventional data serialization tools assume that the objects to be encoded are small enough for a single CPU core to encode them in a timely manner. In the era of Big Data, however, objects are becoming increasingly large and complex, making data serialization a new performance bottleneck. This paper describes an approach to parallelizing data serialization by leveraging multiple cores. Parallelizing data serialization raises new questions, such as how to split an object into sub-objects, how to allocate the available cores, and how to minimize the overhead in practice. In this paper we design a framework for serializing large objects in parallel and analyze its design tradeoffs under different scenarios. To validate the proposed approach, we implemented Parallel Protocol Buffers, a parallel version of Google's Protocol Buffers, a widely used data serialization utility. Experimental results confirm the effectiveness of Parallel Protocol Buffers: employing multiple cores in data serialization achieves highly scalable performance and incurs negligible overhead. Copyright © 2015 John Wiley & Sons, Ltd.
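The abstract only outlines the idea, and the paper's actual Parallel Protocol Buffers implementation is not reproduced here. As a minimal illustrative sketch of the general technique, the hypothetical Python example below splits a large object into sub-objects, encodes each sub-object on a separate worker core, and concatenates the results as length-prefixed frames so they can be split apart again on decode. All names are illustrative; `pickle` merely stands in for a real encoder such as Protocol Buffers, and a process pool is just one way to use multiple cores.

```python
# Illustrative sketch of parallel serialization; NOT the paper's
# Parallel Protocol Buffers implementation. pickle stands in for a
# real encoder such as Protocol Buffers.
import pickle
import struct
from concurrent.futures import ProcessPoolExecutor


def serialize_chunk(chunk):
    """Encode one sub-object on a worker core."""
    return pickle.dumps(chunk)


def parallel_serialize(records, num_workers=4):
    """Split a large object into sub-objects, encode them in
    parallel, and join the results as length-prefixed frames."""
    # Split the object into roughly equal sub-objects, one per worker.
    step = max(1, len(records) // num_workers)
    chunks = [records[i:i + step] for i in range(0, len(records), step)]
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        encoded = list(pool.map(serialize_chunk, chunks))
    # Frame each encoded chunk with a 4-byte big-endian length prefix
    # so the decoder knows where one sub-object ends and the next begins.
    return b"".join(struct.pack(">I", len(buf)) + buf for buf in encoded)


def parallel_deserialize(data):
    """Invert parallel_serialize: walk the length-prefixed frames and
    decode each sub-object (decoding could also be parallelized)."""
    records, offset = [], 0
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        records.extend(pickle.loads(data[offset:offset + length]))
        offset += length
    return records


if __name__ == "__main__":
    big_object = [{"id": i, "payload": "x" * 64} for i in range(100_000)]
    blob = parallel_serialize(big_object, num_workers=4)
    assert parallel_deserialize(blob) == big_object
```

The length-prefix framing reflects one of the tradeoffs the abstract mentions: splitting the object lets every core encode independently, but the output must carry enough structure for the receiver to reassemble the sub-objects, which is part of the overhead the framework must keep small.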
