Distributed computing of all-to-all comparison problems in heterogeneous systems

Yi-Fan Zhang, Wayne Kelly, Colin Fidge, Yu-Chu Tian

doi:10.1109/iecon.2015.7392403

Abstract

Distributed computing of all-to-all comparison (ATAC) problems in heterogeneous systems is increasingly important in various domains. Though Hadoop-based solutions are widely used, they are inefficient for the ATAC pattern, which is fundamentally different from the MapReduce pattern for which Hadoop is designed. They exhibit poor data locality and unbalanced allocation of comparison tasks, particularly in heterogeneous systems. This results in massive data movement at runtime and ineffective utilization of computing resources, which significantly degrades overall computing performance. To address these problems, this paper presents a scalable and efficient data and task distribution strategy for processing large-scale ATAC problems in heterogeneous systems. The strategy not only saves storage space but also achieves load balancing and good data locality for all comparison tasks. Experiments on bioinformatics examples show that the approach achieves about 89% of the ideal performance capacity of the multiple machines.
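
To make the ATAC pattern concrete, the following minimal sketch enumerates every pairwise comparison among a set of data items and assigns each comparison to a node in proportion to that node's relative speed. It is an illustration only and not the strategy presented in the paper: the function name `allocate_comparisons`, the `node_speeds` mapping, and the greedy earliest-finish rule are all assumptions introduced here, and the sketch makes no attempt to minimize the storage each node needs.

```python
from itertools import combinations


def allocate_comparisons(items, node_speeds):
    """Hypothetical greedy allocation of all-to-all comparison tasks.

    items: list of data item identifiers (e.g. file names).
    node_speeds: dict mapping node name -> relative processing speed (assumed known).
    Returns (tasks, data): the pairs assigned to each node, and the items
    each node must hold locally so that its pairs need no remote reads.
    """
    finish = {node: 0.0 for node in node_speeds}   # projected busy time per node
    tasks = {node: [] for node in node_speeds}

    # An ATAC problem over n items has n * (n - 1) / 2 comparison tasks.
    for pair in combinations(items, 2):
        # Choose the node that would complete this comparison earliest.
        node = min(node_speeds, key=lambda n: finish[n] + 1.0 / node_speeds[n])
        finish[node] += 1.0 / node_speeds[node]
        tasks[node].append(pair)

    # Data locality requirement: a node needs both items of every pair it owns.
    data = {
        node: sorted({item for pair in tasks[node] for item in pair})
        for node in node_speeds
    }
    return tasks, data


if __name__ == "__main__":
    tasks, data = allocate_comparisons(
        ["a", "b", "c", "d"], {"fast": 2.0, "slow": 1.0}
    )
    for node in tasks:
        print(node, len(tasks[node]), "comparisons; stores", data[node])
```

With four items and nodes of relative speed 2.0 and 1.0, the six comparisons are split four to two, so both nodes are projected to finish at the same time. A naive rule like this tends to replicate most items on every node, which is the storage cost that a locality-aware distribution strategy would aim to reduce.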

Full Text