Abstract

The Sunway TaihuLight system is the first supercomputer to offer a peak performance of over 100 PFlops, and it can be used to parallelize the Non-dominated Sorting Genetic Algorithm II (NSGA-II), a standard approach to multi-objective optimization. However, the supercomputer's insufficient off-chip memory bandwidth and limited scratchpad memory capacity hinder the performance gains from parallelizing NSGA-II. In this article, we propose an optimized parallel NSGA-II for the Sunway TaihuLight system, called swNSGA-II, which exploits the system's process- and thread-level parallelism based on an improved island/master-slave model. To overcome the hurdles of low memory bandwidth and capacity, we propose a data-sharing scheme based on register-level communication that efficiently parallelizes the non-dominated sorting and crowding-distance computation of NSGA-II. Several optimization techniques, including vectorization, direct memory access, and double buffering, are also adopted to further accelerate swNSGA-II. Experimental results show that the proposed swNSGA-II achieves a speedup of 41284 on a path-planning use case and a speedup of 62692 on ZDT1 compared with conventional NSGA-II.
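For readers unfamiliar with the two NSGA-II kernels named above, the following is a minimal serial sketch of non-dominated sorting and crowding-distance computation. It is a textbook-style illustration and not the swNSGA-II implementation: it assumes every objective is minimized, the function names (`dominates`, `non_dominated_sort`, `crowding_distance`) are hypothetical, and it uses none of the register-level communication, vectorization, or DMA techniques described in the abstract.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]  # one objective vector; all objectives are minimized (assumption)


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: List[Point]) -> List[List[int]]:
    """Partition solution indices into Pareto fronts (front 0 is the best)."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # dominated[i]: indices that i dominates
    counts = [0] * n                    # counts[i]: how many solutions dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt: List[int] = []
        for i in fronts[k]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:  # all dominators of j are already ranked
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(objs: List[Point], front: List[int]) -> Dict[int, float]:
    """Per-solution crowding distance within one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    if not front:
        return dist
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = math.inf
        if hi == lo:
            continue  # all values equal in this objective; nothing to add
        for pos in range(1, len(ordered) - 1):
            gap = objs[ordered[pos + 1]][k] - objs[ordered[pos - 1]][k]
            dist[ordered[pos]] += gap / (hi - lo)
    return dist


if __name__ == "__main__":
    pts = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    fronts = non_dominated_sort(pts)
    print(fronts)
    print(crowding_distance(pts, fronts[0]))
```

The pairwise dominance loop costs O(MN²) for N solutions and M objectives, which gives a sense of why these two steps are natural targets for parallelization.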

