Abstract
Loop selection for multilevel nested loops is a very difficult problem, and neither the underlying hardware-based loop selection techniques nor the traditional software-based static compilation techniques solve it effectively. This study proposes a genetic algorithm (GA)-based method to solve this problem. First, the formal specification and mathematical model of the loop selection problem are presented. Then, the overall framework of the GA is designed based on this mathematical model. Finally, we describe the chromosome representation and fitness function calculation, the initial population generation algorithm and chromosome improvement methods, the implementation of the genetic operators (crossover, mutation, and selection), the offspring population generation method, and the GA stopping criterion. Experiments with the SPEC2006 and NPB3.3.1 benchmark suites were performed on the Sunway TaihuLight supercomputer. The results indicate that the proposed method achieves a higher speedup than current mainstream methods, which confirms its effectiveness. Solving the loop selection problem for multilevel nested loops is of great practical significance for exploiting the parallelism of general scientific computing programs and for making full use of the performance of multicore processors.
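To make the GA components listed in the abstract more concrete, the following is a minimal Python sketch. It is not the authors' implementation, and every modeling choice in it is an assumption. Dependences are given as constant distance vectors (the hypothetical `DistVec` type), and a chromosome holds one bit per loop level, where 1 means "move this level outward and run it serially". The fitness function `fitness` is invented for illustration. The operators are binary tournament selection, one-point crossover, and bit-flip mutation, combined with elitism, and a fixed generation count serves as the stopping criterion. Seeding the population with the all-ones chromosome stands in loosely for a chromosome-improvement step. The names `is_valid`, `tournament`, and `ga_select_loops` are likewise hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding: each dependence is a constant distance vector over the
# loop levels (outermost first), assumed to be lexicographically positive.
DistVec = Tuple[int, ...]


def lex_positive(vec: Sequence[int]) -> bool:
    """True if the first nonzero entry of vec is positive."""
    for x in vec:
        if x != 0:
            return x > 0
    return False


def is_valid(mask: Sequence[int], deps: Sequence[DistVec]) -> bool:
    """The selected levels, moved outermost in their original order and run
    serially, must carry every dependence so that the other levels are parallel."""
    return all(lex_positive([d[i] for i, bit in enumerate(mask) if bit]) for d in deps)


def fitness(mask: Sequence[int], deps: Sequence[DistVec]) -> float:
    """Assumed fitness: 0 for invalid selections; otherwise higher when fewer
    levels are serialized (more levels left parallel)."""
    if not is_valid(mask, deps):
        return 0.0
    return 1.0 + len(mask) - sum(mask)


def tournament(pop: List[List[int]], scores: List[float], rng: random.Random) -> List[int]:
    """Binary tournament selection: the fitter of two random chromosomes wins."""
    i, j = rng.sample(range(len(pop)), 2)
    return pop[i] if scores[i] >= scores[j] else pop[j]


def ga_select_loops(deps: Sequence[DistVec], n_levels: int, pop_size: int = 20,
                    generations: int = 40, p_mut: float = 0.1, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Initial population: random chromosomes plus the all-ones chromosome, which
    # is always valid when every distance vector is lexicographically positive.
    pop = [[rng.randint(0, 1) for _ in range(n_levels)] for _ in range(pop_size - 1)]
    pop.append([1] * n_levels)
    best = max(pop, key=lambda m: fitness(m, deps))[:]
    for _ in range(generations):  # stopping criterion: fixed generation count
        scores = [fitness(m, deps) for m in pop]
        offspring = [best[:]]  # elitism keeps the best chromosome seen so far
        while len(offspring) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            cut = rng.randint(1, n_levels - 1) if n_levels > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            offspring.append(child)
        pop = offspring
        cand = max(pop, key=lambda m: fitness(m, deps))
        if fitness(cand, deps) > fitness(best, deps):
            best = cand[:]
    return [i for i, bit in enumerate(best) if bit]


# Example: two dependences in a triply nested loop (levels 0, 1, 2).
deps = [(1, -1, 0), (0, 1, 1)]
print(ga_select_loops(deps, 3))
```

In this example no single level carries both dependences, so any valid answer serializes two or three levels; the smallest valid selections are [0, 1] and [0, 2]. Because of elitism and the all-ones seed, the returned selection is always valid under the stated assumptions, although the GA is not guaranteed to reach the minimum.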
Highlights
With the rapid development of multicore processor technology, effectively using multicore processors to improve the performance of general scientific computing programs has become a challenge
As the number of nested loop layers increases, the granularity of parallelism becomes increasingly fine. Therefore, to improve the parallel performance of multilevel nested loops in general scientific computing programs, we propose a method that uses the fewest loops to expose the maximum parallelism in multilevel nested loops
Method A uses a loop selection method that combines compile-time and runtime analysis [31], and Method B uses a loop selection method based on machine learning [32]. The automatic parallelization performance of the base compiler was again used as the test baseline. Then, the ten scientific computing programs selected from the SPEC2006 test set and the NPB3.3.1 parallel computing test set were tested, with the results shown in Figures 14 and 15, respectively
Summary
With the rapid development of multicore processor technology, effectively using multicore processors to improve the performance of general scientific computing programs has become a challenge. In some multilevel nested loops, certain loop levels can be executed in parallel directly; these levels are moved to the outermost layer through loop interchange and executed in parallel. In the other, more common case, no loop level in the nest can be executed in parallel directly, so a set of loop levels must be selected and moved to the outermost layer for serial execution to expose possible new parallelism. As the number of nested loop layers increases, the dependencies between loop iterations become very complicated, which makes it difficult to analyze the nest and search for such a set of loop levels.
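The two cases described in the summary can be illustrated with a small Python sketch. It assumes that every dependence is a constant, lexicographically positive distance vector (the hypothetical `DistVec` type) and that the selected serial levels keep their original relative order. The functions `directly_parallel_levels` and `min_serial_levels` are hypothetical helpers written for this illustration, not part of the paper. Case 1 looks for levels whose distance is zero in every dependence. Case 2 searches exhaustively for the smallest set of levels that carries every dependence once moved outermost and run serially.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

DistVec = Tuple[int, ...]  # hypothetical: constant dependence distance vector, outermost level first


def lex_positive(vec: Sequence[int]) -> bool:
    """True if the first nonzero entry of vec is positive."""
    for x in vec:
        if x != 0:
            return x > 0
    return False


def directly_parallel_levels(deps: Sequence[DistVec], n: int) -> List[int]:
    """Case 1: a level whose distance is zero in every dependence carries no
    dependence, so it can be interchanged to the outermost position and run in parallel."""
    return [i for i in range(n) if all(d[i] == 0 for d in deps)]


def min_serial_levels(deps: Sequence[DistVec], n: int) -> Optional[List[int]]:
    """Case 2: smallest set of levels that, moved outermost in their original
    order and run serially, carries every dependence; all other levels then
    become parallel. In the worst case this tries all 2**n subsets, by size."""
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if all(lex_positive([d[i] for i in subset]) for d in deps):
                return list(subset)
    return None  # unreachable when every vector is lexicographically positive


# for i, for j, for k: a[i][j][k] = a[i-1][j][k]  ->  distance vector (1, 0, 0)
print(directly_parallel_levels([(1, 0, 0)], 3))  # [1, 2]
print(min_serial_levels([(1, 0, 0)], 3))         # [0]

deps = [(1, -1, 0), (0, 1, 1)]
print(directly_parallel_levels(deps, 3))  # []
print(min_serial_levels(deps, 3))         # [0, 1]
```

In the second example no level is directly parallel, so two levels must be serialized before the remaining level becomes parallel. The cost of the exhaustive search doubles with each additional loop level. This growth illustrates why a heuristic search becomes attractive for deep loop nests.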