Abstract
Modern multiprocessor systems provide opportunities to improve parallel code implementation, thereby improving performance. Applying techniques for code improvement is still somewhat of an art, but a methodology to structure code changes is essential to success. We use a set of techniques to improve the performance of several existing codes for dual-processor systems. We take advantage of shared memories and shared local high-speed networks, as well as the higher bandwidth that comes with increasing numbers of components on processors and on boards. The techniques we introduce for dual-processor systems also apply to other multiprocessor systems, where they exploit the advantages of low latencies and increasing cache sizes. We present parallel code performance improvement techniques and a methodology for applying them to existing codes. We use several codes to compare our results with theoretical peak performance and to illustrate our techniques and methodology. The results show that key techniques for code improvement work well with our methodology. We illustrate these techniques with examples that will help developers improve the performance of their codes.