Abstract

This chapter discusses shared-memory parallel programs. A model of execution time is presented by breaking it into processor busy time and processor idle time, and then breaking each of these into subcategories according to the cause of the busy or idle state. Under a simple PRAM memory model, only synchronization waits produce processor idle time, and processor utilization is generally quite high. When more realistic memory system models are introduced, a significant portion of total time is spent idle because of memory latency. Allowing shared data to be cached significantly reduces this idle time and thus improves application performance. The sharing behavior of parallel applications is studied through their communication-to-computation ratios and invalidation patterns. Although ideal communication traffic is quite low for all applications, introducing a realistic memory system model increases traffic by an order of magnitude or more. For the cached case with 64-byte cache lines, most of the additional traffic is due to write backs, cache line transfers, and inefficient use of the data transferred. Because run time is significantly reduced when shared data is cached, the bandwidth requirements are much higher than in the uncached case.
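The execution-time decomposition described above can be sketched as follows. This is a minimal illustration, not code from the chapter: the category names and all per-category times are hypothetical, chosen only to show how caching shared data shrinks the memory-latency component of idle time and raises processor utilization.

```python
def utilization(busy, idle):
    """Fraction of total execution time the processor spends busy,
    given per-category busy and idle times (arbitrary units)."""
    total = sum(busy.values()) + sum(idle.values())
    return sum(busy.values()) / total

# Hypothetical uncached run: memory latency dominates idle time.
uncached_busy = {"compute": 100}
uncached_idle = {"memory_latency": 150, "synchronization": 20}

# Hypothetical cached run: caching shared data cuts memory-latency stalls.
cached_busy = {"compute": 100}
cached_idle = {"memory_latency": 25, "synchronization": 20}

print(f"uncached utilization: {utilization(uncached_busy, uncached_idle):.2f}")
print(f"cached utilization:   {utilization(cached_busy, cached_idle):.2f}")
```

With these made-up numbers, utilization rises from roughly 0.37 uncached to roughly 0.69 cached, mirroring the qualitative effect reported in the abstract: the same busy time divided by a much smaller total time.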
