Limits of task-based parallelism in irregular applications

Barbara Kreaseck,Dean Tullsen,Brad Calder

doi:10.1145/346023.346034

Barbara Kreaseck, Dean Tullsen + Show 1 more

Open Access

https://doi.org/10.1145/346023.346034

Copy DOI

Abstract

Tomorrow's microprocessors will be able to handle multiple flows of control. Applications that exhibit task level parallelism (TLP) and can be decomposed into parallel tasks will perform well on these platforms. TLP arises when a task is independent of its neighboring code. Traditional parallel compilers exploit one variety of TLP, loop level parallelism (LLP), where loop iterations are executed in parallel. LLP can overwhelming be found in numeric, typically FORTRAN programs with regular patterns of data accesses. In contrast, irregular applications, typified by general purpose integer applications, exhibit little LLP as they tend to access data in irregular patterns through pointers. Without pointer disambiguation to analyze data access dependences, traditional parallel compilers cannot parallelize these irregular applications and ensure correct execution.We focus on a different variety of TLP, namely Speculative Task Parallelism (STP). STP arises when a task (either a leaf-procedure, a non-leaf procedure or an entire loop) is control- and memory-independent of its preceding code, and thus could be executed in parallel. Two sections of code are memory-independent when neither contains a store to a memory location that the other accesses. To exploit STP, we assume a hypothetical speculative machine that supports speculative futures (a parallel programming construct that executes a task early on a different thread or processor) with mechanisms for resolving incorrect speculation when the task is not, after all, independent. This allows us to speculatively parallelize code when there is a high probability of independence, but no guarantee.Figure 1 illustrates STP, showing a task Y in the dynamic instruction stream of an irregular application that has no memory access conflicts with a group of instructions, X, that precede Y. The shorter of X and Y determines the overlap of memory-independent instructions as seen in Figures 1(b) and 1(c). In the absence of any register dependences, X and Y may be executed in parallel, resulting in shorter execution time. It is hard for traditional parallel compilers of pointer-based languages to expose this parallelism.The goals of this paper are to identify such regions as X and Y within irregular applications and to find the number of instructions that may thus be removed from the critical path. This number represents the maximum STP when the cost of exploiting STP is zero.Because the biggest barrier to detecting independence in irregular codes is memory disambiguation, we identify memory-independent tasks using a profile-based approach and measure the amount of STP by estimating the amount of memory-independent instructions those tasks expose. We vary the level of control dependence and memory dependence to investigate their effect on the amount of memory-independence we find. We profile at different memory granularities and introduce synchronization to expose higher levels of memory-independence. Across this variety of speculation assumptions, 7 to 22% of dynamic instructions are within tasks that are found to be memory-independent. This was on the SPECint95 benchmarks, a set of irregular applications for which traditional methods of parallelization are ineffective.

Full Text