Access To Critical Section Research Articles

As the degree of parallelism increases, the performance of multi-threaded shared-variable applications is limited not only by serialized critical section execution, but also by the serialized competition overhead that threads incur to gain access to the critical section. As the number of concurrent threads grows, this competition overhead may exceed the time spent in the critical section itself and become the dominant factor limiting the performance of parallel applications. In modern operating systems, the queue spinlock, which comprises a low-overhead spinning phase and a high-overhead sleeping phase, is often used to lock critical sections. In this paper, we show that this advanced locking solution may create very high competition overhead for multithreaded applications executing on NoC-based CMPs. We then propose a software-hardware cooperative mechanism that opportunistically maximizes the chance that a thread wins critical section access in the low-overhead spinning phase, thereby reducing the competition overhead. At the level of OS primitives, we monitor the remaining times of retry (RTR) in a thread's spinning phase, which indicates how soon the thread must enter the high-overhead sleeping phase. At the hardware level, we integrate the RTR information into the packets of locking requests and let the NoC prioritize locking request packets according to that information. The principle is that the smaller the RTR a locking request packet carries, the higher its priority and thus the quicker its delivery. We evaluate our opportunistic competition overhead reduction technique with cycle-accurate full-system simulations in GEM5 using the PARSEC (11 programs) and SPEC OMP2012 (14 programs) benchmarks. Compared to the original queue spinlock implementation, experimental results show that our method effectively increases the opportunity for threads to enter the critical section in the low-overhead spinning phase, reducing the competition overhead by 39.9% on average (61.8% at most) and accelerating the execution of the Region-of-Interest by 14.4% on average (24.5% at most) across all 25 benchmark programs.
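
The prioritization principle described in this abstract can be illustrated with a toy model. The sketch below is not the paper's NoC or OS implementation. It assumes that one lock request is granted per arbitration round, that every thread still waiting consumes one retry per round, and that a thread whose RTR reaches zero falls into the sleeping phase. The function name `count_spin_wins` and the sample RTR values are hypothetical.

```python
def count_spin_wins(initial_rtr, prioritize_by_rtr):
    """Count threads that obtain the lock while still in the spinning phase.

    initial_rtr: remaining times of retry (RTR) per thread, in arrival order.
    prioritize_by_rtr: if True, grant the request with the smallest RTR first;
                       if False, grant requests in arrival (FIFO) order.
    """
    waiting = list(initial_rtr)  # RTR values of threads still spinning
    wins = 0
    while waiting:
        # Threads with no retries left give up spinning and go to sleep.
        waiting = [rtr for rtr in waiting if rtr > 0]
        if not waiting:
            break
        if prioritize_by_rtr:
            # Smallest RTR gets the highest priority (ties: earliest arrival).
            granted = min(range(len(waiting)), key=lambda i: waiting[i])
        else:
            granted = 0  # plain arrival order
        waiting.pop(granted)
        wins += 1
        # Every thread that lost this round has used up one retry.
        waiting = [rtr - 1 for rtr in waiting]
    return wins


# Hypothetical RTR values for three threads, listed in arrival order.
sample_rtr = [3, 1, 2]
fifo_wins = count_spin_wins(sample_rtr, prioritize_by_rtr=False)
rtr_wins = count_spin_wins(sample_rtr, prioritize_by_rtr=True)
print(fifo_wins, rtr_wins)
```

In this toy run, arrival-order granting lets the thread with RTR 1 run out of retries, whereas smallest-RTR-first lets all three threads win while spinning. The model says nothing about the magnitude of the gains reported in the abstract.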


We present a new critical section protocol designed for distributed systems with general topologies, where the physical layer is implemented as point-to-point physical links rather than shared-access physical media. The protocol operates correctly for any topology; however, its time performance is topology dependent. The distributed system can be modeled by a graph G(V, E), where V denotes the set of processors and E is the set of bidirectional communication links. We use n to denote |V|; D(G) is the diameter of G, T(G) is a spanning tree of G, and D(T) is the diameter of T(G). An important measure of the performance of the protocol is the amount of traffic caused by its operation. Let a message-hop be the amount of traffic generated by a single message between two adjacent nodes. The proposed protocol generates network traffic of only 3(n − 1) ∈ Θ(n) [message-hops] per critical section access for any topology, which is less than that of other existing fully distributed protocols. A lower bound on traffic for a single critical section access for a fully distributed protocol is shown to be 2(n − 1) [message-hops]. Some previously published algorithms generate Θ(n²) [message-hops] of network traffic for some topologies. Another important measure of the performance of the protocol is the cs-access time: the time required to access the critical section in the absence of other requests, which depends on the topology. Good cs-access time performance is achieved by taking a novel approach of distributing the communication and parts of the computation functions of the protocol and exploiting the physical topology. For a constant-size message, the time to traverse an edge, including the message communication software processing in the source and destination nodes, is called the message-hop-time and is denoted by t_h. For a general graph G (with spanning tree T), the new protocol has cs-access time performance Θ(max(D(T), max(deg(v_i)))) [t_h], where deg(v_i) is computed in T. For graphs where G has D(G) ∈ Θ(log₂ n) and max(deg(v_i)) in G is O(log₂ n), the cs-access time performance is Θ(log₂ n) [t_h]. For the class of graphs where G has D(G) ∈ Θ(n), the cs-access time performance is Θ(n) [t_h]. For star graphs, the cs-access time performance is Θ(n) [t_h]. The worst-case time performance occurs for linear and star graphs. The proposed protocol has better network traffic performance and (depending on the topology) better or equal cs-access time performance than previously published fully distributed protocols. The protocol keeps the clock bounded in well-designed systems using a distributed predictive "clock squashing" mechanism.
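
The quantities named in this abstract can be evaluated for small examples. The sketch below is not the protocol itself. It only computes the stated traffic figures, 3(n − 1) and 2(n − 1) message-hops, and the term inside the Θ bound, max(D(T), max(deg(v_i))), for a spanning tree given as an adjacency dictionary. The function names and the example trees are hypothetical, and the Θ notation hides constant factors, so the returned value is a growth term rather than an actual time.

```python
from collections import deque


def message_hops(n):
    """Return (protocol traffic, lower bound) in message-hops for n processors."""
    return 3 * (n - 1), 2 * (n - 1)


def _farthest(tree, start):
    """Breadth-first search: return a node farthest from start and its distance."""
    dist = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        node = queue.popleft()
        last = node  # BFS dequeues nodes in non-decreasing distance order
        for neighbor in tree[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return last, dist[last]


def access_time_term(tree):
    """Return max(D(T), max(deg(v_i))) for a tree given as {node: [neighbors]}."""
    # Tree diameter via two BFS passes: farthest node from any start,
    # then farthest node from that one.
    end, _ = _farthest(tree, next(iter(tree)))
    _, diameter = _farthest(tree, end)
    max_degree = max(len(neighbors) for neighbors in tree.values())
    return max(diameter, max_degree)


# Hypothetical spanning trees on n = 8 nodes.
n = 8
linear = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
star = {0: list(range(1, n)), **{i: [0] for i in range(1, n)}}

print(message_hops(n))           # traffic and lower bound in message-hops
print(access_time_term(linear))  # dominated by the diameter
print(access_time_term(star))    # dominated by the maximum degree
```

For both example trees the term grows linearly with n, in one case through the diameter and in the other through the hub's degree, which matches the abstract's remark that linear and star graphs are the worst cases.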



Articles published on Access To Critical Section

Evaluating and optimizing stabilizing dining philosophers

Opportunistic competition overhead reduction for expediting critical section in NoC based CMPs

A time complexity lower bound for adaptive mutual exclusion

A distributed mutual exclusion algorithm over multi-routing protocol for mobile ad hoc networks

A fair distributed mutual exclusion algorithm

A Distributed Critical Section Protocol for General Topology
