Commercial Multiprocessors Research Articles

Hardware transactional memory is a new parallel programming paradigm supported by current commercial multiprocessors. This paradigm provides optimistic concurrency and overcomes some of the problems associated with classical lock-based synchronization, such as deadlock and serialization. Certain algorithms of computational geometry are found to be good candidates for parallelization with this paradigm. However, hardware transactional approaches to these algorithms lead to poor performance, as the resulting transactions are too large for the underlying hardware to deal with. Large transactions overflow hardware resources, which serializes execution. In this paper, we propose using privatizing transactions to parallelize two computational geometry algorithms: Lee’s algorithm, which solves the shortest-route problem, and Ruppert’s algorithm for Delaunay/Voronoi mesh refinement. Privatizing transactions are based on commercial hardware transactional memory extensions, and their goal is to reduce the transaction footprint by means of a non-transactional private execution section. This results in effectively smaller transactions. Our implementation further reduces the transaction size because we propose a reduced validation set for privatizing transactions. The programming complexity of these implementations is discussed. Results show that our privatizing transaction implementations indeed enhance performance compared with existing hardware transactional memory versions. Experiments with Intel’s transactional memory extensions yield speedups ranging from 2× to 3.5× with four threads.
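
The privatize, compute, validate, commit pattern described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: Python has no hardware transactional memory, so a lock (`commit_lock`) stands in for the short transaction, and the names `route`, `bfs_path`, `FREE`, and `USED` are hypothetical. The sketch assumes a Lee-style breadth-first search on a grid, with only the cells of the chosen path re-checked at commit time as one possible form of a reduced validation set.

```python
import threading
from collections import deque

FREE, USED = 0, 1
# Assumption: a lock emulates the short hardware transaction used for commit.
commit_lock = threading.Lock()

def bfs_path(grid, src, dst):
    """Lee-style breadth-first expansion on a private grid copy."""
    rows, cols = len(grid), len(grid[0])
    parent = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            # Backtrack from destination to source.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and nxt not in parent and grid[nr][nc] == FREE):
                parent[nxt] = cell
                queue.append(nxt)
    return None

def route(shared, src, dst, max_retries=10):
    """Claim a shortest free path on the shared grid, or return None."""
    for _ in range(max_retries):
        # Privatize: snapshot the shared grid outside any transaction.
        private = [row[:] for row in shared]
        # Private execution section: the expensive search touches no shared state.
        path = bfs_path(private, src, dst)
        if path is None:
            return None
        with commit_lock:  # short "transaction"
            # Reduced validation set: re-check only the cells on the path.
            if all(shared[r][c] == FREE for r, c in path):
                for r, c in path:
                    shared[r][c] = USED
                return path
        # Validation failed: another thread claimed a path cell, so retry.
    return None
```

The design point the sketch tries to capture is that the search reads many cells but the commit step reads and writes only the path, so the section that must run atomically stays small.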

This paper presents a software-based approach to fault-tolerant routing in networks using wormhole or virtual cut-through switching. When a message encounters a faulty output link, it is removed from the network by the local router and delivered to the messaging layer of the local node's operating system. The message passing software can reroute this message, possibly along nonminimal paths. Alternatively, the message may be addressed to an intermediate node, which will forward the message to the destination. A message may encounter multiple faults and pass through multiple intermediate nodes. The proposed techniques are applicable to both obliviously and adaptively routed networks. The techniques are specifically targeted toward commercial multiprocessors where the mean time to repair (MTTR) is much smaller than the mean time between router failures (MTBF), i.e., it is sufficient to tolerate a maximum of three failures. This paper presents requirements for buffer management, deadlock freedom, and livelock freedom. Simulation results are presented to evaluate the degradation in latency and throughput as a function of the number and distribution of faults. There are several advantages of such an approach. Router designs are minimally impacted, and thus remain compact and fast. Only messages that encounter faulty components are affected, while the machine is ensured of continued operation until the faulty components can be replaced. The technique leverages existing network technology, and the concepts are portable across evolving switch and router designs. Therefore, we feel that the technique is a good candidate for incorporation into the next generation of multiprocessor networks.
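
As a rough illustration of the rerouting idea in this abstract, the sketch below simulates a small 2D mesh in Python. Everything in it is an assumption made for illustration rather than the paper's design: dimension-order (XY) routing stands in for the oblivious routing function, the hypothetical `pick_intermediate` uses a simple sidestep heuristic, and `max_ejections` is a crude guard against livelock. When the next link is faulty, the message is "ejected" to software, which readdresses it to an intermediate node; on arrival there, it is forwarded to the final destination.

```python
def xy_next(cur, dst):
    """Next hop under dimension-order routing: correct x first, then y."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return None

def pick_intermediate(cur, blocked, size, faulty):
    """Hypothetical software policy: choose a nearby node that avoids the bad link."""
    (x, y), (bx, by) = cur, blocked
    if bx != x:
        # Blocked in x: sidestep one hop in y.
        candidates = [(x, y + 1), (x, y - 1)]
    else:
        # Blocked in y: sidestep in x and advance in y past the fault.
        candidates = [(x + 1, by), (x - 1, by)]
    for cand in candidates:
        cx, cy = cand
        if not (0 <= cx < size and 0 <= cy < size):
            continue
        if frozenset((cur, xy_next(cur, cand))) not in faulty:
            return cand
    return None

def deliver(src, dst, faulty, size, max_ejections=8):
    """Return (path, ejections), or None if the message cannot be delivered."""
    cur, target = src, dst
    path, ejections = [src], 0
    while cur != dst:
        if cur == target:
            # Reached the intermediate node: forward to the real destination.
            target = dst
            continue
        nxt = xy_next(cur, target)
        if frozenset((cur, nxt)) not in faulty:
            cur = nxt
            path.append(cur)
            continue
        # Faulty output link: eject the message to the software layer.
        ejections += 1
        if ejections > max_ejections:
            return None
        target = pick_intermediate(cur, nxt, size, faulty)
        if target is None:
            return None
    return path, ejections
```

Messages that never meet a faulty link follow the ordinary XY path with zero ejections, which mirrors the abstract's point that only messages encountering faults are affected.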

Articles published on Commercial Multiprocessors

Improving hardware transactional memory parallelization of computational geometry algorithms using privatizing transactions

Towards High Performance Computing (HPC) Through Parallel Programming Paradigms and Their Principles

Software-based rerouting for fault-tolerant pipelined communication

Multiprocessors should support simple memory consistency models

Multicoloring of grid-structured PDE solvers on shared-memory multiprocessors

Predicting performance of parallel computations

Low-synchronization translation lookaside buffer consistency in large-scale shared-memory multiprocessors

Logarithmic indices for multiprocessor evaluation

Parallelization of an event driven simulator for computer systems simulation

Parallel Discrete-Event Simulation

Commercial multiprocessors (title only)

VLSI based design principles for MIMD multiprocessor computers with distributed memory management
