Pipelined Parallel Processing Research Articles

Vector operations are important in many computer applications. They often account for most of the operations in a problem and consume a large share of the computing time, so it is natural to apply parallel computation to them in order to solve the problem faster. Among vector operations, vector reduction (e.g., vector summation, inner product evaluation) is a common type. This paper discusses vector reduction techniques for parallel pipelined processing. The computation and communication properties and constraints of both single and multiple vector reductions in a multipipeline environment are considered. On this basis, a simple yet efficient “partitioned linear pipeline array” (PLPA) architecture is proposed, and the performance of a number of scheduling algorithms for this architecture is determined. The proposed approach is then compared with the well-known tree-structured reduction processor. The performance analysis shows that the PLPA approach has approximately the same performance as a pipelined binary reduction tree, while being much simpler, easier to implement, and more flexible than a tree-structured reduction processor. Finally, as an example, matrix multiplication on a PLPA is considered, and it is shown that very good performance can be obtained with the PLPA architecture.
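To make the two reduction styles named in the abstract above concrete, here is a minimal software sketch. It is not the paper's PLPA design or any of its scheduling algorithms, which the abstract does not specify. It only assumes that a "partitioned" reduction means each of several independent units accumulates its own slice of the vector before the partial results are combined. The function names and the round-robin slicing are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def tree_reduce(values: Sequence[float]) -> float:
    """Sum values by repeated pairwise addition, level by level,
    the way a binary reduction tree combines its inputs."""
    level: List[float] = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        # Add neighbouring pairs; an unpaired last element is carried up.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def partitioned_reduce(values: Sequence[float], num_partitions: int) -> float:
    """Sum values by giving each partition a slice to accumulate linearly,
    then combining the partial sums (assumed round-robin distribution)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partials: List[float] = []
    for p in range(num_partitions):
        acc = 0.0
        for v in values[p::num_partitions]:  # elements p, p+k, p+2k, ...
            acc += v
        partials.append(acc)
    total = 0.0
    for partial in partials:
        total += partial
    return total


def inner_product(x: Sequence[float], y: Sequence[float],
                  num_partitions: int) -> float:
    """Inner product as an elementwise multiply followed by a reduction."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    products = [a * b for a, b in zip(x, y)]
    return partitioned_reduce(products, num_partitions)
```

Both functions compute the same sum; they differ only in the order in which additions are grouped, which is the property a hardware design would exploit.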

An organization of interleaved multimodule semiconductor memories is studied to facilitate accessing of memory words by a parallel-pipelined processor. All modules are assumed to be identical and to have an address cycle (address hold time) and a memory cycle of a and c segment time units, respectively. A total of N (= 2^n) memory modules are arranged such that there are l (= 2^b) lines for addresses and m (= 2^(n-b)) memory modules per line. For a parallel-pipelined processor of order (s, p), which consists of p parallel processors each of which has s degrees of multiprogramming, there can be up to s · p memory requests in each instruction cycle. Memory request collisions are bound to occur in such a system. Performance is evaluated as a function of the memory configuration. Results show that for reasonably large values of N, high performance can be obtained even in the nonbuffered case when l is a · p or more. Buffering has maximum effect on performance when l is near a · p. When l must be greater than a · p for adequate performance in the nonbuffered case, buffering can be used to reduce l while maintaining performance.
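The module arrangement in the abstract above (N = 2^n modules split into l = 2^b address lines with m = 2^(n-b) modules per line) can be illustrated with a short sketch. The abstract does not say how address bits are assigned, so the low-order interleaving used here is an assumption: the lowest b bits pick the line, the next n-b bits pick the module on that line, and the remaining bits pick the word. `MemoryConfig` and `decode` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemoryConfig:
    n: int  # log2 of the total number of modules, N = 2**n
    b: int  # log2 of the number of address lines, l = 2**b

    def __post_init__(self) -> None:
        if not 0 <= self.b <= self.n:
            raise ValueError("require 0 <= b <= n")

    @property
    def total_modules(self) -> int:
        return 1 << self.n

    @property
    def lines(self) -> int:
        return 1 << self.b

    @property
    def modules_per_line(self) -> int:
        return 1 << (self.n - self.b)


def decode(address: int, cfg: MemoryConfig) -> Tuple[int, int, int]:
    """Split an address into (line, module_on_line, word_in_module).

    Assumed mapping, not taken from the paper: low-order interleaving.
    """
    line = address & (cfg.lines - 1)
    module = (address >> cfg.b) & (cfg.modules_per_line - 1)
    word = address >> cfg.n
    return line, module, word
```

Under this assumed mapping, consecutive addresses fall on different lines, so a run of sequential requests is spread across all l lines before any line is reused.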

Articles published on Pipelined Parallel Processing

A new technique for visual motion alarm

Parallel vector reduction algorithms and architectures

Automatic analysis of speech using parallel cellular pipelined processor

Edge-Detection Processing in Finding Subtle Oil Traps in Complex Stratigraphic and Structural Environments: ABSTRACT

Architecture for VLSI Design of Reed-Solomon Decoders

Computer architectures for image processing in the USA

Effects of buffered memory requests in multiprocessor systems

Effects of buffered memory requests in multiprocessor systems

Organization of Semiconductor Memories for Parallel-Pipelined Processors

Parallel and pipeline computation of fast unitary transforms
