  • Research Article
  • Cited by: 3
  • 10.1080/10637190412331295157
A comparative study of explicit group iterative solvers on a cluster of workstations
  • Dec 1, 2004
  • Parallel Algorithms and Applications
  • Norhashidah Hj Mohd Ali + 2 more

In this paper, a group iterative scheme based on the rotated (cross) five-point finite difference discretisation, i.e. the four-point explicit decoupled group (EDG) method, is considered for solving a second-order elliptic partial differential equation (PDE). This method was first introduced by Abdullah [“The four point EDG method: a fast Poisson solver”, Int. J. Comput. Math. 38 (1991) 61–70], where it was found to be superior to the common existing methods based on the standard five-point finite difference discretisation. The method was further extended to different types of PDEs, where similar improved results were established [Ali, N.H.M., Abdullah, A.R., “Four point EDG: a fast solver for the Navier–Stokes equation”, in M.H. Hamza (ed.), Proceedings of the IASTED International Conference on Modelling, Simulation and Optimization, Gold Coast, Australia, May 6–9 (1996), ISBN 0-88986-197-8; Ali, N.H.M., Abdullah, A.R., “New parallel point iterative solutions for the diffusion-convection equation”, Proceedings of the International Conference on Parallel and Distributed Computing and Networks, Singapore, Aug. 11–13 (1997) 136–139; Ali, N.H.M., Abdullah, A.R., “New rotated iterative algorithms for the solution of a coupled system of elliptic equations”, Int. J. Comput. Math. 74 (1999) 223–251]. These new iterative algorithms were developed to run on the Sequent Balance, a shared-memory parallel computer [Abdullah, A.R., Ali, N.M., “The comparative study of parallel strategies for the solution of elliptic PDEs”, Parallel Algorithms and Applications 10 (1996) 93–103; Ali, N.H.M., Abdullah, A.R., “Parallel four point explicit decoupled group (EDG) method for elliptic PDEs”, Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, Washington, DC (1995) 302–304; Yousif, W.S., Evans, D.J., “Explicit decoupled group iterative methods and their parallel implementations”, Parallel Algorithms and Applications 7 (1995) 53–71], where they were shown to be well suited to parallel implementation. In this work, the four-point group algorithm was ported to run on a cluster of Sun workstations using the parallel virtual machine (PVM) programming environment, together with the four-point explicit group (EG) method [Evans, D.J., Yousif, W.S., “The implementation of the explicit block iterative methods on the Balance 8000 parallel computer”, Parallel Computing 16 (1990) 81–97]. We describe the parallel implementations of these methods for solving the Poisson equation, and the results of computational experiments are compared and reported.
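The rotated-stencil EDG scheme is easiest to appreciate against the standard five-point iteration it improves on. Below is a minimal NumPy sketch of a plain five-point Jacobi sweep for the Poisson equation; this is not the authors' EDG code, and the grid size, boundary values and tolerance are illustrative only.

```python
import numpy as np

def five_point_jacobi(u, f, h, tol=1e-6, max_iter=10_000):
    """Jacobi iteration for the Poisson equation -laplace(u) = f on a
    uniform grid with spacing h and fixed (Dirichlet) boundaries.
    The EDG method in the paper replaces this stencil with a rotated
    (cross) stencil that decouples the grid points into independent
    groups of four, which is what makes it easy to parallelise."""
    for _ in range(max_iter):
        u_new = u.copy()
        u_new[1:-1, 1:-1] = 0.25 * (
            u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
            + h * h * f[1:-1, 1:-1]
        )
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u

# Laplace problem (f = 0) on the unit square, one edge held at 1
n = 32
u0 = np.zeros((n, n)); u0[0, :] = 1.0
f = np.zeros((n, n))
sol = five_point_jacobi(u0, f, h=1.0 / (n - 1))
```

The group methods in the paper perform the same arithmetic but update four points at a time, which both cuts the operation count and yields naturally decoupled work units for a cluster.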

  • Research Article
  • Cited by: 12
  • 10.1080/10637190412331279957
FPGA implementation of a Cholesky algorithm for a shared-memory multiprocessor architecture
  • Dec 1, 2004
  • Parallel Algorithms and Applications
  • Satchidanand G Haridas + 1 more

Solving a system of linear equations is a key problem in engineering and science. Matrix factorization is a key component of many methods used to solve such equations. However, the factorization process is very time consuming, so these problems have often been targeted for parallel machines rather than sequential ones. Nevertheless, commercially available supercomputers are expensive and only large institutions have the resources to purchase them. Hence, efforts are under way to develop more affordable alternatives. In this paper, we propose such an approach. We present an implementation of a parallel version of the Cholesky matrix factorization algorithm on a single-chip multiprocessor built inside an APEX20K series Field-Programmable Gate Array (FPGA) developed by Altera. Our multiprocessor system uses an asymmetric, shared-memory MIMD architecture and was built using the configurable Nios™ processor core, which was also developed by Altera. Our system was developed using Altera's System-On-a-Programmable-Chip (SOPC) Quartus II development environment. Our Cholesky implementation is based on an algorithm described by George et al. [6]. This algorithm is scalable and uses a “queue of tasks” approach to ensure dynamic load-balancing among the processing elements. Our implementation assumes that the input matrices are dense. We present performance results for uniprocessor and multiprocessor implementations. Our results show that implementing multiprocessors inside FPGAs can benefit matrix operations such as matrix factorization. Further benefits result from good dynamic load-balancing techniques.
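The column-oriented arithmetic that the paper's “queue of tasks” distributes across processing elements can be sketched sequentially. The following is a generic dense Cholesky factorization, not the authors' FPGA implementation; in their system each column job would be pulled from a shared queue by an idle Nios core.

```python
import math

def cholesky(a):
    """Dense Cholesky factorization A = L * L^T with L lower-triangular.
    Each outer-loop iteration (one column of L) is an independent task
    once its predecessor columns are done, which is the granularity a
    queue-of-tasks scheduler can hand out for dynamic load balancing."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract squares of the row computed so far
        L[j][j] = math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        # Remaining entries of column j
        for i in range(j + 1, n):
            L[i][j] = (a[i][j]
                       - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)   # L = [[2, 0], [1, sqrt(2)]]
```

Requiring column j to wait only on columns 0..j-1 is what gives the algorithm its pipeline-style parallelism.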

  • Research Article
  • Cited by: 2
  • 10.1080/10637190410001700604
Fast and scalable parallel matrix computations with reconfigurable pipelined optical buses
  • Dec 1, 2004
  • Parallel Algorithms and Applications
  • Keqin Li

We present fast and highly scalable parallel computations for a number of important and fundamental matrix problems on linear arrays with reconfigurable pipelined optical bus systems. These problems include computing the powers, the inverse, the characteristic polynomial, the determinant, the rank and an LU- and a QR-factorization of a matrix; multiplying a chain of matrices; and solving linear systems of equations. These computations are based on efficient implementation of the fastest sequential matrix multiplication algorithm, and are highly scalable over a wide range of system sizes. Such fast and scalable parallel matrix computations have not previously been achieved on distributed-memory parallel computing systems.

  • Research Article
  • 10.1080/10637190412331295148
Cost-effective modeling for natural resource distribution systems
  • Dec 1, 2004
  • Parallel Algorithms and Applications
  • Abdel-Elah Al-Ayyoub

Pipe systems are at the core of many real-life applications, including water, oil and gas distribution as well as air-conditioning and compressed-air management. Modeling and analysis of flow in pipe networks is of great practical significance in all these areas. Pipe networks are usually made up of thousands of components such as pipes, pumps, valves, tanks and reservoirs. One common way to model these networks is with systems of linear equations. Practical sizes for these systems usually involve exhaustive calculations that require high computational power. This work emphasizes the design and evaluation of a concurrent system for modeling pipe networks using linear algebraic methods. The proposed approach offers a low-cost, high-speed alternative to traditional solutions. It uses a unified row-mapping method that exploits the properties of the pipe-network matrix in order to achieve a balanced load distribution. The approach is based on cluster computing as a viable alternative to expensive massively parallel processing systems. Its performance is investigated on a cluster of workstations connected by general-purpose networks.
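As a toy illustration of the linear-algebraic modelling described here, the sketch below assembles and solves a hypothetical three-node linearised network: the matrix and demand vector are invented for illustration, and the paper's actual contribution is distributing the rows of much larger systems of this shape across a cluster.

```python
import numpy as np

# Hypothetical 3-node network after linearising the head-loss relations:
# A @ h = q, where h holds nodal heads and q the external demands.
# Pipe-network matrices are sparse and diagonally dominant, properties
# the paper's row-mapping scheme exploits for balanced distribution.
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  3.0, -1.0],
              [ 0.0, -1.0,  2.0]])
q = np.array([1.0, 0.0, 1.0])

# A single workstation would solve its assembled block directly:
h = np.linalg.solve(A, q)
```

In the cluster setting each workstation holds a band of rows of `A` and the solve proceeds by a distributed iterative or elimination method rather than a single dense call.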

  • Research Article
  • Cited by: 1
  • 10.1080/10637190412331295166
Application of MPI-IO in Parallel Particle Transport Monte-Carlo Simulation
  • Dec 1, 2004
  • Parallel Algorithms and Applications
  • Mo Zeyao + 1 more

Parallel computers are increasingly being used to run large-scale applications that also have huge input/output (I/O) requirements. However, many applications obtain poor I/O performance on parallel machines. In this paper, we address the parallel I/O of a parallel particle transport Monte-Carlo simulation code (PTMC) on a parallel computer. We show that, without careful treatment, the I/O overheads ultimately dominate the elapsed simulation time. We have designed parallel MPI-IO methods that avoid this. In particular, for a benchmark application MAP6 with 105 steps of 100,000 samples, we raised the speedup from 10 with 64 processors to 56 with 90 processors. Moreover, our method scales to larger numbers of CPUs and larger numbers of samples.
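A common pattern behind this kind of parallel-I/O redesign is to have each process write its records at a precomputed offset in one shared file, in the spirit of MPI_File_write_at, instead of funnelling all data through a single rank. The single-process sketch below emulates that offset arithmetic with ordinary file seeks; the fixed-size record layout is hypothetical, not the PTMC code's actual format.

```python
import os, struct, tempfile

def write_at_offset(path, rank, samples, record_size=8):
    """Emulate MPI_File_write_at: each rank writes its block of
    fixed-size records at rank * len(samples) * record_size, so no
    two ranks overlap and no gather onto rank 0 is needed."""
    offset = rank * len(samples) * record_size
    with open(path, "r+b") as f:
        f.seek(offset)
        for x in samples:
            f.write(struct.pack("d", x))

path = os.path.join(tempfile.mkdtemp(), "samples.bin")
nranks, per_rank = 4, 3
with open(path, "wb") as f:
    f.truncate(nranks * per_rank * 8)   # preallocate the shared file
for rank in range(nranks):              # under MPI these run concurrently
    write_at_offset(path, rank, [float(rank)] * per_rank)
with open(path, "rb") as f:
    data = struct.unpack(f"{nranks * per_rank}d", f.read())
```

Because each rank's byte range is disjoint, the writes need no locking, which is what lets the I/O time stop growing with the number of processes.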

  • Research Article
  • 10.1080/10637190410001725445
The Journal of Parallel Algorithms and Applications: Special Issue on Parallel and Distributed Algorithms
  • Jun 1, 2004
  • Parallel Algorithms and Applications
  • George A Gravvanis + 1 more

The Journal of Parallel Algorithms and Applications publishes original, quality research across various areas, including parallel and distributed algorithms. The scope of the journal includes no...

  • Open Access
  • Research Article
  • Cited by: 12
  • 10.1080/10637190410001725481
HPJava: Programming Support for High-Performance Grid-Enabled Applications
  • Jun 1, 2004
  • Parallel Algorithms and Applications
  • Han-Ku Lee + 3 more

The paper begins by considering what a grid computing environment might be, why it is demanded, and how the authors' HPspmd programming model fits into this picture. We then review our HPJava environment as a contribution towards programming support for high-performance grid-enabled environments. Future grid computing systems will need to provide programming models; in any suitable programming model for grid-enabled environments and applications, high performance on multiprocessor systems is a critical issue. We describe the features of HPJava, including its run-time communication library, compilation strategies and optimization schemes. Through experiments, we compare HPJava programs against Fortran and ordinary Java programs. We aim to demonstrate that HPJava can be used "anywhere": not only for high-performance parallel computing, but also for grid-enabled applications.

  • Research Article
  • Cited by: 12
  • 10.1080/10637190410001725463
Deadlock-free dynamic reconfiguration over InfiniBand™ networks
  • Jun 1, 2004
  • Parallel Algorithms and Applications
  • Bilal Zafar + 3 more

InfiniBand Architecture (IBA) is a newly established general-purpose interconnect standard applicable to local area, system area and storage area networking and I/O. Networks based on this standard should be capable of tolerating topological changes due to resource failures, link/switch activations, and/or hot swapping of components. In order to maintain connectivity, the network's routing function may need to be reconfigured on each topological change. Although the architecture has various mechanisms useful for configuring the network, no strategy or procedure is specified for ensuring deadlock freedom during dynamic network reconfiguration. In this paper, a method for applying the Double Scheme over InfiniBand networks is proposed. The Double Scheme provides a systematic way of reconfiguring a network dynamically while ensuring freedom from deadlocks. We show how features and mechanisms available in IBA for other purposes can also be used to implement dynamic network reconfiguration based on the Double Scheme. We also propose new mechanisms that may be considered in future versions of the IBA specification for making dynamic reconfiguration and other subnet management operations more efficient.

  • Research Article
  • 10.1080/10637190410001725454
A bit-serial floating-point unit for a massively parallel system on a chip
  • Jun 1, 2004
  • Parallel Algorithms and Applications
  • Manfred Schimmler + 2 more

This paper presents the design of a new bit-serial floating-point unit (FPU). It has been developed for the processors of the instruction systolic array (ISA) parallel computer model. In contrast to conventional bit-parallel FPUs, the bit-serial approach requires a different data format. Our FPU uses an IEEE-compliant internal floating-point format that allows fast least-significant-bit (LSB)-first arithmetic and can be efficiently implemented in hardware.

  • Research Article
  • Cited by: 3
  • 10.1080/10637190410001725472
A portable Software Architecture for Mesh-Independent Particle Tracking Algorithms
  • Jun 1, 2004
  • Parallel Algorithms and Applications
  • Jing-Ru C Cheng + 2 more

Particle tracking methods are central to a wide spectrum of scientific computing applications. To support such applications, this paper presents a compact software architecture that can be used to interface parallel particle tracking software to computational mesh management systems. A detailed description is presented of the in-element particle tracking framework supported by this software architecture—a framework that encompasses most particle tracking applications. The use of this parallel software architecture is illustrated through the implementation of two differential equation solvers, the forward Euler and an implicit trapezoidal method, on a distributed, unstructured, computational mesh. A design goal of this software effort has been to interface to software libraries such as Scalable Unstructured Mesh Algorithms and Applications (SUMAA3d) in addition to application codes (e.g. FEMWATER). This goal of portability is achieved through a software architecture that specifies a lightweight functional interface that maintains the full functionality required by particle–mesh methods. The use of this approach in parallel programming environments written in C and Fortran is demonstrated.
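The two integrators mentioned can be sketched independently of the mesh machinery. Below is a minimal 1-D version of the forward Euler and implicit trapezoidal steps; the fixed-point solve and the test velocity field are illustrative assumptions, and the paper's framework additionally locates each particle within the unstructured mesh at every step.

```python
def euler_step(x, v, dt):
    """Forward Euler: x_{n+1} = x_n + dt * v(x_n)."""
    return x + dt * v(x)

def trapezoidal_step(x, v, dt, iters=20):
    """Implicit trapezoidal rule:
    x_{n+1} = x_n + dt/2 * (v(x_n) + v(x_{n+1})),
    solved here by simple fixed-point iteration from an Euler predictor.
    (In the particle-tracking setting, v would be interpolated from the
    velocity field on the element containing the particle.)"""
    x_new = x + dt * v(x)            # Euler predictor
    for _ in range(iters):
        x_new = x + 0.5 * dt * (v(x) + v(x_new))
    return x_new

# Track a particle in the 1-D velocity field v(x) = -x
# (exact trajectory from x(0) = 1 is x(t) = e^{-t})
x_e = x_t = 1.0
for _ in range(100):
    x_e = euler_step(x_e, lambda x: -x, 0.01)
    x_t = trapezoidal_step(x_t, lambda x: -x, 0.01)
```

The trapezoidal step is second-order accurate where Euler is first-order, which is why a framework supporting both lets applications trade accuracy against cost per step.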