Adaptive local grid refinement/coarsening results in unequal distribution of workload among the processors of a parallel system. A novel method for balancing the load in cases of dynamically changing tetrahedral grids is developed. The approach employs local exchange of cells among processors to redistribute the load equally. An important part of the load-balancing algorithm is the method employed by a processor to determine which cells within its subdomain are to be exchanged. Two such methods are presented and compared. The strategy for load balancing is based on the divide-and-conquer approach that leads to an efficient parallel algorithm. This method is implemented on a distributed-memory multiple instruction multiple data system.

I. Introduction

COMPUTATIONAL fluid dynamics (CFD) has advanced rapidly over the last two decades, and it is recognized as a valuable tool for engineering design. However, numerical simulation of three-dimensional flowfields remains very expensive even with the use of current vector supercomputers. Vector computers have accelerated computations only by one or two orders of magnitude compared with scalar machines. An evolving approach in computer architectures is the design of scalable massively parallel computers, wherein a number of processors are involved in executing different portions of the job. Parallel computing appears to be a promising approach for future design applications of CFD.

State-of-the-art parallel architectures can be broadly classified into single instruction multiple data (SIMD), shared memory multiple instruction multiple data (MIMD), and partitioned memory MIMD architectures. Shared memory MIMD architectures such as the Cray Y-MP are extensions of pipelined vector processors with additional facilities to enable effective utilization of the available multiple processors. However, the scalability of such architectures is severely limited due to the inherent bottleneck of the common, shared memory.
On the other hand, SIMD architectures such as the CM-2 are based on the lockstep paradigm of parallel computation wherein a large number of processors execute the same instructions on local data. Although this enhances scalability, the overhead associated with communication among these processors can sometimes prove to be a major bottleneck.¹ Partitioned memory MIMD architectures provide a good compromise between the other two types of systems. The user has the flexibility to allocate data as well as tasks to each individual processor, thereby enabling fine-tuning of the application to the underlying architecture. There is a price to be paid, however, in terms of additional user responsibility to coordinate the processors by exchanging the relevant information via message passing. In the case of an adaptive grid algo