The purpose of this talk is threefold: to give an overview of the major parallel processor projects at IBM T.J. Watson Research, to discuss in some detail the VICTOR [1] message-passing parallel machine, and finally to briefly discuss concepts of a possible massively parallel machine for very high performance. In the first part of the talk the RP3 [2], the ACE [3], the GF11 [4], the M/370, the Hybrid Dataflow [5], and the VLIW [6] projects will be discussed at a level sufficient to explain the key features of each of these projects, but without going into technical details. In summary, the RP3 is a 64-node shared-memory machine, the ACE an 8-node shared-bus workstation, the GF11 a switch-based SIMD machine with up to 576 nodes, and the M/370 a project to study commercial applications of message-passing machines. The Hybrid Dataflow and the VLIW projects aim to explore very fine-grained parallelism.

The VICTOR project is a family of transputer-based machines which were designed and built at T.J. Watson. It includes V32 and V256, with 32 and 256 nodes respectively, and a set of 16-node workstations. All of these are presently operational. All machines except V32 are T800-based, with 4 MB of memory per node. V256 incorporates a distributed 16-node file server with 10 GB capacity. Special hardware supports multiple users via spatial partitioning and non-intrusive processor monitoring. The operating environment is based on a mixture of vendor compiler and loader technology and a runtime system developed in the VICTOR group; in particular, this includes a set of routing routines and support for the filesystem. Languages used are OCCAM, C, Pascal, and Fortran 77. Presently, the Trollius [7] operating system, which was developed at Cornell University, is being ported to VICTOR. Application programs written for VICTOR presently include fractals, ray tracing, a nuclear physics Monte Carlo code, a computer pipeline model, and a neural network code.
Presently under development are two codes for VLSI design, dealing with parallel fault simulation and with circuit simulation. The latter uses a parallel version of waveform relaxation techniques and is expected to be able to handle circuits with up to 10^6 transistors. Other applications being developed are graphical process monitors, database research, a 3D seismic code, and a multi-robot simulation. Work in the Mathematical Sciences department is focused on developing a parallel language with emphasis on dynamic process scheduling. In the last part of the talk, issues of scaling message-passing architectures into the teraflop performance range are discussed.