Abstract

This paper introduces the MoM-3 as a reconfigurable accelerator for high performance computing at a moderate price. By using a new machine paradigm to trigger the operations in the MoM-3, this accelerator is especially suited to scientific algorithms, where the hardware structure can be configured to match the structure of the algorithm. The MoM-3 efficiently uses reconfigurable logic devices to provide fine-grain parallelism, and multiple address generators to keep the complete memory bandwidth free for data transfers (instead of fetching address-computing instructions).

Speed-up factors of up to 82 compared to state-of-the-art workstations are demonstrated by means of an Ising spin system simulation example. Adding the MoM-3 as an accelerator makes it possible to achieve supercomputer performance from a low-cost workstation.

1. Introduction

Scientific computing provides the greatest challenges to modern workstations and even supercomputers. Many different computer architectures have been presented which take into account characteristics that are common to many scientific algorithms. Vector processors [4] speed up operations on large arrays of data by the use of pipelining techniques. Parallel multiprocessor architectures [15] benefit from the fact that many operations on large amounts of data are independent of each other, which allows these operations to be distributed onto different processors (or processing elements) and executed in parallel. But all of these architectures still basically follow the von Neumann machine paradigm with a fixed instruction set, where the sequence of instructions triggers the accesses to data in memory and the data manipulations.

The Map-oriented Machine 3 (MoM-3) is an architecture based on the Xputer machine paradigm [3]. Instead of a hardwired ALU with a fixed instruction set, an Xputer has a reconfigurable ALU based on field-programmable devices. All data manipulations performed in the loop bodies of an algorithm are combined into a set of compound operators. Each compound operator matches a single loop body and takes several data words as input to produce a number of resulting data words. The compound operators are configured into the field-programmable devices. After configuration, an Xputer's "instruction set" consists only of the compound operators required by the algorithm actually running on the Xputer. Combining several operations of a high-level language description into one compound operator allows pipelining and fine-grain parallelism to be exploited to a larger extent than is possible in fixed-instruction-set processors. For example, intermediate results can be passed along in the pipeline instead of being written back to the register file after every instruction.

Since many scientific algorithms compute array indices in several nested loops, the sequence of data addresses in a program trace shows a regular pattern. This leads to the idea of having complex address generators compute such address sequences from a small parameter set which describes the address pattern. Instead of an instruction sequencer acting as a centralized control to trigger the operations in the reconfigurable ALU, the address generators themselves serve as a decentralized control. They automatically activate the appropriate compound operator each time a new set of input data has been fetched from memory and the previous results have been written back. This so-called data sequencing mechanism directly matches the loop structure of the algorithm.
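To make the compound-operator idea concrete, the following sketch is a minimal software analogy in plain C; it is purely illustrative and is not MoM-3 configuration code. It assumes a hypothetical loop body with two statements that are fused into one compound operator, so that the intermediate value stays inside the operator (modeling a pipeline stage) instead of being written back after every instruction.

    #include <stddef.h>

    /* Hypothetical loop body from a high-level language description:
     *     t[i] = a * x[i] + b;
     *     y[i] = t[i] * t[i];
     * Both statements are fused into one compound operator; the
     * intermediate t is passed along inside the operator instead of
     * being stored to a register file or memory after each step.   */
    static double compound_op(double xi, double a, double b)
    {
        double t = a * xi + b;   /* stage 1: multiply-add            */
        return t * t;            /* stage 2: square the intermediate */
    }

    /* One call of the compound operator replaces the whole loop body
     * for each data word.                                            */
    static void apply_loop(const double *x, double *y, size_t n,
                           double a, double b)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = compound_op(x[i], a, b);
    }

In a fixed-instruction-set processor the intermediate t would occupy an architectural register and be written back after the first instruction; in the fused operator it only exists between the two pipeline stages.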
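The decentralized control by address generators can be modeled in software in a similar way. The sketch below assumes a simple two-level (row/column) scan pattern described by a handful of parameters; a real MoM-3 address generator is a hardware unit, and the names used here (base, strides, counts, data_sequencer) are illustrative only. Each time the generator has produced the address of a new data word, it activates the configured compound operator, which mirrors the data sequencing mechanism described above.

    #include <stddef.h>

    /* Hypothetical parameter set describing a two-level nested address
     * pattern.  Strides and the base address are given in data words.
     * From these few parameters the whole address trace is derived, so
     * no memory bandwidth is spent on address-computing instructions. */
    typedef struct {
        size_t base;
        size_t inner_count, inner_stride;
        size_t outer_count, outer_stride;
    } scan_pattern;

    /* Software model of data sequencing: the address generator walks
     * the pattern and triggers the compound operator for every fetched
     * data word, instead of an instruction sequencer driving the ALU. */
    static void data_sequencer(const scan_pattern *p,
                               const double *mem_in, double *mem_out,
                               double (*compound_op)(double))
    {
        for (size_t i = 0; i < p->outer_count; i++) {
            for (size_t j = 0; j < p->inner_count; j++) {
                size_t addr = p->base
                            + i * p->outer_stride
                            + j * p->inner_stride;
                mem_out[addr] = compound_op(mem_in[addr]); /* activate operator */
            }
        }
    }

For example, a row-wise scan of a 4 x 8 array stored in row-major order would be described by base = 0, inner_count = 8, inner_stride = 1, outer_count = 4, outer_stride = 8; changing only these parameters yields column-wise or strided scans without touching the operator itself.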