Abstract

A recent trend in computer systems is the distribution of computation among physical processors and the search for alternatives to the classical sequential machine. The data flow (DF) machine is one such alternative (Gajs82, Arvi84, Trel82, Pase85), being radically different from the von Neumann approach. The DF architecture circumvents traditional sequential execution by introducing parallelism into the control flow: execution proceeds naturally as operands become available, following the parallelism of the algorithm. Unfortunately, several problems have plagued DF from its inception. The absence of explicit storage, operand accumulation and the low granularity of operations impose serious overhead, while the data-driven philosophy prevents lookahead and instruction-overlap parallelism. The DF machine does not handle complex data structures efficiently, and its process representation is limited by physical size. Finally, all the DF machines built or simulated to date have been single-user and have relied on external control to provide operating system services. These shortcomings have motivated the proposal of a new architectural concept called the Systolic Data Flow Machine (SDFM). We discuss here only those concepts of the SDFM relevant to multiprogramming, virtualization and operating system support; details of the architecture are available in (Berg86, Tal86a, Tal86c). The SDFM is based on a partitioning of DF programs, which are graphs, into subgraphs (blocks) small enough that they may be mapped onto programmable systolic arrays (Fish83). The processing element (PE) is the basic execution unit of the SDFM and contains such an array together with a local common memory. DF program blocks are mapped to PEs during execution of a DF program; only a subset of a program must be physically resident in PEs at any time, permitting virtualization. Tokens transmitted between blocks are handled by a matching unit, which generates a block fault whenever a destination block is not currently resident. Block faults cause inactive resident blocks to be replaced by those currently demanded. The block replacement algorithm selects victims according to some “inactivity” criterion, such as “no processors active” or “no token produced for some time”. The PE local memory may hold several block images, making fast and efficient block switching feasible. To facilitate efficient block utilization, sharing of code and code duplication, the architecture is extended to support logical entities called logical segments (LSs). Semantically, LSs are in some sense analogous to good old-fashioned procedures. Syntactically, programs are partitioned into loosely connected LS subgraphs of related code, each composed of several blocks that are closer to the system hardware; this yields the logical benefits of a large-grain DF machine while blocks are used for physical mapping. Moreover, LSs can carry attributes that control protection and priorities, facilitating services concerned with scheduling, interrupt handling, etc. Frequently invoked OS procedures can reside permanently in PEs, eliminating the artificial linearization found in conventional OSs. The SDFM global memory provides virtual backup to the local memories, storage of DF programs and the I/O address space, and serves as a backup for the match memory. The operating system is further discussed in (Tal86b).
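
The block-fault mechanism described above amounts to demand loading of program subgraphs, by analogy with demand paging. The Python sketch below is a rough illustration of that idea only, not a description of the machine itself: it models a PE whose local memory holds a few block images, a matching unit that raises a block fault for a non-resident destination block, and a replacement policy driven by an inactivity criterion. All names (ProcessingElement, MatchingUnit, Token, ResidentBlock, idle_threshold) and the specific timing test are assumptions introduced for this example.

```python
# Hypothetical sketch (not taken from the paper): demand loading of DF program
# blocks into a processing element (PE), with block faults and an
# inactivity-based replacement policy, by analogy with demand paging.
import time
from dataclasses import dataclass, field


@dataclass
class Token:
    dest_block: int          # id of the DF subgraph (block) this token targets
    value: object            # operand value carried by the token


@dataclass
class ResidentBlock:
    block_id: int
    image: object = None                                    # block image copied from global memory
    last_token_time: float = field(default_factory=time.monotonic)
    active_processors: int = 0


class ProcessingElement:
    """A PE whose local memory can hold several block images at once."""

    def __init__(self, capacity: int, global_memory: dict):
        self.capacity = capacity                  # how many block images fit locally
        self.global_memory = global_memory        # block_id -> block image (backing store)
        self.resident: dict[int, ResidentBlock] = {}

    def is_inactive(self, blk: ResidentBlock, idle_threshold: float = 1.0) -> bool:
        # "Inactivity" criterion in the spirit of the abstract: no processors
        # active, and no token handled for some time.
        idle = time.monotonic() - blk.last_token_time > idle_threshold
        return blk.active_processors == 0 and idle

    def load_block(self, block_id: int) -> ResidentBlock:
        # Block-fault path: evict an inactive resident block if the local
        # memory is full, then bring the demanded block in from global memory.
        if len(self.resident) >= self.capacity:
            victims = [b for b in self.resident.values() if self.is_inactive(b)]
            if not victims:
                raise RuntimeError("no inactive block available for replacement")
            victim = min(victims, key=lambda b: b.last_token_time)
            del self.resident[victim.block_id]    # image remains in global memory
        blk = ResidentBlock(block_id, image=self.global_memory[block_id])
        self.resident[block_id] = blk
        return blk


class MatchingUnit:
    """Routes inter-block tokens; triggers a block fault for non-resident targets."""

    def __init__(self, pe: ProcessingElement):
        self.pe = pe

    def dispatch(self, token: Token) -> None:
        blk = self.pe.resident.get(token.dest_block)
        if blk is None:                           # block fault
            blk = self.pe.load_block(token.dest_block)
        blk.last_token_time = time.monotonic()
        # ... deliver the token to the systolic array executing this block ...
```

In this toy model, evicted block images are simply dropped from local memory because the global memory already holds the authoritative copy, mirroring the abstract's description of global memory as virtual backup for the PE local memories.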
