A pure dynamic data-flow processor naturally schedules instructions for parallel execution as a consequence of its architecture, and we therefore expect it to be applicable to fine-grain parallel computing. The data-driven processor Qv-x series is suitable for ULSI design; it is made up of self-timed elastic transfer architectures. In this paper, we report a new architecture, Qth-0, designed to overcome two drawbacks: the overhead of generating packets and the total amount of hardware required for the firing mechanism. Qth-0 provides a “pipelined thread processing mechanism.” Several instructions that mutually depend on resources are gathered by static scheduling into a processing unit called a thread, and packets consisting of the thread and its operand data are fed into the pipelined ALU. In horizontal VLIW, instructions that are independent of resources are multiprocessed in a space-shared form; in the pipelined thread processing mechanism of this architecture, by contrast, instructions that mutually depend on resources are multiprocessed in a time-shared form, so we regard this mechanism as vertical VLIW. By adopting this mechanism, we decreased the total number of transaction packets and increased the number of instructions executed per token-packet matching operation. As a result, execution speed improved by between 9 and 40%. © 1998 Scripta Technica. Syst Comp Jpn, 28(13): 27–35, 1997