The distributed register-file microarchitecture (DRFM), which comprises multiple uniform blocks (called islands), each containing a dedicated register file, functional unit(s) and data-routing logic, is known to be a very attractive architecture for implementing designs with platform-featured on-chip memory or register-file IP blocks. Compared with a discrete-register-based architecture, DRFM offers an opportunity to reduce the cost of global (inter-island) connections by confining as many computations as possible within the islands. Consequently, two important problems must be solved effectively in high-level synthesis for DRFM: (problem 1) scheduling and resource binding to minimise inter-island connections (IICs) and (problem 2) scheduling of data transfers (i.e. communication) through the IICs to minimise access conflicts among those transfers. Solving problem 1 minimises the design complexity caused by long interconnect delay, whereas solving problem 2 minimises the additional latency required to resolve register-file access conflicts among the inter-island data transfers. This work proposes novel solutions to both problems. For problem 1, previous work proceeds in two separate steps: (i) scheduling and then (ii) determining the IICs by binding resources to islands. In contrast, the proposed algorithm, called DFRM-int, places primary importance on the cost of interconnections: the authors minimise the interconnection cost first, to fully exploit the effects of scheduling on interconnects, and schedule the operations afterwards. For problem 2, previous work tries to resolve the access conflicts by forwarding data directly to the destination island. In contrast, the proposed algorithm, called DFRM-com, uses an efficient technique that explores an extensive design space of indirect as well as direct data forwarding to find a near-optimal solution.
By applying the proposed synthesis approach, DFRM-int+DFRM-com, the authors are able to reduce the IICs by a further 17.9% compared with the conventional DRFM approach, while also completely eliminating register-file access conflicts without any increase in latency.
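
To make the two cost measures concrete, the following is a minimal illustrative sketch, not the authors' DFRM-int or DFRM-com algorithms. It assumes that an IIC is counted once per directed pair of distinct islands that exchange at least one value, and that a register-file access conflict occurs when more transfers arrive at one island in a cycle than it has write ports. The function names, the `write_ports` parameter and the toy data-flow graph are all hypothetical.

```python
from collections import defaultdict


def inter_island_connections(edges, island_of):
    """Return the directed island pairs that need a connection.

    edges: (producer, consumer) operation pairs of a data-flow graph.
    island_of: mapping from operation name to island index (the binding).
    Assumption: one IIC per directed pair of distinct islands.
    """
    return {
        (island_of[src], island_of[dst])
        for src, dst in edges
        if island_of[src] != island_of[dst]
    }


def access_conflicts(transfers, write_ports=1):
    """Count transfers that exceed the write-port budget of an island.

    transfers: (cycle, destination_island) pairs, one per data transfer.
    Assumption: each island's register file accepts at most
    `write_ports` incoming transfers per cycle.
    """
    load = defaultdict(int)
    for cycle, dst_island in transfers:
        load[(cycle, dst_island)] += 1
    return sum(max(0, count - write_ports) for count in load.values())


# Toy data-flow graph: a and b feed c; c feeds d and e.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]

# Two candidate bindings of operations to islands 0 and 1.
binding_1 = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 1}
binding_2 = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

iic_1 = inter_island_connections(edges, binding_1)  # {(1, 0), (0, 1)}
iic_2 = inter_island_connections(edges, binding_2)  # {(0, 1)}

# Two transfers reach island 1 in cycle 3; delaying one removes the conflict.
conflicts_before = access_conflicts([(3, 1), (3, 1), (4, 0)])  # 1
conflicts_after = access_conflicts([(3, 1), (4, 1), (4, 0)])   # 0
```

In this toy case, moving operation `b` onto island 0 halves the number of IICs, which illustrates the binding choices that problem 1 is concerned with, and shifting one transfer by a cycle removes the conflict, which illustrates the scheduling choices in problem 2.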