Abstract

With the continuous shrinking of transistor size, processor designers face new difficulties in achieving high clock frequencies. The register file read time, the wake-up and selection logic traversal delay, and the bypass network transit delay, together with their respective power consumptions, are major obstacles in the design of wide-issue superscalar processors. In this paper, we show that breaking a rule that has so far been applied in the design of all superscalar processors reduces these difficulties. Currently used general-purpose ISAs feature a single logical register file (and generally a floating-point register file). Up to now, all superscalar processors have allowed any general-purpose functional unit to read and write any physical general-purpose register. First, we propose Register Write Specialization, i.e., forcing distinct groups of functional units to write only to distinct subsets of the physical register file, thus limiting the number of write ports on each individual register. Register Write Specialization significantly reduces the access time, the power consumption, and the silicon area of the register file without impairing performance. Second, we propose combining Register Write Specialization with Register Read Specialization (WSRS) for clustered superscalar processors. This limits the number of read ports on each individual register and simplifies both the wake-up logic and the bypass network. With an 8-way 4-cluster WSRS architecture, the complexities of a wake-up logic entry and of a bypass point are equivalent to those found in a conventional 4-way issue processor. WSRS architectures need more physical registers. Nevertheless, a WSRS architecture allows a dramatic reduction of the total silicon area devoted to the physical register file (by a factor of four to six). Its power consumption is more than halved, and its read access time is shortened by one third. Some extra hardware and/or a few extra pipeline stages are needed for register renaming, and the WSRS architecture constrains the policy for allocating instructions to clusters. However, the performance of an 8-way 4-cluster WSRS architecture is comparable to that of a conventional 8-way 4-cluster superscalar processor.
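
To make the idea of Register Write Specialization concrete, the following is a minimal sketch of a renamer in which each cluster may allocate destination registers only from its own subset of the physical register file. It is an illustration under stated assumptions, not the renaming hardware described in the paper: the class `WriteSpecializedRenamer`, its methods, the contiguous numbering of subsets, and the subset sizes are all hypothetical, and reclaiming the previous mapping of a logical register at commit is left out.

```python
from collections import deque


class WriteSpecializedRenamer:
    """Toy renamer: cluster c writes only to physical-register subset c.

    Hypothetical model for illustration; all names and sizes are assumptions.
    """

    def __init__(self, num_clusters=4, regs_per_subset=4):
        self.regs_per_subset = regs_per_subset
        # One free list per subset; subset c holds a contiguous range of
        # physical register numbers (an assumption made for simplicity).
        self.free = [
            deque(range(c * regs_per_subset, (c + 1) * regs_per_subset))
            for c in range(num_clusters)
        ]
        self.map_table = {}  # logical register name -> physical register

    def subset_of(self, phys):
        """Return the subset (and thus the only writing cluster) of a register."""
        return phys // self.regs_per_subset

    def rename_dest(self, logical, cluster):
        """Allocate a destination register for an instruction sent to `cluster`.

        Returns the physical register, or None when the cluster's own subset
        is exhausted, even if other subsets still have free registers.
        """
        if not self.free[cluster]:
            return None
        phys = self.free[cluster].popleft()
        self.map_table[logical] = phys
        return phys

    def release(self, phys):
        """Return a physical register to the free list of its own subset."""
        self.free[self.subset_of(phys)].append(phys)


renamer = WriteSpecializedRenamer(num_clusters=4, regs_per_subset=2)
p = renamer.rename_dest("r1", cluster=2)
print(p, renamer.subset_of(p))
```

In this sketch a register in subset c only ever receives results from cluster c, which is what limits the number of write ports per register. The per-subset free lists also hint at why such a design needs more physical registers overall: an allocation can fail in one subset while others still have free entries.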
