Abstract

Multicore designs have become the dominant organization for future high-performance microprocessors. Instead of increasing cache sizes, clock frequencies, pipeline depths, or register file (RF) ports, multicore designs tend to keep each processor core simple but highly efficient. This new dimension for improving performance and power efficiency requires us to rethink processor architecture. The multiply-accumulate (MAC) operation is one performance-improvement technique that needs to be re-examined. MAC is fundamental to many DSP and multimedia applications, but it tends to be awkward to implement in an orthogonal instruction set architecture (ISA) because of three problems: operand bandwidth, instruction encoding, and hardware cost. The question is therefore whether high-efficiency processor designs should support MAC at all. This paper presents a comparative study of this question and introduces data-bandwidth-relaxing techniques to overcome the narrow bandwidth provided by two-port RFs. Trade-offs are also made to address the instruction encoding and hardware cost problems. The resulting design wisdom is: if you support the multiply (MUL) operation, then support the MAC operation as well.
