Abstract
This paper analyzes the performance of vector-dominated regions of code in numerical and multimedia applications on a superscalar + vector architecture and compares it with an eight-way superscalar processor. The ability to split a program’s execution into scalar and vector regions allows us to show that (1) as expected, the vector unit is much better than the wide-issue superscalar at executing the vector-dominated regions of the code; and (2) on the scalar regions, the eight-way superscalar, although better than a four-way superscalar, is clearly not worth the extra complexity in terms of extra transistors and potential cycle-time limitations. Overall, the vector-enhanced superscalar outperforms an eight-way superscalar by 6% to 303%. We also present detailed data on the performance of the memory system, which is usually the key limiting factor when running numerical and multimedia applications. We evaluate two additional cache designs that try to alleviate problems created by non-unit stride memory references.
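As a rough illustration of why non-unit stride references stress the cache, the following sketch counts how many distinct cache lines a single strided vector load touches. The function name, the 8-byte element size, and the 32-byte line size are assumptions chosen for illustration only; they are not parameters taken from the evaluated designs.

```python
def cache_lines_touched(base, stride_elems, vector_length,
                        elem_bytes=8, line_bytes=32):
    """Count the distinct cache lines touched by one strided vector load.

    All parameters are hypothetical: `base` is a byte address,
    `stride_elems` is the stride in elements, and the element and
    line sizes are illustrative defaults.
    """
    lines = set()
    for i in range(vector_length):
        addr = base + i * stride_elems * elem_bytes
        lines.add(addr // line_bytes)  # index of the line holding this element
    return len(lines)


# Unit stride: 16 elements * 8 bytes = 128 bytes, packed into 4 lines.
unit = cache_lines_touched(base=0, stride_elems=1, vector_length=16)

# Stride of 4 elements (32 bytes): every element falls on its own line.
strided = cache_lines_touched(base=0, stride_elems=4, vector_length=16)

print(unit, strided)
```

Under these assumptions, the strided load fetches four times as many lines as the unit-stride load to deliver the same number of useful elements, which is the kind of wasted bandwidth a stride-aware cache design would aim to reduce.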