Abstract

The trace cache is a technique that provides accurate, high-bandwidth instruction fetch. However, when a desired instruction trace is not found in the cache, conventional instruction fetch and decode must be used to satisfy the trace request. Such auxiliary fetch hardware can be expensive in terms of energy, area, and complexity. This work explores an approach that combines a trace cache and conventional instruction fetch hardware using a decoupled design. The design enables the processor to switch dynamically between trace-ID-based and PC-based prediction methods and helps to hide the latency of the instruction memory path. With accelerated slow-path instruction delivery and no instruction cache, the decoupled design provides performance comparable to a front-end with an 8 kB instruction cache (within 2% of the instructions per cycle achieved with the cache). The design also shows high tolerance to both trace table misses and increased memory latency as the trace table is scaled down and the L2 access latency is scaled up.
