Abstract

Java processors have been introduced to offer hardware acceleration for java applications. They execute java bytecodes directly in hardware. However, the stack nature of the java virtual machine instruction set imposes a limitation on the achievable execution performance. If we intend to exploit instruction level parallelism, we must remove the stack completely. This can be achieved by recursive stack folding algorithms, such as OPEX, which dynamically transform groups of java bytecodes to RISC like instructions. However, the decoding throughputs that are obtained are limited. In this paper we propose a novel stack folding technique, that uses a predecoded cache to store folded bytecodes, thus enabling reuse. The decoding throughput reaches 4 RISC instructions per cycle. With use of a superscalar backend core, the obtained IPC is approximately 2.08 instructions per cycle (or 3.02 java bytecodes per cycle).

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call