Abstract

The widening gap between processor speed and memory latency has made memory access a major performance bottleneck for modern processor architectures. Eliminating memory accesses and improving program locality can help improve performance. This paper proposes an extended design for load and store instructions and describes a hardware/software cooperative optimization approach for global data access. The work is based on a comprehensive analysis of program data access behavior and on observations of compiler optimization capability, both of which strongly influenced the design decisions. Compared with the popular GP-addressing technique, our method not only saves the dedicated registers but also eliminates the overhead of calling functions in dynamic shared objects. Experimental results show an average reduction of 5.92% in dynamic memory access instructions and an average improvement of 2.16% in code size for SPEC programs.
