Binary translation enables transparent execution, analysis, and modification of the binary program, serving as a core technology that facilitates instruction set emulation, cross-platform compatibility of software, and program instrumentation. Handling indirect branch instructions is widely recognized as a significant performance bottleneck in binary translation. While the target of direct branch can be determined during the translation phase, indirect branch requires a run-time lookup from the guest program counter to the host program counter, significantly influencing the performance of translator. Although several methods have been proposed to accelerate this process, each guest indirect branch instruction still translates into approximately ten host instructions, resulting in considerable overhead. This paper introduces Tiaozhuan, which addresses this issue by employing two optimization schemes. Firstly, Full Address Mapping uses a larger address space to store address mappings from guest to host, effectively reducing the number of instructions required to lookup the target of an indirect branch. Secondly, Exception Assisted Branch Elimination further eliminates branch instructions that check target correctness of targets in the lookup process. These two approaches enable indirect branches target lookup to be completed within 1-2 instructions, noticeably decreasing the overhead of indirect branches. Compared to state-of-the-art mechanisms, the SPEC CPU2006 benchmark suite showed a reduction in the number of instructions by an average of 4.2%, with the highest observed performance improvement reaching 19.4% and an average increase of 3.9%.
Read full abstract