RegCPython: A Register-based Python Interpreter for Better Performance

Qiang Zhang,Lei Xu,Baowen Xu

doi:10.1145/3568973

Abstract

Interpreters are widely used in the implementation of many programming languages, such as Python, Perl, and Java. Even though various JIT compilers emerge in an endless stream, interpretation efficiency still plays a critical role in program performance. Does a stack-based interpreter or a register-based interpreter perform better? The pros and cons of the pair of architectures have long been discussed. The stack architecture is attractive for its concise model and compact bytecode, but our study finds that the register-based interpreter can also be implemented easily and that its bytecode size only grows by a small margin. Moreover, the latter turns out to be appreciably faster. Specifically, we implemented an open source Python interpreter named RegCPython based on CPython v3.10.1. The former is register based, while the latter is stack based. Without changes in syntax, Application Programming Interface, and Application Binary Interface, RegCPython is excellently compatible with CPython, as it does not break existing syntax or interfaces. It achieves a speedup of 1.287 on the most favorable benchmark and 0.977 even on the most unfavorable benchmark. For all Python-intensive benchmarks, the average speedup reaches 1.120 on x86 and 1.130 on ARM. Our evaluation work, which also serves as an empirical study, provides a detailed performance survey of both interpreters on modern hardware. It points out that the register-based interpreters are more efficient mainly due to the elimination of machine instructions needed, while changes in branch mispredictions and cache misses have a limited impact on performance. Additionally, it confirms that the register-based implementation is also satisfactory in terms of memory footprint, compilation cost, and implementation complexity.

Full Text