Abstract
Dynamic binary translation (DBT) requires the implementation of load-link/store-conditional (LL/SC) primitives for guest systems that rely on this form of synchronization. When targeting, e.g., x86 host systems, LL/SC guest instructions are typically emulated using atomic compare-and-swap (CAS) instructions on the host. Whilst this direct mapping is efficient, it is problematic due to subtle differences between LL/SC and CAS semantics. In this article, we demonstrate that this is a real problem, and we provide code examples that fail to execute correctly on QEMU and a commercial DBT system, which both use the CAS approach to LL/SC emulation. We then develop two novel and provably correct LL/SC emulation schemes: 1) a purely software-based scheme, which uses the DBT system's page translation cache to select correctly between fast, but unsynchronized, and slow, but fully synchronized memory accesses and 2) a hardware-accelerated scheme that leverages hardware transactional memory (HTM) provided by the host. We have implemented these two schemes in the Synopsys DesignWare ARC nSIM DBT system, and we evaluate our implementations against full applications and targeted microbenchmarks. We demonstrate that our novel schemes are not only correct but also deliver competitive performance, on par with or better than the widely used, but broken, CAS scheme.
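The abstract does not spell out which semantic difference causes the failures. One well-known candidate, assumed here for illustration, is the ABA problem: a store-conditional must fail after any intervening write to the linked address, whereas a CAS only compares values and so succeeds if the original value has been written back. The following is a minimal sketch of that difference in a toy single-word memory model. The class `GuestMemory`, its methods, and `run_aba` are hypothetical names introduced for this example and are not taken from QEMU, nSIM, or the article; the "threads" are simulated by a fixed interleaving in a single Python thread.

```python
class GuestMemory:
    """Toy single-word guest memory (hypothetical model, not a real DBT API)."""

    def __init__(self, value=0):
        self.value = value
        self.monitor_valid = False  # reservation used by true LL/SC semantics
        self.linked_value = None    # value remembered by the CAS-based emulation

    def load_link(self):
        # LL: read the word and set up both kinds of bookkeeping.
        self.monitor_valid = True
        self.linked_value = self.value
        return self.value

    def store(self, new_value):
        # An ordinary store from another thread clears the reservation.
        self.value = new_value
        self.monitor_valid = False

    def sc_true(self, new_value):
        # True SC: fails if any write happened since the LL.
        if not self.monitor_valid:
            return False
        self.value = new_value
        self.monitor_valid = False
        return True

    def sc_via_cas(self, new_value):
        # CAS emulation: succeeds whenever the current value equals
        # the value observed at LL time, regardless of writes in between.
        if self.value != self.linked_value:
            return False
        self.value = new_value
        return True


def run_aba(sc_method_name):
    """Replay a fixed A -> B -> A interleaving and attempt the SC."""
    mem = GuestMemory(value=1)
    mem.load_link()   # thread 1: LL observes A (1)
    mem.store(2)      # thread 2: writes B
    mem.store(1)      # thread 2: writes A back
    return getattr(mem, sc_method_name)(99)  # thread 1: SC
```

Under this model, `run_aba("sc_true")` reports failure, as the guest expects, while `run_aba("sc_via_cas")` reports success even though the location was written twice in between. The sketch shows only the semantic gap; it says nothing about how the two schemes proposed in the article close it.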