Abstract
Speed, portability and correctness have traditionally been the main requirements for dynamic binary translation (DBT) systems. Given the increasing availability of multi-core machines as both emulation guests and hosts, scalability has emerged as an additional design objective. It has, however, been an elusive goal for two reasons: contention on common data structures such as the translation cache is difficult to avoid without hurting performance, and instruction set architecture (ISA) disparities between guest and host (such as mismatches in the memory consistency model and the semantics of atomic operations) can compromise correctness. In this paper we address these challenges in a simple and memory-efficient way, demonstrating a multi-threaded DBT-based emulator that scales in an architecture-independent manner. Furthermore, we explore the trade-offs that exist when emulating atomic operations across ISAs, and present a novel approach for correct and scalable emulation of load-locked/store-conditional instructions based on hardware transactional memory (HTM). By adding around 1000 lines of code to QEMU, we demonstrate the scalability of both user-mode and full-system emulation on a 64-core x86_64 host running x86_64 guest code, and on a 12-core, 96-thread POWER8 host running x86_64 and AArch64 guest code.