Software and Hardware Co-designed Multi-level TLBs for Chip Multiprocessors

Xiaohui Zhang, Guangqiang Chen, Ming Cong

doi:10.1109/cit.2011.17

Abstract

Translation lookaside buffers (TLBs) have a significant impact on system performance. Numerous prior studies focus on TLB design for uniprocessors. With the advent of chip multiprocessors (CMPs), which today typically employ private per-core TLBs, attention needs to shift to TLB design for CMPs. This paper presents SL2-TLB, a software-implemented level-two TLB that is shared among the processor cores. It reduces the cost of the TLB refill handler on every core, and it also effectively reduces the cost of redundant TLB misses across the cores of a CMP. SL2-TLB and the hardware TLBs together form a software-hardware co-designed multi-level TLB system that improves system performance without changing the hardware TLB, which makes it a convenient and efficient way to improve TLB performance on CMPs. SL2-TLB improves performance by about 5% on SPEC CPU2000 and about 7% on SPEC CPU2006, with the average improvement for SPECint 2006 reaching about 12.7%. The gain is smaller for applications with small memory footprints, because the TLB refill overhead is already low when the cache is large enough that walking the page table does not miss. As a further optimization, a cache locking scheme keeps the SL2-TLB table permanently resident in the L2 cache. SL2-TLB combined with cache locking improves performance by over 13% for SPECint 2006, and by over 7% on average for the emerging parallel benchmark suite PARSEC (Princeton Application Repository for Shared-Memory Computers). All evaluations were performed on Godson-3 processors, the latest generation of China's most powerful microprocessor family.
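The refill path described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the table organization, so the direct-mapped layout, the ASID tagging, the FIFO eviction in the hardware TLB model, and every name below (`SoftL2TLB`, `Core`, `refill`, `walks`) are hypothetical. The point it shows is that a core missing in its private hardware TLB first consults the shared software table and walks the page table only if that lookup also misses.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class SoftL2TLB:
    """Hypothetical direct-mapped software table shared by all cores."""

    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        self.slots = [None] * num_entries

    def lookup(self, asid, vpn):
        entry = self.slots[vpn % self.num_entries]
        if entry is not None and entry[0] == (asid, vpn):
            return entry[1]
        return None

    def insert(self, asid, vpn, pfn):
        self.slots[vpn % self.num_entries] = ((asid, vpn), pfn)


class Core:
    """One core with a small private TLB model and a software refill handler."""

    def __init__(self, shared, page_table, hw_entries=4):
        self.shared = shared          # software level-two table, shared
        self.page_table = page_table  # dict standing in for the real page table
        self.hw_entries = hw_entries
        self.hw_tlb = {}              # private per-core level-one TLB
        self.walks = 0                # number of page-table walks performed

    def refill(self, asid, vpn):
        # Fast path: another core may already have cached this translation.
        pfn = self.shared.lookup(asid, vpn)
        if pfn is None:
            # Slow path: walk the page table, then publish the result.
            pfn = self.page_table[(asid, vpn)]
            self.walks += 1
            self.shared.insert(asid, vpn, pfn)
        if len(self.hw_tlb) >= self.hw_entries:
            # Assumed FIFO eviction: drop the oldest inserted entry.
            self.hw_tlb.pop(next(iter(self.hw_tlb)))
        self.hw_tlb[(asid, vpn)] = pfn
        return pfn

    def translate(self, asid, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        pfn = self.hw_tlb.get((asid, vpn))
        if pfn is None:
            pfn = self.refill(asid, vpn)
        return (pfn << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))
```

In this model, if two `Core` objects share one `SoftL2TLB` and both touch the same page, only the first core performs a page-table walk; the second is served from the shared table, which corresponds to the redundant-miss reduction the abstract claims.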
