Abstract

The architectural challenges of reaching extreme-scale computing necessitate major progress in designing high-performance, energy-efficient hardware building blocks such as microprocessors. The chip multiprocessor (CMP) architecture has emerged as the preferred way to exploit increasing transistor density for sustained performance improvement. As the core count keeps scaling up, developing parallel applications that reap commensurate performance gains becomes imperative. Hardware Transactional Memory (HTM) promises increased productivity in parallel programming. However, recent research in academia and industry suggests that the design space and tradeoffs of HTM are still far from well understood. To pave the way for more HTM-enabled processors, two crucial issues in HTM design must be addressed: achieving high performance under frequent transaction conflicts, and designing energy-efficient HTM techniques. Both issues demand efficient communication during transaction execution. This dissertation contributes a set of hardware techniques that achieve efficient and scalable communication in such systems.

First, we contribute the Selective Eager-Lazy HTM system (SEL-TM), which leverages the concurrency and communication benefits of lazy version management while suppressing its complexity and overhead with eager management. The mixed-mode execution generates 22% less network traffic on high-contention workloads representative of upcoming TM applications and improves performance by at least 14% over either a pure eager or a pure lazy HTM. Second, we contribute the Transactional Memory Network-on-Chip (TMNOC), an in-network filtering mechanism that proactively filters out pathological transactional requests that waste network-on-chip bandwidth; TMNOC is the first published HTM-network co-design. Experimental results show that TMNOC reduces network traffic by 20% on average across the high-contention workloads, thereby reducing network energy consumption by 24%. Third, we mitigate the disruptive coherence forwarding that arises in transactional execution when the cache coherence protocol is reused for conflict detection. We address the problem with a Predictive Unicast and Notification (PUNO) mechanism, which reduces transaction aborts by 43% on average and avoids 17% of the on-chip communication. Fourth, we propose Consolidated Conflict Detection (C2D), a holistic solution that addresses the communication overhead of conflict detection with cost-effective hardware designs. Evaluations show that C2D, when used to implement eager conflict detection, reduces on-chip communication by 39% and the corresponding energy consumption by 27%.
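To make the eager/lazy distinction underlying SEL-TM concrete, the sketch below contrasts the two version-management policies in plain C. It is an illustrative software analogy only, not material from the dissertation: real HTMs implement versioning in the cache hierarchy and coherence hardware, and every name here (eager_tx, lazy_tx, eager_write, and so on) is hypothetical.

/* Illustrative sketch: eager versioning writes memory in place and keeps an
 * undo log for aborts; lazy versioning buffers writes and publishes them at
 * commit. All names are hypothetical. */
#include <stdio.h>

#define LOG_SIZE 16

/* Eager: update memory immediately, remember old values for rollback. */
struct undo_entry { int *addr; int old_val; };
struct eager_tx { struct undo_entry log[LOG_SIZE]; int n; };

static void eager_write(struct eager_tx *tx, int *addr, int val) {
    tx->log[tx->n].addr = addr;      /* save old value in undo log */
    tx->log[tx->n].old_val = *addr;
    tx->n++;
    *addr = val;                     /* write in place */
}

static void eager_abort(struct eager_tx *tx) {
    while (tx->n--)                  /* walk undo log backwards to roll back */
        *tx->log[tx->n].addr = tx->log[tx->n].old_val;
    tx->n = 0;
}

/* Lazy: buffer writes privately; memory changes only at commit. */
struct buf_entry { int *addr; int new_val; };
struct lazy_tx { struct buf_entry buf[LOG_SIZE]; int n; };

static void lazy_write(struct lazy_tx *tx, int *addr, int val) {
    tx->buf[tx->n].addr = addr;      /* no change to memory yet */
    tx->buf[tx->n].new_val = val;
    tx->n++;
}

static void lazy_commit(struct lazy_tx *tx) {
    for (int i = 0; i < tx->n; i++)  /* publish buffered writes at commit */
        *tx->buf[i].addr = tx->buf[i].new_val;
    tx->n = 0;
}

int main(void) {
    int shared = 1;

    struct eager_tx e = { .n = 0 };
    eager_write(&e, &shared, 7);     /* shared is 7 immediately */
    eager_abort(&e);                 /* rollback restores 1 */
    printf("after eager abort: %d\n", shared);

    struct lazy_tx l = { .n = 0 };
    lazy_write(&l, &shared, 7);      /* shared still 1 here */
    lazy_commit(&l);                 /* commit makes it 7 */
    printf("after lazy commit: %d\n", shared);
    return 0;
}

The asymmetry is what motivates a selective policy such as SEL-TM: eager commits are cheap but aborts must walk the undo log, whereas lazy aborts are cheap but commits must publish the write buffer, with the attendant communication.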
