Abstract
Multi-core architectures also referred to as Chip Multiprocessors (CMPs) have emerged as the dominant architecture for both desktop and high-performance systems. CMPs introduce many challenges that need to be addressed to achieve the best performance. One of the big challenges comes with the shared-memory model observed in such architectures which is the cache coherence overhead problem. Contemporary architectures employ write-invalidate based protocols which are known to generate coherence misses that yield to latency issues. On the other hand, write-update based protocols can solve the coherence misses problem but they tend to generate excessive network traffic which is especially not desirable for CMPs. Previous studies have shown that a single protocol approach is not sufficient for many sharing patterns. As a solution, this paper evaluates an adaptive protocol which targets write-update optimizations for producer-consumer sharing patterns. This work targets a minimalistic hardware extension approach to test the benefits of such adaptive protocols in a practical environment. Experimental study is conducted on a 16-core CMP by using a full-system simulator with selected scientific applications from SPLASH-2 and NAS parallel benchmark suites. Results show up to 40% improvement for coherence misses which corresponds to 15% application speedup.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.