Abstract

In a shared-memory Chip Multiprocessor (CMP), data shared between cores must be exchanged through the shared last-level cache, and cache coherence must be maintained at the same time. As the number of cores increases, the cache coherence wall becomes increasingly severe. For multimedia applications dominated by streaming-like data, existing multicore cache coherence protocols perform poorly and cannot meet timeliness requirements. In this paper, considering the poor temporal locality and strict real-time requirements of multimedia data, we propose the distributed light-weight active-push buffer (DLWAP-buffer) architecture to alleviate cache coherence latency on streaming-like data in CMPs. The architecture introduces a dedicated shared-data exchange channel between adjacent cores. The channel bridges the cores' internal register files and reduces shared-data communication latency. Supported by a control protocol, the architecture adaptively balances the rate mismatch in the producer-consumer pipeline model. We build a quad-core CMP simulation platform with DLWAP-buffers. Our experiments indicate that, compared with the shared last-level cache method, the architecture improves average performance by 13% and reduces the snooping operations caused by maintaining cache coherence by 26%.
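To make the rate-balancing idea concrete, the following is a minimal software sketch of a bounded push channel between a producer core and a consumer core. It is not the paper's control protocol or hardware design: the names `PushBuffer` and `simulate`, the cycle-based timing model, and the stall-on-full / stall-on-empty policy are all assumptions chosen for illustration.

```python
from collections import deque


class PushBuffer:
    """Hypothetical bounded FIFO standing in for a channel between two adjacent cores."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()

    def try_push(self, item):
        # The producer pushes data toward the consumer; refuse when the buffer is full.
        if len(self.slots) >= self.capacity:
            return False
        self.slots.append(item)
        return True

    def try_pop(self):
        # The consumer reads the oldest item; None signals an empty buffer.
        if not self.slots:
            return None
        return self.slots.popleft()


def simulate(n_items, capacity, produce_every, consume_every):
    """Assumed timing model: each side attempts one transfer every k cycles."""
    buf = PushBuffer(capacity)
    produced = 0
    received = []
    producer_stalls = 0
    consumer_stalls = 0
    cycle = 0
    while len(received) < n_items:
        if produced < n_items and cycle % produce_every == 0:
            if buf.try_push(produced):
                produced += 1
            else:
                producer_stalls += 1  # back-pressure: the producer waits for space
        if cycle % consume_every == 0:
            item = buf.try_pop()
            if item is None:
                consumer_stalls += 1  # the consumer waits for data
            else:
                received.append(item)
        cycle += 1
    return received, producer_stalls, consumer_stalls
```

In this toy model, a fast producer paired with a slow consumer accumulates producer stalls, and the reverse pairing accumulates consumer stalls, while items always arrive in order. The bounded buffer absorbs short-term rate differences without either side touching a shared cache line, which is the intuition the sketch is meant to convey.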

