Abstract

Data compression has been studied intensively to increase the utility of the cache, network-on-chip (NoC), and main memory in energy-efficient processors. However, prior data compression solutions often add significant compression and decompression delay to the critical path of memory access, which is considered the major factor limiting their adoption in commodity processors. Unlike prior work that handles memory compression and network compression separately, this paper proposes a unified on-chip distributed data compressor (DISCO) that enables near-zero-latency cache and memory block compression for chip multiprocessors adopting non-uniform cache access. DISCO integrates a multimode cache compressor into the NoC routers and overlaps the de/compression latency with the queuing delay in the network. In addition, cache blocks evicted to or fetched from main memory can be compressed or decompressed during the network queuing time in the same unified DISCO compressor. With the support of congestion awareness, the evaluation shows that DISCO, which unifies the compression solution across the memory hierarchy, dramatically reduces the compression overhead of isolated techniques and significantly improves the efficiency of data movement and storage.
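The key claim, that de/compression latency can be hidden behind router queuing delay, can be illustrated with a minimal back-of-the-envelope latency model. The cycle counts below are hypothetical and not taken from the paper; this is a sketch of the overlap idea, not the authors' implementation.

```python
# Minimal latency model for the overlap idea behind DISCO (hypothetical
# cycle counts, not from the paper): compression only adds cycles to the
# critical path when it takes longer than the time the block would have
# spent queued in the NoC router anyway.

def effective_added_latency(compression_cycles: int, queuing_cycles: int) -> int:
    """Extra critical-path cycles after overlapping compression with queuing."""
    return max(0, compression_cycles - queuing_cycles)

# A 5-cycle compressor fully hidden behind an 8-cycle router queue gives the
# near-zero added latency the abstract describes.
print(effective_added_latency(5, 8))  # -> 0
# Under light load (short queues), part of the compression latency is exposed,
# which is why congestion awareness matters for deciding when to compress.
print(effective_added_latency(5, 2))  # -> 3
```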
