Abstract

Power density constraints have limited the scaling of clock speed in processors, but, following Moore's law, we have entered the multicore domain and are about to step into the era of manycores. Harnessing the full potential of a large number of cores is challenging because shared on-chip resources, such as the memory subsystem and interconnect networks, become bottlenecks. One easy and popular way of exploiting parallelism in large-scale systems is to run multiple instances of the same application, as is done in many domains such as verification and security; we term this multi-execution. This model of computation is likely to become more popular as the number of cores in a processor grows. We observe that leveraging the similarity in data across the instances of an application, by dynamically merging identical data in a cache, can reduce off-chip traffic and thereby lead to faster execution. However, dissimilarities in content also increase the competition for cache lines. In this paper we explore the design space of a hybrid mergeable cache architecture that places dissimilar data blocks in a conventional cache, thereby reducing conflicts and enabling us to exploit data similarity more efficiently. We experiment with benchmarks from various multi-execution domains and show that our hybrid mergeable cache design yields an average of 9.5% additional speedup over the Mergeable cache when running 8 copies of an application, with an area overhead of less than 1.34%.
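The placement idea can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware design evaluated in the paper: the class name `HybridMergeableCache`, the use of a shared address as the matching key, and the byte-for-byte content comparison are all hypothetical, and capacity limits, replacement, writes, and coherence are ignored. It shows only that identical blocks from different instances can occupy one line, while a dissimilar block is diverted to a conventional region instead of competing for the merged line.

```python
from dataclasses import dataclass, field


@dataclass
class HybridMergeableCache:
    """Toy model (hypothetical): a merged region plus a conventional region."""

    # address -> (data, set of instance ids sharing this line)
    merged: dict = field(default_factory=dict)
    # (instance id, address) -> data, for blocks that differ from the merged copy
    conventional: dict = field(default_factory=dict)

    def insert(self, instance: int, address: int, data: bytes) -> str:
        """Place a block and report which region received it."""
        entry = self.merged.get(address)
        if entry is None:
            # First copy of this block: it becomes the merged line.
            self.merged[address] = (data, {instance})
            return "merged-new"
        shared_data, sharers = entry
        if shared_data == data:
            # Identical content: share the existing line, no extra storage.
            sharers.add(instance)
            return "merged-shared"
        # Dissimilar content: keep it out of the merged region.
        self.conventional[(instance, address)] = data
        return "conventional"

    def lines_used(self) -> int:
        """Total number of lines occupied across both regions."""
        return len(self.merged) + len(self.conventional)


cache = HybridMergeableCache()
placements = [
    cache.insert(0, 0x40, b"same"),
    cache.insert(1, 0x40, b"same"),
    cache.insert(2, 0x40, b"diff"),
]
```

In this model, three instances touching the same address occupy two lines rather than three: one shared merged line and one conventional line for the instance whose data differs.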
