Abstract

A heterogeneous chip multiprocessor (CMP) architecture consists of processor cores and caches of varying size and complexity. In a multi-programmed computing environment, threads of execution exhibit different run-time characteristics and hardware resource requirements, so heterogeneous multiprocessors can significantly outperform homogeneous multiprocessor systems. Designing and managing heterogeneity in a multiprocessor, however, raises issues that have a significant impact on overall system cost and performance: (a) replicating a standard core is an efficient strategy in homogeneous CMP design, whereas a heterogeneous CMP, particularly a fully custom heterogeneous processor not composed of pre-existing cores, incurs additional design, verification, and testing costs; (b) to take advantage of a heterogeneous architecture, an appropriate policy for mapping running tasks to processor cores must be determined so that the performance of the whole system is maximized by accurately exploiting its resources, i.e., sophisticated software is required to exploit the heterogeneity; and (c) processor speeds are improving much faster than memory speeds, so data access time dominates the execution time of many programs, and this gap widens in a multiprocessor environment: as the core count of a chip multiprocessor increases, both on-chip cache capacity and off-chip memory bandwidth become scarcer per core. In this paper, we propose a method of creating heterogeneity at run time by partitioning the cache and the memory bandwidth. This allows designers to reuse pre-existing standard cores and a basic scheduler that is unaware of heterogeneity, since the heterogeneity is created at run time by the cache and bandwidth partitions. We describe a method of creating system heterogeneity through coordinated partitioning of the shared last-level cache and the off-chip memory bandwidth. We propose an efficient, low-overhead approach that partitions the cache set-wise by separating the addressing part from the data part, combined with a graceful space acquirement policy; this approach re-partitions the cache quickly, with minimal overhead and at fine granularity. We also extend a CPI-based bandwidth partitioning model to handle the read/write access behavior of applications. Finally, we analyze and experimentally evaluate six cache partitioning schemes and conclude that partitioning based on available bandwidth and L2 access frequency outperforms the others.
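The set-wise partitioning idea can be illustrated with a minimal sketch. Assuming a shared L2 with 1024 sets and four cores (illustrative parameters only), the "addressing part" — a small table recording which band of sets each core owns — is kept separate from the data arrays, so a repartition only rewrites the table; the "graceful space acquirement" policy is modeled by letting a new owner claim a set lazily on its first access. All names and constants below are hypothetical and only approximate the hardware mechanism the abstract describes.

```c
#include <stdint.h>

#define NUM_SETS  1024   /* total shared-L2 sets (assumed) */
#define NUM_CORES 4      /* assumed core count */
#define LINE_BITS 6      /* 64-byte cache lines assumed */

/* Addressing part, kept separate from the data arrays: each core owns a
 * contiguous band of sets [base, base+count). Repartitioning rewrites
 * only this small table, so it is fast and fine-grained. */
struct band { unsigned base, count; };
static struct band part[NUM_CORES];

/* Per-set record of whose lines currently reside there, used for the
 * graceful space acquirement policy sketched below. */
static uint8_t resident[NUM_SETS];

void repartition(const unsigned quota[NUM_CORES]) /* quotas sum to NUM_SETS */
{
    unsigned base = 0;
    for (int c = 0; c < NUM_CORES; c++) {
        part[c].base  = base;
        part[c].count = quota[c];
        base += quota[c];
        /* Data is NOT moved or flushed here; sets change hands lazily. */
    }
}

/* Map a core's physical address into its own band of sets. */
unsigned map_set(int core, uint64_t paddr)
{
    unsigned set = part[core].base +
                   (unsigned)((paddr >> LINE_BITS) % part[core].count);
    if (resident[set] != (uint8_t)core) {
        /* Graceful acquirement: the new owner claims the set on first
         * touch; the previous resident's lines would be written back or
         * invalidated here (e.g., a hypothetical flush_set(set)). */
        resident[set] = (uint8_t)core;
    }
    return set;
}
```

Because repartitioning touches only the owner table, the data arrays need not be flushed eagerly, which is what keeps the re-partition overhead small in this sketch.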
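The bandwidth side can be sketched in a similar spirit. Assuming the CPI-driven model reduces to weighting each core's off-chip read and write traffic differently (read misses stall the pipeline and feed directly into CPI, while writes are typically buffered), a toy allocator might compute proportional bandwidth shares as follows. The weights, traffic figures, and function names are assumptions for illustration, not values from the paper.

```c
#include <stdio.h>

#define NUM_CORES 4

/* Toy bandwidth partitioner: each core's demand is its off-chip read
 * and write traffic per kilo-instruction, with reads weighted more
 * heavily because they stall the pipeline. Weights are assumed. */
void partition_bandwidth(const double reads_pki[NUM_CORES],
                         const double writes_pki[NUM_CORES],
                         double share[NUM_CORES])
{
    const double W_READ = 1.0, W_WRITE = 0.4;  /* assumed stall weights */
    double total = 0.0, demand[NUM_CORES];

    for (int c = 0; c < NUM_CORES; c++) {
        demand[c] = W_READ * reads_pki[c] + W_WRITE * writes_pki[c];
        total += demand[c];
    }
    for (int c = 0; c < NUM_CORES; c++)
        share[c] = (total > 0.0) ? demand[c] / total : 1.0 / NUM_CORES;
}

int main(void)
{
    double r[NUM_CORES] = {12.0, 3.0, 8.0, 1.0};  /* read misses / kI (made up) */
    double w[NUM_CORES] = {4.0, 1.0, 6.0, 0.5};   /* writebacks  / kI (made up) */
    double s[NUM_CORES];

    partition_bandwidth(r, w, s);
    for (int c = 0; c < NUM_CORES; c++)
        printf("core %d: %.2f of off-chip bandwidth\n", c, s[c]);
    return 0;
}
```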
