Abstract

With the wide availability of chip multi-processing (CMP), software developers now face the task of parallelizing their code effectively. Once they have identified the regions to parallelize, they need to know how coarse-grained the code must be for parallel execution to be profitable. The problem is compounded by the variety of hardware available. In this paper, we present a novel approach for fairly comparing hardware configurations by simulating them on a pair of quad-core processors. The simulated configurations represent shared-cache CMP, private-cache CMP, and symmetric multiprocessor (SMP) environments. We then present a modified lmbench micro-benchmark suite that measures the cost of threading on these hardware configurations. In our empirical studies, we observe that shared-cache CMP performs better when the operating system's load balancer is highly active. However, the measurements also indicate that thread size is an important consideration, because cache thrashing can occur when processing cores share a cache. Private-cache CMP and SMP show no significant difference in our measurements. The techniques presented can be incorporated into integrated development environments, compilers, and potentially other run-time environments.
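
To make "the cost of threading" concrete, the following is a minimal sketch of one such measurement: the average time to create, start, and join an empty thread, optionally with the process restricted to a chosen set of cores. It is not the modified lmbench suite described above. The function name `thread_cost_us`, its parameters, and the use of Linux CPU affinity to stand in for a hardware configuration are all assumptions made for illustration.

```python
import os
import threading
import time


def thread_cost_us(iterations=200, cpus=None):
    """Average create + start + join time of an empty thread, in microseconds.

    `cpus` is an optional set of CPU ids to pin to (hypothetical parameter).
    Pinning relies on os.sched_setaffinity, which exists only on some
    platforms (e.g. Linux); elsewhere the argument is ignored.
    """
    has_affinity = hasattr(os, "sched_setaffinity")
    previous = os.sched_getaffinity(0) if has_affinity else None
    if cpus is not None and has_affinity:
        # Threads created below inherit the affinity of the creating thread.
        os.sched_setaffinity(0, cpus)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            worker = threading.Thread(target=lambda: None)
            worker.start()
            worker.join()
        elapsed = time.perf_counter() - start
    finally:
        if previous is not None:
            # Restore the original affinity so later runs are unaffected.
            os.sched_setaffinity(0, previous)
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"unpinned: {thread_cost_us():.1f} us per thread")
```

On a machine whose topology is known, calling `thread_cost_us(cpus={0, 1})` with two cores that share a cache, and again with two cores that do not, gives a rough analogue of the shared-cache versus private-cache comparison. Which CPU ids share a cache is machine-specific, so the ids here are placeholders.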
