L2 Cache Modeling for Scientific Applications on Chip Multi-Processors

Fengguang Song, Jack Dongarra, Shirley Moore

doi:10.1109/icpp.2007.52

Abstract

It is critical to provide high performance for scientific applications running on chip multi-processors (CMP). A CMP architecture often comprises a shared L2 cache and lower-level storage. The shared L2 cache can reduce the number of cache misses if the data are accessed in common by several threads, but it can also degrade performance through resource contention. In some cases, running threads on all cores causes severe contention and greatly increases the number of cache misses. To investigate how the performance of a thread varies when it runs concurrently with other threads on the remaining cores, we develop an analytical model to predict the number of misses on the shared L2 cache. In particular, we apply the model to thread-parallel numerical programs. We assume that all the threads compute homogeneous tasks and share a fully associative L2 cache. We use circular sequence profiling and stack processing techniques to analyze the L2 cache trace and predict three quantities: the number of compulsory cache misses, the number of capacity cache misses on shared data, and the number of capacity cache misses on private data. Our method is able to predict the L2 cache performance for threads that have a global shared address space; in scientific applications, such threads often have overlapping memory footprints. We use a cycle-accurate simulator to validate the model with three scientific programs: dense matrix multiplication, blocked dense matrix multiplication, and sparse matrix-vector product. The average relative errors for the three experiments are 8.01%, 1.85%, and 2.41%, respectively.
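
As a rough illustration of the stack processing idea mentioned in the abstract, the sketch below classifies accesses in a single trace for a fully associative cache. It is not the authors' model: it assumes LRU replacement (the abstract does not name a policy), ignores the shared/private distinction and circular sequence profiling, and the function name `classify_misses` and the toy trace are hypothetical.

```python
def classify_misses(trace, capacity_lines):
    """Classify accesses for a fully associative cache, assuming LRU replacement.

    trace: iterable of cache-line identifiers, in access order.
    capacity_lines: number of lines the cache can hold.
    """
    stack = []  # LRU stack: most recently used line is at the end
    counts = {"hit": 0, "compulsory": 0, "capacity": 0}
    for line in trace:
        if line in stack:
            # Stack distance: position of the line counted from the top of the
            # stack (1 = most recently used). It equals the number of distinct
            # lines touched since the previous access to `line`, plus one.
            distance = len(stack) - stack.index(line)
            if distance <= capacity_lines:
                counts["hit"] += 1
            else:
                counts["capacity"] += 1  # line was evicted before its reuse
            stack.remove(line)
        else:
            counts["compulsory"] += 1  # first reference to this line
        stack.append(line)
    return counts


if __name__ == "__main__":
    toy_trace = [0, 1, 2, 0, 3, 1, 0]  # hypothetical cache-line ids
    for size in (2, 3, 4):
        print(size, classify_misses(toy_trace, size))
```

Because the stack distances do not depend on the cache size, one pass over the trace is enough to evaluate every capacity; the loop in the usage example recomputes them only to keep the sketch short.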

