Abstract

To improve processor performance, the industry has responded by increasing the number of cores on the die. One salient feature of multi-core architectures is that caches are shared to varying degrees at different levels. With the advent of these architectures, we face a problem that is new to parallel computing: the management of hierarchical caches. Data locality must be taken into account in order to reduce the variance in performance across data sizes. In this paper, we propose a programming approach for algorithms running on shared-memory multi-core systems that couples blocking, a well-known optimization technique, with the OpenMP parallel programming paradigm. We chose the problem sizes based on architectural parameters of the system such as cache level, cache size, and cache line size. We studied the cache optimization scheme on commonly used linear algebra applications: matrix multiplication (MM), Gauss elimination (GE), and LU decomposition (LUD).

Highlights

  • While microprocessor technology has delivered significant improvements in clock speed over the past decade, it has exposed a variety of other performance bottlenecks.

  • We present the parallelization of the matrix multiplication (MM), Gauss elimination (GE), and LU decomposition (LUD) algorithms on shared-memory systems using OpenMP.

  • For the GE and LUD problems, we used 1D partitioning of the matrix among the cores and used OpenMP to distribute the work among the threads executing on those cores.
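
The 1D partitioning mentioned in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gauss_eliminate`, the row-wise split, the static schedule, and the absence of pivoting are all assumptions made for illustration. At each elimination step, the rows below the pivot are divided among the OpenMP threads, and each thread updates only its own rows.

```c
#include <assert.h>

/*
 * Hypothetical sketch: solve A x = b for an n x n row-major matrix A.
 * A and b are overwritten. No pivoting is performed, so every pivot
 * A[k][k] is assumed to be nonzero.
 */
void gauss_eliminate(long n, double *A, double *b, double *x)
{
    assert(n > 0);

    /* Forward elimination: the pivot steps are sequential. */
    for (long k = 0; k < n - 1; ++k) {
        /* 1D (row-wise) partitioning: rows k+1..n-1 are split among
         * the threads. Each thread reads the pivot row k and writes
         * only the rows assigned to it, so there are no write conflicts. */
        #pragma omp parallel for schedule(static)
        for (long i = k + 1; i < n; ++i) {
            double factor = A[i * n + k] / A[k * n + k];
            for (long j = k; j < n; ++j)
                A[i * n + j] -= factor * A[k * n + j];
            b[i] -= factor * b[k];
        }
    }

    /* Back substitution, kept serial in this sketch. */
    for (long i = n - 1; i >= 0; --i) {
        double sum = b[i];
        for (long j = i + 1; j < n; ++j)
            sum -= A[i * n + j] * x[j];
        x[i] = sum / A[i * n + i];
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the routine runs serially with the same result. The implicit barrier at the end of each parallel loop ensures that step k is complete before step k+1 reads the next pivot row.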


Summary

INTRODUCTION

While microprocessor technology has delivered significant improvements in clock speed over the past decade, it has exposed a variety of other performance bottlenecks. To alleviate these bottlenecks, microprocessor designers have explored alternate routes to cost-effective performance gains. An important feature of these new architectures is the integration of a large number of simple cores with a software-managed cache hierarchy and local storage. Offering these new architectures as general-purpose computation platforms creates a number of problems, the most obvious one being programmability. It is essential that algorithms be designed to maximize data locality so as to best exploit the hierarchical cache structures.
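
As an illustration of designing for data locality, the following is a minimal sketch of blocked (tiled) matrix multiplication parallelized with OpenMP. It is not taken from the paper: the function name `blocked_matmul`, the loop order, and the static schedule are assumptions, and the block size `bs` is a hypothetical tuning parameter that would be chosen so that the tiles being worked on fit in the targeted cache level.

```c
#include <assert.h>

/* Hypothetical helper: the smaller of two indices. */
static long min_long(long a, long b) { return a < b ? a : b; }

/*
 * Computes C += A * B for n x n row-major matrices, working on
 * bs x bs tiles so that each tile is reused while it is in cache.
 * The caller must initialize C (for example, to zero).
 */
void blocked_matmul(long n, long bs,
                    const double *A, const double *B, double *C)
{
    assert(n > 0 && bs > 0);

    /* Each thread owns whole block-rows of C, so no two threads
     * ever write the same element. */
    #pragma omp parallel for schedule(static)
    for (long ii = 0; ii < n; ii += bs) {
        for (long kk = 0; kk < n; kk += bs) {
            for (long jj = 0; jj < n; jj += bs) {
                /* Clamp the tile edges when bs does not divide n. */
                long i_end = min_long(ii + bs, n);
                long k_end = min_long(kk + bs, n);
                long j_end = min_long(jj + bs, n);

                for (long i = ii; i < i_end; ++i) {
                    for (long k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        /* Unit-stride inner loop over rows of B and C,
                         * which uses every element of a fetched cache line. */
                        for (long j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

One common rule of thumb, stated here as an assumption rather than as the paper's method, is to pick `bs` so that three `bs` x `bs` tiles of doubles (about `3 * bs * bs * 8` bytes) fit in the cache level being targeted. Compiled without OpenMP support, the pragma is ignored and the code runs serially.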

COMPUTING PROBLEM
RELATED WORK
IMPLEMENTATION
Architecture Aware Parallelization
Determining Block Size
Effect of Cache Line Size
LU Decomposition
EXPERIMENTAL SETUP & RESULTS
PERFORMANCE ANALYSIS
CONCLUSION & FUTURE WORK