Abstract

Current multi-core processors implement sophisticated hardware prefetchers that can be configured per application (by PID) to improve system performance. When multiple applications run together, each can present different prefetch requirements, so different configurations can be used. Setting the optimal prefetch configuration for each application is a complex task since it depends not only on the application's characteristics but also on the interference at the shared memory resources (e.g., memory bandwidth). In this paper, we propose DeepP, a deep learning approach for the IBM POWER8 that identifies at run-time the best prefetch configuration for each application in a workload. To this end, a neural network predicts the performance of each application under the studied prefetch configurations using a set of performance events. The prediction accuracy of the network is improved by a dynamic training methodology that learns the impact of dynamic changes of the prefetch configuration on performance. At run-time, the devised network infers the best prefetch configuration for each application and adjusts it dynamically. Experimental results show that the proposed approach improves performance, on average, by 5.8%, 6.7%, and 15.8% over the default prefetch configuration across 6-, 8-, and 10-application workloads, respectively.

Highlights

  • Hardware data prefetching is a speculative technique that fetches data into the processor before it is requested

  • We demonstrate that a neural network can be trained to predict, with high accuracy and minimal overhead, the performance (IPC) of co-running applications as a function of the prefetch configuration, regardless of the number of applications, which allows performance to scale with the number of applications

  • We propose a multi-program aware neural network for the IBM POWER8 that establishes a correlation between inter-application interference and performance by taking as input the aggregate memory bandwidth consumption of co-runners
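The selection step described in the highlights can be sketched as follows. This is an illustrative sketch, not the paper's actual model: the event names and the toy IPC predictor below are assumptions standing in for the trained neural network.

```python
# Sketch: for an application, pick the prefetch configuration with the
# highest predicted IPC. The real system would replace toy_predict_ipc
# with the trained network's per-configuration prediction.
def best_prefetch_config(events, configs, predict_ipc):
    """Return the configuration with the highest predicted IPC."""
    return max(configs, key=lambda cfg: predict_ipc(events, cfg))

def toy_predict_ipc(events, cfg):
    # Toy assumption: aggressive prefetching helps, but its benefit is
    # eroded when the co-runners' aggregate memory bandwidth is high.
    penalty = events["mem_bandwidth_gbs"] * cfg["aggressiveness"] * 0.01
    return events["base_ipc"] + cfg["aggressiveness"] * 0.1 - penalty

events = {"base_ipc": 1.2, "mem_bandwidth_gbs": 25.0}
configs = [{"aggressiveness": a} for a in range(8)]
best = best_prefetch_config(events, configs, toy_predict_ipc)
```

Under this toy model, high bandwidth contention makes the least aggressive configuration the best choice, mirroring the paper's observation that the optimal setting depends on interference at shared memory resources.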



Introduction

Hardware data prefetching is a speculative technique that fetches data into the processor before it is requested. Current high-performance processors are deployed with a set of configurable prefetchers aimed at capturing different memory behaviors. The IBM POWER family of processors implements some of the most complex and powerful prefetchers deployed in current servers. These processors allow the user to update the prefetch setting at run-time, which is especially challenging in the case of multicores, where main memory bandwidth contention can become a severe performance bottleneck.

In a neural network, each neuron computes a weighted sum of its inputs, and the result of the sum is fed to an activation function. The purpose of this function is twofold: it normalizes the sum into a bounded range, commonly between 0 and 1 or -1 and 1, and it introduces non-linearity into the model.
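The weighted sum and activation step can be sketched as a minimal example. The input values and weights below are arbitrary; the sigmoid and tanh functions are the standard choices for the (0, 1) and (-1, 1) ranges mentioned above.

```python
import math

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the neuron's inputs, plus a bias term.
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The activation function bounds the sum and adds non-linearity.
    return activation(s)

def sigmoid(x):
    # Squashes any real value into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-x))

# math.tanh squashes any real value into the (-1, 1) range.
out01 = neuron([0.5, -1.2, 3.0], [0.4, 0.1, -0.2], 0.05, sigmoid)
out11 = neuron([0.5, -1.2, 3.0], [0.4, 0.1, -0.2], 0.05, math.tanh)
```

Whatever the magnitude of the weighted sum, the outputs stay inside their respective ranges, which keeps the values flowing between layers well scaled.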

