Prefetching offers the potential to significantly improve performance by speculatively loading application data so that it is available before it is needed. By their very nature, prefetching techniques depend on application behavior. This implies that no universal prefetching solution exists: a combination of prefetching strategies needs to be used to target a diverse set of applications. In this work, we develop the first comprehensive mathematical framework that allows a designer to better understand the prefetching opportunities of an application. We use dynamic analysis to study the memory access behavior of an application and measure a series of metrics to both identify the optimized prefetch schedule and estimate its achievable performance. To validate our model, we implement and evaluate three different prefetching strategies: helper threads, software prefetching, and FPGA prefetching. We show that, for each individual scenario, our framework correctly generates the optimized schedule of prefetches and predicts the performance improvement with an accuracy of more than 95%. Using our framework, developers can choose the best prefetching strategy and parameters for their specific workload and use case.