Abstract

Recent trends in computer architecture have increased the role of dedicated hardware logic as an effective approach to computation. Virtualization of logic computations (i.e., sharing a fixed function) provides a means to effectively utilize hardware resources by context switching the logic to support multiple data streams of computation. Multiple applications or users can take advantage of this by using the virtualized computation in an accelerator as a computational service, such as in a software as a service (SaaS) model over a network. In this paper, we analyze the performance of virtualized hardware logic and develop M/G/1 queueing model equations and simulation models to predict system performance. We use the queueing model to predict system performance and to tune a schedule for optimal performance, observing that high service-time variance and high load both drive up mean latency. The simulation models validate the queueing model, predict queue occupancy, show that a Poisson input process distribution (assumed in the queueing model) is reasonable at low load, and expand the set of scheduling algorithms considered.
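The abstract's claim that high variance and high load give high mean latency follows directly from the standard Pollaczek–Khinchine result for M/G/1 queues. The sketch below is illustrative only (the function name and parameter values are ours, not from the paper): it computes the mean time in system from the arrival rate and the first two moments of the service-time distribution.

```python
# Mean sojourn time of an M/G/1 queue via the Pollaczek-Khinchine formula.
# Illustrative sketch; parameter values are assumptions, not from the paper.

def mg1_mean_latency(lam: float, mean_s: float, var_s: float) -> float:
    """Return mean time in system (queueing + service) for an M/G/1 queue.

    lam    -- Poisson arrival rate
    mean_s -- mean service time E[S]
    var_s  -- service-time variance Var[S]
    """
    rho = lam * mean_s  # server utilization; must be < 1 for stability
    if rho >= 1.0:
        raise ValueError("unstable: utilization must be below 1")
    es2 = var_s + mean_s ** 2  # second moment E[S^2]
    wq = lam * es2 / (2.0 * (1.0 - rho))  # mean wait in queue (P-K formula)
    return wq + mean_s

# At the same load, higher service-time variance raises mean latency:
low_var = mg1_mean_latency(lam=0.8, mean_s=1.0, var_s=0.1)
high_var = mg1_mean_latency(lam=0.8, mean_s=1.0, var_s=4.0)
assert high_var > low_var
```

With exponential service (variance equal to the squared mean) the formula reduces to the familiar M/M/1 latency 1/(μ − λ), which is a quick sanity check on the implementation.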

Highlights

  • The need for increasing computation, combined with the slowing of Moore’s Law [1] and the demise of Dennard scaling [2,3], has initiated a strong interest in architecturally diverse systems, including graphics processing units (GPUs), neural processing units (NPUs), tensor processing units (TPUs), field-programmable gate arrays (FPGAs), and other custom logic.

  • We evaluate the use of virtualized logic computations with our analytical and simulation performance models

  • We develop application- and technology-independent M/G/1 queueing model equations that can be used to predict the performance of virtualized logic computations for fine- and coarse-grain contexts.


Introduction

The need for increasing computation, combined with the slowing of Moore’s Law [1] and the demise of Dennard scaling [2,3], has initiated a strong interest in architecturally diverse systems, including graphics processing units (GPUs), neural processing units (NPUs), tensor processing units (TPUs), field-programmable gate arrays (FPGAs), and other custom logic. These systems, typically constructed as a combination of traditional processor cores coupled with one or more accelerators, are ubiquitous in the embedded computing domain (e.g., smartphones) and are becoming more and more common across the board (e.g., seven of the top ten systems on the Top500 list, the world’s fastest supercomputers, contain accelerators [4]). One established technique virtualizes a hardware design, described by a netlist that would otherwise be too large to physically fit onto an FPGA, by partitioning it into parts that are run sequentially to perform the complete computation, with logic reconfiguration done between parts [11].
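The partitioned-netlist technique described above can be sketched as a simple cost model: each partition runs to completion on the shared logic, and a reconfiguration penalty is paid between consecutive partitions. All names and cycle counts here are hypothetical, chosen only to make the sequential-execution structure concrete.

```python
# Hypothetical sketch: executing netlist partitions sequentially on one
# physical logic region, with a reconfiguration cost between parts.
# RECONFIG_COST and the partition cycle counts are assumed values.

RECONFIG_COST = 5  # assumed cycles to load the next partition's configuration


def run_partitioned(partitions: list[str], cycles: dict[str, int]) -> int:
    """Return total cycles to run all partitions in order,
    paying RECONFIG_COST between consecutive partitions."""
    total = 0
    for i, part in enumerate(partitions):
        if i > 0:
            total += RECONFIG_COST  # swap in the next partition's logic
        total += cycles[part]       # run this partition to completion
    return total


# Three partitions of a design too large to fit on the FPGA at once:
total = run_partitioned(["p0", "p1", "p2"], {"p0": 10, "p1": 7, "p2": 12})
assert total == 10 + 7 + 12 + 2 * RECONFIG_COST
```

The same structure suggests why context-switch (reconfiguration) overhead matters for the scheduling results in the paper: the penalty is paid once per switch, so fine-grain context switching amplifies it.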
