Graphics Processing Units (GPUs) are commonly integrated with computing devices to enhance the performance and capabilities of graphical workloads. They are also increasingly deployed in data centers and clouds to accelerate data-intensive workloads. In a number of scenarios, the GPU can be shared among multiple applications at a fine granularity, allowing a spy application to monitor side channels and attempt to infer the behavior of the victim. For example, OpenGL and WebGL send workloads to the GPU at the granularity of a frame, so an attacker can interleave its own use of the GPU with the victim's and measure the side effects of the victim's computation through performance counters or other resource-tracking APIs. We demonstrate the vulnerability by implementing three end-to-end attacks. We show that an OpenGL- or CUDA-based spy can fingerprint websites accurately (attack I), and can track user activities within a website and even infer the keystroke timings for a password text box with high accuracy (attack II). The third attack demonstrates how a CUDA spy application can derive the internal parameters of a neural network model being used by another CUDA application on the cloud. To counter these attacks, we suggest mitigations based on limiting the rate of calls to these counters and APIs, or limiting the granularity of the information they return.
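To make the two suggested mitigations concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `read_raw` function standing in for a performance-counter or resource-tracking query; the class name `CoarsenedCounter` and its parameters are illustrative and do not correspond to any real driver API.

```python
import time
from typing import Callable, Optional


class CoarsenedCounter:
    """Hypothetical wrapper that rate-limits and quantizes counter reads."""

    def __init__(
        self,
        read_raw: Callable[[], int],   # assumed stand-in for a real counter query
        min_interval_s: float,         # minimum time between fresh reads
        granularity: int,              # returned values are multiples of this
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if granularity <= 0:
            raise ValueError("granularity must be positive")
        self._read_raw = read_raw
        self._min_interval_s = min_interval_s
        self._granularity = granularity
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_value = 0

    def read(self) -> int:
        now = self._clock()
        # Rate limiting: within the minimum interval, return the cached
        # value instead of sampling the underlying counter again.
        if (
            self._last_time is not None
            and now - self._last_time < self._min_interval_s
        ):
            return self._last_value
        raw = self._read_raw()
        # Granularity limiting: round down to a multiple of `granularity`
        # so that small differences between workloads are hidden.
        self._last_value = (raw // self._granularity) * self._granularity
        self._last_time = now
        return self._last_value
```

A spy polling `read()` in a tight loop would see the same coarse value until the interval elapses, which reduces both the sampling rate and the resolution available for fingerprinting. Suitable values for `min_interval_s` and `granularity` are not given in the text above and would need to be chosen empirically.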