Abstract

This paper introduces a special case of the floorplanning problem that arises in optimizing neural networks to run on a wafer-scale computing engine. From a compute perspective, a neural network can be represented as a deeply layered structure of compute kernels. During training, gradient descent is used to determine the network's weights. Each layer then uses a local weight tensor to transform activations and gradients, which are shared among connected kernels according to the topology of the network. This process is computationally intensive and requires high memory and communication bandwidth. Cerebras has developed a novel computer system designed for this workload, powered by a 21.5 cm by 21.5 cm wafer-scale processor with 400,000 programmable compute cores. The processor is structured as a regular 633-by-633 array of processing elements, each with its own high-bandwidth local SRAM and direct high-bandwidth connections to its neighboring cores. In addition to supporting traditional execution models for neural network training and inference, this engine has the unique capability to compile and compute every layer of a complete neural network simultaneously. Mapping a neural network onto Cerebras' Wafer-Scale Engine (WSE) in this fashion is reminiscent of the traditional floorplanning problem in physical design: each kernel ends up as a rectangle of x by y compute elements, and these flexible blocks must be placed to optimize performance. This paper describes an ISPD 2020 challenge to develop algorithms and heuristics that produce compiled neural networks achieving the highest possible performance on the Cerebras WSE.
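The placement problem sketched above can be stated compactly: each kernel is a flexible block offering several candidate (width, height) shapes, and a legal solution packs one shape per kernel into the 633-by-633 fabric while minimizing the runtime of the slowest kernel (since all layers execute simultaneously) and, in a full formulation, the routing distance between connected kernels. The following minimal Python sketch illustrates that formulation with a simple shelf-packing heuristic; the Kernel class, cost model, and greedy_place routine are illustrative assumptions for exposition, not the contest's actual interface or the authors' algorithm.

from dataclasses import dataclass

# Assumed fabric size, matching the 633-by-633 array of processing elements.
FABRIC_W, FABRIC_H = 633, 633

@dataclass
class Kernel:
    name: str
    work: float    # total compute load, arbitrary units (assumed model)
    shapes: list   # candidate (w, h) rectangles this kernel can take

    def runtime(self, w, h):
        # Assumed model: the kernel's work parallelizes over its w*h cores.
        return self.work / (w * h)

def greedy_place(kernels):
    """Shelf-style greedy placement: for each kernel, pick the fastest
    candidate shape that still fits, packing left-to-right in rows.
    Returns {name: (x, y, w, h)} or raises if the fabric overflows."""
    placements = {}
    x = y = row_h = 0
    for k in kernels:
        for w, h in sorted(k.shapes, key=lambda s: k.runtime(*s)):
            nx, ny, nrow = x, y, row_h
            if nx + w > FABRIC_W:          # start a new shelf
                nx, ny, nrow = 0, y + row_h, 0
            if nx + w <= FABRIC_W and ny + h <= FABRIC_H:
                placements[k.name] = (nx, ny, w, h)
                x, y, row_h = nx + w, ny, max(nrow, h)
                break
        else:
            raise ValueError(f"kernel {k.name} does not fit on the fabric")
    return placements

def cost(kernels, placements):
    # All layers run simultaneously, so the slowest kernel bounds
    # throughput; a real objective would also penalize the routing
    # distance between connected kernels.
    return max(k.runtime(placements[k.name][2], placements[k.name][3])
               for k in kernels)

# Hypothetical three-layer network to exercise the sketch.
layers = [Kernel("conv1", 4e6, [(64, 64), (128, 32)]),
          Kernel("conv2", 8e6, [(128, 128), (256, 64)]),
          Kernel("fc",    1e6, [(32, 32), (64, 16)])]
p = greedy_place(layers)
print(p, cost(layers, p))

Even this toy version exposes the central tension the contest explores: giving a slow kernel a larger rectangle lowers its runtime but consumes fabric area and perturbs the placement of every other block.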
