Abstract

Despite their unprecedented prevalence, foundation models' exponentially growing training costs, dataset sizes, and model capacities hinder the democratization of modern AI technology and demand novel system design solutions. In this paper, we review state-of-the-art (SOTA) challenges and methodologies in scaling AI system-on-chip (SoC) design to harness the power of foundation models. We organize our discussion into four parts. First, we discuss AI SoC architecture design to enable high-performance training of foundation models. Second, we discuss challenges in managing foundation model training with dataflow accelerators. We show that dataflow accelerators, a promising class of architectures that remove execution bottlenecks by overlapping computation and data fetching, pose new challenges for hardware resource mapping and allocation. Third, we discuss challenges in exploiting parallelism across multiple dimensions, e.g., tensor, model, and data parallelism. Partitioning models along the tensor and model dimensions enables large-model training at the cost of distributed, orchestrated gradient synchronization. Last, we discuss electrical and energy design trade-offs for implementing the massive computation and memory units that capture computation and data locality on a dataflow accelerator. The solution to all four aspects lies at the intersection of system-aware machine learning algorithms, dataflow-driven software systems, and scalable hardware design.
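
To make the gradient-synchronization cost mentioned above concrete, the following is a minimal sketch (not taken from the paper) of data-parallel training in JAX: each device computes gradients on its local shard of the batch, and an all-reduce mean performs the distributed synchronization before every replica applies the same update. The loss function, shapes, and learning rate are illustrative assumptions only.

```python
# Sketch of data-parallel training with explicit gradient synchronization.
# Assumptions: a toy linear model, synthetic data, plain SGD; none of these
# come from the paper itself.
import functools
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Toy linear model standing in for one replica of a much larger network.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)


@functools.partial(jax.pmap, axis_name="data")
def train_step(params, x, y, lr):
    # Each device differentiates the loss on its local shard of the batch.
    grads = jax.grad(loss_fn)(params, x, y)
    # Gradient synchronization: all-reduce (mean) across the 'data' axis.
    grads = jax.lax.pmean(grads, axis_name="data")
    # Every replica applies the same synchronized gradient.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)


n_dev = jax.local_device_count()
key = jax.random.PRNGKey(0)
feat = 8

# Replicate parameters across devices; shard the batch along a leading device axis.
params = {"w": jnp.zeros((feat, 1)), "b": jnp.zeros((1,))}
params = jax.tree_util.tree_map(lambda p: jnp.stack([p] * n_dev), params)
x = jax.random.normal(key, (n_dev, 16, feat))
y = jax.random.normal(key, (n_dev, 16, 1))
lr = jnp.full((n_dev,), 1e-2)

params = train_step(params, x, y, lr)
```

Tensor- and model-parallel mappings add further collectives (e.g., all-gathers of activations across shards) on top of this pattern, which is where the orchestration cost discussed in the abstract arises.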
