Abstract
In large language models (LLMs), full-parameter fine-tuning is crucial for task-specific adaptation. Traditionally, this relies on deep learning training frameworks built around the back-propagation scheme. However, this scheme presents inherent issues, e.g., activation memory bottlenecks and backward locking, which limit efficient use of computational resources. In this work, we present the design and analysis of ZeROf-Offload, a novel fine-tuning framework that adapts the forward-gradient scheme. The framework adopts a forward-gradient-oriented CPU offload strategy, enabling fine-tuning of billion-scale LLMs solely in the forward phase and improving computational efficiency. Empirical evaluations reveal the advantage of eliminating the backward phase in fine-tuning. ZeROf-Offload achieves 134 TFlops/GPU for models with over 130 billion parameters on a single DGX-A100 node, outperforming DeepSpeed's ZeRO-Offload, which achieves 102 TFlops/GPU for models with up to 53.7 billion parameters, the largest size manageable within GPU memory limitations. Furthermore, we have extended ZeROf-Offload to multi-DGX-A100 environments with integrated 3D parallelism, achieving near-linear speedup across up to 128 GPUs and improving token throughput by 1.4x and 1.5x, respectively. The experimental results demonstrate that ZeROf-Offload achieves the highest throughput among all examined state-of-the-art frameworks.
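The forward-gradient scheme mentioned above replaces back-propagation with forward-mode directional derivatives: a random tangent vector v is sampled, a single forward pass yields the directional derivative ∇f(w)·v (a Jacobian-vector product), and (∇f(w)·v)·v serves as an unbiased estimate of the gradient. A minimal sketch, using a toy quadratic objective with a hand-written JVP purely for illustration (a real framework would obtain the JVP from forward-mode automatic differentiation; all names here are assumptions, not the paper's API):

```python
import numpy as np

def forward_gradient_step(w, jvp_fn, lr=0.01, rng=None):
    """One forward-gradient update.

    Samples a random tangent v, obtains the directional derivative
    grad(f)(w) . v from a single forward pass (no backward phase),
    and uses (grad(f)(w) . v) * v as an unbiased gradient estimate.
    """
    rng = rng if rng is not None else np.random.default_rng()
    v = rng.standard_normal(w.shape)   # random tangent direction
    dfv = jvp_fn(w, v)                 # scalar directional derivative
    return w - lr * dfv * v            # descend along the estimate

# Toy objective f(w) = ||w||^2, whose JVP is 2 w . v (hypothetical
# stand-in for a model's forward-mode pass).
jvp = lambda w, v: 2.0 * w @ v

rng = np.random.default_rng(0)
w = rng.standard_normal(8)
for _ in range(2000):
    w = forward_gradient_step(w, jvp, lr=0.01, rng=rng)
# w drifts toward the minimizer at the origin
```

Because no backward pass is needed, activations do not have to be retained for a later gradient computation, which is what makes the abstract's forward-phase-only CPU offload strategy possible.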