| No. | Title | Authors | Journal |
| --- | --- | --- | --- |
| 184 | LoRA: Low-Rank Adaptation of Large Language Models | Edward Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen (Microsoft Corporation) | arXiv |
Abstract

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
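The abstract describes LoRA's core mechanism: the pre-trained weight is frozen and the task-specific update is learned as a product of two low-rank matrices, so the adapted layer computes h = W0 x + (alpha/r) B A x with rank r much smaller than the weight dimensions. Below is a minimal PyTorch sketch of that idea, not the released loralib package; the class name LoRALinear, the hyperparameters r and alpha, and the example dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank update: h = W0 x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weight (and bias)
        # A starts with small random values and B with zeros, so the update B @ A
        # is zero at initialization and training starts from the pre-trained behavior.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank path; only lora_A and lora_B get gradients.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


# Usage sketch: wrap a single attention projection (names and sizes are hypothetical).
base_q_proj = nn.Linear(768, 768)
q_proj = LoRALinear(base_q_proj, r=8, alpha=16.0)
out = q_proj(torch.randn(2, 10, 768))  # (batch, seq_len, hidden)
print(out.shape)  # torch.Size([2, 10, 768])
```

Because only the two small matrices are trainable, the number of updated parameters per layer drops from d*k to r*(d+k), which is the source of the 10,000x reduction quoted for GPT-3 175B; at inference time the update can be merged into the frozen weight, so no extra latency is added.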
Date: 2025.08.01 (FRI) 14:00
Presenter: Jungwoo Lee (CSB Lab, M.S. student)
Jungwoo Lee led the journal club session on the topic of "the LoRA technique used in LLMs" on Friday, August 1 at 2:00 PM.