Smaller and/or quantized Large Language Models can be fine-tuned on a single GPU. For example, for FLAN-T5-XL (3B) an NVIDIA V100 GPU with 16 GB is sufficient. This post demonstrates a simple example which results in a custom PEFT model.
PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting large pretrained models to various downstream applications without fine-tuning all of a model’s parameters. LoRA (Low-Rank Adaptation) is one of these PEFT techniques and is the one leveraged below.
The example is from the article Efficient Large Language Model training with LoRA and Hugging Face. The author describes the benefit of LoRA compared to full fine-tuning:
"The training took ~10:36:00 and cost ~13.22$ for 10h of training. For comparison a full fine-tuning on FLAN-T5-XXL with the same duration (10h) requires 8x A100 40GBs and costs ~322$."
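The efficiency comes from how LoRA works: the pretrained weight matrices stay frozen, and only two small low-rank matrices A and B are trained, so the effective weight becomes W + (alpha / r) * B @ A. The following self-contained sketch uses made-up dimensions (it is an illustration of the idea, not the PEFT internals) to show how few parameters this adds.
# Minimal illustration of the LoRA idea (not the PEFT implementation)
import torch

d, k, r = 1024, 1024, 16          # weight matrix shape and LoRA rank
alpha = 32                        # scaling factor (lora_alpha)

W = torch.randn(d, k)             # frozen pretrained weight
A = torch.randn(r, k) * 0.01      # trainable low-rank factor
B = torch.zeros(d, r)             # trainable low-rank factor, initialized to zero

# Effective weight used in the forward pass
W_eff = W + (alpha / r) * (B @ A)

# Only A and B are trained: 2 * 16 * 1024 = 32768 parameters instead of 1024 * 1024
print((A.numel() + B.numel()) / W.numel())   # ~0.03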
Read the following posts and articles for more context.
- Preparing LLM LoRA Fine-Tuning locally
- Evaluating LoRA Fine-Tuning Result
- Deploying a Virtual Server with GPU in the IBM Cloud
- Training Models locally via Containers
- Efficient Large Language Model training with LoRA and Hugging Face
- Hugging Face LoRA Documentation
Dependencies
On your server, install Python and PyTorch.
apt update
apt upgrade
apt install python3
apt install python3-pip
pip3 install torch torchvision torchaudio
apt install nano
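Optionally, verify that PyTorch was installed correctly and detects the GPU with a short Python check like the following.
# Quick check that PyTorch sees the GPU
import torch

print(torch.__version__)
print(torch.cuda.is_available())            # should print True on the GPU server
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # e.g. the V100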
Next, install the necessary Hugging Face libraries. Note that I had to use the exact versions below. The latest versions did not work for me.
pip install "accelerate==0.17.1"
pip install "peft==0.2.0"
pip install "transformers==4.27.2" "datasets" "evaluate==0.4.0" "bitsandbytes==0.41.2.post2" loralib
pip install rouge-score tensorboard py7zr scipy openpyxl
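To confirm that the pinned versions are actually in place, a quick check like the following can help. It assumes all of these packages expose a __version__ attribute, which their current releases do.
# Print the installed versions of the pinned libraries
import accelerate, peft, transformers, datasets, evaluate, bitsandbytes

for lib in (accelerate, peft, transformers, datasets, evaluate, bitsandbytes):
    print(lib.__name__, lib.__version__)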
Example
The following code uses the small FLAN-T5 model to test whether the code works. For the real fine-tuning, bigger models can be defined.
# Model to fine-tune: the small model is used for testing,
# the larger models can be enabled for the real fine-tuning
model_id="google/flan-t5-small"
#model_id="google/flan-t5-xl"
#model_id="google/flan-t5-xxl"

# Output directory for the PEFT adapter, training data location and log directory
peft_model_id="results-ft-model"
data_train="data/train"
logs_output_dir="logs"

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments
from datasets import load_from_disk
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

# Load the tokenizer and the preprocessed training dataset
tokenizer = AutoTokenizer.from_pretrained(model_id)
ds = load_from_disk(data_train)

# Load the base model in 8-bit to fit into GPU memory
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto")

# LoRA configuration: adapt the query and value projection matrices
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)

# Prepare the 8-bit model for training and wrap it with the LoRA adapters
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Pad dynamically per batch and ignore padded label tokens in the loss
label_pad_token_id = -100
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=label_pad_token_id,
    pad_to_multiple_of=8
)

training_args = Seq2SeqTrainingArguments(
    output_dir=logs_output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,
    num_train_epochs=5,
    logging_dir=f"{logs_output_dir}/logs",
    logging_strategy="steps",
    logging_steps=100,
    save_strategy="no",
    report_to="tensorboard",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=ds,
)

# Disable the KV cache during training; it is only useful for inference
model.config.use_cache = False

trainer.train()

# Save the LoRA adapter weights and the tokenizer
trainer.model.save_pretrained(peft_model_id)
tokenizer.save_pretrained(peft_model_id)
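The training data under data/train is a Hugging Face dataset that was tokenized and saved with save_to_disk; its preparation is covered in the post Preparing LLM LoRA Fine-Tuning locally. As a rough, hypothetical illustration of the expected format (tokenized input_ids plus labels), a tiny dataset could be created like this; the column names and lengths are placeholders.
# Hypothetical example of creating a tiny dataset in the expected format
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

raw = Dataset.from_dict({
    "prompt": ["Summarize: The quick brown fox jumps over the lazy dog."],
    "response": ["A fox jumps over a dog."],
})

def tokenize(example):
    model_inputs = tokenizer(example["prompt"], max_length=512, truncation=True)
    labels = tokenizer(example["response"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(tokenize, remove_columns=["prompt", "response"])
tokenized.save_to_disk("data/train")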
Large Language Models come with built-in loss functions. For many scenarios this works well enough and you don’t need custom loss functions.
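If a custom loss is ever needed, one option is to subclass the trainer and override compute_loss. A minimal sketch, assuming the default model outputs are the starting point:
# Hypothetical sketch: only needed when the built-in loss is not sufficient
from transformers import Seq2SeqTrainer

class CustomLossTrainer(Seq2SeqTrainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        loss = outputs.loss              # default cross-entropy computed by the model
        # ...adjust or replace the loss here...
        return (loss, outputs) if return_outputs else loss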
Output
The script produces the following output:
$ python3 lora-tune.py
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.32s/it]
trainable params: 9437184 || all params: 2859194368 || trainable%: 0.33006444422319176
...
{'loss': 0.1128, 'learning_rate': 0.0005454545454545455, 'epoch': 2.27}
{'loss': 0.0735, 'learning_rate': 9.090909090909092e-05, 'epoch': 4.55}
{'train_runtime': 1773.3362, 'train_samples_per_second': 0.973, 'train_steps_per_second': 0.124, 'train_loss': 0.09138318679549477, 'epoch': 5.0}
100%|████████████████████████████████████████████████████████████████████████████| 220/220 [29:33<00:00, 8.06s/it]
TensorBoard logs are stored on disk:
tree logs/
logs/
├── 1701679868.607323
│ └── events.out.tfevents.1701679868.gpu-server.116313.1
└── events.out.tfevents.1701679868.gpu-server.116313.0
The custom PEFT/LoRA model is also saved to disk:
results# ls -la
total 5112
drwxr-xr-x 2 root root 4096 Nov 28 11:56 .
drwxr-xr-x 6 root root 4096 Dec 4 09:20 ..
-rw-r--r-- 1 root root 355 Nov 28 11:56 adapter_config.json
-rw-r--r-- 1 root root 2787918 Nov 28 11:56 adapter_model.bin
-rw-r--r-- 1 root root 2201 Nov 28 11:56 special_tokens_map.json
-rw-r--r-- 1 root root 2422164 Nov 28 11:56 tokenizer.json
-rw-r--r-- 1 root root 2497 Nov 28 11:56 tokenizer_config.json
cat adapter_config.json
{
"base_model_name_or_path": "google/flan-t5-xl",
"bias": "none",
"enable_lora": null,
"fan_in_fan_out": false,
"inference_mode": true,
"lora_alpha": 32,
"lora_dropout": 0.05,
"merge_weights": false,
"modules_to_save": null,
"peft_type": "LORA",
"r": 16,
"target_modules": [
"q",
"v"
],
"task_type": "SEQ_2_SEQ_LM"
}
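To use the fine-tuned adapter, the base model referenced in adapter_config.json is loaded first and the LoRA weights are applied on top of it. A minimal sketch (the prompt is only a placeholder):
# Load the saved LoRA adapter on top of the base model for inference
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

peft_model_id = "results-ft-model"
config = PeftConfig.from_pretrained(peft_model_id)

base_model = AutoModelForSeq2SeqLM.from_pretrained(
    config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
model.eval()

input_ids = tokenizer("Summarize: ...", return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    output_ids = model.generate(input_ids=input_ids, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))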
Next Steps
To learn more, check out the Watsonx.ai documentation and the Watsonx.ai landing page.