Draft
Changes from all commits (88 commits)
352008b
Start agent traces
aymeric-roucher Feb 24, 2025
6c231d2
Working local version with o1
aymeric-roucher Feb 25, 2025
69b2651
Update api addr
aymeric-roucher Feb 26, 2025
ad948c2
Increase concurrent requests
aymeric-roucher Feb 26, 2025
a00f0ee
Update sbatch params
aymeric-roucher Feb 26, 2025
143fcfa
Add conda activation
aymeric-roucher Feb 26, 2025
0af9e75
Use local model
aymeric-roucher Feb 26, 2025
6cffffe
128 concurrent
aymeric-roucher Feb 26, 2025
cf13c2b
Log
aymeric-roucher Feb 26, 2025
cffa362
Add conda init
aymeric-roucher Feb 26, 2025
e35800c
Fix slurm script
aymeric-roucher Feb 26, 2025
b47a4be
Add await
aymeric-roucher Feb 26, 2025
0cd0999
Try fixing async func
aymeric-roucher Feb 26, 2025
dd15ad8
Add stop sequences
aymeric-roucher Feb 26, 2025
d2588cd
Add port
aymeric-roucher Feb 27, 2025
b738e58
Make synchronous
aymeric-roucher Feb 28, 2025
f78b865
Small adapts to script
aymeric-roucher Feb 28, 2025
cb2a2c2
More detailed error logging
aymeric-roucher Feb 28, 2025
9a2d16f
Even more detailed request error logging
aymeric-roucher Feb 28, 2025
2a1ff76
Reduce context length
aymeric-roucher Feb 28, 2025
a97eb27
Add token counting
aymeric-roucher Feb 28, 2025
d8cb19b
Fix message roles and add token counting
aymeric-roucher Feb 28, 2025
e42b1cd
Add dummy completion
aymeric-roucher Feb 28, 2025
83a679f
Test
aymeric-roucher Feb 28, 2025
d87e3f3
Running with gpt-4o
aymeric-roucher Feb 28, 2025
8e70ca4
Update timeouts
aymeric-roucher Feb 28, 2025
2876d52
Adjust
aymeric-roucher Feb 28, 2025
cf52433
Flatten messages
aymeric-roucher Feb 28, 2025
a07cd54
Prompt more around testing the function
aymeric-roucher Feb 28, 2025
ddc1cdd
Improve explanations in prompt
aymeric-roucher Feb 28, 2025
4c2fce6
Also store final outputs
aymeric-roucher Mar 13, 2025
4a20ba4
Try Qwen Coder 32B
aymeric-roucher Apr 2, 2025
6961c36
Remove some dependencies to work on mac
aymeric-roucher Apr 3, 2025
2b1bc05
Merge branch 'main' into agent-traces
aymeric-roucher Apr 3, 2025
38efcfc
Working trace generation with auto verification by running test cases
aymeric-roucher Apr 3, 2025
b7522e3
Add training scripts for agents
aymeric-roucher Apr 3, 2025
2ddf70e
Change job name
aymeric-roucher Apr 3, 2025
49083cc
Swap sft training configs
aymeric-roucher Apr 3, 2025
de2b792
Point to proper config file
aymeric-roucher Apr 3, 2025
5647c26
Add distributed type
aymeric-roucher Apr 3, 2025
8a7951c
Revert to zero3 config
aymeric-roucher Apr 3, 2025
d28d07b
Remove deepspeed config
aymeric-roucher Apr 4, 2025
cae3c7c
Update train slurm
aymeric-roucher Apr 4, 2025
2a08444
Switch to new venv
aymeric-roucher Apr 8, 2025
1eaf1d1
Move script to proper file
aymeric-roucher Apr 8, 2025
2043be9
Change job name
aymeric-roucher Apr 8, 2025
2030e16
Increase epochs
aymeric-roucher Apr 8, 2025
08a449c
Update dataset name
aymeric-roucher Apr 9, 2025
60472f6
Increase epochs
aymeric-roucher Apr 9, 2025
9347590
Add Qwen 3B training setup
Apr 15, 2025
a66a5e6
Merge branch 'main' into agent-traces
aymeric-roucher Jun 23, 2025
a9b5411
Add aguvis download script
aymeric-roucher Jun 23, 2025
80f7ce8
Improve collection script
aymeric-roucher Jun 23, 2025
984d631
Add Readme for agents
aymeric-roucher Jun 23, 2025
a675552
Fix env variables
aymeric-roucher Jun 23, 2025
fbd987c
Remove weka
aymeric-roucher Jun 23, 2025
3aee6ef
Modify train slurm
aymeric-roucher Jun 23, 2025
7cb592c
Remove parsing
aymeric-roucher Jun 23, 2025
b7a700e
Revert training script to the good old time when it worked
aymeric-roucher Jun 23, 2025
3b77977
Revert to new shitty script
aymeric-roucher Jun 23, 2025
81c64ac
Change weka path
aymeric-roucher Jun 23, 2025
eb39096
Try edit
aymeric-roucher Jun 23, 2025
0ee52fc
Fix env
aymeric-roucher Jun 23, 2025
c4d4126
Working SFT for text model
aymeric-roucher Jun 24, 2025
3c3e954
Start adapting script for VLM training
aymeric-roucher Jun 24, 2025
a452f2f
Improve data collection script
aymeric-roucher Jun 25, 2025
5fa7e51
Deactivate multinodes
aymeric-roucher Jun 26, 2025
933ea92
Merge branch 'agent-traces' of github.com:huggingface/open-r1 into ag…
aymeric-roucher Jun 26, 2025
a658db9
Fix sft collate function for vlms
aymeric-roucher Jun 26, 2025
24ea112
Fix collate fn in sft.py
aymeric-roucher Jun 26, 2025
db30467
Working VLM training 🥳
aymeric-roucher Jun 26, 2025
5eadb06
Add single-GPU training script
aymeric-roucher Jun 26, 2025
b316210
Add second dataset in mix
aymeric-roucher Jun 26, 2025
2ba1c65
Add aguvis conversion script
aymeric-roucher Jul 1, 2025
f6b8f7c
Conversion script
aymeric-roucher Jul 1, 2025
22b84cf
Merge branch 'agent-traces' of github.com:huggingface/open-r1 into ag…
aymeric-roucher Jul 1, 2025
035f134
Integrate aguvis conversion to smolagents
aymeric-roucher Jul 1, 2025
f692c10
Try catch wrap for processing
aymeric-roucher Jul 1, 2025
b0d794c
override existing split
aymeric-roucher Jul 1, 2025
31cf3a2
Nit script args
aymeric-roucher Jul 1, 2025
029dc60
Update train instructions
aymeric-roucher Jul 3, 2025
880a585
Merge branch 'agent-traces' of github.com:huggingface/open-r1 into ag…
aymeric-roucher Jul 3, 2025
1b50860
Remove merge artifact
aymeric-roucher Jul 3, 2025
868d4a4
Small fixes in recipe
aymeric-roucher Jul 9, 2025
4c83688
Modify aguvis conversion script
aymeric-roucher Jul 9, 2025
6a63f2f
Unify conversion in only one script
aymeric-roucher Jul 9, 2025
e8a4c2b
Update imports
aymeric-roucher Jul 9, 2025
18fea48
Fix script
aymeric-roucher Jul 10, 2025
21 changes: 21 additions & 0 deletions README_AGENTS.md
@@ -0,0 +1,21 @@
### Text model training

Launch with:
```bash
sbatch --nodes=1 slurm/train.slurm --model SmolLM2-1.7B-Instruct --task sft --config agent --accelerator zero3
```
This refers to the config at `recipes/SmolLM2-1.7B-Instruct/sft/config_agent.yaml`.
`zero3` is one of the accelerate configs in `recipes/accelerate_configs`.
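
For quick iteration without Slurm, a roughly equivalent direct launch is sketched below. It assumes the `src/open_r1/sft.py` entry point and the `recipes/accelerate_configs/zero3.yaml` accelerate config, i.e. the pieces that `slurm/train.slurm` wires together; adjust the paths if your checkout differs.
```bash
# Sketch of a non-Slurm launch equivalent to the sbatch command above
# (assumes src/open_r1/sft.py and recipes/accelerate_configs/zero3.yaml exist in this repo).
ACCELERATE_LOG_LEVEL=info accelerate launch \
    --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py \
    --config recipes/SmolLM2-1.7B-Instruct/sft/config_agent.yaml
```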


### VLM training

Launch in multi-GPU mode:
```bash
sbatch --qos=high --nodes=1 slurm/train.slurm --model Qwen2.5-VL-3B-Instruct --task sft --config agent --accelerator zero3
```

🛑 For me the above fails because of NCCL issues, so I launch it in single-GPU mode instead:
```bash
sbatch slurm/trainsingle.slurm --model Qwen2.5-VL-3B-Instruct --task sft --config agent
```

The config is located at `recipes/Qwen2.5-VL-3B-Instruct/sft/config_agent.yaml`.
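
If you prefer to debug outside of Slurm, a single-GPU launch along the same lines might look like the sketch below (again assuming the `src/open_r1/sft.py` entry point; `slurm/trainsingle.slurm` remains the authoritative version):
```bash
# Hypothetical single-GPU launch for the VLM config, bypassing Slurm entirely.
CUDA_VISIBLE_DEVICES=0 accelerate launch \
    --num_processes 1 \
    src/open_r1/sft.py \
    --config recipes/Qwen2.5-VL-3B-Instruct/sft/config_agent.yaml
```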
Empty file removed logs/.gitkeep
Empty file.
46 changes: 46 additions & 0 deletions recipes/Qwen2.5-3B-Instruct/sft/config.yaml
@@ -0,0 +1,46 @@
# Model arguments
# You can download the model and manually change the rope to 300k/500k and max_position_embeddings to 32768
model_name_or_path: HuggingFaceTB/SmolLM2-1.7B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: sdpa

# Data training arguments
dataset_name: open-r1/OpenR1-Math-220k
dataset_num_proc: 48

# SFT hyperparameters
max_length: 8192 # You can set this to 32768 if you change the rope, but you need to change the config.json file
weight_decay: 0.0001
optim: adamw_torch
lr_scheduler_type: linear
warmup_ratio: 0.1
learning_rate: 5.0e-05
gradient_accumulation_steps: 2
per_device_eval_batch_size: 4
per_device_train_batch_size: 4 # Change this depending on the context length of the model to keep a 500M GBS.

# SFT trainer config
max_steps: -1
num_train_epochs: 3
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
hub_model_id: OpenR1-Qwen-7B-SFT
hub_strategy: every_save
log_level: info
logging_steps: 5
logging_strategy: steps
packing: true
output_dir: data/OpenR1-Qwen-7B-SFT
overwrite_output_dir: true
push_to_hub: true
report_to:
- wandb
save_strategy: "steps"
save_steps: 500
save_total_limit: 1
seed: 42
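
A note on the `per_device_train_batch_size` comment in this config: the target is a roughly constant token-level global batch size. As a back-of-the-envelope check, assuming a single node with 8 GPUs (the GPU count is not stated in this diff), the packed token count per optimizer step comes out to about 0.5M tokens:
```bash
# Rough global-batch-size check for the config above.
# Assumption: 8 GPUs on one node; adjust NUM_GPUS to match your allocation.
NUM_GPUS=8
PER_DEVICE_BS=4      # per_device_train_batch_size
GRAD_ACCUM=2         # gradient_accumulation_steps
MAX_LENGTH=8192      # max_length, with packing: true
echo $(( NUM_GPUS * PER_DEVICE_BS * GRAD_ACCUM * MAX_LENGTH ))   # 524288 tokens ≈ 0.5M per step
```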
46 changes: 46 additions & 0 deletions recipes/Qwen2.5-3B-Instruct/sft/config_agent.yaml
@@ -0,0 +1,46 @@
# Model arguments
# You can download the model and manually change the rope to 300k/500k and max_position_embeddings to 32768
model_name_or_path: Qwen/Qwen2.5-3B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: sdpa

# Data training arguments
dataset_name: smolagents/training-traces
dataset_num_proc: 48

# SFT hyperparameters
max_length: 8192 # You can set this to 32768 if you change the rope, but you need to change the config.json file
weight_decay: 0.0001
optim: adamw_torch
lr_scheduler_type: linear
warmup_ratio: 0.1
learning_rate: 4.0e-05
gradient_accumulation_steps: 1
per_device_eval_batch_size: 4
per_device_train_batch_size: 2 # Change this depending on the context length of the model to keep a 500M GBS.

# SFT trainer config
max_steps: -1
num_train_epochs: 2
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
hub_model_id: oR1-Qwen-3B-Agentic-e2-lr4e-b2
hub_strategy: every_save
log_level: info
logging_steps: 5
logging_strategy: steps
packing: true
output_dir: data/oR1-Qwen-3B-Agentic-e2-lr4e-b2
overwrite_output_dir: true
push_to_hub: true
report_to:
- wandb
save_strategy: "steps"
save_steps: 500
save_total_limit: 1
seed: 42
76 changes: 76 additions & 0 deletions recipes/Qwen2.5-VL-3B-Instruct/sft/config_agent.yaml
@@ -0,0 +1,76 @@
# Model arguments
# You can download the model and manually change the rope to 300k/500k and max_position_embeddings to 32768
model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
vision_model: true
model_revision: main
torch_dtype: bfloat16
attn_implementation: sdpa

# Data training arguments
dataset_name: smolagents/aguvis-stage-2
dataset_num_proc: 48

# SFT hyperparameters
max_length: 32768
optim: adamw_torch
lr_scheduler_type: cosine_with_min_lr
lr_scheduler_kwargs:
min_lr_rate: 0.1
max_grad_norm: 0.2
warmup_ratio: 0.03
learning_rate: 1.0e-05
gradient_accumulation_steps: 8
per_device_eval_batch_size: 4
per_device_train_batch_size: 4 # Change this depending on the context length of the model to keep a 500M GBS.

single_gpu: true

# SFT trainer config
max_steps: -1
num_train_epochs: 1
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
hub_model_id: smolagents/Qwen2.5-VL-3B-Instruct-Agentic
hub_strategy: end
push_to_hub: true
log_level: info
logging_steps: 5
logging_strategy: steps
output_dir: data/smolagents-Qwen2.5-VL-3B-Instruct-Agentic
overwrite_output_dir: true
report_to:
- wandb
save_strategy: "epoch"
save_steps: 1
save_total_limit: 1
seed: 42

dataset_mixture:
datasets: # List of datasets to include in the mixture
- id: smolagents/aguvis-stage-2 # Hub dataset ID
config: mind2web # Name of the dataset config
split: train # Split to use from the dataset
columns: # Columns to keep
- images
- texts
weight: 1.
- id: smolagents/aguvis-stage-2
config: guiact-web-single
split: train
columns:
- images
- texts
weight: 1.
- id: smolagents/aguvis-stage-2
config: guiact-web-multi
split: train
columns:
- images
- texts
weight: 1.
seed: 42 # Seed for shuffling the combined dataset
test_split_size: 0.1
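
To sanity-check one component of the mixture before launching a full run, you can preview it directly from the Hub. This is only a sketch (it assumes the `smolagents/aguvis-stage-2` configs listed above are publicly accessible) and does not reproduce the weighting and shuffling handled by the training code:
```bash
# Preview one mixture component (config name taken from the dataset_mixture above).
python -c "
from datasets import load_dataset

ds = load_dataset('smolagents/aguvis-stage-2', 'mind2web', split='train')
print(ds)                    # features should include 'images' and 'texts'
print(ds[0]['texts'][:1])    # first text turn of the first sample
"
```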
45 changes: 45 additions & 0 deletions recipes/SmolLM2-1.7B-Instruct/sft/config_agent.yaml
@@ -0,0 +1,45 @@
# Model arguments
# You can download the model and manually change the rope to 300k/500k and max_position_embeddings to 32768
model_name_or_path: HuggingFaceTB/SmolLM2-1.7B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: sdpa

# Data training arguments
dataset_name: smolagents/codeagent-traces
dataset_num_proc: 48

# SFT hyperparameters
max_length: 8192 # You can set this to 32768 if you change the rope, but you need to change the config.json file
weight_decay: 0.0001
optim: adamw_torch
lr_scheduler_type: linear
warmup_ratio: 0.1
learning_rate: 5.0e-05
gradient_accumulation_steps: 2
per_device_eval_batch_size: 4
per_device_train_batch_size: 4 # Change this depending on the context length of the model to keep a 500M GBS.

# SFT trainer config
max_steps: -1
num_train_epochs: 1
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
hub_model_id: OpenR1-SmolLM2-1.7B-Instruct-Agentic
hub_strategy: every_save
log_level: info
logging_steps: 5
logging_strategy: steps
output_dir: data/OpenR1-SmolLM2-1.7B-Instruct-Agentic
overwrite_output_dir: true
push_to_hub: true
report_to:
- wandb
save_strategy: "steps"
save_steps: 500
save_total_limit: 1
seed: 42