Description
Hi guys,
I am using GRPO colocate mode to train Qwen2.5-VL 7B. During training I observed a continuous increase in system memory usage, so I suspect some kind of memory leak is happening here.
(In the monitoring screenshot, the GREEN line is system memory usage.)
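To make the growth easier to correlate with training steps, here is a minimal host-memory logging sketch of my own (not part of ms-swift; it assumes `psutil` is installed) that can run alongside the training job:

```python
# Minimal host-memory logger (assumption: psutil is installed; not part of ms-swift).
import os
import time

import psutil


def log_host_memory(tag: str = "") -> None:
    """Print system-wide used memory and the current process RSS in GiB."""
    vm = psutil.virtual_memory()
    rss = psutil.Process(os.getpid()).memory_info().rss
    print(f"[mem]{tag} system used = {vm.used / 2**30:.2f} GiB, "
          f"process RSS = {rss / 2**30:.2f} GiB", flush=True)


if __name__ == "__main__":
    # Example: sample every 60 seconds while training runs in another process.
    while True:
        log_host_memory()
        time.sleep(60)
```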
Here is my shell command:
```shell
NPROC_PER_NODE=8 \
MAX_PIXELS=1280000 \
swift rlhf \
    --rlhf_type grpo \
    --model /mnt2/models/Qwen2.5_VL_7B_Instruct \
    --train_type full \
    --dataset /ossfs/workspace/xxx.json \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --max_length 4096 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --eval_steps 1000 \
    --save_steps 1000 \
    --eval_strategy 'no' \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --output_dir /mnt2/user//outputs/ \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 1024 \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_ui_acc uiformat \
    --num_generations 16 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.2 \
    --vllm_max_model_len 5120 \
    --deepspeed zero3_offload \
    --temperature 1.1 \
    --log_completions true \
    --num_infer_workers 8 \
    --tensor_parallel_size 4 \
    --async_generate false \
    --sleep_level 1 \
    --report_to swanlab
```
Here are the related library versions:
```text
vllm          0.7.3
trl           0.16.0.dev0
transformers  4.49.0
torch         2.5.1+cu121
```
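(For anyone trying to reproduce, a quick way to dump the same version list, assuming these packages are installed in the active environment:)

```python
# Print the installed versions of the relevant packages (assumes they are installed).
from importlib.metadata import PackageNotFoundError, version

for pkg in ("vllm", "trl", "transformers", "torch"):
    try:
        print(f"{pkg:<14}{version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg:<14}not installed")
```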
By the way, I also found that someone encountered system memory leak issues in another open-source project when using vllm==0.7.3, so I suspect something is going wrong in this specific version of vLLM:
hiyouga/EasyR1#50
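To check whether the growth comes from the vLLM engine / inference workers rather than the trainer itself, a per-process RSS breakdown of the whole process tree might help. This is only a diagnostic sketch of my own (assumes `psutil`; it takes the PID of the main `swift rlhf` process as an argument, and the listed child processes will differ per setup):

```python
# Dump RSS for the training process and all of its children
# (e.g. vLLM / DeepSpeed / dataloader workers) to see which one keeps growing.
# Assumption: pass the PID of the main `swift rlhf` process; requires psutil.
import sys

import psutil


def dump_process_tree_rss(root_pid: int) -> None:
    root = psutil.Process(root_pid)
    for proc in [root] + root.children(recursive=True):
        try:
            rss_gib = proc.memory_info().rss / 2**30
            cmd = " ".join(proc.cmdline()[:3])
            print(f"pid={proc.pid:<8} rss={rss_gib:6.2f} GiB  cmd={cmd}")
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue


if __name__ == "__main__":
    dump_process_tree_rss(int(sys.argv[1]))
```

Running this a few times over the course of training should show whether the leaked memory is attributed to the main trainer process or to one of its workers.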