As the title says: when using `swift infer` to run the Qwen2.5-VL-72B model over a test set, GPU memory is allocated unevenly. GPU 0 fills up completely while the other cards use less than half of their memory.

Script:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MAX_PIXELS=1003520 \
swift infer \
    --model /njfs/train-material/models/Qwen2.5-VL-72B-Instruct \
    --infer_backend pt \
    --temperature 0 \
    --max_new_tokens 4096 \
    --val_dataset data/quality_check_porn/swift.jsonl \
    --result_path infer_result/swift.jsonl \
    --max_batch_size 4
```
Please try `--attn_impl flash_attn`, or `--infer_backend vllm --tensor_parallel_size xxx`.
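For reference, a sketch of the original command adjusted for the vLLM backend. The value `--tensor_parallel_size 8` is an assumption matching the eight visible GPUs, not a number confirmed in this thread; `--max_batch_size` is dropped because, to my understanding, it only applies to the pt backend, while vLLM manages batching itself:

```shell
# Sketch of the suggested fix: tensor-parallel inference via vLLM.
# tensor_parallel_size=8 is an assumption (all eight visible GPUs share the model).
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MAX_PIXELS=1003520 \
swift infer \
    --model /njfs/train-material/models/Qwen2.5-VL-72B-Instruct \
    --infer_backend vllm \
    --tensor_parallel_size 8 \
    --temperature 0 \
    --max_new_tokens 4096 \
    --val_dataset data/quality_check_porn/swift.jsonl \
    --result_path infer_result/swift.jsonl
```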
Thanks for the reply. Switching to `--infer_backend vllm --tensor_parallel_size xxx` works. I have now run into another problem: when running LoRA fine-tuning on 4 machines with 8 A800s each, I get an out-of-memory error, yet memory is allocated unevenly across the cards, with some using less than 60 GB. How can I fix this?
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=$TORCH_NNODES \
NPROC_PER_NODE=$TORCH_NPROC_PER_NODE \
NODE_RANK=$TORCH_NODE_RANK \
MASTER_ADDR=$TORCH_MASTER_ADDR \
MASTER_PORT=$TORCH_MASTER_PORT \
MAX_PIXELS=1003520 \
swift sft \
    --model /models/Qwen2.5-VL-7B-Instruct \
    --dataset 'data/train_swift.jsonl' \
    --train_type lora \
    --torch_dtype bfloat16 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --learning_rate 2e-5 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --gradient_accumulation_steps 8 \
    --eval_steps 100 \
    --save_steps 500 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --max_length 10240 \
    --output_dir output/sft \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 32 \
    --attn_impl flash_attn \
    --deepspeed zero2
```
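One knob worth trying (my own assumption, not a confirmed answer from the maintainers): DeepSpeed ZeRO-3 shards the model parameters across ranks in addition to the optimizer state and gradients that ZeRO-2 shards, which usually lowers and evens out per-GPU memory. A minimal sketch, changing only the DeepSpeed stage and keeping every other flag as above:

```shell
# Untested sketch: identical launch, but with ZeRO stage 3 so that
# parameters (not just optimizer state and gradients) are sharded
# across all ranks.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=$TORCH_NNODES \
NPROC_PER_NODE=$TORCH_NPROC_PER_NODE \
NODE_RANK=$TORCH_NODE_RANK \
MASTER_ADDR=$TORCH_MASTER_ADDR \
MASTER_PORT=$TORCH_MASTER_PORT \
MAX_PIXELS=1003520 \
swift sft \
    --model /models/Qwen2.5-VL-7B-Instruct \
    --dataset 'data/train_swift.jsonl' \
    --train_type lora \
    --torch_dtype bfloat16 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --learning_rate 2e-5 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --gradient_accumulation_steps 8 \
    --eval_steps 100 \
    --save_steps 500 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --max_length 10240 \
    --output_dir output/sft \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 32 \
    --attn_impl flash_attn \
    --deepspeed zero3
```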