Port listening error #3988

Open
adoptedirelia opened this issue Apr 24, 2025 · 2 comments

Comments

@adoptedirelia

While running swift sft, I get the following error during training:

RuntimeError: The server socket has failed to listen on any local network address. port: 29500, useIpv6: 0, code: -98, name: EADDRINUSE, message: address already in use

Is there a way to fix this?
I have already tried

export MASTER_PORT=29501

but the error message still reports the port as 29500.
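
For reference, 29500 is the default master port used by PyTorch's distributed launcher, so an error that still names 29500 suggests the new value never reached the training process. The sketch below only checks that an exported MASTER_PORT is visible to child processes of the current shell. Whether swift sft actually reads MASTER_PORT on this particular code path is an assumption that this thread does not confirm.

```shell
# Export in the same shell session that will launch training, so that
# child processes inherit the variable.
export MASTER_PORT=29501

# A child process reports what it actually sees. "unset" here would mean the
# export happened in a different shell or was cleared by a wrapper script.
sh -c 'echo "child sees MASTER_PORT=${MASTER_PORT:-unset}"'

# The variable can also be scoped to a single command by prefixing it,
# as is commonly done with CUDA_VISIBLE_DEVICES.
MASTER_PORT=29502 sh -c 'echo "child sees MASTER_PORT=${MASTER_PORT:-unset}"'
```

If the child process prints the expected value and the error still names 29500, the launcher is probably not taking the port from this variable in the current configuration.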

@Jintao-Huang
Collaborator

Do you have the shell script?

@adoptedirelia
Author

> Do you have the shell script?

CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model LLM-Research/Meta-Llama-3.1-8B \
    --train_type lora \
    --dataset ./DPO_data/2WikimhQA_sft.jsonl \
    --torch_dtype bfloat16 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 4 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 10240 \
    --output_dir output_model \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --deepspeed zero2 \
    --dataset_num_proc 4

I have checked the dataset format, and it is correct.
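
EADDRINUSE means some other process already holds a listening socket on port 29500, often a previous training run that has not exited. A minimal sketch for finding that process follows. It assumes a Linux host with either ss (from iproute2) or lsof installed, and the helper name show_listener is hypothetical, introduced only for illustration.

```shell
# Print whatever is listening on a TCP port, using whichever tool exists.
# Assumption: Linux with ss or lsof; seeing processes owned by other users
# may require elevated privileges.
show_listener() {
    port="$1"
    if command -v ss >/dev/null 2>&1; then
        # -l listening sockets, -t TCP, -n numeric ports, -p owning process
        ss -ltnp "sport = :$port"
    elif command -v lsof >/dev/null 2>&1; then
        lsof -nP -iTCP:"$port" -sTCP:LISTEN
    else
        echo "neither ss nor lsof found" >&2
        return 127
    fi
}

show_listener 29500 || echo "no listener reported on 29500 (or no tool available)"
```

If the output points at a stale training process, stopping it should free the port for the next run.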
