Description
Hi, authors. Thanks for your great work! I would like to know what changes I should make to run two training processes on one machine with 8 GPUs in total. I used the first four GPUs to train one model, and when I tried to use the remaining four for another model, I got the error message below:
```
Traceback (most recent call last):
  File "/mnt/sda/TEL_syn/main.py", line 185, in <module>
    handle_distributed(args_parser, os.path.expanduser(os.path.abspath(__file__)))
  File "/mnt/sda/TEL_syn/lib/utils/distributed.py", line 31, in handle_distributed
    _setup_process_group(args)
  File "/mnt/sda/TEL_syn/lib/utils/distributed.py", line 74, in _setup_process_group
    torch.distributed.init_process_group(
  File "/home/jiw010/anaconda3/envs/tel/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 500, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/jiw010/anaconda3/envs/tel/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 190, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
```
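If I read the traceback right, both jobs are rendezvousing on the same `MASTER_PORT` (PyTorch's `env://` rendezvous defaults to 29500), so the second job's `TCPStore` cannot bind. Here is a minimal sketch of the workaround I have in mind, assuming the repo goes through the standard `MASTER_ADDR`/`MASTER_PORT` environment variables; the `setup_process_group` helper name and port 29501 are my own placeholders, not code from this repo:

```python
import os
import torch.distributed as dist

def setup_process_group(rank: int, world_size: int, master_port: str = "29501") -> None:
    """Hypothetical helper: initialize one job's process group on its own port.

    PyTorch's env:// rendezvous defaults to port 29500, so a second job on
    the same machine must pick a different free port to avoid the TCPStore
    "Address already in use" collision seen above.
    """
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", master_port)  # must be unique per job
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
```

In the same spirit, I would launch the second job restricted to its own GPUs and port, something like `CUDA_VISIBLE_DEVICES=4,5,6,7 MASTER_PORT=29501 python main.py ...`. Is that the intended way to do this with your launcher, or is there a flag I'm missing?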