docs/getting_started.md
5 additions & 18 deletions
@@ -271,36 +271,23 @@ Usually it is slow if you do not have high speed networking like InfiniBand.
 ### Launch multiple jobs on a single machine

 If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
-you need to specify different ports (29500 by default) for each job to avoid communication conflict.
+you need to specify different ports (29500 by default) for each job to avoid communication conflict. Otherwise, there will be an error message saying `RuntimeError: Address already in use`.

-If you use `dist_train.sh` to launch training jobs, you can set the port in commands.
+If you use `dist_train.sh` to launch training jobs, you can set the port in commands with environment variable `PORT`.

-If you launch training jobs with Slurm, you need to modify the config files (usually the 6th line from the bottom in config files) to set different communication ports.
+If you use `slurm_train.sh` to launch training jobs, you can set the port in commands with environment variable `MASTER_PORT`.

-In `config1.py`,
-```python
-dist_params = dict(backend='nccl', port=29500)
-```
-
-In `config2.py`,
-```python
-dist_params = dict(backend='nccl', port=29501)
-```
-
-Then you can launch two jobs with `config1.py` and `config2.py`.
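
The `Address already in use` error mentioned in this hunk occurs when two jobs try to bind the same port. As a rough illustration only (the helpers `port_is_free` and `pick_free_ports` are hypothetical and not part of the repository), one way to find distinct free ports before launching several jobs could look like this:

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": another process holds the port.
            return False
        return True


def pick_free_ports(count, start=29500, host="127.0.0.1"):
    """Scan upward from `start` and return `count` ports that are free right now."""
    ports = []
    port = start
    while len(ports) < count and port < 65536:
        if port_is_free(port, host):
            ports.append(port)
        port += 1
    if len(ports) < count:
        raise RuntimeError("not enough free ports found")
    return ports


if __name__ == "__main__":
    # One port per job, e.g. two 4-GPU jobs on an 8-GPU machine.
    print(pick_free_ports(2))
```

Each returned value could then be passed to one job through `PORT` (for `dist_train.sh`) or `MASTER_PORT` (for `slurm_train.sh`). The check is only a snapshot: a port that is free when tested may be taken before the job starts, so this reduces conflicts rather than ruling them out.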
-In this way, only pixels with confidence score under 0.7 are used to train. And we keep at least 100000 pixels during training.
+In this way, only pixels with confidence score under 0.7 are used to train. And we keep at least 100000 pixels during training. If `thresh` is not specified, the `min_kept` pixels with the highest loss will be selected.

 ## Class Balanced Loss
 For a dataset with an imbalanced class distribution, you may change the loss weight of each class.
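
The `thresh` / `min_kept` behavior described in the hunk above can be sketched in plain Python. This is a simplified illustration of the selection rule as the text states it, not the repository's sampler; the function name `select_hard_pixels` and its flat-list inputs are assumptions made for the example.

```python
def select_hard_pixels(confidences, losses, min_kept, thresh=None):
    """Return sorted indices of pixels kept for training.

    confidences: per-pixel confidence score for the ground-truth class
    losses:      per-pixel loss values
    min_kept:    minimum number of pixels to keep
    thresh:      optional confidence threshold
    """
    n = len(confidences)
    keep = min(min_kept, n)
    if thresh is not None:
        # Keep every pixel whose confidence is under the threshold ...
        order = sorted(range(n), key=lambda i: confidences[i])
        below = [i for i in order if confidences[i] < thresh]
        # ... but fall back to the `keep` least confident pixels if too few qualify.
        chosen = below if len(below) >= keep else order[:keep]
    else:
        # No threshold: keep the `keep` pixels with the highest loss.
        order = sorted(range(n), key=lambda i: losses[i], reverse=True)
        chosen = order[:keep]
    return sorted(chosen)


if __name__ == "__main__":
    conf = [0.9, 0.2, 0.95, 0.6, 0.8]
    loss = [0.1, 1.6, 0.05, 0.5, 0.2]
    print(select_hard_pixels(conf, loss, min_kept=1, thresh=0.7))
    print(select_hard_pixels(conf, loss, min_kept=2))
```

With a threshold, hard pixels are defined by low confidence and `min_kept` acts as a floor; without one, `min_kept` alone decides how many of the highest-loss pixels are used.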