Commit 755b0a3 (1 parent: 1d4ac90)
language/llama3.1-8b/README_mahmood.md
@@ -35,4 +35,6 @@ python -u main.py --scenario Offline --model-path ${CHECKPOINT_PATH} --batch-siz
interactive job command:
```
srun --mpi=pmix --job-name="int_gpu_job" --partition=gpu-a100-small --time=01:00:00 --ntasks=1 --cpus-per-task=2 --gpus-per-task=1 --mem-per-cpu=5G --account=research-eemcs-qce --pty /bin/bash -il
+
+/scratch/mnaderantahan/nsight-systems-2025.5.1/bin/nsys profile --output nsys.out --trace=cuda,cublas,cudnn,osrt,nvtx --sample cpu --cpuctxsw process-tree python -u main.py --scenario Offline --model-path $CHECKPOINT_PATH --batch-size $BATCH_SIZE --dtype bfloat16 --user-conf user.conf --total-sample-count 1 --dataset-path $DATASET_PATH --output-log-dir output --tensor-parallel-size $GPU_COUNT --vllm