Respect each example's requirements and use uv #1330

Merged: 4 commits, Apr 26, 2025
Changes from 1 commit
Respect each example requirements and use uv
This commit introduces a few changes to CI by modifying `run_*_examples.sh`
and the respective GitHub workflows:

* Switched to uv
* Added setup and teardown stages for tests (`start()` and `stop()` methods
  wrapping the test bodies; both are called automatically, as sketched below)
* Setup (`start()`) installs the example's dependencies and, optionally (if
  `VIRTUAL_ENV=.venv` is passed), creates a uv virtual environment
* Teardown (`stop()`) removes the uv virtual environment if one was created
  (to save space)
* If `VIRTUAL_ENV` is not set, the scripts expect to be executed inside an
  existing virtual environment (created with `python -m venv`, `uv venv`, or
  `conda`). In this case example dependencies are installed into that
  environment, potentially reinstalling existing packages (including `torch`!)
* Dropped automated detection of the CUDA platform; the scripts now require
  `USE_CUDA=True` to be passed explicitly
* Added a `PIP_INSTALL_ARGS` environment variable that is passed through to the
  `uv pip install` call for each example's dependencies. This allows adjusting
  torch indices and other options.
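
The `start()`/`stop()` hooks themselves live in the shared `utils.sh`, which is
not shown in this diff view. A minimal sketch of the behavior described above
(the function bodies are illustrative assumptions, not the committed code):

```
# Setup hook, called automatically before each example's test body.
function start() {
    if [ -n "$VIRTUAL_ENV" ]; then
        # Create a throwaway per-example environment.
        uv venv "$VIRTUAL_ENV" || error "failed to create venv"
    fi
    # Install the example's dependencies; PIP_INSTALL_ARGS can redirect
    # the index, e.g. to a torch nightly wheel index.
    uv pip install $PIP_INSTALL_ARGS -r requirements.txt || error "failed to install requirements"
}

# Teardown hook, called automatically after each example's test body.
function stop() {
    if [ -n "$VIRTUAL_ENV" ]; then
        # Remove the environment we created to save space.
        rm -rf "$VIRTUAL_ENV"
    fi
}
```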

Execute all tests in the current virtual environment (might reinstall existing packages):
```
./run_distributed_examples.sh
```

Execute all tests, creating a separate environment for each example:
```
VIRTUAL_ENV=.venv ./run_distributed_examples.sh
```

Run with CUDA:
```
USE_CUDA=True ./run_distributed_examples.sh
```

Adjust the package index:
```
PIP_INSTALL_ARGS="--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html" \
   ./run_distributed_examples.sh
```
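
These options combine. For example, the following (an illustrative combination,
mirroring what the updated `main_distributed.yaml` workflow below sets in CI)
runs everything on CUDA in per-example environments against the CUDA nightly
index:
```
USE_CUDA=True VIRTUAL_ENV=.venv \
   PIP_INSTALL_ARGS="--pre -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html" \
   ./run_distributed_examples.sh
```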

Signed-off-by: Dmitry Rogozhkin <[email protected]>
dvrogozh committed Apr 26, 2025
commit 1203511e59f922d4fce0a529ac09771501fe4606
10 changes: 6 additions & 4 deletions .github/workflows/main_distributed.yaml
```
@@ -22,12 +22,14 @@ jobs:
         with:
           python-version: 3.8
       - name: Install PyTorch
-        run: |
-          python -m pip install --upgrade pip
-          pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html
+        uses: astral-sh/setup-uv@v6
       - name: Run Tests
+        env:
+          USE_CUDA: 'True'
+          VIRTUAL_ENV: '.venv'
+          PIP_INSTALL_ARGS: '--pre -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html'
         run: |
-          ./run_distributed_examples.sh "run_all,clean"
+          ./run_distributed_examples.sh
       - name: Open issue on failure
         if: ${{ failure() && github.event_name == 'schedule' }}
         uses: rishabhgupta/git-action-issue@v2
```
14 changes: 6 additions & 8 deletions .github/workflows/main_python.yml
```
@@ -21,16 +21,14 @@ jobs:
         uses: actions/setup-python@v5
         with:
           python-version: '3.10'
-      - name: Install PyTorch
-        run: |
-          python -m pip install --upgrade pip
-          # Install CPU-based pytorch
-          pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
-          # Maybe use the CUDA 10.2 version instead?
-          # pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
       - name: Run Tests
+        env:
+          VIRTUAL_ENV: '.venv'
+          PIP_INSTALL_ARGS: '--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html'
         run: |
-          ./run_python_examples.sh "install_deps,run_all,clean"
+          ./run_python_examples.sh
       - name: Open issue on failure
         if: ${{ failure() && github.event_name == 'schedule' }}
         uses: rishabhgupta/git-action-issue@v2
```
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
```
@@ -40,8 +40,8 @@ If you're new, we encourage you to take a look at issues tagged with [good first
 1. Fork the repo and create your branch from `main`.
 2. Make sure you have a GPU-enabled machine, either locally or in the cloud. `g4dn.4xlarge` is a good starting point on AWS.
 3. Make your code change.
-4. First, install all dependencies with `./run_python_examples.sh "install_deps"`.
-5. Then, make sure that `./run_python_examples.sh` passes locally by running the script end to end.
+4. Install `uv`.
+5. Then, make sure that `VIRTUAL_ENV=.venv ./run_python_examples.sh` passes locally by running the script end to end.
 6. If you haven't already, complete the Contributor License Agreement ("CLA").
 7. Address any feedback in code review promptly.
```
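
For the new step 4, `uv` can be installed in the usual ways; for example
(commands from the upstream uv documentation, not part of this diff):
```
# Standalone installer:
curl -LsSf https://astral.sh/uv/install.sh | sh
# or from PyPI:
pip install uv
```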
46 changes: 29 additions & 17 deletions run_distributed_examples.sh
```
@@ -4,16 +4,30 @@
 # The purpose is just as an integration test, not to actually train models in any meaningful way.
 # For that reason, most of these set epochs = 1 and --dry-run.
 #
-# Optionally specify a comma separated list of examples to run.
-# can be run as:
-# ./run_python_examples.sh "install_deps,run_all,clean"
-# to pip install dependencies (other than pytorch), run all examples, and remove temporary/changed data files.
-# Expects pytorch, torchvision to be installed.
+# Optionally specify a comma separated list of examples to run. Can be run as:
+# * To run all examples:
+#   ./run_distributed_examples.sh
+# * To run a specific example:
+#   ./run_distributed_examples.sh "distributed/tensor_parallelism,distributed/ddp"
+#
+# To test examples on a CUDA accelerator, run as:
+#   USE_CUDA=True ./run_distributed_examples.sh
+#
+# Script requires uv to be installed. When executed, script will install prerequisites from
+# `requirements.txt` for each example. If run within an activated virtual environment (uv venv,
+# python -m venv, conda) this might reinstall some of the packages. To change pip installation
+# index or to pass additional pip install options, run as:
+#   PIP_INSTALL_ARGS="--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html" \
+#     ./run_distributed_examples.sh
+#
+# To force script to create virtual environment for each example, run as:
+#   VIRTUAL_ENV=".venv" ./run_distributed_examples.sh
+# Script will remove environments it creates in a teardown step after execution of each example.
 
 BASE_DIR="$(pwd)/$(dirname $0)"
 source $BASE_DIR/utils.sh
 
-USE_CUDA=$(python -c "import torch; print(torch.cuda.is_available())")
+USE_CUDA=${USE_CUDA:-False}
 case $USE_CUDA in
   "True")
     echo "using cuda"
@@ -30,21 +44,19 @@ case $USE_CUDA in
   ;;
 esac
 
-function distributed() {
-    start
-    bash tensor_parallelism/run_example.sh tensor_parallelism/tensor_parallel_example.py || error "tensor parallel example failed"
-    bash tensor_parallelism/run_example.sh tensor_parallelism/sequence_parallel_example.py || error "sequence parallel example failed"
-    bash tensor_parallelism/run_example.sh tensor_parallelism/fsdp_tp_example.py || error "2D parallel example failed"
-    python ddp/main.py || error "ddp example failed"
+function distributed_tensor_parallelism() {
+    uv run bash run_example.sh tensor_parallel_example.py || error "tensor parallel example failed"
+    uv run bash run_example.sh sequence_parallel_example.py || error "sequence parallel example failed"
+    uv run bash run_example.sh fsdp_tp_example.py || error "2D parallel example failed"
 }
 
-function clean() {
-    cd $BASE_DIR
-    echo "running clean to remove cruft"
+function distributed_ddp() {
+    uv run main.py || error "ddp example failed"
 }
 
 function run_all() {
-    distributed
+    run distributed/tensor_parallelism
+    run distributed/ddp
 }
 
 # by default, run all examples
@@ -54,7 +66,7 @@ else
   for i in $(echo $EXAMPLES | sed "s/,/ /g")
   do
     echo "Starting $i"
-    $i
+    run $i
    echo "Finished $i, status $?"
   done
 fi
```
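
Both `run_all` and the comma-separated argument loop now dispatch through a
`run` helper from `utils.sh`, which is not shown in this diff view. A plausible
sketch of it, assuming it maps an example path such as `distributed/ddp` to the
matching `distributed_ddp` function and wraps it with the `start`/`stop` hooks
sketched in the commit message above (hypothetical, not the committed code):

```
function run() {
    local example="$1"                           # e.g. "distributed/ddp"
    local fn=$(echo "$example" | sed "s@/@_@g")  # -> "distributed_ddp"
    cd "$BASE_DIR/$example" || error "no such example: $example"
    start      # setup: optional venv creation + dependency install
    "$fn"      # execute the example's test body
    stop       # teardown: remove any venv created by start
    cd "$BASE_DIR"
}
```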