forked from pytorch/examples
[pull] main from pytorch:main #3
Open: pull wants to merge 128 commits into isLinXu:main from pytorch:main
typo infinitely
* Add mps device
* Add --mps to run_python_examples.sh
* Update imagenet with mps device
* Use curl in run_python_examples.sh to accommodate macOS devices
* Fix for https://github.com/pytorchq/examples/issues/1060
* Update .gitignore
* Add code for multinode training on slurm
* filtered-clone examples, update script path
* python3.7 -> python3

* Adds files for minGPT training with DDP
* filtered-clone, update script path, update readme
* add refs to karpathy's repo
* add training data
* add AMP training
* delete raw data file, update index.rst
* Update gpt2_train_cfg.yaml
After training a model on a Mac with the mps option, running the generate script fails with the runtime error "Placeholder storage has not been allocated on MPS device". This change avoids that error.
Add set_epoch for shuffling inputs, fix arg order
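The role of `set_epoch` can be illustrated with a small sketch. It is not the code from the commit: `dataset`, `epoch_order`, and the world size of 2 are hypothetical, and `num_replicas` and `rank` are passed explicitly so that no process group has to be initialized. `DistributedSampler` derives its shuffle from the seed plus the epoch number, so calling `set_epoch` each epoch is what changes the ordering between epochs.

```python
from torch.utils.data.distributed import DistributedSampler

# Hypothetical toy dataset; anything with a length works for the sampler.
dataset = list(range(8))


def epoch_order(epoch, rank, world_size=2):
    """Return the indices that one rank would visit in the given epoch."""
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=True, seed=0
    )
    # The shuffle is seeded with seed + epoch, so the epoch must be set
    # before iterating.
    sampler.set_epoch(epoch)
    return list(sampler)


for epoch in range(2):
    print(epoch, epoch_order(epoch, rank=0), epoch_order(epoch, rank=1))
```

In a real training loop the sampler is created once and `sampler.set_epoch(epoch)` is called at the top of each epoch, before the `DataLoader` is iterated.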
…ORCE (#1083) Replace list with deque to obtain O(1) time complexity of insertion at the beginning of the list of returns
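The deque change described above can be sketched as follows. This is a minimal illustration, not the example's actual code: `discounted_returns`, `rewards`, and `gamma` are hypothetical names. Walking the rewards backward and prepending each running return costs O(1) per step with `collections.deque.appendleft`, whereas `list.insert(0, ...)` has to shift every existing element.

```python
from collections import deque


def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every time step t."""
    returns = deque()
    running = 0.0
    # Iterate from the last reward to the first, accumulating the return.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.appendleft(running)  # O(1) insertion at the front
    return list(returns)


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```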
* Example of MNIST using RNN
* Example of MNIST using RNN: Changed RNN type to LSTM and changed variable names
* Example of MNIST using RNN: Resolving review comments
* Example of MNIST using RNN: Removing unintentional new line
val data should not shuffle
…PI and deprecate the old one (#1099)
* [PT-D][Tensor Parallel] Update the example for TP to use DTensor and new TP API

* word language model on Jetson NX
  When running the word language model on Jetson NX, the original main.py fails because torch (NVIDIA official pytorch docker image: `l4t-pytorch:r35.1.0-pth1.11-py3`) does not have the `mps` backend. This modification fixes the problem.
* Update word_language_model/main.py
  That's better!
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
set type of batch_size argument to int
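Why the argument type matters can be shown with a short sketch. The flag name `--batch_size`, the default, and the description are assumptions made for illustration, not taken from the changed script. Without `type=int`, `argparse` keeps a value given on the command line as a string, which then fails or misbehaves wherever an integer batch size is expected.

```python
import argparse

# Hypothetical parser; only the type=int detail reflects the commit message.
parser = argparse.ArgumentParser(description="hypothetical training script")
parser.add_argument("--batch_size", type=int, default=64,
                    help="input batch size for training")

# Parse an explicit argument list so the sketch runs without a command line.
args = parser.parse_args(["--batch_size", "128"])
print(type(args.batch_size).__name__, args.batch_size)  # int 128
```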
Summary:
1. Pick specific version of torchvision to fix dependency errors
2. Pin numpy to be below version 2.
3. Update Python version in python tests.

Test Plan: Tested locally.

Summary: Fix up the FSDP tutorial to get it functional again.
1. Add missing import for load_dataset.
2. Use `checkpoint` instead of `_shard.checkpoint` to get rid of a warning.
3. Add nlp to requirements.txt
4. Get rid of `load_metric` as this function does not exist in new `datasets` module.
5. Add `legacy=False` to get rid of tokenizer warnings.

Test Plan: Ran the tutorial as follows and ensured that it ran successfully:
```
torchrun --nnodes=1 --nproc_per_node=2 T5_training.py
W1031 09:46:49.166000 2847649 torch/distributed/run.py:793]
W1031 09:46:49.166000 2847649 torch/distributed/run.py:793] *****************************************
W1031 09:46:49.166000 2847649 torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1031 09:46:49.166000 2847649 torch/distributed/run.py:793] *****************************************
dict_keys(['train', 'validation', 'test'])
Size of train dataset: (157252, 3)
Size of Validation dataset: (5599, 3)
dict_keys(['train', 'validation', 'test'])
Size of train dataset: (157252, 3)
Size of Validation dataset: (5599, 3)
bFloat16 enabled for mixed precision - using bfSixteen policy
```
correct `model.train` description to be 'Put model into training mode' as opposed to 'Put model into inference mode'
* Add requirements.txt to examples which miss them
  Signed-off-by: Dmitry Rogozhkin <[email protected]>
* Update numpy requirement for reinforcement_learning to be <2
  The current version of the example requires `numpy<2`; otherwise the following error can be seen:
  ```
  AttributeError: module 'numpy' has no attribute 'bool8'. Did you mean: 'bool'?
  ```
  Signed-off-by: Dmitry Rogozhkin <[email protected]>
* Update torch requirement for time and word examples to be <2.6
  The current versions of the examples require `torch<2.6`; otherwise the following error can be seen:
  ```
  File "/pytorch/examples/time_sequence_prediction/train.py", line 47, in <module>
    data = torch.load('traindata.pt')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pytorch/examples/time_sequence_prediction/.venv/lib/python3.12/site-packages/torch/serialization.py", line 1524, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
  ```
  Signed-off-by: Dmitry Rogozhkin <[email protected]>
* Respect each example requirements and use uv
  This commit introduces a few changes to CI by modifying `run_*_examples.sh` and the respective github workflows:
  * Switched to uv
  * Added tearup and teardown stages for tests (`start()` and `stop()` methods wrapping up test bodies - these are called automatically)
  * Tearup (`start()`) installs example dependencies and, optionally (if `VIRTUAL_ENV=.venv` is passed), creates a uv virtual environment
  * Teardown (`stop()`) removes the uv virtual environment if it was created (to save space)
  * If `VIRTUAL_ENV` is not set, the scripts expect to be executed in an existing virtual environment. This can be `python -m venv`, `uv env` or `conda env`. In this case example dependencies will be installed in this environment, potentially reinstalling existing packages (including `torch`!).
  * Dropped automated detection of CUDA platform. Now scripts require `USE_CUDA=True` to be passed explicitly
  * Added `PIP_INSTALL_ARGS` environment variable to be passed to `uv pip install` calls for each example's dependencies. This allows adjusting torch indices and other options.

  Execute all tests in current virtual environment (might rewrite packages):
  ```
  ./run_distributed_examples.sh
  ```
  Execute all tests creating separate environment for each example:
  ```
  VIRTUAL_ENV=.venv ./run_distributed_examples.sh
  ```
  Run with CUDA:
  ```
  USE_CUDA=True ./run_distributed_examples.sh
  ```
  Adjust index:
  ```
  PIP_INSTALL_ARGS="--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html" \
    ./run_distributed_examples.sh
  ```
  Signed-off-by: Dmitry Rogozhkin <[email protected]>
---------
Signed-off-by: Dmitry Rogozhkin <[email protected]>
Update GitHub Actions workflow and requirements for documentation build
- Upgrade actions/checkout from v2 to v4
- Refactor dependencies installation in the workflow
- Pin sphinx version to 5.3.0 in requirements.txt with descriptions
Signed-off-by: jafraustro <[email protected]>
Refactor GAT example to utilize `torch.accelerator` API
The `torch.accelerator` API abstracts some of the accelerator specifics in user scripts. By leveraging this API, the code becomes more adaptable to various hardware accelerators.
Signed-off-by: jafraustro <[email protected]>
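A device-selection helper in the spirit of this refactor might look like the sketch below. It is an illustration under stated assumptions, not the GAT example's code: `pick_device` is a hypothetical name, and the `getattr` guard is added here only so the sketch also runs on torch builds that predate `torch.accelerator`.

```python
import torch


def pick_device() -> torch.device:
    """Return the current accelerator device if one is available, else CPU."""
    # Guard added for this sketch: older torch builds lack torch.accelerator.
    accel = getattr(torch, "accelerator", None)
    if accel is not None and accel.is_available():
        return accel.current_accelerator()
    return torch.device("cpu")


device = pick_device()
# Tensors and models are moved to the selected device in the same way,
# regardless of which backend (cuda, mps, xpu, ...) is present.
x = torch.ones(2, 3, device=device)
print(device, x.device)
```

The point of the design is that the script no longer branches on individual backends; one call decides the device and everything else uses `device`.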
* Use torch.accelerator API in VAE example
* Use torch.accelerator API in VAE examples, fix README

* FSDP2 example
* update README
* fix typo in README
* fix README

* Fix documentation build: GitHub Actions, Sphinx 5.3.0, and theme compatibility
* simplified workflow and remove redundant configs
* removed venv activation line
* removed unnecessary line
Signed-off-by: dggaytan <[email protected]>
Update the example usage of `torch.load()` with required safe globals. Signed-off-by: Dmitry Rogozhkin <[email protected]>
* save model in global rank 0 in multinode
* set_epoch only when training
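The first item distinguishes global rank from local rank. A minimal sketch of that check follows, assuming the job is launched with `torchrun`, which sets the `RANK` (global) and `LOCAL_RANK` (per-node) environment variables. `is_global_rank_zero` is a hypothetical helper, not a function from the example.

```python
import os


def is_global_rank_zero(env=None):
    """True only for the single process whose global RANK is 0."""
    env = os.environ if env is None else env
    # Default to "0" so a plain single-process run still saves.
    return int(env.get("RANK", "0")) == 0


# On the second node of a 2x4 job, the first local process has
# LOCAL_RANK=0 but RANK=4, so it must not write the checkpoint.
print(is_global_rank_zero({"RANK": "0", "LOCAL_RANK": "0"}))  # True
print(is_global_rank_zero({"RANK": "4", "LOCAL_RANK": "0"}))  # False
```

Checking `LOCAL_RANK == 0` instead would let one process per node write the same checkpoint, which is the situation the commit title appears to address.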
…ccelerator API (#1342)
* Restore default CI configuration for VAE and Siamese examples using Accelerator API
* Update Siamese Readme for consistency with accelerator argument
* Update siamese_network/README.md
  Co-authored-by: Dmitry Rogozhkin <[email protected]>
* Update siamese_network/README.md
  Co-authored-by: Dmitry Rogozhkin <[email protected]>
* Update siamese_network/main.py
  Co-authored-by: Dmitry Rogozhkin <[email protected]>
* Update vae/main.py
  Co-authored-by: Dmitry Rogozhkin <[email protected]>
* Improve Readme files for clearer descriptions
* Update Readme file structure to enhance organization
---------
Co-authored-by: Dmitry Rogozhkin <[email protected]>
Signed-off-by: eromomon <[email protected]>
Signed-off-by: Edgar Romo Montiel <[email protected]>
…order Co-authored-by: Dmitry Rogozhkin <[email protected]>
…after each call to word_language_model/main.py
Update super_resolution example to support accelerate API
* Add Differentiable Physics: Mass-Spring System example
* Add differentiable_physics to run_all() in test script
* Add visualization and update training code in mass_spring.py
* Finalize differentiable_physics with visualization and CI integration
* Finalize differentiable_physics with visualization and CI integration
* Finalize differentiable_physics with the updates
* Update requirements.txt for differentiable_physics
* Update run_python_examples.sh to test differentiable_physics in CI
* Add mass spring example and update requirements
* Add mass spring example and update requirements
* Updated README and visualization from corporate ID (abhitorch81)
* Update readme.md
---------
Co-authored-by: Abhishek Nandy <[email protected]>
Created by pull[bot]