Pulse · pytorch/pytorch · GitHub

June 30, 2025 – July 7, 2025

Overview

163 Active pull requests

211 Active issues

9 Pull requests merged by 2 people

Fix cuda 12.9 aarch64 GPU builds. Update CUDA_STABLE variable.
#157641 merged Jul 4, 2025
Remove +PTX from CUDA 12.8 builds
#157634 merged Jul 4, 2025
Cleanup leftover miniconda brew installation
#157567 merged Jul 4, 2025
Fix GITHUB_OUTPUT syntax in create_release.yml workflow
#157539 merged Jul 4, 2025
[aarch64] Add back NCCL lib to cuda arm wheel
#157105 merged Jul 4, 2025
[MPS] Revert cumsum/cumprod to MPSGraph implementation
#157494 merged Jul 3, 2025
[ez] Disable some failing periodic tests
#157560 merged Jul 3, 2025
Revert "Update triton version to 3.4"
#157471 merged Jul 2, 2025
[ROCm] Bump AOTriton to 0.10b
#156845 merged Jun 30, 2025

154 Pull requests opened by 85 people

[distributed] build enum for Backend class
#157263 opened Jun 30, 2025
Fix init CUDA preload: get correct versions (#147001)
#157264 opened Jun 30, 2025
[inductor] fix tensor.to(uint8) error when tensor src type is float
#157267 opened Jun 30, 2025
Fix the Problems About Defining Static Variable in Inline Function
#157269 opened Jun 30, 2025
[inductor][templates] Finalize all registered hooks
#157270 opened Jun 30, 2025
Update docs dependencies
#157287 opened Jun 30, 2025
[nativert] add memory overlap debug assertion
#157290 opened Jun 30, 2025
adding the ability to record aten arg vals and types
#157291 opened Jun 30, 2025
Fixes typo in nccl_window_registration test
#157293 opened Jun 30, 2025
Enable `file_descriptor` strategy on Darwin
#157295 opened Jun 30, 2025
Test re-enabling ET test
#157298 opened Jun 30, 2025
[AOTI][experiment]
#157301 opened Jun 30, 2025
[dynamo] Fix source for lru_cache method
#157308 opened Jun 30, 2025
Fix inconsistent pybind11 usage across ONNX and Tensorpipe during CMake build
#157309 opened Jun 30, 2025
Using torch.accelerator in comm_mode_features_example.py and visualize_sharding_example.py
#157317 opened Jun 30, 2025
Making input dynamically adjust.
#157324 opened Jun 30, 2025
Add inductor lowerings for adaptive_avg_pool3d/adaptive_max_pool3d
#157331 opened Jun 30, 2025
[BE] Rename TorchVersion -> VersionString
#157333 opened Jul 1, 2025
Make the name assert actually do something, and reserve some more names
#157342 opened Jul 1, 2025
[dynamo] Replace unimplemented with unimplemented_v2 in `torch/_dynamo/variables/torch.py`
#157344 opened Jul 1, 2025
Add Intel GPU info collection to the collect env script
#157351 opened Jul 1, 2025
[cherry-pick] temporarily disabling generation of weblinks for torch v2.8 …
#157353 opened Jul 1, 2025
[BE] Update xpu driver repo for CD used almalinux 8.10
#157356 opened Jul 1, 2025
[BE] fix typo: inpt -> input
#157361 opened Jul 1, 2025
Fix diagnostic message for CUDA version mismatch in cuda.cmake
#157370 opened Jul 1, 2025
[HF][DCP] Upload local consolidated files to remote storage if needed
#157371 opened Jul 1, 2025
[submodule][cutlass] Update pin to b995f93 v4.0.0
#157376 opened Jul 1, 2025
[release/2.8] update Triton 3.4 pin to f81f19a7
#157377 opened Jul 1, 2025
101385: Warning message when non-coo tensors are passed to `is_sparse`
#157378 opened Jul 1, 2025
[inductor] Fix memory layout for concatenation of repeated input
#157380 opened Jul 1, 2025
[multi-kernel][fix-comments] attempt-1
#157384 opened Jul 1, 2025
[CI] Fixes CI for CUDA Version > 12.9
#157385 opened Jul 1, 2025
Add explicit typing to nn.Module.__init__() parameters
#157389 opened Jul 1, 2025
[xpu] Correctly load RNG state during XPU checkpointing
#157390 opened Jul 1, 2025
[dynamic shapes] allocate fresh symbols for slice
#157392 opened Jul 1, 2025
Fix is_unaligned usage of statically_known_true
#157400 opened Jul 1, 2025
[SymmMem] Install NVSHMEM wheel in CI docker
#157411 opened Jul 2, 2025
[cherry-pick] Organize BUCK for torch/standalone and Rename torch::standalone to headeronly
#157418 opened Jul 2, 2025
Preserve current stream in TestCuda::test_stream_compatibility
#157421 opened Jul 2, 2025
[PowerPC] Fixed build issue for vsx vec256 complexfloat and scaled_mm_out_cpu
#157422 opened Jul 2, 2025
Add a flag "realized" in IRNode to enable tracking origin_nodes
#157423 opened Jul 2, 2025
[Refactor][XPU] Refactor XPU quantization op and add header files.
#157430 opened Jul 2, 2025
[build] make SDist buildable: bootstrap git repo and submodules
#157432 opened Jul 2, 2025
[test] Yanbing/tf32 dev
#157433 opened Jul 2, 2025
Add a test for checking that the CUDA stubs directory is not in libcaffe2_nvrts.so's RPATH or RUNPATH
#157437 opened Jul 2, 2025
Fix FlexAttention int64 indexing for large tensors
#157447 opened Jul 2, 2025
[inductor][user triton] sanitize triple-quoted docstrings in kernel definitions
#157454 opened Jul 2, 2025
Add legacy note to autograd.profiler doc.
#157459 opened Jul 2, 2025
[PowerPC]: Fixed build issue that occur because of datatype f8 enablement for onednn in qlinear and prepack
#157469 opened Jul 2, 2025
[dynamo] Add an assertion in guards to fail early for non-sequence length checks
#157478 opened Jul 2, 2025
[wip] torch._dynamo.save/load() for saving and loading compiled models.
#157481 opened Jul 2, 2025
Fix typo: 'tracable' → 'traceable' in torch/_dynamo/variables/torch.py
#157483 opened Jul 2, 2025
[BE] rewrite `CacheBase` and `LocalCache` as generics
#157493 opened Jul 2, 2025
Add test for user-managed weights with load_state_dict
#157496 opened Jul 2, 2025
[EXPERIMENTL][dynamo] Remove `input_source_to_var`
#157497 opened Jul 2, 2025
Add `max_pool3d` backward pass for MPS
#157498 opened Jul 2, 2025
[EXPERIMENTAL] turn on `torch._dynamo.config.capture_scalar_outputs` by default
#157499 opened Jul 2, 2025
[EXPERIMENTAL] turn on `torch._dynamo.config.capture_dynamic_output_shape_ops` by default
#157500 opened Jul 2, 2025
[DeviceMesh] Use user set backend and pg option even for the global mesh
#157501 opened Jul 2, 2025
[autograd] Avoid creating and recording event when unnecessary
#157503 opened Jul 2, 2025
[1/N] cost coverage improvment
#157504 opened Jul 2, 2025
[refactor][dynamo] make BUILD_TUPLE instruction use inst.arg
#157505 opened Jul 2, 2025
[WIP][FSDP2] support dataclass args/kwargs and output
#157506 opened Jul 2, 2025
[DO NOT MERGE] Clone of PR #157309
#157507 opened Jul 2, 2025
[wip] inspect output code
#157508 opened Jul 2, 2025
[ONNX] Fix conversion of attention - 4D
#157509 opened Jul 2, 2025
[wip] merge async and progressive
#157510 opened Jul 2, 2025
[dynamo] fix infinite loop in computing all stack meta
#157511 opened Jul 2, 2025
[dynamo] Fix bug in dict(mapping_proxy)
#157515 opened Jul 2, 2025
[PGO] include module int attributes in PGO state
#157518 opened Jul 3, 2025
[cherry-pick] [fake tensor] fix issue of no attribute tags (#156689)
#157519 opened Jul 3, 2025
Enable TF32 as fp32 internal precision for matmul/linear/conv
#157520 opened Jul 3, 2025
[c10d] support dynamic shapes for all_to_all_single_autograd
#157521 opened Jul 3, 2025
[DeviceMesh] Add error when users try to slice non contiguous flattened dim submesh
#157523 opened Jul 3, 2025
[Easy] Show some clear error when torch.ops.load_library fails.
#157524 opened Jul 3, 2025
[br][pc] consolidate attempt 1
#157526 opened Jul 3, 2025
[dynamo, docs] add dynamo programming model docs
#157527 opened Jul 3, 2025
[WIP] avoid unnecessary slices
#157528 opened Jul 3, 2025
[FSDP2] Use reduceOpSum for world size 1
#157529 opened Jul 3, 2025
Fix typo: 'reset_paramteres' → 'reset_parameters' in transformer.cpp
#157536 opened Jul 3, 2025
handling special case for pow(3) for GPU
#157537 opened Jul 3, 2025
Don't try installing missing cuda dependencies on s390x
#157540 opened Jul 3, 2025
S390x update test marks
#157541 opened Jul 3, 2025
[indcutor] pack linear for FP32 dynamic mode
#157542 opened Jul 3, 2025
Add is_hidden_event method to KinetoEvent Python interface
#157546 opened Jul 3, 2025
[BE][1/5] fix typos in aten/
#157550 opened Jul 3, 2025
[BE][2/5] fix typos in aten/ (aten/src/ATen/native/)
#157551 opened Jul 3, 2025
[BE][3/5] fix typos in aten/ (aten/src/ATen/native/)
#157552 opened Jul 3, 2025
[BE][4/5] fix typos in aten/ (aten/src/ATen/native/)
#157553 opened Jul 3, 2025
[BE][5/5] fix typos in aten/ (aten/src/ATen/)
#157554 opened Jul 3, 2025
Try adding sm_50-sm_70 arches for linux cuda 12.8 builds
#157558 opened Jul 3, 2025
Linux py 3.14 wheel builds
#157559 opened Jul 3, 2025
[PT2][memory] mutation size correctness
#157562 opened Jul 3, 2025
[PT2][fusion] ban fusions with large accumulated reads
#157563 opened Jul 3, 2025
[dynamo] [guard] Change the guard type of inside disable function to avoid unnecessary recompilation.
#157566 opened Jul 3, 2025
[MPS][DO NOT MERGE] CI signals for conv nan issue on macOS CPU
#157568 opened Jul 3, 2025
[simplefsdp auto-bucketing] ir node runtime estimation
#157572 opened Jul 3, 2025
Test case for nanogpt
#157576 opened Jul 3, 2025
Fixed the function to get the origin nodes of fused triton kernel.
#157578 opened Jul 3, 2025
[fbcode] switch to cutlass-4
#157579 opened Jul 3, 2025
allow user to pass in custom partitioner function
#157580 opened Jul 3, 2025
Fix typo: 'initalizer' → 'initializer' in test_reductions.cpp
#157581 opened Jul 3, 2025
allow _size_of to return individual element's size
#157582 opened Jul 3, 2025
correctly import torch.version
#157584 opened Jul 3, 2025
[CUDA][NVTX] use `pytorch` nvtx domain for pytorch ranges
#157586 opened Jul 3, 2025
Add einops x torch.compile testing in PyTorch CI (#157416)
#157588 opened Jul 3, 2025
Add stack trace of exception to MultiProcContinousTest
#157589 opened Jul 3, 2025
Add master switch for aot_inductor.compile_standalone
#157590 opened Jul 3, 2025
[AOTI] Split aoti_runtime/model.h to prepare for model static linking
#157592 opened Jul 3, 2025
Fix einsum strategy shard dim > ndim
#157593 opened Jul 3, 2025
[dynamo] Move skipIf decorator to class level in test_fx_graph_runnable
#157594 opened Jul 3, 2025
Fix doc issue 153531 by adding further explanation of STFT equation
#157595 opened Jul 3, 2025
Fix typo: 'inital_grad' → 'initial_grad' in FSDP test
#157596 opened Jul 3, 2025
Fix einops x torch.compile interaction
#157600 opened Jul 4, 2025
[DRAFT] DDE-Free select with unbacked index.
#157605 opened Jul 4, 2025
[aot] add format_consts_to_cpp function for further development.
#157608 opened Jul 4, 2025
[Device] Add support for PrivateUse1 device type in parse_type function
#157609 opened Jul 4, 2025
[pruning] add more test cases for pruning
#157613 opened Jul 4, 2025
tlparse remove duplicate reasons
#157618 opened Jul 4, 2025
[pruning] Implement Taylor expansion unstructured pruning
#157620 opened Jul 4, 2025
[nativert] Move ModelRunnerBase to oss.
#157633 opened Jul 4, 2025
[BE][1/6] fix typos in test/
#157635 opened Jul 4, 2025
[BE][2/6] fix typos in test/ (test/test_*.py)
#157636 opened Jul 4, 2025
[BE][3/6] fix typos in test/
#157637 opened Jul 4, 2025
[BE][6/6] fix typos in test/ (test/distributed/)
#157640 opened Jul 4, 2025
Fix typo: 'initalization' → 'initialization' in profiler test comment
#157645 opened Jul 4, 2025
[MemoryViz] Add file selector button
#157647 opened Jul 4, 2025
Fix typo: 'occurance' → 'occurrence' in typing test
#157649 opened Jul 4, 2025
Fix typo: 'paramter' → 'parameter' in dynamo variable comment
#157651 opened Jul 4, 2025
[wip] async cancellation test
#157652 opened Jul 4, 2025
Fix typo: 'reset_paramteres' → 'reset_parameters' in transformer module comments
#157656 opened Jul 4, 2025
Fixes issue 157195 by adding error message
#157658 opened Jul 5, 2025
[wip] merge async and progressive compiles
#157659 opened Jul 5, 2025
Fix typo: 'occurance' → 'occurrence' in lazy extract_compiled_graph.py
#157664 opened Jul 5, 2025
Fix typo: 'initalizer' → 'initializer' in test_reductions.cpp
#157667 opened Jul 5, 2025
Fix 'dllimport attribute ignored on inline function'
#157670 opened Jul 6, 2025
Fix index_put propagate strategy arg unpack error
#157671 opened Jul 6, 2025
Fix torch._numpy advanced indexing to match NumPy when indices are separated
#157676 opened Jul 6, 2025
[pt2 event logging] add configurable prefix
#157678 opened Jul 6, 2025
installing requirements.txt fix
#157681 opened Jul 6, 2025
[dtensor] add support for fused optimizer with parameters across multiple meshes
#157682 opened Jul 7, 2025
[Inductor][Float8] Add float8_e4m3fn into assertion dtype list.
#157684 opened Jul 7, 2025
[BE] add `SHFMT` linter to format shell scripts
#157685 opened Jul 7, 2025
[BE][1/4] format shell scripts with `SHFMT`
#157686 opened Jul 7, 2025
[BE][2/4] format shell scripts with `SHFMT` in .circleci/ and .github/
#157687 opened Jul 7, 2025
[BE][3/4] format shell scripts with `SHFMT` in .ci/
#157688 opened Jul 7, 2025
[BE][4/4] format shell scripts with `SHFMT` in scripts/
#157689 opened Jul 7, 2025
[canary] dedupe args + on by default
#157690 opened Jul 7, 2025
[canary] dedupe args + on by default
#157691 opened Jul 7, 2025
[BE][Easy] add `.editorconfig` setting for C/C++/CUDA/ObjC
#157692 opened Jul 7, 2025
[CI] Fix xpu ci test sccache issue
#157693 opened Jul 7, 2025
fix storage use_count
#157694 opened Jul 7, 2025
[SymmMem] find_path does not search /usr/local/lib
#157695 opened Jul 7, 2025
Update slow tests
#157696 opened Jul 7, 2025

84 Issues closed by 25 people

DISABLED test_ranks_and_tag (__main__.CompileTest)
#147974 closed Jul 7, 2025
DISABLED test_dont_dce_rand (__main__.ReproTests)
#156580 closed Jul 7, 2025
DISABLED test_add_complex_conj (__main__.ReproTests)
#156579 closed Jul 7, 2025
DISABLED test_tracker_with_activation_checkpointing (__main__.TestTrackerFullyShard1DTrainingCompose)
#139814 closed Jul 7, 2025
DISABLED test_tracker_non_root_forward_backward (__main__.TestTrackerFullyShard1DTrainingCore)
#129692 closed Jul 7, 2025
DISABLED test_non_contiguous_input_mm_plus_mm (__main__.TestMaxAutotune)
#126867 closed Jul 7, 2025
DISABLED test_aoti (__main__.TestMemoryPlanning)
#145211 closed Jul 7, 2025
DISABLED test_graph_partition_forward_backward_not_called (__main__.CudaGraphTreeTests)
#157642 closed Jul 7, 2025
Will the Metal4 update bring significant optimizations for future pytorch mps performance and compatibility?
#157660 closed Jul 6, 2025
`torch.compile` fails with `UnicodeDecodeError` when model contains extreme value injection
#156451 closed Jul 6, 2025
torch.utils.cpp_extension fails to parse clang version 20.1.7+libcxx
#157665 closed Jul 6, 2025
Mispelled "paramter" in test_fully_shard_training.py
#157564 closed Jul 5, 2025
torch.nonzero(t, as_tuple=...) does not work with the JIT because the as_tuple signatures are not exposed properly
#45499 closed Jul 5, 2025
test_ops.py extremely slow on cuda11.3
#79528 closed Jul 5, 2025
Torch.compile Dynamo failed to run FX node with fake tensors
#157657 closed Jul 5, 2025
Fix warning #177-D: variable "threshold" was declared but never referenced
#157653 closed Jul 5, 2025
DISABLED test_is_isnot (__main__.TestScript)
#120694 closed Jul 4, 2025
DISABLED test_sdpa_mask_fp16_L6_S17_NH23_HS121 (__main__.TestSDPA)
#138905 closed Jul 4, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int32 (__main__.TestForeachCUDA)
#156497 closed Jul 4, 2025
DISABLED test_Linear_cuda_tf32 (__main__.TestNN)
#155216 closed Jul 4, 2025
DISABLED test_graph_partition_forward_backward (__main__.CudaGraphTreeTests)
#157615 closed Jul 4, 2025
Importing `torch` overwrites `typing.TypeIs` when `_running_with_deploy()` is true.
#153942 closed Jul 4, 2025
INTERNAL ASSERT FAILED in mse_loss when mixing CPU and CUDA tensors
#154978 closed Jul 4, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int16 (__main__.TestForeachCUDA)
#156430 closed Jul 4, 2025
DISABLED test_graph_partition_dynamic_shapes (__main__.CudaGraphTreeTests)
#157555 closed Jul 4, 2025
[aot_compile]Explanation: Dynamo does not know how to trace the builtin `time.time.`
#157352 closed Jul 4, 2025
[inductor] `F.fractional_max_pool2d` throws `AssertionError` on Inductor when input `rank=3`
#156682 closed Jul 4, 2025
Set inplace operations are not updating the set inplace
#153552 closed Jul 4, 2025
dynamo cannot trace global op_set .__contains__
#145761 closed Jul 4, 2025
Why scale value of GradScaler sudden changed?
#157436 closed Jul 4, 2025
Incorrect inference of the groups parameter type for channel_stuffle (int misclassified as Tensor)
#157602 closed Jul 4, 2025
A more flexible API for torch.compile fullgraph=True
#144908 closed Jul 3, 2025
Suggestion: integration of einops test suite
#146782 closed Jul 3, 2025
DISABLED test_set_stance_aot_eager_then_compile (__main__.DecoratorTests)
#148644 closed Jul 3, 2025
DISABLED test_graph_partition_custom_op_no_split (__main__.CudaGraphTreeTests)
#157532 closed Jul 3, 2025
Wrong vector shift results on PowerPC
#109777 closed Jul 3, 2025
Enhanced torch.chunk and torch.split
#60531 closed Jul 3, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64 (__main__.TestForeachCUDA)
#153544 closed Jul 3, 2025
DISABLED test_graph_partition_custom_op_mutation (__main__.CudaGraphTreeTests)
#157448 closed Jul 3, 2025
DISABLED [WORKFLOW_NAME] / [PLATFORM_NAME] / [JOB_NAME]
#157530 closed Jul 3, 2025
[CPU][flex attention] Llama 3 failed on CPU with PyTorch 2025-06-22 nightly wheel
#156688 closed Jul 3, 2025
an illegal memory access was encountered global exception
#136407 closed Jul 3, 2025
Bug with "make latexpdf"
#135420 closed Jul 3, 2025
pytorch 2.1.2+cu118, RTX 8000, backward show RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false.
#136979 closed Jul 3, 2025
some tests regarding `torch.export` in `transformers` fail with `torch 2.8.0 rc` but pass with `torch 2.7.1`
#157284 closed Jul 2, 2025
Tiny Typo in Docs
#157444 closed Jul 2, 2025
[inductor] vision_maskrcnn dashboard failure on H100 (and MI300)
#157316 closed Jul 2, 2025
Profiler: Add hide metadata flag to skip events in key_averages() table
#155213 closed Jul 2, 2025
torch.compile triton kernel errors when there are """ docblocks
#155006 closed Jul 2, 2025
DISABLED test_graph_partition_custom_op_dynamoc_shapes (__main__.CudaGraphTreeTests)
#157426 closed Jul 2, 2025
DISABLED test_graph_partition_custom_op (__main__.CudaGraphTreeTests)
#157412 closed Jul 2, 2025
AttributeError: '_OpNamespace' 'aten' object has no attribute 'momentum'
#145274 closed Jul 2, 2025
DISABLED test_name_match (__main__.TestGuardSerialization)
#156246 closed Jul 2, 2025
DISABLED test_shape_env (__main__.TestGuardSerialization)
#156264 closed Jul 2, 2025
DISABLED test_graph_partition_cpu_tensor_symints (__main__.CudaGraphTreeTests)
#157366 closed Jul 2, 2025
torch.export produce stack_trace for output node that can fail decomposition
#157183 closed Jul 1, 2025
[MPS] `torch.compile` fails on `torch.linalg.cholesky` (possible memory layout issue?)
#156658 closed Jul 1, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32 (__main__.TestForeachCUDA)
#153470 closed Jul 1, 2025
test issue, ignore this
#157151 closed Jul 1, 2025
the example program using libtorch is not linked against torch_cuda even when USE_CUDA is defined
#148770 closed Jul 1, 2025
DISABLED test_graph_partition_cpu_scalar_mutation (__main__.CudaGraphTreeTests)
#157358 closed Jul 1, 2025
[ROCm] support torch._C._set_sm_carveout_experimental - Parity with Nvidia
#149280 closed Jul 1, 2025
DISABLED test_graph_partition_cpu_scalar4 (__main__.CudaGraphTreeTests)
#157347 closed Jul 1, 2025
DISABLED test_graph_partition_cpu_scalar3 (__main__.CudaGraphTreeTests)
#157338 closed Jul 1, 2025
[inductor] [triton backend] `Conv2d-unsqueeze-AdaptiveAvgPool3d` output incorrect results on inductor
#157248 closed Jul 1, 2025
DISABLED test_graph_partition_cpu_scalar2 (__main__.CudaGraphTreeTests)
#157311 closed Jul 1, 2025
avoid guarding on max() unnecessarily
#149635 closed Jun 30, 2025
[Upstream Triton] Support new host-side TMA API in user-defined triton kernels
#155574 closed Jun 30, 2025
[feature request][AOTI] Expand check input assertions to cover input guards created during compilation?
#151925 closed Jun 30, 2025
DISABLED test_lowering_to_x86 (__main__.TestQuantizePT2EX86Inductor)
#153140 closed Jun 30, 2025
aot inductor intermediate tensor debug printing (setting 2) not working
#145425 closed Jun 30, 2025
Certain MPS operations didn't properly check for data type
#157303 closed Jun 30, 2025
Missing MPS-compatible build for PyTorch 2.7.1 on Apple Silicon (M4)
#157271 closed Jun 30, 2025
Native BFloat16 Mixed BatchNorm Train gives incorrect gradients
#156513 closed Jun 30, 2025
[release] Make pytorch source distribution package respect pep-0517
#150461 closed Jun 30, 2025
DISABLED test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn)
#75648 closed Jun 30, 2025
DISABLED test_graph_partition_cpu_scalar1 (__main__.CudaGraphTreeTests)
#157277 closed Jun 30, 2025
Why is there such a big difference in size between the torch CUDA 12.6 whl and the cuda 12.9 whl? Why is only the 12.6 whl package placed on the PyPI source?
#157265 closed Jun 30, 2025
DISABLED test_quantize (__main__.TestOpenReg)
#156089 closed Jun 30, 2025
DISABLED test_jacobian_vectorize_raises_no_warnings_logging_tensor (__main__.TestAutogradFunctional)
#153707 closed Jun 30, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16 (__main__.TestForeachCUDA)
#153379 closed Jun 30, 2025
RuntimeError: d.is_cuda() INTERNAL ASSERT FAILED at "/pytorch/c10/cuda/impl/CUDAGuardImpl.h"
#151486 closed Jun 30, 2025
DISABLED test_reorder_peak_memory (__main__.TestOperatorReorderForPeakMemory)
#145332 closed Jun 30, 2025
DISABLED test_graph_partition_cpu_op_and_dynamic_shapes (__main__.CudaGraphTreeTests)
#157257 closed Jun 30, 2025

127 Issues opened by 74 people

[inductor][fuzzer] Compilation Error in complex64+toint
#157683 opened Jul 7, 2025
CONTRIBUTING.md install command incorrect
#157680 opened Jul 6, 2025
Flex Attention breaks in certain cases when used with a learned bias
#157677 opened Jul 6, 2025
Cannot create a mask for each sequence in a batch with Flex Attention
#157675 opened Jul 6, 2025
extern declaration of the entity XXX is treated as a static definition
#157674 opened Jul 6, 2025
Inductor throws UnicodeDecodeError when compiling a simple model on Windows with MSVC
#157673 opened Jul 6, 2025
Feedback about Getting Started on Intel GPU
#157672 opened Jul 6, 2025
NCCL error caused due to use of NVLS in torch 2.7.1-cu128 on aarch64 gb200 cluster
#157668 opened Jul 6, 2025
ConvNd ops in channel last layout (N,L,C) / (N,H,W,C) / (N,D,H,W,C)
#157663 opened Jul 5, 2025
OffsetBasedRNGTracker's run_state_sync causes deadlock due to inconsistent broadcast order across ranks
#157662 opened Jul 5, 2025
RuntimeError: operator torchvision::nms does not exist
#157648 opened Jul 4, 2025
DISABLED test_vmap_exhaustive_dot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157644 opened Jul 4, 2025
DISABLED test_graph_partition_forward_backward_not_called (__main__.CudaGraphTreeTests)
#157643 opened Jul 4, 2025
Einsum of 2 dtensors fails in inference mode
#157631 opened Jul 4, 2025
Regression: torch.distributed.gather_object segfaults
#157627 opened Jul 4, 2025
Segmentation faults in test_ops.py tests with gcc13 on AArch64 (v1)
#157626 opened Jul 4, 2025
Is there some official method to extract the featuremap of each node in pt2 graph like the function torchvision.models.feature_extraction.create_feature_extractor()
#157625 opened Jul 4, 2025
file_name is not correctly read in here
#157624 opened Jul 4, 2025
`TORCH_DISTRIBUTED_DEBUG=DETAIL` causes DTensors to raise errors
#157622 opened Jul 4, 2025
ResNet Onnx export dynamic batch size exported as fixed batch size
#157621 opened Jul 4, 2025
DISABLED test_vmap_exhaustive_addmv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157617 opened Jul 4, 2025
DISABLED test_graph_partition_forward_backward (__main__.CudaGraphTreeTests)
#157616 opened Jul 4, 2025
`torch.compile` fails with `NotImplementedError: Unsupported for now if query, key, value are the same buffer.` in `flex_attention`
#157612 opened Jul 4, 2025
`torch.compile` fails on `prims.broadcast_in_dim` with alias annotation error
#157610 opened Jul 4, 2025
`torch.compile` fails on `torch.vdot` with complex tensors
#157607 opened Jul 4, 2025
Both DTensor TP and SP are missing the last collective in the backward pass
#157606 opened Jul 4, 2025
Incorrect inference of the groups parameter type for channel_stuffle (int misclassified as Tensor)
#157603 opened Jul 4, 2025
PyTorch 2.7.1 will probably break with einops 0.8.2 or 0.9.0
#157601 opened Jul 4, 2025
PT2E Quantization Migration Tracker
#157591 opened Jul 3, 2025
[DTensor] Better communication cost model for redistribute
#157585 opened Jul 3, 2025
[precompile] Precompile failure on nanogpt training
#157577 opened Jul 3, 2025
torch.compile with numpy code differs from numpy's behavior
#157569 opened Jul 3, 2025
DISABLED test_graph_partition_dynamic_shapes (__main__.CudaGraphTreeTests)
#157556 opened Jul 3, 2025
Add full support for NVIDIA RTX Pro 6000 (Blackwell – SM122 / Compute Capability 12.2)
#157549 opened Jul 3, 2025
Nightly cu128 aarch64 wheels haven't been built for weeks
#157548 opened Jul 3, 2025
Several `torch.*` functions raise uninformative `NotImplementedError`s when called with integer `dtype`
#157547 opened Jul 3, 2025
test_dtensor.py::test_dtensor_save_load_import conflicts with autoloader importing torch._dynamo
#157545 opened Jul 3, 2025
Vmap error raised by mask_mod of FlexAttention
#157543 opened Jul 3, 2025
PyTorch fails to detect AVX through it's detected
#157538 opened Jul 3, 2025
Error in Qwen inference: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/pytorch/c10/cuda/CUDACachingAllocator.cpp
#157535 opened Jul 3, 2025
DISABLED test_vmap_exhaustive___rmatmul___cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157534 opened Jul 3, 2025
DISABLED test_graph_partition_custom_op_no_split (__main__.CudaGraphTreeTests)
#157533 opened Jul 3, 2025
pytorch
#157531 opened Jul 3, 2025
[release 2.9] Deprecate support for Maxwell, Pascal, and Volta architectures
#157517 opened Jul 3, 2025
Failure with cub::TransformInputIterator in 12.9 periodic CI test
#157502 opened Jul 2, 2025
[DTensor] Improve `tensor_metadata` and `redistribute_cost` coverage for op strategies.
#157495 opened Jul 2, 2025
Quantized version of Gather layer
#157490 opened Jul 2, 2025
`FSDPModule.set_reduce_scatter_divide_factor` on subset of parameters is broken?
#157485 opened Jul 2, 2025
torch.ops._c10d_functional_autograd.all_to_all_single missing dynamic shapes support
#157479 opened Jul 2, 2025
torch 2.6 and torchvision 0.21.0 incompatibility?
#157476 opened Jul 2, 2025
[AOTI] Unit test for testing load_state_dict and
#157474 opened Jul 2, 2025
Nightly NCCL builds are missing optional features from NCCL
#157465 opened Jul 2, 2025
vLLM tests failing in torch 2.8rc but passing with torch 2.7
#157461 opened Jul 2, 2025
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: Compiler: cl is not found
#157458 opened Jul 2, 2025
RNN pseudocode wrong?
#157457 opened Jul 2, 2025
Deprecation of CUTLASS Python interface
#157456 opened Jul 2, 2025
we should graph break on nn.Parameter constructors
#157452 opened Jul 2, 2025
Dynamo's einops version check is bogus
#157451 opened Jul 2, 2025
DISABLED test_op_has_batch_rule_vdot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157450 opened Jul 2, 2025
DISABLED test_graph_partition_custom_op_mutation (__main__.CudaGraphTreeTests)
#157449 opened Jul 2, 2025
FlexAttention + int64 indexing
#157446 opened Jul 2, 2025
DDP+TP composition does not work as expected
#157445 opened Jul 2, 2025
[Regression] The torchbench model resnet50_quantized_qat fail_to_run in Pytorch 2.8 but pass in PyTorch 2.7
#157434 opened Jul 2, 2025
``torch.quantile`` edge case
#157431 opened Jul 2, 2025
DISABLED test_graph_partition_custom_op_dynamoc_shapes (__main__.CudaGraphTreeTests)
#157428 opened Jul 2, 2025
DISABLED test_addmm_relu_cuda_float32 (__main__.TestLinalgCUDA)
#157427 opened Jul 2, 2025
The torch.gather documentation states that input and index must have the same number of dimensions. However, no corresponding validation is added.
#157425 opened Jul 2, 2025
nll_loss gives result when both input and target are 1D tensor
#157420 opened Jul 2, 2025
In the torch.Tensor.scatter_ documentation, self, index, and src (if it is a tensor) should have the same number of dimensions, but in practice, the CPU、Gpu does not add a check. Validation needs to be added.
#157419 opened Jul 2, 2025
einops 0.6.1 x torch.compile broken in pytorch nightlies
#157417 opened Jul 2, 2025
DISABLED test_old_cholesky_batched_upper_cuda_float32 (__main__.TestLinalgCUDA)
#157415 opened Jul 2, 2025
DISABLED test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_2_beta_1_0_cuda_float32 (__main__.TestLinalgCUDA)
#157414 opened Jul 2, 2025
DISABLED test_graph_partition_custom_op (__main__.CudaGraphTreeTests)
#157413 opened Jul 2, 2025
[CI] s390x-periodic tests broken with "No matching distribution found for cuda-bindings<13.0,>=12.0"
#157409 opened Jul 2, 2025
[autograd] Slowdown in backward after #151079
#157407 opened Jul 1, 2025
Calling unbind on 2D NestedTensor throws RuntimeError
#157404 opened Jul 1, 2025
AOTI: Failure in compile_fx.py with FakeScriptObject (with possible fix)
#157401 opened Jul 1, 2025
[dynamo] using disable inside of compile always recompiles
#157399 opened Jul 1, 2025
Cannot copy data from one gpu to another using torch
#157398 opened Jul 1, 2025
[dynamo] non-strict trace'd functions cannot return constants
#157397 opened Jul 1, 2025
[FSDP2] figure out the contract for mp_policy and tensor subclass extention
#157395 opened Jul 1, 2025
How to compose HSDP with CP?
#157393 opened Jul 1, 2025
[FSDP2] document the contract for modifying DTensor model.parameters()
#157391 opened Jul 1, 2025
Torch is unusable when cuda-12.4 is installed locally
#157381 opened Jul 1, 2025
[CI] M2Pro MacOS-15 tests are unstable again
#157379 opened Jul 1, 2025
DISABLED test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_2_beta_0_5_cuda_float32 (__main__.TestLinalgCUDA)
#157369 opened Jul 1, 2025
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157368 opened Jul 1, 2025
DISABLED test_graph_partition_cpu_tensor_symints (__main__.CudaGraphTreeTests)
#157367 opened Jul 1, 2025
[MPS] test_linalg_cholesky fails on M4
#157364 opened Jul 1, 2025
torch.Tensor.addmm_ The calculation result is inconsistent with the formula calculation result
#157360 opened Jul 1, 2025
DISABLED test_graph_partition_cpu_scalar_mutation (__main__.CudaGraphTreeTests)
#157359 opened Jul 1, 2025
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 1) (unhinted: Eq(u0, 1)). (Size-like symbols: none)
#157355 opened Jul 1, 2025
Bug in cmake/public/cuda.cmake: Incorrect use of set(${...}) leads to missing CUDA version in error message
#157354 opened Jul 1, 2025
DISABLED test_graph_partition_cpu_scalar4 (__main__.CudaGraphTreeTests)
#157350 opened Jul 1, 2025
DISABLED test_matmul_small_brute_force_3d_Nd_cuda_float32 (__main__.TestLinalgCUDA)
#157349 opened Jul 1, 2025
DISABLED test_addmm_mv_transpose_a_True_transpose_b_False_alpha_0_2_beta_1_0_cuda_float32 (__main__.TestLinalgCUDA)
#157348 opened Jul 1, 2025
DISABLED test_conv2d_api (__main__.TestQuantizedFunctionalOps)
#157346 opened Jul 1, 2025
nn.rmsnorm is super slower than nn.layernorm
#157345 opened Jul 1, 2025
ImportError: cannot import name 'scaled_mm_configs' from 'torch._inductor.kernel.mm_common
#157343 opened Jul 1, 2025
DISABLED test_graph_partition_cpu_scalar3 (__main__.CudaGraphTreeTests)
#157339 opened Jul 1, 2025
DISABLED test_matmul_small_brute_force_2d_Nd_cuda_float32 (__main__.TestLinalgCUDA)
#157337 opened Jul 1, 2025
DISABLED test_addmm_mv_transpose_a_True_transpose_b_False_alpha_0_2_beta_0_5_cuda_float32 (__main__.TestLinalgCUDA)
#157336 opened Jul 1, 2025
DISABLED test_op_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157335 opened Jul 1, 2025
Inefficient 2D convolution compared to JAX
#157334 opened Jul 1, 2025
[inductor][dynamic shapes] hugging face models fail while creating error guard
#157330 opened Jun 30, 2025
Regression in llama2 model export
#157323 opened Jun 30, 2025
Symmetric memory test failed with TORCH_SYMMMEM=NVSHMEM
#157321 opened Jun 30, 2025
Torch Elastic Wait timeout increase
#157318 opened Jun 30, 2025
DISABLED test_op_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157315 opened Jun 30, 2025
DISABLED test_linalg_solve_triangular_cuda_float32 (__main__.TestLinalgCUDA)
#157314 opened Jun 30, 2025
DISABLED test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_2_beta_1_0_cuda_float32 (__main__.TestLinalgCUDA)
#157313 opened Jun 30, 2025
DISABLED test_graph_partition_cpu_scalar2 (__main__.CudaGraphTreeTests)
#157312 opened Jun 30, 2025
PyTorch Tutorial Audit - ONNX
#157300 opened Jun 30, 2025
DISABLED test_tensordot_cuda (__main__.TestLinalgCUDA)
#157297 opened Jun 30, 2025
DISABLED test_conv1d_api (__main__.TestQuantizedFunctionalOps)
#157296 opened Jun 30, 2025
[export] run_decompositions generates inefficient operations
#157289 opened Jun 30, 2025
DISABLED test_linalg_matrix_exp_compare_with_taylor_cuda_float32 (__main__.TestLinalgCUDA)
#157282 opened Jun 30, 2025
DISABLED test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_2_beta_0_0_cuda_float32 (__main__.TestLinalgCUDA)
#157281 opened Jun 30, 2025
DISABLED test_graph_partition_cpu_scalar1 (__main__.CudaGraphTreeTests)
#157280 opened Jun 30, 2025
DISABLED test_addmm_dynamic_shapes_cuda (__main__.DynamicShapesGPUTests)
#157279 opened Jun 30, 2025
DISABLED test_op_has_batch_rule_nn_functional_conv2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157278 opened Jun 30, 2025
several `transformers` tests fail with `torch 2.8 RC` but pass with `torch 2.7.1` on `T4` (but both pass on `A10`)
#157276 opened Jun 30, 2025
`torch 2.8 RC` gives 10000 larger output difference in some `transformers` tests
#157274 opened Jun 30, 2025
`test_can_compile_fast_image_processor` in `transformers` pass with `torch 2.7` but fail with `torch 2.8 RC`
#157273 opened Jun 30, 2025
`torch.reciprocal` and `torch.divide` for Complex `inf` Incorrectly Returns `NaN` Only for Tensors with >= 4 Elements on CPU
#157272 opened Jun 30, 2025
Better typechecking of `int` only-operators `|`, `^`, `&`, `<<`, `>>`, `~` and `@`
#157266 opened Jun 30, 2025
The opp is not compatible with compile mode="reduce-overhead" and linear layers for large inputs.
#157363 opened Jun 30, 2025

418 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[ONNX] remove unnecessary slices before converting into onnx
#157192 commented on Jul 3, 2025 • 23 new comments
[AOTI] codegen for static linkage
#157129 commented on Jul 4, 2025 • 16 new comments
[dynamo] Add FakeProcessGroup support for fx_graph_runnable with distributed collectives
#157162 commented on Jul 4, 2025 • 9 new comments
Fix torch.export.export() GPU failure with RNN modules.
#155734 commented on Jul 1, 2025 • 9 new comments
[AOTI][Intel GPU] Add XPU quantization ops to AOT Inductor.
#156572 commented on Jul 3, 2025 • 8 new comments
[DDP][FSDP2] Add unit test for DDP mixed precision with FSDP2 ignored params
#157140 commented on Jul 2, 2025 • 8 new comments
`fast-autotune`: Model Prediction of Triton Kernel Runtimes
#156851 commented on Jun 30, 2025 • 7 new comments
[DLPack] Add support for missing keyword-arguments.
#150218 commented on Jul 4, 2025 • 5 new comments
Added philox based RNG context for HPU device in Dtensor scenarios
#156581 commented on Jul 3, 2025 • 5 new comments
Optimize scatter/gather kernel for ARM.
#156161 commented on Jul 4, 2025 • 4 new comments
[WIP] Automatically load and save dynamo entries via caching_precompile
#155913 commented on Jul 3, 2025 • 4 new comments
Fused RMSNorm implementation
#153666 commented on Jul 2, 2025 • 4 new comments
multi-kernel matmuls based on varying hint sizes
#156628 commented on Jul 3, 2025 • 4 new comments
[ZENDNN] Integrate ZenDNN library, implement Linear op, add unit-tests
#156599 commented on Jul 2, 2025 • 4 new comments
[dynamo] Avoid recompiling over unused objects
#156891 commented on Jul 3, 2025 • 3 new comments
[ROCm] logsumexp on ROCm needs scaling back to natural base.
#156903 commented on Jul 2, 2025 • 3 new comments
[ci][cutlass backend] Add ci for cutlass backend tests
#156626 commented on Jul 3, 2025 • 3 new comments
Adapting pipeline parallelism test cases to be device agnostic
#155108 commented on Jul 2, 2025 • 3 new comments
[HOP, map] Rework of map autograd to the new interface
#153343 commented on Jul 4, 2025 • 3 new comments
Fix slice op redistribute_cost compute
#157178 commented on Jul 3, 2025 • 3 new comments
[scan] Fix issues with scan on CPU and for autograd when implementing an RNN with multiple layers
#155422 commented on Jul 2, 2025 • 3 new comments
[Inductor] Set the default value of min_chunk_size to 512
#150762 commented on Jul 2, 2025 • 3 new comments
Add cascade sum support for Inductor CPP backend
#156296 commented on Jul 3, 2025 • 2 new comments
Fix: fallback in deserialize_torch_artifact for ScriptObject using weights_only=False
#154333 commented on Jul 1, 2025 • 2 new comments
[BE] add a linter to check consistency for cmake minimum version in requirements
#156961 commented on Jul 3, 2025 • 2 new comments
[TEST] triton Update 3.4 - 2
#156664 commented on Jul 4, 2025 • 2 new comments
Update _torch_docs.py to Fix torch.bernoulli()
#152104 commented on Jul 2, 2025 • 2 new comments
[build] remove upper version pin for `setuptools<80.0`
#156049 commented on Jul 4, 2025 • 2 new comments
[generator] Close all open generators in compile_subgraph
#157149 commented on Jul 2, 2025 • 2 new comments
[CUDA] Use runtime driver API for cuStreamWriteValue32
#156097 commented on Jul 3, 2025 • 1 new comment
Enhance testing infrastructure to add half-precision support for `histc` on XPU
#154339 commented on Jul 3, 2025 • 1 new comment
ROCm OCP Micro-scaling Format (mx-fp8/mx-fp4) Support
#151360 commented on Jul 1, 2025 • 1 new comment
[oss] Add version to metadata
#155343 commented on Jul 2, 2025 • 1 new comment
[TEST] Triton 3.4.0 pin update
#156186 commented on Jul 4, 2025 • 1 new comment
Use CMake wholearchive group
#156393 commented on Jul 7, 2025 • 1 new comment
[dynamo][fsdp] Consistent behavior of int attributes
#157262 commented on Jul 2, 2025 • 0 new comments
[BE]: Update CUTLASS submodule to 4.0.0
#153541 commented on Jul 4, 2025 • 0 new comments
implement MKLGenerator
#154199 commented on Jul 3, 2025 • 0 new comments
Upgrade MKL in CI
#154198 commented on Jul 2, 2025 • 0 new comments
[BE]: Update pybind11 submodule to 3.0.0rc
#154115 commented on Jul 4, 2025 • 0 new comments
DOC: update CrossEntropyLoss with note and example of incorrect target specification
#155649 commented on Jul 3, 2025 • 0 new comments
[pytorch_146643] fixed max triton generation
#154056 commented on Jul 2, 2025 • 0 new comments
[pytorch][triton] Enabling TMA for flex-attention for supported device types
#153662 commented on Jul 3, 2025 • 0 new comments
Add MPS implementation of CTC Loss based on CUDA version
#154044 commented on Jul 2, 2025 • 0 new comments
[dict] Raise TypeError in dict methods
#154003 commented on Jul 5, 2025 • 0 new comments
[list] Implement list.count
#153969 commented on Jul 5, 2025 • 0 new comments
[dict] Implement dict subclass `fromkeys` classmethod
#155608 commented on Jul 5, 2025 • 0 new comments
[OrderedDict] Set the correct dict class in UserDefinedDictVariable
#155502 commented on Jul 5, 2025 • 0 new comments
[OrderedDict] Add `bool(OrderedDict)`
#155503 commented on Jul 5, 2025 • 0 new comments
FractionalMaxPool3d add kernel_size check
#155549 commented on Jul 4, 2025 • 0 new comments
Fix conversion of values in libtorch agnostic tests
#155115 commented on Jul 2, 2025 • 0 new comments
Fixes #154982: add missing to_result_dtype in vector_norm
#155111 commented on Jul 1, 2025 • 0 new comments
[OrderedDict] Implement `OrderedDict.move_to_end(key, last=False)`
#155152 commented on Jul 5, 2025 • 0 new comments
[dict] Implement dict.__ior__ and fix return type in dict.__or__
#155072 commented on Jul 5, 2025 • 0 new comments
[OrderedDict] Implement `OrderedDict.popitem(last=...)`
#155153 commented on Jul 5, 2025 • 0 new comments
[Intel GPU] Refactor Matmul integration: Modularize bias handling and memory creation
#154977 commented on Jul 3, 2025 • 0 new comments
[dict] Implement `__eq__` for dict_items
#155154 commented on Jul 5, 2025 • 0 new comments
update the baseline for nightly max_autotune tests
#154973 commented on Jul 1, 2025 • 0 new comments
[OrderedDict] Implement explicit OrderedDict dunder method call
#154943 commented on Jul 5, 2025 • 0 new comments
[dict] Implement dict.__eq__ and dict.__ne__
#154942 commented on Jul 5, 2025 • 0 new comments
[BE]: Try to enable LTO
#154819 commented on Jul 5, 2025 • 0 new comments
[dict] Allow Dynamo to trace through explicit dict dunder method call
#154794 commented on Jul 5, 2025 • 0 new comments
[dict] Add dict.popitem
#154793 commented on Jul 5, 2025 • 0 new comments
[vision hash update] update the pinned vision hash
#154694 commented on Jul 7, 2025 • 0 new comments
Use official CUDAToolkit module in CMake
#154595 commented on Jul 2, 2025 • 0 new comments
Fix MKL error: Inconsistent configuration parameters
#154585 commented on Jul 3, 2025 • 0 new comments
[OrderedDict] Implement `hasattr(..., IteratorVariable)`
#155501 commented on Jul 5, 2025 • 0 new comments
[cpp_wrapper] Build main and kernel code in separate threads
#154551 commented on Jul 4, 2025 • 0 new comments
[Dynamo] Guard serialization for BUILTIN_MATCH
#152729 commented on Jul 6, 2025 • 0 new comments
Update the signature and test of torch.hamming_window()
#152682 commented on Jul 3, 2025 • 0 new comments
Raise error when no record on extra_files
#152664 commented on Jul 2, 2025 • 0 new comments
Add assert_fp8_close helper for FP8 tensor comparisons
#152651 commented on Jul 5, 2025 • 0 new comments
[BE]remove vulkan test
#152643 commented on Jul 1, 2025 • 0 new comments
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 commented on Jul 3, 2025 • 0 new comments
Parameterized CUDA Graph Launch
#152622 commented on Jul 1, 2025 • 0 new comments
Update padding_mode type annotation to use Literal type (PaddingMode)
#152610 commented on Jul 1, 2025 • 0 new comments
[Testing] Is FindCUDA.cmake from `Modules_CUDA_fix` called at all?
#152604 commented on Jun 30, 2025 • 0 new comments
[BE] Delete `Module_CUDA_fix`
#152603 commented on Jul 1, 2025 • 0 new comments
[BE] Update numba versions
#152557 commented on Jul 6, 2025 • 0 new comments
[compile async] [cache] testing
#152523 commented on Jul 6, 2025 • 0 new comments
[inductor] [compile async] Don't compile in eager
#152507 commented on Jul 5, 2025 • 0 new comments
fix: Update padding_mode to use Literal for type checking
#152458 commented on Jul 1, 2025 • 0 new comments
Add epoch to fake tensor cache key
#152453 commented on Jul 1, 2025 • 0 new comments
fix: outdated contents in dynamo overview
#152382 commented on Jul 6, 2025 • 0 new comments
Updates to build on Noble (Ubuntu24.04) and py3.12
#152240 commented on Jul 4, 2025 • 0 new comments
IGNORE: Testing OIDC
#152181 commented on Jun 30, 2025 • 0 new comments
Extend compute_global_tensor_shape to multi dimension sharding
#152166 commented on Jul 2, 2025 • 0 new comments
Add dynamo config to HOP-ify context managers
#152159 commented on Jul 2, 2025 • 0 new comments
Add standard Python source distribution generation to (pre-)release workflow
#152098 commented on Jul 3, 2025 • 0 new comments
[UniformValueConstantFolder] deduce value on CPU rather than on device
#151998 commented on Jul 7, 2025 • 0 new comments
docs: add torch.e and torch.pi to constants table (#134964)
#151996 commented on Jul 6, 2025 • 0 new comments
Skip fuse attention on fp32 if not tf32
#151924 commented on Jul 4, 2025 • 0 new comments
Idea: Add SBOM Generation (and optional vuln scan) for better supply chain insight
#156085 commented on Jul 2, 2025 • 0 new comments
[CUDA] Allow cuDNN or flash attn in `test_activation_checkpointing` pattern match check
#153272 commented on Jul 4, 2025 • 0 new comments
fix dtensor and tensor inconsistent compute mesh
#153268 commented on Jul 7, 2025 • 0 new comments
Adding XPU support to DTensor examples
#153213 commented on Jul 1, 2025 • 0 new comments
[TESTING] Triton pin (Jul 1) f81f19a7f6cb7f905fde3195014c1bf51659642f
#153117 commented on Jul 2, 2025 • 0 new comments
Add CUDA support for Adagrad(fused=True)
#153038 commented on Jul 1, 2025 • 0 new comments
[WIP][dynamic shapes] unbacked safer cat, repeat
#153011 commented on Jul 6, 2025 • 0 new comments
[Pytorch] Add `torch.cuda.streams.Event` to save torch functions list
#152978 commented on Jul 6, 2025 • 0 new comments
[dtensor] Extend Partial partition of replicated tensor for min/max reduce
#152975 commented on Jul 7, 2025 • 0 new comments
docs: Improve documentation for NCCL timeout / watchdog variables
#152959 commented on Jul 6, 2025 • 0 new comments
[ROCm] Ck gemm architecture guard
#152951 commented on Jun 30, 2025 • 0 new comments
[feature] Channel Wise Parallel API for Conv layers
#152937 commented on Jul 6, 2025 • 0 new comments
Allow Inductor backends to attest their own availability
#152933 commented on Jul 5, 2025 • 0 new comments
Add overall tensor similarity comparison (#152647)
#152920 commented on Jul 6, 2025 • 0 new comments
Clarify wrap_triton doc about optional triton_op usage
#152874 commented on Jul 5, 2025 • 0 new comments
ci: Remove conda-env-macOS-ARM64, prefer pip
#152843 commented on Jul 5, 2025 • 0 new comments
[MSVC] Enable updated lambda processor by setting compiler flag /Zc:lambda globally
#152828 commented on Jul 5, 2025 • 0 new comments
another try
#152808 commented on Jul 4, 2025 • 0 new comments
wip
#152807 commented on Jul 4, 2025 • 0 new comments
Update CMakeLists.txt
#152786 commented on Jul 6, 2025 • 0 new comments
added short integer for repeat_interleave_cpu, Fixes #151311
#152762 commented on Jul 5, 2025 • 0 new comments
Allow ATen ops overloading
#152759 commented on Jul 4, 2025 • 0 new comments
Handle less functions than number of segments
#152753 commented on Jul 6, 2025 • 0 new comments
Conditionally support experimental filesystem include in jit_opt_limit
#152748 commented on Jul 5, 2025 • 0 new comments
[BE][Cleanup][Dynamo] Stop logging entire_frame_compile_time_s
#152738 commented on Jul 5, 2025 • 0 new comments
docs: fix dead link in torch.compile docs
#152734 commented on Jul 5, 2025 • 0 new comments
[BE]: Update NCCL to 2.27.5
#157108 commented on Jul 4, 2025 • 0 new comments
[Quant][CPU] Enable fp8 qconv
#157076 commented on Jul 7, 2025 • 0 new comments
Build CPP Extensions with COLOR
#157051 commented on Jun 30, 2025 • 0 new comments
Use std::string_view in torchgen
#157050 commented on Jul 2, 2025 • 0 new comments
[a2av] Make test input more random
#157029 commented on Jul 3, 2025 • 0 new comments
[EXPERIMENTAL][dynamo] Avoid potential graph breaks by relaxing `handle_traced_output` checks
#157013 commented on Jul 2, 2025 • 0 new comments
[itertools] Add CPython tests for itertools
#156981 commented on Jul 2, 2025 • 0 new comments
[CI] add decorator for specifying H100-only tests
#156980 commented on Jun 30, 2025 • 0 new comments
[TESTING] test new xpu runner
#156917 commented on Jun 30, 2025 • 0 new comments
Track monitor
#156907 commented on Jul 1, 2025 • 0 new comments
Add cuda 12.9 periodic tests
#156900 commented on Jul 3, 2025 • 0 new comments
ci: Add ability to test images for build-triton-wheel
#156894 commented on Jul 1, 2025 • 0 new comments
[refactor][dynamo] extract a helper function create_resume_fn from create_call_resume_at
#156869 commented on Jul 3, 2025 • 0 new comments
[TESTING] [DO NOT MERGE] Updated triton commit pin - upstream base
#156841 commented on Jul 2, 2025 • 0 new comments
[logging] [redo] dynamo_timed for CachingAutotuner.coordinate_descent_tuning
#156840 commented on Jul 3, 2025 • 0 new comments
[gtest][listing] Enable gtest json listing for the fbcode/caffe2 project
#156816 commented on Jul 7, 2025 • 0 new comments
add device generalization support for distributed tests
#156796 commented on Jul 4, 2025 • 0 new comments
[inductor] initial triton static config lookup table
#156785 commented on Jun 30, 2025 • 0 new comments
[cherry-pick] revert #156552
#156767 commented on Jul 4, 2025 • 0 new comments
add tests for Thunk utility function
#156759 commented on Jun 30, 2025 • 0 new comments
Add back manywheel-py3_9-cuda12_4-build/test
#156753 commented on Jul 6, 2025 • 0 new comments
WIP `fast_autotune`: Add lookup table and ML model to filter triton matmul configs
#156683 commented on Jul 1, 2025 • 0 new comments
Enable set SDPA backend by torch.nn.attention.sdpa_kernel on XPU
#156669 commented on Jul 4, 2025 • 0 new comments
[invoke_subgraph] make same subgraph share get_attr target
#157253 commented on Jun 30, 2025 • 0 new comments
[cc][pac] attempt 1.1
#157250 commented on Jun 30, 2025 • 0 new comments
[user triton] AOT inductor support for device-side TMA
#157241 commented on Jul 2, 2025 • 0 new comments
remove allow-untyped-defs from torch/ao/pruning/_experimental/pruner/parametrization.py
#157235 commented on Jul 1, 2025 • 0 new comments
remove allow-untyped-defs from torch/ao/nn/quantized/modules/rnn.py
#157234 commented on Jul 5, 2025 • 0 new comments
remove allow-untyped-defs from torch/backends/mkl/__init__.py
#157233 commented on Jul 2, 2025 • 0 new comments
remove allow-untyped-defs from torch/backends/cusparselt/__init__.py
#157232 commented on Jul 5, 2025 • 0 new comments
remove allow-untyped-defs from torch/_classes.py
#157231 commented on Jul 2, 2025 • 0 new comments
remove allow-untyped-defs from torch/utils/data/_utils/fetch.py
#157230 commented on Jul 1, 2025 • 0 new comments
remove allow-untyped-defs from torch/_lazy/__init__.py
#157228 commented on Jul 2, 2025 • 0 new comments
Updating default value of eps in RMSNorm documentation
#157223 commented on Jul 2, 2025 • 0 new comments
[DTensor][FSDP2] necessary changes to FSDP and TP to unblock EP
#157216 commented on Jul 7, 2025 • 0 new comments
fix type hints for interpolation functions
#157202 commented on Jul 6, 2025 • 0 new comments
Adding bias argument to NN normalization methods
#157198 commented on Jul 2, 2025 • 0 new comments
[DO NOT MERGE] Test new MI300X capacity.
#157191 commented on Jul 2, 2025 • 0 new comments
[Do Not Merge] moved pytorch mi300 worfklows to test scale sets
#157190 commented on Jul 3, 2025 • 0 new comments
[dynamo] auto-rewrite data-dependent if into torch.cond
#157161 commented on Jul 3, 2025 • 0 new comments
[dynamo] remove dead object from keepalive
#157159 commented on Jul 2, 2025 • 0 new comments
[WIP][CUDA][CI] Test B200 Runner with Nightly Inductor Perf Test
#157153 commented on Jul 2, 2025 • 0 new comments
[generator] Raise `StopIteration(value)` with value from the return stmt
#157152 commented on Jul 2, 2025 • 0 new comments
[nativert] libtorch kernel registry
#157150 commented on Jul 7, 2025 • 0 new comments
[contextlib] Fixes for CPython contextlib tests
#157148 commented on Jul 2, 2025 • 0 new comments
[Bugfix][Inductor] Fix dependency list merged incorrectly for a custom op with multiple mutated inputs and None return type.
#157133 commented on Jul 1, 2025 • 0 new comments
ReplaceWithCopy graph pass
#156666 commented on Jul 1, 2025 • 0 new comments
[WIP] Add a new API of allocator setting for accelerator
#156175 commented on Jul 1, 2025 • 0 new comments
Implementation of a ScannedModule
#156172 commented on Jul 1, 2025 • 0 new comments
[WIP] Deprecate some functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead
#156165 commented on Jul 2, 2025 • 0 new comments
[list] Raise exception in invalid list method call
#156148 commented on Jul 5, 2025 • 0 new comments
[executorch hash update] update the pinned executorch hash
#156141 commented on Jul 7, 2025 • 0 new comments
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel
#156140 commented on Jul 3, 2025 • 0 new comments
Convert to markdown: jit.rst
#156094 commented on Jun 30, 2025 • 0 new comments
Fix atleast_{1,2,3}d() with no arguments description
#156042 commented on Jul 1, 2025 • 0 new comments
[BE][Easy] set end-of-line for `.bat` file to CRLF in `.editorconfig`
#156032 commented on Jul 7, 2025 • 0 new comments
[build] modernize build-frontend: `python setup.py develop/install` -> `[uv ]pip install --no-build-isolation [-e ].`
#156027 commented on Jul 5, 2025 • 0 new comments
[BE] add a minimal linter to check `pyproject.toml` consistency
#156017 commented on Jul 5, 2025 • 0 new comments
Handling overflow for long int overflow for the product of kernel_hei…
#155989 commented on Jul 3, 2025 • 0 new comments
[CI][cpp_wrapper] Fix selection of CPU OpInfo tests
#155967 commented on Jul 2, 2025 • 0 new comments
[FSDP2] Fix issue with set_reduce_scatter_divide_factor errors and MixedPrecisionPolicy
#155964 commented on Jul 4, 2025 • 0 new comments
HF loads dcp - don't do a full deserialize on every file
#155942 commented on Jul 1, 2025 • 0 new comments
[inductor] Add `-> bool` to functions named `is_*` or `_is_*`
#155928 commented on Jul 4, 2025 • 0 new comments
[dynamo] Add `-> bool` to functions named `is_*` or `_is_*`
#155923 commented on Jul 5, 2025 • 0 new comments
[NOT FOR MERGE] Exploratory work on AOTInductor training
#155877 commented on Jul 4, 2025 • 0 new comments
[einops] Ensure Dynamo can trace through explicit set dunder method call
#155842 commented on Jul 2, 2025 • 0 new comments
[doc] Updates to distributed.md for XCCL backend
#155834 commented on Jul 3, 2025 • 0 new comments
[DONT MERGE][TESTING][1/2] xpu test runner
#155793 commented on Jun 30, 2025 • 0 new comments
add sfdp pattern
#155792 commented on Jul 2, 2025 • 0 new comments
[Misc] handle sys exit caused by skip_if_lt_x_gpu in test_composabili…
#155665 commented on Jul 2, 2025 • 0 new comments
[C10d][Gloo] Enable complex datatype support in ProcessGroupGloo
#156633 commented on Jun 30, 2025 • 0 new comments
[BE] fix typo in torch/distributed/tensor/: childs -> children
#156609 commented on Jul 6, 2025 • 0 new comments
[BE] fix typo in torch/_numpy/_normalizations.py: parm -> param
#156608 commented on Jul 6, 2025 • 0 new comments
[BE][15/16] fix typos in torch/ (torch/distributed/tensor/)
#156605 commented on Jul 6, 2025 • 0 new comments
docstring_linter: Fix #151692 and other issues
#156596 commented on Jul 4, 2025 • 0 new comments
[Inductor Dashboard] Enable deterministic algorithms for some models
#156592 commented on Jun 30, 2025 • 0 new comments
[Doc] remove WSL2 in support matrix for Intel GPU
#156590 commented on Jun 30, 2025 • 0 new comments
[CPU] Fix memory access for sbgemm bf16
#156585 commented on Jul 7, 2025 • 0 new comments
[xla hash update] update the pinned xla hash
#156584 commented on Jul 7, 2025 • 0 new comments
Enable target-determination (TD) for ROCm CI
#156545 commented on Jul 5, 2025 • 0 new comments
[dynamo] Guard eagerly on list objects to avoid guard on getitem index
#156531 commented on Jul 1, 2025 • 0 new comments
[DO NOT MERGE] Update trunk.yml to change the runner that the job runs-on
#156491 commented on Jul 4, 2025 • 0 new comments
[ROCm][Windows] Fix finding ROCm/HIP version
#156486 commented on Jul 2, 2025 • 0 new comments
[DONT MERGE][TESTING][2/2] test new xpu runner
#156410 commented on Jun 30, 2025 • 0 new comments
[list] Add list.__delitem__
#156339 commented on Jul 5, 2025 • 0 new comments
[BE][2/16] fix typos in torch/ (torch/_*/)
#156312 commented on Jul 6, 2025 • 0 new comments
[BE][1/16] fix typos in torch/
#156311 commented on Jul 6, 2025 • 0 new comments
[list] Add list.__mul__ and list.__imul__
#156271 commented on Jul 5, 2025 • 0 new comments
Implement list.__add__ and list.__iadd__
#156270 commented on Jul 5, 2025 • 0 new comments
Add fallback-aware device checking for MPS operations
#156267 commented on Jul 1, 2025 • 0 new comments
[list] Implement `list.remove`
#156242 commented on Jul 5, 2025 • 0 new comments
[Native][CPU][TopK] Improve perf by reducing swap operations
#156183 commented on Jul 1, 2025 • 0 new comments
[NVIDIA] Refactor Family Blackwell Support codegen
#156176 commented on Jul 2, 2025 • 0 new comments
PyTorch CPP Extensions fail when same kernel is compiled more than once on ROCm servers
#155344 commented on Jul 2, 2025 • 0 new comments
SourcelessBuilder.create does not know how to wrap <class '__main__.InFlexData'>
#154009 commented on Jul 2, 2025 • 0 new comments
`torch.linalg.solve` does not raise an error for singular matrix on CPU.
#154842 commented on Jul 2, 2025 • 0 new comments
TORCH_COMPILE_DEBUG=1 does not consistently generate debug logs
#152374 commented on Jul 2, 2025 • 0 new comments
Quantile is limited to 16 million elements and have poor performance.
#64947 commented on Jul 2, 2025 • 0 new comments
[RFC] Remove the FSDP data copy from compute stream critical path
#157027 commented on Jul 2, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_tensordot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142769 commented on Jul 2, 2025 • 0 new comments
DISABLED test_graph_partition_cpu_op_and_dynamic_shapes (__main__.CudaGraphTreeTests)
#157258 commented on Jul 2, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#82340 commented on Jul 2, 2025 • 0 new comments
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 commented on Jul 2, 2025 • 0 new comments
DISABLED test_module_and_optimizer_ids (__main__.TestTorchTidyProfiler)
#87581 commented on Jul 2, 2025 • 0 new comments
torch compile does not support SyncBatchNorm with fullgraph=True
#156680 commented on Jul 2, 2025 • 0 new comments
[RFC][API-Unstable] Intel GPU distributed Backend integration in `torch-xpu-ops`and registeration in PyTorch
#141741 commented on Jul 2, 2025 • 0 new comments
`torch.compile` creates a CUDA context even for CPU based code
#150622 commented on Jul 1, 2025 • 0 new comments
Support for Bazel workspace function or Bazel module
#112903 commented on Jul 1, 2025 • 0 new comments
Export + autocast is eating the exception
#153202 commented on Jul 1, 2025 • 0 new comments
TorchInductor CPU Performance Dashboard
#93531 commented on Jul 1, 2025 • 0 new comments
FSDP offload doesn't prefetch param to GPU
#157209 commented on Jul 1, 2025 • 0 new comments
Upgrade AWS lambda functions from version 2.x to 3.x of the AWS SDK for JavaScript
#137228 commented on Jul 1, 2025 • 0 new comments
cd: Migrate binary builds off of Jinja
#149660 commented on Jul 1, 2025 • 0 new comments
Add Python NoGil support in CI
#156854 commented on Jul 1, 2025 • 0 new comments
Add the XPU item to pytorch.org/get-started
#156810 commented on Jul 1, 2025 • 0 new comments
Vendored wheels on PyTorch pip repository are outdated (e.g., `cmake`, `certifi`)
#156694 commented on Jul 1, 2025 • 0 new comments
[OSS tooling] pytorchbot fail to revert a PR
#156607 commented on Jul 1, 2025 • 0 new comments
Add explicit typing to nn.Module __init__()
#156740 commented on Jul 1, 2025 • 0 new comments
Adafactor foreach impl performance tracker
#133367 commented on Jul 3, 2025 • 0 new comments
[RFC] Experimental Wheel Variant Support
#155141 commented on Jul 3, 2025 • 0 new comments
Triton pin update for PyTorch 2.8 / Triton 3.4
#154206 commented on Jul 3, 2025 • 0 new comments
cmake: add USE_SYSTEM_{KLEIDI,CUDNN_FRONTEND,CUTLASS} options to USE_SYSTEM_LIBS
#153863 commented on Jul 3, 2025 • 0 new comments
Add batched torch.combinations
#40375 commented on Jul 3, 2025 • 0 new comments
Python 3.14 support for PyTorch
#156856 commented on Jul 3, 2025 • 0 new comments
Allow `low` and `high` to be tensors in `torch.randint`
#89438 commented on Jul 3, 2025 • 0 new comments
DISABLED test_host_memory_stats (__main__.TestCuda)
#148607 commented on Jul 3, 2025 • 0 new comments
DISABLED test_matrix_rank_basic_cuda_float32 (__main__.TestLinalgCUDA)
#150406 commented on Jul 3, 2025 • 0 new comments
DISABLED test_inductor_all_gather_into_tensor_single (__main__.CompileTest)
#147707 commented on Jul 3, 2025 • 0 new comments
DISABLED test_per_sample_api_compute_batch_size_not_pytreeable_cpu (__main__.TestExpandedWeightModuleCPU)
#146972 commented on Jul 3, 2025 • 0 new comments
MPS Memory Leak
#154329 commented on Jul 3, 2025 • 0 new comments
[Doc] [Win] libuv installation doc is not correct.
#148315 commented on Jul 3, 2025 • 0 new comments
DISABLED test_fake_crossref_backward_no_amp_cholesky_solve_cuda_float32 (__main__.TestFakeTensorCUDA)
#156419 commented on Jul 3, 2025 • 0 new comments
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on Jul 3, 2025 • 0 new comments
Most requested ops for the MPS backend
#154052 commented on Jul 3, 2025 • 0 new comments
[cudagraph] simplify usage of how cudagraph dumps debug file
#126753 commented on Jul 3, 2025 • 0 new comments
[MPS] Performance regression and visual bug with ComfyUI Flux dev since nightly 20250510
#155797 commented on Jul 2, 2025 • 0 new comments
[compile][transformers] Recompilation with mark_static_address with cudagraphs
#156377 commented on Jul 2, 2025 • 0 new comments
Broadcasting matmul is much slower than corresponding einsum
#110858 commented on Jul 2, 2025 • 0 new comments
torch._dynamo.mark_static_address refuses to work with nn.Parameter
#157221 commented on Jul 2, 2025 • 0 new comments
[Upstream Triton] persistent mm + tma accuracy failures
#156028 commented on Jul 2, 2025 • 0 new comments
Registering function that takes `const SymInt&` to op that accepts `SymInt` leads to cryptic error
#124645 commented on Jul 2, 2025 • 0 new comments
PyTorch Memory Management in GPU-to-CPU Transfers issue
#124487 commented on Jul 2, 2025 • 0 new comments
Gross mismatch in PDF between CUDA and CPU for multivariate Gaussian mixture models
#156959 commented on Jul 2, 2025 • 0 new comments
Tensorboard `add_video()` broken for `moviepy>=2.0`
#147317 commented on Jul 2, 2025 • 0 new comments
Export Huggingface models with StaticCache
#155862 commented on Jun 30, 2025 • 0 new comments
[ONNX] torch.nn.functional.interpolate \w antialias=True isn't op.Resize compatible
#157220 commented on Jun 30, 2025 • 0 new comments
`torch.ldexp` goes out of range when `2**other` is out of range
#153069 commented on Jun 30, 2025 • 0 new comments
Is compilation caching for NumPy operators not supported in PyTorch 2.7.1?
#156943 commented on Jun 30, 2025 • 0 new comments
Compilation issues with ROCm 6.4.1 on Debian 12
#155794 commented on Jun 30, 2025 • 0 new comments
Update epsilon logic to improve numerical stability
#151110 commented on Jun 30, 2025 • 0 new comments
DISABLED test_forward_generation (__main__.CudaGraphTreeTests)
#157058 commented on Jun 30, 2025 • 0 new comments
Windows Source Build Fails with OSError: [WinError 126] on aoti_custom_ops.dll for RTX 5080 (sm_120), Pre-built PyTorch Works
#157128 commented on Jun 30, 2025 • 0 new comments
QAT support for conv2d with groups > 1
#157222 commented on Jun 30, 2025 • 0 new comments
NVFp4 Cublas Error
#157054 commented on Jun 30, 2025 • 0 new comments
Setting up for development
#157141 commented on Jun 30, 2025 • 0 new comments
[CUDA][CUTLASS] test_cutlass_backend.py unit test failures on SM90+
#155888 commented on Jun 30, 2025 • 0 new comments
[feature request] Native checkpointing to/from `s3://`
#155992 commented on Jun 30, 2025 • 0 new comments
CMake improperly configures pybind11. 3 different versions of pybind11 in use at the same time.
#156725 commented on Jun 30, 2025 • 0 new comments
Preload CUDA fails if CUDA libs in different PYTHONPATH
#147001 commented on Jun 30, 2025 • 0 new comments
[RFC] Integrate NCCL scalable init API
#136539 commented on Jun 30, 2025 • 0 new comments
Pypi Support for Windows arm64
#154260 commented on Jun 30, 2025 • 0 new comments
[Feature] Taylor expansion pruning
#157218 commented on Jun 30, 2025 • 0 new comments
`TorchScript` does not allow accessing methods of nested tensors
#156544 commented on Jun 30, 2025 • 0 new comments
Set dependencies lower bound
#156587 commented on Jun 30, 2025 • 0 new comments
[CD] Windows Wheel builds CUDA 12.9.1 Stack Overflow during build
#156181 commented on Jun 30, 2025 • 0 new comments
DISABLED test_forward_backward_not_called_backend_inductor (__main__.CudaGraphTreeTests)
#157035 commented on Jun 30, 2025 • 0 new comments
DISABLED test_remove_noop_view_dtype_cuda (__main__.GPUTests)
#151541 commented on Jun 30, 2025 • 0 new comments
DISABLED test_hessian_vectorize_raises_no_warnings_logging_tensor (__main__.TestAutogradFunctional)
#153644 commented on Jun 30, 2025 • 0 new comments
Unable to compile
#156915 commented on Jun 30, 2025 • 0 new comments
Libtorch segfault when used with libqpOASES
#33890 commented on Jun 30, 2025 • 0 new comments
Improve debug message for metadata guard failure
#157075 commented on Jul 1, 2025 • 0 new comments
[dynamo] torch.randint_like on DTensor does not work with compile
#156649 commented on Jul 1, 2025 • 0 new comments
[dynamic shapes] translation validation failure under `fake_tensor_propagate_real_tensors`
#156251 commented on Jul 1, 2025 • 0 new comments
FakeTensorUpdater doesn't support HOPs
#156819 commented on Jul 1, 2025 • 0 new comments
Convolution NN for complex numbers and more special functions
#116414 commented on Jul 1, 2025 • 0 new comments
torch.Tensor.is_sparse returns false for non-COO sparse tensors
#101385 commented on Jul 1, 2025 • 0 new comments
test_tensor_with_grad_to_scalar_warning failure
#157252 commented on Jul 1, 2025 • 0 new comments
DISABLED test_graph_partition (__main__.CudaGraphTreeTests)
#157173 commented on Jul 1, 2025 • 0 new comments
DISABLED test_function_compiled_multiple_times (__main__.CudaGraphTreeTests)
#157143 commented on Jul 1, 2025 • 0 new comments
DISABLED test_matmul_small_brute_force_tunableop_cuda_float32 (__main__.TestLinalgCUDA)
#141635 commented on Jul 1, 2025 • 0 new comments
[Windows] pytorch >= 2.5
#140875 commented on Jul 1, 2025 • 0 new comments
[ued][gemma3] HF + torch.compile - torch.compile on Gemma3
#149574 commented on Jul 1, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142566 commented on Jul 1, 2025 • 0 new comments
DISABLED test_frozen_fn (__main__.CudaGraphTreeTests)
#157112 commented on Jul 1, 2025 • 0 new comments
CTCLoss gradient is incorrect
#52241 commented on Jul 1, 2025 • 0 new comments
UR Error when calling grid_sample
#153996 commented on Jul 1, 2025 • 0 new comments
[inductor][cpu] pyhpc_isoneutral_mixing, lennard_jones and pyhpc_equation_of_state performance regression in 2025-06-23 nightly release
#157077 commented on Jul 1, 2025 • 0 new comments
Torch RPC examples from docs say usage is deprecated.
#149393 commented on Jul 1, 2025 • 0 new comments
Documentation Clarification Needed for Clamping of Scale Coefficient in clip_grads_with_norm_
#151554 commented on Jul 1, 2025 • 0 new comments
torch.compile with mode = "max-autotune" breaks when starting from inference_mode
#135892 commented on Jul 1, 2025 • 0 new comments
UNSTABLE pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks)
#153987 commented on Jul 1, 2025 • 0 new comments
DISABLED test_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests)
#157086 commented on Jul 1, 2025 • 0 new comments
Export always give a value range with max length - 1
#156882 commented on Jul 1, 2025 • 0 new comments
Perf drop when running with FSDP and torch.compile
#156966 commented on Jul 1, 2025 • 0 new comments
DeviceMesh's `_set_mesh_dim_group_options` ineffective for 1-dim meshes
#156593 commented on Jun 30, 2025 • 0 new comments
[user triton] on-device TMA + AOTI causes IMA with pytorch 2.8 branch
#157240 commented on Jun 30, 2025 • 0 new comments
Fix `SequentialLR` deprecate warning about invoke `step(epoch)`
#149392 commented on Jul 4, 2025 • 0 new comments
NUMA Binding Integration with torchrun
#149334 commented on Jul 2, 2025 • 0 new comments
[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100`
#149282 commented on Jul 2, 2025 • 0 new comments
Fix unexpected keyword argument 'mode' when calling `CompileCounterWithBackend`
#149271 commented on Jul 6, 2025 • 0 new comments
[test] test for keep going
#149003 commented on Jul 1, 2025 • 0 new comments
Fix AttributeError for `_get_vc_env` with setuptools>=75.9.0
#148847 commented on Jul 6, 2025 • 0 new comments
C++ support to print symbolic tensors as `Symbolic tensor: size=(...)`
#148846 commented on Jul 3, 2025 • 0 new comments
Trunk workflow for Windows Arm64
#148753 commented on Jul 1, 2025 • 0 new comments
Optimize AOTInductor: Caching, Reduced Decompositions, and Improved JSON Handling
#148616 commented on Jul 1, 2025 • 0 new comments
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on Jul 3, 2025 • 0 new comments
[triton hash update] update the pinned triton hash
#148492 commented on Jul 7, 2025 • 0 new comments
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on Jul 3, 2025 • 0 new comments
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on Jul 3, 2025 • 0 new comments
[pytree] simplify public API exposition with `__module__`
#148328 commented on Jul 3, 2025 • 0 new comments
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on Jul 3, 2025 • 0 new comments
Support `contextlib.suppress`
#147990 commented on Jul 2, 2025 • 0 new comments
Update triton_heuristics.py
#147690 commented on Jul 2, 2025 • 0 new comments
removed zero dim cpu logic from fake_tensor.py
#147501 commented on Jul 2, 2025 • 0 new comments
Deprecate DataLoader pin_memory_device param
#146821 commented on Jul 4, 2025 • 0 new comments
Support contextlib.ExitStack
#146506 commented on Jul 2, 2025 • 0 new comments
Update quantile doc
#146485 commented on Jul 1, 2025 • 0 new comments
[dcp] Minor improvements to filesystem writer
#146273 commented on Jul 5, 2025 • 0 new comments
docs: change log to ln in Softplus function and class
#146199 commented on Jul 1, 2025 • 0 new comments
Avoid data-dependent errors by runtime assert substitution.
#145681 commented on Jul 1, 2025 • 0 new comments
Fix full_like decomposition to preserve strides
#144765 commented on Jul 2, 2025 • 0 new comments
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on Jul 5, 2025 • 0 new comments
Deprecated pkg_resources and use distributions instead
#151915 commented on Jun 30, 2025 • 0 new comments
[reland][ROCm] remove caffe2 from hipify
#151845 commented on Jul 1, 2025 • 0 new comments
Horizontal
#151780 commented on Jul 4, 2025 • 0 new comments
enable windows inductor UT in CI
#151777 commented on Jul 7, 2025 • 0 new comments
Add adaptive_avg_pool2d input and output_size check
#151769 commented on Jul 1, 2025 • 0 new comments
Implement avg_pool3d for MPS backend
#151742 commented on Jul 5, 2025 • 0 new comments
Update OpenBLAS commit
#151547 commented on Jul 2, 2025 • 0 new comments
Implement fast exp for AVX2 and AVX512 for the flash attention
#151441 commented on Jul 7, 2025 • 0 new comments
Use Allocator API raw_allocate & raw_dealloc in CUDAAllocator
#151305 commented on Jul 5, 2025 • 0 new comments
[dynamo] Avoid unnecessary `.detach()` call in `_make_subclass` polyfill
#151265 commented on Jul 5, 2025 • 0 new comments
Implement MKLGenerator
#151218 commented on Jul 4, 2025 • 0 new comments
Fix `MaskedTensor` to device ignored mask
#151205 commented on Jul 4, 2025 • 0 new comments
TESTING: IGNORE
#151116 commented on Jun 30, 2025 • 0 new comments
[export] add runtime assert messages to python torch checks
#150719 commented on Jul 5, 2025 • 0 new comments
Make LazyModuleMixin materialize after load_state_dict
#150593 commented on Jul 1, 2025 • 0 new comments
Refactor CUDAAllocatorConfig to reuse AcceleratorAllocatorConfig
#150312 commented on Jul 1, 2025 • 0 new comments
Add differentiable ops hint message in Module docs
#150291 commented on Jul 5, 2025 • 0 new comments
softmax: add device check for xpu with half_to_float
#150278 commented on Jul 3, 2025 • 0 new comments
Add cmake variable USE_ROCM_CK
#150245 commented on Jul 5, 2025 • 0 new comments
[WIP][dynamic shapes] rewrite should_swap with guard_or_false
#150164 commented on Jul 1, 2025 • 0 new comments
AOTI freezing: fix test issues and enable by default
#149961 commented on Jul 2, 2025 • 0 new comments
DRAFT: Add TMA opt for concat function target hopper and blackwell arch
#149893 commented on Jul 6, 2025 • 0 new comments
Add SWA with a cyclical scheduler example
#149847 commented on Jul 1, 2025 • 0 new comments
Inductor logging + analysis of torch.profile
#149697 commented on Jul 7, 2025 • 0 new comments
Introduce AcceleratorAllocatorConfig as the common class
#149601 commented on Jul 7, 2025 • 0 new comments
[test] sccache docker build
#149536 commented on Jul 6, 2025 • 0 new comments
ROCm+gcc 15 asserts
#145608 commented on Jul 5, 2025 • 0 new comments
Make tlparse able to show a summary of distinct graph breaks
#153669 commented on Jul 5, 2025 • 0 new comments
I want to calculate the matrix multiplication of two Boolean matrices, but torch.mm will report an error. Is there any more efficient alternative?
#107041 commented on Jul 5, 2025 • 0 new comments
RendezvousConnectionError when use C10d on multi nodes
#69197 commented on Jul 5, 2025 • 0 new comments
Wrong error message for wrong dtypes in `torch.binomial`
#157195 commented on Jul 5, 2025 • 0 new comments
Run existing eager DTensor tests under torch.compile
#127772 commented on Jul 5, 2025 • 0 new comments
Trying to build from source with use_flash_attention fails on windows due to fatal error C1189
#134854 commented on Jul 5, 2025 • 0 new comments
Dead link in `torch.compile` docs
#119272 commented on Jul 5, 2025 • 0 new comments
[v.2.8.0] Release Tracker
#156745 commented on Jul 4, 2025 • 0 new comments
torch.compile does not work with Flash attention 3
#144540 commented on Jul 4, 2025 • 0 new comments
DISABLED test_simple_multi_arch_embed_kernel_binary_True_cuda (__main__.AOTInductorTestABICompatibleGpu)
#156930 commented on Jul 4, 2025 • 0 new comments
DISABLED test_assigning_back_deleter_fns_to_tensor (__main__.TestBlockStateAbsorption)
#134810 commented on Jul 4, 2025 • 0 new comments
DISABLED test_wait_tensor (__main__.CompileTest)
#148014 commented on Jul 4, 2025 • 0 new comments
DISABLED test_index (__main__.TestPythonBuiltinOP)
#119160 commented on Jul 4, 2025 • 0 new comments
Process never ends when sending tensors through multiprocessing queues in Python 3.12+ with filesystem strategy
#153050 commented on Jul 4, 2025 • 0 new comments
[dynamo] dynamo is unable to enter `except RuntimeError` while eager can
#157217 commented on Jul 4, 2025 • 0 new comments
DISABLED test_mempool_limited_memory_with_allocator (__main__.TestMemPool)
#157256 commented on Jul 4, 2025 • 0 new comments
Device Error on vmap
#151591 commented on Jul 4, 2025 • 0 new comments
Segmentation fault in torch.repeat_interleave
#157097 commented on Jul 4, 2025 • 0 new comments
Floating point exception in torch.nn.functional.conv_transpose3d
#157098 commented on Jul 4, 2025 • 0 new comments
[FSDP2] allow different dtypes for the model params with gradients
#156784 commented on Jul 4, 2025 • 0 new comments
flex_attention + dynamic=True with large batch or heads causes Triton Error [CUDA]: invalid argument
#157018 commented on Jul 3, 2025 • 0 new comments
Documentation: explaining the STFT formula
#153531 commented on Jul 3, 2025 • 0 new comments
[Tracker] AutoParallel's feature request to DTensor
#156217 commented on Jul 3, 2025 • 0 new comments
Unexpected, batch size and device dependent NaN propagation in Conv1d
#157237 commented on Jul 3, 2025 • 0 new comments
`RuntimeError: UR error` with XPU
#149953 commented on Jul 3, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `test/[i-z]*/` to `ruff format`
#144556 commented on Jul 5, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on Jul 5, 2025 • 0 new comments
[dynamo, nested graph breaks] add nested graph break tests
#144516 commented on Jul 2, 2025 • 0 new comments
Add where_ ops
#143636 commented on Jul 5, 2025 • 0 new comments
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on Jul 7, 2025 • 0 new comments
Fix `USE_STATIC_MKL` lost functionality
#138996 commented on Jul 1, 2025 • 0 new comments
Always produce XML
#138513 commented on Jul 5, 2025 • 0 new comments
Add DeviceAllocator as the base device allocator
#138222 commented on Jul 4, 2025 • 0 new comments
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on Jul 3, 2025 • 0 new comments
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on Jul 3, 2025 • 0 new comments
Feature: Implement support for `cudnn_batch_norm_out` kernel to replace the autogen approach.
#123020 commented on Jul 4, 2025 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on Jul 4, 2025 • 0 new comments
[pytree] support PyStructSequence types for Python pytree
#113258 commented on Jul 3, 2025 • 0 new comments
Automated submodule update: kineto
#106149 commented on Jul 2, 2025 • 0 new comments
[WIP][RFC] Compilable flex_attention + Context Parallel
#157015 commented on Jul 7, 2025 • 0 new comments
DISABLED test_dont_aggressively_write_assert (__main__.ReproTests)
#156570 commented on Jul 7, 2025 • 0 new comments
DISABLED test_inductor_reduce_scatter_tensor_coalesced (__main__.CompileTest)
#147887 commented on Jul 7, 2025 • 0 new comments
mps and cpu backends produce different training results with FFT and Adam
#151740 commented on Jul 6, 2025 • 0 new comments
[ONNX] Create a tutorial for exporting hf transformers model
#156258 commented on Jul 6, 2025 • 0 new comments
Add `is_outputs_batched` param to `autograd.grad`
#156616 commented on Jul 6, 2025 • 0 new comments
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on Jul 6, 2025 • 0 new comments
ImportError: libcupti.so.11.2: cannot open shared object file: No such file or directory
#88802 commented on Jul 6, 2025 • 0 new comments
Migrating existing backend-MAIA integration toward PrivateUse1 / openReg
#155864 commented on Jul 6, 2025 • 0 new comments
Flex Attention is incompatible with selective AC
#147879 commented on Jul 6, 2025 • 0 new comments
Pipeline Parallelism Fails when stage input does not produce gradients in all stages.
#152827 commented on Jul 6, 2025 • 0 new comments
General MPS op coverage tracking issue
#77764 commented on Jul 6, 2025 • 0 new comments