Insights: pytorch/pytorch
Overview
9 Pull requests merged by 2 people
-
Fix cuda 12.9 aarch64 GPU builds. Update CUDA_STABLE variable.
#157641 merged
Jul 4, 2025 -
Remove +PTX from CUDA 12.8 builds
#157634 merged
Jul 4, 2025 -
Cleanup leftover miniconda brew installation
#157567 merged
Jul 4, 2025 -
Fix GITHUB_OUTPUT syntax in create_release.yml workflow
#157539 merged
Jul 4, 2025 -
[aarch64] Add back NCCL lib to cuda arm wheel
#157105 merged
Jul 4, 2025 -
[MPS] Revert cumsum/cumprod to MPSGraph implementation
#157494 merged
Jul 3, 2025 -
[ez] Disable some failing periodic tests
#157560 merged
Jul 3, 2025 -
Revert "Update triton version to 3.4"
#157471 merged
Jul 2, 2025 -
[ROCm] Bump AOTriton to 0.10b
#156845 merged
Jun 30, 2025
154 Pull requests opened by 85 people
-
[distributed] build enum for Backend class
#157263 opened
Jun 30, 2025 -
Fix init CUDA preload: get correct versions (#147001)
#157264 opened
Jun 30, 2025 -
[inductor] fix tensor.to(uint8) error when tensor src type is float
#157267 opened
Jun 30, 2025 -
Fix the Problems About Defining Static Variable in Inline Function
#157269 opened
Jun 30, 2025 -
[inductor][templates] Finalize all registered hooks
#157270 opened
Jun 30, 2025 -
Update docs dependencies
#157287 opened
Jun 30, 2025 -
[nativert] add memory overlap debug assertion
#157290 opened
Jun 30, 2025 -
adding the ability to record aten arg vals and types
#157291 opened
Jun 30, 2025 -
Fixes typo in nccl_window_registration test
#157293 opened
Jun 30, 2025 -
Enable `file_descriptor` strategy on Darwin
#157295 opened
Jun 30, 2025 -
Test re-enabling ET test
#157298 opened
Jun 30, 2025 -
[AOTI][experiment]
#157301 opened
Jun 30, 2025 -
[dynamo] Fix source for lru_cache method
#157308 opened
Jun 30, 2025 -
Fix inconsistent pybind11 usage across ONNX and Tensorpipe during CMake build
#157309 opened
Jun 30, 2025 -
Using torch.accelerator in comm_mode_features_example.py and visualize_sharding_example.py
#157317 opened
Jun 30, 2025 -
Making input dynamically adjust.
#157324 opened
Jun 30, 2025 -
Add inductor lowerings for adaptive_avg_pool3d/adaptive_max_pool3d
#157331 opened
Jun 30, 2025 -
[BE] Rename TorchVersion -> VersionString
#157333 opened
Jul 1, 2025 -
Make the name assert actually do something, and reserve some more names
#157342 opened
Jul 1, 2025 -
[dynamo] Replace unimplemented with unimplemented_v2 in `torch/_dynamo/variables/torch.py`
#157344 opened
Jul 1, 2025 -
Add Intel GPU info collection to the collect env script
#157351 opened
Jul 1, 2025 -
[cherry-pick] temporarily disabling generation of weblinks for torch v2.8 …
#157353 opened
Jul 1, 2025 -
[BE] Update xpu driver repo for CD used almalinux 8.10
#157356 opened
Jul 1, 2025 -
[BE] fix typo: inpt -> input
#157361 opened
Jul 1, 2025 -
Fix diagnostic message for CUDA version mismatch in cuda.cmake
#157370 opened
Jul 1, 2025 -
[HF][DCP] Upload local consolidated files to remote storage if needed
#157371 opened
Jul 1, 2025 -
[submodule][cutlass] Update pin to b995f93 v4.0.0
#157376 opened
Jul 1, 2025 -
[release/2.8] update Triton 3.4 pin to f81f19a7
#157377 opened
Jul 1, 2025 -
101385: Warning message when non-coo tensors are passed to `is_sparse`
#157378 opened
Jul 1, 2025 -
[inductor] Fix memory layout for concatenation of repeated input
#157380 opened
Jul 1, 2025 -
[multi-kernel][fix-comments] attempt-1
#157384 opened
Jul 1, 2025 -
[CI] Fixes CI for CUDA Version > 12.9
#157385 opened
Jul 1, 2025 -
Add explicit typing to nn.Module.__init__() parameters
#157389 opened
Jul 1, 2025 -
[xpu] Correctly load RNG state during XPU checkpointing
#157390 opened
Jul 1, 2025 -
[dynamic shapes] allocate fresh symbols for slice
#157392 opened
Jul 1, 2025 -
Fix is_unaligned usage of statically_known_true
#157400 opened
Jul 1, 2025 -
[SymmMem] Install NVSHMEM wheel in CI docker
#157411 opened
Jul 2, 2025 -
[cherry-pick] Organize BUCK for torch/standalone and Rename torch::standalone to headeronly
#157418 opened
Jul 2, 2025 -
Preserve current stream in TestCuda::test_stream_compatibility
#157421 opened
Jul 2, 2025 -
[PowerPC] Fixed build issue for vsx vec256 complexfloat and scaled_mm_out_cpu
#157422 opened
Jul 2, 2025 -
Add a flag "realized" in IRNode to enable tracking origin_nodes
#157423 opened
Jul 2, 2025 -
[Refactor][XPU] Refactor XPU quantization op and add header files.
#157430 opened
Jul 2, 2025 -
[build] make SDist buildable: bootstrap git repo and submodules
#157432 opened
Jul 2, 2025 -
[test] Yanbing/tf32 dev
#157433 opened
Jul 2, 2025 -
Add a test for checking that the CUDA stubs directory is not in libcaffe2_nvrts.so's RPATH or RUNPATH
#157437 opened
Jul 2, 2025 -
Fix FlexAttention int64 indexing for large tensors
#157447 opened
Jul 2, 2025 -
[inductor][user triton] sanitize triple-quoted docstrings in kernel definitions
#157454 opened
Jul 2, 2025 -
Add legacy note to autograd.profiler doc.
#157459 opened
Jul 2, 2025 -
[PowerPC]: Fixed build issue that occur because of datatype f8 enablement for onednn in qlinear and prepack
#157469 opened
Jul 2, 2025 -
[dynamo] Add an assertion in guards to fail early for non-sequence length checks
#157478 opened
Jul 2, 2025 -
[wip] torch._dynamo.save/load() for saving and loading compiled models.
#157481 opened
Jul 2, 2025 -
Fix typo: 'tracable' → 'traceable' in torch/_dynamo/variables/torch.py
#157483 opened
Jul 2, 2025 -
[BE] rewrite `CacheBase` and `LocalCache` as generics
#157493 opened
Jul 2, 2025 -
Add test for user-managed weights with load_state_dict
#157496 opened
Jul 2, 2025 -
[EXPERIMENTL][dynamo] Remove `input_source_to_var`
#157497 opened
Jul 2, 2025 -
Add `max_pool3d` backward pass for MPS
#157498 opened
Jul 2, 2025 -
[EXPERIMENTAL] turn on `torch._dynamo.config.capture_scalar_outputs` by default
#157499 opened
Jul 2, 2025 -
[EXPERIMENTAL] turn on `torch._dynamo.config.capture_dynamic_output_shape_ops` by default
#157500 opened
Jul 2, 2025 -
[DeviceMesh] Use user set backend and pg option even for the global mesh
#157501 opened
Jul 2, 2025 -
[autograd] Avoid creating and recording event when unnecessary
#157503 opened
Jul 2, 2025 -
[1/N] cost coverage improvment
#157504 opened
Jul 2, 2025 -
[refactor][dynamo] make BUILD_TUPLE instruction use inst.arg
#157505 opened
Jul 2, 2025 -
[WIP][FSDP2] support dataclass args/kwargs and output
#157506 opened
Jul 2, 2025 -
[DO NOT MERGE] Clone of PR #157309
#157507 opened
Jul 2, 2025 -
[wip] inspect output code
#157508 opened
Jul 2, 2025 -
[ONNX] Fix conversion of attention - 4D
#157509 opened
Jul 2, 2025 -
[wip] merge async and progressive
#157510 opened
Jul 2, 2025 -
[dynamo] fix infinite loop in computing all stack meta
#157511 opened
Jul 2, 2025 -
[dynamo] Fix bug in dict(mapping_proxy)
#157515 opened
Jul 2, 2025 -
[PGO] include module int attributes in PGO state
#157518 opened
Jul 3, 2025 -
[cherry-pick] [fake tensor] fix issue of no attribute tags (#156689)
#157519 opened
Jul 3, 2025 -
Enable TF32 as fp32 internal precision for matmul/linear/conv
#157520 opened
Jul 3, 2025 -
[c10d] support dynamic shapes for all_to_all_single_autograd
#157521 opened
Jul 3, 2025 -
[DeviceMesh] Add error when users try to slice non contiguous flattened dim submesh
#157523 opened
Jul 3, 2025 -
[Easy] Show some clear error when torch.ops.load_library fails.
#157524 opened
Jul 3, 2025 -
[br][pc] consolidate attempt 1
#157526 opened
Jul 3, 2025 -
[dynamo, docs] add dynamo programming model docs
#157527 opened
Jul 3, 2025 -
[WIP] avoid unnecessary slices
#157528 opened
Jul 3, 2025 -
[FSDP2] Use reduceOpSum for world size 1
#157529 opened
Jul 3, 2025 -
Fix typo: 'reset_paramteres' → 'reset_parameters' in transformer.cpp
#157536 opened
Jul 3, 2025 -
handling special case for pow(3) for GPU
#157537 opened
Jul 3, 2025 -
Don't try installing missing cuda dependencies on s390x
#157540 opened
Jul 3, 2025 -
S390x update test marks
#157541 opened
Jul 3, 2025 -
[indcutor] pack linear for FP32 dynamic mode
#157542 opened
Jul 3, 2025 -
Add is_hidden_event method to KinetoEvent Python interface
#157546 opened
Jul 3, 2025 -
[BE][1/5] fix typos in aten/
#157550 opened
Jul 3, 2025 -
[BE][2/5] fix typos in aten/ (aten/src/ATen/native/)
#157551 opened
Jul 3, 2025 -
[BE][3/5] fix typos in aten/ (aten/src/ATen/native/)
#157552 opened
Jul 3, 2025 -
[BE][4/5] fix typos in aten/ (aten/src/ATen/native/)
#157553 opened
Jul 3, 2025 -
[BE][5/5] fix typos in aten/ (aten/src/ATen/)
#157554 opened
Jul 3, 2025 -
Try adding sm_50-sm_70 arches for linux cuda 12.8 builds
#157558 opened
Jul 3, 2025 -
Linux py 3.14 wheel builds
#157559 opened
Jul 3, 2025 -
[PT2][memory] mutation size correctness
#157562 opened
Jul 3, 2025 -
[PT2][fusion] ban fusions with large accumulated reads
#157563 opened
Jul 3, 2025 -
[dynamo] [guard] Change the guard type of inside disable function to avoid unnecessary recompilation.
#157566 opened
Jul 3, 2025 -
[MPS][DO NOT MERGE] CI signals for conv nan issue on macOS CPU
#157568 opened
Jul 3, 2025 -
[simplefsdp auto-bucketing] ir node runtime estimation
#157572 opened
Jul 3, 2025 -
Test case for nanogpt
#157576 opened
Jul 3, 2025 -
Fixed the function to get the origin nodes of fused triton kernel.
#157578 opened
Jul 3, 2025 -
[fbcode] switch to cutlass-4
#157579 opened
Jul 3, 2025 -
allow user to pass in custom partitioner function
#157580 opened
Jul 3, 2025 -
Fix typo: 'initalizer' → 'initializer' in test_reductions.cpp
#157581 opened
Jul 3, 2025 -
allow _size_of to return individual element's size
#157582 opened
Jul 3, 2025 -
correctly import torch.version
#157584 opened
Jul 3, 2025 -
[CUDA][NVTX] use `pytorch` nvtx domain for pytorch ranges
#157586 opened
Jul 3, 2025 -
Add einops x torch.compile testing in PyTorch CI (#157416)
#157588 opened
Jul 3, 2025 -
Add stack trace of exception to MultiProcContinousTest
#157589 opened
Jul 3, 2025 -
Add master switch for aot_inductor.compile_standalone
#157590 opened
Jul 3, 2025 -
[AOTI] Split aoti_runtime/model.h to prepare for model static linking
#157592 opened
Jul 3, 2025 -
Fix einsum strategy shard dim > ndim
#157593 opened
Jul 3, 2025 -
[dynamo] Move skipIf decorator to class level in test_fx_graph_runnable
#157594 opened
Jul 3, 2025 -
Fix doc issue 153531 by adding further explanation of STFT equation
#157595 opened
Jul 3, 2025 -
Fix typo: 'inital_grad' → 'initial_grad' in FSDP test
#157596 opened
Jul 3, 2025 -
Fix einops x torch.compile interaction
#157600 opened
Jul 4, 2025 -
[DRAFT] DDE-Free select with unbacked index.
#157605 opened
Jul 4, 2025 -
[aot] add format_consts_to_cpp function for further development.
#157608 opened
Jul 4, 2025 -
[Device] Add support for PrivateUse1 device type in parse_type function
#157609 opened
Jul 4, 2025 -
[pruning] add more test cases for pruning
#157613 opened
Jul 4, 2025 -
tlparse remove duplicate reasons
#157618 opened
Jul 4, 2025 -
[pruning] Implement Taylor expansion unstructured pruning
#157620 opened
Jul 4, 2025 -
[nativert] Move ModelRunnerBase to oss.
#157633 opened
Jul 4, 2025 -
[BE][1/6] fix typos in test/
#157635 opened
Jul 4, 2025 -
[BE][2/6] fix typos in test/ (test/test_*.py)
#157636 opened
Jul 4, 2025 -
[BE][3/6] fix typos in test/
#157637 opened
Jul 4, 2025 -
[BE][6/6] fix typos in test/ (test/distributed/)
#157640 opened
Jul 4, 2025 -
Fix typo: 'initalization' → 'initialization' in profiler test comment
#157645 opened
Jul 4, 2025 -
[MemoryViz] Add file selector button
#157647 opened
Jul 4, 2025 -
Fix typo: 'occurance' → 'occurrence' in typing test
#157649 opened
Jul 4, 2025 -
Fix typo: 'paramter' → 'parameter' in dynamo variable comment
#157651 opened
Jul 4, 2025 -
[wip] async cancellation test
#157652 opened
Jul 4, 2025 -
Fix typo: 'reset_paramteres' → 'reset_parameters' in transformer module comments
#157656 opened
Jul 4, 2025 -
Fixes issue 157195 by adding error message
#157658 opened
Jul 5, 2025 -
[wip] merge async and progressive compiles
#157659 opened
Jul 5, 2025 -
Fix typo: 'occurance' → 'occurrence' in lazy extract_compiled_graph.py
#157664 opened
Jul 5, 2025 -
Fix typo: 'initalizer' → 'initializer' in test_reductions.cpp
#157667 opened
Jul 5, 2025 -
Fix 'dllimport attribute ignored on inline function'
#157670 opened
Jul 6, 2025 -
Fix index_put propagate strategy arg unpack error
#157671 opened
Jul 6, 2025 -
Fix torch._numpy advanced indexing to match NumPy when indices are separated
#157676 opened
Jul 6, 2025 -
[pt2 event logging] add configurable prefix
#157678 opened
Jul 6, 2025 -
installing requirements.txt fix
#157681 opened
Jul 6, 2025 -
[dtensor] add support for fused optimizer with parameters across multiple meshes
#157682 opened
Jul 7, 2025 -
[Inductor][Float8] Add float8_e4m3fn into assertion dtype list.
#157684 opened
Jul 7, 2025 -
[BE] add `SHFMT` linter to format shell scripts
#157685 opened
Jul 7, 2025 -
[BE][1/4] format shell scripts with `SHFMT`
#157686 opened
Jul 7, 2025 -
[BE][2/4] format shell scripts with `SHFMT` in .circleci/ and .github/
#157687 opened
Jul 7, 2025 -
[BE][3/4] format shell scripts with `SHFMT` in .ci/
#157688 opened
Jul 7, 2025 -
[BE][4/4] format shell scripts with `SHFMT` in scripts/
#157689 opened
Jul 7, 2025 -
[canary] dedupe args + on by default
#157690 opened
Jul 7, 2025 -
[canary] dedupe args + on by default
#157691 opened
Jul 7, 2025 -
[BE][Easy] add `.editorconfig` setting for C/C++/CUDA/ObjC
#157692 opened
Jul 7, 2025 -
[CI] Fix xpu ci test sccache issue
#157693 opened
Jul 7, 2025 -
fix storage use_count
#157694 opened
Jul 7, 2025 -
[SymmMem] find_path does not search /usr/local/lib
#157695 opened
Jul 7, 2025 -
Update slow tests
#157696 opened
Jul 7, 2025
84 Issues closed by 25 people
-
DISABLED test_ranks_and_tag (__main__.CompileTest)
#147974 closed
Jul 7, 2025 -
DISABLED test_dont_dce_rand (__main__.ReproTests)
#156580 closed
Jul 7, 2025 -
DISABLED test_add_complex_conj (__main__.ReproTests)
#156579 closed
Jul 7, 2025 -
DISABLED test_tracker_with_activation_checkpointing (__main__.TestTrackerFullyShard1DTrainingCompose)
#139814 closed
Jul 7, 2025 -
DISABLED test_tracker_non_root_forward_backward (__main__.TestTrackerFullyShard1DTrainingCore)
#129692 closed
Jul 7, 2025 -
DISABLED test_non_contiguous_input_mm_plus_mm (__main__.TestMaxAutotune)
#126867 closed
Jul 7, 2025 -
DISABLED test_aoti (__main__.TestMemoryPlanning)
#145211 closed
Jul 7, 2025 -
DISABLED test_graph_partition_forward_backward_not_called (__main__.CudaGraphTreeTests)
#157642 closed
Jul 7, 2025 -
Will the Metal4 update bring significant optimizations for future pytorch mps performance and compatibility?
#157660 closed
Jul 6, 2025 -
`torch.compile` fails with `UnicodeDecodeError` when model contains extreme value injection
#156451 closed
Jul 6, 2025 -
torch.utils.cpp_extension fails to parse clang version 20.1.7+libcxx
#157665 closed
Jul 6, 2025 -
Mispelled "paramter" in test_fully_shard_training.py
#157564 closed
Jul 5, 2025 -
test_ops.py extremely slow on cuda11.3
#79528 closed
Jul 5, 2025 -
Torch.compile Dynamo failed to run FX node with fake tensors
#157657 closed
Jul 5, 2025 -
Fix warning #177-D: variable "threshold" was declared but never referenced
#157653 closed
Jul 5, 2025 -
DISABLED test_is_isnot (__main__.TestScript)
#120694 closed
Jul 4, 2025 -
DISABLED test_sdpa_mask_fp16_L6_S17_NH23_HS121 (__main__.TestSDPA)
#138905 closed
Jul 4, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int32 (__main__.TestForeachCUDA)
#156497 closed
Jul 4, 2025 -
DISABLED test_Linear_cuda_tf32 (__main__.TestNN)
#155216 closed
Jul 4, 2025 -
DISABLED test_graph_partition_forward_backward (__main__.CudaGraphTreeTests)
#157615 closed
Jul 4, 2025 -
Importing `torch` overwrites `typing.TypeIs` when `_running_with_deploy()` is true.
#153942 closed
Jul 4, 2025 -
INTERNAL ASSERT FAILED in mse_loss when mixing CPU and CUDA tensors
#154978 closed
Jul 4, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int16 (__main__.TestForeachCUDA)
#156430 closed
Jul 4, 2025 -
DISABLED test_graph_partition_dynamic_shapes (__main__.CudaGraphTreeTests)
#157555 closed
Jul 4, 2025 -
[aot_compile]Explanation: Dynamo does not know how to trace the builtin `time.time.`
#157352 closed
Jul 4, 2025 -
[inductor] `F.fractional_max_pool2d` throws `AssertionError` on Inductor when input `rank=3`
#156682 closed
Jul 4, 2025 -
Set inplace operations are not updating the set inplace
#153552 closed
Jul 4, 2025 -
dynamo cannot trace global op_set .__contains__
#145761 closed
Jul 4, 2025 -
Why scale value of GradScaler sudden changed?
#157436 closed
Jul 4, 2025 -
Incorrect inference of the groups parameter type for channel_stuffle (int misclassified as Tensor)
#157602 closed
Jul 4, 2025 -
A more flexible API for torch.compile fullgraph=True
#144908 closed
Jul 3, 2025 -
Suggestion: integration of einops test suite
#146782 closed
Jul 3, 2025 -
DISABLED test_set_stance_aot_eager_then_compile (__main__.DecoratorTests)
#148644 closed
Jul 3, 2025 -
DISABLED test_graph_partition_custom_op_no_split (__main__.CudaGraphTreeTests)
#157532 closed
Jul 3, 2025 -
Wrong vector shift results on PowerPC
#109777 closed
Jul 3, 2025 -
Enhanced torch.chunk and torch.split
#60531 closed
Jul 3, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64 (__main__.TestForeachCUDA)
#153544 closed
Jul 3, 2025 -
DISABLED test_graph_partition_custom_op_mutation (__main__.CudaGraphTreeTests)
#157448 closed
Jul 3, 2025 -
DISABLED [WORKFLOW_NAME] / [PLATFORM_NAME] / [JOB_NAME]
#157530 closed
Jul 3, 2025 -
[CPU][flex attention] Llama 3 failed on CPU with PyTorch 2025-06-22 nightly wheel
#156688 closed
Jul 3, 2025 -
an illegal memory access was encountered global exception
#136407 closed
Jul 3, 2025 -
Bug with "make latexpdf"
#135420 closed
Jul 3, 2025 -
some tests regarding `torch.export` in `transformers` fail with `torch 2.8.0 rc` but pass with `torch 2.7.1`
#157284 closed
Jul 2, 2025 -
Tiny Typo in Docs
#157444 closed
Jul 2, 2025 -
[inductor] vision_maskrcnn dashboard failure on H100 (and MI300)
#157316 closed
Jul 2, 2025 -
Profiler: Add hide metadata flag to skip events in key_averages() table
#155213 closed
Jul 2, 2025 -
torch.compile triton kernel errors when there are """ docblocks
#155006 closed
Jul 2, 2025 -
DISABLED test_graph_partition_custom_op_dynamoc_shapes (__main__.CudaGraphTreeTests)
#157426 closed
Jul 2, 2025 -
DISABLED test_graph_partition_custom_op (__main__.CudaGraphTreeTests)
#157412 closed
Jul 2, 2025 -
AttributeError: '_OpNamespace' 'aten' object has no attribute 'momentum'
#145274 closed
Jul 2, 2025 -
DISABLED test_name_match (__main__.TestGuardSerialization)
#156246 closed
Jul 2, 2025 -
DISABLED test_shape_env (__main__.TestGuardSerialization)
#156264 closed
Jul 2, 2025 -
DISABLED test_graph_partition_cpu_tensor_symints (__main__.CudaGraphTreeTests)
#157366 closed
Jul 2, 2025 -
torch.export produce stack_trace for output node that can fail decomposition
#157183 closed
Jul 1, 2025 -
[MPS] `torch.compile` fails on `torch.linalg.cholesky` (possible memory layout issue?)
#156658 closed
Jul 1, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32 (__main__.TestForeachCUDA)
#153470 closed
Jul 1, 2025 -
test issue, ignore this
#157151 closed
Jul 1, 2025 -
the example program using libtorch is not linked against torch_cuda even when USE_CUDA is defined
#148770 closed
Jul 1, 2025 -
DISABLED test_graph_partition_cpu_scalar_mutation (__main__.CudaGraphTreeTests)
#157358 closed
Jul 1, 2025 -
[ROCm] support torch._C._set_sm_carveout_experimental - Parity with Nvidia
#149280 closed
Jul 1, 2025 -
DISABLED test_graph_partition_cpu_scalar4 (__main__.CudaGraphTreeTests)
#157347 closed
Jul 1, 2025 -
DISABLED test_graph_partition_cpu_scalar3 (__main__.CudaGraphTreeTests)
#157338 closed
Jul 1, 2025 -
[inductor] [triton backend] `Conv2d-unsqueeze-AdaptiveAvgPool3d` output incorrect results on inductor
#157248 closed
Jul 1, 2025 -
DISABLED test_graph_partition_cpu_scalar2 (__main__.CudaGraphTreeTests)
#157311 closed
Jul 1, 2025 -
avoid guarding on max() unnecessarily
#149635 closed
Jun 30, 2025 -
[Upstream Triton] Support new host-side TMA API in user-defined triton kernels
#155574 closed
Jun 30, 2025 -
[feature request][AOTI] Expand check input assertions to cover input guards created during compilation?
#151925 closed
Jun 30, 2025 -
DISABLED test_lowering_to_x86 (__main__.TestQuantizePT2EX86Inductor)
#153140 closed
Jun 30, 2025 -
aot inductor intermediate tensor debug printing (setting 2) not working
#145425 closed
Jun 30, 2025 -
Certain MPS operations didn't properly check for data type
#157303 closed
Jun 30, 2025 -
Missing MPS-compatible build for PyTorch 2.7.1 on Apple Silicon (M4)
#157271 closed
Jun 30, 2025 -
Native BFloat16 Mixed BatchNorm Train gives incorrect gradients
#156513 closed
Jun 30, 2025 -
[release] Make pytorch source distribution package respect pep-0517
#150461 closed
Jun 30, 2025 -
DISABLED test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn)
#75648 closed
Jun 30, 2025 -
DISABLED test_graph_partition_cpu_scalar1 (__main__.CudaGraphTreeTests)
#157277 closed
Jun 30, 2025 -
DISABLED test_quantize (__main__.TestOpenReg)
#156089 closed
Jun 30, 2025 -
DISABLED test_jacobian_vectorize_raises_no_warnings_logging_tensor (__main__.TestAutogradFunctional)
#153707 closed
Jun 30, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16 (__main__.TestForeachCUDA)
#153379 closed
Jun 30, 2025 -
RuntimeError: d.is_cuda() INTERNAL ASSERT FAILED at "/pytorch/c10/cuda/impl/CUDAGuardImpl.h"
#151486 closed
Jun 30, 2025 -
DISABLED test_reorder_peak_memory (__main__.TestOperatorReorderForPeakMemory)
#145332 closed
Jun 30, 2025 -
DISABLED test_graph_partition_cpu_op_and_dynamic_shapes (__main__.CudaGraphTreeTests)
#157257 closed
Jun 30, 2025
127 Issues opened by 74 people
-
[inductor][fuzzer] Compilation Error in complex64+toint
#157683 opened
Jul 7, 2025 -
CONTRIBUTING.md install command incorrect
#157680 opened
Jul 6, 2025 -
Flex Attention breaks in certain cases when used with a learned bias
#157677 opened
Jul 6, 2025 -
Cannot create a mask for each sequence in a batch with Flex Attention
#157675 opened
Jul 6, 2025 -
extern declaration of the entity XXX is treated as a static definition
#157674 opened
Jul 6, 2025 -
Inductor throws UnicodeDecodeError when compiling a simple model on Windows with MSVC
#157673 opened
Jul 6, 2025 -
Feedback about Getting Started on Intel GPU
#157672 opened
Jul 6, 2025 -
NCCL error caused due to use of NVLS in torch 2.7.1-cu128 on aarch64 gb200 cluster
#157668 opened
Jul 6, 2025 -
ConvNd ops in channel last layout (N,L,C) / (N,H,W,C) / (N,D,H,W,C)
#157663 opened
Jul 5, 2025 -
OffsetBasedRNGTracker's run_state_sync causes deadlock due to inconsistent broadcast order across ranks
#157662 opened
Jul 5, 2025 -
RuntimeError: operator torchvision::nms does not exist
#157648 opened
Jul 4, 2025 -
DISABLED test_vmap_exhaustive_dot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157644 opened
Jul 4, 2025 -
DISABLED test_graph_partition_forward_backward_not_called (__main__.CudaGraphTreeTests)
#157643 opened
Jul 4, 2025 -
Einsum of 2 dtensors fails in inference mode
#157631 opened
Jul 4, 2025 -
Regression: torch.distributed.gather_object segfaults
#157627 opened
Jul 4, 2025 -
Segmentation faults in test_ops.py tests with gcc13 on AArch64 (v1)
#157626 opened
Jul 4, 2025 -
file_name is not correctly read in here
#157624 opened
Jul 4, 2025 -
`TORCH_DISTRIBUTED_DEBUG=DETAIL` causes DTensors to raise errors
#157622 opened
Jul 4, 2025 -
ResNet Onnx export dynamic batch size exported as fixed batch size
#157621 opened
Jul 4, 2025 -
DISABLED test_vmap_exhaustive_addmv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157617 opened
Jul 4, 2025 -
DISABLED test_graph_partition_forward_backward (__main__.CudaGraphTreeTests)
#157616 opened
Jul 4, 2025 -
`torch.compile` fails on `prims.broadcast_in_dim` with alias annotation error
#157610 opened
Jul 4, 2025 -
`torch.compile` fails on `torch.vdot` with complex tensors
#157607 opened
Jul 4, 2025 -
Both DTensor TP and SP are missing the last collective in the backward pass
#157606 opened
Jul 4, 2025 -
Incorrect inference of the groups parameter type for channel_stuffle (int misclassified as Tensor)
#157603 opened
Jul 4, 2025 -
PyTorch 2.7.1 will probably break with einops 0.8.2 or 0.9.0
#157601 opened
Jul 4, 2025 -
PT2E Quantization Migration Tracker
#157591 opened
Jul 3, 2025 -
[DTensor] Better communication cost model for redistribute
#157585 opened
Jul 3, 2025 -
[precompile] Precompile failure on nanogpt training
#157577 opened
Jul 3, 2025 -
torch.compile with numpy code differs from numpy's behavior
#157569 opened
Jul 3, 2025 -
DISABLED test_graph_partition_dynamic_shapes (__main__.CudaGraphTreeTests)
#157556 opened
Jul 3, 2025 -
Add full support for NVIDIA RTX Pro 6000 (Blackwell – SM122 / Compute Capability 12.2)
#157549 opened
Jul 3, 2025 -
Nightly cu128 aarch64 wheels haven't been built for weeks
#157548 opened
Jul 3, 2025 -
Several `torch.*` functions raise uninformative `NotImplementedError`s when called with integer `dtype`
#157547 opened
Jul 3, 2025 -
test_dtensor.py::test_dtensor_save_load_import conflicts with autoloader importing torch._dynamo
#157545 opened
Jul 3, 2025 -
Vmap error raised by mask_mod of FlexAttention
#157543 opened
Jul 3, 2025 -
PyTorch fails to detect AVX through it's detected
#157538 opened
Jul 3, 2025 -
DISABLED test_vmap_exhaustive___rmatmul___cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157534 opened
Jul 3, 2025 -
DISABLED test_graph_partition_custom_op_no_split (__main__.CudaGraphTreeTests)
#157533 opened
Jul 3, 2025 -
pytorch
#157531 opened
Jul 3, 2025 -
[release 2.9] Deprecate support for Maxwell, Pascal, and Volta architectures
#157517 opened
Jul 3, 2025 -
Failure with cub::TransformInputIterator in 12.9 periodic CI test
#157502 opened
Jul 2, 2025 -
[DTensor] Improve `tensor_metadata` and `redistribute_cost` coverage for op strategies.
#157495 opened
Jul 2, 2025 -
Quantized version of Gather layer
#157490 opened
Jul 2, 2025 -
`FSDPModule.set_reduce_scatter_divide_factor` on subset of parameters is broken?
#157485 opened
Jul 2, 2025 -
torch.ops._c10d_functional_autograd.all_to_all_single missing dynamic shapes support
#157479 opened
Jul 2, 2025 -
torch 2.6 and torchvision 0.21.0 incompatibility?
#157476 opened
Jul 2, 2025 -
[AOTI] Unit test for testing load_state_dict and
#157474 opened
Jul 2, 2025 -
Nightly NCCL builds are missing optional features from NCCL
#157465 opened
Jul 2, 2025 -
vLLM tests failing in torch 2.8rc but passing with torch 2.7
#157461 opened
Jul 2, 2025 -
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: Compiler: cl is not found
#157458 opened
Jul 2, 2025 -
RNN pseudocode wrong?
#157457 opened
Jul 2, 2025 -
Deprecation of CUTLASS Python interface
#157456 opened
Jul 2, 2025 -
we should graph break on nn.Parameter constructors
#157452 opened
Jul 2, 2025 -
Dynamo's einops version check is bogus
#157451 opened
Jul 2, 2025 -
DISABLED test_op_has_batch_rule_vdot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157450 opened
Jul 2, 2025 -
DISABLED test_graph_partition_custom_op_mutation (__main__.CudaGraphTreeTests)
#157449 opened
Jul 2, 2025 -
FlexAttention + int64 indexing
#157446 opened
Jul 2, 2025 -
DDP+TP composition does not work as expected
#157445 opened
Jul 2, 2025 -
[Regression] The torchbench model resnet50_quantized_qat fail_to_run in Pytorch 2.8 but pass in PyTorch 2.7
#157434 opened
Jul 2, 2025 -
``torch.quantile`` edge case
#157431 opened
Jul 2, 2025 -
DISABLED test_graph_partition_custom_op_dynamoc_shapes (__main__.CudaGraphTreeTests)
#157428 opened
Jul 2, 2025 -
DISABLED test_addmm_relu_cuda_float32 (__main__.TestLinalgCUDA)
#157427 opened
Jul 2, 2025 -
nll_loss gives result when both input and target are 1D tensor
#157420 opened
Jul 2, 2025 -
einops 0.6.1 x torch.compile broken in pytorch nightlies
#157417 opened
Jul 2, 2025 -
DISABLED test_old_cholesky_batched_upper_cuda_float32 (__main__.TestLinalgCUDA)
#157415 opened
Jul 2, 2025 -
DISABLED test_graph_partition_custom_op (__main__.CudaGraphTreeTests)
#157413 opened
Jul 2, 2025 -
[CI] s390x-periodic tests broken with "No matching distribution found for cuda-bindings<13.0,>=12.0"
#157409 opened
Jul 2, 2025 -
[autograd] Slowdown in backward after #151079
#157407 opened
Jul 1, 2025 -
Calling unbind on 2D NestedTensor throws RuntimeError
#157404 opened
Jul 1, 2025 -
AOTI: Failure in compile_fx.py with FakeScriptObject (with possible fix)
#157401 opened
Jul 1, 2025 -
[dynamo] using disable inside of compile always recompiles
#157399 opened
Jul 1, 2025 -
Cannot copy data from one gpu to another using torch
#157398 opened
Jul 1, 2025 -
[dynamo] non-strict trace'd functions cannot return constants
#157397 opened
Jul 1, 2025 -
[FSDP2] figure out the contract for mp_policy and tensor subclass extention
#157395 opened
Jul 1, 2025 -
How to compose HSDP with CP?
#157393 opened
Jul 1, 2025 -
[FSDP2] document the contract for modifying DTensor model.parameters()
#157391 opened
Jul 1, 2025 -
Torch is unusable when cuda-12.4 is installed locally
#157381 opened
Jul 1, 2025 -
[CI] M2Pro MacOS-15 tests are unstable again
#157379 opened
Jul 1, 2025 -
DISABLED test_graph_partition_cpu_tensor_symints (__main__.CudaGraphTreeTests)
#157367 opened
Jul 1, 2025 -
[MPS] test_linalg_cholesky fails on M4
#157364 opened
Jul 1, 2025 -
torch.Tensor.addmm_ The calculation result is inconsistent with the formula calculation result
#157360 opened
Jul 1, 2025 -
DISABLED test_graph_partition_cpu_scalar_mutation (__main__.CudaGraphTreeTests)
#157359 opened
Jul 1, 2025 -
Bug in cmake/public/cuda.cmake: Incorrect use of set(${...}) leads to missing CUDA version in error message
#157354 opened
Jul 1, 2025 -
DISABLED test_graph_partition_cpu_scalar4 (__main__.CudaGraphTreeTests)
#157350 opened
Jul 1, 2025 -
DISABLED test_matmul_small_brute_force_3d_Nd_cuda_float32 (__main__.TestLinalgCUDA)
#157349 opened
Jul 1, 2025 -
DISABLED test_conv2d_api (__main__.TestQuantizedFunctionalOps)
#157346 opened
Jul 1, 2025 -
nn.rmsnorm is super slower than nn.layernorm
#157345 opened
Jul 1, 2025 -
ImportError: cannot import name 'scaled_mm_configs' from 'torch._inductor.kernel.mm_common
#157343 opened
Jul 1, 2025 -
DISABLED test_graph_partition_cpu_scalar3 (__main__.CudaGraphTreeTests)
#157339 opened
Jul 1, 2025 -
DISABLED test_matmul_small_brute_force_2d_Nd_cuda_float32 (__main__.TestLinalgCUDA)
#157337 opened
Jul 1, 2025 -
Inefficient 2D convolution compared to JAX
#157334 opened
Jul 1, 2025 -
[inductor][dynamic shapes] hugging face models fail while creating error guard
#157330 opened
Jun 30, 2025 -
Regression in llama2 model export
#157323 opened
Jun 30, 2025 -
Symmetric memory test failed with TORCH_SYMMMEM=NVSHMEM
#157321 opened
Jun 30, 2025 -
Torch Elastic Wait timeout increase
#157318 opened
Jun 30, 2025 -
DISABLED test_linalg_solve_triangular_cuda_float32 (__main__.TestLinalgCUDA)
#157314 opened
Jun 30, 2025 -
DISABLED test_graph_partition_cpu_scalar2 (__main__.CudaGraphTreeTests)
#157312 opened
Jun 30, 2025 -
PyTorch Tutorial Audit - ONNX
#157300 opened
Jun 30, 2025 -
DISABLED test_tensordot_cuda (__main__.TestLinalgCUDA)
#157297 opened
Jun 30, 2025 -
DISABLED test_conv1d_api (__main__.TestQuantizedFunctionalOps)
#157296 opened
Jun 30, 2025 -
[export] run_decompositions generates inefficient operations
#157289 opened
Jun 30, 2025 -
DISABLED test_linalg_matrix_exp_compare_with_taylor_cuda_float32 (__main__.TestLinalgCUDA)
#157282 opened
Jun 30, 2025 -
DISABLED test_graph_partition_cpu_scalar1 (__main__.CudaGraphTreeTests)
#157280 opened
Jun 30, 2025 -
DISABLED test_addmm_dynamic_shapes_cuda (__main__.DynamicShapesGPUTests)
#157279 opened
Jun 30, 2025 -
DISABLED test_op_has_batch_rule_nn_functional_conv2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157278 opened
Jun 30, 2025 -
`torch 2.8 RC` gives 10000 larger output difference in some `transformers` tests
#157274 opened
Jun 30, 2025 -
`test_can_compile_fast_image_processor` in `transformers` pass with `torch 2.7` but fail with `torch 2.8 RC`
#157273 opened
Jun 30, 2025 -
Better typechecking of `int` only-operators `|`, `^`, `&`, `<<`, `>>`, `~` and `@`
#157266 opened
Jun 30, 2025 -
The opp is not compatible with compile mode="reduce-overhead" and linear layers for large inputs.
#157363 opened
Jun 30, 2025
418 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[ONNX] remove unnecessary slices before converting into onnx
#157192 commented on
Jul 3, 2025 • 23 new comments -
[AOTI] codegen for static linkage
#157129 commented on
Jul 4, 2025 • 16 new comments -
[dynamo] Add FakeProcessGroup support for fx_graph_runnable with distributed collectives
#157162 commented on
Jul 4, 2025 • 9 new comments -
Fix torch.export.export() GPU failure with RNN modules.
#155734 commented on
Jul 1, 2025 • 9 new comments -
[AOTI][Intel GPU] Add XPU quantization ops to AOT Inductor.
#156572 commented on
Jul 3, 2025 • 8 new comments -
[DDP][FSDP2] Add unit test for DDP mixed precision with FSDP2 ignored params
#157140 commented on
Jul 2, 2025 • 8 new comments -
`fast-autotune`: Model Prediction of Triton Kernel Runtimes
#156851 commented on
Jun 30, 2025 • 7 new comments -
[DLPack] Add support for missing keyword-arguments.
#150218 commented on
Jul 4, 2025 • 5 new comments -
Added philox based RNG context for HPU device in Dtensor scenarios
#156581 commented on
Jul 3, 2025 • 5 new comments -
Optimize scatter/gather kernel for ARM.
#156161 commented on
Jul 4, 2025 • 4 new comments -
[WIP] Automatically load and save dynamo entries via caching_precompile
#155913 commented on
Jul 3, 2025 • 4 new comments -
Fused RMSNorm implementation
#153666 commented on
Jul 2, 2025 • 4 new comments -
multi-kernel matmuls based on varying hint sizes
#156628 commented on
Jul 3, 2025 • 4 new comments -
[ZENDNN] Integrate ZenDNN library, implement Linear op, add unit-tests
#156599 commented on
Jul 2, 2025 • 4 new comments -
[dynamo] Avoid recompiling over unused objects
#156891 commented on
Jul 3, 2025 • 3 new comments -
[ROCm] logsumexp on ROCm needs scaling back to natural base.
#156903 commented on
Jul 2, 2025 • 3 new comments -
[ci][cutlass backend] Add ci for cutlass backend tests
#156626 commented on
Jul 3, 2025 • 3 new comments -
Adapting pipeline parallelism test cases to be device agnostic
#155108 commented on
Jul 2, 2025 • 3 new comments -
[HOP, map] Rework of map autograd to the new interface
#153343 commented on
Jul 4, 2025 • 3 new comments -
Fix slice op redistribute_cost compute
#157178 commented on
Jul 3, 2025 • 3 new comments -
[scan] Fix issues with scan on CPU and for autograd when implementing an RNN with multiple layers
#155422 commented on
Jul 2, 2025 • 3 new comments -
[Inductor] Set the default value of min_chunk_size to 512
#150762 commented on
Jul 2, 2025 • 3 new comments -
Add cascade sum support for Inductor CPP backend
#156296 commented on
Jul 3, 2025 • 2 new comments -
Fix: fallback in deserialize_torch_artifact for ScriptObject using weights_only=False
#154333 commented on
Jul 1, 2025 • 2 new comments -
[BE] add a linter to check consistency for cmake minimum version in requirements
#156961 commented on
Jul 3, 2025 • 2 new comments -
[TEST] triton Update 3.4 - 2
#156664 commented on
Jul 4, 2025 • 2 new comments -
Update _torch_docs.py to Fix torch.bernoulli()
#152104 commented on
Jul 2, 2025 • 2 new comments -
[build] remove upper version pin for `setuptools<80.0`
#156049 commented on
Jul 4, 2025 • 2 new comments -
[generator] Close all open generators in compile_subgraph
#157149 commented on
Jul 2, 2025 • 2 new comments -
[CUDA] Use runtime driver API for cuStreamWriteValue32
#156097 commented on
Jul 3, 2025 • 1 new comment -
Enhance testing infrastructure to add half-precision support for `histc` on XPU
#154339 commented on
Jul 3, 2025 • 1 new comment -
ROCm OCP Micro-scaling Format (mx-fp8/mx-fp4) Support
#151360 commented on
Jul 1, 2025 • 1 new comment -
[oss] Add version to metadata
#155343 commented on
Jul 2, 2025 • 1 new comment -
[TEST] Triton 3.4.0 pin update
#156186 commented on
Jul 4, 2025 • 1 new comment -
Use CMake wholearchive group
#156393 commented on
Jul 7, 2025 • 1 new comment -
[dynamo][fsdp] Consistent behavior of int attributes
#157262 commented on
Jul 2, 2025 • 0 new comments -
[BE]: Update CUTLASS submodule to 4.0.0
#153541 commented on
Jul 4, 2025 • 0 new comments -
implement MKLGenerator
#154199 commented on
Jul 3, 2025 • 0 new comments -
Upgrade MKL in CI
#154198 commented on
Jul 2, 2025 • 0 new comments -
[BE]: Update pybind11 submodule to 3.0.0rc
#154115 commented on
Jul 4, 2025 • 0 new comments -
DOC: update CrossEntropyLoss with note and example of incorrect target specification
#155649 commented on
Jul 3, 2025 • 0 new comments -
[pytorch_146643] fixed max triton generation
#154056 commented on
Jul 2, 2025 • 0 new comments -
[pytorch][triton] Enabling TMA for flex-attention for supported device types
#153662 commented on
Jul 3, 2025 • 0 new comments -
Add MPS implementation of CTC Loss based on CUDA version
#154044 commented on
Jul 2, 2025 • 0 new comments -
[dict] Raise TypeError in dict methods
#154003 commented on
Jul 5, 2025 • 0 new comments -
[list] Implement list.count
#153969 commented on
Jul 5, 2025 • 0 new comments -
[dict] Implement dict subclass `fromkeys` classmethod
#155608 commented on
Jul 5, 2025 • 0 new comments -
[OrderedDict] Set the correct dict class in UserDefinedDictVariable
#155502 commented on
Jul 5, 2025 • 0 new comments -
[OrderedDict] Add `bool(OrderedDict)`
#155503 commented on
Jul 5, 2025 • 0 new comments -
FractionalMaxPool3d add kernel_size check
#155549 commented on
Jul 4, 2025 • 0 new comments -
Fix conversion of values in libtorch agnostic tests
#155115 commented on
Jul 2, 2025 • 0 new comments -
Fixes #154982: add missing to_result_dtype in vector_norm
#155111 commented on
Jul 1, 2025 • 0 new comments -
[OrderedDict] Implement `OrderedDict.move_to_end(key, last=False)`
#155152 commented on
Jul 5, 2025 • 0 new comments -
[dict] Implement dict.__ior__ and fix return type in dict.__or__
#155072 commented on
Jul 5, 2025 • 0 new comments -
[OrderedDict] Implement `OrderedDict.popitem(last=...)`
#155153 commented on
Jul 5, 2025 • 0 new comments -
[Intel GPU] Refactor Matmul integration: Modularize bias handling and memory creation
#154977 commented on
Jul 3, 2025 • 0 new comments -
[dict] Implement `__eq__` for dict_items
#155154 commented on
Jul 5, 2025 • 0 new comments -
update the baseline for nightly max_autotune tests
#154973 commented on
Jul 1, 2025 • 0 new comments -
[OrderedDict] Implement explicit OrderedDict dunder method call
#154943 commented on
Jul 5, 2025 • 0 new comments -
[dict] Implement dict.__eq__ and dict.__ne__
#154942 commented on
Jul 5, 2025 • 0 new comments -
[BE]: Try to enable LTO
#154819 commented on
Jul 5, 2025 • 0 new comments -
[dict] Allow Dynamo to trace through explicit dict dunder method call
#154794 commented on
Jul 5, 2025 • 0 new comments -
[dict] Add dict.popitem
#154793 commented on
Jul 5, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#154694 commented on
Jul 7, 2025 • 0 new comments -
Use official CUDAToolkit module in CMake
#154595 commented on
Jul 2, 2025 • 0 new comments -
Fix MKL error: Inconsistent configuration parameters
#154585 commented on
Jul 3, 2025 • 0 new comments -
[OrderedDict] Implement `hasattr(..., IteratorVariable)`
#155501 commented on
Jul 5, 2025 • 0 new comments -
[cpp_wrapper] Build main and kernel code in separate threads
#154551 commented on
Jul 4, 2025 • 0 new comments -
[Dynamo] Guard serialization for BUILTIN_MATCH
#152729 commented on
Jul 6, 2025 • 0 new comments -
Update the signature and test of torch.hamming_window()
#152682 commented on
Jul 3, 2025 • 0 new comments -
Raise error when no record on extra_files
#152664 commented on
Jul 2, 2025 • 0 new comments -
Add assert_fp8_close helper for FP8 tensor comparisons
#152651 commented on
Jul 5, 2025 • 0 new comments -
[BE]remove vulkan test
#152643 commented on
Jul 1, 2025 • 0 new comments -
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 commented on
Jul 3, 2025 • 0 new comments -
Parameterized CUDA Graph Launch
#152622 commented on
Jul 1, 2025 • 0 new comments -
Update padding_mode type annotation to use Literal type (PaddingMode)
#152610 commented on
Jul 1, 2025 • 0 new comments -
[Testing] Is FindCUDA.cmake from `Modules_CUDA_fix` called at all?
#152604 commented on
Jun 30, 2025 • 0 new comments -
[BE] Delete `Module_CUDA_fix`
#152603 commented on
Jul 1, 2025 • 0 new comments -
[BE] Update numba versions
#152557 commented on
Jul 6, 2025 • 0 new comments -
[compile async] [cache] testing
#152523 commented on
Jul 6, 2025 • 0 new comments -
[inductor] [compile async] Don't compile in eager
#152507 commented on
Jul 5, 2025 • 0 new comments -
fix: Update padding_mode to use Literal for type checking
#152458 commented on
Jul 1, 2025 • 0 new comments -
Add epoch to fake tensor cache key
#152453 commented on
Jul 1, 2025 • 0 new comments -
fix: outdated contents in dynamo overview
#152382 commented on
Jul 6, 2025 • 0 new comments -
Updates to build on Noble (Ubuntu24.04) and py3.12
#152240 commented on
Jul 4, 2025 • 0 new comments -
IGNORE: Testing OIDC
#152181 commented on
Jun 30, 2025 • 0 new comments -
Extend compute_global_tensor_shape to multi dimension sharding
#152166 commented on
Jul 2, 2025 • 0 new comments -
Add dynamo config to HOP-ify context managers
#152159 commented on
Jul 2, 2025 • 0 new comments -
Add standard Python source distribution generation to (pre-)release workflow
#152098 commented on
Jul 3, 2025 • 0 new comments -
[UniformValueConstantFolder] deduce value on CPU rather than on device
#151998 commented on
Jul 7, 2025 • 0 new comments -
docs: add torch.e and torch.pi to constants table (#134964)
#151996 commented on
Jul 6, 2025 • 0 new comments -
Skip fuse attention on fp32 if not tf32
#151924 commented on
Jul 4, 2025 • 0 new comments -
Idea: Add SBOM Generation (and optional vuln scan) for better supply chain insight
#156085 commented on
Jul 2, 2025 • 0 new comments -
[CUDA] Allow cuDNN or flash attn in `test_activation_checkpointing` pattern match check
#153272 commented on
Jul 4, 2025 • 0 new comments -
fix dtensor and tensor inconsistent compute mesh
#153268 commented on
Jul 7, 2025 • 0 new comments -
Adding XPU support to DTensor examples
#153213 commented on
Jul 1, 2025 • 0 new comments -
[TESTING] Triton pin (Jul 1) f81f19a7f6cb7f905fde3195014c1bf51659642f
#153117 commented on
Jul 2, 2025 • 0 new comments -
Add CUDA support for Adagrad(fused=True)
#153038 commented on
Jul 1, 2025 • 0 new comments -
[WIP][dynamic shapes] unbacked safer cat, repeat
#153011 commented on
Jul 6, 2025 • 0 new comments -
[Pytorch] Add `torch.cuda.streams.Event` to save torch functions list
#152978 commented on
Jul 6, 2025 • 0 new comments -
[dtensor] Extend Partial partition of replicated tensor for min/max reduce
#152975 commented on
Jul 7, 2025 • 0 new comments -
docs: Improve documentation for NCCL timeout / watchdog variables
#152959 commented on
Jul 6, 2025 • 0 new comments -
[ROCm] Ck gemm architecture guard
#152951 commented on
Jun 30, 2025 • 0 new comments -
[feature] Channel Wise Parallel API for Conv layers
#152937 commented on
Jul 6, 2025 • 0 new comments -
Allow Inductor backends to attest their own availability
#152933 commented on
Jul 5, 2025 • 0 new comments -
Add overall tensor similarity comparison (#152647)
#152920 commented on
Jul 6, 2025 • 0 new comments -
Clarify wrap_triton doc about optional triton_op usage
#152874 commented on
Jul 5, 2025 • 0 new comments -
ci: Remove conda-env-macOS-ARM64, prefer pip
#152843 commented on
Jul 5, 2025 • 0 new comments -
[MSVC] Enable updated lambda processor by setting compiler flag /Zc:lambda globally
#152828 commented on
Jul 5, 2025 • 0 new comments -
another try
#152808 commented on
Jul 4, 2025 • 0 new comments -
wip
#152807 commented on
Jul 4, 2025 • 0 new comments -
Update CMakeLists.txt
#152786 commented on
Jul 6, 2025 • 0 new comments -
added short integer for repeat_interleave_cpu, Fixes #151311
#152762 commented on
Jul 5, 2025 • 0 new comments -
Allow ATen ops overloading
#152759 commented on
Jul 4, 2025 • 0 new comments -
Handle less functions than number of segments
#152753 commented on
Jul 6, 2025 • 0 new comments -
Conditionally support experimental filesystem include in jit_opt_limit
#152748 commented on
Jul 5, 2025 • 0 new comments -
[BE][Cleanup][Dynamo] Stop logging entire_frame_compile_time_s
#152738 commented on
Jul 5, 2025 • 0 new comments -
docs: fix dead link in torch.compile docs
#152734 commented on
Jul 5, 2025 • 0 new comments -
[BE]: Update NCCL to 2.27.5
#157108 commented on
Jul 4, 2025 • 0 new comments -
[Quant][CPU] Enable fp8 qconv
#157076 commented on
Jul 7, 2025 • 0 new comments -
Build CPP Extensions with COLOR
#157051 commented on
Jun 30, 2025 • 0 new comments -
Use std::string_view in torchgen
#157050 commented on
Jul 2, 2025 • 0 new comments -
[a2av] Make test input more random
#157029 commented on
Jul 3, 2025 • 0 new comments -
[EXPERIMENTAL][dynamo] Avoid potential graph breaks by relaxing `handle_traced_output` checks
#157013 commented on
Jul 2, 2025 • 0 new comments -
[itertools] Add CPython tests for itertools
#156981 commented on
Jul 2, 2025 • 0 new comments -
[CI] add decorator for specifying H100-only tests
#156980 commented on
Jun 30, 2025 • 0 new comments -
[TESTING] test new xpu runner
#156917 commented on
Jun 30, 2025 • 0 new comments -
Track monitor
#156907 commented on
Jul 1, 2025 • 0 new comments -
Add cuda 12.9 periodic tests
#156900 commented on
Jul 3, 2025 • 0 new comments -
ci: Add ability to test images for build-triton-wheel
#156894 commented on
Jul 1, 2025 • 0 new comments -
[refactor][dynamo] extract a helper function create_resume_fn from create_call_resume_at
#156869 commented on
Jul 3, 2025 • 0 new comments -
[TESTING] [DO NOT MERGE] Updated triton commit pin - upstream base
#156841 commented on
Jul 2, 2025 • 0 new comments -
[logging] [redo] dynamo_timed for CachingAutotuner.coordinate_descent_tuning
#156840 commented on
Jul 3, 2025 • 0 new comments -
[gtest][listing] Enable gtest json listing for the fbcode/caffe2 project
#156816 commented on
Jul 7, 2025 • 0 new comments -
add device generalization support for distributed tests
#156796 commented on
Jul 4, 2025 • 0 new comments -
[inductor] initial triton static config lookup table
#156785 commented on
Jun 30, 2025 • 0 new comments -
[cherry-pick] revert #156552
#156767 commented on
Jul 4, 2025 • 0 new comments -
add tests for Thunk utility function
#156759 commented on
Jun 30, 2025 • 0 new comments -
Add back manywheel-py3_9-cuda12_4-build/test
#156753 commented on
Jul 6, 2025 • 0 new comments -
WIP `fast_autotune`: Add lookup table and ML model to filter triton matmul configs
#156683 commented on
Jul 1, 2025 • 0 new comments -
Enable set SDPA backend by torch.nn.attention.sdpa_kernel on XPU
#156669 commented on
Jul 4, 2025 • 0 new comments -
[invoke_subgraph] make same subgraph share get_attr target
#157253 commented on
Jun 30, 2025 • 0 new comments -
[cc][pac] attempt 1.1
#157250 commented on
Jun 30, 2025 • 0 new comments -
[user triton] AOT inductor support for device-side TMA
#157241 commented on
Jul 2, 2025 • 0 new comments -
remove allow-untyped-defs from torch/ao/pruning/_experimental/pruner/parametrization.py
#157235 commented on
Jul 1, 2025 • 0 new comments -
remove allow-untyped-defs from torch/ao/nn/quantized/modules/rnn.py
#157234 commented on
Jul 5, 2025 • 0 new comments -
remove allow-untyped-defs from torch/backends/mkl/__init__.py
#157233 commented on
Jul 2, 2025 • 0 new comments -
remove allow-untyped-defs from torch/backends/cusparselt/__init__.py
#157232 commented on
Jul 5, 2025 • 0 new comments -
remove allow-untyped-defs from torch/_classes.py
#157231 commented on
Jul 2, 2025 • 0 new comments -
remove allow-untyped-defs from torch/utils/data/_utils/fetch.py
#157230 commented on
Jul 1, 2025 • 0 new comments -
remove allow-untyped-defs from torch/_lazy/__init__.py
#157228 commented on
Jul 2, 2025 • 0 new comments -
Updating default value of eps in RMSNorm documentation
#157223 commented on
Jul 2, 2025 • 0 new comments -
[DTensor][FSDP2] necessary changes to FSDP and TP to unblock EP
#157216 commented on
Jul 7, 2025 • 0 new comments -
fix type hints for interpolation functions
#157202 commented on
Jul 6, 2025 • 0 new comments -
Adding bias argument to NN normalization methods
#157198 commented on
Jul 2, 2025 • 0 new comments -
[DO NOT MERGE] Test new MI300X capacity.
#157191 commented on
Jul 2, 2025 • 0 new comments -
[Do Not Merge] moved pytorch mi300 worfklows to test scale sets
#157190 commented on
Jul 3, 2025 • 0 new comments -
[dynamo] auto-rewrite data-dependent if into torch.cond
#157161 commented on
Jul 3, 2025 • 0 new comments -
[dynamo] remove dead object from keepalive
#157159 commented on
Jul 2, 2025 • 0 new comments -
[WIP][CUDA][CI] Test B200 Runner with Nightly Inductor Perf Test
#157153 commented on
Jul 2, 2025 • 0 new comments -
[generator] Raise `StopIteration(value)` with value from the return stmt
#157152 commented on
Jul 2, 2025 • 0 new comments -
[nativert] libtorch kernel registry
#157150 commented on
Jul 7, 2025 • 0 new comments -
[contextlib] Fixes for CPython contextlib tests
#157148 commented on
Jul 2, 2025 • 0 new comments -
[Bugfix][Inductor] Fix dependency list merged incorrectly for a custom op with multiple mutated inputs and None return type.
#157133 commented on
Jul 1, 2025 • 0 new comments -
ReplaceWithCopy graph pass
#156666 commented on
Jul 1, 2025 • 0 new comments -
[WIP] Add a new API of allocator setting for accelerator
#156175 commented on
Jul 1, 2025 • 0 new comments -
Implementation of a ScannedModule
#156172 commented on
Jul 1, 2025 • 0 new comments -
[WIP] Deprecate some functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead
#156165 commented on
Jul 2, 2025 • 0 new comments -
[list] Raise exception in invalid list method call
#156148 commented on
Jul 5, 2025 • 0 new comments -
[executorch hash update] update the pinned executorch hash
#156141 commented on
Jul 7, 2025 • 0 new comments -
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel
#156140 commented on
Jul 3, 2025 • 0 new comments -
Convert to markdown: jit.rst
#156094 commented on
Jun 30, 2025 • 0 new comments -
Fix atleast_{1,2,3}d() with no arguments description
#156042 commented on
Jul 1, 2025 • 0 new comments -
[BE][Easy] set end-of-line for `.bat` file to CRLF in `.editorconfig`
#156032 commented on
Jul 7, 2025 • 0 new comments -
[build] modernize build-frontend: `python setup.py develop/install` -> `[uv ]pip install --no-build-isolation [-e ].`
#156027 commented on
Jul 5, 2025 • 0 new comments -
[BE] add a minimal linter to check `pyproject.toml` consistency
#156017 commented on
Jul 5, 2025 • 0 new comments -
Handling overflow for long int overflow for the product of kernel_hei…
#155989 commented on
Jul 3, 2025 • 0 new comments -
[CI][cpp_wrapper] Fix selection of CPU OpInfo tests
#155967 commented on
Jul 2, 2025 • 0 new comments -
[FSDP2] Fix issue with set_reduce_scatter_divide_factor errors and MixedPrecisionPolicy
#155964 commented on
Jul 4, 2025 • 0 new comments -
HF loads dcp - don't do a full deserialize on every file
#155942 commented on
Jul 1, 2025 • 0 new comments -
[inductor] Add `-> bool` to functions named `is_*` or `_is_*`
#155928 commented on
Jul 4, 2025 • 0 new comments -
[dynamo] Add `-> bool` to functions named `is_*` or `_is_*`
#155923 commented on
Jul 5, 2025 • 0 new comments -
[NOT FOR MERGE] Exploratory work on AOTInductor training
#155877 commented on
Jul 4, 2025 • 0 new comments -
[einops] Ensure Dynamo can trace through explicit set dunder method call
#155842 commented on
Jul 2, 2025 • 0 new comments -
[doc] Updates to distributed.md for XCCL backend
#155834 commented on
Jul 3, 2025 • 0 new comments -
[DONT MERGE][TESTING][1/2] xpu test runner
#155793 commented on
Jun 30, 2025 • 0 new comments -
add sfdp pattern
#155792 commented on
Jul 2, 2025 • 0 new comments -
[Misc] handle sys exit caused by skip_if_lt_x_gpu in test_composabili…
#155665 commented on
Jul 2, 2025 • 0 new comments -
[C10d][Gloo] Enable complex datatype support in ProcessGroupGloo
#156633 commented on
Jun 30, 2025 • 0 new comments -
[BE] fix typo in torch/distributed/tensor/: childs -> children
#156609 commented on
Jul 6, 2025 • 0 new comments -
[BE] fix typo in torch/_numpy/_normalizations.py: parm -> param
#156608 commented on
Jul 6, 2025 • 0 new comments -
[BE][15/16] fix typos in torch/ (torch/distributed/tensor/)
#156605 commented on
Jul 6, 2025 • 0 new comments -
docstring_linter: Fix #151692 and other issues
#156596 commented on
Jul 4, 2025 • 0 new comments -
[Inductor Dashboard] Enable deterministic algorithms for some models
#156592 commented on
Jun 30, 2025 • 0 new comments -
[Doc] remove WSL2 in support matrix for Intel GPU
#156590 commented on
Jun 30, 2025 • 0 new comments -
[CPU] Fix memory access for sbgemm bf16
#156585 commented on
Jul 7, 2025 • 0 new comments -
[xla hash update] update the pinned xla hash
#156584 commented on
Jul 7, 2025 • 0 new comments -
Enable target-determination (TD) for ROCm CI
#156545 commented on
Jul 5, 2025 • 0 new comments -
[dynamo] Guard eagerly on list objects to avoid guard on getitem index
#156531 commented on
Jul 1, 2025 • 0 new comments -
[DO NOT MERGE] Update trunk.yml to change the runner that the job runs-on
#156491 commented on
Jul 4, 2025 • 0 new comments -
[ROCm][Windows] Fix finding ROCm/HIP version
#156486 commented on
Jul 2, 2025 • 0 new comments -
[DONT MERGE][TESTING][2/2] test new xpu runner
#156410 commented on
Jun 30, 2025 • 0 new comments -
[list] Add list.__delitem__
#156339 commented on
Jul 5, 2025 • 0 new comments -
[BE][2/16] fix typos in torch/ (torch/_*/)
#156312 commented on
Jul 6, 2025 • 0 new comments -
[BE][1/16] fix typos in torch/
#156311 commented on
Jul 6, 2025 • 0 new comments -
[list] Add list.__mul__ and list.__imul__
#156271 commented on
Jul 5, 2025 • 0 new comments -
Implement list.__add__ and list.__iadd__
#156270 commented on
Jul 5, 2025 • 0 new comments -
Add fallback-aware device checking for MPS operations
#156267 commented on
Jul 1, 2025 • 0 new comments -
[list] Implement `list.remove`
#156242 commented on
Jul 5, 2025 • 0 new comments -
[Native][CPU][TopK] Improve perf by reducing swap operations
#156183 commented on
Jul 1, 2025 • 0 new comments -
[NVIDIA] Refactor Family Blackwell Support codegen
#156176 commented on
Jul 2, 2025 • 0 new comments -
PyTorch CPP Extensions fail when same kernel is compiled more than once on ROCm servers
#155344 commented on
Jul 2, 2025 • 0 new comments -
SourcelessBuilder.create does not know how to wrap <class '__main__.InFlexData'>
#154009 commented on
Jul 2, 2025 • 0 new comments -
`torch.linalg.solve` does not raise an error for singular matrix on CPU.
#154842 commented on
Jul 2, 2025 • 0 new comments -
TORCH_COMPILE_DEBUG=1 does not consistently generate debug logs
#152374 commented on
Jul 2, 2025 • 0 new comments -
Quantile is limited to 16 million elements and have poor performance.
#64947 commented on
Jul 2, 2025 • 0 new comments -
[RFC] Remove the FSDP data copy from compute stream critical path
#157027 commented on
Jul 2, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_tensordot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142769 commented on
Jul 2, 2025 • 0 new comments -
DISABLED test_graph_partition_cpu_op_and_dynamic_shapes (__main__.CudaGraphTreeTests)
#157258 commented on
Jul 2, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#82340 commented on
Jul 2, 2025 • 0 new comments -
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 commented on
Jul 2, 2025 • 0 new comments -
DISABLED test_module_and_optimizer_ids (__main__.TestTorchTidyProfiler)
#87581 commented on
Jul 2, 2025 • 0 new comments -
torch compile does not support SyncBatchNorm with fullgraph=True
#156680 commented on
Jul 2, 2025 • 0 new comments -
[RFC][API-Unstable] Intel GPU distributed Backend integration in `torch-xpu-ops` and registration in PyTorch
#141741 commented on
Jul 2, 2025 • 0 new comments -
`torch.compile` creates a CUDA context even for CPU based code
#150622 commented on
Jul 1, 2025 • 0 new comments -
Support for Bazel workspace function or Bazel module
#112903 commented on
Jul 1, 2025 • 0 new comments -
Export + autocast is eating the exception
#153202 commented on
Jul 1, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
Jul 1, 2025 • 0 new comments -
FSDP offload doesn't prefetch param to GPU
#157209 commented on
Jul 1, 2025 • 0 new comments -
Upgrade AWS lambda functions from version 2.x to 3.x of the AWS SDK for JavaScript
#137228 commented on
Jul 1, 2025 • 0 new comments -
cd: Migrate binary builds off of Jinja
#149660 commented on
Jul 1, 2025 • 0 new comments -
Add Python NoGil support in CI
#156854 commented on
Jul 1, 2025 • 0 new comments -
Add the XPU item to pytorch.org/get-started
#156810 commented on
Jul 1, 2025 • 0 new comments -
Vendored wheels on PyTorch pip repository are outdated (e.g., `cmake`, `certifi`)
#156694 commented on
Jul 1, 2025 • 0 new comments -
[OSS tooling] pytorchbot fail to revert a PR
#156607 commented on
Jul 1, 2025 • 0 new comments -
Add explicit typing to nn.Module __init__()
#156740 commented on
Jul 1, 2025 • 0 new comments -
Adafactor foreach impl performance tracker
#133367 commented on
Jul 3, 2025 • 0 new comments -
[RFC] Experimental Wheel Variant Support
#155141 commented on
Jul 3, 2025 • 0 new comments -
Triton pin update for PyTorch 2.8 / Triton 3.4
#154206 commented on
Jul 3, 2025 • 0 new comments -
cmake: add USE_SYSTEM_{KLEIDI,CUDNN_FRONTEND,CUTLASS} options to USE_SYSTEM_LIBS
#153863 commented on
Jul 3, 2025 • 0 new comments -
Add batched torch.combinations
#40375 commented on
Jul 3, 2025 • 0 new comments -
Python 3.14 support for PyTorch
#156856 commented on
Jul 3, 2025 • 0 new comments -
Allow `low` and `high` to be tensors in `torch.randint`
#89438 commented on
Jul 3, 2025 • 0 new comments -
DISABLED test_host_memory_stats (__main__.TestCuda)
#148607 commented on
Jul 3, 2025 • 0 new comments -
DISABLED test_matrix_rank_basic_cuda_float32 (__main__.TestLinalgCUDA)
#150406 commented on
Jul 3, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_single (__main__.CompileTest)
#147707 commented on
Jul 3, 2025 • 0 new comments -
DISABLED test_per_sample_api_compute_batch_size_not_pytreeable_cpu (__main__.TestExpandedWeightModuleCPU)
#146972 commented on
Jul 3, 2025 • 0 new comments -
MPS Memory Leak
#154329 commented on
Jul 3, 2025 • 0 new comments -
[Doc] [Win] libuv installation doc is not correct.
#148315 commented on
Jul 3, 2025 • 0 new comments -
DISABLED test_fake_crossref_backward_no_amp_cholesky_solve_cuda_float32 (__main__.TestFakeTensorCUDA)
#156419 commented on
Jul 3, 2025 • 0 new comments -
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on
Jul 3, 2025 • 0 new comments -
Most requested ops for the MPS backend
#154052 commented on
Jul 3, 2025 • 0 new comments -
[cudagraph] simplify usage of how cudagraph dumps debug file
#126753 commented on
Jul 3, 2025 • 0 new comments -
[MPS] Performance regression and visual bug with ComfyUI Flux dev since nightly 20250510
#155797 commented on
Jul 2, 2025 • 0 new comments -
[compile][transformers] Recompilation with mark_static_address with cudagraphs
#156377 commented on
Jul 2, 2025 • 0 new comments -
Broadcasting matmul is much slower than corresponding einsum
#110858 commented on
Jul 2, 2025 • 0 new comments -
torch._dynamo.mark_static_address refuses to work with nn.Parameter
#157221 commented on
Jul 2, 2025 • 0 new comments -
[Upstream Triton] persistent mm + tma accuracy failures
#156028 commented on
Jul 2, 2025 • 0 new comments -
Registering function that takes `const SymInt&` to op that accepts `SymInt` leads to cryptic error
#124645 commented on
Jul 2, 2025 • 0 new comments -
PyTorch Memory Management in GPU-to-CPU Transfers issue
#124487 commented on
Jul 2, 2025 • 0 new comments -
Gross mismatch in PDF between CUDA and CPU for multivariate Gaussian mixture models
#156959 commented on
Jul 2, 2025 • 0 new comments -
Tensorboard `add_video()` broken for `moviepy>=2.0`
#147317 commented on
Jul 2, 2025 • 0 new comments -
Export Huggingface models with StaticCache
#155862 commented on
Jun 30, 2025 • 0 new comments -
[ONNX] torch.nn.functional.interpolate \w antialias=True isn't op.Resize compatible
#157220 commented on
Jun 30, 2025 • 0 new comments -
`torch.ldexp` goes out of range when `2**other` is out of range
#153069 commented on
Jun 30, 2025 • 0 new comments -
Is compilation caching for NumPy operators not supported in PyTorch 2.7.1?
#156943 commented on
Jun 30, 2025 • 0 new comments -
Compilation issues with ROCm 6.4.1 on Debian 12
#155794 commented on
Jun 30, 2025 • 0 new comments -
Update epsilon logic to improve numerical stability
#151110 commented on
Jun 30, 2025 • 0 new comments -
DISABLED test_forward_generation (__main__.CudaGraphTreeTests)
#157058 commented on
Jun 30, 2025 • 0 new comments -
Windows Source Build Fails with OSError: [WinError 126] on aoti_custom_ops.dll for RTX 5080 (sm_120), Pre-built PyTorch Works
#157128 commented on
Jun 30, 2025 • 0 new comments -
QAT support for conv2d with groups > 1
#157222 commented on
Jun 30, 2025 • 0 new comments -
NVFp4 Cublas Error
#157054 commented on
Jun 30, 2025 • 0 new comments -
Setting up for development
#157141 commented on
Jun 30, 2025 • 0 new comments -
[CUDA][CUTLASS] test_cutlass_backend.py unit test failures on SM90+
#155888 commented on
Jun 30, 2025 • 0 new comments -
[feature request] Native checkpointing to/from `s3://`
#155992 commented on
Jun 30, 2025 • 0 new comments -
CMake improperly configures pybind11. 3 different versions of pybind11 in use at the sametime.
#156725 commented on
Jun 30, 2025 • 0 new comments -
Preload CUDA fails if CUDA libs in different PYTHONPATH
#147001 commented on
Jun 30, 2025 • 0 new comments -
[RFC] Integrate NCCL scalable init API
#136539 commented on
Jun 30, 2025 • 0 new comments -
Pypi Support for Windows arm64
#154260 commented on
Jun 30, 2025 • 0 new comments -
[Feature] Taylor expansion pruning
#157218 commented on
Jun 30, 2025 • 0 new comments -
`TorchScript` does not allow accessing methods of nested tensors
#156544 commented on
Jun 30, 2025 • 0 new comments -
Set dependencies lower bound
#156587 commented on
Jun 30, 2025 • 0 new comments -
[CD] Windows Wheel builds CUDA 12.9.1 Stack Overflow during build
#156181 commented on
Jun 30, 2025 • 0 new comments -
DISABLED test_forward_backward_not_called_backend_inductor (__main__.CudaGraphTreeTests)
#157035 commented on
Jun 30, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cuda (__main__.GPUTests)
#151541 commented on
Jun 30, 2025 • 0 new comments -
DISABLED test_hessian_vectorize_raises_no_warnings_logging_tensor (__main__.TestAutogradFunctional)
#153644 commented on
Jun 30, 2025 • 0 new comments -
Unable to compile
#156915 commented on
Jun 30, 2025 • 0 new comments -
Libtorch segfault when used with libqpOASES
#33890 commented on
Jun 30, 2025 • 0 new comments -
Improve debug message for metadata guard failure
#157075 commented on
Jul 1, 2025 • 0 new comments -
[dynamo] torch.randint_like on DTensor does not work with compile
#156649 commented on
Jul 1, 2025 • 0 new comments -
[dynamic shapes] translation validation failure under `fake_tensor_propagate_real_tensors`
#156251 commented on
Jul 1, 2025 • 0 new comments -
FakeTensorUpdater doesn't support HOPs
#156819 commented on
Jul 1, 2025 • 0 new comments -
Convolution NN for complex numbers and more special functions
#116414 commented on
Jul 1, 2025 • 0 new comments -
torch.Tensor.is_sparse returns false for non-COO sparse tensors
#101385 commented on
Jul 1, 2025 • 0 new comments -
test_tensor_with_grad_to_scalar_warning failure
#157252 commented on
Jul 1, 2025 • 0 new comments -
DISABLED test_graph_partition (__main__.CudaGraphTreeTests)
#157173 commented on
Jul 1, 2025 • 0 new comments -
DISABLED test_function_compiled_multiple_times (__main__.CudaGraphTreeTests)
#157143 commented on
Jul 1, 2025 • 0 new comments -
DISABLED test_matmul_small_brute_force_tunableop_cuda_float32 (__main__.TestLinalgCUDA)
#141635 commented on
Jul 1, 2025 • 0 new comments -
[Windows] pytorch >= 2.5
#140875 commented on
Jul 1, 2025 • 0 new comments -
[ued][gemma3] HF + torch.compile - torch.compile on Gemma3
#149574 commented on
Jul 1, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142566 commented on
Jul 1, 2025 • 0 new comments -
DISABLED test_frozen_fn (__main__.CudaGraphTreeTests)
#157112 commented on
Jul 1, 2025 • 0 new comments -
CTCLoss gradient is incorrect
#52241 commented on
Jul 1, 2025 • 0 new comments -
UR Error when calling grid_sample
#153996 commented on
Jul 1, 2025 • 0 new comments -
[inductor][cpu] pyhpc_isoneutral_mixing, lennard_jones and pyhpc_equation_of_state performance regression in 2025-06-23 nightly release
#157077 commented on
Jul 1, 2025 • 0 new comments -
Torch RPC examples from docs say usage is deprecated.
#149393 commented on
Jul 1, 2025 • 0 new comments -
Documentation Clarification Needed for Clamping of Scale Coefficient in clip_grads_with_norm_
#151554 commented on
Jul 1, 2025 • 0 new comments -
torch.compile with mode = "max-autotune" breaks when starting from inference_mode
#135892 commented on
Jul 1, 2025 • 0 new comments -
UNSTABLE pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks)
#153987 commented on
Jul 1, 2025 • 0 new comments -
DISABLED test_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests)
#157086 commented on
Jul 1, 2025 • 0 new comments -
Export always give a value range with max length - 1
#156882 commented on
Jul 1, 2025 • 0 new comments -
Perf drop when running with FSDP and torch.compile
#156966 commented on
Jul 1, 2025 • 0 new comments -
DeviceMesh's `_set_mesh_dim_group_options` ineffective for 1-dim meshes
#156593 commented on
Jun 30, 2025 • 0 new comments -
[user triton] on-device TMA + AOTI causes IMA with pytorch 2.8 branch
#157240 commented on
Jun 30, 2025 • 0 new comments -
Fix `SequentialLR` deprecate warning about invoke `step(epoch)`
#149392 commented on
Jul 4, 2025 • 0 new comments -
NUMA Binding Integration with torchrun
#149334 commented on
Jul 2, 2025 • 0 new comments -
[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100`
#149282 commented on
Jul 2, 2025 • 0 new comments -
Fix unexpected keyword argument 'mode' when calling `CompileCounterWithBackend`
#149271 commented on
Jul 6, 2025 • 0 new comments -
[test] test for keep going
#149003 commented on
Jul 1, 2025 • 0 new comments -
Fix AttributeError for `_get_vc_env` with setuptools>=75.9.0
#148847 commented on
Jul 6, 2025 • 0 new comments -
C++ support to print symbolic tensors as `Symbolic tensor: size=(...)`
#148846 commented on
Jul 3, 2025 • 0 new comments -
Trunk workflow for Windows Arm64
#148753 commented on
Jul 1, 2025 • 0 new comments -
Optimize AOTInductor: Caching, Reduced Decompositions, and Improved JSON Handling
#148616 commented on
Jul 1, 2025 • 0 new comments -
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on
Jul 3, 2025 • 0 new comments -
[triton hash update] update the pinned triton hash
#148492 commented on
Jul 7, 2025 • 0 new comments -
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on
Jul 3, 2025 • 0 new comments -
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on
Jul 3, 2025 • 0 new comments -
[pytree] simplify public API exposition with `__module__`
#148328 commented on
Jul 3, 2025 • 0 new comments -
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on
Jul 3, 2025 • 0 new comments -
Support `contextlib.suppress`
#147990 commented on
Jul 2, 2025 • 0 new comments -
Update triton_heuristics.py
#147690 commented on
Jul 2, 2025 • 0 new comments -
removed zero dim cpu logic from fake_tensor.py
#147501 commented on
Jul 2, 2025 • 0 new comments -
Deprecate DataLoader pin_memory_device param
#146821 commented on
Jul 4, 2025 • 0 new comments -
Support contextlib.ExitStack
#146506 commented on
Jul 2, 2025 • 0 new comments -
Update quantile doc
#146485 commented on
Jul 1, 2025 • 0 new comments -
[dcp] Minor improvements to filesystem writer
#146273 commented on
Jul 5, 2025 • 0 new comments -
docs: change log to ln in Softplus function and class
#146199 commented on
Jul 1, 2025 • 0 new comments -
Avoid data-dependent errors by runtime assert substitution.
#145681 commented on
Jul 1, 2025 • 0 new comments -
Fix full_like decomposition to preserve strides
#144765 commented on
Jul 2, 2025 • 0 new comments -
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on
Jul 5, 2025 • 0 new comments -
Deprecated pkg_resources and use distributions instead
#151915 commented on
Jun 30, 2025 • 0 new comments -
[reland][ROCm] remove caffe2 from hipify
#151845 commented on
Jul 1, 2025 • 0 new comments -
Horizontal
#151780 commented on
Jul 4, 2025 • 0 new comments -
enable windows inductor UT in CI
#151777 commented on
Jul 7, 2025 • 0 new comments -
Add adaptive_avg_pool2d input and output_size check
#151769 commented on
Jul 1, 2025 • 0 new comments -
Implement avg_pool3d for MPS backend
#151742 commented on
Jul 5, 2025 • 0 new comments -
Update OpenBLAS commit
#151547 commented on
Jul 2, 2025 • 0 new comments -
Implement fast exp for AVX2 and AVX512 for the flash attention
#151441 commented on
Jul 7, 2025 • 0 new comments -
Use Allocator API raw_allocate & raw_dealloc in CUDAAllocator
#151305 commented on
Jul 5, 2025 • 0 new comments -
[dynamo] Avoid unnecessary `.detach()` call in `_make_subclass` polyfill
#151265 commented on
Jul 5, 2025 • 0 new comments -
Implement MKLGenerator
#151218 commented on
Jul 4, 2025 • 0 new comments -
Fix `MaskedTensor` to device ignored mask
#151205 commented on
Jul 4, 2025 • 0 new comments -
TESTING: IGNORE
#151116 commented on
Jun 30, 2025 • 0 new comments -
[export] add runtime assert messages to python torch checks
#150719 commented on
Jul 5, 2025 • 0 new comments -
Make LazyModuleMixin materialize after load_state_dict
#150593 commented on
Jul 1, 2025 • 0 new comments -
Refactor CUDAAllocatorConfig to reuse AcceleratorAllocatorConfig
#150312 commented on
Jul 1, 2025 • 0 new comments -
Add differentiable ops hint message in Module docs
#150291 commented on
Jul 5, 2025 • 0 new comments -
softmax: add device check for xpu with half_to_float
#150278 commented on
Jul 3, 2025 • 0 new comments -
Add cmake variable USE_ROCM_CK
#150245 commented on
Jul 5, 2025 • 0 new comments -
[WIP][dynamic shapes] rewrite should_swap with guard_or_false
#150164 commented on
Jul 1, 2025 • 0 new comments -
AOTI freezing: fix test issues and enable by default
#149961 commented on
Jul 2, 2025 • 0 new comments -
DRAFT: Add TMA opt for concat function target hopper and blackwell arch
#149893 commented on
Jul 6, 2025 • 0 new comments -
Add SWA with a cyclical scheduler example
#149847 commented on
Jul 1, 2025 • 0 new comments -
Inductor logging + analysis of torch.profile
#149697 commented on
Jul 7, 2025 • 0 new comments -
Introduce AcceleratorAllocatorConfig as the common class
#149601 commented on
Jul 7, 2025 • 0 new comments -
[test] sccache docker build
#149536 commented on
Jul 6, 2025 • 0 new comments -
ROCm+gcc 15 asserts
#145608 commented on
Jul 5, 2025 • 0 new comments -
Make tlparse able to show a summary of distinct graph breaks
#153669 commented on
Jul 5, 2025 • 0 new comments -
I want to calculate the matrix multiplication of two Boolean matrices, but torch.mm will report an error. Is there any more efficient alternative?
#107041 commented on
Jul 5, 2025 • 0 new comments -
RendezvousConnectionError when use C10d on multi nodes
#69197 commented on
Jul 5, 2025 • 0 new comments -
Wrong error message for wrong dtypes in `torch.binomial`
#157195 commented on
Jul 5, 2025 • 0 new comments -
Run existing eager DTensor tests under torch.compile
#127772 commented on
Jul 5, 2025 • 0 new comments -
Trying to build from source with use_flash_attention fails on windows due to fatal error C1189
#134854 commented on
Jul 5, 2025 • 0 new comments -
Dead link in `torch.compile` docs
#119272 commented on
Jul 5, 2025 • 0 new comments -
[v.2.8.0] Release Tracker
#156745 commented on
Jul 4, 2025 • 0 new comments -
torch.compile does not work with Flash attention 3
#144540 commented on
Jul 4, 2025 • 0 new comments -
DISABLED test_simple_multi_arch_embed_kernel_binary_True_cuda (__main__.AOTInductorTestABICompatibleGpu)
#156930 commented on
Jul 4, 2025 • 0 new comments -
DISABLED test_assigning_back_deleter_fns_to_tensor (__main__.TestBlockStateAbsorption)
#134810 commented on
Jul 4, 2025 • 0 new comments -
DISABLED test_wait_tensor (__main__.CompileTest)
#148014 commented on
Jul 4, 2025 • 0 new comments -
DISABLED test_index (__main__.TestPythonBuiltinOP)
#119160 commented on
Jul 4, 2025 • 0 new comments -
Process never ends when sending tensors through multiprocessing queues in Python 3.12+ with filesystem strategy
#153050 commented on
Jul 4, 2025 • 0 new comments -
[dynamo] dynamo is unable to enter `except RuntimeError` while eager can
#157217 commented on
Jul 4, 2025 • 0 new comments -
DISABLED test_mempool_limited_memory_with_allocator (__main__.TestMemPool)
#157256 commented on
Jul 4, 2025 • 0 new comments -
Device Error on vmap
#151591 commented on
Jul 4, 2025 • 0 new comments -
Segmentation fault in torch.repeat_interleave
#157097 commented on
Jul 4, 2025 • 0 new comments -
Floating point exception in torch.nn.functional.conv_transpose3d
#157098 commented on
Jul 4, 2025 • 0 new comments -
[FSDP2] allow different dtypes for the model params with gradients
#156784 commented on
Jul 4, 2025 • 0 new comments -
flex_attention + dynamic=True with large batch or heads causes Triton Error [CUDA]: invalid argument
#157018 commented on
Jul 3, 2025 • 0 new comments -
Documentation: explaining the STFT formula
#153531 commented on
Jul 3, 2025 • 0 new comments -
[Tracker] AutoParallel's feature request to DTensor
#156217 commented on
Jul 3, 2025 • 0 new comments -
Unexpected, batch size and device dependent NaN propagation in Conv1d
#157237 commented on
Jul 3, 2025 • 0 new comments -
`RuntimeError: UR error` with XPU
#149953 commented on
Jul 3, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `test/[i-z]*/` to `ruff format`
#144556 commented on
Jul 5, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on
Jul 5, 2025 • 0 new comments -
[dynamo, nested graph breaks] add nested graph break tests
#144516 commented on
Jul 2, 2025 • 0 new comments -
Add where_ ops
#143636 commented on
Jul 5, 2025 • 0 new comments -
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on
Jul 7, 2025 • 0 new comments -
Fix `USE_STATIC_MKL` lost functionality
#138996 commented on
Jul 1, 2025 • 0 new comments -
Always produce XML
#138513 commented on
Jul 5, 2025 • 0 new comments -
Add DeviceAllocator as the base device allocator
#138222 commented on
Jul 4, 2025 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
Jul 3, 2025 • 0 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
Jul 3, 2025 • 0 new comments -
Feature: Implement support for `cudnn_batch_norm_out` kernel to replace the autogen approach.
#123020 commented on
Jul 4, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
Jul 4, 2025 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
Jul 3, 2025 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
Jul 2, 2025 • 0 new comments -
[WIP][RFC] Compilable flex_attention + Context Parallel
#157015 commented on
Jul 7, 2025 • 0 new comments -
DISABLED test_dont_aggressively_write_assert (__main__.ReproTests)
#156570 commented on
Jul 7, 2025 • 0 new comments -
DISABLED test_inductor_reduce_scatter_tensor_coalesced (__main__.CompileTest)
#147887 commented on
Jul 7, 2025 • 0 new comments -
mps and cpu backends produce different training results with FFT and Adam
#151740 commented on
Jul 6, 2025 • 0 new comments -
[ONNX] Create a tutorial for exporting hf transformers model
#156258 commented on
Jul 6, 2025 • 0 new comments -
Add `is_outputs_batched` param to `autograd.grad`
#156616 commented on
Jul 6, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
Jul 6, 2025 • 0 new comments -
ImportError: libcupti.so.11.2: cannot open shared object file: No such file or directory
#88802 commented on
Jul 6, 2025 • 0 new comments -
Migrating existing backend-MAIA integration toward PrivateUse1 / openReg
#155864 commented on
Jul 6, 2025 • 0 new comments -
Flex Attention is incompatible with selective AC
#147879 commented on
Jul 6, 2025 • 0 new comments -
Pipeline Parallelism Fails when stage input does not produce gradients in all stages.
#152827 commented on
Jul 6, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
Jul 6, 2025 • 0 new comments