Fix pybind enum efficiency issue in return_and_correct_aliasing #161315

swolchok · 2025-08-22T22:02:37Z

Stack from ghstack (oldest at bottom):

Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation.

This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish.
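As a rough illustration of the idea (not the code from this PR), the sketch below converts the tags to plain ints once and runs later membership checks against those ints. `Tag` is a hypothetical `IntEnum` stand-in for the real pybind enum, and the helper names are invented for this example. The NOTE in the code remains the authoritative explanation of why the pybind case is slow.

```python
from enum import IntEnum


class Tag(IntEnum):
    """Hypothetical stand-in for the pybind enum; values are made up."""
    inplace_view = 1
    pointwise = 2
    view_copy = 3


def has_tag_enum_scan(tags, tag):
    # Baseline: `in` walks the list and compares enum objects one by one.
    return tag in tags


def precompute_int_tags(tags):
    # Done once per schema: convert each enum to a plain Python int.
    return [int(t) for t in tags]


def has_tag_int_scan(int_tags, tag_as_int):
    # Membership check on plain ints, which avoids enum comparisons.
    return tag_as_int in int_tags


op_tags = [Tag.pointwise, Tag.view_copy]
int_tags = precompute_int_tags(op_tags)
wanted = int(Tag.inplace_view)  # convert the constant once, reuse it
print(has_tag_int_scan(int_tags, wanted))
```

Both helpers return the same answer; the difference is only in what gets compared during the scan.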


pytorch-bot · 2025-08-22T22:02:41Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161315

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm MI2xx CI/CD workflows failing due to: download from https://api.github.com/repos/pytorch/pytorch timed out.

✅ No Failures

As of commit ba4c3f7 with merge base 05eeb29:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.



pytorchmergebot · 2025-08-28T07:10:08Z

Starting merge as part of PR stack under #161317

bdhirsh · 2025-08-29T19:46:04Z

torch/utils/_python_dispatch.py

        ]
-    schema_info = SchemaInfo(args=arg_schemas, outs=out_schemas)
+    schema_info = SchemaInfo(
+        args=arg_schemas, outs=out_schemas, int_tags=[int(x) for x in func.tags]


would it be reasonable to move this into a SchemaInfo.__post_init__, since this is just a perf optimization that can be inferred from the other args? Or are you worried about post_init itself being slow?

swolchok · 2025-08-30

> are you worried about post_init itself being slow?

Not specifically, more that I had never heard of __post_init__ before working on this stack.

I would rather execute fewer Python functions than more of them, though, and unless I'm wrong about SchemaInfo being confined to this file I'd rather not compromise performance for this one-callsite class.
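To make the trade-off in this thread concrete, here is a minimal sketch of the two options. Both classes, their field lists, and the `Tag` enum are hypothetical simplifications, not the real `SchemaInfo`. The first variant derives `int_tags` in `__post_init__`, which costs one extra Python function call per construction. The second takes `int_tags` from the single callsite, as the diff above does.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tag(IntEnum):
    """Hypothetical stand-in for the pybind enum; values are made up."""
    pointwise = 2
    view_copy = 3


@dataclass
class SchemaInfoPostInit:
    # Variant suggested in review: derive int_tags inside the class.
    args: list
    outs: list
    tags: list  # assumption: the raw tags are passed in so they can be converted
    int_tags: list = field(init=False)

    def __post_init__(self):
        # Runs after the generated __init__, on every construction.
        self.int_tags = [int(t) for t in self.tags]


@dataclass
class SchemaInfoExplicit:
    # Variant matching the diff: the callsite computes int_tags itself.
    args: list
    outs: list
    int_tags: list


tags = [Tag.pointwise, Tag.view_copy]
derived = SchemaInfoPostInit(args=[], outs=[], tags=tags)
explicit = SchemaInfoExplicit(args=[], outs=[], int_tags=[int(t) for t in tags])
print(derived.int_tags == explicit.int_tags)
```

The resulting `int_tags` are identical. The choice is between keeping the conversion inside the class and avoiding an extra function call when only one callsite constructs it.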


pytorchmergebot · 2025-08-30T00:30:49Z

Starting merge as part of PR stack under #161432

pytorchmergebot · 2025-08-30T05:51:20Z

Starting merge as part of PR stack under #161432

…61317) This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly. Pull Request resolved: #161317 Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308, #161315

Not a huge cost, but free win is free. Pull Request resolved: #161328 Approved by: https://github.com/Skylion007 ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317

`auto` forces a copy. Confirmed this did something noticeable with perf. Pull Request resolved: #161329 Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328

If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163) Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small. Pull Request resolved: #161432 Approved by: https://github.com/ezyang ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329

) SymBools are not identical to Py_True or Py_False, so we can do those cheap checks first and at least get plain old bools to go fast. Pull Request resolved: #161455 Approved by: https://github.com/Skylion007 ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329, #161432

…rch#161315) Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. Pull Request resolved: pytorch#161315 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308


This was referenced Aug 22, 2025

Fix OpSchema equality check #161231

Closed

Use comparison key in OpSchema to avoid duplicate work between __hash__ and __eq__ #161234

Closed

Minor cleanup of DeviceMesh.__eq__ #161235

Closed

Skylion007 approved these changes Aug 22, 2025


swolchok mentioned this pull request Aug 22, 2025

Improve assert perf in _python_dispatch._correct_storage_aliasing #161317

Closed

swolchok requested review from XilunWu, ezyang and zpcore August 22, 2025 22:31

This was referenced Aug 23, 2025

Avoid double hash lookup in torch._library.simple_registry #161328

Closed

Fix accidental copy in pushPyOutToStack #161329

Closed

This was referenced Aug 28, 2025

Remove unnecessary asserts in _correct_storage_aliasing #161694

Closed

Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to C++ #161695

Closed

bdhirsh reviewed Aug 29, 2025


bdhirsh approved these changes Aug 29, 2025


swolchok added 2 commits August 29, 2025 14:32

pytorchmergebot added the Merged label Aug 30, 2025

pytorchmergebot closed this in 0c459f2 Aug 30, 2025

github-actions bot deleted the gh/swolchok/800/head branch September 30, 2025 02:08
