-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Fix pybind enum efficiency issue in return_and_correct_aliasing #161315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161315
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ No FailuresAs of commit ba4c3f7 with merge base 05eeb29 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. ghstack-source-id: df8fc09 Pull Request resolved: #161315
…asing" Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. [ghstack-poisoned]
|
Starting merge as part of PR stack under #161317 |
| ] | ||
| schema_info = SchemaInfo(args=arg_schemas, outs=out_schemas) | ||
| schema_info = SchemaInfo( | ||
| args=arg_schemas, outs=out_schemas, int_tags=[int(x) for x in func.tags] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be reasonable to move this into a SchemaInfo.__post_init__, since this is just a perf optimization that can be inferred from the other args? Or are you worried about post_init itself being slow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you worried about post_init itself being slow?
Not specifically, more that I had never heard of __post_init__ before working on this stack.
I would rather execute fewer Python functions than more of them, though, and unless I'm wrong about SchemaInfo being confined to this file I'd rather not compromise performance for this one-callsite class.
…asing" Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. [ghstack-poisoned]
…asing" Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. [ghstack-poisoned]
|
Starting merge as part of PR stack under #161432 |
1 similar comment
|
Starting merge as part of PR stack under #161432 |
…61317) This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly. Pull Request resolved: #161317 Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308, #161315
`auto` forces a copy. Confirmed this did something noticable with perf. Pull Request resolved: #161329 Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328
If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163) Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small. Pull Request resolved: #161432 Approved by: https://github.com/ezyang ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329
) symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast. Pull Request resolved: #161455 Approved by: https://github.com/Skylion007 ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329, #161432
…rch#161315) Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. Pull Request resolved: pytorch#161315 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308
…torch#161317) This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly. Pull Request resolved: pytorch#161317 Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315
…61328) Not a huge cost, but free win is free. Pull Request resolved: pytorch#161328 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317
`auto` forces a copy. Confirmed this did something noticable with perf. Pull Request resolved: pytorch#161329 Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328
…h#161432) If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163) Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small. Pull Request resolved: pytorch#161432 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329
…rch#161455) symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast. Pull Request resolved: pytorch#161455 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329, pytorch#161432
…rch#161315) Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. Pull Request resolved: pytorch#161315 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308
…torch#161317) This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly. Pull Request resolved: pytorch#161317 Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315
…61328) Not a huge cost, but free win is free. Pull Request resolved: pytorch#161328 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317
`auto` forces a copy. Confirmed this did something noticable with perf. Pull Request resolved: pytorch#161329 Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328
…h#161432) If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163) Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small. Pull Request resolved: pytorch#161432 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329
…rch#161455) symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast. Pull Request resolved: pytorch#161455 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329, pytorch#161432
Stack from ghstack (oldest at bottom):
is, not ==, to check exact type matches in _python_dispatch #161304Scanning a list of pybind enums with
inis slow. See NOTE in code for full explanation.This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish.