[aarch64] update the Primitive cache logic for ACL fixed format kernels #261
Conversation
Hi @yanbing-j, this is another PR required for PyTorch 2.2, same as #257, resubmitted on the correct branch. Could you please merge this before the ideep branch is updated in PyTorch? Thank you!
Hi @snadampal, thanks for the commit. If #257 is a duplicate, please close that PR.
Hi @jgong5, could you please help review this PR? |
Why do we still keep a separate path for aarch64 and x86? Can we use the same code now?
#ifdef __aarch64__
  return tensor::desc(pd.first.weights_desc(), groups);
#else
  return tensor::desc(pd.weights_desc(), groups);
#endif
Can we share the same logic now?
Hi @jgong5, thanks for the review. Yes, we still need separate logic, because on aarch64 we have to cache the whole primitive, whereas on x86 caching the primitive descriptor alone is enough. The fixed-format kernels let us handle the weights outside the primitive, but the kernels themselves are still mutable. Their state is kept in the primitive's resource, so caching the whole primitive is the only way to avoid redundant resource creation.
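For anyone following along, here is a minimal sketch of the difference being described. It is not ideep's actual code: the helper name, the key type, and the plain `std::unordered_map` standing in for ideep's operator-level cache are placeholders for illustration.

```cpp
#include <string>
#include <unordered_map>

#include <dnnl.hpp>

using conv_fwd = dnnl::convolution_forward;

// Hypothetical helper: fetch a convolution primitive for a given cache key.
conv_fwd get_conv_primitive(const conv_fwd::primitive_desc& pd,
                            const std::string& key,
                            std::unordered_map<std::string, conv_fwd>& cache) {
#ifdef __aarch64__
  // aarch64: ACL kernels keep mutable state in the primitive's resource, so
  // the whole primitive is cached to avoid recreating that resource on every
  // call.
  auto it = cache.find(key);
  if (it == cache.end())
    it = cache.emplace(key, conv_fwd(pd)).first;
  return it->second;
#else
  // x86: kernels are immutable and live in oneDNN's internal primitive cache,
  // so constructing a primitive from a cached pd is cheap; only the pd needs
  // to be cached at the ideep level.
  (void)key;
  (void)cache;
  return conv_fwd(pd);
#endif
}
```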
Can you elaborate on the differences between oneDNN primitives on aarch64 and x86? What resources need to be created?
On x86, the kernels are initialized during primitive init and are cached as part of oneDNN primitive caching. On aarch64, by contrast, the kernels are initialized during resource-mapper init, because the kernels are mutable. On aarch64, ACL kernel initialization involves workspace buffer allocations, which are time-consuming. This PR has more details on why we can't use oneDNN primitive caching on aarch64 yet; ideep operator-level caching is the only option.
OK, thanks for the detailed info. Do you have a plan to make ACL immutable?
Yes, @jgong5, we are discussing this with the ACL community.
Hi @yanbing-j, the PR is approved. Can you please merge it before the ideep submodule is updated for PyTorch 2.2? Thank you!
oneDNN 3.3 enables ACL fixed-format kernel support on aarch64, which no longer needs the workaround for weights handling during caching. This PR updates the logic to remove the weights-hash check from the primitive caching.
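For context, a minimal sketch of the idea behind the change, not ideep's actual key-creation code: with fixed-format kernels the weights are handled outside the primitive, so the cache key can be built from shape, data-type, and attribute information alone, without hashing the weights data. The helper name and parameters below are hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical key builder for the operator-level primitive cache.
std::string make_cache_key(const std::vector<int64_t>& src_dims,
                           const std::vector<int64_t>& wei_dims,
                           int data_type_tag,
                           const std::string& attr_tag) {
  std::ostringstream os;
  for (auto d : src_dims) os << d << ',';
  os << ';';
  for (auto d : wei_dims) os << d << ',';
  os << ';' << data_type_tag << ';' << attr_tag;
  // Before this change, the aarch64 path also folded a hash of the weights
  // data into the key; with ACL fixed-format kernels that workaround is no
  // longer needed, so that component is dropped.
  return os.str();
}
```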