[aarch64] update the Primitive cache logic for ACL fixed format kernels #261

Merged
merged 3 commits into intel:ideep_pytorch on Dec 1, 2023

Conversation

@snadampal commented:

oneDNN 3.3 enables ACL fixed format kernel support on aarch64, which does not need the workaround for weights handling during caching. This PR updates the logic to remove the weights hash check from primitive caching.
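
A minimal sketch of the key change, with hypothetical helper names (not ideep's actual API): the term hashing the weight values is simply dropped from the cache key, which is now built from primitive metadata only.

#include <cstddef>
#include <functional>
#include <string>

// Hypothetical boost-style hash combiner.
template <typename T>
void hash_combine(std::size_t& seed, const T& v) {
  seed ^= std::hash<T>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Build a cache key from primitive metadata only.
std::size_t make_conv_cache_key(const std::string& src_dims,
                                const std::string& wei_dims,
                                int dtype, int groups) {
  std::size_t key = 0;
  hash_combine(key, src_dims);
  hash_combine(key, wei_dims);
  hash_combine(key, dtype);
  hash_combine(key, groups);
  // Before this PR, a hash over the weight values was combined here as a
  // workaround; with ACL fixed format kernels that term is removed.
  return key;
}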

@snadampal (Author) commented:

Hi @yanbing-j, this is another PR required for PyTorch 2.2, the same as #257, resubmitted on the correct branch. Could you please merge this before the ideep branch is updated in PyTorch? Thank you!

@yanbing-j commented:

Hi @snadampal, thanks for the commit. If #257 is a duplicate, please close that PR.

@yanbing-j commented:

Hi @jgong5, could you please help review this PR?

@jgong5 left a comment:

Why do we still keep separate paths for aarch64 and x86? Can we use the same code now?

Comment on lines +1348 to 1352
#ifdef __aarch64__
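// aarch64: the cached entry is a (primitive descriptor, primitive) pair
// (see the discussion below), so the descriptor is pd.first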
return tensor::desc(pd.first.weights_desc(), groups);
#else
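// x86: only the primitive descriptor is cached, so pd is the descriptor itself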
return tensor::desc(pd.weights_desc(), groups);
#endif

@jgong5 commented:

Can we share the same logic now?

@snadampal (Author) commented:

Hi @jgong5, thanks for the review. Yes, we still need separate logic: on aarch64 we cache the whole primitive, while on x86 we cache the primitive descriptor alone. The fixed format kernels let us handle the weights outside, but the kernels themselves are still mutable, so their state is maintained in the primitive resource, and caching the whole primitive is the only way to avoid redundant resource creation.
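
A minimal sketch with placeholder types (not ideep's real classes) of the asymmetry described above: the aarch64 cache holds the whole primitive alongside its descriptor, while the x86 cache holds the descriptor alone.

#include <cstddef>
#include <unordered_map>
#include <utility>

struct primitive_desc { /* shape, dtype, attribute metadata */ };
struct primitive      { /* compiled kernels plus mutable ACL state */ };

#ifdef __aarch64__
// Reusing the cached primitive also reuses its ACL resource
// (configured kernels and workspace buffers).
using cached_t = std::pair<primitive_desc, primitive>;
#else
// oneDNN's own primitive cache makes re-creating the primitive
// from the descriptor cheap on x86.
using cached_t = primitive_desc;
#endif

std::unordered_map<std::size_t, cached_t> op_cache;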

@jgong5 commented:

Can you elaborate more on the differences between the oneDNN primitives on aarch64 and x86? What resources need to be created?

@snadampal (Author) commented:

On x86, the kernels are initialized during primitive init and are cached as part of oneDNN primitive caching, whereas on aarch64 the kernels are initialized during resource mapper init because they are mutable. On aarch64, ACL kernel initialization involves workspace buffer allocations, which are time-consuming. This PR has more details about why we can't use oneDNN primitive caching on aarch64 yet; using ideep operator-level caching is the only option.
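
A minimal, self-contained sketch (hypothetical names) of the hit path this relies on: a cache hit on aarch64 returns the stored primitive and skips the resource-mapper init with its ACL workspace allocations; only a miss pays that cost.

#include <cstddef>
#include <unordered_map>
#include <utility>

struct conv_pd { /* primitive descriptor metadata */ };
struct conv_primitive {
  // On aarch64 this is the expensive step: ACL configures the kernels
  // and allocates workspace buffers when the resource is created.
  explicit conv_primitive(const conv_pd&) {}
};

using entry_t = std::pair<conv_pd, conv_primitive>;
std::unordered_map<std::size_t, entry_t> conv_cache;

conv_primitive& fetch_or_create(std::size_t key, const conv_pd& pd) {
  auto it = conv_cache.find(key);
  if (it == conv_cache.end())  // miss: pay the kernel/workspace init once
    it = conv_cache.emplace(key, entry_t{pd, conv_primitive(pd)}).first;
  return it->second.second;    // hit: reuse the primitive and its resource
}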

@jgong5 commented:

OK, thanks for the detailed info. Do you have a plan to make ACL immutable?

@snadampal (Author) commented:

Yes, @jgong5, we are discussing this with the ACL community.

@yanbing-j requested a review from jgong5 on November 30, 2023 at 00:32

@snadampal (Author) commented:

Hi @yanbing-j, the PR is approved. Can you please merge it before the ideep submodule is updated for PyTorch 2.2? Thank you!

@yanbing-j merged commit d0c2278 into intel:ideep_pytorch on Dec 1, 2023