[AArch64] Use SVE XAR for fixed-length operations. #139229

Open
davemgreen opened this issue May 9, 2025 · 4 comments · May be fixed by #139460
Labels
backend:AArch64 missed-optimization SVE ARM Scalable Vector Extensions

Comments

@davemgreen
Collaborator

There is a Neon SHA3 v2i64 XAR operation, but not for v4i32, v8i16 and v16i8. If sve2-sha3 is available we can use the SVE instructions instead.
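For readers unfamiliar with XAR, here is a minimal scalar sketch of why a plain rotate can map onto it. It assumes the usual description of XAR as "exclusive-OR the two inputs, then rotate each lane right by an immediate"; the helper names `ror32`, `xar32_model`, and `g2_lane` are hypothetical and only model one 32-bit lane, not the real instruction encoding or its immediate range.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits. Valid for 1 <= n <= 31 so that
   neither shift count reaches the type width. */
static inline uint32_t ror32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/* Hypothetical scalar model of one 32-bit XAR lane: XOR the two inputs,
   then rotate the result right by n. */
static inline uint32_t xar32_model(uint32_t a, uint32_t b, unsigned n) {
    return ror32(a ^ b, n);
}

/* The G2 pattern from this issue, applied to a single lane:
   shift right by 23 OR shift left by 9 is a rotate right by 23. */
static inline uint32_t g2_lane(uint32_t r) {
    return (r >> 23) | (r << 9);
}
```

With the second operand set to zero the XOR is a no-op, so `xar32_model(r, 0, 23)` gives the same value as `g2_lane(r)`. That is the sense in which a rotate by a constant is a candidate for XAR.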

https://godbolt.org/z/9hdqKoWMx (G1 and F1 are already OK).
vs with scalable vectors: https://godbolt.org/z/GhazeoaWY

https://godbolt.org/z/fejTchexj

typedef char __attribute__ ((vector_size (16))) v16qi;
typedef unsigned short __attribute__ ((vector_size (16))) v8hi;
typedef unsigned int __attribute__ ((vector_size (16))) v4si;
typedef unsigned long long __attribute__ ((vector_size (16))) v2di;
typedef char __attribute__ ((vector_size (8))) v8qi;
typedef unsigned short __attribute__ ((vector_size (8))) v4hi;
typedef unsigned int __attribute__ ((vector_size (8))) v2si;

v2di
G1 (v2di r) {
    return (r >> 39) | (r << 25);
}

v4si
G2 (v4si r) {
    return (r >> 23) | (r << 9);
}

v8hi
G3 (v8hi r) {
    return (r >> 5) | (r << 11);
}

v16qi
G4 (v16qi r)
{
  return (r << 2) | (r >> 6);
}

v2si
G5 (v2si r) {
    return (r >> 22) | (r << 10);
}

v4hi
G6 (v4hi r) {
    return (r >> 7) | (r << 9);
}

v8qi
G7 (v8qi r)
{
  return (r << 3) | (r >> 5);
}

See #137162, this is an extension to that issue.
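As a sanity check that the narrower patterns really are lane-wise rotates, the following sketch compares G3 against a scalar reference. It assumes a compiler with GCC-style vector extensions (GCC or Clang); `ror16` and `g3_matches_reference` are hypothetical helpers, not part of the issue. Note that the scalar version needs an explicit cast because of integer promotion, whereas the vector-typed expression operates at lane width.

```c
#include <stdint.h>

typedef unsigned short __attribute__ ((vector_size (16))) v8hi;

/* G3 from the issue: rotate each 16-bit lane right by 5. */
static v8hi G3(v8hi r) {
    return (r >> 5) | (r << 11);
}

/* Scalar reference rotate right for 16-bit values, 1 <= n <= 15.
   The operands are promoted to int, so the cast truncates the result
   back to 16 bits. */
static uint16_t ror16(uint16_t x, unsigned n) {
    return (uint16_t)((x >> n) | (x << (16u - n)));
}

/* Returns 1 if every lane of G3(r) equals the scalar rotate of that lane. */
static int g3_matches_reference(v8hi r) {
    v8hi out = G3(r);
    for (int i = 0; i < 8; ++i) {
        if (out[i] != ror16(r[i], 5))
            return 0;
    }
    return 1;
}
```

The same check could be repeated for the 8-bit and 64-bit-vector variants (G4 to G7) by changing the lane type, lane count, and shift amounts.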

@llvmbot
Member

llvmbot commented May 9, 2025

@llvm/issue-subscribers-backend-aarch64

Author: David Green (davemgreen)

@Rajveer100
Contributor

The third link needs all architectural features to be enabled; otherwise we don't see the transformation for G1:

https://godbolt.org/z/hWqa5T9z3

@Rajveer100
Contributor

Rajveer100 commented May 10, 2025

The first two links also need correction:

-mtriple=aarch64 -mattr=+sve2,+sve2-sha3,+sha3

-mtriple=aarch64 is needed for the transformation.

https://godbolt.org/z/oGPKnYrob

https://godbolt.org/z/neszhddn5

@davemgreen
Collaborator Author

Godbolt uses a cache and sometimes needs to be refreshed when changes land on trunk. There is a little refresh button on the bottom left of the compilation window (it should now be cached to the updated version).
