Skip to content

Calls to sin() don't get converted to sinf, leading to inefficient use of vector variants with -fveclib=ArmPL #139044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
willlovett-arm opened this issue May 8, 2025 · 4 comments

Comments

@willlovett-arm
Copy link

https://godbolt.org/z/fE16nGPxP

#include <math.h>

void foo(float *p)
{
for (int x=0; x<999; ++x) {
    p[x] = sin(p[x]);
  }
}

generates calls to armpl_vsinq_f64 instead of armpl_vsinq_f32.

If we replace that code with sinf it works fine:

#include <math.h>

void foo(float *p)
{
for (int x=0; x<999; ++x) {
    p[x] = sinf(p[x]);
  }
}

I'm reasonably sure it's legitimate to implicitly switch to sinf(), and this would lead to double the vector throughput.

@willlovett-arm willlovett-arm added backend:AArch64 SVE ARM Scalable Vector Extensions labels May 8, 2025
@llvmbot
Copy link
Member

llvmbot commented May 8, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Will Lovett (willlovett-arm)

https://godbolt.org/z/fE16nGPxP
#include &lt;math.h&gt;

void foo(float *p)
{
for (int x=0; x&lt;999; ++x) {
    p[x] = sin(p[x]);
  }
}

generates calls to armpl_vsinq_f64 instead of armpl_vsinq_f32.

If we replace that code with sinf it works fine:

#include &lt;math.h&gt;

void foo(float *p)
{
for (int x=0; x&lt;999; ++x) {
    p[x] = sinf(p[x]);
  }
}

I'm reasonably sure it's legitimate to implicitly switch to sinf(), and this would lead to double the vector throughput.

@davemgreen
Copy link
Collaborator

davemgreen commented May 8, 2025

We do this if the function is a @sin, but not for @llvm.sin intrinsics.
https://godbolt.org/z/MTKefx5Yr

@davemgreen davemgreen added llvm:instcombine floating-point Floating-point math and removed backend:AArch64 SVE ARM Scalable Vector Extensions labels May 8, 2025
guy-david added a commit that referenced this issue May 9, 2025
This optimization already exists, but for the libcall versions of these
functions and not for their intrinsic form.
Solves #139044.

There are probably more opportunities for other intrinsics, because the
switch-case in `LibCallSimplifier::optimizeCall` covers only `pow`,
`exp2`, `log`, `log2`, `log10`, `sqrt`, `memset`, `memcpy` and
`memmove`.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue May 9, 2025
…wed (#139082)

This optimization already exists, but for the libcall versions of these
functions and not for their intrinsic form.
Solves llvm/llvm-project#139044.

There are probably more opportunities for other intrinsics, because the
switch-case in `LibCallSimplifier::optimizeCall` covers only `pow`,
`exp2`, `log`, `log2`, `log10`, `sqrt`, `memset`, `memcpy` and
`memmove`.
@fhahn
Copy link
Contributor

fhahn commented May 9, 2025

Should be fixed by #139082

@willlovett-arm
Copy link
Author

Amazing turnaround! Thanks @guy-david , all 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants