[SimplifyLibCalls] Shrink sin, cos to sinf, cosf when allowed #139082
@llvm/pr-subscribers-llvm-transforms
Author: Guy David (guy-david)
Changes
This optimization already exists, but for the libcall versions of these functions and not for their intrinsic form.
Full diff: https://github.com/llvm/llvm-project/pull/139082.diff
2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp b/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
index 941e787f91eff..94a79ad824370 100644
--- a/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
@@ -4136,6 +4136,11 @@ Value *LibCallSimplifier::optimizeCall(CallInst *CI, IRBuilderBase &Builder) {
       return optimizeMemCpy(CI, Builder);
     case Intrinsic::memmove:
       return optimizeMemMove(CI, Builder);
+    case Intrinsic::sin:
+    case Intrinsic::cos:
+      if (UnsafeFPShrink)
+        return optimizeUnaryDoubleFP(CI, Builder, TLI, /*isPrecise=*/true);
+      return nullptr;
     default:
       return nullptr;
     }
diff --git a/llvm/test/Transforms/InstCombine/simplify-intrinsics.ll b/llvm/test/Transforms/InstCombine/simplify-intrinsics.ll
new file mode 100644
index 0000000000000..8d7f70b6d1a61
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/simplify-intrinsics.ll
@@ -0,0 +1,61 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s --check-prefixes=ANY,NO-FLOAT-SHRINK
+; RUN: opt < %s -passes=instcombine -enable-double-float-shrink -S | FileCheck %s --check-prefixes=ANY,DO-FLOAT-SHRINK
+
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+
+declare double @llvm.cos.f64(double)
+declare float @llvm.cos.f32(float)
+
+declare double @llvm.sin.f64(double)
+declare float @llvm.sin.f32(float)
+
+; cos -> cosf
+
+; ANY-LABEL: define float @cos_no_fastmath(float %f) {
+; NO-FLOAT-SHRINK: call double @llvm.cos.f64
+; DO-FLOAT-SHRINK: %[[RES:[0-9]+]] = call float @llvm.cos.f32
+; DO-FLOAT-SHRINK: ret float %[[RES]]
+; ANY: }
+define float @cos_no_fastmath(float %f) {
+ %d = fpext float %f to double
+ %result = call double @llvm.cos.f64(double %d)
+ %truncated_result = fptrunc double %result to float
+ ret float %truncated_result
+}
+
+; ANY-LABEL: define float @cos_fastmath(float %f) {
+; ANY: %[[RES:[0-9]+]] = call fast float @llvm.cos.f32
+; ANY: ret float %[[RES]]
+; ANY: }
+define float @cos_fastmath(float %f) {
+ %d = fpext fast float %f to double
+ %result = call fast double @llvm.cos.f64(double %d)
+ %truncated_result = fptrunc fast double %result to float
+ ret float %truncated_result
+}
+
+; sin -> sinf
+
+; ANY-LABEL: define float @sin_no_fastmath(float %f) {
+; NO-FLOAT-SHRINK: call double @llvm.sin.f64
+; DO-FLOAT-SHRINK: %[[RES:[0-9]+]] = call float @llvm.sin.f32
+; DO-FLOAT-SHRINK: ret float %[[RES]]
+; ANY: }
+define float @sin_no_fastmath(float %f) {
+ %d = fpext float %f to double
+ %result = call double @llvm.sin.f64(double %d)
+ %truncated_result = fptrunc double %result to float
+ ret float %truncated_result
+}
+
+; ANY-LABEL: define float @sin_fastmath(float %f) {
+; ANY: %[[RES:[0-9]+]] = call fast float @llvm.sin.f32
+; ANY: ret float %[[RES]]
+; ANY: }
+define float @sin_fastmath(float %f) {
+ %d = fpext fast float %f to double
+ %result = call fast double @llvm.sin.f64(double %d)
+ %truncated_result = fptrunc fast double %result to float
+ ret float %truncated_result
+}
; ANY: }
define float @sin_fastmath(float %f) {
%d = fpext fast float %f to double
%result = call fast double @llvm.sin.f64(double %d)
Is this valid only with all fast-math flags, or with some subset?
The existing optimization for libcalls requires all fast-math flags to be present.
We should probably fix that; this shouldn't require all of them (in particular nsz or contract).
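To make the distinction in this exchange concrete, here is a minimal standalone sketch contrasting an "all fast-math flags" check with a relaxed one that ignores nsz and contract. It is a toy bitmask model, not LLVM's `FastMathFlags` class; the names `requiresAllFlags` and `relaxedCheck` are hypothetical, and the thread does not establish which subset of flags is actually sufficient, so the relaxed set is only an illustration of the reviewer's suggestion.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the seven IR fast-math flags (assumption: a plain bitmask,
// not LLVM's FastMathFlags class).
enum FMF : std::uint8_t {
  Reassoc  = 1 << 0,
  NNan     = 1 << 1,
  NInf     = 1 << 2,
  NSZ      = 1 << 3,
  ARcp     = 1 << 4,
  Contract = 1 << 5,
  AFn      = 1 << 6,
  AllFast  = 0x7F  // every flag set, i.e. the `fast` spelling in IR
};

// Models the behaviour described for the existing libcall path:
// the shrink is allowed only when every flag is present.
constexpr bool requiresAllFlags(std::uint8_t Flags) {
  return (Flags & AllFast) == AllFast;
}

// Hypothetical relaxed check: same as above, but nsz and contract
// are not required.
constexpr bool relaxedCheck(std::uint8_t Flags) {
  constexpr std::uint8_t Needed = AllFast & ~(NSZ | Contract);
  return (Flags & Needed) == Needed;
}
```

Under this model, a call carrying every flag except nsz fails the strict check but passes the relaxed one.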
04e11b7 to 31c76dd
31c76dd to 5ce64df
5ce64df to 76489c3
LGTM but this can handle the vector case as well
%result = call fast double @llvm.sin.f64(double %d)
%truncated_result = fptrunc double %result to float
ret float %truncated_result
}
Also test vector case
This doesn't work, I think because of this check:
if (!CI->getType()->isDoubleTy() || !CalleeFn)
Let's build on it in a subsequent PR.
LGTM, thanks
Would be good to do this for all relevant FP functions
This optimization already exists, but for the libcall versions of these functions and not for their intrinsic form.
Solves #139044.
There are probably more opportunities for other intrinsics, because the switch-case in LibCallSimplifier::optimizeCall covers only pow, exp2, log, log2, log10, sqrt, memset, memcpy and memmove.
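For readers less familiar with the pattern, the following source-level sketch shows what the shrink amounts to: the fpext / double cos / fptrunc sequence from the test file on one side, and a single-precision call on the other. The helper names `cos_via_double` and `cos_shrunk` are hypothetical and this is plain C++, not LLVM code. The two results are expected to be close but are not guaranteed to be bit-identical, which is presumably why the tests exercise the transform under `-enable-double-float-shrink` or fast-math flags.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the shape before the transform.
// Extend to double, call the double-precision cos, truncate back.
float cos_via_double(float f) {
  double d = static_cast<double>(f);  // fpext float -> double
  double r = std::cos(d);             // double-precision cos
  return static_cast<float>(r);       // fptrunc double -> float
}

// Hypothetical helper: the shape after shrinking.
// std::cos has a float overload, so this stays in single precision.
float cos_shrunk(float f) {
  return std::cos(f);
}
```

Comparing the two helpers over a range of inputs is one way to see how far the single-precision result can drift from the truncated double-precision one on a given libm.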