Skip to content

8321010: RISC-V: C2 RoundVF #17745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from
Closed

Conversation

Hamlin-Li
Copy link

@Hamlin-Li Hamlin-Li commented Feb 7, 2024

Hi,
Can you have a review on this patch to add RoundVF/RoundDF intrinsics?

Current test shows that, it bring performance gain when vlenb >= 32 (which is on bananapi), but bring regression when vlenb == 16 (which is on k230). So I only enable the intrinsic when vlenb >= 32.

For double, even vlenb == 32, there is still some regression, so in this pr I only enable it when vlenb >= 64. Although there is no hardware to verify it, I think from the trend of performance data on bananapi and k230, it's promising to bring better performance rather than regression for double when vlenb == 64+. Please compare the data of Benchmark on bananapi, +UseSuperWord and Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16.

Thanks!

Tests

test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java

test/jdk/java/lang/Math/RoundTests.java

Performance - with Intrinsic

on bananapi

Benchmark on bananapi, +UseSuperWord

Benchmark on bananapi, +UseSuperWord (TESTSIZE) Mode Cnt Score +intrinsic Score -intrinsic Error Units Improvement
FpRoundingBenchmark.test_round_double 2048 avgt 10 23794.153 20557.467 2899.266 ns/op 0.864
FpRoundingBenchmark.test_round_float 2048 avgt 10 11531.853 16562.431 865.779 ns/op 1.436

on k230 (enable intrinsic even when vlenb == 16)

Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16

Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16 (TESTSIZE) Mode Cnt Score +intrinsic Error Units Score -intrinsic improvement
FpRoundingBenchmark.test_round_double 2048 avgt 10 47049.826 2507.496 ns/op 23300.526 0.495
FpRoundingBenchmark.test_round_float 2048 avgt 10 22353.826 1020.517 ns/op 18149.228 0.812

Performance - without Intrinsic

on bananapi, intrinsic disabled due to -UseSuperWord

Benchmark on bananapi, -UseSuperWord

Benchmark on bananapi, -UseSuperWord (TESTSIZE) Mode Cnt Score +intrinsic Score -intrinsic Error Units Improvement
FpRoundingBenchmark.test_round_double 2048 avgt 10 22651.12 22954.95 1182.26 ns/op 1.013
FpRoundingBenchmark.test_round_float 2048 avgt 10 21722.434 22082.771 735.31 ns/op 1.017

on k230, intrinsic disabled due to -UseSuperWord

Benchmark on k230, -UseSuperWord

Benchmark on k230, -UseSuperWord (TESTSIZE) Mode Cnt Score Error Units Score -intrinsic improvement
FpRoundingBenchmark.test_round_double 2048 avgt 10 23312.201 1049.356 ns/op 23350.433 1.002
FpRoundingBenchmark.test_round_float 2048 avgt 10 20174.847 1039.798 ns/op 20138.565 0.998

on k230, intrinsic disabled due to vlenb == 16

Benchmark on k230, +UseSuperWord

Benchmark on k230, +UseSuperWord (TESTSIZE) Mode Cnt Score +intrinsic Error Units Score -intrinsic improvement
FpRoundingBenchmark.test_round_double 2048 avgt 10 23353.978 1063.472 ns/op 23277.546 0.997
FpRoundingBenchmark.test_round_float 2048 avgt 10 18168.97 883.562 ns/op 18137.665 0.998

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issues

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17745/head:pull/17745
$ git checkout pull/17745

Update a local copy of the PR:
$ git checkout pull/17745
$ git pull https://git.openjdk.org/jdk.git pull/17745/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17745

View PR using the GUI difftool:
$ git pr show -t 17745

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17745.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 7, 2024

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 7, 2024
@openjdk
Copy link

openjdk bot commented Feb 7, 2024

@Hamlin-Li The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@Hamlin-Li
Copy link
Author

/solves JDK-8321011

@mlbridge
Copy link

mlbridge bot commented Feb 7, 2024

@openjdk
Copy link

openjdk bot commented Feb 7, 2024

@Hamlin-Li
Adding additional issue to solves list: 8321011: RISC-V: C2 RoundVD.

@VladimirKempik
Copy link

Hello Hamlin, I recall you had licheepi board.
I would be nice if you can try to measure rvv performance gain with this https://github.com/syntacore/syntaj21/tree/rvv0.7.1

This PR showed it's not always easy to win perf just by using rvv - #17413

I understand it might not be possible, but would be nice to give it a try (I can share hsdis with support for 0.7.1 if needed)

@Hamlin-Li
Copy link
Author

@VladimirKempik Thanks for your comments, I'll consider it.

@Hamlin-Li
Copy link
Author

Hey, I'll be on vacation for a while, so maybe slow in response.
But please leave your comments freely as usual.

@Hamlin-Li
Copy link
Author

Hello Hamlin, I recall you had licheepi board. I would be nice if you can try to measure rvv performance gain with this https://github.com/syntacore/syntaj21/tree/rvv0.7.1

This PR showed it's not always easy to win perf just by using rvv - #17413

I understand it might not be possible, but would be nice to give it a try (I can share hsdis with support for 0.7.1 if needed)

Had a internal discussion about your suggestion, seems 0.7.1 is not incompatible with 1.0/2.0, and for this simple intrinsic, we think a better path is to have it first, then re-visit it when we have real hardware to measure the performance later.

@openjdk
Copy link

openjdk bot commented Mar 13, 2024

@Hamlin-Li This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8321010: RISC-V: C2 RoundVF
8321011: RISC-V: C2 RoundVD

Reviewed-by: rehn, luhenry

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 436 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@Hamlin-Li
Copy link
Author

Hey, Is someone avaiable to review this patch again? Thanks!

@luhenry
Copy link
Member

luhenry commented Sep 2, 2024

@RealFYang @turbanoff could we please have another review? Thank you!

Copy link
Contributor

@robehn robehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems alright, thanks!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 13, 2024
@Hamlin-Li
Copy link
Author

Thanks @luhenry @robehn for you reviewing!

/integrate

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 13, 2024
@openjdk
Copy link

openjdk bot commented Sep 13, 2024

@Hamlin-Li This pull request has not yet been marked as ready for integration.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 13, 2024
@Hamlin-Li
Copy link
Author

/integrate

@openjdk
Copy link

openjdk bot commented Sep 13, 2024

Going to push as commit bacd046.
Since your change was applied there have been 436 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 13, 2024
@openjdk openjdk bot closed this Sep 13, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 13, 2024
@openjdk
Copy link

openjdk bot commented Sep 13, 2024

@Hamlin-Li Pushed as commit bacd046.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler [email protected] integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

7 participants