Skip to content

8348561: Add aarch64 intrinsics for ML-DSA #23300

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from

Conversation

ferakocz
Copy link
Contributor

@ferakocz ferakocz commented Jan 24, 2025

By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8348561: Add aarch64 intrinsics for ML-DSA (Enhancement - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23300/head:pull/23300
$ git checkout pull/23300

Update a local copy of the PR:
$ git checkout pull/23300
$ git pull https://git.openjdk.org/jdk.git pull/23300/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23300

View PR using the GUI difftool:
$ git pr show -t 23300

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23300.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 24, 2025

👋 Welcome back ferakocz! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jan 24, 2025

@ferakocz This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8348561: Add aarch64 intrinsics for ML-DSA

Reviewed-by: adinn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 49 new commits pushed to the master branch:

  • 8073914: 8350974: The os_cpu VM_STRUCTS, VM_TYPES, etc have no declarations and should be removed
  • 7ee89a5: 8350893: Use generated names for hand generated opto runtime blobs
  • fae37aa: 8345627: [REDO] Use gcc12 -ftrivial-auto-var-init=pattern in debug builds
  • 1f10ffb: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures
  • 4fc72b8: 8351082: Remove dead code for estimating CDS archive size
  • b6e2d66: 8351087: Combine scratch object tables in heapShared.cpp
  • d9b98f7: 8350771: Fix -Wzero-as-null-pointer-constant warning in nsk/monitoring ThreadController utility
  • 7c173fd: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect
  • 96613cc: 8349516: StAXStream2SAX.handleCharacters() fails on empty CDATA
  • 3a8a432: 8349094: GenShen: Race between control and regulator threads may violate assertions
  • ... and 39 more: https://git.openjdk.org/jdk/compare/ab4b0ef9242a4cd964fbcf2d1f3d370234c09408...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@adinn) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk
Copy link

openjdk bot commented Jan 24, 2025

@ferakocz The following labels will be automatically applied to this pull request:

  • graal
  • hotspot
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 27, 2025
@mlbridge
Copy link

mlbridge bot commented Jan 27, 2025

Webrevs

@openjdk
Copy link

openjdk bot commented Jan 30, 2025

@ferakocz this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout mldsa-aarch64-intrinsics
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jan 30, 2025
@adinn
Copy link
Contributor

adinn commented Jan 30, 2025

@ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Feb 3, 2025
@ferakocz
Copy link
Contributor Author

ferakocz commented Feb 3, 2025

@ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out.

@adinn I think I managed to figure it out. Please take a look at the PR and let me know if I should have done anything differently.

@adinn
Copy link
Contributor

adinn commented Feb 4, 2025

@ferakocz Yes, the stub declaration part of it looks to be correct.

The rest of the patch will need at least two reviewers (@theRealAph? @martinuy? @franferrax) and may take some time to review, given that they will probably need to read up on the maths and algorithms. As an aid for reviewers and maintainers it would be good to insert a comment into the generator file linking the implementations to the relevant maths and algorithm. I found the FIPS-204 spec and the CRYSTALS-Dilithium Algorithm Specifications and Supporting Documentation paper, Shi Bai, Léo Ducas et al, 2021 - are they the best ones to look at?

@ferakocz
Copy link
Contributor Author

ferakocz commented Feb 4, 2025

@ferakocz Yes, the stub declaration part of it looks to be correct.

The rest of the patch will need at least two reviewers (@theRealAph? @martinuy? @franferrax) and may take some time to review, given that they will probably need to read up on the maths and algorithms. As an aid for reviewers and maintainers it would be good to insert a comment into the generator file linking the implementations to the relevant maths and algorithm. I found the FIPS-204 spec and the CRYSTALS-Dilithium Algorithm Specifications and Supporting Documentation paper, Shi Bai, Léo Ducas et al, 2021 - are they the best ones to look at?

The Java implementation of ML-DSA is based on the FIPS-204 standard and the intrinsics' implementations are based on the corresponding Java methods, except that the montMul() calls in them are inlined. The rest of the transformation from Java code to intrinsic code is pretty straightforward, so a reviewer need not necessarily understand the whole mathematics of the ML-DSA algorithms, just that the Java and the corresponding intrinsic code do the same thing.

@adinn
Copy link
Contributor

adinn commented Feb 5, 2025

@ferakocz

The Java implementation of ML-DSA is based on the FIPS-204 standard and the intrinsics' implementations are based on the corresponding Java methods, except that the montMul() calls in them are inlined. The rest of the transformation from Java code to intrinsic code is pretty straightforward, so a reviewer need not necessarily understand the whole mathematics of the ML-DSA algorithms, just that the Java and the corresponding intrinsic code do the same thing.

Yes, I located the relevant Java implementations in SHA3.java (keccak) and ML_DSA.java (dilithiumXXX) plus also SHA3Parallel.java (doubleKeccak). The first file does at least mention FIPS-202. The second does not include any reference, in particular does not mention FIPS-204 (correction it mentions it in passing rather than citing it as the basis for the derived code).

I still think it would be helpful for reviewers and maintainers if you were to add a comment in front of the generator routines that 1) notes that these routines are based on the relevant Java sources and 2) mentions that the Java code is in turn based on the FIPS-202 and FIPS-204 standards.

While I agree that a reviewer or maintainer could simply check the generated code against the Java code I believe access to the underlying theory will be of aid when it comes to understanding what each variant is doing and verifying the equivalence of the two.

That's why I'd also prefer to have two reviews to be sure that more than one of us who may be tasked with maintaining this code can be happy that we understand, at least, the equivalence in question.

@mcpowers
Copy link
Contributor

Some measurements:

With Intrinsics
---------------
keygen    ML-DSA-44   38.8  us/op
keygen    ML-DSA-65   82.5  us/op
keygen    ML-DSA-87  112.6  us/op
siggen    ML-DSA-44  119.1  us/op
siggen    ML-DSA-65  186.5  us/op
siggen    ML-DSA-87  306.1  us/op
sigver    ML-DSA-44   46.4  us/op
sigver    ML-DSA-65   72.8  us/op
sigver    ML-DSA-87  123.4  us/op
No Intrinsics
-------------
keygen    ML-DSA-44   63.1  us/op
keygen    ML-DSA-65  118.7  us/op
keygen    ML-DSA-87  167.2  us/op
siggen    ML-DSA-44  466.8  us/op
siggen    ML-DSA-65  546.3  us/op
siggen    ML-DSA-87  560.3  us/op
sigver    ML-DSA-44   71.6  us/op
sigver    ML-DSA-65  117.9  us/op
sigver    ML-DSA-87  180.4  us/op

@@ -2609,6 +2615,8 @@ template<typename R, typename... Rx>
INSN(minv, 0, 0b011011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S
INSN(smaxp, 0, 0b101001, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S
INSN(sminp, 0, 0b101011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S
INSN(sqdmulh,0, 0b101101, false); // accepted arrangements: T4H, T8H, T2S, T4S
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, not a comment on the algorithm itself but you might have to add these new instructions in the gtest for aarch64 here - test/hotspot/gtest/aarch64/aarch64-asmtest.py and use this file to generate test/hotspot/gtest/aarch64/asmtest.out.h which would contain these newly added instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried that, but the python script (actually the as command that it started) threw error messages:

aarch64ops.s:338:24: error: index must be a multiple of 8 in range [0, 32760].
prfm PLDL1KEEP, [x15, 43]
^
aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
sub x1, x10, x23, sxth #2
^
aarch64ops.s:359:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
add x11, x21, x5, uxtb #3
^
aarch64ops.s:360:22: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
adds x11, x17, x17, uxtw #1
^
aarch64ops.s:361:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
sub x11, x0, x15, uxtb #1
^
aarch64ops.s:362:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
subs x7, x1, x0, sxth #2
^
This is without any modifications from what is in the master branch currently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have to use an assembler from the latest binutils build (if the system default isn't the latest) and add the path to the assembler in the "AS" variable. Also you can run it something like - python aarch64-asmtest.py | expand > asmtest.out.h. Please let me know if you still face problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ferakocz This also really needs addressing before committing the patch. Perhaps @theRealAph can advise on how to circumvent the problems you found when trying to update the python script?

Copy link
Contributor

@theRealAph theRealAph Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have to use an assembler from the latest binutils build (if the system default isn't the latest) and add the path to the assembler in the "AS" variable. Also you can run it something like - python aarch64-asmtest.py | expand > asmtest.out.h. Please let me know if you still face problems.

People have been running this script for a decade now.

Let's look at just one of these:

aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
sub x1, x10, x23, sxth #2

From the AArch64 manual:

SUB (extended register)
SUB <Xd|SP>, <Xn|SP>, {, {#}}

It thinks this is a SUB (shifted register), bit it's really a SUB (extended register).

fedora:aarch64 $ cat t.s
sub x1, x10, x23, sxth #2
fedora:aarch64 $ as t.s
fedora:aarch64 $ objdump -D a.out
Disassembly of section .text:

0000000000000000 <.text>:
   0:	cb37a941 	sub	x1, x10, w23, sxth #2

So perhaps binutils expects w23 here, not x23. But the manual (ARM DDI 0487K.a) says x23 should be just fine, and, what's more, gives the x form preferred status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried it with top-of trunk latest binutils:

fedora:aarch64 $ ~/binutils-gdb-install/bin/as -march=armv9-a+sha3+sve2-bitperm aarch64ops.s
fedora:aarch64 $ ~/binutils-gdb-install/bin/as --version
GNU assembler (GNU Binutils) 2.44.50.20250225

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha!

aph@Andrews-MacBook-Pro ~ % as t.s   
t.s:1:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
sub x1, x10, x23, sxth #2
                  ^
aph@Andrews-MacBook-Pro ~ % as --version
Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: arm64-apple-darwin24.3.0

Copy link
Contributor Author

@ferakocz ferakocz Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so GNU as is more forgiving than Apple as...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did my patch to aarch64-asmtest.py solve the problem?

Copy link
Contributor Author

@ferakocz ferakocz Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried, I just used GNU as.

@@ -2586,6 +2591,7 @@ template<typename R, typename... Rx>
void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm) { \
guarantee(T != T1Q && T != T1D, "incorrect arrangement"); \
if (!acceptT2D) guarantee(T != T2D, "incorrect arrangement"); \
if (strcmp(#NAME, "sqdmulh") == 0) guarantee(T != T8B && T != T16B, "incorrect arrangement"); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

I think it might be better to change this test from a strcmp call to (opc2 == 0b101101). The strcmp test is clearer to a reader of the code but the call may not be guaranteed to be compiled out at build time while the latter will.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested.

@@ -4063,6 +4063,93 @@ class StubGenerator: public StubCodeGenerator {
return start;
}

// Execute on round of keccak of two computations in parallel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

It would be helpful to add comments that relate the register and instruction selection to the original Java source code. e.g. change the header as follows

// Performs 2 keccak round transformations using vector parallelism
// 
// Two sets of 25 * 64-bit input states a0[lo:hi]...a24[lo:hi] are passed in
// the lower/upper halves of registers v0...v24 and the transformed states
// are returned in the same registers. Intermediate 64-bit pairs
// c0...c5 and d0...d5 are computed in registers v25...v30. v31 is
// loaded with the required pair of 64 bit rounding constants.
// During computation of the output states some intermediate results are
// shuffled around registers v0...v30. Comments on each line indicate
// how the values in registers correspond to variables ai, ci, di in
// the Java source code, likewise how the generated machine instructions
// correspond to Java source operations (n.b. rol means rotate left).

The annotate the generation steps as follows:

__ eor3(v29, __ T16B, v4, v9, v14);       // c4 = a4 ^ a9 ^ a14
__ eor3(v26, __ T16B, v1, v6, v11);       // c1 = a1 ^ a16 ^ a11
__ eor3(v28, __ T16B, v3, v8, v13);       // c3 = a3 ^ a8 ^a13
__ eor3(v25, __ T16B, v0, v5, v10);       // c0 = a0 ^ a5 ^ a10
__ eor3(v27, __ T16B, v2, v7, v12);       // c2 = a2 ^ a7 ^ a12
__ eor3(v29, __ T16B, v29, v19, v24);     // c4 ^= a19 ^ a24
__ eor3(v26, __ T16B, v26, v16, v21);     // c1 ^= a16 ^ a21
__ eor3(v28, __ T16B, v28, v18, v23);     // c3 ^= a18 ^ a23
__ eor3(v25, __ T16B, v25, v15, v20);     // c0 ^= a15 ^ a20
__ eor3(v27, __ T16B, v27, v17, v22);     // c2 ^= a17 ^ a22

__ rax1(v30, __ T2D, v29, v26);           // d0 = c4 ^ rol(c1, 1)
__ rax1(v26, __ T2D, v26, v28);           // d2 = c1 ^ rol(c3, 1)
__ rax1(v28, __ T2D, v28, v25);           // d4 = c3 ^ rol(c0, 1)
__ rax1(v25, __ T2D, v25, v27);           // d1 = c0 ^ rol(c2, 1)
__ rax1(v27, __ T2D, v27, v29);           // d3 = c2 ^ rol(c4, 1)

__ eor(v0, __ T16B, v0, v30);             // a0 = a0 ^ d0
__ xar(v29, __ T2D, v1,  v25, (64 - 1));  // a10' = rol((a1^d1), 1)
__ xar(v1,  __ T2D, v6,  v25, (64 - 44)); // a1 = rol(a6^d1), 44)
__ xar(v6,  __ T2D, v9,  v28, (64 - 20)); // a6 = rol((a9^d4), 20)
__ xar(v9,  __ T2D, v22, v26, (64 - 61)); // a9 = rol((a22^d2), 61)
__ xar(v22, __ T2D, v14, v28, (64 - 39)); // a22 = rol((a14^d4), 39)
__ xar(v14, __ T2D, v20, v30, (64 - 18)); // a14 = rol((a20^d0), 18)
__ xar(v31, __ T2D, v2,  v26, (64 - 62)); // a20' = rol((a2^d2), 62)
__ xar(v2,  __ T2D, v12, v26, (64 - 43)); // a2 = rol((a12^d2), 43)
__ xar(v12, __ T2D, v13, v27, (64 - 25)); // a12 = rol((a13^d3), 25)
__ xar(v13, __ T2D, v19, v28, (64 - 8));  // a13 = rol((a19^d4), 8)
__ xar(v19, __ T2D, v23, v27, (64 - 56)); // a19 = rol((a23^d3), 56)
__ xar(v23, __ T2D, v15, v30, (64 - 41)); // a23 = rol((a15^d0), 41)
__ xar(v15, __ T2D, v4,  v28, (64 - 27)); // a15 = rol((a4^d4), 27)
__ xar(v28, __ T2D, v24, v28, (64 - 14)); // a4' = rol((a24^d4), 14)
__ xar(v24, __ T2D, v21, v25, (64 - 2));  // a24 = rol((a21^d1), 2)
__ xar(v8,  __ T2D, v8,  v27, (64 - 55)); // a21' = rol((a8^d3), 55)
__ xar(v4,  __ T2D, v16, v25, (64 - 45)); // a8' = rol((a16^d1), 45)
__ xar(v16, __ T2D, v5,  v30, (64 - 36)); // a16 = rol((a5^d0), 36)
__ xar(v5,  __ T2D, v3,  v27, (64 - 28)); // a5 = rol((a3^d3), 28)
__ xar(v27, __ T2D, v18, v27, (64 - 21)); // a3' = rol((a18^d3), 21)
__ xar(v3,  __ T2D, v17, v26, (64 - 15)); // a18' = rol((a17^d2), 15)
__ xar(v25, __ T2D, v11, v25, (64 - 10)); // a17' = rol((a11^d1), 10)
__ xar(v26, __ T2D, v7,  v26, (64 - 6));  // a11' = rol((a7^d2), 6)
__ xar(v30, __ T2D, v10, v30, (64 - 3));  // a7' = rol((a10^d0), 3)

__ bcax(v20, __ T16B, v31, v22, v8);      // a20 = a20' ^ (~a21 & a22')
__ bcax(v21, __ T16B, v8,  v23, v22);     // a21 = a21' ^ (~a22 & a23)
__ bcax(v22, __ T16B, v22, v24, v23);     // a22 = a22 ^ (~a23 & a24)
__ bcax(v23, __ T16B, v23, v31, v24);     // a23 = a23 ^ (~a24 & a20')
__ bcax(v24, __ T16B, v24, v8,  v31);     // a24 = a24 ^ (~a20' & a21')

__ ld1r(v31, __ T2D, __ post(rscratch1, 8)); // rc = round_constants[i]

__ bcax(v17, __ T16B, v25, v19, v3);      // a17 = a17' ^ (~a18' & a19)
__ bcax(v18, __ T16B, v3,  v15, v19);     // a18 = a18' ^ (~a19 & a15')
__ bcax(v19, __ T16B, v19, v16, v15);     // a19 = a19 ^ (~a15 & a16)
__ bcax(v15, __ T16B, v15, v25, v16);     // a15 = a15 ^ (~a16 & a17')
__ bcax(v16, __ T16B, v16, v3,  v25);     // a16 = a16 ^ (~a17' & a18')

__ bcax(v10, __ T16B, v29, v12, v26);     // a10 = a10' ^ (~a11' & a12)
__ bcax(v11, __ T16B, v26, v13, v12);     // a11 = a11' ^ (~a12 & a13)
__ bcax(v12, __ T16B, v12, v14, v13);     // a12 = a12 ^ (~a13 & a14)
__ bcax(v13, __ T16B, v13, v29, v14);     // a13 = a13 ^ (~a14 & a10')
__ bcax(v14, __ T16B, v14, v26, v29);     // a14 = a14 ^ (~a10' & a11')

__ bcax(v7, __ T16B, v30, v9,  v4);       // a7 = a7' ^ (~a8' & a9)
__ bcax(v8, __ T16B, v4,  v5,  v9);       // a8 = a8' ^ (~a9 & a5)
__ bcax(v9, __ T16B, v9,  v6,  v5);       // a9 = a9 ^ (~a5 & a6)
__ bcax(v5, __ T16B, v5,  v30, v6);       // a5 = a5 ^ (~a6 & a7)
__ bcax(v6, __ T16B, v6,  v4,  v30);      // a6 = a6 ^ (~a7 & a8')

__ bcax(v3, __ T16B, v27, v0,  v28);      // a3 = a3' ^ (~a4' & a0)
__ bcax(v4, __ T16B, v28, v1,  v0);       // a4 = a4' ^ (~a0 & a1)
__ bcax(v0, __ T16B, v0,  v2,  v1);       // a0 = a0 ^ (~a1 & a2)
__ bcax(v1, __ T16B, v1,  v27, v2);       // a1 = a1 ^ (~a2 & a3)
__ bcax(v2, __ T16B, v2,  v28, v27);      // a2 = a2 ^ (~a3 & a4')

__ eor(v0, __ T16B, v0, v31);             // a0 = a0 ^ rc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this piece of code is not new, and I don't really think that this level of commenting is necessary, especially in code that is very unlikely to change, I added the comments.

@shqking
Copy link
Contributor

shqking commented Feb 19, 2025

Hi. Here is the test result of our CI.

copyright year

the following files should update the copyright year to 2025.

src/hotspot/cpu/aarch64/assembler_aarch64.hpp
src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp
src/hotspot/share/runtime/globals.hpp
src/java.base/share/classes/sun/security/provider/ML_DSA.java
src/java.base/share/classes/sun/security/provider/SHA3Parallel.java
test/micro/org/openjdk/bench/java/security/MLDSA.java

cross-build failure

Cross build for riscv64/s390/ppc64 failed.

Here shows the error msg for ppc64

=== Output from failing command(s) repeated here ===
* For target support_interim-jmods_support__create_java.base.jmod_exec:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/tmp/jdk-src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=72752, tid=72769
#  assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000e85cb03dc620 <= 0x0000e85cb03e8ab4 <= 0x0000e85cb03e8ab0
#
# JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-git-1e01c6deec3)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-git-1e01c6deec3, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x3b391c]  Instruction_aarch64::~Instruction_aarch64()+0xbc
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/ci-scripts/jdk-src/make/
#
# An error report file with more information is saved as:
# /tmp/jdk-src/make/hs_err_pid72752.log
   ... (rest of output omitted)

* All command lines available in /sysroot/ppc64el/tmp/build-ppc64el/make-support/failure-logs.
=== End of repeated output ===

I suppose we should make the similar update at file src/hotspot/cpu/aarch64/stubDeclarations_aarch64.hpp to other platforms

@adinn
Copy link
Contributor

adinn commented Feb 19, 2025

@ferakocz Apologies for the delays in reviewing and the limited feedback up to now. The code clearly does the job well but I think it would be made clearer and easier to maintain by tweaking/extending some of the generator methods and adding more detailed commenting. I am afraid I may take a few days to provide the relevant details because of other commitments.

Copy link
Contributor

@adinn adinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments as indicated to relate generated code to original Java source. Otherwise good to go.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 24, 2025
@adinn
Copy link
Contributor

adinn commented Feb 24, 2025

I raised JDK-8350589 to cover investigation of an alternative implementation.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Feb 26, 2025
Copy link
Contributor

@adinn adinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, still good

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 26, 2025
@adinn
Copy link
Contributor

adinn commented Feb 27, 2025

@ferakocz Apologies for raising yet another resolve conflict. You will need to make a further adjustment to the compiler blob declaration to accommodate a fix I just pushed to resolve a problem with cross-compilation. Your patch should now specify

*** oops, mistake, see below ***

@adinn
Copy link
Contributor

adinn commented Feb 27, 2025

Oops. sorry - cut and paste error -- the new setting should be

do_arch_blob(compiler, 55000 ZGC_ONLY(+5000))

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Feb 28, 2025
@ferakocz
Copy link
Contributor Author

ferakocz commented Mar 4, 2025

Oops. sorry - cut and paste error -- the new setting should be

do_arch_blob(compiler, 55000 ZGC_ONLY(+5000))

@adinn, I have done this change, but that erased your approval. Could you reapprove?

Copy link
Contributor

@adinn adinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 4, 2025
@adinn
Copy link
Contributor

adinn commented Mar 4, 2025

@ferakocz Feel free to integrate and I will sponsor

@ferakocz
Copy link
Contributor Author

ferakocz commented Mar 4, 2025

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 4, 2025
@openjdk
Copy link

openjdk bot commented Mar 4, 2025

@ferakocz
Your change (at version d82dfb2) is now ready to be sponsored by a Committer.

@ferakocz
Copy link
Contributor Author

ferakocz commented Mar 4, 2025

@ferakocz Feel free to integrate and I will sponsor

@adinn thanks a lot for the review and the sponsoring, too!

@seanjmullan
Copy link
Member

I think it would be nice to add a release note for this describing the approximate performance improvement.

@adinn
Copy link
Contributor

adinn commented Mar 4, 2025

/sponsor

@openjdk
Copy link

openjdk bot commented Mar 4, 2025

Going to push as commit 3230894.
Since your change was applied there have been 49 commits pushed to the master branch:

  • 8073914: 8350974: The os_cpu VM_STRUCTS, VM_TYPES, etc have no declarations and should be removed
  • 7ee89a5: 8350893: Use generated names for hand generated opto runtime blobs
  • fae37aa: 8345627: [REDO] Use gcc12 -ftrivial-auto-var-init=pattern in debug builds
  • 1f10ffb: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures
  • 4fc72b8: 8351082: Remove dead code for estimating CDS archive size
  • b6e2d66: 8351087: Combine scratch object tables in heapShared.cpp
  • d9b98f7: 8350771: Fix -Wzero-as-null-pointer-constant warning in nsk/monitoring ThreadController utility
  • 7c173fd: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect
  • 96613cc: 8349516: StAXStream2SAX.handleCharacters() fails on empty CDATA
  • 3a8a432: 8349094: GenShen: Race between control and regulator threads may violate assertions
  • ... and 39 more: https://git.openjdk.org/jdk/compare/ab4b0ef9242a4cd964fbcf2d1f3d370234c09408...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 4, 2025
@openjdk openjdk bot closed this Mar 4, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 4, 2025
@openjdk
Copy link

openjdk bot commented Mar 4, 2025

@adinn @ferakocz Pushed as commit 3230894.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants