8348561: Add aarch64 intrinsics for ML-DSA #23300

ferakocz · 2025-01-24T14:24:23Z

By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8348561: Add aarch64 intrinsics for ML-DSA (Enhancement - P3)

Reviewers

Andrew Dinn (@adinn - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23300/head:pull/23300
$ git checkout pull/23300

Update a local copy of the PR:
$ git checkout pull/23300
$ git pull https://git.openjdk.org/jdk.git pull/23300/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23300

View PR using the GUI difftool:
$ git pr show -t 23300

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23300.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-01-24T14:25:12Z

👋 Welcome back ferakocz! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-01-24T14:25:44Z

@ferakocz This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8348561: Add aarch64 intrinsics for ML-DSA

Reviewed-by: adinn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 49 new commits pushed to the master branch:

8073914: 8350974: The os_cpu VM_STRUCTS, VM_TYPES, etc have no declarations and should be removed
7ee89a5: 8350893: Use generated names for hand generated opto runtime blobs
fae37aa: 8345627: [REDO] Use gcc12 -ftrivial-auto-var-init=pattern in debug builds
1f10ffb: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures
4fc72b8: 8351082: Remove dead code for estimating CDS archive size
b6e2d66: 8351087: Combine scratch object tables in heapShared.cpp
d9b98f7: 8350771: Fix -Wzero-as-null-pointer-constant warning in nsk/monitoring ThreadController utility
7c173fd: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect
96613cc: 8349516: StAXStream2SAX.handleCharacters() fails on empty CDATA
3a8a432: 8349094: GenShen: Race between control and regulator threads may violate assertions
... and 39 more: https://git.openjdk.org/jdk/compare/ab4b0ef9242a4cd964fbcf2d1f3d370234c09408...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@adinn) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk · 2025-01-24T14:26:29Z

@ferakocz The following labels will be automatically applied to this pull request:

graal
hotspot
security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-01-27T18:23:42Z

Webrevs

openjdk · 2025-01-30T16:10:33Z

@ferakocz this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout mldsa-aarch64-intrinsics
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

adinn · 2025-01-30T16:23:56Z

@ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out.

ferakocz · 2025-02-03T18:11:51Z

@ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out.

@adinn I think I managed to figure it out. Please take a look at the PR and let me know if I should have done anything differently.

adinn · 2025-02-04T11:45:54Z

@ferakocz Yes, the stub declaration part of it looks to be correct.

The rest of the patch will need at least two reviewers (@theRealAph? @martinuy? @franferrax) and may take some time to review, given that they will probably need to read up on the maths and algorithms. As an aid for reviewers and maintainers it would be good to insert a comment into the generator file linking the implementations to the relevant maths and algorithm. I found the FIPS-204 spec and the CRYSTALS-Dilithium Algorithm Speciﬁcations and Supporting Documentation paper, Shi Bai, Léo Ducas et al, 2021 - are they the best ones to look at?

ferakocz · 2025-02-04T18:57:28Z

@ferakocz Yes, the stub declaration part of it looks to be correct.

The rest of the patch will need at least two reviewers (@theRealAph? @martinuy? @franferrax) and may take some time to review, given that they will probably need to read up on the maths and algorithms. As an aid for reviewers and maintainers it would be good to insert a comment into the generator file linking the implementations to the relevant maths and algorithm. I found the FIPS-204 spec and the CRYSTALS-Dilithium Algorithm Speciﬁcations and Supporting Documentation paper, Shi Bai, Léo Ducas et al, 2021 - are they the best ones to look at?

The Java implementation of ML-DSA is based on the FIPS-204 standard and the intrinsics' implementations are based on the corresponding Java methods, except that the montMul() calls in them are inlined. The rest of the transformation from Java code to intrinsic code is pretty straightforward, so a reviewer need not necessarily understand the whole mathematics of the ML-DSA algorithms, just that the Java and the corresponding intrinsic code do the same thing.

adinn · 2025-02-05T10:32:44Z

@ferakocz

The Java implementation of ML-DSA is based on the FIPS-204 standard and the intrinsics' implementations are based on the corresponding Java methods, except that the montMul() calls in them are inlined. The rest of the transformation from Java code to intrinsic code is pretty straightforward, so a reviewer need not necessarily understand the whole mathematics of the ML-DSA algorithms, just that the Java and the corresponding intrinsic code do the same thing.

Yes, I located the relevant Java implementations in SHA3.java (keccak) and ML_DSA.java (dilithiumXXX) plus also SHA3Parallel.java (doubleKeccak). The first file does at least mention FIPS-202. The second does not include any reference, in particular does not mention FIPS-204 (correction it mentions it in passing rather than citing it as the basis for the derived code).

I still think it would be helpful for reviewers and maintainers if you were to add a comment in front of the generator routines that 1) notes that these routines are based on the relevant Java sources and 2) mentions that the Java code is in turn based on the FIPS-202 and FIPS-204 standards.

While I agree that a reviewer or maintainer could simply check the generated code against the Java code I believe access to the underlying theory will be of aid when it comes to understanding what each variant is doing and verifying the equivalence of the two.

That's why I'd also prefer to have two reviews to be sure that more than one of us who may be tasked with maintaining this code can be happy that we understand, at least, the equivalence in question.

mcpowers · 2025-02-10T20:58:32Z

Some measurements:

With Intrinsics
---------------
keygen    ML-DSA-44   38.8  us/op
keygen    ML-DSA-65   82.5  us/op
keygen    ML-DSA-87  112.6  us/op
siggen    ML-DSA-44  119.1  us/op
siggen    ML-DSA-65  186.5  us/op
siggen    ML-DSA-87  306.1  us/op
sigver    ML-DSA-44   46.4  us/op
sigver    ML-DSA-65   72.8  us/op
sigver    ML-DSA-87  123.4  us/op

No Intrinsics
-------------
keygen    ML-DSA-44   63.1  us/op
keygen    ML-DSA-65  118.7  us/op
keygen    ML-DSA-87  167.2  us/op
siggen    ML-DSA-44  466.8  us/op
siggen    ML-DSA-65  546.3  us/op
siggen    ML-DSA-87  560.3  us/op
sigver    ML-DSA-44   71.6  us/op
sigver    ML-DSA-65  117.9  us/op
sigver    ML-DSA-87  180.4  us/op

Bhavana-Kilambi · 2025-02-11T10:40:31Z

src/hotspot/cpu/aarch64/assembler_aarch64.hpp

@@ -2609,6 +2615,8 @@ template<typename R, typename... Rx>
  INSN(minv,   0, 0b011011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S
  INSN(smaxp,  0, 0b101001, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S
  INSN(sminp,  0, 0b101011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S
+  INSN(sqdmulh,0, 0b101101, false); // accepted arrangements: T4H, T8H, T2S, T4S


Hi, not a comment on the algorithm itself but you might have to add these new instructions in the gtest for aarch64 here - test/hotspot/gtest/aarch64/aarch64-asmtest.py and use this file to generate test/hotspot/gtest/aarch64/asmtest.out.h which would contain these newly added instructions.

I have tried that, but the python script (actually the as command that it started) threw error messages:

aarch64ops.s:338:24: error: index must be a multiple of 8 in range [0, 32760].
prfm PLDL1KEEP, [x15, 43]
^
aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
sub x1, x10, x23, sxth #2
^
aarch64ops.s:359:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
add x11, x21, x5, uxtb #3
^
aarch64ops.s:360:22: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
adds x11, x17, x17, uxtw #1
^
aarch64ops.s:361:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
sub x11, x0, x15, uxtb #1
^
aarch64ops.s:362:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
subs x7, x1, x0, sxth #2
^
This is without any modifications from what is in the master branch currently.

You might have to use an assembler from the latest binutils build (if the system default isn't the latest) and add the path to the assembler in the "AS" variable. Also you can run it something like - python aarch64-asmtest.py | expand > asmtest.out.h. Please let me know if you still face problems.

@ferakocz This also really needs addressing before committing the patch. Perhaps @theRealAph can advise on how to circumvent the problems you found when trying to update the python script?

You might have to use an assembler from the latest binutils build (if the system default isn't the latest) and add the path to the assembler in the "AS" variable. Also you can run it something like - python aarch64-asmtest.py | expand > asmtest.out.h. Please let me know if you still face problems.

People have been running this script for a decade now.

Let's look at just one of these:

aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] sub x1, x10, x23, sxth #2

From the AArch64 manual:

SUB (extended register)
SUB <Xd|SP>, <Xn|SP>, {, {#}}

It thinks this is a SUB (shifted register), bit it's really a SUB (extended register).

fedora:aarch64 $ cat t.s sub x1, x10, x23, sxth #2 fedora:aarch64 $ as t.s fedora:aarch64 $ objdump -D a.out Disassembly of section .text: 0000000000000000 <.text>: 0: cb37a941 sub x1, x10, w23, sxth #2

So perhaps binutils expects w23 here, not x23. But the manual (ARM DDI 0487K.a) says x23 should be just fine, and, what's more, gives the x form preferred status.

I just tried it with top-of trunk latest binutils:

fedora:aarch64 $ ~/binutils-gdb-install/bin/as -march=armv9-a+sha3+sve2-bitperm aarch64ops.s fedora:aarch64 $ ~/binutils-gdb-install/bin/as --version GNU assembler (GNU Binutils) 2.44.50.20250225

Aha!

aph@Andrews-MacBook-Pro ~ % as t.s t.s:1:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] sub x1, x10, x23, sxth #2 ^ aph@Andrews-MacBook-Pro ~ % as --version Apple clang version 16.0.0 (clang-1600.0.26.6) Target: arm64-apple-darwin24.3.0

OK, so GNU as is more forgiving than Apple as...

Did my patch to aarch64-asmtest.py solve the problem?

I haven't tried, I just used GNU as.

adinn · 2025-02-18T13:33:52Z

src/hotspot/cpu/aarch64/assembler_aarch64.hpp

@@ -2586,6 +2591,7 @@ template<typename R, typename... Rx>
  void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm) { \
    guarantee(T != T1Q && T != T1D, "incorrect arrangement");                           \
    if (!acceptT2D) guarantee(T != T2D, "incorrect arrangement");                       \
+    if (strcmp(#NAME, "sqdmulh") == 0) guarantee(T != T8B && T != T16B, "incorrect arrangement");   \


Suggestion:

I think it might be better to change this test from a strcmp call to (opc2 == 0b101101). The strcmp test is clearer to a reader of the code but the call may not be guaranteed to be compiled out at build time while the latter will.

Changed as suggested.

adinn · 2025-02-18T13:43:18Z

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp

@@ -4063,6 +4063,93 @@ class StubGenerator: public StubCodeGenerator {
    return start;
  }

+  // Execute on round of keccak of two computations in parallel.


Suggestion:

It would be helpful to add comments that relate the register and instruction selection to the original Java source code. e.g. change the header as follows

// Performs 2 keccak round transformations using vector parallelism // // Two sets of 25 * 64-bit input states a0[lo:hi]...a24[lo:hi] are passed in // the lower/upper halves of registers v0...v24 and the transformed states // are returned in the same registers. Intermediate 64-bit pairs // c0...c5 and d0...d5 are computed in registers v25...v30. v31 is // loaded with the required pair of 64 bit rounding constants. // During computation of the output states some intermediate results are // shuffled around registers v0...v30. Comments on each line indicate // how the values in registers correspond to variables ai, ci, di in // the Java source code, likewise how the generated machine instructions // correspond to Java source operations (n.b. rol means rotate left).

The annotate the generation steps as follows:

__ eor3(v29, __ T16B, v4, v9, v14); // c4 = a4 ^ a9 ^ a14 __ eor3(v26, __ T16B, v1, v6, v11); // c1 = a1 ^ a16 ^ a11 __ eor3(v28, __ T16B, v3, v8, v13); // c3 = a3 ^ a8 ^a13 __ eor3(v25, __ T16B, v0, v5, v10); // c0 = a0 ^ a5 ^ a10 __ eor3(v27, __ T16B, v2, v7, v12); // c2 = a2 ^ a7 ^ a12 __ eor3(v29, __ T16B, v29, v19, v24); // c4 ^= a19 ^ a24 __ eor3(v26, __ T16B, v26, v16, v21); // c1 ^= a16 ^ a21 __ eor3(v28, __ T16B, v28, v18, v23); // c3 ^= a18 ^ a23 __ eor3(v25, __ T16B, v25, v15, v20); // c0 ^= a15 ^ a20 __ eor3(v27, __ T16B, v27, v17, v22); // c2 ^= a17 ^ a22 __ rax1(v30, __ T2D, v29, v26); // d0 = c4 ^ rol(c1, 1) __ rax1(v26, __ T2D, v26, v28); // d2 = c1 ^ rol(c3, 1) __ rax1(v28, __ T2D, v28, v25); // d4 = c3 ^ rol(c0, 1) __ rax1(v25, __ T2D, v25, v27); // d1 = c0 ^ rol(c2, 1) __ rax1(v27, __ T2D, v27, v29); // d3 = c2 ^ rol(c4, 1) __ eor(v0, __ T16B, v0, v30); // a0 = a0 ^ d0 __ xar(v29, __ T2D, v1, v25, (64 - 1)); // a10' = rol((a1^d1), 1) __ xar(v1, __ T2D, v6, v25, (64 - 44)); // a1 = rol(a6^d1), 44) __ xar(v6, __ T2D, v9, v28, (64 - 20)); // a6 = rol((a9^d4), 20) __ xar(v9, __ T2D, v22, v26, (64 - 61)); // a9 = rol((a22^d2), 61) __ xar(v22, __ T2D, v14, v28, (64 - 39)); // a22 = rol((a14^d4), 39) __ xar(v14, __ T2D, v20, v30, (64 - 18)); // a14 = rol((a20^d0), 18) __ xar(v31, __ T2D, v2, v26, (64 - 62)); // a20' = rol((a2^d2), 62) __ xar(v2, __ T2D, v12, v26, (64 - 43)); // a2 = rol((a12^d2), 43) __ xar(v12, __ T2D, v13, v27, (64 - 25)); // a12 = rol((a13^d3), 25) __ xar(v13, __ T2D, v19, v28, (64 - 8)); // a13 = rol((a19^d4), 8) __ xar(v19, __ T2D, v23, v27, (64 - 56)); // a19 = rol((a23^d3), 56) __ xar(v23, __ T2D, v15, v30, (64 - 41)); // a23 = rol((a15^d0), 41) __ xar(v15, __ T2D, v4, v28, (64 - 27)); // a15 = rol((a4^d4), 27) __ xar(v28, __ T2D, v24, v28, (64 - 14)); // a4' = rol((a24^d4), 14) __ xar(v24, __ T2D, v21, v25, (64 - 2)); // a24 = rol((a21^d1), 2) __ xar(v8, __ T2D, v8, v27, (64 - 55)); // a21' = rol((a8^d3), 55) __ xar(v4, __ T2D, v16, v25, (64 - 45)); // a8' = rol((a16^d1), 45) __ xar(v16, __ T2D, v5, v30, (64 - 36)); // a16 = rol((a5^d0), 36) __ xar(v5, __ T2D, v3, v27, (64 - 28)); // a5 = rol((a3^d3), 28) __ xar(v27, __ T2D, v18, v27, (64 - 21)); // a3' = rol((a18^d3), 21) __ xar(v3, __ T2D, v17, v26, (64 - 15)); // a18' = rol((a17^d2), 15) __ xar(v25, __ T2D, v11, v25, (64 - 10)); // a17' = rol((a11^d1), 10) __ xar(v26, __ T2D, v7, v26, (64 - 6)); // a11' = rol((a7^d2), 6) __ xar(v30, __ T2D, v10, v30, (64 - 3)); // a7' = rol((a10^d0), 3) __ bcax(v20, __ T16B, v31, v22, v8); // a20 = a20' ^ (~a21 & a22') __ bcax(v21, __ T16B, v8, v23, v22); // a21 = a21' ^ (~a22 & a23) __ bcax(v22, __ T16B, v22, v24, v23); // a22 = a22 ^ (~a23 & a24) __ bcax(v23, __ T16B, v23, v31, v24); // a23 = a23 ^ (~a24 & a20') __ bcax(v24, __ T16B, v24, v8, v31); // a24 = a24 ^ (~a20' & a21') __ ld1r(v31, __ T2D, __ post(rscratch1, 8)); // rc = round_constants[i] __ bcax(v17, __ T16B, v25, v19, v3); // a17 = a17' ^ (~a18' & a19) __ bcax(v18, __ T16B, v3, v15, v19); // a18 = a18' ^ (~a19 & a15') __ bcax(v19, __ T16B, v19, v16, v15); // a19 = a19 ^ (~a15 & a16) __ bcax(v15, __ T16B, v15, v25, v16); // a15 = a15 ^ (~a16 & a17') __ bcax(v16, __ T16B, v16, v3, v25); // a16 = a16 ^ (~a17' & a18') __ bcax(v10, __ T16B, v29, v12, v26); // a10 = a10' ^ (~a11' & a12) __ bcax(v11, __ T16B, v26, v13, v12); // a11 = a11' ^ (~a12 & a13) __ bcax(v12, __ T16B, v12, v14, v13); // a12 = a12 ^ (~a13 & a14) __ bcax(v13, __ T16B, v13, v29, v14); // a13 = a13 ^ (~a14 & a10') __ bcax(v14, __ T16B, v14, v26, v29); // a14 = a14 ^ (~a10' & a11') __ bcax(v7, __ T16B, v30, v9, v4); // a7 = a7' ^ (~a8' & a9) __ bcax(v8, __ T16B, v4, v5, v9); // a8 = a8' ^ (~a9 & a5) __ bcax(v9, __ T16B, v9, v6, v5); // a9 = a9 ^ (~a5 & a6) __ bcax(v5, __ T16B, v5, v30, v6); // a5 = a5 ^ (~a6 & a7) __ bcax(v6, __ T16B, v6, v4, v30); // a6 = a6 ^ (~a7 & a8') __ bcax(v3, __ T16B, v27, v0, v28); // a3 = a3' ^ (~a4' & a0) __ bcax(v4, __ T16B, v28, v1, v0); // a4 = a4' ^ (~a0 & a1) __ bcax(v0, __ T16B, v0, v2, v1); // a0 = a0 ^ (~a1 & a2) __ bcax(v1, __ T16B, v1, v27, v2); // a1 = a1 ^ (~a2 & a3) __ bcax(v2, __ T16B, v2, v28, v27); // a2 = a2 ^ (~a3 & a4') __ eor(v0, __ T16B, v0, v31); // a0 = a0 ^ rc

Although this piece of code is not new, and I don't really think that this level of commenting is necessary, especially in code that is very unlikely to change, I added the comments.

shqking · 2025-02-19T02:55:18Z

Hi. Here is the test result of our CI.

copyright year

the following files should update the copyright year to 2025.

src/hotspot/cpu/aarch64/assembler_aarch64.hpp
src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp
src/hotspot/share/runtime/globals.hpp
src/java.base/share/classes/sun/security/provider/ML_DSA.java
src/java.base/share/classes/sun/security/provider/SHA3Parallel.java
test/micro/org/openjdk/bench/java/security/MLDSA.java

cross-build failure

Cross build for riscv64/s390/ppc64 failed.

Here shows the error msg for ppc64

=== Output from failing command(s) repeated here ===
* For target support_interim-jmods_support__create_java.base.jmod_exec:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/tmp/jdk-src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=72752, tid=72769
#  assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000e85cb03dc620 <= 0x0000e85cb03e8ab4 <= 0x0000e85cb03e8ab0
#
# JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-git-1e01c6deec3)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-git-1e01c6deec3, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x3b391c]  Instruction_aarch64::~Instruction_aarch64()+0xbc
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/ci-scripts/jdk-src/make/
#
# An error report file with more information is saved as:
# /tmp/jdk-src/make/hs_err_pid72752.log
   ... (rest of output omitted)

* All command lines available in /sysroot/ppc64el/tmp/build-ppc64el/make-support/failure-logs.
=== End of repeated output ===

I suppose we should make the similar update at file src/hotspot/cpu/aarch64/stubDeclarations_aarch64.hpp to other platforms

adinn · 2025-02-19T10:42:02Z

@ferakocz Apologies for the delays in reviewing and the limited feedback up to now. The code clearly does the job well but I think it would be made clearer and easier to maintain by tweaking/extending some of the generator methods and adding more detailed commenting. I am afraid I may take a few days to provide the relevant details because of other commitments.

adinn

Please add comments as indicated to relate generated code to original Java source. Otherwise good to go.

adinn · 2025-02-24T16:31:14Z

I raised JDK-8350589 to cover investigation of an alternative implementation.

adinn

Ok, still good

adinn · 2025-02-27T09:48:26Z

@ferakocz Apologies for raising yet another resolve conflict. You will need to make a further adjustment to the compiler blob declaration to accommodate a fix I just pushed to resolve a problem with cross-compilation. Your patch should now specify

*** oops, mistake, see below ***

adinn · 2025-02-27T09:53:21Z

Oops. sorry - cut and paste error -- the new setting should be

do_arch_blob(compiler, 55000 ZGC_ONLY(+5000))

ferakocz · 2025-03-04T11:11:44Z

Oops. sorry - cut and paste error -- the new setting should be
do_arch_blob(compiler, 55000 ZGC_ONLY(+5000))

@adinn, I have done this change, but that erased your approval. Could you reapprove?

adinn

Still good.

adinn · 2025-03-04T14:01:27Z

@ferakocz Feel free to integrate and I will sponsor

ferakocz · 2025-03-04T14:07:33Z

/integrate

openjdk · 2025-03-04T14:08:01Z

@ferakocz
Your change (at version d82dfb2) is now ready to be sponsored by a Committer.

ferakocz · 2025-03-04T14:10:17Z

@ferakocz Feel free to integrate and I will sponsor

@adinn thanks a lot for the review and the sponsoring, too!

seanjmullan · 2025-03-04T14:33:01Z

I think it would be nice to add a release note for this describing the approximate performance improvement.

adinn · 2025-03-04T14:41:11Z

/sponsor

openjdk · 2025-03-04T14:41:38Z

Going to push as commit 3230894.
Since your change was applied there have been 49 commits pushed to the master branch:

8073914: 8350974: The os_cpu VM_STRUCTS, VM_TYPES, etc have no declarations and should be removed
7ee89a5: 8350893: Use generated names for hand generated opto runtime blobs
fae37aa: 8345627: [REDO] Use gcc12 -ftrivial-auto-var-init=pattern in debug builds
1f10ffb: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures
4fc72b8: 8351082: Remove dead code for estimating CDS archive size
b6e2d66: 8351087: Combine scratch object tables in heapShared.cpp
d9b98f7: 8350771: Fix -Wzero-as-null-pointer-constant warning in nsk/monitoring ThreadController utility
7c173fd: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect
96613cc: 8349516: StAXStream2SAX.handleCharacters() fails on empty CDATA
3a8a432: 8349094: GenShen: Race between control and regulator threads may violate assertions
... and 39 more: https://git.openjdk.org/jdk/compare/ab4b0ef9242a4cd964fbcf2d1f3d370234c09408...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-03-04T14:41:48Z

@adinn @ferakocz Pushed as commit 3230894.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

8348561: Add aarch64 intrinsics for ML-DSA

e40e260

openjdk bot added graal [email protected] security [email protected] hotspot [email protected] labels Jan 24, 2025

fixing whitespace errors

eb7cc43

openjdk bot added the rfr Pull request is ready for review label Jan 27, 2025

Use SHA3Parallel for matrix generation

3615163

openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jan 30, 2025

merging master

5630fd1

openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Feb 3, 2025

removed debugging code

9f7c4a2

ferakocz added 2 commits February 6, 2025 19:39

Adding comments + some code reorganization

9a3a944

renaming a couple of functions

28b1cf7

Bhavana-Kilambi reviewed Feb 11, 2025

View reviewed changes

adinn reviewed Feb 18, 2025

View reviewed changes

Fixed copyright years

bc1e089

adinn approved these changes Feb 24, 2025

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Feb 24, 2025

adinn approved these changes Feb 24, 2025

View reviewed changes

ferakocz added 2 commits February 26, 2025 15:08

Changed aarch64-asmtest.py as suggested by Bhavana-Kilambi

62d2a3d

Added more comments, mainly as suggested by Andrew Dinn

aa0570d

openjdk bot removed the ready Pull request is ready to be integrated label Feb 26, 2025

adinn approved these changes Feb 26, 2025

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Feb 26, 2025

Merged master.

d82dfb2

openjdk bot removed the ready Pull request is ready to be integrated label Feb 28, 2025

adinn approved these changes Mar 4, 2025

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Mar 4, 2025

openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 4, 2025

openjdk bot added the integrated Pull request has been integrated label Mar 4, 2025

openjdk bot closed this Mar 4, 2025

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 4, 2025

8348561: Add aarch64 intrinsics for ML-DSA #23300

8348561: Add aarch64 intrinsics for ML-DSA #23300

Uh oh!

Conversation

ferakocz commented Jan 24, 2025 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Reviewing

Uh oh!

bridgekeeper bot commented Jan 24, 2025

Uh oh!

openjdk bot commented Jan 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openjdk bot commented Jan 24, 2025

Uh oh!

mlbridge bot commented Jan 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

openjdk bot commented Jan 30, 2025

Uh oh!

adinn commented Jan 30, 2025

Uh oh!

ferakocz commented Feb 3, 2025

Uh oh!

adinn commented Feb 4, 2025

Uh oh!

ferakocz commented Feb 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adinn commented Feb 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mcpowers commented Feb 10, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

theRealAph Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ferakocz Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ferakocz Feb 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shqking commented Feb 19, 2025

copyright year

cross-build failure

Uh oh!

adinn commented Feb 19, 2025

Uh oh!

adinn left a comment

Choose a reason for hiding this comment

ferakocz commented Jan 24, 2025 •

edited by openjdk bot

Loading

openjdk bot commented Jan 24, 2025 •

edited

Loading

mlbridge bot commented Jan 27, 2025 •

edited

Loading

ferakocz commented Feb 4, 2025 •

edited

Loading

adinn commented Feb 5, 2025 •

edited

Loading

theRealAph Feb 25, 2025 •

edited

Loading

ferakocz Feb 25, 2025 •

edited

Loading

ferakocz Feb 27, 2025 •

edited

Loading

adinn commented Feb 27, 2025 •

edited

Loading