@Mateusvff

Summary

This pull request adds experimental support for a hybrid post-quantum handshake to wireguard-go. It combines the existing X25519-based key agreement with ML-KEM (Kyber-1024) for key encapsulation and ML-DSA (Dilithium5) for authentication. The goal is a prototype of a post-quantum-ready WireGuard handshake that preserves the original Noise_IK pattern and leaves behaviour unchanged for non-PQ peers.

The implementation is based on the circl cryptographic library from Cloudflare, which provides Go implementations of Kyber and Dilithium.


Design overview

The implementation is split into two main parts:

  1. Hybrid key agreement (ML-KEM / Kyber-1024)

    • Each device now maintains an ML-KEM static key pair (Kyber-1024).
    • When building the initiation message, the initiator performs both an X25519 Diffie–Hellman exchange and a Kyber encapsulation against the responder’s ML-KEM public key.
    • The ML-KEM ciphertext is carried in an extended field of MessageInitiation and is encrypted within the Noise handshake.
    • The classical X25519 shared secret and the ML-KEM shared secret are combined using a KDF (KDF2) to derive a single combinedSecret, which feeds into the existing Noise key schedule.
  2. Hybrid authentication (ML-DSA / Dilithium)

    • Each device also maintains an ML-DSA static key pair (Dilithium5).
    • MessageInitiation is extended with a Signature field that carries a Dilithium signature over the serialized message fields (excluding MAC and signature fields).
    • On the responder side, after resolving the peer via the classical identity, the responder loads the peer’s ML-DSA public key and verifies the signature before proceeding with ML-KEM decapsulation and key derivation.
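The secret-combination step in part 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `combineSecrets` is a hypothetical helper, the way the two secrets are fed into the KDF (concatenated, keyed by the chaining key) is an assumption, and HMAC-SHA-256 stands in for the HMAC-BLAKE2s that WireGuard's KDF2 uses, so that the sketch needs only the standard library.

```go
package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// hmacSHA256 computes HMAC-SHA-256 over the concatenation of parts.
func hmacSHA256(key []byte, parts ...[]byte) []byte {
	mac := hmac.New(sha256.New, key)
	for _, p := range parts {
		mac.Write(p)
	}
	return mac.Sum(nil)
}

// kdf2 has the extract-then-expand shape of WireGuard's KDF2 and returns
// two 32-byte outputs.
// ASSUMPTION: HMAC-SHA-256 replaces HMAC-BLAKE2s to stay stdlib-only.
func kdf2(key, input []byte) (t1, t2 []byte) {
	prk := hmacSHA256(key, input)
	t1 = hmacSHA256(prk, []byte{0x1})
	t2 = hmacSHA256(prk, t1, []byte{0x2})
	return t1, t2
}

// combineSecrets is a hypothetical helper: it concatenates the classical
// and post-quantum shared secrets (both fixed-length, so the split point
// is unambiguous) and runs them through kdf2 under the chaining key.
func combineSecrets(chainingKey, x25519Secret, mlkemSecret []byte) (nextChainingKey, combinedSecret []byte) {
	ikm := make([]byte, 0, len(x25519Secret)+len(mlkemSecret))
	ikm = append(ikm, x25519Secret...)
	ikm = append(ikm, mlkemSecret...)
	return kdf2(chainingKey, ikm)
}

func main() {
	chainingKey := make([]byte, 32)
	x25519Secret := bytes.Repeat([]byte{0xA1}, 32) // placeholder bytes, not a real DH output
	mlkemSecret := bytes.Repeat([]byte{0xB2}, 32)  // placeholder bytes, not a real KEM output
	_, combined := combineSecrets(chainingKey, x25519Secret, mlkemSecret)
	fmt.Printf("combinedSecret: %x\n", combined)
}
```

Because both inputs pass through the same extract step, the output should stay unpredictable as long as either the X25519 secret or the ML-KEM secret remains unknown to an attacker, which is the property a hybrid design is after.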

Implementation details

  • New sizes and types

    • Added ML-KEM constants: MLKEMPublicKeySize, MLKEMPrivateKeySize, MLKEMCiphertextSize.
    • Added ML-DSA constants: MLDSAPublicKeySize, MLDSAPrivateKeySize, MLDSASignatureSize.
    • These are defined in device/noise-types.go.
  • Extended identity and handshake structures

    • staticIdentity now includes:
      • mlkemPrivateKey, mlkemPublicKey
      • mldsaPrivateKey, mldsaPublicKey
    • Handshake now includes:
      • remoteMLKEMStatic (peer’s ML-KEM public key)
      • remoteMLDSAStatic (peer’s ML-DSA public key)
  • Handshake protocol changes

    • In device/noise-protocol.go:
      • MessageInitiation was extended with:
        • MLKEM field to carry the Kyber ciphertext.
        • Signature field to carry the Dilithium signature.
      • CreateMessageInitiation:
        • Performs X25519 DH as before.
        • Calls kyber1024.Scheme().Encapsulate(remoteMLKEMStatic) to obtain mlkemSecret and ciphertext.
        • Encrypts and stores ciphertext into msg.MLKEM.
        • Combines the X25519 secret and mlkemSecret via KDF2 into combinedSecret.
        • Serializes the message fields and signs them using the local ML-DSA private key, storing the result in msg.Signature.
      • ConsumeMessageInitiation:
        • Resolves the peer via the classical identity (X25519).
        • Retrieves remoteMLDSAStatic and verifies msg.Signature with the Dilithium verification routine.
        • Decrypts msg.MLKEM and calls Decapsulate with the local ML-KEM private key to recover mlkemSecret.
        • Recomputes combinedSecret using the same KDF and continues with the existing Noise key schedule.
  • UAPI and key management

    • device/uapi.go was updated to accept:
      • mlkem_private_key, mlkem_public_key
      • mldsa_private_key, mldsa_public_key
    • A helper module device/quantum-keys.go was added providing:
      • GenerateQuantumKeyPair for Kyber-1024 key pairs.
      • GenerateMLDSAKeyPair for Dilithium5 key pairs.
    • The implementation uses circl’s kem.Scheme API for Kyber (via GenerateKeyPair) and sign.Scheme for Dilithium (via GenerateKey).
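The sign-then-verify flow described for CreateMessageInitiation and ConsumeMessageInitiation can be sketched as below. This is an illustration under stated assumptions: the `initiation` struct is a reduced, hypothetical stand-in for the extended MessageInitiation, the field order in `signedBytes` is assumed, and Ed25519 from the standard library stands in for Dilithium5 so the sketch runs without circl. Only the shape of the flow (serialize everything except the MAC and signature fields, sign, verify before decapsulating) is taken from the description above.

```go
package main

import (
	"crypto/ed25519"
	"encoding/binary"
	"fmt"
)

// initiation is a hypothetical, reduced stand-in for the extended
// MessageInitiation; the real struct carries more fields.
type initiation struct {
	Type      uint32
	Sender    uint32
	Ephemeral [32]byte
	MLKEM     []byte // encrypted KEM ciphertext
	Signature []byte // not covered by the signature
	MAC1      [16]byte
	MAC2      [16]byte // MACs are not covered either
}

// signedBytes serializes every field except Signature, MAC1 and MAC2.
func (m *initiation) signedBytes() []byte {
	buf := make([]byte, 0, 8+len(m.Ephemeral)+len(m.MLKEM))
	buf = binary.LittleEndian.AppendUint32(buf, m.Type)
	buf = binary.LittleEndian.AppendUint32(buf, m.Sender)
	buf = append(buf, m.Ephemeral[:]...)
	buf = append(buf, m.MLKEM...)
	return buf
}

// demoKeys derives a fixed demo key pair.
// ASSUMPTION: Ed25519 replaces Dilithium5 for illustration only.
func demoKeys() (ed25519.PublicKey, ed25519.PrivateKey) {
	priv := ed25519.NewKeyFromSeed(make([]byte, ed25519.SeedSize))
	return priv.Public().(ed25519.PublicKey), priv
}

// signInitiation is the initiator side: sign and store the signature.
func signInitiation(m *initiation, priv ed25519.PrivateKey) {
	m.Signature = ed25519.Sign(priv, m.signedBytes())
}

// verifyInitiation is the responder side: it must succeed before any
// decapsulation or key derivation takes place.
func verifyInitiation(m *initiation, pub ed25519.PublicKey) bool {
	return ed25519.Verify(pub, m.signedBytes(), m.Signature)
}

func main() {
	pub, priv := demoKeys()
	msg := &initiation{Type: 1, Sender: 7, MLKEM: []byte{0xde, 0xad, 0xbe, 0xef}}
	signInitiation(msg, priv)
	fmt.Println("untouched message verifies:", verifyInitiation(msg, pub))

	msg.MLKEM[0] ^= 0xff // corrupt the ciphertext field
	fmt.Println("tampered message verifies:", verifyInitiation(msg, pub))
}
```

Leaving the MAC fields out of the signed bytes lets MAC1 and MAC2 be computed after signing, in the same position they occupy in the classical message.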

Compatibility

  • Existing behaviour and classical X25519-only handshakes remain unchanged when PQ keys are not configured.
  • Hybrid mode is used only when both peers are configured with ML-KEM and ML-DSA keys via UAPI.
  • The Noise_IK pattern and MAC validation logic remain intact; PQ data is carried in additional fields and integrated into the key schedule and authentication steps.
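As a rough sketch of the opt-in rule above, the snippet below parses the four UAPI keys and enables hybrid mode only when all of them are present. It is not the code in device/uapi.go: `pqConfig`, `parsePQKeys` and `hybrid` are hypothetical names, hex encoding of the values is assumed by analogy with the existing UAPI keys, and the split between "local private keys" and "peer public keys" is simplified into a single struct.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// pqConfig holds the optional post-quantum key material (hypothetical type).
type pqConfig struct {
	mlkemPrivate, mlkemPublic []byte
	mldsaPrivate, mldsaPublic []byte
}

// parsePQKeys reads key=value lines and decodes the PQ keys it recognises.
// Unknown keys are ignored so classical configurations parse unchanged.
// ASSUMPTION: values are hex-encoded, like the existing UAPI keys.
func parsePQKeys(conf string) (pqConfig, error) {
	var c pqConfig
	for _, line := range strings.Split(conf, "\n") {
		key, value, ok := strings.Cut(strings.TrimSpace(line), "=")
		if !ok {
			continue
		}
		var dst *[]byte
		switch key {
		case "mlkem_private_key":
			dst = &c.mlkemPrivate
		case "mlkem_public_key":
			dst = &c.mlkemPublic
		case "mldsa_private_key":
			dst = &c.mldsaPrivate
		case "mldsa_public_key":
			dst = &c.mldsaPublic
		default:
			continue
		}
		raw, err := hex.DecodeString(value)
		if err != nil {
			return pqConfig{}, fmt.Errorf("%s: %w", key, err)
		}
		*dst = raw
	}
	return c, nil
}

// hybrid reports whether every PQ key is configured; otherwise the
// classical X25519-only handshake is used.
func (c pqConfig) hybrid() bool {
	return len(c.mlkemPrivate) > 0 && len(c.mlkemPublic) > 0 &&
		len(c.mldsaPrivate) > 0 && len(c.mldsaPublic) > 0
}

func main() {
	classical, _ := parsePQKeys("private_key=00\nlisten_port=51820\n")
	fmt.Println("classical config uses hybrid mode:", classical.hybrid())
}
```

A real implementation would also check each decoded key against the size constants from device/noise-types.go; that check is omitted here because the constant values are not shown in this description.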

Testing

  • Unit tests

    • Added tests for:
      • ML-KEM key generation, encapsulation and decapsulation.
      • ML-DSA key generation, signing and verification.
      • KDF combination of classical and PQ secrets.
  • Integration tests

    • End-to-end handshake tests between two PQ-enabled peers (initiator and responder).
    • Regression tests with classical-only peers to ensure no change in existing behaviour.
  • Manual validation

    • Verified that both peers derive identical session keys in the hybrid mode.
    • Verified that invalid signatures or corrupted PQ ciphertexts abort the handshake.
    • Performed basic performance checks to confirm that handshake latency remains within acceptable bounds.

dinhngtu and others added 30 commits November 30, 2025 19:06
Use parallel summation with native byte order per RFC 1071.
add-with-carry operation is used to add 4 words per operation.  Byteswap
is performed before and after checksumming for compatibility with old
`checksumNoFold()`.  With this we get a 30-80% speedup in `checksum()`
depending on packet sizes.

Add unit tests with comparison to a per-word implementation.

**Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz**

| Size | OldTime | NewTime | Speedup  |
|------|---------|---------|----------|
| 64   | 12.64   | 9.183   | 1.376456 |
| 128  | 18.52   | 12.72   | 1.455975 |
| 256  | 31.01   | 18.13   | 1.710425 |
| 512  | 54.46   | 29.03   | 1.87599  |
| 1024 | 102     | 52.2    | 1.954023 |
| 1500 | 146.8   | 81.36   | 1.804326 |
| 2048 | 196.9   | 102.5   | 1.920976 |
| 4096 | 389.8   | 200.8   | 1.941235 |
| 8192 | 767.3   | 413.3   | 1.856521 |
| 9000 | 851.7   | 448.8   | 1.897727 |
| 9001 | 854.8   | 451.9   | 1.891569 |

**AMD EPYC 7352 24-Core Processor**

| Size | OldTime | NewTime | Speedup  |
|------|---------|---------|----------|
| 64   | 9.159   | 6.949   | 1.318031 |
| 128  | 13.59   | 10.59   | 1.283286 |
| 256  | 22.37   | 14.91   | 1.500335 |
| 512  | 41.42   | 24.22   | 1.710157 |
| 1024 | 81.59   | 45.05   | 1.811099 |
| 1500 | 120.4   | 68.35   | 1.761522 |
| 2048 | 162.8   | 90.14   | 1.806079 |
| 4096 | 321.4   | 180.3   | 1.782585 |
| 8192 | 650.4   | 360.8   | 1.802661 |
| 9000 | 706.3   | 398.1   | 1.774177 |
| 9001 | 712.4   | 398.2   | 1.789051 |
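
The idea behind the wide summation can be sketched as follows. This is not the code from the commit: the function names are hypothetical, and the sketch reads big-endian words for simplicity instead of summing in native byte order and byte-swapping afterwards. It only shows why adding four 16-bit words per add-with-carry gives the same ones' complement sum as a per-word loop.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// fold reduces a wide sum to 16 bits with end-around carry.
func fold(sum uint64) uint16 {
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return uint16(sum)
}

// sumPerWord is the reference: one 16-bit word per addition.
// It returns the folded sum, not its complement.
func sumPerWord(b []byte) uint16 {
	var sum uint64
	for len(b) >= 2 {
		sum += uint64(binary.BigEndian.Uint16(b))
		b = b[2:]
	}
	if len(b) == 1 {
		sum += uint64(b[0]) << 8 // odd trailing byte is padded with zero
	}
	return fold(sum)
}

// sumWide adds 8 bytes (four 16-bit words) per add-with-carry.
// A carry out of bit 63 is worth 2^64, which is congruent to 1 modulo
// 0xffff, so it can be fed back in as the carry-in of the next addition.
func sumWide(b []byte) uint16 {
	var acc, carry uint64
	for len(b) >= 8 {
		acc, carry = bits.Add64(acc, binary.BigEndian.Uint64(b), carry)
		b = b[8:]
	}
	if len(b) > 0 {
		var tail [8]byte // zero padding on the right leaves the sum unchanged
		copy(tail[:], b)
		acc, carry = bits.Add64(acc, binary.BigEndian.Uint64(tail[:]), carry)
	}
	// Add the four 16-bit lanes of the accumulator plus the last carry.
	return fold((acc >> 48) + ((acc >> 32) & 0xffff) + ((acc >> 16) & 0xffff) + (acc & 0xffff) + carry)
}

func main() {
	// Worked example from RFC 1071, whose sum is 0xddf2.
	data := []byte{0x00, 0x01, 0xf2, 0x03, 0xf4, 0xf5, 0xf6, 0xf7}
	fmt.Printf("per-word: %#x, wide: %#x\n", sumPerWord(data), sumWide(data))
}
```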

Signed-off-by: Tu Dinh Ngoc <[email protected]>
[Jason: simplified and cleaned up unit tests]
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
Signed-off-by: ruokeqx <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
It should be POLLIN because closeFd is a read-only file.

Signed-off-by: Kurnia D Win <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
Reduce allocations by eliminating byte reader, hand-rolled decoding and
reusing message structs.

Synthetic benchmark:

    var msgSink MessageInitiation
    func BenchmarkMessageInitiationUnmarshal(b *testing.B) {
        packet := make([]byte, MessageInitiationSize)
        reader := bytes.NewReader(packet)
        err := binary.Read(reader, binary.LittleEndian, &msgSink)
        if err != nil {
            b.Fatal(err)
        }
        b.Run("binary.Read", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                reader := bytes.NewReader(packet)
                _ = binary.Read(reader, binary.LittleEndian, &msgSink)
            }
        })
        b.Run("unmarshal", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                _ = msgSink.unmarshal(packet)
            }
        })
    }

Results:
                                         │      -      │
                                         │   sec/op    │
MessageInitiationUnmarshal/binary.Read-8   1.508µ ± 2%
MessageInitiationUnmarshal/unmarshal-8     12.66n ± 2%

                                         │      -       │
                                         │     B/op     │
MessageInitiationUnmarshal/binary.Read-8   208.0 ± 0%
MessageInitiationUnmarshal/unmarshal-8     0.000 ± 0%

                                         │      -       │
                                         │  allocs/op   │
MessageInitiationUnmarshal/binary.Read-8   2.000 ± 0%
MessageInitiationUnmarshal/unmarshal-8     0.000 ± 0%
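
For readers unfamiliar with the technique, hand-rolled decoding of this kind might look like the sketch below. It is not the commit's code: the lower-case names are hypothetical, and the layout is the classical 148-byte initiation message (4 + 4 + 32 + 48 + 28 + 16 + 16 bytes), not the extended post-quantum one. Reading fixed offsets directly avoids the reflection and temporary buffers that binary.Read needs.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// messageInitiationSize is the size of the classical initiation message.
const messageInitiationSize = 148

// messageInitiation mirrors the classical wire layout (hypothetical name).
type messageInitiation struct {
	Type      uint32
	Sender    uint32
	Ephemeral [32]byte
	Static    [48]byte // 32-byte key plus 16-byte AEAD tag
	Timestamp [28]byte // 12-byte timestamp plus 16-byte AEAD tag
	MAC1      [16]byte
	MAC2      [16]byte
}

var errLengthMismatch = errors.New("message length mismatch")

// unmarshal decodes b in place without allocating: integers are read
// little-endian and byte arrays are copied from fixed offsets.
func (m *messageInitiation) unmarshal(b []byte) error {
	if len(b) != messageInitiationSize {
		return errLengthMismatch
	}
	m.Type = binary.LittleEndian.Uint32(b)
	m.Sender = binary.LittleEndian.Uint32(b[4:])
	copy(m.Ephemeral[:], b[8:])
	copy(m.Static[:], b[8+32:])
	copy(m.Timestamp[:], b[8+32+48:])
	copy(m.MAC1[:], b[8+32+48+28:])
	copy(m.MAC2[:], b[8+32+48+28+16:])
	return nil
}

func main() {
	packet := make([]byte, messageInitiationSize)
	packet[0] = 1 // message type 1, little-endian
	var msg messageInitiation
	if err := msg.unmarshal(packet); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("decoded type:", msg.Type)
}
```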

Signed-off-by: Alexander Yastrebov <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
This is already enforced in receive.go, but if these unmarshallers are
to have error return values anyway, make them as explicit as possible.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
This pairs with the recent change in wireguard-tools.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
Optimize message encoding by eliminating binary.Write (which internally
uses reflection) in favour of hand-rolled encoding.

This is a companion to 9e7529c.

Synthetic benchmark:

    var packetSink []byte
    func BenchmarkMessageInitiationMarshal(b *testing.B) {
        var msg MessageInitiation
        b.Run("binary.Write", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                var buf [MessageInitiationSize]byte
                writer := bytes.NewBuffer(buf[:0])
                _ = binary.Write(writer, binary.LittleEndian, msg)
                packetSink = writer.Bytes()
            }
        })
        b.Run("binary.Encode", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                packet := make([]byte, MessageInitiationSize)
                _, _ = binary.Encode(packet, binary.LittleEndian, msg)
                packetSink = packet
            }
        })
        b.Run("marshal", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                packet := make([]byte, MessageInitiationSize)
                _ = msg.marshal(packet)
                packetSink = packet
            }
        })
    }

Results:
                                             │      -      │
                                             │   sec/op    │
    MessageInitiationMarshal/binary.Write-8    1.337µ ± 0%
    MessageInitiationMarshal/binary.Encode-8   1.242µ ± 0%
    MessageInitiationMarshal/marshal-8         53.05n ± 1%

                                             │     -      │
                                             │    B/op    │
    MessageInitiationMarshal/binary.Write-8    368.0 ± 0%
    MessageInitiationMarshal/binary.Encode-8   160.0 ± 0%
    MessageInitiationMarshal/marshal-8         160.0 ± 0%

                                             │     -      │
                                             │ allocs/op  │
    MessageInitiationMarshal/binary.Write-8    3.000 ± 0%
    MessageInitiationMarshal/binary.Encode-8   1.000 ± 0%
    MessageInitiationMarshal/marshal-8         1.000 ± 0%

Signed-off-by: Alexander Yastrebov <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
Kernels below 5.12 are missing this:

    commit 98184612aca0a9ee42b8eb0262a49900ee9eef0d
    Author: Norman Maurer <[email protected]>
    Date:   Thu Apr 1 08:59:17 2021

        net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);

        Support for UDP_GRO was added in the past but the implementation for
        getsockopt was missed which did lead to an error when we tried to
        retrieve the setting for UDP_GRO. This patch adds the missing switch
        case for UDP_GRO

        Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
        Signed-off-by: Norman Maurer <[email protected]>
        Reviewed-by: David Ahern <[email protected]>
        Signed-off-by: David S. Miller <[email protected]>

That means we can't set the option and then read it back later. Given
how buggy UDP_GRO is in general on odd kernels, just disable it on older
kernels altogether.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mateus Franco <[email protected]>
…ivate keys in Device struct

Signed-off-by: Mateus Franco <[email protected]>
… marshaling and unmarshaling

Signed-off-by: Mateus Franco <[email protected]>

8 participants