Cryptographic Accelerators for Digital Signature Based on Ed25519
Abstract— This article presents highly optimized implementations of the Ed25519 digital signature algorithm [Edwards curve digital signature algorithm (EdDSA)]. This algorithm significantly improves the execution time without sacrificing security, compared to existing digital signature algorithms. Although EdDSA is employed in many widely used protocols, such as TLS and SSH, there appear to be extremely few hardware implementations that focus only on EdDSA. Hence, we propose two different field-programmable gate array (FPGA)-based EdDSA implementations, i.e., efficient and high-performance Ed25519 architectures applicable for a security level comparable to AES-128. Our proposed efficient Ed25519 scheme achieves an improvement of more than 84% compared to the best previous work by reducing the required area. It also delivers a speedup of more than 8×. Furthermore, our proposed high-performance architecture shows a 21× speedup with more than 6200 signing operations per second, showing a significant improvement in terms of utilized area × time on a Xilinx Zynq-7020 FPGA. Finally, effective side-channel countermeasures are embedded in our proposed designs, which also outperform the previous works.

Index Terms— Ed25519, Edwards curve digital signature algorithm (EdDSA), elliptic curve cryptography, hardware implementation, side channel.

I. INTRODUCTION

Although most current cryptosystems will be broken by quantum computing based on Shor’s algorithm [5], the transition to postquantum cryptography (PQC) includes an emerging field called hybrid systems [6], requiring both classic and PQC [7]. Hence, designing high-security ECC-based digital signatures for different applications is crucial. EdDSA is notable for high speed and constant-time implementations and was quickly implemented as a part of the TLS and OpenSSH protocols [8]. Hence, it has to be implemented on various platforms subject to the performance requirement of the target application, such as constrained IoT devices. However, EdDSA has not been sufficiently studied, especially in the field of hardware implementation based on field-programmable gate arrays (FPGAs). Therefore, investigation of the hardware implementation of this algorithm is required, considering the advantages of FPGA-based designs to exploit parallelism, which leads to improvements in the efficiency of the overall system.

There are two main solutions to enable a hardware-based digital signature algorithm in the constrained IoT: 1) an HW/SW approach to cope with embedded constraints and 2) a pure HW method that implements everything in hardware. The HW/SW method makes the design smaller, slower,
Authorized licensed use limited to: Robert Gordon University. Downloaded on May 28,2021 at 00:26:23 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Zhang and Bai [15] proposed a core with a 128-bit security level over the SM2 curve.

Recently, a number of hardware implementations have been introduced to implement an elliptic curve point multiplication (ECPM) core over Curve25519. Sasdrich and Güneysu [16] proposed the first Curve25519 implementation using a DSP-based single-core architecture. This work has been extended by adding side-channel countermeasures in [17] and [18] to provide an evaluation against common physical attacks. In [19], fast and compact implementations of ECPM were proposed. This architecture employs a semisystolic bit-serial multiplier and carry-compact addition to provide a high-performance architecture. The work of Koppermann et al. [20], [21] presented a high-speed prime field multiplier with a latency of 92 μs for a point multiplication. In addition, in [22], a low-latency ECPM was proposed employing a pipelined arithmetic architecture on FPGA and ASIC platforms. It should be noted that FPGA implementations of Curve25519 in the literature cannot be directly compared to ours because the ECPM core in EdDSA occupies more resources for implementing the hash core and modulo-L reduction. Furthermore, it requires more time for a point multiplication since this architecture is reused for nonmodular multiplication and modulo-L reduction.

A non-DSP-based Ed25519 point multiplication core was presented by Mehrabi and Doche [23] using the double-and-add algorithm. Hence, this architecture is a nonconstant-time core vulnerable to SPA attacks. Notably, the reported area does not include all the required modules for providing a digital signature, such as the hash function and modulo-L reduction. We observe that SHA-512 increases the utilized area in Ed25519 by almost 25%. Moreover, Turan and Verbauwhede [24] proposed an Ed25519 architecture combined with the X25519 key exchange. This design targets resource-constrained devices on a Zynq SoC. Turan and Verbauwhede [24, Sec. 3.3] claimed that the cost of computing using restricted-X coordinates of a point on the Montgomery curve is higher than that of extended coordinates on the twisted Edwards curve due to the complexity of coordinate conversion. Therefore, the core works over the twisted Edwards curve. Besides, although side-channel countermeasures are considered for the ECPM core, the authors do not include a resistant SHA-512 core, leaving the design vulnerable to SCA, as shown in [25].

Based on the aforementioned discussions, the tradeoffs between resource utilization and performance for an efficient Ed25519 implementation have not been thoroughly studied from different optimization perspectives. Particularly, a unified architecture with physical protection against SCA in all submodules is required to perform secure key generation, signature generation, and signature verification. Besides, employing the fast and efficient Karatsuba-based multiplier for designing a high-performance Ed25519 architecture should be investigated. Finally, the signature computation cost over the Edwards domain compared to the Montgomery domain for a highly parallel design should be investigated.

B. Contributions
To the best of our knowledge, there appear to be very few hardware implementations that focus only on Ed25519 and make the best of all its features. In this work, we present two different architectures, i.e., efficient and high-performance designs of Ed25519, considering different performance levels for time-constrained and area-constrained applications.

Our contributions to this work are listed as follows.
1) We propose a new approach for implementing the EdDSA accelerator on FPGA. We analyze the computation of the restricted-X coordinates of a point on the Montgomery curve with additional coordinate conversion and design two novel, highly parallel hardware architectures based on these algorithms. In this article, we show how to leverage the advantages of computation over the Montgomery curve while implementing Ed25519 accelerator circuits so that the true benefits of the accelerator circuits can be achieved.
2) We explore the tradeoffs of area and performance to accomplish different optimization perspectives. We demonstrate various optimization techniques in order to achieve an overall optimization in terms of efficiency, including parallelization, resource sharing, redundant number representation, adoption of distributed RAM and ROM blocks, and an interleaved architecture, which achieves above 84% efficiency improvement of the area–time product compared to the leading FPGA implementations.
3) We instantiate the proposed architecture in a Xilinx Zynq-7020 FPGA and provide performance evaluations. Effective countermeasures against SCA are embedded to enhance the resistance of the proposed architectures against timing, SPA, and differential power analysis (DPA) attacks.

The remainder of this article is organized as follows. Section II presents the background. Section III presents our proposed architectures. The experimental results and comparison are given in Section IV. We conclude this article in Section V.

II. PRELIMINARIES
A. Background
A point $P = (x, y)$ lies on a twisted Edwards curve $E$ if $E = \{(x, y) \in \mathbb{F}_p \times \mathbb{F}_p : ax^2 + y^2 = 1 + dx^2y^2\}$. Ed25519 is a type of Schnorr signature employing (twisted) Edwards curves developed by Bernstein et al. [1]. Ed25519 includes three different phases, i.e., key generation, signing, and verifying. In the key generation, KeyGen(s) takes a parameter s and computes a signing key sk and a public key pk with associated message space M. In signing, a signature (R, S) is generated by Sign(sk, m), taking an sk and a message m ∈ M. The signature (R, S) can be verified by Verify(pk, m, R, S) considering the public key pk and message m ∈ M. The Appendix gives these algorithms. For details, we refer interested readers to [26]. Moreover, Ed25519 is equivalent to a Montgomery curve called Curve25519, introduced by Bernstein [27] in 2006.

For group arithmetic based on Ed25519, the computation can be performed on extended homogeneous coordinates [1], [26]. A mapping between affine coordinates $(x, y)$ and extended coordinates $(X, Y, Z, T)$ for a point $P$ is defined by $x = X/Z$, $y = Y/Z$, and $x \times y = T/Z$.
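The curve equation and the affine/extended mapping above can be illustrated with a short software sketch. This is not the proposed hardware design: it is a minimal Python model, assuming the Ed25519 constants p = 2^255 − 19, a = −1, and d = −121665/121666 from RFC 8032 [26]. The helper names (`inv`, `on_curve`, `recover_x`, `to_extended`, `to_affine`) are hypothetical and chosen only for this example.

```python
# Ed25519 constants as given in RFC 8032 (assumed here, not stated in this section).
P = 2**255 - 19                             # field prime p
A = P - 1                                   # a = -1 mod p
D = (-121665 * pow(121666, P - 2, P)) % P   # d = -121665/121666 mod p


def inv(v):
    """Modular inverse in F_p via Fermat's little theorem."""
    return pow(v, P - 2, P)


def on_curve(x, y):
    """Check the twisted Edwards equation a*x^2 + y^2 = 1 + d*x^2*y^2 over F_p."""
    return (A * x * x + y * y - 1 - D * x * x * y * y) % P == 0


def recover_x(y, sign):
    """Recover x from y and the least significant bit of x (p = 5 mod 8)."""
    x2 = (y * y - 1) * inv(D * y * y + 1) % P   # x^2 = (y^2 - 1) / (d*y^2 - a)
    x = pow(x2, (P + 3) // 8, P)                # candidate square root
    if (x * x - x2) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P     # multiply by sqrt(-1)
    if (x * x - x2) % P != 0:
        raise ValueError("y is not the coordinate of a curve point")
    if x == 0 and sign == 1:
        raise ValueError("x = 0 cannot have its low bit set")
    return x if (x & 1) == sign else P - x


def to_extended(x, y, z=1):
    """Map affine (x, y) to extended (X, Y, Z, T) with an arbitrary nonzero Z."""
    return (x * z % P, y * z % P, z % P, x * y * z % P)


def to_affine(X, Y, Z, T):
    """Map extended coordinates back to affine: x = X/Z, y = Y/Z."""
    z_inv = inv(Z)
    return (X * z_inv % P, Y * z_inv % P)


# Example point: the Ed25519 base point has y = 4/5 and an even x (RFC 8032).
base_y = 4 * inv(5) % P
base_x = recover_x(base_y, 0)
```

Any nonzero Z represents the same affine point, which is what lets point addition and doubling defer the costly field inversion to a single final conversion.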
Fig. 4. Proposed Montgomery ladder scheduling in the efficient architecture.

C. Ed25519 Design Considerations
1) Hash Unit: According to RFC 8032 [26], SHA-512 is the hash function recommended for use in Ed25519. It processes arbitrary inputs in 1024-bit chunks and produces a 512-bit output. In general, hash computation does not add considerable latency compared to ECPM. Therefore, a lightweight hardware architecture that utilizes minimum resources is implemented for the efficient architecture.

Fig. 5 illustrates message-digest creation for an N-block message. As one can see, the main part of SHA-512 is the compressor core, which works iteratively, i.e., the compression is repeated 80 times for each 1024-bit chunk of input.

In order to minimize CPD, the entire data path is designed to be 64 bits wide. In addition, we use the optimal number of registers, employing a dedicated finite state machine and a resource-sharing approach to decrease the utilized resources and complexity.

$$x \bmod L \equiv x_1 2^{256} + x_0 \equiv x_1 2^{4} \times (L - l_0) + x_0 \equiv -x_1 \cdot l_0 2^{4} + x_0. \tag{3}$$

In (3), a 256 × 125-bit nonmodular multiplication should be performed, which utilizes the already provided modular multiplier. Then, the product is shifted by 4 bits and subtracted from $x_0$.

For the next round, let $x = x_1 2^{252} + x_0$; hence, $x_1$ and $x_0$ have 134 and 252 bits, respectively. The reduction can be performed as follows:

$$x \bmod L \equiv x_1 2^{252} + x_0 \equiv x_1 \times (L - l_0) + x_0 \equiv -x_1 \cdot l_0 + x_0. \tag{4}$$

Performing (4) results in a 260-bit-long value. Therefore, the third round must be performed similarly to the second round, leading to a 253-bit-long value.

3) Double-Point Multiplication: Two scalar multiplications are required for a verification procedure. The verifying algorithm can be revised to improve efficiency, including two main
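The three reduction rounds in (3) and (4) can be checked with a small software model. The following is a sketch only, assuming L = 2^252 + l0 with the value of l0 taken from RFC 8032 [26] and a 512-bit input such as a SHA-512 digest. The function name `reduce_mod_l` is hypothetical, the intermediate values are kept as signed Python integers, and the final conditional addition of L is an assumption of this sketch rather than a step described in the text.

```python
# Group order of Ed25519 from RFC 8032: L = 2^252 + l0, where l0 is a 125-bit constant.
L0 = 27742317777372353535851937790883648493
L = 2**252 + L0
MASK256 = 2**256 - 1
MASK252 = 2**252 - 1


def reduce_mod_l(x):
    """Reduce a 512-bit nonnegative integer modulo L in three rounds."""
    if not 0 <= x < 2**512:
        raise ValueError("expected a 512-bit nonnegative integer")

    # Round 1, eq. (3): split at 2^256 and use 2^256 = 2^4 * (L - l0).
    x1, x0 = x >> 256, x & MASK256
    r = x0 - ((x1 * L0) << 4)        # product shifted by 4 bits, subtracted from x0

    # Rounds 2 and 3, eq. (4): split at 2^252 and use 2^252 = L - l0.
    # For negative r, Python's >> floors and & yields the nonnegative low bits,
    # so r == x1 * 2^252 + x0 still holds.
    for _ in range(2):
        x1, x0 = r >> 252, r & MASK252
        r = x0 - x1 * L0

    # Assumed final correction: after round 3 the value lies in (-L, L).
    if r < 0:
        r += L
    return r
```

Only the first round needs the full 256 × 125-bit product. The later rounds multiply much shorter values by l0, which is consistent with reusing the existing multiplier rather than adding a dedicated reduction unit.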
TABLE II
IMPLEMENTATION RESULTS IN TERMS OF UTILIZATION REQUIREMENTS

TABLE III
FPGA IMPLEMENTATION RESULTS IN TERMS OF CLOCK CYCLES

TABLE V
PERFORMANCE RESULTS FOR UNPROTECTED AND PROTECTED SCHEME AGAINST DPA (RESULTS ARE REPORTED FOR A 1024-bit MESSAGE)
TABLE VI
COMPARISON OF DIFFERENT DESIGNS FOR THE DIGITAL SIGNATURE ALGORITHM
implementation and heterogeneous computing. Hence, our design achieves almost 40 times speedup compared to [36].

V. CONCLUSION
In this article, we have proposed hardware design strategies for the recently proposed Edwards curve digital signature scheme Ed25519 on a Xilinx Zynq-7020 FPGA, including advanced protection against side-channel attacks. The proposed architectures achieve above 84% efficiency improvement of the area–time product using a pipelined architecture and interleaved multiplication. Our high-performance and efficient architectures compute more than 6200 and 2200 signings and 5100 and 1500 verifications per second, respectively. We also show that the design can outperform recently presented works using only moderate resource requirements.

APPENDIX
Ed25519 has some critical parameters, shown in Table VII. The EdDSA key generation, signing, and verification algorithms are described in Algorithms 1–3, respectively. According to [26], an encoded integer enc(S) is represented in little-endian convention. In addition, when an element P = (x, y) is encoded, its y-coordinate is encoded first, and then its most significant bit is replaced by the least significant bit of x. The dom(x, y) string function returns an empty string for Ed25519.

ACKNOWLEDGMENT
The authors would like to thank the reviewers for their comments.

REFERENCES
[1] D. J. Bernstein, N. Duif, T. Lange, P. Schwabe, and B. Yang, “High-speed high-security signatures,” in Proc. 13th Int. Workshop, Nara, Japan, Sep./Oct. 2011, pp. 124–142.
[2] A. C. Aldaya, C. P. García, and B. B. Brumley, “From A to Z: Projective coordinates leakage in the wild,” Cryptol. ePrint Arch., Tech. Rep. 2020/432, 2020.
[3] K. Ryan, “Return of the hidden number Problem: A widespread and novel key extraction attack on ECDSA and DSA,” Trans. Cryptograph. Hardw. Embedded Syst., vol. 2019, no. 1, pp. 146–168, Nov. 2018.
[4] D. J. Bernstein and T. Lange. (2011). Security Dangers of the Nist Curves. [Online]. Available: https://www.hyperelliptic.org/tanja/vortraege/20130531.pdf
[5] P. W. Shor, “Algorithms for quantum computation: Discrete logarithms and factoring,” in Proc. 35th Annu. Symp. Found. Comput. Sci., Santa Fe, NM, USA, Nov. 1994, pp. 124–134.
[6] N. Bindel, U. Herath, M. McKague, and D. Stebila, “Transitioning to a quantum-resistant public key infrastructure,” in Proc. IACR, 2017, p. 460.
[7] M. Bisheh-Niasar, R. Azarderakhsh, and M. Mozaffari-Kermani, “High-speed NTT-based polynomial multiplication accelerator for CRYSTALS-Kyber post-quantum cryptography,” Cryptol. ePrint Arch., Tech. Rep. 2021/563, 2021.
[8] (2020). Things That Use Ed25519. [Online]. Available: https://ianix.com/pub/ed25519-deployment.html
[9] P. Kietzmann, L. Boeckmann, L. Lanzieri, T. C. Schmidt, and M. Wählisch, “A performance study of crypto-hardware in the low-end IoT,” in Proc. IACR, 2021, p. 58.
[10] M. Bisheh Niasar, R. Azarderakhsh, and M. Mozaffari Kermani, “Efficient hardware implementations for elliptic curve cryptography over Curve448,” in Proc. 21st Int. Conf. Cryptol. India, Bangalore, India, Dec. 2020, pp. 228–247.
[11] M. Bisheh-Niasar, R. Azarderakhsh, and M. Mozaffari-Kermani, “Area-time efficient hardware architecture for signature based on Ed448,” IEEE Trans. Circuits Syst. II, Exp. Briefs, early access, Mar. 23, 2021, doi: 10.1109/TCSII.2021.3068136.
[12] B. Glas, O. Sander, V. Stuckert, K. D. Müller-Glaser, and J. Becker, “Prime field ECDSA signature processing for reconfigurable embedded systems,” Int. J. Reconfigurable Comput., vol. 2011, Oct. 2011, Art. no. 836460.
[13] B. Panjwani, “Scalable and parameterized hardware implementation of elliptic curve digital signature algorithm over prime fields,” in Proc. Int. Conf. Adv. Comput., Commun. Informat. (ICACCI), Sep. 2017, pp. 211–218.
[14] J. Vliegen et al., “A compact FPGA-based architecture for elliptic curve cryptography over prime fields,” in Proc. 21st IEEE Int. Conf. Appl.-Specific Syst., Architectures Processors, 2010, pp. 313–316.
[15] D. Zhang and G. Bai, “High-performance implementation of SM2 based on FPGA,” in Proc. 8th IEEE Int. Conf. Commun. Softw. Netw. (ICCSN), Jun. 2016, pp. 718–722.
[16] P. Sasdrich and T. Güneysu, “Efficient elliptic-curve cryptography using curve25519 on reconfigurable devices,” in Proc. 10th Int. Symp., D. Goehringer, M. D. Santambrogio, J. M. P. Cardoso, and K. Bertels, Eds., Vilamoura, Portugal, 2014, pp. 25–36.
[17] P. Sasdrich and T. Güneysu, “Implementing Curve25519 for side-channel-protected elliptic curve cryptography,” ACM Trans. Reconfigurable Technol. Syst., vol. 9, no. 1, pp. 1–15, Nov. 2015.
[18] P. Sasdrich and T. Güneysu, “Exploring RFC 7748 for hardware implementation: Curve25519 and Curve448 with side-channel protection,” J. Hardw. Syst. Secur., vol. 2, no. 4, pp. 297–313, Dec. 2018.
[19] M. Bisheh Niasar, R. El Khatib, R. Azarderakhsh, and M. Mozaffari-Kermani, “Fast, small, and area-time efficient architectures for key-exchange on Curve25519,” in Proc. IEEE 27th Symp. Comput. Arithmetic (ARITH), Jun. 2020, pp. 72–79.
[20] P. Koppermann, F. De Santis, J. Heyszl, and G. Sigl, “X25519 hardware implementation for low-latency applications,” in Proc. Euromicro Conf. Digit. Syst. Design, P. Kitsos, Ed., Limassol, Cyprus, 2016, pp. 99–106.
[21] P. Koppermann, F. De Santis, J. Heyszl, and G. Sigl, “Low-latency X25519 hardware implementation: Breaking the 100 microseconds barrier,” Microprocessors Microsyst., vol. 52, pp. 491–497, Jul. 2017.
[22] R. Salarifard and S. Bayat-Sarmadi, “An efficient low-latency point-multiplication over Curve25519,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 10, pp. 3854–3862, Oct. 2019.
[23] M. A. Mehrabi and C. Doche, “Low-cost, low-power FPGA implementation of ED25519 and CURVE25519 point multiplication,” Information, vol. 10, no. 9, p. 285, Sep. 2019.
[24] F. Turan and I. Verbauwhede, “Compact and flexible FPGA implementation of Ed25519 and X25519,” ACM Trans. Embedded Comput. Syst., vol. 18, no. 3, pp. 1–21, 2019.
[25] N. Samwel, L. Batina, G. Bertoni, J. Daemen, and R. Susella, “Breaking Ed25519 in WolfSSL,” Cryptol. ePrint Arch., Tech. Rep. 2017/985, 2017.
[26] S. Josefsson and I. Liusvaara, Edwards-Curve Digital Signature Algorithm (EdDSA), document RFC 8032, 2017, pp. 1–60.
[27] D. J. Bernstein, “Curve25519: New Diffie-Hellman speed records,” in Proc. 9th Int. Conf. Theory Pract. Public-Key Cryptogr., M. Yung, Y. Dodis, A. Kiayias, and T. Malkin, Eds., New York, NY, USA, 2006, pp. 207–228.
[28] H. Hisil, K. K.-H. Wong, G. Carter, and E. Dawson, “Twisted edwards curves revisited,” Cryptol. ePrint Arch., Tech. Rep. 2008/522, 2008.
[29] M. Hamburg, “Fast and compact elliptic-curve cryptography,” in Proc. IACR, 2012, p. 309.
[30] K. Okeya and K. Sakurai, “Efficient elliptic curve cryptosystems from a scalar multiplication algorithm with recovery of the Y-coordinate on a montgomery-form elliptic curve,” in Proc. Int. Workshop, Paris, France, May 2001, pp. 126–141.
[31] D. F. Aranha, F. R. Novaes, A. Takahashi, M. Tibouchi, and Y. Yarom, “Ladderleak: Breaking ECDSA with less than one bit of nonce leakage,” Cryptol. ePrint Arch., Tech. Rep. 2020/615, 2020.
[32] J. Coron, “Resistance against differential power analysis for elliptic curve cryptosystems,” in Proc. Cryptograph. Hardw. Embedded Syst., Ç. K. Koç and C. Paar, Eds., Worcester, MA, USA, 1999, pp. 292–302.
[33] P. Schwabe. (Sep. 2013). Scalar-Multiplication Algorithms. [Online]. Available: https://cryptojedi.org/peter/data/eccss-20130911b.pdf
[34] M. Bisheh Niasar, R. Azarderakhsh, and M. Mozaffari Kermani, “Optimized architectures for elliptic curve cryptography over Curve448,” in Proc. IACR, 2020, p. 1338.
[35] M. Scott, “On the deployment of curve based cryptography for the Internet of Things,” in Proc. IACR, 2020, p. 514.
[36] H. Fujii and D. F. Aranha, “Curve25519 for the cortex-M4 and beyond,” in Proc. 5th Int. Conf. Cryptol. Inf. Secur. Latin Amer., Havana, Cuba, Sep. 2017, pp. 109–127.
[37] D. Bernstein and T. Lange. EBACS: ECRYPT Benchmarking of Cryptographic Systems. Accessed: Mar. 22, 2021. [Online]. Available: https://bench.cr.yp.to