96 results sorted by ID
PQCUARK: A Scalar RISC-V ISA Extension for ML-KEM and ML-DSA
Xavier Carril, Alicia Manuel Pasoot, Emanuele Parisi, Carlos Andrés Lara-Niño, Oriol Farràs, Miquel Moretó
Implementation
Recent advances in quantum computing pose a threat to the security of digital communications, as large-scale quantum machines can break commonly used cryptographic algorithms, such as RSA and ECC. To mitigate this risk, post-quantum cryptography (PQC) schemes are being standardized, with recent NIST recommendations selecting two lattice-based algorithms: ML-KEM for key encapsulation and ML-DSA for digital signatures. Two computationally intensive kernels dominate the execution of these...
Extending and Accelerating Inner Product Masking with Fault Detection via Instruction Set Extension
Songqiao Cui, Geng Luo, Junhan Bao, Josep Balasch, Ingrid Verbauwhede
Implementation
Inner product masking is a well-studied masking countermeasure against side-channel attacks. IPM-FD further extends the IPM scheme with fault detection capabilities. However, implementing IPM-FD in software especially on embedded devices results in high computational overhead. Therefore, in this work we perform a detailed analysis of all building blocks for IPM-FD scheme and propose a Masked Processing Unit to accelerate all operations, for example multiplication and IPM-FD specific...
Taming the Stack: Proof-Preserving Blockwise FrodoKEM on RISC-V Devices with Hardware Acceleration
Frank Hartmann
Implementation
FrodoKEM provides conservative post-quantum security through unstructured lattices, yet its deployment on embedded systems is historically constrained by high memory requirements. While state-of-the-art implementations mitigate this by generating the public matrix on-the-fly, they remain bottlenecked by the sequential generation of secret matrices, which enforces a rigid trade-off between stack usage and recomputation overhead. To address this, we propose a blockwise secret generation...
Architectural Leakage Analysis of Masked Cryptographic Software on RISC-V Cores
Siddhartha Chowdhury, Nimish Mishra, Sarani Bhattacharya, Debdeep Mukhopadhyay
Attacks and cryptanalysis
Software masking—particularly through threshold implementations—has long been regarded as a foundational defense mechanism against side-channel attacks. These schemes enforce the principles of non-completeness and uniformity, offering provable first-order resistance even under realistic leakage assumptions. However, such assurances were primarily developed under the simplified assumption of scalar or in-order execution, where instruction flow and data dependencies are well-behaved and...
HORCRUX - A Lightweight PQC-RISC-V eXtension Architecture
Alessandra Dolmeta, Valeria Piscopo, Guido Masera, Maurizio Martina, Michael Hutter
Implementation
This work presents a RISC-V extension for Post-Quantum Cryptography
(PQC) called HORCRUX, which provides a unified Instruction-Set Extension (ISE) supporting all NIST-approved PQC algorithms. HORCRUX addresses the current fragmentation in hardware support, where existing extensions typically focus on individual algorithms or limited subsets of PQC schemes, and targets the common kernels shared across ML-KEM, ML-DSA, SLH-DSA, and HQC. To address the primary computational bottlenecks of all...
Vectorized Falcon-Sign Implementations using SSE2, AVX2, AVX-512F, NEON, and RVV
Jipeng Zhang, Jiaheng Zhang
Implementation
Falcon, a NTRU-based digital signature algorithm, has been selected by NIST as one of the post-quantum cryptography (PQC) standards. Compared to verification, the signature generation of Falcon is relatively slow. One of the core operations in signature generation is discrete Gaussian sampling, which involves a component known as the BaseSampler. The BaseSampler accounts for up to 30% of the time required for signature generation, making it a significant performance bottleneck. This work...
Key Recovery from Side-Channel Power Analysis Attacks on Non-SIMD HQC Decryption
Nathan Maillet, Cyrius Nugier, Vincent Migliore, Jean-Christophe Deneuville
Attacks and cryptanalysis
HQC is a code-based cryptosystem that has recently been announced for standardization after the fourth round of the NIST post-quantum cryptography standardization process. During this process, the NIST specifically required submitters to provide two kinds of implementation: a reference one, meant to serve lisibility and compliance with the specifications; and an optimized one, aimed at showing the performance of the scheme alongside other desirable properties such as resilience against...
Brief Comments on Rijndael-256 and the Standard RISC-V Cryptography Extensions
Markku-Juhani O. Saarinen
Implementation
We evaluate the implementation aspects of Rijndael-256 using the ratified RISC-V Vector Cryptography extension Zvkn. A positive finding is that Rijndael-256 can be implemented in constant time with the existing RISC-V ISA as the critical AES and fixed crossbar permutation instructions are in the DIEL (data-independent execution latency) set. Furthermore, simple tricks can be used to expand the functionality of key expansion instructions to cover the additional round constants required....
CRAFT: Characterizing and Root-Causing Fault Injection Threats at Pre-Silicon
Arsalan Ali Malik, Harshvadan Mihir, Aydin Aysu
Attacks and cryptanalysis
Fault injection attacks (FIA) pose significant security threats to embedded systems as they exploit weaknesses across multiple layers, including system software, instruction set architecture (ISA), microarchitecture, and physical hardware. Early detection and understanding of how physical faults propagate to system-level behavior are essential to safeguarding cyberinfrastructure.
This work introduces CRAFT, a framework that combines pre-silicon analysis with post-silicon validation to...
Proving CPU Executions in Small Space
Vineet Nair, Justin Thaler, Michael Zhu
Cryptographic protocols
zkVMs are SNARKs for verifying CPU execution. They allow an untrusted prover to show that it correctly ran a specified program on a witness, where the program is given as bytecode conforming to an instruction set architecture like RISC-V. Existing zkVMs still struggle with high prover resource costs, notably large runtime and memory usage. We show how to implement Jolt—an advanced, sum-check- based zkVM—with a significantly reduced memory footprint, without relying on SNARK recursion, and...
Combined Masking and Shuffling for Side-Channel Secure Ascon on RISC-V
Linus Mainka, Kostas Papagiannopoulos
Implementation
Both masking and shuffling are very common software countermeasures against side-channel attacks. However, exploring possible combinations of the two countermeasures to increase and fine-tune side-channel resilience is less investigated. With this work, we aim to bridge that gap by both concretising the security guarantees of several masking and shuffling combinations presented in earlier work and additionally investigating their randomness cost. We subsequently implement these approaches to...
Masking-Friendly Post-Quantum Signatures in the Threshold-Computation-in-the-Head Framework
Thibauld Feneuil, Matthieu Rivain, Auguste Warmé-Janville
Cryptographic protocols
Side-channel attacks pose significant threats to cryptographic implementations, which require the inclusion of countermeasures to mitigate these attacks. In this work, we study the masking of state-of-the-art post-quantum signatures based on the MPC-in-the-head paradigm. More precisely, we focus on the recent threshold-computation-in-the-head (TCitH) framework that applies to some NIST candidates of the post-quantum standardization process. We first provide an analysis of side-channel attack...
MIDAS: an End-to-end CAD Framework for Automating Combinational Logic Locking
Akashdeep Saha, Siddhartha Chowdhury, Rajat Subhra Chakraborty, Debdeep Mukhopadhyay
Implementation
Logic locking has surfaced as a notable safeguard
against diverse hazards that pose a risk to the integrated circuit
(IC) supply chain. Existing literature on logic locking largely
encompasses the art of proposing new constructions, on the one
hand, and unearthing weaknesses in such algorithms on the
other. Somehow, in this race of make and break, the stress on
automation of adopting such techniques on real-life circuits has
been rather limited. For the first time, we present a...
Twist and Shout: Faster memory checking arguments via one-hot addressing and increments
Srinath Setty, Justin Thaler
Cryptographic protocols
A memory checking argument enables a prover to prove to a verifier that it is correctly processing reads and writes to memory. They are used widely in modern SNARKs, especially in zkVMs, where the prover proves the correct execution of a CPU including the correctness of memory operations.
We describe a new approach for memory checking, which we call the method of one-hot addressing and increments. We instantiate this method via two different families of protocols, called Twist and Shout....
PASTA on Edge: Cryptoprocessor for Hybrid Homomorphic Encryption
Aikata Aikata, Daniel Sanz Sobrino, Sujoy Sinha Roy
Implementation
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation but imposes significant computational and communication overhead on the client for the public-key encryption. To alleviate this burden, previous works have introduced the Hybrid Homomorphic Encryption (HHE) paradigm, which combines symmetric encryption with homomorphic decryption to enhance performance for the FHE client. While early HHE schemes focused on binary data, modern versions now support integer prime fields,...
ARCHER: Architecture-Level Simulator for Side-Channel Analysis in RISC-V Processors
Asmita Adhikary, Abraham J. Basurto Becerra, Lejla Batina, Ileana Buhan, Durba Chatterjee, Senna van Hoek, Eloi Sanfelix Gonzalez
Applications
Side-channel attacks pose a serious risk to cryptographic implementations, particularly in embedded systems. While current methods, such as test vector leakage assessment~(TVLA), can identify leakage points, they do not provide insights into their root causes. We propose ARCHER, an architecture-level tool designed to perform side-channel analysis and root cause identification for software cryptographic implementations on RISC-V processors. ARCHER has two main components: (1) Side-Channel...
Verifying Jolt zkVM Lookup Semantics
Carl Kwan, Quang Dao, Justin Thaler
Applications
Lookups are a popular way to express repeated constraints in state-of-the art SNARKs. This is especially the case for zero-knowledge virtual machines (zkVMs), which produce succinct proofs of correct execution for programs expressed as bytecode according to a specific instruction set architecture (ISA). The Jolt zkVM (Arun, Setty & Thaler, Eurocrypt 2024) for RISC-V ISA employs Lasso (Setty, Thaler & Wahby, Eurocrypt 2024), an efficient lookup argument for massive structured tables, to prove...
Towards Explainable Side-Channel Leakage: Unveiling the Secrets of Microarchitecture
Ischa Stork, Vipul Arora, Łukasz Chmielewski, Ileana Buhan
Implementation
We explore the use of microbenchmarks, small assembly code snippets, to detect microarchitectural side-channel leakage in CPU implementations. Specifically, we investigate the effectiveness of microbenchmarks in diagnosing the predisposition to side-channel leaks in two commonly used RISC-V cores: Picorv32 and Ibex. We propose a new framework that involves diagnosing side-channel leaks, identifying leakage points, and constructing leakage profiles to understand the underlying causes. We...
PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor
Zewen Ye, Junhao Huang, Tianshun Huang, Yudan Bai, Jinze Li, Hao Zhang, Guangyan Li, Donglong Chen, Ray C.C. Cheung, Kejie Huang
Implementation
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
NeutronNova: Folding everything that reduces to zero-check
Abhiram Kothapalli, Srinath Setty
Foundations
We introduce NeutronNova, a new folding scheme for the zero-check relation: an instance-witness pair is in the zero-check relation if a corresponding multivariate polynomial evaluates to zero for all inputs over a suitable Boolean hypercube. The folding scheme is a two-round protocol, and it internally invokes a \emph{single} round of the sum-check protocol. The folding scheme is more efficient than prior state-of-the-art schemes and directly benefits from recent improvements to the...
Optimized Software Implementation of Keccak, Kyber, and Dilithium on RV{32,64}IM{B}{V}
Jipeng Zhang, Yuxing Yan, Junhao Huang, Çetin Kaya Koç
Implementation
With the standardization of NIST post-quantum cryptographic (PQC) schemes, optimizing these PQC schemes across various platforms presents significant research value. While most existing software implementation efforts have concentrated on ARM platforms, research on PQC implementations utilizing various RISC-V instruction set architectures (ISAs) remains limited.
In light of this gap, this paper proposes comprehensive and efficient optimizations of Keccak, Kyber, and Dilithium on...
Marian: An Open Source RISC-V Processor with Zvk Vector Cryptography Extensions
Thomas Szymkowiak, Endrit Isufi, Markku-Juhani Saarinen
Implementation
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA
Xinyi Ji, Jiankuo Dong, Junhao Huang, Zhijian Yuan, Wangchen Dai, Fu Xiao, Jingqiang Lin
Implementation
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
A Fast and Efficient SIKE Co-Design: Coarse-Grained Reconfigurable Accelerators with Custom RISC-V Microcontroller on FPGA
Jing Tian, Bo Wu, Lang Feng, Haochen Zhang, Zhongfeng Wang
Implementation
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
SoCureLLM: An LLM-driven Approach for Large-Scale System-on-Chip Security Verification and Policy Generation
Shams Tarek, Dipayan Saha, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi
Applications
Contemporary methods for hardware security verification struggle with adaptability, scalability, and availability due to the increasing complexity of the modern system-on-chips (SoCs). Large language models (LLMs) have emerged as a viable approach to address these shortcomings in security verification because of their natural language understanding, advanced reasoning, and knowledge transfer capabilities. However, their application to large designs is limited by inherent token limitation and...
CISELeaks: Information Leakage Assessment of Cryptographic Instruction Set Extension Prototypes
Aruna Jayasena, Richard Bachmann, Prabhat Mishra
Attacks and cryptanalysis
Software based cryptographic implementations provide flexibility but they face performance limitations. In contrast, hardware based cryptographic accelerators utilize application-specific customization to provide real-time security solutions.
Cryptographic instruction-set extensions (CISE) combine the advantages of both hardware and software based solutions to provide higher performance combined with the flexibility of atomic-level cryptographic operations. While CISE is widely used to...
Efficient Second-Order Masked Software Implementations of Ascon in Theory and Practice
Barbara Gigerl, Florian Mendel, Martin Schläffer, Robert Primas
Implementation
In this paper, we present efficient protected software implementations of the authenticated cipher Ascon, the recently announced winner of the NIST standardization process for lightweight cryptography.
Our implementations target theoretical and practical security against second-order power analysis attacks.
First, we propose an efficient second-order extension of a previously presented first-order masking of the Keccak S-box that does not require online randomness.
The extension...
Plan your defense: A comparative analysis of leakage detection methods on RISC-V cores
Konstantina Miteloudi, Asmita Adhikary, Niels van Drueten, Lejla Batina, Ileana Buhan
Applications
Hardening microprocessors against side-channel attacks is a critical aspect of ensuring their security. A key step in this process is identifying and mitigating “leaky" hardware modules, which leak information during the execution of cryptographic algorithms.
In this paper, we explore how different leakage detection methods, the Side-channel Vulnerability Factor (SVF) and the Test Vector Leakage Assessment (TVLA), contribute to hardening of microprocessors. We conduct experiments on two...
Accelerating SLH-DSA by Two Orders of Magnitude with a Single Hash Unit
Markku-Juhani O. Saarinen
Implementation
We report on efficient and secure hardware implementation techniques for the FIPS 205 SLH-DSA Hash-Based Signature Standard. We demonstrate that very significant overall performance gains can be obtained from hardware that optimizes the padding formats and iterative hashing processes specific to SLH-DSA. A prototype implementation, SLotH, contains Keccak/SHAKE, SHA2-256, and SHA2-512 cores and supports all 12 parameter sets of SLH-DSA. SLotH also supports side-channel secure PRF computation...
Faster Complete Formulas for the GLS254 Binary Curve
Thomas Pornin
Implementation
GLS254 is an elliptic curve defined over a finite field of characteristic 2; it contains a 253-bit prime order subgroup, and supports an endomorphism that can be efficiently computed and helps speed up some typical operations such as multiplication of a curve element by a scalar. That curve offers on x86 and ARMv8 platforms the best known performance for elliptic curves at the 128-bit security level.
In this paper we present a number of new results related to GLS254:
- We describe...
PQ.V.ALU.E: Post-Quantum RISC-V Custom ALU Extensions on Dilithium and Kyber
Konstantina Miteloudi, Joppe Bos, Olivier Bronchain, Björn Fay, Joost Renes
Implementation
This paper explores the challenges and potential solutions of implementing the recommended upcoming post-quantum cryptography standards (the CRYSTALS-Dilithium and CRYSTALS-Kyber algorithms) on resource constrained devices.
The high computational cost of polynomial operations, fundamental to cryptography based on ideal lattices, presents significant challenges in an efficient implementation.
This paper proposes a hardware/software co-design strategy using RISC-V extensions to optimize...
MAFIA: Protecting the Microarchitecture of Embedded Systems Against Fault Injection Attacks
Thomas Chamelot, Damien Couroussé, Karine Heydemann
Implementation
Fault injection attacks represent an effective threat to embedded systems. Recently, Laurent et al. have reported that fault injection attacks can leverage faults inside the microarchitecture. However, state-of-the-art counter-measures, hardware-only or with hardware support, do not consider the integrity of microarchitecture control signals that are the target of these faults.
We present MAFIA, a microarchitecture protection against fault injection attacks. MAFIA ensures integrity of...
To extend or not to extend: Agile Masking Instructions for PQC
Markus Krausz, Georg Land, Florian Stolz, Dennis Naujoks, Jan Richter-Brockmann, Tim Güneysu, Lucie Kogelheide
Implementation
Splitting up sensitive data into multiple shares – termed masking – has
proven an effective countermeasure against various types of Side-Channel Analysis (SCA) on cryptographic implementations. However, in software this approach not only leads to dramatic performance overheads for non-linear operations, but also suffers from microarchitectural leakage, which is hard to avoid. Both problems can be addressed with one solution: masked hardware accelerators. In this context, Gao et al. [GGM+...
Jolt: SNARKs for Virtual Machines via Lookups
Arasu Arun, Srinath Setty, Justin Thaler
Cryptographic protocols
Succinct Non-interactive Arguments of Knowledge (SNARKs) allow an untrusted prover to establish that it correctly ran some "witness-checking procedure" on a witness. A zkVM (short for zero-knowledge Virtual Machine) is a SNARK that allows the witness-checking procedure to be specified as a computer program written in the assembly language of a specific instruction set architecture (ISA).
A $\textit{front-end}$ converts computer programs into a lower-level representation such as an...
Unlocking the lookup singularity with Lasso
Srinath Setty, Justin Thaler, Riad Wahby
Foundations
This paper introduces Lasso, a new family of lookup arguments, which allow an untrusted prover to commit to a vector $a \in \mathbb{F}^m$ and prove that all entries of a reside in some predetermined table $t \in \mathbb{F}^n$. Lasso’s performance characteristics unlock the so-called "lookup singularity". Lasso works with any multilinear polynomial commitment scheme, and provides the following efficiency properties.
For $m$ lookups into a table of size $n$, Lasso’s prover commits to just...
PicoEMP: A Low-Cost EMFI Platform Compared to BBI and Voltage Fault Injection using TDC and External VCC Measurements
Colin O'Flynn
Implementation
Electromagnetic Fault Injection (EMFI) has been demonstrated to be useful for both academic and industrial research. Due to the dangerous voltages involved, most work is done with commercial tools. This paper introduces a safety-focused low-cost and open-source design that can be built for less than \$50 using only off-the-shelf parts.
The paper also introduces an iCE40 based Time-to-Digital Converter (TDC), which is used to visualize the glitch inserted by the EMFI tool. This...
MuxProofs: Succinct Arguments for Machine Computation from Vector Lookups
Zijing Di, Lucas Xia, Wilson Nguyen, Nirvan Tyagi
Cryptographic protocols
Proofs for machine computation prove the correct execution of arbitrary programs that operate over fixed instruction sets (e.g., RISC-V, EVM, Wasm). A standard approach for proving machine computation is to prove a universal set of constraints that encode the full instruction set at each step of the program execution. This approach incurs a proving cost per execution step on the order of the total sum of instruction constraints for all of the instructions in the set, despite each step of the...
eLIMInate: a Leakage-focused ISE for Masked Implementation
Hao Cheng, Daniel Page, Weijia Wang
Implementation
Even given a state-of-the-art masking scheme, masked software implementation of some cryptography functionality can pose significant challenges stemming, e.g., from simultaneous requirements for efficiency and security. In this paper we design an Instruction Set Extension (ISE) to address a specific element of said challenge, namely the elimination of leakage stemming from architectural and micro-architectural overwriting. Conceptually, the ISE allows a leakage-focused behavioural hint to be...
Secure Context Switching of Masked Software Implementations
Barbara Gigerl, Robert Primas, Stefan Mangard
Implementation
Cryptographic software running on embedded devices requires protection against physical side-channel attacks such as power analysis. Masking is a widely deployed countermeasure against these attacksand is directly implemented on algorithmic level. Many works study the security of masked cryptographic software on CPUs, pointing out potential problems on algorithmic/microarchitecture-level, as well as corresponding solutions, and even show masked software can be implemented efficiently and...
HyperNova: Recursive arguments for customizable constraint systems
Abhiram Kothapalli, Srinath Setty
Foundations
We introduce HyperNova, a new recursive argument for proving incremental computations whose steps are expressed with CCS (Setty et al. ePrint 2023/552), a customizable constraint system that simultaneously generalizes Plonkish, R1CS, and AIR without overheads. HyperNova makes four contributions, each resolving a major problem in the area of recursive arguments.
First, it provides a folding scheme for CCS where the prover’s cryptographic cost is a single multi-scalar multiplication (MSM)...
Hardware-Software Co-design for Side-Channel Protected Neural Network Inference
Anuj Dubey, Rosario Cammarota, Avinash Varna, Raghavan Kumar, Aydin Aysu
Applications
Physical side-channel attacks are a major threat to stealing confidential data from devices. There has been a recent surge in such attacks on edge machine learning (ML) hardware to extract the model parameters. Consequently, there has also been some work, although limited, on building corresponding side-channel defenses against such attacks. All the current solutions either take the fully software or fully hardware-centric approaches, which are limited either in performance or...
Recommendation for a holistic secure embedded ISA extension
Florian Stolz, Marc Fyrbiak, Pascal Sasdrich, Tim Güneysu
Foundations
Embedded systems are a cornerstone of the ongoing digitization of our society, ranging from expanding markets around IoT and smart-X devices over to sensors in autonomous driving, medical equipment or critical infrastructures. Since a vast amount of embedded systems are safety-critical (e.g., due to their operation site), security is a necessity for their operation. However, unlike mobile, desktop, and server systems, where adversaries typically only act have remote access, embedded systems...
SuperNova: Proving universal machine executions without universal circuits
Abhiram Kothapalli, Srinath Setty
Foundations
This paper introduces SuperNova, a new recursive proof system for incrementally producing succinct proofs of correct execution of programs on a stateful machine with a particular instruction set (e.g., EVM, RISC-V). A distinguishing aspect of SuperNova is that the cost of proving a step of a program is proportional only to the size of the circuit representing the instruction invoked by the program step. This is a stark departure from prior works that employ universal circuits where the cost...
RISC-V Instruction Set Extensions for Lightweight Symmetric Cryptography
Hao Cheng, Johann Großschädl, Ben Marshall, Dan Page, Thinh Pham
Implementation
The NIST LightWeight Cryptography (LWC) selection process aims to standardise cryptographic functionality which is suitable for resource-constrained devices. Since the outcome is likely to have significant, long-lived impact, careful evaluation of each submission with respect to metrics explicitly outlined in the call is imperative. Beyond the robustness of submissions against cryptanalytic attack, metrics related to their implementation (e.g., execution latency and memory footprint) form an...
Threshold Implementations in Software: Micro-architectural Leakages in Algorithms
John Gaspoz, Siemen Dhooghe
Implementation
This paper provides necessary properties to algorithmically secure first-order maskings in scalar micro-architectures. The security notions of threshold implementations are adapted following micro-processor leakage effects which are known to the literature. The resulting notions, which are based on the placement of shares, are applied to a two-share randomness-free PRESENT cipher and Keccak-f. The assembly implementations are put on a RISC-V and an ARM Cortex-M4 core. All designs are...
Key-Recovery Fault Injection Attack on the Classic McEliece KEM
Sabine Pircher, Johannes Geier, Julian Danner, Daniel Mueller-Gritschneder, Antonia Wachter-Zeh
Attacks and cryptanalysis
We present a key-recovery fault injection attack on the Classic McEliece Key Encapsulation Mechanism (KEM). The fault injections target the error-locator polynomial of the Goppa code and the validity checks in the decryption algorithm, making a chosen ciphertext attack possible. Faulty decryption outputs are used to generate a system of polynomial equations in the secret support elements of the Goppa code. After solving the equations, we can determine a suitable Goppa polynomial and form an...
Low-latency implementation of the GIFT cipher on RISC-V architectures
Gheorghe Pojoga, Kostas Papagiannopoulos
Implementation
Lightweight cryptography is a viable solution for constrained computational environments that require a secure communication channel. To standardize lightweight primitives, NIST has published a call for algorithms that address needs like compactness, low-latency, low-power/energy, etc. Among the candidates, the GIFT family of block ciphers was utilized in various NIST candidates due to its high-security margin and small gate footprint. As a result of their hardware-oriented design, software...
Risky Translations: Securing TLBs against Timing Side Channels
Florian Stolz, Jan Philipp Thoma, Pascal Sasdrich, Tim Güneysu
Foundations
Microarchitectural side-channel vulnerabilities in modern processors are known to be a powerful attack vector that can be utilized to bypass common security boundaries like memory isolation. As shown by recent variants of transient execution attacks related to Spectre and Meltdown, those side channels allow to leak data from the microarchitecture to the observable architectural state. The vast majority of attacks currently build on the cache-timing side channel, since it is easy to exploit...
Maximizing the Potential of Custom RISC-V Vector Extensions for Speeding up SHA-3 Hash Functions
Huimin Li, Nele Mentens, Stjepan Picek
Applications
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1 600] permutation, which operates on an internal state of 1 600 bits, mostly represented as a 5×5×64-bit matrix. While software implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1 600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of Keccak-f[1...
Power Contracts: Provably Complete Power Leakage Models for Processors
Roderick Bloem, Barbara Gigerl, Marc Gourjon, Vedad Hadžić, Stefan Mangard, Robert Primas
Foundations
The protection of cryptographic software implementations against power-analysis attacks is critical for applications in embedded systems.
A commonly used algorithmic countermeasure against these attacks is masking, a secret-sharing scheme that splits a sensitive computation into computations on multiple random shares.
In practice, the security of masking schemes relies on several assumptions that are often violated by microarchitectural side-effects of CPUs.
Many past works address this...
Post-Quantum Signatures on RISC-V with Hardware Acceleration
Patrick Karl, Jonas Schupp, Tim Fritzmann, Georg Sigl
Implementation
CRYSTALS-Dilithium and Falcon are digital signature algorithms based on cryptographic lattices, that are considered secure even if large-scale quantum computers will be able to break conventional public-key cryptography. Both schemes have been selected for standardization in the NIST post-quantum competition. In this work, we present a RISC-V HW/SW odesign that aims to combine the advantages of software- and hardware implementations, i.e. flexibility and performance. It shows the use of ...
RevEAL: Single-Trace Side-Channel Leakage of the SEAL Homomorphic Encryption Library
Furkan Aydin, Emre Karabulut, Seetal Potluri, Erdem Alkim, Aydin Aysu
This paper demonstrates the first side-channel attack on homomorphic encryption (HE), which allows computing on encrypted data. We reveal a power-based side-channel leakage of Microsoft SEAL prior to v3.6 that implements the Brakerski/Fan-Vercauteren (BFV) protocol. Our proposed attack targets the Gaussian sampling in the SEAL’s encryption phase and can extract the entire message with a single power measurement.
Our attack works by (1) identifying each coefficient index being sampled, (2)...
A Scalable SIMD RISC-V based Processor with Customized Vector Extensions for CRYSTALS-Kyber
Huimin Li, Nele Mentens, Stjepan Picek
Implementation
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1,600] permutation, which operates on an internal state of 1,600 bits, mostly represented as a $5\times5\times64{-}bit$ matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1,600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of...
Time-memory Trade-offs for Saber+ on Memory-constrained RISC-V
Jipeng Zhang, Junhao Huang, Zhe Liu, Sujoy Sinha Roy
Public-key cryptography
Saber is a module-lattice-based key encapsulation scheme that has been selected as a finalist in the NIST Post-Quantum Cryptography Standardization Project. As Saber computes on considerably large matrices and vectors of polynomials, its efficient implementation on memory-constrained IoT devices is very challenging. In this paper, we present an implementation of Saber with a minor tweak to the original Saber protocol for achieving reduced memory consumption and better performance. We call...
SME: Scalable Masking Extensions
Ben Marshall, Dan Page
Supporting masking countermeasures for non-invasive side-channel security in instructions set architectures is a hard problem. Masked operations often have a large number of inputs and outputs, and enabling portable higher order masking has remained a difficult. However, there are clear benefits to enabling this in terms of performance, code density and security guarantees. We present SME, an instruction set extension for enabling secure and efficient software masking of cryptographic
code...
On Entropy and Bit Patterns of Ring Oscillator Jitter
Markku-Juhani O. Saarinen
Implementation
Thermal jitter (phase noise) from a free-running ring oscillator is a common, easily implementable physical randomness source in True Random Number Generators (TRNGs). We show how to evaluate entropy, autocorrelation, and bit pattern distributions of ring oscillator noise sources, even with low jitter levels or some bias. Entropy justification is required in NIST 800-90B and AIS-31 testing and for applications such as the RISC-V entropy source extension. Our numerical evaluation algorithms...
SeqL+: Secure Scan-Obfuscation with Theoretical and Empirical Validation
Seetal Potluri, Shamik Kundu, Akash Kumar, Kanad Basu, Aydin Aysu
Implementation
Existing logic-locking attacks are known to successfully decrypt a functionally correct key of a locked combinational
circuit. Extensions of these attacks to real-world Intellectual Properties (IPs, which are sequential circuits) have been demonstrated through the scan-chain by selectively initializing the combinational logic and analyzing the responses. In this paper, we propose SeqL+ to mitigate a broad class of such attacks. The key idea is to lock selective functional-input/scan-output...
Architecture Support for Bitslicing
Pantea Kiaei, Tom Conroy, Patrick Schaumont
Implementation
The bitsliced programming model has shown to boost the throughput of software programs. However, on a standard architecture, it exerts a high pressure on register access, causing memory spills and restraining the full potential of bitslicing. In this work, we present architecture support for bitslicing in a System-on-Chip. Our hardware extensions are of two types; internal to the processor core, in the form of custom instructions, and external to the processor, in the form of direct memory...
Saidoyoki: Evaluating side-channel leakage in pre- and post-silicon setting
Pantea Kiaei, Zhenyuan Liu, Ramazan Kaan Eren, Yuan Yao, Patrick Schaumont
Implementation
Predicting the level and exploitability of side-channel leakage from complex SoC design is a challenging task. We present Saidoyoki, a test platform that enables the assessment of side-channel leakage under two different settings. The first is pre-silicon side-channel leakage estimation in SoC, and it requires the use of fast side-channel leakage estimation from a high level design description. The second is post-silicon side-channel leakage measurement and analysis in SoC, and it requires a...
Simpira Gets Simpler: Optimized Simpira on Microcontrollers
Minjoo Sim, Siwoo Eum, Hyeokdong Kwon, Kyungbae Jang, Hyunjun Kim, Hyunji Kim, Gyeongju Song, Wai-Kong Lee, Hwajeong Seo
Implementation
Simpira Permutation is a Permutation design using the AES algorithm. The AES algorithm is the most widely used in the world, and Intel has developed a hardware accelerated AES instruction set (AES-NI) to improve the performance of encryption. By using AES-NI, Simpira can be improved further. However, low-end processors that do not support AES-NI require efficient implementation of Simpira optimization. In this paper, we introduce a optimized implementation of a Simpira Permutation in 8-bit...
All the Polynomial Multiplication You Need on RISC-V
Hwajeong Seo, Hyeokdong Kwon, Siwoo Eum, Kyungbae Jang, Hyunjun Kim, Hyunji Kim, Minjoo Sim, Gyeongju Song, Wai-Kong Lee
Implementation
Polynomial multiplication is a core operation for public key cryptography, such as pre-quantum cryptography (e.g. elliptic curve cryptography) and post-quantum cryptography (e.g. code-based cryptography and multivariate-based cryptography).
For this reason, the efficient and secure implementation of polynomial multiplication has been actively conducted for high availability and security level in application services.
In this paper, we present all polynomial multiplication methods on modern...
Secure and Efficient Software Masking on Superscalar Pipelined Processors
Barbara Gigerl, Robert Primas, Stefan Mangard
Applications
Physical side-channel attacks like power analysis pose a serious threat to cryptographic devices in real-world applications. Consequently, devices implement algorithmic countermeasures like masking.
In the past, works on the design and verification of masked software implementations have mostly focused on simple microprocessors that find usage on smart cards. However, many other applications such as in the automotive industry require side-channel protected cryptographic computations on much...
XDIVINSA: eXtended DIVersifying INStruction Agent to Mitigate Power Side-Channel Leakage
Thinh H. Pham, Ben Marshall, Alexander Fell, Siew-Kei Lam, Daniel Page
Implementation
Side-channel analysis (SCA) attacks pose a major threat to embedded systems due to their ease of accessibility. Realising SCA resilient cryptographic algorithms on embedded systems under tight intrinsic constraints, such as low area cost, limited computational ability, etc., is extremely challenging and often not possible. We propose a seamless and effective approach to realise a generic countermeasure against SCA attacks. XDIVINSA, an extended diversifying instruction agent, is introduced...
A lightweight ISE for ChaCha on RISC-V
Ben Marshall, Daniel Page, Thinh Hung Pham
Implementation
ChaCha is a high-throughput stream cipher designed with the aim of ensuring high-security margins while achieving high performance on software platforms. RISC-V, an emerging, free, and open Instruction Set Architecture (ISA) is being developed with many instruction set extensions (ISE). ISEs are a native concept in RISC-V to support a relatively small RISC-V ISA to suit different use-cases including cryptographic acceleration via either standard or custom ISEs. This paper proposes a...
Quantum-Resistant Security for Software Updates on Low-power Networked Embedded Devices
Gustavo Banegas, Koen Zandberg, Adrian Herrmann, Emmanuel Baccelli, Benjamin Smith
Applications
As the Internet of Things (IoT) rolls out today to devices
whose lifetime may well exceed a decade,
conservative threat models should consider attackers with access to quantum computing
power.
The SUIT standard (specified by the IETF) defines a security
architecture for IoT software updates, standardizing the metadata and
the cryptographic tools---namely, digital signatures and hash functions---that guarantee the legitimacy of software updates.
While the performance of SUIT has...
CARiMoL: A Configurable Hardware Accelerator for Ringand Module Lattice-Based Post-Quantum Cryptography
Afifa Ishtiaq, Dr. Muhammad Shafique, Dr. Osman Hassan
Implementation
Abstract—CARiMoL is a novel run-time Configurable Hardware Accelerator for Ring and Module Lattice-based postquantum
cryptography. It’s flexible design can be configured to key-pair generation, encapsulation, and decapsulation for NewHope and CRYSTALS-Kyber schemes using same hardware. CARiMoL offers run-time configurability for multiple security levels of NewHope and CRYSTALS-Kyber schemes, supporting both Chosen-Plaintext Attack (CPA) and Chosen-Ciphertext Attack (CCA) secure...
Hardware Penetration Testing Knocks Your SoCs Off
Mark Fischer, Fabian Langer, Johannes Mono, Clemens Nasenberg, Nils Albartus
Applications
Today’s society depends on interconnected electronic devices, which handle various sensitive information. Due to the
knowledge needed to develop these devices and the economic advantage of reusable solutions, most of these systems contain
Third-Party Intellectual Property (3PIP) cores that might not be trustworthy. If one of these 3PIP cores is vulnerable, the security of the entire device is potentially affected. As a result, sensitive data that is processed by the device can be leaked to...
Optimized Implementation of SM4 on AVR Microcontrollers, RISC-V Processors, and ARM Processors
Hyeokdong Kwon, Hyunjun Kim, Siwoo Eum, Minjoo Sim, Hyunji Kim, Wai-Kong Lee, Zhi Hu, Hwajeong Seo
Implementation
The SM4 block cipher is a Chinese domestic crpytographic that was introduced in 2003. Since the algorithm was developed for the use in wireless sensor networks, it is mandated in the Chinese National Standard for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). The SM4 block cipher uses a 128-bit block size and a 32-bit round key. This consists of 32 rounds and one reverse translation \texttt{R}. In this paper, we present the optimized implementation of the SM4 block...
On the Design and Misuse of Microcoded (Embedded) Processors — A Cautionary Note
Nils Albartus, Clemens Nasenberg, Florian Stolz, Marc Fyrbiak, Christof Paar, Russell Tessier
Applications
Today's microprocessors often rely on microcode updates to address issues such as security or functional patches. Unfortunately, microcode update flexibility opens up new attack vectors through malicious microcode alterations. Such attacks share many features with hardware Trojans and have similar devastating consequences for system security. However, due to microcode's opaque nature, little is known in the open literature about the capabilities and limitations of microcode Trojans.
We...
Accelerated RISC-V for Post-Quantum SIKE
Rami Elkhatib, Reza Azarderakhsh, Mehran Mozaffari-Kermani
Public-key cryptography
Software implementations of cryptographic algorithms are slow but highly flexible and relatively easy to implement. On the other hand, hardware implementations are usually faster but provide little flexibility and require a lot of time to implement efficiently. In this paper, we develop a hybrid software-hardware implementation of the third round of Supersingular Isogeny Key Encapsulation (SIKE), a post-quantum cryptography algorithm candidate for NIST. We implement an isogeny field...
2021/536
Last updated: 2021-12-29
Analyzing the Potential of Transport Triggered Architecture for Lattice-based Cryptography Algorithms
Latif AKÇAY, Berna ÖRS
Implementation
Lattice-based structures offer considerable possibilities for post-quantum cryptography. Recently, many algorithms have been built on hard lattice problems. The three of the remaining four in the final round of the post-quantum cryptography standardization process use lattice-based methods. Especially in embedded systems, these algorithms should be operated effectively. In this study, the potential of transport triggered architecture is examined in this sense. We try to compare open source...
Masked Accelerators and Instruction Set Extensions for Post-Quantum Cryptography
Tim Fritzmann, Michiel Van Beirendonck, Debapriya Basu Roy, Patrick Karl, Thomas Schamberger, Ingrid Verbauwhede, Georg Sigl
Public-key cryptography
Side-channel attacks can break mathematically secure cryptographic systems leading to a major concern in applied cryptography. While the cryptanalysis and security evaluation of Post-Quantum Cryptography (PQC) have already received an increasing research effort, a cost analysis of efficient side-channel countermeasures is still lacking. In this work, we propose a masked HW/SW codesign of the NIST PQC finalists Kyber and Saber, suitable for their different characteristics. Among others, we...
Coco: Co-Design and Co-Verification of Masked Software Implementations on CPUs
Barbara Gigerl, Vedad Hadzic, Robert Primas, Stefan Mangard, Roderick Bloem
Implementation
The protection of cryptographic implementations against power analysis attacks is of critical importance for many applications in embedded systems. The typical approach of protecting against these attacks is to implement algorithmic countermeasures, like masking. However, implementing these countermeasures in a secure and correct manner is challenging. Masking schemes require the independent processing of secret shares, which is a property that is often violated by CPU microarchitectures in...
Bypassing Isolated Execution on RISC-V with Fault Injection
Shoei Nashimoto, Daisuke Suzuki, Rei Ueno, Naofumi Homma
Applications
RISC-V is equipped with physical memory protection (PMP) to prevent malicious software from accessing protected memory regions. One of the main objectives of PMP is to provide a trusted execution environment (TEE) that isolates secure and insecure applications. In this study, we propose a fault injection attack to bypass the isolation based on PMP. The proposed attack scheme involves extracting successful glitch parameters for fault injection under the assumption of a black-box environment....
Fixslicing AES-like Ciphers: New bitsliced AES speed records on ARM-Cortex M and RISC-V
Alexandre Adomnicai, Thomas Peyrin
Implementation
The fixslicing implementation strategy was originally introduced as a new representation for the hardware-oriented GIFT block cipher to achieve very efficient software constant-time implementations. In this article, we show that the fundamental idea underlying the fixslicing technique is not of interest only for GIFT, but can be applied to other ciphers as well. Especially, we study the benefits of fixslicing in the case of AES and show that it allows to reduce by 52% the amount of...
A Fast and Compact RISC-V Accelerator for Ascon and Friends
Stefan Steinegger, Robert Primas
Implementation
Ascon-p is the core building block of Ascon, the winner in the lightweight category
of the CAESAR competition. With ISAP, another Ascon-p-based AEAD scheme is currently competing
in the 2nd round of the NIST lightweight cryptography standardization project.
In contrast to Ascon, ISAP focuses on providing hardening/protection against a large
class of implementation attacks, such as DPA, DFA, SFA, and SIFA, entirely on mode-level.
Consequently, Ascon-p can be used to realize a wide range of...
The design of scalar AES Instruction Set Extensions for RISC-V
Ben Marshall, G. Richard Newell, Dan Page, Markku-Juhani O. Saarinen, Claire Wolf
Implementation
Secure, efficient execution of AES is an essential requirement on most computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. RISC-V is a (relatively) new ISA that lacks such a standardised ISE. We survey the state-of-the-art industrial and academic ISEs for AES, implement and evaluate five different ISEs, one of which is novel. We recommend separate ISEs for 32 and 64-bit base architectures, with measured performance improvements for an AES-128...
Fault Injection as an Oscilloscope: Fault Correlation Analysis
Albert Spruyt, Alyssa Milburn, Lukasz Chmielewski
Implementation
Fault Injection (FI) attacks have become a practical threat to modern cryptographic implementations. Such attacks have recently focused more on exploitation of implementation-centric and device-specific properties of the faults. In this paper, we consider the parallel between SCA attacks and FI attacks; specifically, that many FI attacks rely on the data-dependency of activation and propagation of a fault, and SCA attacks similarly rely on data-dependent power usage. In fact, these are so...
Development of The RISC-V Entropy Source Interface
Markku-Juhani O. Saarinen, G. Richard Newell, Ben Marshall
Implementation
The RISC-V True Random Number Generator (TRNG) architecture breaks with previous ISA TRNG practice by splitting the Entropy Source (ES) component away from cryptographic DRBGs into a separate privileged interface, and in its use of polling. The modular approach is suitable for the RISC-V hardware IP ecosystem, allows a significantly smaller implementation footprint on platforms that need it, while directly supporting current standards compliance testing methods. We describe the interface,...
Assembly or Optimized C for Lightweight Cryptography on RISC-V?
Fabio Campos, Lars Jellema, Mauk Lemmen, Lars Müller, Amber Sprenkels, Benoit Viguier
Implementation
A major challenge when applying cryptography on constrained environments is the trade-off between performance and security. In this work, we analyzed different strategies for the optimization of several candidates of NIST's lightweight cryptography standardization project on a RISC-V architecture. In particular, we studied the general impact of optimizing symmetric-key algorithms in assembly and in plain C. Furthermore, we present optimized implementations, achieving a speed-up of up to 81%...
An Instruction Set Extension to Support Software-Based Masking
Si Gao, Johann Großschädl, Ben Marshall, Dan Page, Thinh Pham, Francesco Regazzoni
Implementation
In both hardware and software, masking can represent an effective means of hardening an implementation against side channel attack vectors such as Differential Power Analysis (DPA). Focusing on software, however, the use of masking can present various challenges: specifically, it often 1) requires significant effort to translate any theoretical security properties into practice, and, even then, 2) imposes a significant overhead in terms of efficiency. To address both challenges, this paper...
Comparison of RISC-V and transport triggered architectures for a post-quantum cryptography application
Latif AKÇAY, Berna ÖRS
Implementation
Cryptography is one of the basic phenomena of security systems. However, some of the widely used public key cryptography algorithms can be broken by using quantum computers. Therefore, many post-quantum cryptography algorithms are proposed in recent years to handle this issue. NTRU is one of the most important of these quantum-safe algorithms. Besides the importance of cryptography algorithms, the architecture where they are implemented is also essential. In this study, we developed an NTRU...
Domain-Oriented Masked Instruction Set Architecture for RISC-V
Pantea Kiaei, Patrick Schaumont
Implementation
An important selling point for the RISC-V instruction set is the separation between ISA and the implementation of the ISA, leading to flexibility in the design. We argue that for secure implementations, this flexibility is often a vulnerability. With a hardware attacker, the side-effects of instruction execution cannot be ignored. As a result, a strict separation between the ISA interface and implementation is undesirable. We suggest that secure ISA may require additional implementation...
RISQ-V: Tightly Coupled RISC-V Accelerators for Post-Quantum Cryptography
Tim Fritzmann, Georg Sigl, Johanna Sepúlveda
Public-key cryptography
Empowering electronic devices to support Post-Quantum Cryptography (PQC) is a challenging task. PQC introduces new mathematical elements and operations which are usually not easy to implement on standard processors. Especially for low cost and resource constraint devices, hardware acceleration is usually required. In addition, as the standardization process of PQC is still ongoing, a focus on maintaining flexibility is mandatory. To cope with such requirements, hardware/software co-design...
Parameterized Hardware Accelerators for Lattice-Based Cryptography and Their Application to the HW/SW Co-Design of qTESLA
Wen Wang, Shanquan Tian, Bernhard Jungk, Nina Bindel, Patrick Longa, Jakub Szefer
Implementation
This paper presents a set of efficient and parameterized hardware accelerators that target post-quantum lattice-based cryptographic schemes, including a versatile cSHAKE core, a binary-search CDT-based Gaussian sampler, and a pipelined
NTT-based polynomial multiplier, among others. Unlike much of prior work, the accelerators are fully open-sourced, are designed to be constant-time, and can be parameterized at compile-time to support different parameters without the need for
re-writing the...
ISA Extensions for Finite Field Arithmetic - Accelerating Kyber and NewHope on RISC-V
Erdem Alkim, Hülya Evkan, Norman Lahr, Ruben Niederhagen, Richard Petri
Implementation
We present and evaluate a custom extension to the RISC-V instruction set
for finite fields arithmetic. The result serves as a very compact
approach to software-hardware co-design of PQC implementations in the
context of small embedded processors such as smartcards. The extension
provides instructions that implement finite field operations with
subsequent reduction of the result. As small finite fields are used in
various PQC schemes, such instructions can provide a considerable
speedup for...
2019/1403
Last updated: 2019-12-14
No RISC, no Fun: Comparison of Hardware Accelerated Hash Functions for XMSS
Ingo Braun, Fabio Campos, Steffen Reith, Marc Stöttinger
Implementation
We investigate multiple implementations of a hash-based digital signature scheme in software and hardware for a RISC-V processor. For this, different instantiations of XMSS by leveraging SHA-256 and SHA-3 are considered. Moreover, we propose various optimisations for accelerating the signature scheme on resource-constrained FPGAs.
Compared to the pure software version, the implemented hardware accelerators for SHA-256 and SHA-3 achieve a significant speedup of 25x and 87x respectively for...
Sapphire: A Configurable Crypto-Processor for Post-Quantum Lattice-based Protocols (Extended Version)
Utsav Banerjee, Tenzin S. Ukyab, Anantha P. Chandrakasan
Implementation
Public key cryptography protocols, such as RSA and elliptic curve cryptography, will be rendered insecure by Shor’s algorithm when large-scale quantum computers are built. Cryptographers are working on quantum-resistant algorithms, and lattice-based cryptography has emerged as a prime candidate. However, high computational complexity of these algorithms makes it challenging to implement lattice-based protocols on low-power embedded devices. To address this challenge, we present Sapphire – a...
SNEIK on Microcontrollers: AVR, ARMv7-M, and RISC-V with Custom Instructions
Markku-Juhani O. Saarinen
Implementation
SNEIK is a family of lightweight cryptographic algorithms derived from a
single 512-bit permutation. The SNEIGEN ``entropy distribution
function'' was designed to speed up certain functions in post-quantum
and lattice-based public key algorithms.
We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit
ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an
implementation for the open-source RISC-V (RV32I) Instruction Set
Architecture (ISA). Our results demonstrate...
Efficient Cryptography on the RISC-V Architecture
Ko Stoffelen
Implementation
RISC-V is a promising free and open-source instruction set architecture. Most of the instruction set has been standardized and several hardware implementations are commercially available. In this paper we highlight features of RISC-V that are interesting for optimizing implementations of cryptographic primitives. We provide the first optimized assembly implementations of table-based AES, bitsliced AES, ChaCha, and the Keccak-$f$[1600] permutation for the RV32I instruction set. With respect...
2019/656
Last updated: 2019-08-14
SeqL: Secure Scan-Locking for IP Protection
Seetal Potluri, Aydin Aysu, Akash Kumar
Implementation
Existing logic-locking attacks are known to successfully decrypt a functionally correct key of a locked combinational circuit. It is possible to extend these attacks to real-world Silicon-based Intellectual Properties (IPs, which are sequential circuits) through the scan-chain by selectively initializing the combinational logic and analyzing the responses. In this paper, we propose SeqL, which achieves functional isolation and locks selective functional-input/scan-output pairs, thus...
XMSS and Embedded Systems - XMSS Hardware Accelerators for RISC-V
Wen Wang, Bernhard Jungk, Julian Wälde, Shuwen Deng, Naina Gupta, Jakub Szefer, Ruben Niederhagen
Implementation
We describe a software-hardware co-design for the hash-based post-quantum signature scheme XMSS on a RISC-V embedded processor. We provide software optimizations for the XMSS reference implementation for SHA-256 parameter sets and several hardware accelerators that allow to balance area usage and performance based on individual needs. By integrating our hardware accelerators into the RISC-V processor, the version with the best time-area product generates a key pair (that can be used to...
Data Oblivious ISA Extensions for Side Channel-Resistant and High Performance Computing
Jiyong Yu, Lucas Hsiung, Mohamad El Hajj, Christopher W. Fletcher
Foundations
Blocking microarchitectural (digital) side channels is one of the most pressing challenges in hardware security today. Recently, there has been a surge of effort that attempts to block these leakages by writing programs data obliviously. In this model, programs are written to avoid placing sensitive data-dependent pressure on shared resources. Despite recent efforts, however, running data oblivious programs on modern machines today is insecure and low performance. First, writing programs...
Secure Boot and Remote Attestation in the Sanctum Processor
Ilia Lebedev, Kyle Hogan, Srinivas Devadas
Foundations
During the secure boot process for a trusted execution environment, the processor must provide a chain of certificates to the remote client demonstrating that their secure container was established as specified. This certificate chain is rooted at the hardware manufacturer who is responsible for constructing chips according to the correct specification and provisioning them with key material. We consider a semi-honest manufacturer who is assumed to construct chips correctly, but may attempt...
Concealing Secrets in Embedded Processors Designs
Hannes Gross, Manuel Jelinek, Stefan Mangard, Thomas Unterluggauer, Mario Werner
Implementation
Side-channel analysis (SCA) attacks pose a serious threat to embedded systems. So far, the research on masking as a countermeasure against SCA focuses merely on cryptographic algorithms, and has either been implemented for particular hardware or software implementations. However, the drawbacks of protecting specific implementations are the lack of flexibility in terms of used algorithms, the impossibility to update protected hardware implementations, and long development cycles for...
Sanctum: Minimal Hardware Extensions for Strong Software Isolation
Victor Costan, Ilia Lebedev, Srinivas Devadas
Applications
Sanctum offers the same promise as SGX, namely strong provable isolation of software modules running concurrently and sharing resources, but protects against an important class of additional software attacks that infer private information from a program's memory access patterns. We follow a principled approach to eliminating entire attack surfaces through isolation, rather than plugging attack-specific privacy leaks.
Sanctum demonstrates that strong software isolation is achievable with a...
Recent advances in quantum computing pose a threat to the security of digital communications, as large-scale quantum machines can break commonly used cryptographic algorithms, such as RSA and ECC. To mitigate this risk, post-quantum cryptography (PQC) schemes are being standardized, with recent NIST recommendations selecting two lattice-based algorithms: ML-KEM for key encapsulation and ML-DSA for digital signatures. Two computationally intensive kernels dominate the execution of these...
Inner product masking is a well-studied masking countermeasure against side-channel attacks. IPM-FD further extends the IPM scheme with fault detection capabilities. However, implementing IPM-FD in software especially on embedded devices results in high computational overhead. Therefore, in this work we perform a detailed analysis of all building blocks for IPM-FD scheme and propose a Masked Processing Unit to accelerate all operations, for example multiplication and IPM-FD specific...
FrodoKEM provides conservative post-quantum security through unstructured lattices, yet its deployment on embedded systems is historically constrained by high memory requirements. While state-of-the-art implementations mitigate this by generating the public matrix on-the-fly, they remain bottlenecked by the sequential generation of secret matrices, which enforces a rigid trade-off between stack usage and recomputation overhead. To address this, we propose a blockwise secret generation...
Software masking—particularly through threshold implementations—has long been regarded as a foundational defense mechanism against side-channel attacks. These schemes enforce the principles of non-completeness and uniformity, offering provable first-order resistance even under realistic leakage assumptions. However, such assurances were primarily developed under the simplified assumption of scalar or in-order execution, where instruction flow and data dependencies are well-behaved and...
This work presents a RISC-V extension for Post-Quantum Cryptography (PQC) called HORCRUX, which provides a unified Instruction-Set Extension (ISE) supporting all NIST-approved PQC algorithms. HORCRUX addresses the current fragmentation in hardware support, where existing extensions typically focus on individual algorithms or limited subsets of PQC schemes, and targets the common kernels shared across ML-KEM, ML-DSA, SLH-DSA, and HQC. To address the primary computational bottlenecks of all...
Falcon, a NTRU-based digital signature algorithm, has been selected by NIST as one of the post-quantum cryptography (PQC) standards. Compared to verification, the signature generation of Falcon is relatively slow. One of the core operations in signature generation is discrete Gaussian sampling, which involves a component known as the BaseSampler. The BaseSampler accounts for up to 30% of the time required for signature generation, making it a significant performance bottleneck. This work...
HQC is a code-based cryptosystem that has recently been announced for standardization after the fourth round of the NIST post-quantum cryptography standardization process. During this process, the NIST specifically required submitters to provide two kinds of implementation: a reference one, meant to serve lisibility and compliance with the specifications; and an optimized one, aimed at showing the performance of the scheme alongside other desirable properties such as resilience against...
We evaluate the implementation aspects of Rijndael-256 using the ratified RISC-V Vector Cryptography extension Zvkn. A positive finding is that Rijndael-256 can be implemented in constant time with the existing RISC-V ISA as the critical AES and fixed crossbar permutation instructions are in the DIEL (data-independent execution latency) set. Furthermore, simple tricks can be used to expand the functionality of key expansion instructions to cover the additional round constants required....
Fault injection attacks (FIA) pose significant security threats to embedded systems as they exploit weaknesses across multiple layers, including system software, instruction set architecture (ISA), microarchitecture, and physical hardware. Early detection and understanding of how physical faults propagate to system-level behavior are essential to safeguarding cyberinfrastructure. This work introduces CRAFT, a framework that combines pre-silicon analysis with post-silicon validation to...
zkVMs are SNARKs for verifying CPU execution. They allow an untrusted prover to show that it correctly ran a specified program on a witness, where the program is given as bytecode conforming to an instruction set architecture like RISC-V. Existing zkVMs still struggle with high prover resource costs, notably large runtime and memory usage. We show how to implement Jolt—an advanced, sum-check- based zkVM—with a significantly reduced memory footprint, without relying on SNARK recursion, and...
Both masking and shuffling are very common software countermeasures against side-channel attacks. However, exploring possible combinations of the two countermeasures to increase and fine-tune side-channel resilience is less investigated. With this work, we aim to bridge that gap by both concretising the security guarantees of several masking and shuffling combinations presented in earlier work and additionally investigating their randomness cost. We subsequently implement these approaches to...
Side-channel attacks pose significant threats to cryptographic implementations, which require the inclusion of countermeasures to mitigate these attacks. In this work, we study the masking of state-of-the-art post-quantum signatures based on the MPC-in-the-head paradigm. More precisely, we focus on the recent threshold-computation-in-the-head (TCitH) framework that applies to some NIST candidates of the post-quantum standardization process. We first provide an analysis of side-channel attack...
Logic locking has surfaced as a notable safeguard against diverse hazards that pose a risk to the integrated circuit (IC) supply chain. Existing literature on logic locking largely encompasses the art of proposing new constructions, on the one hand, and unearthing weaknesses in such algorithms on the other. Somehow, in this race of make and break, the stress on automation of adopting such techniques on real-life circuits has been rather limited. For the first time, we present a...
A memory checking argument enables a prover to prove to a verifier that it is correctly processing reads and writes to memory. They are used widely in modern SNARKs, especially in zkVMs, where the prover proves the correct execution of a CPU including the correctness of memory operations. We describe a new approach for memory checking, which we call the method of one-hot addressing and increments. We instantiate this method via two different families of protocols, called Twist and Shout....
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation but imposes significant computational and communication overhead on the client for the public-key encryption. To alleviate this burden, previous works have introduced the Hybrid Homomorphic Encryption (HHE) paradigm, which combines symmetric encryption with homomorphic decryption to enhance performance for the FHE client. While early HHE schemes focused on binary data, modern versions now support integer prime fields,...
Side-channel attacks pose a serious risk to cryptographic implementations, particularly in embedded systems. While current methods, such as test vector leakage assessment~(TVLA), can identify leakage points, they do not provide insights into their root causes. We propose ARCHER, an architecture-level tool designed to perform side-channel analysis and root cause identification for software cryptographic implementations on RISC-V processors. ARCHER has two main components: (1) Side-Channel...
Lookups are a popular way to express repeated constraints in state-of-the art SNARKs. This is especially the case for zero-knowledge virtual machines (zkVMs), which produce succinct proofs of correct execution for programs expressed as bytecode according to a specific instruction set architecture (ISA). The Jolt zkVM (Arun, Setty & Thaler, Eurocrypt 2024) for RISC-V ISA employs Lasso (Setty, Thaler & Wahby, Eurocrypt 2024), an efficient lookup argument for massive structured tables, to prove...
We explore the use of microbenchmarks, small assembly code snippets, to detect microarchitectural side-channel leakage in CPU implementations. Specifically, we investigate the effectiveness of microbenchmarks in diagnosing the predisposition to side-channel leaks in two commonly used RISC-V cores: Picorv32 and Ibex. We propose a new framework that involves diagnosing side-channel leaks, identifying leakage points, and constructing leakage profiles to understand the underlying causes. We...
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
We introduce NeutronNova, a new folding scheme for the zero-check relation: an instance-witness pair is in the zero-check relation if a corresponding multivariate polynomial evaluates to zero for all inputs over a suitable Boolean hypercube. The folding scheme is a two-round protocol, and it internally invokes a \emph{single} round of the sum-check protocol. The folding scheme is more efficient than prior state-of-the-art schemes and directly benefits from recent improvements to the...
With the standardization of NIST post-quantum cryptographic (PQC) schemes, optimizing these PQC schemes across various platforms presents significant research value. While most existing software implementation efforts have concentrated on ARM platforms, research on PQC implementations utilizing various RISC-V instruction set architectures (ISAs) remains limited. In light of this gap, this paper proposes comprehensive and efficient optimizations of Keccak, Kyber, and Dilithium on...
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
Contemporary methods for hardware security verification struggle with adaptability, scalability, and availability due to the increasing complexity of the modern system-on-chips (SoCs). Large language models (LLMs) have emerged as a viable approach to address these shortcomings in security verification because of their natural language understanding, advanced reasoning, and knowledge transfer capabilities. However, their application to large designs is limited by inherent token limitation and...
Software based cryptographic implementations provide flexibility but they face performance limitations. In contrast, hardware based cryptographic accelerators utilize application-specific customization to provide real-time security solutions. Cryptographic instruction-set extensions (CISE) combine the advantages of both hardware and software based solutions to provide higher performance combined with the flexibility of atomic-level cryptographic operations. While CISE is widely used to...
In this paper, we present efficient protected software implementations of the authenticated cipher Ascon, the recently announced winner of the NIST standardization process for lightweight cryptography. Our implementations target theoretical and practical security against second-order power analysis attacks. First, we propose an efficient second-order extension of a previously presented first-order masking of the Keccak S-box that does not require online randomness. The extension...
Hardening microprocessors against side-channel attacks is a critical aspect of ensuring their security. A key step in this process is identifying and mitigating “leaky" hardware modules, which leak information during the execution of cryptographic algorithms. In this paper, we explore how different leakage detection methods, the Side-channel Vulnerability Factor (SVF) and the Test Vector Leakage Assessment (TVLA), contribute to hardening of microprocessors. We conduct experiments on two...
We report on efficient and secure hardware implementation techniques for the FIPS 205 SLH-DSA Hash-Based Signature Standard. We demonstrate that very significant overall performance gains can be obtained from hardware that optimizes the padding formats and iterative hashing processes specific to SLH-DSA. A prototype implementation, SLotH, contains Keccak/SHAKE, SHA2-256, and SHA2-512 cores and supports all 12 parameter sets of SLH-DSA. SLotH also supports side-channel secure PRF computation...
GLS254 is an elliptic curve defined over a finite field of characteristic 2; it contains a 253-bit prime order subgroup, and supports an endomorphism that can be efficiently computed and helps speed up some typical operations such as multiplication of a curve element by a scalar. That curve offers on x86 and ARMv8 platforms the best known performance for elliptic curves at the 128-bit security level. In this paper we present a number of new results related to GLS254: - We describe...
This paper explores the challenges and potential solutions of implementing the recommended upcoming post-quantum cryptography standards (the CRYSTALS-Dilithium and CRYSTALS-Kyber algorithms) on resource constrained devices. The high computational cost of polynomial operations, fundamental to cryptography based on ideal lattices, presents significant challenges in an efficient implementation. This paper proposes a hardware/software co-design strategy using RISC-V extensions to optimize...
Fault injection attacks represent an effective threat to embedded systems. Recently, Laurent et al. have reported that fault injection attacks can leverage faults inside the microarchitecture. However, state-of-the-art counter-measures, hardware-only or with hardware support, do not consider the integrity of microarchitecture control signals that are the target of these faults. We present MAFIA, a microarchitecture protection against fault injection attacks. MAFIA ensures integrity of...
Splitting up sensitive data into multiple shares – termed masking – has proven an effective countermeasure against various types of Side-Channel Analysis (SCA) on cryptographic implementations. However, in software this approach not only leads to dramatic performance overheads for non-linear operations, but also suffers from microarchitectural leakage, which is hard to avoid. Both problems can be addressed with one solution: masked hardware accelerators. In this context, Gao et al. [GGM+...
Succinct Non-interactive Arguments of Knowledge (SNARKs) allow an untrusted prover to establish that it correctly ran some "witness-checking procedure" on a witness. A zkVM (short for zero-knowledge Virtual Machine) is a SNARK that allows the witness-checking procedure to be specified as a computer program written in the assembly language of a specific instruction set architecture (ISA). A $\textit{front-end}$ converts computer programs into a lower-level representation such as an...
This paper introduces Lasso, a new family of lookup arguments, which allow an untrusted prover to commit to a vector $a \in \mathbb{F}^m$ and prove that all entries of a reside in some predetermined table $t \in \mathbb{F}^n$. Lasso’s performance characteristics unlock the so-called "lookup singularity". Lasso works with any multilinear polynomial commitment scheme, and provides the following efficiency properties. For $m$ lookups into a table of size $n$, Lasso’s prover commits to just...
Electromagnetic Fault Injection (EMFI) has been demonstrated to be useful for both academic and industrial research. Due to the dangerous voltages involved, most work is done with commercial tools. This paper introduces a safety-focused low-cost and open-source design that can be built for less than \$50 using only off-the-shelf parts. The paper also introduces an iCE40 based Time-to-Digital Converter (TDC), which is used to visualize the glitch inserted by the EMFI tool. This...
Proofs for machine computation prove the correct execution of arbitrary programs that operate over fixed instruction sets (e.g., RISC-V, EVM, Wasm). A standard approach for proving machine computation is to prove a universal set of constraints that encode the full instruction set at each step of the program execution. This approach incurs a proving cost per execution step on the order of the total sum of instruction constraints for all of the instructions in the set, despite each step of the...
Even given a state-of-the-art masking scheme, masked software implementation of some cryptography functionality can pose significant challenges stemming, e.g., from simultaneous requirements for efficiency and security. In this paper we design an Instruction Set Extension (ISE) to address a specific element of said challenge, namely the elimination of leakage stemming from architectural and micro-architectural overwriting. Conceptually, the ISE allows a leakage-focused behavioural hint to be...
Cryptographic software running on embedded devices requires protection against physical side-channel attacks such as power analysis. Masking is a widely deployed countermeasure against these attacksand is directly implemented on algorithmic level. Many works study the security of masked cryptographic software on CPUs, pointing out potential problems on algorithmic/microarchitecture-level, as well as corresponding solutions, and even show masked software can be implemented efficiently and...
We introduce HyperNova, a new recursive argument for proving incremental computations whose steps are expressed with CCS (Setty et al. ePrint 2023/552), a customizable constraint system that simultaneously generalizes Plonkish, R1CS, and AIR without overheads. HyperNova makes four contributions, each resolving a major problem in the area of recursive arguments. First, it provides a folding scheme for CCS where the prover’s cryptographic cost is a single multi-scalar multiplication (MSM)...
Physical side-channel attacks are a major threat to stealing confidential data from devices. There has been a recent surge in such attacks on edge machine learning (ML) hardware to extract the model parameters. Consequently, there has also been some work, although limited, on building corresponding side-channel defenses against such attacks. All the current solutions either take the fully software or fully hardware-centric approaches, which are limited either in performance or...
Embedded systems are a cornerstone of the ongoing digitization of our society, ranging from expanding markets around IoT and smart-X devices over to sensors in autonomous driving, medical equipment or critical infrastructures. Since a vast amount of embedded systems are safety-critical (e.g., due to their operation site), security is a necessity for their operation. However, unlike mobile, desktop, and server systems, where adversaries typically only act have remote access, embedded systems...
This paper introduces SuperNova, a new recursive proof system for incrementally producing succinct proofs of correct execution of programs on a stateful machine with a particular instruction set (e.g., EVM, RISC-V). A distinguishing aspect of SuperNova is that the cost of proving a step of a program is proportional only to the size of the circuit representing the instruction invoked by the program step. This is a stark departure from prior works that employ universal circuits where the cost...
The NIST LightWeight Cryptography (LWC) selection process aims to standardise cryptographic functionality which is suitable for resource-constrained devices. Since the outcome is likely to have significant, long-lived impact, careful evaluation of each submission with respect to metrics explicitly outlined in the call is imperative. Beyond the robustness of submissions against cryptanalytic attack, metrics related to their implementation (e.g., execution latency and memory footprint) form an...
This paper provides necessary properties to algorithmically secure first-order maskings in scalar micro-architectures. The security notions of threshold implementations are adapted following micro-processor leakage effects which are known to the literature. The resulting notions, which are based on the placement of shares, are applied to a two-share randomness-free PRESENT cipher and Keccak-f. The assembly implementations are put on a RISC-V and an ARM Cortex-M4 core. All designs are...
We present a key-recovery fault injection attack on the Classic McEliece Key Encapsulation Mechanism (KEM). The fault injections target the error-locator polynomial of the Goppa code and the validity checks in the decryption algorithm, making a chosen ciphertext attack possible. Faulty decryption outputs are used to generate a system of polynomial equations in the secret support elements of the Goppa code. After solving the equations, we can determine a suitable Goppa polynomial and form an...
Lightweight cryptography is a viable solution for constrained computational environments that require a secure communication channel. To standardize lightweight primitives, NIST has published a call for algorithms that address needs like compactness, low-latency, low-power/energy, etc. Among the candidates, the GIFT family of block ciphers was utilized in various NIST candidates due to its high-security margin and small gate footprint. As a result of their hardware-oriented design, software...
Microarchitectural side-channel vulnerabilities in modern processors are known to be a powerful attack vector that can be utilized to bypass common security boundaries like memory isolation. As shown by recent variants of transient execution attacks related to Spectre and Meltdown, those side channels allow to leak data from the microarchitecture to the observable architectural state. The vast majority of attacks currently build on the cache-timing side channel, since it is easy to exploit...
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1 600] permutation, which operates on an internal state of 1 600 bits, mostly represented as a 5×5×64-bit matrix. While software implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1 600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of Keccak-f[1...
The protection of cryptographic software implementations against power-analysis attacks is critical for applications in embedded systems. A commonly used algorithmic countermeasure against these attacks is masking, a secret-sharing scheme that splits a sensitive computation into computations on multiple random shares. In practice, the security of masking schemes relies on several assumptions that are often violated by microarchitectural side-effects of CPUs. Many past works address this...
CRYSTALS-Dilithium and Falcon are digital signature algorithms based on cryptographic lattices, that are considered secure even if large-scale quantum computers will be able to break conventional public-key cryptography. Both schemes have been selected for standardization in the NIST post-quantum competition. In this work, we present a RISC-V HW/SW odesign that aims to combine the advantages of software- and hardware implementations, i.e. flexibility and performance. It shows the use of ...
This paper demonstrates the first side-channel attack on homomorphic encryption (HE), which allows computing on encrypted data. We reveal a power-based side-channel leakage of Microsoft SEAL prior to v3.6 that implements the Brakerski/Fan-Vercauteren (BFV) protocol. Our proposed attack targets the Gaussian sampling in the SEAL’s encryption phase and can extract the entire message with a single power measurement. Our attack works by (1) identifying each coefficient index being sampled, (2)...
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1,600] permutation, which operates on an internal state of 1,600 bits, mostly represented as a $5\times5\times64{-}bit$ matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1,600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of...
Saber is a module-lattice-based key encapsulation scheme that has been selected as a finalist in the NIST Post-Quantum Cryptography Standardization Project. As Saber computes on considerably large matrices and vectors of polynomials, its efficient implementation on memory-constrained IoT devices is very challenging. In this paper, we present an implementation of Saber with a minor tweak to the original Saber protocol for achieving reduced memory consumption and better performance. We call...
Supporting masking countermeasures for non-invasive side-channel security in instructions set architectures is a hard problem. Masked operations often have a large number of inputs and outputs, and enabling portable higher order masking has remained a difficult. However, there are clear benefits to enabling this in terms of performance, code density and security guarantees. We present SME, an instruction set extension for enabling secure and efficient software masking of cryptographic code...
Thermal jitter (phase noise) from a free-running ring oscillator is a common, easily implementable physical randomness source in True Random Number Generators (TRNGs). We show how to evaluate entropy, autocorrelation, and bit pattern distributions of ring oscillator noise sources, even with low jitter levels or some bias. Entropy justification is required in NIST 800-90B and AIS-31 testing and for applications such as the RISC-V entropy source extension. Our numerical evaluation algorithms...
Existing logic-locking attacks are known to successfully decrypt a functionally correct key of a locked combinational circuit. Extensions of these attacks to real-world Intellectual Properties (IPs, which are sequential circuits) have been demonstrated through the scan-chain by selectively initializing the combinational logic and analyzing the responses. In this paper, we propose SeqL+ to mitigate a broad class of such attacks. The key idea is to lock selective functional-input/scan-output...
The bitsliced programming model has shown to boost the throughput of software programs. However, on a standard architecture, it exerts a high pressure on register access, causing memory spills and restraining the full potential of bitslicing. In this work, we present architecture support for bitslicing in a System-on-Chip. Our hardware extensions are of two types; internal to the processor core, in the form of custom instructions, and external to the processor, in the form of direct memory...
Predicting the level and exploitability of side-channel leakage from complex SoC design is a challenging task. We present Saidoyoki, a test platform that enables the assessment of side-channel leakage under two different settings. The first is pre-silicon side-channel leakage estimation in SoC, and it requires the use of fast side-channel leakage estimation from a high level design description. The second is post-silicon side-channel leakage measurement and analysis in SoC, and it requires a...
Simpira Permutation is a Permutation design using the AES algorithm. The AES algorithm is the most widely used in the world, and Intel has developed a hardware accelerated AES instruction set (AES-NI) to improve the performance of encryption. By using AES-NI, Simpira can be improved further. However, low-end processors that do not support AES-NI require efficient implementation of Simpira optimization. In this paper, we introduce a optimized implementation of a Simpira Permutation in 8-bit...
Polynomial multiplication is a core operation for public key cryptography, such as pre-quantum cryptography (e.g. elliptic curve cryptography) and post-quantum cryptography (e.g. code-based cryptography and multivariate-based cryptography). For this reason, the efficient and secure implementation of polynomial multiplication has been actively conducted for high availability and security level in application services. In this paper, we present all polynomial multiplication methods on modern...
Physical side-channel attacks like power analysis pose a serious threat to cryptographic devices in real-world applications. Consequently, devices implement algorithmic countermeasures like masking. In the past, works on the design and verification of masked software implementations have mostly focused on simple microprocessors that find usage on smart cards. However, many other applications such as in the automotive industry require side-channel protected cryptographic computations on much...
Side-channel analysis (SCA) attacks pose a major threat to embedded systems due to their ease of accessibility. Realising SCA resilient cryptographic algorithms on embedded systems under tight intrinsic constraints, such as low area cost, limited computational ability, etc., is extremely challenging and often not possible. We propose a seamless and effective approach to realise a generic countermeasure against SCA attacks. XDIVINSA, an extended diversifying instruction agent, is introduced...
ChaCha is a high-throughput stream cipher designed with the aim of ensuring high-security margins while achieving high performance on software platforms. RISC-V, an emerging, free, and open Instruction Set Architecture (ISA) is being developed with many instruction set extensions (ISE). ISEs are a native concept in RISC-V to support a relatively small RISC-V ISA to suit different use-cases including cryptographic acceleration via either standard or custom ISEs. This paper proposes a...
As the Internet of Things (IoT) rolls out today to devices whose lifetime may well exceed a decade, conservative threat models should consider attackers with access to quantum computing power. The SUIT standard (specified by the IETF) defines a security architecture for IoT software updates, standardizing the metadata and the cryptographic tools---namely, digital signatures and hash functions---that guarantee the legitimacy of software updates. While the performance of SUIT has...
Abstract—CARiMoL is a novel run-time Configurable Hardware Accelerator for Ring and Module Lattice-based postquantum cryptography. It’s flexible design can be configured to key-pair generation, encapsulation, and decapsulation for NewHope and CRYSTALS-Kyber schemes using same hardware. CARiMoL offers run-time configurability for multiple security levels of NewHope and CRYSTALS-Kyber schemes, supporting both Chosen-Plaintext Attack (CPA) and Chosen-Ciphertext Attack (CCA) secure...
Today’s society depends on interconnected electronic devices, which handle various sensitive information. Due to the knowledge needed to develop these devices and the economic advantage of reusable solutions, most of these systems contain Third-Party Intellectual Property (3PIP) cores that might not be trustworthy. If one of these 3PIP cores is vulnerable, the security of the entire device is potentially affected. As a result, sensitive data that is processed by the device can be leaked to...
The SM4 block cipher is a Chinese domestic crpytographic that was introduced in 2003. Since the algorithm was developed for the use in wireless sensor networks, it is mandated in the Chinese National Standard for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). The SM4 block cipher uses a 128-bit block size and a 32-bit round key. This consists of 32 rounds and one reverse translation \texttt{R}. In this paper, we present the optimized implementation of the SM4 block...
Today's microprocessors often rely on microcode updates to address issues such as security or functional patches. Unfortunately, microcode update flexibility opens up new attack vectors through malicious microcode alterations. Such attacks share many features with hardware Trojans and have similar devastating consequences for system security. However, due to microcode's opaque nature, little is known in the open literature about the capabilities and limitations of microcode Trojans. We...
Software implementations of cryptographic algorithms are slow but highly flexible and relatively easy to implement. On the other hand, hardware implementations are usually faster but provide little flexibility and require a lot of time to implement efficiently. In this paper, we develop a hybrid software-hardware implementation of the third round of Supersingular Isogeny Key Encapsulation (SIKE), a post-quantum cryptography algorithm candidate for NIST. We implement an isogeny field...
Lattice-based structures offer considerable possibilities for post-quantum cryptography. Recently, many algorithms have been built on hard lattice problems. The three of the remaining four in the final round of the post-quantum cryptography standardization process use lattice-based methods. Especially in embedded systems, these algorithms should be operated effectively. In this study, the potential of transport triggered architecture is examined in this sense. We try to compare open source...
Side-channel attacks can break mathematically secure cryptographic systems leading to a major concern in applied cryptography. While the cryptanalysis and security evaluation of Post-Quantum Cryptography (PQC) have already received an increasing research effort, a cost analysis of efficient side-channel countermeasures is still lacking. In this work, we propose a masked HW/SW codesign of the NIST PQC finalists Kyber and Saber, suitable for their different characteristics. Among others, we...
The protection of cryptographic implementations against power analysis attacks is of critical importance for many applications in embedded systems. The typical approach of protecting against these attacks is to implement algorithmic countermeasures, like masking. However, implementing these countermeasures in a secure and correct manner is challenging. Masking schemes require the independent processing of secret shares, which is a property that is often violated by CPU microarchitectures in...
RISC-V is equipped with physical memory protection (PMP) to prevent malicious software from accessing protected memory regions. One of the main objectives of PMP is to provide a trusted execution environment (TEE) that isolates secure and insecure applications. In this study, we propose a fault injection attack to bypass the isolation based on PMP. The proposed attack scheme involves extracting successful glitch parameters for fault injection under the assumption of a black-box environment....
The fixslicing implementation strategy was originally introduced as a new representation for the hardware-oriented GIFT block cipher to achieve very efficient software constant-time implementations. In this article, we show that the fundamental idea underlying the fixslicing technique is not of interest only for GIFT, but can be applied to other ciphers as well. Especially, we study the benefits of fixslicing in the case of AES and show that it allows to reduce by 52% the amount of...
Ascon-p is the core building block of Ascon, the winner in the lightweight category of the CAESAR competition. With ISAP, another Ascon-p-based AEAD scheme is currently competing in the 2nd round of the NIST lightweight cryptography standardization project. In contrast to Ascon, ISAP focuses on providing hardening/protection against a large class of implementation attacks, such as DPA, DFA, SFA, and SIFA, entirely on mode-level. Consequently, Ascon-p can be used to realize a wide range of...
Secure, efficient execution of AES is an essential requirement on most computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. RISC-V is a (relatively) new ISA that lacks such a standardised ISE. We survey the state-of-the-art industrial and academic ISEs for AES, implement and evaluate five different ISEs, one of which is novel. We recommend separate ISEs for 32 and 64-bit base architectures, with measured performance improvements for an AES-128...
Fault Injection (FI) attacks have become a practical threat to modern cryptographic implementations. Such attacks have recently focused more on exploitation of implementation-centric and device-specific properties of the faults. In this paper, we consider the parallel between SCA attacks and FI attacks; specifically, that many FI attacks rely on the data-dependency of activation and propagation of a fault, and SCA attacks similarly rely on data-dependent power usage. In fact, these are so...
The RISC-V True Random Number Generator (TRNG) architecture breaks with previous ISA TRNG practice by splitting the Entropy Source (ES) component away from cryptographic DRBGs into a separate privileged interface, and in its use of polling. The modular approach is suitable for the RISC-V hardware IP ecosystem, allows a significantly smaller implementation footprint on platforms that need it, while directly supporting current standards compliance testing methods. We describe the interface,...
A major challenge when applying cryptography on constrained environments is the trade-off between performance and security. In this work, we analyzed different strategies for the optimization of several candidates of NIST's lightweight cryptography standardization project on a RISC-V architecture. In particular, we studied the general impact of optimizing symmetric-key algorithms in assembly and in plain C. Furthermore, we present optimized implementations, achieving a speed-up of up to 81%...
In both hardware and software, masking can represent an effective means of hardening an implementation against side channel attack vectors such as Differential Power Analysis (DPA). Focusing on software, however, the use of masking can present various challenges: specifically, it often 1) requires significant effort to translate any theoretical security properties into practice, and, even then, 2) imposes a significant overhead in terms of efficiency. To address both challenges, this paper...
Cryptography is one of the basic phenomena of security systems. However, some of the widely used public key cryptography algorithms can be broken by using quantum computers. Therefore, many post-quantum cryptography algorithms are proposed in recent years to handle this issue. NTRU is one of the most important of these quantum-safe algorithms. Besides the importance of cryptography algorithms, the architecture where they are implemented is also essential. In this study, we developed an NTRU...
An important selling point for the RISC-V instruction set is the separation between ISA and the implementation of the ISA, leading to flexibility in the design. We argue that for secure implementations, this flexibility is often a vulnerability. With a hardware attacker, the side-effects of instruction execution cannot be ignored. As a result, a strict separation between the ISA interface and implementation is undesirable. We suggest that secure ISA may require additional implementation...
Empowering electronic devices to support Post-Quantum Cryptography (PQC) is a challenging task. PQC introduces new mathematical elements and operations which are usually not easy to implement on standard processors. Especially for low cost and resource constraint devices, hardware acceleration is usually required. In addition, as the standardization process of PQC is still ongoing, a focus on maintaining flexibility is mandatory. To cope with such requirements, hardware/software co-design...
This paper presents a set of efficient and parameterized hardware accelerators that target post-quantum lattice-based cryptographic schemes, including a versatile cSHAKE core, a binary-search CDT-based Gaussian sampler, and a pipelined NTT-based polynomial multiplier, among others. Unlike much of prior work, the accelerators are fully open-sourced, are designed to be constant-time, and can be parameterized at compile-time to support different parameters without the need for re-writing the...
We present and evaluate a custom extension to the RISC-V instruction set for finite fields arithmetic. The result serves as a very compact approach to software-hardware co-design of PQC implementations in the context of small embedded processors such as smartcards. The extension provides instructions that implement finite field operations with subsequent reduction of the result. As small finite fields are used in various PQC schemes, such instructions can provide a considerable speedup for...
We investigate multiple implementations of a hash-based digital signature scheme in software and hardware for a RISC-V processor. For this, different instantiations of XMSS by leveraging SHA-256 and SHA-3 are considered. Moreover, we propose various optimisations for accelerating the signature scheme on resource-constrained FPGAs. Compared to the pure software version, the implemented hardware accelerators for SHA-256 and SHA-3 achieve a significant speedup of 25x and 87x respectively for...
Public key cryptography protocols, such as RSA and elliptic curve cryptography, will be rendered insecure by Shor’s algorithm when large-scale quantum computers are built. Cryptographers are working on quantum-resistant algorithms, and lattice-based cryptography has emerged as a prime candidate. However, high computational complexity of these algorithms makes it challenging to implement lattice-based protocols on low-power embedded devices. To address this challenge, we present Sapphire – a...
SNEIK is a family of lightweight cryptographic algorithms derived from a single 512-bit permutation. The SNEIGEN ``entropy distribution function'' was designed to speed up certain functions in post-quantum and lattice-based public key algorithms. We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an implementation for the open-source RISC-V (RV32I) Instruction Set Architecture (ISA). Our results demonstrate...
RISC-V is a promising free and open-source instruction set architecture. Most of the instruction set has been standardized and several hardware implementations are commercially available. In this paper we highlight features of RISC-V that are interesting for optimizing implementations of cryptographic primitives. We provide the first optimized assembly implementations of table-based AES, bitsliced AES, ChaCha, and the Keccak-$f$[1600] permutation for the RV32I instruction set. With respect...
Existing logic-locking attacks are known to successfully decrypt a functionally correct key of a locked combinational circuit. It is possible to extend these attacks to real-world Silicon-based Intellectual Properties (IPs, which are sequential circuits) through the scan-chain by selectively initializing the combinational logic and analyzing the responses. In this paper, we propose SeqL, which achieves functional isolation and locks selective functional-input/scan-output pairs, thus...
We describe a software-hardware co-design for the hash-based post-quantum signature scheme XMSS on a RISC-V embedded processor. We provide software optimizations for the XMSS reference implementation for SHA-256 parameter sets and several hardware accelerators that allow to balance area usage and performance based on individual needs. By integrating our hardware accelerators into the RISC-V processor, the version with the best time-area product generates a key pair (that can be used to...
Blocking microarchitectural (digital) side channels is one of the most pressing challenges in hardware security today. Recently, there has been a surge of effort that attempts to block these leakages by writing programs data obliviously. In this model, programs are written to avoid placing sensitive data-dependent pressure on shared resources. Despite recent efforts, however, running data oblivious programs on modern machines today is insecure and low performance. First, writing programs...
During the secure boot process for a trusted execution environment, the processor must provide a chain of certificates to the remote client demonstrating that their secure container was established as specified. This certificate chain is rooted at the hardware manufacturer who is responsible for constructing chips according to the correct specification and provisioning them with key material. We consider a semi-honest manufacturer who is assumed to construct chips correctly, but may attempt...
Side-channel analysis (SCA) attacks pose a serious threat to embedded systems. So far, the research on masking as a countermeasure against SCA focuses merely on cryptographic algorithms, and has either been implemented for particular hardware or software implementations. However, the drawbacks of protecting specific implementations are the lack of flexibility in terms of used algorithms, the impossibility to update protected hardware implementations, and long development cycles for...
Sanctum offers the same promise as SGX, namely strong provable isolation of software modules running concurrently and sharing resources, but protects against an important class of additional software attacks that infer private information from a program's memory access patterns. We follow a principled approach to eliminating entire attack surfaces through isolation, rather than plugging attack-specific privacy leaks. Sanctum demonstrates that strong software isolation is achievable with a...