Hash Functions

The document provides an introduction to hash functions, explaining their role in ensuring data integrity through the creation of a short 'fingerprint' or message digest of data. It discusses the security aspects of hash functions, including the challenges of preimage, second preimage, and collision problems, as well as the random oracle model which serves as an idealized framework for analyzing hash functions. Additionally, it covers various algorithms for finding preimages and collisions, highlighting the relationships between different security criteria.

Introduction to Hash Functions

Sugata Gangopadhyay

Indian Institute of Technology Roorkee

Sugata Gangopadhyay (CSE IITR) Introduction to Hash Functions 1 / 77


Hash functions

A cryptographic hash function provides assurance of data integrity.

A hash function is used to construct a short “fingerprint” of some data.

Even if the data is stored in an insecure place, its integrity can be
checked from time to time by recomputing the fingerprint and verifying
that the fingerprint has not changed.


Hash functions: fingerprinting data, message digest

Let h be a hash function and x be some data.

Usually, x is a binary string of arbitrary length.

The corresponding fingerprint y = h(x) is said to be a message digest.

A message digest would typically be a fairly short binary string; 160 bits
is a common choice.



Hash functions: fingerprinting data, message digest

Suppose that y is stored in a secure place, but x is not.

Suppose x is changed to x′, say.

The fact that x has been altered can be detected by computing y′ = h(x′)
and verifying that y′ ≠ y.
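
The check described above can be sketched in a few lines of Python. This is only an illustration: the choice of SHA-256 from the standard `hashlib` module, the helper name `fingerprint`, and the sample messages are assumptions, not part of the slides.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the message digest y = h(x) as a hex string (SHA-256 is an assumed choice of h)."""
    return hashlib.sha256(data).hexdigest()

x = b"transfer 100 rupees to Bob"
y = fingerprint(x)                      # y is kept in a secure place

x_altered = b"transfer 900 rupees to Bob"
y_altered = fingerprint(x_altered)      # y' = h(x')

unchanged = fingerprint(x) == y         # recomputing on the intact data gives y again
tampered = y_altered != y               # the alteration shows up as y' != y
```

Only the short digest y needs protected storage; the data x itself can live anywhere.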


Keyed hash functions: message authentication code, MAC

Suppose that Alice and Bob share a secret key K which determines a
hash function, say h_K.

For a message x, the corresponding authentication tag y = h_K(x) can
be computed by Alice and Bob.

The pair (x, y) can be transmitted over an insecure channel from Alice to
Bob.

When Bob receives the pair (x, y), he can verify whether y = h_K(x).

If this condition is satisfied, he is confident that neither x nor y was
altered by an adversary, provided that the hash family is “secure”.

In particular, Bob is assured that the message x originates from Alice.
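
As a rough illustration of the Alice and Bob exchange, the sketch below uses Python's standard `hmac` module as the keyed hash h_K. The use of HMAC-SHA-256, the function names `tag` and `verify`, and the sample key are assumptions made for the example.

```python
import hashlib
import hmac

def tag(key: bytes, message: bytes) -> bytes:
    """Compute the authentication tag y = h_K(x); HMAC-SHA-256 is an assumed instantiation."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Bob's check y == h_K(x), done with a constant-time comparison."""
    return hmac.compare_digest(tag(key, message), received_tag)

shared_key = b"hypothetical shared secret"
x = b"meet at noon"
y = tag(shared_key, x)                                 # Alice sends (x, y)

accepted = verify(shared_key, x, y)                    # unmodified pair
rejected = not verify(shared_key, b"meet at one", y)   # altered message
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading bytes of a guessed tag are correct.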


Hash functions

A hash family is a four-tuple (X, Y, K, H), where the following conditions
are satisfied:

1 X is a set of possible messages
2 Y is a finite set of possible message digests or authentication tags
3 K, the keyspace, is a finite set of possible keys
4 For each K ∈ K, there is a hash function h_K ∈ H, where h_K : X → Y.


Notation and terminology

In the definition of a hash family, X could be a finite or infinite set; Y
is always a finite set.

If X is a finite set, a hash function is sometimes called a compression
function, and we always assume that |X| ≥ |Y|.

It is customary to assume the stronger condition |X| ≥ 2|Y|.

A pair (x, y) ∈ X × Y is said to be a valid pair under the key K if
h_K(x) = y.

One aim of the study of hash functions is to develop methods that resist
creation of valid pairs by adversaries.


Notation and terminology

Let F^{X,Y} denote the set of all functions from X to Y.

Suppose |X| = N and |Y| = M.

The number of all possible functions from X to Y is |F^{X,Y}| = M^N.

Any hash family F ⊆ F^{X,Y} is termed an (N, M)-hash family.

An unkeyed hash function is a function h : X → Y; it can be viewed as a
hash family with only one possible key, that is, |K| = 1.


Security of Hash Functions

Suppose that h : X → Y is an unkeyed hash function. Let x ∈ X, and
define y = h(x).

If a hash function is to be considered secure, it should be the case
that the following three problems are difficult to solve.

1 Preimage:
  Instance: A hash function h : X → Y and an element y ∈ Y.
  Find: x ∈ X such that h(x) = y.
2 Second Preimage:
  Instance: A hash function h : X → Y and an element x ∈ X.
  Find: x′ ∈ X such that x′ ≠ x and h(x′) = h(x).
3 Collision:
  Instance: A hash function h : X → Y.
  Find: x, x′ ∈ X such that x′ ≠ x and h(x′) = h(x).


Random Oracle Model

The random oracle model is an idealized model for a hash function which
attempts to capture the concept of an “ideal” hash function.

If a hash function h is well designed, it should be the case that the only
efficient way to determine the value h(x) for a given x is to actually
evaluate the function h at the value x.

This should remain true even if many other values h(x_1), h(x_2), ... have
already been calculated.


Random Oracle Model

The random oracle model, which was introduced by Bellare and
Rogaway, provides a mathematical model of an “ideal” hash function.

In this model, a hash function h : X → Y is chosen randomly from
F^{X,Y}, and we are only permitted oracle access to the function h.

This means that we are not given a formula or an algorithm to compute
values of the function h, and the only way to compute a value h(x) is to
query the oracle.


The random oracle model

Theorem
Suppose that h ∈ F^{X,Y} is chosen randomly, and let X_0 ⊆ X. Suppose that
the values h(x) have been determined (by querying an oracle for h) if and only if
x ∈ X_0. Then Pr[h(x) = y] = 1/M for all x ∈ X \ X_0 and all y ∈ Y.


Example where the random oracle model does not apply
Let h : Z_n × Z_n → Z_n be defined by h(x, y) = ax + by mod n, where
a, b ∈ Z_n and n ≥ 2 is an integer.
Suppose h(x_1, y_1) = z_1 and h(x_2, y_2) = z_2. Let r, s ∈ Z_n. Then
h(rx_1 + sx_2 mod n, ry_1 + sy_2 mod n)
= a(rx_1 + sx_2) + b(ry_1 + sy_2) mod n
= r(ax_1 + by_1) + s(ax_2 + by_2) mod n
= r h(x_1, y_1) + s h(x_2, y_2) mod n
= rz_1 + sz_2 mod n.
Suppose we are told that a hash function in use is a linear function from
Z_n × Z_n to Z_n.
Then given the hash values for any two points, we can determine the
hash values at several other points without actually evaluating the hash
function, and without knowing a and b.
This shows that such a function does not behave like a random oracle.

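
The linearity argument can be checked numerically. In the sketch below the modulus n = 101, the coefficients a = 17 and b = 42, and the sample points are arbitrary illustrative values; the point is that `predicted` is computed from z_1 and z_2 alone.

```python
def make_linear_hash(a: int, b: int, n: int):
    """Return h(x, y) = a*x + b*y mod n for fixed (secret) coefficients a and b."""
    def h(x: int, y: int) -> int:
        return (a * x + b * y) % n
    return h

n = 101
h = make_linear_hash(17, 42, n)          # a and b are hidden from the observer

# Two observed hash values.
(x1, y1), (x2, y2) = (3, 5), (7, 11)
z1, z2 = h(x1, y1), h(x2, y2)

# Predict the hash at a new point using only z1 and z2.
r, s = 4, 9
predicted = (r * z1 + s * z2) % n

# Compare with an actual evaluation at (r*x1 + s*x2, r*y1 + s*y2).
actual = h((r * x1 + s * x2) % n, (r * y1 + s * y2) % n)
```

For a random oracle, the value at the new point would still be uniformly distributed after seeing z_1 and z_2; here it is fully determined.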
Randomized algorithms
A Las Vegas algorithm is a randomized algorithm which may fail to give
an answer, but if the algorithm does return an answer, then the answer
must be correct.

Suppose 0 ≤ ε < 1 is a real number. A randomized algorithm has
average-case success probability ε if the probability that the algorithm
returns a correct answer, averaged over all problem instances of a
specified size, is at least ε.

An (ε, Q)-algorithm denotes a Las Vegas algorithm with average-case
success probability ε, in which the number of oracle queries made by the
algorithm is at most Q.

The success probability ε is the average over all possible random choices
of h ∈ F^{X,Y}, and all possible random choices of x ∈ X or y ∈ Y, if x
or y is specified as part of the problem instance.

Find-Preimage

Input: h, y, Q
choose any X_0 ⊆ X with |X_0| = Q
for x ∈ X_0 do
    if h(x) = y then
        return x
    end
end
return failure
Algorithm 1: Find-Preimage


Find-Preimage

Theorem
For any X_0 ⊆ X with |X_0| = Q, the average-case success probability of
Algorithm 1 is ε = 1 − (1 − 1/M)^Q.

Proof.
Let y ∈ Y, and X_0 = {x_1, ..., x_Q}. Let E_i denote the event “h(x_i) = y”.
In the random oracle model the events E_i are independent, with
Pr[E_i] = 1/M and Pr[E_i^c] = 1 − 1/M. Hence

\[
\Pr[E_1 \lor E_2 \lor \cdots \lor E_Q]
  = 1 - \Pr[E_1^c \land E_2^c \land \cdots \land E_Q^c]
  = 1 - \left(1 - \frac{1}{M}\right)^{Q}.
\]
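
The following sketch simulates Find-Preimage against a lazily sampled random function and compares the observed success rate with 1 − (1 − 1/M)^Q. The class `RandomOracle`, the helper names, the choice X = integers with Y = {0, ..., M−1}, and the use of `None` in place of "failure" are all assumptions made for illustration.

```python
import random

class RandomOracle:
    """Lazily sampled random function h : X -> {0, ..., M-1}."""

    def __init__(self, M, seed=None):
        self.M = M
        self.rng = random.Random(seed)
        self.table = {}                      # values of h determined so far

    def __call__(self, x):
        # A fresh point gets a uniform value; repeated queries are answered consistently.
        if x not in self.table:
            self.table[x] = self.rng.randrange(self.M)
        return self.table[x]

def find_preimage(h, y, X0):
    """Algorithm 1: query h on every x in X0; return a preimage of y, or None on failure."""
    for x in X0:
        if h(x) == y:
            return x
    return None

def predicted_success(M, Q):
    """Success probability from the theorem: 1 - (1 - 1/M)^Q."""
    return 1 - (1 - 1 / M) ** Q

def estimated_success(M, Q, trials, seed=0):
    """Average success over fresh random oracles and random targets y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = RandomOracle(M, seed=rng.getrandbits(64))
        y = rng.randrange(M)
        if find_preimage(h, y, range(Q)) is not None:
            hits += 1
    return hits / trials
```

Calling `estimated_success(50, 20, 4000)` should land close to `predicted_success(50, 20)`; the agreement improves as the number of trials grows.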


Find-Second-Preimage

Input: h, x, Q
y ← h(x)
choose X_0 ⊆ X \ {x} with |X_0| = Q − 1
for x′ ∈ X_0 do
    if h(x′) = y then
        return x′
    end
end
return failure
Algorithm 2: Find-Second-Preimage


Find-Second-Preimage

Theorem
For any X_0 ⊆ X \ {x} with |X_0| = Q − 1, the average-case success probability of
Algorithm 2 is ε = 1 − (1 − 1/M)^{Q−1}.

Proof.
Let y = h(x), and X_0 = {x_1, ..., x_{Q−1}}. Let E_i denote the event “h(x_i) = y”.
Then Pr[E_i] = 1/M and Pr[E_i^c] = 1 − 1/M, and the events are independent. Hence

\[
\Pr[E_1 \lor E_2 \lor \cdots \lor E_{Q-1}]
  = 1 - \Pr[E_1^c \land E_2^c \land \cdots \land E_{Q-1}^c]
  = 1 - \left(1 - \frac{1}{M}\right)^{Q-1}.
\]


Find-Collision

Input: h, Q
choose X_0 ⊆ X with |X_0| = Q
for x ∈ X_0 do
    y_x ← h(x)
end
if y_x = y_{x′} for some x′ ≠ x then
    return (x, x′)
else
    return failure
end
Algorithm 3: Find-Collision
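
A possible Python rendering of Find-Collision is shown below. It stores each hash value in a dictionary and returns as soon as a repeated value appears, which finds a collision within X_0 exactly when Algorithm 3 does. The function name and the use of `None` for "failure" are illustrative choices.

```python
def find_collision(h, X0):
    """Return (x, x') with x != x' and h(x) == h(x'), or None if X0 contains no collision."""
    seen = {}                     # maps a hash value y to the first x with h(x) = y
    for x in X0:
        y = h(x)
        if y in seen:
            return seen[y], x     # two distinct inputs with the same hash value
        seen[y] = x
    return None

# Toy hash with only 10 possible values: 15 inputs must contain a collision.
toy_hash = lambda x: x % 10
pair = find_collision(toy_hash, range(15))
no_pair = find_collision(toy_hash, range(5))
```

The dictionary makes the search take about Q hash evaluations and Q table lookups, rather than comparing all pairs.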


Find-Collision
Theorem
For any X_0 ⊆ X with |X_0| = Q, the success probability of Algorithm 3 is

\[
\epsilon = 1 - \left(\frac{M-1}{M}\right)\left(\frac{M-2}{M}\right)\cdots\left(\frac{M-Q+1}{M}\right).
\]

Proof.
Let X_0 = {x_1, x_2, ..., x_Q}. For 1 ≤ i ≤ Q, let E_i denote the event
h(x_i) ∉ {h(x_1), h(x_2), ..., h(x_{i−1})}. Then

\[
\Pr[E_i \mid E_1 \land E_2 \land \cdots \land E_{i-1}] = \frac{M-i+1}{M}.
\]

Therefore

\[
\Pr[E_1 \land E_2 \land \cdots \land E_Q] = \prod_{i=1}^{Q-1}\left(1 - \frac{i}{M}\right).
\]

Find-Collision

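Using the approximation 1 − x ≈ e^{−x} for small x,

\[
\Pr[E_1 \land E_2 \land \cdots \land E_Q]
  = \prod_{i=1}^{Q-1}\left(1 - \frac{i}{M}\right)
  \approx \prod_{i=1}^{Q-1} e^{-i/M}
  = e^{-\sum_{i=1}^{Q-1} i/M}
  = e^{-Q(Q-1)/(2M)}.
\]

The probability of finding at least one collision is ε ≈ 1 − e^{−Q(Q−1)/(2M)}.
Solving for Q:

\[
e^{-Q(Q-1)/(2M)} \approx 1 - \epsilon; \qquad
\frac{-Q(Q-1)}{2M} \approx \ln(1 - \epsilon); \qquad
Q^2 - Q \approx 2M \ln\frac{1}{1-\epsilon}.
\]

Ignoring the term −Q,

\[
Q \approx \sqrt{2M \ln\frac{1}{1-\epsilon}}.
\]

If we take ε = 0.5, then our estimate is Q ≈ 1.17 √M.
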
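
The estimate can be compared against the exact product formula numerically. The function names below are illustrative; M = 365 is the classical birthday setting and is used only as a familiar test value.

```python
import math

def collision_probability(M: int, Q: int) -> float:
    """Exact success probability 1 - prod_{i=1}^{Q-1} (1 - i/M)."""
    p_no_collision = 1.0
    for i in range(1, Q):
        p_no_collision *= 1 - i / M
    return 1 - p_no_collision

def queries_for_probability(M: int, eps: float) -> float:
    """Approximate number of queries Q = sqrt(2 M ln(1/(1 - eps)))."""
    return math.sqrt(2 * M * math.log(1 / (1 - eps)))

exact_23 = collision_probability(365, 23)        # a little above 0.5
estimate = queries_for_probability(365, 0.5)     # between 22 and 23
constant = queries_for_probability(1, 0.5)       # sqrt(2 ln 2), the 1.17 in the slide
```

For a 160-bit digest, M = 2^160, so the same formula gives roughly 1.17 · 2^80 hash evaluations for a collision with probability one half.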
Comparison of security criteria
Collision-To-Second-Preimage: Finding a collision is no harder than
finding a second preimage. If we have an algorithm that solves the
Second Preimage problem, then it can be used to find a collision. This is said
to be a “reduction” of the problem Collision to the problem
Second Preimage.

Collision-To-Second-Preimage
Input: h, external Oracle-2nd-Preimage
choose x ∈ X uniformly at random
if Oracle-2nd-Preimage(h, x) = x′ then
    return (x, x′)
else
    return failure
end
Algorithm 4: Collision-To-Second-Preimage

Collision-To-Preimage
Suppose that we have a (1, Q)-algorithm to solve Preimage. The question is
whether it can be used to solve Collision.
The following randomized algorithm performs that task. This is a
“reduction” of the problem Collision to the problem Preimage.

Collision-To-Preimage
Input: h, external Oracle-Preimage
choose x ∈ X uniformly at random
y ← h(x)
if Oracle-Preimage(h, y) = x′ and x′ ≠ x then
    return (x, x′)
else
    return failure
end
Algorithm 5: Collision-To-Preimage

Collision-to-Preimage

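Let x ∼ x′ if and only if h(x) = h(x′).

[x] = {x′ ∈ X : x′ ∼ x} is the equivalence class of x.
Let 𝒞 denote the set of all equivalence classes.
For x ∈ X, let y = h(x). The probability that Collision-To-Preimage is
successful is (|[x]| − 1)/|[x]|, because x is equally likely to be any of the
|[x]| elements of its class and only one of them equals the oracle's answer x′.
The average probability of success is

\[
\Pr[\text{success}]
  = \frac{1}{|X|}\sum_{x\in X}\frac{|[x]|-1}{|[x]|}
  = \frac{1}{|X|}\sum_{C\in\mathcal{C}}\sum_{x\in C}\frac{|C|-1}{|C|}
  = \frac{1}{|X|}\sum_{C\in\mathcal{C}}(|C|-1)
  = \frac{|X|-|\mathcal{C}|}{|X|}
  \ge \frac{|X|-|Y|}{|X|}
  = 1-\frac{|Y|}{|X|}
  \ge \frac{1}{2}, \quad \text{if } |X|\ge 2|Y|.
\]

The inequality in the middle holds because there are at most |Y| equivalence classes.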
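
The averaging argument can be checked exactly on a small example. The sketch below evaluates the sum of (|[x]| − 1)/|[x]| over a finite X with exact rational arithmetic; the function name and the toy hash functions are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

def collision_to_preimage_success(h, X):
    """Average over x in X of (|[x]| - 1) / |[x]|, where [x] is the class of x under h."""
    X = list(X)
    class_size = Counter(h(x) for x in X)        # |[x]| indexed by the hash value h(x)
    total = sum(Fraction(class_size[h(x)] - 1, class_size[h(x)]) for x in X)
    return total / len(X)

# |X| = 12, |Y| = 4, all four classes have size 3.
balanced = collision_to_preimage_success(lambda x: x % 4, range(12))

# Same X and Y, but only two classes are non-empty (sizes 10 and 2).
skewed = collision_to_preimage_success(lambda x: 0 if x < 10 else 1, range(12))

lower_bound = 1 - Fraction(4, 12)                # 1 - |Y|/|X|
```

In the balanced case the bound 1 − |Y|/|X| is met with equality; in the skewed case fewer classes are used and the success probability is strictly larger.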


Iterated Hash Functions

We denote the length of a bitstring x by |x|.

The concatenation of the bitstrings x and y is written as x ∥ y.

Let compress : {0,1}^{m+t} → {0,1}^m.

We use the compression function compress to construct a hash function

h : ∪_{i=m+t+1}^{∞} {0,1}^i → {0,1}^ℓ.


Iterated Hash Functions
Preprocessing step
Given an input string x, where |x| ≥ m + t + 1, construct a string y, using a
public algorithm, such that |y| ≡ 0 (mod t). Denote y = y_1 ∥ y_2 ∥ ··· ∥ y_r,
where |y_i| = t for 1 ≤ i ≤ r.

Processing step
Let IV be a public initial value that is a bitstring of length m. Then compute
the following:

z_0 ← IV
z_1 ← compress(z_0 ∥ y_1)
z_2 ← compress(z_1 ∥ y_2)
...
z_r ← compress(z_{r−1} ∥ y_r).

Iterated Hash Functions

Output step
Let g : {0,1}^m → {0,1}^ℓ be a public function. Define h(x) = g(z_r).

Padding
A padding function is a publicly disclosed function that is applied to x to
produce pad(x).
Typically pad(x) involves the length |x| and additional zeros so that the
length of y = x ∥ pad(x) is divisible by t.


Merkle-Damgård Construction

Suppose compress : {0,1}^{m+t} → {0,1}^m is a collision resistant
compression function, where t ≥ 1. We will use compress to construct a
collision resistant hash function h : X → {0,1}^m, where

X = ∪_{i=m+t+1}^{∞} {0,1}^i.

Case 1: t ≥ 2
Suppose x ∈ X, and |x| = n ≥ m + t + 1. Let k = ⌈n/(t − 1)⌉ and d = k(t − 1) − n.
We can express x as the concatenation x = x_1 ∥ x_2 ∥ ··· ∥ x_k, where
|x_1| = |x_2| = ··· = |x_{k−1}| = t − 1 and |x_k| = t − 1 − d.
Define y(x) = y_1 ∥ y_2 ∥ ··· ∥ y_{k+1}, where y_i = x_i for 1 ≤ i ≤ k − 1.
y_k is formed from x_k by padding on the right with d zeroes, so that all
the blocks y_i (1 ≤ i ≤ k) are of length t − 1. y_{k+1} is the binary
representation of d, padded on the left with zeroes so that |y_{k+1}| = t − 1.


Merkle-Damgård Construction
Merkle-Damgård(x)
external compress
comment: compress : {0,1}^{m+t} → {0,1}^m, where t ≥ 2
n ← |x|
k ← ⌈n/(t − 1)⌉
d ← k(t − 1) − n
for i ← 1 to k − 1 do
    y_i ← x_i
end
y_k ← x_k ∥ 0^d
y_{k+1} ← the binary representation of d, padded on the left to t − 1 bits
z_1 ← 0^{m+1} ∥ y_1
g_1 ← compress(z_1)
for i ← 1 to k do
    z_{i+1} ← g_i ∥ 1 ∥ y_{i+1}
    g_{i+1} ← compress(z_{i+1})
end
return g_{k+1}
Algorithm 6: Merkle-Damgård(x)

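
The construction for t ≥ 2 can be traced in Python on bitstrings represented as text of '0' and '1' characters. In this sketch the parameters m = 64 and t = 65, and the toy `compress` obtained by truncating SHA-256, are assumptions chosen only so that the code runs; the toy function is not claimed to be a sound compression function.

```python
import hashlib
import math

M_BITS, T_BITS = 64, 65          # toy parameters m and t (assumed values, t >= 2)

def compress(z: str) -> str:
    """Toy stand-in for compress : {0,1}^(m+t) -> {0,1}^m, built by truncating SHA-256."""
    assert len(z) == M_BITS + T_BITS
    digest = hashlib.sha256(z.encode("ascii")).digest()
    return format(int.from_bytes(digest, "big"), "0256b")[:M_BITS]

def merkle_damgard(x: str) -> str:
    """Merkle-Damgård hash of the bitstring x, following the pseudocode for t >= 2."""
    n = len(x)
    assert n >= M_BITS + T_BITS + 1
    block = T_BITS - 1
    k = math.ceil(n / block)
    d = k * block - n
    y = [x[i * block:(i + 1) * block] for i in range(k)]   # y_1 .. y_k (last one short)
    y[-1] += "0" * d                                       # y_k = x_k || 0^d
    y.append(format(d, "b").zfill(block))                  # y_{k+1} encodes d
    g = compress("0" * (M_BITS + 1) + y[0])                # g_1 = compress(0^{m+1} || y_1)
    for y_next in y[1:]:
        g = compress(g + "1" + y_next)                     # g_{i+1} = compress(g_i || 1 || y_{i+1})
    return g

digest = merkle_damgard("1" * 200)
```

The final block y_{k+1} is what separates inputs such as `"1" * 199` and `"1" * 199 + "0"`, which would otherwise produce identical padded blocks.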
Collision Resistance

Theorem
Suppose compress : {0,1}^{m+t} → {0,1}^m is a collision resistant
compression function, where t ≥ 2. Then the function

h : ∪_{i=m+t+1}^{∞} {0,1}^i → {0,1}^m,

as constructed in Algorithm 6, is a collision resistant hash function.


Merkle-Damgård Construction
Merkle-Damgård(x) for t = 1
external compress
comment: compress : {0,1}^{m+1} → {0,1}^m
n ← |x|
y ← 11 ∥ f(x_1) ∥ f(x_2) ∥ ··· ∥ f(x_n)
denote y = y_1 ∥ y_2 ∥ ··· ∥ y_k, where y_i ∈ {0,1}, 1 ≤ i ≤ k
g_1 ← compress(0^m ∥ y_1)
for i ← 1 to k − 1 do
    g_{i+1} ← compress(g_i ∥ y_{i+1})
end
return g_k
Algorithm 7: Merkle-Damgård(x)

Here |x| = n ≥ m + 2, and the encoding f is defined by f(0) = 0, f(1) = 01.

Collision Resistance t = 1

Theorem
Suppose compress : {0,1}^{m+1} → {0,1}^m is a collision resistant
compression function. Then the function

h : ∪_{i=m+2}^{∞} {0,1}^i → {0,1}^m,

as constructed in Algorithm 7, is a collision resistant hash function.


Some Examples of iterated hash functions

Hash functions constructed by using the Merkle-Damgård approach:

MD4 was proposed by Rivest in 1990.
MD5 was proposed by Rivest in 1992.
SHA was proposed as a standard by NIST in 1993, and published as FIPS
180. SHA is now referred to as SHA-0.

Discovery of collisions:

Collisions in the compression functions of MD4 and MD5 were discovered
in the mid-1990s.
It was shown in 1998 that SHA-0 has a weakness that would allow
collisions to be found in approximately 2^61 steps, which is much more
efficient than a birthday attack, which requires about 2^80 steps.


Some Examples of iterated hash functions

List of further attacks:

At CRYPTO 2004:
    A collision for SHA-0 was found by Joux.
    Collisions for MD5 and several other popular hash functions were found by
    Wang, Feng, Lai, and Yu.
The first collision for SHA-1 was found by Stevens, Bursztein, Karpman,
Albertini, and Markov and announced on 23 February 2017. This attack
was approximately 100000 times faster than a brute-force “birthday
paradox” search, which would require roughly 2^80 trials.
SHA-2 includes four hash functions known as SHA-224, SHA-256,
SHA-384, and SHA-512.
The last three of these were approved as a FIPS standard in 2002.


Operations used in SHA-1

X ∧ Y        bitwise “and” of X and Y
X ∨ Y        bitwise “or” of X and Y
X ⊕ Y        bitwise “x-or” of X and Y
¬X           bitwise complement of X
X + Y        integer addition modulo 2^32
ROTL^s(X)    circular left shift of X by s positions (0 ≤ s ≤ 31)

These operations are very efficient.

However, when a suitable sequence of these operations is performed, the
output is quite unpredictable.


The Sponge Construction

SHA-3 is based on a design called the sponge construction.

This technique was developed by Bertoni, Daemen, Peeters, and Van
Assche.

Instead of using a compression function, the basic “building block” is a
function f that maps bitstrings of a fixed length to bitstrings of the same
length.

Typically f will be a bijection, so every bitstring will have a unique
preimage.


The Sponge Construction

Suppose that f operates on bitstrings of length b. That is,
f : {0,1}^b → {0,1}^b. The integer b is called the width.

Write b = r + c, where r is the bitrate and c is the capacity.

The value of r affects the efficiency of the resulting sponge function, as a
message will be processed r bits at a time.

The value of c affects the resulting security of the sponge function.

The security level against a certain kind of collision attack is intended to
be roughly 2^{c/2}. This is comparable to the security of a random oracle
with a c-bit output.


The Sponge Construction

The sponge function based on f works as follows:

The input message M is a bitstring of arbitrary length.

M is padded appropriately so that its length is a multiple of r.

Then the padded message is split into blocks of length r.



The Sponge Construction
The sponge function based on f works as follows:
Absorbing phase
Initially the state is a bitstring of length b consisting of zeroes.
The first block of the padded message is exclusive-ored with the first r
bits of the state. Then the function f is applied which updates the state.
This process is repeated with the remaining blocks of the padded
message.

Squeezing phase
Suppose ℓ output bits are desired.
Take the first r bits of the current state; this forms an output block.
If ℓ > r we apply f to the current state and take the first r output bits as
another output block.
The process is repeated until we have a total of at least ℓ bits.

Diagram of the sponge construction

Figure 1: sponge construction

The Sponge Construction

M = m_1 ∥ ··· ∥ m_k, where m_1, ..., m_k ∈ {0,1}^r.

Let the initial state be x_0 ∥ y_0, where x_0 = 00...0 ∈ {0,1}^r and
y_0 = 00...0 ∈ {0,1}^c, and let the state after the ith step be x_i ∥ y_i,
where x_i ∈ {0,1}^r and y_i ∈ {0,1}^c.
The following equations describe the state transitions.

x_1 ∥ y_1 = f((m_1 ⊕ x_0) ∥ y_0)
x_2 ∥ y_2 = f((m_2 ⊕ x_1) ∥ y_1)
...
x_k ∥ y_k = f((m_k ⊕ x_{k−1}) ∥ y_{k−1}).
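
The absorbing and squeezing phases can be sketched with a toy sponge on integers. Everything concrete here is an assumption made for illustration: the width b = 64 with r = 16 and c = 48, the toy bijection `f` (which is not the Keccak permutation), and the simple padding rule that appends a byte 0x80 followed by zero bytes.

```python
B, R = 64, 16                     # toy width b and bitrate r (assumed values)
C = B - R                         # capacity c
MASK = (1 << B) - 1

def f(state: int) -> int:
    """Toy bijection on b-bit integers (NOT Keccak-f); every step is invertible."""
    for rnd in range(4):
        state = (state * 0x9E3779B97F4A7C15 + rnd) & MASK      # odd multiplier: bijective mod 2^b
        state ^= state >> 29                                   # xorshift: bijective
        state = ((state << 13) | (state >> (B - 13))) & MASK   # rotation: bijective
    return state

def sponge(message: bytes, out_bits: int) -> int:
    """Absorb the padded message r bits at a time, then squeeze out_bits bits."""
    block_bytes = R // 8
    padded = message + b"\x80"                                 # assumed padding rule
    padded += b"\x00" * (-len(padded) % block_bytes)
    state = 0                                                  # all-zero initial state
    for i in range(0, len(padded), block_bytes):
        block = int.from_bytes(padded[i:i + block_bytes], "big")
        state = f(state ^ (block << C))                        # xor into the first r bits, apply f
    output, produced = 0, 0
    while True:
        output = (output << R) | (state >> C)                  # first r bits form an output block
        produced += R
        if produced >= out_bits:
            break
        state = f(state)
    return output >> (produced - out_bits)                     # keep exactly out_bits bits

short = sponge(b"abc", 16)
longer = sponge(b"abc", 40)
```

Because output blocks are produced one after another from the same sequence of states, a shorter output is always a prefix of a longer one for the same message.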


Generation of an internal collision

Suppose the states x_i ∥ y_i are generated from the all-zero initial state
x_0 ∥ y_0 by

x_1 ∥ y_1 = f(x_0 ∥ y_0)
x_2 ∥ y_2 = f(x_0 ∥ y_1)
...
x_k ∥ y_k = f(x_0 ∥ y_{k−1}).

There exist h ≠ k, with h < k, such that f(x_0 ∥ y_k) = f(x_0 ∥ y_h).

Now consider the messages

M = x_0 ∥ x_1 ∥ ··· ∥ x_h
M′ = x_0 ∥ x_1 ∥ ··· ∥ x_k.

Since x_0 = 00...0 and x_i ⊕ x_i = x_0, absorbing these blocks gives

x_1 ∥ y_1 = f((x_0 ⊕ x_0) ∥ y_0) = f(x_0 ∥ y_0)
x_2 ∥ y_2 = f((x_1 ⊕ x_1) ∥ y_1) = f(x_0 ∥ y_1)
...
x_h ∥ y_h = f((x_{h−1} ⊕ x_{h−1}) ∥ y_{h−1}) = f(x_0 ∥ y_{h−1})
x_{h+1} ∥ y_{h+1} = f((x_h ⊕ x_h) ∥ y_h) = f(x_0 ∥ y_h)
...
x_{k+1} ∥ y_{k+1} = f((x_k ⊕ x_k) ∥ y_k) = f(x_0 ∥ y_k).

Since f(x_0 ∥ y_h) = f(x_0 ∥ y_k), we have x_{h+1} ∥ y_{h+1} = x_{k+1} ∥ y_{k+1}.
That is, M and M′ lead to the same internal state.

Generating collision from the output

Suppose the squeezing phase produces an ℓ-bit output string.

We can generate a collision by mounting a birthday attack on the output,
evaluating the sponge function approximately 2^{ℓ/2} times.

Therefore, we can generate a collision by applying the sponge function about
min{2^{ℓ/2}, 2^{c/2}} times.
If ℓ ≤ c, generate a collision from ℓ-bit output strings by mounting a
birthday attack.
If c < ℓ, generate an output collision by constructing an internal collision
using the technique discussed on the previous slide.


SHA-3

The SHA-3 hash functions are SHA3-224, SHA3-256, SHA3-384, and SHA3-512.

SHAKE128 and SHAKE256 are extendable-output functions (abbreviated XOF).

hash function   b      r      c      collision security   preimage security
SHA3-224        1600   1152   448    112                  224
SHA3-256        1600   1088   512    128                  256
SHA3-384        1600   832    768    192                  384
SHA3-512        1600   576    1024   256                  512
SHAKE128        1600   1344   256    min{d/2, 128}        min{d, 128}
SHAKE256        1600   1088   512    min{d/2, 256}        min{d, 256}

Security levels are in bits; d denotes the output length of the XOF in bits.


Message Authentication Codes: keyed hash function
Keyed hash function from an unkeyed hash function
Suppose h is an unkeyed iterated hash function with initial value IV that
requires every input message x to have a length that is a multiple of t.
h utilizes the compression function compress : {0,1}^{m+t} → {0,1}^m.
The initial value IV is set to the key K, i.e., IV = K.

An iterated keyed hash function

z_0 ← K
z_1 ← compress(z_0 ∥ y_1)
z_2 ← compress(z_1 ∥ y_2)
...
z_r ← compress(z_{r−1} ∥ y_r).

Unkeyed to keyed hash functions

IV = K

z_0 ← K
z_1 ← compress(z_0 ∥ y_1)
z_2 ← compress(z_1 ∥ y_2)
...
z_r ← compress(z_{r−1} ∥ y_r).

Length extension attack

We have x and h_K(x), where |x| is a multiple of t.
Consider the message x ∥ x′, where |x′| = t. Then
h_K(x ∥ x′) = compress(h_K(x) ∥ x′), which can be computed without knowing K.
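
The length extension attack can be demonstrated on a toy keyed iterated hash. In this sketch the block sizes (m = t = 128 bits), the toy `compress` built by truncating SHA-256, and all function names are assumptions made for illustration.

```python
import hashlib

M_BYTES, T_BYTES = 16, 16        # toy values of m and t, in bytes (assumed)

def compress(z: bytes) -> bytes:
    """Toy compression function {0,1}^(m+t) -> {0,1}^m, built by truncating SHA-256."""
    assert len(z) == M_BYTES + T_BYTES
    return hashlib.sha256(z).digest()[:M_BYTES]

def keyed_hash(key: bytes, x: bytes) -> bytes:
    """h_K(x): iterate compress over the t-bit blocks of x, starting from IV = K."""
    assert len(key) == M_BYTES and len(x) % T_BYTES == 0
    z = key
    for i in range(0, len(x), T_BYTES):
        z = compress(z + x[i:i + T_BYTES])
    return z

def extend(known_tag: bytes, extra_block: bytes) -> bytes:
    """Forge the tag of x || x' from h_K(x) alone; the key is never used."""
    assert len(extra_block) == T_BYTES
    return compress(known_tag + extra_block)

key = bytes(range(16))                       # known only to Alice and Bob
x = b"A" * 32
observed_tag = keyed_hash(key, x)            # Oscar sees (x, h_K(x))

x_extra = b"B" * 16
forged_tag = extend(observed_tag, x_extra)   # Oscar's tag for x || x'
genuine_tag = keyed_hash(key, x + x_extra)   # what the key holder would compute
```

The forged and genuine tags coincide, so (x ∥ x′, forged_tag) is a valid pair for a message that was never authenticated by the key holder.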


Length extension attack with padding
Suppose y = x ∥ pad(x), such that |y| = rt.
Let |w| = t. Define x′ = x ∥ pad(x) ∥ w.
Then y′ = x′ ∥ pad(x′) = x ∥ pad(x) ∥ w ∥ pad(x′), where |y′| = r′t for some
integer r′ > r.

Computing h_K(x′)
Let z_r = h_K(x), and denote the blocks of y′ by y_1, ..., y_{r′}.

z_{r+1} ← compress(h_K(x) ∥ y_{r+1})
z_{r+2} ← compress(z_{r+1} ∥ y_{r+2})
...
z_{r′} ← compress(z_{r′−1} ∥ y_{r′})

So h_K(x′) = z_{r′}.

MAC attack models

The objective of an adversary (Oscar) is to try to produce a message-tag
pair (x, y) that is valid under a fixed but unknown key, K.

Oscar might have access to some valid pairs for the key K:

(x_1, y_1), (x_2, y_2), ..., (x_Q, y_Q).

Two standard attack models are

1 known message attack;
2 chosen message attack.


Forgery

Suppose Q valid pairs (x_1, y_1), (x_2, y_2), ..., (x_Q, y_Q) for an unknown
key K are available to Oscar.
If Oscar can output a valid message-tag pair (x, y) such that
x ∉ {x_1, ..., x_Q} with probability at least ε, then Oscar
is said to be an (ε, Q)-forger for the given MAC.
The pair (x, y) is said to be a forgery.
The probability ε can be an average-case probability over all possible
keys, or the worst-case probability.


Two obvious attacks

Key guessing attack:

1 Oscar chooses K ∈ K uniformly at random, and outputs the tag h_K(x) for
  an arbitrary message x.
2 This attack succeeds with probability at least 1/|K|.

Tag guessing attack:

1 Oscar chooses the tag y ∈ Y uniformly at random and outputs y as the
  tag for an arbitrary message x.
2 This attack succeeds with probability 1/|Y|.


Nested MACs

A nested MAC builds a MAC algorithm from the composition of two
(keyed) hash families.

The composition of the hash families (X, Y, K, G) and (Y, Z, L, H) is
(X, Z, M, G ∘ H), in which M = K × L and

G ∘ H = {g ∘ h : g ∈ G, h ∈ H},

where (g ∘ h)_{(K,L)}(x) = h_L(g_K(x)) for all x ∈ X.

|Y| ≥ |Z| and |X| is either finite or infinite.

If X is finite, then |X| > |Y|.


Nested MACs

The nested MAC is secure if the following two conditions are satisfied:
1 (Y, Z, L, H) is secure as a MAC, given a fixed (unknown) key.
2 (X , Y, K, G) is collision resistant, given a fixed (unknown) key.

We will refer to (Y, Z, L, H) as the “little MAC”.

(X , Z, M, G ◦ H) is the “big MAC” or the “nested MAC.”



Adversaries

We consider the following three adversaries:

1 a forger for the little MAC (which carries out a “little MAC attack”),

2 a collision-finder for the hash family (X, Y, K, G), when the key is secret
  (this is an “unknown-key collision attack”), and

3 a forger for the nested MAC (which we term a “big MAC attack”).


Types of attacks

Little MAC attack: a key L is chosen and kept secret. Oscar is allowed
to choose values for y and query a little MAC oracle for the value of
h_L(y). Then Oscar attempts to output a pair (y′, z) such that z = h_L(y′),
where y′ was not one of his previous queries.
Unknown-key collision attack: a key K is chosen and kept secret.
Oscar is allowed to choose values for x and query a hash oracle for
values of g_K(x). Then Oscar attempts to output a pair x′, x″ such that
x′ ≠ x″ and g_K(x′) = g_K(x″).
Big MAC attack: a pair of keys (K, L) is chosen and kept secret. Oscar
is allowed to choose values for x and query a big MAC oracle for values
of h_L(g_K(x)). Then Oscar attempts to output a pair (x′, z) such that
z = h_L(g_K(x′)), where x′ was not one of his previous queries.


Assumptions

We assume that:
1 there does not exist an (ε_1, Q + 1)-unknown-key collision attack for a
  randomly chosen function g_K ∈ G, where K is secret;

2 there does not exist an (ε_2, Q)-little MAC attack for a randomly chosen
  function h_L ∈ H, where L is secret;

3 there exists an (ε, Q)-big MAC attack for a randomly chosen function
  (g ∘ h)_{(K,L)} ∈ G ∘ H, where (K, L) is secret.


The attack

The big MAC attack outputs a valid pair (x, z) after making at most
Q queries to a big MAC oracle.
Let x_1, ..., x_Q be the queries, generating valid message-tag pairs
(x_1, z_1), ..., (x_Q, z_Q); the attack then outputs a pair (x, z) that is
valid with probability at least ε.
Make Q + 1 queries to a hash oracle g_K to obtain
y_1 = g_K(x_1), ..., y_Q = g_K(x_Q), and y = g_K(x).
If y = y_i for some i ∈ {1, ..., Q}, we have a collision, since x ≠ x_i.
Otherwise, (y, z) is a valid pair for the little MAC with y not among the
queried values y_i, hence a forgery for the little MAC.


Probability bounds

Any unknown-key collision attack has probability at most ε_1 of succeeding.
The big MAC attack has success probability at least ε.
Therefore, the probability that (x, z) is a valid pair and y ∉ {y_1, ..., y_Q}
is at least ε − ε_1.
The success probability of any little MAC attack is at most ε_2.
So ε − ε_1 ≤ ε_2, that is, ε ≤ ε_1 + ε_2.


Theorem
Suppose (X, Z, M, G ∘ H) is a nested MAC. Suppose that there does not exist
an (ε_1, Q + 1)-collision attack for a randomly chosen function g_K ∈ G, when
the key K is secret. Further, suppose that there does not exist an
(ε_2, Q)-forger for a randomly chosen function h_L ∈ H, where L is secret.
Finally, suppose there exists an (ε, Q)-forger for the nested MAC, for a
randomly chosen function (g ∘ h)_{(K,L)} ∈ G ∘ H. Then ε ≤ ε_1 + ε_2.


HMAC

HMAC is a nested MAC algorithm that was adopted as a FIPS standard
in March, 2002.
HMAC_K(x) = SHA-1((K ⊕ opad) ∥ SHA-1((K ⊕ ipad) ∥ x))
Here K is a 512-bit key, and ipad and opad are 512-bit constants, defined in
hexadecimal notation as
ipad = 3636···36, opad = 5C5C···5C.
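
The HMAC formula can be written out directly and compared with Python's standard `hmac` module. The key preparation step (hashing keys longer than 512 bits and zero-padding shorter ones to 512 bits) follows the usual HMAC specification and is not stated on the slide; the function name `hmac_sha1` is illustrative.

```python
import hashlib
import hmac

BLOCK = 64                                    # 512 bits, the SHA-1 input block size

def hmac_sha1(key: bytes, x: bytes) -> bytes:
    """HMAC_K(x) = SHA-1((K xor opad) || SHA-1((K xor ipad) || x))."""
    if len(key) > BLOCK:                      # key preparation (assumed from the HMAC spec)
        key = hashlib.sha1(key).digest()
    key = key.ljust(BLOCK, b"\x00")           # pad K to exactly 512 bits
    k_ipad = bytes(b ^ 0x36 for b in key)     # K xor ipad
    k_opad = bytes(b ^ 0x5C for b in key)     # K xor opad
    inner = hashlib.sha1(k_ipad + x).digest()
    return hashlib.sha1(k_opad + inner).digest()

ours = hmac_sha1(b"key", b"message")
reference = hmac.new(b"key", b"message", hashlib.sha1).digest()
```

The inner hash plays the role of g_K and the outer hash the role of the little MAC h_L in the nested MAC framework. SHA-1 is used here only to match the slide.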


CBC-MAC

CBC-MAC(x, K)
denote x = x_1 ∥ x_2 ∥ ··· ∥ x_n
IV ← 00···0
y_0 ← IV
for i ← 1 to n do
    y_i ← E_K(y_{i−1} ⊕ x_i)
end
return y_n
Algorithm 8: MAC from block ciphers
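
The chaining in CBC-MAC can be sketched as follows. Python's standard library has no block cipher, so the function `E` below is a toy keyed function built from SHA-256 that merely stands in for E_K; it is an assumption for illustration and not a real block cipher. The 16-byte block length is also assumed.

```python
import hashlib

BLOCK = 16                                    # assumed block length in bytes

def E(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for the block cipher E_K (NOT a real cipher, illustration only)."""
    assert len(block) == BLOCK
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise x-or of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))

def cbc_mac(x: bytes, key: bytes) -> bytes:
    """Return y_n, where y_0 = 00...0 and y_i = E_K(y_{i-1} xor x_i)."""
    assert len(x) > 0 and len(x) % BLOCK == 0
    y = bytes(BLOCK)                          # IV = 00...0
    for i in range(0, len(x), BLOCK):
        y = E(key, xor(y, x[i:i + BLOCK]))
    return y

key = b"hypothetical key"
x1, x2 = b"A" * BLOCK, b"B" * BLOCK
tag_two_blocks = cbc_mac(x1 + x2, key)
```

With a real block cipher the loop is unchanged; only `E` would be replaced.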


Authenticated encryption
MAC-and-encrypt: Given a message x, compute a tag z = h_{K1}(x) and
a ciphertext y = e_{K2}(x). The pair (y, z) is transmitted. The receiver
decrypts y, obtaining x, and then verifies the correctness of the tag z
on x.
MAC-then-encrypt: Here the tag z = h_{K1}(x) is computed first.
Then the plaintext and tag are both encrypted, yielding
y = e_{K2}(x ∥ z). The ciphertext y is transmitted. The receiver
decrypts y, obtaining x and z, and then verifies the correctness of the tag z
on x.
Encrypt-then-MAC: Here the first step is to encrypt x, producing a
ciphertext y = e_{K2}(x). Then a tag is created for the ciphertext y, namely,
z = h_{K1}(y). The pair (y, z) is transmitted. The receiver first verifies
the correctness of the tag z on y. Then, provided that the tag is valid, the
receiver decrypts y to obtain x.

Authenticated encryption

Encrypt-then-MAC is preferred over the other methods.

A security result due to Bellare and Namprempre says that this method
of authenticated encryption is secure provided that the two component
schemes are secure.
There exist instantiations of MAC-then-encrypt and MAC-and-encrypt
that are insecure, even though the component schemes are secure.
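
A minimal sketch of encrypt-then-MAC is given below. The encryption is a toy counter-mode stream built from SHA-256 and stands in for e_{K2}; it is an assumption for illustration only, as are the nonce handling, the use of HMAC-SHA-256 as h_{K1}, and the function names. The tag covers the nonce together with the ciphertext, which is an added assumption beyond the slide's z = h_{K1}(y).

```python
import hashlib
import hmac

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy counter-mode encryption/decryption (illustration only, not a vetted cipher)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], pad))
    return bytes(out)

def encrypt_then_mac(k_enc: bytes, k_mac: bytes, nonce: bytes, x: bytes):
    """Encrypt first, then compute the tag on the ciphertext."""
    y = keystream_xor(k_enc, nonce, x)
    z = hmac.new(k_mac, nonce + y, hashlib.sha256).digest()
    return y, z

def verify_then_decrypt(k_enc: bytes, k_mac: bytes, nonce: bytes, y: bytes, z: bytes):
    """Check the tag on the ciphertext; decrypt only if it is valid, else return None."""
    expected = hmac.new(k_mac, nonce + y, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, z):
        return None
    return keystream_xor(k_enc, nonce, y)

k_enc, k_mac, nonce = b"enc key (toy)", b"mac key (toy)", b"nonce-01"
y, z = encrypt_then_mac(k_enc, k_mac, nonce, b"attack at dawn")
recovered = verify_then_decrypt(k_enc, k_mac, nonce, y, z)
tampered_y = bytes([y[0] ^ 1]) + y[1:]
rejected = verify_then_decrypt(k_enc, k_mac, nonce, tampered_y, z)
```

Because the tag is verified before any decryption takes place, a modified ciphertext is rejected without the receiver ever processing the resulting plaintext.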


Counter with CBC MAC (CCM) mode of operation

CCM mode computes a tag using CBC-MAC. This is then followed by
an encryption in counter mode.
Let K be the encryption key and let x = x_1 ∥ x_2 ∥ ··· ∥ x_n be the plaintext.
We choose a counter ctr, and construct a sequence T_0, T_1, ..., T_n
defined as T_i = ctr + i mod 2^m, where m is the block length of the
cipher.
The plaintext blocks x_1, x_2, ..., x_n are encrypted by computing
y_i = x_i ⊕ e_K(T_i).
Compute temp = CBC-MAC(x, K) and y′ = T_0 ⊕ temp.
The ciphertext is the string y = y_1 ∥ y_2 ∥ ··· ∥ y_n ∥ y′.


Decryption

To decrypt and verify $y$, one would first decrypt $y_1 \parallel \cdots \parallel y_n$ using
counter mode decryption with the counter sequence $T_1, T_2, \ldots, T_n$,
obtaining the plaintext string $x$.
The second step is to compute $\text{CBC-MAC}(x, K)$ and check whether it is equal to
$y' \oplus T_0$.
The ciphertext is rejected if this condition does not hold.
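The encryption and decryption steps above can be traced in a short Python sketch. It follows the simplified description on these slides rather than the full standardized CCM formatting. As an assumption, a truncated HMAC-SHA256 stands in for the block cipher $e_K$ (both counter mode and CBC-MAC only use the forward direction), the block length is 16 bytes, and the plaintext length is a multiple of the block length. All function names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # block length in bytes, so m = 128 bits

def e(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for e_K (assumption): a keyed PRF truncated to one block."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def split_blocks(data: bytes):
    if len(data) % BLOCK:
        raise ValueError("length must be a multiple of the block length")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def counter_block(ctr: int, i: int) -> bytes:
    """T_i = (ctr + i) mod 2^m, encoded as one block."""
    return ((ctr + i) % (1 << (8 * BLOCK))).to_bytes(BLOCK, "big")

def cbc_mac(x: bytes, key: bytes) -> bytes:
    state = bytes(BLOCK)  # all-zero initial value
    for block in split_blocks(x):
        state = e(key, xor(state, block))
    return state

def ccm_encrypt(x: bytes, key: bytes, ctr: int) -> bytes:
    ys = [xor(b, e(key, counter_block(ctr, i)))          # y_i = x_i XOR e_K(T_i)
          for i, b in enumerate(split_blocks(x), start=1)]
    y_tag = xor(counter_block(ctr, 0), cbc_mac(x, key))  # y' = T_0 XOR temp
    return b"".join(ys) + y_tag

def ccm_decrypt(y: bytes, key: bytes, ctr: int) -> bytes:
    body, y_tag = y[:-BLOCK], y[-BLOCK:]
    x = b"".join(xor(b, e(key, counter_block(ctr, i)))
                 for i, b in enumerate(split_blocks(body), start=1))
    # Accept only if CBC-MAC(x, K) equals y' XOR T_0.
    if not hmac.compare_digest(cbc_mac(x, key), xor(y_tag, counter_block(ctr, 0))):
        raise ValueError("ciphertext rejected")
    return x
```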



Galois Counter mode

A detailed description of GCM is given in NIST Special Publication
800-38D.
The encryption is done in counter mode using a 128-bit AES key. The
initial value of the 128-bit counter is derived from an IV that is typically
96 bits in length.
The IV is transmitted along with the ciphertext, and it should be changed
every time a new encryption is performed.
The computation of the authentication tag requires performing
multiplications by a constant value $H$ in the finite field $\mathbb{F}_{2^{128}}$. The value
of $H$ is determined by encrypting the all-zero 128-bit block under the key.
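To make the field multiplication concrete, here is a minimal Python sketch of multiplication in $\mathbb{F}_{2^{128}}$ using the bit ordering and reduction constant of NIST SP 800-38D, with blocks represented as 128-bit integers (most significant bit first). The helper `ghash_blocks` is a hypothetical name showing how repeated multiplication by $H$ accumulates over blocks; it omits the length block and the final masking of the real tag computation.

```python
R = 0xE1 << 120  # reduction constant 11100001 followed by 120 zero bits
ONE = 1 << 127   # the field element 1 in GCM's bit ordering

def gf128_mul(x: int, y: int) -> int:
    """Multiply two field elements given as 128-bit integers."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:   # bit i of x, counting from the most significant bit
            z ^= v
        # Multiply v by the field variable, reducing when a bit falls off the end.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash_blocks(h: int, blocks) -> int:
    """Accumulate blocks as acc = (acc XOR block) * H, one block at a time."""
    acc = 0
    for block in blocks:
        acc = gf128_mul(acc ^ block, h)
    return acc
```

Since $H$ is fixed for a given key, implementations typically precompute tables for multiplication by $H$; the bitwise loop here favors readability over speed.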



Unconditionally Secure MACs

Assumptions:
The adversary has infinite computational power.
Any given key is used to produce only one authentication tag.

For $Q \in \{0, 1\}$ we define the deception probability $Pd_Q$ to be the
probability that the adversary can create a successful forgery after
observing $Q$ valid message-tag pairs.
The attack when $Q = 0$ is called an impersonation attack, and the attack
when $Q = 1$ is called a substitution attack.



Unconditionally Secure MACs

Assumption: the key $K$ is chosen uniformly at random from $\mathcal{K}$.
In a substitution attack, Oscar's success probability $\epsilon$ may depend on
the particular message-tag pair $(x, y)$ that he observes.
We take $Pd_1$ to be the maximum of the relevant success probabilities over
all message-tag pairs $(x, y)$.
Thus when we say that $Pd_1 \le \epsilon$, it means that Oscar's success
probability is at most $\epsilon$ regardless of the message-tag pair he observes
prior to making his substitution.



Unconditionally Secure MACs

Example
Suppose $\mathcal{X} = \mathcal{Y} = \mathbb{Z}_3$ and $\mathcal{K} = \mathbb{Z}_3 \times \mathbb{Z}_3$. For
each $K = (a, b) \in \mathcal{K}$ and each $x \in \mathcal{X}$, define
$h_{(a,b)}(x) = ax + b \bmod 3$, and then define
$\mathcal{H} = \{h_{(a,b)} : (a, b) \in \mathbb{Z}_3 \times \mathbb{Z}_3\}$. Each of the
9 keys is used with probability $1/9$.

key      x = 0   x = 1   x = 2
(0, 0)     0       0       0
(0, 1)     1       1       1
(0, 2)     2       2       2
(1, 0)     0       1       2
(1, 1)     1       2       0
(1, 2)     2       0       1
(2, 0)     0       2       1
(2, 1)     1       0       2
(2, 2)     2       1       0
Table 1: An authentication matrix
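The authentication matrix of this example can be regenerated with a few lines of Python. This is a small sketch; the function name `authentication_matrix` is a hypothetical choice, and the modulus is a parameter only for convenience.

```python
from itertools import product

def authentication_matrix(p: int = 3):
    """Map each key (a, b) to the row of tags h_(a,b)(x) = (a*x + b) mod p for x = 0..p-1."""
    return {(a, b): [(a * x + b) % p for x in range(p)]
            for a, b in product(range(p), repeat=2)}

matrix = authentication_matrix()
for key, row in matrix.items():
    print(key, *row)
```

Each printed line is one row of the table: the key followed by the tags for $x = 0, 1, 2$.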



Unconditionally Secure MACs

Deception probabilities
Any message-tag pair $(x, y)$ is a valid pair with probability $1/3$.
So $Pd_0 = 1/3$.
If Oscar sees the message-tag pair $(0, 0)$, he knows that
$K_0 \in \{(0, 0), (1, 0), (2, 0)\}$.
The pair $(1, 1)$ is a successful forgery exactly when $K_0 = (1, 0)$. This
happens with probability $1/3$.
Repeating this for all possible message-tag pairs gives us the same
probability. So $Pd_1 = 1/3$.

key      x = 0   x = 1   x = 2
(0, 0)     0       0       0
(0, 1)     1       1       1
(0, 2)     2       2       2
(1, 0)     0       1       2
(1, 1)     1       2       0
(1, 2)     2       0       1
(2, 0)     0       2       1
(2, 1)     1       0       2
(2, 2)     2       1       0
Table 2: An authentication matrix



Payoff: deception probability of impersonation

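Let $K_0$ denote the key chosen by Alice and Bob. For $x \in \mathcal{X}$ and $y \in \mathcal{Y}$,
define $\operatorname{payoff}(x, y)$ to be the probability that the message-tag pair $(x, y)$
is valid.
\[
\operatorname{payoff}(x, y) = \Pr[y = h_{K_0}(x)] = \frac{|\{K \in \mathcal{K} : h_K(x) = y\}|}{|\mathcal{K}|}.
\]
\[
Pd_0 = \max\{\operatorname{payoff}(x, y) : x \in \mathcal{X}, y \in \mathcal{Y}\}.
\]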



Payoff: deception probability of substitution

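\[
\begin{aligned}
\operatorname{payoff}(x', y'; x, y) &= \Pr[y' = h_{K_0}(x') \mid y = h_{K_0}(x)] \\
&= \frac{\Pr[y' = h_{K_0}(x') \wedge y = h_{K_0}(x)]}{\Pr[y = h_{K_0}(x)]} \\
&= \frac{|\{K \in \mathcal{K} : y' = h_K(x'), y = h_K(x)\}|}{|\{K \in \mathcal{K} : y = h_K(x)\}|}.
\end{aligned}
\]
\[
\mathcal{V} = \{(x, y) : |\{K \in \mathcal{K} : y = h_K(x)\}| \ge 1\}.
\]
\[
Pd_1 = \max_{(x, y) \in \mathcal{V}} \; \max_{(x', y'),\, x' \ne x} \operatorname{payoff}(x', y'; x, y).
\]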
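The two payoff definitions translate directly into a brute-force computation of $Pd_0$ and $Pd_1$ for a small hash family. The sketch below uses exact fractions and is applied to the $\mathbb{Z}_3$ example $h_{(a,b)}(x) = ax + b \bmod 3$; the function name `deception_probabilities` and its signature are hypothetical.

```python
from fractions import Fraction
from itertools import product

def deception_probabilities(keys, messages, tags, h):
    """Return (Pd_0, Pd_1) for a hash family with uniformly chosen keys."""
    # Pd_0: largest fraction of keys under which a pair (x, y) is valid.
    pd0 = max(Fraction(sum(1 for k in keys if h(k, x) == y), len(keys))
              for x, y in product(messages, tags))
    # Pd_1: condition on an observed valid pair (x, y), then try every (x', y') with x' != x.
    pd1 = Fraction(0)
    for x, y in product(messages, tags):
        consistent = [k for k in keys if h(k, x) == y]
        if not consistent:
            continue  # (x, y) is not in V, so it can never be observed
        for x2, y2 in product(messages, tags):
            if x2 == x:
                continue
            hits = sum(1 for k in consistent if h(k, x2) == y2)
            pd1 = max(pd1, Fraction(hits, len(consistent)))
    return pd0, pd1

Z3 = list(range(3))
keys = list(product(Z3, Z3))
pd0, pd1 = deception_probabilities(keys, Z3, Z3, lambda k, x: (k[0] * x + k[1]) % 3)
print(pd0, pd1)  # prints "1/3 1/3"
```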



Strongly Universal Hash Families

Definition
Suppose $(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is an $(N, M)$-hash family. This hash family is strongly
universal provided that the following condition is satisfied for every $x, x' \in \mathcal{X}$
such that $x \ne x'$, and for every $y, y' \in \mathcal{Y}$:
\[
|\{K \in \mathcal{K} : y' = h_K(x'), y = h_K(x)\}| = \frac{|\mathcal{K}|}{M^2}.
\]
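For small families the definition can be checked exhaustively. The following Python sketch counts, for every pair of distinct messages and every pair of tags, the keys that map both messages to the requested tags, and compares the count with $|\mathcal{K}|/M^2$. The function name `is_strongly_universal` is hypothetical, and the two test families are illustrative choices.

```python
from itertools import product

def is_strongly_universal(keys, messages, tags, h) -> bool:
    """Check the strongly universal condition by exhaustive counting."""
    target, remainder = divmod(len(keys), len(tags) ** 2)
    if remainder:
        return False  # |K| / M^2 is not an integer, so the counts cannot match
    for x, x2 in product(messages, repeat=2):
        if x == x2:
            continue
        for y, y2 in product(tags, repeat=2):
            count = sum(1 for k in keys if h(k, x) == y and h(k, x2) == y2)
            if count != target:
                return False
    return True

Z3 = list(range(3))
affine_keys = list(product(Z3, Z3))

# The affine family h_(a,b)(x) = a*x + b mod 3 from the earlier example.
print(is_strongly_universal(affine_keys, Z3, Z3, lambda k, x: (k[0] * x + k[1]) % 3))

# A family that ignores the message: the tag depends only on b.
print(is_strongly_universal(affine_keys, Z3, Z3, lambda k, x: k[1]))
```

The first call reports that the affine family satisfies the condition, while the second family fails because no key maps two distinct messages to two different tags.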



Strongly Universal Hash Families

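Suppose that $(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is a strongly universal $(N, M)$-hash family. Then
\[
|\{K \in \mathcal{K} : h_K(x) = y\}| = \frac{|\mathcal{K}|}{M}
\]
for every $x \in \mathcal{X}$ and for every $y \in \mathcal{Y}$.

Suppose $(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is a strongly universal $(N, M)$-hash family. Then
$(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is an authentication code with $Pd_0 = Pd_1 = \frac{1}{M}$.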
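The second statement follows from the first by substituting the key counts into the two payoff formulas. A short worked derivation, using the payoff definitions from the earlier slides:

```latex
\[
\operatorname{payoff}(x, y)
  = \frac{|\{K \in \mathcal{K} : h_K(x) = y\}|}{|\mathcal{K}|}
  = \frac{|\mathcal{K}|/M}{|\mathcal{K}|}
  = \frac{1}{M},
\]
\[
\operatorname{payoff}(x', y'; x, y)
  = \frac{|\{K \in \mathcal{K} : y' = h_K(x'),\ y = h_K(x)\}|}{|\{K \in \mathcal{K} : y = h_K(x)\}|}
  = \frac{|\mathcal{K}|/M^2}{|\mathcal{K}|/M}
  = \frac{1}{M}
  \qquad (x' \ne x).
\]
```

Every payoff equals $1/M$, so both maxima, $Pd_0$ and $Pd_1$, equal $1/M$.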



Optimality of Deception Probabilities

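Suppose $(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is an $(N, M)$-hash family. Suppose we fix a
message $x \in \mathcal{X}$. Then we can compute as follows:
\[
\sum_{y \in \mathcal{Y}} \operatorname{payoff}(x, y)
  = \sum_{y \in \mathcal{Y}} \frac{|\{K \in \mathcal{K} : h_K(x) = y\}|}{|\mathcal{K}|}
  = \frac{|\mathcal{K}|}{|\mathcal{K}|} = 1.
\]
The $M$ payoffs for a fixed $x$ sum to 1, so at least one of them is at least
the average $1/M$. Hence, for every $x \in \mathcal{X}$, there exists an
authenticating tag $y$ (depending on $x$), such that
\[
\operatorname{payoff}(x, y) \ge \frac{1}{M}.
\]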



Optimality of deception probabilities

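Theorem
Suppose $(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is an $(N, M)$-hash family. Then $Pd_0 \ge \frac{1}{M}$. Further,
$Pd_0 = \frac{1}{M}$ if and only if
\[
|\{K \in \mathcal{K} : h_K(x) = y\}| = \frac{|\mathcal{K}|}{M}
\]
for every $x \in \mathcal{X}$ and $y \in \mathcal{Y}$.

Theorem
Suppose $(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is an $(N, M)$-hash family. Then $Pd_1 \ge \frac{1}{M}$.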



Optimality of deception probabilities

Theorem
Suppose $(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is an $(N, M)$-hash family. Then $Pd_1 = \frac{1}{M}$
if and only if the hash family is strongly universal.

Theorem
Suppose $(\mathcal{X}, \mathcal{Y}, \mathcal{K}, \mathcal{H})$ is an $(N, M)$-hash family such that $Pd_1 = \frac{1}{M}$. Then
$Pd_0 = \frac{1}{M}$.



The End

