Introduction to Hash Functions
Sugata Gangopadhyay
Indian Institute of Technology Roorkee
Sugata Gangopadhyay (CSE IITR) Introduction to Hash Functions 1 / 77
Hash functions
A cryptographic hash function provides assurance of data integrity.
A hash function is used to construct a short “fingerprint” of some data.
Even if the data is stored in an insecure place, its integrity can be
checked from time to time by recomputing the fingerprint and verifying
that the fingerprint has not changed.
Hash functions: fingerprinting data, message digest
Let h be a hash function and x be some data.
Usually, x is a binary string of arbitrary length.
The corresponding fingerprint y = h(x) is said to be a message digest.
A message digest would typically be a fairly short binary string; 160 bits
is a common choice.
Hash functions: fingerprinting data, message digest
Suppose that y is stored in a secure place, but x is not.
Suppose x is changed to x′, say.
The fact that x has been altered can be detected by computing y′ = h(x′)
and verifying that y′ ≠ y.
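The fingerprinting idea can be sketched in a few lines of Python. This is only an illustration: SHA-256 from the standard `hashlib` module stands in for h, and the messages and the helper name `fingerprint` are made up for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the message digest y = h(x), here SHA-256 as a hex string."""
    return hashlib.sha256(data).hexdigest()


x = b"pay Bob 100"            # data kept in an insecure place
y = fingerprint(x)            # digest kept in a secure place

x_altered = b"pay Bob 900"    # x changed to x'
y_altered = fingerprint(x_altered)

unchanged_ok = fingerprint(x) == y     # recomputing on intact data matches
tamper_detected = y_altered != y       # the alteration shows up as y' != y
```

Only the short digest y has to be protected; the data itself can live anywhere.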
Keyed hash functions: message authentication code, MAC
Suppose that Alice and Bob share a secret key K which determines a
hash function, say $h_K$.
For a message x, the corresponding authentication tag $y = h_K(x)$ can
be computed by Alice and Bob.
The pair (x, y) can be transmitted over an insecure channel from Alice to
Bob.
When Bob receives the pair (x, y), he can verify whether $y = h_K(x)$.
If this condition is satisfied, he is confident that neither x nor y was
altered by an adversary, provided that the hash family is “secure”.
In particular, Bob is assured that the message x originates from Alice.
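A minimal sketch of the tag-and-verify exchange, assuming HMAC-SHA256 from Python's standard `hmac` module as the keyed hash family. The key, the messages, and the helper names `tag` and `verify` are hypothetical.

```python
import hashlib
import hmac


def tag(key: bytes, message: bytes) -> bytes:
    """Compute the authentication tag y = h_K(x)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Bob's check that y = h_K(x), using a constant-time comparison."""
    return hmac.compare_digest(tag(key, message), received_tag)


K = b"key shared by Alice and Bob"   # hypothetical shared secret
x = b"meet at noon"
y = tag(K, x)                        # Alice sends the pair (x, y)

accepted = verify(K, x, y)                   # untouched pair
rejected = verify(K, b"meet at nine", y)     # message altered in transit
```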
Hash functions
A hash family is a four-tuple $(X, Y, \mathcal{K}, H)$, where the following conditions
are satisfied:
1 X is a set of possible messages
2 Y is a finite set of possible message digests or authentication tags
3 $\mathcal{K}$, the keyspace, is a finite set of possible keys
4 For each $K \in \mathcal{K}$, there is a hash function $h_K \in H$ with $h_K : X \to Y$.
Notation and terminology
In the definition of a hash function, X could be a finite or infinite set; Y
is always a finite set.
If X is a finite set, a hash function is sometimes called a compression
function, and we always assume that |X| ≥ |Y|.
It is customary to assume the stronger condition |X| ≥ 2|Y|.
A pair (x, y) ∈ X × Y is said to be a valid pair under the key K if
hK (x) = y.
One aim of the study of hash functions is to develop methods that resist
creation of valid pairs by adversaries.
Notation and terminology
Let $F^{X,Y}$ denote the set of all functions from X to Y.
Let |X| = N and |Y| = M.
The number of all possible hash functions from X to Y is $|F^{X,Y}| = M^N$.
Any hash family $F \subseteq F^{X,Y}$ is termed an (N, M)-hash family.
An unkeyed hash function is a function h : X → Y; it can be viewed as a hash family with $|\mathcal{K}| = 1$.
Security of Hash Functions
Suppose that h : X → Y is an unkeyed hash function. Let x ∈ X , and
define y = h(x).
If a hash function is to be considered to be secure, it should be the case
that the following three problems are difficult to solve.
1 Preimage:
Instance: A hash function h : X → Y and an element y ∈ Y.
Find: x ∈ X such that h(x) = y.
2 Second Preimage:
Instance: A hash function h : X → Y and an element x ∈ X.
Find: x′ ∈ X such that x′ ≠ x and h(x′) = h(x).
3 Collision:
Instance: A hash function h : X → Y.
Find: x, x′ ∈ X such that x′ ≠ x and h(x′) = h(x).
Random Oracle Model
The random oracle model is an idealized model for a hash function which
attempts to capture the concept of an “ideal” hash function.
If a hash function h is well designed, it should be the case that the only
efficient way to determine the value h(x) for a given x is to actually
evaluate the function h at the value x.
This should remain true even if many other values $h(x_1), h(x_2), \ldots$ have
already been calculated.
Random Oracle Model
The random oracle model, which was introduced by Bellare and
Rogaway, provides a mathematical model of an “ideal” hash function.
In this model, a hash function h : X → Y is chosen randomly from
$F^{X,Y}$, and we are only permitted oracle access to the function h.
This means that we are not given a formula or an algorithm to compute
values of the function h, and the only way to compute a value h(x) is to
query the oracle.
The random oracle model
Theorem
Suppose that $h \in F^{X,Y}$ is chosen randomly, and let $X_0 \subseteq X$. Suppose that
the values h(x) have been determined (by querying an oracle for h) if and only if
$x \in X_0$. Then $\Pr[h(x) = y] = 1/M$ for all $x \in X \setminus X_0$ and all $y \in Y$.
Example where the random oracle model does not apply
Let $n \ge 2$ be an integer and $a, b \in \mathbb{Z}_n$. Define $h : \mathbb{Z}_n \times \mathbb{Z}_n \to \mathbb{Z}_n$ by
$h(x, y) = ax + by \bmod n$.
Suppose $h(x_1, y_1) = z_1$ and $h(x_2, y_2) = z_2$. Let $r, s \in \mathbb{Z}_n$. Then
$h(rx_1 + sx_2 \bmod n,\ ry_1 + sy_2 \bmod n)$
$= a(rx_1 + sx_2) + b(ry_1 + sy_2) \bmod n$
$= r(ax_1 + by_1) + s(ax_2 + by_2) \bmod n$
$= r\,h(x_1, y_1) + s\,h(x_2, y_2) \bmod n = rz_1 + sz_2 \bmod n.$
Suppose we are told that a hash function in use is a linear function from
$\mathbb{Z}_n \times \mathbb{Z}_n$ to $\mathbb{Z}_n$.
Then given the hash values for any two points, we can determine the
hash values at several other points without actually evaluating the hash
function.
This shows that the random oracle model does not hold for such a
function.
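The prediction step in this example can be checked numerically. The sketch below uses arbitrary illustrative values for n, a, b and the query points; none of them come from the slides. The attacker's `predicted` value is computed only from the two known hash values z1 and z2, never from a and b.

```python
def make_linear_hash(a: int, b: int, n: int):
    """Return h(x, y) = a*x + b*y mod n with the coefficients hidden inside."""
    def h(x: int, y: int) -> int:
        return (a * x + b * y) % n
    return h


n = 101                                  # illustrative modulus
h = make_linear_hash(a=17, b=42, n=n)    # a and b play the role of secrets

# Two hash values obtained by querying h.
(x1, y1), (x2, y2) = (3, 5), (10, 7)
z1, z2 = h(x1, y1), h(x2, y2)

# Predict the hash at a new point using linearity only.
r, s = 4, 9
predicted = (r * z1 + s * z2) % n

# One extra evaluation, purely to confirm the prediction.
actual = h((r * x1 + s * x2) % n, (r * y1 + s * y2) % n)
```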
Randomized algorithms
A Las Vegas algorithm is a randomized algorithm which may fail to give
an answer, but if the algorithm does return an answer, then the answer
must be correct.
Suppose $0 \le \epsilon < 1$ is a real number. A randomized algorithm has
average-case success probability $\epsilon$ if the probability that the algorithm
returns a correct answer, averaged over all problem instances of a
specified size, is at least $\epsilon$. (It has worst-case success probability $\epsilon$ if this bound holds for every problem instance.)
An $(\epsilon, Q)$-algorithm denotes a Las Vegas algorithm with average-case
success probability $\epsilon$, in which the number of oracle queries made by the algorithm
is at most Q.
The success probability is the average over all possible random choices
of $h \in F^{X,Y}$, and all possible random choices of x ∈ X or y ∈ Y, if x
and y are specified as part of the problem instance.
Find-Preimage
Input: h, y, Q;
choose any $X_0 \subseteq X$, $|X_0| = Q$;
for each $x \in X_0$ do
    if h(x) = y then
        return x
    end
end
return failure
Algorithm 1: Find-Preimage
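Find-Preimage can be tried out against a simulated random oracle. The sketch below is an assumption-laden illustration: the class `RandomOracle` lazily samples a random function into {0, ..., M−1}, and `estimate_success` is a hypothetical helper that averages over freshly sampled oracles and targets, so its result can be compared with 1 − (1 − 1/M)^Q.

```python
import random


class RandomOracle:
    """Lazily sampled random function with outputs in {0, ..., M-1}."""

    def __init__(self, M: int, seed=None):
        self.M = M
        self.table = {}
        self.rng = random.Random(seed)

    def __call__(self, x):
        # A value is fixed the first time x is queried and reused afterwards.
        if x not in self.table:
            self.table[x] = self.rng.randrange(self.M)
        return self.table[x]


def find_preimage(h, y, X0):
    """Algorithm 1: query h on every x in X0; return a preimage of y or None."""
    for x in X0:
        if h(x) == y:
            return x
    return None


def estimate_success(M: int, Q: int, trials: int, seed: int = 0) -> float:
    """Average-case success rate over random oracles h and random targets y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = RandomOracle(M, seed=rng.getrandbits(64))
        y = rng.randrange(M)
        if find_preimage(h, y, range(Q)) is not None:
            hits += 1
    return hits / trials
```

For example, `estimate_success(M=64, Q=20, trials=5000)` should land close to 1 − (63/64)^20, which is roughly 0.27.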
Find-Preimage
Theorem
For any $X_0 \subseteq X$ with $|X_0| = Q$, the average-case success probability of
Algorithm 1 is $\epsilon = 1 - (1 - 1/M)^Q$.
Proof.
Let $y \in Y$, and $X_0 = \{x_1, \ldots, x_Q\}$. Let $E_i$ denote the event “$h(x_i) = y$”.
In the random oracle model the events $E_i$ are independent, with $\Pr[E_i] = 1/M$ and $\Pr[E_i^c] = 1 - 1/M$.
$\Pr[E_1 \vee E_2 \vee \cdots \vee E_Q] = 1 - \Pr[E_1^c \wedge E_2^c \wedge \cdots \wedge E_Q^c] = 1 - \left(1 - \frac{1}{M}\right)^Q.$
Find-Second-Preimage
Input: h, x, Q;
y ← h(x);
choose $X_0 \subseteq X \setminus \{x\}$, $|X_0| = Q - 1$;
for each $x' \in X_0$ do
    if h(x′) = y then
        return x′
    end
end
return failure
Algorithm 2: Find-Second-Preimage
Find-Second-Preimage
Theorem
For any $X_0 \subseteq X \setminus \{x\}$ with $|X_0| = Q - 1$, the average-case success probability of
Algorithm 2 is $\epsilon = 1 - (1 - 1/M)^{Q-1}$.
Proof.
Let $y = h(x)$, and $X_0 = \{x_1, \ldots, x_{Q-1}\}$. Let $E_i$ denote the event “$h(x_i) = y$”.
In the random oracle model the events $E_i$ are independent, with $\Pr[E_i] = 1/M$ and $\Pr[E_i^c] = 1 - 1/M$.
$\Pr[E_1 \vee E_2 \vee \cdots \vee E_{Q-1}] = 1 - \Pr[E_1^c \wedge E_2^c \wedge \cdots \wedge E_{Q-1}^c] = 1 - \left(1 - \frac{1}{M}\right)^{Q-1}.$
Find-Collision
Input: h, Q;
choose $X_0 \subseteq X$, $|X_0| = Q$;
for each $x \in X_0$ do
    $y_x$ ← h(x);
end
if $y_x = y_{x'}$ for some $x \ne x'$ then
    return (x, x′)
else
    return failure
end
Algorithm 3: Find-Collision
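A possible Python rendering of Find-Collision is shown below. It differs from the pseudocode in one harmless way: it stops at the first repeated hash value instead of hashing all of X0 first. The 16-bit hash `toy_hash` (SHA-256 truncated to two bytes) is an assumption made only so that collisions are easy to find.

```python
import hashlib


def find_collision(h, X0):
    """Return (x, x') with x != x' and h(x) == h(x'), or None if none is seen."""
    seen = {}                      # maps hash value -> first input producing it
    for x in X0:
        y = h(x)
        if y in seen:
            return seen[y], x
        seen[y] = x
    return None


def toy_hash(x: int) -> int:
    """Toy 16-bit hash (M = 2**16): the first two bytes of SHA-256."""
    digest = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:2], "big")


# With 2**16 + 1 distinct inputs a collision is guaranteed by pigeonhole;
# in practice the search stops after a few hundred queries (birthday effect).
pair = find_collision(toy_hash, range(2**16 + 1))
```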
Find-Collision
Theorem
For any $X_0 \subseteq X$ with $|X_0| = Q$, the success probability of Algorithm 3 is
$\epsilon = 1 - \left(\frac{M-1}{M}\right)\left(\frac{M-2}{M}\right)\cdots\left(\frac{M-Q+1}{M}\right).$
Proof.
Let $X_0 = \{x_1, x_2, \ldots, x_Q\}$. For $1 \le i \le Q$, let $E_i$ denote the event
$h(x_i) \notin \{h(x_1), h(x_2), \ldots, h(x_{i-1})\}$.
$\Pr[E_i \mid E_1 \wedge E_2 \wedge \cdots \wedge E_{i-1}] = \frac{M-i+1}{M}.$
Therefore
$\Pr[E_1 \wedge E_2 \wedge \cdots \wedge E_Q] = \prod_{i=1}^{Q-1}\left(1 - \frac{i}{M}\right).$
Find-Collision
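$\Pr[E_1 \wedge E_2 \wedge \cdots \wedge E_Q] = \prod_{i=1}^{Q-1}\left(1 - \frac{i}{M}\right)$
$\approx \prod_{i=1}^{Q-1} e^{-i/M} = e^{-\sum_{i=1}^{Q-1} i/M} = e^{-Q(Q-1)/(2M)},$
using the estimate $1 - x \approx e^{-x}$ for small x.
The probability of finding at least one collision is $\epsilon \approx 1 - e^{-Q(Q-1)/(2M)}$.
$e^{-Q(Q-1)/(2M)} \approx 1 - \epsilon$; $\frac{-Q(Q-1)}{2M} \approx \ln(1 - \epsilon)$; $Q^2 - Q \approx 2M \ln\frac{1}{1-\epsilon}$.
Ignoring the term −Q,
$Q \approx \sqrt{2M \ln\frac{1}{1-\epsilon}}.$
If we take $\epsilon = 0.5$, then our estimate is $Q \approx 1.17\sqrt{M}$.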
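The estimate can be compared with the exact product numerically. In this sketch the function names are made up; `collision_probability` evaluates the exact expression 1 − ∏(1 − i/M) and `queries_needed` evaluates the square-root approximation.

```python
import math


def collision_probability(Q: int, M: int) -> float:
    """Exact probability of at least one collision among Q random values."""
    p_no_collision = 1.0
    for i in range(1, Q):
        p_no_collision *= 1 - i / M
    return 1 - p_no_collision


def queries_needed(M: int, eps: float = 0.5) -> float:
    """Approximate Q giving collision probability eps: sqrt(2 M ln(1/(1-eps)))."""
    return math.sqrt(2 * M * math.log(1 / (1 - eps)))


# Classical birthday setting, M = 365: about 22.5 people are needed,
# and with 23 people the exact probability is just above 1/2.
birthday_estimate = queries_needed(365)
birthday_exact = collision_probability(23, 365)

# For a 160-bit digest, M = 2**160, the estimate is about 1.17 * 2**80.
factor_160 = queries_needed(2**160) / 2**80
```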
Comparison of security criteria
Collision-To-Second-Preimage: Finding a collision is no harder than
finding a second preimage. If we have an algorithm that solves the
second-preimage problem, then it can be used to find a collision. This is said
to be a “reduction” of the problem Collision to the problem
Second-Preimage.
Collision-To-Second-Preimage
Input: external Oracle-2nd-Preimage
choose x ∈ X uniformly at random;
if Oracle-2nd-Preimage(h, x) = x′ then
    return (x, x′)
else
    return failure
end
Algorithm 4: Collision-To-Second-Preimage
Collision-To-Preimage
Collision-To-Preimage: Suppose that we have a (1, Q)-algorithm to
solve Preimage. The question is whether it can be used to solve Collision.
The following randomized algorithm performs that task. This is a
“reduction” of the problem Collision to the problem Preimage.
Collision-To-Preimage
Input: external Oracle-Preimage
choose x ∈ X uniformly at random;
y ← h(x);
if Oracle-Preimage(h, y) = x′ and x′ ≠ x then
    return (x, x′)
else
    return failure
end
Algorithm 5: Collision-To-Preimage
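The reduction can be exercised on a small example. Everything concrete here is an assumption for illustration: the domain X = {0, ..., 63}, the toy hash into 16 values, and a brute-force search standing in for the (1, Q) preimage oracle.

```python
import random

X = list(range(64))                 # toy domain, |X| = 64


def h(x: int) -> int:
    """Toy hash into Y = {0, ..., 15}, so |X| >= 2|Y|."""
    return (x * x + 3 * x) % 16


def brute_force_preimage(h, y):
    """Stand-in for Oracle-Preimage: always returns the smallest preimage of y."""
    for x in X:
        if h(x) == y:
            return x
    return None


def collision_to_preimage(h, oracle_preimage, rng=random):
    """Algorithm 5: pick a random x, ask the oracle for a preimage of h(x)."""
    x = rng.choice(X)
    y = h(x)
    x_prime = oracle_preimage(h, y)
    if x_prime is not None and x_prime != x:
        return x, x_prime
    return None                     # failure: the oracle returned x itself


# The reduction fails only when x is the oracle's own answer, which happens
# for exactly one x per equivalence class.
successes = sum(1 for x in X if brute_force_preimage(h, h(x)) != x)
success_rate = successes / len(X)
result = collision_to_preimage(h, brute_force_preimage, random.Random(1))
```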
Collision-to-Preimage
Let x ∼ x′ if and only if h(x) = h(x′).
[x] = {x′ ∈ X : x′ ∼ x} is the equivalence class of x.
Let the set of all equivalence classes be $\mathcal{C}$.
For x ∈ X, let y = h(x). The probability that
Collision-To-Preimage is successful is $\frac{|[x]| - 1}{|[x]|}$.
The average probability of success is
$\Pr[\text{success}] = \frac{1}{|X|}\sum_{x \in X}\frac{|[x]| - 1}{|[x]|} = \frac{1}{|X|}\sum_{C \in \mathcal{C}}\sum_{x \in C}\frac{|C| - 1}{|C|}$
$= \frac{1}{|X|}\sum_{C \in \mathcal{C}}(|C| - 1) = \frac{|X| - |\mathcal{C}|}{|X|} \ge \frac{|X| - |Y|}{|X|} = 1 - \frac{|Y|}{|X|} \ge \frac{1}{2}$, if $|X| \ge 2|Y|$,
where the first inequality holds because $|\mathcal{C}| \le |Y|$.
Iterated Hash Functions
We denote the length of a bitstring x by |x|.
The concatenation of the bitstrings x and y is written as $x \| y$.
Let compress : $\{0,1\}^{m+t} \to \{0,1\}^m$.
We use the compression function compress to construct a hash function
$h : \bigcup_{i=m+t+1}^{\infty} \{0,1\}^i \to \{0,1\}^{\ell}.$
Iterated Hash Functions
Preprocessing step
Given an input string x, where $|x| \ge m + t + 1$, construct a string y, using a
public algorithm, such that $|y| \equiv 0 \pmod{t}$. Denote $y = y_1 \| y_2 \| \cdots \| y_r$,
where $|y_i| = t$ for $1 \le i \le r$.
Processing step
Let IV be a public initial value that is a bitstring of length m. Then compute
the following:
$z_0 \leftarrow \mathrm{IV}$
$z_1 \leftarrow \mathrm{compress}(z_0 \| y_1)$
$z_2 \leftarrow \mathrm{compress}(z_1 \| y_2)$
$\vdots$
$z_r \leftarrow \mathrm{compress}(z_{r-1} \| y_r).$
Iterated Hash Functions
Output step
Let $g : \{0,1\}^m \to \{0,1\}^{\ell}$ be a public function. Define $h(x) = g(z_r)$.
Padding
A padding function is a publicly disclosed function that is applied to x to
produce pad(x).
Typically pad(x) involves the length |x| and additional zeros so that the
length of $y = x \| \mathrm{pad}(x)$ is divisible by t.
Merkle-Damgård Construction
Suppose compress : $\{0,1\}^{m+t} \to \{0,1\}^m$ is a collision resistant
compression function, where $t \ge 1$. We will use compress to construct a
collision resistant hash function $h : X \to \{0,1\}^m$, where
$X = \bigcup_{i=m+t+1}^{\infty} \{0,1\}^i$.
Case 1: $t \ge 2$
Suppose $x \in X$, and $|x| = n \ge m + t + 1$. Let $k = \lceil n/(t-1) \rceil$ and $d = k(t-1) - n$.
We can express x as the concatenation $x = x_1 \| x_2 \| \cdots \| x_k$, where
$|x_1| = |x_2| = \cdots = |x_{k-1}| = t - 1$ and $|x_k| = t - 1 - d$.
$y(x) = y_1 \| y_2 \| \cdots \| y_{k+1}$. $y_k$ is formed from $x_k$ by padding on the right with
d zeroes, so that all the blocks $y_i$ ($1 \le i \le k$) are of length t − 1. $y_{k+1}$ should
be padded on the left with zeroes so that $|y_{k+1}| = t - 1$.
Merkle-Damgård Construction
external compress;
comment: compress : $\{0,1\}^{m+t} \to \{0,1\}^m$, where $t \ge 2$;
n ← |x|;
k ← ⌈n/(t − 1)⌉;
d ← k(t − 1) − n;
for i ← 1 to k − 1 do
    $y_i$ ← $x_i$;
end
$y_k$ ← $x_k \| 0^d$;
$y_{k+1}$ ← the binary representation of d;
$z_1$ ← $0^{m+1} \| y_1$;
$g_1$ ← compress($z_1$);
for i ← 1 to k do
    $z_{i+1}$ ← $g_i \| 1 \| y_{i+1}$;
    $g_{i+1}$ ← compress($z_{i+1}$);
end
return $g_{k+1}$;
Algorithm 6: Merkle-Damgård(x)
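The construction for t ≥ 2 can be sketched in Python on bitstrings written as strings of '0' and '1'. The parameters m = 32 and t = 9 and the SHA-256-based `compress` are toy assumptions chosen for readability; they are not part of the construction and offer no real security at this size.

```python
import hashlib

M_BITS, T_BITS = 32, 9      # toy values of m and t (t >= 2), chosen arbitrarily


def compress(z: str) -> str:
    """Toy compression function {0,1}^(m+t) -> {0,1}^m built from SHA-256."""
    assert len(z) == M_BITS + T_BITS
    digest = hashlib.sha256(z.encode()).digest()
    value = int.from_bytes(digest, "big") >> (256 - M_BITS)
    return format(value, f"0{M_BITS}b")


def merkle_damgard(x: str) -> str:
    """Hash the bitstring x following the t >= 2 case of the construction."""
    m, t = M_BITS, T_BITS
    n = len(x)
    assert n >= m + t + 1
    k = -(-n // (t - 1))                 # ceiling of n / (t - 1)
    d = k * (t - 1) - n                  # number of padding zeroes
    # y_1 .. y_k: blocks of length t - 1, the last padded on the right.
    blocks = [x[i * (t - 1):(i + 1) * (t - 1)] for i in range(k)]
    blocks[-1] += "0" * d
    # y_{k+1}: binary representation of d, padded on the left to t - 1 bits.
    blocks.append(format(d, f"0{t - 1}b"))
    g = compress("0" * (m + 1) + blocks[0])      # z_1 = 0^(m+1) || y_1
    for i in range(1, k + 1):
        g = compress(g + "1" + blocks[i])        # z_{i+1} = g_i || 1 || y_{i+1}
    return g                                     # g_{k+1}


digest = merkle_damgard("1" * 50)
```

Each call to `compress` consumes m + t bits: m bits of chaining value, one marker bit, and t − 1 message bits.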
Collision Resistance
Theorem
Suppose compress : $\{0,1\}^{m+t} \to \{0,1\}^m$ is a collision resistant
compression function, where $t \ge 2$. Then the function
$h : \bigcup_{i=m+t+1}^{\infty} \{0,1\}^i \to \{0,1\}^m,$
as constructed in Algorithm 6, is a collision resistant hash function.
Merkle-Damgård Construction
Merkle-Damgård(x) for t = 1
external compress;
comment: compress : $\{0,1\}^{m+1} \to \{0,1\}^m$;
n ← |x|;
y ← $11 \| f(x_1) \| f(x_2) \| \cdots \| f(x_n)$;
denote $y = y_1 \| y_2 \| \cdots \| y_k$, where $y_i \in \{0,1\}$, $1 \le i \le k$;
$g_1$ ← compress($0^m \| y_1$);
for i ← 1 to k − 1 do
    $g_{i+1}$ ← compress($g_i \| y_{i+1}$);
end
return $g_k$;
Algorithm 7: Merkle-Damgård(x)
Here $|x| = n \ge m + 2$, and the encoding f is defined by f(0) = 0, f(1) = 01.
Collision Resistance for t = 1
Theorem
Suppose compress : $\{0,1\}^{m+1} \to \{0,1\}^m$ is a collision resistant
compression function. Then the function
$h : \bigcup_{i=m+2}^{\infty} \{0,1\}^i \to \{0,1\}^m,$
as constructed in Algorithm 7, is a collision resistant hash function.
Some Examples of iterated hash functions
Hash functions constructed by using Merkle-Damgård approach:
MD4 was proposed by Rivest in 1990.
MD5 was proposed by Rivest in 1992.
SHA was proposed as a standard by NIST in 1993, and published as FIPS
180. SHA is now referred to as SHA-0.
Discovery of collisions:
Collisions in the compression functions of MD4 and MD5 were discovered
in the mid-1990s.
It was shown in 1998 that SHA-0 has a weakness that would allow
collisions to be found in approximately $2^{61}$ steps, which is much more
efficient than a birthday attack, which requires about $2^{80}$ steps.
Some Examples of iterated hash functions
List of further attacks:
At CRYPTO 2004:
A collision for SHA-0 was found by Joux.
Collisions for MD5 and several other popular hash functions were found by
Wang, Lai, and Yu.
The first collision for SHA-1 was found by Stevens, Bursztein, Karpman,
Albertini, and Markov and announced on 23 February 2017. This attack
was approximately 100000 times faster than a brute-force “birthday
paradox” search, which would require roughly $2^{80}$ trials.
SHA-2 includes four hash functions known as SHA-224, SHA-256,
SHA-384, and SHA-512.
The last three of the above were approved as FIPS standards in 2002.
Operations used in SHA-1
X ∧ Y        bitwise “and” of X and Y
X ∨ Y        bitwise “or” of X and Y
X ⊕ Y        bitwise “x-or” of X and Y
¬X           bitwise complement of X
X + Y        integer addition modulo $2^{32}$
$\mathrm{ROTL}^s(X)$     circular left shift of X by s positions (0 ≤ s ≤ 31)
These operations are very efficient.
However, when a suitable sequence of these operations is performed, the
output is quite unpredictable.
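On 32-bit words these operations map directly onto integer arithmetic. A small sketch follows; the helper names `add32`, `rotl` and `not32` are made up for the example, and Python's unbounded integers are masked back to 32 bits explicitly.

```python
MASK32 = 0xFFFFFFFF            # keeps results within 32 bits


def add32(x: int, y: int) -> int:
    """Integer addition modulo 2**32."""
    return (x + y) & MASK32


def rotl(x: int, s: int) -> int:
    """Circular left shift of the 32-bit word x by s positions, 0 <= s <= 31."""
    return ((x << s) | (x >> (32 - s))) & MASK32


def not32(x: int) -> int:
    """Bitwise complement of a 32-bit word."""
    return ~x & MASK32


wrapped = add32(0xFFFFFFFF, 1)        # wraps around to 0
rotated = rotl(0x80000001, 1)         # top bit re-enters at the bottom: 0x00000003
```

The bitwise “and”, “or” and “x-or” correspond to Python's `&`, `|` and `^` operators.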
The Sponge Construction
SHA-3 is based on a design called the sponge construction.
This technique was developed by Bertoni, Daemen, Peeters, and Van
Assche.
Instead of using a compression function, the basic “building block” is a
function f that maps bitstrings of a fixed length to bitstrings of the same
length.
Typically f will be a bijection, so every bitstring will have a unique
preimage.
The Sponge Construction
Suppose that f operates on bitstrings of length b. That is,
$f : \{0,1\}^b \to \{0,1\}^b$. The integer b is called the width.
Write b = r + c, where r is the bitrate and c is the capacity.
The value of r affects the efficiency of the resulting sponge function, as a
message will be processed r bits at a time.
The value of c affects the resulting security of the sponge function.
The security level against a certain kind of collision attack is intended to
be roughly $2^{c/2}$. This is comparable to the security of a random oracle
with a c-bit output.
The Sponge Construction
The sponge function based on f works as follows:
The input message M is a bitstring of arbitrary length.
M is padded appropriately so that its length is a multiple of r.
Then the padded message is split into blocks of length r.
The Sponge Construction
The sponge function based on f works as follows:
Absorbing phase
Initially the state is a bitstring of length b consisting of zeroes.
The first block of the padded message is exclusive-ored with the first r
bits of the state. Then the function f is applied which updates the state.
This process is repeated with the remaining blocks of the padded
message.
Squeezing phase
Suppose $\ell$ output bits are desired.
Take the first r bits of the current state; this forms an output block.
If $\ell > r$, we apply f to the current state and take the first r output bits as
another output block.
The process is repeated until we have a total of at least $\ell$ bits.
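The absorbing and squeezing phases can be sketched with a toy sponge. All concrete choices below are assumptions made for illustration: the width b = 32 with r = c = 16, the byte-level padding (a 0x80 byte followed by zeroes), and the bijection `f`, which is a simple invertible mixing function and not Keccak-f. The sketch also truncates the squeezed blocks to the requested number of bytes.

```python
R_BYTES, C_BYTES = 2, 2                     # toy bitrate r = 16, capacity c = 16
B_BITS = 8 * (R_BYTES + C_BYTES)            # width b = r + c = 32
MASK = (1 << B_BITS) - 1


def f(state: int) -> int:
    """Toy bijection on b-bit states (NOT Keccak-f): each step is invertible."""
    for _ in range(4):
        state = (state * 0x9E3779B1 + 0x7F4A7C15) & MASK        # odd multiplier
        state = ((state << 13) | (state >> (B_BITS - 13))) & MASK  # rotation
        state ^= state >> 7                                      # xorshift
    return state


def pad(message: bytes) -> bytes:
    """Append 0x80 and then zero bytes up to a multiple of r."""
    padded = message + b"\x80"
    return padded + b"\x00" * (-len(padded) % R_BYTES)


def sponge(message: bytes, out_len: int) -> bytes:
    """Absorb the padded message, then squeeze out_len bytes."""
    state = 0                                # all-zero initial state
    shift = 8 * C_BYTES                      # the first r bits are the top bits
    padded = pad(message)
    for i in range(0, len(padded), R_BYTES):             # absorbing phase
        block = int.from_bytes(padded[i:i + R_BYTES], "big")
        state = f(state ^ (block << shift))
    out = b""
    while True:                                           # squeezing phase
        out += (state >> shift).to_bytes(R_BYTES, "big")
        if len(out) >= out_len:
            return out[:out_len]
        state = f(state)


short = sponge(b"abc", 4)
longer = sponge(b"abc", 8)       # the same stream, squeezed further
```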
Diagram of the sponge construction
Figure 1: sponge construction
The Sponge Construction
$M = m_1 \| \cdots \| m_k$, where $m_1, \ldots, m_k \in \{0,1\}^r$.
Let the initial state be $x_0 \| y_0$, where $x_0 = 0^r \in \{0,1\}^r$ and
$y_0 = 0^c \in \{0,1\}^c$, and let the state after the ith step be $x_i \| y_i$,
where $x_i \in \{0,1\}^r$ and $y_i \in \{0,1\}^c$.
The following equations describe the state transitions.
$x_1 \| y_1 = f((m_1 \oplus x_0) \| y_0)$
$x_2 \| y_2 = f((m_2 \oplus x_1) \| y_1)$
$\vdots$
$x_k \| y_k = f((m_k \oplus x_{k-1}) \| y_{k-1}).$
Generation of an internal collision
Suppose
$x_1 \| y_1 = f(x_0 \| y_0)$
$x_2 \| y_2 = f(x_0 \| y_1)$
$\vdots$
$x_k \| y_k = f(x_0 \| y_{k-1}).$
There exists $h < k$ such that $f(x_0 \| y_k) = f(x_0 \| y_h)$.
Consider the messages $M = x_0 \| x_1 \| \cdots \| x_h$ and $M' = x_0 \| x_1 \| \cdots \| x_k$. Absorbing their blocks gives
$x_1 \| y_1 = f((x_0 \oplus x_0) \| y_0) = f(x_0 \| y_0)$
$x_2 \| y_2 = f((x_1 \oplus x_1) \| y_1) = f(x_0 \| y_1)$
$\vdots$
$x_h \| y_h = f((x_{h-1} \oplus x_{h-1}) \| y_{h-1}) = f(x_0 \| y_{h-1})$
$x_{h+1} \| y_{h+1} = f((x_h \oplus x_h) \| y_h) = f(x_0 \| y_h)$
$\vdots$
$x_{k+1} \| y_{k+1} = f((x_k \oplus x_k) \| y_k) = f(x_0 \| y_k).$
Since $f(x_0 \| y_h) = f(x_0 \| y_k)$, we get $x_{h+1} \| y_{h+1} = x_{k+1} \| y_{k+1}$, so M and M′ lead to the same internal state.
Generating collision from the output
Suppose the squeezing phase produces an $\ell$-bit output string.
We can generate a collision by mounting a birthday attack on the output
by evaluating the sponge function approximately $2^{\ell/2}$ times.
Therefore, we can generate a collision by applying the sponge function about
$\min\{2^{\ell/2}, 2^{c/2}\}$ times.
If $\ell \le c$, then generate a collision from $\ell$-bit output strings by mounting
a birthday attack.
If $c < \ell$, generate an output collision by constructing an internal collision using
the technique discussed in the previous slide.
SHA-3
SHA-3 comprises the hash functions SHA3-224, SHA3-256, SHA3-384, and SHA3-512.
SHAKE128 and SHAKE256 are extendable-output functions (XOFs).
hash function   b      r      c      collision security   preimage security
SHA3-224        1600   1152   448    112                  224
SHA3-256        1600   1088   512    128                  256
SHA3-384        1600   832    768    192                  384
SHA3-512        1600   576    1024   256                  512
SHAKE128        1600   1344   256    min{d/2, 128}        min{d, 128}
SHAKE256        1600   1088   512    min{d/2, 256}        min{d, 256}
Here d denotes the output length in bits requested from the XOF; security levels are in bits.
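These functions are available in Python's standard `hashlib` module, which gives a quick way to see the fixed digest lengths and the extendable output of SHAKE. The message below is arbitrary.

```python
import hashlib

msg = b"Introduction to Hash Functions"

# Fixed-length SHA-3 functions: the digest length in bits matches the name.
digest_bits = {
    name: 8 * hashlib.new(name, msg).digest_size
    for name in ("sha3_224", "sha3_256", "sha3_384", "sha3_512")
}

# SHAKE128 is an XOF: the caller chooses the output length (in bytes here),
# and a longer output extends a shorter one.
xof_short = hashlib.shake_128(msg).digest(16)
xof_long = hashlib.shake_128(msg).digest(64)
```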
Message Authentication Codes: keyed hash function
Keyed hash function from an unkeyed hash function
Suppose h is an unkeyed hash function with IV as the initial value that
requires every input message x to have a length that is a multiple of t.
h utilizes the compression function compress : $\{0,1\}^{m+t} \to \{0,1\}^m$.
The initial value IV is set to the key K, i.e., IV = K.
An iterative keyed hash function
$z_0 \leftarrow K$
$z_1 \leftarrow \mathrm{compress}(z_0 \| y_1)$
$z_2 \leftarrow \mathrm{compress}(z_1 \| y_2)$
$\vdots$
$z_r \leftarrow \mathrm{compress}(z_{r-1} \| y_r).$
Unkeyed to keyed hash functions
IV = K
$z_0 \leftarrow K$
$z_1 \leftarrow \mathrm{compress}(z_0 \| y_1)$
$z_2 \leftarrow \mathrm{compress}(z_1 \| y_2)$
$\vdots$
$z_r \leftarrow \mathrm{compress}(z_{r-1} \| y_r).$
Length extension attack
We have x and $h_K(x)$.
Consider the message $x \| x'$, where $|x'| = t$. Then $h_K(x \| x') = \mathrm{compress}(h_K(x) \| x')$.
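The length extension attack on the IV = K construction can be demonstrated with a toy keyed hash. The block sizes, the SHA-256-based `compress`, the key and the messages below are all assumptions for illustration. The function `extend` uses only the known tag and the appended blocks, never the key.

```python
import hashlib

M_BYTES, T_BYTES = 16, 8        # toy sizes for m and t, in bytes


def compress(z: bytes) -> bytes:
    """Toy compression function on m + t bytes, built from SHA-256."""
    assert len(z) == M_BYTES + T_BYTES
    return hashlib.sha256(z).digest()[:M_BYTES]


def keyed_hash(key: bytes, message: bytes) -> bytes:
    """Iterated hash with IV = K and no padding (length a multiple of t)."""
    assert len(key) == M_BYTES and len(message) % T_BYTES == 0
    z = key
    for i in range(0, len(message), T_BYTES):
        z = compress(z + message[i:i + T_BYTES])
    return z


def extend(known_tag: bytes, extension: bytes) -> bytes:
    """Oscar's computation: continue the iteration from h_K(x) without K."""
    assert len(extension) % T_BYTES == 0
    z = known_tag
    for i in range(0, len(extension), T_BYTES):
        z = compress(z + extension[i:i + T_BYTES])
    return z


K = bytes(range(16))                     # secret key, unknown to Oscar
x = b"8bytes!!" * 2                      # message with a known tag
x_prime = b"appended"                    # one extra block chosen by Oscar

forged_tag = extend(keyed_hash(K, x), x_prime)
genuine_tag = keyed_hash(K, x + x_prime)
```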
Length extension attack with padding
Suppose $y = x \| \mathrm{pad}(x)$, such that $|y| = rt$.
Let $|w| = t$. Define $x' = x \| \mathrm{pad}(x) \| w$.
$y' = x' \| \mathrm{pad}(x') = x \| \mathrm{pad}(x) \| w \| \mathrm{pad}(x')$, where $|y'| = r't$ for some
integer $r' > r$.
Computing $h_K(x')$
Let $z_r = h_K(x)$.
$z_{r+1} \leftarrow \mathrm{compress}(h_K(x) \| y_{r+1})$
$z_{r+2} \leftarrow \mathrm{compress}(z_{r+1} \| y_{r+2})$
$\vdots$
$z_{r'} \leftarrow \mathrm{compress}(z_{r'-1} \| y_{r'})$
So $h_K(x') = z_{r'}$.
MAC attack models
The objective of an adversary (Oscar) is to try to produce a message-tag
pair (x, y) that is valid under a fixed but unknown key, K.
Oscar might have access to some valid pairs for the key K:
$(x_1, y_1), (x_2, y_2), \ldots, (x_Q, y_Q)$.
Two standard attack models are
1 known message attack;
2 chosen message attack.
Forgery
Suppose Q valid pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_Q, y_Q)$ for an unknown
key K are available to Oscar.
If Oscar can output a valid message-tag pair (x, y) such that
$x \notin \{x_1, \ldots, x_Q\}$ with probability at least $\epsilon$, then Oscar
is said to be an $(\epsilon, Q)$-forger for the given MAC.
The pair (x, y) is said to be a forgery.
The probability $\epsilon$ can be an average-case probability over all possible
keys, or the worst-case probability.
Two obvious attacks
Key guessing attack:
1 Oscar chooses $K \in \mathcal{K}$ uniformly at random, and outputs the tag $h_K(x)$ for
an arbitrary message x.
2 This attack succeeds with probability $1/|\mathcal{K}|$.
Tag guessing attack:
1 Oscar chooses the tag y ∈ Y uniformly at random and outputs y as the
tag for an arbitrary message x.
2 This attack succeeds with probability $1/|Y|$.
Nested MACs
A nested MAC builds a MAC algorithm from the composition of two
(keyed) hash families.
The composition of the hash families $(X, Y, \mathcal{K}, G)$ and $(Y, Z, \mathcal{L}, H)$ is
$(X, Z, \mathcal{M}, G \circ H)$, in which $\mathcal{M} = \mathcal{K} \times \mathcal{L}$ and
$G \circ H = \{g \circ h : g \in G, h \in H\}$,
where $(g \circ h)_{(K,L)}(x) = h_L(g_K(x))$ for all x ∈ X.
|Y| ≥ |Z| and |X| is either finite or infinite.
If X is finite, then |X| > |Y|.
Nested MACs
The nested MAC is secure if the following two conditions are satisfied:
1 (Y, Z, L, H) is secure as a MAC, given a fixed (unknown) key.
2 (X , Y, K, G) is collision resistant, given a fixed (unknown) key.
We will refer to (Y, Z, L, H) as the “little MAC”.
(X , Z, M, G ◦ H) is the “big MAC” or the “nested MAC.”
Adversaries
We consider the following three adversaries:
1 a forger for the little MAC (which carries out a “little MAC attack”),
2 a collision-finder for the hash family $(X, Y, \mathcal{K}, G)$, when the key is secret
(this is an “unknown-key collision attack”), and
3 a forger for the nested MAC (which we term a “big MAC attack”).
Types of attacks
Little MAC attack: a key L is chosen and kept secret. Oscar is allowed
to choose values for y and query a little MAC oracle for the value of
$h_L(y)$. Then Oscar attempts to output a pair (y′, z) such that $z = h_L(y')$,
where y′ was not one of his previous queries.
Unknown-key collision attack: a key K is chosen and kept secret.
Oscar is allowed to choose values for x and query a hash oracle for
values of $g_K(x)$. Then Oscar attempts to output a pair x′, x″ such that
x′ ≠ x″ and $g_K(x') = g_K(x'')$.
Big MAC attack: a pair of keys (K, L) is chosen and kept secret. Oscar
is allowed to choose values for x and query a big MAC oracle for values
of $h_L(g_K(x))$. Then Oscar attempts to output a pair (x′, z) such that
$z = h_L(g_K(x'))$, where x′ was not one of his previous queries.
Assumptions
We assume that:
1 there does not exist an $(\epsilon_1, Q + 1)$-unknown-key collision attack for a
randomly chosen function $g_K \in G$, where K is secret.
2 there does not exist an $(\epsilon_2, Q)$-little MAC attack for a randomly chosen
function $h_L \in H$, where L is secret.
3 there exists an $(\epsilon, Q)$-big MAC attack for a randomly chosen function
$(g \circ h)_{(K,L)} \in G \circ H$, where (K, L) is secret.
The attack
The big MAC attack outputs a valid pair (x, z) after making at most
Q queries to a big MAC oracle.
Let $x_1, \ldots, x_Q$ be the queries, generating valid message-tag pairs
$(x_1, z_1), \ldots, (x_Q, z_Q)$; the attack then outputs the valid message-tag pair (x, z) with
probability at least $\epsilon$.
Make Q + 1 queries to a hash oracle $g_K$ to obtain
$y_1 = g_K(x_1), \ldots, y_Q = g_K(x_Q)$, and $y = g_K(x)$.
If $y = y_i$ for some $i \in \{1, \ldots, Q\}$, we have a collision. Else, (y, z) is a
valid pair for the little MAC, hence we have a forger for the little MAC.
Probability bounds
Any unknown-key collision attack has probability at most $\epsilon_1$ of succeeding.
The big MAC attack has success probability at least $\epsilon$.
Therefore, the probability that (x, z) is a valid pair and $y \notin \{y_1, \ldots, y_Q\}$
is at least $\epsilon - \epsilon_1$.
The success probability of any little MAC attack is at most $\epsilon_2$.
So $\epsilon - \epsilon_1 \le \epsilon_2$, that is, $\epsilon \le \epsilon_1 + \epsilon_2$.
Theorem
Suppose $(X, Z, \mathcal{M}, G \circ H)$ is a nested MAC. Suppose that there does not exist
an $(\epsilon_1, Q + 1)$-collision attack for a randomly chosen function $g_K \in G$, when
the key K is secret. Further, suppose that there does not exist an
$(\epsilon_2, Q)$-forger for a randomly chosen function $h_L \in H$, where L is secret.
Finally, suppose there exists an $(\epsilon, Q)$-forger for the nested MAC, for a
randomly chosen function $(g \circ h)_{(K,L)} \in G \circ H$. Then $\epsilon \le \epsilon_1 + \epsilon_2$.
HMAC
HMAC is a nested MAC algorithm that was adopted as a FIPS standard
in March, 2002.
$\mathrm{HMAC}_K(x) = \text{SHA-1}((K \oplus \mathrm{opad}) \,\|\, \text{SHA-1}((K \oplus \mathrm{ipad}) \,\|\, x))$
ipad and opad are 512-bit constants, defined in hexadecimal notation as
ipad = 3636 ··· 36, opad = 5C5C ··· 5C.
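The formula can be checked against Python's standard `hmac` module. The sketch below spells out the two nested SHA-1 calls. It assumes the usual convention that the key is zero-padded to the 512-bit block length, and hashed first if it is longer than a block. The key and message are arbitrary.

```python
import hashlib
import hmac

BLOCK = 64                      # SHA-1 block length in bytes (512 bits)


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """HMAC_K(x) = SHA-1((K xor opad) || SHA-1((K xor ipad) || x))."""
    if len(key) > BLOCK:                        # long keys are hashed first
        key = hashlib.sha1(key).digest()
    key = key.ljust(BLOCK, b"\x00")             # zero-pad the key to 512 bits
    inner_key = bytes(k ^ 0x36 for k in key)    # K xor ipad
    outer_key = bytes(k ^ 0x5C for k in key)    # K xor opad
    inner = hashlib.sha1(inner_key + message).digest()
    return hashlib.sha1(outer_key + inner).digest()


key = b"a hypothetical key"
msg = b"Introduction to Hash Functions"
manual = hmac_sha1(key, msg)
library = hmac.new(key, msg, hashlib.sha1).digest()
```

SHA-1 is used here only because the slide states the formula with it.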
CBC-MAC
CBC-MAC(x, K)
denote $x = x_1 \| x_2 \| \cdots \| x_n$;
IV ← 00 ··· 0;
$y_0$ ← IV;
for i ← 1 to n do
    $y_i$ ← $E_K(y_{i-1} \oplus x_i)$;
end
return $y_n$;
Algorithm 8: MAC from block ciphers
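The chaining structure of CBC-MAC can be sketched without a real block cipher. In the code below `toy_encrypt_block` is only a stand-in for $E_K$: it is a truncated HMAC-SHA256, which behaves like a keyed pseudorandom function but is not a block cipher, since it is not invertible. The block size, key and messages are assumptions for illustration.

```python
import hashlib
import hmac

BLOCK = 16                      # toy block length in bytes


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for E_K (NOT a real block cipher): truncated HMAC-SHA256."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive-or of two equal-length strings."""
    return bytes(p ^ q for p, q in zip(a, b))


def cbc_mac(message: bytes, key: bytes) -> bytes:
    """Return y_n, where y_0 = 0...0 and y_i = E_K(y_{i-1} xor x_i)."""
    assert message and len(message) % BLOCK == 0
    y = bytes(BLOCK)                                   # IV = 00...0
    for i in range(0, len(message), BLOCK):
        y = toy_encrypt_block(key, xor_bytes(y, message[i:i + BLOCK]))
    return y


K = b"hypothetical key"
x1, x2 = b"A" * BLOCK, b"B" * BLOCK
tag = cbc_mac(x1 + x2, K)
```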
Authenticated encryption
MAC-and-encrypt: Given a message x, compute a tag $z = h_{K_1}(x)$ and
a ciphertext $y = e_{K_2}(x)$. The pair (y, z) is transmitted. The receiver
would decrypt y, obtaining x, and then verify the correctness of the tag z
on x.
MAC-then-encrypt: Here the tag $z = h_{K_1}(x)$ would be computed first.
Then the plaintext and tag would both be encrypted, yielding
$y = e_{K_2}(x \| z)$. The ciphertext y would be transmitted. The receiver will
decrypt y, obtaining x and z, and then verify the correctness of the tag z
on x.
Encrypt-then-MAC: Here the first step is to encrypt x, producing a
ciphertext $y = e_{K_2}(x)$. Then a tag is created for the ciphertext y, namely,
$z = h_{K_1}(y)$. The pair (y, z) is transmitted. The receiver will first verify
the correctness of the tag z on y. Then, provided that the tag is valid, the
receiver will decrypt y to obtain x.
Authenticated encryption
Encrypt-then-MAC is preferred over the other methods.
A security result due to Bellare and Namprempre says that this method
of authenticated encryption is secure provided that the two component
schemes are secure.
There exist instantiations of MAC-then-encrypt and MAC-and-encrypt
that are insecure, even though the component schemes are secure.
Counter with CBC MAC (CCM) mode of operation
CCM mode computes a tag using CBC-MAC. This is then followed by
an encryption in counter mode.
Let K be the encryption key and let $x = x_1 \| x_2 \| \cdots \| x_n$ be the plaintext.
We choose a counter ctr, and construct a sequence $T_0, T_1, \ldots, T_n$
defined as $T_i = \mathrm{ctr} + i \bmod 2^m$, where m is the block length of the
cipher.
The plaintext blocks $x_1, x_2, \ldots, x_n$ are encrypted by computing
$y_i = x_i \oplus e_K(T_i)$.
Compute temp = CBC-MAC(x, K) and $y' = T_0 \oplus \mathrm{temp}$.
The ciphertext is the string $y = y_1 \| y_2 \| \cdots \| y_n \| y'$.
Decryption
To decrypt and verify y, one would first decrypt $y_1 \| \cdots \| y_n$ using
counter mode decryption with the counter sequence $T_1, T_2, \ldots, T_n$,
obtaining the plaintext string x.
The second step is to compute CBC-MAC(x, K) and see if it is equal to
$y' \oplus T_0$.
The ciphertext is rejected if this condition does not hold.
Galois Counter mode
A detailed description of GCM is given in NIST Special Publication
800-38D.
The encryption is done in counter mode using a 128-bit AES key. The
initial value of the 128-bit counter is derived from an IV that is typically
96 bits in length.
The IV is transmitted along with the ciphertext, and it should be changed
every time a new encryption is performed.
The computation of the authentication tag requires performing
multiplications by a constant value H in the finite field $\mathbb{F}_{2^{128}}$. The value
of H is determined by encrypting Counter 0.
Unconditionally secure MACs
Assumptions:
The adversary has infinite computational power.
Any given key is used to produce only one authentication tag.
For $Q \in \{0, 1\}$ we define the deception probability $Pd_Q$ to be the
probability that the adversary can create a successful forgery after
observing Q valid message-tag pairs.
The attack when Q = 0 is said to be an impersonation attack, and the attack
when Q = 1 is said to be a substitution attack.
Unconditionally Secure MACs
Assumption: the key K is chosen uniformly at random from $\mathcal{K}$.
In a substitution attack Oscar’s success probability may depend on
the particular message-tag pair (x, y) that he observes.
We will take $Pd_1$ to be the maximum of the relevant values over all message-tag pairs (x, y).
Thus when we say that $Pd_1 \le \epsilon$, it means that Oscar’s success
probability is at most $\epsilon$ regardless of the message-tag pair he observes
prior to making his substitution.
Unconditionally Secure MACs
Example
Suppose $X = Y = \mathbb{Z}_3$, and $\mathcal{K} = \mathbb{Z}_3 \times \mathbb{Z}_3$. For
each $K = (a, b) \in \mathcal{K}$ and each $x \in X$, define
$h_{(a,b)}(x) = ax + b \bmod 3$, and then define
$H = \{h_{(a,b)} : (a, b) \in \mathbb{Z}_3 \times \mathbb{Z}_3\}$. Each of the
9 keys is used with probability 1/9.
key (a, b)   x = 0   x = 1   x = 2
(0, 0)       0       0       0
(0, 1)       1       1       1
(0, 2)       2       2       2
(1, 0)       0       1       2
(1, 1)       1       2       0
(1, 2)       2       0       1
(2, 0)       0       2       1
(2, 1)       1       0       2
(2, 2)       2       1       0
Table 1: An authentication matrix
Unconditionally Secure MACs
Deception probabilities
Any message-tag pair (x, y) will be a valid pair with probability 1/3.
So $Pd_0 = 1/3$.
If Oscar sees the message-tag pair (0, 0), he knows that
$K_0 \in \{(0, 0), (1, 0), (2, 0)\}$.
(1, 1) is a forgery if $K_0 = (1, 0)$. This happens with probability 1/3.
Repeating this for all possible message-tag pairs gives us the same
probability. So $Pd_1 = 1/3$.
key (a, b)   x = 0   x = 1   x = 2
(0, 0)       0       0       0
(0, 1)       1       1       1
(0, 2)       2       2       2
(1, 0)       0       1       2
(1, 1)       1       2       0
(1, 2)       2       0       1
(2, 0)       0       2       1
(2, 1)       1       0       2
(2, 2)       2       1       0
Table 2: An authentication matrix
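Both deception probabilities for this example can be recomputed by brute force over the nine keys. The sketch below counts keys directly; the function names are made up, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

Z3 = range(3)
keys = list(product(Z3, Z3))            # all keys (a, b) in Z_3 x Z_3


def h(key, x: int) -> int:
    """h_(a,b)(x) = a*x + b mod 3."""
    a, b = key
    return (a * x + b) % 3


def payoff(x: int, y: int) -> Fraction:
    """Probability that (x, y) is valid under a uniformly random key."""
    return Fraction(sum(h(K, x) == y for K in keys), len(keys))


def payoff_subst(x2: int, y2: int, x: int, y: int) -> Fraction:
    """Probability that (x2, y2) is valid given that (x, y) was observed."""
    consistent = [K for K in keys if h(K, x) == y]
    return Fraction(sum(h(K, x2) == y2 for K in consistent), len(consistent))


Pd0 = max(payoff(x, y) for x in Z3 for y in Z3)
Pd1 = max(
    payoff_subst(x2, y2, x, y)
    for x in Z3 for y in Z3 if payoff(x, y) > 0
    for x2 in Z3 if x2 != x
    for y2 in Z3
)
```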
Payoff: deception probability of impersonation
Let $K_0$ denote the key chosen by Alice and Bob. For x ∈ X and y ∈ Y,
define payoff(x, y) to be the probability that the message-tag pair (x, y)
is valid.
$\mathrm{payoff}(x, y) = \Pr[y = h_{K_0}(x)] = \frac{|\{K \in \mathcal{K} : h_K(x) = y\}|}{|\mathcal{K}|}.$
$Pd_0 = \max\{\mathrm{payoff}(x, y) : x \in X, y \in Y\}.$
Payoff: deception probability of substitution
$\mathrm{payoff}(x', y'; x, y) = \Pr[y' = h_{K_0}(x') \mid y = h_{K_0}(x)]$
$= \frac{\Pr[y' = h_{K_0}(x') \wedge y = h_{K_0}(x)]}{\Pr[y = h_{K_0}(x)]}$
$= \frac{|\{K \in \mathcal{K} : y' = h_K(x'),\ y = h_K(x)\}|}{|\{K \in \mathcal{K} : y = h_K(x)\}|}.$
$V = \{(x, y) : |\{K \in \mathcal{K} : y = h_K(x)\}| \ge 1\}.$
$Pd_1 = \max_{(x,y) \in V}\ \max_{(x',y'),\, x' \ne x} \mathrm{payoff}(x', y'; x, y).$
Strongly Universal Hash Families
Definition
Suppose $(X, Y, \mathcal{K}, H)$ is an (N, M)-hash family. This hash family is strongly
universal provided that the following condition is satisfied for every x, x′ ∈ X
such that x ≠ x′, and for every y, y′ ∈ Y:
$|\{K \in \mathcal{K} : h_K(x) = y,\ h_K(x') = y'\}| = \frac{|\mathcal{K}|}{M^2}.$
Strongly Universal Hash Families
Suppose that $(X, Y, \mathcal{K}, H)$ is a strongly universal (N, M)-hash family. Then
$|\{K \in \mathcal{K} : h_K(x) = y\}| = \frac{|\mathcal{K}|}{M}$
for every x ∈ X and for every y ∈ Y.
Suppose $(X, Y, \mathcal{K}, H)$ is a strongly universal (N, M)-hash family. Then
$(X, Y, \mathcal{K}, H)$ is an authentication code with $Pd_0 = Pd_1 = 1/M$.
Optimality of Deception Probabilities
Suppose $(X, Y, \mathcal{K}, H)$ is an (N, M)-hash family. Suppose we fix a
message x ∈ X. Then we can compute as follows:
$\sum_{y \in Y} \mathrm{payoff}(x, y) = \sum_{y \in Y} \frac{|\{K \in \mathcal{K} : h_K(x) = y\}|}{|\mathcal{K}|} = \frac{|\mathcal{K}|}{|\mathcal{K}|} = 1.$
Hence, for every x ∈ X, there exists an authentication tag y (depending
on x), such that
$\mathrm{payoff}(x, y) \ge \frac{1}{M}.$
Optimality of deception probabilities
Theorem
Suppose $(X, Y, \mathcal{K}, H)$ is an (N, M)-hash family. Then $Pd_0 \ge 1/M$. Further,
$Pd_0 = 1/M$ if and only if
$|\{K \in \mathcal{K} : h_K(x) = y\}| = \frac{|\mathcal{K}|}{M}$
for every x ∈ X and y ∈ Y.
Theorem
Suppose $(X, Y, \mathcal{K}, H)$ is an (N, M)-hash family. Then $Pd_1 \ge 1/M$.
Optimality of deception probabilities
Theorem
Suppose $(X, Y, \mathcal{K}, H)$ is an (N, M)-hash family. Then
$Pd_1 = 1/M$ if and only if the hash family is strongly universal.
Theorem
Suppose $(X, Y, \mathcal{K}, H)$ is an (N, M)-hash family such that $Pd_1 = 1/M$. Then
$Pd_0 = 1/M$.
The End