Lecture 11
Hash Functions
Motivation
Problem:
Naively signing a long message produces a signature that is as long as the message itself.
• Three Problems
• Computational overhead
• Message overhead
• Security limitations
Solution:
Instead of signing the whole message, sign only a digest (=hash)
Also secure, but much faster
Needed:
Hash Functions
Principal input–output behavior of hash functions
Message Authentication
• Message authentication is a mechanism or service used to verify that
the message has not been modified in transit (data integrity)
• Message authentication assures that data received are exactly as sent
(i.e., contain no modification, insertion, deletion, or replay)
• In many cases, there is a requirement that the authentication
mechanism assures that the purported identity of the sender is valid
• Message authentication does not necessarily include the property of
non-repudiation
• When a hash function is used to provide message authentication, the
hash function value is often referred to as a message digest
Hash Functions
• The essence of the use of a hash function for message authentication is that the
sender computes a hash value as a function of the bits in the message and
transmits both the hash value and the message
• The receiver performs the same hash calculation on the received message and
compares the result with the received hash value
• If there is a mismatch, the receiver knows that the message (or possibly the hash
value) has been altered
• M = input message
• H = Hash function
• h = H(M)
[Figure: a message M of arbitrary length is fed into the hash function H, which outputs the hash h]
Hash Verification
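[Figure: hash generation and verification. Generation: the sender computes h = H(M) and sends the message M together with h. Verification: the receiver applies H to the received message and compares the result with the received h (yes/no).]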
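The generation and verification steps can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-512 from the standard hashlib module as H; the function names generate and verify and the sample messages are made up for this example.

```python
import hashlib

def generate(message: bytes) -> bytes:
    """Sender side: compute h = H(M)."""
    return hashlib.sha512(message).digest()

def verify(message: bytes, received_hash: bytes) -> bool:
    """Receiver side: recompute H over the received message and compare."""
    return hashlib.sha512(message).digest() == received_hash

M = b"transfer 100 EUR to Bob"          # illustrative message
h = generate(M)

print(verify(M, h))                           # message unchanged
print(verify(b"transfer 900 EUR to Bob", h))  # message altered in transit
```

Note that this only detects changes if the hash value itself reaches the receiver unmodified.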
Man in the Middle Attack
• The hash value must be transmitted in a secure fashion
• That is, the hash value must be protected so that if an adversary alters or replaces
the message, it is not feasible for the adversary to also alter the hash value to fool the
receiver
• For instance
• Alice transmits a data block and attaches a hash value
• Darth intercepts the message, alters or replaces the data block, and calculates and attaches a
new hash value
• Bob receives the altered data with the new hash value and does not detect the change.
MiM Attack
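The attack can be reproduced with a short sketch. It assumes SHA-512 as the hash, and it uses HMAC from the Python standard library as one possible way to protect the hash value with a shared secret; the key, the messages and the helper names are hypothetical and not part of the lecture.

```python
import hashlib
import hmac

def plain_tag(message: bytes) -> bytes:
    """Unprotected hash: anyone can compute it."""
    return hashlib.sha512(message).digest()

def keyed_tag(key: bytes, message: bytes) -> bytes:
    """Hash protected by a secret key (HMAC-SHA-512)."""
    return hmac.new(key, message, hashlib.sha512).digest()

original = b"pay Bob 10"
forged = b"pay Darth 1000"

# Plain hash: Darth replaces the data block and attaches a fresh hash.
forged_hash = plain_tag(forged)
bob_accepts_plain = plain_tag(forged) == forged_hash

# Keyed hash: Darth does not know the key shared by Alice and Bob.
key = b"secret shared by Alice and Bob"   # hypothetical key
alice_tag = keyed_tag(key, original)
darth_tag = keyed_tag(b"darth's guess", forged)
bob_accepts_original = hmac.compare_digest(keyed_tag(key, original), alice_tag)
bob_accepts_keyed = hmac.compare_digest(keyed_tag(key, forged), darth_tag)

print(bob_accepts_plain, bob_accepts_original, bob_accepts_keyed)
```

With the plain hash Bob accepts the forgery; with the keyed tag the forgery is rejected while the original message still verifies.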
Hash Function Properties
• Variable Input Size: H can be applied to a block of data of any size.
• Fixed output: H produces a fixed length output.
• Efficient: H(x) is easy to compute for any given x making both hardware and
software implementations practical.
• Preimage Resistant (One way property): For any given value h, it is
computationally infeasible to find x such that H(x) = h.
• Second Preimage Resistant (weak collision resistant): For any given block x, it
is computationally infeasible to find any y with H(y) = H(x), y ≠ x.
• Pseudorandomness: Output of H meets standard tests for pseudorandomness
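The first two properties, and the way a small input change affects the output, can be observed directly. The sketch below assumes SHA-512 from hashlib as H; the chosen input sizes and strings are arbitrary.

```python
import hashlib

# Variable input size, fixed output size: 0 bytes, 3 bytes and 1 MB of input.
sizes = [0, 3, 1_000_000]
digests = [hashlib.sha512(b"a" * n).digest() for n in sizes]
lengths = [len(d) * 8 for d in digests]   # output length in bits
print(lengths)

# b"abc" and b"abb" differ in a single input bit (0x63 versus 0x62).
d1 = hashlib.sha512(b"abc").digest()
d2 = hashlib.sha512(b"abb").digest()
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(d1, d2))
print(differing_bits, "of 512 output bits differ")
```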
Security Properties of Hash Functions
It turns out that collision resistance (it must be infeasible to find any pair x ≠ y with H(x) = H(y)) causes the most problems
• How hard is it to find a collision with a probability of 0.5 ?
• Related Problem: How many people are needed such that two of
them have the same birthday with a probability of 0.5 ?
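• 23 are enough! This is called the birthday paradox
• The same effect applies to hash values: for an n-bit output, a collision can be expected after roughly 2^(n/2) hash evaluations, far fewer than 2^n
• To withstand such birthday attacks, hash functions need an output size of at
least 160 bits (about 2^80 evaluations).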
• Birthday Paradox: Wikipedia has a nice explanation
• http://en.wikipedia.org/wiki/Birthday_problem
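The number 23 can be checked with a short calculation. The sketch assumes 365 equally likely birthdays and treats a hash with n possible outputs in the same way; the function names are chosen for this example.

```python
def collision_probability(k: int, n: int) -> float:
    """Probability that k uniformly random values out of n contain a repeat."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (n - i) / n
    return 1.0 - p_distinct

def smallest_group(n: int, target: float = 0.5) -> int:
    """Smallest k for which a repeat occurs with probability >= target."""
    k = 1
    while collision_probability(k, n) < target:
        k += 1
    return k

print(smallest_group(365))       # 23
print(smallest_group(2 ** 16))   # toy 16-bit hash: on the order of 2^8 values
```

For a toy 16-bit hash the answer is close to the square root of the number of outputs, which is why the output size has to be twice the desired security level in bits.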
Applications of Hash Functions
• To create a one-way password file
• OS stores hash of password not actual password
• When user enters the password, the hash of that password is compared to the stored hash value for
verification.
• For intrusion detection and virus detection
• Keep and check hashes of the files on the system
• As a pseudorandom function (PRF) or pseudorandom number generator
(PRNG) for the generation of symmetric keys
• To construct a message authentication code (MAC)
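The one-way password file can be sketched as follows. The lecture only states that the hash of the password is stored; the random salt, the use of PBKDF2 with SHA-512 and the iteration count are assumptions added here because they are common practice, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # hypothetical work factor, not from the lecture

def make_record(password: str):
    """Create the (salt, hash) pair that would be stored instead of the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, record) -> bool:
    """Hash the entered password the same way and compare with the stored hash."""
    salt, stored = record
    candidate = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

record = make_record("correct horse")
print(check_password("correct horse", record))
print(check_password("wrong guess", record))
```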
Secure Hash Algorithm
• The Secure Hash Algorithm is a family of cryptographic hash functions published
by the National Institute of Standards and Technology (NIST)
SHA-512
• The algorithm takes as input a message with a maximum length of less than 2^128 bits and
produces as output a 512-bit message digest
• The input is processed in 1024-bit blocks
Steps
• Step 1 : Append Padding Bits
• The message is padded so that its length is congruent to 896 modulo 1024
• Suppose the length of the message M, in bits, is L.
• Append the bit "1" to the end of the message, and then K zero bits, where K is the smallest
non-negative solution to the equation L + 1 + K ≡ 896 (mod 1024)
• Step 2: Append Length
• A block of 128 bits is appended to the message. This block contains L, the length of the
original message M in bits, as an unsigned 128-bit integer (most significant byte first)
• For example, the (8-bit ASCII) message abc has length 8 × 3 = 24 bits, so it is padded with a one (1),
then 896 - (24 + 1) = 871 zero bits, and then its 128-bit length, to become the 1024-bit padded
message.
• The length of the padded message should now be a multiple of 1024 bits
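Steps 1 and 2 can be written out as a small function. This is a sketch that assumes the message is given as whole bytes (so L is a multiple of 8); the names zero_bits and sha512_pad are chosen for this example.

```python
def zero_bits(L: int) -> int:
    """Smallest non-negative K with L + 1 + K congruent to 896 modulo 1024."""
    return (896 - (L + 1)) % 1024

def sha512_pad(message: bytes) -> bytes:
    L = len(message) * 8                 # message length in bits
    K = zero_bits(L)
    # For byte-aligned input, the "1" bit plus K zero bits fill (K + 1) / 8
    # whole bytes: one byte 0x80 followed by zero bytes.
    padding = b"\x80" + b"\x00" * ((K + 1) // 8 - 1)
    return message + padding + L.to_bytes(16, "big")   # 128-bit length field

padded = sha512_pad(b"abc")
print(zero_bits(24))        # 871
print(len(padded) * 8)      # 1024
```

A 112-byte message already has length 896 modulo 1024, so it receives a full extra block of padding and ends up 2048 bits long.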
Processing of one block
• Step 3: Initialize Hash Buffer
• A 512-bit buffer is used to hold intermediate and final results of the hash function. The buffer
can be represented as eight 64-bit registers (a, b, c, d, e, f, g, h)
• The registers are initialized to eight 64-bit words, obtained by taking the first sixty-four bits of the fractional parts of the
square roots of the first eight prime numbers
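The initial register values can be recomputed from this description. The sketch uses exact integer square roots to avoid floating-point rounding; the helper names are chosen for this example.

```python
from math import isqrt

def first_primes(count: int) -> list:
    """First `count` primes by trial division (enough for small counts)."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def frac_sqrt_bits(p: int, bits: int = 64) -> int:
    # isqrt(p * 2^(2*bits)) equals floor(sqrt(p) * 2^bits); keeping only the
    # low `bits` bits drops the integer part of the square root.
    return isqrt(p << (2 * bits)) % (1 << bits)

IV = [frac_sqrt_bits(p) for p in first_primes(8)]
for name, word in zip("abcdefgh", IV):
    print(name, format(word, "016x"))
```

The first line printed is a 6a09e667f3bcc908, which matches the initial value of register a in the SHA-512 specification.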
Calculating W
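• The 64-bit word W_t for every round t is computed from the 1024-bit
message block
• The first 16 values of W_t are taken directly from the 16 words of the current block
• The remaining values are defined as
W_t = σ1(W_{t-2}) + W_{t-7} + σ0(W_{t-15}) + W_{t-16}, for 16 ≤ t ≤ 79
where + is addition modulo 2^64 and
σ0(x) = ROTR^1(x) XOR ROTR^8(x) XOR SHR^7(x)
σ1(x) = ROTR^19(x) XOR ROTR^61(x) XOR SHR^6(x)
(ROTR^n = circular right shift by n bits, SHR^n = right shift by n bits with zero fill)
• Thus, in the first 16 steps of processing, the value of W_t is equal to the corresponding word in the
message block
• For the remaining 64 steps, the value of W_t is the sum modulo 2^64 of four of the preceding
values of W_t, with two of those values first subjected to the shift and rotate operations
σ0 and σ1.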
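The message schedule can be sketched as follows, using the recurrence from the SHA-512 specification (FIPS 180-4): each new word is the sum modulo 2^64 of four earlier words, two of them passed through rotate-and-shift functions. The function names are chosen for this example.

```python
MASK = (1 << 64) - 1

def rotr(x: int, n: int) -> int:
    """Circular right shift of a 64-bit word by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK

def small_sigma0(x: int) -> int:
    return rotr(x, 1) ^ rotr(x, 8) ^ (x >> 7)

def small_sigma1(x: int) -> int:
    return rotr(x, 19) ^ rotr(x, 61) ^ (x >> 6)

def message_schedule(block: bytes) -> list:
    """Expand one 1024-bit block into the 80 round words W_0 .. W_79."""
    assert len(block) == 128
    W = [int.from_bytes(block[8 * t:8 * t + 8], "big") for t in range(16)]
    for t in range(16, 80):
        W.append((small_sigma1(W[t - 2]) + W[t - 7]
                  + small_sigma0(W[t - 15]) + W[t - 16]) & MASK)
    return W

# Padded one-block message "abc" (24 bits), built by hand for this example.
block = b"abc" + b"\x80" + b"\x00" * 108 + (24).to_bytes(16, "big")
W = message_schedule(block)
print(format(W[0], "016x"), format(W[17], "016x"))
```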
SHA-512 Round Function
[Figure: SHA-512 round function, with the intermediate values T1 and T2]
Observations about the Round Function
• Six of the eight words of the output of the round function involve simply permutation (b, c, d, f, g, h) by means of rotation. This is indicated by shading in the figure
• Only two of the output words (a, e) are generated by substitution
• Word e is a function of input variables (d, e, f, g, h), as well as the round word Wt and the constant Kt. Word a is a function of all of the input variables except d, as well as the round word Wt and the constant Kt.
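These observations can be checked against a sketch of one round. The formulas for T1, T2, Ch, Maj and the two Sigma functions are taken from the SHA-512 specification (FIPS 180-4), not from the slide text; the function names and the sample state are chosen for this example.

```python
MASK = (1 << 64) - 1

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (64 - n))) & MASK

def ch(e, f, g):
    return (e & f) ^ (~e & g)

def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)

def big_sigma0(a):
    return rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)

def big_sigma1(e):
    return rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)

def sha512_round(state, w_t, k_t):
    """One round: returns the new (a, b, c, d, e, f, g, h)."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + ch(e, f, g) + big_sigma1(e) + w_t + k_t) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    # b, c, d, f, g, h are just the old a, b, c, e, f, g shifted by one position.
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

state = (1, 2, 3, 4, 5, 6, 7, 8)            # arbitrary sample state
out = sha512_round(state, 0, 0)
out_other_d = sha512_round((1, 2, 3, 99, 5, 6, 7, 8), 0, 0)
print(out[1:4], out[5:])                     # copied words
print(out[0] == out_other_d[0])              # new a does not depend on d
```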
• Step-4: Process Message in 1024-bit blocks:
• Each round takes as input the 512-bit buffer value, abcdefgh, and updates the contents of the buffer
• At input to the first round, the buffer has the value of the intermediate hash value, H_{i-1}
• Each round t makes use of a 64-bit value W_t, derived from the current 1024-bit block being processed, M_i
• These values are derived using a message schedule (see Calculating W)
• Each round also makes use of an additive constant K_t, where 0 ≤ t ≤ 79 indicates one of the 80 rounds
• Remember that it is the value of abcdefgh that is being updated in every round
• The output of the eightieth round is added to the input to the first round, H_{i-1}, to produce H_i
• The addition is done independently for each of the eight words in the buffer with the corresponding word in H_{i-1}, using addition modulo 2^64.
• Step-5: Output
• After all N 1024-bit blocks have been
processed, the output from the Nth stage
is the 512-bit message digest
• SUMMARY
H_0 = IV
H_i = SUM_64(H_{i-1}, abcdefgh_i)
MD = H_N
• IV = initial value of the abcdefgh buffer, defined in step 3
• abcdefgh_i = the output of the last round of processing of the ith message block
• N = the number of blocks in the message (including padding and length fields)
• SUM_64 = addition modulo 2^64 performed separately on each word of the pair of inputs
• MD = final message digest value
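Putting the five steps together gives a compact, unoptimized sketch of SHA-512 that can be compared against hashlib. The round formulas, and the derivation of the constants K_t from the cube roots of the first 80 primes, follow FIPS 180-4 rather than the slide text; the helper names are chosen for this example, and the code is meant for study, not for production use.

```python
import hashlib
from math import isqrt

MASK = (1 << 64) - 1

def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK

def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

PRIMES = first_primes(80)
IV = [isqrt(p << 128) & MASK for p in PRIMES[:8]]   # step 3
K = [icbrt(p << 192) & MASK for p in PRIMES]        # round constants K_t

def compress(H, block):
    """Step 4: process one 1024-bit block, returning H_i from H_{i-1}."""
    W = [int.from_bytes(block[8 * t:8 * t + 8], "big") for t in range(16)]
    for t in range(16, 80):
        s0 = rotr(W[t - 15], 1) ^ rotr(W[t - 15], 8) ^ (W[t - 15] >> 7)
        s1 = rotr(W[t - 2], 19) ^ rotr(W[t - 2], 61) ^ (W[t - 2] >> 6)
        W.append((s1 + W[t - 7] + s0 + W[t - 16]) & MASK)
    a, b, c, d, e, f, g, h = H
    for t in range(80):
        t1 = (h + ((e & f) ^ (~e & g))
              + (rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)) + W[t] + K[t]) & MASK
        t2 = ((rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    # SUM_64: word-wise addition modulo 2^64 with the previous hash value.
    return [(x + y) & MASK for x, y in zip(H, (a, b, c, d, e, f, g, h))]

def sha512(message: bytes) -> bytes:
    L = len(message) * 8
    # Steps 1 and 2: 0x80, zero bytes, then the 128-bit length.
    padded = (message + b"\x80" + b"\x00" * ((111 - len(message)) % 128)
              + L.to_bytes(16, "big"))
    H = list(IV)                                   # H_0 = IV
    for i in range(0, len(padded), 128):
        H = compress(H, padded[i:i + 128])
    return b"".join(x.to_bytes(8, "big") for x in H)   # step 5: MD = H_N

print(sha512(b"abc") == hashlib.sha512(b"abc").digest())
```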
• The message schedule (the derivation of the round words Wt) introduces a great deal of redundancy and interdependence into the message blocks that are
compressed, which complicates the task of finding a different message block that maps to the
same compression function output
• The complex repetition of the basic function F produces results that are well mixed; that is, it is
unlikely that two messages chosen at random, even if they exhibit similar regularities, will have
the same hash code
• Unless there is some hidden weakness in SHA-512, which has not so far been published, the
difficulty of coming up with two messages having the same message digest is on the order of
2^256 operations, while the difficulty of finding a message with a given digest is on the order of
2^512 operations
• For a worked example of SHA-512, please refer to William Stallings, Cryptography and Network
Security.
Hash Function from Block Cipher
• The m-bit key input to the cipher is g(H_{i-1}), where g is a b-to-m-bit mapping applied to the previous b-bit output H_{i-1}. If b = m, as is the case, for instance, when AES with a 128-bit key is used, the function g can be the identity mapping.
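A small sketch can make the role of g concrete. Several choices below are assumptions and are not taken from the lecture: the chaining rule H_i = E_{g(H_{i-1})}(x_i) XOR x_i (usually called Matyas-Meyer-Oseas), an XTEA-style cipher with b = 64 and m = 128 as a stand-in for AES (written from memory and not checked against test vectors), a g that simply repeats the chaining value, an all-zero initial value, and a simplified padding rule. The result is a toy and must not be used as a real hash function.

```python
MASK32 = 0xFFFFFFFF

def xtea_style_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """Stand-in block cipher: 64-bit block, 128-bit key, XTEA-style rounds."""
    k = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    v0 = int.from_bytes(block[:4], "big")
    v1 = int.from_bytes(block[4:], "big")
    s, delta = 0, 0x9E3779B9
    for _ in range(rounds):
        v0 = (v0 + (((((v1 << 4) & MASK32) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & MASK32
        s = (s + delta) & MASK32
        v1 = (v1 + (((((v0 << 4) & MASK32) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & MASK32
    return v0.to_bytes(4, "big") + v1.to_bytes(4, "big")

def g(h: bytes) -> bytes:
    """Hypothetical b-to-m-bit mapping: 64-bit chaining value -> 128-bit key."""
    return h + h

def block_cipher_hash(message: bytes, iv: bytes = bytes(8)) -> bytes:
    # Simplified padding for illustration: 0x80, then zeros up to a block boundary.
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % 8)
    h = iv
    for i in range(0, len(padded), 8):
        x = padded[i:i + 8]
        e = xtea_style_encrypt(g(h), x)           # key is g(H_{i-1})
        h = bytes(p ^ q for p, q in zip(e, x))    # feed-forward XOR with x_i
    return h

print(block_cipher_hash(b"abc").hex())
```

With a cipher where b = m, such as AES with a 128-bit key, g(h) could simply return h.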