Zhiyong Zheng
Modern
Cryptography
Volume 1
A Classical Introduction to Informational
and Mathematical Principle
Financial Mathematics and Fintech
Series Editors
Zhiyong Zheng, Renmin University of China, Beijing, Beijing, China
Alan Peng, University of Toronto, Toronto, ON, Canada
This series addresses the emerging advances in mathematical theory related to finance and application research from all fintech perspectives. It is a series of monographs and contributed volumes focusing on the in-depth exploration of financial mathematics, such as applied mathematics, statistics, optimization, and scientific computation, and fintech applications, such as artificial intelligence, blockchain, cloud computing, and big data. The series is distinguished by its comprehensive treatment and practical application of financial mathematics and fintech, and it covers cutting-edge applications of financial mathematics and fintech in practical programs and companies.
The Financial Mathematics and Fintech book series promotes the exchange of emerging theory and technology of financial mathematics and fintech between academia and financial practitioners. It aims to provide a timely reflection of the state of the art in mathematics and computer science as applied to finance. As a collection, this book series provides valuable resources to a wide audience in academia, the finance community, government employees related to finance and anyone else looking to expand their knowledge in financial mathematics and fintech.
The key words in this series include but are not limited to:
a) Financial mathematics
b) Fintech
c) Computer science
d) Artificial intelligence
e) Big data
Modern Cryptography
Volume 1
A Classical Introduction to Informational
and Mathematical Principle
Zhiyong Zheng
School of Mathematics
Renmin University of China
Beijing, China
© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
be used as the main reference book for scientific researchers engaged in cryptography
research and cryptographic engineering.
The main contents of this book have been taught in my seminar. My doctoral students Hong Ziwei, Chen Man, Xu Jie, Zhang Mingpei, Associate Professor Huang Wenlin and Dr. Tian Kun have all put forward many useful suggestions and offered much help with the contents of this book. In particular, Chen Man devoted a great deal of time and energy to typesetting and proofreading. Here, I would like to express my deep gratitude to them!
Contents

1 Preparatory Knowledge  1
1.1 Injective  1
1.2 Computational Complexity  3
1.3 Jensen Inequality  9
1.4 Stirling Formula  12
1.5 n-fold Bernoulli Experiment  15
1.6 Chebyshev Inequality  17
1.7 Stochastic Process  26
References  32

2 The Basis of Code Theory  35
2.1 Hamming Distance  36
2.2 Linear Code  44
2.3 Lee Distance  51
2.4 Some Typical Codes  55
2.4.1 Hadamard Codes  55
2.4.2 Binary Golay Codes  57
2.4.3 3-Ary Golay Code  61
2.4.4 Reed–Muller Codes  64
2.5 Shannon Theorem  74
References  87

3 Shannon Theory  91
3.1 Information Space  91
3.2 Joint Entropy, Conditional Entropy, Mutual Information  96
3.3 Redundancy  103
3.4 Markov Chain  110
3.5 Source Coding Theorem  114
3.6 Optimal Code Theory  121
3.7 Several Examples of Compression Coding  130
3.7.1 Morse Codes  130
3.7.2 Huffman Codes  132

References  353
Acronyms

1. [x] denotes the largest integer not greater than the real number x, and ⌈x⌉ denotes the smallest integer not less than the real number x, so that [x] ≤ x ≤ ⌈x⌉.
Chapter 1
Preparatory Knowledge
1.1 Injective
Let σ be a mapping from a nonempty set A to a nonempty set B, denoted as σ: A → B. Generally, the mappings between sets can be divided into three categories: injective, surjective and bijective.
Definition 1.1 Let σ be a mapping between two nonempty sets, σ: A → B. We define
(i) If a, b ∈ A and a ≠ b ⇒ σ(a) ≠ σ(b), σ is called an injection from A to B, injective for short.
(ii) If for any b ∈ B there is an a ∈ A such that σ(a) = b, σ is called a surjection from A to B.
(iii) If σ: A → B is both injective and surjective, σ is called a bijection from A to B.
(iv) Let 1_A be the identity mapping of A → A, which is defined as

1_A(a) = a, ∀ a ∈ A.
(v) Suppose σ: A → B and τ: B → C are two mappings; define the product mapping of τ and σ, τσ: A → C, as
τσ(a) = τ(σ(a)), ∀ a ∈ A.
Obviously, the product of two mappings is not commutative, but it does satisfy the following associative law.
Property 1 Let σ: A → B, τ: B → C, δ: C → D be three mappings; then we have

(δ · τ) · σ = δ · (τ · σ).  (1.1)

σ 1_A = σ, 1_B σ = σ.  (1.2)
The above formulas show that the identity mapping plays the role of a multiplicative identity in the product of mappings.
Definition 1.2 (i) Suppose σ: A → B and τ: B → A are two mappings. If τσ = 1_A, τ is called a left inverse mapping of σ, and σ a right inverse mapping of τ.
(ii) Let σ: A → B and τ: B → A. If τσ = 1_A and στ = 1_B, τ is called an inverse mapping of σ, denoted as τ = σ^{−1}.
The essential properties of injective, surjective and bijective between sets are
described by the following lemma.
Lemma 1.1 (i) If σ: A → B has an inverse mapping τ: B → A, that is στ = 1_B and τσ = 1_A, then τ is unique (denoted as τ = σ^{−1}).
(ii) σ: A → B is injective if and only if σ has a left inverse mapping τ: B → A, that is τσ = 1_A.
(iii) σ: A → B is surjective if and only if σ has a right inverse mapping τ: B → A, that is στ = 1_B.
(iv) σ: A → B is bijective if and only if σ has an inverse mapping τ, and τ is unique.
Proof First of all, we prove (i). Let τ_1: B → A and τ_2: B → A be two inverse mappings of σ; then we have

τ_1 σ = 1_A, τ_2 σ = 1_A, and σ τ_1 = 1_B, σ τ_2 = 1_B,

τ_1 = τ_1 1_B = τ_1 (σ τ_2) = (τ_1 σ) τ_2 = 1_A τ_2 = τ_2,
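To make the definitions concrete, the following is a minimal Python sketch (an illustration of ours, not part of the original text) that checks Definition 1.1 and the inverse of Lemma 1.1 for mappings between small finite sets; the sets and dictionaries are hypothetical examples.

```python
# Minimal sketch: mappings between finite sets represented as dicts A -> B.

def is_injective(sigma, A, B):
    # a != b implies sigma(a) != sigma(b): no two elements share an image
    images = [sigma[a] for a in A]
    return len(set(images)) == len(images)

def is_surjective(sigma, A, B):
    # every b in B has a preimage a with sigma(a) = b
    return set(sigma[a] for a in A) == set(B)

def is_bijective(sigma, A, B):
    return is_injective(sigma, A, B) and is_surjective(sigma, A, B)

A, B = [1, 2, 3], ["x", "y", "z"]
sigma = {1: "x", 2: "y", 3: "z"}      # a bijection (example data)
print(is_injective(sigma, A, B), is_surjective(sigma, A, B))  # True True

# A bijection has a unique inverse tau with tau(sigma(a)) = a, Lemma 1.1 (iv)
tau = {v: k for k, v in sigma.items()}
print(all(tau[sigma[a]] == a for a in A))  # True
```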
where (d_{k−1} d_{k−2} ··· d_0)_b is called a b-ary integer, (0.d_{−1} d_{−2} ···)_b is called a b-ary decimal, and

x = (d_{k−1} d_{k−2} ··· d_0)_b + (0.d_{−1} d_{−2} ···)_b.  (1.5)

This generalizes our customary decimal expression. It is worth noting that in any base, integers correspond one to one with integers, and decimals with decimals. For example, an integer in the decimal system corresponds to an integer in the binary system, and likewise for decimals. In other words, the real numbers of the interval (0, 1) on the real axis correspond one to one with the binary decimals of (0, 1). It should be noted that binary decimals are often overlooked; in fact, they are the main technical support of various arithmetic codes, such as the Shannon code.
Now let us consider the b-ary expression of a positive integer n given in the decimal system. Let

n = (d_{k−1} d_{k−2} ··· d_1 d_0)_b, 0 ≤ d_i < b, d_{k−1} ≠ 0,
Lemma 1.2 The number k of b-ary digits of a positive integer n can be calculated according to the following formula:

k = [log_b n] + 1,  (1.7)

where [x] denotes the largest integer not greater than the real number x. Equivalently,

k − 1 ≤ log_b n < k,
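A small Python sketch (ours) can confirm Lemma 1.2 numerically; the function names are our own, and an integer guard is added because log_b n is computed in floating point.

```python
import math

def digit_count_direct(n, b):
    # count b-ary digits by repeated division
    k = 0
    while n > 0:
        n //= b
        k += 1
    return k

def digit_count_formula(n, b):
    # k = [log_b n] + 1; floating-point log can be off by one near exact
    # powers of b, so correct it with exact integer comparisons
    k = math.floor(math.log(n, b)) + 1
    while b ** k <= n:        # fix a possible downward error
        k += 1
    while b ** (k - 1) > n:   # fix a possible upward error
        k -= 1
    return k

for n in (1, 7, 8, 1000, 1023, 1024):
    for b in (2, 3, 10):
        assert digit_count_direct(n, b) == digit_count_formula(n, b)
print("Lemma 1.2 checked on sample values")
```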
Now let us look at the addition operation in the b-ary system. For simplicity, we consider the addition of two positive integers in the binary system. Let n = (1111000)_2, m = (11110)_2; then n + m = 1111000 + 0011110 = 10010110, that is, n + m = (10010110)_2. The addition of the digits in each bit position actually includes the following five contents (or operations); a short sketch after the list illustrates them.
1. Observe the digits in the same bit position and note whether there is a carry from the bit to the right (every two carries one into the next bit).
2. If the upper and lower digits of the same bit are both 0 and there is no carry from the right, the sum in that bit is 0.
3. If both digits of the same bit are 0 but there is a carry, or if one of the two digits is 0 and the other is 1 and there is no carry, the sum in that bit is 1.
4. If one of the two digits of the same bit is 0 and the other is 1 and there is a carry, or both digits are 1 and there is no carry, the sum in that bit is 0 and a carry is put forward.
5. If both digits are 1 and there is a carry, the sum in that bit is 1 and a carry is put forward.
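The following Python sketch (ours) implements the five rules as one right-to-left pass with a carry, counting one bit operation per position in the spirit of the bit-operation count defined below; it reproduces the example n + m = (10010110)_2.

```python
# Sketch: adding two binary strings bit by bit, one "bit operation" per bit.

def binary_add(n_bits: str, m_bits: str):
    width = max(len(n_bits), len(m_bits))
    a = n_bits.zfill(width)
    b = m_bits.zfill(width)
    carry, digits, bit_ops = 0, [], 0
    for i in range(width - 1, -1, -1):      # right to left
        s = int(a[i]) + int(b[i]) + carry   # rules 2-5 in one arithmetic step
        digits.append(str(s % 2))           # digit written in this position
        carry = s // 2                      # "every two carries one"
        bit_ops += 1
    if carry:
        digits.append("1")
    return "".join(reversed(digits)), bit_ops

# The example in the text: (1111000)_2 + (11110)_2 = (10010110)_2
result, ops = binary_add("1111000", "11110")
print(result, ops)   # 10010110 7
```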
Definition 1.3 A bit operation is an addition operation on a single bit position in binary addition. Suppose A is an algorithm in the binary system; we use Time(A) to represent the number of bit operations in algorithm A, that is, Time(A) denotes the total number of bit operations performed to complete algorithm A.
It is easy to deduce the number of bit operations of binary addition and subtraction from the definition. Let n, m be two positive integers whose binary expressions have k and l bits, respectively; then

In the same way, the number of bit operations required for the multiplication of B and D in the binary system satisfies
It is very convenient to estimate the number of bit operations by using the symbol "O" commonly used in number theory. If f(x) and g(x) are two real valued functions, g(x) > 0, and there are two absolute constants B and C such that when |x| > B we have

|f(x)| ≤ C g(x), denoted as f(x) = O(g(x)).

This notation indicates that as x → ∞, the order of growth of f(x) does not exceed that of g(x). For example, let f(x) = a_d x^d + a_{d−1} x^{d−1} + ··· + a_1 x + a_0 (a_d > 0); then

f(x) = O(|x|^d), or f(n) = O(n^d), n ≥ 1.
Lemma 1.3 Let n, m be two positive integers, and k and l the numbers of bits of their binary expressions, respectively. If m ≤ n, then l ≤ k, and

Time(n!) = O(n² k²) = O(n² log² n).

In the same way, we can prove the bit operation estimate of n!. We complete the proof of Lemma 1.4.
Let us deduce the computational complexity of some common number theory algorithms. Let m and n be two positive integers; then there is a nonnegative integer r such that m ≡ r (mod n), where 0 ≤ r < n. We call r the smallest nonnegative residue of m under mod n, and denote it as r = m mod n. If 1 ≤ m ≤ n, Euclid's division method is usually used to find the greatest common divisor (n, m) of n and m. If (m, n) = 1, then there is a positive integer a such that ma ≡ 1 (mod n); a is called the multiplicative inverse of m under mod n, denoted as m^{−1} mod n. By the Bézout formula, if (n, m) = 1, then there are integers x and y such that xm + yn = 1; we usually use the extended Euclid algorithm to find x and y. If we find x, we have actually calculated m^{−1} mod n. Under the above definitions and notations, we have
Lemma 1.5 (i) Suppose m and n are two positive integers, then
(iii) Suppose m and n are two positive integers, and (m, n) = 1, then
(i) holds. The Euclid algorithm used to calculate the greatest common divisor (n, m) of n and m is in fact a sequence of O(log n) divisions with remainder, so

In the Euclid algorithm, we can obtain x and y by back-substituting from bottom to top such that xm + yn = 1; this incremental process is called the extended Euclid algorithm. Therefore, if m ≤ n, then
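As an illustration of the extended Euclid algorithm, here is a standard recursive sketch in Python (our implementation, following the back-substitution described above); it returns the Bézout coefficients and, when (m, n) = 1, the inverse m^{−1} mod n.

```python
# Extended Euclid: returns (g, x, y) with x*m + y*n = g = gcd(m, n).

def extended_euclid(m, n):
    if n == 0:
        return m, 1, 0
    g, x1, y1 = extended_euclid(n, m % n)
    # back-substitute from bottom to top: g = x1*n + y1*(m mod n)
    return g, y1, x1 - (m // n) * y1

def mod_inverse(m, n):
    g, x, _ = extended_euclid(m, n)
    assert g == 1, "inverse exists only when (m, n) = 1"
    return x % n

g, x, y = extended_euclid(240, 46)
print(g, x, y, 240 * x + 46 * y)   # 2 -9 47 2
print(mod_inverse(3, 31))          # 21, since 3 * 21 = 63 ≡ 1 (mod 31)
```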
(iv) concerns the computational complexity of the power of an integer under mod n; the proof method is the famous "repeated square method". Let

m = (m_{k−1} m_{k−2} ··· m_1 m_0)_2 = m_0 + 2m_1 + 4m_2 + ··· + 2^{k−1} m_{k−1}.

Our calculation ends after (k − 1) squarings; at this time, there is

Obviously, the number of bit operations per squaring is O((log n²)²) = O(log² n). There is a total of k squaring operations, k = O(log m). So (iv) holds. We have completed the proof.
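A short Python sketch (ours) of the repeated square method; it scans the binary digits m_0, m_1, ..., m_{k−1} of the exponent and performs one squaring per digit.

```python
# Repeated square method: compute a^m mod n with O(log m) squarings.

def power_mod(a, m, n):
    result = 1
    base = a % n
    while m > 0:
        if m & 1:                 # current binary digit m_i is 1
            result = (result * base) % n
        base = (base * base) % n  # square for the next digit
        m >>= 1
    return result

print(power_mod(7, 560, 561))                    # 1 (561 is a Carmichael number)
print(power_mod(5, 117, 19) == pow(5, 117, 19))  # True
```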
Table 1.1 Time requirements of algorithms with different computational complexity (k = 10^6)

Algorithm type   | Complexity          | Number of bit operations | Time
Constant degree  | O(1)                | 1                        | 1 microsecond
Linear           | O(k)                | 10^6                     | 1 s
Quadratic        | O(k^2)              | 10^12                    | 11.6 days
Cubic            | O(k^3)              | 10^18                    | 32,000 years
Subexponential   | O(e^{√(k log k)})   | about 1.8 × 10^1618      | 6 × 10^1604 years
Exponential      | O(2^k)              | 10^301030                | 3 × 10^301016 years
A real valued function f(x) on the interval (a, b) is called a strictly convex function if for all x_1, x_2 ∈ (a, b) and λ_1 > 0, λ_2 > 0 with λ_1 + λ_2 = 1, we have

λ_1 f(x_1) + λ_2 f(x_2) ≤ f(λ_1 x_1 + λ_2 x_2),

and the equality holds if and only if x_1 = x_2. By induction, we can prove the Jensen inequality as follows.
Lemma 1.6 If f(x) is a strictly convex function over (a, b), then for any positive integer n > 1, any positive numbers λ_i (1 ≤ i ≤ n) with λ_1 + λ_2 + ··· + λ_n = 1, and any x_i ∈ (a, b) (1 ≤ i ≤ n), we have

∑_{i=1}^{n} λ_i f(x_i) ≤ f(∑_{i=1}^{n} λ_i x_i),  (1.12)
Let

x′ = (λ_1/(λ_1 + λ_2)) x_1 + (λ_2/(λ_1 + λ_2)) x_2,

then

∑_{i=1}^{n} λ_i f(x_i) = λ_1 f(x_1) + λ_2 f(x_2) + ∑_{i=3}^{n} λ_i f(x_i)
  ≤ (λ_1 + λ_2) f(x′) + ∑_{i=3}^{n} λ_i f(x_i)
  ≤ f(λ_1 x_1 + λ_2 x_2 + ··· + λ_n x_n).
Thus the proposition holds for n, and the inequality (1.12) follows.
Lemma 1.7 Let g(x) be a positive function, that is g(x) > 0; then for any positive numbers λ_i (1 ≤ i ≤ n) with λ_1 + λ_2 + ··· + λ_n = 1, and any a_1, a_2, ..., a_n, we have

∑_{i=1}^{n} λ_i log g(a_i) ≤ log ∑_{i=1}^{n} λ_i g(a_i),  (1.13)
Proof Because log x is strictly convex, let x_i = g(a_i); then x_i ∈ (0, +∞) (1 ≤ i ≤ n), and by the Jensen inequality,
∑_{i=1}^{n} λ_i log g(a_i) = ∑_{i=1}^{n} λ_i log x_i
  ≤ log(∑_{i=1}^{n} λ_i x_i)
  = log(∑_{i=1}^{n} λ_i g(a_i)).
A real valued function f(x) is called a strictly convex function in the interval (a, b) if for all x_1, x_2 ∈ (a, b) and λ_1 > 0, λ_2 > 0 with λ_1 + λ_2 = 1, we have

f(λ_1 x_1 + λ_2 x_2) ≤ λ_1 f(x_1) + λ_2 f(x_2),

and the equality holds if and only if x_1 = x_2. By induction, we can prove the following general inequality.
Lemma 1.8 If f(x) is a strictly convex function in the interval (a, b), then for any positive integer n ≥ 2, any positive numbers λ_i (1 ≤ i ≤ n) with λ_1 + λ_2 + ··· + λ_n = 1, and any x_i ∈ (a, b) (1 ≤ i ≤ n), we have

f(∑_{i=1}^{n} λ_i x_i) ≤ ∑_{i=1}^{n} λ_i f(x_i),  (1.14)
We know that f(x) is strictly convex in the interval (a, b) if and only if f″(x) > 0. Let f(x) = x log x; then f″(x) = 1/(x ln 2) > 0 when x ∈ (0, +∞). Then we have the following logarithmic inequality.
Proof Because f(x) = x log x is a strictly convex function, from Lemma 1.8 we have
f(∑_{i=1}^{n} λ_i x_i) ≤ ∑_{i=1}^{n} λ_i f(x_i),

where ∑_{i=1}^{n} λ_i = 1. Take λ_i = b_i / ∑_{j=1}^{n} b_j and x_i = a_i / b_i; then
(1/∑_{j=1}^{n} b_j) ∑_{i=1}^{n} a_i log(∑_{i=1}^{n} a_i / ∑_{j=1}^{n} b_j) ≤ ∑_{i=1}^{n} (a_i / ∑_{j=1}^{n} b_j) log(a_i / b_i).

Cancelling ∑_{j=1}^{n} b_j on both sides, there is

(∑_{i=1}^{n} a_i) log(∑_{i=1}^{n} a_i / ∑_{i=1}^{n} b_i) ≤ ∑_{i=1}^{n} a_i log(a_i / b_i).
The above formula is called the log sum inequality, which is often used in information theory.
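The log sum inequality is easy to test numerically; the following Python sketch (ours) checks it on random positive numbers a_i, b_i.

```python
# Numerical sanity check of the log sum inequality on sample data.
import math, random

random.seed(1)
a = [random.uniform(0.1, 10) for _ in range(8)]
b = [random.uniform(0.1, 10) for _ in range(8)]

A, B = sum(a), sum(b)
left = A * math.log2(A / B)
right = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
# (sum a_i) log(sum a_i / sum b_i) <= sum a_i log(a_i / b_i)
print(left <= right + 1e-12)   # True
```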
In number theory (see Apostol 1976 in the references of this chapter), we can obtain the average asymptotic formulas of some arithmetic functions by using the Euler summation formula, the most important of which is the following Stirling formula. For all real numbers x ≥ 1, we have

∑_{1 ≤ m ≤ x} log m = x log x − x + O(log x).  (1.16)
In number theory, the Stirling formula appears in the more precise form below:

n! ≈ √(2πn) (n/e)^n,

or

lim_{n→∞} n! / (√(2πn) (n/e)^n) = 1.
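A quick Python sketch (ours) comparing n! with the Stirling approximation; the ratio approaches 1 as n grows, as the limit above asserts.

```python
# Compare n! with sqrt(2*pi*n) * (n/e)^n; the ratio tends to 1.
import math

for n in (5, 10, 50, 100):
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, math.factorial(n) / stirling)
# ratios: about 1.0167, 1.0084, 1.0017, 1.0008 -> 1
```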
Lemma 1.10 Let 0 ≤ m ≤ n with n, m nonnegative integers, and let C(n, m) be the combination number; then

C(n, m) ≤ n^n / (m^m (n − m)^{n−m}).  (1.18)
Proof By the binomial theorem,

n^n = (m + (n − m))^n ≥ C(n, m) m^m (n − m)^{n−m},

so (1.18) holds.
Proof We first prove (i); (ii) can be obtained directly by taking the logarithm of (i).
1 = (λ + (1 − λ))^n ≥ ∑_{0 ≤ i ≤ λn} C(n, i) λ^i (1 − λ)^{n−i}
  = (1 − λ)^n ∑_{0 ≤ i ≤ λn} C(n, i) (λ/(1 − λ))^i
  ≥ (1 − λ)^n (λ/(1 − λ))^{λn} ∑_{0 ≤ i ≤ λn} C(n, i)
  = 2^{−nH(λ)} ∑_{0 ≤ i ≤ λn} C(n, i).
In order to prove (iii), we write m = [λn] = λn + O(1); from (ii), we have

(1/n) log ∑_{0 ≤ i ≤ λn} C(n, i) ≤ H(λ).

So there is

(1/n) log ∑_{0 ≤ i ≤ λn} C(n, i) ≥ log n − λ log(λn) − (1 − λ) log(n(1 − λ)) + O(log n / n)
  = −λ log λ − (1 − λ) log(1 − λ) + O(log n / n)
  = H(λ) + O(log n / n).

In the end, we have

lim_{n→∞} (1/n) log ∑_{0 ≤ i ≤ λn} C(n, i) = H(λ).
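The limit just proved can be observed numerically; the following Python sketch (ours) computes (1/n) log_2 ∑_{0 ≤ i ≤ λn} C(n, i) for growing n and compares it with H(λ), here for the assumed value λ = 0.3.

```python
# Watch (1/n) * log2( sum_{0<=i<=lam*n} C(n, i) ) approach H(lam).
import math

def H(lam):
    return -lam * math.log2(lam) - (1 - lam) * math.log2(1 - lam)

lam = 0.3
for n in (10, 100, 1000, 10000):
    total = sum(math.comb(n, i) for i in range(int(lam * n) + 1))
    print(n, math.log2(total) / n)
print("H(0.3) =", H(lam))   # the values above tend to ~0.8813
```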
∑_{i=1}^{n} p(x_i) = 1, and p(x_i x_j) = 0 when i ≠ j.  (1.21)
In a complete event group, we can assume that 0 < p(x_i) ≤ 1 (1 ≤ i ≤ n).
Total probability formula: If {x_1, x_2, ..., x_n} is a complete event group and y is any random event, then we have

p(y) = ∑_{i=1}^{n} p(y x_i)  (1.22)

and

p(y) = ∑_{i=1}^{n} p(x_i) p(y|x_i).  (1.23)
Lemma 1.12 Let {x_1, x_2, ..., x_n} be a complete event group; then the event y can occur only simultaneously with some x_i, and for any i, 1 ≤ i ≤ n, we have the following Bayes formula:

p(x_i | y) = p(x_i) p(y|x_i) / ∑_{j=1}^{n} p(x_j) p(y|x_j), 1 ≤ i ≤ n.  (1.24)
then there is

p(x_i | y) = p(x_i) p(y|x_i) / p(y).

And from the total probability formula (1.23), we can see that

p(x_i | y) = p(x_i) p(y|x_i) / ∑_{j=1}^{n} p(x_j) p(y|x_j), 1 ≤ i ≤ n,
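As a worked numerical illustration of the Bayes formula (1.24) (our example, not the book's), consider a binary channel with a two-event complete group {x = 0, x = 1} and an observed output y = 1; the priors and channel probabilities below are assumptions chosen for the example.

```python
# Bayes formula (1.24) on a binary channel with crossover probability 0.1.
p_x = {0: 0.6, 1: 0.4}                 # complete event group: priors p(x_i)
p_y_given_x = {0: 0.1, 1: 0.9}         # p(y = 1 | x) for x = 0, 1

# denominator: total probability p(y = 1) = sum_j p(x_j) p(y | x_j)
p_y = sum(p_x[x] * p_y_given_x[x] for x in p_x)

for x in p_x:
    posterior = p_x[x] * p_y_given_x[x] / p_y
    print(f"p(x={x} | y=1) = {posterior:.4f}")
# p(x=0 | y=1) = 0.1429, p(x=1 | y=1) = 0.8571; they sum to 1
```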
Now we discuss the n-fold Bernoulli experiment. In statistical testing, a test with only two possible results is called a Bernoulli experiment, and an experiment satisfying the following agreement is called an n-fold Bernoulli experiment:
(1) There are only two possible results in each experiment: a or ā.
(2) The probability p of the occurrence of a in each test remains unchanged.
(3) The experiments are statistically independent.
(4) A total of n experiments are carried out.
Proof The result of the i-th Bernoulli test is recorded as x_i (x_i = a or ā); then the n-fold Bernoulli experiment forms the following joint event x:

x = x_1 x_2 ··· x_n, x_i = a or ā.

Because of the independence of the experiments, when exactly k of the x_i equal a, the occurrence probability of x is p^k q^{n−k}. Obviously, the number of joint events x with exactly k of the x_i equal to a is C(n, k), so

B(k; n, p) = C(n, k) p^k q^{n−k}.
In the same way, we can calculate the probability that event a appears for the first time at the k-th trial in repeated Bernoulli experiments.

Lemma 1.14 Suppose that a and ā are the two possible events in Bernoulli experiments; then the probability that a appears for the first time at the k-th Bernoulli experiment is p q^{k−1}.
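Both B(k; n, p) = C(n, k) p^k q^{n−k} and the first-appearance probability p q^{k−1} of Lemma 1.14 can be checked by simulation; the following Monte Carlo sketch (ours) uses the assumed parameters n = 10, p = 0.3, k = 3.

```python
# Monte Carlo check of the binomial and first-appearance probabilities.
import math, random

random.seed(0)
n, p, k, trials = 10, 0.3, 3, 200_000

hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for _ in range(n)) == k)
print(hits / trials, math.comb(n, k) * p**k * (1 - p)**(n - k))
# empirical ~0.267 vs exact 0.2668

first_at_k = sum(1 for _ in range(trials)
                 if all(random.random() >= p for _ in range(k - 1))
                 and random.random() < p)
print(first_at_k / trials, p * (1 - p)**(k - 1))  # ~0.147 vs 0.147
```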
The n-fold Bernoulli experiment is not only the most basic probability model in probability and statistics, but also a common tool in the communication field. Next, we take the error of binary channel transmission as an example to illustrate this.
We call p(x) the probability function of ξ. If ξ has only a finite number of values, or countably infinitely many values, that is, the value space of ξ is a finite or countably infinite set of real numbers, then ξ is called a discrete random variable; otherwise, ξ is called a continuous random variable. The distribution function F(x) of a random variable ξ is defined as

F(x) = P{ξ ≤ x}, x ∈ (−∞, +∞).

If F(x) can be written as F(x) = ∫_{−∞}^{x} f(t) dt, then f(x) is called the density function of the random variable ξ. Obviously, the density function satisfies

f(x) ≥ 0, ∫_{−∞}^{+∞} f(x) dx = 1.  (1.29)

On the other hand, any function f(x) satisfying formula (1.29) must be the density function of some random variable. Here, we introduce several common continuous random variables and their probability distributions.
1. Uniform distribution (equal probability distribution)
A random variable ξ takes values in the interval [a, b] with equal probability; ξ is said to be uniformly distributed, or is called a uniformly distributed random variable, and its density function is

f(x) = { 1/(b − a), a ≤ x ≤ b;
         0, otherwise. }
2. Exponential distribution
The density function of the random variable ξ is

f(x) = { λ e^{−λx}, when x ≥ 0;
         0, when x < 0. }
3. Normal distribution
A continuous random variable ξ whose density function f(x) is defined as

f(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)}, x ∈ (−∞, +∞),

where μ and σ are constants, σ > 0. We say that ξ obeys the normal distribution with parameters μ and σ², denoted as ξ ∼ N(μ, σ²). By the Poisson integral,

∫_{−∞}^{+∞} e^{−x²} dx = √π,

we have ∫_{−∞}^{+∞} f(x) dx = 1. The distribution function F(x) of the normal distribution N(μ, σ²) is

F(x) = (1/(√(2π) σ)) ∫_{−∞}^{x} e^{−(t−μ)²/(2σ²)} dt.
If ∑_{i=1}^{+∞} |x_i| p(x_i) < ∞, the mathematical expectation of ξ is defined as

Eξ = E(ξ) = ∑_{i=1}^{+∞} x_i p(x_i).  (1.30)

(2) Let ξ be a continuous random variable and f(x) its density function; if ∫_{−∞}^{+∞} |x| f(x) dx < ∞, the mathematical expectation is E(ξ) = ∫_{−∞}^{+∞} x f(x) dx.
(3) Let h(x) be a real valued function; then h(ξ) is also a random variable, and h(ξ) is called a function of the random variable ξ. The mathematical expectation E(h(ξ)) of h(ξ) is denoted E_h(ξ).
Lemma 1.15 (1) Let ξ be a discrete random variable whose value space is {x_1, x_2, ..., x_n, ...}; if E(ξ) exists, then E_h(ξ) also exists, and

E_h(ξ) = ∑_{i=1}^{+∞} h(x_i) p(x_i).

(2) If ξ is a continuous random variable and E(ξ) exists, then E_h(ξ) also exists, and

E_h(ξ) = ∫_{−∞}^{+∞} h(x) f(x) dx.
P{η = y_j} = P(⋃_{i: h(x_i)=y_j} {ξ = x_i}) = ∑_{i: h(x_i)=y_j} P{ξ = x_i}.

E_h(ξ) = E(η) = ∑_{j=1}^{+∞} y_j P{η = y_j}
  = ∑_{j=1}^{+∞} y_j ∑_{i: h(x_i)=y_j} P{ξ = x_i}
  = ∑_{i=1}^{+∞} h(x_i) P{ξ = x_i}
  = ∑_{i=1}^{+∞} h(x_i) p(x_i).
D(ξ) = E((ξ − Eξ)²)
  = E(ξ² − 2ξ Eξ + (Eξ)²)
  = E(ξ²) − 2(Eξ)² + (Eξ)²
  = E(ξ²) − (Eξ)².
To prove (5), from Lemma 1.16 we notice that the mathematical expectation of (ξ − Eξ) is 0, so if c ≠ E(ξ), by (3) we have

Since the last term of the above formula is not zero, we always have

(5) holds. This property indicates that E((ξ − c)²) reaches its minimum value D(ξ) at c = Eξ. We have completed the proof.
Now we give the main results of this section; in mathematics, they are called Chebyshev type inequalities, which are essentially so-called moment inequalities, because the mathematical expectation Eξ of a random variable ξ is the first-order origin moment and the variance is the second-order central moment.
Theorem 1.1 Let h(x) be a nonnegative real valued function of x, and ξ a random variable whose expectation Eξ exists; then for any ε > 0, we have

P{h(ξ) ≥ ε} ≤ E_h(ξ)/ε,  (1.32)

and

P{h(ξ) > ε} < E_h(ξ)/ε.  (1.33)
Proof We prove the theorem only for a continuous random variable ξ. Let f(x) be the density function of ξ; then by Lemma 1.15,

E_h(ξ) = ∫_{−∞}^{+∞} h(x) f(x) dx
  ≥ ∫_{h(x) ≥ ε} h(x) f(x) dx
  ≥ ε ∫_{h(x) ≥ ε} f(x) dx
  = ε P{h(ξ) ≥ ε}.
P{|ξ − Eξ| ≥ ε} ≤ D(ξ)/ε².  (1.34)
P{|ξ − Eξ| ≥ ε} = P{h(ξ) ≥ ε²} ≤ E_h(ξ)/ε².

The Corollary holds.
Corollary 1.2 (Chebyshev) Suppose that both the expected value Eξ and the variance D(ξ) of the random variable ξ exist; then for any k > 0, we have

P{|ξ − Eξ| ≥ k√D(ξ)} ≤ 1/k².  (1.35)

Proof Take ε = k√D(ξ) in Corollary 1.1; then

P{|ξ − Eξ| ≥ k√D(ξ)} ≤ D(ξ)/(k² D(ξ)) = 1/k².
Then the Chebyshev inequality in Corollary 1.2 can be written as follows:

P{|ξ − μ| ≥ kσ} ≤ 1/k².  (1.36)
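The inequality (1.36) can be observed empirically; the following Python sketch (ours) samples an exponential random variable with λ = 1 (so μ = σ = 1) and compares the tail frequency with the bound 1/k².

```python
# Empirical check of Chebyshev's inequality for an exponential variable.
import random

random.seed(0)
samples = [random.expovariate(1.0) for _ in range(1_000_000)]
mu, sigma = 1.0, 1.0   # exact mean and standard deviation for lambda = 1

for k in (2, 3, 4):
    freq = sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)
    print(k, freq, "<=", 1 / k**2)
# e.g. k=2: ~0.050 <= 0.25 ; k=3: ~0.018 <= 0.111 ; the bound is loose here
```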
Corollary 1.3 (Markov) If for a positive integer k ≥ 1 the expected value of |ξ|^k of the random variable ξ exists, then

P{|ξ| ≥ ε} ≤ E|ξ|^k / ε^k.

Proof Take h(ξ) = |ξ|^k in Theorem 1.1 and replace ε with ε^k; then the Markov inequality is directly derived from Theorem 1.1.
Next, we introduce several common discrete random variables and their probability distributions, and calculate their expected values and variances.

Example 1.2 (Degenerate distribution) A random variable ξ takes a constant a with probability 1, that is ξ = a, P{ξ = a} = 1; ξ is said to have a degenerate distribution. From Lemma 1.16 (1), Eξ = a, and its variance is D(ξ) = 0.
Example 1.3 (Two-point distribution) A random variable ξ has only two values {x_1, x_2}, and its probability distribution is

In particular, taking x_1 = 1, x_2 = 0, the expected value and variance of the two-point distribution are

E(ξ) = p, D(ξ) = p(1 − p).
Example 1.4 (Equal probability distribution) Let a random variable ξ have n values {x_1, x_2, ..., x_n} with equal probability, that is,

P{ξ = x_i} = 1/n, 1 ≤ i ≤ n.

ξ is said to obey the equal probability distribution, or uniform distribution, on the n points x_1, x_2, ..., x_n. The expected value and variance are

E(ξ) = (1/n) ∑_{i=1}^{n} x_i, D(ξ) = (1/n) ∑_{i=1}^{n} (x_i − E(ξ))².
Example 1.5 (Binomial distribution) In the n-fold Bernoulli experiment, the number of occurrences ξ of event a is a random variable taking values from 0 to n. The probability distribution is (see the Bernoulli experiment above)

P{ξ = k} = b(k; n, p) = C(n, k) p^k q^{n−k}.
Proof By definition,

E(ξ) = ∑_{k=0}^{n} k b(k; n, p) = ∑_{k=1}^{n} k C(n, k) p^k q^{n−k}
  = np ∑_{k=1}^{n} C(n−1, k−1) p^{k−1} q^{(n−1)−(k−1)}
  = np ∑_{k=0}^{n−1} C(n−1, k) p^k q^{n−1−k}
  = np ∑_{k=0}^{n−1} b(k; n−1, p)
  = np.
E(ξ²) = ∑_{k=0}^{n} k² b(k; n, p) = n² p² + npq,

thus

D(ξ) = E(ξ²) − (E(ξ))² = npq.
lim_{n→∞} (1 − λ_n/n)^{n−k} = e^{−λ},
also

lim_{n→∞} (1 − 1/n)(1 − 2/n) ··· (1 − (k−1)/n) = 1.

So there is

lim_{n→∞} b(k; n, p_n) = (λ^k / k!) e^{−λ}.

So Lemma 1.19 holds.

b(k; n, p) ≈ ((np)^k / k!) e^{−np}.
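The Poisson approximation above is easy to tabulate; the following Python sketch (ours) compares b(k; n, p) with ((np)^k/k!) e^{−np} for the assumed values n = 1000, p = 0.003.

```python
# Binomial b(k; n, p) versus its Poisson approximation for small p, large n.
import math

n, p = 1000, 0.003          # np = 3
for k in range(6):
    binom = math.comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = (n * p)**k / math.factorial(k) * math.exp(-n * p)
    print(k, round(binom, 5), round(poisson, 5))
# the two columns agree to about three decimal places
```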
F(x1 , x2 , . . . , xn ) = P{ξ1 ≤ x1 , ξ2 ≤ x2 , . . . , ξn ≤ xn },
For the following properties of stochastic processes, we do not give proofs. The reader can find them in classical probability theory textbooks (see Rényi 1970, Li 2010, Long 2020 in the references of this chapter).
Definition 1.8 Let {ξ_i}_{i=1}^{∞} be a sequence of random variables and ξ a given random variable. If for any ε > 0 we have

lim_{n→∞} P{|ξ_n − ξ| > ε} = 0,

then {ξ_n} is said to converge to ξ in probability, denoted as ξ_n →_P ξ.

Obviously, ξ_n →_P ξ if and only if for any ε > 0, there is

lim_{n→∞} P{|ξ_n − ξ| ≤ ε} = 1.
E(μ_n/n) = (1/n) E(μ_n) = p

and

D(μ_n/n) = (1/n²) D(μ_n) = pq/n, q = 1 − p,

respectively. By the Chebyshev inequality (1.34), we have

P{|μ_n/n − p| > ε} < pq/(nε²).
For any given ε > 0, we have

lim_{n→∞} P{|μ_n/n − p| > ε} = 0.

So Bernoulli's law of large numbers holds.
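Bernoulli's law of large numbers can be watched in action; the following simulation sketch (ours) shows the frequency μ_n/n approaching p while the Chebyshev bound pq/(nε²) tends to 0.

```python
# Simulation: the success frequency mu_n / n approaches p as n grows.
import random

random.seed(0)
p, eps = 0.4, 0.01
for n in (100, 10_000, 1_000_000):
    mu_n = sum(random.random() < p for _ in range(n))
    print(n, mu_n / n, "bound pq/(n*eps^2) =", p * (1 - p) / (n * eps**2))
# the deviation |mu_n/n - p| shrinks, and the Chebyshev bound tends to 0
```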
Then ξ_i follows a two-point distribution with parameter p (see Sect. 1.6, Example 1.3), and {ξ_i}_{i=1}^{+∞} is an independent and identically distributed stochastic process. Obviously,

μ_n = ∑_{i=1}^{n} ξ_i, E(ξ_i) = p,
where {ξ_i} is a sequence of independent random variables with the same two-point (0–1) distribution with parameter p. It is not difficult to generalize this conclusion to a more general case.
Theorem 1.3 (Chebyshev's law of large numbers) Let {ξ_i}_{i=1}^{+∞} be a sequence of independent random variables whose expected values E(ξ_i) and variances D(ξ_i) exist, with the variances bounded, i.e., D(ξ_i) ≤ C for all i ≥ 1; then for any ε > 0, we have

lim_{n→∞} P{|(1/n) ∑_{i=1}^{n} ξ_i − (1/n) ∑_{i=1}^{n} E(ξ_i)| < ε} = 1.
P{|(1/n) ∑_{i=1}^{n} ξ_i − E((1/n) ∑_{i=1}^{n} ξ_i)| ≥ ε}
  ≤ D((1/n) ∑_{i=1}^{n} ξ_i) / ε²
  = D(∑_{i=1}^{n} ξ_i) / (n² ε²)
  = (1/(n² ε²)) ∑_{i=1}^{n} D(ξ_i)
  ≤ C/(nε²).
So there is

lim_{n→∞} P{|(1/n) ∑_{i=1}^{n} ξ_i − (1/n) ∑_{i=1}^{n} E(ξ_i)| ≥ ε} = 0.
Chebyshev's law of large numbers is more general than Bernoulli's law of large numbers; it can be understood as follows: for a sequence of independent random variables {ξ_i}, the arithmetic mean of the random variables converges in probability to the arithmetic mean of their expected values.
As a special case, we consider an independent and identically distributed stochastic process {ξ_i}. Because the variables have the same probability distribution, they have the same expectation and variance.
Corollary 1.4 Let {ξ_i} be an independent and identically distributed random process with common expectation μ and variance σ², that is, E(ξ_i) = μ, D(ξ_i) = σ² (i = 1, 2, ...); then for any ε > 0 we have

lim_{n→∞} P{|(1/n) ∑_{i=1}^{n} ξ_i − μ| < ε} = 1,

that is, (1/n) ∑_{i=1}^{n} ξ_i →_P μ.
lim_{n→∞} P{(∑_{i=1}^{n} ξ_i − nμ)/(σ√n) ≤ x} = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.

That is, the standardized variable of the sum of random variables ∑_{i=1}^{n} ξ_i converges in distribution to the standard normal distribution N(0, 1).
(n − 1)! + 1 ≡ 0 (mod n).

n = b_0 + b_1 b + b_2 b² + ··· + b_{r−1} b^{r−1}, where 0 ≤ b_i < b, r ≥ 1.

12. Prove: There are infinitely many primes p satisfying p ≡ −1 (mod 6).
13. Solve the congruence equation: 27x ≡ 25 (mod 31).
14. Let p be a prime, n ≥ 1 a positive integer; find the number of solutions of the quadratic congruence equation x² ≡ 1 (mod p^n).
15. In order to reduce the number of games, 20 teams are divided into two groups, each with 10 teams. Find the probability that the two strongest teams will be in the same group, and the probability that the two strongest teams are in different groups.
16. (Banach's problem) A mathematician has two boxes of matches, each containing N matches. When he uses a match, he takes it from either box at random. Calculate the probability that one box has k matches when the other box is empty.
17. A stick of length l breaks at two arbitrary points; find the probability that the three pieces of the stick can form a triangle.
18. There are k jars, each containing n balls numbered from 1 to n. Take one ball from each jar at random, and find the probability that m is the largest number drawn.
19. Take any three of the five numbers 1, 2, 3, 4, 5 and arrange them from small to large. Let X denote the number in the middle, and find the probability distribution of X.
20. Let F(x) be the distribution function of a continuous random variable, a > 0; prove

∫_{−∞}^{+∞} [F(x + a) − F(x)] dx = a.
References

Van der Waerden, B. L. (1976). Algebra (II). Translated by Xihua Cao, Kencheng Zeng, Fuxin Hao. Beijing: Science Press (in Chinese).
van Lint, J. H. (1991). Introduction to coding theory. Springer.
Chapter 2
The Basis of Code Theory
The medium of information transmission is called the channel for short. Commonly used channels include cable, optical fiber, radio wave propagation media and carrier lines, as well as tape, optical disk, etc. The channel constitutes the physical conditions for social information to interact across space and time. In
addition, for a piece of social information, such as language, picture, or data information, to be exchanged across time and space, information coding is the basic technical means. What is information coding? In short, it is the process of digitizing all kinds of social information. Digitization is not a simple digital substitution of social information, but is full of profound mathematical principles and beautiful mathematical techniques. For example, the
source code used for data compression and storage uses the principle of probability
statistics to attach the required statistical characteristics to social information, so the
source code is also called random code. The other is the so-called channel coding,
which is used to overcome the channel interference. This kind of code is full of
beautiful algebra, geometry and various mathematical techniques in combinatorics,
in order to improve the accuracy of information transmission, so the channel coding
is also called algebraic combinatorial code. The main purpose of this chapter is to
introduce the basic knowledge of code theory for channel coding. Source coding will
be introduced in Chap. 3.
With the hardware support of channel and the software technology of information
coding, we can implement the long-distance exchange of various social information
across time and space. Taking channel coding as an example, this process can be
described as the following diagram (Fig. 2.1).
In 1948, the American mathematician Shannon published his pioneering paper "A Mathematical Theory of Communication" in the Bell System Technical Journal, marking the advent of the era of electronic information. In this paper, Shannon
proved, by using probability theory, the existence of "good codes" with rate infinitely close to the channel capacity and transmission error probability arbitrarily small (see Theorem 2.10 in this chapter); on the other hand, if the transmission
error probability is arbitrarily small, the code rate (transmission efficiency) does not exceed an upper bound (the channel capacity) (see the corresponding theorem in Chap. 3). This upper
bound is called Shannon’s limit, which is regarded as the golden rule in the field of
electronic communication engineering technology.
Shannon’s theorem is an existence proof rather than a constructive proof. How
to construct the so-called good code which can not only ensure the communication
efficiency (the code rate is as large as possible), but also control the transmission error
rate is the unremitting goal after the advent of Shannon’s theory. From Hamming and
Golay to Elias, Goppa, Berrou and Turkish mathematician Arikan, from Hamming
code, Golay code to convolutional code, turbo code to polar code, over the past
decades, electronic communication has reached one peak after another, creating one
technological miracle after another, until today's 5G era. In 1969, the U.S. Mars probe used the Hadamard code to transmit image information, and for the first time mankind was lucky to witness one beautiful picture after another from outer space; in 1971, the U.S. Jupiter and Saturn probe used the famous Golay code G_23 to send hundreds of frames of color photos of Jupiter and Saturn back to Earth. Seventy years of exploration of channel coding form a magnificent history of electronic communication.
The main purpose of this chapter is to strictly define and prove the mathematical
characteristics of general codes in theory, so as to provide a solid mathematical
foundation for further study of coding technology and cryptography. This chapter
includes Hamming distance, Lee distance, linear code, some typical good codes,
the MacWilliams theorem and the famous Shannon coding theorem. Mastering the content of this chapter will give us a basic and comprehensive understanding of channel coding theory (error correction codes).
In channel coding, the alphabet is usually a q-element finite field F_q, sometimes a ring Z_m, where q is a power of a prime. Let n ≥ 1 be a positive integer; F_q^n is an n-dimensional linear space over F_q, also called the codeword space.
Proof Because w(−x) = w(x), we have w(x − y) = w(x + (−y)). We need only prove w(x + y) ≤ w(x) + w(y). Let x = x_1 ... x_n, y = y_1 ... y_n; then
where ρ is a nonnegative integer. Obviously, B_0(x) = {x} contains only one codeword.

Lemma 2.3 For any x ∈ F_q^n, 0 ≤ ρ ≤ n, we have

|B_ρ(x)| = ∑_{i=0}^{ρ} C(n, i) (q − 1)^i,  (2.3)
Obviously,

A_i = C(n, i) (q − 1)^i,

so

|B_ρ(x)| = ∑_{i=0}^{ρ} A_i = ∑_{i=0}^{ρ} C(n, i) (q − 1)^i.

That is to say, the number of codewords in B_ρ(x) is a constant which depends only on the radius ρ. This constant is usually denoted as B_ρ.
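For small parameters, Lemma 2.3 can be verified by brute force; the following Python sketch (ours) enumerates the ball B_2(x) in F_3^4 around an arbitrarily chosen center and compares its size with the formula (2.3).

```python
# Brute-force check of the Hamming ball volume in F_q^n.
import itertools, math

def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

q, n, rho = 3, 4, 2
x = (0, 1, 2, 0)   # an arbitrary center in F_3^4
ball = [y for y in itertools.product(range(q), repeat=n)
        if hamming_distance(x, y) <= rho]
formula = sum(math.comb(n, i) * (q - 1)**i for i in range(rho + 1))
print(len(ball), formula)   # 1 + 4*2 + 6*4 = 33, both ways
```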
Definition 2.1 If C ⊆ F_q^n, C is called a q-ary code, code for short, and |C| is the number of codewords in C. If |C| = 1, we call C a trivial code; all the codes we discuss are nontrivial.

For a code C, the following five mathematical quantities are of basic importance.

d = 2ρ_1 + 1, or d = 2ρ_1 + 2.
Proof We need only prove 2ρ_1 + 1 ≤ d ≤ 2ρ_1 + 2. If d ≤ 2ρ_1, then there are codewords c_1 ∈ C, c_2 ∈ C, c_1 ≠ c_2 such that

d(c_1, c_2) ≤ 2ρ_1.

This means that c_1 and c_2 have at most 2ρ_1 different characters. Without loss of generality, we may assume the first 2ρ_1 characters of c_1 and c_2 differ, that is,

c_1 = a_1 a_2 ... a_{ρ_1} a_{ρ_1+1} ... a_{2ρ_1} ∗ ∗ ··· ∗
c_2 = b_1 b_2 ... b_{ρ_1} b_{ρ_1+1} ... b_{2ρ_1} ∗ ∗ ··· ∗
That is,

x ∈ B_{ρ_1}(c_1) ∩ B_{ρ_1}(c_2),

which contradicts B_{ρ_1}(c_1) ∩ B_{ρ_1}(c_2) = ∅. So we have d ≥ 2ρ_1 + 1. If d > 2ρ_1 + 2 = 2(ρ_1 + 1), then we can prove the following formula, which contradicts the definition of the disjoint radius ρ_1.

Because if the above formula does not hold, then there are c_1, c_2 ∈ C, c_1 ≠ c_2, such that B_{ρ_1+1}(c_1) intersects B_{ρ_1+1}(c_2); we may assume

min{d(x, c) | c ∈ C} ≤ ρ.
That is, {B_ρ(c) | c ∈ C} forms a cover of F_q^n. Obviously, {B_{ρ−1}(c) | c ∈ C} cannot cover F_q^n, because if

⋃_{c∈C} B_{ρ−1}(c) = F_q^n,

then for any x ∈ F_q^n,

min{d(x, c) | c ∈ C} ≤ ρ − 1.

Thus

ρ = max{min{d(x, c) | c ∈ C} | x ∈ F_q^n} ≤ ρ − 1.

The contradiction indicates that ρ is the smallest such positive integer. The lemma holds.
Lemma 2.6 Let d be the minimum distance of C and ρ be the covering radius of C; then

d ≤ 2ρ + 1.

x ∉ B_ρ(c_0), and x ∉ B_ρ(c), ∀ c ∈ C.

That is, {B_ρ(c) | c ∈ C} cannot cover F_q^n, which is contrary to Lemma 2.5. So we always have d ≤ 2ρ + 1. The Lemma holds.
Combining the above three lemmas, we can get the following simple but very important corollaries.

Corollary 2.2 Let C ⊂ F_q^n be an arbitrary q-ary code, and let d, ρ, ρ_1 be the minimum distance, covering radius and disjoint radius of C, respectively; then
(i) ρ_1 ≤ ρ.
(ii) If the minimum distance of C is d = 2e + 1, then e = ρ_1.

Proof (i) follows directly from 2ρ_1 + 1 ≤ d ≤ 2ρ + 1. For (ii), if d = 2e + 1 is odd, then by Lemma 2.4, d = 2ρ_1 + 1 = 2e + 1 ⇒ e = ρ_1.
Definition 2.3 A code C with ρ = ρ_1 is called a perfect code.

Corollary 2.3 (i) The minimum distance of any perfect code C is d = 2ρ + 1.
(ii) Let the minimum distance of a code C be d = 2e + 1. Then C is a perfect code if and only if for every x ∈ F_q^n there exists a unique ball B_e(c), c ∈ C, such that x ∈ B_e(c).

Proof (i) follows directly from 2ρ_1 + 1 ≤ d ≤ 2ρ + 1. To prove (ii): if C is a perfect code with minimum distance d = 2e + 1, then we have ρ_1 = ρ = e. On the other hand, if the condition holds, then the covering radius of C satisfies ρ ≤ e = ρ_1 ≤ ρ, so ρ_1 = ρ and C is a perfect code.
In order to introduce the concept of error correcting codes, we discuss the so-called decoding principle in electronic information transmission. This principle is commonly known as the principle of decoding to the codeword that "looks most alike". What looks most alike? When we transmit through a channel with interference, we receive a word x ∈ F_q^n and compare it with the codewords c ∈ C. If
Definition 2.4 A code C is called an e-error correcting code (e ≥ 1) if for any x ∈ F_q^n, whenever there is a c ∈ C with x ∈ B_e(c), this c is unique.

An error correcting code allows transmission errors without affecting correct decoding. For example, suppose that C is an e-error correcting code; then for any c ∈ C, after c is transmitted through the channel with interference, the codeword we receive is x. If an error occurs in the transmission of c in at most e characters, that is, d(c, x) ≤ e, then the most similar codeword in C must be c, so we can decode x → c correctly.
Corollary 2.4 A perfect code with minimal distance d = 2e + 1 is an e-error correcting code.

Proof Because the disjoint radius ρ_1 of C satisfies ρ_1 = ρ = e, where ρ is the covering radius, for any received codeword x ∈ F_q^n there exists exactly one c ∈ C such that x ∈ B_e(c). That is, C is an e-error correcting code.
|C| ∑_{i=0}^{e} C(n, i) (q − 1)^i = q^n.  (2.4)

Then we have

∑_{c∈C} |B_e(c)| = q^n,

thus

|C| B_e = |C| ∑_{i=0}^{e} C(n, i) (q − 1)^i = q^n.
Conversely, suppose the sphere-packing condition (2.4) holds. Because the minimum distance of C is d = 2e + 1, from Corollary 2.2 we see that ρ_1 = e, so we have

⋃_{c∈C} B_e(c) = F_q^n.
When q = 2, the alphabet F_2 is the finite field of two elements {0, 1}; in this case the coding is called binary coding, and the transmission channel is called a binary channel. In binary channel transmission, the most important function is the binary entropy function H(λ), defined as

H(λ) = { 0, when λ = 0 or λ = 1;
         −λ log λ − (1 − λ) log(1 − λ), when 0 < λ < 1.  (2.5) }

Obviously, H(λ) = H(1 − λ), and 0 ≤ H(λ) ≤ 1, with H(1/2) = 1; that is, the maximum is attained at λ = 1/2. For further properties of H(λ), please refer to Chap. 1.
Theorem 2.2 Let C be a perfect code with minimal distance d = 2e + 1, and let R_C be the code rate of C; then
(i) 1 − R_C = (1/n) log_2 ∑_{i=0}^{e} C(n, i) ≤ H(e/n).
(ii) When the codeword length n → ∞, if lim_{n→∞} R_C = a, then

lim_{n→∞} H(e/n) = 1 − a.
Proof (i) According to the sphere-packing condition, since C is a perfect code,

|C| ∑_{i=0}^{e} C(n, i) = 2^n.

We have

(1/n) log_2 |C| + (1/n) log_2 ∑_{i=0}^{e} C(n, i) = 1.

That is,

1 − R_C = (1/n) log_2 ∑_{i=0}^{e} C(n, i) ≤ H(e/n),

where the last inequality is derived from Lemma 1.11 in Chap. 1, so (i) holds. If R_C has a limit as n → ∞, then again from Lemma 1.11 in Chap. 1, we have

lim_{n→∞} H(e/n) = 1 − lim_{n→∞} R_C = 1 − a.
First, the repetition code A = {0, 1} ⊂ F_2^n contains only two codewords, 0 = 0...0 ∈ F_2^n and 1 = 1...1 ∈ F_2^n. Because n = 2e + 1 is odd, by Corollary 2.2 the disjoint radius of A is ρ_1 = e. Let us prove that the covering radius of A is ρ = ρ_1 = e: for any x ∈ F_2^n, if d(0, x) > e, that is d(0, x) ≥ e + 1, then at least e + 1 characters of x = x_1 x_2 ... x_n ∈ F_2^n are 1 and at most e characters are 0, thus d(1, x) ≤ e. This shows that if x ∉ B_e(0), then x ∈ B_e(1), that is,
Lemma 2.7 Suppose C ∼ C_1 are two equivalent codes; then they have the same code rate, the same minimum distance, the same covering radius and the same disjoint radius. In particular, if C is a perfect code, then every code C_1 equivalent to C is a perfect code.

Proof All the assertions of the lemma can be easily proved by using equation (2.8).

R_C = (1/n) log_q |C| = k/n, minimal distance d = minimal weight w.
Let {α_1, α_2, ..., α_k} ⊂ C be a set of bases of the linear code C.

Definition 2.6 If {α_1, α_2, ..., α_k} is a set of bases of the linear code C = [n, k], where α_i = (α_{i1}, α_{i2}, ..., α_{in}), then we have the k × n matrix

G = [α_1; α_2; ...; α_k] = (α_{ij})_{k×n},

and

C = {aG | a ∈ F_q^k}.
x H = 0 ⇔ x ∈ C.
Proof We prove the conclusion by taking the standard form of the generator matrix G of C. Let

Then the check matrix of C, that is, the generator matrix of the dual code C⊥, is

H′ = [−A′  I_{n−k}],  H = [−A; I_{n−k}] (the blocks −A and I_{n−k} stacked vertically).

⟨x, y⟩ = x y′ = x H b′ = 0 ⇒ x ∈ (C⊥)⊥ = C.
x H = y H ⇔ x − y ∈ C.
A linear code [n, n − k] has a k × n check matrix H in which any two column vectors are linearly independent; writing H = [a_1, a_2, ..., a_n], the columns {a_1, a_2, ..., a_n} ⊂ PG(k − 1, q) are n different nonzero points. So the generator matrix of an [n, k] projective code consists of n different nonzero points of the projective space PG(k − 1, q). Because n ≤ |PG(k − 1, q)|, when the maximum value is reached, i.e.,

n = |PG(k − 1, q)| = (q^k − 1)/(q − 1),
Theorem 2.3 Any Hamming code C = [n, n − k] is a perfect code with minimum distance d = 3; therefore, Hamming codes are perfect 1-error correcting codes.

Proof We first prove that the minimum distance of a Hamming code C is d ≥ 3. If d ≤ 2, there is an x = x_1 x_2 ... x_n with w(x) ≤ 2, that is, at most two characters x_i and x_j are not 0, because the minimum distance d of a linear code equals its minimum weight w.
Let H = (α_1, α_2, ..., α_n) be the check matrix of C. If x H = 0, then

∑_{i=1}^{n} x_i α_i = 0.

⇒ ⋃_{c∈C} B_1(c) = F_q^n.
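As a concrete illustration of Theorem 2.3, here is a Python sketch (ours, using the standard [7, 4] parameters rather than anything specified in the text): the binary Hamming code with k = 3 has the seven nonzero vectors of F_2^3 as check-matrix columns, and a single error is located by reading its syndrome.

```python
# Binary Hamming code [7, 4]: single-error correction via the syndrome.

# parity-check matrix H: column j (1-based) is the binary expansion of j
H = [[(j >> i) & 1 for j in range(1, 8)] for i in range(3)]

def syndrome(word):
    return [sum(h_i[j] * word[j] for j in range(7)) % 2 for h_i in H]

def correct(word):
    s = syndrome(word)
    pos = s[0] + 2 * s[1] + 4 * s[2]   # syndrome = position of the error
    if pos:
        word = list(word)
        word[pos - 1] ^= 1
    return word

codeword = [0] * 7                      # 0 is always a codeword
received = [0, 0, 0, 0, 1, 0, 0]        # one transmission error
print(correct(received) == codeword)    # True: the error is corrected
```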
A(z) = ∑_{i=0}^{n} A_i z^i, z a variable.

Obviously, for any given c ∈ C, the number of codewords in C whose Hamming distance to c is exactly i is A_i, that is,

Codes with the above property are called distance invariant codes; obviously, linear codes are distance invariant codes.
The following result was proved by MacWilliams in 1963; it establishes the relationship between the weight polynomial of a linear code C and that of its dual code C⊥, and is one of the most basic achievements in code theory.

Theorem 2.4 (MacWilliams) Let C = [n, k] be a linear code over F_q with weight polynomial A(z), and let C⊥ be the dual code with weight polynomial B(z); then

B(z) = q^{−k} (1 + (q − 1)z)^n A((1 − z)/(1 + (q − 1)z)).

In particular, when q = 2,

2^k B(z) = (1 + z)^n A((1 − z)/(1 + z)).
ψ(a) = exp(2πi tr(a)/p), tr(a): F_q → F_p.

Therefore,

∑_{c∈C} g_c(z) = ∑_{x∈F_q^n} z^{w(x)} ∑_{c∈C} ψ(⟨x, c⟩).  (2.10)

If x ∉ C⊥, let us prove

∑_{c∈C} ψ(⟨x, c⟩) = 0.  (2.11)

If x ∈ F_q^n, x ∉ C⊥, let
On the contrary, for any two additive cosets c_1 + T(x), c_2 + T(x): if ⟨c_1, x⟩ = ⟨c_2, x⟩, then ⟨c_1 − c_2, x⟩ = 0, that is, c_1 − c_2 ∈ T(x), so c_1 + T(x) = c_2 + T(x). Therefore, any two codewords in c + T(x) ⊂ C have the same inner product with x; conversely, different additive cosets have different inner products with x. Because x ∉ C⊥, there exists c_0 ∈ C such that ⟨c_0, x⟩ ≠ 0; let ⟨c_0, x⟩ = a ≠ 0, then ⟨a^{−1} c_0, x⟩ = 1. Let c_1 = a^{−1} c_0 ∈ C, then ⟨c_1, x⟩ = 1. Therefore, for every a ∈ F_q, ⟨a c_1, x⟩ = a.
Define the weight function on F_q by w(a) = 1 if a ≠ 0 and w(0) = 0. For any x ∈ F_q^n, c ∈ C, write x = x_1 x_2 ... x_n, c = c_1 c_2 ... c_n; then, by the definition of g_c, we have

g_c(z) = ∑_{x_i ∈ F_q, 1 ≤ i ≤ n} z^{w(x_1) + w(x_2) + ··· + w(x_n)} ψ(c_1 x_1 + ··· + c_n x_n)
  = ∏_{i=1}^{n} ∑_{x ∈ F_q} z^{w(x)} ψ(c_i x).  (2.13)
Thus

∑_{c∈C} g_c(z) = (1 + (q − 1)z)^n ∑_{c∈C} ((1 − z)/(1 + (q − 1)z))^{w(c)}
  = (1 + (q − 1)z)^n A((1 − z)/(1 + (q − 1)z)).

B(z) = (1/|C|) (1 + (q − 1)z)^n A((1 − z)/(1 + (q − 1)z))
  = q^{−k} (1 + (q − 1)z)^n A((1 − z)/(1 + (q − 1)z)).
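The MacWilliams identity can be sanity-checked on the smallest examples; the following Python sketch (ours) verifies 2^k B(z) = (1 + z)^n A((1 − z)/(1 + z)) for the binary repetition code [3, 1], whose dual is the [3, 2] even-weight code with B(z) = 1 + 3z².

```python
# MacWilliams identity for the binary repetition code C = {000, 111}:
# A(z) = 1 + z^3, k = 1, n = 3; the dual has B(z) = 1 + 3 z^2.
from fractions import Fraction

def A(z):
    return 1 + z**3

# check 2^k * B(z) = (1 + z)^n * A((1 - z)/(1 + z)) at exact rational points
for z in (Fraction(1, 2), Fraction(1, 3), Fraction(2, 5)):
    lhs = 2 * (1 + 3 * z**2)
    rhs = (1 + z)**3 * A((1 - z) / (1 + z))
    print(lhs == rhs)   # True for every test point
```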
Obviously,

(1) If 0 ≤ i + j ≤ m/2, then

W_L(i + j) = i + j = W_L(i) + W_L(j).

(2) If i ≤ m/2, j > m/2, then

W_L(i + j) = m − i − j ≤ m − j = W_L(j) ≤ W_L(i) + W_L(j).

(3) If i > m/2, j ≤ m/2, the same holds by symmetry.

So

d_L(a, b) = W_L(a − b) = W_L((a − c) + (c − b))
  ≤ W_L(a − c) + W_L(c − b) = d_L(a, c) + d_L(c, b).
Next let us take m = 4, with alphabet Z_4, and discuss the Lee weight and Lee distance on 4-ary codes. Suppose a ∈ Z_4^n; for 0 ≤ i ≤ 3, let

and

Lee_C(x, y) = ∑_{c∈C} x^{2n − W_L(c)} y^{W_L(c)}.  (2.19)
Lemma 2.12 Let C ⊂ Z_4^n be a 4-ary code with codeword length n; then the symmetric weight polynomial and the Lee weight polynomial of C satisfy the following relation:

Let a = a_1 a_2 ... a_n; then

W_L(a) = ∑_{i=1}^{n} W_L(a_i) = n_1(a) + 2n_2(a) + n_3(a).

So

Lee_C(x, y) = ∑_{c∈C} x^{2n_0(c)} (xy)^{n_1(c) + n_3(c)} y^{2n_2(c)} = swe_C(x², xy, y²).
Lee_{C⊥}(x, y) = (1/|C|) Lee_C(x + y, x − y).

Take

f(u) = w^{n_0(u)} x^{n_1(u) + n_3(u)} y^{n_2(u)}, u ∈ Z_4^n.
n_i(u) = n_i(u_1) + n_i(u_2) + ··· + n_i(u_n).

Thus

f(u) = ∏_{i=1}^{n} f(u_i).

g(c) = ∏_{i=1}^{n} (∑_{u∈Z_4} f(u) ψ(⟨c_i, u⟩)).  (2.22)
Now we calculate the inner sum on the right side of equation (2.22):

∑_{u∈Z_4} f(u) ψ(⟨c_i, u⟩) = { w + 2x + y, if c_i = 0;
                               w − y, if c_i = 1 or 3;
                               w − 2x + y, if c_i = 2. }

By (2.22),

So

∑_{c∈C} g(c) = swe_C(w + 2x + y, w − y, w − 2x + y).
c∈C
by (2.21),
1
LeeC ⊥ (x, y) = LeeC (x + y, x − y).
|C|
H_2 ⊗ H_2 = H_2^{⊗2}, H_2 ⊗ H_2 ⊗ ··· ⊗ H_2 = H_2^{⊗n}.
We get 2n row vectors {±α_1, ±α_2, ..., ±α_n}. For each row vector ±α_i, we replace the component −1 with 0; the vector so obtained from α_i is denoted ᾱ_i, and the one from −α_i is denoted −ᾱ_i. Thus each ±α_i yields a vector of F_2^n, denoted
ab = 0 ⇒ ai bi = 0.
i=1
And ai = ±1, bi = ±1. Let the number of the same character be d1 and the number
of different characters be d = d(a, b), so there are d1 − d = 0, that is d1 = d, but
d1 + d = n, so d = n2 . The Lemma holds.
Corollary 2.6 Let C = {±ᾱ_1, ±ᾱ_2, ..., ±ᾱ_n} be a Hadamard code; then the Hamming distance of any two different codewords of C is n/2.

Proof {±α_1, ±α_2, ..., ±α_n} are the row vectors of a Hadamard matrix; let a = ±α_i, b = ±α_j (i ≠ j), then

ab = ± ∑_{i=1}^{n} a_i b_i = 0 ⇒ d(a, b) = n/2.
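The construction above is easy to reproduce; the following Python sketch (ours) builds H_2^{⊗3} by Kronecker products, forms the 2n codewords ±ᾱ_i, and confirms that distinct non-antipodal codewords are at Hamming distance n/2 (antipodal pairs are at distance n).

```python
# Hadamard code from Kronecker powers of H_2.
import itertools

def kron(A, B):
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

H = [[1, 1], [1, -1]]
for _ in range(2):              # H_2 tensor H_2 tensor H_2 -> order n = 8
    H = kron(H, [[1, 1], [1, -1]])

code = [[1 if v == 1 else 0 for v in row] for row in H]   # the alpha_i rows
code += [[1 - v for v in row] for row in code]            # the -alpha_i rows

n = len(H)
dists = {sum(a != b for a, b in zip(u, v))
         for u, v in itertools.combinations(code, 2)}
print(n, sorted(dists))   # 8 [4, 8]: distance n/2, or n for antipodal pairs
```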
In the theory and application of channel coding, the binary Golay code is the most famous. In order to introduce the Golay code G_23 completely, we first introduce the concept of a t-(m, k, λ) design.

Let S be a set of m elements, that is, |S| = m; the elements of S are called points. Let R be a set of k-element subsets of S, |R| = M, i.e.,

R = {B_1, B_2, ..., B_M}, B_i ⊂ S, |B_i| = k, 1 ≤ i ≤ M.
Lemma 2.14 The 2-(11, 6, 3) design is unique; that is to say, if S = {a_1, a_2, ..., a_11}, then there are exactly 11 blocks in R,

R = {B_1, B_2, ..., B_11}.

Proof Suppose each a ∈ S is contained in exactly l blocks B_j. Because any 2 points lie in exactly 3 blocks, there is 6l − l = 10 × 3, so l = 6. In addition, suppose |R| = M; because each point is contained in exactly six blocks, there is 6M = 11 × 6, and we get M = 11.
And every row of N has exactly six 1's and five 0's, and every column of N has exactly six 1's and five 0's.

N N′ = I_11 + J_11.

Further, rank(N) = 10, and the solutions of the linear equation system X N = 0 are exactly the two codewords 0 and 1 (0 = (0, 0, ..., 0), 1 = (1, 1, ..., 1)) in F_2^11.

b_{ij} = ∑_{k=1}^{11} χ_k(a_i) χ_k(a_j).

(1, 1, ..., 1) N = (0, 0, ..., 0) ∈ F_2^11.
where α_i ∈ F_2^24 are the 12 row vectors of G. Obviously we have the weight function

If i ≠ 1, j ≠ 1, i ≠ j, then

⟨α_i, α_j⟩ = 1 + ∑_{k=1}^{11} χ_k(a_i) χ_k(a_j) = 4 ≡ 0 (mod 2).
Definition 2.11 The linear code [24, 12] generated by the row vectors {α_1, α_2, ..., α_12} of G in F_2^24 is called the Golay code, denoted G_24. Removing the last component of each α_i, α_i → ᾱ_i ∈ F_2^23, the linear code [23, 12] generated by {ᾱ_1, ᾱ_2, ..., ᾱ_12} in F_2^23 is called the Golay code G_23.
Theorem 2.7 The Golay code G_23 is a perfect code [23, 12] with minimal distance d = 7.

Proof Because the minimal distance of a linear code equals its minimal weight, by Lemma 2.16,

|G_23| ∑_{i=0}^{3} C(23, i) = 2^12 ∑_{i=0}^{3} C(23, i) = 2^23.
In order to introduce 3-ary Golay codes, we first define the Paley matrix of order q. Let q ≥ 3 be odd, and define a second-order real-valued multiplicative character χ(a) on the finite field F_q as

χ(a) = { 0, if a = 0;
         1, if a ∈ (F_q*)²;
         −1, if a ∉ (F_q*)². }
Lemma 2.17 The Paley matrix S_q of order q has the following properties:
(i) S_q J_q = J_q S_q = 0.
(ii) S_q S_q′ = q I_q − J_q.
(iii) S_q′ = (−1)^{(q−1)/2} S_q.
Here, I_q is the unit matrix of order q and J_q is the square matrix of order q with all elements 1.
b_{ij} = ∑_{k=0}^{q−1} χ(a_i − a_k) = ∑_{c∈F_q} χ(c) = 0.

c_{ij} = ∑_{k=0}^{q−1} χ(a_i − a_k) χ(a_j − a_k).

Obviously, we have

c_{ij} = { q − 1, if i = j;
           −1, if i ≠ j. }

So (ii) holds. To prove (iii), notice that χ(−1) = (−1)^{(q−1)/2}, so

S_q′ = χ(−1) S_q = (−1)^{(q−1)/2} S_q.
Let q = 5 and consider the Paley matrix S_5 of order 5; it can be calculated that
⎡ ⎤
0 1 −1 −1 1
⎢ 1 0 1 −1 −1 ⎥
⎢ ⎥
S5 = ⎢
⎢ −1 1 0 1 −1 ⎥
⎥.
⎣ −1 −1 1 0 1 ⎦
1 −1 −1 1 0
In F_3^11, we consider a linear code C whose generator matrix is

G = ( I_6  B ),

where B is the 6 × 5 matrix whose first row is (1, 1, 1, 1, 1) and whose remaining five rows form S_5.
respectively, where β is a column vector satisfying that the sum of β and all the column vectors of G is 0. Further, let q = 2: if the minimum distance d of C is odd, then the minimum distance of the extension code C̄ is d + 1.

Proof The generator matrix and check matrix of C̄ can be given directly by definition. Let the minimal weight w = w(c) of C be attained at c = c_1 c_2 ... c_n ∈ C. Because q = 2, there are w coordinates with c_i = 1, and since w is odd, ∑_i c_i ≠ 0, so the added check component is c_{n+1} = 1; then

c* = c_1 c_2 ... c_n c_{n+1} ∈ C̄ and w(c*) = d + 1.
Consider the extension code C̄ = [12, 6] of the 3-ary Golay code C = [11, 6]; its generator matrix is

Ḡ = ( I_6  B̄ ),  (2.25)

where the first row of B̄ is (1, 1, 1, 1, 1, 0) and the remaining five rows are the rows of S_5, each followed by −1.

Note that the sum of the components of each row vector of S_5 is 0, the inner product of different row vectors is −1, and the inner product of a row vector with itself is 1, so

Ḡ · Ḡ′ = 0.
Theorem 2.8 The 3-ary Golay code C is a perfect linear code [11, 6] with minimum distance 5, so it is a 2-error correcting code.

Proof The weight of each row vector of Ḡ is 6, and by calculation the weight of every nonzero linear combination of the row vectors of Ḡ is at least 6, so the minimum distance of the extension code C̄ is 6 ⇒ the minimum distance of C is 5. So the disjoint radius of C is ρ_1 = 2. And because

|C| = 3^6, ∑_{i=0}^{2} C(11, i) 2^i = 3^5,

|C| ∑_{i=0}^{2} C(11, i) 2^i = 3^11.
Remark 2.1 It is worth noting that J. H. van Lint in 1971 (see [24] in the references of this chapter) and A. Tietäväinen in 1973 (see [43]) independently proved that the only nontrivial perfect codes with minimal distance greater than 3 over any finite field are the 2-ary Golay code G_23 and the 3-ary Golay code.
Reed and Muller proposed a class of binary linear codes based on finite geometry in 1954. In order to discuss the structure and properties of these codes, we first prove some results in number theory.
Lemma 2.19 Let p be a prime and k, n two nonnegative integers whose p-ary expressions are

n = ∑_{i=0}^{l} n_i p^i, k = ∑_{i=0}^{l} k_i p^i.

Then

C(n, k) ≡ ∏_{i=0}^{l} C(n_i, k_i) (mod p), where C(n_i, k_i) = 0 if k_i > n_i.
So we have

(1 + x)^n = (1 + x)^{∑_{i=0}^{l} n_i p^i} ≡ ∏_{i=0}^{l} (1 + x^{p^i})^{n_i} (mod p).

Comparing the coefficients of the x^k terms on both sides of the above formula: if there is a k_j > n_j, then the x^k term does not appear on the right side, which means that the coefficient of the x^k term on the left side satisfies

C(n, k) ≡ 0 (mod p).

If k_i ≤ n_i for all 0 ≤ i ≤ l, then

C(n, k) ≡ ∏_{i=0}^{l} C(n_i, k_i) (mod p).
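Lucas' theorem (Lemma 2.19) can be checked exhaustively for small parameters; the following Python sketch (ours) compares C(n, k) mod p with the digit-wise product for p = 3.

```python
# Brute-force check of Lucas' theorem: C(n, k) mod p equals the product
# of C(n_i, k_i) over the base-p digits.
import math

def digits(x, p):
    d = []
    while x:
        d.append(x % p)
        x //= p
    return d or [0]

def lucas(n, k, p):
    dn, dk = digits(n, p), digits(k, p)
    dk += [0] * (len(dn) - len(dk))     # pad k with leading zero digits
    result = 1
    for ni, ki in zip(dn, dk):
        result = result * math.comb(ni, ki) % p   # comb is 0 when ki > ni
    return result

p = 3
ok = all(math.comb(n, k) % p == lucas(n, k, p)
         for n in range(60) for k in range(n + 1))
print(ok)   # True
```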
Massey first defined the concept of polynomial weight in 1973: over a finite field of characteristic 2 (q = 2^r), the Hamming weight of a polynomial f(x) ∈ F_q[x] is defined as

Lemma 2.20 (Massey, 1973) Let f(x) = ∑_{i=0}^{l} b_i (x + c)^i ∈ F_q[x] with b_l ≠ 0, and let i_0 be the smallest subscript i with b_i ≠ 0; then
f(x) = ∑_{i=0}^{2^n − 1} b_i (x + c)^i + ∑_{i=2^n}^{l} b_i (x + c)^i
  = f_1(x) + (x + c)^{2^n} f_2(x)
  = f_1(x) + c^{2^n} f_2(x) + x^{2^n} f_2(x),
where deg f_1(x) < 2^n, deg f_2(x) < 2^n. There are two cases to discuss:
(i) If f_1(x) = 0, then w(f(x)) = 2w(f_2(x)). Because i_0 ≥ 2^n,

= 2w((x + c)^{i_0 − 2^n}).

So there is

(ii) If f_1(x) ≠ 0, let i_1 be the subscript of f_1(x) and i_2 the subscript of f_2(x). If a nonzero term of f_1(x) plus the corresponding term of c^{2^n} f_2(x) becomes 0, then x^{2^n} f_2(x) will have a corresponding term that is not zero, so we always have
Now let $n = 2^m$ and write each integer $j$ ($0 \le j < n$) in binary as

$$j = \sum_{i=0}^{m-1}a_{ij}2^i, \quad a_{ij} \in F_2.$$

We define

$$x_j = \sum_{i=0}^{m-1}a_{ij}u_i = \begin{pmatrix} a_{0j}\\ a_{1j}\\ \vdots\\ a_{(m-1)j} \end{pmatrix} \in F_2^m,$$

and the $m \times n$ matrix

$$E = [x_0, x_1, \ldots, x_{n-1}] = \begin{pmatrix} a_{00} & a_{01} & \cdots & a_{0(n-1)}\\ a_{10} & a_{11} & \cdots & a_{1(n-1)}\\ \vdots & \vdots & & \vdots\\ a_{(m-1)0} & a_{(m-1)1} & \cdots & a_{(m-1)(n-1)} \end{pmatrix}_{m\times n},$$

whose row vectors we denote by $\alpha_0, \alpha_1, \ldots, \alpha_{m-1}$. For $0 \le i < m$, let

$$B_i = \{x_j \in F_2^m \mid a_{ij} = 0\}, \quad A_i = \{x_j \in F_2^m \mid a_{ij} = 1,\ 0 \le j < n\} \Rightarrow |A_i| = 2^{m-1}.$$
For any two vectors $\alpha = (b_0, b_1, \ldots, b_{n-1})$, $\beta = (c_0, c_1, \ldots, c_{n-1})$ in $F_2^n$, define the product vector

$$\alpha\beta = (b_0c_0, b_1c_1, \ldots, b_{n-1}c_{n-1}) \in F_2^n.$$

Writing $\chi_i(x_j) = a_{ij}$, so that $\alpha_i = (\chi_i(x_0), \ldots, \chi_i(x_{n-1}))$, we have

$$\alpha_{i_1}\alpha_{i_2} = (\chi_{i_1}(x_0)\chi_{i_2}(x_0), \chi_{i_1}(x_1)\chi_{i_2}(x_1), \ldots, \chi_{i_1}(x_{n-1})\chi_{i_2}(x_{n-1})).$$
Lemma 2.21 Let $i_1, i_2, \ldots, i_s$ be $s$ ($0 \le s < m$) different indexes from 0 to $m-1$; then

$$|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_s}| = 2^{m-s},$$

and the weight of the product vector $\alpha = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}$ is $w(\alpha) = 2^{m-s}$.

Proof The first conclusion is obvious. Let us just prove the second conclusion: the $j$-th component of $\alpha$ is 1 if and only if $x_j \in A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_s}$, and there are exactly $2^{m-s}$ such $x_j$, so there are $2^{m-s}$ components in $\alpha$ that are 1 and the others are 0, i.e., $w(\alpha) = 2^{m-s}$.

For $0 \le l < n$ with binary expansion $l = \sum_{i=0}^{m-1}a_{il}2^i$, define

$$I(l) = \{i \mid a_{il} \neq 0\},$$

the set of subscripts of the nonzero binary digits of $l$.
For $0 \le l < n$, define $b_{lj} \in F_2$ by

$$(1+x)^l = \sum_{j=0}^{n-1}b_{lj}x^{n-1-j}, \tag{2.26}$$

and set $N_l = (b_{l0}, b_{l1}, \ldots, b_{l(n-1)}) \in F_2^n$. Indeed, for $0 \le j < n$, write $j = \sum_{i=0}^{m-1}a_{ij}2^i$; then

$$n - 1 - j = \sum_{i=0}^{m-1}c_{ij}2^i, \quad \text{where } c_{ij} = 1 - a_{ij}.$$

By Lemma 2.19,

$$b_{lj} = \binom{l}{n-1-j} \equiv \prod_{i=0}^{m-1}\binom{a_{il}}{c_{ij}} \pmod 2.$$

Hence

$$\binom{l}{n-1-j} \equiv 1 \pmod 2$$

if and only if $c_{ij} \le a_{il}$ for every $i$, i.e., if and only if $a_{ij} = 1$ for every $i$ with $a_{il} = 0$. In particular, for $l = n-1$ every binary digit $a_{i(n-1)} = 1$, so every $b_{(n-1)j} = 1$ and

$$N_{n-1} = (1, 1, \ldots, 1) = e.$$
Moreover, for each $0 \le j < n$, the $j$-th unit vector $e_j \in F_2^n$ satisfies

$$e_j = \prod_{i=0}^{m-1}(\alpha_i + (1 + a_{ij})e).$$

When $0 \le j < n$ is given, we define the $j$-th complement of the row vector $\alpha_i$ ($0 \le i < m$) of the matrix $E$ as

$$\alpha_i(j) = \begin{cases}\alpha_i, & \text{if } a_{ij} = 1;\\ \bar{\alpha}_i = e + \alpha_i, & \text{if } a_{ij} = 0.\end{cases}$$

Obviously, there is

$$\alpha_i + (1 + a_{ij})e = \alpha_i(j),$$

so

$$\prod_{i=0}^{m-1}(\alpha_i + (1 + a_{ij})e) = \prod_{i=0}^{m-1}\alpha_i(j) = \prod_{i\in I(j)}\alpha_i\prod_{i\notin I(j)}(e + \alpha_i) = e_j.$$
Lemma 2.24 $\{N_l\}_{0\le l<n}$ constitutes a basis of $F_2^n$, where $N_{n-1} = e = (1, 1, \ldots, 1)$.

Proof $\{N_l\}_{0\le l<n}$ has exactly $n$ different vectors; let us prove that they are linearly independent. Suppose

$$\sum_{l=0}^{n-1}c_lN_l = \left(\sum_{l=0}^{n-1}c_lb_{l0}, \sum_{l=0}^{n-1}c_lb_{l1}, \ldots, \sum_{l=0}^{n-1}c_lb_{l(n-1)}\right) = 0,$$

and consider

$$f(x) = \sum_{l=0}^{n-1}c_l(1+x)^l = \sum_{j=0}^{n-1}\left(\sum_{l=0}^{n-1}c_lb_{lj}\right)x^{n-1-j} \in F_2[x].$$

If some $c_l \neq 0$, then $f(x) \neq 0$, because the polynomials $(1+x)^l$ ($0 \le l < n$) have distinct degrees; so there is a component $\sum_{l=0}^{n-1}c_lb_{lj} \neq 0$, a contradiction. Hence all $c_l = 0$, that is, $\{N_l\}_{0\le l<n}$ is a basis. The Lemma holds.
Definition 2.12 Let $0 \le r < m$. The Reed–Muller code $R(r, m)$ of order $r$ is the linear code

$$R(r, m) = L(\{\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} \mid 0 \le s \le r\}) \subset F_2^n,$$

spanned by all products of at most $r$ of the rows $\alpha_i$ (the empty product, $s = 0$, is $e$). Its dimension is

$$t = \sum_{s=0}^{r}\binom{m}{s}.$$
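The construction of $R(r, m)$ from products of the rows $\alpha_i$ can be illustrated directly. The sketch below (ours, assuming the row/column conventions introduced above) builds the basis of $R(1, 4)$, then checks the dimension formula and, by brute-force enumeration, the minimum distance $d = 2^{m-r}$ proved in Theorem 2.9 below.

```python
from itertools import combinations, product
from math import comb

def reed_muller_basis(r, m):
    """Basis of R(r, m): all products of at most r rows alpha_i of E."""
    n = 2 ** m
    # alpha_i lists the i-th binary digit of each column index j
    alpha = [[(j >> i) & 1 for j in range(n)] for i in range(m)]
    e = [1] * n                      # empty product (s = 0): the all-ones vector
    basis = [e]
    for s in range(1, r + 1):
        for idx in combinations(range(m), s):
            row = e[:]
            for i in idx:
                row = [a * b for a, b in zip(row, alpha[i])]
            basis.append(row)
    return basis

r, m = 1, 4
basis = reed_muller_basis(r, m)
assert len(basis) == sum(comb(m, s) for s in range(r + 1))  # t = sum C(m, s)

# brute-force minimum distance over all nonzero codewords
dmin = 2 ** m
for coeffs in product([0, 1], repeat=len(basis)):
    if any(coeffs):
        word = [sum(c * b[j] for c, b in zip(coeffs, basis)) % 2
                for j in range(2 ** m)]
        dmin = min(dmin, sum(word))
assert dmin == 2 ** (m - r)          # d = 2^(m-r) = 8 here
print(f"R({r},{m}) has dimension {len(basis)} and minimum distance {dmin}")
```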
Lemma 2.25 The dual code of the Reed–Muller code $R(r, m)$ of order $r$ is $R(m-r-1, m)$.

Proof The dimensions of $R(r, m)$ and $R(m-r-1, m)$ are

$$\dim(R(r, m)) = \sum_{s=0}^{r}\binom{m}{s}$$

and

$$\dim(R(m-r-1, m)) = \sum_{s=0}^{m-r-1}\binom{m}{s}.$$

Because

$$\sum_{s=0}^{r}\binom{m}{s} + \sum_{s=0}^{m-r-1}\binom{m}{m-s} = \sum_{s=0}^{r}\binom{m}{s} + \sum_{s=r+1}^{m}\binom{m}{s} = \sum_{s=0}^{m}\binom{m}{s} = (1+1)^m = 2^m = n,$$

that is,

$$\dim(R(r, m)) + \dim(R(m-r-1, m)) = n.$$

Let $\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}$, $\alpha_{j_1}\alpha_{j_2}\cdots\alpha_{j_t}$ be basis vectors of $R(r, m)$ and $R(m-r-1, m)$, respectively, and let

$$\alpha = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}, \quad \beta = \alpha_{j_1}\alpha_{j_2}\cdots\alpha_{j_t}.$$

By Lemma 2.21, $\alpha\beta$ is a product of at most $s + t \le m - 1$ distinct rows $\alpha_i$, so its weight $2^{m-(s+t)}$ is even; so

$$\langle\alpha, \beta\rangle = 0.$$

That is, the dual code of $R(r, m)$ is $R(m-r-1, m)$. The Lemma holds.
Theorem 2.9 The Reed–Muller code $R(r, m)$ of order $r$ has minimal distance $d = 2^{m-r}$; specially, when $r = m-2$, $R(m-2, m)$ is a linear code $[n, n-m-1]$.

Proof From Lemma 2.21, the basis vector $\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_r}$ has weight $2^{m-r}$, so the minimum distance of $R(r, m)$ satisfies $d \le 2^{m-r}$. On the other hand, let $I_1(r)$ be the set of all values $l$ corresponding to products $\{i_1, i_2, \ldots, i_s\}$ under the condition $s \le r$; every $l \in I_1(r)$ has at least $m - r$ nonzero binary digits. A nonzero codeword with coefficients $c_l$ corresponds to

$$f(x) = \sum_{l\in I_1(r)}c_l(1+x)^l = \sum_{j=0}^{n-1}\left(\sum_{l\in I_1(r)}c_lb_{lj}\right)x^{n-1-j}.$$

Therefore, the weight function of the linear combination satisfies

$$w\left(\sum_{l\in I_1(r)}c_l\,\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}\right) = w(f(x)).$$

Define $i_0$ as

$$i_0 = \min\{l \mid l \in I_1(r)\}.$$

Obviously,

$$i_0 = 1 + 2 + \cdots + 2^{m-r-1} = 2^{m-r} - 1,$$

so all binary digits of $i_0$ equal 1 in the positions $0, 1, \ldots, m-r-1$. By Lemma 2.19, every $0 \le k \le i_0$ (whose binary digits satisfy $k_i \le 1$) gives

$$\binom{i_0}{k} \equiv \prod_i\binom{1}{k_i} \equiv 1 \pmod 2,$$

so $w((1+x)^{i_0}) = i_0 + 1 = 2^{m-r}$. The same digit count shows $w((1+x)^l) \ge 2^{m-r}$ for every $l \in I_1(r)$, and by Lemma 2.20 (with $c = 1$), $w(f(x)) \ge w((1+x)^{i_0'}) \ge 2^{m-r}$, where $i_0'$ is the smallest $l$ with $c_l \neq 0$. In the end, we have $d = 2^{m-r}$. If we let $r = m-2$, then the minimum distance is 4. The dimension of $R(m-2, m)$ is

$$t = \sum_{s=0}^{m-2}\binom{m}{s} = \sum_{s=0}^{m}\binom{m}{s} - \binom{m}{m-1} - \binom{m}{m} = 2^m - m - 1 = n - m - 1.$$
Because $R(m-2, m)$ has the form of a linear code $[n, n-m-1]$ and its minimum distance is 4, we consider $R(m-2, m)$ as a class of extended Hamming codes. Although $R(m-2, m)$ itself is not perfect, the underlying Hamming codes are perfect linear codes.
where $\rho_1$ is the disjoint radius of the code $C$. Therefore, the error probability $p(x)$ of a codeword $x$ is related to the code $C$. The error probability of the code $C$ is

$$p(C) = \frac{1}{|C|}\sum_{x\in C}p(x).$$
$$\lim_{n\to\infty}p(A_n) = 0.$$

If $k > \frac{1}{2}n$, then there are $k > \frac{1}{2}n$ characters 0 in the received codeword $y$ after the codeword 0 is transmitted, so $d(0, y) \le n - k < \frac{1}{2}n$. Because the disjoint radius of the repeat code is $\frac{1}{2}n$, according to the decoding principle, we can always decode $y \to 0$ correctly; therefore, an error for the codeword $0 = (0, 0, \ldots, 0) \in F_2^n$ occurs if and only if $k \le \frac{1}{2}n$, and the error probability is

$$p(0) = \sum_{0\le k\le\frac{n}{2}}\binom{n}{k}q^kp^{n-k}.$$

Similarly,

$$p(1) = \sum_{0\le k\le\frac{n}{2}}\binom{n}{k}q^kp^{n-k},$$

so

$$p(A_n) = \sum_{0\le k\le\frac{n}{2}}\binom{n}{k}q^kp^{n-k}.$$

Since

$$\sum_{0\le k\le\frac{n}{2}}\binom{n}{k} < \sum_{0\le k\le n}\binom{n}{k} = 2^n$$

and, for $k \le \frac{n}{2}$ (using $p < \frac{1}{2} < q$),

$$k\log\frac{q}{p} \le \frac{n}{2}\log\frac{q}{p}, \quad \text{i.e.} \quad q^kp^{n-k} \le (qp)^{\frac{n}{2}},$$

thus

$$p(A_n) \le 2^n(qp)^{\frac{n}{2}} = (4qp)^{\frac{n}{2}}.$$

Since $p < \frac{1}{2}$ gives $4pq < 1$, it follows that $p(A_n) \to 0$.
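For concreteness, the exact error probability of the repeat code and the bound $(4qp)^{n/2}$ can be compared numerically; the following sketch (ours, not from the book) assumes $p < 1/2$ and odd $n$.

```python
from math import comb

def repeat_code_error(n, p):
    """Exact error probability of the length-n repetition code over the BSC(p):
    an error occurs iff at most n/2 of the n characters arrive correctly."""
    q = 1 - p
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(n // 2 + 1))

p = 0.1
for n in (5, 15, 45):
    exact = repeat_code_error(n, p)
    bound = (4 * p * (1 - p)) ** (n / 2)     # the estimate (4qp)^(n/2) above
    assert exact <= bound
    print(f"n={n:2d}  exact={exact:.3e}  bound={bound:.3e}")
```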
$$\lim_{n\to\infty}P^*(n, M_n, p) = 0.$$
In order to understand the meaning of Shannon’s theorem and prove it, we need some
auxiliary conclusions.
Lemma 2.27 Let $0 < \lambda < 1 + p\log p + q\log q$ be a given real number and $C \subset F_2^n$ be any binary code with $|C| = 2^{[\lambda n]}$; then the code rate $R_C$ of $C$ satisfies

$$\lambda - \frac{1}{n} < R_C \le \lambda.$$

Specially, when $n \to \infty$, the rate of $C$ approaches $\lambda$.

Proof

$$|C| = 2^{[\lambda n]} \Rightarrow \log_2|C| = [\lambda n] \le \lambda n.$$

Therefore,

$$R_C = \frac{1}{n}\log_2|C| \le \lambda.$$

From the properties of the square bracket (integer part) function,

$$\lambda n < [\lambda n] + 1,$$

so

$$\lambda n - 1 < [\lambda n] = \log_2|C|.$$

There is

$$\lambda - \frac{1}{n} < \frac{1}{n}\log_2|C| = R_C.$$
The Lemma 2.27 holds.
Combining Lemma 2.27, the significance of Shannon's theorem is that as the code length $n$ increases to infinity, the code rate tends to the capacity $1 - H(p)$ of the channel, and there exists a code $C$ whose error probability is arbitrarily small; following Shannon, this kind of code is called a "good code". Shannon first proved the existence of "good codes" under more general conditions by the probability method; Theorem 2.10 is only a special case of Shannon's channel coding theorem. To prove the Shannon theorem, we must accurately estimate the error probability of a given number of codewords under the decoding principle.
Lemma 2.28 In the memoryless binary channel, let the probability of each transmission error of the characters 0 and 1 be $p$, $q = 1 - p$, and suppose a codeword $x = x_1x_2\ldots x_n \in F_2^n$ has exactly $\omega$ character errors during transmission. Then for any $\varepsilon > 0$, letting $b = \sqrt{npq/\varepsilon}$, we have

$$P\{\omega > np + b\} \le \varepsilon.$$

Proof The number of errors $\omega$ is binomially distributed with mean $np$ and variance $npq$. Take $k = \frac{1}{\sqrt{\varepsilon}}$; then by Chebyshev's inequality,

$$P\{|\omega - np| > k\sqrt{npq}\} \le \frac{1}{k^2} = \varepsilon.$$

That is,

$$P\{\omega > np + b\} \le \varepsilon.$$
Lemma 2.29 With $\rho = np + b$ as above, we have

$$\frac{\rho}{n}\log\frac{\rho}{n} = p\log p + O\left(\frac{1}{\sqrt{n}}\right), \quad \left(1 - \frac{\rho}{n}\right)\log\left(1 - \frac{\rho}{n}\right) = q\log q + O\left(\frac{1}{\sqrt{n}}\right).$$

Proof When $\varepsilon > 0$ is given, $b = O(\sqrt{n})$, so $\rho$ can be rewritten as

$$\rho = np + O(\sqrt{n}), \quad \frac{\rho}{n} = p + O\left(\frac{1}{\sqrt{n}}\right).$$
Thus

$$\frac{\rho}{n}\log\frac{\rho}{n} = \left(p + O\left(\frac{1}{\sqrt{n}}\right)\right)\log\left(p + O\left(\frac{1}{\sqrt{n}}\right)\right) = \left(p + O\left(\frac{1}{\sqrt{n}}\right)\right)\left(\log p + \log\left(1 + O\left(\frac{1}{\sqrt{n}}\right)\right)\right).$$

For a real number $x$ with $|x| < 1$, we have the Taylor expansion

$$\log(1+x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \cdots,$$

so when $|x| < 1$ we have $\log(1+x) = O(|x|)$; thus

$$\log\left(1 + O\left(\frac{1}{\sqrt{n}}\right)\right) = O\left(\frac{1}{\sqrt{n}}\right),$$

and we have

$$\frac{\rho}{n}\log\frac{\rho}{n} = \left(p + O\left(\frac{1}{\sqrt{n}}\right)\right)\left(\log p + O\left(\frac{1}{\sqrt{n}}\right)\right) = p\log p + O\left(\frac{1}{\sqrt{n}}\right).$$

The same computation gives

$$\left(1 - \frac{\rho}{n}\right)\log\left(1 - \frac{\rho}{n}\right) = q\log q + O\left(\frac{1}{\sqrt{n}}\right).$$
To prove the Shannon theorem, we define the following auxiliary function: for any two codewords $x, y \in F_2^n$ and $\rho \ge 0$, define

$$f_\rho(x, y) = \begin{cases}0, & \text{if } d(x, y) > \rho;\\ 1, & \text{if } d(x, y) \le \rho.\end{cases}$$

Let

$$M = M_n = 2^{[\lambda n]}, \quad C = \{x_1, x_2, \ldots, x_M\} \subset F_2^n, \quad |C| = M.$$

The error probability $P_i = p(x_i)$ of the codeword $x_i$ under decoding inside the ball $B_\rho(x_i)$ satisfies

$$P_i \le \sum_{y\in F_2^n}p(y|x_i)(1 - f_\rho(y, x_i)) + \sum_{y\in F_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M}p(y|x_i)f_\rho(y, x_j). \tag{2.27}$$
According to the definition of $f_\rho(y, x_i)$, the first term of the above formula is the probability that the received codeword $y$ sent from $x_i$ is not in the ball $B_\rho(x_i)$, i.e.,

$$\sum_{y\in F_2^n}p(y|x_i)(1 - f_\rho(y, x_i)) = P\{\text{received codeword } y \mid y \notin B_\rho(x_i)\} \le \varepsilon,$$

by Lemma 2.28 with $\rho = np + b$. Therefore

$$P_i = p(x_i) \le \varepsilon + \sum_{y\in F_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M}p(y|x_i)f_\rho(y, x_j), \tag{2.28}$$

and, averaging over the code,

$$p(C) = \frac{1}{M}\sum_{i=1}^{M}p(x_i) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y\in F_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M}p(y|x_i)f_\rho(y, x_j).$$

Taking the expectation over a randomly chosen code,

$$P^*(n, M_n, p) \le E(p(C)) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y\in F_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M}E(p(y|x_i)\cdot f_\rho(y, x_j)).$$

When $i$ is given, the random variables $p(y|x_i)$ and $f_\rho(y, x_j)$ ($j \neq i$) are statistically independent, so there is

$$P^*(n, M_n, p) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y\in F_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M}E(p(y|x_i))E(f_\rho(y, x_j)). \tag{2.29}$$
Let us calculate the expected value of $f_\rho(y, x_j)$: because $y$ is selected in $F_2^n$ with equal probability,

$$E(f_\rho(y, x_j)) = \sum_{y\in F_2^n}p(y)f_\rho(y, x_j) = \frac{1}{2^n}|B_\rho(x_j)| = \frac{1}{2^n}|B_\rho(0)|.$$

So there is

$$P^*(n, M_n, p) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y\in F_2^n}E(p(y|x_i))\cdot\frac{(M-1)|B_\rho(0)|}{2^n}. \tag{2.30}$$

Now let us calculate the expected value of $p(y|x_i)$ ($y$ fixed, $x_i$ randomly selected in $C$):

$$E(p(y|x_i)) = \sum_{i=1}^{M}p(x_i)p(y|x_i) = p(y),$$

thus

$$\sum_{i=1}^{M}\sum_{y\in F_2^n}E(p(y|x_i)) = \sum_{i=1}^{M}\sum_{y\in F_2^n}p(y) = M.$$

From (2.30),

$$P^*(n, M_n, p) \le \varepsilon + \frac{M-1}{2^n}|B_\rho(0)|,$$

that is,

$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le \frac{1}{n}\log_2 M + \frac{1}{n}\log_2|B_\rho(0)| - 1.$$
From Lemma 1.11 of Chap. 1,

$$\frac{1}{n}\log_2|B_\rho(0)| = \frac{1}{n}\log_2\sum_{i=0}^{\rho}\binom{n}{i} \le H\left(\frac{\rho}{n}\right),$$

where $H(x) = -x\log x - (1-x)\log(1-x)$ ($0 < x < \frac{1}{2}$) is the binary entropy function, so there is

$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le \frac{1}{n}\log_2 M + H\left(\frac{\rho}{n}\right) - 1.$$

By hypothesis $M = 2^{[\lambda n]}$, $\rho = [pn + b]$, $b = O(\sqrt{n})$, we have

$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le \frac{[\lambda n]}{n} + H\left(\frac{\rho}{n}\right) - 1 = \lambda + H\left(\frac{\rho}{n}\right) - 1 + O\left(\frac{1}{n}\right).$$

By Lemma 2.29,

$$H\left(\frac{\rho}{n}\right) = -\left(\frac{\rho}{n}\log\frac{\rho}{n} + \left(1 - \frac{\rho}{n}\right)\log\left(1 - \frac{\rho}{n}\right)\right) = -\left(p\log p + q\log q + O\left(\frac{1}{\sqrt{n}}\right)\right).$$

So

$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le \lambda - (1 + p\log p + q\log q) + O\left(\frac{1}{\sqrt{n}}\right).$$

Since $\lambda < 1 + p\log p + q\log q$, for sufficiently large $n$ there is a constant $\beta > 0$ with

$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le -\beta,$$

so $P^*(n, M_n, p) \le \varepsilon + 2^{-\beta n}$; since $\varepsilon$ is arbitrary, $\lim_{n\to\infty}P^*(n, M_n, p) = 0$.
A code with arbitrarily small error probability is called a "good code"; we further analyze the construction of this kind of "good code". (Shannon only proved the existence of "good codes" by the probability method.)
Theorem 2.11 For given $\lambda$, $0 < \lambda < 1 + p\log p + q\log q$ ($p < \frac{1}{2}$), and $M_n = 2^{[\lambda n]}$, if there is a perfect code $C_n$ with $|C_n| = M_n$, then we have

$$\lim_{n\to\infty}p(C_n) = 0.$$
By Lemma 2.27,

$$\lambda - \frac{1}{n} \le R_{C_n} \le \lambda.$$

Therefore, the code rate of $C_n$ can be arbitrarily close to $\lambda$, and the error probability of $C_n$ can be arbitrarily small, so $C_n$ is a "good code" in the mathematical sense. To prove Theorem 2.11, note that because $C_n$ is a perfect code, its minimum distance $d_n$ can be written as

$$d_n = 2e_n + 1, \quad e_n < \frac{n}{2}.$$

Because $\lim_{n\to\infty}R_{C_n} = \lambda$, by Theorem 2.2 we have

$$\lim_{n\to\infty}H\left(\frac{e_n}{n}\right) = 1 - \lambda > H(p).$$

Because the binary entropy function $H(x)$ is monotone, continuous and increasing on $0 < x < \frac{1}{2}$, the limit $\lim_{n\to\infty}\frac{e_n}{n}$ exists and

$$\lim_{n\to\infty}\frac{e_n}{n} > p, \quad \text{that is,} \quad \frac{e_n}{n} > p \text{ when } n \text{ is sufficiently large}.$$

Hence, for large $n$, $e_n > np + b$, and by Lemma 2.28 the decoding error probability $p(C_n) = P\{\omega > e_n\} \le P\{\omega > np + b\} \le \varepsilon$; since $\varepsilon$ is arbitrary, the Theorem holds.
From the proofs of Theorems 2.10 and 2.11, it can be seen that Shannon randomly selects a code and randomly selects a codeword, which essentially regards the input information as a random event in a given probability space; the transmission of information is essentially a random process. The fundamental difference between Shannon and other mathematicians of his time is that he regarded information, or a code, as a random variable: the mathematical model of information transmission is a dynamic probability model rather than a static algebraic model, and the natural method to study a code is probability and statistics rather than the algebraic and combinatorial methods of traditional mathematics. From the perspective of probability theory, Theorems 2.10 and 2.11 regard a code as a random variable, but one of a very particular kind: its probability distribution is the Bernoulli binomial distribution, and statistical characteristics such as the code rate are not clearly expressed. It is the core content of Shannon's information theory to study the relationship between random variables with general probability distributions and codes. One of the most basic concepts is information entropy, or code entropy; using the concept of code entropy, the statistical characteristics of a code are clearly displayed. Therefore, we see a basic framework and prototype of modern information theory. In the next chapter, we explain and prove these basic ideas and results of Shannon information theory in detail. One of the most important results is the Shannon channel coding theorem (see Theorem 3.12 in Chap. 3). Shannon used the probability method to prove that a so-called good code, with code rate up to the transmission capacity and arbitrarily small error probability, exists for the general memoryless channel (whether symmetric or not); conversely, the code rate of a code with arbitrarily small error probability cannot exceed the capacity of the channel. This channel capacity is called the Shannon limit, which has long been pursued in the field of electronic communication engineering: people want to find a channel coding scheme with error probability in a controllable range (e.g., less than $\varepsilon$) and transmission efficiency (i.e., code rate) reaching the Shannon limit. In today's 5G era, this engineering problem seems to have been overcome. Returning to Theorem 2.10, we see that the upper limit $1 - H(p)$ of the code rate is the channel capacity of the memoryless symmetric binary channel (see Example 2 in Sect. 8 of Chap. 3). From this example, we can get a glimpse of Shannon's channel coding theory.
Exercise 2
1. Design a code of length 7 containing 8 codewords such that the Hamming distance of any two codewords is $\ge 4$. The code is transmitted through a symmetric binary channel; assuming the error probability of the characters 0 and 1 is $p$, calculate the success probability of codeword transmission.
2. Let $C$ be a binary code of length 16 satisfying:
(i) each codeword has weight 6;
(ii) any two codewords have Hamming distance 8.
4. Let $C$ be a binary perfect code of length $n$ with minimum distance 7. Prove: $n = 7$ or $n = 23$.
5. Let $C \subset F_q^n$ be a linear code, $C = [n, k]$, such that any $k$ coordinates form an information set. Prove: the minimum distance of $C$ is $d = n - k + 1$.
6. Suppose $C = [2k+1, k] \subset F_2^{2k+1}$ and $C \subset C^\perp$. Determine the difference set $C^\perp\setminus C$.
7. Let $x = x_1x_2\ldots x_6 \in F_2^6$. Determine the size of the Hamming ball $|B_1(x)|$. Can we find a code $C \subset F_2^6$ with $|C| = 9$ such that the Hamming distance of any two different codewords in $C$ is $\ge 3$?
8. Let $C = [n, k] \subset F_q^n$ be a linear code with generator matrix $G$. If no column of $G$ is all zero, prove

$$\sum_{x\in C}w(x) = n(q-1)q^{k-1}.$$

17. Prove that the ternary Golay code has 132 codewords of weight 5. Let $x$ be a codeword of weight 5 and consider all pairs $(x, 2x)$ with $w(x) = 5$; take the set of coordinates whose components are not zero as a subset. Prove that there are 66 such subsets and that they form a 4-(11, 5, 1) design.
18. If the minimum distance $d$ of a binary code $C = (n, M, d)$ is even, prove that there exists a binary code with the same parameters all of whose codewords have even weights.
19. Let $H$ be a Hadamard matrix $H_{12}$, and define $G$ in terms of $H$. Prove that $G$ is the generator matrix of a ternary code $[24, 12]$ whose minimum distance is 9.
20. Let $C = [4, 2]$ be the ternary Hamming code, $H$ the check matrix of $C$, $I$ the unit matrix of order 4, and $J$ the square matrix of order 4 with all elements 1. Define

$$G = \begin{pmatrix} J + I & I & I\\ 0 & H & -H \end{pmatrix};$$

prove that $G$ generates a ternary code $\bar{C} = [12, 6]$ with minimum distance 6.
References
Barg, A. M., Katsman, S. L., & Tsfasman, M. A. (1987). Algebraic Geometric Codes from Curves
of Small Genus. Probl. of Information Transmission,23, 34–38.
Berlekamp, E. R. (1972). Decoding the Golay Code, JPL Technical Report 32-1256 (Vol. IX, pp.
81–85). Jet Propulsion Laboratory.
Berlekamp, E. R. (1968). Algebraic Coding Theory. NewYork: McGraw-Hill.
Best, M. R. (1980). Binary codes with a minimum distance of four. IEEE Transactions of Information
Theory, 26, 738–742.
Best, M. R. (1978). On the Existence of Perfect Codes, Report ZN 82/78. Amsterdam: Mathematical
Centre.
Bussey, W. H. (1905). Galois field tables for $p^n \le 169$. Bull Amer Math Soc, 12, 22–38.
Bussey, W. H. (1910). Tables of Galois fields of order less than 1000. Bull Amer Math Soc, 16,
188–206.
Cameron, P. J., & van Lint, J. H. (1991). Designs, graphs, codes and their links. London Math Soc
Student Texts (Vol. 22). Cambridge University Press.
Conway, J. H., & Sloane, N. J. A. (1994). Quaternary constructions for the binary single- error-
correcting codes of Julin, Best, and others. Designs, Codes and Cryptography,41, 31–42.
Curtis, C. W., & Reiner, I. (1962). Representation Theory of Finite Groups and Associative Algebras.
New York-London: Interscience.
Delsarte, P., & Goethals, J. M. (1975). Unrestricted codes with the Golay parameters are unique.
Discrete Math.,12, 211–224.
Elias, P. (1955). Coding for noisy channels. IRE Convention Record, part 4 (pp. 37–46).
Feller, W. (1950). An introduction to probability theory and its applications (Vol. I). Wiley.
Feng, G.-L., & Rao, T. R. N. (1994). A simple approach for construction of algebraic-geometric
codes from affine plane curves. IEEE Trans. Info. Theory,40, 1003–1012.
Forney, G. D. (1970). Convolutional codes I: Algebraic structure. IEEE Trans Info Theory, 16,
720–738: Ibid, 17, 360 (1971).
Gallagher, R. G. (1968). Information Theory and Reliable Communication. New York: Wiley.
Goethals, J. M. (1977). The extended Nadler code is unique. IEEE Trans Info, 23, 132–135.
Goppa, V. D. (1970). A new class of linear error- correcting codes. Problems of Info Transmission,
6, 207–212.
Goto, M. (1975). A note on perfect decimal AN codes. Info Control, 29, 385–387.
Goto, M., & Fukumara, T. (1975). Perfect nonbinary AN codes with distance three. Info Control,
27, 336–348.
Graham, R. L., & Sloane, N. J. A. (1980). Lower bounds for constant weight codes. IEEE Trans
Info Theory, 26, 37–40.
Gritsenko, V. M. (1969). Nonbinary arithmetic correcting codes. Problems of Info Transmission, 5,
15–22.
Helgert, H. J., & Stinaff, R. D. (1973). Minimum distance bounds for binary linear codes. IEEE
Trans Info Theory, 19, 344–356.
Høholdt, T., & Pellikaan, R. (1995). On the decoding of algebraic-geometric codes. IEEE Transac-
tions of Info Theory, 41, 1589–1614.
Høholdt, T., van Lint, J. H., & Pellikaan, R. (1998). Algebraic geometry codes. In V. S. Pless, W.
C. Huffman & R. A. Brualdi (Eds.),Hand-book of coding theory. Elsevier Science Publishers.
Hong, Y. (1984). On the nonexistence of unknown perfect 6- and 8-codes in Hamming schemes
H (n, q) with q arbitrary. Osaka J. Math., 21, 687–700.
Justesen, J. (1975). An algebraic construction of rate 1/v convolutional codes. IEEE Trans Info Theory,
21, 577–580.
Justesen, J., Larsen, K. J., Jensen, E. H., Havemose, A., & Høholdt, T. (1989). Construction and
decoding of a class of algebraic geometry codes. IEEE Transactions of Info Theory, 35, 811–821.
Kasami, T. (1969). An upper bound on k/n for affine invariant codes with fixed d/n. IEEE Trans
Info Theory, 15, 171–176.
Kerdock, A. M. (1972). A class of low-rate nonlinear codes. Info and Control, 20, 182–187.
Levenshtein, V. I. (1975). Minimum redundancy of binary error-correcting codes. Info Control, 28,
268–291.
MacWilliams, F. J., & Sloane, N. J. A. (1977). The Theory of Error-correcting Codes. Amsterdam-
New York-Oxford: North Holland.
Massey, J. L., & Garcia, O. N. (1972). Error-correcting codes in computer arithmetic. In J. T. Ton
(Eds.), Advances in information systems science (Vol. 4, Ch. 5). Plenum Press.
Massey, J. L., Costello, D. J., & Justesen, J. (1973). Polynomial weights and code construction.
IEEE Trans. Info. Theory,19, 101–110.
McEliece, R. J. (1977). The theory of information and coding. In Encyclopedia of mathematics and
its applications (Vol. 3). Addison-Wesley.
McEliece, R. J. (1979). The bounds of Delsarte and Lovasz and their applications to coding theory.
In G. Longo (Eds.), Algebraic coding theory and applicationsCISM Courses and Lectures (Vol.
258). Springer.
McEliece, R. J., Rodemich, E. R., Rumsey, H. C., & Welch, L. R. (1977). New upper bounds on the
rate of a code via the Delsarte- MacWilliams inequalities. IEEE Trans. Info. Theory,23, 157–166.
Peek, J. H. (1985). Communications Aspects of the Compact Disc Digital Audio System. IEEE
Communications Magazine, 23(2), 7–15.
Peterson, W. W., & Weldon, E. J. (1972). Error-correcting codes (2nd Edn). MIT Press.
Piret, P. (1977). Algebraic properties of convolutional codes with automorphisms, Ph.D. Disserta-
tion. University of Catholique de Louvain.
Piret, P. (1988). Convolutional codes, an algebraic approach. The MIT Press.
Posner, E. C. (1968). Combinatorial structures in planetary reconnaissance. In E. B. Mann (Eds.),
Error correcting codes (pp. 15–46). Wiley.
Rao, T. R. N. (1974). Error Coding for Arithmetic Processors. New York-London: Academic Press.
Roos, C. (1979). On the structure of convolutional and cyclic convolutional codes. IEEE Trans.
Info. Theory, 25, 676–683.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technology Journal,
27, 379–423, 623–656.
Sloane, N. J. A., Reddy, S. M., & Chen, C. L. (1972). New binary codes. IEEE Trans. Info. Theory,18,
503–510.
Solomon, G., & van Tilborg, H. C. A. A connection between block and convolutional codes. SIAM
Journal of Applied Mathematics, 37, 358–369.
Stichtenoth, H. (1993). Algebraic function fields and codes. Springer, Universitext.
Tietäväinen, T. (1973). On the nonexistence of perfect codes over finite fields. SIAM Journal of
Applied Mathematics, 24, 88–96.
van der Geer, G., & van Lint, J. H. (1988). Introduction to coding theory and algebraic geometry.
Birkhäuser.
van Lint, J. H. (1971). Nonexistence theorems for perfect error-correcting codes. In Computers in
Algebra and Theory (Vol. IV) (SIAM-AMS Proceedings).
van Lint, J. H. (1972). A new description of the Nadler code. IEEE Trans Info Theory, 18, 825–826.
van Lint, J. H. (1975). A survey of perfect codes. Rocky Mountain Journal of Math, 5, 199–224.
van Lint, J. H. (1990). Algebraic geometric codes. In D. Ray-Chaudhuri (Eds.), Coding theory and
design theory I, The IMA Volumes in Math and Appl 20. Springer.
van Lint, J. H. (1999). Introduction to coding theory, GTM86, Springer.
van Lint, J. H., & MacWilliams, F. J. (1978). Generalized quadratic residue codes. IEEE Trans Info
Theory, 24, 730–737.
van Lint, J. H., & Wilson, R. M. (1992). A course in combinatorics. Cambridge University Press.
van Oorschot, P. C., & Vanstone, S. A. (1989). An introduction to error correcting codes with
applications. Kluwer.
Chapter 3
Shannon Theory
is statistically independent, that is, $p(xy) = p(x)p(y)$, then the self-information amount is $I(xy) = I(x) + I(y)$. Of course, the self-information amount $I(x)$ is nonnegative, that is, $I(x) \ge 0$. Shannon proved that the self-information $I(x)$ satisfying the above three assumptions must be

$$I(x) = c\log\frac{1}{p(x)} = -c\log p(x),$$

where $c$ is a constant. This conclusion can be derived directly from the following mathematical theorem.
Lemma 3.1 If the real function $f(x)$ satisfies the following conditions in the interval $[1, +\infty)$:
(i) $f(x) \ge 0$;
(ii) $x < y \Rightarrow f(x) < f(y)$;
(iii) $f(xy) = f(x) + f(y)$;
then $f(x) = c\log x$, where $c$ is a constant.

Proof Repeatedly using condition (iii), there is

$$f(x^k) = kf(x), \quad k \ge 1,$$

for any positive integer $k$. Take $x = 1$; then the above formula holds if and only if $f(1) = 0$. It can be seen from (ii) that $f(x) > 0$ when $x > 1$. Let $x > 1$, $y > 1$ and $k \ge 1$ be given; we can always find a nonnegative integer $n$ satisfying

$$y^n \le x^k < y^{n+1},$$

so that

$$\frac{n}{k} \le \frac{\log x}{\log y} < \frac{n+1}{k},$$

and, applying the monotone $f$ together with (iii) to $y^n \le x^k < y^{n+1}$,

$$\frac{n}{k} \le \frac{f(x)}{f(y)} < \frac{n+1}{k};$$

thus

$$\left|\frac{f(x)}{f(y)} - \frac{\log x}{\log y}\right| \le \frac{1}{k}.$$

When $k \to \infty$, we have

$$\frac{f(x)}{f(y)} = \frac{\log x}{\log y}, \quad \forall x, y \in (1, +\infty).$$

Therefore,

$$\frac{f(x)}{\log x} = \frac{f(y)}{\log y} = c, \quad \forall x, y \in (1, +\infty).$$

The Lemma holds.
We call $(X, \xi)$ an information space in a given probability space; when the random variable $\xi$ is clear, we usually record the information space $(X, \xi)$ as $X$. If $\eta$ is another random variable valued on $X$, and $\xi$ and $\eta$ obey the same probability distribution, that is,

$$P\{\xi = x\} = P\{\eta = x\}, \quad \forall x \in X,$$

then $(X, \xi)$ and $(X, \eta)$ are regarded as the same information space. It should be noted that if there are two random variables $\xi$ and $\eta$ with values on $X$ whose probability distributions are not equal, then $(X, \xi)$ and $(X, \eta)$ are two different information spaces; at this point, we must distinguish the two different information spaces as $X_1 = (X, \xi)$ and $X_2 = (X, \eta)$.
Definition 3.2 $X$ and $Y$ are two source state sets, and the random variables $\xi$ and $\eta$ take values on $X$ and $Y$, respectively; if $\xi$ and $\eta$ are compatible random variables, the probability distribution of the joint event $xy$ ($x \in X$, $y \in Y$) is defined as

$$p(xy) = P\{\xi = x, \eta = y\}.$$

The set

$$XY = \{xy \mid x \in X, y \in Y\},$$

together with the corresponding random variables $\xi$ and $\eta$, is called the product space of the information spaces $(X, \xi)$ and $(Y, \eta)$, denoted as $(XY, \xi, \eta)$; when $\xi$ and $\eta$ are clear, it can be abbreviated as $XY = (XY, \xi, \eta)$. If $X = Y$ are two identical source state sets and $\xi$ and $\eta$ have the same probability distribution, then the product space $XY$ is denoted as $X^2$ and is called a power space.

Since an information space is a complete set of events, for the product information space we have the following full probability formula and probability product formula:

$$\begin{cases}\displaystyle\sum_{x\in X}p(xy) = p(y), & \forall y \in Y,\\[2mm] \displaystyle\sum_{y\in Y}p(xy) = p(x), & \forall x \in X,\end{cases} \tag{3.5}$$

and

$$p(x)p(y|x) = p(xy), \quad \forall x \in X, y \in Y.$$

More generally, for information spaces $X_1, X_2, \ldots, X_n$ we call

$$X_1X_2\cdots X_n = \{x_1x_2\cdots x_n \mid x_i \in X_i, 1 \le i \le n\}$$

their product space.
where $0 < \lambda < 1$; then $(X, \xi)$ is called a two-point information space with parameter $\lambda$, still denoted as $X$.

For $X_0 = \{0, 1\}$ and independent, identically distributed $\xi_1, \xi_2, \ldots, \xi_n$,

$$X = (X_0, \xi_1)(X_0, \xi_2)\cdots(X_0, \xi_n) = X_0^n \subset F_2^n;$$

the power space $X$ is called a Bernoulli information space, also called a memoryless binary information space. The probability function $p(x)$ in $X$ is

$$p(x) = p(x_1x_2\cdots x_n) = \prod_{i=1}^{n}p(x_i), \quad x_i = 0 \text{ or } 1. \tag{3.7}$$

The entropy of an information space $X = \{x_1, x_2, \ldots, x_n\}$ is defined as

$$H(X) = -\sum_{x\in X}p(x)\log p(x) = -\sum_{i=1}^{n}p(x_i)\log p(x_i); \tag{3.8}$$

if $p(x_i) = 0$ in the above formula, we agree that $p(x_i)\log p(x_i) = 0$. The base of the logarithm can be selected arbitrarily; if the base of the logarithm is $D$ ($D \ge 2$), then $H(X)$ is called the D-ary entropy, sometimes denoted as $H_D(X)$.
Theorem 3.1 For any information space $X$, we always have

$$0 \le H(X) \le \log|X|. \tag{3.9}$$

Proof Let $X = \{x_1, x_2, \ldots, x_m\}$. By the Jensen inequality,

$$H(X) = \sum_{i=1}^{m}p(x_i)\log\frac{1}{p(x_i)} \le \log\sum_{i=1}^{m}p(x_i)\cdot\frac{1}{p(x_i)} = \log m.$$

The above equal sign holds if and only if $p(x_1) = p(x_2) = \cdots = p(x_m) = \frac{1}{m}$, that is, $X$ is the equal probability information space. If $X = \{x\}$ is a degenerate information space, because $p(x) = 1$, we have $H(X) = 0$. Conversely, if $H(X) = 0$, let $X = \{x_1, x_2, \ldots, x_m\}$ and suppose $\exists x_i \in X$ such that $0 < p(x_i) < 1$; then

$$0 < p(x_i)\log\frac{1}{p(x_i)} \le H(X),$$

a contradiction. So some $p(x_i) = 1$ and $X$ is degenerate.
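The bounds of Theorem 3.1 are easy to verify numerically. A minimal sketch (ours, with example distributions of our own choosing):

```python
from math import log2

def entropy(dist):
    """H(X) = -sum p log2 p, with the convention 0 log 0 = 0."""
    return -sum(p * log2(p) for p in dist if p > 0)

uniform = [1 / 8] * 8
skewed = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
degenerate = [1.0]

assert abs(entropy(uniform) - log2(8)) < 1e-12   # upper bound attained: log|X|
assert 0 < entropy(skewed) < log2(len(skewed))   # strict inequality otherwise
assert entropy(degenerate) == 0                  # degenerate space: H(X) = 0
print(f"H(uniform) = {entropy(uniform)}, H(skewed) = {entropy(skewed):.4f}")
```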
If the probability distribution of joint events satisfies

$$p(xy) = p(x)p(y), \quad \forall x \in X, y \in Y,$$

then $X$ and $Y$ are called independent information spaces.

Lemma 3.2 (Addition formula of entropy) For any two information spaces $X$ and $Y$, we have

$$H(XY) = H(X) + H(Y|X) = H(Y) + H(X|Y),$$

and more generally,

$$H(X_1X_2\cdots X_n) = \sum_{i=1}^{n}H(X_i|X_{i-1}X_{i-2}\cdots X_1). \tag{3.12}$$

Proof The addition formula $H(XY) = H(Y) + H(X|Y)$ follows from the probability product formula; by symmetry, $H(X_1X_2) = H(X_1) + H(X_2|X_1)$, and by induction,

$$H(X_1X_2\cdots X_n) = H(X_1X_2\cdots X_{n-1}) + H(X_n|X_1X_2\cdots X_{n-1}) = \sum_{i=1}^{n-1}H(X_i|X_{i-1}X_{i-2}\cdots X_1) + H(X_n|X_1X_2\cdots X_{n-1}) = \sum_{i=1}^{n}H(X_i|X_{i-1}X_{i-2}\cdots X_1).$$
Theorem 3.2 If $X$ and $Y$ are independent information spaces, then

$$H(XY) = H(X) + H(Y). \tag{3.14}$$

Generally, we have

$$H(X_1X_2\cdots X_n) \le H(X_1) + H(X_2) + \cdots + H(X_n), \tag{3.15}$$

with equality

$$H(X_1X_2\cdots X_n) = H(X_1) + H(X_2) + \cdots + H(X_n) \tag{3.16}$$

if and only if $X_1, X_2, \ldots, X_n$ are mutually independent.

Proof By the Jensen inequality,

$$H(XY) - H(X) - H(Y) = \sum_{x\in X}\sum_{y\in Y}p(xy)\log\frac{p(x)p(y)}{p(xy)} \le \log\sum_{x\in X}\sum_{y\in Y}p(x)p(y) = 0.$$

The above equal sign holds if and only if for all $x \in X$, $y \in Y$, $\frac{p(x)p(y)}{p(xy)} = c$ (where $c$ is a constant), thus $p(x)p(y) = c\,p(xy)$. Summing both sides at the same time, we have

$$1 = \sum_{x\in X}\sum_{y\in Y}p(x)p(y) = c\sum_{x\in X}\sum_{y\in Y}p(xy) = c,$$

thus $c = 1$ and $p(xy) = p(x)p(y)$. So (3.14) holds with equality if and only if $X$ and $Y$ are independent information spaces. By induction, we have (3.15) and (3.16). Theorem 3.2 holds.
By (3.15), we have the following direct corollary: for any information space $X$ and $n \ge 1$,

$$H(X^n) \le nH(X). \tag{3.17}$$
Definition 3.7 Let $X$ and $Y$ be two information spaces; we say that $X$ is completely determined by $Y$ if for any given $x \in X$ there is a subset $N_x \subset Y$ satisfying

$$\begin{cases}p(x|y) = 1, & \text{if } y \in N_x;\\ p(x|y) = 0, & \text{if } y \notin N_x.\end{cases} \tag{3.18}$$

If $X$ is completely determined by $Y$, then

$$H(X|Y) = 0; \tag{3.19}$$

if $X$ and $Y$ are independent, then

$$H(X|Y) = H(X). \tag{3.20}$$
Proof (i) is trivial. Let us prove (3.19) first. By Definition 3.7 and (3.18), for given $x \in X$ we have

$$p(xy) = p(y)p(x|y) = 0, \quad y \notin N_x.$$

Thus

$$H(X|Y) = -\sum_{x\in X}\sum_{y\in Y}p(xy)\log p(x|y) = -\sum_{x\in X}\sum_{y\in N_x}p(xy)\log p(x|y) = 0.$$

The proof of formula (3.20) is obvious: because $X$ and $Y$ are independent, the conditional probability satisfies

$$p(x|y) = p(x), \quad \forall x \in X, y \in Y.$$

Thus

$$H(X|Y) = -\sum_{x\in X}\sum_{y\in Y}p(x)p(y)\log p(x) = -\sum_{x\in X}p(x)\log p(x) = H(X).$$
We have

$$\frac{p(x|y)}{p(x)} = \frac{p(y|x)}{p(y)}.$$

Lemma 3.4

$$I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X).$$

Proof By definition,

$$I(X, Y) = \sum_{x\in X}\sum_{y\in Y}p(xy)\log\frac{p(x|y)}{p(x)} = \sum_{x,y}p(xy)\log p(x|y) - \sum_{x,y}p(xy)\log p(x) = -H(X|Y) - \sum_{x\in X}p(x)\log p(x) = H(X) - H(X|Y).$$

By symmetry,

$$I(X, Y) = H(Y) - H(Y|X).$$
Lemma 3.5 Assuming that $X$ and $Y$ are two information spaces and $I(X, Y)$ is the amount of mutual information, then $I(X, Y) \ge 0$ and

$$H(XY) = H(X) + H(Y) - I(X, Y). \tag{3.22}$$

Proof By Lemmas 3.2 and 3.4,

$$H(XY) = H(X) + H(Y|X) = H(X) + H(Y) - (H(Y) - H(Y|X)) = H(X) + H(Y) - I(X, Y).$$

The conclusion $I(X, Y) \ge 0$ can be deduced directly from Theorem 3.2.
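Lemmas 3.4 and 3.5 can be checked on a toy joint distribution. The following sketch (ours) uses an arbitrary $2\times2$ joint distribution of our own choosing:

```python
from math import log2

# joint probabilities p(x, y) on {0,1} x {0,1}, chosen arbitrarily
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {x: sum(v for (a, _), v in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (_, b), v in pxy.items() if b == y) for y in (0, 1)}

H = lambda d: -sum(p * log2(p) for p in d.values() if p > 0)
# H(X|Y) = -sum p(x,y) log p(x|y)
HxGy = -sum(v * log2(v / py[y]) for (x, y), v in pxy.items())
# I(X,Y) = sum p(x,y) log( p(x,y) / (p(x) p(y)) )
I = sum(v * log2(v / (px[x] * py[y])) for (x, y), v in pxy.items())

assert abs(I - (H(px) - HxGy)) < 1e-12            # Lemma 3.4
assert abs(H(pxy) - (H(px) + H(py) - I)) < 1e-12  # Lemma 3.5, formula (3.22)
print(f"H(XY) = {H(pxy):.4f}, I(X, Y) = {I:.4f}")
```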
Let us prove an equation about entropy commonly used in the statistical analysis
of cryptography.
For any three information spaces $X$, $Y$, $Z$,

$$H(XY|Z) = H(X|Z) + H(Y|XZ) = H(Y|Z) + H(X|YZ). \tag{3.23}$$

Proof By the product formula,

$$p(xy|z) = \frac{p(xz)p(y|xz)}{p(z)} = p(x|z)p(y|xz).$$

So we have

$$H(XY|Z) = -\sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z}p(xyz)\log p(x|z)p(y|xz) = -\sum_{x,y,z}p(xyz)(\log p(x|z) + \log p(y|xz)) = -\sum_{x,z}p(xz)\log p(x|z) - \sum_{x,y,z}p(xyz)\log p(y|xz) = H(X|Z) + H(Y|XZ).$$
$$H(X_1X_2\cdots X_n|Y) \le H(X_1|Y) + \cdots + H(X_n|Y). \tag{3.24}$$

Specially, when $X_1 = X_2 = \cdots = X_n = X$,

$$H(X^n|Y) \le nH(X|Y). \tag{3.25}$$

Proof We argue by induction on $n$; suppose

$$H(X_1X_2\cdots X_n|Y) \le H(X_1|Y) + \cdots + H(X_n|Y)$$

holds, and write $X = X_1X_2\cdots X_n$; then

$$H(X_1X_2\cdots X_{n+1}|Y) = H(XX_{n+1}|Y) = -\sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y}p(xzy)\log p(xz|y).$$

So by the Jensen inequality,

$$H(XX_{n+1}|Y) - H(X|Y) - H(X_{n+1}|Y) = \sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y}p(xzy)\log\frac{p(x|y)p(z|y)}{p(xz|y)} \le \log\sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y}p(y)p(x|y)p(z|y).$$

By the product formula,

$$\sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y}p(y)p(x|y)p(z|y) = \sum_{x\in X}\sum_{y\in Y}p(x|y)p(y) = \sum_{x\in X}p(x) = 1,$$

so

$$H(XX_{n+1}|Y) \le H(X_{n+1}|Y) + H(X|Y) \le H(X_1|Y) + H(X_2|Y) + \cdots + H(X_{n+1}|Y).$$
3.3 Redundancy
Lemma 3.7 For any information spaces $X$, $Y$, $Z$,

$$H(X|YZ) \le H(X|Z). \tag{3.26}$$

Proof By the Jensen inequality,

$$H(X|YZ) - H(X|Z) = \sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z}p(xyz)\log\frac{p(x|z)}{p(x|zy)} \le \log\sum_{x,y,z}p(xyz)\frac{p(x|z)}{p(x|zy)} = \log\sum_{x,y,z}p(yz)p(x|z) = \log\sum_{x,z}p(z)p(x|z) = \log 1 = 0.$$
Let $X$ be a source state set; randomly selecting codewords to enter the channel of information transmission is a discrete random process. This mathematical model can be constructed and studied by taking the values of a group of random variables $\{\xi_i\}_{i\ge1}$ on $X$. Firstly, we assume that $\{\xi_i\}_{i\ge1}$ obey the same probability distribution when taking values on $X$, and we get a set of information spaces $\{X_i\}_{i\ge1}$. Let $H_0 = \log|X|$ be the entropy of $X$ as the equal probability information space, and for $n \ge 1$ let

$$H_n = H(X|X^{n-1}), \quad H_1 = H(X).$$

By Lemma 3.7, $\{H_n\}$ constitutes a number sequence with monotonic descent and lower bound 0, so its limit exists, that is,

$$H_\infty(X) = \lim_{n\to\infty}H_n.$$
We will extend the above observation to the general case. Let $\{\xi_i\}_{i\ge1}$ be any set of random variables valued on $X$; for any $n \ge 1$, we let

$$X_n = (X, \xi_n), \quad n \ge 1.$$

Definition 3.9 A source state set $X$ together with a set of random variables $\{\xi_i\}_{i\ge1}$ valued on $X$ is called a source.
(i) If $\{\xi_i\}_{i\ge1}$ is a group of independent and identically distributed random variables, $X$ is called a memoryless source.
(ii) If for any integers $k, t_1, t_2, \ldots, t_k$ and $h$, the random vectors

$$(\xi_{t_1}, \xi_{t_2}, \ldots, \xi_{t_k}) \quad \text{and} \quad (\xi_{t_1+h}, \xi_{t_2+h}, \ldots, \xi_{t_k+h})$$

obey the same joint probability distribution, then $X$ is called a stationary source.
(iii) If $\{\xi_i\}_{i\ge1}$ is a k-order Markov process, that is, for $\forall m > k \ge 1$ the conditional distribution of $\xi_m$ depends only on $\xi_{m-1}, \ldots, \xi_{m-k}$, then $X$ is called a Markov source (of order $k$).

Lemma 3.8 Let $X$ be a source state set and $\{\xi_i\}_{i\ge1}$ be a set of random variables valued on $X$; we write

$$X_i = (X, \xi_i), \quad i \ge 1. \tag{3.28}$$
(i) If $X$ is a memoryless source, then

$$p(x_1x_2\cdots x_n) = \prod_{i=1}^{n}p(x_i), \quad x_i \in X_i,\ n \ge 1. \tag{3.29}$$

(ii) If $X$ is a stationary source, then $p(x_i) = p(x_1)$ for all $x_i \in X_i$, $i \ge 1$; that is, the $X_i$ have the same probability distribution.
(iii) If $X$ is a stationary Markov source, then the conditional probability distribution on $X$ satisfies, for any $m \ge 1$ and $x_1x_2\cdots x_m \in X_1X_2\cdots X_m$, that $P\{\xi_{i+1} = x_m \mid \xi_i = x_{m-1}\} = p(x_m|x_{m-1})$ does not depend on $i$.

Proof (i) and (ii) can be derived directly from the definition. We only prove (iii). By (ii) of Definition 3.9, for $\forall i \ge 1$ we have

$$P\{\xi_i = x_{m-1}, \xi_{i+1} = x_m\} = P\{\xi_{m-1} = x_{m-1}, \xi_m = x_m\}$$

and

$$P\{\xi_i = x_{m-1}\} = P\{\xi_{m-1} = x_{m-1}\}.$$

Thus

$$P\{\xi_i = x_{m-1}\}P\{\xi_{i+1} = x_m \mid \xi_i = x_{m-1}\} = P\{\xi_{m-1} = x_{m-1}\}P\{\xi_m = x_m \mid \xi_{m-1} = x_{m-1}\},$$

and we have

$$P\{\xi_{i+1} = x_m \mid \xi_i = x_{m-1}\} = p(x_m|x_{m-1}).$$

The Lemma holds.
Lemma 3.9 Let $\{f(n)\}_{n\ge1}$ be a sequence of real numbers which satisfies the following subadditivity:

$$f(n+m) \le f(n) + f(m), \quad \forall n \ge 1, m \ge 1.$$

Then $\lim_{n\to\infty}\frac{1}{n}f(n)$ exists, and

$$\lim_{n\to\infty}\frac{1}{n}f(n) = \inf\left\{\frac{1}{n}f(n) \,\middle|\, n \ge 1\right\}. \tag{3.32}$$

Proof Let

$$\delta = \inf\left\{\frac{1}{n}f(n) \,\middle|\, n \ge 1\right\},$$

and first suppose $\delta \neq -\infty$. For any $\varepsilon > 0$, choose $m$ such that

$$\frac{1}{m}f(m) < \delta + \frac{\varepsilon}{2}.$$

Let $n = am + b$, where $a$ is an integer and $0 \le b < m$; by subadditivity, we have

$$f(n) \le af(m) + (n - am)f(1),$$

so

$$\frac{1}{n}f(n) \le \frac{a}{am+b}f(m) + \frac{b}{am+b}f(1).$$

When $n$ (hence $a$) is sufficiently large,

$$\frac{bf(1)}{am+b} < \frac{1}{2}\varepsilon.$$

So there is

$$\frac{1}{n}f(n) < \frac{1}{m}f(m) + \frac{1}{2}\varepsilon < \varepsilon + \delta. \tag{3.33}$$

Thus we have

$$\delta \le \liminf_{n\to\infty}\frac{1}{n}f(n) \le \limsup_{n\to\infty}\frac{1}{n}f(n) < \delta + \varepsilon,$$

so

$$\lim_{n\to\infty}\frac{1}{n}f(n) = \delta.$$

If $\delta = -\infty$, by (3.33),

$$\lim_{n\to\infty}\frac{1}{n}f(n) = -\infty,$$

so we still have

$$\lim_{n\to\infty}\frac{1}{n}f(n) = \delta = -\infty.$$

The Lemma holds.
Lemma 3.10 Let $\{a_n\}_{n\ge1}$ be a sequence of real numbers with limit $\lim_{n\to\infty}a_n = a$; then

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}a_i = a.$$

Proof For any $\varepsilon > 0$, choose $N$ such that $|a_i - a| < \varepsilon$ when $i > N$; then

$$\left|\frac{1}{n}\sum_{i=1}^{n}a_i - a\right| = \left|\frac{1}{n}\sum_{i=1}^{n}(a_i - a)\right| \le \frac{1}{n}\sum_{i=1}^{n}|a_i - a| = \frac{1}{n}\sum_{i=1}^{N}|a_i - a| + \frac{1}{n}\sum_{i=N+1}^{n}|a_i - a| < \frac{1}{n}\sum_{i=1}^{N}|a_i - a| + \frac{n-N}{n}\varepsilon < \frac{1}{n}\sum_{i=1}^{N}|a_i - a| + \varepsilon.$$

When $\varepsilon > 0$ is given, $N$ is also given accordingly, and the first term of the above formula tends to 0 as $n \to \infty$. So for any $\varepsilon > 0$, when $n > N_0$,

$$\left|\frac{1}{n}\sum_{i=1}^{n}a_i - a\right| < 2\varepsilon.$$

Thus there is

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}a_i = a.$$
Let $X$ be a source with random variables $\{\xi_n\}$, and write

$$X_n = (X, \xi_n), \quad n \ge 1.$$

Then when $X$ is a stationary source, the following two limits exist and are equal, that is,

$$\lim_{n\to\infty}\frac{1}{n}H(X_1X_2\ldots X_n) = \lim_{n\to\infty}H(X_n|X_1X_2\ldots X_{n-1}).$$

Proof By (3.15), the sequence $f(n) = H(X_1X_2\cdots X_n)$ is subadditive, so by Lemma 3.9,

$$\lim_{n\to\infty}\frac{1}{n}H(X_1X_2\cdots X_n) = \inf\left\{\frac{1}{n}H(X_1X_2\cdots X_n) \,\middle|\, n \ge 1\right\} \ge 0.$$

Since $X$ is stationary,

$$H(X_1X_2\cdots X_{n-1}) = H(X_2X_3\cdots X_n)$$

and

$$H(X_2X_3\cdots X_nX_{n+1}) = H(X_1X_2\cdots X_n),$$

so we have

$$H(X_{n+1}|X_2X_3\cdots X_n) = H(X_n|X_1X_2\cdots X_{n-1}). \tag{3.35}$$

By Lemma 3.7,

$$H(X_{n+1}|X_1X_2\cdots X_n) \le H(X_{n+1}|X_2X_3\cdots X_n) = H(X_n|X_1X_2\cdots X_{n-1}),$$

so $\{H(X_n|X_1\cdots X_{n-1})\}$ is monotone decreasing with lower bound 0 and has a limit $H_\infty(X)$. By the addition formula,

$$\frac{1}{n}H(X_1X_2\cdots X_n) = \frac{1}{n}\sum_{i=1}^{n}H(X_i|X_1X_2\cdots X_{i-1}),$$

and by Lemma 3.10,

$$\lim_{n\to\infty}\frac{1}{n}H(X_1X_2\cdots X_n) = \lim_{n\to\infty}H(X_n|X_1X_2\cdots X_{n-1}) = H_\infty(X).$$

For any stationary source,

$$H_\infty(X) \le H(X_1) \le \log|X|.$$

If $X$ is a memoryless source, then $H(X_1X_2\cdots X_n) = nH(X_1)$, so we have

$$H_\infty(X) = H(X_1);$$

if $X$ is a stationary Markov source, then

$$H_\infty(X) = H(X_2|X_1).$$
The redundancy of a stationary source $X$ is defined by

$$\delta = \log|X| - H_\infty(X), \quad r = 1 - \frac{H_\infty(X)}{\log|X|}, \tag{3.36}$$

where, as above,

$$H_0 = \log|X|, \quad H_n = H(X_n|X_1X_2\cdots X_{n-1}), \quad \forall n \ge 1.$$

Since $\{H_n\}$ decreases to $H_\infty(X) = (1-r)H_0$,

$$H_n \ge (1-r)H_0, \quad \forall n \ge 1. \tag{3.37}$$
If $p(xy|z) = p(x|z)p(y|z)$ for all $x \in X$, $y \in Y$, $z \in Z$, we say that $X$ and $Y$ are statistically independent under the given condition $Z$.

Definition 3.11 If the information spaces $X$ and $Y$ are statistically independent under condition $Z$, then $X, Z, Y$ is called a Markov chain, denoted as $X \to Z \to Y$.

Theorem 3.5 $X \to Z \to Y$ is a Markov chain if and only if the probability of occurrence of the joint event $xzy$ is

$$p(xzy) = p(x)p(z|x)p(y|z), \tag{3.39}$$

if and only if

$$p(xzy) = p(y)p(z|y)p(x|z). \tag{3.40}$$

Similarly,

$$p(xzy) = p(z)p(y|z)p(x|z) = p(y)p(z|y)p(x|z),$$

so we have

$$p(xy|z) = p(x|z)p(y|z).$$

Summing both sides over $y \in Y$ and noticing that $\sum_{y\in Y}p(y|z) = 1$ recovers (3.39); the equivalences follow.

If $U \to X \to Z$ is a Markov chain (by Theorem 3.5 applied to $U, X, Z$), the left side of the defining formula can be expressed as

$$p(uxz) = p(ux)p(z|ux),$$

so we have

$$p(z|ux) = p(z|x).$$
By definition, we have

$$I(X, Y|Z) = I(Y, X|Z) \tag{3.43}$$

and

$$I(X, Y|Z) = H(X|Z) - H(X|YZ) \tag{3.44}$$

$$I(X, Y|Z) = H(Y|Z) - H(Y|XZ). \tag{3.45}$$

Proof We only prove (3.44); the same is true for equation (3.45). Because

$$H(X|Z) - H(X|YZ) = \sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z}p(xyz)\log\frac{p(x|yz)}{p(x|z)} = \sum_{x,y,z}p(xyz)\log\frac{p(xy|z)}{p(x|z)p(y|z)} = I(X, Y|Z),$$

(3.44) holds.
As with Lemma 3.5,

$$I(X, Y|Z) = H(X|Z) - H(X|YZ) \ge 0,$$

with equality if and only if $X \to Z \to Y$, in which case

$$\log\frac{p(xy|z)}{p(x|z)p(y|z)} = \log 1 = 0.$$

More generally, we have the chain rule

$$I(X_1X_2\cdots X_n, Y) = \sum_{i=1}^{n}I(X_i, Y|X_{i-1}\cdots X_1). \tag{3.46}$$
Specially,

$$I(X_1X_2, Y) = I(X_1, Y) + I(X_2, Y|X_1). \tag{3.47}$$

Proof By Lemma 3.4 and the addition formulas,

$$I(X_1X_2\cdots X_n, Y) = H(X_1X_2\cdots X_n) - H(X_1X_2\cdots X_n|Y) = \sum_{i=1}^{n}H(X_i|X_{i-1}\cdots X_1) - \sum_{i=1}^{n}H(X_i|X_{i-1}\cdots X_1Y) = \sum_{i=1}^{n}I(X_i, Y|X_1X_2\cdots X_{i-1}).$$
If $X \to Z \to Y$ is a Markov chain, then

$$I(X, Y) \le I(X, Z) \tag{3.48}$$

and

$$I(X, Y) \le I(Y, Z). \tag{3.49}$$

Proof We only prove (3.48); the same is true for equation (3.49). From equation (3.47),

$$I(X, YZ) = I(X, Z) + I(X, Y|Z) \quad \text{and} \quad I(X, YZ) = I(X, Y) + I(X, Z|Y).$$

Thus we have

$$I(X, Y) = I(X, YZ) - I(X, Z|Y) \le I(X, YZ) = I(X, Z) + I(X, Y|Z) = I(X, Z).$$

In the last step, we used the Markov chain condition, which gives $I(X, Y|Z) = 0$. The Theorem holds.
Theorem 3.9 (Data processing inequality) Suppose $U \to X \to Y \to V$ is a Markov chain; then we have

$$I(U, V) \le I(X, Y).$$

Proof Since $U \to X \to Y$ and $U \to Y \to V$ are Markov chains, by (3.48) and (3.49),

$$I(U, Y) \le I(X, Y) \quad \text{and} \quad I(U, V) \le I(U, Y).$$

Thus

$$I(U, V) \le I(X, Y).$$
Information coding theory is usually divided into two parts: channel coding and source coding. The so-called channel coding ensures the success rate of decoding by increasing the length of codewords; channel codes, also known as error-correcting codes, are discussed in detail in Chap. 2. Source coding compresses data with redundant information so as to improve the success rate of decoding and recovery after the information or data is stored. Another important result of Shannon's theory is that there are so-called good codes in source coding, characterized by compression rates that approach the source entropy.
Lemma 3.11 Let $X$ be a memoryless source, and regard $p(X^n)$ and $\log p(X^n)$ as two random variables taking values on the power space $X^n$; then $-\frac{1}{n}\log p(X^n)$ converges to $H(X)$ in probability, that is,

$$-\frac{1}{n}\log p(X^n) \xrightarrow{P} H(X).$$

Proof Since $X$ is a memoryless source, $\{\xi_i\}_{i\ge1}$ is a group of independent and identically distributed random variables, $X_i = (X, \xi_i)$ ($i \ge 1$), and $X^n = X_1X_2\cdots X_n$ ($n \ge 1$) is a power space; then there is

$$p(X^n) = p(X_1)p(X_2)\cdots p(X_n), \quad \log p(X^n) = \sum_{i=1}^{n}\log p(X_i).$$

Because $\{\xi_i\}_{i\ge1}$ are independent and identically distributed, $\{p(X_i)\}$ and $\{\log p(X_i)\}$ are also groups of independent and identically distributed random variables. According to Chebyshev's law of large numbers (see Theorem 1.3 of Chap. 1),

$$-\frac{1}{n}\log p(X^n) = \frac{1}{n}\sum_{i=1}^{n}\log\frac{1}{p(X_i)} \xrightarrow{P} E\left(\log\frac{1}{p(X_1)}\right) = H(X);$$

that is, for any $\varepsilon > 0$ and sufficiently large $n$,

$$P\left\{\left|-\frac{1}{n}\log p(X^n) - H(X)\right| < \varepsilon\right\} > 1 - \varepsilon. \tag{3.53}$$

The proof is completed.
For any given $\varepsilon > 0$, $n \ge 1$, we define the typical code (or typical sequence set) $W_\varepsilon^{(n)}$ in the power space $X^n$ as

$$W_\varepsilon^{(n)} = \left\{x = x_1\cdots x_n \,\middle|\, \left|-\frac{1}{n}\log p(x) - H(X)\right| < \varepsilon\right\}. \tag{3.55}$$

By the definition, for any $\varepsilon > 0$, $n \ge 1$, we have

$$W_\varepsilon^{(n)} \subset X^n, \quad |X^n| = |X|^n, \tag{3.56}$$

and by Lemma 3.11, for sufficiently large $n$,

$$P\{W_\varepsilon^{(n)}\} = P\left\{\left|-\frac{1}{n}\log p(x) - H(X)\right| < \varepsilon\right\} > 1 - \varepsilon.$$
Each $x \in W_\varepsilon^{(n)}$ satisfies

$$H(X) - \varepsilon < -\frac{1}{n}\log p(x) < H(X) + \varepsilon,$$

equivalently, in the binary case,

$$2^{-n(H(X)+\varepsilon)} < p(x) < 2^{-n(H(X)-\varepsilon)}. \tag{3.58}$$

By (3.58),

$$|W_\varepsilon^{(n)}|\cdot 2^{-n(H(X)+\varepsilon)} \le P\{W_\varepsilon^{(n)}\} \le 1,$$

so

$$|W_\varepsilon^{(n)}| \le 2^{n(H(X)+\varepsilon)};$$

and from $P\{W_\varepsilon^{(n)}\} > 1 - \varepsilon$ together with $p(x) < 2^{-n(H(X)-\varepsilon)}$, we have

$$|W_\varepsilon^{(n)}| > (1-\varepsilon)2^{n(H(X)-\varepsilon)}.$$

Roughly speaking, every typical codeword has probability

$$p(x) \sim 2^{-nH(X)}, \quad \forall x \in W_\varepsilon^{(n)},$$

and

$$|W_\varepsilon^{(n)}| \sim 2^{nH(X)}.$$
Further analysis shows that the proportion of typical code Wε(n) in block code X n is
very small, which can be summarized as the following Lemma.
Lemma 3.13 For a sufficiently small given $\varepsilon > 0$, when $X$ is not an equal probability information space, we have

$$\lim_{n\to\infty}\frac{|W_\varepsilon^{(n)}|}{|X|^n} = 0.$$

Proof Since $X$ is not an equal probability space, $H(X) < \log|X|$; choose $\varepsilon < \log|X| - H(X)$. Then

$$\frac{|W_\varepsilon^{(n)}|}{|X|^n} \le 2^{-n(\log|X| - H(X) - \varepsilon)}.$$

Therefore, when $n$ is sufficiently large, the ratio $\frac{|W_\varepsilon^{(n)}|}{|X|^n}$ can be arbitrarily small. The Lemma 3.13 holds.
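Lemmas 3.12 and 3.13 can be made concrete for a memoryless binary source, where the typical set can be counted exactly by the number of ones. A sketch (ours, with parameters of our own choosing):

```python
from math import comb, log2

p1, eps = 0.2, 0.1                     # P(1) = 0.2, tolerance eps
H = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)

for n in (20, 60, 100):
    W = 0
    for k in range(n + 1):             # k = number of ones in the sequence
        p_seq = p1**k * (1 - p1)**(n - k)
        if abs(-log2(p_seq) / n - H) < eps:   # the condition in (3.55)
            W += comb(n, k)            # all C(n, k) such sequences are typical
    # log2|W| stays close to nH, while |W|/2^n tends to 0
    print(f"n={n:3d}  log2|W|={log2(W):7.2f}  nH={n * H:7.2f}  |W|/2^n={W / 2**n:.2e}")
```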
Combining Lemmas 3.11, 3.12 and 3.13, we can describe the statistical characteristics of the typical codes in block codes as follows.

Corollary 3.5 Assuming that $X$ is a memoryless source and the typical code $W_\varepsilon^{(n)}$ in the block code $X^n$ is defined by formula (3.55), then for any $\varepsilon > 0$, $n \ge 1$, we have:
(i) (Asymptotic equipartition) every $x \in W_\varepsilon^{(n)}$ satisfies $p(x) \sim 2^{-nH(X)}$, and $|W_\varepsilon^{(n)}| \sim 2^{nH(X)}$;
(ii) $P\{W_\varepsilon^{(n)}\} > 1 - \varepsilon$ for sufficiently large $n$;
(iii) when $X$ is not an equal probability information space, the proportion of $W_\varepsilon^{(n)}$ in the block code $X^n$ is arbitrarily small, that is,

$$\lim_{n\to\infty}\frac{|W_\varepsilon^{(n)}|}{|X|^n} = 0.$$
This gives an effective way to compress the block code information, so that the renumbered codewords are as few as possible and the error probability of decoding and recovery is as small as possible. An effective method is to divide the codewords in the block code $X^n$ into two parts: the codewords of the typical code $W_\varepsilon^{(n)}$ are uniformly numbered from 1 to $M$, that is, the codewords in $W_\varepsilon^{(n)}$ form a one-to-one correspondence with the positive integer set

$$I = \{1, 2, \ldots, M\}, \quad M = |W_\varepsilon^{(n)}|,$$

and the codewords that do not belong to $W_\varepsilon^{(n)}$ are all uniformly numbered as 1. Obviously, for each $i \neq 1$, $1 \le i \le M$, there is a unique codeword $x^{(i)} \in W_\varepsilon^{(n)}$, so we can accurately restore $i$ to $x^{(i)}$; that is, $i \to x^{(i)}$ is the correct decoding. For $i = 1$, we will not be able to decode correctly, resulting in a decoding recovery error. We denote the code rate of the typical code $W_\varepsilon^{(n)}$ as $\frac{1}{n}\log M$; by Lemma 3.12,

$$(1-\varepsilon)2^{n(H(X)-\varepsilon)} < M \le 2^{n(H(X)+\varepsilon)},$$

equivalently,

$$\frac{1}{n}\log(1-\varepsilon) + H(X) - \varepsilon \le \frac{1}{n}\log M \le H(X) + \varepsilon. \tag{3.59}$$

When $0 < \varepsilon < 1$ is given, we have

$$H(X) - \varepsilon \le \lim_{n\to\infty}\frac{1}{n}\log M \le H(X) + \varepsilon.$$

In other words, the code rate of the typical code is close to $H(X)$. Let us look at the decoding error probability $P_e$ after this numbering, where

$$P_e = P\{x \in X^n : x \notin W_\varepsilon^{(n)}\}.$$

Because

$$P_e + P\{W_\varepsilon^{(n)}\} = 1,$$

we have $P_e < \varepsilon$ for sufficiently large $n$. From this, we derive the main result of this section, the so-called source coding theorem.
Theorem 3.10 (Source coding theorem) Let $X$ be a memoryless source. (i) If the code rate $R > H(X)$, there exist codes $C \subset X^n$ of rate $R$ whose decoding error probability is arbitrarily small; (ii) if $R < H(X)$, then the decoding error probability tends to 1 as $n \to \infty$.

Proof The above analysis has given the proof of (i). In fact, if

$$R = \frac{1}{n}\log M_1 > H(X),$$

then when $\varepsilon$ is sufficiently small, by (3.59),

$$R > \frac{1}{n}\log|W_\varepsilon^{(n)}|, \quad M_1 > |W_\varepsilon^{(n)}|.$$

Therefore, we construct a code $C \subset X^n$ which satisfies

$$W_\varepsilon^{(n)} \subset C, \quad |C| = M_1.$$

Thus the code rate of $C$ is just equal to $R$, and the decoding error probability $P_e(C)$ after compression coding satisfies $P_e(C) < \varepsilon$, because the probability of occurrence of $C$ satisfies

$$P\{C\} + P_e(C) = 1, \quad P\{C\} \ge P\{W_\varepsilon^{(n)}\} > 1 - \varepsilon.$$

To prove (ii), note that each typical codeword $x \in W_\varepsilon^{(n)}$ satisfies

$$\left|-\frac{1}{n}\log p(x) - H(X)\right| < \varepsilon, \quad \text{i.e.} \quad p(x) < 2^{-n(H(X)-\varepsilon)}.$$

If

$$R = \frac{1}{n}\log M < H(X) - \delta,$$

then a code $C$ with $|C| = M$ contains at most $M$ typical codewords, so that, choosing $\varepsilon < \delta$,

$$P\{C \cap W_\varepsilon^{(n)}\} \le M\cdot 2^{-n(H(X)-\varepsilon)} < 2^{-n(\delta-\varepsilon)}, \tag{3.62}$$

and hence, for sufficiently large $n$,

$$1 - P_e = P\{C\} \le P\{C \cap W_\varepsilon^{(n)}\} + P\{x \notin W_\varepsilon^{(n)}\} < \varepsilon.$$

Thus

$$\lim_{n\to\infty}P_e = 1.$$
Definition 3.14 Let $X$ be a source state set, $\mathbb{Z}_D$ be the residue class ring mod $D$, and $n, k$ be positive integers. A mapping $f: X^n \to \mathbb{Z}_D^k$ is called an equal length coding function; $\psi: \mathbb{Z}_D^k \to X^n$ is called the corresponding decoding function. For $\forall x = x_1\cdots x_n \in X^n$, $f(x) = u = u_1\cdots u_k \in \mathbb{Z}_D^k$ is called a codeword of length $k$. Let

$$C = \{f(x) \in \mathbb{Z}_D^k \mid x \in X^n\}; \tag{3.63}$$

we call $C$ the code coded by $f$, and $R = \frac{k}{n}\log D$ the coding rate of $f$, also known as the code rate of $C$. $C$ is called an equal length code; it is sometimes called a block code with a packet length of $k$.

By Definition 3.14, the error probability of an equal length coding scheme $(f, \psi)$ is

$$P_e = P\{\psi(f(x)) \neq x, x \in X^n\}. \tag{3.64}$$

If the coding is error free, $f$ must be an injection, so $D^k \ge |X|^n = N^n$ and

$$R = \frac{k}{n}\log D \ge \log N = \log|X|. \tag{3.65}$$

Therefore, the code rate of error free compression coding $f$ is at least $\log_2|X|$ bits or $\ln|X|$ nats.
We consider asymptotically error free coding: for any given $\varepsilon > 0$, the required decoding error probability is $P_e \le \varepsilon$. By Theorem 3.10, only the code rate $R \ge H(X)$ is needed. In fact, take $X$ as an information space and encode the n-length message column $x = x_1x_2\cdots x_n \in X^n$: if $x \in W_\varepsilon^{(n)}$ is a typical sequence, $x$ corresponds to a number in $\{1, \ldots, M\}$, $M = |W_\varepsilon^{(n)}|$; if $x \notin W_\varepsilon^{(n)}$, uniformly code $x$ as 1. If the $M$ codewords in $W_\varepsilon^{(n)}$ are represented by D-ary digits, let $D^k = M$ (the insufficient part can be supplemented), and the code rate $R$ is

$$R = \frac{1}{n}\log M = \frac{k}{n}\log D.$$

For example, let $X = \{1, 2, 3, 4\}$ with probabilities $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}$; then

$$H(X) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{4}\log_2\frac{1}{4} - \frac{1}{8}\log_2\frac{1}{8} - \frac{1}{8}\log_2\frac{1}{8} = 1.75 \text{ bits}.$$
If an equal length code is used for coding, the code length is 2, and the code is

Source letter: 1, 2, 3, 4
Codeword: 00, 01, 10, 11

Then the code rate ($k = 2$, $n = 1$, $D = 2$) is $R = 2$ bits. Obviously, the efficiency of the equal length code is not high. If the above code is replaced with an unequal length code, such as

Source letter: 1, 2, 3, 4
Codeword: 0, 10, 110, 111

and we use $l(x)$ to represent the code length after the source letter $x$ is encoded, then the average code length $L$ required for encoding $X$ is

$$L = \sum_{i=1}^{4}p(x_i)l(x_i) = \frac{1}{2}\times1 + \frac{1}{4}\times2 + \frac{1}{8}\times3 + \frac{1}{8}\times3 = 1.75 \text{ bits} = H(X).$$

It can be seen that using an unequal length code to encode $X$ has higher efficiency. This example also illustrates the following compression coding principle: for characters with high probability of occurrence, a shorter codeword is prepared, and for characters with low probability of occurrence, a longer codeword is prepared, to ensure that the average coding length is as small as possible.
Next, we give the mathematical definition of variable length coding. For this purpose, let $X^*$ and $\mathbb{Z}_D^*$ be the sets of finite length sequences, respectively; that is,

$$X^* = \bigcup_{1\le k<\infty}X^k.$$

Definition 3.15 (i) $f: X^n \to \mathbb{Z}_D^*$ is called a variable length coding function if for any $x \in X^n$, $f(x) \in \mathbb{Z}_D^*$; when $x$ is different, the code length of $f(x)$ is not necessarily the same. We use $l(x)$ to denote the length of $f(x)$, which is called the coding length of $x$. $C = \{f(x) \in \mathbb{Z}_D^* \mid x \in X^n\}$ is called the variable length codeword set.
(ii) Let $f: X^* \to \mathbb{Z}_D^*$ be a mapping; call $f$ a coding mapping, and $f(X^*)$ is called a code.
(iii) $f: X^* \to \mathbb{Z}_D^*$ is called a block code mapping if there is a mapping $g: X \to \mathbb{Z}_D^*$ such that for any $x \in X^n$ ($n \ge 1$), writing $x = x_1x_2\cdots x_n$, there is $f(x) = g(x_1)g(x_2)\cdots g(x_n)$.
(iv) $f: X^* \to \mathbb{Z}_D^*$ is called a uniquely decodable mapping if $f$ is a block code mapping and $f$ is an injection.
Then

$$f(xy) = g(x_1)g(x_2)\cdots g(x_n)g(y_1)g(y_2)\cdots g(y_m) = g(y_1)g(y_2)\cdots g(y_m)g(x_1)g(x_2)\cdots g(x_n) = f(yx).$$

But $xy \neq yx$; this contradicts the fact that $f$ restricted to $X^{n+m}$ is an injection.

Lemma 3.15 A real-time code is uniquely decodable, but the converse is not true.

Proof Suppose $f: X^* \to \mathbb{Z}_D^*$ is a real-time (instantaneous) coding mapping, and for $x, y \in X^*$, $x \neq y$, there is $f(x) = a_1a_2\cdots a_n \in \mathbb{Z}_D^n$, $f(y) = b_1b_2\cdots b_m \in \mathbb{Z}_D^m$ ($m \ge n$). Because $f(x)$ is not a prefix of $f(y)$, there exists $i$ ($1 \le i \le n$) with $a_i \neq b_i$; thus $f(x) \neq f(y)$, that is, $f$ is an injection. For the converse, consider the counterexample

Source letter: 1, 2, 3, 4
Codeword: 0, 01, 011, 111

which is uniquely decodable but not real-time, since the codeword 0 is a prefix of 01.
Lemma 3.16 (Kraft inequality) If a D-ary real-time code $C$ has codeword lengths $l_1, l_2, \ldots, l_m$, then

$$\sum_{i=1}^{m}D^{-l_i} \le 1. \tag{3.66}$$

On the contrary, if the $l_i$ satisfy the above condition, there is a real-time code $C$ such that $\{l_1, l_2, \ldots, l_m\}$ is the code length set of $C$.
Proof Consider

$$\left(\sum_{i=1}^{m}D^{-l_i}\right)^n = (D^{-l_1} + D^{-l_2} + \cdots + D^{-l_m})^n;$$

each term has the form $D^{-l_{i_1}-l_{i_2}-\cdots-l_{i_n}} = D^{-k}$, where $l_{i_1} + l_{i_2} + \cdots + l_{i_n} = k$. Suppose $l = \max\{l_1, l_2, \ldots, l_m\}$; then the range of $k$ is from $n$ to $nl$. Defining $N_k$ as the number of terms equal to $D^{-k}$, then

$$\left(\sum_{i=1}^{m}D^{-l_i}\right)^n = \sum_{k=n}^{nl}N_kD^{-k}.$$

Note that $N_k$ can be regarded as the number of codeword sequences with a total length of $k$ assembled from $n$ codewords in $C$; since distinct sequences give distinct strings, $N_k \le D^k$ and

$$\left(\sum_{i=1}^{m}D^{-l_i}\right)^n \le \sum_{k=n}^{nl}D^kD^{-k} \le nl.$$

If $x = \sum_{i=1}^{m}D^{-l_i} > 1$, then $x^n > nl$ when $n$ is sufficiently large; but the above formula holds for all arbitrary $n$. That is, $\sum_{i=1}^{m}D^{-l_i} \le 1$.

On the contrary, assuming that the Kraft inequality holds, that is, the given lengths $l_i$ ($1 \le i \le m$) satisfy formula (3.66), we now construct a real-time code with these lengths ($l_i$ ($1 \le i \le m$) may not be completely different). Define $n_j$ as the number of codewords with length $j$; if $l = \max\{l_1, l_2, \ldots, l_m\}$, then

$$\sum_{j=1}^{l}n_j = m,$$

and (3.66) is equivalent to

$$\sum_{j=1}^{l}n_jD^{-j} \le 1.$$

Multiplying both sides by $D^l$ gives $\sum_{j=1}^{l}n_jD^{l-j} \le D^l$. There follow

$$n_1 \le D, \quad n_2 \le D^2 - n_1D, \quad n_3 \le D^3 - n_1D^2 - n_2D, \quad \ldots$$

So we can choose $n_1$ codewords of length 1, then $n_2$ codewords of length 2 avoiding the prefixes already used, and so on; this yields a real-time code with the prescribed lengths.
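Both directions of Lemma 3.16 can be illustrated for $D = 2$: check the Kraft sum, then realize prescribed lengths by assigning codewords greedily in order of increasing length (a standard construction; the sketch and its function names are ours).

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality (3.66)."""
    return sum(D ** (-l) for l in lengths)

def binary_prefix_code(lengths):
    """Assign binary codewords with the given lengths (possible iff Kraft holds)."""
    assert kraft_sum(lengths, 2) <= 1
    code, node, depth = [], 0, 0
    for l in sorted(lengths):
        node <<= (l - depth)        # descend in the binary tree to depth l
        depth = l
        code.append(format(node, f"0{l}b"))
        node += 1                   # block the whole subtree below this codeword
    return code

lengths = [1, 2, 3, 3]
print(kraft_sum(lengths))           # 1.0
print(binary_prefix_code(lengths))  # ['0', '10', '110', '111'] -- a real-time code
```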
Because the encoder emits and the decoder receives a continuous stream of codeword symbols, if the string received by the decoder is 001101, there may be two decoding results, 112212 and 3412. This shows that $f^*$ is not an injection, that is, the code compiled by $f$ is not uniquely decodable.

By Lemma 3.16, real-time codes (or, more generally, uniquely decodable codes) must satisfy the Kraft inequality. However, a variable length code compiled according to the Kraft inequality is not necessarily an optimal code, because from the perspective of random coding, an optimal code not only requires the accuracy of decoding but also ensures efficiency; that is, the average random code length must be the shortest. We summarize the strict mathematical definition of the optimal code as follows.
Definition 3.16 Let $X = \{x_1, x_2, \ldots, x_m\}$ be an information space. A real-time code $C = \{f(x_1), f(x_2), \ldots, f(x_m)\}$ is called an optimal code if its average code length

$$L = \sum_{i=1}^{m}p_il_i \tag{3.67}$$

is the smallest, where $p_i = p(x_i)$ is the occurrence probability of $x_i$ and $l_i$ is the code length of $x_i$.
For a source state set $X$, when its statistical characteristics are determined, that is, after $X$ becomes an information space, the probability distribution $\{p(x) \mid x \in X\}$ is given. Therefore, to find the optimal compression coding scheme for an information space $X$ is to find the optimal solution $\{l_1, l_2, \ldots, l_m\}$ of (3.67) under the condition of the Kraft inequality. Usually, we use the Lagrange multiplier method to find the optimal solution. Let

$$J = \sum_{i=1}^{m}p_il_i + \lambda\sum_{i=1}^{m}D^{-l_i};$$

then

$$\frac{\partial J}{\partial l_i} = p_i - \lambda D^{-l_i}\log D = 0.$$

Thus

$$D^{-l_i} = \frac{p_i}{\lambda\log D}.$$

Substituting into the Kraft inequality, we get

$$1 \ge \sum_{i=1}^{m}D^{-l_i} = \frac{1}{\lambda\log D}\sum_{i=1}^{m}p_i \Rightarrow \lambda \ge \frac{1}{\log D},$$

so $D^{-l_i} \le p_i$, i.e., $l_i \ge \log_D\frac{1}{p_i}$, and

$$L = \sum_{i=1}^{m}p_il_i \ge -\sum_{i=1}^{m}p_i\log_Dp_i = H_D(X). \tag{3.69}$$

That is, $L$ is at least the D-ary information entropy $H_D(X)$ of $X$. From this, we get the main result of this section.

Theorem 3.11 The average length $L$ of any D-ary real-time code on an information space $X$ satisfies

$$L \ge H_D(X).$$
Next, we give another proof of Theorem 3.11. Consider two random variables $\xi$ and $\eta$ on a source state set $X$ with probability distributions $p = \{p_i\}$ and $q = \{q_i\}$, and define the relative entropy

$$D(p\|q) = \sum_{i}p_i\log\frac{p_i}{q_i}.$$

Lemma 3.17 The relative entropy $D(p\|q)$ of two random variables on $X$ satisfies $D(p\|q) \ge 0$, with equality if and only if $p = q$.

Proof If the real number $x > 0$ is expanded by the power series of $e^x$, it can be obtained that

$$e^{x-1} = 1 + (x-1) + \frac{1}{2}(x-1)^2 + \cdots \ge x,$$

i.e., $\log x \le x - 1$, from which

$$-D(p\|q) = \sum_{i}p_i\log\frac{q_i}{p_i} \le \sum_{i}p_i\left(\frac{q_i}{p_i} - 1\right) = 0.$$

Now

$$L - H_D(X) = \sum_{i=1}^{m}p_il_i - \sum_{i=1}^{m}p_i\log_D\frac{1}{p_i} = -\sum_{i=1}^{m}p_i\log_DD^{-l_i} + \sum_{i=1}^{m}p_i\log_Dp_i. \tag{3.71}$$

Define

$$r_i = \frac{D^{-l_i}}{c}, \quad c = \sum_{j=1}^{m}D^{-l_j};$$

by the Kraft inequality,

$$c \le 1, \quad \sum_{i=1}^{m}r_i = 1.$$

Then

$$L - H_D(X) = -\sum_{i=1}^{m}p_i\log_D(cr_i) + \sum_{i=1}^{m}p_i\log_Dp_i = \sum_{i=1}^{m}p_i\log_D\frac{p_i}{r_i} + \log_D\frac{1}{c} \ge 0,$$

by Lemma 3.17, with equality if and only if $c = 1$ and $p_i = r_i$, that is,

$$p_i = D^{-l_i}, \quad \text{or} \quad l_i = \log_D\frac{1}{p_i}.$$
By Theorem 3.11, coding according to probability, the code length of the D-ary optimal code should be

$$l_i = \log_D\frac{1}{p_i}, \quad 1 \le i \le m.$$

But in general, $\log_D\frac{1}{p_i}$ is not an integer; we use $\lceil a\rceil$ to represent the smallest integer not less than the real number $a$, and take

$$l_i = \left\lceil\log_D\frac{1}{p_i}\right\rceil, \quad 1 \le i \le m. \tag{3.72}$$

Then

$$\sum_{i=1}^{m}D^{-l_i} \le \sum_{i=1}^{m}D^{-\log_D\frac{1}{p_i}} = \sum_{i=1}^{m}p_i = 1,$$

so the code lengths $\{l_1, l_2, \ldots, l_m\}$ defined by formula (3.72) satisfy the Kraft inequality. From Lemma 3.16, we can define the corresponding real-time code, called a Shannon code.
Corollary 3.6 The code length $l_i = l(f(x_i))$ of a Shannon code $C = \{f(x_i) \mid 1 \le i \le m\}$ satisfies

$$l_i = \left\lceil\log_D\frac{1}{p(x_i)}\right\rceil, \quad \log_D\frac{1}{p(x_i)} \le l_i < \log_D\frac{1}{p(x_i)} + 1, \tag{3.73}$$

and the average code length satisfies

$$H_D(X) \le L < H_D(X) + 1.$$

Proof Multiplying

$$\log_D\frac{1}{p(x_i)} \le l_i < \log_D\frac{1}{p(x_i)} + 1$$

by $p(x_i)$ and summing over $i$,

$$\sum_{i=1}^{m}p(x_i)\log_D\frac{1}{p(x_i)} \le \sum_{i=1}^{m}p(x_i)l_i < \sum_{i=1}^{m}p(x_i)\left(\log_D\frac{1}{p(x_i)} + 1\right),$$

that is,

$$H_D(X) \le L < H_D(X) + 1.$$
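Corollary 3.6 is easy to verify numerically. The sketch below (ours) computes Shannon code lengths for an example distribution of our own choosing and checks the Kraft inequality and the bounds on $L$:

```python
from math import ceil, log

D = 2
p = [0.4, 0.3, 0.15, 0.1, 0.05]
lengths = [ceil(log(1 / pi, D)) for pi in p]     # l_i = ceil(log_D 1/p_i), (3.72)

assert sum(D ** (-l) for l in lengths) <= 1      # Kraft inequality holds
L = sum(pi * li for pi, li in zip(p, lengths))   # average code length
HD = -sum(pi * log(pi, D) for pi in p)           # D-ary entropy H_D(X)
assert HD <= L < HD + 1                          # Corollary 3.6
print(f"lengths={lengths}, L={L:.3f}, H_D(X)={HD:.3f}")
```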
In variable length codes, in order to make the average code length as close to the source entropy as possible, the code length should match the occurrence probability of the corresponding coded characters as much as possible. The principle of probabilistic coding is that characters with high occurrence probability are configured with short codewords and characters with low occurrence probability are configured with long codewords, so as to make the average code length as close to the source entropy as possible. This idea existed long before Shannon theory. For example, Morse code, invented in 1838, uses the three symbols dot, dash and space to encode the 26 letters of English. Expressed in binary, one dot is 10 (2 bits), one dash is 1110 (4 bits), and the space is 000 (3 bits). For example, the commonly used English letter E is represented by a single dot, while the infrequently used letter Q is represented by two dashes, one dot and one dash, which makes the average length of the codewords of an English text shorter. However, Morse code does not completely match the occurrence probabilities, so it is not an optimal code, and it is basically not used now. The coding table of Morse code is shown in Fig. 3.1.
It is worth noting that Morse code appeared in its early stage as a kind of cipher, widely used in the transmission and storage of sensitive information (such as military intelligence). Early cryptosystem compilers were also manufactured based on the principle of Morse code, which quickly mechanized the compilation and translation of ciphers. In this sense, Morse code has played an important role in promoting the development of cryptography.
Shannon, Fano and Huffman all studied coding methods for variable length codes, among which Huffman codes have the highest coding efficiency. We focus on the coding methods of Huffman binary and ternary codes.

Let $X = \{x_1, x_2, \ldots, x_m\}$ be a source letter set of $m$ symbols. Arrange the $m$ symbols in the order of occurrence probability, take the two letters with the lowest probability and assign them the digits "0" and "1," respectively, then add their probabilities as a new letter and rearrange it in the order of probability with the source letters without binary digits. Then again take the two letters with the lowest probability and assign the digits "0" and "1," respectively, add the probabilities of the two letters as the probability of a new letter, and re-queue; continue the above process until the probabilities of the remaining letters add up to 1. At this time, every source letter corresponds to a string of "0"s and "1"s, and we get a variable length code, which is called a Huffman code. Take $X = \{1, 2, 3, 4, 5\}$ as the information space, with corresponding probability distribution

$$\xi \sim \begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ 0.25 & 0.25 & 0.2 & 0.15 & 0.15 \end{pmatrix}.$$

The resulting Huffman code has the following general properties:
(1) If $p_i > p_j$, then $l_i \le l_j$; that is, a source letter with lower probability has a longer codeword.
(2) The two longest codewords have the same code length.
(3) The two longest codewords differ only in the last letter and coincide in the front.
(4) Among real-time codes, the average code length of the Huffman code is the smallest; in this sense, the Huffman code is the optimal code.
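The merging procedure described above can be written compactly. The following sketch (ours, not the book's own pseudocode; the heap-based formulation is one standard realization) applies it to the example distribution (0.25, 0.25, 0.2, 0.15, 0.15) and reports the average code length:

```python
import heapq
from itertools import count
from math import log2

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable letters."""
    tie = count(len(probs))          # unique tie-breakers keep tuples comparable
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # the two lowest-probability entries
        p1, _, c1 = heapq.heappop(heap)   # receive the digits "0" and "1"
        merged = {k: "0" + w for k, w in c0.items()}
        merged.update({k: "1" + w for k, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
code = huffman(probs)
L = sum(p * len(code[i]) for i, p in enumerate(probs))
H = -sum(p * log2(p) for p in probs)
print(code)                               # codeword lengths (2, 2, 2, 3, 3)
print(f"L = {L:.2f} bits >= H(X) = {H:.4f} bits")   # L = 2.30, H(X) = 2.2855...
```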
Huffman codes have been applied in practice, mainly in the compression standard for fax images. However, in actual data compression, the statistical characteristics of some sources change over time. In order to make the statistical characteristics on which the coding is based adapt to the actual statistical characteristics of the source, adaptive coding technology has been developed: in each step of coding, the coding of a new message is based on the statistical characteristics of previous messages. For example, R. G. Gallager first proposed the step-by-step updating technique for Huffman codes in 1978, and D. E. Knuth made this technique a practical algorithm in 1985. Adaptive Huffman coding requires complex data structures and continuous updating of the codeword set according to the statistical characteristics of the source; we will not go into the details here.
Order the source letters, let $F(x)$ be the cumulative distribution function, and define the midpoint

$$\bar{F}(x) = F(x-1) + \frac{1}{2}p(x)$$

of the jump of $F$ at $x$. If $x \neq y$, then $\bar{F}(x) \neq \bar{F}(y)$, so when we know $\bar{F}(x)$, we can find the corresponding $x$. The basic idea of the Shannon–Fano arithmetic code is to use $\bar{F}(x)$ to encode $x$. Because $\bar{F}(x)$ is a real number, its binary decimal expansion is truncated to the first $l(x)$ bits, denoted $\{\bar{F}(x)\}_{l(x)}$, where $l(x) = \lceil\log\frac{1}{p(x)}\rceil + 1$, so that

$$\frac{1}{2^{l(x)}} = \frac{1}{2\cdot2^{\lceil\log\frac{1}{p(x)}\rceil}} \le \frac{p(x)}{2} = \bar{F}(x) - F(x-1). \tag{3.76}$$
Lemma 3.18 The binary Shannon Fano code is a real-time code, and its average
length L is at most two bits different from the theoretical optimal value H (X ).
Proof By (3.76),
$$2^{-l(x)} < \frac{1}{2}p(x) = \bar F(x) - F(x-1).$$
Write the first $l(x)$ bits of the binary expansion of $\bar F(x)$ as $0.a_1a_2\cdots a_{l(x)}$; then
$$\bar F(x) \in \left[\,0.a_1a_2\cdots a_{l(x)},\ 0.a_1a_2\cdots a_{l(x)} + \frac{1}{2^{l(x)}}\,\right].$$
If $y \in X$, $x \ne y$, and $f(x)$ is a prefix of $f(y)$, then we have
$$\bar F(y) \in \left[\,0.a_1a_2\cdots a_{l(x)},\ 0.a_1a_2\cdots a_{l(x)} + \frac{1}{2^{l(x)}}\,\right].$$
But
$$\bar F(y) - \bar F(x) \ge \frac{1}{2}p(y) \ge \frac{1}{2}p(x) > \frac{1}{2^{l(x)}}.$$
This contradicts the fact that $\bar F(x)$ and $\bar F(y)$ lie in the same interval. Therefore f is a real-time code; that is, the Shannon–Fano code is a real-time code. Considering its average code length L,
$$L = \sum_{x\in X} p(x)l(x) = \sum_{x\in X} p(x)\left(\left\lceil \log\frac{1}{p(x)}\right\rceil + 1\right) < \sum_{x\in X} p(x)\left(\log\frac{1}{p(x)} + 2\right) = H(X) + 2.$$
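As an illustration, the following Python sketch (our own) computes Shannon–Fano–Elias codewords by taking the first $l(x) = \lceil \log \frac{1}{p(x)} \rceil + 1$ bits of $\bar F(x)$, exactly as described above:

```python
import math

def sfe_code(probs):
    """Shannon-Fano-Elias code: encode x by the first l(x) bits of F-bar(x)."""
    code = {}
    F = 0.0  # cumulative probability F(x-1)
    for x, p in probs.items():
        Fbar = F + p / 2                      # midpoint of x's interval
        l = math.ceil(math.log2(1 / p)) + 1   # codeword length l(x)
        # Take the first l bits of the binary expansion of Fbar.
        bits, frac = "", Fbar
        for _ in range(l):
            frac *= 2
            bit = int(frac)
            bits += str(bit)
            frac -= bit
        code[x] = bits
        F += p
    return code

probs = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = sfe_code(probs)
L = sum(p * len(code[x]) for x, p in probs.items())
print(code)  # prefix-free, by the argument in Lemma 3.18
print(L)     # average length 3.5, within 2 bits of H(X) ~= 2.29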
Let X be the input alphabet and Y the output alphabet, and let ξ and η be two random
variables with values on X and Y . The probability functions p(x) and p(y) of X and
Y and the conditional probability function p(y|x) are
$$p(x) = P\{\xi = x\},\quad p(y) = P\{\eta = y\},\quad p(y|x) = P\{\eta = y \mid \xi = x\},$$
respectively.
If X and Y are finite sets, the conditional probability matrix T = ( p(y|x))|X |×|Y | is
called the transition probability matrix from X to Y , i.e.,
$$T = \begin{pmatrix} p(y_1|x_1) & p(y_2|x_1) & \cdots & p(y_N|x_1) \\ p(y_1|x_2) & p(y_2|x_2) & \cdots & p(y_N|x_2) \\ \vdots & & & \vdots \\ p(y_1|x_M) & p(y_2|x_M) & \cdots & p(y_N|x_M) \end{pmatrix}. \qquad (3.79)$$
$$X^n = \{x = x_1\cdots x_n \mid x_i \in X\}, \quad Y^n = \{y = y_1\cdots y_n \mid y_i \in Y\}, \quad n \ge 1.$$
If
$$p(x) = p(x_1\cdots x_n) = \prod_{i=1}^n p(x_i), \quad p(y) = p(y_1\cdots y_n) = \prod_{i=1}^n p(y_i), \qquad (3.80)$$
then X and Y become a memoryless source, X n and Y n are power spaces, respectively.
From the joint event probability $p(x_iy_i) = p(x_1y_1)$ in Eq. (3.81), there is
$$p(y_i|x_i) = \frac{p(x_1)}{p(x_i)}\, p(y_1|x_1). \qquad (3.82)$$
The above formula shows that in a memoryless channel, the conditional probability $p(y_i|x_i)$ does not depend on i.
Definition 3.19 is the statistical characteristic of a memoryless channel. The fol-
lowing lemma gives a mathematical characterization of a memoryless channel.
Proof If XY is a memoryless source (see Definition 3.9), then for any n ≥ 1 and $x = x_1\cdots x_n \in X^n$, $y = y_1\cdots y_n \in Y^n$, $xy \in X^nY^n$, there is
$$p(xy) = p(x_1\cdots x_n y_1\cdots y_n) = p(x_1y_1\cdots x_ny_n) = \prod_{i=1}^n p(x_iy_i).$$
Thus
$$p(x)\,p(y|x) = p(x)\prod_{i=1}^n p(y_i|x_i),$$
so we have
$$p(y|x) = \prod_{i=1}^n p(y_i|x_i).$$
and $p(a_i) = p(a_1)$; therefore XY is a memoryless source, that is, a sequence of independent and identically distributed random vectors $\xi = (\xi_1, \xi_2, \ldots, \xi_n, \ldots)$ taking values on XY, and $(XY)^n = X^nY^n$ is the power space. The Lemma holds.
$$H(X^nY^n) = H((XY)^n) = nH(XY) = nH(X) + nH(Y|X).$$
On the other hand,
$$H(X^nY^n) = H(X^n) + H(Y^n|X^n) = nH(X) + H(Y^n|X^n),$$
so
$$H(Y^n|X^n) = nH(Y|X).$$
Therefore,
$$I(X^n, Y^n) = H(Y^n) - H(Y^n|X^n) = nH(Y) - nH(Y|X) = n(H(Y) - H(Y|X)) = nI(X,Y).$$
Let us define the channel capacity of a discrete channel, this concept plays an
important role in channel coding. First, we note that the joint probability distribution
p(x y) in the product space X Y is uniquely determined by the probability distribution
p(x) on X and the probability transformation matrix T , that is p(x y) = p(x) p(y|x);
therefore, the mutual information I (X, Y ) of X and Y is also uniquely determined
by p(x) and T . In fact,
$$I(X,Y) = \sum_{x\in X}\sum_{y\in Y} p(xy)\log\frac{p(xy)}{p(x)p(y)} = \sum_{x\in X}\sum_{y\in Y} p(x)p(y|x)\log\frac{p(y|x)}{\sum_{x'\in X} p(x')p(y|x')}.$$
The channel capacity $B$ of the channel $\{X, T, Y\}$ is defined as the maximum of $I(X, Y)$ over all probability distributions $p(x)$ on the input alphabet, $B = \max_{p(x)} I(X, Y)$.
Proof The amount of mutual information between the two information spaces is
I (X, Y ) ≥ 0 (see Lemma 3.5), so there is B ≥ 0. By Lemma 3.4,
I (X, Y ) = H (X ) − H (X |Y ) ≤ H (X ) ≤ log |X |
and
I (X, Y ) = H (Y ) − H (Y |X ) ≤ H (Y ) ≤ log |Y |,
so we have
0 ≤ B ≤ min{log |X |, log |Y |}.
Proof Let {X, T, Y } be a noise free channel, then |X | = |Y |, and the probability
transfer matrix T is the identity matrix, so
$$I(X,Y) = \sum_{x\in X}\sum_{y\in Y} p(xy)\log\frac{p(y|x)}{p(y)} = \sum_{x\in X}\sum_{y\in Y} p(x)p(y|x)\log\frac{p(y|x)}{p(y)}.$$
Thus
$$B = \max_{p(x)} I(X,Y) = \log|X|.$$
Let a be the random variable on the input space $F_2$ and b the random variable on the output space $F_2$, both obeying two-point distributions; the transfer matrix T of the binary symmetric channel is then given by
P{b = 1|a = 0} = P{b = 0|a = 1} = p
P{b = 0|a = 0} = P{b = 1|a = 1} = 1 − p.
$$I(X,Y) = H(X) - H(X|Y),$$
where
$$H(X|Y) = -\sum_{x\in F_2}\sum_{y\in F_2} p(xy)\log p(x|y).$$
Thus
$$B = \max_{p(x)} I(X,Y) = \max_{p(x)}\{H(X) - H(p)\} = 1 - H(p).$$
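A quick numerical check of this formula (a minimal sketch; the function names are our own):

```python
import math

def H(p):
    """Binary entropy function H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity B = 1 - H(p) of the binary symmetric channel."""
    return 1 - H(p)

print(bsc_capacity(0.0))   # 1.0  (noise-free: one full bit per use)
print(bsc_capacity(0.11))  # ~0.5
print(bsc_capacity(0.5))   # 0.0  (output independent of input)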
In order to state and prove the channel coding theorem, we introduce the concept of the jointly typical sequence. By Definition 3.13 of Sect. 5 of this chapter, if X is a memoryless source, then for any small ε > 0 and positive integer n ≥ 1, in the power space $X^n$, we define the typical set $W_\varepsilon^{(n)}$ as
$$W_\varepsilon^{(n)} = \left\{ x = x_1\cdots x_n \in X^n \,:\, \left| -\frac{1}{n}\log p(x) - H(X) \right| < \varepsilon \right\}.$$
If {X, T, Y} is a memoryless channel, then by Lemma 3.19, XY is a memoryless source; in the power space $(XY)^n = X^nY^n$, we define the jointly typical set $W_\varepsilon^{(n)}$ as (Fig. 3.4)
$$W_\varepsilon^{(n)} = \left\{ xy \in X^nY^n \,:\, \left|-\frac{1}{n}\log p(x) - H(X)\right| < \varepsilon,\ \left|-\frac{1}{n}\log p(y) - H(Y)\right| < \varepsilon,\ \left|-\frac{1}{n}\log p(xy) - H(XY)\right| < \varepsilon \right\}. \qquad (3.85)$$
n
$$-\frac{1}{n}\log p(X^n) \to H(X) \quad \text{in probability as } n\to\infty;$$
$$-\frac{1}{n}\log p(Y^n) \to H(Y) \quad \text{in probability as } n\to\infty;$$
$$-\frac{1}{n}\log p(X^nY^n) \to H(XY) \quad \text{in probability as } n\to\infty.$$
So when ε is given, as long as n is sufficiently large, there is
$$P_1 = P\left\{\left|-\frac{1}{n}\log p(x) - H(X)\right| > \varepsilon\right\} < \frac{1}{3}\varepsilon,$$
$$P_2 = P\left\{\left|-\frac{1}{n}\log p(y) - H(Y)\right| > \varepsilon\right\} < \frac{1}{3}\varepsilon,$$
$$P_3 = P\left\{\left|-\frac{1}{n}\log p(xy) - H(XY)\right| > \varepsilon\right\} < \frac{1}{3}\varepsilon.$$
Thus
$$P\{xy \in W_\varepsilon^{(n)}\} > 1 - \varepsilon;$$
in other words,
$$\lim_{n\to\infty} P\{xy \in W_\varepsilon^{(n)}\} = 1.$$
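This convergence can be observed numerically; here is a small Monte Carlo sketch (our own illustration, not from the book) for a memoryless binary source:

```python
import math, random

def empirical_aep(p1=0.3, n=2000, trials=5):
    """Sample x in X^n from a Bernoulli(p1) source; print -(1/n) log p(x)."""
    H = -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)
    for _ in range(trials):
        x = [1 if random.random() < p1 else 0 for _ in range(n)]
        k = sum(x)  # number of ones
        log_p = k * math.log2(p1) + (n - k) * math.log2(1 - p1)
        print(round(-log_p / n, 4), "vs H(X) =", round(H, 4))

empirical_aep()  # each sample's -(1/n) log p(x) clusters around H(X) ~= 0.8813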
For $xy \in W_\varepsilon^{(n)}$,
$$H(XY) - \varepsilon < -\frac{1}{n}\log p(xy) < H(XY) + \varepsilon.$$
Equivalently,
$$2^{-n(H(XY)+\varepsilon)} < p(xy) < 2^{-n(H(XY)-\varepsilon)}.$$
So there is
$$|W_\varepsilon^{(n)}| \le 2^{n(H(XY)+\varepsilon)},$$
and for sufficiently large n,
$$(1-\varepsilon)\,2^{n(H(XY)-\varepsilon)} \le |W_\varepsilon^{(n)}| \le 2^{n(H(XY)+\varepsilon)};$$
property (ii) holds. Now let us prove property (iii). If $p(xy) = p(x)p(y)$, then
$$P\{xy \in W_\varepsilon^{(n)}\} = \sum_{xy \in W_\varepsilon^{(n)}} p(x)\,p(y).$$
The following lemma has important applications in proving the channel coding theorem; in fact, the conclusion of the lemma is valid in general probability spaces. Thus
$$p(yx') = p(y)\,p(x').$$
$$C = f(W) = \{f(w) \mid w \in W\} \subset X^n,$$
and the code rate is
$$R_C = \frac{1}{n}\log|C| = \frac{1}{n}\log M.$$
For each input message $w \in W$, if $g(T(f(w))) \ne w$, it is said that the channel transmission is wrong; the transmission error probability of w is $\lambda_w$, and the average error probability of the code is
$$P_e(C) = \frac{1}{M}\sum_{x\in C} P_e(x) = \frac{1}{M}\sum_{w=1}^M \lambda_w. \qquad (3.88)$$
Theorem 3.12 (Shannon's channel coding theorem, 1948) Let {X, T, Y} be a memoryless channel and B the channel capacity. Then

(i) When R < B, there is a sequence of codes $C_n = (n, 2^{[nR]})$ whose transmission error probability satisfies
$$P_e(C_n) \to 0, \quad n \to \infty. \qquad (3.89)$$
(ii) Conversely, if the transmission error probability of the codes $C_n = (n, 2^{[nR]})$ satisfies Eq. (3.89), then there is an absolute constant $N_0$ such that the code rate $R_{C_n}$ of $C_n$ satisfies
$$R_{C_n} \le B, \quad \text{when } n \ge N_0.$$
Since $M = 2^{[nR]}$, the code rate satisfies
$$R - \frac{1}{n} < R_{C_n} \le R, \qquad (3.90)$$
so (i) of Theorem 3.12 indicates that "good codes", whose rate is arbitrarily close to the channel capacity B and whose transmission error probability is arbitrarily small, exist. (ii) indicates that the rate of such good codes with sufficiently small transmission error probability cannot exceed the channel capacity. Shannon's proof of Theorem 3.12 uses the random coding technique; this idea of using a random method to prove a deterministic result is widely used in information theory, and it has found more and more applications in other fields.
Proof (Proof of theorem 3.12) Firstly, the probability function p(xi ) is arbitrarily
selected on the input alphabet X , and the joint probability in power space X n is
defined as
$$p(x) = \prod_{i=1}^n p(x_i), \quad x = x_1\cdots x_n \in X^n. \qquad (3.91)$$
In this way, we get a memoryless source X and power space X n , which consti-
tute the codeword space of channel coding. Then M = 2[n R] codewords are ran-
domly selected in X n to obtain a random code Cn = (n, 2[n R] ). In order to illus-
trate the randomness of codeword selection, we borrow the source message set
W = {1, 2, . . . , M}, where M = 2[n R] . For every message w, 1 ≤ w ≤ M, the ran-
domly generated codeword is marked as X (n) (w). So we get a random code
$$P\{C_n\} = \prod_{w=1}^M P\{X^{(n)}(w)\} = \prod_{w=1}^M \prod_{i=1}^n p(x_i(w)).$$
If we can prove that for any ε > 0, when n is sufficiently large, the average error probability satisfies $\bar P_e(A_n) < \varepsilon$, then there is at least one code $C_n \in A_n$ such that $P_e(C_n) < \varepsilon$, which proves (i). We prove it in two steps.
(1) Principles of constructing random codes and encoding and decoding
We select each message in the source message set W = {1, 2, . . . , M} with equal
probability, that is w ∈ W , the selection probability of w is
$$p(w) = \frac{1}{M} = 2^{-[nR]}, \quad w = 1, 2, \ldots, M.$$
In this way, W becomes an equal probability information space. Each input message w is randomly coded as $X^{(n)}(w) \in X^n$. The codeword $X^{(n)}(w)$ is transmitted through the memoryless channel {X, T, Y} with conditional probability
$$p(y|X^{(n)}(w)) = \prod_{i=1}^n p(y_i|x_i(w)).$$
Averaging the error probability over all random codes $A_n$ gives
$$\bar P_e(A_n) = \frac{1}{M}\sum_{w=1}^M \sum_{C_n\in A_n} \lambda_w P\{C_n\} = \frac{1}{M}\sum_{w=1}^M \lambda_w, \qquad (3.93)$$
where by the symmetry of the random code construction,
$$\lambda_1 = \lambda_2 = \cdots = \lambda_M.$$
$$\lambda_1 = P\{E_1^c \cup E_2 \cup \cdots \cup E_M\} \le P\{E_1^c\} + \sum_{i=2}^M P\{E_i\}. \qquad (3.95)$$
By the properties of jointly typical sequences,
$$\lim_{n\to\infty} P\{xy \notin W_\varepsilon^{(n)}\} = 0.$$
So there is
$$\lim_{n\to\infty} P\{X^{(n)}(1)\,y \notin W_\varepsilon^{(n)}\} = 0,$$
and hence, when n is sufficiently large,
$$P\{E_1^c\} < \varepsilon.$$
Obviously, codeword $X^{(n)}(1)$ and the other codewords $X^{(n)}(i)$ ($i = 2, \ldots, M$) are independent of each other (see (3.91)). By Lemma 3.23, $y = T(X^{(n)}(1))$ and $X^{(n)}(i)$ ($i \ne 1$) are also independent of each other. Then by property (iii) of Lemma 3.22,
To sum up,
$$\bar P_e(A_n) = \lambda_1 \le \varepsilon + \sum_{i=2}^M 2^{-n(I(X,Y)-3\varepsilon)}.$$
H (W ) = log |W | = [n R].
By Lemma 3.20,
I (W, Y n ) ≤ I (X n , Y n ) = n I (X, Y ) ≤ n B.
$$R_C \le R < B + \frac{1}{n},$$
thus
RC ≤ B, when n is sufficiently large.
The above formula shows that when the transmission error probability is 0, as long as n is sufficiently large, there is $R_C \le B$. Secondly, if transmission errors are allowed, that is, the error probability of $C_n = (n, 2^{[nR]})$ satisfies $P_e(C_n) < \varepsilon$, then when n is sufficiently large, we still have $R_{C_n} \le B$.
In order to prove the above conclusion, we note the error probability of random
code Cn is
Pe (Cn ) = λw , (3.97)
By Theorem 3.3,
$$H(EW|Y^n) = H(W|Y^n) + H(E|WY^n) = H(E|Y^n) + H(W|EY^n). \qquad (3.98)$$
H (E|Y n ) ≤ H (E) ≤ 1.
By (3.98), we have
H (W |Y n ) ≤ 1 + n R Pe (Cn ).
H ( f (W )|Y n ) ≤ H (W |Y n ) ≤ 1 + n R Pe (Cn ).
Finally,
$$[nR] = H(W) = H(W|Y^n) + I(W, Y^n) \le H(W|Y^n) + I(f(W), Y^n) \le 1 + nRP_e(C_n) + I(X^n, Y^n) \le 1 + nRP_e(C_n) + nB,$$
so
$$nR < 2 + nRP_e(C_n) + nB.$$
Thus
$$R_{C_n} \le R < B + \frac{2}{n} + \varepsilon.$$
When n is sufficiently large, we obtain RCn ≤ B, which completes the proof of the
theorem.
The joint distribution of ξ (rows, values of X) and η (columns, values of Y):

   X \ Y    0       1
   0        1/4     1/4
   1        1/12    5/12

$H(X_3) = 1$, and $H(X_1X_2X_3) = 2$.
8. Let the information space be X = {0, 1, 2, . . .}; give an example of a random variable ξ valued in X such that H(X) = ∞.
9. Let X 1 = (X, ξ ), X 2 = (X, η) be two information spaces and ξ be a function of
η, prove H (X 1 ) ≤ H (X 2 ), and explain this result.
10. Let X 1 = (X, ξ ), X 2 = (X, η) be two information spaces and η = f (ξ ), prove
(i) H (X 1 ) ≥ H (X 2 ), give the conditions under which the equal sign holds.
(ii) H (X 1 |X 2 ) ≥ H (X 2 |X 1 ), give the conditions under which the equal sign
holds.
References
Bassoli, R., Marques, H., & Rodriguez, J. (2013). Network coding theory, a survey. IEEE Commun.
Surveys Tutor., 15(4), 1950–1978.
Berger, T. (1971). Rate distortion theory: a mathematical basis for data compression. Prentice-Hall.
Blahut, R. E. P. (1965). Ergodic theory and information. Wiley.
Chung, K. L. (1961). A note on the ergodic theorem of information theory. Annals of Mathematical Statistics, 32, 612–614.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. Wiley.
Csiszár, I., & Körner, J. (1981). Information theory: Coding theorems for discrete memoryless systems. Academic Press.
El Gamal, A., & Kim, Y. H. (2011). Network information theory. Cambridge University Press.
Fragouli, C., Le Boudec, J. Y., & Widmer, J. (2006). Network coding: An instant primer. ACM SIGCOMM Computer Communication Review, 36, 63–68.
Gallager, R. G. (1968). Information theory and reliable communication. Wiley.
Gray, R. M. (1990). Entropy and information theory. Springer.
Guiasu, S. (1977). Information theory with applications. McGraw-Hill.
Ho, T., & Lun, D. Network coding: An introduction. Computer Journal.
Hu, X. H., & Ye, Z. X. (2006). Generalized quantum entropy. Journal of Mathematical Physics,
47(2), 1–7.
Ihara, S. (1993). Information theory for continuous systems. World Scientific.
Kakihara, Y. (1999). Abstract methods in information theory. World Scientific.
McMillan, B. (1953). The basic theorems of information theory. Annals of Mathematical Statistics,
24(2), 196–219.
Moy, S. C. (1961). A note on generalizations of the Shannon-McMillan theorem. Pacific Journal of Mathematics, 11, 705–714.
Nielsen, M. A., & Chuang, I. L. (2000). Quantum computation and quantum information. Cambridge
University Press.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Labs Technical Journal,
27(4), 379–423, 623–656.
Shannon, C. E. (1959). Coding theorem for a discrete source with a fidelity criterion. IRE National
Convention Record, 4, 142–163.
Shannon, C. E. (1958). Channels with side information at the transmitter. IBM Journal of Research
and Development, 2(4), 189–193.
Shannon, C. E. (1961). Two-way communication channels. Proceedings of the Fourth Berkeley
Symposium on Mathematical Statistics and Probability, 1, 611–644.
Thomasian, A. J. (1960). An elementary proof of the AEP of information theory. Annals of Math-
ematical Statistics, 31(2), 452–456.
Wolfowitz, J. (1978). Coding theorems of information theory (3rd ed.). Springer-Verlag.
Ye, Z. X., & Berger, T. (1998). Information measures for discrete random fields. Science Press.
Yeung, R. W. (2002). A first course in information theory. Kluwer Academic.
Qiu, P. (2003). Information theory and coding. Higher Education Press. (in Chinese).
Qiu, P., Zhang, C., Yang, S., et al. (2012). Multi user information theory. Science Press. (in Chinese).
Ye, Z. (2003). Fundamentals of information theory. Higher Education Press. (in Chinese).
Zhang, Z., & Lin, X. (1993). Information theory and optimal coding. Shanghai Science and Tech-
nology Press. (in Chinese).
Chapter 4
Cryptosystem and Authentication System
Let $X = \{a_1, a_2, \ldots, a_q\}$ be the plaintext alphabet and a source. $\{\xi_i\}_{i=1}^\infty$ is a set of random variables valued on X; for any given positive integer n ≥ 1, we define the plaintext space P as the product information space $X_1X_2\cdots X_n$, that is,
$$P = X_1X_2\cdots X_n, \quad \text{where } X_i = (X, \xi_i),\ 1 \le i \le n.$$
$$p(b) = P\{\eta_i = b\} = \frac{1}{|Z|} = \frac{1}{s}, \quad \forall\, i \ge 1. \qquad (4.2)$$
K = Z r = {k = k1 k2 · · · kr |ki ∈ Z , 1 ≤ i ≤ r }.
This shows that the r-dimensional random vector $\eta = (\eta_1, \eta_2, \ldots, \eta_r)$ taking values on the key space K is also uniformly distributed on K. Unless otherwise specified, we generally stipulate that the plaintext space P and the key space K are independent information spaces, that is,
E = {E k |k ∈ K }.
Dk E k = 1 P , or Dk (E k (m)) = m, ∀ m ∈ P. (4.5)
C = {E k (m)|m ∈ P, k ∈ K } ⊂ X 1 X 2 · · · X n . (4.6)
That is, the ciphertext space C and the plaintext space P have the same alphabet and the same letter length.
For each ciphertext $c \in C$, $c = E_k(m)$, c is uniquely determined by the plaintext m and the key k, so we can define the occurrence probability p(c) of the ciphertext c as
Obviously,
$$\sum_{c\in C} p(c) = \sum_{k\in K}\sum_{m\in P} p(km) = \sum_{k\in K}\sum_{m\in P} p(k)\,p(m) = 1.$$
Ak = {E k (m)|m ∈ P} ⊂ C. (4.8)
D = {Dk |k ∈ K }. (4.9)
(2) If $c \in C$, $m \in P$, then
$$p(c|m) = \sum_{\substack{k\in K \\ E_k(m)=c}} p(k). \qquad (4.11)$$

(3) If $c \in C$, $m \in P$, then
$$p(m|c) = \frac{p(m)\sum_{\substack{k\in K\\ D_k(c)=m}} p(k)}{\sum_{\substack{k\in K\\ c\in A_k}} p(k)\,p(D_k(c))}. \qquad (4.12)$$
(1) holds. (2) is trivial, because when $m \in P$ is given, the occurrence probability of the ciphertext c is
$$p(c|m) = \sum_{\substack{k\in K\\ E_k(m)=c}} p(k).$$
By Bayes' formula,
$$p(m|c) = \frac{p(m)\,p(c|m)}{\sum_{m'\in P} p(m')\,p(c|m')},$$
so in the end
$$p(m|c) = \frac{p(m)\sum_{\substack{k\in K\\ E_k(m)=c}} p(k)}{\sum_{\substack{k\in K\\ c\in A_k}} p(k)\,p(D_k(c))}.$$
H (P K ) = H (P) + H (K ). (4.13)
It has been previously specified that the key source alphabet Z is an equal proba-
bility information space without memory, and the probability p(k) of the key space
K = {k = k1 k2 · · · kr |ki ∈ Z } is
$$p(k) = \prod_{i=1}^r p(k_i) = \frac{1}{|Z|^r} = \frac{1}{|K|}. \qquad (4.14)$$
Thus Nm ⊂ K C, and
$$p(m|kc) = \begin{cases} 1, & \text{if } kc \in N_m; \\ 0, & \text{if } kc \notin N_m. \end{cases}$$
H (P) ≤ H (C).
The Corollary shows that in a cryptosystem the uncertainty of the plaintext is no greater than that of the ciphertext.
Generally speaking, the mutual information I(P, C) between the plaintext space and the ciphertext space (see Definition 3.8 in the previous chapter) reflects the information about the plaintext space contained in the ciphertext space, so minimizing I(P, C) is an important design goal of a cryptosystem. If the ciphertext does not provide any information about the plaintext, or the analyst cannot obtain any information about the plaintext by observing the ciphertext, such a cryptosystem is called completely confidential.
So we have
H (P|C) = H (K |C) ≤ H (K ).
By definition, a completely confidential cryptosystem satisfies I(P, C) = 0. From the previous chapter, we know the amount of mutual information satisfies I(X, Y) ≥ 0, so there is

Corollary 4.2 In a completely confidential cryptosystem R = {P, C, K, E, D}, there is always
$$H(P) \le H(K) = \log_2|K|. \qquad (4.16)$$

Indeed, since $H(P|C) = H(K|C) \le H(K)$,
$$H(P) - H(K) \le I(P, C) = 0,$$
so
$$H(P) \le \log_2|K|.$$
It can be seen from the above that the larger the scale |K | of the key space, the
better the confidentiality of the system!
Definition 4.3 A cryptosystem R = {P, C, K, E, D} is called a "one secret at a time" system if, for any given m ∈ P and c ∈ C, there is a unique key k ∈ K such that $c = E_k(m)$.

As can be seen from the above definition, for a given m ∈ P, if $k \ne k'$, then $E_k(m) \ne E_{k'}(m)$. In other words, each key k is used only once to encrypt a given pair of plaintext and ciphertext. This is the origin of the term "one secret at a time".
Thus, for any given plaintext m ∈ P and ciphertext c ∈ C, there is exactly one key k ∈ K such that $E_k(m) = c$. Therefore, when k traverses the key space K, m traverses the plaintext space P, and each m appears exactly once. Thus, for c ∈ C, we have
$$p(c) = \sum_{\substack{k\in K,\ m\in P\\ E_k(m)=c}} p(k)\,p(m) = \frac{1}{|K|}\sum_{m\in P} p(m) = \frac{1}{|K|}. \qquad (4.17)$$
$$p(m|c) = \frac{p(m)}{\sum_{\substack{k\in K,\ m'\in P\\ E_k(m')=c}} p(m')} = \frac{p(m)}{\sum_{m'\in P} p(m')} = p(m).$$
Thus
$$H(P|C) = -\sum_{m\in P}\sum_{c\in C} p(mc)\log_2 p(m|c) = -\sum_{m\in P}\sum_{c\in C} p(mc)\log_2 p(m) = -\sum_{m\in P} p(m)\log_2 p(m) = H(P).$$
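The classical one-time pad over bit strings realizes this "one secret at a time" situation; the following sketch (our own illustration) checks empirically that the ciphertext distribution is uniform and carries no information about the plaintext:

```python
import os
from collections import Counter

def otp_encrypt(m: bytes, k: bytes) -> bytes:
    """One-time pad: XOR the plaintext with an equally long random key."""
    assert len(m) == len(k)
    return bytes(a ^ b for a, b in zip(m, k))

# For a fixed plaintext byte, a uniform key makes every ciphertext byte
# equally likely: p(c) = 1/|K| and p(m|c) = p(m), as derived above.
m = b"A"
counts = Counter(otp_encrypt(m, os.urandom(1)) for _ in range(100_000))
print(len(counts))                      # all 256 ciphertext values occur
print(max(counts.values()) / 100_000)   # each with frequency close to 1/256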
In order to introduce Shannon's concepts of unique solution distance and ideal cryptosystem, we first consider the scenario of a ciphertext-only attack. In this scenario, when the cryptanalyst intercepts a ciphertext c, he may decrypt c with all decryption keys $D_k$ to obtain
$$m' = D_k(c), \quad k \in K.$$
He records the keys corresponding to all meaningful messages m′; only one of these keys is the correct key, while the other incorrect keys are called pseudo keys. A large number of ciphertexts are required as samples in a ciphertext-only attack. Therefore, we consider the product spaces $P^n$ and $C^n$ of plaintext and ciphertext, and treat joint events in $P^n$ and $C^n$ as plaintext strings and ciphertext strings.
Proof From the addition formula of information entropy (see Theorem 3.2 in the
previous chapter),
$$H(C|KP) = H(P|KC) = 0,$$
thus
$$H(KP) = H(KC).$$
Again, from the addition formula and note that K and P are statistically independent,
so
H (K P) = H (P) + H (K |P) = H (P) + H (K )
and
H (P) + H (K ) = H (K C) = H (C) + H (K |C).
So we have
H (K |C) = H (P) + H (K ) − H (C).
$$S_n \ge \frac{2^{H(K)}}{|P|^{nr}} - 1. \qquad (4.21)$$
For the product cryptosystem $R_n = \{P^n, C^n, K, E^n, D^n\}$, we similarly have
$$H(K|C^n) = H(K) + H(P^n) - H(C^n).$$
By (3.9), we have
H (C n ) ≤ n log2 |C|, |C| = |P|.
So we have
H (P n ) ≥ n H∞ = n(1 − r )H0 = n(1 − r ) log2 |P|. (4.22)
We get
$$\sum_{k\in K(y)} p(k|y) = \sum_{k\in K} p(k) = 1.$$
Finally, (4.21) can be obtained from (4.23), completing the proof.
When the mathematical expectation of the number of pseudo keys is greater than 0, a ciphertext-only attack cannot break the cipher in theory, so we define the unique solution distance of a cryptosystem as the value of n for which $S_n = 0$. From Theorem 4.6, we can obtain an approximate value of the unique solution distance:
$$n_0 \approx \frac{H(K)}{r\log_2|P|}.$$
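For instance, for a simple substitution cipher over the 26-letter English alphabet with redundancy r ≈ 0.75, this approximation gives a unique solution distance of roughly 25 letters; a quick computation (our own illustration under these assumed parameters):

```python
import math

def unicity_distance(key_space_size, alphabet_size, redundancy):
    """n0 ~= H(K) / (r * log2 |P|), with H(K) = log2 |K| for uniform keys."""
    HK = math.log2(key_space_size)
    return HK / (redundancy * math.log2(alphabet_size))

# Simple substitution cipher: |K| = 26!, |P| = 26, English redundancy r ~ 0.75.
print(unicity_distance(math.factorial(26), 26, 0.75))  # ~25 letters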
sender, message receiver and attacker, in which the message sender and receiver trust
each other. They share the same key information; another model is the authentication
system model with arbiter. In this model, the participants of the system have arbiters
in addition to the information sender, receiver and attacker. At this time, the sender
and receiver of the message do not trust each other, but they all trust the arbiter. The
arbiter shares the key information with the sender and receiver.
An authentication system without privacy function and without arbiter is composed of four parts: a finite set S of source states, called the source set; a finite set A of authentication tags, called the tag set; a key space K composed of all usable keys; and an authentication rule set E = {e_k(s) | k ∈ K, s ∈ S}, where for any k ∈ K, e_k is an authentication rule, a mapping of S → A.
Definition 4.7 The product space S A is called the message space, and M represents
S A.
Authentication protocol: The sender and receiver of the message use the following
protocol to transmit information. First, they secretly select and share the random
key k ∈ K ; if the sender wants to transmit an information source state s ∈ S to
the receiver, the sender calculates a = ek (s) and sends the message sa ∈ M to the
receiver. When the receiver receives the message sa, he calculates a′ = e_k(s) again; if a′ = a, he confirms that the message is reliable and accepts it, otherwise he refuses to accept the message sa.
Definition 4.8 Matrix [ek (s)]|K |×|S| is called authentication matrix. Its rows are
marked by key k ∈ K and columns by source state s ∈ S. It is a |K | × |S|-order
matrix, the element intersecting row k and column s is ek (s).
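A tiny example (our own, for illustration): with S = {0, 1}, A = {0, 1} and four equiprobable keys, the authentication matrix and the impersonation payoff of the theorem below can be computed directly:

```python
from itertools import product

# Hypothetical toy scheme: key k = (k0, k1) in {0,1}^2, e_k(s) = k_s.
keys = list(product([0, 1], repeat=2))
S, A = [0, 1], [0, 1]

def e(k, s):
    return k[s]

# Authentication matrix: rows indexed by keys, columns by source states.
for k in keys:
    print(k, [e(k, s) for s in S])

# payoff(s, a) = P{e_k(s) = a} over a uniform key; pd0 = max over (s, a).
payoff = {(s, a): sum(1 for k in keys if e(k, s) == a) / len(keys)
          for s in S for a in A}
print(max(payoff.values()))  # 0.5 = 1/|A|, matching the bound pd0 >= 1/r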
Theorem 4.7 If the scale of the authentication tag space A is |A| = r, then for any fixed source state s ∈ S there is always an authentication tag a ∈ A such that
$$\text{payoff}(s, a) \ge \frac{1}{r}, \quad \text{thus } p_{d_0} \ge \frac{1}{r}.$$
r r
Proof By the definition of payoff(s, a),
$$\sum_{a\in A} \text{payoff}(s, a) = \sum_{a\in A}\sum_{\substack{k\in K\\ e_k(s)=a}} p(k).$$
When a runs through the s-th column of the authentication matrix, k traverses the whole key space, so
$$\sum_{a\in A} \text{payoff}(s, a) = \sum_{k\in K} p(k) = 1.$$
Hence for some a ∈ A,
$$\text{payoff}(s, a) \ge \frac{1}{|A|} = \frac{1}{r}.$$
$$\log_2 p_{d_0} \ge H(K|SA) - H(K)$$
and
$$p_{d_0} \ge \frac{1}{2^{H(K)-H(K|SA)}}.$$
Proof By definition, $p_{d_0}$ is not less than the mathematical expectation of payoff(s, a), that is,
$$p_{d_0} \ge \sum_{s\in S,\,a\in A} p(sa)\,\text{payoff}(s, a).$$
Obviously,
$$\text{payoff}(s, a) = p(a|s).$$
Thus
$$p(sa) = p(s)\,p(a|s) = p(s)\,\text{payoff}(s, a). \qquad (4.27)$$
So
$$\log_2 p_{d_0} \ge \sum_{s\in S}\sum_{a\in A} p(sa)\log_2 \text{payoff}(s, a) = \sum_{s\in S}\sum_{a\in A} p(sa)\log_2 p(a|s) = -H(A|S).$$
Because the source space S and the key space K are statistically independent, so
H (S K ) = H (K ) + H (S).
Also, the tag space A is completely determined by the source space S and the key
space K , so
H (A|K S) = 0.
H (K AS) = H (K S) + H (A|K S)
= H (K S) = H (K ) + H (S).
−H (A|S) = H (K |AS) − H (K ).
Thus
log2 pd0 ≥ −H (A|S) = H (K |AS) − H (K ).
M = SA is called the message space. It can be seen from Theorem 4.8 that the maximum success probability $p_{d_0}$ of a forgery attack satisfies
$$p_{d_0} \ge \frac{1}{2^{I(K,M)}},$$
where I(K, M) is the average amount of mutual information between the key space and the message space. If the mutual information I(K, M) is larger, the maximum success probability of a forgery attack is lower; conversely, if the mutual information is smaller, the success rate of a forgery attack is higher.
The so-called substitution attack is one in which the attacker first observes a message (s, a) on the channel, and then replaces (s, a) with a message (s′, a′), hoping that the receiver will accept (s′, a′) as a genuine message. Analyzing the maximum success probability $p_{d_1}$ of a substitution attack is more difficult than for a forgery attack; the main reason is that $p_{d_1}$ depends on both the probability distribution of the source state space S and the probability distribution of the key space K.
Let (s′, a′) and (s, a) be two messages, where s′ ≠ s. We use payoff(s′, a′, s, a) to express the probability that substituting (s′, a′) for (s, a) succeeds in cheating; then
The above formula represents the conditional probability of a′ = e_{k₀}(s′) under the condition a = e_{k₀}(s) for the same key k₀. When the message (s, a) ∈ M is given, the attacker uses the optimal strategy to maximize his probability of success, so let $p_{s,a}$ be the corresponding maximum. The weighted average of $p_{s,a}$ over the message space M is the formal definition of $p_{d_1}$.
Like Theorem 4.7, for some a′ we have
$$\text{payoff}(s', a', s, a) \ge \frac{1}{|A|} = \frac{1}{r}.$$
So we have
$$p_{d_1} \ge \frac{1}{r}.$$
Proof By (4.28),
$$\sum_{a'\in A} \text{payoff}(s', a', s, a) = \frac{1}{\text{payoff}(s, a)}\sum_{a'\in A}\sum_{\substack{k\in K\\ e_k(s)=a,\ e_k(s')=a'}} p(k) = \frac{1}{\text{payoff}(s, a)}\sum_{\substack{k\in K\\ e_k(s)=a}} p(k) = 1.$$
Hence for some (s′, a′),
$$p_{s,a} \ge \frac{1}{|A|} = \frac{1}{r}.$$
Thus
$$p_{d_1} \ge \sum_{s\in S,\,a\in A} p(sa)\,p_{s,a} \ge \frac{1}{r}\sum_{a\in A} p(a) = \frac{1}{r},$$
and
$$p_{d_1} \ge \frac{1}{2^{H(K|M)-H(K|M^2)}}. \qquad (4.31)$$
Proof By (4.29), $p_{s,a}$ is not less than the mathematical expectation of payoff(s′, a′, s, a) over s′ ∈ S, a′ ∈ A, that is,
$$p_{s,a} \ge \sum_{s'\in S,\,a'\in A} p(s'a'|sa)\,\text{payoff}(s', a', s, a).$$
Taking logarithms and averaging over (s, a), as in the proof of Theorem 4.8, gives
$$\log_2 p_{d_1} \ge -H(M|M).$$
In addition,
$$H(KM^2) = H(M|M) + H(K|M^2) = H(K|M) + H(M|KM).$$
So there is
$$-H(M|M) = H(K|M^2) - H(K|M).$$
Thus
$$\log_2 p_{d_1} \ge H(K|M^2) - H(K|M),$$
that is,
$$p_{d_1} \ge \frac{1}{2^{H(K|M)-H(K|M^2)}}.$$
The Theorem holds!
$$p_d = 2^{H(K|M)-H(K)}.$$

Proof The theorem is proved directly by construction. First, let the source state space be S = {0, 1}. Let N be a positive even number, and define the tag space A and the key space K as follows:
$$A = \mathbb{Z}_2^{N/2} = \{a_1a_2\cdots a_{N/2} \mid a_i \in \mathbb{Z}_2,\ 1 \le i \le N/2\}$$
and
$$K = \mathbb{Z}_2^N = \{k_1k_2\cdots k_N \mid k_i \in \mathbb{Z}_2,\ 1 \le i \le N\}.$$
The authentication rules are given by
$$e_k(0) = k_1k_2\cdots k_{N/2} \quad \text{and} \quad e_k(1) = k_{N/2+1}\cdots k_N.$$
Assuming that all $2^N$ keys k are selected equiprobably, for s ∈ S and a ∈ A we have
$$p_d = 2^{-N/2}.$$
It is easy to calculate
$$H(K|M) - H(K) = \frac{N}{2} - N = -\frac{N}{2} = -H(K|M).$$
So
$$p_d = 2^{H(K|M)-H(K)}.$$
Encryption with matrices comes from the classical Vigenère cipher. Let $X = \{a_1, a_2, \ldots, a_N\}$ be a plaintext alphabet of N characters; we identify the characters of X with the elements of $\mathbb{Z}_N$, the residue class ring mod N. Let $P = \mathbb{Z}_N^k$ be the plaintext space; $x = x_1x_2\cdots x_k \in P$ is called a plaintext unit, or a plaintext message of length k. Let $M_k(\mathbb{Z}_N)$ be the full matrix ring of order k over $\mathbb{Z}_N$, let $A \in M_k(\mathbb{Z}_N)$ be an invertible matrix of order k, and let $b = b_1b_2\cdots b_k \in \mathbb{Z}_N^k$ be a given vector; each plaintext unit $x = x_1x_2\cdots x_k$ in P is encrypted by the affine transformation (A, b):
$$\begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_k' \end{pmatrix} = A\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}. \qquad (4.32)$$
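A minimal sketch of this affine (Hill-type) encryption in Python, assuming k = 2 and N = 26 (our own parameter choices); the matrix inverse mod N is computed from the adjugate, in the spirit of Lemma 4.3 below:

```python
# Affine Hill cipher for k = 2, N = 26 (a sketch; parameters are our choice).
N = 26

def inv_mod(a, n):
    """Inverse of a modulo n (raises ValueError if it does not exist)."""
    return pow(a, -1, n)

def encrypt(x, A, b):
    """x' = A x + b over Z_N, for a 2x2 matrix A and 2-vectors x, b."""
    return [(A[0][0]*x[0] + A[0][1]*x[1] + b[0]) % N,
            (A[1][0]*x[0] + A[1][1]*x[1] + b[1]) % N]

def decrypt(y, A, b):
    """x = A^{-1}(y - b); A^{-1} = det^{-1} * adjugate, mod N."""
    det = (A[0][0]*A[1][1] - A[0][1]*A[1][0]) % N
    d = inv_mod(det, N)              # requires gcd(det, N) = 1
    Ainv = [[ d*A[1][1] % N, -d*A[0][1] % N],
            [-d*A[1][0] % N,  d*A[0][0] % N]]
    z = [(y[0]-b[0]) % N, (y[1]-b[1]) % N]
    return [(Ainv[0][0]*z[0] + Ainv[0][1]*z[1]) % N,
            (Ainv[1][0]*z[0] + Ainv[1][1]*z[1]) % N]

A, b = [[3, 3], [2, 5]], [1, 1]      # det = 9, invertible mod 26
x = [7, 8]                           # plaintext unit "HI"
y = encrypt(x, A, b)
assert decrypt(y, A, b) == x
print(y)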
Proof Take A as the k-order identity matrix E and b = 0 as the k-dimensional zero
vector, then (E, 0) is the identity transformation of ZkN −→ ZkN and the unit element
of G k . Secondly, we look at the product of two affine transformations (A1 , b1 ) and
(A2 , b2 ),
(A1 , b1 )(A2 , b2 ) = (A1 A2 , A1 b2 + b1 ) ∈ G k .
From the above lemma, any group element (A, b) ∈ G_k of the affine transformation group gives a Hill cryptosystem. If we select n group elements $(A_1, b_1), (A_2, b_2), \ldots, (A_n, b_n)$ in $G_k$ and let
$$(A, b) = \prod_{i=1}^n (A_i, b_i),$$
Lemma 4.3 If $A = (a_{ij})_{k\times k}$ is an invertible square matrix of order k over $\mathbb{Z}_N$, the number of bit operations needed to compute $A^{-1}$ is estimated as
$$\text{Time}(A^{-1}) = O(k^4\, k!\, \log^3 N).$$
Therefore, when k is a fixed constant, the algorithm for finding $A^{-1}$ is polynomial; when N is a fixed constant, the algorithm for finding $A^{-1}$ is exponential. In other words, the greater the order of the matrix, the higher the computational complexity.
The number of bit operations for each algebraic cofactor $A_{ij}$ of the adjoint matrix $A^*$ in formula (4.34) is $O((k-1)^3 (k-1)!\,\log^2 N)$, and there are $k^2$ algebraic cofactors, thus
$$\text{Time}(A^*) = O(k^4\, k!\, \log^2 N).$$
So
$$\text{Time}(A^{-1}) = O(k^4\, k!\, \log^3 N).$$
When k is constant, the algorithm for finding A−1 is polynomial. When N is constant
and k −→ ∞, it is obvious that the algorithm for finding A−1 is exponential. The
Lemma holds.
4.7.2 RSA
In 1976, two mathematicians from Stanford University, Diffie and Hellman, put
forward a new idea of cryptosystem design. In short, the encryption algorithm and
decryption algorithm are designed based on the principle of asymmetry. We can use
the following schematic diagram to illustrate
$$P \xrightarrow{\ f\ } C \xrightarrow{\ f^{-1}\ } P, \qquad (4.35)$$
ϕ(n) = ( p − 1)(q − 1) = n + 1 − p − q.
The large prime numbers p and q and the exponent e satisfying formula (4.37) are randomly generated; that is, p, q and e are randomly selected with the help of a computer random number generator (or pseudo-random number generator). The computational complexity is given by the following lemma.
Lemma 4.4 Randomly generated large prime number p and q, n = pq, ϕ(n) is
Euler function, 1 < e < ϕ(n), (e, ϕ(n)) = 1, then
Time (select out n) = O(log4 n),
Time (find e) = O(log2 n).
Proof Use the random number generator to generate a huge integer m, say $m > 10^{300}$, and then test whether $m, m+1, m+2, \ldots$ is prime. From the prime number theorem, the density of primes near m is about $O(\frac{1}{\log m})$, so we only need about $O(\log m)$ tests to find the required prime p. By Lemma 1.5 of Chapter 1,
$$\text{Time (find prime } p) = O(\log^2 n).$$
Similarly,
$$\text{Time (find prime } q) = O(\log^2 n).$$
Because n = pq,
$$\text{Time (select out } n) = O(\log^4 n).$$
After n is confirmed, ϕ(n) = (p−1)(q−1). A positive integer a, 1 < a < ϕ(n), is randomly generated by the random number generator, and then whether $a, a+1, a+2, \ldots$ is coprime to ϕ(n) is tested in turn. Again by the prime number theorem, integers coprime to ϕ(n) appear in the vicinity of a with frequency $O(\frac{1}{\log a})$, so we only need $O(\log a)$ tests to get the required e. Thus
Definition 4.10 After randomly determining n = pq, let Pe = (n, e) be called pub-
lic key, Pd = (n, d) be called private key, or e be public key and d be private key.
By Lemma 1.5 of Chapter 1, the number of bit operations required to compute $d = e^{-1} \bmod \varphi(n)$ is $\text{Time}(d) = O(\log^3 \varphi(n)) = O(\log^3 n)$. By Lemma 4.4, we have
Corollary 4.3 The computational complexity of randomly generated public key
Pe = (n, e) and private key Pd = (n, d) is polynomial.
The key mathematical principle used in the design of RSA is the generalized Euler congruence theorem. Let n ≥ 1 be a positive integer and (m, n) = 1; from Euler's theorem it can be seen that
$$m^{k\varphi(n)+1} \equiv m \,(\text{mod } n). \qquad (4.39)$$
We will prove that under the condition that n is a positive integer without square factors, formula (4.39) holds for all positive integers m, whether (m, n) = 1 or (m, n) > 1.
Lemma 4.5 If n = pq is the product of two different prime numbers, then for all positive integers m and k, there is
$$m^{k\varphi(n)+1} \equiv m \,(\text{mod } n). \qquad (4.40)$$
We only consider the case (m, n) > 1. Because n = pq, we have (m, n) = p, (m, n) = q, or (m, n) = n. If (m, n) = n, then (4.40) holds trivially. Without loss of generality, let (m, n) = p; then m = pt, where 1 ≤ t < q. By Euler's theorem, because (m, q) = 1,
$$m^{q-1} \equiv 1 \,(\text{mod } q).$$
Since $(q-1) \mid \varphi(n)$, for all k ≥ 1 there is
$$m^{k\varphi(n)} \equiv 1 \,(\text{mod } q).$$
We write $m^{k\varphi(n)} = rq + 1$; multiplying both sides by m = pt gives
$$m^{k\varphi(n)+1} = rtn + m,$$
and (4.40) follows.
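A quick numeric check of Lemma 4.5 (our own illustration) with n = 15:

```python
p, q = 3, 5
n, phi = p * q, (p - 1) * (q - 1)

# Verify m^{k*phi(n)+1} == m (mod n) for ALL m, including gcd(m, n) > 1.
for k in range(1, 4):
    for m in range(n):
        assert pow(m, k * phi + 1, n) == m % n
print("Lemma 4.5 verified for n =", n)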
With the above preparations, the workflow of the RSA cipher can be divided into the following three steps:

(1) Suppose A is a user of RSA. A randomly generates two huge prime numbers p = p(A), q = q(A), and sets n = n(A) = pq, ϕ(n) = (p−1)(q−1). Then A randomly generates a positive integer e = e(A) satisfying 1 < e < ϕ(n), (e, ϕ(n)) = 1, and calculates $d \equiv e^{-1} \,(\text{mod } \varphi(n))$ with 1 < d < ϕ(n). User A destroys the two primes p and q and keeps only the three numbers n, e, d; after publishing $P_e = (n, e)$ as the public key, he holds the private key $P_d = (n, d)$ and keeps it strictly confidential.
(2) Another RSA user B sends encrypted information to user A using A's known public key (n, e). B selects $P = \mathbb{Z}_n$ as the plaintext space and encrypts each $m \in \mathbb{Z}_n$. The encryption algorithm c = f(m) is defined as
$$c = f(m) \equiv m^e \,(\text{mod } n), \quad 0 \le c < n, \qquad (4.41)$$
where c is the ciphertext.
(3) After receiving the ciphertext c sent by user B, user A decrypts it with his own private key (n, d). The decryption algorithm $f^{-1}$ is defined as
$$m = f^{-1}(c) \equiv c^d \,(\text{mod } n), \quad 0 \le m < n. \qquad (4.42)$$
User A obtains the plaintext m sent by user B. So far, the RSA cryptosystem completes encryption and decryption.
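The three steps can be exercised with small (insecure, purely illustrative) parameters:

```python
from math import gcd

# (1) Key generation with toy primes (real RSA uses hundreds of digits).
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)      # n = 3233, phi = 3120
e = 17
assert 1 < e < phi and gcd(e, phi) == 1
d = pow(e, -1, phi)                    # d = e^{-1} mod phi(n)

# (2) Encryption with the public key (n, e): c = m^e mod n.
m = 1234
c = pow(m, e, n)

# (3) Decryption with the private key (n, d): m = c^d mod n.
assert pow(c, d, n) == m
print(n, e, d, c)                      # 3233 17 2753 ...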
The correctness and uniqueness of the RSA cipher are guaranteed by the following lemma.

Lemma 4.6 The encryption algorithm f defined by Eq. (4.41) is a 1–1 correspondence of $\mathbb{Z}_n \longrightarrow \mathbb{Z}_n$, and $f^{-1}$ defined by Eq. (4.42) is the inverse mapping of f.

Proof Since $ed \equiv 1 \,(\text{mod } \varphi(n))$, we may write
$$ed = k\varphi(n) + 1$$
for some positive integer k. By Lemma 4.5, for all $m \in \mathbb{Z}_n$,
$$f^{-1}(f(m)) \equiv m^{ed} = m^{k\varphi(n)+1} \equiv m \,(\text{mod } n),$$
that is, $f^{-1}(f(m)) = m$. In the same way,
$$f(f^{-1}(c)) = c.$$
Another very important application of RSA is digital signatures. From the workflow of the RSA cipher, the encryption algorithm defined in formula (4.41) is based on the public key $(n_A, e_A)$ of user A; we denote f by $f_A$ and the decryption algorithm defined in formula (4.42) by $f_A^{-1}$. The workflow
of RSA digital signature is: User A sends his digital signature to user B, that is, A
sends an encrypted message to B. Let Pe (A) = (n A , e A ) be the public key of A and
Pd (A) = (n A , d A ) the private key of A. Similarly, Pe (B) = (n B , e B ) is the public
key of B and Pd (B) = (n B , d B ) is the private key of B. Then the digital signature
sent by user A to user B is
f B f A−1 (m), if n A < n B
(4.43)
f A−1 f B (m), if n A > n B .
(ii) If $n_A > n_B$, user B first applies user A's public key $f_A = (n_A, e_A)$, then decrypts and verifies with his own private key $f_B^{-1} = (n_B, d_B)$.
The security of RSA rests on the difficulty of factoring the large integer n into primes. Each user selects the large primes p and q, sets n = pq, then destroys p and q; only (n, e) and the secret key information (n, d) are retained. Even though (n, e) is published, outsiders know only n and not ϕ(n), so they cannot obtain the information of the private key (n, d), because the calculation of ϕ(n) must rely on the prime factorization of n; from Euler's product formula, it is not difficult to see
$$\varphi(n) = n\prod_{p|n}\left(1 - \frac{1}{p}\right).$$
Because our knowledge of prime numbers is still very limited (for example, no general formula producing infinitely many primes is known), it is a genuinely hard problem to decide whether a huge integer n is prime, not to mention finding the prime factorization of n.
Let G be a finite group and b, y ∈ G two group elements. Let t be the minimum positive integer satisfying $b^t = 1$; t is called the order of b, denoted t = o(b). If there is an x, 1 ≤ x ≤ o(b), such that $y = b^x$, then x is called the discrete logarithm of y to base b. Given b ∈ G and 0 ≤ x ≤ o(b), it is easy to calculate $y = b^x$; conversely, for a group element y it is in general very difficult to find the discrete logarithm of y to base b. Therefore, encryption based on the discrete logarithm has become one of the most mainstream approaches in public key cryptography, including the famous ElGamal cryptosystem and elliptic curve cryptosystems. The ElGamal cryptosystem uses the discrete logarithm on the multiplicative group $F_q^*$ formed by the nonzero elements of the finite field $F_q$; elliptic curve cryptography uses the discrete logarithm on the Mordell group of an elliptic curve. Here we mainly discuss ElGamal cryptography; elliptic curve cryptography is discussed in Chap. 6. We first prove several basic conclusions about finite fields.
$$o(g') = \frac{o(g)}{\left(q-1,\ \frac{q-1}{p-1}\right)} = p - 1.$$
Lemma 4.9 Let $F_q$ be a q-element finite field, $q = p^n$. For any $d \mid n$, let $A_d$ be the set of monic irreducible polynomials of degree d in $F_p[x]$, and
$$f_d(x) = \prod_{p(x)\in A_d} p(x).$$
Then we have
$$x^q - x = x^{p^n} - x = \prod_{d|n} f_d(x). \qquad (4.44)$$
Proof We know
$$x^{p^d} - x \mid x^{p^n} - x \iff d \mid n.$$
Let p(x) ∈ A_d; that is, p(x) ∈ F_p[x], deg p(x) = d, and p(x) is a monic irreducible polynomial. Let α be a root of p(x); then adjoining α to $F_p$ gives a finite extension $F_p(\alpha)$, which is a d-th degree extension of $F_p$. If d | n, then
$$F_p(\alpha) = F_{p^d} \subset F_q,$$
Conversely, suppose p(x) is a monic irreducible polynomial with deg p(x) = d. If $p(x) \mid x^q - x$, then the zeros of p(x) all lie in $F_q$. Let α be a zero of p(x); then $F_p(\alpha) \subset F_q$, that is, $F_{p^d} \subset F_q = F_{p^n}$, so d | n. Finally,
$$x^q - x = \prod_{d|n} f_d(x).$$
Lemma 4.10 Let $N_p(d)$ denote the number of monic irreducible polynomials of degree d in $F_p[x]$; then
$$N_p(n) = \frac{1}{n}\sum_{d|n} \mu(d)\, p^{\frac{n}{d}}, \qquad (4.45)$$
Corollary 4.4 If d is a prime number, the number of monic irreducible polynomials of degree d in $F_p[x]$ is $\frac{1}{d}(p^d - p)$, that is,
$$N_p(d) = \frac{1}{d}(p^d - p), \quad \text{if } d \text{ is prime}.$$

Proof By (4.45),
$$N_p(d) = \frac{1}{d}\sum_{\delta|d} \mu(\delta)\, p^{\frac{d}{\delta}} = \frac{1}{d}(p^d - p).$$
The Corollary holds.
Based on the above basic conclusions about finite fields, we introduce two methods for solving discrete logarithms: the first is the Silver–Pohlig–Hellman method, and the second is the so-called index calculus method.
Silver–Pohlig–Hellman
Let $F_q$ be a q-element finite field and b a generator, that is, $F_q^* = \langle b \rangle$. Write
$$r_{p,j} = b^{\frac{j(q-1)}{p}}, \quad 1 \le j \le p. \qquad (4.47)$$
Now let us look at the calculation of the discrete logarithm in $F_q^*$. Let $y \in F_q^*$ and let the discrete logarithm of y to base b be m, that is, $y = b^m$. When y and b are given, the value of m is wanted (1 ≤ m ≤ q−1). By the prime factorization of q−1 in formula (4.46), if for each $p_i^{\alpha_i}$ (1 ≤ i ≤ s) we know the minimal nonnegative residue $m_i = m \bmod p_i^{\alpha_i}$, then according to the Chinese remainder theorem there is a unique m mod q−1 such that
$$m \equiv m_i \,(\text{mod } p_i^{\alpha_i}), \quad 1 \le i \le s.$$
Therefore, the discrete logarithm m of y is determined. Now the question is: given $p^\alpha \,\|\, q-1$, determine $m \bmod p^\alpha$. Let
in other words, $y_1$ is a p-th root of unity of $F_q^*$; comparing it with the table R of roots of unity (4.47), we obtain the corresponding digit of m.
such that
$$c(T) = c_0 \prod_{p(T)\in L_m} p(T)^{\alpha_{c,p}};$$
denoting the discrete logarithm of a(T) to base b(T) by ind(a(T)), it follows from the above formula that
$$\text{ind}(c(T)) - \text{ind}(c_0) \equiv \sum_{p(T)\in L_m} \alpha_{c,p}\,\text{ind}(p(T)) \,(\text{mod } q-1).$$
By (4.50), ind(c₀) is known; therefore, the above formula is a linear equation in the $h_m$ unknowns ind(p(T)). By continuously selecting appropriate t, we can obtain $h_m$ independent linear equations; that is, the $h_m \times h_m$ matrix formed by the coefficients of the $h_m$ unknowns in the $h_m$ linear equations is invertible mod q−1, which by Lemma 4.2 holds as long as its determinant is coprime to q−1. From linear algebra, we can calculate all ind(p(T)) by solving the above linear equations, and the following index table $B_m$ is obtained.
With the index table $B_m$, the discrete logarithm of any element $a(T) \in F_q^*$ can be easily calculated. Let $a_1(T) = a(T)\,b(T)^t$, selecting an appropriate t such that
$$a_1(T) \equiv a_0 \prod_{p(T)\in L_m} p(T)^{\alpha_{a,p}} \,(\text{mod } f(T)).$$
Thus
$$\text{ind}(a(T)) = \text{ind}(a_1(T)) - t.$$
Remark 4.1 The key to the above calculation is to select an appropriate m to obtain the index table $B_m$. This m cannot be too large, because $h_m$ grows exponentially with m; for example, if m is a prime number, then by Corollary 4.4,
$$h_m = |L_m| = \frac{1}{m}(p^m - p).$$
When $h_m$ is too large, calculating the index table $B_m$ requires solving a matrix of order $h_m \times h_m$, and the computational complexity is exponential. Obviously, m cannot be too small either; the selection of m depends on p and n. When p = 2 and n = 127, the best choice is m = 17, with the finite field $F_q$, $q = 2^{127}$, chosen because $q - 1 = 2^{127} - 1$ is a Mersenne prime. This was a popular choice at the time.
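The Silver–Pohlig–Hellman idea can be sketched as follows for the simple case where q − 1 is squarefree (a toy illustration of the CRT reduction, entirely our own; production code would also handle prime powers and use baby-step giant-step in each subgroup):

```python
def dlog_pohlig_hellman(y, b, q):
    """Discrete log x with b^x = y in F_q^* (q prime, q - 1 squarefree here)."""
    n = q - 1
    # Factor q - 1 by trial division (toy sizes only).
    factors, t, d = [], n, 2
    while d * d <= t:
        if t % d == 0:
            factors.append(d)
            t //= d
        else:
            d += 1
    if t > 1:
        factors.append(t)

    residues = []
    for p in factors:
        # Project into the subgroup of order p: (b^{n/p})^{x mod p} = y^{n/p}.
        bp, yp = pow(b, n // p, q), pow(y, n // p, q)
        x_p = next(j for j in range(p) if pow(bp, j, q) == yp)
        residues.append((x_p, p))

    # Chinese remainder theorem: combine x mod p over all prime factors p.
    x, mod = 0, 1
    for r, p in residues:
        inc = ((r - x) * pow(mod, -1, p)) % p
        x, mod = x + inc * mod, mod * p
    return x

q, b = 211, 3          # 211 prime, q - 1 = 210 = 2 * 3 * 5 * 7 (squarefree)
y = pow(b, 123, q)
print(dlog_pohlig_hellman(y, b, q))  # 123, since 3 generates F_211^*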
ElGamal cryptosystem
Using the computational difficulty of the discrete logarithm to design an asymmetric cryptosystem is the basic idea of the ElGamal cryptosystem. Each user randomly selects a finite field $F_q$, $q = p^n$, where p is a sufficiently large prime, computes a generator g of $F_q^*$, randomly selects a positive integer x, 1 < x < q−1, and calculates $y = g^x$ to get the public key $P_e = (y, g, q)$ and his own private key $P_d = (x, g, q)$.
Encryption algorithm: To send an encrypted message to user A, user B first maps each plaintext unit of the plaintext space P to an element of $F_q^*$, and then encrypts each plaintext unit. Let $m \in F_q^*$ be a plaintext unit; user B randomly selects an integer k, 1 < k < q−1, and uses the public key (y, g, q) of user A to encrypt m. The encryption algorithm f is
186 4 Cryptosystem and Authentication System
$$f(m) = (c, c'), \quad \text{where } \begin{cases} c = g^k, \\ c' = m y^k. \end{cases} \qquad (4.54)$$
Lemma 4.11 The encryption algorithm f defined by Eq. (4.54) is a 1–1 correspondence of $F_q^* \longrightarrow F_q^*$; the inverse mapping $f^{-1}$ of f is given by Eq. (4.55).

Proof By (4.54), $c = g^k$, $c' = my^k$; then
$$c'\,c^{-x} = m y^k g^{-xk} = m g^{xk} g^{-xk} = m.$$
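A toy run over the prime field $F_{211}$ (insecure parameters, our own illustration; decryption computes $m = c'\,c^{-x}$ exactly as in the lemma):

```python
import random

q, g = 211, 3          # F_211^*, generator g = 3 (as in the sketch above)

# Key generation: private x, public y = g^x.
x = random.randrange(2, q - 1)
y = pow(g, x, q)

# Encryption of a plaintext unit m in F_q^* with ephemeral k.
m = 157
k = random.randrange(2, q - 1)
c, c2 = pow(g, k, q), (m * pow(y, k, q)) % q

# Decryption: m = c2 * c^{-x} = m * y^k * g^{-xk}.
m_rec = (c2 * pow(c, q - 1 - x, q)) % q
assert m_rec == m
print((c, c2), "->", m_rec)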
Proof Let f (x) ∈ F p [x], deg f (x) = n, f (x) is a monic irreducible polynomial,
then
Fq = F p [x]/< f (x)> = {a0 + a1 x + · · · + an−1 x n−1 |∀ ai ∈ F p }.
is, the final result of α · β; the number of bit operations required for this operation is $O(n\log^3 p)$. The quantities Time(α/β) and Time($\alpha^k$) can be estimated in the same way. The Lemma holds.
Given a pile of items with different weights, can you put all or several of these items into a backpack so that the total weight equals a given value? This is the knapsack problem, arising from real life. Abstracted into mathematics: let $A = \{a_0, a_1, \ldots, a_{n-1}\}$ be a set of n positive integers and N a positive integer. Is N the sum of the elements of some subset of A? Using binary digits, the knapsack problem can be expressed as follows.

Knapsack problem: Given N and $A = \{a_0, a_1, \ldots, a_{n-1}\}$, where each $a_i \ge 1$ is a positive integer, does there exist a binary integer $e = (e_{n-1}e_{n-2}\cdots e_1e_0)_2$ such that
$$\sum_{i=0}^{n-1} e_i a_i = N, \quad \text{where } e_i = 0 \text{ or } e_i = 1?$$
For a super-increasing sequence A, the solution can be found greedily. Let
$$i_0 = \max\{i \mid a_i \le N\}, \qquad (4.57)$$
then $i_1 = \max\{i \mid a_i \le N - a_{i_0}\}$, and so on. If the equality in Eq. (4.58) holds, that is, $a_{i_k} = N - a_{i_0} - \cdots - a_{i_{k-1}}$, the algorithm terminates and yields the solution $N = a_{i_0} + a_{i_1} + \cdots + a_{i_k}$ of (A, N). If $i_k$ does not exist, we say the algorithm terminates without a solution. Obviously the indices satisfy $i_0 > i_1 > \cdots > i_k > \cdots$. Let I be the set of these indices, and denote the above algorithm by ψ. Then ψ produces
$$e_i = \begin{cases} 1, & \text{if } i \in I, \\ 0, & \text{if } i \notin I, \end{cases}$$
and when the algorithm succeeds,
$$\sum_{i=0}^{n-1} e_i a_i = N.$$
Then the knapsack problem (A, N) has no solution. We prove this conclusion by contradiction. If (A, N) has a solution, we may write
$$N = a_{j_0} + a_{j_1} + \cdots + a_{j_t}.$$
Adjusting the order, we can assume $j_0 > j_1 > \cdots > j_t$. By the definition of $i_0$ and $a_{j_0} \le N$, we know $j_0 \le i_0$; thus, since A is super-increasing,
$$N \ge a_{i_0} \ge a_{j_0} > \sum_{r=0}^{j_0-1} a_r \ge a_{j_1} + \cdots + a_{j_t},$$
a contradiction.
Each plaintext $m = m_0m_1\cdots m_{n-1} \in F_2^n$ is encrypted as
$$c = f(m) \equiv \sum_{i=0}^{n-1} t_i m_i \,(\text{mod } p), \quad 0 \le c < p, \qquad (4.60)$$
where c is the ciphertext.
Decryption algorithm: First compute $N \equiv b^{-1}c \,(\text{mod } p)$, $0 \le N \le p-1$, using the private key; then use the algorithm $\psi = f^{-1}$ for the knapsack problem (A, N) to solve
Lemma 4.14 The encryption algorithm f defined by Eq. (4.60) is a 1–1 correspon-
dence of Fn2 −→ F p , its inverse mapping f −1 is given by equation (4.61).
$$N \equiv b^{-1}c \,(\text{mod } p), \quad c \equiv \sum_{i=0}^{n-1} t_i m_i \,(\text{mod } p).$$
Then
$$N \equiv \sum_{i=0}^{n-1} m_i b^{-1} t_i \equiv \sum_{i=0}^{n-1} m_i a_i \,(\text{mod } p).$$
$$N = \sum_{i=0}^{n-1} m_i a_i \implies \psi(A, N) = m = m_0m_1\cdots m_{n-1}.$$
So we have
f −1 ( f (m)) = m, ∀ m ∈ Fn2 .
Conversely, if
$$N = \sum_{i=0}^{n-1} m_i a_i,$$
then
$$bN \equiv \sum_{i=0}^{n-1} m_i a_i b \equiv \sum_{i=0}^{n-1} m_i t_i \,(\text{mod } p).$$
f ( f −1 (c)) = c, ∀ c ∈ F p .
It can be seen that f is a 1–1 correspondence of Fn2 −→ F p and the inverse mapping
is f −1 = ψ. The Lemma holds.
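A toy Merkle–Hellman round trip, assuming a small super-increasing sequence A and modulus p (our own illustrative parameters; the scheme is broken in practice, as discussed below):

```python
# Toy Merkle-Hellman knapsack (illustrative parameters only).
A = [2, 3, 7, 14, 30, 57, 120, 251]   # super-increasing: a_i > sum of previous
p = 491                                # p > sum(A) = 484
b = 41                                 # secret multiplier, gcd(b, p) = 1
T = [(b * a) % p for a in A]           # public key t_i = b * a_i mod p

def encrypt(m):
    """c = sum(t_i * m_i) mod p for a bit string m."""
    return sum(t for t, bit in zip(T, m) if bit) % p

def decrypt(c):
    """N = b^{-1} c mod p, then the greedy algorithm psi on the sequence A."""
    N = (pow(b, -1, p) * c) % p
    m = [0] * len(A)
    for i in reversed(range(len(A))):  # largest element first
        if A[i] <= N:
            m[i], N = 1, N - A[i]
    assert N == 0, "no solution: not a valid ciphertext"
    return m

m = [1, 0, 1, 1, 0, 0, 1, 0]
c = encrypt(m)
assert decrypt(c) == m
print(c)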
It can be seen from the above discussion that if $A = \{a_0, a_1, \ldots, a_{n-1}\}$ is not a super-increasing sequence, the decryption algorithm $f^{-1}$ is a hard problem of the "NP complete" class, so the encryption and decryption algorithms defined by the MH knapsack cryptosystem form a typical trapdoor one-way function. Because of this, people believed for a long time that MH knapsack public key cryptography is very secure. However, in 1982, Shamir proved that a class of non-super-increasing sequences can
$$m_1 > \sum_{i=0}^{n-1} \alpha_i, \quad m_2 > nm_1, \quad (a_1, m_1) = (a_2, m_2) = 1. \qquad (4.62)$$
i=0
$$A_3 = \{u_0, u_1, \ldots, u_{n-1}\}, \quad u_i = a_2\omega_i \bmod m_2,$$
that is,
$$0 \le u_i < m_2, \quad u_i \equiv a_2\omega_i \,(\text{mod } m_2). \qquad (4.64)$$
$$c = f(x) = \sum_{i=0}^{n-1} e_i u_i, \qquad (4.65)$$
$$N_0 = b_2 c \bmod m_2 = \sum_{i=0}^{n-1} e_i \omega_i. \qquad (4.66)$$
Because by (4.65),
$$b_2 c \equiv \sum_{i=0}^{n-1} e_i b_2 u_i \equiv \sum_{i=0}^{n-1} e_i \omega_i \,(\text{mod } m_2),$$
and
$$0 \le \sum_{i=0}^{n-1} e_i \omega_i < m_2.$$
$$N = b_1 N_0 \equiv \sum_{i=0}^{n-1} e_i b_1 \omega_i \equiv \sum_{i=0}^{n-1} e_i \alpha_i \,(\text{mod } m_1).$$
So there is
$$N = \sum_{i=0}^{n-1} e_i \alpha_i, \quad \alpha_i \in A_1,$$
from which the plaintext x is obtained.
Therefore, Shamir used a simple transformation to turn a general knapsack problem into a super-increasing knapsack problem. Although $A_3$ is very special, we have reason to doubt that public key cryptography based on the general knapsack problem is as secure as people once thought.
Exercise 4
1. Explain the following terms: (1) one secret at a time; (2) completely confidential system; (3) unique solution distance; (4) perfect authentication system.
2. Short answer:
(1) What are the advantages and disadvantages of symmetric and asymmetric cryptosystems?
(2) What is the goal of a perfect authentication system?
and $a_n$ is even if and only if $3 \mid n$. More generally, find the law of $d \mid a_n$.
6. Suppose N = mn and (n, m) = 1. A second-order matrix $A \in M_2(\mathbb{Z}_N)$ over $\mathbb{Z}_N$ can be considered as $A \in M_2(\mathbb{Z}_m)$ and $A \in M_2(\mathbb{Z}_n)$; let $A_1$ and $A_2$ represent the images of A in $M_2(\mathbb{Z}_m)$ and $M_2(\mathbb{Z}_n)$. Prove:
(i) The mapping $A \xrightarrow{\sigma} (A_1, A_2)$ is a 1–1 correspondence between $M_2(\mathbb{Z}_N)$ and $M_2(\mathbb{Z}_m) \times M_2(\mathbb{Z}_n)$.
(ii) Under the correspondence σ, A is an invertible matrix (mod N) if and only if $A_1$ is an invertible matrix (mod m) and $A_2$ is an invertible matrix (mod n).
7. Let p be a prime and α ≥ 1. Then $A \in M_2(\mathbb{Z}_{p^\alpha})$ is an invertible square matrix if and only if $A \in M_2(\mathbb{Z}_p)$ is an invertible square matrix. By calculation, for all α ≥ 1, find the number of invertible matrices in $M_2(\mathbb{Z}_{p^\alpha})$.
8. Let ϕ(N) be the Euler function and $\varphi_2(N)$ the number of invertible matrices in $M_2(\mathbb{Z}_N)$. Give a calculation formula for $\varphi_2(N)$, that is, a formula similar to that for ϕ(N). Knowing $\varphi(N) = N\prod_{p|N}(1 - \frac{1}{p})$, determine $\varphi_2(N)$.
9. Let $\varphi_k(N)$ be the number of invertible matrices of order k in $M_k(\mathbb{Z}_N)$ and give the calculation formula of $\varphi_k(N)$.
10. Based on exercises 8 and 9, find the order of the k-dimensional affine transformation group G = (A, b) over $\mathbb{Z}_N$.
11. RSA is used for encryption; the alphabet of plaintext and ciphertext consists of the 40 numbers {0, 1, 2, \ldots, 39}, of which the 26 numbers {0, 1, 2, \ldots, 25} correspond to the 26 English letters. Blank = 26, • = 27, ? = 28, $ = 29, and the digits {0, 1, \ldots, 9} = {30, 31, \ldots, 39}. Suppose all public keys $n_A$ satisfy $40^2 < n_A < 40^3$. A plaintext unit $m = m_1m_2 \in \mathbb{Z}_{40}^2$ corresponds to the number $m_2 \cdot 40 + m_1$ of $\mathbb{Z}_{n_A}$, and a ciphertext unit $c = c_1c_2c_3 \in \mathbb{Z}_{40}^3$ to $c = c_3 \cdot 40^2 + c_2 \cdot 40 + c_1 \in \mathbb{Z}_{n_A}$.
(i) Encrypt the plaintext "SEND$7500" with the public key $(n_A, e_A) = (2047, 179)$.
(ii) Factor $n_A = 2047$ to find the private key $(n_A, d_A)$.
(iii) A password attacker can quickly find the private key $d_A$ without factoring 2047, so $n_A = 2047$ is a pretty bad choice. Why?
12. Attack the public key $(n_A, e_A) = (536813567, 3602561)$ by computer and find the private key $d_A$; this shows that a 29-bit $n_A$ is not safe in the RSA system.
13. Assume the plaintext alphabet is {0, 1, \ldots, 26}, where the first 26 numbers are the 26 English letters and blank = 26. The ciphertext alphabet adds "|" = 27 to the plaintext alphabet, a total of 28 numbers. If the plaintext unit is $m = m_1m_2m_3 \in \mathbb{Z}_{27}^3$ and the ciphertext unit is $c = c_1c_2c_3 \in \mathbb{Z}_{28}^3$, then for the correspondence with numbers of $\mathbb{Z}_{n_A}$ (see exercise 11), we need $n_A$ to satisfy
References
Adleman, L. M., Rivest, R. L., & Shamir, A. (1978). A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21, 120–126.
Adleman, L. M. (1979). A subexponential algorithm for the discrete logarithm problem with appli-
cation to cryptography. In Proceedings of the 20th Annual Symposium on the Foundations of
Computer Science, pp. 55–60.
Blum, M. (1982). Coin-flipping by telephone: A protocol for solving impossible problems (pp. 133–137). Proceedings of IEEE COMPCON.
Coppersmith, D. (1984). Fast evaluation of logarithms in fields of characteristic two. IEEE Trans-
actions in Information Theory, IT-30, 587–594.
Cover, T. M. (2003). Fundamentals of information theory. Tsinghua University Press (in Chinese).
Diffie, W., & Hellman, M. E. (1976). New directions in cryptography. IEEE Transactions on Information Theory, IT-22, 644–654.
ElGamal, T. (1985). A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, IT-31, 469–472.
Fiat, A., & Shamir, A. (1986). How to prove yourself: Practical solutions to identification and signature problems. In Advances in Cryptology—CRYPTO '86 (Vol. 263, pp. 186–194). Springer-Verlag, LNCS.
Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of
NP-completeness. Freeman.
Goldreich, O. (2001). Foundations of cryptography. Cambridge University Press.
Gordon, J. A. (1985). Strong primes are easy to find. In Advances in Cryptology, Proceedings of Eurocrypt 84 (pp. 216–223). Springer.
Hellman, M. E., & Merkle, R. C. (1978). Hiding information and signatures in trapdoor knapsacks. IEEE Transactions on Information Theory, IT-24, 525–530.
Hellman, M. E. (1979). The mathematics of public-key cryptography. Scientific America, 241,
146–157.
Hill, L. S. (1931). Concerning certain linear transformation apparatus of cryptography. American
Math Monthly, 38, 135–154.
Kahn, D. (1967). The codebreakers, the story of secret writing. Macmillan.
Knuth, D. E. (1973). The art of computer programming. Addision-Wesley.
Koblitz, N. (1994). A course in number theory and cryptography. Springer-Verlag.
Kranakis, E. (1986). Primality and cryptography. John Wiley & Sons.
Massey, J. L. (1983). Logarithms in finite cyclic group-Cryptographic issues. In Proceedings of the
4th Benelux Symposium on Information’s Theory, pp. 17–25.
Odlyzko, A. M. (1985). Discrete logarithms in finite fields and their cryptographic significance. In:
Advance in Cryptology, Proceedings of Eurocrypt 84, pp. 224–314. Springer.
Rivest, R. L. (1985). RSA chips(past, present, and future). Advances in Cryptology, Proceedings of
Eurocrypt, 84, 159–165.
Ruggiu, G. (1985). Cryptology and complexity theories, advances in cryptology. In Proceedings of
Eurocrypt (Vol. 84, pp. 3–9), Springer
Schneier, B. (1996). Applied cryptography. John Wiley & Sons.
Shamir, A. (1982). A polynomial time algorithm for breaking the basic Merkle-Hellman cryptosystem. In Proceedings of the 23rd Annual Symposium on the Foundations of Computer Science, pp. 145–152.
Shannon, C. E. (1949). Communication theory of secrecy system. The Bell System Technical Jour-
nal, 28, 656–715.
Stinson, D. R. (2003). Principles and practice of cryptography, translated by Guodeng Feng. Elec-
tronic Industry Press (in Chinese).
Trappe, W., & Washington, L. C. (2008). Cryptography and coding theory, translated by Quanlong
Wang et al., people’s Posts and Telecommunications Publishing House (in Chinese).
Wah, P., & Wang, M. Z. (1984). Realization and application of the Massey-Omura lock. In Proceedings of the International Zürich Seminar (1984), pp. 175–182.
Chapter 5
Prime Test
In the RSA algorithm of the previous chapter, we saw that the difficulty of factoring large integers into primes forms the basis of the security of the RSA cryptosystem. Theoretically, this security should not be questioned, because in mathematics we have only the definition of a prime, and there is no general method to detect primes. The main purpose of this chapter is to introduce some basic primality test methods, including the Fermat test, the Euler test, the Monte Carlo method, the continued fraction method, etc. Understanding the content of this chapter requires some specialized number theory knowledge.
This section discusses the basic properties of pseudoprimes. Our working platform is the finite Abelian group $\mathbb{Z}_n^*$, the group of reduced residue classes mod n. For a group element g, o(g) = 1 if and only if g is the unit element of the group G. By the definition of o(g), obviously
$$g^t = 1 \iff o(g) \mid t. \qquad (5.3)$$
The following two lemmas are basic conclusions about the order of a group element g.
$$o(g^k) = \frac{o(g)}{(k, o(g))}. \qquad (5.4)$$
So we have $t = \frac{m}{(k,m)}$, where m = o(g); the Lemma holds.
o(ab) = o(a)o(b).
Back to the finite group $\mathbb{Z}_n^*$: for any integer a ∈ ℤ with (a, n) = 1, we have $\bar a \in \mathbb{Z}_n^*$; we denote $o(\bar a)$ by o(a), called the order of a mod n. Obviously o(a) = o(b) if a ≡ b (mod n). A basic problem in number theory is the existence of primitive roots mod n; equivalently, is $\mathbb{Z}_n^*$ a cyclic group? If there is a positive integer a, (a, n) = 1, with $o(\bar a) = |\mathbb{Z}_n^*| = \varphi(n)$, then $\mathbb{Z}_n^*$ is a cyclic group of order ϕ(n), the primitive root mod n exists, and a is a primitive root mod n.
$$o(a^{p^i}) = \frac{p^i(p-1)}{(p^i,\ p^i(p-1))}.$$
Therefore, without loss of generality, let o(a) = p − 1. By the Sylow theorem, when α > 1, $p^{\alpha-1} \mid \varphi(p^\alpha)$, there is an integer b, (b, n) = 1, whose order mod $p^\alpha$ is $o(b) = p^{\alpha-1}$. Because (o(a), o(b)) = 1, by Lemma 5.2 there is
(iii) If there exists one $b \in \mathbb{Z}_n^*$ that does not satisfy Eq. (5.1), then at least half of the elements of $\mathbb{Z}_n^*$ do not satisfy Eq. (5.1).
Proof (i) and (ii) are trivial: (i) can be obtained from (5.3), and for $b_1, b_2 \in \mathbb{Z}_n^*$,
$$b_1^{n-1} \equiv 1,\ b_2^{n-1} \equiv 1 \,(\text{mod } n) \implies (b_1b_2)^{n-1} \equiv 1 \,(\text{mod } n),$$
$$b^{n-1} \equiv 1 \,(\text{mod } n) \implies (b^{-1})^{n-1} \equiv 1 \,(\text{mod } n).$$
So (ii) holds. To prove (iii), let n not be a Fermat pseudoprime to base b. If n is a Fermat pseudoprime to base a, then by (ii), n is not a Fermat pseudoprime to base ab. Therefore, if there is one base making n fail to be a Fermat pseudoprime, then each base a to which n is a Fermat pseudoprime corresponds to a base ab to which it is not, so at least half of the bases must make n fail to be a Fermat pseudoprime. The Lemma holds.
By Lemma 5.3, if there is a base b such that n is not a Fermat pseudoprime, then testing a, 1 ≤ a ≤ n, (a, n) = 1 in sequence for whether $a^{n-1} \equiv 1 \,(\text{mod } n)$, there is a more than 50% chance of finding a base b with $b^{n-1} \not\equiv 1 \,(\text{mod } n)$, which proves that n is not prime. Is it possible that for all a, 1 ≤ a ≤ n, (a, n) = 1, n is a Fermat pseudoprime to base a? The answer is yes; such a number n is called a Carmichael number.
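A minimal Fermat test along these lines (our own sketch; note that Carmichael numbers such as 561 defeat it for every coprime base):

```python
import random
from math import gcd

def fermat_test(n, rounds=20):
    """Return False if n is certainly composite, True if n is a probable prime."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if gcd(a, n) != 1:
            return False            # a shares a factor with n
        if pow(a, n - 1, n) != 1:
            return False            # Fermat witness: n is composite
    return True                     # passed every round

print(fermat_test(104729))   # True: 104729 is prime
print(fermat_test(91))       # False: 91 = 7 * 13 has many Fermat witnesses
print(pow(2, 560, 561))      # 1: the Carmichael number 561 fools every coprime base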
Definition 5.2 A Carmichael number n is an odd composite number such that for all $b \in \mathbb{Z}_n^*$,
$$b^{n-1} \equiv 1 \,(\text{mod } n).$$
According to the Chinese remainder theorem, there is a positive integer b such that
$$b \equiv g \,(\text{mod } p^2), \qquad b \equiv 1 \,(\text{mod } n').$$
If $b^{n-1} \equiv 1 \,(\text{mod } n)$, then $p(p-1) \mid n-1$; but $p \mid n$ contradicts $p \mid n-1$. So $b^{n-1} \not\equiv 1 \,(\text{mod } n)$, n is not a Carmichael number, and (i) holds.
Now to prove (ii). If for every prime p with p | n we have p − 1 | n − 1, then for all $b \in \mathbb{Z}_n^*$,
$$b^{n-1} = (b^{p-1})^{\frac{n-1}{p-1}} \equiv 1 \,(\text{mod } p), \quad \forall\, p \mid n.$$
If instead $p - 1 \nmid n - 1$ for some p, then taking g a primitive root mod p gives $g^{n-1} \not\equiv 1 \,(\text{mod } p)$, so there is some b with $b^{n-1} \not\equiv 1 \,(\text{mod } n)$; this contradicts the assumption that n is a Carmichael number. So (ii) holds.
To prove (iii), we just need to exclude that n is the product of two prime numbers.
By (ii), let n = pq, p < q, if n is a Carmichael number, then q − 1 | n − 1, but
n − 1 = p(q − 1 + 1) − 1 = p(q − 1) + p − 1, then
n − 1 ≡ p − 1(mod q − 1),
Example 5.2 The positive integer 561 = 3 · 11 · 17 is the smallest Carmichael num-
ber.
202 5 Prime Test
Proof Defined by, the Carmichael number is odd and compound, so the minimum
Carmichael number is
has no prime solution q, when p = 11, the above formula has a minimum solution
q = 17, so n = 3 · 11 · 17 is the smallest Carmichael number.
Example 5.3 For given prime number r ≥ 3, then the congruence equations
r pq ≡ 1(mod p − 1)
r pq ≡ 1(mod q − 1)
has only finite different prime solutions p, q. Let’s leave this conclusion for reflection.
Let p > 2 be an odd prime, Euler test uses the Euler criterion in the quadratic residue
of mod p to detect whether a positive integer n is prime. Like Fermat’s test, it is
obvious that the n that passes the test cannot be determined as prime, but the n that
fails the test is certainly not prime. We know that when the positive integers a and n
are given (n > 1), the solution of the quadratic congruence equation x 2 ≡ a(mod n)
is a famous “NP complete” problem. We can’t find a general solution in an effective
time. However, in the special case where n = p > 2 is an odd prime number, we
have rich theoretical knowledge to discuss the quadratic residue of mod p, these
knowledge include the famous Gauss quadratic reciprocal law and Euler criterion,
which constitute the core knowledge system of elementary number theory. First, we
introduce Legendre sign and let p > 2 be a given odd prime number.
Z∗p is a ( p − 1)-order cyclic group, a ∈ Z∗p (i.e., (a, p) = 1), we define the Leg-
endre symbolic function as
a 1, when x 2 ≡ a(mod p) is solvable
=
p − 1, when x 2 ≡ a(mod p) is unsolvable
ab a b
= , ∀ a, b ∈ Z
p p p
and
a b
= , if a ≡ b (mod p) .
p p
Lemma 5.5 a ∈ Z, p a, then the necessary and sufficient condition for a to be the
quadratic residue of mod p is
p−1
a 2 ≡ 1(mod p).
Proof Z∗p is a p − 1-order cyclic group, let g be a primitive root of mod p, that is ḡ
is the generator of Z∗p , that is ∀a ∈ Z, (a, p) = 1, we have
p−1
o(a) = o(g t ) = .
(t, p − 1)
So
p−1
o(a) | ⇔ 2|(t, p − 1) ⇔ 2|t,
2
that is t is even, thus, a is a quadratic residue of mod p, the Lemma holds.
p−1 a
a 2 ≡ (mod p). (5.5)
p
Proof If (a, p) > 1, that is p|a, the above formula holds. Might as well let p a.
By Fermat congruence theorem a p−1 ≡ 1(mod p), there is
p−1 p−1
(a 2 + 1)(a 2 − 1) ≡ 0(mod p).
204 5 Prime Test
Thus p−1
a 2 ≡ ±1(mod p).
p−1 p−1
If a 2 ≡ 1(mod p), by Lemma 5.5, then ( ap ) = 1. If a 2 ≡ −1(mod p), then ( ap ) =
−1. So (5.5) holds.
Call n an Euler pseudo prime under base b. Where ( nb ) is Jacobi symbol, define as
α1 α2 αs
b b b b
= ··· , if n = p1α1 · · · psαs . (5.7)
n p1 p2 ps
From the definition, we obviously have a corollary: if n is Euler pseudo prime under
basis b, then n is Fermat pseudo prime under basis b. This conclusion can be proved
by squaring both sides of Eq. (5.6) at the same time.
The following example shows that the inverse of inference is not tenable; that is,
if n is Fermat pseudo prime under basis b, but not Euler pseudo prime.
Example 5.4 n = 91 is Fermat pseudo prime under basis b = 3, but not Euler pseudo
prime. In fact, it’s easy to calculate 36 ≡ 1(mod 91), thus 390 ≡ 1(mod 91). From
36 ≡ 1(mod 91), we have
10 2 5
= · = −1,
91 91 91
From the Euler criterion of Lemma 5.6, we can easily calculate the Legendre
symbols of −1 and 2.
5.2 Euler Test 205
−1 p−1 2
= (−1) 8 ( p −1)
1 2
= (−1) 2 , . (5.8)
p p
p
) = (−1) 2 . To
calculate the Legendre sign for 2, we notice that
⎧
⎪
⎪ p − 1 ≡ (−1)1 (mod p)
⎪
⎪
⎪
⎪ 2 ≡ 2 · (−1)2 (mod p)
⎪
⎪
⎪
⎨ p − 3 ≡ 3 · (−1)3 (mod p)
⎪
⎪ ..
⎪
⎪ .
⎪
⎪
⎪
⎪
⎩ r ≡ p − 1 · (−1) p−1
⎪ 2 (mod p),
2
where r = p−1
2
, if p−1
2
is a even; r = p − p−1
2
, if p−1
2
is an odd. There is
p−1
!(−1) 8 ( p −1) (mod p),
1 2
2 · 4 · 6 · · · ( p − 1) ≡
2
that is p−1
≡ (−1) 8 ( p −1)
1 2
2 2 (mod p),
by Lemma 5.6,
2
≡ (−1) 8 ( p −1)
1 2
(mod p),
p
there is
2
= (−1) 8 ( p −1)
1 2
,
p
−1 2
= (−1) 8 (n −1)
n−1 1 2
= (−1) 2 , . (5.9)
n n
206 5 Prime Test
Proof The square of any odd number is congruent 1 under mod 8, that is a 2 ≡
1(mod 8). Write n = a 2 · p1 p2 · · · pt , where pi are different prime numbers, then
n ≡ p1 p2 · · · pt (mod 8).
b b b b
= ··· , (5.10)
n p1 p2 pt
thus
−1 −1 −1 −1 p1 −1 p2 −1 pt −1
2 + 2 +···+ 2
n−1
= ··· = (−1) = (−1) 2 . (5.11)
n p1 p2 pt
2
The same can be proved n
, the Lemma holds.
Corollary 5.1 For all odd numbers n, they are Euler pseudo prime under the base
±1.
Lemma 5.9 (Gauss. ) Let p and q be two different odd primes, then
q p
= (−1) 4 ( p−1)(q−1) .
1
p q
Proof According to incomplete statistics, there are currently more than 270 methods
to prove Gauss quadratic reciprocal law. In order to save space, we leave the proof
to the readers, hoping that everyone can find their favorite proof method.
Next, we discuss the computational complexity of Fermat test and Euler test.
Euler test of n to base b, by (5.6), the number of bit operations on the left is O(log3 n).
Find Jacobi symbol ( nb ), from Eq. (5.7) and quadratic reciprocal law, the calculation
5.2 Euler Test 207
can be transformed into the calculation of Legendre symbol. Each reciprocal law is
actually a division, so we only consider the calculation of Legendre symbols. By
Euler criterion,
b p−1
Time calculate = Time b 2 mod p = O(log3 n).
p
The number of prime factors of each n has an estimated O(log log n), so
b
Time calculate Jacobi symbol = O(log log n · log3 n) = O(log4 n).
n
The above formula is directly derived from Lemma 5.3. Let’s introduce a better
Miller–Rabin method than Solovay–Strassen method in a sense.
Definition 5.4 Let n be an odd compound number, write n − 1 = 2t · m, where
t ≥ 1, m is an odd. Let b ∈ Z∗n , if n and b satisfy one of the following conditions,
r
bm ≡ 1(mod n), or exists one r, 0 ≤ r < t, such that b2 m ≡ −1(mod n). (5.12)
Lemma 5.11 Suppose n ≡ 3(mod 4), then n is a strong Pseudoprime under base b
if and only if n is an Euler Pseudoprime under base b.
Therefore, if n is an Euler pseudo prime number under base b, the above formula
holds, so it is also a strong pseudo prime number for base b. Conversely, if the
above formula holds, because of n ≡ 3(mod 4), then 21 (n − 1) is an odd number, so
( −1
n
) = −1, and
208 5 Prime Test
n−1 n−1
b b 2 b 2 n−1
= ≡ ≡b 2 (mod n).
n n n
Before proving Theorem 5.2, let’s introduce Miller–Rabin’s test method, in order
to test whether a large odd number n is a prime number, we write n − 1 = 2t · m,
m is an odd number, t ≥ 1, select one b at random, 1 ≤ b < n, (b, n) = 1. We first
calculate bm mod n, if we get the result is ±1, then n passes the strong pseudo prime
test (5.12). If bm mod n = ±1, then we square bm mod n and find the minimum
nonnegative residue of the squared number under mod n to see if we get the result
of −1 and perform r times. If we can’t get −1, then n to base b fails to test Formula
(5.12). Therefore, it is asserted that n to base b is not a strong pseudo prime number.
If −1 is obtained by r squared, then n passes the test under base b.
In Miller–Rabin’s test, if n to base b fails to pass the test Formula (5.12), then n
must not be a prime number, if n to randomly selected k b = {b1 , b2 , . . . , bk } pass
the test, by property (ii) of 5.2, each bi accounts for no more than 25
1
P{n not prime} ≤ . (5.13)
4k
Compared with the Solovay–Strassen method using Euler test, the Miller–Rabin
method using strong pseudo prime test is more powerful.
To prove 5.2, we first prove the following two lemmas.
Thus
r p−1
x 2 m ≡ −1(mod p) ⇔ 2r m j ≡ (mod p − 1).
2
Namely,
2r m j ≡ 0(mod p − 1).
If r > t − 1, then the congruence has no solution to j, because m and m are odd
numbers, so when r ≥ t, (5.14) is unsolvable. If r < t, let d = (m, m ), then
(2r m, 2t m ) = 2r d,
then Eq. (5.15) has exactly d solutions for j. Each j corresponds to one x = g j , then
the number of solutions of Eq. (5.14) to x is N = 2r d, the Lemma holds.
With the above preparation, we now give the proof of Theorem 5.2.
Proof (The proof of Theorem 5.2). Let’s first prove that (i), that is, n and b satisfy
Eq. (5.12), we want to prove that formula (5.6) is satisfied; that is, if n to base b is
a strong pseudo prime number, then n to base b is an Euler pseudo prime number,
write n − 1 = 2t m, m is prime, we prove the property (i) of Theorem 5.2 in three
cases.
n−1
(1) bm ≡ 1(mod n). In this case, it is obvious that b 2 ≡ 1(mod n). Let’s prove
( nb ) = 1, in fact,
1 bm b m
1= = = = 1.
p p p
There is
n−1 b
b 2 ≡ ≡ 1(mod n).
n
Because
n−1 t−1 t−1
b 2 = b2 m
≡ −1(mod n), =⇒ b2 mm 1
≡ −1(mod n),
by p|n, we have
t−1
b2 mm 1
≡ −1(mod p). (5.17)
b p−1 t−1
≡b 2 = b2 m
≡ −1(mod p).
p
Because if the above formula is 1, both sides will be m power at the same time,
which will contradict Formula (5.17). If t1 > t, put both sides of Eq. (5.17) to
the power of 2t1 −t at the same time, then ( bp ) = 1, so we have (5.16).
We now
complete the proof of case (2) under the conclusion of Eq. (5.16), write
n = p|n p, p does not require different, define the positive integer k as
By (5.16), then
b b
= = (−1)k . (5.18)
n p
n ≡ 1 + 2t ≡ 1 + k · 2t (mod 2t+1 ),
5.2 Euler Test 211
(1) n can be divided by a square number; that is, there is a prime number p, p α ||n,
α ≥ 2.
In this case, we prove that there are at least 41 (n − 1) b, b ∈ Z∗n , n to base
b is not Fermat prime number, let alone a strong pseudo prime. First, suppose
bn−1 ≡ 1(mod n), then there is a prime p, p 2 |n, thus bn−1 ≡ 1(mod p 2 ). Because
Z∗p2 is a p( p − 1)-order cyclic group (see Theorem 5.3), let g be a generator of
Z∗p2 , then
Z∗p2 = {g, g 2 , . . . , g p( p−1) }.
d = (n − 1, p( p − 1)) = (n − 1, p − 1).
or
r r
b2 m ≡ −1(mod p), b2 m ≡ −1(mod q), 0 ≤ r < t. (5.21)
2 4t1 2 1 1
2−2t1 −1 + ≤ 2−3 · + = .
3 3 3 6 4
1
(m, m 1 ) · (m, m 2 ) ≤ m1m2.
3
1 −2t1 2 4t1 1 1 1 1
2 + ≤ + = < .
3 3 3 18 9 6 4
2k+1 − 1 2k − 2 2kt1
2−t1 −t2 −···−tk 1 + ≤ 2−kt1 +
2k − 1 2k − 1 2k − 1
2k − 2 1
= 2−kt1 · k +
2 − 1 2k − 1
2k − 2 1
≤ 2−k k +
2 − 1 2k − 1
= 21−k
1
≤ ,
4
because k ≥ 3, in this way, we have completed all the proofs of Theorem 5.2.
Euler test and strong pseudo prime test require some complex quadratic residual
techniques. We summarize the main conclusions of this section as follows:
(A) n to base b is a strong pseudo prime number ⇒ n to base b is an Euler pseudo
prime number ⇒ n to base b is a Fermat pseudo prime number; therefore, the
strong pseudo prime test is the best way to detect prime numbers.
(B) Although no test can successfully detect a prime number at present, the probabil-
ity detection method of strong pseudo prime number test, that is, Miller–Rabin
method, can obtain that the success probability (see (5.13)) of detecting whether
any odd number n is a prime number can be infinitely close to 1. That is
Using all the prime number test methods introduced in the previous two sections, for
a huge odd number n, even if we already know that n is not a prime number, we cannot
successfully decompose n, because the prime number test does not provide prime
factor decomposition information, A more direct method—like the sieve method— √
verifies whether the prime factor of n is for prime numbers not√greater than n,
because a compound number n must have a prime factor p, p ≤ n. Selected p≤
√ √
n
n, the bit operation required to divide n by p is O(log n), there are O( log n ) prime
√
numbers√ p ≤ n in total, therefore, the bit operation required for such a verification
is O( n). A more effective method was proposed by J. M. Pollard in 1975. We call
it Monte Carlo method, or “rho” method.
f
First, find a convenient mapping f of Zn −→ Zn ; for example, f (x) is an integer
coefficient polynomial, such as f (x) = x 2 + 1; secondly, a prime number x0 is ran-
214 5 Prime Test
l
N = r r −l (r − j).
j=0
N
l
l
j
= r −l (r − j) = 1− , (5.24)
r r +1 j=1 j=1
r
We notice that the real number x ∈ (0, 1), then log(1 − x) < −x. Take the logarithm
to the right of the above formula, then
l
j l
j −l(l + 1) l2
log 1 − <− = <− .
j=1
r j=1
r 2r 2r
√ √
Because of l = 1 + [ 2λr ] > 2λr , from the above formula,
5.3 Monte Carlo Method 215
l
j
log 1 − < −λ.
j=1
r
By (5.24), we have
N
≤ e−λ .
r r +1
We complete the proof of Theorem 5.3.
Monte Carlo method uses a polynomial f (x) ∈ Z[x], so that n is a positive integer,
and the congruence equation of mod n is invariant to polynomial f (x), that is
We call the polynomial f and the initial value x0 described in Lemma 5.14 an
average mapping. When the first subscript k is very large, the amount of calculation
is very large. Here we give an improved Monte Carlo algorithm.
f (x) ∈ Z[x] given, Monte Carlo algorithm needs to continuously calculate
xk (k = 1, 2, . . .). Let 2h ≤ k < 2k+1 (h ≥ 0), j = 2h − 1; that is, k is an (h + 1)-
bit number, j is the maximum h-bit number, compare xk with x j and calculate
(xk − x j , n), if (xk − x j , n) > 1, then the calculation is terminated, otherwise con-
sider k + 1. The improved Monte Carlo algorithm only needs to calculate (xk − x j , n)
once for each k , j = 2h − 1. There is no need to verify every j, 0 ≤ j < k, when k
is very large, it reduces a lot of computation, but there is a disadvantage. It may miss
216 5 Prime Test
the smallest subscript k satisfying the condition, but the error is controllable. In fact,
we have the following error estimation.
Lemma 5.16 Let n√ be an odd number and a compound number, and r be a factor
of n, r |n, 1 < r < n. Let f (x) ∈ Z[x], x0 ∈ Zn given, then the computational
complexity of finding r by Monte Carlo algorithm ( f, x0 ) is
√
Time(( f, x0 )) = O( n log3 n) bits. (5.26)
Further, there is a normal number C, so that for any positive real number λ, the
success probability of Monte Carlo algorithm ( f, x0 ) to find a nontrivial factor r of
n is greater than 1 − e−λ , that is
Proof From the discussion of computational complexity in Chap. 1, finding the max-
imum common divisor of two integers and the addition, subtraction, multiplication
and division in mod n are polynomial. Let C1 satisfies
C2 satisfies
Time( f (x) mod n) ≤ C2 log3 n, x ∈ Zn .
Equation (5.26) proved. In the sense of probability, that is, on the premise of allowing
certain errors, Eq. (5.26) can be further improved. √
Let λ > 0 be any given real number, by Lemma 5.3, ratio of k0 ≥ 1√+ 2λr
< e−λ , in other words, the probability of successfully finding r , r |n, r ≤ n is
√
P{find out r, r |n, r < n} ≥ 1 − e−λ .
√
In order to ensure the success rate, then k0 ≤ 1 + 2λr . By (5.28), the number of
bit operations required shall not be greater than
√ √ √
4(1 + 2λr )(C1 log3 n + C2 log3 n) = O( λ 4 n log3 n).
Remark 5.1 A basic assumption of Monte Carlo method is that the integer coefficient
polynomial f can be used as an average mapping (see Lemma 5.14); this has not
yet been proved.
a+b a−b
σ ((a, b)) = , .
2 2
Inverse mapping is
σ −1 ((t, s)) = (t + s, t − s).
218 5 Prime Test
a+b a−b
σ ((a, b)) = , .
2 2
n = (t + s, n)(t − s, n).
h
b2 mod n = piαi , αi ≥ 0.
i=1
h
αi j
h
α
ai = pj i∈A
, where ai = p j ij ,
i∈A j=1 j=1
Suppose i∈A ei = (0, 0, . . . , 0) is the zero vector in F2h , then
αi j ≡ 0(mod 2), ∀ 1 ≤ j ≤ h.
i∈A
That is, ai is a square number. Let r j = 1
2 i∈A αi j , then
⎛ ⎞2
h
h
ai = ⎝ p jj ⎠
r r
, define c = p jj , (5.29)
i∈A j=1 j=1
On the other hand, bi mod n represents the minimum nonnegative residue of bi under
mod n, let
b= (bi mod n) = δi , (5.30)
i∈A i∈A
Because of ai = bi2 mod n, that is 0 ≤ ai < n, and bi2 ≡ ai (mod n). There is
bi2 = b2 ≡ ai = c2 (mod n).
i∈A i∈A
Two different integers b and c defined by Eqs. (5.29) and (5.30) satisfy b2 ≡
c2 (mod n), We write the above analysis as the following lemma.
number. Write
220 5 Prime Test
h
α
h
αi j
ai = p j ij , ai = pj i∈A
= c2 .
j=1 i∈A j=1
where
h
1
αi j
c= p j2 i∈A
,
j=1
2 1
b2 ≡ c2 (mod n), =⇒ b ≡ ±c(mod n)’s rate ≤ r
≤ ,
2 2
Lemma 5.20 holds.
According to Lemma 5.20, b and c are selected by using factor basis, if b ≡
±c(mod n), then select failure, and the probability of failure is ≤ 21 . If the selection
fails, select another b1 and c1 , in this way, we randomly select k b and c equally
almost independently, and the probability of success of b ≡ ±c(mod n) is
1
P{b2 ≡ c2 (mod n), b ≡ ±c(mod n)} ≥ 1 − . (5.31)
2k
In other words, the probability of finding a nontrivial factor d = (b + c, n) of n by
using the factor base can be infinitely close to 1. Below, we systematically summarize
the factor base decomposition method as follows:
5.4 Fermat Decomposition and Factor Basis Method 221
Factor-based method
Let n be a large odd number and y be an appropriately selected integer (e.g.,
1
y ≤ n 10 ), let the factor base be
and
r 1
c= p j j mod n, r j = αi j .
j∈B
2 i∈A
We have b2 ≡ c2 (mod n), if b ≡ ±c(mod n), then reselect the subset A, Until finally
b ≡ ±c(mod n), in this way, we find a nontrivial factor d|n of n, d = (b + c, n).
Therefore, there is factorization n = d · dn .
Factor decomposition using factor-based method cannot guarantee the success
rate of 100% because b ≡ ±c(mod n) cannot be deduced from b2 ≡ c2 (mod n),
however, the success probability of factorization for large odd n can be infinitely
close to 1. Under the condition of success probability ≥ 1 − 21k (k is a given normal
number), the computational complexity of factorization n of by factor-based method
can be estimated as
√
Time(factor-based method to n factorization) = O(ec log n log log n
). (5.32)
The proof of Formula (5.32) is relatively complex. No detailed proof is given here.
Interested readers can refer to pages 136–141 of (Pomerance, 1982a) in reference 5.
The exact value of C in (5.32) is unknown. It is generally guessed that C = 1 + ε,
where ε > 0 is any small positive real number.
Let k be the√number of bits of n, and the estimate on the right of (5.32) can be
written as O(ec k log k ). Therefore, the computational complexity of the factor-based
method is sub-exponential. Compared with the Monte Carlo method introduced in the
previous section (see (5.31)), its computational complexity is exponential, because
√ 1
O( n) = O(ec1 k ), where c1 = log 2.
2
As we all know, the security of RSA public key cryptography is based on the
prime factorization n = pq of n. Although there is no general method to factor-
222 5 Prime Test
ize any large odd n, although Monte Carlo method and factor-based method are
probability calculation methods, the probability of successful factorization is very
large, The disadvantage is that their computational complexity is exponential and
sub exponential, which is the reason for choosing huge prime numbers p and q in
RSA.
In the factor-based method introduced in the previous section, b2 mod n can be the
residual of the minimum absolute value of b2 under mod n, that is
n
b2 ≡ b2 mod n(mod n), |b2 mod n| ≤ .
2
In this way, b2 mod n can be decomposed into the product of some smaller prime
numbers. The continued fraction method √ is the best method at present. How to find
the integer b, so that |b2 mod n| < 2 n, b2 mod n is more likely to be decomposed
into the product of some small prime numbers. First, we introduce what is continued
fraction and some basic properties.
Suppose x ∈ R is a real number, [x] is the integer part of x, and {x} is the decimal
part of x. Let a0 = [x], if {x} = 0, and let a1 = [ {x}
1
], because of x = [x] + {x}, there
is
1 1
x = a0 + = a0 + .
{x} a1 + {{x}−1 }
If {{x}−1 } = 0, write
a2 = [{{x}−1 }−1 ],
consider
{{{x}−1 }−1 }−1 ,
So we got
1
x = a0 + .
a1 + 1
a2 + a 1
3 +···
The above formula is called the continued fraction expansion of real number x. To
save space, write x = [a0 , a1 , . . . , an , . . .], if and only if x is a rational number, the
continued fraction of x is expanded to be finite, denote as
b0 a0 b1 a1 a0 + 1
= , = .
c0 1 c1 a1
The progressive fraction bcii of the real number x is a reduced fraction, that is
(bi , ci ) = 1, and has the following properties.
(ii) If i ≥ 1, then
bi ci−1 − bi−1 ci = (−1)i−1 . (5.34)
Proof We prove that (i) by induction. Obviously, the proposition of i = 2 holds, that
is
b2 a2 b1 + b0 a2 (a1 a0 + 1) + a0
= = .
c2 a2 c1 + c0 a2 a1 + 1
bi ai bi−1 + bi−2
= .
ci ai ci−1 + ci−2
So (i) holds.
We prove Formula (5.34) by induction, when i = 1,
b1 c0 − b0 c1 = a1 a0 + 1 − a1 a0 = 1 = (−1)0 .
So when i = 1, the proposition holds, and when i, the proposition holds, that is
Then
bi+1 ci − bi ci+1 = (ai+1 bi + bi−1 )ci − bi (ai+1 ci + ci−1 )
= bi−1 ci − bi ci−1
= (−1)i .
Thus
bi bi
|bi −
2
x 2 ci2 |
= x − x +
ci2
ci ci
1 1
< ci2 · x+ x+ .
ci ci+1 ci ci+1
So
ci 1
|bi 2 − x 2 ci2 | − 2x < 2x −1 + + 2
ci+1 2xci+1
ci 1
< 2x −1 + +
ci+1 ci+1
ci+1
< 2x −1 + = 0.
ci+1
Because √ √
|bi2 − nci2 | < 2 n, =⇒ bi2 mod n < 2 n, ∀ i ≥ 0.
Combining the above Lemma 5.23 with the factorization method, we obtain the
continued fraction decomposition method.
Continued fraction decomposition method:
The operations of mod n involved in this algorithm, except that it is specially
pointed out, are the minimum nonnegative residue of mod n. If n √ is a large odd
number,
√ it is also
√ a compound number, first let b−1 = b, b0 = a 0 = [ n], and x0 =
n − a0 = { n}, calculate b02 mod n, in fact, b02 mod n = b02 − n. Second, consider
i = 1, 2, . . .. To determine bi , we proceed in several steps:
1. Let ai = [ xi−1
1
], and xi = xi−1
1
− ai (i ≥ 1).
2. Let bi = ai bi−1 + bi−2 , the minimum nonnegative residual bi mod n of bi under
mod n is still recorded as bi .
3. calculate bi2 mod n.
√
By Lemma 5.23, bi2 mod n < 2 n, it can be decomposed into the product of some
small prime numbers. If a prime number p appears in the decomposition of two or
more bi2 mod n, or in the decomposition of an bi2 mod n, p appears to an even power,
p is called a standard prime number, in other words, a standard prime p is
Or
p α bi2 mod n, α is even.
r
and c = j∈B p j j , where
226 5 Prime Test
1
rj = αi j , ∀ j ∈ B.
2 i∈A
Solution: We calculate ai , bi and bi2 mod n in turn, where bi = (ai bi−1 + bi−2 ) mod n,
the table is as follows:
i 0 1 2 3 4
ai 95 3 1 26 2
bi 95 286 381 1119 2619
bi2 mod n −48 139 −7 87 −27
From the value of bi2 mod n, we can choose the factor base B as B = {−1, 2, 3, 7}.
Then bi2 mod n is the number of B-number, when i = 0, 2, 4, . . .. The corresponding
binary vector is
Because b2 ≡ c2 (mod 9073), that is 38342 = 362 (mod 9073), but 3834 ≡ ±36
(mod 9073), so we get a nontrivial factor of n = 9073, d = (3834 + 36, 9073) = 43.
Thus 9073 = 43 · 211, the factorization of 9073 is obtained.
Exercise 5
1. p is a prime, if and only if b p−1 ≡ 1(mod p 2 ), p 2 to base b is a Fermat pseudo
prime.
2. What is the minimum pseudo prime number with Fermat pseudo prime for base
5? What is the minimum Fermat pseudo prime number for base 2?
3. n = pq, p = q are two primes, let d = ( p − 1, q − 1), it is proved that n to base
b is Fermat pseudo prime number, if and only if bd ≡ 1(mod n), and calculate
the number of bases b.
4. If b ∈ Z∗n , n to base b is Fermat pseudo prime, then n to base −b and b are Fermat
pseudo prime numbers.
5. If n to base 2 is Fermat pseudo prime, then N = 2n − 1 is also Fermat pseudo
prime.
n
−1
6. If n to base b is Fermat pseudo prime, and (b − 1, n) = 1, then N = bb−1 to
base b is also Fermat pseudo prime.
5.5 Continued Fraction Method 227
8. Find all Carmichael numbers of form 3 pq and all Carmichael numbers of form
5 pq.
9. Prove that 561 is the minimum Carmichael number.
10. If n to base 2 is a Fermat pseudo prime, prove N = 2n − 1 is a strong pseudo
prime.
11. There are infinite Euler pseudo primes and strong pseudo primes for base 2.
12. If n to base b is a strong pseudo prime, then n to base bk is also a strong pseudo
prime for any integer k.
13. The Fermat factorization method is used to decompose the positive integer as
follows:
14. The Fermat factorization method is used to decompose the positive integer as
follows:
References
Adelman, L. M., Pomerance, C., & Rumely, R. S. (1983). On distinguishing prime number from
composite numbers. Annals of Mathematics, 117, 173–206.
Berent, R. P., & Pollared, J. M. (1981). Factorization of the eighth Fermat number. Mathematics of
Computation, 36, 627–630.
Blair, W. D., Lacampague, C. B., & Selfridge, J. L. (1986). Factoring large numbers on a pocket
calculator. The American Mathematical Monthly, 93, 802–808.
Brent, R. P. (1980). An improved Monte Carlo factorization algorithm. BIT, 20, 176–184.
Cohen, H., & Lenstra, H. W. (1984). Primality testing and Jacobi sums. Mathematics of Computa-
tion, 142, 297–330.
Dawonport, H. (1982). The higher arithmetic. Cambridge University Press.
Dickson, L. E. (1952). History of the theory of number (Vol. 1). Chelsea.
Dixon, J. D. (1984). Factorization and primality tests. The American Mathematical Monthly, 91,
333–352.
Guy, R. K. (1975). How to factor a number. In Proceedings of the 5th Manitoba Conference on
Numerical Mathematics (pp. 49–89).
Kranakis, E. (1986). Primality and cryptography. Wiley.
228 5 Prime Test
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 6
Elliptic Curve
In 1985, mathematician v. Miller introduced elliptic curve into cryptography for the
first time. In 1987, mathematician N. Koblitz further improved and perfected Miller’s
work and formed the famous elliptic curve public key cryptosystem. Elliptic curve
public key cryptosystem, RSA public key cryptosystem and ElGamal public key
cryptosystem based on discrete logarithm are recognized as the three major public
key cryptosystems, which occupy the most prominent position in modern cryptogra-
phy. Compared with RSA cryptography, elliptic curve cryptography can provide the
same or higher level of security with a shorter key; compared with ElGamal cryp-
tosystem, they are based on the same mathematical principle and are essentially based
on discrete logarithm cryptosystem. ElGamal cryptosystem is based on the discrete
logarithm of multiplication group over finite field, and elliptic curve cryptosystem is
based on the discrete logarithm of Mordell group of elliptic curve over finite field, but
choosing elliptic curve has more flexibility than choosing finite field, so elliptic curve
cryptosystem has attracted more attention This paper systematically and comprehen-
sively introduces elliptic curve cryptography from the three aspects of cryptography
mechanism and factorization, in order to make readers better understand and master
this public key cryptography mechanism.
y 2 = x 3 + ax + b, where a ∈ E, b ∈ E given.
C E represents the elliptic curve, and “O” represents the infinity point, i.e.,
(ii) If χ (E) = 2, then an elliptic curve C E on the field E with the characteristic
of 2 is defined as
line between P and Q on the x y−plane, which is the intersection of the connecting
line and the elliptic curve, define P + Q = −R, is the specular reflection point of
R. If Q is infinity. Then define P + O = P.
(4) If Q = −P, that is, P and Q have the same x-coordinate, and P + Q = O is
defined as infinity.
(5) If P = Q is a finite point on C E . Then the tangent of C E at P has exactly an
intersection R with C E , define P + P = −R.
We use the geometric construction method to define the addition on elliptic curve
C E , for the connection of finite points with different x-coordinates and why the
tangent at the finite point has only a unique intersection with C E , it needs strict
mathematical proof. We attribute it to the following lemma.
Lemma 6.1 Let P = (x1 , y1 ), Q = (x2 , y2 ) be two finite points on elliptic curve
C E , and x1 = x2 , then
(i) The line between P and Q has only a unique intersection R = (x3 , y3 ) with
C E , satisfies R = P, R = Q, where
x3 = ( xy22 −y ) − x1 − x2 ,
1 2
−x1
(6.4)
y3 = −y1 + ( xy22 −y 1
−x1
)(x1 − x3 ).
(ii) Let α be the value of derivative ddyx at point P, then the tangent of point P and
C E only have a unique intersection R = (x3 , y3 ), R = P, where
3x12 +a 2
x3 = ( 2y1
) − 2x1 ,
3x12 +a (6.5)
y3 = −y1 + ( 2y1 )(x1 − x3 ).
Proof Let the functional equation of the connecting line between P and Q be y =
αx + β on the x y−plane, where
y2 − y1
α= , β = y1 − αx1 .
x2 − x1
Therefore, the three solutions of x 3 − (αx + β)2 + ax + b = 0 are x, and each solu-
tion will produce an intersection. But we assume that P and Q are at the intersection,
so there is only the third intersection R = (x, αx + β) = (x3 , αx3 + β). Because the
three solutions x1 , x2 , x3 of equation (6.6) satisfy the following relationship
x1 + x2 + x3 = α 2 .
There is
232 6 Elliptic Curve
x3 = ( xy22 −y ) − x1 − x2 ,
1 2
−x1
y3 = αx3 + β = −y1 + ( xy22 −y 1
−x1
)(x1 − x3 ).
Thus, (6.4) holds. If point Q is infinitely close to point P, the connecting line becomes
the tangent of curve C E at point P, now
dy 3x 2 + a
α= |(x1 ,y1 ) = 1 .
dx 2y1
So the tangent has a unique intersection with C E , R = P, R = (x3 , αx3 + β), where
3x12 +a 2
x3 = α 2 − 2x1 = ( 2y1
) − 2x1 ,
3x 2 +a
y3 = −y1 + ( 2y1 1 )(x1 − x3 ).
Remark 6.1 In Lemma 6.1, if P = (x1 , 0), that is y1 = 0, then the only intersection
of the tangent of point P and C E is defined as the infinity point “O .
From Definition 6.1 and Lemma 6.1, we have the following important corollaries.
Corollary 6.1 (i) All points of elliptic curve C E form an Abel group under addition,
in which the infinity point “O” is the zero element of the group. This group is called
Mordell group.
(ii) If P = (x1 , y1 ), Q = (x2 , y2 ) is a rational point, that is, x1 , y1 , x2 , y2 is a
rational number, then another unique intersection R between the line between P
and Q and C E is also a rational point.
Proof (i) is directly given by Definition 6.1. Conclusion (ii) is directly derived from
Formula (6.4) and Formula (6.5) of Lemma 6.1.
G = T or (G) ⊕ Fr ee(G).
Mordell first proved the following important conclusions. Mordell Theorem: The
Abel group G on elliptic curve C E (E = Q is a rational number field) is finitely
generated; in other words, G is a finitely generated Z-module. Therefore, Mordell
group G can be decomposed into
G = T or (G) ⊕ Z(α1 , α2 , . . . , αr ).
where
x 3 + ax + b, if p = 2,
f (x) = a, b, c ∈ Fq . (6.8)
x 3 + ax 2 + bx + c, if p = 2.
Theorem 6.1 (Hasse Theorem) Let Nq be the number of elliptic curve F(x, y) = 0
at the midpoint of Fq , then we have
√
|Nq − (q + 1)| ≤ 2 q.
We use Fq (x) to represent the rational function field on Fq , then the univariate
algebraic function field defined by y 2 = f (x) can be regarded as a quadratic finite
extension field on Fq (x). The genus d of this function field is d = 3. Hasse can
prove that the Riemann hypothesis on this special algebraic function field is true;
that is, all zeros of the corresponding Riemann ξ −function lie on the straight line of
s = 21 + it. A special case of this conclusion is
√ √
| χ (x 3 + ax + b)| ≤ (d − 1) q = 2 q. (6.11)
x∈Fq
By (6.10),
√
|Nq − (q + 1)| ≤ 2 q.
Remark 6.2 (6.11) is called the characteristic sum over a finite field, so that g(x) ∈
Fq [x] is any polynomial and χ is any nontrivial multiplication characteristic over Fq ,
according to A. Weil’s famous theorem, we have the following general characteristics
and estimates, √
| χ (g(x))| ≤ (deg g − 1) q.
x∈Fq
+∞
1
Z (T ) = Z (T, C E ) = exp( Nq n T n ). (6.12)
n=1
n
The above formula can also be derived directly from Hasse theorem.
Now let’s look at a specific elliptic curve in F2 , y 2 + y = x 3 ; thus, we have a
better understanding of A. Weil’s theorem. Because F(x, y) = y 2 + y − x 3 = 0 has
three points in F2 , the zeta function on the elliptic curve,
+∞
Nn n
Z (T ) = exp( T )
n=1
n
2T 2 + 1
= .
(1 − T )(1 − 2T )
√ √
Write 2T 2 + 1 = (1 − α1 T )(1 − α2 T ), where α1 = i 2, α2 = −i 2. Take loga-
rithms on both sides of the above formula and compare the coefficients of T n on both
sides,
2n + 1, if n is odd,
Nn = n n
2 + 1 − 2(−2) 2 , if n is even.
An elliptic curve over a finite field Fq forms a finite Abel group G, which is similar to
Fq∗ ; therefore, the elliptic curve public key cryptosystem can be constructed by using
discrete logarithm. Compared with other public key cryptosystems based on discrete
logarithm (such as ElGamal cryptosystem), elliptic curve cryptosystem has greater
flexibility, because when a huge q is selected, the working platform of ElGamal
cryptosystem has only one multiplication group Fq∗ , but multiple elliptic curves can
be defined on Fq , so there will be multiple Mordell groups to choose, and elliptic
curve cryptosystem has greater concealment and security.
6.2 Elliptic Curve Public Key Cryptosystem 237
where Nq is the number of points of curve C E and the order of Mordell group G.
Proof Let P = (x, y), y = 0, then P + P = (x , y ), where x and y are determined
by Equation (6.5), (6.5) (addition, subtraction, multiplication, division, etc.) involved
in the formula shall not exceed 20 calculations, and the bit operation times of each
calculation is O(log3 q). By the “repeated square method,” k P can be transformed
into log k steps, thus
T ime(k P) = O(log k log3 q).
1 ≤ n ≤ k M, n = mk + j, 0 ≤ m < M, 1 ≤ j ≤ k. (6.15)
Fq ∼
= F p [x]/<g(x)> = {a0 + a1 x + · · · + ar −1 x r −1 |ai ∈ F p }.
A = {τ (n)|1 ≤ n ≤ k M} ⊂ Fq .
Next, for each m(0 ≤ m < M), we establish a 1-1 correspondence σ between m
and the point on elliptic curve C E . Arbitrary choice 1 ≤ j ≤ k, then n = mk + j
corresponds to an element in Fq , that is τ (n) = x j ∈ Fq . For each x j , consider the
solution of the following equation.
y 2 = f (x j ) = x 3j + ax j + b. (6.16)
If the above equation has a solution, let y1 be one of the solutions, then Pm =
(x j , y1 ) ∈ C E , we let σ (m) = Pm , the inverse mapping σ −1 (Pm ) of σ is
τ −1 (x j ) − 1
σ −1 (Pm ) = [ ]. (6.17)
k
6.2 Elliptic Curve Public Key Cryptosystem 239
τ −1 (x j ) − 1 j −1
[ ] = [m + ] = m.
2 k
1
P{σ Successfully implemented} ≥ 1 − .
2k
We complete the proof of lemma.
Remark 6.3 f (x j ) = x 3j + ax j + b is a square number, that is, the probability that
Equation (6.16) has a solution is exactly Nq /2q, where Nq is the number of points
of C E . By Hasse’s theorem, Nq /2q is very close to 21 .
Definition 6.3 Let C E be an elliptic curve over a finite field Fq and B ∈ C E be a
point. For any point P on C E , if there is an integer x, such that x B = P, x is called
the discrete logarithm of P to base B.
With the above preparation, we can establish elliptic curve public key cryptosystem.
Diffie–Hellman key conversion principle
Symmetric cryptosystem, also known as classical cryptosystem or traditional
cryptosystem, is the mainstream cryptosystem before the advent of public key cryp-
tosystem. It has high efficiency because its encryption and decryption share the same
algorithm (such as DES, the data encryption standard algorithm launched by the
American Bureau of standards in 1977). When Diffie and Hellman proposed asym-
metric cryptosystem, they pointed out that symmetric cryptosystem and asymmetric
cryptosystem are not completely separated. The two cryptosystems are interrelated
and can even be used together. Diffie–Hellman key conversion principle is based on
the following mathematical principles.
240 6 Elliptic Curve
σ (a) = a0 + a1 p + · · · + ar −1 pr −1 ∈ Zrp .
Z N = {0, 1, 2, . . . , N − 1} ⊂ {0, 1, 2, . . . , N − 1, N , . . . , pr − 1} = Z pr .
σ
That is, Z N is regarded as a subset of Z pr . Let Z pr −→ F pr be 1-1 correspond, so σ
gives that Z N → F pr is an injection. The Lemma holds.
From the above conclusions, we can establish Diffie–Hellman’s key conversion
principle. Because symmetric cryptographic keys are related to the numbers of Z N ,
each number in Z N can be embedded into a finite field Fq by Lemma 6.6. Therefore,
the discrete logarithm on Fq can encrypt each embedded number asymmetrically, so
that the two cryptosystems can be combined with each other.
Taking the affine cryptosystem introduced in Chap. 4 as an example, A is a k × k-
order reversible square matrix in Z N , b = (b1 , b2 , . . . , bk ) ∈ ZkN is a given vector,
affine transformation f = (A, b) gives the encryption algorithm of each plaintext
unit m = m 1 m 2 · · · m k ∈ ZkN .
⎛⎞ ⎛ ⎞
m1 b1
⎜ .. ⎟ ⎜ .. ⎟
f (m) = c = A ⎝ . ⎠ + ⎝ . ⎠ .
mk bk
Let A = (ai j )k×k , each ai j ∈ Z N . By Lemma 6.6, we can embed ai j into a finite field
Fq . ai j is encrypted again by using the discrete logarithm algorithm on Fq , so that
the two cryptosystems can be effectively combined.
6.2 Elliptic Curve Public Key Cryptosystem 241
(a0 , a1 , . . . , ar −1 ) → a0 + a1 p + · · · + ar −1 pr −1 .
1 ≤ e ≤ N , and (e, N ) = 1.
de ≡ 1(mod N ), and 1 ≤ d ≤ N .
Suppose user A wants to encrypt and send plaintext message Pm to user B, so that
(e A , d A ) and (e B , d B ) are the respective private keys of A and B. First, A sends a
message e A Pm to B, and then B returns the message e B e A Pm to A, A can calculate
the message by using the private key d A . Because N Pm = 0, d A e A ≡ 1(mod N ), so
d A e B e A Pm = e B Pm .
Finally, user A sends the calculation result e B Pm to B, and user B can read the original
real message Pm of user A by using the private key d B , because d B e B ≡ 1(mod N ),
so
d B e B Pm = Pm .
242 6 Elliptic Curve
It should be noted that even if user B receives the message e A Pm sent by A for the
first time, e A Pm is given to user B as a point Q = e A Pm on the elliptic curve. If B
does not calculate the discrete logarithm, e A and d A are not known. Although the
last user B already knows the plaintext Pm , the calculation of the discrete logarithm
of Q under base Pm is very complex. Similarly, when user A receives a reply from
user B and calculates e B Pm , he cannot know B’s private key (e B , d B ).
ElGamal elliptic curve cryptography
ElGamal cryptosystem is another elliptic curve cryptosystem completely different
from Massey–Omura cryptosystem. In this system, the order N of Mordell group of
elliptic curve does not need to be known. All users jointly select a fixed finite field
Fq , an elliptic curve C E on Fq and a fixed point B ∈ C E on C E as the basis of discrete
logarithm. Each user randomly selects an integer a(0 ≤ a < Nq ) as the private key,
calculates Q = a B ∈ C E and discloses it. Its workflow is as follows:
If user A wants to encrypt and send a plaintext unit Pm to user B, the public key
of A is Q A = a A · B, the private key is a A , the public key of B is Q B = a B · B and
f
the private key is a B . The encryption algorithm of A −→ B is
The decryption algorithm is that user B multiplies the first number with private key
a B and then subtracts the second number. That is,
Because Q B = a B · B, there is
f −1 (c) = Pm + ka B · B − ka B · B = Pm .
Where k is an integer randomly selected by user A. This integer k does not appear
in cryptosystemtext c and is called a layer of “ mask” added by user A to protect
plaintext Pm . In fact, the cryptosystemtext c = (A1 , A2 ) received by user B is two
points on elliptic curve C E , where
A1 = k B, A2 = Pm + k Q B = Pm + k(a B · B).
Even if the third user knows the private key a B of user B (assuming that the private
key of user B is not secure), decryption with A2 − a B · B cannot obtain plaintext
Pm , because
A2 − a B · B = Pm + k Q B − a B B = Pm + k(a B · B) − a B · B = Pm ,
if k = 1.
6.2 Elliptic Curve Public Key Cryptosystem 243
The two elliptic curve cryptosystems introduced above are based on the selected
elliptic curve C E and a point B on C E as the basis of discrete logarithm. How to
randomly select C E and B needs further research.
Proof This conclusion can be deduced directly from the root formula of cubic alge-
braic equation.
Check whether f (x) = x 3 + ax + b has multiple roots. From Lemma 6.7, just check
whether discriminant 4a 3 + 27b2 is 0. If f (x) has no multiple roots, then select the
elliptic curve y 2 = x 3 + ax + b. Where (x0 , y0 ) ∈ C E is a point on an elliptic curve.
So let B = (x0 , y0 ) is the base of discrete logarithm. Similarly, for q = 2r or q = 3r ,
we can also randomly draw an elliptic curve C E and determine the basis B ∈ C E of
the discrete logarithm at the same time.
It should be noted that at present, no algorithm can calculate the number of points
Nq of any elliptic curve. Some special algorithms, such as schoof algorithm, are quite
complex and lengthy in practical application, although the computational complexity
is polynomial.
Now we introduce the second method of selecting elliptic curves, called mod p
method. An elliptic curve C E , if E is a number field, such as E = R, Q, C, C E
is called a global curve. We use the mod p method to convert a global curve into a
“local” curve. Firstly, a point B ∈ C E on a global curve C E and C E is selected, where
B is the group element of Mordell group, its addition order is ∞, where E = Q is
the rational number field.
C E : y 2 = x 3 + ax + b, a, b ∈ Q.
Let p be a prime number and coprime with the integers in the denominators of a and
b, then we obtain an elliptic curve on F p ,
and a point B mod p on C E mod p, when localizing an elliptic curve, the choice of
prime p only needs to satisfy
In this way, the Mordell group of C E mod p is a cyclic group, and any finite point
of C E mod p will be the generator of the group. At present, there is no deterministic
algorithm for selecting the prime number p satisfying Formula (6.20), and it is gen-
erally speculated that a probabilistic algorithm with success probability ≥ O( log1 p )
exists.
In 1986, mathematician H.W. Lenstra used elliptic curve to find a new method of
factor decomposition. Lenstra’s method has greater advantages than the known old
algorithms in many aspects, which is also one of the main reasons why elliptic curve
has attracted more and more attention in the field of cryptography, We first introduce
a classical factorization method called Pollard ( p − 1) algorithm.
( p − 1) algorithm
Suppose n is a compound number, and p is a prime factor of n; of course, p is
unknown and needs to be further determined. If p − 1 happens to have some small
prime factors, or all prime factors of p − 1 are not too large, the essence of ( p − 1)
method is to find the prime factor p with this property of n. ( p − 1) method can be
completed in the following steps:
1. Let B be a positive integer. Select a positive integer k so that k is a multiple of
most positive integers smaller than B, for example, k = B!, or k can be the least
common multiple of all positive integers smaller than B.
2. Select a positive integer a to satisfy 2 ≤ a ≤ n − 2, (a, n) = 1, such as a = 2,
or a = 3, and any randomly selected positive integer.
3. Using the “repeated square method” to calculate the minimum nonnegative resid-
ual a k mod n of a k under mod n.
4. The maximum common divisor d = (a k − 1, n) of a k − 1 and n is calculated
by Euclidean rolling division method.
5. If d = 1 or d = n, that is, if d is the trivial factor of n, re select a, and then repeat
steps 1–4 above.
In order to explain the working principle of ( p − 1) algorithm, we further assume
that k is a multiple of all positive integers less than B, and p|n,
p−1= piαi , where ∀ piαi ≤ B. (6.21)
In order to randomly generate an elliptic curve C E over the rational number field
Q, we randomly select three integers a, x0 , y0 ∈ Z, let b = y02 − x03 − ax0 to satisfy
And a point (x0 mod p, y0 mod p) ∈ C E mod p on C E mod p, let’s write this point
on C E mod p with P, that is
k = a0 + a1 2 + a2 22 + · · · + am−1 2m−1 , ∀ ai = 0 or 1.
P1 mod p + P2 mod p = 0.
If x1 ≡ x2 (mod p), it is obvious that Formula (6.4) is true from Formula (6.25).
Might as well make x1 ≡ x2 (mod p). If P1 = P2 , now x1 = x2 , y1 = y2 , we only
need p 2y1 . If p|2y1 , because the coordinates of 2P1 = (x, y) are determined by
equation (6.5), 3x12 +α 2
x = ( 2y 1
) − 2x1 ;
3x12 +α
y = y1 − ( 2y1
)(x1 − x).
3x 2 +a
Where α = 2y1 1 . By p|2y1 , =⇒ 3x12 + α ≡ 0(mod p). Because n is an odd number,
so p|y1 , we have
x13 + ax1 + b ≡ 0(mod p);
3x12 + a ≡ 0(mod p).
That is, x1 is the root of f (x) = x 3 + ax + b and derivative f (x) = 3x 2 + a(mod p).
This is contradictory to (4a 3 + 27b2 , n) = 1. So you might as well let P1 = P2 , now
x1 ≡ x2 (mod p), x1 = x2 (because P1 = −P2 ), we can write
x2 = x1 + t pr , r ≥ 1.
The numerator and denominator of t and p are mutually prime, which can be deduced
from Formula (6.4),
y2 = y1 + spr .
y22 − y12
≡ 3x12 + a(mod p).
x2 − x1
y22 − y12 y2 − y1
=
(y2 + y1 )(x2 − x1 ) x2 − x1
Lenstra algorithm.
Let n be an odd compound number, we hope to find a nontrivial factor d of n, d|n,
1 < d < n, so there is factorization n = d · dn . Previously, we have introduced the
random selection of an elliptic curve C E on rational number field Q and a point P on
C E . Lenstra’s algorithm hopes to factorize n by (C E , P). There is no doubt that the
Lenstra algorithm to be explained below is also a probability algorithm. If (C E , P)
cannot be factorized successfully, as long as the probability of failure is p < 1,
select another elliptic curve and a point above. If this continues, after randomly and
independently selecting n elliptic curves, the probability of successful factorization
of n,
n
P{n = d · } ≥ 1 − p n ( p < 1).
d
When n is sufficiently large, the success probability of Lenstra algorithm can be
infinitely close to 1. Therefore, the so-called Lenstra algorithm can be simply sum-
marized as an algorithm that factorizes n by using any rational elliptic curve (C E , P),
and its failure probability is p < 1.
Let (C E , P) be a given rational elliptic curve, and B and C be the positive upper
bound of selection. Let k be divided by some small prime powers, to be exact,
k= l αl , (6.27)
1<l≤B
k1 · (P mod p) = 0, ∀ p|n.
From the selection of equation k in (6.27), there is a maximum probability k1 |k, thus
k · (P mod p) = 0, ∀ p|n.
9. The deterministic algorithm can map the embedding of plaintext units to any
Fq − elliptic curve. Please give the specific algorithm process for the following
elliptic curves:
(1) C E : y 2 = x 3 − x, when q ≡ 3(mod 4),
(2) C E : y 2 + y = x 3 , when q ≡ 2(mod 3).
10. Let C E be an elliptic curve on the finite field F p , and Nr represents the number
of midpoint of C E in the finite field F pr , then
(i) If p > 3, when r > 1, Nr is not prime.
(ii) When p = 2, 3, a counterexample is given to show that Nr is a prime number.
11. Take an example of an elliptic curve C E , which has only one point on F4 , the
infinity point. Take Nr as the number of points of C E on F4r , then Nr is the
square of Mersenne prime 2r − 1.
12. Decompose n = 53467 at k = 840, a = 2 using Pollard’s ( p − 1) method.
k
13. Let n k = 22 + 1 be Fermat number, the following is Pepin’s method to detect
whether n k is a prime number:
2k−1
(i) n k is a prime, if and only if there is an integer a, a 2 ≡ −1(mod n k ).
(ii) If n k is a prime, then a ∈ Z∗n k over 50% has the congruence property of (i).
(iii) When k > 1, we can always choose a = 3, 5, or a = 7.
References
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 7
Lattice-Based Cryptography
n
|x| = x, x = xi2 . (7.2)
i=1
λ ∈ R, then λ · x is defined as
If the inner product x, y = 0 of two vectors x and y, x and y are said to be
orthogonal, denote as x⊥y.
Proof (i) and (ii) can be derived directly from the definition. To prove (iii), let
x = (x1 , x2 , . . . , xn ), y = (y1 , y2 , . . . , yn ) ∈ Rn , by Hölder inequality:
n n n 21
xi yi ≤ 2
xi 2
yi .
i=1 i=1 i=1
So there is
n
n
n
n
|x + y|2 = (xi + yi )2 = xi2 + 2 xi yi + yi2
i=1 i=1 i=1 i=1
n
n n
≤ xi + 2
2
xi yi + yi2
i=1 i=1 i=1
⎛ ⎞2
n n
≤⎝ xi2 + yi2 ⎠ = (|x| + |y|)2 ,
i=1 i=1
if x⊥y, then
|x ± y|2 = |x|2 + |y|2 .
From Pythagorean theorem, for orthogonal vector x⊥y, we have the following
conclusion,
|x + y| = |x − y|, if x⊥y. (7.4)
We have
Lemma 7.2 For any A ∈ Rm×n , and any positive vector c = (c1 , c2 , . . . , cn ) ∈ Rn ,
then R(A, c) is a symmetric convex body in Rn .
1 1
ρ= (1 + λ), σ = (1 − λ).
2 2
Then ρ ≥ 0, σ ≥ 0, and ρ + σ = 1. So there is
ρx + σ (−x) = λx ∈ R.
Proof Let η1 be the sign of λ and η2 be the sign of μ, then by Lemma 7.3,
256 7 Lattice-Based Cryptography
x = η1 (|λ| + |μ|)x ∈ R,
y = η2 (|λ| + |μ|)y ∈ R.
|λ| |μ|
Let ρ = |λ|+|μ|
, σ = |λ|+|μ|
, then ρ + σ = 1. By definition, we have
λx + μy = ρx + σ y ∈ R,
thus the Lemma holds. And this result is not difficult to be extended to the case of n
variables.
and
[x] = (δ1 , δ2 , . . . , δn ) ∈ Zn , (7.7)
where [xi ] is the square bracket function of xi and δi is the nearest integer to xi .
For each integral point u ∈ Zn , define
Ru = {x ∈ R|[[x]] = u}
and
Du = {x − u|x ∈ Ru }.
Because Ru 1 ∩ Ru 2 = ∅, if u 1 = u 2 . Thus by R = u Ru ,
⇒ V = Vol(R) = Vol(Ru )
u
= Vu > 1,
u
There is
Vu ≤ 1,
u
7.1 Geometry of Numbers 257
Lemma 7.6 (Minkowski) Let R be a symmetric convex body, and the volume of R
V = Vol(R) > 2n ,
Proof Let
1 1
R= x|x ∈ R .
2 2
Thus
1 1
Vol R = V > 1,
2 2n
1 1
u= y − z, y ∈ R, z ∈ R.
2 2
By Lemma 7.4, then u ∈ R. The Lemma holds.
Remark 7.1 The above Minkowski’s conclusion cannot be improved, that is V > 2n ,
it cannot be improved to V ≥ 2n . A counterexample is
Vol(R(A, c)) = 2n d −1 c1 c2 · · · cn .
258 7 Lattice-Based Cryptography
Proof Let A = (ai j )n×n . Write Ax = y, then x = A−1 y. And let A−1 = (bi j )n×n ,
then for any xi , there is
n n
|xi | = bi j y j ≤ |bi j | · c j ≤ B,
j=1 j=1
1
dx = dx1 · · · dxn = dy1 dy2 · · · dyn .
|det(A)|
Thus
c1 cn
1
Vol(R(A, c)) = ··· dy1 dy2 · · · dyn
|det(A)|
−c1 −cn
n
= 2n d −1 ci ,
i=1
Remark 7.2 In (7.5), “≤” is changed to “<” to define R(A, c), and the above lemma
is still holds.
Now consider the general situation, let A = (ai j )m×n . If m > n, and rank(A) ≥ n,
then R(A, c) defined by Eq. (7.5) is still a bounded region. Obviously if m < n, or
m = n, rank(A) < n, then R(A, c) is an unbounded region, and V = ∞. Therefore,
we have the following Corollary.
Corollary 7.1 Let A = (ai j )m×n , m < n or m = n, det(A) = 0, then for any small
positive vector c = (c1 , c2 , . . . , cn ), 0 < ci < ε, R(A, c) contains a nonzero integer
point. In other words, the following m inequalities
n
ai j x j < ε, 1 ≤ i ≤ m.
j=1
Proof When ε > 0 given, then Vol(R(A, c)) = ∞ > 2n . By Lemma 7.6, R(A, c)
contains at least one nonzero zero point.
When A ∈ Rn×n is a reversible square matrix, we discuss the nonzero integral point
in symmetric convex body R (A, c).
2n c1 c2 · · · cn
Vol(R (A, c)) = > 2n ,
|det(A)|
by Lemma 7.6 and 7.7, then the proposition holds, we only discuss the case when
the equal sign of formula (7.9) holds.
Let ε be any positive real number, 0 < ε < 1, then by Lemma 7.7, there is a
nonzero integral solution x (ε) = (x1(ε) , x2(ε) , . . . , xn(ε) ) ∈ Zn satisfies
⎧
⎪
⎪ n
⎪
⎪ (ε)
a1 j x j ≤ c1 + ε ≤ c1 + 1,
⎪
⎪
⎨ j=1
(7.10)
⎪
⎪
⎪
⎪ n
⎪
⎪ (ε)
ai j x j < ci , 2 ≤ i ≤ n.
⎩
j=1
|x (ε)
j | ≤ B, 1 ≤ j ≤ n.
260 7 Lattice-Based Cryptography
The integral point x (ε) satisfying the above bounded condition is finite, so there must
be a nonzero integral point x = 0, which holds (7.10) for any ε > 0. Let ε → 0, then
the Lemma holds.
In the following discussion, we make the following restrictions on R ⊂ Rn :
Property (i) can be derived from the boundedness of R, and property (ii) can be
derived directly from the definition of R(A, c). Later we will see that 0 ≤ F(x) < ∞
holds for all x ∈ Rn . The main property of distance function F(x) is the following
Lemma.
Lemma 7.9 If F(x) is a distance function defined by R satisfying the constraints,
then
(i) Let λ ≥ 0, then x ∈ λR ⇔ F(x) ≤ λ;
(ii) F(λx) = |λ|F(x) holds for all λ ∈ R, x ∈ Rn ;
(iii) F(x + y) ≤ F(x) + F(y), ∀ x, y ∈ Rn .
Proof Since R is closed, by the definition, F −1 (x)x ∈ R. Thus, if λ ≥ F(x), by
Lemma 7.3, then
F(x) F(x)
−1
λ x= −1
· F (x)x, ≤ 1.
λ λ
F(λx) ≤ |λ|F(x).
7.1 Geometry of Numbers 261
Conversely, let δ = F(λx), because of δ −1 λx ∈ R, you might as well let λ > 0, thus
δ
F(x) ≤ =⇒ λF(x) ≤ F(λx).
λ
So there is F(λx) = |λ|F(x), (ii) holds.
To prove (iii), we let μ1 = F(x), μ2 = F(y), =⇒ μ−1 −1
1 x ∈ R, μ2 y ∈ R. By
Lemma 7.4, we have
μ1 μ2
(μ1 + μ2 )−1 (x + y) = (μ−1 x) + (μ−1 y) ∈ R.
μ1 + μ2 1 μ1 + μ2 2
Thus
F(x + y) ≤ μ1 + μ2 .
Corollary 7.2 Let R ⊂ Rn meet the limiting conditions (7.11), and Vol(R) > 0,
then
(i) ∀ x ∈ Rn , there is λ such that x ∈ λR;
(ii) Let {α1 , α2 , . . . , αn } ⊂ R be a set of bases of Rn , then
n
μi αi ||μ1 | + |μ2 | + · · · + |μn | ≤ 1 ⊂ R.
i=1
Proof Because F(x) < ∞, so by (i) of Lemma 7.9, we can directly deduce the
conclusion of (i) and (ii) given directly by Lemma 7.4.
By Lemma 7.6, let V be the volume of R, then Vol(λR) = λn V , for the first
continuous minimum λ1 , we have the following estimation
λn1 V ≤ 2n . (7.14)
For λ j ( j ≥ 2), there is no explicit upper bound estimation, but we have the following
conclusions.
Lemma 7.10 Let R ⊂ Rn be a convex body satisfying the limiting condition (7.11),
V = Vol(R), λ1 , λ2 , . . . , λn be n continuous minima of R, then we have
2n
≤ V λ1 λ2 · · · λn ≤ 2 n . (7.15)
n!
Proof We only prove the left inequality of the above formula, and we continuously
select the linear independent whole point x (1) , x (2) , . . . , x ( j) such that x ( j) ∈ λ j R,
and x ( j) x (1) , x (2) , . . . , x ( j−1) is linearly independent. Let x ( j) =(x j1 , x j2 , . . . , x jn ) ∈
Zn . Because matrix A = (x ji )n×n is an integer matrix, and det(A) = 0, so
| det(A)| ≥ 1.
So set
R1 = {μ1 x (1) + μ2 x (2) + · · · + μn x (n) ||μ1 |λ1 + |μ2 |λ2 + · · · + |μn |λn ≤ 1} ⊂ R.
So there is
2n
≤ Vol(R1 ) ≤ Vol(R) = V.
n!λ1 · · · λn
7.1 Geometry of Numbers 263
Therefore, the left inequality of (7.15) holds. The proof of the right inequality is
quite complex and is omitted here. Interested readers can refer to the classic works
(1963, 1971) of J. W. S. Cassels.
Theorem 7.1 Let θ1 , θ2 , . . . , θn be any n real numbers, θi = 0, then for any positive
number N > 1, there are nonzero positive integers q and p1 , p2 , . . . , pn to satisfy
|qθi − pi | < N − n , 1 ≤ i ≤ n;
1
(7.16)
|q| ≤ N .
Proof The proof of the theorem is a simple application of Minkowski’s linear type
theorem (see Lemma 7.8). Let A ∈ R(n+1)×(n+1) be an (n + 1)-order reversible square
matrix, defined as ⎛ ⎞
−1 0 · · · · · · 0 θ1
⎜ 0 −1 · · · · · · 0 θ2 ⎟
⎜ ⎟
A=⎜ ⎟
⎜· · · · · · · · · · · · · · · · · ·⎟ .
⎝ 0 0 · · · 0 −1 θn ⎠
0 · · · · · · 0 0 −1
. . ., N − n , N ), because
1
c1 c2 · · · cn cn+1 = N −1 · N = 1 ≥ |det(A)|.
So by Lemma 7.8, the symmetric convex body R (A, c) defined by A and c has
a nonzero integral point x = ( p1 , p2 , . . . , pn , q) = 0. We prove q = 0. Because
x = 0, if q = 0, then pk = 0 (1 ≤ k ≤ n), therefore, the k-th inequality in Eq. (7.16)
will produce the following contradiction,
Corollary 7.3 Let θ1 , . . . , θn be any n real numbers, then for any ε > 0, there is
rational number pqi (1 ≤ i ≤ n) satisfies
θi − pi < ε . (7.17)
q q
264 7 Lattice-Based Cryptography
Proof Any ε > 0 given, let N − n < ε, Formula (7.17) can be derived directly from
1
Theorem 7.1.
Lattice is one of the most important concepts in modern cryptography. Most of the
so-called anti-quantum computing attacks are lattice based cryptosystems. What is
a lattice? In short, a lattice is a geometry in n-dimensional Euclidean space Rn , for
example L = Zn ⊂ Rn , then Zn is a lattice in Rn , which is called an integer lattice
or a trivial lattice. If Zn is rotated once, we get the concept of a general lattice in
Rn , which is a geometric description of a lattice, next, we give an algebraic precise
definition of a lattice.
Definition 7.3 Let L ⊂ Rn be a nonempty subset, which is called a lattice in Rn , if
(i) L is an additive subgroup of Rn ;
(ii) There is a positive constant λ = λ(L) > 0, such that
min{|x||x ∈ L , x = 0} = λ, (7.18)
Equation (7.19) shows the reason why λ is called the minimal distance of a lattice.
If x ∈ L and |x| = λ, x is called the shortest vector of L.
In order to obtain a more explicit and concise mathematical expression of any
lattice, we can regard an additive subgroup as a Z-module. First, we prove that any
lattice is a finitely generated Z-module.
Lemma 7.11 Let L ⊂ Rn be a lattice and {α1 , α2 , . . . , αm } ⊂ L be a set of vectors
in L, then {α1 , α2 , . . . , αm } is linearly independent in R if and only if {α1 , α2 , . . . , αm }
is linearly independent in Z.
Proof If {α1 , α2 , . . . , αm } is linearly independent in R, it is obviously linearly inde-
pendent in Z. conversely, if {α1 , α2 , . . . , αm } is linearly independent in Z, that is, any
linear combination
a1 α1 + · · · + am αm = 0, ai ∈ Z,
θ1 α1 + θ2 α2 + · · · + θm αm = 0, θi ∈ R. (7.20)
q ≤ N.
By (7.20), we have
≤ N − m max |αi |.
1
1≤i≤m
Let λ be the minimal distance of L and ε > 0 be a sufficiently small positive number,
we choose
−m |αi |m
N > max ε , max ,
1≤i≤m λm
1≤i≤m
Thus
| p1 α1 + · · · + pm αm | < λ.
and
m
L= ai βi |ai ∈ Z . (7.22)
i=1
T = (αi , α j )m×m .
Proof First we prove (i). Let x0 ∈ Rm satisfies A Ax0 = A b, then for any x ∈ Rm ,
we have
Ax − b = (Ax0 − b) + A(x − x0 ) = γ + γ1 ∈ Rn .
(A(x − x0 )) (Ax0 − b)
= (x − x0 ) A (Ax0 − b)
= (x − x0 ) (A Ax0 − A b) = 0.
So (i) holds.
To prove (ii), let V A be the solution space of Ax = 0 and V A A the solution space of
A Ax = 0, let’s prove V A = V A A . First, there is V A ⊂ V A A . Conversely, let x ∈ V A A ,
that is A Ax = 0, then
7.2 Basic Properties of Lattice 267
So rank(A) = rank(A A), (ii) holds. To prove (iii), b ∈ Rn given, then the rank of
the augmented matrix of linear equation system A Ax = A b is
Therefore, the augmented matrix and the coefficient matrix have the same rank, so
the linear equations have solutions. When rank(A) = m, then rank(A A) = m, that
is, A A is a reversible m-order square matrix, thus
P T P = diag{δ1 , δ2 , . . . , δm }.
Lemma 7.12 is called the least square method in linear algebra, its significance
is to find a vector x0 with the shortest length in the set {Ax − b|x ∈ Rm } for a given
n × m-order matrix A and a given vector b ∈ Rn . Lemma 7.12 gives an effective
algorithm, that is, to solve the linear equations A Ax = A b, and x0 is the solution
of the equations, Lemma 7.13 is called the diagonalization of quadratic form. Now,
the main results are as follows:
Proof Equation (7.23) proves the necessity of the condition, and we only prove the
sufficiency of the condition. If a subset L in Rn is given by Eq. (7.25), it is obvious that
L is an additive subgroup of Rn , because any α = Bx1 , β = Bx2 , where x1 , x2 ∈ Zm ,
then x = x1 − x2 ∈ Zn , and
α − β = B(x1 − x2 ) = Bx ∈ L .
We prove √
min |Bx| ≥ δ > 0. (7.26)
x∈Zm
x =0
P T P = diag{δ1 , δ2 , . . . , δm }.
|Bx|2 ≥ δ, ∀ x ∈ Zm , x = 0.
This shows that the distance between any two different points in L is ≥ δ > 0.
Therefore, in a sphere with 0 as the center and r as the radius, the number of points
7.2 Basic Properties of Lattice 269
By Theorem 7.2, a sufficient and necessary condition for a full rank lattice with
L as Rn is the existence of a reversible square matrix B ∈ Rn×n , det(B) = 0, such
that n
L = L(B) = ai βi |ai ∈ Z, 1 ≤ i ≤ n = {Bx|x ∈ Zn }. (7.27)
i=1
Obviously, S L n (Z) forms a group under the multiplication of the matrix, because
the n-order identity matrix In ∈ S L n (Z), and A1 ∈ S L n (Z), A2 ∈ S L n (Z), then A =
A1 A2 ∈ S L n (Z). Specially, if A ∈ S L n (Z),A = (ai j )n×n , then the inverse matrix of
A ∗ ∗
a a · · · a ∗
11 12 1n
a ∗ a ∗ · · · a ∗
A−1 = ± 21 22 2n ∈ S L (Z),
·
∗· · · · · · · · · · · n
a · · · · · · a ∗
n1 nn
α, β = α β = y B −1 Bx = y x ∈ Z.
Thus ⎛ ⎞ ⎛ ⎞
y1 x1
⎜ .. ⎟ −1 ⎜ .. ⎟
⎝ . ⎠ ∈ (B ) ⎝ . ⎠ .
yn xn
Corollary 7.5 Let L = L(B) be a full rank lattice, L ∗ is the dual lattice of L, then
d(L ∗ ) = d −1 (L).
Definition 7.5 Let F ⊂ Rn be a subset, and call F the basic region of a lattice (full
rank lattice) L, if
(i) ∀ x ∈ Rn , there is a α ∈ F ⇒ x ≡ α(mod L),
(ii) Any α1 , α2 ∈ F, then α1 ≡ α2 (mod L).
F1 ∼
= Rn /L , F2 ∼
= Rn /L , =⇒ F1 ∼
= F2 .
n
α= ai βi , ∀ ai ∈ R.
i=1
"n
Let [α] B = i=1 [ai ]βi , {α} B = α − [α] B , then {α} B can be expressed as
⎛ ⎞
x1
⎜ x2 ⎟
⎜ ⎟
{α} B = B ⎜ . ⎟ , where 0 ≤ xi < 1, 1 ≤ i ≤ n.
⎝ .. ⎠
xn
α − β = B(x − y) = Bz.
make variable substitution Bx = y and calculate the Jacobi of the vector value
Thus
1 1
Vol(F) = ··· d(λ)dx1 · · · dxn = d(L).
0 0
i−1
βi , β ∗j
βi∗ = βi − β ∗j , (7.31)
j=1
β ∗j , β ∗j
n
n
d= |βi∗ | ≤ |βi |. (7.35)
i=1 i=1
det(B) = det(B ∗ ).
By the definition,
d 2 = det(B B) = det(U (B ∗ ) B ∗ U )
= det((B ∗ ) B ∗ )
= det(diag{|β1∗ |2 , |β2∗ |2 , . . . , |βn∗ |2 }).
So there is
n
d= |βi∗ |.
i=1
In order to prove the inequality on the right of Eq. (7.35), we only prove
i
= u i2j β ∗j , β ∗j
j=1
i−1
= βi∗ , βi∗ + u i2j β ∗j , β ∗j .
j=1
Therefore, the inequality on the right of (7.35) holds, the Lemma is proved.
Equation (7.35) is usually called Hadamard inequality, and we give another proof
here.
In order to define the concept of continuous minima on a lattice L, we record the
minimum distance on L with λ1 . That is λ1 = λ(L). Another definition of λ1 is the
minimum positive real number r , so that the linear space formed by L ∩ Ball(0, r )
is a one-dimensional space, where
Ball(0, r ) = {x ∈ Rn ||x| ≤ r }
is a closed sphere with 0 as the center and r as the radius. The concept of n continuous
minima λ1 , λ2 , . . . , λn in L can be given.
Definition 7.6 Let L = L(B) ⊂ Rn be a full rank lattice, the i-th continuous mini-
mum λi is defined as
The following lemma is a useful lower bound estimate of the minimum distance
λ1 .
Lemma 7.18 L = L(B) ⊂ Rn is a lettice (full rank lattice), B ∗ = [β1∗ , β2∗ , . . . , βn∗ ]
is the corresponding orthogonal basis, then
βi , β ∗j = 0, and β j , β ∗j = β ∗j , β ∗j .
So
|Bx| ≥ |x j ||β ∗j | ≥ min |βi∗ |.
1≤i≤n
Proof The lattice points contained in ball Ball(0, δ) with center 0 and radius
δ (δ > λi ) are finite, because in a bounded region (finite volume), if there are infinite
lattice points, there must be a convergent subsequence, but the distance between any
different two points in L is greater than or equal to λ1 , which indicates that
has finite lattice points, it’s not hard for us to find α1 ∈ L ⇒ |α1 | = λ1 , α2 ∈ L ⇒
|α2 | = λ2 ,…,|αn | = λn . The Corollary holds.
In Sect. 7.1, the geometry of numbers is relative to the integer lattice Zn ; next, we
extend the main results to the general full rank lattice.
276 7 Lattice-Based Cryptography
Lemma 7.19 (Compare with Lemma 7.5) L = L(B) ⊂ Rn is a lattice (full rank
lattice), R ⊂ Rn , if Vol(R) > d(L), then there are two different points in R, α ∈ R,
β ∈ R ⇒ α − β ∈ L.
Proof Let F be a basic region of L, that is
Rn = ∪α∈L {α + y|y ∈ F}
= ∪α∈L {α + F}.
Rα = R ∩ {α + F} = α + Dα , Dα ⊂ F.
(α + x) − (β + x) = α − β ∈ L .
1
(2x − 2y) = x − y ∈ R.
2
The Lemma holds.
Corollary 7.7 Let L be a full rank lattice, λ(L) = λ1 is the minimum distance of L.
Then √ 1
λ1 = λ(L) ≤ n(d(L)) n . (7.38)
By the definition, there are no nonzero lattice points in open ball Ball(0, λ1 ), by
Lemma 7.20, because Ball(0, λ1 ) is a symmetrical convex body, there is
Vol(Ball(0, λ1 )) ≤ 2n d(L).
Thus n
2λ1
√ ≤ 2n d(L).
n
That is √ 1
λ1 ≤ n(d(L)) n .
λk+1 ≤ |y| is obtained from the definition of λk+1 , which contradicts the definition
of k. By y ∈ Span(α1 , α2 , . . . , αk ),
n
k
y, α ∗ 2 y, α ∗ 2
i
= i
i=1
λi |αi∗ | i=1
λi |αi∗ |
1
k
y, αi∗ 2 1
≥ = 2 |y|2 ≥ 1.
λ2k i=1
|αi∗ |2 λk
Therefore y ∈
/ T , by Lemma 7.20, because T is a symmetric convex body, thus
Vol(T ) ≤ 2n d.
So
n
n
λi ≤ n 2 d.
i=1
|u 0 | = min |x| = λ1 .
x∈L ,x =0
It is the so-called shortest vector calculation problem. At present, there are insur-
mountable difficulties in theory and calculation, because we only know the existence
of u 0 , but we can’t calculate u 0 . Second, the current main research focuses on the
approximation of the shortest vector. The so-called shortest vector approximation is
to find a nonzero vector u ∈ L on L , =⇒
|u| ≤ r (n)λ1 , u ∈ L , u = 0,
where r (n) ≥ 1 is called the approximation coefficient, which only depends on the
dimension of lattice L.
In 1982, H. W. Lenstra, A. K. Lenstra and L. Lovasz creatively developed a set
of algorithms in (1982) to effectively solve the approximation problem of the short-
est vector, which is the famous LLL algorithm in lattice theory. The computational
complexity of LLL algorithm is polynomial for the whole lattice, and the approxima-
n−1
tion coefficient r (n) = 2 2 . How to improve the approximation coefficient in LLL
algorithm to the polynomial coefficient of n is the main research topic at present.
For example, Schnorr’s work in 1987 and Gama and Nguyen’s work (2008a, 2008b)
are very representative, but they are still far from the polynomial function, so the
academic circles generally speculate:
Conjecture 1: there is no polynomial algorithm that can approximate the shortest
vector so that the approximation coefficient r (n) is a polynomial function of n.
2. Closest vector problem CVP
Let L ⊂ Rn be a lattice, t ∈ Rn is an arbitrary given vector, and it is easy to prove
that there is a lattice point u t ∈ L , =⇒
|u t − t| = min |x − t|,
x∈L
|x − t| ≤ r1 (n)|u t − t|,
There are many other difficult computational problems on lattice, such as the
Successive Shortest vector problem, which is essentially to find a deterministic algo-
rithm to approximate each αi ∈ L, where |αi | = λi is the continuous minimum of
L. However, SVP and CVP are commonly used in lattice cryptosystem design and
analysis, and most of the research is based on the integer lattice.
It is easy to see from the definition that a lattice L = L(B) is an integer lattice
⇔ B ∈ Zn×n is an integer square matrix, so the determinant d = d(L) of an entire
lattice L is a positive integer.
Proof Let α ∈ dZn , let’s prove that α ∈ L, that is, α = Bx always has the solution
of the entire vector x ∈ Zn . Let B −1 be the inverse matrix of B, then
⎡ ∗ ∗ ∗
⎤
b11 b12 · · · b1n
1 1 ⎢ ∗ ∗ ∗ ⎥
⎢ b21 b22 · · · b2n ⎥ ,
B −1 = B∗ =
det(B) det(B) ⎣ · · · · · · · · · · · · ⎦
∗ ∗ ∗
bn1 bn2 · · · bnn
x = B −1 α = d B −1 β = ±B ∗ β ∈ Zn .
Lemma 7.23 Let L be a q-ary lattice, Zq is the residual class rings mod q, then
(i) Zn /qZn ∼
= Zqn (additive group isomorphism).
∼
(ii) Z /L = Zqn / L/qZn (additive group isomorphism). Therefore, L/qZn is a linear
n
code on Zqn .
where a¯i is the minimum nonnegative residue of ai mod q, and thus, we have α ≡
σ
ᾱ(mod q). Define mapping σ : Zn −→ Zqn as σ (α) = ᾱ, this is a surjection, and
σ (α + β) = ᾱ + β̄ = σ (α) + σ (β).
Zn /qZn ∼
= Zqn .
Zn /L ∼
= Zn /qZn /L/qZn ∼
= Zqn /L/qZn .
Next, we will prove that Zn /L is a finite group. Therefore, we first discuss the
elementary transformation of matrix. The so-called elementary transformation of
matrix refers to elementary row transformation and elementary column transforma-
tion, specifically refers to the following three kinds of elementary transformations:
(1) Transform two rows or two columns of matrix A:
σi j (A)-Transform rows i and j of A
τi j (A)-Transform columns i and j of A
(3) Add the k times of a row (column) to another row (column), k ∈ R, in many
cases, we require k ∈ Z to be an integer:
σki+ j (A)-Add k times of row i of A to row j
τki+ j (A)-Add k times of column i of A to column j
The n-order identity matrix is represented by In , the matrix obtained by the above ele-
mentary transformation of In is called elementary matrix. We note that all elementary
matrices are unimodular matrices (see (7.29)), and
282 7 Lattice-Based Cryptography
⎧
⎨ σi j (A) = σi j (In )A, τi j (A) = Aτi j (In )
⎪
σ−i (A) = σ−i (In )A, τ−i (A) = Aτ−i (In ) (7.43)
⎪
⎩
σki+ j (A) = σki+ j (In )A, τki+ j (A) = Aτki+ j (In )
That is, elementary row transformation for A is equal to multiplying the correspond-
ing elementary matrix from the left, and elementary column transformation for A is
equal to multiplying the corresponding elementary matrix from the right.
U BU1 = diag{δ1 , δ2 , . . . , δn }.
where δi = 0, δi ∈ Z, and
n
d(L) = | det(U BU1 )| = |δi |.
i=1
Let L(U BU1 ) be an integral lattice generated by U BU1 , we have quotient group
isomorphism
Zn /L(U BU1 ) ∼ = ⊕i=1
n
Z/|δi |Z = ⊕i=1
n
Z|δi | .
Thus
n
|Zn /L(U BU1 )| = |δi | = d(L).
i=1
An integer square matrix B = (bi j)n×n ∈ Zn×n is called Hermite normal form
matrix, if B is an upper triangular matrix, that is bi j = 0, 1 ≤ j < i ≤ n, and
Lemma 7.25 Let L ⊂ Zn be an integer lattice, then there is a unique HNF matrix
B ⇒ L = L(B).
Lemma 7.26 Let L = L(B) be an integer lattice, B = (bi j )n×n is a HNF matrix,
B ∗ = [β1∗ , β2∗ , . . . , βn∗ ] is the orthogonal basis corresponding to B = [β1 , β2 , . . . , βn ],
then
B ∗ = [β1∗ , β2∗ , . . . , βn∗ ] = diag{b11 , b22 , . . . , bnn }
is a diagonal matrix.
i
βi+1 , β ∗j
∗
βi+1 = βi+1 − β ∗j
j=1
|β ∗j |2
i
b j (i+1)
= βi+1 − β ∗j
j=1
b jj
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
b1(i+1) b1(i+1) 0
⎜ b2(i+1) ⎟ ⎜b2(i+1) ⎟ ⎜ 0 ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜ .. ⎟ ⎜ .. ⎟ ⎜ .. ⎟
⎜ . ⎟ ⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ . ⎟ ⎜ . ⎟
⎜ ⎟ ⎜
= ⎜ bi(i+1) ⎟ − ⎜ bi(i+1) ⎟ = ⎜ ⎟ ⎜ 0 ⎟.
⎟
⎜b(i+1)(i+1) ⎟ ⎜ 0 ⎟ ⎜b(i+1)l(i+1) ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜ .. ⎟ ⎜ . ⎟ ⎜ .. ⎟
⎝ . ⎠ ⎝ .. ⎠ ⎝ . ⎠
0 0 0
and
⊥
q (A) = {y ∈ Zm |Ay ≡ 0(mod q)}. (7.46)
⊥
That is, q (A) and q (A) are q-element lattices of dimension m.
Lemma 7.27 We have ⊥ ∗
q (A) =q q (A)
⊥ ∗
q (A) =q q (A)
∗
Proof Any α ∈ q (A) , by the definition, then
y, α ∈ Z, ∀ y ∈ q (A).
And
7.3 Integer Lattice and q-Ary Lattice 285
There is
y qα ≡ 0(mod q), ∀ y ∈ q (A).
Because y ∈ q (A), thus there is x ∈ Zn ⇒ y ≡ A x(mod q), from the above for-
mula,
x Aqα ≡ 0(mod q), ∀ x ∈ Zn .
Thus
⊥
Aqα ≡ 0(mod q), ⇒ qα ∈ q (A).
We prove
∗ ⊥
q q (A) ⊂ q (A).
⊥
Conversely, if y ∈ q (A), we have
1
Ay ≡ 0(mod q) ⇒ A y ≡ 0(mod 1).
q
We have
1 ∗ ∗
y∈ q (A) ⇒y∈q q (A) .
q
That is
⊥ ∗
q (A) ⊂q q (A) .
⊥ ∗
Thus, q (A) =q q (A) . Similarly, the second equation can be proved.
⊥
| det( q (A))| = qn, (7.47)
and
| det( q (A))| = q m−n . (7.48)
Proof In finite field Zq , rank(A) = n, then the linear equation system Ay = 0 has
exactly q m−n solutions, from which we can get
⊥
| q (A)/qZ |
m
= q m−n .
286 7 Lattice-Based Cryptography
By Lemma 7.23,
⊥ ⊥
|Zm / q (A)| = |Zqm / q (A)/qZ |
m
= qn.
By Lemma 7.24,
⊥ ⊥
| det( q (A))| = |Zm / q (A)| = qn.
By Lemma 7.27,
⊥ ∗
| det( q (A))| = q m | det( q (A) )| = q m−n .
In lattice theory, Reduced basis and corresponding LLL algorithm are the most
important contents, which have an important impact on computational algebra, com-
putational number theory and other neighborhoods, and are recognized as one of
the most important computational methods in recent 100 years. In order to introduce
Reduced basis and LLL algorithm, we recall the gram Schmidt orthogonalization
process summarized by Eqs. (7.31)–(7.34). Let {β1 , β2 , . . . , βn } ⊂ Rn be a set of
bases corresponding to Rn , {β1∗ , β2∗ , . . . , βn∗ } is the corresponding Gram–Schmidt
orthogonal basis, where
i−1
βi , β ∗j
β1∗ = β1 , βi∗ = βi − β ∗j , 1 < i ≤ n. (7.49)
j=1
β ∗j , β ∗j
i
βi , β ∗j
βi = β ∗j , 1 ≤ i ≤ n. (7.50)
j=1
β ∗j , β ∗j
There is
Lemma 7.29 Let {β1 , β2 , . . . , βn } be a set of bases of Rn , {β1∗ , β2∗ , . . . , βn∗ } is the
corresponding Gram–Schmidt orthogonal basis, L(β1 , β2 , . . . , βk ) = Span{β1 , β2 ,
. . . , βk } is a linear subspace extended by β1 , β2 , . . . , βk , then
7.4 Reduced Basis 287
(i)
L(β1 , β2 , . . . , βk ) = L(β1∗ , β2∗ , . . . , βk∗ ), 1 ≤ k ≤ n. (7.51)
x, βi∗
xi = , 1 ≤ i ≤ n. (7.53)
βi∗ , βi∗
Proof The above three properties can be derived directly from Eq. (7.49) or (7.50).
βi , β ∗j
Ui j = , ⇒ Ui j = 0, when j > i. Uii = 1. (7.54)
β ∗j , β ∗j
Therefore, U is the lower triangular matrix with element 1 on the diagonal, and
⎡ ⎤ ⎡ ∗⎤
β1 β1
⎢ β2 ⎥ ⎢ β2∗ ⎥
⎢ ⎥ ⎢ ⎥
⎢ .. ⎥ = U ⎢ .. ⎥ . (7.55)
⎣ . ⎦ ⎣ . ⎦
βn βn∗
V ⊥ = {x ∈ Rk |x, α = 0, ∀ α ∈ V }. (7.56)
x = α + β, where α ∈ V, β ∈ V ⊥ .
Lemma 7.30 Let {β1 , β2 , . . . , βn } be a set of bases of Rn and {β1∗ , β2∗ , . . . , βn∗ } be
the corresponding orthogonal basis, 1 ≤ k ≤ n, then βk∗ is the orthogonal projection
of βk on the orthogonal complement space V of the subspace L(β1 , β2 , . . . , βk−1 )
of L(β1 , β2 , . . . , βk ).
288 7 Lattice-Based Cryptography
k−1
βk = βk∗ + u k j β ∗j ,
j=1
and !
k−1
βk∗ , u k j β ∗j = 0.
j=1
Let {α1∗ , α2∗ , . . . , αn∗ } be the corresponding orthogonal basis and A1 = (vi j )n×n be
the corresponding coefficient matrix, then we have
(i) αi∗ = βi∗ , if i = k − 1, k.
(ii)
∗
αk−1 = βk∗ + u kk−1 βk−1
∗
αk∗ = βk−1
∗ ∗
− vkk−1 βk−1 .
+
(iii) vi j = u i j , if 1 ≤ j < i ≤ n, and {i, j} {k, k − 1} = ∅.
(iv) |β ∗ |2
vik−1 = u ik−1 vkk−1 + u ik |α∗k |2 , i > k.
k−1
k−1
βk∗ = βk − u k j β ∗j
j=1
k−2
∗
= βk − u kk−1 βk−1 − u k j β ∗j ,
j=1
αk∗ = βk−1
∗ ∗
− vkk−1 αk−1 .
where ∗ ∗
βk−1 , αk−1
vkk−1 = ∗
|αk−1 | 2
β , u kk−1 β ∗
∗
= k−1 ∗ 2 k−1
|αk−1 |
∗
|βk−1 |2
= u kk−1 ∗ ,
|αk−1 |2
thus (ii) holds. Similarly, other properties can be proved. Lemma 7.31 holds.
Lemma 7.32 Let {β1 , β2 , . . . , βn } be a set of bases of Rn , {β1∗ , β2∗ , . . . , βn∗ } be the
corresponding orthogonal basis, and A = (u i j )n×n be the coefficient matrix. For any
k ≥ 2, if we replace βk with βk − rβk−1 and keep the other βi unchanged (i = k),
we get a new set of bases.
Let {α1∗ , α2∗ , . . . , αn∗ } be the corresponding orthogonal basis and A1 = (vi j )n×n the
corresponding coefficient matrix, then we have
(i) αi∗ = βi∗ , ∀ 1 ≤ i ≤ n, that is, βi∗ remains unchanged.
(ii) vi j = u i j , if 1 ≤ j < i ≤ n, i = k.
(iii)
vk j = u k j − r u k−1, j , if j < k − 1
vkk−1 = u kk−1 − r, if j = k − 1.
290 7 Lattice-Based Cryptography
Proof When i < k, or i > k, αi∗ = βi∗ is trivial, to prove (i), only prove when i =
k. Because αk∗ is the orthogonal projection of αk = βk − rβk−1 in the orthogonal
complement space L(αk∗ ) = L(βk∗ ) of L(β1 , β2 , . . . , βk−1 ) = L(α1 , α2 , . . . , αk−1 ),
k−1
βk∗ = βk − u k j β ∗j
j=1
⎛ ⎞
k−2
= βk − rβk−1 − ⎝ ∗ ⎠
u k j β ∗j + (u kk−1 − r )βk−1
j=1
⎛ ⎞
k−2
= αk − ⎝ ∗ ⎠
u k j β ∗j + (u kk−1 − r )βk−1 .
j=1
This proves that αk∗ = βk∗ . Thus (i) holds. To prove (ii), when i = k, we have
αi , α ∗j βi , β ∗j
vi j = = = ui j ,
|α ∗j |2 |β ∗j |2
αk , α ∗j
vk j =
|α ∗j |2
βk − rβk−1 , β ∗j
= (1 ≤ j < k ≤ n)
|α ∗j |2
βk , β ∗j βk−1 , β ∗j
= −r
|β ∗j |2 |β ∗j |2
= u k j − r u k−1 j .
The above formula holds for all 1 ≤ j ≤ k − 1, thus (iii) holds, the Lemma holds.
lattice L in Rn has Reduced bases, and the method to calculate the Reduced bases is
the famous LLL algorithm.
1
|u kk−1 | ≤ , ∀ 1 ≤ k. (7.58)
2
If there is a k > 1, then the above formula does not hold, let r be the nearest integer
of u kk−1 , obviously,
1
|u kk−1 − r | ≤ .
2
In {β1 , β2 , . . . , βn }, replace βk with βk − rβk−1 , thus by Lemma 7.32,
u k j → u k j − r u k−1 j , 1 ≤ j ≤ k.
Specially, when j = k − 1,
u kk−1 → u kk−1 − r,
under the new basis, all βi and u i j (1 ≤ j < i = k) remain unchanged, so Eq. (7.58)
holds under the new basis.
In the second step of LLL algorithm, we prove that
3 ∗ 2
|βk∗ − u kk−1 βk−1
∗
|2 ≥ |β | , ∀ 1 < k ≤ n. (7.59)
4 k−1
By (7.4),
|βk∗ + u kk−1 βk−1
∗
|2 = |βk∗ − u kk−1 βk−1
∗
|2 .
Therefore, the sign in the absolute value on the right of Eq. (7.59) can be changed
arbitrarily. If there is a k, 1 < k ≤ n such that (7.59) does not hold, that is
3 ∗ 2
|βk∗ + u kk−1 βk−1
∗
|2 < |β | . (7.60)
4 k−1
In this case, if βk and βk−1 are exchanged and the other βi remains unchanged,
there is a new set of bases {α1 , α2 , . . . , αn }, the corresponding orthogonal basis
{α1∗ , α2∗ , . . . , αn∗ } and the coefficient matrix A1 = (vi j )n×n , where
3 ∗ 2
|αk∗ + vkk−1 αk−1
∗
|2 ≥ |α | , (7.61)
4 k−1
by Lemma 7.31,
∗
αk−1 = βk∗ + u kk−1 βk−1
∗
αk∗ = βk−1
∗ ∗
− vkk−1 βk−1 .
By (7.60), we have
∗ 3 ∗ ∗
|αk−1 |2 < |α + vkk−1 αk−1 |2 .
4 k
That is
4 ∗ 2 3 ∗ 2
|αk∗ + vkk−1 αk−1
∗
|2 > |α | > |αk−1 | .
3 k−1 4
Thus (7.61) holds. Using the above method continuously, it can be proved that formula
∗
(7.59) is valid for ∀ k > 1, however, when k is replaced by k − 1, the new βk−1 is
replaced by
∗ ∗ ∗ ∗
βk−1 → βk−1 + u k−1k−2 βk−2 = βk−1 .
1
|u k j | ≤ , ∀ 1 ≤ j < k ≤ n. (7.62)
2
When j = k − 1, (7.58) is the (7.62). For given k, 1 < k ≤ n, if (7.62) does not hold,
let l be the largest subscript ⇒ |u kl | > 21 . Let r be the nearest integer to u kl , then
|u kl − r | ≤ 21 . Replace βk with βk − rβl , from Lemma 7.32, all βi∗ remain unchanged
and the coefficient matrix is changed to:
7.4 Reduced Basis 293
u k j = u k j − r ul j , 1 ≤ j < l
u kl = u kl − r.
1
|u kl − r | = |vkl | ≤ .
2
So we have Eq. (7.62) for all 1 ≤ j < k ≤ n.
The above matrix transformation is equivalent to multiplying a unimodular matrix
from the right, so the Reduced basis B ⇒ L = L(B) of lattice L is finally obtained.
We complete the proof of Theorem 7.3.
Lemma 7.33 Let L = L(B) be a lattice, B is a Reduced basis of L, and B ∗ =
[β1∗ , β2∗ , . . . , βn∗ ] is the corresponding orthogonal basis, then for any 1 ≤ j < i ≤ n,
we have
|β ∗j |2 ≤ 2i− j |βi∗ |2 .
3 ∗ 2
|βk∗ + u kk−1 βk−1
∗
|2 ≥ |β | .
4 k−1
Thus
3 ∗ 2
|βk∗ + u kk−1 βk−1
∗
|2 = |βk∗ |2 + u 2kk−1 |βk−1
∗
|2 ≥ |β | .
4 k−1
There is
3 ∗ 2
|βk∗ |2 = ∗
|β | − u 2kk−1 |βk−1 |2
4 k−1
3 ∗ 2 1 ∗ 2
≥ |βk−1 | − |βk−1 |
4 4
1 ∗ 2
= |βk−1 | .
2
So when 1 ≤ j < i ≤ n given, we have
1 ∗ 2
|βi∗ |2 ≥ |β |
2 i−1
1 ∗ 2
≥ |βi−2 |
4
≥ ···
≥ 2−(i− j) |β ∗j |2 ,
thus
|β ∗j |2 ≤ 2i− j |βi∗ |2 .
294 7 Lattice-Based Cryptography
Remark 7.3 In the definition of Reduced base, the coefficient 43 on the left of the
second inequality of (7.57) can be replaced by any δ, where 41 < δ < 1. Specially,
Babai pointed out in (1986) that the second inequality of Eq. (7.57) can be replaced
by the following weaker inequality,
1 ∗
|βi∗ | ≤ |β |. (7.63)
2 i−1
Let’s discuss the computational complexity of the LLL algorithm. Let B =
{β1 , β2 , . . . , βn } be any set of bases, for any 0 ≤ k ≤ n, we define
n−1
D= dk , (7.66)
k=1
Then
k(k−1)
3 2
dk ≥ m(L)k , 1 ≤ k ≤ n.
4
Then
k(k−1)
3 2
dk ≥ m(L k )k
4
k(k−1)
3 2
≥ (m(L))k .
4
max |βi |2 ≤ N .
1≤i≤n
The binary digits of all integers in the LLL algorithm are O(n log N ), so the compu-
tational complexity of the LLL algorithm on the integer lattice is polynomial.
Proof By (7.36), we have
|βi∗ | ≤ |βi |, 1 ≤ i ≤ n.
k
k
dk = |βi∗ |2 ≤ |βi |2 ≤ N k , 1 ≤ k ≤ n.
i=1 i=1
And n(n−1)
1≤D≤N 2 . (7.68)
Therefore, the first conclusion of Theorem 7.4 is proved. The second conclusion is
more complex, we will omit it. Interested readers can refer to the original (1982) of
A. K. Lenstra, H. W. Lenstra and L. Lovasz.
The most important application of lattice Reduced basis and LLL algorithm is to
provide approximation algorithms for the shortest vector problem and the shortest
adjacent vector problem, and obtain some approximate results. Firstly, we prove the
following Lemma.
Lemma 7.35 Let {β1 , β2 , . . . , βn } be a Reduced basis of a lattice L, {β1∗ , β2∗ , . . . , βn∗ }
be the corresponding orthogonal basis, and d(L) be the determinant of L, then we
have
(i)
n
n(n−1)
d(L) ≤ |βi | ≤ 2 4 d(L). (7.69)
i=1
(ii)
n−1 1
|β1 | ≤ 2 4 d(L) n . (7.70)
Proof The inequality on the left of (i), called Hadamard inequality, has been
,n given
by Lemma 7.17. The inequality on the right of (i) gives an upper bound of i=1 |βi |,
by Lemma 7.33,
i− j
|β ∗j | ≤ 2 2 |βi∗ |, 1 ≤ j < i ≤ n. (7.71)
Thus
i−1
βi = βi∗ + u i j β ∗j .
j=1
We get
i−1
|βi |2 = |βi∗ |2 + u i2j |β ∗j |2
j=1
1 ∗2
i−1
≤ |βi∗ |2
+ |β |
4 j=1 j
⎛ ⎞
1
i−1
≤ ⎝1 + 2i− j ⎠ |βi∗ |2 (7.72)
4 j=1
7.5 Approximation of SVP and CVP 297
1
= 1 + (2i − 2) |βi∗ |2
4
≤ 2i−1 |βi∗ |2 .
There is
n
n
|βi |2 ≤ 2i−1 |βi∗ |2
i=1 i=1
"n−1
n
=2 i=0 i
|βi∗ |2
i=1
n
= 2 2 (n−1) |βi∗ |2
n
i=1
2 (n−1)
n
=2 (d(L))2 .
So
n
|βi | ≤ 2 4 (n−1) d(L).
n
i=1
Thus
"n
n
|β1 |2n ≤ 2 i=0 (i−1) |βi∗ |2
i=1
= 2 2 (n−1) (d(L))2 .
n
So
n−1 1
|β1 | ≤ 2 4 d(L) n .
n−1 n−1
|β1 | ≤ 2 2 λ1 = 2 2 λ(L). (7.74)
n
n
x= ri βi = ri βi∗ , ri ∈ Z, ri ∈ R, 1 ≤ i ≤ n.
i=1 i=1
Thus
|β1 |2 ≤ 2k−1 |x|2 ≤ 2n−1 |x|2 , x ∈ L , x = 0.
The following results show that not only the shortest vector, the whole Reduced
basis vector is the approximation vector of the Successive Shortest vector of the
lattice.
Proof Write
n
xj = ri j βi , ri j ∈ Z, 1 ≤ i ≤ n, 1 ≤ j ≤ t.
i=1
|x j |2 ≥ |βi(∗ j) |2 , 1 ≤ j ≤ t.
Change the order of x j to ensure i(1) ≤ i(2) ≤ · · · ≤ i(t), then j ≤ i( j), for ∀ 1 ≤
j ≤ t holds. Otherwise, the assumption that
x, βk∗
rk = = rk .
βk , βk∗
For a Successive Shortest vector called α1 , α2 , . . . , αn , |αi | is the shortest under the
condition that αi is linearly independent of {α1 , α2 , . . . , αi−1 }.
Next, we choose the Reduced basis to solve the shortest adjacent vector problem
(CVP). For any given t ∈ Rn , because there are only finite lattice points in one lattice
L in the Ball(t, r ) with t as the center and r as the radius, there is a lattice point u t
closest to t, that is
|u t − t| = min |x − t|. (7.79)
x∈L ,x =t
n
x= xi βi , xi ∈ R,
i=1
n
[x] B = δi βi , (7.81)
i=1
[x] B is called the discard vector of x under base B, write x = [x] B + {x} B , then
n
1 1
{x} B ∈ ai βi | − < ai ≤ , 1 ≤ i ≤ u .
i=1
2 2
integer of xi , according to the nearest plane method, we take (see Lemma 7.43 below).
7.5 Approximation of SVP and CVP 301
⎧
⎪
⎪ U = L(β1∗ , β2∗ , . . . , βn−1
∗
) = L(β1 , β2 , . . . , βn−1 )
⎪
⎪
⎪
⎪ r = δn βn ∈ L
⎪
⎪
⎪
⎨ "
n−1
x = xi βi∗ + δn βn∗
(7.82)
⎪
⎪
i=1
⎪
⎪ "
n−1
⎪
⎪ y is a sublattice The grid point closest to x − v in L = Zβi
⎪
⎪
⎪
⎩ i=1
ω = y+v
We prove that
|x − ω| ≤ 2 2 −1 |βn∗ |.
n
(7.84)
|x − ω| = |x1 − δ||θ | ≤ |x − nθ |, ∀ n ∈ Z.
|x − ω| = |u x − x|.
1 ∗
|x − x | = |xn − δn ||βn∗ | ≤ |β |, (7.85)
2 n
since the distance between affine planes {u + z|z ∈ L} is at least |βn∗ |, and |x − x |
is the distance between x and the nearest affine plane, there is
|x − x | ≤ |u x − x|. (7.86)
Thus
1 ∗2
|x − ω|2 ≤ |β | (1 + 2 + 22 + · · · + 2n−1 )
4 n
1
= (2n − 1)|βn∗ |2
4
≤ 2n−2 |βn∗ |2 .
There is
|x − ω| ≤ 2 2 −1 |βn∗ |,
n
(7.88)
|x − ω| = |x − v − y| ≤ Cn−1 |x − u x | ≤ Cn−1 |x − u x |,
n
where Cn = 2 2 . By (7.87), we have
1
|x − ω|2 ≤ (1 + (Cn−1 )2 ) 2 |x − u| < Cn |x − u|.
1 ∗
|x − u x | ≥ |β |.
2 n
By (7.88), we get
n
|x − ω| < 2 2 |x − u x |.
the approximation coefficient. Using the rounding off technique, we can give an
approximation to adjacent vectors, another main result in this section is
Because
|m − βk |
sin θk = min ,
m∈Uk |βk |
so by (7.92), ⇒ (7.91), the Lemma holds. To prove (7.92), let {β1∗ , β2∗ , . . . , βn∗ } be
the orthogonal basis corresponding to the Reduced basis Reduced {β1 , β2 , . . . , βn },
then m ∈ Uk can express as
n
m= ai βi = b j β ∗j , ai , b j ∈ R.
i =k j=1
304 7 Lattice-Based Cryptography
Write ⎡ ⎤ ⎡ ∗⎤
β1 β1
⎢ β2 ⎥ ⎢ β2∗ ⎥
⎢ ⎥ ⎢ ⎥
m = (a1 , . . . , an ) ⎢ . ⎥ = (a1 , . . . , an )U ⎢ . ⎥ .
⎣ .. ⎦ ⎣ .. ⎦
βn βn∗
n
bj = ai u i j , βk = u ki βi∗ .
i =k i=1
So
n
m − βk = γ j β ∗j , where γ j = b j − u k j .
j=1
k n2 n
9
|βk | =
2
u 2k j |β ∗j |2 ≤ γ j2 |β ∗j |2 . (7.94)
j=1
2 j=1
n 2(n−k)
2
γ j2 ≥ . (7.95)
j=k
3
n 2(n−k)
2
γ j2 < .
j=k
3
By (7.93), ⎧
⎪
⎪ γn = a n
⎪
⎪
⎪
⎪
⎨ γn−1 = an−1 + an u nn−1
γn−2 = an−2 + an−1 u n−1n−2 + an u nn−2
⎪
⎪
⎪ ···
⎪
⎪
⎪
⎩ γ =a +a u
k k k+1 k+1k + · · · + an u nk
We can prove
n− j n−k
3 2
|a j | < · . (7.97)
2 3
n n
|ai |
|a j | = |γ j − ai u i j | ≤ |γ j | +
i= j+1 i= j+1
2
n−k
1 3 n−i 2 n−k
n
2
< +
3 2 i= j+1 2 3
n−k n− j−1
2 1 2 n−k 3 i
= +
3 2 3 i=0
2
n−k n−k n− j
2 2 3
= + −1
3 3 2
n− j n−k
3 2
= .
2 3
Therefore, under the assumption of (7.96), we have (7.97). Take j = k in (7.97), then
|ak | < 1, but ak = −1, this contradiction shows that Formula (7.96) does not hold,
thus (7.95) holds.
We now prove Formula (7.94) to complete the proof of Lemma. By Lemma 7.33,
|βk∗ |2 ≥ 2 j−k |β ∗j |2 , 1 ≤ j ≤ k ≤ n.
And
|βk∗ |2 ≤ 2 j−k |β ∗j |2 , 1 ≤ k ≤ j ≤ n.
k
k
u 2k j |β ∗j |2 ≤ |βk∗ |2 u 2k j 2k− j
j=1 j=1
1 ∗ 2 k− j
k
≤ |β | 2
4 k j=1
1 ∗2 k
= |β | (2 − 1)
4 k
< 2k |βk∗ |2 .
n
n
γ j2 |β ∗j |2 ≥ γ j2 |β ∗j |2
j=1 j=k
n
≥ γ j2 2k− j |βi∗ |2
j=k
n
≥ 2k−n |βk∗ |2 γ j2
j=k
2(n−k)
k−n 2
≥2 |βk∗ |2
3
n2
2
≥ |βk∗ |2 .
9
n
1
x − ω = x − [x] B = ci βi , |ci | ≤ (1 ≤ i ≤ n).
i=1
2
n
ux − ω = ai βi , ai ∈ Z.
i=1
We prove
n2
9
|u x − ω| ≤ 2n |u x − x|. (7.99)
2
Obviously,
|u x − ω| ≤ n|ak βk |. (7.100)
n
u x − x = (u x − ω) + (ω − x) = (ai + ci )βi = (ak + ck )(βk − m).
i=1
where
1
m=− (a j + c j )β j ∈ Uk .
ak + ck j =k
By (7.99),
n2
1 2
|u x − x| = |ak + ck ||βk − m| ≥ |βk ||ak |.
2 9
There is n2
9
|ak βk | ≤ 2 |u x − x|.
2
So n2
9
|u x − ω| ≤ 2n |u x − x|.
2
Because there is a unique HNF base in L (see Lemma 3.4). Let B = HNF(L) be
HNF matrix, B as public key and R as private key. Let v ∈ Zn be an integer point,
e ∈ Rn is an error vector. Let σ be a parameter vector. Take e = σ or e = −σ , they
each chose with a probability of 21 .
Encryption: for the plaintext v ∈ Zn encoded and input and the error vector ran-
domly selected according to the parameter vector σ , the public key B is used for
encryption. The encryption function f B,σ is defined as
n
c= xi αi , xi ∈ R.
i=1
n
[c] R = δi αi ∈ L . (7.102)
i=1
−1
In order to verify the correctness of decryption function f B,σ , we first prove the
following simple Lemma. For any x ∈ R , and R = [α1 , α2 , . . . , αn ] ∈ Rn×n is any
n
n
[x] R = δi αi ∈ L(R). (7.105)
i=1
Proof Write
⎡ ⎤ ⎡ ⎤
a1 δ1
⎢ a2 ⎥ ⎢ δ2 ⎥
⎢ ⎥ ⎢ ⎥ 1
x = ⎢ . ⎥ ∈ Rn ⇒ [x] = ⎢ . ⎥ ∈ Zn , |ai − δi | ≤ .
.
⎣ . ⎦ .
⎣ . ⎦ 2
an δn
"n
If x = i=1 xi αi , R = [α1 , α2 , . . . , αn ], then
⎡⎤ ⎡ ⎤
x1 δ1
⎢ x2 ⎥ ⎢ δ2 ⎥
⎢ ⎥ ⎢ ⎥
x = R ⎢ . ⎥ , and [x] R = R ⎢ . ⎥ , δi is the nearest integer to xi .
⎣ .. ⎦ ⎣ .. ⎦
xn δn
Thus ⎡ ⎤
δ1
⎢ δ2 ⎥
⎢ ⎥
R −1 [x] R = ⎢ . ⎥ = [R −1 x].
⎣ .. ⎦
δn
B = RU, U ∈ S Ln(Z).
So
B −1 R = U R −1 R = U = T,
T [R −1 c] = T [R −1 (Bv + e)]
= T [R −1 Bv + R −1 e]
= T [T −1 v + R −1 e].
[T −1 v + R −1 e] = T −1 v + [R −1 e]. (7.107)
Thus
T [R −1 c] = v + T [R −1 e].
That is
−1
f B,σ (c) = v + T [R −1 e].
Therefore, when the private key R is given, the selection of error vector e and param-
eter vector σ becomes the key to the correctness of GGH password. Notice that
(7.106), if we decrypt with public key B, then
Therefore, the basic condition for the security and accuracy of GGH password is
[R −1 e] = 0
(7.109)
[B −1 e] = 0.
each ei has the same absolute value, that is |ei | = σ , σ is the parameter. Thus,
2|en | > bnn ⇒ [B −1 e] = 0. Let’s focus on [R −1 e] = 0.
∀ x = (x1 , x2 , . . . , xn ) ∈ Rn , define the L 1 norm |x|1 and L ∞ norm |x|∞ of x as
n
|x|∞ = max |xi |, |x|1 = |xi |. (7.110)
1≤i≤n
i=1
⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥
Lemma 7.39 Let R ∈ Rn×n be a reversible square matrix, R −1 = ⎢ . ⎥, where αi
⎣ .. ⎦
αn
−1
is the row vector of R . e = (e1 , e2 , . . . , en ) ∈ R , |ei | = σ , ∀ 1 ≤ i ≤ n, let
n
Proof Suppose αi = (ci1 , ci2 , . . . , cin ), the i-th component of R −1 e can be written
as n
n
ci j e j ≤ σ |ci j | = σ |αi |∞ ≤ σρ.
i=1 j=1
If σ < 1
2ρ
, then each component of R −1 e is < 21 , there is [R −1 e] = 0.
⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥ √γ ,
Lemma 7.40 R ∈ Rn×n , R −1 = ⎢ . ⎥, let max1≤i≤n |αi |∞ = then the prob-
⎣ .. ⎦ n
αn
ability of [R −1 e] = 0 is
1
P{[R −1 e] = 0} ≤ 2n exp − . (7.112)
8σ 2 γ 2
To satisfy [R −1 e] = 0, then only one of the above conditions {|ai | > 21 } is true. Thus
n
1
P{[R −1 e] = 0} = P |ai | >
i=1
2
n
1
≤ P |ai | >
i=1
2
1
< 2n exp − 2 2 .
8σ γ
In order to have a direct impression of Eq. (7.113), let’s give an example. Let n = 120,
ε = 10−5 , when the elements of matrix R −1 = (ci j )n×n change in the interval [−4, 4],
that is −4 ≤ ci j ≤ 4, then it can be verified that the maximum L ∞ norm of the row
vector of R −1 is approximately equal to 30×√ 1
, thus γ = 301
, by Corollary, when
120
−1
σ ≤ ( 30 8 log 240 × 10 ) ≈ 11.6 ≈ 2.6, we have
1 5 30
It can be seen from the above analysis that GGH cryptosystem does not effectively
solve the selection of private key R, public key B, especially parameter σ and error
vector. In 2001, Professor Micciancio of the University of California, San Diego
further improved GGH cryptosystem by using HNF basis and adjacent plane method.
In order to introduce GGH/HNF cryptosystem, we review several important results
in the previous sections.
1
ρ= min |β ∗ |. (7.115)
2 1≤i≤n i
α ∈ L ⇒ |x − α| < ρ. (7.116)
(ii) Suppose L ⊂ Zn is an integer lattice, then L has a unique HNF base B, that is
L = L(B), B = (bi j )n×n , satisfies
That is, B is an upper triangular matrix, and the corresponding orthogonal basis
B ∗ of B is a diagonal matrix, that is
Proof Equation (7.114) is given by Lemma 7.18 and the property (ii) is given by
Lemma 7.26. We only prove that if there is lattice point α ∈ L ⇒ |x − α| < ρ, then
α is the only one. Let α1 ∈ L , α2 ∈ L, and
In the previous section, we introduced Babai’s adjacent plane method (see (7.82)).
The distance between two subsets A1 and A2 in Rn is defined as
τ B (x) = w = y + v ∈ L . (7.117)
Lemma 7.42 Under the above definition, if v1 , v2 ∈ L, and Av1 = Av2 , then
Proof v1 , v2 ∈ L, then it can be given by the linear combination of {β1∗ , β2∗ , . . . , βn∗ },
that is "n
v1 = i=1 ai βi∗ , where ai ∈ R, an ∈ Z.
"n
v2 = i=1 bi βi∗ , where bi ∈ R, bn ∈ Z.
n
n
v1 = ai∗ βi , v2 = bi∗ βi , ai∗ , bi∗ ∈ Z.
i=1 i=1
Therefore,
v1 , βn∗ an∗ βn , βn∗
an = = = an∗ ∈ Z.
|βn∗ |2 |βn∗ |2
The above equation uses Eq. (7.52), which can prove bn ∈ Z in the same way. By
condition v1 − v2 ∈
/ U , then an = bn , therefore
That is, the affine plane closest to x is Aβn , the orthogonal projection of x on Av
is x .
(ii) Let u x ∈ L be the lattice point closest to x, then
|x − x | ≤ |x − u x |. (7.120)
7.6 GGH/HNF Cryptosystem 315
Proof Take v = δβn , then v ∈ L,"we want to prove that the distance between x and
n
Av is the smallest. Because x = i=1 γi βi∗ , so (see (7.119))
n−1
x −v = γi βi∗ + (γn − δ)βn∗ ,
i=1
1 ∗
=⇒ |x − Av | = |x − v − U | ≤ |γn − δ||βn∗ | ≤ |β |.
2 n
Let v1 ∈ L , v − v1 ∈
/ U , by trigonometric inequality,
1 1
|x − Av1 | ≥ |Av1 − Av | − |x − Av | ≥ |βn∗ | − |βn∗ | = |βn∗ | ≥ |x − Av |.
2 2
n−1
n−1
βn = ci βi∗ + βn∗ ⇒ δβn = δci βi∗ + δβn∗ = v. (7.121)
i=1 i=1
Thus
n−1
x −v = (γi − δci )βi∗ ∈ U.
i=1
Then Av and U are two parallel planes, thus (x − x )⊥Av . This proves that the
orthogonal projection of x on Av is x , and thus (i) holds.
The proof of (ii) is direct. By the definition of x and any affine plane Aα , the
distance of α ∈ L satisfies
|x − α| ≥ |x − Aα |.
|x − x | = |x − Av | ≤ |x − Aα |, ∀ α ∈ L .
|x − x | ≤ |x − Au x | ≤ |x − u x |.
τ B (x) = α. (7.122)
Proof Because of
|x − Aα | ≤ |x − α| < ρ.
1
bii < ρ, ∀ 1 ≤ i ≤ n.
2
2. Let v ∈ Zn be an integer, e ∈ Rn is the error vector, satisfies |e| < ρ.
3. Encryption: after any plaintext information v ∈ Zn and error vector e are selected,
with ρ as the parameter, the encryption function f B,ρ is defined as
f B,ρ (v, e) = Bv + e = c.
F is just a quadrilateral.
Lemma 7.45 For any integer point α ∈ Zn , there is a unique w ∈ F(B ∗ ) such that
α ≡ w(mod L).
n
n
w= ai βi∗ − [ai ]βi . (7.126)
i=1 i=1
Then
n
α−w = [ai ]βi ∈ L ⇒ α ≡ w(mod L).
i=1
318 7 Lattice-Based Cryptography
n
w= bi βi∗ .
i=1
n
w= ai βi∗ , where |ai | < 1.
i=1
We prove that if
w = 0(mod L) ⇔ w = 0. (7.127)
"n
Write w = i=1 bi βi , then by (7.52), there is
w1 − w2 ≡ 0(mod L).
From the above lemma, any two points in parallelogram F(B ∗ ) are not congruent
mod L, therefore, for not congruent lattice points α1 , α2 ∈ L, then
{F(B ∗ ) + α1 } ∩ {F(B ∗ ) + α2 } = ∅.
w = α mod L .
If B is the HNF basis of the whole lattice L, then B ∗ = diag{b11 , b22 , . . . , bnn },
thus, parallelogram F(B ∗ ) takes the simplest form:
n
log bii = log (bii ) = log d(L). (7.132)
i=1
To sum up, the parallelogram of the HNF basis of L has a particularly simple
geometry, which is actually a cube, which is very helpful for calculating the reduction
vector x mod L of an entire point x ∈ Zn , the reduction vector is of great significance
in the further improvement and analysis of GGH/HNF cryptosystem. For detailed
work, please refer to D. Micciancio’s paper (Micciancio, 2001) in 2001.
the key generation is very simple, and the encryption and decryption algorithms are
much faster than the commonly used RSA and elliptic curve cryptography, NTRU, in
particular, can resist quantum computing attacks and is considered to be a potential
public key cryptography that can replace RSA in the postquantum cryptography era.
The essence of NTRU cryptographic design is the generalization of RSA on
polynomials, so it is called the cryptosystem based on polynomial rings. However,
NTRU can give a completely equivalent form by using the concept of q-ary lattice, so
NTRU is also a lattice based cryptosystem. For simplicity, we start with polynomial
rings.
Let Z[x] be a polynomial ring with integral coefficients and N ≥ 1 be a positive
integer. We define the polynomial quotient ring R as
N −1
F(x) = Fi x i = (F0 , F1 , . . . , FN −1 ) ∈ Z N . (7.133)
i=0
N −1
N −1
F(x) = Fi x i , G(x) = Gi x i .
i=0 i=0
Define
N −1
F ⊗ G = H (x) = Hi x i = (H0 , H1 , . . . , HN −1 ).
i=0
For any k, 0 ≤ k ≤ N − 1,
k N −1
Hk = Fi G k−i + Fi G N +k−i
i=0 i=k+1
(7.134)
= Fi G j .
0≤i<N
0≤ j<N
i+ j≡k(mod N )
Lemma 7.46 Under the new multiplication, R is a commutative ring with unit ele-
ments.
Proof By (7.134),
F ⊗ G = G ⊗ F, F ⊗ (G + H ) = F ⊗ G + F ⊗ H.
7.7 NTRU Cryptosystem 321
a ⊗ F = a F = (a F0 , a F1 , . . . , a FN −1 ).
N −1
1
F̃ = Fi , is arithmetic mean of the coefficients of F. (7.135)
N i=0
N −1
2
1
(|F|2 ) =
2
Fi −
i=0
N
N −1
2 1
= Fi2 − Fi + 2
i=0
N N
2 1 1
= 2d − 1 − + = 2d − 1 − ,
N N N
When the private key ( f, g) is selected, the public key h is given by the following
formula:
h ≡ Fq ⊗ g(mod q). (7.139)
Then select a polynomial φ ∈ R, deg φ = N − 1 at random, then use the public key
h of user A for encryption. The encryption function σ is
pφ ⊗ h + m ∈ ZqN . (7.143)
Then
c = pφ ⊗ h + m. (7.144)
a ⊗ F p = F p ⊗ f ⊗ c ≡ c = pφ ⊗ h + m(mod p).
Thus
a ⊗ F p ≡ m(mod p).
Because m ∈ ZqN , so
Therefore, as a necessary condition, when the following formula holds, (7.145) holds.
q q
| f ⊗ m|∞ ≤ , and | pφ ⊗ g|∞ ≤ . (7.146)
4 4
Lemma 7.48 For any ε > 0, there are constants r1 and r2 > 0, depending only on ε
and N , for randomly selected polynomial F, G ∈ R, then the probability of satisfying
the following formula is ≥ 1 − ε, that is
Then, Eq. (7.146) can be guaranteed to be true (in the sense of probability), so that
the success rate of the decryption algorithm will be greater than 1 − ε. Thus, (7.148)
becomes the main parameter selection index of NTRU.
Next, we use the concept of q-element lattice to make an equivalent description
of the above NTRU. We first discuss it from the cyclic matrix. Let T and T1 be the
following two N -order square matrices.
⎛ ⎞ ⎛ ⎞
0 ··· 0 1 0
⎜ 0⎟ ⎜ 0 In−1 ⎟
⎜ ⎟ ⎜ ⎟
T =⎜ .. ⎟ , T1 = ⎜ .. ⎟.
⎝ In−1 . ⎠ ⎝. ⎠
0 10 0 0
In order to distinguish in the mathematical formula, T ∗ (a) and T1∗ (a) are some-
times written as T ∗ a and T1∗ a or [T ∗ a] and [T1∗ a]. Obviously, the transpose of T ∗ (a)
is ⎡ ⎤
a
⎢ a T1 ⎥
⎢ ⎥
(T ∗ (a)) = ⎢ .. ⎥ = T1∗ (a ). (7.152)
⎣ . ⎦
a T1N −1
Equation (7.150) is column vector blocking of cyclic matrix, in order to obtain row
vector blocking of cyclic matrix. For any x ∈ (x1 , . . . , x N ) ∈ R N , we let
x = (x N , x N −1 , . . . , x2 , x1 ) ⇒ x = x.
On the right side of (7.153) is a cyclic matrix, which is partitioned by rows. We first
prove that the transpose of the cyclic matrix is still a cyclic matrix.
⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥
Lemma 7.49 ∀ a = ⎢ . ⎥ ∈ R N , then (T ∗ (a)) = T ∗ (T −1 a).
⎣ .. ⎦
αN
⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥
Proof Let α = ⎢ . ⎥ ∈ R N , by (7.152), (T ∗ (a)) = T1∗ (a ), where
⎣ .. ⎦
αN
α = (α1 , . . . , α N ) is the transpose of α, let
β = (α1 , α N , α N −1 , . . . , α2 ) = α T1 .
Easy to verify
326 7 Lattice-Based Cryptography
⎡ ⎤
β
⎢ βT1 ⎥
⎢ ⎥
T1∗ (β) = ⎢ .. ⎥ = T ∗ (α).
⎣ . ⎦
βT1N −1
There is
T1∗ (β) = (T ∗ (β )) = T ∗ (α).
Because α = (α N , α N −1 , · · · , α2 , α1 ), and β = α T1 , so
β = T α ⇒ T −1 β = α ⇒ α = T −1 β .
We let a = β , then
(T ∗ (α)) = T ∗ (α) = T ∗ (T −1 α).
k = N + 1 + i − j ⇒ 1 + i − j ≡ k(mod N ).
To prove (ii), using the row vector block of cyclic matrix (see (7.153)), then
⎡ ⎤ ⎡ ⎤
a T1 a T1 b
⎢ a T2 ⎥ ⎢ a T 2b ⎥
⎢ 1 ⎥ ⎢ 1 ⎥
[T ∗ (a)]b = ⎢ . ⎥ b = ⎢ .. ⎥ ,
⎣ .. ⎦ ⎣ . ⎦
a T1N a T1N b
and ⎡ ⎤
a T1
⎢ a T2 ⎥
⎢ 1 ⎥
T ∗ (a) · T ∗ (b) = ⎢ . ⎥ [b, T b, . . . , T N −1 b] = (Ai j ) N ×N .
⎣ . . ⎦
a T1N
where
N +i− j+1 i+1− j
Ai j = a T1i · T j−1 b = a T1 b = a T1 b.
By Lemma 7.50, then T ∗ (a) · T ∗ (b) = T ∗ ([T ∗ (a)]b), so there is the first conclusion
of (ii). We notice that
j−1 N −i−1+ j j−i−1
Ai j = Ai j = b T1 T i a = b T1 a = b T1 a.
It is easy to prove that for any row vector x and column vector y, there is x · y = x · y,
and
x T1k = x · T1N −k , 1 ≤ k ≤ N . (7.154)
Thus,
j−i−1 N +i+1− j i+1− j
Ai j = b T1 a = b T1 a = b T1 a.
This proves that T ∗ (a)T ∗ (b) = T ∗ (b)T ∗ (a); that is, the multiplication of cyclic
matrix to matrix is commutative.
To prove (iii), suppose (T ∗ (a)) = A, but det(T ∗ (a)) = det((T ∗ (a)) ), so we just
need to calculate det(A). Make polynomial f (x) = a1 + a2 x + · · · + a N x N −1 , and
let
328 7 Lattice-Based Cryptography
⎡ ⎤
1 1 1 ··· 1
⎢ ξ1 ξ2 ξ3 ··· ξN ⎥
⎢ ⎥
V =⎢
⎢ ξ12 ξ22 ξ32 ··· ξ N2 ⎥.
⎥
⎣ ··· ··· ··· ··· ··· ⎦
ξ1N −1 ξ2N −1 ξ3N −1 ··· ξ NN −1
Then ⎡ ⎤
f (ξ1 ) f (ξ2 ) ··· f (ξ N )
⎢ ξ1 f (ξ1 ) ξ f (ξ2 ) · · · ξ N f (ξ N ) ⎥
AV = ⎢
⎣
2 ⎥.
⎦
··· ··· ··· ···
ξ1N −1 f (ξ1 ) ξ2N −1 f (ξ2 ) · · · ξ NN −1 f (ξ N )
So
det(A) det(V ) = det(AV ) = f (ξ1 ) f (ξ2 ) · · · f (ξ N ) det(V ).
N
= (a1 + a2 ξk + · · · + a N ξkN −1 ).
k=1
⎡ ⎤
1
⎢0⎥
⎢ ⎥
Now prove (iv). Let e = ⎢ . ⎥ ∈ R N , then
⎣ .. ⎦
0
T ∗ (e) = [e, T e, . . . , T N −1 e] = I N .
So take b ∈ R N to satisfy
T ∗ (a) · b = e ⇒ b = (T ∗ (a))−1 e.
Thus, (T ∗ (a))−1 = T ∗ (b). In other words, the inverse of a reversible cyclic matrix
is also a cyclic matrix.
7.7 NTRU Cryptosystem 329
⎡ ⎤
a1
⎢ ⎥ "N
Corollary 7.10 Let N be a prime, a = ⎣ ... ⎦ ∈ Rn , satisfy a = 1, and i=1 ai =
aN
0, then the cyclic matrix T ∗ (a) generated by a is a reversible square matrix.
Proof Under given conditions, we can only prove det(T ∗ (a)) = 0. Let εk
= exp( 2πik ), 1 ≤ k ≤ N − 1, be N − 1 primitive unit roots of N -th( because N
N "N
is a prime), if det(T ∗ (a)) = 0, because of i=1 ai = 0, there must be a k, 1 ≤ k ≤
N − 1, such that
a1 + εk a2 + εk2 a3 + · · · + εkN −1 a N = 0.
(φ(x), 1 + x + · · · + x N −1 ) > 1.
And ⎡ ⎤
g0
⎢ ⎥
g = ⎣ ... ⎦ ∈ Z N , g = (g0 , g1 , . . . , g N −1 ) ∈ Z N .
g N −1
By (7.153), ⎡ ⎤ ⎡ ⎤
a T1 x ax
⎢ a T 2x ⎥ ⎢ a T1 x ⎥
⎢ 1 ⎥ ⎢ ⎥
T (T ∗ (a)x) = T ⎢ . ⎥=⎢ .. ⎥.
⎣ . . ⎦ ⎣ . ⎦
a T1N x a T1N −1 x
Proof By Lemma 7.27, q (A) is a q-ary lattice, that is qZ2N ⊂ q (A)⊂ Z2N , we
only prove q (A) is closed under linear transformation σ . If y ∈ q (A), then there
is x ∈ Z N ⇒ y ≡ A x(mod q), by the definition of σ ,
6 7 6 ∗ 7
T (T ∗ ( f )x) T ( f )T x
σ (y) ≡ = ≡ A T x(mod q).
T (T ∗ (g)x) T ∗ (g)T x
T ∗ ( f )e = f, T ∗ (g)e = g.
Thus, 6 7 6 7
T ∗ ( f )e f
Ae= = ∈ q (A).
T ∗ (g)e g
With the above preparation, we now introduce the equivalent form of NTRU in
lattice theory. 6 7
f
Public key generation. After selected private key ∈ Z2N , NTRU’s public
g
6 7is generated as follows: Because the convolution q-ary lattice q (A) containing
key
f
is an entire lattice, q (A) has a unique HNF basis H , where
g
6 7
I N T ∗ (h)
H= , h ≡ [T ∗ ( f )]−1 g(mod q). (7.162)
0 q IN
That is, m has d f + 1 1, d f −1, other components are 0. Then, the plaintext m is
encrypted with the public key H of the message recipient:
6 7 6 7
m m + [T ∗ (h)]r
c=H ≡ (mod q). (7.164)
r 0
c is called cryptosystem text, the first N components are m + [T ∗ (h)]r , the last N
components are 0.
7.7 NTRU Cryptosystem 333
Obviously, the first condition can be derived from the second condition; that is, the
(7.168) can be derived from the (7.166). We first prove the following Lemma.
( q4 −1)
Lemma 7.54 If the parameter meets d f < 2p
, then
8 q q 9N
[T ∗ ( f )]m + [T ∗ (g)]r ∈ − , .
2 2
Proof Because all components of m and r are ±1 or 0, therefore, we only prove that
the absolute value of the row vectors of [T ∗ ( f )] and [T ∗ (g)] is not greater than q2 .
334 7 Lattice-Based Cryptography
Write f = ( f 0 , f 1 , . . . , f N −1 ), because of f 0 = 1,
N −1 N −1
q
fi ≤ | f i | = 1 + (2d f + 1) p < .
4
i=0 i=0
Similarly, N −1 N −1
q
gi ≤ |gi | = (2d f + 1) p < .
4
i=0 i=0
Thus 8 q q 9N
[T ∗ ( f )]m + [T ∗ (g)]r ∈ − , .
2 2
The Lemma holds.
According to the above lemma, NTRU algorithm needs to add the following
additional conditions to ensure the correctness of decryption transformation:
(D)
( q4 − 1)
df < .
2p
a = a N −1 a N −2 · · · a1 a0 ∈ FqN . (7.170)
[N , 0] = {0 = 00 · · · 0}, [N , N ] = FqN .
C = C(x) = {c(x)|c ∈ C} ⊂ R.
That is, a code C is equivalent to a subset of Fq [x]/x N − 1. The following lemma
reveals the algebraic meaning of a cyclic code.
Proof If C(x) is an ideal of Fq [x]/x N − 1, obviously C is a linear code, for any
code c = c0 c1 · · · c N −1 , there is c(x) = c0 + c1 x + · · · + c N −1 x N −1 ∈ C(x), thus
xc(x) = c N −1 + co x + c1 x 2 + · · · + c N −2 x N −1 ∈ C(x). So cT1 = c N −1 c0 c1 · · ·
c N −2 ∈ C, C is a cyclic code on Fq . Conversely, if C is a cyclic code, then cT1 ∈ C,
thus cT1k ∈ C, for all 0 ≤ k ≤ N − 1 holds. Where T10 = I N is the N -th order identity
matrix. Since the polynomial cT1k (x) corresponding to cT1k is
So ∀ g(x) ∈ R ⇒ g(x)c(x) ∈ C(x). This proves that C(x) is an ideal. The Lemma
holds.
π
kerπ = x N − 1 ⊂ A ⊂ Fq [x] −→ Fq [x]/x N − 1 = R.
Since Fq [x] is the principal ideal ring and A is an ideal of Fq [x], and x N − 1 ⊂ A,
then
A = g(x), where g(x)|x N − 1. (7.172)
Therefore, all ideals in R are finite principal ideals, which can be listed as follows
where g(x) mod x N − 1 represents the principal ideal generated by g(x) in R, that
is
This proves that Fq [x]/x N − 1 is a ring of principal ideals, and the number of
principal ideals is the number d + 1 of positive factors of x N − 1. The so-called
positive factor is a polynomial with the first term coefficient of 1. Therefore, the
Corollary is as follows:
Corollary 7.11 Let d be the number of positive factors of x N − 1, then the number
of cyclic codes with length N is d + 1.
Lemma 7.56 Let g(x)|x N − 1, g(x) be the generating polynomial of cyclic code C,
and deg g(x) = N − k, then C is [N , k] linear code, further, let g(x) = g0 + g1 x +
· · · + g N −k−1 x N −k−1 + g N −k x N −k , the corresponding codeword g = (g0 , g1 , . . . ,
g N −k , 0, 0, . . . , 0) ∈ C, then the generating matrix G of C is
⎡ ⎤
g
⎢ gT1 ⎥
⎢ ⎥
G=⎢ .. ⎥ . (7.174)
⎣ . ⎦
gT1k−1 k×N
7.8 McEliece/Niederreiter Cryptosystem 337
Proof Let C correspond to ideal C(x) = g(x) mod x N − 1, then g(x), xg(x), . . .,
x k−1 g(x) ∈ C(x), their corresponding codewords are {g, gT1 , . . . , gT1k−1 } ⊂ C,
"k−1
let’s prove that {g, gT1 , . . . , gT1k−1 } is a set of bases of C. If ∃ ai ∈ Fq ⇒ i=0 ai gT1i
= 0, then its corresponding polynomial is 0, that is
k−1
k−1
k−1
ai gT1i (x) = ai gT1i (x) = ai x i g(x) = 0.
i=0 i=0 i=0
Thus
k−1
ai x i = 0 ⇒ ∀ ai = 0, 0 ≤ i ≤ k − 1.
i=0
This proves that the dimension of linear subspace C is N − deg g(x) = k; that is, C
is [N , k] linear code. Its generating matrix G is
⎡ ⎤
g
⎢ gT1 ⎥
⎢ ⎥
G=⎢ .. ⎥ .
⎣ . ⎦
gT1k−1 k×N
Next, we discuss the dual code of cyclic code and its check matrix.
Lemma 7.57 Let C ⊂ FqN be a cyclic code and g(x) be the generating polyno-
mial of g(x), deg g(x) = N − k, let g(x)h(x) = x N − 1, h(x) = h 0 + h 1 x + · · · +
h k x k , h = (h 0 , h 1 , . . . , h k , 0, 0, · · · , 0) ∈ FqN is the corresponding codeword. h is
the reverse order codeword, then the check matrix of C is
⎡ ⎤
h
⎢ hT1 ⎥
⎢ ⎥
H =⎢ .. ⎥ . (7.175)
⎣ . ⎦
hT1N −k−1 (N −k)×N
338 7 Lattice-Based Cryptography
C ⊥ = {a H |a ∈ FqN −k },
g0 h i + g1 h i−1 + · · · + g N −k h i−N +k = 0, ∀ 0 ≤ i ≤ N − 1.
h(x) = h 0 x N −1 + h 1 x N −2 + · · · + h k x N −k−1 .
In general, when h(x)|x N − 1, h(x) x N − 1, therefore, the dual code of cyclic code
is not necessarily cyclic code.
Definition 7.15 Let x N − 1 = g1 (x)g2 (x) · · · gt (x) be the irreducible decomposi-
tion of x N − 1 on Fq , where gi (x)(1 ≤ i ≤ t) is the irreducible polynomial with the
first term coefficient of 1 in Fq [x]. Then the cyclic code generated by gi (x) is called
the i-th maximal cyclic code in FqN , denote as Mi+ . The cyclic code generated by
x N −1
gi (x)
is called the i-th minimal cyclic code, denote as Mi− .
Minimal cyclic codes are also called irreducible cyclic codes because they no
longer contain the nontrivial cyclic codes of FqN in Mi− . The irreducibility of minimal
cyclic codes can be derived from the fact that the ideal Mi− (x) in R corresponding
to Mi− is a field. We can give a proof of pure algebra.
Corollary 7.12 Let Mi− be the i-th minimal cyclic code of FqN (1 ≤ i ≤ t), Mi− (x)
is the ideal corresponding to Mi− in R, then Mi− (x) is a field, thus, Mi− no longer
contains any nontrivial cyclic code of FqN .
where g(x)Fq [x] is the principal ideal generated by g(x) in Fq [x]. Since gi (x) is an
irreducible polynomial, so Mi− (x) is a field.
Example 7.1 All cyclic codes with length of 7 are determined on binary finite field
F2 .
Solve: Polynomial x 7 − 1 has the following irreducible decomposition on F2
7.8 McEliece/Niederreiter Cryptosystem 339
Mi+ (x) = {gi (x) f (x)|0 ≤ deg f (x) ≤ N − deg gi (x) − 1}.
Let β be a root of gi (x) in the split field. Then gi (x) is the minimal polynomial of β
in Fq [x], all c(x) ∈ Mi+ (x) ⇒ c(β) = 0. Therefore,
C = {c(x)|c(β) = 0, c(x) ∈ R}
H = [1, β, β 2 , . . . , β N −1 ]m×N
constitutes the check matrix of cyclic code C, and any two rows of H are linearly
independent on Fq , by the definition, C is [N , N − m] Hamming code.
a(x)g(x) + b(x)h(x) = 1.
Let c(x) = a(x)g(x) = 1 − b(x)h(x) ∈ C(x), so for ∀ d(x) ∈ C(x), write d(x) =
g(x) f (x), thus
Therefore
c(x)d(x) ≡ d(x)(mod x N − 1).
There is c(x)d(x) = d(x) in R = Fq [x]/x N − 1. That is, c(x) is the multiplication
unit element of C(x). obviously, c(x) exists only. The Lemma holds.
Definition 7.16 C ⊂ FqN is a cyclic code, and the multiplication unit element c(x)
in C(x) is called the idempotent element of C. If C = Mi− is the i-th minimal cyclic
code, the idempotent element of C is called the primitive idempotent element, denote
as θi (x).
Lemma 7.59 Let C1 ⊂ FqN , C2 ⊂ FqN are two cyclic codes, (N , q) = 1, Idempotent
elements are c1 (x) c2 (x), respectively, then
+
(i) C1 C2 is also the cyclic code of FqN , idempotent element is c1 (x)c2 (x).
(ii) C1 + C2 is also the cyclic code of FqN , idempotent element is c1 (x) + c2 (x) +
c1 (x)c2 (x).
+
Proof It is obvious that C1 C2 and C1 + C2 are cyclic codes in FqN , because they
correspond to ideal C1 (x) and C2 (x) in R, we have
Definition 7.17 A cyclic code C ⊂ FqN with length N is called a δ-BCH code. If its
generating polynomial is the least common multiple of the minimal polynomial of
β, β 2 , . . . , β δ−1 , where δ is a positive integer, β is a primitive N -th unit root. δ-BCH
code is also called BCH code with design distance of δ. If β ∈ Fq m , N = q m − 1,
such BCH codes are called primitive.
Lemma 7.60 Let d be the minimal distance of a δ-BCH code, then we have d ≥ δ.
Proof Suppose x N − 1 = (x − 1)g1 (x)g2 (x) · · · gt (x), β is a primitive N -th unit
root on Fq , then β is the root of a gi (x). Let deg gi (x) = m ⇒ β ∈ Fq m . Because
of [Fq m : Fq ] = m, we can think of β, β 2 , . . . , β δ−1 as an m-dimensional column
vector. Let H be the following m(δ − 1) × N -order matrix.
⎡ ⎤
1 β β2 · · · β N −1
⎢ 1 β2 β4 · · · β 2(N −1) ⎥
⎢ ⎥
H =⎢. . .. .. .. ⎥ .
⎣ .. .. . . . ⎦
1 β δ−1 β 2(δ−1) · · · β (N −1)(δ−1) m(δ−1)×N
c ∈ C ⇐⇒ cH = 0.
We prove that any (δ − 1) column vectors of H are linear independent vectors. Let
the first component of these (δ − 1) column vectors be β i1 , β i2 , . . . , β iδ−1 , where
i j ≥ 0, the corresponding determinant is Vandermonde determinant , and
= β i1 +i2 +···+iδ−1 (β ir − β is ) = 0.
r >s
FqN −k is a correspondence of Spaces FqN to FqN −k , let’s prove that this correspondence
is a single shot on a special codeword whose weight is not greater than t.
Proof By hypothesis,
x H = y H ⇒ (x − y)H = 0 ⇒ x − y ∈ C.
Obviously, the Hamming distance d(0, x) = w(x) ≤ t between x and 0, and the
Hamming distance d(x, x − y) between x and x − y is
We use t-error correction code C and check matrix H as the private key.
Public key: In order to generate the public key, we randomly select a permutation
matrix PN ×N so that I N is an N -order identity matrix, I N = [e1 , e2 , . . . , e N ], σ ∈ S N
is an N -ary substitution, then
This kind of matrix is also called Wyel matrix. A nonsingular diagonal matrix
diag{λ1 , λ2 , . . . , λ N }(λi ∈ Fq , λi = 0) can also be randomly selected, and suppose
K = P H M, this is N × (N − k) ordermatrix.
w(m) = w(m P) ≤ t.
decode
m P H −→ m P.
and
⊥
q (A) = {y ∈ Zm |Ay ≡ 0(mod q)}.
If parameter d, q, n, m is satisfied
n log q
n log q < m log d ⇒ < m. (7.178)
log d
this shows that the collision points y and y of Hash function f A directly lead to a
shortest vector y − y on q-element lattice q⊥ (A).
In order to obtain the anti-collision Hash function, the selection of n × m-order
matrix A is very important. First, we can select the parameter system: let d = 2,
344 7 Lattice-Based Cryptography
A = [A(1) , A(2) , . . . , A( n ) ],
m
(7.179)
where α (i) ∈ Zqn is the n-dimensional column vector and A(i) is the cyclic matrix
generated by α (i) (see (7.149)), that is
m
A(i) = T ∗ (α (i) ) = [α (i) , T α (i) , . . . , T n−1 α (i) ], 1 ≤ i ≤ .
n
A is called an n × m-dimensional generalized cyclic matrix, and the q-element lattice
in Rm defined by A,
⊥
q (A) = {y ∈ Zm |Ay ≡ 0(mod q)}
is called a cyclic lattice. The Ajtai/Dwork cryptosystem based on cyclic lattice can
be stated as follows:
Algorithm 1: Hash function based on cyclic lattice.
Parameter: q, n, m, d is a positive integer, n | m, m log d > n log q.
Secret key: mn column vectors α (i) ∈ Zqn , 1 ≤ i ≤ mn .
Hash function f A : {0, 1, . . . , d − 1}m −→ Zqn define as
f (λ) = det(λIn − Th )
λ 0 · · · 0 a0
. ..
−1 λ · · · .. .
= ··· ··· ··· ··· ···
0 · · · · · · λ an−2
0 · · · · · · −1 an−1
λ 0 ··· 0 a0
1 1 1 0 λ · · · ..
. a1 λ + a0
= · · · n−1
λ λ2 λ ··· ··· ··· ··· ···
0 ··· ··· · · · λn + an−1 λn−1 + · · · + a1 λ + a0
= λn + an−1 λn−1 + · · · + a1 λ + a0 = h(λ).
So 6 7
−a0−1 α In−1
Th−1 = .
−a0−1 0
R = Z[x]/h(x), (7.181)
where h(x) is the ideal generated by h(x) in Z[x]. Because of deg h(x) = n, then
polynomial g(x) ∈ R in R has a unique expression: g(x) = gn−1 x n−1 + gn−2 x n−2 +
· · · + g1 x + g0 ∈ R, define mapping σ : R −→ Zn as
346 7 Lattice-Based Cryptography
⎡ ⎤
g0
⎢ g1 ⎥
⎢ ⎥
σ (g(x)) = ⎢ .. ⎥ ∈ Zn . (7.182)
⎣ . ⎦
gn−1
the n-order square matrix Th∗ (g) is called an ideal matrix generated by vector g.
Ideal matrix is a more general generalization of cyclic matrix. The former corre-
sponds to a first n-degree polynomial h(x), and the latter corresponds to a special
polynomial x n − 1. We first prove that the ideal matrix Th∗ (g) and the rotation matrix
Th generated by any vector g ∈ Zn are commutative under matrix multiplication.
Lemma 7.64 For any given first n-degree polynomial h(x) ∈ Z[x], and
n-dimensional column vector g ∈ Zn , we have
there is
6 7
0 −a0
Th∗ (g)Th = [g, Th g, Th2 g, . . . , Thn−1 g]
In−1 −α
= [Th g, Th2 g, . . . , −a0 g − a1 Th g − · · · − an−1 Thn−1 g]
= [Th g, Th2 g, . . . , (−a0 − a1 Th − · · · − an−1 Thn−1 )g]
= [Th g, Th2 g, . . . , Thn g]
= Th [g, Th g, . . . , Thn−1 g]
= Th · Th∗ (g).
7.9 Ajtai/Dwork Cryptosystem 347
Theorem 7.10 The principal ideal in R = Z[x]/h(x) 1-1 corresponds to the ideal
lattice in Zn . Specifically,
(i) If N = g(x) is any principal ideal in R, then
And because
so
⎡ ⎤
⎡ ⎤ 0
−gn−1 a0 ⎢1⎥
⎢ ⎥
g0 − gn−1 a1 ⎢ ⎥
⎢ ⎥ ⎢ ⎥
σ (xg(x)) = ⎢ .. ⎥ = Th · g = Th∗ (g) ⎢ 0 ⎥ ∈ L(Th∗ (g)).
⎣ . ⎦ ⎢ .. ⎥
⎣.⎦
gn−2 − gn−1 an−1
0
348 7 Lattice-Based Cryptography
σ ( f (x)) = σ (b(x)g(x))
n−1
= bk σ (x k g(x))
k=0
⎡ ⎤ (7.185)
b0
⎢ b1 ⎥
⎢ ⎥
= Th∗ (g) ⎢ .. ⎥ ∈ L(Th∗ (g)).
⎣ . ⎦
bn−1
That proves
σ (x) = σ (g(x)) ⊂ L(Th∗ (g)).
So we have
σ (N ) = σ (g(x)) = L(Th∗ (g)).
(i) holds. Again, σ is 1-1 corresponds, so (ii) can be derived directly. We complete
the proof of Theorem 7.10.
The above discussion on ideal matrix and ideal lattice can be extended to a finite
field Zq , because any quotient ring Zq [x]/h(x) on polynomial ring Zq [x] in finite
7.9 Ajtai/Dwork Cryptosystem 349
field is a principal ideal ring. Therefore, we can establish the 1-1 correspondence
between all ideals in R = Zq [x]/h(x) and linear codes on Zq .
Back to the Ajtai/Dwork cryptosystem: let $h(x) \in \mathbb{Z}_q[x]$ be a given polynomial, and select an $n \times m$ matrix $A \in \mathbb{Z}_q^{n \times m}$ as the generalized ideal matrix, i.e.,
$$A = [A_1, A_2, \ldots, A_{m/n}], \qquad (7.186)$$
where $A_i$ ($1 \le i \le \frac{m}{n}$) is the ideal matrix generated by $g^{(i)} \in \mathbb{Z}_q^n$, that is,
$$A_i = T_h^*(g^{(i)}) = [g^{(i)}, T_hg^{(i)}, \ldots, T_h^{n-1}g^{(i)}], \quad 1 \le i \le \frac{m}{n}.$$
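A minimal sketch of the generalized ideal matrix (7.186), assuming the same rotation-matrix convention as in the earlier sketches and purely illustrative toy parameters, with all arithmetic over $\mathbb{Z}_q$:

```python
import numpy as np

q, n = 257, 4
a = np.array([3, 1, 4, 1])            # h(x) = x^4 + x^3 + 4x^2 + x + 3 over Z_q

T = np.zeros((n, n), dtype=np.int64)  # rotation matrix T_h mod q
T[1:, :-1] = np.eye(n - 1, dtype=np.int64)
T[:, -1] = (-a) % q

def ideal_matrix_mod_q(g):
    """A_i = T_h^*(g) = [g, T_h g, ..., T_h^{n-1} g] over Z_q."""
    cols, v = [], np.array(g, dtype=np.int64) % q
    for _ in range(n):
        cols.append(v)
        v = (T @ v) % q
    return np.column_stack(cols)

m = 12                                 # choose m with n | m
rng = np.random.default_rng(7)
gs = [rng.integers(0, q, n) for _ in range(m // n)]
A = np.hstack([ideal_matrix_mod_q(g) for g in gs])   # n x m generalized ideal matrix
print(A.shape)                         # -> (4, 12)
```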
Exercises
$$\lambda_1(L) \cdot \lambda_1(L^*) \le n.$$
4. Let $\lambda_1(L), \lambda_2(L), \ldots, \lambda_n(L)$ be the lengths of the successive shortest vectors (the successive minima) of the lattice $L$; prove that
$$\lambda_1(L) \cdot \lambda_n(L^*) \ge 1.$$
$$\frac{1}{n}\lambda_1(L) \le \min\{|\beta_1^*|, |\beta_2^*|, \ldots, |\beta_n^*|\} \le \lambda_1(L).$$
$$\lambda_1(L) \cdot \mu(L^*) \le n.$$
References
Ajtai, M. (2004). Generating hard instances of lattice problems. In Quad. Mat.: Vol. 13. Complexity
of computations and proofs (pp. 1–32). Dept. Math., Seconda Univ. Napoli. Preliminary version
in STOC 1996.
Ajtai, M., & Dwork, C. (1997). A public-key cryptosystem with worst-case/average-case equiva-
lence. In Proceedings of 29th Annual ACM Symposium on Theory of Computing (STOC) (pp.
284–293).
Babai, L. (1986). On Lovász' lattice reduction and the nearest lattice point problem. Combinatorica, 6, 1–13.
Cassels, J. W. S. (1963). Introduction to diophantine approximation. Cambridge University Press.
Cassels, J. W. S. (1971). An introduction to the geometry of numbers. Springer.
Gama, N., & Nguyen, P. Q. (2008a). Finding short lattice vectors within Mordell’s inequality. In
Proceedings of 40th ACM Symposium on Theory of Computing (STOC) (pp. 207–216).
Gama, N., & Nguyen, P. Q. (2008b). Predicting lattice reduction. In Lecture Notes in Computer
Science: Advances in cryptology. Proceedings of Eurocrypt’08. Springer
Goldreich, O., Goldwasser, S., & Halevi, S. (1997). Public-key cryptosystems from lattice reduction
problems. In Lecture Notes in Computer Science: Vol. 1294. Advances in cryptology (pp. 112–
131). Springer.
Hoffstein, J., Pipher, J., & Silverman, J. H. (1998). NTRU: A ring based public key cryptosystem.
In LNCS: Vol. 1423. Proceedings of ANTS-III (pp. 267–288). Springer.
Klein, P. (2000). Finding the closest lattice vector when it’s unusually close. In Proceedings of 11th
Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 937–941).
Lenstra, A. K., Lenstra, H. W., Jr., & Lovász, L. (1982). Factoring polynomials with rational coefficients. Mathematische Annalen, 261(4), 515–534.
McEliece, R. (1978). A public-key cryptosystem based on algebraic coding theory. Technical Report, Jet Propulsion Laboratory. DSN Progress Report 42-44.
Micciancio, D. (2001). Improving lattice based cryptosystems using the Hermite normal form. In
J. Silverman (Ed.), Lecture Notes in Computer Science: Vol. 2146. Cryptography and lattices
conference—CaLC 2001 (pp. 126–145). Springer.
Micciancio, D., & Regev, O. (2009). Lattice-based cryptography. Springer.
Niederreiter, H. (1986). Knapsack-type cryptosystems and algebraic coding theory. Problems of
Control and Information Theory/Problemy Upravlen. Teor. Inform., 15(2), 159–166.
Peikert, C. (2016). A decade of lattice cryptography. Foundations and Trends in Theoretical Computer Science, 10(4), 283–424.
Regev, O. (2004). Lattices in computer science (Lecture 1–Lecture 7). Tel Aviv University, Fall.