
Financial Mathematics and Fintech

Zhiyong Zheng

Modern
Cryptography
Volume 1
A Classical Introduction to Informational
and Mathematical Principle
Financial Mathematics and Fintech

Series Editors
Zhiyong Zheng, Renmin University of China, Beijing, Beijing, China
Alan Peng, University of Toronto, Toronto, ON, Canada
This series addresses the emerging advances in mathematical theory related to finance and applied research from all fintech perspectives. It is a series of monographs and contributed volumes focusing on the in-depth exploration of financial mathematics, such as applied mathematics, statistics, optimization, and scientific computation, and of fintech applications such as artificial intelligence, blockchain, cloud computing, and big data. The series features a comprehensive understanding and practical application of financial mathematics and fintech, and involves cutting-edge applications of financial mathematics and fintech in practical programs and companies.
The Financial Mathematics and Fintech book series promotes the exchange of emerging theory and technology of financial mathematics and fintech between academia and financial practitioners. It aims to provide a timely reflection of the state of the art in mathematics and computer science as applied to finance.
As a collection, this book series provides valuable resources to a wide audience in academia, the finance community, government employees related to finance, and anyone else looking to expand their knowledge in financial mathematics and fintech.
The key words in this series include but are not limited to:
The key words in this series include but are not limited to:
a) Financial mathematics
b) Fintech
c) Computer science
d) Artificial intelligence
e) Big data

More information about this series at https://link.springer.com/bookseries/16497


Zhiyong Zheng

Modern Cryptography
Volume 1
A Classical Introduction to Informational
and Mathematical Principle
Zhiyong Zheng
School of Mathematics
Renmin University of China
Beijing, China

Financial Mathematics and Fintech
ISSN 2662-7167        ISSN 2662-7175 (electronic)
ISBN 978-981-19-0919-1        ISBN 978-981-19-0920-7 (eBook)
https://doi.org/10.1007/978-981-19-0920-7

© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

When I organized several seminars on cryptography, the students generally felt that cryptography does not need much mathematics, and that computer languages and the computing environment are more important. Later, I reviewed several common cryptography textbooks at home and abroad; indeed, these textbooks are written for engineering students, and their purpose is to cultivate cryptographic engineers. It was my original intention to write a textbook on theoretical cryptography for students of mathematics departments and science postgraduates, one which systematically teaches the statistical characteristics of cryptosystems, the computational complexity theory of cryptographic algorithms, and the mathematical principles behind various encryption and decryption algorithms.
With the rapid development of the new generation of digital technology, China has entered the era of information, networks and intelligence. Cryptography is not only the cornerstone of national security in the information age, but also a sharp sword protecting people's property, personal privacy and personal dignity. Following the establishment of Cyberspace Security as a first-class discipline, China established cryptography as a first-class discipline as well. In particular, on December 19, 2019, China officially promulgated the Cryptography Law; formulating a law for a single discipline is rare anywhere in the world. Lately, the central government has explicitly called for the cultivation of our own cryptography professionals. It can be seen that the discipline construction and personnel training of cryptography have been elevated to the level of national security and have become a major national strategic demand. Writing a textbook on cryptographic theory so as to cultivate our own cryptographers is the ultimate reason for writing this book.
Cryptography is an ancient art; since the birth of human society there have been cryptosystems. For example, the means of secret communication used in war, and the marks and conventions used by special groups, can all be classified as the art of cryptography. Among them, the famous Caesar cipher can be regarded as the representative work of ancient cryptography. For thousands of years, cryptography as a technology relied on personal intelligence and ingenuity; only occasionally and fragmentarily were mathematical ideas and methods used. This era of cryptography as a craft changed fundamentally only after the great American mathematician C. E. Shannon came onto the scene.
In 1948 and 1949, Shannon successively published two epoch-making papers in the Bell System Technical Journal. In the first paper, Shannon established the mathematical theory of communication and a random measure of information based on probability theory, thus laying the foundation of modern information theory. In the second paper, Shannon established the information-theoretic principles of cryptography, introduced the probability and statistics apparatus of mathematics into cipher construction and cryptanalysis, and transformed the ancient technique of cryptography from an art into a science. Therefore, people call Shannon not only the father of modern information theory, but also the father of modern cryptography.
After Shannon's great transformation from the era of the cryptographer's craft to the era of cryptographic science, the ancient technology of cryptology ushered in its second historic leap in 1976: the era of symmetric cryptography gave way to the era of public key cryptography. In 1976, two Stanford University scholars, W. Diffie and M. Hellman, published a pioneering paper on asymmetric cryptography in IEEE Transactions on Information Theory, and the era of public key cryptography began. Public key cryptography and mathematics are much more deeply intertwined, making cryptography an inseparable branch of mathematics. The defining characteristic of the public key era is that it turned cryptography from a tool of a few users into a mass consumer product, greatly improving the efficiency and social value of cryptography. Nowadays, asymmetric cryptosystems are widely used in message authentication, identity authentication, digital signatures, digital currency and blockchain architecture, where they cannot be replaced by classical cryptosystems.
Based on Shannon's information theory, this book systematically introduces the information theory, statistical characteristics and computational complexity theory of public key cryptography, focusing on the three main algorithms of public key cryptography, RSA, discrete logarithm and elliptic curve cryptosystems; it strives to explain not only what they are but why they are so, and lays a solid theoretical foundation for new cryptosystem design, cryptanalysis and attack.
Lattice-based cryptography is a representative technology of post-quantum cryptography, recognized by the academic community as able to resist quantum computing attacks. At present, the theory and technology of lattice cryptography have not yet entered university textbooks, and the various achievements and expositions have been scattered across research papers at home and abroad over the past two decades. The greatest feature of this book is that it systematically streamlines and organizes the theory and technology of lattice cryptography, making it classroom material for senior college students and postgraduates in cryptography, which will play an important role in accelerating the training of modern cryptography talents in China.
This book requires the reader to have a good foundation in algebra, number theory and probability statistics. It is suitable for senior students majoring in mathematics, as a required text for postgraduates in cryptography and in science and engineering, and it can also be used as the main reference book for researchers engaged in cryptographic research and cryptographic engineering.
The main contents of this book have been taught in the seminar. My doctoral students Hong Ziwei, Chen Man, Xu Jie, Zhang Mingpei, Associate Professor Huang Wenlin and Dr. Tian Kun have all offered many useful suggestions and much help with the contents of this book. In particular, Chen Man devoted a lot of time and energy to typesetting and proofreading. Here, I would like to express my deep gratitude to them!

Beijing, China Zhiyong Zheng


November 2021
Contents

1 Preparatory Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Injective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Computational Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Jensen Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Stirling Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 n-fold Bernoulli Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6 Chebyshev Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.7 Stochastic Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2 The Basis of Code Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.1 Hamming Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2 Linear Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3 Lee Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4 Some Typical Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4.1 Hadamard Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4.2 Binary Golay Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.4.3 3-Ary Golay Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.4.4 Reed–Muller Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.5 Shannon Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3 Shannon Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.1 Information Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.2 Joint Entropy, Conditional Entropy, Mutual Information . . . . . . . . . . 96
3.3 Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.4 Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.5 Source Coding Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.6 Optimal Code Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
3.7 Several Examples of Compression Coding . . . . . . . . . . . . . . . . . . . . . 130
3.7.1 Morse Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.7.2 Huffman Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132


3.7.3 Shannon–Fano Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133


3.8 Channel Coding Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4 Cryptosystem and Authentication System . . . . . . . . . . . . . . . . . . . . . . . . 153
4.1 Definition and Statistical Characteristics of Cryptosystem . . . . . . . . 153
4.2 Fully Confidential System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.3 Ideal Security System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.4 Message Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.5 Forgery Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.6 Substitute Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
4.7 Basic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
4.7.1 Affine Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
4.7.2 RSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.7.3 Discrete Logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
4.7.4 Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
5 Prime Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
5.1 Fermat Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
5.2 Euler Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
5.3 Monte Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
5.4 Fermat Decomposition and Factor Basis Method . . . . . . . . . . . . . . . . 217
5.5 Continued Fraction Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
6 Elliptic Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6.1 Basic Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6.2 Elliptic Curve Public Key Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . 236
6.3 Elliptic Curve Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
7 Lattice-Based Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.1 Geometry of Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.2 Basic Properties of Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
7.3 Integer Lattice and q-Ary Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
7.4 Reduced Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
7.5 Approximation of SVP and CVP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
7.6 GGH/HNF Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
7.7 NTRU Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
7.8 McEliece/Niederreiter Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . 334
7.9 Ajtai/Dwork Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Acronyms

1. [x] denotes the largest integer not greater than the real number x, and ⌈x⌉ denotes the smallest integer not less than the real number x, so that

   [x] ≤ x < [x] + 1,  ⌈x⌉ − 1 < x ≤ ⌈x⌉.

2. C denotes the complex number field, R the real number field, Q the rational number field, and Fq a finite field of q elements, q = p^r with p prime; Z denotes the integer ring, and Zm the residue class ring mod m (m ≥ 1).
3. (a, b) denotes the greatest common divisor of two integers, and sometimes a two-dimensional vector.
4. a mod n denotes the minimum nonnegative residue of the integer a modulo n, i.e., 0 ≤ a mod n < n. Sometimes it means the minimum absolute residue, i.e., |a mod n| < n/2.
5. Let F be a field; F[x] denotes the polynomial ring in one variable over the field F. Sometimes the variable T is used, i.e., F[T], where F = C, R, Q, or F = Fq is a finite field.
6. The base of the logarithm log N can be any real number b > 1. If b = 2, it is the binary logarithm; when b = q, it is the base-q logarithm. Sometimes log N also means the natural logarithm, as determined by the specific context.
7. P{A} denotes the probability of occurrence of the random event A.
8. If G is a group and a ∈ G is an element of the group, then o(a) denotes the order of a.

Chapter 1
Preparatory Knowledge

Modern cryptography and information theory form a rapidly developing branch of mathematics. Almost all mathematical knowledge, such as algebra, geometry, analysis, probability and statistics, has very important applications in information theory. In particular, some modern mathematical theories, such as algebraic geometry, elliptic curves and ergodic theory, play increasingly important roles in coding and cryptography. It can be said that information theory is among the most dynamic branches of modern mathematics, with wide application and strong interaction with other fields. This chapter requires the reader to have preliminary knowledge of analysis, algebra, number theory and probability statistics.

1.1 Injective
Let σ be a mapping between two nonempty sets A and B, denoted σ : A → B. Generally, the mappings between sets can be divided into three categories: injective, surjective and bijective.

Definition 1.1 Let σ be a mapping of a nonempty set A to B. We define:
(i) If a, b ∈ A and a ≠ b ⇒ σ(a) ≠ σ(b), σ is called an injection of A → B, injective for short.
(ii) If for any b ∈ B there is an a ∈ A with σ(a) = b, σ is called a surjection of A → B.
(iii) If σ : A → B is both injective and surjective, σ is called a bijection of A → B.
(iv) Let 1_A be the identity mapping of A → A, defined as

1_A(a) = a, ∀ a ∈ A.

(v) Suppose σ : A → B and τ : B → C are two mappings. Define the product mapping of τ and σ, τσ : A → C, as

τσ(a) = τ(σ(a)), ∀ a ∈ A.

Obviously, the product of two mappings is not commutative, but it obeys the following associative law.

Property 1 Let σ : A → B, τ : B → C, δ : C → D be three mappings; then

(δτ)σ = δ(τσ).   (1.1)

Proof It can be verified directly from the definition.

If σ : A → B is a given mapping, obviously

σ1_A = σ, 1_Bσ = σ.   (1.2)

The above formula shows that the identity mapping plays the role of a multiplicative identity in the product of mappings.
Definition 1.2 (i) Suppose σ : A → B and τ : B → A are two mappings. If τσ = 1_A, τ is called a left inverse mapping of σ, and σ a right inverse mapping of τ.
(ii) Let σ : A → B, τ : B → A. If τσ = 1_A and στ = 1_B, τ is called an inverse mapping of σ, denoted τ = σ⁻¹.
The essential properties of injections, surjections and bijections between sets are described by the following lemma.

Lemma 1.1 (i) If σ : A → B has an inverse mapping τ : B → A, that is στ = 1_B and τσ = 1_A, then τ is unique (denoted τ = σ⁻¹).
(ii) σ : A → B is injective if and only if σ has a left inverse mapping τ : B → A, that is τσ = 1_A.
(iii) σ : A → B is surjective if and only if σ has a right inverse mapping τ : B → A, that is στ = 1_B.
(iv) σ : A → B is bijective if and only if σ has an inverse mapping τ, and τ is unique.
Proof First we prove (i). Let τ1 : B → A and τ2 : B → A be two inverse mappings of σ; then

τ1σ = 1_A, τ2σ = 1_A, and στ1 = 1_B, στ2 = 1_B.

From (1.2), we have

τ1 = τ1 1_B = τ1(στ2) = (τ1σ)τ2 = 1_A τ2 = τ2,

so if σ has an inverse mapping, the inverse mapping is unique.

To prove (ii), note that if σ has a left inverse mapping τ, that is τσ = 1_A, then σ must be injective: if a, b ∈ A with a ≠ b, then σ(a) ≠ σ(b), for σ(a) = σ(b) would give τ(σ(a)) = τ(σ(b)), that is a = b, a contradiction with a ≠ b. Conversely, if σ : A → B is injective, then for each element σ(a) ∈ σ(A) ⊂ B let τ(σ(a)) = a, and for the elements of the difference set B\σ(A) assign images arbitrarily; then τ : B → A satisfies τσ = 1_A. Similarly, we can prove (iii) and (iv); we thus complete the proof.

In many books on information theory, injective and bijective are used interchangeably, but they are two different concepts in mathematics; this deserves attention.
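These distinctions are easy to check computationally for finite sets. The following minimal Python sketch (our illustration, not part of the original text) tests whether a mapping, represented as a dictionary, is injective, surjective or bijective:

```python
def is_injective(sigma):
    # Injective: distinct arguments have distinct images, i.e. no image repeats.
    images = list(sigma.values())
    return len(images) == len(set(images))

def is_surjective(sigma, B):
    # Surjective: every element of B is the image of some element of A.
    return set(sigma.values()) == set(B)

def is_bijective(sigma, B):
    # Bijective: both injective and surjective (Definition 1.1 (iii)).
    return is_injective(sigma) and is_surjective(sigma, B)

# sigma maps A = {1, 2, 3} into B = {'x', 'y'}
sigma = {1: 'x', 2: 'y', 3: 'x'}
print(is_injective(sigma))          # False: 1 and 3 share the image 'x'
print(is_surjective(sigma, "xy"))   # True: both 'x' and 'y' are hit
```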

1.2 Computational Complexity

In a binary computing environment, the complexity of an algorithm is measured by the number of bit operations. A bit, short for "binary digit," is the basic unit of information: one bit represents one binary digit, two bits represent two binary digits. So what is a "bit operation"?
To understand "bit operation" precisely, we start with the b-ary expansion of real numbers. Let b > 1 be a positive integer; any nonnegative real number x can be uniquely expanded into the following geometric series:

$$x = \sum_{-\infty < i \le k-1} d_i b^i = d_{k-1}b^{k-1} + d_{k-2}b^{k-2} + \cdots + d_1 b + d_0 + \sum_{i=1}^{+\infty} d_{-i} b^{-i}, \qquad (1.3)$$

where every d_i satisfies 0 ≤ d_i < b. So we can express x as

$$x = (d_{k-1} d_{k-2} \cdots d_0 d_{-1} d_{-2} \cdots)_b, \qquad (1.4)$$

where (d_{k-1} d_{k-2} ⋯ d_0)_b is called a b-ary integer, (0.d_{-1} d_{-2} ⋯)_b is called a b-ary decimal, and

$$x = (d_{k-1} d_{k-2} \cdots d_0)_b + (0.d_{-1} d_{-2} \cdots)_b. \qquad (1.5)$$

If b = 2, then x = (d_{k-1} d_{k-2} ⋯ d_0 d_{-1} ⋯)_2 is called the binary representation of x. If b = 10, then

$$x = (d_{k-1} d_{k-2} \cdots d_1 d_0 d_{-1} d_{-2} \cdots)_{10} = d_{k-1} d_{k-2} \cdots d_1 d_0 . d_{-1} d_{-2} \cdots, \qquad (1.6)$$

which is our customary decimal expression. It is worth noting that in any base, integers correspond one-to-one with integers and decimals correspond one-to-one with decimals; for example, an integer in the decimal system corresponds to an integer in the binary system, and likewise for decimals. In other words, the real numbers of the interval (0, 1) on the real axis correspond one to one with the binary decimals in (0, 1). It should also be noted that binary decimals are often ignored; in fact, they are the main technical support of various arithmetic codes, such as the Shannon code.
Now let us consider the b-ary expression of a positive integer n. Let

n = (d_{k-1} d_{k-2} ⋯ d_1 d_0)_b, 0 ≤ d_i < b, d_{k-1} ≠ 0,

where k is the number of b-ary digits of n.

Lemma 1.2 The number k of b-ary digits of a positive integer n can be calculated by the formula

k = [log_b n] + 1,   (1.7)

where [x] denotes the largest integer not greater than the real number x.

Proof Since d_{k-1} ≠ 0, we have b^{k-1} ≤ n < b^k, that is,

k − 1 ≤ log_b n < k.

The left inequality gives k − 1 ≤ [log_b n], the right one gives [log_b n] + 1 ≤ k, and together they give

k = [log_b n] + 1.

We complete the proof of Lemma 1.2.
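As a quick numerical illustration of formula (1.7) — a sketch of ours, not from the text — the digit-count formula can be checked against direct repeated division (avoiding exact powers of b, where floating-point logarithms may round badly):

```python
import math

def digit_count_formula(n, b):
    # Formula (1.7): k = [log_b n] + 1
    return math.floor(math.log(n, b)) + 1

def digit_count_direct(n, b):
    # Count b-ary digits by repeated division by b
    k = 0
    while n > 0:
        n //= b
        k += 1
    return k

for n, b in [(1000, 2), (999, 10), (255, 16), (1, 7)]:
    assert digit_count_formula(n, b) == digit_count_direct(n, b)
    print(n, b, digit_count_direct(n, b))   # 10, 3, 2, 1 digits respectively
```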

Now let us consider addition in the b-ary system. For simplicity, we consider the addition of two positive integers in binary. Let n = (1111000)_2 and m = (11110)_2; then n + m = 1111000 + 0011110 = 10010110, that is, n + m = (10010110)_2. The addition of the digits in one bit position actually includes the following five operations:
1. Observe the digits in the same bit position and note whether there is a carry from the bit to the right (every two carries one).
2. If the upper and lower digits of the same bit are both 0 and there is no carry from the right, the sum of the two digits is 0.
3. If both digits of the same bit are 0 but there is a carry, or if one of the two digits is 0 and the other is 1 and there is no carry, the two digits in this position add up to 1.
4. If one of the two digits of the same bit is 0 and the other is 1 and there is a carry, or both digits are 1 and there is no carry, the result of the addition is 0 and a carry is put forward.
5. If both digits are 1 and there is a carry, the sum is 1 and a carry is put forward.

Definition 1.3 A bit operation is an addition operation on one bit position in binary addition. Suppose A is an algorithm in the binary system; we use Time(A) to denote the number of bit operations in algorithm A, that is, Time(A) = the total number of bit operations performed to complete algorithm A.

It is easy to deduce from the definition the number of bit operations of binary addition and subtraction. Let n, m be two positive integers whose binary expressions have k and l bits, respectively; then

Time(n ± m) = max{k, l}.   (1.8)

In the same way, the number of bit operations required for the multiplication of n and m in the binary system satisfies

Time(nm) ≤ (k + l) · min{k, l} ≤ 2kl.   (1.9)

It is very convenient to estimate the number of bit operations using the symbol "O" commonly used in number theory. If f(x) and g(x) are two real valued functions, g(x) > 0, and there are two absolute constants B and C such that for |x| > B we have

|f(x)| ≤ Cg(x), written f(x) = O(g(x)),

this notation indicates that as x → ∞, the order of growth of f(x) is at most that of g(x). For example, let f(x) = a_d x^d + a_{d-1} x^{d-1} + ⋯ + a_1 x + a_0 (a_d > 0); then

f(x) = O(|x|^d), or f(n) = O(n^d), n ≥ 1.

For any ε > 0, there is

log n = O(n^ε), n ≥ 1.

From Lemma 1.2, (1.8) and (1.9), we have

Lemma 1.3 Let n, m be two positive integers, and let k and l be the numbers of bits of their binary expressions, respectively. If m ≤ n, then l ≤ k, and

Time(n ± m) = O(k) = O(log n);

Time(nm) = O(kl) = O(log n log m);

Time(n/m) = O(kl) = O(log n log m).
In the above lemma, division is similar to multiplication. Next, we discuss the number of bit operations required to convert a binary representation into a decimal representation, and the number of bit operations required to compute n! in binary.

Lemma 1.4 Let k be the number of binary digits of n; then

Time(convert n to decimal expression) = O(k²) = O(log² n)

and

Time(n!) = O(n²k²) = O(n² log² n).

Proof To convert n = (d_{k-1} d_{k-2} ⋯ d_1 d_0)_2 to a decimal expression, divide n by 10 = (1010)_2; the remainder is one of the binary numbers 0, 1, 10, 11, 100, 101, 110, 111, 1000 or 1001, and these ten numbers correspond to the digits 0 to 9. Denote it a_0 (0 ≤ a_0 ≤ 9) and take a_0 as the units digit of n in decimal. Similarly, divide the quotient by 10 = (1010)_2, and the remainder, converted into a number from 0 to 9, is the tens digit of n in the decimal system. Continuing in this way, we use [log_10 n] + 1 divisions, and the bit operations required for each division are O(4k), so

Time(convert n to decimal expression) ≤ k · O(4k) = O(k²).

In the same way, we can prove the bit operation estimate for n!. We complete the proof of Lemma 1.4.
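The conversion procedure in the proof is easy to express in code. Here is a minimal sketch of ours that extracts the decimal digits of a binary number by repeated division by 10 = (1010)_2, exactly as described above:

```python
def binary_to_decimal_digits(n):
    # n is a positive integer (internally binary on a real machine); extract its
    # decimal digits, least significant first, by repeated division by 10.
    digits = []
    while n > 0:
        n, r = divmod(n, 10)   # one division with remainder
        digits.append(r)       # remainder in 0..9 is the next decimal digit
    return digits

n = 0b1111000                  # n = (1111000)_2 = 120
digits = binary_to_decimal_digits(n)
print(digits[::-1])            # [1, 2, 0]; the number of divisions is [log10 n] + 1
```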
Let us deduce the computational complexity of some common number-theoretic algorithms. Let m and n be two positive integers; then there is a nonnegative integer r such that m ≡ r (mod n), where 0 ≤ r < n. We call r the smallest nonnegative residue of m under mod n and denote it r = m mod n. If 1 ≤ m ≤ n, Euclid's division method is usually used to find the greatest common divisor (n, m) of n and m. If (m, n) = 1, then there is a positive integer a such that ma ≡ 1 (mod n); a is called the multiplicative inverse of m under mod n, denoted m⁻¹ mod n. By the Bezout formula, if (n, m) = 1, then there are integers x and y such that xm + yn = 1; we usually use the extended Euclid algorithm to find x and y. If we find x, we have actually calculated m⁻¹ mod n. Under the above definitions and notations, we have
Lemma 1.5 (i) Suppose m and n are two positive integers; then

Time(calculate m mod n) = O(log n · log m).

(ii) Suppose m and n are two positive integers, and m ≤ n; then

Time(calculate (n, m)) = O(log³ n).

(iii) Suppose m and n are two positive integers, and (m, n) = 1; then

Time(calculate m⁻¹ mod n) = O(log³ max(n, m)).

(iv) Suppose n, m, b are positive integers, b < n; then

Time(bᵐ mod n) = O(log m · log² n).


Proof To find the minimum nonnegative residue r of m under mod n is simply a division with remainder:

m = kn + r, 0 ≤ r < n.

From Lemma 1.3,

Time(calculate m mod n) = O(log n · log m),

so (i) holds. The Euclid algorithm used to calculate the greatest common divisor (n, m) of n and m is in fact O(log n) divisions with remainder, so

Time(calculate (n, m)) = O(log³ n).

In the Euclid algorithm, we can obtain x and y with xm + yn = 1 by substituting back from bottom to top; this back-substitution is called the extended Euclid algorithm. Therefore, if m ≤ n, then

Time(calculate m⁻¹ mod n) = Time(calculate (n, m)) = O(log³ n).

(iv) concerns the computational complexity of the power of an integer under mod n; the proof method is the famous "repeated square method." Let

m = (m_{k-1} m_{k-2} ⋯ m_1 m_0)_2 = m_0 + 2m_1 + 4m_2 + ⋯ + 2^{k-1} m_{k-1}

be the binary representation of m, where m_i = 0 or 1. First, let a = 1. If m_0 = 1, replace a with b; if m_0 = 0, then a = 1 remains unchanged. Let b_1 = b² mod n; this is the first squaring. If m_1 = 1, replace a with ab_1 mod n; if m_1 = 0, a remains unchanged. Let b_2 = b_1² mod n; this is the second squaring. Continuing in this way, after the j-th squaring we have

b_j ≡ b^{2^j} (mod n).

The calculation ends after the (k − 1)-th squaring, at which point

a ≡ b^{m_0 + 2m_1 + 4m_2 + ⋯ + 2^{k-1} m_{k-1}} ≡ b^m (mod n).

Obviously, the number of bit operations per squaring is O(log² n); there are k squarings in total, and k = O(log m). So (iv) holds. We have completed the proof.
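The algorithms analyzed in Lemma 1.5 can be written out concisely. The following Python sketch (ours, under the book's assumptions) implements the extended Euclid algorithm for the inverse m⁻¹ mod n and the repeated square method for bᵐ mod n:

```python
def ext_gcd(a, b):
    # Extended Euclid algorithm: returns (g, x, y) with ax + by = g = (a, b)
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def inverse_mod(m, n):
    # Multiplicative inverse m^{-1} mod n, assuming (m, n) = 1
    g, x, _ = ext_gcd(m, n)
    assert g == 1, "inverse exists only when (m, n) = 1"
    return x % n

def power_mod(b, m, n):
    # Repeated square method: computes b^m mod n with O(log m) squarings
    a = 1
    b %= n
    while m > 0:
        if m & 1:          # current binary digit m_i = 1: multiply into result
            a = (a * b) % n
        b = (b * b) % n    # next squaring: b_j = b_{j-1}^2 mod n
        m >>= 1
    return a

print(inverse_mod(7, 40))       # 23, since 7 * 23 = 161 ≡ 1 (mod 40)
print(power_mod(3, 100, 101))   # 1, consistent with Fermat's little theorem
```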

Definition 1.4 If an algorithm f involves positive integers n_1, n_2, …, n_r whose numbers of binary digits are k_1, k_2, …, k_r, and there are absolute nonnegative integers d_1, d_2, …, d_r such that

$$\mathrm{Time}(f) = O(k_1^{d_1} k_2^{d_2} \cdots k_r^{d_r}), \qquad (1.10)$$

then the complexity of algorithm f is called polynomial; otherwise, it is called nonpolynomial.
From Lemmas 1.3 and 1.4, we see that addition, subtraction, multiplication and division of positive integers are polynomial algorithms, while the computation of n! is the simplest example of a nonpolynomial algorithm. If we do not need the exact value of n! but only an approximation, we can obtain an approximate value of n! by a polynomial algorithm based on the Stirling formula (see Sect. 1.4 of this chapter). In formula (1.10), if d_1 = d_2 = ⋯ = d_r = 0, the complexity of algorithm f is constant; if d_1 = d_2 = ⋯ = d_r = 1, the complexity of f is said to be linear (and similarly for quadratic, cubic, etc.). In order to characterize nonpolynomial algorithms, we introduce two concepts: exponential and subexponential algorithms.
Definition 1.5 Suppose that an algorithm f involves a positive integer n whose number of binary digits is k. If

$$\mathrm{Time}(f) = O(t^{g(k)}), \qquad (1.11)$$

where t is a constant greater than 1 and g(k) is a polynomial function of k with deg g ≥ 1, then the computational complexity of f is exponential. If g(k) is not a polynomial function but a function smaller than a polynomial, for example giving Time(f) = O(e^{√(k log k)}), then the computational complexity of f is subexponential.
From the above definitions, we can see the computational complexity of n!. Let k be the number of binary digits of n; from Lemma 1.2, n = O(2^k), and then from Lemma 1.4,

Time(n!) = O(n²k²) = O(k² 2^{2k}) = O(2^{3k}).

So the computational complexity of n! in the binary system is exponential. This is the simplest example of an exponential algorithm.
Bit operations can be used not only to define computational complexity, but also to describe the running speed and time complexity of a computer. The so-called computer speed refers to the total number of bit operations the computer can complete in unit time (such as one second or one microsecond). Therefore, there is no difference between the computational complexity and the time complexity of an algorithm. We can illustrate with the table below: suppose a computer can complete 10^6 bit operations in one second. When the number of binary digits in the algorithm is k = 10^6, Table 1.1 lists the running times of algorithms of different computational complexity on this computer.
Note that 1 year ≈ 3 × 10^7 seconds, and the age of the universe is about 10^10 years. When the number of binary digits k is large, an algorithm of exponential or subexponential computational complexity is actually impossible to complete on the computer; in that case, the only way to solve the problem is to improve the speed of the computer.
Computational complexity is often used to describe the complexity of a problem, because the computational complexity is also time complexity when the computer hardware conditions (such as computing speed and storage capacity) remain

Table 1.1 Time requirements of algorithms with different computational complexity (k = 10^6)

Algorithm type  | Complexity          | Number of bit operations | Time
Constant        | O(1)                | 1                        | 1 microsecond
Linear          | O(k)                | 10^6                     | 1 s
Quadratic       | O(k^2)              | 10^12                    | 11.6 days
Cubic           | O(k^3)              | 10^18                    | 32,000 years
Subexponential  | O(e^{√(k log k)})   | about 1.8 × 10^1618      | 6 × 10^1604 years
Exponential     | O(2^k)              | 10^301030                | 3 × 10^301016 years

unchanged. At present, the complexity of algorithms is defined in a model called the Turing machine. A Turing machine is a kind of finite state machine with unbounded read and write capacity. If the result of each operation and the content of the next operation are uniquely determined, such a Turing machine is called a deterministic Turing machine. Accordingly, a deterministic polynomial algorithm is one carried out on a deterministic Turing machine.
Definition 1.6 If a problem can be solved by a polynomial algorithm on a deterministic Turing machine, it is called a P class problem, and P class problems are often called tractable problems. If a problem can be solved by a polynomial algorithm on a nondeterministic Turing machine, it is called an NP class problem.
By definition, a P class problem is automatically an NP class problem: it can be solved by a polynomial algorithm on a deterministic Turing machine, and hence also on a nondeterministic Turing machine. On the other hand, is the class NP strictly larger than the class P? This is an open problem in theoretical computer science. There is neither a rigorous proof nor a counterexample showing that a problem solvable in polynomial time on a nondeterministic Turing machine cannot be solved by a polynomial algorithm on a deterministic Turing machine. It is widely conjectured that the classes P and NP are not equal, and this conjecture is also the cornerstone of many cryptosystems.

1.3 Jensen Inequality

A real valued function f(x) on the interval (a, b) is called strictly concave (convex from above) if for all x1, x2 ∈ (a, b), λ1 > 0, λ2 > 0, λ1 + λ2 = 1, we have

λ1 f(x1) + λ2 f(x2) ≤ f(λ1 x1 + λ2 x2),

and the equation holds if and only if x1 = x2. By induction, we can prove the Jensen inequality as follows.
Lemma 1.6 If f(x) is a strictly concave function on (a, b), then for any positive integer n > 1, any positive numbers λi (1 ≤ i ≤ n) with λ1 + λ2 + ⋯ + λn = 1, and any xi ∈ (a, b) (1 ≤ i ≤ n), we have

$$\sum_{i=1}^{n} \lambda_i f(x_i) \le f\Big(\sum_{i=1}^{n} \lambda_i x_i\Big), \qquad (1.12)$$

and the equation holds if and only if x1 = x2 = ⋯ = xn.

Proof We argue by induction; the proposition holds for n = 1 and n = 2 (the latter by definition). Suppose the proposition holds for n − 1. When n > 2, let

$$x' = \frac{\lambda_1}{\lambda_1 + \lambda_2} x_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2} x_2;$$

it can be seen that x′ ∈ (a, b) and (λ1 + λ2)x′ = λ1 x1 + λ2 x2. Therefore,

$$\sum_{i=1}^{n} \lambda_i f(x_i) = \lambda_1 f(x_1) + \lambda_2 f(x_2) + \sum_{i=3}^{n} \lambda_i f(x_i) \le (\lambda_1 + \lambda_2) f(x') + \sum_{i=3}^{n} \lambda_i f(x_i) \le f(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n),$$

where the first inequality uses the two-point case and the second uses the induction hypothesis applied to the n − 1 points x′, x3, …, xn. So the proposition holds for n, and inequality (1.12) holds.

From mathematical analysis, f(x) is strictly concave on the interval (a, b) if and only if f″(x) < 0. Take f(x) = log x; then f″(x) = −1/(x² ln 2), so log x is strictly concave on the interval (0, +∞). From the Jensen inequality, we have the following inequality.

Lemma 1.7 Let g(x) be a positive function, that is g(x) > 0. Then for any positive numbers λi (1 ≤ i ≤ n) with λ1 + λ2 + ⋯ + λn = 1, and any a1, a2, …, an, we have

$$\sum_{i=1}^{n} \lambda_i \log g(a_i) \le \log \sum_{i=1}^{n} \lambda_i g(a_i), \qquad (1.13)$$

and the equation holds if and only if g(a1) = g(a2) = ⋯ = g(an).

Proof Because log x is strictly concave, let xi = g(ai); then xi ∈ (0, +∞) (1 ≤ i ≤ n), and by the Jensen inequality,

$$\sum_{i=1}^{n} \lambda_i \log g(a_i) = \sum_{i=1}^{n} \lambda_i \log x_i \le \log\Big(\sum_{i=1}^{n} \lambda_i x_i\Big) = \log\Big(\sum_{i=1}^{n} \lambda_i g(a_i)\Big).$$

So the lemma holds.

A real valued function f(x) is called strictly convex on the interval (a, b) if for all x1, x2 ∈ (a, b), λ1 > 0, λ2 > 0, λ1 + λ2 = 1, we have

f(λ1 x1 + λ2 x2) ≤ λ1 f(x1) + λ2 f(x2),

and the equation holds if and only if x1 = x2. By induction, we can prove the following general inequality.

Lemma 1.8 If f(x) is a strictly convex function on the interval (a, b), then for any positive integer n ≥ 2, any positive numbers λi (1 ≤ i ≤ n) with λ1 + λ2 + ⋯ + λn = 1, and any xi ∈ (a, b) (1 ≤ i ≤ n), we have

$$f\Big(\sum_{i=1}^{n} \lambda_i x_i\Big) \le \sum_{i=1}^{n} \lambda_i f(x_i), \qquad (1.14)$$

and the equation holds if and only if x1 = x2 = ⋯ = xn.

We know that f(x) is strictly convex on the interval (a, b) if and only if f″(x) > 0. Let f(x) = x log x; then f″(x) = 1/(x ln 2) > 0 for x ∈ (0, +∞). Then we have the following logarithmic inequality.

Lemma 1.9 If a1, a2, …, an and b1, b2, …, bn are two groups of positive numbers, then

$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \Big(\sum_{i=1}^{n} a_i\Big) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}. \qquad (1.15)$$

Proof Because f(x) = x log x is a strictly convex function, from Lemma 1.8 we have

$$f\Big(\sum_{i=1}^{n} \lambda_i x_i\Big) \le \sum_{i=1}^{n} \lambda_i f(x_i),$$

where λ1 + λ2 + ⋯ + λn = 1. Take λi = bi / Σ_{j=1}^n bj and xi = ai/bi; then

$$\frac{\sum_{i=1}^{n} a_i}{\sum_{j=1}^{n} b_j} \log \frac{\sum_{i=1}^{n} a_i}{\sum_{j=1}^{n} b_j} \le \frac{1}{\sum_{j=1}^{n} b_j} \sum_{i=1}^{n} a_i \log \frac{a_i}{b_i}.$$

Multiplying both sides by Σ_{j=1}^n bj, we get

$$\Big(\sum_{i=1}^{n} a_i\Big) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} \le \sum_{i=1}^{n} a_i \log \frac{a_i}{b_i},$$

thus (1.15) holds.

The above formula is called the log sum inequality, which is often used in information theory.
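The log sum inequality is easy to test numerically. A small sketch of ours (base-2 logarithms, arbitrary positive test data, not from the text):

```python
import math

def log_sum_lhs(a, b):
    # Left side of (1.15): sum_i a_i log(a_i / b_i)
    return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))

def log_sum_rhs(a, b):
    # Right side of (1.15): (sum a_i) log(sum a_i / sum b_i)
    A, B = sum(a), sum(b)
    return A * math.log2(A / B)

a = [1.0, 2.0, 3.0]
b = [2.0, 1.0, 4.0]
print(log_sum_lhs(a, b), ">=", log_sum_rhs(a, b))   # -0.245... >= -1.334...

# Equality case: all ratios a_i / b_i equal (here 1/2)
c = [2.0, 4.0, 6.0]
print(log_sum_lhs(a, c), "==", log_sum_rhs(a, c))   # both equal -6.0
```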

1.4 Stirling Formula

In number theory (see Apostol 1976 in the references of this chapter), the average asymptotic formulas of some arithmetic functions can be obtained by using the Euler summation formula, the most important of which is the following Stirling formula. For all real numbers x ≥ 1, we have

$$\sum_{1 \le m \le x} \log m = x \log x - x + O(\log x), \qquad (1.16)$$

where the O constant is an absolute constant. Taking x = n ≥ 1 a positive integer, we obtain the Stirling formula

log n! = n log n − n + O(log n).   (1.17)

In number theory, the Stirling formula also appears in the more precise form

$$n! \approx \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n, \quad \text{that is,} \quad \lim_{n\to\infty} \frac{n!}{\sqrt{2\pi n}\,(n/e)^n} = 1.$$
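To see how accurate (1.17) and the precise form are, here is a small numerical comparison (our sketch, not from the text; log here is the natural logarithm):

```python
import math

def ln_factorial(n):
    # Exact value of ln n! = sum of ln m over 1 <= m <= n
    return sum(math.log(m) for m in range(1, n + 1))

for n in [10, 100, 1000]:
    exact = ln_factorial(n)
    main = n * math.log(n) - n                        # main term of (1.17)
    precise = 0.5 * math.log(2 * math.pi * n) + main  # ln( sqrt(2 pi n) (n/e)^n )
    print(n, round(exact, 3), round(main, 3), round(precise, 3))
```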
Lemma 1.10 Let 0 ≤ m ≤ n, with n, m nonnegative integers, and let $\binom{n}{m}$ be the combination number; then

$$\binom{n}{m} \le \frac{n^n}{m^m (n-m)^{n-m}}. \qquad (1.18)$$

Proof

$$n^n = (m + (n-m))^n \ge \binom{n}{m} m^m (n-m)^{n-m},$$

and (1.18) follows at once.

We define the binary entropy function H(x) (0 ≤ x ≤ 1) as follows:

$$H(x) = \begin{cases} 0, & \text{if } x = 0, \\ -x\log x - (1-x)\log(1-x), & \text{if } 0 < x \le 1. \end{cases} \qquad (1.19)$$

It is obvious that H(x) = H(1 − x), so we only need to consider the case 0 ≤ x ≤ 1/2. H(x) is the information entropy of the binary information space (see Example 3.5 in Sect. 1.1 of Chap. 3); its graph is shown in Fig. 1.1.

Lemma 1.11 Let 0 ≤ λ ≤ 1/2; then we have

(i) $\sum_{0 \le i \le \lambda n} \binom{n}{i} \le 2^{nH(\lambda)}$.

(ii) $\log \sum_{0 \le i \le \lambda n} \binom{n}{i} \le nH(\lambda)$.

(iii) $\lim_{n \to \infty} \frac{1}{n} \log \sum_{0 \le i \le \lambda n} \binom{n}{i} = H(\lambda)$.

Proof We first prove (i); (ii) is obtained directly by taking the logarithm of (i).

Fig. 1.1 The information entropy of the binary information space

$$1 = (\lambda + (1-\lambda))^n \ge \sum_{0 \le i \le \lambda n} \binom{n}{i} \lambda^i (1-\lambda)^{n-i} = (1-\lambda)^n \sum_{0 \le i \le \lambda n} \binom{n}{i} \Big(\frac{\lambda}{1-\lambda}\Big)^i \ge (1-\lambda)^n \sum_{0 \le i \le \lambda n} \binom{n}{i} \Big(\frac{\lambda}{1-\lambda}\Big)^{\lambda n} = 2^{-nH(\lambda)} \sum_{0 \le i \le \lambda n} \binom{n}{i}.$$

To prove (iii), write m = [λn] = λn + O(1). From (ii), we have

$$\frac{1}{n} \log \sum_{0 \le i \le \lambda n} \binom{n}{i} \le H(\lambda).$$

On the other hand,

$$\frac{1}{n} \log \sum_{0 \le i \le \lambda n} \binom{n}{i} \ge \frac{1}{n} \log \binom{n}{m} = \frac{1}{n}\{\log n! - \log m! - \log(n-m)!\}.$$

From the Stirling formula (1.17), we have

log n! − log m! − log(n − m)! = n log n − m log m − (n − m) log(n − m) + O(log n).

So

$$\frac{1}{n} \log \sum_{0 \le i \le \lambda n} \binom{n}{i} \ge \log n - \lambda \log \lambda n - (1-\lambda)\log n(1-\lambda) + O\Big(\frac{\log n}{n}\Big) = -\lambda\log\lambda - (1-\lambda)\log(1-\lambda) + O\Big(\frac{\log n}{n}\Big) = H(\lambda) + O\Big(\frac{\log n}{n}\Big).$$

In the end, we have

$$\lim_{n\to\infty} \frac{1}{n} \log \sum_{0 \le i \le \lambda n} \binom{n}{i} = H(\lambda).$$

Lemma 1.11 holds.
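Lemma 1.11 (i) and (iii) can be observed numerically; the sketch below (ours, not from the text) compares the partial binomial sum with the bound 2^{nH(λ)}:

```python
import math

def H(x):
    # Binary entropy function (1.19); H(0) = 0, and the formula gives H(1) = 0 too
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def binom_sum(n, lam):
    # Sum of binom(n, i) over 0 <= i <= lam * n
    return sum(math.comb(n, i) for i in range(int(lam * n) + 1))

n, lam = 100, 0.3
s = binom_sum(n, lam)
print(s <= 2 ** (n * H(lam)))      # (i): True
print(math.log2(s) / n, H(lam))    # (iii): the ratio approaches H(lam) as n grows
```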



1.5 n-fold Bernoulli Experiment

In a given probability space, suppose that x and y are random events. We denote the probability of the occurrence of event x by p(x), the probability of the joint occurrence of x and y by p(xy), and the probability of the occurrence of x under the condition that event y occurs by p(x|y), called the conditional probability. Obviously, there is the following multiplication formula:

p(xy) = p(y)p(x|y).   (1.20)

For two events x and y, if p(xy) = 0, we say x and y are incompatible; if p(xy) = p(x)p(y), we say the two events are independent of each other.
A finite set of events {x1, x2, …, xn} is called a complete event group if

$$\sum_{i=1}^{n} p(x_i) = 1, \quad \text{and} \quad p(x_i x_j) = 0 \text{ when } i \ne j. \qquad (1.21)$$

In a complete event group, we may assume that 0 < p(xi) ≤ 1 (1 ≤ i ≤ n).
Total probability formula: If {x1, x2, …, xn} is a complete event group and y is any random event, then we have

$$p(y) = \sum_{i=1}^{n} p(yx_i) \qquad (1.22)$$

and

$$p(y) = \sum_{i=1}^{n} p(x_i)p(y|x_i). \qquad (1.23)$$

Lemma 1.12 Let {x1, x2, …, xn} be a complete event group. Then the event y can occur only simultaneously with some xi, and for any i, 1 ≤ i ≤ n, we have the following Bayes formula:

$$p(x_i|y) = \frac{p(x_i)p(y|x_i)}{\sum_{j=1}^{n} p(x_j)p(y|x_j)}, \quad 1 \le i \le n. \qquad (1.24)$$

Proof From the multiplication formula (1.20), we have

p(xi y) = p(y)p(xi|y) = p(xi)p(y|xi),

so

$$p(x_i|y) = \frac{p(x_i)p(y|x_i)}{p(y)}.$$

From the total probability formula (1.23), we then obtain

$$p(x_i|y) = \frac{p(x_i)p(y|x_i)}{\sum_{j=1}^{n} p(x_j)p(y|x_j)}, \quad 1 \le i \le n,$$

and the Bayes formula (1.24) is proved.
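As a concrete, hypothetical illustration of the Bayes formula (1.24) — our example, not from the text — consider a binary channel where x1 = "a 0 was sent" and x2 = "a 1 was sent" form a complete event group, and y = "a 1 was received":

```python
# Complete event group: x1 = "0 sent" with p = 0.6, x2 = "1 sent" with p = 0.4.
p_x = [0.6, 0.4]

# Assume the channel flips a bit with probability 0.1, so for y = "1 received":
p_y_given_x = [0.1, 0.9]

# Total probability formula (1.23): p(y) = sum_j p(x_j) p(y | x_j)
p_y = sum(px * pyx for px, pyx in zip(p_x, p_y_given_x))

# Bayes formula (1.24): p(x_i | y) = p(x_i) p(y | x_i) / p(y)
posterior = [px * pyx / p_y for px, pyx in zip(p_x, p_y_given_x)]
print(posterior)   # [0.1428..., 0.8571...]: a received 1 most likely means 1 was sent
```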

Now we discuss the n-fold Bernoulli experiment. In statistical testing, a trial with only two possible results is called a Bernoulli experiment, and an experiment satisfying the following conventions is called an n-fold Bernoulli experiment:
(1) Each trial has only two possible results: a or ā.
(2) The probability p of the occurrence of a in each trial remains unchanged.
(3) The trials are statistically independent.
(4) A total of n trials are carried out.

Lemma 1.13 (Bernoulli theorem) Suppose the probability of event a in a single Bernoulli experiment is p. Then in the n-fold Bernoulli experiment, the probability B(k; n, p) that a appears exactly k (0 ≤ k ≤ n) times is

$$B(k; n, p) = \binom{n}{k} p^k q^{n-k}, \quad q = 1 - p. \qquad (1.25)$$

Proof Record the result of the i-th Bernoulli trial as xi (xi = a or ā); then the n-fold Bernoulli experiment forms the joint event

x = x1 x2 ⋯ xn, xi = a or ā.

By the independence of the trials, when exactly k of the xi equal a, the occurrence probability of x is

p(x) = p(x1)p(x2)⋯p(xn) = p^k q^{n-k}.

Obviously, the number of joint events x in which exactly k of the xi equal a is $\binom{n}{k}$, so

$$B(k; n, p) = \binom{n}{k} p^k q^{n-k}.$$

Lemma 1.13 holds.

In the same way, we can calculate the probability that event a appears for the first time in the k-th of a sequence of Bernoulli experiments.

Lemma 1.14 Suppose that a and ā are the two possible events of a Bernoulli experiment; then the probability that a appears for the first time in the k-th Bernoulli experiment is pq^{k−1}.

Proof Consider the joint event x = x1 x2 ⋯ xk formed by a k-fold Bernoulli experiment in which the first k − 1 results are ā and xk = a; then

p(x) = p(x1)⋯p(x_{k−1})p(xk) = pq^{k−1}.

We have completed the proof.

The n-fold Bernoulli experiment is not only the most basic probability model in probability and statistics, but also a common tool in the communication field. We illustrate with the error of binary channel transmission.

Example 1.1 (Error probability of a binary channel) In binary channel transmission, a codeword x of length n is a vector x = (x1, x2, …, xn) in the n-dimensional vector space F₂ⁿ, where xi = 0 or 1 (1 ≤ i ≤ n). For convenience, we write x = x1 x2 ⋯ xn. Due to channel interference, the characters 0 and 1 may be flipped in transmission, that is, 0 becomes 1 and 1 becomes 0. Let the error probability be p (p may be very small), constant for each transmitted character. Under these assumptions, transmitting a codeword x of length n can be regarded as an n-fold Bernoulli experiment, and the probability B(k; n, p) that exactly k (0 ≤ k ≤ n) errors occur in the transmission of x is

$$B(k; n, p) = \binom{n}{k} p^k q^{n-k}, \quad q = 1 - p.$$
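Example 1.1 is easy to compute with. The sketch below (ours, with illustrative parameter values) evaluates B(k; n, p) for a short codeword and a small bit-error probability:

```python
import math

def B(k, n, p):
    # Probability (1.25) of exactly k errors when transmitting n characters
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 7, 0.01   # codeword of length 7, bit-error probability 1%
print(B(0, n, p))                         # probability of error-free transmission
print(sum(B(k, n, p) for k in range(2)))  # probability of at most one error,
                                          # the case a single-error-correcting code handles
```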

1.6 Chebyshev Inequality

We call a variable ξ defined on a probability space and taking real values a random variable. For any real number x ∈ (−∞, +∞), p(x) is defined as the probability that the random variable ξ takes the value x, i.e.,

p(x) = P{ξ = x},   (1.26)

and p(x) is called the probability function of ξ. If ξ takes only finitely many values, or countably infinitely many values, that is, the value space of ξ consists of finitely or countably many real numbers, then ξ is called a discrete random variable; otherwise, ξ is called a continuous random variable. The distribution function F(x) of a random variable ξ is defined as

F(x) = P{ξ ≤ x}, x ∈ (−∞, +∞).   (1.27)

Obviously, the distribution function F(x) of ξ is a monotone increasing function on the whole real axis (−∞, +∞), and it is right continuous, that is, F(x₀) = lim_{x→x₀+0} F(x). The probability distribution of a random variable ξ is completely determined by its distribution function F(x); in fact, for any x,

p(x) = P{ξ = x} = F(x) − F(x − 0).

Let f(x) be a nonnegative integrable function on the real axis, and

$$F(x) = \int_{-\infty}^{x} f(t)\, dt; \qquad (1.28)$$

then f(x) is called the density function of the random variable ξ. Obviously, the density function satisfies

$$f(x) \ge 0, \ \forall x \in (-\infty, +\infty), \qquad \int_{-\infty}^{+\infty} f(x)\, dx = 1. \qquad (1.29)$$

On the other hand, any function f(x) satisfying formula (1.29) must be the density function of some random variable. Here, we introduce several common continuous random variables and their probability distributions.
1. Uniform distribution (equal probability distribution)
A random variable ξ taking values in the interval [a, b] with equal probability is said to be uniformly distributed, or a random variable of uniform distribution; its density function is

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

Its distribution function F(x) is

$$F(x) = \begin{cases} 0, & x < a, \\ \dfrac{x-a}{b-a}, & a \le x \le b, \\ 1, & x > b. \end{cases}$$

2. Exponential distribution
The density function of the random variable ξ is

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

where λ > 0 is a given parameter, and its distribution function is

$$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

We call ξ an exponential distribution with parameter λ, or a random variable with exponential distribution.
3. Normal distribution
A continuous random variable ξ whose density function f(x) is defined as

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in (-\infty, +\infty),$$

where μ and σ are constants, σ > 0, is said to obey the normal distribution with parameters μ and σ², denoted ξ ∼ N(μ, σ²). By the Poisson integral

$$\int_{-\infty}^{+\infty} e^{-x^2}\, dx = \sqrt{\pi},$$

it is not hard to verify that

$$\int_{-\infty}^{+\infty} f(x)\, dx = 1.$$

The distribution function F(x) of the normal distribution N(μ, σ²) is

$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\, dt.$$

When μ = 0, σ = 1, N(0, 1) is called the standard normal distribution.


Let us define the mathematical expectation and variance of a random variable ξ. First consider the mathematical expectation of a discrete random variable.
(1) Let ξ be a discrete random variable whose value space is {x1, x2, …, xn, …}, and let p(xi) = P{ξ = xi}. If

$$\sum_{i=1}^{+\infty} |x_i|\, p(x_i) < \infty,$$

then the mathematical expectation E(ξ) of ξ is defined as

$$E\xi = E(\xi) = \sum_{i=1}^{+\infty} x_i\, p(x_i). \qquad (1.30)$$

(2) Let ξ be a continuous random variable and f(x) be its density function. If

$$\int_{-\infty}^{+\infty} |x| f(x)\, dx < +\infty,$$

then the mathematical expectation E(ξ) of ξ is defined as

$$E\xi = E(\xi) = \int_{-\infty}^{+\infty} x f(x)\, dx. \qquad (1.31)$$

(3) Let h(x) be a real valued function; then h(ξ) is also a random variable, called a function of the random variable ξ. The mathematical expectation E(h(ξ)) of h(ξ) is denoted E_h(ξ).

Lemma 1.15 (1) Let ξ be a discrete random variable whose value space is {x1, x2, …, xn, …}. If E(ξ) exists, then E_h(ξ) also exists, and

$$E_h(\xi) = \sum_{i=1}^{+\infty} h(x_i)\, p(x_i).$$

(2) If ξ is a continuous random variable and E(ξ) exists, then E_h(ξ) also exists, and

$$E_h(\xi) = \int_{-\infty}^{+\infty} h(x) f(x)\, dx.$$

Proof Let the value space of η = h(ξ) be {y1, y2, …, yn, …}; then

$$P\{\eta = y_j\} = P\Big(\bigcup_{i:\, h(x_i)=y_j} \{\xi = x_i\}\Big) = \sum_{i:\, h(x_i)=y_j} P\{\xi = x_i\}.$$

By the definition of E(η),

$$E_h(\xi) = E(\eta) = \sum_{j=1}^{+\infty} y_j\, P\{\eta = y_j\} = \sum_{j=1}^{+\infty} y_j \sum_{i:\, h(x_i)=y_j} P\{\xi = x_i\} = \sum_{i=1}^{+\infty} h(x_i)\, P\{\xi = x_i\} = \sum_{i=1}^{+\infty} h(x_i)\, p(x_i).$$

(2) can be proved in the same way.


The following basic properties of mathematical expectation are easy to prove.

Lemma 1.16 (1) If ξ = c is constant, then E(ξ) = c.
(2) If a and b are constants and ξ, η are random variables, then E(aξ + bη) = aE(ξ) + bE(η).
(3) If a ≤ ξ ≤ b, then a ≤ E(ξ) ≤ b.

If the mathematical expectation E(ξ) of a random variable exists, then (ξ − Eξ)² is also a random variable (take h(x) = (x − a)², where a = E(ξ)). We define the mathematical expectation E_h(ξ) of this h(ξ) as the variance of ξ, denoted D(ξ), that is,

D(ξ) = E((ξ − Eξ)²).

We denote by σ = √D(ξ) the standard deviation of ξ. Here are some basic properties of variance.

Lemma 1.17 (1) D(ξ) = E(ξ²) − E²(ξ).
(2) If ξ = a is constant, then D(ξ) = 0.
(3) D(ξ + c) = D(ξ).
(4) D(cξ) = c²D(ξ).
(5) If c ≠ Eξ, then D(ξ) < E((ξ − c)²).

Proof (1) follows from the definition:

D(ξ) = E((ξ − Eξ)²) = E(ξ² − 2ξEξ + E²(ξ)) = E(ξ²) − 2(Eξ)² + (E(ξ))² = E(ξ²) − (Eξ)².

(2) is trivial. Let us prove (3). By (1),

D(ξ + c) = E((ξ + c)²) − (E(ξ + c))² = E(ξ²) + 2cE(ξ) + c² − (Eξ)² − 2cE(ξ) − c² = E(ξ²) − (Eξ)² = D(ξ).

(4) can also be derived directly from (1). In fact,

D(cξ) = E(c²ξ²) − (E(cξ))² = c²E(ξ²) − c²(Eξ)² = c²D(ξ).

To prove (5), we note from Lemma 1.16 that the mathematical expectation of (ξ − Eξ) is 0, so if c ≠ E(ξ), then by (3),

D(ξ) = D(ξ − c) = E((ξ − c)²) − (E(ξ − c))².

Since the last term of the above formula is not zero, we always have

D(ξ) < E((ξ − c)²),

so (5) holds. This property indicates that E((ξ − c)²) attains its minimum value D(ξ) at c = Eξ. We have completed the proof.
Now we give the main result of this section; in mathematics, it is called a Chebyshev type inequality, which is essentially a moment inequality, because the mathematical expectation Eξ of a random variable ξ is the first-order origin moment and the variance is the second-order central moment.
Theorem 1.1 Let h(x) be a nonnegative real valued function of x, let ξ be a random variable, and suppose the expectation E_h(ξ) exists. Then for any ε > 0, we have

$$P\{h(\xi) \ge \varepsilon\} \le \frac{E_h(\xi)}{\varepsilon}, \qquad (1.32)$$

and

$$P\{h(\xi) > \varepsilon\} < \frac{E_h(\xi)}{\varepsilon}. \qquad (1.33)$$

Proof We prove the theorem only for a continuous random variable ξ. Let f(x) be the density function of ξ; then by Lemma 1.15,

$$E_h(\xi) = \int_{-\infty}^{+\infty} h(x) f(x)\, dx \ge \int_{h(x) \ge \varepsilon} h(x) f(x)\, dx \ge \varepsilon \int_{h(x) \ge \varepsilon} f(x)\, dx = \varepsilon\, P\{h(\xi) \ge \varepsilon\},$$

so (1.32) holds. Similarly, we can prove (1.33).


From the theorem, we can get different Chebyshev inequalities by different choices of h(ξ); for instance, take h(ξ) = (ξ − Eξ)².

Corollary 1.1 (Chebyshev) If the variance D(ξ) of the random variable ξ exists, then for any ε > 0, we have

$$P\{|\xi - E\xi| \ge \varepsilon\} \le \frac{D(\xi)}{\varepsilon^2}. \qquad (1.34)$$
Proof Take h(ξ) = (ξ − Eξ)² in Theorem 1.1; then |ξ − Eξ| ≥ ε if and only if h(ξ) ≥ ε², and by definition E_h(ξ) = D(ξ). Thus

$$P\{|\xi - E\xi| \ge \varepsilon\} = P\{h(\xi) \ge \varepsilon^2\} \le \frac{E_h(\xi)}{\varepsilon^2} = \frac{D(\xi)}{\varepsilon^2}.$$

The corollary holds.

Corollary 1.2 (Chebyshev) Suppose that both the expected value Eξ and the variance D(ξ) of the random variable ξ exist; then for any k > 0, we have

$$P\{|\xi - E\xi| \ge k\sqrt{D(\xi)}\} \le \frac{1}{k^2}. \qquad (1.35)$$

Proof Take ε = k√D(ξ) in Corollary 1.1; then

$$P\{|\xi - E\xi| \ge k\sqrt{D(\xi)}\} \le \frac{D(\xi)}{k^2 D(\xi)} = \frac{1}{k^2}.$$

Corollary 1.2 holds.



In mathematics, μ is often used for the expected value and σ = √D(ξ) (σ ≥ 0) for the standard deviation, that is

μ = Eξ, σ = √D(ξ), σ ≥ 0.

Then the Chebyshev inequality of Corollary 1.2 can be written as follows:

P{|ξ − μ| ≥ kσ} ≤ 1/k².    (1.36)
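To see the bound (1.36) in action, here is a minimal Python sketch (the uniform distribution and the sample size are arbitrary illustrative choices) comparing the empirical tail probability with 1/k².

```python
import random

# Empirically check P{|xi - mu| >= k*sigma} <= 1/k^2 on a uniform sample.
random.seed(1)
N = 200_000
samples = [random.uniform(0.0, 1.0) for _ in range(N)]
mu = sum(samples) / N
sigma = (sum((x - mu) ** 2 for x in samples) / N) ** 0.5

for k in (1.5, 2.0, 3.0):
    tail = sum(1 for x in samples if abs(x - mu) >= k * sigma) / N
    print(f"k={k}: empirical tail {tail:.4f} <= bound {1 / k**2:.4f}")
```

The bound is loose for smooth distributions; its value lies in holding for every distribution with finite variance.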
Corollary 1.3 (Markov) If for some integer k ≥ 1 the expected value E|ξ|^k of the random variable ξ exists, then for any ε > 0,

P{|ξ| ≥ ε} ≤ E|ξ|^k / ε^k.

Proof Take h(ξ) = |ξ|^k in Theorem 1.1 and replace ε with ε^k; the Markov inequality is directly derived from Theorem 1.1. □

Next, we introduce several common discrete random variables and their probability distributions, and calculate their expected values and variances.

Example 1.2 (Degenerate distribution) A random variable ξ takes a constant a with probability 1, that is ξ = a, P{ξ = a} = 1; ξ is called degenerate. From Lemma 1.16 (1), Eξ = a, and its variance is D(ξ) = 0.

Example 1.3 (Two-point distribution) A random variable ξ has only two values {x₁, x₂}, and its probability distribution is

P{ξ = x₁} = p, P{ξ = x₂} = 1 − p, 0 < p < 1.

ξ is called a two-point distribution with parameter p, and its mathematical expectation and variance are

E(ξ) = x₁p + x₂(1 − p),
D(ξ) = p(1 − p)(x₁ − x₂)².

In particular, take x₁ = 1, x₂ = 0; then the expected value and variance of the two-point distribution are

E(ξ) = p, D(ξ) = p(1 − p).

Example 1.4 (Equal probability distribution) Let a random variable ξ have n values {x₁, x₂, …, xₙ}, each taken with equal probability, that is

P{ξ = xᵢ} = 1/n, 1 ≤ i ≤ n.

ξ is said to obey the equal probability distribution, or uniform distribution, on the n points x₁, x₂, …, xₙ. The expected value and variance are

E(ξ) = (1/n) Σ_{i=1}^{n} xᵢ, D(ξ) = (1/n) Σ_{i=1}^{n} (xᵢ − E(ξ))².

Example 1.5 (Binomial distribution) In the n-fold Bernoulli experiment, the number of occurrences ξ of event A is a random variable taking values from 0 to n. The probability distribution is (see Bernoulli experiment)

P{ξ = k} = b(k; n, p) = \binom{n}{k} p^k q^{n−k},

where 0 ≤ k ≤ n and p is the probability of event A occurring in each experiment. ξ is called a binomial distribution with parameters n, p, denoted as ξ ∼ b(n, p). In fact, b(k; n, p) is the k-th term in the expansion of the binomial (p + q)^n.

Lemma 1.18 Let ξ ∼ b(n, p), then

E(ξ) = np, D(ξ) = npq, q = 1 − p.



Proof By definition,

E(ξ) = Σ_{k=0}^{n} k·b(k; n, p) = Σ_{k=1}^{n} k \binom{n}{k} p^k q^{n−k}
= np Σ_{k=1}^{n} \binom{n−1}{k−1} p^{k−1} q^{(n−1)−(k−1)}
= np Σ_{k=0}^{n−1} \binom{n−1}{k} p^k q^{n−1−k}
= np Σ_{k=0}^{n−1} b(k; n−1, p)
= np.

Similarly, it can be calculated that

E(ξ²) = Σ_{k=0}^{n} k² b(k; n, p) = n²p² + npq,

thus

D(ξ) = E(ξ²) − (E(ξ))² = npq.

We have completed the proof. □
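The closed forms of Lemma 1.18 are easy to confirm numerically; the short Python sketch below (the parameters n = 20 and p = 0.3 are arbitrary) computes E(ξ) and D(ξ) directly from the definition of b(k; n, p).

```python
from math import comb

# E(xi) and D(xi) for xi ~ b(n, p), computed straight from the probability
# mass function and compared with the closed forms np and npq of Lemma 1.18.
n, p = 20, 0.3
q = 1 - p
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

E = sum(k * pmf[k] for k in range(n + 1))
E2 = sum(k * k * pmf[k] for k in range(n + 1))

print(E, n * p)               # both 6.0
print(E2 - E * E, n * p * q)  # both 4.2
```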

Lemma 1.19 Let pₙ be the probability of event A in the n-fold Bernoulli experiment. If npₙ → λ, then we have

lim_{n→∞} b(k; n, pₙ) = (λ^k/k!) e^{−λ}.

Proof Write λₙ = npₙ, then

b(k; n, pₙ) = \binom{n}{k} pₙ^k (1 − pₙ)^{n−k}
= (n(n−1)⋯(n−(k−1))/k!) (λₙ/n)^k (1 − λₙ/n)^{n−k}
= (λₙ^k/k!) (1 − 1/n)⋯(1 − (k−1)/n)(1 − λₙ/n)^{n−k}.

Because for fixed k, lim_{n→∞} λₙ^k = λ^k, and

lim_{n→∞} (1 − λₙ/n)^{n−k} = e^{−λ},

also

lim_{n→∞} (1 − 1/n)(1 − 2/n)⋯(1 − (k−1)/n) = 1,

we have

lim_{n→∞} b(k; n, pₙ) = (λ^k/k!) e^{−λ}.

So Lemma 1.19 holds. □

Example 1.6 (Poisson distribution) The values of the discrete random variable ξ are 0, 1, …, n, …, and λ ≥ 0 is a nonnegative real number. If the probability distribution of ξ is

P{ξ = k} = p(k, λ) = (λ^k/k!) e^{−λ},

ξ is called a random variable obeying the Poisson distribution. It can be proved that the expected value and variance of a Poisson distribution ξ are both λ. When p is very small, the random variable ξₙ of the n-fold Bernoulli experiment can be considered close to the Poisson distribution ξ; in this case, the probability distribution function b(k; n, p) can be approximately replaced by the Poisson distribution, that is

b(k; n, p) ≈ ((np)^k/k!) e^{−np}.
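The quality of this approximation is easy to inspect numerically; the following Python sketch (the values n = 1000, p = 0.003 are illustrative) compares b(k; n, p) with (np)^k e^{−np}/k! for small k.

```python
from math import comb, exp, factorial

# Compare the exact binomial b(k; n, p) with its Poisson approximation
# for large n and small p, as suggested by Lemma 1.19 and Example 1.6.
n, p = 1000, 0.003
lam = n * p
for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = lam**k / factorial(k) * exp(-lam)
    print(f"k={k}: binomial {binom:.6f}  Poisson {poisson:.6f}")
```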

1.7 Stochastic Process

The so-called stochastic process considers the statistical characteristics of a family of compatible random variables {ξᵢ}_{i=1}^{n}, which we can describe as an n-dimensional random vector. Let {ξᵢ}_{i=1}^{n} be n compatible random variables on a given probability space; ξ = (ξ₁, ξ₂, …, ξₙ) is called an n-dimensional random vector with values in R^n.

A stochastic process, or an n-dimensional random vector ξ = (ξ₁, ξ₂, …, ξₙ), is uniquely determined by the occurrence probabilities of the following joint events. Let A(ξᵢ) ⊂ R be the value space of the random variable ξᵢ (1 ≤ i ≤ n); then for any (x₁, x₂, …, xₙ) ∈ A(ξ₁) × A(ξ₂) × ⋯ × A(ξₙ) ⊂ R^n, the probability of occurrence of the joint event is denoted as

p(x₁x₂⋯xₙ) = p((x₁, x₂, …, xₙ)) = P{ξ₁ = x₁, ξ₂ = x₂, …, ξₙ = xₙ}.

Definition 1.7 If for any xᵢ ∈ R (1 ≤ i ≤ n), we have

p(x₁x₂⋯xₙ) = p(x₁)p(x₂)⋯p(xₙ),

then the stochastic process {ξᵢ}_{i=1}^{n} is called statistically independent.

Strictly speaking, each real number xᵢ in Definition 1.7 should belong to a Borel set on the line, to ensure that the event {ξᵢ = xᵢ} generated by ξᵢ is an event in the given probability space.
Similarly, we can define a vector function F(x₁, x₂, …, xₙ) on R^n as

F(x₁, x₂, …, xₙ) = P{ξ₁ ≤ x₁, ξ₂ ≤ x₂, …, ξₙ ≤ xₙ}.

This is the distribution function of the random vector ξ = (ξ₁, ξ₂, …, ξₙ). Its marginal distribution functions are

Fᵢ(xᵢ) = P{ξᵢ ≤ xᵢ} = F(+∞, +∞, …, xᵢ, +∞, …, +∞).

For the following properties of stochastic processes, we do not give proofs; the reader can find them in the classical probability theory textbooks (see Rényi 1970; Li 2010; Long 2020).

Lemma 1.20 (1) A stochastic process {ξᵢ}_{i=1}^{n} is statistically independent if and only if

F(x₁, x₂, …, xₙ) = F₁(x₁)F₂(x₂)⋯Fₙ(xₙ).

(2) Suppose {ξᵢ}_{i=1}^{n} is statistically independent; then for any real-valued functions gᵢ(x), {gᵢ(ξᵢ)}_{i=1}^{n} is also statistically independent.
(3) If ξ₁, …, ξₙ are n random variables, then

E(ξ₁ + ξ₂ + ⋯ + ξₙ) = E(ξ₁) + E(ξ₂) + ⋯ + E(ξₙ).

(4) If {ξᵢ}_{i=1}^{n} is statistically independent and the expected value E(ξᵢ) of each random variable exists, then the mathematical expectation of the product ξ₁ξ₂⋯ξₙ exists, and

E(ξ₁ξ₂⋯ξₙ) = E(ξ₁)E(ξ₂)⋯E(ξₙ).


Definition 1.8 Let {ξi }i=1 be a series of random variables, ξ is a given random
variable, if for any ξ > 0, we have

lim P{|ξn − ξ | > ε} = 0,


n→∞

P
it is called {ξn } converges to ξ in probability, denoted as ξn −→ ξ .
P
Obviously, ξn −→ ξ if and only if for any ε > 0, there is

lim P{|ξn − ξ | ≤ ε} = 1.
n→∞

If the occurrence probability of an event is p, the frequency of the event in statistical tests gradually approaches its probability p. The strict mathematical statement and proof are attributed to the Bernoulli law of large numbers.

Theorem 1.2 (Bernoulli) Let μₙ be the number of occurrences of event A in the n-fold Bernoulli experiment, where the probability of occurrence of A in each experiment is p (0 < p < 1). Then the frequency μₙ/n of A converges in probability to p; that is, for any ε > 0,

lim_{n→∞} P{|μₙ/n − p| > ε} = 0.
Proof Consider μₙ/n as a random variable; its expected value and variance are

E(μₙ/n) = (1/n)E(μₙ) = p

and

D(μₙ/n) = (1/n²)D(μₙ) = pq/n, q = 1 − p,

respectively. By the Chebyshev inequality (1.34), we have

P{|μₙ/n − p| > ε} ≤ pq/(nε²).

For any given ε > 0, we therefore have

lim_{n→∞} P{|μₙ/n − p| > ε} = 0.

So Bernoulli's law of large numbers holds. □

In order to better understand Bernoulli's law of large numbers, we can use a stochastic process to describe it. Define

ξᵢ = 1, if event A occurs in the i-th experiment;
ξᵢ = 0, if event A does not occur in the i-th experiment.

Then ξᵢ follows a two-point distribution with parameter p (see Sect. 1.6, Example 1.3), and {ξᵢ}_{i=1}^{+∞} is an independent and identically distributed stochastic process. Obviously,

μₙ = Σ_{i=1}^{n} ξᵢ, E(ξᵢ) = p.

So Bernoulli’s law of large numbers can be rewritten as follows:


 
1 1 
n n
lim P | ξi − E( ξi )| < ε = 1,
n→∞ n i=1 n i=1

where {ξi } is a sequence of independent random variables with the same two-point
distribution of 0 − 1 with parameter p. It is not difficult to generalize this conclusion
to a more general case.
Theorem 1.3 (Chebyshev's law of large numbers) Let {ξᵢ}_{i=1}^{+∞} be a series of independent random variables whose expected values E(ξᵢ) and variances D(ξᵢ) exist, with the variances uniformly bounded, i.e., D(ξᵢ) ≤ C for all i ≥ 1. Then for any ε > 0, we have

lim_{n→∞} P{ |(1/n)Σ_{i=1}^{n} ξᵢ − (1/n)Σ_{i=1}^{n} E(ξᵢ)| < ε } = 1.

Proof By the Chebyshev inequality,

P{ |(1/n)Σ_{i=1}^{n} ξᵢ − E((1/n)Σ_{i=1}^{n} ξᵢ)| ≥ ε }
≤ D((1/n)Σ_{i=1}^{n} ξᵢ)/ε²
= D(Σ_{i=1}^{n} ξᵢ)/(n²ε²)
= (1/(n²ε²)) Σ_{i=1}^{n} D(ξᵢ)
≤ C/(nε²).

So

lim_{n→∞} P{ |(1/n)Σ_{i=1}^{n} ξᵢ − (1/n)Σ_{i=1}^{n} E(ξᵢ)| ≥ ε } = 0.

That is, Theorem 1.3 holds. □

Chebyshev’s law of large numbers is more general than Bernoulli’s law of large
numbers, it can be understood as a sequence of independent random variables {ξi },
the arithmetic mean of a random variable converges to the arithmetic mean of its
expected value in probability.
As a special case, we consider an independent identically distributed stochastic
process {ξi }. Because there is the same probability distribution, there is the same
expectation and variance.

Corollary 1.4 Let {ξᵢ} be an independent and identically distributed random process with common expectation μ and variance σ², that is E(ξᵢ) = μ, D(ξᵢ) = σ² (i = 1, 2, …). Then we have

lim_{n→∞} P{ |(1/n)Σ_{i=1}^{n} ξᵢ − μ| < ε } = 1,

that is, (1/n)Σ_{i=1}^{n} ξᵢ →^P μ.

In the above Corollary, the existence of the variance is unnecessary: Khinchin proved that for an independent and identically distributed stochastic process {ξᵢ}, as long as the expected value E(ξᵢ) = μ exists, (1/n)Σ_{i=1}^{n} ξᵢ converges to its expected value in probability. This conclusion is called Khinchin's law of large numbers.

Finally, we state the so-called Lindeberg–Lévy central limit theorem without proof.
Theorem 1.4 (central limit theorem) Let {ξᵢ}_{i=1}^{+∞} be an independent and identically distributed stochastic process with expected value E(ξᵢ) = μ and variance D(ξᵢ) = σ² > 0 (i = 1, 2, …). Then for any x, we have

lim_{n→∞} P{ (Σ_{i=1}^{n} ξᵢ − nμ)/(σ√n) ≤ x } = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.

That is, the standardized sum of the random variables Σ_{i=1}^{n} ξᵢ converges in distribution to the standard normal distribution N(0, 1).
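The theorem can also be watched numerically; the following Python sketch (uniform summands, with μ = 1/2 and σ² = 1/12, and illustrative values of n and of the number of trials) compares the empirical distribution of the standardized sum with Φ(x).

```python
import random
from math import erf, sqrt

# Standardize sums of i.i.d. uniform(0,1) variables and compare the
# empirical c.d.f. of the standardized sum with the normal c.d.f. Phi.
random.seed(1)
n, trials = 48, 20_000
mu, sigma = 0.5, sqrt(1.0 / 12.0)

zs = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * sqrt(n))
      for _ in range(trials)]

for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(1 for z in zs if z <= x) / trials
    phi = 0.5 * (1 + erf(x / sqrt(2)))
    print(f"x={x:+.1f}: empirical {empirical:.4f}  Phi(x) {phi:.4f}")
```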

Exercise 1 (Nie and Ding 2000)

1. Let A, B, C be three nonempty sets, σ: A → B a given mapping, and τ₁, τ₂ any two mappings of B → C. Prove: if σ is surjective and τ₁σ = τ₂σ, then τ₁ = τ₂.
2. Let τ₁ and τ₂ be any two mappings of A → B, and σ a given mapping of B → C. Prove: if σ is injective and στ₁ = στ₂, then τ₁ = τ₂.
3. Let σ: A → B be injective and τ: B → A a left inverse of σ. Is the left inverse τ of σ unique?
4. Let σ: A → B be surjective. Is the right inverse of σ unique?
5. Suppose that a, m, n are integers, a ≥ 0, m ≥ 1, n ≥ 1, m ≠ n; prove

(a^{2^m} + 1, a^{2^n} + 1) = 1 or 2.

Thus prove Pólya's theorem: there are infinitely many primes.


6. On the set of positive integers, the Möbius function μ(n) is defined as

μ(n) = 1, when n = 1;
μ(n) = 0, when n contains a square factor;
μ(n) = (−1)^t, when n = p₁p₂⋯p_t, where the pᵢ are distinct primes.

Prove the Möbius identity

Σ_{d|n} μ(d) = 1, when n = 1; and = 0, when n > 1.

7. Suppose ϕ(n) is the Euler function; prove

ϕ(n) = n Σ_{d|n} μ(d)/d.

8. Let n > 1 be a positive integer; prove Wilson's theorem:

(n − 1)! + 1 ≡ 0 (mod n)

if and only if n is prime.
9. Let n and b be positive integers, b > 1; prove that n can be uniquely expressed as the following b-ary number:

n = b₀ + b₁b + b₂b² + ⋯ + b_{r−1}b^{r−1}, where 0 ≤ bᵢ < b, b_{r−1} ≠ 0, r ≥ 1.

n = (b_{r−1}b_{r−2}⋯b₁b₀)_b is called the b-ary expression of n, and r is called the b-ary digit length of n.
10. Let f(n) be a complex-valued function on the set of positive integers, and prove the Möbius inversion formula:

F(n) = Σ_{d|n} f(d), ∀n ≥ 1 ⇔ f(n) = Σ_{d|n} μ(d)F(n/d), ∀n ≥ 1.

11. Prove the following sum formula:

Σ_{1≤r≤n, (r,n)=1} r = nϕ(n)/2.

12. Prove: there are infinitely many primes p satisfying p ≡ −1 (mod 6).
13. Solve the congruence equation 27x ≡ 25 (mod 31).
14. Let p be a prime and n ≥ 1 a positive integer; find the number of solutions of the quadratic congruence equation x² ≡ 1 (mod pⁿ).

15. In order to reduce the number of games, 20 teams are divided into two groups, each with 10 teams. Find the probability that the two strongest teams will be in the same group, and the probability that they are in different groups.
16. (Banach's matchbox problem) A mathematician has two boxes of matches, each containing N matches. Each time he needs a match, he takes one from a box chosen at random. Find the probability that when one box is found empty, the other box contains exactly k matches.
17. A stick of length l is broken at two random points. Find the probability that the three pieces of the stick can form a triangle.
18. There are k jars, each containing n balls numbered from 1 to n. One ball is drawn from each jar; find the probability that m is the largest number drawn.
19. Take any three of the five numbers 1, 2, 3, 4, 5 and arrange them from small to large. Let X denote the number in the middle; find the probability distribution of X.
20. Let F(x) be the distribution function of a continuous random variable and a > 0. Prove

∫_{−∞}^{+∞} |F(x + a) − F(x)| dx = a.

21. (Generalization of Bernoulli's law of large numbers) Let μₙ be the number of occurrences of event A in the first n experiments of a series of independent Bernoulli experiments, where the probability of occurrence of event A in the i-th experiment is pᵢ. Write down the corresponding law of large numbers and prove it.

References

Apostol, T. M. (1976). Introduction to analytic number theory. Springer.
Hardy, G. H., & Wright, E. M. (1979). An introduction to the theory of numbers. Oxford University Press.
Jacobson, N. (1989). Basic algebra (I). Translated by the Department of Algebra, Department of Mathematics, Shanghai Normal University. Beijing: Higher Education Press (in Chinese).
LeVeque, W. J. (1977). Fundamentals of number theory. Addison-Wesley.
Li, X. (2010). Basic probability theory. Higher Education Press (in Chinese).
Lidl, R., & Niederreiter, H. (1983). Finite fields. Addison-Wesley.
Long, Y. (2020). Probability theory and mathematical statistics. Higher Education Press (in Chinese).
Nie, L., & Ding, S. (2000). Introduction to algebra. Higher Education Press (in Chinese).
Rényi, A. (1970). Probability theory. North-Holland.
Rosen, K. H. (1984). Elementary number theory and its applications. Addison-Wesley.
Rosen, M. (2002). Number theory in function fields. Springer.
Spencer, D. (1982). Computers in number theory. Computer Science Press.
Van der Waerden, B. L. (1963). Algebra (I). Translated by Shisun Ding, Kencheng Zeng, Fuxin Hao. Beijing: Science Press (in Chinese).
Van der Waerden, B. L. (1976). Algebra (II). Translated by Xihua Cao, Kencheng Zeng, Fuxin Hao. Beijing: Science Press (in Chinese).
Van Lint, J. H. (1991). Introduction to coding theory. Springer.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 2
The Basis of Code Theory

The channel of information transmission is called the channel for short. Commonly used channels include cable, optical fiber, media for radio wave transmission and carrier lines, etc., and also tape, optical disk, etc. The channel constitutes the physical condition for social information to be exchanged across space and time. In addition, for a piece of social information, such as language information, picture information, data information and so on, to be exchanged across time and space, information coding is the basic technical means. What is information coding? In short, it is the process of digitizing all kinds of social information. Digitization is not a simple digital substitution of social information, but is full of profound mathematical principles and beautiful mathematical technology. For example, the source coding used for data compression and storage uses the principles of probability and statistics to attach the required statistical characteristics to social information, so source codes are also called random codes. The other kind is the so-called channel coding, which is used to overcome channel interference; this kind of code is full of beautiful algebraic, geometric and combinatorial techniques aimed at improving the accuracy of information transmission, so channel codes are also called algebraic combinatorial codes. The main purpose of this chapter is to introduce the basic knowledge of code theory for channel coding. Source coding will be introduced in Chap. 3.
With the hardware support of channel and the software technology of information
coding, we can implement the long-distance exchange of various social information
across time and space. Taking channel coding as an example, this process can be
described as the following diagram (Fig. 2.1).
Fig. 2.1 Channel coding

In 1948, the American mathematician Shannon published his pioneering paper "A Mathematical Theory of Communication" in the Bell System Technical Journal, marking the advent of the era of electronic information. In this paper, Shannon used probability theory to prove the existence of "good codes" whose rate is infinitely close to the channel capacity and whose transmission error probability is arbitrarily small (see Theorem 2.10 in this chapter); on the other hand, if the transmission
error probability is arbitrarily small, the code rate (transmission efficiency) does not
exceed an upper bound (channel capacity) (see Theorem in Chap. 3). This upper
bound is called Shannon’s limit, which is regarded as the golden rule in the field of
electronic communication engineering technology.
Shannon’s theorem is an existence proof rather than a constructive proof. How
to construct the so-called good code which can not only ensure the communication
efficiency (the code rate is as large as possible), but also control the transmission error
rate is the unremitting goal after the advent of Shannon’s theory. From Hamming and
Golay to Elias, Goppa, Berrou and Turkish mathematician Arikan, from Hamming
code, Golay code to convolutional code, turbo code to polar code, over the past
decades, electronic communication has reached one peak after another, creating one
technological miracle after another, until today’s 5G era. In 1969, the U.S. Mars probe
used Hadamard code to transmit image information. For the first time, mankind was
lucky to witness one beautiful picture after another in outer space, in 1971, the U.S.
Jupiter and Saturn probe used the famous Golay code G23 to send hundreds of frames
of color photos of Jupiter and Saturn back to earth, 70 years of exploration of channel
coding is a magnificent history of electronic communication.
The main purpose of this chapter is to strictly define and prove the mathematical characteristics of general codes, so as to provide a solid mathematical foundation for further study of coding technology and cryptography. This chapter includes the Hamming distance, Lee distance, linear codes, some typical good codes, the MacWilliams theorem and the famous Shannon coding theorem. Mastering the content of this chapter, we will have a basic and comprehensive understanding of channel coding theory (error-correcting codes).

2.1 Hamming Distance

In channel coding, the alphabet usually chosen is a q-element finite field Fq, sometimes a ring Zm, where q is a power of a prime. Let n ≥ 1 be a positive integer; Fqⁿ is an n-dimensional linear space over Fq, also called the codeword space:

Fqⁿ = {x = (x₁, x₂, …, xₙ) | xᵢ ∈ Fq}.

A vector x = (x₁, x₂, …, xₙ) in Fqⁿ is called a codeword of length n. For convenience, a codeword x is written as x = x₁x₂…xₙ; each xᵢ ∈ Fq is called a character. Denote 0 = (0, 0, …, 0).

For two codewords x = x₁x₂…xₙ and y = y₁y₂…yₙ, the Hamming distance of x and y is defined as the number of characters in which they differ, that is

d(x, y) = #{i | 1 ≤ i ≤ n, xᵢ ≠ yᵢ}.    (2.1)

Obviously, 0 ≤ d(x, y) ≤ n is a nonnegative integer. The weight function of a codeword x ∈ Fqⁿ is defined as w(x) = d(x, 0), that is, the Hamming distance between x and 0. The following properties are obvious.

Property 2.1 If x, y ∈ Fqⁿ, then
(i) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y.
(ii) d(x, y) = d(y, x).
(iii) w(−x) = w(x).
(iv) d(x, y) = d(x − z, y − z), ∀ z ∈ Fqⁿ.
(v) d(x, y) = w(x − y).
Property (i) is called nonnegativity, property (ii) symmetry, and property (iv) translation invariance. These are the basic properties of a distance function in mathematics, and we can draw an analogy with the distance between two points in the plane or in Euclidean space.

Lemma 2.1 Let x, y ∈ Fqⁿ be two codewords, then

w(x ± y) ≤ w(x) + w(y).

Proof Because w(−x) = w(x), we have w(x − y) = w(x + (−y)), so we need only prove w(x + y) ≤ w(x) + w(y). Let x = x₁…xₙ, y = y₁…yₙ, then

x + y = (x₁ + y₁)(x₂ + y₂)…(xₙ + yₙ).

Obviously, if xᵢ + yᵢ ≠ 0, then xᵢ ≠ 0 or yᵢ ≠ 0 (1 ≤ i ≤ n). Thus w(x + y) ≤ w(x) + w(y), and

w(x − y) = w(x + (−y)) ≤ w(x) + w(−y) = w(x) + w(y).

We have completed the proof. □

Lemma 2.2 (Triangle inequality) If x, y, z ∈ Fqⁿ are three codewords, then

d(x, y) ≤ d(x, z) + d(z, y).

Proof By Lemma 2.1, for any z ∈ Fqⁿ,

w(x − y) ≤ w(x − z) + w(z − y).

Then by property (v), d(x, y) = w(x − y), we have

d(x, y) ≤ d(x, z) + d(z, y).

The Lemma holds. □

The nonnegativity, symmetry and translation invariance of the Hamming distance, together with the triangle inequality of Lemma 2.2, show that the Hamming distance between two codewords behaves exactly like the distance between two points in physical space: it is a genuine distance function in the mathematical sense. Similarly, we can define the concept of a ball. The Hamming sphere with radius ρ centered at the codeword x is defined as

B_ρ(x) = {y | y ∈ Fqⁿ, d(x, y) ≤ ρ},    (2.2)

where ρ is a nonnegative integer. Obviously, B₀(x) = {x} contains only one codeword.
Lemma 2.3 For any x ∈ Fqⁿ and 0 ≤ ρ ≤ n, we have

|B_ρ(x)| = Σ_{i=0}^{ρ} \binom{n}{i} (q − 1)^i,    (2.3)

where |B_ρ(x)| is the number of codewords in the Hamming ball B_ρ(x).

Proof Let x = x₁x₂…xₙ and fix i, 0 ≤ i ≤ ρ. Let

Aᵢ = #{y ∈ Fqⁿ | d(y, x) = i}.

Obviously,

Aᵢ = \binom{n}{i} (q − 1)^i,

so

|B_ρ(x)| = Σ_{i=0}^{ρ} Aᵢ = Σ_{i=0}^{ρ} \binom{n}{i} (q − 1)^i. □

Corollary 2.1 For ∀ x ∈ Fqⁿ, we have

|B_ρ(x)| = |B_ρ(0)|.

That is to say, the number of codewords in B_ρ(x) is a constant which depends only on the radius ρ. This constant is usually denoted as B_ρ.
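Both the distance (2.1) and the ball size (2.3) translate directly into code; the following minimal Python sketch (the function names are ours, for illustration) computes them.

```python
from math import comb

def hamming_distance(x, y):
    """Number of coordinates in which the codewords x and y differ (2.1)."""
    return sum(a != b for a, b in zip(x, y))

def ball_size(n, q, rho):
    """|B_rho(x)| = sum_{i=0}^{rho} C(n,i) (q-1)^i, as in Lemma 2.3."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(rho + 1))

print(hamming_distance((1, 0, 1, 1), (1, 1, 0, 1)))  # 2
print(ball_size(7, 2, 1))                            # 8 = 2^3
```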

Definition 2.1 If C ⊆ Fqⁿ, C is called a q-ary code, or code for short; |C| is the number of codewords in the code C. If |C| = 1, we call C a trivial code; all the codes we discuss are nontrivial.

For a code C, the following five mathematical quantities are of basic importance.

Definition 2.2 If C is a code, define

Code rate (bit rate) of C: R = R_C = (1/n) log_q |C|.
Minimum distance of C: d = min{d(x, y) | x, y ∈ C, x ≠ y}.
Minimal weight of C: w = min{w(x) | x ∈ C, x ≠ 0}.
Covering radius of C: ρ = max{min{d(x, c) | c ∈ C} | x ∈ Fqⁿ}.
Disjoint radius of C: ρ₁ = max{r | 0 ≤ r ≤ n, B_r(c₁) ∩ B_r(c₂) = φ, ∀ c₁, c₂ ∈ C, c₁ ≠ c₂}.

It is important for the study of codes to discuss the relationship between the above five quantities. We begin by proving Lemma 2.4.
Lemma 2.4 Let d be the minimum distance of C and ρ₁ the disjoint radius of C, then

d = 2ρ₁ + 1 or d = 2ρ₁ + 2.

Proof It suffices to prove 2ρ₁ + 1 ≤ d ≤ 2ρ₁ + 2. If d ≤ 2ρ₁, then there are codewords c₁, c₂ ∈ C, c₁ ≠ c₂, such that

d(c₁, c₂) ≤ 2ρ₁.

This means that c₁ and c₂ have at most 2ρ₁ different characters. Without loss of generality, we may assume the first 2ρ₁ characters of c₁ and c₂ differ, that is

c₁ = a₁a₂…a_{ρ₁}a_{ρ₁+1}…a_{2ρ₁} ∗ ∗ ⋯ ∗,
c₂ = b₁b₂…b_{ρ₁}b_{ρ₁+1}…b_{2ρ₁} ∗ ∗ ⋯ ∗,

where ∗ represents the same character. Put

x = a₁a₂…a_{ρ₁}b_{ρ₁+1}…b_{2ρ₁} ∗ ⋯ ∗;

this shows that

d(x, c₁) ≤ ρ₁, d(x, c₂) ≤ ρ₁.

That is,

x ∈ B_{ρ₁}(c₁) ∩ B_{ρ₁}(c₂),

in contradiction with B_{ρ₁}(c₁) ∩ B_{ρ₁}(c₂) = φ. So we have d ≥ 2ρ₁ + 1. If d > 2ρ₁ + 2 = 2(ρ₁ + 1), then we can prove the following formula, which contradicts the definition of the disjoint radius ρ₁:

B_{ρ₁+1}(c₁) ∩ B_{ρ₁+1}(c₂) = φ, ∀ c₁, c₂ ∈ C, c₁ ≠ c₂.

Indeed, if the above formula does not hold, there are c₁, c₂ ∈ C, c₁ ≠ c₂, such that B_{ρ₁+1}(c₁) intersects B_{ρ₁+1}(c₂); we may take

x ∈ B_{ρ₁+1}(c₁) ∩ B_{ρ₁+1}(c₂).

Then the triangle inequality of Lemma 2.2 gives

d(c₁, c₂) ≤ d(c₁, x) + d(c₂, x) ≤ 2(ρ₁ + 1),

which contradicts the hypothesis d > 2(ρ₁ + 1). So we have 2ρ₁ + 1 ≤ d ≤ 2ρ₁ + 2.

The Lemma holds. □
In order to discuss the geometric meaning of the covering radius ρ, we consider the set of balls {B_ρ(c) | c ∈ C} on a code C. If

⋃_{c∈C} B_ρ(c) = Fqⁿ,

then {B_ρ(c) | c ∈ C} is called a cover of the codeword space Fqⁿ.


Lemma 2.5 Let ρ be the covering radius of C; then ρ is the smallest integer such that {B_ρ(c) | c ∈ C} covers Fqⁿ.

Proof By the definition of ρ, for all x ∈ Fqⁿ,

min{d(x, c) | c ∈ C} ≤ ρ.

Therefore, for given x ∈ Fqⁿ, there is a codeword c ∈ C with d(x, c) ≤ ρ, that is x ∈ B_ρ(c). This shows that

⋃_{c∈C} B_ρ(c) = Fqⁿ,

that is, {B_ρ(c) | c ∈ C} forms a cover of Fqⁿ. Obviously, {B_{ρ−1}(c) | c ∈ C} cannot cover Fqⁿ, because if

⋃_{c∈C} B_{ρ−1}(c) = Fqⁿ,

then for any x ∈ Fqⁿ, ∃ c ∈ C with x ∈ B_{ρ−1}(c), so

min{d(x, c) | c ∈ C} ≤ ρ − 1.

Thus

ρ = max{min{d(x, c) | c ∈ C} | x ∈ Fqⁿ} ≤ ρ − 1.

The contradiction shows that ρ is the smallest such integer. The Lemma holds. □

Lemma 2.6 Let d be the minimum distance of C and ρ the covering radius of C, then

d ≤ 2ρ + 1.

Proof If d > 2ρ + 1, let c₀ ∈ C be given; then we have

B_{ρ+1}(c₀) ∩ B_ρ(c) = φ, ∀ c ∈ C, c ≠ c₀.

So we can choose x ∈ B_{ρ+1}(c₀) with d(x, c₀) = ρ + 1; then

x ∉ B_ρ(c₀), x ∉ B_ρ(c), ∀ c ∈ C.

That is, {B_ρ(c) | c ∈ C} cannot cover Fqⁿ, which contradicts Lemma 2.5. So we always have d ≤ 2ρ + 1. The Lemma holds. □
Combining the above three lemmas, we obtain the following simple but very important corollaries.

Corollary 2.2 Let C ⊂ Fqⁿ be an arbitrary q-ary code, and let d, ρ, ρ₁ be the minimum distance, covering radius and disjoint radius of C respectively. Then
(i) ρ₁ ≤ ρ.
(ii) If the minimum distance of C is d = 2e + 1, then e = ρ₁.

Proof (i) follows directly from 2ρ₁ + 1 ≤ d ≤ 2ρ + 1. If d = 2e + 1 is odd, then by Lemma 2.4, d = 2ρ₁ + 1 = 2e + 1 ⇒ e = ρ₁. □

Definition 2.3 A code C with ρ = ρ₁ is called a perfect code.

Corollary 2.3 (i) The minimum distance of any perfect code C is d = 2ρ + 1.
(ii) Let the minimum distance of a code C be d = 2e + 1. Then C is a perfect code if and only if for every x ∈ Fqⁿ there exists exactly one ball B_e(c), c ∈ C, with x ∈ B_e(c).

Proof (i) follows directly from 2ρ₁ + 1 ≤ d ≤ 2ρ + 1 and ρ = ρ₁. To prove (ii): if C is a perfect code with minimum distance d = 2e + 1, then ρ₁ = ρ = e, so the balls {B_e(c)} are pairwise disjoint and cover Fqⁿ. Conversely, if the condition holds, then the covering radius of C satisfies ρ ≤ e = ρ₁ ≤ ρ, so ρ₁ = ρ and C is a perfect code. □
In order to introduce the concept of error-correcting codes, we discuss the so-called decoding principle in electronic information transmission. This principle is commonly known as decoding to the codeword that "looks most like" the received word. What does "looks most like" mean? When we transmit through a channel with interference and receive a word x′ ∈ Fqⁿ, a codeword x ∈ C satisfying

d(x, x′) = min{d(c, x′) | c ∈ C}

is the codeword in C most similar to x′, so we decode x′ to x. If the most similar codeword x is unique in C, then theoretically x is the codeword that was sent before x′ was received, so the decoding x′ → x is accurate.

Definition 2.4 A code C is called an e-error-correcting code (e ≥ 1) if for any x ∈ Fqⁿ there is at most one c ∈ C with x ∈ B_e(c).

An error-correcting code allows transmission errors without affecting correct decoding. For example, suppose that C is an e-error-correcting code. For any c ∈ C, after c is transmitted through a channel with interference, let the received word be x. If at most e characters are in error during the transmission of c, that is d(c, x) ≤ e, then the codeword in C most similar to x must be c, so we can correctly decode x → c.

Corollary 2.4 A perfect code with minimum distance d = 2e + 1 is an e-error-correcting code.

Proof Because the disjoint radius ρ₁ and the covering radius ρ of C satisfy ρ₁ = ρ = e, for any received word x ∈ Fqⁿ there exists exactly one c ∈ C with x ∈ B_e(c). That is, C is an e-error-correcting code. □
Finally, we prove the main conclusion of this section.

Theorem 2.1 Let the minimum distance of a code C be d = 2e + 1. Then C is a perfect code if and only if the following sphere-packing condition holds:

|C| Σ_{i=0}^{e} \binom{n}{i} (q − 1)^i = qⁿ.    (2.4)

Proof If the minimum distance of C is d = 2e + 1 and C is perfect, then ρ = ρ₁ = e. So

⋃_{c∈C} B_e(c) = Fqⁿ.

Since the balls are pairwise disjoint, we have

|⋃_{c∈C} B_e(c)| = qⁿ,

thus

|C| B_e = |C| Σ_{i=0}^{e} \binom{n}{i} (q − 1)^i = qⁿ.

Conversely, suppose the sphere-packing condition (2.4) holds. Because the minimum distance of C is d = 2e + 1, from Corollary 2.2 we see that ρ₁ = e, so the balls {B_e(c) | c ∈ C} are pairwise disjoint, and by (2.4),

⋃_{c∈C} B_e(c) = Fqⁿ.

It follows that ρ ≤ e = ρ₁ ≤ ρ, thus ρ = ρ₁ and C is a perfect code. The theorem holds. □
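Condition (2.4) is mechanical to test; the Python sketch below checks it for the binary repetition code of length 3 and for the binary [7, 4] Hamming code introduced later in this chapter, and shows a failing non-example.

```python
from math import comb

def is_perfect(n, q, M, e):
    """Sphere-packing condition (2.4): M * sum C(n,i)(q-1)^i == q^n."""
    return M * sum(comb(n, i) * (q - 1) ** i for i in range(e + 1)) == q ** n

print(is_perfect(3, 2, 2, 1))    # True: repetition code {000, 111}
print(is_perfect(7, 2, 16, 1))   # True: [7,4] Hamming code
print(is_perfect(7, 2, 8, 1))    # False: 8 balls cannot fill F_2^7
```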

When q = 2, the alphabet F₂ is the finite field of two elements {0, 1}; in this case the code is called a binary code, and the transmission channel is called a binary channel. In binary channel transmission, the most important function is the binary entropy function H(λ), defined as

H(λ) = 0, when λ = 0 or λ = 1;
H(λ) = −λ log λ − (1 − λ) log(1 − λ), when 0 < λ < 1.    (2.5)

Obviously, H(λ) = H(1 − λ), 0 ≤ H(λ) ≤ 1, and H(1/2) = 1, that is, the maximum is reached at λ = 1/2. For further properties of H(λ), please refer to Chap. 1.
Theorem 2.2 Let C be a perfect code with minimum distance d = 2e + 1, and let R_C be the code rate of C. Then
(i) 1 − R_C = (1/n) log₂ Σ_{i=0}^{e} \binom{n}{i} ≤ H(e/n).
(ii) When the codeword length n → ∞, if lim_{n→∞} R_C = a, then

lim_{n→∞} H(e/n) = 1 − a.

Proof (i) By the sphere-packing condition, since C is a perfect code,

|C| Σ_{i=0}^{e} \binom{n}{i} = 2ⁿ.

We have

(1/n) log₂ |C| + (1/n) log₂ Σ_{i=0}^{e} \binom{n}{i} = 1,

that is

1 − R_C = (1/n) log₂ Σ_{i=0}^{e} \binom{n}{i} ≤ H(e/n),

where the last inequality is derived from Lemma 1.11 in Chap. 1, so (i) holds. If R_C has a limit as n → ∞, then again by Lemma 1.11 in Chap. 1, we have

lim_{n→∞} H(e/n) = 1 − lim_{n→∞} R_C = 1 − a.

Theorem 2.2 holds. □
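The quantities of Theorem 2.2 are easy to evaluate; here is a small Python sketch, using the perfect binary [7, 4] Hamming code (e = 1) as a worked instance.

```python
from math import comb, log2

def H(lam):
    """Binary entropy function (2.5)."""
    if lam in (0.0, 1.0):
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)

# For the [7,4] Hamming code: 1 - R_C equals (1/n) log2(ball size) <= H(e/n).
n, k, e = 7, 4, 1
lhs = log2(sum(comb(n, i) for i in range(e + 1))) / n
print(1 - k / n, lhs, H(e / n))   # 0.4286  0.4286  0.5917
```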

Finally, we give an example of a perfect code.

Example 2.1 Let n = 2e + 1 be an odd number. Then the repetition code A = {0, 1} in F₂ⁿ, where 0 = 00…0 and 1 = 11⋯1, is a perfect code of length n.

First, the repetition code A = {0, 1} ⊂ F₂ⁿ contains only the two codewords 0 = 0…0 ∈ F₂ⁿ and 1 = 1…1 ∈ F₂ⁿ. Because d(0, 1) = n = 2e + 1 is odd, by Corollary 2.2 the disjoint radius of A is ρ₁ = e. Let us prove that the covering radius of A is ρ = ρ₁ = e. For any x ∈ F₂ⁿ, if d(0, x) > e, that is d(0, x) ≥ e + 1, then at least e + 1 characters of x = x₁x₂…xₙ ∈ F₂ⁿ are 1, so at most e characters are 0, thus d(1, x) ≤ e. This shows that if x ∉ B_e(0), then x ∈ B_e(1), that is

B_e(0) ∪ B_e(1) = F₂ⁿ,

so ρ ≤ e = ρ₁ ≤ ρ, and we have ρ = ρ₁. That is, A is a perfect code. Note that in this example e can be any positive integer, so the code rate of the repetition code has the limit value

lim_{n→∞} R_A = 0 ⇒ lim_{n→∞} H(e/n) = 1.
To end this section, we discuss and define the equivalence of two codes. Let C ⊂ Fqⁿ be a code of length n and Sₙ the permutation group on n elements. Any σ ∈ Sₙ is a permutation of n letters; for x = x₁x₂…xₙ ∈ Fqⁿ, we define σ(x) as

σ(x) = x_{σ(1)}x_{σ(2)}…x_{σ(n)} ∈ Fqⁿ,    (2.6)

σ(C) = {σ(c) | c ∈ C}.    (2.7)

Definition 2.5 Let C and C₁ be two codes in Fqⁿ. If there is σ ∈ Sₙ with σ(C) = C₁, we call C and C₁ equivalent, denoted as C ∼ C₁. Obviously, equivalence is an equivalence relation between codes: taking σ = 1 gives C ∼ C; if C ∼ C₁, that is C₁ = σ(C), then C = σ⁻¹(C₁), so C ∼ C₁ ⇒ C₁ ∼ C; and if C ∼ C₁, C₁ ∼ C₂, then C ∼ C₂, because C₁ = σ(C), C₂ = τ(C₁) ⇒ C₂ = τσ(C). Another obvious property is that the action of σ does not change the Hamming distance between two codewords, that is,

d(σ(x), σ(y)) = d(x, y), ∀ σ ∈ Sₙ.    (2.8)

Lemma 2.7 Suppose C ∼ C₁ are two equivalent codes. Then they have the same code rate, the same minimum distance, the same covering radius and the same disjoint radius. In particular, if C is a perfect code, then every code C₁ equivalent to C is a perfect code.

Proof All the assertions of the Lemma are easily proved using Eq. (2.8). □

2.2 Linear Code

Let C ⊂ Fqⁿ be a code. If C is a k-dimensional linear subspace of Fqⁿ, then C is called a linear code, denoted as C = [n, k]. For a linear code C, we have

R_C = (1/n) log_q |C| = k/n, and minimum distance d = minimal weight w.

Let {α₁, α₂, …, α_k} ⊂ C be a set of bases of the linear code C, where

αᵢ = αᵢ₁αᵢ₂⋯αᵢₙ ∈ Fqⁿ, 1 ≤ i ≤ k.

Definition 2.6 If {α₁, α₂, …, α_k} is a set of bases of the linear code C = [n, k], then the k × n matrix

G = [α₁; α₂; …; α_k] = (αᵢⱼ), 1 ≤ i ≤ k, 1 ≤ j ≤ n,

whose i-th row is αᵢ, is called a generator matrix of C. We write

G = [I_k, A_{k×(n−k)}], where I_k is the identity matrix of order k;

this is called the standard form of G.

Lemma 2.8 Let C = [n, k] be a linear code with generator matrix G. Then

C = {aG | a ∈ Fq^k}.

Proof Because {α₁, α₂, …, α_k} is a set of bases of the linear code C, any x ∈ C can be written as

x = a₁α₁ + a₂α₂ + ⋯ + a_kα_k = (a₁, a₂, …, a_k) G = a·G,

where a = (a₁, a₂, …, a_k) ∈ Fq^k. The Lemma holds. □
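Lemma 2.8 says that a linear code is exactly the set of Fq-linear combinations of the rows of G; the following minimal Python sketch (the [4, 2] binary generator matrix is an illustrative choice of ours) enumerates C = {aG | a ∈ F₂^k}.

```python
import itertools

import numpy as np

# Enumerate all codewords aG over F_2 for a small generator matrix in
# standard form [I_2, A], illustrating Lemma 2.8.
G = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1]])
k = G.shape[0]

for a in itertools.product([0, 1], repeat=k):
    print(a, "->", (np.array(a) @ G) % 2)   # 2^k = 4 codewords
```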

Define the inner product in Fqⁿ: for x = x₁…xₙ, y = y₁…yₙ ∈ Fqⁿ, let ⟨x, y⟩ = Σ_{i=1}^{n} xᵢyᵢ. If ⟨x, y⟩ = 0, we say x and y are orthogonal, denoted as x ⊥ y.

Definition 2.7 Let C = [n, k] be a linear code; its orthogonal complement space C⊥ is

C⊥ = {y ∈ Fqⁿ | ⟨x, y⟩ = 0, ∀ x ∈ C}.

Obviously, C⊥ is an [n, n − k] linear code, called the dual code of C. The generator matrix H of C⊥ is called the check matrix of C.

Lemma 2.9 Let C = [n, k] be a linear code with check matrix H. Then

xH^T = 0 ⇔ x ∈ C,

where H^T is the transpose matrix of H.

Proof We prove the conclusion by taking the standard form of the generator matrix G of C. Let

G = [I_k, A_{k×(n−k)}] = [I_k, A], A = A_{k×(n−k)}.

Then the check matrix of C, that is, the generator matrix of the dual code C⊥, is

H = [−A^T, I_{n−k}], with H^T the block column matrix formed by −A on top of I_{n−k}.

By Lemma 2.8, if x ∈ C, then ∃ a ∈ Fq^k with x = aG, thus

xH^T = aGH^T = a[I_k, A]H^T = a(−A + A) = 0.

Conversely, if xH^T = 0, since H is the generator matrix of C⊥, again by Lemma 2.8 any y ∈ C⊥ can be written as y = bH with b ∈ Fq^{n−k}, thus

⟨x, y⟩ = xy^T = xH^T b^T = 0 ⇒ x ∈ (C⊥)⊥ = C.

The Lemma holds. □
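The standard-form relationship between G and H used in this proof can be checked mechanically; the Python sketch below (with the same illustrative A as in the sketch above) verifies that codewords, and only codewords, have zero check value.

```python
import numpy as np

# Over F_2 the minus sign in H = [-A^T, I_{n-k}] disappears.
A = np.array([[1, 1],
              [0, 1]])
G = np.hstack([np.eye(2, dtype=int), A])    # [I_2, A]
H = np.hstack([A.T, np.eye(2, dtype=int)])  # [A^T, I_2]

c = (np.array([1, 1]) @ G) % 2              # a codeword: (1, 1, 1, 0)
x = c.copy(); x[0] ^= 1                     # flip one character
print((c @ H.T) % 2)   # [0 0]: c is in C
print((x @ H.T) % 2)   # nonzero syndrome: x is not in C
```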

By Lemma 2.9, for any x, y ∈ Fqⁿ,

xH^T = yH^T ⇔ x − y ∈ C,

because C is an additive subgroup of Fqⁿ. xH^T is called the check value of the codeword x; thus the check values of two codewords are equal ⇔ the two codewords lie in the same additive coset of C. This yields the following decoding principle for linear codes.

Decoding principle: Suppose the linear code C = [n, k] is used for coding over an interfering channel. When the received word is x ∈ Fqⁿ, find a codeword x₀ of least weight in the additive coset x + C of x, that is, x₀ satisfies

x₀ ∈ x + C, and w(x₀) = min{w(α) | α ∈ x + C};

x₀ is called the leader codeword of the coset x + C. We then decode x into x − x₀.

Lemma 2.10 If the minimum distance of the linear code C = [n, k] is d = 2e + 1, then there is at most one codeword x₀ with w(x₀) ≤ e in any additive coset x + C of C.

Proof If α, β ∈ x + C with w(α) ≤ e and w(β) ≤ e, then α − β ∈ C and w(α − β) ≤ w(α) + w(β) ≤ 2e. But the minimal weight of C = the minimal distance of C = 2e + 1, a contradiction unless α = β. The Lemma holds. □

Corollary 2.5 For a perfect linear code C = [n, k] with minimum distance d = 2e + 1, every additive coset x + C of C contains exactly one codeword of weight ≤ e. In other words, the leader codeword of each coset is unique.

Proof For x ∈ Fqⁿ there exists c ∈ C such that x ∈ B_e(c), that is d(c, x) ≤ e, so w(x − c) ≤ e. But x − c ∈ x + C. The Corollary holds. □
Definition 2.8 If any two column vectors of the generator matrix G of a linear code C = [n, k] are linearly independent, C is called a projective code.

In order to discuss the true meaning of projective codes, we consider the (k − 1)-dimensional projective space PG(k − 1, q) over Fq. For any two vectors a = (a₁, a₂, …, a_k), b = (b₁, b₂, …, b_k) in Fq^k, we say a ∼ b if ∃ λ ∈ Fq* such that a = λb. This is an equivalence relation on Fq^k. Obviously b ∼ 0 ⇔ b = 0, and for any a ∈ Fq^k, ā = {λa | λ ∈ Fq*}. The quotient set Fq^k/∼ is called the (k − 1)-dimensional projective space over Fq, denoted as PG(k − 1, q); therefore

PG(k − 1, q) = Fq^k/∼ = {ā | a ∈ Fq^k}.

The number of nonzero points in the (k − 1)-dimensional projective space PG(k − 1, q) is

|PG(k − 1, q)| = (q^k − 1)/(q − 1) = 1 + q + ⋯ + q^{k−1}.

For a linear code [n, n − k], the check matrix H is a k × n matrix; if any two of its column vectors are linearly independent, writing H = [a₁, a₂, …, aₙ], then {ā₁, ā₂, …, āₙ} ⊂ PG(k − 1, q) are n distinct nonzero points. So the generator matrix of an [n, k] projective code consists of n different nonzero points of the projective space PG(k − 1, q). Since n ≤ |PG(k − 1, q)|, the maximum is reached when

n = |PG(k − 1, q)| = (q^k − 1)/(q − 1).

This leads to a perfect family of linear codes, called Hamming codes.

Definition 2.9 Let k > 1 and n = (q^k − 1)/(q − 1). A linear code C = [n, n − k] is called a Hamming code if any two column vectors of the check matrix H of C are linearly independent.

Since C is an (n − k)-dimensional linear subspace and C⊥ is a k-dimensional linear subspace, the generator matrix H of C⊥ is a k × n matrix. Therefore, if any two column vectors of H are linearly independent, they represent n different points of the projective space PG(k − 1, q). Because n = (q^k − 1)/(q − 1), the construction of a Hamming code uses as many points as possible.

Theorem 2.3 Any Hamming code C = [n, n − k] is a perfect code with minimum distance d = 3; therefore, Hamming codes are perfect 1-error-correcting codes.

Proof We first prove that the minimum distance of the Hamming code C is d ≥ 3. If d ≤ 2, there is x = x₁x₂…xₙ ∈ C with w(x) ≤ 2, that is, at most two characters xᵢ and xⱼ are not 0 (because the minimum distance d of a linear code equals its minimal weight w).

Let H = (α₁, α₂, …, αₙ) be the check matrix of C, with columns αᵢ. Since xH^T = 0, we have

xᵢαᵢ + xⱼαⱼ = 0,

thus αᵢ and αⱼ are linearly dependent, a contradiction. So d ≥ 3, and by Lemma 2.4 the disjoint radius of C satisfies ρ₁ ≥ 1.

On the other hand, for c ∈ C, by Lemma 2.3, the number of elements in the ball B₁(c) is

|B₁(c)| = 1 + n(q − 1) = q^k.

Because C is an (n − k)-dimensional linear subspace, |C| = q^{n−k}, and the balls B₁(c) are pairwise disjoint (ρ₁ ≥ 1), so

|⋃_{c∈C} B₁(c)| = |C| q^k = qⁿ = |Fqⁿ|

⇒ ⋃_{c∈C} B₁(c) = Fqⁿ.

We have 1 ≤ ρ₁ ≤ ρ ≤ 1 ⇒ ρ₁ = ρ = 1; C is a perfect code, and its minimum distance is d = 2ρ + 1 = 3. The theorem holds. □
Next, we discuss the weight polynomial of a linear code C and prove the famous MacWilliams theorem.

For x ∈ C = [n, k], the weight function w(x) takes values from 0 to n; in fact, w(x) = 0 ⇔ x = 0 ∈ C, and w(x) = n ⇔ x = x₁…xₙ with all xᵢ ≠ 0. So for each i, 0 ≤ i ≤ n, define

Aᵢ = A(i) = #{x ∈ C | w(x) = i}

and the weight polynomial of C,

A(z) = Σ_{i=0}^{n} Aᵢ z^i, z a variable.

Obviously, for any given c ∈ C, the number of codewords in C whose Hamming distance to c is exactly i is Aᵢ, that is

Aᵢ = #{x ∈ C | d(x, c) = i}.

Codes with the above property are called distance-invariant codes; obviously, linear codes are distance invariant.

The following result was proved by MacWilliams in 1963; it establishes the relationship between the weight polynomial of a linear code C and that of its dual code C⊥, and is one of the most basic achievements in code theory.
Theorem 2.4 (MacWilliams) Let C = [n, k] be a linear code over Fq with weight polynomial A(z), and let C⊥ be the dual code of C with weight polynomial B(z). Then

B(z) = q^{−k} (1 + (q − 1)z)ⁿ A((1 − z)/(1 + (q − 1)z)).

In particular, when q = 2,

2^k B(z) = (1 + z)ⁿ A((1 − z)/(1 + z)).
1+z

Proof Let ψ(a) be an additive character on Fq; ψ(a) can be constructed as

ψ(a) = exp(2πi·tr(a)/p), tr(a): Fq → F_p.

For any c ∈ C, we define the polynomial g_c(z) as

g_c(z) = Σ_{x∈Fqⁿ} z^{w(x)} ψ(⟨x, c⟩);    (2.9)

therefore,

Σ_{c∈C} g_c(z) = Σ_{x∈Fqⁿ} z^{w(x)} Σ_{c∈C} ψ(⟨x, c⟩).    (2.10)

Let us calculate the inner sum of (2.10). If x ∈ C⊥, then

Σ_{c∈C} ψ(⟨x, c⟩) = |C|.

If x ∉ C⊥, let us prove

Σ_{c∈C} ψ(⟨x, c⟩) = 0.    (2.11)

For x ∈ Fqⁿ, x ∉ C⊥, let

T(x) = {y ∈ C | ⟨y, x⟩ = 0} ⫋ C,

so T(x) is a linear subspace of C. For c ∈ C, consider the additive coset c + T(x); for any two codewords c + y₁ and c + y₂ in this coset, we have

⟨c + y₁, x⟩ = ⟨c + y₂, x⟩ = ⟨c, x⟩.

Conversely, for any two additive cosets c₁ + T(x), c₂ + T(x), if ⟨c₁, x⟩ = ⟨c₂, x⟩, then ⟨c₁ − c₂, x⟩ = 0, that is c₁ − c₂ ∈ T(x), so c₁ + T(x) = c₂ + T(x). Therefore, all codewords in a coset c + T(x) ⊂ C have the same inner product with x, and different additive cosets give different inner products with x. Because x ∉ C⊥, ∃ c₀ ∈ C such that ⟨c₀, x⟩ = a ≠ 0; then ⟨a⁻¹c₀, x⟩ = 1. Let c₁ = a⁻¹c₀ ∈ C, so ⟨c₁, x⟩ = 1, and ∀ a ∈ Fq ⇒ ⟨ac₁, x⟩ = a. Thus ⟨c, x⟩ takes every element of Fq as c runs over the cosets, so

Σ_{c∈C} ψ(⟨x, c⟩) = |T(x)| Σ_{a∈Fq} ψ(a) = 0.

That is, (2.11) holds. From (2.10) we get

Σ_{c∈C} g_c(z) = |C| Σ_{x∈C⊥} z^{w(x)} = |C| B(z).    (2.12)

Define the weight function on Fq by w(a) = 1 if a ≠ 0, and w(0) = 0. For any x ∈ Fqⁿ, c ∈ C, write x = x₁x₂…xₙ, c = c₁c₂…cₙ; then by (2.9) we have

g_c(z) = Σ_{x₁,…,xₙ∈Fq} z^{w(x₁)+w(x₂)+⋯+w(xₙ)} ψ(c₁x₁ + ⋯ + cₙxₙ)
= Π_{i=1}^{n} Σ_{x∈Fq} z^{w(x)} ψ(cᵢx).    (2.13)

The inner sum of the above formula can be calculated as

Σ_{x∈Fq} z^{w(x)} ψ(cᵢx) = 1 + (q − 1)z, if cᵢ = 0;
Σ_{x∈Fq} z^{w(x)} ψ(cᵢx) = 1 − z, if cᵢ ≠ 0.

From (2.13), we then have

g_c(z) = (1 − z)^{w(c)} (1 + (q − 1)z)^{n−w(c)}.

Thus

Σ_{c∈C} g_c(z) = (1 + (q − 1)z)ⁿ Σ_{c∈C} ((1 − z)/(1 + (q − 1)z))^{w(c)}
= (1 + (q − 1)z)ⁿ A((1 − z)/(1 + (q − 1)z)).

Finally, from (2.12), we have

B(z) = (1/|C|)(1 + (q − 1)z)ⁿ A((1 − z)/(1 + (q − 1)z))
= q^{−k} (1 + (q − 1)z)ⁿ A((1 − z)/(1 + (q − 1)z)).

We have completed the proof of the theorem. □
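Theorem 2.4 can be verified symbolically for small codes; the Python sketch below takes the binary repetition code C = {000, 111} (whose dual is the [3, 2] even-weight code) and compares both sides of the identity as polynomial coefficient vectors.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# A(z) = 1 + z^3 for C = {000, 111}; B(z) = 1 + 3z^2 for the dual code.
n, k, q = 3, 1, 2
A = [1, 0, 0, 1]
B = [1, 0, 3, 0]

# For q = 2, q^{-k} (1+z)^n A((1-z)/(1+z)) expands term by term into
# q^{-k} * sum_i A_i (1-z)^i (1+z)^{n-i}.
rhs = np.zeros(n + 1)
for i, Ai in enumerate(A):
    if Ai:
        term = P.polymul(P.polypow([1, -1], i), P.polypow([1, 1], n - i))
        rhs[:len(term)] += Ai * term
rhs /= q ** k

print(np.allclose(rhs, B))   # True
```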

2.3 Lee Distance

Let m > 1 be a positive integer and Zm the residue class ring mod m. If Zm is the alphabet and C ⊂ Zmⁿ is a proper subset, then C is called an m-ary code. In this case, the Hamming distance is not the best tool to measure error; we substitute the Lee distance and Lee weight. For i ∈ Zm, define the Lee weight as

W_L(i) = min{i, m − i}.    (2.14)

Obviously,

W_L(0) = 0, W_L(−i) = W_L(m − i) = W_L(i).    (2.15)

Suppose a = (a₁, a₂, …, aₙ) = a₁a₂…aₙ ∈ Zmⁿ, b = b₁b₂…bₙ ∈ Zmⁿ; define the Lee weight and Lee distance on an m-ary code C as follows:

W_L(a) = Σ_{i=1}^{n} W_L(aᵢ),
d_L(a, b) = W_L(a − b).

From (2.15), we have

W_L(−a) = W_L(a), d_L(a, b) = d_L(b, a), ∀ a, b ∈ Zmⁿ.

Lemma 2.11 For ∀ a, b, c ∈ Zmⁿ, we have the following triangle inequality:

d_L(a, b) ≤ d_L(a, c) + d_L(c, b).


Proof Suppose first that 0 ≤ i < m, 0 ≤ j < m, and i + j < m; we show

W_L(i + j) ≤ W_L(i) + W_L(j).    (2.16)

If 0 ≤ i + j ≤ m/2, then

W_L(i + j) = i + j = W_L(i) + W_L(j).

If m/2 < i + j < m, we discuss three cases:
(1) i ≤ m/2, j ≤ m/2: then

W_L(i + j) = m − i − j < i + j = W_L(i) + W_L(j).

(2) i ≤ m/2, j > m/2: then

W_L(i + j) = m − i − j ≤ m − j = W_L(j) ≤ W_L(i) + W_L(j).

(3) i > m/2, j ≤ m/2: then

W_L(i + j) = m − i − j ≤ m − i = W_L(i) ≤ W_L(i) + W_L(j).

(The case i + j ≥ m is handled in the same way after reducing i + j modulo m.) So we always have (2.16). In Zmⁿ, (2.16) extends componentwise to

W_L(a + b) ≤ W_L(a) + W_L(b), ∀ a, b ∈ Zmⁿ.

So

d_L(a, b) = W_L(a − b) = W_L((a − c) + (c − b)) ≤ W_L(a − c) + W_L(c − b) = d_L(a, c) + d_L(c, b).

The Lemma holds. □
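Definition (2.14) and its componentwise extension translate directly into code; here is a minimal Python sketch (the function names are ours).

```python
def lee_weight(a, m):
    """W_L(a) = sum of min(a_i mod m, m - a_i mod m), as in (2.14)."""
    return sum(min(ai % m, m - ai % m) for ai in a)

def lee_distance(a, b, m):
    """d_L(a, b) = W_L(a - b), computed componentwise mod m."""
    return lee_weight([(x - y) % m for x, y in zip(a, b)], m)

# Over Z_4: W_L(0) = 0, W_L(1) = W_L(3) = 1, W_L(2) = 2.
print(lee_weight([1, 2, 3, 0], 4))       # 4
print(lee_distance([1, 2], [3, 3], 4))   # W_L((2, 3)) = 3
```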

Next let’s make m = 4, the alphabet is Z4 , On a 4-ary code, we discuss Lee weight
and Lee distance. Suppose a ∈ Zn4 , 0 ≤ i ≤ 3, let

n i (a) = #{ j|1 ≤ j ≤ n, a = a1 a2 . . . an , a j = i}. (2.17)

n i (a) is the number of characters equal to i in codeword a. C ⊂ Zn4 is a 4-ary code,


the symmetric polynomial and Lee weight polynomial of C are defined as

sweC (w, x, y) = w n 0 (c) x n 1 (c)+n 3 (c) y n 2 (c) (2.18)
c∈C

and 
LeeC (x, y) = x 2n−W L (c) y W L (c) . (2.19)
c∈C

Lemma 2.12 Let C ⊂ Z₄ⁿ be a 4-ary code with codeword length n. Then the symmetric weight polynomial and the Lee weight polynomial of C satisfy

Lee_C(x, y) = swe_C(x², xy, y²).

Proof For a ∈ Z₄ⁿ, by definition,

n₀(a) + n₁(a) + n₂(a) + n₃(a) = n.

Let a = a₁a₂…aₙ; then

W_L(a) = Σ_{i=1}^{n} W_L(aᵢ) = n₁(a) + 2n₂(a) + n₃(a).

So

Lee_C(x, y) = Σ_{c∈C} x^{2n₀(c)} (xy)^{n₁(c)+n₃(c)} y^{2n₂(c)} = swe_C(x², xy, y²).

The Lemma holds. □


By using Lee weight and Lee distance, we can extend the MacWilliams theorem
to Z4 codes, we have
Theorem 2.5 Let C ⊂ Zn4 be a linear code and C ⊥ be its dual code, LeeC (x, y) be
a Lee weighted polynomial of C, then

1
LeeC ⊥ (x, y) = LeeC (x + y, x − y).
|C|

Proof Let ψ be a nontrivial character of Z₄, say

ψ(j) = (√−1)^j, j = 0, 1, 2, 3.

Let f(u) be a function defined on Z₄ⁿ, and let

g(c) = Σ_{u∈Z₄ⁿ} f(u) ψ(⟨c, u⟩).    (2.20)

As in Theorem 2.4, we have

Σ_{c∈C} g(c) = |C| Σ_{u∈C⊥} f(u).    (2.21)

Take

f(u) = w^{n₀(u)} x^{n₁(u)+n₃(u)} y^{n₂(u)}, u ∈ Z₄ⁿ.

Write u = u₁u₂…uₙ ∈ Z₄ⁿ; then for each i, 0 ≤ i ≤ 3, we have

nᵢ(u) = nᵢ(u₁) + nᵢ(u₂) + ⋯ + nᵢ(uₙ),

thus

f(u) = Π_{i=1}^{n} f(uᵢ).

Let c = c₁c₂…cₙ ∈ Z₄ⁿ; by (2.20),

g(c) = Π_{i=1}^{n} ( Σ_{u∈Z₄} f(u) ψ(cᵢu) ).    (2.22)

Now we calculate the inner sum on the right side of Eq. (2.22):

Σ_{u∈Z₄} f(u) ψ(cᵢu) = w + 2x + y, if cᵢ = 0;
= w − y, if cᵢ = 1 or 3;
= w − 2x + y, if cᵢ = 2.

By (2.22),

g(c) = (w + 2x + y)^{n₀(c)} (w − y)^{n₁(c)+n₃(c)} (w − 2x + y)^{n₂(c)}.

So

Σ_{c∈C} g(c) = swe_C(w + 2x + y, w − y, w − 2x + y),

and by (2.21),

|C| swe_{C⊥}(w, x, y) = swe_C(w + 2x + y, w − y, w − 2x + y).

By Lemma 2.12, substituting w = x², x = xy, y = y² and noting that x² + 2xy + y² = (x + y)², x² − y² = (x + y)(x − y), x² − 2xy + y² = (x − y)², we obtain

Lee_{C⊥}(x, y) = (1/|C|) Lee_C(x + y, x − y).

We have completed the proof. □



2.4 Some Typical Codes

2.4.1 Hadamard Codes

In order to introduce Hadamard codes, we first define a Hadamard matrix of order n. Let H = (aᵢⱼ) with aᵢⱼ = ±1 and

H H^T = n Iₙ;

then H is called a Hadamard matrix of order n. It is easy to verify that the following H₂ is a Hadamard matrix of order 2:

H₂ =
[ 1  1 ]
[ 1 −1 ].

In order to obtain higher-order Hadamard matrices, a useful tool is the so-called Kronecker product. Let A = (aᵢⱼ)_{m×m}, B = (bᵢⱼ)_{n×n}; then the Kronecker product A ⊗ B of A and B is defined as the block matrix

A ⊗ B = (aᵢⱼ B), 1 ≤ i, j ≤ m.

Obviously, A ⊗ B is a square matrix of order mn. The following result is easy to prove.

Lemma 2.13 Let A be a Hadamard matrix of order m and B a Hadamard matrix of order n; then A ⊗ B is a Hadamard matrix of order mn.

Proof Let A = (aᵢⱼ)_{m×m}, B = (bᵢⱼ)_{n×n}, H = A ⊗ B. Then

H H^T = (aᵢⱼ B)(aⱼᵢ B^T) = (cᵢⱼ B B^T), where cᵢⱼ = Σ_{k=1}^{m} aᵢₖaⱼₖ.

Since A A^T = m Iₘ, we have cᵢᵢ = m and cᵢⱼ = 0 for i ≠ j; and since B B^T = n Iₙ, the diagonal blocks equal mn Iₙ and the off-diagonal blocks vanish, so

H H^T = mn I_{mn}.

The Lemma holds. □

Since H₂ is a Hadamard matrix of order 2,

H₂ ⊗ H₂ = H₂^{⊗2}, H₂ ⊗ H₂ ⊗ ⋯ ⊗ H₂ = H₂^{⊗n}

are Hadamard matrices of order 4 and of order 2ⁿ, respectively.
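In code, the Kronecker construction is a one-liner; the following Python sketch (using numpy's kron) builds the order-32 matrix H₂^{⊗5} and verifies the defining identity HH^T = nIₙ. Its rows and their negatives, after the ±1 to {1, 0} substitution described in the next paragraph, underlie the (32, 64, 16) Hadamard code mentioned below.

```python
import numpy as np

H2 = np.array([[1, 1],
               [1, -1]])

H = H2
for _ in range(4):      # H_2 -> H_4 -> H_8 -> H_16 -> H_32
    H = np.kron(H, H2)

n = H.shape[0]
print(n, np.array_equal(H @ H.T, n * np.eye(n, dtype=int)))   # 32 True
```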


Let n be an even number and Hn be a Hadamard matrix of order n, take
α1 , α2 , . . . , αn as n row vectors, i.e.,
⎡ ⎤ ⎡ ⎤
α1 −α1
⎢ α2 ⎥ ⎢ −α2 ⎥
⎢ ⎥ ⎢ ⎥
Hn = ⎢ . ⎥ , −Hn = ⎢ . ⎥ .
⎣ .. ⎦ ⎣ .. ⎦
αn −αn

We get 2n row vectors {±α1 , ±α2 , . . . , ±αn }, for each row vectors ±αi , we
replace the component −1 with 0, the row vector αi so permuted is denoted as
αi , −αi denote as −αi , so ±αi forms a vector of Fn2 , denote as

C = {±α1 , ±α2 , . . . , ±αn } ⊂ Fn2 .

C is called a Hadamard code.


Theorem 2.6 The minimum distance of Hadamard code C of length n (n is an even
number) is d = n2 .
2.4 Some Typical Codes 57
⎤ ⎡
α1
⎢ α2 ⎥
⎢ ⎥
Proof Let Hn be a Hadamard matrix of order n, Hn = ⎢ . ⎥ , Each αi is a row
⎣ .. ⎦
αn
σ
vector of Hn , substitute αi −→ αi , such that each αi ⊂ become a binary codeword.
Fn2
We see that this kind of permutation does not change the corresponding Hamming
distance, that is 
d(αi , α j ) = d(α i , α j )
d(−αi , −α j ) = d(α i , α j ),

where i = j. Let us prove that the minimum distance of C is n2 , let a = a1 a2 . . . an ,


b = b1 b2 . . . bn are two different row vectors of Hadamard matrix Hn , because of


n
ab = 0 ⇒ ai bi = 0.
i=1

And ai = ±1, bi = ±1. Let the number of the same character be d1 and the number
of different characters be d = d(a, b), so there are d1 − d = 0, that is d1 = d, but
d1 + d = n, so d = n2 . The Lemma holds.

Corollary 2.6 Let C = {±ᾱ₁, ±ᾱ₂, …, ±ᾱₙ} be a Hadamard code; then the Hamming distance between any two codewords ±ᾱᵢ, ±ᾱⱼ of C with i ≠ j is n/2.

Proof {±α₁, ±α₂, …, ±αₙ} are, up to sign, the row vectors of a Hadamard matrix. Let a = ±αᵢ, b = ±αⱼ (i ≠ j); then

a b^T = ± Σ_{k=1}^{n} aₖbₖ = 0 ⇒ d(a, b) = n/2. □

A code of length n, with M codewords and minimum distance d, is denoted as (n, M, d); in contrast with the linear code notation [n, k] or [n, k, d], a Hadamard code is

C = (n, 2n, n/2).

When n = 8, d = 4, this is an extended Hamming code. When n = 32, (32, 64, 16) is the code used by the U.S. Mars probe in 1969 to transmit pictures taken on Mars.

2.4.2 Binary Golay Codes

In the theory and application of channel coding, the binary Golay code is among the most famous. In order to introduce the Golay code G₂₃ completely, we first introduce the concept of a t-(m, k, λ) design.

Let S be a set of m elements, that is |S| = m; the elements of S are called points. Let R be a set of k-element subsets of S, |R| = M, i.e.,

R = {B₁, B₂, …, B_M}, Bᵢ ⊂ S, |Bᵢ| = k, 1 ≤ i ≤ M.

The elements Bᵢ of R are called blocks.

Definition 2.10 (S, R) is called a t-(m, k, λ) design if for any T ⊂ S with |T| = t there are exactly λ blocks B in R such that T ⊂ B. If (S, R) is a t-(m, k, λ) design, we write (S, R) = t-(m, k, λ). If λ = 1, then a t-(m, k, 1) design is called a Steiner system.
For a t-(m, k, λ) design (S, R), we introduce its occurrence matrix. For any a ∈ S, the characteristic function χᵢ(a) is defined as

χᵢ(a) = 1, if a ∈ Bᵢ; χᵢ(a) = 0, if a ∉ Bᵢ.

Write S = {a₁, a₂, …, aₘ}, R = {B₁, B₂, …, B_M}, |R| = M. The m × M matrix

A = (χⱼ(aᵢ))_{m×M}

is called the occurrence matrix of the t-(m, k, λ) design.


Let’s now consider a concrete example, 2 − (11, 6, 3) design. Where there are
11 points in S and 6 points in R, and any two points in S have exactly three blocks
containing it.

Lemma 2.14 2 − (11, 6, 3) design is the only definite one, that is to say, let S =
{a1 , a2 , . . . , a11 }, then there are 11 blocks in R,

R = {B1 , B2 , . . . , B11 }.

And for any a ∈ S, exactly 6 blocks B j in R contain a.

Proof Suppose ∀ a ∈ S, there is exactly l B j containing it, because there are exactly
3 blocks in any 2 points, so there are 6l − l = 10 × 3. Then l = 6. In addition,
suppose |R| = M, because each point has exactly six blocks containing it, there is
6 × M = 11 × 6, we can get M = 11.

By Lemma 2.14, the occurrence matrix N of the 2-(11, 6, 3) design is an 11 × 11 square matrix

N = (χⱼ(aᵢ))_{11×11},

and every row of N has exactly six 1's and five 0's, and every column of N has exactly six 1's and five 0's.

Lemma 2.15 Let N be the occurrence matrix of the 2-(11, 6, 3) design; then

N N^T = 3I₁₁ + 3J₁₁,

where J₁₁ is the 11 × 11 matrix all of whose entries are 1. If N is regarded as a square matrix of order 11 over F₂, then

N N^T = I₁₁ + J₁₁.

Further, rank(N) = 10 over F₂, and the solutions of the linear system X N = 0 are exactly the two repetition codewords 0 and 1 (0 = (0, 0, …, 0), 1 = (1, 1, …, 1)) in F₂¹¹.

Proof Let N N^T = (bᵢⱼ)_{11×11}; by definition,

bᵢⱼ = Σ_{k=1}^{11} χₖ(aᵢ)χₖ(aⱼ).

When i ≠ j, bᵢⱼ = 3; when i = j, bᵢᵢ = 6. So we have

N N^T = 3I₁₁ + 3J₁₁ ≡ I₁₁ + J₁₁ (mod 2).

Regard N (mod 2), still written N, as a square matrix of order 11 over F₂. On the one hand, rank(N) ≥ rank(N N^T) = rank(I₁₁ + J₁₁) = 10; on the other hand, since each column vector of N has exactly six 1's and five 0's,

(1, 1, …, 1) N = (0, 0, …, 0) ∈ F₂¹¹,

so rank(N) ≤ 10. Hence rank(N) = 10, and the solution space of X N = 0 is a one-dimensional linear subspace of F₂¹¹. So there are exactly two solutions of X N = 0:

x = (0, 0, …, 0), x = (1, 1, …, 1).

The Lemma holds. □



Next, let’s construct a matrix G of order 12 × 24, G = (I12 , P), where


⎛ ⎞ ⎡ ⎤
0 1 ··· 1 α1
⎜1 ⎟ ⎢ α2 ⎥
⎜ ⎟ ⎢ ⎥
P = ⎜. N ⎟ , and G = ⎢ .. ⎥.
⎝ .. ⎠ ⎣ . ⎦
1 α12

where αi ∈ F24
2 is the 12 row vector of G. Obviously we have a weight function

w(α1 ) = 12, w(αi ) = 8, 2 ≤ i ≤ 12. (2.23)


⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥
Lemma 2.16 Let G = ⎢ .. ⎥, then {α1 , α2 , . . . , α12 } ⊂ F24
2 is a linear indepen-
⎣ . ⎦
α12
dent group, and the weight of any nonzero linear combination is at least 8, that
is
w(a1 α1 + a2 α2 + · · · + a12 α12 ) ≥ 8, ai not all zero. (2.24)

Proof Let’s prove that {αi }i=1


12
is a set of vectors orthogonal to each other, that is,
the inner product is < αi , α j >= αi α j = 0. Obviously we have

< α1 , α j >= α1 α j = 6 ≡ 0(mod 2), j = 1.

If i = 1, j = 1, i = j, then


11
< αi , α j >= 1 + χk (ai )χk (a j ) = 4 ≡ 0(mod 2).
k=1

So < αi , α j >= 0, when i = j, that is {α1 , α2 , . . . , α12 } is a linear independent


2 . If ai ∈ F2 , not all zero, take a = a1 a2 . . . a12 , let’s prove (2.24) by
group of F24
induction of w(a). If w(a) = 1, the proposition holds by (2.23). When w(a) ≥ 8,
the proposition is ordinary, for 2 ≤ w(a) ≤ 7, we can still prove

w(a1 α1 + a2 α2 + · · · + a12 α12 ) ≥ 8.

So the Lemma holds.

Definition 2.11 The linear code [24, 12] generated in F₂²⁴ by the row vectors {α₁, α₂, …, α₁₂} of G is called the Golay code, denoted as G₂₄. Removing the last component of each αᵢ, αᵢ → ᾱᵢ, gives ᾱᵢ ∈ F₂²³; the linear code [23, 12] generated by {ᾱ₁, ᾱ₂, …, ᾱ₁₂} in F₂²³ is called the Golay code G₂₃.

Theorem 2.7 The Golay code $G_{23}$ is a perfect code $[23, 12]$ with minimal distance $d = 7$.

Proof Because the minimal distance of a linear code is its minimal weight, by Lemma 2.16,
$$w(a_1\overline{\alpha}_1 + a_2\overline{\alpha}_2 + \cdots + a_{12}\overline{\alpha}_{12}) \ge w(a_1\alpha_1 + a_2\alpha_2 + \cdots + a_{12}\alpha_{12}) - 1 \ge 7.$$
On the one hand, $w(\alpha_i) = 8$ for all $i \ne 1$, and for those $\alpha_i$ whose last component equals 1 (such rows exist, since the last column of $N$ contains six 1's) we get $w(\overline{\alpha}_i) = w(\alpha_i) - 1 = 7$. So the minimum distance of $G_{23}$ is $d = 7$. On the other hand, we note that
$$|G_{23}| \sum_{i=0}^{3} \binom{23}{i} = 2^{12} \sum_{i=0}^{3} \binom{23}{i} = 2^{12} \cdot 2^{11} = 2^{23}.$$
By the sphere-packing condition of Theorem 2.1, $G_{23}$ is a perfect code; the Theorem holds.
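The whole construction is small enough to verify by machine. Below is a short Python sketch (ours, not from the book) that builds one concrete $2\text{-}(11,6,3)$ design — taking the blocks to be the complements of the translates of the quadratic residues mod 11, a standard realization; the book does not fix a particular design, so this concrete choice is our assumption — then assembles $G = (I_{12}, P)$ as above and checks the minimum weights of $G_{24}$ and $G_{23}$ by brute force over all $2^{12}$ codewords.

```python
from itertools import product

Q = {pow(x, 2, 11) for x in range(1, 11)}              # quadratic residues mod 11
blocks = [{a for a in range(11) if (a - j) % 11 not in Q} for j in range(11)]
assert all(len(B) == 6 for B in blocks)                # blocks of the 2-(11,6,3) design

# incidence matrix N: N[i][j] = 1 iff point a_i lies in block B_j
N = [[1 if i in blocks[j] else 0 for j in range(11)] for i in range(11)]

# P is 12x12: first row (0,1,...,1), first column (0,1,...,1)^T, the rest is N
P = [[0] + [1] * 11] + [[1] + N[i] for i in range(11)]
G = [[1 if r == c else 0 for c in range(12)] + P[r] for r in range(12)]   # (I_12, P)

def min_weight(rows, drop_last=False):
    best = None
    for coeffs in product([0, 1], repeat=12):
        if any(coeffs):
            cw = [sum(c * r[k] for c, r in zip(coeffs, rows)) % 2 for k in range(24)]
            if drop_last:
                cw = cw[:-1]
            w = sum(cw)
            best = w if best is None else min(best, w)
    return best

print(min_weight(G))                  # 8: minimum weight of G_24
print(min_weight(G, drop_last=True))  # 7: minimum weight of G_23
```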

2.4.3 3-Ary Golay Code

In order to introduce 3-ary Golay codes, we first define a Paley matrix of order $q$. Let $q \ge 3$ be an odd prime power, and define the quadratic (second-order) real-valued multiplicative character $\chi(a)$ on the finite field $\mathbb{F}_q$ as
$$\chi(a) = \begin{cases} 0, & \text{if } a = 0; \\ 1, & \text{if } a \in (\mathbb{F}_q^*)^2; \\ -1, & \text{if } a \notin (\mathbb{F}_q^*)^2. \end{cases}$$
Obviously, $\chi$ is a character of $\mathbb{F}_q^*$. Because $\mathbb{F}_q^*$ is a cyclic multiplicative group of order $q - 1$, we have
$$\chi(-1) = (-1)^{\frac{q-1}{2}} = \begin{cases} 1, & \text{if } q \equiv 1 \pmod 4; \\ -1, & \text{if } q \equiv 3 \pmod 4. \end{cases}$$

Write $\mathbb{F}_q = \{a_0, a_1, \dots, a_{q-1}\}$, where $a_0 = 0$; then the Paley matrix $S_q$ of order $q$ is defined as
$$S_q = (\chi(a_i - a_j))_{q \times q} = \begin{pmatrix} 0 & \chi(-a_1) & \chi(-a_2) & \cdots & \chi(-a_{q-1}) \\ \chi(a_1) & 0 & \chi(a_1 - a_2) & \cdots & \chi(a_1 - a_{q-1}) \\ \chi(a_2) & \chi(a_2 - a_1) & 0 & \cdots & \chi(a_2 - a_{q-1}) \\ \vdots & \vdots & \vdots & & \vdots \\ \chi(a_{q-1}) & \chi(a_{q-1} - a_1) & \cdots & \cdots & 0 \end{pmatrix}.$$

Lemma 2.17 The Paley matrix $S_q$ of order $q$ has the following properties:
(i) $S_q J_q = J_q S_q = 0$.
(ii) $S_q S_q^T = q I_q - J_q$.
(iii) $S_q^T = (-1)^{\frac{q-1}{2}} S_q$.
Here, $I_q$ is the unit matrix of order $q$ and $J_q$ is the square matrix of order $q$ with all elements 1.

Proof Let $S_q J_q = (b_{ij})_{q \times q}$; then for all $0 \le i, j \le q - 1$,
$$b_{ij} = \sum_{k=0}^{q-1} \chi(a_i - a_k) = \sum_{c \in \mathbb{F}_q} \chi(c) = 0,$$
and similarly for $J_q S_q$, so (i) holds. To prove (ii), let $S_q S_q^T = (c_{ij})_{q \times q}$; then
$$c_{ij} = \sum_{k=0}^{q-1} \chi(a_i - a_k)\chi(a_j - a_k).$$
When $i = j$ this sum is $q - 1$. When $i \ne j$, substituting $u = (a_j - a_k)(a_i - a_k)^{-1}$ (as $a_k$ runs over $\mathbb{F}_q \setminus \{a_i\}$, $u$ runs over $\mathbb{F}_q \setminus \{1\}$) gives $c_{ij} = \sum_{u \ne 1} \chi(u) = -\chi(1) = -1$. Therefore
$$c_{ij} = \begin{cases} q - 1, & \text{if } i = j; \\ -1, & \text{if } i \ne j, \end{cases}$$
so (ii) holds. To prove (iii), notice that $\chi(-1) = (-1)^{\frac{q-1}{2}}$, so
$$S_q^T = \chi(-1) S_q = (-1)^{\frac{q-1}{2}} S_q;$$
the Lemma holds.

Let $q = 5$ and consider the Paley matrix $S_5$ of order 5; it has been calculated that
$$S_5 = \begin{pmatrix} 0 & 1 & -1 & -1 & 1 \\ 1 & 0 & 1 & -1 & -1 \\ -1 & 1 & 0 & 1 & -1 \\ -1 & -1 & 1 & 0 & 1 \\ 1 & -1 & -1 & 1 & 0 \end{pmatrix}.$$
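As a quick sanity check, here is a short Python sketch (ours, not the book's) that builds $S_q$ for a prime $q$ directly from the definition and verifies the three properties of Lemma 2.17; for $q = 5$ it reproduces the matrix above.

```python
# Build the Paley matrix S_q for a prime q and check Lemma 2.17 numerically.
def chi(a, q):
    a %= q
    if a == 0:
        return 0
    return 1 if pow(a, (q - 1) // 2, q) == 1 else -1   # Euler's criterion

def paley(q):
    return [[chi(i - j, q) for j in range(q)] for i in range(q)]

q = 5
S = paley(q)
# (i) S_q J_q = J_q S_q = 0: every row sum and column sum is 0
assert all(sum(row) == 0 for row in S)
assert all(sum(S[i][j] for i in range(q)) == 0 for j in range(q))
# (ii) S_q S_q^T = q I_q - J_q
for i in range(q):
    for j in range(q):
        dot = sum(S[i][k] * S[j][k] for k in range(q))
        assert dot == (q - 1 if i == j else -1)
# (iii) S_q^T = (-1)^((q-1)/2) S_q
sign = (-1) ** ((q - 1) // 2)
assert all(S[j][i] == sign * S[i][j] for i in range(q) for j in range(q))
print(S)   # for q = 5 this is the matrix S_5 displayed above
```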

In $\mathbb{F}_3^{11}$, we consider the linear code $C$ whose generator matrix is
$$G = \begin{pmatrix} I_6 & \begin{matrix} 1 \; 1 \; 1 \; 1 \; 1 \\ S_5 \end{matrix} \end{pmatrix},$$
so $C$ is a six-dimensional linear subspace of $\mathbb{F}_3^{11}$, that is $C = [11, 6]$. This code is called the 3-ary Golay code. In order to further discuss the 3-ary Golay code $[11, 6]$, we discuss the concept of extended codes of linear codes.
If $C \subset \mathbb{F}_q^n$ is a $q$-ary linear code of length $n$, the extended code $\overline{C}$ of $C$ is defined as
$$\overline{C} = \left\{ (c_1, c_2, \dots, c_{n+1}) \,\middle|\, (c_1, c_2, \dots, c_n) \in C, \text{ and } \sum_{i=1}^{n+1} c_i = 0 \right\}.$$
Obviously, $\overline{C} \subset \mathbb{F}_q^{n+1}$ is a linear code.


Lemma 2.18 If $C \subset \mathbb{F}_q^n$ is a linear code with generator matrix $G$ and check matrix $H$, then the length of the extended code $\overline{C} \subset \mathbb{F}_q^{n+1}$ is $n + 1$, and its generator matrix $\overline{G}$ and check matrix $\overline{H}$ are
$$\overline{G} = (G, \beta), \quad \text{and} \quad \overline{H} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ & & & 0 \\ & H & & \vdots \\ & & & 0 \end{pmatrix},$$
respectively, where $\beta$ is a column vector chosen so that the sum of all the column vectors of $(G, \beta)$ is 0. Further, let $q = 2$; if the minimum distance $d$ of $C$ is odd, then the minimum distance of $\overline{C}$ is $d + 1$.

Proof The generator matrix and check matrix of $\overline{C}$ follow directly from the definition. For the last claim, let $c = c_1c_2 \dots c_n \in C$ be a codeword of minimal weight $w(c) = d$. Because $q = 2$, exactly $d$ of the $c_i$ equal 1, and since $d$ is odd, $\sum_{i=1}^{n} c_i = 1 \ne 0$, so $c_{n+1} = 1$ and
$$c^* = c_1c_2 \dots c_n c_{n+1} \in \overline{C}, \quad w(c^*) = d + 1.$$
Every codeword of $C$ of even weight $w \ge d + 1$ extends with $c_{n+1} = 0$ and keeps weight $w$, and every codeword of odd weight $w \ge d$ extends to weight $w + 1$, so $d + 1$ is the minimal weight in $\overline{C}$. The lemma is proved.

Consider the extended code $\overline{C} = [12, 6]$ of the 3-ary Golay code $C = [11, 6]$; its generator matrix is
$$\overline{G} = \begin{pmatrix} I_6 & \begin{matrix} 1 \; 1 \; 1 \; 1 \; 1 \\ S_5 \end{matrix} & \begin{matrix} 0 \\ -1 \\ \vdots \\ -1 \end{matrix} \end{pmatrix}. \tag{2.25}$$
Note that the sum of the components of each row vector of $S_5$ is 0, the inner product of two different row vectors of $S_5$ is $-1$, and the inner product of a row vector with itself is $4 \equiv 1 \pmod 3$, so
$$\overline{G} \cdot \overline{G}^T = 0.$$
Therefore, the extended code $\overline{C}$ is a self-dual code, that is $(\overline{C})^{\perp} = \overline{C}$.



Theorem 2.8 The 3-ary Golay code $C$ is a perfect linear code $[11, 6]$; its minimum distance is 5, so it is a 2-error-correcting code.

Proof The weight of each row vector of $\overline{G}$ is 6, and a direct calculation shows that every nonzero linear combination of the row vectors of $\overline{G}$ has weight at least 6, so the minimum distance of the extended code $\overline{C}$ is 6 $\Rightarrow$ the minimum distance of $C$ is 5. So the disjoint radius of $C$ is $\rho_1 = 2$. And because
$$|C| = 3^6, \qquad \sum_{i=0}^{2} \binom{11}{i} 2^i = 1 + 22 + 220 = 3^5,$$
the sphere-packing condition is satisfied:
$$|C| \sum_{i=0}^{2} \binom{11}{i} 2^i = 3^{11}.$$
Thus by Theorem 2.1, $C$ is a perfect code; the Theorem holds.
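These numbers are small enough to confirm by machine. The following Python sketch (ours, with the generator matrix hard-coded from $S_5$ above) enumerates all $3^6 = 729$ codewords of $C = [11, 6]$ and checks that the minimum weight is 5 and that the sphere-packing bound is met with equality.

```python
# Brute-force check that the ternary Golay code [11,6] has minimum distance 5
# and is perfect. Rows of G = (I_6 | A): A has first row (1,1,1,1,1), then S_5.
from itertools import product
from math import comb

S5 = [[0, 1, -1, -1, 1],
      [1, 0, 1, -1, -1],
      [-1, 1, 0, 1, -1],
      [-1, -1, 1, 0, 1],
      [1, -1, -1, 1, 0]]
A = [[1, 1, 1, 1, 1]] + S5
G = [[1 if r == c else 0 for c in range(6)] + [a % 3 for a in A[r]] for r in range(6)]

weights = []
for coeffs in product(range(3), repeat=6):
    if any(coeffs):
        cw = [sum(c * g[k] for c, g in zip(coeffs, G)) % 3 for k in range(11)]
        weights.append(sum(1 for x in cw if x != 0))

print(min(weights))                                               # 5
print(3**6 * sum(comb(11, i) * 2**i for i in range(3)) == 3**11)  # True: perfect
```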

Remark 2.1 It is worth noting that J. H. van Lint in 1971 (see reference 2 [24]) and A. Tietäväinen in 1973 (see reference 2 [43]) independently proved that the only nontrivial perfect codes over finite fields with minimal distance greater than 3 are the 2-ary Golay code $G_{23}$ and the 3-ary Golay code $[11, 6]$.

2.4.4 Reed–Muller Codes

Reed and Muller proposed a class of 2-ary linear codes based on finite geometry in
1954. In order to discuss the structure and properties of these codes, we first prove
some results in number theory.
Lemma 2.19 Let $p$ be a prime and $k, n$ be two nonnegative integers whose $p$-adic expansions are
$$n = \sum_{i=0}^{l} n_i p^i, \qquad k = \sum_{i=0}^{l} k_i p^i.$$
Then
$$\binom{n}{k} \equiv \prod_{i=0}^{l} \binom{n_i}{k_i} \pmod p, \quad \text{where } \binom{n_i}{k_i} = 0 \text{ if } k_i > n_i.$$

Proof If $k = 0$, then all $k_i = 0$, and the above formula holds. If $n = k$, then $n_i = k_i$, and the formula also holds. So we may assume $1 \le k < n$. Note the polynomial congruence
$$(1 + x)^p \equiv 1 + x^p \pmod p,$$
so we have
$$(1 + x)^n = (1 + x)^{\sum_{i=0}^{l} n_i p^i} \equiv \prod_{i=0}^{l} (1 + x^{p^i})^{n_i} \pmod p.$$
Comparing the coefficients of the $x^k$ terms on both sides: if there is a $k_j > n_j$, then the $x^k$ term does not appear on the right side, which means the coefficient of $x^k$ on the left side satisfies
$$\binom{n}{k} \equiv 0 \pmod p.$$
If $k_i \le n_i$ for all $0 \le i \le l$, then
$$\binom{n}{k} \equiv \prod_{i=0}^{l} \binom{n_i}{k_i} \pmod p.$$
We complete the proof of the Lemma.
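Lemma 2.19 is Lucas' theorem; a few lines of Python (a sketch of ours) make it easy to spot-check on random inputs.

```python
# Spot-check Lucas' theorem: C(n,k) mod p equals the product of the
# digitwise binomials C(n_i, k_i) taken over the base-p digits.
from math import comb
from random import randint, choice

def lucas(n, k, p):
    r = 1
    while n or k:
        n, ni = divmod(n, p)
        k, ki = divmod(k, p)
        r = r * comb(ni, ki) % p   # comb(ni, ki) = 0 when ki > ni
    return r

for _ in range(1000):
    p = choice([2, 3, 5, 7])
    n = randint(0, 5000)
    k = randint(0, n)
    assert comb(n, k) % p == lucas(n, k, p)
print("Lucas' theorem verified on random samples")
```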

Massey first defined the concept of polynomial weight in 1973: on a finite field of characteristic 2 ($q = 2^r$), the Hamming weight of a polynomial $f(x) \in \mathbb{F}_q[x]$ is defined as
$$w(f(x)) = \text{the number of nonzero coefficients of } f(x).$$

Lemma 2.20 (Massey, 1973) Let $f(x) = \sum_{i=0}^{l} b_i (x + c)^i \in \mathbb{F}_q[x]$ with $b_l \ne 0$, and let $i_0$ be the smallest subscript $i$ with $b_i \ne 0$; then
$$w(f(x)) \ge w((x + c)^{i_0}).$$

Proof If $l = 0$, then $i_0 = 0$ and the lemma holds. Suppose the lemma holds for $l < 2^n$; we consider $2^n \le l < 2^{n+1}$ and write $f(x)$ as
$$f(x) = \sum_{i=0}^{2^n - 1} b_i(x + c)^i + \sum_{i=2^n}^{l} b_i(x + c)^i = f_1(x) + (x + c)^{2^n} f_2(x) = f_1(x) + c^{2^n} f_2(x) + x^{2^n} f_2(x),$$
where $\deg f_1(x) < 2^n$, $\deg f_2(x) < 2^n$ (using $(x + c)^{2^n} = x^{2^n} + c^{2^n}$ in characteristic 2). There are two situations to discuss:
(i) If $f_1(x) = 0$, then $w(f(x)) = 2w(f_2(x))$. Because $i_0 \ge 2^n$,
$$w((x + c)^{i_0}) = w((x^{2^n} + c^{2^n})(x + c)^{i_0 - 2^n}) = 2w((x + c)^{i_0 - 2^n}).$$
From the inductive hypothesis,
$$w(f_2(x)) \ge w((x + c)^{i_0 - 2^n}),$$
so
$$w(f(x)) = 2w(f_2(x)) \ge 2w((x + c)^{i_0 - 2^n}) = w((x + c)^{i_0}).$$
(ii) If $f_1(x) \ne 0$, let $i_1$ be the smallest subscript of $f_1(x)$ and $i_2$ the smallest subscript of $f_2(x)$. If a nonzero term of $f_1(x)$ is cancelled by the corresponding term of $c^{2^n} f_2(x)$, then $x^{2^n} f_2(x)$ contributes a corresponding nonzero term, so we always have
$$w(f(x)) \ge w(f_1(x)), \qquad w(f(x)) \ge w(f_2(x)).$$
If $i_1 < i_2$, then $i_0 = i_1$, and from the inductive hypothesis,
$$w(f(x)) \ge w(f_1(x)) \ge w((x + c)^{i_1}) = w((x + c)^{i_0}).$$
Similarly, if $i_2 < i_1$, then $i_0 = i_2$, and
$$w(f(x)) \ge w(f_2(x)) \ge w((x + c)^{i_2}) = w((x + c)^{i_0}).$$
If $i_1 = i_2$, the argument can always be reduced to the case $i_1 = i_0$, so the Lemma always holds.
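A quick numerical check of Lemma 2.20 over $\mathbb{F}_2$ (a sketch of ours, taking $c = 1$): expand random $f(x) = \sum b_i (x+1)^i$ into standard form and compare $w(f)$ with $w((x+1)^{i_0})$.

```python
# Check Massey's bound over F_2: if f(x) = sum b_i (x+1)^i with smallest
# nonzero index i0, then w(f) >= w((x+1)^{i0}).
from random import randint

def mul(a, b):                      # multiply coefficient lists over F_2
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] ^= x & y
    return r

def xp1_pow(i):                     # (x+1)^i as a coefficient list over F_2
    r = [1]
    for _ in range(i):
        r = mul(r, [1, 1])
    return r

for _ in range(500):
    b = [randint(0, 1) for _ in range(randint(1, 40))]
    if not any(b):
        continue
    i0 = next(i for i, bi in enumerate(b) if bi)
    f = [0]
    for i, bi in enumerate(b):
        if bi:
            t = xp1_pow(i)
            f = [x ^ y for x, y in zip(f + [0] * len(t), t + [0] * len(f))]
    assert sum(f) >= sum(xp1_pow(i0))
print("Massey's weight bound verified on random samples")
```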

Next, we use Massey's method to construct Reed–Muller codes. Let $m \ge 1$ and let $\mathbb{F}_2^m$ be the $m$-dimensional affine space, denoted $AG(m, 2)$; $\alpha \in AG(m, 2)$ is a point in affine space, written as an $m$-dimensional column vector. Let $\{u_0, u_1, \dots, u_{m-1}\}$ be the standard basis of $\mathbb{F}_2^m$, that is
$$\alpha = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{m-1} \end{pmatrix}, \quad u_0 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \ \dots, \ u_{m-1} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix},$$
where $a_i = 0$ or 1. Let's establish a 1-1 correspondence between the integer set $\{0 \le j < 2^m\}$ and the points of $AG(m, 2)$. Let $0 \le j < 2^m$; then
$$j = \sum_{i=0}^{m-1} a_{ij} 2^i, \quad a_{ij} \in \mathbb{F}_2.$$
We define
$$x_j = \sum_{i=0}^{m-1} a_{ij} u_i = \begin{pmatrix} a_{0j} \\ a_{1j} \\ \vdots \\ a_{(m-1)j} \end{pmatrix} \in \mathbb{F}_2^m.$$
Because $j_1 \ne j_2$ implies $x_{j_1} \ne x_{j_2}$, the set $\{x_j \mid 0 \le j < 2^m\}$ gives all the points of $\mathbb{F}_2^m$. Write $n = 2^m$ and consider the matrix
$$E = [x_0, x_1, \dots, x_{n-1}] = \begin{pmatrix} a_{00} & a_{01} & \cdots & a_{0(n-1)} \\ a_{10} & a_{11} & \cdots & a_{1(n-1)} \\ \vdots & \vdots & & \vdots \\ a_{(m-1)0} & a_{(m-1)1} & \cdots & a_{(m-1)(n-1)} \end{pmatrix}_{m \times n}.$$
Each row vector $\alpha_i = (a_{i0}, a_{i1}, \dots, a_{i(n-1)})$ $(0 \le i \le m-1)$ of $E$ is a vector of $\mathbb{F}_2^n$, so $E$ can be written as
$$E = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{m-1} \end{pmatrix} = (a_{ij})_{m \times n} \quad (0 \le i < m, \ 0 \le j < 2^m = n).$$

For each $i$, $0 \le i < m$, define a linear subspace of $\mathbb{F}_2^m$,
$$B_i = \{x_j \in \mathbb{F}_2^m \mid a_{ij} = 0\}.$$
Obviously, $B_i$ is a linear subspace, and the additive cosets of $B_i$ are called $(m-1)$-dimensional flats in $\mathbb{F}_2^m$. We consider $A_i = B_i + u_i$,
$$A_i = \{x_j \in \mathbb{F}_2^m \mid a_{ij} = 1, \ 0 \le j < n\} \Rightarrow |A_i| = 2^{m-1}.$$
We define the characteristic function $\chi_i(\alpha)$ on $\mathbb{F}_2^m$ according to $A_i$:
$$\chi_i(\alpha) = \begin{cases} 1, & \text{if } \alpha \in A_i; \\ 0, & \text{if } \alpha \notin A_i, \end{cases}$$
where $\alpha \in \mathbb{F}_2^m$. So each row vector $\alpha_i$ $(0 \le i < m)$ of $E$ can be expressed as
$$\alpha_i = (\chi_i(x_0), \chi_i(x_1), \dots, \chi_i(x_{n-1})).$$



For any two vectors $\alpha = (b_0, b_1, \dots, b_{n-1})$, $\beta = (c_0, c_1, \dots, c_{n-1})$ in $\mathbb{F}_2^n$, define the product vector
$$\alpha\beta = (b_0c_0, b_1c_1, \dots, b_{n-1}c_{n-1}) \in \mathbb{F}_2^n.$$
So for $0 \le i_1, i_2 < m$, we have the product of row vectors of $E$:
$$\alpha_{i_1}\alpha_{i_2} = (\chi_{i_1}(x_0)\chi_{i_2}(x_0), \chi_{i_1}(x_1)\chi_{i_2}(x_1), \dots, \chi_{i_1}(x_{n-1})\chi_{i_2}(x_{n-1})).$$
So the $j$-th $(0 \le j < 2^m)$ component of $\alpha_{i_1}\alpha_{i_2}$ is
$$\chi_{i_1}(x_j)\chi_{i_2}(x_j) = \begin{cases} 1, & \text{if } x_j \in A_{i_1} \cap A_{i_2}; \\ 0, & \text{if } x_j \notin A_{i_1} \cap A_{i_2}. \end{cases}$$
From the definition of $A_i$, obviously,
$$|A_{i_1} \cap A_{i_2}| = 2^{m-2}.$$

Lemma 2.21 Let $i_1, i_2, \dots, i_s$ be $s$ $(0 \le s < m)$ different indexes from 0 to $m - 1$; then
$$|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_s}| = 2^{m-s},$$
and $\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} \in \mathbb{F}_2^n$ has weight
$$w(\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}) = 2^{m-s}.$$

Proof The first conclusion is obvious. Let's just prove the second conclusion:
$$\alpha = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} = (\chi_{i_1}(x_0)\cdots\chi_{i_s}(x_0), \ \chi_{i_1}(x_1)\cdots\chi_{i_s}(x_1), \ \dots, \ \chi_{i_1}(x_{n-1})\cdots\chi_{i_s}(x_{n-1}))$$
has exactly $2^{m-s}$ indices $j$ with $x_j \in A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_s}$, so there are $2^{m-s}$ components of $\alpha$ equal to 1 and the others are 0, so
$$w(\alpha) = w(\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}) = 2^{m-s};$$
the Lemma holds.

For $0 \le l < 2^m$, the indicator set $I(l)$ is defined as
$$I(l) = \{i_1, i_2, \dots, i_s\} = \left\{ i \,\middle|\, \text{the digit } a_{il} \text{ in } l = \sum_{i=0}^{m-1} a_{il}2^i \text{ satisfies } a_{il} = 0 \right\}.$$
The following properties of the indicator set $I(l)$ are obvious:
(i) If $l_1 \ne l_2 \Rightarrow I(l_1) \ne I(l_2)$.
(ii) $\bigcup_{0 \le l < n} I(l) = \{0, 1, 2, \dots, m-1\}$.
(iii) If $l = n - 1 \Rightarrow I(n-1)$ is the empty set.
The above properties are easy to verify; for example (iii): because $l = n - 1 = 2^m - 1 = 1 + 2 + \cdots + 2^{m-1}$, no subscript $i$ with $a_{il} = 0$ exists, that is $I(n-1) = \emptyset$. Sometimes we write the indicator set as $I(l) = \{i_1, i_2, \dots, i_s\}_l$.
Lemma 2.22 Let $0 \le l < n = 2^m$, $I(l) = \{i_1, i_2, \dots, i_s\}$, and suppose
$$\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} = (b_{l0}, b_{l1}, \dots, b_{l(n-1)}) \in \mathbb{F}_2^n;$$
then in the ring $\mathbb{F}_2[x]$,
$$(1 + x)^l = \sum_{j=0}^{n-1} b_{lj} x^{n-1-j}. \tag{2.26}$$

Proof For $0 \le j < n$, write $j = \sum_{i=0}^{m-1} a_{ij}2^i$; then
$$n - 1 - j = \sum_{i=0}^{m-1} c_{ij} 2^i, \quad \text{where } c_{ij} = 1 - a_{ij}.$$
By Lemma 2.19,
$$\binom{l}{n-1-j} \equiv \prod_{i=0}^{m-1} \binom{a_{il}}{c_{ij}} \pmod 2.$$
If
$$\binom{l}{n-1-j} \equiv 1 \pmod 2,$$
then whenever $a_{il} = 0$ we must have $c_{ij} = 0$, i.e. $a_{ij} = 1$; that is to say
$$\binom{l}{n-1-j} \equiv 1 \pmod 2 \iff a_{ij} = 1, \ \text{for all } i \in I(l).$$
On the other hand, from Lemma 2.21,
$$b_{lj} = 1 \iff x_j \in A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_s} \iff a_{ij} = 1, \ \text{when } i \in I(l).$$
Comparing the $x^{n-1-j}$ terms on both sides of formula (2.26), we therefore have
$$(1 + x)^l = \sum_{j=0}^{n-1} b_{lj} x^{n-1-j}.$$
The Lemma holds.


For any 0 ≤ l < n = 2m , we define the index set I (l) = {i 1 , i 2 , . . . , i s } and the
vector in Fn2 .
Nl = αi1 αi2 · · · αis .

The index set I (l) corresponding to different l is different, so the corresponding


vector Nl is different; since the index set corresponding to l = n − 1 is an empty set,
the corresponding vector Nn−1 is defined as

Nn−1 = (1, 1, . . . , 1) = e.

Let $e_0 = (1, 0, \dots, 0), \dots, e_{n-1} = (0, 0, \dots, 1)$ be the standard basis of $\mathbb{F}_2^n$.

Lemma 2.23 For $0 \le j < n$, we have
$$e_j = \prod_{i=0}^{m-1} (\alpha_i + (1 + a_{ij})e),$$
where $\alpha_i$ is the $i$-th row of the matrix $E$ and the product is the componentwise product defined above.


Proof For vector α in Fn2 , its complement vector α is defined to replace the component
of 1 in α with 0, and the component of 0 in α with 1. So there are

α + α = e = (1, 1, . . . , 1), ∀ α ∈ Fn2 .

When 0 ≤ j < n is given, we define the j-th complement of row vector αi (0 ≤ i <
m) of matrix E as 
αi , if ai j = 1;
αi ( j) =
αi , if ai j = 0.

Obviously, there is
αi + (1 + ai j )e = αi ( j),

from the definition of index set I (l), we have



αi , if i ∈
/ I ( j);
αi ( j) =
e − αi , if i ∈ I ( j).
2.4 Some Typical Codes 71

Now let’s calculate


m−1 
m−1
(αi + (1 + ai j )e) = αi ( j)
i=0 i=0
 
= (e − αi ) αi = b.
i∈I ( j) i ∈I
/ ( j)

where b ∈ Fn2 , let b = (b0 , b1 , . . . , bn−1 ). Obviously, b j = 1. If k = j, then


 
bk = aik · (1 − aik ) = 0.
i ∈I
/ ( j) i∈I ( j)

Thus b = e j . We have completed the proof of Lemma.

Lemma 2.24 $\{N_l\}_{0 \le l < n}$ constitutes a basis of $\mathbb{F}_2^n$, where $N_{n-1} = e = (1, 1, \dots, 1)$.

Proof $\{N_l\}_{0 \le l < n}$ consists of exactly $n$ different vectors; let's prove that they are linearly independent. Let
$$N_l = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} = (b_{l0}, b_{l1}, \dots, b_{l(n-1)}),$$
and let
$$\sum_{l=0}^{n-1} c_l N_l = \left( \sum_{l=0}^{n-1} c_l b_{l0}, \ \sum_{l=0}^{n-1} c_l b_{l1}, \ \dots, \ \sum_{l=0}^{n-1} c_l b_{l(n-1)} \right)$$
be a linear combination with $c = (c_0, c_1, \dots, c_{n-1}) \ne 0$. Because
$$f(x) = \sum_{l=0}^{n-1} c_l (1 + x)^l \in \mathbb{F}_2[x], \quad f(x) \ne 0,$$
by Lemma 2.22 we have
$$f(x) = \sum_{j=0}^{n-1} \left( \sum_{l=0}^{n-1} c_l b_{lj} \right) x^{n-1-j}.$$
So there is a component $\sum_{l=0}^{n-1} c_l b_{lj} \ne 0$, which means $\sum_l c_l N_l \ne 0$; that is, $\{N_l\}_{0 \le l < n}$ is a basis. The Lemma holds.

Definition 2.12 Let $0 \le r < m$. The linear code of order $r$ — the Reed–Muller code $R(r, m)$ — is defined as
$$R(r, m) = L(\{\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} \mid 0 \le s \le r\}) \subset \mathbb{F}_2^n,$$
where the vector corresponding to $s = 0$ is $e$.



Obviously, when $r = 0$, $R(0, m)$ is the repetition code in $\mathbb{F}_2^n$:
$$R(0, m) = \{(0, 0, \dots, 0), (1, 1, \dots, 1)\}.$$
For general $r$, $0 \le r < m$, $R(r, m)$ is a $t$-dimensional linear subspace of $\mathbb{F}_2^n$, where
$$t = \sum_{s=0}^{r} \binom{m}{s}.$$

Lemma 2.25 The dual code of the Reed–Muller code $R(r, m)$ of order $r$ is $R(m - r - 1, m)$.

Proof The dimensions of $R(r, m)$ and $R(m - r - 1, m)$ are
$$\dim(R(r, m)) = \sum_{s=0}^{r} \binom{m}{s} \quad \text{and} \quad \dim(R(m - r - 1, m)) = \sum_{s=0}^{m-r-1} \binom{m}{s}.$$
Because
$$\sum_{s=0}^{r} \binom{m}{s} + \sum_{s=0}^{m-r-1} \binom{m}{m-s} = \sum_{s=0}^{r} \binom{m}{s} + \sum_{s=r+1}^{m} \binom{m}{s} = \sum_{s=0}^{m} \binom{m}{s} = (1+1)^m = 2^m = n,$$
we get
$$\dim(R(r, m)) + \dim(R(m - r - 1, m)) = n.$$
Let $\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}$ and $\alpha_{j_1}\alpha_{j_2}\cdots\alpha_{j_t}$ be basis vectors of $R(r, m)$ and $R(m - r - 1, m)$, respectively, and write
$$\alpha = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}, \quad \beta = \alpha_{j_1}\alpha_{j_2}\cdots\alpha_{j_t}.$$
By Lemma 2.21,
$$w(\alpha) = 2^{m-s}, \quad w(\beta) = 2^{m-t}, \quad s \le r < m, \ t \le m - r - 1.$$
Because $s + t < m$, the product $\alpha\beta = \alpha_{i_1}\cdots\alpha_{i_s}\cdot\alpha_{j_1}\cdots\alpha_{j_t}$ is a product of at most $s + t < m$ distinct rows, so by Lemma 2.21 its weight $w(\alpha\beta) = 2^{m-u}$ with $u \le s + t < m$ is even, and therefore
$$\langle \alpha, \beta \rangle \equiv w(\alpha\beta) \equiv 0 \pmod 2.$$
Since this holds for all pairs of basis vectors and the dimensions are complementary, the dual code of $R(r, m)$ is $R(m - r - 1, m)$. The Lemma holds.
Theorem 2.9 The Reed–Muller code $R(r, m)$ of order $r$ has minimal distance $d = 2^{m-r}$; specially, when $r = m - 2$, $R(m-2, m)$ is a linear code $[n, n - m - 1]$.

Proof From Lemma 2.21, we have
$$w(\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}) = 2^{m-s},$$
so the minimum distance of $R(r, m)$ satisfies $d \le 2^{m-r}$. On the other hand, let $I_1(r)$ be the set of all values $l$ whose corresponding index sets $\{i_1, i_2, \dots, i_s\}$ satisfy $s \le r$, and let
$$\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} = (b_{l0}, b_{l1}, \dots, b_{l(n-1)});$$
then
$$f(x) = \sum_{l \in I_1(r)} c_l (1 + x)^l = \sum_{j=0}^{n-1} \left( \sum_{l \in I_1(r)} c_l b_{lj} \right) x^{n-1-j}.$$
Therefore, the weight of a linear combination satisfies
$$w\left( \sum_{l \in I_1(r)} c_l \,\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} \right) = w(f(x)).$$
Define $i_0$ as
$$i_0 = \min\{l \mid l \in I_1(r), \ c_l \ne 0\}.$$
Every $l \in I_1(r)$ has at least $m - r$ nonzero binary digits, and the smallest such integer is
$$1 + 2 + \cdots + 2^{m-r-1} = 2^{m-r} - 1.$$
From Lemma 2.20, there is
$$w(f(x)) \ge w((x + 1)^{i_0}).$$
By Lemma 2.19, $\binom{i_0}{k}$ is odd exactly when every binary digit of $k$ is at most the corresponding digit of $i_0$, so $w((x+1)^{i_0})$ equals 2 raised to the number of nonzero digits of $i_0$, which is at least $2^{m-r}$. For instance, for $i_0 = 2^{m-r} - 1$, all the combination numbers
$$\binom{i_0}{k} = \binom{2^{m-r}-1}{k} \quad (0 \le k \le 2^{m-r} - 1)$$
are odd: writing
$$i_0 = 1 + 2 + \cdots + 2^{m-r-1}, \quad k = k_0 + k_1 \cdot 2 + \cdots + k_{m-r-1}2^{m-r-1}$$
with all $k_i \le 1$, Lemma 2.19 gives
$$\binom{i_0}{k} \equiv \prod_i \binom{1}{k_i} \equiv 1 \pmod 2,$$
so $w((x + 1)^{i_0}) = i_0 + 1 = 2^{m-r}$ in this extreme case. In the end, we have $d = 2^{m-r}$. If $r = m - 2$, then the minimum distance is 4. The dimension of $R(m-2, m)$ is
$$t = \sum_{s=0}^{m-2} \binom{m}{s} = \sum_{s=0}^{m} \binom{m}{s} - \binom{m}{m-1} - \binom{m}{m} = 2^m - m - 1 = n - m - 1.$$
So $R(m-2, m)$ is a linear code $[n, n - m - 1]$. The theorem is proved.

Because $R(m-2, m)$ is a linear code of the form $[n, n - m - 1]$ with minimum distance 4, we may regard $R(m-2, m)$ as a class of extended Hamming codes; although it is not perfect itself, the underlying Hamming codes are perfect linear codes.
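To make the construction concrete, here is a short Python sketch (ours) that builds the rows $\alpha_i$ of $E$ for $m = 4$, generates $R(r, m)$ as the span of all products $\alpha_{i_1}\cdots\alpha_{i_s}$ with $s \le r$, and verifies the dimension $\sum_{s \le r}\binom{m}{s}$ and the minimum distance $2^{m-r}$ of Theorem 2.9 by enumeration.

```python
# Build Reed-Muller codes R(r, m) from the rows of E and verify
# dim = sum_{s<=r} C(m, s) and minimal distance d = 2^{m-r}.
from itertools import combinations, product
from math import comb

m = 4
n = 2 ** m
alpha = [[(j >> i) & 1 for j in range(n)] for i in range(m)]   # rows alpha_i of E

def rm_generators(r):
    gens = [[1] * n]                          # s = 0 contributes e = (1,...,1)
    for s in range(1, r + 1):
        for idx in combinations(range(m), s):
            v = [1] * n
            for i in idx:
                v = [a & b for a, b in zip(v, alpha[i])]
            gens.append(v)
    return gens

for r in range(m):
    gens = rm_generators(r)
    k = len(gens)
    assert k == sum(comb(m, s) for s in range(r + 1))          # dimension
    dmin = n
    for coeffs in product([0, 1], repeat=k):
        if any(coeffs):
            cw = [sum(c * g[j] for c, g in zip(coeffs, gens)) % 2 for j in range(n)]
            dmin = min(dmin, sum(cw))
    assert dmin == 2 ** (m - r)                                # Theorem 2.9
    print(f"R({r},{m}) = [{n},{k}] with d = {dmin}")
```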

2.5 Shannon Theorem

In channel transmission, due to the interference of the channel, a codeword $x \in C$ may fail to be decoded correctly after it is sent; the probability of this error is recorded as $p(x)$, called the error probability of the codeword $x$. After the code $C$ is selected, according to the Hamming-distance decoding principle of "look most alike", the error probability $p(x)$ of a codeword $x \xrightarrow{\text{sending}} x'$ satisfies
$$\begin{cases} p(x) = 0, & \text{if } d(x, x') \le \rho_1 < \frac{1}{2}n; \\ p(x) > 0, & \text{if } d(x, x') > \rho_1, \end{cases}$$
where $\rho_1$ is the disjoint radius of the code $C$. Therefore, the error probability $p(x)$ of a codeword $x$ is related to the code $C$. The error probability of the code $C$ is
$$p(C) = \frac{1}{|C|} \sum_{x \in C} p(x).$$

It is difficult to calculate the error probability of a codeword mathematically. We take the binary channel as an example: $C \subset \mathbb{F}_2^n$ is a binary code of length $n$. To calculate the error probability $p(x)$ of $x \in C$, we agree that the transmission error probability of the character 0 is $p$, $p < \frac{1}{2}$, that is, the probability of receiving 0 as 1 after transmission, and the probability of a transmission error for the character 1 is also $p$. Although the probability of error is very low, that is, the value of $p$ is very small, errors do occur due to the interference of the channel. We further agree that the error probability of each transmission of a character 0 or 1 is $p$; such a channel is called a memoryless channel. In a memoryless binary channel, the transmission of a codeword $x = x_1x_2 \dots x_n \in C$ constitutes an $n$-fold Bernoulli trial. This probability model provides a theoretical basis for calculating the error probability of the codeword $x$; let's take the binary repetition code as an example.
Lemma 2.26 Let $A_n$ be the binary repetition code of length $n$, that is $A_n = \{\mathbf{0}, \mathbf{1}\} \subset \mathbb{F}_2^n$, and let $p(A_n)$ be its error probability; then
$$\lim_{n \to \infty} p(A_n) = 0.$$

Proof The transmission of the codeword $\mathbf{0} = (0, 0, \dots, 0)$ is regarded as an $n$-fold Bernoulli trial: the character 0 has only the two outcomes 0 and 1 after each transmission, the probability of receiving 0 is $q = 1 - p$, and the probability of receiving 1 is $p < \frac{1}{2}$. Let $0 \le k \le n$; then the probability that 0 is received exactly $k$ times is
$$\binom{n}{k} q^k p^{n-k}.$$
If $k > \frac{1}{2}n$, then there are $k > \frac{1}{2}n$ characters 0 in the received codeword after $\mathbf{0}$ is transmitted; suppose $\mathbf{0} \to \overline{0}$, then $d(\mathbf{0}, \overline{0}) \le n - k < \frac{1}{2}n$. Because the disjoint radius of the repetition code is $\frac{1}{2}n$, according to the decoding principle we can always decode $\overline{0} \to \mathbf{0}$ correctly; therefore, an error for the codeword $\mathbf{0} = (0, 0, \dots, 0) \in \mathbb{F}_2^n$ occurs if and only if $k \le \frac{1}{2}n$, and the error probability is
$$p(\mathbf{0}) = \sum_{0 \le k \le \frac{n}{2}} \binom{n}{k} q^k p^{n-k}.$$
Similarly, the error probability of the codeword $\mathbf{1} = (1, 1, \dots, 1) \in \mathbb{F}_2^n$ is
$$p(\mathbf{1}) = \sum_{0 \le k \le \frac{n}{2}} \binom{n}{k} q^k p^{n-k}.$$
Therefore, the error probability of the repetition code $A_n$ is


$$p(A_n) = \sum_{0 \le k \le \frac{n}{2}} \binom{n}{k} q^k p^{n-k}.$$
To calculate the limit of the above expression as $n \to \infty$, notice that
$$\sum_{0 \le k \le \frac{n}{2}} \binom{n}{k} < \sum_{0 \le k \le n} \binom{n}{k} = 2^n.$$
Because $p < \frac{1}{2}$, we have $p < q$, and when $k \le \frac{n}{2}$,
$$k \log\frac{q}{p} \le \frac{n}{2} \log\frac{q}{p}.$$
From this it follows directly that
$$q^k p^{n-k} \le (qp)^{\frac{n}{2}}.$$
Thus
$$p(A_n) \le 2^n (qp)^{\frac{n}{2}} = (4qp)^{\frac{n}{2}}.$$
Because when $p < \frac{1}{2}$,
$$p^2 - p + \frac{1}{4} = \left(p - \frac{1}{2}\right)^2 > 0,$$
we get
$$p(1 - p) = pq < \frac{1}{4}, \quad \text{that is } 4pq < 1.$$
Therefore,
$$0 \le \lim_{n \to \infty} p(A_n) \le \lim_{n \to \infty} (4qp)^{\frac{n}{2}} = 0.$$
The Lemma holds.
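The bound $(4pq)^{n/2}$ can be watched numerically with a few lines of Python (a sketch of ours):

```python
# Error probability of the binary repetition code A_n versus the bound (4pq)^(n/2).
from math import comb

def p_err(n, p):
    q = 1 - p
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(n // 2 + 1))

p = 0.1
for n in (5, 11, 21, 51, 101):
    print(n, p_err(n, p), (4 * p * (1 - p)) ** (n / 2))
```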


Below, we assume that the channel is a binary memoryless symmetric channel and each code is a binary code. The error probability of each transmission of the characters 0 and 1 is $p$, $q = 1 - p$, $p < \frac{1}{2}$. For a given codeword length $n$ and number of codewords $M = M_n$, we define Shannon's probability $P^*(n, M_n, p)$ as
$$P^*(n, M_n, p) = \min\{p(C) \mid C \subset \mathbb{F}_2^n, \ |C| = M_n\}.$$


Theorem 2.10 (Shannon) In a memoryless symmetric binary channel, let 0 < λ <
1 + p log p + q log q be a given real number, Mn = 2[λn] , then we have

lim P ∗ (n, Mn , p) = 0.
n→∞

In order to understand the meaning of Shannon's theorem and prove it, we need some auxiliary conclusions.

Lemma 2.27 Let $0 < \lambda < 1 + p\log p + q\log q$ be a given real number. For any binary code $C \subset \mathbb{F}_2^n$ with $|C| = 2^{[\lambda n]}$, the code rate $R_C$ of $C$ satisfies
$$\lambda - \frac{1}{n} < R_C \le \lambda.$$
Specially, when $n \to \infty$, the rate of $C$ approaches $\lambda$.

Proof
$$|C| = 2^{[\lambda n]} \Rightarrow \log_2|C| = [\lambda n] \le \lambda n.$$
Therefore,
$$R_C = \frac{1}{n}\log_2|C| \le \lambda.$$
From the properties of the square bracket function,
$$\lambda n < [\lambda n] + 1,$$
so
$$\lambda n - 1 < [\lambda n] = \log_2|C|,$$
hence
$$\lambda - \frac{1}{n} < \frac{1}{n}\log_2|C| = R_C.$$
Lemma 2.27 holds.
Combining Lemma 2.27, the significance of Shannon’s theorem is that the code
rate tends to the capacity 1 − H ( p) of a channel when the code length n increases
and tends to infinity, and there exists a code C whose error probability is arbitrarily
small, according to Shannon’s understanding, this kind of code is called “good code”.
Shannon first proved the existence of “good codes” under more general conditions
by probability method. Theorem 2.10 is only a special case of Shannon’s channel
coding theorem. To prove Shannon theorem, we must accurately estimate the error
probability of a given number of codewords under the principle of decoding.
Lemma 2.28 In the memoryless binary channel, let the probability of each transmission error of the characters 0 and 1 be $p$, $q = 1 - p$, and let $\omega$ be the number of character errors when a codeword $x = x_1x_2 \dots x_n \in \mathbb{F}_2^n$ is transmitted. Then for any $\varepsilon > 0$, letting $b = \sqrt{\frac{npq}{\varepsilon}}$, we have
$$P\{\omega > np + b\} \le \varepsilon.$$

Proof For any codeword $x = x_1x_2 \dots x_n \in \mathbb{F}_2^n$ transmitted in a memoryless binary channel, the transmission can be regarded as an $n$-fold Bernoulli trial, and the number $\omega$ of errors in $x$ can be regarded as a discrete random variable taking values $0, 1, 2, \dots, n$; the probability that $\omega$ takes a given value is
$$b(\omega, n, p) = \binom{n}{\omega} p^{\omega} q^{n-\omega}.$$
Therefore, $\omega$ is a discrete random variable obeying the binomial distribution. From Lemma 1.18 of the first chapter, the expected value $E(\omega)$ and variance $D(\omega)$ of $\omega$ are
$$E(\omega) = np, \quad D(\omega) = npq.$$
From the Chebyshev inequality of Corollary 1.2, for any $k > 0$,
$$P\{|\omega - E(\omega)| \ge k\sqrt{D(\omega)}\} \le \frac{1}{k^2}.$$
Taking $k = \frac{1}{\sqrt{\varepsilon}}$, we have $k\sqrt{D(\omega)} = b$, so
$$P\{\omega > np + b\} \le P\{|\omega - np| \ge b\} \le \varepsilon.$$
Lemma 2.28 holds.


Lemma 2.29 Take $\rho = [np + b]$, where $b = \sqrt{\frac{np(1-p)}{\varepsilon}}$; then
$$\frac{\rho}{n}\log\frac{\rho}{n} = p\log p + O\left(\frac{1}{\sqrt{n}}\right),$$
$$\left(1 - \frac{\rho}{n}\right)\log\left(1 - \frac{\rho}{n}\right) = q\log q + O\left(\frac{1}{\sqrt{n}}\right).$$

Proof When $\varepsilon > 0$ is given, $b = O(\sqrt{n})$, so $\rho$ can be rewritten as
$$\rho = np + O(\sqrt{n}), \quad \frac{\rho}{n} = p + O\left(\frac{1}{\sqrt{n}}\right).$$
Thus
$$\frac{\rho}{n}\log\frac{\rho}{n} = \left(p + O\left(\frac{1}{\sqrt{n}}\right)\right)\left(\log p + \log\left(1 + O\left(\frac{1}{\sqrt{n}}\right)\right)\right).$$
For a real number $x$ with $|x| < 1$, we have the Taylor expansion
$$\log(1 + x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 \cdots,$$
so when $|x| < 1$,
$$\log(1 + x) = O(|x|),$$
thus
$$\log\left(1 + O\left(\frac{1}{\sqrt{n}}\right)\right) = O\left(\frac{1}{\sqrt{n}}\right),$$
and we have
$$\frac{\rho}{n}\log\frac{\rho}{n} = \left(p + O\left(\frac{1}{\sqrt{n}}\right)\right)\left(\log p + O\left(\frac{1}{\sqrt{n}}\right)\right) = p\log p + O\left(\frac{1}{\sqrt{n}}\right).$$
Similarly, for the second asymptotic formula,
$$\left(1 - \frac{\rho}{n}\right)\log\left(1 - \frac{\rho}{n}\right) = q\log q + O\left(\frac{1}{\sqrt{n}}\right);$$
Lemma 2.29 holds.

To prove the Shannon theorem, we define the following auxiliary functions. For any two codewords $x, y \in \mathbb{F}_2^n$ and $\rho \ge 0$, define
$$f_{\rho}(x, y) = \begin{cases} 0, & \text{if } d(x, y) > \rho; \\ 1, & \text{if } d(x, y) \le \rho. \end{cases}$$
Let $C = \{x_1, x_2, \dots, x_M\} \subset \mathbb{F}_2^n$ be a binary code with $|C| = M$, and define
$$g_i(y) = 1 - f_{\rho}(y, x_i) + \sum_{j \ne i} f_{\rho}(y, x_j).$$

Lemma 2.30 Assuming $y \in \mathbb{F}_2^n$ is a given codeword, then
$$\begin{cases} g_i(y) = 0, & \text{if } x_i \in C \text{ is the only codeword with } d(y, x_i) \le \rho; \\ g_i(y) \ge 1, & \text{otherwise}. \end{cases}$$

Proof If there is a unique $x_i \in C$ such that $d(y, x_i) \le \rho$, then $f_{\rho}(y, x_i) = 1$ but $f_{\rho}(y, x_j) = 0$ $(j \ne i)$; therefore
$$g_i(y) = 1 - f_{\rho}(y, x_i) + \sum_{j \ne i} f_{\rho}(y, x_j) = 0.$$
If $d(y, x_i) > \rho$, then $f_{\rho}(y, x_i) = 0$, so
$$g_i(y) = 1 + \sum_{j \ne i} f_{\rho}(y, x_j) \ge 1.$$
If $d(y, x_i) \le \rho$ but there is at least one $x_k \ne x_i$ such that $d(y, x_k) \le \rho$, then
$$g_i(y) = 1 - f_{\rho}(y, x_i) + \sum_{j \ne i} f_{\rho}(y, x_j) = 1 + \sum_{j \ne i, j \ne k} f_{\rho}(y, x_j) \ge 1.$$
Lemma 2.30 holds.

With the above preparation, we give the proof of Shannon's theorem.

Proof (The proof of Theorem 2.10) According to the assumptions of the theorem, $0 < \lambda < 1 + p\log p + q\log q$ is a given positive real number $(p < \frac{1}{2})$,
$$M = M_n = 2^{[\lambda n]}, \quad |C| = M.$$
Let
$$C = \{x_1, x_2, \dots, x_M\} \subset \mathbb{F}_2^n,$$
let $\varepsilon > 0$ be any given positive number, and put
$$b = \sqrt{\frac{npq}{\varepsilon}}, \quad \rho = [pn + b].$$
Because $p < \frac{1}{2}$, when $n$ is sufficiently large we have $\rho = pn + O(\sqrt{n}) < \frac{1}{2}n$.
In order to calculate the error probability of the codeword $x_i \in C$, suppose $x_i \xrightarrow{\text{transmit}} y$. If $d(x_i, y) \le \rho$ and $x_i$ is the unique codeword of $C$ with $d(y, x_i) \le \rho$, then according to the decoding principle of "look the most alike", $x_i$ is the most similar codeword in $C$, so we decode $y$ correctly back to $x_i$; in this case the error probability of $x_i$ is 0. Otherwise, a real decoding error may occur. On the other hand, the probability that $y$ is received when $x_i$ is sent is the conditional probability $p(y|x_i)$, so the error probability of $x_i$ is estimated as
$$P_i = p(x_i) \le \sum_{y \in \mathbb{F}_2^n} p(y|x_i)g_i(y) = \sum_{y \in \mathbb{F}_2^n} p(y|x_i)(1 - f_{\rho}(y, x_i)) + \sum_{y \in \mathbb{F}_2^n} \sum_{\substack{j=1 \\ j \ne i}}^{M} p(y|x_i)f_{\rho}(y, x_j). \tag{2.27}$$

According to the definition of f ρ (y, xi ), the first term of the above formula is the
probability that the received codeword y sent by xi is not in ball Bρ (xi ), i.e.

p(y|xi )(1 − f ρ (y, xi )) = P{received codewords y|y ∈
/ Bρ (xi )}.
y∈Fn2

Because ω = d(y, xi ) is exactly the number of ω error characters in xi → y, from


the Chebyshev inequality of Lemma 2.28, we have

/ Bρ (xi )} = P{ω > ρ} ≤ P{ω ≥ np + b} < ε,


P{received codewords|y ∈

from (2.27), we have


M
Pi = p(xi ) ≤ ε + p(y|xi ) f ρ (y, x j ). (2.28)
y∈Fn2 j=1
j=i

Because the definition of the error probability p(C) of code C, so there is

1  
M M  M
p(C) = p(xi ) ≤ ε + M −1 p(y|xi ) f ρ (y, x j ).
M i=1 i=1 n j=1
y∈F2
j=i

Since $C$ is randomly selected, we can regard $p(C)$ as a random variable. Shannon's probability $P^*(n, M_n, p)$ is the minimum value of $p(C)$, so it is not greater than the expected value of $p(C)$, i.e.
$$P^*(n, M_n, p) \le E(p(C)) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y \in \mathbb{F}_2^n}\sum_{\substack{j=1 \\ j \ne i}}^{M} E(p(y|x_i) \cdot f_{\rho}(y, x_j)).$$
When $i$ is given, the random variables $p(y|x_i)$ and $f_{\rho}(y, x_j)$ $(j \ne i)$ are statistically independent, so
$$E(p(y|x_i) \cdot f_{\rho}(y, x_j)) = E(p(y|x_i))E(f_{\rho}(y, x_j)).$$
So we have
$$P^*(n, M_n, p) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y \in \mathbb{F}_2^n}\sum_{\substack{j=1 \\ j \ne i}}^{M} E(p(y|x_i))E(f_{\rho}(y, x_j)). \tag{2.29}$$

Let's calculate the expected value of $f_{\rho}(y, x_j)$: because $y$ is selected in $\mathbb{F}_2^n$ with equal probability,
$$E(f_{\rho}(y, x_j)) = \sum_{y \in \mathbb{F}_2^n} p(y)f_{\rho}(y, x_j) = \frac{1}{2^n}|B_{\rho}(x_j)| = \frac{1}{2^n}|B_{\rho}(0)|.$$
So there is
$$P^*(n, M_n, p) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y \in \mathbb{F}_2^n} E(p(y|x_i))\sum_{\substack{j=1 \\ j \ne i}}^{M} E(f_{\rho}(y, x_j)) = \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y \in \mathbb{F}_2^n} E(p(y|x_i))\,\frac{(M-1)|B_{\rho}(0)|}{2^n}. \tag{2.30}$$

Now let's calculate the expected value of $p(y|x_i)$ ($y$ fixed, $x_i$ randomly selected in $C$):
$$E(p(y|x_i)) = \sum_{i=1}^{M} p(x_i)p(y|x_i) = p(y),$$
thus
$$\sum_{i=1}^{M}\sum_{y \in \mathbb{F}_2^n} E(p(y|x_i)) = \sum_{i=1}^{M}\sum_{y \in \mathbb{F}_2^n} p(y) = M.$$
From (2.30),
$$P^*(n, M_n, p) \le \varepsilon + \frac{M-1}{2^n}|B_{\rho}(0)|,$$
$$\log_2(P^*(n, M_n, p) - \varepsilon) \le \log_2 M + \log_2|B_{\rho}(0)| - n,$$
that is
$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le \frac{1}{n}\log_2 M + \frac{1}{n}\log_2|B_{\rho}(0)| - 1.$$
From Lemma 1.11 of Chap. 1,
$$\frac{1}{n}\log_2|B_{\rho}(0)| = \frac{1}{n}\log_2\sum_{i=0}^{\rho}\binom{n}{i} \le H\left(\frac{\rho}{n}\right),$$

where $H(x) = -x\log x - (1-x)\log(1-x)$ $(0 < x < \frac{1}{2})$ is the binary entropy function, so
$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le \frac{1}{n}\log_2 M + H\left(\frac{\rho}{n}\right) - 1.$$
By hypothesis $M = 2^{[\lambda n]}$, $\rho = [pn + b]$, $b = O(\sqrt{n})$, we have
$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le \frac{[\lambda n]}{n} + H\left(\frac{\rho}{n}\right) - 1 = \lambda + H\left(\frac{\rho}{n}\right) - 1 + O\left(\frac{1}{n}\right).$$
By Lemma 2.29,
$$H\left(\frac{\rho}{n}\right) = -\left(\frac{\rho}{n}\log\frac{\rho}{n} + \left(1 - \frac{\rho}{n}\right)\log\left(1 - \frac{\rho}{n}\right)\right) = -\left(p\log p + q\log q + O\left(\frac{1}{\sqrt{n}}\right)\right).$$
So
$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le \lambda - (1 + p\log p + q\log q) + O\left(\frac{1}{\sqrt{n}}\right).$$
By hypothesis $\lambda < 1 + p\log p + q\log q$, so when $n$ is sufficiently large, we have
$$\frac{1}{n}\log_2(P^*(n, M_n, p) - \varepsilon) \le -\beta \quad (\beta > 0).$$
Therefore $0 \le P^*(n, M_n, p) \le \varepsilon + 2^{-\beta n}$; taking the limit $n \to \infty$ on both sides, finally,
$$\lim_{n \to \infty} P^*(n, M_n, p) = 0.$$
We have completed the proof of the theorem.
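For orientation, the quantity $1 + p\log_2 p + q\log_2 q = 1 - H(p)$ appearing in the theorem is the channel capacity; a couple of lines of Python (ours) tabulate it for some error probabilities:

```python
# The Shannon limit 1 - H(p) of the memoryless binary symmetric channel.
from math import log2

def capacity(p):
    q = 1 - p
    return 1 + p * log2(p) + q * log2(q)

for p in (0.01, 0.05, 0.11, 0.25):
    print(p, round(capacity(p), 4))
```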


According to Shannon, the code rate is close to a given normal number λ,

0 < λ < 1 + p log p + q log q = 1 − H ( p),

the code with arbitrarily small error probability is called “good code”, we further
analyze the construction of this kind of “good code”. (Shannon only proved the
existence of “good code” in probability).
Theorem 2.11 For given λ, 0 < λ < 1 + p log p + q log q( p < 21 ), Mn = 2[λn] , if
there is a perfect code Cn , and |Cn | = Mn , then we have

lim p(Cn ) = 0.
n→∞

Proof If the perfect code $C_n$ exists, by Lemma 2.27,
$$\lambda - \frac{1}{n} \le R_{C_n} \le \lambda.$$
Therefore, the code rate of $C_n$ can be arbitrarily close to $\lambda$; if the error probability of $C_n$ can also be made arbitrarily small, then $C_n$ is a "good code" in the mathematical sense. To prove Theorem 2.11, because $C_n$ is a perfect code, its minimum distance $d_n$ can be written as
$$d_n = 2e_n + 1, \quad e_n < \frac{n}{2}.$$
Because $\lim_{n \to \infty} R_{C_n} = \lambda$, by Theorem 2.2 we have
$$\lim_{n \to \infty} H\left(\frac{e_n}{n}\right) = 1 - \lambda > H(p).$$
Because the binary entropy function $H(x)$ is monotone, continuous and increasing on $(0, \frac{1}{2})$, the limit $\lim_{n \to \infty} \frac{e_n}{n}$ exists and
$$\lim_{n \to \infty} \frac{e_n}{n} > p, \quad \text{that is } \frac{e_n}{n} > p \text{ when } n \text{ is sufficiently large}.$$
Now consider the error probability $p(x)$ of a codeword $x = x_1x_2 \dots x_n \in C_n$. Since $C_n$ is an $e_n$-error-correcting code, when $x \to x'$ with $d(x, x') \le e_n$ we can always decode correctly, and at this time the error probability of $x$ is 0. Therefore a transmission error of $x$, that is, the case where $x$ cannot be decoded correctly, occurs only when $d(x', x) = w_n > e_n$. At this point we have (when $n$ is sufficiently large)
$$\frac{w_n}{n} > \frac{e_n}{n} > p + \varepsilon \quad (\text{for some } \varepsilon > 0),$$
so the error probability $p(x)$ of $x \in C_n$ is estimated as
$$p(x) \le P\left\{\frac{w_n}{n} > p + \varepsilon\right\} \le P\left\{\left|\frac{w_n}{n} - p\right| > \varepsilon\right\}.$$
Because as $n \to \infty$ the random variable sequence $\{w_n\}$ is a Bernoulli random process (i.e., for each $n$, it is an $n$-fold Bernoulli trial), from Theorem 1.2 in Chap. 1 we have
$$\lim_{n \to \infty} p(x) \le \lim_{n \to \infty} P\left\{\left|\frac{w_n}{n} - p\right| > \varepsilon\right\} = 0.$$
This holds for all $x \in C_n$, so
$$\lim_{n \to \infty} p(C_n) = 0.$$

The Theorem 2.11 holds.

From the proof of Theorems 2.10 and 2.11, it can be seen that Shannon randomly
selects a code and randomly selects a codeword, which essentially regards the input
information as a random event in a given probability space, and the transmission
process of information is essentially a random process. The fundamental difference
between Shannon and other mathematicians at the same time is that he regards
information or a code as a random variable. The mathematical model of information
transmission is a dynamic probability model rather than a static algebraic model. The
most important method to study a code naturally is probability statistics rather than
the algebraic combination method of traditional mathematics. From the perspective
of probability theory, Theorems 2.10 and 2.11 regard a code as a random variable, but a rather particular one: its probability distribution obeys the Bernoulli binomial distribution, and the statistical characteristics of the code rate are not yet clearly expressed. It is the core content of Shannon's information
theory to study the relationship between random variables with general probability
distribution and codes. One of the most basic concepts is information entropy, or
code entropy. Using the concept of code entropy, the statistical characteristics of a
code are clearly displayed. Therefore, we see a basic framework and prototype of
modern information theory. In the next chapter, we explain and prove these basic
ideas and results of Shannon information theory in detail. One of the most important
results is Shannon channel coding theorem (see Theorem 3.12 in Chap. 3). Shannon
uses the probability method to prove that the so-called good code with a code rate
up to the transmission capacity and an arbitrarily small error probability exists for
the general memoryless channel (whether symmetrical or not). On the contrary, the
code rate of a code with an arbitrarily small error probability must not be greater
than the capacity of the channel. This channel capacity is called Shannon’s limit,
which has been pursued for a long time in the field of electronic communication
engineering technology. People want to find a channel coding scheme with error
probability in a controllable range (e.g., less than ε) and transmission efficiency (i.e.,
code rate) reaching Shannon’s limit. In today’s 5G era, this engineering technical
problem seems to have been overcome. Returning to theorem 2.10, we see that the
upper limit 1 − H ( p) of the code rate is the channel capacity of the memoryless
symmetric binary channel (see example 2 in Sect. 8 of Chap. 3). From this example,
we can get a glimpse of Shannon’s channel coding theory.

Exercise 2
1. Please design a code of length 7 which contains 8 codewords, such that the Hamming distance of any two codewords is $\ge 4$. The code is transmitted through a symmetric binary channel; assuming the error probability of the characters 0 and 1 is $p$, calculate the success probability of a codeword transmission.
2. Let $C$ be a binary code of length 16 satisfying:
(i) each codeword has weight 6;
(ii) any two codewords have Hamming distance 8.

Prove: $|C| \le 16$. Does such a binary code $C$ with $|C| = 16$ exist?


3. Let $C$ be a binary code of length $n$ that corrects one character error; prove
$$|C| \le \frac{2^n}{n+2} \quad (n \text{ is even}).$$

4. Let C be a binary perfect code of length n, and the minimum distance is 7. Prove:
n = 7 or n = 23.
5. Let $C = [n, k] \subset \mathbb{F}_q^n$ be a linear code that is systematic on any $k$ coordinates; prove that the minimum distance of $C$ is $d = n - k + 1$.
6. Suppose $C = [2k+1, k] \subset \mathbb{F}_2^{2k+1}$ and $C \subset C^{\perp}$; describe the difference set $C^{\perp} \setminus C$.
7. Let $x = x_1x_2 \dots x_6 \in \mathbb{F}_2^6$; determine the Hamming ball size $|B_1(x)|$. Can we find a code $C \subset \mathbb{F}_2^6$ with $|C| = 9$ such that the Hamming distance of any two different codewords in $C$ is $\ge 3$?
8. Let $C = [n, k] \subset \mathbb{F}_q^n$ be a linear code with generator matrix $G$; if no column of $G$ is all zero, prove
$$\sum_{x \in C} w(x) = n(q-1)q^{k-1},$$
where $w(x)$ is the weight of the codeword $x$.


9. Let C = [n, k] be a linear binary code, and there is a codeword with odd weight in
C, prove that the codewords with even weight in C form a linear code [n, k − 1].
10. Let $C$ be a linear binary code whose generating matrix $G$ is
$$G = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}.$$
Please decode the following received codewords: $y_1 = (1101011)$, $y_2 = (0110111)$, $y_3 = (0111000)$.
11. Let p be a prime, is there a self-dual linear code C = [8, 4] over F p ?
12. Let $R_k$ be the rate of the binary Hamming codes; find $\lim_{k \to \infty} R_k$.
13. Let $C$ be a linear binary code with weight distribution polynomial $A(z)$; find the weight distribution polynomial $B(z)$ of the dual code $C^{\perp}$.
14. Let $C = [n, k] \subset \mathbb{F}_2^n$ with weight distribution polynomial $A(z)$. We transmit codewords over a binary symmetric channel with error probability $p$ (for the characters 0 and 1), and we hope that a codeword transmission error can be detected; calculate the probability that a codeword transmission error will not be detected.
15. Prove that there is no linear code $C = [15, 8]$ with minimum distance 5 over any finite field $\mathbb{F}_q$.
16. Let $n = 2^m$; prove that the Reed–Muller code $R(1, m)$ is a Hadamard code of length $n$.

17. Prove that the ternary Golay code has exactly 132 codewords of weight 5. For each such codeword $x$, consider the pair $(x, 2x)$, where $w(x) = 5$, and take the set of coordinates with nonzero components as a subset. Prove that there are 66 such subsets and that they form a $4-(11, 5, 1)$ design.
18. If the minimum distance $d$ of a binary code $C = (n, M, d)$ is even, prove that there exists a binary code with the same parameters all of whose codewords have even weight.
19. Let $H$ be a Hadamard matrix $H_{12}$; define
$$A = H - I, \quad G = (I, A), \quad I \text{ the unit matrix}.$$
Prove that $G$ is the generating matrix of a ternary code $[24, 12]$ whose minimum distance is 9.
20. Let $C = [4, 2]$ be a ternary Hamming code with check matrix $H$. Let $I$ be the unit matrix of order 4 and $J$ the square matrix of order 4 with all elements 1; define
$$G = \begin{pmatrix} J + I & I & I \\ 0 & H & -H \end{pmatrix}.$$
Prove that $G$ generates a ternary code $C = [12, 6]$ with minimum distance 6.

References

Barg, A. M., Katsman, S. L., & Tsfasman, M. A. (1987). Algebraic Geometric Codes from Curves
of Small Genus. Probl. of Information Transmission,23, 34–38.
Berlekamp, E. R. (1972). Decoding the Golay Code, JPL Technical Report 32-1256 (Vol. IX, pp.
81–85). Jet Propulsion Laboratory.
Berlekamp, E. R. (1968). Algebraic Coding Theory. NewYork: McGraw-Hill.
Best, M. R. (1980). Binary codes with a minimum distance of four. IEEE Transactions on Information Theory, 26, 738–742.
Best, M. R. (1978). On the Existence of Perfect Codes, Report ZN 82/78. Amsterdam: Mathematical
Centre.
Bussey, W. H. (1905). Galois field tables for $p^n \le 169$. Bull Amer Math Soc, 12, 22–38.
Bussey, W. H. (1910). Tables of Galois fields of order less than 1000. Bull Amer Math Soc, 16,
188–206.
Cameron, P. J., & van Lint, J. H. (1991). Designs, graphs, codes and their links. London Math Soc Student Texts (Vol. 22). Cambridge University Press.
Conway, J. H., & Sloane, N. J. A. (1994). Quaternary constructions for the binary single- error-
correcting codes of Julin, Best, and others. Designs, Codes and Cryptography,41, 31–42.
Curtis, C. W., & Reiner, I. (1962). Representation Theory of Finite Groups and Associative Algebras.
New York-London: Interscience.
Delsarte, P., & Goethals, J. M. (1975). Unrestricted codes with the Golay parameters are unique.
Discrete Math.,12, 211–224.
Elias, P. Coding for noisy channels. IRE Conv, Record, part 4 (pp. 37–46).
Feller, W. (1950). An introduction to probability theory and its applications (Vol. I). Wiley.
Feng, G.-L., & Rao, T. R. N. (1994). A simple approach for construction of algebraic-geometric
codes from affine plane curves. IEEE Trans. Info. Theory,40, 1003–1012.

Forney, G. D. (1970). Convolutional codes I: Algebraic structure. IEEE Trans Info Theory, 16,
720–738: Ibid, 17, 360 (1971).
Gallagher, R. G. (1968). Information Theory and Reliable Communication. New York: Wiley.
Goethals, J. M. (1977). The extended Nadler code is unique. IEEE Trans Info, 23, 132–135.
Goppa, V. D. (1970). A new class of linear error- correcting codes. Problems of Info Transmission,
6, 207–212.
Goto, M. (1975). A note on perfect decimal AN codes. Info Control, 29, 385–387.
Goto, M., & Fukumara, T. (1975). Perfect nonbinary AN codes with distance three. Info Control,
27, 336–348.
Graham, R. L., & Sloane, N. J. A. (1980). Lower bounds for constant weight codes. IEEE Trans
Info Theory, 26, 37–40.
Gritsenko, V. M. (1969). Nonbinary arithmetic correcting codes. Problems of Info Transmission, 5,
15–22.
Helgert, H. J., & Stinaff, R. D. (1973). Minimum distance bounds for binary linear codes. IEEE
Trans Info Theory, 19, 344–356.
Høholdt, T., & Pellikaan, R. (1995). On the decoding of algebraic-geometric codes. IEEE Transac-
tions of Info Theory, 41, 1589–1614.
Høholdt, T., van Lint, J. H., & Pellikaan, R. (1998). Algebraic geometry codes. In V. S. Pless, W. C. Huffman & R. A. Brualdi (Eds.), Handbook of coding theory. Elsevier Science Publishers.
Hong, Y. (1984). On the nonexistence of unknown perfect 6- and 8-codes in Hamming schemes
H (n, q) with q arbitrary. Osaka J. Math., 21, 687–700.
Justesen, J. (1975). An algebraic construction of rate 1/v convolutional codes. IEEE Trans Info Theory, 21, 577–580.
Justesen, J., Larsen, K. J., Jensen, E. H., Havemose, A., & Høholdt, T. (1989). Construction and
decoding of a class of algebraic geometry codes. IEEE Transactions of Info Theory, 35, 811–821.
Kasami, T. (1969). An upper bound on k/n for affine invariant codes with fixed d/n. IEEE Trans
Info Theory, 15, 171–176.
Kerdock, A. M. (1972). A class of low-rate nonlinear codes. Info and Control, 20, 182–187.
Levenshtein, V. I. (1975). Minimum redundancy of binary error-correcting codes. Info Control, 28,
268–291.
Macwilliams, F. J., & Sloane, N. J. A. (1977). The Theory of Error-correcting Codes. Amsterdam-
New York-Oxford: North Holland.
Massey, J. L., & Garcia, O. N. (1972). Error-correcting codes in computer arithmetic. In J. T. Ton
(Eds.), Advances in information systems science (Vol. 4, Ch. 5). Plenum Press.
Massey, J. L., Costello, D. J., & Justesen, J. (1973). Polynomial weights and code construction.
IEEE Trans. Info. Theory,19, 101–110.
McEliece, R. J. (1977). The theory of information and coding. In Encyclopedia of mathematics and
its applications (Vol. 3). Addison-Wesley.
McEliece, R. J. (1979). The bounds of Delsarte and Lovasz and their applications to coding theory.
In G. Longo (Eds.), Algebraic coding theory and applicationsCISM Courses and Lectures (Vol.
258). Springer.
McEliece, R. J., Rodemich, E. R., Rumsey, H. C., & Welch, L. R. (1977). New upper bounds on the
rate of a code via the Delsarte- MacWilliams inequalities. IEEE Trans. Info. Theory,23, 157–166.
Peek, J. H. (1985). Communications Aspects of the Compact Disc Digital Audio System. IEEE
Communications Magazine, 23(2), 7–15.
Peterson, W. W., & Weldon, E. J. (1972). Error-correcting codes (2nd Edn). MIT Press.
Piret, P. (1977). Algebraic properties of convolutional codes with automorphisms, Ph.D. Disserta-
tion. University of Catholique de Louvain.
Piret, P. (1988). Convolutional codes, an algebraic approach. The MIT Press.
Posner, E. C. (1968). Combinatorial structures in planetary reconnaissance. In E. B. Mann (Eds.),
Error correcting codes (pp. 15–46). Wiley.
Rao, T. R. N. (1974). Error Coding for Arithmetic Processors. New York-London: Academic Press.

Roos, C. (1979). On the structure of convolutional and cyclic convolutional codes. IEEE Trans.
Info. Theory, 25, 676–683.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technology Journal,
27, 379–423, 623–656.
Sloane, N. J. A., Reddy, S. M., & Chen, C. L. (1972). New binary codes. IEEE Trans. Info. Theory,18,
503–510.
Solomon, G., & van Tilborg, H. C. A. A connection between block and convolutional codes. SIAM
Journal of Applied Mathematics, 37, 358–369.
Stichtenoth, H. (1993). Algebraic function fields and codes. Springer, Universitext.
Tietäväinen, A. (1973). On the nonexistence of perfect codes over finite fields. SIAM Journal of Applied Mathematics, 24, 88–96.
van der Geer, G., & van Lint, J. H. (1988). Introduction to coding theory and algebraic geometry.
Birkhäuser.
van Lint, J. H. (1971). Nonexistence theorems for perfect error-correcting codes. In Computers in
Algebra and Theory (Vol. IV) (SIAM-AMS Proceedings).
van Lint, J. H. (1972). A new description of the Nadler code. IEEE Trans Info Theory, 18, 825–826.
van Lint, J. H. (1975). A survey of perfect codes. Rocky Mountain Journal of Math, 5, 199–224.
van Lint, J. H. (1990). Algebraic geometric codes. In D. Ray-Chaudhuri (Eds.), Coding theory and
design theory I, The IMA Volumes in Math and Appl 20. Springer.
van Lint, J. H. (1999). Introduction to coding theory, GTM86, Springer.
van Lint, J. H., & Macwilliams, F. J. (1978). Generalized quadratic residue codes. IEEE Trans Info
Theory, 24, 730–737.
van Lint, J. H., & Wilson, R. M. (1992). A course in combinatorics. Cambridge University Press.
van Oorschot, P. C., & Vanstone, S. A. (1989). An introduction to error correcting codes with
applications. Kluwer.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 3
Shannon Theory

3.1 Information Space

According to Shannon, a message x is a random event. Let p(x) be the probability


of occurrence of event x. If p(x) = 0, this event does not occur; If p(x) = 1, this
event must occur. When p(x) = 0 or p(x) = 1, information x can be called trivial
information or spam information. Therefore, the real mathematical significance of
information x lies in its uncertainty, that is 0 < p(x) < 1. Quantitative research on
the uncertainty of nontrivial information constitutes all the starting point of Shannon’s
theory; this starting point is now called information quantity or information entropy,
or entropy for short. Shannon and his colleagues at Bell laboratory considered “bit”
as the basic quantitative unit of information. What is “bit”? We can simply understand
it as the number of digits in the binary system. According to Shannon, the binary system with $n$ digits can express up to $2^n$ numbers, and from the point of view of probability and statistics, the probability of occurrence of each of these $2^n$ numbers is $\frac{1}{2^n}$. Therefore, a bit is the amount of information contained in an event $x$ with probability $\frac{1}{2}$. Taking this as the starting point, Shannon defined the self-information $I(x)$ contained in an information $x$ as
$$I(x) = -\log_2 p(x). \tag{3.1}$$
Therefore, one piece of information $x$ contains $I(x)$ bits of information; when $p(x) = \frac{1}{2}$, then $I(x) = 1$. Equation (3.1) is Shannon's first extraordinary progress in information
quantification. On the other hand, with the emergence of Telegraph and telephone,
binary is widely used in the conversion and transmission of information. Therefore,
we can assert that without binary, there would be no Shannon’s theory, let alone the
current informatics and information age. The purpose of this section is to strictly
mathematically deduce and simplify the most basic and important conclusions in
Shannon’s theory. First, we start with the rationality of the definition of formula (3.1).
If I (x) is used to represent the self-information of a random event x, the greater the
probability of occurrence p(x), the smaller the uncertainty. Therefore, I (x) should
be a monotonic decreasing function of probability p(x). If x y is a joint event and

© The Author(s) 2022 91


Z. Zheng, Modern Cryptography Volume 1, Financial Mathematics and Fintech,
https://doi.org/10.1007/978-981-19-0920-7_3
92 3 Shannon Theory

is statistically independent, that is, p(x y) = p(x) p(y), then the self-information
amount is I (x y) = I (x) + I (y). Of course, the self-information amount I (x) is
nonnegative, that is I (x) ≥ 0. Shannon prove, the self-information I (x) satisfying
the above three assumptions must be

I (x) = −c log p(x),

where c is a constant. This conclusion can be derived directly from the following
mathematical theorems.
Lemma 3.1 If the real function f (x) satisfies the following conditions in interval
[1, +∞):
(i) f (x) ≥ 0,
(ii) If x < y ⇒ f (x) < f (y),
(iii) f (x y) = f (x) + f (y).
Then f (x) = c log x, where c is a constant.
Proof Repeatedly using condition (iii), we have
$$f(x^k) = k f(x), \quad k \ge 1$$
for any positive integer $k$. Taking $x = 1$, this formula forces $f(1) = 0$. It can be seen from (ii) that $f(x) > 0$ when $x > 1$. Let $x > 1$, $y > 1$ and $k \ge 1$ be given; one can always find a nonnegative integer $n$ satisfying
$$y^n \le x^k < y^{n+1}.$$
Taking logarithms on both sides gives
$$\frac{n}{k} \le \frac{\log x}{\log y} < \frac{n+1}{k}.$$
On the other hand, we have
$$n f(y) \le k f(x) < (n+1) f(y),$$
thus
$$\left|\frac{f(x)}{f(y)} - \frac{\log x}{\log y}\right| \le \frac{1}{k};$$
when $k \to \infty$, we have
$$\frac{f(x)}{f(y)} = \frac{\log x}{\log y}, \quad \forall x, y \in (1, +\infty).$$

Therefore,
$$\frac{f(x)}{\log x} = \frac{f(y)}{\log y} = c, \quad \forall x, y \in (1, +\infty),$$
that is $f(x) = c\log x$. The Lemma holds.


In Lemma 3.1, let $I(x) = f\left(\frac{1}{p(x)}\right)$; then $f(x)$ satisfies the conditions (i), (ii) and (iii), thus $I(x) = -c\log p(x)$, that is, (3.1) holds.
In order to introduce the definition of information space, we use X to represent a
finite set of original information, or a countable and additive information set, which
is called source state set. It can be an alphabet, a finite number of symbols or a
set of numbers. For example, 26 letters in English and 2-element finite field F2 are
commonly used source state sets. Elements in X can be called messages, events,
etc., or characters. We often use English capital letters such as X, Y, Z to represent a
source state set, and lowercase Greek letters ξ, η, . . . to represent a random variable
in a given probability space.
Definition 3.1 The value space of a random variable ξ is a source state set X ; the
probability distribution of characters on X as events is defined as

p(x) = P{ξ = x}, ∀ x ∈ X. (3.2)

We call (X, ξ ) an information space in a given probability space, when the random
variable ξ is clear, we usually record the information space (X, ξ ) as X . If η is another
random variable valued on X , and ξ and η obey the same probability distribution,
that is
P{ξ = x} = P{η = x}, ∀ x ∈ X.

Call two information spaces (X, ξ ) = (X, η), usually recorded as X .


As can be seen from Definition 3.1, an information space $X$ constitutes a finite complete event group, that is, we have
$$\sum_{x \in X} p(x) = 1, \quad 0 \le p(x) \le 1, \ x \in X. \tag{3.3}$$

It should be noted that if there are two random variables ξ and η with values on X ,
when the probability distributions obeyed by ξ and η are not equal, then (X, ξ ) and
(X, η) are two different information spaces; at this point, we must distinguish the
two different information spaces with X 1 = (X, ξ ) and X 2 = (X, η).
Definition 3.2 X and Y are two source state sets, and the random variables ξ and η
are taken on X and Y , respectively; if ξ and η are compatible random variables, the
probability distribution of joint event x y(x ∈ X, y ∈ Y ) is defined as

p(x y) = P{ξ = x, η = y}, ∀ x ∈ X, y ∈ Y. (3.4)

Then, we call the joint event set



X Y = {x y|x ∈ X, y ∈ Y }

Together with the corresponding random variables ξ and η, it is called the product
space of information space (X, ξ ) and (Y, η), denote as (X Y, ξ, η), when ξ and η
are clear, they can be abbreviated as X Y = (X Y, ξ, η). If X = Y are two identical
source state sets, ξ and η have the same probability distribution, then the product
space X Y is denoted as X 2 and is called a power space.

Since the information space is a complete set of events, for the product information space we have the following full probability formulas and probability product formula:
$$\sum_{x \in X} p(xy) = p(y), \ \forall y \in Y; \qquad \sum_{y \in Y} p(xy) = p(x), \ \forall x \in X, \tag{3.5}$$
and
$$p(x)p(y|x) = p(xy), \quad \forall x \in X, y \in Y,$$
where $p(y|x)$ is the conditional probability of $y$ under the condition $x$.

Definition 3.3 Let $X_1, X_2, \dots, X_n$ $(n \ge 2)$ be $n$ source state sets and $\xi_1, \xi_2, \dots, \xi_n$ be $n$ compatible random variables taking values in the $X_i$, respectively; the probability distribution of the joint event $x_1x_2\cdots x_n$ is
$$p(x_1x_2\cdots x_n) = P\{\xi_1 = x_1, \xi_2 = x_2, \dots, \xi_n = x_n\}. \tag{3.6}$$
Then
$$X_1X_2\cdots X_n = \{x_1x_2\cdots x_n \mid x_i \in X_i, 1 \le i \le n\}$$
is called the product of the $n$ information spaces. Especially when $X_1 = X_2 = \cdots = X_n = X$ and each $\xi_i$ has the same probability distribution on $X$, we define $X^n = X_1X_2\cdots X_n$, called the $n$-th power space of the information space $X$.

Let us give some classic examples of information space.


Example 3.1 (Two point information space with parameter λ) Let X = {0, 1} = F2
be a binary finite field, the random variable ξ taken on X is subject to the two-point
distribution with parameter λ, that is

p(0) = P{ξ = 0} = λ,
p(1) = P{ξ = 1} = 1 − λ.

where 0 < λ < 1, then (X, ξ ) is called a two-point information space with parameter
λ, still denote as X .

Example 3.2 (Equal probability information space) Let $X = \{x_1, x_2, \dots, x_n\}$ be a source state set on which the random variable $\xi$ obeys the equal probability distribution, that is
$$p(x) = P\{\xi = x\} = \frac{1}{|X|}, \quad \forall x \in X.$$
Then $(X, \xi)$ is called the equal probability information space, still denoted by $X$.


Example 3.3 (Bernoulli information space) Let $X_0 = \{0, 1\} = \mathbb{F}_2$ and let the random variable $\xi_i$ be the $i$-th Bernoulli trial, so that $\{\xi_i\}_{i=1}^{n}$ is a set of independent and identically distributed random variables. We form the product space
$$X = (X_0, \xi_1)(X_0, \xi_2)\cdots(X_0, \xi_n) = X_0^n \subset \mathbb{F}_2^n;$$
the power space $X$ is called the Bernoulli information space, also called the memoryless binary information space. The probability function $p(x)$ on $X$ is
$$p(x) = p(x_1x_2\cdots x_n) = \prod_{i=1}^{n} p(x_i), \quad x_i = 0 \text{ or } 1, \tag{3.7}$$
where $p(0) = \lambda$, $p(1) = 1 - \lambda$.


Example 3.4 (Degenerate information space) If X = {x}, it contains only one char-
acter. X is called a degenerate information space, or trivial information space. The
random variable ξ takes the value x of probability 1, that is P{ξ = x} = 1. At this
time, ξ is a random variable with degenerate distribution in probability.
Definition 3.4 Let $X = \{x_1, x_2, \dots, x_n\}$ be a source state set. If $X$ is an information space, the information entropy $H(X)$ of $X$ is defined as
$$H(X) = -\sum_{x \in X} p(x)\log p(x) = -\sum_{i=1}^{n} p(x_i)\log p(x_i); \tag{3.8}$$
if $p(x_i) = 0$ in the above formula, we agree that $p(x_i)\log p(x_i) = 0$. The base of the logarithm can be selected arbitrarily; if the base of the logarithm is $D$ $(D \ge 2)$, then $H(X)$ is called the $D$-ary entropy, sometimes denoted $H_D(X)$.
Theorem 3.1 For any information space $X$, we always have
$$0 \le H(X) \le \log|X|. \tag{3.9}$$
And $H(X) = 0$ if and only if $X$ is a degenerate information space; $H(X) = \log|X|$ if and only if $X$ is an equal probability information space.
Proof $H(X) \ge 0$ is trivial. We only prove the inequality on the right of Eq. (3.9). Because $f(x) = \log x$ is strictly concave, from Lemma 1.7 in Chap. 1 (Jensen's inequality), taking $g(x) = \frac{1}{p(x)}$ as a positive function ($p(x) > 0$) and $X = \{x_1, x_2, \dots, x_m\}$,
$$H(X) = \sum_{i=1}^{m} p(x_i)\log\frac{1}{p(x_i)} \le \log\sum_{i=1}^{m} p(x_i)\,\frac{1}{p(x_i)} = \log m.$$
The equality holds if and only if $p(x_1) = p(x_2) = \cdots = p(x_m) = \frac{1}{m}$, that is, $X$ is the equal probability information space. If $X = \{x\}$ is a degenerate information space, because $p(x) = 1$, we get $H(X) = 0$. Conversely, if $H(X) = 0$, let $X = \{x_1, x_2, \dots, x_m\}$ and suppose $\exists x_i \in X$ such that $0 < p(x_i) < 1$; then
$$0 < p(x_i)\log\frac{1}{p(x_i)} \le H(X),$$
a contradiction. So there is some $p(x_i) = 1$ and $p(x_j) = 0$ $(j \ne i)$; at this time, $X$ degenerates into $X = \{x_i\}$, a trivial information space. The Theorem holds.
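The bounds of Theorem 3.1 are easy to observe numerically; here is a tiny Python sketch (ours) computing $H(X)$ for a few distributions on a 4-letter source:

```python
# Information entropy H(X) = -sum p(x) log2 p(x), with 0*log 0 = 0 by convention.
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist if p > 0)

print(entropy([1.0, 0, 0, 0]))             # 0.0: degenerate information space
print(entropy([0.25] * 4))                 # 2.0 = log2 |X|: equal probability
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75: strictly between the bounds
```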
An information space is a dynamic code (it changes with the random variable on it). For a "dynamic code", that is, for an information space $X$, Shannon measures the code rate by the information entropy, so the information entropy $H(X)$ becomes the first mathematical quantity describing a dynamic code. From Theorem 3.1, when the code is degenerate, the minimum rate of a dynamic code is 0; when the code is of equal probability, the maximum rate $\log|X|$ is the rate of the usual static code.
Next, we discuss the information entropy of several typical information spaces.
Example 3.5 (i) Let X be the two-point information space of parameter λ; then

H (X ) = −λ log λ − (1 − λ) log(1 − λ) = H (λ).

We defined H (λ) in Chap. 1, where it was called the binary information entropy function;
now we know why it is called an entropy function.
(ii) If X = {x} is a degenerate information space, then H (X ) = 0.
(iii) When X is an equal probability information space, then H (X ) = log |X |.
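These three cases are easy to verify numerically. The following Python sketch (with an illustrative λ = 0.3) computes the entropy of each case and shows that the bounds of Theorem 3.1 hold:

```python
from math import log2

def entropy(p):
    """Entropy H(X) of a probability vector p, formula (3.8);
    terms with p_i = 0 contribute 0 by convention."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

lam = 0.3
print(entropy([lam, 1 - lam]))  # (i) two-point space: H(lambda) ~ 0.881
print(entropy([1.0]))           # (ii) degenerate space: 0
print(entropy([0.25] * 4))      # (iii) equal probability: log2(4) = 2
```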
Remark Most authors directly regard a random variable as an information space.
Mathematically, it is convenient to do so, and the resulting theory is called the information
measure of random variables. However, from the perspective of information, using the concept
of information space gives a better understanding and simplification of Shannon’s theory; the core
idea of this theory is the random measurement of information, not the information
measurement of random variables.

3.2 Joint Entropy, Conditional Entropy, Mutual Information

Definition 3.5 Let X, Y be two information spaces, and ξ, η be the random variables
taking values on them, respectively. If ξ and η are independent random variables,
that is,

P{ξ = x, η = y} = P{ξ = x} · P{η = y}, ∀ x ∈ X, y ∈ Y,

then X and Y are called independent information spaces, and the probability distribution
of joint events is
p(x y) = p(x) p(y), ∀ x ∈ X, y ∈ Y.

Definition 3.6 Let X, Y be two information spaces; the information entropy H (X Y )
of the product space X Y is called the joint entropy of X and Y , that is,

H (X Y ) = − ∑_{x∈X} ∑_{y∈Y} p(x y) log p(x y). (3.10)

The conditional entropy H (X |Y ) of X versus Y is defined as

H (X |Y ) = − ∑_{x∈X} ∑_{y∈Y} p(x y) log p(x|y). (3.11)

Lemma 3.2 (Addition formula of entropy) For any two information spaces X and
Y , we have

H (X Y ) = H (X ) + H (Y |X ) = H (Y ) + H (X |Y ).

Generally, for n information spaces X 1 , X 2 , . . . , X n , we have

H (X 1 X 2 · · · X n ) = ∑_{i=1}^n H (X i |X i−1 X i−2 · · · X 1 ). (3.12)

Proof By (3.10) and the probability multiplication formula,

H (X Y ) = − ∑_{x∈X} ∑_{y∈Y} p(x y) log p(x y)
         = − ∑_{x∈X} ∑_{y∈Y} p(x y)(log p(x) + log p(y|x))
         = − ∑_{x∈X} p(x) log p(x) + H (Y |X )
         = H (X ) + H (Y |X ).

In the same way it can be proved that

H (X Y ) = H (Y ) + H (X |Y ).

We prove (3.12) by induction. When n = 2,

H (X 1 X 2 ) = H (X 1 ) + H (X 2 |X 1 ),

so the proposition is true; for general n, we have

H (X 1 X 2 · · · X n ) = H (X 1 X 2 · · · X n−1 ) + H (X n |X 1 X 2 · · · X n−1 )
                     = ∑_{i=1}^{n−1} H (X i |X i−1 X i−2 · · · X 1 ) + H (X n |X 1 X 2 · · · X n−1 )
                     = ∑_{i=1}^n H (X i |X i−1 X i−2 · · · X 1 ).

Lemma 3.2 holds.
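The addition formula is easy to check on a concrete joint distribution. A minimal Python sketch (the joint probabilities below are chosen arbitrarily for illustration):

```python
from math import log2

pxy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # p(xy)

H_XY = -sum(p * log2(p) for p in pxy.values())                   # (3.10)
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
H_X = -sum(p * log2(p) for p in px.values())
# H(Y|X) = -sum p(xy) log p(y|x), with p(y|x) = p(xy)/p(x), as in (3.11)
H_Y_given_X = -sum(p * log2(p / px[x]) for (x, _), p in pxy.items())

assert abs(H_XY - (H_X + H_Y_given_X)) < 1e-12  # H(XY) = H(X) + H(Y|X)
```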

Theorem 3.2 We have

H (X Y ) ≤ H (X ) + H (Y ), (3.13)

with equality if and only if X and Y are statistically independent information spaces:

H (X Y ) = H (X ) + H (Y ). (3.14)

Generally, we have

H (X 1 X 2 · · · X n ) ≤ H (X 1 ) + H (X 2 ) + · · · + H (X n ), (3.15)

with equality if and only if X 1 , X 2 , . . . , X n form an independent random process:

H (X 1 X 2 · · · X n ) = H (X 1 ) + H (X 2 ) + · · · + H (X n ). (3.16)

Proof By the definition and Jensen’s inequality, we have

H (X Y ) − H (X ) − H (Y ) = ∑_{x∈X} ∑_{y∈Y} p(x y) log (p(x) p(y) / p(x y))
                           ≤ log ∑_{x∈X} ∑_{y∈Y} p(x) p(y)
                           = 0.

The above equal sign holds if and only if for all x ∈ X , y ∈ Y , p(x) p(y)/ p(x y) = c (where
c is a constant), thus p(x) p(y) = c p(x y). Summing both sides over all x and y, we have

1 = ∑_{x∈X} ∑_{y∈Y} p(x) p(y) = c ∑_{x∈X} ∑_{y∈Y} p(x y),

thus c = 1 and p(x y) = p(x) p(y). So (3.14) holds if and only if X and Y are independent
information spaces. By induction, we have (3.15) and (3.16). Theorem 3.2 holds.
By (3.15), we have the following direct corollary: for any information space X
and n ≥ 1, we have
H (X^n ) ≤ n H (X ). (3.17)

Definition 3.7 Let X and Y be two information spaces. We say that X is completely
determined by Y if for any given x ∈ X there is a subset N x ⊂ Y satisfying

p(x|y) = 1, if y ∈ N x ;   p(x|y) = 0, if y ∉ N x . (3.18)

With regard to the conditional information entropy H (X |Y ), we have the following
two important special cases.
Lemma 3.3 (i) 0 ≤ H (X |Y ) ≤ H (X ).
(ii) If the information space X is completely determined by Y , then

H (X |Y ) = 0. (3.19)

(iii) If X and Y are two independent information spaces, then

H (X |Y ) = H (X ). (3.20)

Proof (i) is trivial. Let us prove (3.19) first. By Definition 3.7 and (3.18), for given
x ∈ X , we have
p(x y) = p(y) p(x|y) = 0, y ∉ N x .

Thus
H (X |Y ) = − ∑_{x∈X} ∑_{y∈Y} p(x y) log p(x|y)
          = − ∑_{x∈X} ∑_{y∈N x} p(x y) log p(x|y) = 0.

The proof of formula (3.20) is obvious. Because X and Y are independent, the
conditional probability satisfies

p(x|y) = p(x), ∀ x ∈ X, y ∈ Y.

Thus
H (X |Y ) = − ∑_{x∈X} ∑_{y∈Y} p(x) p(y) log p(x)
          = − ∑_{x∈X} p(x) log p(x) = H (X ).

Lemma 3.3 holds.


Next, we define the mutual information I (X, Y ) of two information spaces X and
Y.
Definition 3.8 Let X and Y be two information spaces, and then their mutual infor-
mation I (X, Y ) is defined as
 p(x|y)
I (X, Y ) = p(x y) log . (3.21)
x∈X y∈Y
p(x)

From the multiplication formula of probability, for all x ∈ X, y ∈ Y ,

p(x) p(y|x) = p(y) p(x|y) = p(x y).

We have
p(x|y) / p(x) = p(y|x) / p(y).

Therefore, there is a direct conclusion from the definition of mutual information:

I (X, Y ) = I (Y, X ).

Lemma 3.4

I (X, Y ) = H (X ) − H (X |Y ) = H (Y ) − H (Y |X ).

Proof By definition,

I (X, Y ) = ∑_{x∈X} ∑_{y∈Y} p(x y) log (p(x|y) / p(x))
          = ∑_{x∈X} ∑_{y∈Y} p(x y) log p(x|y) − ∑_{x∈X} ∑_{y∈Y} p(x y) log p(x)
          = −H (X |Y ) − ∑_{x∈X} p(x) log p(x)
          = H (X ) − H (X |Y ).

In the same way it can be proved that

I (X, Y ) = H (Y ) − H (Y |X ).

Lemma 3.5 Assume that X and Y are two information spaces and I (X, Y ) is their
mutual information; then

H (X Y ) = H (X ) + H (Y ) − I (X, Y ). (3.22)

Further, we have I (X, Y ) ≥ 0, and I (X, Y ) = 0 if and only if X and Y are independent.

Proof By the addition formula of Lemma 3.2,

H (X Y ) = H (X ) + H (Y |X )
         = H (X ) + H (Y ) − (H (Y ) − H (Y |X ))
         = H (X ) + H (Y ) − I (X, Y ).

The conclusion about I (X, Y ) ≥ 0 can be deduced directly from Theorem 3.2.
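Formula (3.22) and the nonnegativity of I (X, Y ) can likewise be checked numerically; the joint distribution below is again an arbitrary illustration:

```python
from math import log2

pxy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in pxy.items() if b == y) for y in (0, 1)}

# I(X,Y) from definition (3.21), using p(x|y) = p(xy)/p(y)
I = sum(p * log2((p / py[y]) / px[x]) for (x, y), p in pxy.items())

H = lambda d: -sum(p * log2(p) for p in d.values())
assert abs(H(pxy) - (H(px) + H(py) - I)) < 1e-12  # formula (3.22)
assert I >= 0
```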

Let us prove an equation about entropy commonly used in the statistical analysis
of cryptography.

Theorem 3.3 If X, Y, Z are three information spaces, then

H (X Y |Z ) = H (X |Z ) + H (Y |X Z ) = H (Y |Z ) + H (X |Y Z ). (3.23)

Proof By the definition, we have

H (X Y |Z ) = − ∑_{x∈X} ∑_{y∈Y} ∑_{z∈Z} p(x yz) log p(x y|z).

By the probability product formula,

p(x yz) = p(z) p(x y|z) = p(x z) p(y|x z).

Thus
p(x y|z) = p(x z) p(y|x z) / p(z) = p(x|z) p(y|x z).

So we have

H (X Y |Z ) = − ∑_{x∈X} ∑_{y∈Y} ∑_{z∈Z} p(x yz) log p(x|z) p(y|x z)
            = − ∑_{x∈X} ∑_{y∈Y} ∑_{z∈Z} p(x yz)(log p(x|z) + log p(y|x z))
            = − ∑_{x∈X} ∑_{z∈Z} p(x z) log p(x|z) − ∑_{x∈X} ∑_{y∈Y} ∑_{z∈Z} p(x yz) log p(y|x z)
            = H (X |Z ) + H (Y |X Z ).

Similarly, the second formula can be proved.

Finally, we extend the formula (3.15) to conditional entropy.



Lemma 3.6 Let X 1 , X 2 , . . . , X n , Y be information spaces, then we have

H (X 1 X 2 · · · X n |Y ) ≤ H (X 1 |Y ) + · · · + H (X n |Y ). (3.24)

Specially, when X 1 = X 2 = · · · = X n = X ,

H (X n |Y ) ≤ n H (X |Y ). (3.25)

Proof We induct on n. The proposition is trivial when n = 1. Suppose the
proposition is true for n, i.e.,

H (X 1 X 2 · · · X n |Y ) ≤ H (X 1 |Y ) + · · · + H (X n |Y ).

For n + 1, we let X = X 1 X 2 · · · X n ; then

H (X 1 X 2 · · · X n+1 |Y ) = H (X X n+1 |Y )
                          = − ∑_{x∈X} ∑_{z∈X n+1} ∑_{y∈Y} p(x zy) log p(x z|y).

From the total probability formula,

H (X |Y ) + H (X n+1 |Y ) = − ∑_{x∈X} ∑_{z∈X n+1} ∑_{y∈Y} p(x zy) log p(x|y) p(z|y).

So by Jensen’s inequality,

H (X X n+1 |Y ) − H (X |Y ) − H (X n+1 |Y )
   = ∑_{x∈X} ∑_{z∈X n+1} ∑_{y∈Y} p(x zy) log (p(x|y) p(z|y) / p(x z|y))
   ≤ log ∑_{x∈X} ∑_{z∈X n+1} ∑_{y∈Y} p(y) p(x|y) p(z|y).

By the product formula,

∑_{x∈X} ∑_{z∈X n+1} ∑_{y∈Y} p(y) p(x|y) p(z|y) = ∑_{x∈X} ∑_{y∈Y} p(x|y) p(y) = ∑_{x∈X} p(x) = 1.

So by the inductive hypothesis,

H (X X n+1 |Y ) ≤ H (X n+1 |Y ) + H (X |Y )
             ≤ H (X 1 |Y ) + H (X 2 |Y ) + · · · + H (X n+1 |Y ).

The proposition holds for n + 1. So the Lemma holds.

3.3 Redundancy

Select an alphabet Fq or a residue class ring Zm modulo m; each element of the
alphabet is called a character. In the field of communication, the alphabet is also called
the source state set, and a character is also called a transmission signal. If the length of a q-ary
code is increased, redundant transmission signals or characters will appear in each
codeword. The numerical measurement of these “redundant characters” is called redundancy,
which is a technical means to improve the accuracy of codeword transmission;
redundancy is an important mathematical quantity describing this technical means.
Therefore, we start by proving the following lemma.

Lemma 3.7 Let X, Y, Z be three information spaces; then

H (X |Y Z ) ≤ H (X |Z ). (3.26)

Proof By the total probability formula,

H (X |Z ) = − ∑_{x∈X} ∑_{z∈Z} p(x z) log p(x|z)
          = − ∑_{x∈X} ∑_{z∈Z} ∑_{y∈Y} p(x yz) log p(x|z).

So
H (X |Y Z ) − H (X |Z ) = ∑_{x∈X} ∑_{y∈Y} ∑_{z∈Z} p(x yz) log (p(x|z) / p(x|zy))
                        ≤ log ∑_{x∈X} ∑_{y∈Y} ∑_{z∈Z} p(yz) p(x|z)
                        = log ∑_{x∈X} ∑_{z∈Z} p(z) p(x|z)
                        = 0.

Thus H (X |Y Z ) ≤ H (X |Z ). The Lemma holds.



Let X be a source state set; randomly selecting codewords from X to enter the channel of
information transmission is a discrete random process. This mathematical model can
be constructed and studied on X through the values of a group of random variables
{ξi }_{i≥1}. Firstly, we assume that {ξi }_{i≥1} obey the same probability distribution when
taking values on X , and we get a set of information spaces {X i }_{i≥1}. Let H0 = log |X |
be the entropy of X as the equal probability information space; for n ≥ 1, we let

Hn = H (X |X^{n−1} ), H1 = H (X ).

By Lemma 3.7, {Hn } constitutes a monotonically decreasing sequence with a lower
bound, so its limit exists, that is,

lim_{n→∞} Hn = a (a ≥ 0). (3.27)

We will extend the above observation to the general case: let {ξi }_{i≥1} be any set of
random variables valued on X ; for any n ≥ 1, we let

X n = (X, ξn ), n ≥ 1.

Definition 3.9 Let a source state set X be equipped with a set of random variables {ξi }_{i≥1}
valued on X ; then X is called a source.
(i) If {ξi }_{i≥1} is a group of independent and identically distributed random variables,
X is called a memoryless source.
(ii) If for any integers k, t1 , t2 , . . . , tk and h, the random vectors

(ξ_{t1} , ξ_{t2} , . . . , ξ_{tk} ) and (ξ_{t1 +h} , ξ_{t2 +h} , . . . , ξ_{tk +h} )

obey the same joint probability distribution, then X is called a stationary source.
(iii) If {ξi }_{i≥1} is a k-order Markov process, that is, for ∀ m > k ≥ 1,

p(xm |xm−1 xm−2 · · · x1 ) = p(xm |xm−1 xm−2 · · · xm−k ), ∀ x1 , x2 , . . . , xm ∈ X,

then X is called a k-order Markov source; specially, when k = 1, i.e.,

p(xm |xm−1 xm−2 · · · x1 ) = p(xm |xm−1 ), ∀ x1 , x2 , . . . , xm ∈ X,

X is called a Markov source.


The concept from information space to source changes from a single random
variable taking value on X to an infinite dimensional random vector, so that the
transmission process of code X constitutes a discrete random process. By definition,
we have

Lemma 3.8 Let X be a source state set, and {ξi }_{i≥1} be a set of random variables
valued on X ; we write
X i = (X, ξi ), i ≥ 1. (3.28)

(i) If X is a memoryless source, the joint probability distribution on X satisfies

p(x1 x2 · · · xn ) = ∏_{i=1}^n p(xi ), xi ∈ X i , n ≥ 1. (3.29)

(ii) If X is a stationary source, then for all integers t1 , t2 , . . . , tk (k ≥ 1) and h, there
is the following joint probability distribution,

p(x_{t1} x_{t2} · · · x_{tk} ) = p(x_{t1 +h} x_{t2 +h} · · · x_{tk +h} ), (3.30)

where xi ∈ X i , i ≥ 1.
(iii) If X is a stationary Markov source, then the conditional probability distribution
on X satisfies, for any m ≥ 1 and x1 x2 · · · xm ∈ X 1 X 2 · · · X m ,

p(xm |x1 · · · xm−1 ) = p(xm |xm−1 ) = P{ξ_{i+1} = xm |ξi = xm−1 }, ∀ 1 ≤ i ≤ m − 1. (3.31)

Proof (i) and (ii) can be derived directly from the definition. We only prove (iii). By
(ii) of Definition 3.9, for ∀ i ≥ 1, we have

P{ξi = xm−1 , ξi+1 = xm } = P{ξm−1 = xm−1 , ξm = xm }

and
P{ξi = xm−1 } = P{ξm−1 = xm−1 }.

Thus
P{ξi = xm−1 }P{ξi+1 = xm |ξi = xm−1 }
= P{ξm−1 = xm−1 }P{ξm = xm |ξm−1 = xm−1 }.

We have
P{ξi+1 = xm |ξi = xm−1 } = p(xm |xm−1 ).

The Lemma holds.

Corollary 3.1 A memoryless source X must be a stationary source.

Proof Derived directly from Definition 3.9.

Next, we extend the limit formula in memoryless sources revealed by formula


(3.27) to general stationary sources. For this purpose, we first prove two lemmas.

Lemma 3.9 Let { f (n)}_{n≥1} be a sequence of real numbers which satisfies the following
subadditivity property,

f (n + m) ≤ f (n) + f (m), ∀ n ≥ 1, m ≥ 1.

Then lim_{n→∞} (1/n) f (n) exists, and

lim_{n→∞} (1/n) f (n) = inf { (1/n) f (n) | n ≥ 1 }. (3.32)

Proof Let
δ = inf { (1/n) f (n) | n ≥ 1 }, and first suppose δ ≠ −∞.

For any ε > 0, select a positive integer m so that

(1/m) f (m) < δ + ε/2.

Let n = am + b, where a is an integer and 0 ≤ b < m. By subadditivity, we
have
f (n) ≤ a f (m) + (n − am) f (1) = a f (m) + b f (1).

Dividing both sides by n, we have

(1/n) f (n) ≤ (a/(am + b)) f (m) + (b/(am + b)) f (1).

For given b, when a is large enough, we have

(b f (1))/(am + b) < ε/2.

So there is
(1/n) f (n) < (1/m) f (m) + ε/2 < δ + ε. (3.33)

Thus we have
δ ≤ lim inf_{n→∞} (1/n) f (n) ≤ lim sup_{n→∞} (1/n) f (n) < δ + ε.

So
lim_{n→∞} (1/n) f (n) = δ.

If δ = −∞, by (3.33),
lim_{n→∞} (1/n) f (n) = −∞,

so we still have
lim_{n→∞} (1/n) f (n) = δ = −∞.

The Lemma holds.
Lemma 3.10 Let {an }_{n≥1} be a sequence of real numbers with limit lim_{n→∞} an = a;
then
lim_{n→∞} (1/n) ∑_{i=1}^n ai = a.

Proof Given ε > 0, choose N such that |ai − a| < ε for all i > N . Then for n > N ,

| (1/n) ∑_{i=1}^n ai − a | = | (1/n) ∑_{i=1}^n (ai − a) | ≤ (1/n) ∑_{i=1}^n |ai − a|
   = (1/n) ∑_{i=1}^N |ai − a| + (1/n) ∑_{i=N+1}^n |ai − a|
   < (1/n) ∑_{i=1}^N |ai − a| + ((n − N )/n) ε
   < (1/n) ∑_{i=1}^N |ai − a| + ε.

When ε > 0 is given, N is also given accordingly, and the first term of the above formula
tends to 0 as n → ∞. So for any ε > 0, when n > N0 ,

| (1/n) ∑_{i=1}^n ai − a | < 2ε.

Thus
lim_{n→∞} (1/n) ∑_{i=1}^n ai = a.

The Lemma holds.


With the above preparations, we now give the main results of this section.
Theorem 3.4 Let X be any source, {ξi }i1 is a set of random variables valued on
X . For any positive integer n  1, let

X n = (X, ξn ), n  1.

When X is a stationary source, the following two limits exist and
are equal:

lim_{n→∞} (1/n) H (X 1 X 2 . . . X n ) = lim_{n→∞} H (X n |X 1 X 2 . . . X n−1 ).

We denote the above common limit as H∞ (X ).


Proof Because X is a stationary source, for any n ≥ 1, m ≥ 1, the joint
probability distribution of the random vector (ξn+1 , ξn+2 , . . . , ξn+m ) on X is equal to the
joint probability distribution of the random vector (ξ1 , ξ2 , . . . , ξm ); therefore, we have

H (X 1 X 2 · · · X m ) = H (X n+1 X n+2 · · · X n+m ). (3.34)

By Theorem 3.2,

H (X 1 X 2 · · · X n X n+1 · · · X n+m ) ≤ H (X 1 · · · X n ) + H (X n+1 · · · X n+m )
                                    = H (X 1 · · · X n ) + H (X 1 · · · X m ).

Let f (n) = H (X 1 · · · X n ); then f (n + m) ≤ f (n) + f (m), so { f (n)}_{n≥1} is a nonnegative
real number sequence with the subadditivity property. By Lemma 3.9, we have

lim_{n→∞} (1/n) H (X 1 X 2 · · · X n ) = inf { (1/n) H (X 1 X 2 · · · X n ) | n ≥ 1 } ≥ 0.

Next, we prove that the second limit,

lim_{n→∞} H (X n |X 1 X 2 · · · X n−1 ),

exists. Firstly, we prove that the sequence is monotonically decreasing. Because X is a
stationary source,

H (X 1 X 2 · · · X n−1 ) = H (X 2 X 3 · · · X n )

and
H (X 2 X 3 · · · X n X n+1 ) = H (X 1 X 2 · · · X n ).

So we have
H (X n+1 |X 2 X 3 · · · X n ) = H (X n |X 1 X 2 · · · X n−1 ). (3.35)

By Lemma 3.7,

H (X n+1 |X 1 X 2 · · · X n ) ≤ H (X n+1 |X 2 X 3 · · · X n ) = H (X n |X 1 X 2 · · · X n−1 ).

So {H (X n |X 1 X 2 · · · X n−1 )}_{n≥1} is a monotonically decreasing sequence with a
lower bound, so lim_{n→∞} H (X n |X 1 X 2 · · · X n−1 ) exists. Further, by the addition formula
of Lemma 3.2,

(1/n) H (X 1 X 2 · · · X n ) = (1/n) ∑_{i=1}^n H (X i |X 1 X 2 · · · X i−1 ).

By Lemma 3.10, finally we have

lim_{n→∞} (1/n) H (X 1 X 2 · · · X n ) = lim_{n→∞} H (X n |X 1 X 2 · · · X n−1 ) = H∞ (X ).

This completes the proof of the Theorem.


We call H∞ (X ) the entropy rate of source X . obviously, there is the following
corollary.
Corollary 3.2 (i) For any stationary source X , we have

H∞ (X )  H (X 1 )  log |X |.

(ii) If X is a memoryless source, then

H∞ (X ) = H (X 1 ).

(iii) If X is a stationary Markov source, then

H∞ (X ) = H (X 2 |X 1 ).

Proof Since {H (X n |X 1 · · · X n−1 )}_{n≥1} is a monotonically decreasing sequence,

H∞ (X ) ≤ H (X 1 ).

That is, (i) holds. If X is a memoryless source, then

H (X 1 · · · X n ) = − ∑_{x1 ∈X 1} · · · ∑_{xn ∈X n} p(x1 x2 · · · xn ) log p(x1 x2 · · · xn )
                 = − ∑_{x1 ∈X 1} · · · ∑_{xn ∈X n} p(x1 . . . xn ) {log p(x1 ) + · · · + log p(xn )}
                 = n H (X 1 ).

So we have
H∞ (X ) = H (X 1 ).

Similarly, we can prove (iii).



Definition 3.10 Let X be a stationary source; we define

δ = log |X | − H∞ (X ), r = 1 − H∞ (X ) / log |X |, (3.36)

δ is the redundancy of the information space X , and r is the relative redundancy of X .
We write

H0 = log |X |, Hn = H (X n |X 1 X 2 · · · X n−1 ), ∀ n ≥ 1.

By Theorem 3.4, we have H∞ (X ) ≤ Hn and H∞ (X ) = (1 − r )H0 , so

Hn ≥ (1 − r )H0 , ∀ n ≥ 1. (3.37)

In information theory, redundancy is used to describe the effectiveness of the


information carried by the source output symbol. The smaller the redundancy, the
higher the effectiveness of the information carried by the source output symbol, and
vice versa.
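As an illustration of Definition 3.10 and Corollary 3.2 (iii), the following Python sketch computes the entropy rate and redundancy of a stationary binary Markov source; the transition matrix is an arbitrary illustrative choice:

```python
from math import log2

P = [[0.9, 0.1],
     [0.2, 0.8]]        # transition probabilities p(x_{n+1} | x_n)
pi = [2 / 3, 1 / 3]     # stationary distribution: pi P = pi

h = lambda row: -sum(p * log2(p) for p in row if p > 0)
H_inf = sum(pi[i] * h(P[i]) for i in range(2))  # H(X2|X1), Corollary 3.2(iii)

H0 = log2(2)                     # log|X| = 1 bit
delta = H0 - H_inf               # redundancy delta, formula (3.36)
r = 1 - H_inf / H0               # relative redundancy r
print(H_inf, delta, r)
```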

3.4 Markov Chain

Let X, Y, Z be three information spaces. If the following conditional probability
formula holds,
p(x y|z) = p(x|z) p(y|z), (3.38)

we say that X and Y are statistically independent under the given condition Z .
Definition 3.11 If the information spaces X and Y are statistically independent under
condition Z , then X, Y, Z is called a Markov chain, denoted as X → Z → Y .
Theorem 3.5 X → Z → Y is a Markov chain if and only if the probability of occur-
rence of the joint event x zy is

p(x zy) = p(x) p(z|x) p(y|z), (3.39)

if and only if
p(x zy) = p(y) p(z|y) p(x|z). (3.40)

Proof If X → Z → Y is a Markov chain, then p(x y|z) = p(x|z) p(y|z), thus

p(x zy) = p(z) p(x y|z)


= p(z) p(x|z) p(y|z)
= p(x) p(z|x) p(y|z).

Similarly,
p(x zy) = p(z) p(y|z) p(x|z)
= p(y) p(z|y) p(x|z).

That is, (3.39) and (3.40) hold. Conversely, if (3.39) holds, then

p(x zy) = p(x) p(z|x) p(y|z)


= p(z) p(x|z) p(y|z).

On the other hand, the product formula

p(x zy) = p(z) p(x y|z).

So we have
p(x y|z) = p(x|z) p(y|z).

That is X → Z → Y is a Markov chain. Similarly, if (3.40) holds, then X → Z → Y


also is a Markov chain. The Theorem holds.

According to the above Theorem, or by Definition 3.11, obviously, if X → Z →


Y is a Markov chain, then Y → Z → X is also a Markov chain.
Definition 3.12 Let U, X, Z , Y be four information spaces such that the probability of
the joint event ux zy is

p(ux zy) = p(u) p(x|u) p(z|x) p(y|z); (3.41)

then U, X, Z , Y is called a Markov chain, denoted as U → X → Z → Y .

Theorem 3.6 If U → X → Z → Y is a Markov chain, then U → X → Z and


U → Z → Y are also Markov chains.

Proof Assume that U → X → Z → Y is a Markov chain; then

p(ux zy) = p(u) p(x|u) p(z|x) p(y|z).

Summing both sides over y ∈ Y , and noticing that ∑_{y∈Y} p(y|z) = 1, we get

p(ux z) = p(u) p(x|u) p(z|x).

By Theorem 3.5, U → X → Z is a Markov chain. The left side of the above formula
can be expressed as
p(ux z) = p(ux) p(z|ux).

So we have
p(z|ux) = p(z|x).

Because U → X → Z → Y is a Markov chain,

p(ux zy) = p(u) p(x|u) p(z|x) p(y|z)
         = p(ux) p(z|ux) p(y|z)
         = p(ux z) p(y|z).

Summing both sides over x ∈ X , we have

p(uzy) = p(uz) p(y|z) = p(u) p(z|u) p(y|z).

Thus U → Z → Y is also a Markov chain. The Theorem holds.


In the previous section, we defined the mutual information I (X, Y ) of two infor-
mation spaces X and Y as
 p(x y)
I (X, Y ) = p(x y) log .
x∈X y∈Y
p(x) p(y)

Now we define the mutual information I (X, Y |Z ) of X and Y under condition Z as


 p(x y|z)
I (X, Y |Z ) = p(x yz) log . (3.42)
x∈X y∈Y z∈Z
p(x|z) p(y|z)

By definition, we have
I (X, Y |Z ) = I (Y, X |Z ). (3.43)

I (X, Y |Z ) is called the conditional mutual information of X and Y .


For conditional mutual information, we first prove the following formula.
Theorem 3.7 Let X, Y, Z be three information spaces, then
I (X, Y |Z ) = H (X |Z ) − H (X |Y Z ) (3.44)

and
I (X, Y |Z ) = H (Y |Z ) − H (Y |X Z ). (3.45)

Proof We only prove (3.44), the same is true for equation (3.45). Because
 p(x|yz)
H (X |Z ) − H (X |Y Z ) = p(x yz) log
x∈X y∈Y z∈Z
p(x|z)
 p(x y|z)
= p(x yz) log
x∈X y∈Y z∈Z
p(x|z) p(y|z)
= I (X, Y |Z ).

So (3.44) holds.

Corollary 3.3 We have I (X, Y |Z ) ≥ 0, and I (X, Y |Z ) = 0 if and only if X → Z → Y
is a Markov chain.

Proof By Theorem 3.7,

I (X, Y |Z ) = H (X |Z ) − H (X |Y Z ) ≥ 0.

If X → Z → Y is a Markov chain, by (3.42),

log (p(x y|z) / (p(x|z) p(y|z))) = log 1 = 0,

that is, I (X, Y |Z ) = 0. Vice versa.

Conditional mutual information can be used to establish the addition formula of
mutual information.
Corollary 3.4 (Addition formula of mutual information) If X 1 , X 2 , . . . , X n , Y are
information spaces, then

I (X 1 X 2 · · · X n , Y ) = ∑_{i=1}^n I (X i , Y |X i−1 · · · X 1 ). (3.46)

Specially, when n = 2, we have

I (X 1 X 2 , Y ) = I (X 1 , Y ) + I (X 2 , Y |X 1 ). (3.47)

Proof By Lemma 3.4, we have

I (X 1 X 2 · · · X n , Y ) = H (X 1 X 2 · · · X n ) − H (X 1 X 2 · · · X n |Y )
   = ∑_{i=1}^n H (X i |X i−1 · · · X 1 ) − ∑_{i=1}^n H (X i |X i−1 · · · X 1 Y ).

Again by the chain rule of conditional entropy, we get

I (X 1 X 2 · · · X n , Y ) = ∑_{i=1}^n I (X i , Y |X 1 X 2 · · · X i−1 ).

Therefore, the corollary holds.

Finally, we use Markov chains to prove an inequality of mutual information.

Theorem 3.8 Suppose X → Z → Y is a Markov chain; then we have

I (X, Y ) ≤ I (X, Z ) (3.48)

and
I (X, Y ) ≤ I (Y, Z ). (3.49)

Proof We only prove (3.48); the same argument gives Eq. (3.49). From Eq. (3.47)
and Corollary 3.3,

I (Y Z , X ) = I (Y, X ) + I (X, Z |Y ).

Thus we have
I (X, Y ) = I (X, Y Z ) − I (X, Z |Y )
≤ I (X, Y Z )
= I (X, Z ) + I (X, Y |Z )
= I (X, Z ).

In the last step, we use the Markov chain condition, thus I (X, Y |Z ) = 0. The
Theorem holds.
Theorem 3.9 (Data processing inequality) Suppose U → X → Y → V is a Markov
chain, then we have
I (U, V ) ≤ I (X, Y ).

Proof According to the conditions, U → X → Y and U → Y → V are Markov
chains (by Theorem 3.6); by Theorem 3.8,

I (U, Y ) ≤ I (X, Y )

and
I (U, V ) ≤ I (U, Y ).

Thus
I (U, V ) ≤ I (X, Y ).

The Theorem holds.
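The data processing inequality can be checked numerically by building a Markov chain from explicit transition kernels. A sketch, with all kernels chosen arbitrarily for illustration:

```python
from math import log2
from itertools import product

pu = [0.5, 0.5]
K = lambda e: [[1 - e, e], [e, 1 - e]]   # binary symmetric kernel
KX, KY, KV = K(0.1), K(0.2), K(0.3)      # chain U -> X -> Y -> V

def mi(pab):
    """Mutual information of a joint distribution {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in pab.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in pab.items() if p > 0)

puv, pxy = {}, {}
for u, x, y, v in product(range(2), repeat=4):
    p = pu[u] * KX[u][x] * KY[x][y] * KV[y][v]
    puv[(u, v)] = puv.get((u, v), 0) + p
    pxy[(x, y)] = pxy.get((x, y), 0) + p

assert mi(puv) <= mi(pxy) + 1e-12   # I(U,V) <= I(X,Y), Theorem 3.9
```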

3.5 Source Coding Theorem

Information coding theory is usually divided into two parts: channel coding
and source coding. The purpose of channel coding is to ensure the success rate of
decoding by increasing the length of codewords; channel coding, also known as
error correction coding, is discussed in detail in Chap. 2. Source coding is to compress
data with redundant information so as to improve the success rate of decoding and
recovery after information or data is stored. Another important result of Shannon’s
theory is that there are so-called good codes in source coding, characterized by
as few codewords as possible, to improve storage space efficiency, while the error
of decoding and restoration can be made arbitrarily small. Source coding is
also called typical coding. Shannon first proved the asymptotic equipartition property of
“block codes” for memoryless sources, and from this derived the statistical characteristics of
typical codes. At Shannon’s suggestion, McMillan (1953) and Breiman (1957)
also proved a similar asymptotic equipartition property for stationary ergodic sources.
This is the very famous Shannon–McMillan–Breiman theorem in source coding,
which constitutes the core content of modern typical code theory. The main purpose
of this section is to strictly prove the asymptotic equipartition property of memoryless sources, so
as to derive the source coding theorem for data compression (see Theorem 3.10). For
the more general Shannon–McMillan–Breiman theorem, Chap. 2 of Ye Zhongxing’s
Fundamentals of Information Theory (see Zhongxing, 2003 in reference 3) gives a
proof under the condition of stationary ergodic Markov sources; interested readers
can refer to it or to more original documents (see McMillan, 1953; Moy, 1961;
Shannon, 1959 in reference 3).
Firstly, let X = (X, ξ ) be an information space; the entropy H (X ) of X
essentially depends only on the probability function p(x) (x ∈ X ) of the random variable
ξ . We can define random variables taking values according to p(x):

η1 = p(X ), η2 = log p(X ). (3.50)

The probability function is

P{η1 takes the value p(x)} = P{η2 takes the value log p(x)} = p(x). (3.51)

It is easy to see the expected value of η2 :

−E(η2 ) = −E(log p(X )) = − ∑_{x∈X} p(x) log p(x) = H (X ). (3.52)

Therefore, we can regard the entropy H (X ) of X as the mathematical expectation of
the random variable log (1/p(X )).

Lemma 3.11 Let X be a memoryless source, and let p(X^n ) and log p(X^n ) be two random
variables taking values on the power space X^n . Then −(1/n) log p(X^n ) converges to
H (X ) in probability.

Proof Since X is a memoryless source, {ξi }_{i≥1} is a group of independent and identically
distributed random variables, X i = (X, ξi ) (i ≥ 1), and X^n = X 1 X 2 · · · X n (n ≥ 1)
is a power space; then

p(X^n ) = p(X 1 ) p(X 2 ) · · · p(X n ), log p(X^n ) = ∑_{i=1}^n log p(X i ).

Because {ξi }_{i≥1} are independent and identically distributed, { p(X i )} and {log p(X i )} are
also groups of independent and identically distributed random variables. According
to Chebyshev’s law of large numbers (see Theorem 1.3 of Chap. 1),

−(1/n) log p(X^n ) = (1/n) ∑_{i=1}^n log (1/p(X i ))

converges to the common expected value

E ( log (1/p(X i )) ) = E ( log (1/p(X )) ) = H (X ).

So for any ε > 0, when n is sufficiently large, there is

P{ | −(1/n) log p(X^n ) − H (X ) | < ε} > 1 − ε. (3.53)

The proof is completed.

Definition 3.13 Let X be a memoryless source with power space X^n , also known as
a block code,

X^n = {x = x1 · · · xn | xi ∈ X, 1 ≤ i ≤ n}, n ≥ 1. (3.54)

For any given ε > 0, n ≥ 1, we define a typical code, or typical sequence set, Wε(n) in
the power space X^n as

Wε(n) = {x = x1 · · · xn ∈ X^n | | −(1/n) log p(x) − H (X ) | < ε}. (3.55)

By the definition, for any ε > 0, n ≥ 1, we have

Wε(n) ⊂ X^n , |X^n | = |X |^n . (3.56)

Lemma 3.12 (Asymptotic equipartition) Let |Wε(n) | denote the number of codewords in
the typical code Wε(n) ; then for any ε > 0, in binary channels, we have

(1 − ε)2^{n(H (X )−ε)} ≤ |Wε(n) | ≤ 2^{n(H (X )+ε)} . (3.57)

Proof By Lemma 3.11 and (3.53), for sufficiently large n we have

P{ | −(1/n) log p(X^n ) − H (X ) | < ε} > 1 − ε.

In other words, for all codewords x = x1 x2 · · · xn ∈ Wε(n) , we have

H (X ) − ε < −(1/n) log p(x) < H (X ) + ε.

Equivalently, in a binary channel,

2^{−n(H (X )+ε)} ≤ p(x) ≤ 2^{−n(H (X )−ε)} . (3.58)

Denote the probability of occurrence of Wε(n) as P{Wε(n) }; then

P{Wε(n) } = P{x ∈ X^n : x ∈ Wε(n) } > 1 − ε.

On the other hand,
P{Wε(n) } = ∑_{x∈Wε(n)} p(x),

and by (3.58),
|Wε(n) | · 2^{−n(H (X )+ε)} ≤ P{Wε(n) } ≤ 1.

So
|Wε(n) | ≤ 2^{n(H (X )+ε)} .

Again by (3.58), there is

|Wε(n) | · 2^{−n(H (X )−ε)} ≥ P{Wε(n) } > 1 − ε.

So we have
|Wε(n) | > (1 − ε)2^{n(H (X )−ε)} .

Combining the above two inequalities, we have

(1 − ε)2^{n(H (X )−ε)} ≤ |Wε(n) | ≤ 2^{n(H (X )+ε)} .

We completed the proof.
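The statistics of Lemma 3.12 can be observed directly by enumerating a small power space. A Python sketch for a memoryless binary source; λ, n and ε are small illustrative values, so the bounds are loose but visible:

```python
from math import log2
from itertools import product

lam, n, eps = 0.3, 12, 0.1
H = -lam * log2(lam) - (1 - lam) * log2(1 - lam)

def p_word(x):
    return lam ** x.count(0) * (1 - lam) ** x.count(1)

typical = [x for x in product((0, 1), repeat=n)
           if abs(-log2(p_word(x)) / n - H) < eps]   # definition (3.55)

size = len(typical)
prob = sum(p_word(x) for x in typical)
lo, hi = (1 - eps) * 2 ** (n * (H - eps)), 2 ** (n * (H + eps))
print(lo, size, hi)          # the two-sided bound (3.57)
print(size / 2 ** n, prob)   # tiny fraction of X^n; prob tends to 1 as n grows
```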

By Lemma 3.12, for a memoryless source X , the probability distribution p(x) on
its power space X^n is approximately

p(x) ∼ 2^{−n H (X )} , ∀ x ∈ Wε(n) ,

and the number of codewords |Wε(n) | in the typical code Wε(n) is approximately

|Wε(n) | ∼ 2^{n H (X )} .

Further analysis shows that the proportion of the typical code Wε(n) in the block code X^n is
very small, which can be summarized as the following Lemma.
Lemma 3.13 For a given sufficiently small ε > 0, when X is not an equal probability
information space, we have
lim_{n→∞} |Wε(n) | / |X |^n = 0.

Proof By Lemma 3.12, we have

|Wε(n) | / |X |^n ≤ 2^{n(H (X )+ε)} / |X |^n .

So
|Wε(n) | / |X |^n ≤ 2^{−n(log |X |−H (X )−ε)} .

By Theorem 3.1, since X is not an equal probability information space, when ε is
sufficiently small, we have
H (X ) + ε < log |X |.

Therefore, when n is sufficiently large, the ratio |Wε(n) | / |X |^n can be arbitrarily small.
Lemma 3.13 holds.
Combining Lemmas 3.11, 3.12 and 3.13, we can state that the typical codes
in block codes have the following statistical characteristics.
Corollary 3.5 Assume that X is a memoryless source and the typical sequence set
(or typical code) Wε(n) in the block code X^n is defined by formula (3.55); then for any
ε > 0 and sufficiently large n, we have
(i) (Asymptotic equipartition)

(1 − ε)2^{n(H (X )−ε)} ≤ |Wε(n) | ≤ 2^{n(H (X )+ε)} .

(ii) The occurrence probability P{Wε(n) } of Wε(n) is infinitely close to 1, that is,

P{Wε(n) } = P{x ∈ X^n : x ∈ Wε(n) } > 1 − ε.

(iii) When X is not an equal probability information space, the proportion of Wε(n) in the block
code X^n is arbitrarily small, that is,

lim_{n→∞} |Wε(n) | / |X |^n = 0.

The above description of the statistical characteristics of typical codes is an important
theoretical basis for source coding or data compression. Therefore, we seek an
effective way to compress the block code information, so that the rearranged codewords
are as few as possible, and the error probability of decoding and recovery is
as small as possible. An effective method is to divide the codewords in the block code
X^n into two parts: the codewords of the typical code Wε(n) are uniformly numbered from
1 to M. That is, the codewords in Wε(n) form a one-to-one correspondence with the
following positive integer set I ,

I = {1, 2, . . . , M}, M = |Wε(n) |.

For codewords that do not belong to Wε(n) , we uniformly number them as 1. Obviously,
for i ≠ 1, 1 ≤ i ≤ M, there is a unique codeword x^{(i)} ∈ Wε(n) , so we
can accurately restore i to x^{(i)} ; that is, i → x^{(i)} is the correct decoding. For i = 1,
we will not be able to decode correctly, resulting in a decoding recovery error. We
denote the code rate of the typical code Wε(n) as (1/n) log M; by Lemma 3.12,

(1 − ε)2^{n(H (X )−ε)} ≤ M ≤ 2^{n(H (X )+ε)} .

Equivalently,

log(1 − ε) + n(H (X ) − ε) ≤ log M ≤ n(H (X ) + ε).

Therefore, the code rate of the typical code Wε(n) is estimated as follows:

(1/n) log(1 − ε) + H (X ) − ε ≤ (1/n) log M ≤ H (X ) + ε, (3.59)

and for given 0 < ε < 1, we have

H (X ) − ε ≤ lim_{n→∞} (1/n) log M ≤ H (X ) + ε.

In other words, the code rate is typically close to H (X ). Let us look at the decoding
error probability Pe after this numbering, where

Pe = P{x ∈ X^n : x ∉ Wε(n) }.

Because
Pe + P{Wε(n) } = 1,

according to the statistical characteristic (ii) of the typical code Wε(n) ,

Pe = 1 − P{Wε(n) } < ε. (3.60)



From this, we derive the main result of this section, the so-called source coding
theorem.

Theorem 3.10 (Shannon, 1948) Assume that X is a memoryless source; then
(i) When the code rate R = (1/n) log M1 > H (X ), there is an encoding with code
rate R such that, as n → ∞, the error probability of decoding recovery satisfies
Pe → 0.
(ii) When the code rate R = (1/n) log M1 < H (X ) − δ, where δ > 0 does not change as
n → ∞, then any coding with code rate R has lim_{n→∞} Pe = 1.

Proof The above analysis has given the proof of (i). In fact, if

R = (1/n) log M1 > H (X ),

then when ε is sufficiently small, by (3.59), the typical code in the block code X^n satisfies

R > (1/n) log |Wε(n) |, M1 > |Wε(n) |.

Therefore, we can construct a code C ⊂ X^n which satisfies

Wε(n) ⊂ C, |C| = M1 .

Thus, the code rate of C is just equal to R. Because the probability of occurrence of C satisfies

P{C} + Pe (C) = 1 and P{C} ≥ P{Wε(n) } > 1 − ε,

the decoding error probability after compression coding satisfies Pe (C) < ε, and (i) holds.
To prove (ii), we note that ∀ x ∈ Wε(n) ,

| −(1/n) log p(x) − H (X ) | < ε,

which implies, ∀ x ∈ Wε(n) ,

p(x) < 2^{−n(H (X )−ε)} .

Thus, for any subset S ⊂ Wε(n) , the probability of occurrence of S satisfies

P{S} = ∑_{x∈S} p(x) ≤ |S| · 2^{−n(H (X )−ε)} . (3.61)

Now let C be the set of codewords that a code of rate R decodes correctly. Since

R = (1/n) log M < H (X ) − δ,

we have |C ∩ Wε(n) | ≤ M = 2^{n(H (X )−δ)} , and by (3.61),

P{C ∩ Wε(n) } ≤ 2^{−n(δ−ε)} . (3.62)

When 0 < ε < δ and n is sufficiently large, 2^{−n(δ−ε)} < ε and P{X^n \ Wε(n) } < ε, so

1 − Pe = P{C} ≤ P{C ∩ Wε(n) } + P{X^n \ Wε(n) } < 2ε.

Thus
lim_{n→∞} Pe = 1,

and the theorem holds.

3.6 Optimal Code Theory

Let X be a source state set, x = x1 x2 · · · xn ∈ X^n a message sequence, and let x be
output as a codeword u = u 1 u 2 · · · u k ∈ Z_D^k of length k after compression coding,
where D ≥ 1 is a positive integer and Z D is the residue class ring mod D; u =
u 1 u 2 · · · u k ∈ Z_D^k is called a D-ary codeword of length k. u is decoded and translated
into the message x, that is, u → x. The purpose of source coding is to find a good
coding scheme that makes the code rate as small as possible under the requirement of
a sufficiently small decoding error. Below, we give the strict mathematical definitions
of equal length codes and variable length codes.

Definition 3.14 Let X be a source state set, Z D the residue class ring mod D, and
n, k positive integers. A mapping f : X^n → Z_D^k is called an equal length code
coding function; a mapping ψ : Z_D^k → X^n is called the corresponding decoding function. For
∀ x = x1 · · · xn ∈ X^n , f (x) = u = u 1 · · · u k ∈ Z_D^k , u = u 1 · · · u k is called a codeword
of length k. Let

C = { f (x) ∈ Z_D^k | x ∈ X^n }. (3.63)

We call C the code coded by f , and R = (k/n) log D is the coding rate of f , also
known as the code rate of C. C is called an equal length code; it is sometimes called a
block code with a packet length of k.

By Definition 3.14, the error probability of an equal length code coding scheme
( f, ψ) is
Pe = P{ψ( f (x)) ≠ x, x ∈ X^n }. (3.64)

Let us first consider error free coding, that is, Pe = 0. Obviously, Pe = 0 if and
only if f is an injection and ψ = f^{−1} is the left inverse mapping of f . We can select a coding
function f : X^n → Z_D^k that is an injection if and only if |Z_D^k | ≥ |X^n |, that is, D^k ≥ N^n ,
where N = |X |. Taking logarithms on both sides,

R = (k/n) log D ≥ log N = log |X |. (3.65)

Therefore, the code rate of error free compression coding f is at least log2 |X | bits
or ln |X | nats.
We consider asymptotically error free coding, that is, for any given ε > 0, the required
decoding error probability is Pe ≤ ε. By Theorem 3.10, only the code rate R ≥ H (X ) is
needed. In fact, take X as an information space and encode the message sequence
x = x1 x2 · · · xn ∈ X^n of length n: if x ∈ Wε(n) is a typical sequence, x
corresponds to one of the numbers 1, . . . , M, where M = |Wε(n) |; if x ∉ Wε(n) , uniformly code x as 1. If the
M codewords in Wε(n) are represented by D-ary digits, let D^k = M (the insufficient
part can be supplemented), and the code rate R is

R = (1/n) log M = (k/n) log D.

Since M is approximately 2^{n H (X )} , R is approximately H (X ), that is, R = (1/n) log M ∼
H (X ). From the asymptotic equipartition property, the error probability of such coding is

Pe = P{x = x1 · · · xn ∉ Wε(n) } < ε, when n is sufficiently large.

However, in practical applications, n cannot increase infinitely, which requires us to
find the best coding scheme for a given finite n, so that the code rate is as close as
possible to the theoretical value H (X ). Moreover, in applications we find that equal
length codes are not efficient coding schemes, while variable length codes are more
practical. For example:
Example 3.6 Let X = {1, 2, 3, 4} be an information space, and let the probability
distribution of the random variable ξ taking values on X be

p(1) = 1/2, p(2) = 1/4, p(3) = 1/8, p(4) = 1/8.

The entropy H (X ) of the information space X is

H (X ) = −(1/2) log2 (1/2) − (1/4) log2 (1/4) − (1/8) log2 (1/8) − (1/8) log2 (1/8) = 1.75 bits.

If an equal length code is used for coding, the code length is 2, and the code is

Source letter Codeword
1             00
2             01
3             10
4             11

Then the code rate R (k = 2, n = 1) is

R = 2 log2 2 = 2 > 1.75 bits.

Obviously, the efficiency of the equal length code is not high. If the above code
is replaced with an unequal length code, such as

Source letter Codeword
1             0
2             10
3             110
4             111

and we use l(x) to represent the code length after the source letter x is encoded, then the
average code length L required for encoding X is

L = ∑_{i=1}^4 p(xi )l(xi ) = (1/2) × 1 + (1/4) × 2 + (1/8) × 3 + (1/8) × 3 = 1.75 bits = H (X ).

It can be seen that using the unequal length code to encode X has higher efficiency.
This example also illustrates the following compression coding principle: for characters
with high probability of occurrence, a shorter codeword is prepared, and for
characters with low probability of occurrence, a longer codeword is prepared, to
ensure that the average coding length is as small as possible.
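The computation in Example 3.6 is a one-liner to reproduce. A minimal sketch:

```python
from math import log2

p = [1/2, 1/4, 1/8, 1/8]   # distribution of Example 3.6
fixed = [2, 2, 2, 2]       # equal length code
var = [1, 2, 3, 3]         # unequal length code above

H = -sum(pi * log2(pi) for pi in p)
print(H)                                         # 1.75 bits
print(sum(pi * li for pi, li in zip(p, fixed)))  # 2 bits
print(sum(pi * li for pi, li in zip(p, var)))    # 1.75 bits = H(X)
```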
Next, we give the mathematical definition of variable length coding. For this
purpose, let X ∗ and Z_D^∗ be the sets of finite length sequences, respectively; that is,
X ∗ = ∪_{1≤k<∞} X^k .

Definition 3.15 (i) f : X^n → Z_D^∗ is called a variable length code function if for any x ∈
X^n , f (x) ∈ Z_D^∗ ; when x is different, the code length of f (x) is not necessarily
the same. We use l(x) to denote the length of f (x), which is called the coding
length of x. C = { f (x) ∈ Z_D^∗ | x ∈ X^n } is called a variable length codeword set.
(ii) Let f : X ∗ → Z_D^∗ be a mapping; we call f a coding mapping, and f (X ∗ ) is called
a code.
(iii) f : X ∗ → Z_D^∗ is called a block code mapping if there is a mapping g :
X → Z_D^∗ such that for any x ∈ X^n (n ≥ 1), writing x = x1 x2 · · · xn , there is f (x) =
g(x1 )g(x2 ) · · · g(xn ).
(iv) f : X ∗ → Z_D^∗ is called a uniquely decodable mapping if f is a block code mapping
and f is an injection.

(v) f : X ∗ → Z_D^∗ is called a real-time code mapping if f is a block code mapping
and for any x, y ∈ X ∗ , f (x) and f (y) cannot be prefixes of each other.

Remark 3.1 Let a = a1 a2 · · · an ∈ Z_D^n , b = b1 b2 · · · bm ∈ Z_D^m ; we call the codeword a the
prefix of b if m ≥ n and for any 1 ≤ i ≤ n, ai = bi .
Lemma 3.14 A block code mapping f : X ∗ → Z_D^∗ is a uniquely decodable
mapping if and only if for ∀ n ≥ 1, f restricted to X^n is an injection into Z_D^∗ .

Proof The necessity is obvious; we prove the sufficiency. That is, we prove that for
∀ x = x1 x2 · · · xn ∈ X^n , y = y1 y2 · · · ym ∈ X^m with x ≠ y, we have f (x) ≠ f (y). Suppose
f (x) = f (y); because f is a block code mapping, there is a mapping
g : X → Z_D^∗ with

f (x) = g(x1 )g(x2 ) · · · g(xn ) = g(y1 )g(y2 ) · · · g(ym ) = f (y).

Then
f (x y) = g(x1 )g(x2 ) · · · g(xn )g(y1 )g(y2 ) · · · g(ym )
        = g(y1 )g(y2 ) · · · g(ym )g(x1 )g(x2 ) · · · g(xn )
        = f (yx).

But x y ≠ yx, which contradicts the fact that f restricted to X^{n+m} is an injection.
Lemma 3.15 A real-time code is uniquely decodable, but not conversely.

Proof Suppose f : X ∗ → Z_D^∗ is a real-time code mapping, and for x, y ∈ X ∗ , x ≠ y,
f (x) = a1 a2 · · · an ∈ Z_D^n , f (y) = b1 b2 · · · bm ∈ Z_D^m (m ≥ n). Because f (x)
is not a prefix of f (y), there exists i (1 ≤ i ≤ n) with ai ≠ bi ; thus f (x) ≠ f (y),
that is, f is an injection. For the converse, consider the following counterexample:

Source letter Codeword
1             0
2             01
3             011
4             111

where X = {1, 2, 3, 4} is the information space and f : X → Z_2^∗ is a variable
length code. f (1) is the prefix of f (2), that is, f is not a real-time code mapping, but
obviously f is a uniquely decodable mapping. The Lemma holds.
What conditions must the code lengths of a real-time code satisfy? The following
Kraft inequality gives a satisfactory answer.
Lemma 3.16 For a uniquely decodable code C with values in Z_D^∗ , |C| = m, and code
lengths l1 , l2 , . . . , lm , the following McMillan–Kraft inequality holds:

∑_{i=1}^m D^{−li} ≤ 1. (3.66)

Conversely, if the li satisfy the above condition, there is a real-time code C whose
code length set is {l1 , l2 , . . . , lm }.

Proof Consider

( ∑_{i=1}^m D^{−li} )^n = (D^{−l1} + D^{−l2} + · · · + D^{−lm} )^n ;

each term has the form D^{−l_{i1} −l_{i2} −···−l_{in}} = D^{−k} , where l_{i1} + l_{i2} + · · · + l_{in} = k. Suppose
l = max{l1 , l2 , . . . , lm }; then k ranges from n to nl. Let Nk be the number
of terms equal to D^{−k} ; then

( ∑_{i=1}^m D^{−li} )^n = ∑_{k=n}^{nl} Nk D^{−k} .

Note that Nk can be regarded as the number of codeword sequences with a total
length of k assembled from n codewords in C, i.e.,

Nk = |{(c1 , c2 , . . . , cn ) | |c1 c2 · · · cn | = k, ci ∈ C}|.

Since the code is uniquely decodable, distinct tuples give distinct concatenations in Z_D^∗ ,
and there are at most D^k words of length k, so Nk ≤ D^k . Then we have

( ∑_{i=1}^m D^{−li} )^n = ∑_{k=n}^{nl} Nk D^{−k} ≤ ∑_{k=n}^{nl} D^k D^{−k} = nl − n + 1 ≤ nl.

Write x = ∑_{i=1}^m D^{−li} . If x > 1, then x^n > nl when n is sufficiently large, contradicting
x^n ≤ nl for all n. Hence ∑_{i=1}^m D^{−li} ≤ 1.

Conversely, assume that the Kraft inequality holds, that is, the given
lengths li (1 ≤ i ≤ m) satisfy formula (3.66); we now construct a real-time
code with these lengths, where the li (1 ≤ i ≤ m) need not be all distinct.
Define n j as the number of codewords with length j; if l = max{l1 , l2 , . . . , lm },
then
∑_{j=1}^l n j = m,

and (3.66) is equivalent to
∑_{j=1}^l n j D^{−j} ≤ 1.

Multiplying both sides by D^l gives ∑_{j=1}^l n j D^{l−j} ≤ D^l . There is

n l ≤ D^l − n 1 D^{l−1} − n 2 D^{l−2} − · · · − n_{l−1} D,
n_{l−1} ≤ D^{l−1} − n 1 D^{l−2} − n 2 D^{l−3} − · · · − n_{l−2} D,
· · ·
n 3 ≤ D^3 − n 1 D^2 − n 2 D,
n 2 ≤ D^2 − n 1 D,
n 1 ≤ D.

Because n 1 ≤ D, we can choose these n 1 codewords arbitrarily, and the remaining D −
n 1 words of length 1 can be used as prefixes of other codewords. Therefore,
there are (D − n 1 )D options for codewords of length 2, that is, n 2 ≤ D^2 −
n 1 D. Similarly, (D − n 1 )D − n 2 length-2 words can be used as prefixes of subsequent
codewords, so there are at most ((D − n 1 )D − n 2 )D options for codewords
of length 3, that is, n 3 ≤ D^3 − n 1 D^2 − n 2 D, and so on. In this way, we can always
construct a real-time code with lengths {l1 , l2 , . . . , lm }. The Lemma holds!
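Checking the McMillan–Kraft inequality for a candidate set of code lengths is immediate; a minimal sketch:

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the McMillan-Kraft inequality (3.66)."""
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # the code {0, 10, 110, 111}: sum = 1.0
print(kraft_sum([1, 1, 2]))     # 1.25 > 1: no uniquely decodable code exists
```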
Let us give an example of a code that is not uniquely decodable.
Example 3.7 Let X = {1, 2, 3, 4}, Z D = F2 , and let the coding scheme be

Source letter Codeword
1             0 = f(1)
2             1 = f(2)
3             00 = f(3)
4             11 = f(4)

Because the encoder outputs and the decoder receives a continuous stream of codeword
symbols, if the string received by the decoder is 001101, there are two possible decoding
results, 112212 and 3412. This shows that f ∗ is not an injection, that is, the code
compiled by f is not uniquely decodable.
By Lemma 3.16, real-time codes, or more generally uniquely decodable codes,
must satisfy the Kraft inequality. However, a variable length code compiled according
to the Kraft inequality is not necessarily an optimal code, because from the perspective of
random coding, an optimal code not only requires the accuracy of decoding but also ensures
efficiency, that is, the average random code length must be the shortest. We
summarize the strict mathematical definition of the optimal code as follows.

Definition 3.16 Let X = {x1 , x2 , . . . , xm } be an information space. A real-time code
C = { f (x1 ), f (x2 ), . . . , f (xm )} is called an optimal code if its average random code
length
L = ∑_{i=1}^m pi li (3.67)

is the smallest, where pi = p(xi ) is the occurrence probability of xi and li is the code
length of xi .
For a source state set X , when its statistical characteristics are determined, that is,
after X becomes an information space, the probability distribution { p(x) | x ∈ X } is
given. Therefore, to find the optimal compression coding scheme for an information
space X is to find the optimal solution {l1 , l2 , . . . , lm } of (3.67) under the condition of the
Kraft inequality. Usually, we use the Lagrange multiplier method to find the optimal
solution. Let
J = ∑_{i=1}^m pi li + λ ∑_{i=1}^m D^{−li} ,

and take the partial derivative with respect to li :

∂ J/∂li = pi − λD^{−li} log D.

Setting this to zero gives
D^{−li} = pi / (λ log D).

By the Kraft inequality, that is,
∑_{i=1}^m D^{−li} ≤ 1,

we get
1 ≥ ∑_{i=1}^m D^{−li} = (1/(λ log D)) ∑_{i=1}^m pi ⇒ λ ≥ 1/ log D.

Thus, the optimal code length li satisfies

li ≥ − log_D pi , pi ≥ D^{−li} . (3.68)

The corresponding optimal average code length L is

L = ∑_{i=1}^m pi li ≥ − ∑_{i=1}^m pi log_D pi = H_D (X ). (3.69)

That is, L is at least the D-ary information entropy H_D (X ) of X . From this, we get the main
results of this section.

Theorem 3.11 The average length L of any D-ary real-time code of an information
space X satisfies
L ≥ H_D (X ),

with equality if and only if pi = D^{−li} .

Next, we will give another proof of Theorem 3.11. To this end, we consider two
random variables ξ and η on a source state set X , with probability distributions

p(x) = P{ξ = x}, q(x) = P{η = x}, ∀ x ∈ X.

The relative entropy of the two distributions is defined as

D( p||q) = ∑_{x∈X} p(x) log (p(x) / q(x)). (3.70)

Lemma 3.17 The relative entropy D( p||q) of two random variables on X satisfies

D( p||q) ≥ 0, and D( p||q) = 0 ⇐⇒ p(x) = q(x), ∀ x ∈ X.

Proof Expanding e^x in its power series, for a real number x > 0 we obtain

e^{x−1} = 1 + (x − 1) + (1/2)(x − 1)^2 + · · · ≥ x.

Thus log x ≤ x − 1, and by (3.70),

−D( p||q) = ∑_{x∈X} p(x) log (q(x) / p(x))
          ≤ ∑_{x∈X} p(x) ( q(x)/p(x) − 1 ) = 0.

Thus D( p||q) ≥ 0; the conclusion about D( p||q) = 0 is obvious.
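A quick numerical illustration of Lemma 3.17, with two arbitrary distributions:

```python
from math import log2

def kl(p, q):
    """Relative entropy D(p||q) of (3.70); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(kl(p, q))  # strictly positive, since p != q
print(kl(p, p))  # 0
```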

Proof (Another proof of Theorem 3.11) Investigate L − H_D (X ):

L − H_D (X ) = ∑_{i=1}^m pi li − ∑_{i=1}^m pi log_D (1/pi )
            = − ∑_{i=1}^m pi log_D D^{−li} + ∑_{i=1}^m pi log_D pi . (3.71)

Define
ri = D^{−li} / c, c = ∑_{j=1}^m D^{−l j} .

By the Kraft inequality, we have

c ≤ 1, and ∑_{i=1}^m ri = 1.

Therefore, {ri , 1 ≤ i ≤ m} is a probability distribution on X , and by (3.71),

L − H_D (X ) = − ∑_{i=1}^m pi log_D (c ri ) + ∑_{i=1}^m pi log_D pi = ∑_{i=1}^m pi log_D (pi / ri ) + log_D (1/c).

By Lemma 3.17 and c ≤ 1, we have

L − H_D (X ) ≥ 0, and L = H_D (X ) if and only if c = 1 and ri = pi ,

that is,
pi = D^{−li} , or li = log_D (1/pi ).

We complete the proof of Theorem 3.11.

By Theorem 3.11, coding according to probability, the code length of the D-ary
optimal code is
li = log_D (1/pi ), 1 ≤ i ≤ m.

But in general, log_D (1/pi ) is not an integer. We use ⌈a⌉ to represent the smallest integer
not less than the real number a, and take

li = ⌈ log_D (1/pi ) ⌉, 1 ≤ i ≤ m. (3.72)

Then
∑_{i=1}^m D^{−li} ≤ ∑_{i=1}^m D^{− log_D (1/pi )} = ∑_{i=1}^m pi = 1.

So the code lengths {l1 , l2 , . . . , lm } defined by formula (3.72) satisfy the Kraft
inequality, and from Lemma 3.16, we can define the corresponding real-time code.

Definition 3.17 Let X = {x1 , x2 , . . . , xm } be an information space, pi = p(xi ), and

l( f (xi )) = li = ⌈ log_D (1/pi ) ⌉, 1 ≤ i ≤ m.

Then the real-time code corresponding to {l1 , l2 , . . . , lm } is called a Shannon code.

Corollary 3.6 The code lengths l( f (xi )) of a Shannon code C = { f (xi ) | 1 ≤ i ≤ m}
satisfy

li = ⌈ log_D (1/p(xi )) ⌉, log_D (1/p(xi )) ≤ li < log_D (1/p(xi )) + 1, (3.73)

and
H_D (X ) ≤ L < H_D (X ) + 1,

where L is the average code length of C.

Proof According to the definition of ⌈a⌉, a ≤ ⌈a⌉ < a + 1; thus

log_D (1/p(xi )) ≤ li < log_D (1/p(xi )) + 1.

Multiplying both sides by p(xi ) and summing over 1 ≤ i ≤ m, there is

∑_{i=1}^m p(xi ) log_D (1/p(xi )) ≤ ∑_{i=1}^m p(xi )li < ∑_{i=1}^m p(xi ) ( log_D (1/p(xi )) + 1 ).

That is,
H_D (X ) ≤ L < H_D (X ) + 1.

The Corollary holds.
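Shannon code lengths and the bounds of Corollary 3.6 can be computed directly. A sketch; the distribution is an arbitrary illustration:

```python
from math import ceil, log2

def shannon_lengths(p, D=2):
    """Code lengths l_i = ceil(log_D(1/p_i)) of a Shannon code, as in (3.72)."""
    return [ceil(log2(1 / pi) / log2(D)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]
ls = shannon_lengths(p)
L = sum(pi * li for pi, li in zip(p, ls))
H = -sum(pi * log2(pi) for pi in p)
assert sum(2 ** (-l) for l in ls) <= 1   # Kraft inequality holds
assert H <= L < H + 1                    # Corollary 3.6
print(ls, L, H)
```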

3.7 Several Examples of Compression Coding

3.7.1 Morse Codes

In variable length codes, in order to make the average code length as close to the
source entropy as possible, the code lengths should match the occurrence probabilities of
the corresponding coded characters as much as possible. The principle of probabilistic
coding is that characters with high occurrence probability are assigned short codewords,
and characters with low occurrence probability are assigned long codewords, so as to make
the average code length as close to the source entropy as possible. This idea existed long
before Shannon’s theory. For example, Morse code, invented in 1838, uses three symbols
(dot, dash and space) to encode the 26 letters of English. Expressed in binary, one dot is
10 (2 bits), one dash is 1110 (4 bits), and the space is 000 (3 bits). For example,
the commonly used English letter E is represented by a dot, while the infrequently
used letter Q is represented by two dashes, one dot and one dash, which makes
the average codeword length of English text shorter. However, Morse code
does not completely match the occurrence probabilities, so it is not an optimal code,
and it is basically not used now. Figure 3.1 shows the coding table of Morse code.
Fig. 3.1 The coding table of Morse code

It is worth noting that Morse code appeared as a kind of cipher in its early
stage, widely used in the transmission and storage of sensitive political and military
intelligence. Early cryptosystem compilers were also manufactured based on the
principle of Morse code, which quickly mechanized the compilation and translation
of ciphers. In this sense, Morse code has played an important role in promoting the
development of cryptography.

3.7.2 Huffman Codes

Shannon, Fano and Huffman all studied coding methods for variable length
codes, among which Huffman codes have the highest coding efficiency. We focus on
the coding methods of binary and ternary Huffman codes.
Let X = {x1 , x2 , . . . , xm } be a source letter set of m symbols. Arrange the m
symbols in order of occurrence probability, take the two letters with the lowest
probability and assign them the digits “0” and “1,” respectively, then add their probabilities
together as a new letter and rearrange it in order of probability with the source
letters not yet assigned binary digits. Then take the two letters with the lowest probability
and assign them the digits “0” and “1,” respectively, add the probabilities of the two
letters as the probability of a new letter, and re-queue; continue the above process
until the probabilities of the remaining letters add up to 1. At this time, every source
letter corresponds to a string of “0”s and “1”s, and we get a variable length code, which
is called a Huffman code. Take the information space X = {1, 2, 3, 4, 5} as an
example, with the corresponding probability distribution

p(1) = 0.25, p(2) = 0.25, p(3) = 0.2, p(4) = 0.15, p(5) = 0.15.

The binary information entropy H2 (X ) and ternary information entropy H3 (X ) are

H2 (X ) = −0.25 log2 0.25 − 0.25 log2 0.25 − 0.2 log2 0.2 − 0.15 log2 0.15 − 0.15 log2 0.15 = 2.28 bits,
H3 (X ) = −0.25 log3 0.25 − 0.25 log3 0.25 − 0.2 log3 0.2 − 0.15 log3 0.15 − 0.15 log3 0.15 = 1.44 ternary digits,

respectively. The binary Huffman coding diagram of X is shown in Fig. 3.2, and
the ternary Huffman coding diagram of X is shown in Fig. 3.3.
Fig. 3.2 The binary Huffman coding
Fig. 3.3 The ternary Huffman coding

In summary, Huffman codes have the following characteristics. Assuming that the
occurrence probability of the i-th source letter is pi and the corresponding code
length is li , then:

(1) If pi > p j , then li ≤ l j , that is, the source letter with low probability has a longer
codeword;
(2) The longest two codewords have the same code length;
(3) The two longest codewords differ only in their last letter; their preceding letters
are the same;
(4) In real-time codes, the average code length of Huffman code is the smallest. In
this sense, Huffman code is the optimal code.
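The merging procedure described above is easy to implement with a priority queue. A minimal binary Huffman sketch in Python, run on the five-letter space of this subsection (tie-breaking choices may produce a different but equally optimal code):

```python
import heapq

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable letters,
    prepending '0' to one group and '1' to the other."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]  # (prob, tiebreak, letters)
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p0, _, a = heapq.heappop(heap)
        p1, _, b = heapq.heappop(heap)
        for i in a:
            codes[i] = "0" + codes[i]
        for i in b:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, min(a + b), a + b))
    return codes

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
codes = huffman(probs)
L = sum(p * len(c) for p, c in zip(probs, codes))
print(codes, L)   # average length 2.3 bits, close to H2(X) = 2.28 bits
```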
Huffman codes have been applied in practice, mainly in the compression
standard for fax images. However, in actual data compression, the statistical characteristics
of some sources change over time. In order to make the statistical characteristics
on which the coding is based adapt to the changes of the actual statistical characteristics
of the source, adaptive coding technology has been developed: in each
step of coding, the coding of a new message is based on the statistical characteristics
of previous messages. For example, R. G. Gallager first proposed the step-by-step
updating technology of Huffman codes in 1978, and D. E. Knuth made this technology
a practical algorithm in 1985. Adaptive Huffman coding technology requires
complex data structures and continuous updating of the codeword set according to the
statistical characteristics of the source; we will not go into details here.

3.7.3 Shannon–Fano Codes

The Shannon–Fano code is an arithmetic code. Let X be an information space. It can be
inferred from Corollary 3.6 in the previous section that the code length of a Shannon
code on X is
l(x) = ⌈ log (1/p(x)) ⌉, ∀ x ∈ X.

Here, we introduce a constructive coding method using cumulative distribution func-


tion to allocate codewords, commonly known as Shannon–Fano coding method.
Without losing generality, let each letter x in X , there is p(x) > 0, and define the
cumulative distribution function F(x) and the modified distribution function F̄(x)
as
  1
F(x) = p(a), F̄(x) = p(a) + p(x), (3.74)
a≤x a<x
2

where X = {1, 2, . . . , m} is a given information space. Without losing generality, let


p(1) ≤ p(2) ≤ · · · ≤ p(m).
As can be seen from the definition, if x ∈ X , then p(x) = F(x) − F(x − 1),
specially, if x, y ∈ X , then we have

F̄(x) = F̄(y).

So when we know F̄(x), we can find the corresponding x. The basic idea of Shannon–
Fano arithmetic code is to use F̄(x) to encode x. Because F̄(x) is a real number, its
binary decimal represents the first l(x) bits, denote as { F̄(x)}l(x) , there is

F̄(x) − { F̄(x)}l(x) < 2−l(x) . (3.75)


 
Take l(x) = ⌈ log (1/p(x)) ⌉ + 1; then we have

1/2^{l(x)} = 1/( 2 · 2^{⌈log (1/p(x))⌉} ) ≤ (1/2) p(x) = F̄(x) − F(x − 1). (3.76)

Now let the binary decimal representation of F̄(x) be

F̄(x) = 0.a1 a2 · · · a_{l(x)} a_{l(x)+1} · · · , ∀ ai ∈ F2 .

Then the Shannon–Fano code is

f (x) = a1 a2 · · · a_{l(x)} , that is, x → a1 a2 · · · a_{l(x)} ∈ F2^{l(x)} . (3.77)

Lemma 3.18 The binary Shannon–Fano code is a real-time code, and its average
length L differs from the theoretical optimal value H (X ) by at most two bits.

Proof By (3.76),
2^{−l(x)} ≤ (1/2) p(x) = F̄(x) − F(x − 1).

Let the binary decimal representation of F̄(x) be

F̄(x) = 0.a1 a2 · · · a_{l(x)} · · · , ∀ ai ∈ F2 .

We use [A, B] to represent a closed interval on the real axis, so

F̄(x) ∈ [0.a1 a2 · · · a_{l(x)} , 0.a1 a2 · · · a_{l(x)} + 1/2^{l(x)} ].

If y ∈ X , x ≠ y, and f (x) is the prefix of f (y), then we also have

F̄(y) ∈ [0.a1 a2 · · · a_{l(x)} , 0.a1 a2 · · · a_{l(x)} + 1/2^{l(x)} ].

But
| F̄(y) − F̄(x) | ≥ (1/2) p(x) + (1/2) p(y) > 1/2^{l(x)} ,

which is contrary to the fact that F̄(x) and F̄(y) lie in the same interval of length
1/2^{l(x)}. Therefore, f is a real-time code, that is, the Shannon–Fano code is a real-time
code. Considering its average code length L,

L = ∑_{x∈X} p(x)l(x) = ∑_{x∈X} p(x) ( ⌈log (1/p(x))⌉ + 1 ) < ∑_{x∈X} p(x) ( log (1/p(x)) + 2 ) = H (X ) + 2.

We complete the proof of the Lemma.
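The constructive definition (3.74)-(3.77) translates directly into code. A minimal sketch of Shannon–Fano encoding (extracting l(x) bits of the binary expansion of F̄(x)); for the dyadic distribution below the resulting code is prefix-free, as Lemma 3.18 guarantees:

```python
from math import ceil, log2

def shannon_fano(probs):
    """Codeword of letter x: first l(x) = ceil(log2(1/p(x))) + 1 bits of the
    binary expansion of the modified distribution function F-bar(x)."""
    F, codes = 0.0, []
    for p in probs:
        fbar = F + p / 2                 # F-bar(x), formula (3.74)
        l = ceil(log2(1 / p)) + 1
        bits, v = "", fbar
        for _ in range(l):               # peel off l binary digits
            v *= 2
            bits += "1" if v >= 1 else "0"
            v -= int(v)
        codes.append(bits)
        F += p
    return codes

print(shannon_fano([0.5, 0.25, 0.125, 0.125]))  # ['01', '101', '1101', '1111']
```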

Let n ≥ 1 and let X^n be the power space of the information space; x = x1 · · · xn ∈ X^n
is called a message sequence of length n. In order to improve the coding efficiency,
it is often necessary to compress the power space X^n , which is called arithmetic
coding. The Shannon–Fano code can also be used for arithmetic coding. Its basic method
is to find a fast algorithm for calculating the joint probability distribution p(x1 x2 · · · xn )
and the cumulative distribution function F(x), and then use the Shannon–Fano method to
encode x = x1 · · · xn . We will not introduce the specific details here.

3.8 Channel Coding Theorem

Let X be the input alphabet and Y the output alphabet, and let ξ and η be two random
variables with values on X and Y . The probability functions p(x) and p(y) of X and
Y and the conditional probability function p(y|x) are, respectively,

p(x) = P{ξ = x}, p(y) = P{η = y}, p(y|x) = P{η = y|ξ = x}.

From the total probability formula,

p(y|x) ≥ 0, ∀ x ∈ X, y ∈ Y; ∑_{y∈Y} p(y|x) = 1, ∀ x ∈ X. (3.78)

If X and Y are finite sets, the conditional probability matrix T = ( p(y|x) )_{|X |×|Y |} is
called the transition probability matrix from X to Y , i.e.,

T = [ p(y1 |x1 ) p(y2 |x1 ) · · · p(y N |x1 ) ;
      p(y1 |x2 ) p(y2 |x2 ) · · · p(y N |x2 ) ;
      · · ·
      p(y1 |x M ) p(y2 |x M ) · · · p(y N |x M ) ], (3.79)

where |X | = M, |Y | = N . By (3.78), each row of the transition probability matrix
T sums to 1.

Definition 3.18 (i) A discrete channel is composed of a finite information space X
as the input alphabet, a finite information space Y as the output alphabet, and a
transition probability matrix T from X to Y ; we denote this discrete channel by
{X, T, Y }. If X = Y = Fq is the q-element finite field, then {X, T, Y } is a discrete
q-ary channel. In particular, if q = 2, then {X, T, Y } is called a discrete binary
channel.
(ii) If {X, T, Y } is a discrete q-ary channel and T = Iq is the q-order identity matrix,
{X, Iq , Y } is called a noise free channel.
(iii) If {X, T, Y } is a discrete q-ary channel and T = T' is a q-order symmetric
matrix, {X, T, Y } is called a symmetric channel.

In a discrete channel {X, T, Y}, the codeword spaces X^n and Y^n with length n are
defined as

X^n = {x = x_1 ··· x_n | x_i ∈ X},  Y^n = {y = y_1 ··· y_n | y_i ∈ Y},  n ≥ 1.

The probabilities of the joint events x = x_1 ··· x_n and y = y_1 ··· y_n are defined as

p(x) = p(x_1 ··· x_n) = Π_{i=1}^n p(x_i),  p(y) = p(y_1 ··· y_n) = Π_{i=1}^n p(y_i),    (3.80)

so that X and Y become memoryless sources, with power spaces X^n and Y^n, respectively.

Definition 3.19 A discrete channel {X, T, Y} is called a memoryless channel if for
any positive integer n ≥ 1, x = x_1 ··· x_n ∈ X^n, y = y_1 ··· y_n ∈ Y^n, we have

p(y|x) = Π_{i=1}^n p(y_i|x_i),   p(x_i y_i) = p(x_1 y_1), ∀ i ≥ 1.    (3.81)

From the joint event probability p(x_i y_i) = p(x_1 y_1) in Eq. (3.81), there is

p(y_i|x_i) = (p(x_1)/p(x_i)) p(y_1|x_1).    (3.82)

The above formula shows that in a memoryless channel, the conditional probability
p(y_i|x_i) is essentially independent of the index i.
Definition 3.19 gives the statistical characteristic of a memoryless channel. The
following lemma gives a mathematical characterization of a memoryless channel.

Lemma 3.19 A discrete channel {X, T, Y} is a memoryless channel if and only
if the product information space XY is a memoryless source, with power space
(XY)^n = X^n Y^n.

Proof If XY is a memoryless source (see Definition 3.9), then for any n ≥ 1, and
x = x_1 ··· x_n ∈ X^n, y = y_1 ··· y_n ∈ Y^n, xy ∈ X^n Y^n, there is

p(xy) = p(x_1 ··· x_n y_1 ··· y_n) = p(x_1 y_1 ··· x_n y_n) = Π_{i=1}^n p(x_i y_i).

Thus

p(x) p(y|x) = p(xy) = Π_{i=1}^n p(x_i) p(y_i|x_i) = p(x) Π_{i=1}^n p(y_i|x_i),

so we have

p(y|x) = Π_{i=1}^n p(y_i|x_i).

p(x_i y_i) = p(x_1 y_1) is given by the definition of a memoryless source, so {X, T, Y} is
a memoryless channel. Conversely, if {X, T, Y} is a memoryless channel, then by (3.81),

p(xy) = Π_{i=1}^n p(x_i y_i)

and p(x_i y_i) = p(x_1 y_1); then for any a = a_1 a_2 ··· a_n ∈ (XY)^n, where a_i = x_i y_i, we
have

p(a) = p(x_1 ··· x_n y_1 ··· y_n) = p(xy) = Π_{i=1}^n p(x_i y_i) = Π_{i=1}^n p(a_i)

and p(a_i) = p(a_1); therefore, XY is a memoryless source, that is, a group of indepen-
dent and identically distributed random vectors ξ = (ξ_1, ξ_2, ..., ξ_n, ...) takes values
in XY, and (XY)^n = X^n Y^n is its power space. The Lemma holds.

The following lemma further characterizes the statistical properties of a memoryless channel.

Lemma 3.20 If {X, T, Y} is a discrete memoryless channel, the conditional entropy
H(Y^n|X^n) and the mutual information I(X^n, Y^n) of the information spaces X^n and
Y^n satisfy, for all n ≥ 1,

H(Y^n|X^n) = n H(Y|X),   I(X^n, Y^n) = n I(X, Y).    (3.83)

Proof Because X Y is a memoryless source, we have

H (X n Y n ) = H ((X Y )n ) = n H (X Y ) = n H (X ) + n H (Y |X ).

On the other hand, by the addition formula of entropy, there is

H (X n Y n ) = H (X n ) + H (Y n |X n ) = n H (X ) + H (Y n |X n ).

Combining the above two formulas gives

H (Y n |X n ) = n H (Y |X ).

According to the definition of mutual information,

I (X n , Y n ) = H (Y n ) − H (Y n |X n )
= n H (Y ) − n H (Y |X )
= n(H (Y ) − H (Y |X )) = n I (X, Y ).

The Lemma holds.

Let us define the channel capacity of a discrete channel; this concept plays an
important role in channel coding. First, we note that the joint probability distribution
p(xy) on the product space XY is uniquely determined by the probability distribution
p(x) on X and the probability transition matrix T, that is, p(xy) = p(x) p(y|x);
therefore, the mutual information I(X, Y) of X and Y is also uniquely determined
by p(x) and T. In fact,

I(X, Y) = Σ_{x∈X} Σ_{y∈Y} p(xy) log (p(xy) / (p(x) p(y)))
        = Σ_{x∈X} Σ_{y∈Y} p(x) p(y|x) log (p(y|x) / Σ_{x′∈X} p(x′) p(y|x′)).

Definition 3.20 The channel capacity B of a discrete memoryless channel {X, T, Y}
is defined as

B = max_{p(x)} I(X, Y),    (3.84)

where the maximum in (3.84) is taken over all probability distributions p(x) on X.

Lemma 3.21 The channel capacity B of a discrete memoryless channel {X, T, Y}
satisfies

0 ≤ B ≤ min{log |X|, log |Y|}.

Proof The amount of mutual information between the two information spaces is
I (X, Y ) ≥ 0 (see Lemma 3.5), so there is B ≥ 0. By Lemma 3.4,

I (X, Y ) = H (X ) − H (X |Y ) ≤ H (X ) ≤ log |X |

and
I (X, Y ) = H (Y ) − H (Y |X ) ≤ H (Y ) ≤ log |Y |,

so we have
0 ≤ B ≤ min{log |X |, log |Y |}.

The calculation of channel capacity is a constrained convex optimization problem
(a conditional extremum of a concave function). We will not discuss it in detail here,
but instead compute the channel capacity of two simple channels.

Example 3.8 The channel capacity of the noiseless channel {X, T, Y} is B = log |X|.

Proof Let {X, T, Y} be a noise-free channel; then |X| = |Y|, and the probability
transition matrix T is the identity matrix, so

I(X, Y) = Σ_{x∈X} Σ_{y∈Y} p(xy) log (p(y|x) / p(y))
        = Σ_{x∈X} Σ_{y∈Y} p(x) p(y|x) log (p(y|x) / p(y)).

Because p(y|x) = 0 if y ≠ x, and p(y|x) = 1 if y = x, there is

I(X, Y) = Σ_{x∈X} p(x) log (1 / p(x)) = H(X) ≤ log |X|,

with equality when p(x) is uniform. Thus

B = max_{p(x)} I(X, Y) = log |X|.

Example 3.9 The channel capacity B of the binary symmetric channel {X, T, Y} is

B = 1 − p log p − (1 − p) log(1 − p) = 1 − H(p),

where p < 1/2 and H(p) is the binary entropy function.

Proof In the binary symmetric channel {X, T, Y}, X = Y = F_2 = {0, 1}, and T is the
second-order symmetric matrix

T = ( 1−p   p
       p   1−p ),  p < 1/2.

Let a be the random variable in the input space F_2 and b the random variable in the
output space F_2, both obeying two-point distributions; then the transfer matrix T of
the symmetric binary channel can be represented by the following clearer schematic:

P{b = 1|a = 0} = P{b = 0|a = 1} = p,
P{b = 0|a = 0} = P{b = 1|a = 1} = 1 − p.

Calculating the mutual information I(X, Y), there is

I(X, Y) = H(X) − H(X|Y);

however,

H(X|Y) = −Σ_{x∈F_2} Σ_{y∈F_2} p(xy) log p(x|y) = −p log p − (1 − p) log(1 − p) = H(p).

Thus

B = max{I(X, Y)} = max{H(X) − H(p)} = 1 − H(p).
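Numerically, the maximization in (3.84) can be checked by a search over input distributions; the following Python sketch (our own, with an illustrative crossover probability) recovers B = 1 − H(p) for the binary symmetric channel:

    import numpy as np

    def mutual_information(px, T):
        """I(X, Y) for input distribution px and transition matrix T[x, y] = p(y|x)."""
        pxy = px[:, None] * T                 # joint p(x, y)
        py = pxy.sum(axis=0)                  # marginal p(y)
        mask = pxy > 0
        return float((pxy[mask]
                      * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask])).sum())

    # Capacity B = max over p(x) of I(X, Y); a grid search over the input
    # distribution is adequate for a binary input alphabet.
    p = 0.1
    T = np.array([[1 - p, p], [p, 1 - p]])
    B = max(mutual_information(np.array([a, 1 - a]), T)
            for a in np.linspace(0.001, 0.999, 999))
    H_p = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    print(B, 1 - H_p)    # both approximately 0.531, matching B = 1 - H(p)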

In order to state and prove the channel coding theorem, we introduce the concept
of the joint typical sequence. By Definition 3.13 of Sect. 5 of this chapter, if X is a
memoryless source, then for any small ε > 0 and positive integer n ≥ 1, in the power
space X^n we define the typical set W_ε^{(n)} as

W_ε^{(n)} = {x = x_1 ··· x_n ∈ X^n : |−(1/n) log p(x) − H(X)| < ε}.

If {X, T, Y} is a memoryless channel, then by Lemma 3.19, XY is a memoryless source,
and in the power space (XY)^n = X^n Y^n we define the joint typical set W_ε^{(n)} as
(see Fig. 3.4)

W_ε^{(n)} = { xy ∈ X^n Y^n : |−(1/n) log p(x) − H(X)| < ε, |−(1/n) log p(y) − H(Y)| < ε,
              |−(1/n) log p(xy) − H(XY)| < ε }.    (3.85)

[Fig. 3.4 The transfer matrix]

Lemma 3.22 (Joint asymptotic equipartition) In a memoryless channel {X, T, Y}, the
joint typical set W_ε^{(n)} satisfies the following asymptotic equipartition properties:
(i) lim_{n→∞} P{xy ∈ W_ε^{(n)}} = 1;
(ii) (1 − ε) 2^{n(H(XY)−ε)} ≤ |W_ε^{(n)}| ≤ 2^{n(H(XY)+ε)};
(iii) if x ∈ X^n, y ∈ Y^n, and p(xy) = p(x) p(y), then

(1 − ε) 2^{−n(I(X,Y)+3ε)} ≤ P{xy ∈ W_ε^{(n)}} ≤ 2^{−n(I(X,Y)−3ε)}.

Proof By Lemma 3.13, we have

−(1/n) log p(X^n) → H(X), convergence in probability as n → ∞;
−(1/n) log p(Y^n) → H(Y), convergence in probability as n → ∞;
−(1/n) log p(X^n Y^n) → H(XY), convergence in probability as n → ∞.

So when ε is given, as long as n is sufficiently large, there is

P_1 = P{|−(1/n) log p(x) − H(X)| > ε} < ε/3,
P_2 = P{|−(1/n) log p(y) − H(Y)| > ε} < ε/3,
P_3 = P{|−(1/n) log p(xy) − H(XY)| > ε} < ε/3,

where x ∈ X^n, y ∈ Y^n. Thus, it can be obtained that

P{xy ∉ W_ε^{(n)}} ≤ P_1 + P_2 + P_3 < ε.

Thus

P{xy ∈ W_ε^{(n)}} > 1 − ε;

in other words,

lim_{n→∞} P{xy ∈ W_ε^{(n)}} = 1.

Property (i) holds. To prove (ii), let x ∈ X^n, y ∈ Y^n, and xy ∈ W_ε^{(n)}; then

H(XY) − ε < −(1/n) log p(xy) < H(XY) + ε,

or equivalently,

2^{−n(H(XY)+ε)} < p(xy) < 2^{−n(H(XY)−ε)}.

By the total probability formula,

1 = Σ_{xy∈X^nY^n} p(xy) ≥ Σ_{xy∈W_ε^{(n)}} p(xy) ≥ |W_ε^{(n)}| 2^{−n(H(XY)+ε)}.

So there is

|W_ε^{(n)}| ≤ 2^{n(H(XY)+ε)}.

On the other hand, when n is sufficiently large,

1 − ε < P{xy ∈ W_ε^{(n)}} = Σ_{xy∈W_ε^{(n)}} p(xy) ≤ |W_ε^{(n)}| 2^{−n(H(XY)−ε)}.

So there is

(1 − ε) 2^{n(H(XY)−ε)} ≤ |W_ε^{(n)}| ≤ 2^{n(H(XY)+ε)},

and property (ii) holds. Now let us prove property (iii). If p(xy) = p(x) p(y), then

P{xy ∈ W_ε^{(n)}} = Σ_{xy∈W_ε^{(n)}} p(x) p(y)
               ≤ |W_ε^{(n)}| 2^{−n(H(X)−ε)} 2^{−n(H(Y)−ε)}
               ≤ 2^{n(H(XY)+ε−H(X)−H(Y)+2ε)}
               = 2^{−n(I(X,Y)−3ε)}.

The lower bound can be proved similarly, so we have

(1 − ε) 2^{−n(I(X,Y)+3ε)} ≤ P{xy ∈ W_ε^{(n)}} ≤ 2^{−n(I(X,Y)−3ε)}.

We have completed the proof of the Lemma.
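The convergence in property (i) rests on the law of large numbers for −(1/n) log p; a small Monte Carlo sketch (our own, for a single Bernoulli source rather than the joint pair) makes the effect visible:

    import numpy as np

    # Empirical check in the spirit of Lemma 3.22 (i): the probability that
    # a length-n sequence is typical tends to 1.  Source: Bernoulli(0.3).
    rng = np.random.default_rng(0)
    q, eps = 0.3, 0.05
    H = -q * np.log2(q) - (1 - q) * np.log2(1 - q)

    for n in (50, 200, 1000):
        x = rng.random((5000, n)) < q          # 5000 sample sequences
        ones = x.sum(axis=1)
        logp = ones * np.log2(q) + (n - ones) * np.log2(1 - q)
        typical = np.abs(-logp / n - H) < eps
        print(n, typical.mean())               # fraction in W grows toward 1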

The following lemma has important applications in proving the channel coding
theorem. In fact, the conclusion of the lemma is valid in a general probability space.

Lemma 3.23 In a memoryless channel {X, T, Y}, if the codeword y ∈ Y^n is uniquely
determined by x ∈ X^n, and x′ ∈ X^n is independent of x, then y and x′ are also
independent.
Proof If y is uniquely determined by x, then p(y|x) = 1 and p(xy) = p(x). Therefore,
the probability of the joint event yxx′ is

p(yxx′) = p(xx′) = p(x) p(x′).

On the other hand, summing the joint probability over all x that determine this y, and
noting that p(y) is exactly the sum of p(x) over such x,

p(yx′) = Σ_x p(yxx′) = Σ_{x determines y} p(x) p(x′) = p(y) p(x′).

Thus

p(yx′) = p(y) p(x′).

The Lemma holds.

In order to define the error probability of channel transmission, we first introduce
the workflow of channel coding. After source compression coding, a source message
input set is generated,

W = {1, 2, ..., M}, where M ≥ 1 is a positive integer.

An injection f: W → X^n is called the coding function; f encodes each input message
w ∈ W as f(w) ∈ X^n. The codeword x = f(w) ∈ X^n is received as a codeword y ∈ Y^n
after transmission through the channel {X, T, Y}; we write x →_T y, or y = T(x). A
mapping g: Y^n → W is called the decoding function. Therefore, the so-called channel
coding is a pair of mappings (f, g). Obviously,

C = f(W) = {f(w) | w ∈ W} ⊂ X^n

is a code with length n in the codeword space X^n, whose number of codewords is
|C| = |W| = M. C is the code of f. The code rate R_C is

R_C = (1/n) log |C| = (1/n) log M.

For each input message w ∈ W, if g(T(f(w))) ≠ w, it is said that the channel
transmission is wrong; the transmission error probability λ_w is

λ_w = P{g(T(f(w))) ≠ w}, w ∈ W.    (3.86)

The transmission error probability of the codeword x = f(w) ∈ C is recorded as P_e(x);
obviously, P_e(x) = λ_w, that is, P_e(x) is the conditional probability

P_e(x) = P{g(T(x)) ≠ w | x = f(w)} = P{g(T(f(w))) ≠ w} = λ_w.    (3.87)

We define the transmission error probability of the code C = f(W) ⊂ X^n as P_e(C),

P_e(C) = (1/M) Σ_{x∈C} P_e(x) = (1/M) Σ_{w=1}^M λ_w.    (3.88)

As before, a code C with length n and M codewords is recorded as
C = (n, M).
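To see the workflow (f, T, g) in action, here is a toy Python simulation (our own): a random codebook over a binary symmetric channel, with minimum-Hamming-distance decoding in place of the joint-typicality decoder used in the proof below. All parameters are illustrative; R = 0.25 is below B = 1 − H(0.05) ≈ 0.71:

    import numpy as np

    rng = np.random.default_rng(1)
    n, R, p, trials = 20, 0.25, 0.05, 2000
    M = 2 ** int(n * R)                              # number of messages
    code = rng.integers(0, 2, size=(M, n))           # random encoder f: W -> X^n

    errors = 0
    for _ in range(trials):
        w = rng.integers(M)                          # message, chosen uniformly
        noise = (rng.random(n) < p).astype(int)      # channel T flips bits w.p. p
        y = code[w] ^ noise
        w_hat = np.argmin((code ^ y).sum(axis=1))    # decoder g: nearest codeword
        errors += (w_hat != w)
    print(errors / trials)   # estimated Pe(C_n); it shrinks as n grows, R fixed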

Theorem 3.12 (Shannon's channel coding theorem, 1948) Let {X, T, Y} be a mem-
oryless channel and B be the channel capacity; then
(i) when R < B, there is a sequence of codes C_n = (n, 2^{[nR]}) whose transmission
error probability P_e(C_n) satisfies

lim_{n→∞} P_e(C_n) = 0;    (3.89)

(ii) conversely, if the transmission error probability of the codes C_n = (n, 2^{[nR]})
satisfies Eq. (3.89), there is an absolute constant N_0 such that the code rate R_{C_n}
of C_n satisfies

R_{C_n} ≤ B, when n ≥ N_0.

If C_n = (n, 2^{[nR]}), then by Lemma 2.27 of Chap. 2,

R − 1/n < R_{C_n} ≤ R.    (3.90)

So (i) of Theorem 3.12 indicates that "good codes", with rate arbitrarily close to the
channel capacity B and arbitrarily small transmission error probability, exist; (ii)
indicates that the rate of a good code with sufficiently small transmission error
probability does not exceed the channel capacity. Shannon's proof of Theorem 3.12
uses the random coding technique; this idea of using a random method to prove a
deterministic result is widely used in information theory, and it has found more and
more applications in other fields.

Proof (Proof of Theorem 3.12) Firstly, a probability function p(x_i) is arbitrarily
selected on the input alphabet X, and the joint probability on the power space X^n is
defined as

p(x) = Π_{i=1}^n p(x_i),  x = x_1 ··· x_n ∈ X^n.    (3.91)

In this way, we get a memoryless source X and power space X^n, which consti-
tute the codeword space of channel coding. Then M = 2^{[nR]} codewords are ran-
domly selected in X^n to obtain a random code C_n = (n, 2^{[nR]}). In order to describe
the randomness of codeword selection, we borrow the source message set
W = {1, 2, ..., M}, where M = 2^{[nR]}. For every message w, 1 ≤ w ≤ M, the ran-
domly generated codeword is denoted X^{(n)}(w). So we get a random code

C_n = {X^{(n)}(1), X^{(n)}(2), ..., X^{(n)}(M)} ⊂ X^n.

The generation probability P{C_n} of C_n is

P{C_n} = Π_{w=1}^M P{X^{(n)}(w)} = Π_{w=1}^M Π_{i=1}^n p(x_i(w)),

where X^{(n)}(w) = x_1(w) x_2(w) ··· x_n(w) ∈ X^n.


We take A_n = {C_n} as the set of all random codes C_n, called the random code set.
The average transmission error probability over the random code set A_n is defined
as

P̄_e(A_n) = Σ_{C_n∈A_n} P{C_n} P_e(C_n).    (3.92)

If we can prove that for any ε > 0, P̄_e(A_n) < ε when n is sufficiently large, then
there is at least one code C_n ∈ A_n such that P_e(C_n) < ε, which proves (i).
Therefore, we prove it in two steps.
Therefore, we prove it in two steps.
(1) Principles of constructing random codes and of encoding and decoding
We select each message in the source message set W = {1, 2, ..., M} with equal
probability; that is, for w ∈ W, the selection probability of w is

p(w) = 1/M = 2^{−[nR]}, w = 1, 2, ..., M.

In this way, W becomes an equiprobable information space. Each input message w
is randomly coded as X^{(n)}(w) ∈ X^n, where

X^{(n)}(w) = x_1(w) x_2(w) ··· x_n(w) ∈ X^n.

The codeword X^{(n)}(w) is transmitted through the memoryless channel {X, T, Y} with
conditional probability

p(y|X^{(n)}(w)) = Π_{i=1}^n p(y_i|x_i(w)),

and the codeword y = y_1 y_2 ··· y_n ∈ Y^n is received. The decoding principle for y is:
if X^{(n)}(w) is the only input codeword such that X^{(n)}(w)y is jointly typical, that is,
X^{(n)}(w)y ∈ W_ε^{(n)}, then decode g(y) = w; if there is no such codeword X^{(n)}(w),
or if two or more codewords X^{(n)}(w) are jointly typical with y, then y cannot be
decoded correctly.
(2) Estimating the average error probability of the random code set A_n
By (3.92) and (3.88),

P̄_e(A_n) = Σ_{C_n∈A_n} P{C_n} P_e(C_n)
        = Σ_{C_n∈A_n} P{C_n} (1/M) Σ_{x∈C_n} P_e(x)
        = (1/M) Σ_{w=1}^M λ_w Σ_{C_n∈A_n} P{C_n}
        = (1/M) Σ_{w=1}^M λ_w,    (3.93)

where λ_w is given by Eq. (3.86). Because w is input with equal probability, in
other words, w is encoded with equal probability, the transmission error probability
λ_w of w does not depend on w; that is,

λ_1 = λ_2 = ··· = λ_M.

By (3.93), we have P̄_e(A_n) = λ_1. To estimate λ_1, we define

E_i = {y ∈ Y^n | X^{(n)}(i)y ∈ W_ε^{(n)}}, i = 1, 2, ..., M.    (3.94)

If E_1^c = Y^n \ E_1 is the complement of E_1, then by the decoding principle,

λ_1 = P{E_1^c ∪ E_2 ∪ ··· ∪ E_M} ≤ P{E_1^c} + Σ_{i=2}^M P{E_i}.    (3.95)

By property (i) of Lemma 3.22,

lim_{n→∞} P{xy ∉ W_ε^{(n)}} = 0.

So there is

lim_{n→∞} P{X^{(n)}(1)y ∉ W_ε^{(n)}} = 0.

Therefore, when n is sufficiently large,

P{E_1^c} < ε.

Obviously, the codeword X^{(n)}(1) and the other codewords X^{(n)}(i) (i = 2, ..., M) are
independent of each other (see (3.91)). By Lemma 3.23, y = T(X^{(n)}(1)) and X^{(n)}(i)
(i ≠ 1) are also independent of each other. Then by property (iii) of Lemma 3.22,

P{E_i} = P{X^{(n)}(i)y ∈ W_ε^{(n)}} ≤ 2^{−n(I(X,Y)−3ε)}  (i ≠ 1).


To sum up,

P̄_e(A_n) = λ_1 ≤ ε + Σ_{i=2}^M 2^{−n(I(X,Y)−3ε)}
        ≤ ε + 2^{[nR]} 2^{−n(I(X,Y)−3ε)}
        ≤ ε + 2^{−n(I(X,Y)−R−3ε)}.

If R < I(X, Y), then I(X, Y) − R − 3ε > 0 (when ε is sufficiently small), so when n
is large enough we have P̄_e(A_n) < 2ε. Since the channel capacity is B = max{I(X, Y)},
we can choose p(x) so that B = I(X, Y). So when R < B, we have P̄_e(A_n) < 2ε,
which completes the proof of (i).
To prove (ii), let us look at a special case first. If the error probability of C =
(n, 2^{[nR]}) is P_e(C) = 0, then the rate of C is R_C < B + 1/n, so when n is sufficiently
large, there is R_C ≤ B.
In fact, because P_e(C) = 0, the decoding function g: Y^n → W determines W
uniquely, so H(W|Y^n) = 0. Because W is an equiprobable information space,

H(W) = log |W| = [nR].

Using the decomposition of mutual information, there is

I(W, Y^n) = H(W) − H(W|Y^n) = H(W) = [nR].    (3.96)

On the other hand, W → X^n → Y^n forms a Markov chain; by the data-processing
inequality (see Theorem 3.8),

I(W, Y^n) ≤ I(X^n, Y^n).

By Lemma 3.20,

I(W, Y^n) ≤ I(X^n, Y^n) = n I(X, Y) ≤ n B.

By (3.96), there is [nR] ≤ nB. Because nR − 1 < [nR] ≤ nR, we get nR < nB + 1,
that is, R < B + 1/n; by (3.90), we have

R_C ≤ R < B + 1/n,

thus

R_C ≤ B, when n is sufficiently large.

The above shows that when the transmission error probability is 0, as long as
n is sufficiently large, there is R_C ≤ B. Secondly, suppose transmission errors are
allowed, that is, the error probability of C_n = (n, 2^{[nR]}) is P_e(C_n) < ε. Then when
n is sufficiently large, we still have R_{C_n} ≤ B.
In order to prove the above conclusion, we note that the error probability of the random
code C_n is

P_e(C_n) = λ_w,    (3.97)

where w ∈ W is any given message. When w is given, we define a random variable
ξ_w with values in {0, 1} by

ξ_w = 1, if g(T(f(w))) ≠ w;   ξ_w = 0, if g(T(f(w))) = w.

Let E = (F_2, ξ_w) be a binary information space; by (3.97), we then have

P_e(C_n) = P{ξ_w = 1}.

By Theorem 3.3,

H(EW|Y^n) = H(W|Y^n) + H(E|WY^n) = H(E|Y^n) + H(W|EY^n).    (3.98)

Note that E is uniquely determined by Y^n and W, so H(E|WY^n) = 0; at the same
time, E is a binary information space with H(E) ≤ log 2 = 1, so there is

H(E|Y^n) ≤ H(E) ≤ 1.

On the other hand, the random variable ξ_w flags exactly the decoding errors, so by
Fano's argument,

H(W|EY^n) ≤ P_e(C_n) log(|W| − 1) ≤ nR P_e(C_n).

By (3.98), we have

H(W|Y^n) ≤ 1 + nR P_e(C_n).

Because f(W) = X^{(n)}(W) is a function of W, we have the following Fano inequality:

H(f(W)|Y^n) ≤ H(W|Y^n) ≤ 1 + nR P_e(C_n).

Finally,

[nR] = H(W) = H(W|Y^n) + I(W, Y^n)
     ≤ H(W|Y^n) + I(f(W), Y^n)
     ≤ 1 + nR P_e(C_n) + I(X^n, Y^n)
     ≤ 1 + nR P_e(C_n) + nB;

because nR − 1 < [nR], we then have

nR < 2 + nR P_e(C_n) + nB.
Thus

R_{C_n} ≤ R < B + 2/n + ε.

When n is sufficiently large, we obtain R_{C_n} ≤ B, which completes the proof of the
theorem.

It can be seen from Example 3.9 that the channel capacity of a binary symmetric
channel is B = 1 − H(p). Therefore, Theorem 3.12 extends Theorem 2.10 of the
previous chapter to the more general memoryless channel; at the same time, it also
proves that the code rate of a good code does not exceed the capacity of the channel.
Exercise 3
1. The joint probability function of two information spaces X and Y is as follows:

            X = 0    X = 1
   Y = 0     1/4      1/4
   Y = 1     1/12     5/12

Solve for H(X), H(Y), H(XY), H(X|Y), H(Y|X), and I(X, Y).


2. Let X_1, X_2, X_3 be three information spaces on F_2. Given I(X_1, X_2) = 0 and
I(X_1, X_2, X_3) = 1, prove:

H(X_3) = 1, and H(X_1 X_2 X_3) = 2.

3. Give an example to illustrate I(X, Y|Z) ≥ I(X, Y).
4. Can I(X, Y|Z) = 0 be derived from I(X, Y) = 0? In turn, can I(X, Y|Z) = 0
imply I(X, Y) = 0? Prove or give counterexamples.
5. Let X, Y, Z be three information spaces; prove:
(i) H(XY|Z) ≥ H(X|Z);
(ii) I(XY, Z) ≥ I(X, Z);
(iii) H(XYZ) − H(XY) ≤ H(XZ) − H(X);
(iv) I(X, Z|Y) = I(X, Y|Z) − I(X, Y) + I(X, Z).
Also explain under what conditions the equality signs hold.
6. Can I(X, Y) = 0 imply I(X, Z) = I(X, Z|Y)?
7. Let the information space be X = {0, 1, 2, ...} and let the probability of the
random variable ξ be

p(n) = P{ξ = n}, n = 0, 1, ....

Given the mathematical expectation Eξ = A > 0 of ξ, find the probability distribution
{p(n) | n = 0, 1, ...} that maximizes H(X), and the corresponding maximum
information entropy.
8. Let the information space be X = {0, 1, 2, ...}; give an example of a random
variable ξ with values in X such that H(X) = ∞.
9. Let X_1 = (X, ξ), X_2 = (X, η) be two information spaces and ξ be a function of
η; prove H(X_1) ≤ H(X_2), and explain this result.
10. Let X_1 = (X, ξ), X_2 = (X, η) be two information spaces and η = f(ξ); prove
(i) H(X_1) ≥ H(X_2), and give the conditions under which the equality sign holds;
(ii) H(X_1|X_2) ≥ H(X_2|X_1), and give the conditions under which the equality sign
holds.

References

Bassoli, R., Marques, H., & Rodriguez, J. (2013). Network coding theory, a survey. IEEE Commun.
Surveys Tutor., 15(4), 1950–1978.
Berger, T. (1971). Rate distortion theory: a mathematical basis for data compression. Prentice-Hall.
Billingsley, P. (1965). Ergodic theory and information. Wiley.
Chung, K. L. (1961). A note on the ergodic theorem of information theory. Annals of Mathematical
Statistics, 32, 612–614.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. Wiley.
Csiszár, I., & Körner, J. (1981). Informaton theory: Coding theorems for discrete memoryless
systems. Academic Press.
El Gamal, A., & Kim, Y. H. (2011). Network information theory. Cambridge University Press.
Fragouli, C., Le Boudec, J. Y., & Widmer, J. (2006). Network coding: An instant primer. ACMSIG-
COMM Computer Communication Review, 36, 63–68.
Gallager, R. G. (1968). Information theory and reliable communication. Wiley.
Gray, R. M. (1990). Entropy and information theory. Springer.
Guiasu, S. (1977). Information theory with applications. McGraw-Hill.
Ho, T., & Lun, D. Network coding: An introduction. Computer Journal.
Hu, X. H., & Ye, Z. X. (2006). Generalized quantum entropy. Journal of Mathematical Physics,
47(2), 1–7.
Ihara, S. (1993). Information theory for continuous systems. World Scientific.
Kakihara, Y. (1999). Abstract methods in information theory. World Scientific.
McMillan, B. (1953). The basic theorems of information theory. Annals of Mathematical Statistics,
24(2), 196–219.
Moy, S. C. (1961). A note on generalizations of the Shannon–McMillan theorem. Pacific Journal of
Mathematics, 11, 705–714.
Nielsen, M. A., & Chuang, I. L. (2000). Quantum computation and quantum information. Cambridge
University Press.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Labs Technical Journal,
27(4), 379–423, 623–656.
Shannon, C. E. (1959). Coding theorem for a discrete source with a fidelity criterion. IRE National
Convention Record, 4, 142–163.
Shannon, C. E. (1958). Channels with side information at the transmitter. IBM Journal of Research
and Development, 2(4), 189–193.
Shannon, C. E. (1961). Two-way communication channels. Proceedings of the Fourth Berkeley
Symposium on Mathematical Statistics and Probability, 1, 611–644.
Thomasian, A. J. (1960). An elementary proof of the AEP of information theory. Annals of Math-
ematical Statistics, 31(2), 452–456.
Wolfowitz, J. (1978). Coding theorems of information theory (3rd ed.). Springer-Verlag.

Ye, Z. X., & Berger, T. (1998). Information measures for discrete random fields. Science Press.
Yeung, R. W. (2002). A first course in information theory. Kluwer Academic.
Qiu, P. (2003). Information theory and coding. Higher Education Press. (in Chinese).
Qiu, P., Zhang, C., Yang, S., et al. (2012). Multi user information theory. Science Press. (in Chinese).
Ye, Z. (2003). Fundamentals of information theory. Higher Education Press. (in Chinese).
Zhang, Z., & Lin, X. (1993). Information theory and optimal coding. Shanghai Science and Tech-
nology Press. (in Chinese).

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 4
Cryptosystem and Authentication System

In 1949, Shannon published the famous paper "Communication Theory of Secrecy
Systems" in the Bell System Technical Journal. Based on the mathematical
theory of information he established in 1948 (see Chap. 3), this paper gives a
comprehensive discussion of the problem of secure communication and establishes
the mathematical theory of secure communication systems. It has had a great impact
on the later development of cryptography. It is generally believed that Shannon
transformed cryptography from an art (creative ways and methods) into a science, so he
is also known as the father of modern cryptography. The main purpose of this chapter
is to introduce Shannon's important ideas and results in cryptography theory, which
are the cornerstone of the whole of modern cryptography.

4.1 Definition and Statistical Characteristics of Cryptosystem


Let X = {a_1, a_2, ..., a_q} be the plaintext alphabet and a source. {ξ_i}_{i=1}^∞ is a set of
random variables valued in X; for any given positive integer n ≥ 1, we define the
plaintext space P as the product information space X_1 X_2 ··· X_n, that is,

P = X_1 X_2 ··· X_n, where X_i = (X, ξ_i), 1 ≤ i ≤ n.

If m = m_1 m_2 ··· m_n ∈ P (m_i ∈ X_i), m is called a plaintext information column of
alphabet length n, or a plaintext string of length n; the joint probability p(m) is
defined as

p(m) = p(m_1 m_2 ··· m_n) = P{ξ_1 = m_1, ξ_2 = m_2, ..., ξ_n = m_n}.    (4.1)


Let Z = {b_1, b_2, ..., b_s} be the key alphabet, which is also a memoryless source
(see Definition 3.9 of Chap. 3); let {η_i}_{i=1}^∞ be a group of independent random
variables with values in Z and equal probability distribution; then for any b ∈ Z,

p(b) = P{η_i = b} = 1/|Z| = 1/s, ∀ i ≥ 1.    (4.2)

We define the power space Z^r as the key space, denoted by K, that is,

K = Z^r = {k = k_1 k_2 ··· k_r | k_i ∈ Z, 1 ≤ i ≤ r}.

Each k = k_1 k_2 ··· k_r ∈ K is called a key of length r, and the joint probability p(k)
is

p(k) = p(k_1 k_2 ··· k_r) = Π_{i=1}^r p(k_i) = 1/|Z|^r = 1/|K|.    (4.3)

This shows that the r-dimensional random vector η = (η_1, η_2, ..., η_r) taking values
in the key space K is uniformly distributed on K. Unless otherwise specified, we
generally stipulate that the plaintext space P and the key space K are independent
information spaces, that is,

p(mk) = p(m) p(k), ∀ m ∈ P, k ∈ K.    (4.4)

For every k ∈ K, k defines or controls an encryption transformation E_k; denote

E = {E_k | k ∈ K}.

E is called the encryption algorithm. When k ∈ K is given, the encryption transformation
E_k acts on the plaintext m ∈ P to produce a ciphertext E_k(m); each encryption
transformation E_k is an injection, and its left inverse mapping is recorded as D_k,
which is called the decryption transformation. Taking 1_P as the identity transformation
of the plaintext space, that is, 1_P(m) = m, ∀ m ∈ P, we have

D_k E_k = 1_P, or D_k(E_k(m)) = m, ∀ m ∈ P.    (4.5)

Define the ciphertext space C as

C = {E_k(m) | m ∈ P, k ∈ K} ⊂ X_1 X_2 ··· X_n.    (4.6)

That is, the ciphertext space C and the plaintext space P have the same alphabet and
the same letter length.
For each ciphertext c ∈ C with c = E_k(m), c is uniquely determined by the
plaintext m and the key k, so we can define the occurrence probability p(c) of the
ciphertext c as
p(c) = p(E_k(m)) = p(km) = p(k) p(m).    (4.7)

Obviously,

Σ_{c∈C} p(c) = Σ_{k∈K} Σ_{m∈P} p(km) = Σ_{k∈K} Σ_{m∈P} p(k) p(m) = 1.

Therefore, the ciphertext space C defined by formula (4.7) is also an information
space.
When k ∈ K is given, we let

A_k = {E_k(m) | m ∈ P} ⊂ C.    (4.8)

Then the encryption transformation E_k is a surjection from P onto A_k, so E_k is a 1–1
correspondence P → A_k, and its inverse mapping is the decryption transformation
D_k, that is,

D_k E_k = 1_P, E_k D_k = 1_{A_k}, k ∈ K.

We denote by D the set of all decryption transformations, that is,

D = {D_k | k ∈ K}.    (4.9)

D is called the decryption algorithm.

Definition 4.1 Under the above provisions, R = {P, C, K, E, D} is called a cryp-
tosystem, where P, C, K are information spaces, K and P are statistically inde-
pendent, E is the encryption algorithm and D is the decryption algorithm.

The statistical characteristics of a cryptosystem are summarized in the following
theorem.

Theorem 4.1 If R = {P, C, K, E, D} is a cryptosystem, then
(1) ∀ c ∈ C, we have

p(c) = Σ_{k∈K: c∈A_k} p(k) p(D_k(c)).    (4.10)

(2) For c ∈ C, m ∈ P,

p(c|m) = Σ_{k∈K: E_k(m)=c} p(k).    (4.11)

(3) For c ∈ C, m ∈ P,

p(m|c) = [ p(m) Σ_{k∈K: E_k(m)=c} p(k) ] / [ Σ_{k∈K: c∈A_k} p(k) p(D_k(c)) ],    (4.12)

where A_k is given by Eq. (4.8).

Proof By (4.7), if c ∈ C, then

p(c) = Σ_{k∈K, m∈P: E_k(m)=c} p(km) = Σ_{k∈K, m∈P: E_k(m)=c} p(k) p(m)
     = Σ_{k∈K: c∈A_k} p(k) Σ_{m∈P: E_k(m)=c} p(m) = Σ_{k∈K: c∈A_k} p(k) p(D_k(c)),

so (1) holds. (2) is trivial, because when m ∈ P is given, the occurrence probability
p(c|m) of the ciphertext c is

p(c|m) = Σ_{k∈K: E_k(m)=c} p(k).

To prove (3), by (1.24),

p(m|c) = p(m) p(c|m) / Σ_{m′∈P} p(m′) p(c|m′),

and the denominator is

Σ_{m′∈P} p(m′) p(c|m′) = Σ_{m′∈P} p(m′) Σ_{k∈K: E_k(m′)=c} p(k)
                      = Σ_{k∈K} p(k) Σ_{m′∈P: E_k(m′)=c} p(m′)
                      = Σ_{k∈K: c∈A_k} p(k) p(D_k(c)).

So finally

p(m|c) = [ p(m) Σ_{k∈K: E_k(m)=c} p(k) ] / [ Σ_{k∈K: c∈A_k} p(k) p(D_k(c)) ],

and Theorem 4.1 holds.
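As a numerical check of Theorem 4.1, the following Python sketch tabulates (4.10) and (4.12) for a toy system of our own choosing (a shift cipher on Z_4 with uniform keys; not a system from the text):

    P_ = C_ = K_ = range(4)
    pm = {0: 0.5, 1: 0.25, 2: 0.15, 3: 0.10}       # plaintext distribution
    pk = {k: 1 / 4 for k in K_}                    # keys uniform, as in (4.14)
    E = lambda k, m: (m + k) % 4                   # E_k(m)
    D = lambda k, c: (c - k) % 4                   # D_k(c)

    # (4.10): p(c) = sum over k of p(k) p(D_k(c))
    pc = {c: sum(pk[k] * pm[D(k, c)] for k in K_) for c in C_}

    # (4.12): p(m|c) = p(m) * sum_{k: E_k(m)=c} p(k) / p(c)
    def pmc(m, c):
        return pm[m] * sum(pk[k] for k in K_ if E(k, m) == c) / pc[c]

    print(pc)                          # uniform: each c has probability 1/4
    print([pmc(m, 0) for m in P_])     # equals p(m): this toy system leaks nothing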

By Theorem 4.1, the statistical characteristics of a cryptosystem can be summarized
as follows: the probability distribution of the ciphertext space and the conditional
probability distribution of the plaintext given the ciphertext are completely
determined by the probability distributions of the plaintext space and the key space.
That is, anyone who knows the probability distributions of the plaintext space and
the key space knows the probability distribution of the ciphertext and the conditional
probability distribution of the plaintext given the ciphertext.

distribution of plaintext about cryptosystemtext.
It is assumed that the plaintext space and the key space are statistically indepen-
dent, by (3.14) of Theorem 3.2 of Chap. 3, we have

H (P K ) = H (P) + H (K ). (4.13)

It has been previously specified that the key source alphabet Z is an equal proba-
bility information space without memory, and the probability p(k) of the key space
K = {k = k1 k2 · · · kr |ki ∈ Z } is


r
1 1
p(k) = p(ki ) = = . (4.14)
i=1
|Z |r |K |

Therefore, the key space is also an equal probability information space.


From the definition of cryptosystem, when the plaintext space and key space are
given, the cryptosystemtext space is completely determined. On the contrary, when
the cryptosystemtext space and key space are known, the plaintext space is also
known, combined with Lemma 3.3 in the previous chapter, we have
Theorem 4.2 In a cryptosystem R = {P, C, K , E, D}, we have

H (P|K C) = 0, H (C|K P) = 0, H (K |PC) = 0.

Proof We only prove H(P|KC) = 0; H(C|KP) = 0 and H(K|PC) = 0 are similar.
For given m ∈ P, let

N_m = {kc | c ∈ C, k ∈ K and E_k(m) = c}.

Thus N_m ⊂ KC, and

p(m|kc) = 1, if kc ∈ N_m;   p(m|kc) = 0, if kc ∉ N_m.

For if kc ∈ N_m is selected, then E_k(m) = c, thus m = D_k(c), and m is determined.
Conversely, if kc ∉ N_m, then when the joint event kc occurs, m cannot occur, thus
p(m|kc) = 0. By Lemma 3.3 of the previous chapter, H(P|KC) = 0. We complete the
proof of Theorem 4.2.
occur, thus p(m|kc) = 0. By Lemma 3.3 from the previous chapter, H (P|K C) = 0,
we complete the proof of Theorem 4.2.

Corollary 4.1 In a cryptosystem R = {P, C, K, E, D}, we always have

H(P) ≤ H(C).

Proof It is stipulated that P and K are statistically independent, so there is

H(P) = H(P|K) = H(P|K) + H(C|PK)
     = H(PC|K)
     = H(C|K) + H(P|KC)
     = H(C|K)
     ≤ H(C).

The Corollary holds.

The Corollary shows that in a cryptosystem the uncertainty of the plaintext is at
most that of the ciphertext.

4.2 Fully Confidential System

Generally speaking, the mutual information I(P, C) between the plaintext space and
the ciphertext space (see Definition 3.8 of the previous chapter) reflects the infor-
mation about the plaintext space contained in the ciphertext space, so minimizing
I(P, C) is an important design goal of a cryptosystem. If the ciphertext does not
provide any information about the plaintext, or the analyst cannot obtain any infor-
mation about the plaintext by observing the ciphertext, such a cryptosystem is called
completely confidential.

Definition 4.2 A cryptosystem R = {P, C, K, E, D} with H(P|C) = H(P), or
equivalently I(P, C) = 0, is called a complete secrecy system, or an unconditional
secrecy system.

Theorem 4.3 For any cryptosystem R = {P, C, K, E, D}, we have

I(P, C) ≥ H(P) − H(K).    (4.15)

Proof By Theorem 4.2, we have H(K|PC) = 0 and H(P|KC) = 0, so

H(P|C) = H(P|C) + H(K|PC)
       = H(PK|C)
       = H(K|C) + H(P|KC)
       = H(K|C) ≤ H(K).

By definition,

I(P, C) = H(P) − H(P|C) ≥ H(P) − H(K).

So the Theorem holds.

From the previous chapter we know that the mutual information I(X, Y) ≥ 0, so
there is

Corollary 4.2 In a completely confidential cryptosystem R = {P, C, K, E, D},
there is always

H(P) ≤ H(K) = log_2 |K|.    (4.16)

Proof Since R = {P, C, K, E, D} is a completely confidential system, I(P, C) = 0.
From the above theorem,

H(P) − H(K) ≤ I(P, C) = 0,

thus H(P) ≤ H(K). By (4.14), K is uniformly distributed, so

H(P) ≤ log_2 |K|.

It can be seen from the above that the larger the scale |K| of the key space, the
better the confidentiality of the system.
Definition 4.3 A cryptosystem R = {P, C, K, E, D} is called a "one secret at a
time" system if for given m ∈ P and c ∈ C there is a unique key k ∈ K such that
c = E_k(m).

As can be seen from the above definition, for given m ∈ P, if k ≠ k′, then E_k(m) ≠
E_{k′}(m). In other words, a unique key k is used to encrypt a given pair of plaintext
and ciphertext; this is the origin of the concept of "one secret at a time". Thus, for
any given plaintext m ∈ P and ciphertext c ∈ C, there is exactly one key k ∈ K such
that E_k(m) = c. Therefore, when k traverses the key space K, m = D_k(c) traverses
the plaintext space P, and each m appears exactly once. Thus, for c ∈ C, we have

p(c) = Σ_{k∈K} Σ_{m∈P: E_k(m)=c} p(k) p(m)
     = (1/|K|) Σ_{m∈P} p(m) = 1/|K|.    (4.17)

That is to say, in a one-time cryptosystem, the ciphertext space C is also an
equiprobable information space.
equal probability information space.
Theorem 4.4 The one-time password system is a completely confidential system.

Proof When c and m are given, by (4.11),

p(c|m) = Σ_{k∈K: E_k(m)=c} p(k) = 1/|K|.

By (4.12) and (4.17),

p(m|c) = p(m) · (1/|K|) / p(c) = p(m) · (1/|K|) / (1/|K|) = p(m).

Thus

H(P|C) = −Σ_{m∈P} Σ_{c∈C} p(mc) log_2 p(m|c)
       = −Σ_{m∈P} Σ_{c∈C} p(mc) log_2 p(m)
       = −Σ_{m∈P} p(m) log_2 p(m)
       = H(P).

Therefore, R = {P, C, K, E, D} is a completely confidential system.
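Theorem 4.4 can also be verified numerically; the sketch below (our own example: the XOR one-time pad on 2-bit blocks with an arbitrary plaintext distribution) computes H(P|C) and H(P) and finds them equal:

    from math import log2

    # P = C = K = {0,1}^2 encoded as 0..3, keys uniform, E_k(m) = m XOR k.
    msgs = list(range(4))
    pm = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}      # illustrative plaintext law
    pk = 1 / 4

    # joint p(m, c): for each m, the unique key k = m XOR c contributes pk
    pmc = {(m, m ^ k): pm[m] * pk for m in msgs for k in msgs}
    pc = {c: sum(pmc.get((m, c), 0) for m in msgs) for c in msgs}

    H_P = -sum(p * log2(p) for p in pm.values())
    H_P_given_C = -sum(pmc[m, c] * log2(pmc[m, c] / pc[c]) for (m, c) in pmc)
    print(H_P, H_P_given_C)    # equal, so I(P, C) = 0: complete secrecy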

4.3 Ideal Security System

In order to introduce Shannon's concepts of the unique solution distance and the ideal
cryptosystem, we first consider the scenario of a ciphertext-only attack. In this
scenario, when the cryptanalyst intercepts a ciphertext c, he may decrypt c
with all decryption transformations D_k to obtain

m′ = D_k(c), k ∈ K.

He then records the keys corresponding to all meaningful messages m′; only one
of this set of keys is the correct key, while the other, incorrect keys are called pseudo
keys. A large number of ciphertexts are required as samples in a ciphertext-only
attack. Therefore, we consider the product spaces P^n and C^n of plaintext and cipher-
text, whose joint events are plaintext strings and ciphertext strings.
Definition 4.4 For a ciphertext string y ∈ C^n of given length n, let

K(y) = {k ∈ K | ∃ x ∈ P^n such that E_k(x) = y}.    (4.18)

Then the number of pseudo keys is |K(y)| − 1. The mathematical expectation S_n of
the number of pseudo keys is defined as

S_n = Σ_{y∈C^n} p(y) (|K(y)| − 1).    (4.19)

Therefore, the mathematical expectation of the number of pseudo keys is the weighted
average of the number of pseudo keys over all ciphertext strings. We first prove the
following two theorems.

Theorem 4.5 If R = {P, C, K, E, D} is a cryptosystem, then

H(K|C) = H(K) + H(P) − H(C).    (4.20)

Proof From the addition formula of information entropy (see Theorem 3.2 of the
previous chapter),

H(KPC) = H(KP) + H(C|KP) = H(KC) + H(P|KC).

By Theorem 4.2, we have

H(C|KP) = H(P|KC) = 0;

thus,

H(KP) = H(KC).

Again from the addition formula, and noting that K and P are statistically independent,

H(KP) = H(P) + H(K|P) = H(P) + H(K)

and

H(P) + H(K) = H(KC) = H(C) + H(K|C).

So we have

H(K|C) = H(P) + H(K) − H(C).

The Theorem holds.

Theorem 4.6 Let R = {P, C, K, E, D} be a cryptosystem with |C| = |P|, and let r
be the redundancy of P; then the mathematical expectation S_n of the number of
pseudo keys for a ciphertext string of given length n satisfies

S_n ≥ 2^{H(K)} / |P|^{nr} − 1.    (4.21)

Proof From the definition and properties of the product space,

R_n = {P^n, C^n, K, E_n, D_n}

also constitutes a cryptosystem. By Theorem 4.5,

H(K|C^n) = H(K) + H(P^n) − H(C^n).

By (3.9), we have

H(C^n) ≤ n log_2 |C|, |C| = |P|.

Replacing the information space X with P, we have

H(P^n) = H(P^{n−1}) + H(P|P^{n−1}) = H(P^{n−1}) + H_n ≥ H(P^{n−1}) + H_∞.

So we have

H(P^n) ≥ n H_∞ = n(1 − r) H_0 = n(1 − r) log_2 |P|.    (4.22)

Combining the above formulas, we have the estimate

H(K|C^n) ≥ H(K) + n(1 − r) log_2 |P| − n log_2 |P|.    (4.23)

By definition,

H(K|C^n) = −Σ_{y∈C^n} Σ_{k∈K} p(ky) log_2 p(k|y)
         = −Σ_{y∈C^n} p(y) Σ_{k∈K(y)} p(k|y) log_2 p(k|y),

and we have

Σ_{k∈K(y)} p(k|y) = 1.

Then by the Jensen inequality,

H(K|C^n) ≤ Σ_{y∈C^n} p(y) log_2 |K(y)|
         ≤ log_2 Σ_{y∈C^n} p(y) |K(y)|
         = log_2 (S_n + 1).

Finally, (4.21) is obtained from (4.23), which completes the proof.

When the mathematical expectation of the number of pseudo keys is greater than
0, a ciphertext-only attack cannot break the cipher in theory, so we define the unique
solution distance of a cryptosystem as the smallest n for which S_n = 0.

Definition 4.5 A cryptosystem whose unique solution distance is infinite is called
an ideal security system.

From Theorem 4.6, we can obtain an approximate value of the unique solution
distance:

n_0 ≈ H(K) / (r log_2 |P|).

The unique solution distance indicates the minimum amount of ciphertext from which
decryption may succeed under an exhaustive attack. Generally speaking, the greater
the unique solution distance, the better the confidentiality of the system. However,
Shannon only gives the existence of the unique solution distance and does not give a
specific calculation procedure. In practice, the amount of ciphertext required to break
a cipher is far greater than the theoretical value of the unique solution distance.
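For orientation, the classical worked example (not computed in the text) applies n_0 ≈ H(K)/(r log_2 |P|) to a simple substitution cipher on English, with H(K) = log_2 26! and redundancy r ≈ 0.75:

    from math import log2, factorial

    H_K = log2(factorial(26))          # about 88.4 bits for a substitution key
    r, alphabet = 0.75, 26             # English redundancy (approximate)
    n0 = H_K / (r * log2(alphabet))
    print(n0)                          # about 25: a few dozen letters suffice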

4.4 Message Authentication

An authentication system, also known as an authentication code, is an important tool
to ensure the authenticity and integrity of messages. In 1984, Simmons systematically
put forward the information theory of authentication systems for the first time. He used
mathematics to study the theoretical and practical security of authentication systems,
and put forward performance limits for authentication systems and the mathematical
principles that should be followed in the design of authentication codes. Although
Simmons' theory is not fully mature and perfect, its position in authentication
systems is as important as Shannon's theory in cryptosystems; it lays a theoretical
foundation for the research on the mathematical theory of authentication systems.
In cryptography, authentication includes entity authentication and message
authentication; we mainly discuss message authentication systems. At present, there
are two main models of authentication systems. One is the arbiter-free authentication
model. In this model, the participants of the system are mainly the message
sender, the message receiver and an attacker, where the message sender and receiver
trust each other and share the same key information. The other model is the authenti-
cation model with an arbiter. In this model, the participants include an arbiter in
addition to the message sender, receiver and attacker. In this case the sender and
receiver of the message do not trust each other, but they both trust the arbiter, who
shares key information with the sender and receiver.
An authentication system without a privacy (confidentiality) function and without an
arbiter is composed of four parts: a finite set S of source states, called the source set;
a finite set A of authentication tags, called the tag set; a key space K composed of all
possible keys; and a set of authentication rules E = {e_k(s) | k ∈ K, s ∈ S}, where for
any k ∈ K, s ∈ S, e_k is an authentication rule, a mapping of S → A.

Definition 4.6 An authentication system is T = {S, A, K, E}, where S, A, K are
information spaces: S is the source space or source set, A is the label (tag) space,
and K is the key space, where S and K are statistically independent, and

E = {e_k(s) | k ∈ K, s ∈ S},

where each e_k is an injection of S → A, called an authentication rule.

Definition 4.7 The product space SA is called the message space, denoted by M.

Authentication protocol: the sender and receiver of messages use the following
protocol to transmit information. First, they secretly select and share a random
key k ∈ K. If the sender wants to transmit a source state s ∈ S to the receiver, the
sender calculates a = e_k(s) and sends the message sa ∈ M to the receiver. When the
receiver receives the message sa, he calculates a′ = e_k(s) again; if a′ = a, he confirms
that the message is reliable and accepts it, otherwise he refuses to accept the
message sa.

Definition 4.8 The matrix [e_k(s)]_{|K|×|S|} is called the authentication matrix. Its rows
are indexed by the keys k ∈ K and its columns by the source states s ∈ S; it is a
|K| × |S|-order matrix whose element at the intersection of row k and column s is
e_k(s).

The authentication matrix is an important tool in authentication theory research. We
list it in detail as follows: let K = {k_1, k_2, ..., k_n}, S = {s_1, s_2, ..., s_m}. Then the
authentication matrix is the n × m-order matrix

( e_{k_1}(s_1)  e_{k_1}(s_2)  ···  e_{k_1}(s_m)
  e_{k_2}(s_1)  e_{k_2}(s_2)  ···  e_{k_2}(s_m)
  ...
  e_{k_n}(s_1)  e_{k_n}(s_2)  ···  e_{k_n}(s_m) )_{n×m}.
4.5 Forgery Attack

In the process of message authentication, the attacker is a man in the middle. We
usually consider two types of attacks: one is the forgery attack and the other is the
substitution attack, which correspond to the ciphertext-only attack and the plaintext
attack on a cryptosystem. In a forgery attack, the attacker sends a message sa ∈ M
in the channel and wants the receiver to confirm that it is true and accept it; in a
substitution attack, the attacker first observes a message sa ∈ M in the channel,
analyzes the coding rule currently used, and then replaces the message sa with
s′a′ ∈ M, where s′ ≠ s, wanting the receiver to accept it as a real message.
We assume that the attacker adopts the optimal deception strategy. p_{d0} denotes
the maximum probability that a forgery attacker succeeds in deception, and p_{d1} the
maximum probability that a substitution attacker succeeds. The overall probability
p_d that an attacker succeeds in deception is defined as

p_d = max{p_{d0}, p_{d1}}.    (4.24)

Simmons' theory mainly estimates lower bounds on p_d, so as to provide a theoretical
basis for constructing authentication codes with deception probability p_d as small
as possible.
First, let us look at the definition and estimation of the maximum probability p_{d0}
of successful deception by a forgery attacker.
Let A = {a_1, a_2, ..., a_r} denote the authentication tag space. The attacker first
selects a source state s ∈ S and an authentication tag a ∈ A. Let k_0 ∈ K denote
the shared key selected by the sender and receiver; if a = e_{k_0}(s), the forgery attacker
successfully deceives the receiver. We use pay off(s, a) to denote the probability that
the message receiver accepts sa as a true message, that is,

pay off(s, a) = p(a = e_{k_0}(s)) = Σ_{k∈K: e_k(s)=a} p(k).    (4.25)

If the attacker adopts the optimal strategy, then

p_{d0} = max{pay off(s, a) | s ∈ S, a ∈ A}.    (4.26)

Theorem 4.7 If the scale of the authentication tag space A is |A| = r, then for any
fixed source state s ∈ S there is always an authentication tag a ∈ A such that

pay off(s, a) ≥ 1/r, and thus p_{d0} ≥ 1/r.
Proof By the definition of pay off(s, a),

Σ_{a∈A} pay off(s, a) = Σ_{a∈A} Σ_{k∈K: e_k(s)=a} p(k).

When a runs through the s-column of the authentication matrix, k traverses the whole
key space, so

Σ_{a∈A} pay off(s, a) = Σ_{k∈K} p(k) = 1.

Therefore, there is at least one a ∈ A such that

pay off(s, a) ≥ 1/|A| = 1/r.
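Formulas (4.25)-(4.26) are easy to evaluate by brute force from an authentication matrix. The Python sketch below does so for a toy 4-key, 2-source system of our own choosing (it realizes the affine family e_{(k_1,k_2)}(s) = k_1 s + k_2 over F_2, with uniform keys):

    import numpy as np

    M = np.array([[0, 0],      # row k: (e_k(s1), e_k(s2)), tags in {0, 1}
                  [0, 1],
                  [1, 0],
                  [1, 1]])
    pk = np.full(4, 1 / 4)

    def payoff(s, a):
        """pay off(s, a): total key probability with e_k(s) = a, as in (4.25)."""
        return pk[M[:, s] == a].sum()

    pd0 = max(payoff(s, a) for s in range(2) for a in range(2))
    print(pd0)    # 1/2 = 1/|A|, meeting the lower bound of Theorem 4.7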

Theorem 4.8 Let T = {S, A, K, E} be a message authentication system and

p_{d0} = max{pay off(s, a) | s ∈ S, a ∈ A}

be the maximum probability of a successful forgery attack; then

log_2 p_{d0} ≥ H(K|SA) − H(K)

and

p_{d0} ≥ 1 / 2^{H(K)−H(K|SA)}.
Proof By definition, p_{d0} is not less than the mathematical expectation of
pay off(s, a), that is,

p_{d0} ≥ Σ_{s∈S,a∈A} p(sa) pay off(s, a).

Then by the Jensen inequality, we have

log_2 p_{d0} ≥ log_2 Σ_{s∈S,a∈A} p(sa) pay off(s, a)
           ≥ Σ_{s∈S,a∈A} p(sa) log_2 pay off(s, a).

Obviously,

pay off(s, a) = p(a|s);

thus

p(sa) = p(s) p(a|s) = p(s) pay off(s, a).    (4.27)

So

log_2 p_{d0} ≥ Σ_{s∈S} Σ_{a∈A} p(sa) log_2 pay off(s, a)
           = Σ_{s∈S} Σ_{a∈A} p(sa) log_2 p(a|s)
           = −H(A|S).

Because the source space S and the key space K are statistically independent,

H(SK) = H(K) + H(S).

Also, the tag space A is completely determined by the source space S and the key
space K, so

H(A|KS) = 0.

By the addition formula of information spaces,

H(KAS) = H(AS) + H(K|AS) = H(S) + H(A|S) + H(K|AS).

On the other hand,

H(KAS) = H(KS) + H(A|KS) = H(KS) = H(K) + H(S).

Altogether, we have

−H(A|S) = H(K|AS) − H(K).

Thus

log_2 p_{d0} ≥ −H(A|S) = H(K|AS) − H(K).

We have completed the proof of the theorem.

M = SA is called the message space. It can be seen from Theorem 4.8 that the maxi-
mum success probability p_{d0} of a forgery attack satisfies

p_{d0} ≥ 1 / 2^{I(K,M)},

where I(K, M) is the average mutual information between the key space and the
message space. If the mutual information I(K, M) is larger, the maximum success
probability of a forgery attack is lower; conversely, if the mutual information is
smaller, the success rate of a forgery attack is higher.
4.6 Substitute Attack

In a so-called substitution attack, the attacker first observes a message (s, a) in the
channel and then replaces (s, a) with a message (s′, a′), hoping that the receiver will
accept (s′, a′) as a real message. Considering the maximum success probability p_{d1}
of a substitution attack is more difficult than for a forgery attack; the main reason is
that p_{d1} depends on both the probability distribution of the source state space S and
the probability distribution of the key space K.
Let (s′, a′) and (s, a) be two messages, where s′ ≠ s. We use pay off(s′, a′, s, a)
to express the probability that using (s′, a′) instead of (s, a) deceives successfully;
then

pay off(s′, a′, s, a) = p(a′ = e_{k_0}(s′) | a = e_{k_0}(s)), k_0 ∈ K.

The above formula represents the conditional probability of a′ = e_{k_0}(s′) under the
condition a = e_{k_0}(s) for the same key k_0, so

pay off(s′, a′, s, a) = p(a′ = e_{k_0}(s′), a = e_{k_0}(s)) / p(a = e_{k_0}(s))
                     = Σ_{k∈K: e_k(s′)=a′, e_k(s)=a} p(k) / pay off(s, a).    (4.28)

When the message (s, a) ∈ M is given, the attacker uses the optimal strategy to
maximize the success probability, so let

p_{s,a} = max{pay off(s′, a′, s, a) | s′ ∈ S, s′ ≠ s, a′ ∈ A}.    (4.29)

Taking p_{s,a} as a random variable, its mathematical expectation over the message set
M = SA is

p_{d1} = Σ_{s∈S,a∈A} p(sa) p_{s,a}.    (4.30)

The above formula is the formal definition of p_{d1}: the weighted average over the
message space M of the maximum success probabilities p_{s,a}.
Analogously to Theorem 4.7, we have

Theorem 4.9 Let T = {S, A, K, E} be an authentication code with |A| = r; then for
any given s ∈ S, s′ ∈ S with s′ ≠ s, and a ∈ A, there is a label a′ ∈ A such that

pay off(s′, a′, s, a) ≥ 1/|A| = 1/r.
So we have

p_{d1} ≥ 1/r.

Proof By (4.28),

Σ_{a′∈A} pay off(s′, a′, s, a) = (1 / pay off(s, a)) Σ_{a′∈A} Σ_{k∈K: e_k(s)=a, e_k(s′)=a′} p(k)
                              = (1 / pay off(s, a)) Σ_{k∈K: e_k(s)=a} p(k) = 1.

So at least one a′ ∈ A satisfies

pay off(s′, a′, s, a) ≥ 1/|A| = 1/r.

By the definition of p_{s,a}, for ∀ s ∈ S and a ∈ A, we have

p_{s,a} ≥ 1/|A| = 1/r.

Thus

p_{d1} = Σ_{s∈S,a∈A} p(sa) p_{s,a} ≥ (1/r) Σ_{s∈S,a∈A} p(sa) = 1/r.
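The substitution-attack quantities (4.28)-(4.30) can be computed by the same brute force; the sketch below reuses the illustrative toy matrix from Sect. 4.5 (uniform sources and keys, our own assumptions):

    import numpy as np

    M = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # M[k, s] = e_k(s)
    pk = np.full(4, 1 / 4)

    def payoff2(s2, a2, s, a):
        """pay off(s', a', s, a) of (4.28): substituted message passes."""
        num = pk[(M[:, s] == a) & (M[:, s2] == a2)].sum()
        den = pk[M[:, s] == a].sum()
        return num / den

    # (4.30) with uniform sources: p(s, a) = (1/2) * payoff(s, a)
    pd1 = sum((0.5 * pk[M[:, s] == a].sum())
              * max(payoff2(1 - s, a2, s, a) for a2 in (0, 1))
              for s in (0, 1) for a in (0, 1))
    print(pd1)    # 1/2: seeing one message halves the keys, the other tag stays uniform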

Theorem 4.10 Let T = {S, A, K, E} be an authentication code. For any (s, a) ∈ M,
under substitution by (s′, a′), let p_{d1} be the mathematical expectation of p_{s,a} over
the space M; then

log_2 p_{d1} ≥ H(K|M^2) − H(K|M)

and

p_{d1} ≥ 1 / 2^{H(K|M)−H(K|M^2)}.    (4.31)

Proof By (4.29), p_{s,a} is not less than the mathematical expectation of
pay off(s′, a′, s, a) over s′ ∈ S, a′ ∈ A, that is,

p_{s,a} ≥ Σ_{s′∈S,a′∈A} p(s′a′|sa) pay off(s′, a′, s, a).

By (4.30) and the Jensen inequality, we have

log_2 p_{d1} ≥ Σ_{s∈S,a∈A} p(sa) log_2 p_{s,a}
           ≥ Σ_{s∈S,a∈A} Σ_{s′∈S,a′∈A} p(sas′a′) log_2 pay off(s′, a′, s, a)
           = Σ_{s∈S,a∈A} Σ_{s′∈S,a′∈A} p(sas′a′) log_2 p(a′s′|as)
           = −H(M|M),

where H(M|M) denotes the conditional entropy of the second observed message given
the first. In addition, expanding H(KM^2) in two ways and canceling the common term
H(M),

H(M|M) + H(K|M^2) = H(K|M) + H(M|KM).

So there is

−H(M|M) = H(K|M^2) − H(K|M) − H(M|KM).

It can be proved that

H(M|KM) = 0.

So there is

−H(M|M) = H(K|M^2) − H(K|M).

Thus

log_2 p_{d1} ≥ H(K|M^2) − H(K|M),

that is,

p_{d1} ≥ 1 / 2^{H(K|M)−H(K|M^2)}.

The Theorem holds.

Definition 4.9 An authentication code {S, A, K, E} is called perfect if

p_d = 2^{H(K|M)−H(K)}.

Theorem 4.11 Perfect authentication systems exist.

Proof The theorem is proved directly by construction. First, let the source state
space be S = {0, 1}. Let N be a positive even number, and define the label space A
and the key space K as follows:

A = Z_2^{N/2} = {a_1 a_2 ··· a_{N/2} | a_i ∈ Z_2, 1 ≤ i ≤ N/2}

and

K = Z_2^N = {k_1 k_2 ··· k_N | k_i ∈ Z_2, 1 ≤ i ≤ N}.

The authentication rule e_k(s) determined by k = k_1 k_2 ··· k_{N/2} k_{N/2+1} ··· k_N is defined
as

e_k(0) = k_1 k_2 ··· k_{N/2}

and

e_k(1) = k_{N/2+1} ··· k_N.

Assuming that all 2^N keys k are selected with equal probability, for s ∈ S and a ∈ A
we have

pay off(s, a) = p(a = e_k(s)) = 2^{−N/2}.

So p_{d0} = 2^{−N/2}, and similarly p_{d1} = 2^{−N/2}, so

p_d = 2^{−N/2}.

It is easy to calculate that

H(K|M) − H(K) = N/2 − N = −N/2,

so

p_d = 2^{H(K|M)−H(K)}.

Therefore, {S, A, K, E} is a perfect authentication system.
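The split-key construction above is easy to verify exhaustively for small N; the Python sketch below (our own, N = 4) confirms p_{d0} = p_{d1} = 2^{−N/2}. Here all payoffs coincide, so the maximum equals the weighted average in (4.30):

    from itertools import product

    N = 4
    keys = list(product((0, 1), repeat=N))

    def e(k, s):
        return k[:N // 2] if s == 0 else k[N // 2:]

    # pd0: best chance a forged (s, a) is accepted, keys uniform
    pd0 = max(sum(1 for k in keys if e(k, s) == a) / len(keys)
              for s in (0, 1) for a in product((0, 1), repeat=N // 2))

    # pd1: having observed (s, a), substitute (s', a') with s' != s
    def sub_payoff(s, a, s2, a2):
        ok = [k for k in keys if e(k, s) == a]
        return sum(1 for k in ok if e(k, s2) == a2) / len(ok)

    pd1 = max(sub_payoff(s, a, 1 - s, a2)
              for s in (0, 1) for a in product((0, 1), repeat=N // 2)
              for a2 in product((0, 1), repeat=N // 2))
    print(pd0, pd1)    # both 0.25 = 2**(-N/2)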

4.7 Basic Algorithm

4.7.1 Affine Transformation

Encryption with matrices comes from the classical Vigenère cipher. Let X =
{a_1, a_2, ..., a_N} be a plaintext alphabet of N characters; we identify the characters
of X with the elements of Z_N, where Z_N is the residue class ring mod N.
Let P = Z_N^k be the plaintext space; x = x_1 x_2 ··· x_k ∈ P is called a plaintext unit,
or a plaintext message of length k. Let M_k(Z_N) be the ring of k-order full matrices over
Z_N, let A ∈ M_k(Z_N) be an invertible matrix of order k, and let b = b_1 b_2 ··· b_k ∈ Z_N^k
be a given vector; each plaintext unit x = x_1 x_2 ··· x_k in P is encrypted by the affine
transformation (A, b):
(x′_1, x′_2, ..., x′_k)^⊤ = A (x_1, x_2, ..., x_k)^⊤ + (b_1, b_2, ..., b_k)^⊤,    (4.32)

where x = x_1 x_2 ··· x_k is the plaintext and x′ = x′_1 x′_2 ··· x′_k is the ciphertext. The
decryption algorithm is:

(x_1, x_2, ..., x_k)^⊤ = A^{−1} (x′_1, x′_2, ..., x′_k)^⊤ − A^{−1} (b_1, b_2, ..., b_k)^⊤.    (4.33)

Because the affine transformation (A, b) is a 1–1 correspondence Z_N^k → Z_N^k with
inverse transformation (A^{−1}, −A^{−1}b), using the affine transformation (A, b) we
obtain the so-called higher-order affine cryptosystem. This cryptosystem was first
proposed by the mathematician Lester Hill in the American Mathematical Monthly
in 1929, so it is also called the Hill cryptosystem.
The Hill cryptosystem divides the plaintext into groups of k characters and then
encrypts each plaintext unit in turn using the k-order affine transformation (A, b) over
Z_N. The advantage of this cipher is that it hides the statistical characteristics of
single characters (such as the 26 letters in English), so it can better resist statistical
analysis of character frequencies and has a strong ability to resist ciphertext-only
attacks. However, once a large amount of plaintext is known, it is not difficult to find
the key (A, b), so the Hill cipher is not strong against known-plaintext attacks.
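A concrete instance of (4.32)-(4.33) in Python, with our own illustrative key (chosen so that gcd(|A|, 26) = 1):

    import numpy as np

    N = 26
    A = np.array([[3, 3], [2, 5]])     # det = 9, coprime to 26
    b = np.array([1, 7])

    # det 9 has inverse 3 mod 26 (9*3 = 27 ≡ 1); A^{-1} per Lemma 4.2
    D1 = pow(9, -1, N)
    A_inv = (D1 * np.array([[5, -3], [-2, 3]])) % N

    encrypt = lambda x: (A @ x + b) % N
    decrypt = lambda y: (A_inv @ (y - b)) % N

    x = np.array([7, 8])               # plaintext unit "hi" -> (7, 8)
    y = encrypt(x)
    print(y, decrypt(y))               # decrypt(encrypt(x)) == x, as in (4.33)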
The mathematical principles used by the Hill cryptosystem are the following two
conclusions.

Lemma 4.1 The set of all k-order affine transformations over Z_N is written as G_k,
that is,

G_k = {(A, b) | A is a k-order invertible square matrix over Z_N, b ∈ Z_N^k}.

Then G_k forms a group under composition of transformations, called the k-order
affine transformation group of the ring Z_N.

Proof Take A as the k-order identity matrix E and b = 0 as the k-dimensional zero
vector; then (E, 0) is the identity transformation of Z_N^k → Z_N^k and the unit element
of G_k. Secondly, the product of two affine transformations (A_1, b_1) and (A_2, b_2) is

(A_1, b_1)(A_2, b_2) = (A_1 A_2, A_1 b_2 + b_1) ∈ G_k.

Obviously, the inverse transformation of (A, b) is

(A, b)^{−1} = (A^{−1}, −A^{−1}b) ∈ G_k.

Therefore, G_k is a group. The Lemma holds.

From the above lemma, any group element (A, b) ∈ G_k of the affine transformation
group gives a Hill cryptosystem. If we select n group elements (A_1, b_1), (A_2, b_2),
..., (A_n, b_n) in G_k and let

(A, b) = Π_{i=1}^n (A_i, b_i),

then encrypting with (A, b) gives a more complex Hill cryptosystem.

Lemma 4.2 Let A ∈ M_k(Z_N) and let |A| = D be the determinant of A; then A is
invertible if and only if D and N are coprime, that is, (D, N) = 1.

Proof If (D, N) = 1, then there is D_1 such that D_1 D ≡ 1 (mod N). Let

A* = D_1 ( A_{11}  A_{21}  ···  A_{k1}
           A_{12}  A_{22}  ···  A_{k2}
           ···
           A_{1k}  A_{2k}  ···  A_{kk} ),    (4.34)

where A = (a_{ij})_{k×k} and A_{ij} is the algebraic cofactor of a_{ij}. Obviously, we have

A*A = AA* = E, the k-order identity matrix,

so A is invertible, with A^{−1} = A*. Let us take k = 2 as an example: if |A| = D and
(D, N) = 1,

A = ( a  b       ⇒   A^{−1} = (  D_1 d   −D_1 b
      c  d )                    −D_1 c    D_1 a ).

Conversely, if A is invertible with inverse matrix A^{−1}, then because A^{−1}A =
AA^{−1} = E, we get

|A||A^{−1}| ≡ 1 (mod N).

So we have (D, N) = 1. The Lemma holds.
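The adjugate formula (4.34) translates directly into code; below is a sketch of our own for small k, using naive cofactor expansion, whose cost grows as described in Lemma 4.3:

    import numpy as np

    def minor(A, i, j):
        return np.delete(np.delete(A, i, axis=0), j, axis=1)

    def det_mod(A, N):
        """Determinant mod N by Laplace expansion along the first row."""
        k = A.shape[0]
        if k == 1:
            return int(A[0, 0]) % N
        return sum((-1) ** j * int(A[0, j]) * det_mod(minor(A, 0, j), N)
                   for j in range(k)) % N

    def inv_mod(A, N):
        """A^{-1} mod N via (4.34); raises ValueError unless gcd(|A|, N) = 1."""
        k = A.shape[0]
        D1 = pow(det_mod(A, N), -1, N)
        adj = np.array([[(-1) ** (i + j) * det_mod(minor(A, j, i), N)
                         for j in range(k)] for i in range(k)])
        return (D1 * adj) % N

    A = np.array([[3, 3], [2, 5]])
    print(inv_mod(A, 26))            # inv_mod(A, 26) @ A ≡ I (mod 26)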

If k = 1, the first-order affine cryptosystem x′ ≡ ax + b (mod N), where (a, N) = 1,
contains many famous classical ciphers; in particular, when a = 1, b = 3, N = 26,
x′ = x + 3 (mod 26) is the famous Caesar cipher.
Next, we analyze the computational complexity of affine cryptography. We have
the following lemma.
Lemma 4.3 If A = (a_{ij})_{k×k} is a k-order invertible square matrix over Z_N, the
number of bit operations to compute A^{−1} is estimated as

Time(A^{−1}) = O(k^4 k! log^3 N).

Therefore, when k is a fixed constant, the algorithm for finding A^{−1} is polynomial;
when N is a fixed constant, the algorithm for finding A^{−1} is exponential in k. In
other words, the greater the order of the matrix, the higher the computational
complexity.

Proof Because A = (a_{ij})_{k×k} is reversible, the determinant is

$$D = |A| = \sum_{j_1 j_2 \cdots j_k} (-1)^{\tau(j_1 j_2 \cdots j_k)}\, a_{1 j_1} a_{2 j_2} \cdots a_{k j_k},$$

where j_1 j_2 ··· j_k runs over the permutations of 1, 2, . . . , k and τ(j_1 j_2 ··· j_k) is the inversion number of the permutation. The number of bit operations for each summand is O(k^3 log^2 N), and there are k! summands in total, thus

Time(D) = O(k^3 k! log^2 N).

By Lemma 1.5 of Chap. 1, the multiplicative inverse of D under mod N, D_1 = D^{-1} mod N, satisfies

Time(D_1) = O(log^3 N).

The number of bit operations for each algebraic cofactor A_{ij} of the adjoint matrix A^* in formula (4.34) is O((k − 1)^3 (k − 1)! log^2 N), and there are k^2 algebraic cofactors, thus

Time(A^*) = O(k^4 k! log^2 N).

So

Time(A^{-1}) = O(k^4 k! log^3 N).

When k is constant, the algorithm for finding A−1 is polynomial. When N is constant
and k −→ ∞, it is obvious that the algorithm for finding A−1 is exponential. The
Lemma holds.

4.7.2 RSA

In 1976, two mathematicians from Stanford University, Diffie and Hellman, put
forward a new idea of cryptosystem design. In short, the encryption algorithm and
decryption algorithm are designed based on the principle of asymmetry. We can use
the following schematic diagram to illustrate

$$P \xrightarrow{\ f\ } C \xrightarrow{\ f^{-1}\ } P, \qquad (4.35)$$

where P is the plaintext space, C is the ciphertext space, f is the encryption algorithm
and f^{-1} is the decryption algorithm. If f and f^{-1} are the same algorithm (such as
the involution operation in the binary system), or the decryption algorithm f^{-1} can
easily be deduced from the encryption algorithm f (for example, the matrix encryption
algorithm mentioned in the previous section, when the matrix order is very small), the
cryptosystem is called symmetric. The essence of a symmetric cryptosystem is that the
encryption key and the decryption key are equally confidential. Diffie and Hellman
proposed that if f ≠ f^{-1}, f is an encryption algorithm that is easy to implement,
while f^{-1} is a decryption algorithm that is very difficult to compute, then the key
can be divided into an encryption key and a decryption key; even if the encryption key
is made public, the security of the decryption key will not be affected. Such an
encryption algorithm f is called asymmetric, or a trapdoor one-way function. A cipher
using an asymmetric f is called an asymmetric cipher or public key cryptosystem. Due
to the bold innovation of Diffie and Hellman, cryptography ushered in a new era: the
era of public key cryptography. Its basic feature is that ciphers changed from few
users to many users, which greatly improves the efficiency and social value of
cryptography.
How do we design an asymmetric encryption algorithm? Rivest, Shamir and Adleman
jointly put forward the first secure and practical one-way encryption algorithm, which
is called the RSA algorithm in academic circles. This public key cryptosystem has been
widely used in cryptographic design and has become an international standard algorithm.
In addition to its simplicity and practicality, its security depends entirely on the
difficulty of factoring huge integers into large primes.
Let p, q be two large and relatively safe prime numbers; assume

10^{300} < p, q, i.e., the binary length k > 1024 bits. (4.36)

Let n = pq, ϕ(n) be an Euler function, then

ϕ(n) = ( p − 1)(q − 1) = n + 1 − p − q.

Randomly select a positive integer e to satisfy

1 < e < ϕ(n), (e, ϕ(n)) = 1. (4.37)

The large primes p and q, and the e satisfying formula (4.37), are randomly
generated. The so-called random generation is to select p, q and e with the help of the
computer's random number generator (or pseudo-random number generator). Its
computational complexity is given by the following lemma.

Lemma 4.4 Randomly generate large primes p and q, n = pq, where ϕ(n) is the Euler function, and e with 1 < e < ϕ(n), (e, ϕ(n)) = 1. Then

Time (select out n) = O(log4 n),
Time (find e) = O(log2 n).

Proof Use the random number generator to generate a huge integer m, say
m > 10^{300}, and then test whether m, m + 1, m + 2, . . . is a prime number. From
the prime number theorem, we know that the density of primes near m is about
1/log m, so we only need about O(log m) tests to find the required prime p. By
Lemma 1.5 of Chap. 1,

Time (find prime p) = O(log2 m) = O(log2 n).

Similarly,
Time (find prime q) = O(log2 n).

Because n = pq, so
Time (select out n) = O(log4 n).

After n is confirmed, ϕ(n) = (p − 1)(q − 1). A positive integer a, 1 < a < ϕ(n),
is randomly generated by the random number generator, and then whether a, a +
1, a + 2, . . . is coprime to ϕ(n) is tested in turn. Again, according to the
prime number theorem, the density of numbers sharing a prime factor with ϕ(n) in the
vicinity of a is O(1/log a), so we only need O(log a) tests to get the required e. Thus

Time (select out e) = O(log^2 a) = O(log^2 n).

The Lemma holds.


After randomly selecting p, q, n = pq, and e, because (e, ϕ(n)) = 1, there exists
d = e^{-1} mod ϕ(n), that is

de ≡ 1(mod ϕ(n)), 1 < d < ϕ(n). (4.38)

Definition 4.10 After randomly determining n = pq, let Pe = (n, e) be called pub-
lic key, Pd = (n, d) be called private key, or e be public key and d be private key.
By Lemma 1.5 of Chap. 1, the number of bit operations required to compute d = e^{-1} mod ϕ(n) is Time(d) = O(log^3 ϕ(n)) = O(log^3 n). By Lemma 4.4, we have
Corollary 4.3 The computational complexity of randomly generated public key
Pe = (n, e) and private key Pd = (n, d) is polynomial.
The key mathematical principle used in RSA cryptographic design is the general-
ized Euler congruence theorem. n ≥ 1 is a positive integer, (m, n) = 1, from Euler
theorem, it can be seen that

m ϕ(n) ≡ 1(mod n), =⇒ m ϕ(n)+1 ≡ m(mod n). (4.39)



We will prove that under the condition that n is a positive integer without square
factor, there is formula (4.39) for all positive integers m, whether (m, n) = 1 or
(m, n) > 1.

Lemma 4.5 If n = pq is the product of two different prime numbers, then for all
positive integers m, k, there are

m kϕ(n)+1 ≡ m(mod n). (4.40)

Proof If (m, n) = 1, then by Euler Theorem,

m kϕ(n) ≡ 1(mod n), =⇒ m kϕ(n)+1 ≡ m(mod n).

We only consider the case (m, n) > 1. Because n = pq, we have (m, n) = p, (m, n) =
q, or (m, n) = n. If (m, n) = n, then (4.40) holds trivially. We may assume (m, n) = p, then
m = pt, t ≥ 1. By the Euler theorem, because (m, q) = 1, so

$$m^{\varphi(q)} \equiv 1 \pmod q, \Longrightarrow m^{k\varphi(q)\varphi(p)} \equiv 1 \pmod q.$$

That is, for ∀ k ≥ 1,

m^{kϕ(n)} ≡ 1 (mod q).

We write

m^{kϕ(n)} = rq + 1.

Multiplying both sides by m = pt,

m^{kϕ(n)+1} = rtn + m.

The above formula implies

m^{kϕ(n)+1} ≡ m (mod n).

We have completed the proof of lemma.

With the above preparations, the workflow of RSA password can be divided into
the following three steps:
(1) Suppose A is a user of RSA, and A randomly generates two huge prime num-
bers p = p(A), q = q(A), n = n(A), where n = pq, ϕ(n) = ( p − 1)(q − 1).
Then randomly generate positive integers e = e(A), satisfies 1 < e < ϕ(n),
(e, ϕ(n)) = 1, calculated d ≡ e−1 (mod ϕ(n)), and 1 < d < ϕ(n). User A
destroys two prime numbers p and q, and only keeps three numbers n, e, d,
after publishing Pe = (n, e) as public key, he has private key Pd = (n, d) and
keeps it strictly confidential.

(2) User B of another RSA sends encrypted information to user A using the known
public key (n, e) of user A. B selects P = Zn as the plaintext space and encrypts
each m ∈ Zn . The encryption algorithm c = f (m) is defined as

c = f (m) ≡ m e (mod n), 1 ≤ c ≤ n. (4.41)

where c is the ciphertext.
(3) After receiving the ciphertext c sent by user B, user A decrypts it with
his own private key (n, d). The decryption algorithm f^{-1} is defined as:

m = f^{-1}(c) ≡ c^d (mod n), 1 ≤ m ≤ n. (4.42)

User A gets the plaintext m sent by user B. So far, the RSA cryptosystem completes
encryption and decryption.
The correctness and uniqueness of the RSA cipher are guaranteed by the following
Lemma.
Lemma 4.6 The encryption algorithm f defined by equation (4.41) is a 1–1 corre-
spondence of Zn −→ Zn , and f −1 defined by equation (4.42) is the inverse mapping
of f .
Proof By Lemma 4.5, for all m ∈ Zn , k is a positive integer, then there is

m kϕ(n)+1 ≡ m(mod n).

Because of ed ≡ 1(mod ϕ(n)), we can write

ed = kϕ(n) + 1.

By (4.41), then there is

cd ≡ m ed ≡ m kϕ(n)+1 ≡ m(mod n).

That is to say, for all m ∈ Zn ,

f −1 ( f (m)) = m.

In the same way, we have

m^e ≡ c^{ed} ≡ c^{kϕ(n)+1} ≡ c (mod n).

In other words,
f ( f −1 (c)) = c.

By Lemma 1.1 of Chap. 1, f is a 1–1 correspondence of Z_n −→ Z_n, and f f^{-1} = 1, f^{-1} f = 1. The Lemma holds.

Another very important application of RSA is the digital signature. From the
workflow of the RSA cipher, the encryption algorithm defined in formula (4.41) is based
on the public key (n_A, e_A) of user A; we denote it by f_A and the decryption
algorithm defined in formula (4.42) by f_A^{-1}. The workflow
of RSA digital signature is: User A sends his digital signature to user B, that is, A
sends an encrypted message to B. Let Pe (A) = (n A , e A ) be the public key of A and
Pd (A) = (n A , d A ) the private key of A. Similarly, Pe (B) = (n B , e B ) is the public
key of B and Pd (B) = (n B , d B ) is the private key of B. Then the digital signature
sent by user A to user B is

f B f A−1 (m), if n A < n B
(4.43)
f A−1 f B (m), if n A > n B .

where m ∈ Z_{n_A} is the digital signature published by user A. After receiving the
above digital signature of user A, user B adopts one of the following two verification
procedures, according to the two cases n_A < n_B and n_A > n_B, to confirm that
formula (4.43) is the real signature of user A.
(i) If n_A < n_B, user B first decrypts with his private key f_B^{-1} = (n_B, d_B) and then
applies user A's public key f_A = (n_A, e_A); the verification is

$$f_A f_B^{-1}\left(f_B f_A^{-1}(m)\right) = f_A f_A^{-1}(m) = m.$$

(ii) If n_A > n_B, user B first applies user A's public key f_A = (n_A, e_A), then decrypts
and verifies with his own private key f_B^{-1} = (n_B, d_B):

$$f_B^{-1} f_A\left(f_A^{-1} f_B(m)\right) = f_B^{-1} f_B(m) = m.$$
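A small sketch of (4.43) for the case n_A < n_B, with toy keys of our own; it only illustrates why the comparison of moduli decides the order of the two maps (every intermediate value must fit in the modulus applied next).

```python
# Sketch of RSA signature (4.43), case n_A < n_B; toy keys, Python 3.8+.
nA, eA = 3233, 17               # nA = 61 * 53, phi = 3120
dA = pow(eA, -1, 3120)          # A's private exponent
nB, eB = 3599, 31               # nB = 59 * 61, phi = 3480
dB = pow(eB, -1, 3480)          # B's private exponent

m = 1234                        # m in Z_{nA}; here nA < nB
sig = pow(pow(m, dA, nA), eB, nB)          # f_B(f_A^{-1}(m))

# B verifies: peel off f_B with dB, then apply A's public key.
assert pow(pow(sig, dB, nB), eA, nA) == m  # f_A(f_B^{-1}(sig)) = m
```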

The security of RSA is based on the difficulty of factoring n into large primes. When
a user selects the large primes p and q, lets n = pq, and then destroys p and
q, only the public (n, e) and the secret (n, d) key information are retained. Even if (n, e)
is published, outsiders know only n and not ϕ(n), so they
cannot obtain the private key (n, d). The calculation of ϕ(n)
must rely on the prime factorization of n; from Euler's product formula, it is not
difficult to see

$$\varphi(n) = n \prod_{p|n} \left(1 - \frac{1}{p}\right).$$

Because we have very little knowledge of prime numbers, we have not found a general
term formula to give an infinite number of prime numbers, so it is undoubtedly a
difficult problem to judge whether a huge integer n is prime, not to mention the prime
factorization of n.

4.7.3 Discrete Logarithm

Let G be a finite group and b, y ∈ G be two group elements of G, let t be the minimum
positive integer satisfying bt = 1, t is called the order of b, denote as t = o(b). If there
is one x, 1 ≤ x ≤ o(b) such that y = b x , x is called the discrete logarithm of y under
base b. Known b ∈ G, 0 ≤ x ≤ o(b), it’s easy to calculate y = b x . Conversely, for
any group element y, it is very difficult to find the discrete logarithm of y under base
b. Therefore, using discrete logarithm to encrypt has become the most mainstream
encryption algorithm in public key cryptosystem, including the famous ElGamal
cryptosystem and elliptic curve cryptosystem. ElGamal cryptosystem uses the dis-
crete logarithm on the multiplication group formed by all Fq∗ of nonzero elements
in finite field Fq . Elliptic curve cryptography uses the discrete logarithm algorithm
of Mordell group on elliptic curve. Here we mainly discuss ElGamal cryptography,
and elliptic curve cryptography is discussed in Chap. 6. We first prove several basic
conclusions in finite field.

Lemma 4.7 Let Fq be a finite field of q elements and q = p n be the power of


prime p. Fq∗ = Fq \{0} is all the nonzero elements in Fq , then Fq∗ is a cyclic group of
order (q − 1) under multiplication, and the generating element g of Fq∗ is called the
generator of finite field Fq .

Proof According to Lagrange theorem, the number of zeros of polynomials in any


field is not greater than the degree of polynomials. The finite field Fq∗ is a finite group
of order (q − 1) under multiplication. To prove that Fq∗ is a cyclic group, it is only
proved that for any factor d of q − 1, d|q − 1, the number of solutions of equation
x d = 1 in Fq∗ is not greater than d. This point can be deduced from Lagrange’s
theorem, because the number of zeros of polynomial x d − 1 in the whole field Fq
is not greater than d, so the number of zeros in Fq∗ is not greater than d. So Fq∗ is a
finite cyclic group. The Lemma holds.

Lemma 4.8 Let F_q be a q-element finite field, q = p^n, F_p ⊂ F_q a subfield, and F_p^* < F_q^* a subgroup of F_q^*. If g is the generator of F_q^*, then g' = g^{(q−1)/(p−1)} is the generator of F_p^*.

Proof g is the generator of F_q^*, so o(g) = q − 1. Let g' = g^{(q−1)/(p−1)}, then

$$o(g') = \frac{o(g)}{\left(\frac{q-1}{p-1},\ q-1\right)} = p - 1.$$

Thus (g')^{p−1} = 1, that is (g')^p = g', so g' ∈ F_p. Because F_p^* is a cyclic group of order p − 1 and o(g') = p − 1, we have F_p^* = <g'>, so g' is the generator of F_p^*. The Lemma holds.

Lemma 4.9 Let F_q be a q-element finite field, q = p^n. For any d|n, let

A_d = {p(x) ∈ F_p[x] | deg p(x) = d, p(x) is an irreducible monic polynomial}

and

$$f_d(x) = \prod_{p(x) \in A_d} p(x).$$

Then we have

$$x^q - x = x^{p^n} - x = \prod_{d|n} f_d(x). \qquad (4.44)$$

Proof We know

$$x^{p^d} - x \ \Big|\ x^{p^n} - x \iff d \mid n.$$

Let p(x) ∈ A_d, that is, p(x) ∈ F_p[x], deg p(x) = d, p(x) an irreducible monic polynomial. Let α be a root of p(x), and adjoin α to F_p to get the finite extension field F_p(α), a d-th finite extension of F_p. If d|n, then

F_p(α) = F_{p^d} ⊂ F_q,

so α ∈ F_q. Because the zeros of p(x) all lie in F_q, we have p(x) | x^q − x. Any p(x) in A_d satisfies p(x) | x^q − x, so

$$f_d(x) = \prod_{p(x) \in A_d} p(x), \qquad f_d(x) \mid x^q - x.$$

Conversely, suppose p(x) is a monic irreducible polynomial with deg p(x) = d. If p(x) | x^q − x, then the zeros of p(x) are all in F_q. Let α be a zero of p(x); then F_p(α) ⊂ F_q, that is, F_{p^d} ⊂ F_q = F_{p^n}, so d|n. Finally,

$$x^q - x = \prod_{d|n} f_d(x).$$

The Lemma holds.

Lemma 4.10 Let N_p(d) denote the number of monic irreducible polynomials of degree d in F_p[x]. Then

$$N_p(n) = \frac{1}{n} \sum_{d|n} \mu(d)\, p^{n/d}, \qquad (4.45)$$

where μ is the Möbius function.

Proof By Lemma 4.9 and (4.44),

$$x^q - x = x^{p^n} - x = \prod_{d|n} f_d(x).$$

Comparing the degrees of the polynomials on both sides, we get

$$p^n = \sum_{d|n} d\, N_p(d).$$

By the Möbius inversion formula,

$$n N_p(n) = \sum_{d|n} \mu(d)\, p^{n/d},$$

so (4.45) holds. The Lemma holds.

Corollary 4.4 If d is a prime number, the number of monic irreducible polynomials of degree d in F_p[x] is (1/d)(p^d − p), that is,

$$N_p(d) = \frac{1}{d}\left(p^d - p\right), \quad \text{if } d \text{ is a prime number}.$$

Proof By (4.45),

$$N_p(d) = \frac{1}{d} \sum_{\delta|d} \mu(\delta)\, p^{d/\delta} = \frac{1}{d}\left(p^d - p\right).$$

The Corollary holds.

Based on the above basic conclusions about finite fields, we introduce two methods
for solving discrete logarithms. The first is the Silver–Pohlig–Hellman method for
smooth group orders, and the second is the so-called index calculus method.
Silver–Pohlig–Hellman
Let F_q be a q-element finite field and b a generator, that is F_q^* = <b>,

$$o(b) = |F_q^*| = q - 1 = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_s^{\alpha_s}, \qquad (4.46)$$

where the p_i are distinct primes. If every prime factor p of q − 1, p | q − 1, is relatively "small", the positive integer q − 1 is called a smooth positive integer.
Under the condition that q − 1 is smooth, for each prime factor p compute all p-th
roots of unity r_{p,j} in F_q^*, where

$$r_{p,j} = b^{\frac{j(q-1)}{p}}, \quad 1 \le j \le p. \qquad (4.47)$$

Denote by R(p) = {r_{p,j} | 1 ≤ j ≤ p} the set of p-th roots of unity in F_q^*; then in F_q^* we get a table of roots of unity R:

unit root table R = {R(p_1), R(p_2), . . . , R(p_s)}. (4.48)



Now let us look at the method for calculating discrete logarithms in F_q^*. Let y ∈ F_q^* and let the discrete logarithm of y under base b be m, that is, y = b^m. Given y and b, the value of m (1 ≤ m ≤ q − 1) is wanted. By the prime factorization of q − 1 in formula (4.46), if for each p_i^{α_i} (1 ≤ i ≤ s) we know the minimum nonnegative residue m_i = m mod p_i^{α_i}, then according to the Chinese remainder theorem there is a unique m mod q − 1 such that

m ≡ m_i (mod p_i^{α_i}), ∀ i, 1 ≤ i ≤ s.

Therefore, the discrete logarithm m of y is determined. Now the question is: given p^α || q − 1, determine m mod p^α. Let

m mod p^α = m_0 + m_1 p + m_2 p^2 + ··· + m_{α−1} p^{α−1}, 0 ≤ m_i < p,

be the minimum nonnegative residue of m mod p^α; let us determine each m_i. First, we calculate m_0. Because y = b^m,

$$y^{\frac{q-1}{p}} = b^{\frac{m(q-1)}{p}} = b^{\frac{m_0(q-1)}{p}}.$$
That is, y^{(q−1)/p} is a p-th root of unity in F_q^*; comparing with the unit root table R, we find m_0 = j for some j, 1 ≤ j ≤ p, which determines m_0. Next, calculate m_1: let y_1 = y/b^{m_0} = b^{m−m_0}, so the discrete logarithm of y_1 is m − m_0, and

m − m_0 ≡ m_1 p + m_2 p^2 + ··· + m_{α−1} p^{α−1} (mod p^α),

so

$$y_1^{\frac{q-1}{p^2}} = b^{\frac{(m-m_0)(q-1)}{p^2}} = b^{\frac{m_1(q-1)}{p}}.$$

In other words, y_1^{(q−1)/p^2} is a p-th root of unity in F_q^*; comparing with the unit root table R, we can determine m_1. Continuing in this way, we can calculate m_2, . . . , m_{α−1} in turn, so m mod p^α is computed. Then by the Chinese remainder theorem, the discrete logarithm m of y under base b is calculated.
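The digit-by-digit procedure above can be sketched as follows for the prime field case q = p (so F_q^* = Z_p^*); the helper names are ours, and the factorization of q − 1 is assumed to be given.

```python
# A minimal Silver-Pohlig-Hellman sketch in F_p^* (p prime, p - 1 smooth).
def dlog_prime_power(b, y, p, l, alpha):
    # discrete log of y to base b, recovered modulo l^alpha; o(b) = p - 1
    q1 = p - 1
    # table of l-th roots of unity: r_j = b^{j(p-1)/l}, as in (4.47)
    roots = {pow(b, j * (q1 // l), p): j for j in range(l)}
    m, y1 = 0, y
    for i in range(alpha):
        t = pow(y1, q1 // l ** (i + 1), p)       # an l-th root of unity
        digit = roots[t]
        m += digit * l ** i
        y1 = (y1 * pow(b, -digit * l ** i, p)) % p   # strip the found digit
    return m

def pohlig_hellman(b, y, p, factors):
    # factors: dict {l: alpha} with p - 1 = prod l^alpha; combine by CRT
    n, m = 1, 0
    for l, alpha in factors.items():
        mi, ni = dlog_prime_power(b, y, p, l, alpha), l ** alpha
        m = (m + n * ((mi - m) * pow(n, -1, ni) % ni)) % (n * ni)
        n *= ni
    return m

p = 181                         # p - 1 = 180 = 2^2 * 3^2 * 5 is smooth
x = pohlig_hellman(2, 153, p, {2: 2, 3: 2, 5: 1})
assert pow(2, x, p) == 153      # the same data appears in Exercise 16
```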
Index calculus method
Let F_q be the finite field of q elements, q = p^n, where p is a relatively small prime
and n is a large positive integer, so that the size of q meets certain security
requirements. Let F_p be the finite field of p elements. We can think of F_q as an
n-th extension field of F_p; by the finite extension theory of fields, F_q is
equivalent (isomorphic) to a quotient ring of the polynomial ring F_p[T] over F_p. Let
f(T) ∈ F_p[T] be a monic irreducible polynomial of degree n, then

F_q = F_p[T]/<f(T)> = {a_0 + a_1 T + ··· + a_{n−1} T^{n−1} | ∀ a_i ∈ F_p}. (4.49)



Therefore, any element a in F_q corresponds to a polynomial a(T) over F_p with
deg a(T) ≤ n − 1. Let b ∈ F_q be a generator of F_q^*, b = b(T). If a_0 ∈ F_p is a
constant polynomial, a_0 is called a constant in F_q.

By Lemma 4.8, the discrete logarithm of a constant in F_q can be easily determined.
Let b' = b^{(q−1)/(p−1)} ∈ F_p be the generator of F_p^*. If m' is the discrete logarithm of
the constant a_0 ∈ F_p to base b', then by Lemma 4.8, m = m'·(q−1)/(p−1) is the discrete
logarithm of a_0 ∈ F_q under base b. Writing m'(a_0) for the discrete logarithm of a_0
under base b', since p is small we can easily calculate and list the discrete logarithms
of all constants in F_q:

$$L_0 = \left\{ m'(a_0)\,\frac{q-1}{p-1} \ \Big|\ a_0 \in F_p \right\}. \qquad (4.50)$$

Next, we determine the discrete logarithm of a nonconstant polynomial under base
b(T). Let 1 < m < n and define

L_m = {p(x) ∈ F_p[x] | p(x) is a monic irreducible polynomial, deg p(x) ≤ m}. (4.51)

The number of irreducible polynomials in L_m is written as h_m, that is, |L_m| = h_m.
We first calculate the discrete logarithms of the irreducible polynomials in L_m.
Let b = b(T) be the generator of F_q^*, b(T) ∈ F_p[T], deg b(T) ≤ n − 1. Obviously,
when t runs through all positive integers from 1 to q − 1, b^t(T) runs through all
nonzero polynomials in Eq. (4.49). Choosing t appropriately, let

b^t(T) ≡ c(T) (mod f(T)), deg c(T) ≤ n − 1,

such that

$$c(T) = c_0 \prod_{p(T) \in L_m} p(T)^{\alpha_{c,p}}.$$

Denote the discrete logarithm of a(T) under base b(T) by ind(a(T)). From the above formula we obtain

$$\operatorname{ind}(c(T)) - \operatorname{ind}(c_0) \equiv \sum_{p(T) \in L_m} \alpha_{c,p}\, \operatorname{ind}(p(T)) \pmod{q-1}.$$

Because ind(c(T)) = t, thus

$$t - \operatorname{ind}(c_0) \equiv \sum_{p(T) \in L_m} \alpha_{c,p}\, \operatorname{ind}(p(T)) \pmod{q-1}. \qquad (4.52)$$

By (4.50), ind(c_0) is known; therefore, the above formula is a linear equation in the
h_m variables ind(p(T)). By continuously selecting appropriate t, we can obtain
h_m independent linear equations; that is, the h_m × h_m matrix formed by the
coefficients of the h_m variables in the h_m linear equations is invertible mod q − 1,
which by Lemma 4.2 holds as long as its determinant and q − 1 are coprime. From the
knowledge of linear algebra, we can calculate all ind(p(T)) by solving the above
linear equations, and the following table of indices B_m is obtained:

B_m = {ind(p(T)) | p(T) ∈ L_m}. (4.53)

With the table of indices B_m, the discrete logarithm of any element a(T) ∈
F_q^* can be easily calculated. Let a_1(T) = a(T) b(T)^t; select an appropriate t such
that

$$a_1(T) \equiv a_0 \prod_{p(T) \in L_m} p(T)^{\alpha_{a,p}} \pmod{f(T)}.$$

Once such a decomposition is found, we have

$$\operatorname{ind}(a_1(T)) = \operatorname{ind}(a_0) + \sum_{p(T) \in L_m} \alpha_{a,p}\, \operatorname{ind}(p(T)).$$

Thus

ind(a(T)) = ind(a_1(T)) − t,

and the discrete logarithm of a(T) is obtained.

Remark 4.1 The key to the above calculation is to select an appropriate m to obtain
the table of indices B_m. This m cannot be too large, because h_m grows
exponentially with m; for example, if m is a prime number, then by Corollary 4.4,

$$h_m = |L_m| \ge \frac{1}{m}\left(p^m - p\right).$$

When h_m is too large, calculating the table B_m requires solving a system with an
h_m × h_m matrix, and the computational complexity is exponential. Obviously, m
cannot be too small either; the selection of m depends on p and n. When
p = 2, n = 127, the best choice is m = 17. The finite field F_q with q = 2^{127} is
selected because q − 1 = 2^{127} − 1 is a Mersenne prime. This is a popular option at
present.

ElGamal cryptosystem
Using the computational complexity of the discrete logarithm to design an asymmetric
cryptosystem is the basic idea of the ElGamal cryptosystem. Each user randomly selects
a finite field F_q, q = p^n, where p is a sufficiently large prime, and then computes
a generator g of F_q^*; he selects a positive integer x randomly, 1 < x < q − 1, and
calculates y = g^x to get the public key P_e = (y, g, q); his own private key is
P_d = (x, g, q).
Encryption algorithm: To send an encrypted message to user A, user B first maps each
plaintext unit of the plaintext space P to an element of F_q^*, and then encrypts
each plaintext unit. Let m ∈ F_q^* be a plaintext unit; user B randomly selects an
integer k, 1 < k < q − 1, then uses the public key (y, g, q) of user A to encrypt
m, and the encryption algorithm f is

$$f(m) = (c, c'), \quad \text{where} \quad \begin{cases} c = m\,y^k, \\ c' = g^k, \end{cases} \qquad (4.54)$$

and (c, c') is the ciphertext.

Decryption algorithm: After receiving the ciphertext (c, c') sent by user B, user A
decrypts (c, c') with his own private key (x, g, q); the decryption algorithm f^{-1} is

$$f^{-1}(c, c') = c\,(c')^{-x}. \qquad (4.55)$$

Lemma 4.11 The encryption algorithm f defined by Eq. (4.54) is a 1–1 correspondence
of F_q^* −→ F_q^*, and the inverse mapping f^{-1} of f is given by Eq. (4.55).

Proof By (4.54), c' = g^k, c = m y^k, then

c (c')^{-x} = m y^k g^{-kx} = m g^{xk} g^{-xk} = m.

That is to say, f^{-1}(f(m)) = m. Conversely,

c (c')^{-x} y^k = c g^{-xk} g^{xk} = c,

that is, f(f^{-1}(c, c')) = (c, c'). Therefore, f is a 1–1 correspondence of F_q^* −→ F_q^*
and the inverse mapping of f is f^{-1}. The Lemma holds.
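A minimal sketch of the scheme over a prime field (q = p) with toy parameters of our own; pow with a negative exponent (Python 3.8+) computes (c')^{-x} mod p.

```python
import random

# Toy ElGamal over F_p^* with p prime (q = p); insecure demo parameters.
p, g = 2579, 2                  # g = 2 generates F_p^* for this p
x = 765                         # A's private key, 1 < x < p - 1
y = pow(g, x, p)                # A's public key component

def encrypt(m):
    k = random.randrange(2, p - 1)
    return (m * pow(y, k, p) % p, pow(g, k, p))   # (c, c') as in (4.54)

def decrypt(c, c_prime):
    return c * pow(c_prime, -x, p) % p            # c (c')^{-x} as in (4.55)

m = 1299
c, c_prime = encrypt(m)
assert decrypt(c, c_prime) == m
```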
Finally, we discuss the computational complexity over finite fields.
Lemma 4.12 F_q is a finite field, q = p^n, α, β ∈ F_q^*, k ≥ 1 a positive integer. Then

Time(αβ) = O(log^3 q),

Time(α/β) = O(log^3 q),

Time(α^k) = O(log k · log^3 q).

Proof Let f (x) ∈ F p [x], deg f (x) = n, f (x) is a monic irreducible polynomial,
then
Fq = F p [x]/< f (x)> = {a0 + a1 x + · · · + an−1 x n−1 |∀ ai ∈ F p }.

Let α, β ∈ Fq∗ , then

α = a0 + a1 x + · · · + an−1 x n−1 , β = b0 + b1 x + · · · + bn−1 x n−1 .

The multiplication of two polynomials requires n^2 multiplications mod p, and each
mod p operation takes O(log^2 p) bit operations, so α · β needs
O(n^2 log^2 p) = O(log^2 q) bit operations to produce a polynomial in F_p[x]. The
resulting polynomial is divided by f(x) to obtain a polynomial of degree ≤ n − 1, that
is, the final result of α · β; the number of bit operations required for this division is
O(n log^3 p). Therefore,

Time(αβ) = O(log^3 q + n log^3 p) = O(log^3 q),

and Time(α/β) and Time(α^k) can be estimated in the same way. The Lemma holds.
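To make the representation F_q = F_p[x]/<f(x)> used in this proof concrete, here is a small sketch of multiplication in F_8; the coefficient convention (lowest degree first) and the helper name are ours.

```python
# Multiplication in F_q = F_p[x]/<f(x)>: a sketch with q = 8, p = 2,
# f(x) = 1 + x + x^3 (monic irreducible over F_2). Coefficients low-to-high.
p = 2
f = [1, 1, 0, 1]

def poly_mul_mod(a, b):
    # multiply two polynomials over F_p, then reduce mod f(x)
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    # reduction: cancel each leading term with a shifted multiple of f
    for k in range(len(prod) - 1, len(f) - 2, -1):
        c = prod[k]
        if c:
            for j, fj in enumerate(f):
                prod[k - (len(f) - 1) + j] = (prod[k - (len(f) - 1) + j] - c * fj) % p
    return prod[:len(f) - 1]

# (1 + x)(1 + x^2) = 1 + x + x^2 + x^3 = x^2  (mod f(x), mod 2)
assert poly_mul_mod([1, 1, 0], [1, 0, 1]) == [0, 0, 1]
```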

4.7.4 Knapsack Problem

Given a pile of items with different weights, can you put all or several of these items
into a backpack to make it equal to a given weight? This is a knapsack problem arising
from real life. Abstracted into a mathematical problem: suppose A = {a_0, a_1, . . . , a_{n−1}}
is a set of n positive integers and N is a positive integer. Is N the sum of the elements
of some subset of A? Using the binary system, the knapsack problem can be
expressed as follows:
Knapsack problem: Given N and A = {a_0, a_1, . . . , a_{n−1}}, where each a_i ≥ 1
is a positive integer, is there a binary integer e = (e_{n−1} e_{n−2} ··· e_1 e_0)_2 making
the following formula true:

$$\sum_{i=0}^{n-1} e_i a_i = N, \quad \text{where } e_i = 0 \text{ or } e_i = 1.$$

If e exists, the knapsack problem (A, N) is called solvable, denoted ψ(A, N) = e. If
N = 0, then ψ(A, 0) = 0 (each e_i = 0) is called the trivial solution. Therefore, N ≥ 1
is assumed below.
is assumed to be a positive integer.
The above knapsack problem may have a solution, no solution, or multiple solutions.
It is very difficult to solve the general knapsack problem (A, N), which is an
"NP complete" problem. If the conjecture P ≠ NP holds, there is no general algorithm
whose computational complexity is polynomial in n and log N. However, under some
special conditions, such as the so-called super-increasing sequences, solving the
problem becomes very easy. Next, we introduce the polynomial solution method under
the premise of a super-increasing sequence.

Definition 4.11 A positive integer sequence {a_i}_{i≥0} is called a super-increasing
sequence if each a_i (i ≥ 1) is greater than the sum of the previous i terms, that is,

$$a_i > \sum_{j=0}^{i-1} a_j, \quad 1 \le i < \infty. \qquad (4.56)$$

The knapsack problem of super-increasing sequence is actually to find a monoton-


ically decreasing index sequence {i k }k≥0 , where i k > i k+1 , 0 ≤ i k ≤ n − 1, ∀ k ≥ 0.
First, i 0 is defined as

i 0 = max{i|ai ≤ N }. (4.57)

Then consider N − a_{i_0}: if N − a_{i_0} = 0, the algorithm is complete and N = a_{i_0}.
If N − a_{i_0} > 0, then define

i_1 = max{i | a_i ≤ N − a_{i_0}}.

For any k ≥ 1, define

i k = max{i|ai ≤ N − ai0 − · · · − aik−1 }. (4.58)

If the equal sign in Eq. (4.58) holds, that is, a_{i_k} = N − a_{i_0} − ··· − a_{i_{k−1}}, then the
algorithm completes and obtains the solution N = a_{i_0} + a_{i_1} + ··· + a_{i_k} of (A, N). If i_k
does not exist, that is,

N − a_{i_0} − ··· − a_{i_{k−1}} < a_i, ∀ i ≠ i_0, i_1, . . . , i_{k−1},

we say the algorithm terminates. Obviously the indices satisfy i_0 > i_1 > ··· > i_k > ···. Let I
denote the set of selected indices, and denote the above algorithm by ψ.

Lemma 4.13 Let A = {a0 , a1 , . . . , an−1 } be a given set of positive integers, ai (i ≥


0) is a super-increasing sequence, N is a positive integer. If there is a k ≥ 0 that
makes ψ complete at k, that is aik = N − ai0 − · · · − aik−1 , then the knapsack problem
(A, N ) has a solution and the solution is

ψ(A, N) = e = (e_{n−1} e_{n−2} ··· e_1 e_0)_2,

where

$$e_i = \begin{cases} 1, & \text{if } i \in I, \\ 0, & \text{if } i \notin I. \end{cases}$$

If there is a k ≥ 0 such that ψ terminates at k, i.e.,

N − a_{i_0} − ··· − a_{i_{k−1}} < a_i, ∀ i ∉ {i_0, i_1, . . . , i_{k−1}},

then the knapsack problem (A, N) has no solution.

Proof If ψ is completed at k ≥ 0, then

N = ai0 + ai1 + · · · + aik , I = {i 0 , i 1 , . . . , i k },

Let e_i = 1 when i ∈ I and e_i = 0 when i ∉ I; obviously,

$$\sum_{i=0}^{n-1} e_i a_i = N.$$

So ψ(A, N) = e = (e_{n−1} e_{n−2} ··· e_1 e_0)_2. Now suppose k ≥ 0 exists so that ψ terminates at k, that is,

N − a_{i_0} − ··· − a_{i_{k−1}} < a_i, ∀ i ∉ {i_0, i_1, . . . , i_{k−1}}.

We prove by contradiction that the knapsack problem (A, N) then has no solution. If (A, N) has a solution, we may write

N = a_{j_0} + a_{j_1} + ··· + a_{j_t},

and after adjusting the order we may let j_0 > j_1 > ··· > j_t. By the definition of i_0 and a_{j_0} ≤ N, we know j_0 ≤ i_0. If j_0 < i_0, then all the indices j_0, . . . , j_t are ≤ i_0 − 1, so by the super-increasing property

$$N \ge a_{i_0} > \sum_{r=0}^{i_0-1} a_r \ge a_{j_0} + a_{j_1} + \cdots + a_{j_t} = N,$$

a contradiction. Hence j_0 = i_0. Applying the same argument to N − a_{i_0} gives j_1 = i_1, and in turn j_2 = i_2, . . . , j_t = i_t, so ψ finds the solution and completes at step t instead of terminating, a contradiction. So (A, N) has no solution. The Lemma
holds.
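The algorithm ψ is a direct greedy scan, sketched below in our own notation; it returns the solution bits e or None when the algorithm terminates.

```python
# The greedy algorithm psi for a super-increasing knapsack (A, N): scan the
# indices downward and take every a_i that still fits, realizing
# i_k = max{ i : a_i <= N - a_{i_0} - ... - a_{i_{k-1}} }.
def psi(A, N):
    e, rest = [0] * len(A), N
    for i in range(len(A) - 1, -1, -1):
        if A[i] <= rest:
            e[i], rest = 1, rest - A[i]
    return e if rest == 0 else None      # None: the algorithm terminated

A = [2, 3, 7, 15, 31]                    # super-increasing
assert psi(A, 24) == [1, 0, 1, 1, 0]     # 24 = 2 + 7 + 15
assert psi(A, 28) is None                # no subset of A sums to 28
```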
MH knapsack public key encryption system
Merkle and Hellman first proposed an encryption method using the knapsack problem
in 1978; it is the first public key encryption cipher. Let A = {a_0, a_1, . . . , a_{n−1}}
be a super-increasing sequence of positive integers, and take p, b as two prime numbers
satisfying

$$p > \sum_{i=0}^{n-1} a_i, \quad 1 \le b \le p - 1. \qquad (4.59)$$

Calculate t_i ≡ b a_i (mod p), 0 ≤ i ≤ n − 1; then the public key is t = (t_0, t_1, . . . ,
t_{n−1}), and the private key is the pair A and b.
Encryption algorithm: The plaintext space is P = F_2^n. For each plaintext unit m =
(m_0 m_1 ··· m_{n−1}) ∈ P, the encryption algorithm is

$$c = f(m) \equiv \sum_{i=0}^{n-1} t_i m_i \pmod p, \quad 0 \le c < p, \qquad (4.60)$$

where c is the ciphertext.
Decryption algorithm: First, use the private key to compute N ≡ b^{-1} c (mod p), 0 ≤ N ≤
p − 1. Then use the algorithm ψ = f^{-1} for the knapsack problem (A, N) to solve

f^{-1}(N) = (m_0 m_1 ··· m_{n−1}) ∈ F_2^n, (4.61)

to get the plaintext m = (m_0 m_1 ··· m_{n−1}).


The correctness of MH knapsack public key cryptography is guaranteed by the
following Lemma.

Lemma 4.14 The encryption algorithm f defined by Eq. (4.60) is a 1–1 correspon-
dence of Fn2 −→ F p , its inverse mapping f −1 is given by equation (4.61).

Proof If m = 0 is the zero vector in F_2^n, then c = 0, thus N = 0; the knapsack problem
(A, 0) has the unique trivial solution ψ(A, 0) = 0 ∈ F_2^n, the zero vector. Therefore, the
zero vector in F_2^n corresponds 1–1 to the zero element of F_p. Now let m ≠ 0, and suppose

$$N \equiv b^{-1} c \pmod p, \quad c \equiv \sum_{i=0}^{n-1} t_i m_i \pmod p.$$

Then

$$N \equiv \sum_{i=0}^{n-1} m_i b^{-1} t_i \equiv \sum_{i=0}^{n-1} m_i a_i \pmod p.$$

By (4.59) and 0 ≤ N < p, we obtain

$$N = \sum_{i=0}^{n-1} m_i a_i, \Longrightarrow \psi(A, N) = m = m_0 m_1 \cdots m_{n-1}.$$

So we have

f^{-1}(f(m)) = m, ∀ m ∈ F_2^n.

Conversely, if

$$N = \sum_{i=0}^{n-1} m_i a_i,$$

then

$$bN \equiv \sum_{i=0}^{n-1} m_i a_i b \equiv \sum_{i=0}^{n-1} m_i t_i \pmod p.$$

So N ≡ b^{-1} c (mod p), that is,

f(f^{-1}(c)) = c, ∀ c ∈ F_p.

It can be seen that f is a 1–1 correspondence of F_2^n −→ F_p and the inverse mapping
is f^{-1} = ψ. The Lemma holds.
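A toy sketch of the MH system, with tiny parameters of our own choosing; the greedy solver psi is inlined so the sketch is self-contained.

```python
# Toy Merkle-Hellman knapsack: public t_i = b * a_i mod p, private (A, b).
def psi(A, N):                   # greedy solver for super-increasing A
    e, rest = [0] * len(A), N
    for i in range(len(A) - 1, -1, -1):
        if A[i] <= rest:
            e[i], rest = 1, rest - A[i]
    return e if rest == 0 else None

A = [2, 3, 7, 15, 31]            # super-increasing, sum = 58
p, b = 61, 17                    # p > sum(A), 1 <= b <= p - 1
t = [(b * ai) % p for ai in A]   # public key

def mh_encrypt(bits):
    return sum(ti * mi for ti, mi in zip(t, bits)) % p     # (4.60)

def mh_decrypt(c):
    N = (pow(b, -1, p) * c) % p  # N = b^{-1} c mod p
    return psi(A, N)             # solve the easy knapsack (4.61)

m = [1, 0, 1, 1, 0]
assert mh_decrypt(mh_encrypt(m)) == m
```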

It can be seen from the above discussion that if A = {a_0, a_1, . . . , a_{n−1}} is not a
super-increasing sequence, inverting f is a difficult "NP complete" problem, so the
encryption and decryption algorithms defined by the MH knapsack cryptosystem form a
typical trapdoor one-way function. Because of this, for a long time people believed
that MH knapsack public key cryptography was very secure. However, in 1982, Shamir
proved that a class of non-super-increasing sequences can be transformed into
super-increasing sequences by a simple transformation x −→ ax mod m, so that the
corresponding knapsack problems can be solved by a polynomial algorithm. Although this
kind of convertible non-super-increasing knapsack sequence is quite special, it was
enough to shake people's confidence in the security of knapsack public key
cryptosystems. It is now generally accepted that knapsack public key cryptography is
no longer secure.
Shamir transform
Let A_1 = {α_0, α_1, . . . , α_{n−1}} be a super-increasing sequence of positive integers.
Randomly select four positive integers m_1, a_1, m_2, a_2, where

$$m_1 > \sum_{i=0}^{n-1} \alpha_i, \quad m_2 > n m_1, \quad (a_1, m_1) = (a_2, m_2) = 1. \qquad (4.62)$$
i=0

A new positive integer sequence is defined by m_1 and a_1:

A_2 = {ω_0, ω_1, . . . , ω_{n−1}}, where ω_i = a_1 α_i mod m_1.

Here a_1 α_i mod m_1 denotes the minimum nonnegative residue of a_1 α_i mod m_1, that is,

0 ≤ ω_i < m_1, and ω_i ≡ a_1 α_i (mod m_1). (4.63)

A third sequence of positive integers is defined by m_2 and a_2:

A_3 = {u_0, u_1, . . . , u_{n−1}}, u_i = a_2 ω_i mod m_2,

that is,

0 ≤ u_i < m_2, u_i ≡ a_2 ω_i (mod m_2). (4.64)

Because {u_i} is not a super-increasing sequence, if A_3 is used for encryption the
problem looks like a general knapsack problem, whose difficulty would be NP complete;
but the Shamir transform shows that its decryption algorithm is polynomial.
Let x = (e_{n−1} e_{n−2} ··· e_1 e_0)_2 ∈ F_2^n be the plaintext and encrypt it with A_3:


$$c = f(x) = \sum_{i=0}^{n-1} e_i u_i, \qquad (4.65)$$

to get the ciphertext c. Decrypting the received ciphertext c is in general a knapsack
problem, but with the private key (b_1, m_1, b_2, m_2) it becomes quite simple, where

0 ≤ b_1 < m_1, a_1 b_1 ≡ 1 (mod m_1),
0 ≤ b_2 < m_2, a_2 b_2 ≡ 1 (mod m_2).

First, note the minimum nonnegative residue of b_2 c under mod m_2:

$$N_0 = b_2 c \bmod m_2 = \sum_{i=0}^{n-1} e_i \omega_i. \qquad (4.66)$$

Because by (4.65),

$$b_2 c \equiv \sum_{i=0}^{n-1} e_i b_2 u_i \equiv \sum_{i=0}^{n-1} e_i \omega_i \pmod{m_2},$$

and by the assumption m_2 > n m_1 of formula (4.62), together with (4.63),

$$0 \le \sum_{i=0}^{n-1} e_i \omega_i < m_2,$$

so (4.66) holds. Then consider the minimum nonnegative residue N = b_1 N_0 mod m_1
(0 ≤ N < m_1); by (4.63),

$$N = b_1 N_0 \bmod m_1 \equiv \sum_{i=0}^{n-1} e_i b_1 \omega_i \equiv \sum_{i=0}^{n-1} e_i \alpha_i \pmod{m_1}.$$

So we have

$$N = \sum_{i=0}^{n-1} e_i \alpha_i, \quad \alpha_i \in A_1.$$

Since A_1 is a super-increasing sequence, by the polynomial algorithm ψ (see Lemma
4.13) we have

ψ(A_1, N) = (e_{n−1} e_{n−2} ··· e_1 e_0)_2 = x,

and the plaintext x is recovered.
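The unmasking steps (4.66) and the final greedy solve can be traced numerically; all constants below are our own and merely satisfy (4.62).

```python
# Numeric sketch of the Shamir double transform and its unmasking.
def psi(A, N):                   # greedy solver for super-increasing A
    e, rest = [0] * len(A), N
    for i in range(len(A) - 1, -1, -1):
        if A[i] <= rest:
            e[i], rest = 1, rest - A[i]
    return e if rest == 0 else None

A1 = [2, 3, 7, 15, 31]                 # super-increasing, sum = 58
m1, a1 = 61, 17                        # m1 > sum(A1), (a1, m1) = 1
m2, a2 = 331, 54                       # m2 > n * m1 = 305, (a2, m2) = 1
A2 = [(a1 * x) % m1 for x in A1]
A3 = [(a2 * w) % m2 for w in A2]       # not super-increasing in general

e = [1, 0, 1, 1, 0]                    # plaintext bits
c = sum(ei * ui for ei, ui in zip(e, A3))          # encryption (4.65)

b1, b2 = pow(a1, -1, m1), pow(a2, -1, m2)
N0 = (b2 * c) % m2                     # (4.66): equals sum e_i * omega_i
N = (b1 * N0) % m1                     # equals sum e_i * alpha_i
assert psi(A1, N) == e
```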
Therefore, Shamir used a simple transformation to reduce this seemingly general
knapsack problem to a super-increasing knapsack problem. Although A_3 is very special,
we have reason to doubt that public key cryptography based on the general knapsack
problem is as secure as people once thought.
Exercise 4
1. Explain the following terms. (1) One-time pad, (2) Perfectly secret
system, (3) Unicity distance, (4) Perfect authentication system.
2. Short answer:
(1) What are the advantages and disadvantages of symmetric and asymmetric cryptosystems?
(2) What is the goal of a perfect authentication system?

3. It is known that the plaintext is "Friday", and the ciphertext obtained
after encryption with a Hill cipher with m = 2 is "POCFKU". Find the key of the Hill
cipher.
4. Find the inverse matrix (mod N) of each of the following matrices:

$$A = \begin{pmatrix} 1 & 3 \\ 4 & 3 \end{pmatrix} \bmod 5, \quad A = \begin{pmatrix} 1 & 3 \\ 4 & 3 \end{pmatrix} \bmod 29,$$

$$A = \begin{pmatrix} 15 & 17 \\ 4 & 9 \end{pmatrix} \bmod 26, \quad A = \begin{pmatrix} 197 & 62 \\ 603 & 271 \end{pmatrix} \bmod 841.$$

5. In number theory, the Fibonacci numbers are defined as a_1 = 1, a_2 = 1, a_3 = 2, and
a_{n+1} = a_n + a_{n−1} for n > 1. Prove

$$\begin{pmatrix} a_{n+1} & a_n \\ a_n & a_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n},$$

and that a_n is even if and only if 3|n. More generally, find the law of d|a_n.
6. Suppose N = mn and (n, m) = 1. For a second-order matrix A ∈ M_2(Z_N) on Z_N,
we can consider A ∈ M_2(Z_m) and A ∈ M_2(Z_n); let A_1 and A_2 represent the images
of A in M_2(Z_m) and M_2(Z_n). Prove:
(i) The mapping σ : A −→ (A_1, A_2) is a 1–1 correspondence M_2(Z_N) −→
M_2(Z_m) × M_2(Z_n).
(ii) Under the correspondence σ, A is an invertible matrix (mod N) if and only if A_1
is an invertible matrix (mod m) and A_2 is an invertible matrix (mod n).
7. Let p be a prime, α ≥ 1. Then A ∈ M_2(Z_{p^α}) is a reversible square matrix if and
only if its reduction in M_2(Z_p) is a reversible square matrix. By calculation, for
∀ α ≥ 1, find the number of reversible matrices in M_2(Z_{p^α}).
8. Let ϕ(N) be the Euler function and ϕ_2(N) the number of invertible matrices in
M_2(Z_N). Give a calculation formula for ϕ_2(N), i.e., write a formula for ϕ_2(N) similar
to that for ϕ(N). Knowing that ϕ(N) = N ∏_{p|N} (1 − 1/p), find ϕ_2(N).
9. Let ϕk (N ) be the number of k-order reversible matrices in Mk (Z N ) and give the
calculation formula of ϕk (N ).
10. Using exercises 8 and 9, find the order of the k-dimensional affine
transformation group G_k = {(A, b)} on Z_N.
11. RSA is used for encryption. The alphabet of plaintext and ciphertext
consists of the 40 numbers {0, 1, 2, . . . , 39}, of which the 26 numbers
{0, 1, 2, . . . , 25} correspond to the 26 English letters, blank = 26, • = 27, ? = 28,
$ = 29, and the digits {0, 1, . . . , 9} = {30, 31, . . . , 39}. Suppose all public keys
n_A satisfy 40^2 < n_A < 40^3. A plaintext unit is m = m_1 m_2 ∈ Z_{40}^2 and a
ciphertext unit is c = c_1 c_2 c_3 ∈ Z_{40}^3. Any plaintext unit m = m_1 m_2
corresponds to the number m_2·40 + m_1 of Z_{n_A}, and any ciphertext unit to
c = c_3·40^2 + c_2·40 + c_1 ∈ Z_{n_A}.

(i) Encrypt the plaintext "SEND$7500" with the public key (n_A, e_A) = (2047, 179).
(ii) Factor n_A = 2047 to find the private key (n_A, d_A) = ?
(iii) A cryptanalyst can quickly find the private key d_A without factoring
2047, so n_A = 2047 is a pretty bad choice. Why?
12. Attack the public key (n_A, e_A) = (536813567, 3602561) by computer and
find the private key d_A; this shows that a 29-bit n_A is not safe in the RSA system.
13. Assume that the plaintext alphabet is {0, 1, . . . , 26}, where the first 26 numbers
are the 26 English letters and blank = 26. The ciphertext alphabet adds
"|" = 27 to the plaintext alphabet, for a total of 28 numbers. The plaintext unit
is m = m_1 m_2 m_3 ∈ Z_{27}^3 and the ciphertext unit is c = c_1 c_2 c_3 ∈ Z_{28}^3. Then, for
the correspondence with numbers of Z_{n_A} (see exercise 11), we need n_A to satisfy

19683 = 27^3 < n_A < 28^3 = 21952.

(i) If your decryption key is (n_A, d_A) = (21583, 20787), decrypt the
ciphertext "Y S N AU O Z H X X H " (blank at the end).
(ii) Knowing the Euler function value ϕ(n) = 21280, calculate e = d^{-1} mod ϕ(n)
and factorize n.
14. Prove: In RSA, the 35-bit integer n = 23360947609 is a particularly bad choice.
(Hint: in the factorization n = p · q, the primes p and q are too close to each
other, so Fermat factorization can be used to attack.)
15. Let n be a square free number and de ≡ 1 (mod ϕ(n)). Prove that the congruence

a^{de} ≡ a (mod n)

holds for all integers a.


16. The multiplicative group F_{181}^* of the finite field F_{181} is generated by g = 2.
Calculate the discrete logarithm of 153 to base 2 by the smoothness
(Silver–Pohlig–Hellman) method.
17. In the knapsack problem, determine whether each of the following sequences is
super-increasing, whether the knapsack problem is solvable for the given N,
and how many solutions there are:
(i) A = {2, 3, 7, 20, 35, 69}, N = 45;
(ii) A = {1, 2, 5, 9, 20, 49}, N = 73;
(iii) A = {1, 3, 7, 12, 22, 45}, N = 67;
(iv) A = {2, 3, 6, 11, 21, 40}, N = 39;
(v) A = {4, 5, 10, 30, 50, 101}, N = 186.
18. If A = {a_i | i = 0, 1, 2, ···} is a super-increasing sequence with a_0 = 1, where each
a_i (i ≥ 1) is the smallest positive integer satisfying a_i > Σ_{j=0}^{i−1} a_j, then a_i = 2^i
holds for ∀ i ≥ 1.

19. Let A = {a_0, a_1, . . . , a_i, . . .} be the super-increasing sequence with a_0 = 1 and
a_i = 2^i (i ≥ 1); then for any positive integer N, the knapsack problem (A, N) has a
unique solution.
20. Let A = {a_0, a_1, . . . , a_i, . . .} be a super-increasing sequence. If for any positive
integer N the knapsack problem (A, N) always has a solution, prove that a_0 = 1 and
a_i = 2^i (i ≥ 1).

References

Adleman, L. M., Rivest, R. L., & Shamir, A. (1978). A method for obtaining digital signatures and
public-key cryptosystems. Communications of the ACM, 21, 120–126.
Adleman, L. M. (1979). A subexponential algorithm for the discrete logarithm problem with
application to cryptography. In Proceedings of the 20th Annual Symposium on Foundations of
Computer Science (pp. 55–60).
Blum, M. (1982). Coin-flipping by telephone: A protocol for solving impossible problems. In
Proceedings of IEEE COMPCON (pp. 133–137).
Coppersmith, D. (1984). Fast evaluation of logarithms in fields of characteristic two. IEEE
Transactions on Information Theory, IT-30, 587–594.
Cover, T. M. (2003). Fundamentals of information theory. Tsinghua University Press (in Chinese).
Diffie, W., & Hellman, M. E. (1976). New directions in cryptography. IEEE Transactions on
Information Theory, IT-22, 644–654.
ElGamal, T. (1985). A public key cryptosystem and a signature scheme based on discrete logarithms.
IEEE Transactions on Information Theory, IT-31, 469–472.
Fiat, A., & Shamir, A. (1986). How to prove yourself: Practical solutions to identification and
signature problems. In Advances in Cryptology, CRYPTO '86 (LNCS 263, pp. 186–194).
Springer-Verlag.
Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of
NP-completeness. Freeman.
Goldreich, O. (2001). Foundations of cryptography. Cambridge University Press.
Gordon, J. A. (1985). Strong primes are easy to find. In Advances in Cryptology: Proceedings of
EUROCRYPT 84 (pp. 216–223). Springer.
Hellman, M. E., & Merkle, R. C. (1978). Hiding information and signatures in trapdoor knapsacks.
IEEE Transactions on Information Theory, IT-24, 525–530.
Hellman, M. E. (1979). The mathematics of public-key cryptography. Scientific American, 241,
146–157.
Hill, L. S. (1931). Concerning certain linear transformation apparatus of cryptography. American
Mathematical Monthly, 38, 135–154.
Kahn, D. (1967). The codebreakers: The story of secret writing. Macmillan.
Knuth, D. E. (1973). The art of computer programming. Addison-Wesley.
Koblitz, N. (1994). A course in number theory and cryptography. Springer-Verlag.
Kranakis, E. (1986). Primality and cryptography. John Wiley & Sons.
Massey, J. L. (1983). Logarithms in finite cyclic groups: Cryptographic issues. In Proceedings of the
4th Benelux Symposium on Information Theory (pp. 17–25).
Odlyzko, A. M. (1985). Discrete logarithms in finite fields and their cryptographic significance. In
Advances in Cryptology: Proceedings of EUROCRYPT 84 (pp. 224–314). Springer.
Rivest, R. L. (1985). RSA chips (past, present, and future). In Advances in Cryptology: Proceedings
of EUROCRYPT 84 (pp. 159–165). Springer.
Ruggiu, G. (1985). Cryptology and complexity theories. In Advances in Cryptology: Proceedings of
EUROCRYPT 84 (pp. 3–9). Springer.
Schneier, B. (1996). Applied cryptography. John Wiley & Sons.
Shamir, A. (1982). A polynomial time algorithm for breaking the basic Merkle-Hellman
cryptosystem. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science
(pp. 145–152).
Shannon, C. E. (1949). Communication theory of secrecy systems. The Bell System Technical
Journal, 28, 656–715.
Stinson, D. R. (2003). Principles and practice of cryptography, translated by Guodeng Feng.
Electronic Industry Press (in Chinese).
Trappe, W., & Washington, L. C. (2008). Cryptography and coding theory, translated by Quanlong
Wang et al. People's Posts and Telecommunications Publishing House (in Chinese).
Wah, P., & Wang, M. Z. (1984). Realization and application of the Massey-Omura lock. In
Proceedings of the International Zürich Seminar (pp. 175–182).

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 5
Prime Test

In the RSA algorithm of the previous chapter, we saw that the difficulty of factoring
into large primes forms the basis of the security of the RSA cryptosystem. Theoretically,
this security should not be questioned, because in mathematics we have only the
definition of a prime, and there is no general method to detect primes. The main purpose
of this chapter is to introduce some basic primality testing methods, including the
Fermat test, the Euler test, the Monte Carlo method, the continued fraction method, etc.
Understanding the content of this chapter requires some specialized knowledge of number
theory.

5.1 Fermat Test

According to the Fermat congruence theorem (commonly known as Fermat's little
theorem, a special case of the Euler congruence theorem), if n is a prime
number, the following congruence holds for all integers b with (b, n) = 1:

b^{n−1} ≡ 1 (mod n). (5.1)

The above formula is an important characteristic of prime numbers. Although an n
satisfying it is not necessarily prime, the formula can be used as an important
basis for detecting primes, because an n not satisfying it is definitely not a prime
number. Using formula (5.1) as the standard to detect primes is called the Fermat
test.
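A sketch of the resulting randomized test, with our own function name; pow(b, n−1, n) is fast modular exponentiation. The asserts use 91 = 7·13 together with 3^6 ≡ 1 (mod 91).

```python
import random

# Fermat test based on (5.1): if b^{n-1} != 1 (mod n), n is surely composite.
def fermat_test(n, rounds=20):
    for _ in range(rounds):
        b = random.randrange(2, n - 1)
        if pow(b, n - 1, n) != 1:
            return False              # witness found: n is composite
    return True                       # n is prime or a Fermat pseudo prime

assert pow(3, 90, 91) == 1            # 91 is a Fermat pseudo prime to base 3
assert pow(2, 90, 91) != 1            # ... but base 2 exposes it as composite
```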

Definition 5.1 Let n be an odd number. Suppose n is a composite number (not a
prime) and there is a positive integer b, (b, n) = 1, satisfying

b^{n−1} ≡ 1 (mod n);


then the composite number n is called a Fermat pseudo prime under base b.

We now discuss the basic properties of pseudo primes. Our working platform is the
finite Abel group Z_n^*, defined as

Z_n^* = {ā | 1 ≤ a ≤ n, (a, n) = 1}, n > 1, (5.2)

where ā is the congruence class of mod n represented by a. The multiplication of two
congruence classes is defined as ā · b̄ = \overline{ab}; obviously, Z_n^* forms an Abel group of
order ϕ(n) under multiplication. In a finite group G, the order of a group element
g ∈ G is defined as

o(g) = min{m : g^m = 1, 1 ≤ m ≤ |G|}.

o(g) = 1 if and only if g is the unit element of group G. By the definition of o(g),
obviously,
g t = 1 ⇔ o(g)|t. (5.3)

The following two lemmas are the basic conclusions about the order of group element
g.

Lemma 5.1 G is a finite group, g ∈ G, k ∈ Z an integer. Then

$$o(g^k) = \frac{o(g)}{(k,\ o(g))}, \qquad (5.4)$$

where the denominator is the greatest common divisor of k and o(g).

Proof Let o(g) = m, o(g^k) = t. Obviously (g^k)^m = 1; in particular,

$$\left(g^k\right)^{\frac{m}{(k,m)}} = g^{\frac{km}{(k,m)}} = 1, \Longrightarrow t \ \Big|\ \frac{m}{(k,m)}.$$

On the other hand, by g^{kt} = 1, there is m | kt, thus

$$\frac{m}{(k,m)} \ \Big|\ \frac{k}{(k,m)}\,t, \Longrightarrow \frac{m}{(k,m)} \ \Big|\ t.$$

So we have t = m/(k, m). The Lemma holds.

Lemma 5.2 Suppose G is a finite Abel group, a, b ∈ G, (o(a), o(b)) = 1, then

o(ab) = o(a)o(b).

Proof Let o(a) = m_1, o(b) = m_2, then (m_1, m_2) = 1. Let o(ab) = t. By (ab)^{m_1 m_2} =
a^{m_1 m_2} b^{m_1 m_2} = 1, there is t | m_1 m_2. On the other hand, (ab)^t = 1, so (ab)^{t m_1} = 1, thus
b^{t m_1} = 1, m_2 | m_1 t, hence m_2 | t. By the same reasoning, m_1 | t, thus m_1 m_2 | t and t = m_1 m_2.
The Lemma holds.

Back to the finite group Z_n^*: for any integer a ∈ Z, (a, n) = 1, we have ā ∈ Z_n^*; we
write o(a) for o(ā), called the order of a mod n. Obviously o(a) = o(b) if a ≡ b (mod n).
A basic problem in number theory is the existence of primitive roots of mod n;
equivalently, is Z_n^* a cyclic group? If there is a positive integer a, (a, n) = 1, with
o(ā) = |Z_n^*| = ϕ(n), then Z_n^* is a cyclic group of order ϕ(n), so that the primitive root
of mod n exists and a is a primitive root of mod n.

Lemma 5.3 (Existence of primitive roots) The primitive root of mod n exists if and
only if n = 2, 4, p^α (α ≥ 1), or 2p^α (α ≥ 1), where p > 2 is an
odd prime.

Proof If n = 2, 4, the lemma holds trivially. If n = p, then Z_n = F_p, Z_n^* = F_p^*; by
Lemma 4.7 of Chap. 4, F_p^* is a cyclic group of order (p − 1),
so mod p has primitive roots. Now we prove that for every positive integer α, the
primitive root of mod p^α also exists. Let a be a primitive root of mod p,
that is, the order of a mod p is p − 1. If the order of a mod p^α is denoted by o(a),
then

a^{o(a)} ≡ 1 (mod p^α), =⇒ a^{o(a)} ≡ 1 (mod p),

so p − 1 | o(a). Since the number of elements of Z_{p^α}^* is ϕ(p^α) = p^{α−1}(p − 1),
obviously o(a) | p^{α−1}(p − 1); thus o(a) = p^i(p − 1) for some 0 ≤ i ≤ α − 1.
We may assume o(a) = p − 1: if o(a) = p^i(p − 1) with 1 ≤ i, then replace a
by a^{p^i}; by Lemma 5.1,

$$o(a^{p^i}) = \frac{p^i(p-1)}{(p^i,\ p^i(p-1))} = p - 1.$$

Therefore, without loss of generality, let o(a) = p − 1. When α > 1, p^{α−1} | ϕ(p^α), and
by the Sylow theorem there is an integer b, (b, p) = 1, whose order mod p^α is
o(b) = p^{α−1}. Because (o(a), o(b)) = 1, by Lemma 5.2 we have

o(ab) = o(a)o(b) = p^{α−1}(p − 1) = ϕ(p^α),

so the primitive root of mod p^α exists.

When n = 2p^α, p > 2 an odd prime, then ϕ(n) = ϕ(p^α). Thus an odd primitive root a of
mod p^α (replacing a by a + p^α if necessary) is also a primitive root of mod 2p^α. The
Lemma holds.

Lemma 5.4 Let n be an odd compound number, then


(i) b ≥ 1 is a positive integer, (b, n) = 1, n is Fermat pseudo prime under base b if
and only if o(b)|n − 1.
(ii) n is Fermat pseudo prime under bases b1 and b2 , then it is Fermat pseudo prime
under bases b1 b2 and b1 b2−1 , where b2−1 is the multiplicative inverse of b2 mod n.

(iii) If there exists one b ∈ Z_n^* that does not satisfy Eq. (5.1), then at least half of the
b ∈ Z_n^* do not satisfy Eq. (5.1).
Proof (i) and (ii) are trivial: (i) follows from (5.3), and for b_1, b_2 ∈ Z_n^*,

b_1^{n−1} ≡ 1 (mod n), b_2^{n−1} ≡ 1 (mod n) =⇒ (b_1 b_2)^{n−1} ≡ 1 (mod n),

b^{n−1} ≡ 1 (mod n) =⇒ (b^{-1})^{n−1} ≡ 1 (mod n).

So (ii) holds. To prove (iii), let n not be a Fermat pseudo prime to base b. If n is a
Fermat pseudo prime to base a, then by (ii) n is not a Fermat pseudo prime to base ab.
Therefore, the mapping a −→ ab sends bases satisfying (5.1) injectively to bases that do
not, so at least half of the bases b ∈ Z_n^* must make n not a Fermat pseudo prime. The
Lemma holds.
By Lemma 5.4, if there is a base b such that n is not a Fermat pseudo prime, then,
testing a, 1 ≤ a ≤ n, (a, n) = 1, in sequence for whether a^{n−1} ≡ 1 (mod n), there is a
more than 50% chance at each trial of finding a b with b^{n−1} ≢ 1 (mod n), which proves
that n is not a prime. Is it possible that for all a, 1 ≤ a ≤ n, (a, n) = 1, n is a Fermat
pseudo prime to base a? The answer is yes; such a number n is called a Carmichael
number.
Definition 5.2 A Carmichael number n is an odd composite number such that for ∀ b ∈
Z_n^*,

b^{n−1} ≡ 1 (mod n).

For Carmichael numbers, we have the following characterization.


Theorem 5.1 Let n be a compound number, then
(i) If there is an integer a > 1, a 2 |n, then n is not a Carmichael number.
(ii) Assuming that n is a square free number, then n is a Carmichael number ⇔ for
all prime p, p|n, there is p − 1|n − 1.
(iii) A Carmichael number is the product of at least three different prime numbers.
Proof Let us prove (i) first. Let p^2 | n, p a prime number; by Lemma 5.3, mod p^2
has primitive roots. Let g be a primitive root of mod p^2, that is, o(g) = p(p − 1), and
write

n = p^α n', where α ≥ 2, (n', p) = 1.

According to the Chinese remainder theorem, there is a positive integer b such that

b ≡ g (mod p^2),
b ≡ 1 (mod n').

Then b is a primitive root of mod p^2 and (b, n) = 1. We assert that n is not a Fermat
pseudo prime to base b. If it were, then

b^{n−1} ≡ 1 (mod n) =⇒ b^{n−1} ≡ 1 (mod p^2) =⇒ o(b) | n − 1,

that is, p(p − 1) | n − 1; but p | n contradicts p | n − 1. So b^{n−1} ≢ 1 (mod n), n
is not a Carmichael number, and (i) holds.
Now we prove (ii). If for all p, p | n, there is p − 1 | n − 1, then for ∀ b ∈ Z_n^*,

$$b^{n-1} = \left(b^{p-1}\right)^{\frac{n-1}{p-1}} \equiv 1 \pmod p, \quad \forall\ p \mid n.$$

Because n is a square free number, we get

b^{n−1} ≡ 1 (mod n), ∀ b ∈ Z_n^*;

therefore, n is a Carmichael number. Conversely, if there is a prime p,
p | n, but p − 1 ∤ n − 1, let g be a primitive root of mod p; by the
Chinese remainder theorem choose b with

b ≡ g (mod p),
b ≡ 1 (mod n/p).

Then (b, n) = 1, and

b^{n−1} ≡ g^{n−1} (mod p).

By p − 1 ∤ n − 1 and o(g) = p − 1, we get g^{n−1} ≢ 1 (mod p), so b^{n−1} ≢ 1 (mod n); this
contradicts the assumption that n is a Carmichael number. So (ii) holds.
To prove (iii), we just need to exclude that n is the product of two primes.
By (ii), let n = pq, p < q. If n is a Carmichael number, then q − 1 | n − 1; but
n − 1 = p(q − 1) + p − 1, so

n − 1 ≡ p − 1 (mod q − 1).

Since 0 < p − 1 < q − 1, this contradicts n − 1 ≡ 0 (mod q − 1), so n = pq cannot be a
Carmichael number. The Theorem holds.
Below we give some examples of Carmichael numbers; from property (ii) in
Theorem 5.1, we can easily verify whether a square free number is a Carmichael
number.
Example 5.1 The following positive integers n are Carmichael numbers:

n = 1105 = 5 · 13 · 17, n = 1729 = 7 · 13 · 19, n = 2465 = 5 · 17 · 29,

n = 2821 = 7 · 13 · 31, n = 6601 = 7 · 23 · 41.

Example 5.2 The positive integer 561 = 3 · 11 · 17 is the smallest Carmichael num-
ber.

Proof By definition, a Carmichael number is odd and composite, so the minimum
Carmichael number has the form

n = 3 · p · q, where p − 1 | n − 1, q − 1 | n − 1, 3 < p < q primes.

For p = 5 and p = 7, the congruence

3 · p · q ≡ 1 (mod q − 1), q > p,

has no prime solution q; when p = 11, it has the minimum solution
q = 17. So n = 3 · 11 · 17 = 561 is the smallest Carmichael number.
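Theorem 5.1 (ii) and (iii) turn into a simple computational check, sketched below with our own helper names.

```python
# Check the Carmichael property via Theorem 5.1 (ii): n odd, square free,
# at least three prime factors, and p - 1 | n - 1 for every prime p | n.
def prime_factors(n):
    fs, d = [], 2
    while d * d <= n:
        while n % d == 0:
            if fs and fs[-1] == d:
                return None          # square factor: not square free
            fs.append(d)
            n //= d
        d += 1
    if n > 1:
        fs.append(n)
    return fs

def is_carmichael(n):
    fs = prime_factors(n)
    return (fs is not None and len(fs) >= 3 and n % 2 == 1
            and all((n - 1) % (p - 1) == 0 for p in fs))

assert is_carmichael(561) and is_carmichael(1105) and not is_carmichael(91)
```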

Example 5.3 For a given prime r ≥ 3, the system of congruences

r p q ≡ 1 (mod p − 1),
r p q ≡ 1 (mod q − 1)

has only finitely many different prime solutions p, q. We leave this conclusion for
reflection.

5.2 Euler Test

Let p > 2 be an odd prime, Euler test uses the Euler criterion in the quadratic residue
of mod p to detect whether a positive integer n is prime. Like Fermat’s test, it is
obvious that the n that passes the test cannot be determined as prime, but the n that
fails the test is certainly not prime. We know that when the positive integers a and n
are given (n > 1), the solution of the quadratic congruence equation x 2 ≡ a(mod n)
is a famous “NP complete” problem. We can’t find a general solution in an effective
time. However, in the special case where n = p > 2 is an odd prime number, we
have rich theoretical knowledge for discussing the quadratic residues of mod p; this
knowledge includes the famous Gauss quadratic reciprocity law and the Euler criterion,
which constitute the core of elementary number theory. First, we
introduce the Legendre symbol; let p > 2 be a given odd prime number.
Z_p^* is a (p − 1)-order cyclic group. For a ∈ Z_p^* (i.e., (a, p) = 1), we define the
Legendre symbol function as

$$\left(\frac{a}{p}\right) = \begin{cases} 1, & \text{when } x^2 \equiv a \pmod p \text{ is solvable}, \\ -1, & \text{when } x^2 \equiv a \pmod p \text{ is unsolvable}. \end{cases}$$

If (a, p) > 1, that is p | a, we let $\left(\frac{a}{p}\right) = 0$. Thus for ∀ a ∈ Z, the Legendre symbol
function $\left(\frac{a}{p}\right)$ is defined everywhere, and it is a completely multiplicative function
of Z → {1, −1, 0}:

$$\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right), \quad \forall\ a, b \in \mathbb{Z},$$

and

$$\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right), \quad \text{if } a \equiv b \pmod p.$$

If $\left(\frac{a}{p}\right) = 1$, then x^2 ≡ a (mod p) is solvable and a is called a quadratic residue of mod p;
if $\left(\frac{a}{p}\right) = -1$, then x^2 ≡ a (mod p) is unsolvable and a is called a quadratic nonresidue
of mod p.

Lemma 5.5 Let a ∈ Z, p ∤ a. Then the necessary and sufficient condition for a to be a
quadratic residue of mod p is

$$a^{\frac{p-1}{2}} \equiv 1 \pmod p.$$

Proof Z_p^* is a (p − 1)-order cyclic group; let g be a primitive root of mod p, that is, ḡ
is the generator of Z_p^*. For ∀ a ∈ Z, (a, p) = 1, we have

a ≡ g^t (mod p), where 1 ≤ t ≤ p − 1.

Obviously, a is a quadratic residue of mod p ⇔ t is even. Therefore, if t is even, then

$$a^{\frac{p-1}{2}} \equiv g^{\frac{t(p-1)}{2}} \equiv \left(g^{\frac{t}{2}}\right)^{p-1} \equiv 1 \pmod p.$$

Conversely, if a^{(p−1)/2} ≡ 1 (mod p), then o(a) | (p−1)/2, and by Lemma 5.1 we can calculate

$$o(a) = o(g^t) = \frac{p-1}{(t,\ p-1)}.$$

So

$$o(a) \ \Big|\ \frac{p-1}{2} \iff 2 \mid (t,\ p-1) \iff 2 \mid t,$$

that is, t is even; thus a is a quadratic residue of mod p. The Lemma holds.

Lemma 5.6 (Euler criterion) For ∀ a ∈ Z, we have

$$a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod p. \qquad (5.5)$$

Proof If (a, p) > 1, that is p | a, the above formula holds, so we may let p ∤ a.
By the Fermat congruence theorem a^{p−1} ≡ 1 (mod p), there is

$$\left(a^{\frac{p-1}{2}} + 1\right)\left(a^{\frac{p-1}{2}} - 1\right) \equiv 0 \pmod p.$$

Thus

$$a^{\frac{p-1}{2}} \equiv \pm 1 \pmod p.$$

If a^{(p−1)/2} ≡ 1 (mod p), by Lemma 5.5, (a/p) = 1. If a^{(p−1)/2} ≡ −1 (mod p), then (a/p) =
−1. So (5.5) holds.

Definition 5.3 Suppose n is an odd composite number. If there is an integer
b, (b, n) = 1, satisfying

$$b^{\frac{n-1}{2}} \equiv \left(\frac{b}{n}\right) \pmod n, \qquad (5.6)$$

we call n an Euler pseudo prime under base b. Here (b/n) is the Jacobi symbol, defined as

$$\left(\frac{b}{n}\right) = \left(\frac{b}{p_1}\right)^{\alpha_1} \left(\frac{b}{p_2}\right)^{\alpha_2} \cdots \left(\frac{b}{p_s}\right)^{\alpha_s}, \quad \text{if } n = p_1^{\alpha_1} \cdots p_s^{\alpha_s}. \qquad (5.7)$$

From the definition, we obviously have a corollary: if n is an Euler pseudo prime under
base b, then n is a Fermat pseudo prime under base b. This conclusion can be proved
by squaring both sides of Eq. (5.6).
The following example shows that the converse does not hold; that is,
n may be a Fermat pseudo prime under base b but not an Euler pseudo prime.

Example 5.4 n = 91 is a Fermat pseudoprime to base b = 3, but not an Euler pseudoprime. In fact, it is easy to calculate 3⁶ ≡ 1 (mod 91), thus 3⁹⁰ ≡ 1 (mod 91). From 3⁶ ≡ 1 (mod 91), we have

3⁴² ≡ 1 (mod 91) ⇒ 3⁴⁵ ≡ 27 (mod 91).

Since 27 ≢ ±1 (mod 91), 91 is not an Euler pseudoprime to base 3.

Example 5.5 n = 91 is an Euler pseudoprime to base b = 10. Because

10⁴⁵ ≡ 10³ ≡ −1 (mod 91),

and calculating the Jacobi symbol,

(10/91) = (2/91) · (5/91) = −1,

so n = 91 is an Euler pseudoprime to base b = 10.

From the Euler criterion of Lemma 5.6, we can easily calculate the Legendre
symbols of −1 and 2.

Lemma 5.7 Let p > 2 be an odd prime; then we have

(−1/p) = (−1)^((p−1)/2), (2/p) = (−1)^((p²−1)/8). (5.8)

Proof By Lemma 5.6,

(−1)^((p−1)/2) ≡ (−1/p) (mod p).

Since both sides of the congruence are ±1 and p > 2, we get (−1/p) = (−1)^((p−1)/2). To calculate the Legendre symbol of 2, we notice that

p − 1 ≡ 1 · (−1)¹ (mod p)
2 ≡ 2 · (−1)² (mod p)
p − 3 ≡ 3 · (−1)³ (mod p)
...
r ≡ ((p−1)/2) · (−1)^((p−1)/2) (mod p),

where r = (p−1)/2 if (p−1)/2 is even, and r = p − (p−1)/2 if (p−1)/2 is odd. Multiplying these congruences together and noting 1 + 2 + ··· + (p−1)/2 = (p²−1)/8, there is

2 · 4 · 6 ··· (p − 1) ≡ ((p−1)/2)! · (−1)^((p²−1)/8) (mod p),

that is,

2^((p−1)/2) · ((p−1)/2)! ≡ ((p−1)/2)! · (−1)^((p²−1)/8) (mod p) ⇒ 2^((p−1)/2) ≡ (−1)^((p²−1)/8) (mod p).

By Lemma 5.6,

(2/p) ≡ (−1)^((p²−1)/8) (mod p),

and since both sides are ±1 and p > 2, there is

(2/p) = (−1)^((p²−1)/8).

Lemma 5.7 holds.


Let ( an ) be a Jacobi symbol, defined by Eq. (5.6), then Lemma 5.7 can be extended
to Jacobi symbol.
Lemma 5.8 Let n be an odd, then we have

−1 2
= (−1) 8 (n −1)
n−1 1 2
= (−1) 2 , . (5.9)
n n

Proof The square of any odd number is congruent to 1 mod 8, that is, a² ≡ 1 (mod 8). Write n = a² · p₁p₂ ··· p_t, where the pᵢ are different odd primes; then

n ≡ p₁p₂ ··· p_t (mod 8).

Similarly, for any b ∈ Z, by (5.7),

(b/n) = (b/p₁)(b/p₂) ··· (b/p_t), (5.10)

since (b/a²) = (b/a)² = 1. Thus

(−1/n) = (−1/p₁)(−1/p₂) ··· (−1/p_t) = (−1)^((p₁−1)/2 + (p₂−1)/2 + ··· + (p_t−1)/2) = (−1)^((n−1)/2), (5.11)

where the last equality holds because (p₁−1)/2 + ··· + (p_t−1)/2 ≡ (p₁p₂···p_t − 1)/2 ≡ (n−1)/2 (mod 2), using n ≡ p₁···p_t (mod 8). The formula for (2/n) can be proved in the same way, and the Lemma holds.

Corollary 5.1 Every odd composite number n is an Euler pseudoprime to the bases ±1.

Proof That n is an Euler pseudoprime to base 1 is trivial; that n is an Euler pseudoprime to base −1 follows directly from Lemma 5.8, since (−1/n) = (−1)^((n−1)/2).

Lemma 5.9 (Gauss) Let p and q be two different odd primes; then

(q/p)(p/q) = (−1)^((p−1)(q−1)/4).

Proof According to incomplete statistics, there are currently more than 270 proofs of Gauss's law of quadratic reciprocity. To save space, we leave the proof to the reader, hoping that everyone can find their favorite proof.

Next, we discuss the computational complexity of Fermat test and Euler test.

Lemma 5.10 Let n be odd, 1 ≤ b < n, (b, n) = 1. Then

Time(Fermat test of n to base b) = O(log³ n),
Time(Euler test of n to base b) = O(log⁴ n).

Proof By (5.1), the Fermat test of n to base b is essentially one computation of b^(n−1) mod n; by Lemma 1.5 of Chap. 1, the number of bit operations of b^(n−1) mod n is

Time(b^(n−1) mod n) = O(log n · log² n) = O(log³ n).

For the Euler test of n to base b, by (5.6), the left-hand side costs O(log³ n) bit operations. To find the Jacobi symbol (b/n), by Eq. (5.7) and the quadratic reciprocity law, the calculation can be reduced to calculations of Legendre symbols. Each application of reciprocity is essentially one division, so we only consider the calculation of Legendre symbols. By the Euler criterion,

Time(calculate (b/p)) = Time(b^((p−1)/2) mod p) = O(log³ n).

The number of distinct prime factors of n is at most O(log n), so

Time(calculate Jacobi symbol (b/n)) = O(log n · log³ n) = O(log⁴ n).

This completes the proof of Lemma 5.10.

Solovay and Strassen proposed in 1977 a probabilistic method to detect prime numbers by the Euler test. When n > 1 is odd, k numbers b₁, b₂, ..., b_k are randomly selected, where 1 < bᵢ < n, (bᵢ, n) = 1. Using Eq. (5.6), compute both sides for each bᵢ in turn, at a cost of O(log⁴ n) bit operations; if the two sides of Eq. (5.6) differ for some bᵢ, then n is not prime and the test terminates. If all k bases b pass the Euler test of Eq. (5.6), then n is composite with probability < 1/2^k, that is,

P{n is not prime} ≤ 2⁻ᵏ.

This bound is derived directly from Lemma 5.3. A sketch of the Solovay–Strassen test is given below; after it we introduce the Miller–Rabin method, which is in a sense better than the Solovay–Strassen method.
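The following Python sketch (an illustration added here, not part of the original text) implements the test. The Jacobi symbol is computed without factoring n, by repeatedly applying Lemma 5.8 to remove factors of 2 and Lemma 5.9 (extended to the Jacobi symbol) to flip numerator and denominator:

```python
from random import randrange
from math import gcd

def jacobi(b, n):
    # Jacobi symbol (b/n) for odd n > 0, using (2/n) = (-1)^((n^2-1)/8)
    # (Lemma 5.8) and quadratic reciprocity (Lemma 5.9), without factoring n.
    b %= n
    result = 1
    while b != 0:
        while b % 2 == 0:                 # strip factors of 2 from the numerator
            b //= 2
            if n % 8 in (3, 5):           # (2/n) = -1 exactly when n ≡ ±3 (mod 8)
                result = -result
        b, n = n, b                       # reciprocity: sign flips iff b ≡ n ≡ 3 (mod 4)
        if b % 4 == 3 and n % 4 == 3:
            result = -result
        b %= n
    return result if n == 1 else 0

def solovay_strassen(n, k=20):
    # Probabilistic primality test: False means n is certainly composite;
    # True means n passed k Euler tests, so P{composite} <= 2^(-k).
    if n < 3 or n % 2 == 0:
        return n == 2
    for _ in range(k):
        b = randrange(2, n)
        if gcd(b, n) > 1:
            return False
        if pow(b, (n - 1) // 2, n) != jacobi(b, n) % n:
            return False                  # Eq. (5.6) fails: n is composite
    return True
```

Each round costs one modular exponentiation plus one Jacobi symbol evaluation, well within the O(log⁴ n) estimate of Lemma 5.10.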
Definition 5.4 Let n be an odd composite number, and write n − 1 = 2^t · m, where t ≥ 1 and m is odd. Let b ∈ Z∗n. If n and b satisfy one of the following conditions:

b^m ≡ 1 (mod n), or there exists an r, 0 ≤ r < t, such that b^(2^r m) ≡ −1 (mod n), (5.12)

then n is called a strong pseudoprime to base b.

Lemma 5.11 Suppose n ≡ 3 (mod 4). Then n is a strong pseudoprime to base b if and only if n is an Euler pseudoprime to base b.

Proof Because n ≡ 3 (mod 4), we have n − 1 = 2m, that is, t = 1 and m = (n−1)/2. By Definition 5.4, n is a strong pseudoprime to base b if and only if

b^m = b^((n−1)/2) ≡ ±1 (mod n).

Therefore, if n is an Euler pseudoprime to base b, the above formula holds, so n is also a strong pseudoprime to base b. Conversely, suppose the above formula holds. Because n ≡ 3 (mod 4), (n−1)/2 is odd, so (−1/n) = −1, and

(b/n) = (b/n)^((n−1)/2) = (b^((n−1)/2)/n) = (±1/n) = ±1 = b^((n−1)/2),

where the last equalities use b^((n−1)/2) ≡ ±1 (mod n) and (−1/n) = −1, the sign being the same on both sides. Therefore b^((n−1)/2) ≡ (b/n) (mod n), that is, n is an Euler pseudoprime to base b. The Lemma holds.

Below we give the main results of this section.

Theorem 5.2 Let n be an odd composite number, b ∈ Z∗n. Then

(i) If n is a strong pseudoprime to base b, then n is an Euler pseudoprime to base b.
(ii) The bases b, 1 ≤ b < n, (b, n) = 1, for which n is a strong pseudoprime make up at most 25% of all such bases.

Before proving Theorem 5.2, let us describe the Miller–Rabin test. To test whether a large odd number n is prime, write n − 1 = 2^t · m with m odd, t ≥ 1, and select one b at random, 1 ≤ b < n, (b, n) = 1. We first calculate b^m mod n; if the result is ±1, then n passes the strong pseudoprime test (5.12). If b^m mod n ≠ ±1, we square repeatedly, each time taking the least nonnegative residue mod n, to see whether we reach −1; this is done at most t − 1 times. If −1 never appears, then n fails test (5.12) for base b, and we assert that n is not a strong pseudoprime to base b. If −1 is obtained after some number of squarings, then n passes the test to base b.

In the Miller–Rabin test, if n fails test (5.12) for base b, then n is certainly not prime; if n passes the test for k randomly selected bases b = {b₁, b₂, ..., b_k}, then by property (ii) of Theorem 5.2 each bᵢ lies in a set comprising no more than 25% of the possible bases, so

P{n not prime} ≤ 1/4^k. (5.13)
Compared with the Solovay–Strassen method using Euler test, the Miller–Rabin
method using strong pseudo prime test is more powerful.
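As a minimal sketch (Python, illustrative only, added here), the test can be written directly from Definition 5.4:

```python
from random import randrange

def miller_rabin(n, k=20):
    # Strong pseudoprime test: False means n is certainly composite;
    # True means n passed k rounds of (5.12), so P{composite} <= 4^(-k)
    # by Theorem 5.2 (ii).
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    t, m = 0, n - 1
    while m % 2 == 0:                 # write n - 1 = 2^t * m, m odd
        t += 1
        m //= 2
    for _ in range(k):
        b = randrange(2, n - 1)
        x = pow(b, m, n)
        if x in (1, n - 1):
            continue                  # b^m ≡ ±1 (mod n): this round passes
        for _ in range(t - 1):        # square at most t-1 times, looking for -1
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False              # -1 never appeared: (5.12) fails
    return True
```

For composite n, each random base is a witness with probability at least 3/4 by Theorem 5.2 (ii), which is where the 4⁻ᵏ bound in (5.13) comes from.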
To prove Theorem 5.2, we first prove the following two lemmas.

Lemma 5.12 Let G = ⟨g⟩ be a finite cyclic group of order m, that is, o(g) = m. Then the equation x^k = 1 has exactly d solutions in G, where d = (k, m).

Proof For x ∈ G write x = g^t; then x^k = g^(kt) = 1 ⇔ m | kt ⇔ (m/d) | (k/d)·t ⇔ (m/d) | t, since (m/d, k/d) = 1. Writing t = (m/d)·s, then as s = 1, 2, ..., d we obtain exactly d solutions x = g^t. The Lemma holds.

Lemma 5.13 Let p be an odd prime, p − 1 = 2^t′ m′, t′ ≥ 1, m′ odd, and let m be an odd positive integer. Then the number N of solutions in Z∗p of

x^(2^r m) ≡ −1 (mod p), m odd, (5.14)

satisfies

N = 0, if r ≥ t′; N = 2^r (m, m′), if r < t′.

Proof Let g be a generator of Z∗p and write x = g^j, 1 ≤ j ≤ p − 1. Because o(g) = p − 1, we have

g^((p−1)/2) ≡ −1 (mod p).

Thus

x^(2^r m) ≡ −1 (mod p) ⇔ 2^r m j ≡ (p−1)/2 (mod p − 1).

Because p − 1 = 2^t′ m′, this is equivalent to

2^r m j ≡ 2^(t′−1) m′ (mod 2^t′ m′). (5.15)

If r > t′ − 1, the congruence has no solution j, because m and m′ are odd; so when r ≥ t′, (5.14) is unsolvable. If r < t′, let d = (m, m′); then

(2^r m, 2^t′ m′) = 2^r d,

and 2^r d divides 2^(t′−1) m′, so Eq. (5.15) has exactly 2^r d solutions j mod p − 1. Each j corresponds to one x = g^j, so the number of solutions of Eq. (5.14) is N = 2^r d. The Lemma holds.

With the above preparation, we now give the proof of Theorem 5.2.

Proof (of Theorem 5.2) We first prove (i): assuming n and b satisfy Eq. (5.12), we show that formula (5.6) is satisfied; that is, if n is a strong pseudoprime to base b, then n is an Euler pseudoprime to base b. Write n − 1 = 2^t m, m odd. We prove property (i) of Theorem 5.2 in three cases.

(1) b^m ≡ 1 (mod n). In this case, it is obvious that b^((n−1)/2) ≡ 1 (mod n). We prove (b/n) = 1. In fact, for every prime p | n,

1 = (1/p) = (b^m/p) = (b/p)^m ⇒ (b/p) = 1,

since m is odd. Hence (b/n) = 1, and

b^((n−1)/2) ≡ (b/n) ≡ 1 (mod n),

that is, n is an Euler pseudoprime to base b.


n−1
(2) b 2 ≡ −1(mod n). In this case, we have to prove ( nb ) = −1, let p|n be any
prime factor of n, write p − 1 = 2t1 m 1 , where t1 ≥ 1, m 1 is an odd number.
210 5 Prime Test

Let’s calculate the Legendre symbol ( bp ), in fact, t1 ≥ t, and



b − 1, if t1 = t;
= (5.16)
p 1, if t1 > t.

Because
n−1 t−1 t−1
b 2 = b2 m
≡ −1(mod n), =⇒ b2 mm 1
≡ −1(mod n),

by p|n, we have
t−1
b2 mm 1
≡ −1(mod p). (5.17)

If t₁ < t, then p − 1 = 2^t₁ m₁ divides 2^(t−1) m m₁, so by Fermat's congruence theorem the left-hand side of (5.17) would be ≡ 1 (mod p), a contradiction. So we always have t₁ ≥ t. If t₁ = t, then by (5.17),

(b/p) ≡ b^((p−1)/2) = b^(2^(t−1) m₁) ≡ −1 (mod p);

indeed, if b^(2^(t−1) m₁) were ≡ 1 (mod p), then raising both sides to the odd power m would give b^(2^(t−1) m₁ m) ≡ 1 (mod p), contradicting Formula (5.17). If t₁ > t, raising both sides of Eq. (5.17) to the power 2^(t₁−t) gives b^(2^(t₁−1) m m₁) ≡ 1 (mod p); raising b^((p−1)/2) = b^(2^(t₁−1) m₁) to the odd power m then shows (b/p)^m ≡ 1 (mod p), hence (b/p) = 1. So we have (5.16).

We now complete the proof of case (2) using the conclusion of Eq. (5.16). Write n = ∏_{p|n} p, where the primes p are not required to be different, and define the positive integer k as

k = #{p : p | n, p − 1 = 2^t₁ m₁, m₁ odd, t₁ = t}.

By (5.16), then

(b/n) = ∏_{p|n} (b/p) = (−1)^k. (5.18)

We prove that k is odd. Because t₁ ≥ t for each p | n, with p − 1 = 2^t₁ m₁ and n − 1 = 2^t m, we have, mod 2^(t+1),

p ≡ 1 (mod 2^(t+1)), if t₁ > t; p ≡ 1 + 2^t (mod 2^(t+1)), if t₁ = t.

Because n ≡ 1 + 2^t (mod 2^(t+1)), it follows that

1 + 2^t ≡ n = ∏ p ≡ 1 + k · 2^t (mod 2^(t+1)).

So k must be odd, and by (5.18), (b/n) = −1. Case (2) is proved.


(3) b^(2^(r−1)·m) ≡ −1 (mod n), where 1 ≤ r < t, n − 1 = 2^t · m. (This is Eq. (5.12) with r replaced by r − 1.) Because r − 1 < t − 1, repeated squaring gives b^((n−1)/2) ≡ 1 (mod n). To prove property (i) of Theorem 5.2 we must show (b/n) = 1. As in case (2), let p | n and write p − 1 = 2^t₁ · m₁ with m₁ odd; then t₁ ≥ r, and

(b/p) = −1, if t₁ = r; (b/p) = 1, if t₁ > r. (5.19)

The proof of Formula (5.19) is the same as in case (2). Write n = ∏ p, the primes p not required to be different, and define the positive integer k₁:

k₁ = #{p : p | n, p − 1 = 2^t₁ m₁, m₁ odd, t₁ = r}.

As in case (2) we have (b/n) = (−1)^k₁, and similarly, working mod 2^(r+1), it can be proved that k₁ must be even. Thus (b/n) = 1, and we have completed the proof of property (i) of Theorem 5.2.
Next, we prove property (ii) in Theorem 5.2. It is also discussed in three cases.

(1) n is divisible by a square; that is, there is a prime p with p^α ∥ n, α ≥ 2.
In this case we prove that for at least 3/4 of the bases b ∈ Z∗n, n is not even a Fermat pseudoprime to base b, let alone a strong pseudoprime. Suppose b^(n−1) ≡ 1 (mod n); since p² | n, we get b^(n−1) ≡ 1 (mod p²). Because Z∗_{p²} is a cyclic group of order p(p − 1) (see Theorem 5.3), let g be a generator of Z∗_{p²}; then

Z∗_{p²} = {g, g², ..., g^(p(p−1))}.

By Lemma 5.12, the number of b mod p² satisfying b^(n−1) ≡ 1 (mod p²) is d, where

d = (n − 1, p(p − 1)) = (n − 1, p − 1),

because p | n implies p ∤ n − 1, hence p ∤ d. Therefore d is at most p − 1, and the proportion of b with b^(n−1) ≡ 1 (mod p²) among 1 ≤ b < n does not exceed

(p − 1)/(p² − 1) = 1/(p + 1) ≤ 1/4.

Therefore at most a proportion 1/4 of the bases b make n a Fermat pseudoprime, which proves property (ii) of Theorem 5.2 in case (1).
(2) n = pq is a product of two different primes.
In this case, let p − 1 = 2^t₁ m₁, q − 1 = 2^t₂ m₂, with m₁, m₂ odd. Without loss of generality let t₁ ≤ t₂. Let b ∈ Z∗n; for n to be a strong pseudoprime to base b, it is necessary that

b^m ≡ 1 (mod p), b^m ≡ 1 (mod q) (5.20)

or

b^(2^r m) ≡ −1 (mod p), b^(2^r m) ≡ −1 (mod q), 0 ≤ r < t. (5.21)

By Lemma 5.12, the number of b satisfying (5.20) is ≤ (m, m₁)(m, m₂) ≤ m₁m₂. By Lemma 5.13, for each r, 0 ≤ r < min(t₁, t₂) = t₁, the number of b satisfying b^(2^r m) ≡ −1 (mod n) is 2^r(m, m₁) · 2^r(m, m₂) ≤ 4^r m₁m₂. Because n = pq, ϕ(n) = (p − 1)(q − 1) = 2^(t₁+t₂) m₁m₂, and n − 1 > ϕ(n). Therefore, the proportion of b, among 1 ≤ b < n, (b, n) = 1, for which n is a strong pseudoprime to base b does not exceed

(m₁m₂ + m₁m₂ + 4m₁m₂ + ··· + 4^(t₁−1) m₁m₂)/(2^(t₁+t₂) m₁m₂) = 2^(−t₁−t₂)(1 + (4^t₁ − 1)/(4 − 1)). (5.22)

If t₁ < t₂, then t₂ ≥ t₁ + 1 and the above does not exceed

(2/3)·2^(−2t₁−1) + 1/6 ≤ (2/3)·2^(−3) + 1/6 = 1/4.

If t₁ = t₂, then m₁ ≠ m₂, so of the two inequalities (m, m₁) ≤ m₁ and (m, m₂) ≤ m₂ at least one must be strict. Indeed, if both were equalities, then m₁ | m and m₂ | m; from n − 1 = 2^t m and n − 1 = pq − 1 ≡ q − 1 (mod m₁) we would get m₁ | q − 1 = 2^t₂ m₂, hence m₁ | m₂, and symmetrically m₂ | m₁, so m₁ = m₂ — a contradiction. Since a proper divisor of the odd number m₁ (or m₂) is at most a third of it, we have

(m, m₁) · (m, m₂) ≤ (1/3) m₁m₂.

Substituting (1/3)m₁m₂ for m₁m₂ in Eq. (5.22), the proportion of b for which n is a strong pseudoprime to base b does not exceed

(1/3)((2/3)·4^(−t₁) + 1/3) ≤ 1/18 + 1/9 = 1/6 < 1/4.

We complete the proof of property (ii) of Theorem 5.2 in case (2).


(3) Finally, suppose n = p₁p₂ ··· p_k, k ≥ 3, is a product of different prime factors.
In this case write pᵢ − 1 = 2^tᵢ mᵢ with mᵢ odd. As in case (2), without loss of generality t₁ ≤ t_j (1 ≤ j ≤ k). Arguing as for formula (5.22), the proportion of b for which n is a strong pseudoprime to base b does not exceed

2^(−t₁−t₂−···−t_k) (1 + (2^(k t₁) − 1)/(2^k − 1)) ≤ 2^(−k t₁)·(2^k − 2)/(2^k − 1) + 1/(2^k − 1)
≤ 2^(−k)·(2^k − 2)/(2^k − 1) + 1/(2^k − 1)
= 2^(1−k)
≤ 1/4,

because k ≥ 3. In this way, we have completed all the proofs of Theorem 5.2.

The Euler test and the strong pseudoprime test require some rather delicate quadratic residue techniques. We summarize the main conclusions of this section as follows:

(A) n is a strong pseudoprime to base b ⇒ n is an Euler pseudoprime to base b ⇒ n is a Fermat pseudoprime to base b; therefore, the strong pseudoprime test is the most effective of the three ways to detect prime numbers.
(B) Although at present no such test can certify a prime number outright, the probabilistic method based on the strong pseudoprime test, that is, the Miller–Rabin method, detects whether an arbitrary odd number n is prime with success probability (see (5.13)) arbitrarily close to 1; that is,

P{detect whether odd n is prime} > 1 − ε, for any given ε > 0.

Moreover, the computational complexity of the detection algorithm is polynomial.

5.3 Monte Carlo Method

With all the primality tests introduced in the previous two sections, for a huge odd number n, even if we already know that n is not prime, we still cannot decompose n, because a primality test provides no information about the prime factorization. A more direct method, like the sieve method, checks n for divisibility by every prime not greater than √n, because a composite n must have a prime factor p ≤ √n. For a selected p ≤ √n, the bit operations required to divide n by p number O(log n), and there are O(√n / log n) primes p ≤ √n in total, so such a verification requires O(√n) bit operations. A more effective method was proposed by J. M. Pollard in 1975. We call it the Monte Carlo method, or the "rho" method.

First, find a convenient mapping f : Z_n → Z_n; for example, f(x) may be an integer-coefficient polynomial such as f(x) = x² + 1. Second, an initial value x₀ is randomly generated; let x₁ = f(x₀), x₂ = f(x₁), ..., x_{j+1} = f(x_j) (j = 0, 1, 2, ···). Among these x_j, we want to find two terms x_j and x_k which are different elements of Z_n but which, for some factor d of n, d | n, are the same element of Z_d; that is to say,

x_j ≢ x_k (mod n), (x_j − x_k, n) > 1. (5.23)

Once such x_j and x_k are found, the algorithm is complete.

Theorem 5.3 Let S be a set of r elements, f : S → S a mapping, x₀ ∈ S, and define x_{j+1} = f(x_j) (j = 0, 1, 2, ...). Suppose λ is a positive real number and let l = 1 + [√(2λr)]. Then the proportion of pairs (f, x₀), among all mappings f of S into S and all initial values x₀ ∈ S, for which x₀, x₁, ..., x_l are distinct elements of S, is ≤ e^(−λ).

Proof The total number of mappings f : S → S is r^r, because for each x ∈ S the image f(x) has r choices. The initial value x₀ has r choices, so the total number of pairs (f, x₀) is r^(r+1). The question is how many of these pairs (f, x₀) satisfy the condition that x₀, x₁, ..., x_l are distinct elements of S; we want to prove that the proportion of such (f, x₀) among all r^(r+1) pairs is not greater than e^(−λ).

There are r choices of x₀; then x₁ = f(x₀) has only r − 1 admissible choices, x₂ = f(x₁) only r − 2, and so on until x_l = f(x_{l−1}), which has only r − l choices. The images of the remaining x ∈ S may be chosen arbitrarily, in r^(r−l) ways. Therefore the number N of pairs (f, x₀) meeting the required condition is

N = r^(r−l) ∏_{j=0}^{l} (r − j).

Dividing N by r^(r+1), the proportion of (f, x₀) satisfying the condition is

N/r^(r+1) = r^(−l) ∏_{j=1}^{l} (r − j) = ∏_{j=1}^{l} (1 − j/r). (5.24)

We notice that for real x ∈ (0, 1), log(1 − x) < −x. Taking the logarithm of the right-hand side above,

∑_{j=1}^{l} log(1 − j/r) < − ∑_{j=1}^{l} j/r = −l(l + 1)/(2r) < −l²/(2r).

Because l = 1 + [√(2λr)] > √(2λr), it follows from the above that

∑_{j=1}^{l} log(1 − j/r) < −λ.

By (5.24), we have

N/r^(r+1) ≤ e^(−λ).
We complete the proof of Theorem 5.3.

The Monte Carlo method uses a polynomial f(x) ∈ Z[x] because, for any positive integer n, congruence mod n is preserved by a polynomial f(x); that is,

a ≡ b (mod n) ⇒ f(a) ≡ f(b) (mod n). (5.25)

With x₀ ∈ Z_n given and x_{j+1} = f(x_j) (j = 0, 1, ...), suppose we find an x_{k₀} ∈ Z_n satisfying x_{k₀} ≡ x_{j₀} (mod r), where r | n, r > 1, k₀ > j₀. By (5.25),

f(x_{k₀}) ≡ f(x_{j₀}) (mod r) ⇒ x_{k₀+1} ≡ x_{j₀+1} (mod r).

Thus for any k > j ≥ j₀ with k − j = k₀ − j₀, we have x_k ≡ x_j (mod r); this shows that the polynomial mapping f : Z_n → Z_n produces only k₀ different residue classes mod r (r | n), namely

{x₀, x₁, ..., x_{k₀−1}}.

Therefore, there is the following Lemma 5.14.

Lemma 5.14 Let f(x) ∈ Z[x] be a polynomial, n > 1 a positive integer, x₀ ∈ Z_n, and x_j = f(x_{j−1}) (j = 1, 2, ...). If k is the first subscript for which there is a j, 0 ≤ j < k, such that

(x_k − x_j, n) = r > 1,

then {x₀, x₁, ..., x_{k−1}} are k different residue classes mod r, hence also k different residue classes mod n. Moreover, the Monte Carlo computation defined by f produces only these k different residue classes mod r.

We call the polynomial f together with the initial value x₀ described in Lemma 5.14 an average mapping. When the first subscript k is very large, the amount of computation is very large; here we give an improved Monte Carlo algorithm.

With f(x) ∈ Z[x] given, the Monte Carlo algorithm computes x_k (k = 1, 2, ...) successively. Let 2^h ≤ k < 2^(h+1) (h ≥ 0) and j = 2^h − 1; that is, k is an (h + 1)-bit number in binary and j is the largest h-bit number. Compare x_k with x_j by computing (x_k − x_j, n); if (x_k − x_j, n) > 1, the computation terminates, otherwise proceed to k + 1. The improved Monte Carlo algorithm computes only one gcd (x_k − x_j, n) for each k, with j = 2^h − 1; there is no need to test every j, 0 ≤ j < k. When k is very large this saves a great deal of computation, but it has one disadvantage: it may miss the smallest subscript k satisfying the condition. The error, however, is controllable; in fact, we have the following error estimate.

Lemma 5.15 Let f(x) ∈ Z[x] and n ≥ 1 be given, x₀ ∈ Z_n, x_j = f(x_{j−1}) (j = 1, 2, ...). Let k₀ be the smallest subscript satisfying (x_{k₀} − x_{j₀}, n) > 1 for some 0 ≤ j₀ < k₀, and suppose k is the smallest subscript satisfying (x_k − x_j, n) > 1 in the improved Monte Carlo algorithm. Then k ≤ 4k₀.

Proof Suppose k₀ has (h + 1) bits in binary, that is, 2^h ≤ k₀ < 2^(h+1). Let j = 2^(h+1) − 1 and k = j + (k₀ − j₀). By Lemma 5.14,

(x_{k₀} − x_{j₀}, n) > 1 ⇒ (x_k − x_j, n) > 1,

since k − j = k₀ − j₀ and j ≥ j₀. Obviously j is the largest (h + 1)-bit number and k is an (h + 2)-bit number, so k is a subscript examined by the improved Monte Carlo algorithm, and

k = j + (k₀ − j₀) ≤ 2^(h+1) − 1 + 2^(h+1) < 4 · 2^h ≤ 4k₀.

Lemma 5.15 holds.
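The improved algorithm is easy to implement. The following Python sketch (an illustration added here; the iteration function f and the starting value x₀ are the free parameters of the method) compares each x_k only with the saved value x_j, j = 2^h − 1, exactly as described above:

```python
from math import gcd

def pollard_rho(n, x0=2, f=lambda x, n: (x * x + 1) % n):
    # Monte Carlo ("rho") factor search: generate x_{k} = f(x_{k-1}) mod n
    # and look for k with (x_k - x_j, n) > 1. Following the improved
    # algorithm, x_k is only compared with x_j for j = 2^h - 1,
    # the largest h-bit index below k.
    x = x0 % n       # current term x_k
    saved = x        # x_j with j = 2^h - 1
    k, next_power = 1, 2
    while True:
        x = f(x, n)
        d = gcd(x - saved, n)
        if 1 < d < n:
            return d                  # nontrivial factor found
        if d == n:
            return None               # failure: restart with another f or x0
        k += 1
        if k == next_power:           # k reached 2^(h+1): update j = 2^(h+1)-1
            saved = x
            next_power *= 2
```

Running pollard_rho(91, x0=1) reproduces Example 5.6 below: it computes x₁ = 2, x₂ = 5, x₃ = 26, x₄ = 40 and finds the factor (x₄ − x₃, 91) = 7 at k = 4.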

Example 5.6 Let n = 91, f(x) = x² + 1, x₀ = 1. By the Monte Carlo algorithm, x₁ = 2, x₂ = 5, x₃ = 26 and x₄ = 40 (because 26² + 1 ≡ 40 (mod 91)). By the improved Monte Carlo algorithm, only (x₄ − x₃, 91) needs to be examined, which yields

(x₄ − x₃, 91) = (14, 91) = 7.

Lemma 5.16 Let n be odd and composite, and let r be a factor of n, r | n, 1 < r < √n. With f(x) ∈ Z[x] and x₀ ∈ Z_n given, the computational complexity of finding r by the Monte Carlo algorithm (f, x₀) is

Time((f, x₀)) = O(√n · log³ n) bit operations. (5.26)

Further, there is an absolute constant C such that, for any positive real number λ, the Monte Carlo algorithm (f, x₀) finds a nontrivial factor r of n with success probability greater than 1 − e^(−λ), that is,

P{(f, x₀) finds r | n, r > 1} ≥ 1 − e^(−λ), (5.27)

and the number of bit operations required by the algorithm, depending on the parameter λ (which governs the success rate), is O(√λ · n^(1/4) · log³ n).

Proof From the discussion of computational complexity in Chap. 1, finding the greatest common divisor of two integers, and addition, subtraction, multiplication and division mod n, are polynomial-time operations. Let C₁ satisfy

Time((y − z, n)) ≤ C₁ log³ n, where y, z ≤ n,

and let C₂ satisfy

Time(f(x) mod n) ≤ C₂ log³ n, x ∈ Z_n.

If k₀ is the first subscript in the computation of (f, x₀) satisfying (x_{k₀} − x_{j₀}, n) > 1, then by the improved Monte Carlo algorithm we have (x_k − x_j, n) > 1, where j = 2^h − 1, 2^h ≤ k < 2^(h+1), and by Lemma 5.15, k ≤ 4k₀. Thus

Time(k is found by (f, x₀)) ≤ 4k₀ (C₁ log³ n + C₂ log³ n). (5.28)

Let (x_{k₀} − x_{j₀}, n) = r > 1, r < √n. By Lemma 5.14, k₀ ≤ r, so

Time(find r, r | n, r < √n) ≤ 4√n (C₁ log³ n + C₂ log³ n).

This proves Eq. (5.26). In the probabilistic sense, that is, allowing a certain probability of failure, Eq. (5.26) can be improved further. Let λ > 0 be any given real number. By Theorem 5.3, the proportion of pairs (f, x₀) with k₀ ≥ 1 + √(2λr) is < e^(−λ); in other words, the probability of successfully finding r, r | n, r ≤ √n, is

P{find r, r | n, r < √n} ≥ 1 − e^(−λ),

provided we run the algorithm up to k₀ ≤ 1 + √(2λr). By (5.28), the number of bit operations required is then at most

4(1 + √(2λr))(C₁ log³ n + C₂ log³ n) = O(√λ · n^(1/4) · log³ n),

since r ≤ √n. We have completed the proof of the Lemma.

Remark 5.1 A basic assumption of the Monte Carlo method is that an integer-coefficient polynomial f behaves like an average mapping (see Lemma 5.14); this has not yet been proved.

5.4 Fermat Decomposition and Factor Basis Method

Lemma 5.17 Suppose n is odd. There is a 1-1 correspondence between the factorizations n = a · b (a ≥ b > 0) of n and the representations n = t² − s² (t and s nonnegative integers) of n. The correspondence σ : (a, b) → (t, s) can be written as σ((a, b)) = (t, s), where

σ((a, b)) = ((a + b)/2, (a − b)/2),

with inverse mapping

σ^(−1)((t, s)) = (t + s, t − s).
Proof If n = ab, then because both a and b are odd, n = ((a+b)/2)² − ((a−b)/2)², so we may define

σ((a, b)) = ((a + b)/2, (a − b)/2).

Conversely, if n = t² − s², then n = (t + s)(t − s), so we define σ^(−1)((t, s)) = (t + s, t − s). We verify σ^(−1)σ = 1 and σσ^(−1) = 1. By definition,

σ^(−1)(σ((a, b))) = σ^(−1)(((a+b)/2, (a−b)/2)) = (a, b),
σ(σ^(−1)((t, s))) = σ((t + s, t − s)) = (t, s).

So σ is a 1-1 correspondence between the two representations n = ab = t² − s², and the Lemma holds.
This simple lemma provides a method of factorization, called Fermat factorization: if n = ab with a very close to b, then n = ((a+b)/2)² − ((a−b)/2)² = t² − s², where s is very small and t is only a little larger than √n. Therefore, starting from t = [√n] + 1, we test whether t² − n is a perfect square; if not, we pass to t = [√n] + 2 and test again, and so on, until t² − n = s², when we obtain n = (t + s)(t − s) by Fermat factorization. This method is effective when n = ab with a and b very close.
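As a minimal sketch (Python, illustrative only, assuming n is an odd number greater than 1), the search described above takes only a few lines:

```python
from math import isqrt

def fermat_factor(n):
    # Fermat factorization of odd n: try t = [sqrt(n)]+1, [sqrt(n)]+2, ...
    # until t^2 - n is a perfect square s^2; then n = (t+s)(t-s) by Lemma 5.17.
    t = isqrt(n) + 1
    while True:
        s2 = t * t - n
        s = isqrt(s2)
        if s * s == s2:
            return t + s, t - s
        t += 1
```

For instance, fermat_factor(8633) (the first number of Exercise 13) succeeds at the very first step, since 93² − 8633 = 16, giving 8633 = 97 · 89.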
Fermat factorization can be further developed into the factor base method, an even more effective factorization method. Its basic idea is the following: in Fermat factorization, t² − n is required to be a perfect square, which rarely happens in practice; but t² ≡ s² (mod n) with t ≢ ±s (mod n) occurs easily. Computing the greatest common divisors (t + s, n) and (t − s, n), we then have the factorization

n = (t + s, n)(t − s, n).

Definition 5.5 Let B be a set of h different primes (where possibly p₁ = −1); B is called a factor base. An integer b is called a B-number if the residue of least absolute value of b² mod n can be expressed as a product of the primes in B, where n is the given positive integer.
Example 5.7 Let n = 4633, B = {−1, 2, 3}; then 67, 68, 69 are all B-numbers, because 67² ≡ −144 (mod 4633), 68² ≡ −9 (mod 4633), 69² ≡ 128 (mod 4633).

If b is a B-number, let b² mod n denote this residue of least absolute value of b² mod n; by the definition,

b² mod n = ∏_{i=1}^{h} pᵢ^(αᵢ), αᵢ ≥ 0.

Let e = (e₁, e₂, ..., e_h) ∈ F₂^h be the h-dimensional binary vector defined by

e_j = 0, if α_j is even; e_j = 1, if α_j is odd (1 ≤ j ≤ h).

e is called the binary vector corresponding to b. If A = {bᵢ} is a set of B-numbers, denote by eᵢ = (e_{i1}, e_{i2}, ..., e_{ih}) the binary vector corresponding to bᵢ, and write aᵢ = bᵢ² mod n. We have

∏_{i∈A} aᵢ = ∏_{j=1}^{h} p_j^(∑_{i∈A} α_{ij}), where aᵢ = ∏_{j=1}^{h} p_j^(α_{ij}).

Suppose ∑_{i∈A} eᵢ = (0, 0, ..., 0) is the zero vector in F₂^h; then

∑_{i∈A} α_{ij} ≡ 0 (mod 2), for all 1 ≤ j ≤ h,

that is, ∏_{i∈A} aᵢ is a perfect square. Let r_j = (1/2) ∑_{i∈A} α_{ij}; then

∏_{i∈A} aᵢ = (∏_{j=1}^{h} p_j^(r_j))², and we define c = ∏_{j=1}^{h} p_j^(r_j). (5.29)

On the other hand, let bᵢ mod n denote the least nonnegative residue of bᵢ mod n, and let

b = ∏_{i∈A} (bᵢ mod n) = ∏_{i∈A} δᵢ, (5.30)

where δᵢ = bᵢ mod n, that is, 0 ≤ δᵢ < n and bᵢ ≡ δᵢ (mod n); thus

∏_{i∈A} bᵢ ≡ b (mod n).

Because aᵢ = bᵢ² mod n, we have bᵢ² ≡ aᵢ (mod n), and therefore

∏_{i∈A} bᵢ² ≡ b² ≡ ∏_{i∈A} aᵢ = c² (mod n).

The two integers b and c defined by Eqs. (5.29) and (5.30) thus satisfy b² ≡ c² (mod n). We record this analysis as the following lemma.

Lemma 5.18 Let A = {b₁, b₂, ..., bᵢ, ...} be a finite set of B-numbers, let eᵢ = (e_{i1}, e_{i2}, ..., e_{ih}) ∈ F₂^h be the binary vector corresponding to bᵢ, aᵢ = bᵢ² mod n, δᵢ = bᵢ mod n. If ∑_{i∈A} eᵢ = 0 is the zero vector in F₂^h, then ∏_{i∈A} aᵢ is a perfect square. Writing

aᵢ = ∏_{j=1}^{h} p_j^(α_{ij}), ∏_{i∈A} aᵢ = ∏_{j=1}^{h} p_j^(∑_{i∈A} α_{ij}) = c²,

where

c = ∏_{j=1}^{h} p_j^((1/2) ∑_{i∈A} α_{ij}),

and letting b = δ₁δ₂···, we have b² ≡ c² (mod n).


By the above lemma, if b² ≡ c² (mod n) and b ≢ ±c (mod n), then we find a nontrivial factor d = (b + c, n) of n. The question now is: given b² ≡ c² (mod n), how likely is b ≡ ±c (mod n)? We may assume (b, n) = (c, n) = 1 (otherwise divide both sides by (b, n)²); from b² ≡ c² (mod n) we get (bc^(−1))² ≡ 1 (mod n), so the problem reduces to counting the solutions x of x² ≡ 1 (mod n), 1 ≤ x < n.
Lemma 5.19 Let n be odd; then the number of solutions of x² ≡ 1 (mod n) is 2^r, where r is the number of different prime factors of n.

Proof If r = 1, then n = p^α (α ≥ 1) with p an odd prime, and x² ≡ 1 (mod p^α) has exactly the two solutions x ≡ ±1: let g be a primitive root mod p^α and x = g^t (1 ≤ t ≤ p^(α−1)(p − 1)); then x² = 1 ⇔ p^(α−1)(p − 1) | 2t, which has only the two solutions t = (1/2)p^(α−1)(p − 1) and t = p^(α−1)(p − 1), i.e., x ≡ ±1 (mod p^α). If n = p₁^(α₁) ··· p_r^(α_r), the number of solutions of x² ≡ 1 (mod n) deduced from the Chinese remainder theorem is 2^r. The Lemma holds!
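For small n this count is easy to verify by brute force; the following throwaway Python check (an illustration added here, not from the text) confirms the 2^r count:

```python
def sqrt1_count(n):
    # Count the solutions of x^2 ≡ 1 (mod n) with 1 <= x < n;
    # by Lemma 5.19 this equals 2^r for odd n with r distinct prime factors.
    return sum(1 for x in range(1, n) if x * x % n == 1)

# sqrt1_count(91) == 4, since 91 = 7 * 13 has r = 2 distinct prime factors.
```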
Lemma 5.20 Let n be odd and a product of powers of at least two different primes, and let B = {p₁, p₂, ..., p_h} be a factor base. If two B-numbers b and c with b² ≡ c² (mod n) are selected at random, then the probability that b ≡ ±c (mod n) is ≤ 1/2.

Proof x² ≡ 1 (mod n) has 2^r different solutions mod n, with r ≥ 2. Exactly the two solutions x ≡ ±1 (mod n) correspond to b ≡ ±c (mod n). Thus

P{b ≡ ±c (mod n), given b² ≡ c² (mod n)} ≤ 2/2^r ≤ 1/2,

and Lemma 5.20 holds.
According to Lemma 5.20, when b and c are selected using a factor base, if b ≡ ±c (mod n) then the selection fails, and the probability of failure is ≤ 1/2. If the selection fails, select another pair b₁ and c₁; selecting k pairs (b, c) roughly independently in this way, the probability that some pair succeeds with b ≢ ±c (mod n) is

P{b² ≡ c² (mod n), b ≢ ±c (mod n)} ≥ 1 − 1/2^k. (5.31)

In other words, the probability of finding a nontrivial factor d = (b + c, n) of n by using a factor base can be made arbitrarily close to 1. We now summarize the factor base method systematically:

The factor base method
Let n be a large odd number and y an appropriately selected bound (e.g., y ≤ n^(1/10)); take the factor base

B = {−1} ∪ {p : p prime, p ≤ y}.

Select at random a certain number of B-numbers, A₁ = {b₁, b₂, ..., b_N}; usually N = π(y) + 2 meets the need, since the binary vectors are then forced to be linearly dependent over F₂. Express each bᵢ² mod n as a product of the primes in B and compute the corresponding binary vector eᵢ. Select a subset A ⊂ A₁ such that ∑_{i∈A} eᵢ = 0, where eᵢ is the binary vector corresponding to bᵢ, and denote A = {b₁, b₂, ..., bᵢ, ...}. Let

b = ∏_{i∈A} (bᵢ mod n) = ∏_{i∈A} δᵢ, where δᵢ = bᵢ mod n,

and

c = ∏_{j∈B} p_j^(r_j) mod n, r_j = (1/2) ∑_{i∈A} α_{ij}.

We have b² ≡ c² (mod n). If b ≡ ±c (mod n), reselect the subset A, until finally b ≢ ±c (mod n); in this way we find a nontrivial factor d | n of n, d = (b + c, n), and hence the factorization n = d · (n/d).
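The following compact sketch (Python; illustrative only, with randomly chosen b and a toy dependency search that simply pairs two relations with equal parity vectors instead of doing full Gaussian elimination over F₂) shows the whole pipeline on small n:

```python
from math import gcd
from random import randrange

def trial_smooth(a, base):
    # Try to write a as ± a product of primes in base; return the
    # exponent vector (sign exponent first) or None if a is not smooth.
    exps = [0] * (len(base) + 1)
    if a < 0:
        exps[0], a = 1, -a
    for i, p in enumerate(base, start=1):
        while a % p == 0:
            exps[i] += 1
            a //= p
    return exps if a == 1 else None

def factor_base_method(n, base=(2, 3, 5, 7, 11, 13)):
    # Factor base sketch: collect B-numbers b (so b^2 mod n, taken as the
    # least absolute residue, is smooth over {-1} ∪ base), pair two with
    # equal parity vectors so all exponent sums are even, and test
    # d = (b1*b2 + c, n) as in (5.29)-(5.31).
    rows = {}
    while True:
        b = randrange(2, n)
        a = b * b % n
        if a > n // 2:
            a -= n                    # least absolute residue
        if a == 0:
            continue
        exps = trial_smooth(a, base)
        if exps is None:
            continue
        key = tuple(e % 2 for e in exps)
        if key not in rows:
            rows[key] = (b, exps)
            continue
        b2, exps2 = rows.pop(key)     # parity vectors cancel over F_2
        c = 1
        for p, e in zip((1,) + base, (x + y for x, y in zip(exps, exps2))):
            c = c * pow(p, e // 2, n) % n
        d = gcd(b * b2 + c, n)
        if 1 < d < n:
            return d, n // d          # success: nontrivial factorization
```

For example, factor_base_method(4633) (the n of Example 5.7) typically returns the factorization 4633 = 41 · 113 after a few dozen random trials; a serious implementation would instead collect π(y) + 2 smooth relations and solve the F₂ linear system.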
Factorization by the factor base method cannot guarantee a 100% success rate, because b ≢ ±c (mod n) cannot be deduced from b² ≡ c² (mod n); nevertheless, the success probability of factoring a large odd n can be made arbitrarily close to 1. Under the condition that the success probability is ≥ 1 − 1/2^k (k a given positive constant), the computational complexity of factoring n by the factor base method can be estimated as

Time(factor base method applied to n) = O(e^(c√(log n · log log n))). (5.32)

The proof of Formula (5.32) is relatively involved and is not given here; interested readers may consult pages 136–141 of (Pomerance, 1982a) in the references. The exact value of the constant c in (5.32) is unknown; it is generally conjectured that c = 1 + ε, where ε > 0 is an arbitrarily small positive real number.
Let k be the number of bits of n; then the right-hand side of (5.32) can be written as O(e^(c√(k log k))). Therefore the computational complexity of the factor base method is subexponential. By comparison, the computational complexity of the Monte Carlo method introduced in the previous section (see (5.26)) is exponential, because

O(√n) = O(e^(c₁ k)), where c₁ = (1/2) log 2.
As is well known, the security of the RSA public key cryptosystem rests on the difficulty of the prime factorization n = pq. There is no general efficient method to factor an arbitrary large odd n: although the Monte Carlo method and the factor base method are probabilistic methods whose probability of successful factorization is very high, their computational complexities are exponential and subexponential, respectively. This is the reason for choosing huge primes p and q in RSA.

5.5 Continued Fraction Method

In the factor base method introduced in the previous section, b² mod n may be taken to be the residue of least absolute value of b² mod n, that is,

b² ≡ b² mod n (mod n), |b² mod n| ≤ n/2.

The smaller this residue, the more likely b² mod n decomposes into a product of small primes. The continued fraction method is at present the best way to find integers b with |b² mod n| < 2√n, for which b² mod n is especially likely to factor into a product of small primes. First, we introduce continued fractions and some of their basic properties.
Suppose x ∈ R is a real number, [x] the integer part of x and {x} the fractional part of x. Let a₀ = [x]; if {x} ≠ 0, let a₁ = [1/{x}]. Because x = [x] + {x}, we have

x = a₀ + 1/(1/{x}) = a₀ + 1/(a₁ + {1/{x}}).

If {1/{x}} ≠ 0, write

a₂ = [1/{1/{x}}],

and continue with the fractional part of that reciprocal, and so on. In this way we obtain

x = a₀ + 1/(a₁ + 1/(a₂ + 1/(a₃ + ···))).

This expression is called the continued fraction expansion of the real number x. To save space we write x = [a₀, a₁, ..., a_n, ...]. The continued fraction expansion of x is finite if and only if x is a rational number, in which case we write

x = [a₀, a₁, ..., a_n], where a_n > 1.

It is called the standard expansion of the rational number x.
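For a rational x the expansion is just the Euclidean algorithm in disguise; a small Python sketch (an illustration added here, using the standard library Fraction type):

```python
from fractions import Fraction

def cf_expand(x):
    # Standard continued fraction expansion [a0, a1, ..., an] of a
    # rational number x: repeatedly split off the integer part and
    # take the reciprocal of the fractional part.
    x = Fraction(x)
    terms = []
    while True:
        a = x.numerator // x.denominator   # a_i = [x]
        terms.append(a)
        x -= a                             # x becomes {x}
        if x == 0:
            return terms
        x = 1 / x                          # continue with 1/{x}
```

For instance, cf_expand(Fraction(45, 89)) returns [0, 1, 1, 44], i.e., 45/89 = [0, 1, 1, 44], one of the expansions asked for in Exercise 15.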



Definition 5.6 Let x = [a₀, a₁, ..., a_n, ...] be the continued fraction expansion of x. For i ≥ 0, bᵢ/cᵢ = [a₀, a₁, ..., aᵢ] is called the i-th asymptotic fraction (convergent) of x; in particular,

b₀/c₀ = a₀/1, b₁/c₁ = (a₁a₀ + 1)/a₁.

The asymptotic fraction bᵢ/cᵢ of the real number x is a reduced fraction, that is, (bᵢ, cᵢ) = 1, and it has the following properties.

Lemma 5.21 Let x = [a₀, a₁, ..., a_n, ···] be the continued fraction expansion of x and bᵢ/cᵢ its asymptotic fractions. Then
(i) for i ≥ 2,

bᵢ/cᵢ = (aᵢ b_{i−1} + b_{i−2})/(aᵢ c_{i−1} + c_{i−2}); (5.33)

(ii) for i ≥ 1,

bᵢ c_{i−1} − b_{i−1} cᵢ = (−1)^(i−1). (5.34)

Proof We prove (i) by induction. The proposition holds for i = 2, that is,

b₂/c₂ = (a₂b₁ + b₀)/(a₂c₁ + c₀) = (a₂(a₁a₀ + 1) + a₀)/(a₂a₁ + 1).

Suppose the proposition holds for i, that is,

bᵢ/cᵢ = (aᵢ b_{i−1} + b_{i−2})/(aᵢ c_{i−1} + c_{i−2}).

Writing [a₀, a₁, ..., aᵢ, a_{i+1}] = [a₀, a₁, ..., aᵢ + 1/a_{i+1}], we get

b_{i+1}/c_{i+1} = ((aᵢ + 1/a_{i+1}) b_{i−1} + b_{i−2})/((aᵢ + 1/a_{i+1}) c_{i−1} + c_{i−2}) = (a_{i+1} bᵢ + b_{i−1})/(a_{i+1} cᵢ + c_{i−1}).

So (i) holds.

We prove Formula (5.34) by induction. When i = 1,

b₁c₀ − b₀c₁ = a₁a₀ + 1 − a₁a₀ = 1 = (−1)⁰,

so the proposition holds for i = 1. If it holds for i, that is, bᵢ c_{i−1} − b_{i−1} cᵢ = (−1)^(i−1), then

b_{i+1} cᵢ − bᵢ c_{i+1} = (a_{i+1} bᵢ + b_{i−1}) cᵢ − bᵢ (a_{i+1} cᵢ + c_{i−1})
= b_{i−1} cᵢ − bᵢ c_{i−1}
= (−1)^i.

Lemma 5.21 holds.


Continued fractions have many important applications in number theory, such as the rational approximation of real numbers and of algebraic numbers. Periodic continued fractions are an important special case in the rational approximation of algebraic numbers: if x = [a₀, a₁, ..., a_n, ...] and the aᵢ eventually repeat in cycles of a certain length, the continued fraction is called periodic. The famous theorem of Lagrange states that the continued fraction expansion of x is periodic if and only if x is a real quadratic algebraic number. Here we do not discuss the deeper properties of continued fractions, but only prove the properties we need.
Lemma 5.22 Let x > 1 be a real number and bᵢ/cᵢ (i ≥ 0) the asymptotic fractions of x; then

|bᵢ² − x²cᵢ²| < 2x, for all i ≥ 0.

Proof Because x lies between the asymptotic fractions bᵢ/cᵢ and b_{i+1}/c_{i+1}, property (ii) of Lemma 5.21 gives

|x − bᵢ/cᵢ| ≤ |b_{i+1}/c_{i+1} − bᵢ/cᵢ| = 1/(cᵢ c_{i+1}), i ≥ 0.

Thus

|bᵢ² − x²cᵢ²| = cᵢ² |x − bᵢ/cᵢ| |x + bᵢ/cᵢ| < cᵢ² · (1/(cᵢ c_{i+1})) · (2x + 1/(cᵢ c_{i+1})).

So

|bᵢ² − x²cᵢ²| − 2x < 2x(−1 + cᵢ/c_{i+1} + 1/(2x c_{i+1}²))
< 2x(−1 + cᵢ/c_{i+1} + 1/c_{i+1})
≤ 2x(−1 + c_{i+1}/c_{i+1}) = 0.

The Lemma holds.


Lemma 5.23 Let n be a positive integer that is not a perfect square. Let {bᵢ/cᵢ}_{i≥0} be the asymptotic fractions of the continued fraction expansion of √n, and let bᵢ² mod n be the residue of least absolute value of bᵢ² mod n. Then we have

|bᵢ² mod n| < 2√n, for all i ≥ 0.

Proof Apply Lemma 5.22 with x = √n. Since

bᵢ² ≡ bᵢ² − ncᵢ² (mod n)

and

|bᵢ² − ncᵢ²| < 2√n,

it follows that |bᵢ² mod n| < 2√n for all i ≥ 0. The Lemma holds.

Combining the above Lemma 5.23 with the factorization method, we obtain the
continued fraction decomposition method.
Continued fraction decomposition method:
Except where specially noted, the operations mod n in this algorithm take the least nonnegative residue mod n. Let n be a large odd composite number. First let b₋₁ = 1, b₀ = a₀ = [√n], and x₀ = √n − a₀ = {√n}, and calculate b₀² mod n; in fact b₀² mod n = b₀² − n. Second, consider i = 1, 2, .... To determine bᵢ we proceed in several steps:

1. Let aᵢ = [1/x_{i−1}] and xᵢ = 1/x_{i−1} − aᵢ (i ≥ 1).
2. Let bᵢ = aᵢ b_{i−1} + b_{i−2}; the least nonnegative residue bᵢ mod n of bᵢ mod n is still denoted bᵢ.
3. Calculate bᵢ² mod n (as the residue of least absolute value).
By Lemma 5.23, |bᵢ² mod n| < 2√n, so bᵢ² mod n can plausibly be decomposed into a product of small primes. If a prime p appears in the decomposition of two or more of the bᵢ² mod n, or appears to an even power in the decomposition of one bᵢ² mod n, then p is called a standard prime; in other words, a standard prime p satisfies

p | bᵢ² mod n and p | b_j² mod n for some i ≠ j,

or

p^α ∥ bᵢ² mod n with α even.

We choose the factor base B as

B = {−1} ∪ {standard primes}.

In this way all the bᵢ² mod n are B-numbers, with corresponding binary vectors eᵢ. Select a subset A = {bᵢ} such that ∑_{i∈A} eᵢ = 0. Let

b = ∏_{i∈A} (bᵢ mod n) = ∏_{i∈A} δᵢ

and c = ∏_{j∈B} p_j^(r_j), where

r_j = (1/2) ∑_{i∈A} α_{ij}, for all j ∈ B.

If b ≢ ±c (mod n), then (b + c, n) is a nontrivial factor of n and we obtain a factorization of n. If b ≡ ±c (mod n), another subset A is selected and the procedure is repeated. This completes the continued fraction factorization method.
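The partial quotients of √n can be generated with exact integer arithmetic (no floating point) by writing each complete quotient as (√n + p)/q. The following Python sketch (illustrative; the recursion for p and q is the standard one for quadratic irrationals, stated here without proof) lists aᵢ, bᵢ mod n and the least absolute residue bᵢ² mod n, assuming n is not a perfect square:

```python
from math import isqrt

def cfrac_residues(n, steps):
    # For the continued fraction expansion of sqrt(n), generate the
    # triples (a_i, b_i mod n, least absolute residue of b_i^2 mod n).
    a0 = isqrt(n)
    p, q = a0, n - a0 * a0                # x_1 = 1/x_0 = (sqrt(n)+p)/q
    b_prev, b = 1, a0 % n                 # b_{-1} = 1, b_0 = a_0
    out = [(a0, b, a0 * a0 - n)]          # b_0^2 mod n = b_0^2 - n < 0
    for _ in range(steps):
        a = (a0 + p) // q                 # a_i = [(sqrt(n)+p)/q]
        p = a * q - p                     # update the quadratic-irrational state
        q = (n - p * p) // q
        b_prev, b = b, (a * b + b_prev) % n   # b_i = a_i b_{i-1} + b_{i-2} mod n
        r = b * b % n
        if r > n // 2:
            r -= n                        # least absolute residue
        out.append((a, b, r))
    return out
```

Called as cfrac_residues(9073, 4), it reproduces exactly the table of Example 5.8 below.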

Example 5.8 Use the continued fraction method to factor n = 9073.

Solution: We calculate aᵢ, bᵢ and bᵢ² mod n in turn, where bᵢ = (aᵢ b_{i−1} + b_{i−2}) mod n; the table is as follows:

i          0    1    2    3     4
aᵢ         95   3    1    26    2
bᵢ         95   286  381  1119  2619
bᵢ² mod n  −48  139  −7   87    −27

From the values of bᵢ² mod n we choose the factor base B = {−1, 2, 3, 7}; then bᵢ² mod n is a B-number for i = 0, 2, 4. Since −48 = −1 · 2⁴ · 3, −7 = −1 · 7 and −27 = −1 · 3³, the corresponding binary vectors are

e₀ = (1, 0, 1, 0), e₂ = (1, 0, 0, 1), e₄ = (1, 0, 1, 0),

and it is easy to see that e₀ + e₄ = (0, 0, 0, 0). Therefore we choose

b = 95 · 2619 ≡ 3834 (mod 9073), c = 2² · 3² = 36.

Then b² ≡ c² (mod 9073), that is, 3834² ≡ 36² (mod 9073), but 3834 ≢ ±36 (mod 9073), so we get a nontrivial factor of n = 9073, d = (3834 + 36, 9073) = 43. Thus 9073 = 43 · 211, and the factorization of 9073 is obtained.
Exercise 5
1. Let p be a prime. Show that p² is a Fermat pseudoprime to base b if and only if b^(p−1) ≡ 1 (mod p²).
2. What is the smallest Fermat pseudoprime to base 5? What is the smallest Fermat pseudoprime to base 2?
3. Let n = pq, where p ≠ q are two primes, and let d = (p − 1, q − 1). Prove that n is a Fermat pseudoprime to base b if and only if b^d ≡ 1 (mod n), and count the number of such bases b.
4. If b ∈ Z∗n and n is a Fermat pseudoprime to base b, then n is a Fermat pseudoprime to the bases −b and b^(−1).
5. If n is a Fermat pseudoprime to base 2, then N = 2^n − 1 is also a Fermat pseudoprime to base 2.
6. If n is a Fermat pseudoprime to base b and (b − 1, n) = 1, then N = (b^n − 1)/(b − 1) is also a Fermat pseudoprime to base b.
7. Prove that the following integers are Carmichael numbers:

1105 = 5 · 13 · 17, 1729 = 7 · 13 · 19, 2465 = 5 · 17 · 29, 2821 = 7 · 13 · 31,
6601 = 7 · 23 · 41, 29341 = 13 · 37 · 61, 172081 = 7 · 13 · 31 · 61, 278545 = 5 · 17 · 29 · 113.

8. Find all Carmichael numbers of the form 3pq and all Carmichael numbers of the form 5pq.
9. Prove that 561 is the smallest Carmichael number.
10. If n is a Fermat pseudoprime to base 2, prove that N = 2^n − 1 is a strong pseudoprime to base 2.
11. Show that there are infinitely many Euler pseudoprimes and strong pseudoprimes to base 2.
12. If n is a strong pseudoprime to base b, then n is also a strong pseudoprime to base b^k for any integer k.
13. Use the Fermat factorization method to decompose the following positive integers:

n = 8633, n = 809009, n = 92296873, n = 88169891.

14. Use the Fermat factorization method to decompose the following positive integers:

n = 68987, n = 29895581, n = 19578079, n = 17018759.

15. Expand the rational numbers x = 45/89, x = 55/89, x = 1.13 into continued fractions.
16. Let a be a positive integer and x = [a, a, a, ···]; calculate x.

References

Adleman, L. M., Pomerance, C., & Rumely, R. S. (1983). On distinguishing prime numbers from composite numbers. Annals of Mathematics, 117, 173–206.
Brent, R. P., & Pollard, J. M. (1981). Factorization of the eighth Fermat number. Mathematics of Computation, 36, 627–630.
Blair, W. D., Lacampagne, C. B., & Selfridge, J. L. (1986). Factoring large numbers on a pocket calculator. The American Mathematical Monthly, 93, 802–808.
Brent, R. P. (1980). An improved Monte Carlo factorization algorithm. BIT, 20, 176–184.
Cohen, H., & Lenstra, H. W. (1984). Primality testing and Jacobi sums. Mathematics of Computation, 42, 297–330.
Davenport, H. (1982). The higher arithmetic. Cambridge University Press.
Dickson, L. E. (1952). History of the theory of numbers (Vol. 1). Chelsea.
Dixon, J. D. (1984). Factorization and primality tests. The American Mathematical Monthly, 91,
333–352.
Guy, R. K. (1975). How to factor a number. In Proceedings of the 5th Manitoba Conference on
Numerical Mathematics (pp. 49–89).
Kranakis, E. (1986). Primality and cryptography. Wiley.

Lehman, R. S. (1974). Factoring large integers. Mathematics of Computation, 28, 637–646.


Lehmer, D. H., & Powers, R. E. (1931). On factoring large numbers. Bulletin of the American Mathematical Society, 37, 770–776.
Miller, G. L. (1975). Riemann's hypothesis and tests for primality. In Proceedings of the 7th Annual ACM Symposium on the Theory of Computing (pp. 234–239).
Morrison, M. A., & Brillhart, J. (1975). A method of factoring and the factorization of F7 . Mathe-
matics of Computation, 29, 183–205.
Pollard, J. M. (1975). A Monte Carlo method for factorization. BIT, 15, 331–334.
Pomerance, C. (1981). Recent developments in primality testing. The Mathematical Intelligencer, 3, 97–105.
Pomerance, C. (1982a). Analysis and comparison of some integer factoring algorithms. Computa-
tion Methods in Number Theory, Part 1.
Pomerance, C. (1982b). The search for prime numbers. Scientific American, 247, 136–147.
Pomerance, C., & Wagstaff, S. S. (1983). Implementation of the continued fraction integer factoring algorithm. In Proceedings of the 12th Winnipeg Conference on Numerical Methods and Computing.
Rabin, M. O. (1980). Probabilistic algorithm for testing primality. Journal of Number Theory, 12, 128–138.
Solovay, R., & Strassen, V. (1977). A fast Monte-Carlo test for primality. SIAM Journal on Computing, 6, 84–85.
Wagon, S. (1986). Primality testing. The Mathematical Intelligencer, 8, 58–61.
Wunderlich, M. C. (1979). A running time analysis of Brillhart's continued fraction factoring method. Number Theory, Carbondale, Springer Lecture Notes, 751, 328–342.
Wunderlich, M. C. (1985). Implementing the continued fraction factoring algorithm on parallel
machines. Mathematics of Computation, 44, 251–260.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 6
Elliptic Curve

In 1985, the mathematician V. Miller first introduced elliptic curves into cryptography. In 1987, the mathematician N. Koblitz further improved and perfected Miller's work, forming the famous elliptic curve public key cryptosystem. The elliptic curve public key cryptosystem, the RSA public key cryptosystem and the ElGamal public key cryptosystem based on discrete logarithms are recognized as the three major public key cryptosystems, and they occupy the most prominent position in modern cryptography. Compared with RSA cryptography, elliptic curve cryptography can provide the same or a higher level of security with a shorter key; compared with the ElGamal cryptosystem, the two are based on the same mathematical principle and are both essentially discrete logarithm cryptosystems. The ElGamal cryptosystem is based on the discrete logarithm in the multiplicative group of a finite field, while the elliptic curve cryptosystem is based on the discrete logarithm in the Mordell group of an elliptic curve over a finite field; but choosing an elliptic curve allows much more flexibility than choosing a finite field, so elliptic curve cryptosystems have attracted more attention. This chapter systematically and comprehensively introduces elliptic curve cryptography from the three aspects of basic theory, cryptographic mechanism and factorization, so that readers may better understand and master this public key mechanism.

6.1 Basic Theory

The working platform of this chapter is a field E, especially one of the four common fields E = R (the real number field), E = C (the complex field), E = Q (the rational number field) and E = F_q (the finite field of q elements). The characteristic χ(E) of a field E is the order of the multiplicative unit element e of E in the additive group; that is, χ(E) = o(e) is either a prime number or ∞. Specifically,

χ(E) = ∞, if E = C, R, Q; χ(E) = p, if E = F_q, q = p^r.

Definition 6.1 (i) Suppose E is a field with characteristic χ(E) ≠ 2, 3, and f(x) = x³ + ax + b ∈ E[x] is a cubic polynomial without multiple roots in its splitting field. An elliptic curve over E is the set of finite points (x, y) ∈ E² in the "plane," together with a point at infinity, where the finite points (x, y) satisfy

y² = x³ + ax + b, with a ∈ E, b ∈ E given.

C_E denotes the elliptic curve and "O" the point at infinity, i.e.,

C_E = {(x, y) ∈ E² | y² = x³ + ax + b} ∪ {O}. (6.1)

(ii) If χ(E) = 2, then an elliptic curve C_E over the field E of characteristic 2 is defined as

C_E = {(x, y) ∈ E² | y² + y = x³ + ax + b} ∪ {O}. (6.2)

(iii) If χ(E) = 3 and x³ + ax² + bx + c ∈ E[x] has no multiple roots in its splitting field, then an elliptic curve C_E over E is defined as

C_E = {(x, y) ∈ E² | y² = x³ + ax² + bx + c} ∪ {O}. (6.3)

Let F(x, y) ∈ E[x, y] be a bivariate polynomial; then F(x, y) = 0 defines an algebraic curve C over E. A point (x₀, y₀) ∈ C is called a nonsingular point of C if at least one of the partial derivatives ∂F/∂x and ∂F/∂y is nonzero at (x₀, y₀). If χ(E) ≠ 2, 3 and f(x) = x³ + bx + c, then all finite points of the elliptic curve F(x, y) = y² − f(x) = 0 over E are nonsingular, and the same holds in the cases χ(E) = 2 and χ(E) = 3. Therefore an elliptic curve is also called a nonsingular cubic curve.

Among the many profound arithmetic properties of elliptic curves, the Mordell group on an elliptic curve is the most beautiful and important basic structure. We first introduce the Mordell group over the familiar real number field E = R, and then extend it to finite fields.
Elliptic curve over real number field
Definition 6.2 Let E = R be the real number field and C_E an elliptic curve. For two points P and Q on C_E, that is, P ∈ C_E, Q ∈ C_E, we define addition according to the following rules:
(1) If P = O is the point at infinity, define P + P = O; that is, infinity is the unit element of addition, and the negative of P is −P = O.
(2) If P = (x, y) ∈ C_E is a finite point, define −P = (x, −y); obviously −P ∈ C_E is the mirror reflection of P in the xy-plane.
(3) If P, Q ∈ C_E are two finite points with different x-coordinates (i.e., P = (x₁, y₁), Q = (x₂, y₂), x₁ ≠ x₂), then the line through P and Q meets the elliptic curve in exactly one further point R, and we define P + Q = −R, the mirror reflection of R. If Q is the point at infinity, define P + O = P.
(4) If Q = −P, that is, P and Q have the same x-coordinate, then P + Q = O is defined to be the point at infinity.
(5) If P = Q is a finite point on C_E, then the tangent to C_E at P has exactly one further intersection R with C_E; define P + P = −R.

We have used a geometric construction to define addition on the elliptic curve C_E; that the line through two finite points with different x-coordinates, or the tangent at a finite point, meets C_E in exactly one further point requires strict mathematical proof. We formulate this as the following lemma.
Lemma 6.1 Let P = (x₁, y₁), Q = (x₂, y₂) be two finite points on the elliptic curve C_E with x₁ ≠ x₂. Then
(i) The line through P and Q meets C_E in exactly one further point R, R ≠ P, R ≠ Q, and P + Q = −R = (x₃, y₃), where

x₃ = ((y₂ − y₁)/(x₂ − x₁))² − x₁ − x₂,
y₃ = −y₁ + ((y₂ − y₁)/(x₂ − x₁))(x₁ − x₃). (6.4)

(ii) Let α = (3x₁² + a)/(2y₁) be the value of the derivative dy/dx at P (y₁ ≠ 0). The tangent at P meets C_E in exactly one further point R, R ≠ P, and P + P = −R = (x₃, y₃), where

x₃ = ((3x₁² + a)/(2y₁))² − 2x₁,
y₃ = −y₁ + ((3x₁² + a)/(2y₁))(x₁ − x₃). (6.5)

Proof In the xy-plane, let the equation of the line through P and Q be y = αx + β, where

α = (y₂ − y₁)/(x₂ − x₁), β = y₁ − αx₁.

A point (x, αx + β) of the line y = αx + β lies on the elliptic curve C_E if and only if

(αx + β)² = x³ + ax + b. (6.6)

The three roots of x³ − (αx + β)² + ax + b = 0 are the x-coordinates of the intersections, and each root produces one intersection. Since P and Q are among the intersections, there is exactly one third intersection R = (x₃, αx₃ + β). Because the three roots x₁, x₂, x₃ of equation (6.6) satisfy

x₁ + x₂ + x₃ = α²,

we obtain

x₃ = ((y₂ − y₁)/(x₂ − x₁))² − x₁ − x₂,

and the mirror reflection −R = P + Q = (x₃, y₃) has

y₃ = −(αx₃ + β) = −y₁ + ((y₂ − y₁)/(x₂ − x₁))(x₁ − x₃).

Thus (6.4) holds. If Q tends to P, the line becomes the tangent to the curve C_E at P, and now

α = dy/dx |_(x₁,y₁) = (3x₁² + a)/(2y₁).

So the tangent has exactly one further intersection R = (x₃, αx₃ + β) with C_E, R ≠ P, where

x₃ = α² − 2x₁ = ((3x₁² + a)/(2y₁))² − 2x₁,

and, reflecting as before, P + P = −R = (x₃, y₃) with y₃ = −y₁ + ((3x₁² + a)/(2y₁))(x₁ − x₃). Thus (6.5) holds, which completes the proof of the Lemma (Fig. 6.1).

Example 6.1 In the real plane, the specific curve y² = x³ − x illustrates the addition rule on an elliptic curve. The points of C_E in the left half plane (the oval component) are called torsion points of C_E, and the points of C_E in the right half plane are called free points of C_E.

Remark 6.1 In Lemma 6.1, if P = (x₁, 0), that is, y₁ = 0, then the tangent at P is vertical, and its only further intersection with C_E is defined to be the point at infinity "O".
From Definition 6.2 and Lemma 6.1, we have the following important corollary.

Fig. 6.1 Elliptic Curve



Corollary 6.1 (i) All points of the elliptic curve C_E form an Abel group under addition, in which the point at infinity "O" is the zero element. This group is called the Mordell group.
(ii) If P = (x₁, y₁) and Q = (x₂, y₂) are rational points, that is, x₁, y₁, x₂, y₂ are rational numbers, then the unique further intersection R of the line PQ with C_E is also a rational point.

Proof (i) follows directly from Definition 6.2. Conclusion (ii) follows directly from Formulas (6.4) and (6.5) of Lemma 6.1.
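Formulas (6.4) and (6.5) are purely algebraic, so they make sense over any field of characteristic ≠ 2, 3. Looking ahead to the finite field case, the following Python sketch (an illustration added here; it assumes Python 3.8+ for the modular inverse pow(x, -1, p)) implements the group law on y² = x³ + ax + b over F_p for a prime p > 3:

```python
O = None  # the point at infinity "O"

def ec_add(P, Q, a, p):
    # Addition on y^2 = x^3 + a*x + b over F_p (p > 3 prime), following
    # the rules of Definition 6.2 with the slopes of Lemma 6.1; the
    # divisions in (6.4)/(6.5) become multiplication by modular inverses.
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                             # Q = -P: rule (4)
    if P == Q:
        alpha = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope (6.5)
    else:
        alpha = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope (6.4)
    x3 = (alpha * alpha - x1 - x2) % p
    y3 = (alpha * (x1 - x3) - y1) % p
    return (x3, y3)
```

Repeated application of ec_add yields the scalar multiples kP on which the cryptosystems of Sect. 6.2 are built.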

Elliptic curves over rational fields


Let E = Q, then a, b, c in Definition 6.1 are rational numbers. Elliptic curves
over rational number fields are one of the most important research topics in modern
number theory. There are many important conclusions and famous number theory
problems related to them, such as the famous “BSD” conjecture, the ancient con-
gruence problem and so on. Mordell theorem is the most basic conclusion of elliptic
curves over rational fields. Since cryptography only cares about elliptic curves over
finite fields, here we briefly introduce some important results without proof.
Let C E be an elliptic curve in the field of rational numbers. From Corollary 6.1,
all points of C E form an Abel group G. In algebra, an Abel group is equivalent
to a module over an integer ring, so an Abel group is also called Z−module. The
Mordell group on elliptic curve C E is regarded as a Z−module G, according to the
decomposition theorem of modules on the principal ideal ring, a Z−module can be
decomposed into the direct sum of a twisted module and a free module. Therefore,
the Mordell group G on C E has

G = T or (G) ⊕ Fr ee(G).

Mordell first proved the following important result. Mordell's Theorem: The Abel group G of an elliptic curve C_E (E = Q the rational number field) is finitely generated; in other words, G is a finitely generated Z-module. Therefore the Mordell group G decomposes as

G = Tor(G) ⊕ Z(α₁, α₂, ..., α_r),

where α₁, α₂, ..., α_r is a basis of the free module Free(G) and r is the rank of the free module. The rank r is only known to be finite; how to compute it is a famous problem in number theory. The so-called BSD conjecture asserts that r can be read off from the value of the L-function of the elliptic curve, but this has not been fully proved at present.
Another problem related to elliptic curves is the ancient congruent number problem, which can be traced back to Plato's time in ancient Greece.

The congruent number problem: for a positive integer n > 1, is there a right triangle with rational side lengths whose area is exactly n?

This problem is equivalent to the elliptic curve y² = x³ − n²x having rank r > 0; at present it has not been completely solved. The Chinese mathematician Prof. Ye Tian has made substantial progress on this problem.
Elliptic curves over finite fields
Let E = F_q be the finite field of q elements, q = p^r, p a prime. Let

F(x, y) = y² − f(x), if p ≠ 2; F(x, y) = y² + y − f(x), if p = 2, (6.7)

where

f(x) = x³ + ax + b, if p ≠ 3; f(x) = x³ + ax² + bx + c, if p = 3, with a, b, c ∈ F_q. (6.8)

Then an elliptic curve C_E over F_q is defined as

C_E = {(x, y) ∈ F_q² | F(x, y) = 0} ∪ {O}, (6.9)

where "O" is the point at infinity.


Obviously the number of points of C_E is finite; let N_q = |C_E| be called the number of points of the elliptic curve over F_q. N_q ≤ 2q + 1 is a trivial estimate, because each x ∈ F_q yields at most two values of y, plus the point at infinity. The sharper estimate of N_q depends on the Riemann hypothesis for function fields in one variable proved by A. Weil, a very profound result in mathematics. A. Hasse proved the following result in the case where F(x, y) defines an elliptic curve.

Theorem 6.1 (Hasse) Let N_q be the number of points of the elliptic curve F(x, y) = 0 over F_q; then we have

|N_q − (q + 1)| ≤ 2√q.

Proof Let χ be the quadratic real character of F_q, that is,

χ(a) = 0, if a = 0; χ(a) = 1, if a = b² for some b ∈ F_q∗; χ(a) = −1, otherwise.

By definition, the number of solutions of u² = a in F_q is obviously 1 + χ(a). Hence, if N_q is the number of points of the elliptic curve F(x, y) = 0 over F_q, where F(x, y) is given by Eq. (6.7) (say with p ≠ 2), then

N_q = 1 + ∑_{x∈F_q} (1 + χ(x³ + ax + b)) = q + 1 + ∑_{x∈F_q} χ(x³ + ax + b). (6.10)

Let F_q(x) denote the rational function field over F_q; the function field in one variable defined by y² = f(x) is a quadratic extension of F_q(x). Hasse proved that the Riemann hypothesis holds for this special algebraic function field; that is, all zeros of the corresponding Riemann ξ-function lie on the line s = 1/2 + it. A special case of this result is the character sum estimate

|∑_{x∈F_q} χ(x³ + ax + b)| ≤ (d − 1)√q = 2√q, (6.11)

where d = 3 is the degree of the cubic polynomial. By (6.10),

|N_q − (q + 1)| ≤ 2√q.

We have completed the proof.

Remark 6.2 (6.11) is called the characteristic sum over a finite field, so that g(x) ∈
Fq [x] is any polynomial and χ is any nontrivial multiplication characteristic over Fq ,
according to A. Weil’s famous theorem, we have the following general characteristics
and estimates,  √
| χ (g(x))| ≤ (deg g − 1) q.
x∈Fq

Let’s briefly introduce A. Weil’s theorem. Let Fq n be an n-th extension on Fq ,


that is n = [Fq n : Fq ]. Nq n is the number of solutions of elliptic curve F(x, y) = 0
in extended field Fq n . Zeta function Z (T, C E ) on elliptic curve C E is defined as the
formal power series of T :

+∞
 1
Z (T ) = Z (T, C E ) = exp( Nq n T n ). (6.12)
n=1
n

where exp(a) = ea is an exponential function. A. Weil proves that Z (T ) is a rational


function, i.e.,
qT 2 − αT + 1
Z (T ) = . (6.13)
(1 − T )(1 − qT )

where α is an integer depends on elliptic curve C E . In fact, the above formula is


valid for general algebraic curves. Because of Nq = q + 1 − α, and α 2 − 4q < 0
(Hasse theorem). Therefore, zeta function Z (T ) has two complex roots, that is, the

two solutions α1 and α2 of qT 2 − αT + 1 = 0, and | α11 | = | α12 | = q. This is the
236 6 Elliptic Curve

Riemann hypothesis on elliptic curves. A. Weil proved it on general algebraic curves


for the first time. See Chap. 5 of Silverman (1986 of reference 6) for the specific
proof process.
From the above a. Weil theorem, take logarithms on both sides of Eq. (6.12) and
compare the coefficients on both sides of the formal power series. Let Nq n be the
number of points of the elliptic curve in Fq n , then
n
|Nq n − (q n + 1)| ≤ 2q 2 (n ≥ 1). (6.14)

The above formula can also be derived directly from Hasse theorem.
Now let’s look at a specific elliptic curve in F2 , y 2 + y = x 3 ; thus, we have a
better understanding of A. Weil’s theorem. Because F(x, y) = y 2 + y − x 3 = 0 has
three points in F2 , the zeta function on the elliptic curve,

+∞
Nn n
Z (T ) = exp( T )
n=1
n
2T 2 + 1
= .
(1 − T )(1 − 2T )
√ √
Write 2T 2 + 1 = (1 − α1 T )(1 − α2 T ), where α1 = i 2, α2 = −i 2. Take loga-
rithms on both sides of the above formula and compare the coefficients of T n on both
sides, 
2n + 1, if n is odd,
Nn = n n
2 + 1 − 2(−2) 2 , if n is even.

Where Nn represents the number of points of elliptic curve y 2 + y = x 3 in F2n .


Finally, the Mordell group of elliptic curve on Fq is a finite Abel group of order
Nq ; according to the classification theorem of finite Abel groups, this group can be
expressed as the direct sum of two cyclic groups, which will be further explained
when necessary.

6.2 Elliptic Curve Public Key Cryptosystem

An elliptic curve over a finite field Fq forms a finite Abel group G, which is similar to
Fq∗ ; therefore, the elliptic curve public key cryptosystem can be constructed by using
discrete logarithm. Compared with other public key cryptosystems based on discrete
logarithm (such as ElGamal cryptosystem), elliptic curve cryptosystem has greater
flexibility, because when a huge q is selected, the working platform of ElGamal
cryptosystem has only one multiplication group Fq∗ , but multiple elliptic curves can
be defined on Fq , so there will be multiple Mordell groups to choose, and elliptic
curve cryptosystem has greater concealment and security.
6.2 Elliptic Curve Public Key Cryptosystem 237

Before introducing elliptic curve cryptography, we first discuss the computational


complexity on two species group. The computational complexity of multiplication
over finite field Fq has been discussed in Chap. 4 Lemma 4.12, specially, α ∈ Fq ,
k is an integer, then T ime(α k ) = O(log k log3 q). In the case of elliptic curves, the
Mordell group G is an addition operation, so that P ∈ G is a point. k P means that
the points P are added k times continuously.
Lemma 6.2 Let E = Fq be a q-element finite field, C E be an elliptic curve on Fq
defined by Weierstrass equations (6.7), (6.8) and (6.9), P ∈ G, G be a Mordell group
on C E , then for any integer k,

T ime(k P) = O(log k log3 q), if k ≤ Nq ;
T ime(k P) = O(log4 q), if k > Nq .

where Nq is the number of points of curve C E and the order of Mordell group G.
Proof Let P = (x, y), y = 0, then P + P = (x  , y  ), where x  and y  are determined
by Equation (6.5), (6.5) (addition, subtraction, multiplication, division, etc.) involved
in the formula shall not exceed 20 calculations, and the bit operation times of each
calculation is O(log3 q). By the “repeated square method,” k P can be transformed
into log k steps, thus
T ime(k P) = O(log k log3 q).

If y = 0, defined by P + P = O and “repeated square method,” there is T ime(k P) =


O(log k).
If k > Nq , because Nq · P = 0, let k = s · Nq + r , 1 ≤ r ≤ Nq , thus k P = r P.
We can calculate r P. Thus

T ime(k P) = O(log Nq + log Nq log3 q) = O(log4 q).



We use Hasse’s theorem: Nq = q + 1 + O( q), there is Nq = O(q), thus

log Nq = O(log q).

Lemma 6.2 holds.


Secondly, we consider how to correspond a plaintext unit m to a point on a given
elliptic curve C E , which is a necessary premise for encryption using elliptic curve.
Unfortunately, there is no definite algorithm for polynomial bit operation, which can
correspond any huge integer m to a point on any elliptic curve. Instead, we can only
choose the probability algorithm with sufficiently low error probability to realize the
correspondence from number to point. The so-called probability algorithm does not
guarantee 100% success rate (therefore, each operation depends on your luck), but
the success probability should be large enough, that is

P{number to point correspondence} > 1 − ε, ε > 0 sufficient small.


238 6 Elliptic Curve

Next, we introduce a probabilistic algorithm to realize the correspondence from


number to point, which makes theoretical preparation for the construction of elliptic
curve cryptosystem.
Probabilistic algorithm
Treat each plaintext unit as an integer m, 0 ≤ m < M, k is an integer. Select a
finite field Fq , q = pr satisfies q > k M. We write the positive integer n from 1 to
k M as follows,

1 ≤ n ≤ k M, n = mk + j, 0 ≤ m < M, 1 ≤ j ≤ k. (6.15)

Lemma 6.3 There is a 1-1 correspondence τ between the set of integers A =


{1, 2, . . . , k M} and a subset of finite field Fq (q > k M).

Proof Because q = pr , let g(x) ∈ Fq [x] be a monic irreducible polynomial, and


deg g(x) = r − 1, from the finite extension theory of fields, Fq is isomorphic to a
quotient ring of polynomial ring F p [x] over subfield F p , that is

Fq ∼
= F p [x]/<g(x)> = {a0 + a1 x + · · · + ar −1 x r −1 |ai ∈ F p }.

Each element α ∈ Fq uniquely corresponds to a polynomial a0 + a1 x + · · · +


ar −1 x r −1 , we write
α = (ar −1 ar −2 · · · a1 a0 ) p ,

is called the p-ary representation of α.


For every m, 0 ≤ m < M, each j, 1 ≤ j ≤ k, then it uniquely corresponds
to n = mk + j, express n as a p-ary number, if the p-ary of n is expressed as
(ar −1 ar −2 · · · a1 a0 ) p , then let τ (n) = α ∈ Fq . The uniqueness represented by p-ary,
then τ is an injection.

A = {τ (n)|1 ≤ n ≤ k M} ⊂ Fq .

Therefore, we establish a 1-1 correspondence τ of A → A . The Lemma holds.

Next, for each m(0 ≤ m < M), we establish a 1-1 correspondence σ between m
and the point on elliptic curve C E . Arbitrary choice 1 ≤ j ≤ k, then n = mk + j
corresponds to an element in Fq , that is τ (n) = x j ∈ Fq . For each x j , consider the
solution of the following equation.

y 2 = f (x j ) = x 3j + ax j + b. (6.16)

If the above equation has a solution, let y1 be one of the solutions, then Pm =
(x j , y1 ) ∈ C E , we let σ (m) = Pm , the inverse mapping σ −1 (Pm ) of σ is

τ −1 (x j ) − 1
σ −1 (Pm ) = [ ]. (6.17)
k
6.2 Elliptic Curve Public Key Cryptosystem 239

where τ is the 1-1 correspondence in lemma 6.3. Because τ −1 (x j ) = mk + j, so

τ −1 (x j ) − 1 j −1
[ ] = [m + ] = m.
2 k

So σ −1 is exactly the inverse mapping of σ . From σ , a 1-1 correspondence between


each m and the point on the elliptic curve is established. σ is called a probabilistic
algorithm.
Lemma 6.4 Probability algorithm σ can successfully achieve probability ≥ 1 − 1
2k
,
that is
1
P{is generated by σ and the number m corresponds to 1-1 of the point} ≥ 1 − .
2k
Proof When m, 0 ≤ m < M given, n = mk + j, where k is any given positive inte-
ger, 1 ≤ j ≤ k. By Lemma 6.3, τ (n) = x j ∈ Fq , then the probability that f (x j ) =
x 3j + ax j + b is a square number is 21 , in other words, the probability that equation
(6.16) has a solution in Fq is 21 ; therefore, the probability of no solution is also 21 .
We randomly and independently select j, 1 ≤ j ≤ k, the error probability of each
j (no solution in Eq. (6.16)) is 21 ; therefore, the error probability of k j is 21k . Once
Equation (6.16) has a solution, then Pm = (x j , y) ∈ C E , we can establish the 1-1
correspondence σ between m and points on C E , σ (m) = Pm . Thus

1
P{σ Successfully implemented} ≥ 1 − .
2k
We complete the proof of lemma.
Remark 6.3 f (x j ) = x 3j + ax j + b is a square number, that is, the probability that
Equation (6.16) has a solution is exactly Nq /2q, where Nq is the number of points
of C E . By Hasse’s theorem, Nq /2q is very close to 21 .
Definition 6.3 Let C E be an elliptic curve over a finite field Fq and B ∈ C E be a
point. For any point P on C E , if there is an integer x, such that x B = P, x is called
the discrete logarithm of P to base B.
With the above preparation, we can establish elliptic curve public key cryptosystem.
Diffie–Hellman key conversion principle
Symmetric cryptosystem, also known as classical cryptosystem or traditional
cryptosystem, is the mainstream cryptosystem before the advent of public key cryp-
tosystem. It has high efficiency because its encryption and decryption share the same
algorithm (such as DES, the data encryption standard algorithm launched by the
American Bureau of standards in 1977). When Diffie and Hellman proposed asym-
metric cryptosystem, they pointed out that symmetric cryptosystem and asymmetric
cryptosystem are not completely separated. The two cryptosystems are interrelated
and can even be used together. Diffie–Hellman key conversion principle is based on
the following mathematical principles.
240 6 Elliptic Curve

Lemma 6.5 Let p be a prime number, q = pr , Fq is a q-element finite field, Z pr is


the residual class ring mod pr , Zrp is an n-dimensional vector space on F p , then Fq ,
Z pr , Zrp have 1-1 correspondence with each other.
Proof Fq is an r -th finite extension on F p , so the additive group Fq+ of Fq is isomor-
phic with Frp , that is Fq+ ∼
= Frp , therefore, there is a 1-1 correspondence between Fq
and F p . Each a = (a0 , a1 , . . . , ar −1 ) ∈ Frp , define
r

σ (a) = a0 + a1 p + · · · + ar −1 pr −1 ∈ Zrp .

Then σ is a surjection and a injection of Frp → Z pr , so σ is a 1-1 correspondence


of Frp → Z pr . Since there is a 1-1 correspondence between Fq and Frp and a 1-1
correspondence between Frp and Z pr , there is also a 1-1 correspondence between Fq
and Z pr , the Lemma holds.
From the above lemma, we have the following conclusions.
Lemma 6.6 Let N be a positive integer. Z N is a residue class ring mod N . Then for
any prime p, there is a finite field F pr such that there is an injection σ of Z N → F pr ,
this injection is also called embedded mapping.
Proof When N given, for any prime p, express N as a p-ary number, then exists a
positive integer r ≥ 1, such that pr −1 ≤ N < pr . We write

Z N = {0, 1, 2, . . . , N − 1} ⊂ {0, 1, 2, . . . , N − 1, N , . . . , pr − 1} = Z pr .

σ
That is, Z N is regarded as a subset of Z pr . Let Z pr −→ F pr be 1-1 correspond, so σ
gives that Z N → F pr is an injection. The Lemma holds.
From the above conclusions, we can establish Diffie–Hellman’s key conversion
principle. Because symmetric cryptographic keys are related to the numbers of Z N ,
each number in Z N can be embedded into a finite field Fq by Lemma 6.6. Therefore,
the discrete logarithm on Fq can encrypt each embedded number asymmetrically, so
that the two cryptosystems can be combined with each other.
Taking the affine cryptosystem introduced in Chap. 4 as an example, A is a k × k-
order reversible square matrix in Z N , b = (b1 , b2 , . . . , bk ) ∈ ZkN is a given vector,
affine transformation f = (A, b) gives the encryption algorithm of each plaintext
unit m = m 1 m 2 · · · m k ∈ ZkN .
⎛⎞ ⎛ ⎞
m1 b1
⎜ .. ⎟ ⎜ .. ⎟
f (m) = c = A ⎝ . ⎠ + ⎝ . ⎠ .
mk bk

Let A = (ai j )k×k , each ai j ∈ Z N . By Lemma 6.6, we can embed ai j into a finite field
Fq . ai j is encrypted again by using the discrete logarithm algorithm on Fq , so that
the two cryptosystems can be effectively combined.
6.2 Elliptic Curve Public Key Cryptosystem 241

In the case of elliptic curve, we introduce the workflow of Diffie–Hellman elliptic


curve cryptography. First, the user selects a public finite field Fq , and an elliptic curve
C E on Fq , randomly select a point P ∈ C E , let P = (x, y), then x ∈ Fq . By Lemma
6.5, x corresponds to an r -dimensional vector (a0 , a1 , . . . , ar −1 ) in Frp space (where
q = pr ), consider (a0 , a1 , . . . , ar −1 ) as a p-ary number, that is

(a0 , a1 , . . . , ar −1 ) → a0 + a1 p + · · · + ar −1 pr −1 .

Then (a0 , a1 , . . . , ar −1 ) can be used as the key of other cryptosystems, especially


symmetric cryptosystems.
Secondly, the user selects a common point B ∈ C E , like a finite field, as the basis
of the discrete logarithm on the Mordell group. The difference from finite field is
that the Mordell group on elliptic curve is not a cyclic group, so point B is not the
generator of Mordell group. However, we require order o(B) of B to be as large as
possible (o(B)|Nq ). When point B is selected, the working platform of elliptic curve
cryptography is actually the subgroup < B > generated by B.
In order to generate the key, each user can randomly select an integer a, whose
order is roughly the same as Nq , as their own user’s private key, a should be strictly
confidential. Calculate a B = A ∈ C E . Point A is the public key of each user. Now
each user has its own public key (A, B) and private key (a, B).
Massey–Omura elliptic curve cryptography.
In order to encrypt and send a plaintext unit m(0 ≤ m < M), m is corresponding
to the only point Pm ∈ C E on elliptic curve C E by using the probability algorithm
introduced earlier. Let N = Nq = |C E |; that is, the order of Mordell group is known.
Each user randomly selects an integer e to satisfy

1 ≤ e ≤ N , and (e, N ) = 1.

d = e−1 mod N is calculated by Euclidean rolling division method, that is

de ≡ 1(mod N ), and 1 ≤ d ≤ N .

Suppose user A wants to encrypt and send plaintext message Pm to user B, so that
(e A , d A ) and (e B , d B ) are the respective private keys of A and B. First, A sends a
message e A Pm to B, and then B returns the message e B e A Pm to A, A can calculate
the message by using the private key d A . Because N Pm = 0, d A e A ≡ 1(mod N ), so

d A e B e A Pm = e B Pm .

Finally, user A sends the calculation result e B Pm to B, and user B can read the original
real message Pm of user A by using the private key d B , because d B e B ≡ 1(mod N ),
so
d B e B Pm = Pm .
242 6 Elliptic Curve

It should be noted that even if user B receives the message e A Pm sent by A for the
first time, e A Pm is given to user B as a point Q = e A Pm on the elliptic curve. If B
does not calculate the discrete logarithm, e A and d A are not known. Although the
last user B already knows the plaintext Pm , the calculation of the discrete logarithm
of Q under base Pm is very complex. Similarly, when user A receives a reply from
user B and calculates e B Pm , he cannot know B’s private key (e B , d B ).
ElGamal elliptic curve cryptography
ElGamal cryptosystem is another elliptic curve cryptosystem completely different
from Massey–Omura cryptosystem. In this system, the order N of Mordell group of
elliptic curve does not need to be known. All users jointly select a fixed finite field
Fq , an elliptic curve C E on Fq and a fixed point B ∈ C E on C E as the basis of discrete
logarithm. Each user randomly selects an integer a(0 ≤ a < Nq ) as the private key,
calculates Q = a B ∈ C E and discloses it. Its workflow is as follows:
If user A wants to encrypt and send a plaintext unit Pm to user B, the public key
of A is Q A = a A · B, the private key is a A , the public key of B is Q B = a B · B and
f
the private key is a B . The encryption algorithm of A −→ B is

f (m) = f (Pm ) = (k B, Pm + k Q B ) = c. (6.18)

The decryption algorithm is that user B multiplies the first number with private key
a B and then subtracts the second number. That is,

f −1 (c) = Pm + k Q B − a B (k B). (6.19)

Because Q B = a B · B, there is

f −1 (c) = Pm + ka B · B − ka B · B = Pm .

Where k is an integer randomly selected by user A. This integer k does not appear
in cryptosystemtext c and is called a layer of “ mask” added by user A to protect
plaintext Pm . In fact, the cryptosystemtext c = (A1 , A2 ) received by user B is two
points on elliptic curve C E , where

A1 = k B, A2 = Pm + k Q B = Pm + k(a B · B).

Even if the third user knows the private key a B of user B (assuming that the private
key of user B is not secure), decryption with A2 − a B · B cannot obtain plaintext
Pm , because

A2 − a B · B = Pm + k Q B − a B B = Pm + k(a B · B) − a B · B = Pm ,

if k = 1.
6.2 Elliptic Curve Public Key Cryptosystem 243

The two elliptic curve cryptosystems introduced above are based on the selected
elliptic curve C E and a point B on C E as the basis of discrete logarithm. How to
randomly select C E and B needs further research.

Lemma 6.7 Let x 3 + ax + b ∈ Fq [x] be a cubic polynomial, then x 3 + ax + b = 0


have no multiple roots in the split domain if and only if the discriminant 4a 3 + 27b2 =
0.

Proof This conclusion can be deduced directly from the root formula of cubic alge-
braic equation.

In order to randomly select an elliptic curve on Fq , C E is determined by equation


y 2 = x 3 + ax + b at χ (Fq ) = 2, 3. Randomly select three elements (x0 , y0 , a) in
Fq , let
b = y02 − (x03 + ax0 ).

Check whether f (x) = x 3 + ax + b has multiple roots. From Lemma 6.7, just check
whether discriminant 4a 3 + 27b2 is 0. If f (x) has no multiple roots, then select the
elliptic curve y 2 = x 3 + ax + b. Where (x0 , y0 ) ∈ C E is a point on an elliptic curve.
So let B = (x0 , y0 ) is the base of discrete logarithm. Similarly, for q = 2r or q = 3r ,
we can also randomly draw an elliptic curve C E and determine the basis B ∈ C E of
the discrete logarithm at the same time.
It should be noted that at present, no algorithm can calculate the number of points
Nq of any elliptic curve. Some special algorithms, such as schoof algorithm, are quite
complex and lengthy in practical application, although the computational complexity
is polynomial.
Now we introduce the second method of selecting elliptic curves, called mod p
method. An elliptic curve C E , if E is a number field, such as E = R, Q, C, C E
is called a global curve. We use the mod p method to convert a global curve into a
“local” curve. Firstly, a point B ∈ C E on a global curve C E and C E is selected, where
B is the group element of Mordell group, its addition order is ∞, where E = Q is
the rational number field.

C E : y 2 = x 3 + ax + b, a, b ∈ Q.

Let p be a prime number and coprime with the integers in the denominators of a and
b, then we obtain an elliptic curve on F p ,

C E mod p : y 2 ≡ x 3 + ax + b(mod p), a, b ∈ F p .

and a point B mod p on C E mod p, when localizing an elliptic curve, the choice of
prime p only needs to satisfy

p  aand b ’s denominator, and 4a 3 + 27b2 ≡ 0(mod p).

In fact, we can ask further


244 6 Elliptic Curve

N p = |C E mod p| = prime. (6.20)

In this way, the Mordell group of C E mod p is a cyclic group, and any finite point
of C E mod p will be the generator of the group. At present, there is no deterministic
algorithm for selecting the prime number p satisfying Formula (6.20), and it is gen-
erally speculated that a probabilistic algorithm with success probability ≥ O( log1 p )
exists.

6.3 Elliptic Curve Factorization

In 1986, mathematician H.W. Lenstra used elliptic curve to find a new method of
factor decomposition. Lenstra’s method has greater advantages than the known old
algorithms in many aspects, which is also one of the main reasons why elliptic curve
has attracted more and more attention in the field of cryptography, We first introduce
a classical factorization method called Pollard ( p − 1) algorithm.
( p − 1) algorithm
Suppose n is a compound number, and p is a prime factor of n; of course, p is
unknown and needs to be further determined. If p − 1 happens to have some small
prime factors, or all prime factors of p − 1 are not too large, the essence of ( p − 1)
method is to find the prime factor p with this property of n. ( p − 1) method can be
completed in the following steps:
1. Let B be a positive integer. Select a positive integer k so that k is a multiple of
most positive integers smaller than B, for example, k = B!, or k can be the least
common multiple of all positive integers smaller than B.
2. Select a positive integer a to satisfy 2 ≤ a ≤ n − 2, (a, n) = 1, such as a = 2,
or a = 3, and any randomly selected positive integer.
3. Using the “repeated square method” to calculate the minimum nonnegative resid-
ual a k mod n of a k under mod n.
4. The maximum common divisor d = (a k − 1, n) of a k − 1 and n is calculated
by Euclidean rolling division method.
5. If d = 1 or d = n, that is, if d is the trivial factor of n, re select a, and then repeat
steps 1–4 above.
In order to explain the working principle of ( p − 1) algorithm, we further assume
that k is a multiple of all positive integers less than B, and p|n,

p−1= piαi , where ∀ piαi ≤ B. (6.21)

There is p − 1|k. By Fermat congruence theorem,

a p−1 ≡ 1(mod p), =⇒ a k ≡ 1(mod p).

So p|d, where d = (a k − 1, n).


6.3 Elliptic Curve Factorization 245

Definition 6.4 Suppose n is a compound number, p|n. B is a sufficiently large


positive integer arbitrarily selected, and p is called B−smooth prime, if Eq. (6.21)
holds. That is, p − 1 can be decomposed into the product of prime powers less than
B.
Lemma 6.8 Suppose n is a compound number and B is a positive integer. If n has a
B−smoothing prime factor p, select k and a according to the algorithm steps 1 − 4,
then we have d = (a k − 1, n) > 1, so we have factor decomposition n = d · dn .
Proof If p is a smoothing prime factor of n, then we have p|(a k − 1, n), thus d > 1.
The Lemma holds.
In the above algorithm, if d = (a k − 1, n) = n. That is n|a k − 1, if the algorithm
fails, we must reselect a and carry out a new round of testing.
Example 6.2 Factorization of n = 540143, if ( p − 1) method is used, then choose
B = 8, k = 840, is the least common multiple of 1, 2, . . . , 8, let a = 2, calculate the
minimum nonnegative residue of 2840 under mod n,

2840 ≡ 53047(mod 540143).

Calculate (2840 − 1, n),

d = (2840 − 1, n) = (53046, 540143) = 421.

So we have factorization 540143 = 421 × 1283.


Pollard’s ( p − 1) method is essentially the multiplication group of Z p , the order
of Z∗p cannot be divided by a huge prime number; otherwise, this method will not
work. Lenstra can overcome this disadvantage by using elliptic curves for factor
decomposition, because there are many elliptic curves to choose from, we can always
hope that the order of Mordell group on an elliptic curve is not divided by a huge
prime number. Next, we introduce Lenstra’s method in detail. First, we discuss the
elliptic curve mod n.
The following general assumption is that n is an odd number and a compound
number, p|n ( p is unknown) and p > 3. Let m be a positive integer, x1 , x2 be two
rational numbers, and the denominators of x1 and x2 are mutually prime with m, so
that x1 − x2 = dc is a reduced fraction, then define

x1 ≡ x2 (mod m), if m|c. (6.22)

Lemma 6.9 Suppose x1 ∈ Q is a rational number, if its denominator and m are


mutually prime, there is a unique nonnegative integer r , such that x1 ≡ r (mod m).
r is called the nonnegative residue of x1 under mod m, denote as r = x1 mod m.
Proof Write x1 = ab , where (a, m) = 1, x1 − x = −ax+b
a
, because the congruence
equation −ax + b ≡ 0(mod m) has a unique solution r , 0 ≤ r < m. So there is a
unique r such that x1 ≡ r (mod m). The Lemma holds.
246 6 Elliptic Curve

In order to randomly generate an elliptic curve C E over the rational number field
Q, we randomly select three integers a, x0 , y0 ∈ Z, let b = y02 − x03 − ax0 to satisfy

= 4a 2 + 27b2 = 0, and ( , n) = 1. (6.23)

We get an elliptic curve C E : y 2 = x 3 + ax + b, where (x0 , y0 ) ∈ C E . Because


a, b ∈ Z, = 4a 2 + 27b2 and n are coprime, then for all prime p, p|n, =⇒ ≡
0(mod p). Therefore, as a cubic algebraic equation over a finite field F p , x 3 + ax + b
has no multiple roots, so we obtain a “local” elliptic curve C E mod p, where

C E mod p : y 2 ≡ x 3 + ax + b(mod p). (6.24)

And a point (x0 mod p, y0 mod p) ∈ C E mod p on C E mod p, let’s write this point
on C E mod p with P, that is

P = (x0 mod p, y0 mod p) ∈ C E mod p.

Next, we want to calculate k P, like the “continuous square method” of multipli-


cation, and there is a similar continuous doubling method for addition.
Lemma 6.10 When k is a huge integer, the computational complexity of k P is

T ime(k P) = log k · T ime(P).

Proof k is expressed as a binary integer, i.e.,

k = a0 + a1 2 + a2 22 + · · · + am−1 2m−1 , ∀ ai = 0 or 1.

We can double continuously, that is, 2 j P + 2 j P = 2 · 2 j P(0 ≤ j ≤ m − 2), thus


obtain k P, m is the binary digit of k, m = O(log k), there is

T ime(k P) = log k · T ime(P).

The Lemma holds.


Theorem 6.2 Let C E be an elliptic curve over the rational field Q, define the equa-
tion as y 2 = x 3 + ax + b, where a, b ∈ Z, and (4a 3 + 27b2 , n) = 1. Let P1 and P2
be two points on C E , and their denominators are coprime with n, and P1 = −P2 ,
P1 + P2 ∈ C E . Let P1 + P2 = (x, y), then the necessary and sufficient condition for
the denominator of x and y to be mutually prime with n is that there is no prime
factor p|n of n, P1 mod p and P2 mod p are two points on the local curve C E mod p,

P1 mod p + P2 mod p = 0.

Proof Let P1 = (x1 , y1 ), P2 = (x2 , y2 ) is the two points on C E . P1 + P2 = (x, y).


If the denominators of x and y are coprime with n, we have to prove
6.3 Elliptic Curve Factorization 247

P1 mod p + P2 mod p = 0, ∀ p|n. (6.25)

If x1 ≡ x2 (mod p), it is obvious that Formula (6.4) is true from Formula (6.25).
Might as well make x1 ≡ x2 (mod p). If P1 = P2 , now x1 = x2 , y1 = y2 , we only
need p  2y1 . If p|2y1 , because the coordinates of 2P1 = (x, y) are determined by
equation (6.5),  3x12 +α 2
x = ( 2y 1
) − 2x1 ;
3x12 +α
y = y1 − ( 2y1
)(x1 − x).

3x 2 +a
Where α = 2y1 1 . By p|2y1 , =⇒ 3x12 + α ≡ 0(mod p). Because n is an odd number,
so p|y1 , we have 
x13 + ax1 + b ≡ 0(mod p);
3x12 + a ≡ 0(mod p).

That is, x1 is the root of f (x) = x 3 + ax + b and derivative f  (x) = 3x 2 + a(mod p).
This is contradictory to (4a 3 + 27b2 , n) = 1. So you might as well let P1 = P2 , now
x1 ≡ x2 (mod p), x1 = x2 (because P1 = −P2 ), we can write

x2 = x1 + t pr , r ≥ 1.

The numerator and denominator of t and p are mutually prime, which can be deduced
from Formula (6.4),
y2 = y1 + spr .

On the other hand, by y22 = x23 + ax2 + b, there is

y22 = (x1 + t pr )3 + a(x1 + t pr ) + b


≡ x13 + ax1 + b + t pr (3x12 + a)(mod p) (6.26)
≡ y12 + t pr (3x12 + a)(mod p).

But x1 ≡ x2 (mod p), y1 ≡ y2 (mod p), there is

P1 mod p + P2 mod p = 2P1 .

The above formula is infinite if and only if y1 ≡ y2 ≡ 0(mod p). If y1 ≡ y2 ≡


0(mod p), then y22 − y12 = (y2 − y1 )(y2 + y1 ) will be divided by pr +1 . Therefore,
Equation (6.26) contains 3x12 + a ≡ 0(mod p). It’s impossible. Because x 3 + ax +
b(mod p) has no multiple roots, x1 cannot be the roots of x 3 + ax + b and derivative
3x 2 under mod p. This proves that Formula (6.25) holds under the assumption.
Conversely, if Eq. (6.25) holds, we prove that the denominator of P1 + P2 and
n are coprime. Fixed p|n, if x1 ≡ x2 (mod p), from equation (6.4), the denominator
of P1 + P2 and p are coprime. Might as well make x1 ≡ x2 (mod p), then y2 ≡
248 6 Elliptic Curve

±y1 (mod p). Because P1 mod p + P2 mod p = 0, we have y2 ≡ y1 ≡ 0(mod p).


First assume P2 = P1 , then Equation (6.5) and the fact of y1 ≡ 0(mod p) prove
that the denominator of P1 + P2 = 2P1 and p is coprime. Finally, let P2 = P1 , we
write x2 = x1 + t pr , (t, p) = 1, using the congruence of Formula (6.26), there are

y22 − y12
≡ 3x12 + a(mod p).
x2 − x1

Because p  y2 + y1 ≡ 2y1 (mod p), so the denominator of

y22 − y12 y2 − y1
=
(y2 + y1 )(x2 − x1 ) x2 − x1

cannot be divided by p, by (6.4), the denominator of P1 + P2 cannot be divided by


p. Since p|n is arbitrary, we complete the proof of the whole theorem.

Lenstra algorithm.
Let n be an odd compound number, we hope to find a nontrivial factor d of n, d|n,
1 < d < n, so there is factorization n = d · dn . Previously, we have introduced the
random selection of an elliptic curve C E on rational number field Q and a point P on
C E . Lenstra’s algorithm hopes to factorize n by (C E , P). There is no doubt that the
Lenstra algorithm to be explained below is also a probability algorithm. If (C E , P)
cannot be factorized successfully, as long as the probability of failure is p < 1,
select another elliptic curve and a point above. If this continues, after randomly and
independently selecting n elliptic curves, the probability of successful factorization
of n,
n
P{n = d · } ≥ 1 − p n ( p < 1).
d
When n is sufficiently large, the success probability of Lenstra algorithm can be
infinitely close to 1. Therefore, the so-called Lenstra algorithm can be simply sum-
marized as an algorithm that factorizes n by using any rational elliptic curve (C E , P),
and its failure probability is p < 1.
Let (C E , P) be a given rational elliptic curve, and B and C be the positive upper
bound of selection. Let k be divided by some small prime powers, to be exact,

k= l αl , (6.27)
1<l≤B

where αl is the largest index satisfying l αl ≤ C. Thus αl = [ log C


log l
].
Next, we calculate k P(mod n), by (6.4) and (6.5), if x2 − x1 and 2y1 have a rational
number whose denominator and n are not prime, for example d = (x2 − x1 , n), 1 <
d < n; Then we have factorization n = d · dn . If d = n, then re select point P on
rational elliptic curves C E and C E . By Theorem 6.2, d > 1 appears in these rational
numbers x2 − x1 and y1 if and only if there is a k1 , such that
6.3 Elliptic Curve Factorization 249

k1 · (P mod p) = 0, ∀ p|n.

From the selection of equation k in (6.27), there is a maximum probability k1 |k, thus

k · (P mod p) = 0, ∀ p|n.

Therefore, in Lenstra algorithm, by calculating the rational point k P, there is a great


probability that there is a certain p, p|n such that

k(P mod p) = 0, p|n, is a prime number. (6.28)

By Theorem 6.2, let P = (x1 , y1 ), (k − 1)P = (x2 , y2 ), thus d = (x2 − x1 , n) > 1


or (2y1 , n) = d  > 1, we obtain the nontrivial factorization of n.
From the above Lenstra algorithm, the key problem is to calculate k · P. Using
the continuous doubling method given in Lemma 6.10, we only need to calculate
2P,
 2(2P), 2(4P), . . . , 2α2 P, 3(2α2 P), 3(3 · 2α2 P) · · · 3α3 2α2 P, this continues until
αl
( 1<l≤B l )P, i.e., k P.
For the probability estimation and computational complexity of Lenstra algorithm,
see 1986 of reference 6.
Exercise 6

1. Let C E ={(x, y) ∈ C | y 2 = x 3 + ax + b, a, b ∈ R} is a complex elliptic curve,


then C E R2 is a subgroup of C E , determine all subgroups of C E whose coor-
dinates are real numbers.
2. The points of order n on complex elliptic curve and real elliptic curve are deter-
mined.
3. Take an example of a rational elliptic curve C E , there are exactly two points on
C E with order 2. Another example is that there are exactly four points on C E
with order 2.
4. Let C E is a real elliptic curve, P ∈ C E is a finite point, determine the geometric
equivalence condition of o(P) = 2, o(P) = 3, o(P) = 4.
5. Calculate the order of points on the following rational elliptic curves:
(i) P = (0, 16), C E : y 2 = x 3 + 256;
(ii) P = ( 21 , 21 ), C E : y 2 = x 3 + x4 ;
(iii) P = (3, 8), C E : y 2 = x 3 − 43x + 16;
(iv) P = (0, 0), C E : y 2 + y = x 3 − x 2 .
6. Proved that the following elliptic curve has exactly q + 1 points in Fq :
(a) y 2 = x 3 − x, when q ≡ 3(mod 4);
(b) y 2 = x 3 − 1, when q ≡ 2(mod 3), q is odd;
(c) y 2 + y = x 3 , when q ≡ 2(mod 3).
7. Let q = 2r , the elliptic curve C E on Fq be: y 2 + y = x 3 ; P = (x, y) ∈ C E ,
calculate 2P and −P. If q = 16, prove that every point on C E has order 3.
8. Please give a probabilistic algorithm to find a nonsquare number in the finite
field Fq .
250 6 Elliptic Curve

9. The deterministic algorithm can map the embedding of plaintext units to any
Fq − elliptic curve. Please give the specific algorithm process for the following
elliptic curves:
(1) C E : y 2 = x 3 − x, when q ≡ 3(mod 4),
(2) C E : y 2 + y = x 3 , when q ≡ 2(mod 3).
10. Let C E be an elliptic curve on the finite field F p , and Nr represents the number
of midpoint of C E in the finite field F pr , then
(i) If p > 3, when r > 1, Nr is not prime.
(ii) When p = 2, 3, a counterexample is given to show that Nr is a prime number.
11. Take an example of an elliptic curve C E , which has only one point on F4 , the
infinity point. Take Nr as the number of points of C E on F4r , then Nr is the
square of Mersenne prime 2r − 1.
12. Decompose n = 53467 at k = 840, a = 2 using Pollard’s ( p − 1) method.
k
13. Let n k = 22 + 1 be Fermat number, the following is Pepin’s method to detect
whether n k is a prime number:
2k−1
(i) n k is a prime, if and only if there is an integer a, a 2 ≡ −1(mod n k ).
(ii) If n k is a prime, then a ∈ Z∗n k over 50% has the congruence property of (i).
(iii) When k > 1, we can always choose a = 3, 5, or a = 7.

References

Fulton, W. (1969). Algebraic curves. Benjamin.


Gupta, R., & Murty, M. R. (1986). Primitive points on elliptic curves. Composition Mathematics ,
58, 13–44.
Koblitz, N. (1984). Introduction to elliptic curves and modular forms. Springer.
Koblitz, N. (1987). Elliptic curves cryptosystems, mathematics of computation (Vol. 48).
Koblitz, N. Primality of the number of points on an elliptic curve over finite field.
Koblitz, N. (1982). Why study equations over Finite Fields. Mathematics Magazine, 55, 144–149.
Lang, S. (1978). Elliptic curves: diophantine analysis. Springer.
Lenstra Jr, H. W. (1986). Elliptic curves and number-theoretic algorithms. Report 86-19, Mathe-
matics Institute University of Van Amsterdam.
Lenstra Jr, H. W. (1986). Factoring integers with elliptic curves. Report 86-18,Mathematic Institute
University of Van Amsterdam.
Miller, V. (1985). Use of elliptic curves in cryptography. Abstracts for Crypto 85.
Odlyzko, A. M. (1985). Discrete logarithms in finite fields and their cryptographic significance. In
Advance in cryptography, Proceeds of Eurocrypt (Vol. 84, pp. 224–314). Springer.
Pollard, J. M. (1974). Theorems on factorization and primality testing. In Proceedings Cambridge
Phil Soc, 76, 521–528.
Schoof, H. (1985). Elliptic curves over finite fields and the computation of square roots mod p.
Mathematics of Computation, 44, 483–494.
Silverman, J. (1986). The arithmetic of elliptic curves. Springer.
References 251

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 7
Lattice-Based Cryptography

7.1 Geometry of Numbers

Let Rn be an n-dimensional Euclidean space and x = (x1 , x2 , . . . , xn ) ∈ Rn be an


n-dimensional vector, x can be a row vector or a column vector, depending on the situ-
ation. If x ∈ Zn , then x is called a integral point. Rm×n is all m × n-dimensional matri-
ces on R. x = (x1 , x2 , . . . , xn ) ∈ Rn , y = (y1 , y2 , . . . , yn ) ∈ Rn , define the inner
product of x and y as
 n
x, y = xi yi . (7.1)
i=1

The length |x| of vector x is defined as

 
n
|x| = x, x = xi2 . (7.2)
i=1

λ ∈ R, then λ · x is defined as

λx = (λx1 , λx2 , . . . , λxn ). (7.3)

If the inner product x, y = 0 of two vectors x and y, x and y are said to be
orthogonal, denote as x⊥y.

Lemma 7.1 Let x, y ∈ Rn , λ ∈ R is any real number, then


(i) |x| ≥ 0, |x| = 0 if and only if x = 0 is a zero vector;
(ii) |λx| = |λ||x|, ∀ x ∈ Rn , λ ∈ R;
(iii) (Trigonometric inequality) |x + y| ≤ |x| + |y|, and |x − y| ≥ ||x| − |y||;

© The Author(s) 2022 253


Z. Zheng, Modern Cryptography Volume 1, Financial Mathematics and Fintech,
https://doi.org/10.1007/978-981-19-0920-7_7
254 7 Lattice-Based Cryptography

(iv) (Pythagorean theorem) If and only if x⊥y, we have

|x ± y|2 = |x|2 + |y|2 .

Proof (i) and (ii) can be derived directly from the definition. To prove (iii), let
x = (x1 , x2 , . . . , xn ), y = (y1 , y2 , . . . , yn ) ∈ Rn , by Hölder inequality:
 n   n  n  21
   
 
 xi yi  ≤ 2
xi 2
yi .
 
i=1 i=1 i=1

So there is


n 
n 
n 
n
|x + y|2 = (xi + yi )2 = xi2 + 2 xi yi + yi2
i=1 i=1 i=1 i=1
 n 

n    n
 
≤ xi + 2 
2
xi yi  + yi2
 
i=1 i=1 i=1
⎛ ⎞2
 n  n
≤⎝ xi2 + yi2 ⎠ = (|x| + |y|)2 ,
i=1 i=1

so (iii) holds. Then, by the definition of inner product,

x ± y, x ± y = x, x ± 2x, y + y, y,

if x⊥y, then
|x ± y|2 = |x|2 + |y|2 .

Conversely, if x is not orthogonal to y, then x, y = 0, thus

|x ± y|2 = |x|2 + |y|2 .

Lemma 7.1 holds.

From Pythagorean theorem, for orthogonal vector x⊥y, we have the following
conclusion,
|x + y| = |x − y|, if x⊥y. (7.4)

Definition 7.1 Let R ⊂ Rn be a subset, 0 ∈ R, R is called a symmetric convex


body of Rn , if
(i) x ∈ R, ⇒ −x ∈ R (Symmetry);
(ii) Let x, y ∈ R, λ ≥ 0, μ ≥ 0, and λ + μ = 1, then λx + μy ∈ R (Convexity).
7.1 Geometry of Numbers 255

The following example is a famous example of a symmetric convex body


defined by a set of linear inequalities. Let A ∈ Rm×n be an m × n-order matrix,
c = (c1 , c2 , . . . , cn ) ∈ Rn , and ∀ ci ≥ 0, R(A, c) is defined as the set of solu-
tions of x = (x1 , x2 , . . . , xn ) ∈ Rn defined by the following m linear inequalities,
let A = (ai j )m×n ,  
 
 n 
 a x 
i j j  ≤ ci , 1 ≤ i ≤ m. (7.5)

 j=1 

We have

Lemma 7.2 For any A ∈ Rm×n , and any positive vector c = (c1 , c2 , . . . , cn ) ∈ Rn ,
then R(A, c) is a symmetric convex body in Rn .

Proof Obviously zero vector x = (0, 0, . . . , 0) ∈ R(A, c), and if x ∈ R(A, c) ⇒


−x ∈ R(A, c). So we only prove the convexity of R(A, c). Suppose x, y ∈ R(A, c),
let
z = λx + μy, λ > 0, μ > 0, λ + μ = 1.

Then for any 1 ≤ i ≤ m, we have

|ai1 z 1 + ai2 z 2 + · · · + ain z n |


≤ λ|ai1 x1 + ai2 x2 + · · · + ain xn | + μ|ai1 y1 + ai2 y2 + · · · + ain yn |
≤ λci + μci = ci .

So there is z = λx + μy ∈ R(A, c). Thus, R(A, c) is a symmetrical convex body.


Lemma 7.2 holds.

Lemma 7.3 Let R ⊂ Rn be a symmetrical convex body, x ∈ R, then when |λ| ≤ 1,


we have λx ∈ R.

Proof By convexity, let

1 1
ρ= (1 + λ), σ = (1 − λ).
2 2
Then ρ ≥ 0, σ ≥ 0, and ρ + σ = 1. So there is

ρx + σ (−x) = λx ∈ R.

The Lemma holds.

Lemma 7.4 If x, y ∈ R, then λx + μy ∈ R, where λ, μ are real numbers, and


satisfies |λ| + |μ| ≤ 1.

Proof Let η1 be the sign of λ and η2 be the sign of μ, then by Lemma 7.3,
256 7 Lattice-Based Cryptography

x = η1 (|λ| + |μ|)x ∈ R,
y = η2 (|λ| + |μ|)y ∈ R.

|λ| |μ|
Let ρ = |λ|+|μ|
, σ = |λ|+|μ|
, then ρ + σ = 1. By definition, we have

λx + μy = ρx + σ y ∈ R,

thus the Lemma holds. And this result is not difficult to be extended to the case of n
variables.

Lemma 7.5 (Blichfeldt) Let R ⊂ Rn be any region in Rn and V be the volume of


R. If V > 1, then there are two different vectors x ∈ R, x ∈ R so that x − x is an
integral point (thus a nonzero integral point).

Proof For ∀x = (x1 , x2 , . . . , xn ) ∈ Rn , we define

[[x]] = ([x1 ], [x2 ], . . . , [xn ]) ∈ Zn (7.6)

and
[x] = (δ1 , δ2 , . . . , δn ) ∈ Zn , (7.7)

where [xi ] is the square bracket function of xi and δi is the nearest integer to xi .
For each integral point u ∈ Zn , define

Ru = {x ∈ R|[[x]] = u}

and
Du = {x − u|x ∈ Ru }.

Because Ru 1 ∩ Ru 2 = ∅, if u 1 = u 2 . Thus by R = u Ru ,

⇒ V = Vol(R) = Vol(Ru )
u

= Vu > 1,
u

where Vu = Vol(Ru ). Thus Vu = Vol(Du ). If Du is disjoint, then


 
 
Vu = Vol Du ⊂ [0, 1) × · · · × [0, 1).
u u

There is 
Vu ≤ 1,
u
7.1 Geometry of Numbers 257

so there is a contradiction. Therefore, there must be two different integral points


u and u (u = u ) ⇒ Du ∩ Du = 0, that is x − u = x − u ⇒ x − x = u − u ∈
Zn . The Lemma holds.

Lemma 7.6 (Minkowski) Let R be a symmetric convex body, and the volume of R

V = Vol(R) > 2n ,

then R contains at least one nonzero integer point.

Proof Let  
1 1
R= x|x ∈ R .
2 2

Thus  
1 1
Vol R = V > 1,
2 2n

by Lemma 7.5, there are integral points where x , x ∈ 21 R ⇒ x − x = u is


nonzero. We prove u ∈ R. Write x = 21 y, x = 21 z, where y, z ∈ R. Then

1 1
u= y − z, y ∈ R, z ∈ R.
2 2
By Lemma 7.4, then u ∈ R. The Lemma holds.

Remark 7.1 The above Minkowski’s conclusion cannot be improved, that is V > 2n ,
it cannot be improved to V ≥ 2n . A counterexample is

R = {x ∈ Rn |x = (x1 , x2 , . . . , xn ), ∀ |xi | < 1}.

Obviously Vol(R) = 2n , but there is no nonzero integer point in ordinary R.

When Vol(R) = 2n , in order to make a symmetric convex body R still have


nonzero integral points, we need to make some supplementary constraints on R,
first, we consider the bounded region. Let R ⊂ Rn , call R bounded, if

R = {x = (x1 , x2 , . . . , xn ) ∈ Rn ||xi | ≤ B, 1 ≤ i ≤ n},

where B is a bounded constant.

Lemma 7.7 Let A∈Rn×n be a reversible matrix, d = |det(A)| > 0, c = (c1 , c2 , . . . ,


cn ) ∈ Rn is a positive vector, that is ∀ ci > 0, then the symmetric convex body R(A, c)
defined by Eq. (7.5) is bounded and its volume

Vol(R(A, c)) = 2n d −1 c1 c2 · · · cn .
258 7 Lattice-Based Cryptography

Proof Let A = (ai j )n×n . Write Ax = y, then x = A−1 y. And let A−1 = (bi j )n×n ,
then for any xi , there is
 
  
 n  n

|xi | =  bi j y j  ≤ |bi j | · c j ≤ B,
 j=1  j=1

where B is a bounded constant. Therefore, R(A, c) is a bounded set. Obviously


 
Vol(R(A, c)) = ··· dx1 dx2 · · · dxn ,
x=(x1 ,x2 ,...,xn )∈R (A,c)

do variable replacement Ax = y, then

1
dx = dx1 · · · dxn = dy1 dy2 · · · dyn .
|det(A)|

Thus
c1 cn
1
Vol(R(A, c)) = ··· dy1 dy2 · · · dyn
|det(A)|
−c1 −cn

n
= 2n d −1 ci ,
i=1

Lemma 7.7 holds.

Remark 7.2 In (7.5), “≤” is changed to “<” to define R(A, c), and the above lemma
is still holds.

Now consider the general situation, let A = (ai j )m×n . If m > n, and rank(A) ≥ n,
then R(A, c) defined by Eq. (7.5) is still a bounded region. Obviously if m < n, or
m = n, rank(A) < n, then R(A, c) is an unbounded region, and V = ∞. Therefore,
we have the following Corollary.

Corollary 7.1 Let A = (ai j )m×n , m < n or m = n, det(A) = 0, then for any small
positive vector c = (c1 , c2 , . . . , cn ), 0 < ci < ε, R(A, c) contains a nonzero integer
point. In other words, the following m inequalities
 
 
 n 
 ai j x j  < ε, 1 ≤ i ≤ m.

 j=1 

There exists a nonzero integer solution x = (x1 , x2 , . . . , xn ) ∈ Zn .


7.1 Geometry of Numbers 259

Proof When ε > 0 given, then Vol(R(A, c)) = ∞ > 2n . By Lemma 7.6, R(A, c)
contains at least one nonzero zero point.

Let A ∈ Rm×n be a matrix of order m × n, c = (c1 , c2 , . . . , cn ) ∈ Rn is a positive


vector, that is ∀ ci > 0, write A = (ai j )m×n , R (A, c) is defined as the set of solutions
x = (x1 , x2 , . . . , xn ) of the following linear inequality:
⎧  
 n 

⎪  

⎪  
1 j j  ≤ c1 ,

⎪  a x
⎨  j=1 
  (7.8)

⎪  n 

⎪  

⎪  a 
i j j  < ci , i = 2, . . . , m.
x
⎩ 
 j=1 

When A ∈ Rn×n is a reversible square matrix, we discuss the nonzero integral point
in symmetric convex body R (A, c).

Lemma 7.8 If A ∈ Rn×n is a reversible matrix and c = (c1 , c2 , . . . , cn ) is a positive


vector, when
c1 c2 · · · cn ≥ |det(A)|, (7.9)

Then R (A, c) contains a nonzero integer point.

Proof When c1 c2 · · · cn > |det(A)|, because of

2n c1 c2 · · · cn
Vol(R (A, c)) = > 2n ,
|det(A)|

by Lemma 7.6 and 7.7, then the proposition holds, we only discuss the case when
the equal sign of formula (7.9) holds.
Let ε be any positive real number, 0 < ε < 1, then by Lemma 7.7, there is a
nonzero integral solution x (ε) = (x1(ε) , x2(ε) , . . . , xn(ε) ) ∈ Zn satisfies
⎧  
⎪  
⎪  n 

⎪  (ε) 
a1 j x j  ≤ c1 + ε ≤ c1 + 1,

⎪ 
⎨  j=1 
  (7.10)

⎪  

⎪  n 

⎪  (ε) 
ai j x j  < ci , 2 ≤ i ≤ n.
⎩ 
 j=1 

And there is an upper bound B independent of ε, which satisfies

|x (ε)
j | ≤ B, 1 ≤ j ≤ n.
260 7 Lattice-Based Cryptography

The integral point x (ε) satisfying the above bounded condition is finite, so there must
be a nonzero integral point x = 0, which holds (7.10) for any ε > 0. Let ε → 0, then
the Lemma holds.
In the following discussion, we make the following restrictions on R ⊂ Rn :

R is a symmetric convex body, R is bounded, and R is a closed subset of Rn .


(7.11)
Obviously, when A is an n-order reversible square matrix, for any positive vector
c = (c1 , c2 , . . . , cn ), R(A, c) satisfies the above restriction (7.11), but R (A, c) does
not because R (A, c) is not closed.
Definition 7.2 If R ⊂ Rn satisfies the restriction (7.11), then for any x ∈ Rn , define
the distance function F(x) as

F(x) = FR (x) = inf{λ|λ > 0, λ−1 x ∈ R}. (7.12)

By definition, it is obvious that we have the following ordinary conclusions:


(i) F(x) = 0 ⇔ x = 0;
(ii) If A is a reversible n-order square matrix, the distance function defined by
R(A, c) is  
 n 
  
F(x) = max ci−1  ai j x j  . (G1.12’)
1≤i≤n  j=1 

Property (i) can be derived from the boundedness of R, and property (ii) can be
derived directly from the definition of R(A, c). Later we will see that 0 ≤ F(x) < ∞
holds for all x ∈ Rn . The main property of distance function F(x) is the following
Lemma.
Lemma 7.9 If F(x) is a distance function defined by R satisfying the constraints,
then
(i) Let λ ≥ 0, then x ∈ λR ⇔ F(x) ≤ λ;
(ii) F(λx) = |λ|F(x) holds for all λ ∈ R, x ∈ Rn ;
(iii) F(x + y) ≤ F(x) + F(y), ∀ x, y ∈ Rn .
Proof Since R is closed, by the definition, F −1 (x)x ∈ R. Thus, if λ ≥ F(x), by
Lemma 7.3, then  
F(x)  F(x) 
−1
λ x= −1
· F (x)x,    ≤ 1.
λ λ 

We have λ−1 x ∈ R ⇒ x ∈ λR. Conversely, if λ < F(x) ⇒ λ−1 x ∈


/ R. So when
x ∈ λ−R, there must be λ ≥ F(x), (i) holds.
(ii) is ordinary. Because |λ|−1 F −1 (x)λx ∈ R. There is

F(λx) ≤ |λ|F(x).
7.1 Geometry of Numbers 261

Conversely, let δ = F(λx), because of δ −1 λx ∈ R, you might as well let λ > 0, thus

δ
F(x) ≤ =⇒ λF(x) ≤ F(λx).
λ
So there is F(λx) = |λ|F(x), (ii) holds.
To prove (iii), we let μ1 = F(x), μ2 = F(y), =⇒ μ−1 −1
1 x ∈ R, μ2 y ∈ R. By
Lemma 7.4, we have
μ1 μ2
(μ1 + μ2 )−1 (x + y) = (μ−1 x) + (μ−1 y) ∈ R.
μ1 + μ2 1 μ1 + μ2 2

Thus
F(x + y) ≤ μ1 + μ2 .

The Lemma holds.

Let the volume of R ∈ Rn be V > 0, there are n linearly independent vec-


tors {α1 , α2 , . . . , αn } in R to form a set of bases of Rn . For any real number
μ1 , μ2 , . . . , μn , by Lemma 7.9, we have

F(μ1 α1 + · · · + μn αn ) ≤ |μ1 |F(α1 ) + |μ2 |F(α2 ) + · · · + |μn |F(αn )


≤ |μ1 | + |μ2 | + · · · + |μn |.

Because αi ∈ R ⇒ F(αi ) ≤ 1, so the above formula holds. That proves for ∀ x ∈


Rn ⇒ F(x) ≤ ∞.

Corollary 7.2 Let R ⊂ Rn meet the limiting conditions (7.11), and Vol(R) > 0,
then
(i) ∀ x ∈ Rn , there is λ such that x ∈ λR;
(ii) Let {α1 , α2 , . . . , αn } ⊂ R be a set of bases of Rn , then
 n 

μi αi ||μ1 | + |μ2 | + · · · + |μn | ≤ 1 ⊂ R.
i=1

Proof Because F(x) < ∞, so by (i) of Lemma 7.9, we can directly deduce the
conclusion of (i) and (ii) given directly by Lemma 7.4.

Now let j be a subscript, and we define λ j as

λ j = min{λ ≥ 0|λR contains j linear independent integral points in Rn }, (7.13)

and λ j is called the jth continuous minimum of R. By Lemma 7.3, λR ⊂ λ R, if


0 ≤ λ ≤ λ . Therefore, λ increases continuously, then λR can always contain any
set of desired vectors. Therefore, the existence of λ j is proof.
262 7 Lattice-Based Cryptography

By Lemma 7.6, let V be the volume of R, then Vol(λR) = λn V , for the first
continuous minimum λ1 , we have the following estimation

λn1 V ≤ 2n . (7.14)

For λ j ( j ≥ 2), there is no explicit upper bound estimation, but we have the following
conclusions.
Lemma 7.10 Let R ⊂ Rn be a convex body satisfying the limiting condition (7.11),
V = Vol(R), λ1 , λ2 , . . . , λn be n continuous minima of R, then we have

2n
≤ V λ1 λ2 · · · λn ≤ 2 n . (7.15)
n!

Proof We only prove the left inequality of the above formula, and we continuously
select the linear independent whole point x (1) , x (2) , . . . , x ( j) such that x ( j) ∈ λ j R,
and x ( j) x (1) , x (2) , . . . , x ( j−1) is linearly independent. Let x ( j) =(x j1 , x j2 , . . . , x jn ) ∈
Zn . Because matrix A = (x ji )n×n is an integer matrix, and det(A) = 0, so

| det(A)| ≥ 1.

By Lemma 7.9, for any constant μ1 , μ2 , . . . , μn , we have

F(μ1 x (1) + μ2 x (2) + · · · + μn x (n) )


≤ |μ1 |F(x (1) ) + |μ2 |F(x (2) ) + · · · + |μn |F(x (n) )
≤ |μ1 |λ1 + |μ2 |λ2 + · · · + |μn |λn .

Thus, if |μ1 |λ1 + |μ2 |λ2 + · · · + |μn |λn ≤ 1, then

μ1 x (1) + μ2 x (2) + · · · + μn x (n) ∈ R.

So set

R1 = {μ1 x (1) + μ2 x (2) + · · · + μn x (n) ||μ1 |λ1 + |μ2 |λ2 + · · · + |μn |λn ≤ 1} ⊂ R.

The volume of the left set R1 is


 
2n | det(A)|
Vol(R1 ) = ··· dμ1 dμ2 · · · dμn =
n!λ1 · · · λn
|μ1 |λ1 +|μ2 |λ2 +···+|μn |λn ≤1
n
2
≥ .
n!λ1 · · · λn

So there is
2n
≤ Vol(R1 ) ≤ Vol(R) = V.
n!λ1 · · · λn
7.1 Geometry of Numbers 263

Therefore, the left inequality of (7.15) holds. The proof of the right inequality is
quite complex and is omitted here. Interested readers can refer to the classic works
(1963, 1971) of J. W. S. Cassels.

An important application of the above geometry of numbers is to solve the problem


of rational approximation of real numbers, which is called Diophantine approxima-
tion in classical number theory. The main conclusion of this section is the following
simultaneous rational approximation theorem of n real numbers.

Theorem 7.1 Let θ1 , θ2 , . . . , θn be any n real numbers, θi = 0, then for any positive
number N > 1, there are nonzero positive integers q and p1 , p2 , . . . , pn to satisfy

|qθi − pi | < N − n , 1 ≤ i ≤ n;
1

(7.16)
|q| ≤ N .

Proof The proof of the theorem is a simple application of Minkowski’s linear type
theorem (see Lemma 7.8). Let A ∈ R(n+1)×(n+1) be an (n + 1)-order reversible square
matrix, defined as ⎛ ⎞
−1 0 · · · · · · 0 θ1
⎜ 0 −1 · · · · · · 0 θ2 ⎟
⎜ ⎟
A=⎜ ⎟
⎜· · · · · · · · · · · · · · · · · ·⎟ .
⎝ 0 0 · · · 0 −1 θn ⎠
0 · · · · · · 0 0 −1

Obviously |det(A)| = 1. Let (n + 1)-dimensional positive vector c = (N − n , N − n ,


1 1

. . ., N − n , N ), because
1

c1 c2 · · · cn cn+1 = N −1 · N = 1 ≥ |det(A)|.

So by Lemma 7.8, the symmetric convex body R (A, c) defined by A and c has
a nonzero integral point x = ( p1 , p2 , . . . , pn , q) = 0. We prove q = 0. Because
x = 0, if q = 0, then pk = 0 (1 ≤ k ≤ n), therefore, the k-th inequality in Eq. (7.16)
will produce the following contradiction,

1 ≤ |qθk − pk | < N − n < 1.


1

So q = 0, we complete the proof of Theorem 7.1.

Corollary 7.3 Let θ1 , . . . , θn be any n real numbers, then for any ε > 0, there is
rational number pqi (1 ≤ i ≤ n) satisfies
 
 
θi − pi  < ε . (7.17)
 q q
264 7 Lattice-Based Cryptography

Proof Any ε > 0 given, let N − n < ε, Formula (7.17) can be derived directly from
1

Theorem 7.1.

7.2 Basic Properties of Lattice

Lattice is one of the most important concepts in modern cryptography. Most of the
so-called anti-quantum computing attacks are lattice based cryptosystems. What is
a lattice? In short, a lattice is a geometry in n-dimensional Euclidean space Rn , for
example L = Zn ⊂ Rn , then Zn is a lattice in Rn , which is called an integer lattice
or a trivial lattice. If Zn is rotated once, we get the concept of a general lattice in
Rn , which is a geometric description of a lattice, next, we give an algebraic precise
definition of a lattice.
Definition 7.3 Let L ⊂ Rn be a nonempty subset, which is called a lattice in Rn , if
(i) L is an additive subgroup of Rn ;
(ii) There is a positive constant λ = λ(L) > 0, such that

min{|x||x ∈ L , x = 0} = λ, (7.18)

λ = λ(L) is called the minimal distance of a lattice L.


By Definition 7.3, a lattice is simply a discrete additive subgroup in Rn , in which
the minimum distance λ = λ(L) is the most important mathematical quantity of the
lattice. Obviously, we have

λ = min{|x − y||x ∈ L , y ∈ L , x = y}, (7.19)

Equation (7.19) shows the reason why λ is called the minimal distance of a lattice.
If x ∈ L and |x| = λ, x is called the shortest vector of L.
In order to obtain a more explicit and concise mathematical expression of any
lattice, we can regard an additive subgroup as a Z-module. First, we prove that any
lattice is a finitely generated Z-module.
Lemma 7.11 Let L ⊂ Rn be a lattice and {α1 , α2 , . . . , αm } ⊂ L be a set of vectors
in L, then {α1 , α2 , . . . , αm } is linearly independent in R if and only if {α1 , α2 , . . . , αm }
is linearly independent in Z.
Proof If {α1 , α2 , . . . , αm } is linearly independent in R, it is obviously linearly inde-
pendent in Z. conversely, if {α1 , α2 , . . . , αm } is linearly independent in Z, that is, any
linear combination
a1 α1 + · · · + am αm = 0, ai ∈ Z,

we have a1 = a2 = · · · = am = 0, then the linear combination in R is equal to 0, that


is
7.2 Basic Properties of Lattice 265

θ1 α1 + θ2 α2 + · · · + θm αm = 0, θi ∈ R. (7.20)

We prove θ1 = θ2 = · · · = θm = 0. By Lemma 7.1, for sufficiently large N > 1,


there are positive integers q = 0 and p1 , p2 , . . . , pm such that

|qθi − pi | < N − m , 1 ≤ i ≤ m;
1

q ≤ N.

By (7.20), we have

| p1 α1 + · · · + pm αm | = |(qθ1 − p1 )α1 + · · · + (qθm − pm )αm |


≤ N − m (|α1 | + · · · + |αm |)
1

≤ N − m max |αi |.
1

1≤i≤m

Let λ be the minimal distance of L and ε > 0 be a sufficiently small positive number,
we choose  
−m |αi |m
N > max ε , max ,
1≤i≤m λm

then N − m < ε, and


1

N − m max |αi | < λ.


1

1≤i≤m

Thus
| p1 α1 + · · · + pm αm | < λ.

Notice that p1 α1 + · · · + pm αm ∈ L, so p1 α1 + · · · + pm αm = 0. Since {α1 , α2 ,


. . . , αm } is linearly independent on Z, p1 = p2 = · · · = pm = 0 is derived. For any
i, 1 ≤ i ≤ m, we get |θi | ≤ |qθi | < N − m < ε. Since ε is any small positive number,
1

there is θ1 = θ2 = · · · = θm = 0. This proves that {α1 , α2 , . . . , αm } is also linearly


independent in R. Lemma 7.11 holds.

From the above lemma, any lattice L in Rn is a finitely generated Z-module.


Let {β1 , β2 , . . . , βm } ⊂ L be a set of Z-bases in L, then L as the rank of Z-module
satisfies
rank(L) = m ≤ n, (7.21)

and  

m
L= ai βi |ai ∈ Z . (7.22)
i=1

If {β1 , β2 , . . . , βm } is a Z-basis of L and each βi is regarded as a column vector,


then the matrix
266 7 Lattice-Based Cryptography

B = [β1 , β2 , . . . , βm ] ∈ Rn×m , rank(B) = m.

Equation (7.22) can be written as

L = L(B) = {Bx|x ∈ Zm } ⊂ Rn . (7.23)

We take L as the Z-modules, m as the rank of lattice L, B ∈ Rn×m as the generating


matrix of lattice L, and {β1 , β2 , . . . , βm } as a set of generating bases of L.
If {α1 , α2 , . . . , αm } ⊂ Rn is any m column vectors in Rn , the Gram matrix of
A = [α1 , α2 , . . . , αm ] ∈ Rn×m , {α1 , α2 , . . . , αm } is defined as

T = (αi , α j )m×m .

Obviously, we have T = A A, where A is the transpose matrix of A.

Lemma 7.12 Let A ∈ Rn×m , b ∈ Rn (m ≤ n is not required), then


(i) Let x0 ∈ Rm be a solution of A Ax = A b, then

|Ax0 − b|2 = minm |Ax − b|2 .


x∈R

(ii) rank(A A) = rank(A), and homogeneous linear equations Ax = 0 and A Ax =


0 have the same solution.
(iii) A Ax = A b always has a solution x ∈ Rm , and when rank(A) = m, the solution
is unique
x = (A A)−1 A b.

Proof First we prove (i). Let x0 ∈ R^m satisfy A′Ax0 = A′b; then for any x ∈ R^m we have

Ax − b = (Ax0 − b) + A(x − x0) = γ + γ1 ∈ R^n.

We prove that γ and γ1 are two orthogonal vectors in R^n. Because

(A(x − x0))′(Ax0 − b) = (x − x0)′A′(Ax0 − b) = (x − x0)′(A′Ax0 − A′b) = 0,

we have γ ⊥ γ1, and by the Pythagorean theorem,

|Ax − b|^2 = |Ax0 − b|^2 + |A(x − x0)|^2 ≥ |Ax0 − b|^2.

So (i) holds.

To prove (ii), let V_A be the solution space of Ax = 0 and V_{A′A} the solution space of A′Ax = 0; let's prove V_A = V_{A′A}. First, V_A ⊂ V_{A′A}. Conversely, let x ∈ V_{A′A}, that is, A′Ax = 0; then

x′A′Ax = 0 ⟹ (Ax)′Ax = ⟨Ax, Ax⟩ = 0.

The above formula holds if and only if Ax = 0, so x ∈ V_A. Thus V_A = V_{A′A}. Notice that

dim V_A = m − rank(A), dim V_{A′A} = m − rank(A′A),

so rank(A) = rank(A′A), and (ii) holds. To prove (iii), for given b ∈ R^n, the rank of the augmented matrix of the linear system A′Ax = A′b is

rank[A′A, A′b] = rank(A′[A, b]) ≤ rank(A′) = rank(A) = rank(A′A).

Therefore, the augmented matrix and the coefficient matrix have the same rank, so the linear equations have solutions. When rank(A) = m, then rank(A′A) = m, that is, A′A is an invertible square matrix of order m; thus

x = (A′A)^{−1}A′b,

and the solution is unique. Lemma 7.12 holds!
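As a numerical illustration of Lemma 7.12(i) and (iii), the following short Python sketch (using numpy; the matrix A and vector b are hypothetical small examples, not taken from the text) checks that a solution of the normal equations A′Ax = A′b minimizes |Ax − b|.

```python
import numpy as np

# A minimal sketch of Lemma 7.12 (least squares via the normal
# equations); A and b below are hypothetical example data.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])        # A in R^{3x2}, rank(A) = 2 = m
b = np.array([1.0, 2.0, 2.0])

# Unique solution x = (A'A)^{-1} A'b, since rank(A) = m (Lemma 7.12(iii)).
x0 = np.linalg.solve(A.T @ A, A.T @ b)

# |A x0 - b| should not exceed |A x - b| for any other x (Lemma 7.12(i)).
for x in (x0 + np.array([0.1, 0.0]), x0 + np.array([0.0, -0.2])):
    assert np.linalg.norm(A @ x0 - b) <= np.linalg.norm(A @ x - b)
print(x0)
```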

Lemma 7.13 Let A ∈ R^{n×m} with rank(A) = m; then A′A is a positive definite real symmetric matrix of order m, so there is a real orthogonal matrix P ∈ R^{m×m} of order m satisfying

P′(A′A)P = diag{δ1, δ2, . . . , δm}, (7.24)

where the δi > 0 are the m eigenvalues of A′A.

Proof rank(A) = m ⇒ m ≤ n. Let T = A′A; then T is a symmetric matrix of order m. For a vector x ∈ R^m of m arguments, the quadratic form satisfies

x′T x = x′A′Ax = (Ax)′(Ax) = ⟨Ax, Ax⟩ ≥ 0.

Because rank(A) = m, the above form vanishes if and only if x = 0, so T is a positive definite matrix. From the knowledge of linear algebra, there is an orthogonal matrix P of order m such that P′T P is a diagonal matrix, that is,

P′T P = diag{δ1, δ2, . . . , δm}.

Because P′T P and T have the same eigenvalues, δ1, δ2, . . . , δm are the eigenvalues of T, and each δi > 0. The Lemma holds.

Lemma 7.12 is called the least square method in linear algebra; its significance is to find a vector x0 of shortest length in the set {Ax − b | x ∈ R^m} for a given n × m matrix A and a given vector b ∈ R^n. Lemma 7.12 gives an effective algorithm: solve the linear equations A′Ax = A′b, and x0 is a solution of these equations. Lemma 7.13 is called the diagonalization of a quadratic form. Now, the main results are as follows:

Theorem 7.2 A subset L ⊂ R^n is a lattice with rank(L) = m (m ≤ n) if and only if there is a real matrix B ∈ R^{n×m} of order n × m, rank(B) = m, such that

L = {Bx | x ∈ Z^m} = { Σ_{i=1}^{m} ai βi | ai ∈ Z }, (7.25)

where B = [β1, β2, . . . , βm], and each βi ∈ R^n is a column vector.

Proof Equation (7.23) proves the necessity of the condition, so we only prove the sufficiency. If a subset L of R^n is given by Eq. (7.25), it is obvious that L is an additive subgroup of R^n: for any α = Bx1, β = Bx2 with x1, x2 ∈ Z^m, we have x = x1 − x2 ∈ Z^m, and

α − β = B(x1 − x2) = Bx ∈ L.

So we only need to prove the discreteness of L. Let T = B′B; then from Lemma 7.13, T is a positive definite real symmetric matrix. Let δ1, δ2, . . . , δm be the eigenvalues of T, and

δ = min{δ1, δ2, . . . , δm} > 0.

We prove

min_{x∈Z^m, x≠0} |Bx| ≥ √δ > 0. (7.26)

By Lemma 7.13, there is an orthogonal matrix P of order m such that

P′T P = diag{δ1, δ2, . . . , δm}.

For any given x ∈ Z^m, x ≠ 0, we have

|Bx|^2 = x′T x = x′P(P′T P)P′x ≥ δ|P′x|^2 = δ|x|^2.

Because x ≠ 0, then |x|^2 ≥ 1, so

|Bx|^2 ≥ δ, ∀ x ∈ Z^m, x ≠ 0.

This shows that the distance between any two different points of L is ≥ √δ > 0. Therefore, in a sphere with 0 as the center and r as the radius, the number of points of L is finite. Among these finitely many vectors, there is an α ∈ L with

|α| = min_{x∈L, x≠0} |x| = λ ≥ √δ > 0.

According to the definition of lattice, L is a lattice in R^n. The Theorem holds.

It can be directly deduced from the above theorem:

Corollary 7.4 Let L = L(B) ⊂ R^n be a lattice of rank(L) = m, λ be the minimum distance of L, B ∈ R^{n×m}, and δ be the minimum eigenvalue of B′B; then λ ≥ √δ.

Definition 7.4 If L ⊂ R^n is a lattice and rank(L) = n, we call L a full rank lattice of R^n.

By Theorem 7.2, a sufficient and necessary condition for L to be a full rank lattice of R^n is the existence of an invertible square matrix B ∈ R^{n×n}, det(B) ≠ 0, such that

L = L(B) = { Σ_{i=1}^{n} ai βi | ai ∈ Z, 1 ≤ i ≤ n } = {Bx | x ∈ Z^n}. (7.27)

If L = L(B) is a full rank lattice, define d = d(L) as

d = d(L) = |det(B)|, (7.28)

we call d the determinant of L. d = d(L) is the second most important mathematical quantity of a lattice. The lattices we discuss below are always assumed to be full rank lattices.
For a lattice (full rank lattice), the generating matrix is not unique, but d = d(L)
is unique. To prove this, first define the so-called unimodular matrix. Define

SL_n(Z) = {A = (aij)_{n×n} | aij ∈ Z, det(A) = ±1}. (7.29)

Obviously, SL_n(Z) forms a group under matrix multiplication, because the n-order identity matrix In ∈ SL_n(Z), and if A1 ∈ SL_n(Z) and A2 ∈ SL_n(Z), then A = A1A2 ∈ SL_n(Z). Specially, if A ∈ SL_n(Z), A = (aij)_{n×n}, then the inverse matrix of A is

A^{−1} = ±(a*_{ji})_{n×n} ∈ SL_n(Z),

where a*_{ij} is the algebraic cofactor of aij.

Lemma 7.14 Let L = L(B) ⊂ R^n be a lattice (full rank lattice) and B1 ∈ R^{n×n}; then L = L(B) = L(B1) if and only if there is a unimodular matrix U ∈ SL_n(Z) such that B = B1U.

Proof If B = B1U, U ∈ SL_n(Z), we prove L(B) = L(B1). Let α = B1x ∈ L(B1), where x ∈ Z^n; then

α = B1x = B1UU^{−1}x = BU^{−1}x.

Because U^{−1}x ∈ Z^n, α ∈ L(B), that is, L(B1) ⊂ L(B). Similarly, if α = Bx, x ∈ Z^n, then

α = Bx = B1Ux, where Ux ∈ Z^n.

Thus α ∈ L(B1), that is, L(B) = L(B1).

Conversely, if L(B) = L(B1), let B = [β1, β2, . . . , βn], B1 = [α1, α2, . . . , αn], with transition matrix U:

(β1, β2, . . . , βn) = (α1, α2, . . . , αn)U.

Obviously, since βi ∈ L(B1) (1 ≤ i ≤ n), U is an integer matrix. Similarly,

(α1, α2, . . . , αn) = (β1, β2, . . . , βn)U1.

Because αi ∈ L(B) (1 ≤ i ≤ n), U1 is also an integer matrix. Because

(β1, β2, . . . , βn) = (α1, α2, . . . , αn)U = (β1, β2, . . . , βn)U1U,

we have U1U = In, thus det(U) = ±1, that is, U ∈ SL_n(Z) and B = B1U. The Lemma holds.
By Lemma 7.14, if B and B1 are any two generating matrices of a lattice L, then

|det(B)| = |det(B1)| = d = d(L).

That is, the determinant d(L) of a lattice is an invariant.
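A quick numerical sanity check of this invariance (a sketch with numpy; the basis B and unimodular matrix U below are hypothetical examples):

```python
import numpy as np

# Hypothetical full rank basis of a lattice L in R^2.
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# A unimodular matrix U (integer entries, det(U) = +-1).
U = np.array([[1, 4],
              [0, 1]])
assert round(abs(np.linalg.det(U))) == 1

B1 = B @ U                  # another generating matrix of the same lattice
d, d1 = abs(np.linalg.det(B)), abs(np.linalg.det(B1))
assert np.isclose(d, d1)    # d(L) is invariant under change of basis
print(d, d1)                # both equal 6.0
```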


For a lattice (full rank lattice) L ⊂ R^n, the dual lattice of L is defined as

L* = {α ∈ R^n | ⟨α, β⟩ ∈ Z, ∀ β ∈ L}. (7.30)

Lemma 7.15 Let L = L(B) be a lattice; then the dual lattice of L is L* = L((B^{−1})′). That is, if B is the generating matrix of L, then (B^{−1})′ is the generating matrix of L*.

Proof Let

L((B^{−1})′) = {(B^{−1})′y | y ∈ Z^n}.

For any α ∈ L((B^{−1})′), α = (B^{−1})′y, y ∈ Z^n, and any β ∈ L, β = Bx, x ∈ Z^n, we have

⟨α, β⟩ = α′β = y′B^{−1}Bx = y′x ∈ Z.

That means L((B^{−1})′) ⊂ L*. Conversely, for any α ∈ L* and all β ∈ L, we have ⟨α, β⟩ ∈ Z. So let B = [β1, β2, . . . , βn]; then

⟨α, Σ_{i=1}^{n} xi βi⟩ = Σ_{i=1}^{n} xi⟨α, βi⟩ ∈ Z, ∀ xi ∈ Z;

therefore, for each generating vector βi (1 ≤ i ≤ n), ⟨α, βi⟩ ∈ Z. Write α = (y1, y2, . . . , yn)′; then

⟨α, βi⟩ ∈ Z (1 ≤ i ≤ n) ⟹ B′(y1, . . . , yn)′ = (x1, . . . , xn)′ ∈ Z^n.

Thus

(y1, . . . , yn)′ = (B′)^{−1}(x1, . . . , xn)′,

that is, α ∈ L((B′)^{−1}). Because B·B^{−1} = In ⟹ (B^{−1})′B′ = In, we have (B^{−1})′ = (B′)^{−1}. So α ∈ L((B^{−1})′), that is, L* ⊂ L((B^{−1})′). We have L* = L((B^{−1})′). The Lemma holds.

By Lemma 7.15, we immediately have the following corollary.

Corollary 7.5 Let L = L(B) be a full rank lattice and L* the dual lattice of L; then d(L*) = d^{−1}(L).
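The following sketch (numpy; the basis B is a hypothetical example) computes a dual basis as (B^{−1})′ and checks the two properties above:

```python
import numpy as np

# Hypothetical full rank basis of L.
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Generating matrix of the dual lattice L* (Lemma 7.15).
B_dual = np.linalg.inv(B).T

# Every pairing <dual basis vector, lattice basis vector> is an integer
# (here the pairing matrix is exactly the identity).
G = B_dual.T @ B
assert np.allclose(G, np.round(G))

# d(L*) = 1/d(L) (Corollary 7.5).
assert np.isclose(abs(np.linalg.det(B_dual)), 1 / abs(np.linalg.det(B)))
```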

An equivalence relation in R^n can be defined by using a lattice L: for all α, β ∈ R^n, we define

α ≡ β (mod L) ⟺ α − β ∈ L.

Obviously, this is an equivalence relation, called the congruence relation mod L.

Definition 7.5 Let F ⊂ R^n be a subset; we call F a basic region of a lattice (full rank lattice) L if
(i) ∀ x ∈ R^n, there is an α ∈ F with x ≡ α (mod L);
(ii) for any α1, α2 ∈ F with α1 ≠ α2, we have α1 ≢ α2 (mod L).

By definition, the basic neighborhood of a lattice is the representative element


set of the additive quotient group Rn /L. Therefore, a basic neighborhood of any L
forms an additive group under mod L.

Lemma 7.16 Let L = L(B) be a full rank lattice, then
(i) Any two basic neighborhoods F1 and F2 of L are isomorphic additive groups (mod L).
(ii) F = {Bx | x = (x1, x2, . . . , xn)′, 0 ≤ xi < 1, 1 ≤ i ≤ n} is a basic neighborhood of L(B).
(iii) Vol(F) = d = d(L).

Proof (i) is trivial, because

F1 ≅ R^n/L, F2 ≅ R^n/L ⟹ F1 ≅ F2.

To prove (ii), let B = [β1, β2, . . . , βn]; then {β1, β2, . . . , βn} is a set of bases of R^n, and every α ∈ R^n can be expressed as a linear combination of β1, β2, . . . , βn, that is,

α = Σ_{i=1}^{n} ai βi, ai ∈ R.

Let [α]_B = Σ_{i=1}^{n} [ai]βi, where [ai] is the integer part of ai, and {α}_B = α − [α]_B; then {α}_B can be expressed as

{α}_B = B(x1, x2, . . . , xn)′, where 0 ≤ xi < 1, 1 ≤ i ≤ n.

That is, {α}_B ∈ F. Because α − {α}_B = [α]_B ∈ L, for any α ∈ R^n there is a {α}_B ∈ F such that

α ≡ {α}_B (mod L).

For two points α = Bx and β = By in F,

α − β = B(x − y) = Bz,

where z = (z1, z2, . . . , zn)′, |zi| < 1. If α ≡ β (mod L), then Bz ∈ L forces z ∈ Z^n, hence z = 0 and α = β; so α ≢ β (mod L) whenever α ≠ β. So F is a basic neighborhood of L.

Let's prove (iii). Because all basic neighborhoods of L are isomorphic, they have the same volume; (ii) gives a specific basic region F of L, so we need only prove Vol(F) = d = d(L). Obviously,

Vol(F) = ∫···∫_{y=(y1,y2,...,yn)∈F} dy1 dy2 · · · dyn.

Make the variable substitution y = Bx and compute the Jacobian:

dy1 dy2 · · · dyn = |det(B)| dx1 · · · dxn = d(L) dx1 · · · dxn.

Thus

Vol(F) = ∫_0^1 · · · ∫_0^1 d(L) dx1 · · · dxn = d(L).

We have completed the proof of Lemma 7.16.
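The reduction α ↦ {α}_B used in this proof is easy to compute in coordinates; a sketch (numpy; the basis and the point α are hypothetical):

```python
import numpy as np

# Hypothetical full rank basis and target point.
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])
alpha = np.array([5.7, -2.3])

a = np.linalg.solve(B, alpha)   # coordinates of alpha in the basis B
frac = a - np.floor(a)          # fractional parts, each in [0, 1)
rep = B @ frac                  # {alpha}_B, the representative in F

# alpha and rep differ by the lattice point B @ floor(a) = [alpha]_B.
assert np.allclose(alpha - rep, B @ np.floor(a))
print(rep)
```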

Next, we discuss the Gram–Schmidt orthogonalization algorithm. If B = [β1, β2, . . . , βn] is the generating matrix of L, {β1, β2, . . . , βn} can be transformed into a set of orthogonal bases {β1*, β2*, . . . , βn*}, where β1* = β1, and

βi* = βi − Σ_{j=1}^{i−1} (⟨βi, βj*⟩/⟨βj*, βj*⟩) βj*. (7.31)

{β1*, β2*, . . . , βn*} is called the orthogonal basis corresponding to {β1, β2, . . . , βn}, and B* = [β1*, . . . , βn*] is the orthogonal matrix corresponding to B. For any 1 ≤ i ≤ n, denote

uii = 1, uij = 0 when j > i; uij = ⟨βi, βj*⟩/|βj*|^2 when 1 ≤ j ≤ i ≤ n; U = (uij)_{n×n}. (7.32)

Then U is a lower triangular matrix, and

(β1, β2, . . . , βn)′ = U(β1*, β2*, . . . , βn*)′. (7.33)

If both sides are transposed at the same time, there is

(β1, β2, . . . , βn) = (β1*, β2*, . . . , βn*)U′. (7.34)

Therefore, U is the transition matrix between the two groups of bases.
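Formulas (7.31)–(7.34) translate directly into code; a sketch in Python (the basis below is a hypothetical example):

```python
import numpy as np

def gram_schmidt(B):
    """Return (B_star, U) with B = B_star @ U', as in (7.31)-(7.34).

    B holds the basis vectors beta_i as columns; B_star holds the
    orthogonal (not normalized) vectors beta_i*; U = (u_ij) is lower
    triangular with ones on the diagonal.
    """
    n = B.shape[1]
    B_star = np.zeros_like(B, dtype=float)
    U = np.eye(n)
    for i in range(n):
        v = B[:, i].astype(float)
        for j in range(i):
            U[i, j] = B[:, i] @ B_star[:, j] / (B_star[:, j] @ B_star[:, j])
            v -= U[i, j] * B_star[:, j]
        B_star[:, i] = v
    return B_star, U

B = np.array([[2.0, 1.0], [0.0, 3.0]])
B_star, U = gram_schmidt(B)
assert np.allclose(B, B_star @ U.T)                  # (7.34)
assert np.isclose(B_star[:, 0] @ B_star[:, 1], 0.0)  # orthogonality
```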

Lemma 7.17 Let L = L(B) ⊂ R^n be a lattice, B = [β1, β2, . . . , βn] the generating matrix, B* = [β1*, β2*, . . . , βn*] the corresponding orthogonal matrix, and d = d(L) the determinant of L; then we have

d = Π_{i=1}^{n} |βi*| ≤ Π_{i=1}^{n} |βi|. (7.35)

Proof By (7.34), we have B = B*U′, and because det(U) = 1,

det(B) = det(B*).

By the definition,

d^2 = det(B′B) = det(U(B*)′B*U′) = det((B*)′B*) = det(diag{|β1*|^2, |β2*|^2, . . . , |βn*|^2}).

So there is

d = Π_{i=1}^{n} |βi*|.

In order to prove the inequality on the right of Eq. (7.35), we only need to prove

|βi*| ≤ |βi|, 1 ≤ i ≤ n. (7.36)

Because βi = Σ_{j=1}^{i} uij βj*, then

|βi|^2 = ⟨βi, βi⟩ = ⟨Σ_{j=1}^{i} uij βj*, Σ_{j=1}^{i} uij βj*⟩ = Σ_{j=1}^{i} uij^2⟨βj*, βj*⟩ = ⟨βi*, βi*⟩ + Σ_{j=1}^{i−1} uij^2⟨βj*, βj*⟩ ≥ |βi*|^2.

Therefore, the inequality on the right of (7.35) holds; the Lemma is proved.
Equation (7.35) is usually called Hadamard inequality, and we give another proof
here.
In order to define the concept of continuous minima on a lattice L, we denote the minimum distance on L by λ1; that is, λ1 = λ(L). Another definition of λ1 is the minimum positive real number r such that the linear space spanned by L ∩ Ball(0, r) is one-dimensional, where

Ball(0, r) = {x ∈ R^n : |x| ≤ r}

is the closed ball with 0 as the center and r as the radius. The concept of the n continuous minima λ1, λ2, . . . , λn of L can now be given.
Definition 7.6 Let L = L(B) ⊂ Rn be a full rank lattice, the i-th continuous mini-
mum λi is defined as

λi = λi (L) = inf{r | dim(span(L ∩ Ball(0, r ))) ≥ i}.



The following lemma is a useful lower bound estimate of the minimum distance λ1.

Lemma 7.18 Let L = L(B) ⊂ R^n be a lattice (full rank lattice) and B* = [β1*, β2*, . . . , βn*] the corresponding orthogonal basis; then

λ1 = λ(L) ≥ min_{1≤i≤n} |βi*|. (7.37)

Proof For ∀ x ∈ Z^n, x ≠ 0, we prove

|Bx| ≥ min_{1≤i≤n} |βi*|, x ∈ Z^n, x ≠ 0.

Let x = (x1, x2, . . . , xn)′ ≠ 0 and let j be the largest subscript with xj ≠ 0; then

|⟨Bx, βj*⟩| = |⟨Σ_{i=1}^{j} xi βi, βj*⟩| = |xj||βj*|^2,

because when i < j,

⟨βi, βj*⟩ = 0, and ⟨βj, βj*⟩ = ⟨βj*, βj*⟩.

On the other hand,

|⟨Bx, βj*⟩| ≤ |Bx||βj*|.

So

|Bx| ≥ |xj||βj*| ≥ min_{1≤i≤n} |βi*|.

Lemma 7.18 holds!

Corollary 7.6 The continuous minima λ1, λ2, . . . , λn of a lattice L are attained; that is, there exist αi ∈ L with |αi| = λi, 1 ≤ i ≤ n.

Proof The lattice points contained in the ball Ball(0, δ) with center 0 and radius δ (δ > λi) are finite: if a bounded region (of finite volume) contained infinitely many lattice points, there would be a convergent subsequence, while the distance between any two different points of L is at least λ1. This shows that

|L ∩ Ball(0, δ)| < ∞, for δ > λi,

so it is not hard to find α1 ∈ L with |α1| = λ1, α2 ∈ L with |α2| = λ2, . . . , αn ∈ L with |αn| = λn. The Corollary holds.

In Sect. 7.1, the geometry of numbers is relative to the integer lattice Zn ; next, we
extend the main results to the general full rank lattice.

Lemma 7.19 (Compare with Lemma 7.5) Let L = L(B) ⊂ R^n be a lattice (full rank lattice) and R ⊂ R^n; if Vol(R) > d(L), then there are two different points α ∈ R, β ∈ R such that α − β ∈ L.
Proof Let F be a basic region of L, that is,

F = {Bx | x = (x1, . . . , xn)′, 0 ≤ xi < 1, 1 ≤ i ≤ n}.

Obviously, R^n can be divided into the following disjoint subsets:

R^n = ∪_{α∈L} {α + y | y ∈ F} = ∪_{α∈L} {α + F}.

For a given lattice point α ∈ L, define

Rα = R ∩ {α + F} = α + Dα, Dα ⊂ F.

Therefore, R can be divided into the following disjoint subsets:

R = ∪_{α∈L} Rα ⟹ Vol(R) = Σ_{α∈L} Vol(Rα) = Σ_{α∈L} Vol(Dα).

If Dα ∩ Dβ = ∅ for all α, β ∈ L with α ≠ β, then

Vol(R) = Vol(∪_{α∈L} Dα) ≤ Vol(F) = d(L),

which contradicts the assumption. So there must exist α, β ∈ L, α ≠ β, with Dα ∩ Dβ ≠ ∅. Let x ∈ Dα ∩ Dβ; then α + x ∈ R, β + x ∈ R, and

(α + x) − (β + x) = α − β ∈ L.

The Lemma holds.


Lemma 7.20 (Compare with Lemma 7.6) Let L be a full rank lattice and R ⊂ R^n a symmetric convex body. If Vol(R) > 2^n d(L), then R contains a nonzero lattice point; that is, ∃ α ∈ L, α ≠ 0, such that α ∈ R.

Proof Let

(1/2)R = {x | 2x ∈ R}.

Then

Vol((1/2)R) = 2^{−n} Vol(R) > d(L).

By Lemma 7.19, there are two different points x ∈ (1/2)R, y ∈ (1/2)R with x − y ∈ L. Because 2x ∈ R, 2y ∈ R and R is a symmetric convex body, by Lemma 7.4,

(1/2)(2x − 2y) = x − y ∈ R.

The Lemma holds.
Corollary 7.7 Let L be a full rank lattice and λ(L) = λ1 the minimum distance of L. Then

λ1 = λ(L) ≤ √n (d(L))^{1/n}. (7.38)

Proof First we prove

Vol(Ball(0, r)) ≥ (2r/√n)^n. (7.39)

This is because Ball(0, r) contains the following cube:

{x ∈ R^n | x = (x1, . . . , xn)′, |xi| < r/√n for all i} ⊂ Ball(0, r).

By the definition, there are no nonzero lattice points in the open ball Ball(0, λ1); by Lemma 7.20, because Ball(0, λ1) is a symmetric convex body, there is

Vol(Ball(0, λ1)) ≤ 2^n d(L).

Thus

(2λ1/√n)^n ≤ 2^n d(L),

that is,

λ1 ≤ √n (d(L))^{1/n}.

The Corollary holds.


Combined with Eq. (7.37), we obtain upper and lower bounds for the minimum distance of a lattice:

min_{1≤i≤n} |βi*| ≤ λ(L) ≤ √n (d(L))^{1/n}. (7.40)
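A small brute-force check of (7.40) in dimension 2 (a sketch; the basis is hypothetical, and λ1 is computed by enumerating coefficient vectors in a heuristically large enough box):

```python
import itertools
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])      # hypothetical basis, n = 2
n = 2

# Brute-force lambda_1 over a small coefficient box.
lam1 = min(np.linalg.norm(B @ np.array(c))
           for c in itertools.product(range(-5, 6), repeat=n)
           if any(c))

d = abs(np.linalg.det(B))
# The Gram-Schmidt lengths |beta_i*| equal |R[i, i]| in a QR factorization.
Q, R = np.linalg.qr(B)
lower = min(abs(R[i, i]) for i in range(n))

assert lower <= lam1 <= np.sqrt(n) * d ** (1 / n)
print(lower, lam1, np.sqrt(n) * d ** (1 / n))
```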

Lemma 7.21 Let L ⊂ R^n be a lattice (full rank lattice), λ1, λ2, . . . , λn the continuous minima of L, and d = d(L) the determinant of L; then

λ1λ2 · · · λn ≤ n^{n/2} d(L). (7.41)

Proof Let {α1, α2, . . . , αn} ⊂ L with |αi| = λi be a set of bases of R^n. Let

T = { y ∈ R^n | Σ_{i=1}^{n} (⟨y, αi*⟩/(λi|αi*|))^2 < 1 }, (7.42)

where {α1*, α2*, . . . , αn*} is the orthogonal basis corresponding to {α1, α2, . . . , αn}. Let's prove that T does not contain any nonzero lattice point. Let y ∈ L, y ≠ 0, and let k be the largest subscript so that |y| ≥ λk; then

y ∈ Span(α1*, α2*, . . . , αk*) = Span(α1, α2, . . . , αk).

Indeed, if y were linearly independent of α1, α2, . . . , αk, then

k + 1 ≤ dim(Span(α1, α2, . . . , αk, y) ∩ Ball(0, |y|)),

and λ_{k+1} ≤ |y| would follow from the definition of λ_{k+1}, which contradicts the definition of k. By y ∈ Span(α1, α2, . . . , αk),

Σ_{i=1}^{n} (⟨y, αi*⟩/(λi|αi*|))^2 = Σ_{i=1}^{k} (⟨y, αi*⟩/(λi|αi*|))^2 ≥ (1/λk^2) Σ_{i=1}^{k} ⟨y, αi*⟩^2/|αi*|^2 = (1/λk^2)|y|^2 ≥ 1.

Therefore y ∉ T. By Lemma 7.20, because T is a symmetric convex body containing no nonzero lattice point,

Vol(T) ≤ 2^n d.

On the other hand,

Vol(T) = (Π_{i=1}^{n} λi) · Vol(Ball(0, 1)) ≥ (Π_{i=1}^{n} λi)(2/√n)^n.

So

Π_{i=1}^{n} λi ≤ n^{n/2} d.

Lemma 7.21 holds.


The above lemma shows that the upper bound (7.38) of λ1 is valid for λi in the
sense of geometric average.
Finally, we discuss computationally hard problems on lattices. These problems are the main scientific basis and technical support in the design of trapdoor functions, and they are also the cornerstone of the security of lattice cryptography.
1. Shortest vector problem SVP

A lattice L is a discrete geometric object in R^n, and we know that its minimum distance λ1 = λ(L) is the length of the shortest vector in L. The shortest vector problem is: for any full rank lattice L, find its shortest vector u0 ∈ L, that is,

|u0| = min_{x∈L, x≠0} |x| = λ1.

At present, there are insurmountable difficulties in theory and computation, because we only know the existence of u0 but cannot compute it. Hence, current research mainly focuses on the approximation of the shortest vector: find a nonzero vector u ∈ L on L such that

|u| ≤ r(n)λ1, u ∈ L, u ≠ 0,

where r(n) ≥ 1 is called the approximation coefficient, which depends only on the dimension of the lattice L.
In 1982, A. K. Lenstra, H. W. Lenstra and L. Lovász creatively developed a set of algorithms (1982) to effectively solve the approximation problem of the shortest vector; this is the famous LLL algorithm in lattice theory. The computational complexity of the LLL algorithm is polynomial for the whole lattice, and the approximation coefficient is r(n) = 2^{(n−1)/2}. How to improve the approximation coefficient in the LLL algorithm to a polynomial in n is a main research topic at present. For example, Schnorr's work in 1987 and Gama and Nguyen's work (2008a, 2008b) are very representative, but they are still far from a polynomial function, so the academic community generally conjectures:

Conjecture 1: there is no polynomial algorithm that can approximate the shortest vector so that the approximation coefficient r(n) is a polynomial function of n.
2. Closest vector problem CVP

Let L ⊂ R^n be a lattice and t ∈ R^n an arbitrary given vector; it is easy to prove that there is a lattice point u_t ∈ L with

|u_t − t| = min_{x∈L} |x − t|;

u_t is called the nearest lattice point (vector) to t. When t = 0 is the zero vector, u_0 is the shortest vector of L, so the closest vector problem is a general form of the shortest vector problem. Similarly, we only know the existence of the nearest vector u_t, and there is no deterministic algorithm to find u_t; instead one studies the approximation problem of the nearest vector: x ∈ L is called an approximate nearest vector with approximation coefficient r1(n) if

|x − t| ≤ r1(n)|u_t − t|.

In 1986, Babai proposed an effective algorithm to approximate the nearest vector in Babai (1986), and its approximation coefficient r1(n) is generally of the same order as the approximation coefficient r(n) of the shortest vector.

There are many other difficult computational problems on lattice, such as the
Successive Shortest vector problem, which is essentially to find a deterministic algo-
rithm to approximate each αi ∈ L, where |αi | = λi is the continuous minimum of
L. However, SVP and CVP are commonly used in lattice cryptosystem design and
analysis, and most of the research is based on the integer lattice.

7.3 Integer Lattice and q-Ary Lattice

Definition 7.7 A full rank lattice L is called an integer lattice if L ⊂ Z^n; an integer lattice L is called a q-ary lattice if qZ^n ⊂ L ⊂ Z^n, where q ≥ 1 is a positive integer.

It is easy to see from the definition that a lattice L = L(B) is an integer lattice ⇔ B ∈ Z^{n×n} is an integer square matrix, so the determinant d = d(L) of an integer lattice L is a positive integer.

Lemma 7.22 Let L = L(B) ⊂ Zn be an integer lattice, d = d(L) is the determinant


of L, then dZn ⊂ L ⊂ Zn , therefore, an integer lattice is always a d-ary lattice
(d = q).

Proof Let α ∈ dZ^n; let's prove that α ∈ L, that is, α = Bx always has an integral solution x ∈ Z^n. Let B^{−1} be the inverse matrix of B; then

B^{−1} = (1/det(B)) B* = (1/det(B)) (b*_{ji})_{n×n},

where B = (bij)_{n×n} and b*_{ij} is the algebraic cofactor of bij. Because B ∈ Z^{n×n}, we have B* ∈ Z^{n×n}, thus dB^{−1} = ±B* ∈ Z^{n×n}. Write α = dβ; then β ∈ Z^n, and

x = B^{−1}α = dB^{−1}β = ±B*β ∈ Z^n.

Thus α ∈ L. That is, dZ^n ⊂ L; the Lemma holds.

The following lemma is a simple conclusion in algebra; for completeness, we prove it below.

Lemma 7.23 Let L be a q-ary lattice and Zq the residue class ring mod q, then
(i) Z^n/qZ^n ≅ Z_q^n (additive group isomorphism).
(ii) Z^n/L ≅ Z_q^n / (L/qZ^n) (additive group isomorphism). Therefore, L/qZ^n is a linear code in Z_q^n.

Proof For α = (a1, a2, . . . , an) ∈ Z^n and β = (b1, b2, . . . , bn) ∈ Z^n, if ai ≡ bi (mod q) for all i, we write α ≡ β (mod q). For any α ∈ Z^n, define

ᾱ = (ā1, ā2, . . . , ān) ∈ Z_q^n,

where āi is the minimum nonnegative residue of ai mod q; thus α ≡ ᾱ (mod q). Define the mapping σ : Z^n → Z_q^n by σ(α) = ᾱ. This is a surjection, and

σ(α + β) = ᾱ + β̄ = σ(α) + σ(β).

Therefore, σ is a surjective group homomorphism. Obviously Ker σ = qZ^n; therefore, by the isomorphism theorem of groups, we have

Z^n/qZ^n ≅ Z_q^n.

Because qZ^n ⊂ L ⊂ Z^n, by the isomorphism theorem of groups,

Z^n/L ≅ (Z^n/qZ^n) / (L/qZ^n) ≅ Z_q^n / (L/qZ^n).

The Lemma holds.

Next, we will prove that Zn /L is a finite group. Therefore, we first discuss the
elementary transformation of matrix. The so-called elementary transformation of
matrix refers to elementary row transformation and elementary column transforma-
tion, specifically refers to the following three kinds of elementary transformations:
(1) Interchange two rows or two columns of matrix A:

σij(A)-Interchange rows i and j of A
τij(A)-Interchange columns i and j of A

(2) Multiply a row or column of A by −1:

σ−i(A)-Multiply row i of A by −1
τ−i(A)-Multiply column i of A by −1

(3) Add k times a row (column) to another row (column), k ∈ R; in many cases, we require k ∈ Z to be an integer:

σki+j(A)-Add k times row i of A to row j
τki+j(A)-Add k times column i of A to column j

The n-order identity matrix is represented by In; a matrix obtained by applying one of the above elementary transformations to In is called an elementary matrix. We note that all elementary matrices are unimodular matrices (see (7.29)), and

σij(A) = σij(In)A, τij(A) = Aτij(In);
σ−i(A) = σ−i(In)A, τ−i(A) = Aτ−i(In); (7.43)
σki+j(A) = σki+j(In)A, τki+j(A) = Aτki+j(In).

That is, an elementary row transformation of A is equal to multiplying by the corresponding elementary matrix from the left, and an elementary column transformation of A is equal to multiplying by the corresponding elementary matrix from the right.

Lemma 7.24 Let L = L(B) ⊂ Z^n be an integer lattice; then Z^n/L is a finite group, and

|Z^n/L| = d(L).

Proof According to the knowledge of linear algebra, an integer square matrix B ∈ Z^{n×n} can always be transformed into a lower triangular matrix by elementary row transformations; that is, there is a unimodular matrix U ∈ SL_n(Z) such that UB is lower triangular. Then elementary column transformations of UB can transform it into a diagonal matrix; that is, there is a unimodular matrix U1 ∈ SL_n(Z) with

UBU1 = diag{δ1, δ2, . . . , δn},

where δi ≠ 0, δi ∈ Z, and

d(L) = |det(UBU1)| = Π_{i=1}^{n} |δi|.

Let L(UBU1) be the integer lattice generated by UBU1; we have the quotient group isomorphism

Z^n/L(UBU1) ≅ ⊕_{i=1}^{n} Z/|δi|Z = ⊕_{i=1}^{n} Z_{|δi|}.

Thus

|Z^n/L(UBU1)| = Π_{i=1}^{n} |δi| = d(L).

Because L(B) = L(BU1) and L(B) ≅ L(UB), we have L(B) ≅ L(UBU1), so

|Z^n/L(B)| = |Z^n/L(UBU1)| = d(L).

Lemma 7.24 holds.

An integer square matrix B = (bij)_{n×n} ∈ Z^{n×n} is called a Hermite normal form matrix if B is an upper triangular matrix, that is, bij = 0 for 1 ≤ j < i ≤ n, and

bii ≥ 1, 0 ≤ bij < bii, 1 ≤ i < j ≤ n. (7.44)

A Hermite normal form matrix is referred to as an HNF matrix.


Definition 7.8 Let L = L(B) ⊂ Z^n be an integer lattice with B an HNF matrix; then B is called the HNF basis of L, denoted B = HNF(L).

The following lemma proves that an integer lattice has a unique HNF basis, so it is reasonable to use HNF(L) to represent the HNF basis.

Lemma 7.25 Let L ⊂ Z^n be an integer lattice; then there is a unique HNF matrix B with L = L(B).

Proof Let L = L(A), where A is a generating matrix of L. By elementary column transformations, A can be transformed into an upper triangular matrix; that is,

AU1 = (cij)_{n×n} with cij = 0 for i > j, U1 ∈ SL_n(Z),

where cii > 0, 1 ≤ i ≤ n. Transforming AU1 further, there is a unimodular matrix U2 such that AU1U2 = B is an HNF matrix; because L(B) = L(AU1U2) = L, we know that L has an HNF basis B.

Let's prove the uniqueness of the HNF basis B. If there are two HNF matrices B1, B2 with L(B1) = L(B2), then from Lemma 7.14 there is a unimodular matrix U ∈ SL_n(Z) such that B1 = B2U; that is, B1 can be obtained by continuously applying to B2 the elementary column transformations defined by Formula (7.43). But applying to B2 any column transformation τij, τ−i or τki+j destroys the HNF property, so U = In is the unit matrix, that is, B1 = B2. The Lemma holds.
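A small helper that checks conditions (7.44) is straightforward (a sketch; the computation of the HNF itself is omitted here, and the test matrices are hypothetical):

```python
import numpy as np

def is_hnf(B):
    """Check the HNF conditions (7.44): B integral, upper triangular,
    b_ii >= 1, and 0 <= b_ij < b_ii for j > i."""
    B = np.asarray(B)
    n = B.shape[0]
    if not np.array_equal(B, np.round(B)):
        return False
    for i in range(n):
        if B[i, i] < 1:
            return False
        for j in range(n):
            if j < i and B[i, j] != 0:                  # upper triangular
                return False
            if j > i and not (0 <= B[i, j] < B[i, i]):  # reduced entries
                return False
    return True

print(is_hnf([[2, 1], [0, 3]]))   # True
print(is_hnf([[2, 3], [0, 3]]))   # False: b_12 >= b_11
```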

Lemma 7.26 Let L = L(B) be an integer lattice, B = (bij)_{n×n} an HNF matrix, and B* = [β1*, β2*, . . . , βn*] the orthogonal basis corresponding to B = [β1, β2, . . . , βn]; then

B* = [β1*, β2*, . . . , βn*] = diag{b11, b22, . . . , bnn}

is a diagonal matrix.

Proof We prove βi* = (0, 0, . . . , bii, 0, . . . , 0)′ by induction on i. When i = 1, β1* = β1 = (b11, 0, . . . , 0)′ and the proposition holds. Suppose that for j ≤ i we have βj* = (0, 0, . . . , bjj, 0, . . . , 0)′; then for i + 1, by (7.31),

β*_{i+1} = β_{i+1} − Σ_{j=1}^{i} (⟨β_{i+1}, βj*⟩/|βj*|^2) βj*
= β_{i+1} − Σ_{j=1}^{i} (b_{j(i+1)}/b_{jj}) βj*
= (b_{1(i+1)}, . . . , b_{i(i+1)}, b_{(i+1)(i+1)}, 0, . . . , 0)′ − (b_{1(i+1)}, . . . , b_{i(i+1)}, 0, 0, . . . , 0)′
= (0, . . . , 0, b_{(i+1)(i+1)}, 0, . . . , 0)′.

Thus the proposition holds.


Next, we discuss q-ary lattices, where q ≥ 1 is a positive integer, the following
two q-ary lattices are often used in lattice cryptosystems.
Definition 7.9 Let Zq be the residue class ring mod q and A ∈ Z_q^{n×m}; the following two q-ary lattices are defined as

Λq(A) = {y ∈ Z^m | there is x ∈ Z^n such that y ≡ A′x (mod q)}, (7.45)

and

Λq⊥(A) = {y ∈ Z^m | Ay ≡ 0 (mod q)}. (7.46)

By the definition, Λq(A) ⊂ Z^m and Λq⊥(A) ⊂ Z^m are m-dimensional integer lattices. For any α ∈ qZ^m, taking x = 0 ∈ Z^n gives α ≡ A′x (mod q), and also Aα ≡ 0 (mod q), so

qZ^m ⊂ Λq(A) ⊂ Z^m, qZ^m ⊂ Λq⊥(A) ⊂ Z^m.

That is, Λq(A) and Λq⊥(A) are q-ary lattices of dimension m.
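Membership in these two lattices is straightforward to test (a sketch; the matrix A, modulus q and test vectors are hypothetical toy values, and the search over x is brute force, which is only feasible for tiny parameters):

```python
import numpy as np
from itertools import product

q = 5
A = np.array([[1, 2, 0],
              [0, 1, 4]])        # A in Z_q^{2x3}, n = 2, m = 3

def in_dual_lattice(y):
    """y in Lambda_q^perp(A): A y = 0 (mod q), see (7.46)."""
    return np.all((A @ y) % q == 0)

def in_image_lattice(y):
    """y in Lambda_q(A): y = A' x (mod q) for some x, see (7.45)."""
    return any(np.all((A.T @ np.array(x) - y) % q == 0)
               for x in product(range(q), repeat=A.shape[0]))

print(in_dual_lattice(np.array([2, 4, 1])))   # False for this y
print(in_image_lattice(np.array([1, 2, 0])))  # True: y = A' x, x = (1, 0)
```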
Lemma 7.27 We have

Λq⊥(A) = q·(Λq(A))*, Λq(A) = q·(Λq⊥(A))*.


Proof For any α ∈ (Λq(A))*, by the definition,

⟨y, α⟩ ∈ Z, ∀ y ∈ Λq(A).

And

⟨y, α⟩ = y′α ∈ Z ⇒ y′α ≡ 0 (mod 1),

so

y′(qα) ≡ 0 (mod q), ∀ y ∈ Λq(A).

Because y ∈ Λq(A), there is x ∈ Z^n with y ≡ A′x (mod q); from the above formula,

x′A(qα) ≡ 0 (mod q), ∀ x ∈ Z^n.

Thus

A(qα) ≡ 0 (mod q) ⇒ qα ∈ Λq⊥(A).

This proves

q(Λq(A))* ⊂ Λq⊥(A).

Conversely, if y ∈ Λq⊥(A), we have

Ay ≡ 0 (mod q) ⇒ A((1/q)y) ≡ 0 (mod 1).

For any α ∈ Λq(A), let x ∈ Z^n with α ≡ A′x (mod q); then

⟨α, (1/q)y⟩ = x′A((1/q)y) ≡ 0 (mod 1), ∀ x ∈ Z^n.

We have

(1/q)y ∈ (Λq(A))* ⇒ y ∈ q(Λq(A))*.

That is,

Λq⊥(A) ⊂ q(Λq(A))*.

Thus Λq⊥(A) = q(Λq(A))*. Similarly, the second equation can be proved.

Lemma 7.28 Let q be a prime, A ∈ Z_q^{n×m}, m ≥ n, and rank(A) = n; then

|det(Λq⊥(A))| = q^n, (7.47)

and

|det(Λq(A))| = q^{m−n}. (7.48)

Proof In the finite field Zq, rank(A) = n, so the linear equation system Ay = 0 has exactly q^{m−n} solutions in Z_q^m, from which we get

|Λq⊥(A)/qZ^m| = q^{m−n}.

By Lemma 7.23,

|Z^m/Λq⊥(A)| = |Z_q^m| / |Λq⊥(A)/qZ^m| = q^n.

By Lemma 7.24,

|det(Λq⊥(A))| = |Z^m/Λq⊥(A)| = q^n.

So (7.47) holds. By Corollary 7.5 of the previous section, we have

|det((Λq⊥(A))*)| = q^{−n}.

By Lemma 7.27,

|det(Λq(A))| = q^m |det((Λq⊥(A))*)| = q^{m−n}.

The Lemma holds.

7.4 Reduced Basis

In lattice theory, the Reduced basis and the corresponding LLL algorithm are the most important contents; they have had a major impact on computational algebra, computational number theory and other fields, and are recognized as one of the most important computational methods of the last 100 years. In order to introduce the Reduced basis and the LLL algorithm, we recall the Gram–Schmidt orthogonalization process summarized by Eqs. (7.31)–(7.34). Let {β1, β2, . . . , βn} ⊂ R^n be a set of bases of R^n and {β1*, β2*, . . . , βn*} the corresponding Gram–Schmidt orthogonal basis, where

β1* = β1, βi* = βi − Σ_{j=1}^{i−1} (⟨βi, βj*⟩/⟨βj*, βj*⟩) βj*, 1 < i ≤ n. (7.49)

The above formula can be written as

βi = Σ_{j=1}^{i} (⟨βi, βj*⟩/⟨βj*, βj*⟩) βj*, 1 ≤ i ≤ n. (7.50)

There is

Lemma 7.29 Let {β1, β2, . . . , βn} be a set of bases of R^n, {β1*, β2*, . . . , βn*} the corresponding Gram–Schmidt orthogonal basis, and L(β1, β2, . . . , βk) = Span{β1, β2, . . . , βk} the linear subspace spanned by β1, β2, . . . , βk; then
(i)

L(β1, β2, . . . , βk) = L(β1*, β2*, . . . , βk*), 1 ≤ k ≤ n. (7.51)

(ii) For 1 ≤ i ≤ n, there is

⟨βi, βk*⟩ = 0, when k > i; ⟨βi, βk*⟩ = ⟨βk*, βk*⟩, when k = i. (7.52)

(iii) For any x ∈ R^n with x = Σ_{i=1}^{n} xi βi*,

xi = ⟨x, βi*⟩/⟨βi*, βi*⟩, 1 ≤ i ≤ n. (7.53)

Proof The above three properties can be derived directly from Eq. (7.49) or (7.50).

Let U = (Uij)_{n×n}, where

Uij = ⟨βi, βj*⟩/⟨βj*, βj*⟩, so that Uij = 0 when j > i, and Uii = 1. (7.54)

Therefore, U is the lower triangular matrix with 1's on the diagonal, and

(β1, β2, . . . , βn)′ = U(β1*, β2*, . . . , βn*)′. (7.55)

U is called the coefficient matrix of the orthogonalization of {β1, β2, . . . , βn}.


Let's introduce the concept of orthogonal projection: suppose V ⊂ R^k ⊂ R^n (1 ≤ k ≤ n); the orthogonal complement space V⊥ of V in R^k is

V⊥ = {x ∈ R^k | ⟨x, α⟩ = 0, ∀ α ∈ V}. (7.56)

Because R^k = V ⊕ V⊥, every x ∈ R^k can be uniquely expressed as

x = α + β, where α ∈ V, β ∈ V⊥.

α is called the orthogonal projection of x on the subspace V; obviously |x|^2 = |α|^2 + |β|^2.

Lemma 7.30 Let {β1, β2, . . . , βn} be a set of bases of R^n and {β1*, β2*, . . . , βn*} the corresponding orthogonal basis, 1 ≤ k ≤ n; then βk* is the orthogonal projection of βk on the orthogonal complement space V of the subspace L(β1, β2, . . . , βk−1) in L(β1, β2, . . . , βk).

Proof When k = 1 the proposition is trivial. If k > 1, then by Lemma 7.29,

L(β1, β2, . . . , βk−1) = L(β1*, β2*, . . . , β*_{k−1}).

Therefore, the orthogonal complement space V = L(βk*) of L(β1, β2, . . . , βk−1) in L(β1, β2, . . . , βk−1, βk) is a one-dimensional space, because

βk = βk* + Σ_{j=1}^{k−1} ukj βj*,

and

⟨βk*, Σ_{j=1}^{k−1} ukj βj*⟩ = 0.

So βk* is the orthogonal projection of βk on V. The Lemma holds.


Next, we discuss the transformation law of the corresponding orthogonal
basis when making the elementary column transformation of the base matrix
[β1 , β2 , . . . , βn ].
Lemma 7.31 Let {β1, β2, . . . , βn} ⊂ R^n be a set of bases, {β1*, β2*, . . . , βn*} the corresponding orthogonal basis, and A = (uij)_{n×n} the coefficient matrix. Exchange βk−1 with βk to get a set of bases {α1, α2, . . . , αn} of R^n, where

αk−1 = βk, αk = βk−1, αi = βi, when i ≠ k − 1, k.

Let {α1*, α2*, . . . , αn*} be the corresponding orthogonal basis and A1 = (vij)_{n×n} the corresponding coefficient matrix; then we have
(i) αi* = βi*, if i ≠ k − 1, k.
(ii) α*_{k−1} = βk* + u_{kk−1}β*_{k−1}; αk* = β*_{k−1} − v_{kk−1}α*_{k−1}.
(iii) vij = uij, if 1 ≤ j < i ≤ n and {i, j} ∩ {k, k − 1} = ∅.
(iv) v_{ik−1} = u_{ik−1}v_{kk−1} + u_{ik}|βk*|^2/|α*_{k−1}|^2 and v_{ik} = u_{ik−1} − u_{ik}u_{kk−1}, for i > k.
(v) v_{k−1,j} = u_{kj}, v_{kj} = u_{k−1,j}, 1 ≤ j < k − 1.


Proof If 1 ≤ i < k − 1 or k < i ≤ n, then L(α1, α2, . . . , αi) = L(β1, β2, . . . , βi), and the orthogonal complement space

V = L⊥(α1, α2, . . . , αi−1) = L⊥(β1, β2, . . . , βi−1).

Therefore, the orthogonal projection αi* of αi = βi on V is the same as the orthogonal projection βi* of βi on V, that is, αi* = βi* (i ≠ k − 1, k), and (i) holds.

To prove (ii): α*_{k−1} is the orthogonal projection of βk (= αk−1) on the orthogonal complement space of L(β1, β2, . . . , βk−2). Because

βk* = βk − Σ_{j=1}^{k−1} ukj βj* = βk − u_{kk−1}β*_{k−1} − Σ_{j=1}^{k−2} ukj βj*,

and L(β1, β2, . . . , βk−2) = L(β1*, β2*, . . . , β*_{k−2}), there is

α*_{k−1} = βk* + u_{kk−1}β*_{k−1}.

Similarly, αk* is obtained by projecting β*_{k−1} orthogonally to L(α*_{k−1}); thus

αk* = β*_{k−1} − v_{kk−1}α*_{k−1},

where

v_{kk−1} = ⟨β*_{k−1}, α*_{k−1}⟩/|α*_{k−1}|^2 = ⟨β*_{k−1}, u_{kk−1}β*_{k−1}⟩/|α*_{k−1}|^2 = u_{kk−1}|β*_{k−1}|^2/|α*_{k−1}|^2;

thus (ii) holds. Similarly, the other properties can be proved. Lemma 7.31 holds.
Lemma 7.32 Let {β1, β2, . . . , βn} be a set of bases of R^n, {β1*, β2*, . . . , βn*} the corresponding orthogonal basis, and A = (uij)_{n×n} the coefficient matrix. For any k ≥ 2, if we replace βk with βk − rβk−1 and keep the other βi unchanged (i ≠ k), we get a new set of bases

{α1, α2, . . . , αn} = {β1, β2, . . . , βk−1, βk − rβk−1, βk+1, . . . , βn}.

Let {α1*, α2*, . . . , αn*} be the corresponding orthogonal basis and A1 = (vij)_{n×n} the corresponding coefficient matrix; then we have
(i) αi* = βi*, ∀ 1 ≤ i ≤ n; that is, the βi* remain unchanged.
(ii) vij = uij, if 1 ≤ j < i ≤ n, i ≠ k.
(iii) vkj = ukj − r·u_{k−1,j}, if j < k − 1; v_{kk−1} = u_{kk−1} − r, if j = k − 1.

Proof When i < k or i > k, αi* = βi* is trivial; to prove (i), we need only treat i = k. αk* is the orthogonal projection of αk = βk − rβk−1 onto the orthogonal complement space L(αk*) = L(βk*) of L(β1, β2, . . . , βk−1) = L(α1, α2, . . . , αk−1). Since

βk* = βk − Σ_{j=1}^{k−1} ukj βj*
= βk − rβk−1 − (Σ_{j=1}^{k−2} ukj βj* + (u_{kk−1} − r)β*_{k−1})
= αk − (Σ_{j=1}^{k−2} ukj βj* + (u_{kk−1} − r)β*_{k−1}),

this proves that αk* = βk*; thus (i) holds. To prove (ii), when i ≠ k we have

vij = ⟨αi, αj*⟩/|αj*|^2 = ⟨βi, βj*⟩/|βj*|^2 = uij,

that is, (ii) holds. When i = k,

vkj = ⟨αk, αj*⟩/|αj*|^2
= ⟨βk − rβk−1, βj*⟩/|βj*|^2 (1 ≤ j < k ≤ n)
= ⟨βk, βj*⟩/|βj*|^2 − r⟨βk−1, βj*⟩/|βj*|^2
= ukj − r·u_{k−1,j}.

The above formula holds for all 1 ≤ j ≤ k − 1; thus (iii) holds, and the Lemma holds.

Next, we introduce the concept of a set of Reduced bases of Rn .


Definition 7.10 Let {β1, β2, . . . , βn} ⊂ R^n be a set of bases, {β1*, β2*, . . . , βn*} the corresponding orthogonal basis, and A = (uij)_{n×n} the coefficient matrix; {β1, β2, . . . , βn} is called a set of Reduced bases of R^n if

(i) |uij| ≤ 1/2, ∀ 1 ≤ j < i ≤ n;
(ii) |βi* + u_{i,i−1}β*_{i−1}|^2 ≥ (3/4)|β*_{i−1}|^2, ∀ 1 < i ≤ n. (7.57)

A set of Reduced bases of R^n is sometimes called a Lovász Reduced basis, which is of great significance in lattice theory. The important result of this section is that any lattice L in R^n has a Reduced basis, and the method to calculate the Reduced basis is the famous LLL algorithm.

Theorem 7.3 Let L ⊂ R^n be a lattice (full rank lattice); then there is a generating matrix B = [β1, β2, . . . , βn] of L such that {β1, β2, . . . , βn} is a Reduced basis of R^n, which we also call a Reduced basis of the lattice L = L(B).

Proof Let B = [β1, β2, . . . , βn], L = L(B). First we prove that the basis can be modified so that

|u_{kk−1}| ≤ 1/2, ∀ 1 < k ≤ n. (7.58)

If there is a k > 1 for which the above formula does not hold, let r be the nearest integer to u_{kk−1}; obviously,

|u_{kk−1} − r| ≤ 1/2.

In {β1, β2, . . . , βn}, replace βk with βk − rβk−1; thus by Lemma 7.32,

ukj → ukj − r·u_{k−1,j}, 1 ≤ j < k.

Specially, when j = k − 1,

u_{kk−1} → u_{kk−1} − r.

Under the new basis, all βi* and the uij with 1 ≤ j < i ≠ k remain unchanged, so Eq. (7.58) can be achieved.

In the second step of the LLL algorithm, we prove that the basis can be modified so that

|βk* + u_{kk−1}β*_{k−1}|^2 ≥ (3/4)|β*_{k−1}|^2, ∀ 1 < k ≤ n. (7.59)

By the orthogonality of βk* and β*_{k−1},

|βk* + u_{kk−1}β*_{k−1}|^2 = |βk* − u_{kk−1}β*_{k−1}|^2;

therefore, the sign inside the absolute value in (7.59) can be changed arbitrarily. If there is a k, 1 < k ≤ n, such that (7.59) does not hold, that is,

|βk* + u_{kk−1}β*_{k−1}|^2 < (3/4)|β*_{k−1}|^2, (7.60)

then exchange βk and βk−1 and keep the other βi unchanged; there is a new set of bases {α1, α2, . . . , αn}, the corresponding orthogonal basis {α1*, α2*, . . . , αn*} and the coefficient matrix A1 = (vij)_{n×n}, where

αi = βi (i ≠ k − 1, k), αk−1 = βk, αk = βk−1.

Let's prove that under the new basis {α1, α2, . . . , αn} there is

|αk* + v_{kk−1}α*_{k−1}|^2 ≥ (3/4)|α*_{k−1}|^2. (7.61)

By Lemma 7.31,

α*_{k−1} = βk* + u_{kk−1}β*_{k−1}, αk* = β*_{k−1} − v_{kk−1}α*_{k−1}.

By (7.60), we have

|α*_{k−1}|^2 < (3/4)|β*_{k−1}|^2 = (3/4)|αk* + v_{kk−1}α*_{k−1}|^2.

That is,

|αk* + v_{kk−1}α*_{k−1}|^2 > (4/3)|α*_{k−1}|^2 > (3/4)|α*_{k−1}|^2.

Thus (7.61) holds. Using the above method continuously, it can be proved that Formula (7.59) becomes valid for all k > 1. However, when the exchange is made at k − 1, the new β*_{k−1} is replaced by

β*_{k−1} → β*_{k−1} + u_{k−1,k−2}β*_{k−2} = β̃*_{k−1},

and we have to check that (7.59) at k remains valid. In fact,

|βk* + u_{kk−1}β̃*_{k−1}|^2 = |βk* + u_{kk−1}(β*_{k−1} + u_{k−1,k−2}β*_{k−2})|^2
= |βk* + u_{kk−1}β*_{k−1}|^2 + |u_{kk−1}u_{k−1,k−2}β*_{k−2}|^2
≥ (3/4)(|β*_{k−1}|^2 + u²_{kk−1}|u_{k−1,k−2}β*_{k−2}|^2)
≥ (3/4)(|β*_{k−1}|^2 + |u_{k−1,k−2}β*_{k−2}|^2)
= (3/4)|β*_{k−1} + u_{k−1,k−2}β*_{k−2}|^2
= (3/4)|β̃*_{k−1}|^2.

Therefore, Eq. (7.59) does not change when the exchange transformations are carried out continuously; that is, Eq. (7.59) holds for all k, 1 < k ≤ n.

In the third step of the LLL algorithm, let's prove that

|ukj| ≤ 1/2, ∀ 1 ≤ j < k ≤ n. (7.62)

When j = k − 1, (7.62) is just (7.58). For a given k, 1 < k ≤ n, if (7.62) does not hold, let l be the largest subscript with |ukl| > 1/2, and let r be the nearest integer to ukl; then |ukl − r| ≤ 1/2. Replace βk with βk − rβl; from Lemma 7.32, all βi* remain unchanged and the coefficient matrix is changed to

vkj = ukj − r·ulj, 1 ≤ j < l; vkl = ukl − r,

while the other uij remain unchanged; at this time,

|ukl − r| = |vkl| ≤ 1/2.

So we obtain Eq. (7.62) for all 1 ≤ j < k ≤ n.

Each of the above matrix transformations is equivalent to multiplying by a unimodular matrix from the right, so finally a Reduced basis B of the lattice L with L = L(B) is obtained. We complete the proof of Theorem 7.3.
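The three steps above translate directly into code. The following is a compact Python sketch of the LLL algorithm with δ = 3/4 (the Lovász condition of (7.57)); it is illustrative only, recomputing Gram–Schmidt at each step, and is not the optimized version analyzed in Theorem 7.4. The example basis is hypothetical.

```python
import numpy as np

def lll(B, delta=0.75):
    """LLL-reduce the columns of B (a sketch following Theorem 7.3)."""
    B = B.astype(float).copy()
    n = B.shape[1]

    def gram_schmidt(B):
        Bs, U = np.zeros_like(B), np.eye(n)
        for i in range(n):
            v = B[:, i].copy()
            for j in range(i):
                U[i, j] = B[:, i] @ Bs[:, j] / (Bs[:, j] @ Bs[:, j])
                v -= U[i, j] * Bs[:, j]
            Bs[:, i] = v
        return Bs, U

    Bs, U = gram_schmidt(B)
    k = 1
    while k < n:
        # Size reduction: enforce |u_kj| <= 1/2 (steps 1 and 3).
        for j in range(k - 1, -1, -1):
            r = round(U[k, j])
            if r:
                B[:, k] -= r * B[:, j]
                Bs, U = gram_schmidt(B)
        # Lovasz condition (7.57)(ii) between k-1 and k (step 2).
        if Bs[:, k] @ Bs[:, k] >= (delta - U[k, k - 1] ** 2) * (Bs[:, k - 1] @ Bs[:, k - 1]):
            k += 1
        else:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]   # exchange beta_{k-1}, beta_k
            Bs, U = gram_schmidt(B)
            k = max(k - 1, 1)
    return B

B = np.array([[1.0, 4.0, 9.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])    # hypothetical basis (columns)
print(lll(B))
```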
Lemma 7.33 Let L = L(B) be a lattice, B a Reduced basis of L, and B* = [β1*, β2*, . . . , βn*] the corresponding orthogonal basis; then for any 1 ≤ j < i ≤ n, we have

|βj*|^2 ≤ 2^{i−j}|βi*|^2.

Proof Because B = [β1, β2, . . . , βn] is a Reduced basis,

|βk* + u_{kk−1}β*_{k−1}|^2 = |βk*|^2 + u²_{kk−1}|β*_{k−1}|^2 ≥ (3/4)|β*_{k−1}|^2.

There is

|βk*|^2 ≥ (3/4)|β*_{k−1}|^2 − u²_{kk−1}|β*_{k−1}|^2 ≥ (3/4)|β*_{k−1}|^2 − (1/4)|β*_{k−1}|^2 = (1/2)|β*_{k−1}|^2.

So for given 1 ≤ j < i ≤ n, we have

|βi*|^2 ≥ (1/2)|β*_{i−1}|^2 ≥ (1/4)|β*_{i−2}|^2 ≥ · · · ≥ 2^{−(i−j)}|βj*|^2,

thus

|βj*|^2 ≤ 2^{i−j}|βi*|^2.

Remark 7.3 In the definition of a Reduced basis, the coefficient 3/4 in the second inequality of (7.57) can be replaced by any δ with 1/4 < δ < 1. Specially, Babai pointed out in (1986) that the second inequality of Eq. (7.57) can be replaced by the following weaker inequality:

|βi*|^2 ≥ (1/2)|β*_{i−1}|^2. (7.63)
Let's discuss the computational complexity of the LLL algorithm. Let B = {β1, β2, . . . , βn} be any set of bases; for any 0 ≤ k ≤ n, we define

d0 = 1, dk = det((⟨βi, βj⟩)_{k×k}). (7.64)

If {β1*, β2*, . . . , βn*} is the orthogonal basis corresponding to {β1, β2, . . . , βn}, there is obviously

dk = Π_{i=1}^{k} |βi*|^2, 0 < k ≤ n. (7.65)

Thus each dk is a positive number, and dn = d(L)^2. Let

D = Π_{k=1}^{n−1} dk. (7.66)

We first prove that dk (0 < k ≤ n) and D have lower bounds.

Lemma 7.34 Let

m(L) = λ(L)^2 = min{|x|^2 : x ∈ L, x ≠ 0}.

Then

dk ≥ (3/4)^{k(k−1)/2} m(L)^k, 1 ≤ k ≤ n.

Proof The determinant of the k-dimensional lattice L_k = L(β1, β2, . . . , βk) (1 ≤ k ≤ n) satisfies

d^2(L_k) = dk.

By a conclusion of Cassels (1971), there is a nonzero lattice point x in L_k satisfying x ∈ L_k, x ≠ 0, and

|x|^2 ≤ (4/3)^{(k−1)/2} dk^{1/k}. (7.67)

Then

dk ≥ (3/4)^{k(k−1)/2} m(L_k)^k ≥ (3/4)^{k(k−1)/2} m(L)^k.

The Lemma holds.


Another important conclusion of this section is that for the integer lattice L esti-
mation, the computational complexity of the Reduced basis of the integer lattice is
obtained by using the LLL algorithm. We prove that the LLL algorithm on the integer
lattice is polynomial.
Theorem 7.4 Let L = L(B) ⊂ Z^n be an integer lattice with generating matrix B = [β1, β2, . . . , βn], and suppose N satisfies

max_{1≤i≤n} |βi|^2 ≤ N.

Then the computational complexity of obtaining a Reduced basis of L from B by the LLL algorithm is

Time(LLL algorithm) = O(n^4 log N).

The binary lengths of all integers appearing in the LLL algorithm are O(n log N), so the computational complexity of the LLL algorithm on an integer lattice is polynomial.
Proof By (7.36), we have

|βi*| ≤ |βi|, 1 ≤ i ≤ n,

where {β1*, β2*, . . . , βn*} is the orthogonal basis corresponding to {β1, β2, . . . , βn}; then by (7.65) and (7.66), we have

dk = Π_{i=1}^{k} |βi*|^2 ≤ Π_{i=1}^{k} |βi|^2 ≤ N^k, 1 ≤ k ≤ n,

and

1 ≤ D ≤ N^{n(n−1)/2}. (7.68)

The inequality on the left of the above formula holds because dk ∈ Z and dk ≥ 1, so by (7.66), D ≥ 1. O(n) arithmetic operations are required in the first step of the LLL algorithm, O(n^3) arithmetic operations in the second and third steps, and the number of bit operations per arithmetic operation is at most Time(calculate D); thus

Time(LLL algorithm) ≤ O(n^3) · Time(calculate D) = O(n^4 log N).

Therefore, the first conclusion of Theorem 7.4 is proved. The second conclusion is more complex, and we omit it; interested readers may refer to the original paper (1982) of A. K. Lenstra, H. W. Lenstra and L. Lovász.

7.5 Approximation of SVP and CVP

The most important applications of the lattice Reduced basis and the LLL algorithm are to provide approximation algorithms for the shortest vector problem and the closest vector problem, and to obtain some approximation results. Firstly, we prove the following Lemma.

Lemma 7.35 Let {β1, β2, . . . , βn} be a Reduced basis of a lattice L, {β1*, β2*, . . . , βn*} the corresponding orthogonal basis, and d(L) the determinant of L; then we have
(i)

d(L) ≤ Π_{i=1}^{n} |βi| ≤ 2^{n(n−1)/4} d(L). (7.69)

(ii)

|β1| ≤ 2^{(n−1)/4} d(L)^{1/n}. (7.70)

Proof The inequality on the left of (i), called the Hadamard inequality, has been given by Lemma 7.17. The inequality on the right of (i) gives an upper bound of Π_{i=1}^{n} |βi|. By Lemma 7.33,

|βj*| ≤ 2^{(i−j)/2}|βi*|, 1 ≤ j < i ≤ n. (7.71)

Since

βi = βi* + Σ_{j=1}^{i−1} uij βj*,

we get

|βi|^2 = |βi*|^2 + Σ_{j=1}^{i−1} uij^2|βj*|^2
≤ |βi*|^2 + (1/4) Σ_{j=1}^{i−1} |βj*|^2
≤ (1 + (1/4) Σ_{j=1}^{i−1} 2^{i−j}) |βi*|^2 (7.72)
= (1 + (1/4)(2^i − 2)) |βi*|^2
≤ 2^{i−1}|βi*|^2.

There is

Π_{i=1}^{n} |βi|^2 ≤ Π_{i=1}^{n} 2^{i−1}|βi*|^2 = 2^{Σ_{i=0}^{n−1} i} Π_{i=1}^{n} |βi*|^2 = 2^{n(n−1)/2}(d(L))^2.

So

Π_{i=1}^{n} |βi| ≤ 2^{n(n−1)/4} d(L),

and (7.69) holds. To prove (ii), by (7.72) and (7.71),

|βj|^2 ≤ 2^{j−1}|βj*|^2 ≤ 2^{j−1}2^{i−j}|βi*|^2 = 2^{i−1}|βi*|^2 (7.73)

for all 1 ≤ j ≤ i ≤ n; especially,

|β1|^2 ≤ 2^{i−1}|βi*|^2, 1 ≤ i ≤ n.

Thus

|β1|^{2n} ≤ 2^{Σ_{i=1}^{n}(i−1)} Π_{i=1}^{n} |βi*|^2 = 2^{n(n−1)/2}(d(L))^2.

So

|β1| ≤ 2^{(n−1)/4} d(L)^{1/n}.

Lemma 7.35 holds!

The following theorem shows that if {β1, β2, . . . , βn} is a set of Reduced bases of a lattice L, then β1 is an approximation of the shortest vector u0 of the lattice L, with approximation coefficient r(n) = 2^{(n−1)/2}.

Theorem 7.5 Let L = L(B) ⊂ R^n be a lattice (full rank lattice), B = [β1, β2, . . . , βn] a set of Reduced bases of L, and λ1 = λ(L) the minimal distance of L; then

|β1| ≤ 2^{(n−1)/2} λ1 = 2^{(n−1)/2} λ(L). (7.74)

Proof We only need to prove that for ∀ x ∈ L, x ≠ 0,

|β1|^2 ≤ 2^{n−1}|x|^2, ∀ x ∈ L, x ≠ 0. (7.75)

For given x ∈ L, x ≠ 0, let

x = Σ_{i=1}^{n} ri βi = Σ_{i=1}^{n} ri′ βi*, ri ∈ Z, ri′ ∈ R, 1 ≤ i ≤ n.

Let k be the largest subscript with rk ≠ 0; then rk′ = rk. So

|x|^2 ≥ rk^2|βk*|^2 ≥ |βk*|^2 ≥ 2^{1−k}|β1|^2. (7.76)

Thus

|β1|^2 ≤ 2^{k−1}|x|^2 ≤ 2^{n−1}|x|^2, x ∈ L, x ≠ 0.

That is, (7.75) holds; thus Theorem 7.5 holds.

The following result shows that not only the first vector: every vector of a Reduced basis is an approximation of the corresponding successive shortest vector of the lattice.

Lemma 7.36 Let L ⊂ R^n be a lattice, {β1, β2, . . . , βn} a Reduced basis of L, and {x1, x2, . . . , xt} ⊂ L a set of t linearly independent lattice points; then

|βj|^2 ≤ 2^{n−1} max{|x1|^2, |x2|^2, . . . , |xt|^2} (7.77)

holds for all 1 ≤ j ≤ t.

Proof Write

xj = Σ_{i=1}^{n} rij βi, rij ∈ Z, 1 ≤ i ≤ n, 1 ≤ j ≤ t.

For fixed j, let i(j) be the largest positive integer i with rij ≠ 0; by (7.76), we have

|xj|^2 ≥ |β*_{i(j)}|^2, 1 ≤ j ≤ t.

Reorder the xj to ensure i(1) ≤ i(2) ≤ · · · ≤ i(t); then j ≤ i(j) holds for ∀ 1 ≤ j ≤ t. Otherwise

{x1, x2, . . . , xj} ⊂ L(β1, β2, . . . , βj−1),

which contradicts the linear independence of x1, x2, . . . , xj. Thus j ≤ i(j). By (7.73) of Lemma 7.35,

|βj|^2 ≤ 2^{i(j)−1}|β*_{i(j)}|^2 ≤ 2^{n−1}|β*_{i(j)}|^2 ≤ 2^{n−1}|xj|^2, ∀ 1 ≤ j ≤ t.

Thus (7.77) holds; the Lemma holds.

Remark 7.4 We give a proof of rk′ = rk in Theorem 7.5. Because k is the largest subscript with rk ≠ 0,

x = Σ_{i=1}^{k} ri βi = Σ_{i=1}^{k} ri′ βi*.

By (7.52) and (7.53),

rk′ = ⟨x, βk*⟩/|βk*|^2, rk = ⟨x, βk*⟩/⟨βk, βk*⟩.

Because ⟨βk, βk*⟩ = ⟨βk*, βk*⟩, we have

rk = ⟨x, βk*⟩/⟨βk, βk*⟩ = rk′.

In order to discuss the approximation of the successive shortest vectors of a lattice, let's recall the definitions of the continuous minima λ1, λ2, . . . , λn and of the successive shortest vectors of a lattice. By Definition 7.6 and Corollary 7.6 in Sect. 7.2, the continuous minima λ1, λ2, . . . , λn of a full rank lattice are attained: for all 1 ≤ i ≤ n, there is

|αi| = λi, αi ∈ L, 1 ≤ i ≤ n.

For successive shortest vectors α1, α2, . . . , αn, each |αi| is shortest under the condition that αi is linearly independent of {α1, α2, . . . , αi−1}.

Theorem 7.6 Let {β1, β2, . . . , βn} be a Reduced basis of a lattice L and λ1, λ2, . . . , λn the continuous minima of L; then we have

|βi|^2 ≤ 2^{n−1} λi^2, 1 ≤ i ≤ n. (7.78)

Proof We make an induction on i. Because {β1, β2, . . . , βi} is a Reduced basis of the lattice L_i in R^i, the proposition is obviously true when i = 1 (see Theorem 7.5). If the proposition holds for i − 1, then by Lemma 7.36,

|βi|^2 ≤ 2^{n−1} max{λ1^2, λ2^2, . . . , λi^2} = 2^{n−1} λi^2.

Therefore, (7.78) holds for all i. The Theorem holds.



Next, we use the Reduced basis to solve the closest vector problem (CVP) approximately. For any given t ∈ R^n, because there are only finitely many lattice points of a lattice L in the ball Ball(t, r) with t as the center and r as the radius, there is a lattice point u_t closest to t, that is,

|u_t − t| = min_{x∈L} |x − t|. (7.79)

We use the Reduced basis to find a lattice point ω ∈ L with

|ω − t| ≤ r1(n)|u_t − t|; (7.80)

ω is called an approximation of the nearest lattice point u_t, and r1(n) is called the approximation coefficient. According to Babai (1986), to approximate the nearest lattice point u_t, we adopt the following two technical means:
(A) Rounding off: for ∀ x ∈ R^n, let B = [β1, β2, . . . , βn] be a Reduced basis of the lattice L. The discard vector (rounding) [x]_B of x is defined as follows: let

x = Σ_{i=1}^{n} xi βi, xi ∈ R,

let δi be the nearest integer to xi, and then define

[x]_B = Σ_{i=1}^{n} δi βi; (7.81)

[x]_B is called the discard vector of x under the basis B. Write x = [x]_B + {x}_B; then

{x}_B ∈ { Σ_{i=1}^{n} ai βi | −1/2 < ai ≤ 1/2, 1 ≤ i ≤ n }.
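In coordinates, [x]_B = B·round(B^{−1}x), which gives Babai's rounding procedure directly; a sketch (numpy; the basis and target point are hypothetical):

```python
import numpy as np

def babai_rounding(B, x):
    """Return [x]_B = B * round(B^{-1} x), see (7.81).
    B should be a (preferably LLL-reduced) basis with columns beta_i."""
    coeffs = np.linalg.solve(B, x)   # coordinates x_i of x in the basis B
    return B @ np.round(coeffs)      # round each coordinate to delta_i

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])           # hypothetical reduced basis
x = np.array([2.4, 3.3])
w = babai_rounding(B, x)
print(w, np.linalg.norm(x - w))      # an approximate closest vector
```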

(B) Adjacent plane

Let U = Σ_{i=1}^{n−1} Rβi = L(β1, β2, . . . , βn−1) ⊂ R^n be an (n − 1)-dimensional subspace, L′ = Σ_{i=1}^{n−1} Zβi ⊂ L a sublattice of L, and v ∈ L; we call U + v an affine plane of R^n. For given x ∈ R^n, if the distance between x and U + v is smallest, U + v is called the nearest affine plane of x.

Let x′ be the orthogonal projection of x onto the nearest affine plane U + v, let y ∈ L′ be the vector closest to x′ − v in L′, and let ω = y + v be the approximation of the vector closest to x in L.

Let L(β1, β2, . . . , βn) ⊂ R^n be a lattice and {β1*, β2*, . . . , βn*} the corresponding orthogonal basis. For ∀ x ∈ R^n, write x = Σ_{i=1}^{n} xi βi*, xi ∈ R, and let δn be the nearest integer to xn. According to the nearest plane method, we take (see Lemma 7.43 below)

U = L(β1*, β2*, . . . , β*_{n−1}) = L(β1, β2, . . . , βn−1);
v = δn βn ∈ L;
x′ = Σ_{i=1}^{n−1} xi βi* + δn βn*; (7.82)
y = the lattice point of the sublattice L′ = Σ_{i=1}^{n−1} Zβi closest to x′ − v;
ω = y + v.

We prove that

Theorem 7.7 Let L = L(B) ⊂ R^n be a lattice and B = [β1, β2, . . . , βn] a Reduced basis of L. For given ∀ x ∈ R^n, the adjacent plane method produces a lattice point ω = y + v adjacent to x in L (by (7.82)) satisfying

|ω − x| ≤ 2^{n/2}|u_x − x|, (7.83)

where u_x is given by Eq. (7.79), and further

|x − ω| ≤ 2^{n/2−1}|βn*|. (7.84)

Proof If n = 1, then B = θ ∈ R, θ ≠ 0. Let x ∈ R, x = x1θ, and L = {nθ | n ∈ Z}; then for n ∈ Z,

|x − nθ| = |x1θ − nθ| = |x1 − n||θ| ≥ |x1 − δ||θ|,

where δ is the nearest integer to x1. Let ω = δθ; then

|x − ω| = |x1 − δ||θ| ≤ |x − nθ|, ∀ n ∈ Z.

So ω = δθ is the lattice point closest to x in L, that is, ω = u_x ∈ L and

|x − ω| = |u_x − x|.

Thus (7.83) holds.

Let n ≥ 2. We observe (see (7.82)) v = δn βn, x′ = Σ_{i=1}^{n−1} xi βi* + δn βn*; then

|x − x′| = |xn − δn||βn*| ≤ (1/2)|βn*|. (7.85)

Since the distance between the affine planes {U + z | z ∈ L} is at least |βn*|, and |x − x′| is the distance between x and the nearest affine plane, there is

|x − x′| ≤ |u_x − x|. (7.86)

Let ω = y + v = y + δn βn ∈ L. We prove that

|x − ω|^2 = |x − x′|^2 + |x′ − ω|^2. (7.87)

Because x − x′ = (xn − δn)βn* and x′ − ω = x′ − v − y ∈ U, we have (x − x′) ⊥ (x′ − ω); therefore, by the Pythagorean theorem, (7.87) holds. By induction, we have (see (7.79))

|x − ω|^2 ≤ (1/4)(|β1*|^2 + |β2*|^2 + · · · + |βn*|^2).

By (7.71),

|βi*|^2 ≤ 2^{n−i}|βn*|^2.

Thus

|x − ω|^2 ≤ (1/4)|βn*|^2(1 + 2 + 2^2 + · · · + 2^{n−1}) = (1/4)(2^n − 1)|βn*|^2 ≤ 2^{n−2}|βn*|^2.

There is

|x − ω| ≤ 2^{n/2−1}|βn*|, (7.88)

that is, (7.84) holds. To prove (7.83), we distinguish two cases, with C_n = 2^{n/2}.

Case 1: if u_x ∈ U + v. In this case, u_x − v ∈ L′, and by the induction hypothesis in dimension n − 1, the lattice point y of L′ closest to x′ − v satisfies

|x′ − ω| = |x′ − v − y| ≤ C_{n−1}|x′ − u_x| ≤ C_{n−1}|x − u_x|.

By (7.87) and (7.86), we have

|x − ω| ≤ (1 + C²_{n−1})^{1/2}|x − u_x| < C_n|x − u_x|.

The proposition holds.

Case 2: If u_x ∉ U + v, then

|x − u_x| ≥ (1/2)|βn*|.

By (7.88), we get

|x − ω| < 2^{n/2}|x − u_x|.

Thus, Theorem 7.7 holds.
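The recursion in this proof is Babai's nearest plane algorithm; a compact iterative sketch in Python (the Gram–Schmidt step is inlined, and the basis and target are hypothetical):

```python
import numpy as np

def nearest_plane(B, x):
    """Babai's nearest plane method, a sketch of (7.82): peel off one
    coordinate at a time using the Gram-Schmidt vectors beta_i*."""
    n = B.shape[1]
    # Gram-Schmidt orthogonalization of the columns of B.
    Bs = np.zeros_like(B, dtype=float)
    for i in range(n):
        v = B[:, i].astype(float)
        for j in range(i):
            v -= (B[:, i] @ Bs[:, j]) / (Bs[:, j] @ Bs[:, j]) * Bs[:, j]
        Bs[:, i] = v

    w = np.zeros_like(x, dtype=float)   # accumulated lattice vector omega
    t = x.astype(float)
    for i in range(n - 1, -1, -1):
        c = round(t @ Bs[:, i] / (Bs[:, i] @ Bs[:, i]))  # nearest integer
        w += c * B[:, i]
        t -= c * B[:, i]
    return w

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])              # hypothetical reduced basis
x = np.array([2.4, 3.3])
print(nearest_plane(B, x))
```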

Comparing Theorems 7.6 and 7.7: when x = 0, the approximation coefficient of Theorem 7.6 is 2^{(n−1)/2}; for general x ∈ R^n, there is an additional factor √2 in the approximation coefficient. Using the rounding off technique, we can also give an approximation to the nearest vector; another main result of this section is:

Theorem 7.8 Let B = [β1, β2, . . . , βn] be a Reduced basis of L, let x ∈ R^n be given arbitrarily, let u_x ∈ L be the lattice point closest to x, and let [x]_B be given by Eq. (7.81); then ω = [x]_B ∈ L, and

|x − [x]_B| ≤ (1 + 2n(9/2)^{n/2})|x − u_x|. (7.89)

By Theorem 7.8, [x]_B ∈ L is an approximation of the nearest lattice point u_x, with approximation coefficient γ1(n) = 1 + 2n(9/2)^{n/2}. It is a little worse than the approximation coefficient produced by the adjacent plane method, but the approximation vector is simpler to compute; in a lattice cryptosystem, [x]_B as input information has higher efficiency. To prove Theorem 7.8, we need the following Lemma.

Lemma 7.37 Let B = [β1, β2, . . . , βn] be a Reduced basis of R^n and θk the angle between the vector βk and the subspace Uk, where

Uk = Σ_{i≠k} Rβi. (7.90)

Then for each k, 1 ≤ k ≤ n, we have

sin θk ≥ (√2/3)^n. (7.91)

Proof For given 1 ≤ k ≤ n and ∀ m ∈ Uk, we prove

|βk| ≤ (9/2)^{n/2}|m − βk|, m ∈ Uk. (7.92)

Because

sin θk = min_{m∈Uk} |m − βk|/|βk|,

(7.92) implies (7.91), and the Lemma holds. To prove (7.92), let {β1*, β2*, . . . , βn*} be the orthogonal basis corresponding to the Reduced basis {β1, β2, . . . , βn}; then m ∈ Uk can be expressed as

m = Σ_{i≠k} ai βi = Σ_{j=1}^{n} bj βj*, ai, bj ∈ R.

Write

m = (a1, . . . , an)(β1, . . . , βn)′ = (a1, . . . , an)U(β1*, . . . , βn*)′,

where ak = 0 and U is the transition matrix of the Gram–Schmidt orthogonalization (see (7.55)). Then for any 1 ≤ j ≤ n and 1 ≤ k ≤ n, there is

bj = Σ_{i≠k} ai uij, βk = Σ_{i=1}^{n} uki βi*.

So

m − βk = Σ_{j=1}^{n} γj βj*, where γj = bj − ukj.

Now set ak = −1; then

γj = Σ_{i=1}^{n} ai uij = aj + Σ_{i=j+1}^{n} ai uij. (7.93)

Therefore, Eq. (7.92) can be rewritten as

|βk|^2 = Σ_{j=1}^{k} u²_{kj}|βj*|^2 ≤ (9/2)^{n/2} Σ_{j=1}^{n} γj²|βj*|^2. (7.94)

Let us first prove the following assertion:

Σ_{j=k}^{n} γj² ≥ (2/3)^{2(n−k)}. (7.95)

If the above formula does not hold, i.e.,

Σ_{j=k}^{n} γj² < (2/3)^{2(n−k)},

then for all j, k ≤ j ≤ n, there is

γj² < (2/3)^{2(n−k)} ⟹ |γj| < (2/3)^{n−k}. (7.96)

By (7.93),

γn = an,
γ_{n−1} = a_{n−1} + an u_{n,n−1},
γ_{n−2} = a_{n−2} + a_{n−1}u_{n−1,n−2} + an u_{n,n−2},
· · ·
γk = ak + a_{k+1}u_{k+1,k} + · · · + an u_{n,k}.

We can prove

|aj| < (3/2)^{n−j} · (2/3)^{n−k}. (7.97)

When j = n, an = γn, so (7.96) ensures that (7.97) holds. By reverse induction on j (k ≤ j ≤ n), by (7.93),

|aj| = |γj − Σ_{i=j+1}^{n} ai uij| ≤ |γj| + Σ_{i=j+1}^{n} |ai|/2
< (2/3)^{n−k} + (1/2) Σ_{i=j+1}^{n} (3/2)^{n−i}(2/3)^{n−k}
= (2/3)^{n−k} + (1/2)(2/3)^{n−k} Σ_{i=0}^{n−j−1} (3/2)^{i}
= (2/3)^{n−k} + (2/3)^{n−k}((3/2)^{n−j} − 1)
= (3/2)^{n−j}(2/3)^{n−k}.

Therefore, under the assumption of (7.96), we have (7.97). Take j = k in (7.97); then |ak| < 1, but ak = −1. This contradiction shows that Formula (7.96) does not hold; thus (7.95) holds.

We now prove Formula (7.94) to complete the proof of the Lemma. By Lemma 7.33,

|βk*|^2 ≥ 2^{j−k}|βj*|^2, 1 ≤ j ≤ k ≤ n,

and

|βk*|^2 ≤ 2^{j−k}|βj*|^2, 1 ≤ k ≤ j ≤ n.

Therefore, the left side of Eq. (7.94) is estimated by

Σ_{j=1}^{k} u²_{kj}|βj*|^2 ≤ |βk*|^2 Σ_{j=1}^{k} u²_{kj}2^{k−j} ≤ (1/4)|βk*|^2 Σ_{j=1}^{k} 2^{k−j} = (1/4)|βk*|^2(2^k − 1) < 2^k|βk*|^2.

On the other hand, the right side of (7.94) is estimated by

Σ_{j=1}^{n} γj²|βj*|^2 ≥ Σ_{j=k}^{n} γj²|βj*|^2 ≥ Σ_{j=k}^{n} γj²2^{k−j}|βk*|^2 ≥ 2^{k−n}|βk*|^2 Σ_{j=k}^{n} γj² ≥ 2^{k−n}(2/3)^{2(n−k)}|βk*|^2 ≥ (2/9)^{n/2}|βk*|^2.

Thus (7.94) holds; we complete the proof of Lemma 7.37.

Now we give the proof of 7.8:

Proof (The proof of Theorem 7.8) Let B = {β1 , β2 , . . . , βn } be a Reduced basis of


lattice L = L(B), 1 ≤ k ≤ n given, Uk is a linear subspace generated by B − {βk },
by Lemma 7.37, we have
  n2
9
|β1 | ≤ |m − βk |, ∀ m ∈ Uk . (7.98)
2

Let x ∈ Rn , ω = [x] B ∈ L, then


n
1
x − ω = x − [x] B = ci βi , |ci | ≤ (1 ≤ i ≤ n).
i=1
2

Let u x be the nearest grid point to x in L, and let


7.5 Approximation of SVP and CVP 307


n
ux − ω = ai βi , ai ∈ Z.
i=1

We prove
  n2
9
|u x − ω| ≤ 2n |u x − x|. (7.99)
2

Might as well make u x = ω, and suppose

|ak βk | = max |a j β j | > 0.


1≤ j≤n

Obviously,
|u x − ω| ≤ n|ak βk |. (7.100)

On the other hand,


n
u x − x = (u x − ω) + (ω − x) = (ai + ci )βi = (ak + ck )(βk − m).
i=1

where
1 
m=− (a j + c j )β j ∈ Uk .
ak + ck j =k

By (7.99),
  n2
1 2
|u x − x| = |ak + ck ||βk − m| ≥ |βk ||ak |.
2 9

There is   n2
9
|ak βk | ≤ 2 |u x − x|.
2

So   n2
9
|u x − ω| ≤ 2n |u x − x|.
2

That is (7.99) holds, finally,



  n2 
9
|x − ω| ≤ |x − u x | + |u x − ω| ≤ 1 + 2n |x − u x |.
2

We complete the proof of Theorem 7.8.


308 7 Lattice-Based Cryptography

7.6 GGH/HNF Cryptosystem

Lattice-based cryptosystem is the main research object of postquantum cryptography.


Since it was first proposed in 1996, it has only a history of more than 20 years.
Among them, the representative technologies are Ajtai-Dwork cryptosystem, GGH
cryptosystem, McEliece-Niederreiter cryptosystem and NTRU cryptosystem based
on algebraic code theory. We will introduce them, respectively, below.
GGH cryptosystem is a cryptosystem based on lattice theory proposed by Gol-
dreich, Goldwasser and Halevi in 1997. It is generally considered that it is a new
public key cryptosystem to replace RSA in the postquantum cryptosystem era.
Let L ⊂ Zn be an integer lattice, B and R are two generating matrices of L, that
is
L = L(B) = L(R).

Because there is a unique HNF base in L (see Lemma 3.4). Let B = HNF(L) be
HNF matrix, B as public key and R as private key. Let v ∈ Zn be an integer point,
e ∈ Rn is an error vector. Let σ be a parameter vector. Take e = σ or e = −σ , they
each chose with a probability of 21 .
Encryption: for the plaintext v ∈ Zn encoded and input and the error vector ran-
domly selected according to the parameter vector σ , the public key B is used for
encryption. The encryption function f B,σ is defined as

f B,σ (v, e) = Bv + e = c ∈ Rn . (7.101)

Decryption: decrypt cryptosystem text c with private key R, because c ∈ Rn ,


R = [α1 , α2 , . . . , αn ], then c can be expressed in {α1 , α2 , . . . , αn } linearity,


n
c= xi αi , xi ∈ R.
i=1

Let δi be the nearest integer to xi , define (see (7.81))


n
[c] R = δi αi ∈ L . (7.102)
i=1

Define the decryption function as



−1
f B,σ (c) = B −1 [c] R = v,
(7.103)
e = c − Bv.

−1
In order to verify the correctness of decryption function f B,σ , we first prove the
following simple Lemma. For any x ∈ R , and R = [α1 , α2 , . . . , αn ] ∈ Rn×n is any
n

set of bases of Rn , if x = (a1 , a2 , . . . , an ) ∈ Rn , γi represents the integer closest to


7.6 GGH/HNF Cryptosystem 309

ai , then define (see (7.7))

[x] = (γ1 , γ2 , . . . , γn ) ∈ Zn . (7.104)


"n
Write x = i=1 xi αi , δi is the nearest integer to xi , then define (see (7.102))


n
[x] R = δi αi ∈ L(R). (7.105)
i=1

Lemma 7.38 For ∀ x ∈ Rn , R ∈ Rn×n is a set of bases of Rn , we have

[x] R = R[R −1 x].

Proof Write
⎡ ⎤ ⎡ ⎤
a1 δ1
⎢ a2 ⎥ ⎢ δ2 ⎥
⎢ ⎥ ⎢ ⎥ 1
x = ⎢ . ⎥ ∈ Rn ⇒ [x] = ⎢ . ⎥ ∈ Zn , |ai − δi | ≤ .
.
⎣ . ⎦ .
⎣ . ⎦ 2
an δn
"n
If x = i=1 xi αi , R = [α1 , α2 , . . . , αn ], then
⎡⎤ ⎡ ⎤
x1 δ1
⎢ x2 ⎥ ⎢ δ2 ⎥
⎢ ⎥ ⎢ ⎥
x = R ⎢ . ⎥ , and [x] R = R ⎢ . ⎥ , δi is the nearest integer to xi .
⎣ .. ⎦ ⎣ .. ⎦
xn δn

Thus ⎡ ⎤
δ1
⎢ δ2 ⎥
⎢ ⎥
R −1 [x] R = ⎢ . ⎥ = [R −1 x].
⎣ .. ⎦
δn

Lemma 7.38 holds.

Theorem 7.9 Let L = L(R) = L(B) ⊂ Zn be an integer lattice, B is the public


key, R is the private key, v ∈ Zn is plaintext, e is the error vector. If and only if
[R −1 e] = 0,
−1
f B,σ (c) = v.

Proof By the definition, cryptosystem text c = Bv + e = f B,σ (v, e), and


−1
f B,σ (c) ≡ B −1 [c] R = B −1 R[R −1 c] = T [R −1 c]. (7.106)
310 7 Lattice-Based Cryptography

where T = B −1 R ∈ Rn×n is a unimodular matrix. Because L(B) = L(R), ⇒

B = RU, U ∈ S Ln(Z).

So
B −1 R = U R −1 R = U = T,

that is T is a unimodular matrix. By (7.106),

T [R −1 c] = T [R −1 (Bv + e)]
= T [R −1 Bv + R −1 e]
= T [T −1 v + R −1 e].

Because T is a unimodular matrix, v ∈ Zn , so

[T −1 v + R −1 e] = T −1 v + [R −1 e]. (7.107)

Thus
T [R −1 c] = v + T [R −1 e].

That is
−1
f B,σ (c) = v + T [R −1 e].

Because T is a unimodular matrix, T [R −1 e] = 0 ⇔ [R −1 e] = 0, so the Theorem


holds.

By Theorem 7.9, whether the GGH cryptographic mechanism is correct or not


depends entirely on whether [R −1 e] is a 0 vector, that is
−1
f B,σ (c) = v ⇔ [R −1 e] = 0. (7.108)

Therefore, when the private key R is given, the selection of error vector e and param-
eter vector σ becomes the key to the correctness of GGH password. Notice that
(7.106), if we decrypt with public key B, then

[B −1 c] = [B −1 (Bv + e)] = [v + B −1 e] = v + [B −1 e].

Therefore, the basic condition for the security and accuracy of GGH password is

[R −1 e] = 0
(7.109)
[B −1 e] = 0.

Because the public key B we choose is HNF matrix, [B −1 e] = 0 is easy to satisfy.


Let B = (bi j )n×n ⇒ B −1 = (ci j )n×n . Where cii = bii−1 . Let e = (e1 , e2 , . . . , en ),
7.6 GGH/HNF Cryptosystem 311

each ei has the same absolute value, that is |ei | = σ , σ is the parameter. Thus,
2|en | > bnn ⇒ [B −1 e] = 0. Let’s focus on [R −1 e] = 0.
∀ x = (x1 , x2 , . . . , xn ) ∈ Rn , define the L 1 norm |x|1 and L ∞ norm |x|∞ of x as


n
|x|∞ = max |xi |, |x|1 = |xi |. (7.110)
1≤i≤n
i=1

⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥
Lemma 7.39 Let R ∈ Rn×n be a reversible square matrix, R −1 = ⎢ . ⎥, where αi
⎣ .. ⎦
αn
−1
is the row vector of R . e = (e1 , e2 , . . . , en ) ∈ R , |ei | = σ , ∀ 1 ≤ i ≤ n, let
n

ρ = max |αi |(|αi |1 ) (7.111)


1≤i≤n

be the maximum of the L 1 norm of n row vectors of R −1 , then when σ < 1



, we have
[R −1 e] = 0.

Proof Suppose αi = (ci1 , ci2 , . . . , cin ), the i-th component of R −1 e can be written
as  n 
   n
 
 ci j e j  ≤ σ |ci j | = σ |αi |∞ ≤ σρ.
 
i=1 j=1

If σ < 1

, then each component of R −1 e is < 21 , there is [R −1 e] = 0.
⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥ √γ ,
Lemma 7.40 R ∈ Rn×n , R −1 = ⎢ . ⎥, let max1≤i≤n |αi |∞ = then the prob-
⎣ .. ⎦ n

αn
ability of [R −1 e] = 0 is
 
1
P{[R −1 e] = 0} ≤ 2n exp − . (7.112)
8σ 2 γ 2

where σ is the parameter, error vector e = (e1 , . . . , en ), |ei | = σ .


⎡ ⎤
a1
⎢ a2 ⎥ "
⎢ ⎥
Proof Let R −1 = (ci j )n×n , R −1 e = ⎢ . ⎥, where ai = nj=1 ci j e j .
.
⎣ . ⎦
an
Because |ci j | ≤ √γn , |e j | = σ , then ci j e j is in interval [− √
γσ √
n
, γ σn ]; therefore, by
Hoeffding inequality, we have
312 7 Lattice-Based Cryptography
⎧  ⎫

1
 ⎨ n 
 1⎬

1

P |ai | > =P  
ci j e j  > < 2 exp − 2 2 .
2 ⎩  2⎭ 8σ γ
j=1

To satisfy [R −1 e] = 0, then only one of the above conditions {|ai | > 21 } is true. Thus

n  
 1
P{[R −1 e] = 0} = P |ai | >
i=1
2
n  
1
≤ P |ai | >
i=1
2
 
1
< 2n exp − 2 2 .
8σ γ

The Lemma holds.

Corollary 7.8 For any given ε > 0, when parameter σ satisfies


 2 −1
2n
σ ≤ γ 8 log ⇒ P{[R −1 e] = 0} < ε. (7.113)
ε

In order to have a direct impression of Eq. (7.113), let’s give an example. Let n = 120,
ε = 10−5 , when the elements of matrix R −1 = (ci j )n×n change in the interval [−4, 4],
that is −4 ≤ ci j ≤ 4, then it can be verified that the maximum L ∞ norm of the row
vector of R −1 is approximately equal to 30×√ 1
, thus γ = 301
, by Corollary, when
 120
−1
σ ≤ ( 30 8 log 240 × 10 ) ≈ 11.6 ≈ 2.6, we have
1 5 30

P{[R −1 e] = 0} < 10−5 .

It can be seen from the above analysis that GGH cryptosystem does not effectively
solve the selection of private key R, public key B, especially parameter σ and error
vector. In 2001, Professor Micciancio of the University of California, San Diego
further improved GGH cryptosystem by using HNF basis and adjacent plane method.
In order to introduce GGH/HNF cryptosystem, we review several important results
in the previous sections.

Lemma 7.41 Let L = L(B) ⊂ Rn be a lattice, B = [β1 , β2 , . . . , βn ] is the generat-


ing base, B ∗ = [β1∗ , β2∗ , . . . , βn∗ ] is the corresponding orthogonal basis, λ1 = λ(L)
is the minimum distance of L, then
(i)
λ1 = λ(L) ≥ min |βi∗ |. (7.114)
1≤i≤n

For L = L(B), take parameter ρ = ρ(B) as


7.6 GGH/HNF Cryptosystem 313

1
ρ= min |β ∗ |. (7.115)
2 1≤i≤n i

Then for any x ∈ Rn , there is at most one grid point

α ∈ L ⇒ |x − α| < ρ. (7.116)

(ii) Suppose L ⊂ Zn is an integer lattice, then L has a unique HNF base B, that is
L = L(B), B = (bi j )n×n , satisfies

0 ≤ bi j < bii , when 1 ≤ i < j ≤ n, bi j = 0, when 1 ≤ j < i ≤ n.

That is, B is an upper triangular matrix, and the corresponding orthogonal basis
B ∗ of B is a diagonal matrix, that is

B ∗ = diag{b11 , b22 , . . . , bnn }.

Proof Equation (7.114) is given by Lemma 7.18 and the property (ii) is given by
Lemma 7.26. We only prove that if there is lattice point α ∈ L ⇒ |x − α| < ρ, then
α is the only one. Let α1 ∈ L , α2 ∈ L, and

|α1 − x| < ρ, |α2 − x| < ρ ⇒ |α1 − α2 | < 2ρ = min |βi∗ | ≤ λ1 .


1≤i≤n

Because α1 − α2 ∈ L, this contradicts the definition of λ1 . There is α1 = α2 .

In the previous section, we introduced Babai’s adjacent plane method (see (7.82)).
The distance between two subsets A1 and A2 in Rn is defined as

|A1 − A2 | = min{|x − y||x ∈ A1 , y ∈ A2 }.

x ∈ Rn is a vector, A ⊂ Rn is a subset, the distance between x and A is defined as

|x − A| = min{|x − y||y ∈ A}.

Suppose L ∈ Rn , B = [β1 , β2 , . . . , βn ] is a generating base, B ∗ = [β1∗ , β2∗ , . . . , βn∗ ]


is the corresponding orthogonal basis. Define subspace
 "n−1
U = L(β1 , β2 , . . . , βn−1 ) = Rn−1 , L = i=1 Zβi is a sub-lattice.
Av = U + v, v ∈ L.

Av is called an affine plane with v as the representative element. Any x ∈ Rn , let Av


be the affine plane closest to x, that is

|x − Av | = min{|x − Aα ||α ∈ L}.


314 7 Lattice-Based Cryptography

Let x be the orthogonal projection of x on Av . Because x − v ∈ U = Rn−1 . Recur-


sively let y ∈ L be the nearest lattice point to x − v. Then we define the adjacent
plane operator τ B of x under base B as

τ B (x) = w = y + v ∈ L . (7.117)

Lemma 7.42 Under the above definition, if v1 , v2 ∈ L, and Av1 = Av2 , then

|Av1 − Av2 | ≥ |βn∗ |. (7.118)

Proof v1 , v2 ∈ L, then it can be given by the linear combination of {β1∗ , β2∗ , . . . , βn∗ },
that is  "n
v1 = i=1 ai βi∗ , where ai ∈ R, an ∈ Z.
"n
v2 = i=1 bi βi∗ , where bi ∈ R, bn ∈ Z.

In order to prove the n-th component, an and bn are integers, let


n 
n
v1 = ai∗ βi , v2 = bi∗ βi , ai∗ , bi∗ ∈ Z.
i=1 i=1

Therefore,
v1 , βn∗  an∗ βn , βn∗ 
an = = = an∗ ∈ Z.
|βn∗ |2 |βn∗ |2

The above equation uses Eq. (7.52), which can prove bn ∈ Z in the same way. By
condition v1 − v2 ∈
/ U , then an = bn , therefore

|Av1 − Av2 | = |an − bn ||βn∗ | ≥ |βn∗ |.

We have completed the proof of Lemma.

"n 7.43 ∗Under the above definitions and symbols, suppose x ∈ R ,


n
Lemma
x = i=1 γi βi , δ is the nearest integer to γn , then
(i)

n−1
v = δβn , x = γi βi∗ + δβn∗ . (7.119)
i=1

That is, the affine plane closest to x is Aβn , the orthogonal projection of x on Av
is x .
(ii) Let u x ∈ L be the lattice point closest to x, then

|x − x | ≤ |x − u x |. (7.120)
7.6 GGH/HNF Cryptosystem 315

Proof Take v = δβn , then v ∈ L,"we want to prove that the distance between x and
n
Av is the smallest. Because x = i=1 γi βi∗ , so (see (7.119))


n−1
x −v = γi βi∗ + (γn − δ)βn∗ ,
i=1

1 ∗
=⇒ |x − Av | = |x − v − U | ≤ |γn − δ||βn∗ | ≤ |β |.
2 n
Let v1 ∈ L , v − v1 ∈
/ U , by trigonometric inequality,

1 1
|x − Av1 | ≥ |Av1 − Av | − |x − Av | ≥ |βn∗ | − |βn∗ | = |βn∗ | ≥ |x − Av |.
2 2

So it is correct to take v = δβn . Secondly, we prove that the orthogonal projection x


of x and affine plane Av is

n−1
x = γi βi∗ + δβn∗ .
i=1

Let’s first prove x ∈ Av . Because v = δβn , and


n−1 
n−1
βn = ci βi∗ + βn∗ ⇒ δβn = δci βi∗ + δβn∗ = v. (7.121)
i=1 i=1

Thus

n−1
x −v = (γi − δci )βi∗ ∈ U.
i=1

That is x ∈ U + v = Av . And x − x = δβn∗ ⇒ (x − x )⊥U . Because


3
U Av = ∅.

Then Av and U are two parallel planes, thus (x − x )⊥Av . This proves that the
orthogonal projection of x on Av is x , and thus (i) holds.
The proof of (ii) is direct. By the definition of x and any affine plane Aα , the
distance of α ∈ L satisfies
|x − α| ≥ |x − Aα |.

When α = v, because (x − x )⊥Av , thus

|x − x | = |x − Av | ≤ |x − Aα |, ∀ α ∈ L .

Let u x ∈ L be the lattice point closest to x, then take α = u x , there is


316 7 Lattice-Based Cryptography

|x − x | ≤ |x − Au x | ≤ |x − u x |.

The Lemma holds.

Lemma 7.44 Let L = L(B) ⊂ Rn be a lattice, x ∈ Rn , α ∈ L. If |x − α| < ρ,


where ρ = 21 min{|βi∗ ||1 ≤ i ≤ n}, then the nearest plane operator τ B has

τ B (x) = α. (7.122)

Proof Because of
|x − Aα | ≤ |x − α| < ρ.

By Lemma 7.42, Aα is the plane Av closest to x, that is Aα = Av . And τ B (x) = w =


y + v, then we have
|x − w| ≤ |x − α| < ρ. (7.123)

By Lemma 7.41, we have α = w = τ B (x). The Lemma holds!

Now let’s introduce the workflow of GGH/HNF password:


1. L = L(B) = L(R) ⊂ Zn is an integer lattice, R = [r1 , r2 , . . . , rn ] is the private
key, B = [β1 , β2 , . . . , βn ] is the public key, and is the HNF basis of L, where

B ∗ = diag{b11 , b22 , . . . , bnn }.

We choose the private key R as a particularly good base, that is ρ = 1


2
min{|ri∗ ||1 ≤
i ≤ n}. Specially, public key B satisfies

1
bii < ρ, ∀ 1 ≤ i ≤ n.
2
2. Let v ∈ Zn be an integer, e ∈ Rn is the error vector, satisfies |e| < ρ.
3. Encryption: after any plaintext information v ∈ Zn and error vector e are selected,
with ρ as the parameter, the encryption function f B,ρ is defined as

f B,ρ (v, e) = Bv + e = c.

4. Decryption: We decrypt cryptosystem text c with private key R. Decryption is


transformed into
−1
f B,ρ (v, e) = B −1 τ R (c).

where τ R is the nearest plane operator defined by R.

By Lemma 7.44, when |e| < ρ, ⇒ |c − Bv| = |e| < ρ, thus

B −1 τ R (c) = B −1 τ R (Bv + e) = B −1 Bv = v. (7.124)


7.6 GGH/HNF Cryptosystem 317

This ensures the correctness of decryption.


Comparing GGH with GGH/HNF, they choose the same encryption function, but
the decryption transformation is very different. GGH adopts Babai’s rounding off
method, while GGH/HNF adopts Babai’s nearest plane method. There is a certain
difference between the two at the selection point of error vector e. The error vector
e of GGH depends on each component of parameter σ and e, and ±σ . The error
vector e of GGH/HNF depends on the parameter ρ as long as the length is less than
ρ. Therefore, GGH/HNF has greater flexibility in the selection of error vector e.
Next, we explain the reason why public key B chooses HNF basis. For any
entire lattice L = L(B) ⊂ Zn , B ∗ = [β1∗ , β2∗ , . . . , βn∗ ] is the corresponding orthog-
onal basis. Using the congruence relation mod L, we define an equivalent relation
in Rn , which is also the equivalent relation between integral points in Zn . By Lemma
7.24, quotient group Zn /L is a finite group, and |Zn /L| = d(L). We further give a
set of representative elements of Zn /L. Let
 

n
F(B ∗ ) = xi βi∗ |0 ≤ xi < 1 (7.125)
i=1

be a parallelogram, it can be compared with the base area F = F(B) of Rn /L (see


Lemma 7.16).  n 

F = F(B) = xi βi |0 ≤ xi < 1 .
i=1

F is just a quadrilateral.

Lemma 7.45 For any integer point α ∈ Zn , there is a unique w ∈ F(B ∗ ) such that

α ≡ w(mod L).

Proof α ∈ Zn is a integer point, then α can be expressed as a linear combination of


B ∗ , write
n
α= ai βi∗ , ai ∈ R.
i=1

[ai ] represents the largest integer not greater than ai , Suppose


n 
n
w= ai βi∗ − [ai ]βi . (7.126)
i=1 i=1

Then

n
α−w = [ai ]βi ∈ L ⇒ α ≡ w(mod L).
i=1
318 7 Lattice-Based Cryptography

We prove that w ∈ F(B ∗ ), linearly express w with the basis vector of B ∗ ,


n
w= bi βi∗ .
i=1

We can only prove that 0 ≤ bi < 1. By (7.52), it is not difficult to have

w, βn∗  (an − [an ])|βn∗ |2


bn = = = an − [an ].
|βn∗ |2 |βn∗ |2

Thus 0 ≤ bn < 1, It is not difficult to verify that ∀ 1 ≤ i ≤ n, we have 0 ≤ bi < 1


by induction, that is w ∈ F(B ∗ ). To prove uniqueness. Let


n
w= ai βi∗ , where |ai | < 1.
i=1

We prove that if
w = 0(mod L) ⇔ w = 0. (7.127)
"n
Write w = i=1 bi βi , then by (7.52), there is

w, βn∗  bn |βn∗ |2


an = ∗ ∗
= = bn .
βn , βn  |βn∗ |2

Because of w ∈ L and |bn | < 1 ⇒ bn = 0. It is not difficult to have b1 = b2 = · · · =


bn = 0 by induction. That is w = 0, (7.127) holds.
α ∈ Zn , if w1 ∈ F(B ∗ ), w2 ∈ F(B ∗ ), α ≡ w1 (mod L), α ≡ w2 (mod L), then

w1 − w2 ≡ 0(mod L).

By (7.127), there is w1 = w2 . As can be seen from the above, w1 ∈ F(B ∗ ), w2 ∈


F(B ∗ ), then when w1 = w2 , there is w1 ≡ w2 (mod L), that is, the points in F(B ∗ )
are not congruent under mod L, the Lemma holds.

From the above lemma, any two points in parallelogram F(B ∗ ) are not congruent
mod L, therefore, for not congruent lattice points α1 , α2 ∈ L, then

{F(B ∗ ) + α1 } ∩ {F(B ∗ ) + α2 } = ∅.

Thus, Rn can be split into


Rn = ∪α∈L F(B ∗ ) + α. (7.128)

By Lemma 7.45, any α ∈ Zn , there exists a unique w ∈ F(B ∗ ) ⇒ α ≡ w(mod L),


define
7.6 GGH/HNF Cryptosystem 319

w = α mod L .

Then α → α mod L gives a surjection of Zn → Zn ∩ F(B ∗ ), this mapping is a 1-1


correspondence of Zn /L → Zn ∩ F(B ∗ ). Because if α, β ∈ Zn , then

α ≡ β(mod L) ⇒ α mod L = β mod L ∈ Zn ∩ F(B ∗ )
α ≡ β(mod L) ⇒ α mod L = β mod L .

By Lemma 7.24, we obviously have the following Corollary.


Corollary 7.9 If L = L(B) ⊂ Zn is an integer lattice, then F(B ∗ ) ∩ Zn is a repre-
sentative element set of Zn /L, and

|F(B ∗ ) ∩ Zn | = d(L). (7.129)

If B is the HNF basis of the whole lattice L, then B ∗ = diag{b11 , b22 , . . . , bnn },
thus, parallelogram F(B ∗ ) takes the simplest form:

F(B ∗ ) = {(x1 , x2 , . . . , xn )|0 ≤ xi < bii }. (7.130)

This is a cube with a volume of d(L). Thus

Zn /L = F(B ∗ ) ∩ Zn = {(x1 , x2 , . . . , xn )|0 ≤ xi < bii , xi ∈ Z}. (7.131)

This is another proof of Lemma 7.24.


α mod L is called the reduction vector of α under module L, for any α ∈ Zn ,
express that the number of bits of the reduction vector α mod L is


n 
log bii = log (bii ) = log d(L). (7.132)
i=1

To sum up, the parallelogram of the HNF basis of L has a particularly simple
geometry, which is actually a cube, which is very helpful for calculating the reduction
vector x mod L of an entire point x ∈ Zn , the reduction vector is of great significance
in the further improvement and analysis of GGH/HNF cryptosystem. For detailed
work, please refer to D. Micciancio’s paper (Micciancio, 2001) in 2001.

7.7 NTRU Cryptosystem

NTRU cryptosystem is a new public key cryptosystem proposed in 1996 by the


number theory research unit (NTRU) composed of three digit theorists J. Hoffstein,
J. Piper and J. Silverman of Brown University in the USA. Its main feature is that
320 7 Lattice-Based Cryptography

the key generation is very simple, and the encryption and decryption algorithms are
much faster than the commonly used RSA and elliptic curve cryptography, NTRU, in
particular, can resist quantum computing attacks and is considered to be a potential
public key cryptography that can replace RSA in the postquantum cryptography era.
The essence of NTRU cryptographic design is the generalization of RSA on
polynomials, so it is called the cryptosystem based on polynomial rings. However,
NTRU can give a completely equivalent form by using the concept of q-ary lattice, so
NTRU is also a lattice based cryptosystem. For simplicity, we start with polynomial
rings.
Let Z[x] be a polynomial ring with integral coefficients and N ≥ 1 be a positive
integer. We define the polynomial quotient ring R as

R = Z[x]/x N − 1 = {a0 + a1 x + · · · + a N −1 x N −1 |ai ∈ Z}.

Any F(x) ∈ R, F(x) can be written as an entire vector,

N −1

F(x) = Fi x i = (F0 , F1 , . . . , FN −1 ) ∈ Z N . (7.133)
i=0

In R, we define a new operation ⊗ called the convolution of two polynomials. Let

N −1
 N −1

F(x) = Fi x i , G(x) = Gi x i .
i=0 i=0

Define
N −1

F ⊗ G = H (x) = Hi x i = (H0 , H1 , . . . , HN −1 ).
i=0

For any k, 0 ≤ k ≤ N − 1,


k N −1

Hk = Fi G k−i + Fi G N +k−i
i=0 i=k+1
 (7.134)
= Fi G j .
0≤i<N
0≤ j<N
i+ j≡k(mod N )

Lemma 7.46 Under the new multiplication, R is a commutative ring with unit ele-
ments.

Proof By (7.134),

F ⊗ G = G ⊗ F, F ⊗ (G + H ) = F ⊗ G + F ⊗ H.
7.7 NTRU Cryptosystem 321

So R forms a commutative ring under ⊗.


If a ∈ Z, 0 ≤ a ≤ N − 1, is a constant polynomial in R, then

a ⊗ F = a F = (a F0 , a F1 , . . . , a FN −1 ).

Therefore, R has the unit element a = 1. The Lemma holds..


Let F(x) = (F0 , F1 , . . . , FN −1 ) ∈ R. Define

N −1
1 
F̃ = Fi , is arithmetic mean of the coefficients of F. (7.135)
N i=0

The L 2 norm (European norm) and L ∞ norm of F are defined as


 " N −1 1
|F|2 = ( i=0 (Fi − F̃)2 ) 2
(7.136)
|F|∞ = max0≤i≤N −1 Fi − min0≤i≤N −1 Fi .

Definition 7.11 Let d1 , d2 be two positive integers, and d1 + d2 ≤ N , define poly-


nomial set A(d1 , d2 ) as

A(d1 , d2 ) = {F ∈ R|F has d1 coefficients of 1, d2 coefficients of − 1,


other coefficients are 0}. (7.137)

Lemma 7.47 Let 1 ≤ d < [ N2 ],


(i) Suppose F ∈ A(d, d − 1), then
2
1
|F|2 = 2d − 1 − .
N

(ii) If F ∈ A(d, d), then √


|F|2 = 2d.

Proof If F ∈ A(d, d − 1), by (7.135), then F̃ = 1


N
, thus

N −1 
 2
1
(|F|2 ) =
2
Fi −
i=0
N
N −1 
 
2 1
= Fi2 − Fi + 2
i=0
N N
2 1 1
= 2d − 1 − + = 2d − 1 − ,
N N N

so (i) holds. If F ∈ A(d, d), then F̃ = 0, thus


322 7 Lattice-Based Cryptography

(|F|2 )2 = 2d, ⇒ |F|2 = 2d.

The Lemma holds.

The parameters of NTRU cryptosystem are three positive integers, N , q, p, where


1 ≤ p < q, and ( p, q) = 1, that is

parameter system = {(N , q, p)|1 ≤ p < q, and ( p, q) = 1}.

When the parameter (N , q, p) is selected, we will discuss the key generation of


NTRU.
Key generation. Each NTRU user selects two polynomials f ∈ R, g ∈ R, deg f =
deg g = N − 1, as private key. Take f = ( f 0 , f 1 , . . . , f N −1 ), g = (g0 , g1 , . . . , g N −1 )
as the row vector, then ( f, g) ∈ Z2N ⊂ R 2N . Where f mod q is reversible as a poly-
nomial on Zq and f mod p is reversible as a polynomial on Z p , that is ∃ Fq ∈
Zq [x], F p ∈ Z p [x] such that

Fq ⊗ f ≡ 1(mod q), and F p ⊗ f ≡ 1(mod p). (7.138)

When the private key ( f, g) is selected, the public key h is given by the following
formula:
h ≡ Fq ⊗ g(mod q). (7.139)

h can be regarded as a polynomial on Zq . Quotient rings Zq and Z p are


4 q q5
Zq = Z/qZ = a ∈ Z| − ≤ a < .
2 2
4 p p5
Z p = Z/ pZ = a ∈ Z| − ≤ a < .
2 2
Encryption transformation. User B wants to use NTRU to send encrypted information
m to user A. First, the plaintext m is encoded as m ∈ R, that is m ∈ Z N , then take
the value under mod p, that is
m ∈ Z Np .

Then select a polynomial φ ∈ R, deg φ = N − 1 at random, then use the public key
h of user A for encryption. The encryption function σ is

σ (m) = c ≡ pφ ⊗ h + m(mod q). (7.140)

c is the cryptosystem text received by user A, c is a polynomial on Zq and a vector


in ZqN .
Decryption transformation. After receiving cryptosystem text c, user A decrypts
it with its own private keys f and F p and first calculates
7.7 NTRU Cryptosystem 323

a ≡ f ⊗ c(mod q). (7.141)

a as a polynomial on Zq , that is, a ∈ ZqN is unique. Finally, the decryption transform


σ −1 is
σ −1 (c) ≡ a ⊗ F p (mod p). (7.142)

Why is the decryption transformation correct? If the parameter selection meets

pφ ⊗ h + m ∈ ZqN . (7.143)

Then
c = pφ ⊗ h + m. (7.144)

Similarly, if c ⊗ f ∈ ZqN , then a = f ⊗ c. By (7.142),

a ⊗ F p = F p ⊗ f ⊗ c ≡ c = pφ ⊗ h + m(mod p).

Thus
a ⊗ F p ≡ m(mod p).

Because m ∈ ZqN , so

σ −1 (c) ≡ a ⊗ F p ≡ m(mod p), ⇒ σ −1 (c) = m.

Therefore, the decryption transformation is correct under the conditions of (7.143)


and c ⊗ f ∈ ZqN .
NTRU’s encryption and decryption transformation cannot guarantee the correct
decryption of 100%. Because a is taken out as a polynomial under mod q for decryp-
tion operation (see (7.142)). To satisfy (7.144), and c ⊗ f ∈ ZqN , then the following
formula is necessary,

| f ⊗ c|∞ = | f ⊗ ( pφ ⊗ h + m)|∞ < q. (7.145)

Therefore, as a necessary condition, when the following formula holds, (7.145) holds.
q q
| f ⊗ m|∞ ≤ , and | pφ ⊗ g|∞ ≤ . (7.146)
4 4
Lemma 7.48 For any ε > 0, there are constants r1 and r2 > 0, depending only on ε
and N , for randomly selected polynomial F, G ∈ R, then the probability of satisfying
the following formula is ≥ 1 − ε, that is

P{r1 |F|2 |G|2 ≤ |F ⊗ G|∞ ≤ r2 |F|2 |G|2 } ≥ 1 − ε.

Proof See reference Hoffstein et al. (1998) in this chapter.


324 7 Lattice-Based Cryptography

By Lemma, to satisfy (7.146), we choose three parameters d f , dg and d, where

f ∈ A(d f , d f − 1), g ∈ A(dg , dg ), φ ∈ A(d, d). (7.147)

By Lemma 7.47, | f |2 , |g|2 and |φ|2 are known, we choose


q q
| f |2 · |m|2 ≈ , |φ|2 · |g|2 ≈ . (7.148)
4r2 4 pr2

Then, Eq. (7.146) can be guaranteed to be true (in the sense of probability), so that
the success rate of the decryption algorithm will be greater than 1 − ε. Thus, (7.148)
becomes the main parameter selection index of NTRU.
Next, we use the concept of q-element lattice to make an equivalent description
of the above NTRU. We first discuss it from the cyclic matrix. Let T and T1 be the
following two N -order square matrices.
⎛ ⎞ ⎛ ⎞
0 ··· 0 1 0
⎜ 0⎟ ⎜ 0 In−1 ⎟
⎜ ⎟ ⎜ ⎟
T =⎜ .. ⎟ , T1 = ⎜ .. ⎟.
⎝ In−1 . ⎠ ⎝. ⎠
0 10 0 0

Then T N = T1N = I N , T1 = T , and T1 = T −1 , because T is an orthogonal matrix


⇒ T1 = T N −1 , where I N is the N -th order identity matrix, let a = (a1 , a2 , . . . , a N ) ∈
R N , it is easy to verify
⎡ ⎤ ⎡ ⎤
a1 aN
⎢ a2 ⎥ ⎢ a1 ⎥
⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥
T · ⎢ a3 ⎥=⎢ a2 ⎥ , (a1 , a2 , a3 , . . . , a N )T1 = (a N , a1 , a2 , . . . , a N −1 ).
⎢ .. ⎥ ⎢ .. ⎥
⎣ . ⎦ ⎣ . ⎦
aN a N −1
⎡ ⎤ (7.149)
a1
⎢ a2 ⎥
⎢ ⎥
The following general assumptions a = ⎢ . ⎥ ∈ R N are the column vector. The
⎣ .. ⎦
aN
N -order cyclic matrix T ∗ (a) generated by a is defined as

T ∗ (a) = [a, T a, T 2 a, . . . , T N −1 a]. (7.150)

If b = (b1 , b2 , . . . , b N ) ∈ R N is a row vector, we define an N -order matrix


7.7 NTRU Cryptosystem 325
⎡ ⎤
b
⎢ bT1 ⎥
⎢ ⎥
T1∗ (b) = ⎢ .. ⎥. (7.151)
⎣ . ⎦
bT1N −1

In order to distinguish in the mathematical formula, T ∗ (a) and T1∗ (a) are some-
times written as T ∗ a and T1∗ a or [T ∗ a] and [T1∗ a]. Obviously, the transpose of T ∗ (a)
is ⎡ ⎤
a
⎢ a T1 ⎥
⎢ ⎥
(T ∗ (a)) = ⎢ .. ⎥ = T1∗ (a ). (7.152)
⎣ . ⎦
a T1N −1

Equation (7.150) is column vector blocking of cyclic matrix, in order to obtain row
vector blocking of cyclic matrix. For any x ∈ (x1 , . . . , x N ) ∈ R N , we let

x = (x N , x N −1 , . . . , x2 , x1 ) ⇒ x = x.

Similarly, define column vectors x. So for any column vector a ∈ R N , we have


⎡ ⎤
a T1
⎢ a T2 ⎥
⎢ 1 ⎥
T ∗ (a) = [a, T a, T 2 a, . . . , T N −1 a] = ⎢ . ⎥. (7.153)
⎣ .. ⎦
a T1N

On the right side of (7.153) is a cyclic matrix, which is partitioned by rows. We first
prove that the transpose of the cyclic matrix is still a cyclic matrix.
⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥
Lemma 7.49 ∀ a = ⎢ . ⎥ ∈ R N , then (T ∗ (a)) = T ∗ (T −1 a).
⎣ .. ⎦
αN
⎡ ⎤
α1
⎢ α2 ⎥
⎢ ⎥
Proof Let α = ⎢ . ⎥ ∈ R N , by (7.152), (T ∗ (a)) = T1∗ (a ), where
⎣ .. ⎦
αN
α = (α1 , . . . , α N ) is the transpose of α, let

β = (α1 , α N , α N −1 , . . . , α2 ) = α T1 .

Easy to verify
326 7 Lattice-Based Cryptography
⎡ ⎤
β
⎢ βT1 ⎥
⎢ ⎥
T1∗ (β) = ⎢ .. ⎥ = T ∗ (α).
⎣ . ⎦
βT1N −1

There is
T1∗ (β) = (T ∗ (β )) = T ∗ (α).

Because α = (α N , α N −1 , · · · , α2 , α1 ), and β = α T1 , so

β = T α ⇒ T −1 β = α ⇒ α = T −1 β .

We let a = β , then
(T ∗ (α)) = T ∗ (α) = T ∗ (T −1 α).

We have completed the proof of Lemma.


Next, we give an equivalent characterization of cyclic matrix.
⎡ ⎤
a11
⎢ a21 ⎥
⎢ ⎥
Lemma 7.50 Let A = (ai j ) N ×N , a = ⎢ . ⎥ ∈ R N is the first column of A, then
⎣ .. ⎦
aN 1
A = T ∗ (a) is a cyclic matrix if and only if for all 1 ≤ k ≤ N , if 1 + i − j ≡
k(mod N ), then ai j = ak1 .
Proof If A = T ∗ (a) is a cyclic matrix, by simple observation, there is


⎪ a11 = a22 = · · · = a N N = a11



⎨ a21 = a32 = · · · = a N N −1 = a21

..
. .



⎪ a(N −1)1 = a N 2



aN 1 = aN 1

Thus, 1 + i − j = k. The same applies to i < j. We have

k = N + 1 + i − j ⇒ 1 + i − j ≡ k(mod N ).

So the Lemma holds.


The following lemma characterizes the main properties of cyclic matrices.
⎡ ⎤ ⎡ ⎤
a1 b1
⎢ .. ⎥ ⎢ .. ⎥
Lemma 7.51 If a = ⎣ . ⎦ , b = ⎣ . ⎦ are two column vectors, then
aN bN
7.7 NTRU Cryptosystem 327

(i) T ∗ (a) + T ∗ (b) = T ∗ (a + b).


(ii) T ∗ (a) · T ∗ (b) ,
= T ∗ ([T ∗ a] · b), and T ∗ (a)T ∗ (b) = T ∗ (b)T ∗ (a).

(iii) det(T (a)) = k=1 N
(a1 + a2 ξk + · · · + a N ξkN −1 ). Where ξk (1 ≤ k ≤ N ) is the
root of all N -th units of 1.
(iv) If the cyclic matrix T ∗ (a) is reversible, the inverse matrix is (T ∗ (a))−1 = T ∗ (b),
Where b is the first column of T ∗ (a).

Proof (i) is trivial, because

T ∗ (a) + T ∗ (b) = [a + b, T (a + b), . . . , T N −1 (a + b)] = T ∗ (a + b).

To prove (ii), using the row vector block of cyclic matrix (see (7.153)), then
⎡ ⎤ ⎡ ⎤
a T1 a T1 b
⎢ a T2 ⎥ ⎢ a T 2b ⎥
⎢ 1 ⎥ ⎢ 1 ⎥
[T ∗ (a)]b = ⎢ . ⎥ b = ⎢ .. ⎥ ,
⎣ .. ⎦ ⎣ . ⎦
a T1N a T1N b

and ⎡ ⎤
a T1
⎢ a T2 ⎥
⎢ 1 ⎥
T ∗ (a) · T ∗ (b) = ⎢ . ⎥ [b, T b, . . . , T N −1 b] = (Ai j ) N ×N .
⎣ . . ⎦
a T1N

where
N +i− j+1 i+1− j
Ai j = a T1i · T j−1 b = a T1 b = a T1 b.

By Lemma 7.50, then T ∗ (a) · T ∗ (b) = T ∗ ([T ∗ (a)]b), so there is the first conclusion
of (ii). We notice that
j−1 N −i−1+ j j−i−1
Ai j = Ai j = b T1 T i a = b T1 a = b T1 a.

It is easy to prove that for any row vector x and column vector y, there is x · y = x · y,
and
x T1k = x · T1N −k , 1 ≤ k ≤ N . (7.154)

Thus,
j−i−1 N +i+1− j i+1− j
Ai j = b T1 a = b T1 a = b T1 a.

This proves that T ∗ (a)T ∗ (b) = T ∗ (b)T ∗ (a); that is, the multiplication of cyclic
matrix to matrix is commutative.
To prove (iii), suppose (T ∗ (a)) = A, but det(T ∗ (a)) = det((T ∗ (a)) ), so we just
need to calculate det(A). Make polynomial f (x) = a1 + a2 x + · · · + a N x N −1 , and
let
328 7 Lattice-Based Cryptography
⎡ ⎤
1 1 1 ··· 1
⎢ ξ1 ξ2 ξ3 ··· ξN ⎥
⎢ ⎥
V =⎢
⎢ ξ12 ξ22 ξ32 ··· ξ N2 ⎥.

⎣ ··· ··· ··· ··· ··· ⎦
ξ1N −1 ξ2N −1 ξ3N −1 ··· ξ NN −1

Then ⎡ ⎤
f (ξ1 ) f (ξ2 ) ··· f (ξ N )
⎢ ξ1 f (ξ1 ) ξ f (ξ2 ) · · · ξ N f (ξ N ) ⎥
AV = ⎢

2 ⎥.

··· ··· ··· ···
ξ1N −1 f (ξ1 ) ξ2N −1 f (ξ2 ) · · · ξ NN −1 f (ξ N )

So
det(A) det(V ) = det(AV ) = f (ξ1 ) f (ξ2 ) · · · f (ξ N ) det(V ).

Because ξi is different from each other, that is det(V ) = 0, so

det(A) = f (ξ1 ) f (ξ2 ) · · · f (ξ N )



N
= f (ξk )
k=1


N
= (a1 + a2 ξk + · · · + a N ξkN −1 ).
k=1

⎡ ⎤
1
⎢0⎥
⎢ ⎥
Now prove (iv). Let e = ⎢ . ⎥ ∈ R N , then
⎣ .. ⎦
0

T ∗ (e) = [e, T e, . . . , T N −1 e] = I N .

So take b ∈ R N to satisfy

T ∗ (a) · b = e ⇒ b = (T ∗ (a))−1 e.

Obviously, b is the first column of (T ∗ (a))−1 , by (ii),

T ∗ (a)T ∗ (b) = T ∗ ([T ∗ (a)]b) = T ∗ (e) = I N .

Thus, (T ∗ (a))−1 = T ∗ (b). In other words, the inverse of a reversible cyclic matrix
is also a cyclic matrix.
7.7 NTRU Cryptosystem 329
⎡ ⎤
a1
⎢ ⎥ "N
Corollary 7.10 Let N be a prime, a = ⎣ ... ⎦ ∈ Rn , satisfy a = 1, and i=1 ai =
aN
0, then the cyclic matrix T ∗ (a) generated by a is a reversible square matrix.

Proof Under given conditions, we can only prove det(T ∗ (a)) = 0. Let εk
= exp( 2πik ), 1 ≤ k ≤ N − 1, be N − 1 primitive unit roots of N -th( because N
N "N
is a prime), if det(T ∗ (a)) = 0, because of i=1 ai = 0, there must be a k, 1 ≤ k ≤
N − 1, such that
a1 + εk a2 + εk2 a3 + · · · + εkN −1 a N = 0.

In other words, εk is a root of polynomial φ(x) = a1 + a2 x + · · · + a N x N −1 , so φ(x)


and 1 + x + · · · + x N −1 have a common root εk , therefore, the greatest common
divisor of two polynomials

(φ(x), 1 + x + · · · + x N −1 ) > 1.

Since 1 + x + · · · + x N −1 is a circular polynomial, it is an irreducible polynomial,


a = 1, contradiction shows det(T ∗ (a)) = 0, the Corollary holds.

Next, we give an equivalent description of a lattice of NTRU by using the cyclic


matrix. Firstly, we define the linear transformation σ in the even dimensional
Euclidean space R2N , if x and y are two column vectors, define
6 7 6 7
x Tx
σ = ∈ R2N . (7.155)
y Ty

Equivalently, if x ∈ R N , y ∈ R N are two row vectors, define

σ (x, y) = (x T1 , yT1 ) ∈ R2N . (7.156)

Obviously, σ defined above is a linear transformation of R2N → R2N .

Definition 7.12 An entire lattice L ⊂ R2N is called a convolution q-ary lattice, if


(i) L is q-ary lattice, that is qZ2N ⊂ L ⊂ Z2N .
(ii) L is closed under the linear transformation σ , that is, x, y ∈ R N is the column
vector, 6 7 6 7 6 7
x x Tx
∈L⇒σ = ∈ L.
y y Ty
" N −1
Recall that NTRU’s private key is two N − 1-degree polynomials f = fi x i ,
" N −1 i=0
g = i=0 gi x i , and write f and g in column vector form:
330 7 Lattice-Based Cryptography
⎡ ⎤
f0
⎢ ⎥
f = ⎣ ... ⎦ ∈ Z N , f = ( f 0 , f 1 , . . . , f N −1 ) ∈ Z N .
f N −1

And ⎡ ⎤
g0
⎢ ⎥
g = ⎣ ... ⎦ ∈ Z N , g = (g0 , g1 , . . . , g N −1 ) ∈ Z N .
g N −1

NTRU’s parameter system is N , q, p is two positive integers, N is prime, p < q,


and defines a polynomial set

Ad { p, 0, − p} = { f (x) ∈ Z N |d + 1 coefficients of f are p,


d coefficients of f are p, others are 0}. (7.157)

Select two polynomials f, g ∈ Z N of degree N − 1, and parameter d f are positive


integers, which meet the following restrictions.
(A) N , p, q, d f are positive integers, N is a prime, 1 < p < q, ( p, q) = 1;
(B) f and g are two polynomials of degree N − 1, and the constant term of f is 1,
and
f − 1 ∈ Ad f { p, 0, − p}, g ∈ Ad f { p, 0, − p}.

(C) T ∗ ( f ) is reversible mod q.


The above (A)–(C) are the parameter constraints of NTRU. Obviously, under
these conditions, T ∗ ( f ) and T ∗ (g) are reversible matrices, and

T ∗ ( f ) ≡ I N (mod p), T ∗ (g) ≡ 0(mod p). (7.158)

After the polynomials


6 7 f and g satisfying the above conditions are selected as the
f
private key, then ∈ Z2N , let’s construct a minimum convolution q-ary lattice
g
6 7
f
containing . Suppose
g
6 7
T ∗( f )
A = [T1∗ ( f ), T1∗ (g )] N ×2N , and A = . (7.159)
T ∗ (g)

Consider A as an N × 2N -order matrix on Zq , that is A ∈ ZqN ×2N , then by (7.45), A


defines a 2N dimensional q-ary lattice q (A), that is

q (A) = {y ∈ Z2N | there is x ∈ Z N ⇒ y ≡ A x(mod q)}. (7.160)


7.7 NTRU Cryptosystem 331
6 7
f
We prove that q (A) is a convolution q-ary lattice containing . First, we prove
g
the following general identity
⎡ ⎤
a1
⎢ ⎥
Lemma 7.52 Suppose a = ⎣ ... ⎦ ∈ R N , then for ∀ x ∈ R N and 0 ≤ k ≤ N − 1,
aN
we have
T k (T ∗ (a)x) = T ∗ (a)(T k x), where T 0 = I N .

Proof k = 0 is trivial, obviously, we can assume k = 1, that is

T (T ∗ (a)x) = T ∗ (a)(T x). (7.161)

By (7.153), ⎡ ⎤ ⎡ ⎤
a T1 x ax
⎢ a T 2x ⎥ ⎢ a T1 x ⎥
⎢ 1 ⎥ ⎢ ⎥
T (T ∗ (a)x) = T ⎢ . ⎥=⎢ .. ⎥.
⎣ . . ⎦ ⎣ . ⎦
a T1N x a T1N −1 x

Because of T = T1N −1 , then the right side of Eq. (7.161) is


⎡ ⎤ ⎡ ⎤
a T1 T x ax
⎢ a T 2T x ⎥ ⎢ a T1 x ⎥
⎢ 1 ⎥ ⎢ ⎥
T ∗ (a)(T x) = ⎢ .. ⎥=⎢ .. ⎥.
⎣ . ⎦ ⎣ . ⎦
a T1N T x a T1N −1 x

So (7.161) holds, the Lemma holds.


6 7
f
Lemma 7.53 q (A) is a convolution q-ary lattice, and ∈ q (A).
g

Proof By Lemma 7.27, q (A) is a q-ary lattice, that is qZ2N ⊂ q (A)⊂ Z2N , we
only prove q (A) is closed under linear transformation σ . If y ∈ q (A), then there
is x ∈ Z N ⇒ y ≡ A x(mod q), by the definition of σ ,
6 7 6 ∗ 7
T (T ∗ ( f )x) T ( f )T x
σ (y) ≡ = ≡ A T x(mod q).
T (T ∗ (g)x) T ∗ (g)T x

Because of x ∈ Z N ⇒ T x ∈ Z N , thus σ (y) ∈ q (A). That is, ⎡ q (A)


⎤ is a con-
1
6 7 ⎢0⎥
f ⎢ ⎥
volution q-ary lattice, which is proved ∈ q (A). Let e = ⎢ . ⎥ ∈ Z , then
N
g ⎣ .. ⎦
0
332 7 Lattice-Based Cryptography

T ∗ ( f ) · e is the first column of T ∗ ( f ), that is

T ∗ ( f )e = f, T ∗ (g)e = g.

Thus, 6 7 6 7
T ∗ ( f )e f
Ae= = ∈ q (A).
T ∗ (g)e g

The Lemma holds.

With the above preparation, we now introduce the equivalent form of NTRU in
lattice theory. 6 7
f
Public key generation. After selected private key ∈ Z2N , NTRU’s public
g
6 7is generated as follows: Because the convolution q-ary lattice q (A) containing
key
f
is an entire lattice, q (A) has a unique HNF basis H , where
g
6 7
I N T ∗ (h)
H= , h ≡ [T ∗ ( f )]−1 g(mod q). (7.162)
0 q IN

By (7.48) of Lemma 7.28, the determinant d( q (A)) of q (A) is

d( q (A)) = | det( q (A))| = q 2N −N = q N .

So the diagonal elements of H are I N and q I N . By the assumption T ∗ ( f ) ∈ Z N ×N ,


and reversible mod q, [T ∗ ( f )]−1 is the inverse matrix of T ∗ ( f ) mod q, h ∈ Z N , its
component h i is selected between − q2 and q2 , that is − q2 ≤ h i < q2 , such an h is the
only one that exists. It is not difficult to verify that H is an HNF matrix and the
lattice generated by H is q (A), so H is the HNF basis of q (A). H is published
as a public key.
Encryption transformation. The message sender encodes the plaintext as m ∈ Z N ,
and randomly select a vector r ∈ Z N to satisfy

m ∈ Ad f {1, 0, −1}, r ∈ Ad f {1, 0, −1}. (7.163)

That is, m has d f + 1 1, d f −1, other components are 0. Then, the plaintext m is
encrypted with the public key H of the message recipient:
6 7 6 7
m m + [T ∗ (h)]r
c=H ≡ (mod q). (7.164)
r 0

c is called cryptosystem text, the first N components are m + [T ∗ (h)]r , the last N
components are 0.
7.7 NTRU Cryptosystem 333

Decryption transformation. If all components of m + [T ∗ (h)]r are between inter-


vals [− q2 , q2 ), the message receiver can determine that the cryptosystem text c is
6 7
m + [T ∗ (h)]r
c= .
0

Then decrypt it with its own private key T ∗ ( f ),

c ≡ [T ∗ ( f )]m + [T ∗ ( f )][T ∗ (h)]r (mod q)


(7.165)
≡ [T ∗ ( f )]m + [T ∗ (g)]r (mod q).

By the definition of h, there is

[T ∗ ( f )]h ≡ g(mod q) ⇒ T ∗ ([T ∗ ( f )]h) ≡ T ∗ (g)(mod q).

And by Lemma 7.51, there is T ∗ ([T ∗ ( f )]h) ≡ T ∗ ( f ) · T ∗ (h), so

T ∗ ( f )T ∗ (h) ≡ T ∗ (g)(mod q).

Equation (7.165) holds.


If 8 q q 9N
[T ∗ ( f )]m + [T ∗ (g)]r ∈ − , . (7.166)
2 2
So do mod p operation on [T ∗ ( f )]m + [T ∗ (g)]r , and by (7.158), thus

([T ∗ ( f )]m + [T ∗ (g)]r ) mod p = I N m + 0 · r = m. (7.167)

The correctness of decryption transformation is guaranteed.


In order to ensure that (7.167) holds, it can be seen from the above analysis that
the following conditions are necessary.

m + [T ∗ (h)]r ∈ [− q2 , q2 ] N
(7.168)
[T ∗ ( f )]m + [T ∗ (g)]r ∈ [− q2 , q2 ] N .

Obviously, the first condition can be derived from the second condition; that is, the
(7.168) can be derived from the (7.166). We first prove the following Lemma.
( q4 −1)
Lemma 7.54 If the parameter meets d f < 2p
, then

8 q q 9N
[T ∗ ( f )]m + [T ∗ (g)]r ∈ − , .
2 2
Proof Because all components of m and r are ±1 or 0, therefore, we only prove that
the absolute value of the row vectors of [T ∗ ( f )] and [T ∗ (g)] is not greater than q2 .
334 7 Lattice-Based Cryptography

Write f = ( f 0 , f 1 , . . . , f N −1 ), because of f 0 = 1,
 N −1  N −1
  
  q
 fi  ≤ | f i | = 1 + (2d f + 1) p < .
  4
i=0 i=0

Similarly,  N −1  N −1
  
  q
 gi  ≤ |gi | = (2d f + 1) p < .
  4
i=0 i=0

Thus 8 q q 9N
[T ∗ ( f )]m + [T ∗ (g)]r ∈ − , .
2 2
The Lemma holds.

According to the above lemma, NTRU algorithm needs to add the following
additional conditions to ensure the correctness of decryption transformation:
(D)
( q4 − 1)
df < .
2p

To sum up, when NTRU cryptosystem satisfies


6 7 the additional restrictions (A)–(D)
f
on the parameter system, the private key is and the public key is HNF matrix H ,
g
the encryption and decryption algorithm can be based on the algorithm introduced
above.

7.8 McEliece/Niederreiter Cryptosystem

McEliece/Niederreiter cryptosystem is a cryptosystem designed based on the asym-


metry of coding and decoding of a special class of linear codes (Goppa codes) over
a finite field. It was proposed by McEliece and Niederreiter in 1978 and 1985. It is
included in the category of postquantum cryptography. We start with cyclic codes.
Recall the concept of linear code in Chap. 2, let Fq be a q-element finite field, also
known as the alphabet, and the elements in Fq are called letters or characters. The
N -dimensional linear space FqN on Fq is called the codeword space of length N . Any
a vector a = (a0 , a1 , . . . , a N −1 ) ∈ FqN , a is called a codeword of length N , which is
usually written as a = a0 a1 · · · a N −1 ∈ FqN , from the previous section, we have

aT1 = (a0 , a1 , . . . , a N −1 )T1 = (a N −1 , a0 , a1 , . . . , a N −2 ). (7.169)

The reverse codeword a of a codeword a = a0 a1 · · · a N −1 is defined as


7.8 McEliece/Niederreiter Cryptosystem 335

a = a N −1 a N −2 · · · a1 a0 ∈ FqN . (7.170)

If C ⊂ FqN , and C is a k-dimensional linear subspace of FqN , which is called a linear


code, usually written as C = [N , k], k = 0, or k = N , [N , 0] and [N , N ] is called
trivial code, actually,

[N , 0] = {0 = 00 · · · 0}, [N , N ] = FqN .

The reverse order code C of code C is defined as C = {c|c ∈ C}, obviously, if


C = [N , k], then C = [N , k].
Definition 7.13 A linear code C of length N is called a cyclic code, if ∀ c ∈ C ⇒
cT1 ∈ C.
Next, we give an algebraic expression of cyclic codes using ideal theory. For
this purpose, note that Fq [x] is a univariate polynomial ring on Fq , and x N − 1 is
the principal ideal generated by polynomial x N − 1. Write R = Fq [x]/x N − 1 as
quotient ring. If a = a0 a1 · · · a N −1 ∈ FqN , then a(x) = a0 + a1 x · · · + a N −1 x N −1 ∈
R, so a → a(x) is a 1-1 correspondence of FqN → R and an isomorphism between
additive groups. In this correspondence, we equate codeword a with polynomial
a(x). That is a = a(x) ⇒ FqN = R = Fq [x]/x N − 1, and any code C ⊂ FqN .

C = C(x) = {c(x)|c ∈ C} ⊂ R.

That is, a code C is equivalent to a subset of Fq [x]/x N − 1. The following lemma
reveals the algebraic meaning of a cyclic code.

Lemma 7.55 C ⊂ FqN is a cyclic code ⇔ C(x) is an ideal in Fq [x]/x N − 1.

Proof If C(x) is an ideal of Fq [x]/x N − 1, obviously C is a linear code, for any
code c = c0 c1 · · · c N −1 , there is c(x) = c0 + c1 x + · · · + c N −1 x N −1 ∈ C(x), thus
xc(x) = c N −1 + co x + c1 x 2 + · · · + c N −2 x N −1 ∈ C(x). So cT1 = c N −1 c0 c1 · · ·
c N −2 ∈ C, C is a cyclic code on Fq . Conversely, if C is a cyclic code, then cT1 ∈ C,
thus cT1k ∈ C, for all 0 ≤ k ≤ N − 1 holds. Where T10 = I N is the N -th order identity
matrix. Since the polynomial cT1k (x) corresponding to cT1k is

cT1k (x) = x k c(x). (7.171)

So ∀ g(x) ∈ R ⇒ g(x)c(x) ∈ C(x). This proves that C(x) is an ideal. The Lemma
holds.

Using the homomorphism theorem of rings, we give the mathematical expressions


π
of all ideals in R. Let π be the natural homomorphism of Fq [x] −→ Fq [x]/x N − 1,
then all ideals in R correspond to all ideals containing kerπ = x N − 1 in Fq [x]
one by one, that is
336 7 Lattice-Based Cryptography

π
kerπ = x N − 1 ⊂ A ⊂ Fq [x] −→ Fq [x]/x N − 1 = R.

Since Fq [x] is the principal ideal ring and A is an ideal of Fq [x], and x N − 1 ⊂ A,
then
A = g(x), where g(x)|x N − 1. (7.172)

Therefore, all ideals in R are finite principal ideals, which can be listed as follows

{g(x) mod N − 1|g(x) divide x N − 1}.

where g(x) mod x N − 1 represents the principal ideal generated by g(x) in R, that
is

g(x) mod x N − 1 = {g(x) f (x)|0 ≤ deg f (x) ≤ N − deg(g(x)) − 1}. (7.173)

This proves that Fq [x]/x N − 1 is a ring of principal ideals, and the number of
principal ideals is the number d + 1 of positive factors of x N − 1. The so-called
positive factor is a polynomial with the first term coefficient of 1. Therefore, the
Corollary is as follows:

Corollary 7.11 Let d be the number of positive factors of x N − 1, then the number
of cyclic codes with length N is d + 1.

A cyclic code C corresponds to an ideal C(x) = g(x) mod x N − 1 in R, we


define
Definition 7.14 Let C be a cyclic code, if C(x) = g(x) mod x N − 1, then g(x) is
called the generating polynomial of C, where g(x)|x N − 1.
If g(x) = x N − 1, then x N − 1 mod x N − 1 = 0, corresponding to zero ideal
in R. Thus, the corresponding cyclic code C = {0 = 00 · · · 0} is called zero code. If
g(x) = 1, then g(x) mod x N − 1 = R. The corresponding code C = FqN . There-
fore, there are always two trivial cyclic codes in cyclic codes of length N , zero code
and FqN , which correspond to zero ideal in R and R itself, respectively.

Lemma 7.56 Let g(x)|x N − 1, g(x) be the generating polynomial of cyclic code C,
and deg g(x) = N − k, then C is [N , k] linear code, further, let g(x) = g0 + g1 x +
· · · + g N −k−1 x N −k−1 + g N −k x N −k , the corresponding codeword g = (g0 , g1 , . . . ,
g N −k , 0, 0, . . . , 0) ∈ C, then the generating matrix G of C is
⎡ ⎤
g
⎢ gT1 ⎥
⎢ ⎥
G=⎢ .. ⎥ . (7.174)
⎣ . ⎦
gT1k−1 k×N
7.8 McEliece/Niederreiter Cryptosystem 337

Proof Let C correspond to ideal C(x) = g(x) mod x N − 1, then g(x), xg(x), . . .,
x k−1 g(x) ∈ C(x), their corresponding codewords are {g, gT1 , . . . , gT1k−1 } ⊂ C,
"k−1
let’s prove that {g, gT1 , . . . , gT1k−1 } is a set of bases of C. If ∃ ai ∈ Fq ⇒ i=0 ai gT1i
= 0, then its corresponding polynomial is 0, that is
 k−1 
 
k−1 
k−1
ai gT1i (x) = ai gT1i (x) = ai x i g(x) = 0.
i=0 i=0 i=0

Thus

k−1
ai x i = 0 ⇒ ∀ ai = 0, 0 ≤ i ≤ k − 1.
i=0

That is, {g, gT1 , . . . , gT1k−1 } is a linear independent group in C. Further ∀c ∈ C,


we can prove that c can be expressed linearly. suppose c ∈ C, then c(x) ∈ C(x), by
(7.174), there is f (x),

f (x) = f 0 + f 1 x + · · · + f k−1 x k−1 ⇒ c(x) = g(x) f (x)



k−1 
k−1
= f i x i g(x) ⇒ c = f i gT1i .
i=0 i=0

This proves that the dimension of linear subspace C is N − deg g(x) = k; that is, C
is [N , k] linear code. Its generating matrix G is
⎡ ⎤
g
⎢ gT1 ⎥
⎢ ⎥
G=⎢ .. ⎥ .
⎣ . ⎦
gT1k−1 k×N

The Lemma holds.

Next, we discuss the dual code of cyclic code and its check matrix.

Lemma 7.57 Let C ⊂ FqN be a cyclic code and g(x) be the generating polyno-
mial of g(x), deg g(x) = N − k, let g(x)h(x) = x N − 1, h(x) = h 0 + h 1 x + · · · +
h k x k , h = (h 0 , h 1 , . . . , h k , 0, 0, · · · , 0) ∈ FqN is the corresponding codeword. h is
the reverse order codeword, then the check matrix of C is
⎡ ⎤
h
⎢ hT1 ⎥
⎢ ⎥
H =⎢ .. ⎥ . (7.175)
⎣ . ⎦
hT1N −k−1 (N −k)×N
338 7 Lattice-Based Cryptography

The dual code C ⊥ of C is [N , N − k] linear code, and

C ⊥ = {a H |a ∈ FqN −k },

h(x) is called the check polynomial of cyclic code C.


Proof By Lemma 7.56, C is a k-dimensional linear subspace, and the generat-
ing matrix G is given by (7.175). Because of g(x)h(x) = x N − 1, then there is
g(x)h(x) = 0 in ring R. Equivalently,

g0 h i + g1 h i−1 + · · · + g N −k h i−N +k = 0, ∀ 0 ≤ i ≤ N − 1.

The matrix of the above formula is expressed as G H = 0, so H is the generation


matrix of dual code of C, and we have Lemma holds.
Remark 7.5 The polynomial h(x) corresponding to the reverse codeword h is

h(x) = h 0 x N −1 + h 1 x N −2 + · · · + h k x N −k−1 .

In general, when h(x)|x N − 1, h(x)  x N − 1, therefore, the dual code of cyclic code
is not necessarily cyclic code.
Definition 7.15 Let x N − 1 = g1 (x)g2 (x) · · · gt (x) be the irreducible decomposi-
tion of x N − 1 on Fq , where gi (x)(1 ≤ i ≤ t) is the irreducible polynomial with the
first term coefficient of 1 in Fq [x]. Then the cyclic code generated by gi (x) is called
the i-th maximal cyclic code in FqN , denote as Mi+ . The cyclic code generated by
x N −1
gi (x)
is called the i-th minimal cyclic code, denote as Mi− .
Minimal cyclic codes are also called irreducible cyclic codes because they no
longer contain the nontrivial cyclic codes of FqN in Mi− . The irreducibility of minimal
cyclic codes can be derived from the fact that the ideal Mi− (x) in R corresponding
to Mi− is a field. We can give a proof of pure algebra.
Corollary 7.12 Let Mi− be the i-th minimal cyclic code of FqN (1 ≤ i ≤ t), Mi− (x)
is the ideal corresponding to Mi− in R, then Mi− (x) is a field, thus, Mi− no longer
contains any nontrivial cyclic code of FqN .

Proof Let g(x) = (x N − 1)/gi (x), gi (x) be an irreducible polynomial in Fq [x], by


(7.175),
Mi− (x) = g(x)Fq [x]/(x N − 1)Fq [x] ∼ = Fq [x]/gi (x)Fq [x],

where g(x)Fq [x] is the principal ideal generated by g(x) in Fq [x]. Since gi (x) is an
irreducible polynomial, so Mi− (x) is a field.
Example 7.1 All cyclic codes with length of 7 are determined on binary finite field
F2 .
Solve: Polynomial x 7 − 1 has the following irreducible decomposition on F2
7.8 McEliece/Niederreiter Cryptosystem 339

x 7 − 1 = (x − 1)(x 3 + x + 1)(x 3 + x 2 + 1).

Therefore, x 7 − 1 has 7 positive factors on F2 , by Corollary 7.11, there are 8 cyclic


codes with length of 7 on F2 . Where 0 and F72 are two trivial cyclic codes. There
are three maximal cyclic codes generated by g(x) = x − 1, g(x) = x 3 + x + 1 and
g(x) = x 3 + x 2 + 1, respectively. The dimensions of the corresponding cyclic codes
are 6 dimension, 4 dimension and 4 dimension. Similarly, there are three minimal
cyclic codes, corresponding to the dimension of one and two three-dimensional cyclic
codes.
Another characterization of cyclic codes is zeroing polynomials, if x N − 1 =
g1 (x) · · · gt (x), the ideal Mi+ (x) in R corresponding to the maximum cyclic code
Mi+ (1 ≤ i ≤ t) generated by gi (x) is

Mi+ (x) = {gi (x) f (x)|0 ≤ deg f (x) ≤ N − deg gi (x) − 1}.

Let β be a root of gi (x) in the split field. Then gi (x) is the minimal polynomial of β
in Fq [x], all c(x) ∈ Mi+ (x) ⇒ c(β) = 0. Therefore,

Mi+ (x) = {c(x)|c(x) ∈ R, and c(β) = 0}.

Example 7.2 Suppose N = (q m − 1)/q − 1, (m, q − 1) = 1, β is an N -th primi-


tive unit root in Fq m , then the cyclic code

C = {c(x)|c(β) = 0, c(x) ∈ R}

is equivalent to Hamming code [N , N − m].

Proof Because (m, q − 1) = 1, and

N = q m−1 + q m−2 + · · · + q + 1 = (q − 1)(q m−2 + 2q m−3 + · · · + (m − 1)) + m.

So (N , q − 1) = 1. Therefore, β i(q−1) = 1, for 1 ≤ i ≤ N − 1, in other words, β i ∈ /


Fq for ∀ 1 ≤ i ≤ N − 1 holds. In Fq m , any two elements of {1, β, β 2 , . . . , β N −1 } are
linearly independent on Fq . If each element is regarded as an m-dimensional column
vector on Fq , then the m × N -order matrix

H = [1, β, β 2 , . . . , β N −1 ]m×N

constitutes the check matrix of cyclic code C, and any two rows of H are linearly
independent on Fq , by the definition, C is [N , N − m] Hamming code.

Lemma 7.58 Let C ⊂ FqN be a cyclic code, C(x) ⊂ Fq [x]/x N − 1 be an ideal,


(N , q) = 1, then C(x) contains a multiplication unit element c(x) ∈ C(x) ⇒

c(x)d(x) ≡ d(x)(mod x N − 1), ∀ d(x) ∈ C(x).


340 7 Lattice-Based Cryptography

The unit element c(x) in C(x) is unique.

Proof Because (N , q) = 1 ⇒ x N − 1 has no double root in Fq , let g(x) be the gener-


ating polynomial of C and h(x) be the checking polynomial of C, that is g(x)h(x) =
x N − 1. Therefore, (g(x), h(x)) = 1, and there is a(x), h(x) ∈ Fq [x], ⇒

a(x)g(x) + b(x)h(x) = 1.

Let c(x) = a(x)g(x) = 1 − b(x)h(x) ∈ C(x), so for ∀ d(x) ∈ C(x), write d(x) =
g(x) f (x), thus

c(x)d(x) = a(x)g(x)g(x) f (x)


= (1 − b(x)h(x))g(x) f (x)
= g(x) f (x) − b(x)h(x)g(x) f (x).

Therefore
c(x)d(x) ≡ d(x)(mod x N − 1).

There is c(x)d(x) = d(x) in R = Fq [x]/x N − 1. That is, c(x) is the multiplication
unit element of C(x). obviously, c(x) exists only. The Lemma holds.

Definition 7.16 C ⊂ FqN is a cyclic code, and the multiplication unit element c(x)
in C(x) is called the idempotent element of C. If C = Mi− is the i-th minimal cyclic
code, the idempotent element of C is called the primitive idempotent element, denote
as θi (x).

Lemma 7.59 Let C1 ⊂ FqN , C2 ⊂ FqN are two cyclic codes, (N , q) = 1, Idempotent
elements are c1 (x) c2 (x), respectively, then
+
(i) C1 C2 is also the cyclic code of FqN , idempotent element is c1 (x)c2 (x).
(ii) C1 + C2 is also the cyclic code of FqN , idempotent element is c1 (x) + c2 (x) +
c1 (x)c2 (x).
+
Proof It is obvious that C1 C2 and C1 + C2 are cyclic codes in FqN , because they
correspond to ideal C1 (x) and C2 (x) in R, we have

C1 (x) ∩ C2 (x) and C1 (x) + C2 (x)

is still the ideal in R. Therefore, the corresponding codes C1 ∩ C2 and C1 + C2 are


still cyclic codes, and the conclusion on idempotents is not difficult to verify. The
Lemma holds.

In 1959, A. Hocquenghem and 1960, R. Bose and D. Chaudhuri independently


proposed a special class of cyclic codes, which required minimal distance. At present,
it is generally called BCH codes in academic circles.
7.8 McEliece/Niederreiter Cryptosystem 341

Definition 7.17 A cyclic code C ⊂ FqN with length N is called a δ-BCH code. If its
generating polynomial is the least common multiple of the minimal polynomial of
β, β 2 , . . . , β δ−1 , where δ is a positive integer, β is a primitive N -th unit root. δ-BCH
code is also called BCH code with design distance of δ. If β ∈ Fq m , N = q m − 1,
such BCH codes are called primitive.
Lemma 7.60 Let d be the minimal distance of a δ-BCH code, then we have d ≥ δ.
Proof Suppose x N − 1 = (x − 1)g1 (x)g2 (x) · · · gt (x), β is a primitive N -th unit
root on Fq , then β is the root of a gi (x). Let deg gi (x) = m ⇒ β ∈ Fq m . Because
of [Fq m : Fq ] = m, we can think of β, β 2 , . . . , β δ−1 as an m-dimensional column
vector. Let H be the following m(δ − 1) × N -order matrix.
⎡ ⎤
1 β β2 · · · β N −1
⎢ 1 β2 β4 · · · β 2(N −1) ⎥
⎢ ⎥
H =⎢. . .. .. .. ⎥ .
⎣ .. .. . . . ⎦
1 β δ−1 β 2(δ−1) · · · β (N −1)(δ−1) m(δ−1)×N

In fact, H is the check matrix of δ-BCH code C, that is

c ∈ C ⇐⇒ cH = 0.

We prove that any (δ − 1) column vectors of H are linear independent vectors. Let
the first component of these (δ − 1) column vectors be β i1 , β i2 , . . . , β iδ−1 , where
i j ≥ 0, the corresponding determinant is Vandermonde determinant , and

 = β i1 +i2 +···+iδ−1 (β ir − β is ) = 0.
r >s

Therefore, any (δ − 1) column vectors of H are linearly independent. Thus, the


minimum distance of C is d ≥ δ.
Now, we can introduce the design principle of McEliece/Niederreiter cryptosys-
tem. Its basic mathematical idea is based on the decoding principle of error correction
code. Recall the concept of error correction code in Chap. 2, a code C ⊂ FqN is called
t-error correction code (t ≥ 1 is a positive integer). If for ∀ y ∈ FqN , there is at most
one codeword c ∈ C ⇒ d(c, y) ≤ t, d(c, y) is the Hamming distance between c and
y. We know that if the :minimum
; distance of a code C is d, then C is a t-error correc-
tion code, where t = d−1 2
is the smallest integer not less than d−1
2
. Lemma 7.60
proves the existence of t-error correction codes for any positive integer t, i.e., 2t + 1-
BCH code (δ = 2t + 1), this kind of code is called Goppa code (see the next section),
which provides a theoretical basis for McEliece/Niederreiter cryptosystem. Next, we
will introduce the working mechanism of this kind of cryptosystem in detail. First,
let’s look at the generation of key.
Private key: Select a t-error correction code C ⊂ FqN , C = [N , k], H is the check
matrix of C, H is an (N − k) × N -dimensional matrix. For ∀ x ∈ FqN , x → x H ∈
342 7 Lattice-Based Cryptography

FqN −k is a correspondence of Spaces FqN to FqN −k , let’s prove that this correspondence
is a single shot on a special codeword whose weight is not greater than t.

Lemma 7.61 ∀ x, y ∈ FqN , if x H = y H , and w(x) ≤ t, w(y) ≤ t, then x = y.

Proof By hypothesis,

x H = y H ⇒ (x − y)H = 0 ⇒ x − y ∈ C.

Obviously, the Hamming distance d(0, x) = w(x) ≤ t between x and 0, and the
Hamming distance d(x, x − y) between x and x − y is

d(x, x − y) = w(x − (x − y)) = w(y) ≤ t.

Because C is t-error correction code, then x − y = 0, the Lemma holds.

We use t-error correction code C and check matrix H as the private key.
Public key: In order to generate the public key, we randomly select a permutation
matrix PN ×N so that I N is an N -order identity matrix, I N = [e1 , e2 , . . . , e N ], σ ∈ S N
is an N -ary substitution, then

P = σ (I N ) = [eσ (1) , eσ (2) , . . . , eσ (N ) ].

This kind of matrix is also called Wyel matrix. A nonsingular diagonal matrix
diag{λ1 , λ2 , . . . , λ N }(λi ∈ Fq , λi = 0) can also be randomly selected, and suppose

P = σ (diag{λ1 , λ2 , . . . , λ N }) = diag{λσ1 , λσ2 , . . . , λσ N }.

Let M be an (N − k) × (N − k)-order invertible matrix. The public key is the (N −


k) × N -order matrix K generated as follows,

K = P H M, this is N × (N − k) ordermatrix.

We take K as the public key and H , P and M as the private key.


Encryption: Let m ∈ F_q^N be a codeword with w(m) ≤ t; the plaintext m is encrypted as follows:

c = mK ∈ F_q^{N−k}, where c is the ciphertext.

In fact, a plaintext of length N and weight not greater than t over F_q is encrypted into a ciphertext of length N − k over F_q through the public key K.
Decryption: After receiving the ciphertext c, decrypt it through the private keys H, P and M:

c · M^{−1} = mK M^{−1} = m P H^T M M^{−1} = m P H^T.

Since P only permutes (and possibly scales) coordinates, mP ∈ F_q^N and m have the same weight, that is,

w(m) = w(mP) ≤ t.

Using the decoding principle of error correction codes: all vectors x ∈ F_q^N satisfying xH^T = mPH^T constitute an additive coset of the code C, and as the leader vector of this coset (the unique vector of weight ≤ t, by Lemma 7.61), mP can be recovered exactly. That is,

mPH^T → mP (by syndrome decoding).

Finally, we compute m = (mP) · P^{−1} and obtain the plaintext.
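To make the whole key flow concrete, here is a minimal runnable sketch (our own toy instantiation, not from the text): it uses the binary [7, 4] Hamming code as the t-error correction code with t = 1, a random permutation matrix P and a random invertible matrix M. Real instantiations use Goppa codes with far larger parameters, and all helper names below are illustrative.

```python
# Toy Niederreiter-style key generation / encryption / decryption over F_2,
# with the [7,4] Hamming code (N = 7, k = 4, t = 1). Illustrative only.
import numpy as np

def hamming_H():
    # 3 x 7 check matrix: column i is the binary expansion of i + 1, so the
    # syndrome of a single error names the error position directly.
    return np.array([[(i >> b) & 1 for i in range(1, 8)] for b in range(3)])

def inv_gf2(M):
    # Gauss-Jordan inversion over F_2.
    n = len(M)
    A = np.concatenate([M % 2, np.eye(n, dtype=int)], axis=1)
    for c in range(n):
        r = next(r for r in range(c, n) if A[r, c])
        A[[c, r]] = A[[r, c]]
        for r in range(n):
            if r != c and A[r, c]:
                A[r] = (A[r] + A[c]) % 2
    return A[:, n:]

rng = np.random.default_rng(1)
H = hamming_H()                               # private: check matrix
P = np.eye(7, dtype=int)[rng.permutation(7)]  # private: permutation matrix
while True:                                   # private: invertible M over F_2
    M = rng.integers(0, 2, (3, 3))
    if round(np.linalg.det(M)) % 2 == 1:
        break
K = P @ H.T @ M % 2                           # public key K = P H^T M, 7 x 3

m = np.zeros(7, dtype=int); m[4] = 1          # plaintext, w(m) <= t = 1
c = m @ K % 2                                 # ciphertext of length N - k = 3

s = c @ inv_gf2(M) % 2                        # s = (mP) H^T, a syndrome
pos = int(s @ np.array([1, 2, 4])) - 1        # decode (assumes exactly 1 bit set)
mP = np.zeros(7, dtype=int); mP[pos] = 1
m_rec = mP @ inv_gf2(P) % 2                   # m = (mP) P^{-1}
assert np.array_equal(m_rec, m)
```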

7.9 Ajtai/Dwork Cryptosystem

By choosing an appropriate n × m matrix A ∈ Z_q^{n×m}, two m-dimensional q-element lattices Λ_q(A) and Λ_q^⊥(A) are defined (see (7.45) and (7.46)):

Λ_q(A) = {y ∈ Z^m | ∃ x ∈ Z^n ⇒ y ≡ A^T x (mod q)}

and

Λ_q^⊥(A) = {y ∈ Z^m | Ay ≡ 0 (mod q)}.

Using the matrix A, a collision-resistant hash function can be defined:

f_A : {0, 1, ..., d − 1}^m → Z_q^n, (7.176)

where for any y ∈ {0, 1, ..., d − 1}^m, f_A(y) is defined as

f_A(y) = Ay mod q. (7.177)

If the parameters d, q, n, m satisfy

$$n \log q < m \log d \iff \frac{n \log q}{\log d} < m, \qquad (7.178)$$

then the hash function f_A must produce collisions, since the domain has d^m > q^n elements while the range Z_q^n has only q^n: there exist y, y′ ∈ {0, 1, ..., d − 1}^m with y ≠ y′ and f_A(y) = f_A(y′). By (7.177), we have directly

A(y − y′) ≡ 0 (mod q) ⇒ y − y′ ∈ Λ_q^⊥(A).

This shows that a pair of collision points y and y′ of the hash function f_A directly yields a short vector y − y′ (every coordinate is less than d in absolute value) on the q-element lattice Λ_q^⊥(A).
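The pigeonhole step is easy to see in a toy computation (parameters of our own choosing, far too small to be secure): brute-forcing all d^m inputs must find a collision, and the difference of the colliding inputs is a short vector of Λ_q^⊥(A).

```python
# Brute-force collision of f_A for toy parameters n=2, m=8, q=3, d=2:
# 2^8 = 256 inputs but only 3^2 = 9 possible outputs, so f_A must collide,
# and y - y' lies in Lambda_q^perp(A) with coordinates in {-1, 0, 1}.
import itertools
import numpy as np

n, m, q, d = 2, 8, 3, 2
rng = np.random.default_rng(0)
A = rng.integers(0, q, (n, m))

seen = {}
for y in itertools.product(range(d), repeat=m):
    h = tuple(A @ np.array(y) % q)           # f_A(y) = Ay mod q
    if h in seen:
        v = np.array(y) - np.array(seen[h])  # collision => lattice vector
        assert not np.any(A @ v % q)         # A v = 0 (mod q)
        print("short vector in the lattice:", v)
        break
    seen[h] = y
```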
In order to obtain a collision-resistant hash function, the selection of the n × m matrix A is very important. First, we can select the parameter system: let d = 2, q = n², n | m, and m log 2 > n log q, where n is a positive integer. In the Ajtai/Dwork cryptographic algorithm, there are two choices of the parameter matrix A: one is the cyclic matrix and the other is the more general ideal matrix. Their corresponding q-element lattices Λ_q^⊥(A) are the cyclic lattice and the ideal lattice, respectively.
Cyclic lattice
Because n | m, A can be divided into m/n cyclic matrices of order n × n, that is,

$$A = [A^{(1)}, A^{(2)}, \ldots, A^{(m/n)}], \qquad (7.179)$$

where α^{(i)} ∈ Z_q^n is an n-dimensional column vector and A^{(i)} is the cyclic matrix generated by α^{(i)} (see (7.149)), that is,

$$A^{(i)} = T^*(\alpha^{(i)}) = [\alpha^{(i)}, T\alpha^{(i)}, \ldots, T^{n-1}\alpha^{(i)}], \quad 1 \le i \le \frac{m}{n}.$$

A is called an n × m-dimensional generalized cyclic matrix, and the q-element lattice in R^m defined by A,

Λ_q^⊥(A) = {y ∈ Z^m | Ay ≡ 0 (mod q)},

is called a cyclic lattice. The Ajtai/Dwork cryptosystem based on the cyclic lattice can be stated as follows:
Algorithm 1: Hash function based on the cyclic lattice.
Parameters: q, n, m, d are positive integers, n | m, m log d > n log q.
Secret key: m/n column vectors α^{(i)} ∈ Z_q^n, 1 ≤ i ≤ m/n.
Hash function: f_A : {0, 1, ..., d − 1}^m → Z_q^n, defined as

f_A(y) ≡ Ay (mod q),

where the generalized cyclic matrix A ∈ Z_q^{n×m} is given by (7.179).
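A short sketch of Algorithm 1 follows (toy parameters n = 4, m = 32, d = 2, q = n², chosen so that m log d > n log q holds; the helper names are ours, not from the text).

```python
# Sketch of Algorithm 1: the secret key is m/n vectors alpha^(i) in Z_q^n;
# each expands to a circulant block T*(alpha^(i)), and f_A(y) = Ay mod q.
import numpy as np

def T_star(alpha):
    # Circulant matrix [alpha, T alpha, ..., T^{n-1} alpha], where T is the
    # cyclic shift (a_0,...,a_{n-1}) -> (a_{n-1}, a_0, ..., a_{n-2}).
    return np.column_stack([np.roll(alpha, j) for j in range(len(alpha))])

n, m, d = 4, 32, 2
q = n * n                                 # q = n^2; m log d = 32 > n log q = 16
rng = np.random.default_rng(0)
key = [rng.integers(0, q, n) for _ in range(m // n)]   # alpha^(1..m/n)

A = np.hstack([T_star(a) for a in key])   # generalized cyclic matrix (7.179)
y = rng.integers(0, d, m)
print(A @ y % q)                          # f_A(y), an element of Z_q^n
```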


We can extend the above concepts of cyclic matrix and cyclic lattice to more general cases and obtain the concepts of ideal matrix and ideal lattice. Let h(x) be a monic integer coefficient polynomial of degree n, h(x) = x^n + a_{n−1}x^{n−1} + ··· + a_1x + a_0 ∈ Z[x], and define the rotation matrix T_h as

$$T_h = \begin{pmatrix} 0 & \cdots & 0 & -a_0 \\ & & & -a_1 \\ & I_{n-1} & & \vdots \\ & & & -a_{n-1} \end{pmatrix}. \qquad (7.180)$$

If h(x) = x^n − 1, the special polynomial, then T_h = T, the matrix highlighted in Sect. 7.7 of this chapter. Here, we discuss the more general T_h. Obviously, when the constant term a_0 ≠ 0, T_h is an invertible n-order square matrix, and det(T_h) = (−1)^n a_0.

Lemma 7.62 The characteristic polynomial of the rotation matrix T_h is f(λ) = h(λ).

Proof By definition, the characteristic polynomial f(λ) of T_h is

$$f(\lambda) = \det(\lambda I_n - T_h) = \begin{vmatrix} \lambda & 0 & \cdots & 0 & a_0 \\ -1 & \lambda & \cdots & 0 & a_1 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & \cdots & -1 & \lambda & a_{n-2} \\ 0 & \cdots & 0 & -1 & \lambda + a_{n-1} \end{vmatrix}.$$

Multiplying row k by λ^{k−1} for k = 2, 3, ..., n and then adding each row to the next one in turn, we obtain

$$f(\lambda) = \frac{1}{\lambda} \cdot \frac{1}{\lambda^2} \cdots \frac{1}{\lambda^{n-1}} \begin{vmatrix} \lambda & 0 & \cdots & 0 & a_0 \\ 0 & \lambda^2 & \cdots & 0 & a_1\lambda + a_0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & \lambda^{n-1} & a_{n-2}\lambda^{n-2} + \cdots + a_0 \\ 0 & 0 & \cdots & 0 & \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 \end{vmatrix}$$

$$= \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = h(\lambda).$$

(The row operations are valid for λ ≠ 0; since both sides are polynomials in λ, the identity extends to all λ.)

Lemma 7.63 Let h(x) = x^n + a_{n−1}x^{n−1} + ··· + a_1x + a_0 ∈ Z[x]. If a_0 ≠ 0, then the rotation matrix T_h is an invertible n-order square matrix, and

$$T_h^{-1} = \begin{bmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{bmatrix}, \quad \alpha = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n-1} \end{bmatrix} \in \mathbb{Z}^{n-1}.$$

Proof Writing T_h in block form, by the definition of T_h,

$$T_h \cdot \begin{bmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{bmatrix} = \begin{bmatrix} 0 & -a_0 \\ I_{n-1} & -\alpha \end{bmatrix} \begin{bmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & I_{n-1} \end{bmatrix} = I_n.$$

So

$$T_h^{-1} = \begin{bmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{bmatrix}.$$
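The two lemmas are easy to check numerically; here is a small exact-arithmetic sketch (the polynomial h below is an arbitrary example of ours, not from the text).

```python
# A quick check of Lemmas 7.62 and 7.63 with sympy.
import sympy as sp

n = 4
a = [3, -2, 5, 1]                        # a_0, a_1, a_2, a_3; note a_0 != 0
lam = sp.symbols('lambda')
h = lam**n + sum(a[i] * lam**i for i in range(n))

# Rotation matrix T_h of Eq. (7.180).
Th = sp.zeros(n, n)
for i in range(1, n):
    Th[i, i - 1] = 1                     # I_{n-1} block
for i in range(n):
    Th[i, n - 1] = -a[i]                 # last column -a_0, ..., -a_{n-1}

# Lemma 7.62: the characteristic polynomial of T_h is h(lambda).
assert sp.expand(Th.charpoly(lam).as_expr() - h) == 0

# Lemma 7.63: the stated block matrix is T_h^{-1}.
inv = sp.zeros(n, n)
for i in range(n - 1):
    inv[i, 0] = sp.Rational(-a[i + 1], a[0])   # -a_0^{-1} * alpha
    inv[i, i + 1] = 1                          # I_{n-1} block
inv[n - 1, 0] = sp.Rational(-1, a[0])          # -a_0^{-1}
assert Th * inv == sp.eye(n)
```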

For a given monic polynomial h(x) = x^n + a_{n−1}x^{n−1} + ··· + a_1x + a_0 ∈ Z[x] of degree n, let R be the residue class ring of Z[x] modulo h(x), i.e.,

$$R = \mathbb{Z}[x]/\langle h(x)\rangle, \qquad (7.181)$$

where ⟨h(x)⟩ is the ideal generated by h(x) in Z[x]. Because deg h(x) = n, every polynomial g(x) ∈ R has a unique expression g(x) = g_{n−1}x^{n−1} + g_{n−2}x^{n−2} + ··· + g_1x + g_0 ∈ R. Define the mapping σ : R → Z^n as

$$\sigma(g(x)) = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{n-1} \end{bmatrix} \in \mathbb{Z}^n. \qquad (7.182)$$

Obviously, σ is an Abelian group isomorphism from R to Z^n. Therefore, any polynomial g(x) in R can be regarded as an n-dimensional integer column vector.

Definition 7.18 For any n-dimensional column vector g = σ(g(x)) = (g_0, g_1, ..., g_{n−1})^T ∈ Z^n, define

$$T_h^*(g) = [g, T_h g, T_h^2 g, \ldots, T_h^{n-1} g]_{n \times n}; \qquad (7.183)$$

the n-order square matrix T_h^*(g) is called the ideal matrix generated by the vector g.
The ideal matrix is a generalization of the cyclic matrix: the former corresponds to a monic polynomial h(x) of degree n, while the latter corresponds to the special polynomial x^n − 1. We first prove that the ideal matrix T_h^*(g) generated by any vector g ∈ Z^n commutes with the rotation matrix T_h under matrix multiplication.

Lemma 7.64 For any given monic polynomial h(x) ∈ Z[x] of degree n and any n-dimensional column vector g ∈ Z^n, we have

T_h · T_h^*(g) = T_h^*(g) · T_h.

Proof Let h(x) = x^n + a_{n−1}x^{n−1} + ··· + a_1x + a_0 ∈ Z[x]. By Lemma 7.62, the characteristic polynomial of the rotation matrix T_h is h(λ), so by the Hamilton–Cayley theorem we have

$$T_h^n + a_{n-1}T_h^{n-1} + \cdots + a_1 T_h + a_0 I_n = 0. \qquad (7.184)$$

Therefore,

$$\begin{aligned} T_h^*(g)\, T_h &= [g, T_h g, T_h^2 g, \ldots, T_h^{n-1} g] \begin{bmatrix} 0 & -a_0 \\ I_{n-1} & -\alpha \end{bmatrix} \\ &= [T_h g, T_h^2 g, \ldots, -a_0 g - a_1 T_h g - \cdots - a_{n-1} T_h^{n-1} g] \\ &= [T_h g, T_h^2 g, \ldots, (-a_0 I_n - a_1 T_h - \cdots - a_{n-1} T_h^{n-1}) g] \\ &= [T_h g, T_h^2 g, \ldots, T_h^n g] \\ &= T_h [g, T_h g, \ldots, T_h^{n-1} g] = T_h \cdot T_h^*(g). \end{aligned}$$
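A numeric check of Lemma 7.64 on a small example (the polynomial h and the vector g are arbitrary choices of ours; rotation_matrix and ideal_matrix are illustrative helper names, reused in the sketches below):

```python
# Verify that T_h and T_h^*(g) commute for a random small example.
import numpy as np

def rotation_matrix(a):
    # T_h for h(x) = x^n + a_{n-1}x^{n-1} + ... + a_0, as in Eq. (7.180):
    # subdiagonal identity block, last column (-a_0, ..., -a_{n-1})^T.
    n = len(a)
    Th = np.zeros((n, n), dtype=int)
    Th[1:, :-1] = np.eye(n - 1, dtype=int)
    Th[:, -1] = -np.array(a)
    return Th

def ideal_matrix(Th, g):
    # T_h^*(g) = [g, T_h g, ..., T_h^{n-1} g], as in Eq. (7.183).
    cols, v = [], np.array(g)
    for _ in range(len(g)):
        cols.append(v)
        v = Th @ v
    return np.column_stack(cols)

a = [3, -2, 5, 1]          # h(x) = x^4 + x^3 + 5x^2 - 2x + 3
g = [1, 4, -1, 2]
Th = rotation_matrix(a)
Tg = ideal_matrix(Th, g)
assert np.array_equal(Th @ Tg, Tg @ Th)   # Lemma 7.64
```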

When the monic n-degree integer coefficient polynomial h is selected, we want to establish the correspondence between the ideals in the quotient ring R = Z[x]/⟨h(x)⟩ and the integer lattices L ⊂ Z^n. First, we define the concept of an ideal lattice. In short, an ideal lattice is an integer lattice generated by an ideal matrix.
Definition 7.19 Let g = (g_0, g_1, ..., g_{n−1})^T ∈ Z^n be a given column vector and T_h^*(g) the ideal matrix generated by g; we call the integer lattice L = L(T_h^*(g)) an ideal lattice.
Our main result is the 1-1 correspondence between the principal ideals in R = Z[x]/⟨h(x)⟩ and the ideal lattices in Z^n. This also explains why L(T_h^*(g)) is called an ideal lattice.

Theorem 7.10 The principal ideals in R = Z[x]/⟨h(x)⟩ correspond 1-1 to the ideal lattices in Z^n. Specifically,
(i) If N = ⟨g(x)⟩ is any principal ideal in R, then

σ(N) = {σ(f) | f ∈ N} = L(T_h^*(σ(g(x)))) = L(T_h^*(g)).

(ii) If g = (g_0, g_1, ..., g_{n−1})^T ∈ Z^n and L(T_h^*(g)) ⊂ Z^n is any ideal lattice, then

σ^{−1}(L(T_h^*(g))) = {σ^{−1}(b) | b ∈ L(T_h^*(g))} = ⟨g(x)⟩ ⊂ R,

where g(x) = g_0 + g_1x + ··· + g_{n−1}x^{n−1} = σ^{−1}(g).

Proof We first prove (i). Let g(x) = g_0 + g_1x + ··· + g_{n−1}x^{n−1} ∈ R be a given polynomial and N = ⟨g(x)⟩ ⊂ R the principal ideal generated by g(x) in R. By (7.182),

$$\sigma(g(x)) = (g_0, g_1, \ldots, g_{n-1})^T = T_h^*(g) \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in L(T_h^*(g)).$$

And because, reducing modulo h(x) (so that x^n ≡ −a_{n−1}x^{n−1} − ··· − a_1x − a_0),

$$\begin{aligned} xg(x) &= g_{n-1}x^n + g_{n-2}x^{n-1} + \cdots + g_1x^2 + g_0x \\ &= (g_{n-2} - g_{n-1}a_{n-1})x^{n-1} + (g_{n-3} - g_{n-1}a_{n-2})x^{n-2} + \cdots + (g_0 - g_{n-1}a_1)x - g_{n-1}a_0, \end{aligned}$$

we have

$$\sigma(xg(x)) = \begin{bmatrix} -g_{n-1}a_0 \\ g_0 - g_{n-1}a_1 \\ \vdots \\ g_{n-2} - g_{n-1}a_{n-1} \end{bmatrix} = T_h \cdot g = T_h^*(g) \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in L(T_h^*(g)).$$
For the same reason, for 0 ≤ k ≤ n − 1, we have

$$\sigma(x^k g(x)) = T_h^k \cdot g = T_h^*(g) \cdot e_{k+1} \in L(T_h^*(g)),$$

where e_{k+1} is the (k+1)-th unit column vector. Suppose f(x) ∈ N = ⟨g(x)⟩; then f(x) = b(x) · g(x) in R, where b(x) = b_0 + b_1x + ··· + b_{n−1}x^{n−1}, and we have

$$\sigma(f(x)) = \sigma(b(x)g(x)) = \sum_{k=0}^{n-1} b_k\, \sigma(x^k g(x)) = T_h^*(g) \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{bmatrix} \in L(T_h^*(g)). \qquad (7.185)$$

That proves

σ(N) = σ(⟨g(x)⟩) ⊂ L(T_h^*(g)).

Conversely, for any lattice point α ∈ L(T_h^*(g)), we have

$$\alpha = T_h^*(g)\, b = T_h^*(g) \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{bmatrix}, \quad b \in \mathbb{Z}^n.$$

Since σ is a 1-1 correspondence, by (7.185),

σ^{−1}(α) = σ^{−1}(T_h^*(g)b) = b(x)g(x) ∈ N = ⟨g(x)⟩.

So we have

σ(N) = σ(⟨g(x)⟩) = L(T_h^*(g)),

and (i) holds. Again, since σ is a 1-1 correspondence, (ii) follows directly from (i). We complete the proof of Theorem 7.10.
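The key identity σ(x^k g(x)) = T_h^k g behind this proof can be checked directly; the sketch below (same example h and g as before, reusing the rotation_matrix helper from the Lemma 7.64 sketch) reduces x^k g(x) modulo h(x) with sympy and compares coefficient vectors.

```python
# Check sigma(x^k g(x)) = T_h^k g for k = 0, ..., n-1 in R = Z[x]/<h(x)>.
import numpy as np
import sympy as sp

x = sp.symbols('x')
a = [3, -2, 5, 1]
n = len(a)
h = x**n + sum(a[i] * x**i for i in range(n))
g = [1, 4, -1, 2]
gx = sum(g[i] * x**i for i in range(n))

Th = rotation_matrix(a)                               # from the earlier sketch
v = np.array(g)
for k in range(n):
    rem = sp.rem(x**k * gx, h, x)                     # x^k g(x) reduced in R
    coeffs = [int(rem.coeff(x, i)) for i in range(n)] # sigma(x^k g(x))
    assert coeffs == [int(t) for t in v]              # equals T_h^k g
    v = Th @ v
```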

The above discussion of ideal matrices and ideal lattices can be extended to the finite field Z_q, because any quotient ring Z_q[x]/⟨h(x)⟩ of the polynomial ring Z_q[x] over a finite field is a principal ideal ring. Therefore, we can establish a 1-1 correspondence between all ideals in R = Z_q[x]/⟨h(x)⟩ and linear codes over Z_q.
Back to the Ajtai/Dwork cryptosystem: let h(x) ∈ Z_q[x] be a given polynomial, and select an n × m-dimensional matrix A ∈ Z_q^{n×m} as a generalized ideal matrix, i.e.,

$$A = [A_1, A_2, \ldots, A_{m/n}], \qquad (7.186)$$

where A_i (1 ≤ i ≤ m/n) is the ideal matrix generated by g^{(i)} ∈ Z_q^n, that is,

$$A_i = T_h^*(g^{(i)}) = [g^{(i)}, T_h g^{(i)}, \ldots, T_h^{n-1} g^{(i)}]. \qquad (7.187)$$

We get the second algorithm of the Ajtai/Dwork cryptosystem:

Algorithm 2: Hash function based on the ideal lattice.
Parameters: q, n, m, d are positive integers, n | m, m log d > n log q.
Secret key: m/n column vectors g^{(i)} ∈ Z_q^n (1 ≤ i ≤ m/n), and a polynomial h(x) = x^n + a_{n−1}x^{n−1} + ··· + a_1x + a_0 ∈ Z_q[x].
Hash function: f_A : {0, 1, ..., d − 1}^m → Z_q^n, defined as

f_A(y) ≡ Ay (mod q),

where the generalized ideal matrix A ∈ Z_q^{n×m} is given by Eq. (7.186).
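A sketch of Algorithm 2 under the same toy conventions (reusing rotation_matrix and ideal_matrix from the Lemma 7.64 sketch; h and the g^{(i)} are random here, whereas real parameter choices would constrain them):

```python
# Sketch of Algorithm 2: the circulant blocks of Algorithm 1 are replaced by
# ideal-matrix blocks T_h^*(g^(i)) built from a chosen polynomial h(x).
import numpy as np

n, m, d = 4, 32, 2
q = n * n
rng = np.random.default_rng(2)
a = [int(t) for t in rng.integers(0, q, n)]            # coefficients of h(x)
key = [rng.integers(0, q, n) for _ in range(m // n)]   # g^(1), ..., g^(m/n)

Th = rotation_matrix(a)
A = np.hstack([ideal_matrix(Th, g) for g in key]) % q  # Eq. (7.186)
y = rng.integers(0, d, m)
print(A @ y % q)                                       # f_A(y) in Z_q^n
```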


We will not discuss here the collision resistance of the hash functions constructed from cyclic lattices and ideal lattices; interested readers may consult the reference Micciancio and Regev (2009) of this chapter.
Exercise 7
1. Let L ⊂ R^n be a lattice (full-rank lattice) and L^* the dual lattice of L. Prove that the integer lattice Z^n is self-dual, that is, (Z^n)^* = Z^n. For L = 2Z^n, find L^*.
2. Is it correct that L is a self-dual lattice if and only if L = Z^n? Why?
3. Under the assumption of Exercise 1, let λ_1(L) be the shortest vector length of L and λ_1(L^*) the shortest vector length of the dual lattice L^*. Prove that

λ_1(L) · λ_1(L^*) ≤ n.

4. Let λ_1(L), λ_2(L), ..., λ_n(L) be the successive minima of the lattice L. Prove that

λ_1(L) · λ_n(L^*) ≥ 1.

5*. Let L be a lattice, B = [β_1, β_2, ..., β_n] a generating matrix of L, and B^* = [β_1^*, β_2^*, ..., β_n^*] the corresponding Gram–Schmidt orthogonalization. Prove that any lattice L has a set of bases {β_1, β_2, ..., β_n} such that

(1/n) λ_1(L) ≤ min{|β_1^*|, |β_2^*|, ..., |β_n^*|} ≤ λ_1(L).

(Hint: use a KZ basis of the lattice L.)


6. Under the assumption of Exercise 5, let λ_1(L), λ_2(L), ..., λ_n(L) be the successive minima of the lattice L. Prove that

λ_j(L) ≥ min_{j ≤ i ≤ n} |β_i^*|, 1 ≤ j ≤ n.

7. For a full-rank lattice L ⊂ R^n, define its covering radius μ(L) as

μ(L) = max_{x ∈ R^n} |x − L|.

Prove that the covering radius of any lattice L exists.
8. Prove that μ(Z^n) = (1/2)√n.
9. For any lattice L ⊂ R^n, prove that μ(L) ≥ (1/2) λ_n(L).
10. For any lattice L ⊂ R^n, prove the following theorem:

λ_1(L) · μ(L^*) ≤ n.

References

Ajtai, M. (2004). Generating hard instances of lattice problems. In Quad. Mat.: Vol. 13. Complexity
of computations and proofs (pp. 1–32). Dept. Math., Seconda Univ. Napoli. Preliminary version
in STOC 1996.
Ajtai, M., & Dwork, C. (1997). A public-key cryptosystem with worst-case/average-case equiva-
lence. In Proceedings of 29th Annual ACM Symposium on Theory of Computing (STOC) (pp.
284–293).
Babai, L. (1986). On Lovász' lattice reduction and the nearest lattice point problem. Combinatorica, 6, 1–13.
Cassels, J. W. S. (1963). Introduction to diophantine approximation. Cambridge University Press.
Cassels, J. W. S. (1971). An introduction to the geometry of numbers. Springer.
Gama, N., & Nguyen, P. Q. (2008a). Finding short lattice vectors within Mordell’s inequality. In
Proceedings of 40th ACM Symposium on Theory of Computing (STOC) (pp. 207–216).
Gama, N., & Nguyen, P. Q. (2008b). Predicting lattice reduction. In Lecture Notes in Computer
Science: Advances in cryptology. Proceedings of Eurocrypt’08. Springer
Goldreich, O., Goldwasser, S., & Halevi, S. (1997). Public-key cryptosystems from lattice reduction
problems. In Lecture Notes in Computer Science: Vol. 1294. Advances in cryptology (pp. 112–
131). Springer.
Hoffstein, J., Pipher, J., & Silverman, J. H. (1998). NTRU: A ring based public key cryptosystem.
In LNCS: Vol. 1423. Proceedings of ANTS-III (pp. 267–288). Springer.
Klein, P. (2000). Finding the closest lattice vector when it’s unusually close. In Proceedings of 11th
Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 937–941).
Lenstra, A. K., Lenstra, H. W., Jr., & Lovász, L. (1982). Factoring polynomials with rational coefficients. Mathematische Annalen, 261(4), 515–534.
McEliece, R. (1978). A public-key cryptosystem based on algebraic coding theory. Technical Report, Jet Propulsion Laboratory. DSN Progress Report 42-44.

Micciancio, D. (2001). Improving lattice based cryptosystems using the Hermite normal form. In
J. Silverman (Ed.), Lecture Notes in Computer Science: Vol. 2146. Cryptography and lattices
conference—CaLC 2001 (pp. 126–145). Springer.
Micciancio, D., & Regev, O. (2009). Lattice-based cryptography. Springer.
Niederreiter, H. (1986). Knapsack-type cryptosystems and algebraic coding theory. Problems of
Control and Information Theory/Problemy Upravlen. Teor. Inform., 15(2), 159–166.
Peikert, C. (2016). A decade of lattice cryptography. Foundations & Trends in Theoretical Computer
Science.
Regev, O. (2004). Lattices in computer science (Lecture 1–Lecture 7). Tel Aviv University, Fall.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.