
Undergraduate study in Economics,

Management, Finance and the Social Sciences

Optimisation
theory

B. von Stengel

MT3042
2024

This subject guide is for a 300-level course offered as part of the University of London’s
undergraduate study in Economics, Management, Finance and the Social
Sciences. This is equivalent to Level 6 within the Framework for Higher Education
Qualifications in England, Wales and Northern Ireland (FHEQ).
For more information see: london.ac.uk
This guide was prepared for the University of London by:
Bernhard von Stengel, Department of Mathematics, London School of Economics
and Political Science.
This is one of a series of subject guides published by the University. We regret that
due to pressure of work the author is unable to enter into any correspondence
relating to, or arising from, the guide. If you have any comments on this subject
guide, please communicate these through the discussion forum on the virtual
learning environment.

University of London
Publications Office
Stewart House
32 Russell Square
London WC1B 5DN
United Kingdom
london.ac.uk

Published by: University of London


© University of London 2024

The University of London asserts copyright over all material in this subject guide
except where otherwise indicated. All rights reserved. No part of this work may
be reproduced in any form, or by any means, without permission in writing from
the publisher. We make every effort to respect copyright. If you think we have
inadvertently used your copyright material, please let us know.
Contents

1 Introduction to the Subject Guide 5


1.1 What Is a Subject Guide? . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Introduction to the Subject Area . . . . . . . . . . . . . . . . . . . . 6
1.3 Syllabus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Aims of the Course . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Learning Outcomes for the Course . . . . . . . . . . . . . . . . . . 8
1.6 Employability Outcomes . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7 Overview of Learning Resources . . . . . . . . . . . . . . . . . . . . 8
1.7.1 The Subject Guide . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7.2 Essential Reading . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7.3 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7.4 Online Study Resources . . . . . . . . . . . . . . . . . . . . 10
1.7.5 The VLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7.6 Making Use of the Online Library . . . . . . . . . . . . . . 11
1.8 Examination Advice . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.9 Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Combinatorial Optimisation 14
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.1 Learning Outcomes . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.2 Essential Reading . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.3 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.4 Synopsis of Chapter Content . . . . . . . . . . . . . . . . . 15
2.2 Introductory Example: The Marriage Problem . . . . . . . . . . . . 16
2.3 Graphs, Digraphs, Networks . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Walks, Paths, Tours, and Cycles . . . . . . . . . . . . . . . . . . . . 19
2.5 Shortest Walks in Networks . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Introduction to Algorithms . . . . . . . . . . . . . . . . . . . . . . . 23
2.7 Single-Source Shortest Paths: Bellman–Ford . . . . . . . . . . . . . 27
2.8 O-Notation and Running-Time Analysis . . . . . . . . . . . . . . . 35
2.9 Single-Source Shortest Paths: Dijkstra’s Algorithm . . . . . . . . . 38
2.10 Reminder of Learning Outcomes . . . . . . . . . . . . . . . . . . . 43
2.11 Exercises for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . 43


3 Continuous Optimisation 46
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.1.1 Learning Outcomes . . . . . . . . . . . . . . . . . . . . . . . 46
3.1.2 Essential Reading . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1.3 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1.4 Synopsis of Chapter Content . . . . . . . . . . . . . . . . . 47
3.2 The Real Numbers and Their Order . . . . . . . . . . . . . . . . . . 48
3.3 Infimum and Supremum . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Constructing the Real Numbers * . . . . . . . . . . . . . . . . . . . 52
3.5 Maximisation and Minimisation . . . . . . . . . . . . . . . . . . . . 55
3.6 Sequences, Convergence, and Limits . . . . . . . . . . . . . . . . . 57
3.7 Euclidean Norm and Maximum Norm . . . . . . . . . . . . . . . . 60
3.8 Sequences and Convergence in R𝑛 . . . . . . . . . . . . . . . . . . . 63
3.9 Open and Closed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.10 Bounded and Compact Sets . . . . . . . . . . . . . . . . . . . . . . 67
3.11 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.12 Proving Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.13 The Theorem of Weierstrass . . . . . . . . . . . . . . . . . . . . . . 77
3.14 Using the Theorem of Weierstrass . . . . . . . . . . . . . . . . . . . 78
3.15 Reminder of Learning Outcomes . . . . . . . . . . . . . . . . . . . 82
3.16 Exercises for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . 83

4 First-Order Conditions 85
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.1.1 Learning Outcomes . . . . . . . . . . . . . . . . . . . . . . . 85
4.1.2 Essential Reading . . . . . . . . . . . . . . . . . . . . . . . . 86
4.1.3 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.1.4 Synopsis of Chapter Content . . . . . . . . . . . . . . . . . 87
4.2 Introductory Example . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.3 Matrix Multiplication for Vectors and Scalars . . . . . . . . . . . . 91
4.4 Differentiability in R𝑛 . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.5 Partial Derivatives and 𝐶 1 Functions . . . . . . . . . . . . . . . . . 96
4.6 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.7 Unconstrained Optimisation . . . . . . . . . . . . . . . . . . . . . . 100
4.8 Equality Constraints and the Theorem of Lagrange . . . . . . . . . 104
4.9 Inequality Constraints and the KKT Conditions . . . . . . . . . . . 110
4.10 Reminder of Learning Outcomes . . . . . . . . . . . . . . . . . . . 121
4.11 Exercises for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . 121

5 Linear Optimisation 124


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.1.1 Learning Outcomes . . . . . . . . . . . . . . . . . . . . . . . 124
5.1.2 Essential Reading . . . . . . . . . . . . . . . . . . . . . . . . 125
5.1.3 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.1.4 Synopsis of Chapter Content . . . . . . . . . . . . . . . . . 126
5.2 Linear Functions, Hyperplanes, and Halfspaces . . . . . . . . . . . 127
5.3 Linear Programming in Two Dimensions . . . . . . . . . . . . . . . 130
5.4 Linear Programs and Duality . . . . . . . . . . . . . . . . . . . . . . 133
5.5 The Lemma of Farkas and Strong LP Duality . . . . . . . . . . . . . 136
5.5.1 Statement of the Lemma of Farkas . . . . . . . . . . . . . . 137
5.5.2 Proof of Strong LP Duality . . . . . . . . . . . . . . . . . . . 138
5.5.3 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.5.4 Separating Hyperplanes . . . . . . . . . . . . . . . . . . . . 141
5.5.5 The Convex Cone in the Lemma of Farkas . . . . . . . . . . 144
5.5.6 Linear Independence and Carathéodory’s Theorem . . . . 145
5.5.7 Closedness of the Cone * . . . . . . . . . . . . . . . . . . . . 147
5.6 Boundedness and Dual Feasibility . . . . . . . . . . . . . . . . . . . 148
5.7 Equality LP Constraints and Unrestricted Variables . . . . . . . . . 151
5.8 General LP Duality * . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.9 Complementary Slackness . . . . . . . . . . . . . . . . . . . . . . . 157
5.10 LP Duality and the KKT Theorem * . . . . . . . . . . . . . . . . . . 160
5.11 The Simplex Algorithm: Example . . . . . . . . . . . . . . . . . . . 162
5.12 The Simplex Algorithm: General Description * . . . . . . . . . . . . 166
5.13 Reminder of Learning Outcomes . . . . . . . . . . . . . . . . . . . 171
5.14 Exercises for Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . 172
1
Introduction to the Subject Guide

1.1 What Is a Subject Guide?

The subject guide is your main resource for studying the subject on your own.
This particular guide on Optimisation Theory is designed to contain all necessary
materials for self-study, with activities along the way and exercises at the end of
each chapter. Additional textbooks or other sources are only listed to allow you
to consider topics in more depth or from a different angle. The material can be
understood fully from the subject guide alone.
Given that this subject guide is largely self-contained, it is also similar to a
textbook on mathematical optimisation. At the same time, it aims to support you
in engaging actively with the subject.
This is a mathematics text, which assumes a mathematical mindset: precise and
abstract. Mathematical thinking is based on concepts, such as numbers, functions,
or sets. In the mind of a mathematician, these have a clear meaning, made precise
with definitions and certain commonly used notations. The interesting parts are
the useful and often surprising relationships between these concepts, stated as
theorems (also called lemmas or propositions if of lesser importance). Every
theorem has a proof that argues that the theorem is true, with a sequence of
convincing logical steps that can be followed sentence by sentence. The proof
(which can be quite involved) is stated separately from the theorem. In this text,
every proof ends with the symbol □. This has the explicit purpose that one can
skip the proof at first reading. It is important to understand and remember the
theorem in order to use it, but much less so its proof.
Precise meanings of words for mathematical concepts, mathematical notation,
and the formal statements of definitions, theorems, and proofs take some time to get
used to, but they are only employed to achieve clarity and precision. Understanding
mathematics requires seeing how these abstract concepts apply to specific examples
and problems. In fact, it is best to start with examples first, which is the approach
taken in this guide. The example should explain the concept, the theorem, and in
some cases even why the theorem is true and thereby the idea behind the proof.

1.2 Introduction to the Subject Area

This half course brings together several parts of the wide area of mathematical
optimisation, as encountered in many applied fields. The emphasis is on the
mathematical ideas and theory used in different types of optimisation called
combinatorial, linear, and continuous optimisation. Each of these types represents a
large topic of its own, and this course only gives an introduction to the basic ideas.
Combinatorial optimisation is about problems with finitely many combinations
of choices, of which the best one has to be found. For example, in a public
transportation network the problem may be to get from one location to another
by buses and underground trains in order to minimise travel time. The question
is then to find the best combination of bus and underground trips. A suitable
abstraction of this problem is a network with nodes that represent bus stops and
underground stations, and connections between them with associated travel times.
The optimisation problem is then to find the sequence of nodes with the shortest
overall travel time between two nodes. Combinatorial optimisation methods are
generally algorithms that take an input, such as the data of the transportation
network and the start node and end node of a trip, and compute an output, such
as the best route. All this will be covered in detail in Chapter 2.
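To make this concrete, such a network can be stored in a few lines of code. The following Python sketch (stop names and travel times are invented for illustration) represents the network as adjacency lists and computes the fastest travel time between two nodes with Dijkstra's algorithm, which is covered in Section 2.9.

```python
import heapq

# Travel times in minutes between stops (a small made-up network).
network = {
    "Home":      [("Station A", 5), ("Bus stop", 2)],
    "Bus stop":  [("Station A", 4), ("Station B", 12)],
    "Station A": [("Station B", 6)],
    "Station B": [("Work", 3)],
    "Work":      [],
}

def shortest_travel_time(network, start, end):
    """Repeatedly settle the unvisited node with the smallest known
    travel time from the start (Dijkstra's algorithm, Section 2.9)."""
    queue = [(0, start)]           # (total minutes from start, node)
    visited = set()
    while queue:
        time, node = heapq.heappop(queue)
        if node == end:
            return time
        if node in visited:
            continue
        visited.add(node)
        for neighbour, minutes in network[node]:
            if neighbour not in visited:
                heapq.heappush(queue, (time + minutes, neighbour))
    return None                    # end is not reachable from start

print(shortest_travel_time(network, "Home", "Work"))  # → 14
```

The input here is the network data plus a start and end node, and the output is the best total travel time, exactly the input–output view of an algorithm described above.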
At the other end of the spectrum of optimisation methods are choices of
continuous quantities such that a function that depends continuously on these
quantities is minimised. Chapter 3 introduces the theory for this set-up. A classic
problem is that of minimising the material needed for a cylindrical container such
as a beer can that contains a prescribed volume. In essence, all that is needed is the
height of the cylinder, which then determines its diameter in order to obtain the
given volume. The resulting surface area, as a function of the height, determines
the amount of material (of a fixed thickness) that is to be minimised. A suitable
method for this is known from calculus: Take the derivative of the surface area
function that depends on the single variable height and find the zeroes of this
derivative, one of which should be the minimum of the surface area function. It
turns out that a more general and elegant way is to look at the surface area as
a function of the two variables height and diameter, where these variables have
to satisfy the constraint of giving the prescribed volume. An optimal solution
then has to fulfil the property that the derivative vector of the function that is
optimised (called its gradient) is a linear multiple of the gradient of the constraint
function. This is explained in Chapter 4, which is about continuous optimisation
of differentiable functions.
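The beer-can calculation can be checked numerically. The following Python sketch (assuming a volume of 330 cm³, a value chosen only for illustration) treats the surface area as a function of the radius alone, with the height determined by the volume constraint, and confirms the classical result that at the optimum the height equals the diameter.

```python
import math

V = 330.0  # prescribed volume in cm^3 (illustrative value, not from the guide)

def surface_area(r):
    # The height is determined by the volume constraint V = pi * r^2 * h.
    h = V / (math.pi * r ** 2)
    return 2 * math.pi * r ** 2 + 2 * math.pi * r * h  # two ends plus the side

# Setting the derivative dS/dr = 4*pi*r - 2*V/r^2 to zero gives
# r^3 = V / (2*pi), a first-order condition of the kind studied in Chapter 4.
r_opt = (V / (2 * math.pi)) ** (1 / 3)
h_opt = V / (math.pi * r_opt ** 2)

print(round(h_opt / (2 * r_opt), 6))  # → 1.0  (height equals diameter)
```

Substituting the volume constraint back into the first-order condition shows that the optimal can has height equal to its diameter, independently of the prescribed volume.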

Linear optimisation (also called linear programming) represents a kind of
bridge between these two optimisation topics. It can be applied with computer
algorithms to large-scale problems with many variables that are subject to many
linear constraints, with a linear function that is optimised. Conceptually, it has a
beautiful duality theory that provides a “dual” function which provides bounds
on the original objective function and thus shows whether an optimum has been
found. Because this topic is slightly more abstract, and sheds additional light on
the previous topics, it is treated in the final Chapter 5.

1.3 Syllabus

This course aims to bring together several parts of the wide area of mathematical
optimisation. The course starts with an introduction to combinatorial optimisation
with the discrete problem of finding shortest paths in networks.
Subsequent parts concentrate on continuous optimisation, and in this sense
extend the theory studied in standard calculus courses. In contrast to the Mathemat-
ics 1 and Mathematics 2 half courses, the emphasis in this part of the Optimisation
Theory course will be on the mathematical ideas and theory used in continuous
optimisation.
The final part on linear programming and its duality theorem relates to both
combinatorial and continuous optimisation.
This course covers the following topics:
• Introduction to combinatorial optimisation. Shortest paths in directed graphs.
Algorithms and their running time.
• Introduction and review of relevant parts from real analysis, with emphasis
on higher dimensions.
• Classical results on continuous optimisation: Weierstrass’s Theorem con-
cerning continuous functions on compact sets. Review with added rigour
of unconstrained optimisation of differentiable functions on open sets. La-
grange’s Theorem on equality-constrained optimisation. Karush, Kuhn, and
Tucker’s Theorem on inequality-constrained optimisation.
• Linear programming and duality.

1.4 Aims of the Course

This half course is designed to:


• enable students to obtain a rigorous mathematical background to optimisation
techniques used in areas such as logistics, economics, and finance

• enable students to understand the connections between various optimisation
approaches, the difference between combinatorial and continuous problems,
and about the suitability and limitations of optimisation methods for different
purposes.

1.5 Learning Outcomes for the Course

At the end of this half course and having completed the essential reading and
activities, students should:
• have knowledge and understanding of important definitions, concepts and
results in the subject, and of how to apply these in different situations
• have knowledge of basic techniques and methodologies in the topics covered
• have basic understanding of the theoretical aspects of the concepts and
methodologies covered
• be able to understand new situations and definitions, including combinations
with elements from different areas covered in the course, investigate their
properties, and relate them to existing knowledge
• be able to think critically and with sufficient mathematical rigour.

1.6 Employability Outcomes

Below are the four most relevant skill outcomes for students undertaking this
course, which can be conveyed to prospective employers:
• complex problem-solving
• decision making
• adaptability and resilience
• creativity and innovation.

1.7 Overview of Learning Resources

1.7.1 The Subject Guide

The main material is provided in this self-contained guide.


For each chapter, the necessary mathematical background and the required
material from previous chapters are listed in a first section on prerequisites and
learning objectives.

Each chapter concludes with a set of exercises; many activities in the text
point to these exercises. They test methods (such as how to solve specific
optimisation problems), ask you to prove some simple properties, or in some cases test the
understanding of mathematical concepts used in proofs.

1.7.2 Essential Reading

This subject guide is self-contained and no further essential reading is required.

1.7.3 Further Reading

While the subject guide is meant to provide all material, it is good study practice
to consult other resources as well. They give a different perspective, and allow
you to compare what you read in this guide with other descriptions. This is useful
in several respects. For example, it may help you to communicate with a future
colleague who has learned the material using slightly different terminology. It
may also help you to check and improve your own understanding of the topic.
Reading more than one author helps you acquire the generally useful skill of
quickly understanding technical texts.
The following books provide additional reading. The relevant reading material
will be repeated in each chapter, with explanations about its relevance.
Bryant, V. (1990). Yet Another Introduction to Analysis. Cambridge University Press,
Cambridge, UK. ISBN 978-0521388351.
Chvátal, V. (1983). Linear Programming. W. H. Freeman, New York. ISBN 978-
0716715870.
Conforti, M., G. Cornuéjols, and G. Zambelli (2014). Integer Programming. Springer,
Cham. ISBN 978-3319110073.
Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein (2022). Introduction to
Algorithms, 4th ed. MIT Press, Cambridge, MA. ISBN 978-0262046305.
Dantzig, G. B. (1963). Linear Programming and Extensions. Princeton University
Press, Princeton, NJ. ISBN 978-0691059136.
Gale, D. (1960). The Theory of Linear Economic Models. McGraw-Hill, New York.
ISBN 978-0070227286.
Kuhn, H. W. (1991). Nonlinear programming: A historical note. In: History of
Mathematical Programming: A Collection of Personal Reminiscences, edited by J. K.
Lenstra, A. H. G. Rinnooy Kan, and A. Schrijver, 82–96. CWI and North-Holland,
Amsterdam. ISBN 978-0444888181.
Matoušek, J. and B. Gärtner (2007). Understanding and Using Linear Programming.
Springer, Berlin. ISBN 978-3540306979.

Papadimitriou, C. H. and K. Steiglitz (1998). Combinatorial Optimization: Algorithms
and Complexity. Dover, Mineola, NY. ISBN 978-0486402581.
Rudin, W. (1976). Principles of Mathematical Analysis, 3rd ed. McGraw-Hill,
New York. ISBN 978-0070542358.
Schrijver, A. (1986). Theory of Linear and Integer Programming. John Wiley & Sons,
Chichester, UK. ISBN 978-0471908548.
Sundaram, R. K. (1996). A First Course in Optimization Theory. Cambridge University
Press, Cambridge, UK. ISBN 978-0521497190.

1.7.4 Online Study Resources

In addition to the subject guide and the Essential reading, it is crucial that you
take advantage of the study resources that are available online for this course,
including the VLE and the Online Library.
You can access the VLE, the Online Library and your University of London
email account via the Student Portal at: https://my.london.ac.uk
You should have received your login details for the Student Portal with your
official offer, which was emailed to the address that you gave on your application
form. You have probably already logged in to the Student Portal in order to register!
Once you have registered, you are automatically granted access to the
VLE, Online Library and your fully functional University of London email account.
If you have forgotten these login details, please click on the ‘Forgot password’
link on the login page.

1.7.5 The VLE

The VLE, which complements this subject guide, has been designed to enhance
your learning experience, providing additional support and a sense of community.
It forms an important part of your study experience with the University of London
and you should access it regularly.
The VLE provides a range of resources for EMFSS courses:
• Course materials: Subject guides and other course materials available for
download. In some courses, the content of the subject guide is transferred into the
VLE and additional resources and activities are integrated with the text.
• Readings: Direct links, wherever possible, to essential readings in the Online
Library, including journal articles and ebooks.
• Video content: Including introductions to courses and topics within courses,
interviews, lessons and debates.

• Screencasts: Videos of PowerPoint presentations, animated podcasts and
on-screen worked examples.
• External material: Links out to carefully selected third-party resources.
• Self-test activities: Multiple-choice, numerical and algebraic quizzes to check
your understanding.
• Collaborative activities: Work with fellow students to build a body of knowl-
edge.
• Discussion forums: A space where you can share your thoughts and questions
with fellow students. Many forums will be supported by a ‘course moderator’,
a subject expert employed by LSE to facilitate the discussion and clarify difficult
topics.
• Past examination papers: We provide up to three years of past examinations
alongside Examiners’ commentaries that provide guidance on how to approach
the questions.
• Study skills: Expert advice on getting started with your studies, preparing for
examinations and developing your digital literacy skills.
Note: Students registered for Laws courses also receive access to the dedicated
Laws VLE.
Some of these resources are available for certain courses only, but we are
expanding our provision all the time and you should check the VLE regularly for
updates.

1.7.6 Making Use of the Online Library

The Online Library (http://onlinelibrary.london.ac.uk) contains a huge array of
journal articles and other resources to help you read widely and extensively.
To access the majority of resources via the Online Library you will either
need to use your University of London Student Portal login details, or you will be
required to register and use an Athens login.
The easiest way to locate relevant content and journal articles in the Online
Library is to use the Summon search engine.
If you are having trouble finding an article listed in a reading list, try removing
any punctuation from the title, such as single quotation marks, question marks
and colons.
For further advice, please use the online help pages which can be found at
https://onlinelibrary.london.ac.uk/resources/summon, or contact the Online Library
team using the ‘Chat with us’ function.

1.8 Examination Advice

A sample examination paper is available on the VLE. The general advice for exam
preparation is to identify the central idea behind each concept. Start with the basic
concepts, and test them with examples, as in the text or in the exercises. This will
help you understand what is going on, which is absolutely essential for coping with
a relatively abstract mathematical topic as in this guide. More involved concepts
can then be studied with a solid foundation of the basics.
Some methods that apply these concepts should also be practised. However,
do not overdo this: The exam will most likely consist of unseen questions. Do not
rush into using a method that you have practised without checking carefully that it
applies to the current question. A few minutes of thinking about what is actually
asked can save you a lot of precious time that you would otherwise lose on a wasted effort.
Write down your reasoning concisely in words rather than just producing a
sequence of equations. This will also help the examiner judge what approach you
are using in order to give you partial credit if the answer is not fully correct.
In general, allocate your time well, and proceed to the next part of the question,
or next question altogether, when you are stuck.

1.9 Conventions

Activities in the text are shown inside a box and start with an arrow ⇒, as in

⇒ Try Exercise 2.1.

References to statements inside this guide are in upper case, like “Theorem 4.5”,
and to other works in lower case, like “theorem 9.21 of Rudin (1976)”.
Sections with a star * after their title are optional reading and are included
for the mathematically inclined student who would like to explore more general
ideas. The material in these “starred” sections is still designed to be
accessible, but is kept optional in order to limit the overall amount of material
that needs to be learned in this course.
The following common mathematical notations are assumed to be known: N is
the set {1, 2, 3, . . .} of positive integers, Q is the set of rational numbers (fractions of
integers), R is the set of real numbers (points on the real line, with a mathematical
construction discussed in Section 3.4), and, as a special notation used in this guide,
R≥ is the set of nonnegative reals, R≥ = { 𝑥 ∈ R | 𝑥 ≥ 0}. This definition of a set
reads “R≥ is the set of real numbers 𝑥 such that 𝑥 is greater than or equal to zero”,
where the vertical bar | means “such that”.

If 𝐴 and 𝐵 are sets, then 𝐴 ⊆ 𝐵 means 𝐴 is a subset of 𝐵 (that is, every element
of 𝐴 is an element of 𝐵). This includes the case 𝐴 = 𝐵. If this is not allowed, that is,
𝐴 is a proper subset of 𝐵, then we write 𝐴 ⊂ 𝐵.
2
Combinatorial Optimisation

2.1 Introduction

This chapter is about combinatorial optimisation. That is, the set of possibilities
that can be optimised over is usually finite, and these possibilities consist of
combinations of choices. These combinations will be explored by algorithms that
are executed by a computer.
The concept of an algorithm will be explained in this chapter. Section 2.6 may
be the first time you ever see an algorithm. No prior computer programming
experience is necessary.
The contents of this chapter are independent of the remaining chapters of the
guide and are not needed for studying them.

2.1.1 Learning Outcomes

After studying this chapter, you should be able to:


• understand graphs, digraphs, and networks, and their differences
• see graphs, digraphs, and networks as finite combinatorial structures (that is,
independently of how they are drawn, which is just for visualisation)
• explain how digraphs are stored as computer inputs, for example with adja-
cency lists as in (2.7), and how this extends to networks
• understand pseudo-code for algorithms and how blocks of code are shown
with indentation (lines starting further to the right)
• explain the O-notation for running times of algorithms
• write down the Bellman–Ford algorithm
• understand the difference between the two versions of the Bellman–Ford
algorithm
• write down Dijkstra’s algorithm


• understand the different assumptions for the algorithms of Bellman–Ford and
Dijkstra
• apply these algorithms by hand to small networks, and document their
progress with suitable tables.

2.1.2 Essential Reading

Essential reading is this chapter.

2.1.3 Further Reading

Algorithms are a central topic in computer science. They are closely related to data
structures that represent the elements of a network, say, in computer memory. The
“bible” of algorithms is the following book:
Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein (2022). Introduction to
Algorithms, 4th ed. MIT Press, Cambridge, MA. ISBN 978-0262046305.
At nearly 1,400 pages, it describes important algorithms and their analysis in great
detail. In that book, you find further discussion of
• graphs and their representations (Section 2.3 in this guide) in chapter 20,
• single-source shortest path algorithms (Sections 2.5, 2.7, 2.9) in chapter 22,
• O-notation (Section 2.8) in chapter 3.
A good description of an algorithm for bipartite matching (see Section 2.2) is
given in section 10.2 of
Papadimitriou, C. H. and K. Steiglitz (1998). Combinatorial Optimization:
Algorithms and Complexity. Dover, Mineola, NY. ISBN 978-0486402581.

2.1.4 Synopsis of Chapter Content

The structure of this first chapter of the guide is as follows.


• Section 2.2 gives an introductory example of the so-called marriage problem. It
illustrates that a “brute force” method that tries out all possible combinations
is not feasible for larger problems.
• We then define typical discrete structures where combinatorial optimisation
applies: graphs, digraphs, networks (Section 2.3) and concepts about them
(Section 2.4) such as paths and cycles.
• Sections 2.5, 2.7, and 2.9 study optimisation problems for networks, concerned
with finding shortest paths from a starting node to any other node.
• Section 2.6 introduces algorithms and “pseudo-code” for describing them.

• Section 2.8 defines the O-notation to analyse and describe the running time of
algorithms in a concise manner.
Section 2.1.3 gives some pointers to further optional reading on the large subject of
algorithms and combinatorial optimisation.

2.2 Introductory Example: The Marriage Problem

We start with an example of a combinatorial problem defined on a graph, called
the marriage problem. Suppose there are 𝑛 women and 𝑛 men, and for every pair
(𝑎, 𝑏) of a woman 𝑎 and a man 𝑏 it is known whether they would both consider
marrying each other, in which case they are called a “possible couple”.

[Figure (2.1): the bipartite graph of possible couples, with women 𝑎1, 𝑎2, 𝑎3 on the
left, men 𝑏1, 𝑏2, 𝑏3 on the right, and a line joining each possible couple.]

An example is shown in (2.1) for 𝑛 = 3 with women 𝑎1, 𝑎2, 𝑎3 and men 𝑏1, 𝑏2, 𝑏3.
The possible couples are shown by lines between them: (𝑎1, 𝑏1), (𝑎2, 𝑏1), (𝑎3, 𝑏2),
and (𝑎3, 𝑏3). The problem is to marry off as many women and men as possible. They
must be possible couples, and every woman and man can have at most one marriage
partner. The maximum number of couples is clearly 𝑛. A simpler problem is to
decide if such a “perfect matching” is possible.
In the example (2.1), this is not possible, even though there is a possible partner
for every woman and man. This is easily seen as follows: The sole possible partner
for both 𝑎1 and 𝑎 2 is 𝑏 1 , but 𝑏 1 can marry only one of them (similarly, 𝑎 3 can marry
only one of the men 𝑏 2 and 𝑏 3 who have only 𝑎3 as a possible partner).

⇒ Try Exercise 2.1, which considers these possibilities when one more possible
couple is added to (2.1).

Instead of women and men who marry, the matching problem similarly applies
to assigning workers to jobs (where the graph describes which worker can do
which job), or other applications. In an abstract setting, the women and men define
the nodes of a graph, with the possible couples called edges, here drawn as lines
between the nodes. The graph in the marriage problem has the special property of
being bipartite (meaning that edges always connect two nodes 𝑎 and 𝑏 that come
from two disjoint sets). We will soon, in Definition 2.1, define the concept of a
graph.

We use the matching problem to illustrate how combinatorial optimisation
problems should not be solved once the problem becomes large. In principle,
a finite optimisation problem can be solved by trying out all possible combinations
and finding the best one. This is called a brute force approach. For the marriage
problem with 𝑛 women and 𝑛 men, a brute force approach is to try out all possible
ways to order the 𝑛 men so as to marry them to the 𝑛 women 𝑎1 , 𝑎2 , . . . , 𝑎 𝑛 and
to check if the resulting couples are all possible. There are six possible ways of
ordering three men, namely in the order 123, 132, 213, 231, 312, or 321. The order
213 means that the men 𝑏 2 , 𝑏1 , 𝑏3 are matched in that order to the women 𝑎 1 , 𝑎2 , 𝑎3 ,
with couples (𝑎 1 , 𝑏2 ), (𝑎 2 , 𝑏1 ), (𝑎 3 , 𝑏3 ). However, this is not a perfect matching
because the couple (𝑎1 , 𝑏2 ) is not allowed in (2.1).
This brute force approach works for 𝑛 = 3. However, for general 𝑛 there are
𝑛! = 𝑛 · (𝑛 − 1) · · · 2 · 1 such orderings (namely 𝑛 choices for the partner of 𝑎1 , then
𝑛 − 1 for the partner of 𝑎2 , and so on). Already for 𝑛 = 32, say, 𝑛! is about 2.6 × 10³⁵.
Even if ten billion (10¹⁰) possibilities could be checked per second, this would take
more than 10¹⁷ years, millions of times the current age of the universe. So such an
approach is clearly impractical, even taking future improvements in computing
technology into account.
Much research in combinatorial optimisation is dedicated to finding computa-
tionally efficient algorithms whose running time “scales well” with the size of their
input. For the matching problem on bipartite graphs with 𝑛 nodes on each side,
there is a relatively straightforward algorithm that can find the maximum number
of couples in a number of steps that is proportional to 𝑛³. This is vastly better than
the brute-force number of 𝑛! many possibilities. For 𝑛 = 32, on a modern computer
(such as your smartphone), this is performed in a fraction of a second, and even for
𝑛 = 1,000 in less than a minute. So it is clearly useful to look for clever algorithms.
For reasons of space, we will in fact not consider further the marriage problem,
or bipartite graphs. Suitable reading on this topic is mentioned in Section 2.1.3.

2.3 Graphs, Digraphs, Networks

We now define basic terminology for combinatorial problems defined on graphs.

Definition 2.1. A graph is given by (𝑉 , 𝐸) with a finite set 𝑉 of vertices or nodes and
a set 𝐸 of edges which are unordered pairs {𝑢, 𝑣} of nodes 𝑢, 𝑣.

The terminology of “vertex” and “edge” is derived from a geometric view,


for example of a three-dimensional cube, whose vertices (corners) are indeed
connected by edges of the cube. The following picture on the left shows a distorted,
“flattened” view of such a cube. According to Definition 2.1, the graph is a
combinatorial structure, that is, it only matters which nodes are connected by edges,
not how these edges are drawn. To emphasise the nodes (whose identity may
otherwise be ambiguous when edges cross in the drawing), they are typically drawn as small disks,
as in the middle picture. It may also be necessary to draw crossing edges, as in the
right picture which has an additional edge from 𝑢 to 𝑣.
[Figure: three drawings of a cube graph: a distorted “flattened” cube; the same drawing with the nodes shown as small disks; and a drawing with an additional edge from 𝑢 to 𝑣 that crosses other edges.]

The following is an example with 𝑉 = {𝑎, 𝑏, 𝑐, 𝑑}, and edges in 𝐸 given by the
unordered pairs {𝑎, 𝑏}, {𝑏, 𝑐}, {𝑏, 𝑑}, and {𝑐, 𝑑}:

[Figure: the graph drawn with nodes 𝑎, 𝑏 in the top row and 𝑐, 𝑑 in the bottom row, and its four edges drawn as lines.]        (2.2)

We will normally not assume that connections between two nodes 𝑢 and 𝑣 are
symmetric (even though this may apply in many cases). The concept of a directed
graph allows us to distinguish between getting from 𝑢 to 𝑣 and getting from 𝑣 to 𝑢.

Definition 2.2. A digraph (directed graph) is given by (𝑉 , 𝐴) with a finite set 𝑉 of


vertices or nodes and a set 𝐴 of arcs which are ordered pairs (𝑢, 𝑣) of nodes 𝑢, 𝑣.

The following is an example with 𝑉 = {𝑎, 𝑏, 𝑐, 𝑑}, and arcs in 𝐴 given by the
ordered pairs (𝑎, 𝑏), (𝑏, 𝑐), (𝑐, 𝑑), (𝑏, 𝑑), and (𝑑, 𝑏), where an arc (𝑢, 𝑣) is drawn as
an arrow that points from 𝑢 to 𝑣 :

[Figure: the digraph drawn with nodes 𝑎, 𝑏 in the top row and 𝑐, 𝑑 in the bottom row; each arc (𝑢, 𝑣) is drawn as an arrow from 𝑢 to 𝑣.]        (2.3)

An arc (𝑢, 𝑣) is also called an arc from 𝑢 to 𝑣. In a digraph, we do not allow arcs
(𝑢, 𝑢) from a node 𝑢 to itself (such arcs are called “loops”). We do allow arcs in
reverse directions such as (𝑢, 𝑣) and (𝑣, 𝑢), as in the example (2.3) for 𝑢, 𝑣 = 𝑏, 𝑑.
Because 𝐴 is a set, it cannot record multiple or “parallel” arcs between the same
nodes, that is, two or more arcs of the form (𝑢, 𝑣), so these are automatically
excluded.

⇒ Given this definition of digraphs which does not allow loops (𝑢, 𝑢) or multiple
parallel arcs, answer Exercise 2.2.

The following definition introduces a network as a digraph where every arc has a
weight. This weight is in principle a real number (an element of R) but needs to
be stored in a computer in finite form, and is therefore assumed to be a fraction
(that is, a rational number, an element of Q). Sometimes we also allow weights
that are larger than any rational number, denoted by the symbol ∞ for infinity; in
a computer, this has a separate encoding as a special “number”.
Definition 2.3. Let 𝐷 = (𝑉 , 𝐴) be a digraph. A network is given by (𝐷, 𝑤) with a
weight function 𝑤 : 𝐴 → Q ∪ {∞} that assigns a weight 𝑤(𝑢, 𝑣) to every arc (𝑢, 𝑣)
in 𝐴.

The following is an example of a network with the digraph from (2.3). Next to
each arc is written its weight. In many examples we use integers as weights, but
this need not be the case, as here where 𝑤(𝑎, 𝑏) = 1.2; this is a fraction, namely 12/10.

[Figure: the digraph from (2.3) with its arc weights written next to the arcs: 𝑤(𝑎, 𝑏) = 1.2, 𝑤(𝑐, 𝑑) = 2, 𝑤(𝑏, 𝑐) = −9, and weights 3 and −7 on the two arcs between 𝑏 and 𝑑.]

2.4 Walks, Paths, Tours, and Cycles

The underlying structure in our study will always be a digraph, typically with a
weight function so that this becomes a network. The arcs in the digraph represent
connections between nodes that can be followed. A sequence of such connections
defines a walk in the network. The following terminology also defines certain
special cases of walks.
Definition 2.4. Let 𝐷 = (𝑉 , 𝐴) be a digraph. A walk in 𝐷 is a finite sequence of
nodes 𝑢0 , 𝑢1 , . . . , 𝑢 𝑘 for some 𝑘 ≥ 0 such that (𝑢𝑖 , 𝑢𝑖+1 ) ∈ 𝐴 for 0 ≤ 𝑖 < 𝑘, which
are the 𝑘 arcs of the walk, and 𝑘 is called the length of the walk. This is also called a
𝑢0 , 𝑢 𝑘 -walk, or a walk from 𝑢0 to 𝑢 𝑘 , or a walk with startpoint 𝑢0 and endpoint 𝑢 𝑘 .
The walk is called a path if the nodes 𝑢0 , 𝑢1 , . . . , 𝑢 𝑘 are all distinct, a tour if 𝑢0 = 𝑢 𝑘 ,
and a cycle if it is a tour and the nodes 𝑢1 , . . . , 𝑢 𝑘 are all distinct.

The visited nodes on a walk are just the nodes of the walk. In a walk, but not in
a path, a node may be revisited. A tour starts and ends at the same node. A cycle
also has the same startpoint and endpoint but otherwise does not allow revisiting
a node.

In (2.3), the sequence 𝑎, 𝑏, 𝑐, 𝑑, 𝑏 is a walk but not a path, 𝑎, 𝑏, 𝑐, 𝑑 is a path,


𝑑, 𝑏, 𝑑 is a tour and a cycle, and 𝑐, 𝑑, 𝑏, 𝑑, 𝑏, 𝑐 is a tour but not a cycle.
Sometimes a walk, path, tour, or cycle is called a directed walk, directed
path, directed tour, or directed cycle to emphasise that arcs cannot be traversed
backwards. Because we only consider a digraph (and not, say, the graph that
results when ignoring the direction of each arc), we do not need to add the adjective
“directed”.
One calls a graph “connected” if any two nodes are connected by a path; for
example, the graph (2.2) is connected.
Paths are of particular interest in digraphs because there are only finitely many
of them, since each node visited on a path must be new. (As discussed at the
beginning of this chapter, the number of possible paths may still be huge.) The
following is a simple but crucial observation.

Proposition 2.5. Consider a digraph and two nodes 𝑢, 𝑣. If there is a walk from 𝑢 to 𝑣,
then there is a path from 𝑢 to 𝑣.

Proof. Consider a walk 𝑢0 , 𝑢1 , . . . , 𝑢 𝑘 with 𝑢0 = 𝑢 and 𝑢 𝑘 = 𝑣. If this is not a


path, then 𝑢𝑖 = 𝑢 𝑗 for some 𝑖, 𝑗 with 0 ≤ 𝑖 < 𝑗 ≤ 𝑘, so the sequence of nodes
𝑢𝑖 , 𝑢𝑖+1 , . . . , 𝑢 𝑗 is a tour 𝑇 with 𝑗 − 𝑖 arcs. By removing the nodes 𝑢𝑖+1 , . . . , 𝑢 𝑗 from
the walk (effectively, removing the “detour” 𝑇) we obtain a walk with fewer arcs.
By continuing in this manner, we eventually obtain a walk with no repetition of
nodes, which is a path. (This may result in a path of length zero when 𝑢 = 𝑣.)

⇒ Exercise 2.3 is about joining two paths together.

In a similar way, one can show the following (the qualifier “positive length” is
added to exclude the trivial cycle of length zero).

Proposition 2.6. Consider a digraph and a node 𝑢. If there is a tour of positive length
that starts and ends at 𝑢, then there is a cycle of positive length that starts and ends at 𝑢.

⇒ Provide your own proof of Proposition 2.6.

2.5 Shortest Walks in Networks

In a network, the weights typically represent costs of some sort associated with
the respective arcs. Weights for walks (and similarly of paths, tours, cycles) are
defined by summing the weights of their arcs.

Definition 2.7. Let (𝐷, 𝑤) be a network and let 𝑊 = 𝑢0 , 𝑢1 , . . . , 𝑢 𝑘 be a walk in 𝐷.


The weight 𝑤(𝑊) of the walk 𝑊 is the sum of the weights of the 𝑘 arcs of that walk:
𝑤(𝑊) = ∑_{𝑖=0}^{𝑘−1} 𝑤(𝑢𝑖 , 𝑢𝑖+1 ).        (2.4)

We allow for walks 𝑊 of length zero (𝑘 = 0), in which case 𝑤(𝑊) = 0. If in (2.4)
𝑤(𝑢𝑖 , 𝑢𝑖+1 ) = ∞ for some 𝑖, then 𝑤(𝑊) = ∞.

Note the difference between length and weight of a walk: length counts the
number of arcs in the walk, whereas weight is the sum of the weights of these arcs
(length is the same as weight only if every arc has weight one).
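In code, Definition 2.7 is a single sum over the consecutive arcs of the walk. A possible Python sketch (our own, not part of the guide; the network is a dictionary of arc weights, here using the example network after Definition 2.3 with the weights as we read them from that figure):

```python
def walk_weight(w, walk):
    """Weight of a walk as in (2.4): the sum of the weights of its arcs.
    A walk of length zero (a single node) has weight 0."""
    return sum(w[u, v] for u, v in zip(walk, walk[1:]))

# Example network: digraph (2.3) with assumed arc weights.
w = {("a", "b"): 1.2, ("b", "c"): -9, ("b", "d"): 3,
     ("d", "b"): -7, ("c", "d"): 2}
print(walk_weight(w, ["a", "b", "c", "d"]))  # 1.2 - 9 + 2 ≈ -5.8
```

Note that the walk 𝑎, 𝑏, 𝑐, 𝑑 has length 3 but weight about −5.8, illustrating the difference just described.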
Given a network and two nodes 𝑢 and 𝑣, we are interested in the 𝑢, 𝑣-walk
of minimum weight, often called shortest walk from 𝑢 to 𝑣 (remember throughout
that “shortest” means “least weighty”). Because there may be infinitely many
walks between two nodes if there is a possibility to revisit some nodes on the
way, “minimum weight” may not be a number, but may be equal to plus or minus
infinity.

Definition 2.8. Let 𝑢, 𝑣 be two nodes in a network (𝐷, 𝑤). Let

𝑌(𝑢, 𝑣) = {𝑊 | 𝑊 is a walk from 𝑢 to 𝑣}

and 𝑦(𝑢, 𝑣) = {𝑤(𝑊) | 𝑊 ∈ 𝑌(𝑢, 𝑣)}. The distance from 𝑢 to 𝑣, denoted by dist(𝑢, 𝑣),
is defined as follows:

 +∞


 if 𝑤(𝑊) = ∞ for all 𝑤 ∈ 𝑌(𝑢, 𝑣),
dist(𝑢, 𝑣) = −∞ if 𝑦(𝑢, 𝑣) is nonempty and has no smallest number,
 min 𝑦(𝑢, 𝑣)

otherwise.

So dist(𝑢, 𝑣) is +∞ (also written ∞) if every walk from 𝑢 to 𝑣 has an arc of


weight ∞ (which includes the case that there is no walk from 𝑢 to 𝑣, that is,
𝑌(𝑢, 𝑣) = ∅), and −∞ if there are walks of arbitrarily negative weight from 𝑢 to 𝑣.
We will be able to identify the latter with a finite condition, namely the existence
of a cycle of negative weight that can be inserted into a walk from 𝑢 to 𝑣.

Theorem 2.9. Let 𝑢, 𝑣 be two nodes in a network (𝐷, 𝑤). Then dist(𝑢, 𝑣) = −∞ if and
only if there is a cycle 𝐶 with 𝑤(𝐶) < 0 that starts and ends at some node on a walk from
𝑢 to 𝑣. If dist(𝑢, 𝑣) ≠ ±∞, then

dist(𝑢, 𝑣) = min{𝑤(𝑃) | 𝑃 is a path from 𝑢 to 𝑣 }. (2.5)

Proof. We make repeated use of the proof of Proposition 2.5. Suppose there is a
walk 𝑃 = 𝑢0 , 𝑢1 , . . . , 𝑢 𝑘 with 𝑢0 = 𝑢 and 𝑢 𝑘 = 𝑣 and a cycle 𝐶 = 𝑢𝑖 , 𝑣1 , . . . , 𝑣ℓ −1 , 𝑢𝑖
that starts and ends at some node 𝑢𝑖 on that walk, 0 ≤ 𝑖 ≤ 𝑘, with 𝑤(𝐶) < 0. Let
𝑛 ∈ N. We insert 𝑛 repetitions of 𝐶 into 𝑃 to obtain a walk 𝑊 that we write (in an
obvious notation) as
𝑊 = 𝑢0 , 𝑢1 , . . . , 𝑢𝑖 , [𝑣 1 , . . . , 𝑣ℓ −1 , 𝑢𝑖 , ]𝑛 𝑢𝑖+1 , . . . , 𝑢 𝑘 .
The first 𝑖 arcs together with the last 𝑘 − 𝑖 arcs of 𝑊 are those of 𝑃, with 𝑛
copies of 𝐶 in the middle, so 𝑊 has weight 𝑤(𝑃) + 𝑛 · 𝑤(𝐶). For larger 𝑛 this is
arbitrarily negative because 𝑤(𝐶) < 0, and 𝑊 belongs to the set 𝑌(𝑢, 𝑣). Hence
dist(𝑢, 𝑣) = −∞.
Conversely, let dist(𝑢, 𝑣) = −∞. Consider a path 𝑃 from 𝑢 to 𝑣 of minimum
weight 𝑤(𝑃) as given by the minimum in (2.5). Suppose there is a 𝑢, 𝑣-walk 𝑊
with 𝑤(𝑊) < 𝑤(𝑃), which exists because dist(𝑢, 𝑣) = −∞ (otherwise 𝑤(𝑃) would
be a lower bound of 𝑦(𝑢, 𝑣)). Because 𝑊 is clearly not a path, it contains a tour 𝑇
as in the proof of Proposition 2.5. If 𝑤(𝑇) ≥ 0 then we could remove 𝑇 from 𝑊 and
obtain a walk 𝑊 ′ with weight 𝑤(𝑊 ′) = 𝑤(𝑊) − 𝑤(𝑇) ≤ 𝑤(𝑊), and thus eventually
a path of weight less than 𝑤(𝑃), in contradiction to the definition of 𝑃. So 𝑊 contains a
tour 𝑇 with 𝑤(𝑇) < 0, which starts and ends at some node 𝑥, say. We now claim
that 𝑇 contains a cycle 𝐶 with 𝑤(𝐶) < 0. If 𝑇 is itself a cycle, that is clearly the case.
Otherwise, 𝑇 either contains a “subtour” 𝑇 ′ with 𝑤(𝑇 ′) < 0 (and in general some
other startpoint 𝑦) which we can consider instead of 𝑇, or else every subtour 𝑇 ′
of 𝑇 fulfills 𝑤(𝑇 ′) ≥ 0 in which case we can remove 𝑇 ′ from 𝑇 without increasing
𝑤(𝑇); an example of these two possibilities is shown in the following picture with
𝑇 = 𝑥, 𝑦, 𝑧, 𝑦, 𝑥 and 𝑇 ′ = 𝑦, 𝑧, 𝑦.
[Figure: two networks on nodes 𝑢, 𝑥, 𝑣 (top row) and 𝑧, 𝑦 (bottom row), each with 𝑤(𝑢, 𝑥) = 𝑤(𝑥, 𝑣) = 1. Left network: 𝑤(𝑥, 𝑦) = 𝑤(𝑦, 𝑥) = 1, 𝑤(𝑦, 𝑧) = −1, 𝑤(𝑧, 𝑦) = −2. Right network: 𝑤(𝑥, 𝑦) = −1, 𝑤(𝑦, 𝑥) = −2, 𝑤(𝑦, 𝑧) = 𝑤(𝑧, 𝑦) = 1.]        (2.6)
In either case, 𝑇 is eventually reduced to a cycle 𝐶 with 𝑤(𝐶) < 0 which is part of
𝑊 (where 𝑊 is modified alongside 𝑇 when removing subtours 𝑇 ′ of nonnegative
weight). This shows the first claim of the theorem.
This implies that if dist(𝑢, 𝑣) ≠ ±∞, then there is a walk and hence a path from
𝑢 to 𝑣, and no 𝑢, 𝑣-walk contains a cycle or tour of negative weight, and hence (2.5)
holds according to the preceding reasoning.
Note that the left picture in (2.6) shows that we can have dist(𝑢, 𝑣) = −∞ even
though no cycle of negative weight can be inserted into a path from 𝑢 to 𝑣. In this
example it is only possible to insert a tour of negative weight into a path from 𝑢
to 𝑣, or to insert a cycle of negative weight into a walk from 𝑢 to 𝑣.
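The weight computation 𝑤(𝑃) + 𝑛 · 𝑤(𝐶) from the proof of Theorem 2.9 can be checked numerically. The following Python sketch (our own illustration; the arc weights are our reading of the left picture in (2.6)) inserts 𝑛 copies of the negative-weight tour 𝑥, 𝑦, 𝑧, 𝑦, 𝑥 into the path 𝑢, 𝑥, 𝑣:

```python
# Left network of (2.6), arc weights as we read them from the figure.
w = {("u", "x"): 1, ("x", "v"): 1, ("x", "y"): 1, ("y", "x"): 1,
     ("y", "z"): -1, ("z", "y"): -2}

def weight(walk):
    """Sum of the arc weights along a walk, as in (2.4)."""
    return sum(w[a, b] for a, b in zip(walk, walk[1:]))

tour = ["y", "z", "y", "x"]  # the tour x,y,z,y,x without its startpoint x
for n in range(4):
    walk = ["u", "x"] + tour * n + ["v"]  # insert n copies of the tour at x
    print(n, weight(walk))  # weights 2, 1, 0, -1: one less per copy
```

The tour has weight −1, so the walk weights decrease without bound as 𝑛 grows, giving dist(𝑢, 𝑣) = −∞ as claimed.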
Theorem 2.9 would be simpler to prove if it just stated the existence of a
negative-weight tour that can be inserted into a walk from 𝑢 to 𝑣 as an equivalent
statement to dist(𝑢, 𝑣) = −∞. However, it is common to call this condition the


existence of a negative-weight (or just “negative”) cycle. For that reason, we proved
the stronger statement.
Consider again the network in the left picture in (2.6) but with the arc (𝑦, 𝑥)
removed:
[Figure: the left network of (2.6) without the arc (𝑦, 𝑥): arcs (𝑢, 𝑥), (𝑥, 𝑣), (𝑥, 𝑦) of weight 1, and 𝑤(𝑦, 𝑧) = −1, 𝑤(𝑧, 𝑦) = −2.]
In that case dist(𝑢, 𝑣) = 2 because the only walk from 𝑢 to 𝑣 is the path 𝑢, 𝑥, 𝑣, and
the negative (-weight) cycle 𝑦, 𝑧, 𝑦 can be reached from 𝑢 but cannot be extended
to a walk to 𝑣. Nevertheless, we will in the following consider all negative cycles
that can be reached from a given node 𝑢 as “bad” for the computation of distances
𝑑(𝑢, 𝑣) for nodes 𝑣.

2.6 Introduction to Algorithms

Before we describe two algorithms for computing shortest paths in networks, we


explain some notations for algorithms. An algorithm is a sequence of precise
instructions to solve a computational problem in finitely many steps. These
instructions are normally executed by a computer, but can also be performed
“manually”.
As a first example, we consider an algorithm that finds the minimum 𝑚 of a
finite nonempty set 𝑆 of real numbers.

Algorithm 2.10 (Finding the minimum of a nonempty finite set).

Input : finite nonempty set 𝑆 ⊂ R.


Output : minimum 𝑚 of 𝑆, that is, 𝑚 ∈ 𝑆 and 𝑚 ≤ 𝑥 for all 𝑥 ∈ 𝑆.

1. 𝑚 ← some element of 𝑆
2. remove 𝑚 from 𝑆
3. while 𝑆 ≠ ∅ :
4.     𝑥 ← some element of 𝑆
5.     𝑚 ← min{𝑚, 𝑥}
6.     remove 𝑥 from 𝑆

In this algorithm, we first specify its behaviour in terms of its input and output.
Here the input is a nonempty finite set 𝑆 of real numbers, and the output is the
minimum of that set, denoted by 𝑚. The algorithm is described by a sequence of


instructions. These instructions are numbered, but solely for reference; the line
numbers are irrelevant for the operation of the algorithm (some descriptions of
algorithms make use of them, such as “go to step 3”, but we do not).
Line 1 says that the variable 𝑚 (which here stores a real number) will be set
to the result of the right-hand side of the arrow “ ← ”. Such a statement is also
called an assignment. Here, this right-hand side is a function that produces some
element of 𝑆 (which for the moment we assume is implemented somehow). Line 2
states to remove 𝑚 from 𝑆. Line 3 is the beginning of a repeated sequence of
instructions, and says that as long as the condition 𝑆 ≠ ∅ is true, the instructions
in the subsequent lines 4–6 will be executed. The fact that the repeated set of
instructions are these three lines (and not just line 4, say) is indicated in this
notation by the indentation of lines 4–6, that is, they all start further to the right
with a fixed blank space inserted on the left (see also the discussion that follows
Algorithm 2.11 below). This convention makes such a “computer program” very
readable, and is in fact adopted by the programming language Python. Other
programming languages, such as C, C++, or Java, require that several instructions
that are to be executed together are put between delimiters { and } (so the opening
brace { would appear between lines 3 and 4 and the corresponding closing brace }
after line 6); we use indentation instead.
Consider now what happens in lines 3–6 which are executed repeatedly. First, if
𝑆 = ∅, then the computation in these lines finishes, and because there are no further
instructions, the algorithm terminates altogether. This happens immediately if
the original set 𝑆 contains only a single element. Otherwise, 𝑆 is not empty and
in line 4 another element 𝑥 is found in 𝑆. In line 5, the right-hand side is the
minimum of two numbers, here the current values of 𝑚 and 𝑥, and the result will
be assigned again to 𝑚. The effect is that if 𝑥 < 𝑚, then 𝑚 will assume the new
smaller value 𝑥, otherwise 𝑚 is unchanged. In line 6, the element 𝑥 is removed
from 𝑆. Because 𝑆 loses one element in each iteration and is originally finite, 𝑆 will
eventually become empty, and the algorithm terminates. It can be seen that 𝑚 will
then be the smallest of all the elements in 𝑆, as required.
Several observations are in order. First, 𝑆 will in practice not contain a
set of arbitrary real numbers but only of rational numbers, which have a finite
representation; even that representation is typically limited to a certain number
of digits. Nevertheless, the algorithm works also in an idealised setting where 𝑆
contains arbitrary reals. Second, the instruction in line 5 seems circuitous because
it asks to compute the minimum of two numbers 𝑚 and 𝑥, but we are meant to
define an algorithm that computes a minimum of a finite set. However, computing
the minimum of two numbers is a simpler task, and in fact one of the numbers, 𝑚,
will become the result. That is, line 5 can be replaced by the more basic conditional
instruction that uses an if statement:

5. if 𝑥 < 𝑚 : 𝑚 ← 𝑥
where the assignment 𝑚 ← 𝑥 will not happen if 𝑥 < 𝑚 is false, that is, if 𝑥 ≥ 𝑚,
in which case 𝑚 is unchanged. We have chosen the description in Algorithm 2.10
because it is more readable.
A further observation is that the algorithm can be made more “elegant” by
avoiding the repetition of the similar instructions in lines 2 and 6. Namely, we
omit line 2 altogether and replace line 1 with the assignment
1. 𝑚 ← ∞
under the assumption that an element ∞ that is larger than all real numbers exists
and can be stored in the computer. In that case, the first element that is found in
the set 𝑆 is 𝑥 in line 4, which when compared in line 5 with 𝑚 (which currently has
value ∞) will certainly fulfill 𝑥 < 𝑚 and thus 𝑚 takes in the first iteration the value
of 𝑥, which is then removed from 𝑆 in line 6. So then the first iteration of the “loop”
in lines 3–6 performs what happened in lines 1–2 in the original Algorithm 2.10.
This variant of the algorithm is not only shorter but also more general because it
can also be applied to an empty set 𝑆. It is reasonable to define min ∅ = ∞ because
∞ is the neutral element of min in the sense that min{𝑥, ∞} = 𝑥 for all reals 𝑥, just
as an empty sum is 0 (the neutral element of addition) or an empty product is 1
(the neutral element of multiplication). For example, this would apply to the case
that dist(𝑢, 𝑣) = ∞ in (2.5) when there is no path from 𝑢 to 𝑣.
When Algorithm 2.10 terminates, the set 𝑆 will be empty and therefore no
longer be the original set. If this is undesired, one may instead create a copy of 𝑆
on which the algorithm operates that can be “destroyed” in this way while the
original 𝑆 is preserved.
This raises the question of how a set 𝑆 is represented in a computer. The best
way to think of this is as a table of a fixed length 𝑛, say, that stores the elements of 𝑆
which will be denoted as 𝑆[1], 𝑆[2], . . . , 𝑆[𝑛]. Each table element 𝑆[𝑖] for 1 ≤ 𝑖 ≤ 𝑛
is a “real” number in a given limited precision just as the variables 𝑚 and 𝑥. In
programming terminology, 𝑆 is then also called an array of numbers, with a given
array index 𝑖 in a specified range (here 1 ≤ 𝑖 ≤ 𝑛) to access the array element 𝑆[𝑖].
In the computer, the array corresponds to a consecutive sequence of memory cells,
each of which stores an array element. The only difference to a set 𝑆 is that in
that way, repetitions of elements may occur if the numbers 𝑆[1], 𝑆[2], . . . , 𝑆[𝑛]
are not all distinct. Computing the minimum of these 𝑛 (not necessarily distinct)
numbers is possible just as before. Algorithm 2.11, shown below, is close to an
actual implementation in a programming language such as Python. We just say
“numbers” which are real (in fact, rational) numbers as they can be represented in
a computer.
Algorithm 2.11 (Finding the minimum in an array of numbers).

Input : 𝑛 numbers 𝑆[1], 𝑆[2], . . . , 𝑆[𝑛], 𝑛 ≥ 1.

Output : their minimum 𝑚 and its index 𝑖, that is, 𝑚 = 𝑆[𝑖] and 𝑚 ≤ 𝑆[𝑘] for all 𝑘.

1. 𝑚 ← 𝑆[1]
2. 𝑖 ← 1
3. 𝑘 ← 2
4. while 𝑘 ≤ 𝑛 :
5.     if 𝑆[𝑘] < 𝑚 :
6.         𝑚 ← 𝑆[𝑘]
7.         𝑖 ← 𝑘
8.     𝑘 ← 𝑘 + 1

In Algorithm 2.11, the indentation (white space at the left) in lines 6–7 means
that these two statements are executed if the condition 𝑆[𝑘] < 𝑚 of the if statement
in line 5 is true. Line 8 has the same indentation as line 5, so the statement 𝑘 ← 𝑘 + 1
is always executed inside the “loop” in lines 4–8. This is important because if line 8
were indented like lines 6 and 7, then whenever 𝑆[𝑘] ≥ 𝑚 the value of 𝑘 would stay the
same instead of being incremented by 1, so that the loop in lines 4–8 would from
then on be executed forever. The result would be a faulty algorithm that normally
never terminates (unless the elements in the array are in strictly decreasing order).
Lines 2 and 7, together with their respective preceding lines, make sure that the
index 𝑖 will be such that 𝑚 = 𝑆[𝑖] always holds. If one is not interested in this
index 𝑖 of the minimum in the array, then lines 2 and 7 can be omitted.

⇒ Try Exercise 2.4, which is about a subtle change of the return value 𝑖 in
Algorithm 2.11 in case of repeated elements in the array 𝑆.
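As an illustration of how close Algorithm 2.11 is to a real program, here is a direct Python transcription (our own sketch, not part of the guide; Python lists are 0-indexed, so the array 𝑆[1..𝑛] becomes S[0..n−1] and the returned index is shifted accordingly):

```python
def minimum_with_index(S):
    """Algorithm 2.11 for a nonempty list S: return (m, i) with m = S[i]
    and m <= S[k] for all k. Indices are 0-based, unlike the pseudocode."""
    m = S[0]              # line 1
    i = 0                 # line 2
    k = 1                 # line 3
    while k < len(S):     # line 4
        if S[k] < m:      # line 5
            m = S[k]      # line 6
            i = k         # line 7
        k = k + 1         # line 8
    return m, i

print(minimum_with_index([3, 1, 4, 1, 5]))  # (1, 1): first occurrence wins
```

Because the comparison in line 5 is strict, the returned index is that of the first occurrence of the minimum, which is the kind of behaviour Exercise 2.4 asks you to examine.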

Algorithm 2.11 is very detailed and shows how to iterate with an index
variable 𝑘 through array elements 𝑆[𝑘] that represent the elements of a set.
Moreover, the array itself is not modified by this operation, unlike the description
in Algorithm 2.10. We normally aim for the most concise description. The following
is a short version that uses the loop description for all 𝑥 ∈ 𝑆 which means a suitable
iteration through the elements 𝑥 of 𝑆, where 𝑆 has some representation such as an
array.
Algorithm 2.12 (Finding the minimum in a set of numbers using “for all”).

Input : finite set of numbers 𝑆


Output : minimum 𝑚 of 𝑆, where 𝑚 = ∞ if 𝑆 is empty.

1. 𝑚 ← ∞
2. for all 𝑥 ∈ 𝑆 :
3.     𝑚 ← min{𝑚, 𝑥}

To represent a digraph 𝐷 = (𝑉 , 𝐴) we need representations of 𝑉 and 𝐴. For


the example (2.3), the following table lists in the top row all vertices of 𝐷, and each
column 𝑢 contains all vertices 𝑣 so that (𝑢, 𝑣) ∈ 𝐴.

𝑎    𝑏    𝑐    𝑑
𝑏    𝑐    𝑑    𝑏        (2.7)
     𝑑

The columns in this table are also called adjacency lists (which in general have
different lengths). Such a table is easily stored in a computer. Often the vertices are
represented by the integers 1, . . . , |𝑉 | (which makes it easy to find the adjacency
list for each vertex).

⇒ Exercise 2.5 is about more elaborate adjacency lists.

2.7 Single-Source Shortest Paths: Bellman–Ford

We resume the discussion from Section 2.5. That is, in the following we will study
algorithms for single-source shortest paths, where some node 𝑠 (for “source”, or
“start node”) is specified and the task is to compute dist(𝑠, 𝑣) for all nodes 𝑣, or to
find out that some node 𝑣 can be reached from 𝑠 so that dist(𝑠, 𝑣) = −∞, in which
case the algorithm will stop. In short, meaningful distances will only be computed
under the assumption that there is no negative cycle that can be reached from 𝑠.
The reasoning behind computing dist(𝑠, 𝑣) for all nodes 𝑣 is that even if one is only
interested in computing dist(𝑠, 𝑡) for a specific pair 𝑠, 𝑡, there is essentially no other
way than to compute dist(𝑠, 𝑣) for all other nodes 𝑣 because 𝑣 could be the last
node before 𝑡 on a shortest path from 𝑠 to 𝑡.
Algorithm 2.13 below, generally known as the Bellman–Ford Algorithm, finds
shortest paths from a single source node 𝑠 to all other nodes 𝑣, or terminates with
a warning message that a negative (-weight) cycle can be reached from 𝑠 so that all
nodes 𝑣 in that cycle (and possibly others) fulfill dist(𝑠, 𝑣) = −∞.
The algorithm uses an internal table 𝑑[𝑣, 𝑖 ] where 𝑣 is a vertex and 𝑖 takes
values between 0 and |𝑉 | − 1. Possible values for 𝑑[𝑣, 𝑖 ] are real numbers as well as
∞, where it is assumed that ∞ + 𝑥 = ∞ and min{∞, 𝑥} = 𝑥 for any real number 𝑥 or
if 𝑥 = ∞. The algorithm is presented in a first version that is easier to understand
than a second version (Algorithm 2.18 below) where the two-dimensional table
with entries 𝑑[𝑣, 𝑖 ] will be replaced by a one-dimensional array with entries 𝑑[𝑣].
We explain Algorithm 2.13 alongside the following example of a network with
four nodes 𝑠, 𝑥, 𝑦, 𝑧.

Algorithm 2.13 (Bellman–Ford, first version).

Input : network (𝑉 , 𝐴, 𝑤) and source (start node) 𝑠.


Output : dist(𝑠, 𝑣) for all nodes 𝑣 if no such distance is −∞.

1. 𝑑[𝑠, 0] ← 0
2. for all 𝑣 ∈ 𝑉 − {𝑠} : 𝑑[𝑣, 0] ← ∞
3. 𝑖 ← 0
4. while 𝑖 < |𝑉 | − 1 :
5.     for all 𝑣 ∈ 𝑉 : 𝑑[𝑣, 𝑖 + 1] ← 𝑑[𝑣, 𝑖 ]
6.     for all (𝑢, 𝑣) ∈ 𝐴 :
7.         𝑑[𝑣, 𝑖 + 1] ← min{ 𝑑[𝑣, 𝑖 + 1], 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣) }
8.     𝑖 ← 𝑖 + 1
9. for all (𝑢, 𝑣) ∈ 𝐴 :
10.     if 𝑑[𝑢, |𝑉 | − 1] + 𝑤(𝑢, 𝑣) < 𝑑[𝑣, |𝑉 | − 1] :
11.         print “Negative cycle!” and stop immediately
12. for all 𝑣 ∈ 𝑉 : dist(𝑠, 𝑣) ← 𝑑[𝑣, |𝑉 | − 1]

[Figure: network with nodes 𝑠, 𝑥 (top row) and 𝑦, 𝑧 (bottom row); arc weights 𝑤(𝑠, 𝑥) = 1, 𝑤(𝑥, 𝑠) = 2, 𝑤(𝑥, 𝑦) = 2, 𝑤(𝑥, 𝑧) = −1, 𝑤(𝑧, 𝑦) = 1.]

      𝑣        𝑠    𝑥    𝑦    𝑧
   𝑑[𝑣, 0]     0    ∞    ∞    ∞
   𝑑[𝑣, 1]     0   [1]   ∞    ∞         (2.8)
   𝑑[𝑣, 2]     0    1   [3]  [0]
   𝑑[𝑣, 3]     0    1   [1]   0

(square brackets mark the boxed entries)

The right side of (2.8) shows 𝑑[𝑣, 𝑖 ] as rows of a table for 𝑖 = 0, 1, 2, 3, with the
vertices 𝑣 as columns. In lines 1–2 of Algorithm 2.13, these values are initialised
(initially set) to 𝑑[𝑠, 0] = 0 and 𝑑[𝑣, 0] = ∞ for 𝑣 ≠ 𝑠. Lines 4–8 represent the main
loop of the algorithm, where 𝑖 takes successively the values 0, 1, . . . , |𝑉 | − 2, and the
entries in row 𝑑[𝑣, 𝑖 + 1] are computed from those in row 𝑑[𝑣, 𝑖 ]. The important
property of these numbers, which we prove shortly, is the following.
Theorem 2.14. In Algorithm 2.13, at the beginning of each iteration of the main loop
(lines 4–8), 𝑑[𝑣, 𝑖 ] is the smallest weight of any walk from 𝑠 to 𝑣 that has at most 𝑖 arcs.

The main loop begins with 𝑖 = 0 after line 3. In line 5, the entries 𝑑[𝑣, 𝑖 + 1] are
copied from 𝑑[𝑣, 𝑖 ], and will subsequently be updated. In the example (2.8), 𝑑[𝑣, 1]
first contains 0, ∞, ∞, ∞. Lines 6–7 describe a second “inner” loop that considers
all arcs (𝑢, 𝑣). Whenever 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣) is smaller than 𝑑[𝑣, 𝑖 + 1], the assignment
𝑑[𝑣, 𝑖 + 1] ← 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣) takes place. This will not happen if 𝑑[𝑢, 𝑖 ] = ∞
because then also 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣) = ∞. For 𝑖 = 0, the only arc (𝑢, 𝑣) where this is
not the case is (𝑢, 𝑣) = (𝑠, 𝑥), in which case 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣) = 𝑑[𝑠, 0] + 1 = 1, which
is less than ∞, resulting in the assignment 𝑑[𝑥, 1] ← 1. In (2.8), this assignment
is shown by the new entry 1 for 𝑑[𝑥, 1] surrounded by a box. This is the only
assignment of this sort. After all arcs have been considered, it can be verified that
the entries 0, 1, ∞, ∞ in row 𝑑[𝑣, 1] are indeed the smallest weights of walks
from 𝑠 to 𝑣 that use at most one arc, as asserted by Theorem 2.14.
After 𝑖 is increased from 0 to 1 in line 8, the second iteration of the main loop
starts with 𝑖 = 1. Then arcs (𝑢, 𝑣) where 𝑑[𝑢, 𝑖 ] < ∞ are those where 𝑢 = 𝑠 or 𝑢 = 𝑥,
which are the arcs (𝑠, 𝑥), (𝑥, 𝑠), (𝑥, 𝑦), and (𝑥, 𝑧). The last two produce the updates
𝑑[𝑦, 2] ← 𝑑[𝑥, 1] + 𝑤(𝑥, 𝑦) = 1 + 2 = 3 and 𝑑[𝑧, 2] ← 𝑑[𝑥, 1] + 𝑤(𝑥, 𝑧) = 1 − 1 = 0,
shown by the boxed entries in row 𝑑[𝑣, 2] of the table. Again, it can be verified
that these are the weights of shortest walks from 𝑠 to 𝑣 with at most two arcs.
The last iteration of the main loop is for 𝑖 = 2, which produces only a single
update, namely when the arc (𝑧, 𝑦) is considered in line 7, where 𝑑[𝑦, 3] is updated
from its current value 3 to 𝑑[𝑦, 3] ← 𝑑[𝑧, 2] + 𝑤(𝑧, 𝑦) = 0 + 1 = 1. Row 𝑑[𝑣, 3]
then has the weights of shortest walks from 𝑠 to 𝑣 that use at most three arcs. The
main loop terminates when 𝑖 = 3 (in general, when 𝑖 = |𝑉 | − 1).
Because the network in (2.8) has only four nodes, any walk with more than
three arcs (in general, more than |𝑉 | − 1 arcs) cannot be a path. In fact, if there
is a walk with |𝑉 | arcs that is shorter than found so far, it must contain a tour of
negative weight, as will be proved in Theorem 2.16 below. In Algorithm 2.13, lines
9–11 test for a possible improvement of the current values in 𝑑[𝑣, |𝑉 | − 1] (the last
row in the table), much in the same way as in the previous updates in lines 6–7,
by considering all arcs (𝑢, 𝑣). However, unlike the assignment in line 7, such a
possible improvement is now taken to terminate the algorithm immediately with
the notification that there must be a negative cycle that can be reached from 𝑠.
The normal case is that no such improvement is possible. In that case, line 12
produces the desired output of the distances dist(𝑠, 𝑣). In the example (2.8), these
are the entries 0, 1, 1, 0 in the last row 𝑑[𝑣, 3].
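Algorithm 2.13 can be transcribed almost line by line into Python. The sketch below is our own illustration (not part of the guide); the network is given as a vertex list plus a dictionary of arc weights, math.inf stands in for ∞, and the weights of example (2.8) are our reading of the figure:

```python
import math

def bellman_ford(V, w, s):
    """First version of Bellman-Ford (Algorithm 2.13): return dist(s, v)
    for every node v, or None if a negative cycle is reachable from s."""
    n = len(V)
    d = {(v, 0): math.inf for v in V}      # lines 1-2
    d[(s, 0)] = 0
    for i in range(n - 1):                 # main loop, i = 0, ..., n-2
        for v in V:
            d[(v, i + 1)] = d[(v, i)]      # line 5
        for (u, v), wt in w.items():       # lines 6-7
            d[(v, i + 1)] = min(d[(v, i + 1)], d[(u, i)] + wt)
    for (u, v), wt in w.items():           # lines 9-11: further improvement?
        if d[(u, n - 1)] + wt < d[(v, n - 1)]:
            return None                    # negative cycle reachable from s
    return {v: d[(v, n - 1)] for v in V}   # line 12

# The network of example (2.8), with assumed arc weights:
w = {("s", "x"): 1, ("x", "s"): 2, ("x", "y"): 2, ("x", "z"): -1, ("z", "y"): 1}
print(bellman_ford(["s", "x", "y", "z"], w, "s"))
```

Running this reproduces the last row of the table in (2.8), namely distances 0, 1, 1, 0 for 𝑠, 𝑥, 𝑦, 𝑧.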
Before we prove Theorem 2.14, we note that any prefix of a shortest walk is a
shortest walk to its last node.

Lemma 2.15. Consider two nodes 𝑠 and 𝑣 in a network, and suppose 𝑊 = 𝑢0 , 𝑢1 , . . . , 𝑢 𝑘 is
a shortest walk from 𝑠 to 𝑣, with 𝑢0 = 𝑠 and 𝑢 𝑘 = 𝑣. Then for 0 ≤ 𝑖 < 𝑘, the walk 𝑢0 , 𝑢1 , . . . , 𝑢𝑖
(called a prefix of 𝑊) is a shortest walk from 𝑠 to 𝑢𝑖 .

Proof. Suppose there was a shorter walk 𝑊 ′ from 𝑠 to 𝑢𝑖 than the prefix 𝑢0 , 𝑢1 , . . . , 𝑢𝑖
of 𝑊. Then 𝑊 ′ followed by 𝑢𝑖+1 , . . . , 𝑢 𝑘 is a shorter walk from 𝑠 to 𝑣 than 𝑊, which
contradicts the definition of 𝑊.

Proof of Theorem 2.14. This is proved by induction. It clearly holds for 𝑖 = 0


because the only walk from 𝑠 with zero arcs is from 𝑠 to 𝑠, so that 𝑑[𝑠, 0] = 0 and
𝑑[𝑣, 0] = ∞ for 𝑣 ≠ 𝑠.
Suppose that the assertion holds for some 𝑖 ≥ 0. We show that it holds for
𝑖 + 1 when line 8 is reached, and hence at the beginning of the next iteration of
the loop in line 4. Consider a shortest walk 𝑊 from 𝑠 to 𝑣 that has at most 𝑖 + 1
arcs (if such a walk exists; if not, then clearly 𝑑[𝑣, 𝑖 + 1] = 𝑑[𝑣, 𝑖 ] = ∞ as set in
line 5, and 𝑑[𝑢, 𝑖 ] = ∞ for all arcs (𝑢, 𝑣) that end in 𝑣, so 𝑑[𝑣, 𝑖 + 1] remains equal
to ∞ in line 7). Either 𝑊 has at most 𝑖 arcs or exactly 𝑖 + 1 arcs. In the former
case, 𝑤(𝑊) = 𝑑[𝑣, 𝑖 ] by induction hypothesis. In the latter case, 𝑊 is given by
some walk of 𝑖 arcs from 𝑠 to some node 𝑢, followed by an arc (𝑢, 𝑣), where the
walk from 𝑠 to 𝑢 has minimum weight 𝑑[𝑢, 𝑖 ] by Lemma 2.15. In either case,
𝑤(𝑊) = 𝑑[𝑣, 𝑖 + 1] = min{𝑑[𝑣, 𝑖 ], 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣)} as computed (via line 5) in
line 7, which was to be shown.

Theorem 2.14 presents one part of the correctness of Algorithm 2.13. A second
part is the correct detection of negative(-weight) cycles. We first consider an
example, which is the same network as in (2.8) with an additional arc (𝑦, 𝑠) of
weight −5, which creates two negative cycles, namely 𝑠, 𝑥, 𝑦, 𝑠 and 𝑠, 𝑥, 𝑧, 𝑦, 𝑠.

(The network of (2.8) with the additional arc (𝑦, 𝑠) of weight −5; diagram not
reproduced here.)

      𝑣             𝑠     𝑥     𝑦     𝑧
      𝑑[𝑣, 0]       0     ∞     ∞     ∞
      𝑑[𝑣, 1]       0     1     ∞     ∞
      𝑑[𝑣, 2]       0     1     3     0                    (2.9)
      𝑑[𝑣, 3]      −2     1     1     0
      neg. cycle?  −4          −1
In this network, any walk from 𝑠 to 𝑦 has two or more arcs, so by Theorem 2.14 the
rows 𝑑[𝑣, 0], 𝑑[𝑣, 1], 𝑑[𝑣, 2] are the same as in (2.8) and only 𝑑[𝑣, 3] has the different
entries −2, 1, 1, 0, when the main loop terminates. In the additional row in the
table in (2.9), there are two possible improvements of 𝑑[𝑣, 3], namely of 𝑑[𝑠, 3],
indicated by −4 , when the arc (𝑦, 𝑠) is considered as (𝑢, 𝑣) in line 10, or of 𝑑[𝑥, 3],
indicated by −1 , when the arc (𝑠, 𝑥) is considered. Whichever improvement is
discovered first (depending on the order of arcs (𝑢, 𝑣) in line 9), it leads to the
immediate stop of the algorithm in line 11. Both improvements reveal the existence
of a walk with four arcs that is shorter than the current shortest walk with at
most three arcs. For the first improvement, this four-arc walk is 𝑠, 𝑥, 𝑧, 𝑦, 𝑠, for the
second it is 𝑠, 𝑥, 𝑦, 𝑠, 𝑥.
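The first version of the algorithm, together with its negative-cycle test, can be sketched in Python; the function name and the representation of arcs as (𝑢, 𝑣, weight) triples are choices of this sketch, not notation from the guide. The arc lists below are those of the examples (2.8) and (2.9).

```python
from math import inf

def bellman_ford_table(nodes, arcs, s):
    """Sketch of Algorithm 2.13: one table row d[v, i] per iteration.

    Returns (distances, None), or (None, "negative cycle") if the test
    in lines 9-11 of the pseudocode finds a further improvement.
    """
    n = len(nodes)
    d = [{v: (0 if v == s else inf) for v in nodes}]  # row d[v, 0]
    for i in range(n - 1):
        row = dict(d[i])                  # line 5: d[v, i+1] <- d[v, i]
        for (u, v, w) in arcs:            # lines 6-7: relax every arc
            if d[i][u] + w < row[v]:
                row[v] = d[i][u] + w
        d.append(row)
    for (u, v, w) in arcs:                # lines 9-11: one more pass
        if d[n - 1][u] + w < d[n - 1][v]:
            return None, "negative cycle"
    return d[n - 1], None                 # line 12

# the network (2.8): arcs (s,x), (x,y), (x,z), (z,y)
arcs = [("s", "x", 1), ("x", "y", 2), ("x", "z", -1), ("z", "y", 1)]
print(bellman_ford_table(["s", "x", "y", "z"], arcs, "s")[0])
# {'s': 0, 'x': 1, 'y': 1, 'z': 0}

# adding the arc (y, s) of weight -5 as in (2.9) triggers the detection
print(bellman_ford_table(["s", "x", "y", "z"], arcs + [("y", "s", -5)], "s")[1])
# negative cycle
```

The computed distances 0, 1, 1, 0 agree with the last row 𝑑[𝑣, 3] of the table for (2.8), and the extended network of (2.9) is correctly reported as containing a negative cycle.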
2.7. Single-Source Shortest Paths: Bellman–Ford 31

Theorem 2.16. Consider a network (𝑉 , 𝐴, 𝑤) with a source node 𝑠. Then there is a negative
cycle that starts at some node which can be reached from 𝑠 if and only if Algorithm 2.13
stops in line 11.

Proof. The existence of such a cycle corresponds to the condition dist(𝑠, 𝑣) = −∞


by Theorem 2.9, for any node 𝑣 on the cycle. Suppose there is no such cycle. Then
the weight of a shortest walk from 𝑠 to 𝑣 for any node 𝑣 is that of a shortest path.
Because a path has at most |𝑉 | − 1 arcs, this weight is computed as 𝑑[𝑣, |𝑉 | − 1]
according to Theorem 2.14. Furthermore, by the same reasoning, no improvement
is possible in line 10 (that is, the condition of the if statement is never true), and
the algorithm terminates with the output of all distances dist(𝑠, 𝑣) in line 12, none
of which is −∞.
Conversely, consider any cycle 𝐶 = 𝑣0 , 𝑣1 , . . . , 𝑣 𝑘−1 , 𝑣0 that starts and ends at
some node 𝑣0 which is reachable from 𝑠, which implies 𝑑[𝑣 𝑖 ] < ∞ for 0 ≤ 𝑖 < 𝑘,
where we let 𝑑[𝑣] := 𝑑[𝑣, |𝑉 | − 1] for brevity. Suppose that there is no improvement
in line 10, that is, for all arcs (𝑢, 𝑣) we have

𝑑[𝑢] + 𝑤(𝑢, 𝑣) ≥ 𝑑[𝑣] . (2.10)

Applied to the arcs in the cycle, this gives

𝑑[𝑣0 ] + 𝑤(𝑣0 , 𝑣1 ) ≥ 𝑑[𝑣1 ]


𝑑[𝑣1 ] + 𝑤(𝑣1 , 𝑣2 ) ≥ 𝑑[𝑣2 ]
.. .. ..
. . . (2.11)
𝑑[𝑣 𝑘−2 ] + 𝑤(𝑣 𝑘−2 , 𝑣 𝑘−1 ) ≥ 𝑑[𝑣 𝑘−1 ]
𝑑[𝑣 𝑘−1 ] + 𝑤(𝑣 𝑘−1 , 𝑣0 ) ≥ 𝑑[𝑣0 ] .
Summation of these 𝑘 inequalities and subtracting the sum 𝑑[𝑣0 ] + 𝑑[𝑣1 ] + · · · + 𝑑[𝑣 𝑘−1 ] on both
sides gives 𝑤(𝐶) ≥ 0. That is, if the algorithm does not terminate in line 11, then all
cycles 𝐶 that can be reached from 𝑠 have nonnegative weight, as claimed. Note that
the condition only applies to reachable cycles 𝐶, because otherwise the inequalities
in (2.11) only imply ∞ + 𝑤(𝐶) ≥ ∞ which also holds if 𝑤(𝐶) < 0.
Theorems 2.14 and 2.16 demonstrate the correctness of Algorithm 2.13, that is,
it fulfills its stated input-output behaviour.
A small extension of the algorithm not only computes the distances from 𝑠
to 𝑣, but also a shortest path itself. Here we make use of Lemma 2.15, where it
suffices to know for each node 𝑣 only the last arc (𝑢, 𝑣) on such a shortest path.
That is, we only need to store the predecessor 𝑢 of 𝑣 on the shortest path from 𝑠
to 𝑣; by following these predecessors we eventually arrive back at 𝑠. Hence, the
network formed by the shortest paths is a tree with root 𝑠. A tree is a digraph with
a distinguished node, the root, from which there is a unique path to every other
node. It can be stored with a single
“predecessor” array which contains the predecessor pred[𝑣] on the path from the

root to 𝑣 (unless 𝑣 = 𝑠, in which case pred[𝑠] is not given, often written NIL). The
following is an example of such a shortest path tree in a network with six nodes,
with the corresponding pred array and the distances from 𝑠. The arcs of the tree
(which are also part of the original network) are indicated as dashed arrows.

(A network with six nodes and its shortest-path tree with root 𝑠; the tree arcs,
which are also arcs of the network, are drawn as dashed arrows in the original
diagram, which is not reproduced here.)

      𝑣             𝑠     𝑥    𝑦    𝑧    𝑎    𝑏
      dist(𝑠, 𝑣)    0     1    1    0    3    1            (2.12)
      pred[𝑣]      NIL    𝑠    𝑧    𝑥    𝑥    𝑧
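Recovering a root-to-node path from the pred array is a matter of following predecessors backwards and reversing; a minimal sketch, using the pred entries of (2.12) written as a Python dict with None playing the role of NIL:

```python
def path_from_root(pred, v):
    """Follow pred[] from v back to the root and reverse the result."""
    path = [v]
    while pred[v] is not None:  # None plays the role of NIL
        v = pred[v]
        path.append(v)
    return path[::-1]

# the pred array of the shortest-path tree in (2.12)
pred = {"s": None, "x": "s", "y": "z", "z": "x", "a": "x", "b": "z"}
print(path_from_root(pred, "b"))  # ['s', 'x', 'z', 'b']
```

The loop terminates because in a tree every node has a unique path back to the root 𝑠.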
The following algorithm is an extension of Algorithm 2.13 that also computes
the shortest-path predecessors.
Algorithm 2.17 (Bellman–Ford, first version, with shortest-path tree).
Input : network (𝑉 , 𝐴, 𝑤) and source (start node) 𝑠.
Output : dist(𝑠, 𝑣) for all nodes 𝑣 if no such distance is −∞, and predecessor pred[𝑣]
of 𝑣 on shortest path from 𝑠.

1. for all 𝑣 ∈ 𝑉 : 𝑑[𝑣, 0] ← +∞ ; pred[𝑣] ← NIL


2. 𝑑[𝑠, 0] ← 0
3. 𝑖 ← 0
4. while 𝑖 < |𝑉 | − 1 :
5. for all 𝑣 ∈ 𝑉 : 𝑑[𝑣, 𝑖 + 1] ← 𝑑[𝑣, 𝑖 ]
6. for all (𝑢, 𝑣) ∈ 𝐴 :
7. if 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣) < 𝑑[𝑣, 𝑖 + 1] :
7a. 𝑑[𝑣, 𝑖 + 1] ← 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣)
7b. pred[𝑣] ← 𝑢
8. 𝑖 ← 𝑖+1
9. for all (𝑢, 𝑣) ∈ 𝐴 :
10. if 𝑑[𝑢, |𝑉 | − 1] + 𝑤(𝑢, 𝑣) < 𝑑[𝑣, |𝑉 | − 1] :
11. print “Negative cycle!” and stop immediately
12. for all 𝑣 ∈ 𝑉 : dist(𝑠, 𝑣) ← 𝑑[𝑣, |𝑉 | − 1]

Line 1 of this algorithm not only initialises 𝑑[𝑣, 0] to ∞ but also pred[𝑣] to NIL,
for all nodes 𝑣. Line 2 then sets 𝑑[𝑠, 0] to 0.
Lines 7 and 7a represent the update of 𝑑[𝑣, 𝑖 + 1] in line 7 of Algorithm 2.13,
and line 7b the new assignment of the predecessor pred[𝑣] on the new shortest

walk to 𝑣 from 𝑠. This is the predecessor as computed by the algorithm. In general,
a shortest path is not necessarily unique. For example, in (2.12) a shortest path
from 𝑠 to 𝑧 could also go via node 𝑎. Algorithm 2.17 will only compute pred[𝑧] as 𝑥,
as shown in (2.12), irrespective of the order in which the arcs in 𝐴 are traversed in
line 6 (why? – hint: Theorem 2.14).
The following demonstrates the updating of the pred array in Algorithm 2.17
for our familiar example (2.8). Its main purpose is a notation to record the progress
of the algorithm with the additional information of the assignment pred[𝑣] ← 𝑢
in line 7b, by simply writing 𝑢 as a subscript next to the box which indicates the
update of 𝑑[𝑣, 𝑖 + 1] in line 7a. There are four such updates, and the most recent
one gives the final value of pred[𝑣] as shown in the last row of the table.

(The network of (2.8); diagram not reproduced here. Square brackets stand for the
boxes that mark the updates of 𝑑[𝑣, 𝑖 + 1] in line 7a, with the predecessor 𝑢 from
line 7b as a subscript.)

      𝑣           𝑠     𝑥      𝑦      𝑧
      𝑑[𝑣, 0]     0     ∞      ∞      ∞
      𝑑[𝑣, 1]     0    [1]𝑠    ∞      ∞
      𝑑[𝑣, 2]     0     1     [3]𝑥   [0]𝑥                  (2.13)
      𝑑[𝑣, 3]     0     1     [1]𝑧    0
      pred[𝑣]    NIL    𝑠      𝑧      𝑥

Our notation with updated values of 𝑑[𝑣, 𝑖 +1] shown by boxes with a subscript
𝑢 for the corresponding arc (𝑢, 𝑣) is completely ad-hoc. We chose it to document
the progress of the algorithm in a compact and unambiguous way.
A case that has not yet occurred is that 𝑑[𝑣, 𝑖 + 1] is updated more than once
for the same value of 𝑖, in case there are several arcs (𝑢, 𝑣) where this occurs,
depending on the order in which these arcs are traversed in line 6. An example is
(2.12) above for 𝑖 = 2 and 𝑣 = 𝑏, where the update of 𝑑[𝑏, 3] occurs from ∞ to 5
via the arc (𝑎, 𝑏), and then to 1 via the arc (𝑧, 𝑏). One may record these updates
of 𝑑[𝑏, 3] in the table as [5]𝑎 [1]𝑧 , or by only listing the last update [1]𝑧 (if (𝑧, 𝑏) is
considered before (𝑎, 𝑏), this is the only update).
Algorithm 2.13 is the first version of the Bellman–Ford algorithm. The progress
of the algorithm is nicely described by Theorem 2.14. Algorithm 2.18 is a second,
simpler version of the algorithm. Instead of storing the current distances from 𝑠
for walks that use at most 𝑖 arcs in a separate table row 𝑑[𝑣, 𝑖 ], the second version
of the algorithm uses just a single array with entries 𝑑[𝑣]. The new algorithm has
fewer instructions, which we have numbered with some line numbers omitted for
easier comparison with the first version in Algorithm 2.13.
The main difference between this algorithm and the first version is the update
rule in line 7. The first version compared 𝑑[𝑣, 𝑖 + 1] with 𝑑[𝑢, 𝑖 ] + 𝑤(𝑢, 𝑣) where

Algorithm 2.18 (Bellman–Ford, second version).

Input : network (𝑉 , 𝐴, 𝑤) and source (start node) 𝑠.


Output : dist(𝑠, 𝑣) for all nodes 𝑣 if no such distance is −∞, and predecessor pred[𝑣]
of 𝑣 on shortest path from 𝑠.

1. 𝑑[𝑠] ← 0
2. for all 𝑣 ∈ 𝑉 − {𝑠} : 𝑑[𝑣] ← ∞
4. repeat |𝑉 | − 1 times :
6. for all (𝑢, 𝑣) ∈ 𝐴 :
7. 𝑑[𝑣] ← min{ 𝑑[𝑣], 𝑑[𝑢] + 𝑤(𝑢, 𝑣) }
9. for all (𝑢, 𝑣) ∈ 𝐴 :
10. if 𝑑[𝑢] + 𝑤(𝑢, 𝑣) < 𝑑[𝑣] :
11. print “Negative cycle!” and stop immediately
12. for all 𝑣 ∈ 𝑉 : dist(𝑠, 𝑣) ← 𝑑[𝑣]

𝑑[𝑢, 𝑖 ] was always the value of the previous iteration, whereas the second version
compares 𝑑[𝑣] with 𝑑[𝑢] + 𝑤(𝑢, 𝑣) where 𝑑[𝑢] may already have improved in the
current iteration. The following simple example illustrates the difference.

(The network: the single path 𝑠, 𝑥, 𝑦, 𝑧 with all three arc weights equal to 1;
diagram not reproduced here.)

      𝑣           𝑠    𝑥    𝑦    𝑧
      𝑑[𝑣, 0]     0    ∞    ∞    ∞
      𝑑[𝑣, 1]     0    1    ∞    ∞                         (2.14)
      𝑑[𝑣, 2]     0    1    2    ∞
      𝑑[𝑣, 3]     0    1    2    3

The table in (2.14) shows the progress of the first version of the
algorithm. Suppose that in the second version in line 6, the arcs are considered in
the order (𝑠, 𝑥), (𝑥, 𝑦), and (𝑦, 𝑧). Then the assignments in the inner loop in line 7
are 𝑑[𝑥] ← 1, 𝑑[𝑦] ← 2, 𝑑[𝑧] ← 3, so the complete array is already found in the
first iteration of the main loop in lines 4–7, without any further improvements in
the second and third iteration of the main loop. However, if the order of arcs in
line 6 is (𝑦, 𝑧), (𝑥, 𝑦), and (𝑠, 𝑥), then the only update in the main loop in line 7 in the
first iteration is 𝑑[𝑥] ← 1, with 𝑑[𝑦] ← 2 in the second iteration, and 𝑑[𝑧] ← 3
in the last iteration. In general, the main loop does need |𝑉 | − 1 iterations for the
algorithm to work correctly, as asserted by the following theorem.

Theorem 2.19. In Algorithm 2.18, at the beginning of the 𝑖th iteration of the main loop
(lines 4–7), 1 ≤ 𝑖 ≤ |𝑉 | − 1, we have 𝑑[𝑣] ≤ 𝑤(𝑊) for any node 𝑣 and any 𝑠, 𝑣-walk 𝑊
that has at most 𝑖 − 1 arcs. Moreover, if 𝑑[𝑣] < ∞, then there is some 𝑠, 𝑣-walk of weight

𝑑[𝑣]. If the algorithm terminates without stopping in line 11, then 𝑑[𝑣] = dist(𝑠, 𝑣) as
claimed.

Proof. The algorithm performs at least the updates of Algorithm 2.13 (possibly
more quickly), which shows that 𝑑[𝑣] ≤ 𝑤(𝑊) for any 𝑠, 𝑣-walk 𝑊 with at most
𝑖 − 1 arcs, as in Theorem 2.14. If 𝑑[𝑣] < ∞, then 𝑑[𝑣] = 𝑤(𝑊 ′) for some 𝑠, 𝑣-walk 𝑊 ′
because of the way 𝑑[𝑣] is computed in line 7. Furthermore, dist(𝑠, 𝑣) = 𝑑[𝑣] ≠ −∞
if and only if 𝑑[𝑢] + 𝑤(𝑢, 𝑣) ≥ 𝑑[𝑣] for all arcs (𝑢, 𝑣), as proved for Theorem 2.16.
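The dependence on the arc order in line 6, discussed for example (2.14), can be observed with a sketch of the second version that also counts in how many iterations of the main loop at least one value 𝑑[𝑣] changes (the function name and the counter are additions of this sketch):

```python
from math import inf

def bellman_ford2(nodes, arcs, s):
    """Sketch of Algorithm 2.18: a single array d[v], arcs relaxed in
    the given order; also counts the iterations with at least one update."""
    d = {v: (0 if v == s else inf) for v in nodes}
    busy_rounds = 0
    for _ in range(len(nodes) - 1):
        changed = False
        for (u, v, w) in arcs:
            if d[u] + w < d[v]:      # line 7, with possibly-updated d[u]
                d[v] = d[u] + w
                changed = True
        busy_rounds += changed
    for (u, v, w) in arcs:           # lines 9-11
        if d[u] + w < d[v]:
            raise ValueError("negative cycle")
    return d, busy_rounds

nodes = ["s", "x", "y", "z"]
forward  = [("s", "x", 1), ("x", "y", 1), ("y", "z", 1)]   # as in (2.14)
backward = [("y", "z", 1), ("x", "y", 1), ("s", "x", 1)]
print(bellman_ford2(nodes, forward, "s"))   # ({'s': 0, 'x': 1, 'y': 2, 'z': 3}, 1)
print(bellman_ford2(nodes, backward, "s"))  # ({'s': 0, 'x': 1, 'y': 2, 'z': 3}, 3)
```

With the forward arc order, all updates ripple through in the first iteration of the main loop; with the backward order, each iteration produces only one new distance, so all |𝑉| − 1 iterations are needed, exactly as described in the text.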

2.8 O-Notation and Running-Time Analysis

In this section, we define the 𝑂-notation used to describe the running time of
algorithms, and apply it to the analysis of the Bellman–Ford algorithm.
The time needed to execute an algorithm depends on the size of its input, and
on the machine that performs the instructions of the algorithm. The size of the
input can be very accurately measured in terms of the number of bits (binary digits)
to represent the input. If the input is a network, then the input size is normally
measured more coarsely by the number of nodes and arcs, assuming that each
piece of associated information (such as the endpoints of an arc, and the weight of
an arc) can be stored in some fixed number of bits (which is realistic in practice).
The execution time of an instruction depends on the computer, and on the
way that the instruction is represented in terms of more primitive instructions,
for example how an assignment translates to the evaluation of the right-hand side
of the assignment and to storing the computed value of the assigned variable in
memory. Because computing technology is constantly improving, it is normally
assumed that a basic instruction, such as an assignment or a test of a condition
like 𝑥 < 𝑚, takes a certain constant amount of time, without specifying what that
constant is.
The functions to measure running times take nonnegative values. Let
R≥ = { 𝑥 ∈ R | 𝑥 ≥ 0}. (2.15)
Suppose 𝑓 : N → R≥ is a function where 𝑓 (𝑛) measures, say, the number of
microseconds needed to run the Bellman–Ford Algorithm 2.13 for a network with
𝑛 nodes on a specific computer. Changing the computer, or changing microseconds
to nanoseconds, would result in changing 𝑓 (𝑛) by a constant factor. It makes sense
to specify running times “up to a constant factor” as a function of the input size.
The 𝑂-notation, or “order of” notation, is designed to capture this, as well as the
asymptotic behaviour of a function (that is, of 𝑓 (𝑛) for sufficiently large 𝑛).
Definition 2.20. Consider two functions 𝑓 , 𝑔 : N → R≥ . Then we say 𝑓 (𝑛) ∈
𝑂(𝑔(𝑛)), or 𝑓 (𝑛) is of order 𝑔(𝑛), if there is some 𝐶 > 0 and 𝐾 ∈ N so that
𝑓 (𝑛) ≤ 𝐶 · 𝑔(𝑛) for all 𝑛 ≥ 𝐾.

As an example, 1000 + 10𝑛 + 3𝑛 2 ∈ 𝑂(𝑛 2 ), for the following reason: choose


𝐾 = 10. Then 𝑛 ≥ 𝐾 implies 1000 ≤ 10𝑛 2 , 10𝑛 ≤ 𝑛 2 , and thus 1000 + 10𝑛 + 3𝑛 2 ≤
(10 + 1 + 3)𝑛 2 so that the claim is true for 𝐶 = 14. If we are interested in a
smaller constant 𝐶 when 𝑛 is large, we can choose 𝐾 = 100 and 𝐶 = 3.2. It is
clear that asymptotically (for large 𝑛) the quadratic term dominates the growth
of the function, which is captured by the notation 𝑂(𝑛 2 ). As a running time, this
particular function may, for example, represent 1000 time units to load the program,
10𝑛 time units for initialisation, and 3𝑛 2 time units to run the main algorithm.
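The two choices of constants for 1000 + 10𝑛 + 3𝑛 2 ∈ 𝑂(𝑛 2 ) can be checked mechanically, over finite ranges only, of course; the claim for all 𝑛 ≥ 𝐾 is the short calculation above:

```python
def f(n):
    return 1000 + 10 * n + 3 * n * n

# K = 10 and C = 14 work: f(n) <= 14 n^2 for all n >= 10
assert all(f(n) <= 14 * n * n for n in range(10, 5000))

# for n >= K = 100, the smaller constant C = 3.2 suffices
assert all(f(n) <= 3.2 * n * n for n in range(100, 5000))

# ... but C = 3.2 fails below that threshold, e.g. at n = 10
assert f(10) > 3.2 * 10 * 10
```

Note that at 𝑛 = 100 the second bound is tight: 𝑓 (100) = 32000 = 3.2 · 100 2 .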
The notation 𝑓 (𝑛) ∈ 𝑂(𝑔(𝑛)) is very commonly written as “ 𝑓 (𝑛) = 𝑂(𝑔(𝑛))”.
However, this is imprecise, because 𝑓 (𝑛) represents a function of 𝑛, whereas
𝑂(𝑔(𝑛)) is a set of functions, as stated in Definition 2.20.
For brevity, we say that an algorithm has “running time 𝑂(𝑔(𝑛))” rather than
the more accurate “a running time in 𝑂(𝑔(𝑛))”.
In many ways, 𝑂-notation is a shorthand, for example by allowing us to omit
constants: for a positive constant 𝑐, it is immediate from Definition 2.20 that

𝑂(𝑐 · 𝑔(𝑛)) = 𝑂(𝑔(𝑛)) . (2.16)

In addition, 𝑂-notation allows us to describe upper bounds of asymptotic growth in a


convenient way. We have for functions 𝑓 , 𝑔, ℎ : N → R≥

𝑓 (𝑛) ∈ 𝑂(𝑔(𝑛)), 𝑔(𝑛) ∈ 𝑂(ℎ(𝑛)) ⇒ 𝑓 (𝑛) ∈ 𝑂(ℎ(𝑛)) (2.17)

because if there are 𝐶, 𝐷 > 0 with 𝑓 (𝑛) ≤ 𝐶 · 𝑔(𝑛) for 𝑛 ≥ 𝐾 and 𝑔(𝑛) ≤ 𝐷 · ℎ(𝑛) for
𝑛 ≥ 𝐿, then 𝑓 (𝑛) ≤ 𝐶 · 𝐷 · ℎ(𝑛) for 𝑛 ≥ max{𝐾, 𝐿}. Note that (2.17) is equivalent to
the statement
𝑔(𝑛) ∈ 𝑂(ℎ(𝑛)) ⇔ 𝑂(𝑔(𝑛)) ⊆ 𝑂(ℎ(𝑛)) (2.18)
for the following reason: Suppose 𝑔(𝑛) ∈ 𝑂(ℎ(𝑛)). Then (2.17) says that any
function 𝑓 (𝑛) in 𝑂(𝑔(𝑛)) is also in 𝑂(ℎ(𝑛)), which shows 𝑂(𝑔(𝑛)) ⊆ 𝑂(ℎ(𝑛))
and thus “⇒” in (2.18). Conversely, if 𝑂(𝑔(𝑛)) ⊆ 𝑂(ℎ(𝑛)), then we have clearly
𝑔(𝑛) ∈ 𝑂(𝑔(𝑛)) and thus 𝑔(𝑛) ∈ 𝑂(ℎ(𝑛)), which shows “⇐” in (2.18).
What is 𝑂(1)? This is the set of functions 𝑓 (𝑛) that fulfill 𝑓 (𝑛) ≤ 𝐶 for all
𝑛 ≥ 𝐾, for some constants 𝐶 and 𝐾. Because the finitely many numbers 𝑛 with
𝑛 < 𝐾 are bounded, we can if necessary increase 𝐶 to obtain that 𝑓 (𝑛) ≤ 𝐶 for all
𝑛 ∈ N. In other words, 𝑂(1) is the set of functions that are bounded by a constant.
In addition to (2.17), the following rules are useful and easy to prove:

𝑓 (𝑛) ∈ 𝑂(𝑔(𝑛)) ⇒ 𝑓 (𝑛) + 𝑔(𝑛) ∈ 𝑂(𝑔(𝑛)) (2.19)

which shows that a sum of two functions can be “absorbed” into the function with
higher growth rate. With the definition

𝑂(𝑔(𝑛)) + 𝑂(ℎ(𝑛)) = { 𝑓 (𝑛) + 𝑓 ′(𝑛) | 𝑓 (𝑛) ∈ 𝑂(𝑔(𝑛)), 𝑓 ′(𝑛) ∈ 𝑂(ℎ(𝑛)) } (2.20)



one can similarly see

𝑂(𝑔(𝑛)) + 𝑂(ℎ(𝑛)) = 𝑂(𝑔(𝑛) + ℎ(𝑛)). (2.21)

In addition,
𝑓 (𝑛) ∈ 𝑂(𝑔(𝑛)) ⇒ 𝑛 · 𝑓 (𝑛) ∈ 𝑂(𝑛 · 𝑔(𝑛)) . (2.22)

We now apply this notation to analyse the running time of the Bellman–Ford
algorithm, where we consider Algorithm 2.17 because it is slightly more detailed
than Algorithm 2.13. Suppose the input to the algorithm is a network (𝑉 , 𝐴, 𝑤)
with 𝑛 = |𝑉 | and 𝑚 = |𝐴|. Line 1 takes running time 𝑂(𝑛) because in that line all
nodes are considered, each with two assignments that take constant time. Lines 2
and 3 take constant time 𝑂(1). The main loop in lines 4–8 is executed 𝑛 − 1 times.
Testing the condition 𝑖 < |𝑉 | − 1 in line 4 takes time 𝑂(1). Line 5 takes time 𝑂(𝑛).
The “inner loop” in lines 6–7b takes time 𝑂(𝑚) because the evaluation of the if
condition in line 7 and the assignments in lines 7a–7b take constant time (which is
shorter when they are not executed because the if condition is false, but bounded
in either case). Line 8 takes time 𝑂(1). So the time to perform one iteration of
the main loop in lines 4–8 is 𝑂(1) + 𝑂(𝑛) + 𝑂(𝑚) + 𝑂(1), which by (2.19) we can
shorten to 𝑂(𝑛 + 𝑚) because we can assume 𝑛 > 0. The main loop is performed
𝑛 − 1 times, where in view of the constants this can be simplified to multiplication
with 𝑛, that is, it takes together time 𝑛 · 𝑂(𝑛 + 𝑚) = 𝑂(𝑛 2 + 𝑛𝑚). The test for
negative cycles in lines 9–11 takes time 𝑂(𝑚), and the final assignment of distances
in line 12 time 𝑂(𝑛). So the overall running time from lines 1–3, 4–8, 9–11, 12
is 𝑂(𝑛) + 𝑂(𝑛 2 + 𝑛𝑚) + 𝑂(𝑚) + 𝑂(𝑛) where the second term absorbs the others
according to (2.19). So the overall running time is 𝑂(𝑛 2 + 𝑛𝑚).
The number 𝑚 of arcs of a digraph with 𝑛 nodes is at most 𝑛 · (𝑛 − 1), that is,
𝑚 ∈ 𝑂(𝑛 2 ), so that 𝑂(𝑛 2 + 𝑛𝑚) ⊆ 𝑂(𝑛 2 + 𝑛 3 ) = 𝑂(𝑛 3 ). That is, for a network with
𝑛 nodes, the running time of the Bellman–Ford algorithm is 𝑂(𝑛 3 ). (It is therefore also
called a cubic algorithm.)
The above analysis shows a more accurate running time of 𝑂(𝑛 2 + 𝑛𝑚) that
depends on the number 𝑚 of arcs in the network. The algorithm works for any
number of arcs (even if 𝑚 = 0). Normally the number of arcs is at least 𝑛 − 1
because otherwise some nodes cannot be reached from the source node 𝑠 (this
can be seen by induction on 𝑛 by adding nodes one at a time to the network,
starting with 𝑠: every new node 𝑣 requires at least one new arc (𝑢, 𝑣) in order to
be reachable from the nodes 𝑢 that are currently reachable from 𝑠). In that case
𝑛 ∈ 𝑂(𝑚) and thus 𝑂(𝑛 2 + 𝑛𝑚) = 𝑂(𝑛𝑚), so that the running time is 𝑂(𝑛𝑚). (We
have to be careful here: when we say the digraph has at least 𝑛 − 1 arcs, we cannot
write this as 𝑚 ∈ 𝑂(𝑛), because this would mean an upper bound on 𝑚; the correct
way to say this is 𝑛 ∈ 𝑂(𝑚), which translates to 𝑛 ≤ 𝐶𝑚 and thus to 𝑚 ≥ 𝑛/𝐶,
meaning that the number of arcs is at least proportional to the number of nodes.

An upper bound for 𝑚 is given by 𝑚 ∈ 𝑂(𝑛 2 ).) In short, for a network with 𝑛 nodes
and 𝑚 arcs, where 𝑛 ∈ 𝑂(𝑚), the running time of the Bellman–Ford algorithm is
𝑂(𝑛𝑚).
The second version of the Bellman–Ford algorithm has the same running time
𝑂(𝑛 3 ). Algorithm 2.18 is faster than the first version but, in the worst case, only
by a constant factor, because the main loop in lines 4–7 is still performed 𝑛 − 1
times, and the algorithm would in general be incorrect with fewer iterations, as
the example in (2.14) shows (which can be generalised to an arbitrary path).

2.9 Single-Source Shortest Paths: Dijkstra’s Algorithm

Dijkstra’s Algorithm 2.21, shown below, is a single-source shortest path algorithm


that works in networks where all weights are nonnegative. We use the example
in Figure 2.1 to explain the algorithm, with a suitable table that demonstrates its
progress.

Algorithm 2.21 (Dijkstra’s algorithm for single-source shortest paths).

Input : network (𝑉 , 𝐴, 𝑤), 𝑤(𝑢, 𝑣) ≥ 0 for all (𝑢, 𝑣) ∈ 𝐴, source 𝑠.


Output : dist(𝑠, 𝑣) for all nodes 𝑣

1. for all 𝑣 ∈ 𝑉 : 𝑑[𝑣] ← ∞ ; colour[𝑣] ← black


2. 𝑑[𝑠] ← 0 ; colour[𝑠] ← grey
3. while there are grey nodes :
4. 𝑢 ← grey node with smallest 𝑑[𝑢]
5. colour[𝑢] ← white
6. for all 𝑣 ∈ 𝑉 so that colour[𝑣] ≠ white and (𝑢, 𝑣) ∈ 𝐴 :
7. colour[𝑣] ← grey
8. 𝑑[𝑣] ← min{ 𝑑[𝑣], 𝑑[𝑢] + 𝑤(𝑢, 𝑣) }
9. for all 𝑣 ∈ 𝑉 : dist(𝑠, 𝑣) ← 𝑑[𝑣]
The algorithm uses an array that defines for each node 𝑣 an array entry colour[𝑣]
which is either black, grey, or white (so this “colour” may internally be stored by
one of three distinct integers such as 0, 1, 2). In the course of the computation,
each node will change its colour from black to grey to white, unless there is no path
from the source 𝑠 to the node, in which case the node will stay black. In addition,
the array entries 𝑑[𝑣] represent preliminary distances from 𝑠, according to the
following theorem that we prove later and which will be used to show that the
algorithm is correct.
Theorem 2.22. In Algorithm 2.21, at the beginning of each iteration of the main loop (lines
3–8), 𝑑[𝑣] is the smallest weight of a path 𝑢0 , 𝑢1 , . . . , 𝑢 𝑘 from 𝑠 to 𝑣 (so 𝑢0 , 𝑢 𝑘 = 𝑠, 𝑣) that
with the exception of 𝑣 consists exclusively of white nodes, that is, colour[𝑢𝑖 ] = white for
0 ≤ 𝑖 < 𝑘. When 𝑢 is made white in line 5, we have 𝑑[𝑢] = dist(𝑠, 𝑢).

(Network with nodes 𝑠, 𝑎, 𝑏, 𝑐, 𝑥, 𝑦; the diagram is not reproduced here. Square
brackets stand for the boxes that mark updates of 𝑑[𝑣].)

      𝑢    𝑠     𝑎       𝑏       𝑐       𝑥       𝑦
           0𝐺    ∞𝐵      ∞𝐵      ∞𝐵      ∞𝐵      ∞𝐵
      𝑠    0𝑊   [1]𝑠𝐺   [4]𝑠𝐺    ∞       ∞       ∞
      𝑎    0     1𝑊     [3]𝑎𝐺   [6]𝑎𝐺    ∞       ∞
      𝑏    0     1       3𝑊     [5]𝑏𝐺   [4]𝑏𝐺    ∞
      𝑥    0     1       3      [4]𝑥𝐺    4𝑊     [5]𝑥𝐺
      𝑐    0     1       3       4𝑊      4       5𝐺
      𝑦    0     1       3       4       4       5𝑊

      dist(𝑠, 𝑢)    0    1    3    4    4    5
      pred[𝑣]      NIL   𝑠    𝑎    𝑥    𝑏    𝑥

Figure 2.1 Example for Dijkstra’s Algorithm 2.21.

In line 1, all nodes 𝑣 are initially black with 𝑑[𝑣] ← ∞. In line 2, the source
𝑠 becomes grey and 𝑑[𝑠] ← 0. This is also shown as the first row in the table in
Figure 2.1, where we use superscripts 𝐵, 𝐺, 𝑊 for a newly assigned colour black,
grey, or white. Because grey nodes are of special interest, we will indicate their
colour all the time, even if it has not been updated in that particular iteration.
The main loop in lines 3–8 operates as long as the set of grey nodes is not empty,
in which case it selects in line 4 a particular grey node 𝑢 with smallest value 𝑑[𝑢].
Because of the initialisation in line 2, the only grey node is 𝑠, which is therefore
chosen in the first iteration. Each row in the table in Figure 2.1 represents one
iteration of the main loop, where the node 𝑢 that is chosen in line 4 is displayed
on the left of that row. The row entries are the values 𝑑[𝑣] for all nodes 𝑣, as they
become updated or stay unchanged in that iteration. The chosen node 𝑢 changes
its colour to white in line 5, indicated by the superscript 𝑊 in the table, where that
node is also underlined.
Lines 6–8 are a second inner loop of the algorithm. It traverses all non-white
neighbours 𝑣 of 𝑢, that is, all nodes 𝑣 so that colour[𝑣] is not white and so that
(𝑢, 𝑣) is an arc. All these non-white neighbours 𝑣 of 𝑢 are set to grey in line 7
(indicated by a superscript 𝐺), and their distance 𝑑[𝑣] is updated to 𝑑[𝑢] + 𝑤(𝑢, 𝑣)
in case this is smaller than the previous value of 𝑑[𝑣] (which happens always if

𝑣 is black and therefore 𝑑[𝑣] = ∞). If such an update happens, this means that
there is an all-white path from 𝑠 to 𝑢 followed by an arc (𝑢, 𝑣) that connects 𝑢
to the grey node 𝑣, and in that case we can also set pred[𝑣] ← 𝑢 to indicate that
𝑢 is the predecessor of 𝑣 on the current path from 𝑠 to 𝑣 (we have omitted that
update of pred[𝑣] to keep Algorithm 2.21 short; it is the same as in lines 7–7b of
Algorithm 2.17). As in example (2.13) for the Bellman–Ford algorithm, the update
of pred[𝑣] with 𝑢 is shown with the subscript 𝑢, and the update of 𝑑[𝑣] is shown
by surrounding the new value with a box.
In the first iteration in Figure 2.1 where 𝑢 = 𝑠, the updated neighbours 𝑣 of
𝑢 are 𝑎 and 𝑏. These are also the grey nodes in the next iteration of the main
loop, where node 𝑎 is selected because 𝑑[𝑎] < 𝑑[𝑏], and 𝑎 is made white. The two
neighbours of 𝑎 are 𝑏 and 𝑐. Both are non-white and become grey (𝑏 is already
grey). The value of 𝑑[𝑏] is updated from 4 to 𝑑[𝑎] + 𝑤(𝑎, 𝑏) = 3, with pred[𝑏] ← 𝑎.
The value of 𝑑[𝑐] is updated from ∞ to 𝑑[𝑎] + 𝑤(𝑎, 𝑐) = 6, with pred[𝑐] ← 𝑎. The
current row shows that the grey nodes are 𝑏 and 𝑐, where 𝑑[𝑏] < 𝑑[𝑐].
In the next iteration therefore 𝑢 is chosen to be 𝑏, which gives the next row
of the table. The neighbours of 𝑏 are 𝑎, 𝑐, 𝑥. Here 𝑎 is white and is ignored, 𝑐 is
non-white and gets the update 𝑑[𝑐] ← 𝑑[𝑏] + 𝑤(𝑏, 𝑐) = 5 because this is smaller
than the current value 6, and pred[𝑐] ← 𝑏. Node 𝑥 changes colour from black to
grey and 𝑑[𝑥] ← 𝑑[𝑏] + 𝑤(𝑏, 𝑥) = 4. In the next iteration, 𝑥 is the grey node 𝑢 with
smallest 𝑑[𝑢], creating updates for 𝑐 and 𝑦. The next and penultimate iteration
chooses 𝑐 among two remaining grey nodes, where the neighbour 𝑦 of 𝑐 creates no
update (other than being set to grey, which is already its colour). The final iteration
chooses 𝑦. Because all nodes are now white, the algorithm terminates with the
output of distances in line 9, as shown in the table in Figure 2.1 in addition to the
predecessors in the shortest-path tree with root 𝑠.
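Dijkstra's algorithm with the three-colour scheme can be sketched in Python. The arc list below is an assumption of this sketch, chosen to be consistent with the updates recorded in the table of Figure 2.1 (the guide's diagram is authoritative):

```python
from math import inf

def dijkstra(nodes, arcs, s):
    """Sketch of Algorithm 2.21 with black/grey/white colours and
    linear scans for the minimum (no priority queue)."""
    adj = {v: [] for v in nodes}
    for (u, v, w) in arcs:
        adj[u].append((v, w))
    d = {v: inf for v in nodes}
    colour = {v: "black" for v in nodes}
    d[s], colour[s] = 0, "grey"
    while any(c == "grey" for c in colour.values()):
        # line 4: grey node u with smallest d[u]
        u = min((v for v in nodes if colour[v] == "grey"), key=d.get)
        colour[u] = "white"               # line 5
        for (v, w) in adj[u]:             # lines 6-8
            if colour[v] != "white":
                colour[v] = "grey"
                d[v] = min(d[v], d[u] + w)
    return d

arcs = [("s", "a", 1), ("s", "b", 4), ("a", "b", 2), ("a", "c", 5),
        ("b", "c", 2), ("b", "x", 1), ("x", "c", 0), ("x", "y", 1),
        ("c", "y", 3)]
print(dijkstra(["s", "a", "b", "c", "x", "y"], arcs, "s"))
# {'s': 0, 'a': 1, 'b': 3, 'c': 4, 'x': 4, 'y': 5}
```

The computed distances 0, 1, 3, 4, 4, 5 match the dist(𝑠, 𝑢) row in Figure 2.1, and the order in which nodes turn white (𝑠, 𝑎, 𝑏, 𝑥, 𝑐, 𝑦) matches the rows of the table.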
Proof of Theorem 2.22. First, we note that because all weights are nonnegative, the
shortest walk from 𝑠 to any node 𝑣 can always be chosen as a path by Proposition 2.5
because any tour that the walk contains can be removed and is of nonnegative
weight, which will not increase the weight of the walk.
We prove the theorem by induction. Before the main loop in lines 3–8 is
executed for the first time, there are no white nodes. Hence, the only path from
𝑠 where all but the last node are white is a path with no arcs that consists of the
single node 𝑠, and its weight is zero, where 𝑑[𝑠] = 0 as claimed. Furthermore, this
is the only (and shortest) path from 𝑠 to 𝑠, so dist(𝑠, 𝑠) = 𝑑[𝑠] = 0.
Suppose now that at the beginning of the main loop the condition is true for
any set of white nodes. If there are no grey nodes, then the main loop will no
longer be performed and the algorithm proceeds to line 9. If there are grey nodes,
then the main loop will be executed, and we will show that the condition holds
again afterwards. Let 𝑢 be a node that is chosen in line 4, which is made white

in line 5. We prove, as claimed in the theorem, that just before this assignment
we have 𝑑[𝑢] = dist(𝑠, 𝑢). This has already been shown when 𝑢 = 𝑠. There is a
path from 𝑠 to 𝑢 with weight 𝑑[𝑢], namely the assumed path (by the induction
hypothesis) where all nodes except 𝑢 are white. Consider any shortest path 𝑃 from
𝑠 to 𝑢; we will show 𝑑[𝑢] ≤ 𝑤(𝑃) which implies 𝑑[𝑢] = 𝑤(𝑃) = dist(𝑠, 𝑢). Let 𝑦 be
the first node on the path 𝑃 which is not white. Let 𝑃 ′ be the prefix of 𝑃 given by
the path from 𝑠 to 𝑦, which is a shortest path from 𝑠 to 𝑦 by Lemma 2.15. Moreover,
𝑦 is grey: it is not white by its choice, and it cannot be black, because the previous
node 𝑥 on 𝑃 before 𝑦 is white, and after 𝑥 has been made white in line 5, all its
black neighbours (such as 𝑦) are made grey in line 7. So 𝑃 ′ is a shortest path from
𝑠 to 𝑦 and certainly a shortest path among those where all but the last node are
white, so by the induction hypothesis, 𝑑[𝑦] = 𝑤(𝑃 ′). By the choice of 𝑢 in line 4 we
have 𝑑[𝑢] ≤ 𝑑[𝑦] = 𝑤(𝑃 ′) ≤ 𝑤(𝑃), where the latter inequality holds because all
weights are nonnegative. That is, 𝑑[𝑢] ≤ 𝑤(𝑃) as claimed.
We now show that updating the non-white neighbours 𝑣 of 𝑢 in lines 7–8 will
complete the induction step, that is, any shortest path 𝑃 from 𝑠 to 𝑣 where all nodes
but 𝑣 are white has weight 𝑑[𝑣]. If the last arc of such a shortest path is not (𝑢, 𝑣),
then this is true by the induction hypothesis. If the last arc of 𝑃 is (𝑢, 𝑣), then 𝑃
without its last node 𝑣 defines a shortest path from 𝑠 to 𝑢 (where all nodes are white),
where we just proved 𝑑[𝑢] = dist(𝑠, 𝑢), and hence 𝑤(𝑃) = 𝑑[𝑢] + 𝑤(𝑢, 𝑣) = 𝑑[𝑣]
because that is how 𝑑[𝑣] has been updated in line 8. This completes the induction.

Corollary 2.23. Dijkstra’s Algorithm 2.21 works correctly.

Proof. When the algorithm terminates, every node is either white or black. As
shown in the preceding proof, at the end of each iteration of the main loop there
are no arcs (𝑥, 𝑦) where 𝑥 is white and 𝑦 is black. Hence, the white nodes are exactly
the nodes 𝑢 that can be reached from 𝑠 by a path, with dist(𝑠, 𝑢) = 𝑑[𝑢] < ∞ by
Theorem 2.22. The black nodes 𝑣 are those that cannot be reached from 𝑠, where
dist(𝑠, 𝑣) = ∞ = 𝑑[𝑣] as set at initialisation in line 1.

In Dijkstra’s algorithm, a grey node 𝑢 with minimal 𝑑[𝑢] already has its final
distance dist(𝑠, 𝑢) given by 𝑑[𝑢], so that 𝑢 can be made white. There can be no shorter
“detour” to reach 𝑢 via nodes that at that time are grey or black, because the first
grey node 𝑦 on such a path from 𝑠 to 𝑢 would fulfill 𝑑[𝑢] ≤ 𝑑[𝑦] (see the proof of
Theorem 2.22), and the remaining part of that path from 𝑦 to 𝑢 has nonnegative
weight by assumption. This argument fails for negative weights. In the following
network,
(Figure: nodes 𝑠, 𝑦, 𝑢 with arcs (𝑠, 𝑢) of weight 1, (𝑠, 𝑦) of weight 4, and (𝑦, 𝑢) of
weight −5.)

the next node made white after 𝑠 is 𝑢 and is recorded with distance 1, and after
that node 𝑦 with distance 4. However, the path 𝑠, 𝑦, 𝑢 has weight −1 which is less
than the computed weight 1 of the path 𝑠, 𝑢. So the output of Dijkstra’s algorithm
is incorrect, here because of the negative weight of the arc (𝑦, 𝑢). It may happen
that the output of Dijkstra’s algorithm is correct (as in the preceding example if
𝑤(𝑠, 𝑢) = 5), but in general this is not guaranteed.
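The failure on the three-node network above can be reproduced with a minimal inline version of the freezing idea (this is a stripped-down sketch for this example only, not Algorithm 2.21 verbatim):

```python
from math import inf

# arcs of the example: w(s,u) = 1, w(s,y) = 4, w(y,u) = -5
w = {("s", "u"): 1, ("s", "y"): 4, ("y", "u"): -5}
d = {"s": 0, "u": inf, "y": inf}
white = set()

while len(white) < len(d):
    # freeze the non-white node with smallest tentative distance,
    # as Dijkstra's algorithm does when it makes a node white
    x = min((v for v in d if v not in white), key=d.get)
    white.add(x)
    for (a, b), wt in w.items():  # relax arcs out of x into non-white nodes
        if a == x and b not in white:
            d[b] = min(d[b], d[x] + wt)

print(d["u"])  # 1, although dist(s, u) = -1 via the path s, y, u
```

Node 𝑢 is frozen with value 1 before the arc (𝑦, 𝑢) of weight −5 is ever considered, so the true distance −1 is missed.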
We now analyse the running time of Dijkstra’s algorithm. Let 𝑛 = |𝑉 | and
𝑚 = |𝐴|. The initialisation in lines 1–2 takes time 𝑂(𝑛), and so does the final
output (if 𝑑[𝑣] is not taken directly as the output) in line 9. In each iteration of the
main loop in lines 3–8, exactly one node becomes (and stays) white. Hence, the
loop is performed 𝑛 times, assuming (which in general is the case) that all nodes
are eventually white, that is, are reachable by a path from the source node 𝑠. We
assume that the colour of a node 𝑣 is represented by the array entry colour[𝑣], and
that nodes themselves are just represented by the numbers 1, . . . , 𝑛. By iterating
through the colour array, identifying the grey nodes in line 3 and finding the node 𝑢
with minimal 𝑑[𝑢] in line 4 takes time 𝑂(𝑛). (Even if the number of grey nodes were
somehow represented in an array of shrinking size, it is possible that they are at
least a constant fraction, if not all, of the nodes that are not white, and their number
is initially 𝑛, then 𝑛 − 1, and so on, so that the number of nodes checked in line 4
counted over the 𝑛 iterations of the main loop is 𝑛 + (𝑛 − 1) + · · · + 2 + 1 = 𝑂(𝑛 2 ),
which is the same as in our current analysis.)
The inner loop in lines 6–8 of Algorithm 2.21 iterates through the nodes 𝑣,
and checks if they are not white and if they are neighbours of 𝑢, that is, (𝑢, 𝑣) ∈ 𝐴.
The time this takes depends on the representation of the digraph (𝑉 , 𝐴). If the
neighbours of every node 𝑢 are stored in an adjacency list as in (2.7), then this is
as efficient as possible, that is, over the entire iterations of the main loop each arc
(𝑢, 𝑣) is considered exactly once, namely after 𝑢 has just become white. So over
all iterations of the main loop the steps in lines 6–8 take time 𝑂(𝑚). Alternatively,
the digraph may be represented by an adjacency table which has Boolean entries
𝑎[𝑢, 𝑣] which have value true if and only if (𝑢, 𝑣) ∈ 𝐴, otherwise false. In that case,
all nodes 𝑣 have to be checked in line 6, which takes time 𝑂(𝑛) for each iteration of
the main loop.
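The two representations can be set up in a few lines (the four-node digraph below is a hypothetical example, not one from the text); scanning the neighbours of 𝑢 costs out-degree many steps with the list, but always 𝑛 steps with the table:

```python
n = 4                                   # a small hypothetical digraph
arcs = [(0, 1), (0, 2), (1, 3), (2, 3)]

# adjacency lists: the neighbours of u are exactly adj[u]
adj = [[] for _ in range(n)]
for (u, v) in arcs:
    adj[u].append(v)

# adjacency table: a[u][v] is True if and only if (u, v) is an arc
a = [[False] * n for _ in range(n)]
for (u, v) in arcs:
    a[u][v] = True

# both representations agree on the neighbours of u = 0
assert adj[0] == [1, 2]
assert [v for v in range(n) if a[0][v]] == [1, 2]
```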
Taken together, lines 4, 5, and 6–8 take time 𝑂(𝑛) + 𝑂(1) + 𝑂(𝑛), and because
they are performed up to 𝑛 times, they take total time 𝑂(𝑛 2 ), which dominates the overall
running time compared to lines 1–2 and 9. If the digraph is represented by
adjacency lists, the running time is 𝑂(𝑛 2 ) for lines 3–4 plus 𝑂(𝑚) for lines 6–8 over
all iterations of the main loop, which is 𝑂(𝑛 2 ) + 𝑂(𝑚) = 𝑂(𝑛 2 ) because 𝑚 ∈ 𝑂(𝑛 2 ),
by (2.19). In summary, for a network with 𝑛 nodes, the running time of Dijkstra’s
algorithm is 𝑂(𝑛 2 ). This is better by a factor of 𝑛 than the Bellman–Ford algorithm,
but requires the assumption of nonnegative arc weights.
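For illustration, this 𝑂(𝑛²) version of Dijkstra's algorithm can be sketched in Python as follows. This is not Algorithm 2.21 verbatim: nodes are assumed to be numbered 0, . . . , 𝑛 − 1, the adjacency list adj[u] contains pairs (𝑣, weight), and a Boolean array replaces the white/grey colours.

```python
import math

def dijkstra(n, adj, s):
    """O(n^2) Dijkstra: adj[u] lists (v, w) pairs with nonnegative weights w;
    returns the list d of shortest distances from source s to every node."""
    d = [math.inf] * n
    d[s] = 0
    finished = [False] * n            # plays the role of the "white" nodes
    for _ in range(n):                # main loop: n iterations
        # select the unfinished node u with minimal tentative distance: O(n)
        u = min((v for v in range(n) if not finished[v]), key=lambda v: d[v])
        if d[u] == math.inf:
            break                     # all remaining nodes are unreachable
        finished[u] = True
        for v, w in adj[u]:           # relax every arc (u, v) leaving u
            if not finished[v] and d[u] + w < d[v]:
                d[v] = d[u] + w
    return d
```

With adjacency lists the relaxation steps cost 𝑂(𝑚) in total, but the selection step keeps the overall bound at 𝑂(𝑛²), matching the analysis above.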

2.10 Reminder of Learning Outcomes

After studying this chapter, you should be able to:


• understand graphs, digraphs, and networks, and their differences
• see graphs, digraphs, and networks as finite combinatorial structures (that is,
independently of how they are drawn, which is just for visualisation)
• explain how digraphs are stored as computer inputs, for example with adja-
cency lists as in (2.7), and how this extends to networks
• understand pseudo-code for algorithms and how blocks of code are shown
with indentation (lines starting further to the right)
• explain the O-notation for running times of algorithms
• write down the Bellman–Ford algorithm
• understand the difference between the two versions of the Bellman–Ford
algorithm
• write down Dijkstra’s algorithm
• understand the different assumptions for the algorithms of Bellman–Ford and
Dijkstra
• apply these algorithms by hand to small networks, and document their
progress with suitable tables.

2.11 Exercises for Chapter 2

Exercise 2.1. In the example (2.1) of the marriage problem with three women and
three men, it was shown in the text that it is not possible to find a perfect
matching of three couples. Now assume you can add exactly one more possible
couple as an edge to the graph in (2.1), for example the pair (𝑎 1 , 𝑏2 ). Show for
which added edge it will be possible to create a perfect matching, and for which
added edge it will not work and why (arguing similarly to the text).

Exercise 2.2. Let 𝐷 = (𝑉 , 𝐴) be a digraph.


(a) Prove that |𝐴| ≤ |𝑉 | · (|𝑉 | − 1). Is it possible that this inequality holds as an
equality? If so, for what digraphs does that happen?
(b) Suppose that 𝐷 has the property that for each pair 𝑢, 𝑣 ∈ 𝑉, 𝑢 ≠ 𝑣, at most one
of the two possible arcs (𝑢, 𝑣) or (𝑣, 𝑢) is present. What is the largest number
of arcs that 𝐷 can have under that condition?

Exercise 2.3. Let 𝑢, 𝑣, 𝑤 be three distinct vertices in a digraph 𝐷. Prove that if
there is a path from 𝑢 to 𝑣 and a path from 𝑣 to 𝑤, then there is a path from 𝑢 to 𝑤.
(Why can we not just connect a 𝑢, 𝑣-path to a 𝑣, 𝑤-path?)

Exercise 2.4. Recall Algorithm 2.11 that computes the minimum of 𝑛 array elements:
Input : 𝑛 numbers 𝑆[1], 𝑆[2], . . . , 𝑆[𝑛], 𝑛 ≥ 1.
Output : their minimum 𝑚 and its index 𝑖, so that 𝑚 = 𝑆[𝑖 ] and 𝑚 ≤ 𝑆[𝑘] for all 𝑘.

1. 𝑚 ← 𝑆[1]
2. 𝑖 ← 1
3. 𝑘 ← 2
4. while 𝑘 ≤ 𝑛 :
5. if 𝑆[𝑘] < 𝑚 :
6. 𝑚 ← 𝑆[𝑘]
7. 𝑖 ← 𝑘
8. 𝑘 ← 𝑘+1

The array elements 𝑆[1], 𝑆[2], . . . , 𝑆[𝑛], 𝑛 ≥ 1 need not be distinct so that the
returned index 𝑖 with 𝑚 = 𝑆[𝑖 ] may not be unique according to the output speci-
fication. With this algorithm, what is 𝑖 when 𝑆[1], 𝑆[2], . . . , 𝑆[𝑛] = 5, 3, 3, 4, 3, 8?
Which value of 𝑖 is returned in general? How should one modify the algorithm
so that the index 𝑖 is as large as possible (i.e., 𝑆[𝑖 ] is the last among the minimal
elements in the array)? Justify your answers.
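If you want to experiment with this algorithm, the following Python transliteration can be used to check your answers (a sketch: the list is padded with a dummy entry at position 0 so that the 1-based indices of the pseudocode carry over).

```python
def algorithm_2_11(S):
    """Transliteration of Algorithm 2.11; S[0] is a dummy, data is S[1..n].
    Returns the minimum m and an index i with m == S[i]."""
    n = len(S) - 1
    m = S[1]                # lines 1-3 of the pseudocode
    i = 1
    k = 2
    while k <= n:           # lines 4-8
        if S[k] < m:
            m = S[k]
            i = k
        k = k + 1
    return m, i
```

For example, algorithm_2_11([None, 5, 3, 3, 4, 3, 8]) returns the minimum together with one particular index at which it is attained.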

Exercise 2.5. In the adjacency list for a digraph we place 𝑦 in column 𝑥 whenever
(𝑥, 𝑦) is an arc. Sketch the digraph whose adjacency list is the following:

𝑎 𝑏 𝑐 𝑑 𝑒 𝑓
𝑑 𝑎 𝑏 𝑏 𝑓 𝑎
𝑒 𝑐
𝑒

Find a directed path from 𝑐 to 𝑓 , and all directed cycles that start and end at 𝑑.

Exercise 2.6.
(a) Apply the first version of the Bellman-Ford algorithm (which records separate
distances 𝑑[𝑣, 𝑖 ] from the source 𝑠 for each vertex 𝑣 and iteration 𝑖 ) to the
following network with source 𝑠. Do so by listing for each vertex the interme-
diate distances for each iteration and showing which ones are newly updated.
What is the output of the algorithm, with the found distances from 𝑠 to each
vertex?
Also, for each node 𝑣, give the predecessor node pred[𝑣] on the shortest path
from the source 𝑠 to 𝑣 (record any update of pred[𝑣] in the table that documents
the progress of the algorithm).

[Figure: weighted network with source 𝑠 and nodes 𝑎, 𝑏, 𝑐, 𝑑, 𝑒; among its arcs is (𝑑, 𝑎) with weight 6. The arc structure is not reproducible in the text.]

(b) Do the same as in (a) with the same network except that the weight 6 on the
arc (𝑑, 𝑎) is replaced with weight 4.

Exercise 2.7. Apply the first version of the Bellman-Ford algorithm to the following
network.
[Figure: weighted network with source 𝑠 and nodes 𝑥, 𝑦, 𝑧; arc weights −2, 1, 2, −1, 1. The arc structure is not reproducible in the text.]

Exercise 2.8. Using the definition of 𝑂-notation, prove the following: Let 𝑓 : N → R
be a polynomial of degree 𝑑 ≥ 0 with positive leading coefficient, that is,

𝑓 (𝑛) = 𝑎 𝑑 𝑛 𝑑 + 𝑎 𝑑−1 𝑛 𝑑−1 + · · · + 𝑎 1 𝑛 + 𝑎0

where 𝑎 𝑑 , 𝑎 𝑑−1 , . . . , 𝑎 1 , 𝑎0 are real numbers with 𝑎 𝑑 > 0. Prove that 𝑓 (𝑛) ∈ 𝑂(𝑛 𝑑 ).
Hint: consider a polynomial that uses only the positive coefficients of 𝑓 .

Exercise 2.9. What is the running time of the algorithm in Exercise 2.4 that finds a
minimum in an array of 𝑛 numbers? Use 𝑂-notation. Justify your answer.

Exercise 2.10.
(a) Demonstrate Dijkstra’s shortest-path algorithm for the following network with
source node 𝑎, in the style of the example in (2.18). Draw the computed
shortest-path tree.

[Figure: weighted network with source 𝑎 and nodes 𝑏, 𝑐, 𝑑, 𝑒, 𝑓 , 𝑔, ℎ; among its arcs is (𝑏, 𝑒) with weight 7. The arc structure is not reproducible in the text.]

(b) Would the algorithm still give the correct answer for this network if the weight
7 of the arc (𝑏, 𝑒) was replaced by −7 ? Explain precisely why or why not.
3 Continuous Optimisation

3.1 Introduction

This chapter studies optimisation problems for continuous functions. It studies
suitable assumptions about the domain of a continuous function so that it is
guaranteed to have a maximum and minimum. The correct condition is that
of compactness, which leads to Theorem 3.37, due to Weierstrass, which says: A
continuous real-valued function on a compact domain assumes its maximum and
minimum. This is the main theorem of this chapter.
The next Chapter 4 studies optimisation problems for functions that are not
only continuous but differentiable, which is a stronger condition. Continuity
and differentiability are conceptually different and therefore treated in separate
chapters.

3.1.1 Learning Outcomes

After studying this chapter, you should be able to:


• explain the important difference between a real number 𝑥 being positive (𝑥 > 0)
and being nonnegative (𝑥 ≥ 0)
• explain why the square 𝑥 2 of a real number 𝑥 is always nonnegative
• use confidently the notions of infimum and supremum, and their related but
different notions of minimum and maximum
• explain what it means for a sequence of real numbers to converge (tend to a
limit that is a real number), or to tend to plus or minus infinity
• understand what it means for a sequence in R𝑛 to converge
• explain the difference between Euclidean norm and maximum norm, and why
both can be used to prove convergence
• have an intuition of open, closed, bounded, and compact sets, and define them
formally


• understand the central concept of continuity of functions defined on R𝑛 or on
subsets of R𝑛
• explain the importance of compactness for subsets of R𝑛 as domains of
functions, with suitable examples
• state the Theorem of Weierstrass (Theorem 3.37)
• apply the Theorem of Weierstrass to examples to prove existence of a maximum
and minimum of a function.

3.1.2 Essential Reading

Essential reading is this chapter.

3.1.3 Further Reading

You should know the important concepts of real analysis, which we review in this
chapter. A good introductory book is
Bryant, V. (1990). Yet Another Introduction to Analysis. Cambridge University
Press, Cambridge, UK. ISBN 978-0521388351.
The image of “seaview hotels” to prove the Bolzano–Weierstrass theorem using
Proposition 3.15 is taken from page 32 of that book. That book is also a good
introduction to the material in Section 3.6.
The following introductory textbook on optimisation provides also some
foundational material.
Sundaram, R. K. (1996). A First Course in Optimization Theory. Cambridge
University Press, Cambridge, UK. ISBN 978-0521497190.
In particular, appendix B and section 1.2.4 of that book complement our (optional)
Section 3.4 on constructions of the real numbers. Further useful explanations of
this topic are found in these wikipedia articles on construction of the real numbers
and the Dedekind cut:
https://en.wikipedia.org/wiki/Construction_of_the_real_numbers
https://en.wikipedia.org/wiki/Dedekind_cut
For proofs about real numbers as Dedekind cuts see pages 17–21 of
Rudin, W. (1976). Principles of Mathematical Analysis, 3rd ed., volume 3. McGraw-
Hill, New York. ISBN 978-0070542358.

3.1.4 Synopsis of Chapter Content

A lot of this chapter is review material, in a more rigorous version that what you
may know already, but also largely self-contained:

• Sections 3.2–3.4 deal with the real numbers, namely why we can maximise real
values (which uses their order, see Section 3.2) and their main property of the
existence of infimum and supremum for nonempty bounded sets (Section 3.3).
Section 3.4 (which is optional, as indicated by a star *) explains mathematical
“constructions” of real numbers. We mostly rely on the intuition of the real
numbers as points on a line.
• Section 3.5 recalls concepts about functions such as domain, range, and image.
• Sequences and their limits are important tools to prove continuity. They make
their appearance twice, in Section 3.6 for real numbers and in Section 3.8 for
vectors of real numbers (elements of R𝑛 ).
• Section 3.7 is about measuring distance in R𝑛 .
• Open and closed sets (Section 3.9) and compact sets (Section 3.10) are important
concepts when studying continuity, the topic of Sections 3.11 and 3.12.
• All this leads up to the central theorem of this chapter, the Theorem of
Weierstrass (Section 3.13) and its use (Section 3.14).

3.2 The Real Numbers and Their Order

Optimisation is concerned with finding a “best” solution to a mathematically
expressed problem. This may be a shortest path between two nodes in a network,
or a cylindrical beer can with minimal surface for a given volume. “Better” is here
measured by a real number, like the length of a path, or the required surface area.
This real number is minimised (or, in other contexts, maximised).
This minimisation uses the basic ordering of real numbers so that one number 𝑥
can be declared as greater than another number 𝑦, written as 𝑥 > 𝑦 or, equivalently
𝑦 < 𝑥, that is, 𝑦 is smaller than 𝑥. This order “greater than” is denoted by >. It is
used together with ≥, where 𝑥 ≥ 𝑦 is to be read as “𝑥 is greater than or
equal to 𝑦”, or equivalently 𝑦 ≤ 𝑥, to be read as “𝑦 is less than or equal to 𝑥”.
An inequality such as 𝑥 > 0 is called a strict inequality, whereas 𝑥 ≥ 0 is called
a weak inequality. It is important not to confuse the two, because one can find a
solution to the problem “minimise 𝑥 subject to 𝑥 ≥ 0” (which has the obvious
solution 𝑥 = 0) but not to the problem “minimise 𝑥 subject to 𝑥 > 0” because this
problem has no solution, which is easily seen by contradiction: If there was a
smallest number 𝑥 such that 𝑥 > 0, then 𝑥/2 would also fulfill 𝑥/2 > 0, but 𝑥/2 is
smaller than 𝑥. Hence, when you write 𝑥 ≥ 0, be careful to observe that this means
“𝑥 is nonnegative” and not “𝑥 is positive”, because 𝑥 could also be zero (which is
not a positive number). Every real number is either positive, zero, or negative.
Comparing real numbers by size is due to their ordering, which we imagine to
be along the real line. We consider a straight line on which we mark one point as 0,

and a second point as 1 (normally to the right of 0 if the line is drawn horizontally,
or above 0 if the line is drawn vertically). Then the distance between 0 and 1 is the
“unit length”. Any other point 𝑥 is then a point on this line where 𝑥 is to the right
of 0 if 𝑥 > 0, to the left of 0 if 𝑥 < 0, with a distance 𝑥 from 0 assuming that the
distance between 0 and 1 is 1. The set of reals R is thought to be exactly the set of
these points on the line.
The fact that real numbers are ordered makes them one of the most useful
mathematical objects in practical applications. For example, the complex numbers
cannot be ordered in such a useful way. The complex numbers allow us to solve
arbitrary polynomial equations such as 𝑥 2 = −1, for which no real solution 𝑥 exists,
because 𝑥 2 ≥ 0 holds for every real number 𝑥. This, as we show shortly, is a
property of the order relation ≥, and so we cannot have a system of numbers that
we can order and find a minimum or maximum, and that at the same time allows
solving arbitrary polynomial equations.
We state a few properties of the order relation ≥ that imply the inequality
𝑥2 ≥ 0 for all 𝑥. Most importantly, the order relation 𝑥 ≥ 𝑦 should be compatible
with addition in the sense that we can add any number 𝑧 to both sides and preserve
the property (which is obvious from our picture of the real line). That is, for any
reals 𝑥, 𝑦, 𝑧
𝑥 ≤ 𝑦 ⇒ 𝑥+𝑧 ≤ 𝑦+𝑧 (3.1)
which for 𝑧 = −𝑥 − 𝑦 implies

𝑥≤𝑦 ⇒ − 𝑦 ≤ −𝑥 . (3.2)

Because −𝑥 = (−1) · 𝑥, the implication (3.2) states the well-known rule that
multiplication with −1 reverses an inequality. If you forget why this is the case, simply
subtract the terms on both sides from the inequality to get this result, as we have
done. Another condition concerning multiplication of real numbers is

𝑥, 𝑦 ≥ 0 ⇒ 𝑥 · 𝑦 ≥ 0 . (3.3)

In terms of the real number line, this means that 𝑦 is “stretched” (if 𝑥 > 1) or
“shrunk” (if 0 ≤ 𝑥 < 1) or stays the same (if 𝑥 = 1) when multiplied with the
nonnegative number 𝑥, but stays on the same side of 0 (this holds for any real
number 𝑦; here 𝑦 is also assumed to be nonnegative). Condition (3.3) holds for a
positive integer 𝑥 as a consequence of (3.1), because 𝑦 ≥ 0 implies 𝑦 + 𝑦 ≥ 𝑦 and
hence 2𝑦 = 𝑦 + 𝑦 ≥ 𝑦 ≥ 0, and similarly for any repeated addition of 𝑦. Extending
this from positive integers 𝑥 to real numbers 𝑥 gives (3.3).
We now show that 𝑥 · 𝑥 ≥ 0 for any real number 𝑥. This holds if 𝑥 ≥ 0 by
(3.3). If 𝑥 ≤ 0 then −𝑥 ≥ 0 by (3.2), and so 𝑥 · 𝑥 = (−1) · (−1) · 𝑥 · 𝑥 = (−𝑥)(−𝑥) ≥ 0
now by (3.3), where we have used that (−1) · (−1) = 1. This, in turn, follows
from something we have already used, namely that (−1) · 𝑦 = −𝑦 for any 𝑦,

because −𝑦 is the unique negative of 𝑦 so that 𝑦 + (−𝑦) = 0: Namely, we also
have 𝑦 + (−1) · 𝑦 = 1 · 𝑦 + (−1) · 𝑦 = (1 − 1) · 𝑦 = 0 · 𝑦 = 0, so (−1) · 𝑦 is indeed −𝑦.
Similarly, −(−𝑦) = 𝑦 (because (−𝑦) + 𝑦 = 0) and in particular −(−1) = 1, which
means (−1) · (−1) = 1 as claimed.
A systematic derivation of all properties of the order ≤ of the reals in combina-
tion with the arithmetic operations addition and multiplication is laborious, and
so we appeal to the intuition of the real number line. Here we note the following
“axioms” for ≤, each of which is useful to understand in its own right. The first
is transitivity, which says that for all 𝑥, 𝑦, 𝑧 ∈ R

𝑥 ≤ 𝑦, 𝑦 ≤ 𝑧 ⇒ 𝑥 ≤ 𝑧. (3.4)

The next condition is antisymmetry: for all 𝑥, 𝑦 ∈ R

𝑥 ≤ 𝑦, 𝑦 ≤ 𝑥 ⇒ 𝑥 = 𝑦. (3.5)

Another condition is reflexivity: for all 𝑥 ∈ R

𝑥 ≤ 𝑥. (3.6)

The corresponding strict order < is then defined by

𝑥<𝑦 :⇔ 𝑥 ≤ 𝑦 and 𝑥 ≠ 𝑦 (3.7)

(and correspondingly the relations ≥ and >). Transitivity (3.4), antisymmetry (3.5)
and reflexivity (3.6) define what is called a partial order. The term “partial” means
that there can be elements 𝑥 and 𝑦 that are incomparable in the sense that neither
𝑥 ≤ 𝑦 nor 𝑦 ≤ 𝑥 holds. One of the most important partial orders is the inclusion
relation ⊆ between sets, where 𝐴 ⊆ 𝐵 means that 𝐴 is a subset of 𝐵.
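Incomparability under inclusion is easy to see concretely. Python's built-in set comparison operator <= happens to be exactly the subset relation ⊆, so the following sketch illustrates the point:

```python
# A and B are incomparable under inclusion: neither is a subset of the other
A, B, C = {1, 2}, {2, 3}, {1}
assert not (A <= B) and not (B <= A)

# C and A are comparable: C is a subset of A
assert C <= A and not (A <= C)

# reflexivity of the partial order: every set is a subset of itself
assert A <= A
```

This is why ⊆ is only a partial order, in contrast to the total order ≤ on R.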
For the order ≤ on the set R of reals, incomparability does not occur. This
order is therefore called total in the sense that for all 𝑥, 𝑦 ∈ R

𝑥 ≤ 𝑦 or 𝑦 ≤ 𝑥 . (3.8)

It is useful to study properties of an order relation ≤ based on these abstract
properties. That is, we consider a set, say 𝑆, with a binary relation ≤ where 𝑥 ≤ 𝑦
says that this relation holds for two particular elements 𝑥 and 𝑦 of 𝑆. The symbol
< is then defined according to (3.7), and the symbols ≥ and > are understood as the
usual shorthands, such as 𝑥 ≥ 𝑦 for 𝑦 ≤ 𝑥. Then 𝑆 together with ≤ is called an
“ordered” set.

Definition 3.1. An ordered set is a set 𝑆 together with a binary relation ≤ that is
transitive, antisymmetric, and reflexive, that is, (3.4) (3.5), (3.6) hold for all 𝑥, 𝑦, 𝑧
in 𝑆. The order is called total and the set totally ordered if (3.8) holds for all 𝑥, 𝑦 in 𝑆.

3.3 Infimum and Supremum

The real numbers R allow addition and multiplication, and comparison with the
order ≤. The same applies to the rational numbers Q. However, the real numbers
have an additional property of completeness that we will state using the order
relation ≤. The rational numbers lack this property. Because there is no rational
number 𝑥 such that 𝑥² = 2 (in other words, √2 is irrational), the set {𝑥 ∈ Q | 𝑥² < 2}
has no least upper bound in Q (defined shortly), but it does in R. Intuitively, the
parabola that consists of all pairs (𝑥, 𝑦) such that 𝑦 = 𝑥² − 2 is a “continuous curve”
in R² that should intersect the “𝑥-axis” where 𝑦 = 0 at two points (𝑥, 𝑦) where
𝑥 = ±√2 and 𝑦 = 0, in agreement with the intuition that 𝑥 can take all values on
the real number line.
The following definition of an upper or lower bound of a set applies to any
ordered set; the order need not be total. The applications we have in mind occur
when the ordered set 𝑆 is R or Q.

Definition 3.2. Let 𝑆 be an ordered set and let 𝐴 ⊆ 𝑆. An upper bound of 𝐴 is an
element 𝑧 of 𝑆 such that 𝑥 ≤ 𝑧 for all 𝑥 ∈ 𝐴. A lower bound of 𝐴 is an element 𝑧
of 𝑆 such that 𝑥 ≥ 𝑧 for all 𝑥 ∈ 𝐴. A greatest element or maximum of 𝐴 is an upper
bound of 𝐴 that belongs to 𝐴. A smallest element or minimum of 𝐴 is a lower bound
of 𝐴 that belongs to 𝐴.

Note that a maximum of a set 𝐴, if it exists, is unique, so that we can call it
the maximum or say that the maximum is attained: Namely, if 𝑥 and 𝑦 are two
maxima of 𝐴, then 𝑥 ≤ 𝑦 because 𝑥 ∈ 𝐴 and 𝑦 is an upper bound of 𝐴, and 𝑦 ≤ 𝑥
because 𝑦 ∈ 𝐴 and 𝑥 is an upper bound of 𝐴. Hence, 𝑥 = 𝑦 by antisymmetry (3.5).
Similarly, a minimum of a set, if it exists, is unique.

Definition 3.3. Let 𝑆 be an ordered set and let 𝐴 ⊆ 𝑆. We say 𝐴 is bounded from
above if 𝐴 has an upper bound, and bounded from below if 𝐴 has a lower bound,
and just bounded if 𝐴 is bounded from above and below. The least upper bound or
supremum of a set 𝐴, denoted sup 𝐴, is the least element of the set of upper bounds
of 𝐴 (if it exists). The greatest lower bound or infimum of a set 𝐴, denoted inf 𝐴, is
the greatest element of the set of lower bounds of 𝐴 (if it exists).

Clearly, for a set to have a least or greatest element, it needs to be nonempty,
so 𝐴 can only have a least upper bound if 𝐴 is bounded from above, and 𝐴 can
only have a greatest lower bound if 𝐴 is bounded from below. What if 𝐴 itself is
empty? Then, by definition, every real number 𝑧 is trivially an upper (or lower)
bound of 𝐴, and the set of upper (or lower) bounds of 𝐴 itself is not bounded from
below or above, and so cannot have a least or greatest element. On the other hand,
if 𝐴 is nonempty, then any element 𝑥 of 𝐴 is a lower bound for the set of upper
bounds 𝑧 of 𝐴 (because 𝑥 ≤ 𝑧). Similarly, 𝑥 is an upper bound for the set of lower

bounds of 𝐴. In summary, if a set 𝐴 is nonempty, then 𝐴 can possibly have a least
upper bound or supremum if 𝐴 is bounded from above, and 𝐴 can possibly have a
greatest lower bound or infimum if 𝐴 is bounded from below.
The supremum of 𝐴 is the same as a maximum of 𝐴 if (and only if) it belongs
to 𝐴. This is stated in the following proposition, analogously also for infimum and
minimum.
Proposition 3.4. Let 𝑆 be an ordered set and let 𝐴 ⊆ 𝑆. Then 𝐴 has a maximum if and
only if sup 𝐴 exists and belongs to 𝐴. Similarly, 𝐴 has a minimum if and only if inf 𝐴
exists and belongs to 𝐴.

Proof. Suppose 𝐴 has a maximum 𝑎. Then by definition 𝑎 is an upper bound of 𝐴,
and for any upper bound 𝑧 of 𝐴 we have 𝑎 ≤ 𝑧 because 𝑎 ∈ 𝐴. Hence, 𝑎 is the
least upper bound sup 𝐴. Conversely, if sup 𝐴 exists and belongs to 𝐴 then it is
an upper bound of 𝐴 and hence a maximum. The argument for minimum and
infimum is similar.
The mentioned order completeness of R asserts that any nonempty set of real
numbers with an upper bound has a least upper bound (supremum), and any
nonempty set of real numbers with a lower bound has a greatest lower bound
(infimum). It is a basic property of the real numbers.
Axiom 3.5 (Order completeness of R). Let 𝐴 be a nonempty set of real numbers. Then
if 𝐴 is bounded from above then 𝐴 has a supremum sup 𝐴, and if 𝐴 is bounded from below
then 𝐴 has an infimum inf 𝐴.

We have stated condition 3.5 as an axiom about the real numbers rather than a
theorem. That is, we assume this condition for all sets of real numbers, according
to our intuition about the real number line.
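The set 𝐴 = {𝑥 ∈ Q | 𝑥² < 2} mentioned above illustrates the axiom: its supremum √2 exists in R but not in Q. The following sketch approximates sup 𝐴 numerically by bisection, repeatedly halving an interval [lo, hi] where lo belongs to 𝐴 and hi is an upper bound of 𝐴 (the starting values 0 and 2 are one valid choice).

```python
def approx_sup(tol=1e-12):
    """Bisection approximation of sup {x : x*x < 2}.
    Invariant: lo*lo < 2 <= hi*hi, so the supremum lies in [lo, hi]."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid            # mid is in A: raise the lower end
        else:
            hi = mid            # mid is an upper bound of A: lower it
    return hi

s = approx_sup()
assert abs(s * s - 2) < 1e-9    # s is close to the square root of 2
```

At every step hi remains an upper bound of 𝐴 and lo remains an element of 𝐴, so the two ends squeeze the supremum between them.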

3.4 Constructing the Real Numbers *

It is also possible to prove Axiom 3.5 as a theorem when one has “constructed” the
real numbers. There are several ways to do this, which we outline in this section,
as an excursion and general background.
The standard approach is to consider the set R as a “system” of numbers
in a sequence of increasingly powerful systems N, Z, Q, R. We first consider the
set N of natural numbers (positive integers) as used for counting, then introduce
zero, and negative integers, which gives us the set of all integers Z. We also have a
way of writing down these integers, namely as finite sequences of decimal digits
(elements of {0, 1, . . . , 9}), preceded by a minus sign for negative integers. This
representation of an integer is unique if the first digit is not 0; all integers can be
written in this way except for the integer zero itself which is written as 0.

Next, the set of integers Z is extended to fractions 𝑝/𝑞 of integers 𝑝 and 𝑞
where 𝑞 is positive. This is not a unique representation because 𝑎𝑝/𝑎𝑞 (for any
positive integer 𝑎) represents the same fraction as 𝑝/𝑞. The set of all fractions
defines the set Q of rational numbers.
The most familiar way to define a real number is as an infinite decimal fraction.
A decimal fraction starts with the representation of an integer in decimal notation,
followed by a decimal point, followed by an infinite sequence of decimal digits.
The decimal fraction that represents a real number is unique except when the
fraction is finite, that is, after some time all digits are 0, for example 1.25 (which
represents the fraction 5/4 in decimal). As an infinite sequence of decimal digits,
this can be written as either 1.25000 . . . or 1.24999 . . .. Here one typically chooses
the finite sequence that ends in all 0’s rather than the sequence that ends in all 9’s.
For example, 1/3 is represented as 0.333 . . .. Multiplied by 3, this gives 1 or the
equivalent representation 0.999 . . ..
It can be shown that any rational number is represented by a decimal fraction
that is eventually periodic, that is, it becomes an infinitely repeated finite sequence
of digits. For example, 1/7 has the decimal fraction 0.142857142857142857 . . . ,
which is written as 0.142857 with a bar over the repeating block. Another example
is 1/12 = 0.08333 . . . = 0.083, with a bar over the repeated digit 3. That
is, any rational number, either as 𝑝/𝑞 with a pair of integers 𝑝 and 𝑞, or as an
eventually periodic decimal fraction, has a finite description.
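The digits of such an expansion come from long division, which the following sketch reproduces; it also makes clear why the expansion of any fraction is eventually periodic: each step depends only on the current remainder, and there are only 𝑞 possible remainders, so they must repeat.

```python
def decimal_digits(p, q, n):
    """First n digits after the decimal point of p/q, for 0 <= p < q,
    computed by long division: each step multiplies the remainder by 10."""
    digits, r = [], p
    for _ in range(n):
        r = r * 10
        digits.append(r // q)   # next decimal digit
        r = r % q               # next remainder, one of 0, ..., q-1
    return digits

assert decimal_digits(1, 7, 12) == [1, 4, 2, 8, 5, 7] * 2   # period 142857
assert decimal_digits(1, 12, 6) == [0, 8, 3, 3, 3, 3]       # 1/12 = 0.0833...
```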
In contrast, an arbitrary real number (which can be irrational, that is, it is not
an element of Q) is a general decimal fraction. It requires an infinite description
with an infinite sequence of digits after the decimal point which in general has no
predictable pattern. For example, the ratio 𝜋 of the circumference of a circle to its
diameter starts with 3.1415926 . . . but has no discernible pattern in its digits (of
which billions have been computed). Already a single real number is therefore
“described” by an infinite object. In practice, we are typically content to assume
that a finite prefix of the sequence of digits suffices to describe the real number
“sufficiently accurately”, where we can extend this accuracy as much as we like. The
intuition is that finite (truncated) decimal fractions (which are rational numbers)
approximate the represented real number more and more accurately, depending
on the considered length of the truncated sequence.
One complication of infinite decimal fractions is that the arithmetic operations,
such as addition and multiplication, are hard to describe using these infinite de-
scriptions. Essentially, they are performed on the finite approximations themselves.
A way to do this more generally is to define real numbers as “Cauchy sequences”
of rational numbers.
A sequence of numbers (which themselves can be rationals or reals) is written
as 𝑥1 , 𝑥2 , . . . with elements 𝑥 𝑘 of the sequence for each 𝑘 ∈ N (where 𝑥 𝑘 ∈ Q for a
sequence of rational numbers, and 𝑥 𝑘 ∈ R for a sequence of real numbers). The

entire sequence is denoted by {𝑥 𝑘 } 𝑘∈N or just {𝑥 𝑘 } with the understanding that 𝑘
goes through all natural numbers.
A Cauchy sequence {𝑥 𝑘 } has the property that any two of its elements are
eventually arbitrarily close together, that is,

∀𝜀 > 0 ∃𝐾 ∈ N ∀𝑖, 𝑗 ∈ N : 𝑖, 𝑗 ≥ 𝐾 ⇒ |𝑥 𝑖 − 𝑥 𝑗 | < 𝜀 . (3.9)

That is, for any positive 𝜀, which can be as small as one likes, there is a subscript 𝐾
so that for all 𝑖 and 𝑗 that are at least 𝐾, the sequence elements 𝑥 𝑖 and 𝑥 𝑗 differ by
less than 𝜀. Note that, in particular, we could choose 𝑖 = 𝐾 and 𝑗 arbitrarily larger
than 𝑖, and yet 𝑥 𝑖 and 𝑥 𝑗 would differ by less than 𝜀.
In the Cauchy condition (3.9), all elements 𝑥 1 , 𝑥2 , . . . and 𝜀 can be rational
numbers. An example of such a Cauchy sequence is the sequence of finite decimal
fractions 𝑥 𝑘 obtained from an infinite decimal fraction up to the 𝑘th place past
the decimal point. For example, if the infinite decimal fraction is 3.1415926 . . .,
then this sequence of rational numbers is given by 𝑥1 = 3.1, 𝑥2 = 3.14, 𝑥 3 = 3.141,
𝑥4 = 3.1415, 𝑥 5 = 3.14159, 𝑥6 = 3.141592, 𝑥7 = 3.1415926, and so on, which is easily
seen to be a Cauchy sequence.
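This can be checked mechanically with exact rational arithmetic: once 𝑖, 𝑗 ≥ 𝐾, the truncations 𝑥ᵢ and 𝑥ⱼ agree on the first 𝐾 decimal places and hence differ by less than 10⁻ᴷ. A sketch using Python's fractions module (with just the digits quoted above):

```python
from fractions import Fraction

digits = "31415926"           # the leading digits 3.1415926... from the text

def x(k):
    """x_k: the decimal fraction truncated k places after the decimal point."""
    return Fraction(int(digits[:k + 1]), 10 ** k)

K = 3
eps = Fraction(1, 10 ** K)    # epsilon = 10^(-K)
# every pair x_i, x_j with i, j >= K differs by less than epsilon
assert all(abs(x(i) - x(j)) < eps
           for i in range(K, len(digits)) for j in range(K, len(digits)))
```

Using Fraction rather than floating point keeps the comparison with 𝜀 exact, in the spirit of the Cauchy condition (3.9) over Q.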
More generally, we can define a real number to be a Cauchy sequence of rational
numbers. Two sequences {𝑥 𝑘 } and {𝑦 𝑘 } are equivalent if |𝑥 𝑘 − 𝑦 𝑘 | is arbitrarily small
for sufficiently large 𝑘, and if one of these two sequences is a Cauchy sequence
then so is the other (which is easily seen). Any two equivalent Cauchy sequences
define the same real number. Note that a real number is an infinite object (in fact
an entire equivalence class of Cauchy sequences of rational numbers), similar to
an infinite decimal fraction.
With real numbers defined as Cauchy sequences of rational numbers, it is
possible to prove Axiom 3.5 as a theorem. This requires showing the existence
of limits of sequences of real numbers, and the construction of a supremum as a
suitable limit; see appendix B and section 1.2.4 of Sundaram (1996).
We mention a second possible construction of the real numbers where Axiom
3.5 is much easier to prove. A Dedekind cut is a partition of Q into a lower set 𝐿 and
an upper set 𝑈 such that 𝑎 < 𝑏 for every 𝑎 ∈ 𝐿 and 𝑏 ∈ 𝑈, and such that 𝐿 has no
maximal element. The idea is that each real number 𝑥 defines uniquely such a cut
of the rational numbers into 𝐿 and 𝑈 given by

𝐿 = {𝑎 ∈ Q | 𝑎 < 𝑥}, 𝑈 = {𝑏 ∈ Q | 𝑏 ≥ 𝑥} . (3.10)

If 𝑥 is itself a rational number, then 𝑥 belongs to the upper set 𝑈 for the Dedekind
cut 𝐿, 𝑈 for 𝑥 (which is why we require that 𝐿 has no maximal element, to make
this a unique choice). If 𝑥 is irrational, then 𝑥 belongs to neither 𝐿 nor 𝑈 and is
“between” 𝐿 and 𝑈. Hence, we can see this cut as a definition of 𝑥. The Dedekind
cut 𝐿, 𝑈 in (3.10) that represents 𝑥 is unique. This holds because any two different

real numbers 𝑥 and 𝑦 define different cuts (because a suitable rational number 𝑐
with 𝑥 < 𝑐 < 𝑦 belongs to the upper set of the cut for 𝑥 but to the lower set of the
cut for 𝑦).
In the construction of R as Dedekind cuts, that is, as the described partitions 𝐿, 𝑈 of Q, each
real number has a unique description as such a cut. (The lower set 𝐿 suffices,
because 𝑈 is just the set of upper bounds of 𝐿 in Q.) Similar to the representation as
a Cauchy sequence, a real number 𝑥 has an infinite description as a set of rational
numbers. If 𝑥 is described by the cut 𝐿, 𝑈 and 𝑥 ′ is described by the cut 𝐿′ , 𝑈 ′,
then we can define 𝑥 ≤ 𝑥 ′ by the inclusion relation 𝐿 ⊆ 𝐿′ (as seen from (3.10)).
Now Axiom 3.5, the order completeness of R, is very easy to prove: Given a
nonempty set 𝐴, bounded above, of real numbers 𝑥 represented by their lower
cut sets 𝐿 in (3.10), the supremum of 𝐴 is represented by the union of these sets 𝐿.
This union is a set of rational numbers, which can be easily shown to fulfill the
properties of a lower cut set of a Dedekind cut, and thus defines a real number,
which can be shown to be the supremum sup 𝐴.
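The idea of this proof can be mimicked in code by representing a lower cut set 𝐿ₓ = {𝑎 ∈ Q | 𝑎 < 𝑥} as a membership test, and the supremum of a collection of cuts as the union of their lower sets. This is a sketch for illustration only: it uses a finite list of cuts, whereas the actual construction works with arbitrary (typically infinite) sets of cuts.

```python
from fractions import Fraction

def lower_cut(x):
    """Membership test for the lower cut set L = {a in Q : a < x}."""
    return lambda a: a < x

def sup_of_cuts(cuts):
    """Union of the lower sets: a is below the supremum iff it is
    below at least one of the given cuts."""
    return lambda a: any(L(a) for L in cuts)

A = [lower_cut(Fraction(1, 3)), lower_cut(Fraction(2, 5)), lower_cut(Fraction(1, 2))]
s = sup_of_cuts(A)            # behaves like the cut representing 1/2
assert s(Fraction(49, 100))   # 49/100 < 1/2, so it is in the union
assert not s(Fraction(1, 2))  # 1/2 itself is in no lower set
```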
Dedekind cuts are an elegant construction of the real numbers from the rational
numbers. It is slightly more complicated to define arithmetic operations of addition
and, in particular, multiplication of real numbers via the rational numbers in the
respective cut sets than using Cauchy sequences; see Rudin (1976, pages 17–21).
Dedekind cuts are an abstraction that “defines” a point 𝑥 on the real line via
all the rational numbers 𝑎 to the left of 𝑥, which defines the lower cut set 𝐿 in (3.10).
This infinite set 𝐿 is mathematically “simpler” than 𝑥 because it contains only
rational numbers 𝑎. We “know” these rational numbers via their finite descriptions
as fractions, but as points on the line they do not provide a good intuition about
the reals. In our reasoning about the real numbers, we therefore refer usually to
our intuition of the real number line.

3.5 Maximisation and Minimisation

Real numbers are the values that a function takes which we want to optimise
(maximise or minimise). The domain of the considered function can be rather
general, and will often by denoted by 𝑋, which is always a nonempty set.
Consider a function 𝑓 : 𝑋 → R, where 𝑋 is a nonempty set. The function 𝑓 is
called the objective function. The domain 𝑋 of 𝑓 is sometimes called the constraint
set (typically when 𝑋 is described by certain “constraints”, for example 𝑥 ≥ 0 and
𝑥 ≤ 1 if 𝑋 is the interval [0, 1]).
Our basic optimisation problems are:
(a) maximise 𝑓 (𝑥) subject to 𝑥 ∈ 𝑋,
(b) minimise 𝑓 (𝑥) subject to 𝑥 ∈ 𝑋.

A solution to the maximisation problem (a) is an element 𝑥 ∗ ∈ 𝑋 such that

𝑓 (𝑥 ∗ ) ≥ 𝑓 (𝑥) for all 𝑥 ∈ 𝑋.

If such an element 𝑥∗ exists we say that 𝑓 attains the (global) maximum on 𝑋
at 𝑥∗. We refer to 𝑥∗ as a (global) maximiser of 𝑓 on 𝑋, and to 𝑓 (𝑥∗) as the (global)
maximum of 𝑓 on 𝑋. (The adjective “global” is used in distinction to a “local”
maximum, defined later.)
The set of all solutions to (a) is denoted by

arg max{ 𝑓 (𝑥) | 𝑥 ∈ 𝑋} := {𝑥 ∗ ∈ 𝑋 | 𝑓 (𝑥 ∗ ) ≥ 𝑓 (𝑥) for all 𝑥 ∈ 𝑋}.

Analogously, a solution to the minimisation problem (b) is an element 𝑥∗ of 𝑋
such that

𝑓 (𝑥∗) ≤ 𝑓 (𝑥) for all 𝑥 ∈ 𝑋.

If such an element 𝑥∗ exists we say that 𝑓 attains the (global) minimum on 𝑋 at 𝑥∗,
with 𝑥∗ as a (global) minimiser of 𝑓 on 𝑋, and 𝑓 (𝑥∗) as the (global) minimum of 𝑓
on 𝑋. The set of all solutions to (b) is denoted by

arg min{ 𝑓 (𝑥) | 𝑥 ∈ 𝑋} := {𝑥∗ ∈ 𝑋 | 𝑓 (𝑥∗) ≤ 𝑓 (𝑥) for all 𝑥 ∈ 𝑋}.

Note the difference between “a solution” to one of the optimisation problems
above and “the solutions” (or “the set of solutions”), which are all solutions.
Furthermore, if a global maximum exists, then it is unique, but the global maximiser
is not necessarily unique. (The same is true for the global minimum.) Quite often
there will be no solution at all; in that case, the set of solutions is the empty set.
Here are some examples that illustrate, in particular, the importance of the
domain 𝑋.

Example 3.6. 𝑋 = [0, ∞), 𝑓 : 𝑋 → R, 𝑓 (𝑥) = 𝑥. Then the maximisation problem
has no solution, that is, arg max{ 𝑓 (𝑥) | 𝑥 ∈ 𝑋} = ∅, whereas the minimisation
problem has a unique solution, arg min{ 𝑓 (𝑥) | 𝑥 ∈ 𝑋} = {0}, and the global
minimum is 𝑓 (0) = 0.

Example 3.7. 𝑋 = [−1, 1], 𝑓 : 𝑋 → R, 𝑓 (𝑥) = 𝑥² . Then arg max{ 𝑓 (𝑥) | 𝑥 ∈ 𝑋} =
{−1, 1} and the global maximum is 𝑓 (−1) = 𝑓 (1) = 1. The minimisation problem
has a unique solution, arg min{ 𝑓 (𝑥) | 𝑥 ∈ 𝑋} = {0}, and the global minimum is
𝑓 (0) = 0.

Example 3.8. 𝑋 = R, 𝑓 : 𝑋 → R, 𝑓 (𝑥) = 5. This is a constant function, where
the set of solutions of both optimisation problems is the entire domain, that is,
arg max{ 𝑓 (𝑥) | 𝑥 ∈ 𝑋} = arg min{ 𝑓 (𝑥) | 𝑥 ∈ 𝑋} = 𝑋, and the global maximum
and minimum is 5. Only for constant functions (that is, 𝑓 (𝑥) = 𝑓 (𝑦) for all 𝑥, 𝑦 ∈ 𝑋)
do the maximum and minimum coincide.
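The definitions of arg max and arg min can be made concrete by brute force on a finite grid (our own illustration, not part of the guide; the helper names are ad hoc). A grid can only approximate an infinite domain 𝑋, so this is a sketch of the definitions, not a solver.

```python
# Brute-force solution sets over a *finite* sample of the domain X.
# Only a numerical sketch of the definitions above.

def argmax_set(f, X):
    """All maximisers of f on the finite collection X."""
    m = max(f(x) for x in X)
    return {x for x in X if f(x) == m}

def argmin_set(f, X):
    """All minimisers of f on the finite collection X."""
    m = min(f(x) for x in X)
    return {x for x in X if f(x) == m}

# Example 3.7: f(x) = x^2 on a grid in [-1, 1].
X = [i / 10 for i in range((-10), 11)]
f = lambda x: x * x
print(sorted(argmax_set(f, X)))   # [-1.0, 1.0]: two maximisers, one maximum 1.0
print(sorted(argmin_set(f, X)))   # [0.0]
```

Note that for Example 3.6 (the unbounded domain [0, ∞)) any finite grid would wrongly report its largest grid point as a maximiser, which is exactly why such a check only illustrates, and never replaces, the definitions.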

The following is an easy but useful observation, proved with the help of (3.2).
We use it in order to focus on maximisation problems and to avoid repeating very
similar considerations for minimisation problems.

Proposition 3.9. Consider a function 𝑓 : 𝑋 → R and let 𝑥 ∈ 𝑋. Then 𝑥 is a maximiser
of 𝑓 on 𝑋 if and only if 𝑥 is a minimiser of − 𝑓 on 𝑋.

The following theorem is useful in applications.

Theorem 3.10. Suppose 𝑋 = 𝑋1 ∪ 𝑋2 (the two sets 𝑋1 and 𝑋2 need not be disjoint) such
that there exists an element 𝑦 in 𝑋1 with 𝑓 (𝑦) ≥ 𝑓 (𝑥) for all 𝑥 ∈ 𝑋2 and that 𝑓 attains a
maximum on 𝑋1 . Then 𝑓 attains a maximum on 𝑋.

Proof. Any element 𝑥 of 𝑋 belongs to 𝑋1 or 𝑋2 . Consider a maximiser 𝑥 ∗ of
𝑓 on 𝑋1 . If 𝑥 ∈ 𝑋1 then 𝑓 (𝑥 ∗ ) ≥ 𝑓 (𝑥). If 𝑥 ∈ 𝑋2 then we have 𝑓 (𝑦) ≥ 𝑓 (𝑥), and
𝑓 (𝑥 ∗ ) ≥ 𝑓 (𝑦), and hence also 𝑓 (𝑥 ∗ ) ≥ 𝑓 (𝑥), so 𝑥 ∗ is indeed a maximiser of 𝑓 on 𝑋.

3.6 Sequences, Convergence, and Limits

Analysis and the study of continuity require the use of sequences and limits. For
the moment we consider only sequences of real numbers. Recall that a sequence
𝑥1 , 𝑥2 , . . . is denoted by {𝑥 𝑘 } 𝑘∈N or just {𝑥 𝑘 }. The limit of such a sequence, if it
exists, is a real number 𝐿 such that the elements 𝑥 𝑘 of the sequence are eventually
arbitrarily close to 𝐿. This closeness is described by a maximum distance from 𝐿,
often called 𝜀, that is chosen as an arbitrarily small positive real number.

Definition 3.11. A sequence {𝑥 𝑘 } 𝑘∈N of real numbers converges to 𝐿, or has limit 𝐿,
if
∀𝜀 > 0 ∃𝐾 ∈ N ∀𝑘 ∈ N : 𝑘 ≥ 𝐾 ⇒ |𝑥 𝑘 − 𝐿| < 𝜀 . (3.11)

Then we also write lim 𝑘→∞ 𝑥 𝑘 = 𝐿, or 𝑥 𝑘 → 𝐿 as 𝑘 → ∞ (read as "𝑥 𝑘 tends to 𝐿 as
𝑘 tends to infinity").

In words, (3.11) says that for every (arbitrarily small) positive 𝜀 there is some
index 𝐾 such that from 𝐾 onwards (𝑘 ≥ 𝐾) all sequence elements 𝑥 𝑘 differ in
absolute value by less than 𝜀 from 𝐿.
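As a concrete reading of (3.11), the following sketch assumes the specific sequence 𝑥 𝑘 = 1/𝑘 with limit 𝐿 = 0 (our example, not the guide's) and exhibits, for each 𝜀, a witness index 𝐾; the helper name is ad hoc, and a finite check illustrates the definition rather than proving convergence.

```python
# For x_k = 1/k with L = 0: given eps, any integer K > 1/eps satisfies
# |x_k - L| < eps for all k >= K.  The assert spot-checks a finite range.

def witness_K(eps, L=0.0):
    K = int(1 / eps) + 1   # for x_k = 1/k, any integer K > 1/eps works
    assert all(abs(1 / k - L) < eps for k in range(K, K + 1000))
    return K

for eps in (0.1, 0.01, 0.001):
    print(eps, witness_K(eps))   # K = 11, 101, 1001 respectively
```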
The next proposition asserts that if a sequence has a limit, that limit is unique.
You should remember this from a course on real analysis.

⇒ Try proving Proposition 3.12 on your own before you study its proof.

Proposition 3.12. A sequence {𝑥 𝑘 } 𝑘∈N can have at most one limit.



Proof. Suppose there are two limits 𝐿 and 𝐿′ of the sequence {𝑥 𝑘 } 𝑘∈N with 𝐿 ≠ 𝐿′.
We arrive at a contradiction as follows. Let 𝜀 = |𝐿 − 𝐿′ |/2 and consider 𝐾 and 𝐾 ′
such that 𝑘 ≥ 𝐾 implies |𝑥 𝑘 − 𝐿| = |𝐿 − 𝑥 𝑘 | < 𝜀 and 𝑘 ≥ 𝐾 ′ implies |𝑥 𝑘 − 𝐿′ | < 𝜀.
Consider some 𝑘 such that 𝑘 ≥ 𝐾 and 𝑘 ≥ 𝐾 ′. Then
2𝜀 = |𝐿 − 𝐿′ | = |𝐿 − 𝑥 𝑘 + 𝑥 𝑘 − 𝐿′ | ≤ |𝐿 − 𝑥 𝑘 | + |𝑥 𝑘 − 𝐿′ | < 𝜀 + 𝜀 = 2𝜀, (3.12)
which is a contradiction.
The symbol ∞ for (positive) infinity can be thought of as an extra element
that is larger than any real number. Similarly, −∞ is an additional element that is
smaller than any real number. In terms of the order ≤, it is unproblematic to extend
the set R with the elements ∞ and −∞. However, when used with arithmetic
operations these infinite elements are in general not useful and should not be
treated as “numbers”; for example, ∞ − ∞ cannot be meaningfully defined.
We say that a sequence is bounded (from above or below, or just bounded
if both) if this holds for the set of its elements. A sequence that converges is
necessarily bounded: Fix some 𝜀 (for example 𝜀 = 1). Choose 𝐾 in (3.11) such
that 𝐿 − 𝜀 < 𝑥 𝑘 < 𝐿 + 𝜀 for all 𝑘 ≥ 𝐾. If 𝐾 = 1 then the sequence is bounded by
𝐿 − 𝜀 from below and 𝐿 + 𝜀 from above. If 𝐾 > 1, then the set {𝑥 𝑖 | 1 ≤ 𝑖 < 𝐾} is
nonempty and finite, and thus has a maximum 𝑎 and minimum 𝑏. Then the larger
number of 𝐿 + 𝜀 and 𝑎 is an upper bound and the smaller number of 𝐿 − 𝜀 and 𝑏 is
a lower bound for the sequence.
An unbounded sequence may nevertheless show the “limiting behaviour” that
eventually its elements become arbitrarily large. We then say that the sequence
tends to infinity.
Definition 3.13. A sequence {𝑥 𝑘 } 𝑘∈N of real numbers tends to infinity, written
𝑥 𝑘 → ∞ as 𝑘 → ∞, or lim 𝑘→∞ 𝑥 𝑘 = ∞, if
∀𝑀 ∈ R ∃𝐾 ∈ N ∀𝑘 ∈ N : 𝑘 ≥ 𝐾 ⇒ 𝑥 𝑘 > 𝑀 . (3.13)

Instead of "tends to infinity" the sequence is also said to "diverge" to infinity.
In general, a sequence is said to diverge if it does not converge; note that even a
bounded sequence can diverge.

⇒ Prove, using the definition of a limit, that the sequence {𝑦 𝑘 } defined by
𝑦 𝑘 = (−1) 𝑘 does not converge (this is Exercise 3.1(b)).

To conclude this section, we will show a connection between boundedness and
convergence, namely that every bounded sequence has a convergent subsequence.
This makes crucial use of the order completeness of the reals (see Proposition 3.14
below).
We need a few more definitions. A subsequence of a sequence {𝑥 𝑘 } 𝑘∈N is a
sequence {𝑥 𝑘 𝑛 } 𝑛∈N where 𝑘 1 , 𝑘2 , . . . is an increasing sequence of natural numbers.

For example, these may be the even numbers given by 𝑘 𝑛 = 2𝑛, but they do not
need to be defined explicitly. The subsequence just considers some (infinite) subset
of the elements of the original sequence in increasing order.

⇒ Show that if a sequence converges, then every subsequence of that sequence
converges to the same limit.

A sequence {𝑥 𝑘 } 𝑘∈N is called increasing if (always for all 𝑘 ∈ N) 𝑥 𝑘 < 𝑥 𝑘+1 ,
weakly increasing if 𝑥 𝑘 ≤ 𝑥 𝑘+1 , decreasing if 𝑥 𝑘 > 𝑥 𝑘+1 , weakly decreasing if 𝑥 𝑘 ≥ 𝑥 𝑘+1 ,
and monotonic if it is weakly increasing or weakly decreasing.

Proposition 3.14. A weakly increasing sequence that is bounded from above converges to
the supremum of the set of its elements. A weakly decreasing sequence that is bounded
from below converges to the infimum of the set of its elements.

Proof. Let the sequence be {𝑥 𝑘 } and let 𝐴 = {𝑥 𝑘 | 𝑘 ∈ N} be the set of the elements of
the sequence. Assume 𝐴 is bounded from above, so 𝐴 has a supremum 𝐿 = sup 𝐴
by Axiom 3.5. Let 𝜀 > 0. We want to show that for some 𝐾 ∈ N we have |𝑥 𝑘 − 𝐿| < 𝜀
for all 𝑘 ≥ 𝐾. Because 𝐿 is an upper bound of 𝐴, we have 𝐿 ≥ 𝑥 𝑘 and thus
|𝑥 𝑘 − 𝐿| = 𝐿 − 𝑥 𝑘 for all 𝑘, so we want to show 𝐿 − 𝑥 𝑘 < 𝜀 or equivalently 𝐿 − 𝜀 < 𝑥 𝑘 .
It suffices to find 𝐾 with 𝐿 − 𝜀 < 𝑥 𝐾 because 𝑥 𝐾 ≤ 𝑥 𝐾+1 ≤ 𝑥 𝐾+2 ≤ · · · ≤ 𝑥 𝑘 for all
𝑘 ≥ 𝐾 because the sequence is weakly increasing. Now, if for all 𝐾 ∈ N we had
𝐿 − 𝜀 ≥ 𝑥 𝐾 then 𝐿 − 𝜀 would be an upper bound of 𝐴 which is less than 𝐿, but 𝐿 is
the least upper bound of 𝐴. So the desired 𝐾 with 𝐿 − 𝜀 < 𝑥 𝐾 exists.
The claim for the infimum is proved similarly, or obtained by considering the
sequence {−𝑥 𝑘 } and its supremum instead of the original sequence.

Proposition 3.15. Every sequence has a monotonic subsequence.

Proof. The following argument has a nice visualisation in terms of “hotels that have
a view of the sea”. Suppose the real numbers 𝑥1 , 𝑥2 , . . . are the heights of hotels.
From the top of each hotel with height 𝑥 𝑘 you can look beyond the subsequent
hotels with heights 𝑥 𝑘+1 , 𝑥 𝑘+2 , . . . if they have lower height, and see the sea at
infinity if these are all lower. In other words, a hotel has “seaview” if it belongs to
the set 𝑆 given by
𝑆 = {𝑘 ∈ N | 𝑥 𝑘 > 𝑥 𝑗 for all 𝑗 > 𝑘}
(presumably, these are very expensive hotels). If 𝑆 is infinite, then we take the
elements of 𝑆, in ascending order as the subscripts 𝑘 1 , 𝑘2 , 𝑘3 , . . . that give our
subsequence 𝑥 𝑘1 , 𝑥 𝑘2 , 𝑥 𝑘3 , . . ., which is clearly decreasing. If, however, 𝑆 is finite
with maximal element 𝐾 (take 𝐾 = 0 if 𝑆 is empty), then for each 𝑘 > 𝐾 we have
𝑘 ∉ 𝑆 and hence for 𝑥 𝑘 there exists some 𝑗 > 𝑘 with 𝑥 𝑘 ≤ 𝑥 𝑗 . Starting with 𝑥 𝑘1
for 𝑘 1 = 𝐾 + 1 we let 𝑘2 = 𝑗 > 𝑘 1 with 𝑥 𝑘1 ≤ 𝑥 𝑘2 . Then find another 𝑘 3 > 𝑘2 such
that 𝑥 𝑘2 ≤ 𝑥 𝑘3 , and so on, which gives a weakly increasing subsequence {𝑥 𝑘 𝑛 } 𝑛∈N

with 𝑥 𝑘1 ≤ 𝑥 𝑘2 ≤ 𝑥 𝑘3 ≤ · · · . In either case, the original sequence has a monotonic
subsequence.
The last two propositions together give a famous theorem known as the
“Bolzano–Weierstrass” theorem.

Theorem 3.16 (Bolzano–Weierstrass). Every bounded sequence has a convergent
subsequence.

Proof. The sequence has a monotonic subsequence by Proposition 3.15. Because the
sequence is bounded, so is the subsequence, whose set of elements therefore has a
supremum and infimum. If the subsequence is weakly increasing, it converges
to the supremum of its elements by Proposition 3.14; if the subsequence is weakly
decreasing, it converges to the infimum.
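The "seaview" idea in the proof of Proposition 3.15 can be transcribed for a finite prefix of a sequence (our own sketch; the function name and the greedy shortcut replacing the proof's "start after the last seaview hotel" step are ours). On a finite list this only illustrates the idea, since the proposition concerns infinite sequences.

```python
# Indices whose value exceeds everything later ("hotels with seaview")
# give a strictly decreasing subsequence; if there are fewer than two
# of them, a weakly increasing subsequence is grown greedily instead.

def monotonic_subsequence(x):
    n = len(x)
    # "seaview" indices: x[k] is larger than every later element
    seaview = [k for k in range(n) if all(x[k] > x[j] for j in range(k + 1, n))]
    if len(seaview) >= 2:
        return seaview            # indices of a strictly decreasing subsequence
    idx = [0]                     # otherwise grow a weakly increasing one
    for k in range(1, n):
        if x[k] >= x[idx[-1]]:
            idx.append(k)
    return idx

x = [3, 1, 4, 1, 5, 9, 2, 6]
idx = monotonic_subsequence(x)
print(idx, [x[k] for k in idx])   # [5, 7] [9, 6]
```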

3.7 Euclidean Norm and Maximum Norm

Recall that if 𝑋 and 𝑌 are sets and 𝑓 : 𝑋 → 𝑌 is a function, then 𝑋 is called the
domain and 𝑌 the range of 𝑓 . The range should be distinguished from the image
of 𝑓 , denoted by
𝑓 (𝑋) = { 𝑓 (𝑥) | 𝑥 ∈ 𝑋 } , (3.14)
which is the set of possible values of 𝑓 . The image 𝑓 (𝑋) is always a subset of the
range 𝑌. When 𝑓 (𝑋) = 𝑌 then 𝑓 is called a surjective function. Because we want
to maximise or minimise the function, the range will always be R, that is, 𝑓 is
real-valued.
The domain 𝑋 of 𝑓 will in the following be assumed to be a subset of R𝑛 ,
which is the set of 𝑛-tuples of real numbers,

R𝑛 = { (𝑥1 , . . . , 𝑥 𝑛 ) | 𝑥 𝑖 ∈ R for 1 ≤ 𝑖 ≤ 𝑛 } (3.15)

(𝑛 is some positive integer). The components 𝑥 𝑖 of an 𝑛-tuple (𝑥1 , . . . , 𝑥 𝑛 ) are also
called its coordinates. Often R𝑛 is called the 𝑛-dimensional Euclidean space and
its elements are called points or sometimes vectors. This generalises the familiar
geometric cases for 𝑛 = 1, 2, 3, where R1 is the real line, R2 the plane, and R3 the
three-dimensional space. In these cases, we will often use coordinates 𝑥, 𝑦, and 𝑧,
writing (𝑥, 𝑦) for the elements of R2 and (𝑥, 𝑦, 𝑧) for the elements of R3 , but this
notation may vary. It is always useful to consider simple cases as examples, where
often R1 or R2 suffices.
We will soon define formally what it means for a function to be continuous.
Intuitively, a continuous function maps nearby points to nearby points. Here, “nearby”
means “of arbitrarily small distance”. The distance between two real numbers 𝑥
and 𝑦 (that is, points on the real line) is simply |𝑥 − 𝑦|. There are several ways to

generalise this when 𝑥 and 𝑦 are points in R𝑛 . The standard “Euclidean” distance
is well known from measuring geometric distances in R2 , for example. In order to
deal with continuity, a distance function defined in terms of the “maximum norm”
will often be simpler to use.

Definition 3.17. Let 𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ) ∈ R𝑛 . Then the (Euclidean) norm ∥𝑥∥ of 𝑥 is
defined by

∥𝑥∥ = √( 𝑥₁² + 𝑥₂² + · · · + 𝑥ₙ² ) . (3.16)

The maximum norm ∥𝑥∥ max of 𝑥 is defined by

∥𝑥∥ max = max{ |𝑥1 |, . . . , |𝑥 𝑛 | } . (3.17)

For 𝑥, 𝑦 ∈ R𝑛 , the (Euclidean) distance between 𝑥 and 𝑦 is defined by

𝑑(𝑥, 𝑦) = ∥𝑥 − 𝑦∥ , (3.18)

and their maximum-norm distance by

𝑑max (𝑥, 𝑦) = ∥𝑥 − 𝑦∥ max . (3.19)

In Definition 3.17, the distance of two elements 𝑥 and 𝑦 of R𝑛 is defined in terms
of a norm as ∥𝑥 − 𝑦∥. For an arbitrary set instead of R𝑛 , a distance can be considered
to be any function that assigns a real number 𝑑(𝑥, 𝑦) to any two elements 𝑥, 𝑦 of the
set, provided it fulfills the following axioms. These axioms state: nonnegativity of
distance, and positive distance of distinct elements, according to

𝑑(𝑥, 𝑥) = 0, and 𝑥 ≠ 𝑦 ⇒ 𝑑(𝑥, 𝑦) > 0 , (3.20)

symmetry
𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥) , (3.21)
and the triangle inequality

𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧) . (3.22)

It can be shown that the maximum-norm distance fulfills these axioms (see
Exercise 3.3), and so does the Euclidean distance.
The triangle inequality is then often stated as

∥𝑥 + 𝑦∥ ≤ ∥𝑥∥ + ∥ 𝑦∥ (3.23)

which implies (3.22) using 𝑥 − 𝑦 and 𝑦 − 𝑧 instead of 𝑥 and 𝑦. For an arbitrary set,
a trivially defined distance function that also fulfills axioms (3.20)–(3.22) is given
by 𝑑(𝑥, 𝑥) = 0 and 𝑑(𝑥, 𝑦) = 1 for 𝑥 ≠ 𝑦.
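The axioms can also be spot-checked numerically. The following sketch (our own, not the guide's; it does not replace the proof asked for in Exercise 3.3) samples random points in R³ and tests (3.20)–(3.22) for the maximum-norm distance; the tiny slack in the triangle inequality allows for floating-point rounding.

```python
# Random spot check of the metric axioms for the maximum-norm distance.

import random

def d_max(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

random.seed(0)
pts = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(30)]
for x in pts:
    assert d_max(x, x) == 0                             # (3.20), first part
    for y in pts:
        if x != y:
            assert d_max(x, y) > 0                      # (3.20), second part
        assert d_max(x, y) == d_max(y, x)               # symmetry (3.21)
        for z in pts:
            assert d_max(x, z) <= d_max(x, y) + d_max(y, z) + 1e-12  # (3.22)
print("metric axioms hold on the sample")
```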

Let 𝜀 > 0 and 𝑥 ∈ R𝑛 . The set of all points 𝑦 that have distance less than 𝜀 is
called the 𝜀-ball around 𝑥, defined as
𝐵(𝑥, 𝜀) = {𝑦 ∈ R𝑛 | ∥𝑦 − 𝑥∥ < 𝜀 } . (3.24)
It is also called the open ball because the inequality in (3.24) is strict. That is, 𝐵(𝑥, 𝜀)
does not include its “boundary”, called a sphere, which consists of all points 𝑦
whose distance to 𝑥 is equal to 𝜀.
Similarly, the maximum-norm 𝜀-ball around a point 𝑥 in R𝑛 is defined as the set
𝐵max (𝑥, 𝜀) = {𝑦 ∈ R𝑛 | ∥𝑦 − 𝑥∥ max < 𝜀 } . (3.25)
The following is elementary but extremely useful:
∀𝑦 ∈ R𝑛 : ( 𝑦 ∈ 𝐵max (𝑥, 𝜀) ⇔ ∀𝑖 ∈ {1, . . . , 𝑛} : |𝑦 𝑖 − 𝑥 𝑖 | < 𝜀 ) . (3.26)
In other words, 𝑦 is in the maximum-norm 𝜀-ball around 𝑥 if and only if 𝑦 differs
from 𝑥 in each component by less than 𝜀. This follows immediately from (3.17).
The following picture shows the 𝜀-ball and the maximum-norm 𝜀-ball for
𝜀 = 1 around the origin 0 in R2 . The latter, 𝐵max (0, 1), is the set of all points (𝑥 1 , 𝑥2 )
so that −1 < 𝑥1 < 1 and −1 < 𝑥 2 < 1, which is the open square shown on the right.
[Figure: the Euclidean 𝜀-ball 𝐵(0, 1) (an open disk through (0, 1) and (1, 0), left)
and the maximum-norm 𝜀-ball 𝐵max (0, 1) (an open square with corner (1, 1), right),
both around the origin 0 in R2 with axes 𝑥1 and 𝑥2 .]

The following picture on the left

[Figure: left, the disk 𝐵(0, 1) inside the square 𝐵max (0, 1); right, the square
𝐵max (0, 1) inside the disk 𝐵(0, √2) through the corner (1, 1).]

illustrates for 𝑥 = 0 and 𝜀 = 1 that


𝐵(𝑥, 𝜀) ⊆ 𝐵max (𝑥, 𝜀) . (3.27)
For general 𝑥 and 𝜀 this can be seen as follows. Assume 𝑦 ∈ 𝐵(𝑥, 𝜀), that is,
∥𝑦 − 𝑥∥ < 𝜀. We have to show that then 𝑦 ∈ 𝐵max (𝑥, 𝜀), that is, ∥𝑦 − 𝑥∥ max < 𝜀, that
is, |𝑦 𝑘 − 𝑥 𝑘 | < 𝜀 for all 𝑘 = 1, . . . , 𝑛. But this holds because
|𝑦 𝑘 − 𝑥 𝑘 | = √( (𝑦 𝑘 − 𝑥 𝑘 )² ) ≤ √( ∑ᵢ₌₁ⁿ (𝑦 𝑖 − 𝑥 𝑖 )² ) = ∥𝑦 − 𝑥∥ < 𝜀, which shows (3.27).

The above picture on the right shows for 𝑛 = 2 that

𝐵max (𝑥, 𝜀) ⊆ 𝐵(𝑥, 𝜀√𝑛) . (3.28)

The reason is that the corner point (1, 1) of the square has farthest Euclidean distance
from (0, 0). So we can put the square into a disk of radius √2. In general, (3.28) is
seen as follows: Let 𝑦 ∈ 𝐵max (𝑥, 𝜀), that is, ∥𝑦 − 𝑥∥ max < 𝜀 and thus |𝑦 𝑘 − 𝑥 𝑘 | < 𝜀 for
1 ≤ 𝑘 ≤ 𝑛. Then ∥𝑦 − 𝑥∥ = √( ∑ᵢ₌₁ⁿ (𝑦 𝑖 − 𝑥 𝑖 )² ) < √( ∑ᵢ₌₁ⁿ 𝜀² ) = 𝜀√𝑛,
as claimed.
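Both inclusions (3.27) and (3.28) come down to the pointwise norm comparison ∥𝑥∥max ≤ ∥𝑥∥ ≤ √𝑛 ∥𝑥∥max, which can be sanity-checked numerically as follows (our own sketch; the helper names are ad hoc, and a random sample is not a proof).

```python
# Sanity check of  max-norm(x) <= Euclidean(x) <= sqrt(n) * max-norm(x)
# on random points in R^4.

import math
import random

def euclid(x):
    return math.sqrt(sum(t * t for t in x))

def maxnorm(x):
    return max(abs(t) for t in x)

random.seed(1)
n = 4
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert maxnorm(x) <= euclid(x) + 1e-9                  # behind (3.27)
    assert euclid(x) <= math.sqrt(n) * maxnorm(x) + 1e-9   # behind (3.28)
print("norm inequalities hold on the sample")
```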

3.8 Sequences and Convergence in R𝑛

In Section 3.6, we considered sequences of real numbers and their limits, if they
exist. In this section we give analogous definitions and results for sequences of
points in R𝑛 . Because 𝑥 𝑖 denotes a component of a point 𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ), we will
write sequence elements, which are now elements of R𝑛 , in the form 𝑥 (𝑘) for 𝑘 ∈ N.
Analogously to (3.11), the sequence {𝑥 (𝑘) } 𝑘∈N has limit 𝑥 ∈ R𝑛 , or 𝑥 (𝑘) → 𝑥 as
𝑘 → ∞, if
∀𝜀 > 0 ∃𝐾 ∈ N ∀𝑘 ∈ N : 𝑘 ≥ 𝐾 ⇒ ∥𝑥 (𝑘) − 𝑥 ∥ < 𝜀 . (3.29)
The sequence is bounded if there is some 𝑀 ∈ R so that
∀𝑘 ∈ N : ∥𝑥 (𝑘) ∥ ≤ 𝑀 . (3.30)
Analogously to Proposition 3.12, a sequence can have at most one limit. This is
proved in the same way, where the contradiction (3.12) is proved with the help of
the triangle inequality (3.23), using the norm instead of the absolute value.
In the definitions (3.29) and (3.30), we have used the Euclidean norm, but we
could have used in the same way the maximum norm as defined in (3.17) instead,
as asserted by the following lemma.
Lemma 3.18. The sequence {𝑥 (𝑘) } 𝑘∈N in R𝑛 has limit 𝑥 ∈ R𝑛 if and only if
∀𝜀 > 0 ∃𝐾 ∈ N ∀𝑘 ∈ N : 𝑘 ≥ 𝐾 ⇒ ∥𝑥 (𝑘) − 𝑥 ∥ max < 𝜀 , (3.31)
and it is bounded if and only if for some 𝑀 ∈ R
∀𝑘 ∈ N : ∥𝑥 (𝑘) ∥ max ≤ 𝑀 . (3.32)

Proof. Suppose {𝑥 (𝑘) } converges to 𝑥 in the Euclidean norm. Let 𝜀 > 0, and choose
𝐾 in (3.29) so that 𝑘 ≥ 𝐾 implies ∥𝑥 (𝑘) − 𝑥∥ < 𝜀, that is, 𝑥 (𝑘) ∈ 𝐵(𝑥, 𝜀). Because
𝐵(𝑥, 𝜀) ⊆ 𝐵max (𝑥, 𝜀) by (3.27), this also means 𝑥 (𝑘) ∈ 𝐵max (𝑥, 𝜀), which shows (3.31).
Conversely, assume (3.31) holds and let 𝜀 > 0. Choose 𝐾 so that 𝑘 ≥ 𝐾 implies
𝑥 (𝑘) ∈ 𝐵max (𝑥, 𝜀/√𝑛). Then 𝐵max (𝑥, 𝜀/√𝑛) ⊆ 𝐵(𝑥, 𝜀) by (3.28), which shows
(3.29).
The equivalence of (3.30) and (3.32) is shown similarly.
According to (3.29), a sequence {𝑥 (𝑘) } converges to 𝑥 when for every 𝜀 > 0
eventually (that is, for sufficiently large 𝑘) all elements of the sequence are in
the open ball 𝐵(𝑥, 𝜀) of radius 𝜀 around 𝑥. The same applies to (3.31), using the
(square- or cubical-looking) ball 𝐵max (𝑥, 𝜀). Another useful view of (3.31) is that
the sequence {𝑥 (𝑘) } converges to 𝑥 if for each component 𝑖 = 1, . . . , 𝑛 of these
𝑛-tuples, we have 𝑥𝑖(𝑘) → 𝑥 𝑖 as 𝑘 → ∞, because the condition

|𝑥𝑖(𝑘) − 𝑥 𝑖 | < 𝜀 (3.33)

for all 𝑖 = 1, . . . , 𝑛, if it holds for all 𝑘 ≥ 𝐾, is equivalent to (3.31), as shown in
(3.26). Conversely, if we have convergence in each component, that is, (3.33) holds
for 𝑘 ≥ 𝐾 𝑖 , for all 𝑖, then we can simply take 𝐾 = max{𝐾1 , . . . , 𝐾 𝑛 } to obtain (3.31).
To repeat, a sequence in R𝑛 converges if and only if it converges in each of its 𝑛
components.
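Componentwise convergence can be illustrated with a made-up sequence in R², say 𝑥 (𝑘) = (1/𝑘, 1 − 2/𝑘) with limit 𝑥 = (0, 1): an index 𝐾 that works in (3.31) is the maximum of per-component indices 𝐾1 , 𝐾2 . The sketch below (names are ad hoc) uses exact rational arithmetic to avoid floating-point boundary ambiguity.

```python
# Per-component witness indices K_i, combined as K = max(K_1, K_2).

from fractions import Fraction

def first_index_within(seq, limit, eps):
    # smallest k with |seq(k) - limit| < eps; enough here because both
    # component sequences approach their limits monotonically
    k = 1
    while abs(seq(k) - limit) >= eps:
        k += 1
    return k

eps = Fraction(1, 1000)
K1 = first_index_within(lambda k: Fraction(1, k), 0, eps)      # 1001
K2 = first_index_within(lambda k: 1 - Fraction(2, k), 1, eps)  # 2001
K = max(K1, K2)
print(K1, K2, K)   # 1001 2001 2001
```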

⇒ Similarly, prove that a sequence is bounded if and only if it is bounded in
each component, according to (3.32).

3.9 Open and Closed Sets

We are concerned with the behaviour of a function 𝑓 “near a point 𝑎”, that is, how
the function value 𝑓 (𝑥) behaves when 𝑥 is near 𝑎, where 𝑥 and 𝑎 are points in some
subset 𝑆 of R𝑛 . For that purpose, it is of interest whether 𝑎 can be approached
with suitable choices of 𝑥 from “all sides”, which is the case if there is an 𝜀-ball
around 𝑎 that is fully contained in 𝑆. If that is the case, then the set 𝑆 will be called
open according to the following definition.

Definition 3.19. Let 𝑆 ⊆ R𝑛 . Then 𝑆 is called open if

∀𝑎 ∈ 𝑆 ∃𝜀 > 0 : 𝐵(𝑎, 𝜀) ⊆ 𝑆 . (3.34)

By (3.27) and (3.28), we could use the maximum-norm ball instead of the
Euclidean-norm ball in (3.34), that is, 𝑆 is open if and only if

∀𝑎 ∈ 𝑆 ∃𝜀 > 0 : 𝐵max (𝑎, 𝜀) ⊆ 𝑆 . (3.35)



The following is a useful exercise.

⇒ Prove that the open balls 𝐵(𝑎, 𝜀) and 𝐵max (𝑎, 𝜀) are themselves open subsets
of R𝑛 .

Definition 3.20. An interior point of a subset 𝑆 of R𝑛 is a point 𝑎 so that 𝐵(𝑎, 𝜀) ⊆ 𝑆
for some 𝜀 > 0.

Hence, a set 𝑆 is open if all its elements are interior points of 𝑆.


Related to the concept of an open set is the concept of a closed set.

Definition 3.21. Let 𝑆 ⊆ R𝑛 . Then 𝑆 is called closed if for all 𝑎 ∈ R𝑛 and all
sequences {𝑥 (𝑘) } in 𝑆 (that is, 𝑥 (𝑘) ∈ 𝑆 for all 𝑘 ∈ N) with limit 𝑎 we have 𝑎 ∈ 𝑆.

Definition 3.22. A limit point of a subset 𝑆 of R𝑛 is a point 𝑎 ∈ R𝑛 so that there is a
sequence {𝑥 (𝑘) } in 𝑆 with limit 𝑎. Note that 𝑎 need not be an element of 𝑆.

Another common term for limit point is accumulation point. Clearly, a set is
closed if and only if it contains all its limit points. Trivially, every element 𝑎 of 𝑆 is a
limit point of 𝑆, by taking the constant sequence given by 𝑥 (𝑘) = 𝑎 in Definition 3.22.
The next lemma is important to show the connection between open and closed
sets.

Lemma 3.23. Let 𝑆 ⊆ R𝑛 and 𝑎 ∈ R𝑛 . Then 𝑎 is a limit point of 𝑆 if and only if

∀𝜀 > 0 : 𝐵(𝑎, 𝜀) ∩ 𝑆 ≠ ∅ . (3.36)

Proof. Suppose 𝑎 is a limit point of 𝑆 according to Definition 3.22, so that there is a
sequence {𝑥 (𝑘) } in 𝑆 with limit 𝑎. Let 𝜀 > 0. By (3.29), there exists 𝐾 ∈ N so that
∥𝑥 (𝑘) − 𝑎∥ < 𝜀 for all 𝑘 ≥ 𝐾, that is, 𝑥 (𝑘) ∈ 𝐵(𝑎, 𝜀), in particular for 𝑘 = 𝐾, so 𝐵(𝑎, 𝜀)
and 𝑆 contain the common element 𝑥 (𝐾) , which shows (3.36).
Conversely, suppose (3.36) holds. We want to find a sequence {𝑥 (𝑘) } in 𝑆 with
limit 𝑎. For that purpose, we choose a sequence of smaller and smaller distances 𝜀,
such as 1/𝑘 for 𝑘 ∈ N. By assumption, 𝐵(𝑎, 1/𝑘) ∩ 𝑆 ≠ ∅, so let 𝑥 (𝑘) be such an
element of 𝑆 that also belongs to 𝐵(𝑎, 1/𝑘), that is, ∥𝑥 (𝑘) − 𝑎 ∥ < 1/𝑘. This defines
our sequence {𝑥 (𝑘) } in 𝑆. This sequence has limit 𝑎. Namely, in order to prove (3.29)
for 𝑥 = 𝑎, let 𝜀 > 0. Let 𝐾 be an integer such that 𝐾 ≥ 1/𝜀. Then 𝑘 ≥ 𝐾 implies
𝑘 ≥ 1/𝜀 and hence ∥𝑥 (𝑘) − 𝑎∥ < 1/𝑘 ≤ 𝜀 as required.
The next theorem states the connection between open and closed sets: A set is
closed if and only if its set-theoretic complement is open.

Theorem 3.24. Let 𝑆 ⊆ R𝑛 and let 𝑇 = R𝑛 \ 𝑆 = {𝑥 ∈ R𝑛 | 𝑥 ∉ 𝑆}. Then 𝑆 is closed if
and only if 𝑇 is open.

Proof. Suppose 𝑆 is closed, so it contains all its limit points. We want to show that
𝑇 is open, so let 𝑎 ∈ 𝑇. We want to show that 𝐵(𝑎, 𝜀) ⊆ 𝑇 for some 𝜀 > 0. If that
were not the case, then for all 𝜀 > 0 there would be some element of 𝐵(𝑎, 𝜀) that
does not belong to 𝑇 and hence belongs to 𝑆, so that 𝐵(𝑎, 𝜀) ∩ 𝑆 ≠ ∅. But then 𝑎 is a limit
point of 𝑆 according to Lemma 3.23, hence 𝑎 ∈ 𝑆 because 𝑆 is closed, contrary to
the assumption that 𝑎 ∈ 𝑇.

Conversely, assume 𝑇 is open, so for all 𝑎 ∈ 𝑇 we have 𝐵(𝑎, 𝜀) ⊆ 𝑇 for some
𝜀 > 0. But then 𝐵(𝑎, 𝜀) ∩ 𝑆 = ∅, and thus 𝑎 is not a limit point of 𝑆. Hence 𝑆
contains all its limit points (if not, such a point would belong to 𝑇), so 𝑆 is closed.

It is possible that a set is both open and closed, which applies to the full set R𝑛
and to the empty set ∅. (For any “connected space” such as R𝑛 , these are the only
possibilities.) A set may also be neither open nor closed, such as the half-open
interval [0, 1) as a subset of R1 . This set does not contain its limit point 1 and is
therefore not closed. It is also not open, because its element 0 does not have a ball
𝐵(0, 𝜀) = (−𝜀, 𝜀) around it that is fully contained in [0, 1). Another example of a
set which is neither open nor closed is the set {1/𝑛 | 𝑛 ∈ N} which is missing its
limit point 0.
The following theorem states that the intersection of any two open sets 𝑆 and
𝑆′ is open, and the arbitrary union ⋃𝑖∈𝐼 𝑆 𝑖 of any open sets 𝑆 𝑖 is open. Similarly, the
union of any two closed sets 𝑆 and 𝑆′ is closed, and the arbitrary intersection
⋂𝑖∈𝐼 𝑆 𝑖 of any closed sets 𝑆 𝑖 is closed. Here 𝐼 is any (possibly infinite) nonempty
set of subscripts 𝑖 for the sets 𝑆 𝑖 , and

⋃𝑖∈𝐼 𝑆 𝑖 = {𝑥 | ∃𝑖 ∈ 𝐼 : 𝑥 ∈ 𝑆 𝑖 }   and   ⋂𝑖∈𝐼 𝑆 𝑖 = {𝑥 | ∀𝑖 ∈ 𝐼 : 𝑥 ∈ 𝑆 𝑖 } . (3.37)

Theorem 3.25. Let 𝑆, 𝑆′ ⊆ R𝑛 , and let 𝑆 𝑖 ⊆ R𝑛 for 𝑖 ∈ 𝐼 for some arbitrary nonempty
set 𝐼. Then
(a) If 𝑆 and 𝑆′ are both open, then 𝑆 ∩ 𝑆′ is open.
(b) If 𝑆 and 𝑆′ are both closed, then 𝑆 ∪ 𝑆′ is closed.
(c) If 𝑆 𝑖 is open for 𝑖 ∈ 𝐼, then ⋃𝑖∈𝐼 𝑆 𝑖 is open.
(d) If 𝑆 𝑖 is closed for 𝑖 ∈ 𝐼, then ⋂𝑖∈𝐼 𝑆 𝑖 is closed.

Proof. Assume both 𝑆 and 𝑆′ are open, and let 𝑎 ∈ 𝑆 ∩ 𝑆′. Then 𝐵(𝑎, 𝜀) ⊆ 𝑆 and
𝐵(𝑎, 𝜀′) ⊆ 𝑆′ for suitable positive 𝜀 and 𝜀′. The smaller of the two balls 𝐵(𝑎, 𝜀) and
𝐵(𝑎, 𝜀′) is therefore a subset of both sets 𝑆 and 𝑆′ and therefore of their intersection.
So 𝑆 ∩ 𝑆′ is open, which shows (a).
Condition (b) holds because if 𝑆 and 𝑆′ are closed, then 𝑇 = R𝑛 \ 𝑆 and
𝑇 ′ = R𝑛 \ 𝑆′ are open, and so is 𝑇 ∩ 𝑇 ′ by (a), and hence 𝑆 ∪ 𝑆′ = R𝑛 \ (𝑇 ∩ 𝑇 ′) is
open by Theorem 3.24.

To see (c), let 𝑆 𝑖 be open for all 𝑖 ∈ 𝐼, and let 𝑎 ∈ ∪𝑖∈𝐼 𝑆 𝑖 , that is, 𝑎 ∈ 𝑆 𝑗 for some
𝑗 ∈ 𝐼. Then there is some 𝜀 > 0 so that 𝐵(𝑎, 𝜀) is a subset of 𝑆 𝑗 , which is a subset of
the set ∪𝑖∈𝐼 𝑆 𝑖 which is therefore open.
We obtain (d) from (c) because the intersection of complements of sets is the
complement of their union, that is, ⋂𝑖∈𝐼 (R𝑛 \ 𝑆 𝑖 ) = R𝑛 \ ⋃𝑖∈𝐼 𝑆 𝑖 , which we consider
here for closed sets 𝑆 𝑖 and hence open sets R𝑛 \ 𝑆 𝑖 .

Note that by induction, Theorem 3.25(a) extends to the statement that the
intersection of any finite number of open sets is open. However, this is no longer
true for arbitrary intersections. For example, each of the intervals 𝑆𝑛 = (−1/𝑛 , 1/𝑛) for
𝑛 ∈ N is open, but their intersection ⋂𝑛∈N 𝑆𝑛 is the singleton {0} which is not an
open set. Similarly, arbitrary unions of closed sets are not necessarily closed, for
example the closed intervals [1/𝑛 , 1] for 𝑛 ∈ N, whose union is the half-open interval
(0, 1] which is not closed. However, (c) and (d) do allow arbitrary unions of open
sets and arbitrary intersections of closed sets.

3.10 Bounded and Compact Sets

Condition (3.30) states what it means for a sequence {𝑥 (𝑘) } to be bounded. The same
definition applies to a set.

Definition 3.26. Let 𝑆 ⊆ R𝑛 . Then 𝑆 is called bounded if there is some 𝑀 ∈ R so
that ∥𝑥∥ < 𝑀 for all 𝑥 ∈ 𝑆.

In other words, 𝑆 is bounded if it is contained in some 𝑀-ball around the origin
0 in R𝑛 . Because of (3.27) and (3.28), 𝑆 is bounded if and only if 𝑆 is contained in
some maximum-norm 𝑀-ball around the origin 0, that is, 𝑆 ⊆ 𝐵max (0, 𝑀) or

∀(𝑥1 , . . . , 𝑥 𝑛 ) ∈ 𝑆 ∀ 𝑖 = 1, . . . , 𝑛 : | 𝑥 𝑖 | < 𝑀 . (3.38)

That is, 𝑆 is bounded if and only if the components 𝑥 𝑖 of the points 𝑥 in 𝑆 are
bounded.
Theorem 3.16 states that a bounded sequence in R has a convergent subsequence.
The same holds for R𝑛 instead of R.

Theorem 3.27 (Bolzano–Weierstrass in R𝑛 ). Every bounded sequence in R𝑛 has a
convergent subsequence.

Proof. Consider a bounded sequence {𝑥 (𝑘) } in R𝑛 , where 𝑥 (𝑘) = (𝑥1(𝑘) , . . . , 𝑥𝑛(𝑘) ) for
each 𝑘. Because the sequence is bounded, the sequence of 𝑖th components {𝑥𝑖(𝑘) }
is bounded in R, for each 𝑖 = 1, . . . , 𝑛. In particular, the sequence {𝑥1(𝑘) } given
by the first component is bounded, and because it is a bounded sequence of real
numbers, it has a convergent subsequence by Theorem 3.16, call it {𝑥1(𝑘 𝑗 ) } where
𝑘 𝑗 for 𝑗 = 1, 2, . . . indicates the subsequence. That is, the sequence {𝑥 (𝑘 𝑗 ) } 𝑗∈N of
points in R𝑛 converges in its first component. We now consider the sequence
of real numbers {𝑥2(𝑘 𝑗 ) } 𝑗∈N given by the second components of the elements of that
subsequence. Again, by Theorem 3.16, this sequence has a convergent subsequence
for suitable values 𝑗ℓ for ℓ = 1, 2, . . ., so that the resulting sequence {𝑥 (𝑘 𝑗ℓ ) }ℓ ∈N
of points in R𝑛 converges in its second component. Because the
subscripts 𝑘 𝑗ℓ for ℓ ∈ N define a subsequence of 𝑘 𝑗 for 𝑗 ∈ N, the first components
𝑥1(𝑘 𝑗ℓ ) of these vectors are a subsequence of the convergent sequence {𝑥1(𝑘 𝑗 ) }, which is
therefore also convergent with the same limit.

So the sequence of vectors {𝑥 (𝑘 𝑗ℓ ) }ℓ ∈N converges in its first and second
components. We now proceed in the same manner by considering the sequence
of third components {𝑥3(𝑘 𝑗ℓ ) }ℓ ∈N of these vectors, which again has a convergent
subsequence since these are bounded real numbers, and that subsequence now
defines a sequence of vectors in R𝑛 that converge in their first, second, and
third components. By continuing in this manner, we obtain eventually, after 𝑛
applications of Theorem 3.16, a subsequence of the original sequence {𝑥 (𝑘) } 𝑘∈N that
converges in each component. As mentioned at the end of Section 3.8, convergence
in each component means overall convergence. So we have found the required
subsequence.
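The componentwise argument can be sketched on a finite prefix of a toy bounded sequence, say 𝑥 (𝑘) = ((−1) 𝑘 , sin 𝑘) in R² (our choice of example): first pass to a subsequence that converges in the first component (even 𝑘, where (−1) 𝑘 is constantly 1), then refine it by a weakly increasing, hence convergent, subsequence in the second component. This is an illustration only; the proof works with the full infinite sequence.

```python
# Two refinement steps, one per component, mimicking the proof idea.

import math

N = 500
idx = [k for k in range(1, N) if k % 2 == 0]   # step 1: (-1)^k = 1 for even k

sub = [idx[0]]                                 # step 2: greedy weakly increasing
for k in idx[1:]:                              # subsequence of sin k
    if math.sin(k) >= math.sin(sub[-1]):
        sub.append(k)

vals = [math.sin(k) for k in sub]
assert all((-1) ** k == 1 for k in sub)              # constant in component 1
assert all(a <= b for a, b in zip(vals, vals[1:]))   # monotone, bounded by 1
print(len(sub), round(vals[-1], 3))
```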
By the previous theorem, every sequence of points in a bounded subset 𝑆 of R𝑛
has a convergent subsequence. If the limit of such a subsequence always belongs
to 𝑆, then 𝑆 is called compact, which is a very important concept.

Definition 3.28. Let 𝑆 ⊆ R𝑛 . Then 𝑆 is called compact if and only if every sequence
of points in 𝑆 has a convergent subsequence whose limit belongs to 𝑆.

The following characterisation of compact sets in R𝑛 is very useful.

Theorem 3.29. Let 𝑆 ⊆ R𝑛 . Then 𝑆 is compact if and only if 𝑆 is closed and bounded.

Proof. Assume first that 𝑆 is closed and bounded, and consider an arbitrary
sequence of points in 𝑆. Then by Theorem 3.27, this sequence has a convergent
subsequence with limit 𝑥, say, which belongs to 𝑆 because 𝑆 is closed. So 𝑆 is
compact according to Definition 3.28.
Conversely, assume 𝑆 is compact. Consider any convergent sequence of
points in 𝑆 with limit 𝑥. Because 𝑆 is compact, that sequence has a convergent
subsequence, whose limit is also 𝑥, and which belongs to 𝑆. So every limit point of
𝑆 belongs to 𝑆, which means that 𝑆 is closed.
In order to show that 𝑆 is bounded, assume this is not the case. Then
for every 𝑘 ∈ N there is a point that we call 𝑥 (𝑘) in 𝑆 with ∥𝑥 (𝑘) ∥ ≥ 𝑘. This

defines an unbounded sequence {𝑥 (𝑘) } 𝑘∈N in 𝑆 in which clearly every subsequence
is also unbounded and therefore cannot converge (every convergent sequence is
bounded), in contradiction to the assumption that 𝑆 is compact. This proves that 𝑆
is bounded, as claimed.
Because a subset of a bounded set is clearly bounded, we immediately obtain
the following consequence of Theorem 3.29.

Corollary 3.30. A closed subset of a compact set is compact.

3.11 Continuity

We now consider functions that are defined on a subset 𝑆 of R𝑛 . The concepts
of being open, closed, or bounded apply to such sets 𝑆. These are "topological"
properties of 𝑆, which means they refer to the way points in 𝑆 can be “approached”,
for example by sequences. If 𝑆 is open, then any point in 𝑆 has an 𝜀-ball around
it that belongs to 𝑆, for sufficiently small 𝜀. If 𝑆 is closed, then 𝑆 contains all its
limit points. If 𝑆 is bounded, then every sequence in 𝑆 is bounded, and moreover
boundedness is necessary for compactness.
The central notion of “topology” is continuity, which refers to a function 𝑓 and
means that 𝑓 preserves “nearness”. That is, a function 𝑓 is continuous if it maps
nearby points to nearby points. Here, “nearby” means “arbitrarily close”. “Closeness”
is defined in terms of the distance between two points, according to the Euclidean
norm or the maximum norm, as discussed in Section 3.7. Basically, we can say that
a function is continuous if it preserves limits in the sense that

lim𝑥 𝑘 →𝑥 𝑓 (𝑥 𝑘 ) = 𝑓 (𝑥) (3.39)

where {𝑥 𝑘 } is an arbitrary sequence that converges to 𝑥, or

lim 𝑘→∞ 𝑓 (𝑥 𝑘 ) = 𝑓 ( lim 𝑘→∞ 𝑥 𝑘 )

assuming that lim 𝑘→∞ 𝑥 𝑘 exists, which is called 𝑥 in (3.39).
We now give a formal definition of continuity. For now, no particular topologi-
cal property is assumed about the domain 𝑆 of the function. Moreover, we define
this for functions that may take values in some R𝑚 , because it can be stated in the
same manner for 𝑚 = 1 or any other positive integer 𝑚.

Definition 3.31. Let 𝑆 ⊆ R𝑛 , let 𝑓 : 𝑆 → R𝑚 be a function defined on 𝑆, and let
x̄ ∈ 𝑆. Then 𝑓 is called continuous at x̄ if

∀𝜀 > 0 ∃𝛿 > 0 ∀𝑥 ∈ 𝑆 : ∥𝑥 − x̄∥ < 𝛿 ⇒ ∥ 𝑓 (𝑥) − 𝑓 (x̄)∥ < 𝜀 . (3.40)

The function 𝑓 is called continuous on 𝑆 if it is continuous at all points of 𝑆.



The condition ∥ 𝑓 (𝑥) − 𝑓 (x̄)∥ < 𝜀 in (3.40) states that 𝑓 (𝑥) belongs to the 𝜀-ball
around 𝑓 (x̄) (in R𝑚 ), which says that the function values 𝑓 (𝑥) are "close" to 𝑓 (x̄).
This is required to hold for all points 𝑥 provided these belong to 𝑆 (so that 𝑓 (𝑥)
is defined) and are within a 𝛿-ball around x̄. Here 𝛿 can be chosen as small as
required but must be positive. This captures the intuition that 𝑓 maps points near
x̄ to points near 𝑓 (x̄).

A simple example of a function that is not continuous is the function 𝑓 : R → R
defined by

𝑓 (𝑥) = 0 if 𝑥 ≠ 0,   and   𝑓 (𝑥) = 1 if 𝑥 = 0, (3.41)

which is not continuous at x̄ = 0. Namely, if we choose 𝜀 = 1/2, for example, then
for any 𝛿 > 0 we will always have a point 𝑥 near x̄ so that we have ∥𝑥 − x̄∥ < 𝛿
but ∥ 𝑓 (𝑥) − 𝑓 (x̄)∥ ≥ 𝜀, in this case for example 𝑥 = 𝛿/2 where ∥ 𝑓 (𝑥) − 𝑓 (x̄)∥ = 1,
which contradicts the requirement (3.40).
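The jump can also be seen through sequences (a numerical restatement of the argument above, anticipating Proposition 3.32): 𝑥 𝑘 = 1/𝑘 tends to 0, but 𝑓 (𝑥 𝑘 ) = 0 for every 𝑘 while 𝑓 (0) = 1, so the values 𝑓 (𝑥 𝑘 ) do not tend to 𝑓 (0).

```python
# The jump function of (3.41): limits of values along 1/k miss f(0).

def f(x):
    return 1.0 if x == 0 else 0.0

vals = {f(1 / k) for k in range(1, 100)}
print(vals, f(0))   # {0.0} 1.0
```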

[Figure 3.1 Plot of the function 𝑔(𝑥, 𝑦) in (3.42).]

The following function g : ℝ² → ℝ is a much more sophisticated example (a
plot of this function is shown in Figure 3.1):

    g(x, y) = { 0               if (x, y) = (0, 0),
              { xy / (x² + y²)  otherwise.                               (3.42)
What is interesting about the function 𝑔 is that it is separately continuous in each
variable, which means the following: Consider 𝑔(𝑥, 𝑦) as a function of 𝑥 for fixed 𝑦,

that is, consider the function g_y : ℝ → ℝ given by g_y(x) = g(x, y). If y = 0, then we
clearly have g_y(x) = 0 for all x, which is certainly a continuous function. If y ≠ 0,
then y² > 0, and g_y given by g_y(x) = xy / (x² + y²) is also continuous. Because g(x, y)
is symmetric in x and y, the function g is also separately continuous in y. However,
g is not a continuous function when its arguments are allowed to vary jointly.
Namely, for x = y ≠ 0 we have g(x, y) = g(x, x) = x² / (x² + x²) = 1/2, so this function
is constant on the diagonal but with a different constant value 1/2 than g(0, 0), which
is zero. That is, g is not continuous at (0, 0).
A useful criterion to prove that a function is continuous relates to our initial
consideration of this section in terms of sequences.
Proposition 3.32. Let S ⊆ ℝⁿ, let f : S → ℝᵐ be a function defined on S, and let x̄ ∈ S.
Then f is continuous at x̄ if and only if for all sequences {x^(k)} in S that converge to x̄

    lim_{k→∞} f(x^(k)) = f(x̄) .                                         (3.43)

Proof. Suppose f is continuous according to Definition 3.31, and let {x^(k)} be a
sequence in S that converges to x̄. We want to show that f(x^(k)) → f(x̄) as k → ∞.
For given ε > 0, we choose δ > 0 so that ‖x − x̄‖ < δ ⇒ ‖f(x) − f(x̄)‖ < ε for all
x ∈ S according to (3.40). Because x^(k) → x̄, there is some natural number K so
that ‖x^(k) − x̄‖ < δ whenever k ≥ K according to (3.29) (with δ instead of ε). Then
k ≥ K implies ‖f(x^(k)) − f(x̄)‖ < ε, as required.
    Conversely, assume that (3.43) holds for all sequences {x^(k)} in S that converge
to x̄. Suppose f is not continuous according to Definition 3.31. That is, there
is some ε > 0 so that for all δ > 0 there is some point x in S with ‖x − x̄‖ < δ
but ‖f(x) − f(x̄)‖ ≥ ε. Similarly to the proof of Lemma 3.23, we let x^(k) be such
a point for δ = 1/k, that is, ‖x^(k) − x̄‖ < 1/k but ‖f(x^(k)) − f(x̄)‖ ≥ ε, for all
k ∈ ℕ. This clearly defines a sequence {x^(k)} in S that converges to x̄ but where the
corresponding images f(x^(k)) under f do not converge to f(x̄), a contradiction.
With the help of this proposition, we see that g as defined in (3.42) is not
continuous at (0, 0) by considering the sequence (x, y)^(k) = (1/k, 1/k), which
converges to (0, 0), but where g((x, y)^(k)) = g(1/k, 1/k) = 1/2, so these function
values of g do not converge to g(0, 0) = 0.
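As an informal numerical check of this sequence argument, the following Python sketch evaluates g along the diagonal sequence (1/k, 1/k) and, for contrast, along the axis sequence (1/k, 0):

```python
# Check that g(x, y) = xy / (x^2 + y^2) (with g(0, 0) = 0) is not continuous
# at the origin: along (1/k, 1/k) the values stay at 1/2, while along
# (1/k, 0) they stay at 0 = g(0, 0).

def g(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x * x + y * y)

diagonal = [g(1.0 / k, 1.0 / k) for k in range(1, 6)]
axis = [g(1.0 / k, 0.0) for k in range(1, 6)]
print(diagonal)  # [0.5, 0.5, 0.5, 0.5, 0.5]
print(axis)      # [0.0, 0.0, 0.0, 0.0, 0.0]
```

The two sequences of arguments both converge to (0, 0), but the function values converge to different limits, so no value of g(0, 0) can make g continuous there.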

3.12 Proving Continuity

Intuitively, a function 𝑓 from R𝑛 to R is continuous if its graph has no “disruptions”.


This graph is a subset of R𝑛+1 given by the points (𝑥1 , . . . , 𝑥 𝑛 , 𝑓 (𝑥1 , . . . , 𝑥 𝑛 )). The
function 𝑓 in (3.41) is not continuous at 0, and the function (𝑥, 𝑦) ↦→ 𝑔(𝑥, 𝑦) in
(3.42) is not continuous because of a similar disruption at 𝑥 = 0 when considered
for its arguments (𝑥, 𝑥), for example.

The graph of a function f from ℝ¹ to ℝ has no disruptions if it can be drawn as a
continuous line. In order to prove this formally, note that condition (3.40) states
that if the argument x of f(x) is less than distance δ away from x̄, then f(x) is less
than ε away from f(x̄). So the challenge is to identify how small δ needs to be for
a given ε. We demonstrate this with two familiar but nontrivial examples. The
functions will have real values and arguments, so that ‖x − x̄‖ is just |x − x̄|.

Figure 3.2 Graph of the function f(x) = 1/x for x > 0, with the target interval
(f(x̄) − ε, f(x̄) + ε) around f(x̄) marked on the vertical axis.

The first function we consider is 𝑓 (𝑥) = 1/𝑥. The function is not defined for
𝑥 = 0. If we extend 𝑓 with some value for 𝑓 (0) in order to have 𝑓 (𝑥) defined
for all 𝑥 ∈ R, the resulting function is surely not continuous at 0 because 𝑓 (𝑥)
is arbitrarily negative for negative 𝑥 that are close to 0, and arbitrarily positive
for small positive 𝑥, and these function values will not approach 𝑓 (0) no matter
how we defined 𝑓 (0). But the domain of 𝑓 is not R but R \ {0}, which is an open
set, and on this set we will show that 𝑓 is continuous. In short, “continuity at 0”
is not an issue because 𝑓 (0) is not defined. In order to simplify things, we just
consider f(x) = 1/x for x ∈ (0, ∞), that is, x > 0, where we want to show that f is
continuous at x̄; a similar consideration applies to x̄ ∈ (−∞, 0).
    Let x̄ > 0. We want to show continuity of f at x̄. Our proof will involve
algebraic manipulations, but it is also very helpful to draw a graph of the function,
as in Figure 3.2, to understand what needs to be done. Given some ε > 0, we want
to ensure that |f(x) − f(x̄)| < ε for all x such that |x − x̄| < δ. The choice of δ will

depend on ε, but also on x̄ because the graph of f becomes very steep when x̄ is
close to zero.
    We now work backwards from the condition |f(x) − f(x̄)| < ε in order to
obtain a suitable constraint on |x − x̄|:

    |f(x) − f(x̄)| < ε
    ⇔  |1/x − 1/x̄| < ε
    ⇔  |x − x̄| / |x x̄| < ε                                              (3.44)
    ⇔  |x − x̄| < ε x x̄ .

The last inequality in (3.44) is a condition on |x − x̄|, but we cannot choose δ = ε x x̄
because this expression does not depend solely on ε and x̄ but also on x. However,
all we want is that |x − x̄| < δ implies |x − x̄| < ε x x̄. It is not necessary, as in
the dashed lines in Figure 3.2, that f(x̄ − δ) meets exactly one of the endpoints
f(x̄) ± ε of the “target interval” of the function values. Any positive δ that is at
most ε x x̄ will serve the purpose. If δ is x̄/2 or less, then |x − x̄| < δ clearly implies
x ∈ (x̄/2, 3x̄/2) and thus in particular x > x̄/2. With that consideration, we let

    δ = min{ x̄/2, ε x̄²/2 }.                                             (3.45)

Then |x − x̄| < δ implies x̄/2 < x and thus |x − x̄| < δ ≤ ε x̄²/2 < ε x x̄ and therefore
|f(x) − f(x̄)| < ε according to (3.44), as intended.
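As an informal sanity check (a sketch, not part of the proof), one can sample points numerically and confirm that the choice δ = min{x̄/2, εx̄²/2} from (3.45) works, and observe how δ must shrink as x̄ approaches 0:

```python
# Check the delta from (3.45) for f(x) = 1/x at several points xbar:
# every sampled x with |x - xbar| < delta satisfies |1/x - 1/xbar| < eps.

def delta_for(xbar, eps):
    return min(xbar / 2, eps * xbar * xbar / 2)

eps = 0.1
for xbar in [0.01, 0.5, 1.0, 10.0]:
    d = delta_for(xbar, eps)
    for i in range(1, 1000):  # sample points strictly inside (xbar - d, xbar + d)
        x = xbar - d + 2 * d * i / 1000
        assert abs(1 / x - 1 / xbar) < eps, (xbar, x)
    print(f"xbar = {xbar}: delta = {d}")
```

The printed deltas illustrate the dependence on the point x̄: near zero the admissible δ is tiny, which is exactly why this δ cannot be chosen uniformly.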
In the preceding proof, the function f : x ↦ 1/x was shown to be continuous
at x̄ by choosing δ = ε x̄²/2 (which for small ε also implies δ ≤ x̄/2 as required in
(3.45)). We see here that we have to choose δ as a function not only of ε but also of
the point x̄ at which we want to prove continuity.
    As an aside, the concept of uniform continuity means that δ can be chosen as a
function of ε only. That is, a function f : S → ℝ is called uniformly continuous if

    ∀ε > 0  ∃δ > 0  ∀x̄ ∈ S  ∀x ∈ S :  ‖x − x̄‖ < δ  ⇒  |f(x) − f(x̄)| < ε .   (3.46)

In contrast, the function f is just continuous if (3.40) holds prefixed with the
quantification ∀x̄ ∈ S, so that δ can be chosen depending on ε and x̄, as in (3.45).
It can be shown that a continuous function on a compact domain is uniformly
continuous.
⇒ Show that the function 𝑓 : (0, ∞) → R, 𝑓 (𝑥) = 1/𝑥 is not uniformly continuous.
Note that its domain is not compact.

Our second example of a continuous function is f : [0, ∞) → ℝ, f(x) = √x.
For x > 0, the function f has the derivative f′(x) = ½ x^(−1/2). At x = 0, the function f
has no derivative because the slope of the graph becomes arbitrarily steep near 0. The

graph of f is a flipped parabola arc and f is clearly continuous. We prove this
using the definition of continuity (3.40), similar to the equivalences in (3.44):

    |f(x) − f(x̄)| < ε
    ⇔  |√x − √x̄| < ε
    ⇔  (√x − √x̄)² < ε²                                                  (3.47)
    ⇔  x + x̄ < ε² + 2√(x x̄).

We now consider (3.47) separately for the two cases x ≥ x̄ and x < x̄. If x ≥ x̄, then
we rewrite x + x̄ < ε² + 2√(x x̄) equivalently as

    |x − x̄| = x − x̄ < ε² + 2(√(x x̄) − x̄),                               (3.48)

where this inequality is implied by |x − x̄| < ε² because √(x x̄) − x̄ ≥ √(x̄ x̄) − x̄ = 0.
Similarly, if x < x̄, then we rewrite x + x̄ < ε² + 2√(x x̄) equivalently as

    |x − x̄| = x̄ − x < ε² + 2(√(x x̄) − x),                               (3.49)

where this inequality is again implied by |x − x̄| < ε² because √(x x̄) − x ≥ √(x x) − x = 0.
In both cases, if we choose δ = ε², then (3.40) holds, which proves that f is
continuous, and in fact uniformly continuous on S = [0, ∞) according to (3.46).
(This is an example of a function that is uniformly continuous even though its
domain S is not compact.) Note that we do not need to worry that |x − x̄| < δ
implies x ∈ S because that condition is also imposed in (3.40) and (3.46). For the
function x ↦ 1/x defined for all (positive or negative) x ≠ 0, we also have to make
sure that x and x̄ have the same sign because otherwise 1/x and 1/x̄ would be
very far apart, but this follows from (3.45).
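To make the contrast with 1/x concrete, this informal Python sketch checks that the single choice δ = ε² works for √x at sampled points ranging from 0 to 10⁶, the hallmark of uniform continuity:

```python
# Check that delta = eps^2 works uniformly for f(x) = sqrt(x) on [0, inf):
# one and the same delta is valid at every sampled point xbar.
import math

eps = 0.01
delta = eps * eps
for xbar in [0.0, 1e-6, 0.5, 1.0, 100.0, 1e6]:
    for i in range(1, 200):  # sample x strictly inside (xbar - delta, xbar + delta)
        x = max(0.0, xbar - delta + 2 * delta * i / 200)
        assert abs(math.sqrt(x) - math.sqrt(xbar)) < eps
print("delta =", delta, "works at every sampled point")
```

Note that the hardest case is x̄ = 0, where the graph is steepest; the proof above shows δ = ε² is exactly tight there.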
We now prove the continuity of some functions on R𝑛 . The following lemma
states that we can replace the Euclidean norm in (3.40) with the maximum norm,
which in many situations is more convenient to use.

Lemma 3.33. Let S ⊆ ℝⁿ, let f : S → ℝᵐ be a function defined on S, and let x̄ ∈ S.
Then f is continuous at x̄ if and only if

    ∀ε > 0  ∃δ > 0  ∀x ∈ S :  ‖x − x̄‖_max < δ  ⇒  ‖f(x) − f(x̄)‖_max < ε .   (3.50)

Proof. Suppose first f is continuous at x̄. Let ε > 0 and choose δ > 0 so that (3.40)
holds, and let δ′ = δ/√n. Then x ∈ B_max(x̄, δ′) ⊆ B(x̄, δ) by (3.28), which implies
f(x) ∈ B(f(x̄), ε) by choice of δ and thus f(x) ∈ B_max(f(x̄), ε) by (3.27), which
implies (3.50) (with δ′ instead of δ) as claimed.
    Conversely, given (3.50) and ε > 0, we choose δ > 0 so that x ∈ B_max(x̄, δ)
implies f(x) ∈ B_max(f(x̄), ε/√m). Then x ∈ B(x̄, δ) implies x ∈ B_max(x̄, δ) by (3.27)
and thus f(x) ∈ B_max(f(x̄), ε/√m) ⊆ B(f(x̄), ε) by (3.28), which proves (3.40).
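The ball inclusions (3.27) and (3.28) used in this proof amount to the inequalities ‖x‖_max ≤ ‖x‖ ≤ √n · ‖x‖_max. As an informal spot-check (the dimension n = 5 is an arbitrary choice), this Python sketch verifies them on random vectors:

```python
# Spot-check the norm inequalities  ||x||_max <= ||x|| <= sqrt(n) * ||x||_max
# behind the ball inclusions (3.27) and (3.28) used in Lemma 3.33.
import math
import random

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    euclid = math.sqrt(sum(t * t for t in x))
    maxnorm = max(abs(t) for t in x)
    assert maxnorm <= euclid <= math.sqrt(n) * maxnorm + 1e-12
print("both inequalities hold on all samples")
```

The left inequality is tight for vectors along a coordinate axis, the right one for vectors with all components equal in absolute value.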

We now prove that the multiplication of real numbers is a continuous operation,
that is, that the function f : ℝ² → ℝ, f(x, y) = x · y, is continuous. This is intuitive
from the graph of f, which has no disruptions, but we prove it formally. As before,
let ε > 0, let (x̄, ȳ) ∈ ℝ², and we want to find how close (x, y) needs to be to (x̄, ȳ)
to prove that f(x, y) is close to f(x̄, ȳ), that is,

    |xy − x̄ȳ| < ε .                                                     (3.51)

Using the triangle inequality, we have

    |xy − x̄ȳ| = |xy − x̄y + x̄y − x̄ȳ| ≤ |xy − x̄y| + |x̄y − x̄ȳ|
              = |x − x̄| |y| + |x̄| |y − ȳ|                               (3.52)

so that we have proved (3.51) if we can prove

    |x − x̄| |y| < ε/2,     |x̄| |y − ȳ| < ε/2 .                          (3.53)

The second inequality in (3.53) produces an easy constraint on |y − ȳ|: Let

    δ_y = ε / (2(|x̄| + 1))                                              (3.54)

(the denominator is chosen to avoid complications if x̄ = 0), so that |y − ȳ| < δ_y
implies |x̄| |y − ȳ| < ε/2. The first inequality in (3.53) is (if y ≠ 0) equivalent to
|x − x̄| < ε/(2|y|), but y is not fixed, so we use that y is close to ȳ. Assume that δ_y ≤ 1
(if, as defined in (3.54), δ_y > 1, then we just set δ_y = 1). Then

    |y − ȳ| < δ_y  ⇒  |y| = |y − ȳ + ȳ| ≤ |y − ȳ| + |ȳ| < 1 + |ȳ| .     (3.55)

Define

    δ_x = ε / (2(|ȳ| + 1)) .                                            (3.56)

Then |x − x̄| < δ_x and |y − ȳ| < δ_y imply |x − x̄| |y| < ε/2, that is, the first inequality
in (3.53). Now let δ = min{δ_x, δ_y}. Then ‖(x, y) − (x̄, ȳ)‖_max < δ implies |x − x̄| < δ_x
and |y − ȳ| < δ_y, which in turn imply (3.53) and therefore (3.51). With Lemma 3.33,
this shows the continuity of the function (x, y) ↦ xy.
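The choice of δ_x and δ_y can again be tested numerically. This Python sketch (the capping of δ_y at 1 follows the assumption made in the proof) checks the resulting δ at a few points:

```python
# Check the deltas from (3.54)-(3.56) for f(x, y) = x * y at (xbar, ybar):
# if |x - xbar| < delta and |y - ybar| < delta then |x*y - xbar*ybar| < eps.

def delta_for(xbar, ybar, eps):
    delta_y = min(1.0, eps / (2 * (abs(xbar) + 1)))
    delta_x = eps / (2 * (abs(ybar) + 1))
    return min(delta_x, delta_y)

eps = 0.5
for xbar, ybar in [(0.0, 0.0), (3.0, -4.0), (100.0, 0.1)]:
    d = delta_for(xbar, ybar, eps)
    for i in range(1, 50):
        for j in range(1, 50):
            x = xbar - d + 2 * d * i / 50
            y = ybar - d + 2 * d * j / 50
            assert abs(x * y - xbar * ybar) < eps
    print((xbar, ybar), "-> delta =", d)
```

Observe that δ shrinks as |x̄| or |ȳ| grows, which matches the intuition that multiplication amplifies perturbations near large arguments.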
This is an important observation: the arithmetic operation of multiplication
is continuous, and it is also easy to prove that addition, that is, the function
(𝑥, 𝑦) ↦→ 𝑥 + 𝑦, is continuous. Similarly, the function 𝑥 ↦→ −𝑥 is continuous, which
is nearly trivial compared to proving that 𝑥 ↦→ 1/𝑥 (for 𝑥 ≠ 0) is continuous.
The following lemma exploits that we have defined continuity for functions
that take values in R𝑚 and not just in R1 . It states that the composition of continuous
functions is continuous. Recall that 𝑓 (𝑆) is the image of 𝑓 as defined in (3.14).

Lemma 3.34. Let S ⊆ ℝⁿ, T ⊆ ℝᵐ, and let f : S → ℝᵐ and g : T → ℝ^ℓ be functions so
that f(S) ⊆ T. Then if f and g are continuous, their composition g ∘ f : S → ℝ^ℓ, given by
(g ∘ f)(x) = g(f(x)) for x ∈ S, is also continuous.

Proof. Assume that f and g are continuous. Let x̄ ∈ S and ε > 0. We want to show
that there is some δ > 0 so that ‖x − x̄‖ < δ and x ∈ S imply ‖g(f(x)) − g(f(x̄))‖ < ε.
Because g is continuous at f(x̄), there is some γ > 0 such that for any y ∈ T with
‖y − f(x̄)‖ < γ we have ‖g(y) − g(f(x̄))‖ < ε. Now choose δ > 0 such that, by
continuity of f at x̄, we have for any x ∈ S that ‖x − x̄‖ < δ implies ‖f(x) − f(x̄)‖ < γ.
Then (for y = f(x)) this implies ‖g(f(x)) − g(f(x̄))‖ < ε as required.

In principle, this lemma should allow us to prove that the function (𝑥, 𝑦) ↦→ 𝑥/𝑦
is continuous on R ×(R \{0}) (that is, for 𝑦 ≠ 0), by considering it as the composition
of the functions (𝑥, 𝑦) ↦→ (𝑥, 1/𝑦) and (𝑥, 𝑧) ↦→ 𝑥𝑧 . We have just proved that
(𝑥, 𝑧) ↦→ 𝑥𝑧 is continuous, and earlier that 𝑦 ↦→ 1/𝑦 is continuous, but what about
(𝑥, 𝑦) ↦→ (𝑥, 1/𝑦) where the function 𝑦 ↦→ 1/𝑦 affects only the second component
of its input? Clearly, the function (𝑥, 𝑦) ↦→ (𝑥, 1/𝑦) should also be continuous, but
we need one further simple observation to prove this.
Whereas Lemma 3.34 considers the sequential composition of two functions,
the following lemma refers to a “parallel” composition of functions.

Lemma 3.35. Let 𝑆 ⊆ R𝑛 , and 𝑓 : 𝑆 → R𝑚 and 𝑔 : 𝑆 → Rℓ be two functions defined


on 𝑆, and consider the function ℎ : 𝑆 → R𝑚+ℓ defined by ℎ(𝑥) = ( 𝑓 (𝑥), 𝑔(𝑥)) for 𝑥 ∈ 𝑆.
Then ℎ is continuous if and only if 𝑓 and 𝑔 are continuous.

Proof. Let x̄ ∈ S and ε > 0. Suppose f and g are continuous at x̄. Then according
to Lemma 3.33 there is some δ > 0 so that ‖x − x̄‖_max < δ and x ∈ S imply
‖f(x) − f(x̄)‖_max < ε and ‖g(x) − g(x̄)‖_max < ε. But then also ‖h(x) − h(x̄)‖_max < ε
because each of the m + ℓ components of h(x) − h(x̄) is either the corresponding
component of f(x) − f(x̄) or of g(x) − g(x̄).
    Conversely, if h is continuous at x̄, there is some δ > 0 so that ‖x − x̄‖_max < δ
and x ∈ S imply ‖h(x) − h(x̄)‖_max < ε. Because ‖f(x) − f(x̄)‖_max ≤ ‖h(x) − h(x̄)‖_max
and ‖g(x) − g(x̄)‖_max ≤ ‖h(x) − h(x̄)‖_max by the definition of h and of the maximum
norm (3.17), this implies ‖f(x) − f(x̄)‖_max < ε and ‖g(x) − g(x̄)‖_max < ε. This
proves the claim.

Corollary 3.36. Let 𝑆 ⊆ R𝑛 and 𝑓 : 𝑆 → R𝑚 , where 𝑓 (𝑥) = ( 𝑓1 (𝑥), . . . , 𝑓𝑚 (𝑥)) for


𝑥 ∈ 𝑆, that is, 𝑓𝑖 : 𝑆 → R is the 𝑖th component function of 𝑓 , for 1 ≤ 𝑖 ≤ 𝑚. Then 𝑓 is
continuous if and only if 𝑓𝑖 is continuous for 1 ≤ 𝑖 ≤ 𝑚.

Proof. This follows by induction on 𝑚 from Lemma 3.35.

Corollary 3.36 states that a function 𝑓 that takes values in R𝑚 is continuous if


and only if all its component functions 𝑓1 , . . . , 𝑓𝑚 are continuous. These functions
take values in R. Note that in comparison, separate continuity in each variable is
not sufficient for continuity of a function defined on R𝑛 , as example (3.42) shows.
So for functions R𝑛 → R𝑚 , where both 𝑛 and 𝑚 are positive integers, it is the

higher dimensionality of the domain R𝑛 that requires additional considerations


for continuity, not of the range R𝑚 . In order to check continuity, we can focus on
the case 𝑚 = 1.
We now use the results of this section to show that the following function
h : ℝ² → ℝ, somewhat similar to g in (3.42), is continuous (but remember that g is
not):

    h(x, y) = { xy / √(x² + y²)  if (x, y) ≠ (0, 0),
              { 0                if (x, y) = (0, 0) .                    (3.57)

We want to show that h is continuous at (x̄, ȳ). If (x̄, ȳ) ≠ (0, 0), then h is a
composition of continuous functions as follows. The numerator xy is a continuous
function of (x, y) as we proved earlier. The function x ↦ x² is the composition of the
functions x ↦ (x, x) (continuous by Lemma 3.35 because x ↦ x is continuous) and
(x, y) ↦ xy. Therefore also (x, y) ↦ (x², y²) is continuous (again by Lemma 3.35)
and thus (because addition is continuous) (x, y) ↦ x² + y², then (x, y) ↦ √(x² + y²)
because the square root function is continuous, then (x, y) ↦ 1/√(x² + y²) because
z ↦ 1/z is continuous, and here z ≠ 0 because (x̄, ȳ) ≠ (0, 0), and finally the
function (x, y) ↦ h(x, y) itself is continuous as the product of xy and 1/√(x² + y²).
In short, for (x̄, ȳ) ≠ (0, 0) we have h(x, y) defined as a composition of continuous
functions, and so h is continuous there. (Note that all considerations of this section
were developed carefully just to prove this paragraph.)
The question is if h(x, y) is continuous at (0, 0), so for given ε > 0 we have to
find δ > 0 so that

    ‖(x, y) − (0, 0)‖ < δ  ⇒  |h(x, y) − h(0, 0)| < ε ,                  (3.58)

which is (trivially) the case if (x, y) = (0, 0), so assume (x, y) ≠ (0, 0). Then
|x| = √(x²) ≤ √(x² + y²) and |y| = √(y²) ≤ √(x² + y²), and therefore

    |h(x, y) − h(0, 0)| = |h(x, y)| = |x| |y| / √(x² + y²)
                        ≤ √(x² + y²) = ‖(x, y)‖ .                        (3.59)

So if we choose δ = ε then ‖(x, y)‖ < δ implies |h(x, y)| ≤ ‖(x, y)‖ < ε and thus
(3.58) as required.
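The key bound (3.59), |h(x, y)| ≤ ‖(x, y)‖, is easy to test numerically; this informal Python sketch checks it on random points:

```python
# Check |h(x, y)| <= ||(x, y)|| from (3.59), which gives continuity of h
# at the origin with the choice delta = eps.
import math
import random

def h(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / math.sqrt(x * x + y * y)

random.seed(1)
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    assert abs(h(x, y)) <= math.hypot(x, y) + 1e-12
print("bound (3.59) holds on all samples; h(0, 0) =", h(0.0, 0.0))
```

Compare this with g from (3.42): replacing √(x² + y²) by x² + y² in the denominator destroys the bound, and with it continuity at the origin.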
In this section, we have seen how continuity can be proved for functions that
are defined on R𝑛 . The maximum norm is particularly useful for these proofs.

3.13 The Theorem of Weierstrass

The following theorem of Weierstrass is the central theorem of this chapter.


Theorem 3.37 (Weierstrass). A continuous real-valued function on a nonempty compact
domain assumes its maximum and minimum.

We first recall the notions used in Theorem 3.37, which is about a function
f : X → ℝ. The function f is assumed to be continuous and the domain X to be
compact (and nonempty). The theorem says that under these conditions, there are
x* and x_* in X such that f(x*) is the maximum and f(x_*) the minimum of f(X)
(see Section 3.5).
The proof of the Theorem of Weierstrass is based on the following two lemmas.
The first lemma refers to a subset of R.
Lemma 3.38. Let 𝐴 be a nonempty compact subset of R. Then 𝐴 has a maximum and a
minimum.

Proof. We only show that 𝐴 has a maximum. By Theorem 3.29, 𝐴 is closed and
bounded, and sup 𝐴 exists. We show that sup 𝐴 is a limit point of 𝐴. Otherwise,
𝐵(sup 𝐴, 𝜀) ∩ 𝐴 = ∅ for some 𝜀 > 0 by Lemma 3.23. But then there is no 𝑡 ∈ 𝐴 with
𝑡 > sup 𝐴 − 𝜀, so sup 𝐴 − 𝜀 is an upper bound of 𝐴, but sup 𝐴 is the least upper
bound, a contradiction. So sup 𝐴 is a limit point of 𝐴 and therefore belongs to 𝐴
because 𝐴 is closed, and hence sup 𝐴 is also the maximum of 𝐴.
The second lemma says that for a continuous function the image of a compact
set is compact.
Lemma 3.39. Let ∅ ≠ X ⊆ ℝⁿ and f : X → ℝ. If f is continuous and X is
compact, then f(X) is compact.

Proof. Let {y_k}_{k∈ℕ} be any sequence in f(X). We show that there exists a subse-
quence {y_{k_n}}_{n∈ℕ} and a point y ∈ f(X) such that lim_{n→∞} y_{k_n} = y, which will show
that f(X) is compact. For that purpose, for each k choose x^(k) ∈ X with f(x^(k)) = y_k,
which exists by the definition of f(X). Then {x^(k)} is a sequence in X, which has a
convergent subsequence {x^(k_n)}_{n∈ℕ} with limit x̄ in X because X is compact. Then,
because f is continuous,

    f(x̄) = f( lim_{n→∞} x^(k_n) ) = lim_{n→∞} f(x^(k_n)) = lim_{n→∞} y_{k_n}

where f(x̄) ∈ f(X) because x̄ ∈ X. This proves the claim.


The Theorem of Weierstrass is a simple corollary to these two lemmas.
Proof of Theorem 3.37. Consider a continuous function f : X → ℝ on a nonempty
compact domain X. By Lemma 3.39, f(X) is a compact subset of ℝ, which by
Lemma 3.38 has a maximum, which is the maximum value f(x*) of f on X (for
some x* in X), and a minimum, which is the minimum value f(x_*) of f on X (for
some x_* in X).

3.14 Using the Theorem of Weierstrass

The Theorem 3.37 of Weierstrass is about a continuous function, say 𝑓 : 𝑋 → R,


on a compact domain 𝑋, where typically 𝑋 ⊆ R𝑛 . In Section 3.12 we have given a

number of examples that show how to prove that a function is continuous. In order
to prove that a subset 𝑋 of R𝑛 is compact, we normally use the characterisation
in Theorem 3.29 that compact means “closed and bounded”. Boundedness is
typically most easily proved using the maximum norm, that is, boundedness in
each component, as for example in (3.38).
For closedness, the following observation is most helpful: sets that are pre-
images of closed sets under continuous functions are closed. We state and prove
this via the equivalent statement that pre-images of open sets under continuous
functions are open.

Lemma 3.40. Let 𝑓 : R𝑛 → R 𝑘 be a continuous function, let 𝑇 ⊆ R 𝑘 , and let 𝑆 be the


pre-image of 𝑇 under 𝑓 , that is,

𝑆 = 𝑓 −1 (𝑇) = {𝑥 ∈ R𝑛 | 𝑓 (𝑥) ∈ 𝑇} . (3.60)

Then 𝑆 is open if 𝑇 is open, and 𝑆 is closed if 𝑇 is closed.

Proof. We first prove that if T is open, then S is also open. Let x̄ ∈ S and thus
f(x̄) ∈ T by (3.60). Because T is open, there is some ε > 0 such that B(f(x̄), ε) ⊆ T.
Because f is continuous, there is some δ > 0 such that for all x ∈ B(x̄, δ) we have
f(x) ∈ B(f(x̄), ε) and therefore f(x) ∈ T, that is, x ∈ S. This shows B(x̄, δ) ⊆ S,
which proves that S is open, as required.
In order to observe the same property for closed sets note that a set is closed
if and only if its set-theoretic complement is open, and that the pre-image of
the complement is the complement of the pre-image. Namely, let 𝑇 ′ ⊆ R 𝑘 and
suppose that 𝑇 ′ is closed, that is, 𝑇 given by 𝑇 = R 𝑘 \ 𝑇 ′ is open. Let 𝑆 = 𝑓 −1 (𝑇),
where 𝑆 is open as just shown, and let 𝑆′ = R𝑛 \ 𝑆, so that 𝑆′ is closed. But then
𝑆′ = {𝑥 ∈ R𝑛 | 𝑓 (𝑥) ∉ 𝑇} = 𝑓 −1 (𝑇 ′) , and 𝑆′ is closed, as claimed.
Note that Lemma 3.40 concerns the pre-image of a continuous function. In
contrast, Lemma 3.39 concerns the image of a continuous function 𝑓 . The statement
in Lemma 3.40 is not true for images, that is, if 𝑆 is closed, then 𝑓 (𝑆) is not
necessarily closed; for a counterexample, 𝑆 has to be unbounded since otherwise 𝑆
would be compact. An example is the function 𝑓 : R → R given by 𝑓 (𝑥) = 1/(1+ 𝑥 2 )
and 𝑆 = R, where 𝑓 (𝑆) = (0, 1]. That is, 𝑓 (𝑆) is neither closed nor open even though
S is both closed and open. A simpler example of a continuous function f where
f(S) is not open for open sets S is a constant function f, where f(S) is a singleton if
S is nonempty.
In a more abstract setting, Lemma 3.40 can also be used to define that a function
𝑓 : 𝑋 → 𝑌 is continuous. In that case, 𝑋 and 𝑌 are so-called “topological spaces”.
A topological space is a set 𝑋 together with a set (called a “topology”) of subsets
of 𝑋 which are called open sets, which have to fulfill the following conditions: The
empty set ∅ and the entire set 𝑋 are open; the intersection of any two open sets is

open; and arbitrary unions of open sets are open. These conditions hold for the
open sets as defined in Definition 3.19 (which define the standard topology on R𝑛 )
according to Theorem 3.25. Given that 𝑋 and 𝑌 are topological spaces, a function
𝑓 : 𝑋 → 𝑌 is called continuous if and only if the pre-image of any open set (in 𝑌)
is open (in 𝑋). This characterisation of continuous functions is important enough
to state it as a theorem.

Theorem 3.41. Let 𝑓 be a function R𝑛 → R 𝑘 . Then 𝑓 is continuous if and only if the


pre-image 𝑓 −1 (𝑇) as defined in (3.60) of any open subset 𝑇 of R 𝑘 is an open subset of R𝑛 .

Proof. If f is continuous according to Definition 3.31, then any pre-image of an
open set under f is open by Lemma 3.40.
    Conversely, suppose that for any open set T its pre-image f⁻¹(T) is open. We
want to show that f is continuous at x̄ for any x̄ ∈ ℝⁿ. Let ε > 0 and consider
the ε-ball B(f(x̄), ε) around f(x̄). The key observation is that this ε-ball is itself
an open set. Namely, let y ∈ B(f(x̄), ε) and let ε′ = ε − ‖y − f(x̄)‖ > 0. Consider
any y′ ∈ B(y, ε′), so that ‖y′ − y‖ < ε′ = ε − ‖y − f(x̄)‖. By the triangle inequality
(3.23), ‖y′ − f(x̄)‖ ≤ ‖y′ − y‖ + ‖y − f(x̄)‖ < ε, that is, y′ ∈ B(f(x̄), ε), which shows
B(y, ε′) ⊆ B(f(x̄), ε). That is, B(f(x̄), ε) is open as claimed.
    Let S = f⁻¹(B(f(x̄), ε)) = {x ∈ ℝⁿ | f(x) ∈ B(f(x̄), ε)} = {x ∈ ℝⁿ | ‖f(x) −
f(x̄)‖ < ε}. By assumption, S is open, and clearly x̄ ∈ S. So for some δ > 0
there is some δ-ball around x̄ that is contained in S, that is, B(x̄, δ) ⊆ S. Then
f(B(x̄, δ)) ⊆ f(S) ⊆ B(f(x̄), ε) (understand this carefully!). But this means that
‖x − x̄‖ < δ implies ‖f(x) − f(x̄)‖ < ε as in (3.40), so f is continuous.

For our purposes, we only need the part of Theorem 3.41 that is stated in
Lemma 3.40. It implies the following observation, which is most useful to identify
certain subsets of R𝑛 as closed or open.

Lemma 3.42. Let 𝑓 : R𝑛 → R be continuous and let 𝑎 ∈ R. Then the sets {𝑥 ∈ R𝑛 |


𝑓 (𝑥) ≤ 𝑎} and {𝑥 ∈ R𝑛 | 𝑓 (𝑥) ≥ 𝑎} are closed, the sets {𝑥 ∈ R𝑛 | 𝑓 (𝑥) > 𝑎} and
{𝑥 ∈ R𝑛 | 𝑓 (𝑥) < 𝑎} are open, and the set {𝑥 ∈ R𝑛 | 𝑓 (𝑥) = 𝑎} is also closed.

Proof. We have {𝑥 ∈ R𝑛 | 𝑓 (𝑥) ≤ 𝑎} = 𝑓 −1 ((−∞, 𝑎]) and {𝑥 ∈ R𝑛 | 𝑓 (𝑥) ≥ 𝑎} =


𝑓 −1 ([𝑎, ∞)) so these are closed sets by Lemma 3.40 because the intervals (−∞, 𝑎]
and [𝑎, ∞) are closed. The sets {𝑥 ∈ R𝑛 | 𝑓 (𝑥) > 𝑎} and {𝑥 ∈ R𝑛 | 𝑓 (𝑥) < 𝑎} are the
complements of these closed sets and therefore open. The set {𝑥 ∈ R𝑛 | 𝑓 (𝑥) = 𝑎}
is the intersection of these closed sets and therefore closed, and also equal to the
pre-image 𝑓 −1 ({𝑎}) of the closed set {𝑎}.

We now consider, as an example, the following set 𝑋,

𝑋 = {(𝑥, 𝑦) ∈ R2 | 𝑥 ≥ 0, 𝑦 ≥ 0, 𝑥𝑦 ≥ 1} . (3.61)

Here 𝑋 is the intersection of the closed sets {(𝑥, 𝑦) ∈ R2 | 𝑥 ≥ 0}, {(𝑥, 𝑦) ∈ R2 |


𝑦 ≥ 0}, and {(𝑥, 𝑦) ∈ R2 | 𝑥𝑦 ≥ 1} given by the three conditions 𝑥 ≥ 0, 𝑦 ≥ 0,
and 𝑥 𝑦 ≥ 1. These are all closed sets by Lemma 3.42 because the functions
(𝑥, 𝑦) ↦→ 𝑥, (𝑥, 𝑦) ↦→ 𝑦, and (𝑥, 𝑦) ↦→ 𝑥𝑦 are all continuous (see Section 3.12), and
the intersection of any closed sets is closed. In words, 𝑋 is the set of pairs (𝑥, 𝑦) in
the positive quadrant of R2 (defined by 𝑥 ≥ 0 and 𝑦 ≥ 0) bounded by the hyperbola
given by 𝑥𝑦 = 1. While closed, 𝑋 is clearly not bounded because, for example,
∥(𝑥, 𝑥)∥ can become arbitrarily large.
We now consider the problem of maximising or minimising the function
f : X → ℝ given by

    f(x, y) = 1 / (x + y) .                                              (3.62)
The function 𝑓 is well-defined because 𝑥 + 𝑦 cannot be zero on 𝑋. Clearly,
minimising 𝑓 (𝑥, 𝑦) is equivalent to maximising 𝑥 + 𝑦, and that problem has no
solution because 𝑥 + 𝑦 can become arbitrarily large on 𝑋.

Figure 3.3 Decomposition of the set X in (3.61) into X = X₁ ∪ X₂ as in (3.63).

However, the problem of maximising 𝑓 (𝑥, 𝑦) (which is equivalent to minimising


𝑥 + 𝑦) has a solution on 𝑋. With the aim to use the Theorem of Weierstrass, we let
𝑋 = 𝑋1 ∪ 𝑋2 with

𝑋1 = {(𝑥, 𝑦) ∈ 𝑋 | 𝑥 + 𝑦 ≤ 3}, 𝑋2 = {(𝑥, 𝑦) ∈ 𝑋 | 𝑥 + 𝑦 ≥ 3}, (3.63)

as shown in Figure 3.3. By definition, 𝑥 + 𝑦 is bounded from above by 3 on the


set 𝑋1 , and the set 𝑋1 is closed because it is the intersection of 𝑋 with the set
{(𝑥, 𝑦) ∈ R2 | 𝑥 + 𝑦 ≤ 3} which is closed because the function (𝑥, 𝑦) ↦→ 𝑥 + 𝑦 is
continuous. Moreover, 𝑋1 is bounded, because |𝑥| ≤ 3 and |𝑦| ≤ 3 for (𝑥, 𝑦) ∈ 𝑋1 .
So 𝑋1 is compact and therefore 𝑓 (𝑥, 𝑦) has a maximum on 𝑋1 , by the Theorem of

Weierstrass. We also need that 𝑋1 is not empty: it contains for example the point
(2, 1). Now, the maximum of 𝑓 on 𝑋1 is also the maximum of 𝑓 on 𝑋. Namely,
for (x, y) ∈ X₂ we have x + y ≥ 3 and therefore f(x, y) = 1/(x + y) ≤ 1/3 = f(2, 1), where
(2, 1) ∈ 𝑋1 , so Theorem 3.10 applies.
In this example of maximising the function 𝑓 in (3.62) on the domain 𝑋 in
(3.61), 𝑋 is closed but not compact. However, we have applied Theorem 3.10 with
𝑋 = 𝑋1 ∪ 𝑋2 as in (3.63) in order to obtain a compact domain 𝑋1 where we know
the maximisation of 𝑓 has a solution, which then applies to all of 𝑋. This is an
important example which we will consider further in some exercises.
The Theorem of Weierstrass only gives us the existence of a maximum of 𝑓 on
𝑋1 (and thereby on 𝑋), but it does not show how to find it. It seems rather clear
that the maximum of 𝑓 (𝑥, 𝑦) on 𝑋 is obtained for (𝑥, 𝑦) = (1, 1), but proving (and
finding) this maximum is shown in the next chapter.
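In the meantime, a crude grid search (an informal sketch, not a proof) supports the claim that the maximum of f on X is attained at (1, 1):

```python
# Grid search for the maximum of f(x, y) = 1/(x + y) on
# X1 = {(x, y) : x >= 0, y >= 0, x*y >= 1, x + y <= 3}.
best, argbest = -1.0, None
steps = 600
for i in range(steps + 1):
    x = 3.0 * i / steps
    for j in range(steps + 1):
        y = 3.0 * j / steps
        if x * y >= 1.0 and x + y <= 3.0:
            value = 1.0 / (x + y)
            if value > best:
                best, argbest = value, (x, y)
print(argbest, best)  # approximately (1.0, 1.0) and 0.5
```

This matches the observation that xy ≥ 1 forces x + y ≥ 2√(xy) ≥ 2 by the AM-GM inequality, with equality exactly at x = y = 1, so f(x, y) ≤ 1/2 on X.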

3.15 Reminder of Learning Outcomes

After studying this chapter, you should be able to:


• explain the important difference between a real number 𝑥 being positive (𝑥 > 0)
and being nonnegative (𝑥 ≥ 0)
• explain why the square 𝑥 2 of a real number 𝑥 is always nonnegative
• use confidently the notions of infimum and supremum, and their related but
different notions of minimum and maximum
• explain what it means for a sequence of real numbers to converge (tend to a
limit that is a real number), or to tend to plus or minus infinity
• understand what it means for a sequence in R𝑛 to converge
• explain the difference between Euclidean norm and maximum norm, and why
both can be used to prove convergence
• have an intuition of open, closed, bounded, and compact sets, and define them
formally
• understand the central concept of continuity of functions defined on R𝑛 or on
subsets of R𝑛
• explain the importance of compactness for subsets of R𝑛 as domains of
functions, with suitable examples
• state the Theorem of Weierstrass (Theorem 3.37)
• apply the Theorem of Weierstrass to examples to prove existence of a maximum
and minimum of a function.

3.16 Exercises for Chapter 3

Exercise 3.1.
(a) Use the formal definition of the limit of a sequence to prove that the sequence
    {x_k} given by x_k = (k − 1)/k for k ∈ ℕ converges to 1.
(b) Use the formal definition of the limit of a sequence to prove that the sequence
{𝑦 𝑘 } given by 𝑦 𝑘 = (−1) 𝑘 does not converge to any limit.
Exercise 3.2. Let 𝐴 ⊆ R, 𝐴 ≠ ∅. Use the definitions of sup and inf to prove the
following: If inf 𝐴 = sup 𝐴, then 𝐴 has only one element.
Exercise 3.3. Prove the triangle inequality ∥𝑥 + 𝑦∥ max ≤ ∥𝑥∥ max + ∥𝑦∥ max for the
maximum norm. Hint: for a set 𝐴 of reals that has a maximum and 𝑏 ∈ R, we have
max 𝐴 ≤ 𝑏 if and only if 𝑎 ≤ 𝑏 for all 𝑎 ∈ 𝐴.
Exercise 3.4. Which of the following sets 𝐴, 𝐵, 𝐶, 𝐷 are open, closed, compact?
Justify your answers.

𝐴 = {𝑥 ∈ R | 𝑥 = 1/𝑛 for some 𝑛 ∈ N},


𝐵 = {𝑥 ∈ R | 𝑥 = 0 or 𝑥 = 1/𝑛 for some 𝑛 ∈ N},
𝐶 = {(𝑥, 𝑦) ∈ R2 | 0 ≤ 𝑥 ≤ 1},
𝐷 = {(𝑥, 𝑦) ∈ R2 | 0 < 𝑥 < 1, 𝑦 = 1}.
Exercise 3.5.
(a) Give an example of a continuous function on a bounded set that does not
attain a maximum.
(b) Give an example of a continuous function on a bounded set whose function
values do not have a supremum.
(c) Give an example of a continuous function on a closed set that does not attain a
minimum.
(d) Give an example of a continuous and bounded function on a closed set that
attains neither maximum nor minimum.
Exercise 3.6. If 𝐿 ∈ R and 𝑓 (𝑥) is defined for 𝑥 ≥ 0 then we say 𝑓 (𝑥) → 𝐿 as 𝑥 → ∞
if for all 𝜀 > 0 there is some 𝑀 so that 𝑥 ≥ 𝑀 implies | 𝑓 (𝑥) − 𝐿| < 𝜀.
Recall that R≥ = { 𝑥 ∈ R | 𝑥 ≥ 0}. Let 𝑓 : R≥ → R be a continuous function,
𝑓 (𝑥) → 0 as 𝑥 → ∞, and 𝑓 (0) = 1. Show that 𝑓 attains a maximum. Hint: use
Theorem 3.10. What about a minimum?
Exercise 3.7. Consider the function g : ℝ² → ℝ in (3.42) defined by

    g(x, y) = { 0               if (x, y) = (0, 0),
              { xy / (x² + y²)  otherwise.


(a) Show that 𝑔 is not continuous at (0, 0).


(b) Show that for any 𝑘 ∈ R, the function 𝑔(𝑥, 𝑦) takes the same constant value
along the line 𝑦 = 𝑘𝑥 for any 𝑥 ∈ R − {0}. Given the symmetry of 𝑔(𝑥, 𝑦) in 𝑥
and 𝑦, for which other pairs (𝑥, 𝑦) does 𝑔(𝑥, 𝑦) take that same value?
(c) Show that the “contour lines” of 𝑔, given by {(𝑥, 𝑦) ∈ R2 | 𝑔(𝑥, 𝑦) = 𝑐} for
some 𝑐 ∈ R, are exactly those described in (b). Hint: this is simple – find a
suitable 𝑘.
(d) Find the set of all maximisers of 𝑔 (which is empty if 𝑔 does not have a
maximum).

Exercise 3.8. Recall that R≥ = { 𝑥 ∈ R | 𝑥 ≥ 0}. Suppose a producer of some


good wants to produce quantity 𝑥 ∈ R≥ of the good, and knows the maximum
price 𝑝(𝑥) ∈ R that allows her to sell all of that good. Let 𝑐(𝑥) ≥ 0 be the cost of
producing quantity 𝑥. Both 𝑝 and 𝑐 are continuous functions R≥ → R, where 𝑝
is weakly decreasing and 𝑐 weakly increasing, with 𝑝(0) ∈ R and 𝑐(0) = 0. The
producer wants to maximise her profit 𝜋 : R≥ → R given by 𝜋(𝑥) = 𝑥𝑝(𝑥) − 𝑐(𝑥).
(a) Show that if 𝑝(𝑥 ∗ ) = 0 for some 𝑥 ∗ > 0, then the maximum of 𝜋(𝑥) exists, using
the Theorem of Weierstrass.
(b) Show the same if there is 𝑥 ′ > 0 such that 𝑐(𝑥) ≥ 𝑥𝑝(𝑥) for all 𝑥 ≥ 𝑥 ′.
(c) What can you say about the case that 𝑝(𝑥) = 𝑝 for all 𝑥 and 𝑐(𝑥) → ∞ as
𝑥 → ∞?

Exercise 3.9. Which of the following sets are open, closed, compact? Justify your
answers. You can refer to any theorems in the guide.

𝐴 = {𝑧 ∈ R | 𝑧 = 𝑥 · 𝑦 for some 𝑥, 𝑦 so that 0 ≤ 𝑥 ≤ 1, 0 ≤ 𝑦 ≤ 1} ,


𝐵 = {𝑧 ∈ R | 𝑧 = 𝑥/𝑦 for some 𝑥, 𝑦 so that 0 ≤ 𝑥 ≤ 1, 1 ≤ 𝑦 < 2} ,
𝐶 = {𝑧 ∈ R | 𝑧 = 𝑥/𝑦 for some 𝑥, 𝑦 so that 0 ≤ 𝑥 ≤ 1, 1 < 𝑦 ≤ 2} ,
𝐷 = {𝑧 ∈ R | 𝑧 = 𝑥 + 𝑦 for some 𝑥, 𝑦 so that 0 ≤ 𝑥 ≤ 1, 0 < 𝑦 < 1} .
4  First-Order Conditions

4.1 Introduction

This chapter studies optimisation problems for differentiable functions. It discusses


how the concept of differentiability can be used to find necessary conditions for a
(local) maximum of a function. If the function is subject to equality constraints,
this leads to the Theorem of Lagrange that asserts that at a local maximum, the
derivative of the function is a linear combination of the derivatives of the constraint
functions.
If the function is subject to inequality constraints, the corresponding KKT
Theorem (due to Karush, Kuhn, and Tucker) states, similarly, that the derivative of the
function is a nonnegative linear combination of the derivatives of those constraint
functions where the inequalities hold tight, that is, hold as equalities.
Both the Theorem of Lagrange and the KKT Theorem require an additional
constraint qualification: the respective derivatives of the constraint functions
have to be linearly independent.
First-order conditions refer to derivatives. Second-order conditions refer to
second derivatives. They are much less important and are omitted from our
introduction to these methods.

4.1.1 Learning Outcomes

After studying this chapter, you should be able to:


• explain the concept of differentiability of a function defined on a subset of R𝑛
• understand how such subsets are defined by constraints given by equalities or
inequalities
• understand and draw contour lines of functions, for example as in Figure 4.2
• compute gradients, via partial derivatives, of given functions


• find the zeroes of such gradients for examples of functions that are uncon-
strained, and then examine the points where the gradient is zero to identify
minima and maxima of the objective function
• for equality constraints, apply Lagrange’s Theorem to specific examples
• understand that so-called “critical points” (where gradients of the constraint
functions are not linearly independent) also have to be examined as possible
minima or maxima, and apply this to given examples
• use the insight that the KKT conditions distinguish between tight and non-tight
inequalities, and that non-tight inequalities are in effect treated as if they were
absent, in order to identify candidate points for local minima and maxima.
Apply this to specific examples.

4.1.2 Essential Reading

Essential reading is this chapter. As prerequisites we need the previous Chapter 3,


in particular for concepts such as limits and open sets.

4.1.3 Further Reading

The structure of this chapter is similar to chapters 5 and 6 of the following book:
Sundaram, R. K. (1996). A First Course in Optimization Theory. Cambridge
University Press, Cambridge, UK. ISBN 978-0521497190.
A classic book on differentiation of functions of several variables is this book:
Rudin, W. (1976). Principles of Mathematical Analysis, 3rd ed., volume 3. McGraw-
Hill, New York. ISBN 978-0070542358.
In that book, you can find on page 219, theorem 9.21, a proof of Theorem 4.5.
Theorem 4.11 is also known as the Kuhn–Tucker Theorem. The original
publication of that theorem is
Kuhn, H. W. and A. W. Tucker (1951). Nonlinear programming. In: Proceedings
of the Second Berkeley Symposium on Mathematical Statistics and Probability, edited
by J. Neyman, 481–492. University of California Press, Berkeley, CA.
An accessible history of that material is given in
Kuhn, H. W. (1991). Nonlinear programming: A historical note. In: History of
Mathematical Programming: A Collection of Personal Reminiscences, edited by J. K.
Lenstra, A. H. G. Rinnoy Kan, and A. Schrijver, 82–96. CWI and North-Holland,
Amsterdam. ISBN 978-0444888181.
It describes how Kuhn found out that the Kuhn–Tucker theorem had already been
shown in this unpublished Master’s thesis:

Karush, W. (1939). Minima of Functions of Several Variables with Inequalities as


Side Constraints. M.Sc. Thesis, Dept. of Mathematics, University of Chicago,
Chicago.
For historical accuracy, the theorem is now also called the KKT or Karush-Kuhn-
Tucker Theorem.
The name “first-order conditions” for maximisation under inequality con-
straints is generally associated with the KKT Theorem.

4.1.4 Synopsis of Chapter Content

• Section 4.2 gives an introductory example to illustrate the main idea that the
gradients of objective function and constraint function have to be co-linear in
an optimum.
• Section 4.3 recalls matrix multiplication, which we will use throughout also
for scalar products of vectors and for products of vectors with scalars.
• Section 4.4 explains that differentiability means that a function can be lo-
cally approximated (intuitively, by looking at it “with a sufficiently strong
magnifying glass”) with a linear function, called the gradient of the function.
• Section 4.5 shows that the gradient of a function on R𝑛 is given by the 𝑛-tuple
of its partial derivatives.
• Taylor’s Theorem, which may be familiar for functions of a single variable, is
discussed in Section 4.6 for differentiable functions of 𝑛 variables.
• Section 4.7 is about unconstrained optimisation where a local minimum or
maximum necessarily has a zero gradient (intuitively, because the function
could be improved in a nonzero direction). Some examples are given.
• Section 4.8 considers equality constraints. The central Theorem of Lagrange
states that the gradient of the objective function in an optimum is a linear
combination of the gradients of the constraints, provided these are linearly
independent.
• Adding inequality constraints gives rise to the “KKT” conditions by Karush,
Kuhn, and Tucker, which are treated in Section 4.9.

4.2 Introductory Example

We consider an introductory example similar to the example considered at the end


of the last chapter. The problem states:

minimise 𝑥 + 4𝑦 subject to 𝑥 ≥ 0, 𝑦 ≥ 0, 𝑥𝑦 = 1. (4.1)

The objective function to be minimised is here the linear function 𝑓 defined by



𝑓 (𝑥, 𝑦) = 𝑥 + 4𝑦 (4.2)

on the domain 𝑋 = {(𝑥, 𝑦) ∈ R2 | 𝑥 ≥ 0, 𝑦 ≥ 0, 𝑥𝑦 = 1}, which is a closed set by


Lemma 3.42. Note that the domain 𝑋 is here just the hyperbola (in the positive
quadrant of R2 ) defined by the equality 𝑥𝑦 = 1. An analogue of Theorem 3.10
shows that this minimum exists, by restricting 𝑓 to a suitable compact subset 𝑋1
of 𝑋 where 𝑓 has a minimum by the theorem of Weierstrass. For that purpose,
consider any point in 𝑋, for example (1, 1), and its function value of 𝑓 , here
𝑓 (1, 1) = 5. We then define 𝑋1 to be the subset of 𝑋 where 𝑓 (𝑥, 𝑦) is at most 5.
That is, similar to (3.63), we write 𝑋 = 𝑋1 ∪ 𝑋2 with

𝑋1 = {(𝑥, 𝑦) ∈ 𝑋 | 𝑥 + 4𝑦 ≤ 5}, 𝑋2 = {(𝑥, 𝑦) ∈ 𝑋 | 𝑥 + 4𝑦 ≥ 5}, (4.3)

and because 𝑋1 is nonempty we can restrict the minimisation of 𝑓 to the compact


set 𝑋1 which then gives the minimum of 𝑓 on 𝑋.

Figure 4.1 Plot of the function 𝑓 (𝑥, 𝑦) = 𝑥 + 4𝑦 for 0 ≤ 𝑥 ≤ 5 and 0 ≤ 𝑦 ≤ 3,
with the blue curve showing the restriction 𝑥𝑦 = 1.

Figure 4.1 shows a perspective drawing of a three-dimensional plot of 𝑓 (𝑥, 𝑦)


which shows the linearity of this function. The function value (drawn vertically)
increases by 1/4 when 𝑥 is increased by 1/4, and by 1 when 𝑦 is increased by 1/4;
each small rectangle represents such a step size of 1/4 for both 𝑥 and 𝑦. In addition,
the blue curve in the picture shows the restriction to those pairs (𝑥, 𝑦) where
𝑥 𝑦 = 1.

While a plot as in Figure 4.1 is useful to get an understanding of the behaviour


of 𝑓 , it does not show the minimum of 𝑓 exactly, and it is also hard to draw without
computer tools.

Figure 4.2 Contour lines and gradient (1, 4) of the function 𝑓 (𝑥, 𝑦) = 𝑥 + 4𝑦
for 𝑥 ≥ 0, 𝑦 ≥ 0.

A more instructive picture that can be drawn in two dimensions uses contour
lines of the function 𝑓 in (4.2), shown as the dashed lines in Figure 4.2. Such
a contour line for 𝑓 (𝑥, 𝑦) is the set of points (𝑥, 𝑦) where 𝑓 (𝑥, 𝑦) = 𝑐 for some
constant 𝑐, that is, where 𝑓 (𝑥, 𝑦) takes a fixed value. One could also say that
a contour line is the pre-image 𝑓 ⁻¹({𝑐}) under 𝑓 of one of its possible values 𝑐.
Clearly, for different values of 𝑐 any two such contour lines are disjoint. Here,
because 𝑓 is linear, these contour lines are parallel lines. For (𝑥, 𝑦) ∈ R2 , such a
contour line corresponds to the equation 𝑥 + 4𝑦 = 𝑐, or equivalently 𝑦 = 𝑐/4 − 𝑥/4
(we also only consider nonnegative values for 𝑥 and 𝑦). Contour lines are known
from topographical maps of, say, mountain regions, where each line corresponds
to a particular height above sea level; the two-dimensional picture of these lines
conveys information about the three-dimensional terrain. Here, they indicate how
the function should be minimised, by choosing the smallest function value 𝑐 that
is possible.
Figure 4.2 also shows the gradient of the function 𝑓 . We will define this gradient,
called 𝐷 𝑓 , formally later. It is given by the derivatives of 𝑓 with respect to 𝑥
and to 𝑦, that is, the pair ((𝑑/𝑑𝑥) 𝑓 (𝑥, 𝑦), (𝑑/𝑑𝑦) 𝑓 (𝑥, 𝑦)), which is here (1, 4) for every (𝑥, 𝑦)
because 𝑓 is the linear function (4.2). This vector (1, 4) is drawn in Figure 4.2. The
gradient (1, 4) shows in which direction the function increases (which is discussed
in further detail in the introductory Section 5.2 of the next chapter), and can be

interpreted as the direction of “steepest ascent”. Correspondingly, the opposite


direction of the gradient is the direction of “steepest descent”. In addition, the
gradient is orthogonal to the contour line, because along the contour line the function
neither increases nor decreases.
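This orthogonality is easy to confirm for the present example. The following is a small illustrative Python sketch (added here, not part of the guide): a step of (4, −1) stays on the contour line 𝑥 + 4𝑦 = 𝑐, and such a step has scalar product zero with the gradient (1, 4).

```python
# Sketch: the gradient of f(x, y) = x + 4y is (1, 4) everywhere, and any
# step (4, -1) along a contour line x + 4y = c leaves f unchanged.

def f(x, y):
    return x + 4 * y

grad = (1, 4)      # the gradient D f(x, y)
along = (4, -1)    # a direction along the contour line x + 4y = c

# scalar product of gradient and contour direction is zero:
assert grad[0] * along[0] + grad[1] * along[1] == 0

# moving along the contour direction does not change the function value:
x, y = 3.0, 0.5
assert f(x + 4, y - 1) == f(x, y) == 5.0
```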

Figure 4.3 Minimum of the function 𝑓 (𝑥, 𝑦) = 𝑥 + 4𝑦 subject to the con-
straints 𝑥 ≥ 0, 𝑦 ≥ 0, and 𝑔(𝑥, 𝑦) = 𝑥𝑦 − 1 = 0, at the point (𝑥, 𝑦) = (2, 1/2) where
the gradient 𝐷𝑔(𝑥, 𝑦) = (𝑦, 𝑥) = (1/2, 2) of the constraint function 𝑔 is co-linear
with the gradient 𝐷 𝑓 (𝑥, 𝑦) = (1, 4) of 𝑓 .

Moving along any direction which is not orthogonal to the gradient means
either moving partly in the same direction as the gradient (increasing the function
value), or away from it (decreasing the function value). Consider now Figure 4.3
where we have drawn the hyperbola which represents the constraint 𝑥𝑦 = 1 in (4.1).
A point on this hyperbola is, for example, (1, 1). At that point, the contour lines
show that the function value can still be lowered by moving towards (1 + 𝜀, 1/(1 + 𝜀)).
But at the point (𝑥, 𝑦) = (2, 1/2) the contour line just touches the hyperbola, and so
the function value of 𝑓 (𝑥, 𝑦) cannot be reduced further.


The way to compute this point is the method of so-called Lagrange multipliers,
here a single multiplier that corresponds to the single constraint 𝑥𝑦 = 1. We write
this constraint in the form 𝑔(𝑥, 𝑦) = 0 where 𝑔(𝑥, 𝑦) = 𝑥𝑦 − 1. This constraint
function 𝑔(𝑥, 𝑦) has itself a gradient, which depends on (𝑥, 𝑦) and is given by
𝐷𝑔(𝑥, 𝑦) = ((𝑑/𝑑𝑥)𝑔(𝑥, 𝑦), (𝑑/𝑑𝑦)𝑔(𝑥, 𝑦)) = (𝑦, 𝑥). The Lagrange multiplier method says
that only when there is a scalar 𝜆 such that 𝐷 𝑓 (𝑥, 𝑦) = 𝜆𝐷 𝑔(𝑥, 𝑦), that is, when the
gradients of the objective function 𝑓 and of the constraint function 𝑔 are co-linear,
then no further improvement of 𝑓 (𝑥, 𝑦) is possible. The reason is that only in this
case the contour lines of 𝑓 and of 𝑔, which are orthogonal to the gradients 𝐷 𝑓 and

𝐷 𝑔, touch as required, so moving along the contour line of 𝑔 (that is, maintaining
the constraint) also neither increases nor decreases the value of 𝑓 .
Here 𝐷 𝑓 (𝑥, 𝑦) = 𝜆𝐷𝑔(𝑥, 𝑦) is the equation (1, 4) = 𝜆 · (𝑦, 𝑥) = (𝜆𝑦, 𝜆𝑥), so that
𝜆 = 1/𝑦 = 4/𝑥 and thus 𝑦 = 𝑥/4, which together with the constraint 𝑥𝑦 = 1
means 𝑥²/4 = 1 or 𝑥 = 2 (because 𝑥 = −2 violates 𝑥 ≥ 0) and 𝑦 = 1/2, which is
indeed the optimum (𝑥, 𝑦) = (2, 1/2).
Of course, this simple example (4.1) has a solution that can be found directly
using one-variable calculus. Namely, the constraint 𝑥𝑦 = 1 translates to 𝑦 = 1/𝑥, so
that we can consider the problem of minimising 𝑥 + 4/𝑥 (for 𝑥 ≥ 0, in fact 𝑥 > 0
because 𝑥 = 0 is excluded by the condition 𝑥𝑦 = 1). We differentiate and set the
derivative to zero. That is, (𝑑/𝑑𝑥)(𝑥 + 4/𝑥) = 1 − 4/𝑥² = 0 gives the same solution 𝑥 = 2,
𝑦 = 1/2 .
The point of this introductory section was to illustrate the geometric un-
derstanding of contour lines and co-linear gradients of objective and constraint
functions in an optimum.
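The co-linearity condition and the resulting minimiser can also be verified numerically. The following Python sketch (an illustration added here, not part of the original guide) restricts 𝑓 to the hyperbola 𝑥𝑦 = 1 and spot-checks the candidate (2, 1/2).

```python
# Numerical sanity check for the introductory example (4.1):
# minimise f(x, y) = x + 4y subject to x, y >= 0 and x*y = 1.

def f(x, y):
    return x + 4 * y

def h(x):
    # objective restricted to the constraint set: y = 1/x for x > 0
    return f(x, 1 / x)

x_star, y_star = 2.0, 0.5      # candidate from the Lagrange condition

# Df = (1, 4) and Dg = (y, x) are co-linear: (1, 4) = lam * (y_star, x_star)
lam = 1 / y_star               # lam = 2
assert lam * x_star == 4.0

# the candidate is feasible and beats nearby feasible points
assert x_star * y_star == 1.0
assert all(h(x_star) <= h(x_star + d) for d in (-0.5, -0.1, 0.1, 0.5))
print(h(x_star))               # minimum value f(2, 1/2) = 4
```

This is only a spot check at a few nearby points, not a proof; the geometric argument in the text (or the one-variable calculus above) establishes the minimum.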

4.3 Matrix Multiplication for Vectors and Scalars

We recall here a useful way to treat all vectors and scalars as matrices, which will
be particularly important in the final Chapter 5. This gives a unified description
of multiplying matrices with vectors, of the scalar product of two vectors, and of
multiplying a vector with a scalar, as special cases of matrix multiplication.
If 𝐴 is an 𝑚 × 𝑘 matrix and 𝐵 is a 𝑘 × 𝑛 matrix, then their matrix product 𝐴 · 𝐵
(or 𝐴𝐵 for short) is the 𝑚 × 𝑛 matrix 𝐶 with entries
𝑐 𝑖𝑗 = ∑_{𝑠=1}^{𝑘} 𝑎 𝑖𝑠 𝑏 𝑠 𝑗   (1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑛).  (4.4)

When writing down a matrix product 𝐴𝐵 it is assumed to be defined, that is, 𝐴


has as many columns as 𝐵 has rows. Matrix multiplication is associative, that is,
(𝐴𝐵)𝐶 = 𝐴(𝐵𝐶), written as 𝐴𝐵𝐶, for matrices 𝐴, 𝐵, 𝐶.
When dealing with matrices, we use the following notation throughout. All
matrices considered here are real matrices (with entries in R). The set of 𝑚 × 𝑛
matrices is R𝑚×𝑛 . The transpose of an 𝑚 × 𝑛 matrix 𝐴 is written 𝐴⊤, which is the
𝑛 × 𝑚 matrix with entries 𝑎 𝑗𝑖 if 𝑎 𝑖𝑗 are the entries of 𝐴, for 1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑛.
So far, we have considered vectors 𝑥 that are elements of R𝑛 simply as 𝑛-tuples
(𝑥1 , . . . , 𝑥 𝑛 ) where each component 𝑥 𝑖 for 𝑖 = 1, . . . , 𝑛 is an element of R. This
will also be our main usage in this chapter, but when we use scalar products of
two vectors or multiply a matrix with a vector, we pay attention to the “shape”
of a vector as a row vector or column vector. In that case, by default all vectors

are column vectors. That is, if 𝑥 ∈ R𝑛 then 𝑥 is an 𝑛 × 1 matrix and 𝑥⊤ is the


corresponding row vector in R1×𝑛 , and row vectors are easily identified by this
transposition sign. The components of a vector 𝑥 ∈ R𝑛 are 𝑥 1 , . . . , 𝑥 𝑛 . We write
the scalar product of two vectors 𝑥, 𝑦 in R𝑛 as the matrix product 𝑥⊤𝑦. In pictures
(with a square for each matrix entry, here for 𝑛 = 4):

𝑥⊤𝑦 :  [1 × 4 row] · [4 × 1 column] = [1 × 1 scalar]  (4.5)

We use matrix multiplication where possible. In particular, scalars are treated like
1 × 1 matrices. A column vector 𝑥 is multiplied with a scalar 𝜆 from the right, and
a row vector 𝑥⊤ is multiplied with a scalar 𝜆 from the left,

𝑥𝜆 :  [4 × 1 column] · [scalar] = [4 × 1 column] ,   𝜆𝑥⊤ :  [scalar] · [1 × 4 row] = [1 × 4 row]  (4.6)

In a matrix product 𝐶 = 𝐴𝐵, the entry 𝑐 𝑖𝑗 of 𝐶 in row 𝑖 and column 𝑗 is the


scalar product of the 𝑖th row of 𝐴 with the 𝑗th column of 𝐵, according to (4.5) and
(4.4). There are two useful ways of visualising the product 𝐴𝐵. Let 𝐴 ∈ R𝑚×𝑘 and
𝑥 ∈ R 𝑘 . Then 𝐴𝑥 ∈ R𝑚 . Let 𝐴 = [𝐴1 · · · 𝐴 𝑘 ], that is, 𝐴1 , . . . , 𝐴 𝑘 are the columns
of 𝐴. Then 𝐴𝑥 = 𝐴1 𝑥 1 + · · · + 𝐴 𝑘 𝑥 𝑘 , that is, 𝐴𝑥 is the linear combination of the
columns of 𝐴 with the components 𝑥1 , . . . , 𝑥 𝑘 of 𝑥 as coefficients. In pictures for
𝑚 = 3, 𝑘 = 4 :

𝐴𝑥 :  [3 × 4 matrix] · [4 × 1 column] = [3 × 1 column] = 𝐴1 𝑥1 + · · · + 𝐴4 𝑥4  (4.7)

Let 𝐵 ∈ R 𝑘×𝑛 and 𝐵 = [𝐵1 · · · 𝐵𝑛 ]. Then the 𝑗th column of 𝐴𝐵 is the linear
combination 𝐴𝐵 𝑗 of the columns of 𝐴 with the components of 𝐵 𝑗 as coefficients.
We can visualise the columns of 𝐴𝐵 as follows (for 𝑛 = 2):

𝐴𝐵 :  [3 × 4 matrix] · [4 × 2 matrix] = [3 × 2 matrix], with columns 𝐴𝐵1 , 𝐴𝐵2

The second view of 𝐴𝐵 is the same but using rows. Let 𝑦 ∈ R 𝑘 , so that 𝑦⊤𝐵 is a
row vector in R1×𝑛 , given as the linear combination 𝑦1 𝑏1⊤ + · · · + 𝑦 𝑘 𝑏 𝑘⊤ of the rows
𝑏1⊤ , . . . , 𝑏 𝑘⊤ of 𝐵 (which we can write as 𝐵⊤ = [𝑏1 · · · 𝑏 𝑘 ]),

𝑦⊤𝐵 :  [1 × 𝑘 row] · [𝑘 × 𝑛 matrix] = [1 × 𝑛 row]  (4.8)

Let 𝐴⊤ = [𝑎 1 · · · 𝑎 𝑚 ]. Then the 𝑖th row of 𝐴𝐵 is the linear combination 𝑎 𝑖⊤𝐵 of the
rows of 𝐵 with the components of the 𝑖th row 𝑎 𝑖⊤ of 𝐴 as coefficients:

𝑎 𝑖⊤𝐵 :  [1 × 𝑘 row] · [𝑘 × 𝑛 matrix] = [1 × 𝑛 row]

It is useful to acquire some routine in these manipulations. Most common is


multiplication of a matrix with a column vector from the right as in (4.7) or with a
row vector from the left as in (4.8).
Furthermore, we use the special vectors 0 and 1 with all components equal to
0 and 1, respectively, where their dimension depends on the context. Inequalities
like 𝑥 ≥ 0 hold for all components.
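The “column view” of (4.7) can be spelled out in a few lines of plain Python (an informal sketch added for illustration, not from the guide); it checks that the entrywise formula (4.4) and the column-combination view give the same product 𝐴𝑥.

```python
# Two equivalent ways to compute A x for A in R^(m x k) and x in R^k.

def matvec_entries(A, x):
    # entrywise, as in (4.4) with B the single column x
    return [sum(A[i][s] * x[s] for s in range(len(x))) for i in range(len(A))]

def matvec_columns(A, x):
    # column view (4.7): A x = A_1 x_1 + ... + A_k x_k
    result = [0] * len(A)
    for s, coeff in enumerate(x):
        for i in range(len(A)):
            result[i] += A[i][s] * coeff
    return result

A = [[1, 2, 0, 1],
     [0, 1, 3, 2],
     [4, 0, 1, 1]]       # a 3 x 4 matrix, as in the picture for (4.7)
x = [1, 2, 3, 4]

assert matvec_entries(A, x) == matvec_columns(A, x) == [9, 19, 11]
```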

4.4 Differentiability in R𝑛

The idea of differentiability is to approximate a function locally with a linear (more


precisely: affine) function. If 𝑓 : R𝑛 → R, then 𝑓 is called affine if there are reals
𝑐 0 , 𝑐1 , . . . , 𝑐 𝑛 such that for all 𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ) ∈ R𝑛 we have

𝑓 (𝑥1 , . . . , 𝑥 𝑛 ) = 𝑐 0 + 𝑐 1 𝑥1 + · · · + 𝑐 𝑛 𝑥 𝑛 . (4.9)

If thereby 𝑐 0 = 0, then 𝑓 is called linear (which is equivalent to 𝑓 (0, . . . , 0) = 0).

Figure 4.4 Approximating a function 𝑓 (𝑥) for 𝑥 near 𝑥̄ with an affine function
𝑓 (𝑥̄) + 𝐺 · (𝑥 − 𝑥̄) with slope or “gradient” 𝐺 that represents the tangent line at 𝑓 (𝑥̄).

In order to see what it means that a function 𝑓 is differentiable, we consider
first the familiar case that 𝑓 (𝑥) depends on a single variable 𝑥 in R1 . Figure 4.4
shows the graph of a function 𝑓 which is “smooth” (differentiable), which we
understand to mean that near a point 𝑥̄, the function 𝑓 (𝑥) (by “zooming in” to
the function graph near 𝑥̄) becomes less and less distinguishable from an affine
function which has a “gradient” 𝐺 that defines the slope of the “tangent” at 𝑓 (𝑥̄),
as in
𝑓 (𝑥) ≈ 𝑓 (𝑥̄) + 𝐺 · (𝑥 − 𝑥̄) .  (4.10)
But this is an imprecise statement, which should clearly mean more than that the left
and right side in (4.10) approach the same value as 𝑥 tends to 𝑥̄, because this would
be true for any 𝐺 as long as 𝑓 is continuous. What we mean is that 𝐺 should
represent the “rate of change” of 𝑓 (𝑥) near 𝑓 (𝑥̄). (Then 𝐺 will be the gradient or
derivative of 𝑓 at 𝑥̄.)

Figure 4.5 Two points (𝑥, 𝑓 (𝑥)) and (𝑥̄, 𝑓 (𝑥̄)) on the graph of 𝑓 define a
“secant” line, given by an affine function 𝑡 ↦→ 𝑐0 + 𝑐1 𝑡. If 𝑐1 has a limit as 𝑥 → 𝑥̄,
then this limit defines 𝐺 in Figure 4.4.

In order to define 𝐺 as a suitable limit, we consider, as in Figure 4.5, a point 𝑥
near (but distinct from) 𝑥̄ and the unique affine function 𝑡 ↦→ 𝑐0 + 𝑐1 𝑡 that coincides
with 𝑓 for 𝑡 = 𝑥 and 𝑡 = 𝑥̄, that is,

𝑓 (𝑥) = 𝑐0 + 𝑐1 𝑥 ,   𝑓 (𝑥̄) = 𝑐0 + 𝑐1 𝑥̄ .  (4.11)

Then 𝑐1 is defined by 𝑓 (𝑥) − 𝑓 (𝑥̄) = 𝑐1 (𝑥 − 𝑥̄), that is,

𝑐1 = ( 𝑓 (𝑥) − 𝑓 (𝑥̄)) / (𝑥 − 𝑥̄)  (4.12)

so that (by definition, with 𝑐1 defined as a function of 𝑥)

𝑓 (𝑥) = 𝑓 (𝑥̄) + 𝑐1 (𝑥 − 𝑥̄) .  (4.13)

4.4. Differentiability in R𝑛 95

In the representation (4.13), we are interested in 𝑐1 as the “rate of change” of 𝑓 ; we
do not care about the constant 𝑐0 in (4.11), which would be given by 𝑐0 = 𝑓 (𝑥̄) − 𝑐1 𝑥̄.
If 𝑐1 as defined in (4.12) has a limit as 𝑥 → 𝑥̄, then this limit will take the role
of 𝐺 in (4.10) and will be called the gradient or derivative of 𝑓 at 𝑥̄ and denoted
by 𝐷 𝑓 (𝑥̄). Geometrically, the “secant” line defined by two points (𝑥, 𝑓 (𝑥)) and
(𝑥̄, 𝑓 (𝑥̄)) on the curve in Figure 4.5 becomes the tangent line when 𝑥 → 𝑥̄.
For 𝑓 : R𝑛 → R, the aim will be to replicate (4.13) where the single coefficient
𝑐 1 will be replaced by an 𝑛-vector of coefficients (𝑐 1 , . . . , 𝑐 𝑛 ). It will be convenient
to treat this as a row vector. One problem is that this vector can no longer be
represented as a quotient as in (4.12). The following is the central definition of this
section, which we explain in detail afterwards. Interior points have been defined
in Definition 3.20.

Definition 4.1. Let 𝑋 ⊆ R𝑛 , 𝑓 : 𝑋 → R, and let 𝑥̄ be an interior point of 𝑋. Then 𝑓
is called differentiable at 𝑥̄ if for some 𝐺 ∈ R1×𝑛 and all ways to approach 𝑥̄ with 𝑥 we
have

lim_{𝑥→𝑥̄} ( 𝑓 (𝑥) − 𝑓 (𝑥̄) − 𝐺 · (𝑥 − 𝑥̄)) / ∥𝑥 − 𝑥̄∥ = 0 .  (4.14)

Any such 𝐺 is called a gradient or derivative of 𝑓 at 𝑥̄ and is denoted by 𝐷 𝑓 (𝑥̄). The
function 𝑓 is called differentiable on 𝑋 (or just differentiable if 𝑋 = R𝑛 ) if 𝑓 is
differentiable at all interior points 𝑥̄ of 𝑋.

In order to understand (4.14) better, we rewrite it as

( 𝑓 (𝑥) − 𝑓 (𝑥̄)) / ∥𝑥 − 𝑥̄∥ ≈ 𝐺 · (𝑥 − 𝑥̄) · (1 / ∥𝑥 − 𝑥̄∥)  (4.15)

where “≈” is here meant to say “holds as equality in the limit as 𝑥 → 𝑥̄”. The
right-hand side in (4.15) is a product of three terms: a row vector 𝐺 in R1×𝑛 , a
column vector 𝑥 − 𝑥̄ in R𝑛 = R𝑛×1 , and a scalar 1/∥𝑥 − 𝑥̄∥ in R. Written in this
way, each such product is the familiar matrix product as explained in Section 4.3.
In Definition 4.1 the vector 𝐺 is specified as a row vector (an element of R1×𝑛 ) to
avoid having to write it with a transposition sign. Note that when we consider
linear combinations of row vectors with scalars, each scalar is multiplied from the
left with its row vector.
Both sides of (4.15) are real numbers, because 𝑓 is real-valued and the term
𝑓 (𝑥) − 𝑓 (𝑥̄) is divided by the Euclidean norm ∥𝑥 − 𝑥̄∥ of the vector 𝑥 − 𝑥̄. Because
we have ∥𝑧𝛼∥ = ∥𝑧∥ |𝛼| for any 𝑧 ∈ R𝑛 and 𝛼 ∈ R, the vector 𝑦 = 𝑧 · 1/∥𝑧∥ (for
𝑧 ≠ 0) has unit length, that is, ∥𝑦∥ = 1. Therefore, on the right-hand side of (4.15)
the vector (𝑥 − 𝑥̄) · 1/∥𝑥 − 𝑥̄∥ has unit length. This vector is a scalar multiple of 𝑥 − 𝑥̄
and can thus be interpreted as the direction of 𝑥 − 𝑥̄ (a vector normalised to have
length one). If (4.15) holds, then the scalar product of 𝐺 with this vector, given by
𝐺 · (𝑥 − 𝑥̄) · 1/∥𝑥 − 𝑥̄∥, shows the “growth rate” of 𝑓 (𝑥) − 𝑓 (𝑥̄) in the direction 𝑥 − 𝑥̄.

Consider now the case 𝑛 = 1, where ∥𝑧∥ = |𝑧| for any 𝑧 ∈ R1 , so that 𝑧 · 1/∥𝑧∥
is either 1 (if 𝑧 > 0) or −1 (if 𝑧 < 0). Then the right-hand side in (4.15) is 𝐺 if
𝑥 > 𝑥̄ and −𝐺 if 𝑥 < 𝑥̄. Similarly, the left-hand side of (4.15) is ( 𝑓 (𝑥) − 𝑓 (𝑥̄))/(𝑥 − 𝑥̄)
if 𝑥 > 𝑥̄ and −( 𝑓 (𝑥) − 𝑓 (𝑥̄))/(𝑥 − 𝑥̄) if 𝑥 < 𝑥̄. Hence for 𝑥 ≠ 𝑥̄ these two conditions state

lim_{𝑥→𝑥̄} ( 𝑓 (𝑥) − 𝑓 (𝑥̄)) / (𝑥 − 𝑥̄) = 𝐺 ,  (4.16)

which is exactly the familiar notion of differentiability of a function defined on
R1 . The case distinction 𝑥 > 𝑥̄ and 𝑥 < 𝑥̄ that we just made emphasises that the
limit of the quotient in (4.16) has to exist for any possible approach of 𝑥 to 𝑥̄, which
is also stated in Definition 4.1. For example, consider the function 𝑓 (𝑥) = |𝑥|,
which is well known not to be differentiable at 0. Namely, if we restrict 𝑥 to be
positive (that is, 𝑥 > 𝑥̄ = 0), then ( 𝑓 (𝑥) − 𝑓 (𝑥̄))/(𝑥 − 𝑥̄) = |𝑥|/𝑥 = 1, whereas for 𝑥 < 𝑥̄ = 0 we have
( 𝑓 (𝑥) − 𝑓 (𝑥̄))/(𝑥 − 𝑥̄) = |𝑥|/𝑥 = −1. Therefore, there is no common limit of these fractions as 𝑥 → 𝑥̄,
for example if we approach 𝑥̄ with the sequence {𝑥 𝑘 } defined by 𝑥 𝑘 = (−1/2) 𝑘 ,
which converges to 0 but with alternating signs of 𝑥 𝑘 . In Definition 4.1, the limit
has to exist for any possible approach of 𝑥 to 𝑥̄ (for example, by letting 𝑥 be the
points of a sequence that converges to 𝑥̄).
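The failure of differentiability of 𝑓 (𝑥) = |𝑥| at 𝑥̄ = 0 can also be observed numerically (an informal Python sketch, not part of the guide): the difference quotients in (4.16) stay at +1 from the right and −1 from the left, so they have no common limit.

```python
# Difference quotients (f(x) - f(0)) / (x - 0) for f(x) = |x| near 0.

def quotient(x):
    return abs(x) / x          # defined for x != 0

right = [quotient(0.5 ** k) for k in range(1, 10)]      # approach with x > 0
left = [quotient(-(0.5 ** k)) for k in range(1, 10)]    # approach with x < 0
alternating = [quotient((-0.5) ** k) for k in range(1, 10)]

assert all(q == 1.0 for q in right)
assert all(q == -1.0 for q in left)
# along x_k = (-1/2)^k the quotients oscillate, so no limit G exists
assert 1.0 in alternating and -1.0 in alternating
```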
Next, we show that the gradient 𝐺 of a differentiable function is unique
and nicely described by the row vector of “partial derivatives” of the function.
Subsequently, we describe in “Taylor’s theorem”, equation (4.20), how the derivative
represents a local linear approximation of the function.

4.5 Partial Derivatives and 𝐶 1 Functions

Next we show that if a function is differentiable at 𝑥̄ then its gradient 𝐷 𝑓 (𝑥̄) is
unique. We will show that the components of 𝐷 𝑓 (𝑥̄) are the 𝑛 partial derivatives
of 𝑓 , defined as follows.

Definition 4.2. Let 𝑋 ⊆ R𝑛 , 𝑓 : 𝑋 → R, and let 𝑥̄ be an interior point of 𝑋. Let
𝑒 𝑗 ∈ R𝑛 for 1 ≤ 𝑗 ≤ 𝑛 be the 𝑗-th unit vector, i.e., 𝑒 𝑗 is 1 in the 𝑗-th coordinate and 0
everywhere else. Then the 𝑗-th partial derivative of 𝑓 at 𝑥̄ is the real number 𝜕 𝑓 (𝑥̄)/𝜕𝑥 𝑗
(or (𝜕/𝜕𝑥 𝑗 ) 𝑓 (𝑥̄)) such that

lim_{𝑡→0} ( 𝑓 (𝑥̄ + 𝑒 𝑗 𝑡) − 𝑓 (𝑥̄)) / 𝑡 = 𝜕 𝑓 (𝑥̄)/𝜕𝑥 𝑗 .  (4.17)

We have earlier (in our introductory Section 4.2) used the notation (𝑑/𝑑𝑥 𝑗 ) 𝑓 (𝑥)
rather than (𝜕/𝜕𝑥 𝑗 ) 𝑓 (𝑥), which means the same, namely differentiating 𝑓 (𝑥1 , . . . , 𝑥 𝑛 )
as a function of 𝑥 𝑗 only, while keeping the values of all other variables 𝑥1 , . . . , 𝑥 𝑗−1 ,
𝑥 𝑗+1 , . . . , 𝑥 𝑛 fixed. For example, if 𝑓 (𝑥1 , 𝑥2 ) = 𝑥1 𝑥2 + 𝑥1 , then (𝜕/𝜕𝑥1 ) 𝑓 (𝑥1 , 𝑥2 ) = 𝑥2 + 1
and (𝜕/𝜕𝑥2 ) 𝑓 (𝑥1 , 𝑥2 ) = 𝑥1 .
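The partial derivatives in this example can be spot-checked with the difference quotient from (4.17). The following is a small Python illustration (the point (2, 3) is chosen arbitrarily; not from the guide):

```python
# Finite-difference check of the partial derivatives of
# f(x1, x2) = x1*x2 + x1, namely df/dx1 = x2 + 1 and df/dx2 = x1.

def f(x1, x2):
    return x1 * x2 + x1

def partial(f, x, j, t=1e-6):
    # difference quotient (f(x + e_j t) - f(x)) / t as in (4.17)
    xp = list(x)
    xp[j] += t
    return (f(*xp) - f(*x)) / t

x = (2.0, 3.0)
assert abs(partial(f, x, 0) - (x[1] + 1)) < 1e-4   # df/dx1 = x2 + 1 = 4
assert abs(partial(f, x, 1) - x[0]) < 1e-4         # df/dx2 = x1 = 2
```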
Next we show that the gradient of a differentiable function is the vector of
partial derivatives.

Proposition 4.3. Let 𝑋 ⊆ R𝑛 , 𝑓 : 𝑋 → R, and let 𝑥̄ be an interior point of 𝑋. If 𝑓 is
differentiable at 𝑥̄, then

𝐷 𝑓 (𝑥̄) = ( 𝜕 𝑓 (𝑥̄)/𝜕𝑥1 , 𝜕 𝑓 (𝑥̄)/𝜕𝑥2 , . . . , 𝜕 𝑓 (𝑥̄)/𝜕𝑥 𝑛 ) .  (4.18)

Proof. Consider some 𝑗 for 1 ≤ 𝑗 ≤ 𝑛 and consider the points 𝑥 = 𝑥̄ + 𝑒 𝑗 𝑡 for 𝑡 ∈ R,
where 𝑥 → 𝑥̄ if 𝑡 → 0. Then 𝑥 − 𝑥̄ = 𝑒 𝑗 𝑡 and ∥𝑥 − 𝑥̄∥ = ∥𝑒 𝑗 𝑡∥ = |𝑡|. Hence, by (4.14),

lim_{𝑡→0} ( ( 𝑓 (𝑥̄ + 𝑒 𝑗 𝑡) − 𝑓 (𝑥̄)) / |𝑡| − (𝐺 · 𝑒 𝑗 · 𝑡) / |𝑡| ) = 0

or equivalently, as in the consideration for (4.16) for the two cases 𝑡 > 0 and 𝑡 < 0,

𝜕 𝑓 (𝑥̄)/𝜕𝑥 𝑗 = lim_{𝑡→0} ( 𝑓 (𝑥̄ + 𝑒 𝑗 𝑡) − 𝑓 (𝑥̄)) / 𝑡 = 𝐺 · 𝑒 𝑗

and therefore 𝐺 · 𝑒 𝑗 , which is the 𝑗th component of 𝐺, is 𝜕 𝑓 (𝑥̄)/𝜕𝑥 𝑗 as claimed. □

Hence, if 𝑓 has a gradient 𝐷 𝑓 , it is uniquely given by the vector of partial
derivatives. However, the existence of these partial derivatives is not enough to
guarantee that the function is differentiable. In fact, the function may not even
be continuous. As an example, consider the function 𝑔 : R2 → R defined in
(3.42). Since 𝑔(𝑥, 0) = 𝑔(0, 𝑦) = 0 for all 𝑥, 𝑦, we clearly have (𝜕/𝜕𝑥)𝑔(𝑥, 0) = 0 and
(𝜕/𝜕𝑦)𝑔(0, 𝑦) = 0, and for 𝑦 ≠ 0 we have

(𝜕/𝜕𝑥)𝑔(𝑥, 𝑦) = (𝑦(𝑥² + 𝑦²) − 2𝑥(𝑥𝑦)) / (𝑥² + 𝑦²)² = (𝑦³ − 𝑦𝑥²) / (𝑥² + 𝑦²)²  (4.19)

and for 𝑥 ≠ 0 we have (𝜕/𝜕𝑦)𝑔(𝑥, 𝑦) = (𝑥³ − 𝑥𝑦²)/(𝑥² + 𝑦²)² because 𝑔(𝑥, 𝑦) is symmetric in 𝑥 and 𝑦.
So the partial derivatives of 𝑔 exist everywhere. However, 𝑔(𝑥, 𝑦) is not even
continuous, let alone differentiable.
It can be shown that the continuous function ℎ(𝑥, 𝑦) defined in (3.57) is not
differentiable at (0, 0).
Nevertheless, the partial derivatives of a function are very useful if they are
continuous, which is often the case.

Definition 4.4. Let 𝑋 be an open subset of R𝑛 and 𝑓 : 𝑋 → R. Then 𝑓 is called


continuously differentiable on 𝑋, and we say 𝑓 is 𝐶 1 (𝑋), if 𝑓 is differentiable on 𝑋
and its gradient 𝐷 𝑓 (𝑥) is a continuous function of 𝑥 ∈ 𝑋.

Theorem 4.5. Let 𝑋 be an open subset of R𝑛 and 𝑓 : 𝑋 → R. Then 𝑓 is 𝐶 1 (𝑋) if and
only if all partial derivatives (𝜕/𝜕𝑥 𝑗 ) 𝑓 (𝑥) exist and are continuous functions of 𝑥 on 𝑋, for
1 ≤ 𝑗 ≤ 𝑛.

Theorem 4.5 is not a trivial observation, because the existence of partial


derivatives does not imply differentiability. However, if the partial derivatives
exist and are continuous, then the function is differentiable and continuously
differentiable. As with many further results in this chapter, we will not prove
Theorem 4.5 for reasons of space. A proof is given on page 219, theorem 9.21 of
Rudin (1976).
What is important is that the partial derivatives have to be jointly continuous
in all variables, not just separately continuous. For example, the function 𝑔(𝑥, 𝑦)
defined in (3.42) is separately continuous in 𝑥 and 𝑦, respectively (when the other
variable is fixed), and so is in fact each partial derivative, such as
(𝜕/𝜕𝑥)𝑔(𝑥, 𝑦) = (𝑦³ − 𝑦𝑥²)/(𝑥² + 𝑦²)² in (4.19). However, this function is not jointly
continuous at (0, 0), because for 𝑦 = 2𝑥, say, we have (for 𝑥 ≠ 0)
(𝜕/𝜕𝑥)𝑔(𝑥, 𝑦) = (8𝑥³ − 2𝑥³)/(𝑥² + 4𝑥²)² = 6/(25𝑥), which does not tend to
(𝜕/𝜕𝑥)𝑔(0, 0) = 0 when 𝑥 → 0.
By definition, a 𝐶 1 function is differentiable. Many functions (in particular if
they are not defined by case distinctions) are easily seen to have continuous partial
derivatives, so for simplicity this is often assumed rather than the more general
condition of differentiability.
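The cautionary example above can also be checked numerically (an informal sketch, not from the guide): the function 𝑔 of (3.42) has both partial derivatives at (0, 0), yet takes the constant value 1/2 on the diagonal, so it is not continuous there.

```python
# g(x, y) = x*y / (x^2 + y^2) for (x, y) != (0, 0), and g(0, 0) = 0.

def g(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x * x + y * y)

# Both partial derivatives at (0, 0) exist and equal 0, because g vanishes
# on both coordinate axes: the difference quotients are identically 0.
for t in (0.1, 0.01, 0.001):
    assert (g(t, 0.0) - g(0.0, 0.0)) / t == 0.0
    assert (g(0.0, t) - g(0.0, 0.0)) / t == 0.0

# Yet g is not continuous at (0, 0): along the line y = x the value is
# constantly 1/2, which does not approach g(0, 0) = 0.
for t in (0.1, 0.01, 0.001):
    assert abs(g(t, t) - 0.5) < 1e-12
```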

4.6 Taylor’s Theorem

The following theorem expresses, once more, that differentiability means local
approximation by a linear function. It will also be used to prove (sometimes only
with a heuristic argument) first-order conditions for optimality that we consider
later.

Theorem 4.6 (Taylor). Let 𝑋 ⊆ R𝑛 , 𝑓 : 𝑋 → R, and let 𝑥̄ be an interior point of 𝑋.
Then 𝑓 is differentiable at 𝑥̄ with gradient 𝐺 in R1×𝑛 if and only if there exists a function
𝑅 : 𝑋 → R (called the “remainder term”) so that

𝑓 (𝑥) = 𝑓 (𝑥̄) + 𝐺 · (𝑥 − 𝑥̄) + 𝑅(𝑥) · ∥𝑥 − 𝑥̄∥  (4.20)

where lim_{𝑥→𝑥̄} 𝑅(𝑥) = 𝑅(𝑥̄) = 0.

Proof. By (4.14), the differentiability of 𝑓 at 𝑥̄ is equivalent to the condition that the
function 𝑅 : 𝑋 → R defined by 𝑅(𝑥̄) = 0, and for 𝑥 ≠ 𝑥̄ by

𝑅(𝑥) = ( 𝑓 (𝑥) − 𝑓 (𝑥̄) − 𝐺 · (𝑥 − 𝑥̄)) / ∥𝑥 − 𝑥̄∥  (4.21)

fulfills lim_{𝑥→𝑥̄} 𝑅(𝑥) = 0. Multiplication of both sides in (4.21) with ∥𝑥 − 𝑥̄∥ gives
(4.20), and division of both sides of (4.20) by ∥𝑥 − 𝑥̄∥ gives (4.21), so the two
equations are equivalent. □
The important part in (4.20) is that the remainder term 𝑅(𝑥) tends to zero as
𝑥 → 𝑥̄. The norm ∥𝑥 − 𝑥̄∥ also tends to zero as 𝑥 → 𝑥̄, so the product 𝑅(𝑥) · ∥𝑥 − 𝑥̄∥
becomes negligible in comparison to 𝐺 · (𝑥 − 𝑥̄), which is therefore the dominant
linear term. Condition (4.20) is perhaps the best way to understand differentiability
as “local approximation by a linear function”.
Taylor’s theorem is familiar from the case 𝑛 = 1, where 𝐷 𝑓 is typically written
as 𝑓 ′, and the second derivative (the derivative of 𝑓 ′) as 𝑓 ′′, if it exists. In that case
the theorem can be applied again to 𝑅(𝑥), which itself has a Taylor approximation,
which gives rise to an expression like

𝑓 (𝑥) = 𝑓 (𝑥̄) + 𝑓 ′(𝑥̄)(𝑥 − 𝑥̄) + ( 𝑓 ′′(𝑥̄)/2) (𝑥 − 𝑥̄)² + 𝑅̂(𝑥) · |𝑥 − 𝑥̄|²  (4.22)

where lim_{𝑥→𝑥̄} 𝑅̂(𝑥) = 0. By iterating this process for functions that are differentiable
sufficiently many times, one obtains a “Taylor expansion” that approximates the
function not just linearly but by a higher-degree polynomial.
Furthermore, the expression (4.22) for a function that is twice differentiable
is more informative than the expression (4.20), with the following additional
observation: one can show that the original remainder term 𝑅(𝑥) can be represented
in the form ( 𝑓 ′′(𝑧)/2) · (𝑥 − 𝑥̄) for some “intermediate value” 𝑧 that is
between 𝑥 and 𝑥̄; hence bounds on 𝑓 ′′(𝑧) translate to bounds on 𝑅(𝑥). These
variations of Taylor’s theorem are often stated in the literature. We do not consider
them here, only the simple version of Theorem 4.6.
We illustrate (4.20) with a specific remainder term for some differentiable
function 𝑓 : R2 → R with gradient 𝐷 𝑓 (𝑥, 𝑦). Fix (𝑥̄, 𝑦̄) and let (𝑥, 𝑦) = (𝑥̄, 𝑦̄) +
(Δ𝑥 , Δ𝑦 ). Then (4.20) becomes

𝑓 (𝑥̄ + Δ𝑥 , 𝑦̄ + Δ𝑦 ) = 𝑓 (𝑥̄, 𝑦̄) + 𝐷 𝑓 (𝑥̄, 𝑦̄) · (Δ𝑥 , Δ𝑦 )⊤ + 𝑅(Δ𝑥 , Δ𝑦 ) · ∥(Δ𝑥 , Δ𝑦 )∥.  (4.23)

Consider now the function 𝑓 (𝑥, 𝑦) = 𝑥 · 𝑦, which has gradient 𝐷 𝑓 (𝑥, 𝑦) = (𝑦, 𝑥),
which is a continuous function of (𝑥, 𝑦). By Theorem 4.5, 𝑓 is continuously
differentiable. Then

𝑓 (𝑥, 𝑦) = 𝑓 (𝑥̄ + Δ𝑥 , 𝑦̄ + Δ𝑦 ) = (𝑥̄ + Δ𝑥 ) · (𝑦̄ + Δ𝑦 )
    = 𝑥̄ 𝑦̄ + 𝑦̄ Δ𝑥 + 𝑥̄ Δ𝑦 + Δ𝑥 Δ𝑦
    = 𝑓 (𝑥̄, 𝑦̄) + 𝐷 𝑓 (𝑥̄, 𝑦̄) · (Δ𝑥 , Δ𝑦 )⊤ + Δ𝑥 Δ𝑦   (4.24)

which is of the form (4.23) if we can find a remainder term 𝑅(Δ𝑥 , Δ𝑦 ) such that
𝑅(Δ𝑥 , Δ𝑦 ) · ∥(Δ𝑥 , Δ𝑦 )∥ = Δ𝑥 Δ𝑦 . This holds if 𝑅(Δ𝑥 , Δ𝑦 ) = Δ𝑥 Δ𝑦 / ∥(Δ𝑥 , Δ𝑦 )∥, and
then

|𝑅(Δ𝑥 , Δ𝑦 )| = |Δ𝑥 Δ𝑦 | / √(Δ𝑥² + Δ𝑦²) = √( Δ𝑥² Δ𝑦² / (Δ𝑥² + Δ𝑦²) ) = 1 / √( 1/Δ𝑦² + 1/Δ𝑥² )  (4.25)

which indeed goes to zero as (Δ𝑥 , Δ𝑦 ) → (0, 0) because then the denominator in
(4.25) becomes arbitrarily large.
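A quick numerical check of this remainder term (a Python sketch added for illustration; the step sizes are arbitrary) confirms that 𝑅(Δ𝑥 , Δ𝑦 ) = Δ𝑥 Δ𝑦 /∥(Δ𝑥 , Δ𝑦 )∥ vanishes as (Δ𝑥 , Δ𝑦 ) → (0, 0):

```python
from math import hypot

def remainder(dx, dy):
    # R(dx, dy) = dx*dy / ||(dx, dy)|| as in the text before (4.25)
    return dx * dy / hypot(dx, dy)

values = [abs(remainder(0.5 ** k, 0.5 ** k)) for k in range(1, 25)]

# along dx = dy the remainder is dx / sqrt(2), halving at every step
assert all(b < a for a, b in zip(values, values[1:]))
assert values[-1] < 1e-7
```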

4.7 Unconstrained Optimisation

We give some definitions of global and local maximum and minimum.

Definition 4.7. Let ∅ ≠ 𝑋 ⊆ R𝑛 , 𝑓 : 𝑋 → R, and 𝑥̄ ∈ 𝑋. Then

• 𝑓 attains a global maximum on 𝑋 at 𝑥̄ if 𝑓 (𝑥) ≤ 𝑓 (𝑥̄) for all 𝑥 ∈ 𝑋. Then 𝑥̄ is
called a global maximiser of 𝑓 on 𝑋.
• 𝑓 attains a local maximum on 𝑋 at 𝑥̄ if there exists 𝜀 > 0 so that 𝑓 (𝑥) ≤ 𝑓 (𝑥̄) for
all 𝑥 ∈ 𝑋 ∩ 𝐵(𝑥̄, 𝜀). Then 𝑥̄ is called a local maximiser of 𝑓 on 𝑋.
• 𝑓 attains an unconstrained local maximum on 𝑋 at 𝑥̄ if there exists 𝜀 > 0 so that
𝐵(𝑥̄, 𝜀) ⊆ 𝑋 and 𝑓 (𝑥) ≤ 𝑓 (𝑥̄) for all 𝑥 ∈ 𝐵(𝑥̄, 𝜀). Then 𝑥̄ is called an unconstrained
local maximiser of 𝑓 on 𝑋.

The analogous definitions hold for “minimum” instead of “maximum” by replacing
“ 𝑓 (𝑥) ≤ 𝑓 (𝑥̄)” with “ 𝑓 (𝑥) ≥ 𝑓 (𝑥̄)”.

Figure 4.6 Illustration of Definition 4.7 for a function defined on the interval
[𝑎, 𝑒], with marked points 𝑎, 𝑏, 𝑐, 𝑑, 𝑒 on the horizontal axis.

In Figure 4.6, 𝑎, 𝑐, and 𝑒 are local maximisers and 𝑏 and 𝑑 are local minimisers
of the function shown, where 𝑏, 𝑐, 𝑑 are unconstrained. The function attains its
global minimum at 𝑏 and its global maximum at 𝑒.

Lemma 4.8. Let 𝑋 ⊆ R𝑛, 𝑓 : 𝑋 → R, and 𝑥̄ ∈ 𝑋. Then 𝑥̄ is an unconstrained local
maximiser of 𝑓 on 𝑋 ⇔ 𝑥̄ is a local maximiser of 𝑓 on 𝑋 and an interior point of 𝑋.

Proof. The direction “⇒” is immediate from Definition 4.7. To see the converse
direction “⇐”, if 𝑥̄ is a local maximiser of 𝑓 on 𝑋 then there is some 𝜀₁ > 0 so that
𝑓(𝑥) ≤ 𝑓(𝑥̄) for all 𝑥 ∈ 𝑋 ∩ 𝐵(𝑥̄, 𝜀₁), and 𝑥̄ is an interior point of 𝑋 if 𝐵(𝑥̄, 𝜀₂) ⊆ 𝑋
for some 𝜀₂ > 0. With 𝜀 = min{𝜀₁, 𝜀₂} we obtain 𝐵(𝑥̄, 𝜀) ⊆ 𝑋 and 𝑓(𝑥) ≤ 𝑓(𝑥̄) for
all 𝑥 ∈ 𝐵(𝑥̄, 𝜀), that is, 𝑥̄ is an unconstrained local maximiser of 𝑓 on 𝑋.

For a differentiable (for example, 𝐶¹) function, the following lemma shows
that at an unconstrained local maximiser or minimiser the function has a zero
gradient. The lemma is a corollary to Taylor’s Theorem 4.6.

Lemma 4.9. Let 𝑋 ⊆ R𝑛 and let 𝑓 : 𝑋 → R be differentiable at 𝑥̄ ∈ 𝑋. If 𝑥̄ is an
unconstrained local maximiser or minimiser of 𝑓, then 𝐷𝑓(𝑥̄) = 0.

Proof. Suppose 𝑥̄ is an unconstrained local maximiser of 𝑓. Then 𝐵(𝑥̄, 𝜀) ⊆ 𝑋 for
some 𝜀 > 0, and 𝑓(𝑥) ≤ 𝑓(𝑥̄) for all 𝑥 ∈ 𝐵(𝑥̄, 𝜀).
We apply Taylor’s Theorem 4.6, so that by (4.20), with Δ𝑥 = 𝑥 − 𝑥̄, we have for
all Δ𝑥 so that ∥Δ𝑥∥ < 𝜀

   𝑓(𝑥̄ + Δ𝑥) = 𝑓(𝑥̄) + 𝐷𝑓(𝑥̄) · Δ𝑥 + 𝑅(Δ𝑥) · ∥Δ𝑥∥   (4.26)

where |𝑅(Δ𝑥)| → 0 as Δ𝑥 → 0. Because the local maximisation is unconstrained,
we can choose any Δ𝑥 with ∥Δ𝑥∥ < 𝜀 in (4.26). Suppose that 𝐺 = 𝐷𝑓(𝑥̄) ≠ 0.
We will show that the value of 𝑓 increases in the direction of the gradient 𝐺, a
contradiction. That is, let Δ𝑥 = 𝐺⊤𝑡 for any 𝑡 > 0 such that ∥Δ𝑥∥ = ∥𝐺∥𝑡 < 𝜀 (we
transpose 𝐺 to obtain a column vector 𝐺⊤; then 𝐺 · 𝐺⊤ = ∥𝐺∥²). Then by (4.26),

   𝑓(𝑥̄ + Δ𝑥) = 𝑓(𝑥̄) + 𝐷𝑓(𝑥̄) · Δ𝑥 + 𝑅(Δ𝑥) · ∥Δ𝑥∥
            = 𝑓(𝑥̄) + 𝐺 · 𝐺⊤𝑡 + 𝑅(𝐺⊤𝑡) · ∥𝐺∥𝑡
            = 𝑓(𝑥̄) + ∥𝐺∥𝑡 · (∥𝐺∥ + 𝑅(𝐺⊤𝑡)).

Because ∥𝐺∥ > 0, 𝑡 > 0, and 𝑅(𝐺⊤𝑡) → 0 as 𝑡 → 0, the term ∥𝐺∥ + 𝑅(𝐺⊤𝑡) is
positive for sufficiently small positive 𝑡, and therefore 𝑓(𝑥̄ + Δ𝑥) > 𝑓(𝑥̄), which
contradicts the local maximality of 𝑓(𝑥̄). So 𝐷𝑓(𝑥̄) = 0 as claimed.
If 𝑥̄ is an unconstrained local minimiser of 𝑓, then 𝑥̄ is an unconstrained local
maximiser of −𝑓, so that −𝐷𝑓(𝑥̄) = 0, which is equivalent to 𝐷𝑓(𝑥̄) = 0.

In Lemma 4.9, it is important that the local maximum is unconstrained. For
example, points 𝑎 and 𝑒 in Figure 4.6 are local maximisers where the derivative of
the function is not zero. Moreover, a zero gradient is only a necessary condition. It
may indicate a stationary point that is neither a local maximiser nor minimiser, such
as 𝑥 = 0 for the function 𝑓 : R → R defined by 𝑓(𝑥) = 𝑥³. Another example is the
function 𝑓 : R² → R defined by 𝑓(𝑥, 𝑦) = 𝑥 · 𝑦, where (0, 0) has gradient zero but
is neither a local maximiser nor minimiser, because 𝑓(0, 0) = 0 and 𝑓(𝑥, 𝑦) takes
positive and negative values nearby.
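This saddle behaviour can be checked numerically (a minimal Python sketch, not part of the formal development):

```python
# f(x, y) = x*y is stationary at (0, 0) but takes both signs
# arbitrarily close to the origin, so (0, 0) is a saddle point.
def f(x, y):
    return x * y

t = 1e-6
assert f(0, 0) == 0 and f(t, t) > 0 and f(t, -t) < 0
```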
For a differentiable function 𝑓 : R𝑛 → R, its gradient 𝐷 𝑓 (𝑥) for 𝑥 ∈ R𝑛 is a row
vector with 𝑛 components, so the condition 𝐷 𝑓 (𝑥) = 0 amounts to 𝑛 equations for
𝑛 unknowns (the 𝑛 components of 𝑥). Often these 𝑛 equations have only a finite
number of solutions, which can then be checked as to whether they represent local
maxima or minima of 𝑓 .
We show how to use Lemma 4.9 with two examples. First, consider the
function 𝑓 : R² → R,

   𝑓(𝑥, 𝑦) = (𝑥 − 𝑦) / (2 + 𝑥² + 𝑦²),   (4.27)

where 𝐷𝑓(𝑥, 𝑦) = (0, 0) means

   (2 + 𝑥² + 𝑦² − (𝑥 − 𝑦) · 2𝑥) / (2 + 𝑥² + 𝑦²)² = 0,   (−2 − 𝑥² − 𝑦² − (𝑥 − 𝑦) · 2𝑦) / (2 + 𝑥² + 𝑦²)² = 0

or equivalently

   2 − 𝑥² + 𝑦² + 2𝑥𝑦 = 0,
   −2 − 𝑥² + 𝑦² − 2𝑥𝑦 = 0.   (4.28)

The simultaneous solution of two nonlinear equations as in (4.28) is in general not
easy. Here, these equations are very similar, so that sometimes a simpler equation
results by, for example, adding them (in other cases another manipulation may be
useful, such as taking their difference). Adding the two equations in (4.28) gives
−2𝑥² + 2𝑦² = 0, or equivalently 𝑥² = 𝑦², which means that either 𝑥 = 𝑦 or 𝑥 = −𝑦.
Furthermore, 𝑥² = 𝑦² in either equation of (4.28) gives 2 + 2𝑥𝑦 = 0 and thus 𝑥𝑦 = −1,
which has no solution if 𝑥 = 𝑦, but the solutions (𝑥, 𝑦) = (1, −1) or (𝑥, 𝑦) = (−1, 1)
if 𝑥 = −𝑦. It seems that (𝑥, 𝑦) = (1, −1) is a local or even global maximiser and
(𝑥, 𝑦) = (−1, 1) a local or global minimiser of 𝑓(𝑥, 𝑦). It can be verified directly that
𝑓(1, −1) is the global maximum of 𝑓, because the following are equivalent for all
(𝑥, 𝑦):

   𝑓(𝑥, 𝑦) ≤ 𝑓(1, −1) = 2/(2 + 1 + 1) = 1/2
   (𝑥 − 𝑦)/(2 + 𝑥² + 𝑦²) ≤ 1/2
   2𝑥 − 2𝑦 ≤ 2 + 𝑥² + 𝑦²                                    (4.29)
   0 ≤ 1 − 2𝑥 + 𝑥² + 1 + 2𝑦 + 𝑦²
   0 ≤ (1 − 𝑥)² + (1 + 𝑦)²

which is true (with equality for (𝑥, 𝑦) = (1, −1), a useful check). The inequality
𝑓(𝑥, 𝑦) ≥ 𝑓(−1, 1) = −1/2 is shown very similarly, which shows that 𝑓(−1, 1) is the
global minimum of 𝑓.
In the following example, the first-order condition of a zero derivative also
gives useful information, although of a different kind. Consider the function
𝑔 : R² → R,

   𝑔(𝑥, 𝑦) = 𝑥𝑦 / (1 + 𝑥² + 𝑦²),   (4.30)

where 𝐷𝑔(𝑥, 𝑦) = (0, 0) means

   (𝑦(1 + 𝑥² + 𝑦²) − 𝑥𝑦 · 2𝑥) / (1 + 𝑥² + 𝑦²)² = 0,   (𝑥(1 + 𝑥² + 𝑦²) − 𝑥𝑦 · 2𝑦) / (1 + 𝑥² + 𝑦²)² = 0

or equivalently

   𝑦 − 𝑦𝑥² + 𝑦³ = 0,
   𝑥 + 𝑥³ − 𝑥𝑦² = 0.   (4.31)

An obvious solution to (4.31) is (𝑥, 𝑦) = (0, 0), but this is only a stationary point of
𝑔 and neither maximum nor minimum (not even locally), because 𝑔(0, 0) = 0 but
𝑔(𝑥, 𝑦) takes positive as well as negative values (also near (0, 0)). Similarly, when
𝑥 = 0 or 𝑦 = 0, then 𝑔(𝑥, 𝑦) = 0 but this is not a maximum or minimum, so that we
can assume 𝑥 ≠ 0 and 𝑦 ≠ 0. Then the equations (4.31) are equivalent to

   1 − 𝑥² + 𝑦² = 0,
   1 + 𝑥² − 𝑦² = 0,   (4.32)

which when added give 2 = 0 which is a contradiction. This shows that there is no
solution to (4.31) where 𝑥 ≠ 0 and 𝑦 ≠ 0 and thus 𝑔(𝑥, 𝑦) has no local and therefore
also no global maximum or minimum. This is possible because the domain R2 of
𝑔 is not compact. For 𝑥 = 𝑦 and large 𝑥, for example, we have

   𝑔(𝑥, 𝑥) = 𝑥²/(1 + 2𝑥²) = 1/(1/𝑥² + 2)

which tends to 1/2 as 𝑥 → ∞. It seems that 1/2 is an upper bound for 𝑔(𝑥, 𝑦). We can
prove this for all (𝑥, 𝑦) via the following equivalences:

   𝑔(𝑥, 𝑦) = 𝑥𝑦/(1 + 𝑥² + 𝑦²) < 1/2
   2𝑥𝑦 < 1 + 𝑥² + 𝑦²
   0 < 1 + 𝑥² − 2𝑥𝑦 + 𝑦²
   0 < 1 + (𝑥 − 𝑦)²

which is true. We can show similarly that 𝑔(𝑥, 𝑦) > −1/2 and that 𝑔(𝑥, −𝑥) gets
arbitrarily close to −1/2. This shows that the image of 𝑔 is the interval (−1/2, 1/2).
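These bounds can also be illustrated numerically (a Python sketch with an arbitrary sampling grid, not part of the formal argument):

```python
# g(x, y) = x*y / (1 + x^2 + y^2) stays strictly inside (-1/2, 1/2),
# and g(x, x), g(x, -x) approach 1/2 and -1/2 for large x.
def g(x, y):
    return x * y / (1 + x**2 + y**2)

vals = [g(i / 10, j / 10) for i in range(-50, 51) for j in range(-50, 51)]
assert all(-0.5 < v < 0.5 for v in vals)
assert g(100, 100) > 0.49 and g(100, -100) < -0.49
```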

⇒ Test your understanding of this section with Exercise 4.1.

4.8 Equality Constraints and the Theorem of Lagrange

The following central Theorem of Lagrange gives conditions for a constrained local
maximum or minimum of a continuously differentiable function 𝑓 . The con-
straints are given as 𝑘 equations 𝑔1 (𝑥) = 0, . . . , 𝑔 𝑘 (𝑥) = 0 with continuously
differentiable functions 𝑔1 , . . . , 𝑔 𝑘 . The theorem states that at a local maximum
or minimum, the optimised function 𝑓 (𝑥) has a gradient that is a linear combina-
tion of the gradients of these constraint functions, provided these gradients are
linearly independent. The latter condition is called the constraint qualification. The
corresponding coefficients 𝜆1 , . . . , 𝜆 𝑘 are known as Lagrange multipliers.

Theorem 4.10 (Lagrange). Let 𝑓 : 𝑈 → R be a 𝐶¹(𝑈) function on an open subset 𝑈
of R𝑛, and let 𝑔1, . . . , 𝑔𝑘 : R𝑛 → R be 𝐶¹(R𝑛) functions. Let 𝑥̄ be a local maximiser or
minimiser of 𝑓 on 𝑋 = 𝑈 ∩ {𝑥 ∈ R𝑛 | 𝑔𝑖(𝑥) = 0, 1 ≤ 𝑖 ≤ 𝑘}. Let 𝐷𝑔1(𝑥̄), . . . , 𝐷𝑔𝑘(𝑥̄)
be linearly independent (“constraint qualification”). Then there exist 𝜆1, . . . , 𝜆𝑘 ∈ R
(“Lagrange multipliers”) such that

   𝐷𝑓(𝑥̄) = ∑_{𝑖=1}^{𝑘} 𝜆𝑖 𝐷𝑔𝑖(𝑥̄).   (4.33)

To understand this theorem, consider first the case 𝑘 = 1, that is, a single
constraint 𝑔(𝑥) = 0. Then (4.33) states 𝐷𝑓(𝑥̄) = 𝜆𝐷𝑔(𝑥̄), which means that the
gradient of 𝑓 (a row vector) is a scalar multiple of the gradient of 𝑔. The two
gradients have the 𝑛 partial derivatives of 𝑓 and 𝑔 as components, and each partial
derivative of 𝑔 is multiplied with the same 𝜆 to equal the respective partial derivative
of 𝑓. These are 𝑛 equations for the 𝑛 components of 𝑥̄ and 𝜆 as unknowns. An
additional equation is 𝑔(𝑥̄) = 0. Hence, these are 𝑛 + 1 equations for 𝑛 + 1 unknowns
in total. If there are 𝑘 constraints 𝑔𝑖(𝑥) = 0 for 1 ≤ 𝑖 ≤ 𝑘, then (4.33) and these
constraints are 𝑛 + 𝑘 equations for 𝑛 + 𝑘 unknowns 𝑥̄ and 𝜆1, . . . , 𝜆𝑘. Often these
equations have only finitely many solutions that can then be investigated further.
As an example with a single constraint, consider the functions 𝑓, 𝑔 : R² → R,

   𝑓(𝑥, 𝑦) = 𝑥 · 𝑦,   𝑔(𝑥, 𝑦) = 𝑥² + 𝑦² − 2,   (4.34)

where 𝑓(𝑥, 𝑦) is to be maximised or minimised subject to 𝑔(𝑥, 𝑦) = 0, that is, on
the set 𝑋 = {(𝑥, 𝑦) ∈ R² | 𝑥² + 𝑦² − 2 = 0}, which is a circle of radius √2 around
the origin (0, 0). The contour lines of 𝑓 and 𝑔 are shown in Figure 4.7. Because 𝑋
is compact and 𝑓 is continuous, 𝑓 assumes its maximum and minimum on 𝑋 by
the theorem of Weierstrass.

Figure 4.7 Illustration of the Theorem of Lagrange for 𝑓 and 𝑔 in (4.34).


The arrows indicate the gradients of 𝑓 and 𝑔, which are orthogonal to the
contour lines. These gradients have to be co-linear in order to find a local
maximum or minimum of 𝑓 (𝑥, 𝑦) subject to the constraint 𝑔(𝑥, 𝑦) = 0.

For (4.34), 𝐷𝑓(𝑥, 𝑦) = (𝑦, 𝑥) and 𝐷𝑔(𝑥, 𝑦) = (2𝑥, 2𝑦). Here 𝐷𝑔(𝑥, 𝑦) is linearly
dependent only if (𝑥, 𝑦) = (0, 0), which however does not fulfill 𝑔(𝑥, 𝑦) = 0,
so the constraint qualification always holds. The Lagrange multiplier 𝜆 has
to fulfill 𝐷𝑓(𝑥, 𝑦) = 𝜆𝐷𝑔(𝑥, 𝑦), that is, (𝑦, 𝑥) = 𝜆(2𝑥, 2𝑦). Here 𝑥 = 0 would
imply 𝑦 = 0 and vice versa, so we have 𝑥 ≠ 0 and 𝑦 ≠ 0, and the first equation
𝑦 = 2𝜆𝑥 implies 𝜆 = 𝑦/(2𝑥), which when substituted into the second equation
gives 𝑥 = 2𝜆𝑦 = 2𝑦²/(2𝑥) = 𝑦²/𝑥, and thus 𝑥² = 𝑦² or |𝑥| = |𝑦|. The constraint 𝑔(𝑥, 𝑦) = 0
then implies 𝑥² + 𝑦² − 2 = 2𝑥² − 2 = 0 and therefore |𝑥| = 1, which gives the four
solutions (1, 1), (−1, −1), (−1, 1), and (1, −1). For the first two solutions, 𝑓 takes the
value 1, and for the last two the value −1, so these are the local and in fact global
maxima and minima of 𝑓 on the circle 𝑋.
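The four candidate points can be verified against the Lagrange conditions directly (a Python sketch, not part of the formal argument):

```python
# Check the four solutions of the Lagrange conditions for (4.34):
# each lies on the circle g = 0 and satisfies Df = lambda * Dg.
def f(x, y):
    return x * y

def g(x, y):
    return x**2 + y**2 - 2

for (x, y) in [(1, 1), (-1, -1), (-1, 1), (1, -1)]:
    assert g(x, y) == 0                          # on the constraint circle
    lam = y / (2 * x)                            # from y = lambda * 2x
    assert (lam * 2 * x, lam * 2 * y) == (y, x)  # (y, x) = lambda * (2x, 2y)

assert f(1, 1) == f(-1, -1) == 1 and f(1, -1) == f(-1, 1) == -1
```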
The following functions illustrate why the constraint qualification is needed in
Theorem 4.10. Let
𝑓 (𝑥, 𝑦) = −𝑦, 𝑔(𝑥, 𝑦) = 𝑥 2 − 𝑦 3 , (4.35)

Figure 4.8 Horizontal contour lines of 𝑓 in (4.35), and of 𝑔 for 𝑔(𝑥, 𝑦) = 0,
given by 𝑦 = |𝑥|^(2/3). The arrows, orthogonal to the contour lines, are the
gradients of 𝑓 and 𝑔, which are nowhere co-linear. The maximum of 𝑓(𝑥, 𝑦)
is at (𝑥, 𝑦) = (0, 0), where the constraint qualification fails.

where 𝑓(𝑥, 𝑦) is maximised subject to 𝑔(𝑥, 𝑦) = 0, that is, on 𝑋 = {(𝑥, 𝑦) ∈
R² | 𝑦³ = 𝑥²}. The set 𝑋 is shown in Figure 4.8 as the two mirrored arcs of
the function 𝑦 = |𝑥|^(2/3), which end in a cusp (pointed end) at (𝑥, 𝑦) = (0, 0).
Here 𝐷𝑓(𝑥, 𝑦) = (0, −1) and 𝐷𝑔(𝑥, 𝑦) = (2𝑥, −3𝑦²). However, the equation
𝐷𝑓(𝑥, 𝑦) = 𝜆𝐷𝑔(𝑥, 𝑦), that is, (0, −1) = 𝜆(2𝑥, −3𝑦²), has no solution on 𝑋 at all:
the first component 0 = 2𝜆𝑥 implies 𝜆 = 0 or 𝑥 = 0; if 𝜆 = 0 the second component
−1 = −3𝜆𝑦² fails, and if 𝑥 = 0 then 𝑦 = 0 on 𝑋 and it fails as well. However, the
unique maximiser of 𝑓(𝑥, 𝑦) on 𝑋 is clearly (0, 0). The equation 𝐷𝑓(𝑥, 𝑦) = 𝜆𝐷𝑔(𝑥, 𝑦)
fails to hold because the constraint qualification is not fulfilled at (0, 0): there the
gradient 𝐷𝑔(0, 0) equals (0, 0), which is not a linearly independent vector.
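This failure can be checked directly (an informal Python sketch; the sampling of the constraint set is an arbitrary choice):

```python
# In example (4.35), Dg(0, 0) = (0, 0): the constraint qualification fails
# at the maximiser (0, 0) of f(x, y) = -y on X = {(x, y) | y = |x|^(2/3)}.
def Dg(x, y):
    return (2 * x, -3 * y**2)

assert Dg(0, 0) == (0, 0)

# f = -y <= 0 = f(0, 0) at sampled points of X (arbitrary sampling)
ys = [abs(x / 100)**(2 / 3) for x in range(-300, 301)]
assert all(-y <= 0 for y in ys)
```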
An example that gives a geometric justification for Theorem 4.10 was given in
Figure 4.3. We do not prove Theorem 4.10, but give a more general plausibility
argument for the case 𝑘 = 1, with the help of Taylor’s Theorem 4.6. Consider 𝑥̄ so
that 𝑓(𝑥̄) is a local maximum of 𝑓 on 𝑋 = {𝑥 ∈ 𝑈 | 𝑔(𝑥) = 0}, and thus 𝑔(𝑥̄) = 0.
Any variation Δ𝑥 around 𝑥̄ so that 𝑥̄ + Δ𝑥 ∈ 𝑋 requires

   0 = 𝑔(𝑥̄) = 𝑔(𝑥̄ + Δ𝑥) ≈ 𝑔(𝑥̄) + 𝐷𝑔(𝑥̄) · Δ𝑥   (4.36)

where “≈” means that we neglect the remainder term because we assume Δ𝑥
to be sufficiently small. By (4.36), 𝐷𝑔(𝑥̄) · Δ𝑥 = 0, and the set of these Δ𝑥’s is a
subspace of R𝑛 of dimension 𝑛 − 1 provided 𝐷𝑔(𝑥̄) ≠ 0, which holds by the constraint
qualification (this just says that the gradient of 𝑔 at the point 𝑥̄ is orthogonal to the
“contour set” {𝑥 ∈ R𝑛 | 𝑔(𝑥) = 0}). Similarly, a local maximum 𝑓(𝑥̄) requires

   𝑓(𝑥̄ + Δ𝑥) ≈ 𝑓(𝑥̄) + 𝐷𝑓(𝑥̄) · Δ𝑥 ≤ 𝑓(𝑥̄)   (4.37)

and therefore

   𝐷𝑔(𝑥̄) · Δ𝑥 = 0 ⇒ 𝐷𝑓(𝑥̄) · Δ𝑥 = 0   (4.38)

for the following reason: The condition 𝐷𝑔(𝑥̄) · Δ𝑥 = 0 states that 𝑥̄ + Δ𝑥 ∈ 𝑋
by (4.36). If for such a Δ𝑥 we had 𝐷𝑓(𝑥̄) · Δ𝑥 ≠ 0, then either 𝐷𝑓(𝑥̄) · Δ𝑥 > 0 or
𝐷𝑓(𝑥̄) · (−Δ𝑥) > 0, where also 𝐷𝑔(𝑥̄) · (−Δ𝑥) = 0 and then 𝑥̄ − Δ𝑥 ∈ 𝑋 because
𝑋 = 𝑈 ∩ {𝑥 ∈ R𝑛 | 𝑔(𝑥) = 0} and 𝑈 is an open set. But this contradicts (4.37). In
turn, because 𝐷𝑔(𝑥̄) ≠ 0, (4.38) holds only if 𝐷𝑓(𝑥̄) = 𝜆𝐷𝑔(𝑥̄) as claimed. This is a
heuristic argument where we have assumed that the functions 𝑓 and 𝑔 behave like
affine functions near 𝑥̄, which is approximately true because they are differentiable.
Consider the maximisation problem of 𝑓(𝑥) subject to 𝑔𝑖(𝑥) = 0 for 1 ≤ 𝑖 ≤ 𝑘,
for 𝑥 ∈ R𝑛 as in Theorem 4.10 with 𝑈 = R𝑛; all functions are 𝐶¹. Then the Lagrangian
for this problem is the function 𝐹 : R𝑛 × R𝑘 → R where 𝜆 = (𝜆1, . . . , 𝜆𝑘) ∈ R𝑘,

   𝐹(𝑥, 𝜆) = 𝑓(𝑥) − ∑_{𝑖=1}^{𝑘} 𝜆𝑖 𝑔𝑖(𝑥).   (4.39)

Then the stationary points of 𝐹 are by definition the points (𝑥, 𝜆) ∈ R𝑛 × R𝑘 with
zero derivative, that is,

   𝐷𝐹(𝑥, 𝜆) = 0.   (4.40)
These are 𝑛 + 𝑘 equations for the partial derivatives of 𝐹 with 𝑛 + 𝑘 unknowns,
the components of (𝑥, 𝜆). These equations define exactly the problem of finding
the Lagrange multipliers in (4.33) and of solving the given equality constraints.
Namely, the first 𝑛 equations in (4.40) are for the 𝑛 partial derivatives of 𝐹 with
respect to 𝑥𝑗, that is, by (4.39),

   𝜕𝐹/𝜕𝑥𝑗 (𝑥, 𝜆) = 𝜕𝑓/𝜕𝑥𝑗 (𝑥) − ∑_{𝑖=1}^{𝑘} 𝜆𝑖 𝜕𝑔𝑖/𝜕𝑥𝑗 (𝑥) = 0   (1 ≤ 𝑗 ≤ 𝑛).   (4.41)

These 𝑛 equations can be written as

   𝐷𝑓(𝑥) − ∑_{𝑖=1}^{𝑘} 𝜆𝑖 𝐷𝑔𝑖(𝑥) = 0   (4.42)

which is equivalent to (4.33). The last 𝑘 equations in (4.40) are for the 𝑘 partial
derivatives of 𝐹 with respect to 𝜆𝑖, that is,

   𝜕𝐹/𝜕𝜆𝑖 (𝑥, 𝜆) = −𝑔𝑖(𝑥) = 0   (1 ≤ 𝑖 ≤ 𝑘)   (4.43)
which is equivalent to 𝑔𝑖(𝑥) = 0 for 1 ≤ 𝑖 ≤ 𝑘, which are the given equality
constraints. Note that it does not make sense to maximise the Lagrangian 𝐹(𝑥, 𝜆)
without these constraints, because it is unbounded when we take any 𝑥 where
𝑔𝑖(𝑥) ≠ 0 and let 𝜆𝑖 → −∞ (if 𝑔𝑖(𝑥) > 0, or 𝜆𝑖 → ∞ if 𝑔𝑖(𝑥) < 0).
The Lagrangian is often defined as 𝐹(𝑥, 𝜆) = 𝑓(𝑥) + ∑_{𝑖=1}^{𝑘} 𝜆𝑖 𝑔𝑖(𝑥), which is
(4.39) but with a plus sign instead of a minus sign, which accordingly gives (4.42)

with a plus sign instead of a minus sign. This is also equivalent to (4.33) except for
the sign change of each 𝜆𝑖. We prefer (4.33), which states directly that 𝐷𝑓(𝑥̄) is a
linear combination of the gradients 𝐷𝑔𝑖(𝑥̄).
Lagrange multipliers can be interpreted as shadow prices in certain economic
settings. In such a setting, 𝑥 may represent an allocation of the variables 𝑥1, . . . , 𝑥𝑛
according to some production schedule which results in profit 𝑓(𝑥) for the firm,
subject to the constraints 𝑔𝑖(𝑥) = 0 for 1 ≤ 𝑖 ≤ 𝑘. The profit 𝑓(𝑥) is maximised at 𝑥̄,
with Lagrange multipliers 𝜆1, . . . , 𝜆𝑘 as in (4.33). Suppose that 𝑔̂𝑗(𝑥) is the amount
of some resource 𝑗 needed for production schedule 𝑥, for example manpower, of
which amount 𝑎𝑗 is available, so that 𝑔𝑗(𝑥) = 𝑔̂𝑗(𝑥) − 𝑎𝑗 = 0 (assuming all manpower
is used; we could more generally assume 𝑔𝑗(𝑥) ≤ 0, but here we just assume that for
𝑥 = 𝑥̄ this inequality is tight, 𝑔𝑗(𝑥̄) = 0).
Now suppose the amount of manpower can be increased from 𝑎𝑗 to 𝑎𝑗 + 𝜀 for
some small amount 𝜀 > 0, which results in the new constraint 𝑔𝑗(𝑥) = 𝜀, where all
other constraints are kept fixed. Assume that the new constraint results in a new
optimal solution 𝑥̄(𝜀), that is, 𝑔𝑗(𝑥̄(𝜀)) = 𝜀 and 𝑔𝑖(𝑥̄(𝜀)) = 0 for 𝑖 ≠ 𝑗. We claim that
then

   𝑓(𝑥̄(𝜀)) ≈ 𝑓(𝑥̄) + 𝜆𝑗 𝜀.   (4.44)

Namely, with 𝑥̄(𝜀) = 𝑥̄ + Δ𝑥 we have 𝐷𝑔𝑖(𝑥̄) · Δ𝑥 = 0 for 𝑖 ≠ 𝑗 in order to keep the
condition 𝑔𝑖(𝑥̄ + Δ𝑥) = 0 (see (4.36) above), but

   𝜀 = 𝑔𝑗(𝑥̄ + Δ𝑥) ≈ 𝑔𝑗(𝑥̄) + 𝐷𝑔𝑗(𝑥̄) · Δ𝑥 = 𝐷𝑔𝑗(𝑥̄) · Δ𝑥,   (4.45)

and thus

   𝑓(𝑥̄(𝜀)) = 𝑓(𝑥̄ + Δ𝑥) ≈ 𝑓(𝑥̄) + 𝐷𝑓(𝑥̄) · Δ𝑥
            = 𝑓(𝑥̄) + ∑_{𝑖=1}^{𝑘} 𝜆𝑖 𝐷𝑔𝑖(𝑥̄) · Δ𝑥
            = 𝑓(𝑥̄) + 𝜆𝑗 𝐷𝑔𝑗(𝑥̄) · Δ𝑥                                    (4.46)
            = 𝑓(𝑥̄) + 𝜆𝑗 𝜀

which shows (4.44). The interpretation of (4.44) is that adding 𝜀 more manpower
(amount of resource 𝑗) so that the constraint 𝑔𝑗(𝑥) = 0 is changed to 𝑔𝑗(𝑥) = 𝜀
increases the firm’s profit by 𝜆𝑗 𝜀. Hence, 𝜆𝑗 is the price per extra unit of manpower
that the firm should be willing to pay, given the current maximiser 𝑥̄ and associated
Lagrange multipliers 𝜆1, . . . , 𝜆𝑘 in (4.33).
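The shadow-price approximation (4.44) can be illustrated with the earlier example (4.34) (a Python sketch; it assumes, as derived before, that maximising 𝑥𝑦 subject to 𝑥² + 𝑦² − 2 = 𝜀 gives the maximiser 𝑥 = 𝑦 = √((2 + 𝜀)/2) with multiplier 𝜆 = 1/2 at 𝜀 = 0):

```python
# For f = x*y on x^2 + y^2 = 2 + eps, the maximum is (2 + eps)/2, and the
# multiplier at eps = 0 is lambda = 1/2, from (y, x) = lambda*(2x, 2y) at
# x = y = 1. Check f(xbar(eps)) ~ f(xbar) + lambda*eps as in (4.44).
import math

lam = 0.5
for eps in (0.1, 0.01, 0.001):
    x = math.sqrt((2 + eps) / 2)   # perturbed maximiser has x = y
    f_new = x * x
    assert abs(f_new - (1 + lam * eps)) < 1e-9
```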
The following is a typical problem that can be solved with the help of Lagrange’s
Theorem 4.10. A manufacturer of rectangular milk cartons wants to minimise the
material used to obtain a carton of a given volume. A carton is 𝑥 cm high, 𝑦 cm
wide and 𝑧 cm deep, and is folded according to the layout shown on the right in
Figure 4.9 (which is used twice, for front and back). Each of the four squares in a
corner of the layout with length 𝑧/2 (together with its counterpart on the back) is
folded into a triangle as shown on the left (the triangles at the bottom are folded
underneath the carton). We ignore any overlapping material used for glueing.
What are the optimal dimensions 𝑥, 𝑦, 𝑧 for a carton with volume 500 cm3 ?

Figure 4.9 Optimisation of a milk carton using the Theorem of Lagrange.

The layout on the right shows that the area 𝑓 (𝑥, 𝑦, 𝑧) of the material used is
(𝑥 + 𝑧)(𝑦 + 𝑧) times two (for front and back, but the factor 2 can be ignored in the
minimisation), subject to 𝑔(𝑥, 𝑦, 𝑧) = 𝑥𝑦𝑧 − 500 = 0. We have
𝐷 𝑓 (𝑥, 𝑦, 𝑧) = (𝑦 + 𝑧, 𝑥 + 𝑧, 𝑥 + 𝑦 + 2𝑧) , 𝐷 𝑔(𝑥, 𝑦, 𝑧) = (𝑦𝑧, 𝑥𝑧, 𝑥𝑦) .
Because clearly 𝑥, 𝑦, 𝑧 > 0, the derivative 𝐷 𝑔(𝑥, 𝑦, 𝑧) is never zero and therefore
linearly independent. By Lagrange’s theorem, there is some 𝜆 so that
𝑦 + 𝑧 = 𝜆𝑦𝑧 ,
𝑥 + 𝑧 = 𝜆𝑥𝑧 , (4.47)
𝑥 + 𝑦 + 2𝑧 = 𝜆𝑥𝑦 .
These equations are nonlinear, but simpler equations can be found by exploiting
their symmetry. Multiplying the first, second, and third equation in (4.47) by
𝑥, 𝑦, 𝑧, respectively (all of which are nonzero), these equations are equivalent to
𝑥(𝑦 + 𝑧) = 𝜆𝑥𝑦𝑧 ,
𝑦(𝑥 + 𝑧) = 𝜆𝑥𝑦𝑧 , (4.48)
𝑧(𝑥 + 𝑦 + 2𝑧) = 𝜆𝑥𝑦𝑧 ,
that is, they all have the same right-hand side. The first two equations in (4.48)
imply 𝑥𝑧 = 𝑦𝑧 and thus 𝑥 = 𝑦. With 𝑥 = 𝑦, the second and third equation give
   𝑥(𝑥 + 𝑧) = 𝑧(2𝑥 + 2𝑧) = 2𝑧(𝑥 + 𝑧)

and thus 𝑥 = 2𝑧. That is, the only optimal solution is of the form (2𝑧, 2𝑧, 𝑧).
Applied to the volume equation this gives 4𝑧³ = 500 or 𝑧³ = 125, that is, 𝑥 = 𝑦 = 10
cm and 𝑧 = 5 cm.
The area of material used is 2(𝑥 + 𝑧)(𝑦 + 𝑧) = 2 × 15² = 450 cm². In comparison,
the surface area of the carton without the extra folded triangles is 2(𝑥𝑦 + 𝑥𝑧 + 𝑦𝑧) =
2(100 + 50 + 50) = 400 cm². The extra material comes from the eight squares of size
2.5 × 2.5 for the folded triangles, which do not contribute to the surface of the
carton and have area 8 × 2.5² = 2 × 5² = 50 cm².
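A brute-force comparison supports that this solution is optimal (a Python sketch; the sampling grid is an arbitrary choice and not a proof):

```python
# Among sampled boxes of volume 500, none uses less material than
# the Lagrange solution (x, y, z) = (10, 10, 5) with area 2*(x+z)*(y+z).
def area(x, y, z):
    return 2 * (x + z) * (y + z)

best = area(10, 10, 5)
assert best == 450 and 10 * 10 * 5 == 500

for i in range(1, 200):
    for j in range(1, 200):
        x, y = i / 4, j / 4
        z = 500 / (x * y)          # enforce the volume constraint
        assert area(x, y, z) >= best - 1e-9
```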

4.9 Inequality Constraints and the KKT Conditions

In this section, we discuss conditions for local optimality of a function subject to


inequalities, as in the problem

maximise 𝑓 (𝑥) subject to ℎ 𝑖 (𝑥) ≤ 0 (1 ≤ 𝑖 ≤ ℓ ), (4.49)

where 𝑓 and ℎ 𝑖 are continuously differentiable functions on R𝑛 . The inequalities


(4.49) are always weak (allowing for equality) so that 𝑓 is maximised on a closed set,
which if bounded ensures existence of a maximum by the Theorem of Weierstrass.
The inequalities are written as in (4.49), that is, they require ℎ𝑖(𝑥) to be
nonpositive, because for a maximisation problem it is natural to think of the
constraints as upper bounds on limited resources. For example, if 𝑓(𝑥) is the profit
from a production schedule 𝑥, then ℎ𝑖(𝑥) ≤ 0 means that, in order to implement 𝑥,
the use ℎ𝑖(𝑥) of resource 𝑖 cannot exceed a certain bound (which is set to 0 by
a suitable choice of the function ℎ𝑖). In contrast, in a minimisation problem it
is often more natural to write lower bounds as in ℎ𝑖(𝑥) ≥ 0, which express
certain minimum conditions that have to be met, such as producing at least a
certain quantity of each good 𝑖, where the goal is to do so with minimum overall
cost 𝑓(𝑥). The direction of the inequality is in principle arbitrary because ℎ𝑖(𝑥) can
be replaced by −ℎ𝑖(𝑥). We consider maximisation problems with the convention
in (4.49).
Suppose ℓ = 1 in (4.49), that is, we have a single inequality constraint that we
write as ℎ(𝑥) ≤ 0. At a local maximiser 𝑥̄ of 𝑓, the inequality is either not tight,
ℎ(𝑥̄) < 0, or tight, ℎ(𝑥̄) = 0. The distinction between these two cases is central to the
first-order conditions for optimisation under inequality constraints, and is illustrated
in Figures 4.10 and 4.11. These can be summarised as follows: A constraint that
is not tight can be treated as if it is absent, and a tight constraint can be treated
like an equality constraint where the corresponding Lagrange multiplier has to be
nonnegative.
Consider the case that ℎ(𝑥̄) < 0, that is, the constraint is not tight, shown
in Figure 4.10. This means that there is an 𝜀-ball 𝐵(𝑥̄, 𝜀) around 𝑥̄ where (by

Figure 4.10 Maximisation of 𝑓(𝑥) subject to ℎ(𝑥) ≤ 0 (grey region). At the
maximiser 𝑥̄, the inequality is not tight (ℎ(𝑥̄) < 0), which requires 𝐷𝑓(𝑥̄) = 0.
The ellipses are contour lines of 𝑓.

Figure 4.11 Maximisation of 𝑓(𝑥) subject to ℎ(𝑥) ≤ 0. At the maximiser 𝑥̄,
the inequality is tight (ℎ(𝑥̄) = 0), which requires 𝐷𝑓(𝑥̄) = 𝜇𝐷ℎ(𝑥̄) for some
𝜇 ≥ 0.

continuity of ℎ) we also have ℎ(𝑥) < 0, and so 𝑓(𝑥̄) is an unconstrained local
maximum by Lemma 4.8, where by Lemma 4.9 we have 𝐷𝑓(𝑥̄) = 0. Hence, as
long as the constraint ℎ(𝑥) ≤ 0 holds as a strict inequality, it has no effect on the
gradient 𝐷𝑓(𝑥̄), just as if the constraint was absent.
The case ℎ(𝑥̄) = 0 is illustrated in Figure 4.11. The grey region {𝑥 | ℎ(𝑥) ≤ 0}
shown there has the contour line {𝑥 | ℎ(𝑥) = 0} as a boundary. If ℎ(𝑥) denotes
the “height of a terrain” at location 𝑥, then this grey region can be seen as a “lake”
with the surface of the water at height 0. The gradient 𝐷ℎ(𝑥), for getting out of
the lake, points outwards, orthogonally to the boundary. The function 𝑓(𝑥) may
denote the height of a different terrain, and the maximum of 𝑓(𝑥) is achieved at
the point 𝑥̄ where the contour line of 𝑓 touches the contour line of ℎ. This is exactly
the same situation as in the Lagrange multiplier problem, meaning 𝐷𝑓(𝑥̄) = 𝜇𝐷ℎ(𝑥̄) for
some Lagrange multiplier 𝜇, with the additional constraint that the gradients of 𝑓
and ℎ have to be not only co-linear, but point in the same direction, that is, 𝜇 ≥ 0.
(We use a different Greek letter 𝜇 instead of the usual 𝜆 to emphasise this.) The
reason is that at the point 𝑥̄ both ℎ(𝑥) and 𝑓(𝑥) are maximised, by “getting out of
the lake”, and by maximising 𝑓, in the direction of the gradients.
The following is the central theorem of this chapter. For its naming see
Section 4.1.3.

Theorem 4.11 (KKT, Karush-Kuhn-Tucker). Let 𝑓 : 𝑈 → R be a 𝐶¹(𝑈) function on
an open subset 𝑈 of R𝑛, and let ℎ1, . . . , ℎℓ : R𝑛 → R be 𝐶¹(R𝑛) functions. Let 𝑥̄ be a
local maximiser of 𝑓 on

   𝑋 = 𝑈 ∩ {𝑥 ∈ R𝑛 | ℎ𝑖(𝑥) ≤ 0, 1 ≤ 𝑖 ≤ ℓ}.   (4.50)

Let the set of vectors {𝐷ℎ𝑖(𝑥̄) | ℎ𝑖(𝑥̄) = 0} be linearly independent (“constraint qualifica-
tion” for the tight constraints). Then there exist 𝜇1, . . . , 𝜇ℓ ∈ R so that for 1 ≤ 𝑖 ≤ ℓ

   𝜇𝑖 ≥ 0,   𝜇𝑖 ℎ𝑖(𝑥̄) = 0,   𝐷𝑓(𝑥̄) = ∑_{𝑖=1}^{ℓ} 𝜇𝑖 𝐷ℎ𝑖(𝑥̄).   (4.51)

In (4.51), it is important to understand the middle condition 𝜇𝑖 ℎ𝑖(𝑥̄) = 0, which
is equivalent to

   ℎ𝑖(𝑥̄) < 0 ⇒ 𝜇𝑖 = 0.   (4.52)

The purpose of this condition is to disregard all non-tight constraints ℎ𝑖(𝑥̄) < 0 by
requiring the multiplier 𝜇𝑖 to be zero, so that the corresponding gradients 𝐷ℎ𝑖(𝑥̄)
do not even appear in the linear combination that represents 𝐷𝑓(𝑥̄) in (4.51). That
is, 𝐷𝑓(𝑥̄) is a nonnegative linear combination of the gradients 𝐷ℎ𝑖(𝑥̄) for the tight
constraints only (assuming they are linearly independent as stated in the constraint
qualification). That is, (4.51) can also be written as follows: Let 𝐸 be the set of tight
(or “effective”) constraints,

   𝐸 = {𝑖 | 1 ≤ 𝑖 ≤ ℓ, ℎ𝑖(𝑥̄) = 0}.   (4.53)

Then there are reals 𝜇𝑖 ≥ 0 for 𝑖 ∈ 𝐸 such that

   𝐷𝑓(𝑥̄) = ∑_{𝑖∈𝐸} 𝜇𝑖 𝐷ℎ𝑖(𝑥̄).   (4.54)

Proof of Theorem 4.11. We prove the KKT Theorem with the help of the Theorem
4.10 of Lagrange. Let 𝑥̄ be a local maximiser of 𝑓 on 𝑋, and let 𝐸 be the set of
effective constraints as in (4.53). Because the functions ℎ𝑖 for 𝑖 ∉ 𝐸 are continuous,
the set 𝑉 defined by

   𝑉 = 𝑈 ∩ {𝑥 ∈ R𝑛 | ℎ𝑖(𝑥) < 0, 1 ≤ 𝑖 ≤ ℓ, 𝑖 ∉ 𝐸}   (4.55)

is open, and we can consider 𝑓 as a function 𝑉 → R. We apply Theorem 4.10 to
this function subject to 𝑘 = |𝐸| equations on the set

   𝑉 ∩ {𝑥 ∈ R𝑛 | ℎ𝑖(𝑥) = 0, 𝑖 ∈ 𝐸}   (4.56)

where it has the local maximiser 𝑥̄. Because the constraint qualification holds for
the gradients 𝐷ℎ𝑖(𝑥̄) for 𝑖 ∈ 𝐸, there are Lagrange multipliers 𝜇𝑖 for 𝑖 ∈ 𝐸 so that
(4.54) holds. It remains to show that they are nonnegative.
Suppose 𝜇𝑗 < 0 for some 𝑗 ∈ 𝐸. Because 𝑥̄ is in the interior of 𝑉, for sufficiently
small 𝜀 > 0 we can find Δ𝑥 ∈ R𝑛 so that 𝑥̄ + Δ𝑥 ∈ 𝑉 and, as in (4.45),

   ℎ𝑗(𝑥̄ + Δ𝑥) = −𝜀   (4.57)

and ℎ𝑖(𝑥̄ + Δ𝑥) = 0 for 𝑖 ∈ 𝐸 − {𝑗}, so that 𝑥̄ + Δ𝑥 ∈ 𝑋 in (4.50), that is, all inequality
constraints are fulfilled. Then, as in (4.46),

   𝑓(𝑥̄ + Δ𝑥) ≈ 𝑓(𝑥̄) − 𝜇𝑗 𝜀 > 𝑓(𝑥̄)   (4.58)

because 𝜇𝑗 < 0. This is a contradiction because 𝑓(𝑥̄) is a local maximum on the
set 𝑋 (see also our explanation with Figure 4.11 above). This proves that 𝜇𝑖 ≥ 0 for
all 𝑖 ∈ 𝐸 as claimed.

Figure 4.12 Illustration of the KKT Theorem for 𝑛 = 1, ℓ = 1, showing 𝑓(𝑥)
and ℎ(𝑥) as functions of 𝑥. The condition ℎ(𝑥) ≤ 0 holds for 𝑥 ∈ [𝑎, 𝑏] ∪ [𝑐, 𝑑]
(where both functions are shown as bold curves), and is tight for 𝑥 ∈ {𝑎, 𝑏, 𝑐, 𝑑}.
For 𝑥̄ = 𝑑 the constraint qualification fails because 𝐷ℎ(𝑑) = 0.

The sign conditions in the KKT Theorem are most easily remembered (or
reconstructed) for a single constraint in dimension 𝑛 = 1, as shown in Figure 4.12.
There the condition ℎ(𝑥) ≤ 0 holds on the two intervals [𝑎, 𝑏] and [𝑐, 𝑑] and is tight
at either end of each interval. For 𝑥̄ = 𝑎 both 𝑓 and ℎ have a negative derivative,
and hence 𝐷𝑓(𝑥̄) = 𝜇𝐷ℎ(𝑥̄) for some 𝜇 ≥ 0, and indeed 𝑥̄ is a local maximiser
of 𝑓. For 𝑥̄ ∈ {𝑏, 𝑐} the derivatives of 𝑓 and ℎ have opposite sign, and in each case
𝐷𝑓(𝑥̄) = 𝜆𝐷ℎ(𝑥̄) for some 𝜆 but 𝜆 < 0, so these are not maximisers of 𝑓. However,
in that case −𝐷𝑓(𝑥̄) = −𝜆𝐷ℎ(𝑥̄), and hence both 𝑏 and 𝑐 are local maximisers of
−𝑓 and hence local minimisers of 𝑓, in agreement with the picture. For 𝑥̄ = 𝑑 we
have a local maximum 𝑓(𝑥̄) with 𝐷𝑓(𝑥̄) > 0 but 𝐷ℎ(𝑥̄) = 0, and hence there is no 𝜇 with
𝐷𝑓(𝑥̄) = 𝜇𝐷ℎ(𝑥̄), because the constraint qualification fails. Moreover, there are two
points 𝑥 in the interior of [𝑐, 𝑑] where 𝑓 has zero derivative, which is a necessary
condition for a local maximum of 𝑓 because ℎ(𝑥) ≤ 0 is not tight. One of these
points is indeed a local maximum.

Method 4.12. The following is a “cookbook procedure” to use the KKT Theorem
4.11 in order to find the optimum of a function 𝑓 : R𝑛 → R.
1. Write all inequality constraints in the form ℎ 𝑖 (𝑥) ≤ 0 for 1 ≤ 𝑖 ≤ ℓ . In particular,
write a constraint such as 𝑔(𝑥) ≥ 0 in the form −𝑔(𝑥) ≤ 0 .
2. Assert that the functions 𝑓 , ℎ1 , . . . , ℎℓ are 𝐶 1 functions on R𝑛 . If the function 𝑓
is to be minimised, replace it by − 𝑓 to obtain a maximisation problem.
3. Check if the set
𝑆 = {𝑥 ∈ R𝑛 | ℎ 𝑖 (𝑥) ≤ 0, 1 ≤ 𝑖 ≤ ℓ } (4.59)

is bounded and hence compact, which ensures the existence of a (global)


maximum of 𝑓 on 𝑆 by the Theorem of Weierstrass.
3a. If not, check if the set

𝑇 = 𝑆 ∩ {𝑥 ∈ R𝑛 | 𝑓 (𝑥) ≥ 𝑐} (4.60)

is non-empty and bounded for some 𝑐 ∈ R, so that 𝑓 (𝑥) has a maximum on


𝑇, also by Weierstrass, which is also a maximum of 𝑓 on 𝑆 because any other
point 𝑥 in 𝑆 − 𝑇 fulfills 𝑓 (𝑥) < 𝑐 . If you cannot easily assert these conditions,
𝑓 may be unbounded on 𝑆, which you should find out. (In fact, if the set 𝑇 in
(4.60) is non-empty for all 𝑐 ∈ R, then 𝑓 is indeed unbounded on 𝑆, but you
are not meant to perform a search for the largest such 𝑐 in order to solve the
maximisation problem.)
4. For all 2^ℓ subsets 𝐸 of {1, . . . , ℓ} as possible “effective” constraints, consider
the set of solutions 𝑥̄ so that ℎ𝑖(𝑥̄) = 0 for 𝑖 ∈ 𝐸, and ℎ𝑖(𝑥̄) < 0 for 𝑖 ∉ 𝐸. For
every such 𝐸, do the following.
4a. Determine the gradients 𝐷ℎ𝑖(𝑥̄) for 𝑖 ∈ 𝐸 and check if they are linearly independent.
For any critical point 𝑥̄ where they are not linearly independent, the constraint
qualification fails and we have to evaluate 𝑓(𝑥̄) as a possible maximum.
4b. Find solutions 𝑥̄ and 𝜇𝑖 for 𝑖 ∈ 𝐸 to (4.54) and to the equations ℎ𝑖(𝑥̄) = 0 for
𝑖 ∈ 𝐸. If 𝜇𝑖 ≥ 0 for all 𝑖 ∈ 𝐸, then this is a candidate for a local maximum of 𝑓,
otherwise not.
5. Compare the function values of 𝑓(𝑥̄) found in 4b, and of 𝑓(𝑥̄) for the critical
points 𝑥̄ in 4a, to determine the global maximum (which may occur for more
than one maximiser).
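The enumeration in Step 4 can be sketched as follows (a minimal Python illustration of generating the 2^ℓ subsets of possible tight constraints; the candidate solving in Step 4b remains problem-specific):

```python
# Generate all subsets E of the constraint indices {1, ..., l},
# illustrated for l = 3 constraints as in the example that follows.
from itertools import chain, combinations

l = 3
subsets = list(chain.from_iterable(combinations(range(1, l + 1), r)
                                   for r in range(l + 1)))
assert len(subsets) == 2**l           # 8 possible sets of tight constraints
assert () in subsets and (1, 2, 3) in subsets
```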

The main step in this method is Step 4. As an example, we apply Method 4.12
to the problem

   maximise 𝑥 + 𝑦 subject to (1/2)𝑦 ≤ (1/2)𝑥,  𝑦 ≤ 5/4 − (1/4)𝑥²,  𝑦 ≥ 0.   (4.61)

In the standard form required by Step 1, this is a problem with 𝑛 = 2, ℓ = 3, where
𝑓, ℎ1, ℎ2, ℎ3 : R² → R are defined by

   𝑓(𝑥, 𝑦) = 𝑥 + 𝑦
   ℎ1(𝑥, 𝑦) = −(1/2)𝑥 + (1/2)𝑦
   ℎ2(𝑥, 𝑦) = (1/4)𝑥² + 𝑦 − 5/4                                    (4.62)
   ℎ3(𝑥, 𝑦) = −𝑦

and 𝑓(𝑥, 𝑦) is to be maximised subject to ℎ𝑖(𝑥, 𝑦) ≤ 0 for 𝑖 = 1, 2, 3. All functions
are in 𝐶¹(R²) as required by Step 2.
This defines the set 𝑆 in (4.59) shown in Figure 4.13. The conditions (4.61) are
in a more familiar format for drawing in the (𝑥, 𝑦)-plane, because they show the
constraints on (𝑥, 𝑦) with 𝑦 as constrained by a function of 𝑥 (the first inequality is
clearly equivalent to 𝑦 ≤ 𝑥). As the picture shows, the set 𝑆 is bounded and hence
compact (Step 3; we do not need Step 3a).
From (4.62) we obtain the following gradients, which are required in Step 4:

   𝐷𝑓(𝑥, 𝑦) = (1, 1)
   𝐷ℎ1(𝑥, 𝑦) = (−1/2, 1/2)
   𝐷ℎ2(𝑥, 𝑦) = ((1/2)𝑥, 1)                                    (4.63)
   𝐷ℎ3(𝑥, 𝑦) = (0, −1).

There are eight possible subsets 𝐸 of {1, 2, 3}. If 𝐸 = ∅, then (4.54) holds if
𝐷𝑓(𝑥, 𝑦) = (0, 0), which is never the case. Next, consider the three “corners” of the
set 𝑆 which are defined when two inequalities are tight, where 𝐸 has two elements.
If 𝐸 = {1, 2} then ℎ1(𝑥, 𝑦) = 0 and ℎ2(𝑥, 𝑦) = 0 hold if 𝑥 = 𝑦 and (1/4)𝑥² + 𝑥 − 5/4 = 0
or 𝑥² + 4𝑥 − 5 = 0, that is, (𝑥 − 1)(𝑥 + 5) = 0 or 𝑥 ∈ {1, −5}, where only 𝑥 = 𝑦 = 1
fulfills 𝑦 ≥ 0, so this is the point 𝑎 = (1, 1) shown in Figure 4.13. In this case
ℎ3(1, 1) = −1 < 0, so the third inequality is indeed not tight (if it were, this would

Figure 4.13 The set 𝑆 defined by the constraints in (4.62). The triple short
lines next to each line defined by ℎ𝑖(𝑥, 𝑦) = 0 (for 𝑖 = 1, 2, 3) show the side
where ℎ𝑖(𝑥, 𝑦) ≤ 0 holds, abbreviated as ℎ𝑖 ≤ 0. The (infinite) set 𝐶 is the
cone of all nonnegative linear combinations of the gradients 𝐷ℎ1(𝑥, 𝑦) and
𝐷ℎ2(𝑥, 𝑦) for the point 𝑎 = (𝑥, 𝑦) = (1, 1), which does not contain 𝐷𝑓(𝑥, 𝑦), so
𝑓(𝑎) is not a local maximum. At the point 𝑏 = (2, 1/4) we have 𝐷𝑓(𝑏) = 𝜇2 𝐷ℎ2(𝑏)
for the (only) tight constraint ℎ2(𝑏) = 0, and 𝜇2 ≥ 0, so 𝑓(𝑏) is a local maximum.

correspond to the case 𝐸 = {1, 2, 3}). Then the two gradients are 𝐷 ℎ1 (1, 1) = (−1/2, 1/2)
and 𝐷 ℎ2 (1, 1) = (1/2, 1), which are not scalar multiples of each other and therefore
linearly independent, so the constraint qualification holds. Because these are
two linearly independent vectors in R2, any vector, in particular 𝐷 𝑓 (1, 1), can be
represented as a linear combination of them. That is, there are 𝜇1 and 𝜇2 with

𝐷 𝑓 (1, 1) = (1, 1) = 𝜇1 𝐷 ℎ1 (1, 1) + 𝜇2 𝐷 ℎ2 (1, 1) = 𝜇1 (−1/2, 1/2) + 𝜇2 (1/2, 1), (4.64)

which are uniquely given by 𝜇1 = −2/3, 𝜇2 = 4/3. Because 𝜇1 < 0, we do not have a


local maximum of 𝑓 . We can also see this in the picture: By allowing the constraint
ℎ1 (𝑥, 𝑦) ≤ 0 to be non-tight and keeping ℎ2 (𝑥, 𝑦) ≤ 0 tight, (𝑥, 𝑦) can move along
the line ℎ2 (𝑥, 𝑦) = 0 and increase 𝑓 (𝑥, 𝑦) in the direction 𝐷 𝑓 (𝑥, 𝑦) (exactly as in
(4.58) in the proof of Theorem 4.11 for a negative 𝜇 𝑗 ). Figure 4.13 shows the cone 𝐶
spanned by 𝐷 ℎ1 (𝑎) and 𝐷 ℎ2 (𝑎), that is, 𝐶 = {𝜇1 𝐷 ℎ 1 (𝑎) + 𝜇2 𝐷 ℎ2 (𝑎) | 𝜇1 , 𝜇2 ≥ 0 }.
Only gradients 𝐷 𝑓 (𝑎) in that cone “push” the function values of 𝑓 in such a way
that 𝑓 is maximised at 𝑎. This is not the case here.
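The 2 × 2 system (4.64) for the multipliers can also be checked numerically. The following sketch is not part of Method 4.12; it assumes NumPy is available, and solves for 𝜇1 and 𝜇2 at the point 𝑎 = (1, 1):

```python
import numpy as np

# Solve Df(1,1) = mu1 * Dh1(1,1) + mu2 * Dh2(1,1) as in (4.64):
# the columns of G are the gradients Dh1(1,1) and Dh2(1,1).
G = np.array([[-0.5, 0.5],
              [ 0.5, 1.0]])
Df = np.array([1.0, 1.0])
mu = np.linalg.solve(G, Df)
print(mu)  # [-0.66666667  1.33333333], i.e. mu1 = -2/3, mu2 = 4/3
```

Since the computed 𝜇1 is negative, this confirms that 𝑓 (𝑎) is not a local maximum.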
If 𝐸 = {1, 3} then ℎ1 (𝑥, 𝑦) = 0 and ℎ3 (𝑥, 𝑦) = 0 require 𝑥 = 𝑦 and 𝑦 = 0,
which is the point (0, 0). The two gradients 𝐷 ℎ1 (0, 0) and 𝐷 ℎ3 (0, 0) in (4.63) are
linearly independent. The unique solution to 𝐷 𝑓 (0, 0) = (1, 1) = 𝜇1 𝐷 ℎ1 (0, 0) +
𝜇3 𝐷 ℎ3 (0, 0) = 𝜇1 (−1/2, 1/2) + 𝜇3 (0, −1) is 𝜇1 = −2, 𝜇3 = −2. Here both multipliers are
negative, so then −𝐷 𝑓 (0, 0) is a positive combination of 𝐷 ℎ1 (0, 0) and 𝐷 ℎ3 (0, 0),
which shows that 𝑓 (0, 0) is a local minimum of 𝑓 (it is easily seen to be the global
minimum).
If 𝐸 = {2, 3} then ℎ2 (𝑥, 𝑦) = 0 and ℎ3 (𝑥, 𝑦) = 0 require 𝑦 = 0 and (1/4)𝑥² − 5/4 = 0 or
𝑥 = √5 (for 𝑥 = −√5 we would have ℎ1 (𝑥, 0) > 0). Then 𝐷 ℎ2 (√5, 0) = (√5/2, 1) and
𝐷 ℎ3 (√5, 0) = (0, −1), which are also linearly independent. The unique solution to
𝐷 𝑓 (√5, 0) = 𝜇2 𝐷 ℎ2 (√5, 0) + 𝜇3 𝐷 ℎ3 (√5, 0), that is, to (1, 1) = 𝜇2 (√5/2, 1) + 𝜇3 (0, −1),
is 𝜇2 = 2/√5, 𝜇3 = 2/√5 − 1 < 0, so this point is also not a local maximum, even
though it has the highest value √5 of the three “corners” of the set 𝑆 considered so
far.
If 𝐸 = {1, 2, 3} then there is no common solution to the three equations
ℎ 𝑖 (𝑥, 𝑦) = 0 for 𝑖 ∈ 𝐸 because any two of them already have different unique
solutions.
When 𝐸 is a singleton {𝑖}, the gradient 𝐷 ℎ𝑖 (𝑥, 𝑦) is always a nonzero vector
and so the constraint qualification holds. If 𝐸 = {1} then (4.54) requires 𝐷 𝑓 (𝑥, 𝑦) =
(1, 1) = 𝜇1 𝐷 ℎ1 (𝑥, 𝑦) = 𝜇1 (−1/2, 1/2), which has no solution 𝜇1 because the two gradients
are not scalar multiples of each other. The same applies when 𝐸 = {3}, where
𝐷 𝑓 (𝑥, 𝑦) = (1, 1) = 𝜇3 𝐷 ℎ3 (𝑥, 𝑦) = 𝜇3 (0, −1) has no solution 𝜇3.
However, for 𝐸 = {2} we do have a solution to Step 4b in Method 4.12. The
equation 𝐷 𝑓 (𝑥, 𝑦) = (1, 1) = 𝜇2 𝐷 ℎ2 (𝑥, 𝑦) = 𝜇2 ((1/2)𝑥, 1) has the unique solution
𝜇2 = 1 ≥ 0 and 𝑥 = 2, and ℎ2 (𝑥, 𝑦) = 0 is solved for 4/4 + 𝑦 − 5/4 = 0 or 𝑦 = 1/4. This
is the point 𝑏 = (2, 1/4) shown in Figure 4.13. Hence, 𝑓 (𝑏) is a local maximum of 𝑓 .
Because it is the only local maximum found, it is also the global maximum of 𝑓 ,
which exists as confirmed in Step 3.
A rather laborious part of Method 4.12 in this example has been to compute
the multipliers 𝜇𝑖 for 𝑖 ∈ 𝐸 in Step 4b and to check their signs. We have done this
in detail to explain the interpretation of these signs, with the cone 𝐶 shown in
Figure 4.13 at the candidate point 𝑎, which fulfills all conditions except for the sign
of 𝜇1. However, there is a shortcut which avoids computing the multipliers, by
going directly to Step 5 and simply comparing the function values at these points.
In our example, the candidate points were (1, 1), (0, 0), (√5, 0), and (2, 1/4), with
corresponding function values 2, 0, √5, and 9/4, of which the latter is the largest (we
have 9/4 > √5 because (9/4)² = 81/16 > 5).
We consider a second example that has two local maxima, where one local
maximum has a tight constraint that has a zero multiplier. The problem is again
about a function 𝑓 : R2 → R and says

maximise 𝑓 (𝑥, 𝑦) = 𝑥² − 𝑦 subject to 𝑥 ≥ 0, 𝑥² + 𝑦² ≤ 1 . (4.65)

We write the constraints with ℎ1 , ℎ2 : R2 → R as


Figure 4.14 The maximisation problem (4.65). The parabolas are contour
lines of 𝑓 . The points 𝑎 = (0, −1) and 𝑏 = (√3/2, −1/2) are local maximisers of 𝑓 ,
and 𝑓 (𝑏) is the global maximum. The gradients are displayed shorter to save
space.

ℎ1 (𝑥, 𝑦) = −𝑥 ≤ 0 , ℎ2 (𝑥, 𝑦) = 𝑥² + 𝑦² − 1 ≤ 0 . (4.66)

The corresponding set 𝑆 is shown in Figure 4.14 and compact, so 𝑓 has a maximum.
We have

𝐷 𝑓 (𝑥, 𝑦) = (2𝑥, −1), 𝐷 ℎ1 (𝑥, 𝑦) = (−1, 0), 𝐷 ℎ2 (𝑥, 𝑦) = (2𝑥, 2𝑦). (4.67)

There are four possible subsets 𝐸 of {1, 2} of tight constraints in (4.53). If
𝐸 = ∅ then we need 𝐷 𝑓 (𝑥, 𝑦) = (0, 0), which is never the case. If 𝐸 = {1, 2} then
ℎ1 (𝑥, 𝑦) = ℎ2 (𝑥, 𝑦) = 0 in (4.66) has two solutions (𝑥, 𝑦) = (0, 1) and (𝑥, 𝑦) = (0, −1).
Then the two gradients 𝐷 ℎ1 (𝑥, 𝑦) and 𝐷 ℎ2 (𝑥, 𝑦) are linearly independent and the
constraint qualification holds. For (𝑥, 𝑦) = (0, 1) we have 𝐷 𝑓 (0, 1) = (0, −1) =
𝜇1 𝐷 ℎ1 (0, 1) + 𝜇2 𝐷 ℎ2 (0, 1) = 𝜇1 (−1, 0) + 𝜇2 (0, 2), so 𝜇1 = 0 and 𝜇2 = −1/2, so this
is a local (and in fact global) minimum of 𝑓 . For (𝑥, 𝑦) = 𝑎 = (0, −1) we have
𝐷 𝑓 (0, −1) = (0, −1) = 𝜇1 𝐷 ℎ1 (0, −1) + 𝜇2 𝐷 ℎ2 (0, −1) = 𝜇1 (−1, 0) + 𝜇2 (0, −2), so
𝜇1 = 0 and 𝜇2 = 1/2, so 𝑓 (𝑎) is a local maximum of 𝑓 . Note that ℎ1 (𝑎) = 0 but the
corresponding multiplier 𝜇1 for this tight inequality is zero, which is possible and
allowed.
If 𝐸 = {1} then ℎ1 (𝑥, 𝑦) = 0 and 𝐷 𝑓 (𝑥, 𝑦) = (2𝑥, −1) = 𝜇1 𝐷 ℎ1 (𝑥, 𝑦) = 𝜇1 (−1, 0)
has no solution 𝜇1. If 𝐸 = {2} then ℎ2 (𝑥, 𝑦) = 0 and 𝐷 𝑓 (𝑥, 𝑦) = (2𝑥, −1) =
𝜇2 𝐷 ℎ2 (𝑥, 𝑦) = 𝜇2 (2𝑥, 2𝑦) has the following solution: We have 𝑥 > 0 because the
first constraint ℎ1 (𝑥, 𝑦) ≤ 0 is not tight, which requires 𝜇2 = 1 and hence 2𝑦 = −1,
that is, 𝑦 = −1/2. Then ℎ2 (𝑥, 𝑦) = 0 means 𝑥² + 1/4 − 1 = 0 or 𝑥 = √3/2. This is the
point 𝑏 = (√3/2, −1/2) with 𝑓 (𝑏) = 3/4 + 1/2 = 5/4, which is larger than 𝑓 (𝑎) = 𝑓 (0, −1) = 1,
so 𝑓 (𝑏) is the global maximum of 𝑓 .
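This example, too, can be cross-checked numerically. The sketch below is illustrative only and assumes SciPy; note that a local solver such as SLSQP may converge to either of the two local maximisers 𝑎 and 𝑏 depending on the starting point, so we start near 𝑏:

```python
from scipy.optimize import minimize

# Maximise x^2 - y over x >= 0, x^2 + y^2 <= 1, as in example (4.65),
# by minimising the negated objective.
cons = [
    {"type": "ineq", "fun": lambda v: v[0]},
    {"type": "ineq", "fun": lambda v: 1 - v[0]**2 - v[1]**2},
]
res = minimize(lambda v: -(v[0]**2 - v[1]), x0=[0.8, -0.4],
               method="SLSQP", constraints=cons)
print(res.x, -res.fun)  # close to (sqrt(3)/2, -1/2) with value 5/4
```

The value 5/4 agrees with the global maximum 𝑓 (𝑏) found above.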
Finally, we consider an example for the KKT Theorem where the constraint
qualification fails, although, unusually, the set over which 𝑓 is optimised does not
have a cusp. This example also provides a motivation for the next chapter on linear
optimisation. The problem says:

maximise 2𝑥 + 𝑦 subject to 𝑥 ≥ 0, 𝑦 ≥ 0, 𝑦 · (𝑥 + 𝑦 − 1) ≤ 0. (4.68)

It is of the form considered in Theorem 4.11 for 𝑛 = 2, ℓ = 3, where the objective


function 𝑓 and constraint functions ℎ1 , ℎ2 , ℎ3 , and their gradients, are given as
follows:

𝑓 (𝑥, 𝑦) = 2𝑥 + 𝑦, 𝐷 𝑓 (𝑥, 𝑦) = (2, 1)


ℎ1 (𝑥, 𝑦) = −𝑥 ≤ 0, 𝐷 ℎ 1 (𝑥, 𝑦) = (−1, 0)
(4.69)
ℎ2 (𝑥, 𝑦) = −𝑦 ≤ 0, 𝐷 ℎ 2 (𝑥, 𝑦) = (0, −1)
ℎ3 (𝑥, 𝑦) = 𝑦 · (𝑥 + 𝑦 − 1) ≤ 0, 𝐷 ℎ 3 (𝑥, 𝑦) = (𝑦, 𝑥 + 2𝑦 − 1).

The constraint ℎ3 (𝑥, 𝑦) ≤ 0 can be understood as follows: It holds as a tight


constraint if ℎ3 (𝑥, 𝑦) = 0, that is, 𝑦 = 0 or 𝑥 + 𝑦 − 1 = 0, the latter equation given by
the line 𝑦 = 1 − 𝑥. If 𝑦 > 0 then ℎ3 (𝑥, 𝑦) ≤ 0 only if 𝑥 + 𝑦 − 1 ≤ 0, that is, 𝑦 ≤ 1 − 𝑥,
and if 𝑦 < 0 then ℎ3 (𝑥, 𝑦) ≤ 0 only if 𝑥 + 𝑦 − 1 ≥ 0, that is, 𝑦 ≥ 1 − 𝑥, as shown in
Figure 4.15. However, the case 𝑦 < 0 is excluded by the second constraint 𝑦 ≥ 0
in (4.68). This means that the set 𝑆 in (4.59) is given by the triangle with corners
(0, 0), (1, 0), and (0, 1), shown in Figure 4.16.
We first consider the corners of the triangle as possible solutions to the KKT
conditions. (It is obviously much easier to just evaluate the function on those
corners as in Step 5 of Method 4.12, but we want to check if the KKT Theorem can be
applied.) Let (𝑥, 𝑦) = (0, 0). Then all three constraints are tight, with 𝐸 = {1, 2, 3}
in (4.53). The three vectors of derivatives 𝐷 ℎ1 (0, 0), 𝐷 ℎ 2 (0, 0), 𝐷 ℎ 3 (0, 0) in R2 are
necessarily linearly dependent, so the constraint qualification fails and we have to
investigate this critical point as a possible maximum.
For (𝑥, 𝑦) = (1, 0), we have ℎ2 (1, 0) = 0 and ℎ3 (1, 0) = 0 but ℎ 1 (1, 0) < 0, so
𝐸 = {2, 3} in (4.53). By (4.69), 𝐷 ℎ2 (1, 0) = (0, −1) and 𝐷 ℎ 3 (1, 0) = (0, 0), which are
linearly dependent vectors, so this is another critical point where the constraint
qualification fails.

Figure 4.15 The set of points (𝑥, 𝑦) so that ℎ3 (𝑥, 𝑦) = 𝑦 · (𝑥 + 𝑦 − 1) ≤ 0, shown


as the dark area. It stretches infinitely in both directions and is therefore
shown as “torn off” on the left and right.


Figure 4.16 The feasible set and the objective function for the example
(4.68), and the optimal point (1, 0).

For (𝑥, 𝑦) = (0, 1), the tight constraints are given by ℎ 1 (0, 1) = ℎ3 (0, 1) = 0
whereas ℎ 2 (0, 1) < 0, so 𝐸 = {1, 3} in (4.53). By (4.69), 𝐷 ℎ1 (0, 1) = (−1, 0) and
𝐷 ℎ3 (0, 1) = (1, 1), which are linearly independent vectors. We want to find 𝜇1 and
𝜇3 that are nonnegative so that

𝐷 𝑓 (0, 1) = 𝜇1 𝐷 ℎ 1 (0, 1) + 𝜇3 𝐷 ℎ 3 (0, 1),

that is,
(2, 1) = 𝜇1 (−1, 0) + 𝜇3 (1, 1)
which has the unique solution 𝜇3 = 1 and 𝜇1 = −1. Because 𝜇1 < 0, the KKT
conditions (4.51) fail and (𝑥, 𝑦) = (0, 1) cannot be a local maximum of 𝑓 (nor, for
that matter, a minimum, because 𝜇3 > 0; for a minimum of 𝑓 , that is, a
maximum of − 𝑓 , we would need 𝜇1 ≤ 0 and 𝜇3 ≤ 0).
For completeness, we consider the cases of fewer tight constraints. 𝐸 = ∅
would require 𝐷 𝑓 (𝑥, 𝑦) = (0, 0), which is never the case. If 𝐸 = {1} then 𝐷 𝑓 (𝑥, 𝑦)
would have to be a scalar multiple of 𝐷 ℎ1 (𝑥, 𝑦) but it is not, and neither is it a scalar
multiple of 𝐷 ℎ2 (𝑥, 𝑦) when 𝐸 = {2}. Consider 𝐸 = {3}, so ℎ3 (𝑥, 𝑦) = 0 is the
only tight constraint, that is, 𝑥 > 0 and 𝑦 > 0. Then ℎ3 (𝑥, 𝑦) = 0 is equivalent to
𝑥 + 𝑦 − 1 = 0. Then we need 𝜇3 ≥ 0 so that 𝐷 𝑓 (𝑥, 𝑦) = 𝜇3 𝐷 ℎ3 (𝑥, 𝑦), that is,

(2, 1) = 𝜇3 (𝑦, 𝑥 + 2𝑦 − 1)

or 2 = 𝜇3 𝑦 and 1 = 𝜇3 (𝑥 + 2𝑦 − 1). The first of these equations means 𝜇3 = 2/𝑦
(note that 𝑦 > 0) and the second 1 = (2/𝑦) · (𝑥 + 2𝑦 − 1), that is, 𝑦 = 2𝑥 + 4𝑦 − 2 or
−𝑦 = 2(𝑥 + 𝑦 − 1). Because 𝑥 + 𝑦 − 1 = 0 due to the constraint ℎ3 (𝑥, 𝑦) = 0, this
implies 𝑦 = 0, a contradiction. Hence, the KKT conditions cannot be fulfilled when
only a single constraint is tight.
It thus remains to investigate the critical points (0, 0) and (1, 0). Clearly, (1, 0)
produces the maximum of 𝑓 (and (0, 0) the minimum).
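Since the feasible set of (4.68) is the triangle with corners (0, 0), (1, 0), and (0, 1), and the objective is linear, the conclusion can be confirmed by evaluating the objective at the corners. A small illustrative sketch (not part of the text's derivation):

```python
# Evaluate f(x, y) = 2x + y at the corners of the triangle that forms
# the feasible set of (4.68); a linear function attains its maximum
# over this set at one of the corners.
corners = [(0, 0), (1, 0), (0, 1)]
values = {p: 2 * p[0] + p[1] for p in corners}
print(values)  # {(0, 0): 0, (1, 0): 2, (0, 1): 1}

best = max(values, key=values.get)
print(best, values[best])  # (1, 0) 2
```

This confirms that (1, 0) is the maximiser and (0, 0) the minimiser.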

4.10 Reminder of Learning Outcomes

After studying this chapter, you should be able to:


• explain the concept of differentiability of a function defined on a subset of R𝑛
• understand how such subsets are defined by constraints given by equalities or
inequalities
• understand and draw contour lines of functions, for example as in Figure 4.2
• compute gradients, via partial derivatives, of given functions
• find the zeroes of such gradients for examples of functions that are uncon-
strained, and then examine the points where the gradient is zero to identify
minima and maxima of the objective function
• for equality constraints, apply Lagrange’s Theorem to specific examples
• understand that so-called “critical points” (where gradients of the constraint
functions are not linearly independent) also have to be examined as possible
minima or maxima, and apply this to given examples
• use the insight that the KKT conditions distinguish between tight and non-tight
inequalities, and that non-tight inequalities are in effect treated as if they were
absent, in order to identify candidate points for local minima and maxima.
Apply this to specific examples.

4.11 Exercises for Chapter 4

Exercise 4.1. Consider the following functions 𝑓 , 𝑔, ℎ : R2 → R and decide whether


they attain a global minimum or global maximum. If they do, determine where it
is attained.
(a) 𝑓 (𝑥, 𝑦) = 4𝑥² + 𝑥𝑦³ + 𝑦²,
(b) 𝑔(𝑥, 𝑦) = 𝑥/(1 + 𝑥² + 𝑦²),
(c) ℎ(𝑥, 𝑦) = 𝑦 cos 𝑥.
Exercise 4.2. Recall that R≥ = {𝑥 ∈ R | 𝑥 ≥ 0}. Let 𝑓 : R≥ → R be a 𝐶 1 function
such that 𝑓 (0) = 0, 𝑓 (1) > 0, and lim𝑥→∞ 𝑓 (𝑥) = 0. Suppose there is only a single
𝑥 ∗ ∈ R≥ at which 𝑓 ′(𝑥 ∗ ) = 0. Show that
(a) 𝑥 ∗ is a global maximiser of 𝑓 on R≥ , and that
(b) 𝑓 (𝑥) ≥ 0 for all 𝑥 ≥ 0.
Exercise 4.3. A manufacturer of aluminium beer cans has to produce cylindrical
cans where top and bottom are discs with radius 𝑟 and the sidewall of the cylinder
has height ℎ. The sidewall has thickness 1 and the top and bottom have thickness
𝐴 for some parameter 𝐴 > 0 (both 1 and 𝐴 are very small compared to 𝑟 and ℎ).
The prescribed volume of the cylinder is 𝑉 > 0 (we neglect the volume of the
aluminium needed for the can itself).
With the help of the Theorem of Lagrange, find 𝑟 and ℎ so that the can is made
with the least amount of material. The following picture shows a layout of the
material needed.

(top and bottom: discs of radius 𝑟, thickness 𝐴; cylinder sidewall: height ℎ, thickness 1)

How are ℎ and 𝑟 related when 𝐴 = 1? What are 𝑟 and ℎ if 𝑉 = 324 cm³ and
𝐴 = 6/𝜋 ≈ 1.91?
Exercise 4.4. A firm has to produce a good with two inputs 𝑥 and 𝑦, where 𝑦 ≥ 0

and, due to contractual obligations, 𝑥 ≥ 1. The output produced is given by 𝑥𝑦,
and the firm has to produce at least a certain fixed amount 𝑢 of output, 𝑢 > 0. The
firm’s costs are 𝑎𝑥 + 𝑏𝑦 with 𝑎, 𝑏 > 0, which the firm tries to minimise. We want to
find the optimal choice of 𝑥 and 𝑦 for the firm under these conditions.
(a) Show that as described, the set of possible choices 𝑥, 𝑦 is closed but not
compact. Show that the search for an optimum can nevertheless be restricted
to a compact set (hint: introduce an additional restriction on the cost obtained
for a suitable feasible pair 𝑥, 𝑦).
(b) Solve the optimisation problem of the firm. Hint: simplify the condition on
the produced output. The solution will depend on the parameters 𝑢, 𝑎, 𝑏.

Exercise 4.5. Solve the problem of maximising 𝑓 (𝑥, 𝑦, 𝑧) = 𝑥𝑦𝑧 subject to 𝑥 + 𝑦 + 𝑧 ≤


4, 0 ≤ 𝑥 ≤ 1, 𝑦 ≥ 2, 𝑧 ≥ 0. Hint: show that an optimal solution fulfills 𝑥, 𝑦, 𝑧 > 0.

Exercise 4.6. Consider the functions 𝑓 , 𝑔, ℎ : R2 → R,

𝑓 (𝑥, 𝑦) = 𝑥 · 𝑦,    𝑔(𝑥, 𝑦) = 𝑥² − 𝑦 − 3/2,    ℎ(𝑥, 𝑦) = −𝑥/2 + 𝑦 − 3/2.
State why 𝑓 (𝑥, 𝑦) has or does not have a maximum or minimum on the set 𝑆,

𝑆 = {(𝑥, 𝑦) ∈ R2 | 𝑔(𝑥, 𝑦) ≤ 0, ℎ(𝑥, 𝑦) ≤ 0 },

and find all points of 𝑆 (if any) where 𝑓 attains its maximum and minimum.
5 Linear Optimisation

5.1 Introduction

We now consider certain optimisation problems on R𝑛 where the objective function


and all constraints are linear. This is treated in a chapter of its own because
the available methods can be applied to really high-dimensional problems, that
is, functions with a large number of variables. A common name for a linear
optimisation problem is linear program or LP, which we will use throughout.
This chapter can be largely read on its own. In Section 5.10, we tie it in to
results from the previous Chapter 4 to show the connections.
One prerequisite is Section 4.3 on notation for matrices, vectors, and scalars
(all of which are treated as special cases of matrices). This notation is also recalled
at the beginning of Sections 5.2 and 5.4.

5.1.1 Learning Outcomes

After studying this chapter, you should be able to:


• for linear inequalities in dimension two, draw with confidence the correspond-
ing lines in the plane and the halfspace (here half-plane) where that inequality
is valid
• draw the feasible set for an LP with two real variables, and indicate the
direction of the objective function
• state the dual LP of an LP in inequality form, and also later (see Section 5.7)
for an LP in equality form or with unconstrained variables
• state the Lemma of Farkas, and apply it to examples (as in Exercise 5.2)
• understand the differences between feasible, infeasible, and unbounded LPs
and how this relates to the dual LP
• state the complementary slackness condition and apply it to finding optimal
solutions in small examples


• describe the role of dictionaries for the simplex algorithm


• apply the simplex algorithm to small examples.

5.1.2 Essential Reading

Essential reading is this chapter.

5.1.3 Further Reading

Linear optimisation, also called linear programming, is truly central to optimisation.


There are many books written on linear programming, both from a practical and
from a theoretical perspective, and this chapter can only give an introduction to
the most important ideas.
The example in Section 5.3 is taken from
Matoušek, J. and B. Gärtner (2007). Understanding and Using Linear Programming.
Springer, Berlin. ISBN 978-3540306979.
Further good books on linear programming, in particular the simplex algorithm,
are:
Chvátal, V. (1983). Linear Programming. W. H. Freeman, New York. ISBN
978-0716715870.
Dantzig, G. B. (1963). Linear Programming and Extensions. Princeton University
Press, Princeton, NJ. ISBN 978-0691059136.
(George Dantzig is the inventor of the simplex algorithm), and
Gale, D. (1960). The Theory of Linear Economic Models. McGraw-Hill, New York.
ISBN 978-0070227286.
The last two books are classical texts that give mathematical definitions more
directly than the book by Chvátal (1983). The proof of the strong duality theorem
with the help of the Lemma of Farkas in Subsection 5.5.2 follows closely Gale (1960,
page 79).
A very authoritative book with many historical references is
Schrijver, A. (1986). Theory of Linear and Integer Programming. John Wiley &
Sons, Chichester, UK. ISBN 978-0471908548.
An alternative short proof of the Lemma of Farkas is given in chapter 3 of
Conforti, M., G. Cornuéjols, and G. Zambelli (2014). Integer Programming.
Springer, Cham. ISBN 978-3319110073.
The connection between linear programming and the KKT conditions (see Sec-
tion 5.10) is discussed in detail in

Kuhn, H. W. (1991). Nonlinear programming: A historical note. In: History of


Mathematical Programming: A Collection of Personal Reminiscences, edited by J. K.
Lenstra, A. H. G. Rinnoy Kan, and A. Schrijver, 82–96. CWI and North-Holland,
Amsterdam. ISBN 978-0444888181.

5.1.4 Synopsis of Chapter Content

• Section 5.2 is important in giving you a careful understanding of contour lines


of linear functions in the two-dimensional plane, which nicely visualise the
geometry of sets defined by linear equations and inequalities.
• Section 5.3 gives an introductory example in the two-dimensional plane about
how linear programs (LPs) work. It illustrates that an LP solution may not be
unique (although the value of the objective function is of course unique), and
may not exist either because there is no solution to the constraints, in which
case the LP is called infeasible, or because the objective function is unbounded,
in which case the LP itself is called unbounded.
• Section 5.4 motivates and states the central concept of LP duality. The key
idea is that the matrix of coefficients of the inequality constraints, their right-
hand side, and the objective function, written horizontally, can also be “read
vertically” and then define the dual LP, visualised in the “Tucker diagram”
(5.30). The original LP (called the primal LP) and its dual LP provide mutual
bounds. The strong duality theorem states that they have in fact equal values,
which are then provably optimal solutions.
• Section 5.5 is about the so-called Lemma of Farkas, also known as the theorem
of the separating hyperplane. It provides one way to prove the strong LP
duality theorem, which will be given. The Lemma of Farkas has a convincing
geometric intuition. We prove it in a number of subsections of Section 5.5.
• Section 5.6 studies the possible combinations of optimality, feasibility, and
unboundedness.
• Section 5.7 is about different types of constraints: instead of inequalities they
may be equations, and variables may be assumed to be nonnegative or have no
sign constraints. For an inequality constraint the corresponding dual variable
is nonnegative, and for an equality constraint the corresponding dual variable
is unconstrained. This is fully in line with the motivation in Section 5.4 of the
dual LP as providing an upper bound on the primal objective function.
• The optional Section 5.8 is about mixing these types of constraints in a single
“general” LP. Conceptually, this is just what has been explained in Section 5.7,
but is a bit more complicated to write down.
• Section 5.9 is about an important combinatorial condition called complementary
slackness. This condition shows whether a primal solution vector 𝑥 and dual solution
vector 𝑦 are both optimal. Simply put, if there is an inequality involved, one of
these inequalities on the primal or dual side must be tight, for each primal and
dual variable. If the constraint is an equality, this holds automatically.
• The purpose of Section 5.10 (also optional) is to connect LP duality with the
KKT conditions from Section 4.9 in the previous chapter. Essentially, if the
optimisation problem is linear, then LP duality means the same as the KKT
conditions.
• Section 5.11 introduces the most important algorithm for solving an LP, called
the simplex algorithm, by means of an example.
• The final, also optional Section 5.12 is a description of the simplex algorithm
for an LP subject to equality constraints given in abstract form. The algorithm
traverses a sequence of basic solutions to these equality constraints. This de-
scription is meant to describe the mathematical concepts behind the algorithm,
in algebraic form, for the mathematically minded student. The main ideas
have already been covered in the preceding Section 5.11.

5.2 Linear Functions, Hyperplanes, and Halfspaces

So far we have used the letters 𝑥, 𝑦, 𝑧 to name variables in small-dimensional


examples, but will from now on use in such examples 𝑥1 , 𝑥2 , 𝑥3 instead. The reason
is that in our theory 𝑥 and 𝑦 will be certain vectors of variables, with 𝑦 called “dual
variables”, so that a different use of 𝑥 and 𝑦 in examples would be too confusing.
The examples are therefore slightly harder to read and to write down, but their
correspondence with the general, really useful notation will be clearer.
In the example (4.68), we have made our life unnecessarily hard because we
could also have written the third constraint simply as 𝑥1 + 𝑥2 ≤ 1, so that the entire
problem could be written as

maximise 2𝑥1 + 𝑥 2 subject to 𝑥1 ≥ 0, 𝑥2 ≥ 0, 𝑥1 + 𝑥 2 ≤ 1. (5.1)

In this form, the problem is in the standard inequality form of a linear optimisation
problem, also called linear programming problem, or just linear program. (The term
“programming” was popular in the middle of the 20th century when optimisation
problems started to be solved with computer programs, with electronic computers
also being developed around the same time.)
A linear function 𝑓 : R𝑛 → R is of the form

𝑓 (𝑥1 , . . . , 𝑥 𝑛 ) = 𝑐 1 𝑥1 + · · · + 𝑐 𝑛 𝑥 𝑛 (5.2)

for suitable real coefficients 𝑐 1 , . . . , 𝑐 𝑛 . These coefficients define an 𝑛-tuple 𝑐 =


(𝑐1 , . . . , 𝑐 𝑛 ) in R𝑛 . As a reminder from Section 4.3, we then write (5.2) as 𝑓 (𝑥) = 𝑐⊤𝑥
which means that both 𝑐 and 𝑥 are considered as column vectors in R𝑛 , which we
consider as 𝑛 × 1 matrices. Then 𝑐⊤ is the corresponding row vector (a 1 × 𝑛 matrix)
and 𝑐⊤𝑥 is just a matrix product, which in this case produces a 1 × 1 matrix which
is a real number that represents the scalar product of the vectors 𝑐 and 𝑥 in (5.2).
For that reason, we write the multiplication of a vector 𝑥 with a scalar 𝜆 in the
form 𝑥𝜆 (rather than as 𝜆𝑥) because it is the product of an 𝑛 × 1 with a 1 × 1 matrix.
This consistency is very helpful in re-grouping products of several matrices and
vectors.
Recall that we write the derivative 𝐷 ℎ(𝑥) of a function ℎ as a row vector, so
that we multiply it with a scalar like 𝜆 from the left as in 𝜆𝐷 ℎ(𝑥). Also, when we
write 𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ), say, then this is just meant to define 𝑥 as an 𝑛-tuple of real
numbers and not as a row vector, because otherwise we would always have to
introduce 𝑥 tediously as 𝑥 = (𝑥1 , . . . , 𝑥 𝑛 )⊤. The thing to remember is that when we
use matrix multiplication, then a vector like 𝑥 is always a column vector and 𝑥⊤ is
a row vector.
Let 𝑐 ≠ 0 (where 0 is the vector with all components zero, in any dimension),
and let 𝑓 be the linear function defined by 𝑓 (𝑥) = 𝑐⊤𝑥 as in (5.2). The set
{𝑥 | 𝑓 (𝑥) = 0} where 𝑓 takes value 0 is a linear subspace of R𝑛 . By definition, it
consists of all vectors 𝑥 that are orthogonal to 𝑐, that is, have scalar product 0 with 𝑐.
If 𝑛 = 2, then this “nullspace” of 𝑓 is a line, but in general it will be a “hyperplane”
in R𝑛 , a space of dimension 𝑛 − 1.
More generally, let 𝑢 ∈ R and consider the set

𝐻 = {𝑥 ∈ R𝑛 | 𝑓 (𝑥) = 𝑢} = {𝑥 ∈ R𝑛 | 𝑐⊤𝑥 = 𝑢} (5.3)

where 𝑓 takes value 𝑢, which we have earlier called a contour set or level set for 𝑓 .
Then for any two points 𝑥 and 𝑥̂ on this level set 𝐻, that is, so that 𝑓 (𝑥) = 𝑓 (𝑥̂) = 𝑢, we
have 𝑐⊤(𝑥 − 𝑥̂) = 0, so that the vector 𝑥 − 𝑥̂ is orthogonal to 𝑐. Then 𝐻 is also called
a hyperplane through the point 𝑥 (which does not contain the origin 0 unless 𝑢 = 0)
with normal vector 𝑐. To repeat, such a hyperplane 𝐻 is of the form (5.3) for some
𝑐 ∈ R𝑛 , 𝑐 ≠ 0, and 𝑢 ∈ R. The different contour sets for 𝑓 are therefore parallel
hyperplanes, all with the same normal vector 𝑐.
Figure 5.1 shows an example of such level sets, where these “hyperplanes”
are contour lines because 𝑛 = 2. The vector 𝑐, here 𝑐 = (2, −1), is orthogonal to
any such level set. Moreover, 𝑐 points in the direction in which the function value
of 𝑓 (𝑥) increases, because if we replace 𝑥 by 𝑥 + 𝑐 then 𝑓 (𝑥) changes from 𝑐⊤𝑥 to
𝑓 (𝑥 + 𝑐) = 𝑐⊤𝑥 + 𝑐⊤𝑐, which is larger than 𝑐⊤𝑥 because 𝑐⊤𝑐 = 𝑐1² + · · · + 𝑐𝑛² > 0
since 𝑐 ≠ 0. Note that 𝑐 may have negative components (as in the figure). Only the
direction of 𝑐 matters to find out where 𝑓 (𝑥) gets larger.
Similar to a hyperplane 𝐻 defined by 𝑐 and 𝑢 in (5.3), a halfspace 𝑆 is defined
by an inequality according to

Figure 5.1 Left: Contour lines (level sets) of the function 𝑓 : R2 → R defined
by 𝑓 (𝑥) = 𝑐⊤𝑥 for 𝑐 = (2, −1). Right: Halfspace 𝑆 in (5.4) given by 𝑐⊤𝑥 ≤ 5. As
before, the strokes next to 𝐻 indicate the side of the line where this inequality
is valid, which defines 𝑆.

𝑆 = {𝑥 ∈ R𝑛 | 𝑐⊤𝑥 ≤ 𝑢} (5.4)

which consists of all points 𝑥 that are on the hyperplane 𝐻 or “below” it, that
is, with smaller values of 𝑐⊤𝑥 than the points on 𝐻. Figure 5.1 shows such a
halfspace 𝑆 for 𝑐 = (2, −1) and 𝑢 = 5, which contains, for example, the point
𝑥 = (2.5, 0). It is customary to “shade” the side of the hyperplane 𝐻 that defines 𝑆
with a few small parallel strokes as shown in the picture, and then it is not needed
to indicate 𝑐 which is the orthogonal vector to 𝐻 that points away from 𝑆.

Figure 5.2 The feasible set and the objective function for the example (5.1),
and the optimal point (1, 0).

With these conventions, Figure 5.2 gives a graphical description of the problem
in (5.1), where the feasible set in which all inequalities hold is the intersection of the
three halfspaces defined by 𝑥1 ≥ 0 (which could be written as −𝑥1 ≤ 0), 𝑥2 ≥ 0
(which could be written as −𝑥2 ≤ 0), and 𝑥1 + 𝑥2 ≤ 1. This is the shaded triangle.
In this graphical way, the optimal solution is nearly obvious.

5.3 Linear Programming in Two Dimensions

Consider the following linear program:

maximise 𝑥1 + 𝑥2
subject to 𝑥1 ≥ 0
𝑥2 ≥ 0
(5.5)
−𝑥1 + 𝑥 2 ≤ 1
𝑥1 + 6𝑥 2 ≤ 15
4𝑥1 − 𝑥 2 ≤ 10 .

The set of points (𝑥1 , 𝑥2 ) in R2 that fulfill these inequalities is called the feasible set
and shown in Figure 5.3.

Figure 5.3 Feasible set and objective function vector (1, 1) for the LP (5.5),
with optimum at (3, 2) and objective function value 5.

The contour lines of the objective function 𝑓 (𝑥1 , 𝑥2 ) = 𝑥1 + 𝑥2 are parallel lines,
and the maximum is clearly at the top-right corner (3, 2) of the feasible set, where
the constraints ℎ4 (𝑥1 , 𝑥2 ) = 𝑥1 + 6𝑥2 ≤ 15 and ℎ5 (𝑥1 , 𝑥2 ) = 4𝑥1 − 𝑥2 ≤ 10 are tight.
The fact that this is a local maximum can be seen with the KKT Theorem 4.11
because there are nonnegative 𝜇4 and 𝜇5 so that 𝐷 𝑓 = (1, 1) = 𝜇4 𝐷 ℎ4 + 𝜇5 𝐷 ℎ5 =
𝜇4 (1, 6) + 𝜇5 (4, −1), namely 𝜇4 = 1/5, 𝜇5 = 1/5. We can write 𝐷 𝑓 instead of 𝐷 𝑓 (𝑥1 , 𝑥2 )
because the gradient of a linear function is constant. The picture shows that (3, 2)
is in fact also the global maximum of 𝑓 . We will see that the KKT Theorem has a
simpler version for linear programming, which is called the duality theorem, where
the multipliers 𝜇𝑖 are called dual variables. Moreover, there will be better ways of
finding a maximum than testing all combinations of possible tight constraints.
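One such better way is to use an LP solver. As an optional check, not part of the text, the LP (5.5) can be solved directly; the sketch below assumes SciPy is available, whose linprog minimises, so the objective is negated:

```python
import numpy as np
from scipy.optimize import linprog

# LP (5.5): maximise x1 + x2 subject to -x1 + x2 <= 1, x1 + 6x2 <= 15,
# 4x1 - x2 <= 10, and x1, x2 >= 0. linprog minimises, so use -c.
A = np.array([[-1.0, 1.0],
              [ 1.0, 6.0],
              [ 4.0, -1.0]])
b = np.array([1.0, 15.0, 10.0])
c = np.array([1.0, 1.0])
res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # [3. 2.] 5.0: the corner (3, 2) with value 5
```

The solver returns the corner (3, 2) with objective value 5, as read off from Figure 5.3.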

Figure 5.4 Feasible set and objective function vector (1/6, 1) with non-unique
maximum along the side where the constraint 𝑥1 + 6𝑥2 ≤ 15 is tight.

It can be shown that if the feasible set is bounded, then an optimum of a linear
program can always be found at a corner of the feasible set. However, there may be
more than one corner where the optimum is obtained, and then any point on the
line segment that connects these optimal corners is also optimal. Figure 5.4 shows
this with the same constraints as in (5.5), but a different objective function to be
maximised, namely 𝑓 (𝑥1 , 𝑥2 ) = (1/6)𝑥1 + 𝑥2. The corner (3, 2) is also optimal here, but
so is the entire line where 𝑓 (𝑥1 , 𝑥2 ) = (1/6)𝑥1 + 𝑥2 = 5/2 (intersected with the feasible
set), which coincides with the tight constraint 𝑥1 + 6𝑥2 = 15.
Figure 5.5 shows an example where the feasible set is empty, which is called
an infeasible linear program. This occurs, for example, by reversing two of the
inequalities in (5.5) to obtain the following constraints:

𝑥1 ≥ 0
𝑥2 ≥ 0
−𝑥1 + 𝑥 2 ≥ 1 (5.6)
𝑥1 + 6𝑥 2 ≤ 15
4𝑥1 − 𝑥2 ≥ 10 .

Finally, an optimal solution need not exist even when there are feasible
solutions. This happens when the objective function can attain arbitrarily large

Figure 5.5 Example of an infeasible set, for the constraints (5.6). Recall
that the little strokes indicate the side where the inequality is valid, and here
there is no point (𝑥1 , 𝑥2 ) where all inequalities are valid. This would be the
case even without the constraints 𝑥 1 ≥ 0 and 𝑥2 ≥ 0.

values; such a linear program is called unbounded. This is the case when we remove
the constraints 4𝑥 1 − 𝑥2 ≤ 10 and 𝑥1 + 6𝑥 2 ≤ 15 from the initial example (5.5), as
shown in Figure 5.6.

Figure 5.6 Example of a feasible set with an unbounded objective function.
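In practice, infeasibility and unboundedness are detected by a solver rather than by inspection. A sketch, assuming SciPy, whose linprog reports status 2 for an infeasible and status 3 for an unbounded problem, applied to the constraints (5.6) and to the relaxed problem of Figure 5.6:

```python
import numpy as np
from scipy.optimize import linprog

# (5.6): the ">=" constraints -x1 + x2 >= 1 and 4x1 - x2 >= 10 are
# rewritten as "<=" rows by multiplying them by -1.
A = np.array([[ 1.0, -1.0],   # -x1 + x2 >= 1   <=>   x1 - x2 <= -1
              [ 1.0,  6.0],   #  x1 + 6x2 <= 15
              [-4.0,  1.0]])  # 4x1 - x2 >= 10  <=>  -4x1 + x2 <= -10
b = np.array([-1.0, 15.0, -10.0])
res = linprog([-1.0, -1.0], A_ub=A, b_ub=b)
print(res.status)  # 2: the constraints are infeasible

# Keeping only -x1 + x2 <= 1 (and x1, x2 >= 0), as in Figure 5.6:
res2 = linprog([-1.0, -1.0], A_ub=[[-1.0, 1.0]], b_ub=[1.0])
print(res2.status)  # 3: the objective x1 + x2 is unbounded
```

Both outcomes match the geometric pictures in Figures 5.5 and 5.6.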

The pictures shown in this section provide a good intuition of how linear
programs look in principle. However, this graphical method hardly extends
beyond R2 or R3 . Our development of the theory of linear programming will
proceed largely algebraically, with some geometric intuition for the important
Theorem 5.4 of Farkas.

5.4 Linear Programs and Duality

We recall the notation introduced in Section 4.3. For positive integers 𝑚, 𝑛, the set
of 𝑚 × 𝑛 matrices is denoted by R𝑚×𝑛 . An 𝑛-vector is an element of R𝑛 . Unless
stated otherwise, all vectors are column vectors, so a vector 𝑥 in R𝑛 is considered
as an 𝑛 × 1 matrix. Its transpose 𝑥⊤ is the corresponding row vector in R1×𝑛 .
The components of an 𝑛-vector 𝑥 are 𝑥1 , . . . , 𝑥 𝑛 . The vectors 0 and 1 have all
components equal to zero and one, respectively, and have suitable dimension,
which may vary with each use of 0 or 1. An inequality between vectors like 𝑥 ≥ 0
holds for all components. The identity matrix, of any dimension, is denoted by 𝐼.
A linear optimisation problem or linear program (LP) says: optimise (maximise
or minimise) a linear objective function subject to linear constraints (inequalities or
equalities).
The standard inequality form of an LP is given by an 𝑚 × 𝑛 matrix 𝐴, an 𝑚-vector
𝑏 and an 𝑛-vector 𝑐 and says:

maximise 𝑐⊤𝑥
subject to 𝐴𝑥 ≤ 𝑏 , (5.7)
𝑥 ≥ 0.

Here 𝑥 is the 𝑛-vector (𝑥1 , . . . , 𝑥𝑛 )⊤ of variables. The vector 𝑐 of coefficients
determines the objective function. The matrix 𝐴 and the vector 𝑏 determine the
𝑚 linear constraints, which are here only inequalities. Furthermore, all variables
𝑥1 , . . . , 𝑥 𝑛 are constrained to be nonnegative; this is stated as 𝑥 ≥ 0 separately from
𝐴𝑥 ≤ 𝑏 because nonnegative variables are a standard case.
The LP (5.7) states maximisation subject to “upper bounds” (inequalities “≤”).
A way to remember this is to assume that the components of 𝐴, 𝑏 and 𝑐 are positive
(they do not have to be), so that maximising 𝑐⊤𝑥 is restricted by the “upper bounds”
imposed by 𝐴𝑥 ≤ 𝑏. If 𝐴 has only positive entries, then 𝐴𝑥 ≤ 𝑏 can only be fulfilled
for nonnegative 𝑥 if 𝑏 ≥ 0.
In general, the LP (5.7) is called feasible if 𝐴𝑥 ≤ 𝑏 and 𝑥 ≥ 0 hold for some 𝑥,
otherwise infeasible. If 𝑐⊤𝑥 can be arbitrarily large for suitable 𝑥 subject to these
constraints, then the LP is called unbounded, otherwise bounded. The LP can have
an optimal solution only if it is feasible and bounded. We will see that in that case
it has an optimal solution even though the feasible set {𝑥 ∈ R𝑛 | 𝐴𝑥 ≤ 𝑏, 𝑥 ≥ 0} is
in general not compact.

Example 5.1. Consider (5.7) with

        ⎡ 3  4  2 ⎤        ⎡ 7 ⎤
    𝐴 = ⎣ 1  1  1 ⎦ ,  𝑏 = ⎣ 2 ⎦ ,  𝑐⊤ = [ 8  10  5 ] ,

which can be stated explicitly as: for 𝑥 1 , 𝑥2 , 𝑥3 ≥ 0 subject to

3𝑥1 + 4𝑥2 + 2𝑥3 ≤ 7
𝑥1 + 𝑥2 + 𝑥3 ≤ 2 (5.8)
──────────────────────────────
maximise 8𝑥1 + 10𝑥2 + 5𝑥3 .

The horizontal line is often written to separate the objective function from the
constraints.

One feasible solution to (5.8) is 𝑥1 = 0, 𝑥2 = 1, 𝑥3 = 1 with objective function
value 15. Another is 𝑥1 = 1, 𝑥2 = 1, 𝑥3 = 0 with objective function value 18, which
is better. (We often choose integers in our examples, but coefficients and variables
are allowed to assume any real values.) How do we know when we have an
optimal value?
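For illustration only (this assumes Python with SciPy, which is not part of the guide's toolkit), the LP (5.8) can be solved numerically; `linprog` minimises, so we maximise 𝑐⊤𝑥 by minimising (−𝑐)⊤𝑥. The duality argument that follows certifies the same answer by hand.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, 4.0, 2.0],
              [1.0, 1.0, 1.0]])
b = np.array([7.0, 2.0])
c = np.array([8.0, 10.0, 5.0])

# maximise c^T x  <=>  minimise (-c)^T x; x >= 0 are the default bounds
res = linprog(c=-c, A_ub=A, b_ub=b)
print(res.x, -res.fun)  # optimal solution (1, 1, 0) with value 18
```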
The dual of an LP can be motivated by finding an upper bound to the objective
function of the given LP (which is called the primal LP). The dual LP results by
reading the constraint matrix vertically rather than horizontally and exchanging
the roles of objective function and right-hand side, as follows.
In the example (5.8), we multiply each of the two inequalities by some
nonnegative number, for example the first inequality by 𝑦1 = 1 and the second by
𝑦2 = 6, and add the inequalities up, which yields

(3 + 6)𝑥1 + (4 + 6)𝑥2 + (2 + 6)𝑥 3 ≤ 7 + 6 · 2

or
9𝑥1 + 10𝑥2 + 8𝑥3 ≤ 19.
In this inequality, which holds for any feasible solution, all coefficients of the
nonnegative variables 𝑥 𝑗 are at least as large as in the primal objective function, so
the right-hand side 19 is certainly an upper bound for this objective function. In
fact, we can obtain an even better bound by multiplying the two primal inequalities
by 𝑦1 = 2 and 𝑦2 = 2, getting

(3 · 2 + 2)𝑥1 + (4 · 2 + 2)𝑥2 + (2 · 2 + 2)𝑥3 ≤ 2 · 7 + 2 · 2

or
8𝑥1 + 10𝑥2 + 6𝑥3 ≤ 18.

Again, all coefficients are at least as large as in the primal objective function. Thus,
it cannot be larger than 18, which was achieved by the above solution 𝑥 1 = 1, 𝑥2 = 1,
𝑥3 = 0, which is therefore optimal.
In general, the dual LP for the primal LP (5.7) is obtained as follows:
• Multiply each primal inequality by some nonnegative number 𝑦 𝑖 (so as to not
reverse the inequality).
• Sum the resulting entries of each of the 𝑛 columns and require that the resulting
coefficient of 𝑥 𝑗 for 𝑗 = 1, . . . , 𝑛 is at least as large as the coefficient 𝑐 𝑗 of the
objective function. (Because 𝑥 𝑗 ≥ 0, this will at most increase the objective
function.)
• Minimise the resulting right-hand side 𝑦1 𝑏 1 + · · · + 𝑦𝑚 𝑏 𝑚 (because it is an upper
bound for the primal objective function).
So the dual of (5.7) says:

minimise 𝑦⊤𝑏
(5.9)
subject to 𝑦⊤𝐴 ≥ 𝑐⊤, 𝑦 ≥0.

Clearly, (5.9) is also an LP in standard inequality form because it can be written as:
maximise −𝑏⊤𝑦 subject to −𝐴⊤𝑦 ≤ −𝑐, 𝑦 ≥ 0 . In that way, it is easy to see that the
dual LP of the dual LP (5.9) is again the primal LP (5.7).
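Continuing the numerical aside (again assuming SciPy), the dual (5.9) of Example 5.1 can be solved in exactly this rewritten standard inequality form:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, 4.0, 2.0],
              [1.0, 1.0, 1.0]])
b = np.array([7.0, 2.0])
c = np.array([8.0, 10.0, 5.0])

# dual LP (5.9): minimise y^T b subject to y^T A >= c^T, y >= 0,
# i.e. minimise b^T y subject to -A^T y <= -c with default bounds y >= 0
res = linprog(c=b, A_ub=-A.T, b_ub=-c)
print(res.x, res.fun)  # optimal y = (2, 2) with value 18, the primal optimum
```

The dual optimum 18 equals the primal optimum, as the strong duality theorem below guarantees.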
A good way to simultaneously picture the primal and dual LP (which are
defined by the same data 𝐴, 𝑏, 𝑐) is the following “Tucker diagram”:
              𝑥 ≥ 0
    𝑦 ≥ 0  [    𝐴    ]  ≤ 𝑏   ↩→ min
                ∨                           (5.10)
                𝑐⊤  → max

The diagram (5.10) shows the 𝑚 × 𝑛 matrix 𝐴 with the 𝑚-vector 𝑏 on the right
and the row vector 𝑐⊤ at the bottom. The top shows the primal variables 𝑥 with
their constraints 𝑥 ≥ 0. The left-hand side shows the dual variables 𝑦 with their
constraints 𝑦 ≥ 0. The primal LP is to be read horizontally, with constraints 𝐴𝑥 ≤ 𝑏,
and the objective function 𝑐⊤𝑥 that is to be maximised. The dual LP is to be read
vertically, with constraints 𝑦⊤𝐴 ≥ 𝑐⊤ (where in the diagram (5.10) ≥ is written
vertically as ∨ ), and the objective function 𝑦⊤𝑏 that is to be minimised. A way
to remember the direction of the inequalities is to see that one inequality 𝐴𝑥 ≤ 𝑏
points “towards” 𝐴 and the other, 𝑦⊤𝐴 ≥ 𝑐⊤, “away from” 𝐴, where maximisation
is subject to upper bounds and minimisation subject to lower bounds, apart from
the nonnegativity constraints for 𝑥 and 𝑦.

The fact that the primal and dual objective functions are mutual bounds is
known as the “weak duality” theorem, which is very easy to prove – essentially in
the way we have motivated the dual LP above.
Theorem 5.2 (Weak LP duality). For a pair 𝑥, 𝑦 of feasible solutions to the primal LP
(5.7) and its dual LP (5.9), the objective functions are mutual bounds:

𝑐⊤𝑥 ≤ 𝑦⊤𝑏 .

If thereby 𝑐⊤𝑥 = 𝑦⊤𝑏 (equality holds), then these two solutions are optimal for both LPs.

Proof. In general, if 𝑢, 𝑣, 𝑤 are vectors of the same dimension, then

𝑢 ≥ 0, 𝑣 ≤ 𝑤 ⇒ 𝑢⊤𝑣 ≤ 𝑢⊤𝑤 , (5.11)

because 𝑣 ≤ 𝑤 is equivalent to (𝑤 − 𝑣) ≥ 0 which with 𝑢 ≥ 0 implies 𝑢⊤(𝑤 − 𝑣) ≥ 0
and hence 𝑢⊤𝑣 ≤ 𝑢⊤𝑤; note that this is an inequality between scalars which can
also be written as 𝑣⊤𝑢 ≤ 𝑤⊤𝑢.
Feasibility of 𝑥 for (5.7) and of 𝑦 for (5.9) means 𝐴𝑥 ≤ 𝑏, 𝑥 ≥ 0, 𝑦⊤𝐴 ≥ 𝑐⊤, 𝑦 ≥ 0.
Using (5.11), this implies

𝑐⊤𝑥 ≤ (𝑦⊤𝐴)𝑥 = 𝑦⊤(𝐴𝑥) ≤ 𝑦⊤𝑏

as claimed.
If 𝑐⊤𝑥 ∗ = (𝑦 ∗ )⊤𝑏 for some primal feasible 𝑥 ∗ and dual feasible 𝑦 ∗ , then 𝑐⊤𝑥 ≤
(𝑦 ∗ )⊤𝑏 = 𝑐⊤𝑥 ∗ for any primal feasible 𝑥, and 𝑦⊤𝑏 ≥ 𝑐⊤𝑥 ∗ = (𝑦 ∗ )⊤𝑏 for any dual
feasible 𝑦, so equality of the objective functions implies optimality.
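The chain of inequalities in this proof can be traced numerically on Example 5.1. This small check (assuming NumPy) uses the feasible pair 𝑥 = (0, 1, 1) and 𝑦 = (1, 6) from the earlier discussion:

```python
import numpy as np

A = np.array([[3.0, 4.0, 2.0],
              [1.0, 1.0, 1.0]])
b = np.array([7.0, 2.0])
c = np.array([8.0, 10.0, 5.0])

x = np.array([0.0, 1.0, 1.0])  # primal feasible for (5.8)
y = np.array([1.0, 6.0])       # dual feasible: y^T A = (9, 10, 8) >= c^T

assert (A @ x <= b).all() and (y @ A >= c).all()
# the chain c^T x <= y^T (A x) <= y^T b from the proof: 15 <= 18 <= 19
print(c @ x, y @ (A @ x), b @ y)
```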
The following “strong duality” theorem is the central theorem of linear
programming.
Theorem 5.3 (Strong LP duality). Whenever both the primal LP (5.7) and its dual LP
(5.9) are feasible, they have optimal solutions with equal value of their objective functions.

We will prove this theorem in Section 5.5. Its proof is not trivial. In fact,
many theorems in economics have a hidden LP duality so that they can be proved
by writing down a suitable LP and interpreting its dual LP. For that reason,
Theorem 5.3 is extremely useful.

5.5 The Lemma of Farkas and Strong LP Duality

This section is about the Lemma of Farkas, also known as the theorem of the
separating hyperplane. It is used to prove the strong LP duality Theorem 5.3.
The Lemma of Farkas gives a condition for when the system 𝐴𝑥 = 𝑏, 𝑥 ≥ 0 has
no solution. It has a strong geometric intuition. Moreover, solutions to the system
𝐴𝑥 = 𝑏, 𝑥 ≥ 0 are used in the important simplex algorithm for solving an LP, which
we treat later (from Section 5.11 onwards).
In this section we first state and explain the Lemma of Farkas. We then use it to
prove strong LP duality. The Lemma of Farkas itself is then shown in a number of
steps. Each of these steps is quite accessible, and explained in a separate subsection
in order to have a better overview of the argument. Some of these proof steps,
such as the use of linearly independent solutions to 𝐴𝑥 = 𝑏 (see Subsection 5.5.6),
will also be of help for understanding the simplex algorithm.

5.5.1 Statement of the Lemma of Farkas

The Lemma of Farkas is concerned with the question of finding a nonnegative
solution 𝑥 to a system 𝐴𝑥 = 𝑏 of linear equations. If 𝐴 = [𝐴1 · · · 𝐴𝑛 ], this means
that 𝑏 is a nonnegative linear combination 𝐴1 𝑥1 + · · · + 𝐴𝑛 𝑥 𝑛 of the columns of 𝐴.
We write the set of these nonnegative linear combinations as

𝐶 = {𝐴𝑥 | 𝑥 ∈ R𝑛 , 𝑥 ≥ 0}. (5.12)

The set 𝐶 in (5.12) of nonnegative linear combinations of the column vectors of 𝐴
is also called the cone generated by these vectors. Figure 5.7 gives an example with
𝐴 ∈ R2×4 and a vector 𝑏 ∈ R2 that is not in the cone 𝐶 generated by 𝐴1 , 𝐴2 , 𝐴3 , 𝐴4 .

Figure 5.7 Left: Vectors 𝐴1 , 𝐴2 , 𝐴3 , 𝐴4 , the cone 𝐶 generated by them (which
extends to infinity between the two “rays” that extend 𝐴3 and 𝐴2 ), and a vector
𝑏 not in 𝐶. Right: A separating hyperplane 𝐻 for 𝑏 with normal vector 𝑦 = 𝑐−𝑏.

The right diagram in Figure 5.7 shows a vector 𝑦 so that 𝑦⊤𝐴 𝑗 ≥ 0 for all 𝑗,
for 1 ≤ 𝑗 ≤ 𝑛, and 𝑦⊤𝑏 < 0. The set 𝐻 = {𝑧 ∈ R𝑚 | 𝑦⊤𝑧 = 0} is called a separating
hyperplane with normal vector 𝑦 because all vectors 𝐴 𝑗 are on one side of 𝐻 (they
fulfill 𝑦⊤𝐴 𝑗 ≥ 0, which includes the case 𝑦⊤𝐴 𝑗 = 0 where 𝐴 𝑗 belongs to 𝐻, like 𝐴2
in Figure 5.7), whereas 𝑏 is strictly on the other side of 𝐻 because 𝑦⊤𝑏 < 0. The
Lemma of Farkas asserts that such a separating hyperplane exists for any 𝑏 that
does not belong to 𝐶.

Theorem 5.4 (Lemma of Farkas). Let 𝐴 ∈ R𝑚×𝑛 and 𝑏 ∈ R𝑚 . Then exactly one of the
following statements holds:

(a) ∃𝑥 ∈ R𝑛 : 𝑥 ≥ 0, 𝐴𝑥 = 𝑏.
(b) ∃𝑦 ∈ R𝑚 : 𝑦⊤𝐴 ≥ 0⊤, 𝑦⊤𝑏 < 0.

In Theorem 5.4, it is clear that (a) and (b) cannot both hold because if (a) holds,
then 𝑦⊤𝐴 ≥ 0⊤ implies 𝑦⊤𝑏 = 𝑦⊤(𝐴𝑥) = (𝑦⊤𝐴)𝑥 ≥ 0.
If (a) is false, that is, 𝑏 does not belong to the cone 𝐶 in (5.12), then 𝑦 can be
constructed by the following intuitive geometric argument: Take a vector 𝑐 in 𝐶
that is closest to 𝑏 (see Figure 5.7), and let 𝑦 = 𝑐 − 𝑏. We will show later that 𝑦
fulfills the conditions in (b) and that the point 𝑐 exists (which is nontrivial and
shown in Section 5.5.7).
However, we postpone the proof of the Lemma of Farkas in order to show first
how it can be used to prove the strong LP duality theorem. For that we use some
elementary algebraic manipulations and no longer appeal to geometry.
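Both alternatives of the lemma can be explored computationally. The following hedged sketch (assuming SciPy; the 2 × 2 matrix is a toy example, not data from the text) checks alternative (a) as a feasibility LP, and searches for a certificate 𝑦 as in (b); the box −1 ≤ 𝑦𝑖 ≤ 1 is added only to keep the auxiliary LP bounded, since any positive multiple of a certificate is again a certificate:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data: columns (2,1) and (1,2) generate a cone C in the first quadrant
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])  # b lies below the cone C

# Alternative (a): is there x >= 0 with A x = b?  (a pure feasibility LP)
res_a = linprog(c=[0.0, 0.0], A_eq=A, b_eq=b)

# Alternative (b): find y with y^T A >= 0 and y^T b < 0, i.e. minimise
# b^T y subject to -A^T y <= 0, within the box -1 <= y_i <= 1
res_b = linprog(c=b, A_ub=-A.T, b_ub=[0.0, 0.0], bounds=[(-1, 1)] * 2)

print(res_a.status, res_b.x, res_b.fun)  # (a) infeasible; (b) y^T b < 0
```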

5.5.2 Proof of Strong LP Duality

Theorem 5.4 is concerned with finding nonnegative solutions 𝑥 to a system of
equations 𝐴𝑥 = 𝑏. We will use a closely related “inequality form” of this theorem
that concerns nonnegative solutions 𝑥 to a system of inequalities 𝐴𝑥 ≤ 𝑏.
Theorem 5.5 (Lemma of Farkas with inequalities). Let 𝐴 ∈ R𝑚×𝑛 and 𝑏 ∈ R𝑚 . Then
exactly one of the following statements holds:
(a) ∃𝑥 ∈ R𝑛 : 𝑥 ≥ 0, 𝐴𝑥 ≤ 𝑏.
(b) ∃𝑦 ∈ R𝑚 : 𝑦 ≥ 0, 𝑦⊤𝐴 ≥ 0⊤, 𝑦⊤𝑏 < 0 .

Note the subtle difference between the conditions (a) and (b) in Theorems 5.4
and 5.5, respectively. Theorem 5.4(a) is about equations 𝐴𝑥 = 𝑏 and Theorem 5.5(a)
is about inequalities 𝐴𝑥 ≤ 𝑏. Theorem 5.4(b) states the existence of an arbitrary
vector 𝑦 and Theorem 5.5(b) the existence of a nonnegative vector 𝑦. This is a
recurring theme: equations are preserved when multiplying them with arbitrary
coefficients, namely 𝐴𝑥 = 𝑏 implies 𝑦⊤𝐴𝑥 = 𝑦⊤𝑏 for any 𝑦, whereas inequalities
𝐴𝑥 ≤ 𝑏 are only preserved if we have 𝑦 ≥ 0, which then implies 𝑦⊤𝐴𝑥 ≤ 𝑦⊤𝑏. The
following proof uses a trick (the introduction of “slack variables” 𝑠) to convert
inequalities into equations. This trick will also be used again, see (5.37).
Proof of Theorem 5.5. Clearly, there is a vector 𝑥 so that 𝐴𝑥 ≤ 𝑏 and 𝑥 ≥ 0 if and
only if there are 𝑥 ∈ R𝑛 and 𝑠 ∈ R𝑚 with
𝐴𝑥 + 𝑠 = 𝑏, 𝑥 ≥ 0, 𝑠 ≥ 0. (5.13)
The system (5.13) is a system of equations as in Theorem 5.4 with the matrix [𝐴 𝐼]
instead of 𝐴, where 𝐼 is the 𝑚 × 𝑚 identity matrix, and the vector (𝑥, 𝑠) instead of 𝑥.
The condition 𝑦⊤[𝐴 𝐼] ≥ 0⊤ in Theorem 5.4(b) is then simply 𝑦⊤𝐴 ≥ 0⊤, 𝑦 ≥ 0 as
stated here in (b).

We can now prove the strong duality theorem.


Proof of Theorem 5.3. We assume that (5.7) and (5.9) are feasible, and want to
show that there are feasible 𝑥 and 𝑦 so that 𝑐⊤𝑥 ≥ 𝑦⊤𝑏, which by the weak duality
Theorem 5.2 implies 𝑐⊤𝑥 = 𝑦⊤𝑏. Suppose, to the contrary, that there are (feasible)
𝑥ˆ , 𝑦ˆ such that
𝑥ˆ ≥ 0, 𝐴 𝑥ˆ ≤ 𝑏, 𝑦ˆ ≥ 0, 𝑦ˆ⊤𝐴 ≥ 𝑐⊤ (5.14)
but no (optimal) solution 𝑥, 𝑦 to the system of 𝑛 + 𝑚 + 1 inequalities

− 𝐴⊤𝑦 ≤ − 𝑐
𝐴𝑥 ≤ 𝑏 (5.15)
−𝑐⊤𝑥 + 𝑏⊤𝑦 ≤ 0

and 𝑥, 𝑦 ≥ 0. By Theorem 5.5, this means that there are 𝑢 ∈ R𝑛 , 𝑣 ∈ R𝑚 , and 𝑡 ∈ R
such that

𝑢 ≥ 0, 𝑣 ≥ 0, 𝑡 ≥ 0, 𝑣⊤𝐴 − 𝑡𝑐⊤ ≥ 0⊤, −𝑢⊤𝐴⊤ + 𝑡𝑏⊤ ≥ 0⊤, (5.16)

but
−𝑢⊤𝑐 + 𝑣⊤𝑏 < 0 . (5.17)
We derive a contradiction as follows: If 𝑡 = 0, this means that already the first
𝑛 + 𝑚 inequalities in (5.15) have no nonnegative solution 𝑥, 𝑦, contrary to our
assumption, for the following reason. Namely, by (5.16), 𝑡 = 0 means that 𝑣⊤𝐴 ≥ 0⊤,
𝑢⊤𝐴⊤ ≤ 0⊤, which with (5.14) implies

𝑣⊤𝑏 ≥ 𝑣⊤𝐴 𝑥ˆ ≥ 0 ≥ 𝑢⊤𝐴⊤𝑦ˆ ≥ 𝑢⊤𝑐

in contradiction to (5.17).
If 𝑡 > 0, then 𝑢 and 𝑣 are essentially primal and dual feasible solutions that
violate weak LP duality, because then by (5.16) 𝑏𝑡 ≥ 𝐴𝑢 and 𝑣⊤𝐴 ≥ 𝑡𝑐⊤ and
therefore
𝑣⊤𝑏 𝑡 ≥ 𝑣⊤𝐴𝑢 ≥ 𝑡𝑐⊤𝑢 ,
which is an inequality between scalars. After division by 𝑡 it gives 𝑣⊤𝑏 ≥ 𝑐⊤𝑢,
again contradicting (5.17).
In summary, if the first 𝑛 + 𝑚 inequalities in (5.15) have a solution 𝑥, 𝑦 ≥ 0,
then there is also such a solution that fulfills the last inequality, as claimed by the
strong LP duality Theorem 5.3.

5.5.3 Convex Sets

The following subsections are about completing the proof of Theorem 5.4, the
Lemma of Farkas. We will use geometric arguments about points in R𝑚 . In most
cases, this will involve at most three such points. These three points define a
triangle, so that the arguments are very accessibly visualised in a two-dimensional
plane.
Our first geometric concept, discussed in this subsection, is about convex
combinations and convexity.


Figure 5.8 The line through the points 𝑥 and 𝑦 consists of points written as
𝑥 + (𝑦 − 𝑥)𝑝 where 𝑝 ∈ R. Examples are point 𝑧 for 𝑝 = 0.6, point 𝑤 for 𝑝 = 1.5,
and point 𝑤 ′ when 𝑝 = −0.4. The line segment [𝑥, 𝑦] that connects 𝑥 and 𝑦
(drawn as a solid line) results when 𝑝 is restricted to 0 ≤ 𝑝 ≤ 1.

Let 𝑥 and 𝑦 be two vectors in R𝑚 . Figure 5.8 shows two points 𝑥 and 𝑦 in the
plane, but the picture may also be regarded as a suitable view of the situation
in a higher-dimensional space. The line that goes through the points 𝑥 and 𝑦 is
obtained by adding to the point 𝑥, regarded as a vector, any scalar multiple of the
difference 𝑦 − 𝑥. The resulting vector 𝑥 + (𝑦 − 𝑥)𝑝, for 𝑝 ∈ R, gives 𝑥 when 𝑝 = 0
and 𝑦 when 𝑝 = 1. Figure 5.8 gives some examples 𝑧, 𝑤, 𝑤 ′ of other points. When
0 ≤ 𝑝 ≤ 1, as for point 𝑧, the resulting points define the line segment that joins 𝑥
and 𝑦, which we denote by [𝑥, 𝑦] (note that 𝑥 and 𝑦 belong to R𝑚 here):

[𝑥, 𝑦] = {𝑥(1 − 𝑝) + 𝑦𝑝 | 𝑝 ∈ [0, 1]} . (5.18)

If 𝑝 > 1, then one obtains points on the line through 𝑥 and 𝑦 on the other side of 𝑦
relative to 𝑥, like the point 𝑤 in Figure 5.8. For 𝑝 < 0, the corresponding point,
like 𝑤 ′ in Figure 5.8, is on that line but on the other side of 𝑥 relative to 𝑦.
As already done in (5.18), the expression 𝑥 + (𝑦 − 𝑥)𝑝 can be re-written as
𝑥(1 − 𝑝) + 𝑦𝑝, where the given points 𝑥 and 𝑦 appear only once. This special linear
combination of 𝑥 and 𝑦 with nonnegative coefficients that sum to one is called a
convex combination of 𝑥 and 𝑦. It is useful to remember the expression 𝑥(1 − 𝑝) + 𝑦𝑝
in this order with 1 − 𝑝 as the coefficient of the first vector and 𝑝 of the second
vector, because then the line segment [𝑥, 𝑦] that joins 𝑥 to 𝑦 corresponds to the
real interval [0, 1] for the possible values of 𝑝, with the endpoints 0 and 1 of the
interval corresponding to the respective endpoints 𝑥 and 𝑦 of the line segment.

In general, a convex combination of points 𝑧1 , 𝑧2 , . . . , 𝑧 𝑘 in some space is given
as any linear combination 𝑧1 𝑝1 + 𝑧2 𝑝2 + · · · + 𝑧 𝑘 𝑝 𝑘 where the linear coefficients
𝑝1 , . . . , 𝑝 𝑘 are nonnegative and sum to one. The previously discussed case 𝑘 = 2
corresponds to 𝑧1 = 𝑥, 𝑧2 = 𝑦, 𝑝1 = 1 − 𝑝, and 𝑝2 = 𝑝 ∈ [0, 1].


Figure 5.9 Examples of sets that are convex (left) and not convex (right).

A set of points is called convex if, together with any points 𝑧1 , 𝑧2 , . . . , 𝑧 𝑘 , it also
contains every convex combination of these points. Equivalently, one can show that
a set is convex if, together with any two points, it also contains the line segment
that joins these two points (see Figure 5.9). One can then obtain combinations of
𝑘 points for 𝑘 > 2 by iterating convex combinations of only two points.

5.5.4 Separating Hyperplanes

The topic of this section is stated in Theorem 5.7 and shown in Figure 5.10: Given
a closed convex set 𝐶 and a point 𝑏 not in 𝐶, there is a hyperplane 𝐻 that separates
𝐶 from 𝑏, which means that 𝐶 is on one side of the hyperplane and 𝑏 is strictly on
the other side.


Figure 5.10 The separating hyperplane theorem for a closed set 𝐶, here a
compact set 𝐶 (but 𝐶 can be unbounded) and a point 𝑏 not in 𝐶.

The reasoning involves three points 𝑎, 𝑏, 𝑐, and is based on a very intuitive
observation about triangles. In order to remember it better, we describe it in the
form of a joke that can be dated back to the 19th century: Why does the chicken
cross the road? The answer is: To get to the other side. (The fact that this is not
really a joke at all is what is meant to be funny about this.) Our variant of this
question is shown in Figure 5.11 and asks: How does the chicken cross the triangle?
The chicken is in one corner 𝑏 of the triangle and wants to get to the other side of
the triangle. As appropriate for appearing in a course on optimisation, it wants to
do so as fast as possible. However, the chicken only crosses the triangle if the angle
at the adjacent corner 𝑐 is acute (less than 90 degrees). Otherwise, it will go along
the side from 𝑏 to 𝑐.


Figure 5.11 A triangle with corners 𝑎, 𝑏, 𝑐 and a chicken at point 𝑏 (with 𝑏
for bird, 𝑐 for corner). The chicken wants to get to the other side [𝑐, 𝑎] of
the triangle. Left: Acute angle at 𝑐, where the chicken crosses the triangle
(assuming 𝑎 is far enough away). Middle: Obtuse angle at 𝑐. Right: Right
angle at 𝑐. In the latter two cases the shortest way to the other side is along
the side [𝑏, 𝑐] of the triangle.

Whether the angle at 𝑐 is acute, a right angle, or obtuse is represented by the
sign of the scalar product of the vectors 𝑏 − 𝑐 and 𝑎 − 𝑐 (for simplicity, imagine
𝑐 = 0, in which case this is just the scalar product 𝑏⊤𝑎).

Lemma 5.6. Consider three distinct points 𝑎, 𝑏, 𝑐 in R𝑚 . Then the closest point to 𝑏 on
the line segment [𝑐, 𝑎] is 𝑐 if and only if

(𝑏 − 𝑐)⊤(𝑎 − 𝑐) ≤ 0 . (5.19)

Otherwise, that closest point is a convex combination 𝑐(1 − 𝑝) + 𝑎𝑝 for some 𝑝 ∈ (0, 1].

Proof. By (3.16), the (Euclidean) length ∥𝑥∥ of a vector 𝑥 is equal to √(𝑥⊤𝑥), and
minimising that length is equivalent to minimising its square 𝑥⊤𝑥. Consider a
point 𝑧 = 𝑐 + (𝑎 − 𝑐)𝑝 for 𝑝 ∈ R on the line through 𝑐 and 𝑎 (see Figure 5.8), which
for 𝑝 ∈ [0, 1] is on the side [𝑐, 𝑎] of the triangle. We minimise ∥𝑏 − 𝑧 ∥ 2 , that is,
(𝑏 − 𝑧)⊤(𝑏 − 𝑧), where

∥𝑏 − 𝑧 ∥ 2 = ∥𝑏 − 𝑐 − (𝑎 − 𝑐)𝑝∥ 2 = ∥𝑏 − 𝑐∥ 2 − 2(𝑏 − 𝑐)⊤(𝑎 − 𝑐)𝑝 + ∥𝑎 − 𝑐∥ 2 𝑝 2

which as a function of 𝑝 is a parabola that tends to infinity for large |𝑝| and thus has
its minimum when its derivative is zero, that is, −2(𝑏 − 𝑐)⊤(𝑎 − 𝑐) + 2∥𝑎 − 𝑐 ∥ 2 𝑝 = 0
or
        (𝑏 − 𝑐)⊤(𝑎 − 𝑐)
    𝑝 = ─────────────── .
          ∥𝑎 − 𝑐∥ 2
Hence, 𝑝 has the same sign as (𝑏 − 𝑐)⊤(𝑎 − 𝑐). If 𝑝 = 0 then, by the definition of 𝑧,
the closest point to 𝑏 on the line through 𝑐 and 𝑎 is 𝑐, as in the right picture in
Figure 5.11. If 𝑝 < 0, then that closest point on the line is to the left of 𝑐 but not on
the line segment [𝑐, 𝑎] (the side of the triangle), so the closest point to 𝑏 on [𝑐, 𝑎] is
also 𝑐. These are the cases claimed in (5.19). If 𝑝 > 0 then the closest point to 𝑏 on
the line through 𝑐 and 𝑎 is to the right of 𝑐, which belongs to [𝑐, 𝑎] if 𝑝 ≤ 1, and is
to the right of 𝑎 if 𝑝 > 1 in which case the closest point to 𝑏 on [𝑐, 𝑎] is 𝑎 (so then
the chicken does not cross the triangle either but walks along the side [𝑏, 𝑎]); at
any rate, the closest point to 𝑏 on [𝑐, 𝑎] is not 𝑐 and 𝑝 > 0 as claimed.
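The formula for 𝑝 derived in this proof gives a direct recipe for the closest point on a segment: compute the unconstrained minimiser and clamp it to [0, 1], which covers all three cases at once. A small sketch (assuming NumPy; the points are made up for illustration):

```python
import numpy as np

def closest_on_segment(b, c, a):
    """Closest point to b on the segment [c, a]: the unconstrained
    minimiser p = (b-c).(a-c) / ||a-c||^2 from the proof, clamped to [0, 1]."""
    p = (b - c) @ (a - c) / ((a - c) @ (a - c))
    p = min(max(p, 0.0), 1.0)  # p <= 0: answer is c; p >= 1: answer is a
    return c * (1 - p) + a * p

c, a = np.array([0.0, 0.0]), np.array([4.0, 0.0])
print(closest_on_segment(np.array([1.0, 2.0]), c, a))   # acute angle: (1, 0)
print(closest_on_segment(np.array([-1.0, 2.0]), c, a))  # obtuse angle at c: c
```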


Figure 5.12 Proof of Theorem 5.7 using Lemma 5.6.

The following theorem on the separating hyperplane is now easy to prove.

Theorem 5.7. Let 𝐶 be a non-empty closed convex set, 𝐶 ⊂ R𝑚 , 𝑏 ∈ R𝑚 , and 𝑏 ∉ 𝐶.
Let 𝑐 ∈ 𝐶 minimise ∥𝑏 − 𝑐∥. Consider the hyperplane 𝐻 through 𝑐 with normal vector
𝑣 = 𝑏 − 𝑐,
𝐻 = {𝑧 ∈ R𝑚 | 𝑣⊤𝑧 = 𝑣⊤𝑐} . (5.20)
Then all of 𝐶 is on one side of 𝐻 and 𝑏 is strictly on the other side of 𝐻:

(𝑏 − 𝑐)⊤(𝑏 − 𝑐) > 0, ∀𝑎 ∈ 𝐶 : (𝑏 − 𝑐)⊤(𝑎 − 𝑐) ≤ 0 . (5.21)

Proof. A closest point 𝑐 in 𝐶 to 𝑏 exists because 𝐶 is non-empty and closed.
The distance ∥𝑏 − 𝑐∥ is positive because 𝑏 ∉ 𝐶, which proves the first assertion
∥𝑏 −𝑐∥ 2 > 0 in (5.21). As shown in Figure 5.12, the second assertion is a consequence
of Lemma 5.6: It holds trivially if 𝑐 = 𝑎, and for 𝑎 ∈ 𝐶 with 𝑎 ≠ 𝑐 we have a triangle
with distinct points 𝑎, 𝑏, 𝑐. The side [𝑐, 𝑎] of that triangle belongs to 𝐶 because 𝐶
is convex. We cannot have (𝑏 − 𝑐)⊤(𝑎 − 𝑐) > 0 because then the closest point to 𝑏
on [𝑐, 𝑎] (and certainly such a point in 𝐶) would not be 𝑐 but some other point of
[𝑐, 𝑎] by Lemma 5.6.

5.5.5 The Convex Cone in the Lemma of Farkas

Our aim is to apply Theorem 5.7 to the cone 𝐶 in (5.12). For that we need to show
that 𝐶 is non-empty, convex, and closed. Clearly, 𝐶 is non-empty because 0 ∈ 𝐶.
Convexity is similarly easy, and a good exercise.

⇒ Prove that 𝐶 in (5.12) is convex.

The proof that 𝐶 is closed is not difficult but needs further steps that we
postpone to the next two subsections. For now we assume that 𝐶 is closed, and
focus on the definition of the hyperplane 𝐻.


Figure 5.13 The separating hyperplane theorem applied to 𝐶 in (5.12). Then
the “support point” 𝑐 can be replaced by 0.

Figure 5.13 shows a picture similar to Figure 5.7 where the point 𝑏 is not in 𝐶
and 𝑐 is the closest point to 𝑏 in 𝐶. The hyperplane 𝐻 that separates 𝑏 from 𝐶 is
defined in (5.20). We want to show that 𝐻 is defined by those points 𝑧 that fulfill
𝑣⊤𝑧 = 0, where 𝑣 = 𝑏 − 𝑐, because this is what we need for the Lemma of Farkas. In
other words, we want to show that 0 belongs to 𝐻, that is, 𝑣⊤𝑐 = 0. This is obvious
when we already have 𝑐 = 0.

⇒ Draw a picture of a cone 𝐶 as in (5.12) and a point 𝑏 not in 𝐶 such that the
closest point to 𝑏 in 𝐶 is 0.

In general, the condition 𝑣⊤𝑐 = 0 means that 𝑣 is orthogonal to 𝑐. Figures 5.7
and 5.13 show that this seems to be the case, and it is not hard to show using (5.21).
Note that the following theorem differs from Theorem 5.7 via the changed sign of
the normal vector of 𝐻, with 𝑦 = −𝑣.

Theorem 5.8. Let 𝐴 ∈ R𝑚×𝑛 and 𝐶 = {𝐴𝑥 | 𝑥 ∈ R𝑛 , 𝑥 ≥ 0} as in (5.12), and 𝑏 ∉ 𝐶.
Assume that 𝐶 is closed. Let 𝑐 be the closest point to 𝑏 in 𝐶, and 𝑦 = 𝑐 − 𝑏. Then
𝐻 = {𝑧 ∈ R𝑚 | 𝑦⊤𝑧 = 0} separates 𝑏 from 𝐶 in the sense that

𝑦⊤𝑏 < 0, 𝑦⊤𝐴 𝑗 ≥ 0 (1 ≤ 𝑗 ≤ 𝑛) (5.22)

(where 𝐴1 , . . . , 𝐴𝑛 are the columns of 𝐴) and thus 𝑦⊤𝑧 ≥ 0 for all 𝑧 ∈ 𝐶.

Proof. We apply Theorem 5.7 where 𝑣 = 𝑏 − 𝑐 = −𝑦 with 𝐻 as in (5.20). We
show 𝑣⊤𝑐 = 0, which holds if 𝑐 = 0. Otherwise, 𝑐 = Σ𝑗∈𝐽 𝐴𝑗 𝑥𝑗 with 𝑥𝑗 > 0 for all
𝑗 ∈ 𝐽 for some non-empty subset 𝐽 of {1, . . . , 𝑛}. Let 𝑗 ∈ 𝐽 and 𝑎 = 𝑐 + 𝐴𝑗 and
𝑎′ = 𝑐 − 𝐴𝑗 𝑥𝑗 where both 𝑎 and 𝑎′ belong to 𝐶, and 𝑎 − 𝑐 = 𝐴𝑗 and 𝑎′ − 𝑐 = −𝐴𝑗 𝑥𝑗 .
According to (5.21), (𝑏 − 𝑐)⊤(𝑎 − 𝑐) = 𝑣⊤𝐴 𝑗 ≤ 0 and (𝑏 − 𝑐)⊤(𝑎 ′ − 𝑐) = −𝑣⊤𝐴 𝑗 𝑥 𝑗 ≤ 0
and thus 𝑣⊤𝐴 𝑗 ≥ 0 because 𝑥 𝑗 > 0. In summary, 𝑣⊤𝐴 𝑗 = 0 for all 𝑗 ∈ 𝐽 and
therefore 𝑣⊤𝑐 = 0. Furthermore, 0 < (𝑏 − 𝑐)⊤(𝑏 − 𝑐) = 𝑣⊤𝑏 − 𝑣⊤𝑐 = 𝑣⊤𝑏 = −𝑦⊤𝑏.
This shows 𝑦⊤𝑏 < 0 in (5.22). Because 𝐴 𝑗 ∈ 𝐶 for 1 ≤ 𝑗 ≤ 𝑛, (5.21) shows
0 ≥ (𝑏 − 𝑐)⊤(𝐴 𝑗 − 𝑐) = 𝑣⊤𝐴 𝑗 − 𝑣⊤𝑐 = −𝑦⊤𝐴 𝑗 and the other inequalities in (5.22).
These clearly imply 𝑦⊤𝑧 ≥ 0 for all 𝑧 ∈ 𝐶.
The proof of Theorem 5.8 has a geometric intuition: If the closest point 𝑐 in 𝐶
to 𝑏 is not the origin 0, then 𝑐 is the positive linear combination of some columns
𝐴 𝑗 of 𝐴 (defined by 𝑗 ∈ 𝐽). Adding to 𝑐 a suitable positive or negative multiple of
𝐴𝑗 creates another point 𝑎 or 𝑎′ in 𝐶. But then the vector 𝑣 that points from 𝑐 to
𝑏 must be at a right angle with 𝐴𝑗 and −𝐴𝑗 because otherwise there would be a
closer point in 𝐶 to 𝑏, as shown in Figure 5.11.
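The construction in this proof can be reproduced numerically: SciPy's nonnegative least squares routine `nnls` minimises ∥𝐴𝑥 − 𝑏∥ over 𝑥 ≥ 0, which is exactly the projection of 𝑏 onto the cone 𝐶. A hedged sketch with a toy 2 × 2 matrix (assuming SciPy; the data is not from the text):

```python
import numpy as np
from scipy.optimize import nnls

# Toy cone: generated by the columns (2,1) and (1,2)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])  # not in the cone C

# nnls minimises ||A x - b|| over x >= 0, i.e. it computes the closest
# point c = A x in C to b
x, _ = nnls(A, b)
c = A @ x
y = c - b  # normal vector of the separating hyperplane H

print(c, y, y @ A, y @ b)  # y^T A_j >= 0 for all j, and y^T b < 0
```

Here the projection lands on the ray through the first generator, and the set 𝐽 of the proof consists of that single column.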

⇒ What is the set 𝐽 in the proof of Theorem 5.8 in Figure 5.7 and in Figure 5.13?

Theorem 5.8 is very nearly the statement of the Lemma of Farkas (Theorem 5.4),
except that we assumed that 𝐶 is closed. We next prove that this assumption always holds.

5.5.6 Linear Independence and Carathéodory’s Theorem

This subsection provides one more step in completing the proof of the Lemma
of Farkas. When looking for a nonnegative solution 𝑥 to the system 𝐴𝑥 = 𝑏,
the solution 𝑥 is in general not unique, as in the typical case where 𝐴 has more
columns than rows. However, when the columns 𝐴 𝑗 where 𝑥 𝑗 is positive are linearly
independent, then these components 𝑥 𝑗 of 𝑥 are unique.
The next lemma states that this can always be achieved even with the additional
requirement 𝑥 ≥ 0, if such solutions exist at all.
Lemma 5.9. Let 𝐴 = [𝐴1 · · · 𝐴𝑛 ] ∈ R𝑚×𝑛 and 𝑏 ∈ R𝑚 . If 𝐴𝑥 = 𝑏, 𝑥 ≥ 0 has a solution 𝑥,
then there is a set 𝐽 ⊆ {1, . . . , 𝑛} such that the vectors 𝐴 𝑗 for 𝑗 ∈ 𝐽 are linearly independent,
and there are unique positive reals 𝑥 𝑗 for 𝑗 ∈ 𝐽 with
    Σ𝑗∈𝐽 𝐴𝑗 𝑥𝑗 = 𝑏 . (5.23)

Proof. Let 𝐴𝑥 = 𝑏 for some 𝑥 ≥ 0 and 𝐽 = {𝑗 | 𝑥 𝑗 > 0} such that (5.23) holds. The
goal is now to remove elements from 𝐽 until the vectors 𝐴 𝑗 for 𝑗 ∈ 𝐽 are linearly
independent (in which case we simply call 𝐽 independent). Suppose this is not the
case. We change the coefficients 𝑥 𝑗 by keeping them nonnegative but such that at
least one of them becomes zero, which gives a smaller set 𝐽, as follows.

If 𝐽 is not independent then there are scalars 𝑧 𝑗 for 𝑗 ∈ 𝐽, not all zero, so that
    Σ𝑗∈𝐽 𝐴𝑗 𝑧𝑗 = 0 ,

where we can assume that the set 𝑆 = {𝑗 ∈ 𝐽 | 𝑧 𝑗 > 0} is not empty (otherwise
replace 𝑧 by −𝑧). Then
    Σ𝑗∈𝐽 𝐴𝑗 (𝑥𝑗 − 𝑧𝑗 𝛼) = 𝑏

for any 𝛼. We choose the largest 𝛼 so that 𝑥𝑗 − 𝑧𝑗 𝛼 ≥ 0 for all 𝑗 ∈ 𝐽. If 𝑧𝑗 ≤ 0 this
imposes no constraint on 𝛼, but for 𝑧𝑗 > 0 (that is, 𝑗 ∈ 𝑆) this means 𝑥𝑗 /𝑧𝑗 ≥ 𝛼. The
largest 𝛼 fulfilling all these constraints is given by

    𝛼 = min { 𝑥𝑗 /𝑧𝑗 | 𝑗 ∈ 𝑆 } =: 𝑥𝑖 /𝑧𝑖 (5.24)

which implies 𝑥𝑖 − 𝑧𝑖 𝛼 = 0. We then remove any 𝑖 that achieves the minimum in
(5.24) from 𝐽. By replacing 𝑥 with 𝑥 − 𝑧𝛼, that is, 𝑥𝑗 with 𝑥𝑗 − 𝑧𝑗 𝛼 for all 𝑗 ∈ 𝐽, we
thus obtain a smaller set 𝐽 to represent 𝑏 as in (5.23). By continuing in this manner,
we eventually obtain an independent set 𝐽 as claimed (if 𝑏 = 0, then 𝐽 is the empty
set). Then, because the vectors 𝐴 𝑗 for 𝑗 ∈ 𝐽 are linearly independent, the scalars 𝑥 𝑗
for 𝑗 ∈ 𝐽 are unique.
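The removal procedure in this proof translates directly into code. A sketch (assuming NumPy; taking the null-space vector 𝑧 from the singular value decomposition is one of several possible choices, not prescribed by the text):

```python
import numpy as np

def reduce_support(A, x, tol=1e-9):
    """Given x >= 0 with A @ x = b, return x' >= 0 with A @ x' = b such that
    the columns A_j with x'_j > 0 are linearly independent (Lemma 5.9)."""
    x = np.asarray(x, dtype=float).copy()
    while True:
        J = np.flatnonzero(x > tol)
        if np.linalg.matrix_rank(A[:, J]) == len(J):
            return x
        # a nonzero z with A_J z = 0: the last right-singular vector of A_J
        z = np.linalg.svd(A[:, J])[2][-1]
        if not (z > tol).any():
            z = -z  # make S = {j : z_j > 0} non-empty, as in the proof
        S = z > tol
        alpha = np.min(x[J][S] / z[S])  # largest step keeping x - z*alpha >= 0
        x[J] -= z * alpha
        x[np.abs(x) < tol] = 0.0  # at least one component drops to zero

# b = (2, 2) written with all three (dependent) columns; the reduction
# rewrites it using an independent subset of the columns
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x0 = np.array([1.0, 1.0, 1.0])
x1 = reduce_support(A, x0)
print(x1, A @ x1)  # same b = (2, 2), but with independent support
```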
An easy consequence of Lemma 5.9 is known as Carathéodory’s Theorem. For
𝑚 = 2 it states that if a point is the convex combination of some points in the plane,
then it is already the convex combination of suitably chosen three of those points
(see Figure 5.14).

Figure 5.14 Illustration of Theorem 5.10 for 𝑚 = 2. Any point in the pen-
tagon belongs to one of the three shown triangles (which are not unique
because there are other ways to “triangulate” the pentagon). A triangle is
the set of convex combinations of its corners.

Theorem 5.10 (Carathéodory). If a point 𝑏 is the convex combination of some points in
R𝑚 , then it is the convex combination of a suitable subset of at most 𝑚 + 1 of these points.

Proof. If 𝑏 is the convex combination of 𝑛 vectors 𝐴1 , . . . , 𝐴𝑛 in R𝑚 , then 𝑏 =
𝐴1 𝑥1 + · · · + 𝐴𝑛 𝑥𝑛 with 𝑥1 , . . . , 𝑥𝑛 ≥ 0, 𝑥1 + · · · + 𝑥𝑛 = 1. Append to each vector
an extra component 1; this is equivalent to saying that there are 𝑥1 , . . . , 𝑥𝑛 ≥ 0
such that

    ( 𝑏 ; 1 ) = ( 𝐴1 ; 1 ) 𝑥1 + · · · + ( 𝐴𝑛 ; 1 ) 𝑥𝑛 ∈ R𝑚+1 (5.25)

(where ( 𝑧 ; 1 ) denotes the column vector 𝑧 with an added last component 1).
By Lemma 5.9, a linearly independent subset of the vectors ( 𝐴𝑗 ; 1 ) suffices to
represent ( 𝑏 ; 1 ), and such a subset has at most 𝑚 + 1 elements. This proves the
claim.
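For 𝑚 = 2 the theorem can be checked by brute force: try all triples of the given points and solve a 3 × 3 linear system for the coefficients, exactly the system behind (5.25). A small sketch (assuming NumPy; the pentagon corners are made up for illustration):

```python
import numpy as np
from itertools import combinations

# Five points in convex position (a pentagon), and a point p inside it:
# here p is the average of all five corners, a convex combination of all five
pts = [np.array(v, dtype=float)
       for v in [(0, 0), (2, 0), (3, 1), (1, 3), (0, 2)]]
p = sum(pts) / 5

# Search for three of the points whose triangle contains p: solve for
# coefficients lam with lam >= 0 and sum(lam) = 1 (the appended row of ones)
for trio in combinations(pts, 3):
    M = np.vstack([np.column_stack(trio), np.ones(3)])
    lam = np.linalg.lstsq(M, np.append(p, 1.0), rcond=None)[0]
    if (lam >= -1e-9).all() and np.allclose(M @ lam, np.append(p, 1.0)):
        break

print(trio, lam)  # p is a convex combination of just these three corners
```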

5.5.7 Closedness of the Cone ∗

We now go back to the initial consideration for the Lemma of Farkas illustrated
in Figure 5.7. Consider the cone 𝐶 = {𝐴𝑥 | 𝑥 ≥ 0} as in (5.12) generated by the
columns 𝐴1 , . . . , 𝐴𝑛 of 𝐴. If 𝑏 ∉ 𝐶, then the hyperplane 𝐻 that separates 𝑏 from 𝐶
has the normal vector 𝑦 = 𝑐 − 𝑏 where 𝑐 is the closest point to 𝑏 in 𝐶.
In order for 𝑐 to exist, 𝐶 needs to be closed, that is, it contains any point nearby.
Otherwise, 𝑏 could be a point near 𝐶 but not in 𝐶 which would mean that the
distance ∥𝑐 − 𝑏∥ for 𝑐 in 𝐶 can become arbitrarily small. In that case, one could not
define 𝑦 as described.

Figure 5.15 Illustration of the proof of Lemma 5.11 where 𝐽 = {2} since 𝑐 (𝑘)
for large 𝑘 is a positive linear combination of 𝐴2 only.

Lemma 5.11. For an 𝑚 × 𝑛 matrix 𝐴 = [𝐴1 · · · 𝐴𝑛 ], the cone 𝐶 in (5.12) is a closed set.

Proof. Let 𝑏 be a point in R𝑚 near 𝐶, that is, for all 𝜀 > 0 there is a 𝑐 in 𝐶 so
that ∥𝑐 − 𝑏∥ < 𝜀. Consider a sequence 𝑐 (𝑘) (for 𝑘 = 1, 2, . . .) of elements of 𝐶 that
converges to 𝑏. By Lemma 5.9, there exists for each 𝑘 a subset 𝐽 (𝑘) of {1, . . . , 𝑛} and
unique positive real numbers 𝑥𝑗(𝑘) for 𝑗 ∈ 𝐽 (𝑘) so that the columns 𝐴𝑗 for 𝑗 ∈ 𝐽 (𝑘) are
linearly independent and
    𝑐 (𝑘) = Σ𝑗∈𝐽 (𝑘) 𝐴𝑗 𝑥𝑗(𝑘) .
There are only finitely many different sets 𝐽 (𝑘) , so there is a set 𝐽 that appears
infinitely often among them (see Figure 5.15 for an example). We consider the
subsequence of the vectors 𝑐 (𝑘) that use this set, that is,

    𝑐 (𝑘) = Σ𝑗∈𝐽 𝐴𝑗 𝑥𝑗(𝑘) =: 𝐴𝐽 𝑥𝐽(𝑘) (5.26)

where 𝐴𝐽 is the matrix with columns 𝐴𝑗 for 𝑗 ∈ 𝐽 and 𝑥𝐽(𝑘) is the vector with
components 𝑥𝑗(𝑘) for 𝑗 ∈ 𝐽. Now, 𝑥𝐽(𝑘) in (5.26) is a continuous function of 𝑐 (𝑘) : In
order to see this, consider a set 𝐼 of |𝐽 | linearly independent rows of 𝐴𝐽 , let 𝐴𝐼𝐽 be
the square submatrix of 𝐴𝐽 with these rows, and let 𝑐𝐼(𝑘) be the subvector of 𝑐 (𝑘)
with these rows, so that 𝑥𝐽(𝑘) = (𝐴𝐼𝐽 )−1 𝑐𝐼(𝑘) in (5.26). Hence, as 𝑐 (𝑘) converges to 𝑏,
the |𝐽 |-vector 𝑥𝐽(𝑘) converges to some 𝑥𝐽∗ with 𝑏 = 𝐴𝐽 𝑥𝐽∗ , where 𝑥𝐽(𝑘) > 0 implies
𝑥𝐽∗ ≥ 0, which shows that 𝑏 ∈ 𝐶. So 𝐶 is closed.

Remark 5.12. In Lemma 5.11, it is important that 𝐶 is the cone generated by finitely
many vectors 𝐴1 , . . . , 𝐴𝑛 . The cone generated by infinitely many vectors may not
be closed. For example, let 𝐶 be the set of nonnegative linear combinations of the
vectors (𝑛, 1) in R2 , for 𝑛 = 0, 1, 2, . . . Then (1, 0) is a vector near 𝐶 that does not
belong to 𝐶.

⇒ Exercise 5.3 asks you to prove Remark 5.12, by giving an exact description
of 𝐶.
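A quick numerical illustration of this remark (assuming NumPy): the points (1, 1/𝑛) = (𝑛, 1) · (1/𝑛) all lie in 𝐶 and come arbitrarily close to (1, 0), yet (1, 0) is not in 𝐶, because a nonnegative combination of the vectors (𝑛, 1) has second coordinate 0 only if all coefficients are 0.

```python
import numpy as np

# Distances from points of the cone C (generated by (n, 1), n = 0, 1, 2, ...)
# to the target (1, 0): the point (n, 1)/n = (1, 1/n) lies in C, and its
# distance to (1, 0) is 1/n -- arbitrarily small, but never 0.
target = np.array([1.0, 0.0])
dists = [np.linalg.norm(np.array([n, 1.0]) / n - target) for n in range(1, 6)]
print(dists)  # 1, 1/2, 1/3, 1/4, 1/5
```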

Lemma 5.11 completes the proof of the Lemma of Farkas: It shows that the
assumption in Theorem 5.8 that 𝐶 is closed always holds. The conclusions (5.22)
in that theorem then imply Theorem 5.4.

5.6 Boundedness and Dual Feasibility

So far, the strong duality Theorem 5.3 makes a statement only when both the
primal and the dual LP are feasible. In principle, it could be the case that the
primal LP has an optimal solution while its dual is not feasible. The following
theorem excludes this possibility. Its proof is a typical application of Theorem 5.3
itself.

Theorem 5.13 (Boundedness implies dual feasibility). Suppose the primal LP (5.7) is
feasible. Then its objective function is bounded if and only if the dual LP (5.9) is feasible.

Proof. By weak duality (Theorem 5.2), if the dual LP has a feasible solution 𝑦,
then its objective function 𝑦⊤𝑏 provides an upper bound for the primal objective
function 𝑐⊤𝑥. Conversely, suppose that the dual LP (5.9) is infeasible, and consider

the following LP which uses an additional real variable 𝑡 and the vector 1 which
has all components equal to 1:
    minimise 𝑡
    subject to 𝑦⊤𝐴 + 𝑡1⊤ ≥ 𝑐⊤,        (5.27)
               𝑦 ≥ 0, 𝑡 ≥ 0.
This LP is clearly feasible by setting 𝑦 = 0 and 𝑡 = max{0, 𝑐1 , . . . , 𝑐 𝑛 }. Also, the
constraints 𝑦⊤𝐴 ≥ 𝑐⊤, 𝑦 ≥ 0 of (5.9) have no solution if and only if the optimum
value of (5.27) has 𝑡 > 0, which we assume to be the case. The LP (5.27) is the dual
LP to the following LP, which we write with variables 𝑧 ∈ R𝑛 :
    maximise 𝑐⊤𝑧
    subject to 𝐴𝑧 ≤ 0 ,
               1⊤𝑧 ≤ 1 ,        (5.28)
               𝑧 ≥ 0 .
This LP is also feasible with 𝑧 = 0. By strong duality, it has the same value as its
dual LP (5.27), which is positive, given by 𝑐⊤𝑧 = 𝑡 > 0 for some 𝑧 that fulfills the
constraints in (5.28). Consider now a feasible solution 𝑥 to the original primal LP,
that is, 𝐴𝑥 ≤ 𝑏, 𝑥 ≥ 0, and let 𝛼 ∈ R, 𝛼 ≥ 0. Then 𝐴(𝑥 +𝑧𝛼) = 𝐴𝑥 +𝐴𝑧𝛼 ≤ 𝑏 +0𝛼 = 𝑏
and 𝑥 + 𝑧𝛼 ≥ 0, so 𝑥 + 𝑧𝛼 is also a feasible solution to (5.7) with objective function
value 𝑐⊤(𝑥 + 𝑧𝛼) = 𝑐⊤𝑥 + (𝑐⊤𝑧)𝛼 which gets arbitrarily large with growing 𝛼. So
the original LP is unbounded. This proves the theorem.
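The ray construction at the end of this proof can be traced on a tiny example. The data below are our own illustration (not from the text): the LP maximise 𝑥1 subject to 𝑥1 − 𝑥2 ≤ 1, 𝑥 ≥ 0 has an infeasible dual, and 𝑧 = (1, 1) satisfies 𝐴𝑧 ≤ 0 with 𝑐⊤𝑧 > 0, so the feasible points 𝑥 + 𝑧𝛼 have unbounded objective values:

```python
# Illustrative data (our own example, not from the text): the LP
#   maximise x1  subject to  x1 - x2 <= 1,  x1, x2 >= 0
# is feasible but unbounded, and z = (1, 1) satisfies Az <= 0 and c^T z > 0.
A = [[1.0, -1.0]]            # single constraint row
b = [1.0]
c = [1.0, 0.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [0.0, 0.0]               # feasible: Ax = 0 <= b, x >= 0
z = [1.0, 1.0]
assert dot(A[0], z) <= 0 and dot(c, z) > 0

for alpha in (0.0, 10.0, 1000.0):
    xa = [xi + zi * alpha for xi, zi in zip(x, z)]   # the ray x + z*alpha
    assert dot(A[0], xa) <= b[0] and min(xa) >= 0    # stays feasible
    assert dot(c, xa) == alpha                       # objective grows without bound
```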
An alternative way of stating the preceding theorem, for the dual LP, is as
follows.
Corollary 5.14. Suppose the dual LP (5.9) is feasible. Then the primal LP (5.7) is
infeasible if and only if the objective function of the dual LP (5.9) is unbounded.

Proof. This is just an application of Theorem 5.13 with dual and primal exchanged:
Rewrite (5.9) as a primal LP in the form: maximise −𝑏⊤𝑦 subject to −𝐴⊤𝑦 ≤ −𝑐,
𝑦 ≥ 0, so that its dual is: minimise −𝑥⊤𝑐 subject to −𝑥⊤𝐴⊤ ≥ −𝑏⊤, 𝑥 ≥ 0, which is
the same as (5.7), and apply Theorem 5.13.
On the other hand, the fact that one LP is infeasible does not imply that its
dual LP is unbounded, because both could be infeasible.
Remark 5.15. It is possible that both the primal LP (5.7) and its dual LP (5.9) are
infeasible.

Proof. Consider the primal LP


maximise 𝑥2
subject to 𝑥1 ≤ −1
𝑥1 − 𝑥2 ≤ 1
𝑥1 , 𝑥2 ≥ 0,

which is clearly infeasible, as is its dual LP

minimise − 𝑦1 + 𝑦2
subject to 𝑦1 + 𝑦2 ≥ 0
− 𝑦2 ≥ 1
𝑦1 , 𝑦2 ≥ 0 .

                          primal
               optimal   unbounded   infeasible
  dual
  optimal        yes        no           no
  unbounded      no         no           yes
  infeasible     no         yes          yes

Table 5.1 The possibilities for primal and dual LP, where “optimal” means
the LP is feasible and bounded and then has an optimal solution, and “un-
bounded” means the LP is feasible but its objective function is unbounded.

Table 5.1 shows the four possibilities that can occur for the primal LP and its
dual: both have optimal solutions, one is infeasible and the other unbounded, or
both are infeasible. If one LP is feasible, its dual cannot be unbounded by weak
duality (Theorem 5.2), and if it has an optimal solution then its dual cannot be
infeasible by Theorem 5.13.
Table 5.1 does not state the equality of primal and dual objective functions
when both have optimal solutions, but it does state Corollary 5.14. We show that
this implies the Lemma of Farkas for inequalities (Theorem 5.5, which we have
used to prove the strong duality Theorem 5.3). Consider the LP

maximise 0
subject to 𝐴𝑥 ≤ 𝑏 , (5.29)
𝑥 ≥ 0.

with its Tucker diagram


                𝑥 ≥ 0

    𝑦 ≥ 0         𝐴        ≤ 𝑏        (5.30)

                  ∨        ↩→ min
                 0⊤      → max

Its dual LP: minimise 𝑦⊤𝑏 subject to 𝑦⊤𝐴 ≥ 0⊤, 𝑦 ≥ 0, is feasible with 𝑦 = 0. The LP
(5.29) is feasible if and only if there is a solution 𝑥 to the inequalities 𝐴𝑥 ≤ 𝑏, 𝑥 ≥ 0.
By Corollary 5.14, there is no such solution if and only if the dual is unbounded,
that is, assumes an arbitrarily negative value of its objective function 𝑦⊤𝑏. This
is equivalent to the existence of some 𝑦 ≥ 0 with 𝑦⊤𝐴 ≥ 0⊤ and 𝑦⊤𝑏 < 0 which
can then be made arbitrarily negative by replacing 𝑦 with 𝑦𝛼 for any 𝛼 > 0. This
proves Theorem 5.5. This inequality version of the Lemma of Farkas can therefore
be remembered with the Tucker diagram (5.30) and Corollary 5.14. In that way, the
possibilities described in Table 5.1 capture the important theorems of LP duality.

5.7 Equality LP Constraints and Unrestricted Variables

We have stated the strong duality Theorem 5.3 for the standard inequality form
of an LP where both primal and dual LP have inequalities with separately stated
nonnegativity constraints for the primal and dual variables. In this section, we
consider different constraints for an LP, which offer greater flexibility in applying
the duality theorem in various contexts. Namely, we allow not only inequalities
but also equalities, as well as variables without a sign restriction.
These cases are closely related with respect to the duality property. As we will
see, a primal equality constraint corresponds to a dual variable that is unrestricted
in sign, and a primal variable that is unrestricted in sign gives rise to a dual
constraint that is an equality. The other case, which we have already seen, is
a primal inequality that corresponds to a dual variable that is nonnegative, or
a primal nonnegative variable where the corresponding dual constraint is an
inequality.
In the following, the matrix 𝐴 and vectors 𝑏 and 𝑐 will always have dimensions
𝐴 ∈ R𝑚×𝑛 , 𝑏 ∈ R𝑚 , and 𝑐 ∈ R𝑛 . These data 𝐴, 𝑏, 𝑐 will simultaneously define a
primal LP with variables 𝑥 in R𝑛 , and a dual LP with variables 𝑦 in R𝑚 . In the
primal LP, we write 𝐴𝑥 ≤ 𝑏 (which gives rise to nonnegative dual variables 𝑦
such that 𝑦 ≥ 0) or 𝐴𝑥 = 𝑏 (giving dual variables 𝑦 without sign constraints),
and maximise the objective function 𝑐⊤𝑥. In the dual LP, we state inequalities
𝑦⊤𝐴 ≥ 𝑐⊤ or equations 𝑦⊤𝐴 = 𝑐⊤ (depending on whether the corresponding primal
variables 𝑥 are nonnegative or unconstrained, respectively), and minimise the
objective function 𝑦⊤𝑏.
We first consider a primal LP with nonnegative variables and equality con-
straints, which is often called an LP in equality form:

maximise 𝑐⊤𝑥
subject to 𝐴𝑥 = 𝑏 , (5.31)
𝑥 ≥ 0.

The corresponding dual LP can be motivated, as in Section 5.4, by trying to find an
upper bound for the primal objective function. That is, each constraint in 𝐴𝑥 = 𝑏 is
multiplied with a dual variable 𝑦 𝑖 . However, because the constraint is an equality,
the variable 𝑦 𝑖 can be unrestricted in sign. The dual constraints are inequalities
because the primal objective function has nonnegative variables 𝑥 𝑗 . That is, the
dual LP to (5.31) is
    minimise 𝑦⊤𝑏
    subject to 𝑦⊤𝐴 ≥ 𝑐⊤.        (5.32)
The described motivation is just the weak duality theorem, which is immediate:
it says that for feasible solutions 𝑥 and 𝑦 to (5.31) and (5.32) we have 𝑥 ≥ 0 and
𝑦⊤𝐴 − 𝑐⊤ ≥ 0 and thus
𝑐⊤𝑥 ≤ 𝑦⊤𝐴 𝑥 = 𝑦⊤𝑏. (5.33)

The dual LP (5.32) has unconstrained variables 𝑦 subject to inequality constraints.
As a primal LP, this is of the form

    maximise 𝑐⊤𝑥
    subject to 𝐴𝑥 ≤ 𝑏.        (5.34)

To find the dual LP to the primal LP (5.34), we can again multiply each inequality in
𝐴𝑥 ≤ 𝑏 with a separate variable 𝑦 𝑖 , with the aim of finding an upper bound to the
primal objective function 𝑐⊤𝑥. The inequality is preserved when 𝑦 𝑖 is nonnegative,
but in order to obtain an upper bound on the primal objective function 𝑐⊤𝑥 we
have to require that 𝑦⊤𝐴 = 𝑐⊤ because the sign of any variable 𝑥 𝑗 is not known.
That is, the dual to (5.34) is

    minimise 𝑦⊤𝑏
    subject to 𝑦⊤𝐴 = 𝑐⊤, 𝑦 ≥ 0.        (5.35)

Observe that compared to (5.7) the LP (5.34) is missing the nonnegativity constraints
𝑥 ≥ 0, and that compared to (5.9) the dual LP (5.35) states 𝑛 equations 𝑦⊤𝐴 = 𝑐⊤
rather than inequalities.
Again, the choice of primal and dual LP is motivated by weak duality, which
states that for feasible solutions 𝑥 to (5.34) and 𝑦 to (5.35) the corresponding
objective functions are mutual bounds. Including proof, it says

𝑐⊤𝑥 = 𝑦⊤𝐴 𝑥 ≤ 𝑦⊤𝑏. (5.36)

Hence, we have the following types of pairs of a primal LP and its dual LP,
including the original more symmetric situation of LPs in inequality form:
• a primal LP (5.31) with nonnegative variables and equality constraints, and its
dual LP (5.32) with unrestricted variables and inequality constraints;

• a primal LP (5.34) with unrestricted variables and inequality constraints, and
its dual LP (5.35) with nonnegative variables and equality constraints;
• a primal LP (5.7) and its dual LP (5.9), both with nonnegative variables subject
to inequality constraints.
In all cases, by changing signs and transposing the matrix, we see that the dual of
the dual is again the primal.
Next, we consider how to convert an LP (5.7) in inequality form into equality
form by means of nonnegative “slack variables” as we have seen it in (5.13), which
is a standard trick in linear programming. The inequality constraints 𝐴𝑥 ≤ 𝑏 can
be converted to equality form by introducing a slack variable 𝑠 𝑖 for each inequality.
These slack variables define a nonnegative vector 𝑠 in R𝑚 . Then (5.7) is equivalent
to:
maximise 𝑐⊤𝑥
subject to 𝐴𝑥 + 𝑠 = 𝑏 , (5.37)
𝑥, 𝑠 ≥ 0 .

This amounts to extending the original constraint matrix 𝐴 to the right by an 𝑚 × 𝑚
identity matrix and adding coefficients 0 in the objective function for the slack
variables, as shown in the following Tucker diagram:

              𝑥 ≥ 0         𝑠 ≥ 0
                          1
                            1
  𝑦 ∈ R𝑚        𝐴             ..        = 𝑏        (5.38)
                                 .
                                   1
                ∨            ∨          ↩→ min
               𝑐⊤       0 0 · · · 0     → max

Note that converting the inequality form (5.7) to an LP in equality form (5.37)
defines a new dual LP with unrestricted variables 𝑦1 , . . . , 𝑦𝑚 , but the former
inequalities 𝑦 𝑖 ≥ 0 reappear now explicitly via the identity matrix and objective
function zeros introduced with the slack variables, as shown in (5.38). So the
resulting dual LP is exactly the same as in (5.9).
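The conversion (5.37) is mechanical and can be sketched in a few lines (the function name is an assumption of this illustration; the data are those of Example 5.1, which produces exactly the coefficients of the slack-variable system used in Section 5.11):

```python
# Extend A to [A | I] and c to (c, 0, ..., 0), as in the Tucker diagram (5.38).
def to_equality_form(A, b, c):
    m = len(A)
    A_ext = [row + [1.0 if i == j else 0.0 for j in range(m)]
             for i, row in enumerate(A)]
    return A_ext, b, c + [0.0] * m

A = [[3.0, 4.0, 2.0], [1.0, 1.0, 1.0]]   # Example 5.1
b = [7.0, 2.0]
c = [8.0, 10.0, 5.0]

A_ext, b_ext, c_ext = to_equality_form(A, b, c)
assert A_ext == [[3.0, 4.0, 2.0, 1.0, 0.0], [1.0, 1.0, 1.0, 0.0, 1.0]]
assert c_ext == [8.0, 10.0, 5.0, 0.0, 0.0]
# With x = 0 and s = b, the equalities Ax + s = b hold trivially.
```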
Even simpler, an LP in inequality form (5.7) can also be seen as the special
case of an LP with unrestricted variables 𝑥 𝑗 as in (5.34) since the condition 𝑥 ≥ 0
can be written in the form 𝐴𝑥 ≤ 𝑏 by explicitly listing the 𝑛 inequalities −𝑥 𝑗 ≤ 0.
That is, 𝐴𝑥 ≤ 𝑏 and 𝑥 ≥ 0 become with unrestricted 𝑥 ∈ R𝑛 the 𝑚 + 𝑛 inequalities

    ⎡  𝐴 ⎤        ⎡ 𝑏 ⎤
    ⎢    ⎥ 𝑥  ≤  ⎢    ⎥
    ⎣ −𝐼 ⎦        ⎣ 0 ⎦

with an 𝑛 × 𝑛 identity matrix 𝐼. As is easily seen with a suitable
Tucker diagram, the corresponding dual LP according to (5.35) has an additional
𝑛-vector of slack variables 𝑟, say, with the dual constraints 𝑦⊤𝐴 − 𝑟⊤ = 𝑐⊤, 𝑦 ≥ 0,

𝑟 ≥ 0, which are equivalent to the inequalities 𝑦⊤𝐴 ≥ 𝑐⊤, 𝑦 ≥ 0, again exactly as
in (5.9).

⇒ Draw the Tucker diagram suggested in the previous paragraph.

5.8 General LP Duality *

It is useful to consider all these forms of linear programs as a special case of an LP
in general form. A general LP has both inequalities and equalities as constraints,
as well as nonnegative and unrestricted variables, all in the same system. Let 𝐽
be the subset of the column set {1, . . . , 𝑛} where 𝑗 ∈ 𝐽 means that 𝑥 𝑗 ≥ 0, and
let 𝐾 be the subset of the row set {1, . . . , 𝑚} where 𝑖 ∈ 𝐾 means that row 𝑖 is an
inequality in the primal constraint with corresponding dual variable 𝑦 𝑖 ≥ 0 (the
letter 𝐼 denotes an identity matrix so we use 𝐾 instead). Let 𝐽̄ and 𝐾̄ be the sets of
unconstrained primal and dual variables, with corresponding dual and primal
equality constraints,

    𝐽̄ = {1, . . . , 𝑛} − 𝐽 ,    𝐾̄ = {1, . . . , 𝑚} − 𝐾 .        (5.39)

To define the LP in general form, we first draw the Tucker diagram, shown in
Figure 5.16. The diagram assumes that columns and rows are arranged so that
those in 𝐽 and 𝐾 come first. The big boxes contain the respective parts of the
constraint matrix 𝐴, the vertical boxes on the right the parts of the right-hand side 𝑏,
and the horizontal box at the bottom the parts of the primal objective function 𝑐⊤.

              𝑥 𝑗 ≥ 0 (𝑗 ∈ 𝐽)    𝑥 𝑗 ∈ R (𝑗 ∈ 𝐽̄)

  𝑦 𝑖 ≥ 0
  (𝑖 ∈ 𝐾)                                           ≤
                           𝐴                           𝑏
  𝑦 𝑖 ∈ R
  (𝑖 ∈ 𝐾̄)                                           =

                          𝑐⊤

  Figure 5.16 Tucker diagram for an LP in general form.

In order to state the duality theorem concisely, we define the feasible sets 𝑋
and 𝑌 for the primal and dual LP. The entries of the 𝑚 × 𝑛 matrix 𝐴 are 𝑎 𝑖𝑗 in row 𝑖
and column 𝑗. Let

    𝑋 = { 𝑥 ∈ R𝑛 |  ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 ≤ 𝑏 𝑖 ,  𝑖 ∈ 𝐾,
                    ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 = 𝑏 𝑖 ,  𝑖 ∈ 𝐾̄,        (5.40)
                    𝑥 𝑗 ≥ 0 ,  𝑗 ∈ 𝐽 }.
Any 𝑥 belonging to 𝑋 is called primal feasible, and the primal LP is called feasible
if 𝑋 is not the empty set ∅. The primal LP is the problem

maximise 𝑐⊤𝑥 subject to 𝑥 ∈ 𝑋 . (5.41)

(This results when reading the Tucker diagram in Figure 5.16 horizontally.) The
corresponding dual LP has the feasible set
    𝑌 = { 𝑦 ∈ R𝑚 |  ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 ≥ 𝑐 𝑗 ,  𝑗 ∈ 𝐽,
                    ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 = 𝑐 𝑗 ,  𝑗 ∈ 𝐽̄,        (5.42)
                    𝑦 𝑖 ≥ 0 ,  𝑖 ∈ 𝐾 }
and is the problem
minimise 𝑦⊤𝑏 subject to 𝑦 ∈ 𝑌 . (5.43)
(This results when reading the Tucker diagram in Figure 5.16 vertically.) By
reversing signs, one can verify that the dual of the dual LP is again the primal.
Table 5.2 shows the roles of the sets 𝐾, 𝐾̄, 𝐽, 𝐽̄.
For an LP in general form, the strong duality theorem states that (a) for any
primal and dual feasible solutions, the corresponding objective functions are
mutual bounds, (b) if the primal and the dual LP both have feasible solutions,
then they have optimal solutions with the same value of their objective functions,
(c) if the primal or dual LP is bounded, the other LP is feasible. This implies the
possibilities shown in Table 5.1.
Theorem 5.16 (General LP duality). For the primal LP (5.41) and its dual LP (5.43),
(a) (Weak duality) 𝑐⊤𝑥 ≤ 𝑦⊤𝑏 for all 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌.
(b) (Strong duality) If 𝑋 ≠ ∅ and 𝑌 ≠ ∅ then 𝑐⊤𝑥 = 𝑦⊤𝑏 for some 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌, so
that both 𝑥 and 𝑦 are optimal.
(c) (Boundedness implies dual feasibility) If 𝑋 ≠ ∅ and 𝑐⊤𝑥 for 𝑥 ∈ 𝑋 is bounded
above, then 𝑌 ≠ ∅. If 𝑌 ≠ ∅ and 𝑦⊤𝑏 for 𝑦 ∈ 𝑌 is bounded below, then 𝑋 ≠ ∅.

Proof. We show that an LP in general form can be represented as an LP in inequality
form with nonnegative variables, so that the claims (a), (b), and (c) follow from

  primal LP                                           dual LP

  constraint                                          variable
  row 𝑖 ∈ 𝐾:   inequality  ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 ≤ 𝑏 𝑖    nonnegative    𝑦 𝑖 ≥ 0
  row 𝑖 ∈ 𝐾̄:   equation    ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 = 𝑏 𝑖    unconstrained  𝑦 𝑖 ∈ R
                                                      objective function
                                                      minimise ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑏 𝑖

  variable                                            constraint
  column 𝑗 ∈ 𝐽:  nonnegative    𝑥 𝑗 ≥ 0               inequality  ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 ≥ 𝑐 𝑗
  column 𝑗 ∈ 𝐽̄:  unconstrained  𝑥 𝑗 ∈ R               equation    ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 = 𝑐 𝑗
  objective function
  maximise ∑_{𝑗=1}^{𝑛} 𝑐 𝑗 𝑥 𝑗

  Table 5.2 Relationship between inequalities and nonnegative variables, and
  equations and unconstrained variables, for a primal and dual LP in general
  form.

Theorems 5.2, 5.3, and 5.13. In order to keep the notation simple, we demonstrate
this first for the special case of an LP (5.31) in equality form and its dual (5.32),
that is, with 𝐽 = {1, . . . , 𝑛} and 𝐾̄ = {1, . . . , 𝑚}. This LP with constraints 𝐴𝑥 = 𝑏,
𝑥 ≥ 0 is equivalent to

    maximise 𝑐⊤𝑥
    subject to 𝐴𝑥 ≤ 𝑏 ,
               −𝐴𝑥 ≤ −𝑏 ,        (5.44)
               𝑥 ≥ 0 .
The corresponding dual LP uses two 𝑚-vectors 𝑦ˆ and 𝑦̄ and says

    minimise 𝑦ˆ⊤𝑏 − 𝑦̄⊤𝑏
    subject to 𝑦ˆ⊤𝐴 − 𝑦̄⊤𝐴 ≥ 𝑐⊤,        (5.45)
               𝑦ˆ, 𝑦̄ ≥ 0,

or equivalently

    minimise ( 𝑦ˆ − 𝑦̄)⊤𝑏
    subject to ( 𝑦ˆ − 𝑦̄)⊤𝐴 ≥ 𝑐⊤,        (5.46)
               𝑦ˆ, 𝑦̄ ≥ 0.
Any solution 𝑦 to the dual LP (5.32) with unconstrained dual variables 𝑦 can be
written in the form (5.46) where 𝑦ˆ represents the “positive” part of 𝑦 and 𝑦̄ the
negated “negative” part of 𝑦 according to

    𝑦ˆ 𝑖 = max{𝑦 𝑖 , 0},   𝑦̄ 𝑖 = max{−𝑦 𝑖 , 0},   𝑖 ∈ 𝐾̄,        (5.47)

so that evidently 𝑦ˆ ≥ 0 and 𝑦̄ ≥ 0 and 𝑦 = 𝑦ˆ − 𝑦̄. That is, any vector 𝑦 without
sign restrictions can be written as the difference of two nonnegative vectors 𝑦ˆ and
𝑦̄. These nonnegative vectors are not unique, because their components 𝑦ˆ 𝑖 and
𝑦̄ 𝑖 when defined by (5.47) (where at least one of them is zero) can be replaced by
𝑦ˆ 𝑖 + 𝑧 𝑖 and 𝑦̄ 𝑖 + 𝑧 𝑖 for any 𝑧 𝑖 ≥ 0, which leaves 𝑦ˆ 𝑖 − 𝑦̄ 𝑖 unchanged.
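This decomposition is easy to compute; a short sketch (the function name is our own) checks the defining properties and the non-uniqueness just described:

```python
# Split an unrestricted vector y into nonnegative parts as in (5.47):
# y_hat is the positive part, y_bar the negated negative part.
def split(y):
    y_hat = [max(v, 0.0) for v in y]
    y_bar = [max(-v, 0.0) for v in y]
    return y_hat, y_bar

y = [2.5, -1.0, 0.0]
y_hat, y_bar = split(y)
assert y_hat == [2.5, 0.0, 0.0] and y_bar == [0.0, 1.0, 0.0]
assert all(h - l == v for h, l, v in zip(y_hat, y_bar, y))

# The parts are not unique: shifting both by the same z >= 0 keeps the difference.
z = 3.0
assert all((h + z) - (l + z) == v for h, l, v in zip(y_hat, y_bar, y))
```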
That is, any solution to (5.46) and thus (5.45) is a solution to (5.32) and vice
versa. This proves the three claims (a), (b), (c) because they are known for the
primal LP (5.44) and its dual (5.45) which are in inequality form.
The same argument applies in the general case. First, if the set 𝐾̄ of equality
constraints is only a subset of {1, . . . , 𝑚}, then we write each such row as a pair
of two inequalities as in (5.44), which gives rise to a pair of nonnegative dual
variables 𝑦ˆ 𝑖 and 𝑦̄ 𝑖 for 𝑖 ∈ 𝐾̄. Their difference 𝑦ˆ 𝑖 − 𝑦̄ 𝑖 defines a variable 𝑦 𝑖 without
sign restrictions, and 𝑦ˆ 𝑖 , 𝑦̄ 𝑖 can be obtained from 𝑦 𝑖 according to (5.47).
Suppose this is done so that all rows are inequalities. In a similar way, we then
write an unrestricted primal variable 𝑥 𝑗 for 𝑗 ∈ 𝐽̄ as the difference 𝑥ˆ 𝑗 − 𝑥̄ 𝑗 of two
new primal variables that are nonnegative. The 𝑗th column 𝐴 𝑗 of 𝐴 and the 𝑗th
component 𝑐 𝑗 of 𝑐⊤ are then replaced by a pair 𝐴 𝑗 , −𝐴 𝑗 and 𝑐 𝑗 , −𝑐 𝑗 with coefficients
𝑥ˆ 𝑗 , 𝑥̄ 𝑗 , so that 𝑎 𝑖𝑗 𝑥 𝑗 is written as 𝑎 𝑖𝑗 𝑥ˆ 𝑗 − 𝑎 𝑖𝑗 𝑥̄ 𝑗 and 𝑐 𝑗 𝑥 𝑗 as 𝑐 𝑗 𝑥ˆ 𝑗 − 𝑐 𝑗 𝑥̄ 𝑗 . For these
columns, the resulting pair of inequalities in the dual LP

    ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 ≥ 𝑐 𝑗 ,    −∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 ≥ −𝑐 𝑗

is then equivalent to a dual equation for 𝑗 ∈ 𝐽̄, as stated in (5.42) and Table 5.2. The
claim then follows as before for the known statements for an LP in inequality form.

5.9 Complementary Slackness

The optimality condition 𝑐⊤𝑥 = 𝑦⊤𝑏, already stated in the weak duality Theorem 5.2,
is equivalent to a combinatorial condition known as “complementary slackness”.

It states that in each column 𝑗 and row 𝑖 at least one of the associated inequalities
in the dual or primal LP is tight, that is, holds as an equality. In a general LP, this
is only relevant for the inequality constraints, that is, for 𝑗 ∈ 𝐽 and 𝑖 ∈ 𝐾 (see the
Tucker diagram in Figure 5.16).

Theorem 5.17 (Complementary slackness). A pair 𝑥, 𝑦 of feasible solutions to the
primal LP (5.7) and its dual LP (5.9) is optimal if and only if

(𝑦⊤𝐴 − 𝑐⊤) 𝑥 = 0, 𝑦⊤(𝑏 − 𝐴𝑥) = 0, (5.48)

or equivalently for all 𝑗 = 1, . . . , 𝑛 and 𝑖 = 1, . . . , 𝑚

    ( ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 − 𝑐 𝑗 ) 𝑥 𝑗 = 0 ,    𝑦 𝑖 ( 𝑏 𝑖 − ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 ) = 0 ,        (5.49)
that is,

    𝑥 𝑗 > 0  ⇒  ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 = 𝑐 𝑗 ,    𝑦 𝑖 > 0  ⇒  ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 = 𝑏 𝑖 .        (5.50)

For an LP in general form (5.41) and its dual (5.43), a feasible pair 𝑥 ∈ 𝑋, 𝑦 ∈ 𝑌 is also
optimal if and only if (5.48) holds, or equivalently (5.49) or (5.50).

Proof. Suppose 𝑥 and 𝑦 are feasible for (5.7) and (5.9), so 𝐴𝑥 ≤ 𝑏, 𝑥 ≥ 0, 𝑦⊤𝐴 ≥ 𝑐⊤,
𝑦 ≥ 0. They are both optimal if and only if their objective functions are equal,
𝑐⊤𝑥 = 𝑦⊤𝑏. This means that the two inequalities 𝑐⊤𝑥 ≤ 𝑦⊤𝐴 𝑥 ≤ 𝑦⊤𝑏 used to prove
weak duality hold as equalities 𝑐⊤𝑥 = 𝑦⊤𝐴 𝑥 and 𝑦⊤𝐴 𝑥 = 𝑦⊤𝑏, which are equivalent
to (5.48).
The left equation in (5.48) says

    0 = (𝑦⊤𝐴 − 𝑐⊤) 𝑥 = ∑_{𝑗=1}^{𝑛} ( ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 − 𝑐 𝑗 ) 𝑥 𝑗 .        (5.51)

Then 𝑦⊤𝐴 ≥ 𝑐⊤ and 𝑥 ≥ 0 imply that the sum over 𝑗 on the right-hand side of (5.51)
is a sum of nonnegative terms, which is zero only if each of them is zero, as stated
on the left in (5.49). Similarly, the second equation 𝑦⊤(𝑏 − 𝐴𝑥) = 0 in (5.48) holds
only if the equations on the right of (5.49) hold for all 𝑖. Clearly, (5.50) is equivalent
to (5.49).
For an LP in general form, the feasibility conditions 𝑦 ∈ 𝑌 and 𝑥 ∈ 𝑋 with
(5.42) and (5.40) imply

    ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 = 𝑐 𝑗  for 𝑗 ∈ 𝐽̄,    ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 = 𝑏 𝑖  for 𝑖 ∈ 𝐾̄,        (5.52)

so that (5.49) holds for 𝑗 ∈ 𝐽̄ and 𝑖 ∈ 𝐾̄. Hence, the respective terms in (5.49) are
zero in the scalar products (𝑦⊤𝐴 − 𝑐⊤) 𝑥 and 𝑦⊤(𝑏 − 𝐴𝑥). These scalar products
are nonnegative because ∑_{𝑖=1}^{𝑚} 𝑦 𝑖 𝑎 𝑖𝑗 ≥ 𝑐 𝑗 and 𝑥 𝑗 ≥ 0 for 𝑗 ∈ 𝐽, and 𝑦 𝑖 ≥ 0 and
𝑏 𝑖 ≥ ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 for 𝑖 ∈ 𝐾. So the weak duality proof 𝑐⊤𝑥 ≤ 𝑦⊤𝐴 𝑥 ≤ 𝑦⊤𝑏 applies as
well. As before, optimality 𝑐⊤𝑥 = 𝑦⊤𝑏 is equivalent to (5.48) and thus to (5.49) and
(5.50), which for 𝑗 ∈ 𝐽̄ and 𝑖 ∈ 𝐾̄ hold trivially by (5.52) irrespective of the sign of 𝑥 𝑗
or 𝑦 𝑖 .

Consider the standard LP in inequality form (5.7) and its dual LP (5.9). The
dual feasibility constraints imply nonnegativity of 𝑦⊤𝐴 − 𝑐⊤, which is the 𝑛-vector
of “slacks”, that is, of differences in the inequalities 𝑦⊤𝐴 ≥ 𝑐⊤; such a slack is
zero in some column if the inequality is tight. The condition (𝑦⊤𝐴 − 𝑐⊤) 𝑥 = 0
in (5.48) says that this nonnegative slack vector is orthogonal to the nonnegative
vector 𝑥, because the scalar product of these two vectors is zero. The conditions
(5.49) and (5.50) state that this orthogonality can hold only if the two vectors are
complementary in the sense that in each component at least one of them is zero.
Similarly, the nonnegative 𝑚-vector 𝑦 and the 𝑚-vector of primal slacks 𝑏 − 𝐴𝑥 are
orthogonal in the second equation 𝑦⊤(𝑏 − 𝐴𝑥) = 0 in (5.48). In a compact way, we
can write
    𝑦⊤𝐴 ≥ 𝑐⊤   ⊥   𝑥 ≥ 0
    𝑦 ≥ 0      ⊥   𝐴𝑥 ≤ 𝑏        (5.53)

to state the following:


• all the inequalities in (5.53) have to hold, where those on the left state dual
feasibility and those on the right state primal feasibility, and
• the orthogonality signs ⊥ in the two rows in (5.53) say that the 𝑛- and 𝑚-vectors
of slacks (differences in these inequalities) have to be orthogonal as in (5.48)
for 𝑥 and 𝑦 to be optimal.
In the Tucker diagram (5.10), the first orthogonality in (5.53) refers to the
𝑛 columns and the second orthogonality to the 𝑚 rows. By (5.49) or (5.50),
orthogonality means complementarity in the sense that for each column 𝑗 or row 𝑖
at least one inequality is tight.
We demonstrate with Example 5.1 how complementary slackness is useful in
the search for optimal solutions. The dual to (5.8) (which we normally see directly
from the Tucker diagram) says explicitly: for 𝑦1 , 𝑦2 ≥ 0 subject to

    3𝑦1 + 𝑦2 ≥ 8
    4𝑦1 + 𝑦2 ≥ 10        (5.54)
    2𝑦1 + 𝑦2 ≥ 5

minimise 7𝑦1 + 2𝑦2 .



One feasible primal solution is 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) = (0, 1, 1). Then the first inequality
in (5.8) is not tight so by (5.50) we need 𝑦1 = 0 in an optimal solution. Because
𝑥2 > 0 and 𝑥3 > 0 the second and third inequality in (5.54) have to be tight, which
implies 𝑦2 = 10 and 𝑦2 = 5 which is impossible. So this 𝑥 is not optimal.
Another feasible primal solution is (𝑥1 , 𝑥2 , 𝑥3 ) = (0, 1.75, 0), where 𝑦2 = 0
because the second primal inequality is not tight. Only the second inequality in
(5.54) has to be tight, that is, 4𝑦1 = 10 or 𝑦1 = 2.5. However, this violates the first
dual inequality in (5.54).
Finally, for the primal solution 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) = (1, 1, 0) both primal inequal-
ities are tight, which allows for 𝑦1 > 0 and 𝑦2 > 0. Then the first two dual
inequalities in (5.54) have to be tight, which determines 𝑦 as (𝑦1 , 𝑦2 ) = (2, 2), which
also fulfills the third dual inequality (which is allowed to have positive slack
because 𝑥3 = 0). So here 𝑥 and 𝑦 are optimal.
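The optimality check just carried out for 𝑥 = (1, 1, 0) and 𝑦 = (2, 2) can be replayed in code. The following sketch verifies feasibility, the conditions (5.50), and the equality of the objective functions for Example 5.1:

```python
# Example 5.1 / (5.8): maximise 8x1 + 10x2 + 5x3 subject to
#   3x1 + 4x2 + 2x3 <= 7,  x1 + x2 + x3 <= 2,  x >= 0.
A = [[3.0, 4.0, 2.0], [1.0, 1.0, 1.0]]
b = [7.0, 2.0]
c = [8.0, 10.0, 5.0]
x = [1.0, 1.0, 0.0]    # conjectured optimal primal solution
y = [2.0, 2.0]         # dual solution determined by complementary slackness

def dot(u, v):
    return sum(a * t for a, t in zip(u, v))

assert all(dot(row, x) <= bi for row, bi in zip(A, b)) and min(x) >= 0  # primal feasible
yA = [sum(y[i] * A[i][j] for i in range(2)) for j in range(3)]
assert all(yA[j] >= c[j] for j in range(3)) and min(y) >= 0             # dual feasible

# (5.50): x_j > 0 forces a tight dual column, y_i > 0 forces a tight primal row
assert all(x[j] == 0 or yA[j] == c[j] for j in range(3))
assert all(y[i] == 0 or dot(A[i], x) == b[i] for i in range(2))

assert dot(c, x) == dot(y, b) == 18.0   # equal objective values certify optimality
```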
The complementary slackness condition is a good way to verify that a con-
jectured primal solution is optimal, because the resulting equations for the dual
variables typically determine the values of the dual variables which can then be
checked for dual feasibility (or for equality of primal and dual objective function).
As stated in Theorem 5.17, the complementary slackness conditions charac-
terise optimality of a primal-dual pair 𝑥, 𝑦 also for an LP in general form. However,
in such an LP they only impose constraints for the primal or dual inequalities,
that is, for the columns 𝑗 ∈ 𝐽 and 𝑖 ∈ 𝐾 in the Tucker diagram in Figure 5.16.
The other columns and rows already define dual or primal equations which by
definition have zero slack. This is also the case if such an equality is converted to a
pair of inequalities. For example, for 𝑖 ∈ 𝐾̄, the primal equation ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 = 𝑏 𝑖
with unrestricted dual variable 𝑦 𝑖 can be rewritten as a pair of two inequalities
∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 ≤ 𝑏 𝑖 and −∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 ≤ −𝑏 𝑖 with associated nonnegative dual variables
𝑦ˆ 𝑖 and 𝑦̄ 𝑖 so that 𝑦 𝑖 = 𝑦ˆ 𝑖 − 𝑦̄ 𝑖 . As noted in the proof of Theorem 5.16, we can add
a constant 𝑧 𝑖 to the two variables 𝑦ˆ 𝑖 and 𝑦̄ 𝑖 in any dual feasible solution, so that
they are both positive when 𝑧 𝑖 > 0. By complementary slackness, the two primal
inequalities then have to be tight, but in any case they can only be fulfilled if they
both hold as an equation. This confirms that for a general LP, complementary slackness
is informative only for the inequality constraints.

5.10 LP Duality and the KKT Theorem *

In this section we connect linear programming duality with the first-order condi-
tions studied in the previous chapter. These concern a local optimum, but for an
LP that is the same as a global optimum:
Theorem 5.18. Any local optimum (maximum or minimum) of an LP (in general form) is
a global optimum.

Proof. Suppose the objective function 𝑐⊤𝑥 is to be maximised (if it is to be minimised
consider −𝑐⊤𝑥 instead). Let 𝑋 be the feasible set of the given LP as in (5.40) and
suppose 𝑥̄ is a local maximum, that is, for some 𝜀 > 0 we have

    ∀𝑧 ∈ 𝑋 :  ∥𝑧 − 𝑥̄∥ < 𝜀  ⇒  𝑐⊤𝑥̄ ≥ 𝑐⊤𝑧 .        (5.55)
We now use the crucial property that 𝑋 is convex, which is an easy and instructive
exercise.
⇒ Show that 𝑋 in (5.40) is convex. Convex combinations are special linear
combinations. Why do they preserve equations? And why do they preserve
inequalities?

Suppose that 𝑥̄ is not a global maximum, that is, 𝑐⊤𝑥 > 𝑐⊤𝑥̄ for some 𝑥 ∈ 𝑋.
Let 0 < 𝛿 ≤ 1 and let 𝑧 = 𝑥̄(1 − 𝛿) + 𝑥𝛿 = 𝑥̄ + (𝑥 − 𝑥̄)𝛿, where 𝑧 ∈ 𝑋 because 𝑋 is
convex. Then

    𝑐⊤𝑧 = 𝑐⊤𝑥̄ + (𝑐⊤𝑥 − 𝑐⊤𝑥̄)𝛿 > 𝑐⊤𝑥̄ .        (5.56)

However, ∥𝑧 − 𝑥̄∥ = ∥𝑥 − 𝑥̄∥ 𝛿 < 𝜀 for sufficiently small positive 𝛿, and then (5.56)
contradicts (5.55). Hence, there is no 𝑥 ∈ 𝑋 with 𝑐⊤𝑥 > 𝑐⊤𝑥̄, which shows that 𝑥̄ is
indeed a global maximum.
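The convexity claim in the exercise above can at least be spot-checked numerically. The sketch below uses the data of Example 5.1 and two feasible points chosen for illustration, and verifies feasibility along the whole segment between them:

```python
# Data of Example 5.1; the two feasible points are chosen for illustration.
A = [[3.0, 4.0, 2.0], [1.0, 1.0, 1.0]]
b = [7.0, 2.0]

def feasible(x):
    """Membership test for X = {x | Ax <= b, x >= 0}."""
    return (min(x) >= 0 and
            all(sum(a * v for a, v in zip(row, x)) <= bi
                for row, bi in zip(A, b)))

u = [1.0, 1.0, 0.0]
v = [0.0, 0.0, 2.0]
assert feasible(u) and feasible(v)

# Convex combinations have nonnegative weights summing to 1, so they preserve
# x >= 0 and every inequality row; the segment stays inside X.
for delta in (0.0, 0.25, 0.5, 0.75, 1.0):
    w = [(1 - delta) * ui + delta * vi for ui, vi in zip(u, v)]
    assert feasible(w)
```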
We show that the KKT Theorem 4.11 applied to a linear program is essentially
the strong LP duality theorem, applied to an LP (5.34) in inequality form where
any inequalities such as 𝑥 ≥ 0 would have to be written as part of 𝐴𝑥 ≤ 𝑏, so the
variables 𝑥 ∈ R𝑛 are unrestricted. In order to match the notation in Theorem 4.11, let
the number of rows of 𝐴 be ℓ . That is, (5.34) states: maximise 𝑓 (𝑥) = 𝑐⊤𝑥 subject to
ℎ 𝑖 (𝑥) = ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥 𝑗 − 𝑏 𝑖 ≤ 0 for 1 ≤ 𝑖 ≤ ℓ . The functions 𝑓 and ℎ 𝑖 are affine functions
that have constant derivatives, with 𝐷 𝑓 (𝑥) = 𝑐⊤ and 𝐷 ℎ 𝑖 (𝑥) = (𝑎 𝑖1 , . . . , 𝑎 𝑖𝑛 ) for
1 ≤ 𝑖 ≤ ℓ . The open set 𝑈 in Theorem 4.11 is R𝑛 .
Suppose that this LP is feasible and that 𝑐⊤𝑥 has a local maximum at 𝑥 = 𝑥̄,
which by Theorem 5.18 is also a global maximum. By the duality Theorem 5.16
for an LP in general form, there exists an optimal dual vector 𝑦 ∈ Rℓ with 𝑦 ≥ 0
and 𝑦⊤𝐴 = 𝑐⊤ (see also (5.35)), which is equivalent to 𝐷 𝑓 (𝑥̄) = 𝑐⊤ = 𝑦⊤𝐴 =
∑_{𝑖=1}^{ℓ} 𝑦 𝑖 𝐷 ℎ 𝑖 (𝑥̄), which is the last equation in (4.51) with 𝜇 𝑖 = 𝑦 𝑖 for 1 ≤ 𝑖 ≤ ℓ .
Moreover, the optimality condition 𝑐⊤𝑥̄ = 𝑦⊤𝑏 is equivalent to the complementary
slackness conditions (5.49). In (5.49), the first set of equations holds automatically
because 𝑦⊤𝐴 = 𝑐⊤, and the second equations 𝑦 𝑖 (𝑏 𝑖 − ∑_{𝑗=1}^{𝑛} 𝑎 𝑖𝑗 𝑥̄ 𝑗 ) = 0 are equivalent
to 𝑦 𝑖 (−ℎ 𝑖 (𝑥̄)) = 0 and therefore to 𝜇 𝑖 ℎ 𝑖 (𝑥̄) = 0 as stated in (4.51). So Theorem 4.11
is a consequence of the strong duality theorem, in fact in a stronger form because
it does not require the constraint qualification that the gradients in (4.51) for the
tight constraints are linearly independent.
Conversely, the strong duality theorem for an LP with unrestricted variables
(5.34) can also be seen as a special case of the KKT Theorem 4.11, where one can
argue separately that the constraint qualification is not needed.

However, although we do not go into details here, mathematically it is not
the right approach to prove LP duality as a special case of the KKT Theorem.
Rather, LP duality is a more basic observation (which we have proved here in
full). The KKT Theorem applies to the derivatives of differentiable functions. The
fundamental purpose of derivatives is to describe local approximations of the given
functions by linear functions. These linear approximations can be studied with
the help of linear programming, which can be used to prove the KKT Theorem.
However, this has to be done carefully, and requires separate treatment. For
example, the constraint qualifications are needed to make sure that this approach
works properly. See Kuhn (1991) for a detailed discussion.

5.11 The Simplex Algorithm: Example

The simplex algorithm is a method to find a solution to a linear program. It
has been successfully applied to very large LPs with hundreds of thousands of
variables. The algorithm applies to LPs in equality form, to which a standard LP
in inequality form is converted with the help of slack variables as in (5.37).
In this section, we describe the simplex algorithm for Example 5.1, written
with slack variables 𝑠 1 and 𝑠2 as the problem: for 𝑥1 , 𝑥2 , 𝑥3 , 𝑠1 , 𝑠2 ≥ 0 subject to
    3𝑥1 + 4𝑥2 + 2𝑥3 + 𝑠1 = 7
     𝑥1 +  𝑥2 +  𝑥3 + 𝑠2 = 2        (5.57)
    maximise 8𝑥1 + 10𝑥2 + 5𝑥3 .
Because 𝑏 ≥ 0, this system 𝐴𝑥 + 𝑠 = 𝑏 has an easy first solution where 𝑥 = 0
and 𝑠 = 𝑏. These equations are now rewritten with 𝑠1 and 𝑠 2 as functions of the
remaining variables 𝑥1 , 𝑥2 , 𝑥3 . In addition, an extra variable 𝑧 denotes the current
value of the objective function. Then (5.57) is equivalent to: maximise 𝑧 subject to
𝑥1 , 𝑥2 , 𝑥3 , 𝑠1 , 𝑠2 ≥ 0 and
𝑠1 = 7 − 3𝑥1 − 4𝑥2 − 2𝑥 3
𝑠2 = 2 − 𝑥1 − 𝑥2 − 𝑥3 (5.58)
𝑧 = 0 + 8𝑥1 + 10𝑥2 + 5𝑥 3
A system such as (5.58) is called a dictionary and is defined as follows. Assume
the original system has 𝑚 equality constraints. Then 𝑚 of the variables (here 𝑠1
and 𝑠2 ), called basic variables, are expressed in terms of the remaining nonbasic
variables (here 𝑥 1 , 𝑥2 , 𝑥3 ). This system of equations is always equivalent to the original
equality constraints. The objective function 𝑧 is also expressed as a function of the
nonbasic variables, and written below the horizontal line.
The basic solution that corresponds to a dictionary is obtained by setting all
nonbasic variables to zero. It is called a basic feasible solution when the resulting

basic variables are nonnegative. In the dictionary, the values for the basic variables
in this basic feasible solution are just the constants that follow the equality signs,
with the corresponding value for 𝑧 beneath the horizontal line. In (5.58) these
values are 𝑠 1 = 7, 𝑠2 = 2, 𝑧 = 0.
Starting with an initial basic feasible solution such as (5.58), the simplex
algorithm proceeds in steps that rewrite the dictionary. In our example, we record
the changes of the dictionary, and keep in mind that 𝑧 should be maximised and
all variables should stay nonnegative. In each step, one nonbasic variable becomes
basic (it is said to enter the basis) and a basic variable becomes nonbasic (this variable
is said to leave the basis).
For a given dictionary, the entering variable is chosen so as to improve the value
of the objective function when that variable is increased from zero in the current basic
feasible solution. In (5.58), this will happen by increasing any of 𝑥1 , 𝑥2 , 𝑥3 because
they all have a positive coefficient in the linear equation for 𝑧. Suppose 𝑥 2 is
chosen as the entering variable (for example, because it has the largest coefficient).
Suppose the other nonbasic variables 𝑥 1 and 𝑥3 stay at zero and 𝑥2 increases. Then
𝑧 = 0 + 10𝑥2 (the desired increase), 𝑠 1 = 7 − 4𝑥 2 , and 𝑠2 = 2 − 𝑥2 . In order to
maintain feasibility, we need 𝑠1 = 7 − 4𝑥2 ≥ 0 and 𝑠 2 = 2 − 𝑥2 ≥ 0, where these
two constraints are equivalent to 7/4 = 1.75 ≥ 𝑥2 and 2 ≥ 𝑥2 . The first of these
is the stronger constraint: when 𝑥2 is increased from 0 to 1.75, then 𝑠1 = 0 and
𝑠2 = 0.25 > 0. For that reason, 𝑠 1 is chosen as the leaving variable, and we rewrite
the first equation in (5.58) so that 𝑥 2 is on the left and 𝑠1 is on the right, giving

4𝑥 2 = 7 − 3𝑥1 − 𝑠1 − 2𝑥3
𝑠2 = 2 − 𝑥1 − 𝑥2 − 𝑥3 (5.59)
𝑧 = 0 + 8𝑥1 + 10𝑥 2 + 5𝑥3

However, this is not a dictionary because 𝑥2 is still on the right-hand side of the
second and third equation, but should appear only on the left. To remedy this, we
first rewrite the first equation so that 𝑥2 has coefficient 1, and then substitute this
equation into the other two equations:

𝑥2 = 1.75 − 0.75𝑥1 − 0.25𝑠1 − 0.5𝑥3
𝑠2 = 2 − 𝑥1 − 𝑥3 − (1.75 − 0.75𝑥1 − 0.25𝑠1 − 0.5𝑥3)                (5.60)
𝑧 = 0 + 8𝑥1 + 5𝑥3 + 10(1.75 − 0.75𝑥1 − 0.25𝑠1 − 0.5𝑥3)

which gives the new dictionary with basic variables 𝑥2 and 𝑠2 and nonbasic
variables 𝑥 1 , 𝑠1 , 𝑥3 :

𝑥2 = 1.75 − 0.75𝑥1 − 0.25𝑠1 − 0.5𝑥3
𝑠2 = 0.25 − 0.25𝑥1 + 0.25𝑠1 − 0.5𝑥3                (5.61)
𝑧 = 17.5 + 0.5𝑥1 − 2.5𝑠1 + 0𝑥3

The basic feasible solution corresponding to (5.61) is 𝑥 2 = 1.75, 𝑠2 = 0.25 and has
objective function value 𝑧 = 17.5. The latter can still be improved by increasing
𝑥1 , which is now the unique choice for entering variable because neither 𝑠 1 nor
𝑥3 have a positive coefficient in this representation of 𝑧. Increasing 𝑥1 from zero
imposes the constraints 𝑥2 = 1.75 − 0.75𝑥1 ≥ 0 and 𝑠2 = 0.25 − 0.25𝑥1 ≥ 0, where
the second is stronger, since 𝑠 2 becomes zero when 𝑥1 = 1 while 𝑥 2 is still positive.
So 𝑥1 enters and 𝑠2 leaves the basis. Similar to the step from (5.58) to (5.59), we
bring 𝑥 1 to the left and 𝑠2 to the right side of the equation,

𝑥2 = 1.75 − 0.75𝑥1 − 0.25𝑠1 − 0.5𝑥3
0.25𝑥1 = 0.25 − 𝑠2 + 0.25𝑠1 − 0.5𝑥3                (5.62)
𝑧 = 17.5 + 0.5𝑥1 − 2.5𝑠1 + 0𝑥3

and substitute the resulting equation 𝑥1 = 1 − 4𝑠 2 + 𝑠1 − 2𝑥3 for 𝑥1 into the other
two equations:
𝑥2 = 1.75 − 0.25𝑠1 − 0.5𝑥3 − 0.75(1 − 4𝑠2 + 𝑠1 − 2𝑥3)
𝑥1 = 1 − 4𝑠2 + 𝑠1 − 2𝑥3                (5.63)
𝑧 = 17.5 − 2.5𝑠1 + 0𝑥3 + 0.5(1 − 4𝑠2 + 𝑠1 − 2𝑥3)
which gives the next dictionary with 𝑥 2 and 𝑥1 as basic variables and 𝑠2 , 𝑠1 , 𝑥3 as
nonbasic variables:
𝑥2 = 1 + 3𝑠2 − 𝑠1 + 𝑥3
𝑥1 = 1 − 4𝑠2 + 𝑠1 − 2𝑥3 (5.64)
𝑧 = 18 − 2𝑠2 − 2𝑠 1 − 𝑥3
As always, this dictionary is equivalent to the original system of equations (5.57),
with basic feasible solution 𝑥1 = 1, 𝑥2 = 1, and corresponding objective function
value 𝑧 = 18. In the last line in (5.64), no nonbasic variable has a positive coefficient.
This means that no increase from zero of a nonbasic variable can improve the
objective function. Hence this basic feasible solution is optimal, and the algorithm
terminates.
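The run of the algorithm can be checked numerically. The following sketch (our own code, using exact fractions) verifies that each basic feasible solution visited, read off from (5.58), (5.61) and (5.64), satisfies the original constraints, and that the objective value increases from 0 to 17.5 to 18:

```python
from fractions import Fraction as F

# Basic feasible solutions visited by the algorithm, in variable order
# (x1, x2, x3, s1, s2), read off from (5.58), (5.61) and (5.64).
visited = [
    (F(0), F(0), F(0), F(7), F(2)),        # z = 0
    (F(0), F(7, 4), F(0), F(0), F(1, 4)),  # z = 17.5
    (F(1), F(1), F(0), F(0), F(0)),        # z = 18
]

def feasible(x):
    """Check the equality constraints of (5.57) and nonnegativity."""
    x1, x2, x3, s1, s2 = x
    return (3*x1 + 4*x2 + 2*x3 + s1 == 7
            and x1 + x2 + x3 + s2 == 2
            and all(v >= 0 for v in x))

def objective(x):
    return 8*x[0] + 10*x[1] + 5*x[2]

assert all(feasible(x) for x in visited)
assert [objective(x) for x in visited] == [0, F(35, 2), 18]
```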
Converting one dictionary to another by exchanging a nonbasic (entering)
variable with a basic (leaving) variable is commonly referred to as pivoting. The
column of the entering variable and the row of the leaving variable define a
nonzero coefficient of the entering variable known as a pivot element. Pivoting
amounts to a manipulation of the matrix of coefficients of all variables and of the
right-hand side (often called a tableau). This manipulation involves row operations
that represent the substitutions as performed in (5.60) and (5.63). In such a row
operation, the pivot row is divided by the pivot element, and suitable multiples of
the resulting new row are subtracted from the other rows.
That is, we can express the variable substitution that leads to the new dictionary
in terms of suitable row operations of the system of equations. This is easiest seen
by keeping all variables on one side similar to (5.57). We rewrite (5.58) as

3𝑥1 + 4𝑥2 + 2𝑥3 + 𝑠1 = 7
𝑥1 + 𝑥2 + 𝑥3 + 𝑠2 = 2                (5.65)
𝑧 − 8𝑥1 − 10𝑥2 − 5𝑥3 = 0

where we now have to remember that in the expression for 𝑧 a potential entering
variable is identified by a negative coefficient. In (5.65) the basic variables are 𝑠1
and 𝑠2 , each of which has a unit vector as its column of coefficients, with entry 1
in the row of that basic variable and entry 0 elsewhere.
With 𝑥2 as the entering and 𝑠1 as the leaving variable in (5.65), pivoting
amounts to creating a unit vector in the column for 𝑥2 . This means to divide the
first (pivot) row by 4 so that 𝑥2 has coefficient 1 in that row. The new first row is
then subtracted from the second row, and 10 times the new first row is added to
the third row, so that the coefficient of 𝑥 2 in those rows becomes zero:
0.75𝑥1 + 𝑥2 + 0.5𝑥3 + 0.25𝑠1 = 1.75
0.25𝑥1 + 0.5𝑥3 − 0.25𝑠1 + 𝑠2 = 0.25 (5.66)
𝑧 − 0.5𝑥1 + 0𝑥3 + 2.5𝑠1 = 17.5

These row operations have the same effect as the substitutions in (5.60). The system
(5.66) is equivalent to the second dictionary (5.61). The basic variables 𝑥2 and 𝑠 2
are identified by their unit-vector columns. Note that 𝑧 is expressed only in terms
of the nonbasic variables.
The entering variable in (5.66) is 𝑥1 and the leaving variable is 𝑠2 , so that the
unit vector for that second row should now appear in the column for 𝑥1 rather
than 𝑠2 . The second row is divided by the pivot element 0.25 (i.e., multiplied by 4)
to give 𝑥1 coefficient 1, and the coefficients of 𝑥 1 in the other rows give the suitable
multiplier to subtract the new second row from the other rows, namely 0.75 for
the first and −0.5 for the third row. This gives

𝑥 2 − 𝑥3 + 𝑠 1 − 3𝑠2 = 1
𝑥1 + 2𝑥3 − 𝑠 1 + 4𝑠2 = 1 (5.67)
𝑧 + 𝑥3 + 2𝑠 1 + 2𝑠2 = 18

which is equivalent to the final dictionary (5.64).


The only difference between the “dictionaries” (5.58), (5.61), (5.64) and the
“tableaus” (5.65), (5.66), (5.67) is the sign of the nonbasic variables, and that the
columns of all variables stay in place in a tableau. This requires identifying the
basic variables, which have unit vectors as columns.
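The row operations just described are easy to mechanise. The following sketch (our own code; the function name `pivot` is our choice) applies them to the tableau (5.65) with exact fractions, reproducing (5.66) and then (5.67):

```python
from fractions import Fraction as F

def pivot(tableau, row, col):
    """Divide the pivot row by the pivot element, then subtract suitable
    multiples of it from every other row, so that column `col` becomes
    a unit vector with its 1 in the pivot row."""
    piv = tableau[row][col]
    tableau[row] = [entry / piv for entry in tableau[row]]
    for r, other in enumerate(tableau):
        if r != row and other[col] != 0:
            factor = other[col]
            tableau[r] = [a - factor * p for a, p in zip(other, tableau[row])]

# Tableau (5.65); column order x1, x2, x3, s1, s2, z, right-hand side.
T = [[F(3), F(4), F(2), F(1), F(0), F(0), F(7)],
     [F(1), F(1), F(1), F(0), F(1), F(0), F(2)],
     [F(-8), F(-10), F(-5), F(0), F(0), F(1), F(0)]]

pivot(T, 0, 1)  # x2 enters, s1 leaves: gives (5.66), right-hand sides 1.75, 0.25, 17.5
pivot(T, 1, 0)  # x1 enters, s2 leaves: gives (5.67), right-hand sides 1, 1, 18
```

After the second pivot the last column reads 1, 1, 18, matching the final dictionary (5.64).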

5.12 The Simplex Algorithm: General Description *

This section describes the simplex algorithm in general. We also define in generality
the relevant terms, many of which have already been introduced in the previous
section.
The simplex algorithm applies to an LP (5.31) in equality form: maximise 𝑐⊤𝑥
subject to 𝐴𝑥 = 𝑏, 𝑥 ≥ 0, for given 𝐴 ∈ R𝑚×𝑛 , 𝑏 ∈ R𝑚 , 𝑐 ∈ R𝑛 . We assume that
the 𝑚 rows of the matrix 𝐴 are linearly independent. This is automatically the
case if 𝐴 has been obtained from an LP in inequality form by adding an identity
matrix for the slack variables as in (5.38). In general, if the row vectors of 𝐴 are
linearly dependent, then some row of 𝐴 is a linear combination of the other rows.
The respective equation in 𝐴𝑥 = 𝑏 is then either also the linear combination of the
other equations, that is, it can be omitted, or it contradicts the linear combination
and 𝐴𝑥 = 𝑏 has no solution. Therefore, it is no restriction to assume that 𝐴 has full
row rank 𝑚.
Let 𝐴1 , . . . , 𝐴𝑛 be the 𝑛 columns of 𝐴. A basis (of 𝐴) is an 𝑚-element subset
𝐵 of the column indices {1, . . . , 𝑛} so that the vectors 𝐴 𝑗 for 𝑗 ∈ 𝐵 are linearly
independent (sometimes “basis” also refers to such a set of vectors, that is, to a
basis of the column space of 𝐴). For a basis 𝐵 of 𝐴, a feasible solution 𝑥 to (5.31)
(that is, 𝐴𝑥 = 𝑏 and 𝑥 ≥ 0) where 𝑥 𝑗 > 0 implies 𝑗 ∈ 𝐵 is called a basic feasible
solution. The components 𝑥 𝑗 of 𝑥 for 𝑗 ∈ 𝐵 are called basic variables.
For a basis 𝐵, let 𝑁 = {1, . . . , 𝑛} −𝐵, where 𝑗 ∈ 𝑁 means 𝑥 𝑗 is a nonbasic variable.
Let 𝐴𝐵 denote the 𝑚 × 𝑚 submatrix of 𝐴 that consists of the basic columns 𝐴 𝑗 for
𝑗 ∈ 𝐵, and let 𝐴 𝑁 be the submatrix that consists of the nonbasic columns. Similarly,
let 𝑥 𝐵 and 𝑥 𝑁 be the subvectors of 𝑥 with components 𝑥 𝑗 for 𝑗 ∈ 𝐵 and 𝑗 ∈ 𝑁,
respectively. We write the equations 𝐴𝑥 = 𝑏 in the form 𝐴𝑥 = 𝐴𝐵 𝑥 𝐵 + 𝐴 𝑁 𝑥 𝑁 = 𝑏,
assuming a suitable arrangement of the columns 𝐴 𝑗 into 𝐴𝐵 and 𝐴 𝑁 .
Because the column vectors of 𝐴𝐵 are linearly independent, 𝐴𝐵 is invertible,
and the basic solution 𝑥 to 𝐴𝑥 = 𝐴𝐵 𝑥 𝐵 + 𝐴 𝑁 𝑥 𝑁 = 𝑏 that corresponds to the basis 𝐵
is uniquely given by 𝑥𝑁 = 0 and 𝑥𝐵 = 𝐴𝐵⁻¹𝑏. By definition, this is a basic feasible
solution if it is nonnegative, that is, if 𝑥𝐵 ≥ 0.
A basic feasible solution is uniquely specified by a basis 𝐵. The converse does
not hold because 𝑥 𝐵 may have zero components 𝑥 𝑖 for some 𝑖 ∈ 𝐵. Such a basis 𝐵
and its corresponding basic feasible solution is called degenerate. In that case, 𝐵 can
equally well be replaced by a basis 𝐵 − {𝑖} ∪ {𝑗} (for suitable 𝑗 ∈ 𝑁) that has the
same basic feasible solution 𝑥. This lack of uniqueness requires certain precautions
when defining the simplex algorithm; for the moment, we assume that no feasible
basis is degenerate.
By Lemma 5.9, if the LP (5.31) has a feasible solution, then it also has a basic
feasible solution, because any solution to 𝐴𝑥 = 𝑏 can be iteratively modified until 𝑏
is only a positive linear combination of linearly independent columns of 𝐴. If these
are fewer than 𝑚 columns, they can be extended with suitable further columns 𝐴 𝑗
to form a basis of the column space of 𝐴, with corresponding coefficients 𝑥 𝑗 = 0;
the corresponding basic feasible solution is then degenerate.
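For the small example of Section 5.11, these definitions can be checked exhaustively: the matrix 𝐴 and vector 𝑏 can be read off from (5.65) in the previous section. The following sketch (our own code) enumerates all 2-element column subsets, discards those that are not bases, and solves 𝐴𝐵 𝑥𝐵 = 𝑏 by Cramer's rule to test which bases are feasible:

```python
from fractions import Fraction as F
from itertools import combinations

# Columns of A in the example (5.57), for the variables x1, x2, x3, s1, s2.
A = [[F(3), F(4), F(2), F(1), F(0)],
     [F(1), F(1), F(1), F(0), F(1)]]
b = (F(7), F(2))
names = ["x1", "x2", "x3", "s1", "s2"]

feasible_bases = []
for i, j in combinations(range(5), 2):
    # 2x2 basis matrix with columns A_i and A_j
    a, c = A[0][i], A[0][j]
    d, e = A[1][i], A[1][j]
    det = a * e - c * d
    if det == 0:
        continue                      # columns linearly dependent: not a basis
    # solve A_B x_B = b by Cramer's rule
    xi = (b[0] * e - c * b[1]) / det
    xj = (a * b[1] - d * b[0]) / det
    if xi >= 0 and xj >= 0:           # basic solution is feasible
        feasible_bases.append(((names[i], names[j]), (xi, xj)))
```

Of the ten 2-element subsets, six turn out to be feasible bases; they include the initial basis {𝑠1 , 𝑠2 } and the optimal basis {𝑥1 , 𝑥2 } found in Section 5.11.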
The simplex algorithm works exclusively with basic feasible solutions, which
are iteratively changed to improve the objective function. Thereby, it suffices to
change only one basic variable at a time, which is called pivoting.
Assume that a basic feasible solution to the LP (5.31) has been found. In
general, this requires an initialisation phase of the simplex algorithm, which fails
if the LP is infeasible, a case that is discovered at that point. We will describe this
initialising “first phase” later.
Consider a basic feasible solution with basis 𝐵, and let 𝑁 denote the index set
of the nonbasic columns as above. The following equations are equivalent for any
𝑥 ∈ R𝑛 :

𝐴𝑥 = 𝑏
𝐴𝐵 𝑥𝐵 + 𝐴𝑁 𝑥𝑁 = 𝑏
𝐴𝐵⁻¹𝐴𝐵 𝑥𝐵 + 𝐴𝐵⁻¹𝐴𝑁 𝑥𝑁 = 𝐴𝐵⁻¹𝑏
𝑥𝐵 = 𝐴𝐵⁻¹𝑏 − 𝐴𝐵⁻¹𝐴𝑁 𝑥𝑁                (5.68)
𝑥𝐵 = 𝐴𝐵⁻¹𝑏 − ∑𝑗∈𝑁 𝐴𝐵⁻¹𝐴𝑗 𝑥𝑗
𝑥𝐵 = 𝑏̄ − ∑𝑗∈𝑁 𝐴̄𝑗 𝑥𝑗

with the notation 𝑏̄ = 𝐴𝐵⁻¹𝑏 and 𝐴̄𝑗 = 𝐴𝐵⁻¹𝐴𝑗 for 𝑗 ∈ 𝑁. The last three equations in
(5.68) (which are the same equation in different notation) represent a dictionary as
written above the horizontal line in the examples (5.58), (5.61), (5.64). A dictionary
expresses the basic variables 𝑥 𝐵 in terms of the nonbasic variables 𝑥 𝑁 , and defines
the basic solution for the basis 𝐵 by setting 𝑥𝑁 = 0, which is feasible when 𝑏̄ ≥ 0.
The value 𝑐⊤𝑥 of the objective function is represented as follows. Let 𝑐 𝐵 and 𝑐 𝑁
denote the subvectors of 𝑐 with the components 𝑐 𝑗 for 𝑗 ∈ 𝐵 and 𝑗 ∈ 𝑁, respectively.
Then

𝑐⊤𝑥 = 𝑐𝐵⊤ 𝑥𝐵 + 𝑐𝑁⊤ 𝑥𝑁
    = 𝑐𝐵⊤ (𝐴𝐵⁻¹𝑏 − 𝐴𝐵⁻¹𝐴𝑁 𝑥𝑁 ) + 𝑐𝑁⊤ 𝑥𝑁
    = 𝑐𝐵⊤ 𝐴𝐵⁻¹𝑏 + (𝑐𝑁⊤ − 𝑐𝐵⊤ 𝐴𝐵⁻¹𝐴𝑁 ) 𝑥𝑁                (5.69)
    = 𝑐𝐵⊤ 𝐴𝐵⁻¹𝑏 + ∑𝑗∈𝑁 (𝑐𝑗 − 𝑐𝐵⊤ 𝐴𝐵⁻¹𝐴𝑗 ) 𝑥𝑗

which expresses the objective function 𝑐⊤𝑥 in terms of the nonbasic variables, as
in the equation below the horizontal line in the examples (5.58), (5.61), (5.64). In
(5.69), 𝑐𝐵⊤ 𝐴𝐵⁻¹𝑏 is the value of the objective function for the basic feasible solution
where 𝑥𝑁 = 0. This is an optimal solution if

𝑐𝑗 − 𝑐𝐵⊤ 𝐴𝐵⁻¹𝐴𝑗 ≤ 0 for all 𝑗 ∈ 𝑁 ,                (5.70)

because 𝑥𝑗 ≥ 0 in any feasible solution, so that by (5.69) 𝑐⊤𝑥 is maximal if (5.70)
holds. Condition (5.70) is the criterion for optimality used by the simplex algorithm.
If this condition is fulfilled, we have also obtained an optimal solution to the dual
LP (5.32) of (5.31), namely
𝑦⊤ = 𝑐𝐵⊤ 𝐴𝐵⁻¹,                (5.71)
which is feasible for the dual LP (5.32) because 𝑦⊤𝐴𝐵 = 𝑐𝐵⊤ by (5.71) and 𝑦⊤𝐴𝑗 ≥ 𝑐𝑗
for 𝑗 ∈ 𝑁 by (5.70), that is, 𝑦⊤𝐴𝑁 ≥ 𝑐𝑁⊤ , so altogether 𝑦⊤𝐴 ≥ 𝑐⊤. It is optimal
because 𝑦⊤𝑏 = 𝑐𝐵⊤ 𝐴𝐵⁻¹𝑏 = 𝑐𝐵⊤ 𝑥𝐵 = 𝑐⊤𝑥 when 𝑥𝑁 = 0, that is, dual and primal
objective function have the same value.
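For the example of Section 5.11, this dual certificate can be computed directly. The following sketch (our own code, exact arithmetic with fractions) takes the final basis {𝑥1 , 𝑥2 }, forms 𝑦⊤ = 𝑐𝐵⊤ 𝐴𝐵⁻¹ as in (5.71), and checks dual feasibility (5.70) and the matching objective value 18:

```python
from fractions import Fraction as F

# Example LP of Section 5.11: maximise 8x1 + 10x2 + 5x3 subject to
# 3x1 + 4x2 + 2x3 <= 7 and x1 + x2 + x3 <= 2 (slacks s1, s2), with
# final basis B = {x1, x2}.
A_B = [[F(3), F(4)],
       [F(1), F(1)]]
c_B = (F(8), F(10))
b = (F(7), F(2))

# invert the 2x2 basis matrix
det = A_B[0][0] * A_B[1][1] - A_B[0][1] * A_B[1][0]
A_B_inv = [[ A_B[1][1] / det, -A_B[0][1] / det],
           [-A_B[1][0] / det,  A_B[0][0] / det]]

# dual solution (5.71): y^T = c_B^T A_B^{-1}
y = [c_B[0] * A_B_inv[0][k] + c_B[1] * A_B_inv[1][k] for k in range(2)]
assert y == [2, 2]

# dual feasibility (5.70): c_j - y^T A_j <= 0 for every column A_j
columns = [((F(3), F(1)), F(8)),   # x1
           ((F(4), F(1)), F(10)),  # x2
           ((F(2), F(1)), F(5)),   # x3
           ((F(1), F(0)), F(0)),   # s1
           ((F(0), F(1)), F(0))]   # s2
assert all(cj - (y[0] * aj[0] + y[1] * aj[1]) <= 0 for aj, cj in columns)
assert y[0] * b[0] + y[1] * b[1] == 18   # equals the optimal value z = 18
```

Note that 𝑦 = (2, 2) also appears as the coefficients of the slack variables 𝑠1 , 𝑠2 in the last row of the final tableau (5.67).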
The optimality criterion (5.70) fails if

𝑐𝑗 − 𝑐𝐵⊤ 𝐴𝐵⁻¹𝐴𝑗 > 0 for some 𝑗 ∈ 𝑁 .                (5.72)

In that case, the value of the objective function will be increased if 𝑥 𝑗 can assume a
positive value. The simplex algorithm therefore looks for such a 𝑗 in (5.72) and
makes 𝑥 𝑗 a new basic variable, called the entering variable. The index 𝑗 is said to
enter the basis. This has to be done while preserving feasibility, and so that there
are again 𝑚 basic variables. Thereby, some element 𝑖 of 𝐵 leaves the basis, where 𝑥 𝑖
is called the leaving variable.
To demonstrate this change of basis, consider the last equation in (5.68) that
expresses the variables 𝑥 𝐵 in terms of the nonbasic variables 𝑥 𝑁 . Assume that all
components of 𝑥 𝑁 are kept zero except 𝑥 𝑗 . Then (5.68) has the form

𝑥𝐵 = 𝑏̄ − 𝐴̄𝑗 𝑥𝑗 ,                (5.73)

where by (5.69) and (5.72)

𝑐⊤𝑥 = 𝑐𝐵⊤ 𝑏̄ + (𝑐𝑗 − 𝑐𝐵⊤ 𝐴̄𝑗 ) 𝑥𝑗 with 𝑐𝑗 − 𝑐𝐵⊤ 𝐴̄𝑗 > 0 .                (5.74)

For 𝑥𝑗 = 0, (5.73) represents the current basic feasible solution 𝑥𝐵 = 𝑏̄. How
long does 𝑥𝐵 stay nonnegative if 𝑥𝑗 is gradually increased? If 𝐴̄𝑗 has no positive
components, then 𝑥𝑗 can be made arbitrarily large, in which case, by (5.74), 𝑐⊤𝑥
increases arbitrarily and the LP is unbounded.
Hence, suppose that some components of 𝐴̄𝑗 are positive. It is useful to
consider the 𝑚 rows of the equation (5.73) as numbered with the elements of 𝐵
because the left-hand side of that equation is 𝑥𝐵 (in practice, one would record for
each row 1, . . . , 𝑚 of the dictionary represented by the last equation in (5.68) the
respective element of 𝐵, as for example in (5.64)). That is, let the components of 𝐴̄𝑗
be 𝑎𝑖𝑗 for 𝑖 ∈ 𝐵. At least one of them is positive, and any of these positive elements
imposes an upper bound on the choice of 𝑥𝑗 in (5.73) so that 𝑥𝐵 stays nonnegative,
by the condition 𝑏̄𝑖 − 𝑎𝑖𝑗 𝑥𝑗 ≥ 0 or equivalently 𝑏̄𝑖 /𝑎𝑖𝑗 ≥ 𝑥𝑗 (because 𝑎𝑖𝑗 > 0). This
defines the maximum choice of 𝑥𝑗 by the following so-called minimum ratio test
(which we have encountered in similar form before in (5.24)):

𝑥𝑗 = min { 𝑏̄ℓ /𝑎ℓ𝑗 | 𝑎ℓ𝑗 > 0, ℓ ∈ 𝐵 } = 𝑏̄𝑖 /𝑎𝑖𝑗 for some 𝑖 ∈ 𝐵, 𝑎𝑖𝑗 > 0 .                (5.75)

For at least one 𝑖 ∈ 𝐵, the minimum ratio is achieved as stated in (5.75). The
corresponding variable 𝑥 𝑖 is made the leaving variable and becomes nonbasic.
This defines the pivoting step: The entering variable 𝑥 𝑗 is made basic and the
leaving variable 𝑥 𝑖 is made nonbasic, and the basis 𝐵 is replaced by 𝐵 − {𝑖} ∪ {𝑗}.
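The minimum ratio test and the resulting choice of leaving variable can be written as a small function. This is an illustrative sketch (the function name is our own); it also reports the unbounded case where the entering column has no positive entry:

```python
from fractions import Fraction as F

def min_ratio_leaving(b_bar, a_col, basis):
    """Minimum ratio test (5.75): given b_bar = A_B^{-1} b and the entering
    column a_col = A_B^{-1} A_j (both indexed like `basis`), return the
    leaving basic variable, or None if the LP is unbounded in this direction."""
    candidates = [(b_bar[r] / a_col[r], basis[r])
                  for r in range(len(basis)) if a_col[r] > 0]
    if not candidates:
        return None  # a_col <= 0: x_j can grow without bound
    return min(candidates)[1]  # ties broken here by variable name

# First pivot of the Section 5.11 example: x2 enters, with
# b_bar = (7, 2) and entering column (4, 1) over the basis (s1, s2).
assert min_ratio_leaving([F(7), F(2)], [F(4), F(1)], ["s1", "s2"]) == "s1"
# If the entering column had no positive entry, the LP would be unbounded:
assert min_ratio_leaving([F(7), F(2)], [F(-1), F(0)], ["s1", "s2"]) is None
```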
We show that the column vectors 𝐴𝑘 for 𝑘 ∈ 𝐵 − {𝑖} ∪ {𝑗} are linearly independent.
Consider a linear combination of these vectors that represents the zero vector,
∑𝑘 𝐴𝑘 𝑡𝑘 = 0, which implies ∑𝑘 𝐴𝐵⁻¹𝐴𝑘 𝑡𝑘 = 0. The vectors 𝐴𝐵⁻¹𝐴𝑘 for 𝑘 ∈ 𝐵 − {𝑖}
are unit vectors with zeros in all rows except row 𝑘, so their linear combination has
a zero in row 𝑖. On the other hand, 𝐴𝐵⁻¹𝐴𝑘 for 𝑘 = 𝑗 is the vector 𝐴̄𝑗 which in row 𝑖
has entry 𝑎𝑖𝑗 > 0. This implies 𝑡𝑗 = 0. For all 𝑘 ∈ 𝐵 − {𝑖}, the vectors 𝐴𝑘 are linearly
independent, so that 𝑡𝑘 = 0 for all 𝑘. Thus, 𝐵 − {𝑖} ∪ {𝑗} is indeed a new basis.
We have described an iteration of the simplex algorithm. In summary, it
consists of the following steps.
1. Given a basic feasible solution as in (5.68) with basis 𝐵, choose some entering
variable 𝑥 𝑗 according to (5.72). If no such variable exists, stop: the current
solution is optimal.
2. With 𝑏̄ = 𝐴𝐵⁻¹𝑏 and 𝐴̄𝑗 = 𝐴𝐵⁻¹𝐴𝑗 , determine the maximum value of 𝑥𝑗 so that
𝑏̄ − 𝐴̄𝑗 𝑥𝑗 ≥ 0. If there is no such maximum because 𝐴̄𝑗 ≤ 0, then stop: the LP
is unbounded. Otherwise, set 𝑥𝑗 to the minimum ratio in (5.75).
3. Replace the current basic feasible solution 𝑥𝐵 = 𝑏̄ by 𝑥𝐵 = 𝑏̄ − 𝐴̄𝑗 𝑥𝑗 . At least
one component 𝑥 𝑖 of this vector is zero, which is made the leaving variable and
is replaced by the entering variable 𝑥 𝑗 . Replace the basis 𝐵 by 𝐵 − {𝑖} ∪ {𝑗}.
Go back to Step 1.
A given basis 𝐵 determines a unique basic feasible solution 𝑥 𝐵 . By increasing
the value of the entering variable 𝑥 𝑗 , the feasible solution changes according to
(5.73). During this continuous change, this feasible solution is not a basic solution:
it has 𝑚 + 1 positive variables, namely 𝑥ℓ for ℓ ∈ 𝐵 and 𝑥 𝑗 . Unless the LP is
unbounded, 𝐴̄𝑗 has positive components 𝑎ℓ𝑗 , so the respective variables 𝑥ℓ decrease
while 𝑥 𝑗 increases. The smallest value of 𝑥 𝑗 where at least one component 𝑥 𝑖 of 𝑥 𝐵
becomes zero is given by the minimum ratio in (5.75). At this point, again only 𝑚
(or fewer) variables are nonzero, and the leaving variable 𝑥 𝑖 is replaced by 𝑥 𝑗 so
that indeed a new basic feasible solution and corresponding basis 𝐵 − {𝑖} ∪ {𝑗} is
obtained.
However, the simplex algorithm does not require a continuous change of
the values of 𝑚 + 1 variables. Instead, the value of the entering variable 𝑥𝑗 can
directly “jump” to the minimum ratio in (5.75). What is important is the next
basis. The change of the basis requires an update of the inverse 𝐴𝐵⁻¹ of the basis
matrix in order to obtain the new dictionary in (5.68). (As demonstrated in the
previous section, this update can be implemented by suitable row operations on
the “tableau” representation of the dictionary, which also determines the new
basic feasible solution.) In this view, the simplex algorithm is a combinatorial
method that computes a sequence of bases, which are certain finite subsets of the
set {1, . . . , 𝑛} that represents the columns of the original system 𝐴𝑥 = 𝑏.
We have made an important assumption, namely that no feasible basis is
degenerate, that is, all basic variables in a basic feasible solution have positive
values. This implies 𝑏̄𝑖 > 0 in (5.75), so that the entering variable 𝑥𝑗 takes on a
positive value and the objective function for the basic feasible solution increases
with each iteration by (5.74). Hence, no basis is revisited, and the simplex algorithm
terminates because there are only finitely many bases. Furthermore, Step 3 of the
above summary shows that in the absence of degeneracy the leaving variable 𝑥 𝑖 ,
and thus the minimum in the minimum-ratio test (5.75), is unique, because if two
variables could leave the basis because they become zero at the same time, then
only one of them leaves and the other remains basic but has value zero in the new
basic feasible solution.
If there are degenerate basic feasible solutions, then the minimum in (5.75)
may be zero because 𝑏̄𝑖 = 0 for some 𝑖 where 𝑎𝑖𝑗 > 0. Then the entering variable
𝑥 𝑗 , which was zero as a nonbasic variable, enters the basis but stays at zero in
the new basic feasible solution. In that case, only the basis has changed but not
the feasible solution and also not the value of the objective function. In fact, it is
possible that this results in a cycle of the simplex algorithm (when the same basis
is revisited) and thus a failure to terminate. This behaviour is rare, and degeneracy
is itself an “accident” that only occurs when there are special relationships between
the entries of the constraint matrix 𝐴 and the right-hand side 𝑏. Nevertheless,
degeneracy can be dealt with in a
systematic manner, which we do not treat in this guide. For a detailed treatment
see chapter 3 of Chvátal (1983).

We also need to find an initial feasible solution to start the simplex algorithm.
For that purpose, we use a “first phase” with a different objective function that
establishes whether the LP (5.31) is feasible, similar to the approach in (5.27). First,
choose an arbitrary basis 𝐵 and let 𝑏̄ = 𝐴𝐵⁻¹𝑏. If 𝑏̄ ≥ 0, then 𝑥𝐵 = 𝑏̄ is already a
basic feasible solution and nothing needs to be done. Otherwise, 𝑏̄ has at least one
negative component. Define the 𝑚-vector ℎ = 𝐴𝐵 1 where 1 is the all-one vector.
That is, ℎ is just the sum of the columns of 𝐴𝐵 . We add −ℎ as an extra column to
the system 𝐴𝑥 = 𝑏 with a new variable 𝑡 and consider the following LP:

maximise −𝑡
subject to 𝐴𝑥 − ℎ𝑡 = 𝑏 (5.76)
𝑥, 𝑡 ≥ 0
We find a basic feasible solution to this LP with a single pivoting step from the
(infeasible) basis 𝐵. Namely, the following are equivalent, similar to (5.68):

𝐴𝑥 − ℎ𝑡 = 𝑏
𝐴𝐵 𝑥𝐵 + 𝐴𝑁 𝑥𝑁 − ℎ𝑡 = 𝑏
𝑥𝐵 = 𝐴𝐵⁻¹𝑏 − 𝐴𝐵⁻¹𝐴𝑁 𝑥𝑁 + 𝐴𝐵⁻¹ℎ 𝑡                (5.77)
𝑥𝐵 = 𝑏̄ − 𝐴𝐵⁻¹𝐴𝑁 𝑥𝑁 + 1 𝑡

where we now let 𝑡 enter the basis and increase 𝑡 such that 𝑏̄ + 1 𝑡 ≥ 0. For the
smallest such value of 𝑡, at least one component 𝑥 𝑖 of 𝑥 𝐵 is zero and becomes
the leaving variable. After the pivot with 𝑥 𝑖 leaving and 𝑡 entering the basis one
obtains a basic feasible solution to (5.76).
The LP (5.76) is therefore feasible, and its objective function is bounded from
above by zero. The original system 𝐴𝑥 = 𝑏, 𝑥 ≥ 0 is feasible if and only if the
optimum in (5.76) is zero. Suppose this is the case, which will be found out
by solving the LP (5.76) with the simplex algorithm. Then this “first phase”
terminates with a basic feasible solution to (5.76) where 𝑡 = 0 which is then also a
feasible solution to 𝐴𝑥 = 𝑏, 𝑥 ≥ 0. The simplex algorithm can then proceed with
maximising the original objective function 𝑐⊤𝑥 as described earlier.
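The single pivot that makes the auxiliary LP (5.76) feasible can be sketched as follows (our own code; the data in the usage example is hypothetical). By (5.77), increasing 𝑡 adds 𝑡 to every component of 𝐴𝐵⁻¹𝑏, so the smallest feasible 𝑡 is the negative of the most negative component:

```python
from fractions import Fraction as F

def phase_one_first_pivot(b_bar, basis):
    """First pivot for the auxiliary LP (5.76), assuming min(b_bar) < 0:
    t enters the basis, and by (5.77) the basic variables become
    b_bar + t in every component, so the smallest feasible t is -min(b_bar);
    the basic variable attaining that minimum leaves the basis."""
    worst = min(b_bar)
    t = -worst
    leaving = basis[b_bar.index(worst)]
    new_values = [v + t for v in b_bar]   # all nonnegative, one of them zero
    return t, leaving, new_values

# hypothetical data: basis (s1, s2, s3) with b_bar = (3, -2, -5)
t, leaving, vals = phase_one_first_pivot([F(3), F(-2), F(-5)], ["s1", "s2", "s3"])
assert t == 5 and leaving == "s3" and vals == [8, 3, 0]
```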

5.13 Reminder of Learning Outcomes

After studying this chapter, you should be able to:


• for linear inequalities in dimension two, draw with confidence the correspond-
ing lines in the plane and the halfspace (here half-plane) where that inequality
is valid
• draw the feasible set for an LP with two real variables, and indicate the
direction of the objective function

• state the dual LP of an LP in inequality form, and also later (see Section 5.7)
for an LP in equality form or with unconstrained variables
• state the Lemma of Farkas, and apply it to examples (as in Exercise 5.2)
• understand the differences between feasible, infeasible, and unbounded LPs
and how this relates to the dual LP
• state the complementarity slackness condition and apply it to finding optimal
solutions in small examples
• describe the role of dictionaries for the simplex algorithm
• apply the simplex algorithm to small examples.

5.14 Exercises for Chapter 5

Exercise 5.1.
(a) Draw a picture of the set of points (𝑥1 , 𝑥2 ) in R2 that fulfill the following
inequalities:
𝑥1 ≥ 0
𝑥2 ≥ 0
−𝑥1 − 𝑥2 ≤ −2
−𝑥1 + 𝑥2 ≤ 0
𝑥2 ≤ 3.

(b) For each of the following objective functions 𝑓 , 𝑔, ℎ : R2 → R, find their
maximum subject to the constraints in (a), or explain why a maximum does
not exist:
𝑓 (𝑥1 , 𝑥2 ) = −𝑥1 + 2𝑥2 ,
𝑔(𝑥1 , 𝑥2 ) = −2𝑥 1 + 𝑥 2 ,
ℎ(𝑥1 , 𝑥2 ) = 2𝑥1 + 𝑥2 .
(c) For each of the linear optimisation problems (linear programs) in (b), write
down the corresponding dual linear program. With the help of the LP duality
theorem, find an optimal solution to each dual linear program, or explain why
an optimal solution does not exist.
Exercise 5.2. Consider the matrix 𝐴 and the vectors 𝑏 and 𝑏 ′ defined by
𝐴 = ( 4  −1  0  2 ) ,      𝑏 = ( 1 ) ,      𝑏′ = ( 1 ) .
    ( 1   0  3  2 )            ( 0 )             ( 2 )
Find a nonnegative vector 𝑥 so that 𝐴𝑥 = 𝑏 or prove that no such vector exists;
similarly, find a nonnegative vector 𝑥 ′ so that 𝐴𝑥 ′ = 𝑏 ′ or prove that no such vector
exists. You can use standard results.

Exercise 5.3. Let the vectors 𝐴𝑛 ∈ R2 for 𝑛 = 0, 1, 2, 3, . . . be defined by 𝐴𝑛 = (𝑛, 1).


Let 𝐶 be the cone “generated” by these vectors, that is, the set of their finite
nonnegative linear combinations, according to
𝐶 = { ∑𝑗∈𝐽 𝐴𝑗 𝑥𝑗 | 𝐽 ⊂ {0, 1, 2, . . .}, |𝐽 | < ∞, 𝑥𝑗 ≥ 0 for 𝑗 ∈ 𝐽 } .

Draw a picture of the vectors 𝐴 𝑗 for the first few values of 𝑗. What is the set 𝐶?
Does 𝑏 = (1, 0) belong to 𝐶? Is there a vector 𝑦 ∈ R2 such that 𝑦⊤𝐴 𝑗 ≥ 0 for all 𝑗
and 𝑦⊤𝑏 < 0? Discuss the Lemma of Farkas for this example.

Exercise 5.4. Consider the following linear program: for 𝑥1 , 𝑥2 , 𝑥3 ≥ 0 subject to

𝑥 1 + 𝑥2 + 2𝑥 3 ≤ 4
2𝑥 1 + 𝑥2 + 4𝑥 3 ≤ 7
2𝑥 1 + 4𝑥 3 ≤ 5

maximise 3𝑥 1 + 2𝑥2 + 4𝑥 3 .

(a) Write down the Tucker diagram for this LP. Explain why this LP is feasible
and bounded.
(b) Write down this LP in equality form with slack variables 𝑠1 , 𝑠2 , 𝑠3 .
(c) Apply the simplex algorithm to the LP in (b). Always choose the entering
variable with the largest coefficient in the current objective function.
(d) Verify the optimal solution found in (c) with an optimal dual solution, and
explain why both primal and dual optimal solution are unique, with the help
of the complementary slackness conditions.
(e) Find a set of three columns of the system in (b) that does not form a basis.
