MODULE 4- Dynamic Programming
DYNAMIC PROGRAMMING: Three basic examples, The Knapsack Problem and Memory
Functions, Warshall’s and Floyd’s Algorithms.
Dynamic programming is a technique for solving problems with overlapping subproblems.
Typically, these subproblems arise from a recurrence relating a given problem’s solution to
solutions of its smaller subproblems. Rather than solving overlapping subproblems again and
again, dynamic programming suggests solving each of the smaller subproblems only once and
recording the results in a table from which a solution to the original problem can then be
obtained. [From T1]
Dynamic programming can also be used when the solution to a problem can be viewed
as the result of a sequence of decisions. [From T2]. Here are some examples.
Example 1
Example 2
Example 3
Example 4
(a) Digraph. (b) Its adjacency matrix. (c) Its transitive closure.
We can generate the transitive closure of a digraph with the help of depth-first search or breadth-
first search. Performing either traversal starting at the ith vertex gives the information about the
vertices reachable from it and hence the columns that contain 1’s in the ith row of the transitive
closure. Thus, doing such a traversal for every vertex as a starting point yields the transitive
closure in its entirety.
Since this method traverses the same digraph several times, we can use a better algorithm called
Warshall’s algorithm. Warshall’s algorithm constructs the transitive closure through a series
of n × n boolean matrices R(0), . . . , R(k−1), R(k), . . . , R(n).
Each of these matrices provides certain information about directed paths in the digraph.
Specifically, the element r_ij(k) in the ith row and jth column of matrix R(k) (i, j = 1, 2, . . . , n, k
= 0, 1, . . . , n) is equal to 1 if and only if there exists a directed path of a positive length from
the ith vertex to the jth vertex with each intermediate vertex, if any, numbered not higher than
k.
Thus, the series starts with R(0) , which does not allow any intermediate vertices in its paths;
hence, R(0) is nothing other than the adjacency matrix of the digraph. R(1) contains the
information about paths that can use the first vertex as intermediate. The last matrix in the
series, R(n) , reflects paths that can use all n vertices of the digraph as intermediate and hence
is nothing other than the digraph’s transitive closure.
This means that there exists a path from the ith vertex vi to the jth vertex vj with each
intermediate vertex numbered not higher than k:
vi, a list of intermediate vertices each numbered not higher than k, vj . --- (*)
Two situations regarding this path are possible.
1. In the first, the list of its intermediate vertices does not contain the kth vertex. Then this
path from vi to vj has intermediate vertices numbered not higher than k − 1, i.e., r_ij(k−1) = 1.
2. The second possibility is that path (*) does contain the kth vertex vk among the
intermediate vertices. Then path (*) can be rewritten as:
vi, vertices numbered ≤ k − 1, vk, vertices numbered ≤ k − 1, vj .
i.e., r_ik(k−1) = 1 and r_kj(k−1) = 1.
Thus, we have the following formula for generating the elements of matrix R(k) from the
elements of matrix R(k−1):
r_ij(k) = r_ij(k−1) or (r_ik(k−1) and r_kj(k−1)).
As an example, the application of Warshall’s algorithm to the digraph is shown below. New
1’s are in bold.
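The recurrence above translates directly into a triple loop over k, i, and j. Below is a minimal Python sketch, assuming the digraph is supplied as an n × n adjacency matrix of 0/1 values (the function name and the sample digraph are illustrative, not taken from the figure).

def warshall(adjacency):
    n = len(adjacency)
    r = [row[:] for row in adjacency]        # R(0) is the adjacency matrix itself
    for k in range(n):                       # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                # r[i][j] stays 1, or becomes 1 if i reaches k and k reaches j
                r[i][j] = r[i][j] or (r[i][k] and r[k][j])
    return r

# Example call on a small 4-vertex digraph:
# print(warshall([[0, 1, 0, 0],
#                 [0, 0, 0, 1],
#                 [0, 0, 0, 0],
#                 [1, 0, 1, 0]]))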
Analysis
Its time efficiency is Θ(n³). We can make the algorithm run faster by treating matrix rows
as bit strings and employing the bitwise or operation available in most modern computer
languages.
Space efficiency: although the algorithm as stated uses separate matrices for recording the
intermediate results, this can be avoided by performing the updates in a single matrix.
(a) Digraph. (b) Its weight matrix. (c) Its distance matrix
We can generate the distance matrix with an algorithm that is very similar to Warshall’s
algorithm. It is called Floyd’s algorithm.
Floyd’s algorithm computes the distance matrix of a weighted graph with n vertices through a series
of n × n matrices D(0), . . . , D(k−1), D(k), . . . , D(n).
The element d_ij(k) in the ith row and jth column of matrix D(k) (i, j = 1, 2, . . . , n, k = 0, 1, . . . , n)
is equal to the length of the shortest path among all paths from the ith vertex to the jth
vertex with each intermediate vertex, if any, numbered not higher than k.
As in Warshall’s algorithm, we can compute all the elements of each matrix D(k) from its
immediate predecessor D(k−1)
If d_ij(k) is finite, it means that there is a path from the ith vertex vi to the jth vertex vj with
each intermediate vertex numbered not higher than k:
vi, a list of intermediate vertices each numbered not higher than k, vj .
The situation is depicted symbolically in Figure, which shows the underlying idea of Floyd’s
algorithm.
Taking into account the lengths of the shortest paths in both subsets leads to the following
recurrence:
d_ij(k) = min { d_ij(k−1), d_ik(k−1) + d_kj(k−1) } for k ≥ 1, with d_ij(0) = w_ij.
Application of Floyd’s algorithm to the digraph is shown below. Updated elements are shown
in bold.
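Below is a minimal Python sketch of Floyd’s algorithm based on this recurrence, assuming the weight matrix uses float('inf') for missing edges and 0 on the diagonal (the function name and the sample matrix are illustrative).

def floyd(weights):
    n = len(weights)
    d = [row[:] for row in weights]          # D(0) is the weight matrix itself
    for k in range(n):                       # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                # keep the old distance or go through vertex k, whichever is shorter
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# INF = float('inf')
# print(floyd([[0,   INF, 3,   INF],
#              [2,   0,   INF, INF],
#              [INF, 7,   0,   1],
#              [6,   INF, INF, 0]]))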
Knapsack problem
We start this section with designing a dynamic programming algorithm for the knapsack
problem: given n items of known weights w1, . . . , wn and values v1, . . . , vn and a knapsack
of capacity W, find the most valuable subset of the items that fit into the knapsack.
To design a dynamic programming algorithm, we need to derive a recurrence relation that
expresses a solution to an instance of the knapsack problem in terms of solutions to its
smaller sub-instances.
Let us consider an instance defined by the first i items, 1 ≤ i ≤ n, with weights w1, . . . , wi,
values v1, . . . , vi , and knapsack capacity j, 1 ≤ j ≤ W. Let F(i, j) be the value of an optimal
solution to this instance. We can divide all the subsets of the first i items that fit the knapsack
of capacity j into two categories: those that do not include the ith item and those that do. Note
the following:
i. Among the subsets that do not include the ith item, the value of an optimal subset is, by
definition, F(i − 1, j).
ii. Among the subsets that do include the ith item (hence, j − wi ≥ 0), an optimal subset is
made up of this item and an optimal subset of the first i−1 items that fits into the
knapsack of capacity j − wi . The value of such an optimal subset is vi + F(i − 1, j − wi).
Thus, the value of an optimal solution among all feasible subsets of the first i items is the
maximum of these two values:
F(i, j) = max { F(i − 1, j), vi + F(i − 1, j − wi) } if j − wi ≥ 0, and F(i, j) = F(i − 1, j) if j − wi < 0.
The initial conditions are F(0, j) = 0 for j ≥ 0 and F(i, 0) = 0 for i ≥ 0.
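A minimal bottom-up Python sketch of this recurrence is given below (the parameter names and the sample instance are illustrative, not the ones from the omitted figure).

def knapsack(weights, values, capacity):
    n = len(weights)
    # F[i][j] = best value achievable using the first i items with capacity j
    F = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, capacity + 1):
            if weights[i - 1] <= j:          # item i can fit: take the better option
                F[i][j] = max(F[i - 1][j],
                              values[i - 1] + F[i - 1][j - weights[i - 1]])
            else:                            # item i cannot fit
                F[i][j] = F[i - 1][j]
    return F[n][capacity]

# print(knapsack([2, 1, 3, 2], [12, 10, 20, 15], 5))   # -> 37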
Analysis
The time efficiency and space efficiency of this algorithm are both in Θ(nW). The time
needed to find the composition of an optimal solution is in O(n).
Memory Functions
The direct top-down approach to finding a solution to such a recurrence leads to an algorithm
that solves common subproblems more than once and hence is very inefficient.
The classic dynamic programming approach, on the other hand, works bottom up: it fills a table
with solutions to all smaller subproblems, but each of them is solved only once. An unsatisfying
aspect of this approach is that solutions to some of these smaller subproblems are often not
necessary for getting a solution to the problem given. Since this drawback is not present in the
top-down approach, it is natural to try to combine the strengths of the top-downand bottom-up
approaches. The goal is to get a method that solves only subproblems that are
necessary and does so only once. Such a method exists; it is based on using memory
functions.
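A minimal Python sketch of the memory function idea applied to the knapsack recurrence is shown below: the table is created up front but filled lazily, so an entry is computed only the first time it is needed (names and the sample call are illustrative).

def mf_knapsack(weights, values, capacity):
    n = len(weights)
    # -1 marks entries not yet computed; row 0 and column 0 are 0 by definition
    V = [[0 if i == 0 or j == 0 else -1 for j in range(capacity + 1)]
         for i in range(n + 1)]

    def solve(i, j):
        if V[i][j] < 0:                      # compute only if not already known
            if j < weights[i - 1]:
                value = solve(i - 1, j)
            else:
                value = max(solve(i - 1, j),
                            values[i - 1] + solve(i - 1, j - weights[i - 1]))
            V[i][j] = value
        return V[i][j]

    return solve(n, capacity)

# print(mf_knapsack([2, 1, 3, 2], [12, 10, 20, 15], 5))   # -> 37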
Example-2 Let us apply the memory function method to the instance considered in Example
1. The table in Figure given below gives the results. Only 11 out of 20 nontrivial values (i.e., not
those in row 0 or in column 0) have been computed. Just one nontrivial entry, V (1, 2), is retrieved
rather than being recomputed. For larger instances, the proportion of such entries can be
significantly larger.
Figure: Example of solving an instance of the knapsack problem by the memory function
algorithm
GREEDY METHOD
THE GREEDY METHOD: Prim’s Algorithm, Kruskal’s Algorithm, Dijkstra’s Algorithm, Huffman
Trees and Codes.
Chapter 9 (Sections 9.1,9.2,9.3,9.4)
The approach applied in the opening paragraph to the change-making problem is called greedy.
Computer scientists consider it a general design technique despite the fact that it is applicable to
optimization problems only. The greedy approach suggests constructing a solution through a
sequence of steps, each expanding a partially constructed solution obtained so far, until a complete
solution to the problem is reached. On each step—and this is the central point of this technique—
the choice made must be:
feasible, i.e., it has to satisfy the problem’s constraints
locally optimal, i.e., it has to be the best local choice among all feasible choices available on that
step
irrevocable, i.e., once made, it cannot be changed on subsequent steps of the algorithm
These requirements explain the technique’s name: on each step, it suggests a “greedy” grab of the best
alternative available in the hope that a sequence of locally optimal choices will yield a (globally)
optimal solution to the entire problem.
The greedy method constructs an algorithm that works in stages, considering one input at a
time. At each stage a decision is made regarding whether a particular input is part of an optimal solution.
If the inclusion of the next input into the partially constructed solution would result in an infeasible
solution, then that input is not added to the partial solution; otherwise it is added. The selection
procedure is itself based on some optimization measure; this measure may be the objective function of the
problem.
Different optimization measures may be possible for a given problem; most of them will generate
algorithms that produce suboptimal solutions. This version of the greedy technique is called the subset
paradigm.
General Method:
Algorithm Greedy(a, n)
// a[1:n] contains the n inputs
{
    solution = ∅;
    for i = 1 to n do
    {
        x = Select(a);
        if Feasible(solution, x) then
            solution = Union(solution, x);
    }
    return solution;
}
The above algorithm shows the Greedy method control abstraction for the subset paradigm.
The algorithm works as follows:
The function Select selects an input from a[ ] and removes it. The selected input’s value is
assigned to x.
Feasible is a Boolean-valued function that determines whether x can be included into the solution
subset.
The function Union combines x with the solution and updates the objective function.
For problems that do not call for the selection of an optimal subset, the greedy method makes
decisions by considering the inputs in some order. Each decision is made using an optimization
criterion that can be computed using the decisions already made. This version of the greedy approach is called
the ordering paradigm.
The subset paradigm problems are:
Knapsack problem
Job sequencing with deadlines
Minimum cost spanning trees:
o Prim’s algorithm & Kruskal’s algorithm
Example (ordering paradigm): consider assigning the following seven tasks to machines, where each
machine can work on only one task at a time and a task occupies its machine from its start time to its finish time.
Task    A   B   C   D   E   F   G
Start   0   3   4   9   7   1   6
Finish  2   7   7   11  10  5   8
Assigning each task to a different machine is a feasible assignment that uses seven machines, one
machine per task. But this assignment is not optimal because other assignments
use fewer machines. For example, tasks A, B, and D can be assigned to the same machine, which
reduces the number of machines required.
The greedy technique to obtain an optimal task assignment is to assign the tasks in stages, one task
per stage, in nondecreasing order of task start times. A machine is called old if at least one task
has already been assigned to it; otherwise the machine is new.
In the above example the tasks in nondecreasing order of start time are A, F, B, C, G, E,
D. The greedy algorithm assigns tasks to machines in this order. The algorithm has n = 7 stages, and in
each stage one task is assigned to a machine, as shown in the figure.
In stage 1 there is no old machine, so task A is assigned to a new machine, say M1. This machine
is busy from time 0 to 2.
In stage 2 task F is considered; since the old machine M1 is busy, F is assigned to a new machine,
say M2.
In stage 3 task B is considered; since its start time is 3 and M1 is free by then, B is assigned to the old
machine M1.
In stage 4 task C is considered; its start time is 4, and as no old machine is available,
this task is assigned to a new machine, say M3.
In stage 5 task G is considered; its start time is 6, and it is assigned to M2, which is
available.
In stage 6 task E is considered; its start time is 7, and since both M1 and M2 are available, it can
be assigned to either; we assume it is assigned to M1.
In stage 7 the last task D is considered; its start time is 9, and since both M2 and M3 are available, it can be
assigned to either; we assume it is assigned to M3.
The above steps are shown in the figure below. Thus the greedy algorithm schedules the tasks
of this instance using only three machines.
Schedule produced by the greedy algorithm (time axis 0 to 11):
M1: A [0, 2], B [3, 7], E [7, 10]
M2: F [1, 5], G [6, 8]
M3: C [4, 7], D [9, 11]
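A minimal Python sketch of this greedy assignment is given below, assuming each task is a (name, start, finish) triple. When several old machines are free, the sketch simply reuses the lowest-numbered one, so task D ends up on M2 instead of M3; this is an equally valid choice and still uses three machines (names and representation are illustrative).

def assign_machines(tasks):                  # tasks: list of (name, start, finish)
    machines = []                            # finish time of the last task on each machine
    assignment = {}
    for name, start, finish in sorted(tasks, key=lambda t: t[1]):
        for m, busy_until in enumerate(machines):
            if busy_until <= start:          # an old machine is free for this task
                machines[m] = finish
                assignment[name] = m + 1
                break
        else:                                # no old machine is free: open a new one
            machines.append(finish)
            assignment[name] = len(machines)
    return assignment, len(machines)

tasks = [('A', 0, 2), ('B', 3, 7), ('C', 4, 7), ('D', 9, 11),
         ('E', 7, 10), ('F', 1, 5), ('G', 6, 8)]
# print(assign_machines(tasks))              # -> assignment, 3 machines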
In the (fractional) knapsack problem we are given n objects with profits p1, . . . , pn and weights
w1, . . . , wn, and a knapsack of capacity m; a fraction xi, 0 ≤ xi ≤ 1, of each object may be placed
in the knapsack. The objective is to
Maximize ∑(1≤i≤n) pi xi        (Eqn 1)
subject to ∑(1≤i≤n) wi xi ≤ m        (Eqn 2)
and 0 ≤ xi ≤ 1, 1 ≤ i ≤ n        (Eqn 3)
A feasible solution is any set {x1, x2, . . . , xn} satisfying Eqn 2 and Eqn 3. An optimal solution is a feasible
solution for which Eqn 1 is maximized.
Example:
Consider the following instance of the knapsack problem: n = 3, m = 20, (p1, p2, p3) = (25, 24, 15),
and (w1, w2, w3) = (18, 15, 10). Find the optimal solution.
Solution:
Weights: (w1, w2, w3) = (18, 15, 10); Profits: (p1, p2, p3) = (25, 24, 15).
Profit/weight ratios: p1/w1 = 25/18 ≈ 1.39, p2/w2 = 24/15 = 1.6, p3/w3 = 15/10 = 1.5.
The greedy method considers the objects in nonincreasing order of profit/weight ratio: object 2, then
object 3, then object 1. Object 2 is taken completely (x2 = 1, using 15 units of capacity), then half of
object 3 fits into the remaining 5 units (x3 = 5/10 = 0.5), and object 1 is not taken (x1 = 0).
∴ The optimal solution is (x1, x2, x3) = (0, 1, 0.5), with total profit = 24 + 15 × 0.5 = 31.5.
Algorithm GreedyKnapsack(m,n)
//Greedy algorithm for Knapsack problem.
//I/P: P[1:n] & W[1:n] contains the profits & weights of n items respectively & m is the
//capacity of the knapsack.
//O/P: X[1:n] represents the solution vector.
// The items are assumed to be ordered so that P[i]/W[i] ≥ P[i+1]/W[i+1].
{
    for i=1 to n do
        X[i]=0.0;                      // initialize the solution vector
    U=m;                               // U is the remaining knapsack capacity
    for i=1 to n do
    {
        if (W[i] > U) then break;      // item i no longer fits as a whole
        X[i]=1.0;                      // take the whole item
        U=U-W[i];
    }
    if (i ≤ n) then X[i]=U/W[i];       // take the fitting fraction of item i
}
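A runnable Python sketch of the same greedy strategy is given below; unlike GreedyKnapsack above, it sorts the items by profit/weight ratio itself rather than assuming they are already ordered (names are illustrative).

def greedy_knapsack(profits, weights, m):
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    x = [0.0] * len(profits)
    remaining = m
    for i in order:
        if weights[i] <= remaining:          # the whole item fits
            x[i] = 1.0
            remaining -= weights[i]
        else:                                # take only the fitting fraction and stop
            x[i] = remaining / weights[i]
            break
    total = sum(p * xi for p, xi in zip(profits, x))
    return x, total

# print(greedy_knapsack([25, 24, 15], [18, 15, 10], 20))
# -> ([0.0, 1.0, 0.5], 31.5)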
Theorem:
If p1/w1 ≥ p2/w2 ≥ . . . ≥ pn/wn, then GreedyKnapsack generates an optimal solution to the given
instance of the knapsack problem.
ii. If k=j, then since ∑wixi=m and yi=xi for 1≤i<j, it follows that either yk<xk or
∑wiyi>m.
Now suppose we increase yk to xk and decrease as many of (yk+1, yk+2, . . . , yn) as necessary so that the
total capacity used is still m. This results in a new solution z = (z1, z2, . . . , zn) with zi = xi for
1 ≤ i ≤ k, and ∑(k<i≤n) wi(yi − zi) = wk(zk − yk). Then for z we have:
∑(1≤i≤n) pi zi = ∑(1≤i≤n) pi yi + (zk − yk) wk (pk/wk) − ∑(k<i≤n) (yi − zi) wi (pi/wi)
              ≥ ∑(1≤i≤n) pi yi + [ (zk − yk) wk − ∑(k<i≤n) (yi − zi) wi ] (pk/wk)
              = ∑(1≤i≤n) pi yi
If ∑ pi zi > ∑ pi yi, then y could not have been an optimal solution. If these sums are equal, then
either z = x and x is optimal, or z ≠ x. If z ≠ x, repeated use of the above argument will either
show that y is not optimal or transform y into x, thus showing that x is also optimal.
Applications of spanning trees:
They can be used to obtain an independent set of circuit equations for an electric network.
They can be used to check the cyclicity & connectivity property of the graph.
Network design
Telephone, electrical, hydraulic, TV cable, computer, road
Approximation algorithms for NP-hard problems.
Traveling salesperson problem, Steiner tree
Cluster analysis.
Reducing data storage in sequencing amino acids in a protein
Learning salient features for real-time face verification
Auto config protocol for Ethernet bridging to avoid cycles in a network, etc
The minimum spanning tree of a given weighted connected graph is a connected acyclic subgraph
(i.e., a tree) that contains all the vertices of the graph and whose sum of edge weights is minimum.
or
Let G = (V, E) be an undirected connected graph. A subgraph t = (V, E’) of G is a spanning tree of G
if and only if t is a tree; a minimum spanning tree is a spanning tree for which the sum of the edge
weights is minimum.
There are two algorithms to find the minimum spanning tree of the given graph
i. Prim’s Algorithm
ii. Kruskal’s Algorithm
Prim’s algorithm constructs a minimum spanning tree through a sequence of expanding subtrees. The
initial subtree in such a sequence consists of a single vertex selected arbitrarily from the set V of the
graph’s vertices.
On each iteration, the algorithm expands the current tree in the greedy manner by simply attaching to it
the nearest vertex not in that tree. (By the nearest vertex, we mean a vertex not in the tree connected to
a vertex in the tree by an edge of the smallest weight. Ties can be broken arbitrarily.)
The algorithm stops after all the graph’s vertices have been included in the tree being constructed. Since
the algorithm expands a tree by exactly one vertex on each of its iterations, the total number of such
iterations is n − 1, where n is the number of vertices in the graph. The tree generated by the algorithm
is obtained as the set of edges used for the tree expansions.
The nature of Prim’s algorithm makes it necessary to provide each vertex not in the current tree with
the information about the shortest edge connecting the vertex to a tree vertex. We can provide such
information by attaching two labels to a vertex: the name of the nearest tree vertex and the length (the
weight) of the corresponding edge.
Vertices that are not adjacent to any of the tree vertices can be given the ∞ label indicating their “infinite”
distance to the tree vertices and a null label for the name of the nearest tree vertex. (Alternatively, we
can split the vertices that are not in the tree into two sets, the “fringe” and the “unseen.”)
The fringe contains only the vertices that are not in the tree but are adjacent to at least one tree vertex.
These are the candidates from which the next tree vertex is selected. The unseen vertices are all the other
vertices of the graph, called “unseen” because they are yet to be affected by the algorithm. With such
labels, finding the next vertex to be added to the current tree T =(VT, ET) becomes a simple task of
finding a vertex with the smallest distance label in the set V − VT .
The time efficiency of Prim’s algorithm depends on the data structures chosen for the graph itself and for the
priority queue of the set V − VT, whose vertex priorities are the distances to the nearest tree vertices. In general,
the graph is represented by its weight matrix or by adjacency lists.
If the graph is represented by its weight matrix and the priority queue is implemented as an unordered array,
the algorithm’s running time will be in Θ(|V|²).
If the priority queue is implemented as a min-heap, each deletion of the smallest element and each priority
update takes O(log |V|) time.
If the graph is represented by adjacency lists and the priority queue is implemented as a min-heap, the
algorithm’s running time will be in O(|E| log |V|). This is because the algorithm performs |V| − 1
deletions of the smallest element and makes |E| verifications and, possibly, changes of an element’s priority
in the min-heap.
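A minimal Python sketch of Prim’s algorithm using adjacency lists and the standard-library heapq module as the priority queue is shown below; stale heap entries are skipped rather than having their priorities decreased in place (graph representation and names are illustrative).

import heapq

def prim(graph, start):                      # graph: {vertex: [(weight, neighbor), ...]}
    in_tree = {start}
    tree_edges = []
    fringe = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(fringe)
    while fringe and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(fringe)      # smallest-weight edge to a fringe vertex
        if v in in_tree:
            continue                         # stale entry: v was already added
        in_tree.add(v)
        tree_edges.append((u, v, w))
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(fringe, (w2, v, x))
    return tree_edges

# graph = {'a': [(3, 'b'), (5, 'd')], 'b': [(3, 'a'), (1, 'c'), (4, 'd')],
#          'c': [(1, 'b'), (6, 'd')], 'd': [(5, 'a'), (4, 'b'), (6, 'c')]}
# print(prim(graph, 'a'))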
Example:
There is another greedy algorithm for the minimum spanning tree problem that also always yields
an optimal solution. It is named Kruskal’s algorithm after Joseph Kruskal, who discovered this algorithm
when he was a second-year graduate student [Kru56].
Kruskal’s algorithm looks at a minimum spanning tree of a weighted connected graph G = (V, E) as
an acyclic subgraph with |V| − 1 edges for which the sum of the edge weights is the smallest. (It is not
difficult to prove that such a subgraph must be a tree.) Consequently, the algorithm constructs a minimum
spanning tree as an expanding sequence of subgraphs that are always acyclic but are not necessarily
connected on the intermediate stages of the algorithm. The algorithm begins by sorting the graph’s
edges in nondecreasing order of their weights. Then, starting with the empty subgraph, it scans this
sorted list, adding the next edge on the list to the current subgraph if such an inclusion does not
create a cycle and simply skipping the edge otherwise.
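A minimal Python sketch of Kruskal’s algorithm is shown below; the cycle check is performed with a simple union-find (disjoint-set) structure (the edge-list representation and names are illustrative).

def kruskal(vertices, edges):                # edges: list of (weight, u, v)
    parent = {v: v for v in vertices}

    def find(v):                             # root of v's component, with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):            # nondecreasing order of edge weight
        ru, rv = find(u), find(v)
        if ru != rv:                         # accepting this edge creates no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# print(kruskal(['a', 'b', 'c', 'd'],
#               [(3, 'a', 'b'), (5, 'a', 'd'), (1, 'b', 'c'),
#                (4, 'b', 'd'), (6, 'c', 'd')]))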
EXAMPLE:
A variety of practical applications of the shortest-paths problem have made the problem a very popular
object of study. The obvious but probably most widely used applications are Transportation planning and
packet routing in communication networks, including the Internet. Multitudes of less obvious applications
include finding shortest paths in social networks, speech recognition, document formatting, robotics,
compilers, and airline crew scheduling. In the world of entertainment, one can mention path finding in
video games and finding best solutions to puzzles using their state-space graphs.
Dijkstra’s algorithm finds the shortest paths to a graph’s vertices in order of their distance from a given
source. First, it finds the shortest path from the source to a vertex nearest to it, then to a second nearest, and
so on. In general, before its ith iteration commences, the algorithm has already identified the shortest paths
to i − 1 other vertices nearest to the source. These vertices, the source, and the edges of the shortest paths
leading to them from the source form a subtree Ti of the given graph. Since all the edge weights are
nonnegative, the next vertex nearest to the source can be found among the vertices adjacent to the vertices
of Ti . The set of vertices adjacent to the vertices in Ti can be referred to as “fringe vertices”; they are the
candidates from which Dijkstra’s algorithm selects the next vertex nearest to the source.
To identify the ith nearest vertex, the algorithm computes, for every fringe vertex u, the sum of the
distance to the nearest tree vertex v (given by the weight of the edge (v, u)) and the length dv of the shortest
path from the source to v (previously determined by the algorithm) and then selects the vertex with the
smallest such sum.
To facilitate the algorithm’s operations, we label each vertex with two labels. The numeric label d
indicates the length of the shortest path from the source to this vertex found by the algorithm so far; when
a vertex is added to the tree, d indicates the length of the shortest path from the source to that vertex. The
other label indicates the name of the next-to-last vertex on such a path, i.e., the parent of the vertex in the
tree being constructed. With such labeling, finding the next nearest vertex u∗ becomes a simple task of
finding a fringe vertex with the smallest d value. Ties can be broken arbitrarily.
After we have identified a vertex u∗ to be added to the tree, we need to perform two operations:
Move u∗ from the fringe to the set of tree vertices.
For each remaining fringe vertex u that is connected to u∗ by an edge of weight w(u∗, u) such that
du∗ + w(u∗, u) < du, update the labels of u by u∗ and du∗ + w(u∗, u), respectively.
Time complexity: For graphs represented by their adjacency lists and the priority queue implemented as a
min-heap, it is in O(|E| log |V |).
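A minimal Python sketch of Dijkstra’s algorithm with adjacency lists and a heapq-based priority queue, in line with the O(|E| log |V|) bound above, is shown here (graph representation and names are illustrative).

import heapq

def dijkstra(graph, source):                 # graph: {u: [(weight, v), ...]}
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                         # stale entry
        for w, v in graph[u]:
            nd = d + w                       # candidate distance to v via u
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

# graph = {'a': [(3, 'b'), (7, 'd')], 'b': [(4, 'c'), (2, 'd')],
#          'c': [], 'd': [(5, 'c')]}
# print(dijkstra(graph, 'a'))                # shortest distances: a 0, b 3, d 5, c 7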
Introduction
Codeword: Suppose we have to encode a text that comprises symbols from some n-symbol alphabet by
assigning to each of the text’s symbols some sequence of bits called the codeword.
There are two types of encoding
1. Fixed-length encoding: a fixed-length encoding assigns to each symbol a bit string of the same
length m, where m ≥ ⌈log2 n⌉.
2. Variable length encoding: Variable-length encoding, which assigns codeword of different lengths to
different symbols, introduces a problem that fixed-length encoding does not have. To get a coding scheme
that yields a shorter bit string on the average is based on the old idea of assigning shorter code words to
more frequent symbols and longer code words to less frequent symbols. This idea was used, in
particular, in the telegraph code invented in the mid-19th century by Samuel Morse. In that code, frequent
letters such as e (.) and a (.−) are assigned short sequences of dots and dashes, while infrequent letters such
as q (−−.−) and z (−−..) are assigned longer ones.
Among the many trees that can be constructed in this manner for a given alphabet with known
frequencies of the symbol occurrences, how can we construct a tree that would assign shorter bit strings
to high-frequency symbols and longer ones to low-frequency symbols? It can be done by the following
greedy algorithm, invented by David Huffman .
Huffman’s algorithm
Step 1 Initialize n one-node trees and label them with the symbols of the alphabet given. Record the
frequency of each symbol in its tree’s root to indicate the tree’s weight. (More generally, the weight of a
tree will be equal to the sum of the frequencies in the tree’s leaves.)
Step 2 Repeat the following operation until a single tree is obtained. Find two trees with the smallest
weight (ties can be broken arbitrarily, but see Problem 2 in this section’s exercises). Make them the left
and right subtree of a new tree and record the sum of their weights in the root of the new tree as its weight.
A tree constructed by the above algorithm is called a Huffman tree. It defines—in the manner described
above—a Huffman code.
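A minimal Python sketch of Huffman’s algorithm is given below: the priority queue holds (weight, tiebreaker, partial code table) entries, and the two smallest-weight trees are repeatedly merged until one tree remains (the symbol frequencies in the sample call are illustrative).

import heapq

def huffman_codes(frequencies):              # frequencies: {symbol: weight}
    heap = [(w, i, {s: ''}) for i, (s, w) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # single-symbol edge case
        return {s: '0' for s in frequencies}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two trees with the smallest weights
        w2, _, right = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in left.items()}         # left subtree gets 0
        merged.update({s: '1' + c for s, c in right.items()})  # right subtree gets 1
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]                        # mapping: symbol -> codeword

# print(huffman_codes({'A': 0.35, 'B': 0.1, 'C': 0.2, 'D': 0.2, '_': 0.15}))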
EXAMPLE Consider the five-symbol alphabet {A, B, C, D, _} with the following occurrence
frequencies in a text made up of these symbols:
The Huffman tree construction for this input is shown in Figure 9.12
Had we used a fixed-length encoding for the same alphabet, we would have to use at least 3 bits per
symbol. Thus, for this toy example, Huffman’s code achieves a compression ratio (a standard measure
of a compression algorithm’s effectiveness) of (3 − 2.25)/3 · 100% = 25%.
Dynamic Huffman encoding: the coding tree is updated each time a new symbol is read from
the source text. Further, modern alternatives such as the Lempel–Ziv algorithms assign codewords
not to individual symbols but to strings of symbols.
This may not be the case for arbitrary pi ’s, however. For example, if n = 4 and p1 = 0.1, p2 = 0.2,
p3 = 0.3, and p4 = 0.4, the minimum weighted path tree is the rightmost one in Figure 9.13. Thus, we
need Huffman’s algorithm to solve this problem in its general case. Note that this is the second time we
are encountering the problem of constructing an optimal binary tree. In Section 8.3, we discussed the
problem of constructing an optimal binary search tree with positive numbers (the search probabilities)
assigned to every node of the tree. In this section, given numbers are assigned just to leaves. The latter
problem turns out to be easier: it can be solved by the greedy algorithm, whereas the former is solved by
the more complicated dynamic programming algorithm.