Design Analysis and Algorithm MCA-AI
LECTURE NOTES ON DESIGN AND ANALYSIS OF ALGORITHMS
MODULE – I
Lecture 1 - Introduction to Design and analysis of algorithms
Lecture 2 - Growth of Functions ( Asymptotic notations)
Lecture 3 - Recurrences, Solution of Recurrences by substitution
Lecture 4 - Recursion tree method
Lecture 5 - Master Method
Lecture 6 - Worst case analysis of merge sort, quick sort and binary search
Lecture 7 - Design and analysis of Divide and Conquer Algorithms
Lecture 8 - Heaps and Heap sort
Lecture 9 - Priority Queue
Lecture 10 - Lower Bounds for Sorting
MODULE-II
Lecture 11 - Dynamic Programming algorithms
Lecture 12 - Matrix Chain Multiplication
Lecture 13 - Elements of Dynamic Programming
Lecture 14 - Longest Common Subsequence
Lecture 15 - Greedy Algorithms
Lecture 16 - Activity Selection Problem
Lecture 17 - Elements of Greedy Strategy
Lecture 18-19 - Fractional Knapsack Problem
Lecture 20 - Huffman Codes
Lecture 21 - Disjoint Set Data Structure
Lecture 22 - Disjoint Set Operations, Linked list Representation
Lecture 23 - Disjoint Forests
MODULE – III
Lecture 24 - Graph Algorithm - BFS and DFS
Lecture 25 - Minimum Spanning Trees
Lecture 26 - Kruskal algorithm
Lecture 27 - Prim's Algorithm
Lecture 28-30 - Single Source Shortest Paths, Dijkstra's Algorithm
Lecture 31 - All Pairs Shortest Paths Algorithms
Lecture 32 - Backtracking And Branch And Bound
Lecture 33 - Fourier transforms and Rabin-Karp Algorithm
Lecture 34 - NP-Hard and NP-Complete Problems
Lecture 35 - Approximation Algorithms(Vertex-Cover Problem)
Lecture 36 - NP-Complete Problems (without proofs)
Lecture 37 - Traveling Salesman Problem
MODULE – I
MOTIVATION
Advances in science and technology enhance processor performance, which in turn affects characteristics of computer systems such as security, scalability and reusability. Important problems such as sorting, searching, string processing, graph problems, combinatorial problems and numerical problems are the basic motivations for designing algorithms.
DESIGN GOAL
The Basic objective of solving problem with multiple constraints such as problem size performance and cost in
terms of space and time. The goal is to design fast, efficient and effective solution to a problem domain. Some
problems are easy to solve and some are hard. Quite cleverness is required to design solution with fast and
better approach. Designing new system need a new technology and background of the new technology is the
enhancement of existing algorithm. The study of algorithm is to design efficient algorithm not only limited in
reducing cost and time but to enhance scalability, reliability and availability.
i) Correctness of solution
ii) Decomposition of application into small and clear units which can be maintained precisely
iii) Improving the performance of application
A layman perceives that a computer can perform anything and everything. It is very difficult to convince him that it is not really the computer but the person behind the computer who does the whole thing.
For example, users just enter their queries and get the information they desire. A common man rarely understands that a man-made procedure called search has done the entire task, and the only support provided by the computer is execution speed and organized storage of information.
The word ‘algorithm’ is derived from the name of Abu Ja’far Muhammad ibn Musa Al-Khwarizmi, a ninth-century mathematician. In his work, al-jabr means “restoring”, referring to the process of moving a subtracted quantity to the other side of an equation; al-muqabala means “comparing” and refers to subtracting equal quantities from both sides of an equation.
Definition of Algorithm
An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
An algorithm is a sequence of computational steps that transform the input into the output.
An algorithm is a sequence of operations performed on data that have to be organized in the data
structures.
A finite set of instructions that specifies a sequence of operations to be carried out in order to solve
a specific problem or class of problems is called an algorithm.
An algorithm is an abstraction of a program to be executed on a physical machine (model
computation).
An algorithm is defined as a set of instructions to perform a specific task within a finite number of steps.
An algorithm is defined as a step-by-step procedure to perform a specific task within a finite number of steps.
It can be defined as a sequence of definite and effective instructions which terminates with the production of correct output from the given input.
Example of algorithm
Let us take a common, everyday example: the steps a person goes through while brushing teeth, say:
1. Take the brush
2. Apply the paste
3. Brush the teeth
4. Rinse the mouth
5. Wash the brush
6. Put the brush back
If we go through these 6 steps without considering the statement of the problem, we might just as well assume that this is an algorithm for cleaning a toilet, since several ambiguities arise while comprehending each step. Step 1 may imply a toothbrush, a paintbrush, a toilet brush, etc. Such an ambiguous instruction is not an algorithmic step: every step should be made unambiguous, and an unambiguous step is called a ‘definite instruction’. Even if step 2 is rewritten as “apply the toothpaste”, to eliminate ambiguities, conflicts such as where to apply the toothpaste and where the source of the toothpaste is still need to be resolved, because the act of applying the toothpaste is not fully specified. Although unambiguous, such unrealizable steps cannot be included as algorithmic instructions, as they are not effective.
The definiteness and effectiveness of an instruction imply the successful termination of that instruction. However, these two properties alone may not be sufficient to guarantee the termination of the whole algorithm. Therefore, while designing an algorithm, care should be taken to provide a proper termination for the algorithm.
CHARACTERISTICS OF AN ALGORITHM
a) Input
b) Output
c) Finiteness
d) Definiteness
e) Effectiveness
f) Precision
g) Determination
h) Correctness
i) Generality
a) Input
An algorithm may have one or more inputs. The inputs are taken from a specified set of objects. The input may be text, images or any type of file.
b) Output
An algorithm may have one or more outputs. Output is basically a quantity which has a specified
relation with the input.
c) Finiteness
An algorithm should terminate after a countable (finite) number of steps. In some cases the repetition of steps may be large in number. If a procedure is able to terminate within a finite number of executed steps, it is referred to as a computational method.
d) Definiteness
Each step of an algorithm must be precisely defined. The action to be carried out must be unambiguously specified for each case. Due to lack of understandability one may think that a step lacks definiteness; therefore, in such cases mathematical expressions are written, so that the step resembles an instruction of a computer language.
e) Effectiveness
An algorithm is generally expected to be effective, meaning the steps should be sufficiently basic that it is possible for a person to carry them out.
f) Precision
The steps are precisely stated.
g) Determination
The intermediate results of each step of execution are unique and are determined only by the input and the results of the preceding steps.
h) Correctness
Output produced by the algorithm should be correct.
i) Generality
The algorithm applies to a set of standard inputs.
Described precisely: for instance, in a tea-making algorithm it would be very difficult for a machine to know how much water or milk is to be added, etc., unless every step is described precisely.
These algorithms run on computers or computational devices, for example GPS in our smartphones and Google Hangouts. GPS uses a shortest-path algorithm. Online shopping uses cryptography, which uses the RSA algorithm.
Characteristics of an algorithm:-
Correctness:-
Some algorithms may produce an incorrect answer: even if an algorithm fails to give the correct result every time, there can still be control over how often it gives a wrong result. E.g. the Rabin-Miller primality test (used in the RSA algorithm) does not give the correct answer all the time; it gives an incorrect result in about 1 out of 2^50 runs.
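For illustration, here is a minimal Python sketch of the Miller-Rabin idea (the function names and the choice of random witnesses are our own assumptions, not part of these notes; each independent round errs with probability at most 1/4, so repeated rounds make the error probability vanishingly small):

    import random

    def miller_rabin_round(n, a):
        # One Miller-Rabin round: returns False if witness a proves n composite.
        d, s = n - 1, 0
        while d % 2 == 0:          # write n - 1 = d * 2^s with d odd
            d //= 2
            s += 1
        x = pow(a, d, n)           # a^d mod n
        if x == 1 or x == n - 1:
            return True            # n passes this round (probably prime)
        for _ in range(s - 1):
            x = (x * x) % n
            if x == n - 1:
                return True
        return False               # a is a witness that n is composite

    def is_probably_prime(n, rounds=25):
        if n < 4:
            return n in (2, 3)
        if n % 2 == 0:
            return False
        return all(miller_rabin_round(n, random.randrange(2, n - 1))
                   for _ in range(rounds))

    print(is_probably_prime(561))   # 561 is a Carmichael number: composite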
Approximation algorithms: an exact solution is not found, but a near-optimal solution can be found. (Applied to optimization problems.)
Resource usage:
Here, time is considered to be the primary measure of efficiency. We are also concerned with how much computer memory the algorithm uses, but mostly time is the resource that is dealt with.
The actual running time depends on a variety of factors: the speed of the computer, the language in which the algorithm is implemented, the compiler/interpreter, the skill of the programmer, etc.
So, mainly the resource usage can be divided into: 1. Memory (space) 2. Time
Performance measurement, or a posteriori analysis: implement the algorithm on a machine and then calculate the time taken by the system to execute the program successfully.
Performance evaluation, or a priori analysis: carried out before implementing the algorithm on a system. This is done as follows:
How long the algorithm takes: this will be represented as a function of the size of the input.
How fast this function, which characterizes the running time, grows with the input size.
The algorithm with the lower rate of growth of running time is considered better.
Algorithms are just like a technology. We all use the latest and greatest processors, but we need to run implementations of good algorithms on them in order to properly get the benefit of the money we spent on the latest processor. Let's make this example more concrete by pitting a faster computer (computer A) running a sorting algorithm whose running time on n values grows like n^2 against a slower computer (computer B) running a sorting algorithm whose running time grows like n lg n. They each must sort an array of 10 million numbers. Suppose that computer A executes 10 billion instructions per second (faster than any single sequential computer at the time of this writing) and computer B executes only 10 million instructions per second, so that computer A is 1000 times faster than computer B in raw computing power. To make the difference even more dramatic, suppose that the world's craftiest programmer codes in machine language for computer A, and the resulting code requires 2n^2 instructions to sort n numbers. Suppose further that just an average programmer writes for computer B, using a high-level language with an inefficient compiler, with the resulting code taking 50 n lg n instructions.
Time taken by computer A = (2 × (10^7)^2 instructions) / (10^10 instructions/second) = 20,000 seconds (more than 5.5 hours).
Time taken by computer B = (50 × 10^7 × lg 10^7 instructions) / (10^7 instructions/second) ≈ 1163 seconds (less than 20 minutes).
So choosing a good algorithm (an algorithm with a slower rate of growth), as used by computer B, matters a lot.
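These figures can be reproduced with a few lines of Python (a sketch; the variable names are ours):

    import math

    n = 10**7                              # 10 million numbers to sort
    speed_A, speed_B = 10**10, 10**7       # instructions per second

    time_A = 2 * n**2 / speed_A            # 2n^2 instructions on computer A
    time_B = 50 * n * math.log2(n) / speed_B   # 50 n lg n instructions on B
    print(time_A, "s, about", time_A / 3600, "hours")   # 20000 s, > 5.5 h
    print(time_B, "s, about", time_B / 60, "minutes")   # ~1163 s, < 20 min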
Lecture 2-Growth of Functions (Asymptotic notations)
Before discussing growth of functions and asymptotic notation, let us see how to analyse an algorithm.
Pseudo code (insertion sort):

INSERTION-SORT(A)                                 cost   no. of times executed
1  for j = 2 to n                                 C1     n
2      key = A[j]                                 C2     n − 1
3      // insert A[j] into the sorted A[1..j−1]   C3     n − 1
4      i = j − 1                                  C4     n − 1
5      while i > 0 and A[i] > key                 C5     Σ_{j=2}^{n} t_j
6          A[i+1] = A[i]                          C6     Σ_{j=2}^{n} (t_j − 1)
7          i = i − 1                              C7     Σ_{j=2}^{n} (t_j − 1)
8      A[i+1] = key                               C8     n − 1

Let Ci be the cost of the i-th line, and let t_j be the number of times the while-loop test in line 5 executes for that value of j. Since comment lines will not incur any cost, C3 = 0.
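For reference, the same algorithm as runnable Python (a sketch mirroring the pseudocode above; Python's 0-based indexing replaces the 1-based pseudocode):

    def insertion_sort(A):
        # In-place insertion sort, mirroring the pseudocode (0-based indexing).
        for j in range(1, len(A)):
            key = A[j]                     # element to insert into the sorted prefix
            i = j - 1
            while i >= 0 and A[i] > key:   # shift larger elements one slot right
                A[i + 1] = A[i]
                i -= 1
            A[i + 1] = key
        return A

    print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]

On an already sorted input the while-loop body never runs, which is exactly the best case analyzed next.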
Best case: the array is already sorted, so t_j = 1 for every j and the running time is a linear function of n.
Worst case: the array is in reverse sorted order, so t_j = j and the running time is a quadratic function of n. So in the worst case insertion sort grows like n^2. Why do we concentrate on worst-case running time?
The worst-case running time gives a guaranteed upper bound on the running time for any input.
For some algorithms, the worst case occurs often. For example, when searching, the worst case often occurs
when the item being searched for is not present, and searches for absent items may be frequent.
Why not analyze the average case? Because it’s often about as bad as the worst case.
Order of growth:
It is described by the highest degree term of the formula for running time. (Drop lower-order terms. Ignore
the constant coefficient in the leading term.)
Example: We found that for insertion sort the worst-case running time is of the form an^2 + bn + c.
Dropping the lower-order terms leaves an^2; ignoring the constant coefficient leaves n^2. But we cannot say that the worst-case running time T(n) equals n^2; rather, it grows like n^2, though it does not equal n^2. We say that the running time is Θ(n^2) to capture the notion that the order of growth is n^2.
We usually consider one algorithm to be more efficient than another if its worst-case running time has a
smaller order of growth.
Asymptotic notation
Focus on what’s important by abstracting away low-order terms and constant factors.
O ≈ ≤, Ω ≈ ≥, Θ ≈ =, o ≈ <, ω ≈ >
Example: n^2/2 − 2n = Θ(n^2), with c1 = 1/4, c2 = 1/2, and n0 = 8.
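These constants can be checked numerically with a short script (a sketch; it simply tests the two Θ inequalities c1·n^2 ≤ n^2/2 − 2n ≤ c2·n^2 for n ≥ n0):

    c1, c2, n0 = 1/4, 1/2, 8

    for n in range(n0, 200):
        f = n*n/2 - 2*n
        # Theta definition: c1*n^2 <= f(n) <= c2*n^2 for all n >= n0
        assert c1*n*n <= f <= c2*n*n, n
    print("bounds hold for all tested n >= 8")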
Lecture 3-5: Recurrences, Solution of Recurrences by substitution, Recursion Tree and
Master Method
• If the given instance of the problem is small or simple enough, just solve it.
• Otherwise, reduce the problem to one or more simpler instances of the same problem.
E.g. the worst-case running time T(n) of the merge sort procedure can be expressed by the recurrence
T(n) = Θ(1) if n = 1, and T(n) = 2T(n/2) + Θ(n) if n > 1.
1. SUBSTITUTION METHOD:
Guess the form of the solution, then verify it by induction and solve for the constants.
We substitute the guessed solution for the function when applying the inductive hypothesis
to smaller values. Hence the name “substitution method”. This method is powerful but we must be
able to guess the form of the answer in order to apply it.
Example: T(n) = 4T(n/2) + n. (Observe that the homogeneous part F(n) = 4F(n/2), equivalently F(2n) = 4F(n), is satisfied by F(n) = n^2.)
Guess T(n) = O(n^3), and assume inductively that T(k) ≤ ck^3 for all k < n. Then
T(n) ≤ 4c(n/2)^3 + n
     = cn^3/2 + n
     = cn^3 − (cn^3/2 − n)
     ≤ cn^3, whenever cn^3/2 − n ≥ 0, which holds for all n ≥ 1 provided c ≥ 2.
Hence T(n) = O(n^3).
Now suppose we guess that T(n) = O(n^2), which is the tight upper bound. Assume T(k) ≤ ck^2 for all k < n. Then
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)^2 + n
     = cn^2 + n.
So T(n) ≤ cn^2 + n, which is never ≤ cn^2, and the induction fails. But if we strengthen the assumption to T(k) ≤ c1·k^2 − c2·k, then we can show that T(n) = O(n^2).
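The repaired guess can be sanity-checked numerically (a sketch, assuming T(1) = 1 and evaluating on powers of two; the ratio T(n)/n^2 settling near a constant is consistent with Θ(n^2)):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        # T(n) = 4T(n/2) + n, with T(1) = 1, evaluated on powers of two
        return 1 if n == 1 else 4 * T(n // 2) + n

    for k in range(1, 21, 5):
        n = 2**k
        print(n, T(n) / n**2)   # ratio approaches 2, so T(n) = Theta(n^2)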
2. BY ITERATIVE METHOD:
e.g. T(n) = 2T(n/2) + n
          = 2^2 T(n/2^2) + n + n
          = 2^3 T(n/2^3) + 3n
          ...
          = 2^k T(n/2^k) + kn, which with n/2^k = 1 (i.e. k = log2 n) gives T(n) = O(n log n).
3. RECURSION TREE METHOD:
In a recursion tree, each node represents the cost of a single sub-problem somewhere in the set of recursive invocations. We sum the costs within each level of the tree to obtain a set of per-level costs, and then we sum all the per-level costs to determine the total cost of all levels of the recursion.
Example: T(n) = 3T(n/4) + cn^2.
The sub-problem size is 1 when n/4^i = 1, i.e. i = log_4 n, so the number of levels is 1 + log_4 n.
The cost of each level at depth i is 3^i · c(n/4^i)^2 = (3/16)^i · cn^2.
T(n) = Σ_{i=0}^{log_4 n − 1} (3/16)^i cn^2 + cost of the last level.
The last level (depth i = log_4 n) has 3^(log_4 n) = n^(log_4 3) nodes, each costing T(1), so its cost is Θ(n^(log_4 3)).
Hence
T(n) ≤ Σ_{i=0}^{∞} (3/16)^i cn^2 + Θ(n^(log_4 3)) = (16/13)cn^2 + Θ(n^(log_4 3)),
and therefore T(n) = O(n^2).
4. MASTER METHOD: applies to recurrences of the form
T(n) = aT(n/b) + f(n),
where a ≥ 1 and b > 1 are constants and f(n) is an asymptotically positive function. To use the master method, we have to remember 3 cases:
1. If f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
2. If f(n) = Θ(n^(log_b a) · log^k n) for some k ≥ 0, then T(n) = Θ(n^(log_b a) · log^(k+1) n).
3. If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a·f(n/b) ≤ c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
Example: T(n) = 2T(n/2) + n log n.
Here a = 2, b = 2, so n^(log_b a) = n^1 = n, and f(n) = n log n = Θ(n^1 · log^k n) with k = 1.
By case 2, T(n) = Θ(n · log^(k+1) n) = Θ(n log^2 n).
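A numeric illustration of this result (a sketch, assuming T(1) = 1):

    import math
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        # T(n) = 2T(n/2) + n lg n, with T(1) = 1
        return 1 if n == 1 else 2 * T(n // 2) + n * math.log2(n)

    for k in (5, 10, 15, 20):
        n = 2**k
        print(n, T(n) / (n * math.log2(n)**2))   # tends to 1/2: Theta(n log^2 n)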
Lecture 6 - Worst case analysis of merge sort, quick sort
Merge sort
It is one of the well-known divide-and-conquer algorithms. It is a simple and very efficient algorithm for sorting a list of numbers.
We are given a sequence of n numbers which we will assume is stored in an array A [1...n].
The objective is to output a permutation of this sequence, sorted in increasing order. This is normally
done by permuting the elements within the array A.
How can we apply divide-and-conquer to sorting? Here are the major elements of the Merge
Sort algorithm.
Divide: Split A down the middle into two sub-sequences, each of size roughly n/2.
Conquer: Sort each subsequence (by calling Merge Sort recursively on each).
Combine: Merge the two sorted sub-sequences into a single sorted list.
The dividing process ends when we have split the sub-sequences down to a single item. A
sequence of length one is trivially sorted. The key operation where all the work is done is in the
combine stage, which merges together two sorted lists into a single sorted list. It turns out that the
merging process is quite easy to implement.
The following figure gives a high-level view of the algorithm. The “divide” phase is shown on the left; it works top-down, splitting the list into smaller sublists. The “conquer and combine” phases are shown on the right; they work bottom-up, merging sorted lists together into larger sorted lists.
[Figure: Merge Sort, divide phase on the left, conquer and combine phases on the right]
We design the Merge Sort algorithm top-down. We’ll assume that the procedure that merges two sorted lists is available to us; we’ll implement it later. Because the algorithm is called recursively on sublists, in addition to passing in the array itself we will pass in two indices which indicate the first and last indices of the subarray that we are to sort. The call MergeSort(A, p, r) will sort the subarray A[p..r] and return the sorted result in the same subarray.
Here is the overview. If r = p, then there is only one element to sort, and we may return immediately. Otherwise (if p < r) there are at least two elements, and we invoke the divide-and-conquer. We find the index q midway between p and r, namely q = (p + r)/2 (rounded down to the nearest integer). Then we split the array into subarrays A[p..q] and A[q + 1..r]. We call MergeSort recursively to sort each subarray. Finally, we invoke a procedure (which we have yet to write) which merges these two subarrays into a single sorted array.
MergeSort(array A, int p, int r) {   // sorts A[p..r]
    if (p < r) {                     // we have at least 2 items
        q = (p + r) / 2
        MergeSort(A, p, q)           // sort A[p..q]
        MergeSort(A, q+1, r)         // sort A[q+1..r]
        Merge(A, p, q, r)            // merge the two sorted halves
    }
}
Merging: All that is left is to describe the procedure that merges two sorted lists. Merge(A, p, q, r) assumes that the left subarray, A[p..q], and the right subarray, A[q + 1..r], have already been sorted. We merge these two subarrays by copying the elements to a temporary working array called B. For convenience, we will assume that the array B has the same index range as A, that is, B[p..r]. We have two indices i and j that point to the current elements of each subarray. We move the smaller element into the next position of B (indicated by index k) and then increment the corresponding index (either i or j). When we run out of elements in one array, we just copy the rest of the other array into B. Finally, we copy the entire contents of B back into A.
Merge(array A, int p, int q, int r) {      // merges A[p..q] with A[q+1..r]
    array B[p..r]; i = p; j = q+1; k = p   // B is a temporary working array
    while (i <= q and j <= r) {            // while both sub arrays are nonempty
        if (A[i] <= A[j]) B[k++] = A[i++]  // copy from left sub array
        else B[k++] = A[j++]               // copy from right sub array
    }
    while (i <= q) B[k++] = A[i++]         // copy any leftover of left to B
    while (j <= r) B[k++] = A[j++]         // copy any leftover of right to B
    for k = p to r do A[k] = B[k]          // copy B back into A
}
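The same pair of procedures as runnable Python (a sketch of the pseudocode above, using 0-based indices):

    def merge_sort(A, p=0, r=None):
        # Sorts A[p..r] in place, following the divide-and-conquer scheme above.
        if r is None:
            r = len(A) - 1
        if p < r:
            q = (p + r) // 2
            merge_sort(A, p, q)        # sort left half
            merge_sort(A, q + 1, r)    # sort right half
            merge(A, p, q, r)          # combine the two sorted halves
        return A

    def merge(A, p, q, r):
        # Merges sorted A[p..q] and A[q+1..r] using a temporary list B.
        B = []
        i, j = p, q + 1
        while i <= q and j <= r:
            if A[i] <= A[j]:
                B.append(A[i]); i += 1     # take from left subarray
            else:
                B.append(A[j]); j += 1     # take from right subarray
        B.extend(A[i:q + 1])               # leftover of left subarray
        B.extend(A[j:r + 1])               # leftover of right subarray
        A[p:r + 1] = B                     # copy B back into A

    print(merge_sort([7, 5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6, 7]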
Analysis: What remains is to analyze the running time of Merge Sort. First let us consider the running time of the procedure Merge(A, p, q, r). Let n = r − p + 1 denote the total length of both the left and right subarrays. What is the running time of Merge as a function of n? The algorithm contains four loops (none nested in the other). It is easy to see that each loop can be executed at most n times. (If you are a bit more careful you can actually see that all the while-loops together can only be executed n times in total, because each execution copies one new element to the array B, and B only has space for n elements.) Thus the running time to Merge n items is Θ(n). Let us write this without the asymptotic notation, simply as n. (We’ll see later why we do this.)
Now, how do we describe the running time of the entire Merge Sort algorithm? We will do this through the use of a recurrence, that is, a function that is defined recursively in terms of itself. To avoid circularity, the recurrence for a given value of n is defined in terms of values that are strictly smaller than n. Finally, a recurrence has some basis values (e.g. for n = 1) which are defined explicitly.
Let’s see how to apply this to Merge Sort. Let T(n) denote the worst-case running time of Merge Sort on an array of length n. For concreteness we could count whatever we like: number of lines of pseudocode, number of comparisons, number of array accesses; these will only differ by a constant factor. Since all of the real work is done in the Merge procedure, we will count the total time spent in the Merge procedure.
First observe that if we call Merge Sort with a list containing a single element, then the running time is constant. Since we are ignoring constant factors, we can just write T(n) = 1. When we call Merge Sort with a list of length n > 1, e.g. MergeSort(A, p, r) where r − p + 1 = n, the algorithm first computes q = (p + r)/2. The subarray A[p..q] contains q − p + 1 elements; you can verify that it is of size n/2. Thus the remaining subarray A[q + 1..r] has n/2 elements in it. How long does it take to sort the left subarray? We do not know this, but because n/2 < n for n > 1, we can express this as T(n/2). Similarly, we can express the time that it takes to sort the right subarray as T(n/2).
Finally, to merge both sorted lists takes n time, by the comments made above. In conclusion we have
T(n) = 1              if n = 1,
T(n) = 2T(n/2) + n    otherwise.
Solving the above recurrence we can see that merge sort has a time complexity of Θ (n log n) .
QUICKSORT
Sorts in place.
Combine: No work is needed to combine the sub arrays, because they are sorted in place.
Perform the divide step by a procedure PARTITION, which returns the index q that marks the
position separating the sub arrays.
QUICKSORT(A, p, r)
    if p < r
        then q ← PARTITION(A, p, r)
             QUICKSORT(A, p, q − 1)
             QUICKSORT(A, q + 1, r)
Partitioning
PARTITION(A, p, r)
    x ← A[r]                    // pivot = last element
    i ← p − 1
    for j ← p to r − 1
        do if A[j] ≤ x
            then i ← i + 1
                 exchange A[i] ↔ A[j]
    exchange A[i + 1] ↔ A[r]    // place the pivot between the two regions
    return i + 1
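As runnable Python, here is a sketch of quicksort with the Lomuto-style PARTITION above (0-based indices; the exchange steps become tuple swaps):

    def partition(A, p, r):
        # Lomuto partition: pivot = A[r]; returns the pivot's final index.
        x = A[r]
        i = p - 1
        for j in range(p, r):
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]    # grow the "<= pivot" region
        A[i + 1], A[r] = A[r], A[i + 1]    # place pivot between the regions
        return i + 1

    def quicksort(A, p=0, r=None):
        if r is None:
            r = len(A) - 1
        if p < r:
            q = partition(A, p, r)
            quicksort(A, p, q - 1)   # sort elements <= pivot
            quicksort(A, q + 1, r)   # sort elements > pivot
        return A

    print(quicksort([2, 8, 7, 1, 3, 5, 6, 4]))   # [1, 2, 3, 4, 5, 6, 7, 8]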
PARTITION always selects the last element A[r] in the subarray A[p..r] as the pivot, the element around which to partition.
As the procedure executes, the array is partitioned into four regions, some of which may be
empty:
• If the sub arrays are balanced, then Quick sort can run as fast as merge sort.
• If they are unbalanced, then Quick sort can run as slowly as insertion sort.
Worst case
• Occurs when the subarrays are completely unbalanced: 0 elements in one subarray and n − 1 elements in the other.
• T(n) = T(n − 1) + T(0) + Θ(n)
       = T(n − 1) + Θ(n)
       = O(n^2).
Best case
• Occurs when the sub arrays are completely balanced every time.
• Each sub array has ≤ n/2 elements.
• Get the recurrence
T(n) = 2T(n/2) + Θ(n) = O(n lg n).
Balanced partitioning
• Quicksort’s average running time is much closer to the best case than to the worst case.
• Imagine that PARTITION always produces a 9-to-1 split.
• Get the recurrence
T(n) ≤ T(9n/10) + T(n/10) + Θ(n) = O(n lg n).
HEAPSORT
In place algorithm
Running Time: O(n log n)
The (binary) heap data structure is an array object that we can view as a nearly complete binary
tree. Each node of the tree corresponds to an element of the array. The tree is completely filled
on all levels except possibly the lowest, which is filled from the left up to a point.
The root of the tree is A[1], and given the index i of a node, we can easily compute the indices of its parent, left child, and right child:
PARENT(i) = ⌊i/2⌋, LEFT(i) = 2i, RIGHT(i) = 2i + 1.
On most computers, the LEFT procedure can compute 2i in one instruction by simply shifting
the binary representation of i left by one bit position.
Similarly, the RIGHT procedure can quickly compute 2i + 1 by shifting the binary
representation of i left by one bit position and then adding in a 1 as the low-order bit.
The PARENT procedure can compute ⌊i/2⌋ by shifting i right by one bit position. Good implementations of heapsort often implement these procedures as "macros" or "inline" procedures.
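In Python, these index computations look as follows (a sketch using the 1-based heap indices of the notes):

    def parent(i):
        return i >> 1        # floor(i/2): shift right one bit

    def left(i):
        return i << 1        # 2i: shift left one bit

    def right(i):
        return (i << 1) | 1  # 2i + 1: shift left, set the low-order bit

    print(parent(5), left(5), right(5))   # 2 10 11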
There are two kinds of binary heaps: max-heaps and min-heaps.
In a max-heap, the max-heap property is that for every node i other than the root, A[PARENT(i)] ≥ A[i], that is, the value of a node is at most the value of its parent. Thus, the largest element in a max-heap is stored at the root, and the subtree rooted at a node contains values no larger than the value contained at the node itself.
A min-heap is organized in the opposite way; the min-heap property is that for every node i other than the root, A[PARENT(i)] ≤ A[i].
The smallest element in a min-heap is at the root.
The height of a node in a heap is the number of edges on the longest simple downward path
from the node to a leaf and
The height of the heap is the height of its root.
Since a heap of n elements is based on a complete binary tree, its height is O(log n).
Maintaining the heap property
MAX-HEAPIFY lets the value at A[i] "float down" in the max-heap so that the subtree rooted
at index i obeys the max-heap property.
MAX-HEAPIFY(A, i)
    l ← LEFT(i)
    r ← RIGHT(i)
    if l ≤ A.heap-size and A[l] > A[i]
        then largest ← l
        else largest ← i
    if r ≤ A.heap-size and A[r] > A[largest]
        then largest ← r
    if largest ≠ i
        then exchange A[i] ↔ A[largest]
             MAX-HEAPIFY(A, largest)
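The corrected procedure as runnable Python (a sketch; A[0] is left unused so that indices match the 1-based pseudocode, and heap_size is passed explicitly):

    def max_heapify(A, i, heap_size):
        # A[0] is unused so indices match the 1-based pseudocode.
        l, r = 2 * i, 2 * i + 1                   # LEFT(i), RIGHT(i)
        largest = l if l <= heap_size and A[l] > A[i] else i
        if r <= heap_size and A[r] > A[largest]:
            largest = r
        if largest != i:
            A[i], A[largest] = A[largest], A[i]   # float A[i] down one level
            max_heapify(A, largest, heap_size)    # continue down that subtree

    A = [None, 4, 14, 7, 2, 8, 1]   # heap array with a violation at the root
    max_heapify(A, 1, 6)
    print(A[1:])                    # [14, 8, 7, 2, 4, 1]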