Algorithm Analysis
Session 1: Algorithms
Introduction to algorithms
In this session, you have seen the importance of algorithms in a computer program. Specifically,
Computer Program = Algorithms + Data structures
What is an algorithm?
An algorithm is a method of solving a problem through a set of sequential instructions.
Example 1: In the Ola app, when you request a pick-up from location A and a drop at location B, an algorithm running
behind the screen locates a cab nearby and finds the shortest route to the destination.
Example 2: Similarly, Ola Share uses another algorithm that identifies multiple requests from different people
en route to the destination and works out the shortest distances to pick up and drop each of them at their
respective locations.
Algorithm 1
We discussed a practical scenario where students can register for a course both online and offline at the
academic office, and certain students registered for the same course in both modes.
You have learnt an algorithm to find the student IDs registered twice for the same course.
Algorithm 1
We have assumed that the university consists of not more than 10000 students and that student IDs are
integers between 1 and 10000.
The combined data of student IDs is stored in an array variable id[ ].
The array id[ ] is indexed from 0 to n - 1.
For the first iteration of the outer loop(i), i.e., when i = 0, the inner loop(j) iterates from 1 (i + 1) to n - 1 and
compares the first student ID (id[0]) with the student IDs to its right until it finds a duplicate of id[0].
For the second iteration of the outer loop(i), i.e., when i = 1, the inner loop(j) iterates from 2 (i + 1) to n - 1 and
compares the second student ID (id[1]) with the student IDs to its right until it finds a duplicate of id[1].
For each iteration of the inner loop(j), the instruction set compares id[i] and id[j]; if they are equal, it prints the
duplicate ID and breaks out of the inner loop(j).
In this way the outer loop(i) iterates from 0 to n - 1, and for each iteration of the outer loop(i) the inner loop(j)
instruction set is executed n - (i + 1) times, printing all the duplicate IDs found.
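The nested-loop approach described above can be sketched in Java as follows. This is a minimal sketch, not the exact code from the session; the method name findDuplicates and the sample data are illustrative, while the array name id[ ] follows the text.

```java
import java.util.ArrayList;
import java.util.List;

public class Duplicates1 {
    // Algorithm 1: compare every student ID with the IDs to its right.
    static List<Integer> findDuplicates(int[] id) {
        List<Integer> dups = new ArrayList<>();
        int n = id.length;
        for (int i = 0; i < n; i++) {            // outer loop(i): 0 .. n-1
            for (int j = i + 1; j < n; j++) {    // inner loop(j): i+1 .. n-1
                if (id[i] == id[j]) {            // duplicate found
                    dups.add(id[i]);
                    break;                       // break out of inner loop(j)
                }
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // IDs 5 and 8 each appear twice in this sample data
        System.out.println(findDuplicates(new int[]{3, 5, 8, 5, 9, 8})); // [5, 8]
    }
}
```

The inner loop starting at i + 1 is what gives the n - (i + 1) comparisons per outer iteration discussed above.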
Algorithm 2
As you know, there is typically more than one way to solve the same problem, and you have seen another
approach to finding the duplicate student IDs in the given data.
Algorithm 2
In algorithm 2, declare an extra array variable count[ ], with a size equal to the maximum student ID, i.e. 10000,
besides the array id[ ] holding the student registration data.
Each index of the count array refers to the corresponding student ID. For id[i] (the ith student), count[id[i]] is
incremented to 1; if the same ID appears again, the same cell is incremented to 2.
As you can observe, there are no duplicates in the above data, so the corresponding cells of each student
ID in the count array are incremented to 1.
Here, student IDs 5 and 8 are repeated in the given data, so you can see that the array cells corresponding
to the duplicate student IDs (5 and 8) are incremented to 2.
When count[id[i]] equals 2, the algorithm prints the duplicate student ID.
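The counting approach can be sketched in Java as follows. Again a sketch rather than the session's exact code; the constant MAX_ID = 10000 and the array names count[ ] and id[ ] follow the text, while the method name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class Duplicates2 {
    static final int MAX_ID = 10000;   // maximum possible student ID (from the text)

    // Algorithm 2: count occurrences of each ID in a single pass.
    static List<Integer> findDuplicates(int[] id) {
        int[] count = new int[MAX_ID + 1];   // count[k] = occurrences of ID k so far
        List<Integer> dups = new ArrayList<>();
        for (int i = 0; i < id.length; i++) {
            count[id[i]]++;                  // increment the cell for this ID
            if (count[id[i]] == 2) {         // second occurrence: a duplicate
                dups.add(id[i]);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates(new int[]{3, 5, 8, 5, 9, 8})); // [5, 8]
    }
}
```

A single loop over the n registrations replaces the nested comparison, at the cost of the extra count[ ] array.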
Parameters
Now, you need to analyse the above two algorithms and find the more efficient one. We discussed that the
parameters on which an algorithm is analysed are execution time and memory space.
Both execution time and memory space are calculated as a function of the input size (n).
In general, to analyse an algorithm you count the number of times a certain instruction set is executed
rather than measure exact times, as exact times depend on various external factors such as processor speed,
the compiler, etc. Also, while analysing an algorithm, we consider the worst case possible.
To understand the worst case, you saw an analogy of unlocking a lock when you have ten different keys and
do not know which is the right one. When you try to unlock it by trial and error, the worst case is that the
lock opens on your tenth attempt, and the best case is that it opens on your first attempt.
Time complexity
The worst case to consider in algorithm 1 is when there are no duplicate student IDs, so for every ith
iteration of the outer loop(i) the inner loop instruction set is executed n - (i + 1) times. Summing over all
iterations gives (n - 1) + (n - 2) + … + 1 = n(n - 1)/2 comparisons in total.
In a similar way, algorithm 2 executes a certain instruction set n times to find the duplicate student IDs.
On assuming constant times:
1. C1 for the rest of the instruction set, like declaring variables, passing data to functions, etc.
2. C2 for the instruction set inside the loop that finds duplicate student IDs.
Algorithm 1 – T(n) = C1 + (n(n - 1)/2) * C2
Algorithm 2 – T(n) = C1 + n * C2
The total time taken to execute an algorithm as a function of input size(n) is called time complexity of an
algorithm and is represented by T(n).
Space complexity
The other parameter considered when analysing an algorithm is the memory space required to execute it.
The total memory space required to execute an algorithm as a function of the input size (n) is called the space
complexity of the algorithm and is represented by S(n). In general, you only count the extra memory required,
not the memory needed to store the input.
As you have learnt, algorithm 1 uses a constant amount of memory besides the array variable id[ ] that stores
the student IDs, whereas algorithm 2 uses an extra array variable count[ ]. The size of the count array grows
linearly with the student strength of the university, i.e., when the maximum student ID exceeds 10000, that
maximum student ID becomes the size of the count array.
Therefore,
Algorithm 1 – S(n): constant space; the memory required does not depend on the input size
Algorithm 2 – S(n): linearly proportional to the number of possible students in the university
Asymptotic notations
After obtaining the complexity functions of both algorithms, you have learnt mathematical notations such as
Big O, Big Omega (Ω) and Big Theta (Θ), called asymptotic notations, which help us compare the complexity
functions of two different algorithms.
Big O
Big O indicates the upper bound (worst case) of the running time or space complexity of an algorithm.
To calculate the Big O of any function, we discussed certain simplification rules:
• Drop the constant multiplier in a function, as it depends on the hardware (such as processor speed) on
which the program is run.
For example,
i) T(n) = 2n ⇒ T(n) ∈ O(n)
ii) T(n) = 5n² ⇒ T(n) ∈ O(n²)
• Drop the less significant terms in a polynomial function. Apart from the highest-order term, the remaining
terms contribute relatively little to the growth of the function.
For example,
T(n) = 10n³ + n² + 4n + 800 ⇒ T(n) ∈ O(n³)
If you drop all the less significant terms except 10n³, the relative error for large n is tiny (around 0.01%),
which is negligible.
Definition: Big O means “bounded above by” (upper bound): if there exist constants c > 0 and N > 0 such that
T(n) ≤ c·f(n) for all n > N, then T(n) ∈ O(f(n)).
When calculating Big O, the time complexity function of the algorithm must stay less than or equal to the
upper bound for all sufficiently large n. For example, take T(n) = 2n² − 3n + 6: since the input size (n) is
positive and −3n + 6 < 0 once n > 2, we get T(n) ≤ 2n² for all n > 2, so T(n) ∈ O(n²).
Big Omega
Big Omega indicates the lower bound of the running time or space complexity of an algorithm. In the
lock-opening scenario, the best case is that you unlock it on your very first attempt.
Definition: Big Omega (Ω) means “bounded below by” (lower bound): if there exist constants c > 0 and N > 0
such that
T(n) ≥ c·f(n) for all n > N, then T(n) ∈ Ω(f(n)).
When calculating Big Omega, the time complexity function of the algorithm is greater than or equal to the
lower bound. For the same example:
T(n) = 2n² − 3n + 6
≥ 2n² − 3n (because 6 > 0)
≥ n² for all n ≥ 3, since 2n² − 3n ≥ n² ⇔ n² ≥ 3n ⇔ n ≥ 3.
Therefore T(n) ∈ Ω(n²).
Definition: Big Theta (Θ) means “bounded above and below”: if there exist constants c1 > 0, c2 > 0 and N > 0
such that
c1·f(n) ≤ T(n) ≤ c2·f(n) for all n > N, then T(n) ∈ Θ(f(n)).
Among the three asymptotic notations, Big O is the one most commonly used to compare algorithms.
Therefore, you usually analyse the worst case of an algorithm with respect to the input size (n).
Rule of sums
In an algorithm, when two for loops appear one after another, the first for loop(i) instruction set is executed
n times and the second for loop(j) instruction set is executed m times.
So, in total, n + m steps are taken in executing this kind of algorithm. If you consider the time complexity of
this algorithm, T(n) = n + m (neglecting the constant time taken by the instruction set).
If n > m, then T(n) = n + m ≤ n + n = 2n.
On dropping the constant multiplier 2, T(n) ∈ O(n).
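The two sequential loops described above can be sketched as follows; the countSteps helper is illustrative, not from the session, and simply counts how many times the loop bodies run.

```java
public class RuleOfSums {
    // Rule of sums: two loops one after another take n + m steps in total.
    static long countSteps(int n, int m) {
        long steps = 0;
        for (int i = 0; i < n; i++) {   // first loop(i): n steps
            steps++;
        }
        for (int j = 0; j < m; j++) {   // second loop(j): m steps
            steps++;
        }
        return steps;                   // n + m
    }

    public static void main(String[] args) {
        System.out.println(countSteps(1000, 300)); // 1300
    }
}
```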
Rule of products
In an algorithm, when two for loops are nested, the outer loop(i) iterates n times and for each iteration of
the outer loop(i), the inner loop(j) instruction set is executed m times. In total, the instruction set inside the
inner loop is executed n * m times; if you consider the time complexity function of the algorithm,
T(n) = n * m (neglecting the constant time taken by the instruction set).
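The nested loops of the rule of products can be sketched the same way; again the countSteps helper is illustrative.

```java
public class RuleOfProducts {
    // Rule of products: nested loops execute the inner body n * m times.
    static long countSteps(int n, int m) {
        long steps = 0;
        for (int i = 0; i < n; i++) {       // outer loop(i): n iterations
            for (int j = 0; j < m; j++) {   // inner loop(j): m iterations each
                steps++;
            }
        }
        return steps;                       // n * m
    }

    public static void main(String[] args) {
        System.out.println(countSteps(100, 30)); // 3000
    }
}
```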
To find the relative efficiency of different functions, we discussed in detail a table showing the time taken by
each function for different input sizes.
To understand the table, we have assumed that a computer performs 1 billion (10⁹) operations per second.
Using this, you can compare different complexity functions of algorithms and find the efficient algorithm.
Duplicates
After finding the relative efficiency of different functions, you have learnt how to compare algorithms to find
duplicate student IDs and find the efficient algorithm
Time complexity:
Algorithm 1 – T(n) = C1 + (n(n - 1)/2) * C2 ⇒ T(n) ∈ O(n²) – quadratic function
Algorithm 2 – T(n) = C1 + n * C2 ⇒ T(n) ∈ O(n) – linear function
You can observe that algorithm 2, O(n), is more efficient compared to algorithm 1, O(n²).
Space complexity:
You can observe that algorithm 1, O(1), is more efficient compared to algorithm 2, O(n).
Time vs Space Complexity Trade-off:
As you’ve seen with the algorithms for identifying duplicate student IDs, the two algorithms make different
trade-offs between time and space.
Specifically:
• Algorithm 1 runs slower, but uses less memory
• Algorithm 2 runs faster, but uses more memory
As a software developer, you’ll often face this kind of dilemma while designing programs and software. Do
you write programs that run fast but use lots of memory? Or do you write programs that run slower but use
less memory?
For example, if you are writing software for high-frequency stock trading, where every microsecond can
be the difference between earning or losing hundreds of thousands of dollars, you will likely want to design
programs that can execute very quickly at the expense of using lots of memory.
On the other hand, you may be writing software that runs on smartphones where the memory available to
the software is limited. In this situation, you may want to write software that uses less memory but runs a
bit more slowly.
Therefore, use your best judgement when it comes to Time vs Space Complexity trade-off. Identify your
business needs or constraints, and then decide if you should trade space for time or vice versa.
Session 2: Run-time Analysis
Fibonacci sequence
In this session, you have been introduced to a mathematical function called Fibonacci sequence, which is
defined as:
F(0) = 0
F(1) = 1
F(n) = F(n - 1) + F(n - 2)
We made a small change to the above function, F(n) = [F(n - 1) + F(n - 2)] % 10, i.e. we take the result
modulo 10 in order to avoid integer overflow for higher input (n) values. We then discussed calculating the
nth number of the function, i.e., n = 0, 1, 2, 3, 4, 5 …
You have learnt algorithm 1 to generate nth number of the function F(n) = [F(n - 1) + F(n - 2)]%10.
Algorithm 1
As you have learnt, the function fibonacci() is called inside itself, which makes it a recursive function.
Recursive function
Recursive function is a function which calls itself during its execution, and a recursive function typically needs
to define two cases:
1. the base case that returns a definitive value and
2. the recursive case where the recursive function calls itself and tries to solve smaller parts of the
problem at hand
In algorithm 1, the if condition acts as the terminating or base condition that returns the definite values to
end the recursive calls of the function fibonacci(), and the else condition acts as the recursive case, which
calls the same function fibonacci() again until the argument passed satisfies the base condition. To
understand exactly how the recursive function generates the Fibonacci number for a given input (n), we
discussed a recursion tree for generating the 5th Fibonacci number.
In it, F(0) and F(1) are the terminating conditions, which help to find the rest of the values and print the
final output.
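A sketch of algorithm 1 in Java, following the text's description (the method name fibonacci() comes from the text; the exact session code is not reproduced here):

```java
public class Fibonacci1 {
    // Algorithm 1: recursive computation of F(n) = [F(n-1) + F(n-2)] % 10.
    static int fibonacci(int n) {
        if (n == 0 || n == 1) {   // base case: F(0) = 0, F(1) = 1
            return n;
        }
        // recursive case: two further calls per invocation, hence the
        // exponential O(2^n) running time discussed below
        return (fibonacci(n - 1) + fibonacci(n - 2)) % 10;
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(5)); // 5
    }
}
```

Taking the sum modulo 10 at every level still yields F(n) % 10, since modular addition distributes over the recurrence.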
Then we demonstrated the code of algorithm 1 for different input sizes, such as n = 4 and n = 42.
But when the input given was n = 100, the program kept processing and you never saw the output.
As suggested, if you run this code on your system for the input n = 100, even after waiting for hours the
program will not have produced any output and will still be processing.
Time complexity of algorithm 1
We considered that the number of additions required to calculate F(0) and F(1) is 0, as the if condition
returns the value directly without any calculations (additions) required.
Then we calculated the upper bound for the time complexity function: each call to fibonacci(n) makes two
further recursive calls, so the number of calls roughly doubles at every level of the recursion tree.
So, the time complexity of algorithm 1 is an exponential function, O(2ⁿ), which is really slow, and this is
why algorithm 1 is unable to produce an output when n = 100.
Space complexity of algorithm 1
With respect to memory space, i.e., the space complexity required for algorithm 1, we again discussed the
example of generating the 5th Fibonacci number and how memory space is occupied on each recursive call.
Think of memory space as partitions, as shown: each partition is occupied as a function is called, and the
same function pops out of memory when its work is done and it returns a value.
As you can observe, the maximum memory space occupied at any time is proportional to the depth of the
recursion, i.e., to the input n. So, space complexity S(n) ∈ O(n) – a linear function.
Recursion
Consider a file system in which each folder is a directory that may contain many subdirectories (folders). In
order to search for a specific file name, you pass the file name you are looking for and the directory path
(folder path) where the search should start to a recursive function.
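A recursive directory search of this kind might look as follows in Java; this is a sketch using `java.io.File`, and the method name findFile is illustrative.

```java
import java.io.File;

public class FileSearch {
    // Recursively search dir (and its subdirectories) for filename.
    // Returns the first match, or null if none is found.
    static File findFile(String filename, File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) return null;        // not a directory, or unreadable
        for (File entry : entries) {
            if (entry.isDirectory()) {
                File found = findFile(filename, entry);  // recursive case
                if (found != null) return found;
            } else if (entry.getName().equals(filename)) {
                return entry;                    // base case: file found
            }
        }
        return null;
    }

    public static void main(String[] args) {
        File found = findFile("notes.txt", new File("."));
        System.out.println(found == null ? "not found" : "Found: " + found.getPath());
    }
}
```

Each subdirectory is a smaller instance of the same search problem, which is exactly the recursive-case / base-case structure described above.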
Algorithm 2
In algorithm 1, we did not store values such as F(2) and F(3) and had to recalculate them every time they
were required.
We overcome these redundant calculations in algorithm 2 by storing all the calculated values in an array
variable f[ ]. Therefore, if we need a Fibonacci value that has been calculated previously, we can simply refer
back to the value stored in f[ ] rather than recalculating it.
Algorithm 2
As you can observe, there is only one for loop, iterating from 2 to n, i.e., n - 1 times, executing certain
instructions to generate the nth number of the function F(n) = [F(n - 1) + F(n - 2)] % 10.
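Algorithm 2 can be sketched as follows; the array name f[ ] follows the text, and the rest is an illustrative sketch rather than the session's exact code.

```java
public class Fibonacci2 {
    // Algorithm 2: store every value of F(n) = [F(n-1) + F(n-2)] % 10
    // in an array f[], so each value is computed exactly once.
    static int fibonacci(int n) {
        if (n == 0 || n == 1) return n;
        int[] f = new int[n + 1];            // extra O(n) memory
        f[0] = 0;
        f[1] = 1;
        for (int i = 2; i <= n; i++) {       // n - 1 additions in total
            f[i] = (f[i - 1] + f[i - 2]) % 10;
        }
        return f[n];
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(100)); // 5, computed instantly
    }
}
```

Unlike the recursive algorithm 1, n = 100 finishes immediately here because no value is ever recomputed.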
Therefore,
Time Complexity:
T(n) = No. of additions to compute F(n)
So, T(n) = n – 1
Therefore, T(n) ∈ O(n), linear time
With respect to the memory space, besides declaring variable n for input, you need to create an array
variable f[ ] of size n in order to store all the calculated values of the function F(n) = [F(n - 1) + F(n - 2)]%10
Therefore,
Space Complexity:
An extra array variable f[] is defined, whose size is dependent on input variable n
So, S(n) ∈ O(n), linear in memory space
After analysing algorithm 2 with respect to time taken and memory space required, we ran the Java code for
input values such as n = 4 and n = 100, and using algorithm 2 we were able to generate the output for
n = 100.
So, algorithm 2 overcomes the constraint of algorithm 1 and processes the nth number of the function
when n = 100.
However, for a much larger input, n = 5 × 10⁸, the program displays an out of memory error, as the memory
needed to create an array of size 5 × 10⁸ is much larger than the total memory available to our entire
program. Therefore, algorithm 2 cannot produce the output when n = 5 × 10⁸.
With respect to execution time, algorithm 2 with O(n) is better than algorithm 1 with O(2ⁿ), and so it is
able to produce the Fibonacci number for n = 100 in no time.
However, algorithm 2 must be improved with respect to the memory space required. Otherwise it cannot
store and process Fibonacci numbers for large input values such as n = 5 × 10⁸.
Algorithm 3
To overcome the memory space constraint in algorithm 2, you have learnt algorithm 3, a clever technique
that calculates the Fibonacci sequence by using three different variables a, b & c.
Algorithm 3
Starting from a = 0 (F(0)) and b = 1 (F(1)), for each iteration of the for loop(i) the variables are assigned as
follows:
c = (a + b)%10;
a = b;
b = c;
The values are listed below for the first two iterations; after the nth iteration, the variable c stores the nth
number of the function F(n) = [F(n - 1) + F(n - 2)] % 10.
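Putting those assignments inside the loop gives a sketch of algorithm 3; the variable names a, b and c follow the text, and the surrounding scaffolding is illustrative.

```java
public class Fibonacci3 {
    // Algorithm 3: keep only the last two values of the sequence,
    // so the memory used is constant regardless of n.
    static int fibonacci(int n) {
        if (n == 0 || n == 1) return n;
        int a = 0;                  // F(i - 2) % 10
        int b = 1;                  // F(i - 1) % 10
        int c = 0;                  // F(i) % 10
        for (int i = 2; i <= n; i++) {
            c = (a + b) % 10;       // next value of the sequence
            a = b;                  // slide the two-value window forward
            b = c;
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(100)); // 5, using only three variables
    }
}
```

The loop body is identical to algorithm 2's, but the array f[ ] has been replaced by the sliding pair (a, b), which is what brings the space complexity down to O(1).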
The Java code of algorithm 3 was executed for different values of n, and in particular for n = 10⁹ the output
was processed and displayed.
So, with this we are able to confirm that algorithm 3 overcomes the memory space constraint of algorithm
2. More specifically, unlike algorithm 2, the memory required for algorithm 3 is constant and independent
of the input n. On analysing algorithm 3 with respect to time taken and memory space required,
The instruction set that generates the nth number of the function is executed n - 1 times.
So, the time complexity of algorithm 3 is T(n) = n - 1 ⇒ T(n) ∈ O(n) – linear time.
Memory space required is constant, as only three variables are used to process the output irrespective of
the input size(n), so the space complexity of algorithm 3, S(n) ∈ O(1) – constant space
Summary
Algorithm 1 – T(n) ∈ O(2ⁿ) – exponential time; S(n) ∈ O(n) – linear space
Algorithm 2 – T(n) ∈ O(n) – linear time; S(n) ∈ O(n) – linear space
Algorithm 3 – T(n) ∈ O(n) – linear time; S(n) ∈ O(1) – constant space
The runtime and space complexity of algorithm 3 are O(n) and O(1) respectively, which is much more efficient
than the other two algorithms. Therefore, algorithm 3 can process the nth number in the Fibonacci sequence
for large values such as n = 10⁹, a value that would otherwise be too large for algorithm 1 and algorithm 2.