Algorithm Analysis

The lecture notes cover the importance of algorithms in computer programs, defining an algorithm as a method for solving problems through sequential instructions. Two algorithms for identifying duplicate student IDs are discussed, with Algorithm 1 being less efficient in time but using constant space, while Algorithm 2 is faster but requires linear space. The notes also explain time and space complexity, asymptotic notations, and the trade-offs between time and space in algorithm design.


Lecture Notes

Algorithm Analysis
Session 1: Algorithms
Introduction to algorithms

In this session, you have seen the importance of algorithms in a computer program. Specifically,
Computer Program = Algorithms + Data structures

What is an algorithm?
An algorithm is a method of solving a problem through a set of sequential instructions.

Example 1: In the Ola app, when you request a pickup from location A and a drop at location B, an algorithm running behind the screen locates a nearby cab and finds the shortest route to the destination.
Example 2: In a similar way, Ola Share uses another algorithm that identifies multiple requests from different people en route to the destination and works out the shortest distances for their respective pickups and drops.

Algorithm 1
We discussed a practical scenario where students can register for a course both online and offline at the academic office, and certain students registered for the same course in both modes.

You have learnt an algorithm to find the student IDs registered twice for the same course.
Algorithm 1

We have assumed that the university has no more than 10000 students and that student IDs are integers between 1 and 10000.
The combined data of student IDs is stored in an array variable id[ ], indexed from i = 0 to n - 1, where:

id[i] : the ith registered student ID

n : length of the array
The nested for loops in algorithm 1 iterate across the data and compare each student ID with the other registered IDs to find duplicates.

For the first iteration of the outer loop(i), i.e., when i = 0, the inner loop(j) iterates from 1 (i + 1) to n - 1 and compares the first student ID (id[0]) with the student IDs to its right until it finds the duplicate of id[0].

For the second iteration of the outer loop(i), i.e., when i = 1, the inner loop(j) iterates from 2 (i + 1) to n - 1 and compares the second student ID (id[1]) with the student IDs to its right until it finds the duplicate of id[1].

For each iteration of the inner loop(j), the instruction set compares id[i] and id[j]; if they are equal, it prints the duplicate ID and breaks out of the inner loop(j).

In this way the outer loop(i) iterates from 0 to n - 1, and for each of its iterations the inner loop(j) instruction set is executed n - (i + 1) times, printing all the duplicate IDs found.
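The nested-loop approach described above can be sketched as follows (a minimal illustration; the class and method names are assumptions, not from the original notes):

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateFinderV1 {
    // Algorithm 1: compare each ID with every ID to its right.
    // Time: O(n^2) comparisons, Space: O(1) extra (ignoring the output list).
    public static List<Integer> findDuplicates(int[] id) {
        List<Integer> duplicates = new ArrayList<>();
        int n = id.length;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (id[i] == id[j]) {
                    duplicates.add(id[i]); // duplicate found
                    break;                 // stop scanning for this id[i]
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int[] id = {1, 5, 8, 5, 3, 8};
        System.out.println(findDuplicates(id)); // prints [5, 8]
    }
}
```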

Algorithm 2

As you know, there is typically more than one way to solve the same problem, and you have seen another approach to finding the duplicate student IDs in the given data.
Algorithm 2
In algorithm 2, we declare an extra array variable count[ ] whose size equals the maximum possible student ID, i.e. 10000, besides the array id[ ] holding the student registration data.

Each index of the count array corresponds to a student ID. For id[i] (the ith student), count[id[i]] is incremented to 1; if the same ID appears again, the same cell is incremented to 2.

As you can observe, there are no duplicates in the above data, so the cells corresponding to each student ID in the count array are incremented to 1.

Whereas if there is a duplicate student ID in the given data,

Here, student IDs 5 and 8 are repeated in the given data, so you can see that the array cells corresponding to the duplicate student IDs (5 and 8) are incremented to 2.

When count[id[i]] equals 2, the duplicate student ID is printed.
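The counting approach can be sketched as follows (a minimal illustration; the class and method names are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateFinderV2 {
    private static final int MAX_ID = 10000; // assumed maximum student ID

    // Algorithm 2: count occurrences of each ID in an extra array.
    // Time: O(n), Space: O(MAX_ID) extra.
    public static List<Integer> findDuplicates(int[] id) {
        int[] count = new int[MAX_ID + 1]; // index = student ID
        List<Integer> duplicates = new ArrayList<>();
        for (int i = 0; i < id.length; i++) {
            count[id[i]]++;
            if (count[id[i]] == 2) { // second occurrence => duplicate
                duplicates.add(id[i]);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int[] id = {5, 8, 5, 2, 8};
        System.out.println(findDuplicates(id)); // prints [5, 8]
    }
}
```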
Parameters

Now, you need to analyse the above two algorithms and find the more efficient one. We discussed the parameters on which an algorithm is analysed:

• How long does an algorithm take to produce output?

• How much memory space is required to execute an algorithm?

Both execution time and memory space are calculated as a function of input size(n).

In general, to analyse an algorithm you count the number of times a certain instruction set is executed rather than measuring exact time values, since those depend on external factors such as processor speed, the compiler, etc. Also, while analysing an algorithm, we consider the worst case possible.

To understand the worst case, you saw the analogy of opening a lock when you have ten different keys and do not know which one is right. Trying keys by trial and error, the worst case is that the lock opens on your tenth attempt, and the best case is that it opens on your first attempt.

Time complexity
The worst case to consider in algorithm 1 is when there are no duplicate student IDs, so for every ith iteration of the outer loop(i), the inner loop instruction set is executed n - (i + 1) times.

Therefore, the total number of times the inner loop instruction set is executed
= (n − 1) + (n − 2) + … + 1
= n(n − 1)/2

In a similar way, algorithm 2 executes a certain instruction set n times to find the duplicate student IDs.
Assuming constant times:
1. C1 for the rest of the instruction set, such as declaring variables, passing data to functions, etc.

2. C2 for the instruction set inside the loop that finds duplicate student IDs.

Therefore, the total time taken:

Algorithm 1 – T(n) = C1 + [n(n − 1)/2] * C2

Algorithm 2 – T(n) = C1 + n * C2

The total time taken to execute an algorithm as a function of input size(n) is called time complexity of an
algorithm and is represented by T(n).

Space complexity
The other parameter considered to analyse an algorithm is memory space required to execute an algorithm.

The total memory space required to execute an algorithm as a function of input size(n) is called space
complexity of an algorithm and is represented by S(n). In general, you only calculate the extra memory required,
not including the memory needed to store the input.

As you have learnt, algorithm 1 uses constant memory space besides the array variable id[ ] that stores student IDs, whereas algorithm 2 uses an extra array variable count[ ] whose size grows linearly with the student strength of the university; i.e., when a student ID exceeds 10000, the maximum student ID is taken as the size of the count array.

Therefore,
Algorithm 1 – S(n): constant space; the memory space does not depend on the input size

Algorithm 2 – S(n): linearly proportional to the number of possible students in the university

Asymptotic notations

After obtaining the complexity functions of both algorithms, you have learnt mathematical notations like Big O, Big Omega (Ω) and Big Theta (Θ), called asymptotic notations, which help us compare the complexity functions of two different algorithms.

Big O
Big O indicates the upper bound (worst case) of the running time or space complexity of an algorithm.

To calculate the Big O of any function, we discussed certain simplification rules as,
• Drop the constant multiplier in a function as it depends on hardware like processor speed, on which
the program is run.
For example,
i) T(n) = 2n ⇒ T(n) ∈ O(n)
ii) T(n) = 5n² ⇒ T(n) ∈ O(n²)
• Drop the less significant terms in a polynomial function. Apart from the highest-order term, the rest contribute relatively little to the growth of the function.
For example,
T(n) = 10n³ + n² + 4n + 800 ⇒ T(n) ∈ O(n³)

If n = 1000, then T(n) = 10,001,004,800.

If you drop all the less significant terms except 10n³, the error is about 0.01%, which is very minimal.

Definition: Big O means “bounded above by” (upper bound): if there exist constants c > 0 and N > 0 such that T(n) ≤ c·f(n) for all n > N, then T(n) ∈ O(f(n)).

If an algorithm has the time complexity function T(n) = 2n² − 3n + 6,

when calculating Big O, the time complexity function must be less than or equal to the upper bound. Since the input size n is positive, −3n + 6 becomes negative as n grows:

−3n + 6 < 0 ⇒ 3n > 6 ⇒ n > 2

So, T(n) ≤ 2n² for n > 2.

Therefore, T(n) ≤ 2n² ⇒ T(n) ∈ O(n²)

Big Omega
Big Omega indicates the lower bound of the running time or space complexity of an algorithm. In the lock-opening scenario, the best case is that you unlock it on your very first attempt.

Definition: Big Omega (Ω) means “bounded below by” (lower bound): if there exist constants c > 0 and N > 0 such that T(n) ≥ c·f(n) for all n > N, then T(n) ∈ Ω(f(n)).

For the same time complexity function T(n) = 2n² − 3n + 6,

when calculating Big Omega, the time complexity function must be greater than or equal to the lower bound:

T(n) = 2n² − 3n + 6
 ≥ 2n² − 3n (because 6 > 0)
 ≥ n², for n ≥ 3 (since 2n² − 3n ≥ n² ⇔ n² ≥ 3n ⇔ n ≥ 3)

So, T(n) ≥ n² for all n ≥ 3.

Therefore, T(n) ∈ Ω(n²)


Big Theta
Big Theta represents a tight bound: the running time or space complexity is bounded both above and below.

Definition: Big Theta (Θ) means “bounded above and below”: if there exist constants c1 > 0, c2 > 0 and N > 0 such that c1·f(n) ≤ T(n) ≤ c2·f(n) for all n > N, then T(n) ∈ Θ(f(n)).

For the same time complexity function T(n) = 2n² − 3n + 6, as already discussed,

Upper bound: 2n² − 3n + 6 ≤ 2n²

Lower bound: n² ≤ 2n² − 3n + 6

So, n² ≤ 2n² − 3n + 6 ≤ 2n² for n ≥ 3 (= N), and therefore T(n) ∈ Θ(n²).

Among all the three asymptotic notations, the most used notation to compare algorithms is Big O.
Therefore, you always tend to find the worst case of an algorithm with respect to the input size(n).

Rule of Sums and Rule of Products


You have learnt two rules which are used in general while analysing for time complexity of algorithms,

Rule of sums
In an algorithm, when two for loops appear one after another as follows,

the first for loop(i) instruction set is executed n times and the second for loop(j) instruction set is executed m times.
So, in total n + m steps are taken in executing this kind of algorithm. The time complexity of this algorithm is T(n) = n + m (neglecting the constant time taken by the instruction set).
If n > m, then T(n) = n + m
≤ n + n
= 2n
On dropping the constant multiplier 2, T(n) ∈ O(n)
Rule of products
In an algorithm, when two for loops are nested as follows,

the outer loop(i) iterates n times, and for each of its iterations the inner loop(j) instruction set is executed m times. In total, the instruction set inside the inner loop is executed n * m times, so the time complexity function of the algorithm is T(n) = n * m (neglecting the constant time taken by the instruction set).

If n > m, then T(n) = n * m
≤ n * n
= n²
So, T(n) ∈ O(n²)
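Both rules can be sketched with step counters (a minimal illustration; the class and method names are assumptions):

```java
public class LoopRules {
    // Rule of sums: two loops one after another => n + m steps.
    public static int sequentialSteps(int n, int m) {
        int steps = 0;
        for (int i = 0; i < n; i++) steps++; // first loop: n steps
        for (int j = 0; j < m; j++) steps++; // second loop: m steps
        return steps; // n + m => O(n) when n > m
    }

    // Rule of products: nested loops => n * m steps.
    public static int nestedSteps(int n, int m) {
        int steps = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                steps++; // inner instruction set runs n * m times
        return steps; // n * m => O(n^2) when n > m
    }

    public static void main(String[] args) {
        System.out.println(sequentialSteps(4, 3)); // prints 7
        System.out.println(nestedSteps(4, 3));     // prints 12
    }
}
```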

Big-O and Growth Rates


You have learnt different functions encountered when analysing algorithms as follows,

To find out the relative efficiency of these functions, the following table was discussed in detail; it gives an idea of the time taken for different input sizes.
To interpret the table, we assumed that a computer performs 1 billion (10⁹) operations per second.

Hence, the efficiency order of different functions


O(1) > O(log n) > O(n) > O(n log n) > O(n²) > O(n³)

Using this, you can compare different complexity functions of algorithms and find the efficient algorithm.

Duplicates
After finding the relative efficiency of the different functions, you have learnt how to compare the two algorithms for finding duplicate student IDs and identify the more efficient one.

Time complexity:

Algorithm 1 – T(n) = C1 + [n(n − 1)/2] * C2 ⇒ T(n) ∈ O(n²) – quadratic function

Algorithm 2 – T(n) = C1 + n * C2 ⇒ T(n) ∈ O(n) – linear function

You can observe that algorithm 2, O(n), is more efficient than algorithm 1, O(n²).

Space complexity:

Algorithm 1 – S(n ) ∈ O(1) – constant function

Algorithm 2 – S(n) ∈ O(n) – linear function

You can observe that algorithm 1: O(1) is more efficient when compared to algorithm 2: O(n)
Time vs Space Complexity Trade-off:

As you’ve seen with the algorithms for identifying duplicate student IDs, the two algorithms make different trade-offs in terms of time and space.

Specifically:
• Algorithm 1 runs slower, but uses less memory
• Algorithm 2 runs faster, but uses more memory

As a software developer, you’ll often face this kind of dilemma while designing programs and software. Do you write a program that runs fast but uses a lot of memory? Or do you write a program that runs slower but uses less memory?

The answer is it depends.

For example, if you are writing software for high-frequency stock trading, where every microsecond can be the difference between earning or losing hundreds of thousands of dollars, you will likely want to design programs that execute very quickly at the expense of using a lot of memory.

On the other hand, you may be writing software that runs on smartphones where the memory available to
the software is limited. In this situation, you may want to write software that uses less memory but runs a
bit more slowly.

Therefore, use your best judgement when it comes to Time vs Space Complexity trade-off. Identify your
business needs or constraints, and then decide if you should trade space for time or vice versa.
Session 2: Run-time Analysis
Fibonacci sequence

In this session, you have been introduced to a mathematical function called Fibonacci sequence, which is
defined as:
F(0) = 0
F(1) = 1
F(n) = F(n - 1) + F(n - 2)

We made a small change to the above function, F(n) = [F(n - 1) + F(n - 2)] % 10, i.e., taking the result modulo 10 in order to avoid integer overflow errors for larger input (n) values. We then discussed calculating the nth number of the function, i.e., for n = 0, 1, 2, 3, 4, 5, …

You have learnt algorithm 1 to generate nth number of the function F(n) = [F(n - 1) + F(n - 2)]%10.

Algorithm 1

As you have learnt, the function fibonacci() is called inside itself, which makes it a recursive function.
Recursive function
Recursive function is a function which calls itself during its execution, and a recursive function typically needs
to define two cases:
1. the base case that returns a definitive value and
2. the recursive case where the recursive function calls itself and tries to solve smaller parts of the
problem at hand

In algorithm 1, the if condition acts as the terminating or base condition, returning the definite values that end the recursive calls of fibonacci(), and the else branch acts as the recursive case, calling fibonacci() again until the argument passed satisfies the base condition. To understand exactly how the recursive function generates the Fibonacci number for a given input(n), we discussed a recursion tree for generating the 5th Fibonacci number.

In it, F(0) and F(1) are the terminating conditions that help to find the rest of the values and print the final output.
Then, we demonstrated the code of algorithm 1 for different input sizes, such as n = 4 and n = 42.
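The recursive algorithm 1 described above can be sketched as follows (a minimal illustration; the original notes’ listing is not reproduced here, so the class name is an assumption):

```java
public class FibonacciRecursive {
    // Algorithm 1: direct recursion on F(n) = [F(n-1) + F(n-2)] % 10.
    // Time: O(2^n) additions, Space: O(n) call stack.
    public static int fibonacci(int n) {
        if (n == 0 || n == 1) {
            return n; // base cases: F(0) = 0, F(1) = 1
        }
        return (fibonacci(n - 1) + fibonacci(n - 2)) % 10; // recursive case
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(10)); // prints 5 (the last digit of 55)
        // fibonacci(100) would take far too long: roughly 2^100 calls
    }
}
```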

But when the input given is n = 100, the program takes so long that you never see the output. If you run this code on your system for n = 100, even after waiting for hours the program will still be processing without producing any output.
Time complexity of algorithm 1

We considered that,

T(n) = number of additions required to compute F(n)

The number of additions required to calculate F(0) and F(1) is 0, as the if condition returns the value directly without any additions.

Then, we have calculated the upper bound for the time complexity function as follows,

So, the time complexity of algorithm 1 is an exponential function, O(2ⁿ), which is really slow; this is why algorithm 1 is unable to produce an output when n = 100.
Space complexity of algorithm 1

With respect to the memory space, i.e., the space complexity required for algorithm 1, we again discussed the example of generating the 5th Fibonacci number and how memory space is occupied on each recursive call.

Think of memory space as partitions, as shown: each partition is occupied as a function is called, and the same function pops out of memory when its work is done and it returns a value.

As you can observe, the maximum memory space required is proportional to the depth of recursion, i.e., the input n. So, the space complexity S(n) ∈ O(n) – a linear function.

Recursion

We have discussed a real-world problem “file search” in a laptop using recursion,

The file search process as follows,

Consider each folder as a directory that may contain many subdirectories (folders). To search for a specific filename, pass the filename you are looking for and the directory path (folder path) where the search should start to the recursive function.

Then follow certain steps inside the recursive function as


1. List all the contents inside the file directory passed to the function
2. Loop through the contents inside the file directory,
- If a given entry is another directory, recursively call the function, passing in that directory and its path.
- If the entry is a file, check whether it matches the filename you are searching for. If it matches, then return “file is found” along with the file path.

This has been demonstrated using Java code as follows.
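A minimal sketch of this recursive search (the original notes’ Java listing is not reproduced here, so the class and method names are assumptions):

```java
import java.io.File;

public class FileSearch {
    // Recursively search for fileName starting at directory dir.
    // Returns the full path if found, or null otherwise.
    public static String findFile(String fileName, File dir) {
        File[] contents = dir.listFiles(); // 1. list the directory contents
        if (contents == null) return null; // not a directory / not readable
        for (File entry : contents) {      // 2. loop through the contents
            if (entry.isDirectory()) {
                String found = findFile(fileName, entry); // recursive case
                if (found != null) return found;
            } else if (entry.getName().equals(fileName)) {
                return entry.getPath();    // base case: file found
            }
        }
        return null; // not found in this subtree
    }

    public static void main(String[] args) {
        String path = findFile("notes.txt", new File("."));
        System.out.println(path != null ? "file is found: " + path : "file not found");
    }
}
```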

Algorithm 2

Now, coming back to the function F(n) = [F(n - 1) + F(n - 2)] % 10:

in algorithm 1, we did not store values such as F(2) and F(3) and had to recalculate them every time they were required.

Algorithm 2 overcomes these redundant calculations by storing all the calculated values in an array variable f[ ]. Therefore, if we need a Fibonacci value that has been previously calculated, we can simply refer back to the value stored in f[ ] rather than recalculating it.
Algorithm 2

As you can observe, there is only one for loop, iterating from 2 to n, i.e., n − 1 times, executing certain instructions to generate the nth number of the function F(n) = [F(n - 1) + F(n - 2)] % 10.
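The array-based algorithm 2 described above can be sketched as follows (a minimal illustration; the class name is an assumption):

```java
public class FibonacciArray {
    // Algorithm 2: store every value of F(i) = [F(i-1) + F(i-2)] % 10 in f[].
    // Time: O(n) additions, Space: O(n) for the array f[].
    public static int fibonacci(int n) {
        if (n < 2) return n;   // F(0) = 0, F(1) = 1
        int[] f = new int[n + 1];
        f[0] = 0;
        f[1] = 1;
        for (int i = 2; i <= n; i++) {         // runs n - 1 times
            f[i] = (f[i - 1] + f[i - 2]) % 10; // reuse the stored values
        }
        return f[n];
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(100)); // prints 5: fast, unlike algorithm 1
        // fibonacci(500_000_000) would fail: the array alone exceeds memory
    }
}
```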

Therefore,
Time Complexity:
T(n) = No. of additions to compute F(n)
So, T(n) = n – 1
Therefore, T(n) ∈ O(n), linear time

With respect to the memory space, besides declaring variable n for input, you need to create an array
variable f[ ] of size n in order to store all the calculated values of the function F(n) = [F(n - 1) + F(n - 2)]%10

Therefore,
Space Complexity:
An extra array variable f[] is defined, whose size is dependent on input variable n
So, S(n) ∈ O(n), linear in memory space

After analysing algorithm 2 with respect to the time taken and memory space required, we ran the Java code for input values like n = 4 and n = 100, and with algorithm 2 we were able to generate the output for n = 100. So, algorithm 2 overcomes the constraint of algorithm 1 and computes the nth number of the function when n = 100.

But for an input n = 5 × 10⁸, the program gives an error as follows.

It reports an out-of-memory error because the memory needed to create an array of size 5 × 10⁸ is much larger than the total memory available to the entire program. Therefore, algorithm 2 cannot produce the output when n = 5 × 10⁸.

If you compare algorithm 2 with algorithm 1,


Algorithm 1 – T(n) ∈ O(2ⁿ) – Exponential time
S(n) ∈ O(n) – Linear space

Algorithm 2 – T(n) ∈ O(n) – Linear time


S(n) ∈ O(n) – Linear space

With respect to execution time, algorithm 2 with O(n) is better than algorithm 1 with O(2ⁿ), and so it is able to compute the Fibonacci number for n = 100 in no time.

However, algorithm 2 must still be improved with respect to the memory space required; otherwise it can’t store and process Fibonacci numbers for large input values such as n = 5 × 10⁸.
Algorithm 3

To overcome the memory space constraint in algorithm 2, you have learnt algorithm 3, a clever technique
that calculates the Fibonacci sequence by using three different variables a, b & c.

Algorithm 3

The variables are initialized as


a = 0, b = 1, c = n

and then for each iteration of for loop(i), variables are assigned as follows
c = (a + b)%10;
a = b;
b = c;

The values for the first two iterations are listed below; after the nth iteration, variable c holds the nth number of the function F(n) = [F(n - 1) + F(n - 2)] % 10.
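The three-variable algorithm 3 described above can be sketched as follows (a minimal illustration; the class name is an assumption, and n < 2 is handled explicitly since the loop only runs for n ≥ 2):

```java
public class FibonacciConstantSpace {
    // Algorithm 3: keep only the last two values in variables a and b.
    // Time: O(n), Space: O(1) — just three int variables.
    public static int fibonacci(int n) {
        if (n < 2) return n; // F(0) = 0, F(1) = 1
        int a = 0, b = 1, c = n;
        for (int i = 2; i <= n; i++) { // runs n - 1 times
            c = (a + b) % 10; // next value of the sequence
            a = b;            // shift the window forward
            b = c;
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(10)); // prints 5
        // even n = 1_000_000_000 needs only three int variables
    }
}
```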
The Java code of algorithm 3 was executed for different values of n; in particular, for n = 10⁹ the output is processed and displayed as follows.

So, with this we are able to confirm that algorithm 3 overcomes the memory space constraint of algorithm 2. More specifically, unlike algorithm 2, the memory required for algorithm 3 is constant and independent of the input n. Analysing algorithm 3 with respect to time taken and memory space required:

The instruction set that generates the nth number of the function is executed n − 1 times.
So, the time complexity of algorithm 3 is T(n) = n − 1 ⇒ T(n) ∈ O(n) – linear time.

Memory space required is constant, as only three variables are used to process the output irrespective of
the input size(n), so the space complexity of algorithm 3, S(n) ∈ O(1) – constant space

Summary
Algorithm 1 – T(n) ∈ O(2ⁿ) – Exponential time
S(n) ∈ O(n) – Linear space

Algorithm 2 – 𝑇(𝑛) ∈ 𝑂(𝑛) – Linear time


𝑆(𝑛) ∈ 𝑂(𝑛) – Linear space

Algorithm 3 – 𝑇(𝑛) ∈ 𝑂(𝑛) – Linear time


𝑆(𝑛) ∈ 𝑂(1) – Constant space

The runtime and space complexity of algorithm 3 are O(n) and O(1) respectively, which is much more efficient than the other two algorithms. Therefore, algorithm 3 can compute the nth number in the Fibonacci sequence for values as large as n = 10⁹, which would otherwise be too large for algorithm 1 and algorithm 2.
