DATA STRUCTURE
BRANCH-IT
SEMESTER-3rd
Th-2 DATA STRUCTURE
Common to (CSE/IT)
B. RATIONALE: The study of data structures is an essential part of computer science. A data
structure is a logical and mathematical model for storing and organizing data in a particular way in a
computer. The methods and techniques of data structures are widely used in both system
programming and application programming. The study of data structures helps students develop
logical, well-structured programs.
C. OBJECTIVE: After completion of this course the student will be able to:
• Understand the concepts of linear data structures, their operations and applications
• Understand the operation in abstract data type like Stack and Queue.
• Understand the concept of pointers and their operations in linked list.
• Know the concepts of non-linear data structures, their operations and applications in trees
and graphs.
• Understand the various sorting and searching techniques.
• Understand file storage and access techniques.
D. DETAIL CONTENT:
INTRODUCTION
ARRAYS: Introduction to arrays.
LINKED LIST: Introduction to linked lists; insertion into a linked list, deletion from a linked list, header linked lists.
TREE: Basic terminology of trees; binary trees, their representation and traversal; binary search trees, searching.
GRAPHS
Merging
FILE ORGANIZATION: Different types of file organization and their access methods.
Book Recommended:-
DATA
Data is a set of values of qualitative or quantitative variables. Data in computing (or data
processing) is represented in a structure that is often tabular (represented by rows and
columns), a tree (a set of nodes with parent-children relationship), or a graph (a set of
connected nodes). Data is typically the result of measurements and can be visualized using
graphs or images.
Data as an abstract concept can be viewed as the lowest level of abstraction, from which
information and then knowledge are derived.
INFORMATION
Information is that which informs us with some valid meaning, i.e. that from which data can
be derived. Information is conveyed either as the content of a message or through direct or
indirect observation of some thing. Information can be encoded into various forms for
transmission and interpretation. For example, information may be encoded into signs, and
transmitted via signals.
DATA TYPE
Data types are used within type systems, which offer various ways of defining,
implementing and using the data. Different type systems ensure varying degrees of type
safety.
Almost all programming languages explicitly include the notion of data type, though
different languages may use different terminology. Common data types include:
Integers,
Booleans,
Characters,
Floating-point numbers,
Alphanumeric strings.
For example, in the Java programming language, the "int" type represents the set of 32-bit
integers ranging in value from -2,147,483,648 to 2,147,483,647, as well as the operations
that can be performed on integers, such as addition, subtraction, and multiplication.
Colors, on the other hand, might be represented by three bytes denoting the amounts of
red, green, and blue, plus one string representing the color's name; allowable operations
might include addition and subtraction, but not multiplication.
Most programming languages also allow the programmer to define additional data types,
usually by combining multiple elements of other types and defining the valid operations of
the new data type. For example, a programmer might create a new data type named
"complex number" that would include real and imaginary parts. A data type also represents
a constraint placed upon the interpretation of data in a type system, describing
representation, interpretation and structure of values or objects stored in computer memory.
The type system uses data type information to check the correctness of computer programs that access or manipulate the data.
For example, a machine typically has a specific set of arithmetic instructions that interpret the bits
in a word as a floating-point number. Machine data types need to be exposed or made
available in systems or low-level programming languages, allowing fine-grained control over
hardware. The C programming language, for instance, supplies integer types of various
widths, such as short and long. If a corresponding native type does not exist on the target
platform, the compiler will break them down into code using types that do exist. For
instance, if a 32-bit integer is requested on a 16 bit platform, the compiler will tacitly treat it
as an array of two 16 bit integers. Several languages allow binary and hexadecimal literals,
for convenient manipulation of machine data.
In higher level programming, machine data types are often hidden or abstracted as an
implementation detail that would render code less portable if exposed. For instance, a
generic numeric type might be supplied instead of integers of some specific bit-width.
BOOLEAN TYPE
The Boolean type represents the values true and false. Although only two values are
possible, they are rarely implemented as a single binary digit for efficiency reasons. Many
programming languages do not have an explicit boolean type, instead interpreting (for
instance) 0 as false and other values as true.
Integer data types, or "whole numbers", may be subtyped according to
their ability to contain negative values (e.g. unsigned in C and C++). They may also have a
small number of predefined subtypes (such as short and long in C/C++), or allow users to
freely define subranges such as 1..12 (e.g. Pascal/Ada).
Floating-point data types, sometimes misleadingly called reals, contain
fractional values. They usually have predefined limits on both their maximum values and
their precision. They are usually written as decimal numbers but stored internally in binary
floating-point form.
Fixed point data types are convenient for representing monetary values. They
are often implemented internally as integers, leading to predefined limits.
Bignum or arbitrary precision numeric types lack predefined
limits. They are not primitive types, and are used sparingly for efficiency
reasons.
Composite types are derived from more than one primitive type. This can be done in a
number of ways. The ways they are combined are called data
structures. Composing a primitive type into a compound type generally results in a new
type, e.g. array-of-integer is a different type to integer.
Many others are possible, but they tend to be further variations and compounds of the
above.
ENUMERATED TYPE
This has values which are different from each other, and which can be compared and
assigned, but which do not necessarily have any particular concrete representation in the
computer's memory; compilers and interpreters can represent them arbitrarily. For
example, the four suits in a deck of playing cards may be four enumerators named CLUB,
DIAMOND, HEART, SPADE, belonging to an enumerated type named suit. If a variable V
is declared having suit as its data type, one can assign any of those four values to it.
Some implementations allow programmers to assign integer values to the enumeration
values, or even treat them as type-equivalent to integers.
Character and string types can store sequences of characters from a character set such as
ASCII. Since most character sets include the digits, it is possible to have a numeric string,
such as "1234". However, many languages would still treat these as belonging to a
different type to the numeric value 1234.
Character and string types can have different subtypes according to the required character
"width". The original 7-bit wide ASCII was found to be limited and superseded by 8 and 16-
bit sets.
Any type that does not specify an implementation is an abstract data type. For instance, a
stack (which is an abstract type) can be implemented as an array (a contiguous block of
memory containing multiple values) or as a linked list (a set of nodes, each holding a
pointer to the next).
Examples include:
• A queue is a first-in, first-out (FIFO) list. Variations include the deque and the priority queue.
• A set can store certain values, without any particular order and with no repeated values.
• A stack is a last-in, first-out (LIFO) list.
• A tree is a hierarchical structure.
• A graph is a set of nodes connected by edges.
• A hash (also called a dictionary, map, or associative array) is a more flexible variation on a record, in which name-value pairs can be added and deleted freely.
• A smart pointer is the abstract counterpart to a pointer. Both are kinds of reference.
For convenience, high-level languages may supply ready-made "real world" data types, for
instance times, dates and monetary values and memory, even where the language allows
them to be built from primitive types.
Types can be based on, or derived from, the basic types explained above. In some
languages, such as C, functions have a type derived from the type of their return value. The
main non-composite, derived type is the pointer, a data type whose value refers directly to
(or "points to") another value stored elsewhere in the computer memory using its address.
It is a primitive kind of reference. (In everyday terms, a page number in a book could be
considered a piece of data that refers to another one). Pointers are often stored in a format
similar to an integer; however, attempting to dereference or "look up" a pointer whose
value was never a valid memory address would cause a program to crash. To ameliorate
this potential problem, pointers are considered a separate type to the type of data they
point to, even if the underlying representation is the same.
DATA STRUCTURE
Data Structure can be defined as the group of data elements which provides an efficient
way of storing and organising data in the computer so that it can be used efficiently. Some
examples of Data Structures are arrays, Linked List, Stack, Queue, etc. Data Structures
are widely used in almost every aspect of Computer Science i.e. Operating System,
Compiler Design, Artifical intelligence, Graphics and many more.
Efficiency: The efficiency of a program depends upon the choice of data structures. For
example, suppose we have some data and we need to search for a particular
record. If we organize our data in an unordered array, we will have to search sequentially,
element by element; hence, using an array may not be very efficient here. There are better
data structures which can make the search process efficient, such as an ordered array, a
binary search tree or a hash table.
Reusability: Data structures are reusable, i.e. once we have implemented a particular data
structure, we can use it at any other place. Implementation of data structures can be
compiled into libraries which can be used by different clients.
Abstraction: Data structure is specified by the ADT which provides a level of abstraction.
The client program uses the data structure through interface only, without getting into the
implementation details.
Primitive data structures are basic structures and are directly operated upon by
machine instructions.
Primitive data structures have different representations on different computers.
Integers, floats, character and pointers are examples of primitive data structures.
These data types are available in most programming languages as built in type.
o Integer: a data type for whole numbers, i.e. values without a fractional part.
o Float: a data type used for storing fractional (real) numbers.
o Character: a data type used for single character values.
LINEAR DATA STRUCTURES: A data structure is called linear if all of its elements are
arranged in a linear order. In linear data structures, the elements are stored in a non-
hierarchical way where each element has a successor and a predecessor, except the first
element (which has no predecessor) and the last (which has no successor).
Arrays: An array is a collection of similar type of data items and each data item is called an
element of the array. The data type of the element may be any valid data type like char, int,
float or double.
The elements of array share the same variable name but each one carries a different index
number known as subscript. The array can be one dimensional, two dimensional or
multidimensional.
Linked List: Linked list is a linear data structure which is used to maintain a list in the
memory. It can be seen as the collection of nodes stored at non-contiguous memory
locations. Each node of the list contains a pointer to its adjacent node.
Stack: Stack is a linear list in which insertion and deletions are allowed only at one end,
called top.
A stack is an abstract data type (ADT), can be implemented in most of the programming
languages. It is named as stack because it behaves like a real-world stack, for example: -
piles of plates or deck of cards etc.
The basic concept can be illustrated by thinking of your data set as a stack of plates or
books where you can only take the top item off the stack in order to remove things from it.
This structure is used all throughout programming.
The basic implementation of a stack is also called a "Last In, First Out" (LIFO)
structure; however, there are different variations of stack implementations.
There are basically three operations that can be performed on stacks. They are:
inserting ("pushing") an item onto the stack
deleting ("popping") an item from the stack
displaying the contents of the top item of the stack ("peeking")
Queue: Queue is a linear list in which elements can be inserted only at one end
called rear and deleted only at the other end called front.
It is an abstract data structure, similar to a stack, but open at both ends: elements enter at
the rear and leave at the front, so it follows the First-In-First-Out (FIFO) methodology for
storing data items.
NON-LINEAR DATA STRUCTURES: This type of data structure does not form a sequence,
i.e. each item or element may be connected with two or more other items in a non-linear
arrangement. The data elements are not arranged in a sequential structure.
Trees: Trees are multilevel data structures with a hierarchical relationship among their
elements, known as nodes. The bottommost nodes in the hierarchy are called leaf
nodes, while the topmost node is called the root node. Each node contains pointers to
its adjacent nodes.
The tree data structure is based on the parent-child relationship among the nodes. Each node
in the tree can have more than one child, except the leaf nodes, whereas each node can
have at most one parent, except the root node. Trees can be classified into many categories,
which will be discussed later in this tutorial.
Graphs: Graphs can be defined as the pictorial representation of a set of elements
(represented by vertices) connected by links known as edges. A graph differs from a
tree in that a graph can contain a cycle, while a tree cannot.
A graph data structure consists of a finite (and possibly mutable) set of ordered
pairs, called edges or arcs, of certain entities called nodes or vertices. As in
mathematics, an edge (x, y) is said to point or go from x to y. The nodes may be part of the
graph structure, or may be external entities represented by integer indices or references. A
graph data structure may also associate with each edge some edge value, such as a symbolic
label or a numeric attribute.
1) Traversing: Every data structure contains the set of data elements. Traversing the data
structure means visiting each element of the data structure in order to perform some
specific operation like searching or sorting.
2) Insertion: Insertion can be defined as the process of adding an element to the data
structure at any location.
If the data structure has capacity n and already holds n elements, no further element can be
inserted; attempting to do so causes overflow.
3) Deletion:The process of removing an element from the data structure is called Deletion.
We can delete an element from the data structure at any random location.
If we try to delete an element from an empty data structure then underflow occurs.
4) Searching: The process of finding the location of an element within the data structure is
called Searching. There are two algorithms to perform searching, Linear Search and Binary
Search. We will discuss each one of them later in this tutorial.
5) Sorting: The process of arranging the data structure in a specific order is known as
Sorting. There are many algorithms that can be used to perform sorting, for example,
insertion sort, selection sort, bubble sort, etc.
6) Merging: When two lists, List A of size M and List B of size N, containing elements of a
similar type, are clubbed or joined to produce a third list, List C of size (M + N), this process
is called merging.
ABSTRACT DATA TYPE
In computer science, an abstract data type (ADT) is a mathematical model for a certain
class of data structures that have similar behavior; or for certain data types of one or more
programming languages that have similar semantics. An abstract data type is defined
indirectly, only by the operations that may be performed on it and by mathematical
constraints on the effects (and possibly cost) of those operations.
For example, when analyzing the efficiency of algorithms that use stacks, one may also specify that all
operations take the same time no matter how many items have been pushed into the stack,
and that the stack uses a constant amount of storage for each element.
Abstract data types are purely theoretical entities, used (among other things) to simplify the
description of abstract algorithms, to classify and evaluate data structures, and to formally
describe the type systems of programming languages. However, an ADT may be
implemented by specific data types or data structures, in many ways and in many
programming languages; or described in a formal specification language. ADTs are often
implemented as modules: the module's interface declares procedures that correspond to
the ADT operations, sometimes with comments that describe the constraints. This
information hiding strategy allows the implementation of the module to be changed without
disturbing the client programs.
ALGORITHM
In computer science, the algorithms are evaluated by the determination of the amount of
resources (such as time and storage) necessary to execute them. Most algorithms are
designed to work with inputs of arbitrary length. Usually, the efficiency or running time of an
algorithm is stated as a function relating the input length to the number of steps (time
complexity) or storage locations (space complexity).
Algorithm analysis is an important part of a broader computational complexity theory, which
provides theoretical estimates for the resources needed by any algorithm which solves a
given computational problem. These estimates provide an insight into reasonable
directions of search for efficient algorithms.
In theoretical analysis of algorithms it is common to estimate their complexity in the
asymptotic sense, i.e., to estimate the complexity function for arbitrarily large input. Big O
notation, Big-omega notation and Big-theta notation are used to this end. For instance,
binary search is said to run in a number of steps proportional to the logarithm of the length
of the list being searched, or in O(log(n)), colloquially "in logarithmic time". Usually
asymptotic estimates are used because different implementations of the same algorithm
may differ in efficiency. However the efficiencies of any two "reasonable" implementations
of a given algorithm are related by a constant multiplicative factor called a hidden constant.
Exact (not asymptotic) measures of efficiency can sometimes be computed but they usually
require certain assumptions concerning the particular implementation of the algorithm,
called model of computation. A model of computation may be defined in terms of an
abstract computer, e.g., Turing machine, and/or by postulating that certain operations are
executed in unit time. For example, if the sorted list to which we apply binary search has n
elements, and we can guarantee that each lookup of an element in the list can be done in
unit time, then at most log2 n + 1 time units are needed to return an answer.
The best, worst and average case complexity refer to three different ways of measuring the
time complexity (or any other complexity measure) of different inputs of the same size.
Since some inputs of size n may be faster to solve than others, we define the following
complexities:
• Best-case complexity: This is the complexity of solving the problem for the best input
of size n.
• Worst-case complexity: This is the complexity of solving the problem for the worst
input of size n.
• Average-case complexity: This is the complexity of solving the problem on an
average. This complexity is only defined with respect to a probability distribution
over the inputs. For instance, if all inputs of the same size are assumed to be
equally likely to appear, the average case complexity can be defined with respect to
the uniform distribution over all inputs of size n.
TIME COMPLEXITY
In computer science, the time complexity of an algorithm quantifies the amount of time
taken by an algorithm to run as a function of the length of the string representing the input.
The time complexity of an algorithm is commonly expressed using big O notation, which
excludes coefficients and lower order terms. When expressed this way, the time complexity
is said to be described asymptotically, i.e., as the input size goes to infinity.
For example, if the time required by an algorithm on all inputs of size n is at most 5n^3 + 3n,
the asymptotic time complexity is O(n^3).
Time complexity is commonly estimated by counting the number of elementary operations
performed by the algorithm, where an elementary operation takes a fixed amount of time
to perform. Thus the amount of time taken and the number of elementary operations
performed by the algorithm differ by at most a constant factor.
Since an algorithm's performance time may vary with different inputs of the same size, one
commonly uses the worst-case time complexity of an algorithm, denoted as T(n), which is
defined as the maximum amount of time taken on any input of size n. Time complexities
are classified by the nature of the function T(n). For instance, an algorithm with T(n) = O(n)
is called a linear time algorithm, and an algorithm with T(n) = O(2^n) is said to be an
exponential time algorithm.
SPACE COMPLEXITY
The way in which the amount of storage space required by an algorithm varies with the
size of the problem it is solving. Space complexity is normally expressed as an order of
magnitude, e.g. O(N^2) means that if the size of the problem (N) doubles then four times
as much working storage will be needed.
ASYMPTOTIC NOTATIONS
The commonly used asymptotic notations for describing the running-time complexity
of an algorithm are given below:
o Big oh Notation (Ο)
o Omega Notation (Ω)
o Theta Notation (θ)
Big oh Notation (O): the formal way to express the upper bound of an algorithm's
running time. It measures the worst-case time complexity, i.e. the longest amount of time
the algorithm can take to complete its operation.
Omega Notation (Ω): the formal way to represent the lower bound of an
algorithm's running time. It measures the best amount of time an algorithm
can possibly take to complete, i.e. the best-case time complexity.
Theta Notation (θ): the formal way to express both an upper bound and a
lower bound on an algorithm's running time.
UNIT-II
STRING PROCESSING
A finite sequence S of zero or more characters is called a string. The string with zero
characters is called the empty string or null string.
If S1 = 'XY' and S2 = 'PQR', then the concatenation S1 || S2 = 'XYPQR'.
String data is of two types: (1) constant and (2) variable.
Constant string:
A constant string is fixed and is written in either single quotes ' ' or double quotes " ".
Ex: 'SONA'
"Sona"
Variable string:
Static string: one whose length is defined before the program is executed and cannot
change during execution.
Semi-static string: one whose length may vary, as long as it does not exceed a maximum
value determined before the program is executed.
Dynamic string: one whose length can change during the execution of the program.
STRING OPERATIONS:
SUBSTRING:
SUBSTRING(S, K, L) denotes the substring of string S beginning at position K and having
length L.
For example, SUBSTRING('TO BE OR NOT TO BE', 4, 7) = 'BE OR N'.
INDEXING:
Indexing, also called pattern matching, refers to finding the position where a string
pattern P first appears in a given string text T. We call this operation INDEX and write it
as INDEX(text, pattern).
If the pattern P does not appear in the text T, then INDEX is assigned the value 0. The
arguments text and pattern can be either string constants or string variables.
CONCATENATION:
Let S1 and S2 be strings. The concatenation of S1 and S2, written S1 || S2, is the string
consisting of the characters of S1 followed by the characters of S2.
LENGTH OPERATION:
The number of characters in a string is called its length. We will write LENGTH(string)
for the length of a given string; for example, LENGTH('Computer') = 8.
In BASIC: LEN(string)
In C: strlen(string)
String upper: strupr(string), e.g. strupr('computer') gives 'COMPUTER'
String lower: strlwr(string), e.g. strlwr('COMPUTER') gives 'computer'
String concatenation: strcat(string1, string2)
String reverse: strrev(string)
UNIT-III
ARRAY
Definition
• Arrays are defined as the collection of similar type of data items stored at
contiguous memory locations.
• Arrays are the derived data type in C programming language which can store the
primitive type of data such as int, char, double, float, etc.
• Array is the simplest data structure where each data element can be randomly
accessed by using its index number.
For example, if we want to store the marks of a student in 6 subjects, then we don't need
to define a different variable for the marks in each subject.
Instead, we can define an array which stores the marks in each subject at
contiguous memory locations. The array marks[6] holds the marks of the student in 6
different subjects, where each subject's marks are located at a particular subscript in the
array, i.e. marks[0] denotes the marks in the first subject, marks[1] denotes the marks in the
second subject, and so on.
An array is a list of a finite number n of homogeneous data elements, i.e. elements of the same
data type, such that:
• The elements of the array are referenced respectively by an index set consisting of
n consecutive numbers.
• The elements of the array are stored respectively in the successive memory
locations.
The number n of elements is called length or size of array. If not explicitly stated, we will
assume the index set consists of integers 1, 2, 3 …n. In general the length or the
number of data elements of the array can be obtained from the index set by the
formula
Length= UB – LB + 1
Where UB is the largest index, called the upper bound, and LB is the smallest index,
called the lower bound. Note that length = UB when LB = 1.
The elements of an array A are denoted by the subscript notation a1, a2, a3, …, an, by the
parenthesis notation A(1), A(2), …, A(n),
or by the bracket notation A[1], A[2], …, A[n].
We will usually use the subscript notation or the bracket notation.
Let LA be a linear array in the memory of the computer. Recall that the memory of the
computer is simply a sequence of addressed locations.
LOC (LA[k]) = address of element LA[k] of the array LA.
As previously noted, the elements of LA are stored in the successive memory cells.
Accordingly, the computer does not need to keep track of the address of every
element of LA, but needs to keep track only of the address of the first element of LA,
denoted by
Base (LA)
And called the base address of LA. Using base address the computer calculates the
address of any element of LA by the following formula:
LOC (LA[k]) = Base (LA) + w (k-lower bound)
Where w is the number of words per memory cell for the array LA.
OPERATIONS ON ARRAYS
Here LA is a linear array with lower bound LB and upper bound UB.
This algorithm traverses LA applying an operation PROCESS to each
element of LA.
1. [Initialize counter] Set K := LB.
2. Repeat Steps 3 and 4 while K <= UB.
3. [Visit element] Apply PROCESS to LA[K].
4. [Increase counter] Set K := K + 1.
   [End of Step 2 loop]
5. Exit.
OR
Here LA is a linear array with lower bound LB and upper bound UB.
This algorithm traverses LA applying an operation PROCESS to each
element of LA.
1. Repeat for K = LB to UB:
   Apply PROCESS to LA[K].
   [End of loop]
2. Exit.
Caution: The operation PROCESS in the traversal algorithm may use certain
variables which must be initialized before PROCESS is applied to any of the
elements in the array. Accordingly, the algorithm may need to be preceded by such
an initialization step.
INSERTION: Here LA is a linear array with N elements and K is a positive integer such that K <= N.
The following algorithm inserts an element ITEM into the Kth position in LA.
1. [Initialize counter] Set J := N.
2. Repeat Steps 3 and 4 while J >= K.
3. [Move Jth element downward] Set LA[J + 1] := LA[J].
4. [Decrease counter] Set J := J - 1.
   [End of Step 2 loop]
5. [Insert element] Set LA[K] := ITEM.
6. [Reset N] Set N := N + 1.
7. Exit.
DELETION: The following algorithm deletes the Kth element from a linear array LA
and assigns it to a variable ITEM.
1. Set ITEM := LA[K].
2. Repeat for J = K to N - 1:
   [Move (J + 1)th element upward] Set LA[J] := LA[J + 1].
   [End of loop]
3. [Reset N] Set N := N - 1.
4. Exit.
MULTIDIMENSIONAL ARRAY
int a[3][4];
Use:
for (i = 0; i < row; i++)
    for (j = 0; j < col; j++)
    {
        printf("%d ", a[i][j]);
    }
MEMORY REPRESENTATION
Array Representation:
• Column-major
• Row-major
Arrays may be represented in Row-major form or Column-major form. In Row- major
form, all the elements of the first row are printed, then the elements of the second row
and so on up to the last row. In Column-major form, all the elements of the first
column are printed, then the elements of the second column and so on up to the last
column. The C program to input an array of order m x n and print the array contents
in row-major and column-major order is given below. The following array elements may be
entered during run time to test this program:
Output:
Row Major:
1 2 3
4 5 6
7 8 9
Column Major:
1 4 7
2 5 8
3 6 9
Address(Arr[k])=Base(Arr)+w(k-LowerBound)
2-D arrays: Suppose Arr is a 2-D array. The first dimension of Arr contains the index
set 0, 1, 2, …, row-1 (the lower bound is 0 and the upper bound is row-1), and the
second dimension contains the index set 0, 1, 2, …, col-1 (with lower bound 0 and
upper bound col-1).
The length of each dimension is to be calculated; the product of the two lengths gives
the number of elements in the array.
Row-major order: (1,1), (1,2), (2,1), (2,2)
Column-major order: (1,1), (2,1), (1,2), (2,2)
Now we know that the computer keeps track of only the base address. So
the address of any specified location of an array, for example Arr[j, k]
of a 2-D array Arr[m, n], can be calculated by using the following formulas:
(Column-major order) Address(Arr[j, k]) = Base(Arr) + w[m(k - 1) + (j - 1)]
(Row-major order) Address(Arr[j, k]) = Base(Arr) + w[n(j - 1) + (k - 1)]
For example, Arr(25, 4) is an array with base address 200 and w = 4. The address
of Arr(12, 3) can be calculated using row-major order as
Address(Arr(12, 3)) = 200 + 4[4(12 - 1) + (3 - 1)]
= 200 + 4[4 * 11 + 2]
= 200 + 4[44 + 2]
= 200 + 4[46]
= 200 + 184
= 384
Using column-major order:
Address(Arr(12, 3)) = 200 + 4[25(3 - 1) + (12 - 1)]
= 200 + 4[50 + 11]
= 200 + 4[61]
= 200 + 244
= 444
SPARSE MATRIX
A matrix with a relatively high proportion of zero entries is called a sparse
matrix. Two general types of n-square sparse matrices occur in various
applications:
[Example triangular and tridiagonal matrices omitted.]
Triangular matrix
This is a matrix where all the entries above the main diagonal are zero, or
equivalently, where non-zero entries can only occur on or below the main diagonal; it is
called a (lower) triangular matrix.
Tridiagonal matrix
This is the matrix where non-zero entries can only occur on the diagonal or on
elements immediately above or below the diagonal is called a Tridiagonal matrix. The
natural method of representing matrices in memory as two-dimensional arrays may
not be suitable for sparse matrices, i.e. one may save space by storing only those
entries which may be non-zero.
UNIT-IV
Operations on stack: The two basic operations associated with stacks are:
1. Push
2. Pop
While performing push and pop operations the following test must be conducted on the
stack. a) Stack is empty or not b) stack is full or not
1. PUSH: The push operation is used to add new elements to the stack. Before adding,
first check whether the stack is full. If the stack is full, the operation generates an error
message, "stack overflow".
2. POP: The pop operation is used to delete elements from the stack. Before deleting,
first check whether the stack is empty. If the stack is empty, the operation generates an
error message, "stack underflow". All insertions and deletions take place at the same end,
so the last element added to the stack will be the first element removed from the stack.
When a stack is created, the stack base remains fixed while the stack top changes as
elements are added and removed. The most accessible element is the top and the least
accessible element is the bottom of the stack.
REPRESENTATION OF STACK (OR) IMPLEMENTATION OF STACK:
A stack can be represented in two ways:
1. Stack using array
2. Stack using linked list
Array
The array implementation aims to create an array where the first element (usually at the
zero-offset) is the bottom. That is, array[0] is the first element pushed onto the stack and
the last element popped off. The program must keep track of the size, or the length of the
stack.
Let us consider a stack with a capacity of 6 elements. This is called the size of the stack.
The number of elements added should not exceed the maximum size of the stack. If
we attempt to add a new element beyond the maximum size, we will encounter a stack
overflow condition. Similarly, we cannot remove elements beyond the base of the stack;
if we attempt to, we will reach a stack underflow condition.
In the array implementation, the POP procedure deletes the TOP element of STACK and
assigns it to the variable ITEM.
If we use a dynamic array, then we can implement a stack that can grow or
shrink as needed. The size of the stack is simply the size of the dynamic
array. A dynamic array is a very efficient implementation of a stack, since adding
items to or removing items from the end of a dynamic array takes amortized constant
time.
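The fixed-size array implementation described above can be sketched in C as follows. This is a minimal sketch: the names `stack`, `top`, `push`, and `pop` are illustrative, and the functions signal overflow/underflow by returning 0 rather than printing the error messages.

```c
#include <assert.h>

#define MAX 6                      /* capacity of the stack, as in the text */

int stack[MAX];
int top = -1;                      /* -1 means the stack is empty */

/* PUSH: add ITEM on top; report overflow if the stack is full. */
int push(int item) {
    if (top == MAX - 1)
        return 0;                  /* stack overflow */
    stack[++top] = item;
    return 1;
}

/* POP: remove the top element into *item; report underflow if empty. */
int pop(int *item) {
    if (top == -1)
        return 0;                  /* stack underflow */
    *item = stack[top--];
    return 1;
}
```

Note how the base (index 0) stays fixed while `top` moves, matching the description above: the last element pushed is the first one popped.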
APPLICATIONS OF STACK:
1. Stack is used by compilers to check for balancing of parentheses, brackets and
braces.
2. Stack is used to evaluate a postfix expression.
3. Stack is used to convert an infix expression into postfix/prefix form.
4. In recursion, all intermediate arguments and return values are stored on the
processor’s stack.
5. During a function call the return address and arguments are pushed onto a stack
and on return they are popped off. Converting and evaluating Algebraic expressions
POLISH NOTATION
Polish notation, also known as Polish prefix notation or simply prefix notation is a
form of notation for logic, arithmetic, and algebra. Its distinguishing feature is that it
places operators to the left of their operands. If the arity of the operators is fixed, the
result is a syntax lacking parentheses or other brackets that can still be parsed
without ambiguity. The Polish logician Jan Łukasiewicz invented this notation in 1924
in order to simplify sentential logic.
The term Polish notation is sometimes taken (as the opposite of infix notation) to also
include Polish postfix notation, or Reverse Polish notation, in which the operator is
placed after the operands.
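Application 2 above (evaluating a postfix expression with a stack) can be sketched in C. This is an illustrative sketch, not an algorithm from the text: it handles only single-digit operands and the four basic operators, with no error handling.

```c
#include <assert.h>
#include <ctype.h>

/* Evaluate a postfix (Reverse Polish) expression of single-digit
   operands, e.g. "23*4+" means 2*3 + 4. */
int eval_postfix(const char *expr) {
    int stack[64], top = -1;
    for (; *expr; expr++) {
        if (isdigit((unsigned char)*expr)) {
            stack[++top] = *expr - '0';    /* operand: push its value */
        } else {
            int b = stack[top--];          /* operator: pop two operands, */
            int a = stack[top--];          /* apply it, push the result   */
            switch (*expr) {
            case '+': stack[++top] = a + b; break;
            case '-': stack[++top] = a - b; break;
            case '*': stack[++top] = a * b; break;
            case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                     /* final result on top */
}
```

Because the operator comes after its operands, the expression can be evaluated in a single left-to-right scan without any parentheses, which is exactly why compilers convert infix expressions to postfix form.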
Example − a function that calls another function which in turn calls it again
(indirect recursion):
void function2(int value);
void function1(int value1) {
   if(value1 < 1)
      return;                 /* base case */
   function2(value1 - 1);     /* calls function2, which calls function1 */
   printf("%d ", value1);
}
void function2(int value2) {
   function1(value2);
}
Properties
A recursive function can run forever, like an infinite loop. To avoid this, there
are two properties that a recursive function must have −
• Base criteria − There must be at least one base criteria or condition, such that,
when this condition is met the function stops calling itself recursively.
• Progressive approach − The recursive calls should progress in such a way that
each time a recursive call is made it comes closer to the base criteria.
Implementation
Many programming languages implement recursion by means of stacks. Generally,
whenever a function (caller) calls another function (callee) or itself as callee, the caller
function transfers execution control to the callee. This transfer process may also involve
some data to be passed from the caller to the callee.
This implies, the caller function has to suspend its execution temporarily and resume later
when the execution control returns from the callee function. Here, the caller function needs
to start exactly from the point of execution where it puts itself on hold. It also needs the
exact same data values it was working on. For this purpose, an activation record (or stack
frame) is created for the caller function.
This activation record keeps the information about local variables, formal parameters,
return address and all information passed to the caller function.
Types of Recursion
There are two types of Recursion
• Direct recursion
• Indirect recursion
Direct Recursion
When in the body of a method there is a call to the same method, we say that the method
is directly recursive.
There are three types of Direct Recursion
• Linear Recursion
• Binary Recursion
• Multiple Recursion
Linear Recursion
• Linear recursion begins by testing for a set of base cases (there should be at
least one).
• It then performs a single recursive call. This step may involve a test that decides
which of several possible recursive calls to make, but it should ultimately choose to
make just one of these calls each time the step is performed.
• Each possible recursive call is defined so that it makes progress towards a base case.
Binary Recursion
• Binary recursion occurs whenever there are two recursive calls for each non base
case.
Multiple Recursion
• In multiple recursion we make not just one or two but many recursive calls.
Calculating the factorial of a number is a common problem that can be solved recursively.
As a reminder, a factorial of a number, n, is defined by n! and is the result of multiplying the
numbers 1 to n. So, 5! is equal to 5*4*3*2*1, resulting in 120.
function factorial(n) {
if(n === 1 || n === 0) { // base case
return 1;
}
return n * factorial(n - 1); // recursive call
}
QUEUE ADT
A queue is an ordered collection of data such that the data is inserted at one end and
deleted from the other end. The key difference compared with stacks is that in a queue
the information stored is processed first-in, first-out (FIFO). In other words, the
information retrieved from a queue comes out in the same order that it was placed on the
queue.
Queue is a linear list of elements in which deletions can take place only at one end, called
the front and insertions can take place only at the other end, called the rear. The
terms“front” and “rear” are used in describing a linear list only when it is implemented as a
queue.
Queues are also called first-in first-out (FIFO) lists, since the first element to enter a
queue will be the first element out of the queue. In other words, the order in which
elements enter a queue is the order in which they leave. This contrasts with stacks, which
are last-in first-out (LIFO) lists.
Queues abound in everyday life. The automobiles waiting to pass through an intersection
form a queue, in which the first car in line is the first car through; the people waiting in
line at a bank form a queue, where the first person in line is the first person to be waited
on; and so on. An important example of a queue in computer science occurs in a
timesharing system, in which programs with the same priority form a queue while waiting
to be executed.
REPRESENTATION OF QUEUES:
Queues may be represented in the computer in various ways, usually by means of one-
way lists or linear arrays.
Unless otherwise stated or implied, each of our queues will be maintained by a linear array
QUEUE and two pointer variables: FRONT, containing the location of the front element of
the queue; and REAR, containing the location of the rear element of the queue. The
condition FRONT = NULL will indicate that the queue is empty. Following figure shows the
way the array in the figure will be stored in memory using an array QUEUE with N
elements. The figure also indicates the way elements will be deleted from the queue and
the way new elements will be added to the queue. Observe that whenever an element is
deleted from the queue, the value of FRONT is increased by 1; this can be implemented
by the assignment
FRONT := FRONT + 1
Similarly, whenever an element is added to the queue, the value of REAR is increased by
1; this can be implemented by the assignment
REAR := REAR + 1
This means that after N insertion, the rear element of the queue will occupy QUEUE [N] or,
in other words; eventually the queue will occupy the last part of the array. This occurs
even though the queue itself may not contain many elements.
Suppose we want to insert an element ITEM into a queue at the time the queue occupies
the last part of the array, i.e., when REAR = N. One way to do this is simply to move the
entire queue to the beginning of the array, changing FRONT and REAR accordingly, and
then insert ITEM as above. This procedure may be very expensive. The procedure we
adopt instead is to assume that the array QUEUE is circular, that is, that QUEUE[1]
comes after QUEUE[N] in the array. With this assumption, we insert ITEM into the queue
by assigning ITEM to QUEUE[1]. Specifically, instead of increasing REAR to N+1, we
reset REAR = 1 and then assign
QUEUE[REAR] := ITEM
Similarly, if FRONT = N and an element of QUEUE is deleted, we reset FRONT = 1
instead of increasing FRONT to N +1
Suppose that our queue contains only one element, i.e., suppose that
FRONT = REAR ≠ NULL
and suppose that the element is deleted. Then we assign
FRONT := NULL and REAR := NULL
OPERATIONS ON QUEUE:
Like stacks, underflow and overflow conditions are to be checked before operations on a
queue. The queue-empty or underflow condition is FRONT = NULL.
ENQUEUE OPERATION
Queues maintain two data pointers, front and rear. Therefore, queue operations are
comparatively more difficult to implement than those of stacks. The following steps should
be taken to enqueue (insert) data into a queue −
• Step 1 − Check if the queue is full.
• Step 2 − If the queue is full, produce an overflow error and exit.
• Step 3 − If the queue is not full, increment the rear pointer to point to the next empty space.
• Step 4 − Add the data element to the queue location where rear is pointing.
• Step 5 − Return success.
procedure enqueue(data)
if queue is full
return overflow
endif
rear ← rear + 1
queue[rear] ← data
return true
end procedure
DEQUEUE OPERATION
Accessing data from the queue is a process of two tasks − access the data where front is
pointing, and remove the data after access. The following steps are taken to perform the
dequeue operation −
• Step 1 − Check if the queue is empty.
• Step 2 − If the queue is empty, produce an underflow error and exit.
• Step 3 − If the queue is not empty, access the data where front is pointing.
• Step 4 − Increment the front pointer to point to the next available data element.
• Step 5 − Return success.
procedure dequeue
if queue is empty
return underflow
endif
data = queue[front]
front ← front + 1
return true
end procedure
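The enqueue and dequeue procedures above can be sketched in C for a linear (non-circular) array queue. This is a minimal sketch; the names are illustrative, and overflow/underflow are signalled by a 0 return value.

```c
#include <assert.h>

#define N 5

int queue[N];
int front = -1, rear = -1;         /* -1 marks an empty queue */

/* ENQUEUE: insert at the rear, after checking for overflow. */
int enqueue(int data) {
    if (rear == N - 1)
        return 0;                  /* queue is full: overflow */
    if (front == -1)
        front = 0;                 /* first insertion initializes front */
    queue[++rear] = data;
    return 1;
}

/* DEQUEUE: remove from the front, after checking for underflow. */
int dequeue(int *data) {
    if (front == -1 || front > rear)
        return 0;                  /* queue is empty: underflow */
    *data = queue[front++];
    return 1;
}
```

Note the limitation the next section discusses: `rear` only moves forward, so once it reaches N-1 no more insertions are possible even if earlier slots have been freed by dequeues.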
Circular Queue
Why was the concept of the circular queue introduced?
There was one limitation in the array implementation of a queue: if the rear reaches the
end position of the queue, there might be vacant spaces left in the beginning which
cannot be utilized. So, to overcome this limitation, the concept of the circular queue was
introduced.
As we can see in the above image, the rear is at the last position of the Queue and front is
pointing somewhere rather than the 0th position. In the above array, there are only two
elements and other three positions are empty. The rear is at the last position of the Queue;
if we try to insert the element then it will show that there are no empty spaces in the
Queue. There is one solution to avoid such wastage of memory space: shifting both
elements to the left and adjusting the front and rear ends accordingly. This is not a
practically good approach, because shifting all the elements will consume lots of time.
The efficient approach to avoid the wastage of memory is to use the circular queue data
structure.
Enqueue operation
The steps of the enqueue operation are given below:
o First, check whether the queue is full. The queue is full when front == 0 && rear == max - 1
(front at the first position, rear at the last position), or when front == rear + 1; in that
case an overflow condition occurs.
o If rear != max - 1, increment rear by 1 and insert the new value at the rear end of the
queue.
o If front != 0 and rear == max - 1, the queue is not full, so set the value of rear to 0
(wrapping around) and insert the new element there.
Dequeue Operation
The steps of the dequeue operation are given below:
o First, we check whether the queue is empty or not. If the queue is empty, we cannot
perform the dequeue operation.
o When an element is deleted, the value of front gets incremented by 1 (wrapping back
to 0 when front reaches max - 1).
o If there is only one element left and it is deleted, then front and rear are reset to -1.
Step 1: IF FRONT = -1
Write "UNDERFLOW"
Goto Step 4
[END OF IF]
Step 2: SET VAL = QUEUE[FRONT]
Step 3: IF FRONT = REAR
SET FRONT = REAR = -1
ELSE IF FRONT = MAX - 1
SET FRONT = 0
ELSE
SET FRONT = FRONT + 1
[END OF IF]
Step 4: EXIT
Let's understand the enqueue and dequeue operations through a diagrammatic
representation.
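The circular enqueue and dequeue steps above can be sketched in C. This is a minimal sketch under the conventions used in the text (front and rear start at -1; both wrap around using the modulus operator); the names are illustrative.

```c
#include <assert.h>

#define MAXSIZE 5

int cq[MAXSIZE];
int cfront = -1, crear = -1;

/* Circular enqueue: rear wraps around with (rear + 1) mod MAXSIZE. */
int cq_enqueue(int data) {
    if ((crear + 1) % MAXSIZE == cfront)
        return 0;                          /* queue is full: overflow */
    if (cfront == -1)
        cfront = 0;                        /* first insertion */
    crear = (crear + 1) % MAXSIZE;
    cq[crear] = data;
    return 1;
}

/* Circular dequeue: front wraps around; front and rear are reset to -1
   when the last remaining element is removed. */
int cq_dequeue(int *data) {
    if (cfront == -1)
        return 0;                          /* queue is empty: underflow */
    *data = cq[cfront];
    if (cfront == crear)
        cfront = crear = -1;               /* queue became empty */
    else
        cfront = (cfront + 1) % MAXSIZE;
    return 1;
}
```

Because both pointers wrap around, slots freed at the beginning of the array are reused, which is exactly the wastage the linear array version could not avoid.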
PRIORITY QUEUE
• A priority queue is a special type of queue in which each element is associated with
a priority value. And, elements are served on the basis of their priority. That is, higher
priority elements are served first.
• However, if elements with the same priority occur, they are served according to their
order in the queue.
Assigning Priority Value
Generally, the value of the element itself is considered for assigning the priority. For
example, the element with the highest value is considered the highest priority element.
However, in other cases, we can assume the element with the lowest value as the
highest priority element. We can also set priorities according to our needs.
DIFFERENCE BETWEEN STACK AND QUEUE
1. Stack: a linear list which allows insertion or deletion of an element at one end only.
Queue: a linear list which allows insertion at one end and deletion at the other end.
2. Stack: since insertion and deletion of an element are performed at the same end,
elements can only be removed in the opposite order of insertion.
Queue: since insertion and deletion of an element are performed at opposite ends,
elements can only be removed in the same order of insertion.
3. Stack: called a Last In First Out (LIFO) list.
Queue: called a First In First Out (FIFO) list.
4. Stack: the most and least accessible elements are called the TOP and BOTTOM of the
stack.
Queue: insertion of an element is performed at the REAR end and deletion is performed
from the FRONT end.
5. Stack: an example is arranging plates one above another.
Queue: an example is an ordinary queue in a provision store.
6. Stack: the insertion operation is referred to as PUSH and the deletion operation as POP.
Queue: the insertion operation is referred to as ENQUEUE and the deletion operation as
DEQUEUE.
7. Stack: function calling in any language uses a stack.
Queue: task scheduling by an operating system uses a queue.
UNIT-V
LINKED LIST
o Linked List can be defined as collection of objects called nodes that are randomly
stored in the memory.
o A node contains two fields i.e. data stored at that particular address and the pointer
which contains the address of the next node in the memory.
o The last node of the list contains pointer to the null.
Till now, we were using the array data structure to organize a group of elements stored
individually in memory. However, an array has several limitations which must be known
in order to decide which data structure will be used throughout the program.
1. The size of array must be known in advance before using it in the program.
2. Increasing size of the array is a time taking process. It is almost impossible to
expand the size of the array at run time.
3. All the elements in the array need to be contiguously stored in the memory. Inserting
any element in the array needs shifting of all its predecessors.
Linked list is the data structure which can overcome all the limitations of an array. Using
linked list is useful because,
1. It allocates the memory dynamically. All the nodes of linked list are non-contiguously
stored in the memory and linked together with the help of pointers.
2. Sizing is no longer a problem since we do not need to define its size at the time of
declaration. List grows as per the program's demand and limited to the available
memory space.
A singly linked list can be defined as a collection of an ordered set of elements. The
number of elements may vary according to the needs of the program. A node in a singly
linked list consists of two parts: a data part and a link part. The data part of the node
stores the actual information that is to be represented by the node, while the link part
stores the address of its immediate successor.
One way chain or singly linked list can be traversed only in one direction. In other words,
we can say that each node contains only next pointer, therefore we can not traverse the list
in the reverse direction.
Consider an example where the marks obtained by the student in three subjects are stored
in a linked list as shown in the figure.
In the above figure, the arrow represents the links. The data part of every node contains
the marks obtained by the student in the different subject. The last node in the list is
identified by the null pointer which is present in the address part of the last node. We can
have as many elements we require, in the data part of the list.
Disadvantages of a linked list: more memory space is needed when the number of fields
is large, since every node also stores a pointer; the logical and physical ordering of
nodes differ; searching is slow, since nodes must be visited sequentially; and linked lists
are difficult to program because pointer manipulation is required.
Linked lists are of three types: the linear (one-way or singly) linked list, the doubly
(two-way) linked list, and the circular linked list, which may itself be singly or doubly
linked.
Let LIST be a linked list. Then LIST requires two linear arrays, called INFO and LINK,
such that INFO[K] and LINK[K] contain the information part and the next-pointer field of
a node of LIST. The end of the LIST is denoted by a NULL next pointer.
The computer maintains a special list which consists of all the free memory cells and
has its own pointer; it is called the list of available space, the free-storage list, or the
free pool.
Suppose insertions are to be performed on the linked list. Then the unused memory
cells in the array will also be linked together to form a linked list, using AVAIL as its list
pointer variable. Such a data structure will be denoted by writing
LIST(INFO, LINK, START, AVAIL)
The operating system of a computer may periodically collect all the deleted space
on to the free storage list. Any technique which does these collections is called
garbage collection.
When we delete a particular node from an existing linked list, or delete the whole linked
list, the space occupied by it must be given back to the free pool, so that the memory
can be used by some other program that needs memory space.
The operating system will perform this operation whenever it finds the CPU idle or
whenever the programs are falling short of memory space. The OS scans through the
entire memory and marks those cells that are in use by some program; then it collects
all the cells which are not being used and adds them to the free pool, so that these cells
can be used by other programs. This process is called garbage collection. Garbage
collection is invisible to the programmer.
Algorithm (traversing a linked list):
Let LIST be a linked list in memory. This algorithm traverses LIST, applying an
operation PROCESS to each element of LIST. The variable PTR points to the node
currently being processed.
Step 1: Set PTR := START. [Initializes pointer PTR]
Step 2: Repeat Steps 3 and 4 while PTR ≠ NULL.
Step 3: Apply PROCESS to INFO[PTR].
Step 4: Set PTR := LINK[PTR]. [PTR now points to the next node]
[End of Step 2 loop]
Step 5: Exit.
Algorithm for searching a linked list: SEARCH(INFO, LINK, START, ITEM, LOC)
LIST is a linked list in memory. This algorithm finds the location LOC of the node where
ITEM first appears in LIST, or sets LOC = NULL if the search is unsuccessful.
Step 1: Set PTR := START. [Initializes pointer PTR]
Step 2: Repeat Step 3 while PTR ≠ NULL.
Step 3: If ITEM = INFO[PTR], then:
Set LOC := PTR, and Exit.
Else:
Set PTR := LINK[PTR]. [PTR now points to the next node]
[End of If structure]
[End of Step 2 loop]
Step 4: [Search is unsuccessful] Set LOC := NULL.
Step 5: Exit.
Inserting a node at the beginning of the list: INSFIRST(INFO, LINK, START, AVAIL, ITEM)
Step 1: [Overflow?] If AVAIL = NULL, then write OVERFLOW and Exit.
Step 2: [Remove first node from the AVAIL list] Set NEW := AVAIL and AVAIL := LINK[AVAIL].
Step 3: Set INFO[NEW] := ITEM. [Copies new data into the new node]
Step 4: Set LINK[NEW] := START. [New node now points to the original first node]
Step 5: Set START := NEW. [Changes START so it points to the new node]
Step 6: Exit.
Deleting a node: DEL(INFO, LINK, START, AVAIL, LOC, LOCP)
This algorithm deletes the node N at location LOC; LOCP is the location of the node
which precedes N, or LOCP = NULL if N is the first node.
Step 1: If LOCP = NULL, then:
Set START := LINK[LOC]. [Deletes the first node]
Else:
Set LINK[LOCP] := LINK[LOC]. [Deletes node N]
[End of If]
Step 2: [Return deleted node to the AVAIL list] Set LINK[LOC] := AVAIL and AVAIL := LOC.
Step 3: Exit.
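Insertion at the beginning of the list can be sketched in pointer-based C, where `malloc` plays the role of taking a node from the AVAIL list. This is a minimal sketch; the names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

struct listnode {
    int info;
    struct listnode *next;
};

/* Insert ITEM as the new first node. START is passed by address so the
   head pointer can be updated (the pointer version of START := NEW).
   Returns 0 on allocation failure, the pointer analogue of AVAIL = NULL. */
int insert_first(struct listnode **start, int item) {
    struct listnode *new_node = malloc(sizeof *new_node);
    if (new_node == NULL)
        return 0;                  /* overflow: no free node available */
    new_node->info = item;         /* INFO[NEW] := ITEM */
    new_node->next = *start;       /* LINK[NEW] := START */
    *start = new_node;             /* START := NEW */
    return 1;
}
```

Each call prepends a node, so inserting 1 and then 2 into an empty list yields the list 2 → 1.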
In tree data structure, every individual element is called as Node. Node in a tree data
structure stores the actual data of that particular element and link to next element in
hierarchical structure.
In a tree data structure, if we have N number of nodes then we can have a maximum
of N-1 number of links.
Example
Terminology
In a tree data structure, we use the following terminology...
1. Root
In a tree data structure, the first node is called as Root Node. Every tree must have a
root node. In any tree, there must be only one root node. We never have multiple root
nodes in a tree.
2. Edge
In a tree data structure, the connecting link between any two nodes is called as EDGE.
In a tree with 'N' number of nodes there will be a maximum of 'N-1' number of edges.
3. Parent
In a tree data structure, the node which is a predecessor of any node is called
as PARENT NODE. In simple words, the node which has a branch from it to any other
node is called a parent node. Parent node can also be defined as "The node which has
child / children".
4. Child
In a tree data structure, the node which is descendant of any node is called as CHILD
Node. In simple words, the node which has a link from its parent node is called as child
node. In a tree, any parent node can have any number of child nodes. In a tree, all the
nodes except root are child nodes.
5. Siblings
In a tree data structure, nodes which belong to same Parent are called as SIBLINGS. In
simple words, the nodes with the same parent are called Sibling nodes.
6. Leaf
In a tree data structure, the node which does not have a child is called as LEAF Node. In
simple words, a leaf is a node with no child.
In a tree data structure, the leaf nodes are also called as External Nodes. External node
is also a node with no child. In a tree, leaf node is also called as 'Terminal' node.
7. Internal Nodes
In a tree data structure, the node which has atleast one child is called as INTERNAL
Node. In simple words, an internal node is a node with atleast one child.
In a tree data structure, nodes other than leaf nodes are called as Internal Nodes. The
root node is also said to be Internal Node if the tree has more than one node. Internal
nodes are also called as 'Non-Terminal' nodes.
8. Degree
In a tree data structure, the total number of children of a node is called as DEGREE of
that Node. In simple words, the Degree of a node is total number of children it has. The
highest degree of a node among all the nodes in a tree is called as 'Degree of Tree'
9. Level
In a tree data structure, the root node is said to be at Level 0 and the children of root
node are at Level 1 and the children of the nodes which are at Level 1 will be at Level 2
and so on... In simple words, in a tree each step from top to bottom is called as a Level
and the Level count starts with '0' and incremented by one at each level (Step).
10. Height
In a tree data structure, the total number of edges from leaf node to a particular node in
the longest path is called as HEIGHT of that Node. In a tree, height of the root node is
said to be height of the tree. In a tree, height of all leaf nodes is '0'.
11. Depth
In a tree data structure, the total number of egdes from root node to a particular node is
called as DEPTH of that Node. In a tree, the total number of edges from root node to a
leaf node in the longest path is said to be Depth of the tree. In simple words, the highest
depth of any leaf node in a tree is said to be depth of that tree. In a tree, depth of the
root node is '0'.
12. Path
In a tree data structure, the sequence of Nodes and Edges from one node to another
node is called as PATH between that two Nodes. Length of a Path is total number of
nodes in that path. In below example the path A - B - E - J has length 4.
13. Sub Tree
In a tree data structure, each child from a node forms a subtree recursively. Every child
node will form a subtree on its parent node.
A tree in which every node can have a maximum of two children is called Binary
Tree.
In a binary tree, every node can have either 0 children or 1 child or 2 children but not
more than 2 children.
Example
PROPERTIES OF BINARY TREE: Some of the important properties of a binary tree are
as follows:
1. If h = height of a binary tree, then
a. Maximum number of leaves = 2^h
b. Maximum number of nodes = 2^(h+1) - 1
2. If a binary tree contains m nodes at level l, it contains at most 2m nodes at level l + 1.
3. Since a binary tree can contain at most one node at level 0 (the root), it can contain at
most 2^l nodes at level l.
4. The total number of edges in a full binary tree with n nodes is n - 1.
A binary tree in which every node has either two or zero number of children is
called Strictly Binary Tree
Strictly binary tree is also called as Full Binary Tree or Proper Binary Tree or 2-Tree
Strictly binary tree data structure is used to represent mathematical expressions.
Example
A binary tree in which every internal node has exactly two children and all leaf
nodes are at same level is called Complete Binary Tree.
The full binary tree obtained by adding dummy nodes to a binary tree is called as
Extended Binary Tree.
In above figure, a normal binary tree is converted into full binary tree by adding dummy
nodes (In pink colour).
1. Array Representation
2. Linked List Representation
Array Representation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10 5 16 - 8 15 20 - - - - - - - 23
The index 1 is holding the root, it has two children 5 and 16, they are placed at location 2
and 3. Some children are missing, so their place is left as blank.
In this representation we can easily get the position of two children of one node by using
this formula −
child1 = 2 * parent
child2 = (2 * parent) + 1
parent = floor(child / 2)
This approach is good, and we can easily find the index of a parent and its children, but
it is not memory efficient: it will occupy many locations that have no use. This
representation is good for a complete or full binary tree.
Another approach is to use linked lists. We create a node for each element, which looks
like the structure below −
Traversal is a process to visit all the nodes of a tree, and may print their values too.
Because all nodes are connected via edges (links), we always start from the root (head)
node; that is, we cannot randomly access a node in a tree. There are three ways in
which we traverse a tree −
In-order Traversal
Pre-order Traversal
Post-order Traversal
Generally we traverse a tree to search or locate given item or key in the tree or to print
all the values it contains.
1. Preorder Traversal-
Algorithm-
1. Visit the root
2. Traverse the left sub tree i.e. call Preorder (left sub tree)
3. Traverse the right sub tree i.e. call Preorder (right sub tree)
Example-
Applications-
2. Inorder Traversal-
Algorithm-
1. Traverse the left sub tree i.e. call Inorder (left sub tree)
2. Visit the root
3. Traverse the right sub tree i.e. call Inorder (right sub tree)
Example-
Application-
3. Postorder Traversal-
Algorithm-
1. Traverse the left sub tree i.e. call Postorder (left sub tree)
2. Traverse the right sub tree i.e. call Postorder (right sub tree)
3. Visit the root
Example-
Applications-
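The three recursive traversals can be sketched in C over a linked representation. This is a minimal sketch; the node structure and names are illustrative, and "visiting" a node appends its character to an output string.

```c
#include <string.h>

struct tnode {
    char data;
    struct tnode *left, *right;
};

/* Inorder: left subtree, root, right subtree. */
void inorder(struct tnode *root, char *out) {
    if (root == NULL) return;          /* empty subtree: nothing to do */
    inorder(root->left, out);
    strncat(out, &root->data, 1);      /* visit the root */
    inorder(root->right, out);
}

/* Preorder: root first, then the two subtrees. */
void preorder(struct tnode *root, char *out) {
    if (root == NULL) return;
    strncat(out, &root->data, 1);
    preorder(root->left, out);
    preorder(root->right, out);
}

/* Postorder: both subtrees first, root last. */
void postorder(struct tnode *root, char *out) {
    if (root == NULL) return;
    postorder(root->left, out);
    postorder(root->right, out);
    strncat(out, &root->data, 1);
}
```

The three functions differ only in where the "visit the root" line appears, which is exactly the difference between the three algorithm listings above.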
A binary search tree follows some order to arrange the elements. In a Binary search tree,
the value of left node must be smaller than the parent node, and the value of right node
must be greater than the parent node. This rule is applied recursively to the left and right
subtrees of the root.
In the above figure, we can observe that the root node is 40, and all the nodes of the left
subtree are smaller than the root node, and all the nodes of the right subtree are greater
than the root node.
Similarly, the left child of the root node is greater than its own left child and smaller
than its own right child, so each subtree also satisfies the property of a binary search
tree. Therefore, we can say that the tree in the above image is a binary search tree.
Suppose if we change the value of node 35 to 55 in the above tree, check whether the
tree will be binary search tree or not.
In the above tree, the value of the root node is 40. The node 55 (the right child of 30)
lies in the left subtree of the root but is greater than 40. So, the above tree does not
satisfy the property of a binary search tree; therefore, it is not a binary search tree.
o Searching an element in the Binary search tree is easy as we always have a hint
that which subtree has the desired element.
o As compared to array and linked lists, insertion and deletion operations are faster
in BST.
Now, let's see the creation of binary search tree using an example.
Suppose the data elements are - 45, 15, 79, 90, 10, 55, 12, 20, 50
o First, we have to insert 45 into the tree as the root of the tree.
o Then, read the next element; if it is smaller than the root node, insert it as the root
of the left subtree, and move to the next element.
o Otherwise, if the element is larger than the root node, then insert it as the root of
the right subtree.
Now, let's see the process of creating the Binary search tree using the given data
element. The process of creating the BST is shown below -
As 15 is smaller than 45, so insert it as the root node of the left subtree.
As 79 is greater than 45, so insert it as the root node of the right subtree.
90 is greater than 45 and 79, so it will be inserted as the right subtree of 79.
Step 5 - Insert 10.
10 is smaller than 45 and 15, so it will be inserted as the left subtree of 15.
55 is larger than 45 and smaller than 79, so it will be inserted as the left subtree of 79.
Step 7 - Insert 12.
12 is smaller than 45 and 15 but greater than 10, so it will be inserted as the right
subtree of 10.
20 is smaller than 45 but greater than 15, so it will be inserted as the right subtree of 15.
Step 9 - Insert 50.
50 is greater than 45 but smaller than 79 and 55. So, it will be inserted as a left subtree
of 55.
Now, the creation of binary search tree is completed. After that, let's move towards the
operations that can be performed on Binary search tree.
We can perform insert, delete and search operations on the binary search tree.
Let's understand how a search is performed on a binary search tree.
1. First, compare the element to be searched with the root element of the tree.
2. If root is matched with the target element, then return the node's location.
3. If it is not matched, then check whether the item is less than the root element, if it
is smaller than the root element, then move to the left subtree.
4. If it is larger than the root element, then move to the right subtree.
5. Repeat the above procedure recursively until the match is found.
6. If the element is not found or not present in the tree, then return NULL.
Now, let's understand the searching in binary tree using an example. We are taking the
binary search tree formed above. Suppose we have to find node 20 from the below tree.
In a binary search tree, we must delete a node from the tree by keeping in mind that the
property of BST is not violated. To delete a node from BST, there are three possible
situations occur -
Case 1: When the node to be deleted is a leaf node. This is the simplest case of
deleting a node in BST. Here, we have to replace the leaf node with NULL and simply
free the allocated space.
We can see the process of deleting a leaf node from BST in the below image. In the
below image, suppose we have to delete node 90; as the node to be deleted is a leaf
node, it will be replaced with NULL, and the allocated space will be freed.
Case 2: When the node to be deleted has only one child. In this case, we have to
replace the target node with its child, and then delete the child node. After replacing
the target node with its child node, the child node contains the value to be deleted, so
we simply replace the child node with NULL and free the allocated space.
We can see the process of deleting a node with one child from BST in the below image.
In the below image, suppose we have to delete the node 79, as the node to be deleted
has only one child, so it will be replaced with its child 55.
So, the replaced node 79 will now be a leaf node that can be easily deleted.
Case 3: When the node to be deleted has two children. This case is a bit more complex
than the other two. In such a case, the steps to be followed are listed as follows -
o First, find the inorder successor of the node to be deleted.
o Then, replace the node with its inorder successor.
o Finally, delete the inorder successor from its original position.
The inorder successor is required when the right child of the node is not empty. We can
obtain the inorder successor by finding the minimum element in the right child of the
node.
We can see the process of deleting a node with two children from BST in the below
image. In the below image, suppose we have to delete node 45 that is the root node, as
the node to be deleted has two children, so it will be replaced with its inorder successor.
Now, node 45 will be at the leaf of the tree so that it can be deleted easily.
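All three cases can be sketched in one C function. The structure and helper names (`bnode`, `min_node`, `delete_node`) are illustrative; the nodes are assumed to be heap-allocated:

```c
#include <stdlib.h>

struct bnode {
    int key;
    struct bnode *left, *right;
};

struct bnode *new_bnode(int key) {
    struct bnode *n = malloc(sizeof *n);
    n->key = key;
    n->left = n->right = NULL;
    return n;
}

/* Smallest key in a subtree: the base of the inorder successor. */
static struct bnode *min_node(struct bnode *n) {
    while (n->left != NULL)
        n = n->left;
    return n;
}

/* Deletes key from the BST rooted at root, returns the new root. */
struct bnode *delete_node(struct bnode *root, int key) {
    if (root == NULL)
        return NULL;
    if (key < root->key) {
        root->left = delete_node(root->left, key);
    } else if (key > root->key) {
        root->right = delete_node(root->right, key);
    } else {
        /* Cases 1 and 2: zero or one child - splice the node out. */
        if (root->left == NULL) {
            struct bnode *child = root->right;
            free(root);
            return child;
        }
        if (root->right == NULL) {
            struct bnode *child = root->left;
            free(root);
            return child;
        }
        /* Case 3: two children - copy the inorder successor's key,
           then delete the successor from the right subtree. */
        struct bnode *succ = min_node(root->right);
        root->key = succ->key;
        root->right = delete_node(root->right, succ->key);
    }
    return root;
}
```

Splicing out a node with at most one child covers the leaf case too, since a leaf's only "child" is NULL.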
INSERTION IN BINARY SEARCH TREE
A new key in a BST is always inserted at a leaf. To insert an element, we start searching
from the root node; if the key to be inserted is less than the root, we search for an empty
location in the left subtree, otherwise we search for an empty location in the right
subtree, and insert the data there. Insertion in a BST is similar to searching, as we
always maintain the rule that the left subtree is smaller than the root and the right
subtree is larger than the root.
Now, let's see the process of inserting a node into BST using an example.
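The insertion rule above can be sketched in C; the structure and function names are again illustrative:

```c
#include <stdlib.h>

struct tnode {
    int key;
    struct tnode *left, *right;
};

/* Inserts key into the BST rooted at root, returns the (possibly new) root.
   A new key always ends up at a leaf position. */
struct tnode *insert(struct tnode *root, int key) {
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;                   /* empty location found: place the leaf */
    }
    if (key < root->key)
        root->left = insert(root->left, key);   /* search the left subtree */
    else
        root->right = insert(root->right, key); /* search the right subtree */
    return root;
}
```

Starting from an empty tree and inserting 45, 12, 79, 55 in that order reproduces the shape used in the deletion example: 45 at the root, 12 to its left, 79 to its right, and 55 as the left child of 79.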
UNIT-VII
GRAPH
A Graph is a non-linear data structure consisting of nodes and edges. The nodes are
sometimes also referred to as vertices and the edges are lines or arcs that connect any
two nodes in the graph. More formally a Graph can be defined as,
A Graph consists of a finite set of vertices (or nodes) and set of Edges which connect a
pair of nodes.
In the above Graph, the set of vertices V = {0,1,2,3,4} and the set of edges E = {01, 12, 23,
34, 04, 14, 13}.
Graphs are used to solve many real-life problems. Graphs are used to represent networks.
The networks may include paths in a city or telephone network or circuit network. Graphs
are also used in social networks like LinkedIn and Facebook. For example, in Facebook, each
person is represented with a vertex(or node). Each node is a structure and contains
information like person id, name, gender, locale etc.
Definition
A graph G can be defined as an ordered set G(V, E) where V(G) represents the set of
vertices and E(G) represents the set of edges which are used to connect these vertices.
A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B), (B,C), (C,E), (E,D),
(D,B), (D,A)) is shown in the following figure.
DIRECTED AND UNDIRECTED GRAPH
A graph can be directed or undirected. In an undirected graph, edges are not
associated with any direction. The graph shown in the above figure is undirected, since
its edges are not attached to any direction. If an edge exists between vertices A and B,
then the vertices can be traversed from B to A as well as from A to B.
In a directed graph, edges form an ordered pair. Edges represent a specific path from
some vertex A to another vertex B. Node A is called initial node while node B is called
terminal node.
A directed graph is shown in the following figure.
GRAPH TERMINOLOGY
Path
A path can be defined as the sequence of nodes that are followed in order to reach some
terminal node V from the initial node U.
Closed Path
A path is called a closed path if the initial node is the same as the terminal node, i.e. if
V0 = VN.
Simple Path
If all the nodes of the path are distinct, the path is called a simple path; if all nodes are
distinct except V0 = VN, the path is called a closed simple path.
Cycle
A cycle can be defined as the path which has no repeated edges or vertices except the
first and last vertices.
Connected Graph
A connected graph is the one in which some path exists between every two vertices (u,
v) in V. There are no isolated nodes in connected graph.
Complete Graph
A complete graph is one in which every node is connected to every other node. A
complete graph contains n(n-1)/2 edges, where n is the number of nodes in the graph.
Weighted Graph
In a weighted graph, each edge is assigned with some data such as length or weight.
The weight of an edge e can be given as w(e) which must be a positive (+) value
indicating the cost of traversing the edge.
Digraph
A digraph is a directed graph in which each edge of the graph is associated with some
direction and the traversing can be done only in the specified direction.
Loop
An edge whose two end points are the same node is called a loop.
Adjacent Nodes
If two nodes u and v are connected via an edge e, then the nodes u and v are called as
neighbours or adjacent nodes.
Degree of the Node
The degree of a node is the number of edges that are connected to that node. A node
with degree 0 is called an isolated node.
Graph Representation
By graph representation, we simply mean the technique used to store a graph in the
computer's memory.
There are two ways to store a graph in the computer's memory. In this part of the
tutorial, we discuss each of them in detail.
1. Sequential Representation
In sequential representation, we use an adjacency matrix to store the mapping between
vertices and edges. In an adjacency matrix, the rows and columns represent the graph's
vertices; for a graph with n vertices, the matrix has dimension n x n.
An entry Mij in the adjacency matrix of an undirected graph G is 1 if there exists an edge
between Vi and Vj, and 0 otherwise.
An undirected graph and its adjacency matrix representation is shown in the following
figure.
In the above figure, we can see that the mapping among the vertices (A, B, C, D, E) is
represented using the adjacency matrix, which is also shown in the figure.
The adjacency matrices of directed and undirected graphs differ. In a directed graph, an
entry Aij is 1 only when there is an edge directed from Vi to Vj.
A directed graph and its adjacency matrix representation is shown in the following figure.
The representation of a weighted directed graph is different: instead of filling the entry
with 1, the non-zero entries of the adjacency matrix hold the weights of the respective
edges.
The weighted directed graph along with the adjacency matrix representation is shown in
the following figure.
LINKED REPRESENTATION
In the linked representation, an adjacency list is used to store the Graph into the
computer's memory.
Consider the undirected graph shown in the following figure and check the adjacency list
representation.
An adjacency list is maintained for each node present in the graph; it stores the node
value and a pointer to the next adjacent node. When all the adjacent nodes have been
listed, the pointer field of the last node of the list holds NULL. In an undirected graph,
the sum of the lengths of the adjacency lists is equal to twice the number of edges.
Consider the directed graph shown in the following figure and check the adjacency list
representation of the graph.
In a directed graph, the sum of lengths of all the adjacency lists is equal to the number of
edges present in the graph.
In the case of a weighted directed graph, each list node contains an extra field that holds
the weight of the corresponding edge. The adjacency list representation of a weighted
directed graph is shown in the following figure.
ADJACENCY MATRIX
In graph theory, an adjacency matrix is a dense way of describing the finite graph
structure. It is the 2D matrix that is used to map the association between the graph
nodes.
If a graph has n number of vertices, then the adjacency matrix of that graph is n x n, and
each entry of the matrix represents the number of edges from one vertex to another.
An adjacency matrix is also called a connection matrix. Sometimes it is also called
a vertex matrix.
• If there exists an edge between vertex Vi and Vj, where i is a row, and j is a
column, then the value of aij = 1.
• If there is no edge between vertex Vi and Vj, then the value of aij = 0.
• If there are no self loops in the simple graph, then the vertex matrix (or adjacency
matrix) should have 0s in the diagonal.
• An adjacency matrix is symmetric for an undirected graph: the value in the ith row
and jth column is equal to the value in the jth row and ith column.
• If the adjacency matrix is multiplied by itself and a non-zero value is present at the
ith row and jth column, then there is a path from Vi to Vj of length 2. The value of
that entry gives the number of such distinct paths.
Note: In an adjacency matrix, 0 represents that no association exists between two
nodes, whereas 1 represents that an association exists between them.
In the graph, we can see there is no self-loop, so the diagonal entries of the adjacency
matrix will be 0. The adjacency matrix of the above graph will be -
Let us consider the below directed graph and try to construct the adjacency matrix of it.
In the above graph, we can see there is no self-loop, so the diagonal entries of the
adjacency matrix will be 0. The adjacency matrix of the above graph will be -
NOTE: A graph is said to be the weighted graph if each edge is assigned a positive
number, which is called the weight of the edge.
Question 1 - What will be the adjacency matrix for the below undirected weighted
graph?
Solution - In the given question there is no self-loop, so the diagonal entries of the
adjacency matrix will be 0. The given graph is a weighted undirected graph, so the
weights on the graph's edges appear as the entries of the adjacency matrix.
The adjacency matrix of the above graph will be -
Question 2 - What will be the adjacency matrix for the below directed weighted graph?
Solution - In the given question there is no self-loop, so the diagonal entries of the
adjacency matrix will be 0. The given graph is a weighted directed graph, so the weights
on the graph's edges appear as the entries of the adjacency matrix.
The adjacency matrix of the above graph will be -
WHAT IS PATH MATRIX?
A path matrix is a matrix representing a graph in which the entry in the mth row and nth
column indicates whether there is a path from vertex m to vertex n. The path may be
direct or indirect; it may consist of a single edge or multiple edges.
Derive A2, A3 and A4 from the adjacency matrix A. Form B4 = A + A2 + A3 + A4, then
derive the path matrix P from B4 by replacing every non-zero value with 1.
Difference Between Adjacency Matrix & Path Matrix
The key difference between an adjacency matrix and a path matrix is that an adjacency
matrix records direct edges only, whereas a path matrix records whether one vertex can
be reached from another at all. A path matrix therefore covers both direct and indirect
edges.
UNIT-VIII
SORTING
Sorting refers to arranging data in a particular format. A sorting algorithm specifies the
way to arrange data in a particular order. The most common orders are numerical and
lexicographical order.
BUBBLE SORTING:
Algorithm:
BUBBLE (DATA, N)
Here DATA is an array with N elements. This algorithm sorts the elements in DATA.
Step 1: [Loop] Repeat Steps 2 and 3 for K = 1 to N-1
Step 2: [Initialize pass pointer PTR] Set PTR = 1
Step 3: [Execute pass] Repeat while PTR <= N-K
    a. If DATA[PTR] > DATA[PTR+1], then interchange DATA[PTR] and
       DATA[PTR+1]. [End of If structure]
    b. Set PTR = PTR + 1
[End of Step 1 loop]
Step 4: Exit
COMPLEXITY OF THE BUBBLE SORT ALGORITHM
There are n-1 comparisons during the first pass, which places the largest
element in the last position;
there are n-2 comparisons in the second step, which places the second largest
element in the next to last position and so on.
f(n) = (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = n^2/2 + O(n) = O(n^2)
The time required to execute the bubble sort algorithm is therefore proportional to n^2,
where n is the number of input items.
To understand the working of the bubble sort algorithm, let's take an unsorted array. We
keep the array short, as we know the complexity of bubble sort is O(n^2).
First Pass
Sorting will start from the first two elements. Let's compare them to check which is
greater.
Here, 32 is greater than 13 (32 > 13), so the pair is already in order. Now, compare 32
with 26.
Here, 26 is smaller than 32, so swapping is required. After swapping, the new array will
look like -
Here, 35 is greater than 32, so no swapping is required as they are already in order.
Here, 10 is smaller than 35, so they are not in order and swapping is required. Now we
reach the end of the array. After the first pass, the array will be -
Second Pass
Here, 10 is smaller than 32. So, swapping is required. After swapping, the array will be -
Now, move to the third iteration.
Third Pass
Here, 10 is smaller than 26. So, swapping is required. After swapping, the array will be -
Fourth pass
Now, let's see the time complexity of bubble sort in the best case, average case, and
worst case. We will also see the space complexity of bubble sort.
1. Time Complexity
o Best Case Complexity - It occurs when there is no sorting required, i.e. the array
is already sorted. The best-case time complexity of bubble sort is O(n).
o Average Case Complexity - It occurs when the array elements are in jumbled
order, neither properly ascending nor properly descending. The average case
time complexity of bubble sort is O(n^2).
o Worst Case Complexity - It occurs when the array elements are required to be
sorted in reverse order. That means suppose you have to sort the array elements
in ascending order, but they are given in descending order. The worst-case time
complexity of bubble sort is O(n^2).
QUICK SORT
• Quick sort is also known as Partition-exchange sort based on the rule of Divide
and Conquer.
• It is a highly efficient sorting algorithm.
• In practice, quick sort is among the fastest comparison-based sorting algorithms.
• It is very fast and requires little additional space.
• Quick sort picks an element as pivot and partitions the array around the picked
pivot.
There are different versions of quick sort which choose the pivot in different ways:
1. First element as pivot
2. Last element as pivot
3. Random element as pivot
4. Median as pivot
1. 44 33 11 55 77 90 40 60 99 22 88
Let 44 be the pivot element, with scanning done from right to left.
Compare 44 with the right-side elements; if a right-side element is smaller than 44, swap
them. As 22 is smaller than 44, swap them.
22 33 11 55 77 90 40 60 99 44 88
Now compare 44 with the left-side elements; if a left-side element is greater than 44,
swap them. As 55 is greater than 44, swap them.
22 33 11 44 77 90 40 60 99 55 88
Recursively repeat steps 1 and 2 until we get two sublists, one to the left of the pivot
element 44 and one to its right.
22 33 11 40 77 90 44 60 99 55 88
22 33 11 40 44 90 77 60 99 55 88
Now the elements on the right side of 44 are greater than it, and those on the left side
are smaller than it.
These sublists are then sorted by the same process as above.
Merging Sublists:
SORTED LISTS
Algorithm:
Partition Algorithm:
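As a sketch of these two routines in C: this version uses a Lomuto-style partition with the last element as pivot, rather than the first element used in the walkthrough above; either choice satisfies the same partitioning rule.

```c
/* Places the pivot (last element) in its final position and returns
   that index; smaller elements end up to its left. */
static int partition(int a[], int low, int high) {
    int pivot = a[high];
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (a[j] < pivot) {              /* smaller elements go left of pivot */
            i++;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
    int t = a[i + 1]; a[i + 1] = a[high]; a[high] = t;
    return i + 1;
}

void quick_sort(int a[], int low, int high) {
    if (low < high) {
        int p = partition(a, low, high);
        quick_sort(a, low, p - 1);       /* sort the left sublist */
        quick_sort(a, p + 1, high);      /* sort the right sublist */
    }
}
```

After one call to `partition`, the pivot is in its final sorted position, which is exactly the state the walkthrough reaches for 44.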
In Merge Sort, the given unsorted array with n elements, is divided into n subarrays,
each having one element, because a single element is always sorted in itself. Then, it
repeatedly merges these subarrays, to produce new sorted subarrays, and in the end,
one complete sorted array is produced.
Algorithm
procedure mergesort( var a as array )
    if ( n == 1 ) return a
    var l1 as array = a[0] ... a[n/2]
    var l2 as array = a[n/2+1] ... a[n]
    l1 = mergesort( l1 )
    l2 = mergesort( l2 )
    return merge( l1, l2 )
end procedure

procedure merge( var a as array, var b as array )
    var c as array
    while ( a and b have elements )
        if ( a[0] > b[0] )
            add b[0] to the end of c
            remove b[0] from b
        else
            add a[0] to the end of c
            remove a[0] from a
        end if
    end while
    while ( a has elements )
        add a[0] to the end of c
        remove a[0] from a
    end while
    while ( b has elements )
        add b[0] to the end of c
        remove b[0] from b
    end while
    return c
end procedure
How Merge Sort Works?
As we have already discussed, merge sort uses the divide-and-conquer rule to break the
problem into sub-problems; the problem in this case is sorting a given array.
In merge sort, we break the given array midway, for example if the original array
had 6 elements, then merge sort will break it down into two subarrays with 3 elements
each.
But breaking the original array into 2 smaller subarrays does not by itself sort the array.
So we break these subarrays into even smaller subarrays, until we have multiple
subarrays with a single element each. The idea here is that an array with a single
element is already sorted, so once we have broken the original array down into
subarrays of a single element, we have successfully reduced our problem to base
problems.
And then we have to merge all these sorted subarrays, step by step to form one single
sorted array.
Below, we have a pictorial representation of how merge sort will sort the given array.
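The pseudocode above can be written in C; this sketch merges through a temporary buffer, and the function and parameter names are choices made for the example:

```c
#include <string.h>

/* Merges the sorted halves a[lo..mid] and a[mid+1..hi] via buffer tmp. */
static void merge(int a[], int tmp[], int lo, int mid, int hi) {
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid) tmp[k++] = a[i++];   /* leftover left half */
    while (j <= hi)  tmp[k++] = a[j++];   /* leftover right half */
    memcpy(&a[lo], &tmp[lo], (hi - lo + 1) * sizeof(int));
}

void merge_sort(int a[], int tmp[], int lo, int hi) {
    if (lo >= hi)
        return;                        /* one element: already sorted */
    int mid = lo + (hi - lo) / 2;
    merge_sort(a, tmp, lo, mid);       /* sort the left half */
    merge_sort(a, tmp, mid + 1, hi);   /* sort the right half */
    merge(a, tmp, lo, mid, hi);        /* combine the sorted halves */
}
```

The caller supplies `tmp`, a scratch array of the same size as `a`, which is why merge sort needs O(n) extra space.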
COMPLEXITY OF THE MERGING ALGORITHM
Merging two sorted lists of sizes m and n requires O(m + n) comparisons, and merge
sort as a whole runs in O(n log n) time.
SEARCHING:
Searching means finding the location of a given element in a collection of data. Two
common techniques are:
1. Linear Search
2. Binary Search
LINEAR SEARCH:
In a linear search, the target element is compared with each element of the array one by
one, from the first element onward, until a match is found or the end of the array is
reached.
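Both techniques can be sketched in C. Binary search, shown for comparison, additionally requires the array to be sorted; the function names are illustrative:

```c
/* Linear search: compare target with each element in turn.
   Returns the index of the first match, or -1 if absent. */
int linear_search(const int a[], int n, int target) {
    for (int i = 0; i < n; i++)
        if (a[i] == target)
            return i;
    return -1;
}

/* Binary search: requires a sorted array; repeatedly halves the range. */
int binary_search(const int a[], int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == target) return mid;
        if (a[mid] < target)  lo = mid + 1;   /* look in the upper half */
        else                  hi = mid - 1;   /* look in the lower half */
    }
    return -1;
}
```

Linear search takes O(n) comparisons in the worst case, while binary search takes O(log n) on a sorted array.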
A file is a collection of records related to each other. The file size is limited by the size of
memory and the storage medium.
FILE ORGANIZATION
File organization ensures that records are available for processing. It is used to
determine an efficient file organization for each base relation.
1. Sequential Access File Organization
• Storing and sorting records in contiguous blocks within files on tape or disk is called
sequential access file organization.
• In sequential access file organization, all records are stored in a sequential order. The
records are arranged in the ascending or descending order of a key field.
• Sequential file search starts from the beginning of the file and the records can be added
at the end of the file.
• In sequential file, it is not possible to add a record in the middle of the file without
rewriting the file.
2. Direct Access File Organization
• Direct access file organization is also known as random access or relative file
organization.
• In a direct access file, all records are stored on a direct access storage device (DASD),
such as a hard disk. The records are randomly placed throughout the file.
• The records do not need to be in sequence because they are updated directly and
rewritten back to the same location.
• This file organization is useful for immediate access to large amounts of information. It
is used in accessing large databases.
• Direct access files help in online transaction processing (OLTP) systems, such as an
online railway reservation system.
• It is expensive.
3. Indexed Sequential Access File Organization
• Indexed sequential access file organization combines both sequential and direct
access file organization.
• In an indexed sequential access file, records are stored randomly on a direct access
device, such as a magnetic disk, by a primary key.
• This file may have multiple keys, which can be alphanumeric; the key on which the
records are ordered is called the primary key.
• The data can be accessed either sequentially or randomly using the index. The index
is stored in a file and read into memory when the file is opened.
• In an indexed sequential access file, both sequential and random file access are
possible.
• It accesses records very fast if the index table is properly organized.
• Indexed sequential access files require unique keys and periodic reorganization.
• Indexed sequential access files take longer to search the index for data access or
retrieval.
• They are less efficient in the use of storage space as compared to other file
organizations.
HASHING
There are many possibilities for representing a dictionary, and one of the best methods
is hashing. Hashing is a solution which can be used in almost all situations. Hashing is a
technique which uses fewer key comparisons: it searches for an element in O(n) time in
the worst case, while in the average case it is done in O(1) time. The method generally
uses hash functions to map the keys into a table, which is called a hash table.
1) Hash table
A hash table is a type of data structure which is used for storing and accessing data very
quickly. Insertion of data into the table is based on a key value; hence every entry in the
hash table is defined by some key. Using this key, data can be found in the hash table
with few key comparisons, and the searching time then depends on the size of the hash
table.
2) Hash function
There are various types of hash function which are used to place the data in a hash
table,
1. Division method
In this the hash function is dependent upon the remainder of a division. For example:-if
the record 52,68,99,84 is to be placed in a hash table and let us take the table size is 10.
Then:
2. Mid square method
In this method the key is first squared, and then the middle part of the result is taken as
the index. For example, to place a record with key 3101 in a table of size 1000:
3101 * 3101 = 9616201, i.e. h(3101) = 162 (the middle 3 digits).
3. Folding method
In this method the key is divided into separate parts, and these parts are combined
using simple operations to produce a hash key. For example, consider a record with key
12465512: it is divided into the parts 124, 655 and 12, which are then combined by
adding them.
H(key) = 124 + 655 + 12
       = 791
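The three hash functions above can be sketched in C. The function names are chosen for the example, and `hash_folding` splits the decimal digits into groups of three from the left, matching the 12465512 example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Division method: h(key) = key % table_size. */
int hash_division(int key, int table_size) {
    return key % table_size;
}

/* Mid square method: square the key and keep the middle digits.
   Here the middle three digits of the 7-digit square are kept,
   matching the 3101 example (3101 * 3101 = 9616201 -> 162). */
long hash_midsquare(long key) {
    long sq = key * key;
    return (sq / 100) % 1000;      /* strip two low digits, keep three */
}

/* Folding method: split the decimal digits into groups of three,
   left to right, and add the groups (12465512 -> 124 + 655 + 12). */
int hash_folding(long key) {
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%ld", key);
    int sum = 0;
    for (int i = 0; i < len; i += 3) {
        char part[4] = {0};
        strncpy(part, digits + i, 3);   /* take up to three digits */
        sum += atoi(part);
    }
    return sum;
}
```

In a real table, each result would still be reduced modulo the table size before use.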
Characteristics of a good hash function:
1. The hash function should generate different hash values even for similar strings.
2. The hash function should be easy to understand and simple to compute.
3. The hash function should produce keys which are distributed uniformly over the
array.
4. The number of collisions should be small while placing the data in the hash table.
5. A hash function is a perfect hash function when it maps the given input keys to
distinct locations with no collisions.
Collision
A collision is a situation in which the hash function returns the same hash key for more
than one record. Sometimes resolving a collision may lead to an overflow condition;
frequent collisions and overflow are the marks of a poor hash function.
If a collision occurs, it can be handled by applying some technique. These techniques
are called collision resolution techniques. There are generally four such techniques,
described below.
1) Chaining
It is a method in which an additional field, a chain, is kept with the data. A chain is
maintained at the home bucket: when a collision occurs, a linked list of the colliding
records is maintained there.
Example: Consider a hash table of size 10 with the hash function H(key) = key % table
size, and let the keys to be inserted be 31, 33, 77, 61. Both 31 and 61 hash to bucket 1,
so in the diagram we can see two records at bucket 1, maintained in a linked list by the
chaining method.
2) Linear probing
This is a very easy and simple method of resolving a collision. The collision is resolved
by placing the second record at the next empty place found by moving linearly down the
table. This method suffers from a clustering problem, which means that blocks of data
build up at some places in the hash table.
Example: Let us consider a hash table of size 10 and hash function is defined as
H(key)=key % table size. Consider that following keys are to be inserted that are
56,64,36,71.
In this diagram we can see that 56 and 36 both need to be placed at bucket 6, but by the
linear probing technique the colliding record is placed linearly downward at the next
empty place, i.e. 36 is placed at index 7.
3) Quadratic probing
This is a method which solves the clustering problem. Here the hash function is defined
as H(key) = (H(key) + x*x) % table size. Let us insert the elements 67, 90, 55, 17, 49.
Here 67, 90 and 55 can be inserted easily, but in the case of 17 the hash function gives
(17 + 0*0) % 10 = 7 (when x = 0 it provides index 7, which is occupied), so we increment
the value of x. Let x = 1, so (17 + 1*1) % 10 = 8. Bucket 8 is empty, hence we place 17
at index 8.
4) Double hashing
It is a technique in which two hash functions are used when a collision occurs. The first
hash function is the simple division method. The second hash function is defined as
H2(key) = P - (key mod P)
where P is a prime number which should be taken smaller than the size of the hash
table.
Here 67, 90 and 55 can be inserted into the hash table using the first hash function, but
in the case of 17 the bucket is again full, so we have to use the second hash function
H2(key) = P - (key mod P), where P is a prime number smaller than the table size; so
P = 7.
i.e. H2(17) = 7 - (17 % 7) = 7 - 3 = 4, which means we take 4 jumps to place 17.
Therefore 17 will be placed at index 1.