Module 6

The document discusses various searching and sorting techniques, including linear search, binary search, and Fibonacci search, detailing their algorithms and complexities. It also covers sorting methods such as insertion sort and shell sort, explaining their processes and advantages. The document emphasizes the importance of sorting for efficient searching and categorizes sorting into internal and external types.


Searching & Sorting Techniques

Searching
• Searching refers to the operation of finding the location of a given element in a collection of elements. If the element does appear in the list, its location is recorded and the search is said to be successful. Otherwise, the search is said to be unsuccessful.

• Linear search is also called sequential search. It is the simplest searching algorithm. In linear search, we traverse the list from the beginning and compare each element of the list with the item whose location is to be found. If a match is found, the location of the item is returned; otherwise, the algorithm reports that the item is not present (e.g., returns -1 or NULL).
• Linear search is implemented using following steps...
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the first element in the list.
Step 3 - If both are matched, then display "Given element is found!!!" and terminate the function
Step 4 - If both are not matched, then compare search element with the next element in the list.
Step 5 - Repeat steps 3 and 4 until search element is compared with last element in the list.
Step 6 - If last element in the list also doesn't match, then display "Element is not found!!!" and terminate the
function.
• It is widely used to search for an element in an unordered list, i.e., a list whose items are not sorted. The worst-case time complexity of linear search is O(n).
To search for an element k in a list: start from the first element and compare k with each element x in turn; if x == k, return the index.
• Algorithm:
1. Start
2. linear_search(Array, value)
   for each element in the array
      if (element == value)
         return the element's location
      end if
   end for
3. End

• Consider linear search on a list of n elements. In the worst case, the search must visit every element once; this happens when the value being searched for is either the last element in the list or is not in the list at all. However, on average, assuming the value searched for is in the list and each list element is equally likely to be the value searched for, the search visits only n/2 elements. In the best case the value is the first element in the list, so only one comparison is made, i.e. O(1).
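The linear search described above can be sketched in C as follows (a minimal illustration; the function name and the -1 "not found" convention are ours, not from the text):

```c
#include <assert.h>

/* Return the index of key in a[0..n-1], or -1 if the key is absent. */
int linear_search(const int a[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (a[i] == key)   /* compare the search element with each element */
            return i;      /* match found: report its location */
    }
    return -1;             /* reached the last element without a match */
}
```

Searching for the last element, or for a missing one, costs n comparisons, which is the worst case discussed above.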
• Binary search
• A binary search or half-interval search algorithm finds the position of a specified input value (the search
"key") within an array sorted by key value. For binary search, the array should be arranged in ascending or
descending order.
• In each step, the algorithm compares the search key value with the key value of the middle element of the
array. If the keys match, then a matching element has been found and its index is returned.
• Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the
sub-array to the left of the middle element or, if the search key is greater, on the sub-array to the right.
• If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.
• Binary search is implemented using following steps...
Step 1 - Read the search element from the user.
Step 2 - Find the middle element in the sorted list.
Step 3 - Compare the search element with the middle element in the sorted list.
Step 4 - If both are matched, then display "Given element is found!!!" and terminate the function.
Step 5 - If both are not matched, then check whether the search element is smaller or larger than the
middle element.
Step 6 - If the search element is smaller than middle element, repeat steps 2, 3, 4 and 5 for the left
sublist of the middle element.
Step 7 - If the search element is larger than middle element, repeat steps 2, 3, 4 and 5 for the right
sublist of the middle element.
Step 8 - Repeat the same process until we find the search element in the list or until sublist contains
only one element.
Step 9 - If that element also doesn't match with the search element, then display "Element is not
found in the list!!!" and terminate the function.
Example: A = [11, 20, 25, 32, 45, 60, 80] at indices i = 0 … 6, search element SE = 11, mid = (low + high) / 2.
1. low = 0, high = 6, mid = 3. A[3] = 32 ≠ 11.
2. 32 > 11, so high = mid - 1 = 2. Now low = 0, high = 2, mid = 1. A[1] = 20 ≠ 11.
3. 20 > 11, so high = mid - 1 = 0. Now low = 0, high = 0, mid = 0. A[0] = 11 = SE, so the element is found at index 0.
• Binary Search Algorithm
1. Set BEG = 0 and END = N -1
2. Set MID = (BEG + END) / 2
3. Repeat step 4 to 8 While (BEG <= END) and (A[MID] ≠ ITEM)
4. If (ITEM < A[MID]) Then
5. Set END = MID – 1
6. Else
7. Set BEG = MID + 1 [End of If]
8. Set MID = (BEG + END) / 2
9. If (A[MID] == ITEM) Then
10. Print: ITEM exists at location MID
11. Else
12. Print: ITEM doesn’t exist
[End of If]
13. Exit
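The algorithm above can be rendered in C as follows (an illustrative sketch; identifiers are ours):

```c
#include <assert.h>

/* Return the index of item in the sorted array a[0..n-1], or -1. */
int binary_search(const int a[], int n, int item) {
    int beg = 0, end = n - 1;
    while (beg <= end) {
        int mid = (beg + end) / 2;  /* middle of the current sublist */
        if (a[mid] == item)
            return mid;             /* ITEM exists at location mid */
        else if (item < a[mid])
            end = mid - 1;          /* continue in the left sublist */
        else
            beg = mid + 1;          /* continue in the right sublist */
    }
    return -1;                      /* sublist became empty: not found */
}
```

Note that `(beg + end) / 2` can overflow for very large arrays; `beg + (end - beg) / 2` avoids that.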
• Fibonacci Search: Fibonacci search is an efficient searching algorithm that works on a sorted array of length n. It is a comparison-based algorithm; it returns the index of the element we want to search for in the array, or -1 if that element is not present.

• Examples:
1. Input: arr[] = {20, 30, 40, 50, 60}, x = 50
Output: 3 (Element x is present at index 3.)
2. Input: arr[] = {25, 35, 45, 55, 65}, x = 15
Output: -1 (Element x is not present.)

• Here, Fibonacci numbers are used to search for an element in a given sorted array.
Similarities with Binary Search:
• Fibonacci search works only on sorted arrays, like binary search.
• The worst-case time taken by Fibonacci search is O(log n), the same as binary search.
• Fibonacci search also uses the divide and conquer technique.
Differences with Binary Search:
• Binary search divides the array into equal parts, whereas Fibonacci search divides it into unequal parts.
• Fibonacci search does not use the "/" operator; it uses only the + and - operators.
• Fibonacci search can be preferable when the array is very large.
Background:
• F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1, defines the Fibonacci numbers recursively.
• The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …
Steps
1. Find the smallest Fibonacci number F(K) (the Kth Fibonacci number) that is greater than or equal to n.
2. If F(K) = 0, stop and report that the element is not present.
3. Set offset = -1.
4. Set index = min(offset + F(K-2), n-1).
5. If Search_Ele == A[index], return index and stop the search.
   If Search_Ele > A[index], set K = K - 1 and offset = index, then repeat steps 4 and 5.
   If Search_Ele < A[index], set K = K - 2, then repeat steps 4 and 5.
Example: A = [2, 5, 7, 13, 21, 28, 31], n = 7, SE = 21.

K:    0 1 2 3 4 5 6
F(K): 0 1 1 2 3 5 8
1. The smallest F(K) >= n is F(6) = 8, so K = 6 and offset = -1.
   index = min(offset + F(K-2), n-1) = min(-1 + 3, 7 - 1) = min(2, 6) = 2. A[2] = 7 ≠ 21.
2. A[2] = 7 < 21, so K = 5 and offset = index = 2.
   index = min(2 + F(3), 6) = min(2 + 2, 6) = 4. A[4] = 21, so the element is found at index 4.
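The steps above can be sketched in C as follows (a hedged illustration; the variables fib, fib1 and fib2 track F(K), F(K-1) and F(K-2), and all names are our own):

```c
#include <assert.h>

static int min_int(int x, int y) { return x < y ? x : y; }

/* Return the index of key in the sorted array a[0..n-1], or -1. */
int fibonacci_search(const int a[], int n, int key) {
    int fib2 = 0, fib1 = 1, fib = fib2 + fib1;   /* F(K-2), F(K-1), F(K) */
    while (fib < n) {                            /* smallest F(K) >= n   */
        fib2 = fib1; fib1 = fib; fib = fib1 + fib2;
    }
    int offset = -1;
    while (fib > 1) {
        int i = min_int(offset + fib2, n - 1);
        if (a[i] < key) {          /* key lies to the right: K = K - 1 */
            fib = fib1; fib1 = fib2; fib2 = fib - fib1;
            offset = i;
        } else if (a[i] > key) {   /* key lies to the left: K = K - 2 */
            fib = fib2; fib1 = fib1 - fib2; fib2 = fib - fib1;
        } else {
            return i;              /* found */
        }
    }
    /* one last candidate element may remain after the loop */
    if (fib1 && offset + 1 < n && a[offset + 1] == key)
        return offset + 1;
    return -1;
}
```

On the example array above, searching for 21 probes indices 2 and then 4, exactly as in the worked trace.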
• Sorting:
• Sorting is the arrangement of data in sorted order, either ascending or descending. Sorting usually comes into the picture together with searching.
• There are many things in real life that we need to search: a particular record in a database, roll numbers in a merit list, a particular telephone number, a particular page in a book, and so on. Sorting arranges data in a sequence that makes searching easier. Every record to be sorted contains a key, and the records are sorted based on that key. For example, suppose we have a set of student records, each with the fields Roll No., Name and Age. The student roll number can be taken as the key for sorting the records in ascending or descending order. Now if we have to find the student with roll no. 15, we don't need to scan the complete set of records; we can simply search among the students with roll numbers between 10 and 20.
• CATEGORIES OF SORTING
• An internal sort is any data sorting process that takes place entirely within the main memory of a computer.
This is possible whenever the data to be sorted is small enough to all be held in the main memory.
• External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External
sorting is required when the data being sorted do not fit into the main memory of a computing device
(usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External
sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file.
• We can say a sorting algorithm is stable if two objects with equal keys appear in the same order in sorted
output as they appear in the input unsorted array.
• Insertion Sort:
• It is a very simple and efficient sorting algorithm for sorting a small number of elements in which the
sorted array is built one element at a time. The main idea behind insertion sort is that it inserts each
element into its proper location in the sorted array.
• Let us take there are n elements the array arr. Then process of inserting each element in proper place is
as
• Pass 1- arr[0] is already sorted because of only one element.
• Pass 2- arr[1] is inserted before or after arr[0]. So arr[0] and arr[1] are sorted.
• Pass 3- arr[2] is inserted before arr[0] or in between arr[0] and arr[1] or after arr[1]. So arr[0], arr[1]
and arr[2] are sorted.
• Pass 4- arr[3] is inserted into its proper place in array arr[0], arr[1], arr[2] So, arr[0] arr[1] arr[2] and
arr[3] are sorted.
• …
• Pass N - arr[n-1] is inserted into its proper place among arr[0], arr[1], …, arr[n-2]. So arr[0], arr[1], …, arr[n-1] are sorted.
Insertion sort has following advantages-
• Simple implementation
• Efficient for small data sets
• Stable; i.e., does not change the relative order of elements with equal keys
• In-place; i.e., only requires a constant amount O(1) of additional memory space

Time Complexity:
• Best Case Complexity - It occurs when there is no sorting required, i.e. the array is already sorted. The best-
case time complexity of insertion sort is O(n).
• Average Case Complexity - It occurs when the array elements are in jumbled order, neither properly ascending nor properly descending. The average-case time complexity of insertion sort is O(n²).
• Worst Case Complexity - It occurs when the array elements must be sorted in reverse order, i.e., you have to sort the elements in ascending order but they arrive in descending order. The worst-case time complexity of insertion sort is O(n²).
5 2 1 3 6 4
• Compare the first and second elements, i.e., 5 and 2, and arrange them in increasing order.
2 5 1 3 6 4
• Next, sort the first three elements in increasing order.
1 2 5 3 6 4
• Next, sort the first four elements in increasing order.
1 2 3 5 6 4
• Next, sort the first five elements in increasing order (no change: 6 is already in place).
1 2 3 5 6 4
• Next, sort all six elements in increasing order.
1 2 3 4 5 6
• Algorithm:
• Insertion(int a[ ],int n)
1. Start
2. set i=1
3. repeat the steps 4,5,8 and 9 while(i<n)
4. set temp=a[i] and j=i-1
5. repeat steps 6 and 7 while j>=0 && a[j]>temp
6. set a[j+1] =a[j]
7. j=j-1
8. set a[j+1]=temp
9. set i=i+1
10.stop
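The algorithm above translates almost line for line into C (a minimal sketch; the function name is ours, while temp, i and j match the pseudocode):

```c
#include <assert.h>

/* In-place insertion sort, following the numbered steps above. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int temp = a[i];              /* element to insert in this pass */
        int j = i - 1;
        while (j >= 0 && a[j] > temp) {
            a[j + 1] = a[j];          /* shift larger elements one place right */
            j--;
        }
        a[j + 1] = temp;              /* drop temp into its proper location */
    }
}
```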
• Shell Sort:
• It is a sorting algorithm that is an extended version of insertion sort. Shell sort has improved the
average time complexity of insertion sort. As similar to insertion sort, it is a comparison-based and
in-place sorting algorithm. Shell sort is efficient for medium-sized data sets.
• In insertion sort, at a time, elements can be moved ahead by one position only. To move an element
to a far-away position, many movements are required that increase the algorithm's execution time.
But shell sort overcomes this drawback of insertion sort. It allows the movement and swapping of
far-away elements as well.
• This algorithm first sorts the elements that are far apart from each other, then successively reduces the gap between them. This gap is called the interval.
• Some of the interval sequences that can be used in the shell sort algorithm are:
• Shell's original sequence: N/2, N/4, …, 1
• Papernov & Stasevich increments: 1, 3, 5, 9, 17, 33, 65, …
• Hibbard's increments: 1, 3, 7, 15, 31, 63, 127, 255, 511, …
• Pratt's increments (the 3-smooth numbers): 1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 24, 27, …
Consider the array 33, 31, 40, 8, 12, 17, 25, 42 (n = 8). In the first loop, elements lie at an interval of 4 (n/2 = 4), giving the sublists {33, 12}, {31, 17}, {40, 25}, {8, 42}.

Now we compare the values in every sublist and swap them, if required, in the original array. After comparing and swapping, the updated array is 12, 17, 25, 8, 33, 31, 40, 42.
• In the second loop, elements lie at an interval of 2 (n/4 = 2).
• With an interval of 2, two sublists are generated: {12, 25, 33, 40} and {17, 8, 31, 42}.

Again we compare the values in every sublist and swap them, if required, in the original array. After comparing and swapping, the updated array is 12, 8, 25, 17, 33, 31, 40, 42.
• In the third loop, elements lie at an interval of 1 (n/8 = 1). In this last step, shell sort uses ordinary insertion sort to finish sorting the array, producing 8, 12, 17, 25, 31, 33, 40, 42.
• Algorithm:
1. ShellSort(a, n) // 'a' is the given array, 'n' is the size of the array
2. for (gap = n/2; gap > 0; gap = gap/2)
3.    for (j = gap; j < n; j++)
4.       for (i = j - gap; i >= 0; i = i - gap)
5.          if (a[i + gap] >= a[i])
                break;
            else
                swap(a[i + gap], a[i])
            [End of if]
         [End of for]
      [End of for]
6. End ShellSort
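A C sketch of shell sort, written as a gapped insertion sort with Shell's original sequence N/2, N/4, …, 1 (a common equivalent formulation of the swap-based pseudocode above; the function name is ours):

```c
#include <assert.h>

/* Shell sort with Shell's original gap sequence n/2, n/4, ..., 1. */
void shell_sort(int a[], int n) {
    for (int gap = n / 2; gap > 0; gap /= 2) {
        /* gapped insertion sort: each sublist of elements gap apart */
        for (int j = gap; j < n; j++) {
            int temp = a[j];
            int i = j;
            while (i >= gap && a[i - gap] > temp) {
                a[i] = a[i - gap];   /* shift within the gap-sublist */
                i -= gap;
            }
            a[i] = temp;
        }
    }
}
```

The final gap of 1 is exactly an ordinary insertion sort, but by then far-away elements have already moved close to their final positions.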
• The performance of the shell sort depends on the type of sequence used for a given input array.
• Best Case Complexity - It occurs when no sorting is required, i.e., the array is already sorted. The best-case time complexity of shell sort is O(n log n).
• Average Case Complexity - It occurs when the array elements are in jumbled order, neither properly ascending nor properly descending. The average-case time complexity of shell sort is around O(n log n), depending on the gap sequence.
• Worst Case Complexity - It occurs when the array elements must be sorted in reverse order, i.e., you have to sort in ascending order but the elements are in descending order. The worst-case time complexity of shell sort (with Shell's original gaps) is O(n²).
• Heap sort is a comparison-based sorting technique based on Binary Heap data structure.
• Binary Heap
A Binary Heap is a Complete Binary Tree where items are stored in a special order such that the value
in a parent node is greater or smaller than the values in its two children nodes. The former is called max
heap and the latter is called min-heap. A complete binary tree is a binary tree in which every level,
except possibly the last, is completely filled, and all nodes are as far left as possible.
• The heap can be represented by a binary tree or array. Since a Binary Heap is a Complete Binary Tree, it
can be easily represented as an array and the array-based representation is space-efficient. If the parent
node is stored at index I, the left child can be calculated by 2 * I + 1 and the right child by 2 * I + 2 .
• The process of reshaping a binary tree into a heap data structure is known as 'heapify'. A binary tree is a tree data structure in which each node has at most two children. The 'heapify' process can be applied to a node only if its children are already heapified. A heap should always be a complete binary tree.
• Starting from a complete binary tree, we can modify it to become a max-heap by running a function called 'heapify' on all the non-leaf nodes of the heap, i.e. 'heapify' uses recursion.
Example:
Input data: 4, 10, 3, 5, 1
4(0)
/ \
10(1) 3(2)
/ \
5(3) 1(4)
The numbers in bracket represent the indices in the array representation of data.
Applying heapify procedure to index 1 (no change: children 5 and 1 are already smaller than 10):
4(0)
/ \
10(1) 3(2)
/ \
5(3) 1(4)

Applying heapify procedure to index 0:


10(0)
/ \
5(1) 3(2)
/ \
4(3) 1(4)
The heapify procedure calls itself recursively to build heap in top down manner.
Algorithm for "heapify" (applied at a node i; shown here for the root, i = 0):
heapify(array, i)
Largest = index of the largest among array[i], array[2*i + 1], array[2*i + 2]
if (i != Largest)
Swap(array[i], array[Largest])
heapify(array, Largest) // fix the subtree that was disturbed by the swap

Heap Sort Algorithm for sorting in increasing order:


1. Build a max heap from the input data.
2. At this point, the largest item is stored at the root of the heap. Replace it with the last item of the heap
followed by reducing the size of heap by 1. Finally, heapify the root of the tree.
3. Repeat step 2 while the size of the heap is greater than 1 (i.e., until the heap contains only one element).
• MaxHeapify(A, i)
largest = i
l = left(i)
r = right(i)
if l <= heap-size[A] and A[l] > A[largest]
then largest = l
if r <= heap-size[A] and A[r] > A[largest]
then largest = r
if largest != i
then swap A[i] with A[largest]
MaxHeapify(A, largest)
• end func
• BuildMaxHeap(A)
heap-size[A] = length[A]
for i = |length[A]/2| downto 1
do MaxHeapify(A, i)
• end func
• HeapSort(A)
BuildMaxHeap(A)
for i = length[A] downto 1
do swap A[1] with A[i]
heap-size[A] = heap-size[A] – 1
MaxHeapify(A, 1)
• end func
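A C rendering of the heap sort pseudocode above (note: the pseudocode is 1-indexed, while this sketch uses 0-based array indices, so the children of node i are 2i+1 and 2i+2; function names are ours):

```c
#include <assert.h>

/* Restore the max-heap property for the subtree rooted at index i,
   where n is the current heap size. */
static void max_heapify(int a[], int n, int i) {
    int largest = i;
    int l = 2 * i + 1, r = 2 * i + 2;
    if (l < n && a[l] > a[largest]) largest = l;
    if (r < n && a[r] > a[largest]) largest = r;
    if (largest != i) {
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        max_heapify(a, n, largest);   /* fix the disturbed subtree */
    }
}

void heap_sort(int a[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)  /* build a max heap bottom-up */
        max_heapify(a, n, i);
    for (int i = n - 1; i > 0; i--) {
        int t = a[0]; a[0] = a[i]; a[i] = t;  /* move the max to the end */
        max_heapify(a, i, 0);                 /* heapify the reduced heap */
    }
}
```

Running this on the example input 4, 10, 3, 5, 1 first builds the max heap from the figure and then extracts the maximum repeatedly.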
Quick Sort

• The Quick sort algorithm was developed by a British computer scientist Tony Hoare in 1959. The name
"Quick Sort" comes from the fact that, quick sort is capable of sorting a list of data elements significantly
faster than other sorting algorithms.
• It is one of the most efficient sorting algorithms and is based on the divide and conquer strategy. It picks an element as the pivot and partitions the given array around it: the list is separated into two parts, and each part is then sorted recursively.
• The pivot is one of the elements in the list. The list is divided into two partitions such that all elements to the left of the pivot are smaller than the pivot, and all elements to the right of the pivot are greater than or equal to the pivot. There are many different versions of quick sort that pick the pivot in different ways:
1. Always pick first element as pivot.
2. Always pick last element as pivot.
3. Pick a random element as pivot.
4. Pick median as pivot.
• The key process in Quick Sort is partition().
The steps in the Quick sort algorithm are:
1. Select Pivot Element.
2. Split the given array with respect to pivot element in a way such that every element to the left of the
pivot is smaller than or equal to the pivot and every element to the right of the pivot is greater than or
equal to the pivot.
3. Repeat step 1 and 2 with the left sub list.
4. Repeat step 1 and 2 with the right sub list.
In the quick sort algorithm, partitioning of the list is performed using the following steps...
1. Consider the first element of the list as the pivot (i.e., the element at the first position in the list).
2. Define two variables i and j, and set them to the first and last positions of the list respectively.
3. Increment i while list[i] <= pivot, i.e., stop at the first element greater than the pivot.
4. Decrement j while list[j] > pivot, i.e., stop at the first element less than or equal to the pivot.
5. If i < j, then exchange list[i] and list[j].
6. Repeat steps 3, 4 and 5 until i >= j.
7. Exchange the pivot element with list[j].
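The partitioning steps above, with the first element as pivot, can be sketched in C as follows (an illustration; helper names are ours):

```c
#include <assert.h>

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Partition a[low..high] around the first element, following the steps above. */
static int partition(int a[], int low, int high) {
    int pivot = a[low];
    int i = low, j = high;
    while (i < j) {
        while (i < high && a[i] <= pivot) i++;  /* stop at element > pivot  */
        while (a[j] > pivot) j--;               /* stop at element <= pivot */
        if (i < j) swap_int(&a[i], &a[j]);
    }
    swap_int(&a[low], &a[j]);  /* put the pivot between the two partitions */
    return j;                  /* the pivot's final position */
}

void quick_sort(int a[], int low, int high) {
    if (low < high) {
        int p = partition(a, low, high);
        quick_sort(a, low, p - 1);   /* step 3: left sublist  */
        quick_sort(a, p + 1, high);  /* step 4: right sublist */
    }
}
```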

Merge Sort
• Merge sort is a comparison-based sorting algorithm based on the divide and conquer rule. The algorithm divides the array into equal halves and then combines them in a sorted manner. It costs extra space, but it is a stable algorithm: it preserves the input order of equal elements in the sorted output.
• It is a well-respected and trusted sorting algorithm because of its worst-case time complexity of O(n log n).
Algorithm
• Step 1: if there is only one element in the list, it is already sorted; return.
• Step 2: divide the list recursively into two halves until it can no more be divided.
• Step 3: merge the smaller lists into new list in sorted order.
Merge sort keeps on dividing the list into equal halves until it can no more be divided. Then, merge
sort combines the smaller sorted lists keeping the new list sorted too.
MERGE_SORT(arr, beg, end)

1. if beg < end


2. set mid = (beg + end)/2
3. MERGE_SORT(arr, beg, mid)
4. MERGE_SORT(arr, mid + 1, end)
5. MERGE (arr, beg, mid, end)
6. end of if

END MERGE_SORT

The important part of the merge sort is the MERGE function. This function performs the merging of
two sorted sub-arrays that are A[beg…mid] and A[mid+1…end], to build one sorted
array A[beg…end]. So, the inputs of the MERGE function are A[], beg, mid, and end.
• void merge(int A[], int start, int mid, int end) {
      /* p and q hold the current positions in the two sorted parts. */
      int p = start, q = mid + 1;
      int Arr[end - start + 1], k = 0;
      for (int i = start; i <= end; i++) {
          if (p > mid)              /* first part is exhausted */
              Arr[k++] = A[q++];
          else if (q > end)         /* second part is exhausted */
              Arr[k++] = A[p++];
          else if (A[p] < A[q])     /* take the smaller front element */
              Arr[k++] = A[p++];
          else
              Arr[k++] = A[q++];
      }
      /* Copy the merged result back so A[start..end] is sorted. */
      for (int i = 0; i < k; i++)
          A[start++] = Arr[i];
  }
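Putting MERGE_SORT and MERGE together gives a complete C sketch (the merge helper repeats the logic of the function above so that this block compiles on its own):

```c
#include <assert.h>

/* Merge the sorted halves A[start..mid] and A[mid+1..end]. */
static void merge(int A[], int start, int mid, int end) {
    int Arr[end - start + 1], k = 0;
    int p = start, q = mid + 1;
    for (int i = start; i <= end; i++) {
        if (p > mid)           Arr[k++] = A[q++];  /* first half done  */
        else if (q > end)      Arr[k++] = A[p++];  /* second half done */
        else if (A[p] < A[q])  Arr[k++] = A[p++];  /* smaller element  */
        else                   Arr[k++] = A[q++];
    }
    for (int i = 0; i < k; i++) A[start++] = Arr[i];
}

/* Recursive driver matching the MERGE_SORT pseudocode above. */
void merge_sort(int A[], int beg, int end) {
    if (beg < end) {
        int mid = (beg + end) / 2;
        merge_sort(A, beg, mid);      /* sort the left half     */
        merge_sort(A, mid + 1, end);  /* sort the right half    */
        merge(A, beg, mid, end);      /* combine the two halves */
    }
}
```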
Radix sort
• Radix sort is one of the sorting algorithms used to sort a list of integer numbers in order. In radix
sort algorithm, a list of integer numbers will be sorted based on the digits of individual numbers.
Sorting is performed from least significant digit to the most significant digit.
• The radix sort algorithm requires as many passes as there are digits in the largest number in the list. For example, if the largest number has 3 digits, the list is sorted in 3 passes.
• The Radix sort algorithm is performed using the following steps...
Step 1 - Define 10 queues each representing a bucket for each digit from 0 to 9.
Step 2 - Consider the least significant digit of each number in the list which is to be sorted.
Step 3 - Insert each number into their respective queue based on the least significant digit.
Step 4 - Group all the numbers from queue 0 to queue 9 in the order they have inserted into their
respective queues.
Step 5 - Repeat from step 3 based on the next least significant digit.
Step 6 - Repeat from step 2 until all the numbers are grouped based on the most significant digit.
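The queue-per-digit scheme above is commonly implemented with a count per digit instead of ten explicit queues; here is a C sketch along those lines for non-negative integers (an assumption of ours; the steps above do not restrict the sign):

```c
#include <assert.h>

/* LSD radix sort on non-negative integers, one pass per decimal digit. */
void radix_sort(int a[], int n) {
    int max = a[0];
    for (int i = 1; i < n; i++)          /* largest number fixes pass count */
        if (a[i] > max) max = a[i];

    int out[n];                          /* C99 variable-length array */
    for (int exp = 1; max / exp > 0; exp *= 10) {
        int count[10] = {0};             /* bucket sizes for digits 0..9 */
        for (int i = 0; i < n; i++)
            count[(a[i] / exp) % 10]++;
        for (int d = 1; d < 10; d++)     /* prefix sums: bucket end positions */
            count[d] += count[d - 1];
        for (int i = n - 1; i >= 0; i--) /* right-to-left scan keeps it stable */
            out[--count[(a[i] / exp) % 10]] = a[i];
        for (int i = 0; i < n; i++)
            a[i] = out[i];               /* regroup buckets 0..9 in order */
    }
}
```

Stability of each pass is what lets later (more significant) digit passes preserve the order established by earlier ones.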
• Hashing:
• Hashing is a searching technique that takes constant time on average; its expected time complexity is O(1). So far we have studied two searching techniques, linear search and binary search, whose worst-case time complexities are O(n) and O(log n) respectively. The drawback of both is that the time taken to search grows as the number of elements increases, which becomes problematic when the total number of elements is very large.
• In both of those techniques the search time depends on the number of elements, but we want a technique that takes constant time. Hashing provides that, on average.
• Hashing is well known technique to search any particular element among several elements. It minimizes the
number of comparisons while performing the search.
• In Hashing technique, the hash table and hash function are used. Using the hash function, we can calculate
the address at which the value can be stored.
• In hashing, large keys are converted into small keys by using hash functions. The values are stored in a data
structure called hash table.
• Hash key value is a special value that serves as an index for a data item. It indicates where the data item
should be stored in the hash table. Hash key value is generated using a hash function.

• E.g. In colleges, each student is assigned a unique roll number that can be used to retrieve information
about them.
• Types of Hash Functions-
There are various types of hash functions available such as-
• Mid Square Hash Function
• Division Hash Function
• Folding Hash Function etc.

The properties of a good hash function are-


• It should be easy to compute.
• It should minimize the number of collisions.
• It should distribute the keys uniformly over the table.

Collision
• No matter what the hash function, there is possibility that two different keys could resolve to the same hash
address. This situation is known as collision.
Handling the collision:
• Separate Chaining (Open hashing)
• Open addressing (closed hashing) ( Linear probing, Quadratic probing, random probing, Double hashing)
Fig: Division method (hash address = key mod table size)
Chaining: To handle a collision, this technique creates a linked list at the slot for which the collision occurs.
• The new key is then inserted in that linked list.
• These linked lists hanging off the slots look like chains.
• That is why this technique is called separate chaining.
• Open Addressing:

• Linear Probing - In linear probing, when a collision occurs, we linearly probe for the next bucket. We keep probing until an empty bucket is found.

• Quadratic Probing - In quadratic probing, when a collision occurs, we probe for the (i²)-th bucket in the i-th iteration. We keep probing until an empty bucket is found.

• Double Hashing - In double hashing, we use another hash function hash2(x) and look for the (i * hash2(x))-th bucket in the i-th iteration. It requires more computation time, as two hash functions need to be computed.
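As a small illustration of open addressing, here is a linear-probing hash table in C using the division method h(k) = k mod table size (a sketch under our own conventions: a fixed table of 10 slots, -1 marking an empty slot, and the simplifying assumption that the table never becomes completely full on insert):

```c
#include <assert.h>

#define TABLE_SIZE 10
#define EMPTY (-1)

/* Insert key using h(k) = k % TABLE_SIZE with linear probing.
   Assumes the table is never completely full. */
void hash_insert(int table[], int key) {
    int i = key % TABLE_SIZE;
    while (table[i] != EMPTY)          /* collision: probe the next bucket */
        i = (i + 1) % TABLE_SIZE;
    table[i] = key;                    /* empty bucket found */
}

/* Return the slot holding key, or -1 if the key is not in the table. */
int hash_search(const int table[], int key) {
    int start = key % TABLE_SIZE, i = start;
    while (table[i] != EMPTY) {
        if (table[i] == key)
            return i;
        i = (i + 1) % TABLE_SIZE;      /* keep probing along the cluster */
        if (i == start)
            break;                     /* wrapped all the way around */
    }
    return -1;
}
```

With keys 15, 25 and 35, which all hash to slot 5, the probe sequence places them in slots 5, 6 and 7, forming the kind of cluster linear probing is known for.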