Data Structures and
Algorithm Analysis
Lecturer: Jing Liu
Email: [email protected]
Homepage: http://see.xidian.edu.cn/faculty/liujing
Textbook
Mark Allen Weiss, Data
Structures and Algorithm
Analysis in C, China Machine
Press.
Grading
Final exam: 70%
Others: 30%
What are Data
Structures and
Algorithms?
Data Structures are methods of
organizing large amounts of data.
An algorithm is a procedure consisting of a
finite set of instructions. Given an input from
some set of possible inputs, a systematic
execution of the instructions produces an output
if such an output exists, and produces nothing
if there is no output for that particular input.
[Figure: Inputs (Problems) -> Instructions -> Outputs (Answers), carried out by Computers]
[Figure: related topics: Programming Languages, Data Structures, Algorithms, Software Systems]
Contents
Chapter 3 Lists, Stacks, and Queues
Chapter 4 Trees
Chapter 5 Hashing
Chapter 6 Priority Queues (Heaps)
Chapter 7 Sorting
Chapter 8 The Disjoint Set ADT
Chapter 9 Graph Algorithms
Chapter 10 Algorithm Design Techniques
Abstract Data Types (ADTs)
One of the basic rules concerning programming is
to break the program down into modules.
Each module is a logical unit and does a specific
job. Its size is kept small by calling other modules.
Modularity has several advantages: (1) it is much
easier to debug small routines than large routines;
(2) it is easier for several people to work on a
modular program simultaneously; (3) a well-written
modular program places certain dependencies in
only one routine, making changes easier.
Abstract Data Types (ADTs)
An abstract data type (ADT) is a set of operations.
Abstract data types are mathematical abstractions;
nowhere in an ADT’s definition is there any mention
of how the set of operations is implemented.
Objects such as lists, sets, and graphs, along with
their operations, can be viewed as abstract data
types, just as integers, reals, and booleans are data
types. Integers, reals, and booleans have operations
associated with them, and so do ADTs.
Abstract Data Types (ADTs)
The basic idea is that the implementation of the
operations related to ADTs is written once in the
program, and any other part of the program that
needs to perform an operation on the ADT can do
so by calling the appropriate function.
If for some reason implementation details need to
be changed, it should be easy to do so by merely
changing the routines that perform the ADT
operations.
There is no rule telling us which operations must
be supported for each ADT; this is a design decision.
The List ADT
The form of a general list: A_1, A_2, A_3, …, A_N;
The size of this list is N;
An empty list is a special list of size 0;
For any list except the empty list, we say that A_{i+1}
follows (or succeeds) A_i (i < N) and that A_{i-1}
precedes A_i (i > 1);
The first element of the list is A_1, and the last
element is A_N. We will not define the predecessor
of A_1 or the successor of A_N.
The position of element A_i in a list is i.
The List ADT
There is a set of operations that we would like
to perform on the list ADT:
PrintList
MakeEmpty
Find: return the position of the first occurrence of a key
Insert and Delete: insert and delete some key from
some position in the list
FindKth: return the element in some position
Next and Previous: take a position as argument and
return the position of the successor and predecessor
The List ADT
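Example: The list is 34, 12, 52, 16, 13
Find(52) might return 3, the position of 52.
Insert(X, 3) might make the list 34, 12, 52, X, 16, 13
(if we insert after the given position).
Delete(52) might then turn that list into 34, 12, X, 16, 13.
The interpretation of what is appropriate for a
function is entirely up to the programmer.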
Simple Array
Implementation of Lists
All of these list operations can be
implemented by using an array.
PrintList
MakeEmpty
Find
Insert
Delete
Next
Previous
Simple Array
Implementation of Lists
Disadvantages:
An estimate of the maximum size of the list is
required, even if the array is dynamically
allocated. Usually this requires a high
overestimate, which wastes considerable
space.
Insertion and deletion are expensive. For
example, inserting at position 0 requires first
pushing the entire array down one spot to
make room.
Because the running time for insertions and deletions
is so slow and the list size must be known in advance,
simple arrays are generally not used to implement lists.
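The cost described above can be seen in a minimal sketch of array-based insertion and deletion. The names `ArrayList`, `ArrayInsert`, `ArrayDelete`, and the capacity `MAX_SIZE` are assumptions made for illustration and do not come from the slides; positions here are 0-based.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIZE 100   /* assumed capacity; must be estimated in advance */

typedef struct {
    int    Elements[MAX_SIZE];
    size_t Size;               /* number of elements currently stored */
} ArrayList;

/* Insert X at 0-based position Pos, shifting later elements down one spot.
   Returns 0 on success, -1 if the list is full or Pos is out of range. */
int ArrayInsert(ArrayList *L, int X, size_t Pos)
{
    assert(L != NULL);
    if (L->Size >= MAX_SIZE || Pos > L->Size)
        return -1;
    for (size_t i = L->Size; i > Pos; i--)   /* up to N element moves */
        L->Elements[i] = L->Elements[i - 1];
    L->Elements[Pos] = X;
    L->Size++;
    return 0;
}

/* Delete the element at position Pos, shifting later elements up one spot.
   Returns 0 on success, -1 if Pos is out of range. */
int ArrayDelete(ArrayList *L, size_t Pos)
{
    assert(L != NULL);
    if (Pos >= L->Size)
        return -1;
    for (size_t i = Pos; i + 1 < L->Size; i++)
        L->Elements[i] = L->Elements[i + 1];
    L->Size--;
    return 0;
}
```

Inserting at position 0 moves every stored element, which is the linear cost the slide refers to.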
Linked Lists
In order to avoid the linear cost of insertion
and deletion, we need to ensure that the list is
not stored contiguously, since otherwise entire
parts of the list will need to be moved.
[Figure: a linked list A1 -> A2 -> A3 -> A4 -> A5]
The linked list consists of a series of structures, which are not
necessarily adjacent in memory.
Each structure contains the element and a pointer to a
structure containing its successor. We call this the Next pointer.
The last cell’s Next pointer points to NULL.
Linked Lists
If P is declared to be a pointer to a structure, then the value stored
in P is interpreted as the location, in main memory, where a structure
can be found.
A field of that structure can be accessed by P->FieldName,
where FieldName is the name of the field we wish to examine.
Linked list with actual pointer values:
Address   Element   Next
1000      A1        800
800       A2        712
712       A3        992
992       A4        692
692       A5        0
In order to access this list, we need to know where the first cell
can be found. A pointer variable can be used for this purpose.
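A minimal sketch of such a cell and of traversal by Next pointers follows. The structure name `Node`, the `int` element type, and the 1-based `FindKth` convention are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    int          Element;
    struct Node *Next;    /* NULL marks the last cell */
};

/* Find: return the first cell holding Key, or NULL if Key is absent. */
struct Node *Find(struct Node *First, int Key)
{
    struct Node *P = First;
    while (P != NULL && P->Element != Key)
        P = P->Next;
    return P;
}

/* FindKth: return the element at 1-based position K.
   The caller must ensure that position K exists. */
int FindKth(const struct Node *First, size_t K)
{
    const struct Node *P = First;
    assert(K >= 1);
    while (--K > 0 && P != NULL)
        P = P->Next;
    assert(P != NULL);    /* position K must exist */
    return P->Element;
}
```

Both routines take only a pointer to the first cell, so each runs in time proportional to the number of cells visited.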
Linked Lists
To execute PrintList(L) or Find(L, Key), we merely pass a
pointer to the first element in the list and then traverse
the list by following the Next pointers.
The Delete command can be executed in one pointer change.
[Figure: deletion from the linked list A1 ... A5 by redirecting one Next pointer]
The Insert command requires obtaining a new cell
from the system by using a malloc call and then
executing two pointer maneuvers.
[Figure: insertion of a new cell X into the linked list A1 ... A5]
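The pointer moves for Insert and Delete can be sketched as follows, assuming we already hold a pointer P to the cell before the affected position. `InsertAfter` and `DeleteAfter` are hypothetical helper names, not routines from the slides.

```c
#include <assert.h>
#include <stdlib.h>

struct Node {
    int          Element;
    struct Node *Next;
};

/* Insert X after cell P: one malloc call, then two pointer changes.
   Returns the new cell, or NULL if malloc fails. */
struct Node *InsertAfter(struct Node *P, int X)
{
    struct Node *Tmp;
    assert(P != NULL);
    Tmp = malloc(sizeof *Tmp);
    if (Tmp == NULL)
        return NULL;
    Tmp->Element = X;
    Tmp->Next = P->Next;   /* pointer change 1: new cell points to successor */
    P->Next = Tmp;         /* pointer change 2: P points to new cell */
    return Tmp;
}

/* Delete the cell after P: one pointer change, then free the cell.
   Does nothing if P is the last cell. */
void DeleteAfter(struct Node *P)
{
    struct Node *Old;
    assert(P != NULL);
    Old = P->Next;
    if (Old != NULL) {
        P->Next = Old->Next;   /* bypass the deleted cell */
        free(Old);
    }
}
```

Note that both helpers need the predecessor cell, which is why deleting or inserting at the very front is awkward without further conventions.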
Linked Lists
There are several places where you are likely
to go wrong:
(1) There is no really obvious way to insert at the front of the list from the definitions given;
(2) Deleting from the front of the list is a special case, because it changes the start of the list;
careless coding will lose the list;
(3) A third problem concerns deletion in general. Although the pointer moves above are simple,
the deletion algorithm requires us to keep track of the cell before the one that we want to delete.
Linked Lists
One simple change solves all three problems.
We will keep a sentinel node, referred to as a
header or dummy node.
[Figure: linked list with a header: L -> Header -> A1 -> A2 -> A3 -> A4 -> A5]
To avoid the problems associated with deletions, we need
to write a routine FindPrevious, which will return the
position of the predecessor of the cell we wish to delete.
If we use a header, then if we wish to delete the first
element in the list, FindPrevious will return the position
of the header.
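A minimal sketch of a list with a header cell is given below. The type names `List` and `Position`, the `int` element type, and the exact signatures are assumptions for illustration; only the idea of `FindPrevious` comes from the slide.

```c
#include <assert.h>
#include <stdlib.h>

struct Node {
    int          Element;
    struct Node *Next;
};
typedef struct Node *List;       /* points to the header cell */
typedef struct Node *Position;

/* Create an empty list: a header cell whose Next is NULL. */
List MakeEmpty(void)
{
    List L = malloc(sizeof *L);
    if (L != NULL)
        L->Next = NULL;
    return L;
}

/* Insert X after position P. Passing the header inserts at the front.
   Returns 0 on success, -1 if malloc fails. */
int Insert(int X, Position P)
{
    Position Tmp;
    assert(P != NULL);
    Tmp = malloc(sizeof *Tmp);
    if (Tmp == NULL)
        return -1;
    Tmp->Element = X;
    Tmp->Next = P->Next;
    P->Next = Tmp;
    return 0;
}

/* Return the cell before the first occurrence of X.
   If X is absent, the last cell is returned (its Next is NULL). */
Position FindPrevious(int X, List L)
{
    Position P = L;
    assert(L != NULL);
    while (P->Next != NULL && P->Next->Element != X)
        P = P->Next;
    return P;
}

/* Delete the first occurrence of X; the first element is not a special case. */
void Delete(int X, List L)
{
    Position P = FindPrevious(X, L);
    if (P->Next != NULL) {
        Position Tmp = P->Next;
        P->Next = Tmp->Next;
        free(Tmp);
    }
}
```

Because the header always exists, inserting at the front and deleting the first element use the same code path as every other position.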
Doubly Linked Lists
Sometimes it is convenient to traverse lists
backwards. The solution is simple. Merely add
an extra field to the data structure, containing
a pointer to the previous cell. The cost of this
is an extra link, which adds to the space
requirement and also doubles the cost of
insertions and deletions because there are
more pointers to fix.
[Figure: a doubly linked list A1 <-> A2 <-> A3 <-> A4 <-> A5]
How to implement doubly linked lists?
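One possible answer is sketched below, assuming a header cell so that every real cell has a non-NULL predecessor. The names `DNode`, `DInsertAfter`, and `DDelete` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct DNode {
    int           Element;
    struct DNode *Prev;   /* pointer to the previous cell */
    struct DNode *Next;   /* pointer to the next cell, NULL at the end */
};

/* Insert X after cell P, fixing up to four pointers.
   Returns the new cell, or NULL if malloc fails. */
struct DNode *DInsertAfter(struct DNode *P, int X)
{
    struct DNode *Tmp;
    assert(P != NULL);
    Tmp = malloc(sizeof *Tmp);
    if (Tmp == NULL)
        return NULL;
    Tmp->Element = X;
    Tmp->Prev = P;
    Tmp->Next = P->Next;
    if (P->Next != NULL)
        P->Next->Prev = Tmp;
    P->Next = Tmp;
    return Tmp;
}

/* Unlink and free cell P. P must not be the header,
   so P->Prev is assumed to be non-NULL. */
void DDelete(struct DNode *P)
{
    assert(P != NULL && P->Prev != NULL);
    P->Prev->Next = P->Next;
    if (P->Next != NULL)
        P->Next->Prev = P->Prev;
    free(P);
}
```

Unlike the singly linked version, `DDelete` needs no FindPrevious step, at the price of maintaining the extra Prev link on every update.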
Circularly Linked Lists
A popular convention is to have the last cell keep a
pointer back to the first. This can be done with or
without a header. If the header is present, the last
cell points to it.
It can also be done with doubly linked lists: the first
cell’s previous pointer points to the last cell.
[Figure: a double circularly linked list A1 <-> A2 <-> A3 <-> A4 <-> A5, with A5 linked back to A1]
Example – The
Polynomial ADT
$F(X) = \sum_{i=0}^{N} A_i X^i$
If most of the coefficients A_i are nonzero, we
can use a simple array to store the
coefficients.
Exercise: write code to calculate F(X) based on the array.
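A minimal sketch of array-based evaluation follows. It uses Horner's rule, which is a choice made here rather than something the slides prescribe, and it assumes the coefficient A_i is stored at index i of a `double` array. The name `EvalPoly` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate F(X) = Coeff[0] + Coeff[1]*X + ... + Coeff[N]*X^N
   with Horner's rule: N multiplications and N additions. */
double EvalPoly(const double Coeff[], size_t N, double X)
{
    double Result = 0.0;
    assert(Coeff != NULL);
    for (size_t i = N + 1; i-- > 0; )    /* i runs from N down to 0 */
        Result = Result * X + Coeff[i];
    return Result;
}
```

For example, with coefficients {1, 2, 3} (that is, 1 + 2X + 3X^2) and X = 2, the loop computes 3, then 3*2 + 2 = 8, then 8*2 + 1 = 17.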
Example – The
Polynomial ADT
$P_1(X) = 10X^{1000} + 5X^{14} + 1$
If most of the coefficients Ai are zero, the
array-based implementation is not efficient, since
most of the time is spent multiplying by zeros.
[Figure: linked list P1 -> (10, 1000) -> (5, 14) -> (1, 0); each cell holds a coefficient and an exponent]
An alternative is to use a singly linked list.
Each term in the polynomial is contained in one cell,
and the cells are sorted in decreasing order of
exponents.
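The linked representation can be sketched as follows. The names `Term`, `Power`, and `EvalSparse` are assumptions for illustration, and exponentiation by repeated squaring is one possible way to compute each power, not something taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

struct Term {
    double       Coefficient;
    unsigned     Exponent;
    struct Term *Next;     /* next term, which has a smaller exponent */
};

/* X raised to a non-negative integer power by repeated squaring. */
static double Power(double X, unsigned E)
{
    double Result = 1.0;
    while (E > 0) {
        if (E & 1u)
            Result *= X;
        X *= X;
        E >>= 1;
    }
    return Result;
}

/* Evaluate the polynomial by visiting only the stored (nonzero) terms. */
double EvalSparse(const struct Term *P, double X)
{
    double Sum = 0.0;
    for (; P != NULL; P = P->Next) {
        /* cells must be sorted in decreasing order of exponents */
        assert(P->Next == NULL || P->Next->Exponent < P->Exponent);
        Sum += P->Coefficient * Power(X, P->Exponent);
    }
    return Sum;
}
```

For P1(X) above, the list has three cells, so evaluation touches three terms instead of the 1001 array entries a dense representation would scan.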
Stack ADT
A stack is a list with the restriction that
insertions and deletions can be performed in
only one position, namely, the end of the list,
called the top.
The fundamental operations on a stack are
Push, which is equivalent to an insert, and Pop,
which deletes the most recently inserted
element.
The most recently inserted element can be
examined prior to performing a Pop by use of
the Top routine.
Stack ADT
A Pop or Top on an empty stack is generally
considered an error in the stack ADT.
Running out of space when performing a Push
is an implementation error but not an ADT
error.
Stacks are sometimes known as LIFO (last in,
first out) lists.
Stack ADT
[Figure: stack model with Top marking the top element; only the top element is accessible]
Implementation of
Stacks
Since a stack is a list, any list implementation
will do.
We will give two popular implementations. One
uses pointers and the other uses an array.
In either case, if we use good
programming principles, the calling routines
do not need to know which method is being
used.
Linked List
Implementation of
Stacks
We perform a Push by inserting at the front of
the list
We perform a Pop by deleting the element at
the front of the list
A Top operation merely examines the element
at the front of the list, returning its value.
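These three operations can be sketched as follows, assuming a stack is represented by a pointer to its top cell (NULL when empty) and that elements are `int`. The names and signatures are illustrative choices, and a Pop or Top on an empty stack is treated as a programming error via `assert`.

```c
#include <assert.h>
#include <stdlib.h>

struct StackNode {
    int               Element;
    struct StackNode *Next;
};
typedef struct StackNode *Stack;   /* points to the top cell; NULL when empty */

int IsEmpty(Stack S)
{
    return S == NULL;
}

/* Push: insert X at the front of the list.
   Returns 0 on success, -1 if malloc fails. */
int Push(Stack *S, int X)
{
    struct StackNode *Tmp = malloc(sizeof *Tmp);
    if (Tmp == NULL)
        return -1;
    Tmp->Element = X;
    Tmp->Next = *S;
    *S = Tmp;
    return 0;
}

/* Top: examine the element at the front of the list. */
int Top(Stack S)
{
    assert(!IsEmpty(S));
    return S->Element;
}

/* Pop: delete the element at the front of the list. */
void Pop(Stack *S)
{
    struct StackNode *Old;
    assert(!IsEmpty(*S));
    Old = *S;
    *S = Old->Next;
    free(Old);
}
```

Every operation touches only the front cell, so each takes constant time apart from the cost of `malloc` and `free`.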
Array Implementation
of Stacks
A Stack is defined as a pointer to a structure.
The structure contains the TopOfStack and Capacity
fields. Once the maximum size is known,
the stack array can be dynamically allocated.
Associated with each stack is TopOfStack, which
is -1 for an empty stack (this is how an empty
stack is initialized).
Array Implementation
of Stacks
To push some element X onto the stack, we
increment TopOfStack and then set
Stack[TopOfStack]=X, where Stack is the array
representing the actual stack.
To pop, we set the return value to
Stack[TopOfStack] and then decrement TopOfStack.
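The array version can be sketched as follows. The field names TopOfStack and Capacity come from the slides; the structure name `StackRecord`, the `int` element type, the return codes, and the helper `DisposeStack` are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct StackRecord {
    int  Capacity;
    int  TopOfStack;   /* -1 for an empty stack */
    int *Array;        /* dynamically allocated stack array */
};
typedef struct StackRecord *Stack;

/* Allocate a stack holding at most MaxElements; returns NULL on failure. */
Stack CreateStack(int MaxElements)
{
    Stack S;
    assert(MaxElements > 0);
    S = malloc(sizeof *S);
    if (S == NULL)
        return NULL;
    S->Array = malloc(sizeof *S->Array * (size_t)MaxElements);
    if (S->Array == NULL) {
        free(S);
        return NULL;
    }
    S->Capacity = MaxElements;
    S->TopOfStack = -1;
    return S;
}

int IsEmpty(Stack S) { return S->TopOfStack == -1; }
int IsFull(Stack S)  { return S->TopOfStack == S->Capacity - 1; }

/* Push: increment TopOfStack, then store X. Returns -1 if the stack is full. */
int Push(Stack S, int X)
{
    if (IsFull(S))
        return -1;
    S->Array[++S->TopOfStack] = X;
    return 0;
}

/* Pop: return Array[TopOfStack], then decrement TopOfStack. */
int Pop(Stack S)
{
    assert(!IsEmpty(S));
    return S->Array[S->TopOfStack--];
}

void DisposeStack(Stack S)
{
    if (S != NULL) {
        free(S->Array);
        free(S);
    }
}
```

Here a full stack is reported through a return code, reflecting the slide's point that running out of space is an implementation limit rather than an ADT error, while popping an empty stack trips an assertion.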
Example – Conversion
of Numbers
There are many different number systems, such as the
decimal, binary, hexadecimal, and octal systems.
Convert a decimal number to a binary number:
Decimal Number   Divisor   Quotient   Remainder
30               2         15         0
15               2         7          1
7                2         3          1
3                2         1          1
1                2         0          1
Reading the remainders from bottom to top gives
30 = 11110 in binary.
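The repeated division above produces the binary digits in reverse order, which is where a stack helps: push each remainder, then pop them all. A minimal sketch, assuming a small local array used as the stack and a hypothetical function name `ToBinary`:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BITS 64   /* assumed upper bound on the number of binary digits */

/* Write the binary digits of N into Out as a NUL-terminated string.
   Out must have room for at least MAX_BITS + 1 characters. */
void ToBinary(unsigned long long N, char Out[])
{
    int    Stack[MAX_BITS];
    int    TopOfStack = -1;
    size_t i = 0;

    do {
        assert(TopOfStack < MAX_BITS - 1);
        Stack[++TopOfStack] = (int)(N % 2);   /* push the remainder */
        N /= 2;                               /* continue with the quotient */
    } while (N > 0);

    while (TopOfStack >= 0)                   /* pop: last remainder comes first */
        Out[i++] = (char)('0' + Stack[TopOfStack--]);
    Out[i] = '\0';
}
```

Calling `ToBinary(30, buf)` pushes the remainders 0, 1, 1, 1, 1 and pops them as the string "11110".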
Function calls.
The Queue ADT
Like stacks, queues are lists.
With a queue, however, insertion is done
at one end, whereas deletion is performed
at the other end.
The basic operations on a queue are Enqueue,
which inserts an element at the end
of the list (called the rear), and Dequeue,
which deletes (and returns) the element
at the start of the list (known as the front).
Array Implementation
of Queues
For each queue data structure, we keep an array, Queue[],
and the positions Front and Rear, which represent the
ends of the queue.
We also keep track of the number of elements that are
actually in the queue, Size. All this information is part of
one structure.
The following figure shows a queue in some intermediate
state. The cells that are blank have undefined values in
them:
[Figure: array holding 5, 2, 7, 1 in consecutive cells; Front marks 5 and Rear marks 1]
Array Implementation
of Queues
To Enqueue an element X, we increment Size
and Rear, then set Queue[Rear]=X.
To Dequeue an element, we set the return value
to Queue[Front], decrement Size, and then
increment Front.
Whenever Front or Rear gets to the end of the
array, it is wrapped around to the beginning.
This is known as a circular array implementation.
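[Figure: trace of a circular array queue whose last two cells initially hold 2 and 4]
Initial state: Front at 2, Rear at 4.
Enqueue(1): Rear wraps around to the first cell, which now holds 1; Front at 2.
Enqueue(3): Rear advances to the second cell, which now holds 3; Front at 2.
Dequeue, which returns 2: Front advances to 4; Rear at 3.
Dequeue, which returns 4: Front wraps around to 1; Rear at 3.
Dequeue, which returns 1: Front advances to 3; Front and Rear coincide.
Dequeue, which returns 3 and makes the queue empty: Front moves one cell past Rear.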
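A minimal sketch of the circular array queue follows. Front, Rear, and Size come from the slides; the structure name `QueueRecord`, the helper `Succ`, the return codes, and the choice to start with Front at index 0 and Rear at the last index (so the first Enqueue wraps Rear to index 0) are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct QueueRecord {
    int  Capacity;
    int  Front;
    int  Rear;
    int  Size;     /* number of elements actually in the queue */
    int *Array;
};
typedef struct QueueRecord *Queue;

/* Advance an index by one, wrapping around at the end of the array. */
static int Succ(int Value, Queue Q)
{
    return (Value + 1 == Q->Capacity) ? 0 : Value + 1;
}

/* Allocate an empty queue holding at most MaxElements; NULL on failure. */
Queue CreateQueue(int MaxElements)
{
    Queue Q;
    assert(MaxElements > 0);
    Q = malloc(sizeof *Q);
    if (Q == NULL)
        return NULL;
    Q->Array = malloc(sizeof *Q->Array * (size_t)MaxElements);
    if (Q->Array == NULL) {
        free(Q);
        return NULL;
    }
    Q->Capacity = MaxElements;
    Q->Size = 0;
    Q->Front = 0;
    Q->Rear = MaxElements - 1;   /* first Enqueue wraps Rear to index 0 */
    return Q;
}

/* Enqueue: increment Size and Rear, then store X. Returns -1 if full. */
int Enqueue(int X, Queue Q)
{
    if (Q->Size == Q->Capacity)
        return -1;
    Q->Size++;
    Q->Rear = Succ(Q->Rear, Q);
    Q->Array[Q->Rear] = X;
    return 0;
}

/* Dequeue: take Array[Front], decrement Size, then increment Front. */
int Dequeue(Queue Q)
{
    int X;
    assert(Q->Size > 0);
    X = Q->Array[Q->Front];
    Q->Size--;
    Q->Front = Succ(Q->Front, Q);
    return X;
}

void DisposeQueue(Queue Q)
{
    if (Q != NULL) {
        free(Q->Array);
        free(Q);
    }
}
```

Keeping Size explicitly is what distinguishes a full queue from an empty one, since in both cases Front sits one cell past Rear.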
Linked List
Implementation of
Queues
[Figure: linked-list queue. Front points to the header cell; Rear points to the last cell, whose Next pointer is NULL (/)]
Linked List
Implementation of
Queues
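[Figure: operations on a linked-list queue; "/" marks a NULL Next pointer]
Empty queue: Front and Rear refer to the same cell, whose Next is /.
Enqueue x: x is linked in at the rear; Rear refers to x, whose Next is /.
Enqueue y: y is linked in after x; Rear refers to y, whose Next is /.
Dequeue x: x is removed from the front; y remains, with Rear still at y.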
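A minimal sketch of a linked-list queue follows, assuming the header-cell convention suggested by the figures: Front always points to the header, and Rear points to the last cell (the header itself when the queue is empty). The names `QNode`, `LinkedQueue`, and `InitQueue` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct QNode {
    int           Element;
    struct QNode *Next;
};

struct LinkedQueue {
    struct QNode *Front;   /* always points to the header cell */
    struct QNode *Rear;    /* last cell; equals Front when the queue is empty */
};

/* Set up an empty queue. Returns 0 on success, -1 if malloc fails. */
int InitQueue(struct LinkedQueue *Q)
{
    struct QNode *Header = malloc(sizeof *Header);
    if (Header == NULL)
        return -1;
    Header->Next = NULL;
    Q->Front = Q->Rear = Header;
    return 0;
}

int IsEmpty(const struct LinkedQueue *Q)
{
    return Q->Front == Q->Rear;
}

/* Enqueue: link a new cell after Rear and move Rear to it. */
int Enqueue(struct LinkedQueue *Q, int X)
{
    struct QNode *Tmp = malloc(sizeof *Tmp);
    if (Tmp == NULL)
        return -1;
    Tmp->Element = X;
    Tmp->Next = NULL;
    Q->Rear->Next = Tmp;
    Q->Rear = Tmp;
    return 0;
}

/* Dequeue: remove and return the element in the cell after the header. */
int Dequeue(struct LinkedQueue *Q)
{
    struct QNode *Old;
    int X;
    assert(!IsEmpty(Q));
    Old = Q->Front->Next;
    X = Old->Element;
    Q->Front->Next = Old->Next;
    if (Q->Rear == Old)        /* the queue just became empty */
        Q->Rear = Q->Front;
    free(Old);
    return X;
}
```

Both operations run in constant time, and unlike the array version there is no fixed capacity to estimate in advance.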
Example Applications
When jobs are submitted to a printer, they
are arranged in order of arrival.
Every real-life line is a queue. For instance,
lines at ticket counters are queues, because
service is first-come first-served.
A whole branch of mathematics, known as
queueing theory, deals with computing,
probabilistically, how long users expect to wait on a
line, how long the line gets, and other such
questions.
Homework of Chapter 3
Exercises:
3.2 (Don’t need to analyze the running time.)
3.3
3.4
3.5
3.21
3.25