Industryal
The model defines an abstract view of the problem. This implies that the model focuses only on
problem-related details and that the programmer tries to define the properties of the problem.
An entity with the properties just described is called an abstract data type (ADT).
A data structure is a language construct that the programmer has defined in order to implement an abstract
data type.
There are lots of formalized and standard Abstract data types such as Stacks, Queues, Trees, etc.
1.1.2. Abstraction
Abstraction is a process of classifying characteristics as relevant and irrelevant for the particular purpose
at hand and ignoring the irrelevant ones.
How do data structures model the world or some part of the world?
• The value held by a data structure represents some specific characteristic of the world.
• The characteristic being modeled restricts the possible values held by a data structure.
• The characteristic being modeled restricts the possible operations to be performed on the data
structure.
Note: Notice the relation between characteristic, value, and data structures.
Where are algorithms, then?
1.2. Algorithms
An algorithm is a well-defined computational procedure that takes some value or a set of values as input
and produces some value or a set of values as output. Data structures model the static part of the world.
They are unchanging while the world is changing. In order to model the dynamic part of the world we
need to work with algorithms. Algorithms are the dynamic part of a program’s world model.
An algorithm transforms data structures from one state to another state in two ways:
The quality of a data structure is related to its ability to successfully model the characteristics of the
world. Similarly, the quality of an algorithm is related to its ability to successfully simulate the changes in
the world.
However, independent of any particular world model, the quality of data structure and algorithms is
determined by their ability to work together well. Generally speaking, correct data structures lead to
simple and efficient algorithms and correct algorithms lead to accurate and efficient data structures.
• Definiteness: Each step must be clearly defined, having one and only one interpretation. At
each point in computation, one should be able to tell exactly what happens next.
• Sequence: Each step must have a unique defined preceding and succeeding step. The first step
(start step) and last step (halt step) must be clearly noted.
• Feasibility: It must be possible to perform each instruction.
• Correctness: It must compute the correct answer for all possible legal inputs.
• Language Independence: It must not depend on any one programming language.
• Completeness: It must solve the problem completely.
• Effectiveness: It must be possible to perform each step exactly and in a finite amount of time.
• Efficiency: It must solve with the least amount of computational resources such as time and
space.
• Generality: Algorithm should be valid on all possible inputs.
• Input/Output: There must be a specified number of input values, and one or more result
values.
In order to solve a problem, there are many possible algorithms. One has to be able to choose the best
algorithm for the problem at hand using some scientific method. To classify some data structures and
algorithms as good, we need precise ways of analyzing them in terms of resource requirement. The main
resources are:
Running Time
Memory Usage
Communication Bandwidth
Running time is usually treated as the most important since computational time is the most precious
resource in most problem domains.
o Input Size
o Input Properties
Operating Environment
Accordingly, we can analyze an algorithm according to the number of operations required, rather than
according to an absolute amount of time involved. This can show how an algorithm’s efficiency changes
according to the size of the input.
The goal is to have a meaningful measure that permits comparison of algorithms independent of operating
platform.
There are two things to consider:
Time Complexity: Determine the approximate number of operations required to solve a problem
of size n.
Space Complexity: Determine the approximate memory required to solve a problem of size n.
There is no generally accepted set of rules for algorithm analysis. However, an exact count of operations
is commonly used.
Examples:
1. int count(){
       int n;
       int k=0;
       cout<< "Enter an integer";
       cin>>n;
       for (int i=0;i<n;i++)
           k=k+1;
       return 0;}
Time Units to Compute
-------------------------------------------------
1 for the assignment statement: int k=0
1 for the output statement.
1 for the input statement.
In the for loop:
1 assignment, n+1 tests, and n increments.
n loops of 2 units for an assignment, and an addition.
1 for the return statement.
-------------------------------------------------------------------
T (n)= 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n)
2. int total(int n)
{
int sum=0;
for (int i=1;i<=n;i++)
sum=sum+1;
return sum;
}
Time Units to Compute
-------------------------------------------------
1 for the assignment statement: int sum=0
In the for loop:
1 assignment, n+1 tests, and n increments.
n loops of 2 units for an assignment, and an addition.
1 for the return statement.
-------------------------------------------------------------------
T (n)= 1+ (1+n+1+n)+2n+1 = 4n+4 = O(n)
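The tally for example 2 can be checked mechanically. The following is only a sketch: `countedTotal` and the `ops` counter are hypothetical names introduced here, and the function charges one unit per assignment, test, increment, addition, and return, as in the tally above.

```cpp
// Instrumented version of total(n): returns the number of time units
// charged under the counting model used in these notes.
long countedTotal(int n)
{
    long ops = 0;
    int sum = 0;            ops += 1;      // int sum = 0
    int i = 1;              ops += 1;      // loop initialisation
    while (true) {
        ops += 1;                          // loop test i <= n
        if (!(i <= n)) break;
        sum = sum + 1;      ops += 2;      // one addition, one assignment
        i++;                ops += 1;      // loop increment
    }
    ops += 1;                              // return statement
    (void)sum;                             // the sum itself is not needed here
    return ops;
}
```

For any n >= 0 the returned count should agree with 4n+4.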
3. void func()
{
    int n;
    int x=0;
    int i=0;
    int j=1;
    cout<< "Enter an Integer value";
    cin>>n;
    while (i<n){
        x++;
        i++;
    }
    while (j<n)
    {
        j++;
    }
}
Time Units to Compute
-------------------------------------------------
1 for the first assignment statement: x=0;
1 for the second assignment statement: i=0;
1 for the third assignment statement: j=1;
1 for the output statement.
1 for the input statement.
In the first while loop:
n+1 tests
n loops of 2 units for the two increment (addition) operations
In the second while loop:
n tests
n-1 increments
-------------------------------------------------------------------
T (n)= 1+1+1+1+1+n+1+2n+n+n-1 = 5n+5 = O(n)
4. int sum (int n)
{
int partial_sum = 0;
for (int i = 1; i <= n; i++)
partial_sum = partial_sum +(i * i * i);
return partial_sum;
}
Time Units to Compute
-------------------------------------------------
1 for the assignment.
1 assignment, n+1 tests, and n increments.
n loops of 4 units for an assignment, an addition, and two multiplications.
1 for the return statement.
-------------------------------------------------------------------
T (n)= 1+(1+n+1+n)+4n+1 = 6n+4 = O(n)
• In general, a for loop translates to a summation. The index and bounds of the summation are the
same as the index and bounds of the for loop.
for (int i = 1; i <= N; i++) {
    sum = sum + i;
}
$\sum_{i=1}^{N} 1 = N$
• Suppose we count the number of additions that are done. There is 1 addition per iteration of the
loop, hence N additions in total.
• Nested for loops translate into multiple summations, one for each for loop.
for (int i = 1; i <= N; i++) {
    for (int j = 1; j <= M; j++) {
        sum = sum + i + j;
    }
}
$\sum_{i=1}^{N} \sum_{j=1}^{M} 2 = \sum_{i=1}^{N} 2M = 2MN$
• Again, count the number of additions. The outer summation is for the outer for loop.
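The double summation can be confirmed by counting directly. This is a minimal sketch, with `countAdditions` as a hypothetical helper name; it runs the nested loops above and records two additions per inner iteration.

```cpp
// Counts the additions performed by the nested loops above.
long countAdditions(int N, int M)
{
    long additions = 0;
    long sum = 0;
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= M; j++) {
            sum = sum + i + j;   // two additions per inner iteration
            additions += 2;
        }
    }
    (void)sum;                   // only the count is of interest
    return additions;            // equals 2 * M * N
}
```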
Conditionals: Formally
• If (test) s1 else s2: Compute the maximum of the running time for s1 and s2.
if (test == 1) {
    for (int i = 1; i <= N; i++) {
        sum = sum + i;
    }}
else for (int i = 1; i <= N; i++) {
    for (int j = 1; j <= N; j++) {
        sum = sum + i + j;
    }}
$\max\left(\sum_{i=1}^{N} 1,\ \sum_{i=1}^{N} \sum_{j=1}^{N} 2\right) = \max(N, 2N^2) = 2N^2$
Example:
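Suppose we have hardware capable of executing 10^6 instructions per second. How long would it take to
execute an algorithm whose complexity function is
T(n) = 2n^2 on an input of size n = 10^8?
The total number of operations to be performed would be T(10^8) = 2 × (10^8)^2 = 2 × 10^16.
At 10^6 instructions per second this takes 2 × 10^16 / 10^6 = 2 × 10^10 seconds, which is roughly 634 years.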
Exercises
Determine the run time equation and complexity of each of the following code segments.
1. for (i=0;i<n;i++)
for (j=0;j<n; j++)
sum=sum+i+j;
5. int x=0;
for(int i=1;i<n;i=i+5)
x++;
What is the value of x when n=25?
6. int x=0;
for(int k=n;k>=n/3;k=k-5)
x++;
What is the value of x when n=25?
7. int x=0;
for (int i=1; i<n;i=i+5)
for (int k=n;k>=n/3;k=k-5)
x++;
What is the value of x when n=25?
8. int x=0;
for(int i=1;i<n;i=i+5)
for(int j=0;j<i;j++)
for(int k=n;k>=n/2;k=k-3)
x++;
What is the correct big-Oh Notation for the above code segment?
Average Case (Tavg): The amount of time the algorithm takes on an "average" set of inputs.
Worst Case (Tworst): The amount of time the algorithm takes on the worst possible set of inputs.
Best Case (Tbest): The amount of time the algorithm takes on the most favorable possible set of inputs.
We are usually interested in the worst-case time, since it provides a bound for all inputs; this is called
the “Big-Oh” estimate.
Big Oh Notation, Ο
Ο(f(n)) is the formal way to express an upper bound on an algorithm's running time. It is commonly used
to describe the worst-case time complexity, i.e. the longest amount of time an algorithm can possibly take to complete.
Omega Notation, Ω
Ω(f(n)) is the formal way to express a lower bound on an algorithm's running time. It is commonly used
to describe the best-case time complexity, i.e. the least amount of time an algorithm can possibly take to complete.
Example
Let us consider a given function, f(n) = 4n^3 + 10n^2 + 5n + 1.
Considering g(n) = n^3, 4·g(n) ≤ f(n) ≤ 5·g(n) for all large values of n.
Hence, the complexity of f(n) can be represented as Θ(g(n)), i.e. Θ(n^3).
Typical Orders
Here is a table of some typical cases. This uses logarithms to base 2, but these are simply proportional to
logarithms in other bases.
2. Simple Sorting and Searching Algorithms
2.1. Searching
Searching is a process of looking for a specific element in a list of items or determining that the item is
not in the list. There are two simple searching algorithms:
Pseudocode
Loop through the array starting at the first element until the value of target matches one of the array
elements.
Time is proportional to the size of input (n) and we call this time complexity O(n).
Example Implementation:
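A minimal sketch of the linear search described above, assuming an array of int values; the function name `linearSearch` and its parameters are illustrative choices, not the listing from the original notes.

```cpp
// Returns the index of the first element equal to target, or -1 if absent.
int linearSearch(const int list[], int n, int target)
{
    for (int i = 0; i < n; i++) {   // visit the elements from the first one on
        if (list[i] == target)
            return i;               // found: stop at the first match
    }
    return -1;                      // every element was examined without a match
}
```

In the worst case (target absent or in the last position) the loop examines all n elements, which is where the O(n) bound comes from.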
o If in lower half, make this half the array to search
o If in the upper half, make this half the array to search
• Loop back to step 1 until the size of the array to search is one, and this element does not match, in
which case return –1.
The computational time for this algorithm is proportional to log2 n. Therefore the time complexity is
O(log n)
Example Implementation:
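A minimal sketch of binary search, assuming the array is already sorted in increasing order; the name `binarySearch` and the variables `low`, `high`, and `mid` are illustrative. It returns -1 when the target is absent, as in the steps above.

```cpp
// Searches a sorted array for target; returns its index or -1 if absent.
int binarySearch(const int list[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // middle of the current range
        if (list[mid] == target)
            return mid;
        else if (list[mid] < target)
            low = mid + 1;                  // continue in the upper half
        else
            high = mid - 1;                 // continue in the lower half
    }
    return -1;                              // range is empty: not found
}
```

Each iteration halves the range still to be searched, so the loop runs about log2 n times.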
Sorting is one of the most important operations performed by computers. Sorting is a process of
reordering a list of items in either increasing or decreasing order. The following are simple sorting
algorithms used to sort small-sized lists.
• Insertion Sort
• Selection Sort
• Bubble Sort
2.2.1. Insertion Sort
The insertion sort works just like its name suggests - it inserts each item into its proper place in the final
list. The simplest implementation of this requires two list structures - the source list and the list into which
sorted items are inserted. To save memory, most implementations use an in-place sort that works by
moving the current item past the already sorted items and repeatedly swapping it with the preceding item
until it is in place.
It's the most instinctive type of sorting algorithm. The approach is the same approach that you use for
sorting a set of cards in your hand. While playing cards, you pick up a card, start at the beginning of your
hand and find the place to insert the new card, insert it and move all the others up one place.
Basic Idea:
Find the location for an element and move all others up, and insert the element.
1. The left most value can be said to be sorted relative to itself. Thus, we don’t need to do anything.
2. Check to see if the second value is smaller than the first one. If it is, swap these two values. The
first two values are now relatively sorted.
3. Next, we need to insert the third value in to the relatively sorted portion so that after insertion, the
portion will still be relatively sorted.
4. Remove the third value first. Slide the second value to make room for insertion. Insert the value in
the appropriate position.
5. Now the first three are relatively sorted.
6. Do the same for the remaining items in the list.
Implementation
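The steps above can be sketched as an in-place sort. This is one possible version, assuming an int array; `insertionSort` and `current` are illustrative names, and the sketch slides items up rather than swapping repeatedly.

```cpp
// Sorts list[0..n-1] into increasing order using insertion sort.
void insertionSort(int list[], int n)
{
    for (int i = 1; i < n; i++) {
        int current = list[i];           // remove the item to be inserted
        int j = i - 1;
        while (j >= 0 && list[j] > current) {
            list[j + 1] = list[j];       // slide larger items one place up
            j--;
        }
        list[j + 1] = current;           // insert the item into the gap
    }
}
```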
Analysis
How many comparisons?
1+2+3+…+(n-1) = O(n^2)
How many swaps?
1+2+3+…+(n-1) = O(n^2)
How much space?
In-place algorithm
Basic Idea:
Implementation:
Bubble sort is the simplest algorithm to implement and the slowest algorithm on very large inputs.
Basic Idea:
Loop through the array from i=0 to n-1 and swap adjacent elements if they are out of order.
Implementation:
void bubble_sort(int list[], int n)
{
    int i,j,temp;
    for(i=0;i<n; i++){
        for(j=n-1;j>i; j--){
            if(list[j]<list[j-1]){
                temp=list[j];
                list[j]=list[j-1];
                list[j-1]=temp;
            }//swap adjacent elements
        }//end of inner loop
    }//end of outer loop
}//end of bubble_sort
Each of these algorithms requires n-1 passes: each pass places one item in its correct place. The ith pass
makes either i or n - i comparisons and moves. So:
T(n) = 1 + 2 + 3 + … + (n-1) = n(n-1)/2
or O(n^2). Thus these algorithms are only suitable for small problems where their simple code makes them
faster than the more complex code of an O(n log n) algorithm. As a rule of thumb, expect to find an
O(n log n) algorithm faster for n>10, but the exact value depends very much on individual machines.
Empirically it’s known that Insertion sort is over twice as fast as the bubble sort and is just as easy to
implement as the selection sort. In short, there really isn't any reason to use the selection sort - use the
insertion sort instead.
If you really want to use the selection sort for some reason, try to avoid sorting lists of more than 1000
items with it or repetitively sorting lists of more than a couple hundred items.
Module Title: Basic Programming
Computers process data under the control of sets of instructions called computer
programs. These programs guide the computer through orderly sets of actions specified by
people called computer programmers. The programs that run on a computer are referred
to as software.
A computer consists of various devices referred to as hardware (e.g., the keyboard, screen,
mouse, hard disks, memory, DVDs and processing units). Computing costs are dropping
dramatically, owing to rapid developments in hardware and software technologies.
Computers that might have filled large rooms and cost millions of dollars decades ago are
now inscribed on silicon chips smaller than a fingernail, costing perhaps a few dollars each.
Ironically, silicon is one of the most abundant materials—it’s an ingredient in common sand.
Silicon-chip technology has made computing so economical that more than a billion
general-purpose computers are in use worldwide.
Any computer can directly understand only its own machine language, defined by its
hardware design. Machine languages generally consist of strings of numbers (ultimately
reduced to 1s and 0s) that instruct computers to perform their most elementary operations
one at a time. Machine languages are machine dependent (a particular machine language
can be used on only one type of computer). Such languages are cumbersome for humans.
Programming in machine language was simply too slow and tedious for most programmers.
Computer usage increased rapidly with the advent of assembly languages, but programmers
still had to use many instructions to accomplish even the simplest tasks. To speed the
programming process, high-level languages were developed in which single statements
could be written to accomplish substantial tasks. Translator programs called compilers
convert high-level language programs into machine language. High-level languages allow
you to write instructions that look almost like every day English and contain commonly used
mathematical notations.
Instead of using the strings of numbers that computers could directly understand,
programmers began using English-like abbreviations to represent elementary operations.
These abbreviations formed the basis of assembly languages. Translator programs called
assemblers were developed to convert early assembly-language programs to machine
language at computer speeds.
From the programmer’s standpoint, high-level languages are preferable to machine and
assembly languages. C++, C, Microsoft’s .NET languages (e.g., Visual Basic, Visual C++ and
Visual C#) and Java are among the most widely used high-level programming languages.
Compiling a large high-level language program into machine language can take a
considerable amount of computer time. Interpreter programs were developed to execute
high-level language programs directly (without the delay of compilation), although
interpreted programs run slower than compiled programs.
Software
You do not normally talk directly to the computer, but communicate with it through an operating system.
The operating system allocates the computer’s resources to the different tasks that the computer must
accomplish. The operating system is actually a program, but it is perhaps better to think of it as your chief
servant. It is in charge of all your other servant programs, and it delivers your requests to them. If you
want to run a program, you tell the operating system the name of the file that contains it, and the
operating system runs the program. If you want to edit a file, you tell the operating system the name of
the file and it starts up the editor to work on that file. To most users the operating system is the
computer. Most users never see the computer without its operating system. The names of some
common operating systems are UNIX, DOS, Linux, Windows, Mac OS, and VMS.
A program is a set of instructions for a computer to follow. The input to a computer can be thought of as
consisting of two parts, a program and some data. The computer follows the instructions in the program,
and in that way, performs some process. The data is what we conceptualize as the input to the program.
For example, if the program adds two numbers, then the two numbers are the data. In other words, the
data is the input to the program, and both the program and the data are input to the computer (usually
via the operating system). Whenever we give a computer both a program to follow and some data for
the program, we are said to be running the program on the data, and the computer is said to execute the
program on the data. The word data also has a much more general meaning than the one we have just
given it. In its most general sense it means any information available to the computer. The word is
commonly used in both the narrow sense and the more general sense.
High-Level Languages
There are many languages for writing programs. In this text we will discuss the C++ programming
language and use it to write our programs. C++ is a high level language, as are most of the other
programming languages you are likely to have heard of, such as C, Java, Pascal, Visual Basic, FORTRAN,
COBOL, Lisp, Scheme, and Ada. High-level languages resemble human languages in many ways. They are
designed to be easy for human beings to write programs in and to be easy for human beings to read. A
high-level language, such as C++, contains instructions that are much more complicated than the simple
instructions a computer’s processor (CPU) is capable of following.
The kind of language a computer can understand is called a low-level language. The exact details of low-
level languages differ from one kind of computer to another. A typical low-level instruction might be the
following:
ADD X Y Z
This instruction might mean “Add the number in the memory location called X to the number in the
memory location called Y, and place the result in the memory location called Z.” The above sample
instruction is written in what is called assembly language. Although assembly language is almost the
same as the language understood by the computer, it must undergo one simple translation before the
computer can understand it. In order to get a computer to follow an assembly language instruction, the
words need to be translated into strings of zeros and ones. For example, the word ADD might translate to
0110, the X might translate to 1001, the Y to 1010, and the Z to 1011. The version of the above
instruction that the computer ultimately follows would then be:
0110 1001 1010 1011
Assembly language instructions and their translation into zeros and ones differ from machine to
machine.
Programs written in the form of zeros and ones are said to be written in machine language, because that
is the version of the program that the computer (the machine) actually reads and follows. Assembly
language and machine language are almost the same thing, and the distinction between them will not be
important to us. The important distinction is that between machine language and high-level languages
like C++: Any high-level language program must be translated into machine language before the
computer can understand and follow the program.
Compilers
A program that translates a high-level language like C++ to a machine language is called a compiler. A
compiler is thus a somewhat peculiar sort of program, in that its input or data is some other program,
and its output is yet another program. To avoid confusion, the input program is usually called the source
program or source code, and the translated version produced by the compiler is called the object
program or object code. The word code is frequently used to mean a program or a part of a program,
and this usage is particularly common when referring to object programs. Now, suppose you want to run
a C++ program that you have written. In order to get the computer to follow your C++ instructions,
proceed as follows. First, run the compiler using your C++ program as data. Notice that in this case, your
C++ program is not being treated as a set of instructions. To the compiler, your C++ program is just a long
string of characters. The output will be another long string of characters, which is the machine-language
equivalent of your C++ program.
Next, run this machine-language program on what we normally think of as the data for the C++ program.
The output will be what we normally conceptualize as the output of the C++ program.
Any C++ program you write will use some operations (such as input and output routines) that have
already been programmed for you. These items that are already programmed for you (like input and
output routines) are already compiled and have their object code waiting to be combined with your
program’s object code to produce a complete machine-language program that can be run on the
computer. Another program, called a linker, combines the object code for these program pieces with the
object code that the compiler produced from your C++ program. In routine cases, many systems will do
this linking for you automatically. Thus, you may not need to worry about linking in very simple cases.
The analysis stage requires a thorough understanding of the problem at hand and analysis of the data
and procedures needed to achieve the desired result. In the analysis stage, therefore, what we must do
is work out what must be done rather than how to do it.
What input data are needed for the problem?
What procedures are needed to achieve the result?
What output data are expected?
Algorithm design and flowchart
Once the requirements of the program are defined, the next stage is to design an algorithm to solve
the problem. An algorithm is a finite set of steps which, if followed accomplishes a particular task.
An algorithm is a map or an outline of a solution which shows the precise order in which the
program will execute individual functions to arrive at the solution. It is language independent. An
algorithm can be expressed in many ways. Here, we only consider two such methods:
Narrative (pseudocode) and Flowchart. In the narrative method, plain English is used to describe the
algorithm. Pseudocode is a free-form list of statements that shows the sequence of instructions to
follow; there are no strict rules about how to write it.
Flowchart
A flowchart consists of an ordered set of standard symbols (mostly, geometrical shapes) which
represent operations, data flow or equipment.
A flowchart is a diagram consisting of labeled symbols, together with arrows connecting one
symbol to another. It is a means of showing the sequence of steps of an algorithm.
A program flowchart shows the operations and logical decisions of a computer program. The most
significant advantage of flowcharts is a clear presentation of the flow of control in the algorithm, i.e.
the sequence in which operations are performed. Flowcharts allow the reader to follow the logic of
the algorithm more easily than would a linear description in English. Another advantage of a
flowchart is that it doesn’t depend on any particular programming language, so it can be used to
translate an algorithm into more than one programming language.
A basic set of established flowchart symbols is: Start/Stop (terminal), Processing, Input/output,
and Decision.
Input/output: data are to be read into the computer memory from an input device or data are to
be passed from the memory to an output device.
Decision: usually contains a question. There are typically two output paths: one if the
answer to the question is yes (true), and the other if the answer is no (false). The path to be
followed is selected during execution by testing whether or not the condition specified within
the symbol is fulfilled.
Terminals: appear either at the beginning of a flowchart (containing the word "start") or at its
conclusion (containing "stop"). They represent the start and end of a program.
Connector: makes it possible to separate a flowchart into parts.
Flow lines: indicate the direction of logical flow (a path from one operation to another).
DESIGN AND IMPLEMENTATION OF ALGORITHMS
An algorithm is a finite set of instructions that specify a sequence of operations to be carried out in
order to solve a specific problem or class of problems. It is just a tool for solving a problem. All the
tasks that can be carried out by a computer can be stated as algorithms. For one problem there may
be many algorithms that solve it, but the algorithm that we select must be
powerful, easy to maintain, and efficient (it doesn’t take too much space and time).
Once an algorithm is designed, it is coded in a programming language and computer executes the
program. An algorithm consists of a set of explicit and unambiguous finite steps which, when carried
out for a given set of initial conditions, produce the corresponding output and terminate in a fixed
amount of time. By unambiguity it is meant that each step should be defined precisely i.e., it should
have only one meaning. This definition is further classified with some more features.
An algorithm has five important features.
Finiteness: An algorithm terminates after a fixed number of steps.
Definiteness: Each step of the algorithm is precisely defined, i.e., the actions to be carried
out should be specified unambiguously.
Effectiveness: All the operations used in the algorithm are basic (division, multiplication,
comparison, etc.) and can be performed exactly in a fixed duration of time.
Input: An algorithm has certain precise inputs, i.e. quantities, which are specified to it
initially, before the execution of the algorithm begins.
Output: An algorithm has one or more outputs, that is, the results of operations which have
a specified relation to the inputs
Step 1: START
Step 2: Read N
Step 3: Sum ← 0
Step 4: Count ← 0
Step 5: Read Num
Step 6: Sum ← Sum + Num
Step 7: Count ← Count + 1
Step 8: If Count < N then go to step 5
Step 9: Print Sum
Step 10: Stop
Coding
The flowchart is independent of any programming language. At this stage we translate each step
described in the flowchart (algorithm description) into an equivalent instruction of the target
programming language. For example, if we want to write the program in FORTRAN,
each step will be described by an equivalent FORTRAN instruction (statement).
Implementation
Once the program is written, the next step is to implement it. Program implementation involves
three steps, namely, debugging (the process of removing errors), testing (a check of correctness),
and documenting the program (to aid the maintenance of a program during its life time). Every
program contains bugs that can range from simple mistakes in the language usage (syntax errors) up
to complex flaws in the algorithm (logic errors).
Maintenance
There are many reasons why programs must be continually modified and maintained, like changing
conditions, new user needs, previously undiscovered bugs (errors). Maintenance may involve all
steps from requirements analysis to testing.
A program written in a high-level or assembly language is called source code or a source program, and the
translated machine code is called the object code or object program. Programs that translate a
program written in a high-level or assembly language into machine code are called
translators. There are three types of translators: assembler, interpreter, and compiler.
1. Sequential executions where instructions are performed one after the other.
Lecture Note-Cosc1013 Page 9
Illustration: 1 - An algorithm and a flowchart to compute the area of a rectangle whose length is ‘l’
and breadth is ‘b’.
Algorithm
Step 1: START
Step 2: Obtain (input) the length, call it l
Step 3: Obtain (input) the breadth, call it b
Step 4: Compute l*b, call it Area
Step 5: Display Area
Step 6: STOP
Flowchart (figure): START, Read l,b, Area ← l*b, Stop
Illustration: 2 - To allow for repeated calculation of the area of a rectangle
whose length is ‘l’ and breadth is ‘b’, rewrite the above algorithm and
flowchart. Allow different values of ‘l’ and ‘b’ before each calculation of
area.
Algorithm
Step 1: START
Step 2: Read length, call it l
Step 3: Read breadth, call it b
Step 4: Compute l*b, call it Area
Step 5: Display Area
Step 6: Go to step 2
Flowchart (figure): START, Read l,b, Area ← l*b
Note: Here in effect we created a loop. The series of instructions to calculate ‘area’ is reused over
and over again, each time with a different set of input data. But here it is an infinite loop. There is no
way to stop the repeated calculations except to pull the plug of the computer. Therefore, using this
type of unconditional transfer (Go to) to construct a loop is generally not a good idea. In almost all
cases where an unconditional loop seems useful, one of the other control structures (loops or
branches) can be substituted, and in fact it is preferred.
II. Branching Operations
With sequential instructions there is no possibility of skipping over one instruction. A branch is a
point in a program where the computer will make a decision about which set of instructions to
execute next. The question that we use to make the decision about which branch to take must be
set up so that the answers can only be yes or no. Depending on the answer to the question, control
flows in one direction or the other.
Illustration: Construct an algorithm and flowchart to read two numbers and determine which is
larger.
Algorithm
Step 1: START
Step 2: Read two numbers A and B.
Step 3: If A > B then go to step 6
Step 4: Display B is largest.
Step 5: go to step 7
Step 6: Display A is largest
Step 7: STOP
Flowchart
Note that only one block of instructions is executed, not both. After one block or other executes, the
two paths merge (at the circle) and control transfers to the next instruction. We could have several
loops inside each block since we are not limited to just one instruction.
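In C++ the two-way branch of this illustration maps naturally onto an if/else statement. A minimal sketch, with `largerOf` as a hypothetical function name that returns the message instead of displaying it:

```cpp
#include <string>

// Mirrors Steps 3-6: exactly one of the two blocks is executed.
std::string largerOf(int a, int b)
{
    if (a > b)
        return "A is largest";
    else
        return "B is largest";   // also taken when a == b, as in Step 3
}
```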
Nesting of Branching Operations
There are many times when we need to choose between more than two alternatives. One of the
solutions to this is nesting of instructions.
Illustration
Construct an algorithm and flowchart to see if a number ‘n’ is negative, positive, or zero
Algorithm Flowchart
Step 1: START
Step 2: Read in ‘n’
Step 3: Is n<0
Step 4: If yes, go to step 11
Step 5: Is n=0
Step 6: If yes, go to step 9
Step 7: Print “Positive”
Step 8: go to step 12
Step 9: Print “Zero”
Step 10: go to step 12
Step 11: Print “Negative”
Step 12: STOP
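The nested decision above can be sketched in standard C++ by placing one if/else inside the else part of another. The function name classify is a hypothetical choice; the three messages are the ones printed by the algorithm.

```cpp
#include <string>

// Returns "Negative", "Zero" or "Positive" for the number n.
std::string classify(int n)
{
    if (n < 0) {                 // Step 3: Is n < 0?
        return "Negative";
    } else {
        if (n == 0) {            // Step 5: nested question, only asked when n >= 0
            return "Zero";
        } else {
            return "Positive";
        }
    }
}
```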
III. LOOPS
Loops are the third major type of control structure that we need to examine. There are two different
types of loops, the counted loop and the conditional loop. The counted loop repeats a
predetermined number of times, while the conditional loop repeats until a condition is satisfied.
In the counted loop, a counter keeps track of the loop executions. Once the counter reaches
a predetermined number, the loop terminates. The number of loop executions is set before the
loop begins and cannot be changed while the loop is running.
The conditional loop has no predetermined stopping point. Rather, each time through the
loop the program performs a test to determine when to stop. Also, the quantity being used
for the test can change while the loop executes.
The variable used to control the loop can be referred to as the Loop Control Variable (LCV).
The flowchart symbol for a loop is a hexagon. Inside the hexagon are the start and stop values for the LCV.
A step value is also included, from which the computer decides how many times to execute
the loop. Inside the loop is the body, which can consist of any number of instructions. When
the loop finishes, control transfers to the first statement outside the loop.
With counted loops, we must know in advance how many times the loop will execute.
If this information is not available and yet the problem demands a loop, we can
make use of a conditional loop, where the machine will check every time through the loop to
see whether it should be repeated or not.
In a conditional loop, the programmer must change the loop control variable.
Start =…
Stop=…
Illustration
Ex. An algorithm and flowchart to print out the numbers 1 to 100 and their squares.
Algorithm Flowchart
Step 1: START
Step 5: STOP
Quick quiz
Construct an algorithm and a flowchart for the following:
1. To find the smallest number from three numbers.
2. To read in x, y and z and then compute the value of xyz.
3. To determine if a whole number is odd or even.
C++ is an object-oriented programming language. It was
developed by Bjarne Stroustrup at AT&T Bell Laboratories in Murray Hill, New Jersey, USA, in the
early eighties. Stroustrup, an admirer of Simula67 and a strong supporter of C, wanted to combine
the best of both languages and create a more powerful language that could support object-oriented
programming features and still retain the power and elegance of C. The result was C++. Stroustrup
called the new language ‘C with classes’. However, later in 1983, the name was changed to C++. C++
is very nearly a superset of C. Therefore, almost all C programs are also C++ programs.
List of Compilers:
1. Borland C++ & Turbo C++ available from Borland International for DOS & OS/2.
2. Zortech C++ from Zortech International on DOS.
3. Microsoft Visual C++ by Microsoft Corp.
4. GNU C++, usually called G++.
Layout of a Simple C++ Program
The general form of a simple C++ program is shown in Display 2.1. As far as the compiler is
concerned, the line breaks and spacing need not be as shown there and in our examples. The
compiler will accept any reasonable pattern of line breaks and indentation. In fact, the compiler will
even accept most unreasonable patterns of line breaks and indentation. However, a program
should always be laid out so that it is easy to read. Placing the opening brace, { , on a line by itself
and also placing the closing brace, } , on a line by itself will make these punctuations easy to find.
Indenting each statement and placing each statement on a separate line makes it easy to see what
the program instructions are. Later on, some of our statements will be too long to fit on one line and
then we will use a slight variant of this pattern for indenting and line breaks. You should follow the
pattern set by the examples in this material.
#include <iostream.h>
int main( )
{
    Variable_Declarations
    Statement_1
    Statement_2
    ...
    Statement_Last
    return 0;
}
The variable declarations come first inside the braces; in Display 2.2 they are on the line
int a, b, c;. As we will see in the next
sections, you need not place all your variable declarations at the beginning of your program, but
that is a good default location for them. Unless you have a reason to place them somewhere else,
place them at the start of your program as shown in Display 2.1. The statements are the instructions
that are followed by the computer. In Display 2.2, the statements are the lines
that begin with cout or cin, and the one line that begins with c followed by an equal sign.
Statements are often called executable statements. We will use the terms statement and
executable statement interchangeably. Notice that each of the statements we have seen ends with a
semicolon. The semicolon in statements is used in more or less the same way that the period is used
in English sentences; it marks the end of a statement.
For now you can view the first few lines as a funny way to say “this is the beginning of the
program.” But we can explain them in a bit more detail. The first line, #include <iostream.h>, is called
an include directive. It tells the compiler where to find information about certain items that are
used in your program. In this case iostream.h is the name of a library that contains the definitions of
the routines that handle input from the keyboard and output to the screen; iostream.h is a file that
contains some basic information about this library. (iostream.h is the older, pre-standard name of this
header. Compilers that follow the C++ standard provide it as <iostream>, without the .h, and place its
names, such as cin and cout, in the namespace std.) A linker program combines the object code for
the library iostream.h and the object code for the program you write. For the library iostream.h this
will probably happen automatically on your system. You will eventually use other libraries as well,
and when you use them, they will have to be named in directives at the start of your program. For
other libraries, you may need to do more than just place an include directive in your program, but in
order to use any library in your program, you will always need to at least place an include directive
for that library in your program. Directives always begin with the symbol #. Some compilers require
that directives have no spaces around the #; so it is always safest to place the # at the very start of
the line and not include any space between the # and the word include.
The second and third nonblank lines, shown next, simply say that the main part of the program
starts here:
int main( )
{
The correct term is main function, rather than main part. The braces { and } mark the beginning and
end of the main part of the program. They need not be on a line by themselves, but that is the way
to make them easy to find and we will therefore always place each of them on a line by itself. The
next-to-last line return 0; says to “end the program when you get to here.” This line need not be the
last thing in the program, but in a very simple program it makes no sense to place it anywhere else.
Some compilers will allow you to omit this line and will figure out that the program ends when there
are no more statements to execute. However, other compilers will insist that you include this line,
so it is best to get in the habit of including it, even if your compiler is happy without it. This line is
called a return statement and is considered to be an executable statement because it tells the
computer to do something; specifically, it tells the computer to end the program. The number 0 has
no intuitive significance to us yet, but must be there; its meaning will become clear as you learn
more about C++. Note that even though the return statement says to end the program, you still
must add a closing brace } at the end of the main part of your program.
Be certain that you do not have any extra space between the < and the iostream.h
file name (Display 2.1) or between the end of the file name and the closing >.
The compiler include directive is not very smart: It will search for a file name that
starts or ends with a space! The file name will not be found, producing an error that
is quite difficult to find. You should make this error deliberately in a small program,
then compile it. Save the message that your compiler produces so you know what
the error message means the next time you get that error message.
Compiling and Running a C++ Program
Next you will learn what would happen if you run the C++ program shown in Display 2.2. But where is
that program and how do you make it run? You write a C++ program using a text editor in the same
way that you write any other document such as a term paper, a love letter, a shopping list, or
whatever. The program is kept in a file just like any other document you prepare using a text editor.
The way that you compile and run a C++ program depends on the particular system you are
using. When you give the command to compile your program, this will produce a machine-language
translation of your C++ program. This translated version of your program is called the object code
for your program. The object code for your program must be linked (that is, combined) with the
object code for routines (such as input and output routines) that are already written for you. It is
likely that this linking will be done automatically, so you do not need to worry about linking.
#include <iostream.h>
int main( )
{
    int a, b, c;
    cout << "Enter values of a, b";
    cin >> a >> b;
    c = a + b;
    cout << "The result is:" << c;
    return 0;
}
Lecture Note-Cosc1013
Module Title: Basic Programming
In the above program the statement cin>>a>>b; is an input statement and causes the program to
wait for the user to type two numbers. If we key in two values, say 10 and 20, then 10 will be
assigned to a and 20 to b. The operator >> is known as the extraction (or get from) operator.
The statement cout<<"The result is:"<<c; is an output statement that causes the string in quotation
marks to be displayed on the screen as it is, and then the content of the variable c is displayed. The
operator << is known as the insertion (or put to) operator. The identifier cin is pronounced ‘C in’ and
cout is pronounced ‘C out’.
A mistake in a program is usually called a bug, and the process of eliminating bugs is called
debugging.
The compiler will catch certain kinds of mistakes and will write out an error message when it finds a
mistake. It will detect what are called syntax errors, because they are, by and large, violations of the
syntax (that is, the grammar rules) of the programming language, such as omitting a semicolon.
If the compiler discovers that your program contains a syntax error, it will tell you where the error is
likely to be and what kind of error it is likely to be. If the compiler says your program contains a
syntax error, you can be confident that it does. However, the compiler may be incorrect about
either the location or the nature of the error. It does a better job of determining the location of an
error, to within a line or two, than it does of determining the source of the error. This is because the
compiler is guessing at what you meant to write down and can easily guess wrong. After all, the
compiler cannot read your mind. Error messages subsequent to the first one have a higher
likelihood of being incorrect with respect to either the location or the nature of the error. Again, this
is because the compiler must guess your meaning. If the compiler’s first guess was incorrect, this will
affect its analysis of future mistakes, since the analysis will be based on a false assumption.
If your program contains something that is a direct violation of the syntax rules for your
programming language, the compiler will give you an error message. However, sometimes the
compiler will give you only a warning message, which indicates that you have done something that
is not, technically speaking, a violation of the programming language syntax rules, but that is
unusual enough to indicate a likely mistake. When you get a warning message, the compiler is
saying, “Are you sure you mean this?” At this stage of your development, you should treat every
warning as if it were an error until your instructor approves ignoring the warning.
There are certain kinds of errors that the computer system can detect only when a program is run.
Appropriately enough, these are called run-time errors. Most computer systems will detect certain
run-time errors and output an appropriate error message. Many run-time errors have to do with
numeric calculations. For example, if the computer attempts to divide a number by zero, that is
normally a run-time error. If the compiler approved of your program and the program ran once with
no run-time error messages, this does not guarantee that your program is correct. Remember, the
compiler will only tell you if you wrote a syntactically (that is, grammatically) correct C++ program. It
will not tell you whether the program does what you want it to do. Mistakes in the underlying
algorithm or in translating the algorithm into the C++ language are called logic errors. For example,
if you were to mistakenly use the multiplication sign * instead of the addition sign + in the program
in Display 2.2, that would be a logic error. The program would compile and run normally, but would
give the wrong answer. If the compiler approves of your program and there are no runtime errors,
but the program does not perform properly, then undoubtedly your program contains a logic error.
Logic errors are the hardest kind to diagnose, because the computer gives you no error messages to
help find the error. It cannot reasonably be expected to give any error messages. For all the
computer knows, you may have meant what you wrote.
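The difference between a logic error and a run-time error can be illustrated with a small sketch in standard C++. All three function names are hypothetical. The first function compiles and runs without any message but computes the wrong thing; the third shows one way, assumed here for illustration, to avoid the division-by-zero run-time error by testing the divisor first.

```cpp
// Logic error: the programmer meant a + b but typed *.
// The compiler accepts this, and no run-time error is reported.
int sum_with_logic_error(int a, int b)
{
    return a * b;
}

// The intended version.
int sum(int a, int b)
{
    return a + b;
}

// Guards against the run-time error of dividing by zero.
// Returns false (and leaves `result` unchanged) when `den` is 0.
bool safe_divide(int num, int den, int& result)
{
    if (den == 0) {
        return false;
    }
    result = num / den;
    return true;
}
```

With the inputs 10 and 20, the faulty function returns 200 instead of 30, and only testing the program against a known answer reveals the mistake.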
Quick quiz
3. If you omit a punctuation symbol (such as a semicolon) from a program, an error is produced.
What kind of error?
4. Omitting the final brace } from a program produces an error. What kind of error?
Comments
Comments are notes added to your program to describe its logic and how a particular part
of the program works. The comment lines in the program are ignored by the compiler. There are
two types of comments. These are:
Single line comment: C++ introduces the single line comment // (double slash). The comment starts with a
double slash symbol and terminates at the end of the line.
E.g : // this is a single line comment
Multiple line comment: The comment starts with /* and terminates with */; it can span several lines.
E.g : /* this is an example of C++ program to illustrate some of its features and how c++ is
written*/
Variable
A variable is a named memory location that stores a value. The value
of a variable may vary throughout the program; that is, a variable may take different values at
different times during execution. Each variable needs an identifier that distinguishes it from the
others. For example, in the code a=5 the variable identifier is ‘a’, but we could have given the
variable any name we wanted to invent, as long as it was a valid identifier.
Identifiers
A valid identifier is a sequence of one or more letters, digits or underscore characters
(_ ).
The length of an identifier is not limited, although for some compilers only the first 32 characters
of an identifier are significant (the rest are not considered).
Neither spaces nor accented letters can be part of an identifier.
Variable identifiers should always begin with either a letter or an underscore character
(_ ). They can never begin with a digit. Identifiers that begin with an underscore are usually reserved
for the compiler and its libraries, so they are best avoided in our own programs.
Our own identifiers cannot match any keyword of the C++ language or any of your compiler's specific
ones, since they could be confused with these.
The C++ language is "case sensitive"; that means that an identifier written in capital letters is not
equivalent to another one with the same name but written in small letters. Thus, for example, the
variable RESULT is not the same as the variable result or the variable Result.
When programming, we store the variables in our computer's memory, but the computer has to
know what kind of data we want to store in them, since a simple number, a single letter and a large
number do not occupy the same amount of memory, and they are not
going to be interpreted the same way. The memory in our computers is organized in bytes. A byte is
the minimum amount of memory that we can manage in C++.
Float, Double: numbers with a decimal point. Ex: 69.65, 3.1415.
Character: any single character enclosed within single quotes. Ex: 'a'.
The modifiers signed, unsigned, long, and short may be applied to character and integer basic data
types. However, the modifier long may also be applied to double. The following table lists all
combinations of the basic data types and modifiers along with their size and range.
* The values of the columns Size and Range depend on the system the program is compiled
for.
Declaration of Variables
In order to use a variable in C++, we must first declare it specifying the data type.
The syntax to declare a new variable is to write the data type specifier that we want (like
int, short, float...) followed by a valid variable identifier.
For example:
int a; declares a variable of type int with the identifier a
float mynumber; declares a variable of type float with the identifier mynumber.
Once declared, variables a and mynumber can be used within the rest of their scope in the
program.
To declare several variables of the same type and to save some writing work you can declare
all of them in the same line separating the identifiers with commas.
E.g: int a, b, c; declares three variables (a, b and c) of type int.
Initialization of Variables
When declaring a local variable, its value is undetermined by default.
To store a concrete value in a variable at the moment it is declared, append an equal sign
followed by the desired value to the variable declaration:
o Syntax: type identifier = initial_value ;
o E.g: int a = 0; declares an int variable called a that contains the value 0 from the
moment it is declared.
C++ adds another way to initialize a variable: enclosing the initial value
between parentheses ():
o Syntax: type identifier (initial_value) ;
o For example: int a (0);
Both ways are valid and equivalent in C++.
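Declaration and the two forms of initialization can be seen side by side in a short sketch. The function name initialized_difference and the particular values are chosen only for illustration.

```cpp
// Uses both initialization forms and one uninitialized declaration.
int initialized_difference()
{
    int a = 5;     // initial value given with an equal sign
    int b(2);      // initial value given between parentheses
    int result;    // declared only: its value is undetermined until assigned

    result = a - b;   // result now holds a definite value
    return result;
}
```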
Scope of variables
All the variables that we intend to use in a program must have been declared with their type
specifier at an earlier point in the code.
A variable can be either of global or local scope.
A global variable is a variable declared in the source code, outside all functions, while a local
variable is one declared within the body of a function or a block.
Global variables can be referred to from anywhere in the code, even inside functions,
as long as the reference comes after their declaration.
The scope of local variables is limited to the block enclosed in braces ({}) where they are
declared. For example, if they are declared at the beginning of the body of a function (like in
function main) their scope is between its declaration point and the end of that function. This
means that if another function existed in addition to main, the local variables declared in
main could not be accessed from the other function and vice versa.
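The scope rules above can be sketched as follows. All names (total, bonus, doubled, add_to_total, read_total) are hypothetical and exist only to show which variable is visible where.

```cpp
int total = 100;                 // global: declared outside all functions

int add_to_total(int amount)
{
    int bonus = 5;               // local to add_to_total
    {
        int doubled = amount * 2;    // local to this inner block only
        total += doubled;            // the global is visible here
    }
    // 'doubled' can no longer be used here; 'bonus' still can.
    total += bonus;
    return total;
}

int read_total()
{
    // 'bonus' and 'doubled' are not visible in this function,
    // but the global variable is.
    return total;
}
```

Trying to use bonus inside read_total would be rejected by the compiler, because its scope ends with the closing brace of add_to_total.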
Constant
Literals are used to express particular values within the source code of a program. For
example, when we wrote: a = 5 ; the 5 in this piece of code was a literal constant.
Literal constants can be divided in Integer Numerals, Floating-Point Numerals, Characters,
Strings and Boolean Values.
Floating-Point Numerals
They express numbers with decimals and/or exponents. They can include either a decimal
point, an e character (that expresses "times ten to the power X", where X is an integer value
that follows the e character), or both a decimal point and an e character.
For e.g: 3.14159 // 3.14159
6.02e23 // 6.02 x 10^23
3.0 // 3.0
The default type of a floating-point literal is double. To explicitly express a float or long double
numerical literal, we use the f or l suffixes respectively:
3.14159L // long double
6.02e23f // float
Any of the letters that can be part of a floating-point numerical constant (e, f, l) can be
written using either lower or uppercase letters without any difference in their meanings.
Boolean literals
There are only two valid Boolean values: true and false.
These can be expressed in C++ as values of type bool by using the Boolean literals true and
false.
Character and string literals
There are special characters that are difficult or impossible to express otherwise in the
source code of a program, like newline (\n) or tab (\t). All of them are preceded by a
backslash (\). Here is a list of some of these escape codes:
\n    newline
\r    carriage return
\t    tab
\b    backspace
\'    single quote (')
\"    double quote (")
\\    backslash (\)
String literals can extend to more than a single line of code by putting a backslash sign (\) at
the end of each unfinished line. E.g: "string expressed in \
two lines"
Defined constants (#define)
We can define our own names for constants that we use very often, without having to resort to
memory-consuming variables, simply by using the #define preprocessor directive. Its format
is: #define identifier value
For example: #define PI 3.14159
#define NEWLINE '\n'
This defines two new constants: PI and NEWLINE. Once they are defined, you can use them
in the rest of the code as if they were any other regular constant.
In fact the only thing that the compiler preprocessor does when it encounters #define
directives is to literally replace any occurrence of their identifier (in the previous example,
these were PI and NEWLINE) by the code to which they have been defined (3.14159 and '\n'
respectively).
The #define directive is not a C++ statement but a directive for the preprocessor; therefore
it assumes the entire line as the directive and does not require a semicolon (;) at its end.
Declared constants (const)
With the const prefix you can declare constants with a specific type in the same way as you would
do with a variable:
Here, PI and tabulator are two typed constants. They are treated just like regular variables except
that their values cannot be modified after their definition.
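The declarations that the sentence above refers to are not shown in the text. A plausible sketch, assuming that PI is a double and tabulator is a char, is given below next to a #define constant for comparison. The function name circle_area is hypothetical.

```cpp
#define NEWLINE '\n'            // preprocessor constant: plain text replacement, no type

const double PI = 3.14159;      // typed constant (assumed type: double)
const char tabulator = '\t';    // typed constant (assumed type: char)

// Typed constants are used like ordinary variables in expressions.
double circle_area(double r)
{
    return PI * r * r;
}

// A statement such as  PI = 3.0;  would be rejected by the compiler,
// because the value of a const cannot be modified after its definition.
```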
Introduction to Strings
Variables that can store non-numerical values that are longer than one single character are
known as strings.
The C++ language library provides support for strings through the standard string class. This
is not a fundamental type, but it behaves in a similar way as fundamental types do in its
most basic usage.
A first difference with fundamental data types is that in order to declare and use objects
(variables) of this type we need to include an additional header file in our source code:
<string>
#include <iostream>
#include <string>
using namespace std;
int main ()
{
    // declarations and statements that use string objects go here
    return 0;
}
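A string object can be declared, initialized and assigned much like a variable of a fundamental type. The following sketch in standard C++ shows this; the function name greeting and the use of + to join two strings are illustrative additions that the text above does not cover.

```cpp
#include <string>

// Builds "Hello, <name>" from the given name.
std::string greeting(const std::string& name)
{
    std::string text = "Hello, ";   // initialized like a fundamental type
    text = text + name;             // assigned a new, longer value
    return text;
}
```

Inside main, a statement such as cout << greeting("Sara"); would display the resulting string.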
An operator is a symbol that tells the computer to perform certain mathematical (or) logical
manipulations.
Operators are used in programs to manipulate data and variables.
C++ operators can be classified into a number of categories. They include:
1. Arithmetic operators
2. Relational operators
3. Logical operators
4. Assignment operators
5. Increment / Decrement operators
6. Conditional operators
7. Bitwise operators
1. Arithmetic operators: C++ provides all the basic arithmetic operators: add (+), subtract (-), multiply (*),
divide (/), and mod (%). mod gives the remainder of an integer division.
E.g. for mod: if a = 10; b = 3; c = a % b; then c is 1.
2. Relational operators: These are the operators which compare the operands on either side of them: less
than (<), less than or equal (<=), equal (==), greater than (>), greater than or equal (>=) and not equal (!=). The
result of a relational operation is a Boolean value that can only be true or false.
3. Logical operators: C++ has the following three
logical operators: && (meaning logical AND), ||
(logical OR), ! (logical NOT).
E.g: ( (5 == 5) && (3 > 6) )
// evaluates to false ( true && false ).
( (5 == 5) || (3 > 6) )
// evaluates to true ( true || false ).
Truth table for AND and OR operations
op-1    op-2    op-1 && op-2    op-1 || op-2
F       F       F               F
F       T       F               T
T       F       F               T
T       T       T               T
4. Assignment operators: used to assign the result of
an expression to a variable; the symbol used is ‘=’.
o The part at the left of the assignment operator (=) is known as the lvalue (left value) and the right
one as the rvalue (right value). The lvalue has to be a variable whereas the rvalue can be either a
constant, a variable, the result of an operation or any combination of these.
o It is of 3 types:
(i) Simple assignment E.g: a = 9;
(ii) Multiple assignment E.g: a = b = c = 36; a = 2 + (b = 5);
(iii) Compound assignment
E.g: a += 15; (add 15 to a; equivalent to a = a + 15;)
c *= 6; (multiply c by 6; equivalent to c = c * 6;)
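5. Increment / Decrement operators: ++ increases the value of its operand by 1 and -- decreases it by 1.
a. Prefix auto increment / decrement --- This first increments/decrements the operand & then
assigns the value to the variable on the left.
Eg. : a = 5; b = ++a; Result: a = 6, b = 6
a = 5; b = --a; Result: a = 4, b = 4
b. Postfix auto increment / decrement --- This first assigns the value to the variable on the left & then
increments/decrements the operand.
Eg. : a = 5; b = a++; Result: b = 5, a = 6
a = 5; b = a--; Result: b = 5, a = 4
Generally a=a+1 can be written as ++a, a++ or a+=1. Similarly a=a-1 can be written as a--, --a or a -= 1.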
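The difference between the prefix and postfix forms can be checked with a small sketch. The four function names are hypothetical; each takes a by reference so that the change to a remains visible to the caller, and returns the value that b receives.

```cpp
// Prefix: a is changed first, then its new value is assigned to b.
int prefix_increment(int& a)  { int b = ++a; return b; }
int prefix_decrement(int& a)  { int b = --a; return b; }

// Postfix: the old value of a is assigned to b, then a is changed.
int postfix_increment(int& a) { int b = a++; return b; }
int postfix_decrement(int& a) { int b = a--; return b; }
```

Starting from a = 5, prefix_increment gives b = 6 and a = 6, while postfix_increment gives b = 5 and a = 6.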
6. Conditional operator (ternary operator):
Conditional expressions are of the following form.
exp1 ? exp2 : exp3 ;
exp1 is evaluated first. If the result is true then exp2 is evaluated else exp3 is evaluated. It is
this evaluated value that becomes the value of the expression.
For example, consider the following statements. a=10; b=15; x = (a>b) ? a : b; In this example
x will be assigned the value of b.
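The same conditional expression can be wrapped in a function for reuse. This is a minimal sketch; the function name larger is hypothetical.

```cpp
// Returns a when a > b is true, otherwise returns b.
int larger(int a, int b)
{
    return (a > b) ? a : b;
}
```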
7. Bitwise Operators
Bitwise operators modify variables considering the bit patterns that represent the values
they store.
& Bitwise AND
| Bitwise Inclusive OR
~ Unary complement (bit inversion)
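The three operators listed above can be tried on small values. The sketch below uses unsigned char and assumes it is 8 bits wide, which holds on common systems; the function names are hypothetical. With 12 = 00001100 and 10 = 00001010 in binary, AND gives 00001000, OR gives 00001110, and the complement of 12 is 11110011.

```cpp
// Each result is cast back to unsigned char, because C++ performs
// these operations on int-sized values internally.
unsigned char bitwise_and(unsigned char a, unsigned char b)
{
    return static_cast<unsigned char>(a & b);
}

unsigned char bitwise_or(unsigned char a, unsigned char b)
{
    return static_cast<unsigned char>(a | b);
}

unsigned char complement(unsigned char a)
{
    return static_cast<unsigned char>(~a);   // every bit inverted
}
```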
Type conversion in Assignments
The value of the right side (expression side) of the assignment is converted to the type of the left
side (target variable).
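E.g: int x; char ch; float f;
ch = x; the appropriate number of high-order bits is removed.
f = ch; the 8-bit integer value of ch is stored as the same value in floating-point format.
Explicit type casting: a value can also be converted explicitly by preceding the expression with the
new type enclosed in parentheses. E.g: int i; float f = 3.14;
i = (int) f;
o This code converts the float number 3.14 to an integer value (3); the fractional part is lost. Here,
the typecasting operator was (int).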
Another way to do the same thing in C++ is using the functional notation:
Preceding the expression to be converted by the type and enclosing the expression between
parentheses: E.g: i = int ( f );
Both ways of type casting are valid in C++.
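Both notations can be compared in a short sketch. The function names are hypothetical. Note that the conversion discards the fractional part rather than rounding.

```cpp
// C-style notation: the type in parentheses before the expression.
int truncate_c_style(float f)
{
    return (int) f;
}

// Functional notation: the expression in parentheses after the type.
int truncate_functional(float f)
{
    return int(f);
}
```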
sizeof()
This operator accepts one parameter, which can be either a type or a variable itself and returns
the size in bytes of that type or object:
a = sizeof (char); This will assign the value 1 to a because char is a one-byte long type.
Level   Operator                              Description                       Associativity
1       ::                                    Scope                             Left
3       ++ -- !                               increment/decrement, unary NOT    Right
11      ?:                                    Conditional                       Right
12      = += -= *= /= %= >>= <<= &= ^= |=     Assignation                       Right
(The operators that you are familiar with are shaded in the above table)
Input/Output
C++ uses a convenient abstraction called streams to perform input and output operations in
sequential media such as the screen or the keyboard.
A stream is an object where a program can either insert or extract characters to/from it. We do
not really need to care about many specifications about the physical media associated with the
stream.
The insertion operator (<<) may be used more than once in a single statement:
o cout << "Hello, " << "I am " << "a C++ statement";
o cout << "Hello, I am " << age << " years old and my zipcode is " << zipcode;
o If we assume the age variable to contain the value 24 and the zipcode variable to contain
90064 the output of the previous statement would be:
Hello, I am 24 years old and my zipcode is 90064
cout does not add a line break after its output unless we explicitly indicate it.
In order to perform a line break on the output we must explicitly insert a new-line character into
cout.
In C++ a new-line character can be specified as \n (backslash, n):
o E.g: cout << "First sentence.\n ";
o cout << "Second sentence.\nThird sentence.";
Additionally, to add a new-line, you may also use the endl manipulator.
o Eg: cout << "First sentence." << endl;
o cout << "Second sentence." << endl;
cin can only process the input from the keyboard once the RETURN key has been pressed.
Therefore, even if we request a single character, the extraction from cin will not process the input
until the user presses RETURN after the character has been entered.
Also, cin extraction stops reading as soon as it finds any blank space character, so in this case we
will be able to get just one word for each extraction.
Library Functions
C++ comes with many library functions that are used in the
construction of programs.
Their declarations are kept in header files, which are included before main () with #include
directives; these directives are sometimes termed preprocessor statements.
Here are some of these header files:
Header File    Purpose of including in the Program
stdlib.h       Standard library functions like conversion of one type to other etc.
string.h       String manipulation functions like strcpy (), strcat(), strcmp() etc.
Sample program 1: Write a C++ program to calculate the area of a circle for a given diameter d, using the formula
A = πr², where r = d/2.
#include<iostream.h>
int main()
{
    float A, pi=3.1415;
    float d, r;
    cin>>d;
    r=d / 2;
    A= pi * r * r;
    cout<<"Area of the circle is "<<A;
    return 0;
}
Sample program 2: Write a C++ program to read the temperature in Fahrenheit and convert it into Celsius.
(Formula: c= (5.0/9)*(f-32)).
#include<iostream.h>
int main ()
{
    float c,f;
    cin>>f;
    c=(5.0/9)*(f-32);
    cout<<"Temperature in Celsius is "<<c;
    return 0;
}
DATA COMMUNICATIONS
The fundamental purpose of a communications system is the exchange of data between two parties.
Figure 1.1 presents one particular example, which is communication between a workstation and a server
over a public telephone network.
Another example is the exchange of voice signals between two telephones over the same network. The
key components of the model are as follows:
Source: This device generates the data to be transmitted; examples are telephones and personal
computers.
Transmitter: Usually, the data generated by a source system are not transmitted directly in the
form in which they were generated. Rather, a transmitter transforms and encodes the information
in such a way as to produce electromagnetic signals that can be transmitted across some sort of
transmission system. For example, a modem takes a digital bit stream from an attached device
such as a personal computer and transforms that bit stream into an analog signal that can be
handled by the telephone network.
Transmission system: This can be a single transmission line or a complex network connecting
source and destination.
Receiver: The receiver accepts the signal from the transmission system and converts it into a
form that can be handled by the destination device. For example, a modem will accept an analog
signal coming from a network or transmission line and convert it into a digital bit stream.
Destination: Takes the incoming data from the receiver.
To get some flavor for the focus of data communication, Figure 1.2 provides a new perspective on the
communications model of Figure 1.1a. We trace the details of this figure using electronic mail as an
example. Suppose that the input device and transmitter are components of a personal computer. The
user of the PC wishes to send a message m to another user. The user activates the electronic mail
package on the PC and enters the message via the keyboard (input device). The character string is
briefly buffered in main memory. We can view it as a sequence of bits (g) in memory. The personal
computer is connected to some transmission medium, such as a local network or a telephone line, by an
Admas University, CoSc2032, Lecture Note
I/O device (transmitter), such as a local network transceiver or a modem. The input data are
transferred to the transmitter as a sequence of voltage shifts [g(t)] representing bits on some
communications bus or cable. The transmitter is connected directly to the medium and converts the
incoming stream [g(t)] into a signal [s(t)] suitable for transmission; specific alternatives will be
described in later sections.
The transmitted signal s(t) presented to the medium is subject to a number of impairments, discussed
in a later section, before it reaches the receiver. Thus, the received signal r(t) may differ from s(t). The
receiver will attempt to estimate the original s(t), based on r(t) and its knowledge of the medium,
producing a sequence of bits. These bits are sent to the output personal computer, where they are briefly
buffered in memory as a block of bits. In many cases, the destination system will attempt to determine
if an error has occurred and, if so, cooperate with the source system to eventually obtain a complete,
error-free block of data. These data are then presented to the user via an output device, such as a printer
or screen. The message as viewed by the user will usually be an exact copy of the original message (m).
Now consider a telephone conversation. In this case the input to the telephone is a message (m) in the
form of sound waves. The sound waves are converted by the telephone into electrical signals of the
same frequency. These signals are transmitted without modification over the telephone line. Hence the
input signal g(t) and the transmitted signal s(t) are identical. The signal s(t) will suffer some distortion
over the medium, so that r(t) will not be identical to s(t). Nevertheless, the signal r(t) is converted back
into a sound wave with no attempt at correction or improvement of signal quality. Thus, the received
message is not an exact replica of m. However, the received sound message is generally comprehensible
to the listener. The discussion so far does not touch on other key aspects of data communications,
including data link control techniques for controlling the flow of data and detecting and correcting
errors, and multiplexing techniques for transmission efficiency.
A guided transmission medium is point to point if it provides a direct link between two devices and
those are the only two devices sharing the medium. In a multipoint guided configuration, more than
two devices share the same medium.
A transmission may be simplex, half duplex, or full duplex. In simplex transmission, signals are
transmitted in only one direction; one station is transmitter and the other is receiver. In half-duplex
operation, both stations may transmit, but only one at a time. In full-duplex operation, both stations may
transmit simultaneously. In the latter case, the medium is carrying signals in both directions at the same
time. How this can be is explained in due course. We should note that the definitions just given are the
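ones in common use in the United States (ANSI definitions). Elsewhere (ITU-T definitions), the term
simplex corresponds to half duplex as defined above, and duplex corresponds to full duplex.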
Transmission Media
In a data transmission system, the transmission medium is the physical path between transmitter and
receiver. For guided media, electromagnetic waves are guided along a solid medium, such as copper
twisted pair, copper coaxial cable, and optical fiber. For unguided media, wireless transmission occurs
through the atmosphere, outer space, or water.
The characteristics and quality of a data transmission are determined both by the characteristics of the
medium and the characteristics of the signal. In the case of guided media, the medium itself is more
important in determining the limitations of transmission.
For unguided media, the bandwidth of the signal produced by the transmitting antenna is more important
than the medium in determining transmission characteristics. One key property of signals transmitted
by antenna is directionality. In general, signals at lower frequencies are omnidirectional; that is, the
signal propagates in all directions from the antenna. At higher frequencies, it is possible to focus the
signal into a directional beam.
In considering the design of data transmission systems, key concerns are data rate and distance: the
greater the data rate and distance the better. A number of design factors relating to the transmission
medium and the signal determine the data rate and distance:
Bandwidth: All other factors remaining constant, the greater the bandwidth of a signal, the
higher the data rate that can be achieved.
Transmission impairments: Impairments, such as attenuation, limit the distance. For guided
media, twisted pair generally suffers more impairment than coaxial cable, which in turn suffers
more than optical fiber.
Interference: Interference from competing signals in overlapping frequency bands can distort
or wipe out a signal. Interference is of particular concern for unguided media, but is also a
problem with guided media. For guided media, interference can be caused by emanations from
nearby cables. For example, twisted pairs are often bundled together and conduits often carry
multiple cables. Interference can also be experienced from unguided transmissions. Proper
shielding of a guided medium can minimize this problem.
Number of receivers: A guided medium can be used to construct a point-to-point link or a
shared link with multiple attachments. In the latter case, each attachment introduces some
attenuation and distortion on the line, limiting distance and/or data rate.
Another important protocol architecture is the seven-layer OSI model.
It is clear that there must be a high degree of cooperation between the two computer systems. Instead
of implementing the logic for this as a single module, the task is
broken up into subtasks, each of which is implemented separately. In a protocol architecture, the
modules are arranged in a vertical stack. Each layer in the stack performs a related subset of the
functions required to communicate with another system. It relies on the next lower layer to perform
more primitive functions and to conceal the details of those functions. It provides services to the next
higher layer. Ideally, layers should be defined so that changes in one layer do not require changes in
other layers.
Of course, it takes two to communicate, so the same set of layered functions must exist in two systems.
Communication is achieved by having the corresponding, or peer, layers in two systems communicate.
The peer layers communicate by means of formatted blocks of data that obey a set of rules or
conventions known as a protocol. The key features of a protocol are as follows:
Syntax: Concerns the format of the data blocks
Semantics: Includes control information for coordination and error handling
Timing: Includes speed matching and sequencing
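The layering idea described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the bracketed header format and the function names send_down and receive_up are hypothetical, and the three layer names are taken loosely from the layers discussed in these notes. Each layer adds its own header on the way down, and the peer layer in the other system removes it on the way up.

```python
# Hypothetical layer list and header format, for illustration only.
LAYERS = ["application", "transport", "network-access"]

def send_down(message: str) -> str:
    """Pass a message down the stack; each layer prepends its own header."""
    pdu = message
    for layer in LAYERS:
        pdu = f"[{layer}]{pdu}"
    return pdu

def receive_up(pdu: str) -> str:
    """Pass a received block up the peer stack; the lowest layer strips its header first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not pdu.startswith(header):
            raise ValueError(f"expected header {header}")
        pdu = pdu[len(header):]
    return pdu

wire = send_down("hi")
print(wire)              # [network-access][transport][application]hi
print(receive_up(wire))  # hi
```

Because each layer only touches its own header, the header format of one layer could be changed without changing the code of the others, which is the design goal stated above.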
Host-to-host, or transport layer
Application layer
The physical layer covers the physical interface between a data transmission device (e.g., workstation,
computer) and a transmission medium or network. This layer is concerned with specifying the
characteristics of the transmission medium, the nature of the signals, the data rate, and related matters.
The network access layer is concerned with the exchange of data between an end system (server,
workstation, etc.) and the network to which it is attached. The sending computer must provide the
network with the address of the destination computer, so that the network may route the data to the
appropriate destination.
Regardless of the nature of the applications that are exchanging data, there is
usually a requirement that data be exchanged reliably. That is, we would like to be assured that all of
the data arrive at the destination application and that the data arrive in the same order in which they
were sent. As we shall see, the mechanisms for providing reliability are essentially independent of the
nature of the applications. Thus, it makes sense to collect those mechanisms in a common layer shared
by all applications; this is referred to as the host-to-host layer, or transport layer. The Transmission
Control Protocol (TCP) is the most commonly used protocol to provide this functionality.
Finally, the application layer contains the logic needed to support the various user applications. For
each different type of application, such as file transfer, a separate module is needed that is peculiar to
that application.
Digital-to-Digital Conversion
This section explains how to convert digital data into digital signals. It can be done in two ways, line
coding and block coding. For all communications, line coding is necessary whereas block coding is
optional.
Line Coding
The process of converting digital data into a digital signal is called line coding. Digital data
is found in binary format. It is represented (stored) internally as a series of 1s and 0s.
A digital signal is a discrete signal that represents digital data. There are three types
of line coding schemes available:
Unipolar Encoding
Unipolar encoding schemes use a single voltage level to represent data. In this case, to represent
binary 1, a high voltage is transmitted and to represent 0, no voltage is transmitted. It is also called
Unipolar Non-Return-to-Zero, because there is no rest condition; that is, the signal always represents
either 1 or 0.
Polar Encoding
Polar encoding schemes use multiple voltage levels to represent binary values. Polar encoding is
available in four types:
Polar Non Return to Zero (Polar NRZ)
It uses two different voltage levels to represent binary values. Generally, positive voltage represents 1
and negative voltage represents 0. It is also NRZ because there is no rest condition. The NRZ scheme has
two variants: NRZ-L and NRZ-I.
NRZ-L changes voltage level when a different bit is encountered, whereas NRZ-I changes voltage
when a 1 is encountered.
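The difference between the two NRZ variants can be sketched in a few lines of Python. This is a minimal sketch, assuming voltage levels of +1 and -1 and an initial line level of -1 for NRZ-I; the function names nrz_l and nrz_i are chosen here for illustration.

```python
def nrz_l(bits, high=1, low=-1):
    """NRZ-L: the level itself encodes the bit (positive for 1, negative for 0)."""
    return [high if bit == 1 else low for bit in bits]

def nrz_i(bits, start=-1):
    """NRZ-I: the level is inverted on every 1 and left unchanged on every 0."""
    level = start  # assumed initial line level
    signal = []
    for bit in bits:
        if bit == 1:
            level = -level
        signal.append(level)
    return signal

bits = [0, 1, 0, 0, 1, 1]
print(nrz_l(bits))  # [-1, 1, -1, -1, 1, 1]
print(nrz_i(bits))  # [-1, 1, 1, 1, -1, 1]
```

Note that a long run of 0s produces a constant level in both variants, which is the synchronization problem that motivates the schemes that follow.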
Return to Zero (RZ)
The problem with NRZ is that the receiver cannot tell when one bit ends and the next bit starts
if the sender’s and receiver’s clocks are not synchronized.
RZ uses three voltage levels: positive voltage to represent 1, negative voltage to represent 0, and zero
voltage as the rest level. Signals change during bits, not between bits.
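A minimal sketch of RZ, assuming each bit time is split into two halves and that levels are +1, -1 and 0 (the function name rz is chosen here for illustration): the first half carries the bit's level and the second half returns to zero.

```python
def rz(bits, high=1, low=-1, rest=0):
    """RZ: first half of each bit carries its level, second half returns to zero."""
    signal = []
    for bit in bits:
        level = high if bit == 1 else low
        signal.extend([level, rest])
    return signal

print(rz([1, 0, 1]))  # [1, 0, -1, 0, 1, 0]
```

Every bit now contains a level change, so the receiver can recover timing from the signal itself, at the cost of two signal changes per bit.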
Manchester
This encoding scheme is a combination of RZ and NRZ-L. Bit time is divided into two halves. The signal
makes a transition in the middle of each bit and changes phase when a different bit is encountered.
Differential Manchester
This encoding scheme is a combination of RZ and NRZ-I. The signal also makes a transition in the middle
of each bit, but changes phase only when a 1 is encountered.
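The two Manchester schemes can be compared with a short sketch. The conventions used are assumptions, since these notes do not fix them: for Manchester, the IEEE 802.3 convention is assumed (1 is low-to-high, 0 is high-to-low), and for Differential Manchester the initial phase is assumed to be high-then-low. Each bit is written as two half-bit levels.

```python
def manchester(bits):
    """Manchester, assuming the IEEE 802.3 convention: 1 = low->high, 0 = high->low."""
    signal = []
    for bit in bits:
        signal.extend((-1, 1) if bit == 1 else (1, -1))
    return signal

def differential_manchester(bits, phase=(1, -1)):
    """Differential Manchester: always a mid-bit transition; the phase flips only on a 1."""
    signal = []
    for bit in bits:
        if bit == 1:
            phase = (phase[1], phase[0])  # change phase on a 1
        signal.extend(phase)
    return signal

print(manchester([1, 0, 1, 1]))               # [-1, 1, 1, -1, -1, 1, -1, 1]
print(differential_manchester([0, 1, 1, 0]))  # [1, -1, -1, 1, 1, -1, 1, -1]
```

In both outputs the two halves of every bit differ, which is the guaranteed mid-bit transition the receiver uses for synchronization.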
Bipolar Encoding
Bipolar encoding uses three voltage levels: positive, negative, and zero. Zero voltage represents binary 0,
and bit 1 is represented by alternating positive and negative voltages.
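A minimal sketch of the bipolar rule just described, assuming the first 1 is sent as a positive voltage (the function name bipolar is chosen here for illustration):

```python
def bipolar(bits):
    """Bipolar encoding: 0 -> zero voltage, 1 -> alternating +1 and -1."""
    last = -1  # assumed so that the first 1 is sent as +1
    signal = []
    for bit in bits:
        if bit == 1:
            last = -last
            signal.append(last)
        else:
            signal.append(0)
    return signal

print(bipolar([1, 0, 1, 1, 0, 1]))  # [1, 0, -1, 1, 0, -1]
```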
Block Coding
To ensure accuracy of the received data frame, redundant bits are used. For example, in even parity, one
parity bit is added to make the count of 1s in the frame even. This way the original number of bits is
increased. This is called block coding.
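The even-parity example can be sketched as follows. The function names are chosen here for illustration, and the parity bit is assumed to be appended at the end of the frame.

```python
def add_even_parity(bits):
    """Append one parity bit so that the total number of 1s in the frame is even."""
    parity = sum(bits) % 2
    return bits + [parity]

def check_even_parity(frame):
    """Return True if the frame still contains an even number of 1s."""
    return sum(frame) % 2 == 0

frame = add_even_parity([1, 0, 1, 1])
print(frame)                     # [1, 0, 1, 1, 1]
print(check_even_parity(frame))  # True

frame[0] ^= 1                    # simulate a single-bit error in transit
print(check_even_parity(frame))  # False
```

A single parity bit detects any odd number of flipped bits but cannot detect an even number of flips, and it cannot tell which bit was damaged.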
Block coding is represented by slash notation, mB/nB, meaning that an m-bit block is substituted with an
n-bit block where n > m. Block coding involves three steps:
1. Division
2. Substitution
3. Combination.
After block coding is done, it is line coded for transmission.
Analog-to-Digital Conversion
Microphones create analog voice and cameras create analog video, both of which are treated as analog data.
To transmit this analog data over digital signals, we need analog-to-digital conversion.
Analog data is a continuous stream of data in wave form, whereas digital data is discrete. To convert
an analog wave into digital data, we use Pulse Code Modulation (PCM).
PCM is one of the most commonly used methods to convert analog data into digital form. It involves three
steps:
Sampling
Quantization
Encoding.
Sampling
The analog signal is sampled once every interval T. The most important factor in sampling is the rate at
which the analog signal is sampled. According to the Nyquist theorem, the sampling rate must be at
least twice the highest frequency of the signal.
Quantization
Sampling yields a discrete form of the continuous analog signal. Every discrete sample shows the amplitude
of the analog signal at that instant. Quantization is done between the maximum amplitude value
and the minimum amplitude value. Quantization is an approximation of the instantaneous analog value.
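The three PCM steps can be sketched in Python. This is an illustrative sketch only: the function names, the uniform quantizer, and the fixed-width binary code words are assumptions made here and are not the only way to perform PCM.

```python
import math

def nyquist_rate(highest_frequency_hz):
    """Minimum sampling rate: twice the highest frequency in the signal."""
    return 2 * highest_frequency_hz

def sample(signal, duration_s, rate_hz):
    """Sampling: read the analog signal once every T = 1 / rate seconds."""
    count = int(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(count)]

def quantize(samples, levels, vmin, vmax):
    """Quantization: map each sample to one of `levels` equal steps between vmin and vmax."""
    step = (vmax - vmin) / levels
    codes = []
    for value in samples:
        index = int((value - vmin) / step)
        codes.append(min(max(index, 0), levels - 1))  # clamp to the valid range
    return codes

def encode(codes, bits_per_sample):
    """Encoding: write each quantization level as a fixed-width binary word."""
    return "".join(format(code, f"0{bits_per_sample}b") for code in codes)

codes = quantize([-1.0, -0.5, 0.0, 0.5, 1.0], levels=4, vmin=-1.0, vmax=1.0)
print(codes)             # [0, 1, 2, 3, 3]
print(encode(codes, 2))  # 0001101111

# A 1 kHz tone must be sampled at no less than 2 kHz; here it is sampled at 8 kHz.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
print(nyquist_rate(1000), len(sample(tone, 0.001, 8000)))
```

With 4 levels, each sample costs 2 bits; using more levels reduces the approximation error of quantization but increases the number of bits per sample.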
Switching
Switching is a mechanism by which data/information is sent from a source towards a destination that is not
directly connected to it. Networks have interconnecting devices, which receive data from directly connected
sources, store the data, analyze it, and then forward it to the next interconnecting device closest to the
destination.
In a switched communication network, data entering the network from a station are routed to the
destination by being switched from node to node. For example, in Figure 1.4, data from station A
intended for station F are sent to node 4. They may then be routed via nodes 5 and 6 or nodes 7 and 6
to the destination. Several observations are in order:
1. Some nodes connect only to other nodes (e.g., 5 and 7). Their sole task is the internal (to the
network) switching of data. Other nodes have one or more stations attached as well; in addition
to their switching functions, such nodes accept data from and deliver data to the attached
stations.
2. Node-station links are generally dedicated point-to-point links. Node-node links are usually
multiplexed, using either frequency division multiplexing (FDM) or time division multiplexing
(TDM).
3. Usually, the network is not fully connected; that is, there is not a direct link between every
possible pair of nodes. However, it is always desirable to have more than one possible path
through the network for each pair of stations. This enhances the reliability of the network.
Figure 1.4 Simple Switching Network
Two different technologies are used in wide area switched networks: circuit switching and packet
switching. These two technologies differ in the way the nodes switch information from one link to
another on the way from source to destination.
Circuit switching was developed to handle voice traffic but is now also used for data traffic. The best-
known example of a circuit-switching network is the public telephone network. A public
telecommunications network can be described using four generic architectural components:
Subscribers: The devices that attach to the network. It is still the case that most subscriber
devices to public telecommunications networks are telephones, but the percentage of data traffic
increases year by year.
Subscriber line: The link between the subscriber and the network, also referred to as the
subscriber loop or local loop. Almost all local loop connections use twisted-pair wire. The length
of a local loop is typically in a range from a few kilometers to a few tens of kilometers.
Exchanges: The switching centers in the network. A switching center that directly supports
subscribers is known as an end office. Typically, an end office will support many thousands of
subscribers in a localized area. There are over 19,000 end offices in the United States, so it is
clearly impractical for each end office to have a direct link to each of the other end offices; this
would require on the order of 2 × 10^8 links. Rather, intermediate switching nodes are used.
Trunks: The branches between exchanges. Trunks carry multiple voice frequency circuits using
either FDM or synchronous TDM.
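The argument against directly linking every pair of end offices can be checked with a short calculation: n offices with a direct link between every pair need n(n - 1)/2 links. The function name full_mesh_links is chosen here for illustration, and 19,000 is used as a round figure for the number of end offices.

```python
def full_mesh_links(n):
    """Number of links needed to connect every pair among n nodes directly."""
    return n * (n - 1) // 2

print(full_mesh_links(4))      # 6
print(full_mesh_links(19000))  # 180490500
```

Roughly 180 million links for 19,000 offices shows why intermediate switching nodes and shared trunks are used instead.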
Packet Switching
The long-haul circuit-switching telecommunications network was originally designed to handle voice
traffic, and the majority of traffic on these networks continues to be voice. A key characteristic of
circuit-switching networks is that resources within the network are dedicated to a particular call. For
voice connections, the resulting circuit will enjoy a high percentage of utilization because, most of the
time, one party or the other is talking. However, as the circuit-switching network began to be used
increasingly for data connections, two shortcomings became apparent:
In a typical user/host data connection (e.g., personal computer user logged on to a database
server), much of the time the line is idle. Thus, with data connections, a circuit-switching
approach is inefficient.
In a circuit-switching network, the connection provides for transmission at a constant data rate.
Thus, each of the two devices that are connected must transmit and receive at the same data rate
as the other. This limits the utility of the network in interconnecting a variety of host computers
and workstations.
To understand how packet switching addresses these problems, let us briefly summarize packet-
switching operation. Data are transmitted in short packets. A typical upper bound on packet length is
1000 octets (bytes). If a source has a longer message to send, the message is broken up into a series of
packets (Figure 1.5). Each packet contains a portion (or all for a short message) of the user’s data plus
some control information. The control information, at a minimum, includes the information that the
network requires to be able to route the packet through the network and deliver it to the intended
destination. At each node en route, the packet is received, stored briefly, and passed on to the next node.
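Breaking a message into packets can be sketched as follows. This is a simplified sketch: the dictionary layout, the field names dest, seq and payload, and the function name packetize are hypothetical, and the 1000-octet limit is applied to the user data only, ignoring the size of the control information.

```python
def packetize(message: bytes, dest: str, max_payload: int = 1000):
    """Split a message into packets, each carrying control information plus a slice of data."""
    packets = []
    for seq, start in enumerate(range(0, len(message), max_payload)):
        packets.append({
            "dest": dest,  # control information used for routing
            "seq": seq,    # position of this slice in the original message
            "payload": message[start:start + max_payload],
        })
    return packets

packets = packetize(b"x" * 2500, dest="E")
print([(p["seq"], len(p["payload"])) for p in packets])  # [(0, 1000), (1, 1000), (2, 500)]
```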
Let us return to Figure 1.4, but now assume that it depicts a simple packet-switching network.
Consider a packet to be sent from station A to station E. The packet includes control
information that indicates that the intended destination is E. The packet is sent from A to node 4. Node
4 stores the packet, determines the next leg of the route (say 5), and queues the packet to go out on that
link (the 4–5 link). When the link is available, the packet is transmitted to node 5, which forwards the
packet to node 6, and finally to E. This approach has a number of advantages over circuit switching:
Line efficiency is greater, because a single node-to-node link can be dynamically shared by
many packets over time. The packets are queued up and transmitted as rapidly as possible over
the link. By contrast, with circuit switching, time on a node-to-node link is pre-allocated using
synchronous time division multiplexing. Much of the time, such a link may be idle because a
portion of its time is dedicated to a connection that is idle.
A packet-switching network can perform data-rate conversion. Two stations of different data
rates can exchange packets because each connects to its node at its proper data rate.
When traffic becomes heavy on a circuit-switching network, some calls are blocked; that is, the
network refuses to accept additional connection requests until the load on the network decreases.
On a packet-switching network, packets are still accepted, but delivery delay increases.
Priorities can be used. If a node has a number of packets queued for transmission, it can transmit
the higher-priority packets first. These packets will therefore experience less delay than lower-
priority packets.
Switching Technique
If a station has a message to send through a packet-switching network that is of length greater than the
maximum packet size, it breaks the message up into packets and sends these packets, one at a time, to
the network. A question arises as to how the network will handle this stream of packets as it attempts
to route them through the network and deliver them to the intended destination. Two approaches are
used in contemporary networks: datagram and virtual circuit.
In the datagram approach, each packet is treated independently, with no reference to packets that have
gone before.
Each node chooses the next node on a packet’s path, taking into account information received from
neighboring nodes on traffic, line failures, and so on. So the packets, each with the same destination
address, do not all follow the same route, and they may arrive out of sequence at the exit point. In this
example, the exit node restores the packets to their original order before delivering them to the
destination. In some datagram networks, it is up to the destination rather than the exit node to do the
reordering. Also, it is possible for a packet to be destroyed in the network. For example, if a packet-
switching node crashes momentarily, all of its queued packets may be lost. Again, it is up to either the
exit node or the destination to detect the loss of a packet and decide how to recover it. In this technique,
each packet, treated independently, is referred to as a datagram.
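The reordering and loss detection left to the exit node or destination can be sketched as follows. It assumes, purely for illustration, that every datagram carries a sequence number in a seq field and that the receiver knows how many datagrams to expect; the function name reassemble is hypothetical.

```python
def reassemble(packets, expected_count):
    """Restore the original order by sequence number and report any missing datagrams."""
    by_seq = {p["seq"]: p["payload"] for p in packets}
    missing = [seq for seq in range(expected_count) if seq not in by_seq]
    if missing:
        return None, missing  # recovery (e.g. asking for retransmission) is up to the caller
    data = b"".join(by_seq[seq] for seq in range(expected_count))
    return data, []

# Datagrams may arrive out of sequence because each one is routed independently.
arrived = [
    {"seq": 2, "payload": b"c"},
    {"seq": 0, "payload": b"a"},
    {"seq": 1, "payload": b"b"},
]
print(reassemble(arrived, 3))      # (b'abc', [])
print(reassemble(arrived[:2], 3))  # (None, [1])
```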
In the virtual circuit approach, a preplanned route is established before any packets are sent. Once the
route is established, all the packets between a pair of communicating parties follow this same route
through the network. Because the route is fixed for the duration of the logical connection, it is somewhat
similar to a circuit in a circuit-switching network and is referred to as
a virtual circuit. Each packet contains a virtual circuit identifier as well as data. Each node on the
preestablished route knows where to direct such packets; no routing decisions are required. At any time,
each station can have more than one virtual circuit to any other station and can have virtual circuits to
more than one station. So the main characteristic of the virtual circuit technique is that a route between
stations is set up prior to data transfer. Note that this does not mean that this is a dedicated path, as in
circuit switching. A transmitted packet is buffered at each node, and queued for output over a line, while
other packets on other virtual circuits may share the use of the line. The difference from the datagram
approach is that, with virtual circuits, the node need not make a routing decision for each packet. It is
made only once for all packets using that virtual circuit.
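The idea that the routing decision is made once, at setup time, can be sketched with per-node lookup tables. The table layout, the function names, and the virtual circuit number 7 are hypothetical; the route A, 4, 5, 6, E follows the example path used earlier for Figure 1.4.

```python
def setup_virtual_circuit(tables, vc_id, route):
    """Setup phase: record, at every node on the route, the next hop for this virtual circuit."""
    for node, next_hop in zip(route, route[1:]):
        tables.setdefault(node, {})[vc_id] = next_hop

def forward(tables, vc_id, start):
    """Data phase: each node only looks up the circuit identifier; no routing decision is made."""
    path = [start]
    node = start
    while vc_id in tables.get(node, {}):
        node = tables[node][vc_id]
        path.append(node)
    return path

tables = {}
setup_virtual_circuit(tables, vc_id=7, route=["A", "4", "5", "6", "E"])
print(forward(tables, 7, "A"))  # ['A', '4', '5', '6', 'E']
```

Every packet carrying identifier 7 follows the same path, yet a node such as 5 can hold entries for many other circuits at the same time, so the link is shared rather than dedicated.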
If two stations wish to exchange data over an extended period of time, there are certain advantages to
virtual circuits. First, the network may provide services related to the virtual circuit, including
sequencing and error control. Sequencing refers to the fact that, because all packets follow the same
route, they arrive in the original order. Error control is a service that assures not only that packets arrive
in proper sequence, but also that all packets arrive correctly. For example, if a packet in a sequence
from node 4 to node 6 fails to arrive at node 6, or arrives with an error, node 6 can request a
retransmission of that packet from node 4. Another advantage is that packets should transit the network
more rapidly with a virtual circuit; it is not necessary to make a routing decision for each packet at each
node.
COMPUTER NETWORKING
A system of interconnected computers and computerized peripherals such as printers is called a computer
network. This interconnection among computers facilitates information sharing among them.
Computers may connect to each other by either wired or wireless media.
Geographical Span
Geographically a network can be seen in one of the following categories:
It may span your desk, among Bluetooth-enabled devices, ranging no more than a
few meters.
It may span a whole building, including intermediate devices to connect all floors.
It may span a whole city.
It may span multiple cities or provinces.
It may be one network covering the whole world.
Inter-Connectivity
Components of a network can be connected to each other in different ways. By
connectedness we mean logically, physically, or both.
Every single device can be connected to every other device on the network, making the
network a mesh.
All devices can be connected to a single medium but geographically disconnected,
creating a bus-like structure.
Each device is connected to its left and right peers only, creating a linear structure.
All devices are connected to a single central device, creating a star-like structure.
All devices are connected arbitrarily using all of the previous ways to connect to each other, resulting
in a hybrid structure.
Administration
From an administrator’s point of view, a network can be a private network, which belongs to a single
autonomous system and cannot be accessed outside its physical or logical domain. A network can also be
public, which is accessed by all.
Network Architecture
Computer networks can be classified into various types such as client-server, peer-to-peer, or
hybrid, depending upon their architecture.
There can be one or more systems acting as Server. The others, acting as Clients, send requests to the
Server. The Server takes and processes requests on behalf of Clients.
Two systems can be connected point-to-point, or in back-to-back fashion. They both reside at the
same level and are called peers.
There can be a hybrid network which involves the network architecture of both the above types.
Network Applications
Computer systems and peripherals are connected to form a network. They provide numerous
advantages:
Resource sharing such as printers and storage devices
Exchange of information by means of e-Mails and FTP
Information sharing by using Web or Internet
Interaction with other users using dynamic web pages
IP phones
Video conferences
Parallel computing
Instant messaging
For example, Piconet is Bluetooth-enabled Personal Area Network which may contain up to 8
devices connected together in a master-slave fashion.
Local Area Network
A computer network spanning a building and operated under a single administrative system
is generally termed a Local Area Network (LAN). Usually, a LAN covers an organization’s offices,
schools, colleges, or universities. The number of systems connected in a LAN may vary from as few
as two to as many as 16 million. A LAN provides a useful way of sharing resources between end
users. Resources such as printers, file servers, scanners, and internet access are easily shared
among computers.
LANs are composed of inexpensive networking and routing equipment. A LAN may contain local servers
providing file storage and other locally shared applications. It mostly operates on private IP addresses and
does not involve heavy routing. A LAN works under its own local domain and is controlled centrally.
A LAN uses either Ethernet or Token Ring technology. Ethernet is the most widely employed LAN technology
and uses a star topology, while Token Ring is rarely seen. A LAN can be wired, wireless, or both
at once.
Metropolitan Area Network
The Metropolitan Area Network (MAN) generally extends throughout a city; a cable TV network is an example.
It can be in the form of Ethernet, Token Ring, ATM, or Fiber Distributed Data Interface (FDDI).
Metro Ethernet is a service provided by ISPs. This service enables its users to expand their
Local Area Networks. For example, a MAN can help an organization connect all of its offices in a city.
The backbone of a MAN is high-capacity, high-speed fiber optics. A MAN sits between the Local Area
Network and the Wide Area Network. A MAN provides an uplink for LANs to WANs or the Internet.
Wide Area Network
As the name suggests, the Wide Area Network (WAN) covers a wide area which may span
provinces and even a whole country. Generally, telecommunication networks are Wide Area Networks.
These networks provide connectivity to MANs and LANs. Since they are equipped with a very high-speed
backbone, WANs use very expensive network equipment.
A WAN may use advanced technologies such as Asynchronous Transfer Mode (ATM), Frame Relay, and
Synchronous Optical Network (SONET). A WAN may be managed by multiple administrations.
Peer-to-peer: Both remote processes are executing at same level and they exchange data using
some shared resource.
Client-Server: One remote process acts as a Client and requests some resource from another
application process acting as Server.
In the client-server model, any process can act as Server or Client. It is not the type of machine, the
size of the machine, or its computing power that makes it a server; it is the ability to serve requests
that makes a machine a server.
A system can act as Server and Client simultaneously. That is, one process acts as Server and
another acts as Client. It may also happen that both client and server processes reside on the
same machine.
Communication
Two processes in client-server model can interact in various ways:
Sockets
Remote Procedure Calls (RPC)
Sockets
In this paradigm, the process acting as Server opens a socket using a well-known (or known by the client)
port and waits until a client request comes. The second process, acting as Client, also opens a socket;
but instead of waiting for an incoming request, the client sends its request first.
When the request reaches the server, it is served. It can be either an information-sharing request or a
resource request.
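The socket interaction described above can be sketched with Python's standard socket module. This is a minimal sketch and not a production server: both processes are simulated inside one program using a thread, the server handles a single request, and port 0 is used so that the operating system picks a free port, whereas a real server would listen on a well-known port. The function names and the "served: " reply format are made up for the example.

```python
import socket
import threading

def serve_once(server_sock):
    """Server side: wait for one client, read its request, and send back a reply."""
    conn, _addr = server_sock.accept()  # blocks until a client connects
    with conn:
        request = conn.recv(1024)       # a single recv is enough only for tiny messages
        conn.sendall(b"served: " + request)

def send_request(host, port, payload):
    """Client side: open a socket, send the request first, then wait for the reply."""
    with socket.create_connection((host, port)) as client:
        client.sendall(payload)
        return client.recv(1024)

def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]      # the port the client must know
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    reply = send_request("127.0.0.1", port, b"time?")
    worker.join()
    server.close()
    return reply

if __name__ == "__main__":
    print(demo())  # b'served: time?'
```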
Remote Procedure Call
This is a mechanism where one process interacts with another by means of procedure calls. One process
(client) calls a procedure residing on a remote host. The process on the remote host is said to be the
Server. Both processes are allocated stubs. This communication happens in the following way:
The client process calls the client stub. It passes all the parameters, just as in a call to a procedure
local to it.
All parameters are then packed (marshalled) and a system call is made to send them to other side
of the network.
Kernel sends the data over the network and the other end receives it.
The remote host passes data to the server stub where it is unmarshalled.
The parameters are passed to the procedure and the procedure is then executed.
The result is sent back to the client in the same manner.
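The steps above can be imitated in a single program. This is only a sketch of the idea: the network is replaced by a direct function call, JSON is assumed as the marshalling format, and the names add, PROCEDURES, client_stub and server_stub are hypothetical.

```python
import json

def add(a, b):
    """The remote procedure; in a real system it would live on the server host."""
    return a + b

PROCEDURES = {"add": add}  # procedures the server is willing to execute

def server_stub(wire_bytes):
    """Server stub: unmarshal the parameters, run the procedure, marshal the result."""
    call = json.loads(wire_bytes.decode())
    result = PROCEDURES[call["proc"]](*call["args"])
    return json.dumps({"result": result}).encode()

def client_stub(proc, *args, transport=server_stub):
    """Client stub: marshal the parameters, hand them to the transport, unmarshal the reply."""
    wire = json.dumps({"proc": proc, "args": list(args)}).encode()
    reply = transport(wire)  # stands in for the kernel sending data over the network
    return json.loads(reply.decode())["result"]

print(client_stub("add", 2, 3))  # 5
```

To the caller, client_stub("add", 2, 3) looks like an ordinary local call; the packing, sending and unpacking are hidden inside the stubs.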
NETWORK COMPONENTS
Networking hardware components
Transceivers
A transceiver is a networking device that converts from one cabling technology to another. For example,
a transceiver may act as an interface between a network based on coaxial cable and one using fibre-
optic cable.
Repeater
In a bus topology, signal loss can occur if the segments are too long. A repeater is a
device that connects two network segments and broadcasts data between them. It
amplifies the signal, thereby extending the usable length of the bus.
Hub
One network component that has become standard equipment in networks is the hub. A hub acts as the
central component in a star topology, and typically contains 4, 8, 16 or even more different ports for
connecting to computers or other hubs. It is similar in operation to a repeater, except that it broadcasts
data received by any of the ports to all other ports on the hub. Hubs can be active, passive or hybrid.
Most hubs are active; that is, they regenerate and retransmit signals in the same way as a repeater does.
Because hubs usually have eight to twelve ports for network computers to connect to, they are
sometimes called multiport repeaters. Active hubs require electrical power to run. Some types of hubs
are passive. They act as connection points and do not amplify or regenerate the signal; the signal passes
through the hub. Passive hubs do not require electrical power to run. Advanced hubs that will
accommodate several different types of cables are called hybrid hubs.
A switch is similar to a bridge, except that it has multiple ports. A switch can also be seen as a more
intelligent hub – whereas a hub passes on all data to every port, a switch will only pass data on to the
port that it is intended for.
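The difference between a hub and a switch can be sketched as follows. The hub part follows the description above directly. The switch part assumes one common way a switch can know which port a destination is on: it remembers the port on which each source address was last seen, and sends to all other ports when the destination is still unknown. That learning behaviour, the class name Switch, and the addresses "A" and "B" are assumptions for illustration and are not described in these notes.

```python
def hub_forward(ports, in_port):
    """Hub: repeat incoming data on every port except the one it arrived on."""
    return [port for port in ports if port != in_port]

class Switch:
    """Switch: forward only to the port where the destination is known to be attached."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # address -> port, learned from traffic (assumed behaviour)

    def forward(self, in_port, src, dst):
        self.table[src] = in_port  # remember where the sender is attached
        if dst in self.table:
            return [self.table[dst]]
        return [port for port in self.ports if port != in_port]  # unknown: act like a hub

print(hub_forward([1, 2, 3, 4], in_port=1))  # [2, 3, 4]

switch = Switch([1, 2, 3, 4])
print(switch.forward(1, src="A", dst="B"))   # [2, 3, 4]
print(switch.forward(2, src="B", dst="A"))   # [1]
print(switch.forward(1, src="A", dst="B"))   # [2]
```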
A router is also used for connecting networks together. However, unlike a bridge, a router can be used
to connect networks that use different network technologies. Routers are very commonly found in the
hardware infrastructure that forms the basis of the Internet.
The topic of routing in computer networking is a crucial one and has been the subject of much research.
We will return to this important topic in Handout 4 (Network Architecture).
Wireless networking
Although most networks use physical connections between the network components, recently wireless
networking has been increasing in popularity. Wireless networks can use infrared light, line-of-sight
lasers, or radio waves to transmit data between nodes without the need for physical cabling. They
eliminate the need to install physical cabling and offer a lot of flexibility for users using the network.
However, they are currently more expensive and slower than cable-based networks. As costs drop and
performance increases, wireless networks are sure to be increasingly popular in the future.
There are two main types of hardware associated with wireless communication in
computing: Bluetooth and 802.11. Bluetooth only allows very short-range transmission
(typically less than 10m) and is intended primarily for cable-free peripherals, such as
mice and keyboards. 802.11, or wireless Ethernet, is the standard for wireless
networking of computers, and will be discussed in more detail in Handout 4 (Network
Architecture).
Novell's NetWare is the most familiar and popular example of a NOS in which the client computer's
networking software is added on to its existing computer operating system. The desktop computer needs
both operating systems in order to handle stand-alone and networking functions together.
Network operating system software is integrated into a number of popular operating systems including
Windows 2000 Server/Windows 2000 Professional, Windows NT Server/Windows NT Workstation,
Windows 98, Windows 95, and AppleTalk.
A computer's operating system coordinates the interaction between the computer and the programs
(applications) it is running. It controls the allocation and use of hardware resources such as:
Memory
CPU time
Disk space
Peripheral devices
In a networking environment, servers provide resources to the network clients, and client network
software makes these resources available to the client computer. The network and the client operating
systems are coordinated so that all portions of the network function properly.
Multitasking
A multitasking operating system, as the name suggests, provides the means for a computer to process
more than one task at a time. A true multitasking operating system can run as many tasks as there are
processors (CPUs). If there are more tasks than processors, the computer must arrange for the available
processors to devote a certain amount of time to each task, alternating between tasks until all are
completed. With this system, the computer appears to be working on several tasks at once.
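The alternation between tasks described above can be sketched as a simple round-robin schedule. This is only an illustration for a single processor: the task names, the time units, and the fixed time slice are assumptions, and real operating systems use more elaborate schedulers.

```python
from collections import deque

def round_robin(tasks, time_slice):
    """Simulate one CPU alternating between tasks.

    tasks: dict mapping a (hypothetical) task name to the time units it needs.
    Returns the list of (task, units_run) slices in execution order.
    """
    queue = deque(tasks.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        schedule.append((name, run))
        if remaining > run:
            # The task is not finished: send it to the back of the queue.
            queue.append((name, remaining - run))
    return schedule

schedule = round_robin({"local_print": 3, "network_copy": 5}, time_slice=2)
for name, units in schedule:
    print(name, units)
```

Because each task gets only a short slice before the next one runs, the computer appears to work on both at once.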
Because the interaction between the stand-alone operating system and the NOS is ongoing, a pre-
emptive multitasking system offers certain advantages. For example, when the situation requires it, the
pre-emptive system can shift CPU activity from a local task to a network task.
Client software
In a stand-alone system, when the user types a command that requests the computer to perform a task,
the request goes over the computer's local bus to the computer's CPU. For example, if you want to see
a directory listing on one of the local hard disks, the CPU interprets and executes the request and then
displays the results in a directory listing in the window. In a network environment, however, when a
user initiates a request to use a resource that exists on a server in another part of the network, the request
has to be forwarded, or redirected, away from the local bus, out onto the network, and from there to the
server with the requested resource. This forwarding is performed by the redirector.
The redirector
A redirector processes forwarding requests. Depending on the networking software, this redirector is
sometimes referred to as the "shell" or the "requester." The redirector is a small section of code in the
NOS that:
Intercepts requests in the computer
Determines if the requests should continue in the local computer's bus or be
redirected over the network to another server
Redirector activity originates in a client computer when the user issues a request for a network resource
or service. Figure 1.6 shows how a redirector forwards requests to the network. The user's computer is
referred to as a client because it is making a request of a server. The request is intercepted by the
redirector and forwarded out onto the network. The server processes the connection requested by client
redirectors and gives them access to the resources they request. In other words, the server services - or
fulfils - the request made by the client.
Using the redirector, users don't need to be concerned with the actual location of data or peripherals, or
with the complexities of making a connection.
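The decision a redirector makes can be sketched in a few lines. This is a simplified illustration, not the code of any real NOS: the drive-letter mapping and the server name "fileserver" are assumptions made for the example.

```python
# Hypothetical table of drive letters that are mapped to network shares.
NETWORK_DRIVES = {"F:": r"\\fileserver\shared"}

def redirect(path):
    """Decide whether a request stays on the local bus or goes to the network.

    Returns ("network", remote_path) for mapped drives,
    otherwise ("local", path).
    """
    drive = path[:2].upper()
    if drive in NETWORK_DRIVES:
        # Intercepted: forward the request to the server that owns the share.
        return ("network", NETWORK_DRIVES[drive] + path[2:])
    return ("local", path)

print(redirect(r"C:\docs\notes.txt"))
print(redirect(r"F:\reports\june.txt"))
```

The user types the same kind of path in both cases; only the redirector knows that the second request leaves the local computer.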
The role of the NOS on a server is to process and act upon requests from clients (redirectors) for network
resources managed by the server. For example, in Figure 1.7, a user is requesting a directory listing on
a shared remote hard disk. The request is forwarded by the redirector on to the network, where it is
passed to the file and print server containing the shared directory. The request is granted, and the
directory listing is provided.
The server is also responsible for controlling the way in which resources are shared over the network.
Sharing is the term used to describe resources made publicly available for access by anyone on the
network. Most NOSs not only allow sharing, but also determine the degree of sharing. For example, an
office manager wants everyone on the network to be familiar with a certain document (file), so she
shares the document. However, she controls access to the document by sharing it in such a way that:
Some users will be able only to read it
Some users will be able to read it and make changes in it
Security models
It is the responsibility of the network administrator to ensure that network resources will be safe from
both unauthorised access and accidental or deliberate damage. Policies for assigning permissions and
rights to network resources are at the heart of securing the network.
Two security models have evolved for keeping data and hardware resources safe:
Password-protected shares
Access permissions
These models are also called "share-level security" (for password-protected shares) and "user-level
security" (for access permissions).
Access-permission security involves assigning certain rights on a user-by-user basis. A user types a
password when logging on to the network. The server validates this user name and password
combination and uses it to grant or deny access to shared resources by checking access to the resource
against a user-access database on the server. Access-permission security provides a higher level of
control over access rights. It is much easier for one person to give another person a printer password, as
in share-level security. It is less likely for that person to give away a personal password. Because user-
level security is more extensive and can determine various levels of security, it is usually the preferred
model in larger organizations.
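A minimal sketch of user-level security, assuming a tiny in-memory user-access database. The user names, passwords, and file name are invented for illustration, and a real server would store hashed credentials rather than plain-text passwords.

```python
# Hypothetical user-access database kept on the server (illustrative data only).
USERS = {"abebe": "s3cret", "sara": "pa55word"}
PERMISSIONS = {
    ("abebe", "report.doc"): {"read"},           # read-only access
    ("sara", "report.doc"): {"read", "write"},   # may also make changes
}

def check_access(user, password, resource, action):
    """Grant access only if the login is valid and the user holds the right."""
    if USERS.get(user) != password:
        return False
    return action in PERMISSIONS.get((user, resource), set())

print(check_access("abebe", "s3cret", "report.doc", "write"))
print(check_access("sara", "pa55word", "report.doc", "write"))
```

The same document is shared with both users, but the rights are decided per user rather than by a single password on the share.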
Managing users
Network operating systems also allow a network administrator to determine which people, or groups of
people, will be able to access network resources. A network administrator can use the NOS to:
Create user privileges, tracked by the network operating system, that indicate who gets to use
the network
Grant or deny user privileges on the network
Remove users from the list of users that the network operating system tracks
To simplify the task of managing users in a large network, NOSs allow for the creation of user groups.
By classifying individuals into groups, the administrator can assign privileges to the group. All group
members have the same privileges, which have been assigned to the group as a whole. When a new user
joins the network, the administrator can assign the new user to the appropriate group, with its
accompanying rights and privileges.
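The idea of assigning privileges to groups instead of individuals can be sketched as follows. The group names, privilege names, and user name are assumptions chosen for the example, not part of any particular NOS.

```python
# Hypothetical groups and their privileges (illustrative names only).
GROUP_PRIVILEGES = {
    "staff": {"print", "read_files"},
    "managers": {"print", "read_files", "write_files"},
}
membership = {}  # user name -> group name

def add_user(user, group):
    """Assign a new user to an existing group."""
    if group not in GROUP_PRIVILEGES:
        raise KeyError(f"unknown group: {group}")
    membership[user] = group

def remove_user(user):
    """Remove a user from the list of users the system tracks."""
    membership.pop(user, None)

def privileges(user):
    """A user holds exactly the privileges of the group they belong to."""
    return GROUP_PRIVILEGES[membership[user]]

add_user("sara", "managers")
print(sorted(privileges("sara")))
```

Adding a user takes one step, and changing what a whole group may do means editing a single entry.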
Overview of NOSs
The major server-based network operating systems are Microsoft Windows NT 4 Server, Windows
2000 Server, Windows 2003 Server, Novell NetWare 3.x, 4.x and 5.x, and UNIX (including Linux and
Solaris). The principal peer-to-peer network operating systems are AppleTalk, Windows 95, 98, ME
and XP, and UNIX. Each operating system has its own strengths and weaknesses, and its own supporters
and detractors.
Network Applications
Computer networking has revolutionised the way people use computers. This section will briefly
examine some of the applications of computer networking that have led to this massive change. In
particular we will look at the Internet and electronic mail (or email).
The Internet
The Internet is a vast network of networks, the ultimate WAN, consisting of tens of
thousands of businesses, universities, and research organizations with millions of
individual users and using a variety of different network architectures.
What is now known as the Internet was originally formed in 1969 as a military network
called ARPAnet (Advanced Research Projects Agency network) as part of the United
States Department of Defence. The network opened to non-military users in the 1970s,
when universities and companies doing defence-related research were given access, and
flourished in the late 1980s as most universities and many businesses around the world
started to use the Internet. In 1993, when commercial Internet service providers were
first permitted to sell Internet connections to individuals, usage of the network grew
tremendously. There were millions of new users within months, and a new era of
computer communications began. Today, it is estimated that over 500 million people
use the Internet worldwide. The table below breaks this number down by region.
Region           Internet users
Asia/Pacific     143.99 million
Europe           154.63 million
Middle East        4.65 million
Canada & USA     180.68 million
Latin America     25.33 million
World Total      513.41 million
Every site on the Internet has an address, just like people have PO Box numbers at their local post office.
On the Internet addresses are called URLs (Uniform Resource Locators). URLs are written as a number
of words separated by dots, for example www.yahoo.com. The word after the final dot (e.g. com) is the
domain of the address. The domain indicates the category of the web site. The table below lists some of
the more common categories of address on the Internet.
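As a small illustration, the host name and the domain after the final dot can be pulled out of an address with Python's standard urllib.parse module. The function name host_and_domain and the example address are assumptions made for this sketch.

```python
from urllib.parse import urlsplit

def host_and_domain(url):
    """Return the host name of a URL and the word after its final dot."""
    host = urlsplit(url).hostname
    domain = host.rsplit(".", 1)[-1]
    return host, domain

print(host_and_domain("http://www.yahoo.com/index.html"))
```

For the address above this gives the host www.yahoo.com and the domain com.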
The World Wide Web (WWW) is a way of browsing the information on the Internet in a pleasant,
easy-to-understand form. Text can be mixed with graphics, video, and audio to provide multimedia (i.e. many
different media) Internet content.
This is all made possible by using a special communications protocol, called the Hypertext Transfer
Protocol (HTTP). You may have noticed when using the Internet that many URLs begin with the letters
“http://”; this means that the page of information will be transmitted using the Hypertext Transfer
Protocol. Pages of multimedia Internet content are commonly written in a special language called
HTML (the Hypertext Markup Language).
Instant messaging
One of the more recent innovations in the use of the Internet is instant messaging. Using instant
messaging software two users in different parts of the world can take part in an on-line conversation
using their personal computers. Text typed at one computer will be “instantly” transmitted to the screen
of the other. Instant messaging provides for much faster and interactive communication than electronic
mail.
Electronic mail
When most people think of applications of the Internet they probably think first of electronic mail, or
email. Originally email was a way of sending simple text messages to different users over local area
networks. However, nowadays email can be used to send multimedia content such as audio, video or
even computer software to a user anywhere in the world.
Email is made possible by using the Simple Mail Transfer Protocol (SMTP). SMTP specifies how
electronic mail messages are exchanged between computers using TCP. In order to use email, it is
necessary to install software on both the sending and receiving computer. Email uses the client-server
method to allow mail to be exchanged. Client computers exchange messages with a mail server that is
responsible for ensuring that the message reaches its destination. On the server computer each user is
assigned a specific mailbox. This electronic mailbox is just like a normal PO Box – mail is stored there
until a user logs on to collect their mail. Each electronic mailbox has a unique email address. Email
addresses are divided into two parts: the user name and the mailbox name. These two parts are separated
by an “@” character. For example, Elizabeth@telecom.net.et is a valid email address. The user name is
“Elizabeth”, and the mail server that is responsible for collecting the mail is located at the computer
called “telecom.net.et”. In this case “telecom.net.et” is a mail server running at Ethiopian Telecom in
Addis Ababa. Remember that this computer name will also have an associated IP address to identify it
on the Internet.
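The two parts of an email address can be separated at the “@” character, as in this short sketch. The function name split_address is an assumption; the address is built from the user name and server name given in the text.

```python
def split_address(address):
    """Split an email address into (user name, mail server name)."""
    user, separator, server = address.partition("@")
    if not separator or not user or not server:
        raise ValueError(f"not a valid email address: {address!r}")
    return user, server

user, server = split_address("Elizabeth@telecom.net.et")
print(user)    # Elizabeth
print(server)  # telecom.net.et
```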
SMTP is the protocol used to send email on the Internet. The user receiving the email will need to use
another protocol to access the incoming mail from the mail server. Two different protocols exist for this
purpose: the Post Office Protocol (POP3) and the newer alternative, Internet Message Access Protocol
(IMAP).
A network topology is the arrangement in which computer systems or network devices are
connected to each other. A topology may describe both the physical and the logical aspects of a network. The
logical and physical topologies of the same network may be the same or different.
Point-to-Point
A point-to-point network contains exactly two hosts, such as computers, switches, routers, or servers,
connected back to back using a single piece of cable. Often, the receiving end of one host is connected
to the sending end of the other, and vice versa.
If the hosts are connected point-to-point logically, there may be multiple intermediate devices between them. But
the end hosts are unaware of the underlying network and see each other as if they were connected directly.
Bus Topology
In a bus topology, all devices share a single communication line or cable. A bus topology may run into
problems when multiple hosts send data at the same time. Therefore, a bus topology either uses
CSMA/CD technology or recognizes one host as bus master to resolve the issue. It is one of the simplest
forms of networking, in which the failure of one device does not affect the other devices. But failure of the
shared communication line can make all the devices stop functioning.
Both ends of the shared channel have a line terminator. The data is sent in only one direction, and as soon
as it reaches the far end, the terminator removes the data from the line.
Star Topology
All hosts in Star topology are connected to a central device, known as hub device, using a point-to-point
connection. That is, there exists a point to point connection between hosts and hub. The hub device can
be any of the following:
Layer-1 device such as hub or repeater
Layer-2 device such as switch or bridge
Layer-3 device such as router or gateway
As with the shared cable in a bus topology, the hub acts as a single point of failure. If the hub fails,
connectivity between all hosts is lost. Every communication between hosts takes place through the hub only.
Star topology is not expensive: to connect one more host, only one cable is required, and configuration is simple.
Ring Topology
In a ring topology, each host machine connects to exactly two other machines, creating a circular network
structure. When one host tries to communicate with or send a message to a host which is not adjacent to it,
the data travels through all intermediate hosts. To connect one more host to the existing structure, the
administrator may need only one extra cable.
Failure of any host results in failure of the whole ring. Thus, every connection in the ring is a point of
failure. Some designs guard against this by employing a second, backup ring.
Mesh Topology
In this type of topology, a host is connected to one or multiple hosts. This topology has hosts in point-
to-point connection with every other host or may also have hosts which are in point-to-point connection
with few hosts only.
Hosts in Mesh topology also work as relay for other hosts which do not have direct point-to-point links.
Mesh topology comes in two types:
Full Mesh: All hosts have a point-to-point connection to every other host in the
network. Thus, a network of n hosts requires n(n-1)/2 connections in total. It provides the
most reliable network structure among all network topologies.
Partial Mesh: Not all hosts have a point-to-point connection to every other host. Hosts
connect to each other in some arbitrary fashion. This topology is used where we need
to provide reliability to only some of the hosts.
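A short worked example shows how quickly the cabling of a full mesh grows. A full mesh of n hosts needs n(n-1)/2 links in total, because each of the n hosts links to the other n-1 and every link is shared by two hosts. The function name full_mesh_links is an assumption made for this sketch.

```python
def full_mesh_links(n):
    """Total point-to-point links in a full mesh of n hosts."""
    return n * (n - 1) // 2

for hosts in (2, 4, 5, 10):
    print(hosts, "hosts need", full_mesh_links(hosts), "links")

# Adding one host to an existing full mesh of n hosts needs n new links.
extra = full_mesh_links(11) - full_mesh_links(10)
print("links added by the 11th host:", extra)
```

Ten hosts already need 45 links, which is why a partial mesh is often preferred in practice.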
Tree Topology
Also known as hierarchical topology, this is the most common form of network topology in use
at present. It resembles an extended star topology and inherits properties of the bus topology.
This topology divides the network into multiple levels or layers. In LANs in particular, a network
is divided into three layers of network devices. The lowermost is the access layer, where computers are
attached. The middle layer is known as the distribution layer, which works as a mediator between the upper layer
and the lower layer. The highest layer is known as the core layer and is the central point of the network, i.e. the root
of the tree from which all nodes fork.
All neighboring hosts have a point-to-point connection between them. As in the bus topology, if the
root goes down, the entire network suffers, even though the root is not the only point of failure. Every
connection serves as a point of failure, and losing one divides the network into segments that cannot reach each other.
Hybrid Topology
A network structure whose design contains more than one topology is said to be a hybrid topology.
A hybrid topology inherits the merits and demerits of all the topologies it incorporates.
The above picture represents an arbitrary hybrid topology. The combined topologies may contain
attributes of Star, Ring, Bus, and Daisy-chain topologies. Most WANs are connected by means of a dual-ring
topology, and the networks connected to them are mostly star topology networks. The Internet is the best
example of the largest hybrid topology.
Every layer groups together all the procedures, protocols, and methods it requires to carry out its piece
of the task. Each layer identifies its counterpart on the other host by means of an encapsulation header and trailer.
OSI Model
Open Systems Interconnection (OSI) is an open standard for all communication systems. The OSI model was
established by the International Organization for Standardization (ISO). This model has seven layers:
Application Layer: This layer is responsible for providing interface to the application user. This layer
encompasses protocols which directly interact with the user.
Presentation Layer: This layer defines how data in the native format of remote host should be
presented in the native format of host.
Session Layer: This layer maintains sessions between remote hosts. For example, once user/password
authentication is done, the remote host maintains this session for a while and does not ask for
authentication again in that time span.
Transport Layer: This layer is responsible for end-to-end delivery between hosts.
Network Layer: This layer is responsible for address assignment and uniquely addressing hosts in a
network.
Data Link Layer: This layer is responsible for reading and writing data from and onto the line. Link
errors are detected at this layer.
Physical Layer: This layer defines the hardware, cabling, wiring, power output, pulse rate etc.
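The way data passes down through the seven layers on the sender and back up on the receiver can be pictured with a toy example. This is a deliberate simplification: each layer merely tags the data with its own name, whereas real protocols add binary headers and the physical layer transmits bits rather than adding a header. The function names are assumptions.

```python
# The seven OSI layers, from the top (nearest the user) to the bottom.
LAYERS = ["Application", "Presentation", "Session", "Transport",
          "Network", "Data Link", "Physical"]

def encapsulate(payload):
    """Sender side: every layer wraps the data handed down from above."""
    for layer in LAYERS:
        payload = f"[{layer}]{payload}"
    return payload

def decapsulate(frame):
    """Receiver side: every layer removes the tag added by its counterpart."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame

frame = encapsulate("hello")
print(frame)
print(decapsulate(frame))
```

The receiver strips the tags in the opposite order, so each layer only ever reads the tag written by the same layer on the sending host.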
TCP/IP Model
The Internet uses the TCP/IP protocol suite, also known as the Internet suite. This defines the Internet Model, which
has a four-layer architecture. The OSI Model is a general communication model, but the Internet Model is
what the Internet uses for all its communication. The Internet is independent of its underlying network
architecture, and so is its model. This model has the following layers:
Application Layer: This layer defines the protocol which enables user to interact with the network.
For example, FTP, HTTP etc.
Transport Layer: This layer defines how data should flow between hosts. The major protocol at this layer
is the Transmission Control Protocol (TCP). This layer ensures that data delivered between hosts is in order
and is responsible for end-to-end delivery.
Internet Layer: Internet Protocol (IP) works on this layer. This layer facilitates host addressing and
recognition. This layer defines routing.
Link Layer: This layer provides mechanism of sending and receiving actual data. Unlike its OSI
Model counterpart, this layer is independent of underlying network architecture and hardware.
TRANSMISSION MEDIA
Guided Media
The transmission media is nothing but the physical media over which communication takes place in
computer networks.
Magnetic Media
One of the most convenient ways to transfer data from one computer to another, even before the birth of
networking, was to save it on some storage media and physically transfer it from one station to another.
Though this may seem old-fashioned in today’s world of high-speed Internet, when the size of the data
is huge, magnetic media come into play. For example, a bank has to handle and transfer huge amounts of
customer data, and it stores a backup at some geographically distant place for security reasons
and to protect it from unforeseen calamities. If the bank needs to store a huge backup, then transferring it
over the Internet is not feasible. The WAN links may not support such high speed. Even if they do, the
cost is too high to afford.
In these cases, the data backup is stored on magnetic tapes or magnetic discs, and then moved physically
to remote places.
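A rough calculation makes the point concrete. The figures below are assumptions chosen only for illustration (a 100 TB backup and a 100 Mbps WAN link, decimal units, no protocol overhead); they are not taken from any real bank.

```python
def transfer_days(size_terabytes, link_mbps):
    """Days needed to send a backup over a link, ignoring all overhead."""
    bits = size_terabytes * 1e12 * 8          # terabytes -> bits
    seconds = bits / (link_mbps * 1e6)        # megabits per second -> bits per second
    return seconds / 86400                    # seconds -> days

days = transfer_days(100, 100)
print(round(days, 1), "days")
```

Under these assumptions the transfer would take about three months, far longer than driving a box of tapes to the remote site.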
Twisted Pair Cable
A twisted pair cable is made of two plastic-insulated copper wires twisted together to form a single
medium. Of these two wires, only one carries the actual signal and the other is used as a ground reference.
The twists between the wires help reduce noise (electromagnetic interference) and crosstalk.
UTP (unshielded twisted pair) has seven categories, each suitable for a specific use. In computer networks,
Cat-5, Cat-5e, and Cat-6 cables are mostly used. UTP cables are connected by RJ45 connectors.
Coaxial Cable
Coaxial cable has two copper conductors. The core wire lies in the center and is made of a solid conductor.
The core is enclosed in an insulating sheath. The second conductor is wrapped around the sheath and
is in turn encased in an insulating sheath. All of this is covered by a plastic cover.
Because of its structure, coaxial cable is capable of carrying higher-frequency signals than twisted
pair cable. The wrapped structure provides a good shield against noise and crosstalk. Coaxial cables
provide high bandwidth rates of up to 450 Mbps.
There are three categories of coaxial cable, namely RG-59 (Cable TV), RG-58 (Thin Ethernet), and RG-11
(Thick Ethernet). RG stands for Radio Guide. Cables are connected using BNC connectors
and BNC-T connectors. A BNC terminator is used to terminate the wire at the far ends.
Fiber Optics
Fiber optics works on the properties of light. When a light ray strikes a boundary at the critical angle, it is refracted at
90 degrees. This property is used in fiber optics. The core of a fiber optic cable is made of high-quality
glass or plastic. Light is emitted into one end, travels through the core, and at the other end a light
detector detects the light stream and converts it to electrical data.
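The critical angle follows from Snell's law: when the refracted ray bends to 90 degrees, sin(critical angle) = n2 / n1, where n1 is the refractive index of the core and n2 that of the material surrounding it. The sketch below assumes illustrative index values of 1.48 and 1.46; they are not taken from any specific cable.

```python
import math

def critical_angle_degrees(n_core, n_outer):
    """Angle of incidence at which the refracted ray bends to 90 degrees."""
    if n_outer >= n_core:
        raise ValueError("the core must be optically denser than its surroundings")
    return math.degrees(math.asin(n_outer / n_core))

# Assumed refractive indices, for illustration only.
angle = critical_angle_degrees(1.48, 1.46)
print(round(angle, 1), "degrees")
```

Rays that hit the boundary at a larger angle than this are reflected back into the core, which is how the light stays inside the fiber.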
Fiber optics provides the highest speed. It comes in two modes: single-mode fiber and
multimode fiber. Single-mode fiber can carry a single ray of light, whereas multimode fiber is
capable of carrying multiple beams of light.
Fiber optic cable also comes in unidirectional and bidirectional variants. To connect and access
fiber optic cable, special types of connectors are used. These can be Subscriber Connector (SC), Straight
Tip (ST), or MT-RJ.
Unguided Media
Wireless transmission is a form of unguided media. Wireless communication involves no physical link
between the two or more devices that communicate. Wireless signals are spread
through the air and are received and interpreted by appropriate antennas.
When an antenna is attached to the electrical circuit of a computer or wireless device, it converts the digital
data into wireless signals and spreads them within its frequency range. The receptor on the other end
receives these signals and converts them back to digital data.
Only a small part of the electromagnetic spectrum can be used for wireless transmission.
Radio Transmission
Radio frequency is easier to generate, and because of its large wavelength it can penetrate walls
and similar structures. Radio waves can have wavelengths from 1 mm to 100,000 km and frequencies
ranging from 3 Hz (Extremely Low Frequency) to 300 GHz (Extremely High Frequency). Radio
frequencies are sub-divided into six bands.
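Wavelength and frequency are tied together by the speed of light: wavelength = c / frequency. The short check below reproduces the limits quoted for radio waves; the function name wavelength_m is an assumption made for this sketch.

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second

def wavelength_m(frequency_hz):
    """Wavelength in metres of an electromagnetic wave of the given frequency."""
    return SPEED_OF_LIGHT / frequency_hz

lowest = wavelength_m(3)         # 3 Hz, Extremely Low Frequency
highest = wavelength_m(300e9)    # 300 GHz, Extremely High Frequency
print(round(lowest / 1000), "km")
print(round(highest * 1000, 2), "mm")
```

The results are close to 100,000 km and 1 mm, which shows that the two ranges in the text describe the same band from two points of view.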
Radio waves at lower frequencies can travel through walls, whereas higher RF travels in a straight line
and bounces back. The power of low-frequency waves decreases sharply as they cover long distances.
High-frequency radio waves have more power.
Lower frequencies, such as the VLF, LF, and MF bands, can travel along the ground for up to 1000 kilometers over
the earth’s surface.
Radio waves of high frequencies are prone to being absorbed by rain and other obstacles. They use the
ionosphere of the earth's atmosphere. High-frequency radio waves, such as the HF and VHF bands, are spread
upwards. When they reach the ionosphere, they are refracted back to the earth.
Microwave Transmission
Electromagnetic waves above 100 MHz tend to travel in a straight line, and signals can be
sent over them by beaming the waves towards one particular station. Because microwaves travel in straight
lines, the sender and receiver must be aligned strictly in line-of-sight.
Microwaves can have wavelengths ranging from 1 mm to 1 meter and frequencies ranging from 300 MHz to
300 GHz.
Microwave antennas concentrate the waves into a beam. As shown in the picture above, multiple
antennas can be aligned to reach farther. Microwaves have higher frequencies and do not penetrate
wall-like obstacles.
Microwave transmission depends highly on the weather conditions and the frequency it is using.
Infrared Transmission
Infrared waves lie between the visible light spectrum and microwaves. They have wavelengths of 700 nm to
1 mm and frequencies ranging from 300 GHz to 430 THz.
Infrared is used for very short-range communication, such as between a television and its remote control.
Infrared travels in a straight line, hence it is directional by nature. Because of its high frequency range,
infrared cannot cross wall-like obstacles.
Light Transmission
The highest part of the electromagnetic spectrum that can be used for data transmission is light, or optical
signaling. This is achieved by means of a laser.
Because of the frequency light uses, it tends to travel strictly in a straight line. Hence the sender and receiver
must be in line-of-sight. Because laser transmission is unidirectional, a laser and a photo-detector
need to be installed at both ends of the communication link. A laser beam is generally 1 mm wide, hence it is a
work of precision to align two distant receptors, each pointing at the other's laser source.
Lasers cannot penetrate obstacles such as walls, rain, and thick fog. Additionally, a laser beam is distorted
by wind, atmospheric temperature, or variation in temperature along the path.
Laser is safe for data transmission, as it is very difficult to tap a 1 mm wide laser beam without interrupting the
communication channel.
Database Management System
Data management levels
◼ Data management passes through the different levels of
development along with the development in technology
and services.
◼ These levels could best be described by categorizing the
levels into three levels of development.
◼ Even though there is an advantage and a problem
overcome at each new level, all methods of data
handling are in use to some extent.
◼ The major three levels are:-
◼ Manual Approach
◼ Traditional File Based Approach
◼ Database Approach
1. Manual Approach
In the manual approach, data storage and retrieval follows
the primitive and traditional way of information handling
where cards and paper are used for the purpose.
Limitations of the Manual approach
◼ Prone to error
◼ Difficult to update, retrieve, integrate
◼ You have the data but it is difficult to compile the
information
◼ Limited to small size information
◼ Cross referencing is difficult
2. Traditional File Based Approach
◼ File based systems were an early attempt to
computerize the manual filing system.
◼ This approach is the decentralized computerized data
handling method.
◼ A collection of application programs perform services
for the end-users.
◼ In such systems, every application program that
provides service to end users defines and manages its
own data.
◼ Such systems have a number of programs for each of the
different applications in the organization.
◼ Since every application defines and manages its own
data, the system is subjected to serious data duplication
problem.
◼ File, in traditional file based approach, is a collection of
records which contains logically related data.
Limitations of the Traditional File Based approach
◼ Deletion Anomalies
◼ Insertion Anomalies
3. Database Approach
What is a database?
◼ A database is an organized collection of logically related
data
◼ A database is a shared collection of logically related data
Benefits of the database approach
Limitations and risk of Database Approach
◼ Introduction of new professional and specialized
personnel.
◼ Complexity in designing and managing data
◼ The cost and risk during conversion from the old to the
new system
◼ High cost incurred to develop and maintain
◼ Complex backup and recovery services from the users'
perspective
◼ Reduced performance due to centralization
◼ High impact on the system when a failure occurs
Database applications
◼ Banking: transactions
◼ Airlines: reservation, schedules
◼ Universities: registration, grades
◼ Sales: customers, sales purchases
◼ Online retailers: order tracking
◼ Manufacturing: production, inventory, orders, supply
chain
◼ Human resource: employee records, salaries, tax
deductions
University database example
◼ Application program examples
◼ Add new students, instructors and courses
◼ Generate transcripts
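The two application programs named above can be sketched with Python's built-in sqlite3 module. The table layout, the course code CS101, and all names and marks are assumptions invented for this illustration; a real university database would be far richer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (code TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE grade   (student_id  INTEGER REFERENCES student(id),
                      course_code TEXT REFERENCES course(code),
                      mark        TEXT);
""")

def add_student(name):
    """Add a new student and return the generated id."""
    cur = conn.execute("INSERT INTO student (name) VALUES (?)", (name,))
    return cur.lastrowid

def transcript(student_id):
    """Return (course title, mark) pairs for one student."""
    rows = conn.execute(
        "SELECT c.title, g.mark FROM grade g "
        "JOIN course c ON c.code = g.course_code "
        "WHERE g.student_id = ? ORDER BY c.code",
        (student_id,),
    )
    return rows.fetchall()

sid = add_student("Elizabeth")
conn.execute("INSERT INTO course VALUES ('CS101', 'Intro to Databases')")
conn.execute("INSERT INTO grade VALUES (?, 'CS101', 'A')", (sid,))
print(transcript(sid))
```

Both programs work on the same shared tables, so a student added by one program is immediately visible to the other.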
Database users and administrator
◼ Naive users
◼ Application programmers
◼ Sophisticated users
◼ Specialized users
◼ Online users
◼ Naive users
◼ Those who need not be aware of the presence of database
systems
◼ These are end users who work through menu-driven
applications
◼ Application programmers
◼ Are responsible for developing application programs/user interfaces
written in a high-level language
◼ Sophisticated users
◼ Are users familiar with the structure of the database and
the facilities of the DBMS.
◼ Have complex requirements
◼ Have higher-level queries
◼ Are most of the time engineers, scientists, business analysts, etc.
◼ Specialized users
◼ Who write specialized database applications that do not fit into
the traditional database processing framework
◼ Online users
◼ Who may communicate with the database directly online
22
Database Administrator
◼ A person or group in charge of implementing the database
system in an organization.
◼ The DBA has all privileges allowed by the database
management system and can assign privileges to, or remove
them from, users.
23
Database Management System(DBMS)
◼ MySQL
◼ SQL Server
24
Database
◼ Massive
◼ Persistent
◼ Safe
◼ Multi-user
◼ Convenient
◼ Efficient
◼ Reliable
25
Key people
◼ DBMS implementer
◼ Builds system
◼ Database designer
◼ Establishes schema
◼ Database application developer
◼ Writes programs that operate on a database
◼ Database administrator
◼ Loads data and keeps the system running smoothly
26
Fundamentals of Database Systems
◼ Classification of DBMS
28
DBMS Architecture
There are three levels
◼ External level/view level
29
DBMS Architecture
◼ External Level: Users' view of the database. Describes
that part of database that is relevant to a particular user.
Different users have their own customized view of the
database independent of other users.
◼ Conceptual Level: Community view of the database.
Describes what data is stored in database and
relationships among the data.
◼ Internal Level: Physical representation of the database
on the computer. Describes how the data is stored in
the database.
30
DBMS Architecture
External schemas at the external level to describe the
various user views. Usually uses the same data model as
the conceptual level
Conceptual schema at the conceptual level to describe the
structure and constraints for the whole database for a
community of users. Uses a conceptual data model.
◼ Conceptual schema represents:-
31
DBMS Architecture
◼ Data structure
◼ File organizations
32
ANSI-SPARC Three-level Architecture
33
ANSI-SPARC Architecture and Database Design Phases
34
DBMS schemas at three levels:
35
Data independence
Logical Data Independence:
◼ Refers to immunity of external schemas to changes in
conceptual schema.
◼ Conceptual schema changes e.g. addition/removal of entities
should not require changes to external schema or rewrites of
application programs.
◼ The capacity to change the conceptual schema without
having to change the external schemas and their application
programs
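Logical data independence can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table employee, its columns, and the view employee_names are hypothetical names, and a SQL view stands in for an external schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema (hypothetical): one relation with two attributes.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Sara')")

# External schema: a view used by one group of users.
conn.execute("CREATE VIEW employee_names AS SELECT name FROM employee")
before = conn.execute("SELECT * FROM employee_names").fetchall()

# Change the conceptual schema by adding an attribute.
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")

# The view, and any program that reads it, is unaffected.
after = conn.execute("SELECT * FROM employee_names").fetchall()
print(before, after)
```

The query against the view is written once and keeps working after the schema change, which is the immunity the bullets above describe.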
36
Data independence
Physical Data Independence
◼ The ability to modify the physical schema without changing the
logical schema
◼ The capacity to change the internal schema without having to
change the conceptual schema
◼ Refers to immunity of conceptual schema to changes in the
internal schema
◼ In general, the interfaces between the various levels and
components should be well defined so that changes in some parts
do not seriously influence others
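Physical data independence can be illustrated in the same hedged way. The sketch below assumes a hypothetical course table and treats adding an index as a change to the internal (physical) level; the query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("CS101", "Intro to Computing"), ("CS202", "Databases")],
)

query = "SELECT title FROM course WHERE code = 'CS202'"
before = conn.execute(query).fetchall()

# Internal-level change: add an access path. The logical schema
# (table name, columns, types) is untouched.
conn.execute("CREATE INDEX idx_course_code ON course (code)")

after = conn.execute(query).fetchall()
print(before == after)
```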
37
Database Languages
Data Definition Language (DDL)
◼ Allows the DBA or user to describe and name entities, attributes
and relationships required for the application.
◼ Specification notation for defining the database schema
Data Manipulation Language (DML)
◼ Provides basic data manipulation operations on data held in
the database.
◼ Language for accessing and manipulating the data organized
by the appropriate data model
◼ DML is also known as a query language
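The split between DDL and DML can be seen in a few SQL statements. The sketch below runs them through Python's sqlite3 module; the account table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the schema (names the relation and its attributes).
conn.execute("CREATE TABLE account (number TEXT PRIMARY KEY, balance REAL)")

# DML: inserts, updates and retrieves data held in that schema.
conn.execute("INSERT INTO account VALUES ('A-1', 500.0)")
conn.execute("UPDATE account SET balance = balance - 100 WHERE number = 'A-1'")
balance = conn.execute(
    "SELECT balance FROM account WHERE number = 'A-1'"
).fetchone()[0]
print(balance)
```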
38
Database Languages
DML can be procedural or non-procedural
◼ Procedural DML: user specifies what data is required and how
to get it
◼ Query Languages
◼ Forms Generators
◼ Report Generators
◼ Graphics Generators
◼ Application Generators
39
Data Model
What is data model?
◼ Data Model: a set of concepts to describe the
structure of a database, and certain constraints that the
database should obey.
◼ A data model is a description of the way that data is
stored in a database. Data model helps to understand
the relationship between entities and to create the most
effective structure to hold data.
40
Data Model…
Data Model is a collection of tools or concepts for
describing
◼ Data
◼ Data relationships
◼ Data semantics
◼ Data constraints
◼ The main purpose of Data Model is to represent the
data in an understandable way.
41
Categories of data models include
◼ Object-based
◼ Record-based
◼ Physical
Object-based Data Models
◼ Entity-Relationship
◼ Semantic
◼ Functional
◼ Object-Oriented
42
Data Model…
43
Hierarchical Model
◼ The simplest data model
◼ Record type is referred to as node or segment
◼ The top node is the root node
◼ Nodes are arranged in a hierarchical structure as a sort of
upside-down tree
◼ A parent node can have more than one child node
◼ A child node can only have one parent node
◼ The relationship between parent and child is one-to-many
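The parent and child rules above can be sketched as a small tree structure. This is a minimal illustration, not how a hierarchical DBMS is implemented; the Node class and path_from_root function are hypothetical, and the Department, Employee and Job names are borrowed from the example in these slides.

```python
class Node:
    """A record type (node or segment) in a hierarchical model."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # at most one parent per node
        self.children = []        # a parent may have many children
        if parent is not None:
            parent.children.append(self)


def path_from_root(node):
    """Follow parent links upward, then report the path top-down."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names[::-1]


department = Node("Department")            # root node
employee = Node("Employee", department)
job = Node("Job", department)

print(path_from_root(employee))
```

Because every node stores a single parent link, the one-to-many rule holds by construction: a child can never be attached to two parents.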
44
Hierarchical Model
◼ A relationship is established by creating a physical link between
stored records (each is stored with a predefined access path to
other records)
◼ To add a new record type or relationship, the database must be
redefined and then stored in a new form.
Department
Employee Job
45
Hierarchical Model
Advantages of hierarchical data model:
◼ Hierarchical Model is simple to construct and operate on
46
Network Model
◼ Allows record types to have more than one parent, unlike the
hierarchical model
◼ The network data model sees records as set members
◼ Each set has an owner and one or more members
◼ Allows no many-to-many relationships between entities
◼ Like the hierarchical model, the network model is a collection
of physically linked records.
◼ Allows member records to have more than one owner
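The owner and member idea can be sketched in a few lines. The classes Record and RecordSet and the set names WORKS_IN and HOLDS are hypothetical, chosen only to show a member record that belongs to two sets with different owners.

```python
class Record:
    def __init__(self, name):
        self.name = name
        self.owners = []          # owners reached through the sets it joins


class RecordSet:
    """A named set: one owner record linked to its member records."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        self.members = []

    def add_member(self, member):
        self.members.append(member)
        member.owners.append(self.owner)


department = Record("Department")
job = Record("Job")
employee = Record("Employee")

works_in = RecordSet("WORKS_IN", department)
holds = RecordSet("HOLDS", job)
works_in.add_member(employee)
holds.add_member(employee)

owner_names = [owner.name for owner in employee.owners]
print(owner_names)
```

Each set is still one-to-many (one owner, many members); the extra flexibility over the hierarchical model comes from letting the same record be a member of several sets.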
47
Network Model
Department Job
Employee
Activity
Time Card
48
Network Model
Advantages of network data model:
◼ Network Model is able to model complex relationships and
represents semantics of add/delete on the relationships.
◼ Can handle most situations for modeling using record types
and relationship types.
◼ Language is navigational; uses constructs like FIND, FIND
member, FIND owner, FIND NEXT within set, GET etc.
Programmers can do optimal navigation through the database.
Disadvantages of network data model:
◼ Navigational and procedural nature of processing
◼ Database contains a complex array of pointers that thread
through a set of records.
◼ Little scope for automated “query optimization”
49
Relational model
Database = set of named relations (or tables)
◼ Attribute
◼ Tuple
Schema = structural description of the relations in a database.
The schema includes the names of the relations, their attributes, and the type of
each attribute
Instance = actual contents at a given point in time
NULL = special value for “unknown” or “undefined”
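The difference between schema, instance and NULL can be made concrete with Python's sqlite3 module. The student table below is a hypothetical example; PRAGMA table_info is SQLite's way of reporting a table's columns and declared types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, phone TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Hana', NULL)")

# Schema: attribute names and types (columns 1 and 2 of table_info).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

# Instance: the tuples stored right now. SQL NULL appears as None.
instance = conn.execute("SELECT * FROM student").fetchall()

print(schema)
print(instance)
```

The schema stays fixed while the instance changes with every insert, update or delete.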
50
Relational model
◼ Developed by Dr. Edgar Frank Codd in 1970 (famous
paper, 'A Relational Model of Data for Large Shared Data
Banks')
◼ Terminology originates from the branch of
mathematics called set theory and relations
◼ Can define more flexible and complex relationships
◼ Viewed as a collection of tables called “Relations”, equivalent to
a collection of record types
51
Relational model…
◼ Relation: Two dimensional table
◼ Stores information or data in the form of tables with rows and columns
◼ A row of the table is called a tuple, equivalent to a record
◼ A column of a table is called an attribute, equivalent to a field
◼ A data value is the value of an attribute
◼ Records are related by the data stored jointly in the fields of records in two
tables or files. The related tables contain information that creates the
relation
◼ The tables seem to be independent but are related somehow.
◼ No physical consideration of the storage is required by the user
◼ Many tables are merged together to come up with a new virtual view of the
relationship
52
Relational model…
53
Relational model…
◼ The rows represent records (collections of information about
separate items)
◼ The columns represent fields (particular attributes of a record)
◼ Conducts searches by using data in specified columns of one
table to find additional data in another table
◼ In conducting searches, a relational database matches
information from a field in one table with information in a
corresponding field of another table to produce a third table
that combines requested data from both tables
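The matching of a field in one table against the corresponding field of another is what SQL calls a join. The sketch below is an assumption-laden example: the department and employee tables and their rows are invented, and the matching field is dept_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER)"
)
conn.executemany("INSERT INTO department VALUES (?, ?)", [(10, "Sales"), (20, "Finance")])
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Almaz", 10), (2, "Kebede", 20)],
)

# Match employee.dept_id with department.dept_id to build a third,
# combined result table.
joined = conn.execute(
    """
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
    """
).fetchall()
print(joined)
```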
54
Relational model…
Properties of Relational Databases
◼ Each row of a table is uniquely identified by a primary
key composed of one or more columns
◼ Each tuple in a relation must be unique
◼ A group of columns that uniquely identifies a row in a
table is called a candidate key
◼ The entity integrity rule of the model states that no
component of the primary key may contain a NULL
value.
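Uniqueness of the primary key and the entity integrity rule can be checked by letting a DBMS reject bad rows. The sketch uses Python's sqlite3 module with a hypothetical student table. One SQLite-specific assumption is flagged in the code: SQLite does not reject NULL in a non-integer primary key unless the column is also declared NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite would otherwise accept NULL here.
conn.execute("CREATE TABLE student (student_id TEXT NOT NULL PRIMARY KEY, name TEXT)")

rejected = []
for row in [("S1", "Hana"), ("S1", "Dawit"), (None, "Lensa")]:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        # Duplicate key or NULL key: the row violates a key rule.
        rejected.append(row)

stored = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, stored)
```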
55
Relational model…
Properties of Relational Databases
◼ A column or combination of columns that matches the
primary key of another table is called a foreign key.
Used to cross-reference tables.
◼ The referential integrity rule of the model states that,
for every foreign key value in a table there must be a
corresponding primary key value in another table in the
database or it should be NULL.
◼ All tables are logical entities
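The referential integrity rule can be tried out with Python's sqlite3 module. The tables and rows are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection, so the sketch does that first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department (dept_id)
    )
    """
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Almaz', 10)")     # matches a primary key
conn.execute("INSERT INTO employee VALUES (2, 'Kebede', NULL)")  # NULL is allowed

try:
    conn.execute("INSERT INTO employee VALUES (3, 'Sara', 99)")  # no department 99
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(dangling_rejected, employee_count)
```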
56
Relational model…
Properties of Relational Databases
◼ A table is either a BASE TABLE (Named Relation)
or a VIEW (Unnamed Relation)
◼ Only base tables are physically stored
◼ VIEWS are derived from BASE TABLES with SQL
instructions like:
◼ [SELECT .. FROM .. WHERE .. ORDER BY]
◼ Is the collection of tables
◼ Each entity in one table
◼ Attributes are fields (columns) in table
57
Relational model…
Properties of Relational Databases
◼ Order of rows and columns is immaterial
◼ Entries with repeating groups are said to be unnormalized
◼ Entries are single-valued
58
Building Blocks of the Relational Data Model
Fundamentals of Database Management
Systems
62
The E-R Model: overview
◼ An entity-relationship model (or E-R model) is a
detailed, logical representation of the data for an
organization or for a business area.
◼ The E-R model is expressed in terms of entities in the
business environment, the relationships (or
associations) among those entities, and the attributes
(or properties) of both the entities and their
relationships.
◼ An E-R model is normally expressed as an
entity-relationship diagram (or E-R diagram, or simply ERD),
which is a graphical representation of an E-R model.
63
The E-R Model
64
The E-R Model
Entity
◼ PRODUCT
◼ ORDER
◼ ITEM
◼ SUPPLIER
◼ SHIPMENT
65
Drawing tools
◼ Microsoft Visio
◼ Oracle Designer
◼ AllFusion ERwin
◼ PowerDesigner
66
Building Blocks of the Relational Data Model
The building blocks of the relational data model are:
◼ Entities: real world physical or logical object
67
Building Blocks of the Relational Data Model
Entity: A person, place, object, event, or concept in the
user environment about which the organization wishes to
maintain data.
Thus, an entity has a noun name. Some examples of each
of these kinds of entities follow:
Person: EMPLOYEE, STUDENT, PATIENT
Place: STORE, WAREHOUSE, STATE
Object: MACHINE, BUILDING, AUTOMOBILE
Event: SALE, REGISTRATION, RENEWAL
Concept: ACCOUNT, COURSE, WORK CENTER
68
Building Blocks of the Relational Data Model
◼ Entity type: A collection of entities that share
common properties or characteristics.
◼ Entity instance: A single occurrence of an entity type.
69
Types of entity
◼ Strong entity type: An entity that exists
independently of other entity types
70
Attribute
◼ Attribute: A property or characteristic of an entity or
relationship type that is of interest to the organization.
71
Types of Attributes
1. Simple (atomic) Vs Composite attributes
2. Single-valued Vs multi-valued attributes
3. Stored vs. Derived Attribute
4. Null Values
72
Types of Attributes…
(1) Simple (atomic) Vs Composite attributes
◼ Simple : contains a single value (not divided into sub
parts)
◼ E.g. Age, gender
◼ Composite: Divided into sub parts (composed of other attributes)
◼ E.g. Name, address
73
Types of Attributes …
(2) Single-valued Vs multi-valued attributes
◼ Single-valued : have only single value(the value may
74
Types of Attributes…
Stored vs. Derived Attribute
◼ Stored: not possible to derive or compute
◼ Derived: can be computed from other attributes
◼ E.g. Profit (earning - cost)
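The Profit example can be sketched as a class in which earning and cost are stored attributes and profit is computed on demand. The Sale class and its numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    earning: float   # stored attribute
    cost: float      # stored attribute

    @property
    def profit(self) -> float:
        """Derived attribute: computed from stored ones, never saved."""
        return self.earning - self.cost


sale = Sale(earning=120.0, cost=80.0)
print(sale.profit)
```

Keeping profit out of storage means it cannot drift out of step with earning and cost.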
75
Types of Attributes…
Null Values
◼ NULL applies to attributes which are not applicable or
76
Relationships
◼ Relationships are the glue that holds together the
various components of an E-R model.
◼ Intuitively, a relationship is an association
representing an interaction among the instances of one
or more entity types that is of interest to the
organization.
◼ Thus, a relationship has a verb phrase name.
77
Relationship type and instances
(a) Relationship type (Completes)
78
Relationship type and instances
(b) Relationship instances
79
Relationship type and instances
Relationship type: A meaningful association between
(or among) entity types.
Relationship instance: An association between (or
among) entity instances where each relationship
instance includes exactly one entity from each
participating entity type.
80
Degree of a Relationship
81
Cardinality Constraints
◼ Cardinality constraint: Specifies the number of
instances of one entity that can (or must) be
associated with each instance of another entity.
◼ Minimum cardinality: The minimum number of
instances of one entity that may be associated with
each instance of another entity.
◼ Maximum cardinality: The maximum number of
instances of one entity that may be associated with
each instance of another entity.
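Minimum and maximum cardinality can be checked mechanically once relationship instances are written down as pairs. The function cardinality_violations and the sample data are hypothetical; the rule assumed for the example is that each department employs between 1 and 3 employees.

```python
from collections import Counter


def cardinality_violations(pairs, left_instances, minimum, maximum):
    """Return left-side instances whose number of associated
    right-side instances falls outside [minimum, maximum]."""
    counts = Counter(left for left, _right in pairs)
    return [x for x in left_instances if not minimum <= counts[x] <= maximum]


departments = ["Sales", "Finance"]
employs = [("Sales", "Almaz"), ("Sales", "Kebede")]  # (department, employee)

violations = cardinality_violations(employs, departments, minimum=1, maximum=3)
print(violations)
```

Finance is reported because it has no employees, which breaks the minimum cardinality of one.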
82
Cardinality Constraints
Cardinality can be :-
◼ ONE-TO-ONE, e.g. Building - Location,
83
Cardinality Constraints
◼ Example
84
Problem in ER Modeling
◼ While designing the ER model one could face a
design problem called a connection
trap. Connection traps are problems arising from
misinterpreting certain relationships.
◼ There are two types of connection traps:
◼ Fan trap
◼ Chasm trap
85
Problem in ER Modeling
1.Fan trap:
◼ Occurs where a model represents a relationship between
86
1.Fan trap:
◼ Example
87
Fan trap …
◼ Problem: Which car (Car1, Car3 or Car5) is used by
Employee 6 (Emp6) working in Branch 1 (Bra1)?
◼ From this ER model one cannot tell which car is
used by which staff member, since a branch can have more than
one car and a branch is also populated by more than one
employee.
◼ Thus we need to restructure the model to avoid the
connection trap.
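The ambiguity in the fan trap can be reproduced with two lookups. The dictionaries below are a hypothetical encoding of the example: employees and cars are each linked only to a branch, so the path from Emp6 to a car has to pass through Bra1.

```python
# Each employee works at one branch; each branch has several cars.
works_at = {"Emp6": "Bra1", "Emp2": "Bra1", "Emp4": "Bra2"}
branch_cars = {"Bra1": ["Car1", "Car3", "Car5"], "Bra2": ["Car2"]}


def candidate_cars(employee):
    """All cars reachable from an employee through the branch."""
    return branch_cars[works_at[employee]]


cars_for_emp6 = candidate_cars("Emp6")
print(cars_for_emp6)
```

Three candidates come back and nothing in the model says which one Emp6 actually uses, which is why the relationships need to be restructured.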
88
Fan trap…
◼ To avoid the Fan Trap problem we can go for
restructuring of the E-R Model. This will result in the
following E-R Model.
89
Chasm Trap
90
Chasm Trap…
Example
91
Chasm Trap:
Problem: How can we identify which BRANCH is
responsible for which PROJECT? We know that, whether
the PROJECT is active or not, there is a responsible
BRANCH. But because the minimum participation
between EMPLOYEE and PROJECT is zero, a PROJECT with no EMPLOYEE
cannot be traced back to its BRANCH.
92
Chasm Trap…
◼ The solution for this Chasm Trap problem is to add
another relationship between the extreme entities
(BRANCH and PROJECT)
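The chasm trap and its fix can be sketched with plain dictionaries. All names and data are hypothetical: Proj2 has no employee assigned, so the BRANCH to EMPLOYEE to PROJECT path breaks for it, while the added direct relationship still answers the question.

```python
employee_branch = {"Emp1": "Bra1", "Emp2": "Bra2"}
project_employee = {"Proj1": "Emp1", "Proj2": None}   # Proj2: nobody assigned yet


def branch_via_employee(project):
    """Try to reach the branch through the employee on the project."""
    employee = project_employee.get(project)
    if employee is None:
        return None                                   # the chasm: the path is broken
    return employee_branch[employee]


# Fix: a direct BRANCH-PROJECT relationship.
project_branch = {"Proj1": "Bra1", "Proj2": "Bra2"}

print(branch_via_employee("Proj2"), project_branch["Proj2"])
```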
93
Constraints
◼ Domain Integrity: No value of the attribute should be
beyond the allowable limits
◼ Entity Integrity: In a base relation, no attribute of a
primary key can be null
◼ Referential Integrity: If a foreign key exists in a
relation, either the foreign key value must match a
candidate key in its home relation or the foreign key
value must be null (foreign key to primary key match-ups)
◼ Enterprise Integrity: Additional rules specified by the
users or database administrators of a database are
incorporated
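Domain integrity is commonly enforced with a CHECK constraint. The sketch below uses Python's sqlite3 module; the student table and the allowed age range of 15 to 100 are assumptions made only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id TEXT NOT NULL PRIMARY KEY,
        age        INTEGER CHECK (age BETWEEN 15 AND 100)  -- allowable limits
    )
    """
)
conn.execute("INSERT INTO student VALUES ('S1', 20)")

try:
    conn.execute("INSERT INTO student VALUES ('S2', 150)")   # outside the domain
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True

ages = conn.execute("SELECT age FROM student").fetchall()
print(out_of_range_rejected, ages)
```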
94
Key constraints
Key constraints
◼ If tuples need to be unique in the database, then we need
to make each tuple distinct. To do this we need relational
keys that uniquely identify each tuple in a relation.
◼ Super Key: an attribute or set of attributes that uniquely
identifies a tuple within a relation.
◼ Candidate Key: a super key such that no proper subset of that
collection is a Super Key within the relation. A candidate key has
two properties:
1.Uniqueness
2.Irreducibility
If a candidate key consists of more than one attribute, it is called a
composite key.
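Uniqueness and irreducibility can be tested on a sample of tuples, as sketched below. The functions is_superkey and is_candidate_key and the sample rows are hypothetical. Note the assumption: a real key is a property of the schema that must hold for every possible instance, so a check on one sample can only rule a key out, not prove it.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """Irreducibility: a superkey with no proper subset that is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False
    return True


rows = [
    {"id": 1, "name": "Hana", "dept": "CS"},
    {"id": 2, "name": "Hana", "dept": "IS"},
    {"id": 3, "name": "Dawit", "dept": "CS"},
]

print(is_superkey(rows, ("id", "name")))         # unique, but reducible
print(is_candidate_key(rows, ("id", "name")))    # "id" alone is enough
print(is_candidate_key(rows, ("name", "dept")))  # composite candidate key here
```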
95
Key constraints
◼ Primary Key: the candidate key that is selected to
identify tuples uniquely within the relation.
◼ In the worst case, the entire set of attributes in a relation can serve as the
primary key.
◼ Foreign Key: an attribute, or set of attributes, within
one relation that matches the candidate key of some
relation.
◼ A foreign key is a link between different relations to create
the view or the unnamed relation
96
Relational languages and views
◼ The languages in relational database management
systems are the DDL, SDL, VDL and the DML that
are used to define or create the database and perform
manipulation on the database.
◼ There are two kinds of relations in a relational
database.
◼ The difference is on how the relation is created, used
and updated:
97
Relational languages and views
1. Base Relation
◼ A Named Relation corresponding to an entity in the
conceptual schema, whose tuples are physically stored
in the database.
2. View
◼ Is the dynamic result of one or more relational
operations operating on the base relations to produce
another virtual relation. So a view is a virtually derived
relation that does not necessarily exist in the database
but can be produced upon request by a particular user at
the time of request.
98
Relational languages and views
Purpose of a view
◼ Hides unnecessary information from users
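The two ideas about views, that a view is computed on request and that it can hide information, can be sketched with Python's sqlite3 module. The employee table and the employee_public view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base relation: its tuples are physically stored.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Almaz', 9000)")

# View: a stored query that leaves out the salary column.
conn.execute("CREATE VIEW employee_public AS SELECT emp_id, name FROM employee")

view_query = "SELECT * FROM employee_public ORDER BY emp_id"
first = conn.execute(view_query).fetchall()

# The view is evaluated at request time, so new base tuples show up in it.
conn.execute("INSERT INTO employee VALUES (2, 'Kebede', 7000)")
second = conn.execute(view_query).fetchall()

print(first)
print(second)
```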
99
Symbols to draw ERD
100
Symbols to draw ERD
101
ERD example
102