
Industryal

This document provides an introduction to data structures and algorithms analysis. It discusses that a program consists of organizing data through data structures and solving problems through algorithms. Abstract data types are introduced as a way to model real-world problems by focusing only on relevant properties and operations. Algorithms are then discussed as computational procedures that take input and produce output, transforming data structures from one state to another. The properties, analysis concepts, and complexity analysis of algorithms are also covered to determine algorithms' efficiency based on time and space requirements.

Uploaded by

Mikiyas Getasew
Copyright
© © All Rights Reserved

Data Structures and Algorithms Analysis

1. Introduction to Data Structures and Algorithms Analysis


A program is written in order to solve a problem. A solution to a problem consists of two things:
• A way to organize the data
• A sequence of steps to solve the problem
The way data are organized in a computer's memory is called a data structure, and the sequence of
computational steps to solve a problem is called an algorithm. Therefore, a program is nothing but
data structures plus algorithms.
1.1. Introduction to Data Structures
Given a problem, the first step in solving it is to obtain one's own abstract view, or model, of the
problem. This process of modeling is called abstraction.

The model defines an abstract view of the problem. This implies that the model focuses only on
problem-related aspects and that the programmer tries to define the properties of the problem.

These properties include:

• The data which are affected, and
• The operations that are involved in the problem.

With abstraction you create a well-defined entity that can be properly handled. These entities define the
data structure of the program.

An entity with the properties just described is called an abstract data type (ADT).

1.1.1. Abstract Data Types


An ADT consists of an abstract data structure and operations. Put in other terms, an ADT is an abstraction
of a data structure.
The ADT specifies:
1. What can be stored in the Abstract Data Type
2. What operations can be done on/by the Abstract Data Type.
For example, if we are going to model employees of an organization:
• This ADT stores employees with their relevant attributes, discarding irrelevant ones.
• This ADT supports operations such as hiring, firing, and retiring.

A data structure is a language construct that the programmer has defined in order to implement an abstract
data type.
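
As a minimal sketch of the employee example, the ADT's operations could be declared as a C++ class whose private members form the data structure. The names `Employee`, `EmployeeRegistry`, `hire`, `fire` and `count`, and the choice of attributes, are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Only the attributes relevant to the model are kept (hypothetical choice).
struct Employee {
    int id;
    std::string name;
};

// The ADT: what is stored (employees) and which operations are allowed.
class EmployeeRegistry {
public:
    // Hiring adds an employee to the organization.
    void hire(const Employee& e) { staff_.push_back(e); }

    // Firing removes the employee with the given id; returns false if absent.
    bool fire(int id) {
        for (std::size_t i = 0; i < staff_.size(); ++i) {
            if (staff_[i].id == id) {
                staff_.erase(staff_.begin() + static_cast<std::ptrdiff_t>(i));
                return true;
            }
        }
        return false;
    }

    // Number of employees currently stored.
    std::size_t count() const { return staff_.size(); }

private:
    // The data structure implementing the ADT; callers never see it.
    std::vector<Employee> staff_;
};
```

Code that uses `EmployeeRegistry` depends only on the operations, so the `std::vector` could later be swapped for another container without changing callers.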

There are lots of formalized and standard Abstract data types such as Stacks, Queues, Trees, etc.

Do all characteristics need to be modeled?

Not at all.
• It depends on the scope of the model.
• It depends on the reason for developing the model.

1.1.2. Abstraction
Abstraction is a process of classifying characteristics as relevant and irrelevant for the particular purpose
at hand and ignoring the irrelevant ones.

Applying abstraction correctly is the essence of successful programming.

How do data structures model the world or some part of the world?
• The value held by a data structure represents some specific characteristic of the world.
• The characteristic being modeled restricts the possible values held by a data structure.
• The characteristic being modeled restricts the possible operations to be performed on the data
structure.
Note: Notice the relation between characteristic, value, and data structures.
Where are algorithms, then?

1.2. Algorithms
An algorithm is a well-defined computational procedure that takes some value or a set of values as input
and produces some value or a set of values as output. Data structures model the static part of the world.
They are unchanging while the world is changing. In order to model the dynamic part of the world we
need to work with algorithms. Algorithms are the dynamic part of a program’s world model.

An algorithm transforms data structures from one state to another state in two ways:

• An algorithm may change the value held by a data structure.
• An algorithm may change the data structure itself.

The quality of a data structure is related to its ability to successfully model the characteristics of the
world. Similarly, the quality of an algorithm is related to its ability to successfully simulate the changes in
the world.

However, independent of any particular world model, the quality of data structure and algorithms is
determined by their ability to work together well. Generally speaking, correct data structures lead to
simple and efficient algorithms and correct algorithms lead to accurate and efficient data structures.

1.2.1. Properties of an algorithm


• Finiteness: Algorithm must complete after a finite number of steps.

• Definiteness: Each step must be clearly defined, having one and only one interpretation. At
each point in computation, one should be able to tell exactly what happens next.
• Sequence: Each step must have a unique defined preceding and succeeding step. The first step
(start step) and last step (halt step) must be clearly noted.
• Feasibility: It must be possible to perform each instruction.
• Correctness: It must compute correct answer for all possible legal inputs.
• Language Independence: It must not depend on any one programming language.
• Completeness: It must solve the problem completely.
• Effectiveness: It must be possible to perform each step exactly and in a finite amount of time.
• Efficiency: It must solve with the least amount of computational resources such as time and
space.
• Generality: Algorithm should be valid on all possible inputs.
• Input/Output: There must be a specified number of input values, and one or more result
values.

1.2.2. Algorithm Analysis Concepts


Algorithm analysis refers to the process of determining the amount of computing time and storage space
required by different algorithms. In other words, it’s a process of predicting the resource requirement of
algorithms in a given environment.

There are usually many possible algorithms for solving a given problem. One has to be able to choose the
best algorithm for the problem at hand using some scientific method. To classify some data structures and
algorithms as good, we need precise ways of analyzing them in terms of resource requirements. The main
resources are:

• Running Time
• Memory Usage
• Communication Bandwidth

Running time is usually treated as the most important since computational time is the most precious
resource in most problem domains.

There are two approaches to measuring the efficiency of algorithms:

• Empirical: Programming competing algorithms and trying them on different instances.
• Theoretical: Determining mathematically the quantity of resources (execution time, memory space,
etc.) needed by each algorithm.

However, it is difficult to use actual clock-time as a consistent measure of an algorithm's efficiency,
because clock-time can vary based on many things. For example:

• Specific processor speed
• Current processor load
• Specific data for a particular run of the program
    o Input Size
    o Input Properties
• Operating Environment

Accordingly, we can analyze an algorithm according to the number of operations required, rather than
according to an absolute amount of time involved. This can show how an algorithm’s efficiency changes
according to the size of the input.

1.2.3. Complexity Analysis


Complexity Analysis is the systematic study of the cost of computation, measured either in time units or
in operations performed, or in the amount of storage space required.

The goal is to have a meaningful measure that permits comparison of algorithms independent of operating
platform.
There are two things to consider:
• Time Complexity: Determine the approximate number of operations required to solve a problem
of size n.
• Space Complexity: Determine the approximate memory required to solve a problem of size n.

Complexity analysis involves two distinct phases:

• Algorithm Analysis: Analysis of the algorithm or data structure to produce a function T(n) that
describes the algorithm in terms of the operations performed, in order to measure the complexity of
the algorithm.
• Order of Magnitude Analysis: Analysis of the function T(n) to determine the general
complexity category to which it belongs.

There is no generally accepted set of rules for algorithm analysis. However, an exact count of operations
is commonly used.

1.2.3.1. Analysis Rules:


1. We assume an arbitrary time unit.
2. Execution of one of the following operations takes time 1:
   • Assignment operation
   • Single input/output operation
   • Single Boolean operation
   • Single arithmetic operation
   • Function return
3. Running time of a selection statement (if, switch) is the time for the condition evaluation + the
maximum of the running times for the individual clauses in the selection.
4. Loops: Running time for a loop is equal to the running time for the statements inside the loop *
number of iterations.
The total running time of a statement inside a group of nested loops is the running time of the
statement multiplied by the product of the sizes of all the loops.
For nested loops, analyze inside out.
   • Always assume that the loop executes the maximum number of iterations possible.
5. Running time of a function call is 1 for setup + the time for any parameter calculations + the time
required for the execution of the function body.

Examples:
1. int count(){
       int n, i;    // declarations only; no time units are charged for them
       int k=0;
       cout<<"Enter an integer";
       cin>>n;
       for (i=0;i<n;i++)
           k=k+1;
       return 0;
   }
Time Units to Compute
-------------------------------------------------
1 for the assignment statement: int k=0
1 for the output statement.
1 for the input statement.
In the for loop:
1 assignment, n+1 tests, and n increments.
n loops of 2 units for an assignment and an addition.
1 for the return statement.
-------------------------------------------------------------------
T(n) = 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n)
2. int total(int n)
   {
       int sum=0;
       for (int i=1;i<=n;i++)
           sum=sum+1;
       return sum;
   }
Time Units to Compute
-------------------------------------------------
1 for the assignment statement: int sum=0
In the for loop:
1 assignment, n+1 tests, and n increments.
n loops of 2 units for an assignment and an addition.
1 for the return statement.
-------------------------------------------------------------------
T(n) = 1+(1+n+1+n)+2n+1 = 4n+4 = O(n)
3. void func()
   {
       int n;       // declaration only; no time unit is charged for it
       int x=0;
       int i=0;
       int j=1;
       cout<<"Enter an Integer value";
       cin>>n;
       while (i<n){
           x++;
           i++;
       }
       while (j<n)
       {
           j++;
       }
   }
Time Units to Compute
-------------------------------------------------
1 for the first assignment statement: x=0;
1 for the second assignment statement: i=0;
1 for the third assignment statement: j=1;
1 for the output statement.
1 for the input statement.
In the first while loop:
n+1 tests
n loops of 2 units for the two increment (addition) operations
In the second while loop:
n tests
n-1 increments
-------------------------------------------------------------------
T(n) = 1+1+1+1+1+(n+1)+2n+n+(n-1) = 5n+5 = O(n)
4. int sum (int n)
   {
       int partial_sum = 0;
       for (int i = 1; i <= n; i++)
           partial_sum = partial_sum + (i * i * i);
       return partial_sum;
   }
Time Units to Compute
-------------------------------------------------
1 for the assignment.
In the for loop:
1 assignment, n+1 tests, and n increments.
n loops of 4 units for an assignment, an addition, and two multiplications.
1 for the return statement.
-------------------------------------------------------------------
T(n) = 1+(1+n+1+n)+4n+1 = 6n+4 = O(n)
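
One way to check a hand count such as T(n) = 4n+4 for example 2 is to instrument the function and charge one unit per operation, following the analysis rules. The sketch below is illustrative only; the name `count_units_total` and the `units` counter are hypothetical and not part of the original examples.

```cpp
// Charges time units for example 2 (int total(int n)) according to the
// analysis rules: 1 unit per assignment, test, arithmetic operation, return.
long count_units_total(int n) {
    long units = 0;

    int sum = 0;
    units += 1;                 // assignment: sum = 0

    int i = 1;
    units += 1;                 // loop initialization: i = 1

    while (true) {
        units += 1;             // loop test: i <= n
        if (!(i <= n)) break;

        sum = sum + 1;
        units += 2;             // one addition and one assignment

        i++;
        units += 1;             // loop increment
    }

    units += 1;                 // return statement
    return units;               // equals 4n + 4 for n >= 0
}
```

For n = 10 the function returns 44, matching 4n+4.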

1.2.3.2. Formal Approach to Analysis


In the above examples we have seen that exact analysis is somewhat tedious. However, it can be simplified
by using a formal approach in which we ignore initializations, loop control, and bookkeeping.

for Loops: Formally

• In general, a for loop translates to a summation. The index and bounds of the summation are the
same as the index and bounds of the for loop.

    for (int i = 1; i <= N; i++) {
        sum = sum + i;
    }

    $\sum_{i=1}^{N} 1 = N$

• Suppose we count the number of additions that are done. There is 1 addition per iteration of the
loop, hence N additions in total.

Nested Loops: Formally

• Nested for loops translate into multiple summations, one for each for loop.

    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= M; j++) {
            sum = sum + i + j;
        }
    }

    $\sum_{i=1}^{N} \sum_{j=1}^{M} 2 = \sum_{i=1}^{N} 2M = 2MN$

• Again, count the number of additions. The outer summation is for the outer for loop.
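
The double summation can be checked empirically by counting additions while the nested loops run. This is a minimal sketch; the function name `count_additions` and its counter are hypothetical.

```cpp
// Counts the additions performed by the nested loops: the statement
// sum = sum + i + j contains two additions per inner iteration.
long count_additions(int N, int M) {
    long additions = 0;
    long sum = 0;
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= M; j++) {
            sum = sum + i + j;
            additions += 2;
        }
    }
    (void)sum;          // the sum itself is not needed here
    return additions;   // equals 2 * M * N
}
```

With N = 3 and M = 4 the count is 24, which agrees with 2MN.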

Consecutive Statements: Formally

• Add the running times of the separate blocks of your code.

    for (int i = 1; i <= N; i++) {
        sum = sum + i;
    }
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= N; j++) {
            sum = sum + i + j;
        }
    }

    $\left[\sum_{i=1}^{N} 1\right] + \left[\sum_{i=1}^{N} \sum_{j=1}^{N} 2\right] = N + 2N^2$

Conditionals: Formally
• if (test) s1 else s2: Compute the maximum of the running times for s1 and s2.

    if (test == 1) {
        for (int i = 1; i <= N; i++) {
            sum = sum + i;
        }
    }
    else {
        for (int i = 1; i <= N; i++) {
            for (int j = 1; j <= N; j++) {
                sum = sum + i + j;
            }
        }
    }

    $\max\left(\sum_{i=1}^{N} 1,\ \sum_{i=1}^{N} \sum_{j=1}^{N} 2\right) = \max\left(N,\ 2N^2\right) = 2N^2$

Example:

Suppose we have hardware capable of executing $10^6$ instructions per second. How long would it take to
execute an algorithm whose complexity function is $T(n) = 2n^2$ on an input of size $n = 10^8$?

The total number of operations to be performed is $T(10^8)$:

$T(10^8) = 2 \cdot (10^8)^2 = 2 \cdot 10^{16}$

The required number of seconds is $T(10^8) / 10^6$, so:

Running time $= 2 \cdot 10^{16} / 10^6 = 2 \cdot 10^{10}$ seconds

The number of seconds per day is 86,400, so this is about 231,480 days (634 years).

Exercises
Determine the run time equation and complexity of each of the following code segments.
1. for (i=0;i<n;i++)
       for (j=0;j<n; j++)
           sum=sum+i+j;

2. for(int i=1; i<=n; i++)
       for (int j=1; j<=i; j++)
           sum++;
What is the value of the sum if n=20?

3. int k=0;
   for (int i=0; i<n; i++)
       for (int j=i; j<n; j++)
           k++;
What is the value of k when n is equal to 20?

4. int k=0;
   for (int i=1; i<n; i*=2)
       for(int j=1; j<n; j++)
           k++;
What is the value of k when n is equal to 20?

5. int x=0;
   for(int i=1;i<n;i=i+5)
       x++;
What is the value of x when n=25?

6. int x=0;
   for(int k=n;k>=n/3;k=k-5)
       x++;
What is the value of x when n=25?

7. int x=0;
   for (int i=1; i<n;i=i+5)
       for (int k=n;k>=n/3;k=k-5)
           x++;
What is the value of x when n=25?

8. int x=0;
   for(int i=1;i<n;i=i+5)
       for(int j=0;j<i;j++)
           for(int k=n;k>=n/2;k=k-3)
               x++;
What is the correct big-Oh notation for the above code segment?

1.3. Measures of Times


In order to determine the running time of an algorithm it is possible to define three functions T_best(n),
T_avg(n) and T_worst(n) as the best, the average and the worst case running time of the algorithm respectively.

Average Case (T_avg): The amount of time the algorithm takes on an "average" set of inputs.
Worst Case (T_worst): The amount of time the algorithm takes on the worst possible set of inputs.
Best Case (T_best): The amount of time the algorithm takes on the most favorable set of inputs of a given size.

We are mostly interested in the worst-case time, since it provides an upper bound for all inputs. This
bound is usually expressed as a "Big-Oh" estimate.
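
The difference between the cases can be made concrete by counting key comparisons in a simple sequential search: the best case needs one comparison, the worst case needs n. The sketch below is illustrative; the function name `comparisons_in_linear_search` is hypothetical.

```cpp
// Returns how many key comparisons a sequential search performs
// before it finds `key` or runs out of elements.
int comparisons_in_linear_search(const int list[], int n, int key) {
    int comparisons = 0;
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (list[i] == key) break;   // found: stop early
    }
    return comparisons;
}
```

For the array {7, 3, 9, 1, 5}, searching for 7 takes 1 comparison (best case), while searching for 5 or for a missing value such as 42 takes 5 comparisons (worst case).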

1.4. Asymptotic Analysis


Asymptotic analysis is concerned with how the running time of an algorithm increases with the size of the
input in the limit, as the size of the input increases without bound.
There are five notations used to describe a running time function. These are:
• Big-Oh Notation (O)
• Big-Omega Notation (Ω)
• Theta Notation (θ)
• Little-o Notation (o)
• Little-omega Notation (ω)

Big Oh Notation, Ο
Big-Oh notation is the formal way to express an upper bound on an algorithm's running time. It is commonly
used to describe the worst-case time complexity, that is, the longest amount of time an algorithm can
possibly take to complete.

For example, for a function f(n):

Ο(f(n)) = { g(n) : there exist c > 0 and n0 such that g(n) ≤ c·f(n) for all n > n0 }

Example
Let us consider a given function, $f(n) = 4n^3 + 10n^2 + 5n + 1$.
Considering $g(n) = n^3$,
$f(n) \le 5 \cdot g(n)$ for all values of n > 10. Hence, the complexity of f(n) can be represented as
O(g(n)), i.e. $O(n^3)$.

Omega Notation, Ω
Omega notation is the formal way to express a lower bound on an algorithm's running time. It is commonly
used to describe the best-case time complexity, that is, the least amount of time an algorithm can
possibly take to complete.

For example, for a function f(n):

Ω(f(n)) = { g(n) : there exist c > 0 and n0 such that g(n) ≥ c·f(n) for all n > n0 }

Example
Let us consider a given function, $f(n) = 4n^3 + 10n^2 + 5n + 1$.
Considering $g(n) = n^3$, $f(n) \ge 4 \cdot g(n)$ for all values of n > 0.
Hence, the complexity of f(n) can be represented as Ω(g(n)), i.e. $Ω(n^3)$.
Theta Notation, θ
Theta notation is the formal way to express both a lower bound and an upper bound on an algorithm's
running time. It is represented as follows.

For a function f(n):

θ(f(n)) = { g(n) : g(n) = Ο(f(n)) and g(n) = Ω(f(n)) }

Example
Let us consider a given function, $f(n) = 4n^3 + 10n^2 + 5n + 1$.
Considering $g(n) = n^3$, $4 \cdot g(n) \le f(n) \le 5 \cdot g(n)$ for all large values of n.
Hence, the complexity of f(n) can be represented as θ(g(n)), i.e. $θ(n^3)$.
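
Bounds of this kind can be spot-checked numerically over a range of n. The sketch below assumes, purely for illustration, the cubic f(n) = 4n^3 + 10n^2 + 5n + 1 and g(n) = n^3; the coefficients and the names `f_example`, `g_example` and `bounded_between` are assumptions of this sketch. A finite check is not a proof, but it shows where the constants start to work.

```cpp
// Example cubic; the coefficients are assumptions made for illustration.
long long f_example(long long n) { return 4*n*n*n + 10*n*n + 5*n + 1; }

// Comparison function g(n) = n^3.
long long g_example(long long n) { return n*n*n; }

// True if c_low*g(n) <= f(n) <= c_high*g(n) for every n in [from, to].
bool bounded_between(long long c_low, long long c_high,
                     long long from, long long to) {
    for (long long n = from; n <= to; n++) {
        long long fn = f_example(n);
        long long gn = g_example(n);
        if (c_low * gn > fn || fn > c_high * gn) return false;
    }
    return true;
}
```

With c_low = 4 and c_high = 5, the check passes for every n from 11 to 1000 but fails at n = 10, where f(10) = 5051 exceeds 5·g(10) = 5000.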

Typical Orders
Here is a table of some typical cases. It uses logarithms to base 2, but these are simply proportional to
logarithms in any other base.

n       O(1)   O(log n)   O(n)     O(n log n)   O(n^2)      O(n^3)
1       1      0          1        0            1           1
2       1      1          2        2            4           8
4       1      2          4        8            16          64
8       1      3          8        24           64          512
16      1      4          16       64           256         4,096
1024    1      10         1,024    10,240       1,048,576   1,073,741,824

1.5. Relational Properties of the Asymptotic Notations


Transitivity
• if f(n)=θ(g(n)) and g(n)=θ(h(n)) then f(n)=θ(h(n)),
• if f(n)=O(g(n)) and g(n)=O(h(n)) then f(n)=O(h(n)),
• if f(n)=Ω(g(n)) and g(n)=Ω(h(n)) then f(n)=Ω(h(n)).
Symmetry
• f(n)=θ(g(n)) if and only if g(n)=θ(f(n)).
Transpose symmetry
• f(n)=O(g(n)) if and only if g(n)=Ω(f(n)),
• f(n)=o(g(n)) if and only if g(n)=ω(f(n)).
Reflexivity
• f(n)=θ(f(n)),
• f(n)=O(f(n)),
• f(n)=Ω(f(n)).

2. Simple Sorting and Searching Algorithms
2.1. Searching

Searching is the process of looking for a specific element in a list of items or determining that the item is
not in the list. There are two simple searching algorithms:

• Sequential Search, and
• Binary Search

2.1.1. Linear Search (Sequential Search)

Pseudocode

Loop through the array starting at the first element until the value of target matches one of the array
elements.

If a match is not found, return –1.

Time is proportional to the size of input (n) and we call this time complexity O(n).

Example Implementation:

int Linear_Search(int list[], int n, int key)   // n is the number of elements in list
{
    int index=0;
    int found=0;
    while(found==0&&index<n){   // testing first also handles an empty list
        if(key==list[index])
            found=1;
        else
            index++;
    }
    if(found==0)
        index=-1;
    return index;
}

2.1.2. Binary Search

This searching algorithm works only on an ordered list.

The basic idea is:

• Locate the midpoint of the array to search.
• Determine if the target is in the lower half or the upper half of the array.
    o If in the lower half, make this half the array to search.
    o If in the upper half, make this half the array to search.
• Loop back to step 1 until the target is found, or until the size of the array to search is one and
this element does not match, in which case return -1.

The computational time for this algorithm is proportional to $\log_2 n$. Therefore the time complexity is
O(log n).

Example Implementation:

int Binary_Search(int list[], int n, int key)   // n is the number of elements in list
{
    int left=0;
    int right=n-1;
    int mid=0;
    int index;
    int found=0;
    while(found==0&&left<=right){   // <= so that a one-element range is still checked
        mid=(left+right)/2;
        if(key==list[mid])
            found=1;
        else{
            if(key<list[mid])
                right=mid-1;
            else
                left=mid+1;
        }
    }
    if(found==0)
        index=-1;
    else
        index=mid;
    return index;
}
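
The logarithmic behaviour can be observed by counting how many midpoints are probed. The variant below is a sketch written for that purpose; the name `binary_search_probes` and the `probes` output parameter are hypothetical additions, not part of the implementation above.

```cpp
// Binary search over a sorted array that also reports, through `probes`,
// how many midpoints were examined.
int binary_search_probes(const int list[], int n, int key, int& probes) {
    int left = 0;
    int right = n - 1;
    probes = 0;
    while (left <= right) {
        int mid = left + (right - left) / 2;   // avoids overflow of left+right
        probes++;
        if (list[mid] == key) return mid;
        if (key < list[mid])
            right = mid - 1;                   // continue in the lower half
        else
            left = mid + 1;                    // continue in the upper half
    }
    return -1;                                 // key is not in the list
}
```

On a sorted array of 1024 elements, a search for a value larger than every element examines 11 midpoints, that is, log2(1024) + 1, whereas a sequential search would examine all 1024 elements.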

2.2. Sorting Algorithms

Sorting is one of the most important operations performed by computers. Sorting is a process of
reordering a list of items in either increasing or decreasing order. The following are simple sorting
algorithms used to sort small-sized lists.

• Insertion Sort
• Selection Sort
• Bubble Sort
2.2.1. Insertion Sort

The insertion sort works just like its name suggests - it inserts each item into its proper place in the final
list. The simplest implementation of this requires two list structures - the source list and the list into which
sorted items are inserted. To save memory, most implementations use an in-place sort that works by
moving the current item past the already sorted items and repeatedly swapping it with the preceding item
until it is in place.

It's the most instinctive type of sorting algorithm. The approach is the same approach that you use for
sorting a set of cards in your hand. While playing cards, you pick up a card, start at the beginning of your
hand and find the place to insert the new card, insert it and move all the others up one place.

Basic Idea:

Find the location for an element and move all others up, and insert the element.

The process involved in insertion sort is as follows:

1. The left most value can be said to be sorted relative to itself. Thus, we don’t need to do anything.
2. Check to see if the second value is smaller than the first one. If it is, swap these two values. The
first two values are now relatively sorted.
3. Next, we need to insert the third value in to the relatively sorted portion so that after insertion, the
portion will still be relatively sorted.
4. Remove the third value first. Slide the second value to make room for insertion. Insert the value in
the appropriate position.
5. Now the first three are relatively sorted.
6. Do the same for the remaining items in the list.

Implementation

void insertion_sort(int list[], int n){   // n is the number of elements in list
    int temp;
    for(int i=1;i<n;i++){
        temp=list[i];
        for(int j=i; j>0 && temp<list[j-1];j--)
        { // work backwards through the array finding where temp should go
            list[j]=list[j-1];
            list[j-1]=temp;
        }//end of inner loop
    }//end of outer loop
}//end of insertion_sort

Analysis
How many comparisons?
1+2+3+…+(n-1) = n(n-1)/2 = $O(n^2)$ in the worst case
How many swaps?
1+2+3+…+(n-1) = n(n-1)/2 = $O(n^2)$ in the worst case
How much space?
In-place algorithm
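
These counts can be observed directly by instrumenting a swap-based insertion sort. The sketch below is for illustration only; the `SortCounts` struct and the function name `insertion_sort_counted` are hypothetical.

```cpp
#include <utility>

// Tallies kept while sorting.
struct SortCounts {
    long comparisons;
    long swaps;
};

// Swap-based insertion sort that counts element comparisons and swaps.
SortCounts insertion_sort_counted(int list[], int n) {
    SortCounts c{0, 0};
    for (int i = 1; i < n; i++) {
        for (int j = i; j > 0; j--) {
            c.comparisons++;
            if (list[j] >= list[j - 1]) break;   // item is already in place
            std::swap(list[j], list[j - 1]);     // move item one step left
            c.swaps++;
        }
    }
    return c;
}
```

On the reversed input {5, 4, 3, 2, 1} this performs 10 comparisons and 10 swaps, which is n(n-1)/2 for n = 5. On an already sorted input of the same size it performs only 4 comparisons and no swaps.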

2.2.2. Selection Sort

Basic Idea:

• Loop through the array from i=0 to n-1.
• Select the smallest element in the array from i to n-1.
• Swap this value with the value at position i.

Implementation:

void selection_sort(int list[], int n)   // n is the number of elements in list
{
    int i,j, smallest, temp;
    for(i=0;i<n;i++){
        smallest=i;
        for(j=i+1;j<n;j++){
            if(list[j]<list[smallest])
                smallest=j;
        }//end of inner loop
        temp=list[smallest];
        list[smallest]=list[i];
        list[i]=temp;
    } //end of outer loop
}//end of selection_sort
Analysis
How many comparisons?
(n-1)+(n-2)+…+1 = n(n-1)/2 = $O(n^2)$
How many swaps?
n = O(n)
How much space?
In-place algorithm
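
Unlike insertion sort, selection sort makes the same number of comparisons whatever the input order. A small instrumented sketch shows this; the function name `selection_sort_comparisons` is hypothetical, and the outer loop stops at n-1 because the last element is already in place by then.

```cpp
#include <utility>

// Selection sort that returns the number of element comparisons made.
long selection_sort_comparisons(int list[], int n) {
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++) {
        int smallest = i;
        for (int j = i + 1; j < n; j++) {
            comparisons++;
            if (list[j] < list[smallest]) smallest = j;
        }
        std::swap(list[i], list[smallest]);   // one swap per pass
    }
    return comparisons;                        // always n(n-1)/2
}
```

For any array of 5 elements the function returns 10, whether the input is sorted, reversed, or in arbitrary order.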
2.2.3. Bubble Sort

Bubble sort is one of the simplest sorting algorithms to implement and one of the slowest on very large inputs.

Basic Idea:

• Loop through the array from i=0 to n-1 and swap adjacent elements if they are out of order.

Implementation:
void bubble_sort(int list[], int n)   // n is the number of elements in list
{
    int i,j,temp;
    for(i=0;i<n; i++){
        for(j=n-1;j>i; j--){
            if(list[j]<list[j-1]){
                temp=list[j];
                list[j]=list[j-1];
                list[j-1]=temp;
            }//swap adjacent elements
        }//end of inner loop
    }//end of outer loop
}//end of bubble_sort

Analysis of Bubble Sort

How many comparisons?
(n-1)+(n-2)+…+1 = n(n-1)/2 = $O(n^2)$
How many swaps?
(n-1)+(n-2)+…+1 = n(n-1)/2 = $O(n^2)$ in the worst case
Space?
In-place algorithm.
General Comments

Each of these algorithms requires n-1 passes: each pass places one item in its correct place. The ith pass
makes either i or n - i comparisons and moves. So the total is:

$\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2}$

or $O(n^2)$. Thus these algorithms are only suitable for small problems where their simple code makes them
faster than the more complex code of the O(n log n) algorithms. As a rule of thumb, expect to find an
O(n log n) algorithm faster for n > 10, but the exact value depends very much on individual machines.

Empirically it’s known that Insertion sort is over twice as fast as the bubble sort and is just as easy to
implement as the selection sort. In short, there really isn't any reason to use the selection sort - use the
insertion sort instead.

If you really want to use the selection sort for some reason, try to avoid sorting lists of more than 1,000
items with it or repetitively sorting lists of more than a couple hundred items.

Module Title: Basic Programming

Course Title: Fundamentals of Programming I (Cosc1013)

Compiled By: Destalem H

Unit I: Introduction to Programming


Introduction to computer and programming
A computer is a device that can perform computations and make logical decisions
phenomenally faster than human beings can. Many of today’s personal computers can
perform billions of calculations in one second—more than a human can perform in a
lifetime. Supercomputers are already performing thousands of trillions (quadrillions) of
instructions per second! To put that in perspective, a quadrillion-instruction-per-second
computer can perform in one second more than 100,000 calculations for every person on
the planet!
And—these “upper limits” are growing quickly!

Computers process data under the control of sets of instructions called computer
programs. These programs guide the computer through orderly sets of actions specified by
people called computer programmers. The programs that run on a computer are referred
to as software.

A computer consists of various devices referred to as hardware (e.g., the keyboard, screen,
mouse, hard disks, memory, DVDs and processing units). Computing costs are dropping
dramatically, owing to rapid developments in hardware and software technologies.
Computers that might have filled large rooms and cost millions of dollars decades ago are
now inscribed on silicon chips smaller than a fingernail, costing perhaps a few dollars each.
Ironically, silicon is one of the most abundant materials—it’s an ingredient in common sand.
Silicon-chip technology has made computing so economical that more than a billion
general-purpose computers are in use worldwide.

Any computer can directly understand only its own machine language, defined by its
hardware design. Machine languages generally consist of strings of numbers (ultimately
reduced to 1s and 0s) that instruct computers to perform their most elementary operations
one at a time. Machine languages are machine dependent (a particular machine language
can be used on only one type of computer). Such languages are cumbersome for humans.

Programming in machine language was simply too slow and tedious for most programmers.
Computer usage increased rapidly with the advent of assembly languages, but programmers
still had to use many instructions to accomplish even the simplest tasks. To speed the
programming process, high-level languages were developed in which single statements
could be written to accomplish substantial tasks. Translator programs called compilers
convert high-level language programs into machine language. High-level languages allow
you to write instructions that look almost like everyday English and contain commonly used
mathematical notations.
Instead of using the strings of numbers that computers could directly understand,
programmers began using English-like abbreviations to represent elementary operations.
These abbreviations formed the basis of assembly languages. Translator programs called
assemblers were developed to convert early assembly-language programs to machine
language at computer speeds.

From the programmer’s standpoint, high-level languages are preferable to machine and
assembly languages. C++, C, Microsoft’s .NET languages (e.g., Visual Basic, Visual C++ and
Visual C#) and Java are among the most widely used high-level programming languages.
Compiling a large high-level language program into machine language can take a
considerable amount of computer time. Interpreter programs were developed to execute
high-level language programs directly (without the delay of compilation), although slower
than compiled programs run.

Software
You do not normally talk directly to the computer, but communicate with it through an operating system.
The operating system allocates the computer’s resources to the different tasks that the computer must
accomplish. The operating system is actually a program, but it is perhaps better to think of it as your chief
servant. It is in charge of all your other servant programs, and it delivers your requests to them. If you
want to run a program, you tell the operating system the name of the file that contains it, and the
operating system runs the program. If you want to edit a file, you tell the operating system the name of
the file and it starts up the editor to work on that file. To most users the operating system is the
computer. Most users never see the computer without its operating system. The names of some
common operating systems are UNIX, DOS, Linux, Windows, Mac OS, and VMS.
A program is a set of instructions for a computer to follow. The input to a computer can be thought of as
consisting of two parts, a program and some data. The computer follows the instructions in the program,
and in that way, performs some process. The data is what we conceptualize as the input to the program.
For example, if the program adds two numbers, then the two numbers are the data. In other words, the
data is the input to the program, and both the program and the data are input to the computer (usually
via the operating system). Whenever we give a computer both a program to follow and some data for
the program, we are said to be running the program on the data, and the computer is said to execute the
program on the data. The word data also has a much more general meaning than the one we have just
given it. In its most general sense it means any information available to the computer. The word is
commonly used in both the narrow sense and the more general sense.
High-Level Languages
There are many languages for writing programs. In this text we will discuss the C++ programming
language and use it to write our programs. C++ is a high-level language, as are most of the other
programming languages you are likely to have heard of, such as C, Java, Pascal, Visual Basic, FORTRAN,
COBOL, Lisp, Scheme, and Ada. High-level languages resemble human languages in many ways. They are
designed to be easy for human beings to write programs in and to be easy for human beings to read. A
high-level language, such as C++, contains instructions that are much more complicated than the simple
instructions a computer’s processor (CPU) is capable of following.
The kind of language a computer can understand is called a low-level language. The exact details of low-
level languages differ from one kind of computer to another. A typical low-level instruction might be the
following:
ADD X Y Z
This instruction might mean “Add the number in the memory location called X to the number in the
memory location called Y, and place the result in the memory location called Z.” The above sample
instruction is written in what is called assembly language. Although assembly language is almost the
same as the language understood by the computer, it must undergo one simple translation before the
computer can understand it. In order to get a computer to follow an assembly language instruction, the
words need to be translated into strings of zeros and ones. For example, the word ADD might translate to
0110, the X might translate to 1001, the Y to 1010, and the Z to 1011. The version of the above
instruction that the computer ultimately follows would then be:
0110 1001 1010 1011
Assembly language instructions and their translation into zeros and ones differ from machine to
machine.
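The translation step described above can be pictured with a small C++ sketch. The function name assembleLine and the lookup table are assumptions made only for illustration; the bit patterns are the ones used in the example, and a real assembler is considerably more involved.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical encoding table copied from the example in the text.
// Real machines each use their own, machine-specific encodings.
std::string assembleLine(const std::string& line)
{
    static const std::map<std::string, std::string> codes = {
        {"ADD", "0110"}, {"X", "1001"}, {"Y", "1010"}, {"Z", "1011"}
    };

    std::istringstream words(line);   // split the line into words
    std::string word;
    std::string result;
    while (words >> word) {
        auto it = codes.find(word);
        if (it == codes.end()) {
            return "";                // unknown word: this sketch simply gives up
        }
        if (!result.empty()) {
            result += ' ';            // separate the groups of bits
        }
        result += it->second;
    }
    return result;
}
```

Calling assembleLine("ADD X Y Z") walks through the four words and replaces each one with its string of zeros and ones.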
Programs written in the form of zeros and ones are said to be written in machine language, because that
is the version of the program that the computer (the machine) actually reads and follows. Assembly
language and machine language are almost the same thing, and the distinction between them will not be
important to us. The important distinction is that between machine language and high-level languages
like C++: Any high-level language program must be translated into machine language before the
computer can understand and follow the program.
Compilers
A program that translates a high-level language like C++ to a machine language is called a compiler. A
compiler is thus a somewhat peculiar sort of program, in that its input or data is some other program,
and its output is yet another program. To avoid confusion, the input program is usually called the source
program or source code, and the translated version produced by the compiler is called the object
program or object code. The word code is frequently used to mean a program or a part of a program,
and this usage is particularly common when referring to object programs. Now, suppose you want to run
a C++ program that you have written. In order to get the computer to follow your C++ instructions,
proceed as follows. First, run the compiler using your C++ program as data. Notice that in this case, your
C++ program is not being treated as a set of instructions. To the compiler, your C++ program is just a long
string of characters. The output will be another long string of characters, which is the machine-language
equivalent of your C++ program.

Next, run this machine-language program on what we normally think of as the data for the C++ program.
The output will be what we normally conceptualize as the output of the C++ program.

Any C++ program you write will use some operations (such as input and output routines) that have
already been programmed for you. These items that are already programmed for you (like input and
output routines) are already compiled and have their object code waiting to be combined with your
program’s object code to produce a complete machine-language program that can be run on the
computer. Another program, called a linker, combines the object code for these program pieces with the
object code that the compiler produced from your C++ program. In routine cases, many systems will do
this linking for you automatically. Thus, you may not need to worry about linking in very simple cases.

Algorithm development and representation


Computer problem solving is a complicated process requiring careful planning and attention to
detail. The vehicle for the computer solution to a problem is a set of explicit and unambiguous
instructions called a program, expressed in a programming language. A program is a sequence
of instructions that determine the operations to be carried out by the machine in order to solve a
particular problem.
Stages of Program Development
There are five stages of program development: Analysis, Algorithm and Flowchart Design,
Coding, Implementation, and Maintenance.
Analysis

The analysis stage requires a thorough understanding of the problem at hand and an analysis of the data
and procedures needed to achieve the desired result. In the analysis stage, therefore, we work out
what must be done rather than how to do it:
• What input data does the problem need?
• What procedures are needed to achieve the result?
• What output data are expected?
Algorithm design and flowchart
Once the requirements of the program are defined, the next stage is to design an algorithm to solve
the problem. An algorithm is a finite set of steps which, if followed, accomplishes a particular task.
It is a map or outline of a solution that shows the precise order in which the
program will execute individual functions to arrive at the solution. It is language independent. An
algorithm can be expressed in many ways. Here, we consider only two such methods:
narrative (pseudocode) and flowchart. In the narrative method the algorithm is described in plain
English. There are no strict rules about how to write it; we use pseudocode, which is a free-form
list of statements that shows the sequence of instructions to follow.
Flowchart
A flowchart consists of an ordered set of standard symbols (mostly, geometrical shapes) which
represent operations, data flow or equipment.
A flowchart is a diagram consisting of labeled symbols, together with arrows connecting one
symbol to another. It is a means of showing the sequence of steps of an algorithm.
A program flowchart shows the operations and logical decisions of a computer program. The most
significant advantage of flowcharts is a clear presentation of the flow of control in the algorithm, i.e.
the sequence in which operations are performed. Flowcharts allow the reader to follow the logic of
the algorithm more easily than would a linear description in English. Another advantage of a
flowchart is that it does not depend on any particular programming language, so it can be used to
translate an algorithm into more than one programming language.
A basic set of established flowchart symbols is: Processing, Input/output, Decision, Terminal
(START/STOP), Connector, and Flow lines.

The symbols have the following meanings:

Processing: one or more computational tasks are to be performed sequentially.
Input/output: data are to be read into the computer memory from an input device, or data are to
be passed from the memory to an output device.
Decision: usually contains a question. There are typically two output paths: one if the
answer to the question is yes (true), and the other if the answer is no (false). The path to be
followed is selected during execution by testing whether or not the condition specified within
the symbol is fulfilled.
Terminal: appears either at the beginning of a flowchart (and contains the word "start") or at its
conclusion (and contains "stop"). It represents the start and end of a program.
Connector: makes it possible to separate a flowchart into parts.
Flow lines: indicate the direction of logical flow (a path from one operation to another).
DESIGN AND IMPLEMENTATION OF ALGORITHMS
An algorithm is a finite set of instructions that specify a sequence of operations to be carried out in
order to solve a specific problem or class of problems. It is just a tool for solving a problem. All the
tasks that can be carried out by a computer can be stated as algorithms. For one problem there may
be many algorithms that solve it, but the algorithm that we select must be
powerful, easy to maintain, and efficient (it should not take too much space or time).

Once an algorithm is designed, it is coded in a programming language and the computer executes the
program. An algorithm consists of a set of explicit and unambiguous finite steps which, when carried
out for a given set of initial conditions, produce the corresponding output and terminate in a finite
amount of time. By unambiguity it is meant that each step should be defined precisely, i.e., it should
have only one meaning. This definition can be made more precise by listing the features that every
algorithm must have.
An algorithm has five important features.
• Finiteness: An algorithm terminates after a finite number of steps.
• Definiteness: Each step of the algorithm is precisely defined, i.e., the actions to be carried
out should be specified unambiguously.
• Effectiveness: All the operations used in the algorithm are basic (division, multiplication,
comparison, etc.) and can be performed exactly in a finite duration of time.
• Input: An algorithm has certain precise inputs, i.e., quantities which are specified to it
initially, before the execution of the algorithm begins.
• Output: An algorithm has one or more outputs, that is, the results of operations which have
a specified relation to the inputs.

Examples of Algorithm Development

1) Algorithm to add two numbers.
Step 1: START
Step 2: Read two numbers n1 and n2.
Step 3: Sum ← n1 + n2
Step 4: Print Sum
Step 5: STOP
2) Algorithm to find largest number from two numbers.
Step 1: START
Step 2: Read n1 and n2.
Step 3: If n1 > n2 go to step 5
Step 4: Big ← n2, go to step 6
Step 5: Big ← n1
Step 6: Print Big
Step 7: STOP
3) Algorithm to find largest number from three numbers.
Step 1: START
Step 2: Read n1, n2 and n3.
Step 3: If n1 ≥ n2 and n1 ≥ n3, go to step 6
Step 4: If n2 ≥ n1 and n2 ≥ n3, go to step 7
Step 5: Big ← n3, go to step 8
Step 6: Big ← n1, go to step 8
Step 7: Big ← n2
Step 8: Print Big
Step 9: STOP.
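As a rough sketch, the steps of algorithm 3 might be written in C++ as follows. The function name largestOfThree is an assumption for illustration. The comparisons use >= rather than > so that equal inputs (for example 5, 5, 1) still give the right answer, and the go-to steps become an if / else if / else chain.

```cpp
// Returns the largest of three integers.
int largestOfThree(int n1, int n2, int n3)
{
    int big;
    if (n1 >= n2 && n1 >= n3) {          // n1 is at least as large as both others
        big = n1;
    } else if (n2 >= n1 && n2 >= n3) {   // otherwise test n2
        big = n2;
    } else {                             // otherwise n3 must be the largest
        big = n3;
    }
    return big;
}
```

Exactly one of the three assignments is carried out on each call, which mirrors the way the algorithm jumps to a single "Big ←" step.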

4) Algorithm to find sum of N positive integer numbers.

Step 1: START
Step 2: Read N
Step 3: Sum ← 0
Step 4: Count ← 0
Step 5: Read Num
Step 6: Sum ← Sum + Num
Step 7: Count ← Count + 1
Step 8: If Count < N then go to step 5
Step 9: Print Sum
Step 10: Stop
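Algorithm 4 could be sketched in C++ roughly as below. The function name sumOfN and the choice to read from a stream passed as a parameter are assumptions made so that the sketch can be tried with typed input or with prepared text. Unlike the step list, the loop tests the count before reading, so that N = 0 reads nothing.

```cpp
#include <istream>
#include <sstream>   // only needed when feeding the function prepared text

// Reads N, then N numbers, from `in` and returns their sum.
int sumOfN(std::istream& in)
{
    int n = 0;
    in >> n;                  // Step 2: Read N
    int sum = 0;              // Step 3
    int count = 0;            // Step 4
    while (count < n) {       // Step 8, tested before each repetition
        int num = 0;
        in >> num;            // Step 5: Read Num
        sum = sum + num;      // Step 6
        count = count + 1;    // Step 7
    }
    return sum;               // the caller prints the sum (Step 9)
}
```

With keyboard input the call would be sumOfN(std::cin), after including the iostream header.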

Flow Chart Development


1. A flowchart to add two numbers
2. A flowchart to find the largest of two numbers

Coding
The flowchart is independent of any programming language. At this stage we translate each step
described in the flowchart (algorithm description) into an equivalent instruction of the target
programming language. For example, if we want to write the program in FORTRAN,
each step will be described by an equivalent FORTRAN instruction (statement).
Implementation
Once the program is written, the next step is to implement it. Program implementation involves
three steps, namely, debugging (the process of removing errors), testing (a check of correctness),
and documenting the program (to aid the maintenance of the program during its lifetime). Almost every
program initially contains bugs, which can range from simple mistakes in language usage (syntax errors) up
to complex flaws in the algorithm (logic errors).
Maintenance
There are many reasons why programs must be continually modified and maintained, like changing
conditions, new user needs, previously undiscovered bugs (errors). Maintenance may involve all
steps from requirements analysis to testing.

Translation and Execution


The only language that the computer understands is machine language. Therefore any program
that is written in either a low-level or a high-level language must be translated to machine code so that
the computer can process it.

A program written in a high-level language or in assembly language is called source code or a source
program, and the translated machine code is called the object code or object program. Programs that
translate a program written in a high-level language or assembly language into machine code are called
translators. There are three types of translators: assembler, interpreter, and compiler.

• Source code: High-level language instructions.
• Compiler: System software that translates high-level language instructions into machine
language or object code. The compiler translates the source code once and produces a complete
machine language program.
• Interpreter: System software that translates and executes high-level language instructions one
statement at a time, without producing a complete object program.
• Object code: Translated instructions ready for the computer.
• Assembler: A software tool for translating low-level language (assembly language) into
machine language.
The translator not only translates the instructions into machine code but also checks whether the
program obeys the syntax of the programming language. A program passes through different stages
before it carries out its function. First the program is translated to object code (compile
time), then it is loaded into memory, and finally it is executed (run time).
Basic Programming Tools
The three basic building blocks that are essential to developing a solution to a problem are:

1. Sequential execution, where instructions are performed one after the other.
2. Branching operations, where a decision is made to perform one block of instructions or
another.
3. Looping operations, where a block of instructions is repeated. There are two types of loops.
The first is the conditional loop, where we do not know in advance how many times
something is to be repeated. The second is the counted loop, where we do know in
advance how many times to repeat the instructions.
I. Sequential Execution of Instructions
Sequential instructions are executed one after the other. The computer begins with the first
instruction and performs the indicated operation, then moves to the next instruction and so on.

Illustration 1: An algorithm and a flowchart to compute the area of a rectangle whose length is ‘l’
and breadth is ‘b’.
Algorithm
Step 1: START
Step 2: Obtain (input) the length, call it l
Step 3: Obtain (input) the breadth, call it b
Step 4: Compute l*b, call it Area
Step 5: Display Area
Step 6: STOP
Flowchart: START → Read l, b → Area ← l*b → Display Area → Stop

Illustration 2: To allow for repeated calculation of the area of a rectangle
whose length is ‘l’ and breadth is ‘b’, rewrite the above algorithm and
flowchart. Allow different values of ‘l’ and ‘b’ before each calculation of
area.
Algorithm
Step 1: START
Step 2: Read length, call it l
Step 3: Read breadth, call it b
Step 4: Compute l*b, call it Area
Step 5: Display Area
Step 6: Go to step 2
Flowchart: START → Read l, b → Area ← l*b → Display Area → back to Read l, b
Note: Here in effect we created a loop. The series of instructions to calculate ‘area’ is reused over
and over again, each time with a different set of input data. But here it is an infinite loop. There is no
way to stop the repeated calculations except to pull the plug of the computer. Therefore, using this
type of unconditional transfer (Go to) to construct a loop is generally not a good idea. In almost all
cases where an unconditional transfer seems useful, one of the other control structures (loops or
branches) can be substituted, and is in fact preferred.
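One possible way to replace the unconditional Go to is a conditional loop, sketched below in C++. The function name printAreas and the stopping rule (stop when the input runs out or when the length is zero or negative) are assumptions added for illustration; the original algorithm has no stopping rule at all.

```cpp
#include <istream>
#include <ostream>
#include <sstream>   // only needed when feeding the function prepared text

// Reads length/breadth pairs from `in` and writes each area to `out`.
// Returns how many areas were computed.
int printAreas(std::istream& in, std::ostream& out)
{
    int count = 0;
    double l = 0.0;
    double b = 0.0;
    // Keep going only while a pair was read successfully and the length is positive.
    while (in >> l >> b && l > 0.0) {
        out << l * b << '\n';   // compute l*b and display the area
        count = count + 1;
    }
    return count;
}
```

The body of the loop is the same series of instructions as in Illustration 2, but the test at the top gives the program a way to stop without pulling the plug.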
II. Branching Operations
With sequential instructions there is no possibility of skipping over one instruction. A branch is a
point in a program where the computer will make a decision about which set of instructions to
execute next. The question that we use to make the decision about which branch to take must be
set up so that the answers can only be yes or no. Depending on the answer to the question, control
flows in one direction or the other.
Illustration: Construct an algorithm and flowchart to read two numbers and determine which is
larger.
Algorithm
Step 1: START
Step 2: Read two numbers A and B.
Step 3: If A > B then go to step 6
Step 4: Display “B is larger”
Step 5: Go to step 7
Step 6: Display “A is larger”
Step 7: STOP
Flowchart

Note that only one block of instructions is executed, not both. After one block or the other executes, the
two paths merge (at the circle) and control transfers to the next instruction. Each block may contain
several instructions, including loops, since we are not limited to just one instruction.
Nesting of Branching Operations
There are many times when we need to choose between more than two alternatives. One of the
solutions to this is nesting of instructions.
Illustration
Construct an algorithm and flowchart to determine whether a number ‘n’ is negative, positive, or zero.
Algorithm
Step 1: START
Step 2: Read in ‘n’
Step 3: Is n<0
Step 4: If yes, go to step 11
Step 5: Is n=0
Step 6: If yes, go to step 9
Step 7: Print “Positive”
Step 8: go to step 12
Step 9: Print “Zero”
Step 10: go to step 12
Step 11: Print “Negative”
Step 12: STOP
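The nested decisions in this algorithm might look as follows in C++. This is only a sketch; the function name classify and the choice to return the word instead of printing it are assumptions for illustration.

```cpp
#include <string>

// Returns "Negative", "Zero", or "Positive" for the given number.
std::string classify(int n)
{
    if (n < 0) {                 // Step 3: Is n < 0?
        return "Negative";
    } else {
        if (n == 0) {            // Step 5: Is n = 0? (the nested branch)
            return "Zero";
        } else {
            return "Positive";
        }
    }
}
```

The second if sits inside the else part of the first one, which is what nesting of branching operations means: the question n = 0 is asked only when the answer to n < 0 was no.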
III. LOOPS

Loops are the third major type of control structure that we need to examine. There are two different
types of loops, the counted loop and the conditional loop. The counted loop repeats a
predetermined number of times, while the conditional loop repeats until a condition is satisfied.

• In the counted loop a counter keeps track of the loop executions. Once the counter reaches
a predetermined number, the loop terminates. The number of loop executions is set before the
loop begins and cannot be changed while the loop is running.
• The conditional loop has no predetermined stopping point. Rather, each time through the
loop the program performs a test to determine when to stop. Also, the quantity being used
for the test can change while the loop executes.
• The variable used to control the loop can be referred to as the Loop Control Variable (LCV).
• The flowchart symbol for a loop is a hexagon. Inside the hexagon are the start and stop values
for the LCV (Start = …, Stop = …). A step value is also included, from which the computer decides
how many times to execute the loop. Inside the loop is the body, which can consist of any number
of instructions. When the loop finishes, control transfers to the first statement outside the loop.
• With counted loops, we must know in advance how many times the loop will execute.
If this information is not available and yet the problem demands a loop, we can
use a conditional loop, where the machine checks every time through the loop to
see whether it should be repeated or not.
• In a conditional loop, the programmer must change the loop control variable.

Illustration

Ex. An algorithm and flowchart to print out the numbers 1 to 100 and their squares.

Algorithm

Step 1: START
Step 2: Loop (LCV start = 1; stop = 100; step = 1)
Step 3: Print LCV and LCV²
Step 4: End loop
Step 5: STOP
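A counted loop of this kind maps naturally onto the C++ for statement. In the sketch below the function name printSquares, the parameter stop, and the output stream parameter are assumptions added so that the loop can be tried with a small stop value; the start value 1 and the step value 1 are taken from the algorithm.

```cpp
#include <ostream>
#include <sstream>   // only needed when capturing the output as text

// Prints each number from 1 to `stop`, followed by its square, one pair per line.
void printSquares(std::ostream& out, int stop)
{
    // lcv is the loop control variable: start = 1, stop = `stop`, step = 1
    for (int lcv = 1; lcv <= stop; lcv = lcv + 1) {
        out << lcv << ' ' << lcv * lcv << '\n';
    }
}
```

To print the numbers 1 to 100 on the screen, the call would be printSquares(std::cout, 100), after including the iostream header.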

Quick quiz
Construct an algorithm and a flowchart for the following:
1. To find the smallest number from three numbers.
2. To read in x, y and z and then compute the value of xyz.
3. To determine if a whole number is odd or even.
4. To print a prompt message “good morning”.
5. To find the sum of the first n even numbers.
6. To find the sum of the digits of a given number.
7. To print out a list of the first n even numbers and their squares.
8. To find the sum of n positive numbers.
9. To find the biggest among n numbers.
10. To find the factorial of a given number.
11. To generate the Fibonacci series (1, 1, 2, 3, 5, 8, 13, …)

Unit-II: C++ basics


History of C++

C++ is an object-oriented programming language. It was developed by Bjarne Stroustrup at AT&T Bell
Laboratories in Murray Hill, New Jersey, USA, in the early eighties. Stroustrup, an admirer of
Simula67 and a strong supporter of C, wanted to combine
the best of both languages and create a more powerful language that could support object-oriented
programming features and still retain the power and elegance of C. The result was C++. Stroustrup
initially called the new language ‘C with Classes’. However, later in 1983, the name was changed to C++.
C++ is largely a superset of C. Therefore, almost all C programs are also valid C++ programs.

List of Compilers:

1. Borland C++ and Turbo C++, available from Borland International for DOS and OS/2.
2. Zortech C++ from Zortech International on DOS.
3. Microsoft Visual C++ by Microsoft Corp.
4. GNU C++, usually called g++.
Layout of a Simple C++ Program

The general form of a simple C++ program is shown in Display 2.1. As far as the compiler is
concerned, the line breaks and spacing need not be as shown there and in our examples. The
compiler will accept any reasonable pattern of line breaks and indentation. In fact, the compiler will
even accept most unreasonable patterns of line breaks and indentation. However, a program
should always be laid out so that it is easy to read. Placing the opening brace, { , on a line by itself
and also placing the closing brace, } , on a line by itself will make these punctuation marks easy to find.
Indenting each statement and placing each statement on a separate line makes it easy to see what
the program instructions are. Later on, some of our statements will be too long to fit on one line and
then we will use a slight variant of this pattern for indenting and line breaks. You should follow the
pattern set by the examples in this material.

DISPLAY 2.1 Layout of a Simple C++ Program

    #include <iostream.h>
    int main( )
    {
        Variable_Declarations
        Statement_1
        Statement_2
        ...
        Statement_Last
        return 0;
    }

In Display 2.2, the variable declarations are on the line int a, b, c;. As we will see in the next
sections, you need not place all your variable declarations at the beginning of your program, but
that is a good default location for them. Unless you have a reason to place them somewhere else,
place them at the start of your program as shown in Display 2.1. The statements are the instructions
that are followed by the computer. In Display 2.2 below, the statements are the lines
that begin with cout or cin, and the one line that begins with c followed by an equal sign.
Statements are often called executable statements. We will use the terms statement and
executable statement interchangeably. Notice that each of the statements we have seen ends with a
semicolon. The semicolon in statements is used in more or less the same way that the period is used
in English sentences; it marks the end of a statement.

For now you can view the first few lines as a funny way to say “this is the beginning of the
program.” But we can explain them in a bit more detail. The first line, #include <iostream.h>, is called
an include directive. It tells the compiler where to find information about certain items that are
used in your program. In this case iostream.h is the name of a library that contains the definitions of
the routines that handle input from the keyboard and output to the screen; iostream.h is a file that
contains some basic information about this library. (Standard C++ names this header <iostream>,
without the .h, and places cin and cout in the namespace std; the older form shown here is accepted
only by pre-standard compilers.) A linker program combines the object code for
the library iostream.h and the object code for the program you write. For the library iostream.h this
will probably happen automatically on your system. You will eventually use other libraries as well,
and when you use them, they will have to be named in directives at the start of your program. For
other libraries, you may need to do more than just place an include directive in your program, but in
order to use any library in your program, you will always need to at least place an include directive
for that library in your program. Directives always begin with the symbol #. Some compilers require
that directives have no spaces around the #, so it is always safest to place the # at the very start of
the line and not include any space between the # and the word include.

The second and third nonblank lines, shown next, simply say that the main part of the program
starts here:

int main( )
{

The correct term is main function, rather than main part. The braces { and } mark the beginning and
end of the main part of the program. They need not be on a line by themselves, but that is the way
to make them easy to find and we will therefore always place each of them on a line by itself. The
next-to-last line, return 0;, says to “end the program when you get to here.” This line need not be the
last thing in the program, but in a very simple program it makes no sense to place it anywhere else.
Some compilers will allow you to omit this line and will figure out that the program ends when there
are no more statements to execute. However, other compilers will insist that you include this line,
so it is best to get in the habit of including it, even if your compiler is happy without it. This line is
called a return statement and is considered to be an executable statement because it tells the
computer to do something; specifically, it tells the computer to end the program. The number 0 has
no intuitive significance to us yet, but must be there; its meaning will become clear as you learn
more about C++. Note that even though the return statement says to end the program, you still
must add a closing brace } at the end of the main part of your program.

• Be certain that you do not have any extra space between the < and the iostream.h
file name (Display 2.1) or between the end of the file name and the closing >.
• The compiler include directive is not very smart: it will search for a file name that
starts or ends with a space! The file name will not be found, producing an error that
is quite difficult to find. You should make this error deliberately in a small program,
then compile it. Save the message that your compiler produces so you know what
the error message means the next time you get that error message.
Compiling and Running a C++ Program

Next you will learn what would happen if you run the C++ program shown in Display 2.2. But where is
that program and how do you make it run? You write a C++ program using a text editor in the same
way that you write any other document such as a term paper, a love letter, a shopping list, or
whatever. The program is kept in a file just like any other document you prepare using a text editor.

The way that you compile and run a C++ program also depends on the particular system you are
using. When you give the command to compile your program, this will produce a machine-language
translation of your C++ program. This translated version of your program is called the object code
for your program. The object code for your program must be linked (that is, combined) with the
object code for routines (such as input and output routines) that are already written for you. It is
likely that this linking will be done automatically, so you do not need to worry about linking.

DISPLAY 2.2: Sample Program in C++


    #include <iostream.h>

    int main( )
    {
        int a, b, c;
        cout << "Enter values of a, b";
        cin >> a >> b;
        c = a + b;
        cout << "The result is:" << c;
        return 0;
    }

In the above program the statement cin>>a>>b; is an input statement that causes the program to
wait for the user to type two numbers. If we key in two values, say 10 and 20, then 10 will be
assigned to a and 20 to b. The operator >> is known as the extraction (or get from) operator.

The statement cout<<"The result is:"<<c; is an output statement that causes the string in quotation
marks to be displayed on the screen as it is, followed by the content of the variable c. The
operator << is known as the insertion (or put to) operator. The identifier cin is pronounced ‘C in’ and
cout is pronounced ‘C out’.
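The behaviour of >> and << can be tried without a keyboard by using string streams, which accept the same two operators. The sketch below is written with the standard header names; the function name addFromText is an assumption for illustration, and it assumes that c holds the sum of a and b.

```cpp
#include <sstream>
#include <string>

// Treats `typed` as if the user had typed it, then builds the output line.
std::string addFromText(const std::string& typed)
{
    std::istringstream in(typed);   // stands in for cin
    std::ostringstream out;         // stands in for cout
    int a = 0;
    int b = 0;
    int c = 0;
    in >> a >> b;                     // extraction: first number to a, second to b
    c = a + b;
    out << "The result is:" << c;     // insertion: the string as it is, then the value of c
    return out.str();
}
```

For the typed text "10 20", a receives 10 and b receives 20, exactly as described for cin above.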

TESTING AND DEBUGGING

A mistake in a program is usually called a bug, and the process of eliminating bugs is called
debugging.

Kinds of Program Errors

The compiler will catch certain kinds of mistakes and will write out an error message when it finds a
mistake. It will detect what are called syntax errors, because they are, by and large, violations of the
syntax (that is, the grammar rules) of the programming language, such as omitting a semicolon.

If the compiler discovers that your program contains a syntax error, it will tell you where the error is
likely to be and what kind of error it is likely to be. If the compiler says your program contains a
syntax error, you can be confident that it does. However, the compiler may be incorrect about
either the location or the nature of the error. It does a better job of determining the location of an
error, to within a line or two, than it does of determining the source of the error. This is because the
compiler is guessing at what you meant to write down and can easily guess wrong. After all, the
compiler cannot read your mind. Error messages subsequent to the first one have a higher
likelihood of being incorrect with respect to either the location or the nature of the error. Again, this
is because the compiler must guess your meaning. If the compiler’s first guess was incorrect, this will
affect its analysis of future mistakes, since the analysis will be based on a false assumption.

If your program contains something that is a direct violation of the syntax rules for your
programming language, the compiler will give you an error message. However, sometimes the
compiler will give you only a warning message, which indicates that you have done something that
is not, technically speaking, a violation of the programming language syntax rules, but that is
unusual enough to indicate a likely mistake. When you get a warning message, the compiler is
saying, “Are you sure you mean this?” At this stage of your development, you should treat every
warning as if it were an error until your instructor approves ignoring the warning.

There are certain kinds of errors that the computer system can detect only when a program is run.
Appropriately enough, these are called run-time errors. Most computer systems will detect certain
run-time errors and output an appropriate error message. Many run-time errors have to do with
numeric calculations. For example, if the computer attempts to divide a number by zero, that is
normally a run-time error. If the compiler approved of your program and the program ran once with
no run-time error messages, this does not guarantee that your program is correct. Remember, the
compiler will only tell you if you wrote a syntactically (that is, grammatically) correct C++ program. It
will not tell you whether the program does what you want it to do. Mistakes in the underlying
algorithm or in translating the algorithm into the C++ language are called logic errors. For example,
if you were to mistakenly use the multiplication sign * instead of the addition sign + in the program
in Display 2.2, that would be a logic error. The program would compile and run normally, but would
give the wrong answer. If the compiler approves of your program and there are no runtime errors,
but the program does not perform properly, then undoubtedly your program contains a logic error.
Logic errors are the hardest kind to diagnose, because the computer gives you no error messages to
help find the error. It cannot reasonably be expected to give any error messages. For all the
computer knows, you may have meant what you wrote.

Quick quiz

1. What are the three main kinds of program errors?

2. What kinds of errors are discovered by the compiler?


3. If you omit a punctuation symbol (such as a semicolon) from a program, an error is produced.
What kind of error?

4. Omitting the final brace } from a program produces an error. What kind of error?

Comments

Comments are notes added to a program to describe its logic and how a particular part of the
program works. The compiler ignores comments. There are two types of comments:

Single-line comment: A single-line comment starts with a double slash (//) and ends at the end of
the line.

Example: c=5.0/9*(f-32); //conversion formula.

Multi-line comment: A multi-line comment starts with /* and ends with */ and may span several lines.

E.g.: /* this is an example of a C++ program to illustrate some of its features and how C++ is
written */

Variable

A variable is a named memory location that stores a value. The value of a variable may change
during execution; that is, a variable may hold different values at different times. Each variable
needs an identifier that distinguishes it from the others. For example, in the code a=5 the
variable identifier is a, but we could have given the variable any name we wanted, as long as it
was a valid identifier.

Identifiers
• A valid identifier is a sequence of one or more letters, digits or underscore characters (_).
• The length of an identifier is not limited, although for some compilers only the first 32
characters of an identifier are significant (the rest are ignored).
• Neither spaces nor marked (accented) letters can be part of an identifier.
• Variable identifiers must begin with either a letter or an underscore character (_). They can
never begin with a digit. Identifiers that begin with an underscore are usually reserved for the
compiler and its libraries, so it is best to avoid them.

• Our own identifiers cannot match any keyword of the C++ language or any compiler-specific
keyword, since they could be confused with these.
• The C++ language is case sensitive: an identifier written in capital letters is not equivalent to
another one with the same name written in small letters. Thus, for example, the variable RESULT is
not the same as the variable result or the variable Result.

Some of the keywords defined by the ANSI C++ standard

asm, auto, bool, break, case, catch, char, class, const,
const_cast, continue, default, delete, do, double,
dynamic_cast, else, enum, explicit, extern, false, float, for,
friend, goto, if, inline, int, long, mutable, namespace, new,
operator, private, protected, public, register,
reinterpret_cast, return, short, signed, sizeof, static,
static_cast, struct, switch, template, this, throw, true, try,
typedef, typeid, typename, union, unsigned, using, virtual,
void, volatile, wchar_t

Alternative spellings of some operators are also reserved: and, and_eq, bitand, bitor, compl,
not, not_eq, or, or_eq, xor, xor_eq.

far, huge and near are extra keywords reserved by some compilers; they are not part of the standard.
Built in Data Types

When programming, we store variables in the computer's memory, but the computer has to know what
kind of data we want to store in them, since a simple number, a single letter and a large number
do not occupy the same amount of memory and are not interpreted in the same way. The memory in our
computers is organized in bytes. A byte is the minimum amount of memory that we can manage in C++.

Integer: numbers without a decimal part. E.g.: 69, 360, 32330.

Float, Double: numbers with a decimal point. E.g.: 69.65, 3.1415.

Character: any single character enclosed within single quotes.

The modifiers signed and unsigned may be applied to the character and integer basic data types,
and the modifiers short and long to the integer type. The modifier long may also be applied to
double. The following table lists the combinations of the basic data types and modifiers along
with their size and range.

Size & Range of C++ Basic Data Types

Type                 Bytes*   Range*
char                 1        -128 to 127
unsigned char        1        0 to 255
signed char          1        -128 to 127
int                  2 or 4   -32768 to 32767 (2 bytes); -2147483648 to 2147483647 (4 bytes)
unsigned int         2 or 4   0 to 65535 (2 bytes); 0 to 4294967295 (4 bytes)
signed int           2 or 4   same as int
short int            2        -32768 to 32767
unsigned short int   2        0 to 65535
signed short int     2        -32768 to 32767
long int             4        -2147483648 to 2147483647
signed long int      4        -2147483648 to 2147483647
unsigned long int    4        0 to 4294967295
float                4        3.4E-38 to 3.4E+38
double               8        1.7E-308 to 1.7E+308
long double          10       3.4E-4932 to 1.1E+4932

* The values in the Bytes and Range columns depend on the system the program is compiled for.
Declaration of Variables
• To use a variable in C++, we must first declare it, specifying its data type.
• To declare a new variable, write the data type specifier that we want (like int, short,
float...) followed by a valid variable identifier.
• For example:
   int a;            → declares a variable of type int with the identifier a.
   float mynumber;   → declares a variable of type float with the identifier mynumber.
• Once declared, the variables a and mynumber can be used within the rest of their scope in the
program.
• To declare several variables of the same type and save some writing, declare all of them on the
same line, separating the identifiers with commas.
• E.g.: int a, b, c;  → declares three variables (a, b and c) of type int.
Initialization of Variables
• When a local variable is declared, its value is undetermined by default.
• To give a variable a concrete value at the moment it is declared, append an equal sign followed
by the desired value to the variable declaration:
  o Syntax: type identifier = initial_value;
  o E.g.: int a = 0;  → declares an int variable called a that contains the value 0 from the
    moment it is declared.
• C++ adds a second way to initialize a variable: enclosing the initial value in
parentheses ():
  o Syntax: type identifier (initial_value);
  o For example: int a (0);
• Both ways are valid and equivalent in C++.
Scope of variables

• Every variable that we intend to use in a program must have been declared, with its type
specifier, at an earlier point in the code.
• A variable can have either global or local scope.
• A global variable is declared in the source code outside all functions, while a local
variable is declared within the body of a function or a block.
• Global variables can be referred to from anywhere in the code after their declaration, even
inside functions.
• The scope of local variables is limited to the block enclosed in braces ({}) where they are
declared. For example, if they are declared at the beginning of the body of a function (such as
main), their scope extends from the declaration point to the end of that function. This
means that if another function existed in addition to main, the local variables declared in
main could not be accessed from the other function, and vice versa.
Constant

• Constants are expressions with a fixed value.


Literals

• Literals are used to express particular values within the source code of a program. For
example, when we wrote a = 5; the 5 in this piece of code was a literal constant.
• Literal constants can be divided into integer numerals, floating-point numerals, characters,
strings and Boolean values.
Integer Numerals

• Integer numerals are numerical constants that identify integer values, written in decimal by
default.
• C++ also allows octal numbers (base 8) and hexadecimal numbers (base 16) as literal constants.
• To express an octal number, precede it with a 0 (zero character).
• To express a hexadecimal number, precede it with the characters 0x (zero, x).
• E.g.:
   75     // decimal
   0113   // octal
   0x4b   // hexadecimal
  All of these represent the same number, 75 (seventy-five), expressed as a decimal, octal and
  hexadecimal numeral, respectively.
• Literal constants, like variables, have a specific data type. By default, integer literals are
of type int. We can force them to be unsigned by appending the character u, or long by
appending l. The suffixes can be written in either upper or lowercase letters.
• E.g.:
   75     // int
   75u    // unsigned int
   75l    // long
   75ul   // unsigned long
Floating Point Numbers

• Floating-point literals express numbers with decimals and/or exponents. They can include a
decimal point, an e character (meaning "times ten raised to the power X", where X is the integer
value that follows the e), or both a decimal point and an e character.
• E.g.:
   3.14159   // 3.14159
   6.02e23   // 6.02 x 10^23
   1.6e-19   // 1.6 x 10^-19
   3.0       // 3.0
• These are four valid numbers with decimals expressed in C++.
• The default type for floating-point literals is double.
• To express a float or long double literal explicitly, use the f or l suffix, respectively:
   3.14159L  // long double
   6.02e23f  // float
• Any of the letters that can be part of a floating-point numerical constant (e, f, l) can be
written using either lower or uppercase letters without any difference in their meanings.
Boolean literals

• There are only two valid Boolean values: true and false.
• These can be expressed in C++ as values of type bool by using the Boolean literals true and
false.
Character and string literals

• There also exist non-numerical constants, like:
   'z'            → represents a single character.
   "Hello world"  → represents a string of characters.
• To represent a single character, enclose it between single quotes (').
• To express a string of more than one character, enclose it between double quotes (").
• x refers to the variable x, whereas 'x' refers to the character constant 'x' and "x" represents
a string constant.
Escape codes.

• There are special characters that are difficult or impossible to express otherwise in the
source code of a program, like newline (\n) or tab (\t). All of them are preceded by a
backslash (\). Some of these escape codes are:
   \n   newline
   \r   carriage return
   \t   tab
   \\   backslash
   \'   single quote
   \"   double quote
   \0   null character
• String literals can extend over more than a single line of code by putting a backslash (\) at
the end of each unfinished line. E.g.:
   "string expressed in \
   two lines"

Defined constants (#define)

• We can define our own names for constants that we use very often, without having to resort to
memory-consuming variables, by using the #define preprocessor directive. Its format
is: #define identifier value
• For example:
   #define PI 3.14159
   #define NEWLINE '\n'
• This defines two new constants: PI and NEWLINE. Once they are defined, you can use them
in the rest of the code as if they were any other regular constant.
• In fact, the only thing the preprocessor does when it encounters #define directives is to
literally replace every occurrence of their identifier (in the previous example, PI and
NEWLINE) with the code to which they have been defined (3.14159 and '\n' respectively).
• The #define directive is not a C++ statement but a directive for the preprocessor; therefore
it takes the entire line as the directive and does not require a semicolon (;) at its end.
Declared constants (const)

With the const prefix you can declare constants with a specific type in the same way as you would
do with a variable:

const double PI = 3.14159;

const char tab = '\t';

Here, PI and tab are two typed constants. They are treated just like regular variables except
that their values cannot be modified after their definition.

Introduction to Strings

• Variables that can store non-numerical values that are longer than one single character are
known as strings.
• The C++ standard library provides support for strings through the standard string class. This
is not a fundamental type, but it behaves in a similar way as fundamental types do in its
most basic usage.
• A first difference from fundamental data types is that in order to declare and use objects
(variables) of this type we need to include an additional header file in our source code:
<string>

   #include <iostream>
   #include <string>
   using namespace std;

   int main ()
   {
     string mystring = "This is a string";
     cout << mystring;
     return 0;
   }

• Both initialization formats are valid with strings:
  o string mystring = "This is a string";
  o string mystring ("This is a string");
• Strings can also perform all the other basic operations that fundamental data types can, like
being declared without an initial value and being assigned values during execution.
Operators

• An operator is a symbol that tells the computer to perform certain mathematical or logical
manipulations.
• Operators are used in programs to manipulate data and variables.
• C++ operators can be classified into a number of categories. They include:
   1. Arithmetic operators
   2. Relational operators
   3. Logical operators
   4. Assignment operators
   5. Increment / decrement operators
   6. Conditional operators
   7. Bitwise operators
1. Arithmetic operators: C++ provides all the basic arithmetic operators: add (+), subtract (-),
multiply (*), divide (/), and mod (%). mod gives the remainder of a division.
E.g., for mod: if a = 10; b = 3; c = a % b; → c = 1;

2. Relational operators: These operators compare the operands on either side of them: less
than (<), less than or equal (<=), equal (==), greater than (>), greater than or equal (>=) and
not equal (!=). The result of a relational operation is a Boolean value that can only be true or
false.
3. Logical operators: C++ has the following three logical operators: && (logical AND),
|| (logical OR), ! (logical NOT).
E.g.: ( (5 == 5) && (3 > 6) )   // evaluates to false ( true && false ).
      ( (5 == 5) || (3 > 6) )   // evaluates to true ( true || false ).

Truth table for AND and OR operations

op-1   op-2   op-1 && op-2   op-1 || op-2
F      F      F              F
F      T      F              T
T      F      F              T
T      T      T              T
4. Assignment operators: used to assign the result of an expression to a variable. The symbol
used is '='.
o The part to the left of the assignment operator (=) is known as the lvalue (left value) and the
part to the right as the rvalue (right value). The lvalue has to be a variable, whereas the
rvalue can be a constant, a variable, the result of an operation or any combination of these.
o There are 3 types:
(i) Simple assignment. E.g.: a = 9;
(ii) Multiple assignment. E.g.: a = b = c = 36; a = 2 + (b = 5);
(iii) Compound assignment. E.g.:
   a += 15;   (add 15 to a; equivalent to a = a + 15;)
   b -= 5;    (subtract 5 from b)
   c *= 6;    (multiply c by 6)
   d /= 5;    (divide d by 5; equivalent to d = d / 5;)
   e %= 10;   (divide e by 10 and store the remainder in e)
5. Auto increment / decrement (++ / --): used to increment or decrement the value of a variable
by 1. There are 2 types of incrementing or decrementing.
a. Prefix auto increment / decrement: adds 1 to (or subtracts 1 from) the operand first, and the
result is then assigned to the variable on the left.
   E.g.: a = 5; b = ++a;   Result: a = 6, b = 6
         a = 5; b = --a;   Result: a = 4, b = 4
b. Postfix auto increment / decrement: first assigns the value of the operand to the variable on
the left, and then increments/decrements the operand.
   E.g.: a = 5; b = a++;   Result: b = 5, a = 6
         a = 5; b = a--;   Result: b = 5, a = 4
• Generally a=a+1 can be written as ++a, a++ or a+=1. Similarly a=a-1 can be written as a--, --a
or a-=1.
6. Conditional operator (ternary operator):
• Conditional expressions have the following form:
   exp1 ? exp2 : exp3;
• exp1 is evaluated first. If the result is true, exp2 is evaluated; otherwise exp3 is evaluated.
The evaluated value becomes the value of the whole expression.
• For example, consider the following statements: a=10; b=15; x = (a>b) ? a : b; In this example
x will be assigned the value of b.
7. Bitwise operators
• Bitwise operators modify variables considering the bit patterns that represent the values
they store.
   &    Bitwise AND
   |    Bitwise inclusive OR
   ~    Unary complement (bit inversion)
Type conversion in Assignments

• The value of the right side (expression side) of an assignment is converted to the type of the
left side (target variable).
• E.g.: int x; char ch; float f;
   ch = x;   → the appropriate number of high-order bits is removed.
   x = f;    → x receives the non-fractional part of f.
   f = ch;   → the 8-bit integer value of ch is stored in floating-point format.
   f = x;    → the integer value is converted to floating-point format.

Explicit type casting


• Type casting operators convert a value of one type to another. There are several ways to do
this in C++.
• The simplest one is to precede the expression to be converted with the new type enclosed
in parentheses (()):
• E.g.:
   int i;
   float f = 3.14;
   i = (int) f;
  o This code converts the float number 3.14 to an integer value (3); the fractional part is
    lost. Here, the typecasting operator was (int).
• Another way to do the same thing in C++ is the functional notation: precede the expression to
be converted with the type and enclose the expression in parentheses. E.g.: i = int ( f );
• Both ways of type casting are valid in C++.
sizeof()

• This operator accepts one parameter, which can be either a type or a variable, and returns
the size in bytes of that type or object:
   a = sizeof (char);   → assigns the value 1 to a, because char is a one-byte type.
• The value returned by sizeof is a constant, so it is always determined before program
execution.
Priority of operators
• When writing complex expressions with several operators, we may have some doubts about which
operation is evaluated first and which later.
  o For example, in the expression a = 5 + 7 % 2 we may doubt whether it really means:
     a = 5 + (7 % 2) with result 6, or
     a = (5 + 7) % 2 with result 0
  o The correct answer is the first of the two expressions, with a result of 6. There is an
    established priority order for each operator, not only for the arithmetic ones (whose
    precedence we may already know from mathematics) but for all the operators that can
    appear in C++.
• When several operators of the same priority level appear, the associativity of the operator
decides which one is evaluated first, the rightmost or the leftmost.
• Precedence can be overridden, or an expression made more legible, by using parentheses
( and ). For example, a = 5 + 7 % 2; might be written as a = 5 + (7 % 2); or
a = (5 + 7) % 2; according to the operation that we want to perform.

• From greatest to lowest priority, the priority order is as follows:

Priority  Operator                            Description                            Associativity
1         ::                                  Scope                                  Left
2         () [ ] -> . sizeof                                                         Left
3         ++ --                               Increment/decrement                    Right
          ~                                   Complement to one (bitwise)
          !                                   Unary NOT
          & *                                 Reference and dereference (pointers)
          (type)                              Type casting
4         * / %                               Arithmetical operations                Left
5         + -                                 Arithmetical operations                Left
6         << >>                               Bit shifting (bitwise)                 Left
7         < <= > >=                           Relational operators                   Left
8         == !=                               Relational operators                   Left
9         & ^ |                               Bitwise operators                      Left
10        && ||                               Logic operators                        Left
11        ?:                                  Conditional                            Right
12        = += -= *= /= %= >>= <<= &= ^= |=   Assignation                            Right
13        ,                                   Comma, separator                       Left

(The operators that you are familiar with are shaded in the above table)

Input/Output

• C++ uses a convenient abstraction called streams to perform input and output operations on
sequential media such as the screen or the keyboard.
• A stream is an object into which a program can insert characters or from which it can extract
them. We do not really need to care about the details of the physical media associated with the
stream.
• The insertion operator (<<) may be used more than once in a single statement:
  o cout << "Hello, " << "I am " << "a C++ statement";
  o cout << "Hello, I am " << age << " years old and my zipcode is " << zipcode;
  o If we assume the age variable contains the value 24 and the zipcode variable contains
    90064, the output of the previous statement would be:
     Hello, I am 24 years old and my zipcode is 90064
• cout does not add a line break after its output unless we explicitly indicate it.
• To perform a line break on the output, we must explicitly insert a new-line character into
cout.
• In C++ a new-line character can be specified as \n (backslash, n):
  o E.g.: cout << "First sentence.\n";
  o cout << "Second sentence.\nThird sentence.";
• Additionally, to add a new line, you may also use the endl manipulator.
  o E.g.: cout << "First sentence." << endl;
  o cout << "Second sentence." << endl;
• cin can only process the input from the keyboard once the RETURN key has been pressed.
Therefore, even if we request a single character, the extraction from cin will not process the
input until the user presses RETURN after the character has been entered.
• Also, cin extraction stops reading as soon as it finds any blank space character, so each
extraction yields just one word.
Library Functions

• C++ comes with many library functions that programs can use instead of writing the same code
from scratch.
• These functions are declared in header files that are included before main() with #include,
which is a preprocessor directive.
• Some of these header files are:

Header file              Purpose of including it in the program
iostream (iostream.h)    Standard input/output streams like cin, cout etc.
cmath (math.h)           Mathematical functions like sin(), cos(), sqrt(), log() etc.
cstdlib (stdlib.h)       Standard library functions like conversion of one type to another etc.
cstring (string.h)       String manipulation functions like strcpy(), strcat(), strcmp() etc.
cctype (ctype.h)         Functions for testing and converting characters.
                         E.g.: isalpha(), isdigit(), islower(), toupper(), tolower() etc.
ctime (time.h)           Date and time functions.

The names in parentheses are the older .h forms of the same headers; iostream.h in particular
exists only on pre-standard compilers.

Sample program 1: Write a C++ program to calculate the area of a circle for a given diameter d,
using the formula πr², where r = d/2.

// To calculate area of circle

#include <iostream>
using namespace std;

int main()
{
    float A, pi = 3.1415;
    float d, r;

    cout << "Enter the diameter of the circle\n";
    cin >> d;

    r = d / 2;
    A = pi * r * r;

    cout << "\nArea of circle = " << A;
    return 0;
}

Sample program 2: Write a C++ program to read the temperature in Fahrenheit and convert it into
Celsius. (Formula: c = (5.0/9)*(f-32)).

#include <iostream>
using namespace std;

int main()
{
    float c, f;

    cout << "Enter the temperature in Fahrenheit:";
    cin >> f;

    c = (5.0 / 9) * (f - 32);

    cout << "The temperature in Celsius is: " << c;
    return 0;
}

Admas University, CoSc2032, Lecture Note
DATA COMMUNICATION AND COMPUTER
NETWORKING BASICS

DATA COMMUNICATIONS
The fundamental purpose of a communications system is the exchange of data between two parties.
Figure 1.1 presents one particular example, which is communication between a workstation and a server
over a public telephone network.
Another example is the exchange of voice signals between two telephones over the same network. The
key components of the model are as follows:
• Source: This device generates the data to be transmitted; examples are telephones and personal
computers.
• Transmitter: Usually, the data generated by a source system are not transmitted directly in the
form in which they were generated. Rather, a transmitter transforms and encodes the information
in such a way as to produce electromagnetic signals that can be transmitted across some sort of
transmission system. For example, a modem takes a digital bit stream from an attached device
such as a personal computer and transforms that bit stream into an analog signal that can be
handled by the telephone network.
• Transmission system: This can be a single transmission line or a complex network connecting
source and destination.
• Receiver: The receiver accepts the signal from the transmission system and converts it into a
form that can be handled by the destination device. For example, a modem will accept an analog
signal coming from a network or transmission line and convert it into a digital bit stream.
• Destination: Takes the incoming data from the receiver.

Figure 1.1 Simplified Communications Model

To get some flavor for the focus of data communication, Figure 1.2 provides a new perspective on the
communications model of Figure 1.1a. We trace the details of this figure using electronic mail as an
example. Suppose that the input device and transmitter are components of a personal computer. The
user of the PC wishes to send a message m to another user. The user activates the electronic mail
package on the PC and enters the message via the keyboard (input device). The character string is
briefly buffered in main memory. We can view it as a sequence of bits (g) in memory. The personal
computer is connected to some transmission medium, such as a local network or a telephone line, by an
I/O device (transmitter), such as a local network transceiver or a modem. The input data are
transferred to the transmitter as a sequence of voltage shifts [g(t)] representing bits on some
communications bus or cable. The transmitter is connected directly to the medium and converts the
incoming stream [g(t)] into a signal [s(t)] suitable for transmission; specific alternatives are
described in later sections.

The transmitted signal s(t) presented to the medium is subject to a number of impairments, discussed
in a later section, before it reaches the receiver. Thus, the received signal r(t) may differ from
s(t). The receiver will attempt to estimate the original s(t), based on r(t) and its knowledge of the
medium, producing a sequence of bits. These bits are sent to the output personal computer, where they
are briefly buffered in memory as a block of bits. In many cases, the destination system will attempt
to determine if an error has occurred and, if so, cooperate with the source system to eventually
obtain a complete, error-free block of data. These data are then presented to the user via an output
device, such as a printer or screen. The message as viewed by the user will usually be an exact copy
of the original message (m).

Now consider a telephone conversation. In this case the input to the telephone is a message (m) in the
form of sound waves. The sound waves are converted by the telephone into electrical signals of the
same frequency. These signals are transmitted without modification over the telephone line. Hence the
input signal g(t) and the transmitted signal s(t) are identical. The signal s(t) will suffer some
distortion over the medium, so that r(t) will not be identical to s(t). Nevertheless, the signal r(t)
is converted back into a sound wave with no attempt at correction or improvement of signal quality.
Thus, the received message is not an exact replica of m. However, the received sound message is
generally comprehensible to the listener. The discussion so far does not touch on other key aspects
of data communications, including data link control techniques for controlling the flow of data and
detecting and correcting errors, and multiplexing techniques for transmission efficiency.

Figure 1.2 Simplified Data Communications Model

Modes of Data Transmission


Transmission Terminology
Data transmission occurs between transmitter and receiver over some transmission medium.
Transmission media may be classified as guided or unguided. In both cases, communication is in the
form of electromagnetic waves. With guided media, the waves are guided along a physical path;
examples of guided media are twisted pair, coaxial cable, and optical fiber. Unguided media, also
called wireless, provide a means for transmitting electromagnetic waves but do not guide them;
examples are propagation through air, vacuum, and seawater.
The term direct link is used to refer to the transmission path between two devices in which signals
propagate directly from transmitter to receiver with no intermediate devices, other than amplifiers or
repeaters used to increase signal strength. Note that this term can apply to both guided and unguided
media.

A guided transmission medium is point to point if it provides a direct link between two devices and
those are the only two devices sharing the medium. In a multipoint guided configuration, more than
two devices share the same medium.
A transmission may be simplex, half duplex, or full duplex. In simplex transmission, signals are
transmitted in only one direction; one station is transmitter and the other is receiver. In half-duplex
operation, both stations may transmit, but only one at a time. In full-duplex operation, both stations may
transmit simultaneously. In the latter case, the medium is carrying signals in both directions at the same
time. How this can be is explained in due course. We should note that the definitions just given are the
ones in common use in the United States (ANSI definitions). Elsewhere (ITU-T definitions)
Transmission Media
In a data transmission system, the transmission medium is the physical path between transmitter and
receiver. For guided media, electromagnetic waves are guided along a solid medium, such as copper
twisted pair, copper coaxial cable, and optical fiber. For unguided media, wireless transmission occurs
through the atmosphere, outer space, or water.
The characteristics and quality of a data transmission are determined both by the characteristics of the
medium and the characteristics of the signal. In the case of guided media, the medium itself is more
important in determining the limitations of transmission.
For unguided media, the bandwidth of the signal produced by the transmitting antenna is more important
than the medium in determining transmission characteristics. One key property of signals transmitted
by antenna is directionality. In general, signals at lower frequencies are omnidirectional; that is, the
signal propagates in all directions from the antenna. At higher frequencies, it is possible to focus the
signal into a directional beam.
In considering the design of data transmission systems, key concerns are data rate and distance: the
greater the data rate and distance the better. A number of design factors relating to the transmission
medium and the signal determine the data rate and distance:
• Bandwidth: All other factors remaining constant, the greater the bandwidth of a signal, the
higher the data rate that can be achieved.
• Transmission impairments: Impairments, such as attenuation, limit the distance. For guided
media, twisted pair generally suffers more impairment than coaxial cable, which in turn suffers
more than optical fiber.
• Interference: Interference from competing signals in overlapping frequency bands can distort
or wipe out a signal. Interference is of particular concern for unguided media, but is also a
problem with guided media. For guided media, interference can be caused by emanations from
nearby cables. For example, twisted pairs are often bundled together and conduits often carry
multiple cables. Interference can also be experienced from unguided transmissions. Proper
shielding of a guided medium can minimize this problem.
• Number of receivers: A guided medium can be used to construct a point-to-point link or a
shared link with multiple attachments. In the latter case, each attachment introduces some
attenuation and distortion on the line, limiting distance and/or data rate.

Protocols and Standards


• A protocol architecture is the layered structure of hardware and software that supports the
exchange of data between systems and supports distributed applications, such as electronic mail
and file transfer.
• At each layer of a protocol architecture, one or more common protocols are implemented in
communicating systems. Each protocol provides a set of rules for the exchange of data between
systems.
• The most widely used protocol architecture is the TCP/IP protocol suite, which consists of the
following layers: physical, network access, internet, transport, and application.
• Another important protocol architecture is the seven-layer OSI model.

The Need for a Protocol Architecture


When computers, terminals, and/or other data processing devices exchange data, the procedures
involved can be quite complex. Consider, for example, the transfer of a file between two computers.
There must be a data path between the two computers, either directly or via a communication network.
But more is needed. Typical tasks to be performed are as follows:
1. The source system must either activate the direct data communication path or inform the
communication network of the identity of the desired destination system.
2. The source system must ascertain that the destination system is prepared to receive data.
3. The file transfer application on the source system must ascertain that the file management
program on the destination system is prepared to accept and store the file for this particular user.
4. If the file formats used on the two systems are different, one or the other system must perform
a format translation function.

It is clear that there must be a high degree of cooperation between the two computer systems. Instead
of implementing the logic for this as a single module, the task is
broken up into subtasks, each of which is implemented separately. In a protocol architecture, the
modules are arranged in a vertical stack. Each layer in the stack performs a related subset of the
functions required to communicate with another system. It relies on the next lower layer to perform
more primitive functions and to conceal the details of those functions. It provides services to the next
higher layer. Ideally, layers should be defined so that changes in one layer do not require changes in
other layers.
Of course, it takes two to communicate, so the same set of layered functions must exist in two systems.
Communication is achieved by having the corresponding, or peer, layers in two systems communicate.
The peer layers communicate by means of formatted blocks of data that obey a set of rules or
conventions known as a protocol. The key features of a protocol are as follows:
• Syntax: Concerns the format of the data blocks
• Semantics: Includes control information for coordination and error handling
• Timing: Includes speed matching and sequencing

The TCP/IP Protocol Architecture


The TCP/IP protocol architecture is a result of protocol research and development conducted on the
experimental packet-switched network, ARPANET, funded by the Defense Advanced Research
Projects Agency (DARPA), and is generally referred to as the TCP/IP protocol suite. This protocol suite
consists of a large collection of protocols that have been issued as Internet standards by the Internet
Activities Board (IAB).
The TCP/IP Layers
In general terms, communications can be said to involve three agents: applications, computers, and
networks. Examples of applications include file transfer and electronic mail. The applications that we
are concerned with here are distributed applications that involve the exchange of data between two
computer systems. These applications, and others, execute on computers that can often support multiple
simultaneous applications. Computers are connected to networks, and the data to be exchanged are
transferred by the network from one computer to another. Thus, the transfer of data from one application
to another involves first getting the data to the computer in which the application resides and then
getting the data to the intended application within the computer. With these concepts in mind, we can
organize the communication task into five relatively independent layers.
• Physical layer
• Network access layer
• Internet layer
• Host-to-host, or transport layer
• Application layer
The physical layer covers the physical interface between a data transmission device (e.g., workstation,
computer) and a transmission medium or network. This layer is concerned with specifying the
characteristics of the transmission medium, the nature of the signals, the data rate, and related matters.
The network access layer is concerned with the exchange of data between an end system (server,
workstation, etc.) and the network to which it is attached. The sending computer must provide the
network with the address of the destination computer, so that the network may route the data to the
appropriate destination.
The internet layer handles the case in which the two devices are attached to different networks: it
provides the routing functions needed for data to traverse multiple interconnected networks. The
Internet Protocol (IP) is used at this layer.
Regardless of the nature of the applications that are exchanging data, there is
usually a requirement that data be exchanged reliably. That is, we would like to be assured that all of
the data arrive at the destination application and that the data arrive in the same order in which they
were sent. As we shall see, the mechanisms for providing reliability are essentially independent of the
nature of the applications. Thus, it makes sense to collect those mechanisms in a common layer shared
by all applications; this is referred to as the host-to-host layer, or transport layer. The Transmission
Control Protocol (TCP) is the most commonly used protocol to provide this functionality.
Finally, the application layer contains the logic needed to support the various user applications. For
each different type of application, such as file transfer, a separate module is needed that is peculiar to
that application.
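
The layering described above can be sketched as nested wrapping: on the way down the stack each layer
prepends its own control information, and the peer layer on the receiving side strips it again. The
following is only a minimal illustration; the bracketed header strings and the function names are
hypothetical placeholders, not real TCP/IP header formats.

```python
# Layers below the application, listed top-down (names taken from the text).
LAYERS = ["transport", "internet", "network access"]


def encapsulate(payload: bytes) -> bytes:
    """Wrap application data with one placeholder header per lower layer."""
    unit = payload
    for name in LAYERS:
        unit = f"[{name}]".encode() + unit  # each layer prepends its header
    return unit


def decapsulate(unit: bytes) -> bytes:
    """Strip the headers in the opposite order, as the receiving stack would."""
    for name in reversed(LAYERS):
        header = f"[{name}]".encode()
        if not unit.startswith(header):
            raise ValueError(f"missing {name} header")
        unit = unit[len(header):]
    return unit


frame = encapsulate(b"file contents")
print(frame)               # outermost header belongs to the lowest layer
print(decapsulate(frame))  # the original application data
```

Because each layer touches only its own header, a change in one layer's header format would not require
changes in the other layers, which is the independence property the text asks for.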

The OSI Model


The Open Systems Interconnection (OSI) reference model was developed by the International
Organization for Standardization (ISO) as a model for a computer protocol architecture and as a
framework for developing protocol standards. The OSI model consists of seven layers:
Application, Presentation, Session, Transport, Network, Data link, Physical. Figure 1.3 below illustrates
the OSI model and provides a brief definition of the functions performed at each layer. The intent of
the OSI model is that protocols be developed to perform the functions of each layer.

Figure 1.3: The OSI Layers

Standardization of Protocol Architectures


The principal motivation for the development of the OSI model was to provide a framework for
standardization. Within the model, one or more protocol standards can be developed at each layer. The
model defines in general terms the functions to be performed at that layer and facilitates the standards-
making process in two ways:
• Because the functions of each layer are well defined, standards can be developed independently
and simultaneously for each layer. This speeds up the standards-making process.
• Because the boundaries between layers are well defined, changes in standards in one layer need
not affect already existing software in another layer. This makes it easier to introduce new
standards.

Data Transmission and Representation Techniques


Digital Transmission
Data or information can be stored in two ways, analog and digital. For a computer to use the data, it must
be in discrete digital form. Similar to data, signals can also be in analog and digital form. To transmit data
digitally, it needs to be first converted to digital form.

Digital-to-Digital Conversion
This section explains how to convert digital data into digital signals. It can be done in two ways, line
coding and block coding. For all communications, line coding is necessary whereas block coding is
optional.

Line Coding
The process of converting digital data into a digital signal is called line coding. Digital data
is in binary format; it is represented (stored) internally as a series of 1s and 0s.

A digital signal is a discrete signal that represents digital data. There are three types
of line coding schemes available:

Unipolar Encoding
Unipolar encoding schemes use a single voltage level to represent data. In this case, to represent
binary 1, high voltage is transmitted and to represent 0, no voltage is transmitted. It is also called
Unipolar Non-Return-to-Zero, because there is no rest condition, i.e. the signal always represents either 1 or 0.

Polar Encoding
Polar encoding schemes use multiple voltage levels to represent binary values. Polar encoding is
available in four types:
Polar Non Return to Zero (Polar NRZ)
It uses two different voltage levels to represent binary values. Generally, positive voltage represents 1
and negative voltage represents 0. It is also NRZ because there is no rest condition. The NRZ scheme has
two variants: NRZ-L and NRZ-I.

NRZ-L changes the voltage level when a different bit is encountered, whereas NRZ-I changes the voltage
level when a 1 is encountered.
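
The two NRZ variants can be sketched as follows. This is a minimal illustration, assuming +1 and -1 stand
for the positive and negative voltage levels and that the NRZ-I line starts at the negative level; the
function names are illustrative.

```python
def nrz_l(bits, high=1, low=-1):
    """NRZ-L: the level itself encodes the bit (1 -> high, 0 -> low)."""
    return [high if b else low for b in bits]


def nrz_i(bits, start=-1):
    """NRZ-I: a 1 inverts the current level, a 0 leaves it unchanged."""
    level = start  # assumed initial line level
    out = []
    for b in bits:
        if b:
            level = -level
        out.append(level)
    return out


bits = [0, 1, 0, 0, 1, 1, 1, 0]
print(nrz_l(bits))  # [-1, 1, -1, -1, 1, 1, 1, -1]
print(nrz_i(bits))  # [-1, 1, 1, 1, -1, 1, -1, -1]
```

Note how a run of identical bits in NRZ-L, or a run of 0s in NRZ-I, produces no level change at all.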
Return to Zero (RZ)
The problem with NRZ is that the receiver cannot tell when one bit ends and the next bit starts
if the sender’s and receiver’s clocks are not synchronized.

RZ uses three voltage levels: positive voltage to represent 1, negative voltage to represent 0, and zero
voltage for none. The signal changes during bits, not between bits.

Manchester
This encoding scheme is a combination of RZ and NRZ-L. Bit time is divided into two halves. The signal
makes a transition in the middle of each bit and changes phase when a different bit is encountered.
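
A minimal sketch of Manchester encoding, representing each bit as two half-bit levels. The polarity used
here (1 as low-to-high, 0 as high-to-low) is an assumption; the opposite convention is also in use.

```python
def manchester(bits):
    """Return two half-bit levels per input bit.

    Assumed convention: 1 -> (low, high), 0 -> (high, low).
    """
    out = []
    for b in bits:
        out.extend((-1, 1) if b else (1, -1))
    return out


signal = manchester([1, 0, 1, 1])
print(signal)  # [-1, 1, 1, -1, -1, 1, -1, 1]
```

Every bit contains a mid-bit transition, so the receiver can recover the sender's clock from the signal
itself, which addresses the synchronization problem of NRZ.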
Differential Manchester
This encoding scheme is a combination of RZ and NRZ-I. It also makes a transition in the middle of each
bit, but changes phase only when a 1 is encountered.
Bipolar Encoding
Bipolar encoding uses three voltage levels: positive, negative, and zero. Zero voltage represents binary 0,
and binary 1 is represented by alternating positive and negative voltages.
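
The bipolar rule can be sketched as below. The choice that the first 1 is sent as the positive level is an
assumption made for the illustration.

```python
def bipolar(bits):
    """Bipolar encoding: 0 -> zero voltage, 1 -> alternating +1 / -1."""
    out = []
    last = -1  # assumed so that the first 1 is sent as +1
    for b in bits:
        if b:
            last = -last
            out.append(last)
        else:
            out.append(0)
    return out


print(bipolar([1, 0, 1, 1, 0, 1]))  # [1, 0, -1, 1, 0, -1]
```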

Block Coding

To ensure accuracy of the received data frame, redundant bits are used. For example, in even-parity, one
parity bit is added to make the count of 1s in the frame even. This way the original number of bits is
increased. It is called Block Coding.
Block coding is represented by slash notation, mB/nB, meaning that each m-bit block is substituted with an
n-bit block, where n > m. Block coding involves three steps:
1. Division
2. Substitution
3. Combination.
After block coding is done, it is line coded for transmission.
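
The three steps can be sketched with a toy 2B/3B code. The substitution rule used here (append one
even-parity bit to each 2-bit group) is a hypothetical example chosen to match the parity illustration
above; real block codes such as 4B/5B use fixed substitution tables instead.

```python
def block_encode(bits, m=2):
    """Toy mB/(m+1)B block coder using even parity as the substitution rule."""
    if len(bits) % m:
        raise ValueError("bit count must be a multiple of m")
    # 1. Division: split the bit stream into m-bit groups.
    groups = [bits[i:i + m] for i in range(0, len(bits), m)]
    # 2. Substitution: replace each m-bit group with an (m+1)-bit group.
    coded = [g + [sum(g) % 2] for g in groups]
    # 3. Combination: join the coded groups back into one stream.
    return [b for g in coded for b in g]


def parity_ok(group):
    """A received group is accepted only if its count of 1s is even."""
    return sum(group) % 2 == 0


encoded = block_encode([1, 0, 1, 1])
print(encoded)  # [1, 0, 1, 1, 1, 0]
```

A single flipped bit in any 3-bit group makes its count of 1s odd, so `parity_ok` would reject that group.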

Analog-to-Digital Conversion
Microphones create analog voice and cameras create analog video, both of which are treated as analog data. To
transmit this analog data over digital signals, we need analog-to-digital conversion.
Analog data is a continuous stream of data in wave form, whereas digital data is discrete. To convert
an analog wave into digital data, we use Pulse Code Modulation (PCM).
PCM is one of the most commonly used methods to convert analog data into digital form. It involves three
steps:
• Sampling
• Quantization
• Encoding.
Sampling

The analog signal is sampled at every interval T. The most important factor in sampling is the rate at
which the analog signal is sampled. According to the Nyquist theorem, the sampling rate must be at
least twice the highest frequency in the signal.

Quantization
Sampling yields a discrete form of the continuous analog signal. Every discrete sample shows the amplitude
of the analog signal at that instant. Quantization is done between the maximum amplitude value
and the minimum amplitude value. Quantization is an approximation of the instantaneous analog value.
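
The three PCM steps can be sketched as follows. This is a minimal illustration under stated assumptions:
the test tone, the 3-bit resolution, the amplitude range of -1 to +1, and the uniform quantizer are all
choices made for the example, not details given in these notes.

```python
import math


def sample(signal, fs, n_samples):
    """Sampling: read the analog signal every T = 1/fs seconds."""
    return [signal(k / fs) for k in range(n_samples)]


def quantize(samples, bits, v_min, v_max):
    """Quantization: map each sample to one of 2**bits uniform levels."""
    levels = 2 ** bits
    step = (v_max - v_min) / levels
    codes = []
    for s in samples:
        index = int((s - v_min) / step)
        codes.append(min(max(index, 0), levels - 1))  # clip to the valid range
    return codes


def encode(codes, bits):
    """Encoding: write each level number as a fixed-width binary word."""
    return "".join(format(c, f"0{bits}b") for c in codes)


f_max = 1000.0   # highest frequency in the assumed test signal (Hz)
fs = 8 * f_max   # sampling rate; the Nyquist theorem requires fs >= 2 * f_max


def tone(t):
    return math.sin(2 * math.pi * f_max * t)


codes = quantize(sample(tone, fs, 8), bits=3, v_min=-1.0, v_max=1.0)
bitstream = encode(codes, bits=3)
print(codes, bitstream)
```

More bits per sample give finer quantization levels and therefore a smaller approximation error, at the
cost of a higher bit rate.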

Circuit Switching and Packet Switching

Switching
Switching is a mechanism by which data/information is sent from a source towards a destination that is not
directly connected to it. Networks have interconnecting devices, which receive data from directly connected
sources, store the data, analyze it, and then forward it to the next interconnecting device closest to the
destination.

Switched Communication Networks


For transmission of data beyond a local area, communication is typically achieved by transmitting data
from source to destination through a network of intermediate switching nodes; this switched network
design is typically used to implement LANs as well. The switching nodes are not concerned with the
content of the data; rather, their purpose is to provide a switching facility that will move the data from
node to node until they reach their destination. Figure 1.4 below illustrates a simple network. The
devices attached to the network may be referred to as stations. The stations may be computers,
terminals, telephones, or other communicating devices. We refer to the switching devices whose
purpose is to provide communication as nodes. Nodes are connected to one another in some topology
by transmission links. Each station attaches to a node, and the collection of nodes is referred to as a
communications network.

In a switched communication network, data entering the network from a station are routed to the
destination by being switched from node to node. For example, in Figure 1.4, data from station A
intended for station F are sent to node 4. They may then be routed via nodes 5 and 6 or nodes 7 and 6
to the destination. Several observations are in order:
1. Some nodes connect only to other nodes (e.g., 5 and 7). Their sole task is the internal (to the
network) switching of data. Other nodes have one or more stations attached as well; in addition
to their switching functions, such nodes accept data from and deliver data to the attached
stations.
2. Node-station links are generally dedicated point-to-point links. Node-node links are usually
multiplexed, using either frequency division multiplexing (FDM) or time division multiplexing
(TDM).
3. Usually, the network is not fully connected; that is, there is not a direct link between every
possible pair of nodes. However, it is always desirable to have more than one possible path
through the network for each pair of stations. This enhances the reliability of the network.

Figure 1.4: Simple Switching Network
Two different technologies are used in wide area switched networks: circuit switching and packet
switching. These two technologies differ in the way the nodes switch information from one link to
another on the way from source to destination.

Circuit Switched Network


Communication via circuit switching implies that there is a dedicated communication path between two
stations. That path is a connected sequence of links between network nodes. On each physical link, a
logical channel is dedicated to the connection. Communication via circuit switching involves three
phases, which can be explained with reference to Figure 1.4.
1. Circuit establishment. Before any signals can be transmitted, an end-to-end (station-to-station)
circuit must be established. For example, station A sends a request to node 4 requesting a
connection to station E. Typically, the link from A to 4 is a dedicated line, so that part of the
connection already exists. Node 4 must find the next leg in a route leading to E. Based on routing
information and measures of availability and perhaps cost, node 4 selects the link to node 5,
allocates a free channel (using FDM or TDM) on that link, and sends a message requesting
connection to E. So far, a dedicated path has been established from A through 4 to 5. Because a
number of stations may attach to 4, it must be able to establish internal paths from multiple
stations to multiple nodes. How this is done is discussed later in this section. The remainder of
the process proceeds similarly. Node 5 allocates a channel to node 6 and internally ties that
channel to the channel from node 4. Node 6 completes the connection to E. In completing the
connection, a test is made to determine if E is busy or is prepared to accept the connection.
2. Data transfer. Data can now be transmitted from A through the network to E. The transmission
may be analog or digital, depending on the nature of the network. As the carriers evolve to fully
integrated digital networks, the use of digital (binary) transmission for both voice and data is
becoming the dominant method. The path is A-4 link, internal switching through 4, 4-5 channel,
internal switching through 5, 5-6 channel, internal switching through 6, 6-E link. Generally, the
connection is full duplex.
3. Circuit disconnect. After some period of data transfer, the connection is terminated, usually by
the action of one of the two stations. Signals must be propagated to nodes 4, 5, and 6 to
deallocate the dedicated resources.

Circuit switching was developed to handle voice traffic but is now also used for data traffic. The best-
known example of a circuit-switching network is the public telephone network. A public
telecommunications network can be described using four generic architectural components:
• Subscribers: The devices that attach to the network. It is still the case that most subscriber
devices to public telecommunications networks are telephones, but the percentage of data traffic
increases year by year.
• Subscriber line: The link between the subscriber and the network, also referred to as the
subscriber loop or local loop. Almost all local loop connections use twisted-pair wire. The length
of a local loop is typically in a range from a few kilometers to a few tens of kilometers.
• Exchanges: The switching centers in the network. A switching center that directly supports
subscribers is known as an end office. Typically, an end office will support many thousands of
subscribers in a localized area. There are over 19,000 end offices in the United States, so it is
clearly impractical for each end office to have a direct link to each of the other end offices; this
would require on the order of 2 × 10^8 links. Rather, intermediate switching nodes are used.
• Trunks: The branches between exchanges. Trunks carry multiple voice frequency circuits using
either FDM or synchronous TDM.
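
The impracticality of directly linking every pair of end offices follows from simple counting: n fully
connected offices need n(n - 1)/2 links. A short sketch of that arithmetic, using the 19,000 figure quoted
above (the function name is illustrative):

```python
def full_mesh_links(n: int) -> int:
    """Number of direct links needed to connect every pair among n sites."""
    return n * (n - 1) // 2


print(full_mesh_links(19_000))  # 180490500, i.e. roughly 1.8 x 10^8
```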

Packet Switching
The long-haul circuit-switching telecommunications network was originally designed to handle voice
traffic, and the majority of traffic on these networks continues to be voice. A key characteristic of
circuit-switching networks is that resources within the network are dedicated to a particular call. For
voice connections, the resulting circuit will enjoy a high percentage of utilization because, most of the
time, one party or the other is talking. However, as the circuit-switching network began to be used
increasingly for data connections, two shortcomings became apparent:
• In a typical user/host data connection (e.g., personal computer user logged on to a database
server), much of the time the line is idle. Thus, with data connections, a circuit-switching
approach is inefficient.
• In a circuit-switching network, the connection provides for transmission at a constant data rate.
Thus, each of the two devices that are connected must transmit and receive at the same data rate
as the other. This limits the utility of the network in interconnecting a variety of host computers
and workstations.

To understand how packet switching addresses these problems, let us briefly summarize packet-
switching operation. Data are transmitted in short packets. A typical upper bound on packet length is
1000 octets (bytes). If a source has a longer message to send, the message is broken up into a series of
packets (Figure 1.5). Each packet contains a portion (or all for a short message) of the user’s data plus
some control information. The control information, at a minimum, includes the information that the
network requires to be able to route the packet through the network and deliver it to the intended
destination. At each node en route, the packet is received, stored briefly, and passed on to the next node.
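
Breaking a message into packets can be sketched as follows. This is a minimal illustration: the `Packet`
fields (destination, sequence number, total count) are a hypothetical choice of control information, and
the 1000-octet limit is the typical upper bound quoted above.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str     # control information: where the packet is going
    seq: int      # control information: position within the message
    total: int    # control information: number of packets in the message
    data: bytes   # portion of the user's data


def packetize(message: bytes, dest: str, max_payload: int = 1000):
    """Split a message into packets carrying at most max_payload octets each."""
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    return [Packet(dest, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]


packets = packetize(b"x" * 2500, dest="E")
print([(p.seq, len(p.data)) for p in packets])  # [(0, 1000), (1, 1000), (2, 500)]
```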

Figure 1.5: The Use of Packets

Let us return to Figure 1.4, but now assume that it depicts a simple packet-switching network.
Consider a packet to be sent from station A to station E. The packet includes control
information that indicates that the intended destination is E. The packet is sent from A to node 4. Node
4 stores the packet, determines the next leg of the route (say 5), and queues the packet to go out on that
link (the 4–5 link). When the link is available, the packet is transmitted to node 5, which forwards the
packet to node 6, and finally to E. This approach has a number of advantages over circuit switching:
• Line efficiency is greater, because a single node-to-node link can be dynamically shared by
many packets over time. The packets are queued up and transmitted as rapidly as possible over
the link. By contrast, with circuit switching, time on a node-to-node link is pre-allocated using
synchronous time division multiplexing. Much of the time, such a link may be idle because a
portion of its time is dedicated to a connection that is idle.
• A packet-switching network can perform data-rate conversion. Two stations of different data
rates can exchange packets because each connects to its node at its proper data rate.
• When traffic becomes heavy on a circuit-switching network, some calls are blocked; that is, the
network refuses to accept additional connection requests until the load on the network decreases.
On a packet-switching network, packets are still accepted, but delivery delay increases.
• Priorities can be used. If a node has a number of packets queued for transmission, it can transmit
the higher-priority packets first. These packets will therefore experience less delay than lower-
priority packets.

Switching Technique

If a station has a message to send through a packet-switching network that is of length greater than the
maximum packet size, it breaks the message up into packets and sends these packets, one at a time, to
the network. A question arises as to how the network will handle this stream of packets as it attempts
to route them through the network and deliver them to the intended destination. Two approaches are
used in contemporary networks: datagram and virtual circuit.
In the datagram approach, each packet is treated independently, with no reference to packets that have
gone before.

Each node chooses the next node on a packet’s path, taking into account information received from
neighboring nodes on traffic, line failures, and so on. So the packets, each with the same destination
address, do not all follow the same route, and they may arrive out of sequence at the exit point. In this
example, the exit node restores the packets to their original order before delivering them to the
destination. In some datagram networks, it is up to the destination rather than the exit node to do the
reordering. Also, it is possible for a packet to be destroyed in the network. For example, if a packet-
switching node crashes momentarily, all of its queued packets may be lost. Again, it is up to either the
exit node or the destination to detect the loss of a packet and decide how to recover it. In this technique,
each packet, treated independently, is referred to as a datagram.
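
The reordering and loss detection left to the exit node or destination can be sketched as follows. This is
a minimal illustration that assumes each datagram carries a sequence number and that the receiver knows
how many datagrams to expect; neither detail is specified in these notes.

```python
def reassemble(received, total):
    """Restore the original order of datagrams and report any that are missing.

    received: iterable of (sequence_number, data) pairs in arrival order.
    Returns (message, missing): message is None if any datagram was lost.
    """
    by_seq = {seq: data for seq, data in received}
    missing = [seq for seq in range(total) if seq not in by_seq]
    if missing:
        return None, missing
    return b"".join(by_seq[seq] for seq in range(total)), []


# Datagrams took different routes and arrived out of sequence.
print(reassemble([(2, b"c"), (0, b"a"), (1, b"b")], total=3))  # (b'abc', [])
# One datagram was lost in the network; the receiver must recover it.
print(reassemble([(0, b"a"), (2, b"c")], total=3))             # (None, [1])
```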

In the virtual circuit approach, a preplanned route is established before any packets are sent. Once the
route is established, all the packets between a pair of communicating parties follow this same route
through the network. Because the route is fixed for the duration of the logical connection, it is
somewhat similar to a circuit in a circuit-switching network and is referred to as
a virtual circuit. Each packet contains a virtual circuit identifier as well as data. Each node on the
preestablished route knows where to direct such packets; no routing decisions are required. At any time,
each station can have more than one virtual circuit to any other station and can have virtual circuits to
more than one station. So the main characteristic of the virtual circuit technique is that a route between
stations is set up prior to data transfer. Note that this does not mean that this is a dedicated path, as in
circuit switching. A transmitted packet is buffered at each node, and queued for output over a line, while
other packets on other virtual circuits may share the use of the line. The difference from the datagram
approach is that, with virtual circuits, the node need not make a routing decision for each packet. It is
made only once for all packets using that virtual circuit.
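
The virtual circuit idea can be sketched with per-node lookup tables: the routing decision is made once at
setup time, after which each node only looks up the identifier carried by the packet. This is a simplified
illustration; it assumes a single identifier is used along the whole route, and the table layout and
function names are hypothetical. The route A-4-5-6-E is the one used in the earlier example.

```python
def establish(tables, vc_id, route):
    """Setup phase: record the next hop for this virtual circuit at every node."""
    for here, nxt in zip(route, route[1:]):
        tables.setdefault(here, {})[vc_id] = nxt


def forward(tables, vc_id, start, dest):
    """Data phase: follow the table entries; no per-packet routing decision."""
    path = [start]
    while path[-1] != dest:
        path.append(tables[path[-1]][vc_id])
    return path


tables = {}
establish(tables, vc_id=7, route=["4", "5", "6", "E"])
print(forward(tables, 7, "4", "E"))  # ['4', '5', '6', 'E']
```

Nothing in the tables reserves capacity on a link, which is why a virtual circuit is not a dedicated path:
packets from other virtual circuits can still be queued on the same lines.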

If two stations wish to exchange data over an extended period of time, there are certain advantages to
virtual circuits. First, the network may provide services related to the virtual circuit, including
sequencing and error control. Sequencing refers to the fact that, because all packets follow the same
route, they arrive in the original order. Error control is a service that assures not only that packets arrive
in proper sequence, but also that all packets arrive correctly. For example, if a packet in a sequence
from node 4 to node 6 fails to arrive at node 6, or arrives with an error, node 6 can request a
retransmission of that packet from node 4. Another advantage is that packets should transit the network
more rapidly with a virtual circuit; it is not necessary to make a routing decision for each packet at each
node.

COMPUTER NETWORKING
A system of interconnected computers and computerized peripherals such as printers is called a computer
network. This interconnection among computers facilitates information sharing among them.
Computers may connect to each other by either wired or wireless media.

Classification of Computer Networks


Computer networks are classified based on various factors. They include:
• Geographical span
• Inter-connectivity
• Administration
• Architecture

Geographical Span
Geographically, a network falls into one of the following categories:
• It may span your table, among Bluetooth-enabled devices, ranging no more than a
few meters.
• It may span a whole building, including intermediate devices to connect all floors.
• It may span a whole city.
• It may span multiple cities or provinces.
• It may be one network covering the whole world.
Inter-Connectivity
Components of a network can be connected to each other in different ways. By
connectedness we mean logically, physically, or both.
• Every single device can be connected to every other device on the network, making the
network a mesh.
• All devices can be connected to a single medium but be geographically disconnected,
creating a bus-like structure.
• Each device is connected to its left and right peers only, creating a linear structure.
• All devices are connected together through a single device, creating a star-like structure.
• Devices are connected arbitrarily using all of the previous ways, resulting
in a hybrid structure.
Administration
From an administrator’s point of view, a network can be a private network, which belongs to a single
autonomous system and cannot be accessed outside its physical or logical domain. A network can also be
public, which is accessible to all.

Network Architecture
Computer networks can be divided into various types, such as client-server, peer-to-peer, or
hybrid, depending on their architecture.
• There can be one or more systems acting as servers. The other systems, the clients, send requests
to a server, which accepts and processes requests on behalf of the clients.
• Two systems can be connected point-to-point, or in back-to-back fashion. They both reside at the
same level and are called peers.
• There can be a hybrid network, which involves the network architecture of both of the above types.
Network Applications
Computer systems and peripherals are connected to form a network. They provide numerous
advantages:
• Resource sharing such as printers and storage devices
• Exchange of information by means of e-mail and FTP
• Information sharing by using the Web or Internet
• Interaction with other users using dynamic web pages
• IP phones
• Video conferences
• Parallel computing
• Instant messaging

Types of Computer Networks based on Geographical Span


Generally, networks are distinguished based on their geographical span. A network can be as small as
the distance between your mobile phone and its Bluetooth headphone and as large as the internet itself,
covering the whole geographical world.
Personal Area Network
A Personal Area Network (PAN) is the smallest network and is very personal to a user. It may include
Bluetooth-enabled devices or infra-red enabled devices. A PAN has a connectivity range of up to 10 meters.
A PAN may include a wireless computer keyboard and mouse, Bluetooth-enabled headphones, wireless
printers, and TV remotes.

For example, a piconet is a Bluetooth-enabled Personal Area Network which may contain up to 8
devices connected together in a master-slave fashion.

Local Area Network
A computer network that spans a building and is operated under a single administrative system
is generally termed a Local Area Network (LAN). Usually, a LAN covers an organization’s offices,
schools, colleges, or universities. The number of systems connected in a LAN may vary from as few
as two to as many as 16 million. A LAN provides a useful way of sharing resources between end
users. Resources such as printers, file servers, scanners, and internet access are easily shared
among computers.

LANs are composed of inexpensive networking and routing equipment. A LAN may contain local servers
providing file storage and other locally shared applications. It mostly operates on private IP addresses and
does not involve heavy routing. A LAN works under its own local domain and is controlled centrally.
A LAN uses either Ethernet or Token-ring technology. Ethernet is the most widely employed LAN technology
and uses a Star topology, while Token-ring is rarely seen. A LAN can be wired, wireless, or both
at once.
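
The private addressing mentioned above can be checked with a short sketch using Python's standard ipaddress module. The three blocks listed are the private IPv4 ranges of RFC 1918; the function name is_lan_address is hypothetical, and linking the "16 million" figure to the size of the 10.0.0.0/8 block is an assumption rather than something these notes state.

```python
import ipaddress

# The three private IPv4 ranges reserved by RFC 1918.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_lan_address(text):
    """Return True if the address falls in one of the private blocks."""
    address = ipaddress.ip_address(text)
    return any(address in block for block in PRIVATE_BLOCKS)

# The largest private block holds 2**24 addresses, roughly 16 million.
largest_block_size = PRIVATE_BLOCKS[0].num_addresses
```

For instance, is_lan_address("192.168.1.20") is true, while a public address such as 8.8.8.8 is not in any of the private blocks.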
Metropolitan Area Network
A Metropolitan Area Network (MAN) generally extends throughout a city; a cable TV network is one example.
It can be in the form of Ethernet, Token-ring, ATM, or Fiber Distributed Data Interface (FDDI).
Metro Ethernet is a service provided by ISPs that enables its users to extend their
Local Area Networks. For example, a MAN can help an organization connect all of its offices in a city.

The backbone of a MAN is high-capacity, high-speed fiber optics. A MAN sits between the Local Area
Network and the Wide Area Network, and provides the uplink from LANs to WANs or the Internet.

Wide Area Network
As the name suggests, a Wide Area Network (WAN) covers a wide area which may span
provinces or even a whole country. Generally, telecommunication networks are Wide Area Networks.
These networks provide connectivity to MANs and LANs. Since they are equipped with a very high-speed
backbone, WANs use very expensive network equipment.


A WAN may use advanced technologies such as Asynchronous Transfer Mode (ATM), Frame Relay, and
Synchronous Optical Network (SONET). A WAN may be managed by more than one administration.

Types of Computer Networks based on Architecture


Two remote application processes can communicate mainly in two different fashions:

• Peer-to-peer: Both remote processes execute at the same level and exchange data using
some shared resource.
• Client-Server: One remote process acts as a Client and requests some resource from another
application process acting as a Server.

In the client-server model, any process can act as a Server or a Client. It is not the type of machine, the size of the
machine, or its computing power that makes it a server; it is the ability to serve requests that makes a
machine a server.

A system can act as Server and Client simultaneously; that is, one process acts as a Server while
another acts as a Client. It may also happen that both client and server processes reside on the
same machine.
Communication
Two processes in the client-server model can interact in various ways:
• Sockets
• Remote Procedure Calls (RPC)
Sockets
In this paradigm, the process acting as Server opens a socket on a well-known port (or one known to the client)
and waits until a client request arrives. The second process, acting as a Client, also opens a socket,
but instead of waiting for an incoming request, the client sends its request first.

When the request reaches the server, it is served. It can be either a request for information or a request for a
resource.
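
The exchange described above can be sketched with Python's standard socket module. This is only an illustration under stated assumptions: both processes are simulated inside one program (the server runs in a thread), the loopback address 127.0.0.1 is used, and port 0 asks the operating system for any free port, whereas a real service would listen on a well-known port. The names serve_once and run_demo and the message contents are hypothetical.

```python
import socket
import threading

def serve_once(server_socket):
    """Server side: accept one client, read its request, send back a reply."""
    connection, _ = server_socket.accept()
    with connection:
        request = connection.recv(1024)
        connection.sendall(b"served: " + request)

def run_demo():
    # The server opens a socket on a port and waits for a request.
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_socket.listen(1)
    port = server_socket.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_socket,))
    worker.start()

    # The client also opens a socket, but it sends its request first.
    with socket.create_connection(("127.0.0.1", port)) as client_socket:
        client_socket.sendall(b"time?")
        reply = client_socket.recv(1024)

    worker.join()
    server_socket.close()
    return reply
```

Calling run_demo() returns the server's reply to the single request, which shows the asymmetry between the two roles: the server waits, the client speaks first.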
Remote Procedure Call
This is a mechanism where one process interacts with another by means of procedure calls. One process
(the client) calls a procedure that resides on a remote host. The process on the remote host is said to be the Server. Both
processes are allocated stubs. The communication happens in the following way:
• The client process calls the client stub, passing it all the parameters of the call, just as for a
procedure in the local program.
• The parameters are then packed (marshalled) and a system call is made to send them to the other side
of the network.
• The kernel sends the data over the network and the other end receives it.
• The remote host passes the data to the server stub, where it is unmarshalled.
• The parameters are passed to the procedure, and the procedure is then executed.
• The result is sent back to the client in the same manner.
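
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not a real RPC library: JSON is assumed as the marshalling format, the network and kernel are replaced by a plain function call (send_over_network), and the procedure add and all other names are hypothetical.

```python
import json

def add(a, b):
    """The remote procedure, living on the server side."""
    return a + b

PROCEDURES = {"add": add}

def server_stub(message):
    """Unmarshal the request, run the procedure, marshal the result."""
    call = json.loads(message.decode("utf-8"))
    result = PROCEDURES[call["procedure"]](*call["params"])
    return json.dumps({"result": result}).encode("utf-8")

def send_over_network(message):
    """Stand-in for the kernel and network: hand the bytes to the remote host."""
    return server_stub(message)

def client_stub(procedure, *params):
    """Marshal the call, send it, and unmarshal the reply."""
    request = {"procedure": procedure, "params": list(params)}
    message = json.dumps(request).encode("utf-8")
    reply = send_over_network(message)
    return json.loads(reply.decode("utf-8"))["result"]
```

From the caller's point of view, client_stub("add", 2, 3) looks like an ordinary local call; only bytes cross the simulated network boundary.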

NETWORK COMPONENTS
Networking hardware components

Network interface card


The Network Interface Card (NIC), also known as a network adaptor, acts as the interface between the
computer and the physical network connection. In most networks, every computer must have a network
interface card to be able to connect to the network. NICs are usually specific to a particular type of
cabling – for example, a NIC may have either an RJ45 connector or a BNC connector – although it is
possible to get combo cards, which include more than one type of connector.

Transceivers
A transceiver is a networking device that converts from one cabling technology to another. For example,
a transceiver may act as an interface between a network based on coaxial cable and one using fibre-
optic cable.
Repeater
In a bus topology, signal loss can occur if the segments are too long. A repeater is a
device that connects two network segments and broadcasts data between them. It
amplifies the signal, thereby extending the usable length of the bus.

Hub
One network component that has become standard equipment in networks is the hub. A hub acts as the
central component in a star topology, and typically contains 4, 8, 16 or even more different ports for
connecting to computers or other hubs. It is similar in operation to a repeater, except that it broadcasts
data received by any of the ports to all other ports on the hub. Hubs can be active, passive or hybrid.

Most hubs are active; that is, they regenerate and retransmit signals in the same way as a repeater does.
Because hubs usually have eight to twelve ports for network computers to connect to, they are
sometimes called multiport repeaters. Active hubs require electrical power to run. Some types of hubs
are passive. They act as connection points and do not amplify or regenerate the signal; the signal passes
through the hub. Passive hubs do not require electrical power to run. Advanced hubs that will
accommodate several different types of cables are called hybrid hubs.

Bridges, switches and routers


For large networks it is often necessary to partition them into smaller groups of nodes to help isolate traffic
and improve performance. A bridge is a device that acts as an interface between two sets of nodes. For
example, if a company’s network has been partitioned into two subnets, for the sales department and
administration department respectively, a bridge will be placed between the two networks. If a computer
on the sales subnet sends data to another computer on the sales subnet, the bridge will not pass on the
data to the administration subnet. However, if the same computer sends data to a computer on the
administration subnet, it will be forwarded by the bridge. Because not all data is passed onto the other
subnet, network traffic is reduced.

A switch is similar to a bridge, except that it has multiple ports. A switch can also be seen as a more
intelligent hub – whereas a hub passes on all data to every port, a switch will only pass data on to the
port that it is intended for.
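
The difference between a hub and a switch can be illustrated with a small sketch. The notes do not say how a switch knows which port a destination is on; the sketch assumes the common approach of learning it from the source address of each frame that arrives, and floods like a hub when the destination is still unknown. Class names, port numbers and station names are hypothetical.

```python
class Hub:
    def __init__(self, port_count):
        self.ports = list(range(port_count))

    def forward(self, in_port, source, destination):
        # A hub repeats the data out of every port except the one it arrived on.
        return [p for p in self.ports if p != in_port]

class Switch(Hub):
    def __init__(self, port_count):
        super().__init__(port_count)
        self.table = {}  # learned mapping: station address -> port

    def forward(self, in_port, source, destination):
        self.table[source] = in_port  # remember where the sender is attached
        if destination in self.table:
            out_port = self.table[destination]
            # Traffic that stays on one segment is filtered, as a bridge would do.
            return [] if out_port == in_port else [out_port]
        # Unknown destination: fall back to flooding, like a hub.
        return super().forward(in_port, source, destination)
```

Once a station has sent at least one frame, later frames addressed to it leave through a single port only, which is the traffic reduction described above.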

A router is also used for connecting networks together. However, unlike a bridge, a router can be used
to connect networks that use different network technologies. Routers are very commonly found in the
hardware infrastructure that forms the basis of the Internet.

The topic of routing in computer networking is a crucial one and has been the subject of much research.
We will return to this important topic in Handout 4 (Network Architecture).

Wireless networking
Although most networks use physical connections between the network components, wireless
networking has recently been increasing in popularity. Wireless networks can use infrared light, line-of-sight
lasers, or radio waves to transmit data between nodes. Because they
eliminate the need to install physical cabling, they offer a lot of flexibility for users of the network.
However, they are currently more expensive and slower than cable-based networks. As costs drop and
performance increases, wireless networks are sure to be increasingly popular in the future.

There are two main types of hardware associated with wireless communication in
computing: Bluetooth and 802.11. Bluetooth only allows very short-range transmission
(typically less than 10 m) and is intended primarily for cable-free peripherals, such as
mice and keyboards. 802.11, or wireless Ethernet, is the standard for wireless
networking of computers, and will be discussed in more detail in Handout 4 (Network
Architecture).

Network Software Components

Network Operating Systems


Just as a computer cannot operate without a computer operating system, a network of computers cannot
operate without a network operating system. Without a network operating system of some kind,
individual computers cannot share resources, and other users cannot make use of those resources. This
section provides a general introduction to network operating systems (sometimes referred to as NOSs).
It describes the basic features and functions of a NOS and contrasts these with the capabilities of a stand-
alone operating system.

Novell's NetWare is the most familiar and popular example of a NOS in which the client computer's
networking software is added on to its existing computer operating system. The desktop computer needs
both operating systems in order to handle stand-alone and networking functions together.

Network operating system software is integrated into a number of popular operating systems including
Windows 2000 Server/Windows 2000 Professional, Windows NT Server/Windows NT Workstation,
Windows 98, Windows 95, and AppleTalk.

A computer's operating system coordinates the interaction between the computer and the programs
(applications) it is running. It controls the allocation and use of hardware resources such as:
• Memory
• CPU time
• Disk space
• Peripheral devices

In a networking environment, servers provide resources to the network clients, and client network
software makes these resources available to the client computer. The network and the client operating
systems are coordinated so that all portions of the network function properly.
Multitasking

A multitasking operating system, as the name suggests, provides the means for a computer to process
more than one task at a time. A true multitasking operating system can run as many tasks simultaneously as there are
processors (CPUs). If there are more tasks than processors, the computer must arrange for the available
processors to devote a certain amount of time to each task, alternating between tasks until all are
completed. With this system, the computer appears to be working on several tasks at once.

There are two primary forms of multitasking:


• Pre-emptive: In pre-emptive multitasking, the operating system can take control of the CPU
whenever it wants to, without the task's cooperation.
• Non-pre-emptive (cooperative): In non-pre-emptive multitasking, the task
itself decides when to give up the CPU. Programs written for non-pre-emptive
multitasking systems must include provisions for yielding control of the processor.
No other program can run until the non-pre-emptive program has given up control
of the processor.
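
Cooperative multitasking can be imitated with Python generators, where each yield is the point at which a task voluntarily gives up the processor. This is a toy sketch, not how an operating system is implemented; the round-robin order and the names task and run_cooperatively are assumptions made for illustration.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: it does one step, then yields the CPU voluntarily."""
    for step in range(1, steps + 1):
        log.append(f"{name}:{step}")
        yield  # the task itself decides when to give up control

def run_cooperatively(tasks):
    """Round-robin scheduler that can only switch when a task yields."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task's next yield
            ready.append(current)  # not finished: back of the queue
        except StopIteration:
            pass                   # finished: drop it

log = []
run_cooperatively([task("A", 2, log), task("B", 3, log)])
```

If a task never reached a yield, the scheduler would never regain control, which is exactly the weakness of the non-pre-emptive approach.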

Because the interaction between the stand-alone operating system and the NOS is ongoing, a pre-
emptive multitasking system offers certain advantages. For example, when the situation requires it, the
pre-emptive system can shift CPU activity from a local task to a network task.

Client software

In a stand-alone system, when the user types a command that requests the computer to perform a task,
the request goes over the computer's local bus to the computer's CPU. For example, if you want to see
a directory listing on one of the local hard disks, the CPU interprets and executes the request and then
displays the results in a directory listing in the window. In a network environment, however, when a
user initiates a request to use a resource that exists on a server in another part of the network, the request
has to be forwarded, or redirected, away from the local bus, out onto the network, and from there to the
server with the requested resource. This forwarding is performed by the redirector.

The redirector

A redirector processes forwarding requests. Depending on the networking software, this redirector is
sometimes referred to as the "shell" or the "requester." The redirector is a small section of code in the
NOS that:
• Intercepts requests in the computer
• Determines if the requests should continue in the local computer's bus or be
redirected over the network to another server
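
The redirector's decision can be sketched as a single function. The rule used here is an assumption made for illustration: names beginning with two backslashes are treated as network (UNC-style) paths, a small set of drive letters is treated as local, and any other drive letter is assumed to be mapped to a network resource. The function name redirect and the default drive letters are hypothetical.

```python
def redirect(path, local_drives=("C:", "D:")):
    """Decide whether a request stays on the local bus or goes to the network."""
    if path.startswith("\\\\"):
        # UNC-style name (\\server\share\...): hand it to the network.
        server = path[2:].split("\\", 1)[0]
        return ("network", server)
    drive = path[:2].upper()
    if drive in local_drives:
        return ("local", drive)
    # Assumption: any other drive letter is mapped to a network resource.
    return ("network", drive)
```

A request for a file on drive C: stays local, while a request naming a server is sent out onto the network without the user having to handle the connection.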

Redirector activity originates in a client computer when the user issues a request for a network resource
or service. Figure 1.6 shows how a redirector forwards requests to the network. The user's computer is
referred to as a client because it is making a request of a server. The request is intercepted by the
redirector and forwarded out onto the network. The server processes the connection requested by client
redirectors and gives them access to the resources they request. In other words, the server services - or
fulfils - the request made by the client.

Figure 1.6 – The operation of a redirector in the client operating system

Using the redirector, users don't need to be concerned with the actual location of data or peripherals, or
with the complexities of making a connection.

Server software

The role of the NOS on a server is to process and act upon requests from clients (redirectors) for network
resources managed by the server. For example, in Figure 1.7, a user is requesting a directory listing on
a shared remote hard disk. The request is forwarded by the redirector on to the network, where it is
passed to the file and print server containing the shared directory. The request is granted, and the
directory listing is provided.

Figure 1.7 – A request for a directory listing over a network

The server is also responsible for controlling the way in which resources are shared over the network.
Sharing is the term used to describe resources made publicly available for access by anyone on the
network. Most NOSs not only allow sharing, but also determine the degree of sharing. For example, an
office manager wants everyone on the network to be familiar with a certain document (file), so she
shares the document. However, she controls access to the document by sharing it in such a way that:
• Some users will be able only to read it
• Some users will be able to read it and make changes in it

Security models

It is the responsibility of the network administrator to ensure that network resources will be safe from
both unauthorised access and accidental or deliberate damage. Policies for assigning permissions and
rights to network resources are at the heart of securing the network.

Two security models have evolved for keeping data and hardware resources safe:
• Password-protected shares
• Access permissions
These models are also called "share-level security" (for password-protected shares) and "user-level
security" (for access permissions).

Implementing password-protected shares requires assigning a password to each shared
resource. Access to the shared resource is granted when a user enters the correct
password. In many systems, resources can be shared with different types of
permissions. The password-protected share system is a simple security method that
allows anyone who knows the password to obtain access to that particular resource.

Access-permission security involves assigning certain rights on a user-by-user basis. A user types a
user name and password when logging on to the network. The server validates this user name and password
combination and uses it to grant or deny access to shared resources by checking access to the resource
against a user-access database on the server. Access-permission security provides a higher level of
control over access rights. In share-level security it is easy for one person to give another person a
printer password; a person is much less likely to give away a personal password. Because user-
level security is more extensive and can determine various levels of security, it is usually the preferred
model in larger organizations.
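
The two models can be contrasted in a short sketch. All user names, passwords, resources and function names below are hypothetical, and passwords are kept in plain text only to keep the example small; a real system would store hashed passwords.

```python
# Share-level security: one password per shared resource.
SHARE_PASSWORDS = {"printer": "ink123"}

def share_level_access(resource, password):
    """Anyone who knows the resource's password gets in."""
    return resource in SHARE_PASSWORDS and SHARE_PASSWORDS[resource] == password

# User-level security: the server checks each user against an access database.
USER_PASSWORDS = {"abebe": "s3cret", "sara": "pa55"}
ACCESS_DATABASE = {
    "report.doc": {"abebe": {"read", "write"}, "sara": {"read"}},
}

def user_level_access(user, password, resource, action):
    """Validate the log-on, then look up this user's rights on the resource."""
    if user not in USER_PASSWORDS or USER_PASSWORDS[user] != password:
        return False  # log-on failed
    return action in ACCESS_DATABASE.get(resource, {}).get(user, set())
```

In the user-level model, two users with valid log-ons can hold different rights on the same document, which share-level passwords alone cannot express.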

Managing users
Network operating systems also allow a network administrator to determine which people, or groups of
people, will be able to access network resources. A network administrator can use the NOS to:
• Create user privileges, tracked by the network operating system, that indicate who gets to use
the network
• Grant or deny user privileges on the network
• Remove users from the list of users that the network operating system tracks

To simplify the task of managing users in a large network, NOSs allow for the creation of user groups.
By classifying individuals into groups, the administrator can assign privileges to the group. All group
members have the same privileges, which have been assigned to the group as a whole. When a new user
joins the network, the administrator can assign the new user to the appropriate group, with its
accompanying rights and privileges.
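
Group-based privileges can be sketched as two small tables. The group names, the privileges and the function names are hypothetical; the point is only that a user's rights are looked up through the group rather than stored per user.

```python
# Privileges are assigned to groups, not to individual users.
GROUP_PRIVILEGES = {
    "staff": {"read"},
    "managers": {"read", "write"},
}
USER_GROUPS = {}

def add_user(user, group):
    """Assign a new user to a group; the privileges come with the group."""
    if group not in GROUP_PRIVILEGES:
        raise ValueError(f"unknown group: {group}")
    USER_GROUPS[user] = group

def privileges_of(user):
    """Look up a user's privileges through the group they belong to."""
    return GROUP_PRIVILEGES[USER_GROUPS[user]]

add_user("hana", "staff")
add_user("dawit", "managers")
```

Changing the entry for a group changes the rights of every member at once, which is what makes groups convenient in a large network.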

Overview of NOSs

The major server-based network operating systems are Microsoft Windows NT 4 Server, Windows
2000 Server, Windows 2003 Server, Novell NetWare 3.x, 4.x and 5.x, and UNIX (including Linux and
Solaris). The principal peer-to-peer network operating systems are AppleTalk, Windows 95, 98, ME
and XP, and UNIX. Each operating system has its own strengths and weaknesses, and its own supporters
and detractors.

Network Applications

Computer networking has revolutionised the way people use computers. This section will briefly
examine some of the applications of computer networking that have led to this massive change. In
particular we will look at the Internet and electronic mail (or email).

The Internet

The Internet is a vast network of networks, the ultimate WAN, consisting of tens of
thousands of businesses, universities, and research organizations with millions of
individual users and using a variety of different network architectures.

What is now known as the Internet was originally formed in 1969 as a military network
called ARPAnet (Advanced Research Projects Agency network) as part of the United
States Department of Defence. The network opened to non-military users in the 1970s,
when universities and companies doing defence-related research were given access, and
flourished in the late 1980s as most universities and many businesses around the world
started to use the Internet. In 1993, when commercial Internet service providers were
first permitted to sell Internet connections to individuals, usage of the network grew
tremendously. There were millions of new users within months, and a new era of
computer communications began. Today, it is estimated that over 500 million people
use the Internet worldwide. The table below breaks this number down by region.

Continent        Number of Internet users
Africa           4.15 million
Asia/Pacific     143.99 million
Europe           154.63 million
Middle East      4.65 million
Canada & USA     180.68 million
Latin America    25.33 million
World Total      513.41 million

Every site on the Internet has an address, just as people have PO Box numbers at their local post office.
On the Internet, addresses are called URLs (Uniform Resource Locators). The host-name part of a URL is written as a number
of words separated by dots, for example www.yahoo.com. The word after the final dot (e.g. com) is the
domain of the address. The domain indicates the category of the web site. The table below lists some of
the more common categories of address on the Internet.

Domain type     Organisation type
edu             Educational institution
com             Commercial organisation
gov             Governmental
mil             Military
net             Network providers and support
org             Other organisations
country code    A country code, for example .et for Ethiopia, .uk for the United Kingdom
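
Extracting the domain from an address can be sketched with Python's standard urllib.parse module. The function name top_level_domain is hypothetical, and the handling of addresses written without "http://" is an assumption added so that a bare name such as www.yahoo.com also works.

```python
from urllib.parse import urlsplit

def top_level_domain(url):
    """Return the word after the final dot of the host name in a URL."""
    # urlsplit only recognises the host if the address starts with "scheme://" or "//".
    if "://" not in url:
        url = "//" + url
    host = urlsplit(url).hostname
    return host.rsplit(".", 1)[-1]
```

For www.yahoo.com the function returns com, and for an address ending in a country code it returns that code.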

The World Wide Web

The World Wide Web (WWW) is a way of browsing the information on the Internet in a pleasant, easy-to-understand
way. Text can be mixed with graphics, video, and audio to provide multimedia (i.e. many
different media) Internet content.

This is all made possible by using a special communications protocol, called the Hypertext Transfer
Protocol (HTTP). You may have noticed when using the Internet that many URLs begin with the letters
“http://” - this means that the page of information will be transmitted using the Hypertext Transfer
Protocol. Pages of multimedia Internet content are commonly written in a special language called
HTML (the Hypertext Markup Language).

Instant messaging

One of the more recent innovations in the use of the Internet is instant messaging. Using instant
messaging software two users in different parts of the world can take part in an on-line conversation
using their personal computers. Text typed at one computer will be “instantly” transmitted to the screen
of the other. Instant messaging provides much faster and more interactive communication than electronic
mail.

Electronic mail

When most people think of applications of the Internet they probably think first of electronic mail, or
email. Originally email was a way of sending simple text messages to different users over local area
networks. However, nowadays email can be used to send multimedia content such as audio, video or
even computer software to a user anywhere in the world.

Email is made possible by using the Simple Mail Transfer Protocol (SMTP). SMTP specifies how
electronic mail messages are exchanged between computers using TCP. In order to use email, it is
necessary to install software on both the sending and receiving computer. Email uses the client-server
method to allow mail to be exchanged. Client computers exchange messages with a mail server that is
responsible for ensuring that the message reaches its destination. On the server computer each user is
assigned a specific mailbox. This electronic mailbox is just like a normal PO Box – mail is stored there
until a user logs on to collect their mail. Each electronic mailbox has a unique email address. Email
addresses are divided into two parts: the user name and the name of the mail server. These two parts are separated
by an “@” character. For example, [email protected] is a valid email address. The user name is
“Elizabeth”, and the mail server that is responsible for collecting the mail is located at the computer
called “telecom.net.et”. In this case “telecom.net.et” is a mail server running at Ethiopian Telecom in
Addis Ababa. Remember that this computer name will also have an associated IP address to identify it
on the Internet.
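
Splitting an address into its two parts can be sketched as follows. The function name split_email_address and the sample address student@example.org are hypothetical, and the check is deliberately minimal: it only requires a non-empty part on each side of the "@".

```python
def split_email_address(address):
    """Split an address into (user name, mail server name) at the last '@'."""
    user, separator, server = address.rpartition("@")
    if not separator or not user or not server:
        raise ValueError(f"not a valid email address: {address!r}")
    return user, server

user_name, mail_server = split_email_address("student@example.org")
```

The mail server name obtained this way is the part that would then be resolved to an IP address.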

SMTP is the protocol used to send email on the Internet. The user receiving the email will need to use
another protocol to access the incoming mail from the mail server. Two different protocols exist for this
purpose: the Post Office Protocol (POP3) and the newer alternative, Internet Message Access Protocol
(IMAP).

Computer Network Topologies

A Network Topology is the arrangement in which computer systems or network devices are
connected to each other. Topologies may define both the physical and the logical aspects of the network. The
logical and physical topologies of a network may be the same or different.

Point-to-Point
A point-to-point network contains exactly two hosts, such as computers, switches, routers, or servers,
connected back to back using a single piece of cable. Often, the receiving end of one host is connected
to the sending end of the other, and vice versa.

If the hosts are connected point-to-point logically, there may be multiple intermediate devices between them. But
the end hosts are unaware of the underlying network and see each other as if they were connected directly.

Bus Topology
In a Bus topology, all devices share a single communication line or cable. A Bus topology may have
problems when multiple hosts send data at the same time. Therefore, a Bus topology either uses
CSMA/CD technology or designates one host as Bus Master to resolve the issue. It is one of the simplest
forms of networking, in which the failure of one device does not affect the other devices. But failure of the
shared communication line stops all devices from communicating.


Both ends of the shared channel have a line terminator. The data is sent in only one direction and, as soon
as it reaches the far end, the terminator removes the data from the line.

Star Topology
All hosts in a Star topology are connected to a central device, known as the hub device, using a point-to-point
connection. That is, there exists a point-to-point connection between each host and the hub. The hub device can
be any of the following:
• Layer-1 device such as a hub or repeater
• Layer-2 device such as a switch or bridge
• Layer-3 device such as a router or gateway

As with the shared cable in a Bus topology, the hub acts as a single point of failure. If the hub fails, connectivity of all hosts to all other
hosts fails. All communication between hosts takes place through the hub. Star topology is
inexpensive: to connect one more host, only one cable is required, and configuration is simple.

Ring Topology
In a ring topology, each host machine connects to exactly two other machines, creating a circular network
structure. When one host tries to communicate with or send a message to a host which is not adjacent to it,
the data travels through all intermediate hosts. To connect one more host to the existing structure, the
administrator may need only one extra cable.

Failure of any host results in failure of the whole ring. Thus, every connection in the ring is a point of
failure. Some designs employ a second, backup ring to guard against this.
Mesh Topology
In this type of topology, a host is connected to one or multiple hosts. Hosts may be in point-to-point
connection with every other host, or with only a few hosts.

Hosts in a Mesh topology also work as relays for other hosts which do not have direct point-to-point links.
Mesh topology comes in two types:
• Full Mesh: All hosts have a point-to-point connection to every other host in the
network. Thus, a full mesh of n hosts requires n(n-1)/2 connections in total, and each
new host needs a link to every existing host. It provides the
most reliable network structure among all network topologies.
• Partial Mesh: Not all hosts have a point-to-point connection to every other host. Hosts
connect to each other in some arbitrary fashion. This topology is used where we need
to provide reliability to only some of the hosts.
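
The link count for a full mesh is easy to check with a few lines of Python. The function names are hypothetical; the formula is the standard count of pairs among n hosts, n(n-1)/2.

```python
def full_mesh_links(hosts):
    """Total point-to-point links in a full mesh: one per pair of hosts."""
    return hosts * (hosts - 1) // 2

def links_added_by_new_host(existing_hosts):
    """A new host needs one link to every host already in the mesh."""
    return existing_hosts

# Growth of the link count as the mesh gets bigger.
link_counts = {n: full_mesh_links(n) for n in (2, 5, 10, 50)}
```

The count grows roughly with the square of the number of hosts, which is why full meshes are usually limited to small groups of critical devices.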
Tree Topology
Also known as Hierarchical Topology, this is the most common form of network topology in use
at present. This topology resembles an extended Star topology and inherits properties of the Bus topology.
It divides the network into multiple levels/layers. Mainly in LANs, a network
is divided into three layers of network devices. The lowermost is the access layer, where computers are
attached. The middle layer is known as the distribution layer, which works as a mediator between the upper layer
and the lower layer. The highest layer is known as the core layer, and is the central point of the network, i.e. the root
of the tree from which all nodes fork.

All neighboring hosts have a point-to-point connection between them. Similar to the Bus topology, if the
root goes down, then the entire network suffers, even though it is not the single point of failure. Every
connection serves as a point of failure, the failure of which divides the network into unreachable segments.

Hybrid Topology
A network structure whose design contains more than one topology is said to be a hybrid topology.
A hybrid topology inherits the merits and demerits of all its constituent topologies.

The above picture represents an arbitrary hybrid topology. The combined topologies may contain
attributes of Star, Ring, Bus, and Daisy-chain topologies. Most WANs are connected by means of a Dual-Ring
topology, and the networks connected to them are mostly Star topology networks. The Internet is the best
example of the largest Hybrid topology.

Computer Network Model


Network engineering is a complicated task, which involves software, firmware, chip level engineering,
hardware, and electric pulses. To ease network engineering, the whole networking concept is divided
into multiple layers. Each layer is involved in some particular task and is independent of all other
layers. But as a whole, almost all networking tasks depend on all of these layers. Layers share data
between them and they depend on each other only to take input and send output.
Layered Tasks
In the layered architecture of the Network Model, one whole network process is divided into small tasks. Each
small task is then assigned to a particular layer, which is dedicated to processing that task only. Every
layer does only specific work.
In a layered communication system, one layer of a host deals with the task done by, or to be done by, its
peer layer at the same level on the remote host. The task is initiated either by the layer at the lowest level
or by the layer at the topmost level. If the task is initiated by the topmost layer, it is passed on to the layer below it
for further processing. The lower layer does the same thing: it processes the task and passes it on to the next lower
layer. If the task is initiated by the lowermost layer, then the reverse path is taken.

Every layer groups together all the procedures, protocols, and methods which it requires to execute its piece
of the task. All layers identify their counterparts by means of an encapsulation header and trailer.
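
Encapsulation can be sketched with plain strings. The four layer names and the bracketed header and trailer markers are hypothetical placeholders chosen for readability; real protocols use binary headers, and not every layer adds a trailer.

```python
LAYERS = ["application", "transport", "network", "link"]  # top to bottom

def send_down(payload):
    """Each layer, from top to bottom, wraps the data in its own header and trailer."""
    for layer in LAYERS:
        payload = f"[{layer}-hdr]{payload}[{layer}-tlr]"
    return payload

def receive_up(frame):
    """Each peer layer, from bottom to top, strips the wrapper it recognises."""
    for layer in reversed(LAYERS):
        header, trailer = f"[{layer}-hdr]", f"[{layer}-tlr]"
        if not (frame.startswith(header) and frame.endswith(trailer)):
            raise ValueError(f"{layer} layer does not recognise this unit")
        frame = frame[len(header):len(frame) - len(trailer)]
    return frame
```

The receiving host undoes the wrapping in the reverse order, so each layer only ever reads the header written by its peer on the sending host.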
OSI Model
Open Systems Interconnection (OSI) is an open standard for all communication systems. The OSI model was
established by the International Organization for Standardization (ISO). This model has seven layers:

Application Layer: This layer is responsible for providing interface to the application user. This layer
encompasses protocols which directly interact with the user.
Presentation Layer: This layer defines how data in the native format of the remote host should be
presented in the native format of the local host.
Session Layer: This layer maintains sessions between remote hosts. For example, once user/password
authentication is done, the remote host maintains this session for a while and does not ask for
authentication again in that time span.
Transport Layer: This layer is responsible for end-to-end delivery between hosts.
Network Layer: This layer is responsible for address assignment and uniquely addressing hosts in a
network.
Data Link Layer: This layer is responsible for reading and writing data from and onto the line. Link
errors are detected at this layer.
Physical Layer: This layer defines the hardware, cabling, wiring, power output, pulse rate etc.
TCP/IP Model
The Internet uses the TCP/IP protocol suite, also known as the Internet suite. This defines the Internet Model, which
has a four-layer architecture. The OSI Model is a general communication model, but the Internet Model is
what the Internet uses for all its communication. The Internet is independent of its underlying network
architecture, and so is its Model. This model has the following layers:

Application Layer: This layer defines the protocol which enables user to interact with the network.
For example, FTP, HTTP etc.

Transport Layer: This layer defines how data should flow between hosts. The major protocol at this layer
is the Transmission Control Protocol (TCP). This layer ensures that data delivered between hosts is in order
and is responsible for end-to-end delivery.
Internet Layer: Internet Protocol (IP) works on this layer. This layer facilitates host addressing and
recognition. This layer defines routing.
Link Layer: This layer provides mechanism of sending and receiving actual data. Unlike its OSI
Model counterpart, this layer is independent of underlying network architecture and hardware.

TRANSMISSION MEDIA
Guided Media
The transmission media is nothing but the physical media over which communication takes place in
computer networks.
Magnetic Media
One of the most convenient ways to transfer data from one computer to another, even before the birth of
networking, was to save it on some storage media and physically carry it from one station to another.
Though this may seem old-fashioned in today’s world of high-speed internet, magnetic media still comes
into play when the size of the data is huge. For example, a bank has to handle huge amounts of customer
data and stores a backup of it at some geographically distant place for security reasons and to protect
it from unforeseen calamities. If the bank needs to move this huge backup, transferring it through the
internet may not be feasible: the WAN links may not support such high speeds, and even if they do, the
cost may be too high to afford.
In these cases, the data backup is stored onto magnetic tapes or magnetic discs and then physically
shipped to remote places.
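The trade-off described above can be checked with simple arithmetic. The sketch below compares the time needed to push a backup over a WAN link with the effective throughput of shipping the media. The figures (a 100 TB backup, a 100 Mbps link, a 24-hour courier) and the function names are illustrative assumptions, not values from the text.

```python
def transfer_hours(data_terabytes: float, link_mbps: float) -> float:
    """Hours needed to send `data_terabytes` over a link of `link_mbps`."""
    bits = data_terabytes * 1e12 * 8        # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6)      # Mbps -> bits per second
    return seconds / 3600


def effective_mbps(data_terabytes: float, courier_hours: float) -> float:
    """Average throughput achieved by physically shipping the media."""
    bits = data_terabytes * 1e12 * 8
    return bits / (courier_hours * 3600) / 1e6


# Hypothetical figures: 100 TB backup, 100 Mbps WAN link, 24-hour courier.
wan_hours = transfer_hours(100, 100)
tape_mbps = effective_mbps(100, 24)
print(f"WAN transfer takes about {wan_hours:.0f} hours")
print(f"Courier delivers about {tape_mbps:.0f} Mbps on average")
```

Under these assumed figures the network transfer takes roughly three months, while the shipped tapes correspond to a link of several gigabits per second.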
Twisted Pair Cable
A twisted pair cable is made of two plastic-insulated copper wires twisted together to form a single
medium. In the simplest arrangement, one wire carries the actual signal and the other is used as a ground
reference; in balanced (differential) signaling, as used by Ethernet, the two wires carry equal and
opposite signals. The twists between the wires help reduce noise (electromagnetic interference) and
crosstalk.

There are two types of twisted pair cables:


• Shielded Twisted Pair (STP) Cable
• Unshielded Twisted Pair (UTP) Cable
STP cables come with each twisted wire pair covered in metal foil. This makes them more immune to noise
and crosstalk.

UTP has seven categories, each suitable for a specific use. In computer networks, Cat-5, Cat-5e, and
Cat-6 cables are most commonly used. UTP cables are connected by RJ45 connectors.

Coaxial Cable
Coaxial cable has two copper conductors. The core wire lies in the center and is made of a solid
conductor. The core is enclosed in an insulating sheath. The second conductor is wrapped around the
sheath and is in turn encased in an insulating sheath. All of this is covered by a plastic cover.

Because of its structure, coaxial cable is capable of carrying higher-frequency signals than twisted
pair cable. The wrapped structure provides a good shield against noise and crosstalk. Coaxial cables
provide high bandwidth rates of up to 450 Mbps.
There are three categories of coax cables, namely RG-59 (Cable TV), RG-58 (Thin Ethernet), and
RG-11 (Thick Ethernet). RG stands for Radio Guide. Cables are connected using BNC connectors
and BNC-T connectors. A BNC terminator is used to terminate the wire at the far ends.

Fiber Optics
Fiber optics works on the properties of light. When a light ray strikes the boundary between a denser
and a less dense medium at the critical angle, it is refracted at 90 degrees to the normal, that is,
along the boundary; at larger angles it is reflected back entirely (total internal reflection). This
property is used in fiber optics. The core of a fiber optic cable is made of high-quality glass or
plastic. Light is emitted from one end, travels through the core, and at the other end a light
detector detects the light stream and converts it to electrical data.
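The critical angle mentioned above follows from Snell's law: sin(theta_c) = n_outer / n_core, where the core has the higher refractive index. The sketch below computes it; the refractive indices 1.48 (core) and 1.46 (surrounding cladding) are typical textbook values assumed for illustration and do not come from this text.

```python
import math


def critical_angle_deg(n_core: float, n_outer: float) -> float:
    """Critical angle (degrees from the normal) for light inside the core."""
    if n_outer >= n_core:
        # Total internal reflection only occurs going from dense to less dense.
        raise ValueError("n_core must be greater than n_outer")
    return math.degrees(math.asin(n_outer / n_core))


# Assumed refractive indices, for illustration only.
theta_c = critical_angle_deg(1.48, 1.46)
print(f"critical angle: {theta_c:.1f} degrees")
```

Rays that hit the core boundary at an angle larger than this value stay trapped inside the core, which is what lets the fiber guide light over long distances.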
Fiber optic cable provides the highest speeds of the guided media. It comes in two modes: single-mode
fiber and multimode fiber. Single-mode fiber can carry a single ray of light, whereas multimode fiber is
capable of carrying multiple beams of light.

Fiber optic cable also comes in unidirectional and bidirectional capabilities. Special types of
connectors are used to connect and access fiber optic cable. These can be Subscriber Connector (SC),
Straight Tip (ST), or MT-RJ.

Unguided Media
Wireless transmission is a form of unguided media. Wireless communication involves no physical link
between the communicating devices. Wireless signals are spread through the air and are received and
interpreted by appropriate antennas.
When an antenna is attached to the electrical circuit of a computer or wireless device, it converts the
digital data into wireless signals and spreads them within its frequency range. The receptor on the
other end receives these signals and converts them back to digital data.
Only a small part of the electromagnetic spectrum can be used for wireless transmission.

Radio Transmission
Radio frequencies are easier to generate, and because of their large wavelength they can penetrate walls
and similar structures. Radio waves can have wavelengths from 1 mm to 100,000 km and frequencies
ranging from 3 Hz (Extremely Low Frequency) to 300 GHz (Extremely High Frequency). Radio
frequencies are sub-divided into six bands.
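The wavelength and frequency limits quoted above are two views of the same range, linked by wavelength = c / f, where c is the speed of light. A minimal sketch that checks the figures (the function name is illustrative):

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second, in vacuum


def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in metres of an electromagnetic wave of the given frequency."""
    return SPEED_OF_LIGHT / frequency_hz


for label, frequency in [("3 Hz", 3), ("300 MHz", 300e6), ("300 GHz", 300e9)]:
    print(f"{label}: {wavelength_m(frequency):.6g} m")
```

A 3 Hz wave comes out at roughly 100,000 km and a 300 GHz wave at roughly 1 mm, matching the limits given in the text.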
Radio waves at lower frequencies can travel through walls, whereas higher radio frequencies travel in
straight lines and bounce back. The power of low-frequency waves decreases sharply as they cover long
distances. High-frequency radio waves have more power.
Lower frequencies such as the VLF, LF, and MF bands can travel along the ground, following the earth’s
surface, for up to 1000 kilometers.

High-frequency radio waves are prone to being absorbed by rain and other obstacles. They make use of the
ionosphere of the earth's atmosphere: high-frequency radio waves such as the HF and VHF bands are sent
upwards, and when they reach the ionosphere they are refracted back to the earth.

Microwave Transmission
Electromagnetic waves above 100 MHz tend to travel in a straight line, and signals can be sent over them
by beaming the waves towards one particular station. Because microwaves travel in straight lines, the
sender and receiver must be aligned strictly in line-of-sight.
Microwaves can have wavelengths ranging from 1 mm to 1 meter and frequencies ranging from 300 MHz to
300 GHz.

Microwave antennas concentrate the waves into a beam. As shown in the picture above, multiple
antennas can be aligned to reach farther. Microwaves have higher frequencies and do not penetrate
wall-like obstacles.
Microwave transmission depends highly on the weather conditions and the frequency in use.

Infrared Transmission
Infrared waves lie between the visible light spectrum and microwaves. They have wavelengths of 700 nm to
1 mm and frequencies ranging from 300 GHz to 430 THz.
Infrared is used for very short-range communication, such as between a television and its remote control.
Infrared travels in a straight line and is therefore directional by nature. Because of its high frequency
range, infrared cannot cross wall-like obstacles.

Light Transmission
The highest part of the electromagnetic spectrum that can be used for data transmission is light, or
optical signaling. This is achieved by means of a laser.

Because of the frequency that light uses, it tends to travel strictly in a straight line. Hence the sender
and receiver must be in line-of-sight. Because laser transmission is unidirectional, both a laser and a
photo-detector need to be installed at each end of the communication. A laser beam is generally 1 mm
wide, so it is a work of precision to align two distant receptors, each pointing at the other's laser source.

The laser works as the Tx (transmitter) and the photo-detector works as the Rx (receiver).

Lasers cannot penetrate obstacles such as walls, rain, and thick fog. Additionally, a laser beam is
distorted by wind, atmospheric temperature, or variations in temperature along its path.
Laser is safe for data transmission, as it is very difficult to tap a 1 mm wide laser beam without
interrupting the communication channel.

Database Management System

Chapter One: Introduction to database


Outlines
◼ Basic concepts and definition
◼ Data management levels
◼ Database applications
◼ Database Management System
◼ Database actors

Data management levels
◼ Data management passes through the different levels of
development along with the development in technology
and services.
◼ These levels can best be described by grouping them
into three stages of development.
◼ Even though there is an advantage and a problem
overcome at each new level, all methods of data
handling are in use to some extent.

Data Management levels
◼ The major three levels are:-
◼ Manual Approach

◼ Traditional File Based Approach

◼ Database Approach

1. Manual Approach
In the manual approach, data storage and retrieval follows
the primitive and traditional way of information handling
where cards and paper are used for the purpose.

Limitations of the Manual approach
◼ Prone to error
◼ Difficult to update, retrieve, integrate
◼ You have the data but it is difficult to compile the
information
◼ Limited to small size information
◼ Cross referencing is difficult

2. Traditional File Based Approach
◼ File based systems were an early attempt to
computerize the manual filing system.
◼ This approach is the decentralized computerized data
handling method.
◼ A collection of application programs perform services
for the end-users.
◼ In such systems, every application program that
provides service to end users defines and manages its
own data.

2. Traditional File Based Approach
◼ Such systems have a number of programs for each of the
different applications in the organization.
◼ Since every application defines and manages its own
data, the system is subject to serious data duplication
problems.
◼ A file, in the traditional file based approach, is a collection of
records which contain logically related data.

2. Traditional File Based Approach

Limitations of the Traditional File Based approach

◼ Separation or Isolation of Data


◼ Limited data sharing
◼ Duplication or redundancy of data
◼ Data dependency on the application
◼ Fixed query processing

Limitations of the Traditional File Based approach

◼ The most significant problem experienced by the
traditional file based approach of data handling is the
“update anomalies”. We have three types of update
anomalies:
◼ Modification Anomalies

◼ Deletion Anomalies

◼ Insertion Anomalies

Limitations of the Traditional File Based approach

◼ Modification Anomalies: a problem experienced when
one or more data values are modified in one application
program but not in others containing the same data set.
◼ Deletion Anomalies: a problem encountered when a
record set is deleted from one application but remains
untouched in other application programs.
◼ Insertion Anomalies: a problem encountered when
one cannot decide whether the data to be inserted is
valid and consistent with other similar data sets.
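The modification and deletion anomalies can be illustrated with two applications that each keep their own copy of the same record. This is a minimal sketch: the "billing" and "shipping" applications, the customer ID, and the field names are all hypothetical.

```python
# Each application keeps its own private copy of the customer record,
# as in the traditional file based approach.
billing_file = {"C001": {"name": "Customer A", "city": "City X"}}
shipping_file = {"C001": {"name": "Customer A", "city": "City X"}}


def copies_agree(customer_id: str) -> bool:
    """True when both applications hold identical data for the customer."""
    return billing_file.get(customer_id) == shipping_file.get(customer_id)


# Modification anomaly: the city is updated in one application only.
billing_file["C001"]["city"] = "City Y"
modification_anomaly = not copies_agree("C001")

# Deletion anomaly: the record is deleted in one application only.
del billing_file["C001"]
deletion_anomaly = "C001" in shipping_file and "C001" not in billing_file

print(modification_anomaly, deletion_anomaly)
```

With a single shared copy of the record, as in the database approach, neither inconsistency could arise.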

3. Database Approach

What is database?
◼ A database is an organized collection of logically related
data.
◼ A database is a shared collection of logically related data
designed to meet the information needs of an
organization.
◼ A database is a collection of logically related data, where
the logically related data comprise the entities, attributes,
relationships, and business rules of an organization's
information.

3. Database Approach

◼ A database also contains a description of the data, which
is called “Metadata”, “Data Dictionary”,
“Systems Catalogue”, or “Data about Data”.
◼ Metadata: Data that describe the properties or
characteristics of end-user data, and the context of
that data.

Benefits of the database approach

◼ Data can be shared


◼ Improved accessibility of data
◼ Redundancy can be reduced
◼ Quality data can be maintained
◼ Inconsistency can be avoided
◼ Transaction support can be provided
◼ Integrity can be maintained

Benefits of the database approach

◼ Security measures can be enforced


◼ Improved decision support
◼ Standards can be enforced
◼ Compactness
◼ Less labour
◼ Centralized information control

Limitations and risk of Database Approach
◼ Introduction of new professional and specialized
personnel.
◼ Complexity in designing and managing data
◼ The cost and risk during conversion from the old to the
new system
◼ High cost incurred to develop and maintain
◼ Complex backup and recovery services from the user's
perspective
◼ Reduced performance due to centralization
◼ High impact on the system when failures occur

Database applications
◼ Banking: transactions
◼ Airlines: reservations, schedules
◼ Universities: registration, grades
◼ Sales: customers, sales purchases
◼ Online retailers: order tracking
◼ Manufacturing: production, inventory, orders, supply
chain
◼ Human resources: employee records, salaries, tax
deductions

University database example
◼ Application program examples
◼ Add new students, instructors and courses

◼ Register students for courses and generate class rosters

◼ Assign grades to students

◼ Compute grade point averages(GPA)

◼ Generate transcripts

Database users and administrator
◼ Naive users

◼ Application programmer

◼ Sophisticated users

◼ Specialized user

◼ Online users

Database users and administrator
◼ Naive users
◼ Those who need not be aware of the presence of database
systems
◼ These are end users who work through menu-driven
applications

◼ Application programmer
◼ Are responsible for developing application programs/user interfaces
written in high level language

Database users and administrator
◼ Sophisticated users
◼ Are users familiar with the structure of the Database and
facilities of the DBMS.
◼ Have complex requirements
◼ Have higher level queries
◼ Are most of the time engineers, scientists, business analysts, etc

◼ Specialized user
◼ Who write specialized database applications that do not fit into the
traditional data-processing framework

◼ Online users
◼ Who may communicate with the database directly through online
Database Administrator
◼ A person or group in charge of implementing the database
system in an organization.
◼ The DBA has all privileges allowed by the database
management system and can assign privileges to or remove
them from users.

Database Management System(DBMS)

◼ Database Management System (DBMS): A software
system that is used to create, maintain, and provide
controlled access to user databases.
Examples
◼ Oracle

◼ MySQL

◼ SQL Server

Database
◼ Massive
◼ Persistent
◼ Safe
◼ Multi-user
◼ Convenient
◼ Efficient
◼ Reliable

Key people
◼ DBMS implementer
◼ Builds system
◼ Database designer
◼ Establishes schema
◼ Database application developer
◼ Writes programs that operate on a database
◼ Database administrator
◼ Loads data and keeps the system running smoothly

Fundamentals of Database Systems

Chapter 2: Database System Concepts and Architecture
Outlines
◼ Data Models, Schema and Instances

◼ DBMS Architecture and Data Independence

◼ Database Language and Interface

◼ The Database System Environment

◼ Classification of DBMS

DBMS Architecture
There are three levels
◼ External level/view level

◼ Conceptual level/logical level

◼ Internal level/physical/storage level

DBMS Architecture
◼ External Level: Users' view of the database. Describes
that part of database that is relevant to a particular user.
Different users have their own customized view of the
database independent of other users.
◼ Conceptual Level: Community view of the database.
Describes what data is stored in database and
relationships among the data.
◼ Internal Level: Physical representation of the database
on the computer. Describes how the data is stored in
the database.

DBMS Architecture
External schemas at the external level describe the
various user views. They usually use the same data model as
the conceptual level.
The conceptual schema at the conceptual level describes the
structure and constraints of the whole database for a
community of users. It uses a conceptual data model.
◼ Conceptual schema represents:-

◼ All entities, attributes and relationships


◼ constraints on the data
◼ Security and integrity information

DBMS Architecture

The internal schema at the internal level describes physical
storage structures and access paths. It typically uses a
physical data model.
◼ It covers:
◼ Data structures
◼ File organizations

ANSI-SPARC Three-level Architecture

◼ There are three levels

ANSI-SPARC Architecture and Database Design Phases

◼ The contents of the external, conceptual and internal
levels
◼ The purpose of the external/conceptual and the
conceptual/internal mappings

DBMS schemas at three levels:

Differences between Three Levels of ANSI-SPARC Architecture

Data independence
Logical Data Independence:
◼ Refers to immunity of external schemas to changes in
conceptual schema.
◼ Conceptual schema changes e.g. addition/removal of entities
should not require changes to external schema or rewrites of
application programs.
◼ The capacity to change the conceptual schema without
having to change the external schemas and their application
programs

Data independence
Physical Data Independence
◼ The ability to modify the physical schema without changing the
logical schema
◼ The capacity to change the internal schema without having to
change the conceptual schema
◼ Refers to immunity of conceptual schema to changes in the
internal schema
◼ In general, the interfaces between the various levels and
components should be well defined so that changes in some parts
do not seriously influence others

Database Languages
Data Definition Language (DDL)
◼ Allows the DBA or user to describe and name the entities, attributes
and relationships required for the application.
◼ Specification notation for defining the database schema
Data Manipulation Language (DML)
◼ Provides basic data manipulation operations on data held in
the database.
◼ Language for accessing and manipulating the data organized
by the appropriate data model
◼ DML is also known as a query language
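The split between DDL and DML can be seen in a few SQL statements. The sketch below runs them through Python's built-in sqlite3 module against an in-memory database; the `student` table and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (names and types of the attributes).
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL)"
)

# DML: insert, update, and query the data held in that schema.
conn.executemany(
    "INSERT INTO student (id, name, gpa) VALUES (?, ?, ?)",
    [(1, "Student A", 3.2), (2, "Student B", 3.8)],
)
conn.execute("UPDATE student SET gpa = 3.5 WHERE id = 1")
rows = conn.execute("SELECT name, gpa FROM student ORDER BY id").fetchall()
print(rows)
```

The CREATE TABLE statement changes the schema, while INSERT, UPDATE, and SELECT only touch the data stored under it.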

Database Languages
DML can be procedural or non-procedural
◼ Procedural DML: the user specifies what data is required and how
to get the data.
◼ Non-Procedural DML: the user specifies what data is required but
not how it is to be retrieved.
◼ SQL is the most widely used non-procedural query language
◼ Fourth Generation Language (4GL)

◼ Query Languages

◼ Forms Generators

◼ Report Generators

◼ Graphics Generators

◼ Application Generators
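To make the procedural versus non-procedural distinction concrete, the sketch below retrieves the same data twice: once with an explicit loop that spells out how to find the rows, and once with an SQL query that only states what is wanted. The `employee` data and the salary threshold are hypothetical, and sqlite3 is used only as a convenient SQL engine.

```python
import sqlite3

employees = [("A", 4000), ("B", 7000), ("C", 9000)]

# Procedural style: the program describes HOW to find the rows
# (scan every record, test it, collect the matches, sort them).
procedural = []
for name, salary in employees:
    if salary > 5000:
        procedural.append(name)
procedural.sort()

# Non-procedural style: the query states WHAT is wanted and the
# DBMS decides how to retrieve it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", employees)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE salary > 5000 ORDER BY name"
    )
]

print(procedural, declarative)
```

Both approaches return the same names; the difference lies in who decides the retrieval strategy.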

Data Model
What is data model?
◼ Data Model: a set of concepts to describe the
structure of a database, and certain constraints that the
database should obey.
◼ A data model is a description of the way that data is
stored in a database. Data model helps to understand
the relationship between entities and to create the most
effective structure to hold data.

Data Model…
Data Model is a collection of tools or concepts for
describing
◼ Data
◼ Data relationships
◼ Data semantics
◼ Data constraints
◼ The main purpose of Data Model is to represent the
data in an understandable way.

Categories of data models include
◼ Object-based
◼ Record-based
◼ Physical
Object-based Data Models
◼ Entity-Relationship
◼ Semantic
◼ Functional
◼ Object-Oriented

Data Model…

◼ We have three major types of data models


◼ Hierarchical Model
◼ Network Model
◼ Relational Data Model

Hierarchical Model
◼ The simplest data model
◼ Record type is referred to as node or segment
◼ The top node is the root node
◼ Nodes are arranged in a hierarchical structure, as a sort of
upside-down tree
◼ A parent node can have more than one child node
◼ A child node can only have one parent node
◼ The relationship between parent and child is one-to-many

Hierarchical Model
◼ Relation is established by creating physical link between
stored records (each is stored with a predefined access path to
other records)
◼ To add new record type or relationship, the database must be
redefined and then stored in a new form.

Department

Employee Job

Time Card Activity
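The one-parent rule of the hierarchical model can be sketched as a mapping from each record type to its single parent. The record types come from the diagram above, but the exact parent of Time Card and Activity is an assumption here (Time Card under Employee, Activity under Job), since the diagram's connecting lines did not survive.

```python
# Each record type has exactly one parent; the root has none.
# The parent assignments for "Time Card" and "Activity" are assumed.
PARENT = {
    "Employee": "Department",
    "Job": "Department",
    "Time Card": "Employee",
    "Activity": "Job",
}


def children(node: str) -> list:
    """Record types stored directly under `node` (one-to-many)."""
    return sorted(child for child, parent in PARENT.items() if parent == node)


def path_from_root(node: str) -> list:
    """The single predefined access path from the root down to `node`."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


print(children("Department"))
print(path_from_root("Time Card"))
```

Because every node has only one parent, there is exactly one access path to any record type, which is what makes the model simple but rigid.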

Hierarchical Model
Advantages of hierarchical data model:
◼ Hierarchical Model is simple to construct and operate on
◼ Corresponds to a number of natural hierarchically organized
domains, e.g., assemblies in manufacturing, personnel
organization in companies
◼ Language is simple; uses constructs like GET, GET UNIQUE,
GET NEXT, GET NEXT WITHIN PARENT etc.
Disadvantages of hierarchical data model:
◼ Navigational and procedural nature of processing
◼ Database is visualized as a linear arrangement of records
◼ Little scope for "query optimization"

Network Model
◼ Allows record types to have more than one parent, unlike the
hierarchical model
◼ A network data model sees records as set members
◼ Each set has an owner and one or more members
◼ Does not allow many-to-many relationships between entities directly
◼ Like the hierarchical model, the network model is a collection
of physically linked records.
◼ Allows member records to have more than one owner

Network Model

Department Job

Employee

Activity

Time Card

Network Model
Advantages of network data model:
◼ Network Model is able to model complex relationships and
represents semantics of add/delete on the relationships.
◼ Can handle most situations for modeling using record types
and relationship types.
◼ Language is navigational; uses constructs like FIND, FIND
member, FIND owner, FIND NEXT within set, GET etc.
Programmers can do optimal navigation through the database.
Disadvantages of network data model:
◼ Navigational and procedural nature of processing
◼ Database contains a complex array of pointers that thread
through a set of records.
◼ Little scope for automated "query optimization"
Relational model
Database = set of named relations (or tables)
◼ Attribute
◼ Tuple
Schema = structural description of the relations in a database;
a schema includes the name of each relation and the name and type of
each attribute
Instance = actual contents at a given point in time
NULL = special value for “unknown” or “undefined”

Relational model
◼ Developed by Dr. Edgar Frank Codd in 1970 (famous
paper, 'A Relational Model of Data for Large Shared Data
Banks')
◼ Terminology originates from the branch of
mathematics called set theory and relations
◼ Can define more flexible and complex relationships
◼ Viewed as a collection of tables called “Relations”, equivalent to a
collection of record types

Relational model…
◼ Relation: Two dimensional table
◼ Stores information or data in the form of tables with rows and columns
◼ A row of the table is called a tuple, equivalent to a record
◼ A column of a table is called an attribute, equivalent to a field
◼ Data value is the value of the Attribute
◼ Records are related by the data stored jointly in the fields of records in two
tables or files. The related tables contain information that creates the
relation
◼ The tables seem to be independent but are related somehow.
◼ No physical consideration of the storage is required by the user
◼ Many tables are merged together to come up with a new virtual view of the
relationship

Relational model…

Relational model…
◼ The rows represent records (collections of information about
separate items)
◼ The columns represent fields (particular attributes of a record)
◼ Conducts searches by using data in specified columns of one
table to find additional data in another table
◼ In conducting searches, a relational database matches
information from a field in one table with information in a
corresponding field of another table to produce a third table
that combines requested data from both tables
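The matching of a field in one table with the corresponding field in another is what SQL calls a join. A minimal sketch using Python's sqlite3 module follows; the `department` and `employee` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO employee VALUES (1, 'Employee A', 10), (2, 'Employee B', 20),
                                (3, 'Employee C', 10);
    """
)

# Match employee.dept_id against department.dept_id to build a third,
# combined result table with data from both.
joined = conn.execute(
    """
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
    """
).fetchall()
print(joined)
```

The result exists only as the output of the query; neither base table is changed.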

Relational model…
Properties of Relational Databases
◼ Each row of a table is uniquely identified by a primary
key composed of one or more columns
◼ Each tuple in a relation must be unique
◼ A group of columns that uniquely identifies a row in a
table is called a candidate key
◼ The entity integrity rule of the model states that no
component of the primary key may contain a NULL
value.

Relational model…
Properties of Relational Databases
◼ A column or combination of columns that matches the
primary key of another table is called a foreign key.
Used to cross-reference tables.
◼ The referential integrity rule of the model states that,
for every foreign key value in a table there must be a
corresponding primary key value in another table in the
database or it should be NULL.
◼ All tables are logical entities
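The referential integrity rule can be observed directly in a DBMS that enforces foreign keys. The sketch below uses Python's sqlite3 module; note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")

conn.execute("INSERT INTO employee VALUES (1, 10)")    # matches a primary key
conn.execute("INSERT INTO employee VALUES (2, NULL)")  # NULL is also allowed

try:
    conn.execute("INSERT INTO employee VALUES (3, 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("dangling foreign key rejected:", rejected)
```

A foreign key value is accepted when it matches an existing primary key or is NULL, and rejected otherwise.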

Relational model…
Properties of Relational Databases
◼ A table is either a BASE TABLE (Named Relation)
or a VIEW (Unnamed Relation)
◼ Only base tables are physically stored
◼ VIEWS are derived from BASE TABLES with SQL
instructions like:
◼ [SELECT .. FROM .. WHERE .. ORDER BY]
◼ A relational database is a collection of tables
◼ Each entity is stored in one table
◼ Attributes are fields (columns) in a table

Relational model…
Properties of Relational Databases
◼ Order of rows and columns is immaterial
◼ Entries with repeating groups are said to be un-normalized
◼ Entries are single-valued
◼ Each column (field or attribute) has a distinct name
◼ All values in a column represent the same attribute and
have the same data format

Building Blocks of the Relational Data Model

The building blocks of the relational data model are:
◼ Entities: real world physical or logical objects
◼ Attributes: properties used to describe each Entity or
real world object.
◼ Relationship: the association between Entities
◼ Constraints: rules that should be obeyed while
manipulating the data.

Building Blocks of the Relational Data Model

Entities (persons, places, things etc.) which the
organization has to deal with. Relations can also
describe relationships.
Example: student, employee, course, instructor
Attributes - the items of information which characterize
and describe these entities.

Fundamentals of Database Management
Systems

Chapter three : The ER Model


Outlines
◼ The high-level conceptual model
◼ Entities, Attributes, and Keys
◼ Relationships, Associations, and Constraints
◼ The ER Diagrams
◼ Mapping ER-models to relational tables

The E-R Model: overview
◼ An entity-relationship model (or E-R model) is a
detailed, logical representation of the data for an
organization or for a business area.
◼ The E-R model is expressed in terms of entities in the
business environment, the relationships (or
associations) among those entities, and the attributes
(or properties) of both the entities and their
relationships.
◼ An E-R model is normally expressed as an entity-
relationship diagram (or E-R diagram, or simply ERD),
which is a graphical representation of an E-R model.

The E-R Model

The E-R Model

Entity
◼ PRODUCT

◼ ORDER

◼ ITEM

◼ SUPPLIER

◼ SHIPMENT

Drawing tools
◼ Microsoft Visio
◼ Oracle Designer
◼ AllFusion ERwin
◼ Power Designer

Building Blocks of the Relational Data Model
Entity: A person, place, object, event, or concept in the
user environment about which the organization wishes to
maintain data.
Thus, an entity has a noun name. Some examples of each
of these kinds of entities follow:
Person: EMPLOYEE, STUDENT, PATIENT
Place: STORE, WAREHOUSE, STATE
Object: MACHINE, BUILDING, AUTOMOBILE
Event: SALE, REGISTRATION, RENEWAL
Concept: ACCOUNT, COURSE, WORK CENTER

Building Blocks of the Relational Data Model
◼ Entity type: A collection of entities that share
common properties or characteristics.
◼ Entity instance: A single occurrence of an entity type.

Types of entity
◼ Strong entity type: An entity type that exists
independently of other entity types

◼ Weak entity type: An entity type whose
existence depends on some other entity type.

Attribute
◼ Attribute: A property or characteristic of an entity or
relationship type that is of interest to the organization.

Types of Attributes
1. Simple (atomic) Vs Composite attributes
2. Single-valued Vs multi-valued attributes
3. Stored vs. Derived Attribute
4. Null Values

Types of Attributes…
(1) Simple (atomic) Vs Composite attributes
◼ Simple: contains a single value (not divided into sub
parts)
◼ E.g. Age, gender
◼ Composite: divided into sub parts (composed of other attributes)
◼ E.g. Name, address

Types of Attributes …
(2) Single-valued Vs multi-valued attributes
◼ Single-valued: has only a single value (the value may
change but there is only one value at any one time)
◼ E.g. Name, Sex, Id. No., color_of_eyes
◼ Multi-valued: has more than one value
◼ E.g. Address, dependent-name
◼ A person may have several college degrees

Types of Attributes…
Stored vs. Derived Attribute
◼ Stored: not possible to derive or compute
◼ E.g. Name, Address
◼ Derived: The value may be derived (computed) from
the values of other attributes.
Example
◼ Age (current year – year of birth)
◼ Length of employment (current date – start date)
◼ Profit (earning – cost)
◼ G.P.A (grade points / credit hours)
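The difference between stored and derived attributes can be sketched with a small class: the stored attributes are fields, while the derived ones are computed on demand using the formulas listed above. The `Employee` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    # Stored attributes: kept as data, cannot be computed from anything else.
    name: str
    year_of_birth: int
    start_date: date

    # Derived attributes: computed from stored attributes when needed.
    def age(self, today: date) -> int:
        return today.year - self.year_of_birth  # current year - year of birth

    def employment_days(self, today: date) -> int:
        return (today - self.start_date).days   # current date - start date


emp = Employee("Employee A", 1990, date(2015, 1, 1))
print(emp.age(date(2024, 6, 1)), emp.employment_days(date(2015, 1, 11)))
```

Because the derived values are never stored, they cannot become inconsistent with the attributes they depend on.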

Types of Attributes…

Null Values
◼ NULL applies to attributes which are not applicable or
which do not have values.
◼ You may enter the value NA (meaning not applicable)
◼ The value of a key attribute cannot be null.

Relationships
◼ Relationships are the glue that holds together the
various components of an E-R model.
◼ Intuitively, a relationship is an association
representing an interaction among the instances of one
or more entity types that is of interest to the
organization.
◼ Thus, a relationship has a verb phrase name.

Relationship type and instances
(a) Relationship type (Completes)

Relationship type and instances
(b) Relationship instances

Relationship type and instances
Relationship type: A meaningful association between
(or among) entity types.
Relationship instance: An association between (or
among) entity instances where each relationship
instance includes exactly one entity from each
participating entity type.

Degree of a Relationship

Degree: The number of entity types that participate in a
relationship.
Unary relationship: A relationship between the
instances of a single entity type.
Binary relationship: A relationship between the
instances of two entity types.
Ternary relationship: A simultaneous relationship
among the instances of three entity types.

Cardinality Constraints
◼ Cardinality constraint: Specifies the number of
instances of one entity that can (or must) be
associated with each instance of another entity.
◼ Minimum cardinality: The minimum number of
instances of one entity that may be associated with
each instance of another entity.
◼ Maximum cardinality: The maximum number of
instances of one entity that may be associated with
each instance of another entity.

Cardinality Constraints

Cardinality can be :-
◼ ONE-TO-ONE, e.g. Building - Location,

◼ ONE-TO-MANY, e.g. hospital - patient,

◼ MANY-TO-ONE, e.g. Employee - Department

◼ MANY-TO-MANY, e.g. Author - Book

Cardinality Constraints

◼ Example

Problem in ER Modeling
◼ While designing the ER model, one could face a
design problem called a connection
trap. Connection traps are problems arising from
misinterpreting certain relationships.
◼ There are two types of connection traps:
◼ Fan trap
◼ Chasm trap

Problem in ER Modeling
1. Fan trap:
◼ Occurs where a model represents a relationship between
entity types, but the pathway between certain entity
occurrences is ambiguous.
◼ May exist where two or more one-to-many (1:M)
relationships fan out from an entity. The problem can
be avoided by restructuring the model so that no
1:M relationships fan out from a single
entity while all the semantics of the relationships are
preserved.

1.Fan trap:
◼ Example

Fan trap …
◼ Problem: Which car (Car1 or Car3 or Car5) is used by
Employee 6 (Emp6) working in Branch 1 (Bra1)?
◼ Thus from this ER Model one can not tell which car is
used by which staff since a branch can have more than
one car and also a branch is populated by more than one
employee.
◼ Thus we need to restructure the model to avoid the
connection trap.

Fan trap…
◼ To avoid the Fan Trap problem we can go for
restructuring of the E-R Model. This will result in the
following E-R Model.

Chasm Trap

◼ Occurs where a model suggests the existence of a
relationship between entity types, but the pathway does
not exist between certain entity occurrences.
◼ May exist when there are one or more relationships with
a minimum multiplicity (cardinality) of zero forming
part of the pathway between related entities.

Chasm Trap…
Example

If we have a set of projects that are not currently active,
then we cannot assign a project manager to these
projects. So there are projects with no project manager,
which gives the participation a minimum value of zero.

Chasm Trap:
Problem: How can we identify which BRANCH is
responsible for which PROJECT? We know that, whether
the PROJECT is active or not, there is a responsible
BRANCH, but which one? Since the minimum participation
between EMPLOYEE and PROJECT is zero, we cannot identify
the BRANCH responsible for each PROJECT.

Chasm Trap…
◼ The solution to this chasm trap is to add
a direct relationship between the two end entities
(BRANCH and PROJECT).
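A minimal sketch of the chasm trap and its fix, using Python's built-in sqlite3 module. The schema is an assumption based on the description above: BRANCH has employees, an EMPLOYEE may manage projects, and a PROJECT may have no manager. The `branch_id` column on `project` stands for the added direct relationship. All names and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch   (branch_id TEXT PRIMARY KEY);
CREATE TABLE employee (emp_id TEXT PRIMARY KEY,
                       branch_id TEXT NOT NULL REFERENCES branch(branch_id));
CREATE TABLE project (
    project_id TEXT PRIMARY KEY,
    -- Minimum participation of zero: an inactive project has no manager.
    manager_id TEXT REFERENCES employee(emp_id),
    -- Added direct relationship between BRANCH and PROJECT.
    branch_id  TEXT NOT NULL REFERENCES branch(branch_id)
);
INSERT INTO branch VALUES ('Bra1'), ('Bra2');
INSERT INTO employee VALUES ('Emp1', 'Bra1');
INSERT INTO project VALUES ('P1', 'Emp1', 'Bra1'), ('P2', NULL, 'Bra2');
""")

# Path through EMPLOYEE only: the branch of P2 cannot be found.
via_manager = conn.execute("""
    SELECT p.project_id, e.branch_id
    FROM project p LEFT JOIN employee e ON e.emp_id = p.manager_id
    ORDER BY p.project_id
""").fetchall()

# Direct relationship: every project has a known branch.
direct = conn.execute(
    "SELECT project_id, branch_id FROM project ORDER BY project_id").fetchall()
print(via_manager)
print(direct)
```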

Constraints
◼ Domain Integrity: No value of an attribute may fall
outside its allowable limits (its domain).
◼ Entity Integrity: In a base relation, no attribute of a
primary key can be null.
◼ Referential Integrity: If a foreign key exists in a
relation, either the foreign key value must match a
candidate key value in its home relation or the foreign
key value must be null (foreign-key-to-primary-key
match-ups).
◼ Enterprise Integrity: Additional rules specified by the
users or database administrators of a database are
enforced.
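The first three constraints can be sketched with Python's built-in sqlite3 module. The tables, the age range of 18 to 65 and the helper `violates` are assumptions made for the example. Domain integrity is approximated with a CHECK constraint, entity integrity with a NOT NULL primary key, and referential integrity with a foreign key (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_id TEXT PRIMARY KEY NOT NULL);
CREATE TABLE employee (
    emp_id  TEXT PRIMARY KEY NOT NULL,              -- entity integrity
    age     INTEGER CHECK (age BETWEEN 18 AND 65),  -- domain integrity
    dept_id TEXT REFERENCES department(dept_id)     -- referential integrity
);
INSERT INTO department VALUES ('D1');
""")

def violates(sql, params):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

insert = "INSERT INTO employee (emp_id, age, dept_id) VALUES (?, ?, ?)"
domain_violation = violates(insert, ("E1", 200, "D1"))      # age out of range
entity_violation = violates(insert, (None, 30, "D1"))       # null primary key
referential_violation = violates(insert, ("E2", 30, "D9"))  # no such department
null_fk_rejected = violates(insert, ("E3", 30, None))       # null foreign key is allowed
print(domain_violation, entity_violation,
      referential_violation, null_fk_rejected)
```

Enterprise integrity rules are specific to an organization, so they are not shown here. They could be expressed with further CHECK constraints or triggers.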
Key constraints
◼ If tuples need to be unique in the database, then we need
to make each tuple distinct. To do this we need relational
keys that uniquely identify each tuple within a relation.
◼ Super Key: an attribute or set of attributes that uniquely
identifies a tuple within a relation.
◼ Candidate Key: a super key such that no proper subset of its
attributes is a super key within the relation. A candidate key has
two properties:
1. Uniqueness
2. Irreducibility
If a candidate key consists of more than one attribute, it is called a
composite key.
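The two properties can be checked mechanically against sample data, as in the sketch below. Keys are really properties of the relation schema, so a test against one set of rows can only refute a key, never prove it. The function names and the sample rows are assumptions made for illustration.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """Uniqueness plus irreducibility: no proper subset is a super key."""
    if not is_superkey(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False
    return True

staff = [
    {"emp_id": "E1", "name": "Abebe", "branch": "Bra1"},
    {"emp_id": "E2", "name": "Sara",  "branch": "Bra1"},
    {"emp_id": "E3", "name": "Abebe", "branch": "Bra2"},
]

print(is_superkey(staff, ["emp_id", "name"]))       # unique, but reducible
print(is_candidate_key(staff, ["emp_id", "name"]))  # emp_id alone suffices
print(is_candidate_key(staff, ["emp_id"]))
print(is_candidate_key(staff, ["name", "branch"]))  # composite key for these rows
```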
Key constraints
◼ Primary Key: the candidate key that is selected to
identify tuples uniquely within the relation.
◼ The entire set of attributes in a relation can be considered as the
primary key in the worst case.
◼ Foreign Key: an attribute, or set of attributes, within
one relation that matches the candidate key of some
(possibly the same) relation.
◼ A foreign key is a link between different relations, used to create
a view or an unnamed relation.

Relational languages and views
◼ The languages in relational database management
systems are the DDL, SDL, VDL and DML, which
are used to define or create the database and to
manipulate it.
◼ There are two kinds of relation in a relational
database.
◼ They differ in how the relation is created, used
and updated:

Relational languages and views
1. Base Relation
◼ A Named Relation corresponding to an entity in the
conceptual schema, whose tuples are physically stored
in the database.
2. View
◼ Is the dynamic result of one or more relational
operations operating on the base relations to produce
another, virtual relation. So a view is a virtual, derived
relation that does not necessarily exist in the database
but can be produced upon request by a particular user at
the time of request.
Relational languages and views
Purpose of a view
◼ Hides unnecessary information from users
◼ Provides powerful flexibility and security
◼ Provides a customized view of the database for users
◼ A view of one base relation can be updated.
◼ Updates on views derived from several relations are not allowed.
◼ Updates on views with aggregation and summary are not allowed.
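The difference between a base relation and a view can be sketched with Python's built-in sqlite3 module. The table `employee`, the view `sales_staff` and the sample rows are assumptions made for the example. The view hides the salary column and is recomputed from the base relation each time it is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base relation: its tuples are physically stored.
CREATE TABLE employee (
    emp_id TEXT PRIMARY KEY,
    name   TEXT NOT NULL,
    dept   TEXT NOT NULL,
    salary INTEGER NOT NULL
);
-- View: a stored query, not stored tuples; salary is hidden.
CREATE VIEW sales_staff AS
    SELECT emp_id, name FROM employee WHERE dept = 'Sales';
INSERT INTO employee VALUES ('E1', 'Abebe', 'Sales', 9000),
                            ('E2', 'Sara',  'HR',    8000);
""")

before = conn.execute("SELECT * FROM sales_staff ORDER BY emp_id").fetchall()

# A change to the base relation shows up in the view at the next request.
conn.execute("INSERT INTO employee VALUES ('E3', 'Hana', 'Sales', 9500)")
after = conn.execute("SELECT * FROM sales_staff ORDER BY emp_id").fetchall()
print(before)
print(after)
```

Which views can be updated varies between database systems. SQLite, used here only for convenience, treats views as read-only, so the sketch does not demonstrate updating through a view.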

Symbols to draw ERD

ERD example
