DLBCSL01 Course Book
Masthead
Publisher:
IU Internationale Hochschule GmbH
IU International University of Applied Sciences
Juri-Gagarin-Ring 152
D-99084 Erfurt
Mailing address:
Albert-Proeller-Straße 15-19
D-86675 Buchdorf
[email protected]
www.iu.org
DLBCSL01
Version No.: 001-2023-0324
Module Director
Prof. Dr. Paul Libbrecht
Table of Contents
Algorithms, Data Structures, and Programming Languages
Module Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Introduction
Algorithms, Data Structures, and Programming Languages 7
Signposts Throughout the Course Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Unit 1
Basic Concepts 12
1.1 Algorithms, Data Structures, and Programming Languages as the Basics of Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Unit 2
Data Structures 36
2.1 Advanced Data Structures: Queue, Heap, Stack, Graph . . . . . . . . . . . . . . . 36
2.3 Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Unit 3
Algorithm Design 58
3.1 Induction, Iteration, and Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Unit 4
Basic Algorithms 80
4.1 Traversing and Linearization of Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Unit 5
Measuring Programs 104
5.1 Type Inference and IDE Interactive Support . . . . . . . . . . . . . . . . . . . . . . . 104
Unit 6
Programming Languages 128
6.1 Programming Paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Unit 7
Overview of Important Programming Languages 150
7.1 Assembler and Webassembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Appendix 1
List of References 178
Appendix 2
List of Tables and Figures 184
Introduction
Algorithms, Data Structures, and
Programming Languages
Welcome
This course book contains the core content for this course. Additional learning materials can
be found on the learning platform, but this course book should form the basis for your
learning.
The content of this course book is divided into units, which are divided further into sections.
Each section contains only one new key concept to allow you to quickly and efficiently add
new learning material to your existing knowledge.
At the end of each section of the digital course book, you will find self-check questions.
These questions are designed to help you check whether you have understood the concepts
in each section.
For all modules with a final exam, you must complete the knowledge tests on the learning
platform. You will pass the knowledge test for each unit when you answer at least 80% of the
questions correctly.
When you have passed the knowledge tests for all the units, the course is considered fin-
ished and you will be able to register for the final assessment. Please ensure that you com-
plete the evaluation prior to registering for the assessment.
Good luck!
Learning Objectives
This course, Algorithms, Data Structures, and Programming Languages, will provide students
with a basic understanding of algorithms, data structures, and programming languages,
which are the foundations of computer programming. It will equip the students with a basic
understanding of how to represent algorithms in different ways and how to use control struc-
tures, such as loops, conditionals, and recursion, to write programs.
This course will provide the student with a basic understanding of data structures—the
building blocks of algorithms. Basic data structures, such as lists, chains, and trees, will be
covered. This will be followed by advanced data structures, such as stacks, queues, heaps,
and graphs. The concept of abstract data types (ADT) will be introduced for modeling data
structures. Students will be taught how to implement data structures using objects and
classes.
On completion of the course, the students will have an understanding of basic algorithms
and be able to apply them in practical situations. Students will be able to design and analyze
basic algorithms and apply suitable algorithms to problems arising in different applications.
Additionally, students will have gained a basic understanding of tree traversal, searching,
sorting, searching in strings, hashing, and pattern recognition algorithms.
The course will introduce various methodologies for proving the correctness, verification, and
testing of programs. On completion of the course, students will understand and be able to
apply various program measurement methodologies, as well as explain and compare various
programming paradigms and languages.
Unit 1
Basic Concepts
1. Basic Concepts
Introduction
The stepwise execution of a sequence of instructions to accomplish a given task is
ubiquitous in our daily lives. Cooking a dish based on a recipe, searching for a book on
a bookshelf in a library, or searching for the shortest route to a destination on a digital
map all involve stepwise execution of instructions, with or without the help of a com-
puter. In fact, algorithms and algorithmic computing existed long before the advent of
the modern-day digital computer. Ancient civilizations devised systematic methods, or
sequences of instructions, to carry out various tasks. Architects in antiquity, such as
those in ancient Egypt, devised various systematic geometric constructions using rul-
ers, compasses, and knotted strings. The celebrated algorithm for finding the “greatest
common divisor” of two whole numbers was suggested by the Greek mathematician
Euclid around 300 BCE. A popular algorithm for finding prime numbers, the “Sieve of
Eratosthenes,” is attributed to Eratosthenes of Cyrene, an ancient Greek polymath who
lived around the second century BCE. Around 250 BCE, the Greek mathematician Archi-
medes proposed an algorithmic procedure for computing an approximation of π using
the ratio of the circumference and diameter of a circle. Although several approxima-
tions of π had been proposed earlier, most were merely estimated constant values.
Archimedes was the first to suggest an iterative algorithm to compute the value of π,
with the accuracy increasing with each iteration. With the advent of digital computers
and the increasing complexity of problems solved by them, a systematic paradigm of
programming has emerged. For a given problem specification, an algorithm is designed.
Data structures will then provide a means of efficient storage, retrieval, and processing
of the data encountered by the algorithm. Programming languages then provide the
constructs to implement and map the design ideas in the algorithms and data struc-
tures to executable code.
In general terms, an algorithm is a finite, well-defined sequence of steps for accomplishing a given task. This definition is machine-independent and would also apply to a pen and paper execution.
The Electronic Numerical Integrator and Computer (ENIAC), built in 1946, was one of the
first general-purpose digital computers (O’Regan, 2018). Although ENIAC is regarded as
the first programmable digital computer, it did not have program storage capabilities.
ENIAC’s inventors, John Mauchly and John Presper Eckert, proposed its successor, the
Electronic Discrete Variable Automatic Computer (EDVAC). Most general-purpose com-
puters and computing as we know them today draw inspiration from the classical
architecture proposed by John von Neumann (von Neumann, 1945). EDVAC was based on this. This architecture defines what is known as a “stored-program model of computation.”

Architecture
An architecture describes a set of rules and specifications for how the software and hardware that comprise a computer system are organized and interact.

In the von Neumann architecture, a computer consists of the following (Liang, 2017):

• a main memory called random access memory (RAM). Instructions and data reside in the read-write main memory.
• a central processing unit (CPU) consisting of a control unit and an arithmetic and logic unit. The CPU fetches instructions and data from the main memory and performs operations on the data according to the instructions. Results of computations are then written back into the main memory.
• secondary storage units. The data stored in RAM are ephemeral and are no longer available once the system is switched off. Secondary storage units allow us to store data and programs permanently, to be retrieved as required.
• input-output (I/O) units. These include devices such as the keyboard, mouse, monitor, and printer, which allow the user to communicate with the computer.
There are often multiple algorithms for solving a problem. It is good practice to choose
one based on efficiency considerations, such as time or space requirements, or ease of
implementation.
The language in which the program is written is called a “programming language.” Com-
puters have a set of hardware-specific built-in instructions called the “machine lan-
guage.” So, a seemingly obvious choice is to map the algorithm to a machine language
program. Programming in machine language involves writing code in a binary number
system. That would not only be cumbersome, but such programs would also be hard to
read, comprehend, debug, and edit. To circumvent such problems, assembly languages
were created. An assembly language replaces machine language code with instructions
using mnemonics. The assembly language code can be translated into machine lan-
guage using an assembler. Assembly language is still difficult to work with, while also
being machine dependent. Programs are, therefore, more commonly written in plat-
form-independent languages known as “high-level languages.” Examples include
Python, Java, C, and C++, but there are many others. Programs written in high-level lan-
guages are translated into machine code using “compilers” and “interpreters.” Compil-
ers translate the whole code into machine language, whereas interpreters translate one
statement at a time. Java, C, and C++ are examples of compiled languages. Python and
Lisp are interpreted languages.
Example Program
Below is a simple Python program fragment for computing the greatest common divisor
(GCD) of two positive integers.
Gcd.py
first = eval(input('Enter first positive integer:'))
second = eval(input('Enter second positive integer:'))
answer = 1
divisor = 2
while ((divisor <= first) and (divisor <= second)):
    if(((first % divisor)==0) and ((second % divisor)==0)):
        answer = divisor
    divisor += 1
print(answer)
This program reads in two positive integers. These are entered as user input from the
terminal and stored in main memory as the variables “first” and “second.” The variable
names refer to storage locations where these values are stored. The program checks
the integers from two onwards as possible candidates for GCD. It continues this for as
long as the candidate divisor is less than or equal to the smaller of the two input num-
bers. The variable “answer” is another memory location where the program stores the
value of the common divisor found. The assignment statement
answer = divisor
overwrites the value at this location when a higher valued common divisor is disco-
vered. A common divisor cannot be larger than the minimum of the two numbers.
Hence, once this candidate divisor has been checked, the value of the answer as stored
in memory is printed out as the GCD of “first” and “second.”
Since there can often be multiple algorithms for the same problem, anyone implemen-
ting the algorithm is frequently faced with the problem of deciding which one to cho-
ose. Efficiency measures can assist us in making comparisons and arriving at a decision
regarding the choice of the algorithm. According to Cormen et al. (2009) two common
measures of efficiency are “space complexity,” a measure of the amount of memory the
algorithm needs, and “time complexity,” a measure of how fast the algorithm runs.
There are also other types of complexity measures. The message complexity in distribu-
ted algorithms is an example. Programs also use efficient means of structuring data. To
measure the efficiency of a data structure, we measure the space used by the data
structure (space complexity), the time taken to build the data structure (preprocessing
time), the time taken to run a particular query on the data structure (query time), or
the time taken to update the data structure (update time).
When measuring the actual time required by an algorithm, there are some challenges.
The time required would depend on several factors, such as the machine used, the
software environment, and the data set used. Of course, the algorithm must be pro-
grammed first. Algorithm analysis involves computing efficiency measures from the
pseudocode by counting “primitive operations,” such as assignments, comparisons,
arithmetic operations, function calls, and returns from functions. A basic assumption is
that a primitive operation corresponds to, at most, a constant number of instructions
on the computer, thus actual times taken by different primitive operations are similar.
Hence, the actual time taken by the algorithm is proportional to the number of primi-
tive operations. We call the time taken for an algorithm to run the “running time”
or “time complexity,” and we measure it in terms of the “input size.” Consider the following Python code for linear search:

Linear search
A linear search is a search algorithm to locate a key value in a sequence of unordered elements by comparing the key with elements in the sequence one after the other in the same order.

linearSearch.py
def linSearch(numList, keyValue):
    index = 0
    while(index < len(numList)):
        if(keyValue == numList[index]):
            return index
        index += 1
    return -1
To analyze the algorithm, note that the two statements index = 0 and return -1 are always executed. If the list has n elements (i.e., len(numList) is equal to n), the while loop is executed at most n times. The statement while(index < len(numList)) includes a function call to len() and a comparison. The statement if(keyValue == numList[index]) includes a == comparison and a read from a specified position in the list. The statement return index adds one more primitive operation. Finally, the increment and assignment index += 1 may be counted as one or two operations. We can summarize and claim that the running time of linear search is n · k + 2, where k is a small constant and n is the size of the list being searched. There is a better search
algorithm, called “binary search,” which takes time proportional to log n. These two time complexities are customarily specified as O(n) and O(log n), respectively, using the notation for an “asymptotic upper bound” (described below), which guarantees that these bounds hold for all sufficiently large values of n.
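As a contrast to linSearch above, the following is a minimal illustrative sketch of binary search on a sorted Python list; the function and variable names are chosen here for illustration and are not taken from the course text:

binSearch.py
def binSearch(sortedList, keyValue):
    low = 0
    high = len(sortedList) - 1
    while(low <= high):
        mid = (low + high) // 2           # middle position of the remaining range
        if(sortedList[mid] == keyValue):
            return mid                    # key found at position mid
        elif(sortedList[mid] < keyValue):
            low = mid + 1                 # continue in the upper half
        else:
            high = mid - 1                # continue in the lower half
    return -1                             # key not present

print(binSearch([2, 5, 7, 11, 13, 17], 11))   # prints 3

Each iteration halves the remaining search range, which is where the log n bound comes from.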
There are situations when more than one parameter is used to specify the time or space complexity. Conventionally, the complexity of graph algorithms is specified in terms of the number of vertices |V| and the number of edges |E|. Also, there are situations when the complexity may be measured in terms of an input value itself rather than the number of such values. The GCD algorithm presented above runs in time O(min(m, n)), where m and n are the numbers whose GCD is being computed. There is a better algorithm for the problem, attributed to Euclid, that runs in O(log(min(m, n))) time (Cormen et al., 2009).
Specifying Algorithms
Natural language
Using natural language to represent algorithms is a seemingly attractive choice. Howe-
ver, natural language is inherently ambiguous and, by definition, algorithms need to be
represented in clear unambiguous steps. Therefore, this turns out to be an impractical
choice. At the same time, an additional natural language description of an algorithm
sometimes complements or augments other forms of representations of the algorithm
and improves clarity of understanding for the reader. This is an approach often followed in books, research papers, and technical documents. Despite its natural drawbacks
of being ambiguous, natural language is an advantage when the details of an algorithm
need to be communicated to people who may not be familiar with programming. Below is a simple natural language description of Euclid’s algorithm for finding the GCD of two non-negative numbers: read the two numbers and call the larger one m and the smaller one n; as long as n is not zero, divide m by n, then replace m by n and n by the remainder of the division; when n becomes zero, m holds the greatest common divisor.
Pseudocode
As a method of representation of algorithms, pseudocode comes somewhere in between a natural language description and a program written in a high-level language. It is a more precise representation of the algorithm, but usually at a level higher than that of the program. However, since there are no standardized notations to represent pseudocode, people follow their own conventions. It is assumed anyone with some knowledge or background in programming would be able to understand the algorithm. Yet pseudocode has the advantage of being programming language agnostic. The pseudocode describing Euclid’s algorithm for finding the GCD can take the following form:
GCD
begin
    read a, b
    m ← maximum(a, b)
    n ← minimum(a, b)
    while (n ≠ 0)
        r ← m mod n
        m ← n
        n ← r
    endwhile
    return m
end
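For comparison, here is a minimal Python sketch of the same algorithm; the function name euclidGCD is chosen for illustration:

euclidGCD.py
def euclidGCD(a, b):
    m = max(a, b)
    n = min(a, b)
    while(n != 0):
        r = m % n     # remainder of m divided by n
        m = n
        n = r
    return m

print(euclidGCD(48, 36))   # prints 12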
Flowcharts
Presenting an algorithm as a flowchart was in fashion in the early days of computing. It
is still a useful tool to teach or present simple algorithms. The major disadvantage of
flowcharts is that they do not scale up well to more complex problems. Below is the
flowchart version of Euclid’s GCD algorithm.
Various attributes of a good language have been identified over the years (Pratt & Zelkowitz, 2001; Sebesta, 2016). These include
• clarity and simplicity. The simpler the programming language, the easier it is for the
algorithm designer to map the algorithms to programs. The algorithm can then be
specified with a pseudocode very close to the target language itself.
• expressivity. These include powerful features in the language that allow the pro-
grammer to express solutions to problems in clear and natural ways.
• orthogonality. Fewer numbers of primitive and independent constructs and a set of
rules for combining them in all possible ways can make a language more convenient
to use. If constructs of a language are orthogonal, the language is easy to learn.
Exceptions do not need to be learned, since virtually all combinations are allowed.
• support for abstraction. The data structures required for the problem being solved
are often different from what is provided in terms of the built-in types. It is the
responsibility of the software developer to create appropriate abstractions required
for the solution. Implementing these abstractions requires support from the lan-
guage. For example, Python provides support for object-oriented programming.
• portability or transportability across machines.
• cost of use.
Data abstractions allow us to use a data type without details of how it is implemented
(Sebesta, 2016). A data abstraction is defined in terms of the associated defining opera-
tions.
In data structure design, data abstraction is supported through abstract data types (ADTs). The ADT for a data structure specifies what is stored in the data structure and what operations are supported without detailing how the operations are implemented. “Objects” are instances of ADTs.

Abstract data type
An abstract data type is a mathematical model of a data structure that identifies the type of data stored in the data structure and the operations allowed.

The primary goal of data abstraction is hiding unwanted information, whereas data encapsulation refers to hiding data within an entity along with methods to control access. Since the representation can be manipulated by a controlled set of operations defined by a programmer, only these limited sets of defining operations depend on the internal representation. If the representation is updated, only the limited set of defining operations needs to change. Data encapsulation also helps the programmer to ensure that private data rules are enforced. These rules are called “representation invariants.” For instance, one may enforce a rule that an ordered list may contain only unique items. If defining operations can only generate objects that follow the representation invariant, then that leads to correct-by-construction implementation—the user cannot create objects that violate these rules.
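As a small illustrative sketch, not taken from the course text, the following hypothetical class enforces the representation invariant just mentioned (an ordered list without duplicates); the class and method names are chosen purely for illustration:

uniqueOrderedList.py
class UniqueOrderedList:
    def __init__(self):
        self._items = []          # internal representation, hidden from users
    def add(self, elem):
        # Defining operation: inserts elem only if it is not already present,
        # keeping the items sorted, so the invariant always holds
        if elem in self._items:
            return
        self._items.append(elem)
        self._items.sort()
    def contents(self):
        return list(self._items)  # return a copy, not the internal list

L = UniqueOrderedList()
L.add(3); L.add(1); L.add(3)
print(L.contents())               # prints [1, 3]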
Arithmetic Expressions
Operator evaluation
The control flow in order of evaluation of the operators partly depends on the estab-
lished “precedence of operators” as defined by the programming language. For
instance, consider the Python expression 3+4*2. Here, the multiplication operation 4*2
is carried out first, since the multiplication operator * has a higher precedence than the
addition operator +.
Rules for associativity in the programming language govern the order of evaluation of
operators of the same precedence. For example, consider the Python expression
2**3**2. Here, ** is the exponentiation operator, which associates right to left. Hence
the expression evaluates to 512 and not 64.
Parentheses can be used to alter the implied order of evaluation as determined by the
precedence and associativity rules. The expression (2**3)**2 will evaluate to 64 in
Python because the parentheses override the rules for associativity.
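These rules can be checked directly in the Python interpreter:

print(3 + 4 * 2)      # 11: * has higher precedence than +
print(2 ** 3 ** 2)    # 512: ** associates right to left, i.e., 2 ** (3 ** 2)
print((2 ** 3) ** 2)  # 64: parentheses override the associativity rule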
Assignment Operator
The assignment operator, in its simplest variant, takes the following form:
variable = expression
This requires the expression on the right to be evaluated first before the assignment
takes place.
Compound assignments
In many languages, compound assignment operators are supported. Consider the follo-
wing Python assignments:
a = 2
a *= 3 #equivalent to a = a*3
print(a)
Multiple assignments
Consider the following assignment statement in Python:

a = b = c = 1

This chained assignment is equivalent to the sequence

c = 1
b = c
a = b

Python also supports simultaneous (tuple) assignment:

a, b, c = 1, 2, 3

which is equivalent to

a = 1
b = 2
c = 3

A simultaneous assignment such as

x, y = y, x

swaps the values of x and y in a single statement.
Boolean expressions also involve the logical operators or, and, and not. A Boolean
expression evaluates to True or False. Logical operators in general work on Boolean
operands.
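For example, the following expressions evaluate as indicated in the comments:

x = 7
print((x > 0) and (x < 10))     # True: both operands are True
print((x % 2 == 0) or (x > 5))  # True: the second operand is True
print(not (x == 7))             # False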
Conditional Statements
if(condition):
    statement

if((x % 3)==0):
    print("Divisible by 3")
Here the intent is to do nothing if the conditional expression is not true. Conditional
statements may come with two alternatives. For example, in Python:
if(condition):
    Statement 1
else:
    Statement 2

if(x%2 == 0):
    print("Even")
else:
    print("Odd")
if(condition 1):
    Statement 1
elif(condition 2):
    Statement 2
...
elif(condition n):
    Statement n
else:
    Statement n+1

if(a == b):
    print("a equals b")
else:
    if(a < b):
        print("a is less than b")
    else:
        print("a is greater than b")
Conditional statements can often be written in many equivalent ways.
For example, the nested conditional

if(x > 0):
    if(x < 100):
        statement

executes the statement only when both conditions hold and is equivalent to a single if whose condition combines the two tests with and.
Iterative Loops
Built-in functions
Languages provide several built-in functions, for example, print(), type(), input(), max(),
and min() in Python. Consider the example of the max() function:
>>>max(2,3)
3
>>>max(2,3,4)
4
>>>max(max(2,3),4)
4
>>>max("abc","bcd")
'bcd'
>>>max(2,"two")
TypeError
User-defined functions
User-defined functions help in code reuse and in organizing and simplifying code. A
simple Python example is
def plus5(a):
    return(a+5)

>>>plus5(7) #returns 12
Multiple arguments
Functions may have multiple arguments, such as
maxmin1.py
def maximinOf3(x, y, z):
    max3 = max(max(x,y),z)
    min3 = min(min(x,y),z)
    return(max3, min3)

print(maximinOf3(15,23,12))
print(maximinOf3(15,23.1,12.5))
No return values
The following example, maxmin2.py, demonstrates a “void” function, which returns nothing. Functions that return something are called “fruitful.”
maxmin2.py
def maximinOf3(x, y, z):
    max3 = max(max(x,y),z)
    min3 = min(min(x,y),z)
    print(max3, min3)

maximinOf3(15, 23, 12)
Recursion
Recursion is a mechanism wherein functions invoke themselves. It often leads to elegant solutions, since some problems can be modeled recursively in a natural way. Recursive functions have a base case that enables the recursion to terminate. Consider the following factorial function:
factorial(n) = 1                        if n = 0
factorial(n) = n · factorial(n − 1)     if n ≥ 1
fact.py
def fact(n):
    if (n==0):
        return 1
    else:
        return n*fact(n-1)
Without the base case under the if clause, the function would run indefinitely, causing
a runtime error.
Type
A type is defined by a set of values and a set of operations that operate on those val-
ues. There are language-specific constraints on the usage of types in a program. A vari-
able of a type can only be operated on by operations defined on the type. A type, in
turn, attaches specific meanings to an entity in a program, such as a variable. The hard-
ware would not discriminate between meanings associated with a sequence of bits,
that is, whether it is to be interpreted as a string, integer, or character. However, the
programming language defines the operations that can be done on the sequence of
bits, and the execution of the program translates into microprocessor instructions that
manipulate those bits.
Utility of Types
Types have several utilities. Types assist in the hierarchical conceptualization of data.
For instance, “employee ID” and “salary” could both be integers. Computing the sum or
average is fine for salaries, whereas it would not make sense for the employee ID.
Defining separate types for these would require an integer field in both, but a different
set of operations could be defined.
Types also ensure correctness. The type system defines rules of usage, which are
checked. For instance, the “+” operation in C would represent the addition of numeric
types like integers and floating-point numbers. Trying to add two strings would flag an error.
Types also define the amount of storage that needs to be allocated. For example, a
“char” in C would require one byte of storage. Sometimes, the sizes for different types
vary for different implementations of the language.
Type Systems
The type system of a programming language is a logical system defined with a set of
constructs to assign types to entities like variables, expressions, or return values of
functions (Gabrielli & Martini, 2010). The type system defines the set of built-in types
for the language, provides the constructs for defining new types, and defines rules for
control of types. There are rules for type compatibility; for example, if a function
expects an argument to be a floating-point number, will an integer value for the argu-
ment be allowed? Another set of rules defines how the type of an expression is compu-
ted from the types of its constituents.
Fundamental Types
Some fundamental types are supported by the language. These usually correspond to
the most common and basic ways of structuring data. The set of built-in or fundamen-
tal types varies from language to language. For instance, int (for integers), bool (for
Booleans), char (for characters), and float (single-precision floating-point) are some
of the built-in types in C++. There are also more specific types, such as the signed and
unsigned variants of char or int, or the long and short variants of int. Python built-in
types include str (strings), int, float, list, tuple, range, dict (dictionaries), set,
and bool (Booleans).
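A quick interactive check of some of these built-in types using Python's type() function:

print(type(42))          # <class 'int'>
print(type(3.14))        # <class 'float'>
print(type("hello"))     # <class 'str'>
print(type([1, 2, 3]))   # <class 'list'>
print(type((1, 2)))      # <class 'tuple'>
print(type({1, 2, 3}))   # <class 'set'>
print(type({"a": 1}))    # <class 'dict'>
print(type(True))        # <class 'bool'>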
User-Defined Types
Creating new user-defined types allows the programmer to write programs with the
new types closely aligned with the concepts of the application. This helps in writing
more concise code and makes the program more readable. Moreover, illegal usage of
objects can be detected at compile time, greatly simplifying testing. Type casting, the
facility in languages to change one data type to another, gives added flexibility to the
programmer for type management. This is also known as type conversion.
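For instance, Python's built-in conversion functions int(), float(), and str() perform explicit type casting:

x = int("42")      # string to integer
y = float(7)       # integer to float
z = str(3.5)       # float to string
print(x, y, z)     # 42 7.0 3.5
print(int(3.9))    # 3: conversion to int truncates toward zero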
The type system of a programming language lays down a set of rules that the programs
written in that language must follow. These rules constrain the set of valid programs
that can be written, but are these rules strong enough to ensure that the valid pro-
grams do not have type errors? The extent to which this can be guaranteed defines
whether the type system is strong or weak. A language with a strong type system is
classified as “strongly typed” (Sebesta, 2016). Languages that are not strongly typed are
“weakly typed.” Note that these definitions are not precise and there are different view-
points on the relative strengths of languages. In general, however, an overly restrictive
type system may be easier to check but may severely restrict the set of legal programs.
This may also require the user to write more code to ensure type safety, for instance, by
explicit conversions using type casting. This is a trade-off that language designers must
keep in mind.
Statically typed languages obey a static type system, meaning the checking of the type
system rules is accomplished at compile time. Declaring all variables with designated
types and requiring that expressions have well-defined types are ways to ensure that
type safety can be verified at compile time.
In languages with dynamic typing, the checking of the type system rules is conducted at run time. Dynamic checking slows down program execution. In a dynamically typed language, a variable may be bound to an object (but not to a type) during compilation, but the binding to a type is delayed until run time.

Static typing implies a strongly typed language, although the converse is not true. Java is a statically typed language and Python is dynamically typed, but both are regarded as strongly typed languages.
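A small illustration of dynamic typing in Python, where a name can be rebound to objects of different types at run time:

x = 42
print(type(x))    # <class 'int'>
x = "forty-two"   # the same name now refers to a str object
print(type(x))    # <class 'str'>
# Python is nevertheless strongly typed: an expression such as x + 1
# would now raise a TypeError instead of being converted silently.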
Word and byte sizes are determined by the underlying hardware. Common word sizes are 32 bits and 64 bits, as determined by the manufacturer. The type of an operand determines its size (Hennessy & Patterson, 2017). There are
multiple views on words and bytes, such as
• logical. This is viewed as a string of bits. There are bitwise operators that act accor-
ding to this view.
• integer. This can be operated on according to rules of arithmetic operations. Two’s
complement representation is the most common representation for signed integers.
• floating-point. The operations are the same as for integers, but the word is divided
into the sign bit, the mantissa, and the exponent. The mantissa represents the
actual bits of the floating-point number, and the exponent represents the power of
the radix (in this case two) in the scientific notation. For instance, 25.375 in binary is 11001.011 = 1.1001011 · 2⁴, where 1.1001011 is the mantissa and the unbiased exponent is 100 (or 4 in decimal). The exponent is usually stored after adding what is called a “bias.” Since the mantissa always starts with a 1, often only the rest of it, called
the normalized mantissa, is stored. Usually, hardware manufacturers follow IEEE 754,
which is the technical standard for floating-point arithmetic. Single-precision floa-
ting-point usually uses 32 bits and double precision uses 64 bits for representation.
• character. The view represents a character code like 8-bit ASCII, 16-bit Unicode, or
32-bit Unicode.
1. If the machine is Big Endian, the most significant or the leftmost byte of the multi-
byte data is stored first (at the lowest address).
2. If the machine is Little Endian, the least significant or the rightmost byte of the
multi-byte data is stored first (at the lowest address).
import sys
print(sys.byteorder)   # prints 'little' on a Little Endian machine
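To make the byte ordering visible, the bytes of a multi-byte integer can also be inspected directly; a small illustrative snippet:

n = 0x12345678
print(n.to_bytes(4, byteorder="little").hex())  # 78563412: least significant byte first
print(n.to_bytes(4, byteorder="big").hex())     # 12345678: most significant byte first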
List
The “list” or the “singly linked list” is an unordered sequence of items. We need to be
able to maintain the relative positions of these items, so we call the first and last ele-
ments of the list the head and tail of the list, respectively. The location of the head of
the list is explicitly known, and the location of the i + 1-th item in the sequence is sto-
red with the i-th item. There is no next item corresponding to the last item on the list.
We will construct a Python implementation of a list data structure below, and, for
simplicity, we will assume that our lists cannot contain duplicate items. Note that
native Python lists are implemented using arrays and are different from linked lists.
Structure
The linked list is built as a collection of basic building blocks called nodes. Each node
stores two fields—a data element and a next node information. A Python implementa-
tion is shown below:
sList.py
class Node:
    def __init__(self, elem):
        self.element = elem
        self.nextNode = None
    def getElement(self):
        return self.element
    def getNextNode(self):
        return self.nextNode
    def setElement(self, elem):
        self.element = elem
    def setNextNode(self, elem):
        self.nextNode = elem
Supported operations
We will construct a list data structure that supports the following operations:
sList.py
class LinkedList:
    def __init__(self):
        self.length = 0
        self.head = None
    def isEmpty(self):
        return (self.length==0)
    def getLength(self):
        return self.length
    def addNode(self,elem):
        temp=Node(elem)
        temp.setNextNode(self.head)
        self.head=temp
        self.length +=1
    def removeNode(self, elem):
        # Method header and search loop reconstructed; not shown in this excerpt
        lastNode = None
        thisNode = self.head
        while((thisNode != None) and (thisNode.getElement() != elem)):
            lastNode = thisNode
            thisNode = thisNode.getNextNode()
        if(thisNode==None):
            print("Element not in list")
        elif lastNode == None: #head node gets deleted
            self.head = thisNode.getNextNode()
            self.length -=1
        else:
            lastNode.setNextNode(thisNode.getNextNode())
            self.length -=1
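A brief usage sketch, assuming the Node and LinkedList classes above, including the removal method as reconstructed there:

myList = LinkedList()
for value in [10, 20, 30]:
    myList.addNode(value)     # each new element becomes the new head
print(myList.getLength())     # 3
myList.removeNode(20)         # unlinks the node holding 20
print(myList.getLength())     # 2
print(myList.isEmpty())       # False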
Chain
A chain is also known as a doubly linked list. It is similar to a singly linked list, except that each node has a pointer to both its predecessor and successor on the list. The symmetrical nature of the doubly linked list makes it easier to implement certain operations on it. However, the price we pay is an extra pointer per node that not only occupies space, but also needs to be correctly updated during list operations. We maintain two sentinel nodes at the head and tail of the list, which simplifies some special cases. The Python implementation follows:
dList.py
class DNode:
    def __init__(self,elem = None, prev=None, next=None):
        self.element = elem
        self.prevNode = prev
        self.nextNode = next
    def getElement(self):
        return self.element
    def getPrevNode(self):
        return self.prevNode
    def getNextNode(self):
        return self.nextNode
    def setElement(self, elem):
        self.element = elem
    def setPrevNode(self, elem):
        self.prevNode = elem
    def setNextNode(self, elem):
        self.nextNode = elem
class DoublyLinkedList:
    def __init__(self):
        self.length = 0
        self.head = DNode(None)
        self.tail = DNode(None)
        # Link the two sentinel nodes (needed before the first insertion)
        self.head.nextNode = self.tail
        self.tail.prevNode = self.head
    def isEmpty(self):
        return (self.length==0)
    def getLength(self):
        return self.length
    def _addNodeIntermediate(self, elem, before, after):
        # Helper not shown in this excerpt; a standard version is sketched here:
        # insert a new node holding elem between the nodes before and after
        newNode = DNode(elem, before, after)
        before.nextNode = newNode
        after.prevNode = newNode
        self.length += 1
    def addNodeFront(self,elem):
        #Add element immediately after the head sentinel
        self._addNodeIntermediate(elem, self.head, self.head.nextNode)
    def addNodeEnd(self,elem):
        #Add element immediately before the tail sentinel
        self._addNodeIntermediate(elem, self.tail.prevNode, self.tail)
Trees
The tree is a fundamental data structure that helps to represent connectivity and hier-
archy. For example, graphs and trees can be used to model chemical compounds
(Ahmad & Koam, 2020) to help visualize their atomic-level connectivity. In 1857, mathe-
matician Arthur Cayley invented the concept of trees when trying to model the problem
of counting the number of possible isomers of an alkane (Wilson, 2010). Since then,
trees have been widely used to model various problems in chemistry, geology, biology,
computer science, and other disciplines.
Trees enable us to naturally organize data in the form of file systems, HyperText Markup
Language (HTML) pages, organizational structures in companies, and genealogical dia-
grams called family trees. Trees are also used to represent expressions. In programming
language compilers and in natural language processing, parse trees represent the deri-
vation of strings in the language according to the rules of the underlying grammar.
Definitions
Trees may be used for representing acyclic relationships connecting entities, but many
trees are rooted. A rooted tree is a collection of nodes storing data elements with the
following properties (Goodrich et al., 2013):
Note that the tree is a recursive structure. A tree T is either empty or consists of a root
node r connected to possibly empty subtrees rooted at nodes v where v is a child of r
(Goodrich et al., 2013).
If nodes u and v have the same parent w, then u and v are called “sibling nodes.”
Any node a on the path from the root to node v is called an “ancestor” of v. Any node d
on the path from node v to a leaf node is called a “descendant” of v (Goodrich et al.,
2013).
Nodes with one or more child nodes are called “internal nodes” (Goodrich et al., 2013).
In an m-ary tree, each internal node has at most m child nodes. If each internal node
has exactly m children, the tree is called a full m-ary tree. The most common m-ary
tree is the binary tree for m = 2.
The length of the path, in terms of the number of nodes, from the root to a node v in
the tree is called the “level” of v. The maximum of the levels of all the vertices in the
tree is called the “height” of the tree (Goodrich et al., 2013).
Summary
Most general-purpose computers as we know them today draw inspiration from the
classical architecture proposed by John von Neumann and consist of RAM, CPU, sec-
ondary storage, and I/O.
Control structures facilitate the flow of control through a program. These include
structures inside statements governed by rules of operator precedence and asso-
ciativity, structures for conditional statements and loops, and flow of control
between subprograms.
The programming language features of functions and function calls support proce-
dural abstractions. Recursive functions are an elegant yet powerful feature that
allows functions to invoke themselves.
Types are programming language features that facilitate the structuring of data. Types include built-in and user-defined types, and they are governed by type systems with static or dynamic typing. Basic data structures include lists, chains, and trees, together with their Python implementations.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 2
Data Structures
2. Data Structures
Introduction
Data structures represent data and relationships among data for efficient manipula-
tion. Data are much more than collections of bits and bytes. Data are associated with
objects and their representations. Objects could be persons, physical objects, events,
or abstract concepts. For representations, there are choices to be made regarding what
attributes are to represent the objects, what queries we need to ask of the objects, and
how frequently. Consider a simple problem of storing and querying a set of integers.
The goal is to answer a search query from the user about the presence or absence of
an integer of the user’s choice in our collection. Suppose the design decision to make
is whether we should store it in a sorted array or an unsorted array. In general, search
works better in sorted arrays. There are algorithms to execute our search problem on a
sorted array within a time that is logarithmic relative to the number of integers stored.
A brute-force scan through the array would take linear time. However, if our set is
unsorted to start with, we will need to sort it first, but then the time taken to sort fol-
lowed by a binary search would be expensive compared to a brute-force linear search.
So, does that mean that we should use linear search as opposed to binary search for
this problem? Yes, if we simply had to search only once. If we had to search several
times, the cumulative advantage of the logarithmic searches over linear ones would be
significant, even considering the overhead of the sorting step. Now, scale this problem
up to web searching. We expect our answers immediately! The search engine is able to
satisfy our requirement for speed because of sophisticated preprocessing and data
storage ahead of processing our query.
Stacks
A “stack” is a collection of data items following a “Last In, First Out” (LIFO) paradigm
(Goodrich et al., 2013). The basic operations supported by a stack are as follows (using
Python nomenclature):
Stacks can be implemented using singly linked lists, although the simpler array imple-
mentations are also common. An array implementation of a stack using Python lists is
given below:
stack.py
class Stack:
    def __init__(self):
        self.elements = [] #Initialized to empty list
    def isEmpty(self):
        return self.elements == []
    def size(self):
        return len(self.elements)
    def push(self, elem):
        #push is used below but missing from this excerpt; it adds the
        #element at the top of the stack (the end of the Python list)
        self.elements.append(elem)
    def topOfStack(self):
        return self.elements[len(self.elements)-1]
    def pop(self):
        return self.elements.pop()#Python's in-built pop
Stacks can be used to switch railroad cars of a train in a switching yard (Knuth, 2013). In
the figure above, railroad cars arrive at the switching yard in the order D, C, B, and A. A
stack-like structure is used to switch the cars so that they leave the yard in the order C,
A, B, and D. The following Python code simulates the process:
s=Stack()
s.push("D")
s.push("C")
print(s.pop())
s.push("B")
s.push("A")
print(s.pop())
print(s.pop())
print(s.pop())
Queues
A “queue” is a collection of data items following a “First In, First Out” (FIFO) paradigm
(Goodrich et al., 2013). An example of a queue is the departure queue of flights taking
off from a particular runway at an airport. An aircraft that is ready to depart enters the
queue. Aircraft in the queue wait for their turn. The aircraft at the front of the queue
takes off after receiving permission from air traffic control.
queue.py
class Queue:
    def __init__(self):
        self.elements = []
    def isEmpty(self):
        return self.elements == []
    def size(self):
        return len(self.elements)
    def enQueue(self, elem):
        #Method missing from this excerpt; a standard version is sketched here.
        #New elements are inserted at index 0 so that deQueue below removes
        #the oldest element from the end of the list.
        self.elements.insert(0, elem)
    def deQueue(self):
        return self.elements.pop()
Queues are used in job scheduling, breadth-first search (BFS) in graphs, and other
applications.
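A brief usage sketch of this queue for the runway departure example, using the enQueue method sketched above:

q = Queue()
q.enQueue("Flight 101")   # first aircraft to join the queue
q.enQueue("Flight 202")
q.enQueue("Flight 303")
print(q.deQueue())        # Flight 101 departs first (FIFO)
print(q.deQueue())        # Flight 202
print(q.size())           # 1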
Heaps
A “heap” is a data structure that is used as a building block in two important problems—heapsort and the implementation of priority queues. Many algorithms, such as Dijkstra’s shortest path algorithm, Prim’s minimal spanning tree algorithm, and various job scheduling and selection problems, use a heap as a fundamental data structure (Cormen et al., 2009). We consider the binary heap here. There are other variants, including the Binomial Heap, Fibonacci Heap, and Leftist Heap, among others, that have their own properties (Brodal, 2013).

Heapsort
A heapsort is a sorting algorithm that builds a heap of the numbers to be sorted and repeatedly removes the maximum (or minimum).

Priority queue operations
Let us consider the problem of implementing a priority queue using a heap. The basic item is a <data, priority> pair. The priority queue attempts to keep track of the item with the highest priority (Goodrich et al., 2013).

Heap property
Let T be a complete binary tree with nodes v having fields defined as follows:

• key(v) is the key associated with node v.
• Left(v) is the left child of v.
• Right(v) is the right child of v.

In a MAX-heap, the heap property requires that, for every node v, key(v) is greater than or equal to the keys of v's children.

Complete binary tree
A complete binary tree is a binary tree with the following properties: (a) all levels, except possibly the last, are completely filled, and (b) the nodes on the last level are placed as far to the left as possible.
heaps.py
class Heap:
    def __init__(self):
        self._X = []
    def isEmpty(self):
        return self._X == []
    def size(self):
        return len(self._X)
    def insert(self, elem):
        # insert is used below but missing from this excerpt; a standard
        # trickle-up version is sketched here
        self._X.append(elem)
        i = self.size()-1
        while(i > 0):
            parent = (i-1)//2
            if(self._X[i] > self._X[parent]):
                self._X[i], self._X[parent] = self._X[parent], self._X[i]
                i = parent
            else:
                break
    def _maxChild(self, i):
        # The first part of this helper is missing from the excerpt; it returns
        # the index of the child of node i holding the larger key
        if(2*i+2 > self.size()-1):  # only the left child exists
            maxChild = 2*i+1
        elif(self._X[2*i+1] > self._X[2*i+2]):
            maxChild = 2*i+1
        else:
            maxChild = 2*i+2
        return(maxChild)
    def extractMax(self):
        #Remove the maximum element from heap and return
        maxElement=self._X.pop(0)
        if(self.size() != 0):
            #Bring last element to front
            lastElement=self._X.pop()
            self._X.insert(0, lastElement)
            #Trickle down
            i = 0
            while(2*i < self.size()-1):
                m = self._maxChild(i)
                if(self._X[m] > self._X[i]):
                    self._X[i],self._X[m] = self._X[m],self._X[i]
                else:
                    break
                i = m
        return maxElement
    def reportMax(self):
        return(self._X[0])
    def printHeap(self):
        print(self._X)
Of the operations, reportMax takes O(1) time, while both extractMax and insert take O(log n) time, where n is the number of items in the heap when the operation is performed.
An application
Consider an aircraft landing problem at an airport, where the air traffic control tries to
prioritize the landing of aircraft based on various factors quantified by a priority value.
Aircraft with higher priority land earlier. For instance, an airplane that is already low
and very close will have a high priority. Consider the scenario shown in the figure,
where aircraft with different priority values seek landing permission. These priorities
are inserted into a priority queue implemented as a MAX-heap. Then, an aircraft arrives
seeking an emergency landing. It has the priority value 13, which is the maximum. The
priority queue returns this value for the extractMax operation. These operations are
executed using our heap implementation as follows:
H=Heap()
H.insert(5)
H.insert(12)
H.insert(9)
H.insert(7)
H.insert(1)
H.insert(8)
H.insert(13)
H.extractMax()
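Continuing the example with the implementation above, the remaining heap operations can be exercised as follows:

print(H.reportMax())   # 12: the largest of the remaining priorities
H.printHeap()          # prints the remaining heap contents as a Python list
print(H.size())        # 6 aircraft still waiting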
Graphs
The celebrated problem of Königsberg bridges asked whether the seven bridges of the
Prussian city of Königsberg, over the river Pregel, could all be traversed in a single trip
without going through any bridge twice (Rosen, 2019). The additional requirement was
that the trip must end in the same place it began. In 1736, Leonhard Euler showed that
the Königsberg bridge problem could not be solved. This initiated the study of graph
theory, which is central to computer science today (Rosen, 2019).
In the graph representation of the Königsberg bridge problem, each vertex represents a
landmass, and each edge represents a bridge.
A “simple graph” has no self-loops and does not have multiple edges between vertices.
A graph with multiple edges or self-loops is called a pseudo-graph (Rosen, 2019).
A “directed graph” has only directed edges between pairs of vertices. An “undirected graph” has no directed edges (Rosen, 2019).

Self-loops
Self-loops are edges in graphs that start and end at the same vertex.
Applications
Social networks are often represented as graphs. We sometimes call such graphs
“social graphs.” The entities represented by vertices could be individuals, posts, or
some comments. The edges connecting the vertices represent some relationship
between the entities between the vertices. For example, the edge of a graph showing
Facebook connections would represent friendship. Graphs may have different types of
vertices. For instance, in a collaboration network, a vertex may be an author or a
research paper. In some networks, the edges represent different types of relationships,
such as friendship, familial relationships, or acquaintance. In others, such as trust net-
works, the edge may be weighted. The relationship is non-random, and, often, the enti-
ties form clusters of “communities,” which are not necessarily disjointed. The graph
representation also depends on what type of data will be mined. A collaboration net-
work in research may be represented with (a) vertices as authors and edges indicating
co-authorship, (b) vertices as papers with edges indicating the presence of common
authors, or (c) each vertex being either an author or a paper and an edge indicating
authorship of a paper.
• road networks.
• web pages and their hyperlinks.
• communication networks.
• collaboration networks.
• airline network (connectivity between cities).
Cycles
Various problems are modeled using cycles in graphs.

Cycle
A cycle is a sequence of vertices such that every consecutive pair in the sequence is connected by an edge, and the last vertex in the sequence is connected to the first.
In the graph below, A-E-D-A and A-B-C-D-A are cycles, but A-B-C-D-E-A is not.
DAGs
Directed acyclic graphs (DAGs) are useful for modeling dependencies between tasks.

Directed acyclic graph
A DAG is a directed graph with no directed cycles.

Graph representation
Two popular ways in which graphs may be represented are “adjacency lists” and “adjacency matrices” (Rosen, 2019).

For an undirected graph G = (V, E), the adjacency list for vertex v, Adj(v), stores the list of neighbors u with (v, u) ∈ E; for an undirected graph, (u, v) ∈ E if (v, u) ∈ E. This is usually the preferred representation if the graph is sparse. An adjacency list representation of an undirected graph is shown below along with the adjacency lists. For example, the self-loop at vertex A and a straight edge between A and C results in Adj(A) = {A, C}.

Sparse
A sparse graph is a graph with |V| vertices and O(|V|) edges.

For a directed graph G = (V, E), the adjacency list for vertex v, Adj(v), stores the list of neighbors u with (v, u) ∈ E. In some applications, Adj(v) may instead store the incoming edges at v.
For any simple graph G, directed or undirected, let us assume that the vertices are named vᵢ, 0 ≤ i ≤ n − 1. The adjacency matrix A is a two-dimensional Boolean matrix where A[i, j] = 1 if, and only if, (vᵢ, vⱼ) is an edge in the graph. This is sometimes the preferred representation if the graph is dense.

Dense
A dense graph is a graph with |V| vertices and O(|V|²) edges.
The adjacency matrix of an undirected graph is symmetric. This need not be the case
for a directed graph.
Both adjacency list and adjacency matrix representations are widely used; however, adjacency matrices have a space requirement of O(|V|²), while adjacency lists have a space requirement of O(|V| + 2|E|).
ADTs
The abstract data type (ADT) for a data structure specifies what is stored in the data
structure and what operations are supported on them, as shown for the stack or the
queue data structure above. The ADT does not detail how the operations are imple-
mented.
• Vertex ADT
◦ Vertex(name) creates a vertex with a given city name.
◦ getName() returns the name of the vertex.
• Edge ADT
Note that we have not defined details of the representation of the graph, for example,
whether we are using adjacency lists or adjacency matrices. The ADT serves as the pub-
lic interface for those using the graph data structure. The implementation details are
hidden.
Heap ADT
In the heap example, the heap ADT may be defined by the operations supported by the Heap class above: insert(elem) adds an element, extractMax() removes and returns the maximum, reportMax() returns the maximum without removing it, and isEmpty() and size() report the state of the heap. This is the heap’s public interface. Information that is notably not a part of the ADT includes the internal representation (here, the Python list _X) and internal helpers such as _maxChild.
Constructors
Once we have defined a class, we can create instances or objects of the class using
a “constructor.” For example, we can create an instance of the Heap class by invoking
the constructor as Heap(). This accomplishes two things: It creates an object in the
memory, and it calls the __init__ method of the class to initialize (assign data to) the object.
Inheritance
“Inheritance” adds a powerful feature to object-oriented programming that facilitates
modular and hierarchical organization. This enables us to define new classes based on
the existing class. The new class is called the “derived class” or “subclass.” The existing
class from which the subclass is derived is known as the “superclass” or “base class”
(Goodrich et al., 2013).
The subclass
parallelograms.py
class Parallelogram:
    def __init__(self, p, q):
        self.first = p
        self.second = q
        self.third = p
        self.fourth = q
    def perimeter(self):
        return(self.first + self.second +\
               self.third + self.fourth)
class Rectangle(Parallelogram):
    def __init__(self, p, q):
        super().__init__(p,q)
    def area(self):
        return(self.first*self.second)

class Square(Rectangle):
    def __init__(self, p):
        super().__init__(p,p)
    def area(self):
        return(self.first*self.first)
P=Parallelogram(3,4)
print(P.perimeter())
R=Rectangle(3,4)
print(R.perimeter())
print(R.area())
S=Square(5)
print(S.perimeter())
print(S.area())
• the Rectangle class inherits the perimeter method from the base class Parallelo-
gram, as does the Square class,
• the Rectangle class extends the base class Parallelogram with an area method,
and
• the Square class area method overrides the area method from the Rectangle class.
2.3 Polymorphism
Software reuse implies better productivity. The ability to use the same subprogram for
different types of data leads to software reuse and is a powerful facility provided by
different languages supporting object-oriented programming. This facility is known as
“polymorphism” and manifests itself in different ways in programming languages
(Goodrich et al., 2013).
In Python, support for polymorphism exists for both built-in types and for user-defined
classes.
The len function in Python works for several types, including ranges, strings, lists, tup-
les, sets, and dictionaries:
A = range(0, 5) #range
print(len(A))
B= [2,3,4,5] #list
print(len(B))
D = {4,5,6,7,8,9} #set
print(len(D))
The + Operator
The + operator works for a variety of types, such as strings, numeric types, lists, and
tuples, but both operands must be of the same type:
a = 23
b = 45
print(a+b)
a = "abc"
b = "def"
print(a+b)
a = [1,2,3]
b=[4,5]
print(a+b)
a = (1,2,3)
b = (3,4,5)
print(a+b)
Let us revisit the Parallelogram example. Note that the perimeter method is implemented in the base class Parallelogram, but not in the derived classes Rectangle and Square. When we instantiate an object S of class Square and invoke S.perimeter(), the perimeter() method defined in the superclass Parallelogram is called. Now, consider the area methods defined in the Rectangle and Square classes. When we create an object S of the Square class and invoke S.area(), the area method defined in the class Square is called. If we delete this method, then, since the class Square is a subclass of the class Rectangle, the Rectangle's area method will be invoked whenever S.area() is called.
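The behavior described above can be checked directly with the classes from parallelograms.py:

S = Square(5)
print(S.perimeter())            # 20, inherited from Parallelogram
print(S.area())                 # 25, computed by Square.area
R = Rectangle(3, 4)
print(R.area())                 # 12, computed by Rectangle.area
print(isinstance(S, Rectangle)) # True: a Square is also a Rectangle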
Summary
Stacks are structures following the LIFO paradigm. The main operations for a stack
are the PUSH and the POP. The PUSH operation involves the addition of an element
to the top of the stack. The POP operation involves the removal of an element from
the top of the stack. They can be implemented using linked lists or arrays. We pro-
posed an implementation using Python lists. Stacks are useful in many algorithms
including parentheses matching in expressions, matching tags in HTML, and sup-
porting stack frames to process function calls.
Queues are FIFO structures that support operations of enqueue to add elements at
the end of the queue and dequeue to remove them from the front. We proposed an
implementation using Python lists. Queues are useful in BFS and many scheduling
algorithms.
Heaps are used in Heapsort and for implementing priority queues. They come in two varieties: MAX-heap and MIN-heap. The basic operations supported include returning the maximum or minimum in O(1) time and removing the same in O(log n) time. The heap also supports the addition of a new element in O(log n) time.
Graphs are a fundamental data structure used for modeling relationships such as
computer networks, social networks, communication networks, road networks, and
biological networks.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 3
Algorithm Design
3. Algorithm Design
Introduction
To solve a problem on a computer, we need algorithms and data structures that we
then map to programs in a programming language of our choice. For the solution to be
efficient, each of these (algorithms, data structures, and programs) need to be efficient.
Algorithm design involves mapping the specifications of a problem, possibly in natural
language, to an algorithmic pseudocode that can be universally understood. For better
understanding later, it may be necessary to create a correctness proof, particularly if
some steps of the algorithm are nontrivial.
Since there may be multiple algorithms for the same problem to choose from, the pro-
grammer will need some basis for the choice. Hence, it will also be necessary to aug-
ment the solution description with an analysis of resource requirements, which mostly
translates to the running time and space. Over time, standard methodologies have
emerged for all these steps of algorithm design, analysis, and correctness proofs.
Although each algorithm is different, over the years, some “design patterns” or tem-
plates have also emerged for algorithm design methodologies. These apply to a large
class of problems. Finally, standard measures of complexity help in comparing multiple
algorithms for the same problem. Once the program is written, we need program verifi-
cation techniques to ensure the program acts according to specifications. Rigorous
testing techniques are then employed to test whether the program meets the require-
ments and to unearth any bugs.
Iteration
while(condition):
    statements
The following Python code prints the natural numbers from 1 to 25. The test condition
for the while loop fails when n = 26, and the loop then terminates.
n = 1
while (n <= 25):
    print(n)
    n += 1
Note that if n had been initialized to 26, the while loop would not be executed at all.
do {
    statements;
} while (condition);
The statements in the body of the do-while loop are executed until the condition is
evaluated to be false. The do-while loop is always executed at least once, even if the
condition is false throughout.
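Python has no built-in do-while loop, but the same behavior can be emulated. The following is a minimal sketch under that assumption; the values used are only illustrative:
n = 26
while True:
    print(n)            # the body executes at least once, even though n > 25
    n += 1
    if not (n <= 25):   # the condition is tested at the end of each pass
        break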
The else part is optional and executed when the for loop exits normally. It will not exe-
cute when the exit is through a break statement.
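As a small illustration (the values used here are only an example), the else clause below runs only because the loop is not left via break:
for i in [2, 4, 6, 8]:
    if i == 5:
        break                          # never happens here, so the loop exits normally
else:
    print("no break occurred")         # executed because the loop completed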
The following prints all the natural numbers from 21 to 34, inclusively:
forloops.py
for i in range(21,35):
    print(i)
The following prints the names of the fruits in the list fruitBasket:
fruitBasket=["apple","banana","mango","cherry","kiwi"]
for i in fruitBasket:
    print(i)
The following loops iterate over a tuple, a list, and a string, respectively:
for i in (2,4,6,8):
    print(i)
for i in [2,4,6,8]:
    print(i)
for i in "2468":
    print(i)
The for loop in C-based languages has the following general form:
for (expression1; expression2; expression3) {
    statements;
}
Here, the first expression is used for initialization and the second for the condition to
be tested for the loop to continue. The third expression is used for any action at the
end of each iteration, such as incrementing the loop control variable. All expressions are
optional.
User-controlled mechanisms
Programming languages also support loop control mechanisms wherein the exact loca-
tion of the control mechanism within the body of the loop can be decided by the user.
Python supports two such constructs: break and continue. Break allows the control to
exit the loop, whereas continue allows the control to skip the rest of the statements in
the loop body and return to the start of the loop.
For example, consider the following Python code fragment, which prints the odd numbers in
the range from 501 to 1000, starting from 501, until it encounters an odd multiple of 37. It
prints the sequence 501, 503, ..., 555.
break.py
for i in range(501,1000,2):
    print(i)
    if(i % 37==0):
        break
The following Python code fragment prints all odd multiples of 37 in the range
from 501 to 1000:
continue.py
for i in range(501,1000,2):
    if(i % 37!=0):
        continue
    print(i)
Iterators
“Iterators” are user-defined functions that go through (iterate) a data structure in some
sequence (Goodrich et al., 2013). Each time it is called, an iterator returns the next element of the
data structure. For instance, Python allows the creation of iterators that walk through
the elements of an iterable object.
In the following example, the Python code snippet creates an iterator for a list of inte-
gers and iterates through the list, printing them one by one:
iterator.py
alist = list(range(1,21))
i = iter(alist)        #creates iterator
while (1):
    try:
        print(next(i)) #iterates through list
    except StopIteration:
        break
Of course, in Python, the for loop automates this process of creating an iterator for an
iterable object and repeatedly invoking next on it until the StopIteration exception is
raised (Goodrich et al., 2013).
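For comparison, a minimal sketch of the equivalent for loop over the same list:
for element in alist:   # the for statement handles iter(), next(), and StopIteration
    print(element)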
Generators
“Generators” are an alternative to a traditional function and are suitable when we need
the results one by one. Here is an example that generates the prime factors of a natu-
ral number in Python. Note the use of the “yield” construct instead of a “return”.
generator.py
def generatePrimeFactors(num):
    fact = 2
    while fact * fact <= num:
        if num % fact:
            fact += 1
        else:
            num //= fact
            yield fact
    if num > 1:
        yield num

for i in generatePrimeFactors(3000):
    print(i)
Recursion
primes.py
def primeFactors(num, fact):
    if num < fact*fact:
        return [num]
    if num % fact == 0:
        return [fact] + primeFactors(num // fact, 2)
    return primeFactors(num, fact + 1)
Above is an example of linear recursion since only one of the two recursive calls to primeFactors
in the body of the function will be executed in any single invocation. The chain of such
calls may, of course, be longer than one.
The following example illustrates a case of binary recursion in Python. Fibonacci num-
bers are defined for all non-negative integers n as follows:
fib(n) = n for 0 ≤ n < 2
fib(n) = fib(n − 1) + fib(n − 2) for n ≥ 2
binFib.py
#Binary Recursive Fibonacci
def fib(n):
    if(n==0):
        return 0
    elif(n==1):
        return 1
    else:
        return(fib(n-1)+fib(n-2))

print(fib(6))
Although recursion is elegant, it must be used judiciously. Recursive calls have system
overheads. Moreover, although a certain way of programming may be “natural,” it may
not be the most efficient. The above may be improved to a linear recursive version as
follows.
linFib.py
def linearFibonacci(n):
    #Returns F(n) and F(n-1)
    if (n <= 1):
        return (1,0)
    else:
        (current, prev) = linearFibonacci(n-1)
        return (current+prev, current)
The difference between the two implementations is significant. The binary variant runs
in exponential time, whereas the linear version takes O(n) time, as we shall see below.
Induction Proofs
Weak induction
Suppose we need to prove that a property P(n) is true for all non-negative integers
n ≥ 0. The steps are as follows:
• Basis: Show that P(0) is true.
• Induction step: Assume that P(k − 1) is true for an arbitrary k ≥ 1 (the induction hypothesis) and show that P(k) is then also true.
Strong induction
In this case, to prove that a property P(n) is true for all non-negative integers n ≥ 0,
the steps are as follows:
• Basis: Show that P(0) is true.
• Induction step: Assume that P(0), P(1), …, P(k − 1) are all true (the induction hypothesis) and show that P(k) is then also true.
Note that we may use a different basis condition depending on the problem.
A Simple Algorithm
A “simple algorithm” is the simplest one for solving the problem; it is usually an
obvious one based on the problem statement directly. We consider the Maximum Con-
tiguous Subarray Problem, which is defined as follows: We are given a sequence A of n
integers A[1..n] and we need to find the largest sum possible in a contiguous subse-
quence A[i..j] of A. This and similar problems arise in applications, such as bioinfor-
matics, computer vision, and data mining (Brodal, 2013; Bentley, 2000).
For example, consider the sequence A=[-6, -22, 1, 6, -5, 3, 4]. Here, the maxi-
mum contiguous subsequence is [1, 6, -5, 3, 4] with a sum of 9.
A possible brute-force algorithm for this problem could simply be to compute the subarray
sum for each possible pair (i, j) satisfying 0 ≤ i ≤ j ≤ n − 1 and keep track of the
maximum. Below is a Python implementation of this algorithm, with the sequence
represented as a Python list. (A brute-force algorithm is a straightforward algorithm that
typically adopts a simple approach, such as considering all possible cases.)
maxContiguousBF.py
def maxContiguousSubseq(A):
    maxSum = 0
    n = len(A)
    for i in range(0,n):
        for j in range(i,n):
            subseqSum = 0
            for k in range(i,j+1):
                subseqSum += A[k]
            maxSum=max(maxSum, subseqSum)
    print("maxSum =", maxSum)
The loop with index i is executed n times. The loop with index j is executed n − i ≤ n
times. The loop with index k is executed j − i + 1 ≤ n times. So, overall, this is an O(n³)
algorithm for the problem.
Dynamic Programming
can be obtained from Sum(i, j-1), the sum of the subsequence A[i..j-1], by simply
adding A[j]. The second sum need not be recomputed from scratch, but instead computed
from the solution to the subproblem sum.
Sum(i, j) = A[i] if j = i
Sum(i, j) = Sum(i, j − 1) + A[j] if i < j < n
This leads to an improved algorithm: the running sum is reused instead of being recomputed
from scratch, and we also save space by not storing all the partial sums. The Python code
is displayed below.
maxContiguousDP.py
def maxContiguousSubseqDP(A):
    maxSum = 0
    n = len(A)
    for i in range(0,n):
        subseqSum = 0
        for j in range(i,n):
            subseqSum += A[j] #Compute Sum(i,j)
            maxSum=max(maxSum, subseqSum)
    print("maxSumDP =", maxSum)

listA=[-6, -22, 1, 6, -5, 3, 4]
maxContiguousSubseqDP(listA)
The loop with index i is executed n times, and the loop with index j is executed
n − i ≤ n times. So, overall, this is an O(n²) algorithm for the problem, which is an
improvement on the brute-force O(n³) approach.
Divide-and-Conquer
maxContiguousDC.py
def maxContiguousSubseqDC(A, low, high):
    if(low == high):              #single element
        return max(0,A[low])      #if negative return 0
    mid=(low+high)//2
    left = maxContiguousSubseqDC(A, low, mid)      #best sum within the left half
    right = maxContiguousSubseqDC(A, mid+1, high)  #best sum within the right half
    maxLeftSum = crossSum = 0
    for i in range(mid, low-1, -1):                #best crossing sum ending at mid
        crossSum += A[i]
        maxLeftSum = max(maxLeftSum, crossSum)
    maxRightSum = crossSum = 0
    for i in range(mid+1, high+1):                 #best crossing sum starting at mid+1
        crossSum += A[i]
        maxRightSum = max(maxRightSum, crossSum)
    return(max(left, maxLeftSum+maxRightSum, right))
Let the running time for this algorithm be expressed as T(n), where n is the size of the
input. We subdivide the problem into two parts and recurse on each. Finding the maximum
crossing subsequences takes O(n) time.
T(n) = O(1) for n ≤ 1
T(n) = 2T(n/2) + O(n) for n > 1
This recurrence solves to T(n) = O(n log n).
Greedy Algorithms
It turns out that if the customers are served in the order of non-decreasing service times cⱼ, the
optimal solution is achieved. Here, the store owner makes the greedy choice by choosing
the customer with the minimum cⱼ (ties broken arbitrarily) among those still waiting
as the next customer to be served. We illustrate this with an example:
Let (c₁, c₂, c₃, c₄, c₅) = (25, 21, 14, 10, 5).
Note that the cumulative waiting time decreases if a customer’s position in the queue
is exchanged with another customer with a higher service time who is ahead in the
queue. The greedy schedule with a non-decreasing order of service times gives the
minimal solution.
The condition is c₁ ≥ c₂ ≥ ⋯ ≥ cₙ.
The goal is to minimize the sum of the waiting time of all customers.
The claim is: The total waiting time is minimized if the customers are processed in the
order
S = (s₁, s₂, …, sₙ) = (cₙ, cₙ₋₁, …, c₁)
Proof
We prove the claim to be correct by the method of contradiction. The claim implies that
s₁ ≤ s₂ ≤ s₃ ≤ … ≤ sₙ. Let us assume that this order of serving customers, S,
adopted by the greedy algorithm is incorrect and does not minimize the total waiting
time. Let an optimal schedule minimizing the total waiting time for processing the customers
be T = (t₁, t₂, …, tₙ). As T is optimal and S is not, they must differ at
one or more indices. Let i be the smallest index such that tᵢ ≠ sᵢ, that is,
s₁ = t₁, s₂ = t₂, …, sᵢ₋₁ = tᵢ₋₁, tᵢ ≠ sᵢ
so T matches S up to position i − 1. Since the first i − 1 entries agree, the remaining
service times in T are the same as those in S, and therefore sᵢ occurs in T at some later
position j > i, that is, tⱼ = sᵢ. Since s₁ ≤ s₂ ≤ s₃ ≤ ⋯ ≤ sₙ, we get tⱼ ≤ tᵢ.
Now construct a schedule T′ = (t′₁, t′₂, …, t′ₙ) from T by exchanging the customers at
positions i and j. The shorter service time tⱼ moves forward by j − i positions and the
longer one tᵢ moves back by the same amount, so
Cost(T′) = Cost(T) + tⱼ ⋅ (j − i) − tᵢ ⋅ (j − i) = Cost(T) − (j − i) ⋅ (tᵢ − tⱼ)
and therefore Cost(T′) ≤ Cost(T). Moreover, by construction,
s₁ = t′₁, s₂ = t′₂, …, sᵢ₋₁ = t′ᵢ₋₁, sᵢ = t′ᵢ
that is, while T matches S only up to position i − 1, T′ matches S up to position i and
costs no more than T. Repeating this exchange argument transforms T into S without
ever increasing the cost.
Hence, Cost(S) ≤ Cost(T), and since T is optimal, Cost(S) = Cost(T); that is, the greedy
schedule has the same cost as an optimal schedule, although we had assumed it does not
minimize the total waiting time. This is a contradiction. Therefore, we conclude that the
greedy schedule must minimize the total waiting time.
The proof methodology used here is general and has been applied for many greedy
algorithms. “Matroid theory” provides a mathematical basis to show that a greedy algo-
rithm is correct by using a combinatorial structure called a matroid (Cormen et al.,
2009). This has been used for many greedy algorithms but is not necessarily applicable
to all.
Example
Consider the following Python function for computing factorials.
factorial.py
def factorial(n):
    index = 0
    value = 1
    while(index < n):
        index += 1
        value *= index
    return value
Proof
The basis P(0) is true. By definition, 0! = 1. Before the while loop executes,
value = 0! = 1.
The induction step is as follows: Let P(k − 1) be true, that is, assume that after the while
loop has executed r = k − 1 times, value = r! = (k − 1)!. The variable index tracks the number
of iterations of the while loop. Thus, at this point index = k − 1. Then, in the next
iteration, index gets incremented to k and value = value ⋅ k = (k − 1)! ⋅ k = k!. Hence, P(k) is
true.
Example
Consider the following Python program which computes the highest power of a factor
fact ≥ 2 that divides a natural number num ≥ 2.
Let us prove that this program is correct by using strong induction. Note that we
assume that the following inequality always holds because of the nature of the prob-
lem: fact ≥ 2.
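The listing itself is not reproduced in this excerpt. A minimal sketch that is consistent with the line numbers referenced in the proof below (a return of 0 in line 3, a result of 1 via line 5, and the recursive return in line 7) might look as follows; the function name and the exact layout are assumptions:
def highestPower(num, fact):                        # line 1
    if num < fact:                                  # line 2
        return 0                                    # line 3: fact cannot divide num
    if num == fact:                                 # line 4
        return 1                                    # line 5: num is exactly fact to the power 1
    if num % fact == 0:                             # line 6
        return 1 + highestPower(num // fact, fact)  # line 7: one more than for num/fact
    return 0                                        # line 8: fact does not divide num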
Proof
This is achieved by strong induction on the first function argument num.
The basis is as follows: The function works correctly for num = 0, 1, 2. If num = 0 or 1,
then num < fact, hence the function correctly returns 0 in line 3. If num = 2 and num < fact,
again the function returns 0 in line 3, which is correct. If num = fact = 2, the function
correctly returns a value of 1 (line 5).
The induction step is as follows: Suppose the function works correctly for all values of
num satisfying 0 ≤ num < k. Now consider num = k.
Case 1: If fact > num, the function will return 0, which is correct.
Case 3: If fact < num and num % fact = 0, then fact divides num. Then, the highest
power of fact that divides num is one more than the highest power of fact that divides
num/fact.
Since k/fact < k, according to the induction hypothesis, the function works correctly for
num = k/fact and returns the highest power of fact that divides num/fact. Hence, the
function correctly returns one more than the highest power of fact that divides
num/fact (line 7).
Hence, by induction on num, the above arguments prove that the function correctly
returns the highest power of fact that divides num for all integers num ≥ 0 and all
integers fact ≥ 2.
Loop Invariants
maxContiguousOPT.py
def maxContiguousSubseqOpt(A):
    maxSum = 0
    subseqSum = 0
    n = len(A)
    for i in range(0,n):
        subseqSum = max(subseqSum+ A[i], 0)
        maxSum = max(maxSum, subseqSum)
    print("maxSumOpt =", maxSum)

listA=[-6, -22, 1, 6, -5, 3, 4]
maxContiguousSubseqOpt(listA)
To see why this is correct, note that the variable subseqSum tracks the maximum sum
for a subsequence ending at the most recently processed position, which is i − 1 at the
start of the loop and i at the end. The variable maxSum tracks the maximum sum for
the entire processed subsequence, which is A[0..i−1] at the start of the loop and
A[0..i] at the end. The values of subseqSum and maxSum are both zero initially, which
is correct by definition. If the value A[i], when added to subseqSum, gives a negative sum, then A[i]
cannot be appended to any existing subsequence to create a maximum subsequence
and no maximum subsequence ends at position i. Otherwise, A[i] is added to subseqSum.
The last line of the loop updates maxSum with the new value of
subseqSum if the latter is greater. Thus, the algorithm correctly maintains the meanings
associated with subseqSum and maxSum across loop iterations. The formal proof can be done
using mathematical induction on the loop variable i.
Program Verification
Over the years, research in formal methods in software engineering has focused on the
development of rigorous techniques for specification, development, and verification of
software systems. Using rigorous specifications and verifying that the implementation
meets the specifications can help to detect errors early or to eliminate them. As a limitation,
note that formal verification methods only verify whether the system is correct
according to the specification; there is no guarantee that the specification itself is
completely correct. Commonly used verification approaches include the following:
• proof tools. These include “automatic theorem provers,” which automatically con-
struct proofs using axioms and rules of inference, and “proof assistants,” which are
interactive theorem provers that can help study complex properties and prove
expected behaviors based on theoretical deductions.
• model checkers. These use the program’s “state space.” The system is specified
using logic, and desired properties are validated. A counterexample is provided if a
desired property is not valid. Model checkers suffer from the problem of state space
explosion and do not scale well to large systems. One way this problem is circum-
vented is by using higher levels of abstraction. Another way is by using bounded
model checking (Clarke et al., 2001). Bounded model checkers consider only those
states that can be reached within a number of steps below a fixed bound.
• program annotation (Peled & Qu, 2003). These are logical properties to be verified
that are placed in the code. These additional instructions are executed during the
verification process. The additional code does not alter the behavior of the original
program. A common usage is to add simple assertions as preconditions and postconditions
to pieces of code, as sketched after this list.
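As a small illustration of such annotations, the following sketch adds a precondition and a postcondition to a function using plain Python assert statements; the function and its conditions are invented for illustration only:
def scale_prices(prices, factor):
    # precondition: annotations that must hold on entry
    assert factor > 0, "factor must be positive"
    assert all(p >= 0 for p in prices), "prices must be non-negative"
    result = [p * factor for p in prices]
    # postcondition: annotations that must hold on exit
    assert len(result) == len(prices)
    assert all(r >= 0 for r in result)
    return result

print(scale_prices([10, 20, 30], 1.2))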
Testing
In practice, commercial software goes through stages of testing including testing done
during development, testing done at the time of release, and testing done by users.
Testing usually involves both manual and automated processes.
Integration testing
Integration testing involves integrating components in an almost realistic setting and
subjecting the integrated system to testing. The goal here is to check that the individual
pieces are compatible, integrate smoothly, and transfer data correctly through interfa-
ces (Sommerville, 2016).
Release testing
Release testing is the process of testing a particular release of the software; it is
intended to assure external users that the product or program is ready for general
consumption before they receive the release (Sommerville, 2016).
Performance testing
The goal of performance testing is to verify that the system can operate and deliver an
adequate service under the intended load. This is carried out after the system is fully
integrated (Sommerville, 2016).
User testing
User testing is typically carried out by users and customers to experiment and provide
feedback on a new system with the aim of ensuring that interaction with the software
under scripted and unscripted conditions yields expected behaviors (Sommerville,
2016).
Thus, the running time is of interest, and, in particular, how the running time grows
with the size of the input rather than the exact time, which could depend on the com-
putational resources available to the user. For this, we count some fundamental steps
of the algorithm, such as comparisons, arithmetic, and logic operations. Our analysis
should yield the order of growth, typically in terms of the input size, and we then
choose the algorithm based on this measure.
Asymptotic Complexity
An example follows. Suppose the running time of an algorithm in terms of its input is
given as
T(n) = 5n² − 20n + 1
T(n) ≤ 5n² + n² + n² ≤ 7n² for all n ≥ 1
T(n) = O(n²)
T(n) = 5n² − 20n + 1 = n² + 4n(n − 5) + 1
T(n) ≥ n² for all n ≥ 5
T(n) = Ω(n²)
Since T(n) = O(n²) and T(n) = Ω(n²), T(n) = Θ(n²).
If lim(n→∞) f(n)/g(n) = 0, then f(n) = O(g(n)).
If lim(n→∞) f(n)/g(n) = ∞, then f(n) = Ω(g(n)).
If lim(n→∞) f(n)/g(n) = c > 0, then f(n) = Θ(g(n)).
Asymptotic comparison
When faced with a choice of algorithms, we choose the asymptotically faster algorithm.
Note that the asymptotic complexity captures the practical notion that we are interested
in comparisons for sufficiently large n. A case in point is the comparison between
the functions f(n) = n³ and g(n) = 2ⁿ. Notice that while g(n) < f(n) for 1 < n < 10,
f(n) is asymptotically smaller.
Some commonly encountered orders of growth are the following:
• constant: O(1)
• logarithmic: O(log n)
• doubly logarithmic: O(log log n)
• linear: O(n)
• n-log-n: O(n ⋅ log n)
• quadratic: O(n²)
• cubic: O(n³)
• polynomial: O(nᵏ)
• exponential: O(2ⁿ)
• factorial: O(n!)
Many problems we encounter are solvable in polynomial time, that is, there is an algorithm
for the problem that runs in time bounded by O(nᵏ), where n is the size of the
input, for some constant k ≥ 0. One natural question that arises is whether all problems
are, in fact, solvable in polynomial time. In the early days of the study of algorithmic
complexity, it was observed that, although several problems are solvable within a
time that is a low-degree polynomial in n, such solutions were elusive for many problems.
So, it seems there is some fine line dividing problems that are solvable in polynomial
time from those that are not. There are two complexity classes of utmost importance:
1. The class P. All problems in this class are solvable in polynomial time.
2. The class of NP-complete problems. No polynomial-time algorithms are known for these
problems. Moreover, no one has been able to prove that such algorithms do not
exist (Cormen et al., 2009). Also, if any one of these problems were solvable in polynomial
time, all of these problems would be solvable in polynomial time!
Summary
As an example, the Maximum Contiguous Subarray sum can be found using a sim-
ple brute-force approach, while improved algorithms using dynamic programming
and divide-and-conquer methodologies can be designed.
The greedy technique can be applied to multiple domains, for example, to a sched-
uling problem.
Having mapped the algorithm to a program, we need to verify and test it. Formal
verification of programs involves formally checking if the program matches its spec-
ification. Techniques include proof tools, model checkers, and program annota-
tions. Program testing goals include validation testing to check if the program
meets the requirements and defect testing to unearth bugs. These are accom-
plished through unit or component testing, integration testing, and release testing.
The program is also tested for performance.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 4
Basic Algorithms
STUDY GOALS
… utilize the trie data structure for searching for a word in a string.
DL-E-DLBCSL01-U04
4. Basic Algorithms
Introduction
Of the many algorithms that we encounter, some are ubiquitous in practical applica-
tions. These common algorithms often serve as fundamental building blocks in algo-
rithmic solutions to more complex problems.
Trees are useful for representing acyclic relationships and connectivity information in
numerous applications. In many such applications, it is necessary to visit all the nodes
and process them in some systematic order.
In the twenty-first century, we are faced with a data deluge and are thus building appli-
cations that are increasingly data dependent. Therefore, it is imperative that we be able
to locate data efficiently when needed. Hence, search algorithms are fundamental to
many such applications.
Whether the data are ordered or not is a basic distinction that we need to make while
deciding on the type of search algorithm to apply. We can find a word quickly in a dic-
tionary because the words are ordered. Many hotels and restaurants across the world
have a valet parking service, wherein a valet parks the customer’s car. When the car
needs to be retrieved later, the valet knows exactly where it is.
Hash algorithms try to generalize this idea of finding objects by computing a data
item's location in a table, or a series of possible locations, directly from the item's key.
Text processing remains a major application area today, despite the increase in multi-
media content. Locating a series of words in a preprocessed text is common to many
applications. Data structures like “tries” support such string searches. On the other
hand, pattern matching algorithms solve a complementary problem by preprocessing a
pattern to speed up searching in a text document.
Representation
aTree=['A',               #Root
       ['B',              #Left Subtree
        ['D',[],[]],
        []],
       ['C',              #Right Subtree
        ['E',
         ['G',[],[]],
         []],
        ['F',[],[]]
       ]]

def treeRoot(aTree):
    if(aTree):
        return aTree[0]

def leftSubTree(aTree):
    if(aTree):
        return aTree[1]

def rightSubTree(aTree):
    if(aTree):
        return aTree[2]

print(treeRoot(aTree))
print(leftSubTree(aTree))
print(rightSubTree(aTree))
A
['B', ['D', [], []], []]
['C', ['E', ['G', [], []], []], ['F', [], []]]
Inorder Traversal
In inorder traversal, we recursively perform an inorder traversal of the left subtree,
followed by a visit to the root node. This is followed by a recursive inorder traversal of
the right subtree (Cormen et al., 2009). The Python implementation is shown below.
def inorder(aTree):
    if aTree:
        inorder(leftSubTree(aTree))
        print(treeRoot(aTree))
        inorder(rightSubTree(aTree))
If we invoke inorder(aTree) with the tree above, the characters stored in the nodes
are printed in the following order: D B A G E C F.
Preorder Traversal
In preorder traversal, we visit the root node first. This is followed by recursive preorder
traversals of each of the subtrees (Cormen et al., 2009). While this applies to any tree,
we illustrate it for a binary tree below.
def preorder(aTree):
    if aTree:
        print(treeRoot(aTree))
        preorder(leftSubTree(aTree))
        preorder(rightSubTree(aTree))
For our example, a call to preorder(aTree) prints the characters stored in the nodes
in the following order: A B D C E G F.
Postorder Traversal
In postorder traversal, we first visit the subtrees in postorder, recursively. This is followed
by a visit to the root node (Cormen et al., 2009). A Python implementation is shown
below. Like preorder traversal, this can also be extended to other types of trees.
def postorder(aTree):
    if aTree:
        postorder(leftSubTree(aTree))
        postorder(rightSubTree(aTree))
        print(treeRoot(aTree))
For the example above, a call to postorder(aTree) prints the characters stored in the
nodes in the following order: D B G E F C A.
Breadth-First Traversal
The breadth-first traversal (BFS) is also called “level-order traversal” since the nodes
are visited level-by-level starting from the root. Within a level, the nodes may be visited
in any order (Goodrich et al., 2013). A Python implementation is displayed below.
def bfSearch(aTree):
    if aTree:
        qList=[aTree]
        while qList:
            nextNode = qList.pop(0)
            if(nextNode):
                print(treeRoot(nextNode))
                qList.append(leftSubTree(nextNode))
                qList.append(rightSubTree(nextNode))
For our example, a call to bfSearch(aTree) prints the characters stored in the nodes
in the following order: A B C D E F G.
Sequential Search
In a linear or sequential search, we walk through the list, comparing each element in
turn to the user-given key value x, until we find an element equal to x, or until we
reach the end of the list. A Python implementation is as follows:
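The listing is not fully reproduced in this excerpt; a minimal sketch of such an unordered sequential search, returning whether the key was found, might look like this (the function name is illustrative):
def seqSearch(numList, keyValue):
    index = 0
    success = False
    while index < len(numList):
        if numList[index] == keyValue:   # found the key
            success = True
            break
        else:
            index += 1
    return success

print(seqSearch([23, 34, 2, 13, 11, -1, 33, -44], 11))   # True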
The above implementation implicitly assumes that the list is unordered. If the list
being searched is ordered, we can take advantage of this by terminating the search
early, that is, upon finding a value in the list greater than the key being searched for.
def orderedSeqSearch(numList, keyValue):
    index = 0
    success = False
    # stop early once values exceed the key, since the list is ordered
    while index < len(numList) and numList[index] <= keyValue:
        if numList[index] == keyValue:
            success = True
            break
        else:
            index += 1
    return success
A linear search takes O(n) time in the worst case. On an ordered list, the unsuccessful
searches are faster than in the unordered case when there is an early termination.
However, it is still O(n) in the worst case.
Binary Search
For an ordered sequence, there is a better algorithm to search for a key than linear
search. A binary search is based on gradual refinement of the possible interval of indi-
ces within which we need to search. The algorithm first compares the user-defined
search key keyValue with the middle element. If they are equal, it terminates success-
fully, returning the index of the middle element. If keyValue is larger than the middle
element, the lower half of the list is removed from consideration. If keyValue is smaller
than the middle element, the upper half of the list is removed from consideration. In
either case, we continue with another iteration. Since the size of the interval within
which we need to search is halved in each iteration, the algorithm either terminates
successfully or the interval eventually becomes empty; thus, the algorithm has
O(log n) iterations. The Python code is given below.
def binarySearch(numList, keyValue):
    left = 0
    right = len(numList) - 1
    found = -1
    while left <= right:
        mid = (left + right) // 2
        if numList[mid] == keyValue:
            found = mid
            break
        else:
            if keyValue < numList[mid]:
                right = mid - 1
            else:
                left = mid + 1
    return found
Consider the list aList = [-17, -1, 12, 13, 27, 45, 57, 82]. The call binarySearch(aList,
13) returns 3 since the key 13 is in index position 3. The call binarySearch(aList,28)
returns −1 because 28 is not present.
T(n) = 1 if n = 1
T(n) = T(n/2) + 1 otherwise
This recurrence solves to T(n) = O(log n).
Sorting is another fundamental algorithmic problem. Among its many uses are the following:
• a solution to the “togetherness” problem. Sorting helps in grouping items with the
same key value together.
• matching two sets of items. Comparisons become easier if the items are sorted.
• searching for information by key values. For instance, a binary search is only applicable
on a sorted sequence.
Insertion Sort
Insertion sort addresses the problem of inserting a new element into a subsequence of
elements that is already sorted. Assuming that the subsequence is already sorted in
non-decreasing order, the algorithm starts from the end of the subsequence and
moves backward, looking for the correct place to insert the new element (Cormen et al.,
2009). If the input is stored in an array aList, the idea is successively applied to
aList[0..i] for 0≤ i ≤ n-2. Once the subsequence aList[0..i] is sorted, we try to
insert aList[i+1] at an appropriate position, shifting elements to make space. A
Python implementation is as follows:
def insertionSort(aList):
    seqLen = len(aList)
    for index in range(1, seqLen):
        toInsert = aList[index]
        j = index
        while j > 0:
            if(toInsert >= aList[j-1]):
                break
            aList[j] = aList[j-1]
            j -= 1
        aList[j] = toInsert
aList = [12,3,22,44,15,13,7,45,77,33]
insertionSort(aList)
print(aList)
The figure illustrates some intermediate steps of “insertion sort” on an example. The
subsequence [3, 12, 22, 44] is already sorted. The algorithm first tries to insert 15
into this subsequence. The positions where the algorithm tries to place the number are
circled. This is followed by insertion of 13 into the subsequence [3,12,15,22,44].
Insertion sort takes O(n²) comparisons and O(n²) exchanges in the worst case, where n
is the size of the input. The number of comparisons can be reduced to O(n log n) by
using a binary search to locate the position where the insertion would take place. The
overall running time is still dominated by the number of exchanges and hence is O(n²).
Bubble Sort
In a “bubble sort,” we make a pass through the sequence comparing consecutive ele-
ments and swapping them if they are not in order. After the first pass, the largest ele-
ment ends up in the last position. If we repeat the process, after the second iteration,
the second largest element ends up in the penultimate position. If we repeat this n − 1
times, where n is the number of elements, the array will be sorted. Also, if during an
iteration we notice that no interchanges take place, we can conclude that the sequence
is already in order and terminate the algorithm (Cormen et al., 2009). A Python imple-
mentation is as follows:
def bubbleSort(aList):
    seqLen = len(aList)
    swapped = True
    for lastIndex in range(seqLen-1, 0, -1):
        if not swapped:
            break
        swapped = False
        for k in range(0, lastIndex):
            if aList[k] > aList[k+1]:
                aList[k],aList[k+1]=aList[k+1],aList[k]
                swapped = True

aList = [12,3,22,44,15,13,7,45,77,33]
bubbleSort(aList)
print(aList)
The figure shows some intermediate steps of running a bubble sort on an example.
Adjacent pairs of elements to be exchanged are marked. Also marked are the “locked”
elements, which will no longer be moved because they are already sorted. Bubble sort
takes O(n²) comparisons and O(n²) exchanges in the worst case, where n is the size of
the input.
Selection Sort
“Selection sort” is similar to bubble sort in that the i-th largest element is located in
the i-th iteration and moved to its correct destination. It differs from bubble sort in
that selection sort performs exactly one exchange per iteration. It locates the element
to be moved first and moves it to its correct destination with a single swap (Goodrich
et al., 2013). A Python implementation is depicted below:
def selectionSort(aList):
    seqLen = len(aList)
    for lastIndex in range(seqLen-1, 0, -1):
        maxIndex = 0
        for k in range(1, lastIndex + 1):
            if aList[k] > aList[maxIndex]:
                maxIndex = k
        aList[lastIndex], aList[maxIndex] \
            = aList[maxIndex], aList[lastIndex]

aList = [12,3,22,44,15,13,7,45,77,33]
selectionSort(aList)
print(aList)
The figure above shows some of the steps in running a selection sort on an example.
Elements to be exchanged are circled. Note that, unlike bubble sort, selection sort
exchanges pairs of elements that may or may not be adjacent. Selection sort takes
O(n²) comparisons and O(n) exchanges in the worst case, where n is the size of the
input.
Quicksort
Quicksort is one of the most popular sorting algorithms. It works by choosing a pivot
element p and partitioning the sequence into two groups of elements: the elements
x ≤ p and elements x ≥ p. The algorithm then recurses into the two partitions. A
Python implementation using the first element of the subsequence being sorted as the
pivot is given below:
def qSort(aList, left, right):
    if left >= right:
        return
    pivot, partIndex = aList[left], left           #first element as pivot
    for k in range(left + 1, right + 1):
        if aList[k] < pivot:                       #grow the part smaller than the pivot
            partIndex += 1
            aList[partIndex], aList[k] = aList[k], aList[partIndex]
    aList[left], aList[partIndex] = aList[partIndex], aList[left]
    qSort(aList, left, partIndex - 1)
    qSort(aList, partIndex + 1, right)

def quickSort(aList):
    seqLen = len(aList)
    qSort(aList, 0, seqLen - 1)

aList = [12,3,22,44,15,13,7,45,77,33]
quickSort(aList)
print(aList)
The figure shows some steps of a quicksort on an example. If the pivot element always
creates a balanced partition, the recurrence for the running time T(n) is as follows:
T(n) = a if n = 1
T(n) = 2T(n/2) + bn otherwise
This recurrence solves to T(n) = O(n log n).
However, the partition may not always be balanced, and quicksort has a worst-case
running time of O(n²). We could get an O(n log n) worst-case algorithm if we could
always generate equal-sized partitions, which is theoretically possible by using the
O(n) median-finding algorithm (Cormen et al., 2009). However, this algorithm is complex
and is never used in practical scenarios. Using a random pivot, however, an
O(n log n) expected running time can be achieved for a randomized quicksort algorithm
(Cormen et al., 2009). However, the generation of pseudo-random numbers is an
expensive process and slows down the algorithm. So, a compromise often used in practice
is to use the median of three elements as the pivot (Miller & Ranum, 2013).
Merge Sort
Like quicksort, merge sort is also a divide-and-conquer algorithm. The sequence is divided
into two equal parts, which are then sorted recursively. The two sorted subsequences
are then merged to create a sorted version of the original sequence (Cormen et al.,
2009). The Python implementation below first defines the merge function for merging
two sorted Python lists. The merge function is invoked within the recursive mergeSort
function.
def merge(A,B,C):
    a=b=0
    la, lb, lc = len(A), len(B), len(C)
    while(a+b < lc):
        if((b==lb) or ((a < la) and (A[a]<B[b]))):
            C[a+b],a,b=A[a],a+1,b    #Select from A
        else:
            C[a+b],a,b = B[b],a,b+1  #Select from B
    return C

def mergeSort(aList):
    seqLen = len(aList)
    if seqLen <= 1:
        return
    mid = seqLen//2
    lower = aList[:mid]  #Copy lower half
    upper = aList[mid:]  #Copy upper half
    mergeSort(lower)     #Sort lower half
    mergeSort(upper)     #Sort upper half
    aList = merge(lower,upper,aList)

aList = [12,3,22,44,15,13,7,45,77,33]
mergeSort(aList)
print(aList)

bList = [3,12,15,22,44,7,13,33,45,77]
merge(bList[:5],bList[5:10],bList)
print(bList)
The figure shows steps of merge sort applied to an example. The shaded subsequences
are being merged into bigger subsequences. Merge sort creates almost balanced parti-
tions, where the sizes of the two partitions differ by at most one. Its running time is
O(n log n) in the worst case.
Using the Spyder IDE to step through the code, one can study the sorting algorithms in
detail. In the figure, the contents of the Python list being sorted are shown in the varia-
ble explorer (top-right pane) and are also printed in the Python console (bottom-right
pane). The editor is on the left pane.
Tries
Also known as digital search trees, “tries” are important data structures in information
retrieval. Instead of a search method based on comparisons between elements, tries
attempt to take advantage of the representations of the elements as a sequence of
characters or digits (Goodrich et al., 2013).
Standard tries
Let Σ be an alphabet. Let S be a set of strings from Σ with total length n satisfying the
prefix property (the prefix property states that no string is a proper prefix of another).
We define a trie over S to be a tree satisfying the following properties (Goodrich et al., 2013):
• Each edge is labeled with a character from Σ.
• Each node has, at most, |Σ| children.
• Edges connecting a node to its child nodes are all labeled differently.
• The number of leaf nodes is exactly |S|.
• Each leaf node v is associated with a string that is the concatenation of the characters
on the path from the root to v.
• The total number of nodes in the trie is at most n + 1.
• The height of the trie is the same as the size of the longest string in S.
A Python implementation of a standard trie using nested dictionaries is shown below.
class Trie:
    def __init__(self):
        self._top = dict()         #Create top level dictionary
    def buildTrie(self,aList):
        for word in aList:
            d = self._top
            for letter in word:
                if letter not in d:    #no entry for letter
                    d[letter] = dict() #create entry
                d = d[letter]          #descend subtree by letter
    def searchTrie(self,word):
        d = self._top
        for letter in word:
            if letter not in d:        #no entry for letter
                print("Not Found")
                return False
            d = d[letter]              #descend subtree by letter
        print("Match Found")
        return True
    def printTrie(self):
        print(self._top)
aList = ["all","aloud","above","at","about"]
trial = Trie()
trial.buildTrie(aList)
trial.printTrie()
trial.searchTrie("aloud")
trial.searchTrie("albeit")
trial.searchTrie("abo")
{'a': {'l': {'l': {}, 'o': {'u': {'d': {}}}}, 'b': {'o': {'v': {'e': {}},
'u': {'t': {}}}}, 't': {}}}
Match Found
Not Found
Match Found
Other Structures
Searching on a set of strings can also use standard search algorithms, such as linear
search and binary search. We need to define the comparison operator suitably. In
Python, the usual operators <, >, ==, >=, and <= work with strings in the sense of lexico-
graphic comparison. To apply binary search on a set of strings stored as a Python list,
we can sort them lexicographically using any of the standard sorting algorithms and
then apply binary search. Likewise, search structures like hash tables and binary search
trees can be used with strings just as they are used with numeric data (Cormen et al.,
2009). There is a modified trie structure called Patricia trie that uses a simple compres-
sion idea to reduce a redundant chain of edges into a single edge (Goodrich et al.,
2013). The Patricia trie takes O S space as opposed to the O n space required by the
standard variant, where n is the total size of all the strings. The Patricia trie for our
example is shown below.
The basic questions we encounter when designing a hashing scheme from a source of
n elements to a table of m locations are (Knuth, 1998):
• which hash function should be used to map keys to table locations, and
• how collisions should be resolved when two keys are mapped to the same location.
The pair (hash function and collision resolution algorithm) together define a hashing
scheme. Note that in hashing, the hash function maps the key values to table indices.
The search algorithm simply looks up the table at those indices. The insert and delete
functions also need to search with the key first and make use of the same hash function.
Hash Functions
Two desirable properties of hash functions are that they should be (a) easy to compute
and (b) able to distribute the keys into table locations with approximately equal proba-
bility (Cormen et al., 2009). In practice, the distribution is difficult to estimate.
Division method
If there are m locations in the hash table numbered 0 .. m − 1, a simple hash function
is h(k) = k mod m. This is called the “division” method. This can be computed
quickly, and the distribution of keys into locations is reasonable when m is prime.
Multiplication method
In the “multiplication” method, we define h(k) = ⌊m ⋅ (kθ mod 1)⌋, where 0 < θ < 1 and ⌊x⌋
is the largest integer less than or equal to x. Results indicate that the values
θ = (√5 − 1)/2 and θ = 1 − (√5 − 1)/2 work well (Knuth, 1998).
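As a small illustration, the following sketch implements both methods for integer keys; the table size and the choice of θ are only examples:
import math

M = 13                                 # number of table locations (a prime)
THETA = (math.sqrt(5) - 1) / 2         # Knuth's suggested value

def hash_division(k):
    return k % M                       # division method: k mod m

def hash_multiplication(k):
    return math.floor(M * ((k * THETA) % 1))   # multiplication method

for key in (10, 22, 31, 4, 15, 28, 17, 88):
    print(key, hash_division(key), hash_multiplication(key))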
Universal hashing
One potential problem with hashing is that if someone chooses all or several keys such
that h(k) is the same for each key, severe collision and consequent performance degradation
takes place. To counter that, the universal hashing scheme chooses a hash function
randomly from a collection of universal hash functions in a way that is independent
of the keys being stored (Cormen et al., 2009). Universal hash functions are a collection
H of hash functions such that, for any pair of keys a and b, the number of hash functions
in H for which a and b map to the same location is at most |H|/m, where m is the number
of memory locations. Although universal hashing distributes the keys satisfactorily on
average, universal hash functions are also expensive to compute.

Collision Resolution Schemes

When keys are mapped to the same location, the collision needs to be resolved. A variety
of collision resolution schemes have been proposed to address this (Cormen et al., 2009).

Chaining
In chaining, each location in the hash table is a linked list of the keys that have been mapped
to that address by the hash function. We create m lists Lᵢ, 0 ≤ i < m. List Lᵢ
stores all the keys that get mapped to location i. Under the “simple uniform hashing
assumption,” any element is equally likely to be mapped by the hash function onto any
of the table locations (Cormen et al., 2009). Under this assumption, a search takes
Θ(1 + α) time, where α = n/m is the load factor, m is the number of locations, and n is the
number of elements. If we maintain α < 2, the search time is constant. A way to maintain
this is to increase the table size and rehash once n = 2m.
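A minimal sketch of a chained hash table using Python lists as the chains; the division hash and the fixed table size are assumptions made only for illustration:
class ChainedHashTable:
    def __init__(self, m=13):
        self.m = m
        self.table = [[] for _ in range(m)]   # one chain (list) per location

    def _hash(self, key):
        return key % self.m                   # division method

    def insert(self, key):
        chain = self.table[self._hash(key)]
        if key not in chain:                  # search first, then insert
            chain.append(key)

    def search(self, key):
        return key in self.table[self._hash(key)]

    def delete(self, key):
        chain = self.table[self._hash(key)]
        if key in chain:
            chain.remove(key)

t = ChainedHashTable()
for k in (10, 23, 36, 4, 15):
    t.insert(k)
print(t.search(23), t.search(99))             # True False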
Open addressing
Under “open addressing,” all items are stored in the hash table directly. For collisions,
we search for alternative positions within the table itself. To insert an element, we
probe a sequence of table locations until an empty slot is found; this probe sequence
is designed to be a permutation of (0, 1, 2, …, m − 1).
To search, we follow the same probe sequence as that used for insertion. If we encoun-
ter an empty slot during the search, we immediately conclude that the element being
searched for is not present in the table because the insert operation followed the
same probe sequence and would not have missed the empty slot. If h(k) is the original
hash function, let g(k, i) denote the i-th location probed. The common algorithms for
generating the probe sequence are as follows (Cormen et al., 2009):
• linear probing: g(k, i) = (h(k) + i) mod m
• quadratic probing: g(k, i) = (h(k) + c₁ ⋅ i + c₂ ⋅ i²) mod m for suitable constants c₁ and c₂
• double hashing: g(k, i) = (h(k) + i ⋅ h₂(k)) mod m, where h₂ is a second hash function
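A small sketch of these probe sequences in Python; the choices of h, h₂, the constants, and the table size m are only illustrative:
M = 13

def h(k):
    return k % M

def h2(k):
    return 1 + (k % (M - 1))           # second hash function for double hashing

def linear_probe(k, i):
    return (h(k) + i) % M

def quadratic_probe(k, i, c1=1, c2=3):
    return (h(k) + c1 * i + c2 * i * i) % M

def double_hash_probe(k, i):
    return (h(k) + i * h2(k)) % M

key = 41
print([linear_probe(key, i) for i in range(5)])
print([quadratic_probe(key, i) for i in range(5)])
print([double_hash_probe(key, i) for i in range(5)])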
Pattern Matching

Naïve pattern matching is a simple brute-force algorithm that tries every possible value
of the shift s of a pattern p of length m within a text t of length n, and checks whether
it is a valid shift, that is, whether the pattern matches the text starting at that position.
There are n − m + 1 possible choices for s. Below is a Python implementation.
def naiveMatch(p,t):
    if not p or not t:
        return 0
    m = len(p)
    n = len(t)
    found = False
    for i in range(n-m+1):
        j=0
        k=i
        while j < m and k < n and p[j]==t[k]:
            j+=1
            k+=1
        if j== m:
            print("Found valid shift", i, "for", p)
            found = True
    if not found:
        print("No match for",p)

naiveMatch('aba','cbabababaa')
naiveMatch('abc','cbabababaa')
With the two nested loops, the running time is O((n − m + 1) ⋅ m). The reason for the
algorithm's inefficiency is that, in the event of a mismatch, partial matches between the
pattern and the text are not exploited later.
The Knuth-Morris-Pratt algorithm (KMP algorithm; Cormen et al., 2009) corrects the
problem associated with the naïve algorithm. If a prefix of p of size r has matched the
text at shift s, followed by a mismatch, we try to determine the longest suffix of the matched
part that is also a prefix of p. The key observation here is that this portion is already a
matched part of the text and need not be matched again. Additionally, some preprocessing
of the pattern can support this computation, as depicted in the figure below.
In this example, the substring abcab of the text matches a prefix of the pattern abcabb.
When the last character of the pattern fails to match, the brute-force algorithm would
try to shift the pattern by one position and attempt a rematch. However, the suffix ab of
the matching substring abcab is also a prefix of the pattern. This substring is already
matched with the substring ab of the text at positions five and six. The pattern is now
realigned (as shown) for a new attempted match, without a required rematch of the prefix
ab. This saves comparisons over the brute-force algorithm. We pre-compute a prefix
function in a table based on the pattern without knowledge of the text. Essentially,
table[j]=k tells us that if the pattern fails to match at position j+1, we can assume
that the first k characters of the pattern are already matched and proceed. A Python
implementation of the prefix table computation is depicted below.
def prefix(p):
    m = len(p)        #size of pattern
    table = [0]*m     #Creates a list of m zeros.
    i = 0
    for j in range(1,m):
        while i > 0 and p[i] != p[j]:
            i=table[i-1]
        if p[i] == p[j]:
            i+=1
        table[j]=i
    return table
def kmp(p,t):
    m = len(p)        #size of pattern
    n = len(t)
    table=prefix(p)
    j=0
    for i in range(n):
        while j > 0 and p[j] != t[i]:
            j=table[j-1]
        if(p[j] == t[i]):
            j+=1
        if j == m:
            print("Match found at", i)
            return i-m+1

kmp('aba','cbabababaa')
The running time for the KMP algorithm is Θ(m + n) (Cormen et al., 2009).
Summary
Tree traversal algorithms involve visiting all the nodes of a tree in a systematic
order. There are four fundamental tree traversal algorithms: inorder, preorder, post-
order, and level-order.
Searching and sorting are fundamental algorithmic problems with broad applica-
tions. A basic linear search comes in two variants: those for unordered and those
for ordered sequences. For ordered sequences, a more efficient algorithm is the
binary search algorithm. Some fundamental sorting algorithms include insertion
sort, bubble sort, selection sort, quicksort, and merge sort. Preprocessing a set of
strings to facilitate efficient search with strings is a common problem in text pro-
cessing applications. The trie is an example of a data structure that stores such
preprocessed strings.
Common collision resolution schemes are chaining and open addressing. Under
open addressing, the different algorithms for generating the probe sequence are
linear probing, quadratic probing, and double hashing.
There are different approaches to the problem of searching for a fixed prepro-
cessed pattern string in a block of text. The naïve algorithm runs a sliding window
of the pattern across the text trying to discover matches. The Knuth-Morris-Pratt
algorithm, which constructs a prefix table to record information about prefixes of
the pattern that occur within it, enables us to get a faster solution than the naïve
one.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 5
Measuring Programs
STUDY GOALS
… understand tools to generate documentation and apply knowledge of the best practices
in documentation sharing.
DL-E-DLBCSL01-U05
5. Measuring Programs
Introduction
Measurements are of paramount importance in any scientific process. Observations
based on measurements lead to generalizations and the development of theories that
facilitate the implementation of the process. The software development process,
including design, coding, debugging, testing, verification, and integration, has also
benefited from the development and application of measurement methodologies.
Different types of metrics have evolved to capture and measure various aspects of pro-
grams. Some metrics attempt to capture features of the product such as complexity,
size, or performance. One such measure is “cyclomatic complexity.” Based on the cyclo-
matic number in graph theory, this metric tries to measure the difficulties involved in
testing and understanding a program.
“Process metrics” are those that target improvements of the software development
process and maintenance. A primary goal in the software development process is to
ensure that the implementation meets the requirement specifications. To achieve this
goal, “code coverage” was one of the first metrics developed for software testing. It tries
to measure to what extent the program is covered by the test cases. This is defined in
terms of various criteria, such as lines of code, instructions, functions, function calls, or
branches, which are expected to resemble a representative usage. A combination of
instruction coverage and branch coverage is commonly used today, and test coverage is
an important consideration in equipment certification in the avionics and automotive
industries.
Difficulties in Python
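The listing discussed here is not reproduced in this excerpt. A sketch consistent with the discussion (a randomly generated argument num, with the type of a depending on its parity) might look as follows; the exact code and the line numbering are assumptions:
def typeCheck(num):                    # line 1
    if num % 2:                        # line 2
        a = 123                        # line 3
    else:                              # line 4
        a = "123"                      # line 5
    print(type(a))                     # line 6: the type of a depends on num

import random                          # line 7
typeCheck(random.randint(1, 100))      # line 8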
What is the type of a in line number six? Here, if num is odd, the type of a is “integer.” If
num is even, the type of a is “string.” Since the argument to the function typeCheck is
randomly generated, its parity becomes known only at runtime. Another difficulty is
that Python allows new code generation at runtime.
Type Inferencing in ML
Type inference has a long history in the context of functional programming languages.
Practical type inferencing was applied to the programming language “meta language”
(ML) by Robin Milner (Sebesta, 2016). ML is primarily a functional programming lang-
uage with support for the imperative style of programming. It has a syntax similar to
many imperative languages and is strongly typed, with all types being statically infer-
red. In ML, type declarations are not required if the types can be derived unambi-
guously. Standard ML (S ML) is a modern dialect of ML. Saarland Online S ML (SOSML) is
the online integrated development environment (IDE) for S ML developed at Saarland
University. Consider the computation of the semi-perimeter of a rectangle as the sum
of its width and height in SOSML. The following definitions are all equivalent, and all
produce the correct result whenever the type (real) of at least one of either width,
height, or the function, is specified. The type inference mechanism infers the missing
types as real.
If none of the types are specified, all the types default to integer, and the program rep-
orts an error when invoked with real parameters.
Statically typed languages obey a static type system, and the type system rules can be
checked at compile time. Declaring all variables with designated types and requiring
that expressions have well-defined types are ways to ensure that type system rules can
be verified at compile time. However, this is too conservative and comes at a price.
Consider the following Python code fragment:
x=1
if(0==1):
    x="2+3"
else:
    x=x+2
print(x)
This executes without error in Python, and the value of x is correctly printed as 3. The
if branch is not executed and so does not interfere with the rest of the computation.
However, because the types assigned to x in the two branches of the conditional statement
differ, a static type checker would have flagged an error. Thus, static type checking turns
out to be more conservative, which can be an advantage for a clearer reading but can also
be seen as a lack of flexibility.
Cyclomatic Complexity
“Cyclomatic complexity” is an example of a predictor, or product metric (a product metric
is a software metric associated with the software itself, as opposed to control metrics,
which are associated with software processes). It is a measure of complexity in a program.
While there are many complexity measures, it is important to choose one that is largely
independent of implementation characteristics such as source formatting and programming
language. This measure was originally proposed by Thomas McCabe (Kan, 2016), and tries
to quantify the testability and maintainability of software. For example, to measure the
complexity of the control structure of a program, we consider its control flow graph.
Cyclomatic complexity is the number of linearly independent paths through this graph.
Mathematically, the cyclomatic complexity CC(G) of a control flow graph G with e edges,
v vertices, and k components is defined as follows (Kan, 2016):
CC(G) = e − v + 2k
This also represents the minimum number of paths whose linear combination can
generate all possible paths in the graph. Complexity is a major cause of software errors.
Studies have shown that cyclomatic complexity has a high correlation with errors in
software (Watson & McCabe, 1996).
The control flow graph for the above is the graph G1 below:
If we add the following (redundant) code just before the return, the control flow graph
changes to the graph G2 below.
else:
maxNum = num2
For G2, CC(G2) = e − v + 2 = 8 − 7 + 2 = 3.
Simplified calculations
A straight-line control flow graph with one start node and one exit node has a complexity of
e − v + 2 = 1. If we add p binary decision predicates, they add p to the cyclomatic complexity,
since each decision predicate adds two edges and one vertex, which adds one
to the cyclomatic complexity. Thus, a control flow graph in which all predicates are binary
predicates has a cyclomatic complexity of p + 1, where p is the number of binary decision
predicates. For example, the testMax function in the radonTest.py listing below contains
two binary decision predicates, so its cyclomatic complexity is 2 + 1 = 3.
For a planar graph (a graph that can be drawn on the plane without any edge crossings),
Euler's formula relates the number of edges e, the number of vertices v, and the number
of regions r in the planar embedding of the graph (Rosen, 2019): r = e − v + 2. Since
CC(G) = e − v + 2, CC(G) = r. Thus, if the control flow graph is planar, the cyclomatic
complexity is the number of regions in the planar drawing of the graph.
Cyclomatic complexity using Radon
Radon is a Python tool that computes various software metrics from source code. Met-
rics computed include cyclomatic complexity, raw metrics related to the number of
lines, Halstead metrics, and maintainability index, among others.
aList = [23,34,2,13,11,-1,33,-44]

def linSearch(numList, keyValue):
    index = 0
    while(index < len(numList)):
        if(keyValue == numList[index]):
            return index
        index += 1
    return -1

def typeCheck(num):
    if(num%2):
        x = 123
    else:
        x = "123"
    print(type(x))

def testMax(num1, num2, num3):
    if(num1 > num2):
        maxNum = num1
    else:
        maxNum = num2
    if(num3 > maxNum):
        maxNum = num3
    return maxNum
The contents of the above file are fed to Radon's ComplexityVisitor application
programming interface (API). An application programming interface, or API, serves as an
intermediate layer that allows applications to communicate.

from radon.visitors import ComplexityVisitor
f=open("radonTest.py","r")
v = ComplexityVisitor.from_code(f.read())
f.close()
print(v.functions)
This prints the output below. The complexity value refers to cyclomatic complexity
(Watson & McCabe, 1996):
In Radon, the cyclomatic complexity score is converted to a rank using an equation in
which H is the Heaviside step function. The rank in turn is converted to one of the letter
grades A, B, C, D, E, and F, with A for a rank = 0 and F for a rank ≥ 5; A is indicative of a
simple block and F indicates a very high-risk block. The higher the cyclomatic complexity
is, the more complex the code will be. Such code is likely to be prone to coding errors
and may be unstable, requiring frequent modifications and bug fixes.
The data dependency complexity, within and across modules, is captured by the data
flow metrics. These measures are useful in practice. Dunsmore's “data flow complexity”
is defined as the average number of live variables per statement in a block of code
(Chung, 1990). A variable is said to be live between its first and last references within a
function.

Chung (1990) redefined data flow complexity based on live variable assignments.

A definition of a variable v occurs in a statement whenever there is an assignment of a
value to v. A definition-clear path to v is a path in which v is not reassigned. The definition
of a variable v reaches the top of a block b of code if, and only if, there is a definition-clear
path from the definition of v to the top of block b. Analogously, we can
describe the notion of the definition of v reaching the bottom of b. The definition of v is
live at the top of b if the definition reaches the top of b and it is referenced later. Variable
v is live at the top or bottom of a block b if there is a live definition of v. The total
number of live variables in a block, or the total number of live definitions of all live
variables in the block, are suitable complexity measures.
Tools
Doxygen
Originally developed for C++, Doxygen can be used to generate documentation for C, C#,
Java, Python, and PHP. It can generate a HyperText Markup Language (HTML) file for
online browsing or a LaTeX file for creating an offline manual. It can also be used to
derive structure from code and to generate dependency graphs and collaboration dia-
grams.
Sphinx
Sphinx is a popular and comprehensive document generator for Python but is also
used for other languages. It generates automatic cross-referencing links for functions
and classes and creates indices. It also allows customization through user-defined indi-
ces. It uses the powerful reStructuredText markup language (Garcia-Tobar, 2017), which
is the basis of the readthedocs.io website.
Javadoc
Javadoc is used to generate HTML pages from Java source files and to parse declaration
and documentation comments in Java source files. The HTML pages describe the public
and protected classes, interfaces, nested classes, methods, and constructors. The java-
doc command can be run on entire packages or individual source files.
Swagger
Swagger is an “interface description language” for describing RESTful APIs used to communicate with web services. Swagger Core generates an OpenAPI interface from existing Java code. The documentation can be generated automatically from the API definition.

RESTful APIs
A RESTful API is an API conforming to the REST software architectural style, which defines a set of rules for creating web services.

pdoc
pdoc is used for generating Python documentation; it is simpler than Sphinx and has minimal setup requirements. The documentation is simply written in markdown. Moreover, pdoc automatically links identifiers in Python docstrings to the corresponding documentation. Source code of functions and classes can be viewed in HTML.
Pydoc
Pydoc is an online help system and document generator in Python. The document may be created as text or HTML and is derived from docstrings. In Python, docstrings help to embed documentation into the source code (Goodrich et al., 2013), and are demarcated by triple quotes (“””) at the beginning and end. There are various ways to retrieve the documentation. For example, help(obj) for any object obj generates the corresponding documentation. Alternatively, the documentation could also be retrieved using repr(linSearch.__doc__).

Docstring
A docstring is any string appearing as the first statement of a class, a member function of a class, a function, or a module in Python.

Below is an example using help(obj):
aList = [23,34,2,13,11,-1,33,-44]
def linSearch(numList, keyValue):
    """Search for keyValue in numList.
    Args:
        numList: a list of values
        keyValue: the value to search for
    Returns: the index of keyValue in numList, or -1 if not found.
    """
    # The body and the remainder of the docstring are reconstructed; the
    # original listing is truncated in this extract.
    for i in range(len(numList)):
        if numList[i] == keyValue:
            return i
    return -1
print(linSearch(aList,11))
help(linSearch)
Best Practices
Various best practices have been established by the Python community over time, facil-
itating code maintainability. Similar guidelines exist for any documentation generator
based on code. Guidelines for using docstrings in Python, for example, were documen-
ted in PEP 257 (Goodger & van Rossum, 2010). Recommended best practices include
• documenting modules. Each module should start with a top-level docstring that
outlines the purpose of the module. The subsequent paragraphs should describe
the module operations.
• documenting classes. There should be a class-level docstring for every class,
describing the purpose and operations. It should also describe the public attributes
and methods, and provide guidance for deriving subclasses, including information
on attributes and methods.
• documenting functions. This is similar to modules and classes. Additionally, there
should be explanatory entries for function arguments, return values, and any spe-
cial behaviors.
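As a short illustration of these guidelines (this example is not from the original text), a small module could be documented as follows:

"""Utilities for working with simple plane figures.

The module provides a Circle class for computing areas of circles.
"""
import math

class Circle:
    """A circle defined by its radius.

    Public attributes:
        radius: the radius of the circle as a float.
    """
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        """Return the area of the circle."""
        return math.pi * self.radius ** 2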
Issues in Sharing
Issues in sharing documentation concern several aspects:
• what the documentation contains. These include the qualities of being correct, com-
plete, and current.
• how the content is written and organized. These include ease of use, readability, and
usefulness for the intended purpose. For instance, a relevant issue is whether the
documentation can be understood by the intended audience. This forces documen-
tation authors to declare the expected knowledge of their readers. For example, the
target audience might be newbies in a community of developers.
• documentation generation tool and documentation processes.
alist=[]
pi=3.14
twoPi=2*3.14
for index in range(0,20):
    alist.append(twoPi*index)
print(alist)

a=2
b=3
y=3**a + 3**a*b
print(y)

flag=True
if flag:
    a += 1
else:
    b+=1
Reduction in strength
Often operations can be replaced by more efficient alternatives (Aho et al., 2007). For
instance, x**5 is more efficient than making a function call pow(x,5).
Loop unrolling
Since condition checking of a loop is an overhead, if the loop runs for a small constant
number of times, it is more efficient to eliminate the loop construct and instead repeat
the code the required number of times (Aho et al., 2007).
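As an informal illustration (not part of the original text), the loop below runs a small, fixed number of times and can therefore be unrolled:

values = [3, 1, 4]
total = 0
# Loop form: the loop condition is checked on every iteration.
for i in range(3):
    total += values[i]
# Unrolled form: the loop overhead is eliminated for the fixed trip count.
total2 = values[0] + values[1] + values[2]
print(total, total2)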
Difficulties
To measure coverage, we need to first identify what part of the software is under con-
sideration: a file, module, library, or system.
The Metrics
The coverage can be counted at various levels of granularity in terms of the following
(Qian Yang et al., 2009):
Python Tool
Let’s look at the popular tool Coverage.py, which provides support for measuring code
coverage in Python, and illustrate its usage with an example. Consider the following
Python function to find the highest power of a given number that is a factor of a
second given number:
Let us assume that this function is in a file factors.py. Let us create another file
test_factors.py with the following code:
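The two listings referred to above are not reproduced in this extract. A minimal sketch consistent with the description might look as follows; the function name highest_power and the test values are assumptions. Coverage could then be measured with coverage run -m unittest test_factors followed by coverage report.

# factors.py (hypothetical reconstruction, not the original listing)
def highest_power(base, number):
    """Return the highest power k such that base**k divides number."""
    if base <= 1:
        raise ValueError("base must be greater than 1")
    power = 0
    while number % base == 0:
        number //= base
        power += 1
    return power

# test_factors.py (hypothetical reconstruction)
import unittest
from factors import highest_power

class TestFactors(unittest.TestCase):
    def test_highest_power(self):
        self.assertEqual(highest_power(2, 24), 3)  # 2**3 divides 24, 2**4 does not
        self.assertEqual(highest_power(3, 24), 1)
        self.assertEqual(highest_power(5, 24), 0)

if __name__ == "__main__":
    unittest.main()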
Unit Testing
Unit testing includes testing individual functions, classes, and methods with different
parameters. The wording “unit” testing does not imply that a notion of unit is defined
in the programming languages, but simply that it refers to a small part that can be tes-
ted. When testing a class, all parameters need to be checked, all attributes need to be
set, and all values verified. When using inheritance, operations need to be verified in
subclasses as well. Unit tests have a dual role: They should demonstrate the correct
expected behavior and also discover bugs.
Some accepted best practices for creating test cases (Whittaker, 2009) are as follows:
Consider a supermarket that maintains a list of their customers and awards points to
them from time to time during promotional offers. The points can be redeemed against
purchases. Consider an example: During one such promotion, the supermarket, Green
and Fresh, decides to award 50 points to all customers in the age group of 18 to 25
whose point balance is currently nil. We define a Python class for the customer data,
and also define a method offer() to calculate if an offer is being made to a customer
and, if so, update their points balance. The code is as follows:
import unittest

class CustData:
    def __init__(self, ID, age):
        self._ID = ID
        self._age = age
        self._points = 0
    def get_ID(self):
        return self._ID
    def get_age(self):
        return self._age
    def get_points(self):
        return self._points
    def update_points(self,r):
        self._points+=r
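The offer() method described above is not reproduced in this extract. A minimal sketch that is consistent with the test further below could be added to the CustData class as follows (the parameter order minAge, maxAge, points is an assumption):

    # Hypothetical reconstruction of offer(), not the original listing:
    # award `points` if the customer's age lies in [minAge, maxAge] and the
    # current balance is zero; return True if the offer was made.
    def offer(self, minAge, maxAge, points):
        if minAge <= self._age <= maxAge and self._points == 0:
            self.update_points(points)
            return True
        return False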
For testing purposes, we create a dataset of four customers. We add 50 points to the
balance of the customer with the ID 1322.
custList=[]
custList.insert(0,CustData(1555,18))
custList.insert(1,CustData(1322,23))
custList[1].update_points(50)
custList.insert(2,CustData(1687,25))
custList.insert(3,CustData(3231,53))
The first three customers are in the target age group for the current promotion, but the
second customer already has a non-zero balance and hence will not receive an offer.
The fourth customer is not in the offer’s target group. We create four tests to capture
this behavior, using Python “assertions.” Assertions are statements that must be true in
a program. The Python assert statement has an associated condition and an optional
error message. If the condition is not satisfied, the program halts and reports an
AssertionError. The optional error message, if specified, is also printed. The code is
as follows:
def test_offer():
    assert custList[0].offer(18,25,50) == True, \
        "test_offer0_FAIL"

test_offer()
The following statements allow us to run this from the command line with the com-
mand python -m unittest:
if __name__ == '__main__':
    unittest.main()
Integration Testing
In integration testing, previously tested individual units are integrated into larger com-
ponents with the focus being on testing the program with its interfaces (Somerville,
2016). In software development, several interacting objects are combined into larger
components. Access to the object functionality is achieved via component interfaces.
Assuming unit testing on individual objects has been performed, the focus is then on
testing the interfaces and the components as a group to determine if they work
together as required (Sommerville, 2016). The following interfaces should be tested:
• parameter interfaces through which components exchange data and function refer-
ences
• shared memory interfaces in which components share a block of memory
• procedural interfaces wherein one component encapsulates procedures or func-
tions that are called by other components
• message passing interface
Errors can result from the calling component passing wrong parameter types, an incor-
rect number of parameters, or parameters in an incorrect order. Errors can also occur
due to the calling component not sending parameters satisfying some required proper-
ties. For instance, the calling component may invoke a function with an unordered list
when an ordered list is required.
Interface testing can be difficult since any defects may show up only under certain con-
ditions depending on the behavior of other components.
Memory Profiler
The memory usage of a process is monitored by a Python module called the Memory
Profiler. A line-by-line analysis of memory consumption is generated by the Memory
Profiler. It is built on top of the Python library psutil (process and system utilities),
which monitors and retrieves information on running processes and system utilization
(Python Software Foundation, 2021b).
To use the Memory Profiler, we can invoke Python from the command line as python -m memory_profiler <script name>.
With the decorator @profile, the functions being profiled can be marked. Here is an
example usage:
@profile
def profileAnalysis():
    a = [0] * (10**7)
    b = a.copy()
    c = a[:]
    del c
    del b
    return a

if __name__ == '__main__':
    profileAnalysis()
Note that in the second case, as we increased all list sizes by an order of magnitude,
the corresponding numbers also increased in the “Increment” column, which shows the
increase and decrease in memory usage as lists are dynamically created and deleted.
To generate and plot the memory usage over time, we can invoke the Memory Profiler from the command line, first with mprof run <script name> to record the usage and then with mprof plot to display it.
The resultant plot for the above program is depicted in the figure below.
@profile
def profileAnalysis():
    a=[]
    b=[]
    for i in range(0,10**5):
        a.append(0)
        b.append(0)
    c = a.copy()
    d = a[:]
    del d
    del c
    del b
    return a
In this case, both c and d are copies of a: a.copy() and the slice a[:] each create a new list, so creating them requires incremental memory. Since b, c, and d all point to independent memory, their deletions release memory. Note that the statistics would vary between runs. The internal implementation of the Python list does not allocate memory for each append; how the memory is allocated and freed depends on internal memory management algorithms.
Summary
Despite efficiency being a key concern for programmers, code can often be opti-
mized automatically. Compiler optimization techniques include local and global
optimization, making small functions inline, taking repetitive computations outside
loops, eliminating common subexpressions, eliminating redundant stores, eliminat-
ing unreachable code, reduction in strength, and loop unrolling.
“Code coverage” is a software metric that tries to quantify to what extent the soft-
ware is verified by measuring the degree to which a suite of tests exercises a soft-
ware system. The Python tool Coverage.Py can be used for this purpose.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 6
Programming Languages
6. Programming Languages
Introduction
Over the years, many languages have been designed and implemented. The study of
programming languages is not just about studying the syntax of individual languages in
isolation, though. When mapping a problem’s solution to a computer program, the pro-
grammer must choose an appropriate programming language and then express the sol-
ution efficiently in the chosen language.
While languages support a wide variety of features, a few programming styles, or para-
digms, common to many languages have gradually emerged and evolved. Each para-
digm has its distinct advantages and disadvantages and is more suitable for certain
types of applied algorithms than others. Each paradigm also requires support in the
form of certain programming language features for effective usage. There are program-
ming languages that are exclusively meant for programming in one paradigm. Haskell,
for example, is a purely functional language. Some languages primarily provide support
for one paradigm more than other paradigms. For instance, Lisp is primarily a func-
tional programming language, although modern dialects of Lisp have features of imper-
ative programming. Many languages like Python or C++ are deemed to be multi-para-
digm and can be used to implement programs in different paradigms according to the
requirement.
There are concepts, such as “lazy evaluation” that are pervasive across languages and
can lead to code improvement, if used appropriately. Within the same language, there
are often alternative features to choose from, such as multiple loop constructs, for
example. A good understanding of different programming paradigms and certain key
concepts and features that are common to many programming languages is extremely
useful for good programming.
The paradigms considered in this unit are the following:
• imperative programming,
• object-oriented programming,
• functional programming,
• logic programming,
• programming for streaming data, and
• event-driven programming.
Imperative Programming
partition1.py
L=[11,59,26,17,2,1,25,9,3,15]
pivot = 11
i=0
j=len(L)-1
while True:
    while (i <= j) and (L[i] <= pivot):
        i+=1
    while (i <= j) and (L[j] > pivot):
        j-=1
    if(i <= j):
        L[i],L[j] = L[j],L[i]
    else:
        break
print(L)
This code can be placed inside a function, with the list and pivot as parameters.
partition2.py
def partition(L,p):
    i=1
    j=len(L)-1
    while True:
        while (i <= j) and (L[i] <= p):
            i+=1
        while (i <= j) and (L[j] > p):
            j-=1
        if(i <= j):
            L[i],L[j] = L[j],L[i]
        else:
            break
    return L

aList=[11,59,26,17,2,1,25,9,3,15]
partition(aList,11)
Object-Oriented Programming
In the object-oriented paradigm, the first task is to identify the fundamental objects in
the design. Then, an abstraction is created, keeping the implementation details hidden.
Language features supporting the object-oriented paradigm facilitate the creation of
classes to implement these objects.
Let’s revisit the same problem of partitioning and consider a Python implementation
using the object-oriented paradigm. We encapsulate our solution in a class called Sco-
res. Then, we invoke the constructor for Scores and create an object called bList,
which is an instance of the class Scores. Finally, we invoke the partitioning method by
a call to bList.part(). The code is as follows:
partition3.py
class Scores:
    def __init__(self,L):
        self.S = L
    def part(self,p):
        i=0
        j=self.size()-1
        while True:
            while (i <= j) and (self.S[i] <= p):
                i+=1
            while (i <= j) and (self.S[j] > p):
                j-=1
            if(i <= j):
                self.S[i],self.S[j] = self.S[j],self.S[i]
            else:
                break
        self.S[0],self.S[j] = self.S[j],self.S[0]
        return self.S
    def isEmpty(self):
        return self.S == []
    def size(self):
        return len(self.S)

bList = Scores([11,59,26,17,2,1,25,9,3,15])
print(bList.part(11))
Functional Programming
In Python, an anonymous function can be expressed as a lambda and applied directly to an argument:

lambda a : a + 13
(lambda a : a + 13)(11)
The map function in Python has a syntax map (f, iter), where iter represents one or
more iterables and f is a function that is applied to each element of the iterables.
The reduce function has a syntax reduce(f, iter[,initial]), where iter repre-
sents one iterable and f is a two-argument function that is cumulatively applied to
each of its elements. This can be used, for example, to apply an aggregation function,
such as sum, to the elements in a list. The optional argument [,initial] can be used
to specify an initial value of the aggregation.
The filter function in Python has a syntax filter(f, iter), where iter represents
an iterable and f is a Boolean-valued function that is applied to each element of the
iterable. Only those items x in iter for which f(x) is true are output.
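As a brief illustration (this snippet is not part of the original text), the following applies map, reduce, and filter to a small list:

from functools import reduce  # in Python 3, reduce is provided by functools

nums = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x * x, nums))        # [1, 4, 9, 16, 25]
total = reduce(lambda a, b: a + b, nums, 0)       # 15
evens = list(filter(lambda x: x % 2 == 0, nums))  # [2, 4]
print(squares, total, evens)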
Returning to the partitioning problem, we first apply filter with a lambda function to select the elements x <= pivot. This is followed by another lambda function to select the elements x > pivot. Finally, we concatenate the two results and wrap them in another lambda function to obtain our final solution, as follows:
partition4.py
aList=[11,59,26,17,2,1,25,9,3,15]
pivot=11
ans=(lambda L,p: list(filter(lambda x: x<= p,L)) + \
list(filter(lambda x: x> p,L)))(aList,pivot)
print(ans)
Programming for Streaming Data
Objects that support looping through a sequence of values are called “iterables.”
Examples in Python include data structures for collections, as well as lists, tuples, sets,
and dictionaries. Functions that iterate through an iterable object are known as “itera-
tors.” One problem associated with iterables is that all data need to be stored in
memory before we can iterate through them in a loop. There are situations where we
may not need the entire sequence after all and may break out of the loop after proces-
sing a few elements.
A “data stream” is a sequence of data items that are available one at a time. Python
supports processing such streams of data by using a construct called the “generator”
(Goodrich et al., 2013). Consider the following function to generate the sequence of
Fibonacci numbers.
fibGen.py
def generateFib():
    one = 0
    other = 1
    while (1):
        yield one
        another = one + other
        one = other
        other = another

gen = generateFib()
getLessThan(gen, 100)
The statement gen = generateFib() creates a generator for all Fibonacci numbers.
The function call next(gen) gets the next item from the stream of Fibonacci numbers
and the function call getLessThan(gen,100) gets all the Fibonacci numbers less than
100.
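The helper getLessThan(), used above, is not shown in this extract. A minimal sketch consistent with the description could be:

def getLessThan(gen, limit):
    # Pull values from the generator only until one reaches the limit.
    result = []
    for value in gen:
        if value >= limit:
            break
        result.append(value)
    return result

print(getLessThan(generateFib(), 100))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]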
Event-Driven Programming
event.py
from tkinter import *

def clickButton():
    print("Button Clicked")

w = Tk()
l = Label(w, \
    text = "Event-Driven Programming")
b = Button(w, \
    text = "Click here",command=clickButton)
l.pack()
b.pack()
w.mainloop()
A label and a button are created and displayed as shown below. We define a function
clickButton() that is bound to the defined button. When the user clicks the button,
the program processes this event via the callback function clickButton (). The pro-
gram then prints “Button Clicked.”
All programs are finally converted to machine language, which is a set of basic instruc-
tions in binary code. The CPU runs the program in a sequence of “fetch” and “execute”
commands: Fetch the next instruction, execute it, and repeat the process in a cyclic
manner. Some instructions are control instructions, which determine the order in which
the CPU executes the instruction sequence. This may require the CPU to go back to an
earlier instruction in the sequence while executing a loop. It may also skip certain
statements while executing a conditional instruction.
The CPU allocates an area in RAM where the program is loaded. It also allocates space for data. The typical layout of a process in memory is shown in the figure below.

Process
A process is a program in execution.
The execution of a program begins from the first instruction. The CPU outputs the
address of the memory location containing the next instruction. This is stored in the
program counter. The logic for the address decoding helps select not only the RAM
chip, but also the location allocated to the concerned address. The code for the
instruction is retrieved from RAM into the CPU via the data bus. This is read into an
instruction register. The contents of the register are decoded by the CPU, and instruc-
tion processing is initiated. The data or operands on which the instruction must act are
fetched from RAM via the data bus, similar to how the instruction was fetched.
Once the operands have been fetched, they are processed by the data processing logic
in the CPU. More data items may be required to be fetched, depending on the instruc-
tion. The partial results are stored in data registers and may be required to be written
back into RAM. The program counter is then incremented to the address of the next
instruction. The operating system, program code, and data are all in RAM at the time of
execution.
The operating system also provides support for concurrency, which allows multiple programs to work together.
A process with a single thread has a single program counter. The execution of the process proceeds sequentially until termination, one instruction at a time. A program with multiple threads also has multiple program counters, one for each thread. As the program executes, the process moves between various states, such as new, ready, running, waiting, and terminated.
The CPU executes a scheduling algorithm to select a program to run. CPU scheduling
algorithms are at the heart of multiprogramming. In a CPU with a single core, only a
single process runs at a time, while other processes wait for the CPU to be free. Typi-
cally, a process must wait to complete an I/O request. The CPU would remain idle dur-
ing this time unless it can be engaged in some other activity. The operating system
uses this time to schedule another process.
Imperative Languages
Language examples
C, FORTRAN, Pascal, Ada, JavaScript, PHP, and Ruby are examples of imperative langua-
ges. Other multi-paradigm languages that also support the imperative style include
Python and C++.
Key features
Language support for this style of programming includes features such as procedural
abstractions, variable declarations, expressions, control structures for selection, itera-
tion, and branching operations.
The imperative style evolved with procedural abstractions at its core. The programmer
starts with a specification of a function along with its input and output parameters.
This allows the developer to concentrate primarily on the interface between the func-
tion and what it computes, and to ignore the algorithm and details of how it was com-
puted (Tucker & Noonan, 2007). This leads to the development of the program by a pro-
cess of stepwise refinement. First, the programmer starts with a description of the
program to be written along with its input and output specifications. This is then bro-
ken down hierarchically into smaller functions to be implemented.
At the heart of the syntax of all imperative languages is the assignment statement,
which takes the following form:
variable = expression
The expression is evaluated, and the output value is copied to the left-hand side. When
the right-hand side is also a variable, we need to distinguish between the cases where
the assignment merely creates an alias or a separate copy of the right-hand side.
alias.py
a = [2,3,4,5]
b = a
b[0]=-1
print(a)
print(b)
The above code prints [-1, 3, 4, 5] twice: the assignment b = a makes b an alias of a, so the change made through b is also visible through a.

copy.py
a = [2,3,4,5]
b = a[:]
b[0]=-1
print(a)
print(b)
The above code prints [2, 3, 4, 5] followed by [-1, 3, 4, 5]. Here, b is a copy of
a.
The former is called “assign by reference” or “reference semantics.” The latter is cal-
led “assign by copy” or “copy semantics.” The latter is more common among imperative
languages.
Expressions are composed using arithmetic and logical operators, as well as built-in
functions of the language. In C, the assignment is also an operator, which returns a
value and hence can appear in an expression. This allows statements such as a = b =
c.
Object-Oriented Languages
Language examples
C++, Java, Python, and Ruby are some examples of object-oriented languages.
Key features
In object-oriented programming, a basic means of abstraction that supports the crea-
tion of user-defined types is the class. Instantiations of classes are called objects. How
data are represented is determined by the class. The actual data are stored by the
object. The class also has defined member functions, also called methods.
Functional Languages
Language examples
Lisp, Haskell, Scheme, ML, OCaml, and F# are some examples of functional program-
ming languages.
Key features
The functional programming style is regarded as the first significant departure from the
imperative style of programming. Lisp is the most widely used functional programming
language. It began as a pure functional programming language, but its subsequent dia-
lects incorporated various imperative features to improve computational efficiency.
A key feature of the imperative style is the notion of state, which is captured by values
of variables. This needs to be tracked during development. A pure functional program-
ming language has no variables or state (Sebesta, 2016). Without variables, iterative
loops cannot be implemented as in imperative languages. These are implemented indi-
rectly using recursion (Sebesta, 2016).
Logic Languages
Language examples
Prolog is the most well-known and widely used logic programming language. Other
examples include ALF, Alice, and Datalog.
Key features
Logic programming languages follow a declarative style of programming. Programs writ-
ten in such languages specify the goals of the computation rather than details of an
algorithm to reach the goal (Tucker & Noonan, 2007). The goals are specified as a col-
lection of rules and constraints in symbolic logic, rather than assignments and control
flow statements. The language needs to support a mechanism to specify the rules and
the goal as logical statements, as well as an inference mechanism to reach the goal. In
Prolog, for instance, the representation of the rules and facts uses first-order predicate
logic. The inference mechanism runs a process of “resolution.” This involves creating a
negation of the goal and reaching a contradiction by repeated application of a simple
rule: (A OR B) AND (~A OR C) implies that (B OR C) is true. Whereas the program-
ming effort is reduced in logic programming, logic programming languages can be slow.
The efficiency of the solution is dependent on the efficiency of the inference mecha-
nism.
The form of the expressions, statements, and procedural units of a programming lan-
guage is called its “syntax,” and the meaning of these syntactical units is called the
“semantics.” “Pragmatics” refers to what the statements achieve in practice (Sebesta,
2016).
Example
while boolean_expression:
    statement
The semantics of the while loop state that if the Boolean expression is true, the state-
ment will be executed. If there are multiple statements in the same block, they would
all be executed in order. Once this is completed, control returns to the Boolean expres-
sion for evaluation again. This is repeated until the expression evaluates to false. As an
example, consider the following code snippet in Python:
while1.py
n=13
while n < 20:
    n+=1
    print(n)
The semantics of the while loop states that when the current value of the Boolean
expression is true, the statements in the scope of the while loop will be executed. Here,
the while loop prints the integers from 14 to 20. When n = 20 the condition fails, and
control leaves the while loop.
while2.py
n=13
while n > 20:
    n+=1
    print(n)
The semantics of the while loop construct have not changed. However, this loop will
not be executed since the Boolean expression will always be false.
while3.py
n=21
while n > 20:
    n+=1
    print(n)
Here, the control enters the while loop but never leaves since the Boolean expression is always true. This is an infinite loop.

Infinite loop
An infinite loop is a loop that does not terminate.

This example illustrates a simple while loop construct, the syntax of which is well-defined and the semantics well understood. The pragmatics indicate different behavior under different conditions.
Short-Circuit Evaluation
short1.py
x = 20
y = 0
x > 20 and x/y < 5
The condition evaluates to False. Although the division by zero is not permissible, the
Python interpreter is still able to evaluate the truth value of the expression. Since the
subexpression x > 20 is False, the result of a logical and of any Boolean expression
with False would be False. So, the interpreter evaluates the whole expression to
False without evaluating the second subexpression x/y < 5. This is an example
of “short-circuit evaluation” or “lazy evaluation.” This provides a special and important
opportunity for code improvement and increased readability. If the first subexpression
is a very unlikely condition, and the second subexpression involves a very expensive
function call, short-circuit evaluation leads to significant time savings (Scott, 2016).
If we modify the above code fragment by initializing x to 21, x > 20 is True. So, the second subexpression x/y < 5 is evaluated and the interpreter flags a ZeroDivisionError. This can be corrected by introducing a “guard clause” as follows:
short2.py
x = 21
y = 0
x > 20 and y!= 0 and x/y < 5
The semantics of the and expression is that it is both commutative and associative, so
the order of evaluation of the subexpressions should not matter. However, from the
point of view of the pragmatics of the short-circuit evaluation, we may reorder such
subexpressions to our advantage and improve the code.
short3.c
p = my_list;
while (p && p->key != val)
    p = p->next;
If p is NULL, the subexpression p->key != val is not evaluated. This works because of
short-circuit evaluation.
Specifying Syntax
A language L is defined over an alphabet set Σ. The set of all strings that one can form
using characters from Σ is denoted as Σ*. The language L is a subset of Σ*. The syntax
rules of the language specify which strings are in L. At the lowest level, such strings,
called lexemes, include elements such as numeric literals, operators, and operands. A
program written in the language is a string of such lexemes. These lexemes are divided
into groups called tokens. For example, for a statement like i = 2 * j + 5;, the tokens in a language could include identifiers, integer literals, operators, and semicolons (Sebesta, 2016).
An Ambiguity
if_then_else1.py
x=0
i=1
if i >= 0:
    if i==0:
        x=1
    else:
        x=2
print(x)
if_then_else2.py
x=0
i=1
if i >= 0:
    if i==0:
        x=1
else:
    x=2
print(x)
Upon executing these fragments, the first prints 2 and the second prints 0. The two print statements print differently although the code fragments contain identical statements and differ only in indentation. This is an example of a larger issue of syntactic ambiguity: Which if do we pair the last else with? Python resolves this by allowing the user to specify the correspondence by using appropriate indentation.
Languages such as C and C++ resolve this ambiguity by associating such a dangling else
with the textually closest if, allowing the user to override this default behavior by
using explicit braces.
Specifying Semantics
Although the grammar rules of a language can capture most syntactical rules, there are
some which cannot be captured. An example is the rule in many languages that
requires that variables must be declared before use. There are other rules, specified by the type system, that would require overly complex grammar rules to capture. These rules are covered by the static semantic rules of the language. These rules have more to do with the
validity of program syntax than the meaning of the program execution. Such static
semantic rules are specified and checked using the mechanism of attribute grammars
(Sebesta, 2016). Static semantics is so-called because it can be checked at compilation
time.
Dynamic semantics deals with the meaning of statements, expressions, blocks, and
functions. Describing the dynamic semantics of a language is more difficult than
describing static semantics. Precise semantic specifications could potentially lead to
correct-by-construction programs and make testing redundant.
Scope of Variables
The scope of a variable defines the part of the program where the variable can be
assigned or referenced. The scope rules of a language determine how a particular refe-
rence to a variable is associated with a declaration. The scope can be static or dynamic.
In static scoping, also known as lexical scoping, the scope can be determined once the
code is written, prior to execution. In dynamic scoping, the scope of the variable
depends on the calling sequence of functions and hence can only be determined at
runtime. Most modern languages support static scoping.
Scopes may be nested or disjoint. When the scoping is disjoint, the same name can be
used for different entities. In C and C++, a block of statements enclosed within braces “
{“ and “}” defines a new scope (Tucker & Noonan, 2007). Blocks, but not functions, may
be nested in C and C++. Scoping rules in Python, however, are based on functions, so
nested functions are possible in Python.
Here, the two print statements print 2 and 1 respectively because the variable x inside
function A() is local, and the assignment x=2 does not change the value of the global
variable x, which is assigned a value of 1. Next, we change the variable inside function
A() to be global. Now, both print statements print 2 because the y being updated in
A() is the same variable as the y originally assigned to 1. The code is as follows:
global2.py
y=1
def A():
    global y
    y=2
    print("A", y)
A()
print("Global", y)
global3.py
x=1
print("Global1", x)
def A():
    x=2
    print("A1",x)
    def B():
        x=3
        print("B", x)
    B()
    print("A2",x)
A()
print("Global2", x)
namespace1.py
x=1
def A():
    y=2
    def B():
        z=3
        print(vars())
        print(dir())
    B()
    print(vars())
    print(dir())
A()
Type Systems
The type system associated with a programming language defines the built-in types, as
well as a set of rules for the creation of user-defined types and the usage of types in
the language. A type error results when a type system rule is violated. Type checking,
performed either at compile time or run time, is intended to detect type errors.
Type compatibility
The property that allows a value of one type to be acceptable when a value of another type is expected is called type compatibility.

Type checking tests type compatibility in various situations, including the following:
• compatibility between operands of an operation. Some rules define how the type of an expression is computed from the types of its constituents.
• assignment statements. Rules govern the relationship between the type of the variable on the left-hand side and the expression on the right-hand side of an assignment statement.
• compatibility between the actual and formal parameters of a function. For example,
if a function expects an argument to be of a certain type according to its declara-
tion, would an argument of another type be acceptable in the function call?
• The result of the expression a + b is an integer if both a and b are integers. How-
ever, the type of the result is a floating-point number if the type of at least one of a
or b is a floating-point number.
• Consider the integer division 15//2, which evaluates to 7. In contrast, 15/2.0, where 15 is implicitly converted into a floating-point number, evaluates to 7.5.
• Also, bool(1==1) is True and bool(1==1) + 5 evaluates to 6. Here, the intermedi-
ate expression True + 5 is evaluated to 6, since the Boolean value True is coerced
into 1.
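The evaluations in the bullet points above can be checked directly in Python (this snippet is not part of the original text):

print(15 // 2)           # 7
print(15 / 2.0)          # 7.5
print(bool(1 == 1) + 5)  # 6: True is coerced into 1
print(type(2 + 3.0))     # <class 'float'>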
Summary
Different programming patterns, or paradigms, have evolved over the years, such as
imperative, object-oriented, functional, and logic programming. A programming lan-
guage may primarily support one of the paradigms; however, a multi-paradigm lan-
guage like Python allows us to program in different paradigms. The operating sys-
tem provides support for the execution of programs: The CPU runs the program in a
sequence of fetch-execute cycles.
Programming languages are often classified along their paradigm(s). There are cer-
tain key features of languages in each category. Language support for the impera-
tive style of programming includes various features, such as procedural abstrac-
tions, variable declarations, expressions, control structures for selection, iteration,
and branching operations. Object-oriented languages provide a syntactic structure
to leverage abstract data types in the language, facilities for classes, and inheri-
tance. Functional programming languages view a computation as a mathematical
function mapping its arguments to outputs. A functional language typically provides
some built-in functions, a mechanism to create more complex functions from the
primitive ones, and a function application operation. Logic programming languages
follow a declarative style of programming. Programs written in these languages
specify the goals of the computation rather than the details of an algorithm to
reach the goal.
A type in a programming language defines a set of values along with a set of opera-
tions that act on those values. When associated with a variable, the type deter-
mines what values the variable can take. Local scoping, global scoping, and name-
spaces are important notions. Explicit type conversions and automatic type
promotions allow expressions of mixed, but compatible, types to be evaluated.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 7
Overview of Important Programming
Languages
Introduction
Different programming languages have features that support programming paradigms,
such as imperative, object-oriented, functional, or logical. Often, a language can be
classified based on the paradigm, but it provides certain features that also support
another paradigm. Languages have their unique features and so, over time, they have
found usage in some specific application domains. WebAssembly enables the execu-
tion of compiled code on the web and supports many languages. C has been popular
for systems programming because of its low-level features. C++ is a multi-paradigm lan-
guage; it includes powerful constructs to support object-oriented programming. Java is
an object-oriented language, with support for efficient memory management, multi-
threading, and distributed computing. The reach and power of Java influenced the
design of C#, which was created by Microsoft for its .NET framework (Sebesta, 2016).
Haskell and Lisp are two well-known functional programming languages. Haskell is a
purely functional, statically typed language with lazy evaluation, list comprehension,
and minimalist syntax, whereas Lisp is considered more flexible and is dynamically
typed. Originally conceived as a functional alternative to Java, today JavaScript is a cen-
tral language in web applications. JavaScript has been designed around the idea of the
Document Object Model (DOM) with a hierarchy of parent and child objects. Ada, an
imperative programming language, was also an important milestone in the develop-
ment of programming languages because certain important features went on to influ-
ence the design of other programming languages. New languages continue to appear
regularly. Knowledge of the key features of different languages will help us choose one
that suits our needs for a particular requirement.
Assembler
Assembly language provides a human-readable notation for the machine instructions of a specific architecture. Below is a small C program and its corresponding assembly
code in an 8086-like assembly language generated using the CtoAssembly tool. The
comments in the assembly code are summarized from comments generated by the Cto-
Assembly tool.
CtoAssemblyTest.c
int main ()
{
    int a = 1;
    int b = 2;
    int i = 0;
    while (i < 5)
    {
        a = a + b;
        i++;
    }
    return 0;
}
main:
PUSH %BP ; Push base pointer onto stack
MOV %SP, %BP; base pointer = stack pointer
@main_body:
SUB %SP, $4, %SP ; reserve space on stack for a
MOV $1, -4(%BP); set a = 1
SUB %SP, $4, %SP; reserve space on stack for b
MOV $2, -8(%BP); set b = 2
SUB %SP, $4, %SP; reserve space on stack for i
MOV $0, -12(%BP); set i = 0
@while0:
CMP -12(%BP), $5; compare i with 5
JGE @false0; exit while loop if i >= 5
@true0:
ADD -4(%BP), -8(%BP), %0; compute a+ b
MOV %0, -4(%BP); a = a + b
INC -12(%BP); i++
JMP @while0; control goes back to while loop start
@false0:
@exit0:
@main_exit:
MOV %BP, %SP; stack pointer = base pointer
POP %BP; pops value from stack
RET
WebAssembly
WebAssembly (Wasm) enables execution of compiled code on the web without plug-ins. Its core components are a compact binary instruction format and an equivalent human-readable text format (WAT).
WebAssembly supports several compiled and interpreted languages. The text format
uses the syntax of symbolic expressions or S-expressions. As an example, consider a C-
function computing Fibonacci number as follows:
fib.c
int fib(int n)
{
    int curr, next, sum;
    curr = 1;
    next = 1;
    for(int i = 1; i <= n-2; i++) {
        sum = curr + next;
        curr = next;
        next = sum;
    }
    return sum;
}
The tool WasmFiddle (Rourke, 2018) was used to generate the WAT code below.
fib.wat
(module
(table 0 anyfunc)
(memory $0 1)
(export "memory" (memory $0))
(export "fib" (func $fib))
(func $fib (; 0 ;) (param $0 i32) (result i32)
(local $1 i32); local variables declared
(local $2 i32)
(local $3 i32)
(block $label$0
(br_if $label$0
(i32.lt_s
(get_local $0)
(i32.const 3); loop skipped if n < 3
)
)
(set_local $0
(i32.add
(get_local $0)
(i32.const -2); compute n - 2
)
)
(set_local $2
(i32.const 1); next = 1
)
(set_local $3
(i32.const 1)
)
(loop $label$1
(set_local $2
(i32.add
(tee_local $1
(get_local $2)
)
(get_local $3)
)
)
(set_local $3
(get_local $1)
)
(br_if $label$1
(tee_local $0
(i32.add
(get_local $0)
(i32.const -1)
)
)
)
)
)
(get_local $2); the final result
)
)
WasmFiddle also creates the Wasm binary. The WAT is the textual representation of this
and is extremely useful for development and debugging. In the WAT expression, the
function parameter n in fib(int n) is indicated as (param $0 i32), where the vari-
able $0 represents n, and i32 represents a 32-bit integer. The type of the return value is
indicated by (result i32). The three local variables are declared as $1, $2, and $3.
WASM execution is defined in terms of a stack machine. The instruction get_local
pushes the value of a local variable read onto the stack. The instruction i32.add pops
the top two values from the stack, adds them, and pushes the result back onto the
stack. The instruction set_local pops from the stack into a local variable and
tee_local reads from the stack into a local variable but does not pop.
WebAssembly is a low-level binary format that is compatible with common web brow-
sers. Neither WebAssembly code nor the text-based WAT code is written by human
developers but generated from code written in high-level languages, such as C, C++,
Rust, and Go. The resultant code can be made to use memory very carefully, and is,
generally, fast. The Wasm code is loaded and executed in the browser using JavaScript
WebAssembly application programming interface (API).
The WasmFiddle interface is shown in the figure below. The C code is input by the user.
The WAT and JavaScript content is generated by WasmFiddle.
wasm.js
var wasmModule = new WebAssembly.Module(wasmCode);
var wasmInstance = new WebAssembly.Instance(wasmModule, wasmImports);
log(wasmInstance.exports.main());
The global WebAssembly object has two child objects WebAssembly.Module and
WebAssembly.Instance that are used to interact with WebAssembly and debug. The
WebAssembly.Module object contains WebAssembly code that has already been compi-
led. The WebAssembly.Instance object is an instance of a WebAssembly.Module,
which contains all the exported WebAssembly functions.
C++

Namespaces
C++ provides a simple data-hiding principle based on namespaces. We aggregate related data, functions, and variables into separate namespaces. This facilitates information hiding. It also allows the same identifier names to be used for different purposes. For example, we define a queue data structure in C++ and place it in a queue namespace.

Namespace
A namespace is a region in the program that defines the scope of identifiers declared inside it.

Queue.cpp
#include <iostream>

// The member variables of the namespace and the testQ() helper below are
// reconstructed; they are not part of the listing in this extract.
namespace queue
{
    const int maxSize = 10;
    int val[maxSize];
    int front = 0, rear = 0, num = 0;
    bool isFull = 0;

    void enQueue(int n)
    {
        if (isFull) return;
        val[rear] = n;
        num++;
        rear = (rear + 1) % maxSize;
        if (num == maxSize) isFull = 1;
    }

    int deQueue()
    {
        if (num == 0) return -1;
        int temp = val[front];
        front = (front + 1) % maxSize;
        num--;
        isFull = 0; // the queue is no longer full after a removal
        return temp;
    }
}

// Reconstructed test helper: enqueue a value, then dequeue and print it.
void testQ(int n)
{
    queue::enQueue(n);
    std::cout << queue::deQueue() << std::endl;
}

int main()
{
    testQ(35);
}
Classes
Templates
sum.c
#include <stdio.h>

/* The definition of fun() is reconstructed; it is not part of the original
   listing in this extract. */
int fun(int i, int j, int k);

int main() {
    int a = 2, b = 3, c=1;
    printf("a=%d, b=%d, c=%d\n",a,b,c);
    printf("Result=%d\n",fun(a,b,c));
    return 0;
}

int fun(int i, int j, int k) {
    return i + j + k;
}
If we now need a function to operate on arguments that are of type double, we need to
implement a different function. The template feature of C++ offers a simple solution to
this problem. The same function can operate on arguments of different types. In the
example below, the function fun is called with variables of type int, double, and
string. In the first two cases, it adds the arguments. For strings, the function interprets
a+b+c as the concatenation of strings a, b, and c.
sum.cpp
#include <iostream>
#include <string>
using namespace std;
template<class T> T fun(T i, T j, T k);
template<class T> T fun(T i, T j, T k)
{
    return (i+j+k);
}
int main() {
    int a = 2, b=3, c=1;
    std::cout << "a=" << a << ", b=" << b << ", c=" << c << "\n";
    std::cout << "Result = " << fun(a,b,c) << "\n";
    double d=2.3, e=2.5, f=1.1;
    std::cout << "Result = " << fun(d,e,f) << "\n";
    string r = "Apple", s = "Orange", t = "Peach";
    std::cout << "Result = " << fun(r,s,t) << "\n";
    return 0;
}
Exception Handling
Often, when an error occurs, the action to be taken depends on the module that
invoked the function rather than the function where the error is detected. C++ allows us
to define an error handling function that is invoked on detection of the error. The
exception handling mechanism is a system stack unwinding mechanism that serves as
an alternative return mechanism, which has used beyond exception detection and
recovery. Due to the lack of such exception handling mechanisms, C programs return a
zero in case of success, or non-zero in case of error, instead of returning a useful value.
Java

The key to the success of Java is the bytecode (Schildt, 2017), which is the output of the
Java compiler. It is a highly optimized set of instructions that is executed on the Java
Virtual Machine (JVM), Java’s runtime system and interpreter for the bytecode. The JVM
needs to be implemented for different platforms, but not the Java bytecode (Schildt, 2017). However, now many Java programs are also compiled using a just-in-time (JIT) compiler when they start running, which compiles Java bytecodes to machine code at run time.

Just-in-time
This refers to compilation during execution rather than before.

Java has both classes and primitive types. Java arrays are instances of a specific class. Instead of pointers, Java uses a reference type to point to instances of a class. One cannot write stand-alone subprograms in Java, and all subprograms need to be wrapped in classes as methods. In Java, a class can be derived from a single class only, although some benefits of multiple inheritance can be achieved through the usage of a feature called interface.

Multiple inheritance
This is a feature in some object-oriented languages wherein a class may be derived from multiple classes.

Java supports an elaborate system of type conversions and automatic type promotions that facilitates programming. For example, possible automatic type promotions among numeric types in Java are shown in the figure below. An arrow from type A to type B means that a variable of type A may be promoted to type B.
The Java package is a naming encapsulation construct. Public and protected variables
and methods, as well as those with no access specifiers, are visible to all other classes
within the same package.
Java supports templates, or generics, that allow for type parameterized classes. The syntax for a generic class is className<T>, where T is a type variable. For generic methods in Java, generic parameters must be user-defined classes and not primitive types. We can instantiate such generic methods multiple times. However, internally, the method operates on Object class objects (Sebesta, 2016).

Generics
The feature of classes called generics allows a method to operate on objects of various types.
C#
The reach and the power of Java influenced the design of C#, which was created by
Microsoft for its .NET framework. C# is closely related to Java, shares similar syntax and
object models, and provides support for distributed computation.
“Get” and “set” methods give public access to private variables in a class. In the follo-
wing Java program, public access to the private field _value of class GetSetTest is pro-
vided by means of the getVal and setVal methods:
GetSetTest.java
class GetSetTest{
    private int _value;
    GetSetTest() {
        _value=0;
    }
    public int getVal()
    {
        return _value;
    }
    public void setVal(int x)
    {
        _value = x;
    }
}

public class Test{
    // The body of Test is reconstructed; it is not part of the original extract.
    public static void main(String[] args) {
        GetSetTest t = new GetSetTest();
        t.setVal(42);
        System.out.println(t.getVal());
    }
}
In C#, the get and set methods do not need to be explicitly invoked. Its mechanism
allows us to access private variables with a syntax similar to that for public ones. An
example program follows:
Customer.cs
using System;
Generics in C#
C# also supports generics or template types. A method can be defined with arguments
or return objects of generic type T. C# also supports generic collection classes, which
allow for the definition of arrays, lists, stacks, queues, and dictionaries of generic type.
The following is an example of using generic stacks in C#:
GenericStacks.cs
using System;
using System.Collections.Generic;

// The enclosing class declaration is reconstructed; it is not part of this extract.
public class GenericStacks
{
    public static void Main()
    {
        Console.WriteLine("Stack of Strings");
        Stack<string> numbers = new Stack<string>();
        numbers.Push("twenty one");
        numbers.Push("thirty two");
        numbers.Push("sixty three");
        Console.WriteLine("\nPop: '{0}'", numbers.Pop());
        Console.WriteLine("Top: '{0}'", numbers.Peek());
        Console.WriteLine("Pop: '{0}'", numbers.Pop());
        Console.WriteLine("\nStack of Integers");
        Stack<int> figures = new Stack<int>();
        figures.Push(21);
        figures.Push(32);
        figures.Push(63);
        Console.WriteLine("\nPop: {0}", figures.Pop());
        Console.WriteLine("Top: {0}", figures.Peek());
        Console.WriteLine("Pop: {0}", figures.Pop());
    }
}
Generics in Java
In Java, we can define our own classes, variables, or methods with arguments of generic
type with parameterized declarations, as follows:
class MyClass<T> {
    public T myData;
    public void myMethod(T myArg) { /* ... */ }
}
However, support for generics is stronger in Java with wildcard types. Wildcard types are
not supported in C#.
Test.java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class Test{
    public static void test(Collection<?> c){
        for (Object n: c) {
            System.out.print(n+" ");
        }
        System.out.println("");
    }
    public static void main(String []args){
        List<Integer> L1 = Arrays.asList(1,2,3,4,5);
        test(L1);
        List<Float> L2 = Arrays.asList(1.1f,2.1f,3.1f,4.1f,5.1f);
        test(L2);
        List<String> L3 = Arrays.asList("a","b","c","d","e");
        test(L3);
    }
}
This prints
1 2 3 4 5
1.1 2.1 3.1 4.1 5.1
a b c d e
The signature of the test method test(Collection<?> c) allows us to work with lists
of type Integer, Float, and String. If we change this to test(Collection<?
extends Number>), it will work with Integer and Float, but not String. In general,
test(Collection<? extends X>) will work with a subclass of X, and test(Collec-
tion<? super X>) will work with a superclass of X. This gives us more flexibility in
working with generic types.
Lisp
Lisp (an abbreviation of “list processor”) was designed by John McCarthy in 1960 and is
regarded as the first functional programming language. It is also regarded as the
first “artificial intelligence language” (Tucker & Noonan, 2007). Used primarily for sym-
bolic data processing, Lisp has been used for solving various problems in artificial
intelligence, game playing, electronic circuit design, and other areas. Today, many dia-
lects of the original Lisp exist. However, due to portability problems, Common Lisp was
created in the 1990s and it combined features of several dialects, including Scheme,
while preserving the syntax, primitive functions, and basic features of pure Lisp
(Sebesta, 2016).
The two basic data objects in Lisp are atoms and lists. Atoms are indivisible objects
and may be either numeric or symbolic. Integers and real numbers are examples of numeric atoms. Symbolic atoms consist of strings with different restrictions on allowed
characters depending on the Lisp dialect being used. The list is a recursive structure
consisting of an opening parenthesis “(” followed by zero or more atoms or lists, and
ending with a closing parenthesis “)”. The following are valid lists in Lisp:
(1 2 3 4 5)
(1 (2 3) (4 (5 6)))
The syntax of Lisp is characterized by uniformity and simplicity: Both data and pro-
grams take the same form, that of lists. Consider the list (A B C). Interpreted as data,
it consists of three atoms A, B, and C. Interpreted as a program, it represents a function
named A, followed by two arguments B and C (Sebesta, 2016). Such symbolic expressi-
ons, or S-expressions, are similar to those used in the WAT format of WebAssembly. We
can define anonymous functions in Lisp using lambda expressions. The term “lambda”
owes its origin to lambda calculus. Such an expression in Lisp evaluates to a function object. Let us define an anonymous function to compute the expression f(x, y) = 2x + 3y + 2.

Lambda calculus
This is a formal system in mathematical logic based on function abstractions and applications.

fxy.lisp
(LAMBDA (x y) (+ (* 2 x) (* 3 y) 2))
To evaluate this, we simply wrap this as the first member in a list, with arguments follo-
wing as second and third members:
((LAMBDA (x y) (+ (* 2 x) (* 3 y) 2)) 3 4)
To print the result, wrap the above as the second member in yet another list, with the print function as the first:
(print ((LAMBDA (x y) (+ (* 2 x) (* 3 y) 2)) 3 4))
We can define a named function using the defun keyword. Consider the following Fibo-
nacci number example:
fib.lisp
(defun fib (n)
(if (or (zerop n) (= n 1)) n
(+ (fib (- n 1)) (fib (- n 2)))))
(print (fib 9))
Lisp is used extensively for list processing and has support for list operations. The fun-
damental list operations CAR, CADR, CONS, and LIST are illustrated below.
The CAR function returns the first element of a list. Some examples are shown below.
The single quotation mark indicates that what follows is a list and not a function followed by its arguments.
list1.lisp
(print (CAR '(A B C)))
(print (CAR '((A B) (C D))))
(print (CAR '(A (B C))))
The CDR function returns the given list with the first element removed. The CAR and CDR
functions may also be composed. CAAR is CAR after CAR, CDDR is CDR after CDR, and CADR
is CAR after CDR. Most dialects of Lisp allow between two and four such compositions.
list2.lisp
(print (CDR '(A B C)))
(print (CDR '(A (B C))))
(print (CADR '(A B C)))
(print (CDDR '(A B C)))
The CONS and the LIST functions create lists from arguments. CONS is a function with
two arguments that creates a list, with the first argument of the function becoming the
first element of the list and the second argument forming the rest of the list. LIST
takes any number of arguments and returns a list, the elements of which are the func-
tion arguments.
list3.lisp
(print (CONS 'A '(B C)))
(print (LIST 'A '(B C)))
(print (LIST 'A '(B C) 'D))
Haskell
Haskell is a purely functional language. Like Lisp, the fundamental data structure in
Haskell is the list. Some key features in Haskell include lazy evaluation, list comprehen-
sion, and minimalist syntax.
List comprehension
Lists in Haskell can be defined by enumeration, as in [2,3,5,7,9], and using ellipses
(..) as in [1,3..11]. List comprehension is a method to generate lists using a function
called a generator. We define each element of a list A as a function of the correspon-
ding element of another list B, i.e., A[i]=f(B[i]). For instance, we may define a list as
[2*x+1 | x <- [0..10]]. List comprehensions also allow us to define infinite lists, as
in [2*x | x <-[0,1..]].
list1.hs
main = do
    print $ [2,3,5,7,9]
    print $ [1,3..11]
    print $ [2*x+1 | x <- [0..10]]
Lazy evaluation
Haskell has non-strict semantics, which increases efficiency by using lazy evaluation to avoid some computations (Sebesta, 2016). In lazy evaluation, a parameter of a function is evaluated only if its value is needed for the evaluation of the function. Lazy evaluation allows us to work with infinite lists. An example of a linear search on an ordered list follows. It works not only for finite lists, but also for infinite ones.

Non-strict
Being non-strict is a property of a language that allows a function to be evaluated, even if all actual parameters are not evaluated.

linSearch.hs
linSearch x (m:y)
    | m < x = linSearch x y
    | m == x = True
    | otherwise = False

main = do
    print $ linSearch 21 [2*x+1 | x <-[12,13,17,22,23,25]]
    print $ linSearch 21 [2*x+1 | x <-[0,1..]]
    print $ linSearch 22 [2*x+1 | x <-[0,1..]]
Minimalist syntax
Consider the simple Haskell program for computing Fibonacci numbers below.
recFib.hs
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
main = do
  print $ fib 10
iterFib.hs
f a b = a : f b (a + b)
fib = f 0 1
main = do
  print $ take 10 fib
This prints the first ten Fibonacci numbers as [0,1,1,2,3,5,8,13,21,34]. Note that there is no keyword for defining a function in Haskell. The first line defines the function f with parameters a and b: it prepends a to the output list and then recurses with b and a + b, corresponding to the pseudocode sum = a + b; a = b; b = sum. The second line initializes the sequence with the first two Fibonacci numbers, 0 and 1, making fib an infinite list. The last line takes the first ten elements of this list and prints them.
Haskell programs are succinct. Below is an example of finding the factors of a number:
factors.hs
factors n = [f | f <-[1..n], mod n f == 0]
main = do
  print $ factors 60
This prints the factors of 60. The first line defines factors of n as the list of numbers
f, 1 ≤ f ≤ n such that n mod f = 0.
Now, consider the problem of partitioning an array of integers that involves rearranging
them so that integers less than or equal to a pivot appear before those greater than
the pivot. This problem forms a building block of other algorithms, such as quicksort
and various selection algorithms (Cormen et al., 2009). We again notice the strong declarative style in the short Haskell solution below:
part.hs
part (i:j) = [x | x <- j, x <= i] ++ [i] ++ [x | x <- j, x > i]

main = do
  print $ part [13, 9, 44, 53, 6, 5, 23, 2, 39]
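As an illustrative sketch that goes beyond the original example, recursing on both sides of the pivot turns this partitioning step into a complete quicksort:

qsort.hs
qsort [] = []
qsort (i:j) = qsort [x | x <- j, x <= i] ++ [i] ++ qsort [x | x <- j, x > i]

main = do
  print $ qsort [13, 9, 44, 53, 6, 5, 23, 2, 39]

Here the file and function names are illustrative; the program prints the sorted list [2,5,6,9,13,23,39,44,53].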
JavaScript

In the browser, JavaScript handles user actions including mouseovers, selecting, scrolling, clicking, and zooming. It forms an interface between the user and the webpage. JavaScript runs inside the browser and has access to the elements in the web document, local file systems, and system resources. It is fully compatible with most browsers and is widely used for client-side front-end scripting (Sebesta, 2016). JavaScript has also found its way into servers, data science, and many other applications; the engine used there is the same as that of a browser, but without the graphical aspects.
Basic Features
The basic syntax of JavaScript has similarities with C, with support for variables, expressions, operators, conditionals, and loops. The code can be embedded in HTML, as shown in the following example, which computes the mean of a sequence of numbers entered by the user.
mean.html
<!DOCTYPE html>
<html>
<body>
<h2>Computing Mean</h2>
<p>Mean of a sequence of numbers.</p>
<p id="example"></p>
<script>
var n, i = 0, sum = 0;
var body = document.body;
n = prompt("Enter count of numbers, range [1,50]", "");
var p1 = document.createElement('p');
if ((n < 1) || (n > 50)) {
  p1.appendChild(document.createTextNode("Error!"));
}
else {
  var aList = new Array(50);
  p1.appendChild(document.createTextNode("Sequence: "));
  for (i = 0; i < n; i++) {
    aList[i] = prompt("Enter next number", "");
    var t1 = document.createTextNode(aList[i] + " ");
    p1.appendChild(t1);
    sum += parseInt(aList[i]);
  }
  var t2 = document.createTextNode("Mean=" + sum / n);
  p1.appendChild(t2);
}
body.appendChild(p1);
</script>
</body>
</html>
Application programming interface: An application programming interface, or API, serves as an intermediate layer that allows applications to communicate.

The Document Object Model (DOM) is a World Wide Web Consortium (W3C) standard defining an application programming interface (API) for web documents to manipulate the tree of HTML elements. The HTML DOM is a standard for accessing, adding, deleting, or updating elements of an HTML document. JavaScript has been designed around the idea of the DOM with a hierarchy of parent and child objects. We illustrate this using an example, in which a hierarchy of objects is defined using the document.createElement() method with various table objects as arguments. The following (child, parent) relationships are defined among various objects: (table, body), (tblbody, table), (row, tblbody), (cell, row), and (cellText, cell).
table.html
<!DOCTYPE html>
<html>
<body>
<p id="example"></p>
<input type="button" value="Create a table" onclick='create_table()'>
<script>
function create_table() {
  var n = 0, i = 0, sum = 0, num = 0;
  var body = document.body;
  n = prompt("Enter count of distinct values, in the range [1,10]", "");
  if ((n < 1) || (n > 10)) {
    t1 = document.createTextNode(" Error, wrong value");
    body.appendChild(t1);
  }
  else {
    var A = new Array(10);
    var B = new Array(10);
    for (i = 0; i < n; i++) {
      A[i] = prompt("Enter next score", "");
      B[i] = prompt("Enter next frequency", "");
      sum += parseInt(A[i] * B[i]);
      num += parseInt(B[i]);
    }
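    // What follows is only an illustrative sketch of how the function could
    // continue, building the (child, parent) hierarchy described above:
    // (table, body), (tblbody, table), (row, tblbody), (cell, row), (cellText, cell).
    var table = document.createElement('table');
    var tblbody = document.createElement('tbody');
    for (i = 0; i < n; i++) {
      var row = document.createElement('tr');
      var values = [A[i], B[i]];
      for (var j = 0; j < values.length; j++) {
        var cell = document.createElement('td');
        var cellText = document.createTextNode(values[j]);
        cell.appendChild(cellText);
        row.appendChild(cell);
      }
      tblbody.appendChild(row);
    }
    table.appendChild(tblbody);
    body.appendChild(table);
    // Weighted mean of the scores, using the frequencies as weights.
    body.appendChild(document.createTextNode("Mean=" + sum / num));
  }
}
</script>
</body>
</html>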
With a set of five scores and frequencies, the HTML page renders and displays the resulting table.
The Relatives
Despite the popularity of JavaScript, there are similar languages that are more suitable for specific applications and can easily be compiled into JavaScript. These include TypeScript, CoffeeScript, Elm, Roy, Opal, and ClojureScript (Fogus, 2013). Moreover, JavaScript is often processed by various transformation tools so that it becomes more compact, more contextualized, less readable, or better performing. Finally, as of May 2022, JavaScript is becoming increasingly popular for server-side applications, where the Node.js environment allows the language to perform extremely well in input-output operations because of its functional aspects.
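As a minimal illustrative sketch (not taken from the course text), the following TypeScript function carries static type annotations; the TypeScript compiler checks them and then emits plain JavaScript in which the annotations are erased:

greet.ts (illustrative)
function greet(name: string): string {
  return "Hello, " + name;
}
console.log(greet("world"));

The emitted JavaScript is the same function with the two ": string" annotations removed.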
Ada
Ada was developed as part of an extensive effort in the 1970s by the US Department of
Defense (Tucker & Noonan, 2007). In Ada 95, extensions for supporting object-oriented
programming were added to the original, largely imperative, Ada 83, making Ada a
multi-paradigm language (Tucker & Noonan, 2007). Ada was an important milestone in
the development of programming languages since certain important features went on
to influence the design of other programming languages (Sebesta, 2016), including the
following:
• The idea of generics was introduced, allowing procedures to be defined with parameters of unspecified types. The generic procedure can be instantiated for a particular type at compile time (see the sketch below).
• Support for concurrency is provided through the rendezvous mechanism for synchronization and communication (Tucker & Noonan, 2007).

Rendezvous: This is a mechanism for synchronization between a pair of tasks, allowing data to be exchanged between them and their execution to be coordinated.

Ada 2005 added some more features, such as interfaces and greater control over scheduling algorithms. Ada is widely used in avionics, air traffic control, and rail transportation (Sebesta, 2016).
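As a minimal illustrative sketch (not taken from the course text), the following Ada generic swap procedure declares a parameter of unspecified type, Item, and is then instantiated for the type Integer at compile time. In practice, the declaration and body would live in separate compilation units.

swap.adb (illustrative)
generic
   type Item is private;
procedure Swap (X, Y : in out Item);

procedure Swap (X, Y : in out Item) is
   Temp : constant Item := X;
begin
   X := Y;
   Y := Temp;
end Swap;

-- Instantiated for a particular type at compile time:
procedure Swap_Integers is new Swap (Item => Integer);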
Perl
Perl found wide usage as a scripting language. It can be compiled into a machine-independent bytecode, which can then be interpreted or compiled into an executable program.
Perl is dynamically typed. Built-in data structures include dynamic arrays with integer indices and associative arrays with string indices. Support for classes was added in Version 5, allowing Perl to be used as a multi-paradigm language. Perl lacks generics, overloading, and exception handling (Tucker & Noonan, 2007), but a strength of Perl lies in its support for regular expressions; it is not surprising that Perl is best known for text processing.
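As a minimal illustrative sketch (not taken from the course text), the following Perl fragment combines an associative array with a regular expression to count word frequencies:

wordcount.pl (illustrative)
use strict;
use warnings;

my %count;                             # associative array (hash) with string indices
my $text = "to be or not to be";
$count{$_}++ for ($text =~ /\w+/g);    # the regular expression extracts every word
print "$_: $count{$_}\n" for sort keys %count;

This prints each distinct word together with the number of times it occurs.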
PHP
With the need for dynamic, database-driven content for websites, technologies supporting such content emerged in the mid-1990s. The Hypertext Preprocessor (PHP), developed by Rasmus Lerdorf, was originally called “Personal Home Page Tools.” It emerged as a general-purpose server-side scripting language that can be embedded in HTML (whereas JavaScript is used for client-side scripting).
PHP is integrated with several database management systems, including MySQL, Microsoft SQL Server, Oracle, Informix, and PostgreSQL. It has a simple syntax, is open-source, and is loosely typed, making it easy to use (Nixon, 2018).
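As a minimal illustrative sketch (not taken from the course text), the following page embeds a PHP block in HTML; the server evaluates the block and delivers plain HTML to the client:

hello.php (illustrative)
<!DOCTYPE html>
<html>
<body>
<h2>Server-side greeting</h2>
<p>
<?php
  // Executed on the server before the page is sent to the browser.
  $name = "world";
  echo "Hello, " . $name . "! Today is " . date("Y-m-d") . ".";
?>
</p>
</body>
</html>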
Summary
Among the many programming languages that exist, some stand out due to specific features. At the same time, languages that are closely related, for example by supporting similar programming paradigms, have certain crucial differences.
C is a simple and structured imperative programming language that has been popular for embedded systems programming and has strongly influenced the design of many programming languages.
The design of C# was influenced by Java, with which it is closely related. C# and Java
share similar syntax and object modeling, and both provide support for distributed
computation.
Haskell and Lisp are functional programming languages. Whereas Lisp is considered
more flexible and is dynamically typed, Haskell is a purely functional language and
statically typed. Some key features in Haskell include lazy evaluation, list comprehension, and a crisp syntax.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Evaluation
Congratulations!
You have now completed the course. After you have completed the knowledge tests on
the learning platform, please carry out the evaluation for this course. You will then be
eligible to complete your final assessment. Good luck!
Appendix 1
List of References
Aghajani, E., Nagy, C., Linares-Vásquez, M., Moreno, L., Bavota, G., Lanza, M., & Shepherd,
D. C. (2020, July 6—11). Software documentation: The practitioners’ perspective. 2020
IEEE/ACM 42nd international conference on software engineering (ICSE) (pp. 590—601).
IEEE.
Aghajani, E., Nagy, C., Vega-Marquez, O. L., Linares-Vasquez, M., Moreno, L., Bavota, G., &
Lanza, M. (2019). Software documentation issues unveiled. 2019 IEEE/ACM 41st interna-
tional conference on software engineering (ICSE) (pp. 1199—1210). IEEE.
Ahmad, A., & Koam, A. N. A. (2020). Computing the topological descriptors of line graph
of the complete m-ary trees. Journal of Intelligent and Fuzzy Systems, 39(1), 1081—1088.
https://doi-org.pxz.iubh.de:8443/10.3233/JIFS-191992
Aho, A. V., Lam, M. S., Sethi, R., & Ullman, J. D. (2007). Compilers: Principles, techniques,
and tools (2nd ed.). Addison-Wesley.
Almeida, J. B., Frade, M. J., Pinto, J. S., & Melo de Sousa, S. (2011). Rigorous software
development: An introduction to program verification. Springer.
Clarke, E., Biere, A., Raimi, R., & Zhu, Y. (2001). Bounded model checking using satisfiabil-
ity solving. Formal Methods in System Design, 19(1), 7—34. https://doi.org/10.1023/A:1011276507260
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms
(3rd ed.). MIT Press.
Even, G., & Medina, M. (2012). Digital logic design: A rigorous approach. Cambridge University Press. https://doi.org/10.1017/CBO9781139226455.009
Filippova, K., & Strube, M. (2009). Tree linearization in English: Improving language
model based approaches. NAACL-short 09’: Proceedings of human language technolo-
gies: The 2009 annual conference of the North American chapter of the Association for
Computational Linguistics, companion volume: Short papers (pp. 225—228). https://doi-
org.pxz.iubh.de:8443/10.3115/1620853.1620915
Gabrielli, M., & Martini, S. (2010). Programming languages: Principles and paradigms.
Springer.
Goodger, D., & van Rossum, G. (2010). Docstring conventions. In M. Alchin (Ed.), Pro
Python (pp. 303—307). Apress. https://doi-
org.pxz.iubh.de:8443/10.1007/978-1-4302-2758-8_15
Goodrich, M. T., Tamassia, R., & Goldwasser, M. H. (2013). Data structures and algorithms
in Python. Wiley.
Horowitz, E., Sahni, S., & Rajasekaran, S. (2008). Computer algorithms/C++. Universities
Press.
Jujuedv, & PHP Wellnitz. (n.d.). SOSML IDE (Version 1.6.4) [Computer software]. https://
sosml.org/
Kan, S. H. (2016). Metrics and models in software quality engineering (2nd ed.). Pearson.
Kernighan, B. W., & Ritchie, D. M. (1988). The C programming language (2nd ed.). Pear-
son.
Knuth, D. E. (1998). The art of computer programming: Sorting and searching (2nd ed.,
Vol. 3). Pearson.
Knuth, D. E. (2013). The art of computer programming: Fundamental algorithms (3rd ed.,
Vol. 1). Addison-Wesley.
Levitin, A. (2012). Introduction to the design and analysis of algorithms (3rd ed.). Pear-
son.
Miller, B. N., & Ranum, D. L. (2013). Problem solving with algorithms and data structures
using Python (2nd ed.). Franklin Beedle Publishers.
Nixon, R. (2018). Learning PHP, MySQL and JavaScript (5th ed.). O’Reilly.
Peled, D., & Qu, H. (2003). Automatic verification of annotated code. In H. Konig, M.
Heiner & A. Wolisz (Eds.), Formal techniques for networked and distributed systems—
FORTE 2003. Lecture notes in computer science (Vol. 2767). Springer. https://doi.org/
10.1007/978-3-540-39979-7_9
Pratt, T. W., & Zelkowitz, M. V. (2001). Programming languages: Design and implementa-
tion (4th ed.). Prentice-Hall.
Python Software Foundation. (2021b). Memory Profiler (Version 0.58.0) [Computer software]. https://pypi.org/project/memory-profiler/
Qian Yang, J., Li, J., & Weiss, D. M. (2009). A survey of coverage-based testing tools. The
Computer Journal, 52(5), 589—597.
Rosen, K. H. (2019). Discrete mathematics and its applications (8th ed.). McGraw-Hill.
Schildt, H. (2017). Java: The complete reference (10th ed.). Oracle Press.
Tucker, A. B., & Noonan, R. E. (2007). Programming languages: Principles and paradigms
(2nd ed.). McGraw Hill.
von Neumann, J. (1945). First draft of a report on the EDVAC. University of Pennsylvania.
Watson, A. H., & McCabe, T. J. (1996). Structured testing: A testing methodology using the
cyclomatic complexity metric (NIST Special Publication 500—235). NIST. http://
www.mccabe.com/iq_research_nist.htm
Appendix 2
List of Tables and Figures
Spyder IDE
Source: Prosenjit Gupta (2022), based on Spyder IDE (2021).
SOSML IDE
Source: Prosenjit Gupta (2022), based on Jujuedv & PHP Wellnitz (n.d.).
Another Example
Source: Prosenjit Gupta (2022), based on Pedregosa (2021).