CSE Time Table Jul-Nov 2018-Final-Rooms
CS244
May 5, 2018
This report explains the working of a two-pass assembler built as part of a project in the
System Programming Lab (CS244). It describes how a two-pass assembler differs from a
single-pass assembler and explains the working of macros, the linker and the loader with
respect to our project. It also gives a detailed account of how our assembler operates and of
the methods, notations and data structures used, along with the sources.
1 Introduction
As part of the System Programming Lab project on assemblers, we created a two-pass assembler.
The project uses the notations and data structures of the Intel 8085 microprocessor. The
simulator is custom made. The compiler can compile up to 2 nested loops, nested if-else
statements, and for and while loops. Basic arithmetic and logical operations are supported.
2 Assembly Language
An assembly language is a low-level programming language for microprocessors and other
programmable devices. It is not just a single language, but rather a group of languages. An assembly
language implements a symbolic representation of the machine code needed to program a given
CPU architecture.
An assembly language is the most basic programming language available for any processor. With
assembly language, a programmer works only with operations that are implemented directly on
the physical CPU.
Assembly languages generally lack high-level conveniences such as variables and functions, and
they are not portable between various families of processors. They have the same structures and
set of commands as machine language, but allow a programmer to use names instead of numbers.
This language is still useful for programmers when speed is necessary or when they need to carry
out an operation that is not possible in high-level languages.
A program written in assembly language consists of a series of mnemonic processor instructions and
meta-statements (known variously as directives, pseudo-instructions and pseudo-ops), comments
and data. Assembly language instructions usually consist of an opcode mnemonic followed by a
list of data, arguments or parameters. These are translated by an assembler into machine language
instructions that can be loaded into memory and executed. For example, the instruction below
tells an x86/IA-32 processor to move an immediate 8-bit value into a register. The binary code for
this instruction is 10110 followed by a 3-bit identifier for which register to use. The identifier for
the AL register is 000, so the following machine code loads the AL register with the data 01100001.
10110000 01100001
This binary computer code can be made more human-readable by expressing it in hexadecimal as
follows.
B0 61
Here, B0 means ’Move a copy of the following value into AL’, and 61 is a hexadecimal
representation of the value 01100001, which is 97 in decimal. Assembly language for the 8086
family provides the mnemonic MOV (an abbreviation of move) for instructions such as this, so
the machine code above can be written as follows in assembly language, complete with an
explanatory comment, if required, after the semicolon. This is much easier to read and to remember.
MOV AL, 61h ; Load AL with 97 decimal (61 hex)
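The encoding described above (the bits 10110, then a 3-bit register identifier, then the immediate byte) can be sketched in Python. This is only an illustration: the function name encode_mov_imm8 and the REG8 table are hypothetical names introduced here, and the sketch covers only the 8-bit register form of the instruction.

```python
# 3-bit identifiers of the 8-bit x86 registers, as used in the opcode byte.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def encode_mov_imm8(register, value):
    """Encode MOV r8, imm8 as the opcode 10110rrr followed by the immediate byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    opcode = 0b10110000 | REG8[register.upper()]
    return bytes([opcode, value])


encoded = encode_mov_imm8("AL", 0x61)
print(" ".join(f"{byte:02X}" for byte in encoded))  # B0 61
```

Choosing a different register changes only the low three bits of the first byte, which is why the same mnemonic covers a whole family of opcodes.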
3 8085 Programming Model
• It uses English-like words, called mnemonics, to convey the action or meaning of an instruction.
• For example, MOV indicates a data transfer, ADD adds two values and SUB subtracts two values.
• Assembly language is specific to a given processor.
MVI B 4FH
MOV A B
LXI H 2050H
MOV M B
OUT 01H
IN 07H
ADI 32H
ADD B
SUI 32H
SUB C
INR D
DCR E
JMP 2080H
CALL 2050H
JNZ
• Machine control instructions are HLT, NOP etc.
An example of an assembly language program that adds two numbers and stores the result in a
register is given below.
MVI D 2H
MVI E 3H
MOV A D
ADD E
MOV C A
HLT
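To make the effect of each instruction in the program above concrete, here is a minimal Python sketch of an interpreter for just the four mnemonics it uses. It is not the project's custom simulator: the function name run is hypothetical, flags and memory are ignored, and operands are assumed to be separated by spaces as in the listing.

```python
def run(program):
    """Interpret a tiny subset of 8085 instructions: MVI, MOV, ADD and HLT."""
    regs = {name: 0 for name in "ABCDEHL"}
    for line in program:
        parts = line.split()
        op, args = parts[0].upper(), parts[1:]
        if op == "MVI":    # MVI r data: load an immediate byte into r
            regs[args[0]] = int(args[1].rstrip("Hh"), 16) & 0xFF
        elif op == "MOV":  # MOV dst src: copy the src register into dst
            regs[args[0]] = regs[args[1]]
        elif op == "ADD":  # ADD r: A <- A + r, wrapping at 8 bits
            regs["A"] = (regs["A"] + regs[args[0]]) & 0xFF
        elif op == "HLT":  # stop execution
            break
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


PROGRAM = ["MVI D 2H", "MVI E 3H", "MOV A D", "ADD E", "MOV C A", "HLT"]
registers = run(PROGRAM)
print(registers["C"])  # 5
```

The sum is formed in the accumulator A, because ADD always adds to A, and is then copied to C.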
3.4 Imperative Statement
An imperative statement indicates an action to be performed during the execution of the assembled
program. Each imperative statement typically translates into one machine instruction.
4 Linker
4.1 What is a linker?
In computing, a linker or link editor is a computer utility program that takes one or more object
files generated by a compiler and combines them into a single executable file, library file, or another
’object’ file.
Computer programs typically are composed of several parts or modules; these parts/modules
need not all be contained within a single object file, and in such cases refer to each other by means
of symbols. Typically, an object file can contain three kinds of symbols:
• defined "external" symbols, sometimes called "public" or "entry" symbols, which allow it to
be called by other modules,
• undefined "external" symbols, which reference other modules where these symbols are
defined, and
• local symbols, used internally within the object file to facilitate relocation.
Computer programs are usually made up of multiple modules that span separate object files,
each being a compiled computer program. The program as a whole refers to these separately
compiled object files using symbols. The linker combines these separate files into a single,
unified program, resolving the symbolic references as it goes along.
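The symbol resolution described above can be sketched in Python. This is a simplified illustration rather than how any real linker is implemented: the dictionary layout of an "object file", the function name link and the file names main.o and math.o are all assumptions made for the example.

```python
def link(objects):
    """Place objects one after another and resolve their symbols.

    Each object is a dict with 'name', 'size' (in bytes), 'defined'
    (symbol -> offset inside the object) and 'undefined' (referenced symbols).
    Returns a table mapping every defined symbol to its final address.
    """
    table, base = {}, 0
    for obj in objects:
        for symbol, offset in obj["defined"].items():
            if symbol in table:
                raise ValueError(f"duplicate definition of {symbol}")
            table[symbol] = base + offset
        base += obj["size"]
    for obj in objects:
        for symbol in obj["undefined"]:
            if symbol not in table:
                raise ValueError(f"undefined reference to {symbol} in {obj['name']}")
    return table


main_o = {"name": "main.o", "size": 16, "defined": {"main": 0}, "undefined": ["add"]}
math_o = {"name": "math.o", "size": 12, "defined": {"add": 4}, "undefined": []}
symbols = link([main_o, math_o])
print(symbols)  # {'main': 0, 'add': 20}
```

Linking main.o on its own raises an error, because nothing defines add; this corresponds to the error message a linker emits for an unresolved reference.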
4.4 Linker: Part 1
We mentioned earlier that a declaration of a function or a variable is a promise to the C compiler
that somewhere else in the program there is a definition for that function or variable, and that
the linker's job is to make good on that promise. With the diagram of an object file in front of
us, we can also describe this as "filling in the blanks".
With these two diagrams, we can see that all of the dots can be joined up (if they couldn’t be,
then the linker would emit an error message). Everything has its place, and every place has its
thing, and the linker can fill in all of the blanks as shown (on a UNIX system, the linker is typically
invoked with ld).
As with object files, we can use nm to examine the resulting executable file. It has all of the
symbols from the two objects, and all of the undefined references have vanished. The symbols
have also all been reordered so that similar types of things are together, and there are a few added
extras to help the operating system deal with the whole thing as an executable program.
The main observation that shaped the function of the linker is this: if lots of different programs
need to do the same sorts of things (write output to the screen, read files from the hard disk, etc.),
then it clearly makes sense to keep this code in one place and have lots of different programs
use it.
This is perfectly feasible to do by just using the same object files when linking different
programs, but it makes life much easier if whole collections of related object files are kept
together in one easily accessible place: a library.
Running the program obviously involves executing the machine code, so the operating system
clearly has to transfer the machine code from the executable file on the hard disk into the computer’s
memory, where the CPU can get at it. This chunk of the program’s memory is known as the code
segment or text segment.
Recall from an earlier section that a global variable can start off with a particular value. In
C, constructing the initial value of such a global variable is easy: the particular value is just
copied from the data segment of the executable file into the relevant place in the memory for the
soon-to-be-running program.
In C++, the construction process is allowed to be much more complicated than just copying
in a fixed value; all of the code in the various constructors for the class hierarchy has to be run,
before the program itself starts running properly.
To deal with this, the compiler includes some extra information in the object files for each
C++ file; specifically, the list of constructors that need to be called for this particular file. At link
time, the linker combines all of these individual lists into one big list, and includes code that goes
through the list one by one, calling all of these global object constructors.
Note that the order in which all of these constructors for global objects get called is not defined;
it is entirely at the mercy of what the linker chooses to do. (See Scott Meyers’ Effective C++ for
more details: Item 47 in the second edition, Item 4 in the third edition.)
There are various things here, but the ones we’re interested in are the two entries with class W
(which indicates a "weak" symbol) and with section names like ".gnu.linkonce.t.stuff". These are
the markers for global object constructors, and we can see that the corresponding "Name" fields
look sensible, one for each of the two constructors used.
5 Loader
5.1 Definition - What does Loader mean?
A loader is a major component of an operating system that ensures all necessary programs and
libraries are loaded, which is essential during the startup phase of running a program. It places
the libraries and programs into the main memory in order to prepare them for execution. Loading
involves reading the contents of the executable file that contains the instructions of the program
and then doing other preparatory tasks that are required in order to prepare the executable for
running, all of which takes anywhere from a few seconds to minutes depending on the size of the
program that needs to run.
The loader is a component of an operating system that carries out the task of preparing a
program or application for execution by the OS. It does this by reading the contents of the
executable file and then storing these instructions into the RAM, as well as any library elements
that are required to be in memory for the program to execute. This is one reason a splash screen
appears before many programs start, sometimes showing what the loader is currently loading
into memory. When all of that is done, the program
is ready to execute. For small programs, this process is almost instantaneous, but for large and
complex applications with large libraries required for execution, such as games as well as 3D and
CAD software, this could take longer. The loading speed is also dependent on the speed of the
CPU and RAM.
Not all code and libraries are loaded at program startup, only the ones required for actually running
the program. Other libraries are loaded as the program runs, or only as required. This is especially
true for applications such as games that only need assets loaded for the current level or location
that the player is in.
Though loaders in different operating systems might have their own nuances and specialized
functions native to that particular operating system, they still serve basically the same function.
The following are the responsibilities of a loader:
2. Copy necessary files, such as the program image or required libraries, from the disk into the
memory
3. Copy required command-line arguments into the stack
4. Link the starting point of the program and link any other required library
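The copying and entry-point steps listed above can be illustrated with a toy loader in Python. This is a sketch under made-up assumptions, not the behaviour of any real operating system: memory is a flat 256-byte array, the load address 0x20 is arbitrary, and the name load_program is hypothetical.

```python
def load_program(image, args, memory_size=256, load_address=0x20):
    """Toy loader: copy the image into memory, place the arguments on a
    stack, and report the entry point. Sizes and addresses are invented."""
    memory = bytearray(memory_size)
    end = load_address + len(image)
    if end > memory_size:
        raise MemoryError("program image does not fit in memory")
    memory[load_address:end] = image  # copy the program image from "disk"

    stack_pointer = memory_size  # the stack grows downward from the top
    for arg in reversed(args):
        data = arg.encode("ascii") + b"\x00"  # NUL-terminated argument
        stack_pointer -= len(data)
        if stack_pointer < end:
            raise MemoryError("arguments collide with the program image")
        memory[stack_pointer:stack_pointer + len(data)] = data

    # The program counter is set to the starting point of the program.
    return memory, {"pc": load_address, "sp": stack_pointer}


# Three bytes standing in for the machine code of a program.
memory, registers = load_program(bytes([0x3E, 0x05, 0x76]), ["prog", "in.asm"])
print(registers)  # {'pc': 32, 'sp': 244}
```

After loading, execution would begin at the address in pc, and the program would find its arguments starting at sp.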
6 Assembler
WHAT IS AN ASSEMBLER?
An assembler is a program that converts assembly language code (also called mnemonic
code) into machine language code and provides the information the loader needs to load the
program.
Example:
MOV AX, X
–MOV is a mnemonic opcode.
–AX is a register operand in symbolic form.
–X is a memory operand in symbolic form.
Assembly language statement structure
<Label> <Mnemonic> <Operand> Comments
. Label - symbolic name for an address in the program (the address of the instruction at machine level)
. Mnemonic - symbolic description of an operation
. Operands - the variables or addresses the operation works on, if any
. Comments
- ignored by assembler
- used by humans to document/understand programs
- tips for useful comments:
. avoid restating the obvious, as in "decrement R1"
. provide additional insight, as in "accumulate product in R6"
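The statement structure above can be split into its four fields with a few lines of Python. The sketch assumes two conventions that the text does not state: a label ends with a colon and a comment starts with a semicolon. The function name parse_line is hypothetical.

```python
def parse_line(line):
    """Split one assembly statement into label, mnemonic, operands and comment.

    Assumes 'LABEL:' marks a label and ';' starts a comment.
    """
    code, _, comment = line.partition(";")
    label = None
    if ":" in code:
        label, _, code = code.partition(":")
        label = label.strip()
    tokens = code.replace(",", " ").split()
    return {
        "label": label,
        "mnemonic": tokens[0].upper() if tokens else None,
        "operands": tokens[1:],
        "comment": comment.strip(),
    }


statement = parse_line("LOOP: DCR E ; count down the remaining iterations")
print(statement["label"], statement["mnemonic"], statement["operands"])
```

The comment is kept only for the human reader; nothing after the semicolon affects the fields the assembler uses.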
Literals & Constants
int z=5;
x = x + 5;
1. A literal cannot be changed during program execution.
2. A literal is safer and better protected than a constant.
3. Literals appear as part of the instruction.
Pass I
. Pass I uses the following data structures:
1. Machine Opcode Table (OPTAB)
2. Symbol Table (ST)
3. Literal Table (LT)
4. Pool Table (PT)
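As an illustration of how Pass I uses the first two of these tables, the Python sketch below walks the source once, advances a location counter by the size of each instruction taken from OPTAB, and records each label in the symbol table. It is a minimal sketch and not the project's implementation: the literal and pool tables are left out, the OPTAB here holds only instruction lengths for a handful of 8085 mnemonics, and the origin 2000H, the 'LABEL:' and ';' conventions and the name pass_one are assumptions.

```python
# OPTAB for this sketch: mnemonic -> instruction length in bytes (8085).
OPTAB = {"MVI": 2, "MOV": 1, "ADD": 1, "DCR": 1, "JNZ": 3, "JMP": 3, "HLT": 1}


def pass_one(lines, origin=0x2000):
    """Build the symbol table; return it with the final location counter."""
    symtab = {}
    location = origin
    for line in lines:
        code = line.split(";")[0].strip()  # drop any comment
        if ":" in code:
            label, code = (part.strip() for part in code.split(":", 1))
            if label in symtab:
                raise ValueError(f"duplicate label: {label}")
            symtab[label] = location  # a label names the current address
        if not code:
            continue
        mnemonic = code.split()[0].upper()
        if mnemonic not in OPTAB:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        location += OPTAB[mnemonic]
    return symtab, location


SOURCE = ["MVI E 3H", "LOOP: DCR E", "JNZ LOOP ; repeat until E is zero", "HLT"]
symtab, end_address = pass_one(SOURCE)
print({name: hex(address) for name, address in symtab.items()}, hex(end_address))
```

With the symbol table complete, a second pass can replace every use of LOOP with its address, including references that appear before the label is defined; handling such forward references is the usual reason for making two passes.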
7 Macros
. A macro instruction is a single-line abbreviation for a group of
instructions.
. A macro is an abbreviation for a sequence of operations.
. Syntax/structure
IMPLEMENTATION
STATEMENT OF PROBLEM
SPECIFICATION OF DATA BASES
SPECIFICATION OF DATA BASE FORMAT
ALGORITHM
Advantages
Functions do not have to be implemented twice
Less overhead during processing
More flexible for programmer
Disadvantages
Large program
More complex
7.1 Basic Introduction
A macro (short for "macroinstruction", from Greek μακρός ’long’) in computer science is a rule
or pattern that specifies how a certain input sequence (often a sequence of characters) should be
mapped to a replacement output sequence (also often a sequence of characters) according to a
defined procedure. The mapping process that instantiates (transforms) a macro use into a specific
sequence is known as macro expansion. A facility for writing macros may be provided as part of a
software application or as a part of a programming language. In the former case, macros are used
to make tasks using the application less repetitive. In the latter case, they are a tool that allows
a programmer to enable code reuse or even to design domain-specific languages.
Macros are used to make a sequence of computing instructions available to the programmer as
a single program statement, making the programming task less tedious and less error-prone.[1][2]
(Thus, they are called "macros" because a "big" block of code can be expanded from a "small"
sequence of characters.) Macros often allow positional or keyword parameters that dictate what
the conditional assembler program generates and have been used to create entire programs or
program suites according to such variables as operating system, platform or other factors. The
term derives from "macro instruction", and such expansions were originally used in generating
assembly language code.
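Macro expansion with positional parameters, as described above, can be sketched in Python. The MACRO and MEND keywords, the & prefix on parameter names and the function name expand are assumptions chosen for the illustration; nested macro definitions and keyword parameters are not handled.

```python
def expand(lines):
    """Expand macro calls in a list of source lines.

    A definition is 'MACRO name &p1 &p2 ...', body lines, then 'MEND'.
    A call is 'name arg1 arg2 ...'; arguments bind to parameters by position.
    """
    macros, output = {}, []
    i = 0
    while i < len(lines):
        tokens = lines[i].split()
        if tokens and tokens[0] == "MACRO":
            name, params = tokens[1], tokens[2:]
            body = []
            i += 1
            while i < len(lines) and lines[i].strip() != "MEND":
                body.append(lines[i])
                i += 1
            if i == len(lines):
                raise ValueError(f"macro {name} has no MEND")
            macros[name] = (params, body)
        elif tokens and tokens[0] in macros:
            params, body = macros[tokens[0]]
            args = tokens[1:]
            if len(args) != len(params):
                raise ValueError(f"{tokens[0]} expects {len(params)} arguments")
            bindings = dict(zip(params, args))
            for body_line in body:
                output.append(" ".join(bindings.get(t, t) for t in body_line.split()))
        else:
            output.append(lines[i])
        i += 1
    return output


SOURCE = [
    "MACRO ADDTWO &X &Y",
    "MOV A &X",
    "ADD &Y",
    "MEND",
    "MVI D 2H",
    "MVI E 3H",
    "ADDTWO D E",
    "HLT",
]
print(expand(SOURCE))  # ['MVI D 2H', 'MVI E 3H', 'MOV A D', 'ADD E', 'HLT']
```

The single line ADDTWO D E is replaced by the two body lines with D and E substituted for the parameters, so the assembler proper never sees the macro call.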
definition and the macro use are distinct, allowing macro definers and users not to worry about
inadvertent variable capture (cf. referential transparency). Hygienic macros have been standardized
for Scheme in the R5RS, R6RS and R7RS standards. A number of competing implementations of
hygienic macros exist, such as syntax-rules, syntax-case, explicit renaming, and syntactic
closures. Both syntax-rules and syntax-case have been standardized in the Scheme standards.
Recently, Racket has combined the notions of hygienic macros with a "tower of evaluators", so
that the syntactic expansion time of one macro system is the ordinary runtime of another block
of code,[13] and showed how to apply interleaved expansion and parsing in a non-parenthesized
language.[14]
7.2.4 Applications
Evaluation order
Macro systems have a range of uses. Being able to choose the order of evaluation (see lazy
evaluation and non-strict functions) enables the creation of new syntactic constructs (e.g. control
structures) indistinguishable from those built into the language. For instance, in a Lisp dialect
that has cond but lacks if, it is possible to define the latter in terms of the former using macros.
For example, Scheme has both continuations and hygienic macros, which enables a programmer to
design their own control abstractions, such as looping and early exit constructs, without the need
to build them into the language.
Data sub-languages and domain-specific languages
Next, macros make it possible to define data languages that are immediately compiled into
code, which means that constructs such as state machines can be implemented in a way that is
both natural and efficient.[15]
Binding constructs
Macros can also be used to introduce new binding constructs. The most well-known example
is the transformation of let into the application of a function to a set of arguments.
Felleisen conjectures[16] that these three categories make up the primary legitimate uses of
macros in such a system. Others have proposed alternative uses of macros, such as anaphoric
macros in macro systems that are unhygienic or allow selective unhygienic transformation.
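The transformation of let into the application of a function to a set of arguments, mentioned above, can be shown on a small expression tree. The sketch below is written in Python rather than Scheme, represents expressions as nested tuples, and uses the hypothetical function name expand_let; it performs only the syntactic rewrite and does not deal with hygiene.

```python
def expand_let(expr):
    """Rewrite ("let", [(name, value), ...], body) into
    (("lambda", [names...], body), values...), recursively."""
    if not isinstance(expr, tuple):
        return expr  # atoms and parameter lists are left unchanged
    if expr and expr[0] == "let":
        _, bindings, body = expr
        names = [name for name, _ in bindings]
        values = [expand_let(value) for _, value in bindings]
        return (("lambda", names, expand_let(body)), *values)
    return tuple(expand_let(part) for part in expr)


source = ("let", [("x", 2), ("y", 3)], ("+", "x", "y"))
print(expand_let(source))  # (('lambda', ['x', 'y'], ('+', 'x', 'y')), 2, 3)
```

The bound names become the parameters of an anonymous function and the bound values become its arguments, so let needs no support in the core language.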
The interaction of macros and other language features has been a productive area of research.
For example, components and modules are useful for large-scale programming, but the
interaction of macros and these other constructs must be defined for their use together. Module
and component systems that can interact with macros have been proposed for Scheme and other
languages with macros. For example, the Racket language extends the notion of a macro system to
a syntactic tower, where macros can be written in languages including macros, using hygiene to
ensure that syntactic layers are distinct and allowing modules to export macros to other modules.