BCS - Compiler Construction - Notes
Chapter 1
Introduction to Compiler Construction
H:\COMPILER\COMPILER.DOC
2
What is a Compiler?
DEF: “A compiler is a program which takes a HLL (High Level Language) program as an input and translates it into an equivalent program in Assembly Language.”
Objective: To translate a HLL program into Assembly Language.
HLL Program (Source Program or Source Code) → Compiler → Assembly Language Program (Target Program)
Assembly Language → Assembler → Object Code → Linker → Executable Program
2 – Syntax Analyzer:
DEF “A program which performs syntax analysis.”
In this phase tokens are collected from the lexical analyzer one by one until a statement is complete. Then it is verified that the statement is complete, i.e. the statement has been built according to the rules / syntax / grammar of the language. During this process usually a tree called a parse tree is constructed.
To illustrate the construction of the parse tree consider the following simple rules in the language.
Rule 1 – An identifier is an expression.
Rule 2 – A number is an expression.
Rule 3 – The sum of two expressions is an expression.
Rule 4 – The product of two expressions is an expression.
Rule 5 – An expression equal to an expression is an assignment statement.
expr → id | num | expr + expr | expr * expr
asgn → expr = expr
The parse tree is saved in main memory for subsequent processing. To save space this tree is compressed, i.e. redundant information is removed. The compressed form of a parse tree is called a syntax tree. In a syntax tree an operator appears as an interior node and operands appear as children.
Syntax Tree: A compressed parse tree is called a Syntax Tree. In a syntax tree an operator appears as an interior node and operands appear as children (as shown in Fig II).
The parse tree of the statement F.H = 1.8 * centigrade + 32 is illustrated in figure I and the syntax tree is shown in figure II.
[Fig I and Fig II: the parse tree and the syntax tree of F.H = 1.8 * centigrade + 32; in the syntax tree, = is the root, + is its right child, and the subtree 1.8 * centigrade is the left operand of +.]
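A syntax tree like the one in Fig II can be sketched directly in code. The following is a minimal sketch in C (not from the notes; the node layout and the names leaf, interior, eval and fahrenheit are illustrative), building the tree of 1.8 * centigrade + 32 and evaluating it bottom-up:

```c
#include <assert.h>
#include <stdlib.h>

/* A syntax-tree node: an interior node holds an operator,
   a leaf holds a numeric value. */
typedef struct Node {
    char op;                  /* '+', '*', or 0 for a leaf */
    double value;             /* used only when op == 0 */
    struct Node *left, *right;
} Node;

static Node *leaf(double v) {
    Node *n = calloc(1, sizeof(Node));
    n->value = v;
    return n;
}

static Node *interior(char op, Node *l, Node *r) {
    Node *n = calloc(1, sizeof(Node));
    n->op = op; n->left = l; n->right = r;
    return n;
}

/* Evaluate the tree bottom-up, the way later phases walk it. */
static double eval(const Node *n) {
    if (n->op == 0) return n->value;
    double l = eval(n->left), r = eval(n->right);
    return n->op == '+' ? l + r : l * r;
}

double fahrenheit(double centigrade) {
    /* '+' is the root; its left child is the subtree 1.8 * centigrade */
    Node *t = interior('+',
                       interior('*', leaf(1.8), leaf(centigrade)),
                       leaf(32));
    return eval(t);           /* nodes are leaked; fine for a sketch */
}
```

Evaluating the tree for centigrade = 100 yields 212, matching the formula.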
3 – Semantic Analysis:
In this phase it is verified that a grammatically correct statement is
meaningful.
In programming, meaningful is confined to type checking only.
5 – Code Optimization:
In this phase the number of statements in the TAC is reduced. For e.g.
Statement: a = b * c + c
TAC:                    Optimized Code:
temp1 = b * c           temp1 = b * c
temp2 = temp1 + c       a = temp1 + c
a = temp2
6 – Code Generation:
In this phase, the optimized TAC is translated into Assembly language.
Exercise Transform the following input statements by different phases
of the compiler:
1–a=b*c+c
2 – F.H = 1.8 * centigrade + 32
3 – v = 49 – 9.8 * t
4 – s = 49 * t – 4.9 * t * t
5 – s = vi * t + 1/2 * a * t * t
6 – rad = deg / 180 * π
The six phases of the compiler, lexical analysis to code generation, are called Formal Phases. In addition to these there are two Informal Phases which interact with all the formal phases of the compiler, i.e.
Informal Phases:
I Symbol Table Manager
II Error Handler
Chapter 2
Lexical Analysis
If two regular expressions r and s denote the same language then r = s, e.g. { a | b } = { b | a }.
//lex specification
%%
//rules part
\+?[1-9] { printf("Positive digit"); }
\+?[1-9][0-9]* { printf("Positive Number"); }
-[1-9] { printf("Negative digit"); }
-[1-9][0-9]* { printf("Negative Number"); }
[Ff][Oo][Rr] { printf("Reserved Word for"); }
[A-Za-z][A-Za-z0-9]* { printf("Identifier"); }
%%
Q – Why is the identifier rule defined at the end of a lex specification?
Identifiers must be defined at the end, otherwise reserved words would also be recognized as identifiers.
Example(b) Write lex specification (using definition part also) to
recognize:
1 – Positive digit.
2 – Positive Number.
3 – Negative digit.
4 – Negative Number.
5 – Reserved Word.
6 – Identifier.
//lex specification
//definition part
L [A-Za-z]
NZD [1-9]
D [0-9]
%%
//rules part
\+?{NZD} { printf("Positive digit"); }
\+?{NZD}{D}* { printf("Positive Number"); }
-{NZD} { printf("Negative digit"); }
-{NZD}{D}* { printf("Negative Number"); }
[Ff][Oo][Rr] { printf("Reserved Word for"); }
{L}({L}|{D})* { printf("Identifier"); }
Representations of Tokens:
A token is a meaningful word of the language.
In practice a token is represented by a numeric constant associated with it. The value of this numeric constant should be greater than 256, since 0 – 255 are used for the ASCII codes of the characters and 256 is used for the error condition.
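In C such token codes are commonly written as an enumeration. A minimal sketch (the token names TOK_FOR, TOK_ID and TOK_NUM are illustrative assumptions, not from the notes):

```c
#include <assert.h>

/* Token codes start above 256: 0 - 255 are the ASCII characters
   themselves and 256 is reserved for the error condition. */
enum {
    TOK_ERROR = 256,   /* error condition            */
    TOK_FOR,           /* reserved word "for" -> 257 */
    TOK_ID,            /* identifier          -> 258 */
    TOK_NUM            /* number              -> 259 */
};

/* A single-character token is represented by its own ASCII code. */
int token_for_char(int c) { return c; }
```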
Chapter 3
Syntax Analysis
DEF “Tokens are collected one by one to form a statement, and the statement is checked against the grammar.”
Q – How to specify the syntax of the language?
For this purpose a specialized notation called Context Free Grammar
(CFG) is used. CFG can be used to specify the hierarchical structure of the
programming construct, e.g. using this notation the ‘if’ statement in C can be defined as:
If_stmt → if (expr1) stmt1 else stmt2
where,
→ means ‘is defined by’ or ‘can have the form’.
The above form of representation is called a Production. In this definition if, (, ), else are indivisible lexical units and are called Terminals. If_stmt, expr1, stmt1, stmt2 are composite entities and are called Non-Terminals. A non-terminal is defined in terms of its components, and these components can be terminals as well as non-terminals. The symbol on the LHS of a production must be a non-terminal.
Backus Naur Form (BNF) is used to specify the syntax / grammar of a language. This grammar is used in the front end of the compiler.
Definition Of Context Free Grammar (CFG):
CFG is a notation used to specify the syntax of a language. CFG has
four components, i.e.
1 – A set of terminals.
2 – A set of non-terminals.
3 – A set of productions.
4 – Starting symbol. ( one of the non-terminals is designated as the starting symbol )
The syntax of a language is specified by defining all the non-terminals
of the language or by writing production for all the non-terminals.
The production for the starting symbol is listed first.
Example Write a CFG for expressions consisting of digits, separated by + and/or -.
expr → expr + digit
expr → expr - digit
expr → digit
digit → 0
digit → 1
digit → 2
  :
digit → 9
The same productions can be written compactly as:
expr → expr + digit
     | expr - digit
     | digit
digit → 0 | 1 | 2 | 3 … | 9
Using this grammar it can be proved that 8 – 6 + 2 is an expression.
Proof(a)
To prove: 8 – 6 + 2 is an expr.
8 – 6 + 2
⇒ 8 – 6 + digit         (digit → 2)
⇒ 8 – digit + digit     (digit → 6)
⇒ digit – digit + digit (digit → 8)
⇒ expr – digit + digit  (expr → digit)
⇒ expr + digit          (expr → expr - digit)
⇒ expr                  (expr → expr + digit)
[Fig I: the corresponding parse tree, built from the leaves up – Bottom Up Parsing.]
yield: 8 – 6 + 2 = input = 8 – 6 + 2
The yield matches the input, so the input is correct.
In this case construction of the parse tree started from the leaves and went up to the root; this type of construction is called Bottom Up construction of the tree and this type of parsing is called Bottom Up Parsing.
Proof(b)
To prove: expr ⇒ 8 – 6 + 2
expr
⇒ expr + digit          (expr → expr + digit)
⇒ expr – digit + digit  (expr → expr - digit)
⇒ digit – digit + digit (expr → digit)
⇒ 8 – digit + digit     (digit → 8)
⇒ 8 – 6 + digit         (digit → 6)
⇒ 8 – 6 + 2             (digit → 2)
[Fig II: the corresponding parse tree, built from the root down – Top Down Parsing.]
In this case the input string has been derived from the starting symbol, so the input string is correct. While deriving this string the non-terminals are processed from left to right, i.e. the leftmost non-terminal is processed first. The steps involved in this derivation can also be represented in the form of a tree as shown in figure II.
Note: Every symbol on the left side of a production in a CFG is a non-terminal.
[Fig I and Fig II: two different parse trees for the same expression.]
yield of Fig I: (8 – 6) + 2        yield of Fig II: 8 – (6 + 2)
Associativity of Operator:
Suppose an operand has the same operator on its left and right side; then:
I- If the operator on the left applies first, the operator is left
associative.
II - If the operator on the right applies first, the operator is right
associative.
For e.g.
1- All the four basic arithmetic operators +,-,*,/ are left associative
8 - 6 - 2 means (8 – 6) – 2.
Note: Tree of the grammar for left associative operator grows towards left
and tree of the grammar for right associative operator grows towards right.
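The difference can be checked with a small C sketch (illustrative, not from the notes) that folds the same digit sequence once left-to-right and once right-to-left over the - operator:

```c
#include <assert.h>

/* Evaluate d0 - d1 - d2 ... treating '-' as LEFT associative:
   ((d0 - d1) - d2) ... */
int eval_left(const int *d, int n) {
    int acc = d[0];
    for (int i = 1; i < n; i++) acc = acc - d[i];
    return acc;
}

/* The same sequence treated as RIGHT associative:
   d0 - (d1 - (d2 ...)) */
int eval_right(const int *d, int n) {
    int acc = d[n - 1];
    for (int i = n - 2; i >= 0; i--) acc = d[i] - acc;
    return acc;
}
```

For 8 - 6 - 2 the left associative reading gives (8 - 6) - 2 = 0, while the right associative reading would give 8 - (6 - 2) = 4.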
type → simple
     | array[simple] of type
simple → integer
       | char
       | num..num     ( assume that num is a positive integer )
This grammar defines a subset of the types in the Pascal language.
C                    Pascal
int x;               x: integer
int *ptr = &x;       ptr: ^integer
int A[20];           A: array[0..19] of integer
Recursive Grammar:
DEF “A production is recursive if it defines a non-terminal in terms of itself.” OR
“A grammar is recursive if at least one of its productions is recursive.”
For e.g.
expr → expr + digit          --- left recursive
expr → digit + expr          --- right recursive
expr → digit + expr - digit  --- ____________ ?
Note: A recursive production which is not left recursive is right recursive.
Left Recursion:
DEF “A production is left recursive if the non-terminal on its LHS occurs on the extreme left of the RHS of the production.”
Left recursion is dangerous for top-down parsing. It can be proved as follows:
Proof Prove that left recursion is dangerous for top-down parsing.
(a) For this purpose consider a left recursive grammar:
expr → expr + digit
     | digit
In this case the parser can select any one of the two productions; which production will be selected depends upon the look-ahead symbol.
expr ⇒ digit
expr ⇒ expr + digit
     ⇒ expr + digit + digit
     ⇒ expr + digit + digit + digit
     ⇒ expr + digit + digit + digit + digit ……
  :
In this way the parser can enter an infinite loop. This is why left recursion is dangerous for top-down parsing: the derivation takes place from left to right and the parser can enter into an infinite loop.
A → βA’        ---- II
A’ → αA’ | є
Let us find the output of (the strings that can be derived from) the above grammar:
A ⇒ βA’ ⇒ β                   (A’ → є; parsing terminates, output β)
A ⇒ βA’ ⇒ βαA’ ⇒ βα           (A’ → αA’, then A’ → є; output βα)
A ⇒ βA’ ⇒ βαA’ ⇒ βααA’ ⇒ βαα  (terminates; output βαα)
  :
Output: The strings that can be derived from this grammar are a β followed by zero or more α's, i.e.
βє, βαє, βααє, … = β, βα, βαα, …
Grammar I ( A → Aα | β ) and Grammar II ( A → βA’, A’ → αA’ | є ) derive the same strings.
We know that left recursion is dangerous for top-down parsing. So, for top-down parsing a left recursive grammar must be converted into an equivalent right recursive grammar. Converting a left recursive grammar into an equivalent right recursive grammar is called elimination of left recursion from a grammar.
Exercise Eliminate left recursion from the following grammars:
1 – E → E + T | T
    T → T * F | F
    F → (E) | id
2 – expr → expr + digit
         | expr - digit
         | digit
    digit → 0 | 1 | 2 | 3 … | 9
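After eliminating left recursion, a right recursive grammar such as expr → digit rest, rest → + digit rest | - digit rest | є (where rest plays the role of the primed non-terminal) can be parsed top-down with one function per non-terminal. A minimal sketch in C (the function names and this particular grammar are illustrative, not from the notes):

```c
#include <assert.h>
#include <ctype.h>

static const char *p;   /* current position in the input */

static int match_digit(void) {
    if (isdigit((unsigned char)*p)) { p++; return 1; }
    return 0;
}

/* rest -> + digit rest | - digit rest | epsilon */
static int rest(void) {
    if (*p == '+' || *p == '-') {
        p++;
        return match_digit() && rest();
    }
    return 1;   /* epsilon: succeed without consuming input */
}

/* expr -> digit rest */
static int expr(void) { return match_digit() && rest(); }

/* Returns 1 if s is a valid expression and is fully consumed. */
int parse(const char *s) {
    p = s;
    return expr() && *p == '\0';
}
```

parse("8-6+2") succeeds; with the left recursive expr → expr + digit the corresponding function would call itself before consuming any input and recurse forever.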
[Figure: model of a table-driven top-down parser – an input buffer a + b $, a stack X u v w … $ with X on top, the Parsing Program, and the parsing table M.]
The parsing process depends upon two symbols:
1 – The current input token ‘a’.
2 – The symbol ‘X’ on top of the stack.
There are three possibilities:
1 – X = a = $: parsing is complete, accept the input.
2 – X = a ≠ $: pop X off the stack, throw it away and advance the input.
3 – If X is a non-terminal then consult the entry M[X, a] of the parsing table. If this slot is blank it is an error, report a syntax error. If it is an X production of the form X → uvw then pop X from the stack, throw it away and push the symbols of the RHS on to the stack in reverse order, so that u comes on the top of the stack.
4 – X → uvw is the output.
Example Consider the following grammar:
E → TE’
E’ → +TE’ | є
T → FT’
T’ → *FT’ | є
F → (E) | id
Non-terminals: E, E’, T, T’, F
Terminals: +, *, (, ), id, $
Starting Symbol: E
NT\Term | +       | *        | (      | )     | id    | $
E       |         |          | E→TE’  |       | E→TE’ |
E’      | E’→+TE’ |          |        | E’→є  |       | E’→є
T       |         |          | T→FT’  |       | T→FT’ |
T’      | T’→є    | T’→*FT’  |        | T’→є  |       | T’→є
F       |         |          | F→(E)  |       | F→id  |
( In this grammar it has been assumed that lower case letters represent non-terminals and upper case letters represent terminals. )
FIRST(x) = { A, J, P }
Q – What should be done if a RHS begins with a non-terminal?
In this case the first set of that non-terminal (i.e. the beginning non-terminal) is included in the first set of the non-terminal on the LHS. Now consider the following grammar:
x → ABm
  | jkT
  | Pq
m → sj
  | yk
j → DMa
  | CNB
FIRST(x) = { A, FIRST(j), P }
         = { A, { D, C }, P }
         = { A, D, C, P }
A function that computes the first sets of the non-terminals of a grammar is called a First Function.
Rules For Finding First Sets:
1 – If x is a terminal then FIRST(x) = {x}.
2 – If x → є is a production then є is included in FIRST(x).
3 – Consider x → Aα, where A is a terminal and α is a sequence of zero or more terminals and non-terminals. Then A is included in FIRST(x).
4 – Consider x → bα, where b is a non-terminal and α is a sequence of zero or more terminals and non-terminals. Then FIRST(b) is included in FIRST(x). In general, for x → β | bα, where β and α are sequences of terminals and non-terminals and b is a non-terminal,
FIRST(x) = FIRST(β) U FIRST(b)
Example Find first sets of all the non-terminals of the following
grammar:
E → TE’
E’ → +TE’ | є
T → FT’
T’ → *FT’ | є
F → (E) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E’) = { +, є }
FIRST(T) = { (, id }
FIRST(T’) = { *, є }
FIRST(F) = { (, id }
Exercise Find first sets of all the non-terminals of the following
grammar:
stmt → expr;
expr → term expr’
expr’ → + term expr’ | є
term → factor term’
term’ → * factor term’ | є
factor → (expr) | id
Follow Set:
DEF “A terminal is included in the follow set of a non-terminal if it follows
(comes immediately after) that non-terminal in any production.”
Example Consider the following grammar, in which it has been assumed that lower case letters represent non-terminals and upper case letters represent terminals:
s → AbM
  | bTk
k → JbS
  | XY
…………
FOLLOW(b) = { M, T, S }
Q – What should be done if a non-terminal is followed by another non-terminal?
s → AbM
  | bTk
k → JbS
  | XY
t → GnU | BX
Finding follow sets is a multi pass (multi step) process. During the first pass rules 1 and 2 are applied. During the subsequent passes rule 3 is repeatedly applied until it does not add anything new to the follow sets.
Example Find follow sets of all the non-terminals of the following grammar:
E → TE’
E’ → +TE’ | є
T → FT’
T’ → *FT’ | є
F → (E) | id
1 – Consider: E → TE’
Compare with: A → α
A = E, α = TE’
FIRST(α) = FIRST(TE’) = FIRST(T) = { (, id }
a = (, id
For a = (,
M[A, a] = A → α
M[E, (] = E → TE’
For a = id,
M[A, a] = A → α
M[E, id] = E → TE’
2 – Consider: E’ → +TE’ | є
Compare with: A → α
A = E’, α = +TE’ | є
FIRST(α) = FIRST(+TE’ | є) = { +, є }
a = +, є
For a = +,
M[A, a] = A → α
M[E’, +] = E’ → +TE’
For a = є,
Find FOLLOW(A):
FOLLOW(E’) = { ), $ }
b = ), $
For b = ),
M[A, b] = A → є
M[E’, )] = E’ → є
For b = $,
M[E’, $] = E’ → є
3 – Consider: T → FT’
Compare with: A → α
A = T, α = FT’
FIRST(α) = FIRST(FT’) = FIRST(F) = { (, id }
a = (, id
For a = (,
M[A, a] = A → α
M[T, (] = T → FT’
For a = id,
M[A, a] = A → α
M[T, id] = T → FT’
4 – Consider: T’ → *FT’ | є
Compare with: A → α
A = T’, α = *FT’ | є
FIRST(α) = FIRST(*FT’ | є) = { *, є }
a = *, є
For a = *,
M[A, a] = A → α
M[T’, *] = T’ → *FT’
For a = є,
Find FOLLOW(A):
FOLLOW(T’) = { +, ), $ }
b = +, ), $
For b = +,
M[A, b] = A → є
M[T’, +] = T’ → є
For b = ),
M[A, b] = A → є
M[T’, )] = T’ → є
For b = $,
M[A, b] = A → є
M[T’, $] = T’ → є
5 – Consider: F → (E) | id
Compare with: A → α
A = F, α = (E) | id
FIRST(α) = FIRST( (E) | id ) = { (, id }
a = (, id
For a = (,
M[A, a] = A → α
M[F, (] = F → (E)
For a = id,
M[A, a] = A → α
M[F, id] = F → id
Parsing Table
NT\Term | +       | *        | (      | )     | id    | $
E       |         |          | E→TE’  |       | E→TE’ |
E’      | E’→+TE’ |          |        | E’→є  |       | E’→є
T       |         |          | T→FT’  |       | T→FT’ |
T’      | T’→є    | T’→*FT’  |        | T’→є  |       | T’→є
F       |         |          | F→(E)  |       | F→id  |
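The completed table can be exercised with a small table-driven parser. The following C sketch is illustrative (not from the notes): it encodes the table above using the single characters E, e, T, t, F for E, E’, T, T’, F and ‘i’ for the token id.

```c
#include <assert.h>
#include <string.h>

/* Stack of grammar symbols; '$' marks the bottom. */
static char stack[128];
static int top;
static void push(char c) { stack[++top] = c; }

/* Look up M[X, a]: returns the RHS to push (reversed later),
   "" for an epsilon production, or NULL for a blank (error) slot. */
static const char *table(char X, char a) {
    switch (X) {
    case 'E': if (a == '(' || a == 'i') return "Te";  break; /* E  -> T E'   */
    case 'e': if (a == '+') return "+Te";                    /* E' -> + T E' */
              if (a == ')' || a == '$') return "";    break; /* E' -> epsilon*/
    case 'T': if (a == '(' || a == 'i') return "Ft";  break; /* T  -> F T'   */
    case 't': if (a == '*') return "*Ft";                    /* T' -> * F T' */
              if (a == '+' || a == ')' || a == '$') return ""; break;
    case 'F': if (a == '(') return "(E)";                    /* F -> ( E )   */
              if (a == 'i') return "i";               break; /* F -> id      */
    }
    return 0;
}

/* Parse input (id written as 'i', terminated by '$'). 1 = accept. */
int parse(const char *in) {
    top = -1;
    push('$'); push('E');                 /* start symbol on top */
    while (1) {
        char X = stack[top], a = *in;
        if (X == '$' && a == '$') return 1;        /* accept */
        if (X == '(' || X == ')' || X == '+' ||
            X == '*' || X == 'i' || X == '$') {    /* X is a terminal */
            if (X != a) return 0;
            top--; in++;                           /* match, advance */
        } else {
            const char *rhs = table(X, a);
            if (!rhs) return 0;                    /* blank slot: error */
            top--;                                 /* pop X */
            for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
                push(rhs[k]);                      /* push RHS reversed */
        }
    }
}
```

parse("i+i*i$") accepts, while parse("i+*i$") reports an error at the blank entry M[T, *].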
Exercise Find FIRST and FOLLOW sets of all the non-terminals and also construct the parsing table of the following grammars:
1 – stmt → expr;
    expr → term expr’
    expr’ → + term expr’ | є
    term → factor term’
    term’ → * factor term’ | є
    factor → (expr) | id
2 – s → e
    e → te’
    e’ → +te’ | є
    t → f t’
    t’ → * f t’ | є
    f → (e) | id
Bottom-Up Parsing:
In this type of parsing the construction of the tree starts from the leaves and proceeds up to the root. This type of parsing is usually performed by a parser generated by a tool.
Step 2 – Process the specification using yacc tool. For this purpose use
the following command in UNIX environment.
$ yacc first.y
This generates the parser program in the form of a function yyparse() and by default saves it in a file y.tab.c. It can be renamed if desired.
Step 3 – Compile this C program. For this purpose use the following command:
$ cc y.tab.c -ly
where, ‘cc’ means the C compiler,
‘y.tab.c’ is the default file name,
‘-ly’ means link with the yacc library.
This generates an executable file a.out which is the parser.
F → (E) | DIGIT
Step 1 – Write yacc specification, i.e.
//yacc specification
%{
#include <ctype.h>    /* for isdigit() */
%}
%token DIGIT
%%
line : expr '\n' { printf("%d\n", $1); }   /* $1 is the value of expr */
     ;
expr : expr '+' term { $$ = $1 + $3; }
     | term
     ;
term : term '*' factor { $$ = $1 * $3; }
     | factor
     ;
factor : '(' expr ')' { $$ = $2; }
       | DIGIT
       ;
%%
//lexical Analyzer
int yylex()
{
    int c;
    c = getchar();          /* getche() on DOS; getchar() on UNIX */
    if (isdigit(c))         /* function in ctype.h */
    {
        yylval = c - '0';   /* attribute value of the digit */
        return DIGIT;       /* return the token code, not its value */
    }
    return c;
}
Save it in a file first.y and perform steps 2 and 3 as mentioned above.
Exercise Develop the Lexical Analyzer and the parser program separately for the desk calculator.
Chapter 4
Semantic Analysis
( Repeated from Chapter 1. )
In this phase it is verified that a grammatically correct statement is meaningful.
In programming, meaningful is confined to type checking only.
Chapter 5
Intermediate Code Generation
Intermediate Languages:
There are many intermediate languages but we shall consider only
three or four intermediate languages / intermediate representation.
1 – Syntax Tree: It represents the hierarchical structure of the statement. For e.g. the syntax tree of the statement a = b * -c + b * -c is shown in Fig I.
[Fig I Syntax Tree: = at the root with children a and +; each operand of + is a subtree b * (uminus c).]
2 – DAG (Directed Acyclic Graph): It contains the same information as the syntax tree but here the information is in a more compact form. A DAG for the above statement is shown in Fig II.
[Fig II DAG: the common subexpression b * (uminus c) appears only once and is shared by both operands of +.]
[Figure: the same tree drawn with explicit leaf nodes id a, id b, id c and unary minus (U-) interior nodes.]
Binding Of Names:
In programming language semantics, the Environment is a function that maps a name to a storage location, and the State is a function that maps a storage location to a value:
name --environment--> storage location --state--> value
Thus an assignment statement changes the r-value while the l-value remains the same.
Activation Record:
DEF “An activation record is used to store the information regarding a single execution of a procedure or function.”
In languages like Pascal, C etc. the activation record is usually pushed on to the runtime stack when a procedure is called and popped off when control returns to the caller. The activation record is a record / structure whose fields are laid out, top to bottom: Return Value, Actual Parameters, Optional Control Link, Optional Access Link, Saved Machine Status, Local Data, Temporaries.
1 – Temporaries: The intermediate values arising while expressions are evaluated are placed in temporary variables.
2 – Local Data: This field contains the data local to an execution of the procedure / function.
3 – Saved Machine Status: It holds the information about the state of the machine just before the procedure is called, i.e. this field stores the values of the IP and other registers.
4 – Optional Access Link: Refers to non-local data held in other activation records. This field is not required in the case of FORTRAN, but is required in the case of Pascal, C, etc.
5 – Optional Control Link: Points to the activation record of the calling procedure / function.
6 – Actual Parameters: This field is used by the calling procedure to supply the actual parameters.
7 – Return Value: This field is used by the called procedure to return a value to the calling procedure.
All of these fields are not always used. Sometimes registers are used in place of some of the fields.
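As a rough illustration, the record can be sketched as a C struct (a hypothetical layout; a real compiler lays these fields out as raw stack memory, and the sizes chosen here are arbitrary):

```c
#include <assert.h>

/* One activation of a function: pushed on call, popped on return.
   Field order mirrors the layout described above. */
struct activation_record {
    int    return_value;         /* set by the callee                 */
    int    actual_params[4];     /* supplied by the caller            */
    struct activation_record *control_link;  /* caller's record       */
    struct activation_record *access_link;   /* for non-local data    */
    void  *saved_ip;             /* saved machine status (IP etc.)    */
    int    locals[8];            /* local data                        */
    int    temporaries[8];       /* intermediate expression values    */
};

/* Simulate calling square(arg): the caller fills the parameters,
   the callee computes into a temporary and sets the return value. */
int simulate_square_call(int arg) {
    struct activation_record ar = {0};
    ar.actual_params[0] = arg;
    ar.temporaries[0] = ar.actual_params[0] * ar.actual_params[0];
    ar.return_value = ar.temporaries[0];
    return ar.return_value;
}
```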
Symbol Table:
DEF “A Symbol Table is a data structure which contains information about
tokens and their attributes.”
1 – Linear List: The effort required to make one entry into, or one inquiry of, a list already holding n items is proportional to n; call it Cn per operation.
Total effort for n entries and e inquiries:
nCn + eCn = ( n + e )Cn
In a medium size program we might have n = 100; e = 1000;
Effort = ( 100 + 1000 ) * 100 C
       = 1100 * 100 C
       = 110,000 C = E (say)
If the size of the program becomes 10 times larger then
n’ = 1000; e’ = 10,000
Effort = ( n’ + e’ ) n’ C
       = ( 1000 + 10,000 ) 1000 C
       = 11,000,000 C
       = 100 ( 110,000 C )
       = 100 E
Thus, if the size of a program increases 10 times, the effort required to compile that program increases 100 times. Here comes the inefficiency.
2 – Hash Table: We shall consider the Open Hash Table, i.e.
I – A hash table consisting of an array of n pointers.
II – m separate linked lists called buckets ( m <= n ).
[Figure: a Closed Hash Table stores data directly in slots 0 … 210; an Open Hash Table stores in each slot a pointer to a linked list of data records ending in NULL.]
Each record appears in exactly one of the lists. To convert an entry s into an index of the array we apply a hash function h(s), which returns an integer between zero and ( n - 1 ). If s is in the symbol table then it is in the list h(s); otherwise it is inserted at the front of that list. As a rule of thumb, if n names have been entered into the table the average length of a list is n / m, so the length of each list reduces to n / m items. The effort for e inquiries is thus eCn / m and the effort for n entries is nCn / m.
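A minimal sketch of such a hash function and bucket lookup in C (the multiplier 31 and the names h and lookup_or_insert are illustrative assumptions, not from the notes):

```c
#include <assert.h>
#include <string.h>
#include <stdlib.h>

#define N 211   /* number of buckets (a prime, as is conventional) */

struct entry {
    const char *name;
    struct entry *next;   /* front-inserted linked list (bucket) */
};

static struct entry *bucket[N];

/* Hash a name s into an index between 0 and N-1. */
static unsigned h(const char *s) {
    unsigned v = 0;
    while (*s) v = v * 31 + (unsigned char)*s++;
    return v % N;
}

/* Look up s; if absent, insert it at the front of list h(s). */
struct entry *lookup_or_insert(const char *s) {
    unsigned i = h(s);
    for (struct entry *e = bucket[i]; e; e = e->next)
        if (strcmp(e->name, s) == 0) return e;   /* found */
    struct entry *e = malloc(sizeof *e);
    e->name = s;
    e->next = bucket[i];                          /* front insert */
    bucket[i] = e;
    return e;
}
```

Repeated lookups of the same name return the same entry, and with a good h the expected list length stays near n / N.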
Total effort: E = ( e + n )Cn / m
where m can be made as large as we like. This method is more efficient than the linked list. Suppose m = n / 2, then
E = ( e + n )Cn / ( n / 2 )
  = 2 ( e + n ) C
For a medium size program:
n = 100; e = 1000
E = 2 ( e + n) C
2 ( 1000 + 100 ) C
2 ( 1100 ) C
2200 C = 1 E
If the program becomes 10 times larger then on the average, i.e.
n = 1000; e = 10,000
Now, E = 2 ( e + n) C
2 ( 10,000 + 1000 ) C
2 ( 11,000 ) C
22000 C
10 ( 2200 ) C
10 E
This time the effort also becomes 10 times larger, thus effort varies
linearly as the size of the program.
Space Required:
1 – n words for hash table.
2 – bn words for n entries where b is the space required for one entry.
float volume( int r, int h )
{
    return M_PI * r * r * h;   /* M_PI is defined in math.h */
}
In this example r and h are formal / dummy parameters.
Values of these parameters must be supplied when the function is
called. These values are called Actual Parameters.
Exercise Find out the Actual and Formal Parameters in the following
functions definitions:
1–
# include < ---- >
inline int square( int x ){ return x * x; }
void main(void)
{
    int a = 5, b;
    b = square(a);
    cout<<"The square of "<<a<<" is "<<b;
}
2 –
# include < ---- >
inline int square( int x ){ return x * x; }
void main(void)
{
    cout<<"The square of "<<5<<" is "<<square(5);
}
Parameter Passing:
DEF “Associating actual parameters with formal parameters at the time of a function call is called Parameter Passing.”
1 – Call By Value:
The formal parameters are treated as local variables so that there is a
room for them in the activation record. When the function or procedure is
called the actual parameters are copied into the formal parameters.
void swap(int x, int y);   // prototype, needed before main
void main(void)
{
    int a = 5, b = 6;
    cout <<"Before swapping the value of a is "<<a
         <<" and the value of b is "<<b<<endl;
    swap( a, b );
    cout <<"After swapping the value of a is "<<a
         <<" and the value of b is "<<b;
}
void swap(int x, int y)
{
    int temp;
    temp = x;
    x = y;
    y = temp;
}
Exercise What is the output of the above program?
Although the values of the formal parameters have changed, this change is
not reflected by the actual parameters and the values of the actual
parameters remain intact.
The distinguishing feature of this method is that the values of the actual parameters do not change.
Note: The local variables of a procedure or function are created when that procedure or function is active, and are discarded when the procedure or function terminates.
2 – Call By Reference:
In this method the addresses (l values) of the actual parameters are
copied into the addresses of the formal parameters, so that, each pair of
actual and formal parameters points to the same location. In this case the
affects of processing are reflected in actual parameters.
void main(void)
{
    int a = 5, b = 6;
    cout <<"Before swapping the value of a is "<<a
         <<" and the value of b is "<<b<<endl;
    swap( a, b );
    cout <<"After swapping the value of a is "<<a
         <<" and the value of b is "<<b;
}
3 – Call By Name:
The working of this method is as follows:
1 – The procedure is treated as a macro, i.e. its body is substituted
for the call in the caller with actual parameters literally
substituted for the formals. Such a literal substitution is called
Macro Expansion or Inline Expansion.
2 – The local names of the called procedure are kept distinct from
the names of the calling procedure. We can think of each local
name of the called procedure being systematically renamed into
a distinct new name before macro expansion is done.
3 – The actual parameters are placed within parentheses if necessary to preserve their integrity.
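Point 3 is the reason C macro writers parenthesize parameters. A small C illustration (SQUARE and SQUARE_BAD are hypothetical macros, not from the notes):

```c
#include <assert.h>

/* Macro expansion substitutes the actual parameter literally,
   so SQUARE_BAD(1 + 2) expands to 1 + 2 * 1 + 2.              */
#define SQUARE_BAD(x)  x * x
/* Parenthesizing preserves the integrity of the actual parameter. */
#define SQUARE(x)      ((x) * (x))

int bad  = SQUARE_BAD(1 + 2);  /* 1 + 2*1 + 2 = 5, not 9 */
int good = SQUARE(1 + 2);      /* (1+2) * (1+2) = 9      */
```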
4 – Copy Restore:
It is a hybrid between call by value and call by reference ( also known as COPY-IN COPY-OUT ).
The working of this method is:
Chapter 6
Code Optimization
Chapter 7
Code Generation
The final phase of a compiler is the code generator. The input to this phase is an intermediate representation (IR) of the source program with optimized code, and its output is an equivalent program in a target language.
[Figure: the code generator, with access to the Symbol Table and the Error Handler.]
It is required that:
1– The generated code must be correct.
2– The generated code must be of high quality (i.e. it makes an effective
use of resources)
3– The code generator itself should be efficient.
etc). We also assume that type checking has taken place and that type conversions have been inserted where necessary. (In some compilers this kind of type checking is done together with the code generation.)
2 – Target Program: The target program is the output of the code generator. Like the IR it can have many forms, i.e.
a – Absolute Machine Code can be placed in fixed locations in memory and immediately executed.
b – Relocatable Machine Code allows sub programs to be compiled
1 MOV AX, b
2 ADD AX, c
3 MOV a, AX
Statement 2
4 MOV AX, a
5 ADD AX, e
6 MOV d, AX
In this code the 4th statement is redundant, since after statement 3 the register AX already holds the value of a.
The quality of the generated code is determined by its speed and size.
A target machine with a rich instruction set may provide several ways of
doing the same thing. Since the cost differences between different implementations may be significant, a naïve translation may lead to correct but unacceptably inefficient code, e.g. if a target machine has an INC instruction then,
Let us translate x = x + 1 into two Assembly routines, i.e.
1 – MOV AX, x
    ADD AX, 1
    MOV x, AX
2 – INC x    ; machine having an increment instruction
The second one is more efficient.
The selection of a best Assembly code sequence for a three address
construct also depends upon the context in which the construct appears.
Instruction Cost:
Cost of an instruction = cost associated with the source and destination addressing modes + 1
This cost corresponds to the length in words of the instruction.
Addressing modes involving:
a – Registers have cost zero.
b – A memory location has cost 1.
c – Literals (constants) have cost 1.
In order to save space we must minimize the length of an instruction. This has an additional benefit as well.
For most machine and for most instructions the time taken to fetch an
instruction from the memory exceeds the time spent in executing the
instruction. Thus by minimizing the instruction length we can minimize the
time taken to perform that instruction.
Consider the following examples:
MOV AX, BX
Cost associated with source = 0
Cost associated with destination = 0
Cost associated with source and destination modes = 0
Cost of instruction = 1 + 0 = 1 ( the instruction occupies one word )
MOV a, AX
Cost of instruction = 1 + 1 ( memory destination ) = 2
ADD AX, 1
Cost of source and destination modes = 1 ( literal ) + 0
Cost of instruction = 1 + 1 = 2
We assume that the machine is byte addressable, has n registers and a four byte word.
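The cost rule above can be captured in a few lines of C (a sketch; the mode classification is simplified to register / memory / literal):

```c
#include <assert.h>

enum mode { REG, MEM, LIT };   /* addressing mode of an operand */

/* Cost of an addressing mode: registers are free,
   memory locations and literals each add one word. */
static int mode_cost(enum mode m) { return m == REG ? 0 : 1; }

/* Cost of an instruction = 1 + source mode cost + destination mode cost,
   i.e. its length in words. */
int instruction_cost(enum mode dst, enum mode src) {
    return 1 + mode_cost(dst) + mode_cost(src);
}
```

With this rule MOV AX, BX costs 1; MOV a, AX costs 2; ADD AX, 1 costs 2, matching the examples above.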