Unit 5

Uploaded by

Shirley Andrina
Copyright © All Rights Reserved

UNIT V

Code generation: Machine-dependent code generation, object code forms, generic code
generation algorithm, Register allocation, and assignment. Using DAG representation of
Block.

CODE GENERATION
The final phase in the compiler model is the code generator. It takes as input an intermediate representation of
the source program and produces as output an equivalent target program. The code generation techniques
presented below can be used whether or not an optimizing phase occurs before code generation.

The code generated by the compiler is object code in some lower-level language, for example,
assembly language.
The source code, written in a higher-level language, is translated into this lower-level object
code, which should have at least the following properties:
- It should carry the exact meaning of the source code.
- It should be efficient in terms of CPU usage and memory management.

ISSUES IN THE DESIGN OF A CODE GENERATOR

The code generator converts the intermediate representation of the source code into a form that
can be readily executed by the machine. A code generator is expected to generate correct code,
and it should be designed so that it can be easily implemented, tested, and maintained.

The main issues in the design of a code generator are:


Input to the code generator
Target program
Memory management
Instruction selection
Register allocation
Evaluation order

(i) Input to code generator The input to the code generator is the intermediate code generated
by the front end, along with information in the symbol table that determines the run-time addresses
of the data objects denoted by the names in the intermediate representation. Intermediate code
may be represented as quadruples, triples, indirect triples, postfix notation, syntax trees,
DAGs, etc. The code generation phase proceeds on the assumption that the input is free from
all syntactic and static semantic errors, that the necessary type checking has taken place, and
that type-conversion operators have been inserted wherever necessary.

(ii) Target program: The target program is the output of the code generator. The output may be
absolute machine language, relocatable machine language, or assembly language.
- Absolute machine language as output has the advantage that it can be placed in
a fixed memory location and immediately executed. For example, WATFIV
is a compiler that produces absolute machine code as output.
- Relocatable machine language as output allows subprograms and subroutines
to be compiled separately. Relocatable object modules can be linked together and
loaded by a linking loader, at the added expense of linking and loading.
- Assembly language as output makes code generation easier. We can generate
symbolic instructions and use the macro facilities of the assembler in generating code.
The price is an additional assembly step after code generation.

(iii) Memory Management Mapping the names in the source program to the addresses of data
objects is done cooperatively by the front end and the code generator. A name in a three-address
statement refers to the symbol table entry for that name. From the symbol table entry, a relative
address can be determined for the name.
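
The mapping from names to relative addresses can be sketched as follows. This is only an illustration: the function name, the symbol table layout, and the widths in bytes are assumptions made here, not something the notes prescribe.

```python
def assign_relative_addresses(declarations):
    """Give each declared name an offset from the start of its data area.

    declarations: list of (name, width_in_bytes) pairs in declaration order.
    Returns the symbol table and the total size of the data area.
    """
    symbol_table = {}
    offset = 0
    for name, width in declarations:
        symbol_table[name] = {"offset": offset, "width": width}
        offset += width            # the next name starts right after this one
    return symbol_table, offset


# Hypothetical declarations: a and c are 4 bytes wide, b is 8 bytes wide.
table, size = assign_relative_addresses([("a", 4), ("b", 8), ("c", 4)])
for name, entry in table.items():
    print(name, "at relative address", entry["offset"])
```

The code generator would then turn a name in a three-address statement into an access at the relative address stored in its symbol table entry.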

(iv) Instruction selection Selecting the best instructions improves the efficiency of the program. The
uniformity and completeness of the instruction set are important factors. Instruction speeds and machine
idioms also play a major role when efficiency is considered. If we do not care about the efficiency of the
target program, instruction selection is straightforward: each three-address statement is translated on its
own into a fixed code sequence. For example, the two three-address statements below would be translated
into the code sequence that follows them:
P:=Q+R
S:=P+T

MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S
Here the fourth instruction is redundant: it reloads into R0 the value of P that the previous
instruction has just stored from R0. The result is an inefficient code sequence. A given
intermediate representation can be translated into many code sequences, with significant cost
differences between the different implementations. Knowledge of instruction costs is needed
in order to design good sequences, but accurate cost information is difficult to obtain.
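
A minimal sketch of this statement-by-statement translation, together with a small cleanup pass that drops the redundant load. The function names and the tuple representation are assumptions for illustration, and the cleanup is only valid inside a basic block, where no jump can land between the store and the load.

```python
def naive_codegen(statements):
    """Translate each (x, y, z) triple, meaning x := y + z, on its own."""
    code = []
    for x, y, z in statements:
        code.append(("MOV", y, "R0"))   # load the first operand
        code.append(("ADD", z, "R0"))   # add the second operand
        code.append(("MOV", "R0", x))   # store the result
    return code


def drop_redundant_loads(code):
    """Remove 'MOV a, R' when it directly follows 'MOV R, a'."""
    result = []
    for instr in code:
        if result:
            prev = result[-1]
            if (instr[0] == "MOV" and prev[0] == "MOV"
                    and instr[1] == prev[2] and instr[2] == prev[1]):
                continue                # the register already holds this value
        result.append(instr)
    return result


program = [("P", "Q", "R"), ("S", "P", "T")]   # P := Q + R ; S := P + T
naive = naive_codegen(program)
improved = drop_redundant_loads(naive)
for op, src, dst in improved:
    print(f"{op} {src}, {dst}")
```

The naive translation yields the six instructions listed above; the cleanup pass removes MOV P, R0 and leaves five.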
(v) Register allocation issues Computations on registers are faster than computations on
memory operands, so efficient utilization of registers is important. The use of registers is
subdivided into two sub-problems:
1. During register allocation, we select the set of variables that will reside in
registers at each point in the program.
2. During a subsequent register assignment phase, we pick the specific register in which each
variable will reside.

To understand the concept, consider the following three-address code sequence:
t:=a+b
t:=t*c
t:=t/d
An efficient machine code sequence for it keeps t in register R0 throughout:
MOV a, R0
ADD b, R0
MUL c, R0
DIV d, R0
MOV R0, t

(vi) Evaluation order The code generator decides the order in which computations are
performed. This order affects the efficiency of the target code: some orders need fewer
registers to hold intermediate results than others. However, picking the best order in the
general case is an NP-complete problem.
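
The effect of evaluation order on register need can be illustrated for a single expression tree. The sketch below is not taken from these notes: it assumes that every leaf must first be loaded into a register and that nothing is spilled, and it uses the classical Sethi-Ullman (Ershov number) labeling for the better order. Expressions are nested tuples of the form (op, left, right), and the function names are hypothetical.

```python
def registers_left_first(node):
    """Registers needed when the left operand is always evaluated first."""
    if isinstance(node, str):           # a leaf is loaded into one register
        return 1
    _, left, right = node
    # The left result occupies one register while the right side is evaluated.
    return max(registers_left_first(left), 1 + registers_left_first(right))


def registers_best_order(node):
    """Registers needed when the more demanding operand is evaluated first."""
    if isinstance(node, str):
        return 1
    _, left, right = node
    l = registers_best_order(left)
    r = registers_best_order(right)
    return l + 1 if l == r else max(l, r)


# a + (b + (c + d))
expr = ("+", "a", ("+", "b", ("+", "c", "d")))
print(registers_left_first(expr), registers_best_order(expr))
```

For this right-leaning expression, strict left-to-right evaluation needs four registers, while evaluating the deeper operand first needs only two. Trees are the easy case; the hard problem mentioned above arises for general code, such as DAGs with shared subexpressions.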

Approaches to code generation issues: A code generator must always generate correct code.
Correctness takes on special significance because of the number of special cases that a code
generator might face. The design goals of a code generator are that it should be:
Correct
Easily maintainable
Testable
Efficient

Disadvantages in the design of a code generator:

Limited flexibility: Code generators are typically designed to produce a specific type of code,
and as a result, they may not be flexible enough to handle a wide range of inputs or generate
code for different target platforms. This can limit the usefulness of the code generator in certain
situations.
Maintenance overhead: Code generators can add a significant maintenance overhead to a
project, as they need to be maintained and updated alongside the code they generate. This can
lead to additional complexity and potential errors.
Debugging difficulties: Debugging generated code can be more difficult than debugging hand-
written code, as the generated code may not always be easy to read or understand. This can make
it harder to identify and fix issues that arise during development.
Performance issues: Depending on the complexity of the code being generated, a code
generator may not be able to generate optimal code that is as performant as hand-written code.
This can be a concern in applications where performance is critical.
Learning curve: Code generators can have a steep learning curve, as they typically require a
deep understanding of the underlying code generation framework and the programming
languages being used. This can make it more difficult to onboard new developers onto a project
that uses a code generator.

Over-reliance: Heavy use of a code generator can lead to over-reliance on generated code, to
the point where developers are no longer able to write code manually when necessary. This can
limit the flexibility and creativity of a development team, and may also result in lower quality
code overall.

MACHINE DEPENDENT CODE OPTIMIZATIONS

Machine-dependent optimization uses information about the limits and special features of the target
machine to produce code which is shorter or which executes more quickly on the machine.
The code produced by the compiler should take advantage of the special features of the target
machine. For example, consider code intended for machines of the PDP-11 family.
These computers have auto-increment and auto-decrement addressing modes for instructions.
When an instruction is given in the auto-increment mode, the contents of the register are incremented
after being used. The register is incremented by one for byte instructions and by two for word
instructions.
The use of instructions in these modes reduces the code necessary for pushing and popping stacks.
The PDP-11 computers also have machine-level instructions to increment (INC) or to decrement
(DEC) by one a value stored in memory.
Whenever possible, the INC and DEC operations should be used instead of creating a constant with
value 1 and adding or subtracting this constant from the value stored in memory.
The PDP-11 machines have left- and right-shift operations. Shifting the bits one position to the left
is equivalent to multiplying by 2. Since shifting is faster than multiplication or division, more efficient
code is generated if multiplication and division by powers of 2 are implemented with shift
operations.
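
These choices can be written as small instruction-selection helpers. The sketch below is illustrative only: the function names are hypothetical and the mnemonics are written in PDP-11 style without claiming to match a particular assembler exactly. Only multiplication is shown, because replacing signed division by a shift needs extra care with negative values.

```python
def select_add_constant(operand, constant):
    """Choose an instruction for operand := operand + constant."""
    if constant == 1:
        return f"INC {operand}"         # no constant 1 has to be created
    if constant == -1:
        return f"DEC {operand}"
    return f"ADD #{constant}, {operand}"


def select_multiply_constant(register, constant):
    """Choose instructions for register := register * constant."""
    is_power_of_two = constant > 0 and (constant & (constant - 1)) == 0
    if is_power_of_two:
        shifts = constant.bit_length() - 1       # 8 = 2**3 -> three shifts
        return [f"ASL {register}"] * shifts      # one-bit left shift each
    return [f"MUL #{constant}, {register}"]


print(select_add_constant("count", 1))
print(select_multiply_constant("R0", 8))
print(select_multiply_constant("R0", 6))
```

Multiplying by 8 becomes three one-bit left shifts, while multiplying by 6, which is a multiple of 2 but not a power of 2, still needs a multiply instruction.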

OBJECT CODE FORMS


The output of code generation is object code or machine code, which is normally classified into the
following forms:
1. Absolute Code
2. Relocatable machine Code
3. Assembler Code
Absolute Code:- Producing an absolute machine language program as output has the advantage that it can
be placed in a fixed location in memory and immediately executed. Programs can be compiled and executed
quickly.
Relocatable Machine Code:- Producing a relocatable machine language program (often called an object
module) as output allows subprograms to be compiled separately. For example, a set of relocatable object
modules can be linked together and loaded for execution by a linking loader. If the target machine does not
handle relocation automatically, the compiler must provide explicit relocation information to the loader to
link the separately compiled program segments.
Assembler Code:- Producing an assembly-language program as output makes the process of code
generation somewhat easier. We can generate symbolic instructions and use the macro facilities of the
assembler to help generate code. However, the overall process is slower, because the output must still be
assembled, linked, and loaded.
GENERIC CODE GENERATION ALGORITHM

A simple code generation algorithm is one that generates code for a single basic block.
It considers each three-address instruction in turn and keeps track of what values are in what
registers so it can avoid generating unnecessary loads and stores.
One of the primary issues during code generation is how to use registers effectively.
There are four principal uses of registers:

In most machine architectures some or all of the operands of an operation must be in registers in
order to perform the operation.
Registers make good temporaries i.e. places to hold the result of sub expression while a larger
expression is being evaluated.
Registers are used to hold values that are computed in one basic block and used in other blocks.
Registers are often used to help with run-time storage management, for example registers are used
to manage the run-time stack.
Let us assume that some set of registers is available to hold the values that are used within the block.
Typically this set of registers does not include all the registers of the machine since some registers are
reserved for global variables and managing the stack.
The code generation algorithm considers each three-address instruction in turn and decides what loads
are necessary to get the needed operands into registers. After generating the loads, it generates the operation
itself. If the result needs to be stored into a memory location, it also generates that store. To make these
decisions, we require a data structure that tells us which program variables currently have
their value in a register, and in which register or registers.
The desired data structure has the following descriptors:
o For each available register, a register descriptor keeps track of the variable names whose current
value is in that register.
o For each program variable an address descriptor keeps track of the location or locations where the
current value of that variable can be found. Where the location may be a stack location or a register
or a memory address.
o An essential part of the algorithm is the function getReg(I) which selects registers for each memory
location associated with the three address instruction I.
o The function getReg has access to the register and address descriptors for all the variables of the
basic block.
Machine Instructions for operations
For a three address instruction such as x = y + z, do the following:
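1. Use getReg(x = y + z) to select registers for x, y and z. Let these registers be Rx, Ry, and Rz.
2. If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry, y',
where y' is one of the memory locations for y (according to the address descriptor for y).
3. Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
4. Issue the instruction ADD Rx, Ry, Rz.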

In the above three address instruction x = y + z we shall treat + as a generic operator and
ADD as the equivalent machine instruction.
Thus, when we implement the operation, the value of y must be in the second register and
z must be the third register in the ADD instruction.
Managing Register and Address Descriptors
As the code generation algorithm issues load, store and other machine instructions, it needs to update the
register and address descriptors. The rules are as follows:
1. For the instruction LD R, x
a) Change the register descriptor for register R so it holds only x.
b) Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory location.
3. For an operation such as ADD Rx, Ry, Rz implementing a three-address instruction x = y + z:
a) Change the register descriptor for Rx so that it holds only x.
b) Change the address descriptor for x so that its only location is Rx. The memory location for x is
no longer in the address descriptor for x.
c) Remove Rx from the address descriptor of any variable other than x.
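
The descriptor bookkeeping can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the full algorithm: the class and method names are hypothetical, register selection is a very plain stand-in for getReg (reuse a register that already holds the name, else take an empty one, else evict the first candidate), copy statements are not handled, and at least three registers are assumed.

```python
class BlockCodeGen:
    """Code generation for one basic block with register/address descriptors."""

    def __init__(self, registers, variables):
        self.reg_desc = {r: set() for r in registers}   # register -> names held
        self.addr_desc = {v: {v} for v in variables}    # name -> where its value is
        self.code = []

    def _load(self, reg, name):
        self.code.append(f"LD {reg}, {name}")
        self.reg_desc[reg] = {name}         # the register now holds only name
        self.addr_desc[name].add(reg)       # the register is an extra location

    def _store(self, name, reg):
        self.code.append(f"ST {name}, {reg}")
        self.addr_desc[name].add(name)      # memory copy is current again

    def _free_register(self, avoid):
        for reg, held in self.reg_desc.items():
            if not held and reg not in avoid:
                return reg
        victim = next(r for r in self.reg_desc if r not in avoid)
        for name in self.reg_desc[victim]:
            if self.addr_desc[name] == {victim}:   # only copy: save it first
                self._store(name, victim)
            self.addr_desc[name].discard(victim)
        self.reg_desc[victim] = set()
        return victim

    def _operand_reg(self, name, avoid):
        for reg, held in self.reg_desc.items():
            if name in held:
                return reg                  # already in a register, no load
        reg = self._free_register(avoid)
        self._load(reg, name)
        return reg

    def gen(self, op, x, y, z):
        """Emit code for x = y op z and update both descriptors."""
        ry = self._operand_reg(y, avoid=set())
        rz = self._operand_reg(z, avoid={ry})
        rx = next((r for r, held in self.reg_desc.items() if held == {x}), None)
        if rx is None:
            rx = self._free_register(avoid={ry, rz})
        self.code.append(f"{op} {rx}, {ry}, {rz}")
        for held in self.reg_desc.values():
            held.discard(x)                 # older copies of x are stale
        for name, places in self.addr_desc.items():
            if name != x:
                places.discard(rx)          # rx no longer holds other values
        self.reg_desc[rx] = {x}             # rx holds only x
        self.addr_desc[x] = {rx}            # x lives only in rx

    def finish(self, live_on_exit):
        """Store every live variable whose memory copy is out of date."""
        for name in live_on_exit:
            if name not in self.addr_desc[name]:
                self._store(name, sorted(self.addr_desc[name])[0])


gen = BlockCodeGen(["R1", "R2", "R3"], ["a", "b", "c", "d"])
gen.gen("ADD", "t", "a", "b")
gen.gen("MUL", "t", "t", "c")
gen.gen("DIV", "t", "t", "d")
gen.finish(["t"])
print("\n".join(gen.code))
```

For the block t = a + b; t = t * c; t = t / d, this sketch loads each of a, b, c, and d once, keeps t in R3 across all three operations, and stores t only at the end of the block.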
REGISTER ALLOCATION AND ASSIGNMENT
Local register allocation
Local register allocation works only within a basic block and follows a top-down approach:
assign registers to the most heavily used variables.
Traverse the block
Count uses
Use count as a priority function
Assign registers to higher priority variables first
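
The steps above can be sketched as follows. The statement format (plain strings), the regular expression used to pick out names, and the alphabetical tie-breaking rule are assumptions for illustration; every occurrence of a name, whether a use or a definition, is counted as one reference.

```python
import re
from collections import Counter


def allocate_by_use_count(block, registers):
    """Top-down local allocation: the most referenced names get registers."""
    counts = Counter()
    for statement in block:
        # Identifiers start with a letter or underscore; constants are skipped.
        for name in re.findall(r"[A-Za-z_]\w*", statement):
            counts[name] += 1
    # Higher count first; ties broken alphabetically for a stable result.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return dict(zip(ranked, registers)), counts


block = ["t = a + b", "u = t * a", "v = u + a"]
assignment, counts = allocate_by_use_count(block, ["R0", "R1"])
print(assignment)
```

With two registers available, a (three references) and t (two references, first alphabetically among the ties) receive registers; the remaining names stay in memory.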
Need of global register allocation

Local allocation does not take into account that some instructions (e.g. those in loops) execute more
frequently. It forces us to store/load at basic block endpoints since each block has no knowledge of
the context of others.
Global allocation is needed to find the live range(s) of each variable and the area(s) where the
variable is used or defined. The cost of spilling depends on the frequencies and locations of uses.
Register allocation depends on:
Size of live range
Number of uses/definitions
Frequency of execution
Number of loads/stores needed.
Cost of loads/stores needed.
Usage Counts:
A simple method of determining the savings to be realized by keeping variable x in a register for the
duration of loop L is to recognize that in our machine model we save one unit of cost for each reference to
x if x is in a register. We also save two units for each block in which x is assigned a value and is live on
exit, because the store at the end of that block is avoided. An approximate formula for the benefit to be
realized from allocating a register to x within a loop L is:

\[ \sum_{B \in L} \bigl( \mathrm{use}(x, B) + 2 \cdot \mathrm{live}(x, B) \bigr) \]

where,
-use(x, B) is the number of times x is used in B prior to any definition of x;
-live(x, B) is 1 if x is live on exit from B and is assigned a value in B, and
-live(x, B) is 0 otherwise.

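Example: for a loop with four blocks, each parenthesized term below is use(x, B) + 2*live(x, B) for one
block, in the order B1, B2, B3, B4:
a = (0+2)+(1+0)+(1+0)+(0+0) = 4
b = (2+0)+(0+0)+(0+2)+(0+2) = 6
c = (1+0)+(0+0)+(1+0)+(1+0) = 3
d = (1+2)+(1+0)+(1+0)+(1+0) = 6
e = (0+2)+(0+0)+(0+2)+(0+0) = 4
f = (1+0)+(0+2)+(1+0)+(0+0) = 4
Registers R0, R1, R2 are the fixed registers available.
R0 can be used by a or e or f (benefit 4 each)
R1 can be used by b (benefit 6)
R2 can be used by d (benefit 6)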
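
The usage-count arithmetic can be checked with a short script. The (use, live) pairs are read off the worked sums above, where each term has the form use + 2*live; the table and function names are introduced here only for illustration.

```python
# (use, live) for blocks B1..B4, taken from the worked sums above.
USE_LIVE = {
    "a": [(0, 1), (1, 0), (1, 0), (0, 0)],
    "b": [(2, 0), (0, 0), (0, 1), (0, 1)],
    "c": [(1, 0), (0, 0), (1, 0), (1, 0)],
    "d": [(1, 1), (1, 0), (1, 0), (1, 0)],
    "e": [(0, 1), (0, 0), (0, 1), (0, 0)],
    "f": [(1, 0), (0, 1), (1, 0), (0, 0)],
}


def benefit(per_block):
    """Sum of use(x, B) + 2 * live(x, B) over the blocks of the loop."""
    return sum(use + 2 * live for use, live in per_block)


benefits = {name: benefit(rows) for name, rows in USE_LIVE.items()}
# Highest benefit first; ties are broken alphabetically.
ranked = sorted(benefits, key=lambda name: (-benefits[name], name))
print(benefits)
print("candidates for three registers:", ranked[:3])
```

With alphabetical tie-breaking the three registers go to b, d, and a. Choosing e or f instead of a would be equally good, since all three have benefit 4.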

DIRECTED ACYCLIC GRAPH

A Directed Acyclic Graph (DAG) is a tool that depicts the structure of a basic block, helps to see how
values flow among the statements of the block, and supports optimization. A DAG makes transformations on
basic blocks easy to carry out. A DAG for a basic block is built as follows:

Leaf nodes represent identifiers, names or constants.
Interior nodes represent operators.
Interior nodes also represent the results of expressions or the identifiers/names where the values are
to be stored or assigned.
Input: A basic block
Output: A DAG for the basic block containing the following information:
1. A label for each node. For leaves, the label is an identifier. For interior nodes, an operator symbol.
2. For each node a list of attached identifiers to hold the computed values.
Case (i) x := y OP z
Case (ii) x := OP y
Case (iii) x := y
Method:
Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this leaf.
In case (i), if node(z) is undefined, create a leaf labeled z and let node(z) be this leaf.
Step 2: For case (i), determine whether there is a node labeled OP whose left child is node(y) and whose
right child is node(z) (this is the check for a common subexpression). If not, create such a node. Let n be
the node found or created.
For case (ii), determine whether there is a node labeled OP whose only child is node(y). If not, create such
a node. Let n be the node found or created.
For case (iii), let n be node(y).
Step 3: Delete x from the list of attached identifiers for node(x). Append x to the list of attached
identifiers for the node n found in step 2 and set node(x) to n.
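
The method can be sketched with a dictionary that maps (operator, children) to an existing node, which is what performs the common-subexpression check. The class name, method names, and node layout are assumptions for illustration; constants are treated as leaves labeled with their text, and an array reference such as a[T1] could be modeled as a binary operator "[]".

```python
class BlockDAG:
    """DAG for a basic block, built one three-address statement at a time."""

    def __init__(self):
        self.nodes = []      # each node: {"label", "children", "ids"}
        self.node_of = {}    # name -> index of the node holding its current value
        self.interior = {}   # (op, child indexes) -> index of an existing node

    def _node(self, name):
        """Step 1: create a leaf for a name that has no node yet."""
        if name not in self.node_of:
            self.nodes.append({"label": name, "children": (), "ids": []})
            self.node_of[name] = len(self.nodes) - 1
        return self.node_of[name]

    def assign(self, x, y, op=None, z=None):
        """Handle x := y OP z, x := OP y (z is None), or x := y (op is None)."""
        if op is None:
            n = self._node(y)                               # case (iii)
        else:
            children = (self._node(y),) if z is None else (
                self._node(y), self._node(z))
            key = (op, children)
            if key not in self.interior:                    # step 2: reuse or create
                self.nodes.append({"label": op, "children": children, "ids": []})
                self.interior[key] = len(self.nodes) - 1
            n = self.interior[key]
        # Step 3: move x from its old node to n.
        if x in self.node_of:
            old_ids = self.nodes[self.node_of[x]]["ids"]
            if x in old_ids:
                old_ids.remove(x)
        self.nodes[n]["ids"].append(x)
        self.node_of[x] = n
        return n


dag = BlockDAG()
first = dag.assign("T1", "4", "*", "I0")    # T1 := 4 * I0
second = dag.assign("T3", "4", "*", "I0")   # T3 := 4 * I0, the same expression
print(first == second, dag.nodes[first]["ids"])
```

Because T1 and T3 compute the same expression over the same children, both identifiers end up attached to one interior node, and only three nodes exist: the leaves 4 and I0 and the node for *.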
Problem 1: Construct a DAG for the following code
t0 = a + b
t1 = t0 + c
d = t0 + t1

Problem 2: Construct a DAG for the following code


T1 := 4*I0
T2 := a[T1]
T3 := 4*I0
T4 := b[T3]
T5 := T2 * T4
T6 := prod + T5
prod:= T6
T7 := I0 + 1
I0 := T7
if I0 <= 20 goto 1

Application of Directed Acyclic Graph:
Directed acyclic graph determines the subexpressions that are commonly used.
Directed acyclic graph determines the names used within the block as well as the names computed
outside the block.
Directed acyclic graph determines which statements in the block compute values that may be used outside the block.
Code can be represented by a Directed acyclic graph that describes the inputs and outputs of each of
the arithmetic operations performed within the code; this representation allows the compiler to
perform common subexpression elimination efficiently.
Several programming languages describe value systems that are linked together by a directed acyclic
graph. When one value changes, its successors are recalculated; each value in the DAG is evaluated
as a function of its predecessors.
