Code Generation

The code generator is the final phase of a compiler that produces target code from an intermediate representation (IR) and symbol table. It performs key tasks like instruction selection, register allocation, and instruction ordering to generate semantically equivalent target programs. The design of the code generator involves issues like the input and target formats, memory management, choosing an evaluation order, and different code generation approaches. It aims to produce correct code while balancing implementation complexity, testability, and maintainability.

Uploaded by

Candy Angel

Code Generation

Introduction

• The final phase of a compiler is the code generator
• It receives an intermediate representation (IR) together with
supplementary information in the symbol table
• It produces a semantically equivalent target program
• Main tasks of the code generator:
– Instruction selection
– Register allocation and assignment
– Instruction ordering

[Figure: Front end → Code optimizer → Code generator]
Issues in the Design of Code Generator
• Input to the code generator
• Target Programs
• Memory Management
• Instruction Selection
• Register allocation
• Choice of evaluation order
• Approaches to code generation
Issues in the Design of Code Generator
• The most important criterion is that it produces correct
code
• Input to the code generator
– IR + symbol table
– We assume the front end produces a low-level IR, i.e. the values of names
in it can be directly manipulated by machine instructions.
– Syntactic and semantic errors have already been detected
• The target program
– Common target architectures are RISC, CISC, and stack-based
machines
– In this chapter we use a very simple RISC-like computer with
the addition of some CISC-like addressing modes
Issues in the Design of Code Generator
• Memory Management: Mapping names in the
source program to addresses of data objects in
run-time memory is done by the code
generator.
• Instruction Selection: The nature of the
instruction set of the target machine determines
the difficulty of instruction selection. The
quality of the generated code is determined by
its speed and size.
Issues in the Design of Code Generator
• Register Allocation: Instructions involving register
operands are usually shorter and faster than those
involving operands in memory.
• Evaluation order: The order in which computations
are performed can affect the efficiency of the target
code. Some computation orders require fewer
registers to hold intermediate results than others.
• Approaches to code generation: Given the premium
on correctness, designing a code generator so that it can
be easily implemented, tested, and maintained is an
important design goal.
Complexity of mapping
The complexity of mapping the IR program into target code depends on:
• the level of the IR
• the nature of the instruction-set architecture
• the desired quality of the generated code
x = y + z

LD R0, y
ADD R0, R0, z
ST x, R0

a = b + c
d = a + e

LD R0, b
ADD R0, R0, c
ST a, R0
LD R0, a
ADD R0, R0, e
ST d, R0
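
The statement-by-statement translation shown in these two examples can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name naive_codegen and the OPCODES table are hypothetical, every statement has the form "x = y op z", and R0 is always used as the working register. Applied to the second example, it reproduces the reload of a (LD R0, a) immediately after a was stored, which is the kind of redundancy better instruction selection tries to avoid.

```python
# Hypothetical opcode table for the simple target machine used in the slides.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def naive_codegen(statements):
    """Translate each 'x = y op z' statement independently into LD/OP/ST."""
    code = []
    for stmt in statements:
        target, expr = (part.strip() for part in stmt.split("="))
        left, op, right = expr.split()
        code.append(f"LD R0, {left}")                   # load first operand
        code.append(f"{OPCODES[op]} R0, R0, {right}")   # combine with second operand
        code.append(f"ST {target}, R0")                 # store the result
    return code

for line in naive_codegen(["a = b + c", "d = a + e"]):
    print(line)
```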
Register allocation

• Two subproblems
– Register allocation: selecting the set of variables that will reside in
registers at each point in the program
– Register assignment: selecting the specific register that a variable will reside in
• Complications imposed by the hardware architecture
– Example: register pairs for multiplication and division

Example (a):
t = a + b
t = t * c
t = t / d

L R1, a
A R1, b
M R0, c
D R0, d
ST R1, t

Example (b):
t = a + b
t = t + c
t = t / d

L R0, a
A R0, b
A R0, c
SRDA R0, 32
D R0, d
ST R1, t
A simple target machine model
• Load operations: LD r,x and LD r1, r2
• Store operations: ST x,r
• Computation operations: OP dst, src1, src2
• Unconditional jumps: BR L
• Conditional jumps: Bcond r, L like BLTZ r, L
Addressing Modes
• variable name: x
• indexed address: a(r) like LD R1, a(R2) means
R1=contents(a+contents(R2))
• integer indexed by a register : like LD R1, 100(R2)
• Indirect addressing mode: *r and *100(r)
• immediate constant addressing mode: like LD R1,
#100
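
To make the operand forms above concrete, here is a small Python sketch of the value that LD r, operand would load. It is an illustration only: the function load_value, the dictionaries regs, mem and symbols, and the numeric addresses are all assumptions made for this example, and the indirect forms are modelled as one extra memory lookup on top of the non-indirect form.

```python
import re

def load_value(operand, regs, mem, symbols):
    """Value loaded by `LD r, operand` (hypothetical model of the modes above)."""
    if operand.startswith("#"):                    # immediate constant: #100
        return int(operand[1:])
    if operand.startswith("*"):                    # indirect: one extra memory lookup
        return mem[load_value(operand[1:], regs, mem, symbols)]
    if operand in regs:                            # register operand: LD r1, r2
        return regs[operand]
    m = re.fullmatch(r"(\w+)\((\w+)\)", operand)   # indexed: a(r) or 100(r)
    if m:
        base, reg = m.groups()
        offset = int(base) if base.isdigit() else symbols[base]
        return mem[offset + regs[reg]]
    return mem[symbols[operand]]                   # variable name: x

# Assumed machine state for the illustration.
regs = {"R2": 8}
symbols = {"a": 100, "x": 200}          # addresses assigned to names
mem = {108: 7, 200: 42, 7: 99, 8: 5}    # sparse memory contents

print(load_value("a(R2)", regs, mem, symbols))     # contents(a + contents(R2))
print(load_value("*100(R2)", regs, mem, symbols))  # contents(contents(100 + contents(R2)))
```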
Mode               Form    Address                    Added Cost
Absolute           M       M                          1
Register           R       R                          0
Indexed            C(R)    C + contents(R)            1
Indirect register  *R      contents(R)                0
Indirect indexed   *C(R)   contents(C + contents(R))  1
Literal            #C      (source to be a constant)  1

Address Modes, Forms and Associated Costs


b = a[i]

LD R1, i          // R1 = i
MUL R1, R1, 8     // R1 = R1 * 8
LD R2, a(R1)      // R2 = contents(a + contents(R1))
ST b, R2          // b = R2

a[j] = c

LD R1, c          // R1 = c
LD R2, j          // R2 = j
MUL R2, R2, 8     // R2 = R2 * 8
ST a(R2), R1      // contents(a + contents(R2)) = R1

x = *p

LD R1, p          // R1 = p
LD R2, 0(R1)      // R2 = contents(0 + contents(R1))
ST x, R2          // x = R2
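Conditional-jump three-address instruction

if x < y goto L

LD R1, x          // R1 = x
LD R2, y          // R2 = y
SUB R1, R1, R2    // R1 = R1 - R2
BLTZ R1, M        // if R1 < 0 jump to M

Here M is the label of the first machine instruction generated from the
three-address instruction labeled L.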
costs associated with the addressing modes

• LD R0, R1 cost = 1
• LD R0, M cost = 2
• LD R1, *100(R2) cost = 3
Addresses in the Target Code

• Code: a statically determined area
• Static: a statically determined data area
• Heap: a dynamically managed area
• Stack: a dynamically managed area

Static Allocation
three-address statements for procedure calls
and returns
• call callee
• Return
• Halt
• action
Target program for a sample call and return
Stack Allocation

Branch to called procedure

Return to caller
in Callee: BR *0(SP)
in caller: SUB SP, SP, #caller.recordsize
Target code for stack allocation
A Simple Code Generator
• Here we consider an algorithm that
generates code for a single basic block. It
considers each three-address instruction in
turn and keeps track of which values are in which
registers, so that it can avoid generating
unnecessary loads and stores.
Principal uses of registers
• In most machine architectures, some or all of the
operands of an operation must be in registers in order
to perform the operation.
• Registers make good temporaries - places to hold the
result of a subexpression while a larger expression is
being evaluated, or more generally, a place to hold a
variable that is used only within a single basic block.
• Registers are used to hold values that are computed in
one basic block and used in other blocks.
• Registers are often used to help with run-time storage
management, for example, to manage the run-time
stack, including the maintenance of stack pointers and
possibly the top elements of the stack itself.
Generic Code Generation Algorithm
The code generation algorithm takes a sequence of
three-address statements as input. For each statement of the form x = y op z, it
performs the following actions:
1. Invoke a function getreg to determine the
location L where the result of the computation y
op z should be stored. L will usually be a register.
2. Consult the address descriptor for y to determine
y’, the current location of y. If the value of y is not
already in L, generate the instruction MOV y’, L to
place a copy of y in L.
3. Generate the instruction op z’, L, where z’ is a
current location of z.
4. Update the address descriptor of x to indicate
that x is in location L. If L is a register, update its
descriptor to indicate that it contains the value
of x, and remove x from all other register
descriptors.
5. If the current values of y and/or z have no next
uses and are in registers, alter the register
descriptors to indicate that those registers no
longer contain y and/or z, respectively.
Register and Address Descriptors
The code generation algorithm uses descriptors to keep track of
register contents and the addresses of names.
1. Register descriptors: A register descriptor is a pointer to a list
containing information about the current contents of each register.
Initially all the registers are empty.

2. Address descriptors: An address descriptor keeps track of the
location where the current value of a name can be found at run
time. This information can be stored in the symbol table.
Function getreg()
When called to return a location where the computation
specified by the three-address statement
x = y op z should be performed, the function getreg() returns a
location L as follows:
1. If the name y is in a register that holds the value of no other
name, and y is not live and has no next use after execution of
x = y op z, then return the register of y for L. Update the
address descriptor of y to indicate that y is no longer in L.
2. Otherwise, return an empty register for L if there is one.
3. If there is no empty register, and x has a next use in the
block or op is an operator such as indexing that requires a
register, then find a suitable occupied register, empty it by
storing its value in the proper memory location M, update
the address descriptor, and return that register for L.
4. If x is not used in the block, or no suitable occupied register
can be found, select the memory location of x as L.
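
The generic algorithm and getreg() can be sketched together in Python as follows. This is a simplified illustration, not a full implementation: the names generate, where, best and has_next_use are hypothetical, statements are given as (x, y, op, z) tuples, rule 3 simply spills the first register that does not hold z, and rule 4 (computing directly into memory) is left out. Names that have not been assigned in the block are assumed to live in memory under their own name.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(block, live_out, num_regs=2):
    """Emit two-address code (OP src, dst) for one basic block."""
    regs = {f"R{i}": set() for i in range(num_regs)}   # register descriptors
    addr = {}                                          # address descriptors
    code = []

    def where(name):
        # Locations holding `name`; unseen names are assumed to be in memory.
        return addr.setdefault(name, {name})

    def best(name):
        # Prefer a register copy over the memory copy.
        for loc in sorted(where(name)):
            if loc in regs:
                return loc
        return name

    def has_next_use(name, i):
        for x, y, _, z in block[i + 1:]:
            if name in (y, z):
                return True
            if name == x:
                return False
        return name in live_out

    def getreg(y, z, i):
        y_reg = best(y)
        if y_reg in regs and regs[y_reg] == {y} and not has_next_use(y, i):
            return y_reg                               # rule 1: reuse y's register
        for r, held in regs.items():
            if not held:
                return r                               # rule 2: empty register
        # Rule 3 (simplified): spill a register that does not hold z.
        victim = next((r for r in regs if z not in regs[r]), "R0")
        for name in regs[victim]:
            if name not in where(name):
                code.append(f"MOV {victim}, {name}")
                where(name).add(name)
            where(name).discard(victim)
        regs[victim] = set()
        return victim

    for i, (x, y, op, z) in enumerate(block):
        L = getreg(y, z, i)
        src = best(y)
        if src != L:
            code.append(f"MOV {src}, {L}")             # step 2
        code.append(f"{OPCODES[op]} {best(z)}, {L}")   # step 3
        for name in regs[L]:                           # step 4: L now holds only x
            where(name).discard(L)
        for r in regs:
            regs[r].discard(x)
        regs[L] = {x}
        addr[x] = {L}
        for name in (y, z):                            # step 5: free dead operands
            if name != x and not has_next_use(name, i):
                for r in regs:
                    if name in regs[r]:
                        regs[r].discard(name)
                        where(name).discard(r)

    for name in sorted(live_out):                      # store live names at block end
        if name not in where(name):
            code.append(f"MOV {best(name)}, {name}")
            where(name).add(name)
    return code

block = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
         ("v", "t", "+", "u"), ("d", "v", "+", "u")]
for line in generate(block, live_out={"d"}):
    print(line)
```

With two registers and d live on exit, the sketch emits MOV a, R0; SUB b, R0; MOV a, R1; SUB c, R1; ADD R1, R0; ADD R1, R0; MOV R0, d.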
Example: The assignment d = (a-b) + (a-c) + (a-c)
might be translated into the following three-address
code sequence:
t = a - b
u = a - c
v = t + u
d = v + u
with d live at the end.
The code sequence generated for this example is as
follows.
Code sequence for d = (a-b) + (a-c) + (a-c)

Statement   Code Generated   Register Descriptor   Address Descriptor
t = a - b   MOV a, R0        R0 contains t         t in R0
            SUB b, R0
u = a - c   MOV a, R1        R0 contains t         t in R0
            SUB c, R1        R1 contains u         u in R1
v = t + u   ADD R1, R0       R0 contains v         u in R1
                             R1 contains u         v in R0
d = v + u   ADD R1, R0       R0 contains d         d in R0
            MOV R0, d                              d in R0 and memory

Cost of code sequence is 12
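
The total of 12 can be checked with a short Python sketch that charges one unit per instruction plus the added cost of each operand, following the addressing-mode cost table given earlier. The helper names operand_cost and instruction_cost are hypothetical, and the operand patterns are only as detailed as this example needs.

```python
import re

def operand_cost(operand):
    """Added cost of one operand, following the addressing-mode table."""
    if re.fullmatch(r"R\d+", operand):      # register mode: R
        return 0
    if re.fullmatch(r"\*R\d+", operand):    # indirect register mode: *R
        return 0
    return 1                                # absolute, indexed, indirect indexed, literal

def instruction_cost(instruction):
    """One unit for the instruction plus the added cost of each operand."""
    _, operands = instruction.split(None, 1)
    return 1 + sum(operand_cost(o.strip()) for o in operands.split(","))

sequence = ["MOV a, R0", "SUB b, R0", "MOV a, R1", "SUB c, R1",
            "ADD R1, R0", "ADD R1, R0", "MOV R0, d"]
print(sum(instruction_cost(i) for i in sequence))
```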


Register Allocation and Assignment

• Global Register Allocation


• Usage Counts
• Register Assignment for Outer Loops
• Register Allocation by Graph Coloring
Global register allocation
• The algorithm explained previously does local (block-based)
register allocation
• As a result, all live variables must be stored at the end
of each block
• To save some of these stores and their corresponding
loads, we might assign registers to
frequently used variables and keep these registers
consistent across block boundaries (globally)
• Some options are:
– Keep the values of variables used in loops in registers
– Use a graph-coloring approach for more global allocation
Usage counts
• For a loop L, we can approximate the saving obtained
by allocating a register to a variable x as follows:
– Sum over all blocks B in the loop L
– For each use of x in B before any definition of x in the
block, add one unit of saving
– If x is live on exit from B and is assigned a value in
B, add 2 units of saving
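
The rule above can be written as a small Python function. The block representation (a dictionary with "stmts" as (target, operands) pairs and a "live_out" set) and the two-block loop are hypothetical, chosen only to exercise both parts of the rule.

```python
def usage_count(var, loop_blocks):
    """Approximate saving from keeping `var` in a register across the loop."""
    total = 0
    for block in loop_blocks:
        defined = False
        for target, operands in block["stmts"]:
            if not defined:
                total += operands.count(var)   # uses before any definition in B
            if target == var:
                defined = True
        if defined and var in block["live_out"]:
            total += 2                         # assigned in B and live on exit
    return total

# Hypothetical loop with two blocks: B1: a = b + c, B2: b = a + a.
loop = [
    {"stmts": [("a", ("b", "c"))], "live_out": {"a", "b"}},
    {"stmts": [("b", ("a", "a"))], "live_out": {"b"}},
]
for name in "abc":
    print(name, usage_count(name, loop))
```

Variables with the highest counts are the natural candidates for the registers reserved across the loop.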
Flow graph of an inner loop
Code sequence using global register
assignment
Register allocation by Graph coloring
• Two passes are used
– Target-machine instructions are selected as though
there were an infinite number of symbolic registers
– Physical registers are assigned to the symbolic ones
• Create a register-interference graph
• Nodes are symbolic registers, and an edge connects two nodes
if one is live at a point where the other is defined.
• For example, in the previous example an edge connects a
and d in the graph
• Use a graph-coloring algorithm to assign registers.
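
The slides do not fix a particular coloring algorithm. One common heuristic, sketched below in Python as an assumption rather than as the method intended here, repeatedly removes a node with fewer than k neighbors, then colors the nodes in reverse removal order. The function name color and the sample interference graph are hypothetical; a return value of None stands for the case where some symbolic register would have to be spilled.

```python
def color(graph, k):
    """Try to color an interference graph with k registers; None if stuck."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # Pick a node with fewer than k remaining neighbors.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            return None                        # a spill would be needed here
        stack.append(node)
        for m in work.pop(node):
            work[m].discard(node)
    assignment = {}
    for node in reversed(stack):
        used = {assignment[m] for m in graph[node] if m in assignment}
        assignment[node] = next(r for r in range(k) if r not in used)
    return assignment

# Hypothetical symmetric interference graph over symbolic registers s1..s4.
graph = {
    "s1": {"s2", "s3", "s4"},
    "s2": {"s1", "s3"},
    "s3": {"s1", "s2"},
    "s4": {"s1"},
}
print(color(graph, 3))
print(color(graph, 2))
```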
Intermediate-code tree for a[i]=b+1
Tree-rewriting rules
Syntax-directed translation scheme
An instruction set for tree matching
Ershov Numbers
• Label any leaf 1.
• The label of an interior node with one child is
the label of its child.
• The label of an interior node with two children
is
– The larger of the labels of its children, if those
labels are different.
– One plus the label of its children if the labels are
the same.
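
The labeling rules translate directly into a recursive function. In this Python sketch the tree encoding is an assumption: a leaf is a string, a unary node is a pair (op, child), and a binary node is a triple (op, left, right).

```python
def ershov(node):
    """Ershov number of an expression tree node."""
    if isinstance(node, str):          # leaf
        return 1
    if len(node) == 2:                 # interior node with one child
        return ershov(node[1])
    _, left, right = node              # interior node with two children
    l, r = ershov(left), ershov(right)
    return max(l, r) if l != r else l + 1

# (a - b) + e * (c + d)
tree = ("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))
print(ershov(tree))
```

For this tree both children of the root have label 2, so the root gets label 3.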
A tree labeled with Ershov numbers
Generating code from a labeled expression tree
• In every case, code for a node with label k generated with base b uses
registers R_b, ..., R_{b+k-1} and leaves its result in R_{b+k-1}.
• To generate machine code for an interior node with label k and two children
with equal labels (which must be k-1), do the following:
– Recursively generate code for the right child, using base b+1. The result of the
right child appears in register R_{b+k-1}.
– Recursively generate code for the left child, using base b; the result appears in
R_{b+k-2}.
– Generate the instruction OP R_{b+k-1}, R_{b+k-2}, R_{b+k-1}, where OP is the appropriate
operation for the interior node in question.
• Suppose we have an interior node with label k and children with unequal
labels. Then one of the children, which we'll call the "big" child, has label k,
and the other child, the "little" child, has some label m < k. Do the following to
generate code for this interior node, using base b:
– Recursively generate code for the big child, using base b; the result appears in
register R_{b+k-1}.
– Recursively generate code for the little child, using base b; the result appears
in register R_{b+m-1}. Note that since m < k, neither R_{b+k-1} nor any higher-numbered
register is used.
– Generate the instruction OP R_{b+k-1}, R_{b+m-1}, R_{b+k-1} or the instruction
OP R_{b+k-1}, R_{b+k-1}, R_{b+m-1}, depending on whether the big child is the right or left child,
respectively.
• For a leaf representing operand x, if the base is b, generate the instruction LD
R_b, x.
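
The procedure can be sketched in Python as below. The sketch assumes binary trees encoded as (op, left, right) triples with string leaves, and it keeps the invariant that a node with label k generated with base b leaves its result in register b+k-1. The names label, gen and OPS are hypothetical.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def label(node):
    """Ershov number for a binary expression tree."""
    if isinstance(node, str):
        return 1
    _, left, right = node
    l, r = label(left), label(right)
    return max(l, r) if l != r else l + 1

def gen(node, base, code):
    """Emit code for `node`; return the number of the register holding it."""
    if isinstance(node, str):
        code.append(f"LD R{base}, {node}")
        return base
    op, left, right = node
    kl, kr = label(left), label(right)
    if kl == kr:
        rr = gen(right, base + 1, code)        # right child first, base b+1
        rl = gen(left, base, code)             # then left child, base b
        code.append(f"{OPS[op]} R{rr}, R{rl}, R{rr}")
        return rr
    big_is_right = kr > kl
    big, little = (right, left) if big_is_right else (left, right)
    rb = gen(big, base, code)                  # big child first
    rs = gen(little, base, code)               # little child reuses lower registers
    if big_is_right:
        code.append(f"{OPS[op]} R{rb}, R{rs}, R{rb}")
    else:
        code.append(f"{OPS[op]} R{rb}, R{rb}, R{rs}")
    return rb

# (a - b) + e * (c + d), generated with base 1
tree = ("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))
code = []
result_reg = gen(tree, 1, code)
print("\n".join(code))
```

The root has label 3, so with base 1 the result ends up in R3 and only R1, R2 and R3 are used.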
Optimal three-register code
Evaluating Expressions with an
Insufficient Supply of Registers
For an interior node N with label k > r, where r is the number of available registers:
• Node N has at least one child with label r or greater. Pick the larger child
(or either if their labels are the same) to be the "big" child and let the
other child be the "little" child.
• Recursively generate code for the big child, using base b = 1. The result of
this evaluation will appear in register R_r.
• Generate the machine instruction ST t_k, R_r, where t_k is a temporary
variable used to hold temporary results that help evaluate nodes with label
k.
• Generate code for the little child as follows. If the little child has label r or
greater, pick base b = 1. If the label of the little child is j < r, then pick
b = r-j+1, so that its result lands in R_r.
Then recursively apply this algorithm to the little child; the result appears
in R_r.
• Generate the instruction LD R_{r-1}, t_k.
• If the big child is the right child of N, then generate the instruction OP R_r,
R_r, R_{r-1}. If the big child is the left child, generate OP R_r, R_{r-1}, R_r.
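
A Python sketch of this spilling scheme follows. It assumes binary trees encoded as (op, left, right) triples with string leaves, at least two registers, and a base chosen so that every subtree with label at least r leaves its result in register r; nodes with label at most r fall back to the ordinary labeled-tree procedure. The names label, gen and OPS, and the temporaries t3 and so on, are hypothetical.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def label(node):
    if isinstance(node, str):
        return 1
    _, left, right = node
    l, r = label(left), label(right)
    return max(l, r) if l != r else l + 1

def gen(node, base, r, code):
    """Emit code using registers R1..Rr; return the result register number."""
    if isinstance(node, str):
        code.append(f"LD R{base}, {node}")
        return base
    op, left, right = node
    k, kl, kr = label(node), label(left), label(right)
    if k > r:                                   # not enough registers: spill
        big_is_right = kr >= kl
        big, little = (right, left) if big_is_right else (left, right)
        gen(big, 1, r, code)                    # big child ends up in R_r
        code.append(f"ST t{k}, R{r}")           # save it in temporary t_k
        j = label(little)
        gen(little, 1 if j >= r else r - j + 1, r, code)   # little child in R_r
        code.append(f"LD R{r - 1}, t{k}")       # reload the big child's value
        if big_is_right:
            code.append(f"{OPS[op]} R{r}, R{r}, R{r - 1}")
        else:
            code.append(f"{OPS[op]} R{r}, R{r - 1}, R{r}")
        return r
    if kl == kr:                                # enough registers, equal labels
        rr = gen(right, base + 1, r, code)
        rl = gen(left, base, r, code)
        code.append(f"{OPS[op]} R{rr}, R{rl}, R{rr}")
        return rr
    big_is_right = kr > kl                      # enough registers, unequal labels
    big, little = (right, left) if big_is_right else (left, right)
    rb = gen(big, base, r, code)
    rs = gen(little, base, r, code)
    if big_is_right:
        code.append(f"{OPS[op]} R{rb}, R{rs}, R{rb}")
    else:
        code.append(f"{OPS[op]} R{rb}, R{rb}, R{rs}")
    return rb

# (a - b) + e * (c + d) with only two registers
tree = ("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))
code = []
gen(tree, 1, 2, code)
print("\n".join(code))
```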
Optimal three-register code using only
two registers
Dynamic Programming Algorithm

• Compute bottom-up for each node n of the expression tree T an
array C of costs, in which the ith component C[i] is the optimal cost
of computing the subtree S rooted at n into a register, assuming i
registers are available for the computation, for 1 ≤ i ≤ r.
• Traverse T, using the cost vectors to determine which subtrees of T
must be computed into memory.
• Traverse each tree using the cost vectors and associated
instructions to generate the final target code. The code for the
subtrees computed into memory locations is generated first.
Syntax tree for (a-b)+c*(d/e) with
cost vector at each node
minimum cost of evaluating the root
with two registers available
• Compute the left subtree with two registers available into
register R0, compute the right subtree with one register
available into register R1, and use the instruction ADD R0, R0,
R1 to compute the root. This sequence has cost 2+5+1=8.
• Compute the right subtree with two registers available into
R1, compute the left subtree with one register available into
R0, and use the instruction ADD R0, R0, R1. This sequence has
cost 4+2+1=7.
• Compute the right subtree into memory location M, compute
the left subtree with two registers available into register R0,
and use the instruction ADD R0, R0, M. This sequence has cost
5+2+1=8.
• The second choice is the cheapest, so the minimum cost of
evaluating the root with two registers available is 7.
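
The three choices above can be folded into a bottom-up computation of the cost vectors. The Python sketch below rests on assumptions not stated on the slide: every instruction costs one unit, a leaf already in memory costs nothing, the second operand of an operation may be a register or a memory location, and the extra entry C[0] holds the cost of computing a subtree into memory (needed for the third choice). The function name cost_vector and the tuple encoding of the tree are hypothetical.

```python
def cost_vector(node, r):
    """Return [C[0], C[1], ..., C[r]] for the subtree rooted at `node`."""
    if isinstance(node, str):
        return [0] + [1] * r                   # in memory already; one LD otherwise
    _, left, right = node
    cl, cr = cost_vector(left, r), cost_vector(right, r)
    c = [0] * (r + 1)
    for i in range(1, r + 1):
        options = [cr[0] + cl[i] + 1]          # right operand taken from memory
        if i >= 2:
            options.append(cl[i] + cr[i - 1] + 1)   # left first, then right
            options.append(cr[i] + cl[i - 1] + 1)   # right first, then left
        c[i] = min(options)
    c[0] = c[r] + 1                            # compute into a register, then store
    return c

# (a - b) + c * (d / e) with two registers
tree = ("+", ("-", "a", "b"), ("*", "c", ("/", "d", "e")))
print(cost_vector(tree, 2))
```

Under these assumptions the root's vector is [8, 8, 7], matching the minimum of 7 obtained from the second choice.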
