CSM 562 Compiler Construction Assignment Kaunda PG8589621, M Phil Computer Science, KNUST
Assignment 1:
Assembler
The individual instructions (like mov) directly correspond to specific opcodes in the
instruction set of the target CPU.
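As a minimal illustration (my own hypothetical Java sketch, not part of the assignment; the opcode values are loosely inspired by x86 but are only illustrative), an assembler can translate mnemonics into opcodes with a simple lookup table:

import java.util.Map;

public class OpcodeTable {
    // Hypothetical one-byte opcodes for a toy target CPU.
    private static final Map<String, Integer> OPCODES = Map.of(
        "mov", 0xB0,
        "add", 0x04,
        "jmp", 0xEB
    );

    // Translate a mnemonic into its opcode, or fail with an error.
    public static int encode(String mnemonic) {
        Integer opcode = OPCODES.get(mnemonic);
        if (opcode == null) {
            throw new IllegalArgumentException("unknown mnemonic: " + mnemonic);
        }
        return opcode;
    }

    public static void main(String[] args) {
        System.out.printf("mov -> 0x%02X%n", encode("mov"));
    }
}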
Linkers and loaders
The basic job of any linker or loader is simple: it binds more abstract names to more concrete names, which permits programmers to write code using the more abstract names. That is, it takes a name written by a programmer such as getline and binds it to "the location 612 bytes from the beginning of the executable code in module iosys." Or it may take a more abstract numeric address such as "the location 450 bytes beyond the beginning of the static data for this module" and bind it to a numeric address.
Linkers and loaders perform several related but conceptually separate actions: program loading, relocation, and symbol resolution. Symbol resolution, for example, means noting the location assigned to a routine such as sqrt in the library, and patching the caller's object code so the call instruction refers to that location.
Although there’s considerable overlap between linking and loading, it’s reasonable to
define a program that does program loading as a loader, and one that does symbol
resolution as a linker. Either can do relocation, and there have been all-in-one linking
loaders that do all three functions.
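As a minimal sketch of the patching step (my own hypothetical Java illustration, not a real object-file format): a placeholder address in the caller's code is overwritten with the resolved location of the callee.

import java.util.Map;

public class Relocator {
    // A hypothetical "object file" fixup: an offset in the code that must be
    // patched with a symbol's final address.
    record Fixup(int offset, String symbol) {}

    // Patch 4-byte little-endian addresses into the code at each fixup site.
    static void relocate(byte[] code, Iterable<Fixup> fixups, Map<String, Integer> symbolTable) {
        for (Fixup f : fixups) {
            Integer address = symbolTable.get(f.symbol());
            if (address == null) {
                throw new IllegalStateException("undefined symbol: " + f.symbol());
            }
            for (int i = 0; i < 4; i++) {
                code[f.offset() + i] = (byte) (address >> (8 * i));
            }
        }
    }

    public static void main(String[] args) {
        byte[] code = new byte[8];                     // caller's code with a placeholder address
        var fixups = java.util.List.of(new Fixup(2, "sqrt"));
        relocate(code, fixups, Map.of("sqrt", 612));   // "612 bytes from the beginning of the module"
        System.out.println(code[2] + " " + code[3]);   // low bytes of the patched address
    }
}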
Assignment 3
What is a grammar?
A grammar is the set of rules that govern how we determine whether a given sentence is part of the language or not.
For example, you can imagine there is a rule in Java's Grammar called "the equality condition rule". This rule says, roughly, what a valid equality condition looks like. Then, there must be another rule that says what an expression is. And so on for each logical piece of the language (expressions, variables, keywords, etc.).
Looking deeper, you can expect that the rules of a grammar start by defining very high-level types of sentences, and then descend to lower and lower levels. Let me use the previous example again to clarify this concept.
We know there is a rule about the equality condition. It says an equality statement must be an expression, followed by ==, followed by another expression. We can write it as
eq_statement : expression '==' expression
However, the rule is not complete in itself: the Grammar must also specify what
an expression is.
Let’s assume an expression can be a variable, or a string, or an integer. Of course,
real programming languages have a more complex definition of expression, but let’s
continue with the simple one. So we would have
expression : variable | int | string
You can see that the character | is used to express alternatives (a logical OR). The Grammar is not done yet, because it also has to clarify what a variable is.
So, let’s assume a variable is a sequence of characters in a-z, A-Z and digits in 0-9,
without length limitations in the number of characters. You would express this with
a regular expression rule, which is
variable : [a-zA-Z0-9]*
The * symbol after the square brackets means you can repeat each of the symbols
zero or more times.
Again, real programming languages have more complex definitions for variables, but
I am hoping this simple example clarifies the concept.
There are still two pieces of a previous rule that haven’t been specified
yet: int and string.
Let’s assume that an int is any non-empty sequence of digits, and string is any
sequence of characters in a-z, A-Z. Of course in real applications you would have
many more characters (for instance, ?!+-, etc.), but let’s keep it simple for the
moment. So, you are going to have two more rules:
int : [0-9]+
string : [a-zA-Z]*
At this point we have gone as deep as we can in the definition of an equality condition. Why? Because these rules cannot be expanded any further. We say that single characters such as a-z, A-Z and the digits 0-9 are the terminal symbols or, more formally, the characters of the Alphabet of the Grammar.
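Putting the rules above together (a sketch using the simplified definitions from this section), the whole grammar and a sample derivation of count == 10 look like this:

eq_statement : expression '==' expression
expression   : variable | int | string
variable     : [a-zA-Z0-9]*
int          : [0-9]+
string       : [a-zA-Z]*

For the input count == 10, the derivation proceeds
eq_statement -> expression '==' expression -> variable '==' int -> count '==' 10
and stops there, because variable, int and the digits and letters they match are terminal symbols.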
Assignment 4
Assignment 5 (Midsem)
Question 2 (b)
Question 2 (a)
    case 'a':
        return "(S " + parseW(toParse) + ")";
    default:
        System.err.println("parse error");
        return "error";
    }
}

    case 'a':
    case '$':
        return "(tail1 " + parseS(toParse) + ")";
    default:
        System.err.println("parse error");
        return "error";
    }
}
    case 'b':
        match(toParse, 'b');
        match(toParse, 'b');
        return "(tail2 b b)";
    default:
        System.err.println("parse error");
        return "error";
    }
}

    System.out.println(parseTree);
    }
}
Question 2 (c)
One advantage of applying optimizations on the AST is that it may reduce the execution time of some back-end optimization passes. However, I believe these optimizations need to be applied sparingly, because they may hinder further optimizations of the code later on.
Question 2 (b)
We want to accept only the string 0. Let s1 be the only final state, which we reach on input 0 from the start state s0. Make a "graveyard" state s2, and have all other transitions (there are five of them in all) lead there.
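Written out as a transition table (my own rendering of the machine just described):

state   on 0   on 1
s0      s1     s2
s1*     s2     s2
s2      s2     s2

Here s1 (marked *) is the only final state; every transition other than s0 going to s1 on 0 leads to the graveyard s2, which accounts for the five graveyard transitions mentioned above.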
This uses the same idea as in part (a), but we need a few more states. The graveyard state is s4. See the picture for details.
In the picture of our machine, we show a transition to the graveyard state whenever we encounter a 0. The only final state is s2, which we reach after 11 and remain at as long as the input consists just of 1's.
Question 2 (C)
Each representation you use is good for a particular thing. An AST is not very easy to
use for some types of optimizations (Data-Flow Analysis) while others, such as SSA,
are very good for that kind of task.
Some optimization passes can be applied directly over the AST, and some can be applied over machine code, but many optimization passes can be applied over neither; they need an intermediate form such as SSA.
Question 3 (a)
Improvements in flow of control summary
Algebraic simplifications
If a program uses an expression consisting of two or more constants and no variables, then that expression can be evaluated at compile time. Some arithmetic instructions can be replaced by much simpler instructions for specific operands. For example, x^2 can be implemented as x * x, and if a square-root instruction is available, then x^0.5 may be implemented as sqrt(x). Similarly, multiplication by a power of 2 may be implemented as a left shift, division by a power of 2 as a right shift, addition of 1 as an increment, subtraction of 1 as a decrement, and multiplication or division by -1 as a negation. Logic expressions may be simplified at compile time using the concepts of Boolean algebra, such as De Morgan's laws. Additionally, arithmetic instructions may be reordered, taking advantage of the commutative and associative properties of the operations, to facilitate further algebraic simplification.
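A small illustrative sketch (my own hypothetical Java helper, not from the assignment) of two of these simplifications applied to integer expressions:

public class AlgebraicSimplify {
    // Strength reduction: multiplication by a power of two becomes a left shift.
    static int timesEight(int x) {
        return x << 3;          // equivalent to x * 8
    }

    // Constant folding: an expression of constants is evaluated at compile time;
    // javac itself folds 60 * 60 * 24 into 86400 in the emitted bytecode.
    static final int SECONDS_PER_DAY = 60 * 60 * 24;

    public static void main(String[] args) {
        System.out.println(timesEight(5));        // 40
        System.out.println(SECONDS_PER_DAY);      // 86400
    }
}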
Use of machine idioms summary
1. Addressing optimizations
2. Using special instructions
Question 3 (b)
Produce a state transition diagram (finite state machine) for a recognizer for a
comment in a high-level language where a comment starts with /* and ends with */.
Suggest how this recognizer could be implemented in software.
Implementing the recognizer in Java
import java.io.*;

class anstring
{
    public static void main(String args[]) throws IOException
    {
        // Transition table for the comment recognizer.
        // Columns: 0 = '/', 1 = '*', 2 = any other character.
        // States:  0 = start, 1 = seen '/', 2 = inside the comment,
        //          3 = seen '*' inside the comment, 4 = comment closed with "*/" (accepting),
        //          5 = dead state (the input cannot be a comment).
        int q[][] = {
            {1, 5, 5},   // state 0
            {5, 2, 5},   // state 1
            {2, 3, 2},   // state 2
            {4, 3, 2},   // state 3
            {5, 5, 5},   // state 4
            {5, 5, 5}    // state 5
        };
        String st;
        BufferedReader obj = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter string:");
        st = obj.readLine();
        int s = 0;
        for (int i = 0; i < st.length(); i++)
        {
            char c = st.charAt(i);
            if (c == '/')
            {
                s = q[s][0];
            }
            else if (c == '*')
            {
                s = q[s][1];
            }
            else
            {
                s = q[s][2];
            }
        }
        if (s == 4)
        {
            System.out.println("This string is a comment");
        }
        else
        {
            System.out.println("This string is not a comment");
        }
    }
}
Question 3 (d)
(i) Assuming left-to-right associativity and precedence of "*" over "+", the
grammar is shown below.
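A standard grammar with these properties (a sketch, using id to stand for any operand) is:

E : E '+' T | T
T : T '*' F | F
F : id | '(' E ')'

The left recursion in E and T gives left-to-right (left) associativity, and placing '*' one level below '+' gives it higher precedence.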
Question 4 (a)
Design a lexer class in Java that scans statements from a source program, breaks them into tokens, and stores them into a symbol table. A lexical error message should be displayed if an invalid token is encountered during scanning.
Lexer class
import java.lang.*;
import java.io.*;
// Released under the GNU General Public License Version 2, June 1991.
public static final int // symbol codes...
word = 0,
numeral = 1,
open = 2, // (
close = 3, // )
plus = 4, // +
minus = 5, // -
times = 6, // *
over = 7, // /
eofSy = 8;
sy = numeral;
}
void skipRest()
{ if( ! eof ) System.out.print("skipping to end of input...");
int n = 0;
while( ! eof )
{ if( n%80 == 0 ) System.out.println(); // break line
System.out.print(ch); n++; getch();
}
System.out.println();
}//skipRest
}//error
}//Lexical class
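As a complementary illustration (my own minimal, self-contained sketch, not the original class above), a lexer for the same token set that stores tokens in a symbol table and reports lexical errors might look like this:

import java.util.ArrayList;
import java.util.List;

public class SimpleLexer {
    // Token kinds, mirroring the symbol codes above.
    enum Kind { WORD, NUMERAL, OPEN, CLOSE, PLUS, MINUS, TIMES, OVER, EOF }

    record Token(Kind kind, String text) {}

    private final String input;
    private int pos = 0;
    private final List<Token> symbolTable = new ArrayList<>();  // simple symbol table

    SimpleLexer(String input) { this.input = input; }

    List<Token> scan() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) { pos++; }
            else if (Character.isLetter(c)) { symbolTable.add(new Token(Kind.WORD, readWhile(Character::isLetter))); }
            else if (Character.isDigit(c)) { symbolTable.add(new Token(Kind.NUMERAL, readWhile(Character::isDigit))); }
            else {
                Kind k = switch (c) {
                    case '(' -> Kind.OPEN;
                    case ')' -> Kind.CLOSE;
                    case '+' -> Kind.PLUS;
                    case '-' -> Kind.MINUS;
                    case '*' -> Kind.TIMES;
                    case '/' -> Kind.OVER;
                    default -> null;
                };
                if (k == null) {
                    System.err.println("Lexical error: invalid token '" + c + "' at position " + pos);
                    pos++;                                   // skip the offending character
                } else {
                    symbolTable.add(new Token(k, String.valueOf(c)));
                    pos++;
                }
            }
        }
        symbolTable.add(new Token(Kind.EOF, ""));
        return symbolTable;
    }

    private String readWhile(java.util.function.IntPredicate test) {
        int start = pos;
        while (pos < input.length() && test.test(input.charAt(pos))) pos++;
        return input.substring(start, pos);
    }

    public static void main(String[] args) {
        for (Token t : new SimpleLexer("count + 12 * (x / 3) ?").scan()) {
            System.out.println(t.kind() + " \"" + t.text() + "\"");
        }
    }
}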
Question 4(b)
If you look at compilers internally, they operate in phases. Typical phases might be lexical analysis, parsing, semantic analysis, intermediate code generation, optimization, and code generation. What a specific compiler does in its phases varies from compiler to compiler. Each of these steps pushes the program representation closer to the final machine code.
An N-pass compiler (single-pass, two-pass, or multi-pass) would bundle one or
more of these steps into a single pass.
A single-pass compiler bundles all the phases into one: it emits assembly (or binary code) directly during parsing, without building an intermediate representation (IR) of the code, such as an Abstract Syntax Tree (AST).
For example, a hypothetical compiler based on a parser generator (like GNU Bison) can emit assembly directly inside the semantic actions of grammar productions, as sketched below.
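A rough Java analogue of that idea (my own sketch, not an actual Bison specification): a recursive-descent parser that prints stack-machine instructions the moment each construct is recognised, keeping no tree at all.

public class OnePassEmitter {
    private final String[] tokens;
    private int pos = 0;

    OnePassEmitter(String... tokens) { this.tokens = tokens; }

    // expr : NUMBER ('+' NUMBER)* ;  code is emitted as soon as each token is consumed.
    void expr() {
        System.out.println("PUSH " + tokens[pos++]);      // first operand
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++;                                        // consume '+'
            System.out.println("PUSH " + tokens[pos++]);  // next operand
            System.out.println("ADD");                    // emit immediately, no AST
        }
    }

    public static void main(String[] args) {
        new OnePassEmitter("1", "+", "2", "+", "3").expr();
    }
}

The limitation shows up as soon as the source contains forward references, for instance two mutually recursive class definitions (a minimal sketch; the class names are only illustrative):

class Node {
    Edge firstEdge;    // refers to Edge before Edge has been declared
}

class Edge {
    Node from;
    Node to;
}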
This code cannot be compiled with a single-pass compiler, as the compiler has no
knowledge of Edge when it is first encountered. This applies to absolutely any
mutually-recursive definitions, or even just plain forward references that you may
encounter.
It goes further. A single-pass compiler is, quite understandably, missing a lot of the
context for each token in the program, which severely limits the range of possible
optimisations that can be done. Let’s take, for example:
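Something along these lines (a minimal sketch of my own; the function foo and the constant 42 come from the discussion that follows):

class ConstantExample {
    static int foo() {
        return 42;                // always returns the same constant
    }

    static int caller() {
        return foo() + foo();     // a multi-pass compiler can inline and fold this to 84
    }
}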
A multi-pass compiler might go even further and actually replace every single
invocation of foo() in the final code with constant 42, something that a single-pass
compiler is, again, completely unable to do, because it is missing a lot of the context.
In summary, single-pass compilers are, quite obviously, much simpler than multi-
pass compilers, but they are also significantly less powerful, and cannot handle
mutual recursion.
Question 4 (c)
What are the pros and cons of introducing Intermediate Representation (IR)
in a compiler?
Pros:
1. If a compiler translates the source language to its target machine language without the option of generating intermediate code, then a full native compiler is required for each new machine.
2. Intermediate code eliminates the need for a new full compiler for every unique machine by keeping the analysis portion the same for all the compilers.
3. It becomes easier to apply source-code modifications that improve code performance by applying code-optimization techniques on the intermediate code.
4. IR helps to separate the language-dependent front end from the hardware-dependent back end. That way you don't have to write a second back end if you write another front end for a different language, or another front end if you port the same language to a different architecture.
5. The intermediate code can be used as the input to an interpreter, which can then double as a debugger, assuming that you will be writing the compiler in its own language after version 1.

Cons:
1. Different IRs affect compilation speed and make the compiler's architecture more complex.
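For a concrete flavour of what such intermediate code looks like: for the statement x = a + b * c, a typical three-address intermediate form is t1 = b * c followed by x = a + t1, which any back end can then translate into its own machine instructions.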
Question 4
What is an attribute grammar?
An attribute grammar is a means of adding semantics to a context-free grammar, and it can help specify both the syntax and the semantics of a programming language. An attribute grammar (when viewed over a parse tree) can pass values or information among the nodes of the tree.
The value of doing this on a per-grammar-rule basis is that it gives a very good way to approach the problem of performing a global analysis through many individually local computations that are easy to understand.
As an example, one might define an attribute called "Type" with the intention of computing Type for every expression tree node. Then an attribute rule for a "+" node might combine the Type values of the children of the "+" node to compute the type of the result of adding the operands.
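A minimal sketch (my own, with hypothetical Type and node classes) of such a synthesized attribute rule for a "+" node:

public class TypeAttribute {
    enum Type { INT, FLOAT, ERROR }

    interface Expr { Type type(); }                       // every expression node carries a Type attribute

    record IntLit(int value) implements Expr {
        public Type type() { return Type.INT; }
    }

    record Add(Expr left, Expr right) implements Expr {
        // Attribute rule for "+": combine the children's Type values.
        public Type type() {
            if (left.type() == Type.ERROR || right.type() == Type.ERROR) return Type.ERROR;
            return (left.type() == Type.FLOAT || right.type() == Type.FLOAT) ? Type.FLOAT : Type.INT;
        }
    }

    public static void main(String[] args) {
        Expr e = new Add(new IntLit(1), new IntLit(2));
        System.out.println(e.type());                     // INT
    }
}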