Module 1
COMPILER CONSTRUCTION
Module I
Introduction to Compilers
Dr. A. K. Jayswal
ASET(CSE)
PhD(CSE)-JNU
MTech(CSE)-JNU
GATE(CS),
UGC-NET(CS)
Module I:
❑ Introduction of Compiler
❑ Cousins of the Compiler
❑ Phases of a Compiler
❑ Lexical Analysis
❑ Finite state Machine, R.E.
❑ Compiler writing tools-LEX, YACC
❑ CFG-Derivation, Ambiguity
Course Title: Compiler Construction [CSE304]
Course Objectives:
The objective of this course is to describe the use of formal grammars through parser
representations, especially bottom-up and top-down approaches and their various algorithms,
and to teach techniques for designing a parser using appropriate software. The course covers the
theory and practice of programming language translation, compilation, and run-time systems,
organized around a significant programming project: building a compiler for a simple but
nontrivial programming language. Students learn to understand, design, and implement a parser,
to understand the design of code generation schemes, and to understand code optimization and
the run-time environment.
Pre-requisites: Computer architecture or equivalent; data structures and algorithms or
equivalent; systems programming or equivalent; familiarity with Java.
Course Contents/Syllabus
Assessment: Theory/Lab
Amity School of Engineering and Technology
Recommended Reading
Textbooks:
• Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, “Compilers:
Principles, Techniques, and Tools”, Second Edition, Pearson Publication, 2007.
Reference Book:
• Des Watson, “A Practical Approach to Compiler Construction”, First
Edition, Springer, 2017.
OBJECTIVES
Module-1 Assessment
• Assignment- 2 Marks
Therefore, high-level language (HLL) instructions must be translated into machine language prior to execution.
Translator
A translator is a program that takes as input a program written in one programming
language (the source program) and translates it into an equivalent program in
another language (the target program).
Source program → Translator → Target program
A compiler is a translator.
Compiler
A compiler is software that converts a program written in a high-level language
(source language) into a low-level language (object/target/machine language).
An interpreter is slower than a compiler but provides better error diagnostics.
Compiler Vs Interpreter
Structure of a Compiler
The compilation process as a whole has two phases:
• Analysis (Machine Independent/Language dependent)
• Synthesis (Machine dependent/Language independent)
Semantic analysis performs type checking and generates an annotated parse tree, i.e. a parse tree
plus data type information (or a parse tree with semantic actions, as in syntax-directed translation, SDT).
Q.3: main
{
a = b +++----+++==; }
Q.4:
1. Analysis phase creates an intermediate representation from the given source code.
• Lexical Analyzer
• Syntax Analyzer
• Semantic Analyzer
2. Synthesis phase creates an equivalent target program from the intermediate representation.
• Intermediate Code Generator
• Code Optimizer
• Code Generator
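The flow through these phases can be sketched on a single assignment statement. The following is a minimal illustration in Python, not a full compiler: the function names `lex`, `parse` and `generate`, the tiny token set and the temporary names `t1`, `t2` are all assumptions made for this example, and semantic analysis, optimization and target code generation are left out.

```python
import re

# Token patterns for the sketch (assumed, deliberately tiny).
TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[=+*])|(?P<ws>\s+)")

def lex(src):
    """Lexical analysis: characters -> list of (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        if m.lastgroup != "ws":          # no token for whitespace
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def parse(tokens):
    """Syntax analysis: tokens -> tree for  id = expr  (* binds tighter than +)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind, text=None):
        nonlocal pos
        k, t = peek()
        if k != kind or (text is not None and t != text):
            raise SyntaxError(f"expected {text or kind}, got {t!r}")
        pos += 1
        return t

    def factor():
        kind, _ = peek()
        if kind not in ("id", "num"):
            raise SyntaxError("expected identifier or number")
        return take(kind)

    def term():
        node = factor()
        while peek() == ("op", "*"):
            take("op")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("op", "+"):
            take("op")
            node = ("+", node, term())
        return node

    target = take("id")
    take("op", "=")
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

def generate(tree):
    """Intermediate code generation: tree -> three-address instructions."""
    code, counter = [], 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):        # leaf: identifier or number
            return node
        op, left, right = node
        l, r = walk(left), walk(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    _, target, value = tree
    code.append(f"{target} = {walk(value)}")
    return code

for line in generate(parse(lex("a = b + c * 2"))):
    print(line)
```

Each function consumes the output of the previous one, which mirrors how the analysis phases feed the synthesis phases.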
Grouping of Phases
The phases of a compiler are grouped into two parts:
1. Front End: It consists of those phases, or parts of phases, that depend primarily on
the source language and are largely independent of the target machine. It includes lexical
analysis, syntax analysis, creation of the symbol table, semantic analysis and generation of
intermediate code. It also includes the error handling that goes along with each of
these phases.
2. Back End: It includes those phases, or parts of phases, that depend on the target
machine and are largely independent of the source language. It includes code
optimization and code generation, along with the necessary error
handling and symbol table operations.
Example
[Figure: symbol table with entries for the identifiers E, M and C]
r+ means one or more instances of r.
r? means zero or one instance of r.
Regular definitions:
Token recognition can be done using regular definitions of the form
D1 → r1
D2 → r2
...
Dn → rn
where each Di is a new symbol not in ∑ and each ri is a RE over ∑ and the previously defined symbols D1, ..., Di-1.
Re-writing Regular definitions:
Recognition of tokens: Grammar
No token is returned for white space (ws): blank, tab, new line etc.
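Regular definitions can be tried out directly with a regular expression library. The sketch below uses Python's `re` module; the names `digit`, `letter`, `digits`, `number`, `ident` and `ws` and the helper `classify` are assumptions chosen for illustration, since the slide's own definitions did not survive.

```python
import re

# Regular definitions: each name is built only from earlier names.
digit = r"[0-9]"
letter = r"[A-Za-z]"
digits = rf"{digit}+"                      # r+ : one or more digits
number = rf"{digits}(\.{digits})?"         # r? : optional fractional part
ident = rf"{letter}({letter}|{digit})*"    # r* : zero or more letters or digits
ws = r"[ \t\n]+"                           # blank, tab, newline: no token

def classify(lexeme):
    """Return the name of the first definition that matches the whole lexeme."""
    for name, pattern in (("number", number), ("id", ident), ("ws", ws)):
        if re.fullmatch(pattern, lexeme):
            return name
    return None

for text in ("42", "3.14", "x1", "1x"):
    print(repr(text), "->", classify(text))
```

Because every definition refers only to names defined before it, substituting them in order always ends in a plain regular expression over the alphabet.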
The source program lies in the input buffer. For example, consider the statement E=M*C**2.
Two pointers are used to recognize a lexeme:
lexemeBegin: marks the beginning of the current lexeme, whose extent we are attempting to
determine.
forward: scans ahead until a pattern match is found.
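The two-pointer idea can be sketched on the statement E=M*C**2. This is a simplified illustration in Python that works on an in-memory string; the function name `scan`, the operator list, and the choice to treat `**` as a single lexeme are assumptions of the sketch.

```python
def scan(buffer):
    """Split `buffer` into lexemes using lexeme_begin and forward indices."""
    operators = ("**", "=", "*", "+", "-", "/")   # longer operators are tried first
    lexemes = []
    lexeme_begin = 0
    while lexeme_begin < len(buffer):
        forward = lexeme_begin
        ch = buffer[forward]
        if ch.isspace():                 # white space produces no lexeme
            lexeme_begin += 1
            continue
        if ch.isalpha():                 # identifier: letters and digits
            while forward < len(buffer) and buffer[forward].isalnum():
                forward += 1
        elif ch.isdigit():               # number: digits only
            while forward < len(buffer) and buffer[forward].isdigit():
                forward += 1
        else:
            for op in operators:
                if buffer.startswith(op, forward):
                    forward += len(op)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r}")
        # The lexeme is the text between the two pointers.
        lexemes.append(buffer[lexeme_begin:forward])
        lexeme_begin = forward           # both pointers now mark the next lexeme
    return lexemes

print(scan("E=M*C**2"))
```

After each match, `lexeme_begin` is moved up to `forward`, so both pointers start the next lexeme at the same position.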
Initially both pointers point to the first character of the input string.
The forward pointer (fp) moves ahead until it finds white space (i.e. the end of the lexeme). After encountering white space, the begin pointer (bp) and fp
are both set to the start of the next token.
Reading characters one at a time from secondary storage is very costly, so a buffering technique is used. In
the buffering technique, a block of data is first read from secondary storage into a buffer in main memory, and the
lexical analyzer then reads its characters from the buffer.
Two methods are used in this context:
1. One Buffer Scheme and
2. Two Buffer Scheme.
Similarly, the end of the second buffer is recognized by the end-of-buffer mark present at its end.
When fp encounters the first eof, the end of the first buffer has been reached, and filling of the second
buffer starts. In the same way, the second eof indicates the end of the second buffer.
Alternatively, both buffers can be refilled in turn until the end of the input program is reached and the stream of tokens is
identified. The eof character placed at the end of each buffer is called a sentinel; it is used to identify the end of the
buffer.
Thus, the two tests (for the end of a buffer and for the character just read) can be combined by extending each buffer to hold a sentinel character (eof)
at the end.
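A buffer pair with sentinels might look like the following Python sketch. The class name `TwoBufferReader`, the half size `N = 8`, and the use of the NUL character as eof are assumptions made for the example; handling of lexemeBegin across buffer halves is omitted.

```python
import io

EOF = "\0"   # sentinel; assumed never to occur in the source text
N = 8        # size of each buffer half (tiny, for demonstration)

class TwoBufferReader:
    def __init__(self, stream):
        self.stream = stream
        # Layout: [first half: N chars][sentinel][second half: N chars][sentinel]
        self.buf = [EOF] * (2 * N + 2)
        self._load(0)
        self.forward = 0

    def _load(self, start):
        """Fill one half starting at `start` and place a sentinel after the data."""
        chunk = self.stream.read(N)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        self.buf[start + len(chunk)] = EOF

    def next_char(self):
        ch = self.buf[self.forward]
        if ch != EOF:                      # common case: one test per character
            self.forward += 1
            return ch
        if self.forward == N:              # sentinel at the end of the first half
            self._load(N + 1)
            self.forward = N + 1
            return self.next_char()
        if self.forward == 2 * N + 1:      # sentinel at the end of the second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                        # sentinel inside a half: end of input

reader = TwoBufferReader(io.StringIO("E = M * C ** 2 + offset"))
print("".join(iter(reader.next_char, None)))
```

Only when the character read is the sentinel does the code check which case applies, so ordinary characters cost a single comparison.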
LEX (Lexical Analyzer Generator):
▪ LEX is a software tool that generates a
lexical analyzer (the 1st phase of a
compiler). It is based on regular expressions (R.E.).
▪ YACC is a tool that generates a syntax
analyzer, which checks syntax (the 2nd
phase of a compiler). It is based on a
grammar, i.e. production rules.
Construction of a Lexical Analyzer with the LEX tool:
❑ First, a Lex specification, or Lex source file (lex.l), is prepared. It consists of regular expressions
associated with Lex actions, written in the Lex language.
❑ Then the Lex compiler converts the Lex specification file (lex.l) into a C language file, lex.yy.c (the
lexical analyzer as a C program). To run it we need an executable file, so a C compiler is used to
convert this C program (lex.yy.c) into an executable (binary) file, called a.out.
❑ This executable file a.out is the lexical analyzer: given any input
stream, it generates a sequence of tokens. For example, tokens may be identifiers,
keywords, operators, etc.
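What the generated analyzer does at run time can be imitated in a few lines. The sketch below is written in Python rather than Lex, so it is only an analogy: the `RULES` table, the token names and the function `tokens` are assumptions for illustration. It follows the usual Lex conventions that the longest match wins and that, on a tie, the rule listed first wins.

```python
import re

# (pattern, action) pairs, in priority order; an action of None means "no token".
RULES = [
    (r"[ \t\n]+", None),
    (r"if|else|while", lambda s: ("KEYWORD", s)),
    (r"[A-Za-z_][A-Za-z0-9_]*", lambda s: ("ID", s)),
    (r"[0-9]+", lambda s: ("NUM", s)),
    (r"[-+*/=<>]", lambda s: ("OP", s)),
]
COMPILED = [(re.compile(p), a) for p, a in RULES]

def tokens(text):
    pos, out = 0, []
    while pos < len(text):
        best_len, best_action = 0, None
        for pattern, action in COMPILED:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best,
            # so an earlier rule keeps priority on ties.
            if m and m.end() - pos > best_len:
                best_len, best_action = m.end() - pos, action
        if best_len == 0:
            raise ValueError(f"no rule matches at position {pos}")
        if best_action is not None:
            out.append(best_action(text[pos:pos + best_len]))
        pos += best_len
    return out

print(tokens("if x1 = 42 else iffy"))
```

Note how `iffy` becomes an identifier because the identifier rule matches more characters than the keyword rule, while `if` alone stays a keyword because the keyword rule is listed first.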
Structure of lex program:
declarations
%%
translation rules (each of the form: pattern { action })
%%
auxiliary functions
Formal Definition:
Examples of Context-Free Grammar:
Language of a Grammar
Example:
Context-Free Language (Definition):
Example
Another Example
Another Example
Parse Tree
A parse tree of a derivation is a tree in which:
• If a rule A → A1A2…An occurs in the derivation, then A is the parent node of the nodes labeled A1,
A2, …, An.
[Figure: example parse tree with root S and nodes labeled S, a, b and e]
Grammar:
S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
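The two sample derivations of aabb can be checked mechanically: each step must replace exactly one nonterminal by the right-hand side of one of its rules. The Python sketch below assumes that the empty first alternative of A stands for the empty string; the names `GRAMMAR`, `is_step` and `is_derivation` are introduced only for this example.

```python
# Grammar from the slide; "" is assumed to be the empty alternative of A.
GRAMMAR = {
    "S": ["A", "AB"],
    "A": ["", "a", "Ab", "AA"],
    "B": ["b", "bc", "Bc", "bB"],
}

def is_step(before, after):
    """True if `after` follows from `before` by rewriting one nonterminal."""
    for i, symbol in enumerate(before):
        for rhs in GRAMMAR.get(symbol, []):
            if before[:i] + rhs + before[i + 1:] == after:
                return True
    return False

def is_derivation(forms):
    """True if `forms` starts at S and every consecutive pair is a valid step."""
    return forms[0] == "S" and all(is_step(u, v) for u, v in zip(forms, forms[1:]))

first = ["S", "AB", "AAB", "aAB", "aaB", "aabB", "aabb"]
second = ["S", "AB", "AbB", "Abb", "AAbb", "Aabb", "aabb"]
print(is_derivation(first), is_derivation(second))
```

The first sequence always rewrites the leftmost nonterminal and the second always rewrites the rightmost one, so they are a leftmost and a rightmost derivation of the same string.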
Ambiguity in CFG
Ambiguous Grammar (Example 1, Example 2)
Check Your Progress-1
Q.1:
bExp → bExp OR bExp
bExp → bExp AND bExp
bExp → NOT bExp
bExp → TRUE
bExp → FALSE
Note: Removing left recursion can remove the ambiguity from an expression grammar like this one (leaving
only one parse tree for any w), but the problem of precedence and associativity still remains.
Converting Ambiguous to Unambiguous Grammar:
Example 2:
Note: Removing left recursion and/or left factoring the grammar removes the ambiguity in this example; in general, these transformations do not guarantee an unambiguous CFG.
Example
Ambiguity in CFG
Example: take a = 2.
▪ Two different parse trees may cause problems in applications that use the derivation tree.
▪ For example: evaluating expressions and, in general, compilers for programming languages.
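The effect on expression evaluation can be demonstrated by computing the value of every parse tree. The sketch below assumes the ambiguous grammar E → E+E | E*E | id and the expression a + a * a with a = 2; both are illustrative choices, because the slide's own expression did not survive, and the function name `values` is hypothetical.

```python
def values(tokens, env):
    """Values obtainable from every parse tree of E -> E+E | E*E | id."""
    if len(tokens) == 1:                      # E -> id
        return {env[tokens[0]]}
    results = set()
    for i in range(1, len(tokens), 2):        # any operator may be the root
        op = tokens[i]
        for left in values(tokens[:i], env):
            for right in values(tokens[i + 1:], env):
                results.add(left + right if op == "+" else left * right)
    return results

print(values(["a", "+", "a", "*", "a"], {"a": 2}))
```

With a = 2 the tree rooted at + gives 2 + (2 * 2) = 6 and the tree rooted at * gives (2 + 2) * 2 = 8, so the same string has two meanings.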
Q.1 Consider the following parse tree for the expression a#b$c$d#e#f, involving two binary
operators $ and #.
Which one of the following is correct for the given parse tree?
(A) $ has higher precedence and is left associative; # is right associative
(B) # has higher precedence and is left associative; $ is right associative
(C) $ has higher precedence and is left associative; # is left associative
(D) # has higher precedence and is right associative; $ is left associative
Ambiguous Grammar
Definition. A grammar G is ambiguous if there is a word w ∈ L(G)
having at least two different parse trees, or equivalently
two or more leftmost derivations (LMD), or
two or more rightmost derivations (RMD).
S → A
S → B
S → AB
A → aA
B → bB
A → ε
B → ε
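For this small grammar the definition can be tested by enumerating leftmost derivations. The Python sketch below assumes that `e` in the rules denotes the empty string; the names `GRAMMAR` and `leftmost_derivations` are introduced for the example, and the search relies on this particular grammar having no left recursion.

```python
# S -> A | B | AB,  A -> aA | empty,  B -> bB | empty
GRAMMAR = {"S": ["A", "B", "AB"], "A": ["aA", ""], "B": ["bB", ""]}

def leftmost_derivations(word, form="S"):
    """Return every leftmost derivation of `word` as a list of sentential forms."""
    # Index of the leftmost nonterminal, if there is one.
    i = next((k for k, s in enumerate(form) if s in GRAMMAR), None)
    if i is None:
        return [[form]] if form == word else []
    # Prune forms whose terminal prefix or terminal count cannot lead to `word`.
    terminals = sum(1 for s in form if s not in GRAMMAR)
    if not word.startswith(form[:i]) or terminals > len(word):
        return []
    found = []
    for rhs in GRAMMAR[form[i]]:
        nxt = form[:i] + rhs + form[i + 1:]
        found += [[form] + rest for rest in leftmost_derivations(word, nxt)]
    return found

for d in leftmost_derivations("a"):
    print(" => ".join(step or "ε" for step in d))
```

The word a has two leftmost derivations, one through S → A and one through S → AB, which is enough to show that the grammar is ambiguous.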
Example
Two derivation trees for w
Important note for Ambiguity in CFG
Rewriting an Ambiguous CFG as an Unambiguous CFG
Ambiguity
If the operator is left associative, then change the rightmost symbol.
E.g. E → E*E | id can be replaced by
E → E*E’ | E’
E’ → id
If the operator is right associative, then change the leftmost symbol.
E.g. E → E E | id can be replaced by
E → E’ E | E’
E’ → id
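The shape of the trees produced by the two rewritten grammars can be sketched as follows. The function names `build_left` and `build_right` are hypothetical, and the operator is passed in as a parameter because the right-associative operator is not legible in the slide; `^` in the usage line is only a stand-in.

```python
def build_left(ids, op):
    """Tree shape for  E -> E op E' | E',  E' -> id  (recursion on the left)."""
    tree = ids[0]
    for operand in ids[1:]:
        tree = (tree, op, operand)       # earlier operands end up deeper on the left
    return tree

def build_right(ids, op):
    """Tree shape for  E -> E' op E | E',  E' -> id  (recursion on the right)."""
    if len(ids) == 1:
        return ids[0]
    return (ids[0], op, build_right(ids[1:], op))

print(build_left(["id1", "id2", "id3"], "*"))    # groups as (id1 * id2) * id3
print(build_right(["id1", "id2", "id3"], "^"))   # groups as id1 ^ (id2 ^ id3)
```

Keeping the recursive nonterminal on the left forces grouping to the left, and keeping it on the right forces grouping to the right, so each string has exactly one tree.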
End of Module-1
**************
Regular Expression
Specification of Tokens(Pattern)
Regular expressions are an important notation for specifying patterns.
A regular expression works like a grammar: it defines the rule that the lexemes of a token must follow. As with a grammar, we first
have to define the alphabet.
2. Relational Operator
Bootstrapping
Bootstrapping is widely used in compiler development.
Bootstrapping is used to produce a self-hosting compiler, that is, a
compiler that can compile its own source code.
A bootstrap compiler is used to compile the compiler; the compiled
compiler can then be used to compile everything else, as well as future versions of itself.
The T-diagram shows a compiler SCIT for source language S and target language T, implemented in language I.
3. Compile LCSA using the compiler SCAA to obtain LCAA. LCAA is a compiler for language L,
which runs on machine A and produces code for machine A.
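Step 3 can be modelled as a small operation on T-diagrams. In the Python sketch below a compiler is a triple (source, implementation language, target), read off the notation used here (for example LCSA is a compiler for L, written in S, producing code for A). The names `Compiler` and `compile_with` are assumptions made for the illustration.

```python
from collections import namedtuple

# A compiler in the notation <source> C <implementation> <target>.
Compiler = namedtuple("Compiler", "source implemented_in target")

def compile_with(program, tool):
    """Translate the source text of compiler `program` using compiler `tool`."""
    if program.implemented_in != tool.source:
        raise ValueError("tool cannot read the language the compiler is written in")
    # What `program` translates stays the same; only the language
    # it is expressed in changes, to the tool's target.
    return Compiler(program.source, tool.target, program.target)

LCSA = Compiler("L", "S", "A")   # compiler for L, written in S, targeting A
SCAA = Compiler("S", "A", "A")   # compiler for S, running on A, targeting A
LCAA = compile_with(LCSA, SCAA)
print(LCAA)
```

The result has implementation language A, meaning it runs on machine A, while still translating L to code for A.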
173
Amity School of Engineering and Technology
174
Amity School of Engineering and Technology
175
Amity School of Engineering and Technology
Program#1: Write a program (WAP) in C/C++ to implement a deterministic finite automaton (DFA) that accepts the
language
L = {a^n b^m | n mod 2 = 0, m ≥ 1}
Example (Output):
Input: aabbb
Output: ACCEPTED //n=2(even) m=3 (>=1)
Input: aaabbb
Output: NOT ACCEPTED //n=3(odd), m = 3
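One possible table-driven solution is sketched below. It is written in Python for brevity, although the assignment asks for C/C++; the transition table carries over directly to a two-dimensional array or a switch statement. The state names `q0`, `q1`, `q2` are assumptions: `q0` means an even number of a's so far, `q1` an odd number, and `q2` that at least one b has followed an even number of a's.

```python
# Missing (state, symbol) pairs lead to an implicit dead state.
TRANSITIONS = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q0",
    ("q2", "b"): "q2",
}
START, ACCEPTING = "q0", {"q2"}

def accepts(text):
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:            # dead state: no way back to acceptance
            return False
    return state in ACCEPTING

for s in ("aabbb", "aaabbb"):
    print(s, "ACCEPTED" if accepts(s) else "NOT ACCEPTED")
```

Since n = 0 is even, a string consisting only of b's is also accepted under this reading of the language.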
L = (aa)*.b+
Approaches:
Three steps are involved in accepting a string:
Problem#2
Design a deterministic finite automaton (DFA) over ∑ = {0, 1} that accepts the
language of strings ending with “01”.
Solution:
L={01,001, 10001,1101,….}
The shortest accepted string has length 2, so at least 3 states are required.
[Figure: DFA for strings ending with 01]
As a regular expression: L = (0+1)*01
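A three-state DFA for this language can be sketched as a nested transition table. The state names below are assumptions chosen to describe what each state remembers: `start` (no useful suffix yet), `saw0` (the input so far ends in 0) and `saw01` (the input so far ends in 01, the accepting state).

```python
DFA = {
    "start": {"0": "saw0", "1": "start"},
    "saw0":  {"0": "saw0", "1": "saw01"},
    "saw01": {"0": "saw0", "1": "start"},
}

def ends_with_01(bits):
    state = "start"
    for bit in bits:
        if bit not in ("0", "1"):
            raise ValueError(f"symbol {bit!r} is not in the alphabet")
        state = DFA[state][bit]
    return state == "saw01"

for s in ("01", "001", "10001", "1101", "10"):
    print(s, ends_with_01(s))
```

Every state has a transition on both symbols, so the automaton is complete and needs no separate dead state.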
OUTPUT
Problem #3