Module 1

The document outlines the course on Compiler Construction at Amity School of Engineering and Technology, detailing its objectives, learning outcomes, and syllabus. It covers essential topics such as compiler phases, lexical analysis, and tools like LEX and YACC, while also emphasizing the importance of understanding programming language translation and optimization techniques. The course aims to equip students with practical skills in designing and implementing compilers, alongside theoretical knowledge of compiler structures and processes.

Amity School of Engineering and Technology

COMPILER CONSTRUCTION
Module I
Introduction to Compilers
Dr. A. K. Jayswal
ASET(CSE)
PhD(CSE)-JNU
MTech(CSE)-JNU
GATE(CS),
UGC-NET(CS)

Module I:
❑ Introduction to Compilers
❑ Cousins of the Compiler
❑ Phases of a Compiler
❑ Lexical Analysis
❑ Finite State Machines, Regular Expressions (R.E.)
❑ Compiler writing tools: LEX, YACC
❑ CFG: Derivation, Ambiguity
Course Title: Compiler Construction [CSE304]

Course Objectives:

The objective of this course is to describe the use of formal grammars by parsers, especially bottom-up and top-down approaches and their associated algorithms, and to teach techniques for designing parsers using appropriate software. The course covers the theory and practice of programming language translation, compilation, and run-time systems, organized around a significant programming project: building a compiler for a simple but nontrivial programming language. Students will learn to understand, design, and implement a parser; to design code generation schemes; and to understand code optimization and run-time environments.
Pre-requisites: Computer architecture or equivalent; data structures and algorithms or equivalent; systems programming or equivalent; familiarity with Java.
Course Title: Compiler Construction [CSE304]

Course Learning Outcomes:


At the end of this course, the student will be able to
1. Describe compiler concepts and their utilization.
2. Design various types of compilers.
3. Analyze and implement SLR and LALR parsing techniques.
4. Synthesize code generation techniques.
5. Demonstrate the process of implementing source-code optimization.
6. Understand the structure of compilers.
7. Understand the basic techniques used in compiler construction, such as lexical analysis, top-down and bottom-up parsing, context-sensitive analysis, and intermediate code generation.
8. Understand the basic data structures used in compiler construction, such as abstract syntax trees, symbol tables, three-address code, and stack machines.
9. Design and implement a compiler using a software engineering approach.
Course Contents/Syllabus

Assessment: Theory/Lab


Recommended Reading
Textbooks:
• Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, “Compilers: Principles, Techniques, and Tools”, Second Edition, Pearson, 2007.

• A. A. Puntambekar, “Compiler Construction”, Technical Publications, 2009.

Reference Book:
• Des Watson, “A Practical Approach to Compiler Construction”, First Edition, Springer, 2017.
Amity School of Engineering and Technology

OBJECTIVES

After completing this section (Module I), you will be able to
1.1 Understand the working of a compiler.
1.2 Differentiate between the phases of a compiler.
1.3 Explain the lexical analyzer.
1.4 Analyze and differentiate the compiler writing tools LEX and YACC.
1.5 Explain finite state machines, regular expressions, and ambiguity in CFGs.

Module-1 Assessment

• Quiz (conceptual and numerical based): 2 marks

• Assignment: 2 marks

❑ All computers understand only machine language.

Therefore, high-level language (HLL) instructions must be translated into machine language prior to execution.

Translator
A translator is a program that takes as input a program written in one programming language (the source program) and translates it into an equivalent program in another language (the target language).

Source program → Translator → Target program

A compiler is a translator.

Compiler
A compiler is software that converts a program written in a high-level language (the source language) into a low-level language (the object/target/machine language).

Language processing systems (using a compiler)

Key responsibilities of the preprocessor are:

Macro expansion: macros are defined using #define directives, which are used to define constants or to create short, reusable code. The preprocessor replaces macro invocations with their definitions.

File inclusion: preprocessor directives such as #include are used to include header files in the source program. The preprocessor is responsible for inserting the contents of the included files into the source program.

Note: the preprocessor removes these directive lines and substitutes the entire contents of the header files into the source program.

Cousins of the Compiler (other translators)

1. Interpreter: a program that executes other programs. Instead of producing a target program as a translation, it executes the source program statement by statement.

An interpreter is slower than a compiler but provides better error diagnostics.

Compiler Vs Interpreter


Cousins of the Compiler (other translators)

2. Preprocessor: translates a program written in one high-level language into another high-level language program.

3. Assembler: translates an assembly language program into a machine language program.

Structure of a Compiler
There are two phases of the whole compilation process:
• Analysis (machine independent / language dependent)
• Synthesis (machine dependent / language independent)

Compilation is a complex process, so it is partitioned into a series of sub-processes called phases.
Phases: a phase is logically an operation which takes as input one representation of the source program and produces as output another representation.
There are six different phases of a compiler.

Structure of a Compiler


Structure of a Compiler

Semantic analysis does the type checking and generates an annotated parse tree (a parse tree with semantic actions, called an SDT, or equivalently a parse tree plus data-type information).

Output of the compiler phases:

Front-end and Back-end of a compiler:

❑ The front-end phases are lexical, syntax, and semantic analysis. These form the "analysis phase", as they all perform some kind of analysis.

❑ The back-end phases are called the "synthesis phase", as they synthesize the intermediate and target language, and hence the program, from the representation created by the front-end phases. The advantages are that not only can lots of code be reused, but also, since the compiler is well structured, it is easy to maintain and debug.

(Numerical questions) Lexical Analysis (number of tokens)

Q.1 printf("%d Hai", &x);

Q.2 int max(x, y)
int x, y;
/* find the maximum of x and y */
{
return x > y ? x : y;
}

Q.3: main
{
a = b +++----+++==; }
Q.4: main
{
int a = 10;
char b = "abc";
in t c = 30;
ch ar d = "xyz";
in /* comment */ t m = 40.5; }
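The token counts asked for above can be checked mechanically. A minimal sketch of a regex-based counter for C-like fragments, in Python; the token classes and regular expressions here are illustrative assumptions (whitespace and comments are skipped, a string literal counts as one token), not the course's official answer key:

```python
import re

# Illustrative token specification: tried in order at each position.
TOKEN_RE = re.compile(r"""
    /\*.*?\*/            |   # comment (not a token)
    \s+                  |   # whitespace (not a token)
    "[^"]*"              |   # string literal: one token
    \d+(\.\d+)?          |   # numeric constant
    [A-Za-z_]\w*         |   # identifier or keyword
    \+\+|--|==|[-+*/=<>?:;,&(){}]    # operators and punctuation
""", re.VERBOSE)

def count_tokens(code):
    """Count lexical tokens, skipping whitespace and comments."""
    n = 0
    pos = 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if not m:                 # unknown character: count it as one token
            pos += 1
            n += 1
            continue
        text = m.group(0)
        if not text.isspace() and not text.startswith("/*"):
            n += 1
        pos = m.end()
    return n

print(count_tokens('int a = 10;'))   # int, a, =, 10, ;  -> 5
```

Under this specification, Q.1 above comes out to 8 tokens: printf, (, the string literal, the comma, &, x, ), and ;.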


Analysis-Synthesis model of a compiler

We basically have two parts of a compiler:
1. The analysis part
2. The synthesis part

1. The analysis phase creates an intermediate representation from the given source code:
• Lexical analyzer
• Syntax analyzer
• Semantic analyzer
2. The synthesis phase creates an equivalent target program from the intermediate representation:
• Intermediate code generator
• Code optimizer
• Code generator

Grouping of Phases
Phases of Compiler are grouped into two:
1. Front End
2. Back End

1. Front End: It consists of those phases, or parts of phases, that primarily depend on the source language and are independent of the target machine. It includes lexical and syntax analysis, creation of the symbol table, semantic analysis, and generation of intermediate code. It also includes the error handling that goes along with each of these phases.
2. Back End: It includes those phases, or parts of phases, that depend on the target machine and are independent of the source language. It includes code optimization and code generation, along with the necessary error handling and symbol table operations.

Example


Some Important terms (related to lexical analyzer):


1. Pattern: a rule which describes the set of lexemes that can represent a particular token in the source program. E.g., an identifier can be described as a letter followed by letters or digits.

2. Lexeme: lexemes are the smallest logical units of a program. A lexeme is a sequence of characters in the source program for which a token is produced. E.g.: 10, int, +, etc.

3. Token: a sequence of characters that can be treated as a unit in the grammar of the programming language.
Classes of similar lexemes are identified by the same token.
E.g.: identifier, keyword, operator, constant, delimiter, etc.
A pattern is a rule; the pattern for id is letter(letter|digit)*.

Example: FORTRAN statement E = M * C ** 2, where the identifiers E, M, and C are entered into the symbol table.

Notation: r+ means one or more instances; r? means zero or one instance.

Regular definitions: token recognition can be done using regular definitions of the form
D1 → r1
D2 → r2
...
where each Di is a new symbol not in ∑ and each ri is a regular expression over ∑.

No token is produced for whitespace (ws): blank, tab, newline, etc.
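The identifier pattern letter(letter|digit)* can be tried out directly. A small Python sketch using the re module; the character-class spelling of "letter" and "digit" is the usual ASCII assumption:

```python
import re

# letter(letter|digit)* : a letter followed by any mix of letters and digits
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")

def is_identifier(s):
    """Return True iff `s` matches the identifier pattern exactly."""
    return ID.match(s) is not None

print(is_identifier("count1"))   # True
print(is_identifier("1count"))   # False: must start with a letter
```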

Scanning the Input (concept of buffer scheme)

The source program lies in the input buffer. For example, consider the statement E=M*C**2.
Two pointers are used to recognize the lexeme:
lexemeBegin: marks the beginning of the current lexeme, whose extent we are attempting to determine.
forward: scans ahead until a pattern match is found.

Scanning the Input (concept of buffer scheme)


The lexical analyzer scans the given input from left to right, one character at a time. It uses two pointers:
1. the begin pointer (bp), or lexemeBegin, and
2. the forward pointer (fp), to keep track of how much of the input has been scanned.

Initially both pointers point to the first character of the input string.
fp moves ahead to find whitespace (i.e., the end of the lexeme); after encountering whitespace, bp and fp are set to the next token.
Reading characters from secondary storage is very costly, hence a buffering technique is used: a block of data is first placed from main memory into a buffer, and then passed from the buffer to the lexical analyzer.
There are two methods used in this context:
1. the one-buffer scheme, and
2. the two-buffer scheme.

One Buffer Scheme


In this scheme, only one buffer is used to store the input string. The problem with this scheme is that if a lexeme is very long compared to the length of the buffer, it crosses the buffer boundary; to scan the rest of the lexeme the buffer has to be refilled, which overwrites the first part of the lexeme.

Two Buffer Scheme


To overcome the problem of the one-buffer scheme, a two-buffer scheme is used. In this method two buffers are used to store the input string, and they are scanned alternately: when the end of the current buffer is reached, the other buffer is filled. To identify the boundary of the first buffer, an end-of-buffer (eof) character is placed at its end.

Similarly, the end of the second buffer is recognized by the end-of-buffer mark at its end. When fp encounters the first eof, the end of the first buffer is recognized and filling of the second buffer starts. In the same way, when the second eof is reached, it indicates the end of the second buffer. Alternately, the buffers are filled until the end of the input program, and the stream of tokens is identified. The eof character introduced at the end is called a sentinel and is used to identify the end of a buffer.
Thus, both tests (end of buffer and end of input) can be combined by extending each buffer to hold a sentinel character (eof) at the end.
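The combined sentinel test can be sketched in a few lines. This Python simulation is illustrative (buffer size, names, and the choice of "\0" as sentinel are my assumptions): the forward pointer advances until it hits the sentinel at the end of the current buffer, at which point it either swaps to the other, already-filled buffer or stops at the real end of input.

```python
SENTINEL = "\0"   # the 'eof' sentinel; assumed not to occur in the source text

def two_buffer_chars(source, bufsize=4):
    """Yield the characters of `source`, simulating the two-buffer scheme:
    each buffer ends in a sentinel, and hitting that sentinel means either
    'switch to the other (already filled) buffer' or 'real end of input'."""
    pos = 0
    def load():
        nonlocal pos
        chunk = list(source[pos:pos + bufsize])
        pos += len(chunk)
        chunk.append(SENTINEL)        # sentinel terminates every buffer
        return chunk
    cur = load()                      # buffer currently being scanned
    nxt = load()                      # the other buffer, filled ahead
    i = 0                             # the 'forward' pointer
    while True:
        ch = cur[i]
        if ch == SENTINEL:
            if i == len(cur) - 1 and len(nxt) > 1:
                cur, nxt, i = nxt, load(), 0   # end of buffer: swap and refill
                continue
            return                             # end of input
        yield ch
        i += 1

print("".join(two_buffer_chars("E=M*C**2", bufsize=3)))   # E=M*C**2
```

A real scanner must additionally check that a sentinel seen mid-buffer is a genuine character of the source; here we simply assume "\0" never occurs in the input.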
LEX (lexical analyzer generator):
▪ LEX is a software tool to generate a lexical analyzer (the 1st phase of a compiler). It is based on regular expressions.
▪ YACC is a tool to generate a syntax analyzer, which checks syntax (the 2nd phase of a compiler). It is based on a grammar (production rules).
Construction of a lexical analyzer with the LEX tool:

❑ First, a Lex specification or Lex source file (lex.l) is prepared. It consists of regular expressions with associated Lex actions (the Lex language).
❑ Then the Lex compiler is used to convert the Lex specification file (lex.l) into the C source file lex.yy.c (the lexical analyzer as a C program). To run anything we need an executable, so a C compiler is used to convert this C file (lex.yy.c) into an executable (binary) file, called a.out.
❑ This executable a.out is now ready to generate tokens: any input stream fed to it produces a sequence of tokens, e.g., identifiers, keywords, operators, etc.
Structure of a Lex program:

The declaration section starts with %{ and ends with %}, and holds information such as #include <stdio.h> and any constant or variable declarations.
In the translation (rules) section, each pattern is defined with an associated action inside braces { }. An action is C code.
The auxiliary section starts with main() and contains the yylex() function; the job of this function is to transfer execution to the translation section. Program execution always starts with the main program.

Lex pattern notation: square brackets [abc] mean a or b or c.
How to write patterns:
[a-z]: any letter in the range a to z
[a\-z]: an occurrence of a, - (literal hyphen), or z
[^ab]: negation (anything except a and b)
?: zero or one occurrence
|: the or operator

For example, vowels are represented as [aeiou], and consonants as [b-d f-h j-n p-t v-z].
Lex compiler:

Compiling a Lex program takes 3 steps:
❑ The Lex file is prepared and saved with the .l extension.
❑ cc invokes the C compiler on the generated lex.yy.c.
❑ The flag -lfl links the Lex (flex) library, libfl, which supplies default definitions such as main() and yywrap() that the generated scanner needs.
A simple LEX program for C "tokens"

For example, suppose we enter the string int a, b, c;
The generated tokens are:
keyword, identifier, special symbol, identifier, special symbol, identifier, special symbol.
YACC (yet another compiler-compiler)
Structure of YACC program:
Executing a YACC program:
Compiling a YACC program:
For Example:
Consider a grammar to accept the language
L = {set of all strings starting with 01}
Lex file abc.l
YACC program
Now compile the program by using the following steps:
Amity School of Engineering and Technology

Formal Definition:



Context Free Grammar

Definition. A context-free grammar is a 4-tuple (∑, NT, R, S), where:

• ∑ is an alphabet (each character in ∑ is called a terminal)
• NT is a set (each element of NT is called a nonterminal)
• R, the set of rules, is a subset of NT × (∑ ∪ NT)*
• S, the start symbol, is one of the symbols in NT

If (α, β) ∈ R, we write the production α → β

β is called a sentential form

Examples of
context Free
Grammar:

Language of
a Grammar

Example:

Context Free
Language
(Definition):

Example

Another Example


Another Example


Derivation Order and Parse Tree


Derivation Order and Parse Tree


Parse Tree
A parse tree of a derivation is a tree in which:

• Each internal node is labeled with a nonterminal

• If a rule A → A1A2…An occurs in the derivation then A is a parent node of nodes labeled A1, A2, …, An

(The slide shows an example parse tree whose leaves, read left to right, are a, a, b, ε.)

Parse Tree
S → A | AB            Sample derivations:
A → ε | a | Ab | AA   S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
B → b | bc | Bc | bB  S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb

These two derivations use the same productions, but in different orders. This ordering difference is often uninteresting; derivation trees give a way to abstract away ordering differences.

          S           Root label = start symbol.
        /   \
       A     B        Each interior label = variable.
      / \   / \
     A   A b   B      Each parent/child relation = derivation step.
     |   |     |
     a   a     b      Each leaf label = terminal or ε.

All leaf labels together = derived string = yield (here aabb).

Leftmost, Rightmost Derivations


Definition. A left-most derivation of a sentential form is one in which rules transforming
the left-most nonterminal are always applied

Definition. A right-most derivation of a sentential form is one in which rules transforming


the right-most nonterminal are always applied


Leftmost, Rightmost Derivations


S → A | AB            Sample derivations:
A → ε | a | Ab | AA   S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
B → b | bc | Bc | bB  S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb

These two derivations are special:
the 1st derivation is leftmost (it always expands the leftmost variable);
the 2nd derivation is rightmost (it always expands the rightmost variable).
Both correspond to the same parse tree, with yield aabb.
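A leftmost derivation is mechanical enough to replay in code. The sketch below uses a hypothetical helper (uppercase letters as nonterminals, "" standing for ε) and reproduces the first sample derivation; note the helper replaces the leftmost occurrence of the *named* nonterminal, which in this particular derivation is always the leftmost nonterminal overall:

```python
def leftmost_step(form, nonterminal, rhs):
    """Replace the leftmost occurrence of `nonterminal` in the sentential
    form `form` by `rhs` (use "" for an epsilon right-hand side)."""
    i = form.index(nonterminal)          # leftmost occurrence
    return form[:i] + rhs + form[i + 1:]

# S => AB => AAB => aAB => aaB => aabB => aabb
steps = ["S"]
for nt, rhs in [("S", "AB"), ("A", "AA"), ("A", "a"),
                ("A", "a"), ("B", "bB"), ("B", "b")]:
    steps.append(leftmost_step(steps[-1], nt, rhs))
print(" => ".join(steps))   # S => AB => AAB => aAB => aaB => aabB => aabb
```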

Ambiguity in CFG


How to check whether a given CFG is ambiguous?

Q. Check whether the following CFGs are ambiguous:

(a) E → E + E | E * E | id
(b) S → aS | Sa | a
(c) S → aSbS | bSaS | ε
(d) R → R + R | R.R | R* | a | b | c

Ambiguous Grammar (Example 1, Example 2)

Check Your Progress-1

Check whether the following CFG is ambiguous:

bExp → bExp OR bExp
bExp → bExp AND bExp
bExp → NOT bExp
bExp → TRUE
bExp → FALSE

Converting Ambiguous to Unambiguous Grammar:


Converting Ambiguous to Unambiguous Grammar:


Example1:

Note: removing left recursion removes the ambiguity from this expression grammar (now there is only one parse tree for any w), but problems of precedence and associativity still remain.

Converting Ambiguous to Unambiguous Grammar:
Example 2:

Note: removing left recursion and/or left factoring from a grammar can remove the ambiguity in a CFG.

Converting Ambiguous to Unambiguous Grammar:


Example3:

Note: removing left recursion and/or left factoring from a grammar can remove the ambiguity in a CFG.

Ambiguity in CFG (some more examples):

Example


Ambiguity in CFG

▪ Two different parse trees may cause problems in applications which use the derivation tree.
▪ For example: evaluating expressions (take a = 2) and, in general, in compilers for programming languages.

Q.1 Consider the following parse tree for the expression a#b$c$d#e#f, involving two binary
operators $ and #.
Which one of the following is correct for the given parse tree?
(A) $ has higher precedence and is left associative; # is right associative
(B) # has higher precedence and is left associative; $ is right associative
(C) $ has higher precedence and is left associative; # is left associative
(D) # has higher precedence and is right associative; $ is left associative

Ambiguous Grammar
Definition. A grammar G is ambiguous if there is a word w ∈ L(G) having at least two different parse trees, or two or more leftmost derivations (LMDs), or two or more rightmost derivations (RMDs).

S → A
S → B
S → AB
A → aA
B → bB
A → ε
B → ε

Notice that a has at least two left-most derivations.


Two Derivation Trees for the same w => Ambiguous Grammar

S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB

Other derivation trees for w = aabb? Infinitely many others are possible (for example, by expanding A → AA and A → ε over and over).

(The slide shows two different parse trees for w = aabb.)

Another Example of Ambiguous Grammar

Two Derivation Trees for w

Important Note for Ambiguity in CFG

Writing Unambiguous Grammar:

Consider the following grammar:

E → E + E
E → E * E
E → (E)
E → a

Writing Ambiguous to Unambiguous CFG

Check Your Progress-2

Remove the ambiguity in the following grammars:

Ex2: R → R + R | R.R | R* | a | b | c

Ex3: bExp → bExp OR bExp
     bExp → bExp AND bExp
     bExp → NOT bExp
     bExp → TRUE
     bExp → FALSE

Ambiguity

A CFG is ambiguous ⇔ any of the following equivalent statements holds:

• ∃ a string w with multiple (2 or more) derivation trees.
• ∃ a string w with multiple (2 or more) leftmost derivations.
• ∃ a string w with multiple (2 or more) rightmost derivations.

Note: this defines ambiguity of a grammar, not of a language.
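For small grammars and short strings, the "multiple leftmost derivations" test can be brute-forced. The sketch below is illustrative code (the representation and the ε-free assumption are mine): it counts distinct leftmost derivations of a target token string, pruning any sentential form whose terminal prefix already disagrees with the target or which is longer than the target (valid because, in an ε-free grammar, forms never shrink).

```python
def count_leftmost_derivations(grammar, start, target):
    """Count distinct leftmost derivations of `target` (a list of terminal
    tokens) from `start`. Assumes an epsilon-free grammar. The grammar is
    ambiguous for `target` iff the count exceeds 1."""
    count = 0
    stack = [(start,)]                # sentential forms as tuples of symbols
    while stack:
        form = stack.pop()
        nts = [j for j, s in enumerate(form) if s in grammar]
        if not nts:                   # all terminals: a complete derivation
            count += list(form) == target
            continue
        i = nts[0]                    # leftmost nonterminal
        # prune: terminal prefix must match, and the form must not be too long
        if list(form[:i]) != target[:i] or len(form) > len(target):
            continue
        for rhs in grammar[form[i]]:
            stack.append(form[:i] + tuple(rhs) + form[i + 1:])
    return count

AMBIG = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}
print(count_leftmost_derivations(AMBIG, "E", ["id", "+", "id", "*", "id"]))  # 2
```

The count of 2 matches the two parse trees of id+id*id: id+(id*id) and (id+id)*id. Note this is only a semi-check on particular strings; ambiguity of a CFG is undecidable in general.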



Rules for converting an ambiguous grammar to an unambiguous grammar

Generally, productions are ambiguous when they have more than one occurrence of a given non-terminal on their right-hand side.

Rules (for expression grammars, i.e., for precedence and associativity):

1. Check the precedence of the operators involved.
2. Operators with different precedence are treated differently when removing ambiguity.
3. First remove the ambiguity for the minimum-precedence operator, and so on.
4. Then take operator associativity into account while writing the grammar.

Rules for Converting Ambiguous to Unambiguous Grammar:


Rules for Disambiguating a grammar


Example: If a production rule is
S → ASBSy | a1 | a2 | … | an
then ambiguity can be removed by rewriting the production rule as
S → ASBSy | S'
S' → a1 | a2 | … | an

If the operator is left associative, then change the rightmost symbol.
E.g., E → E*E | id can be replaced by
E → E*E' | E'
E' → id

If the operator is right associative, then change the leftmost symbol.
E.g., for a right-associative operator ∘, E → E∘E | id can be replaced by
E → E'∘E | E'
E' → id

Removal of Left Recursion and Left Factoring


Elimination of Left Recursion

A grammar is left recursive if the first symbol on the right-hand side of a rule is the same non-terminal as the one on the left-hand side.
E.g.: S → Sa

Rule to remove left recursion from a grammar of the form
S → Sα1 | Sα2 | … | Sαn | β1 | β2 | … | βm
Rewrite it as
S → β1S' | β2S' | … | βmS'
S' → α1S' | α2S' | … | αnS' | ε
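The rule above can be written down directly. A sketch in Python; the representation is an assumption of mine (productions as lists of symbols, [] standing for ε, and the new nonterminal named by appending a prime):

```python
def remove_immediate_left_recursion(nt, prods):
    """Apply S -> Sa1|...|San|b1|...|bm  =>  S -> b1S'|...|bmS',
    S' -> a1S'|...|anS'|eps. Productions are lists of symbols; [] is eps."""
    alphas = [p[1:] for p in prods if p and p[0] == nt]   # left-recursive tails
    betas = [p for p in prods if not p or p[0] != nt]
    if not alphas:
        return {nt: prods}                                # nothing to do
    new = nt + "'"
    return {nt: [b + [new] for b in betas],
            new: [a + [new] for a in alphas] + [[]]}      # [] is eps

print(remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

This handles only immediate left recursion (S → S…); indirect left recursion needs the full substitution-based algorithm.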


Elimination of Left Factoring

Left factoring is a process which isolates the common parts of two or more productions into a single production. After left factoring, the resulting grammar is suitable for top-down parsing.

Rule to remove left factoring from a grammar of the form
A → αβ1 | αβ2 | … | αβm
Rewrite it as
A → αA'
A' → β1 | β2 | … | βm
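The left-factoring rule can be sketched the same way (same illustrative representation as above: productions as symbol lists, [] for ε):

```python
def left_factor(nt, prods):
    """Apply A -> ab1|ab2|...|abm  =>  A -> aA', A' -> b1|...|bm, where a is
    the longest common prefix of all alternatives. Returns the grammar
    unchanged when there is nothing to factor."""
    prefix = []
    for column in zip(*prods):            # walk the alternatives in lockstep
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    if not prefix:
        return {nt: prods}
    new = nt + "'"
    return {nt: [prefix + [new]],
            new: [p[len(prefix):] for p in prods]}   # may contain [] (eps)

print(left_factor("A", [["a", "b"], ["a", "c"]]))
# {'A': [['a', "A'"]], "A'": [['b'], ['c']]}
```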


End of Module-1

**************


Extra slide for Module-1

**************

Regular Expression


Introduction of Lexical Analysis


Lexical analysis is the first phase of the compiler, also known as the scanner. Its main task is to read the input characters and produce a sequence of tokens that the parser (the next phase) uses for syntax analysis.

Lexical analysis can be implemented with a deterministic finite automaton (DFA).

The output is a sequence of tokens that is sent to the parser for syntax analysis.

Some Important terms:


1. Pattern: a rule which describes the set of lexemes that can represent a particular token in the source program. E.g., an identifier can be described as a letter followed by letters or digits.

2. Lexeme: lexemes are the smallest logical units of a program. A lexeme is a sequence of characters in the source program for which a token is produced. E.g.: 10, int, +, etc.

3. Token: a sequence of characters that can be treated as a unit in the grammar of the programming language.
Classes of similar lexemes are identified by the same token.
E.g.: identifier, keyword, operator, constant, delimiter, etc.

Introduction of Lexical Analysis

A token may have two parts: <token-name, token-value>.
token-name is the type, and token-value points to an entry in the symbol table for the token.
Examples of non-tokens:
comments, preprocessor directives, macros, blanks, tabs, newlines, etc.

Symbol Table in Compiler


The symbol table is an important data structure created and maintained by the compiler in order to keep track of the semantics of variables, i.e., it stores information about the scope and binding of names, and about instances of various entities such as variable and function names, classes, objects, etc.
It is built in the lexical and syntax analysis phases.
The information is collected by the analysis phases of the compiler and is used by the synthesis phases to generate code.
It is used by the compiler to achieve compile-time efficiency.

Symbol Table in Compiler


Items stored in the symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler-generated temporaries
• Labels in source languages

Information used by the compiler from the symbol table:
• Data type and name
• Declaring procedure
• Offset in storage
• If a structure or record, a pointer to the structure table
• For parameters, whether passing is by value or by reference
• Number and types of arguments passed to a function
• Base address

Specification of Tokens (Patterns)
Regular expressions are an important notation for specifying patterns. A regular expression is a grammar: it defines the rule. For the grammar, first we have to define the alphabet.

An alphabet is a finite, non-empty set of symbols.
• We use the symbol ∑ (sigma) to denote an alphabet.
• Examples:
  • Binary: ∑ = {0,1}
  • All lower-case letters: ∑ = {a,b,…,z}
  • Digits: ∑ = {0,1,…,9}

Regular Expression

A string or word is a finite sequence of symbols chosen from ∑.

• The empty string is ε (or "epsilon").

• The length of a string w, denoted |w|, is equal to the number of (non-ε) characters in the string.
  • E.g., x = 010100, |x| = 6
  • x = 01 ε 0 ε 1 ε 00 ε ⇒ |x| = ? (still 6; ε adds nothing)

• xy = the concatenation of the two strings x and y.

Example of Minimization of DFA


Transition diagram for Different types of Tokens


1. Identifiers:

2. Relational Operator


Transition diagram for Different types of Tokens


3. Keyword


Introduction of Lexical Analysis: Valid Tokens

Example 1: for a small C program, the generated tokens are:
'int', 'main', '(', ')', '{', 'int', 'a', ',', 'b', ';', 'a', '=', '10', ';', 'return', '0', ';', '}'

Introduction of Lexical Analysis


Example 2: → there are 5 valid tokens in this printf statement.

Example 3: int max(int i);
• The lexical analyzer first reads int, finds it to be valid, and accepts it as a token.
• max is read and found to be a valid function name after reading (.
• int is also a token, then i is another token, and finally ).
Answer: total number of tokens is 7: int, max, (, int, i, ), ;

Bootstrapping
Bootstrapping is widely used in compiler development.
Bootstrapping is used to produce a self-hosting compiler: a compiler that can compile its own source code.
A bootstrap compiler is used to compile the compiler; this compiled compiler can then compile everything else, as well as future versions of itself.

A compiler can be characterized by three languages:
• Source language
• Target language
• Implementation language

Bootstrapping
The T-diagram shows a compiler, written SCIT, for source language S and target language T, implemented in language I.

Follow these steps to produce a compiler for a new language L on machine A:

1. Create a compiler SCAA for a subset S of the desired language L, using language A, such that the compiler runs on machine A.

2. Create a compiler LCSA for language L, written in the subset S of L.

Bootstrapping
3. Compile LCSA using the compiler SCAA to obtain LCAA. LCAA is a compiler for language L which runs on machine A and produces code for machine A.

Writing a Program in C for any given Regular Expression (or any Language)

Program #1: Write a program in C/C++ to implement a deterministic finite automaton (DFA) accepting the language
L = {a^n b^m | n mod 2 = 0, m ≥ 1}

The regular expression for the language L is
L = (aa)*.b+

Example (output):

Input: aabbb
Output: ACCEPTED        // n = 2 (even), m = 3 (>= 1)
Input: aaabbb
Output: NOT ACCEPTED    // n = 3 (odd), m = 3

L = (aa)*.b+

Approach: there are 3 steps which result in acceptance of a string:

1. Construct an FA for (aa)*, i.e., an even number of a's.
2. Construct an FA for b+, i.e., one or more b's.
3. Concatenate the two FAs into a single DFA.
Any other combination results in rejection of the input string.

The DFA has the following states: state 3 leads to acceptance of the string, whereas states 0, 1, 2, and 4 lead to rejection of the string.
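The three construction steps can be checked with a direct table-driven simulation. A Python sketch; the state numbering follows the convention above that only state 3 accepts, but the exact transition table is my reconstruction, not copied from the slides:

```python
# DFA for (aa)*b+ : state 3 accepts; state 4 is the dead (reject) state.
DELTA = {
    (0, "a"): 1, (0, "b"): 3,   # even a's seen; a b may start the b-run
    (1, "a"): 0, (1, "b"): 4,   # odd a's seen: a b here is fatal
    (3, "a"): 4, (3, "b"): 3,   # inside the b-run; an a can never follow
    (4, "a"): 4, (4, "b"): 4,   # dead state
}

def accepts(s):
    state = 0
    for ch in s:
        state = DELTA.get((state, ch), 4)
        if state == 4:
            return False
    return state == 3

print(accepts("aabbb"))    # True  (n = 2 even, m = 3 >= 1)
print(accepts("aaabbb"))   # False (n = 3 odd)
```

Note that "b" alone is accepted, since (aa)* matches the empty string (n = 0 is even).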


Problem #2
Design a deterministic finite automaton (DFA) with ∑ = {0, 1} that accepts the language of strings ending with "01" over {0, 1}.
Solution:
L = {01, 001, 10001, …}
The minimum string length is 2, so a minimum of 3 states is required, and its DFA is:
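That 3-state machine can be written out as a transition table. A Python sketch (the state names are illustrative: q0 is the start, q1 means "just saw 0", and q2, the accepting state, means "just saw 01"):

```python
# DFA for (0+1)*01 : q2 (just read "01") is the only accepting state.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}

def ends_with_01(s):
    state = "q0"
    for ch in s:
        state = DELTA[(state, ch)]
    return state == "q2"

print(ends_with_01("11001"))   # True
print(ends_with_01("0110"))    # False
```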


Problem #2
Solution:
L = {01, 001, 10001, …} = (0+1)*01

Problem #3

L={Odd no of 0’s and Odd no of 1’s}
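For Problem #3, the standard construction is a four-state parity DFA whose state is the pair (zeros mod 2, ones mod 2), accepting exactly in state (odd, odd). A Python sketch (the tuple encoding of the states is my choice):

```python
def odd_zeros_odd_ones(s):
    """Four-state parity DFA: the state is (zeros mod 2, ones mod 2);
    accept exactly in state (1, 1), i.e., both counts odd."""
    z, o = 0, 0                      # start state: (even, even)
    for ch in s:
        if ch == "0":
            z ^= 1                   # flip parity of 0's
        else:
            o ^= 1                   # flip parity of 1's
    return (z, o) == (1, 1)

print(odd_zeros_odd_ones("01"))     # True  (one 0, one 1)
print(odd_zeros_odd_ones("0011"))   # False (both counts even)
```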
