Chapter 3 - Syntax Analysis
Compiler Design
The Concept of Syntax Analysis
• The tokens generated by the lexical analyzer are accepted by the next phase of the compiler, i.e. the syntax analyzer.
• The syntax analyzer, or parser, comes after lexical analysis.
• It is one of the important components of the front end of the compiler.
• Syntax analysis is the second phase of compilation.
• The syntax analyzer (parser) checks the syntax of the language.
• The syntax analyzer takes the tokens from the lexical analyzer and groups them in such a way that some programming structure (syntax) can be recognized.
• After grouping the tokens, if no valid syntax can be recognized, a syntactic error is generated.
• This overall process is called syntax checking of the language.
Definition of Parser
• Parsing, or syntax analysis, is a process that takes an input string w and produces either a parse tree (syntactic structure) or syntactic errors.
For example: a = b + 10;
• The above programming statement is given to the lexical analyzer.
• The lexical analyzer divides it into a group of tokens.
• The syntax analyzer takes the tokens as input and generates a tree-like structure called a parse tree.
• The parse tree for a programming statement shows how the statement gets parsed according to its syntactic specification.
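The token-splitting step for a = b + 10; can be sketched in Python as follows. This is only an illustration: the token class names (ID, NUM, ASSIGN, PLUS, SEMI) and the function tokenize are assumptions made for this sketch, not names defined in this chapter.

```python
import re

# Hypothetical token classes for illustration; a real lexer defines many more.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (token_class, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("a = b + 10;"):
    print(token)
```

The parser would then consume this token list one token at a time instead of reading raw characters.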
Role of Parser
• In the process of compilation, the parser and lexical analyzer work together.
• That is, when the parser requires tokens, it invokes the lexical analyzer.
• In turn, the lexical analyzer supplies tokens to the syntax analyzer (parser).
• The parser collects a sufficient number of tokens and builds a parse tree.
• Thus, by building the parse tree, the parser finds the syntactical errors, if any.
• The parser should also recover from commonly occurring errors so that processing of the remaining input can continue.
Why are the Lexical and Syntax Analyzers Separated?
• The lexical analyzer scans the input program and collects the tokens from it.
• The parser builds a parse tree using these tokens.
• These are two important activities, and they are carried out independently by these two phases of the compiler.
• Separating these two phases has two advantages:
1. It accelerates the process of compilation.
2. Errors in the source input can be identified precisely.
Basic Issues in Parsing
• There are two important issues in parsing:
1. Specification of syntax
2. Representation of input after parsing
• A very important issue in parsing is the specification of syntax in a programming language.
• Specification of syntax means how to write any programming language statement.
• A specification of syntax should have these characteristics:
i. The specification should be precise and unambiguous.
ii. The specification should be detailed.
iii. The specification should be complete.
• Such a specification is called a "Context Free Grammar" (CFG).
• Another important issue in parsing is the representation of the input after parsing.
• Lastly, the most crucial issue is the parsing algorithm.
Context Free Grammars
• The input can be specified by using a "Context Free Grammar".
• The context free grammar G is a collection of the following things:
1. V is a set of non-terminals
2. T is a set of terminals
3. S is the start symbol
4. P is a set of production rules
• Thus G can be represented as G = (V, T, S, P)
• The production rules are given in the following form:
A → α, where A ∈ V and α ∈ (V ∪ T)*
• Example 1: Let the language L = { a^n b^n | n > 1 }.
• Let G = (V, T, S, P), where V = {S}, T = {a, b} and S is the start symbol; then give the production rules.
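As a hedged illustration, the four components of G can be written down as plain Python data. The production rules for this example are not shown in the text, so the sketch assumes the usual rules S → aSb | ab, which generate a^n b^n for every n ≥ 1 (the strings with n > 1 are included). The helper name derive is hypothetical.

```python
# G = (V, T, S, P) as plain Python data.
# ASSUMPTION: the productions S -> aSb | ab are the usual ones for a^n b^n;
# they are not given in the text above.
V = {"S"}
T = {"a", "b"}
S = "S"
P = {"S": [["a", "S", "b"], ["a", "b"]]}


def derive(n):
    """Return the sentential forms of a derivation of a^n b^n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = [S]
    steps = ["".join(form)]
    for i in range(n):
        # Use S -> aSb while more pairs are needed, then finish with S -> ab.
        rhs = P["S"][0] if i < n - 1 else P["S"][1]
        idx = form.index("S")
        form[idx:idx + 1] = rhs
        steps.append("".join(form))
    return steps


print(" => ".join(derive(3)))
```

Each step replaces the single non-terminal S by the right-hand side of one production, which is exactly what a derivation does.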
[Figure: parse tree with nodes State, Type, List and Terminator and leaves int, id, ",", id and ";"]
Fig. 3.4. Parse Tree for Derivation of int id , id ;
• Hence, int id, id; can be defined by means of the above context free grammar.
Derivation and Parse Tree
• Derivation from S means generation of a string w from S.
• For constructing a derivation, two things are important:
i. Choice of the non-terminal to expand from among several others.
ii. Choice of a rule from the production rules for the corresponding non-terminal.
Definition of Derivation Tree
• Let G = (V, T, S, P) be a Context Free Grammar.
• The derivation tree is a tree that satisfies the following properties:
i. The root has label S.
ii. Every vertex has a label from (V ∪ T ∪ {ε}).
iii. If there exists a vertex A with children R1, R2, …, Rn, then there should be a production A → R1 R2 … Rn.
iv. The leaf nodes are from set T (or ε) and the interior nodes are from set V.
• Solution:
Leftmost derivation     Rightmost derivation
B → aBB
B → aBB                 B → bS
B → aBB                 S → bA
B → bS                  A → a
S → bA                  B → aBB
A → a                   B → b
B → b                   B → bS
B → bS                  S → bA
S → bA                  A → a
A → a
• The parse tree for the above derivation is as given below:
• Also obtain the leftmost and rightmost derivations for the string ‘aaabaab’ using the above grammar.
• Solution:
• The derivation trees can be drawn as follows:
[Figure: (a) Derivation Tree for Leftmost Derivation; (b) Derivation Tree for Rightmost Derivation]
Ambiguous Grammar
• A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of the language L(G).
Example 1:
[Figure: (a) Parse Tree 1 and (b) Parse Tree 2, built from E + E nodes with id leaves, for the input strings id+id*id and id+id+id]
• There are two different parse trees for deriving each of the strings id+id*id and id+id+id.
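The count of parse trees can be checked mechanically. The grammar for this example is not shown in the text, so the sketch below assumes the usual ambiguous expression grammar E → E + E | E * E | id. The function name count_parse_trees is hypothetical; it counts trees by trying every operator as the root of the expression.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count distinct parse trees for tokens under E -> E + E | E * E | id.

    ASSUMPTION: this grammar is chosen for illustration only.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root operator of tokens[i:j].
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
print(count_parse_trees(["id", "+", "id", "+", "id"]))
```

A result greater than 1 for any sentence is enough to call the grammar ambiguous.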
• There are two different parse trees for deriving string aab.
• Hence the above given grammar is an ambiguous grammar.
Parsing Techniques
Top-down Parsing Techniques
• If a non-terminal has multiple production rules beginning with the same input symbol, then to get the correct derivation we need to try all of these possibilities.
• Secondly, in backtracking we need to move some levels upward in order to check other possibilities.
• This adds a lot of overhead to the implementation of parsing.
• Hence it becomes necessary to eliminate backtracking by modifying the grammar.
• A backtracking parser tries different production rules to find a match for the input string, backtracking each time a rule fails.
• Backtracking is more powerful than predictive parsing.
• However, a backtracking parser is slower and requires exponential time in general.
• Hence, backtracking is not preferred for practical compilers.
Limitation:
• If the given grammar has a large number of alternatives, then the cost of backtracking is high.
Predictive Parser
• As the name indicates, a predictive parser tries to predict the next construction using one or more lookahead symbols from the input string.
• There are two types of predictive parser:
i. Recursive Descent Parser
ii. LL(1) Parser
i. Recursive Descent Parser
• A parser that uses a collection of recursive procedures for parsing the given input string is called a Recursive Descent (RD) Parser.
• In this type of parser, the CFG is used to build the recursive routines.
• The RHS of the production rule is directly converted to a program.
• For each non-terminal a separate procedure is written, and the body of the procedure (code) is the RHS of the corresponding non-terminal.
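A minimal sketch of this idea in Python is given below. The grammar E → T E', E' → + T E' | ε, T → id | ( E ) is a hypothetical example chosen for illustration, and the class and method names are assumptions of this sketch. Each non-terminal gets one method whose body follows the RHS of its rules.

```python
class RecursiveDescentParser:
    """One method per non-terminal; each body mirrors the RHS of its rules.

    ASSUMED grammar (illustration only):
        E  -> T E'
        E' -> + T E' | epsilon
        T  -> id | ( E )
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead()!r}")
        self.pos += 1

    def parse_E(self):  # E -> T E'
        self.parse_T()
        self.parse_E_prime()

    def parse_E_prime(self):  # E' -> + T E' | epsilon
        if self.lookahead() == "+":
            self.match("+")
            self.parse_T()
            self.parse_E_prime()
        # Otherwise take the epsilon alternative and consume nothing.

    def parse_T(self):  # T -> id | ( E )
        if self.lookahead() == "id":
            self.match("id")
        elif self.lookahead() == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected token {self.lookahead()!r}")

    def parse(self):
        self.parse_E()
        self.match("$")
        return True


def accepts(tokens):
    """Return True if the token list is a sentence of the assumed grammar."""
    try:
        return RecursiveDescentParser(tokens).parse()
    except SyntaxError:
        return False


print(accepts(["id", "+", "(", "id", "+", "id", ")"]))
```

Because one lookahead token is enough to choose an alternative in this grammar, the procedures never need to backtrack.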
• To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single node labeled S and the input pointer pointing to c, the first symbol of w.
• S has only one production, so we use it to expand S and obtain the tree shown in the figure.
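The grammar for the cad example is not shown in the text. The sketch below assumes the common textbook grammar S → cAd, A → ab | a, and shows how a backtracking top-down parser tries the alternatives of A in order. The function names are hypothetical, and the sketch would loop forever on a left-recursive grammar.

```python
# ASSUMED grammar for the input w = cad (not given in the text):
#   S -> c A d
#   A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every input position reachable after deriving symbol from text[pos:]."""
    if symbol not in GRAMMAR:  # terminal: must match the input exactly
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for rhs in GRAMMAR[symbol]:  # try the alternatives in order
        yield from parse_sequence(rhs, text, pos)


def parse_sequence(symbols, text, pos):
    """Yield every position reachable after deriving the whole symbol sequence."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], text, pos):
        # If the rest of the sequence fails from mid, the generator simply
        # moves on to the next candidate: this is the backtracking step.
        yield from parse_sequence(symbols[1:], text, mid)


def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))


print(accepts("cad"))
```

For cad, the parser first tries A → ab, fails to match b against d, goes back to the position after c, and then succeeds with A → a.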
• The simple block diagram for the LL(1) parser is as given below.
• The data structures used by the LL(1) parser are:
Input buffer
Stack
Parsing table
• The LL(1) parser uses the input buffer to store the input tokens.
• The stack is used to hold the left sentential form.
• The symbols on the RHS of a rule are pushed onto the stack in reverse order, i.e. from right to left.
• Thus, the use of a stack makes this algorithm non-recursive.
• The table is basically a two-dimensional array.
• The table has a row for each non-terminal and a column for each terminal.
• The table can be represented as M[A, a], where A is a non-terminal and a is the current input symbol.
• The parser works as follows:
• The parsing program reads the top of the stack and the current input symbol.
• With the help of these two symbols, the parsing action is determined.
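• The parsing action can be:
Top of stack    Input token    Parsing action
$               $              Parsing is successful; halt.
a               a              Pop a and advance the lookahead to the next token.
a               b              Error.
A               a              Refer to table entry M[A, a]. If M[A, a] is A → PQR, pop A and push R, Q, P (in that order); if M[A, a] is an error entry, report an error.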
• The parser consults the table M[A, a] each time it takes a parsing action; hence this type of parsing method is called a table-driven parsing algorithm.
• The configuration of an LL(1) parser can be defined by the top of the stack and a lookahead token.
• The configurations are processed one by one, and the input is successfully parsed if the parser reaches the halting configuration.
• When the stack is empty and the next token is $, the parse is successful.
• Construction of Predictive LL(1) Parser
• The construction of a predictive LL(1) parser is based on two very important functions: FIRST and FOLLOW.
• To construct a predictive LL(1) parser, we follow these steps:
1. Compute the FIRST and FOLLOW functions.
2. Construct the predictive parsing table using the FIRST and FOLLOW functions.
3. Stack implementation.
4. Construct the parse tree.
First Function
• FIRST(α) is the set of terminal symbols that begin the strings derived from α.
Example: A → abc / def / ghi
then FIRST(A) = {a, d, g}
Rules for Calculating FIRST Function
1. For a production rule X → ε, FIRST(X) = {ε}
2. For any terminal symbol a, FIRST(a) = {a}
3. For a production rule X → Y1 Y2 Y3
• Calculating FIRST(X)
If ε does not belong to FIRST(Y1), then FIRST(X) = FIRST(Y1).
If ε belongs to FIRST(Y1), then FIRST(X) = {FIRST(Y1) − ε} ∪ FIRST(Y2 Y3).
Note
• ε may appear in the First function of a non-terminal.
• ε will never appear in the Follow function of a non-terminal.
• It is recommended to eliminate left recursion from the grammar, if present, before calculating the First and Follow functions.
• We calculate the Follow function of a non-terminal by looking at where it appears on the RHS of a production rule.
• Example 1: Calculate the First and Follow functions of the given grammar
S → aBDh
B → cC
C → bC / ε
D → EF
E → g / ε
F → f / ε
Solution:
Step 1: Calculating First and Follow Function
      FIRST          FOLLOW
S     {a}            {$}
B     {c}            {g, f, h}
C     {b, ε}         {g, f, h}
D     {g, f, ε}      {h}
E     {g, ε}         {f, h}
F     {f, ε}         {h}
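The FIRST and FOLLOW sets of this example can be reproduced with a small fixed-point computation. The sketch below is one possible way to do it, not the only one. It takes the third production as C → bC / ε, which is the form the later parse of acbgh requires. It uses the standard FOLLOW rules: $ is in FOLLOW of the start symbol; for A → αBβ, FIRST(β) − {ε} is added to FOLLOW(B); and if β can derive ε, FOLLOW(A) is added as well. All names (GRAMMAR, compute_first, compute_follow) are hypothetical.

```python
EPS = "ε"
GRAMMAR = {
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], [EPS]],
    "D": [["E", "F"]],
    "E": [["g"], [EPS]],
    "F": [["f"], [EPS]],
}
START = "S"


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given the FIRST sets known so far."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the string can vanish
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue  # FOLLOW is defined for non-terminals only
                    tail = first_of_sequence(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # the rest of the rule can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```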
Step 2: Construct the parse table using the First and Follow functions.
      a            b          c          f          g          h          $
S     S → aBDh
B                             B → cC
C                  C → bC                C → ε      C → ε      C → ε
D                                        D → EF     D → EF     D → EF
E                                        E → ε      E → g      E → ε
F                                        F → f                 F → ε
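The way the table entries follow from FIRST and FOLLOW can be sketched as below. The FIRST and FOLLOW sets are copied from Step 1, and the grammar uses C → bC / ε. The rule applied is the standard one: a production A → α goes into M[A, a] for every terminal a in FIRST(α), and, if ε is in FIRST(α), also for every symbol in FOLLOW(A). The names build_table and first_of are hypothetical.

```python
EPS = "ε"
GRAMMAR = {
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], [EPS]],
    "D": [["E", "F"]],
    "E": [["g"], [EPS]],
    "F": [["f"], [EPS]],
}
# FIRST and FOLLOW sets taken from Step 1.
FIRST = {"S": {"a"}, "B": {"c"}, "C": {"b", EPS},
         "D": {"g", "f", EPS}, "E": {"g", EPS}, "F": {"f", EPS}}
FOLLOW = {"S": {"$"}, "B": {"g", "f", "h"}, "C": {"g", "f", "h"},
          "D": {"h"}, "E": {"f", "h"}, "F": {"h"}}


def first_of(rhs):
    """FIRST of a right-hand side, using the FIRST sets of its symbols."""
    result = set()
    for sym in rhs:
        if sym == EPS:
            continue
        sym_first = FIRST.get(sym, {sym})  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    """Return M as a dict mapping (non-terminal, terminal) to a right-hand side."""
    table = {}
    for nt, alternatives in GRAMMAR.items():
        for rhs in alternatives:
            lookaheads = first_of(rhs)
            if EPS in lookaheads:
                lookaheads = (lookaheads - {EPS}) | FOLLOW[nt]
            for terminal in lookaheads:
                if (nt, terminal) in table:
                    raise ValueError(f"conflict at M[{nt}, {terminal}]: not LL(1)")
                table[(nt, terminal)] = rhs
    return table


M = build_table()
for (nt, terminal), rhs in sorted(M.items()):
    print(f"M[{nt}, {terminal}] = {nt} → {''.join(rhs)}")
```

If any cell received two productions, build_table would raise an error, which is a simple way to detect that a grammar is not LL(1).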
Step 3: Stack Implementation
Input string: acbgh$
[Figure: parse tree for the string acbgh, with S → a B D h, B → c C, C → b C, C → ε, D → E F, E → g and F → ε]
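The stack implementation for acbgh can be sketched as a table-driven loop. The table below is the Step 2 table written as a Python dictionary, with C → bC; the function name ll1_parse and the returned list of applied productions are assumptions of this sketch.

```python
EPS = "ε"
NON_TERMINALS = {"S", "B", "C", "D", "E", "F"}
# Parsing table M[A, a] from Step 2 (with C → bC).
TABLE = {
    ("S", "a"): ["a", "B", "D", "h"],
    ("B", "c"): ["c", "C"],
    ("C", "b"): ["b", "C"],
    ("C", "f"): [EPS], ("C", "g"): [EPS], ("C", "h"): [EPS],
    ("D", "f"): ["E", "F"], ("D", "g"): ["E", "F"], ("D", "h"): ["E", "F"],
    ("E", "g"): ["g"], ("E", "f"): [EPS], ("E", "h"): [EPS],
    ("F", "f"): ["f"], ("F", "h"): [EPS],
}


def ll1_parse(tokens, start="S"):
    """Return the productions applied while parsing, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    pos = 0
    applied = []
    while True:
        top, current = stack[-1], buffer[pos]
        if top == "$" and current == "$":
            return applied  # halting configuration: successful parse
        if top in NON_TERMINALS:
            rhs = TABLE.get((top, current))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {current}]")
            stack.pop()
            # Push the RHS in reverse order so its first symbol ends up on top.
            stack.extend(sym for sym in reversed(rhs) if sym != EPS)
            applied.append(f"{top} → {' '.join(rhs)}")
        elif top == current:
            stack.pop()  # terminal on top matches the input: pop and advance
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {current!r}")


for production in ll1_parse(list("acbgh")):
    print(production)
```

The productions are printed in the order of a leftmost derivation, which is also the order in which the parse tree is built from the root downward.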
• Example 2: Calculate the First and Follow functions of the given grammar
Parsing table entries (by non-terminal):
S     S → A
A     A → aBA'
B     B → b
C     C → g
• Step 3: Stack Implementation by using Parsing Table.
String = abd
[Figure: parse tree for the string abd, with nodes a, B, A', b and ε]
• Example 3: Show that the following grammar is LL(1).
S → AaAb | BbBa
A → ε
B → ε
• Solution: First we construct FIRST and FOLLOW for the given grammar, and then the parsing table.
      FIRST       FOLLOW
S     {a, b}      {$}
A     {ε}         {a, b}
B     {ε}         {a, b}

      a             b             $
S     S → AaAb      S → BbBa
A     A → ε         A → ε
B     B → ε         B → ε
• Now consider the string "ba" for parsing:
Stack      Input    Production
S$         ba$      S → BbBa
BbBa$      ba$      B → ε
bBa$       ba$      Pop b
Ba$        a$       B → ε
a$         a$       Pop a
$          $        Accept
• This shows that the given grammar is LL(1).
• Parse tree for the above LL(1) grammar:
[Figure: parse tree with root S and children B, b, B, a, where each B derives ε]
• Example 4: Construct the LL(1) parsing table for the following grammar.
S → aB | aC | Sd | Se
B → bBc | f
C → g
• Solution: First we construct FIRST and FOLLOW for the given grammar, and then the parsing table.
      FIRST       FOLLOW
S     {a}         {d, e, $}
B     {b, f}      {c, d, e, $}
C     {g}         {d, e, $}

      a                   b            c     d     f          g          $
S     S → aB, S → aC
B                         B → bBc                  B → f
C                                                             C → g
Bottom-Up Parser
– The parse tree is constructed in a bottom-up manner as follows:
– Step 1: We start from the leaf nodes.
– Step 5: Reducing id to L. L → id
Shift Reduce Parser
– Solution:
• Construct the parse tree in a bottom-up manner.
[Figure: parse trees built bottom-up, one with nodes E, -, * and id leaves, and one with nodes T, L, int, id, "," and ";"]
– Example 3: Consider the grammar
E → 2E2
E → 3E3
E → 4
• Parse the input string 32423 using a shift reduce parser.
– Solution:
Stack     Input Buffer    Parsing Action
$         32423$          Shift 3
$3        2423$           Shift 2
$32       423$            Shift 4
$324      23$             Reduce by E → 4
$32E      23$             Shift 2
$32E2     3$              Reduce by E → 2E2
$3E       3$              Shift 3
$3E3      $               Reduce by E → 3E3
$E        $               Accept
• Construct the parse tree in a bottom-up manner:
[Figure: parse tree with root E → 3 E 3, inner E → 2 E 2 and innermost E → 4]
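The shift-reduce trace for 32423 can be reproduced with a short sketch. It assumes a simple strategy: reduce whenever the top of the stack matches the RHS of a production, otherwise shift. That strategy is enough for this small grammar, but a general shift-reduce parser needs a parsing table to decide between shifting and reducing. The function name shift_reduce_parse is hypothetical.

```python
PRODUCTIONS = [
    ("E", ["2", "E", "2"]),
    ("E", ["3", "E", "3"]),
    ("E", ["4"]),
]


def shift_reduce_parse(tokens, start="E"):
    """Return the trace as (stack, input buffer, action) rows, or raise SyntaxError."""
    stack = ["$"]
    buffer = list(tokens) + ["$"]
    trace = []
    while True:
        for lhs, rhs in PRODUCTIONS:
            # ASSUMPTION: reduce greedily as soon as a handle is on top.
            if stack[-len(rhs):] == rhs:
                trace.append(("".join(stack), "".join(buffer),
                              f"Reduce by {lhs} → {''.join(rhs)}"))
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:  # no reduction was possible
            if stack == ["$", start] and buffer == ["$"]:
                trace.append(("".join(stack), "".join(buffer), "Accept"))
                return trace
            if buffer == ["$"]:
                raise SyntaxError("input exhausted before reaching the start symbol")
            trace.append(("".join(stack), "".join(buffer), f"Shift {buffer[0]}"))
            stack.append(buffer.pop(0))


for stack_text, input_text, action in shift_reduce_parse("32423"):
    print(f"{stack_text:<8}{input_text:<10}{action}")
```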
Operator Precedence Parsing
– A grammar G is called an operator precedence grammar if it meets the following two conditions:
1. There exists no production rule which contains ε (epsilon) on its right-hand side (RHS).
2. There exists no production rule which contains two non-terminals adjacent to each other on its right-hand side (RHS).
– A parser that reads and understands an operator precedence grammar is called an operator precedence parser.
– Example 1: The grammar
E → EAE | (E) | -E | id
A → + | - | * | / | ^
is not an operator precedence grammar, because E → EAE has adjacent non-terminals.
– Example 2: The grammar
E → E + E | E – E | E * E | E / E | E ^ E | (E) | -E | id
is an operator precedence grammar.
– Operator precedence can be established between the terminals of the grammar.
– It ignores the non-terminals.
– Parsing action
1. Add the $ symbol at both ends of the given input string.
2. Scan the input from left to right until the first > is encountered.
3. Scan towards the left over all the equal-precedence (=) relations until the first < is encountered.
4. Everything between that < and the > is the handle.
5. $ on the stack and $ in the input means parsing is successful.
– There are three operator precedence relations:
1. a > b means terminal a has higher precedence than terminal b.
2. a < b means terminal a has lower precedence than terminal b.
3. a = b means terminals a and b have the same precedence.
– Rules
• id, a, b, c have high precedence.
• $ has low precedence.
• + > + and * > *
• id ≠ id (there is no relation between two identifiers).
• $ against $ means Accept.
– Precedence table (x marks an error entry, A marks accept)
        +     *     (     )     id    $
+       >     <     <     >     <     >
*       >     >     <     >     <     >
(       <     <     <     =     <     x
)       >     >     x     >     x     >
id      >     >     x     >     x     >
$       <     <     <     x     <     A
– Example 1: Consider the following grammar, construct the operator precedence parser, and then parse the string id+id*id.
E → EAE | id
A → + | *
– Solution:
• Step 1: Convert the given grammar to an operator precedence grammar.
E → E + E | E * E | id
• Step 2: Construct the operator precedence table. The terminal symbols are {id, +, *, $}.
• Relation table
        id    +     *     $
id            >     >     >
+       <     >     <     >
*       <     >     >     >
$       <     <     <     A
• Step 3: Parse the given string id+id*id
Stack        Relation    Input        Action
$            <           id+id*id$    Shift id
$id          >           +id*id$      Reduce by E → id
$E           <           +id*id$      Shift +
$E+          <           id*id$       Shift id
$E+id        >           *id$         Reduce by E → id
$E+E         <           *id$         Shift *
$E+E*        <           id$          Shift id
$E+E*id      >           $            Reduce by E → id
$E+E*E       >           $            Reduce by E → E * E
$E+E         >           $            Reduce by E → E + E
$E           A           $            Accept
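The parse of id+id*id can be replayed with a small sketch driven by the relation table of Step 2. The relation between the topmost terminal on the stack and the next input symbol decides the action: < means shift, > means reduce. The sketch finds the handle by matching the top of the stack against the RHS of a production, which is a simplification that works for this grammar. The names REL, top_terminal and operator_precedence_parse are hypothetical.

```python
# Relation table from Step 2: REL[(a, b)] is the relation between terminal a
# (topmost terminal on the stack) and terminal b (next input symbol).
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}
PRODUCTIONS = [
    ("E", ["id"]),
    ("E", ["E", "+", "E"]),
    ("E", ["E", "*", "E"]),
]


def top_terminal(stack):
    """Return the topmost terminal on the stack, skipping the non-terminal E."""
    for sym in reversed(stack):
        if sym != "E":
            return sym


def operator_precedence_parse(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack = ["$"]
    buffer = list(tokens) + ["$"]
    actions = []
    while True:
        a, b = top_terminal(stack), buffer[0]
        if a == "$" and b == "$":
            if stack != ["$", "E"]:
                raise SyntaxError("input does not reduce to E")
            actions.append("Accept")
            return actions
        relation = REL.get((a, b))
        if relation == "<":
            actions.append(f"Shift {b}")
            stack.append(buffer.pop(0))
        elif relation == ">":
            for lhs, rhs in PRODUCTIONS:
                if stack[-len(rhs):] == rhs:  # the handle is on top of the stack
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    actions.append(f"Reduce by {lhs} → {' '.join(rhs)}")
                    break
            else:
                raise SyntaxError("no handle found on the stack")
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")


for action in operator_precedence_parse(["id", "+", "id", "*", "id"]):
    print(action)
```

Because * has higher precedence than +, the parser shifts * after E+E instead of reducing, so id*id is grouped first.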
LR Parser
Reading Assignment!!!
Thank You!!!
End of Part 1!!!