Syntax Analysis
[Figure: syntax tree for an if statement; nodes if, ==, =, ;, b, 0, a, b]
• Grammar
list → list + digit
| list – digit
| digit
digit → 0 | 1 | … | 9
Examples …
Syntax analyzers
• Testing for membership whether w
belongs to L(G) is just a “yes” or
“no” answer
Derivation
• If there is a production A → α then we say that A derives α and is denoted by A ⇒ α
• α A β ⇒ α γ β if A → γ is a production
• If α1 ⇒ α2 ⇒ … ⇒ αn then α1 ⇒+ αn
[Figure: parse tree with interior nodes list + digit, list – digit and digit, and digit leaves 9, 5 and 2]
Ambiguity
• A Grammar can have more than
one parse tree for a string
• Consider grammar
string → string + string
| string – string
| 0 | 1 | … | 9
[Figure: two parse trees for one string built from the digits 9, 5 and 2]
Ambiguity …
• Ambiguity is problematic because meaning
of the programs can be incorrect
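The effect of ambiguity on meaning can be sketched by evaluating every parse tree of one string under the grammar string → string + string | string – string | 0 | … | 9. The input 9 - 5 + 2 and the helper name all_values are assumptions made for illustration only.

```python
from functools import lru_cache


def all_values(tokens):
    """Return the value of every parse tree of `tokens` under the ambiguous grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def values(lo, hi):
        # tokens[lo:hi] is a single digit
        if hi - lo == 1:
            return (int(tokens[lo]),)
        out = []
        # try every operator position as the root of the parse tree
        for k in range(lo + 1, hi - 1, 2):
            for left in values(lo, k):
                for right in values(k + 1, hi):
                    out.append(left + right if tokens[k] == "+" else left - right)
        return tuple(out)

    return sorted(values(0, len(tokens)))


print(all_values(["9", "-", "5", "+", "2"]))  # [2, 6]
```

The two results correspond to the groupings 9 - (5 + 2) and (9 - 5) + 2, so the same string gets two different meanings.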
Parsing
• Process of determining whether a string can be generated by a grammar
– Top-down parsing:
Construction of the parse tree starts at the
root (from the start symbol) and proceeds
towards leaves (tokens or terminals)
– Bottom-up parsing:
Construction of the parse tree starts from the
leaf nodes (tokens or terminals of the
grammar) and proceeds towards root (start
symbol)
Example: Top down Parsing
• Following grammar generates types of Pascal
type → simple
| ↑ id
| array [ simple ] of type
simple → integer
| char
| num dotdot num
Example …
• Construction of the parse tree starts at the root,
which is labeled by the start symbol
Input: array [ num dotdot num ] of integer
Start symbol: type
look-ahead: array
Expand using the rule type → array [ simple ] of type
For example :
First(simple) = {integer, char, num}
First(num dotdot num) = {num}
Define a procedure for each non terminal
procedure type;
  if lookahead in {integer, char, num}
    then simple
  else if lookahead = ↑
    then begin match(↑);
               match(id)
         end
  else if lookahead = array
    then begin match(array);
               match([);
               simple;
               match(]);
               match(of);
               type
         end
  else error;
procedure simple;
  if lookahead = integer
    then match(integer)
  else if lookahead = char
    then match(char)
  else if lookahead = num
    then begin match(num);
               match(dotdot);
               match(num)
         end
  else error;
procedure match(t:token);
  if lookahead = t
    then lookahead := nexttoken
  else error;
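The procedures above can be sketched as a small runnable recursive descent parser. This is only an illustration: the class name TypeParser, the helper parses, the end marker "$" and the token "^" standing in for the pointer symbol are all assumptions, and tokens are plain strings.

```python
class ParseError(Exception):
    """Raised when the lookahead does not fit any alternative."""


class TypeParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" is an assumed end marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        # consume the lookahead only if it is the expected token
        if self.lookahead != t:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1

    def type_(self):
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")


def parses(tokens):
    """Return True if the token list is a complete type."""
    parser = TypeParser(tokens)
    try:
        parser.type_()
        parser.match("$")
        return True
    except ParseError:
        return False


print(parses("array [ num dotdot num ] of integer".split()))  # True
```

Each procedure looks only at the current lookahead to pick an alternative, which is why the first sets of the alternatives must not overlap.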
Ambiguity
• Dangling else problem
[Figure: parse trees for a nested if statement with condition e2 and statements s1 and s2]
Resolving dangling else
problem
• General rule: match each else with the closest
previous unmatched then. The grammar can be rewritten as
stmt → matched-stmt
| unmatched-stmt
| others
Left recursion
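• A top down parser with production
A → A α may loop forever
• From the grammar A → A α | β left recursion may be
eliminated by transforming the grammar to
A → β R
R → α R | Є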
[Figure: parse tree corresponding to the left recursive grammar, and parse tree corresponding to the modified grammar]
• The expression grammar
E → E + T | T
T → T * F | F
F → ( E ) | id
after removal of left recursion becomes
E → T E'
E' → + T E' | Є
T → F T'
T' → * F T' | Є
F → ( E ) | id
Removal of left recursion
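In general
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
transforms to
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | Є
• Left recursion can also be hidden across several productions
S → Aa | b
A → Ac | Sd | Є
Here S ⇒ Aa ⇒ Sda, so S is left recursive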
Removal of left recursion due to many productions …
• After the first step (substitute S by its rhs in
the rules) the grammar becomes
S → Aa | b
A → Ac | Aad | bd | Є
• After the second step (removal of immediate left
recursion from A) the grammar becomes
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | Є
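The second step, removal of immediate left recursion, can be sketched as follows. The representation is an assumption made for illustration: each alternative is a list of symbols, the empty list stands for Є, and the new non terminal is named by appending a prime.

```python
def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | Є."""
    new_nt = nonterminal + "'"
    # tails of the left recursive alternatives (the alpha parts)
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # alternatives that do not start with the non terminal (the beta parts)
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is Є
    }


# A -> Ac | Aad | bd | Є
result = remove_immediate_left_recursion("A", [["A", "c"], ["A", "a", "d"], ["b", "d"], []])
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) or "Є" for alt in alts))
```

Run on A → Ac | Aad | bd | Є it yields A → bdA' | A' and A' → cA' | adA' | Є.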
Left factoring
• In top-down parsing when it is not clear which
production to choose for expansion of a symbol,
defer the decision till we have seen enough input.
• Therefore A → αβ1 | αβ2
transforms to
A → αA'
A' → β1 | β2
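A minimal sketch of left factoring, assuming alternatives are lists of symbols, the empty list stands for Є, and the longest prefix common to all given alternatives is factored out in one step. The function name left_factor and the if-statement alternatives used as input are illustrative assumptions.

```python
def left_factor(nonterminal, alternatives):
    """Factor out the longest prefix shared by all alternatives."""
    prefix = []
    for symbols in zip(*alternatives):
        # stop at the first position where the alternatives differ
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    if not prefix:
        return {nonterminal: alternatives}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [prefix + [new_nt]],
        new_nt: [alt[len(prefix):] for alt in alternatives],  # [] is Є
    }


factored = left_factor("stmt", [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
])
for head, alts in factored.items():
    print(head, "->", " | ".join(" ".join(alt) or "Є" for alt in alts))
```

After factoring, the parser commits to the common prefix first and chooses between the remaining tails only when it reaches them.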
Dangling else problem again
Dangling else problem can be handled by left factoring
stmt → if expr then stmt else stmt
| if expr then stmt
can be transformed to
stmt → if expr then stmt S'
S' → else stmt | Є
Predictive parsers
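• A non recursive top down parsing method
• Parse table is a two dimensional array M[X,a] where “X” is a
non terminal and “a” is a terminal of the grammar
[Figure: predictive parser reading the input, with a stack and the parse table]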
Parsing algorithm
• The parser considers 'X' the symbol on top of stack,
and 'a' the current input symbol
if X = a = $ then halt
if X = a ≠ $ then pop(X) and ip++
if X is a terminal and X ≠ a then error
if X is a non terminal
then if M[X,a] = {X → UVW}
then begin pop(X); push(W,V,U)
end
else error
Example
• Consider the grammar
E → T E'
E' → + T E' | Є
T → F T'
T' → * F T' | Є
F → ( E ) | id
Parse table for the grammar
      id        +          *           (         )        $
E     E→TE'                            E→TE'
E'              E'→+TE'                          E'→Є     E'→Є
T     T→FT'                            T→FT'
T'              T'→Є       T'→*FT'               T'→Є     T'→Є
F     F→id                             F→(E)
Blank entries are error states. For example
E can not derive a string starting with ‘+’
Example
Stack      input             action
$E         id + id * id $    expand by E→TE'
$E'T       id + id * id $    expand by T→FT'
$E'T'F     id + id * id $    expand by F→id
$E'T'id    id + id * id $    pop id and ip++
$E'T'      + id * id $       expand by T'→Є
$E'        + id * id $       expand by E'→+TE'
$E'T+      + id * id $       pop + and ip++
$E'T       id * id $         expand by T→FT'
$E'T'F     id * id $         expand by F→id
$E'T'id    id * id $         pop id and ip++
$E'T'      * id $            expand by T'→*FT'
$E'T'F*    * id $            pop * and ip++
$E'T'F     id $              expand by F→id
$E'T'id    id $              pop id and ip++
$E'T'      $                 expand by T'→Є
$E'        $                 expand by E'→Є
$          $                 halt
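The trace can be reproduced with a short table driven parser. This is a sketch under stated assumptions: the table is the one for the expression grammar written as a Python dictionary, the empty list stands for Є, and the names TABLE, NONTERMINALS and ll1_parse are illustrative.

```python
# M[X, a] for the expression grammar; missing keys are error entries
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions used in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    ip = 0
    used = []
    while True:
        top, a = stack[-1], tokens[ip]
        if top == "$" and a == "$":
            return used                      # halt
        if top not in NONTERMINALS:          # terminal (or $) on top of the stack
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()                      # pop the terminal and advance ip
            ip += 1
        else:
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top},{a}]")
            stack.pop()
            stack.extend(reversed(rhs))      # push rhs so its first symbol is on top
            used.append((top, tuple(rhs)))


for head, rhs in ll1_parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(rhs) or "Є")
```

The printed productions appear in the same order as the "expand by" steps of the trace.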
Constructing parse table
• Table can be constructed if for every non terminal,
every lookahead symbol can be handled by at most
one production
• The table is constructed with the help of two functions, first and follow
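Compute first sets
• If X is a terminal symbol then First(X) = {X}
• If X → Є is a production then Є is in First(X)
• If X is a non terminal
and X → Y1Y2 … Yk is a production
then
if for some i, a is in First(Yi)
and Є is in all of First(Yj) (such
that j<i)
then a is in First(X)
Compute follow sets
1. Place $ in follow(S)
2. If there is a production A → αBβ
then everything in first(β) (except Є) is in
follow(B)
3. If there is a production A → αB
(or A → αBβ where first(β) contains Є)
then everything in follow(A) is in
follow(B)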
Example
• For the expression grammar
E → T E'
E' → + T E' | Є
T → F T'
T' → * F T' | Є
F → ( E ) | id
First(E) = First(T) = First(F) = { (, id }
First(E') = { +, Є }
First(T') = { *, Є }
follow(E) = follow(E') = { $, ) }
follow(T) = follow(T') = { $, ), + }
follow(F) = { $, ), +, * }
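The sets for the expression grammar can be checked with a small fixed point computation. The sketch below assumes a dictionary representation of the grammar with the empty list standing for Є; the names GRAMMAR, first_of and compute_first_follow are illustrative.

```python
EPS = "Є"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of(symbols, first):
    """First set of a string of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish, or the string is empty
    return result


def compute_first_follow(start="E"):
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for head, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new_first = first_of(rhs, first)
                if not new_first <= first[head]:
                    first[head] |= new_first
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    new_follow = tail - {EPS}
                    if EPS in tail:  # what follows sym can vanish
                        new_follow |= follow[head]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow


first, follow = compute_first_follow()
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```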
Construction of parse table
• for each production A → α do
– for each terminal ‘a’ in first(α)
M[A,a] = A → α
– If Є is in First(α)
M[A,b] = A → α
for each terminal b in follow(A)
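The construction can be sketched for the expression grammar as follows. The first and follow sets are written in by hand here as an assumption (they are not computed), productions are (head, rhs) pairs with the empty tuple standing for Є, and build_table is an illustrative name.

```python
EPS = "Є"
PRODUCTIONS = [
    ("E", ("T", "E'")), ("E'", ("+", "T", "E'")), ("E'", ()),
    ("T", ("F", "T'")), ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
# hand-written sets for the non terminals (assumed, not computed here)
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"},
          "T": {"$", ")", "+"}, "T'": {"$", ")", "+"},
          "F": {"$", ")", "+", "*"}}


def first_of_string(symbols):
    """First set of a right hand side; a terminal is its own first set."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def build_table():
    table = {}
    for head, rhs in PRODUCTIONS:
        f = first_of_string(rhs)
        targets = f - {EPS}
        if EPS in f:                 # rhs can vanish: use follow(head)
            targets |= FOLLOW[head]
        for a in targets:
            if (head, a) in table:   # two productions for one entry
                raise ValueError(f"conflict at M[{head},{a}]")
            table[(head, a)] = rhs
    return table


table = build_table()
print(len(table))           # 13 filled entries
print(table[("T'", "+")])   # () i.e. T' -> Є
```

A ValueError here would mean that some lookahead can be handled by more than one production, so the grammar would not be LL(1).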
Practice Assignment
• Construct LL(1) parse table for the
expression grammar
bexpr → bexpr or bterm | bterm
bterm → bterm and bfactor | bfactor
bfactor → not bfactor | ( bexpr ) | true | false
• Steps to be followed
– Remove left recursion
– Compute first sets
– Compute follow sets
– Construct the parse table
• Not to be submitted
Error handling
• Stop at the first error and print a message
– Compiler writer friendly
– But not user friendly
• Error recovery methods
– Panic mode
– Phrase level recovery
– Error productions
– Global correction
Panic mode
• Simplest and the most popular
method
Phrase level recovery
• Make local correction to the input
Bottom up parsing
• Construct a parse tree for an input string
beginning at leaves and going towards root
OR
• Reduce a string w of input to start symbol of
grammar
Consider a grammar
S → aABe
A → Abc | b
B → d
Shift reduce parsing …
• Bottom up parsing has two actions: shift and reduce
Example
Assume grammar is E → E+E | E*E | id
Parse id*id+id
String        action
.id*id+id     shift
id.*id+id     reduce E→id
E.*id+id      shift
E*.id+id      shift
E*id.+id      reduce E→id
E*E.+id       reduce E→E*E
E.+id         shift
E+.id         shift
E+id.         reduce E→id
E+E.          reduce E→E+E
E.            ACCEPT
Shift reduce parsing …
• Symbols on the left of “.” are kept on a
stack
Handles …
• Handles always appear at the top of the
stack and never inside it
• With E+E on the stack and *id left in the input, the parser can either reduce or shift:
stack     input    action
E+E       *id      reduce by E→E+E
E         *id      shift
E*        id       shift
E*id               reduce by E→id
E*E                reduce by E→E*E
E
stack     input    action
E+E       *id      shift
E+E*      id       shift
E+E*id             reduce by E→id
E+E*E              reduce by E→E*E
E+E                reduce by E→E+E
E
Reduce reduce conflict
Consider the grammar M → R+R | R+c | R
R → c
and input c+c
Stack    input    action
         c+c      shift
c        +c       reduce by R→c
R        +c       shift
R+       c        shift
R+c               reduce by R→c
R+R               reduce by M→R+R
M
Stack    input    action
         c+c      shift
c        +c       reduce by R→c
R        +c       shift
R+       c        shift
R+c               reduce by M→R+c
M
LR parsing
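• Input contains the input string.
• Stack contains a string of the form S0X1S1X2…XmSm
where each Xi is a grammar symbol and each Si is a state.
[Figure: LR parser with input, stack, parser and output]
• Assume the configuration is
Stack: S0X1S1…XmSm Input: aiai+1…an$
• If action[Sm,ai] = shift S
Then the configuration becomes
Stack: S0X1S1……XmSmaiS Input: ai+1…an$
• If action[Sm,ai] = reduce A → β
Then the configuration becomes
Stack: S0X1S1…Xm-rSm-rAS Input: aiai+1…an$
where r = |β| and S = goto[Sm-r,A]
• If action[Sm,ai] = accept
Then parsing is completed. HALT
• If action[Sm,ai] = error
Then invoke error recovery routine.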
LR parsing Algorithm
Initial state: Stack: S0 Input: w$
Loop{
  (S is the state on top of the stack and a is the current input symbol)
  if action[S,a] = shift S'
    then push(a); push(S'); ip++
  else if action[S,a] = reduce A → β
    then pop (2*|β|) symbols;
         push(A); push (goto[S'',A])
         (S'' is the state after popping symbols)
  else if action[S,a] = accept
    then exit
  else error
}
Example
Consider the grammar
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → ( E )
(6) F → id
And its parse table
State   id    +     *     (     )     $     E    T    F
0       s5                s4                1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                8    2    3
5             r6    r6          r6    r6
6       s5                s4                     9    3
7       s5                s4                          10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5
Parse id + id * id
Stack                     Input        Action
0                         id+id*id$    shift 5
0 id 5                    +id*id$      reduce by F→id
0 F 3                     +id*id$      reduce by T→F
0 T 2                     +id*id$      reduce by E→T
0 E 1                     +id*id$      shift 6
0 E 1 + 6                 id*id$       shift 5
0 E 1 + 6 id 5            *id$         reduce by F→id
0 E 1 + 6 F 3             *id$         reduce by T→F
0 E 1 + 6 T 9             *id$         shift 7
0 E 1 + 6 T 9 * 7         id$          shift 5
0 E 1 + 6 T 9 * 7 id 5    $            reduce by F→id
0 E 1 + 6 T 9 * 7 F 10    $            reduce by T→T*F
0 E 1 + 6 T 9             $            reduce by E→E+T
0 E 1                     $            ACCEPT
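The LR parsing loop can be sketched in a few lines using the parse table of this example. Two simplifications are assumptions of the sketch: the stack holds states only, so a reduction pops |β| entries instead of 2*|β|, and productions are stored just as (head, length of rhs). The names ACTION, GOTO and lr_parse are illustrative.

```python
# production number -> (head, length of right hand side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}


def reduce_row(entry):
    """Row with the same reduce entry on +, *, ) and $."""
    return {t: entry for t in "+*)$"}


ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: reduce_row("r4"),
    4: {"id": "s5", "(": "s4"},
    5: reduce_row("r6"),
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: reduce_row("r3"),
    11: reduce_row("r5"),
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Return the list of production numbers used in reductions."""
    tokens = list(tokens) + ["$"]
    stack = [0]   # states only; the grammar symbols are implied by the states
    ip = 0
    reductions = []
    while True:
        act = ACTION[stack[-1]].get(tokens[ip])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[ip]!r} in state {stack[-1]}")
        if act == "acc":
            return reductions
        if act[0] == "s":                     # shift: push the new state
            stack.append(int(act[1:]))
            ip += 1
        else:                                 # reduce by production number n
            n = int(act[1:])
            head, length = PRODUCTIONS[n]
            del stack[len(stack) - length:]   # pop one state per rhs symbol
            stack.append(GOTO[stack[-1]][head])
            reductions.append(n)


print(lr_parse(["id", "+", "id", "*", "id"]))  # [6, 4, 2, 6, 4, 6, 3, 1]
```

The reduction sequence matches the "reduce by" steps of the trace: F→id, T→F, E→T, F→id, T→F, F→id, T→T*F, E→E+T.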
Parser states
• Goal is to know the valid reductions at
any given point
Start state
• Start state of DFA is empty stack
corresponding to S' → .S item
– This means no input has been seen
– The parser expects to see a string derived from
S
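Closure operation
• If I is a set of items for a grammar G then
closure(I) is a set constructed as follows:
– Every item in I is in closure(I)
– If A → α.Bβ is in closure(I) and B → γ is a production
then B → .γ is in closure(I)
• Example: for the grammar
E' → E
E → E+T | T
T → T*F | F
F → ( E ) | id
If I is { E' → .E } then closure(I) is
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .id
F → .(E)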
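A minimal sketch of the closure operation for this grammar, assuming an item is represented as a triple (head, rhs, dot position); the names GRAMMAR, closure and show are illustrative.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """items: set of (head, rhs, dot) triples; returns the closed set."""
    result = set(items)
    work = list(items)
    while work:
        head, rhs, dot = work.pop()
        # a non terminal right after the dot brings in all its productions
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alt in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


def show(item):
    head, rhs, dot = item
    return f"{head} -> {' '.join(rhs[:dot])} . {' '.join(rhs[dot:])}"


for item in sorted(closure({("E'", ("E",), 0)})):
    print(show(item))
```

Starting from the single item E' → .E, the result contains the seven items listed above.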
Applying symbols in a
state
• In the new state include all the
items that have appropriate
input symbol just after the “.”
Goto operation
• Goto(I,X) , where I is a set of items and X is a
grammar symbol,
– is closure of set of item A → αX.β
– such that A → α.Xβ is in I
• For example, if I is { E' → E. , E → E. + T } then goto(I,+) is
E → E + .T
T → .T * F
T → .F
F → .(E)
F → .id
Sets of items
C : Collection of sets of LR(0) items for grammar G'
C = { closure ( { S' → .S } ) }
repeat
  for each set of items I in C
  and each grammar symbol X
  such that goto (I,X) is not empty and not in C
    ADD goto(I,X) to C
until no more additions
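The loop above can be sketched for the augmented expression grammar as follows. Items are assumed to be (head, rhs, dot position) triples, and the names closure, goto and canonical_collection are illustrative; closure and goto are written out again so that the block runs on its own.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
SYMBOLS = sorted({s for alts in GRAMMAR.values() for rhs in alts for s in rhs})


def closure(items):
    result, work = set(items), list(items)
    while work:
        head, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alt in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


def goto(items, symbol):
    # move the dot over `symbol` wherever it appears right after the dot
    moved = {(h, rhs, dot + 1) for (h, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def canonical_collection():
    collection = [frozenset(closure({("E'", ("E",), 0)}))]
    for item_set in collection:          # the list grows while it is scanned
        for symbol in SYMBOLS:
            target = frozenset(goto(item_set, symbol))
            if target and target not in collection:
                collection.append(target)
    return collection


print(len(canonical_collection()))  # 12
```

The twelve sets correspond to I0 through I11, although this sketch may discover them in a different order.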
Example
Grammar:
E' → E
E → E+T | T
T → T*F | F
F → (E) | id
I0: closure(E' → .E)
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id
I1: goto(I0,E)
E' → E.
E → E. + T
I2: goto(I0,T)
E → T.
T → T. * F
I3: goto(I0,F)
T → F.
I4: goto(I0,( )
F → (.E)
E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id
I5: goto(I0,id)
F → id.
I6: goto(I1,+)
E → E + .T
T → .T * F
T → .F
F → .(E)
F → .id
I7: goto(I2,*)
T → T * .F
F → .(E)
F → .id
I8: goto(I4,E)
F → (E.)
E → E. + T
I9: goto(I6,T)
E → E + T.
T → T. * F
I10: goto(I7,F)
T → T * F.
I11: goto(I8,) )
F → (E).
goto(I4,T) is I2
goto(I4,F) is I3
goto(I4,( ) is I4
goto(I4,id) is I5
goto(I6,F) is I3
goto(I6,( ) is I4
goto(I6,id) is I5
goto(I7,( ) is I4
goto(I7,id) is I5
goto(I8,+) is I6
goto(I9,*) is I7
[Figures: transition diagrams of the DFA over the item sets I0 to I11, showing transitions on terminals, transitions on non terminals, and the complete diagram]
Construct SLR parse table
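• Construct C={I0, …, In} the collection of sets of
LR(0) items
• If A → α.aβ is in Ii and goto(Ii,a) = Ij
then action[i,a] = shift j
• If A → α. is in Ii
then action[i,a] = reduce A → α for all a in follow(A)
• If S' → S. is in Ii
then action[i,$] = accept
• If goto(Ii,A) = Ij
then goto[i,A]=j for all non terminals A
• All entries not defined are errors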
Assignment
Construct SLR parse table for following grammar
E → E + E | E - E | E * E | E / E | ( E ) | digit
• Steps to be followed
– Augment the grammar
– Construct set of LR(0) items
– Construct the parse table
– Show states of parser as the given string is parsed
• Due on todate+5
Example
• Consider following grammar and its SLR parse
table:
S' → S
S → L=R
S → R
L → *R
L → id
R → L
I0: S' → .S
S → .L=R
S → .R
L → .*R
L → .id
R → .L
I1: goto(I0, S)
S' → S.
I2: goto(I0, L)
S → L.=R
R → L.
Assignment (not to be submitted): Construct rest of the items and the parse table.
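SLR parse table for the grammar
(ri denotes reduction by the i-th production in the order S' → S, S → L=R, S → R, L → *R, L → id, R → L)
State   =        *     id    $      S    L    R
0                s4    s5           1    2    3
1                            acc
2       s6,r6                r6
3                            r3
4                s4    s5                8    7
5       r5                   r5
6                s4    s5                8    9
7       r4                   r4
8       r6                   r6
9                            r2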
Problems in SLR parsing
• No sentential form of this grammar can start with R=…
Closure(I) for a set I of LR(1) items
repeat
  for each item [A → α.Bβ, a] in I
    for each production B → γ in G'
    and for each terminal b in First(βa)
      add item [B → .γ, b] to I
until no more additions to I
Example
Consider the following grammar
S' → S
S → CC
C → cC | d
Closure of { [S' → .S, $] } is
S' → .S, $
S → .CC, $
C → .cC, c
C → .cC, d
C → .d, c
C → .d, d
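The closure of LR(1) items can be sketched for this grammar as follows. Items are assumed to be (head, rhs, dot, lookahead) tuples. The FIRST table is written by hand, and because no symbol of this grammar derives Є, First(βa) is simply the first set of the first symbol of βa; that shortcut is an assumption of this sketch and does not hold for grammars with Є productions.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
# hand-written first sets; terminals and $ are their own first sets
FIRST = {"S": {"c", "d"}, "C": {"c", "d"},
         "c": {"c"}, "d": {"d"}, "$": {"$"}}


def lr1_closure(items):
    """items: set of (head, rhs, dot, lookahead) tuples."""
    result = set(items)
    work = list(items)
    while work:
        head, rhs, dot, a = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue                      # nothing to expand after the dot
        b_nt = rhs[dot]
        beta_a = rhs[dot + 1:] + (a,)     # the string βa
        # no nullable symbols here, so First(βa) = First of its first symbol
        for b in FIRST[beta_a[0]]:
            for gamma in GRAMMAR[b_nt]:
                item = (b_nt, gamma, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


for head, rhs, dot, a in sorted(lr1_closure({("S'", ("S",), 0, "$")})):
    print(f"{head} -> {' '.join(rhs[:dot])} . {' '.join(rhs[dot:])} , {a}")
```

Starting from [S' → .S, $] the sketch produces the six items of the example.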
Example
Construct sets of LR(1) items for the grammar on
previous slide
I0: S' → .S, $
S → .CC, $
C → .cC, c/d
C → .d, c/d
I1: goto(I0,S)
S' → S., $
I2: goto(I0,C)
S → C.C, $
C → .cC, $
C → .d, $
I3: goto(I0,c)
C → c.C, c/d
C → .cC, c/d
C → .d, c/d
I4: goto(I0,d)
C → d., c/d
I5: goto(I2,C)
S → CC., $
I6: goto(I2,c)
C → c.C, $
C → .cC, $
C → .d, $
I7: goto(I2,d)
C → d., $
I8: goto(I3,C)
C → cC., c/d
I9: goto(I6,C)
C → cC., $
• If [A → α., a] is in Ii
then action[i,a] = reduce A → α
• If [S' → S., $] is in Ii
then action[i,$] = accept
• Let J = I1 ∪ I2 ∪ … ∪ Ik
Notes on LALR parse table
• Modified parser behaves as original except
that it will reduce C → d on inputs like ccd.
The error will eventually be caught before
any more symbols are shifted.
Notes on LALR parse table…
• Merging items may result in conflicts in LALR
parsers which did not exist in LR parsers
– LL(k) ≤ LR(k)
Error Recovery
• An error is detected when an entry in the action
table is found to be empty.