Syntax Analysis

The document discusses syntax analysis in programming languages, focusing on the construction of abstract syntax trees, error reporting, and the limitations of regular languages. It explains context-free grammars, parsing techniques, and the challenges of ambiguity in grammar, including the dangling else problem and left recursion. Additionally, it covers predictive parsing and the implementation of parsers using parse tables.

Uploaded by

thorat_496512597

Syntax Analysis

• Check syntax and construct abstract syntax tree

    if
    ├─ ==
    │   ├─ b
    │   └─ 0
    ├─ =
    │   ├─ a
    │   └─ b
    └─ ;

• Error reporting and recovery
• Model using context-free grammars
• Recognize using pushdown automata / table-driven parsers

What syntax analysis cannot do!
• Check whether variables are of types on which operations are allowed
• Check whether a variable has been declared before use
• Check whether a variable has been initialized
• These issues will be handled in semantic analysis

Limitations of regular languages
• How can language syntax be described precisely and conveniently? Can regular expressions be used?
• Many languages are not regular, for example strings of balanced parentheses
– ((((…))))
– { (^i )^i | i ≥ 0 }
– There is no regular expression for this language
• A finite automaton may repeat states; however, it cannot remember the number of times it has been in a particular state
• A more powerful language is needed to describe valid strings of tokens

Syntax definition
• Context-free grammars
– a set of tokens (terminal symbols)
– a set of non-terminal symbols
– a set of productions of the form
  nonterminal → string of terminals and non-terminals
– a start symbol
⟨T, N, P, S⟩
• A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the right-hand side of a production for that non-terminal.
• The strings that can be derived from the start symbol of a grammar G form the language L(G) defined by the grammar.

Examples
• String of balanced parentheses
S → ( S ) S | ε
• Grammar
list → list + digit
     | list - digit
     | digit
digit → 0 | 1 | … | 9
The language consists of lists of digits separated by + or -.

Derivation
list ⇒ list + digit
     ⇒ list - digit + digit
     ⇒ digit - digit + digit
     ⇒ 9 - digit + digit
     ⇒ 9 - 5 + digit
     ⇒ 9 - 5 + 2
Therefore, the string 9-5+2 belongs to the language specified by the grammar.
The name “context free” comes from the fact that the use of a production X → … does not depend on the context of X.
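
The leftmost derivation of 9-5+2 can be replayed mechanically. The following is a minimal sketch, not part of the original slides; the names `leftmost_step` and `derive` are made up for illustration, and a sentential form is assumed to be a plain list of symbols.

```python
NONTERMINALS = {"list", "digit"}


def leftmost_step(form, lhs, rhs):
    """Replace the leftmost non-terminal of `form`, which must be `lhs`, by `rhs`."""
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            if symbol != lhs:
                raise ValueError(f"leftmost non-terminal is {symbol}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to replace")


def derive(steps, start="list"):
    """Apply the productions in `steps` one by one and record every sentential form."""
    form = [start]
    history = [form]
    for lhs, rhs in steps:
        form = leftmost_step(form, lhs, rhs)
        history.append(form)
    return history


# The productions used in the derivation shown above, in order.
history = derive([
    ("list", ["list", "+", "digit"]),
    ("list", ["list", "-", "digit"]),
    ("list", ["digit"]),
    ("digit", ["9"]),
    ("digit", ["5"]),
    ("digit", ["2"]),
])
for form in history:
    print(" ".join(form))
```

Each printed line is one sentential form; the last one contains only tokens, which is what membership in L(G) means.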

Examples …
• Grammar for Pascal block
block → begin statements end
statements → stmt-list | ε
stmt-list → stmt-list ; stmt
          | stmt

Syntax analyzers
• Testing whether w belongs to L(G) gives just a “yes” or “no” answer
• However, the syntax analyzer
– Must generate the parse tree
– Must handle errors gracefully if the string is not in the language
• Form of the grammar is important
– Many grammars generate the same language
– Tools are sensitive to the grammar

Derivation
• If there is a production A → α then we say that A derives α, denoted by A ⇒ α
• α A β ⇒ α γ β if A → γ is a production
• If α1 ⇒ α2 ⇒ … ⇒ αn then α1 ⇒+ αn
• Given a grammar G and a string w of terminals in L(G) we can write S ⇒+ w
• If S ⇒* α where α is a string of terminals and non-terminals of G then we say that α is a sentential form of G

Derivation …
• If in every sentential form only the leftmost non-terminal is replaced, the derivation is a leftmost derivation
• Every leftmost step can be written as
wAγ ⇒lm wδγ
where w is a string of terminals and A → δ is a production
• A rightmost derivation can be defined similarly
• An ambiguous grammar is one that produces more than one leftmost (or more than one rightmost) derivation of some sentence

Parse tree
• It shows how the start symbol of a grammar derives a string in the language
• The root is labeled by the start symbol
• Leaf nodes are labeled by tokens
• Each internal node is labeled by a non-terminal
• If A is a non-terminal labeling an internal node and x1, x2, …, xn are the labels of the children of that node then A → x1 x2 … xn is a production

Example
Parse tree for 9-5+2

    list
    ├─ list
    │   ├─ list
    │   │   └─ digit
    │   │       └─ 9
    │   ├─ -
    │   └─ digit
    │       └─ 5
    ├─ +
    └─ digit
        └─ 2

Ambiguity
• A grammar can have more than one parse tree for a string
• Consider grammar
string → string + string
       | string - string
       | 0 | 1 | … | 9
• String 9-5+2 has two parse trees
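
Tree 1, grouping as (9-5)+2:

    string
    ├─ string
    │   ├─ string: 9
    │   ├─ -
    │   └─ string: 5
    ├─ +
    └─ string: 2

Tree 2, grouping as 9-(5+2):

    string
    ├─ string: 9
    ├─ -
    └─ string
        ├─ string: 5
        ├─ +
        └─ string: 2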

Ambiguity …
• Ambiguity is problematic because the meaning assigned to a program can be incorrect
• Ambiguity can be handled in several ways
– Enforce associativity and precedence
– Rewrite the grammar (cleanest way)
• There are no general techniques for handling ambiguity
• It is impossible to automatically convert an ambiguous grammar to an unambiguous one

Associativity
• If an operand has operators on both sides, the side on which the operator takes this operand is the associativity of that operator
• In a+b+c, b is taken by the left +
• +, -, *, / are left associative
• ^, = are right associative
• Grammar to generate strings with right associative operators
right → letter = right | letter
letter → a | b | … | z

Precedence
• String a+5*2 has two possible interpretations because of two different parse trees corresponding to (a+5)*2 and a+(5*2)
• Precedence determines the correct interpretation.

Parsing
• Process of determining whether a string can be generated by a grammar
• Parsing falls in two categories:
– Top-down parsing: construction of the parse tree starts at the root (from the start symbol) and proceeds towards the leaves (tokens or terminals)
– Bottom-up parsing: construction of the parse tree starts from the leaf nodes (tokens or terminals of the grammar) and proceeds towards the root (start symbol)

Example: Top down parsing
• Following grammar generates types of Pascal
type → simple
     | ↑ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num

Example …
• Construction of the parse tree is done by starting at the root, labeled by the start symbol
• Repeat the following two steps
– At a node labeled with non-terminal A, select one of the productions of A and construct the children nodes (Which production?)
– Find the next node at which a subtree is to be constructed (Which node?)

• Parse
array [ num dotdot num ] of integer

Start symbol: type
Expanded using the rule type → simple

• Cannot proceed, as non-terminal “simple” never generates a string beginning with token “array”. Therefore, this choice requires back-tracking.
• Back-tracking is not desirable; therefore, take the help of a “look-ahead” token. The current token is treated as the look-ahead token. (This restricts the class of grammars.)

array [ num dotdot num ] of integer

Start symbol: type, look-ahead: array
• Expand using the rule type → array [ simple ] of type
• Leftmost non-terminal is simple: expand using the rule simple → num dotdot num
• Leftmost non-terminal is type: expand using the rule type → simple
• Leftmost non-terminal is simple: expand using the rule simple → integer
• All the tokens exhausted; parsing completed

    type
    ├─ array
    ├─ [
    ├─ simple
    │   ├─ num
    │   ├─ dotdot
    │   └─ num
    ├─ ]
    ├─ of
    └─ type
        └─ simple
            └─ integer

Recursive descent parsing
First set:
Let there be a production
A → α
then First(α) is the set of tokens that appear as the first token in the strings generated from α
For example:
First(simple) = {integer, char, num}
First(num dotdot num) = {num}

Define a procedure for each non-terminal
procedure type;
  if lookahead in {integer, char, num}
  then simple
  else if lookahead = ↑
  then begin match(↑);
             match(id)
       end
  else if lookahead = array
  then begin match(array);
             match([);
             simple;
             match(]);
             match(of);
             type
       end
  else error;

procedure simple;
  if lookahead = integer
  then match(integer)
  else if lookahead = char
  then match(char)
  else if lookahead = num
  then begin match(num);
             match(dotdot);
             match(num)
       end
  else error;

procedure match(t: token);
  if lookahead = t
  then lookahead := next token
  else error;
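
The procedures above translate almost line for line into a modern language. The following is a minimal sketch, not from the original slides: the class name `TypeParser`, the exception `ParseError`, the ASCII token `^` standing in for ↑, and the end marker `$` are all assumptions made for illustration.

```python
class ParseError(Exception):
    """Raised when the look-ahead token fits no production."""


class TypeParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def type_(self):
        # type -> simple | ^ id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead in ("integer", "char"):
            self.match(self.lookahead)
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def parse_type(tokens):
    """Return True if the whole token list is a type, else raise ParseError."""
    parser = TypeParser(tokens)
    parser.type_()
    parser.match("$")
    return True


print(parse_type("array [ num dotdot num ] of integer".split()))
```

Each method inspects only the current look-ahead token, so no back-tracking is needed for this grammar.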

Ambiguity
• Dangling else problem
stmt → if expr then stmt
     | if expr then stmt else stmt
• According to this grammar, the string
if e1 then if e2 then S1 else S2
has two parse trees
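
Tree 1: the else is attached to the outer if

    if e1
    then if e2
         then s1
    else s2

    stmt
    ├─ if
    ├─ expr: e1
    ├─ then
    ├─ stmt
    │   ├─ if
    │   ├─ expr: e2
    │   ├─ then
    │   └─ stmt: s1
    ├─ else
    └─ stmt: s2

Tree 2: the else is attached to the inner if

    if e1
    then if e2
         then s1
         else s2

    stmt
    ├─ if
    ├─ expr: e1
    ├─ then
    └─ stmt
        ├─ if
        ├─ expr: e2
        ├─ then
        ├─ stmt: s1
        ├─ else
        └─ stmt: s2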

Resolving dangling else problem
• General rule: match each else with the closest previous unmatched then. The grammar can be rewritten as
stmt → matched-stmt
     | unmatched-stmt
matched-stmt → if expr then matched-stmt else matched-stmt
             | others
unmatched-stmt → if expr then stmt
               | if expr then matched-stmt else unmatched-stmt

Left recursion
• A top-down parser with production A → A α may loop forever
• From the grammar A → A α | β left recursion may be eliminated by transforming the grammar to
A → β R
R → α R | ε

Parse tree corresponding to the left recursive grammar (shown for the string βα):

    A
    ├─ A
    │   └─ β
    └─ α

Parse tree corresponding to the modified grammar (same string):

    A
    ├─ β
    └─ R
        ├─ α
        └─ R
            └─ ε

Both grammars generate the strings βα*

Example
• Consider grammar for arithmetic expressions
E → E + T | T
T → T * F | F
F → ( E ) | id
• After removal of left recursion the grammar becomes
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

Removal of left recursion
In general
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
transforms to
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
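
The general transformation can be sketched in a few lines. This is an illustrative sketch only: the function name `eliminate_immediate_left_recursion` and the representation (each alternative is a list of symbols, with the empty list standing for ε) are assumptions, and only immediate left recursion is handled.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    Returns a dict mapping each resulting non-terminal to its alternatives.
    """
    # The alpha parts: alternatives that start with the non-terminal itself.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The beta parts: every other alternative (including epsilon, the empty list).
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}  # nothing to do
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applying the same function to T → T * F | F gives the T and T' rules of the expression grammar.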

Left recursion hidden due to many productions
• Left recursion may also be introduced by two or more grammar rules. For example:
S → Aa | b
A → Ac | Sd | ε
there is a left recursion because
S ⇒ Aa ⇒ Sda
• In such cases, left recursion is removed systematically
– Starting from the first rule and replacing all the occurrences of the first non-terminal symbol
– Removing left recursion from the modified grammar

Removal of left recursion due to many productions …
• After the first step (substitute S by its rhs in the rules) the grammar becomes
S → Aa | b
A → Ac | Aad | bd | ε
• After the second step (removal of left recursion) the grammar becomes
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε

Left factoring
• In top-down parsing, when it is not clear which production to choose for expansion of a symbol, defer the decision till we have seen enough input.
In general, if A → αβ1 | αβ2
defer the decision by expanding A to αA'
we can then expand A' to β1 or β2
• Therefore A → αβ1 | αβ2
transforms to
A → αA'
A' → β1 | β2

Dangling else problem again
Dangling else problem can be handled by left factoring
stmt → if expr then stmt else stmt
     | if expr then stmt
can be transformed to
stmt → if expr then stmt S'
S' → else stmt | ε

Predictive parsers
• A non-recursive top-down parsing method
• Parser “predicts” which production to use
• It removes backtracking by fixing one production for every non-terminal and input token(s)
• Predictive parsers accept LL(k) languages
– First L stands for left to right scan of input
– Second L stands for leftmost derivation
– k stands for the number of lookahead tokens
• In practice LL(1) is used

Predictive parsing
• Predictive parser can be implemented by maintaining an external stack

[Figure: the parser reads the input, maintains a stack, consults the parse table, and produces the output]

Parse table is a two-dimensional array M[X,a] where “X” is a non-terminal and “a” is a terminal of the grammar

Parsing algorithm
• The parser considers 'X', the symbol on top of the stack, and 'a', the current input symbol
• These two symbols determine the action to be taken by the parser
• Assume that '$' is a special token that is at the bottom of the stack and terminates the input string

if X = a = $ then halt
if X = a ≠ $ then pop(X) and ip++
if X is a terminal and X ≠ a then error
if X is a non-terminal
  then if M[X,a] = {X → UVW}
         then begin pop(X); push(W,V,U) end   (U ends up on top of the stack)
         else error

Example
• Consider the grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

Parse table for the grammar

      id          +             *             (           )         $
E     E → T E'                                E → T E'
E'                E' → + T E'                             E' → ε    E' → ε
T     T → F T'                                T → F T'
T'                T' → ε        T' → * F T'               T' → ε    T' → ε
F     F → id                                  F → ( E )

Blank entries are error states. For example, E cannot derive a string starting with ‘+’

Example
Stack       Input             Action
$E          id + id * id $    expand by E → T E'
$E'T        id + id * id $    expand by T → F T'
$E'T'F      id + id * id $    expand by F → id
$E'T'id     id + id * id $    pop id and ip++
$E'T'       + id * id $       expand by T' → ε
$E'         + id * id $       expand by E' → + T E'
$E'T+       + id * id $       pop + and ip++
$E'T        id * id $         expand by T → F T'

Example …
Stack       Input             Action
$E'T'F      id * id $         expand by F → id
$E'T'id     id * id $         pop id and ip++
$E'T'       * id $            expand by T' → * F T'
$E'T'F*     * id $            pop * and ip++
$E'T'F      id $              expand by F → id
$E'T'id     id $              pop id and ip++
$E'T'       $                 expand by T' → ε
$E'         $                 expand by E' → ε
$           $                 halt
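
The stack-based algorithm and the trace above can be reproduced with a small driver. This is a sketch under assumptions: the table is typed in by hand from the parse table slide, `ll1_parse` is a made-up name, and the built-in `SyntaxError` is used to signal a blank (error) entry.

```python
EPS = ()  # an empty right-hand side stands for epsilon

# Parse table M[X, a] for the expression grammar.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # top of the stack is the end of the list
    ip = 0
    used = []
    while True:
        top, a = stack[-1], tokens[ip]
        if top == "$" and a == "$":
            return used  # halt: input accepted
        if top not in NONTERMINALS:
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()  # matched a terminal: pop it and advance the input
            ip += 1
        else:
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push in reverse so rhs[0] is on top
            used.append((top, rhs))


for lhs, rhs in ll1_parse(["id", "+", "id", "*", "id"]):
    print(lhs, "->", " ".join(rhs) or "ε")
```

The printed productions appear in the same order as the “expand by” actions in the trace.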

Constructing parse table
• Table can be constructed if for every non-terminal, every lookahead symbol can be handled by at most one production
• First(α) for a string of terminals and non-terminals α is
– Set of symbols that might begin the fully expanded (made of only tokens) version of α
• Follow(X) for a non-terminal X is
– Set of symbols that might follow the derivation of X in the input stream

[Figure: the part of the input derived from X; the first sets describe how it can begin and the follow sets describe what can come right after it]

Compute first sets
• If X is a terminal symbol then First(X) = {X}
• If X → ε is a production then ε is in First(X)
• If X is a non-terminal and X → Y1Y2 … Yk is a production, then
  if for some i, a is in First(Yi) and ε is in all of First(Yj) (such that j<i)
  then a is in First(X)
• If ε is in First(Y1) … First(Yk) then ε is in First(X)

Example
• For the expression grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

First(E) = First(T) = First(F) = { (, id }
First(E') = { +, ε }
First(T') = { *, ε }
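
The rules for first sets amount to a fixed-point iteration. The sketch below is illustrative, not the slides' own code: `GRAMMAR`, `first_of_string`, and `compute_first` are assumed names, and a symbol is treated as a terminal whenever it has no productions.

```python
EPS = "ε"

# Each non-terminal maps to its alternatives; [] is the epsilon production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """First set of a string of symbols, given the first sets known so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: First(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)  # every symbol can derive epsilon (or the string is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no first set grows any more
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new.issubset(first[nt]):
                    first[nt] |= new
                    changed = True
    return first


for nt, symbols in compute_first(GRAMMAR).items():
    print(f"First({nt}) =", sorted(symbols))
```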

Compute follow sets
1. Place $ in follow(S)
2. If there is a production A → αBβ then everything in first(β) (except ε) is in follow(B)
3. If there is a production A → αB then everything in follow(A) is in follow(B)
4. If there is a production A → αBβ and first(β) contains ε then everything in follow(A) is in follow(B)
Since follow sets are defined in terms of follow sets, the last two steps have to be repeated until the follow sets converge

Example
• For the expression grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

follow(E) = follow(E') = { $, ) }
follow(T) = follow(T') = { $, ), + }
follow(F) = { $, ), +, * }
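
The four follow rules can be iterated to a fixed point in the same way. In this sketch the first sets are simply typed in from the earlier example rather than recomputed; `FIRST`, `first_of_string`, and `compute_follow` are assumed names.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# First sets of the non-terminals, as given in the first-set example.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal is its own first set
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # follow sets are defined for non-terminals only
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}  # rule 2
                    if EPS in beta_first:  # rules 3 and 4
                        new |= follow[lhs]
                    if not new.issubset(follow[sym]):
                        follow[sym] |= new
                        changed = True
    return follow


for nt, symbols in compute_follow(GRAMMAR, "E").items():
    print(f"follow({nt}) =", sorted(symbols))
```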

Construction of parse table
• for each production A → α do
– for each terminal ‘a’ in first(α)
  M[A,a] = A → α
– If ε is in first(α)
  M[A,b] = A → α for each terminal b in follow(A)
– If ε is in first(α) and $ is in follow(A)
  M[A,$] = A → α
• A grammar whose parse table has no multiple entries is called LL(1)
• Steps to be followed
– Remove left recursion
– Compute first sets
– Compute follow sets
– Construct the parse table
• Not to be submitted
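
The table construction rule can be sketched as follows. The first and follow sets are typed in from the earlier examples; `build_ll1_table` and the `conflicts` list are assumptions for illustration. Because `$` is stored in the follow sets like any other symbol, the separate `$` rule needs no special case here.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"$", ")", "+"}, "T'": {"$", ")", "+"},
    "F": {"$", ")", "+", "*"},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar):
    """Return (table, conflicts); the grammar is LL(1) when conflicts is empty."""
    table, conflicts = {}, []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            rhs_first = first_of_string(rhs)
            lookaheads = rhs_first - {EPS}
            if EPS in rhs_first:
                lookaheads |= FOLLOW[lhs]  # includes $ when $ is in follow(lhs)
            for a in lookaheads:
                if (lhs, a) in table:
                    conflicts.append((lhs, a))  # multiple entries in one cell
                table[(lhs, a)] = tuple(rhs)
    return table, conflicts


table, conflicts = build_ll1_table(GRAMMAR)
print(len(table), "entries;", "LL(1)" if not conflicts else "not LL(1)")
```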

Error handling
• Stop at the first error and print a message
– Compiler writer friendly
– But not user friendly
• Every reasonable compiler must recover from errors and identify as many errors as possible
• However, multiple error messages due to a single fault must be avoided
• Error recovery methods
– Panic mode
– Phrase level recovery
– Error productions
– Global correction

Panic mode
• Simplest and the most popular method
• Most tools provide for specifying panic mode recovery in the grammar
• When an error is detected
– Discard tokens one at a time until a set of tokens is found whose role is clear
– Skip to the next token that can be placed reliably in the parse tree

Panic mode …
• Consider following code
begin
  a = b + c;
  x = p r;
  h = x < 0;
end;
• The second statement has a syntax error
• Panic mode recovery for begin-end block: skip ahead to the next ‘;’ and try to parse the next statement
• It discards one statement and tries to continue parsing
• May fail if no further ‘;’ is found

Phrase level recovery
• Make local correction to the input
• Works only in limited situations
– A common programming error which is easily detected
– For example, insert a “;” after the closing “}” of a class definition
• Does not work very well!

Error productions
• Add erroneous constructs as productions in the grammar
• Works only for the most common mistakes which can be easily identified
• Essentially makes common errors part of the grammar
• Complicates the grammar and does not work very well

Global corrections
• Considering the program as a whole, find a correct “nearby” program
• Nearness may be measured using a certain metric
• PL/C compiler implemented this scheme: anything could be compiled!
• It is complicated and not a very good idea!

Error Recovery in LL(1) parser
• Error occurs when a parse table entry M[A,a] is empty
• Skip symbols in the input until a token in a selected set (synch) appears
• Place symbols in follow(A) in the synch set. Skip tokens until an element in follow(A) is seen. Pop(A) and continue parsing
• Add symbols in first(A) to the synch set. Then it may be possible to resume parsing according to A if a symbol in first(A) appears in the input.

Assignment
• Reading assignment: Read about error recovery in LL(1) parsers
• Assignment to be submitted:
– Introduce synch symbols (using both follow and first sets) in the parse table created for the boolean expression grammar in the previous assignment
– Parse “not (true and or false)” and show how error recovery works
– Due on todate+10

Bottom up parsing
• Construct a parse tree for an input string beginning at leaves and going towards root
OR
• Reduce a string w of input to start symbol of grammar

Consider a grammar
S → aABe
A → Abc | b
B → d

Reduction of a string      Rightmost derivation
abbcde                     S ⇒ aABe
aAbcde                       ⇒ aAde
aAde                         ⇒ aAbcde
aABe                         ⇒ abbcde
S

Shift reduce parsing
• Split string being parsed into two parts
– Two parts are separated by a special character “.”
– Left part is a string of terminals and non-terminals
– Right part is a string of terminals
• Initially the input is .w

Shift reduce parsing …
• Bottom up parsing has two actions
• Shift: move terminal symbol from right string to left string
  if string before shift is α.pqr
  then string after shift is αp.qr
• Reduce: immediately on the left of “.” identify a string same as RHS of a production and replace it by LHS
  if string before reduce action is αβ.pqr
  and A → β is a production
  then string after reduction is αA.pqr

Example
Assume grammar is E → E+E | E*E | id
Parse id*id+id

String        Action
.id*id+id     shift
id.*id+id     reduce E → id
E.*id+id      shift
E*.id+id      shift
E*id.+id      reduce E → id
E*E.+id       reduce E → E*E
E.+id         shift
E+.id         shift
E+id.         reduce E → id
E+E.          reduce E → E+E
E.            ACCEPT

Shift reduce parsing …
• Symbols on the left of “.” are kept on a stack
– Top of the stack is at “.”
– Shift pushes a terminal on the stack
– Reduce pops symbols (rhs of production) and pushes a non-terminal (lhs of production) onto the stack
• The most important issue: when to shift and when to reduce
• Reduce action should be taken only if the result can be reduced to the start symbol

Bottom up parsing …
• A more powerful parsing technique
• LR grammars – more expensive than LL
• Can handle left recursive grammars
• Can handle virtually all the programming languages
• Natural expression of programming language syntax
• Automatic generation of parsers (Yacc, Bison etc.)
• Detects errors as soon as possible
• Allows better error recovery

Issues in bottom up parsing
• How do we know which action to take
– Whether to shift or reduce
– Which production to use for reduction?
• Sometimes parser can reduce but it should not: X → ε can always be reduced!
• Sometimes parser can reduce in different ways!
• Given stack δ and input symbol a, should the parser
– Shift a onto stack (making it δa)
– Reduce by some production A → β assuming that stack has form αβ (making it αA)
– Stack can have many combinations of αβ
– How to keep track of length of β?

Handle
• A string that matches right hand side of a production and whose replacement gives a step in the reverse rightmost derivation
• If S ⇒rm* αAw ⇒rm αβw then β (corresponding to production A → β) in the position following α is a handle of αβw. The string w consists of only terminal symbols
• We only want to reduce handle and not any rhs
• Handle pruning: If β is a handle and A → β is a production then replace β by A
• A rightmost derivation in reverse can be obtained by handle pruning.

Handles …
• Handles always appear at the top of the stack and never inside it
• This makes stack a suitable data structure
• Consider two cases of rightmost derivation to verify the fact that handle appears on the top of the stack
– S ⇒ αAz ⇒ αβByz ⇒ αβγyz
– S ⇒ αBxAz ⇒ αBxyz ⇒ αγxyz
• Bottom up parsing is based on recognizing handles

Handle always appears on the top
Case I: S ⇒ αAz ⇒ αβByz ⇒ αβγyz
stack     input    action
αβγ       yz       reduce by B → γ
αβB       yz       shift y
αβBy      z        reduce by A → βBy
αA        z

Case II: S ⇒ αBxAz ⇒ αBxyz ⇒ αγxyz
stack     input    action
αγ        xyz      reduce by B → γ
αB        xyz      shift x
αBx       yz       shift y
αBxy      z        reduce by A → y
αBxA      z

Conflicts
• The general shift-reduce technique is:
– If there is no handle on the stack then shift
– If there is a handle then reduce
• However, what happens when there is a choice
– What action to take in case both shift and reduce are valid? (shift-reduce conflict)
– Which rule to use for reduction if reduction is possible by more than one rule? (reduce-reduce conflict)
• Conflicts come either because of ambiguous grammars or because the parsing method is not powerful enough

Shift reduce conflict
Consider the grammar E → E+E | E*E | id
and input id+id*id

Choice 1: reduce when the stack holds E+E
stack     input    action
E+E       *id      reduce by E → E+E
E         *id      shift
E*        id       shift
E*id               reduce by E → id
E*E                reduce by E → E*E
E

Choice 2: shift when the stack holds E+E
stack     input    action
E+E       *id      shift
E+E*      id       shift
E+E*id             reduce by E → id
E+E*E              reduce by E → E*E
E+E                reduce by E → E+E
E

Reduce reduce conflict
Consider the grammar
M → R+R | R+c | R
R → c
and input c+c

Choice 1: reduce the second c to R
Stack     input    action
          c+c      shift
c         +c       reduce by R → c
R         +c       shift
R+        c        shift
R+c                reduce by R → c
R+R                reduce by M → R+R
M

Choice 2: reduce R+c directly
Stack     input    action
          c+c      shift
c         +c       reduce by R → c
R         +c       shift
R+        c        shift
R+c                reduce by M → R+c
M

LR parsing

[Figure: the parser reads the input, maintains a stack, consults the parse table (action and goto parts), and produces the output]

• Input contains the input string.
• Stack contains a string of the form S0X1S1X2…XnSn where each Xi is a grammar symbol and each Si is a state.
• Tables contain action and goto parts.
• action table is indexed by state and terminal symbols.
• goto table is indexed by state and non-terminal symbols.

Actions in an LR (shift reduce) parser
• Assume Si is top of stack and ai is current input symbol
• Action[Si,ai] can have four values
1. shift ai to the stack and goto state Sj
2. reduce by a rule
3. accept
4. error

Configurations in LR parser
Stack: S0X1S1X2…XmSm    Input: aiai+1…an$

• If action[Sm,ai] = shift S
  then the configuration becomes
  Stack: S0X1S1…XmSmaiS    Input: ai+1…an$
• If action[Sm,ai] = reduce A → β
  then the configuration becomes
  Stack: S0X1S1…Xm-rSm-rAS    Input: aiai+1…an$
  where r = |β| and S = goto[Sm-r,A]
• If action[Sm,ai] = accept
  then parsing is completed. HALT
• If action[Sm,ai] = error
  then invoke error recovery routine.

LR parsing Algorithm
Initial state: Stack: S0 Input: w$
(S is the state on top of the stack and a is the current input symbol)

Loop {
  if action[S,a] = shift S'
    then push(a); push(S'); ip++
  else if action[S,a] = reduce A → β
    then pop (2*|β|) symbols;
         push(A); push(goto[S'',A])
         (S'' is the state on top of the stack after popping the symbols)
  else if action[S,a] = accept
    then exit
  else error
}

Example
Consider the grammar
E → E+T | T
T → T*F | F
F → ( E ) | id
and its parse table (productions are numbered 1 to 6 in the order listed; sN means shift and go to state N, rN means reduce by production N)

State   id    +     *     (     )     $      E    T    F
0       s5                s4                 1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                 8    2    3
5             r6    r6          r6    r6
6       s5                s4                      9    3
7       s5                s4                           10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5

Parse id + id * id
Stack                     Input        Action
0                         id+id*id$    shift 5
0 id 5                    +id*id$      reduce by F → id
0 F 3                     +id*id$      reduce by T → F
0 T 2                     +id*id$      reduce by E → T
0 E 1                     +id*id$      shift 6
0 E 1 + 6                 id*id$       shift 5
0 E 1 + 6 id 5            *id$         reduce by F → id
0 E 1 + 6 F 3             *id$         reduce by T → F
0 E 1 + 6 T 9             *id$         shift 7
0 E 1 + 6 T 9 * 7         id$          shift 5
0 E 1 + 6 T 9 * 7 id 5    $            reduce by F → id
0 E 1 + 6 T 9 * 7 F 10    $            reduce by T → T*F
0 E 1 + 6 T 9             $            reduce by E → E+T
0 E 1                     $            ACCEPT
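
The LR driver loop and this trace can be reproduced with a short program. This is a sketch under assumptions: the tables are typed in from the parse table for the expression grammar, `lr_parse` is a made-up name, and the stack stores only states (grammar symbols are implied by the states), so a reduction pops |β| entries rather than 2*|β|.

```python
# Production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers used for reductions, in order."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # states only
    ip = 0
    reductions = []
    while True:
        entry = ACTION[stack[-1]].get(tokens[ip])
        if entry is None:
            raise SyntaxError(f"unexpected {tokens[ip]!r} in state {stack[-1]}")
        if entry == "acc":
            return reductions
        if entry[0] == "s":  # shift: push the new state and advance the input
            stack.append(int(entry[1:]))
            ip += 1
        else:  # reduce: pop |rhs| states, then follow the goto entry
            number = int(entry[1:])
            lhs, length = PRODUCTIONS[number]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(number)


print(lr_parse(["id", "+", "id", "*", "id"]))
```

Read in reverse, the list of reductions is a rightmost derivation of the input.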

Parser states
• Goal is to know the valid reductions at any given point
• Summarize all possible stack prefixes α as a parser state
• Parser state is defined by a DFA state that reads in the stack α
• Accept states of DFA are unique reductions

Constructing parse table
Augment the grammar
• G is a grammar with start symbol S
• The augmented grammar G' for G has a new start symbol S' and an additional production S' → S
• When the parser reduces by this rule it will stop with accept

Viable prefixes
• α is a viable prefix of the grammar if
– There is a w such that αw is a right sentential form
– α.w is a configuration of the shift reduce parser
• As long as the parser has viable prefixes on the stack no parser error has been seen
• The set of viable prefixes is a regular language (not obvious)
• Construct an automaton that accepts viable prefixes

LR(0) items
• An LR(0) item of a grammar G is a production of G with a special symbol “.” at some position of the right side
• Thus production A → XYZ gives four LR(0) items
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
• An item indicates how much of a production has been seen at a point in the process of parsing
– Symbols on the left of “.” are already on the stack
– Symbols on the right of “.” are expected in the input

Start state
• Start state of DFA is empty stack corresponding to the S' → .S item
– This means no input has been seen
– The parser expects to see a string derived from S
• Closure of a state adds items for all productions whose LHS occurs in an item in the state, just after “.”
– Set of possible productions to be reduced next
– Added items have “.” located at the beginning
– No symbol of these items is on the stack as yet

Closure operation
• If I is a set of items for a grammar G then closure(I) is a set constructed as follows:
– Every item in I is in closure(I)
– If A → α.Bβ is in closure(I) and B → γ is a production then B → .γ is in closure(I)
• Intuitively A → α.Bβ indicates that we might see a string derivable from Bβ as input
• If B → γ is a production then we might see a string derivable from γ at this point

Example
Consider the grammar
E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id

If I is { E' → .E } then closure(I) is
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .id
F → .(E)
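
The closure operation can be sketched with a worklist. The representation is an assumption made for illustration: an LR(0) item is a triple (lhs, rhs, dot) where `dot` is the position of “.” in the right-hand side, and `closure` and `show` are made-up names.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Closure of a set of LR(0) items, each a (lhs, rhs, dot) triple."""
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        # If the dot stands in front of a non-terminal B, add B -> .gamma
        if len(rhs) > dot and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} → {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


for item in sorted(closure({("E'", ("E",), 0)})):
    print(show(item))
```

For the start item this yields the same seven items as listed above.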

Applying symbols in a state
• In the new state include all the items that have appropriate input symbol just after the “.”
• Advance “.” in those items and take closure

Goto operation
• Goto(I,X), where I is a set of items and X is a grammar symbol,
– is the closure of the set of items A → αX.β
– such that A → α.Xβ is in I
• Intuitively if I is the set of valid items for some viable prefix α then goto(I,X) is the set of valid items for prefix αX
• If I is { E' → E. , E → E. + T } then goto(I,+) is
E → E + .T
T → .T * F
T → .F
F → .(E)
F → .id

Sets of items
C : Collection of sets of LR(0) items for grammar G'

C = { closure ( { S' → .S } ) }
repeat
  for each set of items I in C
  and each grammar symbol X
  such that goto(I,X) is not empty and not in C
    ADD goto(I,X) to C
until no more additions
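
The collection algorithm can be sketched on top of closure and goto. This is an illustrative sketch: items are (lhs, rhs, dot) triples as an assumed representation, `canonical_collection` is a made-up name, and the order of `SYMBOLS` is chosen only so that the states come out numbered like I0 to I11 in the example that follows.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if len(rhs) > dot and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Advance the dot over `symbol` wherever possible, then take the closure."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if len(rhs) > dot and rhs[dot] == symbol}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    collection = [closure({("E'", ("E",), 0)})]
    transitions = {}
    # The list grows while we iterate; the loop also visits the appended states.
    for i, state in enumerate(collection):
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if not target:
                continue
            if target not in collection:
                collection.append(target)
            transitions[(i, symbol)] = collection.index(target)
    return collection, transitions


states, transitions = canonical_collection()
print(len(states), "states")
for (i, symbol), j in sorted(transitions.items()):
    print(f"goto(I{i}, {symbol}) = I{j}")
```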

Example
Grammar:
E' → E
E → E+T | T
T → T*F | F
F → (E) | id

I0: closure(E' → .E)
    E' → .E
    E → .E + T
    E → .T
    T → .T * F
    T → .F
    F → .(E)
    F → .id

I1: goto(I0,E)
    E' → E.
    E → E. + T

I2: goto(I0,T)
    E → T.
    T → T. * F

I3: goto(I0,F)
    T → F.

I4: goto(I0,( )
    F → (.E)
    E → .E + T
    E → .T
    T → .T * F
    T → .F
    F → .(E)
    F → .id

I5: goto(I0,id)
    F → id.

I6: goto(I1,+)
    E → E + .T
    T → .T * F
    T → .F
    F → .(E)
    F → .id

I7: goto(I2,*)
    T → T * .F
    F → .(E)
    F → .id

I8: goto(I4,E)
    F → (E.)
    E → E. + T

    goto(I4,T) is I2
    goto(I4,F) is I3
    goto(I4,( ) is I4
    goto(I4,id) is I5

I9: goto(I6,T)
    E → E + T.
    T → T. * F

    goto(I6,F) is I3
    goto(I6,( ) is I4
    goto(I6,id) is I5

I10: goto(I7,F)
    T → T * F.

    goto(I7,( ) is I4
    goto(I7,id) is I5

I11: goto(I8,) )
    F → (E).

    goto(I8,+) is I6
    goto(I9,*) is I7

[Figures: three views of the DFA over the item sets I0 to I11: transitions on terminals only, transitions on non-terminals only, and the complete automaton. The edges are the goto relations listed with the item sets.]

Construct SLR parse table
• Construct C = {I0, …, In}, the collection of sets of LR(0) items
• If A → α.aβ is in Ii and goto(Ii,a) = Ij
  then action[i,a] = shift j
• If A → α. is in Ii
  then action[i,a] = reduce A → α for all a in follow(A)
• If S' → S. is in Ii then action[i,$] = accept
• If goto(Ii,A) = Ij
  then goto[i,A] = j for all non-terminals A
• All entries not defined are errors

Notes
• This method of parsing is called SLR (Simple LR)
• LR parsers accept LR(k) languages
– L stands for left to right scan of input
– R stands for rightmost derivation
– k stands for the number of lookahead tokens
• SLR is the simplest of the LR parsing methods. It is too weak to handle most languages!
• If an SLR parse table for a grammar does not have multiple entries in any cell then the grammar is unambiguous
• All SLR grammars are unambiguous
• Are all unambiguous grammars in SLR?

Assignment
Construct SLR parse table for following grammar
E → E + E | E - E | E * E | E / E | ( E ) | digit
Show steps in parsing of string
9*5+(2+3*7)
• Steps to be followed
– Augment the grammar
– Construct set of LR(0) items
– Construct the parse table
– Show states of parser as the given string is parsed
• Due on todate+5

Example
• Consider following grammar and its SLR parse table:
S' → S
S → L=R
S → R
L → *R
L → id
R → L

I0: S' → .S
    S → .L=R
    S → .R
    L → .*R
    L → .id
    R → .L

I1: goto(I0, S)
    S' → S.

I2: goto(I0, L)
    S → L.=R
    R → L.

Assignment (not to be submitted): Construct rest of the items and the parse table.

SLR parse table for the grammar
(productions are numbered 1 to 6 in the order listed, starting with S' → S)

State   =        *     id    $      S    L    R
0                s4    s5           1    2    3
1                            acc
2       s6,r6                r6
3                            r3
4                s4    s5                8    7
5       r5                   r5
6                s4    s5                8    9
7       r4                   r4
8       r6                   r6
9                            r2

The table has multiple entries in action[2,=]

• There is both a shift and a reduce entry in action[2,=]. Therefore state 2 has a shift-reduce conflict on symbol “=”. However, the grammar is not ambiguous.
• Parse id=id assuming reduce action is taken in [2,=]

Stack            input     action
0                id=id$    shift 5
0 id 5           =id$      reduce by L → id
0 L 2            =id$      reduce by R → L
0 R 3            =id$      error

• If shift action is taken in [2,=]

Stack              input     action
0                  id=id$    shift 5
0 id 5             =id$      reduce by L → id
0 L 2              =id$      shift 6
0 L 2 = 6          id$       shift 5
0 L 2 = 6 id 5     $         reduce by L → id
0 L 2 = 6 L 8      $         reduce by R → L
0 L 2 = 6 R 9      $         reduce by S → L=R
0 S 1              $         ACCEPT

Problems in SLR parsing
• No sentential form of this grammar can start with R=…
• However, the reduce action in action[2,=] generates a sentential form starting with R=
• Therefore, the reduce action is incorrect
• In SLR parsing method state i calls for reduction on symbol “a”, by rule A → α, if Ii contains [A → α.] and “a” is in follow(A)
• However, when state i appears on the top of the stack, the viable prefix βα on the stack may be such that βA cannot be followed by symbol “a” in any right sentential form
• Thus, the reduction by the rule A → α on symbol “a” is invalid
• SLR parsers cannot remember the left context

Canonical LR Parsing
• Carry extra information in the state so that wrong reductions by A → α will be ruled out
• Redefine LR items to include a terminal symbol as a second component (look ahead symbol)
• The general form of the item becomes [A → α.β, a] which is called LR(1) item.
• Item [A → α., a] calls for reduction only if next input is a. The set of symbols “a” will be a subset of Follow(A).

Closure(I)

repeat
  for each item [A → α.Bβ, a] in I
    for each production B → γ in G'
    and for each terminal b in First(βa)
      add item [B → .γ, b] to I
until no more additions to I
Example
Consider the following grammar

S′ → S
S → CC
C → cC | d

Compute closure(I) where I = {[S′ → .S, $]}

S′ → .S, $
S → .CC, $
C → .cC, c
C → .cC, d
C → .d, c
C → .d, d
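
The closure computation in this example can be sketched in Python. This is an illustrative sketch under a stated assumption: the grammar has no ε-productions, so First(βa) is simply First of the first symbol of βa. The names `GRAMMAR`, `first_sets` and `closure`, and the tuple encoding of items as (lhs, rhs, dot position, lookahead), are choices made here and do not come from the slides.

```python
# Augmented grammar of the example: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def first_sets(grammar):
    """First sets of the nonterminals (assumes no epsilon productions)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def closure(items, grammar, first):
    """LR(1) closure of a set of items (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, lookahead = work.pop()
        if dot == len(rhs) or rhs[dot] not in grammar:
            continue                      # dot at the end or before a terminal
        beta_a = rhs[dot + 1:] + (lookahead,)
        head = beta_a[0]
        lookaheads = first[head] if head in grammar else {head}
        for body in grammar[rhs[dot]]:
            for b in lookaheads:
                item = (rhs[dot], body, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


FIRST = first_sets(GRAMMAR)
I0 = closure({("S'", ("S",), 0, "$")}, GRAMMAR, FIRST)
for item in sorted(I0):
    print(item)
```

The resulting set contains the six items listed in the example.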

Example
Construct sets of LR(1) items for the grammar on the
previous slide

I0: S′ → .S, $
    S → .CC, $
    C → .cC, c/d
    C → .d, c/d

I1: goto(I0,S)
    S′ → S., $

I2: goto(I0,C)
    S → C.C, $
    C → .cC, $
    C → .d, $

I3: goto(I0,c)
    C → c.C, c/d
    C → .cC, c/d
    C → .d, c/d

I4: goto(I0,d)
    C → d., c/d

I5: goto(I2,C)
    S → CC., $

I6: goto(I2,c)
    C → c.C, $
    C → .cC, $
    C → .d, $

I7: goto(I2,d)
    C → d., $

I8: goto(I3,C)
    C → cC., c/d

I9: goto(I6,C)
    C → cC., $
Construction of Canonical LR parse table
• Construct C = {I0, …, In}, the sets of LR(1) items.

• If [A → α.aβ, b] is in Ii and goto(Ii, a) = Ij
then action[i,a] = shift j

• If [A → α., a] is in Ii (A ≠ S′)
then action[i,a] = reduce A → α

• If [S′ → S., $] is in Ii
then action[i,$] = accept

• If goto(Ii, A) = Ij then goto[i,A] = j for all
nonterminals A
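
The construction rules above can be tried out on the grammar S′ → S, S → CC, C → cC | d. The sketch below is one possible arrangement, not the slides' own algorithm text: items are encoded as (production index, dot position, lookahead), the First sets are written out by hand in `FIRST` (valid because this grammar has no ε-productions), and all function names are hypothetical. States are discovered breadth-first over the symbols in the order S, C, c, d.

```python
# Productions: index 0 is the augmenting production S' -> S.
PRODS = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMS = ["S", "C"]
TERMS = ["c", "d"]
# First sets written out by hand for this grammar (assumption: no epsilon rules).
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}, "$": {"$"}}


def closure(items):
    """LR(1) closure; an item is (production index, dot, lookahead)."""
    items = set(items)
    while True:
        new = set()
        for (p, dot, la) in items:
            rhs = PRODS[p][1]
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                beta_a = rhs[dot + 1:] + (la,)
                for q, (lhs, _) in enumerate(PRODS):
                    if lhs == rhs[dot]:
                        new |= {(q, 0, b) for b in FIRST[beta_a[0]]}
        if new <= items:
            return frozenset(items)
        items |= new


def goto(items, symbol):
    """Move the dot over symbol in every item that allows it."""
    moved = {(p, dot + 1, la) for (p, dot, la) in items
             if dot < len(PRODS[p][1]) and PRODS[p][1][dot] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    """Return the list of LR(1) item sets and the goto edges between them."""
    states, edges, i = [closure({(0, 0, "$")})], {}, 0
    while i < len(states):
        for symbol in NONTERMS + TERMS:
            target = goto(states[i], symbol)
            if target is not None:
                if target not in states:
                    states.append(target)
                edges[(i, symbol)] = states.index(target)
        i += 1
    return states, edges


def build_table():
    """Fill action and goto tables following the four construction rules."""
    states, edges = canonical_collection()
    action, goto_table = {}, {}
    for (i, symbol), j in edges.items():
        if symbol in TERMS:
            action[(i, symbol)] = "s%d" % j      # shift j
        else:
            goto_table[(i, symbol)] = j
    for i, state in enumerate(states):
        for (p, dot, la) in state:
            if dot == len(PRODS[p][1]):          # completed item
                entry = "acc" if p == 0 else "r%d" % p
                if action.setdefault((i, la), entry) != entry:
                    raise ValueError("conflict in action[%d, %s]" % (i, la))
    return action, goto_table


action, goto_table = build_table()
for key in sorted(action):
    print(key, action[key])
```

With this discovery order the state numbers coincide with I0 to I9 as listed for this grammar, and no entry is filled twice, so the grammar is LR(1).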
Parse table
State   c     d     $     S   C
0       s3    s4    -     1   2
1       -     -     acc   -   -
2       s6    s7    -     -   5
3       s3    s4    -     -   8
4       r3    r3    -     -   -
5       -     -     r1    -   -
6       s6    s7    -     -   9
7       -     -     r3    -   -
8       r2    r2    -     -   -
9       -     -     r2    -   -
(- marks an empty entry)
Notes on Canonical LR Parser
• Consider the grammar discussed in the previous two
slides. The language specified by the grammar is
c*dc*d.

• When reading input cc…dcc…d the parser shifts the c's
onto the stack and then goes into state 4 after reading d.
It then calls for reduction by C → d if the following symbol
is c or d.

• If $ follows the first d then the input string is c*d, which
is not in the language; the parser declares an error

• On an error, a canonical LR parser never makes a
wrong shift/reduce move. It immediately declares an
error

• Problem: the canonical LR parse table has a large
number of states
LALR Parse table
• Look Ahead LR parsers

• Consider a pair of similar looking states
(same kernel and different lookaheads) in
the set of LR(1) items
I4: C → d., c/d        I7: C → d., $

• Replace I4 and I7 by a new state I47
consisting of
(C → d., c/d/$)

• Similarly I3 & I6 and I8 & I9 form pairs

• Merge LR(1) items having the same core
Construct LALR parse table
• Construct C = {I0, …, In}, the set of LR(1) items

• For each core present in the LR(1) items, find all sets
having the same core and replace these sets by
their union

• Let C' = {J0, …, Jm} be the resulting set of items

• Construct the action table as was done earlier

• Let J = I1 ∪ I2 ∪ … ∪ Ik.
Since I1, I2, …, Ik have the same core, goto(J,X) will
have the same core.
Let K = goto(I1,X) ∪ goto(I2,X) ∪ … ∪ goto(Ik,X); then
goto(J,X) = K
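
The merging step can be sketched as follows, starting from the ten LR(1) item sets of the grammar S′ → S, S → CC, C → cC | d. This is an illustration only: each state is written by hand as a mapping from an LR(0) item (the core part) to a string whose characters are its lookaheads, and the names `LR1_STATES`, `LR1_GOTO`, `merge_by_core` and `merge_goto` are hypothetical. As noted in the slides on LALR construction, practical generators do not go through the full canonical collection.

```python
# LR(1) states I0..I9: LR(0) item -> lookahead characters.
LR1_STATES = {
    0: {"S' -> .S": "$", "S -> .CC": "$", "C -> .cC": "cd", "C -> .d": "cd"},
    1: {"S' -> S.": "$"},
    2: {"S -> C.C": "$", "C -> .cC": "$", "C -> .d": "$"},
    3: {"C -> c.C": "cd", "C -> .cC": "cd", "C -> .d": "cd"},
    4: {"C -> d.": "cd"},
    5: {"S -> CC.": "$"},
    6: {"C -> c.C": "$", "C -> .cC": "$", "C -> .d": "$"},
    7: {"C -> d.": "$"},
    8: {"C -> cC.": "cd"},
    9: {"C -> cC.": "$"},
}
# goto function of the LR(1) automaton (terminals and nonterminals).
LR1_GOTO = {(0, "S"): 1, (0, "C"): 2, (0, "c"): 3, (0, "d"): 4,
            (2, "C"): 5, (2, "c"): 6, (2, "d"): 7,
            (3, "C"): 8, (3, "c"): 3, (3, "d"): 4,
            (6, "C"): 9, (6, "c"): 6, (6, "d"): 7}


def merge_by_core(states):
    """Union all states with the same core; return merged states and renaming."""
    groups = {}                           # core -> LR(1) state numbers
    for number in sorted(states):
        groups.setdefault(frozenset(states[number]), []).append(number)
    merged, rename = {}, {}
    for core, numbers in groups.items():
        name = "".join(str(n) for n in numbers)   # e.g. 4 and 7 become "47"
        merged[name] = {item: set().union(*(states[n][item] for n in numbers))
                        for item in core}
        for n in numbers:
            rename[n] = name
    return merged, rename


def merge_goto(goto, rename):
    """goto(J, X) is the merged state containing goto(Ii, X) for each Ii in J."""
    return {(rename[i], x): rename[j] for (i, x), j in goto.items()}


merged, rename = merge_by_core(LR1_STATES)
merged_goto = merge_goto(LR1_GOTO, rename)
print(list(merged))
print(merged["47"])
```

The dictionary comprehension in `merge_goto` silently relies on the property stated above: all members of a merged state have gotos with the same core, so they map to the same merged target.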
LALR parse table …
State   c     d     $     S   C
0       s36   s47   -     1   2
1       -     -     acc   -   -
2       s36   s47   -     -   5
36      s36   s47   -     -   89
47      r3    r3    r3    -   -
5       -     -     r1    -   -
89      r2    r2    r2    -   -
(- marks an empty entry)
Notes on LALR parse table
• The modified parser behaves as the original except
that it will reduce C → d on inputs like ccd.
The error will eventually be caught before
any more symbols are shifted.

• In general, a core is a set of LR(0) items, and an
LR(1) grammar may produce more than
one set of items with the same core.

• Merging items never produces
shift/reduce conflicts but may produce
reduce/reduce conflicts.

• SLR and LALR parse tables have the same number of
states.
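
The first note can be checked by running the same driver over the canonical LR table and the LALR table for S → CC, C → cC | d. The sketch below is illustrative: the table entries are copied from the two parse tables for this grammar, while `run` and the dictionary names are chosen here. The stack holds state numbers only.

```python
RULES = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}   # S -> CC, C -> cC, C -> d

CLR_ACTION = {
    (0, "c"): "s3", (0, "d"): "s4", (1, "$"): "acc",
    (2, "c"): "s6", (2, "d"): "s7", (3, "c"): "s3", (3, "d"): "s4",
    (4, "c"): "r3", (4, "d"): "r3", (5, "$"): "r1",
    (6, "c"): "s6", (6, "d"): "s7", (7, "$"): "r3",
    (8, "c"): "r2", (8, "d"): "r2", (9, "$"): "r2",
}
CLR_GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}

LALR_ACTION = {
    (0, "c"): "s36", (0, "d"): "s47", (1, "$"): "acc",
    (2, "c"): "s36", (2, "d"): "s47", (36, "c"): "s36", (36, "d"): "s47",
    (47, "c"): "r3", (47, "d"): "r3", (47, "$"): "r3", (5, "$"): "r1",
    (89, "c"): "r2", (89, "d"): "r2", (89, "$"): "r2",
}
LALR_GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (36, "C"): 89}


def run(text, action, goto):
    """Parse text; return (accepted, number of shifts, number of reductions)."""
    tokens = list(text) + ["$"]
    stack, pos, shifts, reductions = [0], 0, 0, 0
    while True:
        entry = action.get((stack[-1], tokens[pos]))
        if entry is None:                          # empty entry: error
            return False, shifts, reductions
        if entry == "acc":
            return True, shifts, reductions
        if entry[0] == "s":
            stack.append(int(entry[1:]))
            pos += 1
            shifts += 1
        else:
            lhs, size = RULES[int(entry[1:])]
            del stack[-size:]                      # pop one state per rhs symbol
            stack.append(goto[(stack[-1], lhs)])
            reductions += 1


print(run("ccd", CLR_ACTION, CLR_GOTO))
print(run("ccd", LALR_ACTION, LALR_GOTO))
```

On the erroneous input ccd both parsers shift the same three symbols; the canonical LR parser stops without any reduction, while the LALR parser first performs reductions and only then reports the error.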

Notes on LALR parse table…
• Merging items may result in conflicts in LALR
parsers which did not exist in LR parsers

• New conflicts cannot be of the shift-reduce kind:

– Assume there is a shift-reduce conflict in some state
of the LALR parser with items
{[X → α., a], [Y → γ.aβ, b]}
– Then there must have been a state in the LR parser
with the same core, containing [X → α., a] and
[Y → γ.aβ, c] for some c, which already has a
shift-reduce conflict on a
– Contradiction, because the LR parser did not have
conflicts

• An LALR parser can have new reduce-reduce conflicts

– Assume states
{[X → α., a], [Y → β., b]} and {[X → α., b], [Y → β., a]}
– Merging the two states produces
{[X → α., a/b], [Y → β., a/b]}
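
The reduce-reduce case can be made concrete with a few lines of Python. This is only a sketch: each state is modelled as a set of (completed production, lookahead) pairs, and the helper name `reduce_conflicts` is hypothetical.

```python
def reduce_conflicts(state):
    """Return the lookaheads on which more than one production would reduce."""
    by_lookahead = {}
    for production, lookahead in state:
        by_lookahead.setdefault(lookahead, set()).add(production)
    return {la for la, prods in by_lookahead.items() if len(prods) > 1}


# Two LR(1) states with the same core {X -> alpha., Y -> beta.}.
state_1 = {("X -> alpha.", "a"), ("Y -> beta.", "b")}
state_2 = {("X -> alpha.", "b"), ("Y -> beta.", "a")}
merged = state_1 | state_2          # LALR merge: union of the item sets

print(reduce_conflicts(state_1), reduce_conflicts(state_2))
print(sorted(reduce_conflicts(merged)))
```

Neither original state has a conflict, but the merged state calls for both reductions on a and on b.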
Notes on LALR parse table…
• LALR parsers are not built by first making
canonical LR parse tables

• There are direct, complicated but efficient
algorithms to develop LALR parsers

• Relative power of various classes

– SLR(1) ≤ LALR(1) ≤ LR(1)

– SLR(k) ≤ LALR(k) ≤ LR(k)

– LL(k) ≤ LR(k)
Error Recovery
• An error is detected when an entry in the action
table is found to be empty.

• Panic mode error recovery can be implemented as
follows:

– scan down the stack until a state S with a goto on a
particular nonterminal A is found.

– discard zero or more input symbols until a symbol a
is found that can legitimately follow A.

– stack the state goto[S,A] and resume parsing.

• Choice of A: Normally these are nonterminals
representing major program pieces such as an
expression, statement or a block. For example, if A
is the nonterminal stmt, a might be semicolon or
end.
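
One panic-mode recovery step can be sketched as a function over a stack of state numbers. The sketch is illustrative only: the function name and signature are hypothetical, and the example data reuse the goto entries of the grammar S → L=R | R, L → *R | id, R → L with A = R and Follow(R) = {=, $}. A real parser would also guard against repeating the same recovery without consuming input.

```python
def panic_mode_recover(stack, tokens, pos, goto, nonterminal, follow):
    """Perform one recovery step; return (new_stack, new_pos) or None."""
    stack = list(stack)
    # 1. Scan down the stack for a state with a goto on the nonterminal.
    while stack and (stack[-1], nonterminal) not in goto:
        stack.pop()
    if not stack:
        return None
    # 2. Discard input symbols until one that can follow the nonterminal.
    while pos < len(tokens) and tokens[pos] not in follow:
        pos += 1
    if pos == len(tokens):
        return None
    # 3. Push goto[S, A]; the caller resumes normal parsing from here.
    stack.append(goto[(stack[-1], nonterminal)])
    return stack, pos


# goto entries of the SLR table for S -> L=R | R, L -> *R | id, R -> L.
GOTO = {(0, "S"): 1, (0, "L"): 2, (0, "R"): 3,
        (4, "L"): 8, (4, "R"): 7, (6, "L"): 8, (6, "R"): 9}
FOLLOW_R = {"=", "$"}

# Erroneous input "id = id id": the error is detected in state 5 on the
# second consecutive id (token index 3), with states 0 2 6 5 on the stack.
tokens = ["id", "=", "id", "id", "$"]
print(panic_mode_recover([0, 2, 6, 5], tokens, 3, GOTO, "R", FOLLOW_R))
```

In this example state 5 is popped, state 6 has a goto on R, the extra id is discarded, and parsing can resume in state 9 with $ as the next input.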
Parser Generator
• Some common parser generators

– YACC: Yet Another Compiler Compiler
– Bison: GNU Software
– ANTLR: ANother Tool for Language Recognition

• Yacc/Bison source program specification
(accepts LALR grammars)

declaration
%%
translation rules
%%
supporting C routines
Yacc and Lex schema
Token specifications → Lex → C code for lexical analyzer (lex.yy.c)
Grammar specifications → Yacc → C code for parser (y.tab.c)
lex.yy.c, y.tab.c → C Compiler → Object code
Input program → Parser → Abstract Syntax tree

Refer to YACC Manual
