Compiler Techniques - Syntax analyzer (Parser)
Data · December 2015
DOI: 10.13140/RG.2.1.2123.9127
1 author:
Qasim Mohammed Hussein
Al-Kunooze University College
All content following this page was uploaded by Qasim Mohammed Hussein on 02 December 2015.
Dr. Qasim Mohammed Hussein Syntax analyzer
2. Syntax Analyzer
The syntax analyzer is also called the parser. The parser is the program
that determines whether a string of tokens, produced by the lexical
analyzer, can be generated by a grammar. The output of the parser is the
parse tree.
[Figure: A model of a compiler front end]
The process of finding a parse tree for a given string of tokens is
called parsing that string.
The syntax of a programming language describes the proper form of its
programs, while the semantics of the language defines what its programs
mean; that is, what each program does when it executes.
To specify the syntax of a language, we use a context-free grammar (CFG),
or Backus-Naur Form (BNF).
Context Free Grammar (CFG)
A context-free grammar has four components:
1. A set of terminal symbols (T), also known as tokens, such as +, *, 5,
and while.
2. A set of nonterminals (NT), each of which represents a set of
sequences of tokens.
3. A set of productions, where each production consists of a
nonterminal (the left side of the production), an arrow, and a
sequence of terminals and/or nonterminals (the right side of the
production).
4. A designation of one of the nonterminals as the start symbol.
For example, an if-else statement in C++ can have the form
if ( expression ) statement else statement
The production is
stmt → if ( expr ) stmt else stmt
The arrow "→" may be read as "can have the form". Variables such as expr
and stmt represent sequences of tokens and are called nonterminals.
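The four components can be held in a small data structure. The following is a minimal sketch in Python; the class name Grammar, its field names, and the helper bodies() are illustrative assumptions and do not come from these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T: the tokens
    nonterminals: frozenset   # NT: the syntactic variables
    productions: tuple        # pairs (head, body); body is a tuple of symbols
    start: str                # the designated start symbol

    def bodies(self, head):
        """Return every production body whose left side is `head`."""
        return [body for h, body in self.productions if h == head]


# The if-else production from the text: stmt -> if ( expr ) stmt else stmt
# (productions for expr are omitted here, as they are in the text).
if_grammar = Grammar(
    terminals=frozenset({"if", "else", "(", ")"}),
    nonterminals=frozenset({"stmt", "expr"}),
    productions=(("stmt", ("if", "(", "expr", ")", "stmt", "else", "stmt")),),
    start="stmt",
)

print(if_grammar.bodies("stmt")[0])
```

Storing each body as a tuple of symbols keeps multi-character tokens such as if and else intact, which a plain string would not.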
A CFG is capable of describing most, but not all, of the syntax of a
programming language.
Any syntactic construct that can be described by a regular
expression can also be described by a CFG.
Example 1: The regular expression (a|b)*abb can be described by
the following CFG:
A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
The string of zero terminals, written as ε, is called the empty string.
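The claim in Example 1 can be checked mechanically for short strings. The sketch below is an illustration only: the table RULES and the function generated_by_grammar are names chosen here. It tracks which nonterminals the grammar could be "in" after each input symbol and compares the result with Python's re module.

```python
import re
from itertools import product

# Right-linear grammar of Example 1: each body is (terminal, next nonterminal).
# A3 -> epsilon is modelled by treating A3 as the accepting nonterminal.
RULES = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [],
}


def generated_by_grammar(s):
    """True if the grammar can derive s; tracks all reachable nonterminals."""
    current = {"A0"}
    for ch in s:
        current = {nxt for nt in current for (t, nxt) in RULES[nt] if t == ch}
    return "A3" in current


# Compare grammar and regular expression on every string over {a, b}
# of length 0 to 6.
pattern = re.compile(r"(a|b)*abb")
for n in range(7):
    for letters in product("ab", repeat=n):
        s = "".join(letters)
        assert generated_by_grammar(s) == bool(pattern.fullmatch(s))

print(generated_by_grammar("babb"), generated_by_grammar("abba"))  # True False
```

An exhaustive check up to a fixed length is not a proof of equivalence, but it catches a mistyped production quickly.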
Regular expressions (R.E.) are most useful for describing the structure
of lexical constructs such as identifiers, constants, keywords and so on.
CFGs are useful for describing nested structures such as balanced
parentheses and if-then-else. Nested structure cannot be described by a
regular expression.
To show that a CFG generates a given language, we must show that:
- Every string generated by the grammar is in the language.
- Every string in the language can be generated by the grammar.
There are several ways to view the process by which a grammar defines a
language, such as derivations and parse trees.
Derivations
A derivation is a sequence of replacements. A grammar derives strings
by beginning with the start symbol and repeatedly replacing a
nonterminal by the body of a production for that nonterminal. Each step
performs exactly one replacement. The strings of terminals that can be
derived from the start symbol form the language defined by the grammar.
E ⇒ -E is read "E derives -E".
Example 2: Consider the grammar S → AbB | d; A → m | aS; B → cA | b.
Derive the string adbcm.
Answer
S ⇒ AbB
  ⇒ aSbB
  ⇒ adbB
  ⇒ adbcA
  ⇒ adbcm
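A derivation like the one in Example 2 can also be found by search. The following is a minimal sketch, not an efficient parser: it performs a breadth-first search over leftmost derivations, and the names GRAMMAR and find_derivation are assumptions made for this illustration.

```python
from collections import deque

# Grammar of Example 2; uppercase letters are nonterminals.
GRAMMAR = {"S": ["AbB", "d"], "A": ["m", "aS"], "B": ["cA", "b"]}


def find_derivation(target, start="S"):
    """Breadth-first search for a leftmost derivation of `target`.

    Returns the list of sentential forms, or None if there is none.
    Pruning by length is safe here because no body is empty, so
    sentential forms never shrink.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # Position of the leftmost nonterminal, if any.
        pos = next((i for i, c in enumerate(form) if c in GRAMMAR), None)
        if pos is None:
            continue
        # Terminals left of that nonterminal are final; they must match.
        if not target.startswith(form[:pos]):
            continue
        for body in GRAMMAR[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(path + [new])
    return None


print(" => ".join(find_derivation("adbcm")))
# S => AbB => aSbB => adbB => adbcA => adbcm
```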
There are two ways to do the derivation: leftmost derivation and
rightmost derivation.
In a leftmost derivation, the leftmost nonterminal in each sentential
form is always chosen. In a rightmost derivation, the rightmost
nonterminal is always chosen.
Example 3: Consider the grammar E → -E | (E) | E+E | E-E | id. Find the
leftmost derivation and the rightmost derivation of the string -(id + id).
Answer
Leftmost derivation        Rightmost derivation
E ⇒ -E                     E ⇒ -E
  ⇒ -(E)                     ⇒ -(E)
  ⇒ -(E + E)                 ⇒ -(E + E)
  ⇒ -(id + E)                ⇒ -(E + id)
  ⇒ -(id + id)               ⇒ -(id + id)
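The difference between the two derivation orders can be made concrete with a short sketch. The production names in BODIES and the function derive are assumptions made for this illustration; the same sequence of production choices is applied once to the leftmost and once to the rightmost E.

```python
# Grammar of Example 3, bodies as token lists. "E" is the only nonterminal.
BODIES = {
    "neg": ["-", "E"],
    "paren": ["(", "E", ")"],
    "plus": ["E", "+", "E"],
    "minus": ["E", "-", "E"],
    "id": ["id"],
}


def derive(choices, leftmost=True):
    """Apply the named productions in order, always expanding the leftmost
    (or rightmost) occurrence of E, and return every sentential form."""
    form = ["E"]
    steps = [" ".join(form)]
    for name in choices:
        positions = [i for i, sym in enumerate(form) if sym == "E"]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + BODIES[name] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


choices = ["neg", "paren", "plus", "id", "id"]
left = derive(choices, leftmost=True)
right = derive(choices, leftmost=False)
for l, r in zip(left, right):
    print(f"{l:<16}{r}")
```

Both runs end in the same string; they differ only in the step where one E of E + E has been replaced and the other has not.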
Parse tree
A parse tree shows pictorially how the start symbol of a grammar derives
a string in the language. The properties of a parse tree are:
1. The root is labeled by the start symbol.
2. Each leaf is labeled by a token or by ε.
3. Each interior node is labeled by a nonterminal.
4. If A is the nonterminal labeling some interior node and X1, X2, …, Xn
are the labels of its children from left to right, then
A → X1 X2 … Xn is a production. For example, if a nonterminal A has a
production A → XYZ, then a parse tree may contain the node
      A
    / | \
   X  Y  Z
Notes
1. Each node in the tree is labeled by a grammar symbol. An interior
node corresponds to the left side of a production, and its children
correspond to the right side.
2. The parse tree ignores the order in which symbols are replaced.
Example 4: Consider the grammar E → -E | (E) | E+E | E-E | id.
Find the parse tree of -(id + id).
Solution
[Figure: parse tree of -(id + id)]
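The parse tree of Example 4 can also be written as a nested data structure. The sketch below is one possible encoding, chosen for this illustration: an interior node is a tuple whose first element is the nonterminal label, and a leaf is a plain string. The names TREE, tree_yield and productions_used are hypothetical.

```python
# A parse tree as nested tuples: (label, child, child, ...); a leaf is a string.
# Tree for -(id + id) under E -> -E | (E) | E+E | E-E | id
TREE = ("E", "-",
        ("E", "(",
         ("E", ("E", "id"), "+", ("E", "id")),
         ")"))


def tree_yield(node):
    """Collect the leaf labels from left to right."""
    if isinstance(node, str):
        return [node]
    leaves = []
    for child in node[1:]:
        leaves.extend(tree_yield(child))
    return leaves


def productions_used(node):
    """List the production applied at each interior node, root first."""
    if isinstance(node, str):
        return []
    body = [c if isinstance(c, str) else c[0] for c in node[1:]]
    used = [(node[0], tuple(body))]
    for child in node[1:]:
        used.extend(productions_used(child))
    return used


print(" ".join(tree_yield(TREE)))      # - ( id + id )
for head, body in productions_used(TREE):
    print(head, "->", " ".join(body))
```

Reading the leaves from left to right gives back the input string, and each interior node together with its children spells out one production, which is property 4 of the list above.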
Example 5: Consider the grammar
S → d | AbDe;  A → ba | ab;  D → mb | nA.
Find the parse tree of abbnde.
Solution: The parse tree is
[Figure: parse tree of abbnde]
Ambiguity
A grammar that produces more than one parse tree for some string of
tokens is said to be ambiguous. Ambiguity is a problem in compiling
applications because a string with more than one parse tree has more
than one meaning.
Example 6: Consider the grammar E → E A E | (E) | -E | id. The string
id+id*id has two distinct leftmost derivations.
[Figure: two leftmost derivations of id+id*id]
Example 7: Consider the grammar
string → string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
and the string 9 - 5 + 2. Is this grammar ambiguous?
Answer
Since we can construct two parse trees for this string, one grouping it
as (9 - 5) + 2 and the other as 9 - (5 + 2), the grammar is ambiguous.
[Figure: two parse trees for 9 - 5 + 2]
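The number of parse trees for a string under the grammar of Example 7 can be counted directly. The sketch below is an illustration with a hypothetical function name, count_parse_trees; it tries every operator as the root of the tree and multiplies the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under
    string -> string + string | string - string | digit."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from `string`.
        if j - i == 1:
            return 1 if tokens[i].isdigit() else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in "+-":      # operator at the root of the tree
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("9-5+2"))    # 2
print(count_parse_trees("9-5+2-1"))  # 5
```

Any count above 1 for some string is enough to show that the grammar is ambiguous.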
Eliminating the ambiguity
Sometimes an ambiguous grammar can be rewritten to eliminate
the ambiguity. For arithmetic operators, we can disambiguate
the grammar by specifying the associativity and precedence of the
operators.
For associativity, we take the exponentiation operator ↑ to be right
associative and the other operators to be left associative. Operators on
the same line below have the same associativity and precedence.
The precedence of the operators in decreasing order is
- (unary minus)
↑
/, *
+, -
For example:
a ↑ b ↑ c is (a ↑ (b ↑ c))
a - b + c is ((a - b) + c)
a + -b ↑ c + d*e is (a + ((-b) ↑ c)) + (d*e)
We can also build the associativity and precedence rules into the
grammar itself by introducing one nonterminal for each precedence
level. For example, the grammar for expressions becomes
Expression → Expression + Term | Expression - Term | Term
Term → Term * Factor | Term / Factor | Factor
Factor → Primary ↑ Factor | Primary
Primary → - Primary | Element
Element → (Expression) | id
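The effect of the layered grammar can be seen by parsing with it. The following is a minimal recursive-descent sketch, assuming the exponentiation operator is typed as ^ and that identifiers are plain names; the function parenthesize and its helpers are hypothetical. The left-recursive rules are implemented as loops, which is the usual way to keep left associativity in a top-down parser.

```python
import re


def parenthesize(text):
    """Parse `text` with the layered grammar; return it fully parenthesized."""
    # Characters outside this token set are silently skipped in this sketch.
    tokens = re.findall(r"[A-Za-z_]\w*|[-+*/^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {tok!r}")
        pos += 1
        return tok

    def expression():            # Expression -> Expression (+|-) Term | Term
        node = term()
        while peek() in ("+", "-"):   # the loop makes + and - left associative
            op = take()
            node = f"({node}{op}{term()})"
        return node

    def term():                  # Term -> Term (*|/) Factor | Factor
        node = factor()
        while peek() in ("*", "/"):
            op = take()
            node = f"({node}{op}{factor()})"
        return node

    def factor():                # Factor -> Primary ^ Factor | Primary
        node = primary()
        if peek() == "^":        # recursion on the right: right associative
            take()
            node = f"({node}^{factor()})"
        return node

    def primary():               # Primary -> - Primary | Element
        if peek() == "-":
            take()
            return f"(-{primary()})"
        return element()

    def element():               # Element -> ( Expression ) | id
        if peek() == "(":
            take("(")
            node = expression()
            take(")")
            return node
        tok = take()
        if not re.fullmatch(r"[A-Za-z_]\w*", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    result = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(parenthesize("a^b^c"))       # (a^(b^c))
print(parenthesize("a-b+c"))       # ((a-b)+c)
print(parenthesize("a+-b^c+d*e"))  # ((a+((-b)^c))+(d*e))
```

Each nonterminal of the grammar becomes one function, and a lower function in the list binds more tightly because it is called from the one above it.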
Parsing techniques
There are two basic types of parser for a CFG:
1. Bottom-up. Builds the parse tree from the leaves up to the root.
2. Top-down. Builds the parse tree from the root down to the leaves.
Recursive-descent parsing is a common top-down method.
In both cases, the input to the parser is scanned from left to
right, one symbol at a time.
Bottom-up parsing
The general style of bottom-up parsing is known as shift-reduce
parsing. Shift-reduce parsing attempts to construct a parse tree
for an input string beginning at the leaves (the bottom) and
working up toward the root (the top).
At each reduction step, a particular substring matching the right
side of a production is replaced by the symbol on the left of that
production. If the substring is chosen correctly at each step,
a rightmost derivation is traced out in reverse.
Example 8: Consider the grammar
S → aABe
A → Abc | b
B → d
The sentence abbcde can be reduced to S by the following
steps:
abbcde
aAbcde
aAde
aABe
S
Each replacement of the right side of a production by the left
side in the process above is called a reduction.
Handles
A handle of a string is a substring that matches the right side of
a production, and whose reduction to the nonterminal on the left
of that production represents one step along the reverse of a
rightmost derivation. In many cases the leftmost substring β that
matches the right side of some production A → β is not a
handle, because a reduction by the production A → β yields a
string that cannot be reduced to the start symbol.
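That a matching substring need not be a handle can be seen on the grammar of Example 8. The sketch below is a brute-force illustration, not how a real parser works: it tries reductions depth first and backtracks from dead ends. The names PRODUCTIONS and reduce_to_start are hypothetical, and A → b is deliberately listed before A → Abc so that the search tries the wrong reduction first.

```python
# Grammar of Example 8 as (left side, right side) pairs.
# A -> b is listed first so that the tempting wrong reduction is tried first.
PRODUCTIONS = [("S", "aABe"), ("A", "b"), ("A", "Abc"), ("B", "d")]


def reduce_to_start(form, start="S", dead=None):
    """Depth-first search over reductions.

    Returns the list of sentential forms from `form` to `start`, or None.
    Termination: every reduction here either shortens the form or turns a
    terminal into a nonterminal, so no form can repeat along a path.
    """
    if dead is None:
        dead = set()
    if form == start:
        return [form]
    if form in dead:
        return None
    for head, body in PRODUCTIONS:
        i = form.find(body)
        while i != -1:
            reduced = form[:i] + head + form[i + len(body):]
            rest = reduce_to_start(reduced, start, dead)
            if rest is not None:
                return [form] + rest
            i = form.find(body, i + 1)
    dead.add(form)      # no reduction of this form leads to the start symbol
    return None


dead_ends = set()
print(reduce_to_start("abbcde", dead=dead_ends))
# ['abbcde', 'aAbcde', 'aAde', 'aABe', 'S']
print("aAAcde" in dead_ends)   # True
```

In aAbcde the substring b matches the right side of A → b, but reducing it gives aAAcde, which cannot be reduced to S. The handle of aAbcde is Abc.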
Example 9: Consider the following grammar
(1) E → E + E
(2) E → E * E
(3) E → (E)
(4) E → id
and the rightmost derivation
E ⇒ E + E
  ⇒ E + E * E
  ⇒ E + E * id3
  ⇒ E + id2 * id3
  ⇒ id1 + id2 * id3
For example, id1 is a handle of the right-sentential form
id1 + id2 * id3, because id is the right side of the production E → id
and replacing id1 by E gives the previous right-sentential form
E + id2 * id3.
Because the above grammar is ambiguous, there is another
rightmost derivation of the same string:
E ⇒ E * E
  ⇒ E * id3
  ⇒ E + E * id3
  ⇒ E + id2 * id3
  ⇒ id1 + id2 * id3
Example 10: Consider the grammar in Example 9 and the input string
id1 + id2 * id3. Find the sequence of reductions to the start symbol E
using a shift-reduce parser.
Right-sentential form    Handle    Reducing production
id1 + id2 * id3          id1       E → id
E + id2 * id3            id2       E → id
E + E * id3              id3       E → id
E + E * E                E * E     E → E * E
E + E                    E + E     E → E + E
E
Stack Implementation of Shift Reduce Parsing
There are two problems that must be solved if we are to parse by
handle pruning. The first is how to locate a handle in a
right-sentential form, and the second is which production to choose
when more than one production has the same right side.
A convenient way to implement a shift-reduce parser is to use a stack
and an input buffer. We use $ to mark the bottom of the stack and
the right end of the input.
The parser operates by shifting zero or more input symbols onto
the stack until a handle β is on top of the stack. The parser then
reduces β to the left side of the appropriate production. The
parser repeats this cycle until it has detected an error or until the
stack contains the start symbol and the input is empty.
Example 11: Consider the following grammar
(1) E → E + E
(2) E → E * E
(3) E → (E)
(4) E → id
Find the sequence of actions of a shift-reduce parser in
parsing the input string id1 + id2 * id3.
       Stack          Input                 Action
(1)    $              id1 + id2 * id3 $     shift
(2)    $id1           + id2 * id3 $         reduce by E → id
(3)    $E             + id2 * id3 $         shift
(4)    $E+            id2 * id3 $           shift
(5)    $E+id2         * id3 $               reduce by E → id
(6)    $E+E           * id3 $               shift
(7)    $E+E*          id3 $                 shift
(8)    $E+E*id3       $                     reduce by E → id
(9)    $E+E*E         $                     reduce by E → E * E
(10)   $E+E           $                     reduce by E → E + E
(11)   $E             $                     accept
Configurations of the shift-reduce parser on input id1 + id2 * id3
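The configurations of Example 11 can be reproduced by a small program. The notes do not say how the parser chooses between shifting and reducing when E + E is on the stack and * is the next input symbol; the sketch below assumes a precedence table (PRECEDENCE, a hypothetical name) in which * binds tighter than + and both are left associative. The identifiers id1, id2, id3 are all written as the token id.

```python
PRECEDENCE = {"+": 1, "*": 2}   # assumption: * binds tighter than +


def shift_reduce(tokens):
    """Return the list of (stack, remaining input, action) configurations."""
    stack, rest, trace = [], list(tokens), []

    def record(action):
        # Record the configuration together with the action about to be taken.
        trace.append(("$" + "".join(stack), "".join(rest) + "$", action))

    while True:
        look = rest[0] if rest else None
        top3 = stack[-3:]
        if stack[-1:] == ["id"]:
            record("reduce by E -> id")
            stack[-1:] = ["E"]
        elif top3 == ["(", "E", ")"]:
            record("reduce by E -> (E)")
            stack[-3:] = ["E"]
        elif (len(top3) == 3 and top3[0] == "E" and top3[2] == "E"
              and top3[1] in PRECEDENCE
              and PRECEDENCE.get(look, 0) <= PRECEDENCE[top3[1]]):
            # Reduce unless the next operator has higher precedence.
            record(f"reduce by E -> E{top3[1]}E")
            stack[-3:] = ["E"]
        elif rest:
            record("shift")
            stack.append(rest.pop(0))
        elif stack == ["E"]:
            record("accept")
            return trace
        else:
            record("error")
            return trace


trace = shift_reduce(["id", "+", "id", "*", "id"])
for n, (stack, rest, action) in enumerate(trace, 1):
    print(f"({n:>2}) {stack:<10}{rest:>12}  {action}")
```

With this rule the program goes through the same eleven configurations as the table. In step (6) it shifts * instead of reducing E + E, which is what gives * the higher precedence.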
There are four possible actions of the parser:
1. Shift. The next input symbol is shifted onto the top of the
stack.
2. Reduce. The parser knows the right end of the handle is at
the top of the stack. It must then locate the left end of the
handle within the stack and decide with what nonterminal
to replace the handle.
3. Accept. The parser announces successful completion of
parsing.
4. Error. The parser discovers a syntax error and calls an error
recovery routine.
Example 12: Consider the following grammar.
S → eAB | AbB;  A → ae | aD;  B → dA | b;  D → Bm | ab.
Parse the input string "abmbdae" using shift-reduce parsing.
Answer
Stack        Input        Action
$            abmbdae$     shift
$a           bmbdae$      shift
$ab          mbdae$       reduce B → b
$aB          mbdae$       shift
$aBm         bdae$        reduce D → Bm
$aD          bdae$        reduce A → aD
$A           bdae$        shift
$Ab          dae$         shift
$Abd         ae$          shift
$Abda        e$           shift
$Abdae       $            reduce A → ae
$AbdA        $            reduce B → dA
$AbB         $            reduce S → AbB
$S           $            accept
The string is accepted by this grammar.
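A trace like the one in Example 12 can be checked by replaying it. The sketch below is an illustration with hypothetical names (replay, ACTIONS): it carries out each shift or reduce on a stack and refuses a reduction whose right side is not on top of the stack.

```python
def replay(tokens, actions, start="S"):
    """Run a list of shift/reduce actions and check that each one is legal.

    An action is "shift" or a (head, body) pair meaning "reduce by head -> body".
    Returns True if the run ends with only the start symbol on the stack
    and no input left.
    """
    stack, rest = "", tokens
    for action in actions:
        if action == "shift":
            if not rest:
                raise ValueError("shift with empty input")
            stack, rest = stack + rest[0], rest[1:]
        else:
            head, body = action
            if not stack.endswith(body):
                raise ValueError(f"{body!r} is not on top of stack {stack!r}")
            stack = stack[:len(stack) - len(body)] + head
        # Show the configuration reached after the action.
        print(f"${stack:<8}{rest + '$':>10}")
    return stack == start and rest == ""


# The actions of Example 12, in order.
ACTIONS = [
    "shift", "shift", ("B", "b"), "shift", ("D", "Bm"), ("A", "aD"),
    "shift", "shift", "shift", "shift", ("A", "ae"), ("B", "dA"), ("S", "AbB"),
]
print(replay("abmbdae", ACTIONS))   # True
```

The check confirms that every reduction in the table replaces a right side that really is on top of the stack, and that the final configuration is $S with empty input.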
Questions:
1) Consider the grammar S → (L) | a; L → L , S | S
Parse the string w = (a,a,a))
2) Consider the grammar S → AB | e; A → a | bd; B → d | SA
Parse the string aebd.