Introduction to JavaCC

Cheng-Chia Chen

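What is a parser generator
[Figure: the character stream T o t a l = p r i c e + t a x ; enters the Scanner, which produces the tokens Total = price + tax ; . The Parser builds a parse tree from the tokens: assignment → Total = Expr, with Expr → id + id (price, tax). The Parser generator (JavaCC) produces both the scanner and the parser from a lexical + grammar specification.]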
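As a rough illustration of the scanner's job in the figure, the following hand-written Java sketch (not JavaCC output) groups the characters of "Total = price + tax ;" into tokens and drops whitespace. The class SimpleScanner and the token names ID, ASSIGN, PLUS and SEMI are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner: groups characters into tokens and drops
// whitespace, which is the job a generated scanner does for the parser.
class SimpleScanner {
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace only separates tokens
            } else if (Character.isLetter(c)) {        // identifier: letter (letter | digit)*
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else if (c == '=') {
                tokens.add("ASSIGN"); i++;
            } else if (c == '+') {
                tokens.add("PLUS"); i++;
            } else if (c == ';') {
                tokens.add("SEMI"); i++;
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID(Total), ASSIGN, ID(price), PLUS, ID(tax), SEMI]
        System.out.println(scan("Total = price + tax ;"));
    }
}
```

The parser then only ever sees this token list, never the raw characters.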
JavaCC
• JavaCC (Java Compiler Compiler) is a scanner and
parser generator;
• It produces a scanner and/or a parser written in Java,
and is itself also written in Java;
• There are many parser generators.
– yacc (Yet Another Compiler-Compiler) for the C programming
language (dragon book chapter 4.9);
– Bison from gnu.org
• There are also many parser generators written in Java
– JavaCUP;
– ANTLR;
– SableCC

More on classification of Java parser generators
• Bottom-up parser generator tools
– JavaCUP;
– jay, YACC for Java www.inf.uos.de/bernd/jay
– SableCC, The Sable Compiler Compiler www.sablecc.org
• Top-down parser generator tools
– ANTLR, Another Tool for Language Recognition www.antlr.org
– JavaCC, Java Compiler Compiler javacc.dev.java.net
Features of JavaCC
• Top-down LL(k) parser generator
• Lexical and grammar specifications in one file
• Tree-building preprocessor
– with JJTree
• Extremely customizable
– many different options selectable
• Documentation generation
– by using JJDoc
• Internationalized
– can handle full Unicode
• Syntactic and semantic lookahead
Features of JavaCC (cont’d)
• Permits extended BNF specifications
– can use | * ? + () on the right-hand side.
• Lexical states and lexical actions
• Case-sensitive/insensitive lexical analysis
• Extensive debugging capability
• Special tokens
• Very good error reporting
JavaCC Installation
• Download the file javacc-4.X.zip from https://javacc.dev.java.net/
• Unzip javacc-4.X.zip to a directory %JCC_HOME%
• Add the %JCC_HOME%\bin directory to your %PATH%.
– javacc, jjtree, and jjdoc are now invokable directly from the command
line.
Steps to use JavaCC
• Write a JavaCC specification (.jj file)
– Defines the grammar and actions in a file (say, calc.jj)
• Run JavaCC to generate a scanner and a parser
– javacc calc.jj
– Will generate parser, scanner, token, … Java sources
• Write your program that uses the parser
– For example, UseParser.java
• Compile and run your program
– javac -classpath . *.java
– java -cp . mainpackage.MainClass
Example 1: parse a spec of regular expressions
and match it with input strings
• Grammar : re.jj
• Example
– % all strings ending in "ab"
– (a|b)*ab;
– aba;
– ababb;
• Our tasks:
– For each input string (lines 3 and 4), determine whether it matches the
regular expression (line 2).
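the overall picture
[Figure: javacc reads re.jj and generates REParser and REParserTokenManager. The MainClass feeds the input ("% comment", "(a|b)*ab;", "a;", "ab;") to REParserTokenManager, which passes tokens to REParser, which prints the result.]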
Format of a JavaCC input Grammar
• javacc_options
• PARSER_BEGIN ( <IDENTIFIER>1 )
java_compilation_unit
PARSER_END ( <IDENTIFIER>2 )
• ( production )*

the input spec file (re.jj)
options {
USER_TOKEN_MANAGER=false;
BUILD_TOKEN_MANAGER=true;
OUTPUT_DIRECTORY="./reparser";
STATIC=false;
}

re.jj
PARSER_BEGIN(REParser)
package reparser;

import java.lang.*;
…
import dfa.*;

public class REParser {

public FA tg = new FA();

// output error message with current line number

public static void msg(String s) {
System.out.println("ERROR"+s);
}

public static void main(String args[]) throws Exception {

REParser reparser = new REParser(System.in);

reparser.S();
}
}
PARSER_END(REParser)

re.jj (Token definition)
TOKEN : {
<SYMBOL: ["0"-"9","a"-"z","A"-"Z"] >
| <EPSILON: "epsilon" >
| <LPAREN: "(" >
| <RPAREN: ")" >
| <OR: "|" >
| <STAR: "*" >
| <SEMI: ";" >
}

SKIP: {
< ( [" ","\t","\n","\r","\f"] )+ >
|
< "%" ( ~ ["\n"] )* "\n" > { System.out.println(image); }
}
re.jj (productions)
void S() : { FA d1; }
{
d1 = R() <SEMI>
{ tg = d1; System.out.println("------NFA"); tg.print();

System.out.println("------DFA");
tg = tg.NFAtoDFA(); tg.print();

System.out.println("------Minimize");
tg = tg.minimize(); tg.print();

System.out.println("------Renumber");
tg=tg.renumber(); tg.print();

System.out.println("------Execute");
}
testCases()
}
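The FA class and its execute method come from the dfa package, which the slides do not show. As a hypothetical sketch of what executing the final, minimized DFA amounts to, here is a hand-built three-state DFA for (a|b)*ab ("all strings ending in ab"). The class name and the transition table are illustrative; they are not the output of tg.minimize().

```java
// Hypothetical sketch: a hand-built minimal DFA for (a|b)*ab, i.e. "strings ending in ab".
// It is not produced by the FA class in the slides; it only shows what executing a DFA means.
class EndsWithAbDfa {
    // DELTA[state][symbol]: symbol 0 is 'a', symbol 1 is 'b'.
    private static final int[][] DELTA = {
        {1, 0},   // state 0: no useful suffix seen yet
        {1, 2},   // state 1: last symbol was 'a'
        {1, 0},   // state 2: last two symbols were "ab" (accepting)
    };
    private static final int ACCEPT = 2;

    static boolean execute(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {
            if (c != 'a' && c != 'b') return false;   // symbol outside the alphabet {a, b}
            state = DELTA[state][c == 'a' ? 0 : 1];
        }
        return state == ACCEPT;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aba", "ababb", "aab"})
            System.out.println(s + " -> " + execute(s));
    }
}
```

For the two test strings of the example, both aba and ababb are rejected, since neither ends in ab.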
re.jj
void testCases() : {}
{ (testCase() )+ }

void testCase(): { String testInput ;}

{ testInput = symbols()
<SEMI>
{ tg.execute( testInput) ; }
}

String symbols() :
{Token token = null; StringBuffer result = new StringBuffer(); }
{
(
token = <SYMBOL>
{ result.append( token.image) ; }
)*
{ return result.toString(); }
}

re.jj (regular expression)
// R --> RUnit | RConcat | RChoice

FA R() : {FA result ;}

{ result = RChoice() { return result; } }

FA RUnit() :
{ FA result ; Token d1; }
{
(
<LPAREN> result = RChoice() <RPAREN>
|
<EPSILON> { result = tg.epsilon(); }
|
d1 = <SYMBOL> { result = tg.symbol( d1.image ); }
)
{ return result ; }
}

re.jj
FA RChoice() : { FA result, temp ;}
{
result = RConcat()
( <OR> temp = RConcat() { result = result.choice( temp ) ;} )*
{return result ; } }

FA RConcat() : { FA result, temp ;}

{ result = RStar()
( temp = RStar() { result = result.concat( temp ) ;} )*
{return result ; } }

FA RStar() : {FA result;}

{ result = RUnit()
( <STAR> { result = result.closure(); } )*
{ return result; } }

Format of a JavaCC input Grammar
javacc_input ::= javacc_options
PARSER_BEGIN ( <IDENTIFIER>1 )
java_compilation_unit
PARSER_END ( <IDENTIFIER>2 )
( production )*
<EOF>

color usage:
– blue: nonterminal
– <orange>: a token type
– purple: token lexeme (reserved word,
i.e., consisting of the literal itself)
– black: meta symbols
Notes
• <IDENTIFIER> means any Java identifier, like var, class2, …
– IDENTIFIER means the word IDENTIFIER only.
• <IDENTIFIER>1 must be the same as <IDENTIFIER>2.
• java_compilation_unit is any Java code that, as a whole, can legally
appear in a Java source file.
– It must contain a main class declaration with the same name as
<IDENTIFIER>1.
• Ex:
PARSER_BEGIN ( MyParser )
package mypackage;
import myotherpackage….;
public class MyParser { … }
class MyOtherUsefulClass { … } …
PARSER_END (MyParser)
The input and output of javacc
[Figure: javacc reads MyLangSpec.jj, which contains PARSER_BEGIN ( MyParser ) package mypackage; import myotherpackage….; public class MyParser { … } class MyOtherUsefulClass { … } … PARSER_END (MyParser) …, and generates MyParser.java, MyParserConstants.java, MyParserTokenManager.java, Token.java, ParseException.java, and TokenMgrError.java.]
Notes:
• Token.java and ParseException.java are the same for all inputs and
can be reused.
• The package declaration in the *.jj file is copied to all 3 outputs.
• Import declarations in the *.jj file are copied to the parser and token
manager files.
• The parser file is given the file name <IDENTIFIER>1.java.
• The parser file has the contents:
…class MyParser { …
//generated parser is inserted here.
…}
• The generated token manager provides one public method:
Token getNextToken() throws TokenMgrError;
Lexical Specification with JavaCC

javacc options
javacc_options ::=
[ options { ( option_binding )* } ]

• Each option_binding has the form:

– <IDENTIFIER>3 = <java_literal> ;
– where <IDENTIFIER>3 (the option name) is not case-sensitive.
• Ex:
options {
USER_TOKEN_MANAGER=true;
BUILD_TOKEN_MANAGER=false;
OUTPUT_DIRECTORY="./sax2jcc/personnel";
STATIC=false;
LOOKAHEAD=2;
}

More Options (default values in parentheses)
• LOOKAHEAD
– java_integer_literal (1)
• CHOICE_AMBIGUITY_CHECK
– java_integer_literal (2) for A | B … | C
• OTHER_AMBIGUITY_CHECK
– java_integer_literal (1) for (A)*, (A)+ and (A)?
• STATIC (true)
• DEBUG_PARSER (false)
• DEBUG_LOOKAHEAD (false)
• DEBUG_TOKEN_MANAGER (false)
• OPTIMIZE_TOKEN_MANAGER
– java_boolean_literal (false)
• OUTPUT_DIRECTORY (current directory)
• ERROR_REPORTING (true)

More Options (cont'd)
• JAVA_UNICODE_ESCAPE (false)
– translate Java Unicode escapes such as \u2245 into the actual character (6 chars → 1 char)
• UNICODE_INPUT (false)
– the input stream is in Unicode form
• IGNORE_CASE (false)
• USER_TOKEN_MANAGER (false)
– generate TokenManager interface for user’s own scanner
• USER_CHAR_STREAM (false)
– generate CharStream.java interface for user’s own inputStream
• BUILD_PARSER (true)
– java_boolean_literal
• BUILD_TOKEN_MANAGER (true)
• SANITY_CHECK (true)
• FORCE_LA_CHECK (false)
• COMMON_TOKEN_ACTION (false)
– invoke void CommonTokenAction(Token t) after every getNextToken()
• CACHE_TOKENS (false)

Example: Figure 2.2
1. if                                          IF
2. [a-z][a-z0-9]*                              ID
3. [0-9]+                                      NUM
4. ([0-9]+"."[0-9]*) | ([0-9]*"."[0-9]+)       REAL
5. ("--"[a-z]*"\n") | (" "|"\n"|"\t")+         nonToken, WS
6. .                                           error
• JavaCC notations:
1. "if" or "i" "f" or ["i"]["f"]
2. ["a"-"z"](["a"-"z","0"-"9"])*
3. (["0"-"9"])+
4. (["0"-"9"])+ "." (["0"-"9"])* |
(["0"-"9"])* "." (["0"-"9"])+
JavaCC spec for the tokens from Fig 2.2
PARSER_BEGIN(MyParser) class MyParser{}
PARSER_END(MyParser)
/* For the regular expression on the right, the token on the left will be
returned */
TOKEN : {
< IF: "if" >
| < #DIGIT: ["0"-"9"] >
| < ID: ["a"-"z"] ( ["a"-"z"] | <DIGIT> )* >
| < NUM: (<DIGIT>)+ >
| < REAL: (<DIGIT>)+ "." (<DIGIT>)* |
(<DIGIT>)* "." (<DIGIT>)+ >
}
JavaCC spec for the tokens from Fig 2.2 (continued)
/* The regular expressions here will be skipped during lexical analysis */
SKIP : { < " " > | < "\t" > | < "\n" > }

/* like SKIP, but the skipped text is accessible from parser actions */

SPECIAL_TOKEN : {
< "--" (["a"-"z"])* ("\n" | "\r" | "\r\n") >
}

/* For any substring that matches no lexical specification, the generated
token manager throws a TokenMgrError */
/* main rule */
void start() : {}
{ (<IF> | <ID> | <NUM> | <REAL>)* }
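A JavaCC token manager resolves overlaps between these rules by taking the longest match and, on a tie, the rule listed first; that is why "if" becomes IF but "ifx" becomes ID. The Java sketch below imitates only those two rules using java.util.regex; a real generated token manager is a DFA, not a loop over patterns. The class Fig22Scanner and its KIND:image output format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of two token-manager disambiguation rules:
// (1) the longest match wins; (2) on equal length, the rule listed first wins.
class Fig22Scanner {
    private static final String[] KINDS = {"IF", "ID", "NUM", "REAL", "COMMENT", "WS"};
    private static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("[a-z][a-z0-9]*"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+"),
        Pattern.compile("--[a-z]*\n"),
        Pattern.compile("[ \n\t]+"),
    };

    static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestRule = -1, bestEnd = pos;
            for (int r = 0; r < RULES.length; r++) {
                Matcher m = RULES[r].matcher(input).region(pos, input.length());
                // strictly longer only: an equal-length match keeps the earlier rule
                if (m.lookingAt() && m.end() > bestEnd) { bestRule = r; bestEnd = m.end(); }
            }
            if (bestRule < 0) throw new IllegalStateException("lexical error at " + pos); // rule 6
            String kind = KINDS[bestRule];
            if (!kind.equals("COMMENT") && !kind.equals("WS"))     // nonToken and WS are skipped
                out.add(kind + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [IF:if, ID:ifx, NUM:12, REAL:3.5, REAL:.5, ID:x9]
        System.out.println(scan("if ifx 12 3.5 .5 --note\nx9"));
    }
}
```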
Grammar Specification with JavaCC

The Form of a Production
java_return_type java_identifier ( java_parameter_list ) :
java_block
{ expansion_choices }

• EX :
void XMLDocument(Logger logger): { int msg = 0; }
{ <StartDoc> { print(token); }
Element(logger)
<EndDoc> { print(token); }
| else()
}

Example ( Grammar 3.30 )
1. P → L
2. S → id := id
3. S → while id do S
4. S → begin L end
5. S → if id then S
6. S → if id then S else S
7. L → S
8. L → L ; S

Rules 1, 7, 8 combined: P → S ( ; S )*
JavaCC Version of Grammar 3.30
PARSER_BEGIN(MyParser)
public class MyParser{}
PARSER_END(MyParser)

SKIP : { " " | "\t" | "\n" }

TOKEN: {
<WHILE: "while"> | <BEGIN: "begin"> | <END: "end">
| <DO: "do"> | <IF: "if"> | <THEN: "then">
| <ELSE: "else"> | <SEMI: ";"> | <ASSIGN: "=">
| <#LETTER: ["a"-"z"]>
| <ID: <LETTER> (<LETTER> | ["0"-"9"])* >
}
JavaCC Version of Grammar 3.30 (cont’d)
void Prog() : { } { StmList() <EOF> }

void StmList(): { } {
Stm() ( ";" Stm() )*
}

void Stm(): { } {
<ID> "=" <ID>
| "while" <ID> "do" Stm()
| <BEGIN> StmList() <END>
| "if" <ID> "then" Stm() [ LOOKAHEAD(1) "else" Stm() ]
}
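For comparison, here is a hand-written recursive-descent parser in Java for the same grammar, a sketch that assumes space-separated tokens. Each nonterminal becomes a method, the switch on the next token plays the role of 1-token lookahead, and the optional else is taken greedily, so an else attaches to the nearest if. The class Grammar330 and its bracketed output format are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written recursive-descent parser for the JavaCC version of Grammar 3.30.
// Tokens are assumed to be separated by spaces; each method returns a bracketed string.
class Grammar330 {
    private static final List<String> KEYWORDS = List.of("while", "begin", "end", "do", "if", "then", "else");
    private final List<String> toks;
    private int pos = 0;

    Grammar330(String src) { toks = Arrays.asList(src.trim().split("\\s+")); }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "<EOF>"; }
    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " but got " + peek());
        pos++;
    }
    private String id() {                             // keywords are not identifiers
        String t = peek();
        if (!t.matches("[a-z][a-z0-9]*") || KEYWORDS.contains(t))
            throw new IllegalStateException("expected <ID> but got " + t);
        pos++;
        return t;
    }

    String prog() {                                   // Prog ::= StmList <EOF>
        String s = stmList();
        if (pos != toks.size()) throw new IllegalStateException("extra input: " + peek());
        return s;
    }
    private String stmList() {                        // StmList ::= Stm ( ";" Stm )*
        StringBuilder sb = new StringBuilder(stm());
        while (peek().equals(";")) { pos++; sb.append("; ").append(stm()); }
        return sb.toString();
    }
    private String stm() {
        switch (peek()) {                             // 1-token lookahead picks the alternative
            case "while": {
                pos++; String c = id(); expect("do");
                return "while(" + c + "){" + stm() + "}";
            }
            case "begin": {
                pos++; String body = stmList(); expect("end");
                return "{" + body + "}";
            }
            case "if": {
                pos++; String c = id(); expect("then");
                String thenPart = stm();
                // [ LOOKAHEAD(1) "else" Stm() ]: take "else" whenever it is the next token
                if (peek().equals("else")) { pos++; return "if(" + c + "){" + thenPart + "}else{" + stm() + "}"; }
                return "if(" + c + "){" + thenPart + "}";
            }
            default: {
                String lhs = id(); expect("="); String rhs = id();
                return lhs + "=" + rhs;
            }
        }
    }

    public static void main(String[] args) {
        // prints if(a){if(b){x=y}else{u=v}}: the else binds to the inner if
        System.out.println(new Grammar330("if a then if b then x = y else u = v").prog());
    }
}
```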
Types of productions
• production ::= javacode_production
| regular_expr_production
| bnf_production
| token_manager_decl

Note:
1 and 3 (javacode_production, bnf_production) are used to define the grammar.
2 (regular_expr_production) is used to define tokens.
4 (token_manager_decl) is used to embed code into the token manager.
JAVACODE production
• javacode_production ::= "JAVACODE"
java_return_type java_identifier "(" java_parameter_list ")"
java_block

• Note:
– Used to define nonterminals for recognizing something that is hard to
parse using normal productions.
Example JAVACODE
JAVACODE void skip_to_matching_brace()
{
  Token tok;
  int nesting = 1;
  while (true) {
    tok = getToken(1);
    if (tok.kind == LBRACE) nesting++;
    if (tok.kind == RBRACE) {
      nesting--;
      if (nesting == 0) break;
    }
    tok = getNextToken();
  }
}
Note:
• Do not use a nonterminal defined by JAVACODE at a choice
point without giving a LOOKAHEAD.
• Problematic (JavaCC cannot look inside skip_to_matching_brace()):
void NT() : {} {
skip_to_matching_brace()
| some_other_production()
}
• Fine (each alternative starts with a distinct token):
void NT() : {} {
"{" skip_to_matching_brace()
| "(" parameter_list() ")"
}
TOKEN_MANAGER_DECLS
token_manager_decls ::=
TOKEN_MGR_DECLS : java_block

• The token manager declarations start with the reserved
word "TOKEN_MGR_DECLS", followed by a ":" and then a
set of Java declarations and statements (the Java block).
• These declarations and statements are written into the
generated token manager (MyParserTokenManager.java)
and are accessible from within lexical actions.
• There can be only one token manager declaration in a
JavaCC grammar file.
regular_expression_production
regular_expr_production ::=
[ lexical_state_list ]
regexpr_kind [ [ IGNORE_CASE ] ] :
{ regexpr_spec ( | regexpr_spec )* }

• regexpr_kind::=
TOKEN | SPECIAL_TOKEN | SKIP | MORE

• TOKEN is used to define normal tokens.

• SKIP is used to define skipped tokens (not passed on to the parser).
• MORE is used to define semi-tokens (i.e., only part of a token).
• SPECIAL_TOKEN lies between TOKEN and SKIP: it is
passed on to the parser and is accessible to parser actions, but it is
ignored by the production rules (not counted as a token). Useful for
representing comments.
lexical_state_list
lexical_state_list::=
< * > | < java_identifier ( , java_identifier )* >
• The lexical state list describes the set of lexical states for
which the corresponding regular expression production
applies.
• If this is written as "<*>", the regular expression
production applies to all lexical states. Otherwise, it
applies to all the lexical states in the identifier list within
the angle brackets.
• If omitted, the DEFAULT lexical state is assumed.
regexpr_spec
regexpr_spec::=
regular_expression1 [ java_block ] [ : java_identifier ]

• Meaning:
• When a regular_expression1 is matched then
– if java_block exists then execute it
– if java_identifier appears, then transition to that lexical state.

regular_expression
regular_expression ::=
java_string_literal
| < [ [#] java_identifier : ] complex_regular_expression_choices >
| <java_identifier>
| <EOF>

• <EOF> is matched only at end of file.

• (3) <java_identifier> is a reference to another labeled regular_expression.
– used in bnf_production
• (1) java_string_literal is matched only by the string it denotes; this form is used for unnamed tokens.
• (2) is used to define a labeled regular_expression; if # occurs, the label is not visible outside
the current TOKEN section.
Example
<DEFAULT, LEX_ST2> TOKEN [IGNORE_CASE] : {
< FLOATING_POINT_LITERAL:
(["0"-"9"])+ "." (["0"-"9"])* (<EXPONENT>)? (["f","F","d","D"])? |
"." (["0"-"9"])+ (<EXPONENT>)? (["f","F","d","D"])? |
(["0"-"9"])+ <EXPONENT> (["f","F","d","D"])? |
(["0"-"9"])+ (<EXPONENT>)? ["f","F","d","D"] >
{ /* do something */ } : LEX_ST1
| < #EXPONENT: ["e","E"] (["+","-"])? (["0"-"9"])+ >
}
• Note: if # is omitted, E123 will be recognized erroneously
as a token of kind EXPONENT.

Structure of complex_regular_expression
• complex_regular_expression_choices::=
complex_regular_expression (| complex_regular_expression )*
• complex_regular_expression ::=
( complex_regular_expression_unit )*
• complex_regular_expression_unit ::=
java_string_literal | < java_identifier >
| character_list
| ( complex_regular_expression_choices ) [+|*|?]

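• Note:
– units are combined by concatenation (juxtaposition) into a
complex_regular_expression;
– complex_regular_expressions are combined by choice ( | ) into
complex_regular_expression_choices;
– ( complex_regular_expression_choices ) [+|*|?] is again a unit.
• Principle:
concatenate first, then choose (concatenation binds tighter than |); to apply a
repetition operator, you must first add parentheses.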
character_list
character_list::=
[~] [ [ character_descriptor ( , character_descriptor )* ] ]
character_descriptor::=
java_string_literal [ - java_string_literal ]
java_string_literal ::= // reference to java grammar
" singleCharString* "
note: java_string_literal here is restricted to length 1.
ex:
– ~["a","b"] --- all chars but a and b.
– ["a"-"f", "0"-"9", "A","B","C","D","E","F"] --- hexadecimal digit.
– ["a","b"]+ is not a regular_expression_unit. Why?
• Because +, * and ? may follow only a parenthesized complex_regular_expression_choices;
it should be written ( ["a","b"] )+ instead.
bnf_production
• bnf_production::=
java_return_type java_identifier "(" java_parameter_list ")"
":"
java_block
"{" expansion_choices "}"

• expansion_choices::= expansion ( "|" expansion )*

• expansion::= ( expansion_unit )*
expansion_unit
• expansion_unit::=
local_lookahead
| java_block
| "(" expansion_choices ")" [ "+" | "*" | "?" ]
| "[" expansion_choices "]"
| [ java_assignment_lhs "=" ] regular_expression
| [ java_assignment_lhs "=" ]
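java_identifier "(" java_expression_list ")"
Notes (alternatives numbered 1 to 6 in order):
1 is for lookahead; 2 is for semantic actions;
3 is a grouped expansion, optionally repeated;
4 is equivalent to ( … )?;
5 is for token matching;
6 is for matching another nonterminal.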
lookahead
• local_lookahead::= "LOOKAHEAD" "("
[ java_integer_literal ] [ "," ] [ expansion_choices ] [ "," ]
[ "{" java_expression "}" ] ")"

• Notes:
• 3 components: maximum number of lookahead tokens + syntactic lookahead + semantic lookahead
• examples:
– LOOKAHEAD(3)
– LOOKAHEAD(5, Expr() <INT> | <REAL> , { true} )
• More on LOOKAHEAD
– see the minitutorial on javacc.dev.java.net
JavaCC API
• Non-Terminals in the Input Grammar
• NT is a nonterminal =>
returntype NT(parameters) throws ParseException;
is generated in the parser class

• API for Parser Actions

• Token token;
– this variable always holds the last consumed token and can be used in
parser actions.
– exactly the same as the token returned by getToken(0).
– two other methods, getToken(int i) and getNextToken(), can
also be used in actions to traverse the token list.
Token class
• public int kind;
– 0 for <EOF>
• public int beginLine, beginColumn, endLine, endColumn;
• public String image;
• public Token next;
• public Token specialToken;
• public String toString() { return image; }
• public static final Token newToken(int ofKind)
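The next field links tokens into a chain, and that chain is what getToken(i) walks. The following simplified Java model (hypothetical classes Token and TokenStream, not the code JavaCC generates) shows the idea: tokens are fetched from the source lazily, getToken(0) is the current token, and getNextToken() advances by one.

```java
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical model of the token chain; field names follow the Token slide.
class Token {
    int kind;          // 0 is used for <EOF>, as on the slide
    String image;
    Token next;        // filled in lazily when the parser looks further ahead
    Token(int kind, String image) { this.kind = kind; this.image = image; }
    @Override public String toString() { return image; }
}

// Hypothetical stand-in for the token bookkeeping inside a generated parser.
class TokenStream {
    private final Iterator<Token> source;   // plays the role of the token manager
    Token token;                             // the current (last consumed) token

    TokenStream(List<Token> tokens) {
        source = tokens.iterator();
        token = new Token(-1, "<start>");    // dummy token before the first real one
    }

    // getToken(0) is the current token; getToken(i) looks i tokens ahead without consuming.
    Token getToken(int i) {
        Token t = token;
        for (int j = 0; j < i; j++) {
            if (t.next == null) t.next = source.hasNext() ? source.next() : new Token(0, "<EOF>");
            t = t.next;
        }
        return t;
    }

    // Consume one token: it becomes the new current token.
    Token getNextToken() {
        token = getToken(1);
        return token;
    }

    public static void main(String[] args) {
        TokenStream ts = new TokenStream(List.of(new Token(1, "a"), new Token(2, "b")));
        System.out.println(ts.getToken(2));      // b (peeked, not consumed)
        System.out.println(ts.getNextToken());   // a
        System.out.println(ts.getToken(1));      // b
    }
}
```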

Error reporting and recovery
• It is not user-friendly to throw an exception and stop
parsing at the first syntax error.

• Two exceptions
– ParseException → can be recovered
– TokenMgrError → not expected to be recovered

• Error reporting
– modify ParseException.java or TokenMgrError.java
– the generateParseException method can always be invoked in a parser
action to report an error
Error Recovery in JavaCC:
• Shallow Error Recovery
• Deep Error Recovery

• Shallow Error Recovery

• Ex:
void Stm() : {} {
IfStm()
| WhileStm() }

If getToken(1) is neither "if" nor "while" => shallow error
Shallow recovery
can be recovered by an additional choice:
void Stm() : {} {
IfStm()
| WhileStm()
| error_skipto(SEMICOLON)
}
where
JAVACODE
void error_skipto(int kind) {
  ParseException e = generateParseException(); // generate the exception object.
  System.out.println(e.toString()); // print the error message
  Token t;
  do {
    t = getNextToken();
  } while (t.kind != kind);
}
Deep Error Recovery
• Same example: void Stm() : {} { IfStm() | WhileStm() }
• But this time the error occurs during parsing inside
IfStm() or WhileStm() instead of at the lookahead entry.
• The approach: use java try-catch construct.
void Stm() : {} {
try {
( IfStm() | WhileStm() )
} catch (ParseException e) {
error_skipto(SEMICOLON);
}
}
Note: this uses the try { … } catch ( … ) { … } syntax that JavaCC allows inside a bnf_production.
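Both recovery styles can be sketched in plain Java. In the hypothetical toy parser below, a statement is "if x ;" or "while x ;". A statement that starts with neither keyword is a shallow error, a malformed one is a deep error, and in both cases the try-catch around each statement records the ParseException and skips to the next ";". Unlike error_skipto as written above, errorSkipTo here also stops at end of input. All class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of panic-mode recovery in a hand-written parser:
// on a ParseException inside a statement, record it and skip to the next ";".
class RecoveringParser {
    static class ParseException extends Exception {
        ParseException(String msg) { super(msg); }
    }

    private final List<String> toks;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    RecoveringParser(String src) { toks = Arrays.asList(src.trim().split("\\s+")); }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "<EOF>"; }
    private void expect(String t) throws ParseException {
        if (!peek().equals(t)) throw new ParseException("expected " + t + " but found " + peek());
        pos++;
    }

    // Stm ::= ( "if" | "while" ) operand ";"
    private void stm() throws ParseException {
        if (peek().equals("if") || peek().equals("while")) {
            pos++;
            if (peek().equals(";") || peek().equals("<EOF>")) throw new ParseException("missing operand");
            pos++;
            expect(";");                                      // a failure here is a deep error
        } else {
            throw new ParseException("unexpected " + peek()); // shallow error: no alternative fits
        }
    }

    // Like error_skipto(SEMICOLON), but it also stops at end of input.
    private void errorSkipTo(String kind) {
        while (!peek().equals("<EOF>")) {
            if (toks.get(pos++).equals(kind)) return;
        }
    }

    // Parse all statements; the try-catch around each one recovers from both error kinds.
    int parseAll() {
        int ok = 0;
        while (!peek().equals("<EOF>")) {
            try { stm(); ok++; }
            catch (ParseException e) { errors.add(e.getMessage()); errorSkipTo(";"); }
        }
        return ok;
    }

    public static void main(String[] args) {
        RecoveringParser p = new RecoveringParser("if a ; oops b ; while c d ; while e ;");
        // prints 2 ok, errors: [unexpected oops, expected ; but found d]
        System.out.println(p.parseAll() + " ok, errors: " + p.errors);
    }
}
```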
References:
• JavaCC web site: http://Javacc.dev.java.net
• JavaCC documentation: https://javacc.dev.java.net/doc/docindex.html
Looking ahead in javacc

What's LOOKAHEAD?
void Input() :
{}
{
"a" BC() "c"
}

void BC() :
{}
{
"b" [ "c" ]
}

• What strings are matched by Input()?
– abcc (yes)
– abc (no!!)
• Why?
– javacc's default greedy lookahead algorithm.
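The traces on the following slides can be reproduced with a hand-written Java equivalent of Input() and BC(). This is a sketch that assumes single characters as tokens; the class GreedyOptional is hypothetical. The [ "c" ] decision looks only at the next character and is never undone, which is why abc is rejected.

```java
// Hypothetical hand-written equivalent of Input() ::= "a" BC() "c" and BC() ::= "b" [ "c" ],
// using the same greedy, non-backtracking decision rule that JavaCC applies to [ ... ].
class GreedyOptional {
    private final String in;
    private int pos = 0;
    private GreedyOptional(String in) { this.in = in; }

    private boolean next(char c) { return pos < in.length() && in.charAt(pos) == c; }
    private void consume(char c) {
        if (!next(c)) throw new IllegalStateException("expected '" + c + "' at " + pos);
        pos++;
    }

    private void input() { consume('a'); bc(); consume('c'); }

    private void bc() {
        consume('b');
        if (next('c')) consume('c');   // [ "c" ]: taken whenever the next char is 'c'; never undone
    }

    static boolean accepts(String s) {
        GreedyOptional p = new GreedyOptional(s);
        try {
            p.input();
            return p.pos == s.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("abcc") + " " + accepts("abc"));   // true false
    }
}
```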
Why does abcc match?
• Input()                       input: abcc
• "a" → consume a               input: abcc
• BC()                          input: bcc
• "b" → consume b               input: bcc
• ["c"] → greedily consume c    input: cc
• "c" → consume c               input: c
• succeed!                      input: (empty)
Why is abc not matched?
• Input()                       input: abc
• "a" → consume a               input: abc
• BC()                          input: bc
• "b" → consume b               input: bc
• ["c"] → greedily consume c    input: c
• (even though not consuming it seems better)
• "c" → needs a 'c'             input: (empty), so no match
• fail!

• Why such behavior?

– 1. one-symbol lookahead (for performance)
– 2. avoid backtracking!
How to match both inputs?
• Rewrite the grammar!
• Increase the lookahead number
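What about these rewritings?

void Input() :
{}
{
"a" "b" "c" [ "c" ]
}
// good!

void Input() :
{}
{
"a" "b" "c" "c"
|
"a" "b" "c"
}

void Input() :
{}
{
"a" ( BC1() | BC2() )
}

void BC1() :
{}
{
"b" "c" "c"
}

void BC2() :
{}
{
"b" "c" [ "c" ]
}

• The last two rewritings still fail on abc: the alternatives begin with the same
token ("a", or "b" for BC1()/BC2()), so with 1-token lookahead the first one is always chosen.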
Looking Ahead
• Backtracking is unacceptable for a language parser
• LOOKAHEAD:
– The process of exploring tokens further in the input
stream to determine the decision at various choice points.
– Once it makes a decision, it commits to it and there is no
backtracking!
– Since some of these decisions may be made with less
than perfect information, you need to know something
about LOOKAHEAD to make your grammar work
correctly.
• The two ways in which you make the choice
decisions work properly are:
1. Modify the grammar to make it simpler.
2. Insert hints at the more complicated choice points to
help the parser make the right choices.
Four Choice Points in javacc
• ( exp1 | exp2 | ... | expn )
– which one to match ?
• ( … )?
– To match content inside () or bypass ?
• ( … )*
– To leave or match and then repeat ?
• ( … )+ = (…)(…)*
– To leave or match and repeat after first
match ?

The Default Algorithm for choice |
• The default choice determination algorithm looks ahead 1
token in the input stream and uses it to make its
choice at choice points.

void basic_expr() :
{}
{
<ID> "(" expr() ")" // Choice 1
|
"(" expr() ")" // Choice 2
|
"new" <ID> // Choice 3
}

The choice determination algorithm:
if (next token is <ID>) {
  choose Choice 1
} else if (next token is "(") {
  choose Choice 2
} else if (next token is "new") {
  choose Choice 3
} else {
  produce an error message
}
A Modified Grammar

void basic_expr() :
{}
{
<ID> "(" expr() ")" // Choice 1
|
"(" expr() ")" // Choice 2
|
"new" <ID> // Choice 3
|
<ID> "." <ID> // Choice 4
}

• What happens on <ID>? Why?

Warning: Choice conflict involving two expansions at line 25,
column 3 and line 31, column 3 respectively. A common
prefix is: <ID> Consider using a lookahead of 2 for earlier expansion.
Greedy behavior for (…)*

void identifier_list() :
{}
{
<ID> ( "," <ID> )*
}

Note: the choice determination algorithm does not look beyond the (...)*.

• Suppose the first <ID> has already been matched and that the parser has
reached the choice point (the (...)* construct). Here's how the choice
determination algorithm works:

while (next token is ",") {
  choose the nested expansion (i.e., go into the (...)* construct)
  consume the "," token
  if (next token is <ID>) consume it, otherwise report error
}
Another Example
• When making a choice at ( "," <ID> )*, it will always go into
the (...)* construct if the next token is a ",".
– It will do this even when identifier_list was called from
funny_list and the token after the "," is an <INT>.
– Intuitively, the right thing to do in this situation is to skip
the (...)* construct and return to funny_list.

void identifier_list() :
{}
{
<ID> ( "," <ID> )*
}

void funny_list() :
{}
{
identifier_list() "," <INT>
}
A Concrete input
On the input "id1, id2, 5",
the parser will complain that it encountered a 5 when it was
expecting an <ID>.
• Note: during parser generation, javacc would give the
warning message:

Warning: Choice conflict in (...)* construct at line 25, column 8.
Expansion nested within construct and expansion following
construct have common prefixes, one of which is: "," Consider
using a lookahead of 2 or more for nested expansion.

• Essentially, JavaCC is saying it has detected a situation
which may cause the parser to do strange things. The
generated parser will still work, except that it probably
doesn't do what you expect.
Multiple Token Lookaheads Specs
• The default algorithm works fine in most situations. In
cases where it does not work well, javacc provides you
with warning messages.

• If javacc processes your grammar file without producing any warnings
(and without explicit LOOKAHEAD hints), then the grammar is an LL(1) grammar.

• There are two options if your grammar is
not LL(1):
– Modify your grammar to make it LL(1).
– Give more lookahead, globally or where needed.
Option 1 - Modify your grammar
• You can modify your grammar so that the warning
messages go away. That is, you can attempt to make
your grammar LL(1) by making some changes to it.
• But this does not always work!

void basic_expr() : {} {
<ID> "(" expr() ")" // Choice 1
|
"(" expr() ")" // Choice 2
|
"new" <ID> // Choice 3
|
<ID> "." <ID> // Choice 4
}

Factor out the common left part <ID>:

void basic_expr() : {} {
<ID> ( "(" expr() ")" | "." <ID> )
|
"(" expr() ")"
|
"new" <ID>
}
Factoring does not always work!!
void basic_expr() : {}
{
{ initMethodTables(); } <ID> "(" expr() ")"
|
"(" expr() ")"
|
"new" <ID>
|
{ initObjectTables(); } <ID> "." <ID>
}

• Since the two <ID> alternatives begin with different actions, which must run
before <ID> is consumed, left-factoring cannot be performed.
Option 2 – Increase lookaheads
• You can provide the generated parser with some hints to
help it out in the non-LL(1) situations.

• All such hints are specified by either

– setting the global LOOKAHEAD option to a larger value or
– using the LOOKAHEAD(...) construct to provide a local hint at
problematic choice points.

• Comparisons between the two options

– Option 1 makes your grammar perform better.
– Option 2 gives you a simpler grammar: one that is
easier to develop and maintain, and more
human-friendly.
– Sometimes Option 2 is the only choice.
• Global option LOOKAHEAD
– options { LOOKAHEAD=2; … }
• Local lookahead:

void basic_expr() :
{}
{
LOOKAHEAD(2)
<ID> "(" expr() ")" // Choice 1
|
"(" expr() ")" // Choice 2
|
"new" <ID> // Choice 3
|
<ID> "." <ID> // Choice 4
}

The choice determination algorithm:
if (next 2 tokens are <ID> and "(") {
  choose Choice 1
} else if (next token is "(") {
  choose Choice 2
} else if (next token is "new") {
  choose Choice 3
} else if (next token is <ID>) {
  choose Choice 4
} else {
  produce an error message
}
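The decision logic above can be written as an ordinary Java function. This is a hypothetical sketch that takes token kinds as strings ("ID" stands for <ID>) and only reports which choice would be taken; it does not parse the rest of the expression.

```java
import java.util.List;

// Hypothetical sketch of the choice decision for basic_expr() with LOOKAHEAD(2) on Choice 1.
// Tokens are given as kind names; "ID" stands for <ID>.
class BasicExprChoice {
    static String choose(List<String> toks) {
        String t1 = toks.size() > 0 ? toks.get(0) : "<EOF>";
        String t2 = toks.size() > 1 ? toks.get(1) : "<EOF>";
        if (t1.equals("ID") && t2.equals("(")) return "Choice 1";   // LOOKAHEAD(2): <ID> "("
        if (t1.equals("(")) return "Choice 2";                       // default 1-token checks
        if (t1.equals("new")) return "Choice 3";
        if (t1.equals("ID")) return "Choice 4";
        throw new IllegalStateException("parse error at " + t1);
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("ID", "(", "ID", ")")));   // Choice 1
        System.out.println(choose(List.of("ID", ".", "ID")));        // Choice 4
    }
}
```

Only the guarded first choice pays for the second token; the remaining choices keep the cheap 1-token test.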
void identifier_list() :
{}
{
<ID>
( LOOKAHEAD(2) "," <ID> )*
}

while (next 2 tokens are "," and <ID>) {

choose the nested expansion (i.e., go into the (...)* construct)
consume the "," token
consume the <ID> token
}
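The difference between the default 1-token decision and LOOKAHEAD(2) on funny_list can be demonstrated with a small hand-written Java parser. This sketch is illustrative: the class IdListLookahead, the token-kind strings and the k parameter (only 1 or 2 are meaningful here) are assumptions. On the token sequence for "id1, id2, 5", k = 1 fails as described earlier, while k = 2 succeeds.

```java
import java.util.List;

// Hypothetical sketch contrasting 1-token and 2-token lookahead for
//   identifier_list ::= <ID> ( "," <ID> )*   and   funny_list ::= identifier_list "," <INT>
// Tokens are kind names: "ID", ",", "INT".
class IdListLookahead {
    private final List<String> toks;
    private final int k;          // lookahead at the ( "," <ID> )* choice point: 1 or 2
    private int pos = 0;

    IdListLookahead(List<String> toks, int k) { this.toks = toks; this.k = k; }

    private String la(int i) { return pos + i - 1 < toks.size() ? toks.get(pos + i - 1) : "<EOF>"; }
    private void consume(String kind) {
        if (!la(1).equals(kind)) throw new IllegalStateException("expected " + kind + " but found " + la(1));
        pos++;
    }

    private void identifierList() {
        consume("ID");
        // k == 1: enter the loop whenever the next token is ","
        // k == 2: enter only when the next two tokens are "," and <ID>
        while (la(1).equals(",") && (k == 1 || la(2).equals("ID"))) {
            consume(",");
            consume("ID");
        }
    }

    boolean funnyList() {
        try {
            identifierList();
            consume(",");
            consume("INT");
            return pos == toks.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("ID", ",", "ID", ",", "INT");   // id1, id2, 5
        System.out.println(new IdListLookahead(input, 1).funnyList()); // false
        System.out.println(new IdListLookahead(input, 2).funnyList()); // true
    }
}
```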

Syntactic lookahead
• How many lookaheads are needed in the java type
declaration ?

void TypeDeclaration() :
{}
{
ClassDeclaration() // public static final class
|
InterfaceDeclaration() // public abstract abstract interface
}

Solution 1

void TypeDeclaration() :
{}
{
LOOKAHEAD(2147483647) ClassDeclaration()
|
InterfaceDeclaration()
}

• Where 2147483647 is Integer.MAX_VALUE.

• Maybe 100 is ok as well !

Solution 2 – syntactic lookahead

void TypeDeclaration() :
{}
{
LOOKAHEAD( ClassDeclaration() )
ClassDeclaration()
|
InterfaceDeclaration()
}

• Lookahead of a complete ClassDeclaration() takes too
much time and does a lot of unnecessary checking.
Solution 3 – a better one

void TypeDeclaration() :
{}
{
LOOKAHEAD( ( "abstract" | "final" | "public" )* "class" )
ClassDeclaration()
|
InterfaceDeclaration()
}

Solution 4 – syntactic lookahead + number bound

void TypeDeclaration() :{}

{
LOOKAHEAD( 10, ( "abstract" | "final" | "public" )* "class" )
ClassDeclaration()
|
InterfaceDeclaration()
}
• Meaning: look ahead at most 10 tokens; if the pattern
( "abstract" | "final" | "public" )* "class" is not violated, try
ClassDeclaration().
• The default maximum number of tokens to look ahead is
Integer.MAX_VALUE for syntactic lookahead.
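A hypothetical Java sketch of the bounded syntactic lookahead in Solution 4: scan at most limit tokens without consuming them, stop with success at "class", stop with failure at any token outside the pattern (including end of input), and, following the slide's description, choose ClassDeclaration() if the limit is reached without a violation. Token kinds are plain strings; the class and method names are illustrative.

```java
import java.util.List;

// Hypothetical sketch of LOOKAHEAD( 10, ( "abstract" | "final" | "public" )* "class" ):
// peek at up to `limit` tokens without consuming any of them.
class TypeDeclLookahead {
    private static final List<String> MODIFIERS = List.of("abstract", "final", "public");

    static boolean looksLikeClass(List<String> toks, int limit) {
        for (int i = 0; i < limit; i++) {
            String t = i < toks.size() ? toks.get(i) : "<EOF>";
            if (t.equals("class")) return true;           // the pattern is complete
            if (!MODIFIERS.contains(t)) return false;     // the pattern is violated
        }
        return true;   // limit reached with no violation: ClassDeclaration() is tried
    }

    static String typeDeclaration(List<String> toks) {
        return looksLikeClass(toks, 10) ? "ClassDeclaration" : "InterfaceDeclaration";
    }

    public static void main(String[] args) {
        System.out.println(typeDeclaration(List.of("public", "final", "class", "A")));
        System.out.println(typeDeclaration(List.of("public", "abstract", "interface", "I")));
    }
}
```

Because the loop stops at the first token that decides the question, a typical declaration costs only a few token comparisons instead of a full trial parse of ClassDeclaration().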
Semantic lookahead
• Could we make the parser choose the 2nd alternative on
input "a" "a" without changing the order?
void Input() : {}
{
"a"
| "a" "a"
}

• Syntactic lookahead is impossible, since it can't express things
like: the next token is "a" and the following token is not "a".
Solution: semantic lookahead
void Input() : {}
{
LOOKAHEAD( { getToken(1).kind == A && getToken(2).kind != A } )
<A: "a">
| "a" "a"
}
• syntactic + semantic
void Input() : {}
{
LOOKAHEAD( "a", { getToken(2).kind != A } )
<A: "a">
| "a" "a"
}
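The semantic predicate is simply a Java boolean expression over upcoming tokens. Below is a hypothetical sketch of the resulting decision, with tokens as strings and return values 1 and 2 for the two alternatives (0 meaning that no alternative applies); the helper token(toks, i) only imitates getToken(i).

```java
import java.util.List;

// Hypothetical sketch of the decision for
//   LOOKAHEAD( { getToken(1).kind == A && getToken(2).kind != A } ) "a"  |  "a" "a"
class SemanticLookahead {
    static String token(List<String> toks, int i) {   // imitates getToken(i), 1-based
        return i <= toks.size() ? toks.get(i - 1) : "<EOF>";
    }

    static int choose(List<String> toks) {
        // Alternative 1 is guarded by the semantic lookahead: a plain Java boolean test.
        if (token(toks, 1).equals("a") && !token(toks, 2).equals("a")) return 1;
        // Alternative 2 uses the default 1-token lookahead.
        if (token(toks, 1).equals("a")) return 2;
        return 0;   // no alternative applies; a real parser would throw a ParseException
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("a")) + " " + choose(List.of("a", "a")));  // 1 2
    }
}
```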
Complete LOOKAHEAD directive

LOOKAHEAD( amount,
expansion,
{ boolean_expression } ) followExpansion

• At least one of the three entries must be present.

• The default values for each of these entries are defined
below:
– { boolean_expression } → { true }
– expansion → followExpansion
– amount → (expansion present) ? 2147483647 : 0
– Note: when amount = 0, no syntactic LOOKAHEAD is performed.
References on javacc lookahead
• https://javacc.dev.java.net/doc/lookahead.html
• http://userpages.umbc.edu/~vick/431/Lectures/Spring06/4_Parsing/1_LL/Looking_ahead_in_javacc.ppt
No ratings yet
Zebra Quick Card
2 pages
How To Set Up A SOCKS Proxy Using Putty & SSH
No ratings yet
How To Set Up A SOCKS Proxy Using Putty & SSH
2 pages
The Ridiculously Simple Guide to Apple Watch Series 4: A Practical Guide to Getting Started with Apple Watch Series 4 and WatchOS 6
From Everand
The Ridiculously Simple Guide to Apple Watch Series 4: A Practical Guide to Getting Started with Apple Watch Series 4 and WatchOS 6
Scott La Counte
No ratings yet
Winning The Credit Game
From Everand
Winning The Credit Game
Ian Anthony Suite
No ratings yet
Lecture 04
No ratings yet
Lecture 04
35 pages
Research Project For Mid-Term
No ratings yet
Research Project For Mid-Term
11 pages
SPCC May17
No ratings yet
SPCC May17
2 pages
EDUC 205 LESSON No. 1 (LM # 4 Final)
No ratings yet
EDUC 205 LESSON No. 1 (LM # 4 Final)
8 pages
Pega Next Best Action Advisor Implementation Guide - 0
No ratings yet
Pega Next Best Action Advisor Implementation Guide - 0
61 pages
Object Oriented Analysis and Design - Part2
No ratings yet
Object Oriented Analysis and Design - Part2
40 pages
Adama Science and Technology University: School of Electrical Engineering and Computing
No ratings yet
Adama Science and Technology University: School of Electrical Engineering and Computing
10 pages
Lecture (Chapter 9)
No ratings yet
Lecture (Chapter 9)
55 pages
CD Unit-Ii
No ratings yet
CD Unit-Ii
34 pages
Critical Review BIM Project Management
No ratings yet
Critical Review BIM Project Management
15 pages
Psd3a-Principles of Compiler Design
No ratings yet
Psd3a-Principles of Compiler Design
1 page
Memory Management
No ratings yet
Memory Management
127 pages
Operating Systems Session 15 Paging
No ratings yet
Operating Systems Session 15 Paging
15 pages
Logcat
No ratings yet
Logcat
9 pages
Compiler Design Handwritten Notes
No ratings yet
Compiler Design Handwritten Notes
42 pages
Lab
0% (1)
Lab
32 pages
Health Planning - PSM Made Easy
No ratings yet
Health Planning - PSM Made Easy
6 pages
Filename.l A.out Lex - Yy.c Source Program LEX Compiler
No ratings yet
Filename.l A.out Lex - Yy.c Source Program LEX Compiler
9 pages
805purl Compiler-Design TYS
No ratings yet
805purl Compiler-Design TYS
6 pages
Compiler Construction - CS606 Power Point Slides Lecture 02
100% (1)
Compiler Construction - CS606 Power Point Slides Lecture 02
20 pages
Cs2352 Principles of Compiler Design
No ratings yet
Cs2352 Principles of Compiler Design
1 page
Uwe Methodology
No ratings yet
Uwe Methodology
9 pages
CD Questions With Answers
100% (1)
CD Questions With Answers
36 pages
UTM - Playbook - For - Partners For 8 3 PDF
No ratings yet
UTM - Playbook - For - Partners For 8 3 PDF
86 pages
Unit - I Lexical Analysis Translator
No ratings yet
Unit - I Lexical Analysis Translator
13 pages
Lecture 11 - Syntax Directed Translation
No ratings yet
Lecture 11 - Syntax Directed Translation
57 pages
Otieno Isdora Awino Planning Development Project
No ratings yet
Otieno Isdora Awino Planning Development Project
116 pages
Compiler Design QB
No ratings yet
Compiler Design QB
6 pages
Lecture3 Java
No ratings yet
Lecture3 Java
82 pages
The Uniqueness of Software Quality Assurance - The Environments For Which SQA Methods
No ratings yet
The Uniqueness of Software Quality Assurance - The Environments For Which SQA Methods
21 pages