jsp(3)-41-50
RawInputCharacter:
any Unicode character
HexDigit: one of
0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
escape; if the number is odd, then the \ is not eligible to begin a Unicode escape.
For example, the raw input "\\u2297=\u2297" results in the eleven characters
" \ \ u 2 2 9 7 = ⊗ " (\u2297 is the Unicode encoding of the character “⊗”).
If an eligible \ is not followed by u, then it is treated as a RawInputCharacter
and remains part of the escaped Unicode stream. If an eligible \ is followed by u,
or more than one u, and the last u is not followed by four hexadecimal digits, then
a compile-time error occurs.
The character produced by a Unicode escape does not participate in further
Unicode escapes. For example, the raw input \u005cu005a results in the six
characters \ u 0 0 5 a, because 005c is the Unicode value for \. It does not result in
the character Z, which is Unicode character 005a, because the \ that resulted from
the \u005c is not interpreted as the start of a further Unicode escape.
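The escape-processing rules above can be sketched in a few lines of Java. This is a minimal illustration, not the compiler's actual implementation: the class name UnicodeEscapes and its methods are hypothetical, an IllegalArgumentException stands in for the compile-time error, and the sketch assumes that a character produced by an escape never counts toward the run of raw backslashes.

```java
final class UnicodeEscapes {

    /** Hypothetical helper: value of an ASCII hex digit, or -1 if c is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Translates Unicode escapes in raw input. An IllegalArgumentException
     * stands in for the compile-time error on a malformed escape.
     */
    static String translate(String raw) {
        StringBuilder out = new StringBuilder();
        int precedingBackslashes = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            // A backslash is eligible only after an even number of backslashes.
            boolean eligible = c == '\\' && precedingBackslashes % 2 == 0;
            if (eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++; // one or more u characters are allowed
                }
                if (j + 4 > raw.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(raw.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + k);
                    }
                    value = value * 16 + digit;
                }
                out.append((char) value);
                // Assumption: the produced character never counts as a raw backslash,
                // so it cannot take part in a further escape.
                precedingBackslashes = 0;
                i = j + 4;
            } else {
                out.append(c);
                precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }
}
```

Applied to the two raw inputs discussed above, translate leaves the doubled backslash and the text after it untouched while replacing the second escape with ⊗, and it turns \u005cu005a into the six characters \ u 0 0 5 a.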
The Java programming language specifies a standard way of transforming a
program written in Unicode into ASCII, so that it can be processed by
ASCII-based tools. The transformation adds an extra u to every Unicode escape
already present in the source text (for example, \uxxxx becomes \uuxxxx) and
simultaneously converts each non-ASCII character in the source text to a
\uxxxx escape containing a single u.
This transformed version is equally acceptable to a compiler for the Java
programming language ("Java compiler") and represents the exact same program.
The exact Unicode source can later be restored from this ASCII form by
converting each escape sequence where multiple u's are present to a sequence of
Unicode characters with one fewer u, while simultaneously converting each escape
sequence with a single u to the corresponding single Unicode character.
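The round trip described above can be sketched as a pair of functions. The class AsciiTransform and the names toAscii and fromAscii are hypothetical, chosen only for this illustration. The sketch assumes well-formed escapes in its input and does not handle corner cases such as a non-ASCII character that directly follows an odd run of backslashes.

```java
final class AsciiTransform {

    /** Unicode source to ASCII: existing escapes gain a u, non-ASCII becomes an escape. */
    static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        int precedingBackslashes = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\\') {
                boolean eligible = precedingBackslashes % 2 == 0;
                out.append(c);
                if (eligible && i + 1 < source.length() && source.charAt(i + 1) == 'u') {
                    out.append('u'); // existing escape: add one extra u
                }
                precedingBackslashes++;
            } else {
                precedingBackslashes = 0;
                if (c > 0x7f) {
                    out.append(String.format("\\u%04x", (int) c)); // single-u escape
                } else {
                    out.append(c);
                }
            }
        }
        return out.toString();
    }

    /** ASCII form back to the Unicode source; assumes well-formed escapes. */
    static String fromAscii(String ascii) {
        StringBuilder out = new StringBuilder();
        int precedingBackslashes = 0;
        int i = 0;
        while (i < ascii.length()) {
            char c = ascii.charAt(i);
            boolean eligible = c == '\\' && precedingBackslashes % 2 == 0;
            if (eligible && i + 1 < ascii.length() && ascii.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < ascii.length() && ascii.charAt(j) == 'u') {
                    j++;
                }
                if (j + 4 > ascii.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int uCount = j - i - 1;
                String hex = ascii.substring(j, j + 4);
                if (uCount > 1) {
                    // Multiple u's: keep the escape, but with one fewer u.
                    out.append('\\');
                    for (int k = 1; k < uCount; k++) {
                        out.append('u');
                    }
                    out.append(hex);
                } else {
                    // Single u: restore the original character.
                    out.append((char) Integer.parseInt(hex, 16));
                }
                precedingBackslashes = 0;
                i = j + 4;
            } else {
                out.append(c);
                precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }
}
```

Because every escape that was already in the source carries at least two u's after toAscii, fromAscii can tell it apart from an escape that was introduced for a non-ASCII character, which is what makes the restoration exact.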
Implementations should use the \uxxxx notation as an output format to
display Unicode characters when a suitable font is not available.
3.4 Line Terminators
Implementations next divide the sequence of Unicode input characters into lines
by recognizing line terminators. This definition of lines determines the line
numbers produced by a Java compiler or other system component. It also specifies
the termination of the // form of a comment (§3.7).
LineTerminator:
the ASCII LF character, also known as “newline”
the ASCII CR character, also known as “return”
the ASCII CR character followed by the ASCII LF character
InputCharacter:
UnicodeInputCharacter but not CR or LF
Lines are terminated by the ASCII characters CR, or LF, or CR LF. The two
characters CR immediately followed by LF are counted as one line terminator, not
two.
The result is a sequence of line terminators and input characters, which are the
terminal symbols for the third step in the tokenization process.
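As a rough illustration of the line-terminator rule, the following sketch splits input into lines, treating CR, LF, and CR immediately followed by LF each as a single terminator. The class LineSplitter and the method splitLines are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class LineSplitter {

    /** Splits input at CR, LF, or CR LF; a CR LF pair counts as one terminator. */
    static List<String> splitLines(String input) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                // CR immediately followed by LF is consumed as one terminator.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
                lines.add(current.toString());
                current.setLength(0);
            } else if (c == '\n') {
                lines.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // final line without a terminator
        }
        return lines;
    }
}
```

Note that LF followed by CR is two terminators, so an input such as a, LF, CR, b yields three lines, the middle one empty.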
3.5 Input Elements and Tokens
The input characters and line terminators that result from escape processing (§3.3)
and then input line recognition (§3.4) are reduced to a sequence of input elements.
Those input elements that are not white space (§3.6) or comments (§3.7) are
tokens. The tokens are the terminal symbols of the syntactic grammar (§2.3).
This process is specified by the following productions:
Input:
InputElements_opt Sub_opt
InputElements:
InputElement
InputElements InputElement
InputElement:
WhiteSpace
Comment
Token
Token:
Identifier
Keyword
Literal
Separator
Operator
Sub:
the ASCII SUB character, also known as “control-Z”
White space (§3.6) and comments (§3.7) can serve to separate tokens that, if
adjacent, might be tokenized in another manner. For example, the ASCII
characters - and = in the input can form the operator token -= (§3.12) only if
there is no intervening white space or comment.
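A small example may make the point concrete. The class TokenSeparation and its methods are hypothetical and exist only to show how adjacency changes tokenization.

```java
final class TokenSeparation {

    static int compoundSubtract(int x, int y) {
        x -= y; // '-' and '=' are adjacent, so they form the single operator token
        return x;
    }

    static int subtractNegated(int a, int b) {
        // The space keeps the two '-' characters apart: a binary minus followed
        // by a unary minus. Written without the space, the two characters would
        // instead be read as one "--" token.
        return a - -b;
    }
}
```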
As a special concession for compatibility with certain operating systems, the
ASCII SUB character (\u001a, or control-Z) is ignored if it is the last character in
the escaped input stream.
Consider two tokens x and y in the resulting input stream. If x precedes y,
then we say that x is to the left of y and that y is to the right of x.
For example, in this simple piece of code:
class Empty {
}
we say that the } token is to the right of the { token, even though it appears, in this
two-dimensional representation on paper, downward and to the left of the { token.
This convention about the use of the words left and right allows us to speak, for
example, of the right-hand operand of a binary operator or of the left-hand side of
an assignment.
3.7 Comments
These comments are formally specified by the following productions:
Comment:
TraditionalComment
EndOfLineComment
TraditionalComment:
/ * NotStar CommentTail
EndOfLineComment:
/ / CharactersInLine_opt LineTerminator
CommentTail:
* CommentTailStar
NotStar CommentTail
CommentTailStar:
/
* CommentTailStar
NotStarNotSlash CommentTail
NotStar:
InputCharacter but not *
LineTerminator
NotStarNotSlash:
InputCharacter but not * or /
LineTerminator
CharactersInLine:
InputCharacter
CharactersInLine InputCharacter
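The CommentTail and CommentTailStar productions amount to a two-state scanner, which the following sketch illustrates. The class CommentScanner and the method skipComment are hypothetical. One assumption deserves emphasis: the sketch enters the CommentTail state directly after the opening / *, so that a comment whose body begins with * is accepted; it does not enforce the leading NotStar of the TraditionalComment production as printed above.

```java
final class CommentScanner {

    /**
     * Returns the index just past the comment that starts at {@code start},
     * or -1 if no comment starts there or a traditional comment is unterminated.
     * For an end-of-line comment the returned index is that of the line
     * terminator (or the end of the input); the terminator is not consumed.
     */
    static int skipComment(String s, int start) {
        if (s.startsWith("//", start)) {
            int i = start + 2;
            while (i < s.length() && s.charAt(i) != '\r' && s.charAt(i) != '\n') {
                i++;
            }
            return i;
        }
        if (s.startsWith("/*", start)) {
            boolean afterStar = false; // false: CommentTail, true: CommentTailStar
            for (int i = start + 2; i < s.length(); i++) {
                char c = s.charAt(i);
                if (afterStar && c == '/') {
                    return i + 1; // star followed by slash closes the comment
                }
                // A star moves to (or stays in) CommentTailStar; anything else
                // returns to CommentTail.
                afterStar = (c == '*');
            }
            return -1; // ran out of input before the closing delimiter
        }
        return -1;
    }
}
```

In this sketch, a // sequence inside a traditional comment is just ordinary comment text, and the three characters / * / do not form a complete comment because the star that opens the comment cannot also close it.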
These productions imply all of the following properties:
3.8 Identifiers
by a NON-SPACING ACUTE (´, \u0301) when sorting, but these are different in
identifiers. See The Unicode Standard, Volume 1, pages 412ff for details about
decomposition, and see pages 626–627 of that work for details about sorting.
Examples of identifiers are:
String i3 αρετη MAX_VALUE isLetterOrDigit
3.9 Keywords
The following character sequences, formed from ASCII letters, are reserved for
use as keywords and cannot be used as identifiers (§3.8):
Keyword: one of
abstract    default    if            private         this
boolean     do         implements    protected       throw
break       double     import        public          throws
byte        else       instanceof    return          transient
case        extends    int           short           try
catch       final      interface     static          void
char        finally    long          strictfp        volatile
class       float      native        super           while
const       for        new           switch
continue    goto       package       synchronized
The keywords const and goto are reserved, even though they are not currently
used. This may allow a Java compiler to produce better error messages if
these C++ keywords incorrectly appear in programs.
While true and false might appear to be keywords, they are technically
Boolean literals (§3.10.3). Similarly, while null might appear to be a keyword, it
is technically the null literal (§3.10.7).
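A tokenizer might represent this distinction with a simple lookup. The sketch below is only an illustration: the class Keywords and its methods are hypothetical, and the set contains exactly the 48 keywords listed in this section.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class Keywords {

    /** The keywords as listed in this section, including the unused const and goto. */
    static final Set<String> KEYWORDS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "abstract", "boolean", "break", "byte", "case", "catch", "char", "class",
            "const", "continue", "default", "do", "double", "else", "extends", "final",
            "finally", "float", "for", "goto", "if", "implements", "import", "instanceof",
            "int", "interface", "long", "native", "new", "package", "private", "protected",
            "public", "return", "short", "static", "strictfp", "super", "switch",
            "synchronized", "this", "throw", "throws", "transient", "try", "void",
            "volatile", "while")));

    static boolean isKeyword(String word) {
        return KEYWORDS.contains(word);
    }

    /** true, false, and null are literals rather than keywords. */
    static boolean isLiteralWord(String word) {
        return word.equals("true") || word.equals("false") || word.equals("null");
    }
}
```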
3.10 Literals
A literal is the source code representation of a value of a primitive type (§4.2), the
String type (§4.3.3), or the null type (§4.1):
Literal:
IntegerLiteral
FloatingPointLiteral
BooleanLiteral
CharacterLiteral
StringLiteral
NullLiteral
3.10.1 Integer Literals
DecimalNumeral:
0
NonZeroDigit Digits_opt
Digits:
Digit
Digits Digit
Digit:
0
NonZeroDigit
NonZeroDigit: one of
1 2 3 4 5 6 7 8 9
HexDigits:
HexDigit
HexDigit HexDigits
The following production from §3.3 is repeated here for clarity:
HexDigit: one of
0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
OctalDigits:
OctalDigit
OctalDigit OctalDigits
OctalDigit: one of
0 1 2 3 4 5 6 7
Note that octal numerals always consist of two or more digits; 0 is always
considered to be a decimal numeral—not that it matters much in practice, for the
numerals 0, 00, and 0x0 all represent exactly the same integer value.
The largest decimal literal of type int is 2147483648 (2^31). All decimal
literals from 0 to 2147483647 may appear anywhere an int literal may appear, but
the literal 2147483648 may appear only as the operand of the unary negation
operator -.
The largest positive hexadecimal and octal literals of type int are
0x7fffffff and 017777777777, respectively, which equal 2147483647
(2^31 - 1). The most negative hexadecimal and octal literals of type int are
0x80000000 and 020000000000, respectively, each of which represents the decimal
value -2147483648 (-2^31). The hexadecimal and octal literals 0xffffffff
and 037777777777, respectively, represent the decimal value -1.
A compile-time error occurs if a decimal literal of type int is larger than
2147483648 (2^31), or if the literal 2147483648 appears anywhere other than as
the operand of the unary - operator, or if a hexadecimal or octal int literal does
not fit in 32 bits.
Examples of int literals:
0 2 0372 0xDadaCafe 1996 0x00FF00FF
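The boundary values quoted above can be checked directly in a program. The class IntLiteralDemo and its field names are hypothetical; the literals themselves are the ones given in the text.

```java
final class IntLiteralDemo {

    static final int MAX_HEX = 0x7fffffff;       // largest positive hexadecimal literal
    static final int MAX_OCT = 017777777777;     // largest positive octal literal
    static final int MIN_HEX = 0x80000000;       // most negative hexadecimal literal
    static final int MIN_OCT = 020000000000;     // most negative octal literal
    static final int MINUS_ONE_HEX = 0xffffffff; // all 32 bits set
    static final int MINUS_ONE_OCT = 037777777777;
    // 2147483648 is legal here only because it is the operand of unary minus.
    static final int MIN_DEC = -2147483648;

    public static void main(String[] args) {
        System.out.println(MAX_HEX == Integer.MAX_VALUE && MAX_OCT == Integer.MAX_VALUE);
        System.out.println(MIN_HEX == Integer.MIN_VALUE && MIN_OCT == MIN_DEC);
        System.out.println(MINUS_ONE_HEX == -1 && MINUS_ONE_OCT == -1);
        System.out.println(0 == 00 && 00 == 0x0);
    }
}
```

Each of the four lines printed by main is true.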
The largest decimal literal of type long is 9223372036854775808L (2^63).
All decimal literals from 0L to 9223372036854775807L may appear anywhere a
long literal may appear, but the literal 9223372036854775808L may appear only
as the operand of the unary negation operator -.
The largest positive hexadecimal and octal literals of type long are
0x7fffffffffffffffL and 0777777777777777777777L, respectively, which
equal 9223372036854775807L (2^63 - 1). The literals 0x8000000000000000L
and 01000000000000000000000L are the most negative long hexadecimal and
octal literals, respectively. Each has the decimal value -9223372036854775808L
(-2^63). The hexadecimal and octal literals 0xffffffffffffffffL and
01777777777777777777777L, respectively, represent the decimal value -1L.
A compile-time error occurs if a decimal literal of type long is larger than
9223372036854775808L (2^63), or if the literal 9223372036854775808L appears
anywhere other than as the operand of the unary - operator, or if a hexadecimal or
octal long literal does not fit in 64 bits.
Examples of long literals:
0l 0777L 0x100000000L 2147483648L 0xC0B0L
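The same kind of check works for the long boundary values. As before, the class LongLiteralDemo and its field names are hypothetical, and the literals are those quoted in the text.

```java
final class LongLiteralDemo {

    static final long MAX_HEX = 0x7fffffffffffffffL;
    static final long MAX_OCT = 0777777777777777777777L;
    static final long MIN_HEX = 0x8000000000000000L;
    static final long MIN_OCT = 01000000000000000000000L;
    static final long MINUS_ONE_HEX = 0xffffffffffffffffL;
    static final long MINUS_ONE_OCT = 01777777777777777777777L;
    // 9223372036854775808L is legal here only as the operand of unary minus.
    static final long MIN_DEC = -9223372036854775808L;

    public static void main(String[] args) {
        System.out.println(MAX_HEX == Long.MAX_VALUE && MAX_OCT == Long.MAX_VALUE);
        System.out.println(MIN_HEX == Long.MIN_VALUE && MIN_OCT == MIN_DEC);
        System.out.println(MINUS_ONE_HEX == -1L && MINUS_ONE_OCT == -1L);
        // 2147483648L is an ordinary long literal, although it would be
        // out of range as an int literal.
        System.out.println(2147483648L == Integer.MAX_VALUE + 1L);
    }
}
```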
3.10.2 Floating-Point Literals
A floating-point literal is of type float if it is suffixed with an ASCII letter F
or f; otherwise its type is double and it can optionally be suffixed with an ASCII
letter D or d.
FloatingPointLiteral:
Digits . Digits_opt ExponentPart_opt FloatTypeSuffix_opt
. Digits ExponentPart_opt FloatTypeSuffix_opt
Digits ExponentPart FloatTypeSuffix_opt
Digits ExponentPart_opt FloatTypeSuffix
ExponentPart:
ExponentIndicator SignedInteger
ExponentIndicator: one of
e E
SignedInteger:
Sign_opt Digits
Sign: one of
+ -
FloatTypeSuffix: one of
f F d D
The elements of the types float and double are those values that can be
represented using the IEEE 754 32-bit single-precision and 64-bit
double-precision binary floating-point formats, respectively.
The details of proper input conversion from a Unicode string representation of
a floating-point number to the internal IEEE 754 binary floating-point
representation are described for the methods valueOf of class Float and class
Double of the package java.lang.
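As a brief illustration of the suffix rule and of string conversion through valueOf, consider the sketch below. The class FloatLiteralDemo and the helper typeOf are hypothetical; Float.valueOf and Double.valueOf are the java.lang methods mentioned above.

```java
final class FloatLiteralDemo {

    /** Hypothetical helper: simple name of the wrapper class a literal boxes to. */
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(typeOf(1e1f));  // suffix f: type float
        System.out.println(typeOf(2.));    // no suffix: type double
        System.out.println(typeOf(.3));    // no suffix: type double
        System.out.println(typeOf(0f));    // digits plus suffix f: type float
        System.out.println(typeOf(1d));    // digits plus suffix d: type double

        // valueOf performs the string-to-binary conversion at run time.
        float parsed = Float.valueOf("1.5e3").floatValue();
        System.out.println(parsed == 1.5e3f);
        double parsedDouble = Double.valueOf("1e-3").doubleValue();
        System.out.println(parsedDouble == 1e-3);
        System.out.println(Float.valueOf("3.40282347e+38").floatValue() == Float.MAX_VALUE);
    }
}
```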
The largest positive finite float literal is 3.40282347e+38f. The smallest
positive finite nonzero literal of type float is 1.40239846e-45f. The largest