Chapter 6 Compiler Phases
Compiler
What is a Compiler?
• A program that translates a program in one language
to another language
– The essential interface between applications &
architectures
• Typically lowers the level of abstraction
– analyzes and reasons about the program & architecture
• We expect the program to be optimized, i.e., better
than the original
– ideally exploiting architectural strengths and hiding
weaknesses
What is a Compiler?
• A compiler is a computer program that translates a
program in a source language into an equivalent
program in a target language.
• A source program/code is a program/code written in
the source language, which is usually a high-level
language.
• A target program/code is a program/code written in
the target language, which often is a machine
language or an intermediate code.
[Figure: Source program → Compiler → Target program;
the compiler also produces error messages.]
Compiler vs. Interpreter
Ideal concept:
• Compiler: Source code → Compiler → Executable
• Interpreter: Source code + Input data → Interpreter
→ Output data
Compiler vs. Interpreter
• Most languages are usually thought of as using
either one or the other:
– Compilers: FORTRAN, COBOL, C, C++, Pascal,
PL/I
– Interpreters: Lisp, Scheme, BASIC, APL, Perl,
Python, Smalltalk
• BUT: not always implemented this way
– Virtual Machines (e.g., Java)
– Linking of executables at runtime
– JIT (Just-in-time) compiling
Compiler vs. Interpreter
[Figure: Intermediate code + Input data → Virtual
machine → Output]
Compiler vs. Interpreter
Compiler
• Pros
– Less space
– Fast execution
• Cons
– Slow processing
• Partly solved (separate compilation)
– Debugging
• Improved through IDEs
Interpreter
• Pros
– Easy debugging
– Fast development
• Cons
– Not for large projects
• Exceptions: Perl, Python
– Requires more space
– Slower execution
• Interpreter in memory all the time
Scanning/Lexical analysis
• Break program down into its smallest meaningful
symbols (tokens, atoms)
• Tools for this include lex, flex
• Tokens include e.g.:
– “Reserved words”: do if float while
– Special characters: ( { , + - = ! /
– Names & numbers: myValue 3.07e02
• Begin building the symbol table from the new symbols
found
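The token categories above can be sketched as a small regular-expression scanner. This is only an illustration, not how lex or flex work internally: the names `RESERVED`, `TOKEN_SPEC`, and `scan` are hypothetical, and the set of special characters is extended with `)` and `}` as an assumption.

```python
import re

# Reserved words taken from the slide; everything else here is illustrative.
RESERVED = {"do", "if", "float", "while"}

TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),  # e.g. 3.07e02
    ("NAME", r"[A-Za-z_]\w*"),                      # e.g. myValue
    ("SPECIAL", r"[(){},+\-=!/]"),                  # single special characters
    ("SKIP", r"\s+"),                               # whitespace is discarded
    ("ERROR", r"."),                                # anything else is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table) for a source string."""
    tokens = []
    symbol_table = {}  # name -> index, filled as new names are found
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "NAME":
            if text in RESERVED:
                kind = "RESERVED"
            else:
                symbol_table.setdefault(text, len(symbol_table))
        tokens.append((kind, text))
    return tokens, symbol_table


tokens, symbols = scan("while (myValue = 3.07e02) { x = x + myValue }")
```

Here `tokens` is the stream handed to the parser, and `symbols` records each new name in order of first appearance.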
Process of Compiling
Stream of characters
→ Scanner
Stream of tokens
→ Parser
Parse/syntax tree
→ Semantic analyzer
Annotated tree
→ Intermediate code generator
Intermediate code
→ Code optimization
Intermediate code
→ Code generator
Target code
→ Code optimization
Target code
Scanning
• A scanner reads a stream of characters and
puts them together into some meaningful (with
respect to the source language) units called
tokens.
• It produces a stream of tokens for the next
phase of the compiler.
Parsing
• A parser gets a stream of tokens from the
scanner, and determines if the syntax
(structure) of the program is correct according
to the (context-free) grammar of the source
language.
• Then, it produces a data structure, called a
parse tree or an abstract syntax tree, which
describes the syntactic structure of the
program.
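As a rough sketch of the parsing step, the recursive-descent parser below checks a token stream against a small context-free grammar and builds an abstract syntax tree as nested tuples. The grammar (expr → term {(+|-) term}, term → factor {(*|/) factor}, factor → number | name | ( expr )), the `Parser` class, and the helper `tokenize` are assumptions chosen for illustration; the helper silently skips characters it does not recognise, which a real scanner would report.

```python
import re


def tokenize(text):
    # Minimal stand-in scanner: numbers, names, and single-character operators.
    return re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):  # expr -> term {(+|-) term}
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):  # term -> factor {(*|/) factor}
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):  # factor -> number | name | ( expr )
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok[0].isdigit() or tok[0].isalpha() or tok[0] == "_":
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    return Parser(tokenize(text)).parse()


tree = parse("a + 2 * (b - 1)")
```

A syntactically incorrect input such as `a + * b` raises `SyntaxError` instead of producing a tree.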
Semantic analysis
• The semantic analyzer gets the parse tree from the parser
together with information about some syntactic elements.
• It determines whether the semantics, or meaning, of the
program is correct.
• This phase deals with static semantics:
– semantics that can be checked by reading the program text
alone, without running it.
– rules of the language that cannot be described by a
context-free grammar.
• Mostly, a semantic analyzer does type checking.
• It modifies the parse tree to obtain (statically)
semantically correct code.
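To make the type-checking and tree-modification steps concrete, here is a minimal sketch under simplifying assumptions: only the types `int` and `float` exist, tree nodes are tuples such as `("var", name)`, `("num", value)`, and `(op, left, right)`, and mixing the two types is repaired by inserting a hypothetical `int2float` conversion node. The names `analyze` and `SemanticError` are invented for this example.

```python
class SemanticError(Exception):
    """Raised when a static semantic rule is violated."""


NUMERIC = ("int", "float")


def analyze(node, symbols):
    """Return (type, annotated_node) for an expression tree.

    `symbols` maps declared variable names to their types.
    """
    kind = node[0]
    if kind == "num":
        return ("float" if isinstance(node[1], float) else "int"), node
    if kind == "var":
        name = node[1]
        if name not in symbols:
            raise SemanticError(f"undeclared variable {name!r}")
        return symbols[name], node
    if kind in ("+", "-", "*", "/"):
        ltype, left = analyze(node[1], symbols)
        rtype, right = analyze(node[2], symbols)
        for typ in (ltype, rtype):
            if typ not in NUMERIC:
                raise SemanticError(f"operator {kind!r} not defined for {typ}")
        if ltype != rtype:
            # One side is int and the other float: convert the int side.
            if ltype == "int":
                left = ("int2float", left)
            else:
                right = ("int2float", right)
            ltype = "float"
        return ltype, (kind, left, right)
    raise SemanticError(f"unknown node kind {kind!r}")


result_type, annotated = analyze(
    ("+", ("var", "i"), ("num", 2.5)), {"i": "int"}
)
```

The returned tree differs from the input only where a conversion had to be inserted, which is one way a semantic analyzer modifies the parse tree.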
Intermediate code generation
• An intermediate code generator
– takes a parse tree from the semantic analyzer
– generates a program in the intermediate language.
• In some compilers, a source program is
translated into an intermediate code first and
then the intermediate code is translated into the
target language.
• In other compilers, a source program is
translated directly into the target language.
Intermediate code generation (cont’d)
• Using intermediate code is beneficial when
compilers that translate a single source
language to many target languages are required.
– The front-end of a compiler (scanner to intermediate
code generator) can be shared by all of these compilers.
– A different back-end (code optimizer and code
generator) is required for each target language.
• One popular intermediate code is three-address
code. A three-address code instruction has
the form x = y op z.
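One possible way to produce three-address code from an expression tree is a post-order walk that assigns each operator result to a fresh temporary. The tuple-based tree layout, the temporary names `t1`, `t2`, ..., and the function `to_three_address` are assumptions for this sketch.

```python
import itertools


def to_three_address(tree):
    """Flatten a nested (op, left, right) tree into x = y op z instructions.

    Leaves are strings (names or literals). Returns (instructions, result),
    where each instruction is a (dest, left, op, right) tuple.
    """
    code = []
    temps = (f"t{i}" for i in itertools.count(1))  # fresh temporaries

    def walk(node):
        if isinstance(node, str):
            return node  # a leaf is already a valid operand
        op, left, right = node
        left_operand = walk(left)
        right_operand = walk(right)
        dest = next(temps)
        code.append((dest, left_operand, op, right_operand))
        return dest

    result = walk(tree)
    return code, result


code, result = to_three_address(("+", "a", ("*", "b", ("-", "c", "2"))))
lines = [f"{dest} = {left} {op} {right}" for dest, left, op, right in code]
# lines == ["t1 = c - 2", "t2 = b * t1", "t3 = a + t2"]
```

Each instruction has at most one operator and three addresses, so a back-end can translate the instructions one at a time.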
Code optimization
• Replacing an inefficient sequence of instructions
with a better sequence of instructions.
• Sometimes called code improvement.
• Code optimization can be done:
– after semantic analysis
• performed on the parse tree
– after intermediate code generation
• performed on the intermediate code
– after code generation
• performed on the target code
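Constant folding is one simple example of replacing instructions with a better sequence. The sketch below works on intermediate code represented as (dest, left, op, right) tuples and assumes that operands are either integer constants or names, and that every destination is assigned only once. The function name `fold_constants` is hypothetical, and division is left out to avoid folding a division by zero.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Evaluate constant instructions at compile time.

    Returns (remaining_instructions, known), where `known` maps each
    folded destination to its constant value.
    """
    known = {}
    remaining = []
    for dest, left, op, right in code:
        # Substitute operands whose values were already computed.
        left = known.get(left, left)
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int) and op in OPS:
            known[dest] = OPS[op](left, right)  # no instruction emitted
        else:
            remaining.append((dest, left, op, right))
    return remaining, known


before = [
    ("t1", 2, "*", 3),        # both operands constant: folded away
    ("t2", "x", "+", "t1"),   # becomes t2 = x + 6
    ("t3", "t2", "-", 1),     # unchanged, t2 is not a constant
]
after, known = fold_constants(before)
```

The three original instructions shrink to two, and the multiplication is never executed at run time.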
Code generation
• A code generator
– takes either an intermediate code or a parse tree
– produces a target program.
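A naive code generator can be sketched as a direct translation of each three-address instruction into instructions for a target machine. The target here is a made-up single-accumulator machine with `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`, and `DIV`; it does not correspond to any real instruction set, and `generate` is a hypothetical name.

```python
# Operator-to-mnemonic table for the hypothetical accumulator machine.
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(code):
    """Translate (dest, left, op, right) tuples into target instructions."""
    target = []
    for dest, left, op, right in code:
        target.append(f"LOAD {left}")             # accumulator := left
        target.append(f"{MNEMONIC[op]} {right}")  # accumulator := accumulator op right
        target.append(f"STORE {dest}")            # dest := accumulator
    return target


program = generate([("t1", "b", "*", "c"), ("x", "a", "+", "t1")])
```

The output stores `t1` and loads operands it could have kept in the accumulator, which is the kind of inefficiency that optimization on target code is meant to remove.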
Error Handling
• Errors can be found in every phase of compilation.
– Errors found during compilation are called static (or
compile-time) errors.
– Errors found during execution are called dynamic (or
run-time) errors.
• Compilers need to detect, report, and recover
from errors found in source programs.
• Error handlers differ from phase to phase of the
compiler.
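The difference between static and dynamic errors can be observed with Python's built-in `compile` and `exec`, used here only as a convenient stand-in for a compiler followed by execution. The helper `classify` is a hypothetical name, and the exact wording of the syntax error message depends on the Python version.

```python
def classify(source):
    """Report whether a source string fails at compile time or at run time."""
    try:
        program = compile(source, "<example>", "exec")
    except SyntaxError as err:
        # Detected before any code runs: a static (compile-time) error.
        return f"static error: {err.msg}"
    try:
        exec(program, {})
    except Exception as err:
        # Detected only while the program runs: a dynamic (run-time) error.
        return f"dynamic error: {type(err).__name__}"
    return "ok"


results = [
    classify("x = (1 +"),    # malformed: rejected while compiling
    classify("x = 1 / 0"),   # well-formed: fails only when executed
    classify("x = 1 + 2"),   # compiles and runs
]
```

The second program is syntactically valid, so the error surfaces only during execution.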
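Cross Compiler
• A cross compiler is a compiler that generates target
code for a machine different from the one on which the
compiler runs.
• A host language is the language in which the
compiler is written.
• T-diagram: the source language S and the target
language T sit on the arms of the T, and the host
language H sits at its base.
S     T
   H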