Essentials of Compilation: An Incremental Approach in Python
Jeremy G. Siek
The MIT Press would like to thank the anonymous peer reviewers who provided comments on
drafts of this book. The generous work of academic experts is essential for establishing the
authority and quality of our publications. We acknowledge with gratitude the contributions of
these otherwise uncredited readers.
This book was set in Times LT Std Roman by the author. Printed and bound in the United
States of America.
This book is dedicated to Katie, my partner in everything, my children, who grew
up during the writing of this book, and the programming language students at
Indiana University, whose thoughtful questions made this a better book.
Contents
Preface
1 Preliminaries
1.1 Abstract Syntax Trees
1.2 Grammars
1.3 Pattern Matching
1.4 Recursive Functions
1.5 Interpreters
1.6 Example Compiler: A Partial Evaluator
3 Parsing
3.1 Lexical Analysis and Regular Expressions
3.2 Grammars and Parse Trees
3.3 Ambiguous Grammars
3.4 From Parse Trees to Abstract Syntax Trees
3.5 Earley’s Algorithm
3.6 The LALR(1) Algorithm
3.7 Further Reading
4 Register Allocation
4.1 Registers and Calling Conventions
4.2 Liveness Analysis
4.3 Build the Interference Graph
8 Functions
8.1 The LFun Language
8.2 Functions in x86
8.3 Shrink LFun
8.4 Reveal Functions and the LFunRef Language
12 Generics
12.1 Compiling Generics
12.2 Resolve Instantiation
12.3 Erase Generic Types
A Appendix
A.1 x86 Instruction Set Quick Reference
References
Index
Preface
There is a magical moment when a programmer presses the run button and the
software begins to execute. Somehow a program written in a high-level language is
running on a computer that is capable only of shuffling bits. Here we reveal the
wizardry that makes that moment possible. Beginning with the groundbreaking work
of Backus and colleagues in the 1950s, computer scientists developed techniques
for constructing programs called compilers that automatically translate high-level
programs into machine code.
We take you on a journey through constructing your own compiler for a small
but powerful language. Along the way we explain the essential concepts, algorithms,
and data structures that underlie compilers. We develop your understanding of how
programs are mapped onto computer hardware, which is helpful in reasoning about
properties at the junction of hardware and software, such as execution time,
software errors, and security vulnerabilities. For those interested in pursuing compiler
construction as a career, our goal is to provide a stepping-stone to advanced topics
such as just-in-time compilation, program analysis, and program optimization. For
those interested in designing and implementing programming languages, we connect
language design choices to their impact on the compiler and the generated code.
A compiler is typically organized as a sequence of stages that progressively
translate a program to the code that runs on hardware. We take this approach to the
extreme by partitioning our compiler into a large number of nanopasses, each of
which performs a single task. This enables the testing of each pass in isolation and
focuses our attention, making the compiler far easier to understand.
The most familiar approach to describing compilers is to dedicate each chapter
to one pass. The problem with that approach is that it obfuscates how language
features motivate design choices in a compiler. We instead take an incremental
approach in which we build a complete compiler in each chapter, starting with
a small input language that includes only arithmetic and variables. We add new
language features in subsequent chapters, extending the compiler as necessary.
Our choice of language features is designed to elicit fundamental concepts and
algorithms used in compilers.
• We begin with integer arithmetic and local variables in chapters 1 and 2, where
we introduce the fundamental tools of compiler construction: abstract syntax trees
and recursive functions.
• In chapter 3 we learn how to use the Lark parser framework to create a parser
for the language of integer arithmetic and local variables. We learn about the
parsing algorithms inside Lark, including Earley and LALR(1).
• In chapter 4 we apply graph coloring to assign variables to machine registers.
• Chapter 5 adds conditional expressions, which motivates an elegant recursive
algorithm for translating them into conditional goto statements.
• Chapter 6 adds loops. This elicits the need for dataflow analysis in the register
allocator.
• Chapter 7 adds heap-allocated tuples, motivating garbage collection.
• Chapter 8 adds functions as first-class values without lexical scoping, similar to
functions in the C programming language (Kernighan and Ritchie 1988). The
reader learns about the procedure call stack and calling conventions and how
they interact with register allocation and garbage collection. The chapter also
describes how to generate efficient tail calls.
• Chapter 9 adds anonymous functions with lexical scoping, that is, lambda
expressions. The reader learns about closure conversion, in which lambdas are
translated into a combination of functions and tuples.
• Chapter 10 adds dynamic typing. Prior to this point the input languages are
statically typed. The reader extends the statically typed language with an Any
type that serves as a target for compiling the dynamically typed language.
• Chapter 11 uses the Any type introduced in chapter 10 to implement a gradually
typed language in which different regions of a program may be statically or
dynamically typed. The reader implements runtime support for proxies that allow values
to safely move between regions.
• Chapter 12 adds generics with autoboxing, leveraging the Any type and type
casts developed in chapters 10 and 11.
There are many language features that we do not include. Our choices balance the
incidental complexity of a feature versus the fundamental concepts that it exposes.
For example, we include tuples and not records because although they both elicit the
study of heap allocation and garbage collection, records come with more incidental
complexity.
Since 2009, drafts of this book have served as the textbook for sixteen-week
compiler courses for upper-level undergraduates and first-year graduate students at
the University of Colorado and Indiana University. Students come into the course
having learned the basics of programming, data structures and algorithms, and
discrete mathematics. At the beginning of the course, students form groups of two
to four people. The groups complete approximately one chapter every two weeks,
starting with chapter 2 and including chapters according to the students’ interests
while respecting the dependencies between chapters shown in figure 0.1. Chapter 8
(functions) depends on chapter 7 (tuples) only in the implementation of efficient
tail calls. The last two weeks of the course involve a final project in which students
design and implement a compiler extension of their choosing. The last few chapters
can be used in support of these projects. Many chapters include a challenge problem
that we assign to the graduate students.
Figure 0.1
Diagram of chapter dependencies.
For compiler courses at universities on the quarter system (about ten weeks in
length), we recommend completing the course through chapter 7 or chapter 8 and
providing some scaffolding code to the students for each compiler pass. The course
can be adapted to emphasize functional languages by skipping chapter 6 (loops)
and including chapter 9 (lambda). The course can be adapted to dynamically typed
languages by including chapter 10.
This book has been used in compiler courses at California Polytechnic State
University, Portland State University, Rose–Hulman Institute of Technology, University
of Freiburg, University of Massachusetts Lowell, and the University of Vermont.
This edition of the book uses Python both for the implementation of the compiler
and for the input language, so the reader should be proficient with Python. There
are many excellent resources for learning Python (Lutz 2013; Barry 2016; Sweigart
2019; Matthes 2019).The support code for this book is in the GitHub repository at
the following location:
https://github.com/IUCompilerCourse/
The compiler targets x86 assembly language (Intel 2015), so it is helpful but
not necessary for the reader to have taken a computer systems course (Bryant
and O’Hallaron 2010). We introduce the parts of x86-64 assembly language that
are needed in the compiler. We follow the System V calling conventions (Bryant
and O’Hallaron 2005; Matz et al. 2013), so the assembly code that we generate
works with the runtime system (written in C) when it is compiled using the
GNU C compiler (gcc) on the Linux and macOS operating systems on Intel hardware.
On the Windows operating system, gcc uses the Microsoft x64 calling convention
(Microsoft 2018, 2020), so the assembly code that we generate does not work
with the runtime system on Windows. One workaround is to use a virtual machine
with Linux as the guest operating system.
Acknowledgments
Jeremy G. Siek
Bloomington, Indiana
1 Preliminaries
In this chapter we introduce the basic tools needed to implement a compiler.
Programs are typically input by a programmer as text, that is, a sequence of characters.
The program-as-text representation is called concrete syntax. We use concrete
syntax to concisely write down and talk about programs. Inside the compiler, we use
abstract syntax trees (ASTs) to represent programs in a way that efficiently
supports the operations that the compiler needs to perform. The process of translating
concrete syntax to abstract syntax is called parsing and is studied in chapter 3. For
now we use the parse function in Python’s ast module to translate from concrete
to abstract syntax.
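As a small illustration of the translation from concrete to abstract syntax, the following sketch calls the parse function from the standard ast module on a one-line program. The program text and the variable names (source, tree, stmt, call, arg) are chosen here for illustration only and do not come from the book's support code.

```python
import ast

# Concrete syntax: the program as a sequence of characters.
source = "print(10 + 32)"

# ast.parse translates the text into an abstract syntax tree.
tree = ast.parse(source)

# The root is a Module whose body is a list of statements.
stmt = tree.body[0]   # an expression statement (ast.Expr)
call = stmt.value     # the call to print (ast.Call)
arg = call.args[0]    # the addition 10 + 32 (ast.BinOp)

# ast.dump renders the tree as text, which is handy for inspection.
print(ast.dump(tree, indent=2))
```

Printing the dump of a few small programs is a quick way to learn which ast class corresponds to which language feature.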
ASTs can be represented inside the compiler in many different ways, depending
on the programming language used to write the compiler. We use Python classes
and objects to represent ASTs, especially the classes defined in the standard ast
module for the Python source language. We use grammars to define the abstract
syntax of programming languages (section 1.2) and pattern matching to inspect
individual nodes in an AST (section 1.3). We use recursive functions to construct
and deconstruct ASTs (section 1.4). This chapter provides a brief introduction to
these components.
Compilers use abstract syntax trees to represent programs because they often need
to ask questions such as, for a given part of a program, what kind of language feature
is it? What are its subparts? Consider the program on the left and the diagram
of its AST on the right (1.1). This program is an addition operation that has two
subparts, an input operation and a negation. The negation has another subpart, the
integer constant 8. By using a tree to represent the program, we can easily follow
the links to go from one part of a program to its subparts.
input_int() + -8        (1.1)
[Diagram of the AST: + at the root with children input_int() and -; the child of - is 8.]
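The structure of program (1.1) can be checked directly with Python's ast module. The sketch below parses the program and follows the links from the addition to its subparts. The variable names (add, read, neg) are illustrative choices, not names used by the book.

```python
import ast

# Parse program (1.1) as a single expression.
tree = ast.parse("input_int() + -8", mode="eval")

add = tree.body                    # root of the expression: the addition
read, neg = add.left, add.right    # its two subparts

print(type(add).__name__, type(add.op).__name__)  # BinOp Add
print(type(read).__name__, read.func.id)          # Call input_int
print(type(neg).__name__, type(neg.op).__name__)  # UnaryOp USub
print(neg.operand.value)                          # 8
```

Each attribute access (left, right, operand) follows one link of the tree from a node to one of its subparts.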
of works for storage of water clearly testify. The period 69-180 AD
seems to have been marked by a considerable extension of
cultivation in these parts, and particularly in southern Numidia,
which at that time was included in the Province Africa. In this
district, between Sitifis (Setif) and Trajan’s great city Thamugadi
(Timgad), lay the commune of Lamasba[1169], the members of
which appear to have been mainly engaged in agriculture. There has
been preserved a large portion of a great inscription dealing with the
water-rights of their several farms. There is nothing to suggest that
the holders of these plots were tenants under great landlords. They
seem to be owners, not in the full sense of Roman civil law, but on
the regular provincial[1170] footing, subject to tribute. To determine
the shares of the several plots in the common water-supply was
probably the most urgent problem of local politics in this community.
The date of the inscription has been placed in the reign of
Elagabalus; but it is obviously based on earlier conditions and not
improbably a revision of an earlier scheme. It deals with the several
plots one by one, fixing the number of hours[1171] during which the
water is to be turned on to each, and making allowance for variation
of the supply according to the season of the year. A remarkable
feature of this elaborate scheme is the division of the plots into
those below the water level into which the water finds its way by
natural flow (declives), and those above water level (acclives). To
the latter it is clear that the water must have been raised by
mechanical means, and the scale of hours fixed evidently makes
allowance for the slower delivery accomplished thereby. For the
‘descendent’ water was to be left flowing for fewer hours than the
‘ascendent.’ As a specimen of the care taken in such a community to
prevent water-grabbing by unscrupulous members this record is a
document of high interest. That many others of similar purport
existed, and have only been lost to us by the chances of time, is
perhaps no rash guess.
The water-leet is called aqua Claudiana. The regulations are
issued by the local senate and people (decreto ordinis et
colonorum), for the place had a local[1172] government. Names of 43
possessors remain on the surviving portion of the stone. In form
they are generally Roman[1173]. It is noted that only three of them
have a praenomen. Of the quality of the men it is not easy to infer
anything. Some may perhaps have been Italians. Whether they, or
some of them, were working farmers must remain doubtful. At all
events they do not seem to belong to the class of coloni of whom we
shall have to speak below, but to be strictly cultivating possessors.
What labour they employed it is hardly possible to guess.
XXXVIII. FRONTINUS.
Sextus Julius Frontinus, a good specimen of the competent
departmental officers in the imperial service, was not only a
distinguished military commander but an engineer and a writer of
some merit. His little treatise[1174] on the aqueducts of Rome has for
us points of interest. From it we can form some notion of the
importance of the great water-works, not only to the city but to the
country for some miles in certain directions. For water-stealing by
the illicit tapping of the main channels was practised outside as well
as within the walls. Landowners[1175] did it to irrigate their gardens,
and the underlings of the staff (aquarii) connived at the fraud: to
prevent this abuse was one of the troubles of the curator. But in
certain places water was delivered by branch supplies from certain
aqueducts. This of course had to be duly licensed, and license was
only granted when the flow of water in the particular aqueduct was
normally sufficient to allow the local privilege without reducing the
regular discharge in Rome. The municipality of Tibur[1176] seems to
have had an old right to a branch of the Anio vetus. The aqua Crabra
had been a spring serving Tusculum[1177], but in recent times the
Roman aquarii had led off some of its water into the Tepula, and
made illicit profit out of the supply thus increased in volume.
Frontinus himself with the emperor’s approval redressed the
grievance, and the full supply of the Crabra again served the
Tusculan landlords. The jealous attention given to the water-works is
illustrated by the decrees[1178] of the Senate in the time of the
Republic and of emperors since, by which grants of water-rights can
only be made to individuals named in the grant, and do not pass to
heirs or assigns: the water must only be drawn from the reservoir
named, and used on the estate for which the license is specifically
granted.
The office of curator aquarum was manifestly no sinecure. It was
not merely that constant precautions had to be taken against the
stealing of the water. An immense staff[1179] had to be kept to their
duties, and the cleansing and repair of the channels needed prompt
and continuous attention. And it seems that some of the landowners
through whose estates the aqueducts passed gave much
trouble[1180] to the administration. Either they erected buildings in
the strips of land reserved as legal margin on each side of a channel,
or they planted trees there, thus damaging the fabric; or they drove
local roads over it; or again they blocked the access to working
parties engaged in the duties of upkeep. Frontinus quotes decrees of
the Senate dealing with these abuses and providing penalties for
persons guilty of such selfish and reckless conduct. But to legislate
was one thing, to enforce the law was another. Yet the
unaccommodating[1181] landlords had no excuse for their behaviour.
It was not a question of ‘nationalizing’ the side strips, though that
would have been amply justified in the interests of the state. But the
fact is that the old practice of Republican days was extremely tender
of private rights. If a landlord made objection to selling a part of his
estate, they took over the whole block and paid him for it. Then they
marked off the portions required for the service, and resold the
remainder. Thus the state was left unchallenged owner of the part
retained for public use. But the absence of any legal or moral claim
has not availed to stop encroachments: the draining away of the
water still goes on, with or without leave, and even the channels and
pipes themselves are pierced. No wonder that more severe and
detailed legislation was found necessary in the time of Augustus.
The writer ends by recognizing the unfairness of suddenly enforcing
a law the long disuse of which has led many to presume upon
continued impunity for breaking it. He therefore has been reviving it
gradually, and hopes that offenders will not force him to execute it
with rigour.
What stands out clearly in this picture of the water-service is the
utter lack of public spirit imputed to the landowners near Rome by a
careful and responsible public servant of good repute. There is none
of the sermonizing of Seneca or the sneers and lamentations of
Pliny. Frontinus takes things as they are, finds them bad, and means
to do his best to improve them, while avoiding the temptations of
the new broom. That a great quantity of water was being, and had
long been, diverted from the public aqueducts to serve suburban
villas and gardens, is certain. What we do not learn is whether much
or any of this was used for the market-gardens of the humble folk
who grew[1182] garden-stuff for the Roman market. It is the old
story,—little or nothing about the poor, save when in the form of a
city rabble they achieve distinction as a public burden and nuisance.
It does however seem fairly certain that licenses to abstract water
were only granted as a matter of special favour. Therefore, so far as
licensed abstraction went, it is most probable that influential owners
of suburbana were the only beneficiaries. Theft of water with
connivance[1183] of the staff was only possible for those who could
afford to bribe. There remains the alternative of taking it by eluding
or defying the vigilance of the staff. Is it probable that the poor
market-gardener ventured to do this? Not often, I fancy: we can
only guess, and I doubt whether much of the intercepted water
came his way. There was it is true one aqueduct[1184] the water of
which was of poor quality. It was a work of Augustus, intended to
supply the great pond (naumachia) in which sham sea-fights were
held to amuse the public. When not so employed, this water was
made available for irrigation of gardens. This was on the western or
Vatican side of the Tiber. Many rich men had pleasure-gardens in
that part, and we cannot be sure that even this water was in
practice serving any economic purpose.
XXXIX. INSCRIPTIONS RELATIVE TO ALIMENTA.