How To Make Mistakes in Python
Mike Pirnat
First Edition
ISBN: 978-1-491-93447-0
Table of Contents

Introduction

1. Setup
   Polluting the System Python
   Using the Default REPL

2. Silly Things
   Forgetting to Return a Value
   Misspellings
   Mixing Up Def and Class

3. Style
   Hungarian Notation
   PEP-8 Violations
   Bad Naming
   Inscrutable Lambdas
   Incomprehensible Comprehensions

4. Structure
   Pathological If/Elif Blocks
   Unnecessary Getters and Setters
   Getting Wrapped Up in Decorators
   Breaking the Law of Demeter
   Overusing Private Attributes
   God Objects and God Methods
   Global State

5. Surprises
   Importing Everything
   Overbroadly Silencing Exceptions
   Reinventing the Wheel
   Mutable Keyword Argument Defaults
   Overeager Code
   Poisoning Persistent State
   Assuming Logging Is Unnecessary
   Assuming Tests Are Unnecessary

6. Further Resources
   Philosophy
   Tools
Introduction
Surprises
    Those sudden shocking mysteries that only time can turn from OMG to LOL.
There are a couple of quick things that should be addressed before we get started.

First, this work does not aim to be an exhaustive reference on potential programming pitfalls (it would have to be much, much longer, and would probably never be complete) but strives instead to be a meaningful tour of the greatest hits of my sins.

My experiences are largely based on working with real-world but closed-source code; though authentic examples are used where possible, code samples that appear here may be abstracted and hyperbolized for effect, with variable names changed to protect the innocent. They may also refer to undefined variables or functions. Code samples make liberal use of the ellipsis (...) to gloss over reams of code that would otherwise obscure the point of the discussion. Examples from real-world code may contain more flaws than those under direct examination.

Due to formatting constraints, some sample code that's described as one line may appear on more than one line; I humbly ask the use of your imagination in such cases.

Code examples in this book are written for Python 2, though the concepts under consideration are relevant to Python 3 and likely far beyond.
Thanks are due to Heather Scherer, who coordinated this project; to Leonardo Alemeida, Allen Downey, and Stuart Williams, who provided valuable feedback; to Kristen Brown and Sonia Saruba, who helped tidy everything up; and especially to editor Meghan Blanchette, who picked my weird idea over all of the safe ones and encouraged me to run with it.

Finally, though the material discussed here is rooted in my professional life, it should not be construed as representing the current state of the applications I work with. Rather, it's drawn from over 15 years (an eternity on the web!) and much has changed in that time. I'm deeply grateful to my workplace for the opportunity to make mistakes, to grow as a programmer, and to share what I've learned along the way.
With any luck, after reading this you will be in a position to make a more interesting caliber of mistake: with an awareness of what can go wrong, and how to avoid it, you will be freed to make the exciting, messy, significant sorts of mistakes that push the art of programming, or the domain of your work, forward.

I'm eager to see what kind of trouble you'll get up to.
CHAPTER 1
Setup
There are a couple of ways I've gotten off on the wrong foot by not starting a project with the right tooling, resulting in lost time and plenty of frustration. In particular, I've made a proper hash of several computers by installing packages willy-nilly, rendering my system Python environment a toxic wasteland, and I've continued to use the default Python shell even though better alternatives are available. Modest up-front investments of time and effort to avoid these issues will pay huge dividends over your career as a Pythonista.
That may feel okay at first, but once you start developing or working with multiple projects on that computer, you're going to eventually have conflicts over package dependencies. Suppose project P1 depends on version 1.0 of library L, and project P2 uses version 4.2 of library L. If both projects have to be developed or deployed on the same machine, you're practically guaranteed to have a bad day due to changes to the library's interface or behavior; if both projects use the same site-packages, they cannot coexist! Even worse, on many Linux distributions, important system tooling is written in Python, so getting into this dependency management hell means you can break critical pieces of your OS.
The solution for this is to use so-called virtual environments. When you create a virtual environment (or "virtual env"), you have a separate Python environment outside of the system Python: the virtual environment has its own site-packages directory, but shares the standard library and whatever Python binary you pointed it at during creation. (You can even have some virtual environments using Python 2 and others using Python 3, if that's what you need!)

For Python 2, you'll need to install virtualenv by running pip install virtualenv, while Python 3 now includes the same functionality out-of-the-box.
To create a virtual environment in a new directory, all you need to do is run one command, though it will vary slightly based on your choice of OS (Unix-like versus Windows) and Python version (2 or 3). For Python 2, you'll use:

virtualenv <directory_name>
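For Python 3, where the functionality is built in as noted above, the equivalent command is:

python3 -m venv <directory_name>

Either way, you then activate the environment by sourcing the script it provides; in a Bash-style shell on a Unix-like system, that looks like:

source <directory_name>/bin/activate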
Equivalents are also provided for the Csh and Fish shells on Unix-like systems, as well as PowerShell on Windows. Once activated, the virtual environment is isolated from your system Python: any packages you install are independent from the system Python as well as from other virtual environments.

When you are done working in that virtual environment, the deactivate command will revert to using the default Python again.
As you might guess, I used to think that all this virtual environment stuff was too many moving parts, way too complicated, and I would never need to use it. After causing myself significant amounts of pain, I've changed my tune. Installing virtualenv for working with Python 2 code is now one of the first things I do on a new computer.
Figure 1-1. The Jupyter Notebook gives your browser super powers!
It takes just a little bit of extra effort and forethought to install and learn your way around one of these more sophisticated REPLs, but the sooner you do, the happier you'll be.
CHAPTER 2
Silly Things
indexes, getting the database query just so, because that's the fun part.

Here's an example fresh from a recent side project where I did this yet again. This function does all the hard work of querying for voters, optionally restricting the results to voters who cast ballots in some date range:
def get_recent_voters(self, start_date=None, end_date=None):
    query = self.session.query(Voter).\
        join(Ballot).\
        filter(Voter.status.in_(['A', 'P']))
    if start_date:
        query.filter(Ballot.election_date >= start_date)
    if end_date:
        query.filter(Ballot.election_date <= end_date)
    query.group_by(Voter.id)
    voters = query.all()
Meanwhile, three or four levels up the stack, some code that was expecting to iterate over a list of Voter objects vomits catastrophically when it gets a None instead. Now, if I've been good about writing tests, and I've only just written this function, I find out about this error right away, and fixing it is fairly painless. But if I've been In The Zone for several hours, or it's been a day or two between writing the function and getting a chance to exercise it, then the resulting AttributeError or TypeError can be quite baffling. I might have made that mistake hundreds or even thousands of lines ago, and now there's so much of it that looks correct. My brain knows what it meant to write, and that can prevent me from finding the error as quickly as I'd like.

This can be even worse when the function is expected to sometimes return a None, or if its result is tested for truthiness. In this case, we don't even get one of those confusing exceptions; instead the logic just doesn't work quite right, or the calling code behaves as if there were no results, even though we know there should be. Debugging these cases can be exquisitely painful and time-consuming, and there's a strong risk that these errors might not be caught until much later in the life cycle of the code.
I've started to combat this tendency by cultivating the habit of writing the return immediately after defining the function, making a second pass to write its core behavior:
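The original example for this habit isn't reproduced here; a minimal sketch of the idea, reusing the voter query from above, might look like this:

# First pass: stub the signature and the return before anything else.
def get_recent_voters(self, start_date=None, end_date=None):
    voters = []
    return voters

# Second pass: fill in the real work between the def and the return.
def get_recent_voters(self, start_date=None, end_date=None):
    query = self.session.query(Voter).\
        join(Ballot).\
        filter(Voter.status.in_(['A', 'P']))
    ...
    voters = query.all()
    return voters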
Misspellings
One of the top entries on my list of superpowers is my uncanny ability to mistype variable or function names when I'm programming. Like my forgetfulness about returning things from functions, I encounter this the most when I've been In The Zone for a couple of hours and have been slacking at writing or running tests along the way. There's nothing quite like a pile of NameErrors and AttributeErrors to deflate one's ego at the end of what seemed like a glorious triumph of programming excellence.
Transposition is especially vexing because it's hard to see what I've done wrong. I know what it's supposed to say, so that's all I can see. Worse, if the flaw isn't exposed by tests, there's a good chance it will escape unscathed from code review. Peers reviewing code can skip right over it because they also know what I'm getting at and assume (often too generously) I know what I'm doing.

My fingers seem to have certain favorites that they like to torment me with. Any end-to-end tests I write against our REST APIs aren't complete without at least half a dozen instances of respones when I mean response. I may want to add a metadata element to a JSON payload, but if it's getting close to lunch time, my rebellious phalanges invariably substitute meatdata. Some days I just give in and deliberately use slef everywhere instead of self since it seems like my fingers won't cooperate anyway.
Misspelling is particularly maddening when it occurs in a variable assignment inside a conditional block like an if:

def fizzbuzz(number):
    output = str(number)
    if number % 3 == 0:
        putput = "fizz"
    ...
    return output
The code doesn't blow up, no exceptions are raised; it just doesn't work right, and it is utterly exasperating to debug.

This issue, of course, is largely attributable to my old-school, artisanal coding environment, by which I mean I've been too lazy to invest in a proper editor with auto-completion. On the other hand, I've gotten good at typing xp in Vim to fix transposed characters.

I have also been really late to the Pylint party. Pylint is a code analysis tool that examines your code for various "bad smells." It will warn you about quite a lot of potential problems, can be tuned to your needs (by default, it is rather talkative, and its output should be taken with a grain of salt), and it will even assign a numeric score based on the severity and number of its complaints, so you can gamify improving your code. Pylint would definitely squawk about undefined variables (like when I try to examine respones.headers) and unused variables (like when I accidentally assign to putput instead of output), so it's going to save you time on these silly bug hunts even though it may bruise your ego.
So, a few suggestions:

• Pick an editor that supports auto-completion, and use it.
• Write tests early and run them frequently.
• Use Pylint. It will hurt your feelings, but that is its job.
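Getting Pylint into the mix is only a little work; assuming pip is available, something like this is enough to start getting yelled at:

pip install pylint
pylint path/to/your/package/

Each complaint comes with a message and a code you can look up (or silence in a pylintrc once you've decided you disagree), and the run ends with the overall score mentioned above.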
    ...
    # many more lines...

def test_being_excellent():
    instance = SuperAmazingClass(42, 2112)
    assert instance.be_excellent(...)
Wait, what?

My reverie is over, my flow is gone, and now I have to sort out what I've done to myself, which can take a couple of minutes when I've been startled by something that I assumed should Just Work.

When this happens, it means that I only thought that I wrote the code above. Instead, my careless muscle memory has betrayed me, and I've really written this:
def SuperAmazingClass(object):
    def __init__(self, arg1, arg2):
        ...
In this case, our "class" was called just fine, did nothing of value, and implicitly returned None. It may seem obvious in this contrived context, but in the thick of debugging reams of production code, it can be just plain weird.
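The fix is a single word; what I meant to write was, of course:

class SuperAmazingClass(object):
    def __init__(self, arg1, arg2):
        ...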
Above all, be on your guard. Trust no one, least of all yourself!
CHAPTER 3
Style
"Okay, so ten out of ten for style, but minus several million for good thinking, yeah?"
    - Zaphod Beeblebrox
In this chapter, we're going to take a look at five ways I've hurt myself with bad style. These are the sorts of things that can seem like a good idea at the time, but will make code hard to read and hard to maintain. They don't break your programs, but they damage your ability to work on them.
Hungarian Notation
A great way to lie to yourself about the quality of your code is to use Hungarian Notation. This is where you prefix each variable name with a little bit of text to indicate what kind of thing it's supposed to be. Like many terrible decisions, it can start out innocently enough:
strFirstName
intYear
blnSignedIn
fltTaxRate
lstProducts
dctParams
The intent here is noble: we're going to leave a signpost for our future selves or other developers to indicate our intent. Is it a string? Put a str on it. An integer? Give it an int. Masters of brevity that we are, we can even specify lists (lst) and dictionaries (dct).

But soon things start to get silly as we work with more complex values. We might conjoin lst and dct to represent a list of dictionaries:

lctResults
PEP-8 Violations
When I was starting out in Python, I picked up some bad habits from our existing codebase and perpetuated them for a lot longer than I should have. Several years had passed before I discovered PEP-8, which suggests a standardized style for writing Python code.

Let's take a look at a distilled example and examine my sins:
class MyGiganticUglyClass(object):
  def iUsedToWriteJava(self,x,y = 42):
    blnTwoSpacesAreMoreEfficient = 1
    while author.tragicallyConfused():
       print "Three spaces FTW roflbbq!!1!"
       if (new_addition):
           four_spaces_are_best = True
    if (multipleAuthors \
            or peopleDisagree):
        print "tabs! spaces are so mainstream"
    ...
    return ((pain) and (suffering))
from regrets import unfortunate_choices

class AnotherBadHabit(object):
    short_name       = 'foo'
    much_longer_name = 'bar'
Bad Naming
At some point I internalized PEP-8's 80-character line length limit, but my poor judgment led me to squeeze the most code I could into a single line by using single-character variables wherever possible:

f.write(string.join(map(lambda x,y=self.__dicProfiles,z=strPy:
    "%0.3s %s:%s:(%s)" % (z,x,y[x][0],y[x][1]),
    self.__dicProfiles.keys()),'\n')+'\n')
Stare deeply into a line of code like SBD=J(D(H),SB) and it's like gazing into the abyss. The cognitive load of deciphering this later simply isn't worth it; give things meaningful, human-readable names.

Of course, it's entirely possible to hurt yourself with long names, too. If you aren't working with an editor that can do auto-completion, things like these are filled with peril:

class TestImagineAClassNameThatExceeds80Characters(object):
    ...

def getSomethingFancyfromDictionary(...):
    ...

count_number_of_platypus_incidents_in_avg_season = ...
you spot the typos? Will you even be able to read the code that uses these names?

foo, bar, and baz are a good fit for example code, but not something that has to run and be maintained in production. The same goes for every silly, nonsense name you might be tempted to use. Will you even remember what spam or moo do in a week? In six months? I once witnessed classes named for post-Roman Germanic tribes. Pop quiz: What does a Visigoth do? How about a Vandal? These names might as well have been line noise for all the good they did.

Though it grieves me to say it, clever or nerdy cultural references (my worst offenses were lois.py and clark.py, which did some reporting tasks, and threepio.py, which communicated with a partner's EWOKS system) should be avoided as well. Inevitably, you will be devastated when no one appreciates the joke. Save the comedy for your code comments.

Even semantically accurate but cute names can be a source of pain. You'll command a lot more self-respect when you opt for LocationResolver over LocationLookerUpper.

Names should be clear, concise, specific, meaningful, and readable. For a great exploration of this topic, check out Brandon Rhodes' talk from PyCon 2013, "The Naming of Ducks."
Inscrutable Lambdas

You can create anonymous functions inline in your Python code with lambdas. Using lambdas can make you feel really smart, but I've become progressively allergic to them. Even when they're simple, they can be hard to read and quickly become confusing if there's more than one on a line or in an expression:

lstRollout = filter(lambda x: x[-1] == '0',
                    filter(lambda x: x != '0|0', lstMbrSrcCombo))

if not filter(lambda lst, sm=sm: sm in lst,
              map(lambda x, dicA=dicA: dicA.get(x, []),
                  lstAttribute)):
    ...

When we use a lambda in the middle of a line of code, that 80-character rule pressures us to really make the most of that line. Cue the one- and two-character variable names!
Our future selves will often be better off if we extract that complexity into a named, reusable, documentable, testable function that we only have to get right once:

def taco_to_cat(input):
    """Convert tacos to cats"""
    return input[-1].lower().replace('taco', 'cat')
Incomprehensible Comprehensions

List comprehensions are great: they're beautiful, they're elegant, they're inspiring other languages to adopt them. When I discovered list comprehensions, I fell in love, and I fell hard. I used them at every opportunity I had. And using them is fine, until they get filled with so much junk that it's hard to see what's even going on.

This example isn't too bad, but any time comprehensions are nested like this, it takes more effort to understand what's happening:

crumbs = [y for y in
          [x.replace('"', '') for x in crumbs] if y]

This one will scare new developers who aren't friends with zip yet:

return [dict(x) for x in [zip(keys, x) for x in values]]
All of those examples are real, all of them appeared inline in other functions, and none of them were commented or explained. (I am so, so sorry.) At the very least, constructions like this deserve some kind of comment. They could probably use better variable names than x or j (and in the i, j, k case, those weren't even integers for counting. Oof!).

If the comprehension is sufficiently complex, it might even be worth extracting the whole thing into a separate function with a reasonable name to encapsulate that complexity. Instead of the examples above, imagine if we had code that read like this:

crumbs = filter_crumbs(crumbs)
data = dict_from_lists(keys, values)
prop_list = make_exclusion_properties(prop_data)
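Those helper functions don't appear in the original text, but as an illustration, the first two could be as simple as wrapping the comprehensions shown above and giving them a home for a docstring and a test:

def filter_crumbs(crumbs):
    """Strip stray double quotes and drop any empty crumbs."""
    cleaned = (c.replace('"', '') for c in crumbs)
    return [c for c in cleaned if c]

def dict_from_lists(keys, values):
    """Build one dict per row of values, using the shared keys."""
    return [dict(zip(keys, row)) for row in values]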
CHAPTER 4
Structure
"It's a trap!"
    - Admiral Ackbar

Let's move on into questions of structure and how you can hurt your future self with tangled logic and deep coupling. These structural problems impact your ability to change or reuse your code as its requirements inevitably change.
If that's the case, consider externalizing it entirely, and let the strategy be chosen by the caller, who may in fact know better than we do about whatever those factors are. The strategy is invoked as a callback:

def do_awesome_stuff(strategy):
    ...
    strategy()
    ...

result = do_awesome_stuff(strategy1)
From there it's not too far of a jump into dependency injection, where our code is provided with what it needs, rather than having to be smart enough to ask for it on its own:

class Foo(object):
    def __init__(self, strategy):
        self.strategy = strategy

    def do_awesome_stuff(self):
        ...
        self.strategy()
        ...

foo = Foo(strategy2)
foo.do_awesome_stuff()
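A related trick for taming a long if/elif chain that exists only to pick a strategy is a dispatch dictionary. This isn't from the original example; the format names and handler functions here are hypothetical:

EXPORTERS = {
    'csv': export_csv,
    'json': export_json,
    'xml': export_xml,
}

def export(format, data):
    try:
        exporter = EXPORTERS[format]
    except KeyError:
        raise ValueError("unknown export format: %r" % format)
    return exporter(data)

Adding a new format becomes a one-line change to the dictionary instead of yet another elif.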
Each and every attribute of each and every class had getter and setter functions that did barely anything. The getters would simply return the attributes that they guarded, and the setters would occasionally enforce things like types or constraints on the values the attributes were allowed to take. This InviteEvent class had 40 getters and 40 setters; other classes had even more. That's a lot of code to accomplish very little, and that's not even counting the tests needed to cover it all.

And trying to work with instances of these objects was pretty awful, too; this kind of thing quickly becomes tiresome:

event.setEventNumber(10)
print event.getEventNumber()
@property
def event_number(self):
    return self._event_number

@event_number.setter
def event_number(self, x):    # note: reuse the property's name here
    self._event_number = int(x)

@event_number.deleter
def event_number(self):
    self._event_number = None

...
The only trick is remembering to use the name of the property when hooking up the setter or deleter, rather than using @property itself.

One nice thing about this decorator-based approach is that it doesn't junk up the namespace of the class with a bunch of functions that you really don't want anyone to call. There's just the single property object for each property!

Using these objects is far more comfortable than before, too. All those function calls and parentheses simply vanish, leaving us with what looks like plain old dot access:

event.event_number = 10
print event.event_number
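Pulled together into a self-contained (and greatly simplified, relative to the original InviteEvent) class, the whole thing reads like this:

class InviteEvent(object):
    def __init__(self, event_number):
        self._event_number = None
        self.event_number = event_number   # runs through the setter

    @property
    def event_number(self):
        return self._event_number

    @event_number.setter
    def event_number(self, x):
        self._event_number = int(x)

event = InviteEvent("10")
print event.event_number   # 10, stored as an int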
it knows about, and not reach deeply into nested attributes, across friends of friends, and into strangers.

It feels great to break this law because it's so expedient to do so. It's easy to feel like a superhero or a ninja commando when you quickly tunnel through three, four, or more layers of abstraction to accomplish your mission in record time.

Here are just a few examples of my countless crimes. I've reached across multiple objects to call a method:

gvars.objSession.objCustomer.objMemberStatus.isPAID()

Yikes!

This kind of thing might be okay when we're debugging, or exploring in an interactive shell, but it's bad news in production code.
When we break this law, our code becomes brittle. Instead of relying on the public interface of a single object, it now relies on a delicate chain of nested attributes, and any change that disrupts that chain will break our code in ways that will furrow our brows as we struggle to repair the complex code plumbing mess we've made for ourselves.
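One hedged way out (the method name here is made up, not from the original code) is to give the nearest object a small public method that does the reaching on the caller's behalf, so callers depend on one interface instead of a whole chain:

class Session(object):
    ...
    def has_paid_member(self):
        return self.objCustomer.objMemberStatus.isPAID()

# callers now only know about the session:
if gvars.objSession.has_paid_member():
    ...

If the customer or member-status objects ever change shape, only Session has to care.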
We should especially avoid depending on single- and double-underscore internals of an object, because they are prefixed this way for a reason. We are explicitly being told that these items are part of the internal implementation of the object and we cannot depend on them to remain as they are; they can be changed or removed at any time. (The single underscore is a common convention to indicate that whatever it prefixes is "private-ish," while double-underscore attributes are made private by Python's name mangling.)
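Name mangling is easy to see for yourself in a throwaway example like this one:

class MyClass(object):
    def __init__(self):
        self.__secret = 42

obj = MyClass()
print obj._MyClass__secret   # 42 -- the mangled name is reachable...
print obj.__secret           # ...but this raises AttributeError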
"Hands off!" this code shouts. "You'll never need to use these things, and I know better than you!"
Inevitably, I discovered that I did need to use code that was hiding behind the double underscore, sometimes to reuse functionality in previously unforeseen ways, sometimes to write tests (either to test a method in isolation or to mock it out).

Let's say we wanted to subclass that MyClass up above, and it needs a slightly customized implementation of the do_something method. We might try this:

class MyOtherClass(MyClass):
    def do_something(self):
        self.__do_a_new_step()
        # name mangling rewrites this lookup as
        # self._MyOtherClass__do_one_more_step()
        self.__do_one_more_step()
Find all Python source files, count the number of lines, and sort the results in descending order, so that the files with the most lines bubble to the top of the list; anything over 1000 lines is worth further investigation.

Count the number of classes defined in a big module, and the number of methods defined at some level of indentation (i.e., within a class or within other functions) in that module.

If the ratio of methods to classes seems large, that's a good warning sign that we need to take a closer look.
Or, if we feel like being creative, we can use Python to make a little cross-platform tool:

import collections
import fileinput
import os

def is_line(line):
    return True

def has_class(line):
    return line.startswith('class')

def has_function(line):
    return 'def ' in line

def find_gods():
    stats = collections.defaultdict(collections.Counter)
    for line in fileinput.input(find_files()):
        for key, func in COUNTERS.items():
            if func(line):
                stats[key][fileinput.filename()] += 1
if __name__ == '__main__':
    find_gods()
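The find_files helper and COUNTERS mapping that the snippet relies on were part of the original program but aren't reproduced here; based on the description that follows, a rough reconstruction (names and output format are guesses) might look like this:

COUNTERS = {
    'lines': is_line,
    'classes': has_class,
    'functions': has_function,
}

def find_files(path='.'):
    # Recursively yield the paths of all .py files under path.
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith('.py'):
                yield os.path.join(dirpath, filename)

def report(stats):
    # Print lines/classes/functions per file, sorted by line count,
    # plus a rough functions-to-classes ratio.
    for filename, lines in stats['lines'].most_common():
        classes = stats['classes'][filename]
        functions = stats['functions'][filename]
        ratio = float(functions) / classes if classes else functions
        print "%6d %4d %4d %6.1f %s" % (
            lines, classes, functions, ratio, filename)

find_gods would then finish by calling report(stats).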
This small program is enough to recursively find all .py files; count the number of lines, classes, and functions in each file; and emit those statistics grouped by filename and sorted by the number of lines in the file, along with a ratio of functions to classes. It's not perfect, but it's certainly useful for identifying risky modules!
Let's take a high-level look at some of the gods I've regretted creating over the years. I can't share the full source code, but their summaries should illustrate the problem.

One of them is called CardOrderPage, which spreads 2900 lines of pain and suffering across 69 methods, with an 85-line __init__ and numerous methods in excess of 200 to 300 lines, all just to shovel some data around.

MemberOrderPage is only 2400 lines long, but it still packs a whopping 58 methods, and its __init__ is 90 lines. Like CardOrderPage, it has a diverse set of methods, doing everything from request handling to placing an order and sending an email message (the last of which takes 120 lines, or roughly 5 percent of the class).

Then there's a thing called Session, which isn't really what most web frameworks would call a session (it doesn't manage session data on the server), but which instead provides "context" about the request, which is a polite way to say that it's a big bag of things that you can hurt yourself with. Lots of code in this codebase ended up being tightly coupled to Session, which presents its own set of problems that we'll explore further in a later section.

At the time that I captured the data about it, Session was only about 1000 lines, but it had 79 methods, most of which are small, save for a monstrous 180-line __init__ laden with minefields and side effects.
Besides line count, another way you can identify god methods is by looking for naming anti-patterns. Some of my most typical bad methods have been:

def update_everything(...):
    ...

def do_everything(...):
    ...

def go(...):
    ...

If you find these kinds of abominations in your code, it's a sign that it's time to take a deep breath and refactor them. Favor small functions and small classes that have as few responsibilities as possible, and strive to do as little work as possible in the __init__ so that your classes are easy to instantiate, with no weird side effects, and your tests can be easy and lightweight. You want to break up these wanna-be gods before they get out of hand.

Increasing the number of small classes and methods may not optimize for raw execution speed, but it does optimize for maintenance over the long term and the overall sanity and well-being of the development team.
Global State
We come now to one of my greatest regrets. This module is called gvars.py, and it started simply as a favor to another developer who needed easy access to some objects and didn't want to pass them around everywhere, from way at the top of the stack to deep down in the guts:

dctEnv = None
objSession = None
objWebvars = None
objHeaders = None
objUserAgent = None

It's basically just a module that has some module-level global variables that would get repopulated by the app server with every request that would come in over the web. If you import it, you can talk to those globals, and you can do this at any level, from those lofty heights that first see the request, where it seems like a reasonable thing to want to do, all the way down to the darkest, most horrible depths of your business logic, data model, and scary places where this has no business being. It enables this sort of thing at every level of your system:
from col.web import gvars
...
if gvars.objSession.hasSomething():
    ...
if gvars.objWebvars.get('foo') == 'bar':
    ...
strUserName = \
    gvars.objSession.objCustomer.getName()
For its specific purpose, it got the job done, albeit with little flexibility.

One day, I had to figure out how to surface the permission questions for more than one sitegroup (a fancy internal term for a customer namespace). Without some serious refactoring, this just wasn't possible as-is.

So instead (and I am so, so sorry for this) I wrote a PermissionFacade wrapper around the PermissionAdapter, and its job was to fake out the necessary objects in gvars using Mock objects, instantiate a PermissionAdapter, then restore the original gvars before leaving the method:
class PermissionFacade(object):
    def __init__(self, ...):
        self.webvars = Mock()
        self.session = Mock()
        self.headers = Mock()
        ...
        self.session.getSourceFamily.return_value = \
            source_family
        try:
            self.permission_adapter = PermissionAdapter(
                sitegroup, ...)
            # ...and some other grotesque mock monkey
            # patching to fake out a request context...
        finally:
            gvars.dctEnv = orig_gvars_env
        return self.permission_adapter
CHAPTER 5
Surprises
"If you do not expect the unexpected you will not find it, for it is not to be reached by search or trail."
    - Heraclitus
At last, we get to the really weird stuff, the things that go bump in
the night and cause someone to get paged to solve them. These are
some of the many ways that you can create little time bombs in your
code, just waiting to surprise and delight you at some point in the
future.
Importing Everything
PEP-8 recommends avoiding wildcard imports (from some_module import *), and it's absolutely right. One of the most exciting reasons is that it opens your code up to some interesting ways to break in a multideveloper environment.

Suppose there's some module foo.py that has a bunch of great things in it, and your code wants to make use of many of them. To save yourself the tedium of either importing lots of individual names, or importing just the module and having to type its name over and over again, you decide to import everything:
import time
from foo import *

def some_function(...):
    current_time = time.time()
    ...
This works fine, your tests pass, you commit the change, and off it goes up the deployment pipeline. Time passes, until one day errors start flooding in. The traceback tells you that your code is causing AttributeError exceptions when trying to call time.time(). But the unit tests are all green; not only are the tests for your code passing, so are the tests for foo.

What's happening in this ripped-from-reality scenario is that someone has added a time to foo.py that isn't the standard library module of the same name. Maybe they defined time as a module-level global variable, or made a function with the same name, or perhaps they, too, didn't like typing a module name in numerous function calls and so imported it like this:
from time import time
...
Because the import * happened after the import time in your code, the name time is replaced by the one from from foo import *, supplanted like a pod person from Invasion of the Body Snatchers.

The tests didn't catch this error because they were unit tests, and intentionally isolated the code under test (your module, or in reality, mine) from dependencies that are hard to control. The entire time module was mocked out in the tests to allow it to be controlled, and because it was mocked out, it presented exactly the expected interface. And of course the tests for foo itself pass; they're verifying that the things inside that module are behaving correctly; it's not their responsibility to check up on what callers are doing with this module.
It's an easy mistake to make. Whoever changed foo.py didn't look to see if anyone else was importing everything, and you were busy working on other things and either didn't see the change come in or it didn't register with you. This is especially possible if foo.py is some third-party library that you might upgrade without much review; this hazard is just a pip install -U away!

So, do as PEP-8 suggests, and avoid wildcard imports! Don't do them, don't let your colleagues do them, and if you see them in code that you maintain, correct them.
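The boring alternatives are the safe ones; whichever you choose, every name in your module is one you can account for (useful_thing here is, of course, hypothetical):

import foo                      # namespaced: foo.useful_thing()
import foo as f                 # shorter, still namespaced: f.useful_thing()
from foo import useful_thing    # explicit about exactly what you take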
def get_important_object():
    try:
        data = talk_to_database(...)
    except IOError:
        # Handle the exception appropriately;
        # perhaps use default values?
        data = { ... }
    return ImportantObject(data)
The one acceptable use I have found for the "Diaper Pattern" is at high-level system boundaries, such as a RESTful web service that should always return a well-formed JSON response, or an XML-RPC backend that must always return a well-formed blob of XML. In such a case, we do want to catch everything that might go wrong, and package it up in a way that clients will be able to cope with it.

But even when the need for the diaper is legitimate, it's not enough to just package up the error and wash our hands of it, because while our clients will be able to complain to us that our service is broken, we won't actually know what the problem is unless we do something to record the exception. The logging module that comes with Python makes this so trivial that it's almost pleasant. Assuming you have a logger instance handy, simply invoking its exception method will log the full stack trace:
def get_important_object():
    ...
    try:
        data = talk_to_database(...)
        return ImportantObject(data)
    except Exception:
        logger.exception("informative message")
        ...
Now we'll know what happened, where it happened, and when, and that can make all the difference when cleaning things up in the middle of the night.
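If you don't already have that logger instance handy, getting one is a couple of lines of standard library boilerplate (the basicConfig call here is just the simplest possible setup, not a production recommendation):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)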
Stop for a second and really think about that: this excess data came from other requests. Other requests potentially came from other users.

What if it included personal details, like names or addresses? What if it included health information? What if it included payment data, like a credit card number? The consequences could range from embarrassing to catastrophic.

Thankfully, this bug never made it to production, and once we located the problem code, the fix was straightforward. When the default value really does need to be mutable, we set it to None in the function definition, and then immediately give it a reasonable default inside the function itself:
def set_reminders(self, event, reminders=None):
    reminders = reminders or []
    # or, if there are valid falsey inputs
    # that we'd like to preserve:
    reminders = [] if reminders is None else reminders
    ...
This way we're guaranteed to start with a fresh instance each time the function is called, and data won't leak between invocations.
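The original buggy version isn't shown above, but the underlying trap is easy to demonstrate in a few lines: a mutable default is created once, when the def statement runs, and is then shared by every call that doesn't override it:

def add_reminder(reminder, reminders=[]):   # the mistake
    reminders.append(reminder)
    return reminders

print add_reminder('walk dog')   # ['walk dog']
print add_reminder('feed cat')   # ['walk dog', 'feed cat'] -- surprise!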
Overeager Code

For longer than I care to admit, I thought it was good for code to be proactive, to make things convenient to set up and get going. This led me to create code that would do too much, too soon, resulting in side effects that hampered reusability and impacted performance. These mistakes fall into two basic categories: doing too much when a module is imported and doing too much when an object is instantiated.

At Import Time

It can be tempting to set up globally available values at the module level so that everything else can use them right away. Maybe we're establishing a database connection, perhaps performing some expensive calculation, traversing or transforming a large data structure, or fetching data from an external service.
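The rest of this discussion isn't reproduced here, but a hypothetical sketch of the pattern being warned about, and a lazier alternative, looks like this (load_config and connect_to_database are stand-ins):

# Eager: this work runs the moment anyone imports the module,
# even if they only wanted one small helper function from it.
config = load_config()
db = connect_to_database(config)

# Lazier: defer the expensive work until somebody actually needs it.
_db = None

def get_db():
    global _db
    if _db is None:
        _db = connect_to_database(load_config())
    return _db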
At Instantiation Time

Loading up the __init__ or __new__ methods of a class with a lot of extra work is similar to what we saw above at the module level, but with a couple of insidious differences.

First, unless we've made a module-level mess, the import behavior won't be impacted. It may be enticing, daring us to use it even if it has weird dependencies. After all, says the wicked little voice, if we're really desperate we can just feed it Mocks or @patch our sorrows away. Come on, it'll be fun. If there aren't dependency issues, the class practically double-dog dares us.
53
tion was required to ensure that data wouldnt leak between objects
in those shared class attributes. Thinking myself very clever indeed,
I created the MutantDataObject, which carefully made instance
copies of mutable class attributes.
Time passed. MutantDataObject became popular for its conve
nience and worked its way into a number of our systems. Everyone
was happy until one day when we got a nasty surprise from a new
system we were building: the system was so slow that requests were
hitting our 30-second fcgi timeout, bringing the website to its knees.
As we poked around, we eventually discovered that we were simply
making way too many MutantDataObject instances. One or two
werent terrible, but some inefficient logic had us accidentally mak
ing and discarding N2 or N3 of them. For our typical data sets, this
absolutely killed the CPUthe higher the load went, the worse each
subsequent request became. We did a little comparative timing anal
ysis on a box that wasnt busy dying, spinning up some minimal
objects with only a few class attributes.
DataObject was kind of mediocre, and StrictDataObject was... Don't pay too much attention to the numbers in Figure 5-3, as they weren't captured on current hardware; instead, focus on their relative magnitudes.
Fixing the flawed plumbing that led to instantiating so many objects was off the table due to the time and effort it required, so we resorted to even darker magic to resolve this crisis, creating a new DataObject which called upon the eldritch powers of metaclasses to more efficiently locate and handle mutables in the __new__. The result was uncomfortably complicated, maybe even Lovecraftian in its horror, but it did deliver significant performance results (see Figure 5-4).
    test_func_wrapper = wraps(func)(test_func_wrapper)
    return test_func_wrapper
Tests that used DuckPuncher would inherit from it, define a setup and teardown that would, respectively, "punch" (to do the monkey patch) and "hug" (to undo the monkey patch) the metaphorical ducks in question, and with_setup would be applied as a decorator around a method that would execute the test, the idea being that the actual test would automatically have the setup and teardown happen around it. Unfortunately, if something fails during the call to the wrapped method, the teardown never happens, the punched ducks are never hugged, and now the trap is set. Any other tests that make use of whatever duck was punched will get a nasty surprise when they expect to use the real version of whatever functionality was patched out.
Maybe you're lucky and this hurts immediately: if a built-in like open was punched, the test runner (Nose, in my case) will die immediately because it can't read the stack trace generated by the test failure. If you're unlucky, as in our mystery scenario above, it may be 30 or 40 directories away in some vastly unrelated code, and only methodically trying different combinations of tests will locate the real problem. It's even more fun when the tests that are breaking are for code that hasn't changed in six months or more.
A better, smarter DuckPuncher would use finally to make sure that no matter what happens during the wrapped function, the teardown is executed:

class DuckPuncher(object):
    def __init__(...): ...
    def setup(...): ...
    def teardown(...): ...
    def punch(...): ...
    def hug(...): ...

    def with_setup(self, func):
        def test_func_wrapper(*args, **kwargs):
            self.setup()
            try:
                ret = func(*args, **kwargs)
            finally:
                self.teardown()
            return ret
        test_func_wrapper = wraps(func)(test_func_wrapper)
        return test_func_wrapper
The code's too simple? Baloney; code will pile up, something will eventually go wrong, and it'll be hard to diagnose. Integrating with a third-party service? Your code might be golden, but can you prove it? And what product owner is going to prioritize the work to add logging over whatever hot new feature they're really excited to launch? The only way you're adding logging later is when you have to, because something's gone horribly wrong and you have no idea what or where.

Having good logging is like having an army of spies arranged strategically throughout your code, witnesses who can confirm or deny your understandings and assumptions. It's not very exciting code; it doesn't make you feel like a ninja rockstar genius. But it will save your butt, and your future self will thank you for being so considerate and proactive.

Okay, so you're determined to learn from my failures and be awesome at logging. What should you be thinking about? What differentiates logging from logging well?
Log at Boundaries

Logging fits naturally at boundaries. That can be when entering or leaving a method, when branching (if/elif/else) or looping (for, while), when there might be errors (try/except/finally), or before and after calling some external service. The type of boundary will guide your choice of log level; for example, debug is best in branching and looping situations, while info makes more sense when entering or leaving larger blocks. (More on this shortly.)
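As a sketch of what that looks like in practice (the function, the fetch_orders call, and the messages are all illustrative, not from the original text):

import logging

logger = logging.getLogger(__name__)

def sync_orders(customer_id):
    logger.info("syncing orders for customer %s", customer_id)
    try:
        orders = fetch_orders(customer_id)   # hypothetical external call
    except IOError:
        logger.exception("order fetch failed for customer %s", customer_id)
        raise
    for order in orders:
        logger.debug("processing order %s", order)
        ...
    logger.info("synced %s orders for customer %s", len(orders), customer_id)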
Log Mindfully

It's not a good idea to just log indiscriminately; a little bit of mindfulness is important.
CHAPTER 6
Further Resources
Now that you've seen many flavors of mistakes, here are some ideas for further exploration, so that you can make more interesting mistakes in the future.
Philosophy

PEP-8
    The definitive resource for the Python community's standards of style. Not everyone likes it, but I enjoy how it enables a common language and smoother integration into teams of Python programmers.

The Zen of Python
    The philosophy of what makes Python pythonic, distilled into a series of epigrams. Start up a Python shell and type import this. Print out the results, post them above your screen, and program yourself to dream about them.

The Naming of Ducks
    Brandon Rhodes' PyCon talk about naming things well.

The Little Book of Python Anti-Patterns
    A recent compilation of Python anti-patterns and worst practices.
Getters/Setters/Fuxors
    One of the inspirational posts that helped me better understand Python and properties.

Freedom Languages
    An inspirational post about "freedom languages" like Python and "safety languages" like Java, and the mindsets they enable.

Clean Code: A Handbook of Agile Software Craftsmanship, by Robert C. Martin (Prentice-Hall, 2008)
    Uncle Bob Martin's classic text on code smells and how to progressively refactor and improve your code for readability and maintainability. I disagree with the bits about comments and inline documentation, but everything else is spot-on.

Head First Design Patterns, by Eric Freeman and Elizabeth Robson, with Kathy Sierra and Bert Bates (O'Reilly, 2004)
    Yes, the examples are all in Java, but the way it organically derives principles of good object-oriented design fundamentally changed how I thought. There's a lot here for an eager Pythonista.
Tools

Python Editors
    Links to some editors that may make your life easier as a Python developer.

Nose
    Nose is a unit testing framework that helps make it easy to write and run unit tests.

Pytest
    Pytest is a unit testing framework much like Nose, but with some extra features that make it pretty neat.

Mock
    Lightweight mock objects and patching functionality make it easier to isolate and test your code. I give thanks for this daily.

Pylint
    The linter for Python; helps you detect bad style, various coding errors, and opportunities for refactoring. Consider rigging this