A Statistical Approach to Sports Betting
submitted by
Anton Altmann
January 2004
Contents

1 Introduction 15

2.1.7 Validating the model and assessing predictive capability 26
2.1.8 Comparing predictions obtained through a statistically-based procedure
2.5.1 Discrepancy measures 35
2.5.2 Predictive ability summary statistics 36
2.6 Comparing predictions obtained through a statistically-based procedure

4.5 Possible improvements to the model 80
4.5.1 Hierarchical modelling using foul rates 80
4.5.2 Dependence of home and away bookings 81
4.6 Conclusion 82
4.7 Additional comments and information 83
4.7.1 Generating model 2 predictions 83
4.7.2 Kernel regression 85

6.1.1 League structure 139
6.1.2 Game regulations 139
6.2 NBA data 140
6.3 A basic NBA scores model 141
6.4 Possible improvements to the basic model 146
6.4.1 Truncation of winning margins 148
6.4.2 Effect of schedule 150
6.4.3 Short-term form 152
6.4.4 Use of additional covariates 158
6.4.5 Increasing levels of team parameterisation 161
6.4.6 Inclusion of player information 162
6.5 Construction of more advanced model 164
6.5.1 Adjustment for overtime periods 166
6.6 Comparison of basic model and advanced model 169
6.6.1 Summary statistics 170
6.6.2 Betting success 171
6.7 Conclusion 173

7.2 Specification of the parametric form of direct relationships 179
7.3 Prior specifications 180
7.4 Model implementation 181
7.5 A comparison of the MLE and MCMC modelling approaches 184
7.6 Additional comments and information - Markov Chain Monte Carlo methods: a brief summary 185

8 Conclusion 188
List of Tables

Tables for chapter 4: Harsh referees and dirty teams: estimating booking rates in soccer
4.11 Referee parameter estimates at timepoints 256 and 512 75
4.12 Predictive likelihood for different models 75
4.13 Data for two matches in data set with equal mean returns 79
4.14 Frequency of observed joint scores divided by expected frequency given independence assumption 82

Tables for chapter 5: Estimating NFL scores: the threes and sevens distribution
5.6 Mean values for figures in data set, 1997-2001 106
5.7 Touch Down and Field Goal means and variances, 1997-2001 112
5.8 Coefficients and values for Home Rushed Yards model, using various covariates

6.10 Number of observations in each confidence class 156
6.11 Comparison of observed variance for variables, with simulated values, as parameters are added into NBA model, seasons 1997/98 to 2000/01 162
6.13 Coefficients and p-values obtained using previous year's parameters to generate predictions 164
6.15 NBA team ability estimates for home offense, away offense, home defense and away defense
List of Figures

Figures for chapter 4: Harsh referees and dirty teams: estimating booking rates in soccer
4.1 Histograms of yellow and red cards, with Poisson distribution lines overlaid 57
4.2 Cards collected versus cards provoked, season 1998/1999 59
4.3 Moving average of number of yellow cards awarded in each match 60
4.4 Predicted score advantage versus yellows collected 63
4.5 Moving average of observed yellows and estimated climate 64
4.6 Plot of moving average of observed yellows along with predicted climate 65
4.7 Plot of moving average of observed red cards along with smoothed climate and predicted climate 66
4.8 Plots of team dirtiness and provocation estimates over time, for Blackburn
4.12 Predicted climate curve 83
4.13 Kernel regression estimates for different choices of bandwidth 86

Figures for chapter 5: Estimating NFL scores: the threes and sevens distribution
5.4 Histogram of all scores, either side, 1983-2000 98
5.5 Frequency plot of P(score = 21 | µ), for different values of µ 100
5.6 Plot of f(µ) = P(X = x | µ) with kernel-smoothed curve overlaid, for x = (0, 7, 21) 101
5.7 Plot of Σ_x x P(x | µ), for each value of µ, where the probabilities are those of the NFL distribution 102
5.8 Plots of observed histograms of score frequencies, theoretical frequencies obtained assuming normal distribution applies, and also the computed NFL distribution, for three sets of means 103
5.9 Proportions of bets won, where a bet is made provided P(Win) > cut-off, according to both the basic model and the NFL distribution 105
5.10 The conditional structure of a multivariate NFL model, with conditioning proceeding from left to right, then top to bottom 109
5.11 Histograms of data, seasons 1997-2001 110
5.12 Density of 3*FG + 6*TD, assuming FG and TD are Poisson distributed
5.14 Predicted total score versus observed total score for quasi-multivariate model 128
5.15 Proportions of bets won, where bet is made provided P(Win) > cut-off, according to the quasi-multivariate model 129

Figures for chapter 6
6.4 Plot of 4th quarter score differences against score difference at the end of the 3rd quarter for basketball, seasons 1997-2001 149
6.5 Plotting scores, model predictions and bookmakers' spreads against recent runs of form 155
6.6 Plot of average observed score difference plus confidence intervals, model predictions and bookmaker's line against length of winning streak prior to match 157
6.7 Plot of offensive parameters for home games of NBA teams at final time-point of data set against offensive parameters for away games at final time-point 168
6.8 Plot of basic model score predictions against advanced model 170
6.9 Moving average plots of predicted score differences and totals, for basic model
Acknowledgements

Firstly I would like to express my gratitude towards Dr. Mark Dixon for his large supply of enthusiasm and expertise while supervising this thesis. Furthermore I would like to thank Prof. Richard Verrall, Dr. Julie Badcock and Dr. Matthew Dominey for reading earlier drafts of the thesis and providing many helpful suggestions, and Dr. Russell Gerrard for several valuable bits of advice on some of the technical problems. In addition, I would like to thank the programmers of the statistical package R for producing such a useful piece of software and supplying it for free. Luigi Colombo, John Harrison and others made City University's research room a fun place to be, John Harrison in particular providing several useful ideas that have contributed towards the thesis. Finally, I'd like to thank my parents for their constant support.
Abstract

While gambling on sports fixtures is a popular activity, for the majority of gamblers it is not a profitable one. In order to make a consistent profit through gambling, one of the requirements is the ability to assess accurate probabilities for the outcomes of the events upon which one wishes to place bets. Through experience of betting, familiarity with certain sports and a natural aptitude for estimating probabilities, a small number of gamblers are able to do this. This thesis also attempts to achieve this, but through purely scientific means. There are three main areas covered in this thesis. These are the market for red and yellow cards in Premier League soccer, the market for scores in American football (NFL) and the market for scores in US basketball (NBA).

There are several issues that must be considered when attempting to fit a statistical model to any of these betting markets. These are introduced in the early stages of this thesis along with some previously suggested solutions. Among these, for example, is the importance of obtaining estimates of team characteristics that reflect the belief that these characteristics adjust over time. It is also important to devise measures of evaluating the success of any model and to be able to compare the predictive abilities of different models for the same market.

A general method is described which is suitable for modelling the sporting markets that are featured in this thesis. This method is adapted from a previous study on UK soccer results and involves the maximisation of a likelihood function. In order to make predictions that have any chance of competing with the odds supplied by professional bookmakers, this modelling process must be expanded to reflect the idiosyncrasies of each sport.

With the market for red and yellow cards in Premier League soccer matches, in addition to considering the characteristics of the two teams in the match, one must also consider the effect of the referee. It is also discovered that the average booking rate for Premier League soccer matches varies significantly throughout the course of a season.

The unusual scoring system used in the NFL means that a histogram of the final scores for match results does not resemble any standard statistical distribution. There is also a wealth of data available for every NFL match besides the final score. It is worth investigating whether, by exploiting this additional past data, more accurate predictions for future matches can be obtained.

The analysis of basketball considers the busier schedule of games that NBA teams face, compared to NFL or Premier League soccer teams. The result of one match may plausibly be affected by the number of games that the team has had to play in the days immediately before the match. Furthermore, data is available giving the scores of the game at various stages throughout the match. By using this data, one can assess to what extent, and in which situations, the scoring rate varies during a match.

These issues, among many others, are addressed during this thesis. In each case a model is devised and a betting strategy is simulated by comparing model predictions with odds that were supplied by professional bookmakers prior to fixtures. The limitations of each model are discussed and possible extensions of the analysis are suggested throughout.
Chapter 1
Introduction
Among the many applications of probability and statistics, gambling is perhaps one of the more widely known and appreciated by the general public. There are many popular forms of gambling in society today, such as state lotteries, casino games and sports betting, which is the focus of this thesis.

The aim of this study is to incorporate many of the ideas that are considered by professional bookmakers and successful gamblers within a formal statistical framework. By employing well-known and well-understood statistical procedures, the intention is to match or improve upon their predictions. In Section 1.1 the differences, from the point of view of a probability model, between sports betting and various other types of gambling are considered. The strengths and weaknesses of both the statistical approach and the intuitive approach favoured by bookmakers and the majority of gamblers are discussed in Section 1.2. In Section 1.3 a brief explanation of the betting opportunities available today is given. Some details concerning the scope of this study are outlined in Section 1.4, since restrictions are placed on the type of sports analysed. In Section 1.5 the structure of the remainder of the thesis is outlined.
One way in which sports betting is different from many other types of betting, from a statistical point of view, is that the probabilities cannot be fully specified¹. For many casino games, such as Roulette, the probabilities are entirely known by both the gambler and the bookmaker. As a result there is no way the gambler can make a profit on a long-term basis, due to the small bias in the casino's favour, assuming the roulette wheel to be fair². Meanwhile, for some card games, such as Poker, the entire probability distribution for future events conditional on the information currently available to the player is theoretically possible to calculate, but in practice it is impossible for a poker player to process the full set of calculations while the game is taking place. Nevertheless, with experience, the player can approximate the odds of various outcomes.

Comparatively little work describing a statistical approach concerning odds for sporting events has been published. In fact, the vast majority of odds that are available for sporting events are not derived through advanced statistical techniques. In general they are determined through the practical experience of those setting the odds combined with selective use of basic figures such as team/player averages, or in many cases it is the beliefs and behaviour of the market that determines the odds.

¹Superior predictions are not in fact essential in order to win money against the bookmakers, as will be shown in Section 5.6.1.
Most successful gamblers apply a thorough knowledge of the sport on which they bet. Many find the process of acquiring knowledge about a sport more inviting than the process of acquiring the statistical techniques required to produce accurate sports models, as well as the computing experience necessary to implement these models. On the other hand, intuition-based predictions have measurable shortcomings. For example, Forrest and Simmons (2000a) studied the predictions of English and Scottish
soccer results made by professional advisers working for various British newspapers.
It was investigated whether widely available information, such as recent form of the
teams or the difference in league positions between the two soccer teams participating
in a match, were used by the advisers. It was concluded that some of the informa-
tion seemed to correlate strongly with the forecasts made by the advisers. However
excessive weight was attached to some parts of it while other parts were not exploited
by the forecasters even though they were important predictors of soccer results. Furthermore, whereas a statistical approach can yield profit curves or confidence intervals on projected returns, for example, the success of an intuitive approach is not easily measured and attempts to do so are frequently inaccurate, optimistic or both³.
A statistical approach, meanwhile, can be slow to react and thus inflexible to certain important factors. In horse or greyhound racing, for example, the odds change very quickly in the hours leading up to a race to accommodate new information such as paddock gossip, results of previous races that day or weather conditions. Collecting such information, storing it in a spreadsheet, for past fixtures, and attempting to fit a reliable model in order to update predictions in time would be impractical. Hence, when adopting a statistical approach towards modelling a sport, with regard to producing a prediction system that is on average superior to the intuitive approach, one must decide carefully which sports are suitable.
³"In betting on races, there are two elements that are never lacking: hope as hope, and an incomplete recollection of the past." E.V. Lucas, New York Times, 7 October 1951.
1.3 Betting opportunities
While the importance of producing accurate probabilities for sporting events is evident,
it is also necessary to maximise the potential profits of any betting strategy using these
probabilities by selecting the most attractive bets available. There are many different
mediums through which bets can be made and many different types of bets are available
for most fixtures.
Until the mid 1990s in the UK, visiting a high street bookmaker such as Ladbrokes or Coral was the most popular method of placing a bet. More recently, many find it more convenient to place bets with an online bookmaker, of which there are now many. A further development is the betting exchange, through which odds and stakes are offered by gamblers to other gamblers, thus removing the bookmaker's role of specifying odds. For most websites of this type, a small proportion of the profits from winning bets goes to the website administrators.
The types of bets available for most sports fall into two basic categories. The more
traditional format is fixed odds, which is the system used by high-street bookmakers
as well as many online bookmakers. A more recent format that has become popular
within the last decade is spread betting (otherwise known as index or range betting).
For European sports, a fixed odds system generally offers odds on each outcome of an
event. For example, these odds were available from Sportingbet for a Premier League
18
probabilities is 1.11 rather than 1. The surplus of 0.11 is known as the bookmaker's
take or overround. By scaling down all probabilities by implied probabilities of
111'
(0.59,0.26,0.15) are obtained. For every £1 that is placed, the bookmaker makes a
profit of £0.11 assuming equal money is placed on all three outcomes. If money is
not placed equally on all three outcomes, the bookmaker's expected profit increases
as more money is staked on the outcomes with lower odds (so higher probability).
The size of the bookmaker's overround varies across different sports and fixtures. In
general, the more popular fixtures are frequently bet on by less discerning gamblers
who do not seek the most favourable odds, which allows the bookmaker to offer less
competitive odds yet still attract custom.
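To make the arithmetic concrete, the following is a minimal sketch in Python; the decimal odds are hypothetical, chosen only so that they reproduce an overround of 1.11 and normalised probabilities of roughly (0.59, 0.26, 0.15).

def implied_probabilities(decimal_odds):
    """Convert decimal odds into raw implied probabilities,
    the bookmaker's overround, and normalised probabilities."""
    raw = [1.0 / o for o in decimal_odds]        # implied probability of each outcome
    overround = sum(raw)                         # exceeds 1 by the bookmaker's take
    normalised = [p / overround for p in raw]    # scaled to sum to 1
    return raw, overround, normalised

# Hypothetical home/draw/away odds for a Premier League match
raw, overround, probs = implied_probabilities([1.53, 3.45, 6.00])
print(round(overround, 2))                       # 1.11 for these example odds
print([round(p, 2) for p in probs])              # [0.59, 0.26, 0.15]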
Fixed odds for many American sports differ slightly from those of European sports in that handicap bets are more common⁴. With these, one of the sides starts the game with a points handicap (known as a line) specified by the bookmaker, and the gambler may place bets on which side will win the match after the scores are adjusted to include the handicap. The payouts for either bet are equal (this is known as even odds). For example, this line from Sportingbet.com's website allows two bets:

Tennessee Titans: -7.0 (-1.10)
Pittsburgh Steelers: +7.0 (-1.10)

The gambler should back Tennessee Titans if they believe it is likely that the Titans will beat Pittsburgh Steelers by more than seven points in the match. Conversely, if they believe that Pittsburgh Steelers are likely either to win, or to lose by six points or fewer, they should back them. Should either bet be successful, on a £1 stake, the gambler's profit would be £1/1.10, roughly £0.91.
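As an illustration of how such a bet settles, the sketch below (Python; the function name, the treatment of an exact push, and the stake convention are my own assumptions rather than anything specified in the text) computes the profit at the -1.10 price quoted above.

def settle_handicap(margin, line, stake, price=1.10):
    """Profit on a handicap bet.

    margin: backed side's score minus the opponent's score
    line:   handicap applied to the backed side (e.g. -7.0 for the favourite)
    price:  units staked per unit won (the -1.10 quote above)
    """
    adjusted = margin + line
    if adjusted > 0:
        return stake / price    # win: profit of stake/1.10
    elif adjusted == 0:
        return 0.0              # push: stake returned (assumed convention)
    else:
        return -stake           # lose the full stake

# Titans -7.0: a 10-point win covers the line; a 4-point win does not
print(settle_handicap(margin=10, line=-7.0, stake=1.0))  # ~+0.91
print(settle_handicap(margin=4, line=-7.0, stake=1.0))   # -1.0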
1.3.2 Spread betting
Spread betting is a stock-market style of betting where sporting outcomes are traded as though they were commodities. For a soccer match, for example, a spread betting firm might quote a spread for the total number of bookings points for the match as 30 - 34. The spread reflects the firm's expectation of the number of bookings points (10 points for every yellow card and 25 for every red card) given during the match (if a player receives two yellow cards in the match, resulting in a red card, then overall 35 points are included in the total).

The number of yellow and red cards that will be scored is being treated as a commodity, and gamblers can buy this commodity at 34 or sell at 30. A typical bet on this match might be to SELL at 30 for £1 per point. If the final total is

• 4 yellows and 0 reds = 4*10 + 0 = 40 points, then the return is (30 - 40) x 1 = -£10;

• 0 yellows and 1 red = 0*10 + 1*25 = 25 points, then the return is (30 - 25) x 1 = +£5.
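This settlement arithmetic is easily expressed in code; a minimal sketch (Python; the helper names are illustrative) reproducing both outcomes of the £1-per-point sell at 30:

def bookings_points(yellows, reds):
    """Total bookings points: 10 per yellow card, 25 per red card."""
    return 10 * yellows + 25 * reds

def spread_return(position, level, final_total, stake_per_point):
    """Profit on a spread bet.

    position: +1 for a buy, -1 for a sell
    level:    the price at which the bet was struck
    """
    return position * (final_total - level) * stake_per_point

# SELL bookings at 30 for £1 per point
print(spread_return(-1, 30, bookings_points(4, 0), 1.0))  # 40 points -> -£10
print(spread_return(-1, 30, bookings_points(0, 1), 1.0))  # 25 points -> +£5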
A buy at 34 or a sell at 30 therefore profits if the final total finishes above or below the quoted level, respectively. In addition, the spreads often fluctuate up and down in the lead-up to kick-off to reflect the betting behaviour of gamblers, for the same reasons and in the same way as fixed odds. One attraction of spread betting is that the bet, and the gambler's interest in the fixture, usually remains alive until the end of the match; when a fixed odds bet is made, this is not always the case. For example, if one bets that the score of a soccer match will be 0-0, the bet (and possibly interest in the fixture) is lost as soon as a goal is scored. On the other hand, some gamblers are deterred from spread betting since the maximum loss is uncertain prior to a fixture. For example, if in the example above the number of bookings is sold, the maximum return is +30 times the selected stake whereas the maximum loss could be as high as around 130 times the stake. Some spread betting firms offer stop loss accounts, where the maximum gain or loss on each market is set to some agreed limit.

The spread betting firm, meanwhile, aims to attract equal amounts of money on either side of the offered spreads. Assuming, for simplicity, that £K has been both bought and sold on a constant 36-40 spread for a fixture, this will result in a risk-free profit of £4K for the bookmaker. Analogously, in the case of fixed odds betting such
as an NFL handicap bet as described above, the bookmakers will ideally have equal
amounts of money staked on both outcomes of the match. Thus, contrary to popular
belief, the points handicap is not specified with the aim of estimating the most likely
difference in score in the match but with the aim of attracting an equal volume of
bets on each outcome. As a result it seemsfeasible that the probabilities inferred from
the odds available from bookmakers are not always unbiased. Besides the obvious
difficulties in evaluating unbiased probabilities for each outcome of any sporting event, studies of two sports have examined the effect of a side playing at their home ground. It is concluded for both
sports that win rates of sides playing at home do not differ significantly from the rates
inferred by the odds. Vergin (2001) simulated a set of betting strategies designed to
investigate the theory that handicaps reflected a tendency of the public to overestimate
the ability of a team with recent good results and to underestimate the ability of a
team with recent bad results. The theory appeared to be true in the case of a side with
recent good results. Furthermore, Simmons, Forrest and Curran (2003) investigate the
efficiency of the handicap and spread betting markets of Rugby League fixtures. It is
concluded that the handicap market does not fully incorporate the home advantage
but does correctly evaluate the relative strengths of the two competing teams (thus
has no favourite-underdog bias). The spread betting market is unbiased in these two respects, however. It is speculated that the handicap betting market may comprise
gamblers who select bets largely on an emotional basis whereas due to the higher risks
and higher returns associated with the spread betting market, any spread bet placed
must seem financially sound to the gambler. In a study of NFL betting lines, Vergin
(2001) supports this theory, reporting line-makers' accounts of the betting public's behaviour.
Note that bias in the probabilities inferred from a set of odds does not imply market
inefficiency, which, as stated above, only arises if an expected gain is available by
placing a bet on at least one outcome. There may not be such a bet if the bookmaker's overround prevents the gambler from exploiting the inaccuracy in the probabilities.

When deciding which sports are suitable for this type of analysis, several questions arise:

• Are there significant betting opportunities for the sport? Ideally it would be
possible to have a successful model for predictions of the results for a sport and
to use these in order to develop a profit-making strategy by placing bets with
bookmakers. The scope for a model to be profit-making is increased with a
greater number of events to bet on, and with a larger selection of different bets
available.

• Are there adequate data resources available? There are detailed match reports available on the internet for all fixtures since 1995 for all of the four major American sports (NFL, NBA, MLB and the National Hockey League). Some data has also been available for UK soccer matches since 1996. More recently, detailed web pages
for cricket and rugby matches have become available.
There are of course other criteria besides those mentioned above, not least the ease with which the sport can be modelled statistically. Certain sports, such as cricket and golf, do not have scoring systems which can be easily approximated by the well-known distributions, while another problem is that the covariates required in establishing a model for some markets, such as the number of corners in a soccer match, are not always clear.
Although quantitative analysis may be of interest for many sports, in the interests of focus this study restricts attention to sports where, among other criteria:

• the outcome is described by an accumulated total, rather than a time or an event. This excludes sports such as boxing or athletics.
Among the sports which do fulfil the necessary criteria are UK soccer, NFL, and NBA. In addition, this thesis only considers the statistical methods required in order
to predict final results of matches and does not attempt to model various other aspects
of sports fixtures, such as the time until the first goal or the winner of a league. Also,
since the final result is the variable of ultimate interest, within-game modelling is not
generally considered. For a detailed analysis of within-game models for soccer data,
the reader is referred to Hirotsu and Wright (2002).
So far these considerations have been discussed at a very general level. The remainder of this thesis is structured as follows. In Chapter 2 the main issues that need to be addressed when modelling sporting events of the type listed in Section 1.4 are specified. Possible solutions are discussed by means of a literature review. Chapter 3 gives a detailed explanation of one of the stages of the modelling process, namely the estimation of the parameters of a specified statistical model. Three individual sports markets are treated in depth in Chapters 4, 5 and 6 using the procedure outlined in Chapter 3. These are respectively the markets for red and yellow cards for Premier League soccer matches, NFL scores and NBA scores. An alternative, Markov Chain Monte Carlo approach to estimating an NFL scores model is presented in Chapter 7, and conclusions are drawn in Chapter 8.
Chapter 2
As stated in Chapter 1, the motivation for this thesis is to produce statistical models for outcomes of certain sports events. Ideally these will be able to generate probabilities which are competitive with the odds provided by professional bookmakers. In order to approach this task, some of the techniques already applied to modelling sports will now be discussed via a literature review, with the aim of outlining some of the problems involved and some previously proposed solutions.

The general aim is to produce the most accurate possible prediction for the final result of a fixture between two teams. This can be expressed as result (X, Y) between teams (t1, t2). Note that (X, Y) does not have to denote a final score, and could express the total number of red cards or the number of shots on goal, for example. The following issues arise.
The scale, variance and range of X and Y vary from sport to sport. Some thought must be given as to which of the common statistical distributions are most suitable, or if a non-standard distribution needs to be constructed.
2.1.2 Accommodating dependency between the home and away scores

It is plausible for many sports that information about the value of X may affect one's belief about the density of Y. Hence the joint distribution of (X, Y) may need to be modelled explicitly. The tractability of such a joint distribution depends on both the response distribution chosen and the form of dependence. The Normal distribution has a well known bivariate form whereas the Poisson distribution, for example, has a considerably more complicated, and thus less flexible, bivariate form. In addition, the relationship between X and Y may be of such an intricate form that no existing bivariate distribution describes it adequately.

A further issue is that a team's characteristics may be not adequately expressed by a single parameter. For example, if the expected value of X - Y is large and positive, this could arise because t1 has a tendency to play in such a way that high values of X are expected. Alternatively their style of play may generally prevent high values of Y arising. These situations are analogous to a team respectively being predominantly attacking or defensive if X and Y are soccer scores. In addition to studying the mean values of the team parameters, their variance may also be of interest, since some teams may be less consistent than others.
For several sports, totals for in-game statistics such as attempted shots, fouls committed and time spent in possession of the ball are available. In addition to these, factors such as the effect of playing at home, the length of time since the previous fixture, the key injured players and many other relevant factors could be included to improve the accuracy of predictions.
It seems reasonable that the values of the parameters of t1 and t2 should vary over time. However, the way in which past information, such as results of previous fixtures, should be weighted must be decided. For example, how much importance should be attached to a result from a fixture that occurred one year ago compared to the result of a fixture that took place the previous week?

While exploration of the data based on one's knowledge of the sport can lead to the identification of useful covariates, selecting the form in which they are employed in the model can be a considerable task. An overly ambitious model may even make the process mathematically intractable. In some cases compromises with respect to the ideal specification must be made.
The task of producing statistical models for sports results differs from the task of producing statistical models for some other applications in the sense that it is the accuracy of predictions of future results, rather than the interpretation of existing data, that is of interest (although these processes are of course linked). In particular, the danger of over-fitting must be considered. For example, consider the situation that in the first five matches of the data set where a team played on the birthday of the wife of the team's manager, the team won. If the objective is to interpret existing data, a common step would be to obtain the optimal fitted values for the data. To achieve this, an indicator vector signalling matches when teams played on the manager's wife's birthday should be included. Assuming that these results are entirely coincidental, as seems likely, the predictive power of the model will be greatly harmed by doing so. Hence standard measures for determining the accuracy of a fitted model, such as R² or Cp, are only considered provided the conclusions drawn from their use are also supported by measures of predictive ability, some of which will be devised.
The stated ambition in Chapter 1 was to produce predictions that are superior to those inferred from the odds offered by professional bookmakers, so methods of making this comparison are also required.
2.1.9 Considering betting strategies based on model predictions

The development of a betting strategy once predictions have been obtained from a model requires separate consideration. For example, finding a suitable distribution to represent NBA scores does not necessarily ease the task of finding the optimal bets to place; possible strategies are discussed later in this chapter.

This chapter contains mainly technical material which is included in order to suggest some techniques that could be considered for the treatment of tasks that arise later in this thesis, and to introduce the reader to some of the thinking behind existing treatments. Unless the reader has a particular interest in the techniques applied specifically in this thesis, the remainder of this chapter does not need to be read immediately and can be referred to, where directed by the text, if required.
In order to link the ideas outlined very generally above with the formal analysis summarised in the following sections, an example model is now set out. For match k, let Xk and Yk denote the scores of the home team and away team, denoted by i(k) and j(k) respectively. The term `score' is used here for convenience, however, as mentioned at the start of Section 2.1, Xk and Yk could also represent match totals of figures other than the final score. Team abilities are assumed to vary over time, hence the two ability parameters of team i(k) at time t are denoted by α_{i(k),t} and β_{i(k),t}. If Xk and Yk represent the final match scores of a fixture then the α and β terms represent the attacking and defensive abilities of the team respectively.
Two further parameters are included in this example model. The effect of playing at home is described by a single parameter δ and it is assumed this effect is the same for all teams. Finally, since it is desired that the α and β terms have a mean of zero to aid their interpretation, a term to represent the global mean is included, denoted by γ. Note that the inclusion of δ means that γ is effectively the mean for all away fixtures.

With these terms defined it is now possible to specify a model. The expected scoring rates of both sides for match k, which takes place at time t, are as follows:

E[Xk] = exp(γ + α_{i(k),t} + β_{j(k),t} + δ)
E[Yk] = exp(γ + α_{j(k),t} + β_{i(k),t})

In almost all sporting markets (including those studied in this thesis) match totals are always greater than or equal to zero. This implies that their means must be strictly positive, hence the use of the exponential function.
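To make the link function concrete, here is a minimal sketch of these two expected scoring rates (Python; all parameter values are invented for illustration).

import math

def expected_rates(gamma, delta, alpha, beta, home, away):
    """Expected scoring rates for one match under the example model.

    alpha, beta: dicts mapping team -> attacking / defensive ability
    home, away:  team identifiers
    """
    lam = math.exp(gamma + alpha[home] + beta[away] + delta)  # home side's rate
    mu = math.exp(gamma + alpha[away] + beta[home])           # away side's rate
    return lam, mu

# Illustrative values only: a strong home attack against a weak away defence
alpha = {"A": 0.20, "B": -0.10}
beta = {"A": -0.05, "B": 0.15}
print(expected_rates(gamma=0.0, delta=0.35, alpha=alpha, beta=beta,
                     home="A", away="B"))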
Not all previous academic sports studies use the match scores as the outcome variable. For example, Forrest and Simmons (2000b), when studying English and Scottish League soccer results, represent outcomes using a vector that can take on three values, in order to represent the three outcomes of home win, draw and away win.
2.3 Allowing parameters to adjust over time

In most applications to sports the α and β parameters need to adjust over time. Generally teams' abilities change for many different reasons and the parameters need to reflect teams drifting in to and out of form, losing players or changing coach, for example. Large sections of the previous relevant literature are devoted towards modelling this evolution. Dixon and Coles (1997), for example, maximise at time t the weighted quantity

L_t = Σ_{k=1}^{M_t} log(L(X_k, Y_k | θ_t)) exp(-c(t - t_k))    (2.3.1)

where M_t is the number of matches played prior to time t, and θ_t is the set of parameter values at time t. In this framework the same values of α_i and β_i are employed for every match in which team i plays. However, the importance of matches towards obtaining optimal estimates for α_i and β_i decreases the longer ago the match is, via the exponential term. Since the quantity in Equation 2.3.1 is maximised at many time-points throughout the data set, the estimates of the α and β terms evolve through time even though for each individual maximisation the same team parameter value is used for each match in which the team plays. The coefficient c is selected in order to maximise the predictive capability of the model. Due to the non-standard nature of the predictive likelihood, this can only be achieved by inspection, through testing a range of values of c.
An alternative is to assume that each team's ability parameters follow a random walk, as implemented by Fahrmeir and Tutz (1994) in the development of a paired comparisons system, so that

α_{i,t+1} = α_{i,t} + u_{i,t+1}    (2.3.2)

where

u_{i,t+1} ~ N(0, σ_i²)
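A brief simulation of this random walk (Python with NumPy; the starting ability and the volatility are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_ability(alpha0, sigma, n_steps):
    """Simulate a team ability following Equation 2.3.2:
    alpha_{t+1} = alpha_t + u_{t+1}, with u ~ N(0, sigma^2)."""
    shocks = rng.normal(loc=0.0, scale=sigma, size=n_steps)
    return alpha0 + np.cumsum(shocks)

# One season of 38 time-points for a team starting at ability 0.1
path = simulate_ability(alpha0=0.1, sigma=0.02, n_steps=38)
print(path[:5])  # the ability drifts gradually from its starting value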
Glickman and Stern (1998) modified this concept slightly for a model of NFL scores, so that

α_{i,t+1} = κ (α_{i,t} - ᾱ_t) + u_{i,t+1}    (2.3.3)

where ᾱ_t denotes the mean team ability at time t and u_{i,t} is as in Equation 2.3.2 (but in fact was not set to be team specific), so that values of κ above or below 1 reflect a trend where the overall disparity between the ability of teams increases, or decreases, over time. Glickman and Stern obtained a posterior value of 0.99 for κ, suggesting that no strong trend of either kind was present. One consequence of the Glickman and Stern formulation is that if team i does not play at time t+1, but other teams do, then

E[α_{i,t+1}] = α_{i,t} - (1/n) Σ_{j=1}^{n} α_{j,t}

(where the κ term included in Equation 2.3.3 is assumed to be 1). Hence team i's parameter adjusts even though they didn't actually play, since the model has not been designed with the property that

(1/n) Σ_{j=1}^{n} α_{j,t} = 0

In certain situations, there is some sense behind this phenomenon. For example, suppose Liverpool beat Southampton 3-0 on day t and on day t+1, Manchester United beat Southampton 7-0. Liverpool's 3-0 result looks a little less impressive after time t+1.
A more general autoregressive structure allows each team's attacking and defensive parameters to evolve jointly:

(α_{i,t+1} - α_{i,0}, β_{i,t+1} - β_{i,0})' = P (α_{i,t} - α_{i,0}, β_{i,t} - β_{i,0})' + (u_{i,t+1}, v_{i,t+1})'

where

P = ( ρ_αα  ρ_αβ ; ρ_βα  ρ_ββ )

and

(u_{i,t+1}, v_{i,t+1})' ~ N₂(0, Σ)

independently.
This structure allows dependence between a team's attacking and defensive parameters, firstly from the constant (with respect both to time and teams) autoregressive component, via ρ_αβ and ρ_βα, and also from the time-specific, team-specific variations u_{i,t}, v_{i,t}.

A further issue is the time since the previous estimation. Note that the treatments above have indexed time, hence either assume that all points of estimation are equally far apart, or that a team's ability varies equally between each time-point when a game is played regardless of the time differences between these time-points. However, Rue and Salvesen (1997) use Brownian motion to model the evolution of parameters, so the attacking parameter evolves as
α_{i,t+s} = α_{i,t} + B((t+s)/τ) - B(t/τ)    (2.3.4)

where B(t) is standard Brownian motion starting at level 0 and τ is the non-team-specific inverse loss of memory rate for the α parameter. The defensive parameters are similarly defined.
Harville's (1980) treatment of NFL scores models the team abilities as a random effect, within a mixed linear model framework, where team abilities are assumed to vary from season-to-season, but not from game-to-game within a season. Hence the score difference S_k for match k between sides h(k) and a(k) during season m can be represented as

S_k = T_{h(k),m} - T_{a(k),m} + H + R_k

where T_{h(k),m} and T_{a(k),m} represent the abilities of team h(k) and a(k) relative to the average ability during season m, H is the home effect and R_k is match k's random residual effect.
Finally, it is necessary to make suitable adjustments for season breaks, particularly when one considers Equation 2.3.4. The English League soccer season typically breaks for around 3 months over summer, while the NBA season has a six month break, and the NFL season breaks for 8 months. It seems unsatisfactory to treat these breaks as any other gap between fixtures, and one might expect team abilities to vary at a different rate during season breaks compared to the gaps between fixtures during the regular season. Glickman and Stern (1998) acknowledge this effect when specifying an NFL model and incorporate two further parameters into their model so that, if time t+1 is the time of the first match of a new season, the evolution takes the form of Equation 2.3.3 with κ replaced by κ_s and evolution standard error σ_s, where κ_s is a season-to-season shrinkage/expansion regression parameter and σ_s is the between-season evolution standard error. The posterior value for κ_s obtained was 0.82, with 95% posterior interval (0.52, 1.26), based on six seasons' worth of data. The fact that κ_s < 1 is plausible since the post-season drafting system is designed so that the most promising American football players from US colleges are allocated to the worst performing teams from the previous season, in an attempt to prevent the hierarchy becoming too ingrained. In English League soccer, with no such system in place, it is less clear that team abilities should shrink towards the average between seasons.
2.4 Parameter estimation

Maximisation of the Dixon and Coles pseudo-likelihood produces a full set of parameter values in a matter of seconds and has to be repeated for each time-point, or for however many estimations are required for the desired level of accuracy. However, as outlined in Section 2.3 the Dixon and Coles formulation does not feature `true' dynamic team abilities since in every match in which a team participates, the same parameters are employed for that team. Furthermore, while it is possible to obtain both MLEs and their standard errors (by taking the diagonal elements of the inverse of the observed information matrix of the likelihood), the full posterior distribution of the parameters is not available. This makes it difficult to verify the validity of the parameter distributional assumptions, for example.
On the other hand, now consider the likelihood functions that apply to a true dynamic model, in which the parameters θ_t evolve between time-points. The likelihood of the data G_1, ..., G_t conditional on an initial value θ_0 is

L_t = ∫...∫ P(G_1, ..., G_t, θ_1, ..., θ_t | θ_0) dθ_1 ... dθ_t    (2.4.1)

For a model of NBA scores, using Brownian motion for the drift of team ability as demonstrated in Equation 2.3.4, the parameters required include attack and defensive parameters for each team at each time-point, plus constant parameters for the global mean, home effect, memory loss, and between-season expansion/shrinkage. This gives an integral of very high dimension that cannot be evaluated directly. Such models can be handled with the Markov Chain Monte Carlo technique, although doing so requires considerable thought in order to divide the complete set of parameters into groups with tractable conditional posterior distributions. Also, some inspection is required in order to find suitable prior specifications; this approach is pursued in Chapter 7 of this thesis.
One other technique that has been considered in order to obtain time-dependent parameter estimates is the Kalman filter. Here the state vector x_k is assumed to evolve as

x_k = A x_{k-1} + w_{k-1}

where A is the state transition matrix and w_{k-1} is the process error. The Kalman filter approach produces predictions for x via a set of prediction equations, which predict values for x and the error covariance matrix, and a set of measurement update equations, which act as feedback to the prediction process. The update equations are designed in order to improve future predictions. The extended Kalman filter relaxes the assumption that the process must be linear.
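For concreteness, a minimal scalar version of the predict/update cycle just described (Python; the transition, measurement model and noise variances are illustrative assumptions, not taken from any model in this thesis):

def kalman_step(x_est, p_est, z, a=1.0, q=0.01, h=1.0, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its variance
    z:            new observation
    a, q:         state transition and process-noise variance
    h, r:         measurement model and measurement-noise variance
    """
    # Prediction equations
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Measurement update equations (the feedback step)
    k_gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k_gain * (z - h * x_pred)
    p_new = (1.0 - k_gain * h) * p_pred
    return x_new, p_new

# Track a slowly drifting 'ability' from noisy match observations
x, p = 0.0, 1.0
for z in [0.4, 0.1, 0.35, 0.5]:
    x, p = kalman_step(x, p, z)
print(round(x, 3))  # filtered estimate after four observations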
Fahrmeir and Tutz (1994) apply an extended Kalman filter to the set of parameters, which include response thresholds (since their response vector is categorical), team abilities and, optionally, parameters for any other covariates. The estimation procedure maximises a combination of filters and smoothers. The filters are the loglikelihood of the data given parameter estimates at the latest time point and the smoothers are the loglikelihood of the transitions of the parameter values from one time-point to another. At each time-point it is assumed that the parameters evolve according to a specified transition model.

It is not essential, however, for team abilities to be estimated by the model. Forrest and Simmons (2000a), for example, approximate team abilities with a range of measures such as recent form, league positions and total scored/conceded goals in the current season. The match result is then regressed against this set of measures and the coefficients of this regression are the parameters to be estimated. As mentioned in Section 2.2, Forrest and Simmons classify a match result as either a home win, a draw or an away win. An ordered logit model is used to obtain parameter estimates and this could in principle be extended to a model which estimates team parameters. While many popular statistical packages supply routines for ordered logit analysis, ideally the estimation would be adapted to allow team abilities to vary over time. This could be an onerous task given the large computational requirements.
2.5 Validating the model and assessing predictive capability
One suitable technique to see how closely the specified model mimics the observed data
is to compare the predictive distribution to the data. This can be done by simulating
a suitable number of samples from the predictive distribution and comparing these
samples to the observed data. There are usually various aspects of the data that
can be checked and it is therefore useful to devise one or more test quantities. If a
model is being developed within a classical framework, this is a scalar summary of the
data. If the problem is being considered within a Bayesian framework, this is a scalar
summary of both the data and the parameters. This test quantity T(y) is known as a test statistic, in the classical case, and as a discrepancy measure T(y, θ) in the Bayesian case (Gelman et al. 1995). Using these test quantities, tail area probabilities to quantify the scale of disagreement between model and data can be approximated.

In the classical case, suppose there are n observed values y = (y_1, ..., y_n). K copies of replicated data, y*_1, ..., y*_K, can be generated given the model and the estimated value of θ. Hence y*_i is a vector of simulated values (y*_{i,1}, ..., y*_{i,n}). Then set T(y*_i) = min(y*_{i,1}, ..., y*_{i,n}), for example. Then the tail area probability could be approximated as the proportion of the K replications for which T(y*_i) exceeds T(y). In the Bayesian case the posterior distributions of the parameters are considered, rather than their point estimates. As such, sample values θ*_1, ..., θ*_K are generated from the posterior distribution of θ, then for each generated value, a single y*_i | θ*_i is generated. Tail areas can be computed as above, to approximate P(T(y*, θ) > T(y, θ) | y).
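A compact sketch of the classical version of this procedure (Python with NumPy; the Poisson model for scores and the choice of the minimum as test statistic are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(seed=7)

def tail_area(y_obs, theta_hat, n_reps=1000):
    """Approximate P(T(y*) >= T(y)) where T is the minimum score and
    the replicates are drawn from the fitted model."""
    t_obs = y_obs.min()
    t_reps = np.array([
        rng.poisson(lam=theta_hat, size=y_obs.size).min()
        for _ in range(n_reps)
    ])
    return np.mean(t_reps >= t_obs)

scores = np.array([2, 0, 1, 3, 1, 0, 2, 4])   # observed data (illustrative)
print(tail_area(scores, theta_hat=scores.mean()))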
Glickman and Stern (1998) apply this technique to their NFL model, which is defined from a Bayesian perspective. One assumption which they test is that the variance of the score difference, conditional on its mean, is equal for all games. Two discrepancy measures, which are sensitive to this assumption, that they use to test this are

• the difference between the largest annual average squared score-prediction residual, and the smallest (Glickman and Stern have six years of data available);

• the difference between the largest and smallest average squared score-prediction residual across teams.

In fact, Glickman and Stern do not obtain any significant evidence to suggest that the variance of a match score is a function of its mean. They also conclude, using discrepancy measure techniques, that it is necessary to include team-specific home advantage parameters.
2.5.2 Predictive ability summary statistics

Predictive ability summary statistics can be used either to test the effect of model modifications or to compare distinct models. Knorr-Held (2000) considers several statistics for evaluating the predictive ability of a sports model. Noting that the only outcomes in Knorr-Held's model are win/draw/lose, then given a total of N matches and R possible outcomes (R = 3 in this case), let p_k^r denote the estimated probability that the result of match k will be r, where r ∈ (1, ..., R) and k ∈ (1, ..., N). Note that p_k^r is calculated only using data available prior to match k. Also, let the observed result be denoted by s for each match, hence p_k^s is the probability of each observed result, as assessed before the match. Four such measures are:

1. the number of correctly predicted results, where the predicted result is the outcome assigned the highest estimated probability;

2. (1/N) Σ_{k=1}^{N} log(p_k^s);

3. -(1/N) Σ_{k=1}^{N} ((1 - p_k^s)² + Σ_{r≠s} (p_k^r)²);

4. (1/N) Σ_{k=1}^{N} p_k^s.
An extension of measure 2 is the predictive likelihood, mentioned in Section 2.3. For a model which predicts a full set of scores rather than just win/draw/lose, it can be defined as follows. If t denotes the time at which match k takes place and θ_t represents the parameter estimates based on all data available up to, but not including, time t, then the predictive likelihood is defined as

Σ_{k=m}^{N} log(P(X_k, Y_k | θ_{t(k)}))

where X_k, Y_k are the observed home and away scores and m is the first match for which predictions are made¹. It must be used with some caution, however. Firstly, it is sensitive to outliers, although this is less of a problem if it is used only to compare nested models. Secondly, it is not robust to mis-specification of the response distribution.
Measure 1 is only suitable where the number of possible outcomes is small, although a similar measure to compare two models could be to count the number of occasions one set of predictions is closer to the observed score, and to verify if the proportion is significantly different from 0.5. This can be done via a straightforward binomial signs test. A consequence of using such a measure is that the magnitude of error is not considered. As a result, this measure may not pick up model deficiencies particularly well.

Measure 3 is a quadratic loss, while measure 4 is similar to measure 2. However, measure 4 has the disadvantage that if a result occurs to which the model had assigned an extremely low, or even zero, probability, the measure isn't greatly penalized.
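These four measures are straightforward to compute; a sketch (Python with NumPy; the array layout and variable names are my own):

import numpy as np

def summary_statistics(probs, observed):
    """Compute the four predictive ability measures.

    probs:    (N, R) array, probs[k, r] = pre-match probability of outcome r
    observed: length-N array of observed outcome indices
    """
    n = len(observed)
    p_obs = probs[np.arange(n), observed]               # probability of each observed result
    correct = np.sum(probs.argmax(axis=1) == observed)  # measure 1
    log_score = np.mean(np.log(p_obs))                  # measure 2
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(n), observed] = 1.0
    quad_loss = -np.mean(np.sum((probs - one_hot) ** 2, axis=1))  # measure 3
    mean_prob = np.mean(p_obs)                          # measure 4
    return correct, log_score, quad_loss, mean_prob

probs = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.6, 0.25, 0.15]])
print(summary_statistics(probs, np.array([0, 2, 1])))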
2.6 Comparing predictions obtained through a statistically-based procedure with bookmakers' odds

It is also desirable to compare the accuracy of model predictions with the accuracy of the probabilities inferred from the bookmaker's odds. In order to do this, a suitable definition of `accuracy' is needed. Stefani (1980) uses the absolute average difference between the predicted score and the observed score for both College Football and NFL games. Harville (1980) also uses this statistic in order to compare the accuracy of predictions from an NFL scores model with a bookmaker's line. Another measure considered is the squared difference between predicted and observed score, which penalises larger discrepancies more severely. Another measure used by Harville is the proportion of occasions the prediction system correctly
¹Many models require a `burn-in' period so that predictions are only evaluated once sufficient data has been observed to make reasonable estimates of parameter values.
predicts the winner of a fixture. Harville also compared his model's predictions with the bookmaker's line to see which was more accurate each year. The general conclusion to Harville's comparisons was that the
bookmaker's predictions were more accurate at the start and end of an NFL season,
while the model performed better during the middle of the season. It is suggested
that at the start of the season the bookmaker takes account of factors such as roster
changes, injuries and pre-season exhibition game results, while at the end of the season
the importance of late-season matches differs from team to team (this is discussed in
more detail in Chapter 5). The model implemented by Harville is based solely on match
scores (excluding exhibition games) hence does not accommodate such information.
Glickman and Stern (1998) comment that their NFL predictions' Mean Square
Error was smaller than that of the bookmakers, and also claim that for 65 out of the
110 validation matches the model predictions would have produced winning bets. They
also comment that `for this small sample, the model fit outperforms the point spread,
though the difference is not large enough to generalise'.
Harville (1980) suggests that bets could be made if the following ratio exceeds 0.5 by a sufficient amount:

P(S_k > B_k) / (P(S_k > B_k) + P(S_k < B_k))

where S_k represents the score predicted by the model and B_k represents the bookmaker's line. The paper states that `the proposed betting scheme would generally have shown a profit during the 1971-77 period', however it is not stated whether the bookmaker's commission was taken into account.
Dixon and Coles (1997) use a betting strategy similar to Harville's, although they also adjust for the bookmaker's overround. Hence, repeating the notation used to describe the four measures suggested by Knorr-Held in Section 2.5.2, if the model estimates probability p_k^r for outcome r of match k, while the bookmaker's odds imply probability b_k^r, then the quantity of interest is the ratio

p_k^r / b_k^r    (2.7.1)

(note that Σ_{r=1}^{R} b_k^r > 1 for any bookmaker, which reflects their overround). So bets should be placed provided the value in Equation 2.7.1 exceeds some cut-off value ξ.
Using predictions generated by a statistical model, Dixon and Coles simulate such a strategy for different values of ξ during the 1995/96 English League soccer season and discover that overall profit can be made for ξ > 0.15. There is considerable variance in this profit and the 90% bootstrap confidence intervals of the realised profit when ξ > 0.15 generally include 0, and indeed also the loss that one would realise on average were bets placed randomly. Nevertheless there is some indication that their predictions, from a relatively simple model, can form the basis of a profit-making betting strategy.
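A sketch of a value-betting rule in this spirit (Python; the reading of the cut-off as an excess over a ratio of 1, and all of the numbers, are my own assumptions rather than the Dixon and Coles settings):

def value_bets(model_probs, implied_probs, cutoff=0.15):
    """Select outcomes where the model probability exceeds the bookmaker's
    implied probability by more than the cut-off, in ratio terms."""
    bets = []
    for outcome, (p, b) in enumerate(zip(model_probs, implied_probs)):
        ratio = p / b
        if ratio > 1.0 + cutoff:   # assumed reading of 'exceeds the cut-off'
            bets.append((outcome, ratio))
    return bets

# Invented example: the model sees value on the away win only
model = [0.50, 0.28, 0.22]
implied = [0.59, 0.26, 0.15]
print(value_bets(model, implied))  # [(2, 1.466...)]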
Rue and Salvesen (1997) suggest that a betting strategy could take account not just of the expected profit from making a bet but also the variance of that profit. Hence bets should be placed with regard both to maximising profit and to restricting the probability of ruin. Defining P as the total profit, µ*_r and σ*_r as the expected value and standard deviation of the profit on bet r per unit of original capital staked, β_r as the stake on bet r, and B as the set of matches which can be bet on, then the optimal values of β_r can be found by maximising

E(P) - Var(P) = Σ_{r∈B} (µ*_r β_r - β_r² (σ*_r)²)
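Maximising this objective decomposes bet by bet: differentiating µ*_r β_r - β_r²(σ*_r)² with respect to β_r and setting the result to zero gives β_r = µ*_r / (2(σ*_r)²). A two-line check (Python; the µ and σ values are invented):

def optimal_stake(mu, sigma):
    """Stake maximising mu*beta - beta^2*sigma^2 for a single bet.

    d/d(beta) [mu*beta - beta^2*sigma^2] = mu - 2*beta*sigma^2 = 0
    gives beta* = mu / (2*sigma^2): stake more when the edge (mu) is
    large and less when the outcome is volatile (sigma large).
    """
    return mu / (2.0 * sigma ** 2)

print(optimal_stake(mu=0.10, sigma=0.95))  # ~0.055 units of capital
print(optimal_stake(mu=0.10, sigma=2.00))  # a smaller stake for a riskier bet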
More complex strategies than this can be considered. In particular, one could take into account the amount of capital available and the utility of money. In addition, many stages of model development proceed by trial and error. Thus the computation time required to implement any stage of the modelling process has to be short enough for the model development to take place on a practical timescale.
An additional issue is the availability and cost of data required by a process. There may be appeal in a new method from a theoretical perspective in that it may, for example, produce estimates of quantities that have lower variance, or have lower expected bias, than an existing method. However, these improvements may only be observable given a suitably large amount of data. Consideration must be given towards how much data is likely to be necessary and whether such an amount is realistically obtainable.
Chapter 3

Obtaining parameter estimates
In the previous chapter some of the key issues involved in the modelling of sports
results were described and a selection of previous treatments were summarised. Some
of this material is helpful to raise awareness of the potential problems that arise, while some is of more direct importance since it can be applied, with minor modifications, to the sports markets that are to be modelled in this thesis.
To clarify the objective of this chapter, in terms of how it ties in with the other material in this thesis, it is helpful to outline the general procedure involved in modelling a sport:

1. The important features of the sport and its betting market are identified, using existing knowledge of the sport and by exploring the available data. This stage concludes with the specification of a statistical model which relates the parts of data that are deemed to be important to each other via a set of parameters (such as team abilities).

2. Estimates of the parameters of the specified model are obtained from the data.

3. Using the estimates obtained in stage 2, the validity of the specified model is assessed. The model can then be used to generate predictions for future sporting fixtures and these predictions can form the basis of a betting strategy.
Chapters 4, 5 and 6 mainly cover stages 1 and 3 of this process. Stage 2 is covered in this chapter. The material in this chapter is quite technical, although an exhaustive understanding of it is not essential in order to follow the main sections of this thesis, which, in Chapters 4, 5 and 6, is the construction and application of sport-specific models. This chapter can be read in its entirety if the reader is interested in the technical aspects of the parameter estimation process, otherwise the reader may find it more useful for occasional reference, where indicated in the text.
The procedure used throughout this thesis is adapted from that of Dixon and Coles (1997). The original application was the modelling of UK soccer scores and the procedure they used can be extended quite easily to model other sports. It is by no means the only technique that has been applied in studies of sports modelling but it is used by all models in Chapters 4, 5 and 6. Some other parameter estimation procedures are mentioned in Section 2.4, one of which (the Markov Chain Monte Carlo approach) is applied to a model for NFL scores in Chapter 7. For the models elsewhere in this thesis, the procedure outlined in this chapter is considered to be more suitable. The strengths and weaknesses of it compared to the Markov Chain Monte Carlo approach are discussed in Chapter 7.
3.1 The MLE method

In this section the procedure employed by Dixon and Coles (1997) to obtain estimates of model parameters of English soccer teams is summarised. From here on this procedure is referred to as the MLE method. Initially, the model specification that Dixon and Coles chose is described. It is assumed that home and away goals follow independent Poisson distributions. Given N matches in total, then for match k between teams i(k) and j(k), the home and away scores are modelled as X_k ~ Poisson(λ_k) and Y_k ~ Poisson(µ_k), where

• λ_k = e^{α_{i(k)} + β_{j(k)} + δ} and µ_k = e^{α_{j(k)} + β_{i(k)}},

• α_{i(k)}, α_{j(k)} represent the home and away sides' attacking capabilities,

• β_{i(k)}, β_{j(k)} represent the home and away sides' defensive capabilities,

• δ represents the home effect.
Writing the loglikelihood of the observed scores as

Σ_{k=1}^{N} log(L_k)    (3.1.1)

where L_k = P(X_k, Y_k | α, β, δ), one could maximise this quantity with respect to the (α, β, δ) parameters in order to obtain maximum likelihood estimates (α̂, β̂, δ̂). However, to do so in this case assumes that all parameters, including team abilities, are fixed over time, which in practice is not believed to be the case. Various treatments of this problem for other modelling frameworks are outlined in Section 2.3. The MLE method uses a `weighting' factor, τ_k, for each match. Hence the quantity maximised is

Σ_{k=1}^{N} log(L_k) τ_k    (3.1.2)
The parameter τ_k should be larger the more recently the match took place. The form for τ_k chosen by Dixon and Coles is

τ_k = exp(-c(t - t_k))

where t is the current time, t_k is the time match k took place and c < ∞ is a coefficient chosen in order to maximise the predictive ability of the model, rather than the loglikelihood specified in Equation 3.1.1. Note that in the interests of readability, this pseudo-loglikelihood is referred to as the `likelihood' throughout this thesis.
From here on, τ_k is referred to as an external parameter, while the team, global mean and home effect parameters, which are maximised at each time-point as part of the likelihood, are referred to as internal parameters. There is no algebraic solution to finding the maximum likelihood estimates of the internal parameters, but Newton-Raphson maximisation techniques can be used without major difficulty.
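A compact illustration of this estimation step (Python with NumPy and SciPy; the toy data, the flat parameter layout, and the use of a generic quasi-Newton optimiser in place of hand-coded Newton-Raphson are all my own choices):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

# Toy data: home team index, away team index, home goals, away goals, match time
matches = np.array([[0, 1, 2, 0, 0.0], [1, 2, 1, 1, 0.1], [2, 0, 0, 3, 0.2]])
n_teams, c, t_now = 3, 0.5, 0.3

def neg_weighted_loglik(params):
    """Negative weighted pseudo-loglikelihood of Equation 3.1.2."""
    alpha, beta = params[:n_teams], params[n_teams:2 * n_teams]
    delta = params[-1]
    total = 0.0
    for h, a, x, y, tk in matches:
        h, a = int(h), int(a)
        lam = np.exp(alpha[h] + beta[a] + delta)     # home scoring rate
        mu = np.exp(alpha[a] + beta[h])              # away scoring rate
        loglik = poisson.logpmf(x, lam) + poisson.logpmf(y, mu)
        total += loglik * np.exp(-c * (t_now - tk))  # down-weight older matches
    return -total

result = minimize(neg_weighted_loglik, x0=np.zeros(2 * n_teams + 1))
print(result.x.round(3))  # (alpha_1..n, beta_1..n, delta) estimates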
In order to assess the predictive ability of the model, a scalar quantity referred to as the predictive likelihood (PL) can be used. It is defined as the sum of the loglikelihoods of the observed scores, each evaluated using the parameter estimates available at the time of the match. Hence

PL = Σ_{k=m}^{N} log(P(X_k, Y_k | θ_{t(k)}))    (3.1.3)

where m denotes the first match after which sufficient data has been observed in order to be able to make reliable estimates of the parameters, t(k) is the time at which match k takes place and θ_t is the set of (α, β, δ) estimates based on all matches up to but not including time t. This sum must be computed for a range of values of c until an optimum is found.

Note also that there is no unique solution to the likelihood maximisation, since a constant can be added to all the α's and subtracted from all the β's without affecting any of the score predictions. Dixon and Coles introduced the constraint that Σ_{i=1}^{N} α_i = 0 to achieve a unique maximum likelihood. The effect of this, along with some alternative solutions, is now discussed.
3.2 Generalising the MLE method

In order to generalise the MLE method described in the previous section so that parameter estimates for other sports can be obtained, some modifications are required. For example, Dixon and Coles have all English soccer results from all divisions, plus results from cup games, in their database and so it is fairly rare that a new team enters into their likelihood. In certain other situations, such as the yellow cards application in Chapter 4 where only Premier League data is employed, teams frequently enter and leave the data set. As a result, the sum-to-zero constraint applied to the teams' parameters can cause problems, as the following example shows. Suppose that four teams, A, B, C and D, compete during a season. Each team's ability is constant throughout time and can be summarised by
a single parameter. These abilities relative to team A's are (0, 0.3, -0.5, -1.0). The interpretation of these parameters is that team B on average beats team A by 0.3 goals, for example, and similarly for teams C and D. Furthermore, an intercept term γ is required so that if team A were to play a team of equal ability, on average a total of 3.0 goals would be scored. The score of a match is not affected by whether the match is played at the home ground of either side in the match.
E[X + Y] = ry-}-a=-I-aj
E[X - Y] = a= - aj
Suppose that in the first season only teams A, B and C are in the league. Appending the constraint $\alpha_A + \alpha_B + \alpha_C = 0$, the estimating equations are

$$\begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \gamma \\ \alpha_A \\ \alpha_B \\ \alpha_C \end{pmatrix} = \begin{pmatrix} 3.3 \\ 2.5 \\ 2.8 \\ -0.3 \\ 0.5 \\ 0.8 \\ 0.0 \end{pmatrix}$$

When team D enters the league in the following season, the corresponding system becomes

$$\begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \gamma \\ \alpha_A \\ \alpha_B \\ \alpha_C \\ \alpha_D \end{pmatrix} = \begin{pmatrix} 3.3 \\ 2.5 \\ 2.0 \\ 2.8 \\ 2.3 \\ 1.5 \\ -0.3 \\ 0.5 \\ 1.0 \\ 0.8 \\ 1.3 \\ 0.5 \\ 0.0 \end{pmatrix}$$
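The consequences for the parameter estimates can be checked numerically; the sketch below (illustrative code, not from the thesis) solves both systems by least squares.

```python
# Solving the two constrained systems above: adding team D shifts the
# estimates for A, B and C even though their abilities are unchanged.
import numpy as np

A3 = np.array([[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1],
               [0, 1, -1, 0], [0, 1, 0, -1], [0, 0, 1, -1],
               [0, 1, 1, 1]], dtype=float)
b3 = np.array([3.3, 2.5, 2.8, -0.3, 0.5, 0.8, 0.0])
print(np.linalg.lstsq(A3, b3, rcond=None)[0])  # gamma, aA, aB, aC

A4 = np.array([[1, 1, 1, 0, 0], [1, 1, 0, 1, 0], [1, 1, 0, 0, 1],
               [1, 0, 1, 1, 0], [1, 0, 1, 0, 1], [1, 0, 0, 1, 1],
               [0, 1, -1, 0, 0], [0, 1, 0, -1, 0], [0, 1, 0, 0, -1],
               [0, 0, 1, -1, 0], [0, 0, 1, 0, -1], [0, 0, 0, 1, -1],
               [0, 1, 1, 1, 1]], dtype=float)
b4 = np.array([3.3, 2.5, 2.0, 2.8, 2.3, 1.5,
               -0.3, 0.5, 1.0, 0.8, 1.3, 0.5, 0.0])
print(np.linalg.lstsq(A4, b4, rcond=None)[0])  # gamma, aA, aB, aC, aD
```

The first system yields α_A ≈ 0.067, while the second yields α_A = 0.3, illustrating the shift described next.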
The problem here is that while the abilities of teams A, B and C have not changed, α_A, α_B and α_C have changed in order to satisfy the sum-to-zero constraint. The predictions of match results are still valid but the parameter estimates become less
interpretable. One could choose the alternative constraint that the ability of team A is
always set to be zero but this means that the abilities of teams B, C and D can only be
expressed with respect to the ability of team A which, for real life applications, would
be likely to change over time.
Another approach is to include a prior on the team abilities when the likelihood
is maximised. Referring back to the example described by Equation 3.1.1, the most
natural prior assumption to place on the team abilities is to assume they are Normally distributed with mean zero; the overall mean scoring rate can be accommodated by the intercept term. The likelihood to be maximised then becomes

$$\prod_{k=1}^{M} P(X_k, Y_k \mid \alpha_{i(k)}, \beta_{i(k)}, \alpha_{j(k)}, \beta_{j(k)}, \delta) \prod_{i=1}^{N} \pi(\alpha_i) \prod_{i=1}^{N} \pi(\beta_i)$$

where $\pi(\cdot)$ denotes the $N(0, \tau_{\alpha\beta})$ density. $\tau_{\alpha\beta}$ is an external parameter, hence the optimal value of $\tau_{\alpha\beta}$, like c, must be found by inspection of the predictive likelihood defined in Equation 3.1.3.
It should be clarified at this stage that the prior distribution referred to here does
not serve the purpose conventionally served by a prior term in the context of Bayesian
statistics. The Bayesian interpretation of the prior term used here would be that
before any data has been observed, it is believed that all teams have equal a's and ß's
and that this belief is modified upon observing data. This is not the reason for the
inclusion of the prior term in this case. The prior term here serves an entirely different
purpose, which is to act as a constraint on the estimates of the parameters so that the
likelihood can be maximised.
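In code, the prior-as-constraint device amounts to adding a quadratic penalty to the negative loglikelihood. The fragment below extends the earlier sketch; the value of τ_αβ is an assumption for illustration.

```python
# Sketch: a N(0, tau_ab) prior on each team parameter contributes
# -a_i^2 / (2 * tau_ab) to the loglikelihood, i.e. a quadratic penalty.
tau_ab = 0.2  # assumed external parameter (tuned via predictive likelihood)

def neg_penalised_loglik(theta):
    alpha, beta = theta[:n_teams], theta[n_teams:2 * n_teams]
    penalty = (np.sum(alpha ** 2) + np.sum(beta ** 2)) / (2.0 * tau_ab)
    return neg_weighted_loglik(theta) + penalty
```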
One problem caused by the MLE method for parameter estimation is that by down-weighting matches that took place less recently in the likelihood, information about all parameters (rather than just the team abilities) is down-weighted. Thus when the parameters for the global mean, γ, and the effect of playing at home, δ, are estimated, recent matches are given greater weight. Since these parameters are considered in
practice to be constant throughout time, this is not desirable. This same issue applies
to the estimation of the score standard deviation term if a Normal distribution is
employed for the scores, for example, and the correlation coefficient if a bivariate
Normal distribution is used. Hence, at the start of a season in particular, the estimates of these parameters are based largely on recent results. While in some cases it may be desirable that the global mean and other terms vary with time to some extent, there is little justification for the instability that pure down-weighting creates. If prior terms are therefore placed on the global parameters as well as the team abilities, then a suitable likelihood function could be
$$L(X, Y \mid \alpha, \beta, \gamma, \delta, \sigma_X, \sigma_Y, \rho) = \prod_{k=1}^{M} P(X_k, Y_k \mid \alpha_{i(k)}, \alpha_{j(k)}, \beta_{i(k)}, \beta_{j(k)}, \gamma, \delta, \sigma_X, \sigma_Y, \rho) \times \prod_{i=1}^{N} \pi(\alpha_i)\,\pi(\beta_i) \times \pi(\gamma)\,\pi(\delta)\,\pi(\sigma_X)\,\pi(\sigma_Y)\,\pi(\rho) \qquad (3.2.1)$$
where the π(·) terms are such that
• π(γ) is the N(γ_0, τ_γ) density
• π(δ) is the N(δ_0, τ_δ) density
• π(σ_X) is the N(σ_{X0}, τ_{σX}) density
• π(σ_Y) is the N(σ_{Y0}, τ_{σY}) density
• π(ρ) is the N(ρ_0, τ_ρ) density
It is then necessary to choose appropriate means γ_0, δ_0, σ_{X0}, σ_{Y0}, ρ_0 and variances for these priors. The prior values for γ_0 and δ_0 at time t could respectively be the observed means of (X_k + Y_k) and (X_k − Y_k) over all matches with t(k) < t. It is more difficult to select suitable initial values for the σ_{X0}, σ_{Y0} and ρ_0 terms. For example, σ_X and σ_Y are conditional standard deviations, conditional on covariates including team parameters and the effect of playing at home. Thus in order to produce a reliable estimate of (σ_X | α, β, γ, δ), suitable estimates for the α, β
terms, for example, are necessary. Estimates for these parameters can only be obtained by maximising the quantity in Equation 3.2.1. Yet it is for this process that suitable values of σ_{X0} and σ_{Y0} are required. Similarly, ρ is a conditional correlation, so a similar argument applies. There are still various possible estimates for σ_{X0}, σ_{Y0} and ρ_0 that could be used; for example, rough starting values could be used for an initial fit, from which the conditional moments could then be calculated. It is rather time consuming to repeat this process on every occasion that the model parameters need to be estimated and so setting σ_{X0}, σ_{Y0} and ρ_0 to be the unconditional standard deviation of home scores, the unconditional standard deviation of away scores and the unconditional (home score, away score) correlation is a straightforward alternative. This will normally give inflated estimates for the σ_X and σ_Y terms, since team abilities account for some of the variance in almost all of the situations which are investigated in this thesis. Techniques to scale down these figures could be considered, although these would be chosen in order to maximise the predictive ability of the model, along with the other external parameters.
Suitable variance quantities τ_γ, τ_δ, τ_{σX}, τ_{σY}, τ_ρ for the priors are also required. A quantity that allows sufficient movement from the initial estimate of the parameter, without allowing excessive fluctuation of the estimate (which could bias predictions of future results) is desirable. One obvious candidate is the standard error of the initial value described in the above paragraph. This can be obtained either through formulae if possible (see below), or alternatively a simple model with no team effects can be maximised. Standard errors of the terms of interest can be obtained by taking the diagonal elements of the inverse of the observed information matrix; for example,

$$\tau_\gamma \approx \frac{\mathrm{Var}\!\left((X_k + Y_k) \mid t(k) < t\right)}{N}, \qquad \tau_\delta \approx \frac{\mathrm{Var}\!\left((X_k - Y_k) \mid t(k) < t\right)}{N}.$$
As previously discussed in this section, selecting appropriate values for σ_{X0}, σ_{Y0} and ρ_0 is problematic, thus τ_{σX}, τ_{σY} and τ_ρ may need to be selected to create suitably weak priors.
The only covariates in the models specified so far in this chapter are two ability parameters for each team and the effect of playing at home. However, as discussed in Section 2.1.4, there are often additional covariates that may improve the accuracy of predictions, provided a suitable relationship is specified between the covariate and the response. One straightforward model for soccer scores (X_k, Y_k) could involve using the attempted goals, or shots, (HS_k, AS_k) as covariates, with two extra parameters κ_1 and κ_2 attached to them in Equation 3.2.2.
A model for the prediction of (HS_k, AS_k) could be developed and combined with that described in Equation 3.2.2 to obtain a joint distribution for (X_k, Y_k, HS_k, AS_k). The down-weighting issue, which is described in Section 3.2.2, also affects the estimation of the κ_1 and κ_2 terms in Equation 3.2.2. The true values of these parameters are not considered in practice to vary over time but the parameter estimation procedure, as described thus far, places greater emphasis on recent results when κ_1 and κ_2 are estimated. The problem is
compounded if the covariate is an indicator variable representing a rare or seasonal
event. An example of this could occur in soccerwhere a variable Zk could be defined
so that
Z_k = 1 if both teams in match k are threatened by relegation from the league should the match be lost, and Z_k = 0 otherwise.
After the first match of a season where Zk = 1, the estimate of its coefficient is
heavily affected by the result of this match, rather than averaged out over that match
and all others in previous seasons as desired. While this is also true for parameters
such as the global intercept γ or home effect δ, it is less critical since the presence of γ and δ in the specification of the conditional mean of every match ensures that their estimates are informed by every result in the data set.

These considerations motivate a two-stage procedure in which the global parameters, collected in a vector T, and the team parameters, collected in a vector Λ, are estimated separately. Next, the following functions are defined:

• $l_1(T \mid \Lambda^*) = \prod_{k} P(x_k, y_k \mid \Lambda^*, T)$

• $l_2(\Lambda \mid T^*) = \prod_{k} P(x_k, y_k \mid \Lambda, T^*)^{\exp(-c(t - t_k))}$
Initially Λ* is a vector of zeroes of length 2N, hence all offensive and defensive parameters are set to zero. Next, l_1(T | Λ*) is maximised in order to obtain non-time-dependent estimates T*. Then, using this value, l_2(Λ | T*) is maximised to obtain Λ**. Hence T* is obtained by giving equal weight to all matches, but assuming that all teams are of equal ability, while Λ** is obtained by giving more importance to recent matches, with team abilities estimated subject to the fixed value T*.
One could next consider maximising l_1(T | Λ**) to obtain T** and repeating the process described above until some desired number of iterations has been implemented, but this would be time-consuming and would also require a good understanding of the behaviour of both l_1 and l_2, which is rather difficult given their large dimensionality. For this reason only one implementation has been used. In certain situations it is preferable to allow some global parameters to adjust over time; in the applications of later chapters in this thesis, the process described above has been modified so that global parameters such as γ, δ and σ_X have been re-evaluated along with Λ, and a prior term for them has been included as outlined in Section 3.2.2. The justification is that their ubiquity in the likelihood function means they are less sensitive to the down-weighting of past matches.
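The two-stage scheme can be expressed compactly as below; this is a schematic sketch in which the objective functions neg_l1 and neg_l2 (the negative logarithms of l_1 and l_2) are assumed to be supplied by the modeller.

```python
# Schematic two-stage maximisation: globals T fitted with equal weights
# and all team abilities fixed at zero, then abilities Lambda fitted with
# time down-weighting while T is held at T*.
import numpy as np
from scipy.optimize import minimize

def two_stage_fit(neg_l1, neg_l2, n_teams, n_global):
    lam_star = np.zeros(2 * n_teams)            # Lambda*: abilities all zero
    T_star = minimize(lambda T: neg_l1(T, lam_star),
                      np.zeros(n_global)).x     # maximise l1(T | Lambda*)
    lam_2star = minimize(lambda lam: neg_l2(lam, T_star),
                         lam_star).x            # maximise l2(Lambda | T*)
    return T_star, lam_2star
```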
One further issue that arises as a result of the down-weighting system employed by the MLE method, although it also applies to any analysis which attempts to allow estimates of teams' abilities to adjust over time, is the need to accommodate the break that occurs between the seasons of most sports. The MLE method as outlined so far assumes that team abilities adjust at the same rate over the season break as they do during the season. This seems like an unrealistic assumption and so the solution used in this thesis is to add, for every season before the current one, a between-season truncation quantity w to the elapsed time: the time-points of the matches during the season before the current one have w added to them, while the time-points of the matches during the season prior to that one have 2w added, etc. This quantity is an external parameter
chosen in order to maximise the predictive likelihood. Since the functional form of the predictive likelihood is too complicated to analyse algebraically, the only way to find the optimal values is by using the rather crude technique of trying out many sets of values, recording the predictive likelihood each time and choosing the set which corresponds to the highest value of predictive likelihood. This technique is only valid if the data used to select the external parameters are kept separate from the data on which the model is judged: a genuine comparison between predictions obtained and those offered by bookmakers can only be made if the optimal values for the external parameters are found using data which occurred beforehand. Hence, in order to have genuine comparisons between model and bookmaker predictions, one must divide the data set into two sections. The earlier section is used in order to find optimal values for τ_{αβ}, c and w. Then the updating of all non-external parameters is performed at each time-point on the latter section of the data using the optimal values of the external parameters. Predictions are then made using the most recent parameter estimates and these predictions can be compared with the bookmaker's. If sufficient time and computing resources are available, re-evaluation of the external parameters could be performed at every time point, and summary statistics on the comparison between bookmaker and model predictions could be computed at every stage.
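The search over external parameters is then a straightforward grid maximisation of the predictive likelihood on the earlier section of the data, as in the sketch below. Here pred_lik is a hypothetical stand-in for the full refit-and-score cycle of Equation 3.1.3, and the grid values echo those examined in Chapter 4.

```python
# Illustrative grid search over the external parameters (tau_ab, c, w).
from itertools import product

def choose_external_parameters(pred_lik, training_matches):
    """pred_lik(tau_ab, c, w, matches) -> predictive likelihood (Eq. 3.1.3)."""
    grid = product([0.05, 0.1, 0.2, 0.5],             # prior tightness tau_ab
                   [0.001, 0.005, 0.01, 0.02, 0.05],  # down-weighting c
                   [5, 10, 20, 30])                   # season truncation w (weeks)
    return max(grid, key=lambda g: pred_lik(*g, training_matches))
```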
The most frequently used statistic in this thesis in order to assess the validity of models is the predictive likelihood, as defined by Equation 3.1.3. Strictly speaking it is a device which deals with model evaluation rather than model construction; however, given the frequent use of the predictive likelihood throughout the next three chapters, it is worth illustrating its behaviour with a small example. Suppose the following ten scores were observed:

(1, 0, 8, 2, 2, 1, 2, 1, 1, 3)

and prior to each match it was believed that each score had an expected value of μ = 2.1. In this case a predictive likelihood of −19.895 is obtained, assuming the scores
follow a Poisson distribution. Alternatively, suppose it was believed that each score had an expected value of μ = 2.01. Here the predictive likelihood becomes −19.915, even though, under the assumption that μ = 2.01, closer predictions for eight of the ten scores are obtained. For spread betting, one should asymptotically make more profit using μ = 2.1 as the prediction, since returns are proportional to the closeness of the predictions. For fixed odds betting one would lose rather a lot of money by assuming μ = 2.1. In fact, by looking at the logs of the observed probabilities of the scores given μ = 2.1 and Poisson distributed responses, it is apparent that the third score contributes far less to the likelihood than any other (a feature much harder to spot by looking at the raw data of more complicated data sets than in this example). This suggests that either the third score is an outlier, an extreme event, or that an extra covariate to describe a characteristic feature of the third match is required. Some general understanding of the sport being modelled may be important in order to decide this.
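The figures quoted in this example can be checked directly, assuming Poisson-distributed scores:

```python
# Verifying the toy predictive-likelihood figures quoted above.
import numpy as np
from scipy.stats import poisson

scores = np.array([1, 0, 8, 2, 2, 1, 2, 1, 1, 3])
for mu in (2.1, 2.01):
    print(mu, round(poisson.logpmf(scores, mu).sum(), 3))  # -19.895, -19.915
# Per-score terms show the third score (8) dominating the comparison:
print(poisson.logpmf(scores, 2.1).round(3))
```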
The material covered in this chapter has been selected with the aim of describing methods that are common to all three markets that are covered in the next three chapters. It also suggests a suitable method for other studies of similar sporting markets. Each study must extend or modify these general methods in order to accommodate the specific features of the market. The next three chapters illustrate this.
Chapter 4

Harsh referees and dirty teams: estimating booking rates in soccer
This chapter investigates the rate of bookings in Premier League soccer. Motivated by
the rapidly growing and financially lucrative sports spread betting markets, the aim
is to estimate the distribution of the numbers of cautions and dismissals (yellow and
red cards) given by the referee in a particular future match. This is achieved using a
detailed statistical model to account for the characteristics of the two teams playing,
the referee and several other factors. The aim is to obtain predictions that could be
used as the basis for a profit making strategy on UK sports spread betting markets.
This chapter presents the first application of the likelihood maximising procedure developed in Chapter 3.

4.1 Bookings in soccer - an overview
Soccer, or Association Football, has been played with the same basic rules for over 100 years. The main change over this period has been the increase in its popularity and the financial consequences for good or bad performance. As soccer and its participants have become more professional, and success has become more important, players must play close to the boundaries of the rules, and inevitably sometimes break them. To control this, match officials (referees) are given the power to penalise a player who commits a serious breach of the rules, or who continually commits minor offenses. Penalties can
range from a free-kick through to cautioning and ultimately dismissing (sending off) a
player. Every caution or dismissal by a referee is indicated to the offending and other
players by clearly displaying a yellow card (for a booking or caution) or a red card
(for a dismissal). The likely number of red and yellow cards to be shown in the match
differs from game to game depending on various factors: some players are more prone
to committing punishable offenses, while some referees tend to caution and dismiss
more readily than others. Estimating the distribution of the number of red and yellow
cards in a given match and investigating the influence of such factors are the subjects
of the chapter.
There have been a number of studies of both the statistics and the psychology
(1994) examine the effect of a red card on the outcome of the match, and even suggest
The aim here is rather different and is motivated by the opportunity of spread betting,
which is described using an example from this particular market in Section 1.3.2.
The volumes of bets on bookings markets can be huge: in fact for some firms it
is the most popular form of betting, so as a consequence there are strong financial
incentives to both bookmakers and gamblers for models that can accurately estimate
the probabilities of various outcomes of bookings markets; this is the underlying motivation for the work in this chapter.

Although prices are generally driven by gambler behaviour, the opening, or initial
spreads are quoted based on the spread companies' subjective probability of the mean
bookings points for the match. While for most sports markets the opening spreads
across bookmakers generally agree, for the booking markets, the opening prices are
usually very different. In the Arsenal versus Manchester United example detailed in
Section 1.3.2, the opening spread offered by one bookmaker was 30-34 but other firms
opened at 20-24 and 36-40. This level of discrepancy in opening prices is not atypical.
By developing a detailed statistical model, the aim is to estimate the distribution of the numbers of yellow and red cards in a given match, and in addition to gain an understanding of the bookings process. The fact that there is no existing literature on the development of such a model that the writer is aware of is likely to be due to lack of motivation: before the spread markets became popular, there was little desire to model booking rates in such detail.
The data available include Premier League soccer matches since the start of the
1994/1995 season. The available data vary in detail: for later matches the referee
name is available, whereas for the early matches only the home and away red/yellow
card numbers are recorded. The data is split into three parts.
" Aug 1994- May 1997. Home/away red/yellow card numbers. (1222 matches)
" Aug 1997-May 1999. Home/away red/yellow card numbers with referee names
(760 matches).
referee | date | home team | away team | home score | away score | home yellows | away yellows | home reds | away reds | spread
D.Gallagher | 20010421 | Arsenal | Everton | 4 | 1 | 0 | 2 | 0 | 1 | 37
N.Barry | 20010421 | Bradford | Derby | 2 | 0 | 0 | 2 | 0 | 1 | 41
M.Dean | 20010421 | Chelsea | Charlton | 0 | 1 | 3 | 0 | 0 | 0 | 37
G.Barber | 20010421 | Ipswich | Coventry | 2 | 0 | 1 | 4 | 0 | 0 | 35
G.Poll | 20010421 | West Ham | Leeds | 0 | 2 | 5 | 2 | 0 | 1 | 48
The raw data for the second and third part of the data set are displayed in Figure 4.1, which displays histograms of numbers of yellow and red cards.

Figure 4.1: Histograms of yellow and red cards, with Poisson distribution lines overlaid.

Figure 4.1 suggests that sides tend to collect more bookings when playing away from home.
With the number of yellow cards being discrete and generally quite low, the Poisson distribution is a natural starting point; the overlaid lines in Figure 4.1 are Poisson probabilities whose parameters are the overall means of the displayed data. While the overlaid lines do appear to depart slightly from the histogram, note that each match has a different expected number of yellows, so an exact fit of a Poisson model to the collated data over matches is not expected, even if the distribution of yellows in each match has a Poisson form. Hence it is assumed that yellow cards follow a Poisson distribution from now on.
Concentrating initially on the second section of the data set, there is information on 29 teams and 34 referees, with most referees officiating a match approximately once every fortnight. Based on some initial thought, and simple exploration of the data, the following factors are considered likely to influence the bookings rate in a match:
" Fl the two teams' propensities to pick up bookings (hereafter termed the teams'
dirtinesses)
57
" FZ the two teams' propensities to provoke the opposition into getting booked
" F3 the referee's propensity to give out cards (hereafter termed the referee's harsh-
ness)
" F5 the current climate. The expected booking rate for an averagematch will
changethrough time, either abruptly or gradually, due to factors suchas referees'
guidelines,rule changesand state of the season.
" Other factors There are numerous other features that could also be important.
For example, dependence between home yellows and away yellows, in that if one
side collects many yellows, the general match temperature will rise and may pro-
voke fouls from the opposing side. Also the weather, longer-term consequencesof
players and referees, may, among other factors, all be influential on the bookings
rate.
In Sections 4.2.1-4.2.4, what are considered to be the main effects, namely factors F1-F5, are explored using empirical summaries of the data.
During Premier League soccer seasons 1997/98 until 2001/02, Derby collected an average of 2.242 bookings over 190 matches, with a bootstrap confidence interval of (2.050, 2.434). Manchester United collected an average of 1.432 (1.244, 1.620) in the same
time period. It is well acknowledged that some teams have players who are more
likely to collect bookings. What may be more surprising is that, for example, Leeds
provoked on average 2.453 (2.222,2.684) bookings from their opponents, while dur-
ing the same period, Southampton provoked only 1.489 (1.295,1.683) bookings. As
for referees, G Barber booked on average 4.269 (3.881,4.657) players in each match,
whereas the equivalent statistic for P Durkin is 2.832 (2.383,3.281). This suggests
definite team and referee specific effects for factors F1-F3. It is interesting to note
that the dirty teams are not necessarily the most provocative as one might expect. For
example, only two teams collected more bookings than Nottingham Forest during the
1998/99 season, yet only one team attracted fewer bookings. In fact, for every booking
Nottingham Forest provoked, they collected 1.79 themselves. Figure 4.2, which plots
the average number of bookings sides attracted against those they provoked in the
1998/1999 season, emphasises this lack of association.
Figure 4.2: Average number of bookings each side attracted plotted against the average number they provoked, 1998/1999 season.
Table 4.2 displays the average number of red or yellow cards collected by a side, compared to the difference in score of the match. There is little doubt that the worse
the result of the match is for a team, the more likely they are to collect bookings.
Figure 4.3: Moving average of number of yellow cards awarded in each match. The vertical
green lines denote season breaks
Figure 4.3 displays the moving average (block size 50) of the total number of cards given out for all matches since August 1994. It suggests the bookings market has shifted markedly over time, most notably in the sudden drop in booking rate that occurred just before week 300. In fact, at around that time (January 2000) the FA issued instructions to all Premier League referees advising them to exercise more caution when issuing yellow and red cards. However, looking at the entire graph, it appears that the awarding of bookings is more `fashionable' at certain times than others.
Table 4.3 displays information concerning the numbers of bookings awarded during
matches between various pairs of teams who are recognised as being strong rivals.
60
Although the information is based on quite a small number of matches, and there are other factors which determine the bookings rate in a match, it does appear that there may be a genuine effect from these rivalries.
On a similar theme, it could be considered whether there is any effect from the
various pressures that some teams are under towards the end of a season. These
pressures could include the possibility of winning the Premier League, qualifying for European competition, or avoiding relegation.
In Section 3.1 a simple sports model was specified, which uses only the two teams involved and a home effect as relevant predictors. This model is the template upon which the model for yellow and red cards will be developed. For this application, the attacking and defensive capabilities can be substituted by teams' dirtiness and provocation levels to model booking rates. However, as discussed in Section 4.2, now there are other first-order effects that need to be included.
The referees can be treated in the same way that the teams' dirtiness and provocation factors are, so a harshness parameter is associated with each referee and estimated alongside the team parameters. Initially, home and away bookings are treated as independent, so the task reduces to finding the joint distributions (HY_k, HR_k) = (HR_k | HY_k)(HY_k) and (AY_k, AR_k) = (AR_k | AY_k)(AY_k). This will be attempted in Section 4.3.7. For Sections 4.3.2 to 4.3.6 it is the expected number of yellow cards that is examined unless otherwise indicated. The validity of the assumption of independent home and away booking rates will be discussed in Section 4.5.2.
The obvious problem with trying to include the result of a match in the model is that
the result of the match is not known at the time the prediction needs to be made.
An attempt can be made however to predict which matches are more likely to result
in a larger difference in score. This can be achieved quite easily by using the model
specified by Equation 3.1.1 and implementing the MLE procedure described in Chapter
3 in order to obtain parameter estimates for teams' goal-scoring abilities. This is in fact
the original application for which Dixon and Coles developed this procedure. Table 4.4
gives both the attacking and defensive parameters for all teams just after the matches
played on 11/05/2002. Table 4.5 provides predictions for the matches which took place
on 11/05/2002.

Table 4.4: Goal-scoring offensive (α̂) and defensive (β̂) team ability estimates, May 2002

Team | α̂ | rank | β̂ | rank
Arsenal | 0.2875 | 2 | -0.2853 | 1
Aston Villa | -0.0602 | 12 | -0.1023 | 6
Barnsley | -0.1061 | 14 | 0.2382 | 26
Blackburn | -0.0058 | 7 | 0.0121 | 9
Bolton | -0.0804 | 13 | 0.1574 | 22
Bradford | -0.3022 | 29 | 0.2959 | 28
Charlton | -0.1424 | 18 | 0.1003 | 19
Chelsea | 0.2058 | 4 | -0.2225 | 3
Coventry | -0.1467 | 19 | 0.0742 | 16
Crystal Palace | -0.1081 | 15 | 0.1734 | 23
Derby | -0.1897 | 25 | 0.1029 | 20
Everton | -0.0587 | 11 | 0.0673 | 14
Fulham | -0.2336 | 26 | -0.0465 | 7
Ipswich | -0.0483 | 10 | 0.0771 | 17
Leeds | 0.1412 | 5 | -0.1659 | 5
Leicester | -0.161 | 21 | 0.0653 | 13
Liverpool | 0.2087 | 3 | -0.2606 | 2
Man City | -0.1806 | 24 | 0.23 | 25
Man United | 0.4852 | 1 | -0.1888 | 4
Middlesbrough | -0.1629 | 22 | -0.0019 | 8
Newcastle | 0.1104 | 6 | 0.0202 | 10
Nottm Forest | -0.2342 | 27 | 0.2486 | 27
Sheffield Weds | -0.1108 | 16 | 0.1329 | 21
Southampton | -0.1177 | 17 | 0.0874 | 18
Sunderland | -0.167 | 23 | 0.022 | 11
Tottenham | -0.0066 | 8 | 0.04 | 12
Watford | -0.2482 | 28 | 0.3398 | 29
West Ham | -0.013 | 9 | 0.0733 | 15
Wimbledon | -0.1469 | 20 | 0.1832 | 24
Table 4.5: Score predictions for 11/5/2002

Home team | Away team | Predicted home goals | Predicted away goals
Arsenal Everton 2.1375 0.7793
Blackburn Fulham 1.4241 0.874
Chelsea Aston Villa 1.6644 0.822
Leeds Middlesbrough 1.7251 0.7851
Leicester Tottenham 1.3297 1.1566
Liverpool Ipswich 1.9971 0.801
Man United Charlton 2.6952 0.7832
Southampton Newcastle 1.3614 1.3293
Sunderland Derby 1.4077 0.9223
West Ham Bolton 1.7288 1.0973
The next step is to include the score predictions generated in this way in the
predictions for yellow cards. Figure 4.4 plots a moving average of predicted score
difference versus collected yellow cards, for both the home side and the away side.
It appears that for the home side at least, if a side is expected to win, then their
average number of yellow cards decreases. For the away side, the situation is less clear.
The approach taken is to include separate home and away parameters to reflect the effect of predicted score difference on each side's booking rate.

Figure 4.4: Moving averages of predicted score difference against collected yellow cards, for the home and away sides.
By examining Figure 4.3 it appears that generally yellow cards are awarded most
frequently at the start of a season, then tail off gradually until the end of the season.
Hence, if a team plays many matches during a period of generally high booking rates, the model should acknowledge the generally high booking rate during that period when evaluating the parameters for that team in order to avoid unnecessary bias.
There are two issues which need to be resolved here:
" How is the climate estimated for matches which have taken place already, in
order to minimise bias in the maximum likelihood estimation of the other model
parameters?
" How is an estimate provided for the climate of a future match for which a pre-
diction of the number of yellow cards is required?
To resolve the first issue a smooth curve is fitted which reflects the trends observed in the moving average of bookings, using kernel regression with the bandwidth set to 5 weeks (see Section 4.7.2 in the additional comments section for an explanation of this choice). The procedure is modified directly after the trough in bookings rates around week 120, as explained in Section 4.2.3. In this case, kernel smoothing is applied only to data observed after the trough.

Figure 4.5: Moving average of observed yellows and estimated climate. Vertical lines denote the start of a football season.

To resolve the second issue, the most recently estimated climate appears to be a sensible prediction of the next fixture's climate. The
exception to this is at the start of the season when, on inspection of Figure 4.5, a
rise in the climate is likely to occur. The reasons for this are not entirely clear to the writer (possibly new guidelines for certain offenses are issued at the start of most seasons). However, to accommodate this effect, the following simple procedure is employed:
Let S be the number of seasons in the data set. Let IC_1, ..., IC_S be the initial climate of each season and FC_1, ..., FC_S be the final climate of each season, as displayed in Figure 4.5. If a prediction E[IC_i] of the climate at the start of season i is required, then

$$E[IC_i] = FC_{i-1} + \frac{\sum_{j=2}^{i-1} (IC_j - FC_{j-1})}{i-2}$$
So the expected climate at the start of a seasonis the climate at the end of the previous
season, plus the mean change in climate from the end of one season to the start of the
next, for all seasons observed until then. This value is carried through the first ten
time-points in each season, to remove the instability that arises from having only a small number of observations early in a season. Figure 4.6 displays the prediction produced by the procedure described above. It incorrectly predicts a jump in the climate at the start of the fourth season, but generally seems to predict the climate adequately. Note that seasons 94-02 are employed to obtain the data for the season-jump, but only the climate for seasons 97-02 is plotted.
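For reference, the season-jump prediction can be written as a short function (a sketch; ic and fc are assumed to hold the initial and final climates keyed by season number):

```python
# E[IC_i] = FC_{i-1} + mean over j = 2..i-1 of (IC_j - FC_{j-1});
# requires at least one complete earlier season transition (i >= 3).
import numpy as np

def predict_initial_climate(ic, fc, i):
    jumps = np.array([ic[j] - fc[j - 1] for j in range(2, i)])
    return fc[i - 1] + jumps.mean()
```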
Figure 4.6: Plot of moving average of observed yellows along with predicted climate.
Finally, the discussion above is concerned with the climate of yellow cards. It is also
necessary to repeat the methodology in order to obtain an estimate for the climate of
red cards, since the awarding of red cards is also subject to various external pressures.
Figure 4.7 displays the moving average, fitted climate and predicted climate for red
cards. Again a bandwidth of 5 weeks appears to provide a satisfactory fit of the curve
to the observed data.
Figure 4.7: Plot of moving average of observed red cards along with smoothed climate and predicted climate.
The final first order effects considered are those of team-to-team rivalries and specific
match incentives. Once the levels of rivalry and incentives have been determined, their
inclusion as factors in the model is straightforward, although it must be done at a
later stage of the modelling process, for reasons which will be discussed shortly. In this
case, the level of rivalries that exist between specific teams is determined empirically following consultation with Tony Bloom, who has researched the subject thoroughly, and by examining the soccer clubs' official magazines produced for supporters, collected prior to the first matches of the data set, in order to determine the traditional rivalries. Table 4.6 displays some of the levels of rivalry employed. Note that rivalries are not entirely symmetric: for example, Leeds are a strong rival of Bradford, but not vice versa. This reflects the fact that the rivalry matters more to Bradford than Leeds.
Meanwhile, a match is deemed to have a specific incentive attached to it if the
result of the match may have an abnormally significant effect on the future of the club.
Specifically, a team has an incentive if the match result may affect to a large extent the
probability that the team wins the Premier League or is relegated from the Premier
League. In order to calculate the probabilities of these two events, predictions for the
Table 4.6: Level of rivalry between teams
Team Strong rivalries Mild rivalries
Coventry Aston Villa Leicester,Derby
Everton Liverpool None
Leicester None Coventry, Aston Villa, Derby
Leeds Man United Barnsley, Chelsea,Bradford
Bradford Leeds None
Man City Man United None
numbers of goals in the remaining matches are necessary. The team goal-scoring abilities are used to simulate the remaining matches of the season a large number of times, each simulation producing a final league table. The probability that a team is relegated is calculated to be the proportion of these simulated seasons that result in the team's relegation. The probability that a team wins the Premier League is similarly defined. Table 4.7 lists the final matches of the 2001/2002 season along with these probabilities before the matches take place. The probabilities of qualifying for two lucrative soccer tournaments, the UEFA Cup and the Champions League, are not considered due to the rather complicated rules which determine the chance of either event taking place, although this is a possible refinement.
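The simulation step can be sketched as follows. This is illustrative code only: the points-only final table, with the bottom three sides relegated, simplifies the real tie-breaking rules, and lam and mu are assumed dictionaries of predicted home and away goal means for each remaining fixture.

```python
# Monte Carlo estimate of title and relegation probabilities.
import numpy as np

rng = np.random.default_rng(1)

def title_and_relegation_probs(points, fixtures, lam, mu, n_sims=10_000):
    n = len(points)
    relegated, champions = np.zeros(n), np.zeros(n)
    for _ in range(n_sims):
        pts = np.array(points, dtype=float)
        for (i, j) in fixtures:                  # simulate remaining matches
            x, y = rng.poisson(lam[(i, j)]), rng.poisson(mu[(i, j)])
            if x > y:
                pts[i] += 3
            elif x < y:
                pts[j] += 3
            else:
                pts[i] += 1
                pts[j] += 1
        order = np.argsort(pts)                  # final table by points only
        relegated[order[:3]] += 1                # bottom three are relegated
        champions[order[-1]] += 1
    return relegated / n_sims, champions / n_sims
```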
Table 4.7: Title and relegation probabilities at end of 2001/2002 season. The proba-
bilities apply before the listed match takes place.
date home P(win) P(releg. ) away P(win) P(releg. ) score
20020427 Aston Villa 0 0 Southampton 0 0.002 2-1
20020427 Charlton 0 0.002 Sunderland 0 0.456 2-2
20020427 Derby 0 1 Leeds 0 0 0-1
20020427 Fulham 0 0 Leicester 0 1 0-0
20020427 Ipswich 0 0.395 Man United 0.124 0 0-1
20020427 Middlesbrough 0 0 Chelsea 0 0 0-2
20020427 Newcastle 0 0 West Ham 0 0 3-1
20020427 Tottenham 0 0 Liverpool 0.071 0 1-0
20020428 Everton 0 0 Blackburn 0 0 1-2
20020429 Bolton 0 0 Arsenal 0.776 0 0-2
20020508 Liverpool 0 0 Blackburn 0 0 4-3
20020508 Man United 0.122 0 Arsenal 0.878 0 0-1
20020511 Arsenal 1 0 Everton 0 0 4-3
20020511 Blackburn 0 0 Fulham 0 0 3-0
20020511 Chelsea 0 0 Aston Villa 0 0 1-3
20020511 Leeds 0 0 Middlesbrough 0 0 1-0
20020511 Leicester 0 1 Tottenham 0 0 2-1
20020511 Liverpool 0 0 Ipswich 0 0.937 5-0
20020511 Man United 0 0 Charlton 0 0 0-0
20020511 Southampton 0 0 Newcastle 0 0 3-1
20020511 Sunderland 0 0.063 Derby 0 1 1-1
20020511 West Ham 0 0 Bolton 0 0 2-1
In order to assess the effects of incentives and rivalries as accurately as possible,
some realistic match home and away yellow card predictions are needed. This is
because in order to detect if these factors affect booking rates, it is necessary to compare
a set of predictions for yellow cards which take account of these factors with a set of
reasonably reliable predictions that do not. To obtain a set of predictions of the second
type, a model incorporating factors F1-F5 as outlined in Section 4.2 is fitted.
By adapting the model specified in Section 3.1, at time t, the expected numbers of home and away yellow cards (HY_k and AY_k) for match k between teams i(k) and j(k) are given by Equation 4.3.1, where

• CY_k represents the estimated yellow cards climate at the time match k takes place

• α_{i(k)}, α_{j(k)} are team i(k) and j(k)'s dirtiness parameters

• β_{i(k)}, β_{j(k)} are team i(k) and j(k)'s provocation parameters

• Δ_k = E[HSC_k] − E[ASC_k], where E[HSC_k], E[ASC_k] are the home and away score predictions

• s_h and s_a are the home and away coefficients for the effect of home and away predicted superiority.
Note that the superiority term Δ_k is not estimated within the likelihood, since it has been separately determined in Section 4.3.3. It should also be noted that the time down-weighting, prior tightness and seasonal truncation parameters, as defined in Sections 3.1, 3.2.1 and 3.2.4 and referred to as external parameters, are as
yet undetermined. To determine these, several sets of their values are fixed and for
each set, the entire set of internal parameters (the parameters included in Equation
4.3.1) are estimated at each time-point. They are then used to find predictions for
the numbers of yellow cards given and the resulting predictive likelihood statistic is
monitored. Table 4.8 displays the predictive likelihoods obtained in this way. The optimal value, −5841.084, indicates that (0.02, 0.2, 20) is close to the optimal setting of the time down-weighting, prior tightness and seasonal truncation parameters respectively.
Table 4.8: Predictive likelihood of yellow cards model obtained for different choices of external parameters. Columns give the prior variance τ_{αβ} of the offensive and defensive estimates (0.05, 0.1, 0.2, 0.5); rows give the weight c.

Truncation w = 5 weeks:
c = 0.001: -5908.616 | -5866.486 | -5858.54 | -5894.354
c = 0.005: -5911.713 | -5864.679 | -5852.256 | -5887.95
c = 0.01: -5916.128 | -5864.963 | -5846.599 | -5882.129
c = 0.02: -5924.553 | -5871.035 | -5841.957 | -5877.462
c = 0.05: -5941.492 | -5896.95 | -5852.989 | -5891.999

Truncation w = 10 weeks:
c = 0.001: -5908.656 | -5866.419 | -5858.394 | -5894.212
c = 0.005: -5911.986 | -5864.538 | -5851.674 | -5887.377
c = 0.01: -5916.717 | -5865.076 | -5845.814 | -5881.361
c = 0.02: -5925.566 | -5872.059 | -5841.579 | -5877.161
c = 0.05: -5942.824 | -5899.437 | -5854.699 | -5894.17

Truncation w = 20 weeks:
c = 0.001: -5908.761 | -5866.232 | -5857.992 | -5893.817
c = 0.005: -5912.715 | -5864.196 | -5850.117 | -5885.842
c = 0.01: -5918.25 | -5865.535 | -5843.871 | -5879.461
c = 0.02: -5928.067 | -5874.945 | -5841.084 | -5876.981
c = 0.05: -5946.164 | -5905.606 | -5860.328 | -5902.203

Truncation w = 30 weeks:
c = 0.001: -5908.725 | -5866.309 | -5858.153 | -5893.975
c = 0.005: -5912.449 | -5864.339 | -5850.736 | -5886.456
c = 0.01: -5917.691 | -5865.367 | -5844.635 | -5880.213
c = 0.02: -5927.174 | -5873.874 | -5841.271 | -5877.038
c = 0.05: -5944.897 | -5903.322 | -5858.008 | -5898.61
4.3.6 Modelling Incentives
By obtaining MLEs for the parameters in the model specified by Equation 4.3.1 (sub-
ject to near-optimal values for the time down-weighting, prior tightness and seasonal
truncation parameters), predictions can be generated that are necessary for the final
stage of the modelling process. This is to test the effect of specific match incentives
and rivalries on the bookings rate. The effect of the incentives and rivalries is examined by fitting several generalised linear models.
Before constructing these models, the following variables are defined:

W_k = 1 if in match k both sides can win the Premier League, and W_k = 0 otherwise.

An analogous indicator is defined for relegation, where a side is counted as threatened only if its estimated probability of being relegated lies between 0.05 and 0.95, to ensure that teams whose predicament is effectively sealed are not classified as having an incentive. The same principle is applied to the teams who can win the Premier League. Also, in situations when rivals are also,
for example, both fighting against relegation, then the rivalry indicator is set to zero,
since it is assumed that the threat of relegation is the more dominant effect in the
match, and that the effects of these two factors are not additive (data are too sparse
to test this belief). Table 4.9 displays the relevant results.
Table 4.9: Investigating effect of derbies and incentives. ĥ_k and â_k represent the predictions for home and away yellow cards from the model constructed from factors 1-4
Some of the results in Table 4.9 are a little surprising. For example, it appears that the threat of relegation has no effect on booking rates, even if both teams in the match are relegation rivals. Similarly the booking rate for a match involving a side in contention for winning the Premier League only rises if both sides participating are in contention. It is for this reason that the term for the mild relegation indicator is not included in the final four models tested in Table 4.9. Thus the only alterations needed in the model are the additions of parameters that allow the expected number of yellow cards to increase in matches between sides who are both in contention to win the Premier League and matches where the two sides are traditional rivals. If this rivalry is mild, only the home side's expected rate is adjusted. Combining the conclusions of Sections 4.3.1 to 4.3.4, it is now possible to state the specification of the final model for the
mean yellow card rates. For match k between home team i(k), away team j(k) and refereed by official r(k), the expected numbers of home and away yellow cards are given by Equation 4.3.2, where

• λ_s is the parameter for the effect of playing against a strong rival

• λ_m is the parameter for the effect of playing against a mild rival

• ν is the parameter for the effect of both teams being rivals for overall victory in the Premier League.

Finally, a model for red cards conditional on the number of yellow cards is required. Denoting the number of home and away red cards by HR_k and AR_k and the fitted climate for red cards displayed in Figure 4.7 by CR_k, a straightforward model assumes a Poisson distribution in the likelihood, with home effect, intercept and slope parameters to be estimated.
Note that the parameters included in Equation 4.3.2 are not all estimated within the time down-weighted likelihood: λ_s, λ_m and ν are treated as parameters that are constant throughout time. However, the parameter estimation procedure for allowing team parameters to be based on more recent results also bases its estimates of the λ and ν parameters on more recent results, which is not desired. In practice parameter estimates are obtained using a procedure similar to that outlined in Section 3.2.3. Applying it to this example the procedure is as follows:
3. Again perform maximum likelihood estimation of the model described in Equation 4.3.2, but where the μ_h, μ_a, δ, s_h, s_a, λ_s, λ_m and ν parameters are treated as fixed at their current estimates.

By repeating this procedure at each time-point, estimates for each parameter are obtained.
Tables 4.10 and 4.11 display the estimates for team and referee parameters, obtained
at time-point 256 (by which time 124 weeks have elapsed in the data set) and at time-
point 512 (when 249 weeks have elapsed), the final time-point in the data set at the
time of writing. Note that Ipswich, Manchester City and Fulham had not played in
the Premier League by time-point 256, hence do not have any estimates here. Figure
4.8 plots the team and referee estimates for selected teams and referees over time. The
period where Blackburn's estimate is almost flat corresponds to the two year period
when Blackburn were not playing in the Premier League due to being relegated at the
end of the 1998/99 soccer season. The curve is not totally flat though, because although
Blackburn do not participate in any matches during this period, their opponents and
referees do. As a result, the parameter estimates for Blackburn are slightly re-evaluated
based on data about opponents and referees that the parameter estimation procedure
subsequently incorporates.
The predictive ability of the model can be assessed via its predictive likelihood statistic as defined in Section 3.1. Table 4.12 displays this statistic, plus predictive likelihood statistics for some simpler models, in order to gain a clearer picture of the model's accuracy. Note that the joint likelihood of the number of (home yellow, away yellow, home red, away red) cards is calculated rather than the points make-up, which has a rather less tractable distribution. Model 1 predicts that total bookings in any match will be the mean total bookings observed in all matches prior to the game, in
Table 4.10: Team dirtiness (α̂) and provocation (β̂) parameter estimates, with ranking displayed in brackets

Team | α̂, t=256 | α̂, t=512 | β̂, t=256 | β̂, t=512
Arsenal | 0.056 (11) | 0.124 (7) | 0.175 (3) | 0.145 (4)
Aston Villa | -0.008 (16) | -0.078 (22) | -0.025 (16) | 0.021 (13)
Barnsley | 0.039 (12) | 0.016 (16) | 0.005 (15) | 0.001 (17)
Blackburn | 0.099 (6) | 0.029 (14) | 0.065 (10) | 0.086 (8)
Bolton | 0.031 (13) | -0.012 (18) | 0.048 (13) | -0.077 (23)
Bradford | -0.15 (23) | -0.084 (23) | -0.071 (22) | -0.119 (26)
Charlton | -0.093 (20) | -0.046 (20) | 0.087 (8) | 0.063 (11)
Chelsea | 0.192 (2) | 0.162 (4) | 0.073 (9) | -0.056 (22)
Coventry | -0.001 (15) | 0.075 (8) | -0.026 (17) | 0.007 (15)
Crystal Palace | -0.034 (19) | -0.008 (17) | 0.052 (11) | 0.02 (14)
Derby | 0.197 (1) | 0.229 (2) | -0.049 (18) | -0.043 (20)
Everton | 0.12 (5) | 0.134 (6) | 0.175 (4) | 0.09 (6)
Fulham | - | -0.028 (19) | - | 0.233 (2)
Ipswich | - | -0.286 (29) | - | -0.091 (25)
Leeds | 0.121 (4) | 0.236 (1) | 0.22 (1) | 0.298 (1)
Leicester | -0.254 (25) | 0.02 (15) | 0.05 (12) | 0.092 (5)
Liverpool | -0.013 (18) | -0.093 (24) | -0.062 (20) | -0.027 (19)
Man City | - | 0.04 (12) | - | 0.06 (12)
Man United | -0.144 (22) | -0.072 (21) | -0.155 (24) | -0.164 (27)
Middlesbrough | 0.08 (9) | 0.055 (10) | 0.098 (7) | 0.082 (9)
Newcastle | -0.131 (21) | -0.178 (28) | 0.119 (6) | 0.068 (10)
Nottm Forest | 0.093 (7) | 0.071 (9) | -0.147 (23) | -0.078 (24)
Sheffield Weds | -0.205 (24) | -0.17 (27) | -0.26 (25) | -0.205 (28)
Southampton | 0.01 (14) | -0.157 (26) | 0.008 (14) | -0.045 (21)
Sunderland | 0.185 (3) | 0.163 (3) | 0.123 (5) | 0.162 (3)
Tottenham | 0.084 (8) | 0.034 (13) | 0.207 (2) | 0.006 (16)
Watford | -0.01 (17) | 0.046 (11) | -0.052 (19) | -0.026 (18)
West Ham | 0.078 (10) | 0.142 (5) | -0.063 (21) | 0.088 (7)
Wimbledon | -0.261 (26) | -0.134 (25) | -0.515 (26) | -0.36 (29)
other words that the booking rate in a match is not dependent on the referee, the
teams playing or the climate and can best be predicted by the overall mean number
of bookings for all matches. Model 2 is more sophisticated, where for each match
the home prediction is a combination of the mean number of yellows the home team
has collected, the mean number of yellows the away team has provoked and the mean
number of cards the referee has awarded in previous matches (all weighted according
to how recently these matches occurred). The away prediction is calculated similarly.
Also, the prevailing climate for bookings is accommodated. The exact method used
is outlined in Section 4.7.1 of the additional comments. This model has been devised
since it does not employ any advanced statistical methods, and might well be an ap-
proach a non-statistician, with access to the relevant data, would use. Model 3 is
the model incorporating factors F1 to F5 described in Section 4.3, hence does not
consider rivalries or incentives. Model 4 is similar to Model 3 but with rivalries and
incentives included, hence is the most advanced model constructed in this chapter and
Table 4.11: Referee parameter estimates at timepoints 256 and 512. The number in brackets is their ranking out of all the referees who had officiated at that time-point

Referee | t=256 | t=512
P.Alcock | -0.063 (20) | -0.037 (25)
G.Ashby | 0.007 (13) | 0.006 (19)
G.Barber | 0.148 (2) | 0.089 (9)
N.Barry | 0.037 (9) | 0.003 (20)
S.Bennett | 0.092 (6) | 0.057 (12)
M.Bodenham | 0.004 (14) | 0.001 (21)
K.Burge | -0.217 (25) | -0.095 (29)
M.Dean | - | 0.096 (8)
P.Dowd | - | 0.106 (4)
S.Dunn | -0.022 (17) | -0.05 (26)
P.Durkin | -0.197 (24) | -0.254 (34)
A.Durso | 0.031 (10) | 0.049 (13)
D.Elleray | -0.122 (22) | -0.148 (32)
C.Foy | - | 0.128 (3)
D.Gallagher | -0.176 (23) | -0.008 (23)
M.Halsey | -0.024 (18) | -0.174 (33)
R.Harris | 0.119 (4) | 0.078 (10)
P.Jones | -0.054 (19) | 0.001 (22)
B.Knight | 0.116 (5) | 0.105 (5)
S.Lodge | 0.031 (11) | 0.01 (18)
M.Messias | - | 0.015 (17)
G.Poll | 0.086 (7) | -0.069 (27)
D.Pugh | - | 0.022 (16)
M.Reed | 0.151 (1) | 0.101 (7)
U.Rennie | 0.048 (8) | -0.126 (31)
M.Riley | 0.004 (15) | 0.166 (1)
R.Styles | - | 0.139 (2)
P.Taylor | - | -0.015 (24)
A.Wiley | -0.021 (16) | -0.075 (28)
C.Wilkes | - | 0.103 (6)
A.Wilkie | 0.024 (12) | 0.027 (15)
G.Willard | 0.143 (3) | 0.065 (11)
J.Winter | -0.064 (21) | -0.124 (30)
E.Wolstenholme | - | 0.048 (14)
As Table 4.12 shows, the most advanced model is the one with overall the most accurate predictions. It is unfortunately not possible to produce equivalent figures for the bookmaker's predictions, since they provide only a prediction for the total number of points accumulated in the match, where 10 points are awarded for each yellow card and 25 points are awarded
Figure 4.8: Plots of team dirtiness and provocation estimates over time, for Blackburn, Newcastle and Man United; also plots of referee harshness estimates for D. Elleray, P. Durkin and G. Barber.
for each red card (as outlined in Section 1.3.2). These values cannot be converted into
Poisson-distributed predictions for home and away yellow and red cards.
Using the predictions produced from the most advanced model it is of interest to for-
mulate a betting strategy and observe the returns it would generate. Bets should be
placed when a discrepancy arises between the model predictions and the spread pro-
vided by a bookmaker. The model predictions for individual yellow and red cards can
easily be converted into predictions for points make-ups by summing the probabilities
of all the permutations of cards which result in each possible make-up. Figure 4.9 plots
the quoted spread prices against the model predictions of points make-ups. While there
is broad agreement, it is the points away from the diagonal which represent matches
of betting interest.

Figure 4.9: Quoted spread prices plotted against model predictions of points make-ups.
Let MB_i denote the model's predicted points make-up for match i and SB_i the midpoint of the bookmaker's quoted spread for the same match. Define K to be a cut-off value where if, for match i,

MB_i < SB_i − K − 2,

then a bet is placed on low bookings (and correspondingly, a bet is placed on high bookings if MB_i > SB_i + K + 2). The 2 point addition or subtraction appears because the bookmaker offers 4-point spread intervals, rather than a single number, in order to make its profit. The profit or loss made by following this betting strategy, for different values of K, is considered. Figure 4.10 plots annual returns, in points, for each season against the cut-off value K.
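A sketch of the settlement of this rule, assuming the sell and buy prices sit two points either side of the spread midpoint:

```python
# Points won or lost per match under the cut-off betting rule.
import numpy as np

def settle_bets(mb, sb, makeup, K):
    mb, sb, makeup = map(np.asarray, (mb, sb, makeup))
    returns = np.zeros(len(mb))
    sell = mb < sb - K - 2                 # model well below the spread
    buy = mb > sb + K + 2                  # model well above the spread
    returns[sell] = (sb[sell] - 2) - makeup[sell]   # sold at sb - 2
    returns[buy] = makeup[buy] - (sb[buy] + 2)      # bought at sb + 2
    return returns
```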
Figure 4.10: Plots of observed annual return against increasing values of the cut-off margin, for the 99/00, 00/01 and 01/02 seasons. The dotted lines represent 95% bootstrap confidence intervals.
The observed return curves are somewhat bizarre, since it seems that even a strategy based on a negative cut-off value makes a profit in all seasons. The 99/00 return curve must be regarded with some skepticism since it was during the middle of this season that the dramatic drop in booking rates highlighted in Section 4.2.3 occurred. In reality there was considerable uncertainty at that time, and many of the bets simulated and included in the 99/00 return curve could not have been placed with any confidence.
Note that the returns generated when the expected return is negative do not correspond to "random betting", since this strategy still excludes what the model considers to be especially unattractive bets even if the cut-off value K < 0. A random betting strategy does not do this. Interestingly, the sum of the spread sell points for the 00/01 and 01/02 seasons was 33844, while the total points make-ups for the same matches was 33320, meaning one would have achieved a profit of 524 points by selling every match.
The strategy employed in Figure 4.10 is rather naive since bets with equal expected return but different variances are treated equally, whereas the bets with lower variance are more attractive to many gamblers. For example, consider the two matches detailed in Table 4.13. According to the model predictions, the bookings total should be sold in both matches and both matches have similar expected returns. The difference in the variance of the returns, however, is substantial:

Table 4.13: Data for two matches in data set with equal mean returns
Date Home Away Spread Model Expected Variance
team team prediction return of return
20000514 Sheffield Weds Leicester 22-26 17.88 4.12 236.54
20000826 Everton Derby 52-56 47.92 4.08 639.66
" For the first match, the maximum possible win is 22 points and there is a 19%
chanceof this occurring. The probability of losing 50 points or more is 0.5%.
• For the second match, there is a 32% chance of winning 22 points or more, but the probability of a very large loss is also far greater than for the first match.

Figure 4.11: Density functions of returns on two bets with equal expected return but different variances.
There is another more subtle consideration concerning the variance of the expected profit of each bet. So far, the variance of the parameter estimates has been used only in Section 4.3.6 in order to evaluate the significance of extra covariates (including rivalries and incentives). In addition to the variance of the data given the conditional mean specified by the model, it may be worth considering the model within a Bayesian context and including also the variance of the parameter estimates that are present in the conditional mean for a given match. For example, the parameters employed in the prediction for a match involving a newly promoted team or a new referee are subject to more uncertainty than the parameters for a match involving teams and a referee that have been observed in many matches. Calculating the total variance of estimates of all parameters involved in the prediction of a match score is computationally awkward, but may be useful for constructing more refined betting strategies.
4.5 Possible improvements to the model

4.5.1 Hierarchical modelling using foul rates

One match statistic to which the bookings rate might well be related is the fouling rate: if the number of fouls in a match were known, the prediction of the number of bookings would be influenced. Define HF_k, AF_k to be the number of fouls by the home and away sides in match k and HY_k, AY_k to be the number of yellow cards. A multivariate model can be formulated that suggests a distribution for the number of fouls and, from that, a distribution for the number of yellow cards, where Θ^F represents a set of parameters which may determine the foul rate, such as the teams involved, and Θ^Y represents a set of parameters which determine the proportion of fouls which convert to yellow cards. These may also be team-specific. An approach similar to this is carried out on NFL match scores in the next chapter.
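One simple concrete version of this hierarchy, given here only as an illustrative sketch, takes the fouls as Poisson-distributed and the yellow cards as Binomial given the fouls, with a conversion probability playing the role of Θ^Y:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_home_bookings(foul_rate, p_convert, n_sims=10_000):
    fouls = rng.poisson(foul_rate, n_sims)       # HF_k given Theta^F
    yellows = rng.binomial(fouls, p_convert)     # HY_k given HF_k, Theta^Y
    return yellows

print(simulate_home_bookings(12.0, 0.15).mean())  # roughly 12 * 0.15 = 1.8
```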
4.5.2 Dependence of home and away bookings

The assumption made throughout this chapter that the booking rates of the home and away sides are independent of each other simplifies the model but it does seem dubious. For example, if a side collects five bookings against a side which collects none, that appears to be a `dirtier' performance than if the opposition had also collected five bookings, since in the latter case, the high bookings rate can be put down to the general temperature of the match. To investigate the assumption, the ratio

f(i, j) / (f_H(i) f_A(j))

is computed for each joint home and away bookings count (i, j), i = 0, ..., 9 and j = 0, ..., 8, where f, f_H, f_A are the joint and marginal empirical probability functions for home and away bookings.
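Given a two-dimensional array of joint bookings counts, these ratios can be computed as follows (an illustrative sketch):

```python
import numpy as np

def dependence_ratios(counts):
    joint = counts / counts.sum()                  # f(i, j)
    f_home = joint.sum(axis=1, keepdims=True)      # marginal f_H(i)
    f_away = joint.sum(axis=0, keepdims=True)      # marginal f_A(j)
    with np.errstate(divide="ignore", invalid="ignore"):
        return joint / (f_home * f_away)   # equals 1 under independence
```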
A pattern to Table 4.14 is observed: entries on or close to the (home bookings = away bookings) diagonal generally occur more frequently than the independence assumption predicts, and entries away from the diagonal less frequently.
Table 4.14: Frequency of observed joint scores divided by expected frequency given
independence assumption
Away bookings
0 1 2 3 4 5 6 7 8
0 1.76 1.28 0.85 0.66 0.61 0.4 0.3 0 0
1 0.93 1.01 1.09 0.89 1.11 1.08 0.76 0.45 0
2 0.77 0.97 0.95 1.23 0.85 1.27 1.47 1.96 1.53
3 0.44 0.8 1.11 1.24 1.34 1.48 0.96 1.15 2.67
Home bookings 4 0.15 0.42 1.02 1.52 1.91 1.79 3.1 5.54 3.23
5 0.44 0.22 1.06 1.97 1.94 0 2.24 0 9.31
6 0 0.53 2.85 0 0 0 10.86 0 0
7 0 0 0 5.15 0 0 0 0 0
8 - - - - - - - - -
9 0 0 0 5.15 0 0 0 0 0
4.6 Conclusion
Overall, the results obtained from the model implemented are quite encouraging since
consistent profits are made for each year that a relatively naive betting strategy is sim-
ulated. In fact, the profit curves displayed in Figure 4.10 are likely to be conservative
estimates since it is the averagespread available from four bookmakers rather than
the most favourable price offered that has been used to calculate hypothetical profit
curves. Therefore many of the winning bets in practice would have resulted in slightly
greater wins than recorded here and many of the losing bets would have resulted in
slightly smaller losses. Also, more bets would have been placed if a larger range of bookmakers' prices had been available. On the other hand, the high variability of booking rates in soccer means that all bets are relatively high risk. While it is true that with any gambling system stakes must be decided in such a way that the probability of financial ruin is kept to an acceptably small level, the non-negligible probability of very large make-ups (2.0% of matches result in a total points make-up of 100 or more) in booking rates means that any Sell bet is potentially risky.
Approximately 65% of bets are Sells if a cut-off value of 4 is used when placing bets.
In order to make large amounts of money by betting on this market one must be able to tolerate substantial swings in returns or, since the stake placed on any bet is restricted by the gambler, attempt to realise similarly profitable strategies but with a more stable return curve.
4.7 Additional comments and information
4.7.1 Generating model 2 predictions

Model 2, as employed in Section 4.4.2, creates predictions for yellow and red cards without recourse to formal statistical modelling. The prediction for the home number of yellow cards is built from several components. First, a climate estimate is formed by calculating the mean of the total number of yellows collected in the fifty matches prior to the time when the match of interest takes place. The likely increase in booking rates at the start of the season is estimated by a similar method to that described in Section 4.3.3, by using the mean jump in the climate at the start of previous seasons. This number is added to the climate at the end of the previous season to obtain the climate for the first fifty matches of any season. Also, after the sudden drop in bookings observed in January 1999 (week number 127), the mean number of yellows in all matches since week number 127 is used, until week number 134.
Figure 4.12: Predicted climate curve. The solid green lines denote the start of a new season; the dotted line denotes the time-point where referees were advised to be more cautious with regards to issuing cards.
Next, estimates for teams' attacking and provoking parameters and the referees' harshnesses are needed. This is done using a weighted mean, weighted according to time. Let HY_k and AY_k represent the number of home and away yellows observed in match k between sides i and j. Suppose team i has played at home in matches i_H(1), ..., i_H(N_{iH}) and away from home in matches i_A(1), ..., i_A(N_{iA}) prior to match k. The yellow cards they have collected in these matches are therefore HY_{i_H(1)}, ..., HY_{i_H(N_{iH})} and AY_{i_A(1)}, ..., AY_{i_A(N_{iA})}. Also, let t(k) be the time match k takes place.
Then when team i plays in match k the estimate of their attacking rate is defined
as follows:
$$\mathrm{attr}_k = \frac{\sum_{m=1}^{N_{iH}} HY_{i_H(m)} \exp(-w(t(k-1) - t(i_H(m))))}{\sum_{m=1}^{k-1} HY_m} + \frac{\sum_{m=1}^{N_{iA}} AY_{i_A(m)} \exp(-w(t(k-1) - t(i_A(m))))}{\sum_{m=1}^{k-1} AY_m} \qquad (4.7.1)$$
The rate defined by Equation 4.7.1 is a time-weighted mean of all of team i's home
and away yellows, divided by the mean home and away yellows in all matches before
match k. Equation 4.7.1 is equal to 1 if team i has an average booking rate, compared
to all teams. The weighting factor ω is set to be the same value (0.02) as that selected in Section 4.3.5, where the values for the external parameters for the yellow cards model were chosen.
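As an illustration, the following minimal Python sketch computes a rate of the form of Equation 4.7.1; the match records and the helper name attack_rate are hypothetical stand-ins rather than code from the thesis.

```python
import math

# hypothetical match records: (time, home team, away team, home yellows, away yellows)
matches = [
    (1, "A", "B", 2, 3),
    (2, "B", "C", 1, 4),
    (3, "A", "C", 3, 1),
]

def attack_rate(team, t_now, history, w=0.02):
    """Time-weighted yellows collected by `team`, each half divided by the
    total home (resp. away) yellows in all earlier matches, as in Equation 4.7.1."""
    num_home = num_away = 0.0
    tot_home = tot_away = 0.0
    for t, home, away, hy, ay in history:
        decay = math.exp(-w * (t_now - t))
        if home == team:
            num_home += hy * decay
        if away == team:
            num_away += ay * decay
        tot_home += hy
        tot_away += ay
    return num_home / tot_home + num_away / tot_away

print(attack_rate("A", t_now=4, history=matches))
```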
The provocation rate prov_k and referee harshness harsh_k are defined on a similar basis. The expected numbers of home and away yellows in match k are then

E[HY_k] = ( Σ_{m=1}^{k−1} HY_m / Σ_{m=1}^{k−1} (HY_m + AY_m) ) · climate_k · attr_i(k)k · prov_j(k)k · harsh_k

E[AY_k] = ( Σ_{m=1}^{k−1} AY_m / Σ_{m=1}^{k−1} (HY_m + AY_m) ) · climate_k · attr_j(k)k · prov_i(k)k · harsh_k
For newly promoted teams or new referees, equations of the type observed in Equation 4.7.1 are replaced by the home/away climate until the team or referee has participated in five matches.
4.7.2 Kernel Regression

A non-parametric regression technique, which has its roots in density estimation, is kernel regression. Given i.i.d. data (X_1, Y_1), ..., (X_N, Y_N), a suitable form that represents Y as a function of the X_i is required. A kernel function K(t) can be thought of as a generalisation of a weight function, and satisfies the condition that ∫ K(t) dt = 1. There are
various estimators that make use of kernel functions, one of the more popular choices
being the Nadaraya-Watson estimator, as outlined in Wand and Jones (1995):
m̂(x) = [ Σ_k K_h(x − x_k) y_k w_k ] / [ Σ_k K_h(x − x_k) w_k ]
where wk is the square root of the number of observations with value xk and Kh is the
kernel function with bandwidth h. The next decision is the choice of kernel function.
There are several, which have different properties. Most are conceived with the aim of minimising the mean integrated squared error (MISE), which combines the integrated squared bias and the integrated variance. Silverman (1986) details various methods that can be employed to find kernel functions which result in small values of MISE. One of these
is the Epanechnikov kernel, which is given by

K(t) = (3/4)(1 − t²) for |t| ≤ 1, and K(t) = 0 otherwise,

and it is this kernel that has been applied in this chapter. It also requires relatively
little computational effort, which is another important criterion.
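A minimal sketch of the weighted Nadaraya-Watson estimator above, with the Epanechnikov kernel, on synthetic data standing in for the booking climate:

```python
import numpy as np

def epanechnikov(t):
    # K(t) = 3/4 (1 - t^2) on |t| <= 1, zero elsewhere
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)

def nw_estimate(x, xs, ys, ws, h):
    """Weighted Nadaraya-Watson estimate at the point x with bandwidth h."""
    k = epanechnikov((x - xs) / h)
    denom = np.sum(k * ws)
    return np.sum(k * ys * ws) / denom if denom > 0 else float("nan")

# synthetic weekly "climate" data over 400 weeks
rng = np.random.default_rng(0)
weeks = np.arange(400.0)
obs = 3 + 0.5 * np.sin(weeks / 40) + rng.normal(0, 0.3, 400)
counts = rng.integers(5, 15, 400)   # matches observed in each week
weights = np.sqrt(counts)           # square root of the number of observations

for h in (2, 5, 10, 20, 50):
    fit = [nw_estimate(x, weeks, obs, weights, h) for x in weeks]
    print(h, round(fit[200], 3))
```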
It is also necessary to choose a suitable bandwidth h. There are various ways of doing this, depending on the intended use of the regression. In some cases it may be necessary to have an automated process that
chooses h by some objective process. In this case, since suitably powerful software
is available, it is possible to try out various values, look at the resulting curves, and
make a decision based on existing knowledge of the climate. Figure 4.13 displays curves
resulting from various choices of bandwidth. The curve arising from bandwidth set to
[Figure 4.13 panels: Bandwidth = 2, 5, 10, 20, 50; x-axis: week number]
Figure 4.13: Kernel regression estimates for different choices of bandwidth. The solid black line represents the observed moving average, the red line represents the estimated climate using that bandwidth
be 5 seems to provide the best fit.
Chapter 5

Estimating NFL scores: the threes and sevens distribution
NFL has traditionally been one of the most popular sports in the United States, among
gamblers and the general public alike. Betting on NFL is extremely popular and there
are numerous casinos that offer bets on the sport, usually in the form of a fixed odds
handicap bet (discussed in Section 1.3.1).
In this chapter firstly a brief overview of NFL is given, explaining the structure of
the season and the game regulations. Section 5.2 describes the data available while in
Section 5.3 a basic model, assuming two independent Normal distributions for the home
and away scores, is specified and fitted. Section 5.4 attempts to find an alternative to
the Normal distribution in order to represent the home and away scores. More match
data is incorporated into a large multivariate structure in Section 5.5 in an attempt to
find more accurate score predictions while Section 5.6 presents a more straightforward
use of this extra data. Section 5.7 presents the conclusions to the chapter.
" the regular season This involves six leagues containing five or six teams eachl.
Normally half of the matches take place between teams within the same league,
'This is true for the data set beinganalysedwhich containsmatchesuntil January28 2001.At the
start of the 2002/2003 seasonteams were reallocated into eight divisions each containing four teams
88
with the remaining matches being against selected teams from other leagues.
Opponents for matches outside a team's league are selected by the NFL admin-
istration so that successful teams from the previous season play other successful
5.1.2 Game regulations

The matches consist of four fifteen minute periods. Each team consists of two separate
squads of players, one being the offensive squad, one being the defensive squad. Each
squad contains 11 players. At the start of the first quarter one side is designated to
be in possession of the ball and this side fields its offensive squad while the side not
in possession of the ball fields its defensive squad. Upon a change of possession of the
ball, which can take place in several ways, the offensive players of the side that has just
lost possession are substituted by the defensive players in their side, while the side that
has won possession replaces its defensive squad with its offensive players. A detailed
explanation of many details of the match regulations and the important aspects of
NFL matches, such as the ways in which possession of the ball can be lost, is deferred
to Section 5.5.1 (in order to understand the intervening sections of this chapter, a
thorough knowledge of such details is not required). Points are scored either through
• Field Goals: these are scored when a team kicks the ball through a set of raised posts at the opponent's end of the field and are worth 3 points.

• Touch Downs: these are scored when a team places the ball over a line at the opponent's end of the field and are worth 6 points.

• 1-Point Conversions: after a Touch Down is scored, a team is given one extra play. Should they successfully kick the ball between the raised set of posts at the opposing end of the field using this play, they score one extra point.

• 2-Point Conversions: if, after a Touch Down, the team succeeds in placing the ball over the line at the opponent's end of the field with the extra play, they score two extra points.
If the two teams have an equal number of points after the four periods, an extra period, known as overtime, is played. This period ends as soon as one side scores either a Touch Down or Field Goal, with this side being declared the winner².
5.2 NFL data

Two data sets are available for this analysis and they are described below.

1. NFL final scores for the home and away side for seasons 1983/84 - 2000/01, along with a bookmaker's line for score differences.

2. For seasons 1997/98 - 2000/01 the following figures are available for both the home and away side:
" the points scored in each quarter of the match, including any overtime
periods
" the number of Touch Downs, Field Goals, 1-Point Conversions, 2-Point
Conversionsand DefensiveConversionsscoredin each match
9 the match totals for yards passed, yards rushed, number of attempted
passes, number of completed passes, number of rushes, number of inter-
2Strictly speakingthis period would alsoend if one side scoreda defensiveconversionand thus
yielded two points to the opposing side. This would be a bizarre tactic however, since it would result
in the side immediately losing the match.
90
9
Sý S
.ý8 s
8
O 10 20 30 40 50 so 0 10 20 30 40 so
Sý S
8
8
ceptions and time in possession of the ball (these terms are explained when
Sections 5.3 and 5.4 use the first data set, while Section 5.5 uses the second.

5.3 A basic NFL scores model
Figure 5.1 displays histograms for home scores (HSC), away scores (ASC), score differ-
ences and score totals. The home mean, away mean, home standard deviation, away
standard deviation and home and away correlation for scores are 22.15, 19.02, 10.41, 9.97 and −0.03 respectively. Two independent univariate Normal distributions seem to be the most obvious distribution to employ in order to model the home and away scores and this was the distribution chosen in several previous studies of NFL, including Stern (1991), Harville (1980) and Glickman and Stern (1998). Stern examines the fit of the Normal approximation, noting that 'it is not clear how to improve on them by other than ad hoc procedures'. Due
to the way in which points are scored in NFL, scores which are combinations of 3s and
7s are more likely to occur. Furthermore, by applying the Normal distribution one obtains

HSC ~ N(22.15, 10.41²)
ASC ~ N(19.02, 9.97²)

and it follows that

P(HSC < 0) ≈ 0.017,  P(ASC < 0) ≈ 0.028.        (5.3.1)

In practice NFL scores cannot be negative, thus the non-zero probabilities in Equation 5.3.1 are a source of concern. The Normal distribution will be employed in the
first model attempt, but concerns about its suitability, along with some alternative distributions, are discussed in Section 5.4. Another questionable assumption of the basic model is that the home and away scores are independent. In NFL, possession is crucial and any possession of the ball by one side implies lack of ball possession by the other, which restricts their scoring opportunities. The linear models summarised in Table 5.2 reveal some curious trends.
Table 5.2: Coefficients and significance levels (coefficient, p-value), modelling NFL Home Score (HSC) against Away Score (ASC), Home Rushed Yards (HRY) and Away Rushed Yards (ARY)

Model                    Coefficients and p-values
HSC ~ ASC                ASC: (0.00211, 0.94798)
HSC ~ ASC + HRY          ASC: (0.10526, 0.00056), HRY: (0.08326, 0)
HSC ~ ASC + HRY + ARY    ASC: (0.18828, 0), HRY: (0.07449, 0), ARY: —
While there is clearly enormous dependence between the play of the two teams, the structure of this dependence is not immediately obvious.
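Regressions of this kind could be reproduced along the following lines, here with statsmodels and synthetic stand-ins for the match data (the coefficients will naturally differ from those in Table 5.2):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# synthetic stand-in for the match data (HSC, ASC, HRY, ARY)
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "ASC": rng.normal(19, 10, n),
    "HRY": rng.normal(112, 50, n),
    "ARY": rng.normal(105, 50, n),
})
df["HSC"] = 22 + 0.1 * df["ASC"] + 0.08 * df["HRY"] + rng.normal(0, 9, n)

# the three nested models of Table 5.2
for formula in ("HSC ~ ASC", "HSC ~ ASC + HRY", "HSC ~ ASC + HRY + ARY"):
    fit = smf.ols(formula, data=df).fit()
    print(formula, dict(zip(fit.params.index[1:], fit.params.round(3)[1:])))
```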
For now a straightforward model is specified which can later be modified where
necessary. For match k that takes place between team i (k) and team j (k), at team
i(k)'s ground,
HSC_k ~ N(μ_k, σ²)
ASC_k ~ N(λ_k, σ²)        (5.3.2)

where

μ_k = μ + α_i(k) + β_j(k) + δ
λ_k = μ + α_j(k) + β_i(k)

and

• α_i(k), α_j(k) are offensive parameters for respectively the home and away teams

• β_i(k), β_j(k) are defensive parameters for respectively the home and away teams

• μ and δ are the global mean and the effect of playing at home.
In order to obtain MLEs for the parameters included in the model specified by
Equation 5.3.2, values for the external parameters, as defined in Chapter 3, must be
fixed. The process used in Section 4.3.5 concerning the analysis of booking rates is repeated here, by trying a range of values for these parameters and monitoring the predictive likelihood. Table 5.3 displays the predictive likelihood for different sets of values of the external parameters, and it appears that the near-optimal (time-down-weighting (c), offensive/defensive prior tightnesses (τ_αβ), seasonal truncation (w)) values are (0.05, 5, 20), which are highlighted in red. Table 5.4 displays the estimates of
the team parameters for this model at the final time-point in the data set. In contrast
to the estimates presented for Premier League soccer team abilities in Chapter 4, it is
rare that NFL teams have both a strong offense and a strong defense. The drafting
system used by the NFL that is described in Section 2.3 puts a ceiling on the number
of highly rated players that any squad can contain. As a result teams are forced to
make compromises concerning the quality of some sections of their squad. In the case
Table 5.3: Predictive likelihood obtained for different choices of external parameters for final scores

Truncation w = 5 weeks:
              Prior variance τ_αβ of offensive and defensive estimates
                   2           5           10          20
         0.005  -5696.8346  -5693.0002  -5698.3407  -5701.1115
         0.01   -5694.2798  -5687.5982  -5693.1642  -5696.1395
Weight c 0.02   -5692.254   -5678.6826  -5684.7334  -5688.226
         0.05   -5696.3219  -5665.0395  -5673.7137  -5680.1769
         0.1    -5708.4818  -5663.6557  -5680.9272  -5698.2981

Truncation w = 10 weeks:
              Prior variance τ_αβ of offensive and defensive estimates
                   2           5           10          20
         0.005  -5696.4878  -5691.8315  -5697.2038  -5700.0071
         0.01   -5693.5382  -5685.5082  -5691.1315  -5694.1826
Weight c 0.02   -5691.844   -5675.5469  -5681.7373  -5685.4487
         0.05   -5697.8174  -5662.9761  -5672.9169  -5680.8305
         0.1    -5710.4193  -5665.455   -5687.4899  -5710.9848

Truncation w = 20 weeks:
              Prior variance τ_αβ of offensive and defensive estimates
                   2           5           10          20
         0.005  -5695.9376  -5689.5608  -5694.9919  -5697.8628
         0.01   -5692.3865  -5681.6207  -5687.3551  -5690.5718
Weight c 0.02   -5691.7564  -5670.4412  -5677.0025  -5681.249
         0.05   -5700.7032  -5662.4368  -5676.0887  -5687.9537
         0.1    -5712.6495  -5670.3222  -5703.2531  -5741.8719

Truncation w = 30 weeks:
              Prior variance τ_αβ of offensive and defensive estimates
                   2           5           10          20
         0.005  -5695.6078  -5686.8531  -5692.3737  -5695.3268
         0.01   -5690.8531  -5677.4607  -5683.3596  -5686.7768
Weight c 0.02   -5691.3705  -5665.9569  -5673.183   -5678.1523
         0.05   -5706.0655  -5667.5716  -5687.5513  -5705.6403
         0.1    -5719.4015  -5679.3656  -5727.9586  -5789.005
of St Louis and Miami in particular, it is clear which aspect of the game they have chosen to specialise in.
Figure 5.2 plots a moving average of predicted scores versus observed scores, for the home scores, away scores, score differences and total scores. It reveals that the predictions are largely unbiased. The model takes no account of player availability: for example if a team plays a fixture without one or more of their most highly valued players, their expected score supremacy is usually lower. Since on average teams benefit
Figure 5.2: Plot of moving average of predicted scores versus moving average of observed
scores
from injuries to their opponents as often as they suffer from their own injuries, the
net effect of injuries for both sides across all matches is approximately zero. However,
there are biases in the predictions for matches where one, or both, of the squads is significantly weakened. Previous studies vary in the number of parameters used to represent team abilities, with many employing only a single parameter. However, Section 5.8.2
in the additional comments section of this chapter outlines and implements a technique
which compares the predictive power of models using differing numbers of parameters
to represent team abilities. The results suggest that using two parameters seems
suitable.
Only one parameter, b, is used to represent the effect of playing at home although
Glickman and Stern (1998) employed a separate home effect parameter for each team.
The method they used to test the need for such a specification is outlined briefly in
Section 2.5.1. It is plausible that with games being played in such a variety of climates,
Table 5.4: Rankings of all NFL teams after January 28, 2001

Team            Attack parameter  rank   Defense parameter  rank   Overall ability  rank
Baltimore            0.048          9        -0.391           1        0.439          1
Oakland              0.22           2        -0.095           9        0.315          2
Tennessee            0.048          8        -0.224           2        0.271          3
Indianapolis         0.176          4        -0.006          14        0.182          4
Denver               0.184          3         0.012          20        0.172          5
Tampa Bay            0.011         13        -0.138           4        0.149          6
Jacksonville         0.138          5        -0.004          16        0.141          7
St Louis             0.325          1         0.188          29        0.137          8
Pittsburgh           0.024         10        -0.102           7        0.126          9
NY Giants           -0.003         15        -0.126           5        0.123         10
Philadelphia        -0.025         18        -0.121           6        0.096         11
Miami               -0.069         24        -0.139           3        0.07          12
Green Bay            0.062          7        -0.004          15        0.066         13
Washington          -0.051         20        -0.099           8        0.048         14
NY Jets             -0.018         17        -0.054          10        0.036         15
Kansas City          0.019         11         0.004          18        0.015         16
Minnesota            0.128          6         0.119          27        0.008         17
Buffalo             -0.005         16         0.007          19       -0.012         18
Detroit             -0.065         23        -0.047          11       -0.018         19
Carolina            -0.053         21        -0.027          12       -0.026         20
New Orleans         -0.033         19         0.024          22       -0.057         21
Seattle             -0.001         14         0.069          23       -0.07          22
Dallas              -0.056         22         0.019          21       -0.075         23
New England         -0.112         25        -0.018          13       -0.094         24
San Francisco        0.017         12         0.155          28       -0.137         25
Chicago             -0.223         29        -0.003          17       -0.22          26
San Diego           -0.13          26         0.112          26       -0.242         27
Atlanta             -0.151         27         0.093          25       -0.244         28
Cincinnati          -0.208         28         0.075          24       -0.283         29
Arizona             -0.233         30         0.205          31       -0.438         30
Cleveland           -0.281         31         0.197          30       -0.478         31
and with journeys to some games being particularly long, the disadvantage of playing
at other grounds is not homogeneous. In all the models employed in this chapter only one parameter is used to represent the effect of playing at home, although further refinement is possible here. Figure 5.3 displays observed histograms of score frequencies along with frequencies predicted using the basic model, given three different predicted score intervals. It can be seen that scores are not Normally distributed and indeed they do not follow any standard statistical distribution. To understand the distributions observed in Figure 5.3 the way in which points are scored must be recalled: most points come from kicking Field Goals or scoring a Touch Down. Figure 5.4 displays the histogram for the entire set of final
[Figure 5.3 panels: matches where the mean scoring rate lies between given bounds, e.g. between 20 and 21]
Figure 5.3: Plots of observed histograms of score frequencies (_) along with theoretical
frequencies obtained assuming normal distribution applies (_) given three different match
means
scores, and peaks are observed at all numbers which are combinations of a low number
of 7s or 3s.
It is important to have a reasonably accurately specified distribution function for
scores when betting. As discussed in Section 1.3 one of the most widely available
betting markets for NFL is handicap betting. To illustrate the problem that arises
by using the Normal distribution to predict scores, two possible betting situations are
considered. For the purposes of these examples, the term `score difference' is used to
signify the home score minus the away score of a match. It is frequently of interest to know P(score difference > handicap). Suppose the basic model gives a predicted score difference, E(X − Y), of 2.5 points, while the handicap offered by the bookmaker is −2.5 points (i.e. it believes the median score supremacy of the home team over the away team is 2.5 points). A bet on the home side is won provided X − Y ≥ 3.
According to the basic model and applying a continuity correction, the probability of winning this bet is P(X − Y > 2.5) = 0.5, where X − Y ~ N(2.5, 2 × 9.14²)³.

[Figure 5.4: histogram of the entire set of final scores]

Of the matches where the basic model predicts a score difference between 1.5 and 3.5, 54.9% have a final score
difference ≥ 3. Hence the basic model estimates this bet to be less attractive than it is.
Meanwhile, suppose for another match that the basic model predicts that E(X − Y) = 4 and the bookmaker offers a handicap of −3. In this case it is tempting to back the home team and such a bet is won provided X − Y ≥ 4. According to the basic model, the probability of winning this bet is P(X − Y > 3.5) = 0.515, where X − Y ~ N(4, 2 × 9.14²). Of the matches where the basic model predicts a score difference between 3 and 5, only 49.8% have a final score difference ≥ 4. In this case, the basic model thinks this bet is more attractive than it is, since it is unaware that only a small number of matches (134) have a final score difference of 4 but many more matches (340) have a score difference of 3. A similar trend is observed for other handicaps that are aligned near the more frequent score differences.

³The MLE obtained for σ with the basic model is 9.14.
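The two model-based probabilities above can be reproduced directly, taking sd(X − Y) = √2 × 9.14 on the assumption that the two scores are independent, each with the σ given in the footnote:

```python
from scipy.stats import norm

# the basic model treats X - Y as Normal with sd sqrt(2) * 9.14
sd_diff = 2 ** 0.5 * 9.14

# Example 1: E(X - Y) = 2.5, handicap -2.5; bet wins if X - Y >= 3
print(round(1 - norm.cdf(2.5, loc=2.5, scale=sd_diff), 3))   # 0.5 vs 54.9% observed

# Example 2: E(X - Y) = 4, handicap -3; bet wins if X - Y >= 4
print(round(1 - norm.cdf(3.5, loc=4.0, scale=sd_diff), 3))   # ~0.515 vs 49.8% observed
```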
5.4 An alternative distribution for NFL scores

This suggests that the Normal distribution may not be adequate and it would be desirable to find a more accurate distribution for the difference in scores and total scores. There are two approaches which will be covered here.

1. Constructing a non-parametric density for scores, given the predicted mean, using kernel smoothing.

2. Modelling the total number of Touch Downs, Field Goals etc, instead of total score.
The first of these uses the observed proportions of scores, given predicted means, as the probabilities. So, for example, P(X = 0 | μ = 20.5) = 20/925, since of the 925 matches where either team's mean scoring rate was between 20 and 21, 20 resulted in a score of zero. However, the problem with this solution is seen by considering Figure 5.5.

Here the proportion of occasions in which the score was 21, given differing values of the predicted score, is plotted. The predicted scores are obtained using the model described in Section 5.3. To adjust for the continuity of the predicted score, P(X = 21 | μ) is defined as the proportion of matches for which the predicted mean lies in (μ − ε, μ + ε), for some chosen value of ε (0.1 in this case). A smoother version of Figure 5.5 is preferable, and by obtaining smooth versions for all scores, a full density for P(X = x | μ) for all values of x and all values of μ is obtained. To clarify this process, consider Table 5.5.

For example, 0.05714 of the matches where μ ∈ (25.65, 25.75) resulted in a score of 21. The NAs signify that no match actually had that predicted mean in the data set. The rows in Table 5.5 sum to 1. The accuracy of the probabilities in the rows of Table 5.5, which represent the density of interest, is improved by smoothing down the columns.
[Figure 5.5: proportion of matches in which the score was 21, plotted against expected score]
Table 5.5: Observed proportions of scores, for given means

        score frequency given μ
μ       0    1    2    ...   21   22   23   ...   98   99   100
0.0     NA   NA   NA   ...   NA   NA   NA   ...   NA   NA   NA
0.1     NA   NA   NA   ...   NA   NA   NA   ...   NA   NA   NA
0.2     NA   NA   NA   ...   NA   NA   NA   ...   NA   NA   NA
...
39.8    NA   NA   NA   ...   NA   NA   NA   ...   NA   NA   NA
39.9    NA   NA   NA   ...   NA   NA   NA   ...   NA   NA   NA
40.0    NA   NA   NA   ...   NA   NA   NA   ...   NA   NA   NA
Kernel regression, of the kind outlined in Section 4.7.2, is used for this. In effect smooth versions of the function f(μ) = P(X = x | μ) are obtained for all observed values of X. This density will be referred to as the NFL distribution. Figure 5.6 displays the density obtained for the scores 0, 7 and 21, once smoothing has been applied.
[Figure 5.6 panels: Score 0, Score 7, Score 21; x-axis: expected score]

Figure 5.6: Plot of f(μ) = P(X = x | μ) (_) with kernel-smoothed curve overlaid (_), for x = 0, 7, 21
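A rough sketch of how such a look-up density might be constructed: matches are binned by predicted mean (ε = 0.1) and each fixed-score column is then smoothed across μ. The Gaussian smoothing kernel and the final renormalisation are simplifying choices made for this sketch, not details taken from the thesis.

```python
import numpy as np

def nfl_distribution(mu_hat, scores, eps=0.1, bandwidth=1.0):
    """Sketch of the look-up density: empirical P(X = x | mu) on a grid of
    means, then kernel-smoothed along the mu axis for each fixed score x."""
    grid = np.arange(0.0, 40.1, 0.1)
    max_score = 100
    table = np.full((len(grid), max_score + 1), np.nan)
    for i, m in enumerate(grid):
        sel = np.abs(mu_hat - m) < eps
        if sel.sum() > 0:
            table[i] = np.bincount(scores[sel], minlength=max_score + 1) / sel.sum()
    # Gaussian-kernel smoothing of each fixed-score column over mu
    smoothed = np.zeros_like(table)
    for x in range(max_score + 1):
        ok = ~np.isnan(table[:, x])
        if not ok.any():
            continue
        for i, m in enumerate(grid):
            w = np.exp(-0.5 * ((grid[ok] - m) / bandwidth) ** 2)
            smoothed[i, x] = np.sum(w * table[ok, x]) / np.sum(w)
    # renormalise each row so the conditional density sums to one
    rowsum = smoothed.sum(axis=1, keepdims=True)
    return grid, np.divide(smoothed, rowsum, out=np.zeros_like(smoothed), where=rowsum > 0)

# synthetic stand-ins for predicted means and observed scores
rng = np.random.default_rng(2)
mu_hat = rng.uniform(14, 30, 4000)
scores = np.clip(np.round(rng.normal(mu_hat, 3)), 0, 100).astype(int)
grid, dens = nfl_distribution(mu_hat, scores)
print(int(dens[np.searchsorted(grid, 21.0)].argmax()))  # modal score near mu = 21
```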
Figure 5.7: Plot of Σ_x x·P(x | μ) for each value of μ, where the probabilities are those of the NFL distribution.
One possible concern with applying this technique is that, after the smoothing is applied to P(X = x | μ) for fixed values of x, there may be values of μ for which Σ_x x·P(X = x | μ) ≠ μ. This may arise since the kernel smoothing is applied with the aim of providing smooth curves to replace the more uneven curves of the type displayed in Figure 5.5. A consequence of this is that the values in Table 5.5 are adjusted by a small amount and it is conceivable that this could cause the calculated expected value for each row in Table 5.5 to differ from the expected value specified by μ. However, Figure 5.7 suggests this problem is noticeable only for the values towards the edge of the distribution, which in practice occur very rarely. It should be noted that, for any given value of μ, of the matches whose expected mean estimated using the basic model was close to μ, the observed average score differs slightly from μ. This can be observed by recalling Figure 5.2. It follows that even before smoothing is applied to the columns in Table 5.5, the rows do not have the property that Σ_x x·P(X = x | μ) = μ.
Figure 5.8 displays the same plots as Figure 5.3, for three different mean scoring rates, with the NFL distribution probabilities overlaid in green. The probabilities from the NFL distribution bear a closer resemblance to the observed histograms than those from the Normal distribution. While an accurately specified distribution improves the accuracy of predicted probabilities for future matches, it is not necessary in order to obtain consistent estimates for the parameters. In this case these are the teams' offensive
[Figure 5.8 panels: matches grouped by mean scoring rate, e.g. between 20 and 21]
Figure 5.8: Plots of observed histograms of score frequencies (_), theoretical frequencies
obtained assuming normal distribution applies (_), and also the computed NFL distribution
(_), for three sets of means
and defensive abilities, as well as the global mean, home effect and score variance.
In general, if the form specified for the density is incorrect but the conditional mean of the data generating process is specified correctly (that is, the functional form and explanatory variables are the same as those of the true data generating process), the maximum likelihood estimates remain asymptotically consistent. In particular this is true if the assumed density is a member of the exponential family, which the Normal distribution is. However, the fact that asymptotically consistent estimates
can be obtained only guarantees that as the amount of data available becomes infinite,
the estimates of the parameters converge to the `true' values. However, the rate at
which they converge to them increases the closer the assumed density is to the true distribution. Unfortunately the numerical routines such as Newton-Raphson that are employed
in order to maximise the likelihoods specified in this thesis cannot be applied easily
to a likelihood function that incorporates the probabilities from the NFL distribution.
This is because the probabilities of scores defined by the NFL distribution are avail-
able only by reference to a look-up table, for a finite number of means. The values
defined by it are not available as a continuous, well-specified equation. As a result,
the likelihood of the scores is not a function with continuous first derivatives, which
is an essential criterion in order to use the numerical maximisation routines employed
throughout this thesis. For this reason the MLEs estimated from the basic model are
used in the remainder of this section. The NFL distribution is employed in order to compute the probabilities required for betting. Although consistent estimates are obtained (despite an incorrectly specified probability density for the data), the fact that the variance is mis-specified means that the standard errors obtained for MLEs are invalid. As a result, valid t-statistics or confidence intervals for the parameters cannot be obtained using the basic model. That would be a problem if a selection of different models using a range of different factors were being fitted and the significance level of these factors were being investigated. In this application, the significance of individual factors is not the primary concern; a fuller discussion of these issues can be found at pp. 27-31.
It is of interest to seehow the two models above perform relative to the bookmaker's
line. One betting strategy is to place a bet on a match provided that, according to the
model, the probability of winning is greater than a cut-off value k, for example 0.55.
The success rate of such a strategy, for varying values of k, is displayed in Figure 5.9 for both the basic model and the NFL distribution model. Also included is a y = x line, which represents the curve that would be realised with a theoretically optimal model, where bets are placed knowing the true probability that the bet is successful. Overall, the plot is not conclusive, but it appears that for the majority of sensible candidate values for k, the bets made using the NFL distribution slightly out-perform those made using the basic model. Both models seem to perform quite respectably compared to the bookmaker's line. Note that only the proportion of bets won, rather than profit, is displayed. If a proportion q of bets is won at odds of 10/11, the expected profit per unit staked is
Figure 5.9: Proportions of bets won, where a bet is made provided P (Win) >cut-off, according
to both the basic model (_) and the NFL distribution (_)
(10/11)q − (1 − q)        (5.4.1)

This is positive if q > 0.524, although the rate at which profit is made is too slow for most gamblers unless the success rate is considerably higher than this. The curves of Figure 5.9 appear to win approximately 55% of the time if the estimated probability of winning exceeds the cut-off.
5.5 A multivariate model for NFL match data

As mentioned in Section 5.2, the second data set available includes only four years of data, but there is more data available for each match. While ultimately it is only the distribution of the home and away score that is of primary interest, it is conceivable that the marginal distribution for the home and away score, derived from a joint distribution of many match variables, may be more accurate than that of the basic model.

While none of the previous studies of NFL scoring rates that the writer is aware of include any information besides the scores and identities of the teams in the model specification, several papers suggest that some benefit may be derived by including
other match statistics. Glickman and Stern (1998) state that `use of covariate infor-
mation, such as game statistics like rushing yards gained or allowed, might improve
precision of the model fit [for score differences]', while Fahrmeir and Tutz's (1994)
NFL model is formulated in such a way that covariates besides team abilities can
be included. Harville, whose approach involves the use of mixed linear models, sug-
gests establishing a model for statistics such as total yards rushed, and monitoring the
correlation between the random effects of this model and the scores model.
The following statistics are available for each match:
" the number of Touch Downs (HTD, ATD), Field Goals (HFG, AFG), 1-Point
versions (HDC, ADC) scoredin each match by the home and away side.
. Yards rushed (HRYD, ARYD) and yards passed (HPYD, APYD). It is crucial
that a team moves the ball towards the opponent's end, firstly in order to increase
their chances of scoring points, and secondly because they are forced by the
regulations to surrender possession of the ball if they do not advance the ball
more than 10 yards every 4 plays (this is explained in more detail in Section
5.5.1). The ball can be advanced towards the opponent's goal either by running
9 The number of rushes (HR, AR), the number of attempted passes (HPA, APA)
The mean valuesof these figures are displayed in Table 5.6 in order to demonstrate
the approximate scale of each figure.
106
To create a joint distribution involving the covariates listed above, a set of marginal
and conditional distributions of the covariates must be established. There are a large
number of configurations for this, but the approach taken here reflects the approximate chronological order of events within a match, as displayed in Figure 5.10.

5.5.1 Match play and possession

Play effectively starts with a scrimmage, which is similar to the scrum in rugby and
involves a set of players from either side forming two lines standing opposite each
other. In NFL it is the offensive team that always begins with possession of the ball,
and the ball is almost always passed by the offensive players in the scrimmage to the
quarterback, who stands behind the scrimmage, protected from the opposing team's
defensive players by his own offensive players. The quarterback most often attempts
to pass the ball on to another player. This action is counted as a Pass Attempt. If this
pass is successful, the player receiving the ball either tries to run with the ball, which
is recorded as a Rush, or very occasionally pass it once more to another player (only
backwards passes are permitted in this case), which is recorded as a Pass Attempt.
This initial activity, which represents the start of any attack, is summarised in the
data set by the number of Rushes or Pass Attempts.
The first dependent variable is the decision the team makes concerning whether
to Rush or make a Pass Attempt. Now the procedure that follows a Rush or Pass
Attempt is considered.
A Rush almost always concludes with the player with the ball being impeded by the opposition, either by being thrown to the ground or forced to run out of the field of play. The action in the game stops and another scrimmage takes place from the place where the rushing player was halted, provided play from the last four scrimmages has resulted in the offensive team advancing at least ten yards towards the opponent's end of the field. If this is not the case, the offensive team loses possession of the ball to the defensive team, and all players on the field are substituted appropriately, as explained in Section 5.1.2. In the case of a Pass Attempt, three things can occur. Firstly, the player may catch the ball and continue to attack. Hence the number of Passes Completed as a proportion of the number of Passes Attempted is the next dependent variable. The other two situations occur if the Pass is unsuccessful. Normally the ball is not caught completely by either side in which case a scrimmage takes place from the point where the Pass Attempt was started, again provided that play from the previous four scrimmages has resulted in a gain of at least ten yards by the offensive
team. However occasionally (on average 1.00 times by the home side and 1.14 times by the away side in each match) a player from the defensive side catches the ball. In this case this player's side gains possession of the ball and becomes the offensive side. The number of Pass Interceptions records how often this occurs. Further points are scored through the 1-Point and 2-Point Conversions that may follow a Touch Down, and through Defensive Conversions. The rates at which these are achieved are approximately equal for all teams in all matches so, unlike the distributions described above, they do not require extensive treatment. The procedure adopted for them is described in Section 5.6.

Rush and Pass Attempts

Histograms of home and away rush and pass attempts are displayed in Figure 5.11. They appear to be Normally distributed and their (home, away) correlation coefficient is −0.527. The bivariate Normal distribution seems to be an appropriate distribution.
The most obvious way to model these is using binary logistic regression, using respectively HRPA, ARPA, HPA and APA as the group sizes, and the covariates being the team parameters and other relevant match information.
[Figure 5.10 diagram: HRPA, ARPA → HPA, APA, HR, AR → ...]

Figure 5.10: The conditional structure of a multivariate NFL model, with conditioning proceeding from left to right, then top to bottom
[Figure 5.11: histograms of home and away pass and rush attempts, and of home and away passed and rushed yards]
Total Passed Yards and Rushed Yards
Figure 5.11 displays histograms for the home and away total passed and rushed yards.
While the passed yards seem to follow Normal distributions, the rushed yards are significantly positively skewed. The most obvious alternative distribution is the gamma distribution. Unfortunately, maximising the likelihood of gamma distributed data proved awkward with the routines available, which are designed to maximise the likelihood where it is assumed the responses follow any member of the exponential family. The gamma distribution is a member of the exponential family but there are other members of the exponential family that simplify the task of maximising the likelihood. Since the correlation coefficient is 0.147 for Home/Away Passed Yards and −0.290 for Home/Away Rushed Yards, the bivariate Normal distribution is used for the parameter estimation process for both (HRYD, ARYD) and (HPYD, APYD). For prediction, a distribution that better reflects the skewness could instead be employed.
Pass Interceptions
The unconditional home/away means and variances are 1.00/1.14 and 1.152/1.120 respectively for these variables, which suggests that the assumption of equality of the mean and variance required when using the Poisson distribution is violated. Note that the Poisson condition is that the mean is equal to the conditional variance, conditional on any relevant covariates.

Touch Downs and Field Goals

The under-dispersion of these variables is more severe than recorded in Table 5.7
since the conditional variance, given the team parameters and the other covariates,
is less than or equal to the unconditional variance. For the ultimate application of this problem, it is necessary to calculate probabilities such as P(HSC > k). Since HSC = 3·HFG + 6·HTD (suppressing 1- and 2-Point Conversions and Defensive Conversions for now), if the Poisson distribution is employed to model Touch Downs and Field Goals, scores further from the mean have their probability of occurrence overstated.
Table 5.7: Touch Down and Field Goals means and variances 1997-2001

                 Touch Downs   Field Goals
Home mean           2.539         1.524
Home variance       2.184         1.365
Away mean           2.132         1.405
Away variance       1.942         1.35
Efron's Double Poisson distribution can accommodate such under-dispersion. Its density is

f(y | μ, φ) = K(μ, φ) φ^(1/2) e^(−φμ) (e^(−y) y^y / y!) (eμ / y)^(φy)        (5.5.1)

where

1/K(μ, φ) ≈ 1 + ((1 − φ) / (12μφ)) (1 + 1/(μφ)).

The mean and variance of the distribution are approximately μ and μ/φ. Although originally defined with the normalising constant K(μ, φ), this constant is suppressed for maximum likelihood estimation. Since the first order maximisation con-
ditions are the same as those for maximum likelihood estimation of Poisson distributed
data, the MLEs obtained using either approach are the same. However, the standard
errors decrease, in the case of under-dispersed data, which has two effects on the appli-
cation in question. Firstly, inferences obtained via p-values are affected and secondly,
the variances of the parameters change which, if the parameters are considered from a
Bayesian point of view, affects the variance of the predictive distributions.
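A small sketch of Equation 5.5.1 in Python, suppressing the constant K and normalising numerically instead; the rate and dispersion values are of the order estimated later in this chapter for home Touch Downs.

```python
import math

def double_poisson_pmf(y, mu, phi):
    """Unnormalised Efron double Poisson (Equation 5.5.1, constant K suppressed)."""
    if y == 0:
        # the y^y / y! and (e*mu/y)^(phi*y) terms both collapse to 1 for y = 0
        return math.sqrt(phi) * math.exp(-phi * mu)
    log_f = (0.5 * math.log(phi) - phi * mu
             - y + y * math.log(y) - math.lgamma(y + 1)
             + phi * y * (1 + math.log(mu / y)))
    return math.exp(log_f)

mu, phi = 2.539, 1.280   # home Touch Down mean and an illustrative dispersion
probs = [double_poisson_pmf(y, mu, phi) for y in range(30)]
total = sum(probs)
probs = [p / total for p in probs]   # normalise numerically instead of using K
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(mean, 3), round(var, 3))  # variance ~ mu/phi < mu (under-dispersion)
```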
Figure 5.12: Density of 3·FG + 6·TD, assuming FG and TD are Poisson distributed (_) and Efron distributed (_)

Figure 5.12 plots the density of 3·FG + 6·TD, first assuming a Poisson distribution for both HFG and HTD with the rate parameters set respectively to be
the overall means of the home Field Goals and Touch Downs. In red, the density is
plotted assuming Efron's Double Poisson distribution, with rate parameters as before,
but with dispersion parameter defined as the respective mean/variance ratios. It can
be observed that the probability of many low scores is far lower assuming Efron's
Double Poisson distribution.
Since the MLEs obtained by using Efron's Double Poisson distribution to model
the response variable are the same as those obtained using a standard Poisson distri-
bution, for parameter estimation purposes it is more convenient to employ the Poisson distribution. Then, for the prediction of match outcomes, Efron's Double Poisson distribution is employed using the parameter estimates
obtained with the Poisson distribution. In order to do this, an estimate for the dispersion parameter φ is required and the ratio of the mean to the conditional variance of
the variable is a suitable choice. After the final time-point, this was found to be 1.280
for Touch Downs and 1.185 for Field Goals5.
• HPA ~ Bin(HRPA, θ_HPA)
  APA ~ Bin(ARPA, θ_APA)        (5.5.3)

• (HPYD, APYD) ~ N₂(μ_HPYD, μ_APYD, σ_HPYD, σ_APYD, ρ_PYD)        (5.5.5)

• (HRYD, ARYD) ~ N₂(μ_HRYD, μ_ARYD, σ_HRYD, σ_ARYD, ρ_RYD)        (5.5.6)

• HTD ~ Pois(λ_HTD); for prediction, HTD ~ Pois₂(λ_HTD, φ_HTD)

• ATD ~ Pois(λ_ATD); for prediction, ATD ~ Pois₂(λ_ATD, φ_ATD)

• HFG ~ Pois(λ_HFG); for prediction, HFG ~ Pois₂(λ_HFG, φ_HFG)

where Pois₂ denotes Efron's Double Poisson distribution. The form of the mean terms such as λ_HFG or μ_ARPA has not yet been specified.
which produces predictions for future events could be obtained. With the large number of candidate covariates, however, a model may explain past data very precisely, but by modelling the random error rather than the underlying relationships. Hence the predictive power may well be disappointing. This problem is illustrated in Table 5.8, which considers models for Home Rushed Yards fitted to each season of the data. The coefficient for the number of Home Rush/Pass Attempts (HRPA) is highly significant for seasons 2 and 3, but the size of the coefficient, and hence statistical significance, is far lower in seasons 1 and 4. The coefficients for Away Rush/Pass Attempts (ARPA) and Away Pass Interceptions (APINT) display a similar problem for different seasons. Table 5.9 displays the results of performing binary logistic regression on the proportion of Home Pass Attempts (HPA) that result in
Table 5.8: Coefficients and p-values for Home Rushed Yards model, using various covariates, regressed over each season individually

Covariate          Season 1     Season 2     Season 3     Season 4
(coef, p-value)
HR                 (5.511, 0)   (5.852, 0)   (5.189, 0)   (5.533, 0)
a Completed Home Pass (HPC) using a number of covariates, where the regression is again performed separately for each season of the data. As in Table 5.8, some of the covariates' significance varies drastically from season to season. One possible cause is the correlation between some of the covariates. Techniques such as Principal Component Analysis could be considered in this situation. The approach taken here is to select only the most essential set of covariates for each model, although with further data a more systematic approach might be adopted. The procedure is as follows (a sketch of step 1 in code appears after the list):

1. For each of the models specified by Equations 5.5.3 to 5.5.9, obtain team pa-
rameter estimates using only the team abilities in the conditional mean. For example, the conditional means for the models specified by Equations 5.5.3 and 5.5.4 are defined to be, for match k between sides i(k) and j(k), of the form

E[X_k] = g(μ + α_i(k) + β_j(k) + δ)
E[Y_k] = g(μ + α_j(k) + β_i(k))

for a suitable link function g, where

• X_k and Y_k are set to be the home and away response variables in Equations 5.5.3 to 5.5.9. The distributions for X_k and Y_k are as stated in Equations 5.5.3 to 5.5.9.

• the α and β parameters are the teams' offensive and defensive capabilities with respect to the relevant response variable.

• μ and δ are the global mean and effect of playing at home, with respect to the response variable.

Note that for the other models, the specification of the conditional means is transformed through the link function since, for example, the proportion parameter in the binomial distribution, and the rate parameter in the Poisson distribution, are necessarily greater than zero.
2. Using the results of step 1, create team effects for each of the models featured in Equations 5.5.3 to 5.5.9. For each match this is α̂_i(k) + β̂_j(k) for the model for the home response and α̂_j(k) + β̂_i(k) for the model for the away response.

3. Fit each model including, in the conditional mean, both the team effects from stage 2 and the available covariates (obtained by reference to Figure 5.10). These models are all fitted separately for the data of each season. The consistency of the estimates across seasons is then examined; in light of this, the covariates are chosen with the aim of including all necessary information while minimising the risk of over-fitting by modelling random error.
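The sketch promised above: step 1 for a response of the kind in Equation 5.5.3, fitting a binomial GLM with team offence and defence dummies on synthetic data. All variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
teams = list("ABCDEF")
rows = []
for _ in range(200):
    h, a = rng.choice(teams, size=2, replace=False)
    rpa = int(rng.normal(61, 8))            # rush/pass attempts (group size)
    pa = rng.binomial(rpa, 0.54)            # pass attempts out of the group
    rows.append({"home": h, "away": a, "PA": pa, "fail": rpa - pa})
df = pd.DataFrame(rows)

# offence dummies for the home team, defence dummies for the away team
X = pd.get_dummies(df["home"], prefix="att", drop_first=True).astype(float).join(
    pd.get_dummies(df["away"], prefix="def", drop_first=True).astype(float))
X = sm.add_constant(X)

# binomial GLM: successes and failures supplied as a two-column response
fit = sm.GLM(df[["PA", "fail"]], X, family=sm.families.Binomial()).fit()
print(fit.params.round(3))
```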
The logic behind this procedure is that the difficulty, observed in Tables 5.8 and 5.9, of selecting which covariates are important may be reduced by first modelling as much of the variance as possible through the team effects. Based on this analysis, Table 5.10 displays which covariates have been selected for the final models.
" There is a general symmetry between the home and away categories, which is
logical, although the partly subjective selection of covariates was made using this
as a criterion. Total symmetry is not expected since NFL teams, as in many other
sports, vary tactics according to whether the match is being played at home or
away.
" The importance of information concerning Touch Downs towards the prediction
of Field Goals is logical, since usually when a team is faced with a potential
scoring opportunity, it has to decide between trying to obtain a Touch Down for
6 points or settling for a Field Goal for 3 points. Thus a larger number of Touch
Downs than expected is likely to lead to a smaller number of Field Goals than
9 One parameter to represent both the effect of a home covariate, such as HRPA
or HPA, on the home result and the effect of an away covariate, such as ARPA
118
Table 5.10: Final model for each covariate (rows include pass interceptions, touch downs and field goals)
Having specified a model for each covariate, it is important to recall a key problem
associated with the MLE method of obtaining parameter estimates of models that
feature covariates beside team abilities. By placing less weight on the information from matches that took place longer ago, teams' more recent performances contribute more to the estimates of their parameters. As a result, the information from less recent matches which is helpful towards estimating the effect of factors besides team parameters is also down-weighted. These effects are not considered time dependent, so all information concerning these factors is of interest; hence this is not a desirable property, as explained in more detail in Section 3.2.3. Hence the procedure outlined in that section is employed, namely

• estimating the coefficients of the covariates using all of the available data, without down-weighting, and then

• treating these estimated values as constants, estimating the team effects using the MLE procedure outlined in Chapter 3.
Table 5.11 displays results from the first stage of this process. The coefficientsfor
the covariatesobtained at a time-point halfway through the data set, and at the final
time-point are displayed.
For the second stage, it is necessary to repeat the process of finding the values for the down-weighting, team prior tightness and seasonal truncation which maximise the predictive power of the model. Table 5.12 displays the optimal values for the external parameters for each model.
Table 5.12: Optimized values of external parameters for each model involved in creation of joint distribution for NFL final scores

Model      c       τ_αβ    w
RPA        0.1     0.1     5
PINT       0.01    0.1    10
PCR        0.05    0.1     5
PASSYD     0.01    20      2
RUSHYD     0.01    20      2
TD         0.05    0.2     5
FG         0.05    0.2    10
Now that team parameters and the coefficients are available, predictions for each of the variables being modelled can be generated. The next stage is to examine how closely the joint distribution implied by the predictions obtained throughout the modelling process resembles the observed data. In Section 2.5.1 methods of validating a model are suggested. One of these, the discrepancy measure technique, is used here.
To summarise the technique, samples are generated using the specified distribution and
the parameter estimates. A scalar summary statistic of the data is calculated for the
observed data set and for each of the simulated samples. The value of these statistics
for the simulated samples is compared to the value of the statistic for the observed
data. Recall from Section 2.5.1 that the discrepancy measure technique can be applied in either a classical or a Bayesian framework. In this application, the maximum likelihood estimates of each parameter are employed in order to produce predictions for the covariates in the conditional mean of each model, but the distribution of these parameters is not considered. It is the distribution of the data that is of primary interest, hence the classical form of discrepancy measure test is employed. The discrepancy measures used in order to diagnose any problems with the predictive distribution of the statistics are the mean and variance of each of the variables, computed over the final two seasons; this second half of the data set is used so that sufficient data has been observed in order to make valid predictions. If the observed value for any of the statistics is not contained in the (2.5%, 97.5%) quantiles of the simulated data, that suggests a model deficiency. The results are now considered.
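A minimal sketch of the classical form of the check; the binomial model and the synthetic "observed" data are illustrative, chosen to mimic the over-dispersion of HPA discussed below.

```python
import numpy as np

def discrepancy_check(observed, simulate, n_sims=1000, seed=0):
    """Compare the observed mean and variance with the (2.5%, 97.5%)
    interval of the same statistics over simulated replicates."""
    rng = np.random.default_rng(seed)
    sim_means = np.empty(n_sims)
    sim_vars = np.empty(n_sims)
    for s in range(n_sims):
        rep = simulate(rng, len(observed))
        sim_means[s] = rep.mean()
        sim_vars[s] = rep.var(ddof=1)
    for name, obs, sims in (("mean", observed.mean(), sim_means),
                            ("variance", observed.var(ddof=1), sim_vars)):
        lo, hi = np.percentile(sims, [2.5, 97.5])
        flag = "" if lo <= obs <= hi else "  <-- outside interval"
        print(f"{name}: observed {obs:.2f}, interval ({lo:.2f}, {hi:.2f}){flag}")

# e.g. checking a binomial model for Home Pass Attempts (synthetic stand-ins)
rng0 = np.random.default_rng(42)
observed_hpa = rng0.normal(33, 8.1, 1006)            # over-dispersed relative to binomial
model = lambda rng, n: rng.binomial(61, 33 / 61, n)  # Bin(HRPA, theta), HRPA fixed at 61
discrepancy_check(observed_hpa, model)
```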
Table 5.13: Statistics for observed values of NFL variables, with confidence intervals of simulated values in brackets

Variable   Mean                      Variance
HRPA       60.82 (60.39, 61.89)      73.76 (69.79, 88.74)
ARPA       60.06 (59.42, 60.99)      84.41 (74.16, 95.4)
HPA        32.96 (31.88, 32.94)      65.89 (38.92, 49.26)
APA        33.34 (33.23, 34.32)      72.59 (41.59, 52.61)
HR         27.86 (28.22, 29.29)      71.17 (39.04, 49.34)
AR         26.71 (25.91, 26.98)      72.2 (39.86, 50.59)
HPC        19.07 (18.3, 19.06)       29.62 (19.21, 24.61)
APC        19.04 (18.69, 19.54)      31.24 (22.48, 28.91)
HPINT      1.07 (0.86, 1.03)         1.23 (0.84, 1.15)
APINT      1.14 (1.06, 1.26)         1.22 (1.08, 1.49)
HPYD       213.44 (202.46, 213.35)   5600.37 (4050.25, 5190.09)
APYD       203.74 (202.23, 213.75)   6014.05 (4538.7, 5745.8)
HRYD       111.59 (111.03, 118.93)   2778.79 (1972.28, 2467.92)
ARYD       105.3 (100.24, 109.98)    2541.86 (1760.42, 2230.69)
HTD        2.5 (2.39, 2.65)          2.31 (2.17, 2.9)
ATD        2.12 (2.03, 2.27)         2.01 (1.85, 2.48)
HFG        1.54 (1.5, 1.74)          1.43 (1.74, 2.58)
AFG        1.4 (1.38, 1.61)          1.35 (1.54, 2.17)
The variance of the simulated HPA is far lower than that of the observed data. Recall that HPA is modelled by estimating HPA/HRPA within a binary logistic regression framework, which assumes a binomial distribution where the group size is HRPA. The binomial distribution only has one parameter and may not be flexible enough to represent the process by which the data is generated in practice. One distribution that may be more suitable is the beta-binomial distribution, which has two shape parameters, a and b. The beta-binomial distribution is often used to model count data for which the variance is greater than the mean (and is thus overdispersed). By employing the beta-binomial density, both parameters a and b could be estimated in such a way that both the mean and variance of the simulated samples match that of the observed data more closely. The complicated nature of its density suggests that maximisation of the likelihood of data modelled in this way would require extra computational effort.
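A small illustration of the extra flexibility, using scipy's beta-binomial distribution; the group size and shape parameters are illustrative only.

```python
from scipy.stats import betabinom, binom

# the beta-binomial has two shape parameters (a, b), so its variance can be
# inflated relative to a binomial with the same mean
n = 61                          # group size, e.g. Rush/Pass Attempts
p = 33 / 61                     # target mean proportion
a, b = 5.0, 5.0 * (1 - p) / p   # mean of betabinom is n * a / (a + b) = n * p
print("binomial variance:     ", round(binom(n, p).var(), 2))
print("beta-binomial variance:", round(betabinom(n, a, b).var(), 2))
```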
These comments also apply to the number of Away Pass Attempts and the number of Rushes. The simulated mean of the Pass Interceptions is satisfactory for away teams, but not for home teams. The simulated values of these statistics are however computed using the previously simulated values of the number of Pass/Rush Attempts. There is an adjustment to the simulation procedure that may test more accurately the reliability of the Pass Interceptions model and the others conditioned on similar quantities: that is, to generate 1000 simulations based upon the observed, rather than simulated, values of Pass Attempts and Rush/Pass Attempts. This update is repeated for the remaining variables in the distribution. Table 5.14 displays the results.
Table 5.14: Observed statistics for variables along with confidence intervals of simulated values in brackets. Values are simulated using observed values of explanatory variables

Variable   Mean                      Variance
HPINT      1.07 (0.94, 1.06)         1.23 (1.04, 1.31)
APINT      1.14 (1.08, 1.22)         1.21 (1.18, 1.62)
HPYD       213.48 (207.86, 215.88)   5576.57 (5042.6, 6033.64)
APYD       204.48 (203.5, 211.35)    6077.13 (5234.15, 6377.64)
HRYD       111.73 (108.1, 113.91)    2772.12 (2610, 3162.87)
ARYD       105.14 (106.75, 112.27)   2543.75 (2311.98, 2776.07)
HTD        2.5 (2.47, 2.67)          2.3 (2.13, 2.77)
ATD        2.12 (2.1, 2.28)          2.01 (1.68, 2.24)
HFG        1.54 (1.45, 1.64)         1.42 (1.22, 1.63)
AFG        1.4 (1.38, 1.56)          1.34 (1.17, 1.55)
The intervals here are computed using observed, rather than simulated, values of the relevant explanatory quantities. A small number of the observed statistics lie outside the confidence intervals, however this is not unexpected given the large number of comparisons made. Overall the discrepancy measures applied here have not identified any clear deficiencies in the models for these variables. However it is possible that alternative summary measures would discover some discrepancies between the observed data and the simulated samples.
Ideally, to conclude this section of analysis, simulated values of home and away scores would be generated by using the simulated values of Touch Downs and Field Goals. Unfortunately, due to the unsatisfactory variance of simulated samples of some of the covariates involved in this modelling process it is clear that the variance of these simulated scores would not be accurate. Even if the variances of the simulated
quantities were closer to that of the observed data it would still be necessary to compare
the covariance structure of the simulated samples of data with that of the observed
data. In this case there is little interest in doing so since it has already been established
that some of the response distributions applied are not suitable. Therefore a more
straightforward approach than the one described in this section is now considered.
5.6 A quasi-multivariate model

In order to exploit some of the data from the in-match totals available, a less ambitious formulation is now considered, in which the numbers of Touch Downs and Field Goals are modelled directly and the other observed counts in the match enter the conditional means. For example,

E[HTD_k] = γ_td + α_td,i(k) + β_td,j(k) + λ_atd ATD_k + λ_afg AFG_k + λ_hfg HFG_k        (5.6.3)

with the expressions for E[ATD_k], E[HFG_k] and E[AFG_k] defined analogously, where

• γ_td, γ_fg are intercepts for the Touch Down and Field Goal scoring rates,

• the α_td, α_fg parameters are teams' abilities to score Touch Downs and Field Goals,

• the β_td, β_fg parameters are the teams' abilities to prevent opponents from scoring Touch Downs or Field Goals,

• the λ terms are the effect of observed Touch Downs or Field Goals in a match on scoring rates.

The λ terms, which are coefficients of the HTD, ATD and AFG terms in the above models, are all highly significant. This formulation is simple to implement and also
generates predictions for Touch Downs and Field Goals, which can then be combined
to generate a probability distribution that resembles the distribution for NFL scores
that was observed in Figure 5.4. Unlike the methods used in Section 5.4 there is no difficulty in maximising the likelihood of the responses in Equations 5.6.3. The approach described in Section 5.5.2 to model Touch Downs and Field Goals in the more complex multivariate model is also used here. To recap, the Poisson distribution is implemented when the likelihood is maximised and asymptotically consistent estimates are produced. However, the Touch Downs data is under-dispersed, meaning the variance of the data is lower than the mean. The same is true of the Field Goals data. As a result, a distribution that can simulate this feature of the data, such as Efron's Double Poisson (defined by Equation 5.5.1), is implemented for prediction. The ranking of team parameter estimates after the final time-point is displayed in Table 5.15. The lack of similarity between the
rankings of teams across the four categories further supports the suggestion that the
better teams do not consist of players of equal calibre throughout the squad.
Once probability distributions for HTD, ATD, HFG and AFG are obtained in this
way, in order to obtain a distribution for final scores it is also necessary to consider the distribution of 1-Point, 2-Point and Defensive Conversions (discussed in Section
5.1.2). The most obvious formulation is

H1C_k ~ Bin(HTD_k, θ₁)
H2C_k ~ Bin(HTD_k, θ₂)
HDC_k ~ Pois(θ₃)

where θ₁ and θ₂ represent the proportions of Touch Downs that result in a 1-Point or a 2-Point Conversion respectively, and θ₃ represents the average number of Defensive Conversions per match, with the away quantities defined analogously.
Table 5.15: Offensive and defensive ability estimates of NFL teams in terms of Touch Down and Field Goal conceding rates after 28 January, 2001

Team            TD offensive  rank   TD defensive  rank   FG offensive  rank   FG defensive  rank
                estimate             estimate             estimate             estimate
Arizona          -0.0926       28      0.0093       18     -0.337        29      0.3156       31
Atlanta          -0.0198       19     -0.0277       11     -0.2865       28      0.0771       22
Baltimore         0.0389       11     -0.0987        1      0.0436       12     -0.6111        1
Buffalo           0.0487        8      0.0541       26      6e-04        16     -0.0311       13
Carolina          0.0256       14     -0.0214       13     -0.1472       24     -0.014        14
Chicago          -0.0698       26     -0.0341        9     -0.3615       30      0.0757       21
Cincinnati       -0.1393       30      0.0432       22     -0.2717       27      0.1273       25
Cleveland        -0.156        31      0.0828       30     -0.4617       31      0.2446       29
Dallas            0.0375       12     -0.0178       14     -0.1356       23      0.0097       16
Denver            0.0616        6     -0.0873        2      0.2409        3      0.0572       20
Detroit           0.0394       10      0.0309       21     -0.0893       21     -0.0814       11
Green Bay         0.1009        2      0.0063       16      0.0126       15      0.037        19
Indianapolis      0.1534        1      0.0714       28      0.2242        4     -0.032        12
Jacksonville      0.1007        3     -0.0536        7      0.1606        6      0.0219       17
Kansas City      -0.0225       20     -0.0717        5      0.0951        8      0.0258       18
Miami             0.0644        4      4e-04        15     -0.1279       22     -0.225         4
Minnesota         0.0526        7      0.0517       25      0.1713        5      0.1421       27
New England       0.0464        9     -0.0311       10     -0.2006       26     -0.0035       15
New Orleans      -0.0304       22     -0.0365        8     -0.0292       19      0.0927       24
NY Giants        -0.041        23     -0.0788        4      0.0703        9     -0.1162        9
NY Jets           0.0204       16     -0.0221       12      0.0313       13     -0.1017       10
Oakland          -0.062        25      0.0759       29      0.2962        2     -0.1853        8
Philadelphia     -0.0759       27      0.059        27     -0.0492       20     -0.264         3
Pittsburgh       -0.0107       18      0.0435       23      0.0619       11     -0.1935        6
San Diego         0.021        15      0.0281       20     -0.1847       25      0.1404       26
San Francisco    -0.0263       21      0.0081       17      0.0695       10      0.1896       28
Seattle           0.026        13      0.0882       31      0.0185       14      0.0819       23
St. Louis         0.0053       17      0.0499       24      0.4339        1      0.2466       30
Tampa Bay        -0.0618       24     -0.0633        6     -0.0172       17     -0.204         5
Tennessee         0.0624        5     -0.0859        3      0.1036        7     -0.324         2
Washington       -0.1093       29      0.0151       19     -0.0229       18     -0.1866        7
Once all of the parameters above have been estimated it is possible to produce probability densities for the scores of the games. Firstly it is necessary to simulate, for example, 10000 samples of each of the distributions defined above. So for each match k, values HTD_k*i, ATD_k*i, HFG_k*i, AFG_k*i, H1C_k*i, A1C_k*i, H2C_k*i, A2C_k*i, HDC_k*i, ADC_k*i, i ∈ (1, 10000) are obtained. Then for each
Figure 5.13: Probability density obtained from quasi-multivariate model for New York Giants' final score, SuperBowl 2000/01
match, 10000 simulated values of the home and away scores can be obtained via

HSC_k*i = 6·HTD_k*i + 3·HFG_k*i + H1C_k*i + 2·H2C_k*i + 2·HDC_k*i
ASC_k*i = 6·ATD_k*i + 3·AFG_k*i + A1C_k*i + 2·A2C_k*i + 2·ADC_k*i
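A minimal sketch of this simulation step, with placeholder sampling distributions standing in for the fitted models of this section:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 10000

# placeholder sampling distributions standing in for the fitted models
htd = rng.poisson(2.5, n_sims)         # home Touch Downs
hfg = rng.poisson(1.5, n_sims)         # home Field Goals
h1c = rng.binomial(htd, 0.9)           # 1-Point Conversions follow Touch Downs
h2c = rng.binomial(htd - h1c, 0.5)     # 2-Point Conversions from the remaining TDs
hdc = rng.poisson(0.05, n_sims)        # Defensive Conversions

# combine the simulated components into simulated home scores
hsc = 6 * htd + 3 * hfg + h1c + 2 * h2c + 2 * hdc

# empirical probability density of the home score, as in Figure 5.13
density = np.bincount(hsc) / n_sims
print(density[:30].round(3))
```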
Figure 5.13 plots the density of scores for New York Giants' score in the final match in
the data set, which was the 2000/01 SuperBowl. This density is obtained by simulating
10000 outcomes using the distributions obtained in Section 5.6. The uneven density
that was treated in Section 5.4 is mimicked. Figure 5.14 displays a moving average plot
of predictions against observed values, which reveals that the predictions are broadly
reliable.
In Figure 5.15 the betting strategy attempted in Section 5.4.2 is repeated, where
bets are placed on the difference in scores and the total score of matches. The results are
interesting in that the bets on differences in scores generally win just about frequently
enough to ensure a positive expected gain (as discussed in Section 5.4.2, this requires a
win rate greater than 52.4%), provided bets are placed when the probability of success
is estimated to be greater than around 57%. Nevertheless the profit curve is not close to the red line which represents the proportion of winning bets one would realise if the model probabilities were the `true' probabilities. The return curve for bets on total scores is very disappointing, even though the lower graph in Figure 5.14 suggests that
[Figure 5.14 panels: Score differences; Total scores. x-axis: prediction]
Figure 5.14: Moving average plots of predicted score difference versus observed score differ-
ence, and predicted total score versus observed total score for quasi-multivariate model
[Figure 5.15 panels: Score differences; Total scores. x-axis: probability of victory]
Figure 5.15: Proportions of bets won, where a bet is made provided P(Win) > cut-off, according to the quasi-multivariate model.
in general the total score predictions are reasonable. Note that direct comparisons
between the return curves in Figures 5.9 and 5.15 are not possible since the return
curve in Figure 5.9 is based upon data observed since 1983-2001, while that of Figure
5.15 only includes data from 1997-2001.
A revealing comparison of the accuracy of the two model predictions against the bookmaker's line is observed by fitting two linear models. Denoting the model means for score differences and totals by ED_m and ET_m, and the bookmaker's equivalents by ED_b and ET_b, the observed score differences and totals are regressed on each pair of predictions, and the coefficients and confidence intervals of interest examined. It is clear that these linear models place more emphasis on the bookmaker's line. This suggests that the bookmaker's line generally predicts the final results more accu-
rately than the model developed in Section 5.6 does. As an aside, in order to make a
profit by betting on fixed odds events in this way, it is not necessary to have superior predictions to the bookmaker; this is demonstrated in Section 5.8.1.
5.7 Conclusion

A major problem when trying to model NFL scores is the non-standard nature of the distribution of final scores. Two approaches have been considered in tackling this problem. The first approach, covered in Section 5.4, constructs a non-parametric density using kernel smoothing techniques. While a distribution is obtained that reflects the uneven nature of NFL scores, and appears to improve slightly the proportion of winning bets with the bookmaker, the distribution cannot easily be used in order to obtain MLEs of parameters via standard maximisation routines. The main problem with such a model is its impracticality, in that probabilities can only be obtained by reference to a look-up table. The second approach, covered in Sections 5.5 and 5.6, predicts the events that form the scores, rather than the final score itself. Hence the
number of Touch Downs and Field Goals are modelled. Initially, this is attempted by
using many other statistics available for each match. Unfortunately the excessively
complicated relationship between these variables, and the fact that the statistical dis-
tributions of these variables are frequently quite complicated, prevents an accurate
marginal distribution for scores from being obtained. A simpler version of this model
is implemented in Section 5.6 and while reasonable predictions for scores are obtained,
130
the predictions inferred from the bookmaker's line are superior.
The focus of this chapter has been more on general statistical methods and little
consideration has been given to the nature of NFL itself. This is in contrast to Chapter
4 on yellow and red cards, where the effect of the prevailing climate, inter-team rivalries
and the pressure of matches are all taken into account in the model building process.
There is plenty of scope for improving the models outlined here in a similar way.
One possibility is to incorporate the identity of the players involved in each NFL match
and the extent of their participation. Certain players, such as the quarterback, are
central to the passage of play and many teams do not have two players of comparable
standard in that position. Injuries to a first choice quarterback or other key players,
which are not uncommon, are therefore likely to impact significantly on a team's
performance. Another possibility is to model the way teams alter their tactics through-
out a match depending on the score of the game. This is true of all sports: teams
frequently become more defensive if they are ahead. In NFL this tactic is
used far more regularly, since the stop-start nature of the game permits constant re-
organisation and re-evaluation of game strategy. However, all models in this chapter
assume a constant scoring rate throughout the match. An alternative is to use quar-
terly scores for each match (an approach used in Chapter 6 for NBA scores) or even
to use the times of scores and analyse matches by treating the scoring rates as birth
processes.
5.8 Additional comments and information

5.8.1 How a gambler can make a profit off a bookmaker with equally
accurate probabilities
With equally accurate probabilities, a gambler can make a profit by betting with a
bookmaker which offers odds for all events. To illustrate this, Table 5.16 displays a set
of bets on events which each have two possible outcomes and each (unknown to both
bookmaker and gambler) has a 50% probability of occurring. While the bookmaker
and the gambler disagree on the probability of many outcomes, they are overall equally
accurate. While Table 5.16 presents a simple example, the corollary of it can never-
theless be generalised to any situation where bookmakers and gamblers have differing
but equally accurate predictions.
Table 5.16: A gambler's decisions and expected returns if a gambler has equally good
predictions to the bookmaker

Bookmaker's  Inferred bookmaker's  Gambler's    Gambler's decision  Expected return
odds         probability           probability  to bet (Y/N)        for gambler
11:9         0.45                  0.55         Y                   11/18 - 1/2 = 1/9
9:11         0.55                  0.45         N                   0
9:11         0.55                  0.6          Y                   9/22 - 1/2 = -1/11
4:6          0.6                   0.55         N                   0
6:4          0.4                   0.45         Y                   3/4 - 1/2 = 1/4
11:9         0.45                  0.4          N                   0
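The expected returns in Table 5.16 can be verified directly: for odds a:b the bookmaker's inferred probability is b/(a+b), a winning unit stake returns a profit of a/b, and the gambler's expected return under the true probability p is p(a/b) - (1 - p). The short Python sketch below (the function names are illustrative, not part of the thesis) reproduces the rows of the table on which the gambler bets.

    from fractions import Fraction

    def inferred_probability(a, b):
        # Bookmaker's implied probability for odds a:b.
        return Fraction(b, a + b)

    def expected_return(a, b, p_true):
        # Expected profit on a unit stake at odds a:b when the true
        # probability of the outcome is p_true.
        return p_true * Fraction(a, b) - (1 - p_true)

    half = Fraction(1, 2)  # every outcome in Table 5.16 is truly 50/50
    for a, b in [(11, 9), (9, 11), (6, 4)]:
        print(a, b, inferred_probability(a, b), expected_return(a, b, half))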
Hence the only occasion when the gambler has an expected loss is when both the
bookmaker and the gambler overestimate the probability of a certain outcome but
the gambler overestimates it by more. However this expected loss is more than offset
by the expected gain on the occasions when both the gambler and the bookmaker
underestimate the probability of an outcome but the bookmaker underestimates it
more drastically. Both of these situations should occur equally often in the long run
since the model and bookmaker are assumed to be equally accurate overall. This leaves
the occasions when one of the gambler or bookmaker overestimates the probability of
the outcome but the other underestimates it. These correspond to the first two rows
of Table 5.16. The gambler either does not bet, or has a positive expected gain.
However the situations when the gambler's estimate of the probability of an outcome
is higher than that of the bookmaker do not occur frequently, since the bookmaker's
probabilities are always inflated to include their own overround. Hence although the
gambler can in principle profit with merely equally accurate predictions, such betting
opportunities arise less often in practice than Table 5.16 might suggest.
5.8.2 Levels of team parameterisation

This section considers how many parameters are required to represent the ability of
each team in any given sport. There are various levels of parameterisation that could
be considered, such as

• allowing attack and defense parameters for both sides, and a single home effect
parameter that applies for all teams

• allowing attack and defense parameters for both sides, and separate home advantage
parameters for each side. The team-specific home effect is subsumed by this
parameterisation.

Denoting the home and away scores in match k by X_k and Y_k, the simpler levels can
be written as

Model 1:  E[X_k] = E[Y_k] = γ
Model 2:  E[X_k] = γ + δ,  E[Y_k] = γ
Model 3b: E[X_k] = γ + δ + β_j(k),  E[Y_k] = γ + β_i(k)
" In 1
model all teams are assumed to be of equal ability regardless of whether
they play at their home ground.
" In model 2 all teams are assumed to be of equal ability except there is an effect
from playing at the home ground.
. In model 3a the identity of the opponent is irrelevant when predicting the the
9 Model 4 has the level of parameterisation used in each model so far in this
chapter.
" In model 5a, the expected goals scored by the home side, relative to the number
. Model 5b is similar to model 5a except the number of goals the home side con-
" In model 6, both the effect on scoring and conceding rates for the home side of
playing a match on their home ground, relative to a match played away from
their home ground, varies for each team.
One option would be to fit the models using the procedure described in Chapter 3 for
each level of parameterisation at every time-point in the whole data set and compare
predictive likelihood statistics. However, as a less labour- and computer-intensive
method to gain a rough idea of how many parameters need to be included, the following
procedure can be implemented for each level of parameterisation:

1. check whether the fit to past data improves significantly when the extra team
parameters are added;

2. monitor how well the estimates for a set of team parameters for one season can
be predicted from the previous season's.
The models have been fitted for NFL scores and, before stage 2 of the procedure is
implemented, it is checked that improvements in fit are observed with the addition of
the extra team parameters. In general, if the probability distribution used to model the
data is a member of the exponential family, and n extra parameters are included in a
model which do not improve predictions significantly, then the difference in the log-likelihood
between the two models, multiplied by two, is asymptotically χ²(n-1) distributed. This
procedure was employed by Maher (1982) to determine how many parameters are
required to represent soccer teams' abilities. The results of this check are displayed in
Table 5.18. The best fit is obtained with the maximal level of parameterisation.
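Stage 1 of this check amounts to a likelihood-ratio test. A minimal sketch follows, with placeholder log-likelihood values rather than the figures of Table 5.18:

    from scipy.stats import chi2

    def lrt_p_value(loglik_small, loglik_big, df):
        # Twice the log-likelihood gain, referred to a chi-squared
        # distribution with df degrees of freedom.
        return chi2.sf(2.0 * (loglik_big - loglik_small), df)

    # Illustrative values only: a 6-point gain from 3 extra free parameters.
    print(lrt_p_value(-31500.0, -31494.0, 3))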
While closer fitted values to past observations can be obtained by increasing the
number of team parameters, this does not guarantee that superior predictions can be
made. One possible reason is that the improvements in fit observed by increasing
the number of team parameters are caused by modelling the correlations within the
random error of the data. Some measures of goodness of fit penalise the addition of
extra parameters into a model in an attempt to prevent this. The method used here
to detect whether extra predictive power can be obtained by using more team parameters
is to check whether the team ability estimates evaluated in one year are informative
about the team's ability the next season, by applying the following simple least squares
regression:

W_a = τ_0 + τ_1 W_(a-1) + ε

where W_a denotes a team's fitted parameter in season a. Table 5.19 displays the coef-
ficients and p-values of the τ_1 terms for each season, for each level of parameterisation.
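The regression check itself takes only a few lines; in the sketch below the parameter vectors are simulated stand-ins for the fitted team abilities, not values from the thesis.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    w_prev = rng.normal(0.0, 2.0, 31)                  # last season's fitted abilities
    w_curr = 0.8 * w_prev + rng.normal(0.0, 1.0, 31)   # this season's, partly predictable

    # tau_0 is the intercept, tau_1 the slope of interest.
    fit = stats.linregress(w_prev, w_curr)
    print(fit.intercept, fit.slope, fit.pvalue)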
Table 5.19: Coefficients and p-values obtained using previous year's parameters to
predict next year's, for NFL, 1997/98 to 2000/01

Regression  Coefficient and p-value of
applied     previous year's parameter
2-1         0.86, 0
3-2         0.39, 0.06
4-3         0.9, 0
It appears that overall there is a benefit in terms of predictive capability only up
to a certain number of team parameters.
Chapter 6

Estimating NBA scores: a question of quarters

Basketball is a sport with worldwide appeal and has been popular since the 1950s.
This is particularly the case in the United States, where the sport has been governed by
the NBA since 1949. The NBA league forms the focus of the research carried out in this chapter.
Initially a model is constructed for NBA scores similar to the model specified by
Equation 3.1.1, with parameters being estimated using the MLE procedure described
there. Subsequently, enhancements of the kind applied in Chapters 4 and 5 will be
considered. Recall from Chapter 4 that details of specific Premier League soccer
matches beyond the abilities of the two teams playing were examined, such as the
importance of the match result or any particular rivalry between the two soccer teams.
Similarly, in Chapter 5, match statistics besides the final scores of the two teams were
considered, such as the number of yards either NFL side gained in the match.
Approaches similar to these are taken in this chapter along with some new methods,
such as studying whether the scoring rate adjusts during the course of an NBA match.
The data set for NBA is larger than that available for the earlier studies, which assists
in examining various aspects of the data, thus aiding the model enhancement process.
The structure of this chapter is as follows: initially the rules of NBA are sum-
marised. In Section 6.2 the available data is introduced and a basic model for NBA
scores is created in Section 6.3. The limitations of this model are discussed via an
exploration of the data in Section 6.4. The information gained here is used to specify
a more advanced model in Section 6.5, the accuracy of which is compared to both the
basic model and the lines offered by a professional bookmaker in Section 6.6. Section
6.7 concludes the chapter.
6.1 The rules of the NBA

The model construction process in this chapter considers both the rules of an individ-
ual NBA game and the regulations that govern the league structure, so a brief
summary of both is now presented.

6.1.1 League structure
The structure of the NBA league is as follows. 29 teams participate and they are
grouped into four different divisions. These are the Atlantic and Central Divisions,
which combine to form the Eastern League, and the MidWest and Pacific Divisions,
which form the Western League. The NBA season is divided into a regular season and
a post-season. During each regular season, each side plays 82 games between November
and April. This means that teams play a fixture almost every two days throughout
this period. Roughly two thirds of these regular season games are played against teams
within a team's own league. The 16 teams who perform best in these divisions progress
to the post-season play-offs. These teams are grouped into eight pairs and in the first
round, the teams in these pairs play against each other until one side has beaten the
other three times. The eight victorious teams progress to another knockout stage where
they are again placed into pairs and the victorious team from each pair is the first team
to beat the other team in the pairing four times. This leaves four teams progressing to
the subsequent round where a similar procedure is followed so that only two teams
remain. These two teams qualify for the final round, which takes place in June. Again,
the two teams play each other until one side has beaten the other four times. The
team that achieves this is the League Champion.
6.1.2 Game regulations

Each team names twelve players for each game, with only five allowed to play on the
court at any one time. Each game is split into four quarters, each lasting 12 minutes
of playing time. Points are scored by placing the basketball into the net at the opposing
team's end of the playing court. If a successful shot is taken within the three-point
arc, approximately 7.25 meters from the net, 2 points are scored; if it is taken from
beyond this distance, 3 points are scored. If a foul is committed against a player while
they are in the act of shooting, that player's team is awarded one or two shots (known
as free-throws) at a distance of four meters from the net. The opponents are not
allowed to defend these shots. For each free-throw scored, one point is awarded. Should
the scores be level at the end of the fourth quarter, the game then goes into overtime,
where another five minutes of play are undertaken in order to decide the match winner.
If the scores are still level at the end of this period, further overtime periods are played
until a winner emerges.

Positions in the league tables, which determine which teams qualify for the play-offs
tournament, are decided according to the percentage of games the teams have won.
Should this percentage be equal for two or more teams, a rather complicated set of
rules determines the order in which these teams are ranked, such as individual results
between these teams, or the percentage of games won against other teams in the league.
It is only whether a team wins or loses that is recorded when teams are ranked in their
division and the margin of victory is not relevant at any stage. So, whether a team
wins a match by 1 point or by 30 points makes no difference to its league position.
6.2 NBA data

The data set over which the models of this chapter are developed includes all regular
and post-season matches from the 1997/98 season until the 2000/01 season. The
1997/98, 1999/2000 and 2000/01 regular seasons all consist of 1189 matches while
the respective post-seasons each consist of approximately 70 matches¹. There was a
players' strike in the 1998/99 season so the regular season of that year only contained
725 matches, which took place between February and May of 1999. The total number
of matches in the data set is 4568. This compares to 1020 matches in the main NFL
data set used in Chapter 5 and the 1900 Premier League soccer matches used to predict
booking rates in Chapter 4. The data available for each match includes, for both the
home and the away side: the final score; the total number of shot attempts; the
numbers of 2-point, 3-point and free-throw attempts; and the bookmaker's line for
both the difference in score and the match total.

¹ Since the post-seasons are a set of mini-tournaments, with each being won by whichever team
wins for the third time, in the case of the first round, and the fourth time in subsequent rounds, these
mini-tournaments can consist of a variable number of matches. So the total number of post-season
matches changes from year to year.
In order to provide an idea of the scale of these figures, Table 6.1 displays this
information for the first five matches in the data set.
Table 6.1: First five matches in data set. The figures for the home team are listed
above the figures for the away team

Date      Teams      Score  Total     2-Point   3-Point   Free Throw  Bookmaker's
                            Attempts  Attempts  Attempts  Attempts    Line
19971031  Boston     92     105       72        18        15          -9
          Chicago    85     108       71        8         29
19971031  Vancouver  88     109       76        13        20          2
          Dallas     90     106       63        11        32
19971031  Miami      114    110       62        25        23          7
          Toronto    101    124       83        9         32
19971031  Charlotte  85     114       57        13        44          2
          New York   97     103       59        10        34
19971031  LA Lakers  104    115       49        29        37          2.5
          Utah       87     117       77        7         33
For the following analysis, scores at the end of the fourth quarter are studied.
While extra information is available by using the final score (which includes points scored
in overtime periods), in order to make valid comparisons between the points scored in
each game it is necessary to compare points scored in equal periods of time. It is also
important that scores at the conclusion of the fourth quarter are used rather than the
final score including overtime periods, because if points scored in overtime are included
in the final score, the score variance estimate is inflated. So for this chapter, the term
'score' is defined as the score at the end of the fourth quarter rather than the final
score after any overtime.

The (home mean, away mean, home standard deviation, away standard deviation,
home and away correlation) for scores are respectively (95.88, 92.69, 11.83, 11.02, 0.37).
Figure 6.1 displays a histogram of combined home and away scores for this data set.
Its symmetry, combined with the relatively high correlation between home and away
scores, suggests that a bivariate Normal distribution is a reasonable model for the
scores.
6.3 A basic NBA scores model

The pair of scores in match k is assumed to follow a bivariate Normal distribution,

(HSC_k, ASC_k) ~ N_2((μ_k, λ_k), Σ)

with

μ_k = γ + α_i(k) + β_j(k) + δ
λ_k = γ + α_j(k) + β_i(k)

where

• HSC_k, ASC_k are the home and away scores in match k

• α_i(k), α_j(k) are offensive parameters for respectively the home and away teams

• β_i(k), β_j(k) are defensive parameters for respectively the home and away teams

• σ_h, σ_a are the home and away score standard deviations, and ρ the correlation
between home and away scores, which together determine Σ

• γ is the overall scoring level and δ the home advantage effect.
External parameters were chosen by the procedure applied in Section 4.3.5, to obtain
parameter estimates for a yellow card model, and in Section 5.3, to obtain parameter
estimates for a model of NFL scores; that procedure is repeated at this stage. Table
6.2 displays the predictive likelihood obtained for a range of values of the external
parameters. A notable difference arises between the NFL model and the NBA model:
for NFL the near-optimal value of c is 0.05 whereas here it is 0.1. In the NBA
application, a 10% weight is placed on a match 23 weeks ago when the likelihood is
maximised, whereas for the NFL model, a match 46 weeks ago has a 10% weight. NBA
teams play approximately three times a week for up to seven months of the year,
whereas NFL teams play once a week for up to five months. It follows that the
likelihood maximisation procedure for NFL scores includes a larger number of less
recent matches in order to increase the amount of information contributing to the
estimates.
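The quoted figures are consistent with an exponential down-weighting of past matches, weight(t) = exp(-ct) with t measured in weeks, since exp(-0.1 x 23) and exp(-0.05 x 46) are both roughly 0.10; the functional form is an assumption here, inferred from those two quoted values.

    import math

    def match_weight(c, weeks_ago):
        # Down-weight applied to a match played weeks_ago weeks earlier,
        # assuming exponential decay at rate c.
        return math.exp(-c * weeks_ago)

    print(match_weight(0.10, 23))  # NBA choice: roughly 0.10
    print(match_weight(0.05, 46))  # NFL choice: roughly 0.10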
Table 6.2: Predictive likelihood obtained for different choices of external parameters
for final scores

Truncation w = 5 weeks:
                Prior variance τ_αβ of offensive and defensive estimates
                2            5            10           20
Weight c  0.01  -31596.3077  -31580.6296  -31590.3172  -31593.8397
          0.05  -31521.145   -31420.2025  -31428.6001  -31433.8074
          0.1   -31580.2486  -31366.7204  -31370.8393  -31380.7614
          0.2   -31678.2108  -31364.5173  -31365.1443  -31394.373

Truncation w = 10 weeks:
                2            5            10           20
Weight c  0.01  -31589.7093  -31567.5079  -31577.0747  -31580.5907
          0.05  -31514.5357  -31398.5752  -31406.7237  -31412.4433
          0.1   -31591.1065  -31359.3255  -31360.6186  -31371.5109
          0.2   -31664.3299  -31361.3923  -31372.5807  -31412.9664

Truncation w = 20 weeks:
                2            5            10           20
Weight c  0.01  -31598.5711  -31544.7656  -31554.0958  -31557.6064
          0.05  -31510.5953  -31370.856   -31378.1846  -31454.8696
          0.1   -31588.9692  -31349.4321  -31352.4165  -31367.8302
          0.2   -31653.1635  -31897.5825  -Inf         -Inf

Truncation w = 80 weeks:
                2            5            10           20
Weight c  0.01  -31601.3265  -31553.884   -31563.7258  -31566.3006
          0.05  -31519.1974  -31381.4687  -31389.139   -31464.7527
          0.1   -31596.4297  -31353.7981  -31357.4613  -31378.109
          0.2   -31667.769   -31913.4894  -Inf         -Inf
The team ability estimates after the final match in the data set on 3 June 2001 are
displayed in Table 6.3. The frequent large differences in the values of the offensive and
defensive parameters of a team that were observed in the NFL study are also seen here.
Like the NFL, the NBA also includes a 'draft' system whereby the least successful
teams each season have first choice of the graduating college Basketball players for the
next season. These players are obliged to play for the team for a minimum of five
years. This limits the number of 'star' players that play for any team and it follows
that it is difficult for a team to be one of the best both offensively and defensively.
Furthermore, within a match a team's ability to score points is to some extent
proportional to how willing they are to risk conceding points to their opponent. As a
result each team's offensive and defensive abilities are linked to its style of play. This
second consideration applies far less to NFL since the offensive players and defensive
players do not play at the same time in the match. Thus the defensive players are not
normally expected to switch to an attacking mode of play and similarly for offensive
players.
Table 6.3: Offensive (α̂) and defensive (β̂) NBA team ability estimates according to
basic model, June 2001

team          α̂        rank   β̂        rank
Portland      0.896     12     -1.4147   9
Boston        1.4723    9      2.3218    20
Vancouver     -3.4054   25     2.4879    22
Miami         -6.9943   29     -5.0345   4
Charlotte     0.5661    13     -4.3919   5
LA Lakers     8.8196    1      -3.0014   7
Orlando       3.3811    5      3.1527    25
New Jersey    -1.2154   19     4.8569    26
Denver        0.2832    14     2.7809    24
Detroit       -0.1271   17     -0.6943   13
Houston       2.8479    6      0.1362    15
Philadelphia  -2.5215   24     -5.6697   3
Phoenix       -1.7862   23     -2.9609   8
Minnesota     1.3942    10     0.5105    16
Milwaukee     5.2354    4      -1.0619   12
Chicago       -4.6988   26     2.0572    19
LA Clippers   0.1763    16     -0.2841   14
Atlanta       0.233     15     6.9112    27
Utah          -1.2823   20     -3.0231   6
Indianapolis  -1.535    22     -1.1342   11
Seattle       1.5205    8      -1.1994   10
San Antonio   -1.2118   18     -7.0554   1
Washington    1.0371    11     8.6027    29
Sacremento    7.537     2      1.5947    18
New York      -5.9292   28     -6.7165   2
Cleveland     -1.4035   21     2.4694    21
Dallas        5.249     3      2.5838    23
Toronto       1.8681    7      1.5085    17
Golden State  -4.7234   27     7.3505    28
The model predictions are plotted against the moving average observed difference in
score in Figure 6.2. Figure 6.3 plots the model predictions against the line offered by
the bookmaker. From Figure 6.2 the predictions appear to be generally sensible, so it
is not surprising that in Figure 6.3 a broad similarity between the two sets of predictions
is observed. The matches furthest from the diagonal represent the matches where the
bookmaker and the model disagree most strongly; these are examined further in
Section 6.4.6.
Figure 6.2: Moving average of expected (home-away) scores plotted against moving average
of observed (home-away) scores
6.4 Possible improvements to the basic model

While the basic model can be used to produce sensible predictions for matches, there
are several restrictions implicit within this model that could be relaxed. Among the
most important are the following:

• it is assumed that scoring rates are constant throughout a match, regardless of
the size of the difference in score at any point. In practice the tactics of teams
alter during the course of a match, depending on the score of the match, the
condition of the players, the tactics being used by opposing teams and other
factors.

• NBA teams play a large number of games within a short time-period, as described
in Section 6.1. Players may become tired but the basic model does not adjust
for this.
Figure 6.3: Plot of expected difference in scores according to model against bookmaker's line
• the basic model allows teams' abilities to adjust over time at a single rate, via the
external weighting parameter (c). While the underlying ability of the players and
management may adjust in the long term, it is conceivable that the standard of
their performances fluctuates in the short term due to factors which vary more
quickly, such as confidence.

• match scores are the only data values included in the model. Extra accuracy
may be achieved using some of the match totals available for other aspects of the
game, as listed in Section 6.2.

• two parameters are used to express the abilities of teams, whereas other
parameterisations are possible, as investigated in Section 6.4.5.

• the identity of the players participating in a match is likely to affect the scoring
rates, whereas the basic model does not consider the squad details of any fixtures.

Throughout the rest of this section, each of these restrictions is evaluated by exploring
the data.
6.4.1 Truncation of winning margins
As mentioned in Section 6.1, the margin of victory does not affect a team's ranking
within its division. If a team has a comfortable lead at the conclusion of the third
quarter of play, in certain situations the team may continue to try to increase the score
difference: with the large amount of attention devoted to statistics in NBA coverage
in the US sports media, teams and players may attempt to make their own individual
figures as impressive as possible. Alternatively, if a team wishes to rest its best players
due to an important future fixture, or if it senses that one of its players is at risk of
being injured, the team may withdraw its first choice players, or its players may play
at a less energetic pace. Thus the margin of victory in a fixture does not always reflect
the true disparity in level of performance during the course of the match. As a result,
it may be that if team A beats team B by x points on average, and team B beats
team C by y points on average (x, y > 0), team A is not necessarily expected to beat
team C by x + y points. An additional model term can accommodate this effect, which
occurs in other sports besides NBA. Rue and Salvesen (1997) included an additional
parameter in their Premier League soccer scores model that reflects their belief that a
soccer team tends to underestimate its opponent if the opponent is weaker. Hence,
defining X as the score of the home team, and μ and λ as the underlying home and
away scoring rates,
E[X] = exp(μ - γ(μ - λ))

where γ is a term to allow E[X] to vary according to the difference in ability between
the two sides. If this effect is indeed present as Rue and Salvesen surmise, γ should be
positive. A similar device has been used to truncate predicted winning margins for NFL
and College Football scores. The equation for a predicted winning margin w_k in match
k between sides i(k) and j(k) takes the form

w_k = λ(r_i(k) - r_j(k))

where r_i(k) can be considered as the average number of points advantage a team has over
an average side, and a value λ < 1 truncates the expected margin of victory as the
difference in team abilities increases. The estimated values for λ were
0.75 for College Football and 0.67 for NFL.
One strategy that can readily be attempted, and which exploits the available data, is to
develop a model for home and away scores at the end of the 3rd quarter, then model the
4th quarter scores conditional on the scores at the end of the 3rd quarter. Combining
these models may yield more accurate marginal distributions for the scores at the end
of the match. Figure 6.4 plots the relationship between the score difference at the start
of the fourth quarter and the difference in points scored by the two teams in the final
quarter. This suggests that teams' level of performance is not constant throughout an
entire game and that the earlier stages of a match are frequently used to establish a
score supremacy over opponents. The later stages of a match can be used to rest
players while preventing opponents from scoring sufficient points to overturn the
deficit.
Figure 6.4: Plot of 4th quarter score differences against score difference at the end of the 3rd
quarter for basketball, seasons 1997-2001
The line of best fit that is included in Figure 6.4 is obtained by regressing the dif-
ference in points scored in the final quarter between the two sides against the difference
in score at the end of the third quarter. The coefficient and its 95% confidence interval
indicate a negative relationship: sides leading at the end of the third quarter tend, on
average, to be outscored in the fourth. An adjustment for this effect is therefore worth
adding to the model.

6.4.2 Effect of schedule
With each NBA team playing 82 games within approximately seven months, teams are
often required to play several matches in a short space of time, and their performance
in some matches may be affected in some way due to this. Furthermore, the necessity
to travel a long distance shortly before a match may be detrimental to the team's
performance.

To study the suggestion that a team that is tired due to playing a busy schedule
will perform less well, the level of tiredness needs to be quantified in some way. A
suitable measure needs to be specific enough to isolate genuinely tiring schedules, and
general enough to include sufficient data to obtain significant results where appropriate.
Tables 6.4 and 6.5 display counts concerning schedules of the teams in the two days
prior to a match. Among the indicator variables considered are:

M1_k = 1 if the side in match k has played on two of the previous three days,
       0 otherwise

M3_k = 1 if the side in match k has played a match anywhere on the previous day,
       0 otherwise
These vectors are constructed for the home and away sides in each match. To
verify whether any of the measures listed above signify that a team is tired to the point
where the match score may be affected, the observed average scores are compared to
the predictions obtained using the basic model via simple linear regression. Tables 6.6
and 6.7 display the results for the home and away scores.

By comparing rows 1 and 5 of Table 6.6, it seems that teams are tired by having
to play the day previous to a fixture, while playing two games on successive days then
having a rest before a match does not appear to tire teams. A comparison of row 1
with rows 3 and 7 would ideally clarify whether it is playing the previous day, or the
travelling over the last 24 hours, that tires teams, but unfortunately rows 3 and 7 do
not contain enough observations to draw any conclusions with confidence. Hence the
overall message from Table 6.6, by comparing rows 1, 3, 4 and 7 with row 8, seems to
be that playing the previous day on average reduces the score difference by between
1.5 and 2 points.
Table 6.7 suggests that away sides' scores are, on average, close to their predicted
values, even if they did play games on the two previous days. However, comparison
of rows 1 and 4 suggests that playing two games in the last three days, one of which
took place the previous day, is more tiring than only playing a fixture the day before.
However, the confidence intervals in rows 4 and 5 suggest that more observations are
required to draw any strong conclusions with respect to this. Also, further discoveries
may be made if the distance teams had to travel to a fixture is calculated and included
in the analysis.
The measure M3_k defined above seems to be the most appropriate one to include in the
model. By comparing the parameter estimates for the following two linear models,
it appears that the specification by which the variables are included is quite crucial.
Denoting the expected home and away score means from the basic model by μ_k and
λ_k, and writing HM3_k and AM3_k for the M3 indicators of the home and away sides,
the models take the form

HSC_k = μ_k + β_1 HM3_k + β_2 AM3_k + ε_k
ASC_k = λ_k + β_1 AM3_k + β_2 HM3_k + ε_k

For the first model, the coefficients and confidence intervals for β_1 and β_2 are
-0.791 (0.159, -1.741) and 1.071 (1.767, 0.375), while for the second model they are
-0.089 (0.566, -0.744) and 1.119 (2.013, 0.225). It appears that teams concede more
points as a result of playing the day before, but their own scoring rate is not significantly
affected.
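The fit itself is ordinary least squares on the score residuals; the sketch below uses simulated data and the model form reconstructed above, so the indicator names and effect sizes are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4000
    own_m3 = rng.integers(0, 2, n)   # side played the previous day
    opp_m3 = rng.integers(0, 2, n)   # opponent played the previous day
    # Residual score (observed minus basic-model mean); the opponent's
    # tiredness lifts it by about a point in this fabricated example.
    resid = 1.1 * opp_m3 + rng.normal(0.0, 11.0, n)

    X = np.column_stack([np.ones(n), own_m3, opp_m3])
    beta, *_ = np.linalg.lstsq(X, resid, rcond=None)
    print(beta)  # intercept, beta_1 (own tiredness), beta_2 (opponent's)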
6.4.3 Short-term form

The underlying ability of a team rarely changes drastically within a short period of
time, since it is largely determined by the abilities of the players in the squad. These
do not change on a regular basis. But while the team's ability may change slowly,
its short-term form might fluctuate. For example a set of recent bad results may
affect the team's confidence briefly. To detect this short-term form effect, a method to
determine whether the team is in a spell of particularly good or bad form is needed.
Hence a form measure is needed for each team entering each match. Several methods
are tried here.
Using recent results for prediction
It may be worth incorporating a team's most recent results, as well as its long-term
ability, in order to predict its scores. Two methods to measure a team's recent form
are attempted.

Method 1 averages the team's margin of victory or defeat over its previous k
matches to create a form vector. This vector is added to a linear model and the
significance of the form vector is monitored. Thus if a team has performed badly in
recent matches, its score may differ significantly from that predicted by the basic
model, since teams' abilities are assumed to adjust only over a longer period of time.
A similar argument can be applied to suggest that the score of a team with recent
good results may be significantly different to that predicted by the basic model.

Method 2 is similar to Method 1, but the difference between the margin of victory
or defeat and the bookmaker's line for the previous k matches is calculated to create a
form vector. So, instead of observed performance, it is the extent to which teams have
exceeded expectation that is considered. This may more accurately measure their
short-term form. The model fitted for Method 1, with k = 1, for the score difference
in match m is

HSC_m - ASC_m = μ_m - λ_m + β_0 + β_1 (HSC_n - ASC_n - (μ_n - λ_n))

where match n is the previous match in which the home side of match m participated,
μ_n and λ_n are the expected home and away scores implied by the basic model, and β_0
and β_1 are the coefficients to be obtained. A large significant value of β_1 would suggest
that a team's performance relative to its opponent, compared to that predicted by the
basic model, is significantly improved given a good result in a previous match.
The model fitted for Method 2, with k = 1, for the score difference in match m is

HSC_m - ASC_m = μ_m - λ_m + β_0 + β_1 (HSC_n - ASC_n - B_n)

where the μ, λ, β_0 and β_1 terms are defined similarly and B_n denotes the book-
maker's line for the home side's previous match. The bookmaker's line is used as an
alternative to the prediction made by the basic model, μ_n - λ_n, since the depen-
dency between μ_n - λ_n and μ_m - λ_m could produce misleading results if a model were
fitted containing both of these terms.
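Both form measures reduce to simple averages over a team's last k matches; the helper below is hypothetical, written only to make the two definitions concrete.

    def form_measure(margins, lines, k, method=1):
        # margins: past (team score - opponent score), most recent last.
        # lines:   the bookmaker's line for the same matches.
        recent = margins[-k:]
        if method == 1:
            return sum(recent) / len(recent)
        # Method 2: margin relative to the bookmaker's expectation.
        return sum(m - l for m, l in zip(recent, lines[-k:])) / len(recent)

    print(form_measure([5, -3, 12], [2.5, -1.0, 6.0], k=2))            # Method 1
    print(form_measure([5, -3, 12], [2.5, -1.0, 6.0], k=2, method=2))  # Method 2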
The models are tested for values of k between 1 and 10 and Table 6.8 displays the
estimated coefficients. The estimate for score differences under Method 1, for k = 1,
means that for every point by which the home side beat their opponent in their previous
match, in the current match they beat their opponent by on average 0.004 points more
than the basic model predicts. The only significant results obtained for the models
fitted are for Method 2, away games, with values of k > 3, but in fact it seems that on
average teams do worse playing away from home if their form prior to the game was
good!
Additionally, it is of interest to see whether the bookmaker's line is sensitive to recent
runs of form by the teams involved. If this is so, then consideration of the results from
Table 6.8 suggests that this reaction would be misplaced, thus presenting an area of
the betting market to exploit. The first graph in Figure 6.5 plots, in blue, the average
observed difference in score for each difference in form level between the two teams in
a match, where form is defined by Method 1 above. Plotted over this, in black, is the
average of the bookmaker's line for each match and, in red, the average predictions
obtained from the basic model. The lower graph in Figure 6.5 is similar to the upper
graph except that form is defined by Method 2.
Figure 6.5: Plotting scores (_), model predictions (-) and bookmaker's spreads (_)
against recent runs of form
Note that the number of observations decreases towards both the right and left
hand edges of the plots in Figure 6.5. This explains the large variance in these areas
of the plots. If the bookmaker's line were sensitive to recent runs of form, the black
lines would rise above the blue ones on the right hand side, representing an over-
reaction to good form by a team, and fall below the blue line on the left hand
side, representing an over-reaction to a bad run of form. In fact, the bookmaker's line
ties in closely with the average observed scores regardless of the run of form, suggesting
that the market does not over-react to recent form.
Confidence

Quantifying a team's level of confidence at any one time objectively is difficult. How-
ever, an attempt is made here, via intuition, to construct a vector which approximately
ranks situations from maximum confidence to minimum confidence. Table 6.9 displays
how teams are allocated into each class and Table 6.10 displays the number of occasions
teams are classified in each class. Define CONF_hk and CONF_ak to be the confidence
levels entering match k of the home team and away team, and μ_k, λ_k to be the
expected home and away scores according to the basic model. The following linear
model is fitted:

HSC_k - ASC_k = μ_k - λ_k + γ_h CONF_hk + γ_a CONF_ak + ε_k
For the coefficient γ_h, the estimate and confidence interval are -0.036 (-0.287, 0.215)
and for γ_a they are -0.296 (-0.531, -0.061). Recent good results do not affect the home
side at all, while again it seems the away side may, if anything, be at a disadvantage
as their confidence increases.
Winning streaks
One frequently quoted statistic in media coverage of NBA in the build-up to a fixture
is the length of the winning streaks of the teams as they enter the fixture, a streak being
defined as the number of consecutive victories immediately prior to the fixture. Again,
it is conceivable that the market over-reacts to the importance of this short-term run of
form. To examine this, the length of winning streaks prior to every match in the data
set is recorded, and model predictions are compared to the bookmaker's line. Figure
6.6 contains the relevant plot.
Figure 6.6: Plot of average observed score difference (_) plus confidence intervals (... ),
model predictions (-) and bookmaker's line (-) against length of winning streak prior to
match
In fact, for streak lengths up to around 7, the bookmaker's line is very similar to the
predictions from the basic model, which do not explicitly adjust for winning streaks.
The differences observed for streak lengths above 7 are based on a small number of
matches, so no firm conclusions can be drawn (only 103 matches from the sample
of 4244 matches included in Figure 6.6 featured a side that was on a streak of 8 or
more prior to the fixture). This suggests again that the bookmaker's line does not
over-react to winning streaks.

Two messages emerge from the investigation of teams' recent form. First, team abilities
can be summarised by their long-term ability estimate obtained via the basic model.
Factors such as confidence and motivation acquired through very recent runs of form on
average do not significantly affect a team's results. There is mild evidence suggesting
that a team's results are on average worse when playing away from home given recent
good results. However, no adjustments are made to the model to accommodate short-
term form.

Secondly, the bookmaker's line does not appear to over-estimate the importance
of recent good or bad results. The bookmaker's line is largely determined by the
behaviour of the betting market. However, despite the large emphasis that is placed
on recent results in media reporting, it seems that gamblers are not susceptible to such
over-reaction.

6.4.4 Use of additional covariates
Recall that Section 6.2 described the data available for each match in addition to
the final scores. Define HA_k and AA_k to be the total shot attempts of any sort for the
home and away sides in match k. A model built on these covariates could proceed in
the following stages:

1. Predict the number of shots, or attempts, of any kind that the home team and
away team make, i.e. the joint distribution of (HA_k, AA_k).

2. Conditional on (HA_k, AA_k), predict how the attempts filter into 3-point at-
tempts, 2-point attempts and free throw attempts.

3. Conditional on the numbers of each type of attempt, predict how many of each
type are successful, from which the points scored are inferred.

4. Using the results of stage 3, obtain the distribution of the final score for each
team in the match.
An approach like this was applied to the NFL data in Section 5.5, although the
multivariate distribution obtained did not match the observed distribution to a satis-
factory extent. One reason for this was the unsuitability of the binomial distribution
in the situations where a proportion was modelled. In stages 2 and 3 mentioned above,
the most obvious way to model teams' decisions concerning which type of shot to
take, or their shot conversion rates, is to employ the binomial distribution to model
the proportions involved. However, it is worth checking the suitability of this method
before implementing any modelling.

To test whether the HF2M values (home 2-point shots made) are binomially
distributed with group size HF2A (home 2-point shots attempted), samples are
simulated as explained below; AF2A, AF2M, HF3A and so on are defined analogously
for away sides and 3-point shots. These simulated samples are then compared with
the observed data.
1. For each match k between sides i(k) and j(k), generate an estimate of the
2-point conversion rate: the mean of the average 2-point shot conversion rate for team
i(k) and the average conversion rate that j(k) allow their opponents, during the
last 500 NBA matches. This accounts for approximately half an NBA season.

2. For each match k, k ∈ [1, N] where N is the total number of matches, simulate
Z values, HF2M_k^1, ..., HF2M_k^Z, of home 2-point made shots, using a binomial
distribution, given the observed number HF2A_k as group size and the approx-
imate conversion rate from stage 1 as the probability parameter. This yields Z
simulated data sets, from which the 2.5% and 97.5% quantiles of the simulated
variance are obtained and compared with the observed Var(HF2M). The same
check is applied to the other variables of Table 6.11, such as AF2A, as well.
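The dispersion check reduces to comparing an observed sample variance with the variance distribution of binomial simulations. In the sketch below a constant conversion rate stands in for the rolling 500-match estimate, and the attempt counts are fabricated:

    import numpy as np

    def binomial_variance_band(group_sizes, p, z=1000, seed=3):
        # Simulate z data sets of binomial counts and return the 2.5% and
        # 97.5% quantiles of their sample variances.
        rng = np.random.default_rng(seed)
        sims = rng.binomial(group_sizes, p, size=(z, len(group_sizes)))
        return np.quantile(sims.var(axis=1, ddof=1), [0.025, 0.975])

    rng = np.random.default_rng(4)
    attempts = rng.integers(55, 90, 4568)  # 2-point attempts per match
    made = rng.binomial(attempts, 0.45)    # makes, binomial by construction here
    print(made.var(ddof=1), binomial_variance_band(attempts, 0.45))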
Table 6.11: Comparison of observed variance for variables, with simulated values as-
suming binomial distribution
Variable Observed Simulated Simulated
Variance 2.5% quantile 97.5% quantile
HF2A+HF3A 59.376 53.046 55.718
AF2A+AF3A 58.536 53.739 56.568
HF2A 70.098 59.055 61.967
AF2A 72.187 59.695 62.754
HF2M 28.867 30.563 32.974
AF2M 25.088 30.878 33.229
HF3M 6.847 6.343 6.894
AF3M 6.449 6.261 6.804
HFTM 42.268 40.838 42.598
AFTM 38.249 37.666 39.386
The observed values of HF2A + HF3A and AF2A + AF3A are over-dispersed
compared to the samples simulated assuming a binomial response, as are the observed
values of HF2A and AF2A. Meanwhile, the observed values of HF2M and AF2M are
under-dispersed relative to the simulations. The generalisation of the binomial distribution
described in Section 5.5.4 may be a more suitable response distribution than the bino-
mial distribution in these cases. Given the computational burden this would add compared
with the binomial distribution, this approach has not been considered in this investigation.
Hence a full multivariate analysis of the type carried out in Section 5.5 is not attempted
for the modelling of NBA scores.

6.4.5 Increasing levels of team parameterisation
Section 5.8.2 described a simple method that can offer some guidance on the appro-
priate number of parameters to include for each team in a model of NFL scores. For
that application it appeared that two parameters were sufficient. Recall that firstly
it is tested whether a better fit of the past data can be obtained by the inclusion of
extra parameters to represent the teams' abilities. Secondly, in order to verify that
extra predictive power can be gained from the extra team parameters (and that the
better fit of past data is not obtained by modelling random error), it is investigated
whether the prediction of team abilities for one season can be made using the fitted
team abilities of the previous season (which are modelled on an entirely separate data
set). The results of these tests, applied to NBA data, are displayed in Tables 6.12 and
6.13. It appears from Table 6.13, given the generally significant p-values of the most
highly parameterised model, that genuine effects rather than random error are being
modelled with the extra parameters. Hence extra predictive power may be available
by including four rather than two parameters for each team in the model.

Alternatively, as a possible area for further research, rather than modelling team
abilities by allocating a set number of parameters to each team, other approaches
could be taken to examine in what circumstances teams' strategies alter for different
fixtures. One could examine in detail the effect on scoring and conceding rates of a
big difference in ability between the two teams, or of the importance of the match, for
example. It is possible that there are more efficient systems of summarising team
abilities, and how teams vary their tactics from match to match, than including extra
parameters for each team.
Table 6.12: Decrease, and significance of decrease, of deviance when additional team
parameters are added into NBA model, seasons 1997/98 to 2000/01.

Model   Parameters  Comparison  Year   Deviance reduction
number  in model    model              (df, p-value)
1       γ           -           97/98  -
                                98/99  -
                                99/00  -
                                00/01  -
6.4.6 Inclusion of player information

There is little doubt that the identity of the players participating in a match influences
the performance of the team. Figure 6.3, which plots the model predictions against the
bookmaker's line, reveals several points where the model and the bookmaker disagree.
Table 6.14 displays all matches where the model and bookmaker's line for a match
differ by more than 8 points.

Many disagreements occur at the start of seasons, for example, at the start of the
Table 6.13: Coefficients and p-values obtained using previous year's parameters to
predict next year's, for NBA, 1997/98 to 2000/01

Parameters  Regression  Coefficient and p-value of previous year's parameter
in model    applied     α        β        γ        δ
γ           2-1
            3-2
            4-3
γ, δ        2-1
            3-2
            4-3
Table 6.14: List of matches with big differences between bookmaker's line and model
predictions
Date Home team Away team Model Bookmaker's Observed score
prediction prediction difference
19980124 Toronto Minnesota -4.29 5 0
19980215 Sacremento Washington -2.02 6 2
19980224 Washington Houston 4.52 -5.5 12
19990205 Utah Chicago 0.02 15 8
19990206 Charlotte Milwaukee 2.64 -5.5 0
19990206 Golden State Houston 1.74 -6.5 -2
19990208 Charlotte Miami 1.49 -8.5 3
19990209 Chicago Atlanta 4.94 -4.5 -16
19990210 Charlotte Cleveland 3.85 -5.5 -10
19990211 Chicago New York 4.8 -6 -5
19990216 LA Lakers Charlotte 6.46 16 28
19990218 Indianapolis Philadelphia 1.17 10 4
19991107 LA Lakers Dallas 3.49 11.5 8
20000319 Golden State Phoenix -8.95 8 -17
20001104 Vancouver LA Lakers 1.27 -9 -9
20001106 Sacremento Portland 6.53 -2.5 4
20001112 Detroit Seattle 6.57 -2 9
20001113 New Jersey Portland 2.74 -6.5 -12
20001116 Sacremento LA Lakers 6.85 -2 0
20001116 Toronto Portland 5.74 -2.5 -6
20001127 LA Clippers LA Lakers -3.26 -11.5 -15
20001205 LA Lakers Philadelphia -0.15 8 11
20001221 Houston LA Lakers 1.71 -8 -5
98/99 season (which started 5 February 1999 due to a players' strike), and the 00/01
season, which started on 31 October 2000. Between seasons clubs buy players, sell
players, or players retire. Hence the roster of a team can change significantly between
the end of one season and the start of the next. The bookmaker's lines generally
take such information into account. The procedure used in order to obtain parameter
estimates for the basic model places more weight on recent results, hence information
from previous seasons for all clubs is down-weighted. However, it does not make
adjustments for specific changes to a squad such as this. Unfortunately, data concerning
which players participated in each match was not available during this study. It is a
promising area for further research.

6.5 Construction of more advanced model
The advanced model first specifies a bivariate Normal distribution for scores at the
end of the third quarter,

(HSC3_k, ASC3_k) ~ N_2((μ_h3,k, μ_a3,k), Σ_3)

with

μ_h3,k = γ_3 + α^h_i(k) + β^a_j(k) + λ_h ATIRED_k
μ_a3,k = γ_3 + α^a_j(k) + β^h_i(k) + λ_a HTIRED_k

where

• HSC3_k, ASC3_k are the scores at the end of the third quarter

• α^h_i(k), β^h_i(k) are team i(k)'s offensive and defensive parameters while playing at
home, and α^a_j(k), β^a_j(k) are the corresponding parameters of team j(k) when
playing away

• ATIRED_k is an indicator variable set to 1 if team j(k) played a fixture the
previous day, while HTIRED_k is similarly defined for team i(k); λ_h is the
effect of the away side's tiredness on the home score, with λ_a defined analogously

• σ_h3, σ_a3, ρ_3 are the home standard deviation, away standard deviation and cor-
relation coefficient of all third quarter final scores.
Then, the team abilities obtained from the above formulation are incorporated into
a simple linear model, and treated as constants, to produce a model for final quarter
scores Q4HSC_k and Q4ASC_k:
(Q4HSC_k, Q4ASC_k) ~ N_2((μ_hq4,k, μ_aq4,k), Σ_q4)

where

• μ_hq4,k = γ_4 + ν(α^h_i(k) + β^a_j(k)) + κ(HSC3_k - ASC3_k)

• μ_aq4,k = γ_4 + ν(α^a_j(k) + β^h_i(k)) + κ(ASC3_k - HSC3_k)

• σ_hq4, σ_aq4, ρ_q4 are the home standard deviation, away standard deviation and
correlation coefficient of fourth quarter scores.
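The fourth-quarter means follow directly from the expressions above; in the sketch below the parameter values are illustrative, with a negative κ expressing the tendency of third-quarter leads to shrink.

    def quarter4_means(gamma4, nu, kappa, alpha_h, beta_a, alpha_a, beta_h,
                       hsc3, asc3):
        # Expected home and away 4th-quarter scores, given the position
        # at the end of the 3rd quarter.
        mu_h = gamma4 + nu * (alpha_h + beta_a) + kappa * (hsc3 - asc3)
        mu_a = gamma4 + nu * (alpha_a + beta_h) + kappa * (asc3 - hsc3)
        return mu_h, mu_a

    # A home side leading by 12 after three quarters (all values made up).
    print(quarter4_means(23.5, 0.25, -0.2, 2.0, 1.0, 0.5, -1.0, 78, 66))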
The following linear model is implemented to verify whether the tiredness of the two
sides affects the final quarter:

Q4HSC_k - Q4ASC_k = β_0 + β_1(HSC3_k - ASC3_k) + β_2 HTIRED_k + β_3 ATIRED_k + ε_k

where the HSC3 - ASC3 term is considered as a nuisance parameter since it is known
to be a strong predictor of Q4HSC - Q4ASC. The estimates and confidence
intervals for the β_2 and β_3 terms are -0.456 (-1.055, 0.143) and -0.118 (-0.556, 0.319).
There is no conclusive evidence that if a team plays on the day prior to a match, its
score in the final quarter is affected, so a tiredness indicator vector is not included in
the fourth-quarter model. Ideally, one would
estimate all the parameters from the above models simultaneously since multiple re-
gression of these four variables on each other reveals a strong dependence between
them. However, due to the computational complexity involved in doing so, two inde-
pendent bivariate Normal distributions are used and team parameters are estimated
only through the first model. The estimates for the four team parameters are displayed
in Table 6.15.
The similarity between the estimates of teams' parameters for their home games
and their away games is not surprising given that it is the same players who participate
in these games. The tactics may vary according to whether the team plays at home or
away, so an exact agreement is unlikely. The two sets of parameters are plotted, both
for the offensive parameters and the defensive parameters, in Figure 6.7.

The estimated probability distribution of final scores can then be calculated by
combining the third-quarter and fourth-quarter models.

6.5.1 Adjustment for overtime periods
The model development so far has focused on generating the distributions of
HSC - ASC and HSC + ASC, where HSC and ASC are the home and away
scores of a match at the conclusion of the fourth quarter of play. For betting purposes,
it is the final score after possible overtimes that is of interest. Define

• DSC4 and TSC4 to be the difference in score and total score at the conclusion
of the fourth quarter,
Table 6.15: NBA team ability estimates for home offense (α̂^h), away offense (α̂^a), home
defense (β̂^h) and away defense (β̂^a), June 2001

team          α̂^h      rank   α̂^a      rank   β̂^h      rank   β̂^a      rank
Portland      0.4427    13     0.1952    17     -3.2744   4      -3.5704   6
Boston        0.2986    16     1.3537    9      1.542     20     2.3514    23
Vancouver     -2.9833   26     -0.333    19     2.1558    24     2.8062    26
Miami         -2.4187   25     -3.3622   26     -3.8497   3      -5.8169   1
Charlotte     -0.6289   21     0.6117    13     -1.8768   6      -1.9093   8
LA Lakers     5.2324    2      3.2522    3      -0.1579   12     -1.4951   9
Orlando       2.0383    6      0.673     12     1.0169    15     -0.0434   12
New Jersey    1.2115    9      0.3824    15     3.0275    27     2.6821    24
Denver        1.081     10     0.3537    16     2.8064    26     4.6767    29
Detroit       0.9941    11     2.146     5      1.6314    21     1.6099    18
Houston       1.4774    8      1.0948    11     1.1891    16     0.1263    13
Philadelphia  -0.5255   19     -1.3798   22     -1.4715   8      -3.8216   5
Phoenix       0.1989    17     -0.6932   20     -1.5813   7      -2.8021   7
Minnesota     0.4542    12     1.7545    6      -1.3118   9      0.3479    14
Milwaukee     2.9362    3      4.161     2      1.1893    17     2.3237    22
Chicago       -6.739    29     -3.4667   27     -0.7823   11     0.758     15
LA Clippers   -3.4005   28     -1.452    23     1.9811    22     2.2405    21
Atlanta       -0.6606   22     -3.6421   28     1.4873    19     -0.0856   10
Utah          1.4809    7      -1.2072   21     -1.9445   5      -3.9773   4
Indianapolis  0.3686    14     1.0955    10     -1.0011   10     -0.0698   11
Seattle       2.6886    4      2.6519    4      0.3365    14     2.2309    20
San Antonio   -0.2353   18     0.4682    14     -6.0048   1      -4.5643   2
Washington    0.3143    15     -0.1212   18     3.9535    28     2.7331    25
Sacremento    5.8617    1      4.4384    1      2.194     25     4.1775    27
New York      -3.1018   27     -4.2615   29     -5.0601   2      -4.5078   3
Cleveland     -2.1034   24     -1.8492   24     0.1477    13     1.8894    19
Dallas        2.1973    5      1.6196    8      2.1116    23     0.8484    16
Toronto       -0.5286   20     1.6933    7      1.298     18     0.8781    17
Golden State  -1.3266   23     -2.0003   25     4.4248    29     4.6099    28
" DSC and TSC to be the difference in score and total score after overtimes are
completed and
" OTDSC,,, OTTSC to be the difference in score and total score in the nth
overtime period,
Figure 6.7: Plot of offensive parameters for home games of NBA teams at final time-point of
data set against offensive parameters for away games at final time-point
the following formulae give the distributions of DSC and TSC:

P(TSC = T) = P(TSC4 = T ∩ DSC4 ≠ 0)
           + Σ_{y1=0}^{T-1} P(TSC4 = y1 ∩ DSC4 = 0) P(OTTSC_1 = T - y1 ∩ OTDSC_1 ≠ 0)
           + Σ_{y1=0}^{T-1} Σ_{y2=0}^{T-y1-1} P(TSC4 = y1 ∩ DSC4 = 0) P(OTTSC_1 = y2 ∩ OTDSC_1 = 0)
             P(OTTSC_2 = T - y1 - y2 ∩ OTDSC_2 ≠ 0)
           + ...

with an analogous expression for P(DSC = D).
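The convolution is awkward analytically but immediate by simulation: draw fourth-quarter-end scores and, whenever the difference is zero, append an overtime period. The sketch below is crude in that each overtime is simply assumed to produce a winner, and all parameter values are illustrative.

    import numpy as np

    def simulate_final_totals(mu_h, mu_a, cov, ot_mean, ot_sd, n=100000, seed=5):
        rng = np.random.default_rng(seed)
        scores = np.rint(rng.multivariate_normal([mu_h, mu_a], cov, size=n))
        totals = scores.sum(axis=1)
        tied = scores[:, 0] == scores[:, 1]
        # Add one overtime total wherever the fourth quarter ended level;
        # crude: the overtime is assumed to settle the match.
        totals[tied] += np.rint(rng.normal(ot_mean, ot_sd, tied.sum()))
        return totals

    cov = [[11.8**2, 0.37 * 11.8 * 11.0], [0.37 * 11.8 * 11.0, 11.0**2]]
    print(simulate_final_totals(95.9, 92.7, cov, 21.0, 7.0).mean())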
To check whether overtime scores depend on the pre-match assessment of the teams,
the overtime outcomes are regressed on E[DSC] and E[TSC]. For the overtime score
difference, the coefficient and confidence interval for β_1 are 0.315 (0.190, 0.440) and for
β_2 they are -0.0478 (-0.155, 0.060). Therefore the predicted difference E[DSC] is a
significant predictor of the overtime score difference. For overtime total scores, the
coefficients and confidence intervals for β_1 and β_2 are 0.082 (-0.158, 0.322) and
0.188 (-0.0181, 0.395). Since neither E[DSC] nor E[TSC] are significant predictors of
the overtime total score, it is modelled simply as

OTTSC_k ~ N(OTSC̄, σ_OT)     (6.5.4)

where again only data observed prior to match k are used. For the final match,
(OTSC̄, σ_OT) = (21.008, 6.995).

6.6 Comparison of basic model and advanced model
Figure 6.8 plots the predicted mean of the difference in score from the basic model
of Section 6.3 against the predicted mean of the difference in score using the more
advanced model constructed in Section 6.5, and similarly for the prediction of the total
scores. While there is broad agreement between the matches, there are also many
matches whose predictions have changed greatly. Figure 6.9 displays a plot of moving
averages of predictions against observed values for the predicted values of the difference
in score and the total score according to both models. It reveals that both models'
predictions are broadly in line with the observed outcomes.
Figure 6.8: Plot of basic model score predictions against advanced model
6.6.1 Summary statistics

Two measures are used in order to compare the predictive ability of the basic model
with the more advanced model of Section 6.5. The first measure, which uses scores
at the end of the final quarter, counts the number of times that either model's mean
prediction is closer to the final score. The second measure calculates the average
predictive log-likelihood for each match. Table 6.16 displays the results of applying
these measures.
Table 6.16: Comparison of basic model and quasimultivariate model via summary
statistics

                    Score difference              Score total
                    Basic model  Advanced model   Basic model  Advanced model
Proportion closer   0.526        0.474            0.54         0.46
Mean loglikelihood  -3.923       -3.887           -4.246       -4.308
The first measure, which does not penalise the magnitude of the difference between
prediction and result, reveals that the basic model is closer to the observed result
more often, for both score differences and score totals.
Figure 6.9: Moving average plots of predicted score differences and totals, for basic model
(_) and advanced model(-)
The second comparison, which does penalise the magnitude of discrepancy, sug-
gests the advanced model produces slightly better predictions for score differences but
slightly worse predictions for total scores. Ideally, predictive performance would be
compared via the joint bivariate distribution. However, due to the complex procedure
required to obtain predictions for final scores described in Section 6.5.1, it is not
straightforward to calculate the loglikelihood of the joint (DSC, TSC) distribution for
the advanced model. Hence Table 6.16 displays only marginal loglikelihoods.

6.6.2 Betting success
The betting strategy adopted here for NBA matches is similar to that outlined in
Section 5.4.2 for betting on NFL matches. The probability of winning each bet offered
by the bookmaker, either on the score difference or total score, is calculated. Then
various cut-off values are chosen such that bets are only placed provided the probability
of winning the bet exceeds this cut-off value. For each cut-off value, the proportion
of winning bets is recorded, and Figure 6.10 displays the results of this strategy. As
in Figure 5.9, a y = x line is included which represents the return curve that would be
obtained if bets were placed using the "true" probabilities of match outcomes.
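The cut-off strategy itself takes only a few lines: for each threshold, keep the bets whose estimated win probability exceeds it and record the fraction won. Simulated inputs are used in the sketch below.

    import numpy as np

    def win_proportions(p_model, won, cutoffs):
        # Proportion of winning bets among those placed at each cut-off.
        return np.array([won[p_model > c].mean() if (p_model > c).any()
                         else np.nan for c in cutoffs])

    rng = np.random.default_rng(6)
    p_model = rng.uniform(0.5, 0.7, 5000)   # model's estimated win probabilities
    won = rng.random(5000) < p_model        # outcomes consistent with the model
    print(win_proportions(p_model, won, np.arange(0.50, 0.66, 0.05)))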
Figure 6.10: Proportions of bets won, where bet is made provided P(Win) > cut-off, according
to both the basic model (_) and the advanced model (-).
It is clear that neither model produces predictions that can win bets against the
bookmaker on a consistent basis. The horizontal dotted line is the 50% level, which
represents the proportion of victories one would achieve by betting randomly. The
advanced model is generally superior to the basic model. However, in order to make
money by betting with the bookmaker in the long term, it is necessary to have bets that
win often enough to exceed the bookmaker's overround. As mentioned in Section 5.4.2,
this requires winning significantly more often than 52.5% of all bets. Unfortunately,
even the advanced model does not produce predictions that win often enough.
A similar conclusion is reached by repeating the general linear model test previously
performed on NFL scores in Section 5.6.1. After fitting the following two models
E(DSC) = α_d + β_dm ED_m + β_db ED_b     (6.6.1)

E(TSC) = α_t + β_tm ET_m + β_tb ET_b     (6.6.2)

where ED_m and ET_m are the model predictions for the score difference and total score
and ED_b and ET_b are the bookmaker's equivalents, the coefficients and confidence
intervals of interest were obtained.
As seen in Chapter 5 concerning NFL scores, the simple linear regression model
puts far more weight on the bookmaker's line than on the model prediction, suggesting
once again that the bookmaker's line is overall the more accurate prediction.
6.7 Conclusion
The comparisons of Section 6.6 suggest that systematic errors, in terms of the
average result, are not being made by the bookmaker. While for many matches the
advanced model may, for example, correctly identify a 55% probability of winning a
handicap bet, in other matches where the bookmaker has more accurate odds, the
probability of winning a bet drops to approximately 50%, since the model is effectively
placing a random bet. Thus the 55% success rate of the accurate bets is being averaged
out with the 50% success rate of the inaccurate ones, putting a ceiling on the overall
success rate of the strategy.
Chapter 7
An alternative estimation
method - Markov Chain Monte
Carlo
While the MLE procedure used throughout this thesis is relatively simple to implement,
from a statistical point of view it is a little unattractive. In particular, the procedure
uses the same team ability parameters for every match in which a team plays, when
the parameters are really assumed to follow some dynamic distribution. As emphasised
by Equation 2.4.1, this does mean that the likelihood functions maximised in order
to obtain the parameter estimates used so far in this thesis are no longer strictly
appropriate.

There is an alternative method that can be used to obtain estimates of the param-
eters. Using Bayesian methods, samples from the posterior distribution of the
parameters can be obtained and, from these, estimates of the means, modes and
variances of the parameters are computed from the
samples. The posterior distribution of a set of parameters θ given observed data X is

P(θ | X) = P(θ) P(X | θ) / ∫ P(θ) P(X | θ) dθ     (7.0.1)

where P(θ) is a prior distribution of θ and P(X | θ) is the likelihood of the data given
θ. A likelihood function that includes separate parameters for each team at each time-
point would be of very large dimension, which would make evaluation of the integral in
Equation 7.0.1 unfeasible. However, although the posterior distribution of the param-
eters cannot be specified in closed form, a technique known as Markov Chain Monte
Carlo (MCMC) can be used in order to simulate from it. A thorough understanding
of the MCMC technique for simulation is not necessary in order to understand this
chapter; a brief summary is given in Section 7.6. There are several imple-
mentations of the MCMC technique, one of the most popular of which is the Gibbs
Sampler. A Gibbs Sampling based MCMC program known as WinBUGS is used in
this chapter to implement a model including genuinely dynamic team parameters. The
advantages and disadvantages of this approach are demonstrated by an example. The
market analysed is NFL scores for seasons 1997/98 until 2000/01. The model
specification is similar to that used in the basic model in Chapter 5 concerning NFL,
although some modifications are made on consideration of Glickman and Stern's (1998)
NFL model and Rue and Salvesen's (1997) soccer model. As outlined by Gilks et al.
(1995), the task of specifying a full probability model can be divided into three stages:

1. Specification of the relationships between the quantities in the model (Section 7.1)

2. Specification of the parametric form of direct relationships (Section 7.2)

3. Prior specifications (Section 7.3)

7.1 Model specification

The model is specified in terms of the following quantities:
" µk and Ak represent the mean home and away scoring level of match k
" ai, t and ßi, t represent the offensive and defensive abilities of team i at time-point
" The a. and P. terms follow a Brownian motion, with drift precision Tw between
each time-point during a season and r, between the final time-point of one season
The form chosenfor µk and Ak for match k between teams i(k) and j(k) taking
µk = +
'Y ai(k), t(k) + ßJ(k), +d
t(k)
Ak = +
It aj(k), t(k) + ß, (k),t(k) (7.1.1)
Figure 7.1 is a cut-down Directed Acyclic Graph (DAG) representing these relationships, where parameters are considered to be random quantities and are thus specified using a probability distribution. A single arrow that points from one quantity to another indicates that the probability distribution of the second quantity is some function of the first quantity. For example, the distribution of the $\delta$ term in Figure 7.1 depends on both $\delta_0$ and $\tau_{\delta 0}$. Double arrows are used where several quantities combine to be re-expressed in a single quantity, in order to aid the presentation of the model. This is the case for the $\mu_k$ and $\lambda_k$ terms, which are defined by Equation 7.1.1. The rectangular boxes in the DAG do not define anything with regard to the model specification, but they are included in order to clarify to the user that the groups of quantities within the rectangle are considered to have a similar role in the model. In this case, it is helpful to group all team parameters that refer to the same time-point within one rectangle.
Figure 7.1: Cut-down Directed Acyclic Graph representing the relationships between the parameters of the NFL model
7.2 Specification of the parametric form of direct relationships

The distribution of each variable, and the relationship between the variables in the cut-down DAG, is explained briefly below using standard distributional nomenclature.

• The home and away scores of match $k$ are assumed to be normally distributed about the mean levels $\mu_k$ and $\lambda_k$, with a common precision $\kappa$; for the home score $X_k$, for example,

$$X_k \sim N(\mu_k,\, \kappa^{-1})$$
" As mentioned above, standard Brownian motion is used to model the variation of
a team's offensive ability. If t and t+x are time-points in the same NFL season
then
N (ai, (x/7-r)Ii1
ai, t+x ^' t,
If t+x is the time-point of the first fixture of one seasonand t is the time-point
of the final fixture of the previous season then
" It is necessaryto determine a prior mean and precision for the values of the a
and 0 terms before any data is observedthus
a.,i r".
j («o,Tao)
ß.,1 ^' N(, 6o,Tpo)
" It is also necessary to determine a distributional form and relevant prior values
for the other parameters in the model. It is
assumed the global mean and home
179
effect parameters are normally distributed so
6N N(80,Tao)
n, r(Ico, ww)
Tw - r(7-wo, ww0)
r(T30, w30)
^'
Td
Note that there is insufficient space in Figure 7.1 to include the $\tau_s$ term. If the time difference between $t(k)-n+1$ and $t(k)-n$ corresponds to a season break, then $\tau_s$ replaces $\tau_w$ as the precision quantity that applies to $\alpha_{i(k),t(k)-n+1}$, $\alpha_{j(k),t(k)-n+1}$, $\beta_{i(k),t(k)-n+1}$ and $\beta_{j(k),t(k)-n+1}$. Similarly, $\tau_{s0}$ and $\omega_{s0}$ replace $\tau_{w0}$ and $\omega_{w0}$.
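The dynamic structure above can be illustrated with a short simulation; the precision values, the number of time-points and the position of the season break are all illustrative assumptions rather than fitted quantities.

```python
# Sketch: one team's offensive ability evolving as Brownian motion, with
# variance gap/tau_w between time-points within a season and gap/tau_s
# across the season break. All values are illustrative.
import random

def simulate_ability(n_timepoints, break_after, tau_w=16.0, tau_s=1.0):
    path = [0.0]                                # prior mean ability of zero
    for t in range(1, n_timepoints):
        tau = tau_s if t == break_after + 1 else tau_w
        path.append(random.gauss(path[-1], (1.0 / tau) ** 0.5))
    return path

random.seed(1)
print([round(a, 2) for a in simulate_ability(8, break_after=4)])
# The step taken immediately after the season break (lower precision,
# hence higher variance) is typically larger than the within-season steps.
```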
7.3 Prior specifications

Given the ubiquity of the $\gamma$, $\delta$, $\kappa$ and $\tau_w$ terms in the model, weak conjugate priors are employed. Since the mean away score during seasons 1997-2000 is 19.14,

$$\gamma \sim N(19,\, 0.01)$$
$$\delta \sim N(3,\, 0.01)$$

where the second argument denotes a precision. The (unconditional) score variance is 106.11, hence the unconditional score precision is 0.00942. To have a flat prior on the score precision, a mean of 0.01 and a variance of 10 can be used, giving

$$\kappa \sim \Gamma(1.0 \times 10^{-5},\, 1.0 \times 10^{-3})$$
While some detailed methods for setting prior values for the $\alpha$ and $\beta$ terms could be considered, such as using some function of the previous season's scored and conceded points, weak priors are again employed:

$$\alpha_{\cdot,0} \sim N(0,\, 0.01)$$
$$\beta_{\cdot,0} \sim N(0,\, 0.01)$$

A weekly deviation of 0.25 points in mean scoring or conceding level for a team seems plausible, or equivalently a precision of 16. The conjugate prior with mean 16 and variance 100 is $\Gamma(2.56, 0.16)$. Therefore

$$\tau_w \sim \Gamma(2.56,\, 0.16)$$
Given the relatively small amount of data available in relation to $\tau_s$, a stronger, informative prior is used. In order to set a prior distribution, first the mean scoring and conceding rate for each team in each year is calculated. Denote these values by $S_{i,j}$, $C_{i,j}$, $i = 1, \ldots, 4$, $j = 1, \ldots, 31$.¹ Defining the season-on-season differences $S21_j = S_{2,j} - S_{1,j}$, and similarly $S32_j$, $S43_j$, $C21_j$, $C32_j$ and $C43_j$, it is calculated that

Var[S21] = 19.55, Var[S32] = 34.83, Var[S43] = 16.50
Var[C21] = 12.16, Var[C32] = 14.06, Var[C43] = 29.92

The reciprocals of these are respectively (0.0511, 0.0287, 0.0606, 0.0822, 0.0712, 0.0334). The mean and standard deviation of this vector are 0.05453 and 0.02100, so a reasonable conjugate prior is $\Gamma(6.75, 120)$, which has mean 0.05625 and standard deviation 0.02165.
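All three gamma priors above follow from matching a desired mean and variance, since a $\Gamma(a, b)$ distribution (shape $a$, rate $b$) has mean $a/b$ and variance $a/b^2$. A minimal helper reproducing the calculations (the function name is ours):

```python
# Match a Gamma(shape, rate) prior to a target mean and variance:
# mean = shape/rate and variance = shape/rate**2 imply
# shape = mean**2/variance and rate = mean/variance.
def gamma_from_moments(mean, variance):
    return mean**2 / variance, mean / variance

print(gamma_from_moments(0.01, 10))             # (1e-05, 0.001): kappa prior
print(gamma_from_moments(16, 100))              # (2.56, 0.16):   tau_w prior
print(gamma_from_moments(0.05625, 0.02165**2))  # ~(6.75, 120):   tau_s prior
```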
7.4 Model implementation

Figure 7.2 displays selected portions of the output obtained by running 5 parallel chains of 5,000 iterations of the model described above in WinBUGS. The reason parallel chains are run is to check that convergence has been achieved: if the disparity between the chains is similar to the variance within each chain, this suggests that the Markov chain has converged. It is expected that the different variables in the model will converge at different rates, hence traces of the parameter estimates for the offensive and defensive abilities of one team (the Denver Broncos), as well as the global parameters $\gamma$ and $\delta$, are monitored in the right-hand plots of Figure 7.2.
¹Four seasons of data are used and 31 different NFL teams appear in this data set.
While convergence appears to have been reached by the two global parameters, the traces of the chains for the team parameters that have been monitored suggest that overall convergence is still some way off. The traces of the Denver Broncos' offensive and defensive parameters on 31 January 1999 reveal several non-intersecting chains, suggesting the stationary distribution has not been achieved. In fact, all these chains were started from the same initial values. By starting these chains at different values, as recommended in several texts, convergence may well seem even further away.

A more formal check for convergence, known as the Gelman-Rubin diagnostic test, is outlined in Section 7.6. Essentially, it produces a value known as the potential scale reduction estimate, with confidence intervals, which approaches the value 1 as convergence of the Markov chain is achieved. The plots of the potential scale reduction estimate are displayed on the left-hand side of Figure 7.2. They confirm what the traces suggest, namely that the global parameters have reached convergence but the team parameters have not.
To consider why this is so, note that the distribution of the Denver Broncos' offensive parameter on 31 January 1999 is determined primarily by a single data point, namely the Denver Broncos' score in the match that occurred close to 31 January 1999, and by its relationship with the Denver Broncos' offensive parameters at the time-points immediately before and after 31 January 1999. It is also determined, less directly, through the complex dependence structure that exists between all the parameters featured in the model, which can partly be observed by recalling the DAG in Figure 7.1. Contrast this with the parameters for the global mean and home effect, which are determined using every match score, as well as the complex dependence structure. Given that far more data directly determines the global mean and home effect, it follows that the Gibbs Sampler converges more quickly towards suitable estimated values for them.
That some parameters have not converged after 5,000 iterations is not surprising given that Glickman and Stern's considerably simpler model was run for 18,000 iterations, by which stage one parameter had still not completely converged. Rue and Salvesen's model, with a similar level of complexity to the one employed in this example, was run for 25,000 iterations, although certain parameters were not in fact evaluated via the MCMC routine. Unfortunately, WinBUGS was only able to perform approximately 8,000 iterations before encountering memory problems on a 700MHz Pentium 3 PC with 384MB RAM. Hence this approach shall not be pursued any further due to the computation time involved: the 25,000 iterations took 27 minutes.
Figure 7.2: Gelman-Rubin convergence statistics (left) and traces of iterations 4,801 to 5,000 (right) for the offensive and defensive parameters of the Denver Broncos and for the global parameters
7.5 A comparison of the MLE and MCMC modelling approaches
Consideration of the MLE and MCMC approaches emphasises that there are several practical criteria by which an estimation procedure may be judged, some of which are not related to the statistical qualities of the model or the estimation process. For example, it is important that the parameter estimation process is quick, and that its output is of a manageable size. The MLE procedure returns two point estimates (offense and defense) for each team at each time-point, plus an updated estimate for the global mean, home effect and score variance at each time-point. In the NFL data set used for this application there are 31 teams and matches take place on 182 different days. That leads to a total of $(31 \times 2 + 3) \times 182 = 11{,}830$ parameters overall. The MCMC technique, on the other hand, returns $t \times 31 \times 2 + 5$ parameter distributions when run at time-point $t$, since each team's offensive and defensive parameters are modelled as a dynamic process. In addition, the global mean, home effect, score variance, weekly precision and seasonal precision parameters are re-estimated. Note that these effects are assumed to be constant throughout time, unlike the team ability parameters. Hence, if each team is allocated two parameters for all 182 days, and the MCMC is run over all 182 days, the output of the MCMC simulation features $\sum_{t=1}^{182}(62t + 5) = 1{,}033{,}396$ distributions
(as opposed to point estimates). It may seem excessive to allow teams' abilities to be re-evaluated on every day that any match takes place; instead, abilities could be re-evaluated only after each occasion on which the team in question has played a game. An approximation to this can be achieved by dividing each of the four seasons into 21 equally long time intervals (17 regular and 4 play-off, including Superbowl, weeks), giving 84 time-points overall. However, that still requires estimation of $\sum_{t=1}^{84}(62t + 5) = 221{,}760$ distributions overall.
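The counts quoted above can be checked directly; a short sketch, using the counting conventions described in the text:

```python
# Verify the parameter counts: the MLE routine returns point estimates,
# while the MCMC routine returns a distribution for each team parameter at
# every time-point up to t, plus 5 global quantities.
teams, days = 31, 182

print((teams * 2 + 3) * days)                       # 11830 MLE point estimates
print(sum(62 * t + 5 for t in range(1, days + 1)))  # 1033396 distributions
print(sum(62 * t + 5 for t in range(1, 85)))        # 221760 with 84 time-points
```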
Using the MLE approach, parameter estimates for the entire four years can be obtained in approximately 35 minutes. This compares favourably to the 27 minutes required to run the 25,000 iterations of the earlier MCMC routine for just the final time-point. Crowder et al. (2002) report similar findings. While greater efficiency and reliability could be achieved using a more powerful computer and more refined software, MCMC estimation would remain a lengthy task. It is unlikely that it would be fast enough to make investigations into model enhancements practicable, especially when one considers that, to investigate the effect of any model adjustment, the estimation process is run at many time-points throughout the data set. The MCMC techniques could be pursued further with regard to this application should considerable advances be made in computing power and MCMC software. Hence, on balance, while MCMC is certainly the more attractive approach from a statistical point of view, from a practical point of view the MLE method is much easier to implement and is also far more suitable for the process of model development. Thus the MLE method has been the preferred estimation method throughout this thesis.

7.6 Additional comments and information - Markov Chain Monte Carlo methods: a brief summary
In statistical analysis it is often necessary to study a data set via a multivariate interdependent set of variables $\Theta$. If the analysis is being carried out within a Bayesian framework, $\Theta$ is a set of parameters, and if the analysis is carried out within a frequentist framework, $\Theta$ is a set of observable data values. If there are various characteristics of $\Theta$ that are of interest, such as modes, highest posterior density regions or quantiles of individual members, then MCMC methods, which update subsets of the members of $\Theta$, can produce many samples from a Markov chain whose stationary distribution is $\pi_\Theta$; hence inferences can be drawn about $\Theta$ using relatively straightforward analysis of these samples. There are various techniques employed in order to produce such a Markov chain. All are explained in more detail by Gilks et al. (1995).
The most general specification of the MCMC technique employs the Metropolis-Hastings algorithm. It produces a sequence of generated values $\Theta_1, \ldots, \Theta_n$ which, for a suitably large value of $n$, represent a sample from $\pi_\Theta$. Firstly a proposal distribution $q(\cdot \mid \Theta_t)$ is specified, along with an initial value $\Theta_0$. Next, a `candidate' value $\Theta^*$ is generated from $q(\cdot \mid \Theta_t)$ and accepted as the next value in the sequence with probability

$$\alpha(\Theta_t, \Theta^*) = \min\left(1,\; \frac{\pi_\Theta(\Theta^*)\, q(\Theta_t \mid \Theta^*)}{\pi_\Theta(\Theta_t)\, q(\Theta^* \mid \Theta_t)}\right) \qquad (7.6.1)$$

otherwise the current value is retained. Whatever the choice of proposal distribution, the stationary distribution of $\Theta_1, \ldots, \Theta_n$ is always $\pi_\Theta$; however, the choice of proposal distribution affects how quickly the chain converges. Also, it is necessary to have a `burn-in' period of $m$ iterations, so that only the samples $\Theta_{m+1}, \ldots, \Theta_n$ are considered to be representative samples from $\pi_\Theta$. In this way the samples are not affected by the choice of initial value. There are various ways of approaching the task of finding a suitable choice of $q(\cdot \mid \cdot)$. For example, the Metropolis algorithm involves choosing only symmetric forms, hence Equation 7.6.1 reduces to

$$\alpha(\Theta_t, \Theta^*) = \min\left(1,\; \frac{\pi_\Theta(\Theta^*)}{\pi_\Theta(\Theta_t)}\right)$$
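A minimal random-walk Metropolis sketch for a one-dimensional target illustrates the algorithm; the target density, step size and chain length are illustrative choices, not part of the thesis model.

```python
# Random-walk Metropolis: the proposal is symmetric, so a candidate is
# accepted with probability min(1, pi(candidate)/pi(current)), as above.
import math
import random

def log_target(x):
    return -0.5 * x * x          # standard normal target, up to a constant

def metropolis(n, step=1.0, x0=0.0):
    x, samples = x0, []
    for _ in range(n):
        candidate = random.gauss(x, step)        # symmetric proposal
        if math.log(random.random()) < log_target(candidate) - log_target(x):
            x = candidate                        # accept; otherwise keep x
        samples.append(x)
    return samples

random.seed(0)
draws = metropolis(20000)[5000:]                 # discard a burn-in period
print(round(sum(draws) / len(draws), 3))         # sample mean close to 0
```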
An alternative strategy, known as single-component Metropolis-Hastings, involves decomposing $\Theta$ into a set of smaller subsets $\Theta_1, \ldots, \Theta_r$. These individual components are then updated sequentially during each iteration, subject to an acceptance criterion similar to that of Equation 7.6.1. The most common example of single-component Metropolis-Hastings, and in fact the most widely used MCMC routine at the time of writing, is the Gibbs Sampler. When applying a Gibbs Sampler, the proposal distribution for updating the $i$th component of $\Theta_t$ is the full conditional distribution

$$\pi(\Theta_i \mid \Theta_1, \ldots, \Theta_{i-1}, \Theta_{i+1}, \ldots, \Theta_K)$$

where $\Theta$ is of dimension $K$.
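As an illustration, a Gibbs sampler for a bivariate normal with correlation $\rho$, where both full conditional distributions are available in closed form. This is a standard textbook example, not the NFL model of this chapter.

```python
# Gibbs sampling for (X, Y) standard bivariate normal with correlation rho:
# the full conditionals are X | Y=y ~ N(rho*y, 1-rho**2) and symmetrically
# Y | X=x ~ N(rho*x, 1-rho**2).
import random

def gibbs_bivariate_normal(n, rho=0.8):
    x = y = 0.0
    sd = (1 - rho * rho) ** 0.5
    samples = []
    for _ in range(n):
        x = random.gauss(rho * y, sd)   # draw from full conditional of X
        y = random.gauss(rho * x, sd)   # draw from full conditional of Y
        samples.append((x, y))
    return samples

random.seed(0)
draws = gibbs_bivariate_normal(20000)[2000:]                # discard burn-in
print(round(sum(x * y for x, y in draws) / len(draws), 2))  # approx rho
```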
One key task when using MCMC methods is monitoring convergence of the chain of values, to ensure that the chain has settled into the required distribution. One way to do this, which the WinBUGS software implements, was outlined by Gelman and Rubin (1992). It involves running several chains in parallel and checking that they overlap to a satisfactory extent. To do this, define $\Psi$ as a scalar summary figure of the simulated values of one of the parameters. Then two estimates of $\mathrm{var}(\Psi)$ can be made. Defining $\psi_{ij}$ to be the $j$th realisation of the summary figure of the values from chain $i$, with $m$ chains each of length $n$, the within-chain estimate is

$$W = \frac{1}{m}\sum_{i=1}^{m} s_i^2, \qquad \text{where } s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(\psi_{ij} - \bar{\psi}_{i\cdot}\right)^2$$

and the between-chain variance is

$$B = \frac{n}{m-1}\sum_{i=1}^{m}\left(\bar{\psi}_{i\cdot} - \bar{\psi}_{\cdot\cdot}\right)^2$$

A weighted combination of $B$ and $W$, thus

$$\widehat{\mathrm{var}}(\Psi) = \frac{n-1}{n}\,W + \frac{1}{n}\,B$$

is a second estimate of $\mathrm{var}(\Psi)$. $\widehat{\mathrm{var}}(\Psi)$ is initially larger than $\mathrm{var}(\Psi)$, while $W$ is initially lower; however, both approach $\mathrm{var}(\Psi)$ as the chains converge, so that the potential scale reduction estimate

$$\hat{R} = \sqrt{\widehat{\mathrm{var}}(\Psi)/W}$$

approaches 1. Plots of $\hat{R}$ using the parameter realisations as the scalar summary are included in the output provided for the application of MCMC in this chapter.
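The diagnostic can be computed directly from the definitions above; a minimal sketch (WinBUGS implements a slightly more elaborate, interval-based version of the statistic):

```python
# Gelman-Rubin potential scale reduction: compare the within-chain variance W
# with the pooled estimate var_hat = ((n-1)/n)*W + (1/n)*B.
import random

def gelman_rubin(chains):
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    B = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    var_hat = (n - 1) / n * W + B / n
    return (var_hat / W) ** 0.5

random.seed(0)
chains = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(5)]
print(round(gelman_rubin(chains), 3))   # close to 1 for well-mixed chains
```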
Chapter 8
Conclusion
As stated in Chapter 1, the aim of this thesis was to develop models for sporting events that produced probabilities that were at least as accurate as those inferred from odds offered by professional bookmakers. Three attempts have been made to achieve this, but only one, concerning the rate at which yellow and red cards are collected by soccer teams, can be considered successful in this respect.
It is not clear whether the greater success rate of the bookings model is because the model is more accurate, or because the market for bookings bets is less effective at forcing the mean towards the "true" probability than the markets for NFL and NBA scores. It is not possible to produce figures such as the predictive likelihood for bookmakers' odds, since only their expected mean is provided in the case of spread betting and only their expected prediction for the median is provided in the case of fixed odds handicap betting. The entire probability density for all outcomes is required for most summary statistics of predictive capability. Other commonly used goodness-of-fit statistics can only be used to compare nested models on the same data set, and cannot be used to compare the accuracy of different predictions for different sports. Hence a comparison of the accuracy of the central spreads for bookings with the accuracy of the NFL lines, for example, is not possible.
One possible explanation for the greater success of the bookings model is the amount of match-specific information incorporated into each model prediction. As well as both teams' individual tendencies to attract and provoke bookings, the referee, the difference in ability of the two teams, historical rivalries and match-specific incentives are accommodated in each prediction. Enhancing the NFL and NBA models in this way is necessary and entirely feasible, since the data concerning which players are injured, or whether a team has any unusual incentive, is readily available. Further data could be used to enhance all models covered so far. For example, data is also available concerning the times at which points are scored or bookings are collected. Using these, it is possible to improve not only the predictions generated for the match totals before the fixture takes place, as has been attempted throughout this thesis, but also to generate predictions while the match is in progress.
A further advantage of the statistically-based approach is that each stage of a model's construction can be analysed and developed further if necessary. This does not apply to the intuitive approach, since once the knowledge concerning the sport has been acquired, and the skills in converting this knowledge into probabilities in a reliable way have been developed, it is not clear how any further improvements to the system can be made. While not all of the models in this thesis have the desired predictive capability, the approaches described here provide a sound basis for such improvements to be made.
Bibliography

[1] A Colin Cameron and Pravin K Trivedi (1998) Regression Analysis of Count Data Cambridge University Press

[2] Martin Crowder, Mark Dixon, Anthony Ledford and Mike Robinson (2002) "Dynamic modelling and prediction of English Football League matches for betting" The Statistician Vol. 51

[3] M Dixon and S Coles (1997) "Modelling association football scores and inefficiencies in the football betting market" Applied Statistics Vol. 46, 265-280

[4] Ludwig Fahrmeir and Gerhard Tutz (1994) "Dynamic stochastic models for time-dependent ordered paired comparison systems" Journal of the American Statistical Association Vol. 89, 1438-1449

[5] David Forrest and Robert Simmons (2000a) "Forecasting sport: the behaviour and performance of football tipsters" International Journal of Forecasting Vol. 16

[6] David Forrest and Robert Simmons (2000b) "Making up the results: the work of the Football Pools Panel, 1963-1997" The Statistician Vol. 49

[7] John M Gandar, Richard A Zuber and Reinhold P Lamb (2001) "The home field advantage revisited: a search for the bias in other sports betting markets" Journal of Economics and Business Vol. 53

[10] WR Gilks, S Richardson and DJ Spiegelhalter (1995) Markov Chain Monte Carlo in Practice Chapman and Hall

[11] Mark E Glickman and Hal S Stern (1998) "A state-space model for National Football League scores" Journal of the American Statistical Association Vol. 93, 25-35

[12] David Harville (1980) "Predictions for National Football League games via linear-model methodology" Journal of the American Statistical Association Vol. 75, No. 371, 516-524

[13] Nobuyoshi Hirotsu and Mike Wright (2002) "Using a Markov process model of an association football match to determine the optimal timing of substitution and tactical decisions" Journal of the Operational Research Society Vol. 53, 88-96

[14] Leonhard Knorr-Held (2000) "Dynamic rating of sports teams" The Statistician Vol. 49, 261-276

[17] G Ridder, JS Cramer and P Hopstaken (1994) "Down to ten: estimating the effect of a red card in soccer" Journal of the American Statistical Association Vol. 89, 1124-1127

[19] Robert Simmons, David Forrest and Anthony Curran (2003) "Efficiency in the handicap and index betting markets for English rugby league" in The Economy of Gambling, ed. Leighton Vaughan-Williams (Routledge)

[21] Raymond Stefani (1977) "Football and basketball predictions using least squares" IEEE Transactions on Systems, Man and Cybernetics February 1977

[22] Raymond Stefani (1980) "Improved least squares football, basketball and soccer predictions" IEEE Transactions on Systems, Man and Cybernetics Vol. 10, 116-123

[23] Hal Stern (1991) "On the probability of winning a football game" The American Statistician Vol. 45, 179-183

[24] Roger C Vergin (2001) "Overreaction in the NFL point spread market" Applied Financial Economics Vol. 11

[25] MP Wand and MC Jones (1995) Kernel Smoothing Chapman and Hall, London