Markov Chains and Stochastic Stability
Second Edition
The bible on Markov chains in general state spaces has been brought up to date to
reflect developments in the field since 1996 – many of them sparked by publication
of the first edition.
The pursuit of more efficient simulation algorithms for complex Markovian
models, or algorithms for computation of optimal policies for controlled Markov
models, has opened new directions for research on Markov chains. As a result,
new applications have emerged across a wide range of topics including optimiza-
tion, statistics, and economics. New commentary and an epilogue by Sean Meyn
summarize recent developments, and references have been fully updated.
This second edition reflects the same discipline and style that marked out the
original and helped it to become a classic: proofs are rigorous and concise, the
range of applications is broad and knowledgeable, and key ideas are accessible to
practitioners with limited mathematical background.
Information on this title: www.cambridge.org/9780521731829
Asterisks (*) mark sections from the first edition that have been revised or augmented
in the second edition.
List of figures xi
1 Heuristics 3
1.1 A range of Markovian environments 3
1.2 Basic models in practice 6
1.3 Stochastic stability for Markov models 13
1.4 Commentary 19
2 Markov models 21
2.1 Markov models in time series 22
2.2 Nonlinear state space models* 26
2.3 Models in control and systems theory 33
2.4 Markov models with regeneration times 38
2.5 Commentary* 46
3 Transition probabilities 48
3.1 Defining a Markovian process 49
3.2 Foundations on a countable space 51
3.3 Specific transition matrices 54
3.4 Foundations for general state space chains 59
3.5 Building transition kernels for specific models 67
3.6 Commentary 72
4 Irreducibility 75
4.1 Communication and irreducibility: Countable spaces 76
4.2 ψ-Irreducibility 81
4.3 ψ-Irreducibility for random walk models 87
4.4 ψ-Irreducible linear models 89
4.5 Commentary 93
5 Pseudo-atoms 96
5.1 Splitting ϕ-irreducible chains 97
5.2 Small sets 102
5.3 Small sets for specific models 106
5.4 Cyclic behavior 110
5.5 Petite sets and sampled chains 115
5.6 Commentary 121
18 Positivity 462
18.1 Null recurrent chains 464
18.2 Characterizing positivity using P^n 469
18.3 Positivity and T-chains 471
18.4 Positivity and e-chains 473
18.5 The LLN for e-chains 477
18.6 Commentary 480
IV APPENDICES 529
A Mud maps 532
A.1 Recurrence versus transience 532
A.2 Positivity versus nullity 534
A.3 Convergence properties 536
Bibliography 567
Indexes 587
General index 587
Symbols 593
List of figures
16.1 Simple adaptive control model when the control is set equal to zero 418
20.1 Estimates of the steady state customer population for a network model 522
B.1 The SETAR model: stability classification of (θ(1), θ(M))-space 540
B.2 The SETAR model: stability classification of (θ(1), θ(M))-space 541
B.3 The SETAR model: stability classification of (θ(1), θ(M))-space 542
Prologue to the second edition
Markov Chains and Stochastic Stability is one of those rare instances of a young book
that has become a classic. In understanding why the community has come to regard
the book as a classic, it should be noted that all the key ingredients are present. Firstly,
the material that is covered is both interesting mathematically and central to a number
of important application domains. Secondly, the core mathematical content is non-
trivial and had been in constant evolution over the years and decades prior to the
first edition’s publication; key papers were scattered across the literature and had been
published in widely diverse journals. So, there was an obvious need for a thoughtful
and well-organized book on the topic. Thirdly, and most important, the topic attracted
two authors who were research experts in the area and endowed with remarkable skill
in communicating complex ideas to specialists and applications-focused users alike,
and who also exhibited superb taste in deciding which key ideas and approaches to
emphasize.
When the first edition of the book was published in 1993, Markov chains already
had a long tradition as mathematical models for stochastically evolving dynamical sys-
tems arising in the physical sciences, economics, and engineering, largely centered on
discrete state space formulations. A great deal of Markov chain theory had been developed, both in discrete state space and general state space. However,
the general state space theory had grown to include multiple (and somewhat divergent)
mathematical strands, having much to do with the fact that there are several natural
(but different) ways that one can choose to generalize the fundamental countable state
concept of irreducibility to general state space. Roughly speaking, one strand took ad-
vantage of topological ideas, compactness methods, and required Feller continuity of the
transition kernel. The second major strand, starting with the pioneering work of Harris
in the 1950s, subsequently amplified by Orey, and later simplified through the beautiful
contributions of Nummelin, Athreya, and Ney in the 1970s, can be viewed as an effort
to understand general state space Markov chains through the prism of regeneration.
Thus, Meyn and Tweedie had to make some key decisions regarding the general state
space tools that they would emphasize in the book. The span of time that has elapsed
since this book’s publication makes clear that they chose well.
While offering an excellent and accessible discussion of methods based on topologi-
cal machinery, the book focuses largely on the more widely applicable and more easily
used concept of regeneration in general state space. In addition, the book recognizes
the central role that Foster–Lyapunov functions play in verifying recurrence and bound-
ing the moments and expectations that arise naturally in development of the theory of
Markov chains. In choosing to emphasize these ideas, the authors were able to offer
the community, and especially practitioners, a convenient and easily applied roadmap
through a set of concepts and ideas that had previously been accessible only to special-
ists. Sparked by the publication of the first edition of this book, there has subsequently
been an explosion in the number of papers involving applications of general state space
Markov chains.
As it turns out, the period that has elapsed since publication of the first edition
also fortuitously coincided with the rapid development of several key applications areas
in which the tools developed in the book have played a fundamental role. Perhaps
the most important such application is that of Markov chain Monte Carlo (MCMC)
algorithms. In the MCMC setting, the basic problem at hand is the construction of an
efficient algorithm capable of sampling from a given target distribution, which is known
up to a normalization constant that is not numerically or analytically computable. The
idea is to produce a Markov chain having a unique stationary distribution that coincides
with the target distribution. Constructing such a Markov chain is typically easy, so one
has many potential choices. Since the algorithm is usually initialized with a
distribution that is atypical of equilibrium behavior, one then wishes to find a chain
that converges to its steady state rapidly. The tools discussed in this book play a
central role in answering such questions. General state space Markov chain ideas also
have been used to great effect in other rapidly developing algorithmic contexts such as
machine learning and in the analysis of the many randomized algorithms having a time
evolution described by a stochastic recursive sequence. Finally, many of the performance
engineering applications that have been explored over the past fifteen years leverage this body of theory, particularly those results that have involved trying to make
rigorous the connection between stability of deterministic fluid models and stability of
the associated stochastic queueing analogue. Given the ubiquitous nature of stochastic
systems or algorithms described through stochastic recursive sequences, it seems likely
that many more applications of the theory described in this book will arise in the years
ahead. So, the marketplace of potential consumers of this book is likely to be a healthy
one for many years to come.
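To make the MCMC construction described above concrete, here is a minimal random-walk Metropolis sampler in Python. This is an editorial sketch, not material from the book; the Gaussian target, step size, and run length are illustrative assumptions. Note how the acceptance ratio uses only the unnormalized target, so the intractable normalization constant is never computed:

```python
import numpy as np

def metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: a Markov chain whose unique stationary
    distribution is the target, known only up to a constant."""
    rng = np.random.default_rng(seed)
    x, chain = x0, np.empty(n_steps)
    for k in range(n_steps):
        y = x + step * rng.standard_normal()   # symmetric proposal
        # Accept with probability min(1, target(y)/target(x)); the
        # unknown normalization constant cancels in this ratio.
        if np.log(rng.uniform()) < log_target(y) - log_target(x):
            x = y
        chain[k] = x
    return chain

# Unnormalized standard Gaussian target, started far from equilibrium.
samples = metropolis(lambda x: -0.5 * x**2, x0=10.0, n_steps=20_000)
print(samples[5000:].mean(), samples[5000:].std())  # ≈ 0 and ≈ 1 after burn-in
```

The deliberately atypical initialization at x0 = 10.0 mirrors the situation described above: how quickly such a chain forgets its starting point is precisely the convergence question that the tools discussed in this book address.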
Even the appendices are testimony to the hard work and exacting standards the
authors brought to this project. Through additional (and very useful) discussion, these
appendices provide readers with an opportunity to see the power of the concepts of
stability and recurrence being exercised in the setting of models that are both mathe-
matically interesting and of importance in their own right. In fact, some readers will
find that the appendices are a good way to quickly remind themselves of the methods
that exist to establish a particular desired property of a Markov chain model.
This second edition remains true to the remarkable standards of scholarship estab-
lished by the first edition. As noted above, a number of application domains that
are consumers of this theory have developed rapidly since the publication of the first
edition. As one would expect with any mathematically vibrant area, there have also
been important theoretical developments over that span of time, ranging from the ex-
ploration of these ideas in studying large deviations for additive functionals of Markov
chains to the generalization of these concepts to the setting of continuous time Markov
processes. This new edition does a splendid job of making clear the most important
such developments and pointing the reader in the direction of the key references to be
studied in each area. With the background offered by this book, the reader who wishes
to explore these recent theoretical developments is well positioned both to read the
literature and to creatively apply these ideas to the problem at hand. All the elements
that made the first edition of Markov Chains and Stochastic Stability a classic are here
in the second edition, and it will no doubt be a very welcome addition to the literature.
Peter W. Glynn
Palo Alto
Preface to the second edition
Statistics <www.imstat.org/awards/tweedie.html>.
would support. Ideally I would rewrite Chapters 15 and 16 to provide a more cohesive
treatment of geometric ergodicity, and explain how these ideas lead to foundations for
multiplicative ergodic theory, Lyapunov exponents, and the theory of large deviations.
This will have to wait for a third edition or a new book devoted to these topics. In its
place, I have provided in Section 20.1 a brief survey of these directions of research.
Section 20.2: Simulation and MCMC Richard Tweedie and I became interested
in these topics soon after the first edition went to print. Section 20.2 describes applica-
tions of general state space Markov chain techniques to the construction and analysis
of simulation algorithms, such as the control variate method [10], and algorithms found
in reinforcement learning [29, 379].
Section 20.3: Continuous time models The final section explains how theory
in continuous time can be generated from discrete time counterparts developed in this
book. In particular, all of the ergodic theorems in Part III have precise analogues in
continuous time.
The significance of Poisson’s equation was not properly highlighted in the first edi-
tion. This is rectified in a detailed commentary at the close of Chapter 17, which
includes a menu of applications, and new results on existence and uniqueness of solu-
tions to Poisson’s equation, contained in Theorems 17.7.1 and 17.7.2, respectively.
The multi-step drift criterion for stability described in Section 19.1 has been im-
proved, and this technique has found many applications. The resulting “fluid model”
approach to stability of stochastic networks is one theme of the new monograph [267].
Extensions of the techniques in Section 19.1 have found application to the theory of
stochastic approximation [40, 39], and to Markov chain Monte Carlo (MCMC) [100].
It is surprising how few errors have been uncovered since the first edition went to
print. Section 2.2.3 on the gumleaf attractor contained errors in the description of the
figures. There were other minor errors in the analysis of the forward recurrence time
chains in Section 10.3.1, and the coupling bound in Theorem 16.2.4. The term limiting
variance is now replaced by the more familiar asymptotic variance in Chapter 17, and
starting in Chapter 9 the term norm-like is replaced with the more familiar coercive.
Words of thanks
Continued support from the National Science Foundation is gratefully acknowledged.
Over the past decade, support from Control, Networks and Computational Intelligence
has funded much of the theory and applications surveyed in Chapter 20 under grants
ECS 940372, ECS 9972957, ECS 0217836, and ECS 0523620. The NSF grant DMI
0085165 supported research with Shane Henderson that is surveyed in Section 20.2.1.
It is a pleasure to convey my thanks to my wonderful editor Diana Gillooly. It was
her idea to place the book in the Cambridge Mathematical Library series. In addition
to her work “behind the scenes” at Cambridge University Press, Diana dissected the
manuscript searching for typos or inconsistencies in notation. She provided valuable
advice on structure, and patiently answered all of my questions.
Jeffrey Rosenthal has maintained the website for the online version of the first edition
at probability.ca/MT. It is reassuring to know that this resource will remain in place
“till death do us part.”
In the preface to the first edition, we expressed our thanks to Peter Glynn for
his correspondence and inspiration. I am very grateful that our correspondence has
continued over the past 15 years. Much of the material contained in the surveys in the
new Chapter 20 can be regarded as part of “transcripts” from our many discussions
since the book was first put into print.
I am very grateful to Ioannis Kontoyiannis for collaborations over the past decade.
Ioannis provided comments on the new edition, including the discovery of an error in
Theorem 16.2.4. Many have sent comments over the years. In particular, Vivek Borkar,
Jan van Casteren, Peter Haas, Lars Hansen, Galin Jones, Aziz Khanchi, Tze Lai, Zhan-
Qian Lu, Abdelkader Mokkadem, Eric Moulines, Gareth Roberts, Li-Ming Wu, and
three graduates from the University of Oslo – Tore W. Larsen, Arvid Raknerud, and
Øivind Skare – all pointed out errors that have been corrected in the new edition, or
suggested recent references that are now included in the updated bibliography.
Sean Meyn
Urbana-Champaign
Preface to the first edition
(1993)
Books are individual and idiosyncratic. In trying to understand what makes a good
book, there is a limited amount that one can learn from other books; but at least one
can read their prefaces, in hope of help.
Our own research shows that authors use prefaces for many different reasons.
Prefaces can be explanations of the role and the contents of the book, as in Chung
[71] or Revuz [326] or Nummelin [303]; this can be combined with what is almost an
apology for bothering the reader, as in Billingsley [37] or Çinlar [59]; prefaces can
describe the mathematics, as in Orey [309], or the importance of the applications, as
in Tong [388] or Asmussen [9], or the way in which the book works as a text, as in
Brockwell and Davis [51] or Revuz [326]; they can be the only available outlet for
thanking those who made the task of writing possible, as in almost all of the above
(although we particularly like the familial gratitude of Resnick [325] and the dedication
of Simmons [355]); they can combine all these roles, and many more.
This preface is no different. Let us begin with those we hope will use the book.
coupling methods, allows simple renewal approaches to almost all of the hard results.
Courses on countable space Markov chains abound, not only in statistics and math-
ematics departments, but in engineering schools, operations research groups and even
business schools. This book can serve as the text in most of these environments for a
one-semester course on more general space applied Markov chain theory, provided that
some of the deeper limit results are omitted and (in the interests of a fourteen-week
semester) the class is directed only to a subset of the examples, concentrating as best
suits their discipline on time series analysis, control and systems models or operations
research models.
The prerequisite texts for such a course are certainly at no deeper level than Chung
[72], Breiman [48], or Billingsley [37] for measure theory and stochastic processes, and
Simmons [355] or Rudin [345] for topology and analysis.
Be warned: we have not provided numerous illustrative unworked examples for the
student to cut teeth on. But we have developed a rather large number of thoroughly
worked examples, ensuring applications are well understood; and the literature is lit-
tered with variations for teaching purposes, many of which we reference explicitly.
This regular interplay between theory and detailed consideration of application to
specific models is one thread that guides the development of this book, as it guides the
rapidly growing usage of Markov models on general spaces by many practitioners.
The second group of readers we envisage consists of exactly those practitioners, in
several disparate areas, for all of whom we have tried to provide a set of research and
development tools: for engineers in control theory, through a discussion of linear and
nonlinear state space systems; for statisticians and probabilists in the related areas of
time series analysis; for researchers in systems analysis, through networking models for
which these techniques are becoming increasingly fruitful; and for applied probabilists,
interested in queueing and storage models and related analyses.
We have tried from the beginning to convey the applied value of the theory rather
than let it develop in a vacuum. The practitioner will find detailed examples of tran-
sition probabilities for real models. These models are classified systematically into the
various structural classes as we define them. The impact of the theory on the models
is developed in detail, not just to give examples of that theory but because the mod-
els themselves are important and there are relatively few places outside the research
journals where their analysis is collected.
Of course, there is only so much that a general theory of Markov chains can provide
to all of these areas. The contribution is in general qualitative, not quantitative. And
in our experience, the critical qualitative aspects are those of stability of the models.
Classification of a model as stable in some sense is the first fundamental operation un-
derlying other, more model-specific, analyses. It is, we think, astonishing how powerful
and accurate such a classification can become when using only the apparently blunt
instruments of a general Markovian theory: we hope the strength of the results de-
scribed here is equally visible to the reader as to the authors, for this is why we have
chosen stability analysis as the cord binding together the theory and the applications
of Markov chains.
We have adopted two novel approaches in writing this book. The reader will find
key theorems announced at the beginning of all but the discursive chapters; if these
are understood then the more detailed theory in the body of the chapter will be better
motivated, and applications made more straightforward. And at the end of the book we
have constructed, at the risk of repetition, “mud maps” showing the crucial equivalences
between forms of stability, and we give a glossary of the models we evaluate. We trust
both of these innovations will help to make the material accessible to the full range of
readers we have considered.
of continuous time processes, but the remark is equally apt for discrete time models of
the period. We hope that it will be apparent in this book that the general space theory
has not only caught up with its countable counterpart in the areas we describe, but has
indeed added considerably to the ways in which the simpler systems are approached.
There are several themes in this book which instance both the maturity and the
novelty of the general space model, and which we feel deserve mention, even in the
restricted level of technicality available in a preface. These are, specifically,
(i) the use of the splitting technique, which provides an approach to general state
space chains through regeneration methods;
(ii) the use of “Foster–Lyapunov” drift criteria, both in improving the theory and in
enabling the classification of individual chains;
(iii) the delineation of appropriate continuity conditions to link the general theory with
the properties of chains on, in particular, Euclidean space; and
(iv) the development of control model approaches, enabling analysis of models from
their deterministic counterparts.
These are not distinct themes: they interweave to a surprising extent in the mathematics
and its implementation.
The key factor is undoubtedly the existence and consequences of the Nummelin
splitting technique of Chapter 5, whereby it is shown that if a chain {Φn } on a quite
general space satisfies the simple “ϕ-irreducibility” condition (which requires that for
some measure ϕ, there is at least positive probability from any initial point x that one
of the Φn lies in any set of positive ϕ-measure; see Chapter 4), then one can induce an
artificial “regeneration time” in the chain, allowing all of the mechanisms of discrete
time renewal theory to be brought to bear.
Part I is largely devoted to developing this theme and related concepts, and their
practical implementation.
The splitting method enables essentially all of the results known for countable space
to be replicated for general spaces. Although that by itself is a major achievement,
it also has the side benefit that it forces concentration on the aspects of the theory
that depend, not on a countable space which gives regeneration at every step, but on
a single regeneration point. Part II develops the use of the splitting method, amongst
other approaches, in providing a full analogue of the positive recurrence/null recur-
rence/transience trichotomy central in the exposition of countable space chains, together
with consequences of this trichotomy.
In developing such structures, the theory of general space chains has merely caught
up with its denumerable progenitor. Somewhat surprisingly, in considering asymptotic
results for positive recurrent chains, as we do in Part III, the concentration on a sin-
gle regenerative state leads to stronger ergodic theorems (in terms of total variation
convergence), better rates of convergence results, and a more uniform set of equivalent
conditions for the strong stability regime known as positive recurrence than is typically
realised for countable space chains.
The outcomes of this splitting technique approach are possibly best exemplified in
the case of so-called “geometrically ergodic” chains.
Let τ_C be the hitting time on any set C: that is, the first time that the chain Φ_n returns to C; and let P^n(x, A) = P(Φ_n ∈ A | Φ_0 = x) denote the probability that the chain is in a set A at time n given it starts at time zero in state x, or the “n-step transition probabilities”, of the chain. One of the goals of Part II and Part III is to link conditions under which the chain returns quickly to “small” sets C (such as finite or compact sets), measured in terms of moments of τ_C , with conditions under which the probabilities P^n(x, A) converge to limiting distributions.
Here is a taste of what can be achieved. We will eventually show, in Chapter 15,
the following elegant result:
The following conditions are all equivalent for a ϕ-irreducible “aperiodic” (see Chap-
ter 5) chain:
(A) For some one “small” set C, the return time distributions have geometric tails; that is, for some r > 1,
sup_{x∈C} E_x[r^{τ_C}] < ∞.
(B) For some one “small” set C, the transition probabilities converge geometrically quickly; that is, for some M < ∞, P^∞(C) > 0 and ρ_C < 1,
sup_{x∈C} |P^n(x, C) − P^∞(C)| ≤ M ρ_C^n .
(C) For some one “small” set C, there is “geometric drift” towards C; that is, for some function V ≥ 1 and some β > 0,
∫ P(x, dy)V(y) ≤ (1 − β)V(x) + I_C(x).
Each of these implies that there is a limiting probability measure π, a constant R < ∞
and some uniform rate ρ < 1 such that
sup_{|f|≤V} | ∫ P^n(x, dy)f(y) − ∫ π(dy)f(y) | ≤ R V(x) ρ^n .
Who do we owe?
Like most authors we owe our debts, professional and personal. A preface is a good
place to acknowledge them.
The alphabetically and chronologically younger author began studying Markov
chains at McGill University in Montréal. John Taylor introduced him to the beauty
of probability. The excellent teaching of Michael Kaplan provided a first contact with
Markov chains and a unique perspective on the structure of stochastic models.
He is especially happy to have the chance to thank Peter Caines for planting him in
one of the most fantastic cities in North America, and for the friendship and academic
environment that he subsequently provided.
In applying these results, very considerable input and insight has been provided by
Lei Guo of Academia Sinica in Beijing and Doug Down of the University of Illinois.
Some of the material on control theory and on queues in particular owes much to their
collaboration in the original derivations.
He is now especially fortunate to work in close proximity to P.R. Kumar, who has
been a consistent inspiration, particularly through his work on queueing networks and
adaptive control. Others who have helped him, by corresponding on current research, by
sharing enlightenment about a new application, or by developing new theoretical ideas,
include Venkat Anantharam, A. Ganesh, Peter Glynn, Wolfgang Kliemann, Laurent
Praly, John Sadowsky, Karl Sigman, and Victor Solo.
The alphabetically later and older author has a correspondingly longer list of in-
fluences who have led to his abiding interest in this subject. Five stand out: Chip
Heathcote and Eugene Seneta at the Australian National University, who first taught
the enjoyment of Markov chains; David Kendall at Cambridge, whose own fundamental
work exemplifies the power, the beauty and the need to seek the underlying simplicity of
such processes; Joe Gani, whose unflagging enthusiasm and support for the interaction
of real theory and real problems has been an example for many years; and probably
most significantly for the developments in this book, David Vere-Jones, who has shown
an uncanny knack for asking exactly the right questions at times when just enough was
known to be able to develop answers to them.
It was also a pleasure and a piece of good fortune for him to work with the Finnish
school of Esa Nummelin, Pekka Tuominen and Elja Arjas just as the splitting technique
was uncovered, and a large amount of the material in this book can actually be traced
to the month surrounding the First Tuusula Summer School in 1976. Applying the
methods over the years with David Pollard, Paul Feigin, Sid Resnick and Peter Brock-
well has also been both illuminating and enjoyable; whilst the ongoing stimulation and
encouragement to look at new areas given by Wojtek Szpankowski, Floske Spieksma,
Chris Adam and Kerrie Mengersen has been invaluable in maintaining enthusiasm and
energy in finishing this book.
By sheer coincidence both of us have held Postdoctoral Fellowships at the Australian
National University, albeit at somewhat different times. Both of us started much of our
own work in this field under that system, and we gratefully acknowledge those most
useful positions, even now that they are long past.
More recently, the support of our institutions has been invaluable. Bond University
facilitated our embryonic work together, whilst the Coordinated Sciences Laboratory of
the University of Illinois and the Department of Statistics at Colorado State University
have been enjoyable environments in which to do the actual writing.
Support from the National Science Foundation is gratefully acknowledged: grants
ECS 8910088 and DMS 9205687 enabled us to meet regularly, helped to fund our
students in related research, and partially supported the completion of the book.
Writing a book from multiple locations involves multiple meetings at every available
opportunity. We appreciated the support of Peter Caines in Montréal, Bozenna and
Tyrone Duncan at the University of Kansas, Will Gersch in Hawaii, Götz Kersting and
Heinrich Hering in Germany, for assisting in our meeting regularly and helping with
far-flung facilities.
Peter Brockwell, Kung-Sik Chan, Richard Davis, Doug Down, Kerrie Mengersen,
Rayadurgam Ravikanth, and Pekka Tuominen, and most significantly Vladimir Kalash-
nikov and Floske Spieksma, read fragments or reams of manuscript as we produced
them, and we gratefully acknowledge their advice, comments, corrections and encour-
agement. It is traditional, and in this case as accurate as usual, to say that any remain-
ing infelicities are there despite their best efforts.
Rayadurgam Ravikanth produced the sample path graphs for us; Bob MacFarlane
drew the remaining illustrations; and Francie Bridges produced much of the bibliogra-
phy and some of the text. The vast bulk of the material we have done ourselves: our
debt to Donald Knuth and the developers of LATEX is clear and immense, as is our debt
to Deepa Ramaswamy, Molly Shor, Rich Sutton and all those others who have kept
software, email and remote telematic facilities running smoothly.
Lastly, we are grateful to Brad Dickinson and Eduardo Sontag, and to Zvi Ruder
and Nicholas Pinfield and the Engineering and Control Series staff at Springer, for their
patience, encouragement and help.
And finally . . .
And finally, like all authors whether they say so in the preface or not, we have received
support beyond the call of duty from our families. Writing a book of this magnitude has
taken much time that should have been spent with them, and they have been unfailingly
supportive of the enterprise, and remarkably patient and tolerant in the face of our quite
unreasonable exclusion of other interests.
They have lived with family holidays where we scribbled proto-books in restaurants
and tripped over deer whilst discussing Doeblin decompositions; they have endured
sundry absences and visitations, with no idea of which was worse; they have seen come
and go a series of deadlines with all of the structure of a renewal process.
They are delighted that we are finished, although we feel they have not yet adjusted
to the fact that a similar development of the continuous time theory clearly needs to
be written next.
So to Belinda, Sydney and Sophie; to Catherine and Marianne: with thanks for the
patience, support and understanding, this book is dedicated to you.
Part I
COMMUNICATION
and
REGENERATION
Chapter 1
Heuristics
This book is about Markovian models, and particularly about the structure and stability
of such models. We develop a theoretical basis by studying Markov chains in very
general contexts; and we develop, as systematically as we can, the applications of this
theory to applied models in systems engineering, in operations research, and in time
series.
A Markov chain is, for us, a collection of random variables Φ = {Φn : n ∈ T }, where
T is a countable time set. It is customary to write T as Z+ := {0, 1, . . .}, and we will
do this henceforth.
Heuristically, the critical aspect of a Markov model, as opposed to any other set of
random variables, is that it is forgetful of all but its most immediate past. The precise
meaning of this requirement for the evolution of a Markov model in time, that the
future of the process is independent of the past given only its present value, and the
construction of such a model in a rigorous way, is taken up in Chapter 3. Until then it
is enough to indicate that for a process Φ, evolving on a space X and governed by an
overall probability law P, to be a time-homogeneous Markov chain, there must be a set
of “transition probabilities” {P^n(x, A), x ∈ X, A ⊂ X} for appropriate sets A such that for times n, m in Z+
P(Φ_{n+m} ∈ A | Φ_j , j ≤ m; Φ_m = x) = P^n(x, A); (1.1)
that is, P^n(x, A) denotes the probability that a chain at x will be in the set A after n steps, or transitions. The independence of P^n of the values of Φ_j , j ≤ m, is the Markov property, and the independence of P^n of m is the time-homogeneity property.
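On a finite state space this definition is concrete: the n-step transition probabilities are the entries of the n-th power of the one-step transition matrix. A minimal sketch in Python, using an assumed three-state toy chain (not a model from the text):

```python
import numpy as np

# One-step transition matrix P; row x is the distribution of Phi_{m+1}
# given Phi_m = x, so each row sums to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.1, 0.9]])

Pn = np.linalg.matrix_power(P, 100)   # n-step probabilities P^n(x, A)
print(Pn[0])   # distribution of Phi_100 when Phi_0 = 0
print(Pn[2])   # nearly identical row: the chain forgets its starting point
```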
We now show that systems which are amenable to modeling by discrete time Markov
chains with this structure occur frequently, especially if we take the state space of the
process to be rather general, since then we can allow auxiliary information on the past
to be incorporated to ensure the Markov property is appropriate.
(a) The cruise control system on a modern motor vehicle monitors, at each time point
k, a vector {X_k} of inputs: speed, fuel flow, and the like (see Kuo [230]). It calculates a control value U_k which adjusts the throttle, causing a change in the values of the environmental variables X_{k+1}, which in turn causes U_{k+1} to change again. The multidimensional process Φ_k = {X_k, U_k} is often a Markov chain (see
Section 2.3.2), with new values overriding those of the past, and with the next
value governed by the present value. All of this is subject to measurement error,
and the process can never be other than stochastic: stability for this chain consists
in ensuring that the environmental variables do not deviate too far, within the
limits imposed by randomness, from the pre-set goals of the control algorithm.
(b) A queue at an airport evolves through the random arrival of customers and the
service times they bring. The numbers in the queue, and the time the customer
has to wait, are critical parameters for customer satisfaction, for waiting room
design, for counter staffing (see Asmussen [9]). Under appropriate conditions (see
Section 2.4.2), variables observed at arrival times (either the queue numbers, or
a combination of such numbers and aspects of the remaining or currently uncom-
pleted service times) can be represented as a Markov chain, and the question of
stability is central to ensuring that the queue remains at a viable level. Techniques
arising from the analysis of such models have led to the now familiar single-line
multi-server counters actually used in airports, banks and similar facilities, rather
than the previous multi-line systems.
(c) The exchange rate X_n between two currencies can be and is represented as a function of its past several values X_{n−1}, . . . , X_{n−k}, modified by the volatility of the market which is incorporated as a disturbance term W_n (see Krugman and Miller [222] for models of such fluctuations). The autoregressive model
X_n = ∑_{j=1}^{k} α_j X_{n−j} + W_n
central in time series analysis (see Section 2.1) captures the essential concept of
such a system. By considering the whole k-length vector Φ_n = (X_n, . . . , X_{n−k+1}),
Markovian methods can be brought to the analysis of such time-series models.
Stability here involves relatively small fluctuations around a norm; and as we will
see, if we do not have such stability, then typically we will have instability of the
grossest kind, with the exchange rate heading to infinity.
(d) Storage models are fundamental in engineering, insurance and business. In en-
gineering one considers a dam, with input of random amounts at random times,
and a steady withdrawal of water for irrigation or power usage. This model has a
Markovian representation (see Section 2.4.3 and Section 2.4.4). In insurance, there
is a steady inflow of premiums, and random outputs of claims at random times.
This model is also a storage process, but with the input and output reversed when
compared to the engineering version, and also has a Markovian representation (see
Asmussen [9]). In business, the inventory of a firm will act in a manner between
these two models, with regular but sometimes also large irregular withdrawals,
state space, and (as always) a choice of appropriate assumptions, the methods we
give in this book become tools to analyze the stability of the system.
Simple spaces do not describe these systems in general. Integer or real-valued models
are sufficient only to analyze the simplest models in almost all of these contexts.
The methods and descriptions in this book are for chains which take their values
in a virtually arbitrary space X. We do not restrict ourselves to countable spaces, nor
even to Euclidean space Rn , although we do give specific formulations of much of our
theory in both these special cases, to aid both understanding and application.
One of the key factors that allows this generality is that, for the models we consider,
there is no great loss of power in going from a simple to a quite general space. The
reader interested in any of the areas of application above should therefore find that
the structural and stability results for general Markov chains are potentially tools of
great value, no matter what the situation, no matter how simple or complex the model
considered.
Φ_n = {Y_n, . . . , Y_{n−k}}
and setting Φ = {Φ_n, n ≥ 0} (taking obvious care in defining {Φ_0, . . . , Φ_{k−1}}), we can define from Y a Markov chain Φ. The motion in the first coordinate of Φ reflects that of Y , and in the other coordinates is trivial to identify, since Y_n becomes Y_{(n+1)−1}, and so forth; and hence Y can be analyzed by Markov chain methods.
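A sketch of this construction in Python, with assumed coefficients: a scalar process Y whose next value depends on its two previous values is not Markovian on its own, but the stacked vector Φ_n = (Y_n, Y_{n−1}) evolves by a first-order, and hence Markovian, recursion:

```python
import numpy as np

a1, a2 = 0.5, 0.3           # assumed dependence of Y_n on (Y_{n-1}, Y_{n-2})
F = np.array([[a1, a2],     # companion matrix: first row applies the
              [1.0, 0.0]])  # recursion, second row shifts Y_n into memory
G = np.array([1.0, 0.0])

rng = np.random.default_rng(0)
phi = np.zeros(2)           # Phi_0 = (Y_0, Y_{-1})
Y = []
for n in range(1000):
    phi = F @ phi + G * rng.standard_normal()  # Phi_{n+1} = F Phi_n + G W_{n+1}
    Y.append(phi[0])        # the first coordinate recovers Y_n
```

The same companion-matrix device turns the autoregressive model above into the linear state space form defined below.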
Such state space representations, despite their somewhat artificial nature in some
cases, are an increasingly important tool in deterministic and stochastic systems theory,
and in linear and nonlinear time series analysis.
As the second paradigm for constructing a Markov model representing a non-
Markovian system, we look for so-called embedded regeneration points. These are times
at which the system forgets its past in a probabilistic sense: the system viewed at such
time points is Markovian even if the overall process is not.
Consider as one such model a storage system, or dam, which fills and empties. This
is rarely Markovian: for instance, knowledge of the time since the last input, or the size
of previous inputs still being drawn down, will give information on the current level of
the dam or even the time to the next input. But at that very special sequence of times
when the dam is empty and an input actually occurs, the process may well “forget
the past”, or “regenerate”: appropriate conditions for this are that the times between
inputs and the size of each input are independent. For then one cannot forecast the
time to the next input when at an input time, and the current emptiness of the dam
means that there is no information about past input levels available at such times. The
dam content, viewed at these special times, can then be analyzed as a Markov chain.
“Regenerative models” for which such “embedded Markov chains” occur are common
in operations research, and in particular in the analysis of queueing and network models.
State space models and regeneration time representations have become increasingly
important in the literature of time series, signal processing, control theory, and opera-
tions research, and not least because of the possibility they provide for analysis through
the tools of Markov chain theory. In the remainder of this opening chapter, we will in-
troduce a number of these models in their simplest form, in order to provide a concrete
basis for further development.
x_{k+1} = F x_k (1.3)
where F is an n × n matrix.
Figure 1.1: At left is a sample path generated by the deterministic linear model on R2 .
At right is a sample path from the linear state space model on R2 with Gaussian noise.
X_{k+1} = F X_k + G W_{k+1}
where X_0 is arbitrary;
(LSS2) the random variables {W_k} are independent and identically distributed (i.i.d.), and are independent of X_0 , with common distribution Γ(A) = P(W_j ∈ A) having finite mean and variance.
Then X is called the linear state space model driven by F, G, or the
LSS(F ,G) model, with associated control model LCM(F ,G).
Such linear models with random “noise” or “innovation” are related to both the
simple deterministic model (1.3) and also the linear control model (1.4).
There are obviously two components to the evolution of a state space model. The
matrix F controls the motion in one way, but its action is modulated by the regular
input of random fluctuations which involve both the underlying variable with distribu-
tion Γ, and its adjustment through G. At right in Figure 1.1 we show a sample path corresponding to the same matrix F , with G = (2.5, 2.5)⊤, and with Γ taken as a bivariate Normal,
or Gaussian, distribution N (0, 1). This indicates that the addition of the noise variables
W can lead to types of behavior very different to that of the deterministic model, even
with the same choice of the function F .
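The contrast shown in Figure 1.1 is easy to reproduce numerically. In this editorial sketch the book's specific matrix F (defined in a portion of the chapter not reproduced here) is replaced by an assumed stable rotation-like matrix:

```python
import numpy as np

F = np.array([[0.8, -0.5],     # assumed stable matrix: eigenvalues have
              [0.5,  0.8]])    # modulus sqrt(0.89) < 1, giving a spiral
G = np.array([2.5, 2.5])
rng = np.random.default_rng(1)

x = np.array([10.0, 10.0])     # deterministic model: x_{k+1} = F x_k
X = x.copy()                   # stochastic model: X_{k+1} = F X_k + G W_{k+1}
det_path, sto_path = [x.copy()], [X.copy()]
for k in range(200):
    x = F @ x
    X = F @ X + G * rng.standard_normal()   # W_{k+1} ~ N(0, 1)
    det_path.append(x.copy())
    sto_path.append(X.copy())
# The deterministic path spirals into the origin; the noisy path keeps
# fluctuating around it, as in the two panels of Figure 1.1.
```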
Such models describe the movements of airplanes, of industrial and engineering
equipment, and even (somewhat idealistically) of economies and financial systems [3,
57]. Stability in these contexts is then understood in terms of return to level flight, or
small and (in practical terms) insignificant deviations from set engineering standards,
or minor inflation or exchange-rate variation. Because of the random nature of the noise
we cannot expect totally unvarying systems; what we seek to preclude are explosive or
wildly fluctuating operations.
We will see that, in wide generality, if the linear control model LCM(F ,G) is stable in
a deterministic way, and if we have a “reasonable” distribution Γ for our random control
sequences, then the linear state space LSS(F ,G) model is also stable in a stochastic
sense.
In Chapter 2 we will describe models which build substantially on these simple
structures, and which illustrate the development of Markovian structures for linear and
nonlinear state space model theory.
We now leave state space models, and turn to the simplest examples of another class
of models, which may be thought of collectively as models with a regenerative structure.
Φ_{k+1} = Φ_k + W_{k+1} . (1.5)
It is obvious that Φ = {Φ_k : k ∈ Z+} is a Markov chain, taking values in the real line R = (−∞, ∞); the independence of the {W_k} guarantees the Markovian nature of the chain Φ.
In this context, stability (as far as the gambling house is concerned) requires that
Φ eventually reaches (−∞, 0]; a greater degree of stability is achieved from the same
perspective if the time to reach (−∞, 0] has finite mean. Inevitably, of course, this
stability is also the gambler’s ruin.
Such a chain, defined by taking successive sums of i.i.d. random variables, provides
a model for very many different systems, and is known as random walk.
Figure 1.2: Random walk sample paths from three different models. The increment distribution is Γ = N(0, 1) for the path shown at top, Γ = N(−0.2, 1) for the path shown on the lower left, and Γ = N(+0.2, 1) for the path shown on the lower right.
Random walk
Suppose that Φ = {Φ_k ; k ∈ Z+} is a collection of random variables defined by choosing an arbitrary distribution for Φ_0 and setting for k ∈ Z+
(RW1)
Φ_{k+1} = Φ_k + W_{k+1}
where the W_k are i.i.d. random variables taking values in R with common distribution Γ(−∞, y] = P(W ≤ y).
In Figure 1.2 we give sets of three sample paths of random walks with different
distributions for Γ: all start at the same value but we choose for the winnings on each
game
(i) W having a Gaussian N(0, 1) distribution, so the game is fair;
(ii) W having a Gaussian N(−0.2, 1) distribution, so the game is not fair, with the
house winning one unit on average each five plays;
(iii) W having a Gaussian N(0.2, 1) distribution, so the game modeled is, perhaps,
one of “skill” where the player actually wins on average one unit per five games
against the house.
The sample paths clearly indicate that ruin is rather more likely under case (ii) than
under case (iii) or case (i): but when is ruin certain? And how long does it take if it is
certain?
These are questions involving the stability of the random walk model, or at least
that modification of the random walk which we now define.
This chain, defined by the recursion Φ_{k+1} = [Φ_k + W_{k+1}]^+ := max(0, Φ_k + W_{k+1}), follows the paths of a random walk, but is held at zero when the underlying random walk becomes non-positive, leaving zero again only when the next positive value occurs in the sequence {W_k}.
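Both the unrestricted walk of Figure 1.2 and this reflected version take only a few lines of Python; the following editorial sketch uses the increment laws from the figures:

```python
import numpy as np

def reflected_walk(mu, n_steps, seed):
    """Phi_{k+1} = max(0, Phi_k + W_{k+1}) with W ~ N(mu, 1) i.i.d."""
    rng = np.random.default_rng(seed)
    W = mu + rng.standard_normal(n_steps)
    phi, path = 0.0, np.empty(n_steps)
    for k in range(n_steps):
        phi = max(0.0, phi + W[k])
        path[k] = phi
    return path

losing = reflected_walk(-0.2, 1000, seed=2)   # house wins on average
winning = reflected_walk(+0.2, 1000, seed=2)  # player wins on average
print((losing == 0.0).mean(), (winning == 0.0).mean())
# The first path returns to 0 repeatedly; the second tends to drift away.
# (The unreflected walk of Figure 1.2 is simply np.cumsum of the increments.)
```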
In Figure 1.3 we again give sets of sample paths of random walks on the half line
[0, ∞), corresponding to those of the unrestricted random walk in the previous section.
The difference in the proportion of paths which hit, or return to, the state {0} is again
clear.
Figure 1.3: Random walk paths reflected at zero. The increment distribution is Γ = N(−0.2, 1) for the plot shown on the left, and Γ = N(+0.2, 1) for the plot shown on the right.

We shall see in Chapter 2 that random walk on a half line is both a model for storage systems and a model for queueing systems. For all such applications there are similar
concerns and concepts of the structure and the stability of the models: we need to know
whether a dam overflows, whether a queue ever empties, whether a computer network
jams. In the next section we give a first heuristic description of the ways in which such
stability questions might be formalized.
1.3 Stochastic stability for Markov models

Some of the concepts of stability we use are taken from the classical Markov chain literature, and some we take from dynamical or stochastic systems theory, which
is concerned with precisely these same questions under rather different conditions on
the model structures.
(I) ϕ-irreducibility for a general space chain, which we approach by requiring that
the space supports a measure ϕ with the property that for every starting point
x ∈ X
ϕ(A) > 0 ⇒ P_x(τ_A < ∞) > 0
where P_x denotes the probability of events conditional on the chain beginning with
Φ0 = x.
This condition ensures that all “reasonable sized” sets, as measured by ϕ, can be
reached from every possible starting point.
For a countable space chain ϕ-irreducibility is just the concept of irreducibility
commonly used [59, 71], with ϕ taken as counting measure.
For a state space model ϕ-irreducibility is related to the idea that we are able to
“steer” the system to every other state in R^n. The linear control LCM(F ,G) model is called controllable if for any initial state x_0 and any other x ∈ X, there exists m ∈ Z+ and a sequence of control variables u*_1 , . . . , u*_m ∈ R^p such that x_m = x when (u_1 , . . . , u_m ) = (u*_1 , . . . , u*_m ). If this does not hold then for some starting points we
are in one part of the space forever; from others we are in another part of the space.
Controllability, and analogously irreducibility, preclude this.
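For the linear model, controllability can be checked directly via the standard Kalman rank condition: LCM(F ,G) is controllable exactly when the matrix [G | F G | · · · | F^{n−1}G] has full rank n. A short sketch (the matrices below are illustrative assumptions):

```python
import numpy as np

def is_controllable(F, G):
    """Kalman rank test for LCM(F,G): full rank of [G, FG, ..., F^{n-1} G]."""
    n = F.shape[0]
    G = G.reshape(n, -1)
    ctrb = np.hstack([np.linalg.matrix_power(F, i) @ G for i in range(n)])
    return np.linalg.matrix_rank(ctrb) == n

F = np.array([[0.8, -0.5],
              [0.5,  0.8]])
print(is_controllable(F, np.array([2.5, 2.5])))        # True
print(is_controllable(np.eye(2), np.array([1.0, 0.0])))
# False: with F = I the second coordinate can never be steered anywhere.
```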
Thus under irreducibility we do not have systems so unstable in their starting po-
sition that, given a small change of initial position, they might change so dramatically
that they have no possibility of reaching the same set of states.
A study of the wide-ranging consequences of such an assumption of irreducibility
will occupy much of Part I of this book: the definition above will be shown to produce
remarkable solidity of behavior.
The next level of stability is a requirement, not only that there should be a possibility
of reaching like states from unlike starting points, but that reaching such sets of states
should be guaranteed eventually. This leads us to define and study concepts of
(II) recurrence, for which we might ask as a first step that there is a measure ϕ guaranteeing that for every starting point x ∈ X
ϕ(A) > 0 ⇒ P_x(τ_A < ∞) = 1, (1.8)
or, more strongly, that there is a measure ϕ guaranteeing that for every starting point x ∈ X
ϕ(A) > 0 ⇒ E_x[τ_A] < ∞. (1.9)
These conditions ensure that reasonable sized sets are reached with probability one, as
in (1.8), or even in a finite mean time as in (1.9). Part II of this book is devoted to
the study of such ideas, and to showing that for irreducible chains, even on a general
state space, there are solidarity results which show that either such uniform (in x)
stability properties hold, or the chain is unstable in a well-defined way: there is no
middle ground, no “partially stable” behavior available.
For deterministic models, the recurrence concepts in (II) are obviously the same.
For stochastic models they are definitely different. For “suitable” chains on spaces with
appropriate topologies (the T-chains introduced in Chapter 6), the first will turn out
to be entirely equivalent to requiring that “evanescence”, defined by
{Φ → ∞} = ⋂_{n=0}^{∞} {Φ ∈ O_n infinitely often}^c (1.10)
for a countable collection of open pre-compact sets {O_n}, has zero probability for all
starting points; the second is similarly equivalent, for the same “suitable” chains, to
requiring that for any ε > 0 and any x there is a compact set C such that
lim inf_{k→∞} P^k(x, C) ≥ 1 − ε. (1.11)
(III) the limiting, or ergodic, behavior of the chain: and it emerges that in the stronger
recurrent situation described by (1.9) there is an “invariant regime” described
by a measure π such that if the chain starts in this regime (that is, if Φ0 has
distribution π) then it remains in the regime, and moreover if the chain starts in
some other regime then it converges in a strong probabilistic sense with π as a
limiting distribution.
In Part III we largely confine ourselves to such ergodic chains, and find both theoretical
and pragmatic results ensuring that a given chain is at this level of stability. For whilst
the construction of solidarity results, as in Parts I and II, provides a vital underpinning
to the use of Markov chain theory, it is the consequences of that stability, in the form
of powerful ergodic results, that makes the concepts of very much more than academic
interest.
Let us provide motivation for such endeavors by describing, with a little more for-
mality, just how solid the solidarity results are, and how strong the consequent ergodic
theorems are. We will show, in Chapter 13, the following:
Theorem 1.3.1. The following four conditions are equivalent:
(i) The chain admits a unique probability measure π satisfying the invariant equations
π(A) = ∫ π(dx) P(x, A), A ∈ B(X); (1.12)
(ii) There exists some “small” set C ∈ B(X) and M_C < ∞ such that
sup_{x∈C} E_x[τ_C] ≤ M_C ; (1.13)
(iii) There exists some “small” set C, some b < ∞ and some non-negative “test func-
tion” V , finite ϕ-almost everywhere, satisfying
∫ P(x, dy)V(y) ≤ V(x) − 1 + b I_C(x), x ∈ X; (1.14)
(iv) There exists some “small” set C ∈ B(X) and some P^∞(C) > 0 such that as n → ∞
lim inf_{n→∞} sup_{x∈C} |P^n(x, C) − P^∞(C)| = 0. (1.15)
Any one of these conditions implies that there is a unique invariant probability measure π and that, for an “aperiodic” chain,
sup_{A∈B(X)} |P^n(x, A) − π(A)| → 0 as n → ∞ (1.16)
for every x ∈ X for which V (x) < ∞, where V is any function satisfying (1.14).
Thus “local recurrence” in terms of return times, as in (1.13), or “local convergence” as in (1.15), guarantees the uniform limits in (1.16); both are equivalent to the mere
existence of the invariant probability measure π; and moreover we have in (1.14) an
exact test based only on properties of P for checking stability of this type.
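As a numerical illustration of how a test like (1.14) is applied (an editorial sketch using the reflected random walk of Section 1.2 with increments W ~ N(−0.2, 1) and the assumed test function V (x) = 5x), the one-step drift can be estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(3)

def drift(x, V, mu=-0.2, n=200_000):
    """Estimate E[V(Phi_1) | Phi_0 = x] - V(x) for the reflected walk
    Phi_1 = max(0, x + W), with increments W ~ N(mu, 1)."""
    W = mu + rng.standard_normal(n)
    return np.mean(V(np.maximum(0.0, x + W))) - V(x)

V = lambda x: 5.0 * x   # candidate test function: 5 = 1/|E[W]|
for x in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(f"x = {x:4.1f}   drift ≈ {drift(x, V):+.2f}")
# Away from a bounded set C near 0 the drift is ≈ -1, verifying (1.14)
# with b absorbing the positive drift on C: the Foster-Lyapunov route to
# positive recurrence of the reflected walk.
```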
Each of (i)–(iv) is a type of stability: the beauty of this result lies in the fact that they
are completely equivalent. Moreover, for this irreducible form of Markovian system, it is
further possible in the “stable” situation of this theorem to develop asymptotic results,
which ensure convergence not only of the distributions of the chain, but also of very
general (and not necessarily bounded) functions of the chain (Chapter 14); to develop
global rates of convergence to these limiting values (Chapter 15 and Chapter 16); and
to link these to Laws of Large Numbers or Central Limit Theorems (Chapter 17).
Together with these consequences of stability, we also provide a systematic approach
for establishing stability in specific models in order to utilize these concepts. The exten-
sion of the so-called “Foster–Lyapunov” criteria as in (1.14) to all aspects of stability,
and application of these criteria in complex models, is a key feature of our approach to
stochastic stability.
These concepts are largely classical in the theory of countable state space Markov
chains. The extensions we give to general spaces, as described above, are neither so
well known nor, in some cases, previously known at all.
The heuristic discussion of this section will take considerable formal justification,
but the end-product will be a rigorous approach to the stability and structure of Markov
chains.
In this sense the Markov transition function P can be viewed as a deterministic map
from M to itself, and P will induce such a dynamical system if it is suitably continuous.
This interpretation can be achieved if the chain is on a suitably behaved space and
has the Feller property that P f(x) := ∫ P(x, dy)f(y) is continuous for every bounded
continuous f , and then d becomes a weak convergence metric (see Chapter 6).
As in the stronger recurrence ideas in (II) and (III) in Section 1.3.1, in discussing
the stability of Φ, we are usually interested in the behavior of the terms P^k , k ≥ 0,
when k becomes large. Our hope is that this sequence will be bounded in some sense,
or converge to some fixed probability π ∈ M, as indeed it does in (1.16).
Four traditional formulations of stability for a dynamical system, which give a frame-
work for such questions, are
(i) Lagrange stability: for each x ∈ X , the orbit starting at x is a precompact subset
of X . For the system (P, M, d) with d the weak convergence metric, this is exactly
tightness of the distributions of the chain, as defined in (1.11);
(ii) Stability in the sense of Lyapunov: for every initial condition x ∈ X ,
lim_{y→x} sup_{k≥0} d(T^k y, T^k x) = 0,
where d denotes the metric on X . This is again the requirement that the long-term
behavior of the system is not overly sensitive to a change in the initial conditions;
(iii) Asymptotic stability: there exists some fixed point x∗ so that T^k x∗ = x∗ for
all k, with trajectories {xk } starting near x∗ staying near and converging to x∗
as k → ∞. For the system (P, M, d) the existence of a fixed point is exactly
equivalent to the existence of a solution to the invariant equations (1.12);
(iv) Global asymptotic stability: the system is stable in the sense of Lyapunov and for
some fixed x∗ ∈ X and every initial condition x ∈ X ,
d(T^k x, x∗ ) → 0 as k → ∞. (1.17)
This is comparable to the result of Theorem 1.3.1 for the dynamical system
(P, M, d).
Lagrange stability requires that any limiting measure arising from the sequence {µP^k}
will be a probability measure, rather as in (1.16).
Stability in the sense of Lyapunov is most closely related to irreducibility, although
rather than placing a global requirement on every initial condition in the state space,
stability in the sense of Lyapunov only requires that two initial conditions which are
sufficiently close will then have comparable long term behavior. Stability in the sense
of Lyapunov says nothing about the actual boundedness of the orbit {T k x}, since it is
simply continuity of the maps {T k }, uniformly in k ≥ 0. An example of a system on R
which is stable in the sense of Lyapunov is the simple recursion xk +1 = xk + 1, k ≥ 0.
Although distinct trajectories stay close together if their initial conditions are similarly
close, we would not consider this system stable in most other senses of the word.
The connections between the probabilistic recurrence approach and the dynamical
systems approach become very strong in the case where the chain is both Feller and
ϕ-irreducible, and when the irreducibility measure ϕ is related to the topology by the
requirement that the support of ϕ contains an open set.
In this case, by combining the results of Chapter 6 and Chapter 18, we get for
suitable spaces
Theorem 1.3.2. For a ϕ-irreducible “aperiodic” Feller chain with supp ϕ containing
an open set, the dynamical system (P, M, d) is globally asymptotically stable if and only
if the distributions {P^k(x, · )} are tight as in (1.11); and then the uniform ergodic limit
(1.16) holds.
This result follows, not from dynamical systems theory, but by showing that such
a chain satisfies the conditions of Theorem 1.3.1; these Feller chains are an especially
useful subset of the “suitable” chains for which tightness is equivalent to the properties
described in Theorem 1.3.1, and then, of course, (1.16) gives a result rather stronger
than (1.17).
1.4 Commentary
This book does not address models where the time set is continuous (when Φ is usually
called a Markov process), despite the sometimes close relationship between discrete and
continuous time models: see Chung [71] or Anderson [4] for the classical countable space
approach.
On general spaces in continuous time, there are a totally different set of questions
that are often seen as central: these are exemplified in Sharpe [352], although the
interested reader should also see Meyn and Tweedie [279, 280, 278] for recent results
which are much closer in spirit to, and rely heavily on, the countable time approach
followed in this book.
There has also been considerable work over the past two decades on the subject of
more generally indexed Markov models (such as Markov random fields, where T is
multidimensional); these are also not covered in this book. In our development, Markov
chains always evolve through time as a scalar, discrete quantity.
The question of what to call a Markovian model, and whether to concentrate on the
denumerability of the space or the time parameter in using the word “chain”, seems to
have been resolved in the direction we take here. Doob [99] and Chung [71] reserve the
term chain for systems evolving on countable spaces with both discrete and continuous
time parameters, but usage seems to be that it is the time set that gives the “chaining”.
Revuz [326], in his Notes, gives excellent reasons for this.
The examples we begin with here are rather elementary, but equally they are com-
pletely basic, and represent the twin strands of application we will develop: the first,
from deterministic to stochastic models via a “stochasticization” within the same func-
tional framework has analogies with the approach of Stroock and Varadhan in their
analysis of diffusion processes (see [378, 377, 168]), whilst the second, from basic inde-
pendent random variables to sums and other functionals traces its roots back too far
to be discussed here. Both these models are close to identical at this simple level. We
give more diverse examples in Chapter 2.
We will typically use X and Xn to denote state space models, or their values at
time n, in accordance with rather long established conventions. We will then typically
use lower case letters to denote the values of related deterministic models. Regenerative
models such as random walk are, on the other hand, typically denoted by the symbols
Φ and Φn , which we also use for generic chains.
The three concepts described in (I)–(III) may seem to give a rather limited number
of possible versions of “stability”. Indeed, in the various generalizations of deterministic
dynamical systems theory to stochastic models which have been developed in the past
three decades (see for example Kushner [232] or Khas’minskii [206]) there have been
many other forms of stability considered. All of them are, however, qualitatively similar,
and fall broadly within the regimes we describe, even though they differ in detail.
It will become apparent in the course of our development of the theory of irreducible
chains that in fact, under fairly mild conditions, the number of different types of behav-
ior is indeed limited to precisely those sketched above in (I)–(III). Our aim is to unify
many of the partial approaches to stability and structural analysis, to indicate how
they are in many cases equivalent, and to develop both criteria for stability to hold for
individual models, and limit theorems indicating the value of achieving such stability.
With this rather optimistic statement, we move forward to consider some of the
specific models whose structure we will elucidate as examples of our general results.
Chapter 2
Markov models
The results presented in this book have been written in the desire that practitioners
will use them. We have tried therefore to illustrate the use of the theory in a systematic
and accessible way, and so this book concentrates not only on the theory of general
space Markov chains, but on the application of that theory in considerable detail.
We will apply the results which we develop across a range of specific applications:
typically, after developing a theoretical construct, we apply it to models of increas-
ing complexity in the areas of systems and control theory, both linear and nonlinear,
both scalar and vector valued; traditional “applied probability” or operations research
models, such as random walks, storage and queueing models, and other regenerative
schemes; and models which are in both domains, such as classical and recent time series
models.
These are not given merely as “examples” of the theory: in many cases, the appli-
cation is difficult and deep of itself, whilst applications across such a diversity of areas
have often driven the definition of general properties and the links between them. Our
goal has been to develop the analysis of applications on a step-by-step basis as the
theory becomes richer throughout the book.
To motivate the general concepts, then, and to introduce the various areas of appli-
cation, we leave until Chapter 3 the normal and necessary foundations of the subject,
and first introduce a cross-section of the models for which we shall be developing those
foundations.
These models are still described in a somewhat heuristic way. The full mathematical
description of their dynamics must await the development in the next chapter of the
concepts of transition probabilities, and the reader may on occasion benefit by moving
to some of those descriptions in parallel with the outlines here.
It is also worth observing immediately that the descriptive definitions here are from
time to time supplemented by other assumptions in order to achieve specific results:
these assumptions, and those in this chapter and the last, are collected for ease of
reference in Appendix C.
As the definitions are developed, it will be apparent immediately that very many of
these models have a random additive component, such as the i.i.d. sequence {Wn } in
both the linear state space model and the random walk model. Such a component goes
by various names, such as error, noise, innovation, disturbance or increment sequence,
across the various model areas we consider. We shall use the nomenclature relevant to
the context of each model.
We will save considerable repetitive definition if we adopt a global convention im-
mediately to cover these sequences.
It will also be apparent that many models are defined inductively from their own past
in combination with such innovation sequences. In order to commence the induction,
initial values are needed. We adopt a second convention immediately to avoid repetition
in defining our models.
Initialization
Unless specifically defined otherwise, the initial state {Φ0 } of a Markov
model will be taken as independent of the error, noise, innovation, distur-
bance or increments process, and will have an arbitrary distribution.
Figure 2.1: Shown on the left is a sample path from the linear model with α = 0.85, and
shown on the right is a sample path obtained with α = 1.05. The increment distribution
is N (0, 1) in each case.
Autoregressive model
A process Y = {Yn } is called a (scalar) autoregression of order k, or AR(k)
model, if it satisfies, for each set of initial values (Y0 , . . . , Y−k +1 ),
(AR1) for each n ∈ Z+ , Yn and Wn are random variables on R satisfying
inductively for n ≥ 1
Yn = α1 Yn −1 + α2 Yn −2 + . . . + αk Yn −k + Wn ,
for some α1 , . . . , αk ∈ R;
Setting
Xn = (Yn , . . . , Yn−k+1 )
and X = {Xn , n ≥ 0}, we define X as a Markov chain whose first component has
exactly the sample paths of the autoregressive process. Note that the general convention
that X0 has an arbitrary distribution implies that the first k variables (Y0 , . . . , Y−k +1 )
are also considered arbitrary.
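To make this embedding concrete, the following minimal sketch (in Python; the AR(2)
coefficients and the N (0, 1) innovations are illustrative choices, not taken from the text)
iterates the vector chain X directly:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = np.array([0.5, -0.3])          # hypothetical AR(2) coefficients
    k = len(alpha)
    X = np.zeros(k)                        # arbitrary initial values (Y0, Y-1)
    Y_path = []
    for n in range(1000):
        Y = alpha @ X + rng.normal()       # Yn = a1 Y(n-1) + ... + ak Y(n-k) + Wn
        X = np.concatenate(([Y], X[:-1]))  # Xn = (Yn, ..., Y(n-k+1)): shift the lags
        Y_path.append(Y)

The first coordinate of X traces exactly the AR(k) sample path, as the construction above
guarantees.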
The autoregressive model can then be viewed as a specific version of the vector-valued
linear state space model. A richer class is obtained by adding a moving average part: a
process Y = {Yn } is called an autoregressive-moving average process of order (k, ℓ), or
ARMA(k, ℓ) model, if it satisfies, inductively for n ≥ 1,
Yn = α1 Yn−1 + α2 Yn−2 + · · · + αk Yn−k + Wn + β1 Wn−1 + β2 Wn−2 + · · · + βℓ Wn−ℓ ,
for some α1 , . . . , αk , β1 , . . . , βℓ ∈ R;
In this case more care must be taken to obtain a suitable Markovian description of
the process. One approach is to take
Xn = (Yn , . . . , Yn−k+1 , Wn , . . . , Wn−ℓ+1 ).
Although the resulting state process X is Markovian, the dimension of this realization
may be overly large for effective analysis. A realization of lower dimension may be
obtained by defining the stochastic process Z inductively by
Zn = α1 Zn −1 + α2 Zn −2 + · · · + αk Zn −k + Wn . (2.2)
When the initial conditions are defined appropriately, it is a matter of simple algebra
and an inductive argument to show that
Yn = Zn + β1 Zn−1 + β2 Zn−2 + · · · + βℓ Zn−ℓ .
Hence the probabilistic structure of the ARMA(k, ℓ) process is completely determined
by the Markov chain {(Zn , . . . , Zn−k+1 ) : n ∈ Z+ } which takes values in Rk .
The behavior of the general ARMA(k, ℓ) model can thus be placed in the Markovian
context, and we will develop the stability theory of this, and more complex versions of
this model, in the sequel.
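The lower-dimensional realization lends itself to direct simulation. A minimal sketch,
assuming illustrative coefficients, Gaussian noise, and zero initial conditions so that the
identity above holds exactly:

    import numpy as np

    rng = np.random.default_rng(1)
    alpha = [0.4, 0.2]                     # hypothetical AR coefficients a1..ak
    beta = [0.5]                           # hypothetical MA coefficients b1..bl
    k, l = len(alpha), len(beta)
    m = max(k, l + 1)                      # number of lags of Z to retain
    Z = [0.0] * m                          # zero initial conditions
    Y_path = []
    for n in range(500):
        Zn = sum(a * z for a, z in zip(alpha, Z)) + rng.normal()   # recursion (2.2)
        Z = [Zn] + Z[:m - 1]
        # Yn = Zn + b1 Z(n-1) + ... + bl Z(n-l)
        Y_path.append(Z[0] + sum(b * z for b, z in zip(beta, Z[1:])))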
xn +1 = F (xn ), n ∈ Z+ , (2.3)
for some continuous function F : R → R. Hence the simple linear model defined in
(SLM1) may be interpreted as a linear dynamical system perturbed by the “noise”
sequence W .
The theory of time series is in this sense closely related to the general theory of
dynamical systems: it has developed essentially as that subset of stochastic dynamical
systems theory for which the relationships between the variables are linear, and even
with the nonlinear models from the time series literature which we consider below, there
is still a large emphasis on linear substructures.
The theory of dynamical systems, in contrast to time series theory, has grown from
a deterministic base, considering initially the type of linear relationship in (1.3) with
which we started our examples in Section 1.2, but progressing to models allowing a very
general (but still deterministic) relationship between the variables in the present and
in the past, as in (2.3). It is in the more recent development that “noise” variables,
allowing the system to be random in some part of its evolution, have been introduced.
Nonlinear state space models are stochastic versions of dynamical systems where a
Markovian realization of the model is both feasible and explicit: thus they satisfy a
generalization of (2.3) such as
Xn+1 = F (Xn , Wn+1 ), n ∈ Z+ , (2.4)
or, equivalently, Xn = F (Xn−1 , Wn ).
Figure 2.2: Simple bilinear model path with F (x, w) = (0.707 + w)x + w.
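A path such as that in Figure 2.2 is easily generated. A minimal sketch, assuming (purely
for illustration, since the increment distribution is not specified here) N (0, 0.01) noise:

    import numpy as np

    rng = np.random.default_rng(2)
    x, path = 0.1, []
    for n in range(400):
        w = rng.normal(scale=0.1)          # assumed noise level
        x = (0.707 + w) * x + w            # F(x, w) = (0.707 + w)x + w
        path.append(x)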
SETAR model
The chain X = {Xn } is called a scalar self-exciting threshold autoregres-
sion (SETAR) model if it satisfies
(SETAR1) for each 1 ≤ j ≤ M , Xn and Wn (j) are random variables on
R, satisfying, inductively for n ≥ 1,
Xn = φ(j) + θ(j)Xn−1 + Wn (j),   rj−1 < Xn−1 ≤ rj ,
where −∞ = r0 < r1 < · · · < rM = ∞.
Because of lack of continuity, the SETAR models do not fall into the class of non-
linear state space models, although they can often be analyzed using essentially the
same methods. The SETAR model will prove to be a useful example on which to test
the various stability criteria we develop, and the overall outcome of that analysis is
gathered together in Section B.2.
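A minimal two-regime sketch of such a threshold autoregression (the threshold, the
coefficients φ(j), θ(j), and the N (0, 1) noise are all hypothetical choices):

    import numpy as np

    rng = np.random.default_rng(3)
    r = 0.0                                # hypothetical threshold (M = 2 regimes)
    phi = [1.0, -1.0]                      # intercepts phi(j)
    theta = [0.5, 0.5]                     # slopes theta(j)
    x, path = 0.0, []
    for n in range(500):
        j = 0 if x <= r else 1             # regime selected by X(n-1)
        x = phi[j] + theta[j] * x + rng.normal()   # Wn(j) ~ N(0,1), say
        path.append(x)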
Many nonlinear processes cannot be modeled by a scalar Markovian model such as the
SNSS(F ) model. The more general multidimensional NSS(F ) model is defined quite
analogously, satisfying inductively for k ≥ 1
Xk = F (Xk−1 , Wk ),
with X and W now allowed to be vector valued.
The general nonlinear state space model can often be analyzed by the same meth-
ods that are used for the scalar SNSS(F ) model, under appropriate conditions on the
disturbance process W and the function F .
It is a central observation of such analysis that the structure of the NSS(F ) model
(and of course its scalar counterpart) is governed under suitable conditions by an asso-
ciated deterministic control model, defined analogously to the linear control model and
the linear state space model.
xk = Fk (x0 , u1 , . . . , uk ), k ∈ Z+ , (2.8)
The general ARMA model may be generalized to obtain a class of nonlinear models,
all of which may be “Markovianized”, as in the linear case.
Yn = G(Yn−1 , Yn−2 , . . . , Yn−k , Wn , Wn−1 , Wn−2 , . . . , Wn−ℓ )
so that the components of [z] lie within the open unit disk in R2 for any z ∈ R2 .
Following this transformation we obtain the nonlinear state space model
xn = ( xan , xbn ) = F (xn−1 ) = ( 1/xan−1 − 1/xbn−1 , xan−1 ).   (2.11)
Figure 2.3: (a) a sample path of V plotted against t; (b) shown on the left is the gumleaf
attractor, and on the right is the gumleaf attractor perturbed by noise.
A typical sample path of this model is given on the left hand side of Figure 2.3 (b).
In this figure 40,000 consecutive sample points of {xn } have been indicated by points
to illustrate the qualitative behavior of the model. Because of its similarity to some
Australian flora, the authors call the resulting plot the gumleaf attractor. Ydstie in
[410] also finds that such chaotic behavior can easily occur in adaptive systems.
One way that noise can enter the model (2.11) is to perturb (2.10) by noise. The
resulting two-dimensional recursion becomes
Xn = ( Xan , Xbn ) = ( 1/Xan−1 − 1/Xbn−1 , Xan−1 ) + ( Wn , 0 ),   (2.12)
where W is i.i.d. The special case where for each n the disturbance Wn is uniformly
distributed on [−1/2, 1/2] is illustrated on the right in Figure 2.3 (b). As in the previous
figure, we have plotted 40,000 values of the sequence X which takes values in R2 . Note
that the qualitative behavior of the process remains similar to the noise-free model,
although some of the detailed behavior is “smeared out” by the noise.
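The perturbed recursion (2.12) is immediate to simulate. A minimal sketch, using the
Uniform[−1/2, 1/2] disturbance of the right-hand panel and an arbitrary nonzero initial
condition:

    import numpy as np

    rng = np.random.default_rng(4)
    xa, xb = 1.0, 2.0                      # a nonzero initial condition
    points = []
    for n in range(40_000):
        w = rng.uniform(-0.5, 0.5)         # Wn ~ Uniform[-1/2, 1/2]
        xa, xb = 1.0 / xa - 1.0 / xb + w, xa   # recursion (2.12)
        points.append((xa, xb))

Plotting the collected points reproduces the smeared-out gumleaf shape described above.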
The analysis of general models of this type is a regular feature in what follows, and
in Chapter 7 we give a detailed analysis of the path structure that might be expected
under suitable assumptions on the noise and the associated deterministic model.
Yk +1 = θk Yk + Wk +1 , (2.13)
θk +1 = αθk + Zk +1 ; (2.14)
It is assumed that W has a finite second moment, and that E[log(1+|Z|)] <
∞.
As usual, the control set Ow ⊆ R2 depends upon the specific distribution of W and Z.
A plot of the joint process (Y , θ) is given in Figure 2.4. In this simulation we have
α = 0.933, Wk ∼ N (0, 0.14) and Zk ∼ N (0, 0.01).
The dark line is a plot of the parameter process θ, and the lighter, more explosive
path is the resulting output Y . One feature of this model is that the output oscillates
rapidly when θk takes on large negative values, which occurs in this simulation for time
values between 80 and 100.
Figure 2.4: Dependent parameter bilinear model paths with α = 0.933, Wk ∼ N (0, 0.14)
and Zk ∼ N (0, 0.01).
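The pair of recursions (2.13), (2.14) translates directly into code. A minimal sketch using
the parameter values quoted above (α = 0.933, Wk ∼ N (0, 0.14), Zk ∼ N (0, 0.01)):

    import numpy as np

    rng = np.random.default_rng(5)
    alpha = 0.933
    theta, y = 0.0, 0.0
    theta_path, y_path = [], []
    for k in range(150):
        y = theta * y + rng.normal(scale=np.sqrt(0.14))          # (2.13): W ~ N(0, 0.14)
        theta = alpha * theta + rng.normal(scale=np.sqrt(0.01))  # (2.14): Z ~ N(0, 0.01)
        y_path.append(y); theta_path.append(theta)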
We call the input sequence U sample mean square stabilizing if the input-output
process satisfies
lim sup_{N→∞} (1/N ) Σ_{k=1}^{N} [ Yk2 + Uk2 ] < ∞   a.s.
for every initial condition. The control law is then said to be minimum variance if it is
sample mean square stabilizing, and the sample path average
lim sup_{N→∞} (1/N ) Σ_{k=1}^{N} Yk2   (2.17)
is minimized over all control laws with the property that, for each k, the input Uk is a
function of Yk , . . . , Y0 , and the initial conditions.
Such controls are often called “causal”, and for causal controls there is some pos-
sibility of a Markovian representation. We now specialize this general framework to a
situation where a Markovian analysis through state space representation is possible.
θ = (−α1 , . . . , −αn1 , β1 , . . . , βn2 )
denote the time invariant parameter vector. Suppose for the moment that the parameter
θ is known. If we set
φk−1 := (Yk−1 , . . . , Yk−n1 , Uk−1 , . . . , Uk−n2 ),
and choose the control Uk at each time so that
φk θ = 0,   (2.18)
then this will result in Yk = Wk for all k. This control law obviously minimizes the
performance criterion (2.17) and hence is a minimum variance control law if it is sample
mean square stabilizing.
It is also possible to obtain a minimum variance control law, even when θ is not
available directly for the computation of the control Uk . One such algorithm (developed
in [142]) has a recursive form given by first estimating the parameters through the
following stochastic gradient algorithm:
and then choosing the control at each time so that
φk θ̂k = 0.
Yk +1 = θk Yk + Uk + Wk +1 , (2.20)
θk +1 = αθk + Zk +1 , k ≥ 1, (2.21)
E[ (Zn , Wn ) (Zk , Wk ) ] = δn−k diag(σz2 , σw2 ),   n ≥ 1;
The time varying parameter process θ here is not observed directly but is partially
observed through the input and output processes U and Y .
The ultimate goal with such a model is to find a mean square stabilizing, minimum
variance control law. If the parameter sequence θ were completely observed then this
goal could be easily achieved by setting Uk = −θk Yk for each k ∈ Z+ , as in (2.18).
Since θ is only partially observed, we instead obtain recursive estimates of the
parameter process and choose a control law based upon these estimates. To do this
36 Markov models
we note that by viewing θ as a state process, as defined in [57], then because of the
assumptions made on (W , Z), the conditional expectation
θ̂k := E[θk | Yk ]
is computable using the Kalman filter (see [253, 240]) provided the initial distribution
of (U0 , Y0 , θ0 ) for (2.20), (2.21) is Gaussian.
In this scalar case, the Kalman filter estimates are obtained recursively by the pair
of equations
The closed loop system gives rise to a nonlinear state space model of the form
(NSS1). It follows then that the triple
(Σk , θ̃k , Yk ),
where θ̃k := θk − θ̂k denotes the parameter estimation error,
is a Markov chain with state space X = [σz2 , σz2 /(1 − α2 )] × R2 . Although the state space is not
open, as required in (NSS1), when necessary we can restrict the chain to the interior
of X to apply the general results which will be developed for the nonlinear state space
model.
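Since the filter recursions are standard under these Gaussian assumptions, the closed
loop is straightforward to simulate. The sketch below is written from the generic scalar
Kalman filter for (2.20), (2.21), combined with the certainty-equivalence control
Uk = −θ̂k Yk ; it is an illustration only, not a transcription of the equations (2.22)–(2.24)
analyzed in the text.

    import numpy as np

    rng = np.random.default_rng(6)
    alpha, sz, sw = 0.99, 0.2, 0.1         # the "stable" case of Figure 2.5
    theta, theta_hat = 0.0, 0.0
    Sigma = sz**2 / (1 - alpha**2)         # prior variance of theta
    y, y_path = 0.0, []
    for k in range(1000):
        u = -theta_hat * y                 # certainty-equivalence control
        y_next = theta * y + u + rng.normal(scale=sw)   # (2.20)
        # scalar Kalman update for the unobserved parameter theta
        gain = alpha * Sigma * y / (sw**2 + Sigma * y**2)
        theta_hat = alpha * theta_hat + gain * (y_next - u - theta_hat * y)
        Sigma = sz**2 + alpha**2 * sw**2 * Sigma / (sw**2 + Sigma * y**2)
        theta = alpha * theta + rng.normal(scale=sz)    # (2.21)
        y = y_next
        y_path.append(y)

Note that the recursion for Sigma keeps it in the interval [σz2 , σz2 /(1 − α2 )], in accordance
with the state space X just described.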
Figure 2.5: Output Y of the SAC model. The sample path shown on the left was obtained
using σz = 0.2, and the one shown on the right used σz = 1.1. In each case α = 0.99 and
σw = 0.1.
Figure 2.6: Disturbance W for the SAC model: N (0, 0.01) Gaussian white noise.
In Figure 2.5 we have illustrated two typical sample paths of the output process Y ,
identical but for the different values of σz chosen. The disturbance process W in both
instances is i.i.d. N (0, 0.01); that is, σw = 0.1. A typical sample path of W is given in
Figure 2.6.
In both simulations we take α = 0.99. In the “stable” case shown on the left we
have σz = 0.2. In this case the output Y is barely distinguishable from the noise W .
In the second simulation, where σz = 1.1, we see that the output exhibits occasional
large bursts due to the more unpredictable behavior of the parameter process.
As we develop the general theory of Markov processes we will return to this example
to obtain fairly detailed properties of the closed loop system described by (2.22)-(2.24).
In Chapter 16 we characterize the mean square performance (2.17): when the pa-
rameter σz2 which defines the parameter variation is strictly less than unity, the limit
supremum is in fact a limit in this example, and this limit is independent of the initial
conditions of the system.
This limit, which is the expectation of Y0 with respect to an invariant measure,
cannot be calculated exactly due to the complexity of the closed loop system equations.
Using invariance, however, we may obtain explicit bounds on the limit, and give a
38 Markov models
characterization of the performance of the closed loop system which this limit describes.
Such characterizations are helpful in understanding how the performance varies as a
function of the disturbance intensity W and the parameter estimation error θ̃.
A chain which is a special form of the random walk chain in Section 1.2.3 is the renewal
process. Such chains will be fundamental in our later analysis of the structure of even
the most general of Markov chains, and here we describe the specific case where the
state space is countable.
Let {Y1 , Y2 , . . .} be a sequence of independent and identically distributed random
variables, with distribution p concentrated, not on the positive and negative integers, but
rather on Z+ . It is customary to assume that p(0) = 0. Let Y0 be a further independent
random variable, with the distribution of Y0 being a, also concentrated on Z+ . The
random variables
Zn := Σ_{i=0}^{n} Yi
form an increasing sequence taking values in Z+ , and are called a delayed renewal
process, with a being the delay in the first variable: if a = p then the sequence {Zn } is
merely referred to as a renewal process.
As with the two-sided random walk, Zn is a Markov chain: not a particularly
interesting one in some respects, since it is evanescent in the sense of Section 1.3.1 (II),
but with associated structure which we will use frequently, especially in Part III.
With this notation we have P(Z0 = n) = a(n) and by considering the value of Z0
and the independence of Y0 and Y1 , we find
P(Z1 = n) = Σ_{j=0}^{n} a(j)p(n − j).
To describe the n-step dynamics of the process {Zn } we need convolution notation.
Convolutions
We write a ∗ b for the convolution of two sequences a and b given by
a ∗ b (n) := Σ_{j=0}^{n} b(j)a(n − j) = Σ_{j=0}^{n} a(j)b(n − j)
so that, iterating, P(Zk = n) = a ∗ pk∗ (n), where pk∗ denotes the k-fold convolution of p
with itself.
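These convolution formulae compute directly. A minimal sketch (the increment law p
and the delay distribution a are illustrative) evaluating P(Zk = n) = a ∗ pk∗ (n):

    import numpy as np

    p = np.array([0.0, 0.3, 0.5, 0.2])     # an illustrative increment law on Z+, p(0) = 0
    a = np.array([0.5, 0.5])               # an illustrative delay distribution
    k = 3
    dist = a.copy()
    for _ in range(k):                     # a * p^{k*} by repeated convolution
        dist = np.convolve(dist, p)
    # dist[n] is now P(Z_k = n)
    assert abs(dist.sum() - 1.0) < 1e-12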
Two chains with appropriate regeneration associated with the renewal process are the
forward recurrence time chain, sometimes called the residual lifetime process, and the
backward recurrence time chain, sometimes called the age process.
We only use those aspects which we require in what follows, but for a much fuller
treatment of renewal and regeneration see Kingman [208] or Lindvall [239].
(Q2) The nth customer brings a job requiring service Sn where the ser-
vice times are independent of each other and of the interarrival times, and
are distributed as a variable S with distribution H(−∞, t] = P(S ≤ t).
(Q3) There is one server and customers are served in order of arrival.
Then the system is called a GI/G/1 queue.
The notation and many of the techniques here were introduced by Kendall [200, 201]:
GI for general independent input, G for general service time distributions, and 1 for a
single server system. There are many ways of analyzing this system: see Asmussen [9]
or Cohen [76] for comprehensive treatments.
Let N (t) be the number of customers in the queue at time t, including the customers
being served. This is clearly a process in continuous time. A typical sample path for
{N (t), t ≥ 0}, under the assumption that the first customer arrives at t = 0, is shown
in Figure 2.7.
Figure 2.7: A typical sample path of the queue length process N (t).
T̄i = T1 + · · · + Ti ,   i ≥ 1,   (2.26)
S̄i = S0 + · · · + Si ,   i ≥ 0.   (2.27)
Note that, in the sample path illustrated, because the queue empties at S̄2 , due to
T̄3 > S̄2 , the point x = T̄3 + S3 is not S̄3 , and the point T̄4 + S4 is not S̄4 , and so on.
Although the process {N (t)} occurs in continuous time, one key to its analysis
through Markov chain theory is the use of embedded Markov chains.
Consider the random variable Nn = N (T̄n −), which counts customers immediately
before each arrival. By convention we will set N0 = 0 unless otherwise indicated. We
will show that under appropriate circumstances for k ≥ −j
P(Nn +1 = j + k | Nn = j, Nn −1 , Nn −2 , . . . , N0 ) = pk , (2.28)
regardless of the values of {Nn −1 , . . . , N0 }. This will establish the Markovian nature of
the process, and indeed will indicate that it is a random walk on Z+ .
Since we consider N (t) immediately before every arrival time, Nn +1 can only increase
from Nn by one unit at most; hence, equation (2.28) holds trivially for k > 1.
For Nn+1 to increase by one unit we need there to be no departures in the time
period (T̄n , T̄n+1 ], and obviously this happens if the job in progress at T̄n is still in
progress at T̄n+1 .
It is here that some assumption on the service times will be crucial. For it is easy
to show, as we now sketch, that for a general GI/G/1 queue the probability of the
remaining service of the job in progress taking any specific length of time depends,
typically, on when the job began. In general, the past history {Nn −1 , . . . , N0 } will
provide information on when the customer began service, and this in turn provides
information on how long the customer will continue to be served.
To see this, consider, for example, a trajectory such as that up to (T1 −) on Fig-
ure 2.7, where {Nn = 1, Nn −1 = 0, . . .}. This tells us that the current job began exactly
so N = {Nn } is not a Markov chain, since from equation (2.29) and equation (2.30)
the different information in the events {Nn = 1, Nn −1 = 0} and {Nn = 1, Nn −1 =
1, Nn −2 = 0} (which only differ in the past rather than the present position) leads to
different probabilities of transition.
There is one case where this does not happen. If both sides of (2.30) are identical
so that the time until completion of service is quite independent of the time already
taken, then the extra information from the past is of no value.
This leads us to define a specific class of models for which N is Markovian.
GI/M/1 assumption
(Q4) If the distribution of service times is exponential with
H(−∞, t] = 1 − e−µt ,   t ≥ 0,
for some µ > 0, then the queue is called a GI/M/1 queue.
Here the M stands for Markovian, as opposed to the previous “general” assumption.
If we can now make assumption (Q4) that we have a GI/M/1 queue, then the well-
known “loss of memory” property of the exponential shows that, for any t, z ≥ 0,
P(S > t + z | S > z) = P(S > t) = e−µt .
In this way, the independence and identical distribution structure of the service times
show that, no matter which previous customer was being served, and when their service
started, there will be some z such that
M/G/1 assumption
(Q5) If the distribution of inter-arrival times is exponential with
G(−∞, t] = 1 − e−λt ,   t ≥ 0,
for some λ > 0, then the queue is called an M/G/1 queue.
The actual probabilities governing the motion of these queueing models will be
developed in Chapter 3.
When a path of the contents process reaches zero, the process continues to take the
value zero until it is replenished by a positive input.
This model is a simplified version of the way in which a dam works; it is also a
model for an inventory, or for any other similar storage system.
The basic storage process operates in continuous time: to render it Markovian we
analyze it at specific time points when it (probabilistically) regenerates, as follows.
Φn +1 = [Φn + Sn − Jn ]+ ,
where the variables {Jn } are independent and identically distributed.
Figure 2.8: Storage system paths. The plot shown on the left uses α/β = 2, and on the
right α/β = 0.5. In each case r = 1.
Usually we assume R(x) to be finite for all x. Since R is strictly increasing the inverse
function R−1 (t) is well defined for all t, and it follows that the drop in level in a time
period t with no input is given by
Jx (t) = x − q(x, t)
where
q(x, t) = R−1 (R(x) − t).
This enables us to use the same type of random walk calculation as for the Moran dam.
As before, when a path of this storage process reaches zero, the process continues
to take the value zero until it is replenished by a positive input.
It is again necessary to analyze such a model at the times immediately before each
input in order to ensure a Markovian model. The assumptions we might use for such a
model are
Then the chain Φ = {Φn } represents the contents of the storage system at
the times {Tn −} immediately before each input, and is called the content-
dependent storage model.
Such models are studied in [157, 53]. In considering the connections between queue-
ing and storage models, it is then immediately useful to realize that this is also a model
of the waiting times in a model where the service time varies with the level of demand,
as studied in [56].
2.5 Commentary*
We have skimmed the Markovian models in the areas in which we are interested, try-
ing to tread the thin line between accessibility and triviality. The research literature
abounds with variations on the models we present here, and many of them would benefit
by a more thorough approach along Markovian lines.
For many more models with time series applications, the reader should see Brockwell
and Davis [51], especially Chapter 12; Granger and Anderson for bilinear models [143];
and for nonlinear models see Tong [388], who considers models similar to those we have
introduced from a Markovian viewpoint, and in particular discusses the bilinear and
SETAR models. Linear and bilinear models are also developed by Duflo in [102], with
a view towards stability similar to ours. For a development of general linear systems
theory the reader is referred to Caines [57] for a control perspective, or Aoki [5] for a
view towards time series analysis.
Bilinear models have received a great deal of attention in recent years in both time
series and systems theory. The dependent parameter bilinear model defined by (2.14,
2.13) is called a doubly stochastic autoregressive process of order 1, or DSAR(1), in
Tjøstheim [386]. Realization theory for related models is developed in Guégan [146]
and Mittnik [285], and the papers Pourahmadi [321], Brandt [44], Meyn and Guo [275],
and Karlsen [195] provide various stability conditions for bilinear models.
The idea of analyzing the nonlinear state space model by examining an associated
control model goes back to Stroock and Varadhan [378] and Kunita [227, 228] in con-
tinuous time. In control and systems models, linear state space models have always
played a central role, while nonlinear models have taken a much more significant role
over the past decade: see Kumar and Varaiya [225], Duflo [102], and Caines [57] for a
development of both linear adaptive control models, and (nonlinear) controlled Markov
chains.
The embedded regeneration time approach has been enormously significant since its
introduction by Kendall in [200, 201]. There are many more sophisticated variations
than those we shall analyze available in the literature. A good recent reference is
Asmussen [9], whilst Cohen [76] is encyclopedic.
The interested reader will find that, although we restrict ourselves to these relatively
less complicated models in illustrating the value of Markov chain modeling, virtually
all of our general techniques apply across more complex systems. As one example, note
that the stability of models which are state dependent, such as the content-dependent
storage model of Section 2.4.4, has only recently received attention [56], but using the
methods developed in later chapters it is possible to characterize it in considerable detail
[277, 279, 280].
The storage models described here can also be thought of, virtually by renaming
the terms, as models for state-dependent inventories, insurance models, and models of
the residual service in a GI/G/1 queue. To see the last of these, consider the amount of
service brought by each customer as the input to the “store” of work to be processed,
and note that the server works through this store of work at a constant rate.
The residual service can be, however, a somewhat minor quantity in a queueing
model, and in Section 3.5.4 below we develop a more complex model which is a better
representation of the dynamics of the GI/G/1 queue.
Added in second printing: In the last two years there has been a virtual explosion
in the use of general state space Markov chains in simulation methods, and especially
in Markov chain Monte Carlo methods which include Metropolis–Hastings and Gibbs
sampling techniques, which were touched on in Section 1.1 (f). Any future edition will
need to add these to the collection of models here and examine them in more detail:
the interested reader might look at the recent results [63, 290, 360, 333, 328, 256, 335],
which all provide examples of the type of chains studied in this book.
Chapter 3
Transition probabilities
As with all stochastic processes, there are two directions from which to approach the
formal definition of a Markov chain.
The first is via the process itself, by constructing (perhaps by heuristic arguments
at first, as in the descriptions in Chapter 2) the sample path behavior and the dynamics
of movement in time through the state space on which the chain lives. In some of our
examples, such as models for queueing processes or models for controlled stochastic
systems, this is the approach taken. From this structural definition of a Markov chain,
we can then proceed to define the probability laws governing the evolution of the chain.
The second approach is via those very probability laws. We define them to have
the structure appropriate to a Markov chain, and then we must show that there is
indeed a process, properly defined, which is described by the probability laws initially
constructed. In effect, this is what we have done with the forward recurrence time chain
in Section 2.4.1.
From a practitioner’s viewpoint there may be little difference between the ap-
proaches. In many books on stochastic processes, such as Çinlar [59] or Karlin and
Taylor [194], the two approaches are used, as they usually can be, almost interchange-
ably; and advanced monographs such as Nummelin [303] also often assume some of the
foundational aspects touched on here to be well understood.
Since one of our goals in this book is to provide a guide to modern general space
Markov chain theory and methods for practitioners, we give in this chapter only a sketch
of the full mathematical construction which provides the underpinning of Markov chain
theory.
However, we also have as another, and perhaps somewhat contradictory, goal the
provision of a thorough and rigorous exposition of results on general spaces, and for
these it is necessary to develop both notation and concepts with some care, even if some
of the more technical results are omitted.
Our approach has therefore been to develop the technical detail in so far as it is
relevant to specific Markov models, and where necessary, especially in techniques which
are rather more measure theoretic or general stochastic process theoretic in nature, to
refer the reader to the classic texts of Doob [99], and Chung [71], or the more recent
exposition of Markov chain theory by Revuz [326] for the foundations we need. Whilst
such an approach renders this chapter slightly less than self-contained, it is our hope
that the gaps in these foundations will be either accepted or easily filled by such external
sources.
Our main goals in this chapter are thus
(i) to demonstrate that the dynamics of a Markov chain {Φn } can be completely
defined by its one step “transition probabilities”
P (x, A) = P(Φn+1 ∈ A | Φn = x),
which are well defined for appropriate initial points x and sets A;
(ii) to develop the functional forms of these transition probabilities for many of the
specific models in Chapter 2, based in some cases on heuristic analysis of the chain
and in other cases on development of the probability laws; and
(iii) to develop some formal concepts of hitting times on sets, and the “strong Markov
property” for these and related stopping times, which will enable us to address
issues of stability and structure in subsequent chapters.
We shall start first with the formal concept of a Markov chain as a stochastic process,
and move then to the development of the transition laws governing the motion of the
chain; and complete the cycle by showing that if one starts from a set of possible
transition laws then it is possible to move from these to a chain which is well defined
and governed by these laws.
It may on the face of it seem odd to introduce quite general spaces before rather
than after topological (or more structured) spaces.
This is however quite deliberate, since (perhaps surprisingly) we rarely find the extra
structure actually increasing the ease of approach. From our point of view, we introduce
topological spaces largely because specific applied models evolve on such spaces, and
for such spaces we will give specific interpretations of our general results, rather than
extending specific topological results to more general contexts.
For example, after framing general properties of sets, we identify these general prop-
erties as holding for compact or open sets if the chain is on a topological space; or after
framing general properties of Φ, we develop the consequences of these when Φ is suitably
continuous with respect to the topology considered.
The first formal introduction of such topological concepts is given in Chapter 6, and
is exemplified by an analysis of linear and nonlinear state space models in Chapter 7.
Prior to this we concentrate on countable and general spaces: for purposes of exposi-
tion, our approach will often involve the description of behavior on a countable space,
followed by the development of analogous behavior on a general space, and completed
by specialization of results, where suitable, to more structured topological spaces in due
course.
For some readers, countable space models will be familiar: nonetheless, by develop-
ing the results first in this context, and then the analogues for the less familiar general
space processes on a systematic basis we intend to make the general context more acces-
sible. By then specializing where appropriate to topological spaces, we trust the results
will be found more applicable for, say, those models which evolve on multidimensional
Euclidean space Rk , or one of its subsets.
There is one caveat to be made in giving this description. One of the major observa-
tions for Markov chains is that in many cases, the full force of a countable space is not
needed: we merely require one “accessible atom” in the space, such as we might have
with the state {0} in the storage models in Section 2.4.1. To avoid repetition we will
often assume, especially later in the book, not the full countable space structure but
just the existence of one such point: the results then carry over with only notational
changes to the countable case.
In formalizing the concept of a Markov chain we pursue this pattern now, first
developing the countable space foundations and then moving on to the slightly more
complex basis for general space chains.
Pµ (Φ ∈ A | Φ0 = x0 ) = Px 0 (Φ ∈ A) (3.1)
where Px 0 is the probability distribution on F which is obtained when the initial distri-
bution is the point mass δx 0 at x0 .
The defining characteristic of a Markov chain is that its future trajectories depend
on its present and its past only through the current value.
To commence to formalize this, we first consider only the laws governing a trajectory
of fixed length n ≥ 1. The random variables {Φ0 , . . . , Φn }, thought of as a sequence, take
values in the space Xn+1 = X0 × · · · × Xn , the (n + 1)-fold product of copies Xi of the
countable space X, equipped with the product σ-field B(Xn +1 ) which consists again of
all subsets of Xn +1 .
The conditional probability
defined for any sequence {x0 , . . . , xn } ∈ Xn +1 and x0 ∈ X, and the initial probability
distribution µ on B(X) completely determine the distributions of {Φ0 , . . . , Φn }.
Pµ (Φ0 = x0 , Φ1 = x1 , Φ2 = x2 , . . . , Φn = xn )
= µ(x0 )Px0 (Φ1 = x1 )Px1 (Φ1 = x2 ) · · · Pxn−1 (Φ1 = xn ).   (3.3)
Writing P (x, y) := Px (Φ1 = y) for the one-step transition probabilities, this becomes
Pµ (Φ0 = x0 , Φ1 = x1 , . . . , Φn = xn )
= µ(x0 )P (x0 , x1 )P (x1 , x2 ) · · · P (xn−1 , xn ),   (3.4)
Equation (3.5) incorporates both the “loss of memory” of Markov chains and the “time
homogeneity” embodied in our definitions. It is possible to mimic this definition, asking
that the Px j (Φ1 = xj +1 ) depend on the time j at which the transition takes place; but
the theory for such inhomogeneous chains is neither so ripe nor so clean as for the chains
we study, and we restrict ourselves solely to the time-homogeneous case in this book.
For a given model we will almost always define the probability Px 0 for a fixed x0 by
defining the one-step transition probabilities for the process, and building the overall
distribution using (3.4).
This is done using a Markov transition matrix.
In the next section we show how to take an initial distribution µ and a transition matrix
P and construct a distribution Pµ so that the conditional distributions of the process
may be computed as in (3.1), and so that for any x, y,
Pµ (Φn = y | Φ0 = x) = P n (x, y) (3.8)
For this reason, P n is called the n-step transition matrix. For A ⊆ X, we also put
P n (x, A) := Σ_{y∈A} P n (x, y).
to govern the overall evolution of Φ. The formula (3.1) and the interpretation of the
transition function given in (3.8) follow immediately from this construction.
A careful construction is in Chung [71], Chapter I.2. This leads to
Theorem 3.2.1. If µ = {µ(x), x ∈ X} and P = {P (x, y), x, y ∈ X}
are an initial measure on X and a Markov transition matrix satisfying (3.6) then there
exists a Markov chain Φ on (Ω, F) with probability law Pµ satisfying
The backward recurrence time chain V − has a similarly simple structure. For any
n ∈ Z+ , let us write
p̄(n) = Σ_{j≥n+1} p(j).   (3.9)
Write M = sup{m ≥ 1 : p(m) > 0}; if M < ∞ then for this chain the state space
X = {0, 1, . . . , M − 1}; otherwise X = Z+ . In either case, for x ∈ X we have (with Y as
a generic increment variable in the renewal process)
P (x, x + 1) = P(Y > x + 1 | Y > x) = p̄(x + 1)/p̄(x),
P (x, 0) = P(Y = x + 1 | Y > x) = p(x + 1)/p̄(x),   (3.10)
and zero otherwise. Writing b(x + 1) := p(x + 1)/p̄(x), this gives a superdiagonal matrix
of the form

         b(1)   1 − b(1)      0          0       · · ·
         b(2)      0       1 − b(2)      0       · · ·
P   =    b(3)      0          0       1 − b(3)   · · ·
          ..                                      ..
           .                                       .
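For numerical work the matrix can be assembled directly from p; a minimal sketch, using
the identification b(x + 1) = p(x + 1)/p̄(x) noted above and an illustrative increment law
with M = 3:

    import numpy as np

    p = np.array([0.0, 0.2, 0.5, 0.3])     # illustrative increment law with M = 3
    pbar = 1.0 - np.cumsum(p)              # pbar(n) = P(Y > n) = sum_{j > n} p(j)
    M = 3
    P = np.zeros((M, M))                   # state space {0, 1, ..., M-1}
    for x in range(M):
        b = p[x + 1] / pbar[x]             # b(x+1) = p(x+1)/pbar(x)
        P[x, 0] = b                        # renewal: the chain returns to zero
        if x + 1 < M:
            P[x, x + 1] = 1.0 - b          # otherwise the age increases by one
    assert np.allclose(P.sum(axis=1), 1.0)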
whilst for y = 0,
P (x, 0) = P(Φ0 + W1 ≤ 0 | Φ0 = x)
= P(W1 ≤ −x)
= Γ(−∞, −x].
Γ(x) = P(Sn − Jn = x)
     = Σ_{i=0}^{∞} H(i)G(x + i).
We have rather forced the storage model into our countable space context by assuming
that the variables concerned are integer valued. We will rectify this in later sections.
For k > 1, on the other hand,
P(Nn+1 = j + k | Nn = j, Nn−1 , Nn−2 , . . . , N0 ) = 0.   (3.16)
The independence and identical distribution structure of the service times show as in
Section 2.4.2 that, no matter which previous customer was being served, and when their
service started,
P(Nn+1 = j + 1 | Nn = j, Nn−1 , Nn−2 , . . . , N0 ) = ∫_0^∞ e−µt G(dt) = p0   (3.17)
Proposition 3.3.2. For the M/G/1 queue, the sequence N∗ = {Nn∗ , n ≥ 0} can be
constructed as a Markov chain with state space Z+ and transition matrix
         q0   q1   q2   q3   q4   · · ·
         q0   q1   q2   q3   q4   · · ·
P   =    0    q0   q1   q2   q3   · · ·
         0    0    q0   q1   q2   · · ·
         ..   ..   ..   ..   ..
          .    .    .    .    .
Hence N∗ is similar to a random walk on a half line, but with a different modification
of the transitions away from zero.
P (r, q) = P(Xn +1 = q | Xn = r)
= Γ(q − αr), r, q ∈ Q.
Again, once we have P = {P (r, q), r, q ∈ Q}, we are guaranteed the existence of the
Markov chain X, using the results of Theorem 3.2.1 with P as transition probability
matrix.
This autoregression highlights immediately the shortcomings of the countable state
space structure. Although Q is countable, so that in a formal sense we can construct
a linear model satisfying (SLM1) and (SLM2) on Q in such a way that we can use
countable space Markov chain theory, it is clearly more natural to take, say, α as real
and the variable W as real valued also, so that Xn is real valued for any initial x0 ∈ R.
To model such processes, and the more complex autoregressions and nonlinear mod-
els which generalize them in Chapter 2, and which are clearly Markovian but continuous
valued in conception, we need a theory for continuous-valued Markov chains. We turn
to this now.
These are all well defined by the measurability of the integrands P ( · , · ) in the first
variable, and the fact that the kernels are measures in the second variable.
If we now extend Px^n to all of the product σ-field ⊗_{i=0}^{n} B(Xi ) in the usual way [37]
and repeat this procedure for increasing n, we find
Theorem 3.4.1. For any initial measure µ on B(X), and any transition probability
kernel P = {P (x, A), x ∈ X, A ∈ B(X)}, there exists a stochastic process Φ = {Φ0 , Φ1 , . . .}
on Ω = ∏_{i=0}^{∞} Xi , measurable with respect to F = ⊗_{i=0}^{∞} B(Xi ), and a probability
measure Pµ on F such that Pµ (B) is the probability of the event {Φ ∈ B} for B ∈ F; and
for measurable Ai ⊆ Xi , i = 0, . . . , n, and any n,
Pµ (Φ0 ∈ A0 , Φ1 ∈ A1 , . . . , Φn ∈ An )
= ∫_{y0 ∈A0} · · · ∫_{yn−1 ∈An−1} µ(dy0 )P (y0 , dy1 ) · · · P (yn−1 , An ).   (3.20)
Proof Because of the consistency of definition of the set functions Px^n , there is an
overall measure Px for which the Px^n are finite-dimensional distributions, which leads to
the result: the details are relatively standard measure-theoretic constructions, and are
given in the general case by Revuz [326], Theorem 2.8 and Proposition 2.11; whilst if the
space has a suitable topology, as in (MC1), then the existence of Φ is a straightforward
consequence of Kolmogorov’s Consistency Theorem for construction of probabilities on
topological spaces.
The details of this construction are omitted here, since it suffices for our purposes
to have indicated why transition probabilities generate processes, and to have spelled
out that the key equation (3.20) is a reasonable representation of the behavior of the
process in terms of the kernel P .
We can now formally define
We write P n for the n-step transition probability kernel {P n (x, A), x ∈ X, A ∈ B(X)}:
note that P n is defined analogously to the n-step transition probability matrix for the
countable space case.
As a first application of the construction equations (3.20) and (3.22), we have the
celebrated Chapman–Kolmogorov equations. These underlie, in one form or another,
virtually all of the solidarity structures we develop.
Theorem 3.4.2. For any m with 0 ≤ m ≤ n,
P n (x, A) = ∫_X P m (x, dy)P n−m (y, A),   x ∈ X, A ∈ B(X).   (3.23)
Px (Φmn ∈ A) = P mn (x, A).   (3.25)
This, and several other transition functions obtained from P , will be used widely in the
sequel.
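On a finite state space the Chapman–Kolmogorov equations reduce to matrix multiplica-
tion, and are easily checked numerically; a minimal sketch with an arbitrary two-state
matrix:

    import numpy as np

    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])             # an arbitrary two-state transition matrix
    n, m = 5, 2
    lhs = np.linalg.matrix_power(P, n)
    rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n - m)
    assert np.allclose(lhs, rhs)           # (3.23): P^n = P^m P^(n-m)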
This nomenclature is taken from the continuous time literature, but we will see that
in discrete time the m-skeletons and resolvents of the chain also provide a useful tool
for analysis.
There is one substantial difference in moving to the general case from the countable
case, which flows from the fact that the kernel P n can no longer be viewed as symmetric
in its two arguments.
In the general case the kernel P n operates on quite different entities from the left
and the right. As an operator P n acts on both bounded measurable functions f on X
and on σ-finite measures µ on B(X) via
P n f (x) = ∫_X P n (x, dy)f (y),        µP n (A) = ∫_X µ(dx)P n (x, A),
and we shall use the notation P n f , µP n to denote these operations. We shall also write
P n (x, f ) := ∫_X P n (x, dy)f (y) = δx P n f.
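On a finite space the two actions are simply multiplication of a matrix by a column
vector on one side and a row vector on the other; a minimal sketch:

    import numpy as np

    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])
    f = np.array([1.0, 5.0])               # a bounded function on X = {0, 1}
    mu = np.array([0.3, 0.7])              # an initial measure on X
    Pf = P @ f                             # P f(x)  = sum_y P(x, y) f(y)
    muP = mu @ P                           # mu P(A) = sum_x mu(x) P(x, A)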
Proposition 3.4.3. If Φ is a Markov chain on (Ω, F), with initial measure µ, and
h : Ω → R is bounded and measurable, then
The formulation of the Markov concept itself is made much simpler if we develop
more systematic notation for the information encompassed in the past of the process,
and if we introduce the “shift operator” on the space Ω.
For a given initial distribution, define the σ-field
FnΦ := σ(Φ0 , . . . , Φn ),
which is the smallest σ-field for which the random variables {Φ0 , . . . , Φn } are measurable.
In many cases, FnΦ will coincide with B(Xn ), although this depends in particular on the
initial measure µ chosen for a particular chain.
The shift operator θ is defined to be the mapping on Ω given by
θ({x0 , x1 , . . .}) = {x1 , x2 , . . .},
with iterates θ1 = θ, θk+1 = θ ◦ θk , k ≥ 1, so that for H = h(Φ0 , Φ1 , . . .),
θk H = h(Φk , Φk+1 , . . .).
The simple Markov property
Eµ [θn H | FnΦ ] = EΦn [H],   (3.28)
valid for any bounded measurable h and fixed n ∈ Z+ , describes the time-homogeneous
Markov property in a succinct way.
It is not always the case that FnΦ is complete: that is, contains every set of Pµ -
measure zero. We adopt the following convention as in [326]. For any initial measure
µ we say that an event A occurs Pµ -a.s. to indicate that Ac is a set contained in an
element of FnΦ which is of Pµ -measure zero.
If A occurs Px -a.s. for all x ∈ X then we write that A occurs P∗ -a.s.
(i) For any set A ∈ B(X), the occupation time ηA is the number of visits
by Φ to A after time zero, and is given by
ηA := Σ_{n=1}^{∞} I{Φn ∈ A}.
τA := min{n ≥ 1 : Φn ∈ A},
σA := min{n ≥ 0 : Φn ∈ A}
are called the first return and first hitting times on A, respectively.
τA (1) := τA ,
τA (k) := min{n > τA (k − 1) : Φn ∈ A}. (3.29)
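Along a simulated trajectory these quantities are straightforward to compute (up to
truncation at the simulation horizon); a minimal sketch for an illustrative three-state
chain:

    import numpy as np

    rng = np.random.default_rng(7)
    P = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.3, 0.5],
                  [0.4, 0.0, 0.6]])        # an illustrative three-state chain
    x, path = 1, [1]                       # start at state 1
    for n in range(1000):
        x = rng.choice(3, p=P[x])
        path.append(x)
    A = {0}
    visits = [n for n, s in enumerate(path) if s in A]
    sigma_A = visits[0]                            # first hitting time (min n >= 0)
    tau_A = next(n for n in visits if n >= 1)      # first return time (min n >= 1)
    eta_A = sum(1 for n in visits if n >= 1)       # occupation time, truncated at horizon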
In order to analyze numbers of visits to sets, we often need to consider the behavior
after the first visit τA to a set A (which is a random time), rather than behavior
after fixed times. One of the most crucial aspects of Markov chain theory is that the
“forgetfulness” properties in equation (3.20) or equation (3.27) hold, not just for fixed
times n, but for the chain interrupted at certain random times, called stopping times,
and we now introduce these ideas.
Stopping times
A function ζ : Ω → Z+ ∪ {∞} is a stopping time for Φ if for any initial
distribution µ the event {ζ = n} ∈ FnΦ for all n ∈ Z+ .
The first return and the hitting times on sets provide simple examples of stopping
times.
Proposition 3.4.4. For any set A ∈ B(X), the variables τA and σA are stopping times
for Φ.
We write L(x, A) := Px (τA < ∞)
for the return time probability to a set A starting from the state x.
The simple Markov property (3.28) holds for any bounded measurable h and fixed
n ∈ Z+ . We now extend (3.28) to stopping times.
If ζ is an arbitrary stopping time, then the fact that our time set is Z+ enables us
to define the random variable Φζ by setting Φζ = Φn on the event {ζ = n}. For a
stopping time ζ the property which tells us that the future evolution of Φ after the
stopping time depends only on the value of Φζ , rather than on any other past values,
is called the strong Markov property.
To describe this formally, we need to define the σ-field
FζΦ := {A ∈ F : {ζ = n} ∩ A ∈ FnΦ , n ∈ Z+ },
which describes events which happen “up to time ζ”.
Proposition 3.4.6. For a Markov chain Φ with discrete time parameter, the strong
Markov property always holds.
We are not always interested only in the times of visits to particular sets. Often the
quantities of interest involve conditioning on such visits being in the future.
Taboo probabilities
We define the n-step taboo probabilities as
AP n (x, B) := Px (Φn ∈ B, τA ≥ n),   x ∈ X, A, B ∈ B(X).
Γ(A) = P(Sn − Jn ∈ A)
     = ∫_0^∞ G(A/r + y/r) H(dy),   (3.37)
Zn := Σ_{i=0}^{n} Yi
are again called a delayed renewal process, with Γ0 being the distribution of the delay
described by the first variable. If Γ0 = Γ then the sequence {Zn } is again referred to
as a renewal process.
As with the integer-valued case, write Γ0 ∗ Γ for the convolution of Γ0 and Γ given
by
Γ0 ∗ Γ (dt) := ∫_0^t Γ(dt − s) Γ0 (ds) = ∫_0^t Γ0 (dt − s) Γ(ds)   (3.38)
and Γn∗ for the nth convolution of Γ with itself. By decomposing successively over the
values of the first n variables Z0 , . . . , Zn−1 we have that
P(Zn ≤ t) = Γ0 ∗ Γn∗ (−∞, t],
and so the renewal measure given by U (−∞, t] := Σ_{n=0}^{∞} Γn∗ (−∞, t] has the interpretation
U [0, t] = E0 [number of renewals in [0, t]],
and
Γ0 ∗ U [0, t] = EΓ0 [number of renewals in [0, t]],
where E0 refers to the expectation when the first renewal is at 0, and EΓ 0 refers to the
expectation when the first renewal has distribution Γ0 .
It is clear that Zn is a Markov chain: its transition probabilities are given by
P (x, A) = Γ(A − x),
and so Zn is a random walk. It is not a very stable one, however, as it moves inexorably
to infinity with each new step.
The forward and backward recurrence time chains, in contrast to the renewal process
itself, exhibit a much greater degree of stability: they grow, then they diminish, then
they grow again.
We call the process
(RT3)   V + (t) := inf(Zn − t : Zn > t, n ≥ 1),   t ≥ 0,
the forward recurrence time process; and for any δ > 0, the discrete time
chain Vδ+ = {Vδ+ (n) = V + (nδ), n ∈ Z+ } is called the forward recurrence
time δ-skeleton.
We call the process
(RT4)   V − (t) := inf(t − Zn : Zn ≤ t, n ≥ 1),   t ≥ 0,
the backward recurrence time process; and for any δ > 0, the discrete time
chain Vδ− = {Vδ− (n) = V − (nδ), n ∈ Z+ } is called the backward recurrence
time δ-skeleton.
No matter what the structure of the renewal sequence (and in particular, even if Γ
is not exponential), the forward and backward recurrence time δ-skeletons Vδ+ and Vδ−
are Markovian.
To see this for the forward chain, note that if x > δ, then the transition probabilities
P δ of Vδ+ are merely
P δ (x, {x − δ}) = 1,
whilst if x ≤ δ we have, by decomposing over the time and the index of the last
renewal in the period after the current forward recurrence time finishes, and using the
whilst we have a similarly embedded chain after the service times if the inter-arrival
time is exponential. However, the numbers in the queue, even at the arrival or departure
times, are not Markovian without such exponential assumptions.
The key step in the general case is to augment {Nn } so that we do get a Markov
model. This augmentation involves combining the information on the numbers in the
queue with the information in the residual service time.
To do this we introduce a bivariate “ladder chain” on a “ladder” space Z+ × R,
with a countable number of rungs indexed by the first variable and with each rung
constituting a copy of the real line.
This construction is in fact more general than that for the GI/G/1 queue alone, and
we shall use the ladder chain model for illustrative purposes on a number of occasions.
Define the Markov chain Φ = {Φn } on Z+ × R with motion defined by the transition
probabilities P (i, x; j × A), i, j ∈ Z+ , x ∈ R, A ∈ B(R) given by
P (i, x; j × A) = 0, j > i + 1,
P (i, x; j × A) = Λi−j +1 (x, A), j = 1, . . . , i + 1, (3.40)
P (i, x; 0 × A) = Λ∗i (x, A).
where each of the Λi , Λ∗i is a substochastic transition probability kernel on R in its own
right.
The translation invariant and “skip-free to the right” nature of the movement of
this chain, incorporated in (3.41), indicates that it is a generalization of those random
walks which occur in the GI/M/1 queue, as delineated in Proposition 3.3.1. We have
         Λ∗0   Λ0    0     0    · · ·
         Λ∗1   Λ1    Λ0    0    · · ·
P   =    Λ∗2   Λ2    Λ1    Λ0   · · ·
          ..    ..    ..    ..   ..
           .     .     .     .    .
where now the Λi , Λ∗i are substochastic transition probability kernels rather than mere
scalars.
To use this construction in the GI/G/1 context we write
Φn = (Nn , Rn ), n ≥ 1,
then Φ = {Φn ; n ∈ Z+ } can be realized as a Markov chain with the structure (3.41), as
we now demonstrate by constructing the transition kernel P explicitly.
As in (Q1)–(Q3) let H denote the distribution function of service times, and G
denote the distribution function of inter-arrival times; and let Z1 , Z2 , Z3 , . . . denote an
undelayed renewal process with Zn −Zn −1 = Sn having the service distribution function
H, as in (2.27). This differs from the process of completion points of services in that the
latter may have longer intervals when there is no customer present, after completion of
a busy cycle.
Let Rt denote the forward recurrence time in the renewal process {Zk } at time t
in this process, i.e., Rt = ZN (t)+1 − t, where N (t) = sup{n : Zn ≤ t} as in (RT3). If
R0 = x then Z1 = x. Now write
Pnt (x, [0, y]) := P(N (t) = n, Rt ≤ y | R0 = x)
for the probability that, in this renewal process, n “service times” are completed in [0, t]
and that the residual time of the current service at t is in [0, y], given R0 = x.
With these definitions it is easy to verify that the chain Φ has the form (3.41) with
the specific choice of the substochastic transition kernels Λi , Λ∗i given by
Λn (x, [0, y]) = ∫_0^∞ Pnt (x, [0, y]) G(dt)   (3.42)
and
Λ∗n (x, [0, y]) = [ Σ_{j=n+1}^{∞} Λj (x, [0, ∞)) ] H[0, y].   (3.43)
in particular cases. The general functional form which we construct here for the scalar
SNSS(F ) model of Section 2.2.1 will be used extensively, as will the techniques which
are used in constructing its form.
For any bounded and measurable function h : X → R we have from (SNSS1),
P h (x) = E[h(Xn +1 ) | Xn = x]
= E[h(F (x, W ))]
where W is a generic noise variable. Since Γ denotes the distribution of W , this becomes
P h (x) = ∫_{−∞}^{∞} h(F (x, w)) Γ(dw)
and by specializing to the case where h = IA , we see that for any measurable set A and
any x ∈ X,
P (x, A) = ∫_{−∞}^{∞} I{F (x, w) ∈ A} Γ(dw).
To construct the k-step transition probability, recall from (2.5) that the transition
maps for the SNSS(F ) model are defined by setting F0 (x) = x, F1 (x0 , w1 ) = F (x0 , w1 ),
and for k ≥ 1,
Fk+1 (x0 , w1 , . . . , wk+1 ) := F (Fk (x0 , w1 , . . . , wk ), wk+1 ),
where x0 and wi are arbitrary real numbers.
initial condition X0 = x0 and any k ∈ Z+ ,
Xk = Fk (x0 , W1 , . . . , Wk ),
which immediately implies that the k-step transition function may be expressed as
P k (x0 , A) = P(Fk (x0 , W1 , . . . , Wk ) ∈ A).
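This composition formula yields an immediate Monte Carlo scheme for the k-step kernel:
draw (W1 , . . . , Wk ), apply the maps, and average. A minimal sketch, for a hypothetical
map F and the set A = [0, ∞):

    import numpy as np

    rng = np.random.default_rng(8)

    def F(x, w):                           # a hypothetical SNSS map
        return 0.5 * np.tanh(x) + w

    def Fk(x0, ws):                        # F_k(x0, w1, ..., wk) by composition
        x = x0
        for w in ws:
            x = F(x, w)
        return x

    x0, k, N = 0.2, 5, 10_000
    draws = [Fk(x0, rng.normal(size=k)) for _ in range(N)]
    Pk_A = np.mean([x >= 0.0 for x in draws])   # estimate of P^k(x0, [0, infinity))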
3.6 Commentary
The development of foundations in this chapter is standard. The existence of the
excellent accounts in Chung [71] and Revuz [326] renders it far less necessary for us to
fill in specific details.
The one real assumption in the general case is that the σ-field B(X) is countably
generated. For many purposes, even this condition can be relaxed, using the device
of “admissible σ-fields” discussed in Orey [309], Chapter 1. We shall not require, for
the models we develop, the greater generality of non-countably generated σ-fields, and
leave this expansion of the concepts to the reader if necessary.
The Chapman–Kolmogorov equations, simple though they are, hold the key to much
of the analysis of Markov chains. The general formulation of these dates to Kolmogorov
[215]: David Kendall comments [204] that the physicist Chapman was not aware of his
role in this terminology, which appears to be due to work on the thermal diffusion of
grains in a non-uniform fluid.
The Chapman–Kolmogorov equations indicate that the family {P^n} is a semigroup of
operators, just as the corresponding matrices are, and in the general case this observation enables an approach to the theory of Markov chains through the mathematical
structures of semi-groups of operators. This has proved a very fruitful method, espe-
cially for continuous time models. However, we do not pursue that route directly in
this book, nor do we pursue the possibilities of the matrix structure in the countable
case.
This is largely because, as general non-negative operators, the P^n often do not act
on useful spaces for our purposes. The one real case where the P^n operate success-
fully on a normed space occurs in Chapter 16, and even there the space only emerges
after a probabilistic argument is completed, rather than providing a starting point for
analysis.
Foguel [122, 124] has a thorough exposition of the operator-theoretic approach to
chains in discrete time, based on their operation on L1 spaces. Vere-Jones [405, 406]
has a number of results based on the action of a matrix P as a non-negative operator
on sequence spaces suitably structured, but even in this countable case results are
limited. Nummelin [303] couches many of his results in a general non-negative operator
context, as does Tweedie [394, 395], but the methods are probabilistic rather than using
traditional operator theory.
The topological spaces we introduce here will not be considered in more detail until
Chapter 6. Very many of the properties we derive will actually need less structure
than we have imposed in our definition of “topological” spaces: often (see for example
Tuominen and Tweedie [391]) all that is required is a countably generated topology with
the T1 separability property. The assumptions we make seem unrestrictive in practice,
however, and avoid occasional technicalities of proof.
Hitting times and their properties are of prime importance in all that follows. On
a countable space Chung [71] has a detailed account of taboo probabilities, and much
of our usage follows his lead and that of Nummelin [303], although our notation differs
in minor ways from the latter. In particular our τA is, regrettably, Nummelin’s SA and
our σA is Nummelin’s TA ; our usage of τA agrees, however, with that of Chung [71] and
Asmussen [9], and we hope is the more standard.
The availability of the strong Markov property is vital for much of what follows.
Kac is reported as saying [50] that he was fortunate, for in his day all processes had the
strong Markov property: we are equally fortunate that, with a countable time set, all
chains still have the strong Markov property.
The various transition matrices that we construct are well known. The reader who
is not familiar with such concepts should read, say, Çinlar [59], Karlin and Taylor [194]
or Asmussen [9] for these and many other not dissimilar constructions in the queue-
ing and storage area. For further information on linear stochastic systems the reader
is referred to Caines [57]. The control and systems areas have concentrated more in-
tensively on controlled Markov chains which have an auxiliary input which is chosen
to control the state process Φ. Once a control is applied in this way, the “closed loop” system is again of the Markovian form studied in this book.
Irreducibility
This chapter is devoted to the fundamental concept of irreducibility: the idea that all
parts of the space can be reached by a Markov chain, no matter what the starting
point. Although the initial results are relatively simple, the impact of an appropriate
irreducibility structure will have wide-ranging consequences, and it is therefore of critical
importance that such structures be well understood.
The results summarized in Theorem 4.0.1 are the highlights of this chapter from a
theoretical point of view. An equally important aspect of the chapter is, however, to
show through the analysis of a number of models just what techniques are available in
practice to ensure the initial condition of Theorem 4.0.1 (“ϕ-irreducibility”) holds, and
we believe that these will repay equally careful consideration.
Theorem 4.0.1. If there exists an “irreducibility” measure ϕ on B(X) such that for
every state x
ϕ(A) > 0 ⇒ L(x, A) > 0 (4.1)
then there exists an essentially unique “maximal” irreducibility measure ψ on B(X) such
that
(i) for every state x we have L(x, A) > 0 whenever ψ(A) > 0, and also
(ii) if ψ(A) = 0, then ψ(Ā) = 0, where
Ā := {y : L(y, A) > 0} ;
(iii) if ψ(Ac ) = 0, then A = A0 ∪ N where the set N is also ψ-null, and the set A0 is
absorbing in the sense that
P (x, A0 ) ≡ 1, x ∈ A0 .
Proof The existence of a measure ψ satisfying the irreducibility conditions (i) and
(ii) is shown in Proposition 4.2.2, and that (iii) holds is in Proposition 4.2.3.
The term “maximal” is justified since we will see that ϕ is absolutely continuous
with respect to ψ, written ψ ≻ ϕ, for every ϕ satisfying (4.1); here absolute continuity
of ϕ with respect to ψ means that ψ(A) = 0 implies ϕ(A) = 0.
Verifying (4.1) is often relatively painless. State space models on Rk for which the
noise or disturbance distribution has a density with respect to Lebesgue measure will
typically have such a property, with ϕ taken as Lebesgue measure restricted to an open
set (see Section 4.4, or in more detail, Chapter 7); chains with a regeneration point
α reached from everywhere will satisfy (4.1) with the trivial choice of ϕ = δα (see
Section 4.3).
The extra benefit of defining much more accurately the sets which are avoided by
“most” points, as in Theorem 4.0.1 (ii), or of knowing that one can omit ψ-null sets and
restrict oneself to an absorbing set of “good” points as in Theorem 4.0.1 (iii), is then of
surprising value, and we use these properties again and again. These are however far
from the most significant consequences of the seemingly innocuous assumption (4.1):
far more will flow in Chapter 5, and thereafter.
The most basic structural results for Markov chains, which lead to this formalization
of the concept of irreducibility, involve the analysis of communicating states and sets. If
one can tell which sets can be reached with positive probability from particular starting
points x ∈ X, then one can begin to have an idea of how the chain behaves in the longer
term, and then give a more detailed description of that longer term behavior.
Our approach therefore commences with a description of communication between
sets and states which precedes the development of irreducibility.
Consider, for example, the random walk on a half line, defined by
Φ_n = [Φ_{n−1} + W_n]^+.   (4.2)
In this example, we might single out the set {0} and ask: can the chain ever reach the
state {0}?
It is transparent from the definition of P (x, 0) that {0} can be reached with positive
probability, and in one step, provided the distribution Γ of the increment {Wn } has an
4.1. Communication and irreducibility: Countable spaces 77
infinite negative tail. But suppose we have, not such a long tail, but only P(Wn < 0) > 0,
with, say,
Γ(w) = δ > 0   (4.4)
for some w < 0. Then we have for any x that after n ≥ |x/w| steps,
P_x(Φ_n = 0) ≥ P(W_1 = w, W_2 = w, . . . , W_n = w) = δ^n > 0,
so that {0} is always reached with positive probability.
On the other hand, if P(Wn < 0) = 0 then it is equally clear that {0} cannot be
reached with positive probability from any starting point other than 0. Hence L(x, 0) >
0 for all states x or for none, depending on whether (4.4) holds or not.
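This geometric-trials bound is easy to check by simulation. The following sketch, with an assumed increment law putting mass δ on w = −1, estimates the probability of reaching {0} from a fixed starting point within a finite horizon; the particular support and probabilities are illustrative only.

```python
# Simulation sketch: when P(W = -1) = delta > 0, the chain
# Phi_n = [Phi_{n-1} + W_n]^+ reaches {0} from any x with positive
# probability.  The increment law below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
increments = np.array([-1, 0, 2])          # support of W
probs = np.array([0.3, 0.3, 0.4])          # P(W = -1) = delta = 0.3

def hits_zero(x, horizon=500):
    phi = x
    for _ in range(horizon):
        phi = max(phi + rng.choice(increments, p=probs), 0)
        if phi == 0:
            return True
    return False

x0 = 10
est = np.mean([hits_zero(x0) for _ in range(2000)])
print(f"estimated L({x0}, {{0}}) over 500 steps: {est:.3f}")
# The crude bound in the text already gives L(x,{0}) >= delta^x > 0.
```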
But we might also focus on points other than {0}, and it is then possible that a
number of different sorts of behavior may occur, depending on the distribution of W .
If we have P(W = y) > 0 for all y ∈ Z then from any state there is positive probability
of Φ reaching any other state at the next step. But suppose we have the distribution
of the increments {Wn } concentrated on the even integers, with
P(W = 2y) > 0, P(W = 2y + 1) = 0, y ∈ Z,
and consider any odd valued state, say w. In this case w cannot be reached from any
even valued state, even though from w itself it is possible to reach every state with
positive probability, via transitions of the chain through {0}.
Thus for this rather trivial example, we already see X breaking into two subsets with
substantially different behavior: writing Z0+ = {2y, y ∈ Z+ } and Z1+ = {2y + 1, y ∈ Z+ }
for the set of non-negative even and odd integers respectively, we have
Z+ = Z0+ ∪ Z1+ ,
and from y ∈ Z1+ , every state may be reached, whilst for y ∈ Z0+ , only states in Z0+ may
be reached with positive probability.
Why are these questions of importance?
As we have already seen, the random walk on a half line above is one with many
applications: recall that the transition matrices of N = {N_n} and N* = {N_n*}, the
chains introduced in Section 2.4.2 to describe the number of customers in GI/M/1 and
M/G/1 queues, have exactly the structure described by (4.3).
The question of reaching {0} is then clearly one of considerable interest, since it rep-
resents exactly the question of whether the queue will empty with positive probability.
Equally, the fact that when {Wn } is concentrated on the even integers (representing
some degenerate form of batch arrival process) we will always have an even number of
customers has design implications for the number of servers (do we always want to have
two?), waiting rooms and the like.
But our efforts should and will go into finding conditions to preclude such oddities,
and we turn to these in the next section, where we develop the concepts of communi-
cation and irreducibility in the countable space context.
Proposition 4.1.1. The relation “↔” is an equivalence relation, and so the equivalence
classes C(x) = {y : x ↔ y} cover X, with x ∈ C(x).
When states do not all communicate, then although each state in C(x) communicates
with every other state in C(x), it is possible that there are states y ∈ [C(x)]c such that
x → y. This happens, of course, if and only if C(x) is not absorbing.
Suppose that X is not irreducible for Φ. If we reorder the states according to the
equivalence classes defined by the communication operation, and if we further order the
classes with absorbing classes coming first, then we have a decomposition of P such as
that depicted in Figure 4.1.
Here, for example, the blocks C(1), C(2) and C(3) correspond to absorbing classes,
and block D contains those states which are not contained in an absorbing class. In the
extreme case, a state in D may communicate only with itself, although it must lead to
some other state from which it does not return. We can write this decomposition as
X = ⋃_{x∈I} C(x) ∪ D   (4.5)
P =
    ⎛ C(1)    0      0     0 ⎞
    ⎜   0    C(2)    0     0 ⎟
    ⎜   0     0     C(3)   0 ⎟
    ⎝   ∗     ∗      ∗     D ⎠

Figure 4.1: The block decomposition of P, with the absorbing classes C(1), C(2), C(3) on the diagonal and the non-absorbing states collected in the final block D.
Proof We merely need to note that the elements of PC are positive, and
Σ_{y∈C} P(x, y) ≡ 1,   x ∈ C,
because C is absorbing: the existence of ΦC then follows from Theorem 3.2.1, and
irreducibility of ΦC is an obvious consequence of the communicating class structure of
C.
Thus for non-irreducible chains, we can analyze at least the absorbing subsets in the
decomposition (4.5) as separate chains.
The virtue of the block decomposition described above lies largely in this assur-
ance that any chain on a countable space can be studied assuming irreducibility. The
“irreducible absorbing” pieces C(x) can then be put together to deduce most of the
properties of a reducible chain.
Only the behavior of the remaining states in D must be studied separately, and
in analyzing stability D may often be ignored. For let J denote the indices of the
states for which the communicating classes are not absorbing. If the chain starts in
D = ⋃_{y∈J} C(y), then one of two things happens: either it reaches one of the absorbing
sets C(x), x ∈ I, in which case it gets absorbed; or, as the only other alternative,
the chain leaves every finite subset of D and “heads to infinity”.
To see why this might hold, observe that, for any fixed y ∈ J, there is some state
z ∈ C(y) with P (z, [C(y)]c ) = δ > 0 (since C(y) is not an absorbing class), and
P m (y, z) = β > 0 for some m > 0 (since C(y) is a communicating class). Suppose that
in fact the chain returns a number of times to y: then, on each of these returns, one
has a probability greater than βδ of leaving C(y) exactly m + 1 steps later, and this
probability is independent of the past due to the Markov property.
Now, as is well known, if one tosses a coin with probability of a head given by βδ
infinitely often, then one eventually actually gets a head: similarly, one eventually leaves
the class C(y), and because of the nature of the relation x ↔ y, one never returns.
Repeating this argument for any finite set of states in D indicates that the chain
leaves such a finite set with probability one.
There are a number of things that need to be made more rigorous in order for this
argument to be valid: the forgetfulness of the chain at the random time of returning
to y, giving the independence of the trials, is a form of the strong Markov property in
Proposition 3.4.6, and the so-called “geometric trials argument” must be formalized, as
we will do in Proposition 8.3.1 (iii).
Basically, however, this heuristic sketch is sound, and shows the directions in which
we need to go: we find absorbing irreducible sets, and then restrict our attention to
them, with the knowledge that the remainder of the states lead to clearly understood
and (at least from a stability perspective) somewhat irrelevant behavior.
Queueing models
Consider the number of customers N in the GI/M/1 queue. As shown in Proposi-
tion 3.3.1, we have P (x, x + 1) = p0 > 0, and so the structure of N ensures that by
iteration, for any x > 0
P^x(0, x) ≥ P(0, 1) P(1, 2) · · · P(x − 1, x) = [p_0]^x > 0.
But we also have P (x, 0) > 0 for any x ≥ 0: hence we conclude that for any pair
x, y ∈ X, we have
P^{y+1}(x, y) ≥ P(x, 0) P^y(0, y) > 0.
Thus the chain N is irreducible no matter what the distribution of the inter-arrival
times.
A similar approach shows that the embedded chain N∗ of the M/G/1 queue is always
irreducible.
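The same reachability argument can be verified mechanically on a finite truncation of such a transition matrix; in the sketch below the entries standing in for p_0 and P(x, 0) are illustrative values, not derived from any particular inter-arrival distribution.

```python
# Reachability check on a finite truncation of a GI/M/1-type matrix:
# every state reaches every other because P(x, x+1) = p0 > 0 and
# P(x, 0) > 0.  The numerical entries are illustrative assumptions.
import numpy as np

n = 6
P = np.zeros((n, n))
for x in range(n - 1):
    P[x, x + 1] = 0.4                     # p0 > 0: one more customer
    P[x, 0] = 0.6                         # positive chance of emptying
P[n - 1, 0], P[n - 1, n - 1] = 0.6, 0.4   # crude boundary at truncation

def irreducible(P):
    """Check x -> y for all pairs by accumulating boolean powers of P."""
    reach = (P > 0).astype(int)
    acc = reach.copy()
    for _ in range(P.shape[0]):
        acc = ((acc + acc @ reach) > 0).astype(int)
    return bool(acc.all())

print("irreducible on the truncated space:", irreducible(P))
```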
4.2 ψ-Irreducibility
4.2.1 The concept of ϕ-irreducibility
We now wish to develop similar concepts of irreducibility on a general space X. The
obvious problem with extending the ideas of Section 4.1.2 is that we cannot define an
analogue of “↔”, since, although we can look at L(x, A) to decide whether a set A
is reached from a point x with positive probability, we cannot say in general that we
return to single states x.
This is particularly the case for models such as the linear models for which the
n-step transition laws typically have densities; and even for some of the models such
as storage models where there is a distinguished reachable point, there are usually no
other states to which the chain returns with positive probability.
This means that we cannot develop a decomposition such as (4.5) based on a count-
able equivalence class structure: and indeed the question of existence of a so-called
“Doeblin decomposition”
X = ⋃_{x∈I} C(x) ∪ D,   (4.8)
with the sets C(x) being a countable collection of absorbing sets in B(X) and the
“remainder” D being a set which is in some sense ephemeral, is a non-trivial one.
We shall not discuss such reducible decompositions in this book although, remarkably,
under a variety of reasonable conditions such a countable decomposition does hold for
chains on quite general state spaces.
Rather than developing this type of decomposition structure, it is much more fruitful
to concentrate on irreducibility analogues. The one which forms the basis for much
modern general state space analysis is ϕ-irreducibility.
this is a special case of the resolvent of Φ introduced in Section 3.4.2, which we
consider in Section 5.5.1 in more detail. The kernel K_{a_{1/2}} defines for each x a probability
measure equivalent to I(x, A) + U(x, A) = Σ_{n=0}^∞ P^n(x, A), which may be infinite for
many sets A.
(ii) for all x ∈ X, whenever ϕ(A) > 0, there exists some n > 0, possibly depending on
both A and x, such that P^n(x, A) > 0;
Proof   The only point that needs to be proved is that if L(x, A) > 0 for all x ∈ A^c
then, since L(x, A) = P(x, A) + ∫_{A^c} P(x, dy) L(y, A), we have L(x, A) > 0 for all x ∈ X:
thus the inclusion of the zero-time term in K_{a_{1/2}} does not affect the irreducibility.
This is clearly rather weaker than normal irreducibility on countable spaces, which
demands two-way communication. Thus we now look to measures which are extensions,
not restrictions, of irreducibility measures, and show that the ϕ-irreducibility condition
extends in such a way that, if we do have an irreducible chain in the sense of Section 4.1,
then the natural irreducibility measure (namely counting measure) is generated as a
“maximal” irreducibility measure.
The maximal irreducibility measure will be seen to define the range of the chain much
more completely than some of the other more arbitrary (or pragmatic) irreducibility
measures one may construct initially.
(i) Φ is ψ-irreducible;
(ii) for any other measure ϕ′, the chain Φ is ϕ′-irreducible if and only if ψ ≻ ϕ′;
Proof Since any probability measure which is equivalent to the irreducibility mea-
sure ϕ is also an irreducibility measure, we can assume without loss of generality that
ϕ(X) = 1. Consider the measure ψ constructed as
ψ(A) := ∫_X ϕ(dy) K_{a_{1/2}}(y, A).   (4.10)
It is obvious that ψ is also a probability measure on B(X). To prove that ψ has all the
required properties, we use the sets
" #
k
n −1
Ā(k) = y : P (y, A) > k .
n =1
The stated properties now involve repeated use of the Chapman–Kolmogorov equations.
To see (i), observe that when ψ(A) > 0, then from (4.10) there exists some k such
that ϕ(Ā(k)) > 0, since Ā(k) ↑ {y : Σ_{n≥1} P^n(y, A) > 0} = X. For any fixed x, by
ϕ-irreducibility there is thus some m such that P^m(x, Ā(k)) > 0. Then we have
Σ_{n=1}^{k} P^{m+n}(x, A) = ∫_X P^m(x, dy) Σ_{n=1}^{k} P^n(y, A) ≥ k^{−1} P^m(x, Ā(k)) > 0,
so that Φ is ψ-irreducible.
Next let ϕ′ be such that Φ is ϕ′-irreducible. If ϕ′(A) > 0, we have Σ_n P^n(y, A) > 0
for all y, and by its definition ψ(A) > 0, whence ψ ≻ ϕ′. Conversely, suppose that
the chain is ψ-irreducible and that ψ ≻ ϕ′. If ϕ′(A) > 0 then ψ(A) > 0 also, and by
ψ-irreducibility it follows that K_{a_{1/2}}(x, A) > 0 for any x ∈ X. Hence Φ is ϕ′-irreducible,
as required in (ii).
Result (iv) follows from the construction (4.10) and the fact that any two maximal
irreducibility measures are equivalent, which is a consequence of (ii).
Finally, we have for each m that
∫_X ψ(dy) P^m(y, A) 2^{−m} = ∫_X ϕ(dy) Σ_n P^{m+n}(y, A) 2^{−(n+m+1)} ≤ ψ(A),
so that if ψ(A) = 0, then ∫_X ψ(dy) P^m(y, A) = 0 for every m, and hence ψ(Ā) = 0.
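On a finite state space the construction (4.10) can be carried out in closed form, since K_{a_{1/2}} = Σ_n 2^{−(n+1)} P^n = ½(I − P/2)^{−1}. The sketch below starts from the "pragmatic" irreducibility measure ϕ = δ_0 and produces a maximal measure ψ charging every reachable state; the three-state kernel is an illustrative assumption.

```python
# Finite-state sketch of (4.10): psi = phi K_{a_{1/2}} is maximal.
# Here K_{a_{1/2}} = sum_n 2^{-(n+1)} P^n = (1/2)(I - P/2)^{-1},
# computable in closed form; the 3-state kernel P is illustrative.
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

K = 0.5 * np.linalg.inv(np.eye(3) - 0.5 * P)   # resolvent K_{a_{1/2}}

phi = np.array([1.0, 0.0, 0.0])    # a "pragmatic" irreducibility measure:
                                   # point mass at state 0
psi = phi @ K                      # the maximal measure of (4.10)
print("psi =", psi)                # charges every state the chain reaches
assert (psi > 0).all()             # here B+(X) is all nonempty subsets
```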
Although there are other approaches to irreducibility, we will generally restrict our-
selves, in the general space case, to the concept of ϕ-irreducibility; or rather, we will
seek conditions under which it holds. We will consistently use ψ to denote an arbitrary
maximal irreducibility measure for Φ.
ψ-Irreducibility notation
(ii) We write
B⁺(X) := {A ∈ B(X) : ψ(A) > 0}
for the sets of positive ψ-measure; the equivalence of maximal irreducibility measures means that B⁺(X) is uniquely defined.
The following result indicates the links between absorbing and full sets. This result
seems somewhat academic, but we will see that it is often the key to showing that very
many properties hold for ψ-almost all states.
Proof If A is absorbing, then were ψ(Ac ) > 0, it would contradict the definition
of ψ as an irreducibility measure: hence A is full.
Suppose now that A is full, and set
B = { y ∈ X : Σ_{n=0}^∞ P^n(y, A^c) = 0 },
which is positive: but this is impossible, and thus B is the required absorbing set.
Proposition 4.2.4. Suppose that A is an absorbing set. Let PA denote the kernel P
restricted to the states in A. Then there exists a Markov chain Φ_A whose state space
is A and whose transition kernel is given by P_A. Moreover, if Φ is ψ-irreducible then
ΦA is ψ-irreducible.
The effect of these two propositions is to guarantee the effective analysis of restric-
tions of chains to full sets, and we shall see that this is indeed a fruitful avenue of
approach.
Accessibility
We say that a set B ∈ B(X) is accessible from another set A ∈ B(X) if
L(x, B) > 0 for every x ∈ A.
We say that a set B ∈ B(X) is uniformly accessible from another set
A ∈ B(X) if there exists a δ > 0 such that
inf_{x∈A} U_B(x, B) > δ,
where U_A( · , · ) is the kernel introduced in (3.34); when this holds we write A ⇝ B.
Lemma 4.2.5. If A ⇝ B and B ⇝ C, then A ⇝ C.
Proof   Since the probability of ever reaching C is greater than the probability of
ever reaching C after the first visit to B, we have
inf_{x∈A} U_C(x, C) ≥ inf_{x∈A} ∫_B U_B(x, dy) U_C(y, C) ≥ inf_{x∈A} U_B(x, B) · inf_{y∈B} U_C(y, C) > 0
as required.
We shall use the following notation to describe the communication structure of the
chain.
Communicating sets
The set Ā := {x ∈ X : L(x, A) > 0} is the set of points from which A is
accessible.
The set Ā(m) := { x ∈ X : Σ_{n=1}^{m} P^n(x, A) ≥ m^{−1} }.
The set A0 := {x ∈ X : L(x, A) = 0} = [Ā]c is the set of points from which
A is not accessible.
Lemma 4.2.6. The set Ā = ⋃_m Ā(m), and for each m we have Ā(m) ⇝ A.
Proof The first statement is obvious, whilst the second follows by noting that for
all x ∈ Ā(m) we have
L(x, A) ≥ P_x(τ_A ≤ m) ≥ m^{−2}.
It follows that if the chain is ψ-irreducible, then we can find a countable cover of
X with sets from which any other given set A in B⁺(X) is uniformly accessible, since
Ā = X in this case.
Proof   The necessity of (4.12) is trivial. Conversely, suppose for some δ, ε > 0,
Γ(−∞, −ε) > δ. Then for any n with n > x/ε,
P^n(x, {0}) ≥ δ^n > 0.
If C = [0, c] for some c, then this implies for all x ∈ C that
P_x(τ_0 ≤ c/ε) ≥ δ^{1+c/ε},
so that C ⇝ {0} as in Lemma 4.2.6.
It is often as simple as this to establish ϕ-irreducibility: it is not a difficult condition
to confirm, or rather, it is often easy to set up “grossly sufficient” conditions such as
(4.12) for ϕ-irreducibility.
Such a construction guarantees ϕ-irreducibility, but it does not tell us very much
about the motion of the chain. There are clearly many sets other than {0} which the
chain will reach from any starting point. To describe them in this model we can easily
construct the maximal irreducibility measure. By considering the motion of the chain
after it reaches {0} we see that Φ is also ψ-irreducible, where
ψ(A) = Σ_n P^n(0, A) 2^{−n};
Provided there is some probability that no input takes place over a period long enough
to ensure that the effect of the increment Sn is eroded, we will achieve δ0 -irreducibility
in one step. This amounts to saying that we can “turn off” the input for a period longer
than s whenever the last input amount was s, or that we need a positive probability of
the input remaining turned off for longer than s/r. One sufficient condition for this is
obviously that the distribution H have infinite tails.
Such a construction may fail without the type of conditions imposed here. If, for
example, the input times are deterministic, occurring at every integer time point, and
if the input amounts are always greater than unity, then we will not have an irreducible
system: in fact we will have, in the terms of Chapter 9 below, an evanescent system
which always avoids compact sets below the initial state.
An underlying structure as pathological as this seems intuitively implausible, of
course, and is in any case easily analyzed. But in the case of content-dependent release
rules, it is not so obvious that the chain is always ϕ-irreducible. If we assume
R(x) = ∫_0^x [r(y)]^{−1} dy < ∞
as in (2.32), then again if we can "turn off" the input process for longer than R(x) we
will hit {0}; so if we have
G(R(x), ∞) > 0
for all x, then we have a δ_0-irreducible model. But if we allow R(x) = ∞, as we may wish
to do for some release rules where r(x) → 0 slowly as x → 0, which is not unrealistic,
then even if the inter-input times Ti have infinite tails, this simple construction will fail.
The empty state will never be reached, and some other approach is needed if we are to
establish ϕ-irreducibility.
In such a situation, we will still get µ_Leb-irreducibility, where µ_Leb is Lebesgue measure, if the inter-input times T_i have a density with respect to µ_Leb: this can be determined by modifying the "turning off" construction above. Exact conditions for
ϕ-irreducibility in the completely general case appear to be unknown to date.
Φ_{k+1} = Φ_k + W_{k+1},
and satisfying the assumption (RW1), let us suppose the increment distribution Γ of
{W_n} has an absolutely continuous part with respect to Lebesgue measure µ_Leb on R,
with a density γ which is positive and bounded from zero at the origin; that is, for some
β > 0, δ > 0,
P(W_n ∈ A) ≥ ∫_A γ(x) dx,
and
γ(x) ≥ δ > 0,   |x| < β.
Set C = {x : |x| ≤ β/2}: if B ⊆ C and x ∈ C, then
P(x, B) = P(W_1 ∈ B − x) ≥ ∫_{B−x} γ(y) dy ≥ δ µ_Leb(B).
But now, exactly as in the previous example, from any x we can reach C in at most n =
2|x|/β steps with positive probability, so that µ_Leb restricted to C forms an irreducibility
measure for the unrestricted random walk.
Such behavior might not hold without a density. Suppose we take Γ concentrated
on the rationals Q, with Γ(r) > 0, r ∈ Q. After starting at a value r ∈ Q the chain Φ
“lives” on the set {r + q, q ∈ Q} = Q so that Q is absorbing. But for any x ∈ R the
set {x + q, q ∈ Q} = x + Q is also absorbing, and thus we can produce, for this random
walk on R, an uncountably infinite number of absorbing irreducible sets.
It is precisely this type of behavior we seek to exclude for chains on a general space,
by introducing the concepts of ψ-irreducibility above.
Y_n = α_1 Y_{n−1} + α_2 Y_{n−2} + · · · + α_k Y_{n−k} + W_n,
This same argument applies to the general model (2.1) if the zeros of the polynomial
A(z) = 1 − α_1 z − · · · − α_k z^k lie outside the closed unit disk in the complex plane C.
In this case Yn → 0 as n → ∞ when Wn is set equal to zero, and from this observation
it follows that it is possible for the chain to reach [−1, 1] at some time in the future
from every initial condition. If some root of A(z) lies within the open unit disk in C
then again “explosion” will occur and the chain will not be irreducible.
Our argument here is rather like that in the dam model, where we considered de-
terministic behavior with the input "turned off". We need to be able to drive the chain
deterministically towards a center of the space, and then to ensure that the random
mechanism makes the behavior of the chain from initial conditions in that center
comparable.
We formalize this for multidimensional linear models in the rest of this section.
C_n := [F^{n−1}G | · · · | FG | G]   (4.13)
is called the controllability matrix, and the pair of matrices (F, G) is called
controllable if the controllability matrix Cn has rank n.
It is a consequence of the Cayley–Hamilton Theorem, which states that any power F^k
is equal to a linear combination of {I, F, . . . , F^{n−1}}, where n is equal to the dimension
of F (see [57] for details), that (F, G) is controllable if and only if
[F^{k−1}G | · · · | FG | G]
has rank n for some k, in which case one may take k = n.
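The rank computation is immediate numerically; a brief sketch, for an assumed pair (F, G), building C_n and testing the rank condition:

```python
# Build the controllability matrix C_n = [F^{n-1}G | ... | FG | G] and
# test its rank.  The pair (F, G) is an illustrative assumption.
import numpy as np

def controllability_matrix(F, G):
    n = F.shape[0]
    blocks = []
    Fi_G = G
    for _ in range(n):               # G, FG, ..., F^{n-1}G
        blocks.append(Fi_G)
        Fi_G = F @ Fi_G
    return np.hstack(blocks[::-1])   # ordered [F^{n-1}G | ... | FG | G]

F = np.array([[0.9, 1.0],
              [0.0, 0.9]])
G = np.array([[0.0],
              [1.0]])

Cn = controllability_matrix(F, G)
print("rank C_n =", np.linalg.matrix_rank(Cn), "of n =", F.shape[0])
# By Cayley-Hamilton, blocks F^k G with k >= n cannot raise the rank.
```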
Proposition 4.4.1. The linear control model LCM(F ,G) is controllable if the pair
(F, G) satisfies the rank condition (LCM3).
Proof When this rank condition holds it is straightforward that in the LCM(F ,G)
model any state can be reached from any initial condition in k steps using some control
sequence (u_1, . . . , u_k); for we have, by iteration,
x_k = F^k x_0 + [F^{k−1}G | · · · | FG | G] (u_1, . . . , u_k)^⊤,   (4.14)
and the rank condition implies that the range space of the matrix [F^{k−1}G | · · · | FG | G]
is equal to R^n.
This gives us as an immediate application
Proposition 4.4.2. The autoregressive AR(k) model may be described by a linear con-
trol model (LCM1), which can always be constructed so that it is controllable.
Proof For the linear control model associated with the autoregressive model de-
scribed by (2.1), the state process x is defined inductively by
x_n =
    ⎛ α_1  α_2  · · ·  α_k ⎞           ⎛ 1 ⎞
    ⎜  1    0   · · ·   0  ⎟           ⎜ 0 ⎟
    ⎜       ⋱              ⎟ x_{n−1} + ⎜ ⋮ ⎟ u_n ,
    ⎝  0   · · ·   1    0  ⎠           ⎝ 0 ⎠
with
η_j = Σ_{i=1}^{k} α_i η_{j−i}.
The triangular structure of the controllability matrix now implies that the linear control
system associated with the AR(k) model is controllable.
If the dimension p of the noise were the same as the dimension n of the space, and if
the matrix G were full rank, then the argument for scalar models in Section 4.4 would
immediately imply that the chain is µ_Leb-irreducible. In more general situations we use
controllability to ensure that the chain is µ_Leb-irreducible.
Proposition 4.4.3. Suppose that the LSS(F ,G) model is Gaussian and the associated
control model is controllable.
Then the LSS(F ,G) model is ϕ-irreducible for any non-trivial measure ϕ which
possesses a density on Rn , Lebesgue measure is a maximal irreducibility measure, and
for any compact set A and any set B with positive Lebesgue measure we have A ⇝ B.
Proof   If we can prove that the distribution P^k(x, · ) is absolutely continuous with
respect to Lebesgue measure, and has a density which is everywhere positive on R^n, it
will follow that for any ϕ which is non-trivial and also possesses a density, P^k(x, · ) ≻ ϕ
for all x ∈ R^n: for any such ϕ the chain is then ϕ-irreducible. This argument also shows
that Lebesgue measure is a maximal irreducibility measure for the chain.
Under condition (LSS3), for each deterministic initial condition x_0 ∈ X = R^n, the
distribution of X_k is also Gaussian for each k ∈ Z+ by linearity, and so we need only
to prove that P^k(x, · ) is not concentrated on some lower-dimensional subspace of R^n.
This will happen if and only if the variance of the distribution P^k(x, · ) is of full rank
for each x.
We can compute the mean and variance of X_k to obtain conditions under which this
occurs. Using (4.14) and (LSS3), for each initial condition x_0 ∈ X the conditional mean
of X_k is easily computed as
E[X_k | X_0 = x_0] = F^k x_0,
and the conditional variance as
Σ_k := Σ_{i=0}^{k−1} F^i G G′ (F′)^i,   (4.16)
which does not depend on x_0.
Using (4.16), the variance of X_k has full rank n for some k if and only if the controllability
grammian, defined as
Σ_{i=0}^{∞} F^i G G′ (F′)^i,   (4.17)
has rank n. From the Cayley–Hamilton Theorem again, the conditional variance of X_k
has rank n for some k if and only if the pair (F, G) is controllable and, if this is the
case, then one can take k = n.
Under (LSS1)–(LSS3), it thus follows that the k-step transition function possesses
a smooth density; we have P^k(x, dy) = p_k(x, y) dy, where
p_k(x, y) = (2π)^{−n/2} |Σ_k|^{−1/2} exp{ −(1/2) (y − F^k x)′ Σ_k^{−1} (y − F^k x) }   (4.18)
and |Σ_k| denotes the determinant of the matrix Σ_k. Hence P^k(x, · ) has a density which
is everywhere positive, as required, and this implies finally that for any compact set A
and any set B with positive Lebesgue measure we have A ⇝ B.
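The proof reduces to a finite matrix computation which is easily carried out numerically. The sketch below accumulates Σ_k = Σ_{i<k} F^i G G′ (F′)^i for the illustrative pair (F, G) used above and confirms that full rank is attained at k = n exactly as the controllability argument predicts.

```python
# Compute Sigma_k = sum_{i<k} F^i G G' (F')^i for the Gaussian LSS(F,G)
# model, where P^k(x, .) = N(F^k x, Sigma_k); controllability makes
# Sigma_n full rank, so the density (4.18) is everywhere positive.
# F and G are the illustrative pair used earlier.
import numpy as np

F = np.array([[0.9, 1.0],
              [0.0, 0.9]])
G = np.array([[0.0],
              [1.0]])
n = F.shape[0]

def sigma_k(F, G, k):
    S = np.zeros((n, n))
    Fi = np.eye(n)
    for _ in range(k):               # accumulate F^i G G' (F')^i
        S += Fi @ G @ G.T @ Fi.T
        Fi = F @ Fi
    return S

for k in (1, 2):
    print(f"rank Sigma_{k} =", np.linalg.matrix_rank(sigma_k(F, G, k)))
# rank Sigma_1 = 1 < n, while rank Sigma_2 = 2 = n: one may take k = n.
```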
Assuming, as we do in the result above, that W has a density which is everywhere
positive is clearly something of a sledgehammer approach to obtaining ψ-irreducibility,
even though it may be widely satisfied. We will introduce more delicate methods in
Chapter 7 which will allow us to relax the conditions of Proposition 4.4.3.
Even if (F, G) is not controllable, we can obtain an irreducible process, by
appropriate restriction of the space on which the chain evolves, under the Gaussian
assumption. To define this formally, we let X0 ⊂ X denote the range space of the
controllability matrix:
X_0 = R( [F^{n−1}G | · · · | FG | G] ) = { Σ_{i=0}^{n−1} F^i G w_i : w_i ∈ R^p }.
4.5 Commentary
The communicating class concept was introduced in the initial development of countable
chains by Kolmogorov [216] and used systematically by Feller [114] and Chung [71] in
developing solidarity properties of states in such a class.
The use of ψ-irreducibility as a basic tool for general chains was essentially developed
by Doeblin [93, 95], and followed up by many authors, including Doob [99], Harris [155],
Chung [70], Orey [308]. Much of their analysis is considered in greater detail in later
chapters. The maximal irreducibility measure was introduced by Tweedie [394], and the
result on full sets is given in the form we use by Nummelin [303]. Although relatively
simple, these results have wide-ranging implications.
Other notions of irreducibility exist for general state space Markov chains. One can,
for example, require that the transition probabilities
K_{a_{1/2}}(x, ·) = Σ_{n=0}^{∞} P^n(x, ·) 2^{−(n+1)}
all have the same null sets. In this case the maximal measure ψ will be equivalent to
K_{a_{1/2}}(x, ·) for every x. This was used by Nelson [291] and Šidák [353] to derive solidarity
properties for general state space chains similar to those we will consider in Part II. This
condition, though, is hard to check, since one needs to know the structure of P^n(x, ·)
in some detail; and it appears too restrictive for the minor gains it leads to.
In the other direction, one might weaken ϕ-irreducibility by requiring only that,
whenever ϕ(A) > 0, we have Σ_n P^n(x, A) > 0 for ϕ-almost all x ∈ X. Whilst
this expands the class of “irreducible” models, it does not appear to be noticeably more
useful in practice, and has the drawback that many results are much harder to prove
as one tracks the uncountably many null sets which may appear. Revuz [326], Chapter
3, has a discussion of some of the results of using this weakened form.
The existence of a block decomposition of the form
X = ⋃_{x∈I} C(x) ∪ D
such as that for countable chains, where the sum is of disjoint irreducible sets and D is
in some sense ephemeral, has been widely studied. A recent overview is in Meyn and
Tweedie [281], and the original ideas go back, as so often, to Doeblin [95], after whom
such decompositions are named. Orey [309], Chapter 9, gives a very accessible account
of the measure-theoretic approach to the Doeblin decomposition.
Application of results for ψ-irreducible chains has become more widespread recently,
but the actual usage has suffered a little because of the somewhat inadequate available
discussion in the literature of practical methods of verifying ψ-irreducibility. Typically
the assumptions are far too restrictive, as is the case in assuming that innovation pro-
cesses have everywhere positive densities or that accessible regenerative atoms exist (see
for example Laslett et al. [237] for simple operations research models, or Tong [388] in
time series analysis).
The detailed analysis of the linear model begun here illustrates one of the recur-
ring themes of this book: the derivation of stability properties for stochastic models
by consideration of the properties of analogous controlled deterministic systems. The
methods described here have surprisingly complete generalizations to nonlinear mod-
els. We will come back to this in Chapter 7 when we characterize irreducibility for the
NSS(F ) model using ideas from nonlinear control theory.
Pseudo-atoms
Much Markov chain theory on a general state space can be developed in complete
analogy with the countable state situation when X contains an atom for the chain Φ.
Atoms
A set α ∈ B(X) is called an atom for Φ if there exists a measure ν on B(X)
such that
P (x, A) = ν(A), x ∈ α.
A single point α is always an atom. Clearly, when X is countable and the chain is
irreducible then every point is an accessible atom.
On a general state space, accessible atoms are less frequent. For the random walk
on a half line as in (RWHL1), the set {0} is an accessible atom when Γ(−∞, 0) > 0:
as we have seen in Proposition 4.3.1, this chain has ψ({0}) > 0. But for the random
walk on R when Γ has a density, accessible atoms do not exist.
It is not too strong to say that the single result which makes general state space
Markov chain theory as powerful as countable space theory is that there exists an
“artificial atom” for ϕ-irreducible chains, even in cases such as the random walk with
absolutely continuous increments. The highlight of this chapter is the development of
this result, and some of its immediate consequences.
Atoms are found for “strongly aperiodic” chains by constructing a “split chain” Φ̌
evolving on a split state space X̌ = X0 ∪ X1 , where X0 and X1 are copies of the state
space X, in such a way that
(i) the chain Φ is the marginal chain of Φ̌, in the sense that P(Φk ∈ A) = P(Φ̌k ∈
A_0 ∪ A_1) for appropriate initial distributions, and
(ii) the bottom level X_1 contains an accessible atom for Φ̌.
The existence of a splitting of the state space in such a way that the bottom level is an
atom is proved in the next section. The proof requires the existence of so-called "small
sets" C, which have the property that there exists an m > 0 and a minorizing measure
ν on B(X) such that for any x ∈ C,
P^m(x, B) ≥ ν(B),   B ∈ B(X).
For ψ-irreducible chains there is moreover a countable cover
X = ⋃_i C_i,
where each C_i is small: thus we have that the splitting is always possible for such chains.
Another non-trivial consequence of the introduction of small sets is that on a general
space we have a finite cyclic decomposition for ψ-irreducible chains: there is a cycle of
sets Di , i = 0, 1, . . . , d − 1 such that
X = N ∪ ⋃_{i=0}^{d−1} D_i,
where ψ(N) = 0 and P(x, D_i) ≡ 1 for x ∈ D_{i−1} (mod d). A more general and more
tractable class of sets called petite sets are introduced in Section 5.5: these are used
extensively in the sequel, and in Theorem 5.5.7 we show that every petite set is small
if the chain is aperiodic.
Proposition 5.1.2. If L(x, A) > 0 for some state x ∈ α, where α is an atom, then
α ⇝ A.
In many cases the “atoms” in a state space will be real atoms: that is, single points
which are reached with positive probability.
Consider the level in a dam in any of the storage models analyzed in Section 4.3.2.
It follows from Proposition 4.3.1 that the single point {0} forms an accessible atom
satisfying the hypotheses of Proposition 5.1.1, even when the input and output processes
are continuous.
However, our reason for featuring atoms is not because some models have singletons
which can be reached with probability one: it is because even in the completely general
ψ-irreducible case, by suitably extending the probabilistic structure of the chain, we are
able to artificially construct sets which have an atomic structure and this allows much
of the critical analysis to follow the form of the countable chain theory.
This unexpected result is perhaps the major innovation in the analysis of general
Markov chains in the last two decades. It was discovered in slightly different forms,
independently and virtually simultaneously, by Nummelin [301] and by Athreya and
Ney [13].
Although the two methods are almost identical in a formal sense, in what follows we
will concentrate on the Nummelin splitting, touching only briefly on the Athreya–Ney
random renewal time method as it fits less well into the techniques of the rest of this
book.
Minorization condition
For some δ > 0, some C ∈ B(X) and some probability measure ν with
ν(C^c) = 0 and ν(C) = 1,
P(x, A) ≥ δ ν(A),   x ∈ C, A ∈ B(X).   (5.2)
The form (5.2) ensures that the chain has probabilities uniformly bounded below
by multiples of ν for every x ∈ C. The crucial question is, of course, whether any
chains ever satisfy the minorization condition. This is answered in the positive in
Theorem 5.2.2 below: for ϕ-irreducible chains “small sets” for which the minorization
condition holds exist, at least for the m-skeleton. The existence of such small sets is
a deep and difficult result: by indicating first how the minorization condition provides
the promised atomic structure to a split chain, we motivate rather more strongly the
development of Theorem 5.2.2.
In order to construct a split chain, we split both the space and all measures that
are defined on B(X).
We first split the space X itself by writing X̌ = X × {0, 1}, where X0 := X × {0} and
X1 := X × {1} are thought of as copies of X equipped with copies B(X0 ), B(X1 ) of the
σ-field B(X).
We let B(X̌) be the σ-field of subsets of X̌ generated by B(X0 ), B(X1 ): that is, B(X̌)
is the smallest σ-field containing sets of the form A0 :=A×{0}, A1 :=A×{1}, A ∈ B(X).
We will write xi , i = 0, 1 for elements of X̌, with x0 denoting members of the upper
level X0 and x1 denoting members of the lower level X1 . In order to describe more
easily the calculations associated with moving between the original and the split chain,
we will also sometimes call X0 the copy of X, and we will say that A ∈ B(X) is a copy
of the corresponding set A0 ⊆ X0 .
If λ is any measure on B(X), then the next step in the construction is to split the
measure λ into two measures on each of X0 and X1 by defining the measure λ∗ on B(X̌)
through
λ*(A_0) = λ(A ∩ C)[1 − δ] + λ(A ∩ C^c),
λ*(A_1) = λ(A ∩ C) δ,   (5.3)
where δ and C are the constant and the set in (5.2). Note that in this sense the splitting
is dependent on the choice of the set C, and although in general the set chosen is not
relevant, we will on occasion need to make explicit the set in (5.2) when we use the split
chain.
It is critical to note that λ is the marginal measure induced by λ∗ , in the sense that
for any A in B(X) we have
λ∗ (A0 ∪ A1 ) = λ(A). (5.4)
In the case when A ⊆ C c , we have λ∗ (A0 ) = λ(A); only subsets of C are really effectively
split by this construction.
Now the third, and most subtle, step in the construction is to split the chain Φ to
form a chain Φ̌ which lives on (X̌, B(X̌)). Define the split kernel P̌(x_i, A) for x_i ∈ X̌ and
A ∈ B(X̌) by
P̌(x_0, · ) = P(x, · )*,   x_0 ∈ X_0 \ C_0;   (5.5)
P̌(x_0, · ) = [1 − δ]^{−1} [P(x, · )* − δ ν*( · )],   x_0 ∈ C_0;   (5.6)
P̌(x_1, · ) = ν*( · ),   x_1 ∈ X_1,   (5.7)
where C, δ and ν are the set, the constant and the measure in the minorization condition.
Outside C the chain {Φ̌n } behaves just like {Φn }, moving on the “top” half X0 of
the split space. Each time it arrives in C, it is “split”; with probability 1 − δ it remains
in C0 , with probability δ it drops to C1 . We can think of this splitting of the chain as
tossing a δ-weighted coin to decide which level to choose on each arrival in the set C
where the split takes place.
When the chain remains on the top level its next step has the modified law (5.6).
That (5.6) is always non-negative follows from (5.2). This is the sole use of the mi-
norization condition, although without it this chain cannot be defined.
Note here the whole point of the construction: the bottom level X1 is an atom,
with ϕ*(X_1) = δ ϕ(C) > 0 whenever the chain Φ is ϕ-irreducible. By (5.3) we have
P̌^n(x_i, X_1 \ C_1) = 0 for all n ≥ 1 and all x_i ∈ X̌, so that the atom C_1 ⊆ X_1 is the only
part of the bottom level which is reached with positive probability. We will use the
notation
α̌ := C1 (5.8)
when we wish to emphasize the fact that all transitions out of C1 are identical, so that
C1 is an atom in X̌.
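The mechanics of the split chain are easy to simulate for a kernel which satisfies (5.2) by construction, namely one of the mixture form P(x, ·) = δν(·) + (1 − δ)Q(x, ·) on C, so that the residual law (5.6) is exactly Q. All concrete laws in the sketch below are illustrative assumptions.

```python
# Minimal simulation sketch of the split chain for a kernel satisfying
# (5.2) by construction: P(x, .) = delta*nu(.) + (1 - delta)*Q(x, .)
# for x in C, so the residual law (5.6) is exactly Q.  The set C, the
# constant delta, and the laws nu and Q are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
delta, C = 0.2, (-1.0, 1.0)

def sample_nu():
    return rng.uniform(*C)               # nu(C) = 1: mass only on C

def sample_Q(x):
    return 0.5 * x + rng.normal()        # residual kernel Q(x, .)

def split_step(x):
    """One step of the split chain; marginally this is one step of P."""
    if C[0] <= x <= C[1] and rng.random() < delta:
        return sample_nu(), True         # dropped to the atom alpha-check
    return sample_Q(x), False            # top level, law (5.6)

x, regen = 5.0, 0
for _ in range(10_000):
    x, hit = split_step(x)
    regen += hit
print("visits to the atom in 10000 steps:", regen)
```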
Proof (i) From the linearity of the splitting operation we only need to check the
equivalence in the special case of λ = δx , and k = 1. This follows by direct computation.
We analyze two cases separately.
Suppose first that x ∈ C c . Then, by (5.5) and (5.4),
∫_X̌ δ_x*(dy_i) P̌(y_i, A_0 ∪ A_1) = P̌(x_0, A_0 ∪ A_1) = P(x, A).
On the other hand suppose x ∈ C. Then, from (5.6), (5.7) and (5.4) again,
∫_X̌ δ_x*(dy_i) P̌(y_i, A_0 ∪ A_1)
  = (1 − δ) P̌(x_0, A_0 ∪ A_1) + δ P̌(x_1, A_0 ∪ A_1)
  = (1 − δ) [1 − δ]^{−1} [P*(x, A_0 ∪ A_1) − δ ν*(A_0 ∪ A_1)] + δ ν*(A_0 ∪ A_1)
  = P(x, A).
(ii) If the split chain is ϕ∗ -irreducible it is straightforward that the original chain
is ϕ-irreducible from (i). The converse follows from the fact that α̌ is an accessible
atom if ϕ(C) > 0, which is easy to check, and Proposition 5.1.1.
The following identity will prove crucial in later development. For any measure µ
on B(X) we have
∫_X̌ µ*(dx_i) P̌(x_i, · ) = [ ∫_X µ(dx) P(x, · ) ]*   (5.10)
or, using operator notation, µ*P̌ = (µP)*. This follows from the definition of the * operation and the transition function P̌, and is in effect a restatement of Theorem 5.1.3 (i).
Since it is only the marginal chain Φ which is really of interest, we will usually
consider only sets of the form Ǎ = A_0 ∪ A_1, where A ∈ B(X), and we will largely restrict
ourselves to functions on X̌ of the form fˇ(x_i) = f(x), where f is some function on X;
that is, fˇ is identical on the two copies of X. By (5.9) we have, for any k, any initial
distribution λ, and any function fˇ identical on X_0 and X_1,
E_{λ*}[fˇ(Φ̌_k)] = E_λ[f(Φ_k)].
The details are, however, slightly less easy than for the approach we give above although
there are some other advantages to the approach through (5.11): the interested reader
should consult Nummelin [303] for more details.
The construction of a split chain is of some value in the next several chapters,
although much of the analysis will be done directly using the small sets of the next
section. The Nummelin splitting technique will, however, be central in our approach to
the asymptotic results of Part III.
To construct τ , let Φ run until it hits C; from (5.12) this happens eventually with
probability one. The time and place of first hitting C will be, say, k and x. Then with
probability δ, distribute Φ_{k+1} according to ν; and with probability 1 − δ, distribute
Φ_{k+1} according to Q(x, · ) := [1 − δ]^{−1} [P(x, · ) − δ ν( · )]:
from (5.2) Q is a probability measure, as in (5.6). Repeat this procedure each time
Φ enters C; since this happens infinitely often from (5.12) (a fact yet to be proven in
Chapter 9), and each time there is an independent probability δ of choosing ν, it is
intuitively clear that sooner or later this version of Φk is chosen. Let the time when it
occurs be τ . Then Px (τ < ∞) = 1 and (5.13) clearly holds; and (5.13) says that τ is a
regeneration time for the chain.
The two constructions are very close in spirit: if we consider the split chain con-
struction then we can take the random time τ as τα̌ , which is identical to the hitting
time on the bottom level of the split space.
There are advantages to both approaches, but the Nummelin splitting does not re-
quire the recurrence assumption (5.12), and more pertinently, it exploits the rather deep
fact that some m-skeleton always obeys the minorization condition when ψ-irreducibility
holds, as we now see.
Small sets
A set C ∈ B(X) is called a small set if there exists an m > 0, and a
non-trivial measure ν_m on B(X), such that for all x ∈ C, B ∈ B(X),
P^m(x, B) ≥ ν_m(B).   (5.14)
When (5.14) holds we say that C is ν_m-small.
The central result (Theorem 5.2.2 below), on which a great deal of the subsequent
development rests, is that for a ψ-irreducible chain, every set A ∈ B⁺(X) contains
a small set in B⁺(X). As a consequence, every ψ-irreducible chain admits some m-
skeleton which can be split, and for which the atomic structure of the split chain can
be exploited.
In order to prove this result, we need for the first time to consider the densities of
the transition probability kernels. Being a probability measure on (X, B(X)) for each
individual x and each n, the transition probability kernel P n (x, ·) admits a Lebesgue
decomposition into its absolutely continuous and singular parts, with respect to any
finite non-trivial measure φ on B(X): we have, for any fixed x and B ∈ B(X),
P^n(x, B) = ∫_B p^n(x, y) φ(dy) + P^n_⊥(x, B).   (5.15)
Theorem 5.2.1. Suppose φ is a σ-finite measure on (X, B(X)). Suppose A is any set
in B(X) with φ(A) > 0 such that
φ(B) > 0, B ⊆ A ⇒ Σ_{k=1}^{∞} P^k(x, B) > 0,   x ∈ A.
Then, for every n, the function pn defined in (5.15) can be chosen to be a measurable
function on X2 , and there exists C ⊆ A, m > 1, and δ > 0 such that φ(C) > 0 and
p^m(x, y) > δ,   x, y ∈ C.   (5.16)
Proof We include a detailed proof because of the central place small sets hold in
the development of the theory of ψ-irreducible Markov chains. However, the proof is
somewhat complex, and may be omitted without interrupting the flow of understanding
at this point.
It is a standard result that the densities pn (x, y) of P n (x, · ) with respect to φ exist
for each x ∈ X, and are unique except for definition on φ-null sets. We first need to
verify that
(i) the densities pn (x, y) can be chosen jointly measurable in x and y, for each n;
(ii) the densities pn (x, y) can be chosen to satisfy an appropriate form of the
Chapman–Kolmogorov property, namely for n, m ∈ Z+ , and all x, z
p^{n+m}(x, z) ≥ ∫_X p^n(x, y) p^m(y, z) φ(dy).   (5.17)
To see (i), we appeal to the fact that B(X) is assumed countably generated. This means
that there exists a sequence {Bi ; i ≥ 1} of finite partitions of X, such that Bi+1 is a
refinement of Bi , and which generate B(X). Fix x ∈ X, and let Bi (x) denote the element
in Bi with x ∈ Bi (x).
For each i, the functions
p^1_i(x, y) =
    0,                          φ(B_i(y)) = 0,
    P(x, B_i(y)) / φ(B_i(y)),   φ(B_i(y)) > 0
are non-negative, and are clearly jointly measurable in x and y. The Basic Differenti-
ation Theorem for measures (cf. Doob [99], Chapter 7, Section 8) now assures us that
for y outside a φ-null set N ,
p^1_∞(x, y) = lim_{i→∞} p^1_i(x, y)   (5.18)
One can now check (see Orey [309] p. 6) that the collection {p^n(x, y), x, y ∈ X, n ∈ Z+}
satisfies both (i) and (ii).
We next verify (5.16). The constraints on φ in the statement of Theorem 5.2.1 imply
that
Σ_{n=1}^{∞} p^n(x, y) > 0,   x ∈ A, a.e. y ∈ A [φ];
We suppress the notational dependence on η from now on, since η is fixed for the
remainder of the proof.
For any x, y, set Bi (x, y) = Bi (x) × Bi (y), where Bi (x) is again the element con-
taining x of the finite partition Bi above. By the Basic Differentiation Theorem as in
(5.18), this time for measures on B(X) × B(X), there are φ2 -null sets Nk ⊆ X × X such
that for any k and (x, y) ∈ Ak \Nk ,
≥ [η²/2] φ(B_j(v)) ≥ δ_1, say.   (5.24)
To finish the proof, note that since φ(En ) > 0, there is an integer k and a set C ⊆ Dm
with P k (x, En ) > δ2 > 0, for all x ∈ C. It then follows from the construction of the
densities above that for all x, z ∈ C
p^{k+n+m}(x, z) ≥ ∫_{E_n} P^k(x, dy) p^{n+m}(y, z) ≥ δ_1 δ_2,
Proof When Φ is ψ-irreducible, every set in B + (X) satisfies the conditions of The-
orem 5.2.1, with the measure φ = ψ. The result then follows immediately from (5.16).
As a direct corollary of this result we have
Theorem 5.2.3. If Φ is ψ-irreducible, then the minorization condition holds for some
m-skeleton, and for every K_{a_ε}-chain, 0 < ε < 1.
Any Φ which is ψ-irreducible is well endowed with small sets from Theorem 5.2.1,
even though it is far from clear from the initial definition that this should be the case.
Given the existence of just one small set from Theorem 5.2.2, we now show that it is
further possible to cover the whole of X with small sets in the ψ-irreducible case.
(ii) Since Φ is ψ-irreducible, there exists a ν_m-small set C ∈ B⁺(X) from Theo-
rem 5.2.2. Moreover from the definition of ψ-irreducibility the sets
For spread-out random walks, we find that small sets are in general relatively easy
to find.
Proposition 5.3.1. If Φ is a spread-out random walk, with Γ^{n*} non-singular with
respect to µ_Leb, then there is a neighborhood C_β = {x : |x| ≤ β} of the origin which is
ν_{2n}-small, where ν_{2n}( · ) = ε µ_Leb( · ∩ [s, t]) for some interval [s, t] and some ε > 0.
Proof   Since Γ is spread out, we have for some bounded non-negative function γ
with ∫ γ(x) dx > 0, and some n > 0,
P^n(0, A) ≥ ∫_A γ(x) dx,   A ∈ B(R),
and hence
P^{2n}(0, A) ≥ ∫_A γ ∗ γ(x) dx;
but since from Lemma D.4.3 the convolution γ ∗ γ(x) is continuous and not identically
zero, there exists an interval [a, b] and a δ with γ ∗ γ(x) ≥ δ on [a, b]. Choose β = [b − a]/4,
and [s, t] = [a + β, b − β], to prove the result using the translation invariance
of the random walk.
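The key analytic step, that γ ∗ γ is continuous and hence bounded below on some interval, is easy to visualize numerically; in the sketch below the density component γ is an illustrative assumption.

```python
# Numerical sketch of the convolution step: gamma * gamma of a bounded
# density component is continuous, so it is bounded below on an interval
# [a, b], which yields the small set.  gamma here is an illustrative
# (substochastic) density component of a spread-out increment law.
import numpy as np

grid = np.linspace(-3, 3, 1201)
h = grid[1] - grid[0]
gamma = np.where(np.abs(grid - 1.0) < 0.5, 0.4, 0.0)   # density component

conv = np.convolve(gamma, gamma) * h                   # gamma * gamma
conv_grid = np.linspace(2 * grid[0], 2 * grid[-1], conv.size)
support = conv_grid[conv > 0.05]
print(f"gamma*gamma >= 0.05 roughly on [{support.min():.2f}, {support.max():.2f}]")
```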
For spread out random walks, a far stronger irreducibility result will be provided in
Chapter 6: there we will show that if Φ is a random walk with spread-out increment
distribution Γ, with Γ(−∞, 0) > 0, Γ(0, ∞) > 0, then Φ is µL e b -irreducible, and every
compact set is a small set.
P(i, x; j × A) = 0,   j > i + 1,
P(i, x; j × A) = Λ_{i−j+1}(x, A),   j = 1, . . . , i + 1,
P(i, x; 0 × A) = Λ*_i(x, A),
where
Λ_n(x, [0, y]) = ∫_0^∞ P_n^t(x, y) G(dt),   (5.29)
Λ*_n(x, [0, y]) = [ Σ_{j=n+1}^∞ Λ_j(x, [0, ∞)) ] H[0, y],   (5.30)
and since
Λ_0(x, [0, ∞)) = ∫ G(dt) P(0 ≤ t < σ_1 | R_0 = x) = ∫ G(dt) I{t < x} = G(−∞, x],
we have
Λ*_0(x, [0, · ]) = H[0, · ] G(x, ∞).
The result follows immediately, since for x < β, Λ*_0(x, [0, · ]) ≥ H[0, · ] G(β, ∞).
Proof   As in (5.28), since Γ is spread out there exists n ∈ Z+, an interval [a, b], and
a constant β > 0 such that, for some k,
P_0(Z_n ∈ du) ≥ β du,   u ∈ [kδ, (k + 4)δ] ⊆ [a, b].   (5.32)
Now choose m ≥ 1 such that Γ[mδ, (m + 1)δ) = γ > 0; and set M = k + m + 2. Then
for x ∈ [0, δ), by considering the occurrence of the nth renewal where n is the index so
that (5.32) holds we find
P_x(V^+(Mδ) ∈ du ∩ [0, δ))
  ≥ P_0(x + Z_{n+1} − Mδ ∈ du ∩ [0, δ), Y_{n+1} ≥ δ)
  = ∫_{y∈[δ,∞)} Γ(dy) P_0(x + y − Mδ + Z_n ∈ du ∩ [0, δ))   (5.33)
  ≥ ∫_{y∈[mδ,(m+1)δ)} Γ(dy) P_0(Z_n ∈ du ∩ {[0, δ) − x − y + Mδ}).
Hence [0, δ) is a small set, and the measure ν can be chosen as a multiple of Lebesgue
measure over [0, δ).
In this proof we have demanded that (5.32) holds for u ∈ [kδ, (k + 4)δ] and in (5.34)
we only used the fact that the equation holds for u ∈ [kδ, (k + 3)δ]. This is not an
oversight: we will use the larger range in showing in Proposition 5.4.5 that the chain is
also aperiodic.
P^k(x_0, · ) = N( F^k x_0, Σ_{i=0}^{k−1} F^i G G′ (F′)^i );   (5.36)
and if (F, G) is controllable then from (4.18) the n-step transition function possesses a
smooth density p_n(x, y) which is continuous and everywhere positive on R^{2n}. It follows
from continuity that for any pair of bounded open balls B_1 and B_2 ⊂ R^n, there exists
ε > 0 such that
p_n(x, y) ≥ ε,   (x, y) ∈ B_1 × B_2.
Letting νn denote the normalized uniform distribution on B2 we see that B1 is νn -small.
This shows that for the controllable, Gaussian LSS(F ,G) model, all compact subsets
of the state space are small.
Here, if we start in x then we have P^n(x, x) > 0 if and only if n = 0, d, 2d, . . ., and the
chain Φ is said to cycle through the states of X.
On a continuous state space the same phenomenon can be constructed equally easily:
let X = [0, d), let U_i denote the uniform distribution on [i, i + 1), and define
P(x, · ) = U_{i+1 (mod d)}( · ),   x ∈ [i, i + 1).
In this example, the chain again cycles through a fixed finite number of sets. We now
prove a series of results which indicate that, no matter how complex the behavior of a
ψ-irreducible chain, or a chain on an irreducible absorbing set, the finite cyclic behavior
of these examples is typical of the worst behavior to be found.
This does not guarantee that P^{md(α)}(α, α) > 0 for all m, but it does imply P^n(α, α) =
0 unless n = md(α) for some m.
We call d(α) the period of α. The result we now show is that the value of d(α) is
common to all states y in the class C(α) = {y : α ↔ y}, rather than taking a separate
value for each y.
Proposition 5.4.1. Suppose α has period d(α): then for any y ∈ C(α), d(α) = d(y).
Proof   Since α ↔ y, we can find m and n such that P^m(α, y) > 0 and P^n(y, α) > 0.
By the Chapman–Kolmogorov equations we have P^{m+n}(α, α) ≥ P^m(α, y) P^n(y, α) > 0,
so that m + n is a multiple of d(α); moreover, for any k with P^k(y, y) > 0,
P^{m+k+n}(α, α) ≥ P^m(α, y) P^k(y, y) P^n(y, α) > 0, so that m + k + n is also a multiple
of d(α). Hence, unless k is a multiple of d(α),
we have P^k(y, y) = 0, which proves d(y) ≥ d(α). Reversing the roles of α and y shows
d(α) ≥ d(y), which gives the result.
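For a finite chain the period is computable directly from its definition as a greatest common divisor, and the solidarity asserted by Proposition 5.4.1 can be observed; the three-state kernel below is an illustrative chain of period 3.

```python
# Compute d(x) = g.c.d.{ n >= 1 : P^n(x, x) > 0 } for each state of a
# finite chain, checking that the value is common to the communicating
# class.  The 3-state kernel P is an illustrative chain of period 3.
import numpy as np
from math import gcd

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

def period(P, x, n_max=50):
    d, Pn = 0, np.eye(P.shape[0])
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[x, x] > 0:
            d = gcd(d, n)       # gcd(0, n) = n starts the accumulation
    return d

print([period(P, x) for x in range(3)])   # -> [3, 3, 3]
```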
This result leads to a further decomposition of the transition probability matrix for
an irreducible chain; or, equivalently, within a communicating class.
Proposition 5.4.2. Let Φ be an irreducible Markov chain on a countable space, and
let d denote the common period of the states in X. Then there exist disjoint sets
D1 , . . . , Dd ⊆ X such that
X = ⋃_{k=1}^{d} D_k,
and
P(x, D_{k+1}) = 1,   x ∈ D_k, k = 0, . . . , d − 1 (mod d).   (5.39)
Proof   Fix one state α ∈ X, and for each y ∈ X choose M = M(y) such that
P^M(y, α) > 0.
Let k be any other integer such that P^k(α, y) > 0. Then P^{k+M}(α, α) > 0, and thus
k + M = jd for some j; equivalently, k = jd − M. Now M is fixed, and so we must
have P^k(α, y) > 0 only for k in the sequence {r, r + d, r + 2d, . . .}, where the integer
r = r(y) ∈ {1, . . . , d} is uniquely defined for y.
Call Dr the set of states which are reached with positive probability from α only
at points in the sequence {r, r + d, r + 2d, . . .} for each r ∈ {1, 2, . . . , d}. By definition
α ∈ D_d, and P(α, D_1^c) = 0 so that P(α, D_1) = 1. Similarly, for any y ∈ D_r we have
P(y, D_{r+1}^c) = 0, giving our result.
The sets {Di } covering X and satisfying (5.39) are called cyclic classes, or a d-cycle,
of Φ. With probability one, each sample path of the process Φ “cycles” through values
in the sets D1 , D2 , . . . , Dd , D1 , D2 , . . ..
Diagrammatically, we have shown that we can write an irreducible transition prob-
ability matrix in “super-diagonal” form
P =
    ⎛  0    P_1   0    · · ·     0     ⎞
    ⎜  0     0   P_2   · · ·     0     ⎟
    ⎜  ⋮               ⋱               ⎟
    ⎜  0     0    0    · · ·  P_{d−1}  ⎟
    ⎝ P_d    0    0    · · ·     0     ⎠
where each block Pi is a square matrix whose dimension may depend upon i.
Aperiodicity
An irreducible chain on a countable space X is called aperiodic if d = 1, and strongly aperiodic if P(x, x) > 0 for some x ∈ X.
Whilst cyclic behavior can certainly occur, as illustrated in the examples at the
beginning of this section, and the periodic behavior of the control systems in Theo-
rem 7.3.3 below, most of our results will be given for aperiodic chains. The justification
for using such chains is contained in the following, whose proof is obvious.
Suppose now that Φ is ψ-irreducible, and let C ∈ B^+(X) be a ν_M-small set with ν(C) > 0. Let

    E_C := {n ≥ 1 : C is ν_n-small, with ν_n = δ_n ν for some δ_n > 0}

be the set of time points for which C is a small set with minorizing measure proportional to ν. Notice that for B ⊆ C, n, m ∈ E_C implies
    P^{n+m}(x, B) ≥ ∫_C P^m(x, dy) P^n(y, B) ≥ [δ_m δ_n ν(C)] ν(B),    x ∈ C;
so that E_C is closed under addition. Thus there is a natural “period” for the set C, given by the greatest common divisor of E_C; and from Lemma D.7.4, C is ν_{nd}-small for all large enough n.
We show that this value is in fact a property of the whole chain Φ, and is independent
of the particular small set chosen, in the following analogue of Proposition 5.4.2.
Theorem 5.4.4. Suppose that Φ is a ψ-irreducible Markov chain on X. Let C ∈ B^+(X) be a ν_M-small set and let d be the greatest common divisor of the set E_C. Then there exist disjoint sets D_1, . . . , D_d ∈ B(X) (a “d-cycle”) such that

(i) for x ∈ D_i, P(x, D_{i+1}) = 1, i = 0, . . . , d − 1 (mod d);

(ii) the set N = [⋃_{i=1}^d D_i]^c is ψ-null.
The d-cycle {D_i} is maximal in the sense that for any other collection {d′, D_k′, k = 1, . . . , d′} satisfying (i)–(ii), we have d′ dividing d; whilst if d = d′ then, by reordering the indices if necessary, D_i = D_i′ a.e. ψ.
Proof For i = 1, . . . , d, set D_i^* := {y : Σ_n P^{nd−i}(y, C) > 0}; by irreducibility, X = ⋃ D_i^*.
The D_i^* are in general not disjoint, but we can show that their intersection is ψ-null. For suppose there exist i, k such that ψ(D_i^* ∩ D_k^*) > 0. Then for some fixed m, n > 0, there is a subset A ⊆ D_i^* ∩ D_k^* with ψ(A) > 0 such that

    P^{md−i}(y, C) ≥ δ_m > 0,    P^{nd−k}(y, C) ≥ δ_n > 0,    y ∈ A;    (5.41)
and since ψ is the maximal irreducibility measure, we can also find r such that
    ∫_C ν(dy) P^r(y, A) = δ_c > 0.    (5.42)
Now we use the fact that C is a ν_M-small set: for x ∈ C, B ⊆ C, from (5.41), (5.42),

    P^{2M+md−i+r}(x, B) ≥ ∫_C P^M(x, dy) ∫_A P^r(y, dw) ∫_C P^{md−i}(w, dz) P^M(z, B)
                        ≥ [δ_c δ_m] ν(B),
This result shows that it is clearly desirable to work with strongly aperiodic chains. Regrettably, this condition is not satisfied in general, even for simple chains; and we will often have to prove results for strongly aperiodic chains and then use special methods to extend them to general chains through the m-skeleton or the K_{a_ε}-chain.
We will however concentrate almost exclusively on aperiodic chains. In practice this
is not greatly restrictive, since we have as in the countable case
Proposition 5.4.6. Suppose Φ is a ψ-irreducible chain with period d and d-cycle {D_i, i = 1, . . . , d}. Then each of the sets D_i is an absorbing ψ-irreducible set for the chain Φ^d corresponding to the transition probability kernel P^d, and Φ^d on each D_i is aperiodic.
As an example, consider again the forward recurrence time chain V_δ^+ defined in Section 3.5.3. Here, we can find explicit conditions for aperiodicity even though the chain has no atom in the space. We have
Proposition 5.4.7. If F is spread out, then V_δ^+ is aperiodic for sufficiently small δ.
Proof In Proposition 5.3.3 we showed that for sufficiently small δ, the set [0, δ) is a ν_M-small set, where ν is a multiple of Lebesgue measure restricted to [0, δ].
But since the bounds on the densities in (5.35) hold, not just for the range [kδ, (k + 3)δ) for which they were used, but by construction for the greater range [kδ, (k + 4)δ), the same proof shows that [0, δ) is a ν_{M+1}-small set also, and thus aperiodicity follows from the definition of the period of V_δ^+ as the g.c.d. in (5.40).
5.5 Petite sets and sampled chains

Sampled chains

For a distribution, or probability measure, a = {a(n)} on Z_+, we define the sampled chain with transition kernel

    K_a(x, A) := Σ_{n=0}^∞ P^n(x, A) a(n),    x ∈ X, A ∈ B(X).

If a = a_ε is the geometric sampling distribution given by a_ε(n) = (1 − ε)ε^n for some 0 < ε < 1, then the kernel K_{a_ε} is the resolvent K_ε which was defined in Chapter 3. The concept
of sampled chains immediately enables us to develop useful conditions under which one
set is uniformly accessible from another. We say that a set B ∈ B(X) is uniformly
accessible using a from another set A ∈ B(X) if there exists a δ > 0 such that

    inf_{x∈A} K_a(x, B) ≥ δ,    (5.44)

and when (5.44) holds we write A ⇝_a B.
Lemma 5.5.1. If A ⇝_a B for some distribution a, then A ⇝ B.
Lemma 5.5.2. (i) If a and b are distributions on Z_+, then the sampled chains with transition laws K_a and K_b satisfy the generalized Chapman–Kolmogorov equations

    K_{a∗b}(x, A) = ∫ K_a(x, dy) K_b(y, A).    (5.46)

(ii) If A ⇝_a B and B ⇝_b C, then A ⇝_{a∗b} C.

(iii) If a is a distribution on Z_+, then the sampled chain with transition law K_a satisfies the relation

    U(x, A) ≥ ∫ U(x, dy) K_a(y, A).    (5.47)
Proof To see (i), observe that by definition and the Chapman–Kolmogorov equation

    K_{a∗b}(x, A) = Σ_{n=0}^∞ P^n(x, A) (a ∗ b)(n)
                  = Σ_{n=0}^∞ Σ_{m=0}^n P^n(x, A) a(m) b(n − m)
                  = Σ_{n=0}^∞ Σ_{m=0}^n ∫ P^m(x, dy) P^{n−m}(y, A) a(m) b(n − m)
                  = Σ_{m=0}^∞ ∫ P^m(x, dy) a(m) Σ_{n=m}^∞ P^{n−m}(y, A) b(n − m)
                  = ∫ K_a(x, dy) K_b(y, A),    (5.48)
as required.
The result (ii) follows directly from (5.46) and the definitions.
For (iii), note that for fixed m, n,

    P^{m+n}(x, A) a(n) = ∫ P^m(x, dy) P^n(y, A) a(n)
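On a finite state space the kernels K_a are finite matrices, and the identity (5.46) can be checked directly. A small numerical sketch (with an arbitrary randomly generated transition matrix; illustrative only):

    import numpy as np

    rng = np.random.default_rng(1)
    P = rng.random((4, 4)); P /= P.sum(axis=1, keepdims=True)

    def K(P, a):
        # a is a dict {n: a(n)} of sampling weights summing to one
        Pn = {0: np.eye(len(P))}
        for n in range(1, max(a) + 1):
            Pn[n] = Pn[n - 1] @ P
        return sum(w * Pn[n] for n, w in a.items())

    def convolve(a, b):
        # the convolution a*b of two sampling distributions
        c = {}
        for m, am in a.items():
            for k, bk in b.items():
                c[m + k] = c.get(m + k, 0.0) + am * bk
        return c

    a = {1: 0.5, 3: 0.5}
    b = {0: 0.25, 2: 0.75}
    # numerical check of (5.46): K_{a*b} = K_a K_b
    assert np.allclose(K(P, convolve(a, b)), K(P, a) @ K(P, b))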
Petite sets
We will call a set C ∈ B(X) ν_a-petite if the sampled chain satisfies the bound

    K_a(x, B) ≥ ν_a(B)

for all x ∈ C, B ∈ B(X), where ν_a is a non-trivial measure on B(X).
From the definitions we see that a small set is petite, with the sampling distribution
a taken as δm for some m. Hence the property of being a small set is in general stronger
than the property of being petite. We state this formally as
Proposition 5.5.3. If C ∈ B(X) is ν_m-small, then C is ν_{δ_m}-petite.

The operation “⇝” interacts usefully with the petiteness property. We have

Proposition 5.5.4. (i) If A ∈ B(X) is ν_a-petite and D ⇝_b A, then D is ν_{b∗a}-petite, where ν_{b∗a} can be chosen as a multiple of ν_a.
where νb∗a can be chosen as a multiple of νa .
(ii) If Φ is ψ-irreducible and if A ∈ B+ (X) is νa -petite, then νa is an irreducibility
measure for Φ.
Proof To prove (i) choose δ > 0 such that for x ∈ D we have K_b(x, A) ≥ δ. By Lemma 5.5.2 (i),

    K_{b∗a}(x, B) = ∫_X K_b(x, dy) K_a(y, B)
                  ≥ ∫_A K_b(x, dy) K_a(y, B)    (5.49)
                  ≥ δ ν_a(B).

To see (ii), suppose A is ν_a-petite and ν_a(B) > 0. For x ∈ A(n, m) as in (5.27) we have

    P^n K_a(x, B) ≥ ∫_A P^n(x, dy) K_a(y, B) ≥ m^{−1} ν_a(B) > 0
Thus there is an increasing sequence {C_i} of ψ_c-petite sets, all with the same sampling distribution c and minorizing measure equivalent to ψ, with ⋃ C_i = X.
Proof To prove (i) we first show that we can assume without loss of generality that ν_a is an irreducibility measure, even if ψ(A) = 0.
From Proposition 5.2.4 there exists a ν_b-petite set C with C ∈ B^+(X). We have K_{a_ε}(y, C) > 0 for any y ∈ X and any ε > 0, and hence for x ∈ A,

    K_{a∗a_ε}(x, C) ≥ ∫ ν_a(dy) K_{a_ε}(y, C) > 0.

This shows that A ⇝_{a∗a_ε} C, and hence from Proposition 5.5.4 we see that A is ν_{a∗a_ε∗b}-petite, where ν_{a∗a_ε∗b} is a constant multiple of ν_b. Now, from Proposition 5.5.4 (ii), the measure ν_{a∗a_ε∗b} is an irreducibility measure, as claimed.
We now assume that ν_a is an irreducibility measure, which is justified by the discussion above, and use Lemma 5.5.2 (i) to obtain the bound, valid for any 0 < ε < 1,

    K_a(x, A_0) ≥ (1/2) min(ψ_{a_1}(A_0), ψ_{a_2}(A_0)) > 0,

so that A_1 ∪ A_2 ⇝_a A_0. From Proposition 5.5.4 we see that A_1 ∪ A_2 is petite.
For (iii), first apply Theorem 5.2.2 to construct a ν_n-small set C ∈ B^+(X). By (i) above we may assume that C is ψ_b-petite with ψ_b a maximal irreducibility measure. Hence K_b(y, · ) ≥ I_C(y) ψ_b( · ) for all y ∈ X.
By irreducibility and the definitions we also have K_{a_ε}(x, C) > 0 for all 0 < ε < 1 and all x ∈ X. Combining these bounds gives, for any x ∈ X, B ∈ B(X),

    K_{a_ε∗b}(x, B) ≥ ∫_C K_{a_ε}(x, dz) K_b(z, B) ≥ K_{a_ε}(x, C) ψ_b(B)
(i) Without loss of generality we can take a to be either the uniform sampling distribution a_m(i) = 1/m, 1 ≤ i ≤ m, or the geometric sampling distribution a_ε. In either case, there is a finite mean sampling time

    m_a = Σ_i i a(i).
(ii) If Φ is strongly aperiodic, then the set C_0 ∪ C_1 ⊆ X̌ corresponding to C is ν_{a∗}-petite for the split chain Φ̌.
Proof To see (i), let A ∈ B^+(X) be ν_n-small. By Proposition 5.5.5 (i) we have

    Σ_{k=1}^{N+n} P^k(x, B) ≥ Σ_{k=1}^{N} P^{k+n}(x, B) ≥ (1/2) ψ_b(A) ν_n(B)
Theorem 5.5.7. If Φ is irreducible and aperiodic, then every petite set is small.
Proof Let A be a petite set. From Proposition 5.5.5 we may assume that A is ψ_a-petite, where ψ_a is a maximal irreducibility measure.
Let C denote the small set used in (5.40). Since the chain is aperiodic, it follows from Theorem 5.4.4 and Lemma D.7.4 that for some n_0 ∈ Z_+, the set C is ν_k-small, with ν_k = δν for some δ > 0, for all n_0/2 − 1 ≤ k ≤ n_0.
Since C ∈ B^+(X), we may also assume that n_0 is so large that

    Σ_{k=n_0/2}^∞ a(k) ≤ (1/2) ψ_a(C).
Then, for x ∈ A and any B ∈ B(X),

    P^{n_0}(x, B) ≥ Σ_{k=0}^{n_0/2} ∫_C P^k(x, dy) P^{n_0−k}(y, B) a(k)
                 ≥ Σ_{k=0}^{n_0/2} P^k(x, C) a(k) δν(B)
                 ≥ (1/2) ψ_a(C) δν(B),

which shows that A is ν_{n_0}-small, with ν_{n_0} = (1/2) δ ψ_a(C) ν.
This somewhat surprising result, together with Proposition 5.5.5, indicates that
the class of small sets can be used for different purposes, depending on the choice of
sampling distribution we make: if we sample at a fixed finite time we may get small
sets with their useful fixed time point properties; and if we extend the sampling as in
Proposition 5.5.5, we develop a petite structure with a maximal irreducibility measure.
We shall use this duality frequently.
5.6 Commentary
We have already noted that the split chain and the random renewal time approaches
to regeneration were independently discovered by Nummelin [301] and Athreya and
Ney [13]. The opportunities opened up by this approach are exploited with growing
frequency in later chapters.
However, the split chain only works in the generality of ϕ-irreducible chains because
of the existence of small sets, and the ideas for the proof of their existence go back to
Doeblin [95], although the actual existence as we have it here is from Jain and Jamison
[172]. Our proof is based on that in Orey [309], where small sets are called C-sets.
Nummelin [303] Chapter 2 has a thorough discussion of conditions equivalent to that
we use here for small sets; Bonsdorff [38] also provides connections between the various
small set concepts.
Our discussion of cycles follows that in Nummelin [303] closely. A thorough study
of cyclic behavior, expanding on the original approach of Doeblin [95], is given also in
Chung [70].
Petite sets as defined here were introduced in Meyn and Tweedie [277]. The “small
sets” defined in Nummelin and Tuominen [305] as well as the petits ensembles developed
in Duflo [102] are also special instances of petite sets, where the sampling distribution a is chosen as a(i) = 1/N for 1 ≤ i ≤ N, and as a(i) = (1 − α)α^i, respectively. To a French speaker, the term “petite set” might be disturbing, since the gender of ensemble is masculine; however, the nomenclature does fit normal English usage, since in [26] the word “petit” is likened to “puny”, while “petite” is more closely akin to “small”.
It might seem from Theorem 5.5.7 that there is little reason to consider both petite
sets and small sets. However, we will see that the two classes of sets are useful in distinct
ways. Petite sets are easy to work with for several reasons: most particularly, they span
periodic classes so that we do not have to assume aperiodicity, they are always closed
under unions for irreducible chains (Nummelin [303] also finds that unions of small sets
are small under aperiodicity), and by Proposition 5.5.5 we may assume that the petite
measure is a maximal irreducibility measure whenever the chain is irreducible.
Perhaps most importantly, when in the next chapter we introduce a class of Markov
chains with desirable topological properties, we will see that the structure of these
chains is closely linked to petiteness properties of compact sets.
Chapter 6

Topology and continuity
A typical, and frequently used, lower semicontinuous function is the indicator function I_O(x) of an open set O in B(X).
We will use the following continuity properties of the transition kernel, couched in these terms.
Theorem 6.0.1. (i) If Φ is a T-chain and L(x, O) > 0 for all x and all open sets O ∈ B(X), then Φ is ψ-irreducible.

(ii) If Φ is a ψ-irreducible T-chain, then every compact subset of X is petite.

(iii) If Φ is a ψ-irreducible Feller chain such that supp ψ has non-empty interior, then Φ is a ψ-irreducible T-chain.
Proof Proposition 6.2.2 proves (i); (ii) is in Theorem 6.2.5; (iii) is in Theorem 6.2.9.
In order to have any such links as those in Theorem 6.0.1 between the measure-
theoretic and topological properties of a chain, it is vital that there be at least a minimal
adaptation of the dynamics of the chain to the topology of the space on which it lives.
For consider the chain on [0, 1] with transition law which, from each point n^{−1}, moves to (n + 1)^{−1} with probability 1 − α_n. Such a chain fails to be ψ-irreducible in an obvious way if Σ_n α_n < ∞, since then it moves monotonically down the sequence {n^{−1}} with positive probability.
Of course, the dynamics of this chain are quite wrong for the space on which we have embedded it: its structure is adapted to the normal topology on the integers, not to that on the unit interval or the set {n^{−1} : n ∈ Z_+}. The Feller property obviously fails at {0}, as does any continuous component property if α_n → 0.
This is a trivial and pathological example, but one which proves valuable in exhibit-
ing the need for the various conditions we now consider, which do link the dynamics to
the structure of the space.
6.1 Feller properties and forms of stability

Suppose that X is a (locally compact separable metric) topological space, and let us
denote the class of bounded continuous functions from X to R by C(X).
The (weak) Feller property is frequently defined by requiring that the transition
probability kernel P maps C(X) to C(X). If the transition probability kernel P maps all
bounded measurable functions to C(X) then P (and also Φ) is called strong Feller.
That this is consistent with the definition above follows from
Proposition 6.1.1. (i) The transition kernel P I_O is lower semicontinuous for every open set O ∈ B(X) (that is, Φ is weak Feller) if and only if P maps C(X) to C(X); and P maps all bounded measurable functions to C(X) (that is, Φ is strong Feller) if and only if the function P I_A is lower semicontinuous for every set A ∈ B(X).
(ii) If the chain is weak Feller, then for any closed set C ⊂ X and any non-decreasing
function m : Z+ → Z+ the function Ex [m(τC )] is lower semicontinuous in x.
Hence for any closed set C ⊂ X, r > 1 and n ∈ Z_+, the functions E_x[r^{τ_C}] and P_x{τ_C ≥ n} are lower semicontinuous in x.
(iii) If the chain is weak Feller, then for any open set O ⊂ X, the function Px {τO ≤ n}
and hence also the functions Ka (x, O) and L(x, O) are lower semicontinuous.
Since by assumption ∆m(k) ≥ 0 for each k > 0, the proof of (ii) will be complete once we have shown that P_x{τ_C ≥ k} is lower semicontinuous in x for all k.
Since C is closed, and hence I_{C^c}(x) is lower semicontinuous, by Theorem D.4.1 there exist positive continuous functions f_i, i ≥ 1, such that f_i(x) ↑ I_{C^c}(x) for each x ∈ X.
Extend the definition of the kernel I_A, given by

    I_A(x, B) = I_{A∩B}(x),

by writing for any positive function g

    I_g(x, B) := g(x) I_B(x).

Then for all k ∈ Z_+,

    P_x{τ_C ≥ k} = (P I_{C^c})^{k−1}(x, X) = lim_{i→∞} (P I_{f_i})^{k−1}(x, X).
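This identity is also easy to verify numerically on a finite state space, where I_{C^c} is a diagonal matrix. A small sketch (a randomly generated five-state chain with C = {0}; illustrative only):

    import numpy as np

    rng = np.random.default_rng(2)
    P = rng.random((5, 5)); P /= P.sum(axis=1, keepdims=True)
    C = [0]
    I_Cc = np.diag([0.0 if i in C else 1.0 for i in range(5)])

    PI = P @ I_Cc          # the taboo kernel P I_{C^c}
    k = 4
    # P_x{tau_C >= k} = (P I_{C^c})^{k-1}(x, X), evaluated for every state x
    prob = np.linalg.matrix_power(PI, k - 1).sum(axis=1)
    print(prob)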
Proposition 6.1.2. The nonlinear state space model NSS(F), defined by

    X_k = F(X_{k−1}, W_k),

is always weak Feller.
Proof We have by definition that the mapping x → F (x, w) is continuous for each
fixed w ∈ R. Thus whenever h : X → R is bounded and continuous, h ◦ F (x, w) is
also bounded and continuous for each fixed w ∈ R. It follows from the Dominated Convergence Theorem that

    P h(x) = E[h(F(x, W))] = ∫ h(F(x, w)) Γ(dw)

is a continuous function of x ∈ X.
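As a concrete illustration (not part of the formal argument), one may estimate P h by Monte Carlo for a hypothetical scalar model F(x, w) = x/2 + sin x + w with Gaussian noise; the estimates vary continuously with the initial state, as the proposition guarantees:

    import numpy as np

    rng = np.random.default_rng(3)
    W = rng.standard_normal(100_000)        # noise sample: here W ~ N(0, 1)
    h = lambda x: np.tanh(x)                # a bounded continuous test function
    F = lambda x, w: 0.5 * x + np.sin(x) + w

    xs = np.linspace(-2, 2, 9)
    Ph = [h(F(x, W)).mean() for x in xs]    # P h(x) = E[h(F(x, W))] on a grid
    print(np.round(Ph, 3))                  # varies smoothly across the grid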
This simple proof of weak continuity can be emulated for many models. It implies
that this aspect of the topological analysis of many models is almost independent of
the random nature of the inputs. Indeed, we could rephrase Proposition 6.1.2 as saying
that since the associated control model CM(F ) is a continuous function of the state for
each fixed control sequence, the stochastic nonlinear state space model NSS(F ) is weak
Feller.
We shall see in Chapter 7 that this reflection of deterministic properties of CM(F )
by NSS(F ) is, under appropriate conditions, a powerful and exploitable feature of the
nonlinear state space model structure.
Proposition 6.1.3. The random walk Φ with increment distribution Γ is always weak Feller; and Φ is strong Feller if and only if Γ is absolutely continuous with respect to µ_Leb on R.

Proof Suppose that h ∈ C(X): the structure (3.35) of the transition kernel for the random walk shows that

    P h(x) = ∫_R h(y) Γ(dy − x)
           = ∫_R h(y + x) Γ(dy)    (6.5)
and since h is bounded and continuous, P h is also bounded and continuous, again from
the Dominated Convergence Theorem. Hence Φ is always weak Feller, as we also know
from Proposition 6.1.2.
Suppose next that Γ possesses a density γ with respect to µ_Leb on R. Taking h in (6.5) to be any bounded function, we have

    P h(x) = ∫_R h(y) γ(y − x) dy;    (6.6)

but now from Lemma D.4.3 it follows that the convolution P h = γ ∗ h is continuous, and the chain is strong Feller.
Conversely, suppose the random walk is strong Feller. Then for any B such that Γ(B) = δ > 0, by the lower semicontinuity of P(x, B) there exists a neighborhood O of {0} such that

    P(x, B) ≥ P(0, B)/2 = Γ(B)/2 = δ/2,    x ∈ O.    (6.7)
By Fubini’s Theorem and the translation invariance of µ_Leb we have, for any A ∈ B(X),

    ∫_R µ_Leb(dy) Γ(A − y) = ∫_R µ_Leb(dy) ∫_R I_{A−y}(x) Γ(dx)
                           = ∫_R Γ(dx) ∫_R I_{A−x}(y) µ_Leb(dy)
                           = µ_Leb(A)    (6.8)
since Γ(R) = 1. Thus we have in particular, from (6.7) and (6.8),

    µ_Leb(B) = ∫_R µ_Leb(dy) Γ(B − y)
             ≥ ∫_O µ_Leb(dy) Γ(B − y)
             ≥ δ µ_Leb(O)/2,

and hence Γ ≺ µ_Leb (that is, Γ is absolutely continuous with respect to µ_Leb), as required.
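A concrete instance of the smoothing in (6.6): with standard Gaussian increment density γ, even the discontinuous function h = I_{[0,∞)} is mapped to the smooth function P h(x) = Φ(x), the Gaussian distribution function. A minimal sketch (illustrative only):

    import numpy as np
    from math import erf, sqrt

    # P h(x) = integral of gamma(y - x) over [0, inf) = Phi(x) when gamma
    # is the standard normal density: the convolution gamma * h is smooth
    # although h itself is discontinuous at 0.
    Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
    xs = np.linspace(-3, 3, 7)
    print([round(Phi(x), 4) for x in xs])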
(i) A point x^* ∈ X is called reachable if for every open set O ∈ B(X) containing x^*,

    Σ_n P^n(x, O) > 0    for all x ∈ X.

(ii) The chain Φ is called open set irreducible if every point is reachable.
We will use often the following result, which is a simple consequence of the definition
of support.
Proposition 6.1.4. If Φ is ψ-irreducible, then a point x^* is reachable if and only if x^* ∈ supp ψ.

Proof If x^* ∈ supp (ψ) then, for any open set O containing x^*, we have ψ(O) > 0
by the definition of the support. By ψ-irreducibility it follows that L(x, O) > 0 for all
x, and hence x∗ is reachable.
Conversely, suppose that x^* ∉ supp (ψ), and let O = supp (ψ)^c. The set O is open
by the definition of the support, and contains the state x∗ . By Proposition 4.2.3 there
exists an absorbing, full set A ⊆ supp (ψ). Since L(x, O) = 0 for x ∈ A it follows that
x∗ is not reachable.
It is easily checked that open set irreducibility is equivalent to irreducibility when
the state space of the chain is countable and is equipped with the discrete topology.
The open set irreducibility definition is conceptually similar to the ψ-irreducibility
definition: they both imply that “large” sets can be reached from every point in the
space. In the ψ-irreducible case large sets are those of positive ψ-measure, whilst in the
open set irreducible case, large sets are open non-empty sets.
In this book our focus is on the property of ψ-irreducibility as a fundamental struc-
tural property. The next result, despite its simplicity, begins to link that property to
the properties of open set irreducible chains.
Proposition 6.1.5. If Φ is a strong Feller chain, and X contains one reachable point
x∗ , then Φ is ψ-irreducible, with ψ = P (x∗ , · ).
We will see below in Proposition 6.2.2 that this strong Feller condition, which (as is
clear from Proposition 6.1.3) may be unsatisfied for many models, is not needed in full
to get this result, and that Proposition 6.1.5 and Proposition 6.1.6 hold for T-chains
also.
There are now two different approaches we can take in connecting the topological and
continuity properties of Feller chains with the stochastic or measure-theoretic properties
of the chain. We can either weaken the strong Feller property by requiring in essence
that it only hold partially; or we could strengthen the weak Feller condition whilst
retaining its essential flavor.
It will become apparent that the former, T-chain, route is usually far more pro-
ductive, and we move on to this next. A strengthening of the Feller property to give
e-chains will then be developed in Section 6.4.
6.2 T-chains
6.2.1 T-chains and open set irreducibility
The calculations for NSS(F ) models and random walks show that the majority of the
chains we have considered to date have the weak Feller property.
However, we clearly need more than just the weak Feller property to connect
measure-theoretic and topological irreducibility concepts: every random walk is weak
Feller, and we know from Section 4.3.3 that any chain with increment measure concen-
trated on the rationals enters every open set but is not ψ-irreducible.
Moving from the weak to the strong Feller property is however excessive. Using the
ideas of sampled chains introduced in Section 5.5.1 we now develop properties of the
class of T-chains, which we shall find includes virtually all models we will investigate,
and which appears almost ideally suited to link the general space attributes of the chain
with the topological structure of the space.
The T-chain definition describes a class of chains which are not totally adapted
to the topology of the space, in that the strongly continuous kernel T , being only a
“component” of P , may ignore many discontinuous aspects of the motion of Φ: but it
does ensure that the chain is not completely singular in its motion, with respect to the
normal topology on the space, and the strong continuity of T links set properties such
as ψ-irreducibility to the topology in a way that is not natural for weak continuity.
We illustrate precisely this point now, with the analogue of Proposition 6.1.5.
Proposition 6.2.1. If Φ is a T-chain, and X contains one reachable point x∗ , then Φ
is ψ-irreducible, with ψ = T (x∗ , · ).
In the next two results we show that the existence of sufficient open petite sets
implies that Φ is a T-chain.
Proposition 6.2.3. If an open ν_a-petite set A exists, then K_a possesses a continuous component non-trivial on all of A.
Proof Since A is ν_a-petite, we have by definition

    K_a( · , · ) ≥ I_A( · ) ν_a{ · }.
Proof Since X is σ-compact, there is a countable covering of open petite sets, and
the result (i) follows from Proposition 6.2.3 and Proposition 6.2.4.
Now suppose that Φ is ψ-irreducible, so that there exists some petite A ∈ B+ (X),
and let Ka have an everywhere non-trivial continuous component T .
By irreducibility K_{a_ε}(x, A) > 0 for all x, and hence from (5.46),

    K_{a∗a_ε}(x, A) ≥ ∫ T(x, dy) K_{a_ε}(y, A) = T K_{a_ε}(x, A) > 0

for all x ∈ X.
The function T K_{a_ε}( · , A) is lower semicontinuous and positive everywhere on X. Hence K_{a∗a_ε}(x, A) is uniformly bounded from below on compact subsets of X. Proposition 5.2.4 completes the proof that each compact set is petite.
The fact that we can weaken the irreducibility condition to open set irreducibility
follows from Proposition 6.2.2.
The following factorization, which generalizes Proposition 5.5.5, further links the
continuity and petiteness properties of T-chains.
Proposition 6.2.6. If Φ is a ψ-irreducible T-chain, then there is a sampling distribution b, an everywhere strictly positive, continuous function s : X → R, and a maximal irreducibility measure ψ_b such that

    K_b(x, B) ≥ s(x) ψ_b(B),    x ∈ X, B ∈ B(X).
    K_a(x, C) ≥ δ,    x ∈ A.
Proof To see (i), let A be an open petite set of positive ψ-measure. Then K_{a_ε}( · , A) is lower semicontinuous and positive everywhere, and hence bounded from below on compact sets. Proposition 5.5.4 again completes the proof.
To see (ii), let A be a ψ-positive petite set, and define

    A_k := closure{x : K_{a_ε}(x, A) ≥ 1/k} ∩ supp ψ.
By Proposition 5.2.4 and Lemma 6.2.7, each Ak is petite. Since supp ψ has non-empty
interior it is of the second category, and hence there exists k ∈ Z+ and an open set
O ⊂ Ak ⊂ supp ψ. The set O is an open ψ-positive petite set, and hence we may apply
(i) to conclude (ii).
A surprising, and particularly useful, conclusion from this cycle of results concerning
petite sets and continuity properties of the transition probabilities is the following result,
showing that Feller chains are in many circumstances also T-chains. We have as a
corollary of Proposition 6.2.8 (ii) and Proposition 6.2.5 (ii) that
Theorem 6.2.9. If a ψ-irreducible chain Φ is weak Feller and if supp ψ has nonempty
interior then Φ is a T-chain.
These results indicate that the Feller property, which is a relatively simple condition
to verify in many applications, provides some strong consequences for ψ-irreducible
chains.
Since we may cover the state space of a ψ-irreducible Markov chain by a countable
collection of petite sets, and since by Lemma 6.2.7 the closure of a petite set is itself
petite, it might seem that Theorem 6.2.9 could be strengthened to provide an open
covering of X by petite sets without additional hypotheses on the chain. It would then
follow by Theorem 6.2.5 that any ψ-irreducible Feller chain is a T-chain.
Unfortunately, this is not the case, as is shown by the following counterexample.
Let X = [0, 1] with the usual topology, let 0 < |α| < 1, and define the Markov transition
function P for x > 0 by
    P(x, {0}) = 1 − P(x, {αx}) = x.
We set P (0, {0}) = 1. The transition function P is Feller and δ0 -irreducible. But for
any n ∈ Z_+ we have

    lim_{x→0} P_x(τ_{{0}} ≥ n) = 1,
from which it follows that there does not exist an open petite set containing the point
{0}.
Thus we have constructed a ψ-irreducible Feller chain on a compact state space
which is not a T-chain.
6.3 Continuous components for specific models

Exactly the same argument for a storage model with general state-dependent release
rule r(x), as discussed in Section 2.4.4, shows these models to be δ0 -irreducible T-chains
when the integral R(x) of (2.32) is finite for all x.
Thus the virtual equivalence of the petite compact set condition and the T-chain
condition provides an easy path to showing the existence of continuous components for
many models with a real atom in the space.
Assessing conditions for non-atomic chains to be T-chains is not quite as simple in
general. However, we can describe exactly what the continuous component condition
defining T-chains means in the case of the random walk. Recall that the random walk
is called spread out if some convolution power Γ^{n∗} is non-singular with respect to µ_Leb on R.
Proposition 6.3.2. The unrestricted random walk is a T-chain if and only if it is
spread out.
Proof If Γ is spread out, then for some M and some positive function γ we have

    P^M(x, A) = Γ^{M∗}(A − x) ≥ ∫_{A−x} γ(y) dy =: T(x, A),
and exactly as in the proof of Proposition 6.1.3, it follows that T is strong Feller: the spread-out assumption ensures that T(x, X) > 0 for all x, and so by choosing the sampling distribution as a = δ_M we find that Φ is a T-chain.
The converse is somewhat harder, since we do not know a priori that when Φ is a
T-chain, the component T can be chosen to be translation invariant. So let us assume
that the result is false, and choose A such that µ_Leb(A) = 0 but Γ^{n∗}(A) = 1 for every n. Then Γ^{n∗}(A^c) = 0 for all n, and so for the sampling distribution a associated with the component T,

    T(0, A^c) ≤ K_a(0, A^c) = Σ_n Γ^{n∗}(A^c) a(n) = 0.
The non-triviality of the component T thus ensures T(0, A) > 0, and since T(x, A) is lower semicontinuous, there exists a neighborhood O of {0} and a δ > 0 such that

    T(x, A) ≥ δ > 0,    x ∈ O.

Since T is a component of K_a, this ensures

    K_a(x, A) ≥ δ > 0,    x ∈ O.
But as in (6.8), by Fubini’s Theorem and the translation invariance of µ_Leb we have

    µ_Leb(A) = ∫_R µ_Leb(dy) Γ^{n∗}(A − y)
             = ∫_R µ_Leb(dy) P^n(y, A).    (6.9)

Multiplying both sides of (6.9) by a(n) and summing gives

    µ_Leb(A) = ∫_R µ_Leb(dy) K_a(y, A)
             ≥ ∫_O µ_Leb(dy) K_a(y, A)    (6.10)
             ≥ δ µ_Leb(O),
and since µL e b (O) > 0, we have a contradiction.
This example illustrates clearly the advantage of requiring only a continuous com-
ponent, rather than the Feller property for the chain itself.
To obtain a continuous component for the LSS(F ,G) model, our approach is similar
to that in deriving its irreducibility properties in Section 4.4. We require that the
set of possible reachable states be large for the associated deterministic linear control
system, and we also require that the set of reachable states remain large when the
control sequence u is replaced by the random disturbance W . One condition sufficient
to ensure this is
Using (6.11) we now show that the n-step transition kernel itself possesses a contin-
uous component provided, firstly, Γ is nonsingular with respect to Lebesgue measure
and secondly, the chain X can be driven to a sufficiently large set of states in Rn
through the action of the disturbance process W = {Wk } as described in the last term
of (6.11). This second property is a consequence of the controllability of the associated
model LCM(F ,G).
In Chapter 7 we will show that this construction extends further to more complex
nonlinear models.
Proposition 6.3.3. Suppose the deterministic control model LCM(F ,G) on Rn satisfies
the controllability condition (LCM3), and the associated LSS(F ,G) model X satisfies
the nonsingularity condition (LSS4).
Then the n-skeleton possesses a continuous component which is everywhere non-
trivial, so that X is a T-chain.
Proof We will prove this result in the special case where W is a scalar. The general
case with W ∈ Rp is proved using the same methods as in the case where p = 1, but
much more notation is needed for the required change of variables [272].
Let f denote an arbitrary positive function on X = Rn . From (6.11) together with
non-singularity of the disturbance process W we may bound the conditional mean of
f(Φ_n) as follows:

    P^n f(x_0) = E[f(F^n x_0 + Σ_{i=0}^{n−1} F^i G W_{n−i})]    (6.12)

               ≥ ∫ · · · ∫ f(F^n x_0 + Σ_{i=0}^{n−1} F^i G w_{n−i}) γ_w(w_1) · · · γ_w(w_n) dw_1 . . . dw_n.
Letting C_n denote the controllability matrix in (4.13) and defining the vector-valued random variable W̄_n = (W_1, . . . , W_n)′, we define the kernel T as

    T f(x) := ∫ f(F^n x + C_n w̄_n) γ_w(w̄_n) dw̄_n,

where γ_w(w̄_n) := γ_w(w_1) · · · γ_w(w_n). We have T(x, X) = {∫ γ_w(x) dx}^n > 0, which shows that T is everywhere non-trivial; and T is a component of P^n since (6.12) may be written in terms of T as

    P^n f(x_0) ≥ ∫ f(F^n x_0 + C_n w̄_n) γ_w(w̄_n) dw̄_n = T f(x_0).    (6.13)
Let |C_n| denote the determinant of C_n, which is non-zero since the pair (F, G) is controllable. Making the change of variables

    v̄_n = C_n w̄_n,    dv̄_n = |C_n| dw̄_n,

we obtain

    T f(x_0) = |C_n|^{−1} ∫ f(F^n x_0 + v̄_n) γ_w(C_n^{−1} v̄_n) dv̄_n.
By Lemma D.4.3 and the Dominated Convergence Theorem, the right hand side of this
identity is a continuous function of x0 whenever f is bounded. This combined with
(6.13) shows that T is a continuous component of P n .
In particular this shows that the ARMA process (ARMA1) and any of its variations
may be modeled as a T-chain if the noise process W is sufficiently rich with respect to
Lebesgue measure, since they possess a controllable realization from Proposition 4.4.2.
In general, we can also obtain a T-chain by restricting the process to a controllable
subspace of the state space in the manner indicated after Proposition 4.4.3.
We will use the following lemma to control the growth of the models below.
Lemma 6.3.4. Let ρ(F) denote the modulus of the eigenvalue of F of maximum modulus, where F is an n × n matrix. Then for any matrix norm ‖ · ‖ we have the limit

    log ρ(F) = lim_{n→∞} (1/n) log ‖F^n‖.    (6.14)
Proof The existence of the limit (6.14) follows from the Jordan Decomposition
and is a standard result from linear systems theory: see [57] or Exercises 2.I.2 and 2.I.5
of [102] for details.
A consequence of Lemma 6.3.4 is that for any constants ρ_0, ρ_1 satisfying ρ_0 < ρ(F) < ρ_1, there exists c > 1 such that

    c^{−1} ρ_0^n ≤ ‖F^n‖ ≤ c ρ_1^n.    (6.15)
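The limit (6.14) and the bounds (6.15) are easy to observe numerically. A minimal sketch for a hypothetical 2 × 2 matrix F (illustrative only):

    import numpy as np

    F = np.array([[0.9, 1.0],
                  [0.0, 0.5]])
    rho = max(abs(np.linalg.eigvals(F)))     # spectral radius rho(F) = 0.9
    for n in (1, 5, 25, 125):
        Fn = np.linalg.matrix_power(F, n)
        # (1/n) log ||F^n|| approaches log rho(F) as n grows
        print(n, np.log(np.linalg.norm(Fn)) / n, np.log(rho))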
Hence for the linear state space model, under the eigenvalue condition (LSS5), the convergence F^n → 0 takes place at a geometric rate. This property is used in the following result to give conditions under which the linear state space model is irreducible.
Proposition 6.3.5. Suppose that the LSS(F ,G) model X satisfies the density condition
(LSS4) and the eigenvalue condition (LSS5), and that the associated control system
LCM(F ,G) is controllable.
Then X is a ψ-irreducible T-chain and every compact subset of X is small.
Proof We have seen in Proposition 6.3.3 that the linear state space model is a
T-chain under these conditions. To obtain irreducibility we will construct a reachable
state and use Proposition 6.2.1.
Let w^* denote any element of the support of the distribution Γ of W, and let

    x^* = Σ_{k=0}^∞ F^k G w^*.
If in (1.4) the control u_k = w^* for all k, then the system x_k converges to x^* uniformly for initial conditions in compact subsets of X.
By (pointwise) continuity of the model, it follows that for any bounded set A ⊂ X and open set O containing x^*, there exists ε > 0 sufficiently small and N ∈ Z_+ sufficiently large such that x_N ∈ O whenever x_0 ∈ A and u_i ∈ w^* + εB for 1 ≤ i ≤ N, where B denotes the open unit ball centered at the origin in X. Since w^* lies in the support of the distribution of W_k we can conclude that P^N(x_0, O) ≥ Γ(w^* + εB)^N > 0 for x_0 ∈ A.
Hence x^* is reachable, which by Proposition 6.2.1 and Proposition 6.3.3 implies that Φ is ψ-irreducible for some ψ.
We now show that all bounded sets are small, rather than merely petite. Propo-
sition 6.3.3 shows that P n possesses a strong Feller component T . By Theorem 5.2.2
there exists a small set C for which T(x^*, C) > 0 and hence, by the continuity of T( · , C), an open set O containing x^* exists for which

    T(x, C) ≥ δ > 0,    x ∈ O.

By Proposition 5.2.4, O is also a small set. If A is a bounded set, then we have already shown that A ⇝_{δ_N} O for some N, so applying Proposition 5.2.4 once more we have the desired conclusion that A is small.
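The quantities appearing in this proof are readily computed for a concrete model. The sketch below (a hypothetical stable pair (F, G); illustrative only) forms the controllability matrix, checks its rank, and computes the reachable state x^* = Σ_k F^k G w^* = (I − F)^{−1}Gw^*:

    import numpy as np

    F = np.array([[0.5, 0.2],
                  [0.0, 0.4]])          # rho(F) < 1, so (LSS5) holds
    G = np.array([[1.0],
                  [1.0]])
    n = F.shape[0]

    # controllability matrix C_n = [F^{n-1}G | ... | G]
    Cn = np.hstack([np.linalg.matrix_power(F, n - 1 - i) @ G for i in range(n)])
    print(np.linalg.matrix_rank(Cn))    # -> 2: the pair (F, G) is controllable

    w_star = 1.0                        # any point in supp Gamma
    # sum_k F^k G w* = (I - F)^{-1} G w*, valid since rho(F) < 1
    x_star = np.linalg.solve(np.eye(n) - F, G @ np.array([w_star]))
    print(x_star.ravel())               # the reachable state x*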
Even though this model is not Feller, due to the possible presence of discontinuities
at the boundary points {ri }, we can establish
Proposition 6.3.6. Under (SETAR1) and (SETAR2), the SETAR model is a ϕ-irreducible T-chain with ϕ taken as Lebesgue measure µ_Leb on R.
6.4 e-Chains
Now that we have developed some of the structural properties of T-chains that we will
require, we move on to a class of Feller chains which also have desirable structural
properties, namely e-chains.
Weak convergence
A sequence of probabilities {µ_k : k ∈ Z_+} ⊂ M converges weakly to µ_∞ ∈ M (denoted µ_k →_w µ_∞) if

    lim_{k→∞} ∫ f dµ_k = ∫ f dµ_∞

for every f ∈ C(X).
Due to our restrictions on the state space X, the topology of weak convergence is
induced by a number of metrics on M; see Section D.5. One such metric may be
expressed
    d_m(µ, ν) = Σ_{k=0}^∞ |∫ f_k dµ − ∫ f_k dν| 2^{−k},    µ, ν ∈ M,    (6.16)

where {f_k} is an appropriate set of functions in C_c(X), the set of continuous functions on X with compact support.
For (P, M, dm ) to be a dynamical system we require that P be a continuous map
on M. If P is continuous, then we must have in particular that if a sequence of point
masses {δ_{x_k} : k ∈ Z_+} ⊂ M converges to some point mass δ_{x_∞} ∈ M, then

    δ_{x_k} P →_w δ_{x_∞} P    as k → ∞,

or equivalently, lim_{k→∞} P f(x_k) = P f(x_∞) for all f ∈ C(X). That is, if the Markov
transition function induces a continuous map on M, then P f must be continuous for
any bounded continuous function f .
This is exactly the weak Feller property. Conversely, it is obvious that for any weak
Feller Markov transition function P , the associated operator P on M is continuous.
We have thus shown
Proposition 6.4.1. The triple (P, M, dm ) is a dynamical system if and only if the
Markov transition function P has the weak Feller property.
Although we do not get further immediate value from this result, since there do not
exist a great number of results in the dynamical systems theory literature to be exploited
in this context, these observations guide us to stronger and more useful continuity
conditions.
There is one striking result which very largely justifies our focus on e-chains, espe-
cially in the context of more stable chains.
Proposition 6.4.2. Suppose that the Markov chain Φ has the Feller property, and that
there exists a unique probability measure π such that for every x
    P^n(x, · ) →_w π.    (6.17)
Then Φ is an e-chain.
Proof Since the limit in (6.17) is continuous (and in fact constant) it follows from
Ascoli’s Theorem D.4.2 that the sequence of functions {P k f : k ∈ Z+ } is equicontinuous
on compact subsets of X whenever f ∈ C(X). Thus the chain Φ is an e-chain.
Thus chains with good limiting behavior, such as those in Part III in particular, are
forced to be e-chains, and in this sense the e-chain assumption is for many purposes a
minor extra step after the original Feller property is assumed.
Recall from Chapter 1 that the dynamical system (P, M, d_m) is called stable in the sense of Lyapunov if for each measure µ ∈ M,

    lim_{ν→µ} sup_{k≥0} d_m(νP^k, µP^k) = 0.
The following result creates a further link between classical dynamical systems theory,
and the theory of Markov chains on topological state spaces. The proof is routine and
we omit it.
Proposition 6.4.3. The Markov chain is an e-chain if and only if the dynamical system
(P, M, dm ) is stable in the sense of Lyapunov.
Conditions for such an invariant measure π to exist are the subject of considerable study
for ψ-irreducible chains in Chapter 10, and in Chapter 12 we return to this question for
weak Feller chains and e-chains.
A more immediately useful concept is that of Lagrange stability. Recall from Sec-
tion 1.3.2 that (P, M, dm ) is Lagrange stable if, for every µ ∈ M, the orbit of measures
µP k is a precompact subset of M. One way to investigate Lagrange stability for weak
Feller chains is to utilize the following concept, which will have much wider applicability
in due course.
Proposition 6.4.4. The chain Φ is bounded in probability if and only if the dynamical
system (P, M, dm ) is Lagrange stable.
For e-chains, the concepts of boundedness in probability and Lagrange stability also
interact to give a useful stability result for a somewhat different dynamical system.
The space C(X) can be considered as a normed linear space, where we take the norm
| · |_c to be defined for f ∈ C(X) as

    |f|_c := Σ_{k=0}^∞ 2^{−k} sup_{x∈C_k} |f(x)|,
where {Ck } is a sequence of open precompact sets whose union is equal to X. The
associated metric dc generates the topology of uniform convergence on compact subsets
of X.
If P is a weak Feller kernel, then the mapping P on C(X) is continuous with respect
to this norm, and in this case the triple (P, C(X), dc ) is a dynamical system.
By Ascoli’s Theorem D.4.2, (P, C(X), dc ) will be Lagrange stable if and only if for
each initial condition f ∈ C(X), the orbit {P k f : k ∈ Z+ } is uniformly bounded, and
equicontinuous on compact subsets of X. This fact easily implies
To summarize, for weak Feller chains boundedness in probability and the equiconti-
nuity assumption are, respectively, exactly the same as Lagrange stability and stability
in the sense of Lyapunov for the dynamical system (P, M, dm ); and these stability con-
ditions are both simultaneously satisfied if and only if the dynamical system (P, M, dm )
and its dual (P, C(X), dc ) are simultaneously Lagrange stable.
These connections suggest that equicontinuity will be a useful tool for studying the
limiting behavior of the distributions governing the Markov chain Φ, a belief which will
be justified in the results in Chapter 12 and Chapter 18.
Proof Let f ∈ Cc (X). By uniform continuity of f , for any ε > 0 we can find δ > 0
so that |f (x) − f (y)| ≤ ε whenever |x − y| ≤ δ. It follows from (6.19) that for any
n ∈ Z+ , and any x, y ∈ R with |x − y| ≤ δ,
    P^k f(x_0) → f_∞,    k → ∞,
where f_∞ is a constant. When x_0 = 0 we have that P^k f(x_0) = f(x_0) = f(0) for all k.
From these observations it is easy to see that X is not an e-chain. Take f ∈ C_c(X) with f(0) = 0 and f(x) ≥ 0 for all x > 0: we may assume without loss of generality that f_∞ > 0. Since the one-point set {0} is absorbing we have P^k(0, {0}) = 1 for all k, and it immediately follows that P^k f converges to a discontinuous function. By Ascoli’s Theorem the sequence of functions {P^k f : k ∈ Z_+} cannot be equicontinuous on compact subsets of R_+, which shows that X is not an e-chain.
However by modifying the topology on X = R+ we do obtain an e-chain as follows.
Define the topology on the strictly positive real line (0, ∞) in the usual way, and define
{0} to be open, so that X becomes a disconnected set with two open components. Then, in this topology, P^k f converges to a uniformly continuous function which is constant on each component of X. From this and Ascoli’s Theorem it follows that X is an e-chain.
It appears in general that such pathologies are typical of “non-e” Feller chains, and
this again reinforces the value of our results for e-chains, which constitute the more
typical behavior of Feller chains.
6.5 Commentary
The weak Feller chain has been a basic starting point in certain approaches to Markov
chain theory for many years. The work of Foguel [121, 123], Jamison [174, 175, 176], Lin [238], Rosenblatt [339] and Sine [356, 357, 358] has established a relatively rich
theory based on this approach, and the seminal book of Dynkin [105] uses the Feller
property extensively.
We will revisit this in much greater detail in Chapter 12, where we will also take up
the consequences of the e-chain assumption: this will be shown to have useful attributes
in the study of limiting behavior of chains.
The equicontinuity results here, which relate this condition to the dynamical systems
viewpoint, are developed by Meyn [260]. Equicontinuity may be compared to uniform
stability [174] or regularity [115]. Whilst e-chains have also been developed in detail,
particularly by Rosenblatt [337], Jamison [174, 175] and Sine [356, 357] they do not have
particularly useful connections with the ψ-irreducible chains we are about to explore,
which explains their relatively brief appearance at this stage.
The concept of continuous components appears first in Pollard and Tweedie [318,
319], and some practical applications are given in Laslett et al. [237]. The real exploitation of this concept begins in Tuominen and Tweedie [391], from which we take Proposition 6.2.2. The connection between T-chains and the existence of compact petite sets is a recent result of Meyn and Tweedie [277].
In practice the identification of ψ-irreducible Feller chains as T-chains provided only
that supp ψ has non-empty interior is likely to make the application of the results for
such chains very much more common. This identification is new. The condition that
supp ψ have non-empty interior has however proved useful in a number of associated
areas in [319] and in Cogburn [75].
We note in advance here the results of Chapter 9 and Chapter 18, where we will
show that a number of stability criteria for general space chains have “topological”
analogues which, for T-chains, are exact equivalences. Thus T-chains will prove of
on-going interest.
Chapter 7

The nonlinear state space model

In applying the results and concepts of Part I in the domains of time series or systems
theory, we have so far analyzed only linear models in any detail, albeit rather general
and multidimensional ones. This chapter is intended as a relatively complete description
of the way in which nonlinear models may be analyzed within the Markovian context
developed thus far. We will consider both the general nonlinear state space model, and
some specific applications which take on this particular form.
The pattern of this analysis is to consider first some particular structural or sta-
bility aspect of the associated deterministic control, or CM(F ), model and then under
appropriate choice of conditions on the disturbance or noise process (typically a den-
sity condition as in the linear models of Section 6.3.2) to verify a related structural or
stability aspect of the stochastic nonlinear state space NSS(F ) model.
Highlights of this duality are
(ii) a form of irreducibility (the existence of a globally attracting state for the CM(F )
model) is then equivalent to the associated NSS(F ) model being a ψ-irreducible
T-chain (Section 7.2);
(iii) the existence of periodic classes for the forward accessible CM(F ) model is fur-
ther equivalent to the associated NSS(F ) model being a periodic Markov chain,
with the periodic classes coinciding for the deterministic and the stochastic model
(Section 7.3).
Thus we can reinterpret some of the concepts which we have introduced for Markov
chains in this deterministic setting; and conversely, by studying the deterministic model
we obtain criteria for our basic assumptions to be valid in the stochastic case.
In Section 7.4.3 the adaptive control model is considered to illustrate how these
results may be applied in specific applications: for this model we exploit the fact that
7.1 Forward accessibility and continuous components
We define A+ (x) to be the set of all states which are reachable from x at some time in
the future, given by
    A_+(x) := ⋃_{k=0}^∞ A_+^k(x).    (7.3)
The analogue of controllability that we use for the nonlinear model is called forward
accessibility.
Forward accessibility
The associated control model CM(F ) is called forward accessible if for each
x0 ∈ X, the set A+ (x0 ) ⊂ X has non-empty interior.
For general nonlinear models, forward accessibility depends critically on the partic-
ular control set Ow chosen. This is in contrast to the linear state space model, where
conditions on the driving matrix pair (F, G) sufficed for controllability.
Nonetheless, for the scalar nonlinear state space model we may show that forward
accessibility is equivalent to the following “rank condition”, similar to (LCM3):
In the scalar linear case the control system (7.1) has the form
xk = F xk −1 + Guk ,
with F and G scalars. In this special case the derivative in (CM2) becomes exactly
[F k −1 G| · · · |F G|G], which shows that the rank condition (CM2) is a generalization of
the controllability condition (LCM3) for the linear state space model. This connection
will be strengthened when we consider multidimensional nonlinear models below.
Theorem 7.1.1. The control model CM(F ) is forward accessible if and only if the rank
condition (CM2) is satisfied.
A proof of this result would take us too far from the purpose of this book. It is
similar to that of Proposition 7.1.2, and details may be found in [271, 272].
We know from the definitions that, with probability one, Wk ∈ Ow for all k ∈ Z+ .
Commonly assumed noise distributions satisfying this assumption include those which
possess a continuous density, such as the Gaussian model, or uniform distributions on
bounded open intervals in R.
We can now develop an explicit continuous component for such scalar nonlinear
state space models.
Proposition 7.1.2. Suppose that for the SNSS(F ) model, the noise distribution satis-
fies (SNSS3), and that the associated control system CM(F ) is forward accessible. Then
the SNSS(F ) model is a T-chain.
Proof Since CM(F ) is forward accessible we have from Theorem 7.1.1 that the
rank condition (CM2) holds. For simplicity of notation, assume that the derivative
with respect to the kth disturbance variable is non-zero:

    (∂F_k/∂w_k)(x_0^0, w_1^0, . . . , w_k^0) ≠ 0.    (7.5)

Consider then the map

    F^k(x_0, w_1, . . . , w_k) := (x_0, w_1, . . . , w_{k−1}, F_k(x_0, w_1, . . . , w_k)),

whose Jacobian is lower triangular with diagonal entries (1, . . . , 1, ∂F_k/∂w_k), and which is evidently full rank at (x_0^0, w_1^0, . . . , w_k^0). It follows from the Inverse Function Theorem that there exists an open set

    B = B_{x_0^0} × B_{w_1^0} × · · · × B_{w_k^0}

containing (x_0^0, w_1^0, . . . , w_k^0), and a smooth function G_k : F^k{B} → R^{k+1} such that

    G_k(F^k(x_0, w_1, . . . , w_k)) = (x_0, w_1, . . . , w_k).
We now make a change of variables, similar to the linear case. For any x_0 ∈ B_{x_0^0} and any positive function f : R → R_+,

    P^k f(x_0) = ∫ · · · ∫ f(F_k(x_0, w_1, . . . , w_k)) γ_w(w_k) · · · γ_w(w_1) dw_1 · · · dw_k    (7.6)

              ≥ ∫_{B_{w_1^0}} · · · ∫_{B_{w_k^0}} f(F_k(x_0, w_1, . . . , w_k)) γ_w(w_k) · · · γ_w(w_1) dw_1 · · · dw_k.
We will first integrate over w_k, keeping the remaining variables fixed. By making the change of variables

    x_k = F_k(x_0, w_1, . . . , w_k),    w_k = G_k(x_0, w_1, . . . , w_{k−1}, x_k),

so that

    dw_k = |(∂G_k/∂x_k)(x_0, w_1, . . . , w_{k−1}, x_k)| dx_k,

we obtain, for (x_0, w_1, . . . , w_{k−1}) ∈ B_{x_0^0} × · · · × B_{w_{k−1}^0},

    ∫_{B_{w_k^0}} f(F_k(x_0, w_1, . . . , w_k)) γ_w(w_k) dw_k = ∫_R f(x_k) q_k(x_0, w_1, . . . , w_{k−1}, x_k) dx_k,    (7.7)
where ξ^0 = (x_0^0, w_1^0, . . . , w_{k−1}^0, x_k^0). We will show that T_0 f is lower semicontinuous on R whenever f is positive and bounded.
Since q_k(x_0, w_1, . . . , w_{k−1}, x_k) γ_w(w_1) · · · γ_w(w_{k−1}) is a lower semicontinuous function of its arguments in R^{k+1}, there exists a sequence of positive, continuous functions r_i : R^{k+1} → R_+, i ∈ Z_+, such that for each i, the function r_i has bounded support and, as i ↑ ∞,
It follows from the dominated convergence theorem that T_i f is continuous for any bounded function f. If f is also positive, then as i ↑ ∞,

    T_i f(x_0) ↑ T_0 f(x_0),    x_0 ∈ R,
    X_{k+1} = θX_k + bW_{k+1}X_k + W_{k+1},
where W is a disturbance process. To place this bilinear model into the framework of
this chapter we assume
Under (SBL1) and (SBL2), the bilinear model X is an SNSS(F ) model with F
defined in (2.7).
First observe that the one-step transition kernel P for this model cannot possess
an everywhere non-trivial continuous component. This may be seen from the fact that
P(−1/b, {−θ/b}) = 1, yet P(x, {−θ/b}) = 0 for all x ≠ −1/b. It follows that the only positive lower semicontinuous function which is majorized by P( · , {−θ/b}) is zero, and thus any continuous component T of P must be trivial at −1/b: that is, T(−1/b, R) = 0.
This could be anticipated by looking at the controllability vector (7.4). The first
order controllability vector is

    (∂F/∂u)(x_0, u_1) = bx_0 + 1,

which is zero at x_0 = −1/b, and thus the first order test for forward accessibility fails.
Hence we must take k ≥ 2 in (7.4) if we hope to construct a continuous component. When k = 2 the vector (7.4) can be computed using the chain rule to give

    [ (∂F/∂x)(x_1, u_2)(∂F/∂u)(x_0, u_1) | (∂F/∂u)(x_1, u_2) ]
        = [ (θ + bu_2)(bx_0 + 1) | bx_1 + 1 ]
        = [ (θ + bu_2)(bx_0 + 1) | θbx_0 + b^2 u_1 x_0 + bu_1 + 1 ],
which is non-zero for almost every (u_1, u_2)′ ∈ R^2. Hence the associated control model is forward accessible, and this together with Proposition 7.1.2 gives
Proposition 7.1.3. If (SBL1) and (SBL2) hold, then the bilinear model is a T-chain.
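The computation leading to Proposition 7.1.3 can be reproduced symbolically. The sketch below (using sympy; illustrative only) builds x_2 = F(F(x_0, u_1), u_2) and differentiates, recovering the second order controllability vector displayed above:

    import sympy as sp

    x0, u1, u2, theta, b = sp.symbols('x0 u1 u2 theta b')
    F = lambda x, u: theta * x + b * u * x + u    # the bilinear map

    x1 = F(x0, u1)
    x2 = F(x1, u2)
    # second order controllability vector: derivatives of x2 in u1 and u2
    C2 = [sp.simplify(sp.diff(x2, u1)), sp.simplify(sp.diff(x2, u2))]
    print(C2)    # the two entries match the display above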
    x_k = F_k(x_0, u_1, . . . , u_k),    k ∈ Z_+.    (7.9)

Let C_{x_0}^k = C_{x_0}^k(u_1, . . . , u_k) denote the generalized controllability matrix (along the sequence u_1, . . . , u_k); for the linear model it reduces to

    C_{x_0}^k = [F^{k−1}G | · · · | G],
Proposition 7.1.4. The nonlinear control model CM(F) satisfying (7.9) is forward accessible if and only if the rank condition (CM3) holds.
Using an argument which is similar to, but more complicated than the proof of
Proposition 7.1.2, we may obtain the following consequence of forward accessibility.
Proposition 7.1.5. If the NSS(F ) model satisfies the density assumption (NSS3), and
the associated control model is forward accessible, then the state space X may be written
as the union of open small sets, and hence the NSS(F ) model is a T-chain.
Note that this only guarantees the T-chain property: we now move on to consider the equally needed irreducibility properties of the NSS(F) models.
is also invariant.
The following result summarizes these observations:
Proposition 7.2.1. For the control system (7.9) we have for any C ⊂ X,
(i) A_+(C) and its closure are invariant;

(ii) Ω_+(C) is invariant;

(iii) C^0 is invariant, and C^0 is also closed if the set C is open.
As a consequence of the assumption that the map F is smooth, and hence continuous,
we then have immediately
Proposition 7.2.2. If the associated CM(F ) model is forward accessible, then for the
NSS(F ) model:
(i) a closed subset A ⊂ X is absorbing for NSS(F ) if and only if it is invariant for
CM(F );
(ii) if U ⊂ X is open, then for each k ≥ 1 and x ∈ X,

    A_+^k(x) ∩ U ≠ ∅  ⟺  P^k(x, U) > 0;
Minimal sets
We call a set minimal for the deterministic control model CM(F ) if it is
(topologically) closed, invariant, and does not contain any closed invariant
set as a proper subset.
For example, consider the LCM(F ,G) model introduced in (1.4). The assumption
(LCM2) simply states that the control set Ow is equal to Rp .
In this case the system possesses a unique minimal set M which is equal to X0 , the
range space of the controllability matrix, as described after Proposition 4.4.3. If the
eigenvalue condition (LSS5) holds then this is the only minimal set for the LCM(F ,G)
model.
The following characterizations of minimality follow directly from the definitions,
and the fact that both A+ (x) and Ω+ (x) are closed and invariant.
Proposition 7.2.3. The following are equivalent for a nonempty set M ⊂ X:
(i) M is minimal for CM(F );
(ii) A+ (x) = M for all x ∈ M ;
(iii) Ω+ (x) = M for all x ∈ M .
Theorem 7.2.4. Let M ⊂ X be a minimal set for CM(F ). If CM(F ) is forward acces-
sible and the disturbance process of the associated NSS(F ) model satisfies the density
condition (NSS3), then
(ii) the NSS(F ) model restricted to M is an open set irreducible (and so ψ-irreducible)
T-chain.
Consider, for example, a control system on X = R whose control set consists of non-negative values, of the form

    x_{k+1} = F(x_k, u_{k+1}) = x_k + u_{k+1},
so that all proper closed invariant sets are of the form [t, ∞) for some t ∈ R. This
system is indecomposable, yet no minimal sets exist.
Globally attracting states

A state x^* ∈ X is called globally attracting if, for every initial condition y ∈ X,

    x^* ∈ Ω_+(y).
Proposition 7.2.5. (i) The nonlinear control system (7.9) is M-irreducible if and only if a globally attracting state exists.

(ii) If a globally attracting state x^* exists, then the unique minimal set is equal to A_+(x^*) = Ω_+(x^*).
We can now provide the desired connection between irreducibility of the nonlinear
control system and ψ-irreducibility for the corresponding Markov chain.
Theorem 7.2.6. Suppose that CM(F ) is forward accessible and the disturbance process
of the associated NSS(F ) model satisfies the density condition (NSS3).
Then the NSS(F ) model is ψ-irreducible if and only if CM(F ) is M -irreducible.
Proof If the NSS(F) model is ψ-irreducible, let x^* be any state in supp ψ, and let U be any open set containing x^*. By definition we have ψ(U) > 0, which implies that K_{a_ε}(x, U) > 0 for all x ∈ X. By Proposition 7.2.2 it follows that x^* is globally attracting, and hence CM(F) is M-irreducible by Proposition 7.2.5.
Conversely, suppose that CM(F) possesses a globally attracting state x^*, and let U be an open petite set containing x^*. Then A_+(x) ∩ U ≠ ∅ for all x ∈ X, which by Proposition 7.2.2 and Proposition 5.5.4 implies that the NSS(F) model is ψ-irreducible for some ψ.
This will be denoted B →_k C. From the Implicit Function Theorem, in a manner similar to the proof of Proposition 7.1.2, we can immediately connect k-accessibility with forward accessibility.
Proposition 7.3.1. Suppose that the CM(F) model is forward accessible. Then for each x ∈ X there exist open sets B_x, C_x ⊂ X, with x ∈ B_x, and an integer k_x ∈ Z_+ such that B_x →_{k_x} C_x.
Proof Using Proposition 7.3.1 we find that there exist open sets B and C, and an integer k with B →_k C, such that B ∩ M ≠ ∅. Since M is invariant, it follows that

    C ⊂ A_+(B ∩ M) ⊂ M,    (7.15)

and by Proposition 7.2.1, minimality, and the hypothesis that the set B is open,

    A_+(x) ∩ B ≠ ∅    (7.16)

for every x ∈ M.
Combining (7.15) and (7.16), it follows that A_+^m(c) ∩ B ≠ ∅ for some m ∈ Z_+ and c ∈ C. By continuity of the function F we conclude that there exists an open set E ⊂ C such that

    A_+^m(x) ∩ B ≠ ∅    for all x ∈ E.

The set E satisfies the conditions of the lemma with n = m + k since, by the semi-group property,

    A_+^n(x) = A_+^k(A_+^m(x)) ⊃ A_+^k(A_+^m(x) ∩ B) ⊃ C ⊃ E

for all x ∈ E.
Call a finite ordered collection of disjoint closed sets G := {G_i : 1 ≤ i ≤ d} a periodic orbit if for each i,

    A_+^1(G_i) ⊆ G_{i+1},    i = 1, . . . , d  (mod d).
Proof Using Lemma 7.3.2 we can fix an open set E with E ⊂ M, and an integer k such that E →_k E. Define I ⊂ Z_+ by

    I := {n ≥ 1 : E →_n E}.    (7.17)
The semi-group property implies that the set I is closed under addition: for if i, j ∈ I, then for all x ∈ E,

    A_+^{i+j}(x) = A_+^j(A_+^i(x)) ⊃ A_+^j(E) ⊃ E.
Let d denote g.c.d.(I). The integer d will be called the period of M , and M will be
called aperiodic when d = 1.
For 1 ≤ i ≤ d we define

    G_i := ⋃_{k=1}^∞ {x ∈ M : A_+^{kd−i}(x) ∩ E ≠ ∅}.    (7.18)

By Proposition 7.2.1 it follows that M = ⋃_{i=1}^d G_i.
Since E is an open subset of M , it follows that for each i ∈ Z+ , the set Gi is open
in the relative topology on M . Once we have shown that the sets {Gi } are disjoint, it
will follow that they are closed in the relative topology on M . Since M itself is closed,
this will imply that for each i, the set Gi is closed.
We now show that the sets {G_i} are disjoint. Suppose that on the contrary x ∈ G_i ∩ G_j for some i ≠ j. Then there exist k_i, k_j ∈ Z_+ such that

    A_+^{k_i d−i}(y) ∩ E ≠ ∅    and    A_+^{k_j d−j}(y) ∩ E ≠ ∅    (7.19)

when y = x. Since E is open, we may find an open set O ⊂ X containing x such that (7.19) holds for all y ∈ O.
By Proposition 7.2.1, there exist v ∈ E and n ∈ Z_+ such that

    A_+^n(v) ∩ O ≠ ∅.    (7.20)
By (7.20), (7.19), and since E →_{k_0} E, we have for δ = i, j, and all z ∈ E,

    A_+^{k_0 + k_δ d − δ + n + k_0}(z) ⊃ A_+^{k_0 + k_δ d − δ + n}(E)
                                      ⊃ A_+^{k_0 + k_δ d − δ}(A_+^n(v) ∩ O)
                                      ⊃ A_+^{k_0}(A_+^{k_δ d − δ}(A_+^n(v) ∩ O) ∩ E) ⊃ E.

This shows that

    2k_0 + k_δ d − δ + n ∈ I
for δ = i, j, and this contradicts the definition of d. We conclude that the sets {Gi }
are disjoint.
We now show that G is a periodic orbit. Let x ∈ G_i and u ∈ O_w. Since the sets {G_i} form a disjoint cover of M, and since M is invariant, there exists a unique 1 ≤ j ≤ d such that F(x, u) ∈ G_j. It follows from the semi-group property that x ∈ G_{j−1}, and hence i = j − 1.
The uniqueness of this construction follows from the definition given in equation
(7.18).
The following consequence of Theorem 7.3.3 further illustrates the topological struc-
ture of minimal sets.
Proposition 7.3.4. Under the conditions of Theorem 7.3.3, if the control set Ow is
connected, then the periodic orbit G constructed in Theorem 7.3.3 is precisely equal to
the connected components of the minimal set M .
In particular, in this case M is aperiodic if and only if it is connected.
Proof   First suppose that M is aperiodic. Let E →^n E, and consider a fixed state
v ∈ E.
By aperiodicity and Lemma D.7.4 there exists an integer N_0 with the property that
k ∈ I for all k ≥ N_0. Since A^k_+(v) is the continuous image of the connected set
{v} × O_w^k, the set

A_+(A^{N_0}_+(v)) = ⋃_{k=N_0}^∞ A^k_+(v)    (7.22)

is connected. Its closure is therefore also connected, and by Proposition 7.2.1 the closure
of the set (7.22) is equal to M.
The periodic case is treated similarly. First we show that for some N_0 ∈ Z_+ we have

G_d = ⋃_{k=N_0}^∞ A^{kd}_+(v),

where d is the period of M, and each of the sets A^{kd}_+(v), k ≥ N_0, contains v.
This shows that G_d is connected. Next, observe that

G_1 = A^1_+(G_d),

and since the control set O_w and G_d are both connected, it follows that G_1 is also
connected. By induction, each of the sets {G_i : 1 ≤ i ≤ d} is connected.
7.3.2 Periodicity
All of the results described above dealing with periodicity of minimal sets were posed
in a purely deterministic framework. We now return to the stochastic model described
by (NSS1)–(NSS3) to see how the deterministic formulation of periodicity relates to the
stochastic definition which was introduced for Markov chains in Section 5.4.
As one might hope, the connections are very strong.
Theorem 7.3.5. If the NSS(F ) model satisfies conditions (NSS1)–(NSS3) and the
associated control model CM(F ) is forward accessible then:
(i) if M is a minimal set, then the restriction of the NSS(F ) model to M is a ψ-
irreducible T-chain, and the periodic orbit {Gi : 1 ≤ i ≤ d} ⊂ M whose existence
is guaranteed by Theorem 7.3.3 is ψ-a.e. equal to the d-cycle constructed in The-
orem 5.4.4;
(ii) if CM(F ) is M -irreducible, and if its unique minimal set M is aperiodic, then the
NSS(F ) model is a ψ-irreducible aperiodic T-chain.
Proof The proof of (i) follows directly from the definitions, and the observation
that by reducing E if necessary, we may assume that the set E which is used in the proof
of Theorem 7.3.3 is small. Hence the set E plays the same role as the small set used in
the proof of Theorem 5.2.1. The proof of (ii) follows from (i) and Theorem 7.2.4.
The control model is thus forward accessible, and hence Φ = (θ, Y) is a T-chain.
Suppose now that the bound (7.24) holds for z^∗ and let w^∗ denote any element of
O_w ⊆ R. If Z_k and W_k are set equal to z^∗ and w^∗ respectively in (7.23) then as k → ∞

(θ_k, Y_k)′ → x^∗ := ( z^∗(1 − α)^{−1}, w^∗(1 − α)(1 − α − z^∗)^{−1} )′.
The state x∗ is globally attracting, and it immediately follows from Proposition 7.2.5
and Theorem 7.2.6 that the chain is ψ-irreducible. Aperiodicity then follows from the
fact that any cycle must contain the state x∗ .
which is of the form (NSS1), with the associated CM(F) model defined as

F((x^a, x^b)′, u) = (−1/x^a + 1/x^b, x^a)′ + (u, 0)′.    (7.25)
Proposition 7.4.2. The NSS(F ) model (2.12) is a T-chain if the disturbance sequence
W satisfies condition (NSS3).
Proposition 7.4.3. If (SAC1) and (SAC2) hold for the adaptive control model defined
by (2.22)–(2.24), and if σ_z^2 < 1, then Φ is a ψ-irreducible and aperiodic T-chain.
Proof To prove the result we show that the associated deterministic control model
for the nonlinear state space model defined by (2.22)–(2.24) is forward accessible and,
for the associated deterministic control system, a globally attracting point exists.
The second-order controllability matrix has the form

C^2_{Φ_0}(Z_2, W_2, Z_1, W_1) := ∂(Σ_2, θ̃_2, Y_2)/∂(Z_2, W_2, Z_1, W_1)

    = [ −2α^2 σ_w^2 Σ_1^2 Y_1/(Σ_1 Y_1^2 + σ_w^2)^2   0   0   0 ]
      [ •                                             •   1   • ]
      [ •                                             •   0   1 ]

where “•” denotes a variable which does not affect the rank of the controllability
matrix. It is evident that C^2_{Φ_0} is full rank whenever Y_1 = θ̃_0 Y_0 + W_1 is non-zero.
This shows that for each initial condition Φ_0 ∈ X, the matrix C^2_{Φ_0} is full rank for a.e.
{(Z_1, W_1), (Z_2, W_2)} ∈ R^4, and so the associated control model is forward accessible,
and hence the stochastic model is a T-chain by Proposition 7.1.5.
It is easily checked that if W is set equal to zero in (2.22)–(2.23) then, since α < 1
and σ_z^2 < 1,

Φ_k → (σ_z^2/(1 − α^2), 0, 0)  as k → ∞.
This shows that the control model associated with the Markov chain Φ is M-irreducible,
and hence by Proposition 7.2.6 the chain itself is ψ-irreducible. The limit above also
shows that every element of a cycle {G_i} for the unique minimal set must contain the
point (σ_z^2/(1 − α^2), 0, 0). From Proposition 7.3.4 it follows that the chain is aperiodic.
where ∇Φ takes values in the set of n × n matrices, and DF denotes the derivative of
F with respect to its first variable.
Since ∇Φ_0 = I it follows from the chain rule and induction that the sensitivity process
is in fact the derivative of the present state with respect to the initial state: that is,

∇Φ_k = dΦ_k/dΦ_0  for all k ∈ Z_+.
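As a purely illustrative aside (not from the original text), the chain-rule recursion behind this identity, ∇Φ_{k+1} = DF(Φ_k, W_{k+1})∇Φ_k, can be computed along a simulated path; the scalar model F(x, w) = 0.5 sin(x) + w below is an assumed toy example, and the finite-difference comparison is only a numerical sanity check.

import numpy as np

# A toy scalar NSS(F) model (an assumed example): F(x, w) = 0.5*sin(x) + w,
# with DF(x, w) = 0.5*cos(x) its derivative in the first variable.
def F(x, w):
    return 0.5 * np.sin(x) + w

def DF(x, w):
    return 0.5 * np.cos(x)

rng = np.random.default_rng(0)
W = rng.normal(size=50)            # one disturbance sample path
x, grad = 1.0, 1.0                 # Phi_0 and d Phi_0 / d Phi_0 = 1
for w in W:
    grad = DF(x, w) * grad         # grad_{k+1} = DF(Phi_k, W_{k+1}) * grad_k
    x = F(x, w)                    # Phi_{k+1} = F(Phi_k, W_{k+1})

# Finite-difference sanity check that grad = d Phi_k / d Phi_0 along this path
h, xp = 1e-6, 1.0 + 1e-6
for w in W:
    xp = F(xp, w)
print(grad, (xp - x) / h)          # the two values should agree closely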
The main result in this section connects stability of the derivative process with
equicontinuity of the transition function for Φ. Since the system (7.26) is closely related
to the system (NSS1), linearized about the sample path (Φ0 , Φ1 , . . . ), it is reasonable
to expect that the stability of Φ will be closely related to the stability of ∇Φ .
Theorem 7.5.1. Suppose that (NSS1)–(NSS3) hold for the NSS(F) model. Then, letting
∇Φ_k denote the derivative of Φ_k with respect to Φ_0, k ∈ Z_+, we have

(i) if for some open convex set N ⊂ X,

E[ sup_{Φ_0 ∈ N} ‖∇Φ_k‖ ] < ∞    (7.27)

sup_{y ∈ C} sup_{k ≥ 0} E_y[ ‖∇Φ_k‖ ] < ∞.

Then Φ is an e-chain.
Then letting ∇Φ_k denote the derivative of Φ_k with respect to Φ_0, we have for any compact
set C ⊂ X, and any k ≥ 0,

E[ sup_{Φ_0 ∈ C} ‖∇Φ_k‖ ] < ∞.
∇Φ_k = F^k,

which tends to zero exponentially fast, by Lemma 6.3.4. The conditions of Theo-
rem 7.5.1 are thus satisfied, which completes the proof.
Observe that Proposition 7.5.3 uses the eigenvalue condition (LSS5), the same as-
sumption which was used in Proposition 4.4.3 to obtain ψ-irreducibility for the Gaussian
model, and the same condition that will be used to obtain stability in later chapters.
The analogous Proposition 6.3.3 uses controllability to give conditions under which
the linear state space model is a T-chain. Note that controllability is not required here.
Other specific nonlinear models, such as bilinear models, can be analyzed similarly
using this approach.
7.6 Commentary*
We have already noted that in the degenerate case where the control set Ow consists
of a single point, the NSS(F ) model defines a semi-dynamical system with state space
X, and in fact many of the concepts introduced in this chapter are generalizations of
standard concepts from dynamical systems theory.
Three standard approaches to the qualitative theory of dynamical systems are topo-
logical dynamics whose principal tool is point set topology; ergodic theory, where one
assumes (or proves, frequently using a compactness argument) the existence of an er-
godic invariant measure; and finally, the direct method of Lyapunov, which concerns
criteria for stability.
The latter two approaches will be developed in a stochastic setting in Parts II and
III. This chapter essentially focused on generalizations of the first approach, which is
also based upon, to a large extent, the structure and existence of minimal sets. Two
excellent expositions in a purely deterministic and control-free setting are the books by
Bhatia and Szegö [34] and Brown [55]. Saperstone [346] considers infinite dimensional
spaces so that, in particular, the methods may be applied directly to the dynamical
system on the space of probability measures which is generated by a Markov process.
The connections between control theory and irreducibility described here are taken
from Meyn [259] and Meyn and Caines [272, 271]. The dissertations of Chan [61] and
Mokkadem [286], and also Diebolt and Guégan [92], treat discrete time nonlinear state
space models and their associated control models. Diebolt in [91] considers nonlinear
models with additive noise of the form Φk +1 = F (Φk ) + Wk +1 using an approach which
is very different to that described here.
Jakubczyk and Sontag in [173] present a survey of the results obtainable for forward
accessible discrete time control systems in a purely deterministic setting. They give a
different characterization of forward accessibility, based upon the rank of an associated
Lie algebra, rather than a controllability matrix.
The origin of the approach taken in this chapter lies in the often cited paper by
Stroock and Varadhan [378]. There it is shown that the support of the distribution of
a diffusion process may be characterized by considering an associated control model.
Ichihara and Kunita in [167] and Kliemann in [211] use this approach to develop an
ergodic theory for diffusions. The invariant control sets of [211] may be compared to
minimal sets as defined here.
At this stage, introduction of the e-chain class of models is not well motivated. The
reader who wishes to explore them immediately should move to Chapter 12.
In Duflo [102], a condition closely related to the stability condition which we impose
on ∇Φ is used to obtain the Central Limit Theorem for a nonlinear state space model.
Duflo assumes that the function F satisfies a contraction bound of the form

|F(x, w) − F(y, w)| ≤ α(w)|x − y|,  with  E[α(W)^m] < 1  for some m ≥ 1.
It is easy to see that any process Φ generated by a nonlinear state space model satisfying
this bound is an e-chain.
For models more complex than the linear model of Section 7.5.2 it will not be as easy
to prove that ∇Φ converges to zero, so a lengthier stability analysis of this sensitivity
process may be necessary. Since ∇Φ is essentially generated by a random linear system
it is therefore likely to either converge to zero or evanesce.
It seems probable that the stochastic Lyapunov function approach of Kushner [232]
or Khas’minskii [206], or a more direct analysis based upon limit theorems for products
of random matrices as developed in, for instance, Furstenberg and Kesten [134] will be
well suited for assessing the stability of ∇Φ .
Commentary for the second edition: The conjecture voiced in the first edition
was confirmed ten years after it was first put into print. A stochastic Lyapunov approach
is introduced in [165] for verification of stability of the sensitivity process¹ for a class
of Markov models.
A significant omission in the first edition is any discussion of the relationship between
stability of the sensitivity process ∇Φ and Lyapunov exponents (see [212, 255]). For a
¹ The sensitivity process was called the derivative process in the first edition.
given initial condition x, the top Lyapunov exponent is defined as the random variable

Λ_x := lim sup_{n→∞} (1/n) log ‖∇Φ_n‖.
The choice of norm is arbitrary. There is also a version defined in expectation: for any
p > 0 denote

Λ_x(p) := lim sup_{n→∞} (1/n) log E_x[ ‖∇Φ_n‖^p ].
One approach to establishing the e-chain property is to show that Λx (p) is independent
of x, and negative for all p sufficiently small [165].
Methods for estimating the Lyapunov exponent and conditions for verifying equicon-
tinuity are established for versions of the NSS(F ) model, in continuous or discrete time,
in several recent papers under a variety of assumptions [370, 371, 22, 165, 20, 323].
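As a rough illustration of these quantities (a sketch under stated assumptions, not a method taken from the papers cited), the top Lyapunov exponent of a scalar model can be estimated by Monte Carlo, using the fact that for scalar chains log ‖∇Φ_n‖ is the sum of the terms log |DF(Φ_k, W_{k+1})|; the model is the same assumed toy example F(x, w) = 0.5 sin(x) + w.

import numpy as np

# Monte Carlo estimate of Lambda_x for an assumed toy model.
# For a scalar chain, log|grad Phi_n| = sum_k log|DF(Phi_k, W_{k+1})|.
rng = np.random.default_rng(1)
n, paths, x0 = 2000, 200, 1.0
est = []
for _ in range(paths):
    x, s = x0, 0.0
    for _ in range(n):
        w = rng.normal()
        s += np.log(abs(0.5 * np.cos(x)) + 1e-300)  # log |DF(Phi_k, W_{k+1})|
        x = 0.5 * np.sin(x) + w                     # Phi_{k+1} = F(Phi_k, W_{k+1})
    est.append(s / n)
print(np.mean(est))  # negative here, consistent with stability of grad Phi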
A hidden Markov model (HMM) is a Markov chain Φ, along with an observation
process Y evolving on a state space Y. It is assumed that there is an i.i.d. sequence D
evolving on its own state space D, along with a function G : X × D → Y such that the
observation process can be expressed as a noisy function of the chain
Yn = G(Φn , Dn ), n ≥ 0.
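A minimal simulation sketch of this structure, with an assumed two-state chain and Gaussian observation noise (the transition matrix P and the function G are illustrative choices only):

import numpy as np

# Assumed ingredients: a two-state chain Phi with transition matrix P, and
# an observation function G(x, d) = x + 0.5*d with i.i.d. Gaussian D_n.
rng = np.random.default_rng(2)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # chain on X = {0, 1}

def G(x, d):                        # observations on Y = R
    return float(x) + 0.5 * d

x, chain, obs = 0, [], []
for _ in range(10):
    d = rng.normal()                # D_n, i.i.d. on D = R
    chain.append(x)
    obs.append(G(x, d))             # Y_n = G(Phi_n, D_n)
    x = rng.choice(2, p=P[x])       # Phi_{n+1} ~ P(Phi_n, .)
print(chain)
print([round(y, 2) for y in obs])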
STABILITY STRUCTURES

Chapter 8

Transience and recurrence
(ii) there is a countable cover of X with uniformly transient sets, in which case we call
Φ transient, and every petite set is uniformly transient.
A second goal of this chapter is the development of criteria based on the drift function
for both transience and recurrence.
(i) The chain Φ is transient if and only if there exists a bounded non-negative function
V and a set C ∈ B^+(X) such that for all x ∈ C^c,

ΔV(x) ≥ 0    (8.2)

and

D = {x : V(x) > sup_{y ∈ C} V(y)} ∈ B^+(X).    (8.3)
(ii) The chain Φ is recurrent if there exists a petite set C ⊂ X, and a function V
which is unbounded off petite sets in the sense that C_V(n) := {y : V(y) ≤ n} is
petite for all n, such that

ΔV(x) ≤ 0,  x ∈ C^c.    (8.4)
Proof The drift criterion for transience is proved in Theorem 8.4.2, whilst the
condition for recurrence is in Theorem 8.4.3.
Such conditions were developed by Lyapunov as criteria for stability in deterministic
systems, by Khas’minskii and others for stochastic differential equations [206, 232], and
by Foster as criteria for stability for Markov chains on a countable space: Theorem 8.0.2
is originally due (for countable spaces) to Foster [129] in essentially the form given above.
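To make the criterion concrete, here is a minimal numerical sketch (an illustrative example, not from the text) checking the recurrence condition (8.4) for a random walk on Z_+ reflected at 0 with negative mean increment, using V(x) = x:

# An assumed chain: random walk on Z_+ reflected at 0, increments -1, 0, +1
# with probabilities 0.4, 0.3, 0.3, so the mean increment is -0.1.
probs = {-1: 0.4, 0: 0.3, 1: 0.3}

def drift(x):
    # Delta V(x) = sum_y P(x, y) V(y) - V(x) for V(x) = x, reflection at 0
    return sum(p * max(x + w, 0) for w, p in probs.items()) - x

for x in [1, 5, 50]:
    print(x, drift(x))   # -0.1 at each x >= 1, so (8.4) holds off C = {0}

Here ΔV(x) = −0.1 for every x ≥ 1, and the sublevel sets of V are finite (hence petite for this irreducible chain), so Theorem 8.0.2 (ii) applies with this V and C = {0}.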
There is in fact a converse to Theorem 8.0.2 (ii) also, but only for ψ-irreducible
Feller chains (which include all countable space chains): we prove this in Section 9.4.2.
It is not known whether a converse holds in general.
Recurrence is also often phrased in terms of the hitting time variables τA = inf{k ≥
1 : Φk ∈ A}, with “recurrence” for a set A being defined by L(x, A) = Px (τA < ∞) = 1
for all x ∈ A. The connections between this condition and recurrence as we have defined
it above are simple in the countable state space case: the conditions are in fact equivalent
when A is an atom. In general spaces we do not have such complete equivalence.
Recurrence properties in terms of τA (which we call Harris recurrence properties) are
much deeper and we devote much of the next chapter to them. In this chapter we do
however give some of the simpler connections: for example, if L(x, A) = 1 for all x ∈ A
then ηA = ∞ a.s. when Φ0 ∈ A, and hence A is recurrent (see Proposition 8.3.1).
Classification of states
The state α is called transient if Eα (ηα ) < ∞, and recurrent if Eα (ηα ) =
∞.
From the definition U(x, y) = Σ_{n=1}^∞ P^n(x, y) we have immediately that for any
states x, y ∈ X

E_x[η_y] = U(x, y).    (8.5)
The following result gives a structural dichotomy which enables us to consider, not just
the stability of states, but of chains as a whole.
Proposition 8.1.1. When X is countable and Φ is irreducible, either U (x, y) = ∞ for
all x, y ∈ X or U (x, y) < ∞ for all x, y ∈ X.
Hence the series U (x, y) and U (u, v) all converge or diverge simultaneously, and the
result is proved.
Now we can extend these stability concepts for states to the whole chain.
The solidarity results of Proposition 8.1.3 and Proposition 8.1.1 enable us to classify
irreducible chains by the property possessed by one and then all states.
Theorem 8.1.2. When Φ is irreducible, either Φ is transient or Φ is recurrent.
We can say, in the countable case, exactly what recurrence or transience means in
terms of the return time probabilities L(x, x). In order to connect these concepts, for
a fixed n consider the event {Φn = α}, and decompose this event over the mutually
exclusive events {Φn = α, τα = j} for j = 1, . . . , n. Since Φ is a Markov chain, this
provides the first-entrance decomposition of P^n given for n ≥ 1 by

P^n(x, α) = P_x{τ_α = n} + Σ_{j=1}^{n−1} P_x{τ_α = j} P^{n−j}(α, α).    (8.7)
then multiplying (8.7) by z n and summing from n = 1 to ∞ gives for |z| < 1
Proof   Consider the first entrance decomposition in (8.10) with x = α: this gives

U^{(z)}(α, α) = L^{(z)}(α, α) / [1 − L^{(z)}(α, α)].    (8.11)
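Written out (a restatement of this step in LaTeX, assuming only the standard definitions of U^{(z)} and L^{(z)} as the generating functions of P^n(α, α) and P_α(τ_α = n)):

\begin{align*}
U^{(z)}(\alpha,\alpha)
  &= \sum_{n\ge 1} z^n\Big( P_\alpha\{\tau_\alpha = n\}
     + \sum_{j=1}^{n-1} P_\alpha\{\tau_\alpha = j\}\, P^{n-j}(\alpha,\alpha)\Big)\\
  &= L^{(z)}(\alpha,\alpha) + L^{(z)}(\alpha,\alpha)\,U^{(z)}(\alpha,\alpha),
\end{align*}

and solving gives (8.11); letting z ↑ 1 then shows U(α, α) = ∞ exactly when L(α, α) = 1.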
Proof From Proposition 8.1.3 and Proposition 8.1.1, we have L(x, x) < 1 for all x
or L(x, x) = 1 for all x. Suppose in the latter case, we have L(x, y) < 1 for some pair
x, y: by irreducibility, U (y, x) > 0 and thus for some n we have Py (Φn = x, τy > n) > 0,
from which we have L(y, y) < 1, which is a contradiction.
In Chapter 9 we will define Harris recurrence as the property that L(x, A) ≡ 1 for
all x ∈ A and A ∈ B+ (X): for countable chains, we have thus shown that recurrent
chains are also Harris recurrent, a theme we return to in the next chapter when we
explore stability in terms of L(x, A) in more detail.
also. Hence the forward recurrence time chain is always recurrent if p is a proper
distribution.
The calculation in the proof of Proposition 8.1.3 is actually a special case of the use
of the renewal equation. Let Z_n be a renewal process with increment distribution p as
defined in Section 2.4. By breaking up the event {Z_k = n} over the last time before n
that a renewal occurred we have

u(n) := Σ_{k=0}^∞ P(Z_k = n) = δ_0(n) + u ∗ p(n)
Random walk on Z
In order to classify general random walk on the integers we will use the laws of large
numbers. Proving these is outside the scope of this book: see, for example, Billingsley
[37] or Chung [72] for these results.
Suppose that Φ_n is a random walk such that the increment distribution Γ has a
mean which is zero. The form of the Weak Law of Large Numbers that we will use can
be stated in our notation as

P^n(0, A(εn)) → 1    (8.14)

for any ε > 0, where A(k) = {y : |y| ≤ k}. From this we prove
Theorem 8.1.5. If Φ is an irreducible random walk on Z whose increment distribution
Γ has mean zero, then Φ is recurrent.
P^k(0, A(ak)) → 1, and hence

lim_{N→∞} (2aN)^{−1} Σ_{k=1}^N P^k(0, A(ak)) = [2a]^{−1}.    (8.17)

Since a can be chosen arbitrarily small, we have U(0, 0) = ∞ and the chain is recurrent.
This proof clearly uses special properties of random walk. If Γ has simpler structure
then we shall see that simpler procedures give recurrence in Section 8.4.3.
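A quick simulation sketch of this recurrence (illustrative only): for the simple random walk with increments ±1, the mean number of visits to 0 grows without bound as the horizon increases, consistent with U(0, 0) = ∞.

import numpy as np

# Simple random walk on Z: increments +1/-1 with probability 1/2 each.
rng = np.random.default_rng(3)
paths = 400
for N in [100, 1000, 10000]:
    visits = 0
    for _ in range(paths):
        path = np.cumsum(rng.choice([-1, 1], size=N))
        visits += np.count_nonzero(path == 0)
    print(N, visits / paths)   # grows roughly like sqrt(N), without bound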
_AP^n(x, B) = P_x{Φ_n ∈ B, τ_A ≥ n},
P^n(x, B) = _AP^n(x, B) + Σ_{j=1}^{n−1} ∫_A _AP^j(x, dw) P^{n−j}(w, B)    (8.18)

P^n(x, B) = _AP^n(x, B) + Σ_{j=1}^{n−1} ∫_A P^j(x, dw) _AP^{n−j}(w, B).    (8.19)
U_A^{(z)}(x, B) := Σ_{n=1}^∞ _AP^n(x, B) z^n,  |z| < 1.    (8.21)
We will prove the solidarity results we require by exploiting the convolution forms in
(8.18) and (8.19). Multiplying by z^n in (8.18) and (8.19) and summing, the first entrance
and last exit decompositions give, respectively, for |z| < 1

U^{(z)}(x, B) = U_A^{(z)}(x, B) + ∫_A U_A^{(z)}(x, dw) U^{(z)}(w, B),    (8.25)

U^{(z)}(x, B) = U_A^{(z)}(x, B) + ∫_A U^{(z)}(x, dw) U_A^{(z)}(w, B).    (8.26)
Proof   (i) If A ∈ B^+(X) then for any x we have r, s such that P^r(x, α) > 0,
P^s(α, A) > 0, and so

Σ_n P^{r+s+n}(x, A) ≥ P^r(x, α) [Σ_n P^n(α, α)] P^s(α, A) = ∞.    (8.27)

Hence the series U(x, A) diverges for every x, A when U(α, α) diverges.
(ii) To prove the converse, we first note that for an atom, transience is equivalent
to L(α, α) < 1, exactly as in Proposition 8.1.3.
Now consider the last exit decomposition (8.26) with A, B = α. We have for any
x ∈ X

U^{(z)}(x, α) = U_α^{(z)}(x, α) + U^{(z)}(x, α) U_α^{(z)}(α, α)

and so by rearranging terms we have for all z < 1

U^{(z)}(x, α) = U_α^{(z)}(x, α)[1 − U_α^{(z)}(α, α)]^{−1} ≤ [1 − L(α, α)]^{−1} < ∞.
α(j) = {y : Σ_{n=1}^j P^n(y, α) > j^{−1}}.
Transient sets
If A ∈ B(X) can be covered with a countable number of uniformly transient
sets, then we call A transient.
We first check that the split chain and the original chain have mutually consistent
recurrent/transient classifications.
Proposition 8.2.2. Suppose that Φ is ψ-irreducible and strongly aperiodic. Then either
both Φ and Φ̌ are recurrent, or both Φ and Φ̌ are transient.
If B ∈ B + (X) then since ψ ∗ (B0 ) > 0 it follows from (8.28) that if Φ̌ is recurrent, so is
Φ. Conversely, if Φ̌ is transient, by taking a cover of X̌ with uniformly transient sets it
is equally clear from (8.28) that Φ is transient.
We know from Theorem 8.2.1 that Φ̌ is either transient or recurrent, and so the
dichotomy extends in this way to Φ.
To extend this result to general chains without atoms we first require a link between
the recurrence of the chain and its resolvent.
Lemma 8.2.3. For any 0 < ε < 1 the following identity holds:

Σ_{n=1}^∞ K_{a_ε}^n = ((1 − ε)/ε) Σ_{n=0}^∞ P^n.
we see that B(z) = ((1 − ε)/ε)(1 − z)−1 . By uniqueness of the power series expansion
it follows that b(n) = (1 − ε)/ε for all n, which completes the proof.
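The identity can also be checked numerically on a finite, strictly substochastic kernel, where every series converges (a sketch; the matrix P is an arbitrary illustrative choice):

import numpy as np

# Assumed example: a strictly substochastic 2x2 kernel P.
eps = 0.3
P = np.array([[0.2, 0.5],
              [0.1, 0.3]])                      # row sums < 1
I = np.eye(2)

K = (1 - eps) * np.linalg.inv(I - eps * P)      # K_{a_eps} = (1-eps) sum_n eps^n P^n
lhs = K @ np.linalg.inv(I - K)                  # sum_{n>=1} K^n
rhs = (1 - eps) / eps * np.linalg.inv(I - P)    # ((1-eps)/eps) sum_{n>=0} P^n
print(np.allclose(lhs, rhs))                    # True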
As an immediate consequence of Lemma 8.2.3 we have
Proposition 8.2.4. Suppose that Φ is ψ-irreducible.
(i) The chain Φ is transient if and only if each K_{a_ε}-chain is transient.

(ii) The chain Φ is recurrent if and only if each K_{a_ε}-chain is recurrent.
Proof   From Proposition 5.4.5 we are assured that the K_{a_ε}-chain is strongly ape-
riodic. Using Proposition 8.2.2 we know then that each K_{a_ε}-chain can be classified
dichotomously as recurrent or transient.
Since Lemma 8.2.3 shows that the K_{a_ε}-chain passes on either of these properties
to Φ itself, the result is proved.
We also have the following analogue of Proposition 8.2.4:
(i) The chain Φ is transient if and only if one, and then every, m-skeleton Φm is
transient.
(ii) The chain Φ is recurrent if and only if one, and then every, m-skeleton Φm is
recurrent.
Proof
(i) If A is a uniformly transient set for the m-skeleton Φ^m, with Σ_j P^{jm}(x, A) ≤ M,
then we have from the Chapman–Kolmogorov equations

Σ_{j=1}^∞ P^j(x, A) = Σ_{r=1}^m ∫ P^r(x, dy) Σ_j P^{jm}(y, A) ≤ mM.    (8.29)
(ii) If the m-skeleton is recurrent then from the equality in (8.29) we again have
that

Σ_j P^j(x, A) = ∞,  x ∈ X, A ∈ B^+(X),    (8.30)
Proposition 8.3.1. Suppose that Φ is a Markov chain, but not necessarily irreducible.
(i) If any set A ∈ B(X) is uniformly transient with U (x, A) ≤ M for x ∈ A, then
U (x, A) ≤ 1 + M for every x ∈ X.
(ii) If any set A ∈ B(X) satisfies L(x, A) = 1 for all x ∈ A, then A is recurrent. If Φ
is ψ-irreducible, then A ∈ B+ (X) and we have U (x, A) ≡ ∞ for x ∈ X.
(iii) If any set A ∈ B(X) satisfies L(x, A) ≤ ε < 1 for x ∈ A, then we have U (x, A) ≤
1/[1 − ε] for x ∈ X, so that in particular A is uniformly transient.
(iv) Let τ_A(k) denote the k-th return time to A, and suppose that for some m
as claimed.
(iii) Suppose on the other hand that L(x, A) ≤ ε < 1, x ∈ A. The last exit
decomposition again gives

U^{(z)}(x, A) = U_A^{(z)}(x, A) + ∫_A U^{(z)}(x, dy) U_A^{(z)}(y, A) ≤ 1 + εU^{(z)}(x, A)
≤ ε^{k+1},

and so for x ∈ A

U(x, A) = Σ_{n=1}^∞ P_x(η_A ≥ n)
       ≤ m[1 + Σ_{k=1}^∞ P_x(η_A ≥ km)]    (8.35)
       ≤ m/[1 − ε].
We now use (i) to give the required bound over all of X.
If there is one uniformly transient set then it is easy to identify other such sets, even
without irreducibility. We have
Proposition 8.3.2. If A is uniformly transient, and B ⇝^a A for some a, then B is
uniformly transient. Hence if A is uniformly transient, there is a countable covering of
Ā by uniformly transient sets.

Proof   From Lemma 5.5.2 (iii), we have when B ⇝^a A that for some δ > 0,

U(x, A) ≥ ∫ U(x, dy) K_a(y, A) ≥ δ U(x, B)

and thus (8.31) holds for B(m); from (8.35) it follows that B(m) is uniformly transient.
These results have direct application in the ψ-irreducible case. We next give a
number of such consequences.
Proof   Suppose Φ is not recurrent: that is, there exists some pair A ∈ B^+(X),
x^∗ ∈ X with U(x^∗, A) < ∞. If A^∗ = {y : U(y, A) = ∞}, then ψ(A^∗) = 0: for otherwise
A^∗ ∈ B^+(X), and then U(x^∗, A) ≥ ∫_{A^∗} P^n(x^∗, dy) U(y, A) = ∞ for some n, which is a
contradiction.
Set A_r = {y ∈ A : U(y, A) ≤ r}. Since ψ(A) > 0, and A_r ↑ A ∩ [A^∗]^c, there must exist
some r such that ψ(A_r) > 0, and by Proposition 8.3.1 (i) we have for all y,

U(y, A_r) ≤ 1 + r.    (8.37)
Consider now A_r(M) = {y : Σ_{m=1}^M P^m(y, A_r) > M^{−1}}. For any x, from (8.37)

M(1 + r) ≥ M U(x, A_r) ≥ Σ_{m=1}^M Σ_{n=m}^∞ P^n(x, A_r)
         = Σ_{n=0}^∞ ∫_X P^n(x, dw) Σ_{m=1}^M P^m(w, A_r)    (8.38)
         ≥ Σ_{n=0}^∞ ∫_{A_r(M)} P^n(x, dw) Σ_{m=1}^M P^m(w, A_r)
         ≥ M^{−1} Σ_{n=0}^∞ P^n(x, A_r(M)).
Since ψ(A_r) > 0 we have ⋃_m A_r(m) = X, and so the {A_r(m)} form a cover of X by
uniformly transient sets as required.

The cover of X by uniformly transient sets given in Proposition 8.3.2 and in
Theorem 8.3.4 leads immediately to
Theorem 8.3.5. If Φ is ψ-irreducible and transient, then every petite set is uniformly
transient.
Proof   If C is petite, then by Proposition 5.5.5 (iii) there exists a sampling distri-
bution a such that C ⇝^a B for any B ∈ B^+(X). If Φ is transient then there exists at
least one B ∈ B^+(X) which is uniformly transient, so that C is uniformly transient from
Proposition 8.3.2.
Thus petite sets are also “small” within the transience definitions. This gives us a
criterion for recurrence which we shall use in practice for many models; we combine it
with a criterion for transience in
(i) Φ is recurrent if there exists some petite set C ∈ B(X) such that L(x, C) ≡ 1 for
all x ∈ C.
(ii) Φ is transient if and only if there exist two sets D, C in B + (X) with L(x, C) < 1
for all x ∈ D.
Proof (i) From Proposition 8.3.1 (ii) C is recurrent. Since C is petite Theo-
rem 8.3.5 shows Φ is recurrent. Note that we do not assume that C is in B + (X), but
that this follows also.
(ii) Suppose the sets C, D exist in B^+(X). There must exist D_ε ⊂ D such that
ψ(D_ε) > 0 and L(x, C) ≤ 1 − ε for all x ∈ D_ε. If also ψ(D_ε ∩ C) > 0 then since
L(x, C) ≥ L(x, D_ε ∩ C) we have that D_ε ∩ C is uniformly transient from Proposition 8.3.1
and the chain is transient.
Otherwise we must have ψ(Dε ∩ C c ) > 0. The maximal nature of ψ then implies
that for some δ > 0 and some n ≥ 1 the set C_δ := {y ∈ C : _CP^n(y, D_ε ∩ C^c) > δ} also
has positive ψ-measure. Since, for x ∈ C_δ,

1 − L(x, C_δ) ≥ ∫_{D_ε ∩ C^c} _CP^n(x, dy)[1 − L(y, C_δ)] ≥ δε
Proof The existence of the periodic sets Di is guaranteed by Theorem 5.4.4, and
fact that the set N is transient is then a consequence of Proposition 8.3.3, since
⋃_{i=1}^d D_i is itself absorbing.
In the main, transient sets and chains are ones we wish to exclude in practice. The
results of this section have formalized the situation we would hope would hold: sets
which appear to be irrelevant to the main dynamics of the chain are indeed so, in many
different ways. But one cannot exclude them all, and for all of the statements where
ψ-null (and hence transient) exceptional sets occur, one can construct examples to show
that the “bad” sets need not be empty.
h(x) ≥ Σ_{j=1}^N ∫_C _CP^j(x, dy) h(y) + ∫_{C^c} _CP^N(x, dy) h(y).    (8.40)
Letting N → ∞ shows that h(x) ≥ h∗ (x) for all x.
This gives the required drift criterion for transience. Recall the definition of the
drift operator as ΔV(x) = ∫ P(x, dy)V(y) − V(x); obviously Δ is well defined if V is
bounded. We define the sublevel set C_V(r) of any function V for r ≥ 0 by C_V(r) := {x ∈ X : V(x) ≤ r}.
Proof   Suppose that V is an arbitrary bounded solution of (i) and (ii), and let M
be a bound for V over X. Clearly M > r. Set C = C_V(r), D = C^c, and

h_V(x) = [M − V(x)]/[M − r],  x ∈ D;     h_V(x) = 1,  x ∈ C.
ΔV(x) ≤ 0,  x ∈ C^c.    (8.42)
We will find frequently that, in order to test such drift for the process Φ, we need
to consider functions V : X → R such that the set CV (M ) = {y ∈ X : V (y) ≤ M }
is “finite” for each M . Such a function on a countable space or topological space is
easy to define: in this abstract setting we first need to define a class of functions with
this property, and we will find that they recur frequently, giving further meaning to the
intuitive meaning of petite sets.
Note that since, for an irreducible chain, a finite union of petite sets is petite, and
since any subset of a petite set is itself petite, a function V : X → R_+ will be unbounded
off petite sets for Φ if there merely exists a sequence {C_j} of petite sets such that, for
any n < ∞,

C_V(n) ⊆ ⋃_{j=1}^N C_j    (8.43)

for some N < ∞.
Proof We will show that L(x, C) ≡ 1 which will give recurrence from Theo-
rem 8.3.6. Note that by replacing the set C by C ∪ CV (n) for n suitably large, we
can assume without loss of generality that C ∈ B+ (X).
Suppose by way of contradiction that the chain is transient, and thus that there
exists some x∗ ∈ C c with L(x∗ , C) < 1.
Set CV (n) = {y ∈ X : V (y) ≤ n}: we know this is petite, by definition of V , and
hence it follows from Theorem 8.3.5 that CV (n) is uniformly transient for any n. Now
fix M large enough that
M > V (x∗ )/[1 − L(x∗ , C)]. (8.44)
Let us modify P to define a kernel P̃ with entries P̃(x, A) = P(x, A) for x ∈ C^c and
P̃(x, x) = 1, x ∈ C. This defines a chain Φ̃ with C as an absorbing set, and with the
property that for all x ∈ X

∫ P̃(x, dy)V(y) ≤ V(x).    (8.45)
whilst for A ⊆ C^c

P̃^n(x, A) ≤ P^n(x, A),  x ∈ C^c.    (8.47)

By iterating (8.45) we thus get, for fixed x ∈ C^c

V(x) ≥ ∫ P̃^n(x, dy)V(y)
     ≥ ∫_{C^c ∩ [C_V(M)]^c} P̃^n(x, dy)V(y)    (8.48)
     ≥ M [1 − P̃^n(x, C_V(M) ∪ C)].
Letting n → ∞ in (8.48) for x = x∗ provides a contradiction with (8.50) and our choice
of M . Hence we must have L(x, C) ≡ 1, and Φ is recurrent, as required.
Proof   In Theorem 8.4.3 choose the test function V(x) = |x|. Then for x > r we
have that

Σ_y P(x, y)[V(y) − V(x)] = Σ_w w Γ(w),
Then the conditions of Theorem 8.4.3 are satisfied with C = {−r, . . . , r} and with (8.42)
holding for x ∈ C c , and so the chain is recurrent.
Proof   Suppose Γ has non-zero mean β > 0. We will establish for some bounded
monotone increasing V that

Σ_y P(x, y)V(y) = V(x)    (8.51)

for x ≥ r.
This time choose the test function V(x) = 1 − ρ^x for x ≥ 0, and V(x) = 0 elsewhere.
The sublevel sets of V are of the form (−∞, r] with r ≥ 0. This function satisfies (8.51)
if and only if for x ≥ r

Σ_y P(x, y)[ρ^y/ρ^x] = 1    (8.52)

so that this V can be constructed as a valid test function if (and only if) there is a
ρ < 1 with

Σ_w Γ(w)ρ^w = 1.    (8.53)
Therefore the existence of a solution to (8.53) will imply that the chain is transient,
since return to the whole half line (−∞, r] is less than sure from Proposition 8.4.2.
Write β(s) = Σ_w Γ(w)s^w: then β(s) is well defined for s ∈ (0, 1] by the bounded range
assumption. By irreducibility, we must have Γ(w) > 0 for some w < 0, so that β(s) → ∞
as s → 0. Since β(1) = 1, and β′(1) = Σ_w wΓ(w) = β > 0, it follows that such a ρ
exists, and hence the chain is transient.
Similarly, if the mean of Γ is negative, we can by symmetry prove transience because
the chain fails to return to the half line [−r, ∞).
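As a concrete illustration (with an assumed increment distribution, not taken from the text), the root ρ of (8.53) can be found numerically:

from scipy.optimize import brentq

# Assumed increment law: Gamma(-1) = 0.3, Gamma(+1) = 0.7, so beta = 0.4 > 0.
Gamma = {-1: 0.3, 1: 0.7}

def beta_fn(s):
    return sum(p * s**w for w, p in Gamma.items())

rho = brentq(lambda s: beta_fn(s) - 1.0, 1e-6, 1.0 - 1e-9)
print(rho)   # 3/7 here: beta_fn(s) = 1 reduces to 0.7 s^2 - s + 0.3 = 0

The resulting V(x) = 1 − ρ^x is then a bounded, non-constant solution of (8.51), witnessing transience.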
For random walk on the half line Z+ with bounded range, as defined by (RWHL1)
we find
Proposition 8.4.6. If the random walk increment distribution Γ on the integers has
mean β and a bounded range, then the random walk on Z+ is recurrent if and only if
β ≤ 0.
but since, in this case, the set {x ≤ r} is finite, we have (8.42) holding and the chain is
recurrent.
The first part of this proof involves a so-called “stochastic comparison” argument:
we use the return time probabilities for one chain to bound the same probabilities for
another chain. This is simple but extremely effective, and we shall use it a number
of times in classifying random walk. A more general formulation will be given in Sec-
tion 9.5.1.
Relaxing the condition that the range of the increment is bounded requires a much
more delicate argument, and indeed the known result of Theorem 8.1.5 for a general
random walk on Z, that recurrence is equivalent to the mean β = 0, appears difficult if
not impossible to prove by drift methods without some bounds on the spread of Γ.
then Φ is recurrent.
Proof Clearly the chain is ϕ-irreducible when β < 0 with ϕ = δ0 , and all compact
sets are small as in Chapter 5. To prove recurrence we use Theorem 8.4.3, and show
that we can in fact find a suitably unbounded function V and a compact set C satisfying
∫ P(x, dy)V(y) ≤ V(x) − ε,  x ∈ C^c,    (8.54)

for some ε > 0. As in the countable case we note that since β < 0 there exists x_0 < ∞
such that

∫_{−x_0}^∞ w Γ(dw) < β/2 < 0,

and thus if V(x) = x, for x > x_0

∫ P(x, dy)[V(y) − V(x)] ≤ ∫_{−x_0}^∞ w Γ(dw).    (8.55)
Lemma 8.5.2. Let W be a random variable with law Γ, s a positive number and t any
real number. Then for any A ⊆ {w ∈ R : s + tw > 0},

Proof   For all x > −1, log(1 + x) ≤ x − (x²/2)I{x < 0}. Thus

log(s + tW)I{W ∈ A} ≤ [log(s) + tW/s]I{W ∈ A}
Lemma 8.5.3. Let W be a random variable with law Γ and finite variance. Let s be a
positive number and t a real number. Then

lim_{x→∞} −x E[W I{W < t − sx}] = lim_{x→∞} x E[W I{W > t + sx}] = 0.    (8.56)

If in addition E[W] = 0, then

lim_{x→∞} −x E[W I{W > t − sx}] = lim_{x→∞} x E[W I{W < t + sx}] = 0.    (8.57)

Proof   We have

0 ≤ lim_{x→−∞} (t + sx) ∫_{−∞}^{t+sx} wΓ(dw) ≤ lim_{x→−∞} ∫_{−∞}^{t+sx} w²Γ(dw) = 0.
If E[W ] = 0, then E[W I{W > t + sx}] = −E[W I{W < t + sx}], giving the second
result.
We now prove
Proof We can assume without loss of generality that Γ(−∞, 0) > 0: for clearly, if
Γ[0, ∞) = 1 then Px (τ0 < ∞) = 0, x > 0 and the chain moves inexorably to infinity;
hence it is not irreducible, and it is transient in every meaning of the word.
We will show that for a chain which is skip-free to the right the condition β > 0 is
sufficient for transience, by examining the solutions of the equations
Σ_y P(x, y)V(y) = V(x),  x ≥ 1,    (8.62)

which here takes the form

V(x) = Γ(−x + 1)V(1) + Γ(−x + 2)V(2) + · · · + Γ(1)V(1 + x).    (8.63)
Once the first value in the V (x) sequence is chosen, we therefore have the remaining
values given by an iterative process. Our goal is to show that we can define the sequence
in a way that gives us a non-constant positive bounded solution to (8.63).
In order to do this we first write

V^*(z) = Σ_{x=0}^∞ V(x)z^x,   Γ^*(z) = Σ_{x=−∞}^∞ Γ(x)z^x,

where V^*(z) has yet to be shown to be defined for any z and Γ^*(z) is clearly defined at
least for |z| ≥ 1. Multiplying by z^x in (8.63) and summing then yields a relation between
V^*(z) and Γ^*(z^{−1}).
Now suppose that we can show (as we do below) that there is an analytic expansion of
the function
z^{−1}[1 − z]/[Γ^*(z^{−1}) − 1] = Σ_{n=0}^∞ b_n z^n    (8.65)

in the region 0 < z < 1 with b_n ≥ 0. Then we will have the identity
From this, we will be able to identify the form of the solution V . Explicitly, from (8.66)
we have

V^*(z) = zΓ(1)V(1) Σ_{n=0}^∞ z^n Σ_{m=0}^n b_m    (8.67)

so that equating coefficients of z^n in (8.67) gives

V(x) = Γ(1)V(1) Σ_{m=0}^{x−1} b_m.
Thus we have reduced the question of transience to identifying conditions under which
the expansion in (8.65) holds with the coefficients bj positive and summable.
Let us write a_j = Γ(1 − j) so that

A(z) := Σ_{j=0}^∞ a_j z^j = zΓ^*(z^{−1})

and

B(z) := 1 − [1 − A(z)]/[1 − z]    (8.69)
      = 1 − Σ_{j=0}^∞ z^j Σ_{n=j+1}^∞ a_n,

and so B(z)^{−1} is well defined for |z| < 1; moreover, by the expansion in (8.69)

B(z)^{−1} = Σ_j b_j z^j
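The construction can be carried out numerically for a concrete skip-free walk (an illustrative choice of Γ, not from the text; the recursion below inverts the power series B(z)):

# Assumed skip-free increment law: Gamma(1) = 0.5, Gamma(0) = 0.2, Gamma(-1) = 0.3,
# so beta = 0.2 > 0.  Then a_j = Gamma(1 - j) and B(z) = 1 - sum_j z^j sum_{n>j} a_n.
a = {0: 0.5, 1: 0.2, 2: 0.3}             # a_j = Gamma(1 - j)
J = 60
q = [sum(p for j, p in a.items() if j > k) for k in range(J)]
B = [1.0 - q[0]] + [-qk for qk in q[1:]]  # coefficients of B(z)

b = [1.0 / B[0]]                          # invert the series: B(z) * sum_j b_j z^j = 1
for n in range(1, J):
    b.append(-sum(B[k] * b[n - k] for k in range(1, n + 1)) / B[0])

print(b[:5])    # nonnegative, as required for (8.65)
print(sum(b))   # partial sums of b_m stay bounded, so V is bounded and non-constant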
8.6 Commentary*
On countable spaces the solidarity results we generalize here are classical, and thorough
expositions are in Feller [114], Chung [71], Çinlar [59] and many more places. Recurrence
is called persistence by Feller, but the terminology we use here seems to have become
the more standard. The first entrance, and particularly the last exit, decomposition are
vital tools introduced and exploited in a number of ways by Chung [71].
There are several approaches to the transience/recurrence dichotomy. A common
one, which can be shown to be virtually identical to the one we present here, uses the
concept of inessential sets (sets for which ηA is almost surely finite). These play the
role of transient parts of the space, with recurrent parts of the space being sets which
are not inessential. This is the approach in Orey [309], based on the original methods
of Doeblin [95] and Doob [99].
Our presentation of transience, stressing the role of uniformly transient sets, is
new, although it is implicit in many places. Most of the individual calculations are in
Nummelin [303], and a number are based on the more general approach in Tweedie
[394]. Equivalences between properties of the kernel U (x, A), which we have called
recurrence and transience properties, and the properties of essential and inessential sets
are studied in Tuominen [390].
The uniform transience property is inherently stronger than the inessential property,
and it certainly aids in showing that the skeletons and the original chain share the
dichotomy between recurrence and transience. For use of the properties of skeleton
chains in direct application, see Tjøstheim [386].
The drift conditions we give here are due in the countable case to Foster [129],
and the versions for more general spaces were introduced in Tweedie [397, 398] and in
Kalashnikov [189]. We shall revisit these drift conditions, and expand somewhat on
their implications in the next chapter. Stronger versions of (V1) will play a central role
in classifying chains as yet more stable in due course.
The test functions for classifying random walk in the bounded range case are directly
based on those introduced by Foster [129]. The evaluation of the transience condition for
skip-free walks, given in Proposition 8.5.5, is also due to Foster. The approximations in
the case of zero drift are taken from Guo and Petruccelli [149] and are reused in analyzing
SETAR models in Section 9.5.2.
The proof of recurrence of random walk in Theorem 8.1.5, using the weak law of
large numbers, is due to Chung and Ornstein [73]. It appears difficult to prove this
using the elementary drift methods.
The drift condition in the case of negative mean gives, as is well known, a stronger
form of recurrence: the concerned reader will find that this is taken up in detail in
Chapter 11, where it is a central part of our analysis.
Commentary for the second edition: The drift operator (8.1) is analogous to the
generator for a Markov process in continuous time. Some of the theory surrounding
continuous time models is summarized in Section 20.3, including some foundations of
generators and resolvents.
Chapter 9

Harris and topological recurrence
In this chapter we consider stronger concepts of recurrence and link them with the
dichotomy proved in Chapter 8. We also consider several obvious definitions of global
and local recurrence and transience for chains on topological spaces, and show that they
also link to the fundamental dichotomy.
In developing concepts of recurrence for sets A ∈ B(X), we will consider not just
the first hitting time τA , or the expected value U ( · , A) of ηA , but also the event that
Φ ∈ A infinitely often (i.o.), or ηA = ∞, defined by
{Φ ∈ A i.o.} := ⋂_{N=1}^∞ ⋃_{k=N}^∞ {Φ_k ∈ A},

and we write Q(x, A) := P_x{Φ ∈ A i.o.}. Obviously, for any x, A we have Q(x, A) ≤ L(x, A), and by the strong Markov property
we have

Q(x, A) = E_x[ P_{Φ_{τ_A}}{Φ ∈ A i.o.} I{τ_A < ∞} ] = ∫_A U_A(x, dy) Q(y, A).    (9.2)
Harris recurrence
The set A is called Harris recurrent if
Q(x, A) = Px (ηA = ∞) = 1, x ∈ A.
We will see in Theorem 9.1.4 that when A ∈ B+ (X) and Φ is Harris recurrent then
in fact we have the seemingly stronger and perhaps more commonly used property that
Q(x, A) = 1 for every x ∈ X.
It is obvious from the definitions that if a set is Harris recurrent, then it is recurrent.
Indeed, in the formulation above the strengthening from recurrence to Harris recurrence
is quite explicit, indicating a move from an expected infinity of visits to an almost surely
infinite number of visits to a set.
This definition of Harris recurrence appears on the face of it to be stronger than
requiring L(x, A) ≡ 1 for x ∈ A, which is a standard alternative definition of Harris
recurrence. In one of the key results of this section, Proposition 9.1.1, we prove that
they are in fact equivalent.
The highlight of the Harris recurrence analysis is
X = H ∪ N    (9.3)

where H is absorbing and non-empty and every subset of H in B^+(X) is Harris recur-
rent; and N is ψ-null and transient.
Theorem 9.0.2. For a ψ-irreducible T-chain, the chain is Harris recurrent if and only
if Px {Φ → ∞} = 0 for each x ∈ X.
Proposition 9.1.1. Suppose for some one set A ∈ B(X) we have L(x, A) ≡ 1, x ∈ A.
Then Q(x, A) = L(x, A) for every x ∈ X, and in particular A is Harris recurrent.
Proof   Using the strong Markov property, we have that if L(y, A) = 1, y ∈ A, then
for any x ∈ A

P_x(τ_A(2) < ∞) = ∫_A U_A(x, dy) L(y, A) = 1;

inductively this gives for x ∈ A, again using the strong Markov property,

P_x(τ_A(k + 1) < ∞) = ∫_A U_A(x, dy) P_y(τ_A(k) < ∞) = 1.
Write C_n for the event {|n^{−1}Φ_n − β| > β/2}. We only use the result, which follows
from the strong law, that

P_0(lim sup_{n→∞} C_n) = 0.    (9.4)
Now let D_n denote the event {Φ_n = 0}, and notice that D_n ⊆ C_n for each n. Immedi-
ately from (9.4) we have

P_0(lim sup_{n→∞} D_n) = 0    (9.5)
which says exactly Q(0, 0) = 0.
Hence we have an elegant proof of the general result
Proposition 9.1.2. If Φ denotes random walk on Z and if

β = Σ_w w Γ(w) > 0,

then Φ is transient.
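A simulation sketch of this transience (with an assumed increment law, purely for illustration): the fraction of simulated paths that ever return to 0 stabilizes strictly below one.

import numpy as np

# Assumed increment law: +1 w.p. 0.6, -1 w.p. 0.4, so beta = 0.2 > 0.
rng = np.random.default_rng(4)
paths, N, returned = 2000, 5000, 0
for _ in range(paths):
    path = np.cumsum(rng.choice([1, -1], size=N, p=[0.6, 0.4]))
    returned += bool(np.any(path == 0))
print(returned / paths)  # near 0.8 = 1 - |p - q| here, i.e. L(0, 0) < 1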
The most difficult of the results we prove in this section, and the strongest, provides
a rather more delicate link between the probabilities L(x, A) and Q(x, A) than that in
Proposition 9.1.1.
Theorem 9.1.3. (i) Suppose that D ⇝ A for any sets D and A in B(X). Then

{Φ ∈ D i.o.} ⊆ {Φ ∈ A i.o.}  a.s. [P_∗]    (9.6)

and hence Q(y, D) ≤ Q(y, A) for all y ∈ X.

(ii) If X ⇝ A, then A is Harris recurrent, and in fact Q(x, A) ≡ 1 for every x ∈ X.
Proof Since the event {Φ ∈ A i.o.} involves the whole path of Φ, we cannot deduce
this result merely by considering P n for fixed n. We need to consider all the events
En = {Φn +1 ∈ A}, n ∈ Z+
and evaluate the probability of those paths such that an infinite number of the En hold.
We first show that, if F_n^Φ is the σ-field generated by {Φ_0, . . . , Φ_n}, then as n → ∞

P[ ⋃_{i=n}^∞ E_i | F_n^Φ ] → I{ ⋂_{m=1}^∞ ⋃_{i=m}^∞ E_i }  a.s. [P_∗].    (9.7)
Now apply the Martingale Convergence Theorem (see Theorem D.6.1) to the extreme
elements of the inequalities (9.8) to give
I{ ⋃_{i=k}^∞ E_i } ≥ lim sup_n P[ ⋃_{i=n}^∞ E_i | F_n^Φ ]
              ≥ lim inf_n P[ ⋃_{i=n}^∞ E_i | F_n^Φ ]    (9.9)
              ≥ I{ ⋂_{m=1}^∞ ⋃_{i=m}^∞ E_i }.
As k → ∞, the two extreme terms in (9.9) converge, which shows the limit in (9.7)
holds as required.
By the strong Markov property, P_∗[ ⋃_{i=n}^∞ E_i | F_n^Φ ] = L(Φ_n, A) a.s. [P_∗]. From our
assumption that D ⇝ A we have that L(Φ_n, A) is bounded from 0 whenever Φ_n ∈ D.
Thus, using (9.7) we have P_∗-a.s.

I{ ⋂_{m=1}^∞ ⋃_{i=m}^∞ {Φ_i ∈ D} } ≤ I{ lim sup_n L(Φ_n, A) > 0 }
                                  = I{ lim_n L(Φ_n, A) = 1 }    (9.10)
                                  = I{ ⋂_{m=1}^∞ ⋃_{i=m}^∞ E_i },
which is (9.6).
The proof of (ii) is then immediate, by taking D = X in (9.6).
As an easy consequence of Theorem 9.1.3 we have the following strengthening of
Harris recurrence:
Theorem 9.1.4. If Φ is Harris recurrent, then Q(x, B) = 1 for every x ∈ X and every
B ∈ B + (X).
Proof Let {Cn : n ∈ Z+ } be petite sets with ∪Cn = X. Since the finite union of
petite sets is petite for an irreducible chain by Proposition 5.5.5, we may assume that
Cn ⊂ Cn +1 and that Cn ∈ B + (X) for each n.
For any B ∈ B^+(X) and any n ∈ Z_+ we have from Lemma 5.5.1 that C_n ⇝ B, and
hence, since Cn is Harris recurrent, we see from Theorem 9.1.3 (i) that Q(x, B) = 1 for
any x ∈ Cn . Because the sets {Ck } cover X, it follows that Q(x, B) = 1 for all x as
claimed.
Having established these stability concepts, and conditions implying they hold for
individual sets, we now move on to consider transience and recurrence of the overall
chain in the ψ-irreducible context.
For if not, and ψ(C1 ) = 0, then by Proposition 4.2.3 there exists an absorbing full
set F ⊂ C1c . We have by definition that L(x, C) = 1 for any x ∈ C ∩ F , and since
F is absorbing we must have L(x, C ∩ F ) = 1 for x ∈ C ∩ F . From Proposition 9.1.1
it follows that Q(x, C ∩ F ) = 1 for x ∈ C ∩ F , which gives a contradiction, since
Q(x, C) ≥ Q(x, C ∩ F ). This shows that in fact ψ(C1 ) > 0.
But now, since C1 ∈ B + (X) there exists B ⊆ C1 , B ∈ B + (X) and δ > 0 with
L(x, C1 ) ≤ δ < 1 for all x ∈ B: accordingly
L(x, B) ≤ L(x, C1 ) ≤ δ, x ∈ B.
Now Proposition 8.3.1 (iii) gives U (x, B) ≤ [1 − δ]−1 , x ∈ B and this contradicts the
assumed recurrence of Φ.
Thus H is a non-empty maximal absorbing set, and by Proposition 4.2.3 H is full:
from Proposition 8.3.7 we have immediately that N is transient. It remains to prove
that H is Harris.
For any set A in B^+(X) we have C ⇝ A. It follows from Theorem 9.1.3 that if
Q(x, C) = 1 then Q(x, A) = 1 for every A ∈ B+ (X). Since by construction Q(x, C) = 1
for x ∈ H, we have also that Q(x, A) = 1 for any x ∈ H and A ∈ B+ (X): so Φ restricted
to H is Harris recurrent, which is the required result.
We now strengthen the connection between properties of Φ and those of its skeletons.
Proof   If the m-skeleton is Harris recurrent then, since mτ_A^m ≥ τ_A for any A ∈ B(X),
where τ_A^m is the first entrance time for the m-skeleton, it immediately follows that Φ is
also Harris recurrent.
Suppose now that Φ is Harris recurrent. For any m ≥ 2 we know from Proposi-
tion 8.2.6 that Φm is recurrent, and hence a Harris set Hm exists for this skeleton.
Since Hm is full, there exists a subset H ⊂ Hm which is absorbing and full for Φ, by
Proposition 4.2.3.
Since Φ is Harris recurrent we have that Px {τH < ∞} ≡ 1, and since H is absorbing
we know that mτHm ≤ τH + m. This shows that
(i) If some petite set C is recurrent, then Φ is recurrent; and the set C∩N is uniformly
transient, where N is the transient set in the Harris decomposition (9.11).
(ii) If there exists some petite set in B(X) such that L(x, C) ≡ 1, x ∈ X, then Φ is
Harris recurrent.
Proof (i) If C is recurrent then so is the chain, from Theorem 8.3.5. Let D =
C ∩ N denote the part of C not in H. Since N is ψ-null, and ν is an irreducibility
measure we must have ν(N ) = 0 by the maximality of ψ; hence (8.33) holds and from
(8.35) we have a uniform bound on U (x, D), x ∈ X so that D is uniformly transient.
(ii) If L(x, C) ≡ 1, x ∈ X for some ψ_a-petite set C, then from Theorem 9.1.3 C
is Harris recurrent. Since C is petite we have C ⇝ A for each A ∈ B^+(X). The Harris
recurrence of C, together with Theorem 9.1.3 (ii), gives Q(x, A) ≡ 1 for all x, so Φ is
Harris recurrent.
Proof In Theorem 8.4.3 we showed that L(x, C ∪CV (n)) ≡ 1, for some n, so Harris
recurrence has already been proved in view of Proposition 9.1.7.
Non-evanescent chains
A Markov chain Φ will be called non-evanescent if Px {Φ → ∞} = 0 for
each x ∈ X.
We first show that for a T-chain, either sample paths converge to infinity or they
enter a recurrent part of the space. Recall that for any A, we have A^0 = {y : L(y, A) = 0}.
Theorem 9.2.1. Suppose that Φ is a T-chain. For any A ∈ B(X) which is transient,
and for each x ∈ X,

P_x({Φ → ∞} ∪ {Φ enters A^0}) = 1.    (9.12)
Thus if Φ is a non-evanescent T-chain, then X is not transient.
Proof   Let A = ⋃_j B_j, with each B_j uniformly transient; then from Proposi-
tion 8.3.2, the sets B̄_i(M) = {x ∈ X : Σ_{j=1}^M P^j(x, B_i) > M^{−1}} are also uniformly
transient, for any i, M. Thus Ā = ⋃_i A_i where each A_i is uniformly transient.
Since T is lower semicontinuous, the sets O_{ij} := {x ∈ X : T(x, A_i) > j^{−1}} are open,
as is O_j := {x ∈ X : T(x, A^0) > j^{−1}}, i, j ∈ Z_+. Since T is everywhere non-trivial we
have for all x ∈ X,

T(x, ⋃_i A_i ∪ A^0) = T(x, X) > 0

and hence the sets {O_{ij}, O_j} form an open cover of X.
Let C be a compact subset of X, and choose M such that {O_M, O_{iM} : 1 ≤ i ≤ M}
is a finite subcover of C. Since each A_i is uniformly transient, and

K_a(x, A_i) ≥ T(x, A_i) ≥ j^{−1},  x ∈ O_{ij},    (9.13)

we know from Proposition 8.3.2 that each of the sets O_{ij} is uniformly transient. It
follows that with probability one, every trajectory that enters C infinitely often must
enter O_M infinitely often: that is,

{Φ ∈ C i.o.} ⊂ {Φ ∈ O_M i.o.}  a.s. [P_∗].

But since L(x, A^0) > 1/M for x ∈ O_M we have by Theorem 9.1.3 that

{Φ ∈ O_M i.o.} ⊂ {Φ ∈ A^0 i.o.}  a.s. [P_∗]
and this completes the proof of (9.12).
Proof   Since Φ is a T-chain, there exists some distribution a such that for all x,
K_a(x, B) ≥ T(x, B).
But since T(x^∗, B) > 0 and T(x, B) is lower semicontinuous, it follows that for some
neighborhood O of x^∗,

inf_{x∈O} T(x, B) > 0

and thus, as in (5.45),

inf_{x∈O} L(x, B) ≥ inf_{x∈O} K_a(x, B) ≥ inf_{x∈O} T(x, B)
Proof   (i) Assume the state x^∗ is topologically recurrent but that O is a neigh-
borhood of x^∗ with Q(x^∗, O) = 0. Let O^∞ = {y : Q(y, O) = 1}, so that L(x^∗, O^∞) = 0.
Since

L(x, A) ≥ K_a(x, A) ≥ T(x, A),  x ∈ X, A ∈ B(X),

this implies T(x^∗, O^∞) = 0, and since T is non-trivial, we must have

T(x^∗, [O^∞]^c) > 0.    (9.25)
Let D_n := {y : P_y(η_O < n) > n^{−1}}: since D_n ↑ [O^∞]^c, we must have T(x^∗, D_n) > 0 for
some n. The continuity of T now ensures that there exists some δ and a neighborhood
O_δ ⊆ O of x^∗ such that

T(x, D_n) > δ,  x ∈ O_δ.    (9.26)

Let us take m large enough that Σ_{j=m}^∞ a(j) ≤ δ/2: then from (9.26) we have
It follows that

P_x(η_{O_δ} ≤ m + n) ≥ P_x(η_O ≤ m + n)
                    ≥ Σ_{k=1}^m ∫_{D_n} _{D_n}P^k(x, dy) P_y(η_O ≤ n)    (9.29)
                    ≥ n^{−1} P_x(τ_{D_n} ≤ m)
                    ≥ n^{−1} δ/2,  x ∈ O_δ.
With (9.29) established we can apply Proposition 8.3.1 to see that Oδ is uniformly
transient.
This contradicts our assumption that x∗ is topologically recurrent, and so in fact
Q(x∗ , O) > 0 for all neighborhoods O.
(ii) Suppose now that P (x∗ , · ) and T (x∗ , · ) are equivalent. Choose x∗ topolog-
ically recurrent and assume we can find a neighborhood O with Q(x∗ , O) < 1. Define
O^∞ as before, and note that now P(x^∗, [O^∞]^c) > 0 since otherwise

Q(x^∗, O) ≥ ∫_{O^∞} P(x^∗, dy) Q(y, O) = 1;

and so also T(x^∗, [O^∞]^c) > 0. Thus we again have (9.25) holding, and the argument in
(i) shows that there is a uniformly transient neighborhood of x∗ , again contradicting the
assumption of topological recurrence. Hence x∗ is topologically Harris recurrent.
The examples (9.22) and (9.24) show that we do not get, in general, the second
conclusion of this proposition if the chain is merely weak Feller or has only a strong
Feller component.
In these examples, it is the lack of irreducibility which allows such obvious “patholog-
ical” behavior, and we shall see in Theorem 9.3.6 that when the chain is a ψ-irreducible
T-chain then this behavior is excluded. Even so, without any irreducibility assump-
tions we are able to derive a reasonable analogue of Theorem 9.1.5, showing that the
non-Harris recurrent states form a transient set.
Theorem 9.3.5. For any chain Φ there is a decomposition
X = R ∪ N,
where R denotes the set of states which are topologically Harris recurrent and N is
transient.
X = H ∪ N

where H is either empty or a maximal Harris set, N is transient, the set of Harris
recurrent states R is contained in H, and every state in N is topologically transient.
Proof The decomposition has already been shown to exist in Theorem 9.2.2. Let
x^∗ ∈ R be a topologically Harris recurrent state. Then from (9.14), we must have
L(x^∗, H) = 1, and so x^∗ ∈ H by maximality of H.
We can write N = NE ∪ NH where NH = {y ∈ N : T (y, H) > 0} and NE = {y ∈
N : T (y, H) = 0}. For fixed x∗ ∈ NH there exists δ > 0 and an open set Oδ such that
x∗ ∈ Oδ and T (y, H) > δ for all y ∈ Oδ , by the lower semicontinuity of T ( · , H).
Hence also the sampled kernel K_a minorized by T satisfies K_a(y, H) > δ for all
y ∈ O_δ. Now choose M such that Σ_{n>M} a(n) ≤ δ/2. Then for all y ∈ O_δ

Σ_{n≤M} P^n(y, H) a(n) ≥ δ/2,
Coercive functions
A function V is called coercive if V (x) → ∞ as x → ∞: this means that
the sublevel sets {x : V (x) ≤ r} are precompact for each r > 0.
This nomenclature is designed to remind the user that we seek functions which
behave like norms: they are large as the distance from the center of the space increases.
Typically in practice, a coercive function will be a norm on Euclidean space, or at least a
monotone function of a norm. For irreducible T-chains, functions unbounded off petite
sets certainly include coercive functions, since compacta are petite in that case; but of
course coercive functions are independent of the structure of the chain itself.
Even without irreducibility we get a useful conclusion from applying (V1).
Theorem 9.4.1. If condition (V1) holds for a coercive function V and a compact set
C, then Φ is non-evanescent.
Proof Suppose that in fact Px {Φ → ∞} > 0 for some x ∈ X. Then, since the set
C is compact, there exists M ∈ Z+ with
P_x({Φ_k ∈ C^c, k ≥ M} ∩ {Φ → ∞}) > 0.
when σC > k, k ∈ Z+ .
Now let M_i = V(Φ_i)I{σ_C ≥ i}. Using the fact that {σ_C ≥ k} ∈ F_{k−1}^Φ, we may show
that (M_k, F_k^Φ) is a positive supermartingale.
Hence there exists an almost surely finite random variable M∞ such that Mk → M∞
as k → ∞.
There are two possibilities for the limit M_∞. Either σ_C < ∞, in which case M_∞ = 0,
or σ_C = ∞, in which case lim sup_{k→∞} V(Φ_k) = M_∞ < ∞ and in particular Φ ̸→ ∞,
since V is coercive. Thus we have shown that

P_µ({σ_C < ∞} ∪ {Φ → ∞}^c) = 1,
Theorem 9.4.2. Suppose that Φ is a weak Feller chain, and suppose that there exists
a compact set C satisfying σC < ∞ a.s. [P∗ ].
Then there exists a compact set C0 containing C and a coercive function V , bounded
on compacta, such that
ΔV(x) ≤ 0,  x ∈ C_0^c.    (9.31)
For any fixed n and any x ∈ A_0^c we have from the Markov property that the sequence
V_n(x) satisfies, for x ∈ A_0^c ∩ D_n^c,

∫ P(x, dy)V_n(y) = E_x[ P_{Φ_1}{σ_{D_n} < σ_{A_0}} ]
                = P_x{σ_{D_n} < σ_{A_0}}    (9.33)
                = V_n(x).

The function V, which clearly satisfies the appropriate drift condition by linearity from
(9.34) if finitely defined, gives the required converse result.
Since Vn (x) = 1 on Dn , it is clear that V is coercive. To complete the proof we must
show that the sequence {ni } can be chosen to ensure that V is bounded on compact
sets, and it is for this we require the Feller property.
Let m ∈ Z+ and take the upper bound
Choose the sequence {ni } as follows. By Proposition 6.1.1, the function Px {σA 0 > m}
is an upper semicontinuous function of x, which converges to zero as m → ∞ for all
x. Hence the convergence is uniform on compacta, and thus we can choose mi so large
that
P_x{σ_{A_0} > m_i} < 2^{−(i+1)},  x ∈ A_i.    (9.37)

Now for m_i fixed for each i, consider P_x{σ_{D_n} < m_i}: as a function of x this is also
upper semicontinuous and converges to zero as n → ∞ for all x. Hence again we see
that the convergence is uniform on compacta, which implies we may choose n_i so large
that

P_x{σ_{D_{n_i}} < m_i} < 2^{−(i+1)},  x ∈ A_i.    (9.38)
Combining (9.36), (9.37) and (9.38) we see that V_{n_i}(x) ≤ 2^{−i} for x ∈ A_i. From (9.35) this
implies, finally, for all k ∈ Z_+ and x ∈ A_k

V(x) ≤ k + Σ_{i=k}^∞ V_{n_i}(x)
     ≤ k + Σ_{i=k}^∞ 2^{−i}
     ≤ k + 1,    (9.39)
Lemma 9.4.4. Let W be a random variable with distribution function Γ and finite
variance. Let s, c, u2 , and v2 be positive numbers, and let t1 ≥ t2 and u1 , v1 , t be real
numbers. Then
(i)

lim_{x→∞} x²[ −Γ(t_2 + sx, ∞) log(v_1 + v_2 x) + Γ(t_1 + sx, ∞)(log(u_1 + u_2 x) − c) ] ≤ 0.    (9.41)
and

lim_{x→−∞} log[(u_1 − u_2 x)/(v_1 − v_2 x)] = log(u_2/v_2),

we have

lim_{x→−∞} x²[ −Γ(−∞, t_1 + sx) log(u_1 − u_2 x) + Γ(−∞, t_2 + sx)(log(v_1 − v_2 x) − c) ]
  = lim_{x→−∞} [ −x²(Γ(−∞, t_1 + sx) − Γ(−∞, t_2 + sx)) log(u_1 − u_2 x)
                 − x²Γ(−∞, t_2 + sx) log[(u_1 − u_2 x)/(v_1 − v_2 x)] − c x²Γ(−∞, t_2 + sx) ]
and V (x) = 0 in the region [−R, R], where R > 1 is again a positive constant to be
chosen.
We need to evaluate the behavior of Ex [V (X1 )] near both ∞ and −∞ in this case,
and we write
V1 (x) = Ex [log(1 + x + W )I{x + W > R}]
V2 (x) = Ex [log(1 − x − W )I{x + W < −R}]
so that
Ex [V (X1 )] = V1 (x) + V2 (x).
This time we develop bounds using the functions
and by Lemma 8.5.3, both V_3 and V_5 are also o(x^{−2}). By Lemma 9.4.4 (i) we also have
The situation with x < −R is exactly symmetric, and thus we have that V is a coercive
function satisfying (V1); and so the chain is non-evanescent from Theorem 9.4.1.
already somewhat linear in structure, such as those based on the random walk: we have
already seen this in our analysis of random walk on the half line in Section 8.4.3.
Such increment analysis is of value in many models, especially if combined with
“stochastic comparison” arguments, which rely heavily on the classification of chains
through return time probabilities.
In this section we will further use the stochastic comparison approach to discuss
the structure of scalar linear models and general random walk on R, and the special
nonlinear SETAR models; we will then consider an increment analysis of general models
on R+ which have no inherent linearity in their structure.
This is not uncommon if the chains have similarly defined structure, as is the case with
random walk and the associated walk on a half line.
The stochastic comparison method tells us that a classification of one of the chains
may automatically classify the other.
In one direction we have, provided C is a petite set for both chains, that when
P_x(τ_C ≥ n) → 0 as n → ∞ for x ∈ C^c, then not only is Φ Harris recurrent, but Φ′ is
also Harris recurrent.
This is obvious. Its value arises in cases where the first chain Φ has a (relatively)
simpler structure so that its analysis is straightforward through, say, drift conditions,
and when the validation of (9.44) is also relatively easy.
In many ways stochastic comparison arguments are even more valuable in the tran-
sient context: as we have seen with random walk, establishing transience may need a
rather delicate argument, and it is then useful to be able to classify “more transient”
chains easily.
Suppose that (9.44) holds, and again that C is a ϕ-irreducible petite set for both
chains. Then if Φ is transient, we know from Theorem 8.3.6 that there exists
D ⊂ C^c such that L(x, C) < 1 − ε for x ∈ D where ϕ(D) > 0; it then follows that Φ′
is also transient.
We first illustrate the strengths and drawbacks of this method in proving transience
for the general random walk on the half line R+ .
Proof Consider the discretized version W_h of the increment variable W with distribution

P(W_h = nh) = Γ_h(nh),

where Γ_h(nh) is constructed by setting, for every n,

Γ_h(nh) = ∫_{nh}^{(n+1)h} Γ(dw),
and let Φ_h be the corresponding random walk on the countable half line {nh, n ∈ Z_+}. Then we have firstly that for any starting point nh, the chain Φ_h is "stochastically smaller" than Φ, in the sense that if τ_0^h is the first return time to zero by Φ_h then
Proposition 9.5.2. Suppose the increment variable W in the scalar linear model is
symmetric with density positive everywhere on [−R, R] and zero elsewhere. Then the
scalar linear model is Harris recurrent if and only if |α| ≤ 1.
Proof The linear model is, under the conditions on W, a µ_Leb-irreducible chain on R with all compact sets petite.
Suppose α > 1. By stochastic comparison of this model with a random walk Φ on
a half line with mean increment α − 1 it is obvious that provided the starting point
x > 1, then (9.44) holds with C = (−∞, 1]. Since this set is transient for the random
walk, as we have just shown, it must therefore be transient for the scalar linear model.
Provided the starting point x < −1, then by symmetry, the hitting times on the set
C = [−1, ∞) are also infinite with positive probability. This argument does not require
bounded increments.
If α < −1 then the chain oscillates. If the range of W is contained in [−R, R], with
R > 1, then by choosing x > R we have by symmetry that the hitting time of the chain
X0 , −X1 , X2 , −X3 , . . . on C = (−∞, 1] is stochastically bounded below by the hitting
time of the previous linear model with parameter |α|; thus the set [−R, R] is uniformly
transient for both models.
Thirdly, suppose that 0 < α ≤ 1. Then by stochastic comparison with random
walk on a half line and mean increment α − 1, from x > R we have that the hitting
time on [−R, R] of the linear model is bounded above by the hitting time on [−R, R] of
the random walk; whilst by symmetry the same is true from x < −R. Since we know
random walk is Harris recurrent it follows that the linear model is Harris recurrent.
Finally, by considering an oscillating chain we have the same recurrence result for
−1 ≤ α ≤ 0.
The points to note in this example are
(ii) even with α > 0, recurrence arguments on the whole line are also difficult to
get to work. They tend to guarantee that the hitting times on half lines such as
C = (−∞, 1] are finite, and since these sets are not compact, we do not have a
guarantee of recurrence: indeed, for transient oscillating linear systems such half
lines are reached on alternate steps with higher and higher probability.
Thus in the case of unbounded increments more delicate arguments are usually needed,
and we illustrate one such method of analysis next.
Φ_n = Φ_{n−1} + W_n.
This is easy to analyze in the transient situation using stochastic comparison arguments,
given the results already proved.
Proof Suppose that the mean increment β of the random walk Φ is positive. Then the hitting time τ_{(−∞,0]} on (−∞, 0] from an initial point x > 0 is the same as the hitting time on {0} itself for the associated random walk on the half line; and we have shown this to be infinite with positive probability. So the unrestricted walk is also transient.
The argument if β < 0 is clearly symmetric.
This model is non-evanescent when β = 0, as we showed under a finite variance
assumption in Proposition 9.4.5.
Now let us consider the more complex SETAR model

X_n = φ(j) + θ(j)X_{n−1} + W_n(j), X_{n−1} ∈ R_j,

where −∞ = r_0 < r_1 < · · · < r_M = ∞ and R_j = (r_{j−1}, r_j]; recall that for each j, the noise variables {W_n(j)} form independent zero-mean noise sequences, and again let W(j) denote a generic variable in the sequence {W_n(j)}, with distribution Γ_j.
We will see in due course that under a second-order moment condition (SETAR3),
we can identify exactly the regions of the parameter space where this nonlinear chain
is transient, recurrent and so on.
Here we establish the parameter combinations under which transience will hold:
these are extensions of the non-zero mean increment regions of the random walk we
have just looked at.
As suggested by Figure B.1–Figure B.3, let us call the exterior of the parameter space the area defined by

θ(1) > 1 (9.46)
θ(M) > 1 (9.47)
θ(1) = 1, θ(M) ≤ 1, φ(1) < 0 (9.48)
θ(1) ≤ 1, θ(M) = 1, φ(M) > 0 (9.49)
θ(1) < 0, θ(1)θ(M) > 1 (9.50)
θ(1) < 0, θ(1)θ(M) = 1, φ(M) + θ(M)φ(1) < 0 (9.51)
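Read as a predicate on (θ(1), θ(M), φ(1), φ(M)), the exterior is the union of the six regions above; a minimal sketch (the helper name is ours, not from the text):

    def in_exterior(theta1, thetaM, phi1, phiM):
        # Direct transcription of the regions (9.46)-(9.51); the equality
        # tests mirror the boundary cases (9.48), (9.49) and (9.51).
        return (theta1 > 1                                          # (9.46)
                or thetaM > 1                                       # (9.47)
                or (theta1 == 1 and thetaM <= 1 and phi1 < 0)       # (9.48)
                or (theta1 <= 1 and thetaM == 1 and phiM > 0)       # (9.49)
                or (theta1 < 0 and theta1 * thetaM > 1)             # (9.50)
                or (theta1 < 0 and theta1 * thetaM == 1
                    and phiM + thetaM * phi1 < 0))                  # (9.51)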
In order to make the analysis more straightforward we will make the following assumption as appropriate.

(SETAR3) The variances of the noise distributions for the two end intervals are finite; that is,

E[W(1)^2] < ∞, E[W(M)^2] < ∞.
Proposition 9.5.4. For the SETAR model satisfying the assumptions (SETAR1)–
(SETAR3), the chain is transient in the exterior of the parameter space.
Proof Suppose (9.47) holds. Then the chain is transient, as we show by stochastic comparison arguments. For until the first time the chain enters (−∞, r_{M−1}] it follows the sample paths of a model

X_n = φ(M) + θ(M)X_{n−1} + W_n(M),

and for this linear model P_x(τ_{(−∞,0)} < ∞) < 1 for all sufficiently large x, as in the proof of Proposition 9.5.2, by comparison with random walk.
When (9.46) holds, the chain is transient by symmetry: we find P_x(τ_{(0,∞)} < ∞) < 1 for all sufficiently negative x.
When (9.50) holds the same argument can be used, but now for the two step chain:
the one-step chain undergoes larger and larger oscillations and thus there is a positive
probability of never returning to the set [r1 , rM −1 ] for starting points of sufficiently
large magnitude.
Suppose (9.48) holds and begin the process at x_0 < min(0, r_1). Then until the first time the process exits (−∞, min(0, r_1)), it has exactly the sample paths of a random walk with negative drift, which we showed to be transient in Section 8.5. The proof of transience when (9.49) holds is similar.
We finally show the chain is transient if (9.51) holds, and for this we need (SETAR3).
Here we also need to exploit Theorem 8.4.2 directly rather than construct a stochastic
comparison argument.
Let a and b be positive constants such that −b/a = θ(1) = 1/θ(M). Since φ(M) + θ(M)φ(1) < 0 we can choose u and v such that −aφ(1) < au + bv < −bφ(M). Choose c positive such that

and

δ(x) = φ(M) + θ(M)x + u.
If we write

V_0(x) = −a^{−1} E[(1/(δ(x) + W(M))) I{W(M) > c/a − δ(x)}],
V_1(x) = −c^{−1} P(−c/b − λ(x) < W(M) < c/a − δ(x)),
V_2(x) = 1/(a(x + u)) + b^{−1} E[(1/(λ(x) + W(M))) I{W(M) < −c/b − λ(x)}],

then we get

E_x[V(X_1)] = V(x) + V_0(x) + V_1(x) + V_2(x). (9.52)
It is easy to show that both V_0(x) and V_1(x) are o(x^{−2}). Since

0 ≥ −x^2 W(M)/[λ(x)(λ(x) + W(M))]
  ≥ −x^2 W(M)(1 + bW(M)/c)/λ^2(x)
  ≥ −2W(M)(1 + bW(M)/c)/θ^2(M); (9.53)
1/(1 + W(M)/λ(x)) ≤ 1
and so

0 ≤ −x^2 W(M)/[λ(x)(λ(x) + W(M))]
  ≤ −x^2 W(M)/λ^2(x)
  ≤ −2W(M)/θ^2(M). (9.54)
E_x[V(X_1)] ≥ V(x).
We may thus apply Theorem 8.4.2 with the set C taken to be [−R, R] and the test
function V above to conclude that the process is transient.
W_x = {Φ_1 − Φ_0 | Φ_0 = x}, (9.57)

where W_x has distribution Γ_x, with mean m(x) and variance v(x).
We will now show that there is a threshold or detailed balance effect between these two
quantities in considering the stability of the chain.
For ease of exposition let us consider the case where the increments again have
uniformly bounded range: that is, for some R and all x,
Γ_x[−R, R] = 1. (9.58)
To avoid somewhat messy calculations such as those for the random walk or SETAR models above we will fix the state space as R_+ and we will make the assumption that the measures Γ_x give sufficient weight to the negative half line to ensure that the chain is a δ_0-irreducible T-chain and also that v(x) is bounded away from zero: this ensures that recurrence means that τ_0 is finite with probability one and that transience means that P_0(τ_0 < ∞) < 1. The δ_0-irreducibility and T-chain properties will of course follow from assuming, for example, that Γ_x(−∞, −ε) > ε for some ε > 0.
(i) if there exists θ < 1 and x_0 such that for all x > x_0

m(x) ≤ θ v(x)/(2x),

then Φ is recurrent;

(ii) if there exists θ > 1 and x_0 such that for all x > x_0

m(x) ≥ θ v(x)/(2x),

then Φ is transient.
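Under the reading of the conditions given above, the threshold is easy to probe by simulation; a minimal Monte Carlo sketch (entirely illustrative) of a chain with m(x) ≈ c/x and v(x) ≈ 1, for which the boundary falls at c = 1/2:

    import random

    def first_return(c, x0=10.0, horizon=10**5):
        # Chain on R+ whose increment at x is +/-1 plus a drift c/x, so that
        # m(x) ~ c/x and v(x) ~ 1: recurrence is expected for c < 1/2 and
        # transience for c > 1/2.
        x = x0
        for n in range(1, horizon + 1):
            w = random.choice((-1.0, 1.0)) + c / max(x, 1.0)
            x = max(x + w, 0.0)
            if x <= 1.0:
                return n        # time of return to the bottom set [0, 1]
        return None             # no return observed within the horizon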
and using the bounded range of the increments, the integral in (9.62) after a Taylor series expansion is, for x > R,

∫_{−R}^{R} Γ_x(dw) [w/(x + 1) − w^2/(2(x + 1)^2) + o(x^{−2})]
    = m(x)/(x + 1) − v(x)/(2(x + 1)^2) + o(x^{−2}). (9.63)
and hence from Theorem 9.1.8 we have that the chain is recurrent.
(ii) It is obvious with the assumption of positive mean for Γ_x that for any x the sets [0, x] and [x, ∞) are both in B^+(X).
In order to use Theorem 9.1.8, we will establish that for some suitable monotonic increasing V

∫_X P(x, dy)V(y) ≥ V(x). (9.64)
Applying Taylor's Theorem we see that the integral in (9.66) equals

αm(x)/(x + 1)^{1+α} − αv(x)/(2(x + 1)^{2+α}) + O(x^{−3−α}). (9.67)
Now choose α < θ − 1. For sufficiently large x_0, if x > x_0 then from (9.67) we have that (9.66) holds, and so the chain is transient.
The fact that this detailed balance between first and second moments is a determi-
nant of the stability properties of the chain is not surprising: on the space R+ all of the
drift conditions are essentially linearizations of the motion of the chain, and virtually
independently of the test functions chosen, a two-term Taylor series expansion will lead
to the results we have described.
One of the more interesting and rather counter-intuitive facets of these results is
that it is possible for the first-order mean drift m(x) to be positive and for the chain to
still be recurrent: in such circumstances it is the occasional negative jump thrown up
by a distribution with a variance large in proportion to its general positive drift which
will give recurrence.
Some weakening of the bounded range assumption is obviously possible for these results: the proofs then necessitate a rather more subtle analysis and expansion of the integrals involved. By choosing the iterated logarithm as the test function for recurrence, and by more detailed analysis of the function

V(x) = 1 − [1 + x]^{−α}

as a test for transience, it is in fact possible to develop the following result, whose proof we omit.
(i) if there exists δ > 0 and x_0 such that for all x > x_0
then Φ is recurrent;
(ii) if there exists θ > 1 and x_0 such that for all x > x_0
then Φ is transient.
The bounds on the spread of Γ_x may seem to be artifacts of the methods of proof used, and of course we well know that the zero-mean random walk is recurrent even though a proof using an approach based upon a drift condition has not yet been developed, to our knowledge.
We conclude this section with a simple example showing that we cannot expect to
drop the higher moment conditions completely.
Let X = Z_+, and let

P(x, x + 1) = 1 − c/x, P(x, 0) = c/x, x ≥ 1,

with P(0, 1) = 1.
Then the chain is easily shown to be recurrent by a direct calculation that for all n > 1

P_0(τ_0 > n) = Π_{x=1}^{n} [1 − c/x],

which tends to zero as n → ∞. On the other hand, m(x) = 1 − c − c/x and v(x) ∼ cx as x → ∞, so that 2x m(x) − v(x) ∼ (2 − 3c)x, which is clearly positive for c < 2/3: hence if Theorem 9.5.6 were applicable we should have the chain transient.
Of course, in this case we have E[|W_x|^3] = (1 − c/x) + (c/x)x^3 ∼ cx^2, and the bound on this higher moment, required in the proof of Theorem 9.5.6, is obviously violated.
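Under the transition law as reconstructed above (the reading P(x, x + 1) = 1 − c/x, P(x, 0) = c/x is an assumption), the sure-but-slow returns are easy to see by simulation:

    import random

    def return_time(c=0.5, max_steps=10**7):
        # From 0 the chain steps to 1; from x >= 1 it returns to 0 with
        # probability c/x and otherwise moves up to x + 1 (assumed law).
        x = 1
        for n in range(1, max_steps):
            if random.random() < c / x:
                return n + 1    # the step back to 0
            x += 1
        return None

    # Returns are sure -- prod_x (1 - c/x) tends to zero -- even though the
    # one-step mean drift 1 - c - c/x is positive for c < 1.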
9.6 Commentary
Harris chains are named after T. E. Harris who introduced many of the essential ideas
in [155]. The important result in Theorem 9.1.3, which enables the properties of Q to
be linked to those of L, is due to Orey [308], and our proof follows that in [309]. That
recurrent chains are “almost” Harris was shown by Tuominen [390], although the key
links between the powerful Harris properties and other seemingly weaker recurrence
properties were developed initially by Jain and Jamison [172].
We have taken the proof of transience for random walk on Z using the Strong Law
of Large Numbers from Spitzer [369].
Non-evanescence is a common form of recurrence for chains on R^k: see, for example,
Khas’minskii [206]. The links between evanescent and transient chains, and the equiva-
lence between Harris and non-evanescent chains under the T-chain condition, are taken
from Meyn and Tweedie [277], who proved Theorem 9.2.2. Most of the connections
between neighborhood and global behavior of chains are given by Rosenblatt [338, 339]
and Tuominen and Tweedie [391].
The criteria for non-evanescence or Harris recurrence here are of course closely re-
lated to those in the previous chapter. The martingale argument for non-evanescence
is in [277] and [398], but can be traced back in essentially the same form to Lamperti
[234]. The converse to the recurrence criterion under the Feller condition, and the fact
that it does not hold in general, are new: the construction of the converse function V
is however based on a similar result for countable chains, in Mertens et al. [258].
The term “coercive” to describe functions whose sublevel sets are precompact is
new. The justification for the terminology is that coercive functions do, in most of
our contexts, measure the distance from a point to a compact “center” of the state
space. This will become clearer in later chapters when we see that under a suitable
drift condition, the mean time to reach some compact set from Φ0 = x is bounded by
a constant multiple of V (x). Hence V (x) bounds the mean “distance” to this compact
set, measured in units of time. Beneš in [24] uses the term moment for these functions.
Since “moments” are standard in referring to the expectations of random variables, this
terminology is obviously inappropriate here.
Stochastic comparison arguments have been used for far too long to give a detailed
attribution. For proving transience, in particular, they are a most effective tool. The
analysis we present here of the SETAR model is essentially in Petruccelli et al. [315]
and Chan et al. [64].
The analysis of chains via their increments, and the delicate balance required be-
tween m(x) and v(x) for recurrence and transience, is found in Lamperti [234]; see also
Tweedie [398]. Growth models for which m(x) ≥ θv(x)/2x are studied by, for example,
Kersting (see [205]), and their analysis via suitable renormalization proves a fruitful
approach to such transient chains.
It may appear that we are devoting a disproportionate amount of space to unstable
chains, and too little to chains with stability properties. This will be rectified in the
rest of the book, where we will be considering virtually nothing but chains with ever
stronger stability properties.
Chapter 10
The existence of π
In our treatment of the structure and stability concepts for irreducible chains we have
to this point considered only the dichotomy between transient and recurrent chains.
For transient chains there are many areas of theory that we shall not investigate
further, despite the flourishing research that has taken place in both the mathematical
development and the application of transient chains in recent years. Areas which are
notable omissions from our treatment of Markovian models thus include the study of
potential theory and boundary theory [326], as well as the study of renormalized models
approximated by diffusions and the quasi-stationary theory of transient processes [108,
4].
Rather, we concentrate on recurrent chains which have stable properties without
renormalization of any kind, and develop the consequences of the concept of recurrence.
In this chapter we further divide recurrent chains into positive and null recurrent
chains, and show here and in the next chapter that the former class provide stochastic
stability of a far stronger kind than the latter.
For many purposes, the strongest possible form of stability that we might require
in the presence of persistent variation is that the distribution of Φn does not change as
n takes on different values. If this is the case, then by the Markov property it follows
that the finite dimensional distributions of Φ are invariant under translation in time.
Such considerations lead us to the consideration of invariant measures.
Invariant measures
A σ-finite measure π on B(X) with the property

π(A) = ∫_X π(dx) P(x, A), A ∈ B(X), (10.1)

will be called invariant.
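On a finite state space, (10.1) reduces to the left-eigenvector equation πP = π with π normalized to a probability; a minimal numerical sketch (the matrix is illustrative, not from the text):

    import numpy as np

    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])
    # pi P = pi: take the left eigenvector of P for eigenvalue 1 and
    # normalize it to a probability vector.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    assert np.allclose(pi @ P, pi)   # exactly (10.1) in matrix form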
Theorem 10.0.1. If the chain Φ is recurrent then it admits a unique (up to constant multiples) invariant measure π, and the measure π has the representation, for any A ∈ B^+(X),

π(B) = ∫_A π(dw) E_w[ Σ_{n=1}^{τ_A} I{Φ_n ∈ B} ], B ∈ B(X). (10.2)

The invariant measure π is finite (rather than merely σ-finite) if there exists a petite set C such that

sup_{x∈C} E_x[τ_C] < ∞.
Proof The existence and representation of invariant measures for recurrent chains
is proved in full generality in Theorem 10.4.9: the proof exploits, via the Nummelin split-
ting technique, the corresponding theorem for chains with atoms as in Theorem 10.2.1,
in conjunction with a representation for invariant measures given in Theorem 10.4.9.
The criterion for finiteness of π is in Theorem 10.4.10.
If an invariant measure is finite, then it may be normalized to a stationary probabil-
ity measure, and in practice this is the main stable situation of interest. If an invariant
measure has infinite total mass, then its probabilistic interpretation is much more dif-
ficult, although for recurrent chains, there is at least the interpretation as described in
(10.2).
These results lead us to define the following classes of chains.
Proof Suppose that the chain is positive and let π be an invariant probability measure. If the chain is also transient, let A_j be a countable cover of X with uniformly transient sets, as guaranteed by Theorem 8.3.4, with U(x, A_j) ≤ M_j, say.
Using (10.4) we have for any j, k

k π(A_j) = Σ_{n=1}^{k} ∫ π(dw) P^n(w, A_j) ≤ M_j,

and since the left hand side remains finite as k → ∞, we have π(A_j) = 0. This implies π is trivial, so we have a contradiction.
Positive chains are often called “positive recurrent” to reinforce the fact that they
are recurrent. This also naturally gives the definition
It is of course not yet clear that an invariant probability measure π ever exists, or
whether it will be unique when it does exist. It is the major purpose of this chapter to
find conditions for the existence of π, and to prove that for any positive (and indeed
recurrent) chain, π is essentially unique.
Invariant probability measures are important not merely because they define sta-
tionary processes. They will also turn out to be the measures which define the long
term or ergodic behavior of the chain. To understand why this should be plausible,
Subinvariant measures

If µ is σ-finite and satisfies

µ(A) ≥ ∫_X µ(dx) P(x, A), A ∈ B(X), (10.5)

then µ will be called subinvariant.
(ii) µ ≻ ψ;
Proof Suppose µ(A) < ∞ for some A with ψ(A) > 0. Using A*(j) = {y : K_{a_{1/2}}(y, A) > j^{−1}}, we have by (10.6),

∞ > µ(A) ≥ ∫_{A*(j)} µ(dw) K_{a_{1/2}}(w, A) ≥ j^{−1} µ(A*(j));

since ∪_j A*(j) = X when ψ(A) > 0, such a µ must be σ-finite.
To prove (ii) observe that, by (10.6), if B ∈ B^+(X) we have µ(B) > 0, so µ ≻ ψ.
Thirdly, if C is ν_a-petite then there exists a set B with ν_a(B) > 0 and µ(B) < ∞, from (i). By (10.6) we have

µ(B) ≥ ∫ µ(dw) K_a(w, B) ≥ µ(C) ν_a(B), (10.7)

so that µ(C) < ∞.
The major questions of interest in studying subinvariant measures lie with recurrent
chains, for we always have
Proposition 10.1.3. If the chain Φ is transient, then there exists a strictly subinvari-
ant measure for Φ.
Proof Suppose that Φ is transient: then by Theorem 8.3.4, we have that the measures µ_x given by

µ_x(A) = U(x, A), A ∈ B(X),

are σ-finite; and since

U(x, A) = P(x, A) + ∫_X U(x, dy) P(y, A) ≥ ∫_X µ_x(dy) P(y, A),

each µ_x is subinvariant (and obviously strictly subinvariant, since there is some A with µ_x(A) < ∞ such that P(x, A) > 0).
(ii) The measure µ◦α is minimal in the sense that if µ is subinvariant with µ(α) = 1, then

µ(A) ≥ µ◦α(A), A ∈ B(X).

(iii) The measure µ◦α is finite if and only if

E_α[τ_α] < ∞.

where the inequality comes from the bound µ◦α(α) ≤ 1. Thus µ◦α is subinvariant, and is invariant if and only if µ◦α(α) = P_α(τ_α < ∞) = 1; that is, from Proposition 8.3.1, if and only if the chain is recurrent.
(ii) Let µ be any subinvariant measure with µ(α) = 1. By subinvariance,

µ(A) ≥ ∫_X µ(dw) P(w, A) ≥ µ(α) P(α, A) = P(α, A).

Assume inductively that µ(A) ≥ Σ_{m=1}^{n} αP^m(α, A) for all A. Then by subinvariance,

µ(A) ≥ µ(α) P(α, A) + ∫_{α^c} µ(dw) P(w, A)
     ≥ P(α, A) + ∫_{α^c} [ Σ_{m=1}^{n} αP^m(α, dw) ] P(w, A)
     = Σ_{m=1}^{n+1} αP^m(α, A).

Hence, letting n → ∞, we must have µ ≥ µ◦α; and thus when Φ is recurrent, µ◦α is the unique (sub)invariant measure.
(iii) If µ◦α is finite it follows from Proposition 10.1.2 (iv) that µ◦α is invariant. Finally,

µ◦α(X) = Σ_{n=1}^{∞} P_α(τ_α ≥ n), (10.12)

and so an invariant probability measure exists if and only if the mean return time to α is finite, as stated.
We shall use π to denote the unique invariant measure in the recurrent case. Unless
stated otherwise we will assume π is normalized to be a probability measure when π(X)
is finite.
The invariant measure µ◦α has an equivalent sample path representation for recurrent chains:

µ◦α(A) = E_α[ Σ_{n=1}^{τ_α} I{Φ_n ∈ A} ], A ∈ B(X). (10.13)

This follows from the definition of the taboo probabilities αP^n.
As an immediate consequence of this construction we have the following elegant
criterion for positivity.
Theorem 10.2.2 (Kac's Theorem). If Φ is ψ-irreducible and admits an atom α ∈ B^+(X), then Φ is positive recurrent if and only if E_α[τ_α] < ∞; and if π is the invariant probability measure for Φ, then

π(α) = (E_α[τ_α])^{−1}. (10.14)

Proof If E_α[τ_α] < ∞, then also L(α, α) = 1, and by Proposition 8.3.1 Φ is recurrent; it follows from the structure of π in (10.10) that π is finite so that the chain is positive.
Conversely, E_α[τ_α] < ∞ when the chain is positive from the structure of the unique invariant measure.
By the uniqueness of the invariant measure normalized to be a probability measure π we have

π(α) = µ◦α(α)/µ◦α(X) = U_α(α, α)/U_α(α, X) = 1/E_α[τ_α],

which is (10.14).
The relationship (10.14) is often known as Kac’s Theorem. For countable state space
models it immediately gives us
Proposition 10.2.3. For a positive recurrent irreducible Markov chain on a countable
space, there is a unique (up to constant multiples) invariant measure π given by
π(x) = [E_x[τ_x]]^{−1}
for every x ∈ X.
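Kac's formula is easy to check by simulation on a small chain; a minimal Monte Carlo sketch (the transition law is illustrative, not from the text), comparing the empirical occupation frequency of a state with the reciprocal of its mean return time:

    import random

    # Illustrative three-state chain.
    P = {0: [(0, 0.5), (1, 0.5)],
         1: [(0, 0.25), (1, 0.5), (2, 0.25)],
         2: [(1, 0.5), (2, 0.5)]}

    def step(x):
        u, acc = random.random(), 0.0
        for y, p in P[x]:
            acc += p
            if u < acc:
                return y
        return P[x][-1][0]

    x0, x, visits, last, steps = 0, 0, 0, 0, 10**6
    returns = []
    for n in range(1, steps + 1):
        x = step(x)
        if x == x0:
            visits += 1
            returns.append(n - last)
            last = n
    # Empirical pi(x0) and mean return time: their product should be close
    # to one, exactly as (10.14) asserts.
    print(visits / steps, sum(returns) / len(returns))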
We now illustrate the use of the representation of π for a number of countable space
models.
As noted in Section 8.1.2, this chain is always recurrent since Σ_j p(j) = 1.
By construction we have that

1P^n(1, j) = p(j + n − 1), j, n ≥ 1,

so that, taking α = {1}, the minimal invariant measure is µ◦α(j) = Σ_{n≥1} 1P^n(1, j) = Σ_{k≥j} p(k); this measure is finite if and only if Σ_j Σ_{k≥j} p(k) = Σ_k k p(k) < ∞, that is, if and only if the renewal distribution {p(i)} has finite mean.
It is, of course, equally easy to deduce this formula by solving the invariant equations themselves, but the result is perhaps more illuminating from this approach.
Now suppose that the distribution {p(j)} is periodic with period d: that is, the greatest common divisor of the set N_p = {n : p(n) > 0} is d. Let [N_p] denote the span of N_p,

[N_p] = { Σ_i m_i r_i : m_i ∈ Z_+, r_i ∈ N_p }.

We have P^n(j, 1) > 0 whenever n − j + 1 ∈ [N_p].
By Lemma D.7.4 there exists an integer n_0 < ∞ such that nd ∈ [N_p] for all n ≥ n_0. If d = 1 it follows that the forward recurrence time process V^+ is aperiodic, since in this case

P^n(j, 1) > 0, n − j + 1 ≥ n_0. (10.18)
This chain is constructed by taking the two independent copies V_1^+(n), V_2^+(n) of the forward recurrence time chain and running them independently. It then follows from (10.18) that V^* is ψ-irreducible if {p(j)} has period d = 1.
Moreover V^* is positive Harris recurrent on X^* provided only Σ_k k p(k) < ∞, as was the case for the single copy of the forward recurrence time chain. To prove this we need only note that the product measure π^*(i, j) = π(i)π(j) is invariant for V^*, where

π(j) = Σ_{k≥j} p(k) / Σ_k k p(k)
is the invariant probability measure for the forward recurrence time process from (10.16) and (10.17); positive Harris recurrence follows since π^*(X^*) = [π(X)]^2 = 1.
These conditions for positive recurrence of the bivariate forward time process will
be of critical use in the development of the asymptotic properties of general chains in
Part III.
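The invariance of π(j) ∝ Σ_{k≥j} p(k) is easy to confirm numerically for a truncated renewal distribution; a small sketch (illustrative only), recalling that the forward recurrence time chain moves from j > 1 to j − 1 and jumps from 1 to j with probability p(j):

    import numpy as np

    p = np.array([0.0, 0.2, 0.3, 0.5])   # p(k), k = 0,...,3; p(0) = 0
    K = len(p) - 1
    # Forward recurrence time chain on {1, ..., K}: deterministic count-down,
    # renewal jump from state 1.
    P = np.zeros((K, K))
    P[0, :] = p[1:]
    for j in range(1, K):
        P[j, j - 1] = 1.0
    pi = np.array([p[j:].sum() for j in range(1, K + 1)])
    pi = pi / pi.sum()                    # normalization by sum_k k p(k)
    assert np.allclose(pi @ P, pi)        # pi is invariant, as claimed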
where q_i = P(Z = i − 1) for the increment variable in the chain when the server is busy; that is, for transitions from states other than {0}. The chain N^* is always ψ-irreducible if q_0 > 0, and irreducible in the standard sense if also q_0 + q_1 < 1, and we shall assume this to be the case to avoid trivialities.
In this case, we can actually solve the invariant equations explicitly. For j ≥ 1, (10.1) can be written

π(j) = Σ_{k=0}^{j+1} π(k) q_{j+1−k}, (10.20)

and if we define

q̄_j = Σ_{n=j+1}^{∞} q_n,

so, since β > −1, we must have β < 0. Conversely, if β < 0, and we take

π(0) = −β,

then the same summation (10.21) indicates that the invariant measure π is finite.
Thus we have

Proposition 10.3.1. The chain N^* is positive if and only if the increment distribution satisfies β = Σ_j j q_j − 1 < 0.
This same type of direct calculation can be carried out for any so-called “skip-free”
chain with P (i, j) = 0 for j < i − 1, such as the forward recurrence time chain above.
For other chains it can be far less easy to get a direct approach to the invariant measure
through the invariant equations, and we turn to the representation in (10.10) for our
results.
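The direct calculation for a skip-free chain can be mimicked numerically; a minimal sketch (the function name is ours, and the boundary law P(0, j) = q_j is an assumption, since the display defining the chain is not reproduced above), solving the balance equation at each level for the next value of π:

    def stationary_skip_free(q, n):
        # q[i] = P(Z = i - 1). Assumed law: P(0, j) = q[j] and
        # P(i, j) = q[j - i + 1] for i >= 1 (skip-free to the left).
        # Solve the balance equation at level j for pi(j + 1); needs q[0] > 0.
        pi = [1.0]
        for j in range(n - 1):
            s = pi[0] * (q[j] if j < len(q) else 0.0)
            s += sum(pi[k] * q[j + 1 - k]
                     for k in range(1, j + 1) if 0 <= j + 1 - k < len(q))
            pi.append((pi[j] - s) / q[0])
        z = sum(pi)
        return [v / z for v in pi]

    # Example: q = [0.6, 0.2, 0.2] has mean increment -0.4 < 0, so the
    # normalized solution is the invariant probability measure.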
[k]P^r(k, j) = αP^r(α, j − k). (10.23)
and since µ◦α is minimal it must be the smallest solution to (10.25). As is well known, there are two cases to consider: since the function of s on the right hand side of (10.25) is strictly convex, a solution s ∈ (0, 1) exists if and only if

Σ_{j=0}^{∞} j p_j > 1,

whilst if Σ_j j p_j ≤ 1 then the minimal solution to (10.25) is s_α = 1.
One can then verify directly that in each of these cases µ◦α solves all of the invariant equations, as required. In particular, if Σ_j j p_j = 1 so that the chain is recurrent from the remarks following Proposition 9.1.2, the unique invariant measure is µ◦α(x) ≡ 1, x ∈ X: note that in this case, in fact, the first invariant equation is exactly

1 = Σ_{j≥0} Σ_{n>j} p_n = Σ_j j p_j.
Hence for recurrent chains (those for which Σ_j j p_j ≥ 1) we have shown
Proposition 10.3.2. The unique subinvariant measure for N is given by µ_α(k) = s_α^k, where s_α is the minimal solution to (10.25) in (0, 1]; and N is positive recurrent if and only if Σ_j j p_j > 1.
The geometric form (10.24), as a “trial solution” to the equation (10.1), is often
presented in an arbitrary way: the use of Theorem 10.2.1 motivates this solution, and
also shows that sα in (10.24) has an interpretation as the expected number of visits to
state k + 1 from state k, for any k.
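Assuming (10.25) is the usual generating-function fixed-point equation s = Σ_j p_j s^j (its display is not reproduced above), the minimal solution can be computed by iterating from s = 0, which converges monotonically to the smallest root of this convex map:

    def minimal_root(p, tol=1e-12, max_iter=10**6):
        # Fixed-point iteration for s = sum_j p[j] * s**j starting at 0;
        # the iterates increase to the minimal solution in (0, 1].
        s = 0.0
        for _ in range(max_iter):
            s_next = sum(pj * s**j for j, pj in enumerate(p))
            if abs(s_next - s) < tol:
                return s_next
            s = s_next
        return s

    # Example: p = [0.25, 0.25, 0.5] has mean 1.25 > 1, and minimal_root(p)
    # returns 0.5 < 1, consistent with positive recurrence above.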
Proof To prove (i) note that by (5.5), (5.6), and (5.7), we have that the measure P̌(x_i, ·) is of the form µ*_{x_i} for any x_i ∈ X̌, where µ_{x_i} is a probability measure on X. By linearity of the splitting and invariance of π̌, for any Ǎ ∈ B(X̌),

π̌(Ǎ) = ∫ π̌(dx_i) P̌(x_i, Ǎ) = ∫ π̌(dx_i) µ*_{x_i}(Ǎ) = ( ∫ π̌(dx_i) µ_{x_i}(·) )*(Ǎ).

Thus π̌ = π_0^*, where π_0 = ∫ π̌(dx_i) µ_{x_i}(·).
By (10.26) we have that π(A) = π_0^*(A_0 ∪ A_1) = π_0(A), so that in fact π̌ = π^*. This proves one part of (i), and we now show that π is invariant for Φ. For any A ∈ B(X) we have by invariance of π^* and (5.10),

π(A) = π^*(A_0 ∪ A_1) = π^*P̌(A_0 ∪ A_1) = (πP)*(A_0 ∪ A_1) = πP(A),
Proof Assume that Φ is strongly aperiodic, and split the chain as in Section 5.1.
If Φ is recurrent then it follows from Proposition 8.2.2 that Φ̌ is also recurrent.
We have from Theorem 10.2.1 that Φ̌ has a unique subinvariant measure π̌ which is
invariant. Thus we have from Proposition 10.4.1 that Φ also has an invariant measure.
The uniqueness is equally easy. If Φ has another subinvariant measure µ, then by
Proposition 10.4.1 the split measure µ∗ is subinvariant for Φ̌, and since from Theo-
rem 10.2.1, the invariant measure π̌ is unique (up to constant multiples) for Φ̌, we must
have for some c > 0 that µ∗ = cπ̌. By linearity this gives µ = cπ as required.
We can, quite easily, lift this result to the whole chain even in the case where we do
not have strong aperiodicity by considering the resolvent chain, since the chain and the
resolvent share the same invariant measures.
Theorem 10.4.3. For any ε ∈ (0, 1), a measure π is invariant for the resolvent K_{a_ε} if and only if it is invariant for P.
This now gives us immediately
Proof Using Theorem 5.2.3, we have that the K_{a_ε}-chain is strongly aperiodic, and from Theorem 8.2.4 we know that the K_{a_ε}-chain is recurrent. Let π be the unique invariant measure for the K_{a_ε}-chain, guaranteed from Proposition 10.4.2. From Theorem 10.4.3, π is also invariant for Φ.
Suppose that µ is subinvariant for Φ. Then by (10.6) we have that µ is also subinvariant for the K_{a_ε}-chain, and so there is a constant c > 0 such that µ = cπ. Hence we have shown that π is the unique (up to constant multiples) invariant measure for Φ.
We may now equate positivity of Φ to positivity for its skeletons as well as the
resolvent chains.
Theorem 10.4.5. Suppose that Φ is ψ-irreducible and aperiodic. Then, for each m, a
measure π is invariant for the m-skeleton if and only if it is invariant for Φ.
Hence, under aperiodicity, the chain Φ is positive if and only if each of the m-
skeletons Φm is positive.
and so π_m = π.
Proposition 10.4.6. The measure µ◦A is subinvariant, and minimal in the sense that
µ(B) ≥ µ◦A (B) for all B ∈ B(X).
Hence the induction holds for all n, and taking n ↑ ∞ shows that

µ(B) ≥ ∫_A µ(dw) U_A(w, B).

Hence, the inequality µ(B) ≥ µ◦A(B) must be an equality for all B ⊆ A. Thus the measure µ satisfies

µ(B) = ∫_A µ(dw) U_A(w, B) (10.29)

whenever B ⊆ A.
We now use (10.29) to prove invariance of µ◦A. For any B ∈ B(X),

∫_X µ◦A(dy) P(y, B) = ∫_A µ◦A(dy) P(y, B) + ∫_{A^c} [ ∫_A µ◦A(dw) U_A(w, dy) ] P(y, B)
    = ∫_A µ◦A(dy) [ P(y, B) + Σ_{n=2}^{∞} AP^n(y, B) ]
    = µ◦A(B), (10.30)

and so µ◦A is invariant for Φ. It follows by definition that µ◦A(A^c) = 0, so (ii) is proved.
We now prove (i) by contradiction. Suppose that B ⊆ A with µ(B) > µ◦A(B). Then we have from invariance of the resolvent chain in Theorem 10.4.3 and minimality of µ◦A, and the assumption that K_{a_ε}(x, A) > 0 for x ∈ B,

µ(A) ≥ ∫_X µ(dy) K_{a_ε}(y, A) > ∫_X µ◦A(dy) K_{a_ε}(y, A) = µ◦A(A) = µ(A),

and we thus have a contradiction.
An interesting consequence of this approach is the identity (10.29). This has the
following interpretation. Assume A is Harris recurrent, and define the process on A, denoted by Φ^A = {Φ_n^A}, by starting with Φ_0^A = x ∈ A, then setting Φ_1^A as the value of Φ at the next visit to A, and so on. Since return to A is sure for Harris recurrent sets, this is well defined.
Formally, Φ^A is actually constructed from the transition law

U_A(x, B) = Σ_{n=1}^{∞} AP^n(x, B) = P_x{Φ_{τ_A} ∈ B},
B ⊆ A, B ∈ B(X). Theorem 10.4.7 thus states that for a Harris recurrent set A, any
subinvariant measure restricted to A is actually invariant for the process on A.
One can also go in the reverse direction, starting off with an invariant measure for
the process on A. The following result is proved using the same calculations used in
(10.30):
Proposition 10.4.8. Suppose that ν is an invariant probability measure supported on the set A with

∫_A ν(dx) U_A(x, B) = ν(B), B ⊆ A.

Then the measure ν° defined as

ν°(B) := ∫_A ν(dx) U_A(x, B), B ∈ B(X),

is invariant for Φ.
Theorem 10.4.9. Suppose Φ is recurrent. Then the unique (up to constant multiples) invariant measure π for Φ is equivalent to ψ and satisfies, for any A ∈ B^+(X), B ∈ B(X),

π(B) = ∫_A π(dy) U_A(y, B)
     = ∫_A π(dy) E_y[ Σ_{k=1}^{τ_A} I{Φ_k ∈ B} ] (10.31)
     = ∫_A π(dy) E_y[ Σ_{k=0}^{τ_A − 1} I{Φ_k ∈ B} ].
Proof The construction in Theorem 10.2.1 ensures that the invariant measure π exists. Hence from Theorem 10.4.7 we see that π = π◦A for any Harris recurrent set A, and π then satisfies the first equality in (10.31) by construction. The second equality is just the definition of U_A. To see the third equality,

∫_A π(dy) E_y[ Σ_{k=1}^{τ_A} I{Φ_k ∈ B} ] = ∫_A π(dy) E_y[ Σ_{k=0}^{τ_A − 1} I{Φ_k ∈ B} ],
We finally prove that π is equivalent to ψ. From Proposition 10.1.2 we need only show that if ψ(B) = 0 then also π(B) = 0. But since ψ(B̄) = 0, we have that B^0 ∈ B^+(X), and so from the representation (10.31),

π(B) = ∫_{B^0} π(dy) U_{B^0}(y, B) = 0,
Theorem 10.4.10. Suppose that Φ is ψ-irreducible, and let µ denote any subinvariant
measure.
(i) The chain Φ is positive if and only if for one, and then every, set with µ(A) > 0,

∫_A µ(dy) E_y[τ_A] < ∞. (10.32)

(ii) The measure µ is finite and thus Φ is positive recurrent if for some petite set C ∈ B^+(X)

sup_{y∈C} E_y[τ_C] < ∞. (10.33)
if this is finite then µ◦A is finite and the chain is positive by definition. Conversely, if the chain is positive then by Theorem 10.4.9 we know that µ must be a finite invariant measure and (10.32) then holds for every A.
The second result now follows since we know from Proposition 10.1.2 that µ(C) < ∞
for petite C; and hence we have positive recurrence from (10.33) and (i), whilst the chain
is also Harris if (10.34) holds from the criterion in Theorem 9.1.7.
In Chapter 11 we find a variety of usable and useful conditions for (10.33) and
(10.34) to hold, based on a drift approach which strengthens those in Chapter 8.
since Γ(R) = 1. We have already used this formula in (6.8): here it shows that Lebesgue
measure is invariant for unrestricted random walk in either the transient or the recurrent
case.
Since Lebesgue measure on R is infinite, we immediately have from Theorem 10.4.9
that there is no finite invariant measure for this chain: this proves
Proposition 10.5.1. The random walk on R is never positive recurrent.
If we put this together with the results in Section 9.5, then we have that when the
mean β of the increment distribution is zero, then the chain is null recurrent.
Finally, we note that this is one case where the interpretation in (10.31) can be
expressed in another way. We have, as an immediate consequence of this interpretation
Proposition 10.5.2. Suppose Φ is a random walk on R, with spread-out increment
measure Γ having zero mean and finite variance.
Let A be any bounded set in R with µ_Leb(A) > 0, and let the initial distribution of Φ_0 be the uniform distribution on A. If we let N_A(B) denote the number of visits to a set B prior to return to A, then for any two bounded sets B, C with µ_Leb(C) > 0 we have

E[N_A(B)]/E[N_A(C)] = µ_Leb(B)/µ_Leb(C).
Proof Under the given conditions on Γ we have from Proposition 9.4.5 that the chain is non-evanescent, and hence recurrent.
Using (10.35) we have that the unique invariant measure with π(A) = 1 is π = µ_Leb/µ_Leb(A), and then the result follows from the form (10.31) of π.
Now we use uniqueness of the invariant measure to note that, since the chain V_δ^+ is the "two-step" chain for the chain V_{δ/2}^+, the invariant measures π_δ and π_{δ/2} must coincide. Thus letting δ go to zero through the values δ/2^n we find that for any δ the invariant measure is given by

π_δ(dy) = m^{−1} F[y, ∞) dy (10.36)

where m = ∫_0^∞ t F(dt); and π_δ is a probability measure provided m < ∞.
By direct integration it is also straightforward to show that this is indeed the invariant measure for V_δ^+.
This form of the invariant measure thus reinforces the fact that the quantity F[y, ∞)dy is the expected amount of time spent in the infinitesimal set dy on each excursion from the point {0}, even though in the discretized chain V_δ^+ the point {0} is never actually reached.
P(i, x; j × A) = 0, j > i + 1,
P(i, x; j × A) = Λ_{i−j+1}(x, A), j = 1, . . . , i + 1, (10.37)
P(i, x; 0 × A) = Λ*_i(x, A).
Let us consider the general chain defined by (10.37), where we can treat x and A
as general points in and subsets of X, so that the chain Φ now moves on a ladder
whose (countable number of) rungs are general in nature. In the special case of the
GI/G/1 model the results specialize to the situation where X = R+ , and there are many
countable models where the rungs are actually finite and matrix methods are used to
achieve the following results.
Using the representation of π, it is possible to construct an invariant measure for
this chain in an explicit way; this then gives the structure of the invariant measure for
the GI/G/1 queue also.
Since we are interested in the structure of the invariant probability measure we make
the assumption in this section that the chain defined by (10.37) is positive Harris and
ψ([0]) > 0, where [0] := {0 × X} is the bottom “rung” of the ladder. We shall explore
conditions for this to hold in Chapter 19.
Our assumption ensures we can reach the bottom of the ladder with probability one.
Let us denote by π0 the invariant probability measure for the process on [0], so that π0
can be thought of as a measure on B(X).
Our goal will be to prove that the structure of the invariant measure for Φ is an
“operator-geometric” one, mimicking the structure of the invariant measure developed
in Section 10.3 for skip-free random walk on the integers.
where

S^k(y, A) = ∫_X S(y, dz) S^{k−1}(z, A), (10.39)

so that if we write

S^{(k)}(y, A) := U_{[0]}(0, y; k × A) (10.42)

we have by definition

π(k × A) = ∫_{[0]} π_0(dy) S^{(k)}(y, A). (10.43)
Now if we define the set [n] = {0, 1, . . . , n} × X, by the fact that the chain is translation invariant above the zero level we have that the functions

are independent of n. Using a last-exit decomposition over visits to [k], together with the skip-free property which ensures that the last visit to [k] prior to reaching (k + 1) × X takes place at the level k × X, we find

[0]P^n(0, x; (k + 1) × A) = Σ_{j=1}^{n−1} ∫_X [0]P^j(0, x; k × dy) [k]P^{n−j}(k, y; (k + 1) × A) (10.45)
    = Σ_{j=1}^{n−1} ∫_X [0]P^j(0, x; k × dy) [0]P^{n−j}(0, y; 1 × A).

Summing over n and using (10.44) shows that the operators S^{(k)}(y, A) have the geometric form in (10.39) as stated.
To see that the operator S satisfies (10.40), we decompose [0]P^n over the position at time n − 1. By construction [0]P^1(0, x; 1 × B) = Λ_0(x, B), and for n > 1,

[0]P^n(0, x; 1 × B) = Σ_{k≥1} ∫_X [0]P^{n−1}(0, x; k × dy) Λ_k(y, B); (10.46)
so that

S_{N−1}(x; (k + 1) × B) ≤ ∫_X S^k_{N−1}(x; 1 × dy) S_{N−1}(y; 1 × B). (10.48)

Now let S* be any other solution of (10.40). Notice that S_1(x; 1 × B) = Λ_0(x, B) ≤ S*(x, B), from (10.40). Assume inductively that S_{N−1}(x; 1 × B) ≤ S*(x, B) for all x, B: then we have from (10.50) that

S_N(x; 1 × B) ≤ Σ_k ∫_X [S*]^k(x, dy) Λ_k(y, B) = S*(x, B). (10.51)
Φ_n = (N_n, R_n), n ≥ 1,

where N_n is the number of customers at T_n− and R_n is the residual service time at T_n+. In this case the representation of π_0 can also be made explicit.
For the GI/G/1 chain we have that the chain on [0] has the distribution of R_n at a time point {T_n+} where there were no customers at {T_n−}: so at these time points R_n has precisely the distribution of the service brought by the customer arriving at T_n, namely H.
So in this case we have that the process on [0], provided [0] is recurrent, is a process of i.i.d. random variables with distribution H, and thus is very clearly positive Harris with invariant probability H.
Theorem 10.5.3 then gives us
Theorem 10.5.4. The ladder chain Φ describing the GI/G/1 queue has an invariant probability if and only if the measure π given by

π(k × A) = ∫_X H(dy) S^k(y, A) (10.52)

is finite. In this case π suitably normalized is the unique invariant probability measure for Φ.

Proof Using the proof of Theorem 10.5.3 we have that π is the minimal subinvariant measure for the GI/G/1 queue, and the result is then obvious.
X_k = F^k x_0 + Σ_{i=0}^{k−1} F^i G W_{k−i} ∼ F^k x_0 + Σ_{i=0}^{k−1} F^i G W_i,

where ∼ denotes equality in distribution, and hence

P^k g(x_0) = E_{x_0}[g(X_k)] = E[g(F^k x_0 + Σ_{i=0}^{k−1} F^i G W_i)].
Under the additional hypothesis that the eigenvalue condition (LSS5) holds, it follows from Lemma 6.3.4 that F^i → 0 as i → ∞ at a geometric rate. Since W has a finite mean, it then follows from Fubini's Theorem that the sum

X_∞ := Σ_{i=0}^{∞} F^i G W_i

converges absolutely, with E[|X_∞|] ≤ E[|W|] Σ_{i=0}^{∞} ‖F^i G‖ < ∞, with ‖ · ‖ an appropriate matrix norm. Hence by the Dominated Convergence Theorem, and the assumption that g is continuous,

lim_{k→∞} P^k g(x_0) = E[g(X_∞)].
Since π is determined by its values on continuous bounded functions, this proves that
π is invariant.
In the Gaussian case (LSS3) we can express the invariant probability more explicitly. In this case X_∞ itself is Gaussian with mean zero and covariance

E[X_∞ X_∞^T] = Σ_{i=0}^{∞} F^i G G^T (F^i)^T.

That is, π = N(0, Σ) where Σ is equal to the controllability grammian for the linear state space model, defined in (4.17).
The covariance matrix Σ is full rank if and only if the controllability condition (LCM3) holds, and in this case, for any k greater than or equal to the dimension of the state space, P^k(x, dy) possesses the density p_k(x, y)dy given in (4.18). It follows immediately that when (LCM3) holds, the probability π possesses the density p on R^n given by

p(y) = ((2π)^n |Σ|)^{−1/2} exp( −½ y^T Σ^{−1} y ), (10.54)

while if the controllability condition (LCM3) fails to hold then the invariant probability is concentrated on the controllable subspace X_0 = R(Σ) ⊂ X and is hence singular with respect to Lebesgue measure.
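The grammian Σ = Σ_i F^i G G^T (F^i)^T is the fixed point of a Lyapunov recursion, which gives a simple way to compute it; a minimal numerical sketch (the function name is ours):

    import numpy as np

    def controllability_grammian(F, G, tol=1e-12, max_iter=10**5):
        # Iterate Sigma <- F Sigma F^T + G G^T, whose fixed point is
        # Sigma = sum_i F^i G G^T (F^i)^T; converges geometrically when the
        # eigenvalue condition (LSS5) holds (spectral radius of F below one).
        Sigma = np.zeros((F.shape[0], F.shape[0]))
        for _ in range(max_iter):
            nxt = F @ Sigma @ F.T + G @ G.T
            if np.max(np.abs(nxt - Sigma)) < tol:
                return nxt
            Sigma = nxt
        return Sigma

    # np.linalg.matrix_rank of the result checks the controllability
    # condition (LCM3): full rank means pi has the density (10.54).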
10.6 Commentary
The approach to positivity given here is by no means standard. It is much more common,
especially with countable spaces, to classify chains either through the behavior of the
sequence P n , with null chains being those for which P n (x, A) → 0 for, say, petite sets
A and all x, and positive chains being those for which such limits are not always zero;
a limiting argument such as that in (10.4), which we have illustrated in Section 10.5.4,
then shows the existence of π in the positive case.
Alternatively, positivity is often defined through the behavior of the expected return
times to petite or other suitable sets.
We will show in Chapter 11 and Chapter 18 that even on a general space all of
these approaches are identical. Our view is that the invariant measure approach is
much more straightforward to understand than the P n approach, and since one can
now develop through the splitting technique a technically simple set of results this gives
an appropriate classification of recurrent chains.
The existence of invariant probability measures has been a central topic of Markov
chain theory since the inception of the subject. Doob [99] and Orey [309] give some good
background. The approach to countable recurrent chains through last-exit probabilities
as in Theorem 10.2.1 is due to Derman [86], and has not changed much since, although
the uniqueness proofs we give owe something to Vere-Jones [406]. The construction of π
given here is of course one of our first serious uses of the splitting method of Nummelin
[301]; for strongly aperiodic chains the result is also derived in Athreya and Ney [13].
The fact that one identifies the actual structure of π in Theorem 10.4.9 will also be of
great use, and Kac’s Theorem [186] provides a valuable insight into the probabilistic
difference between positive and null chains: this is pursued in the next chapter in
considerably more detail.
Before the splitting technique, verifying conditions for the existence of π had ap-
peared to be a deep and rather difficult task. It was recognized in the relatively early
development of general state space Markov chains that one could prove the existence
of an invariant measure for Φ from the existence of an invariant probability measure
for the “process on A”. The approach pioneered by Harris [155] for finding the latter
involves using deeper limit theorems for the "process on A" in the special case where A is a ν_n-small set with a_n = δ_n and ν_n{A} > 0 (called a C-set in Orey [309]). In this
methodology, it is first shown that limiting probabilities for the process on A exist, and
the existence of such limits then provides an invariant measure for the process on A:
by the construction described in this chapter this can be lifted to an invariant measure
for the whole chain. Orey [309] remains an excellent exposition of the development of
this approach.
This “process on A” method is still the only one available without some regeneration,
and we will develop this further in a topological setting in Chapter 12, using many of
the constructions above.
We have shown that invariant measures exist without using such deep asymptotic
properties of the chain, indicating that the existence and uniqueness of such measures
is in fact a result requiring less of the detailed structure of the chain.
The minimality approach of Section 10.4.2 of course would give another route to
Theorem 10.4.4, provided we had some method of proving that a “starting” subinvari-
ant measure existed. There is one such approach, which avoids splitting and remains
conceptually simple. This involves using the kernels

U^{(r)}(x, A) = Σ_{n=1}^{∞} P^n(x, A) r^n ≥ r ∫_X U^{(r)}(x, dy) P(y, A), (10.55)

defined for 0 < r < 1. One can then define a subinvariant measure for Φ as a limit

lim_{r↑1} π_r(·) := lim_{r↑1} [ ∫_C ν_n(dy) U^{(r)}(y, ·) ] / [ ∫_C ν_n(dy) U^{(r)}(y, C) ],
where C is a ν_n-small set. The key is the observation that this limit gives a non-trivial σ-finite measure due to the inequalities

M_j ≥ π_r(C̄(j)) (10.56)
and

π_r(A) ≥ r^n ν_n(A), A ∈ B(X), (10.57)
which are valid for all r large enough. Details of this construction are in Arjas and
Nummelin [7], as is a neat alternative proof of uniqueness.
All of these approaches are now superseded by the splitting approach, but of course
only when the chain is ψ-irreducible. If this is not the case then the existence of an
invariant measure is not simple. The methods of Section 10.4.2, which are based on
Tweedie [402], do not use irreducibility, and in conjunction with those in Chapter 12
they give some ways of establishing uniqueness and structure for the invariant measures
from limiting operations, as illustrated in Section 10.5.4.
The general question of existence and, more particularly, uniqueness of invariant
measures for non-irreducible chains remains open at this stage of theoretical develop-
ment.
The invariance of Lebesgue measure for random walk is well known, as is the form
(10.36) for models in renewal theory. The invariant measures for queues are derived
directly in [59], but the motivation through the minimal measure of the geometric form
is not standard. The extension to the operator-geometric form for ladder chains is in
[399], and in the case where the rungs are finite, the development and applications are
given by Neuts [293, 294].
The linear model is analyzed in Snyders [364] using ideas from control theory, and
the more detailed analysis given there allows a generalization of the construction given in
Section 10.5.4. Essentially, if the noise does not enter the “unstable” region of the state
space then the stability condition on the driving matrix F can be slightly weakened.
Chapter 11
Drift and regularity
Using the finiteness of the invariant measure to classify two different levels of stabil-
ity is intuitively appealing. It is simple, and it also involves a fundamental stability
requirement of many classes of models. Indeed, in time series analysis for example, a
standard starting point, rather than an end point, is the requirement that the model be
stationary, and it follows from (10.4) that for a stationary version of a model to exist
we are in effect requiring that the structure of the model be positive recurrent.
In this chapter we consider two other descriptions of positive recurrence which we
show to be equivalent to that involving finiteness of π.
The first is in terms of regular sets.

Regularity

A set C ∈ B(X) is called regular, when Φ is ψ-irreducible, if

sup_{x∈C} E_x[τ_B] < ∞, B ∈ B^+(X). (11.1)

We know from Theorem 10.2.1 that when there is a finite invariant measure and an atom α ∈ B^+(X) then E_α[τ_α] < ∞. A regular set C ∈ B^+(X) as defined by (11.1) has the property not only that the return times to C itself, but indeed the mean hitting times on any set in B^+(X), are bounded from starting points in C.
We will see that there is a second, equivalent, approach in terms of conditions on the one-step "mean drift"

∆V(x) = ∫_X P(x, dy)V(y) − V(x) = E_x[V(Φ_1) − V(Φ_0)]. (11.2)
We have already shown in Chapter 8 and Chapter 9 that for ψ-irreducible chains, drift
towards a petite set implies that the chain is recurrent or Harris recurrent, and drift
away from such a set implies that the chain is transient. The high points in this chapter
are the following much more wide ranging equivalences.
Theorem 11.0.1. Suppose that Φ is a Harris recurrent chain, with invariant measure
π. Then the following three conditions are equivalent:
(i) The measure π has finite total mass;
(ii) There exists some petite set C ∈ B(X) and M_C < ∞ such that

sup_{x∈C} E_x[τ_C] ≤ M_C; (11.3)

(iii) There exists some petite set C, a constant b < ∞, and some extended-real-valued, non-negative test function V, which is finite for at least one state in X, satisfying

∆V(x) ≤ −1 + b I_C(x), x ∈ X. (11.4)
When (iii) holds then V is finite on an absorbing full set S and the chain restricted to
S is regular; and any sublevel set of V satisfies (11.3).
Proof That (ii) is equivalent to (i) is shown by combining Theorem 10.4.10 with
Theorem 11.1.4, which also shows that some full absorbing set exists on which Φ is
regular. The equivalence of (ii) and (iii) is in Theorem 11.3.11, whilst the identification
of the set S as the set where V is finite is in Proposition 11.3.13, where we also show
that sublevel sets of V satisfy (11.3).
Both of these approaches, as well as giving more insight into the structure of positive
recurrent chains, provide tools for further analysis of asymptotic properties in Part III.
In this chapter, the equivalence of existence of solutions of the drift condition (11.4)
and the existence of regular sets is motivated, and explained to a large degree, by the
deterministic results in Section 11.2. Although there are a variety of proofs of such
results available, we shall develop a particularly powerful approach via a discrete time
form of Dynkin’s formula.
Because it involves only the one-step transition kernel, (11.4) provides an invaluable
practical criterion for evaluating the positive recurrence of specific models: we illustrate
this in Section 11.4.
There exists a matching, although less important, criterion for the chain to be non-
positive rather than positive: we shall also prove in Section 11.5.1 that if a test function
satisfies the reverse drift condition
∆V(x) ≥ 0, x ∈ C^c, (11.5)

then, provided the increments are bounded in mean, in the sense that

sup_{x∈X} ∫ P(x, dy) |V(x) − V(y)| < ∞, (11.6)

the chain is non-positive.
Since the left hand side is finite for any x, and by irreducibility for any y there is some n with xP^n(x, y) > 0, we must have E_y[τ_x] < ∞ for all y also.
It will require more work to find the connections between positive recurrence and
regularity in general.
It is not implausible that positive chains might admit regular sets. It follows imme-
diately from (10.32) that in the positive recurrent case for any A ∈ B+ (X) we have
Thus we have from the form of π more than enough “almost-regular” sets in the positive
recurrent case.
To establish the existence of true regular sets we first consider ψ-irreducible chains
which possess a recurrent atom α ∈ B + (X). Although it appears that regularity may
be a difficult criterion to meet since in principle it is necessary to test the hitting time
of every set in B + (X), when an atom exists it is only necessary to consider the first
hitting time to the atom.
Theorem 11.1.2. Suppose that there exists an accessible atom α ∈ B^+(X).

(i) If Φ is positive recurrent then there exists a decomposition

X = S ∪ N (11.8)

where the set S is full and absorbing, and Φ restricted to S is regular.

(ii) The chain Φ is regular if and only if

E_x[τ_α] < ∞ (11.9)

for every x ∈ X.
Proof Let

S := {x : E_x[τ_α] < ∞};

obviously S is absorbing, and since the chain is positive recurrent we have from Theorem 10.4.10 (ii) that E_α[τ_α] < ∞, and hence α ∈ S. This also shows immediately that S is full by Proposition 4.2.3.
Let B be any set in B^+(X) with B ⊆ α^c, so that for π-almost all y ∈ B we have E_y[τ_B] < ∞ from (11.7). From ψ-irreducibility there must then exist amongst these values one w and some n such that BP^n(w, α) > 0. Since
so that each Sn is a regular set, and since {Sn } is a cover of S, we have that Φ restricted
to S is regular.
This proves (i): to see (ii) note that under (11.9) we have X = S, so the chain is
regular; whilst the converse is obvious.
It is unfortunate that the ψ-null set N in Theorem 11.1.2 need not be empty. For
consider a chain on Z_+ with

P(0, 0) = 1,
P(j, 0) = β_j > 0,
P(j, j + 1) = 1 − β_j. (11.12)

Then the chain restricted to {0} is trivially regular, and the whole chain is positive recurrent; but if

Σ_{j≥1} Π_{k=1}^{j} (1 − β_k) = ∞

then E_j[τ_0] = ∞ for j ≥ 1, and the null set N in (11.8) is not empty.
Proposition 11.1.3. Suppose that Φ is strongly aperiodic and positive recurrent. Then
there exists a decomposition
X=S∪N (11.13)
Proof We know from Proposition 10.4.2 that the split chain is also positive recurrent with invariant probability measure π̌; and thus for π̌-a.e. x_i ∈ X̌, by (11.7) we have that

Ě_{x_i}[τ_α̌] < ∞. (11.14)
Let Š ⊆ X̌ denote the set where (11.14) holds. Then it is obvious that Š is absorbing,
and by Theorem 11.1.2 the chain Φ̌ is regular on Š. Let {Šn } denote the cover of Š
with regular sets.
Now we have Ň = X̌ \ Š ⊆ X_0, and so if we write N as the copy of Ň and define S = X \ N, we can cover S with the matching copies S_n. We then have for x ∈ S_n and any B ∈ B^+(X)

E_x[τ_B] ≤ Ě_{x_0}[τ_B] + Ě_{x_1}[τ_B],

which is bounded for x_0 ∈ Š_n and all x_1 ∈ α̌, and hence for x ∈ S_n.
Thus S is the required full absorbing set for (11.13) to hold.
It is now possible, by the device we have used before of analyzing the m-skeleton,
to show that this proposition holds for arbitrary positive recurrent chains.
Theorem 11.1.4. Suppose that Φ is ψ-irreducible. Then the following are equivalent:
(i) The chain Φ is positive recurrent.
(ii) There exists a decomposition
X=S∪N (11.15)
where the set S is full and absorbing, and Φ restricted to S is regular.
Proof Assume Φ is positive recurrent. Then the Nummelin splitting exists for
some m-skeleton from Proposition 5.4.5, and so we have from Proposition 11.1.3 that
there is a decomposition as in (11.15) where the set S = ∪Sn and each Sn is regular for
the m-skeleton.
But if τ_B^m denotes the number of steps needed for the m-skeleton to reach B, then we have that

τ_B ≤ m τ_B^m

and so each S_n is also regular for Φ as required.
The converse is almost trivial: when the chain is regular on S then there exists
a petite set C inside S with supx∈C Ex [τC ] < ∞, and the result follows from Theo-
rem 10.4.10.
Just as we may restrict any recurrent chain to an absorbing set H on which the
chain is Harris recurrent, we have here shown that we can further restrict a positive
recurrent chain to an absorbing set where it is regular.
We will now turn to the equivalence between regularity and mean drift conditions.
This has the considerable benefit that it enables us to identify exactly the null set on
which regularity fails, and thus to eliminate from consideration annoying and patho-
logical behavior in many models. It also provides, as noted earlier, a sound practical
approach to assessing stability of the chain.
To motivate and perhaps give more insight into the connections between hitting
times and mean drift conditions we first consider deterministic models.
In this section we analyze a deterministic state space model, indicating the role we might
expect the drift conditions (11.4) on ∆V to play. As we have seen in Chapter 4 and
Chapter 7 in examining irreducibility structures, the underlying deterministic models
for state space systems foreshadow the directions to be followed for systems with a noise
component.
Let us then assume that there is a topology on X, and consider the deterministic
process known as a semi-dynamical system.
Φ_{k+1} = F(Φ_k), k ∈ Z_+, (11.16)

P f(·) = f(F(·)).
Since we have assumed the function F to be continuous, the Markov chain Φ has the
Feller property, although in general it will not be a T-chain.
For such a deterministic system it is standard to consider two forms of stability
known as recurrence and ultimate boundedness. We shall call the deterministic system
(11.16) recurrent if there exists a compact subset C ⊂ X such that σC (x) < ∞ for each
initial condition x ∈ X. Such a concept of recurrence here is almost identical to the
definition of recurrence for stochastic models. We shall call the system (11.16) ultimately
bounded if there exists a compact set C ⊂ X such that for each fixed initial condition
Φ0 ∈ X, the trajectory starting at Φ0 eventually enters and remains in C. Ultimate
boundedness is loosely related to positive recurrence: it requires that the limit points of
the process all lie within a compact set C, which is somewhat analogous to the positivity
requirement that there be an invariant probability measure π with π(C) > 1 − ε for
some small ε.
   sup_{x∈C} V(F(x)) ≤ M.
If we consider the sequence V(Φ_n) on R+ then this condition requires that this sequence move monotonically downwards at a uniform rate until the first time that Φ enters C. It is therefore not surprising that Φ hits C in a finite time under this condition.
(ii) If Φ is recurrent, then there exists a positive function V such that (DS2) holds.
Proof To prove (i), let Φ(x, n) = F n (x) denote the deterministic position of Φn if
the chain starts at Φ_0 = x. We first show that the compact set C̄ defined as
   C̄ := {Φ(x, i) : x ∈ C, 1 ≤ i ≤ M + 1} ∪ C
is invariant. Because V is positive and decreases on C^c, every trajectory must enter the set C, and hence also C̄, at some finite time. We conclude that Φ is ultimately bounded.
We now prove (ii). Suppose that a compact set C_1 exists such that σ_{C_1}(x) < ∞ for each initial condition x ∈ X. Let O be an open pre-compact set containing C_1, and set C := cl O. Then the test function
   V(x) := σ_O(x)
satisfies (DS2). To see this, observe that if x ∈ C c , then V (F (x)) = V (x) − 1 and hence
the first inequality is satisfied. By assumption the function V is everywhere finite,
and since O is open it follows that V is upper semicontinuous from Proposition 6.1.1.
This implies that the second inequality in (DS2) holds, since a finite-valued upper
semicontinuous function is uniformly bounded on compact sets.
For a semi-dynamical system, this result shows that recurrence is actually equiva-
lent to ultimate boundedness. In this the deterministic system differs from the general
NSS(F ) model with a non-trivial random component. More pertinently, we have also
shown that the semi-dynamical system is ultimately bounded if and only if a test func-
tion exists satisfying (DS2).
This test function may always be taken to be the time to reach a certain compact
set. As an almost exact analogue, we now go on to see that the expected time to
reach a petite set is the appropriate test function to establish positive recurrence in the
stochastic framework; and that, as we show in Theorem 11.3.4 and Theorem 11.3.5, the
existence of a test function similar to (DS2) is equivalent to positive recurrence.
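To make the role of such a test function concrete, here is a minimal computational sketch: it takes an illustrative contracting map F(x) = x/2 + 1 on R (not a model from the text) and computes V(x) = σ_O(x), the hitting time of an open set O, checking the defining property V(F(x)) = V(x) − 1 off O that drives (DS2).

```python
def F(x):
    # An illustrative contracting semi-dynamical system on R; fixed point 2.
    return x / 2.0 + 1.0

def sigma_O(x, lo=0.0, hi=3.0, max_steps=10_000):
    """First n >= 0 with F^n(x) in the open set O = (lo, hi)."""
    n = 0
    while not (lo < x < hi):
        x = F(x)
        n += 1
        if n > max_steps:
            raise RuntimeError("trajectory failed to reach O")
    return n

# V(x) = sigma_O(x) decreases by exactly one along the flow outside O,
# which is the first inequality in (DS2).
for x0 in (-100.0, -5.0, 10.0, 1000.0):
    assert sigma_O(F(x0)) == sigma_O(x0) - 1   # each x0 here lies outside O
    print(x0, sigma_O(x0))
```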
for some non-negative function V and some set C ∈ B(X); and for some M < ∞,
∆V (x) ≤ M, x ∈ C. (11.19)
Thus we might hope that (V2) might have something of the same impact for stochastic
models as (DS2) has for deterministic chains.
In essentially the form (11.18) and (11.19) these conditions were introduced by Foster
[129] for countable state space chains, and shown to imply positive recurrence. Use of
the form (V2) will actually make it easier to show that the existence of everywhere
finite solutions to (11.17) is equivalent to regularity and moreover we will identify the
sublevel sets of the test function V as regular sets.
The central technique we will use to make connections between one-step mean drifts
and moments of first entrance times to appropriate (usually petite) sets hinges on
a discrete time version of a result known for continuous time processes as Dynkin’s
formula.
This formula yields not only those criteria for positive Harris chains and regularity
which we discuss in this chapter, but also leads in due course to necessary and sufficient
conditions for rates of convergence of the distributions of the process; necessary and
sufficient conditions for finiteness of moments; and sample path ergodic theorems such
as the Central Limit Theorem and Law of the Iterated Logarithm. All of these are
considered in Part III.
Dynkin’s formula is a sample path formula, rather than a formula involving proba-
bilistic operators. We need to introduce a little more notation to handle such situations.
Recall from Section 3.4 the definition
   F_k^Φ := σ(Φ_0, . . . , Φ_k),
and let {Z_k, F_k^Φ} be an adapted sequence of positive random variables. For each k, Z_k will denote a fixed Borel measurable function of (Φ_0, . . . , Φ_k), although in applications this will usually (although not always) be a function of the last position, so that
   Z_k(Φ_0, . . . , Φ_k) = Z(Φ_k)
for some measurable function Z. We will somewhat abuse notation and let Z_k denote both the random variable and the function on X^{k+1}.
For any stopping time τ define the truncated time
   τ^n := min{n, τ, inf{k ≥ 0 : Z_k ≥ n}}.
The random time τ^n is also a stopping time, since it is the minimum of stopping times, and the random variable Σ_{i=0}^{τ^n−1} Z_i is essentially bounded by n².
Dynkin’s formula will now tell us that we can evaluate the expected value of Zτ n by
taking the initial value Z0 and adding on to this the average increments at each time
until τ n . This is almost obvious, but has widespread consequences: in particular it
enables us to use (V2) to control these one-step average increments, leading to control
of the expected overall hitting time.
   E_x[Z_{τ^n}] = E_x[Z_0] + E_x[ Σ_{i=1}^{τ^n} ( E[Z_i | F_{i−1}^Φ] − Z_{i−1} ) ].
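The formula is easy to test numerically. The sketch below (an illustrative reflected random walk, not a model from the text) estimates both sides of Dynkin's formula by Monte Carlo, with Z_k = Φ_k, τ the hitting time of {0}, and the truncation τ^n; the agreement of the two estimates is exactly the content of the formula.

```python
import random

def run(x0, n):
    """One path up to tau^n = min(n, tau_0, first k with Z_k >= n); returns
    (Z at tau^n, accumulated conditional drifts E[Z_i | F_{i-1}] - Z_{i-1})."""
    x, drift = x0, 0.0
    for _ in range(n):
        if x == 0 or x >= n:            # tau_0 reached, or Z_k >= n
            break
        # One-step conditional mean for the walk: up w.p. 0.4, down w.p. 0.6.
        drift += 0.4 * (x + 1) + 0.6 * (x - 1) - x      # = -0.2 for x >= 1
        x = x + 1 if random.random() < 0.4 else x - 1
    return x, drift

random.seed(1)
paths = [run(5, 50) for _ in range(100_000)]
lhs = sum(z for z, _ in paths) / len(paths)          # E_x[Z_{tau^n}]
rhs = 5 + sum(d for _, d in paths) / len(paths)      # Z_0 + expected drifts
print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```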
As an immediate corollary we have
Proposition 11.3.2. Suppose that there exist two sequences of positive functions {s_k, f_k : k ≥ 0} on X such that
   E[Z_{k+1} | F_k^Φ] ≤ Z_k − f_k(Φ_k) + s_k(Φ_k),  k ∈ Z+.
Then for each initial condition x and any stopping time τ,
   E_x[ Σ_{k=0}^{τ−1} f_k(Φ_k) ] ≤ Z_0(x) + E_x[ Σ_{k=0}^{τ−1} s_k(Φ_k) ].
Proof  By Dynkin's formula, for any fixed N ∈ Z+,
   0 ≤ E_x[Z_{τ^n}] ≤ Z_0(x) + E_x[ Σ_{i=1}^{τ^n} ( s_{i−1}(Φ_{i−1}) − [f_{i−1}(Φ_{i−1}) ∧ N] ) ]
Proposition 11.3.3. Suppose that there exist a sequence of positive functions {ε_k : k ≥ 0} on X, a constant c < ∞, and a set A ∈ B(X) such that
   E[Z_k | F_{k−1}^Φ] − Z_{k−1} ≤ −ε_{k−1}(Φ_{k−1}),  1 ≤ k ≤ σ_A.
Then
   E_x[ Σ_{i=0}^{τ_A−1} ε_i(Φ_i) ] ≤ Z_0(x),  x ∈ A^c;
   E_x[ Σ_{i=0}^{τ_A−1} ε_i(Φ_i) ] ≤ ε_0(x) + c P Z_0(x),  x ∈ X.
Proof Let Zk and εk denote the random variables Zk (Φ0 , . . . , Φk ) and εk (Φk )
respectively.
By hypothesis E[Z_k | F_{k−1}^Φ] − Z_{k−1} ≤ −ε_{k−1}(Φ_{k−1}) whenever 1 ≤ k ≤ σ_A. Hence for all n ∈ Z+ and x ∈ X we have by Dynkin's formula
   0 ≤ E_x[Z_{τ_A^n}] ≤ Z_0(x) − E_x[ Σ_{i=1}^{τ_A^n} ε_{i−1}(Φ_{i−1}) ],  x ∈ A^c.
By the Monotone Convergence Theorem it follows that for all initial conditions,
   E_x[ Σ_{i=1}^{τ_A} ε_{i−1}(Φ_{i−1}) ] ≤ Z_0(x),  x ∈ A^c.
for all x. Hence if C is petite and V is everywhere finite and bounded on C, then Φ is
positive Harris recurrent.
P GA = GA − I + IA UA .
Lemma 11.3.6. Any solution of (11.17) is finite ψ-almost everywhere or infinite ev-
erywhere.
P V (x) ≤ V (x) + b
for all x ∈ X, and it then follows that the set {x : V (x) < ∞} is absorbing. If this set
is non-empty then it is full by Proposition 4.2.3.
Lemma 11.3.7. If the set C is petite, then the function V_C(x) is unbounded off petite sets.
Proof  We have from Chebyshev's inequality that for each of the sublevel sets C_V(ℓ) := {x : V_C(x) ≤ ℓ},
   sup_{x ∈ C_V(ℓ)} P_x{σ_C ≥ n} ≤ ℓ/n.
Since the right hand side is less than 1/2 for sufficiently large n, this shows that C_V(ℓ) ⇝_a C for a sampling distribution a, and hence, by Proposition 5.5.4, the set C_V(ℓ) is petite.
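The Chebyshev (Markov) bound invoked in this proof is elementary, and a throwaway numerical check makes it vivid; the sketch below uses an arbitrary illustrative hitting-time law, not one from the text.

```python
import random

random.seed(5)
# An illustrative positive "hitting time" sample; any law with finite mean works.
sigma = [1 + int(random.expovariate(0.3)) for _ in range(100_000)]
mean = sum(sigma) / len(sigma)
for n in (10, 20, 40):
    tail = sum(s >= n for s in sigma) / len(sigma)
    print(n, tail, mean / n)   # empirical tail lies below the Markov bound
```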
Lemma 11.3.7 will typically be applied to show that a given petite set is regular.
The converse is always true, as the next result shows:
   sup_{x∈A} P_x{σ_C > n} ≤ (1/n) sup_{x∈A} E_x[τ_C].
   E_x[τ_B] ≤ V(x) + b E_x[ Σ_{k=0}^{τ_B−1} I_C(Φ_k) ].   (11.26)
But now we have that I(k < τB ) is measurable with respect to Fk and so by the
smoothing property of expectations this becomes
   Σ_{i=0}^∞ Σ_{k=0}^∞ a_i E_x[ E[ f(Φ_{k+i}) I{k < τ_B} | F_k ] ]
      = Σ_{i=0}^∞ Σ_{k=0}^∞ a_i E_x[ f(Φ_{k+i}) I{k < τ_B} ]
      = Σ_{i=0}^∞ a_i E_x[ Σ_{k=0}^{τ_B−1} f(Φ_{k+i}) ].
We now have a relatively simple task in proving
Proof  To prove (i), suppose that (V2) holds, with V bounded on A and C a ψ_a-petite set. Without loss of generality, from Proposition 5.5.6 we can assume Σ_{i=0}^∞ i a_i < ∞. We also use the simple but critical bound from the definition of petiteness:
   I_C(x) ≤ ψ_a(B)^{−1} K_a(x, B),  x ∈ X, B ∈ B^+(X).   (11.27)
By Lemma 11.3.9 and the bound (11.27) we then have
   E_x[τ_B] ≤ V(x) + b E_x[ Σ_{k=0}^{τ_B−1} I_C(Φ_k) ]
            ≤ V(x) + b E_x[ Σ_{k=0}^{τ_B−1} ψ_a(B)^{−1} K_a(Φ_k, B) ]
            = V(x) + b ψ_a(B)^{−1} Σ_{i=0}^∞ a_i E_x[ Σ_{k=0}^{τ_B−1} I_B(Φ_{k+i}) ]
            ≤ V(x) + b ψ_a(B)^{−1} Σ_{i=0}^∞ (i + 1) a_i
Regularity of measures
A probability measure µ is called regular if E_µ[τ_B] < ∞ for each B ∈ B^+(X).
The proof of the following result for regular measures µ is identical to that of the
previous theorem and we omit it.
Theorem 11.3.12. Suppose that Φ is ψ-irreducible.
(i) If (V2) holds for a petite set C and a function V , and if µ(V ) < ∞, then the
measure µ is regular.
(ii) If µ is regular, and if there exists one regular set C ∈ B+ (X), then there exists an
extended-valued function V satisfying (V2) with µ(V ) < ∞.
Proof Suppose that a regular set C ∈ B+ (X) exists. Since C is regular it is also
ψa -petite, and we can assume without loss of generality that the sampling distribution
a has a finite mean. By regularity of C we also have, by Theorem 11.3.11 (ii), that (V2) holds with V = V_C. From Theorem 11.3.11 each of the sets C_V(ℓ) is regular, and by Lemma 11.3.6 the set S_C = {y : V_C(y) < ∞} is full and absorbing.
Theorem 11.3.11 gives a characterization of regular sets in terms of a drift condition.
Theorem 11.3.14 now gives such a characterization in terms of the mean hitting times
to petite sets.
Theorem 11.3.14. If Φ is ψ-irreducible, then the following are equivalent:
(i) The set C ∈ B(X) is petite and supx∈C Ex [τC ] < ∞.
(ii) The set C is regular and C ∈ B+ (X).
Proof (i) Suppose that C is petite, and let as before VC (x) = 1 + Ex [σC ]. By
Theorem 11.3.5 and the conditions of the theorem we may find a constant b < ∞ such
that
P VC ≤ VC − 1 + bIC .
Since VC is bounded on C by construction, it follows from Theorem 11.3.11 that C is
regular. Since the set C is Harris recurrent it follows from Proposition 8.3.1 (ii) that
C ∈ B+ (X).
(ii) Suppose that C is regular. Since C ∈ B+ (X), it follows from regularity that
supx∈C Ex [τC ] < ∞, and that C is petite follows from Proposition 11.3.8.
We can now give the following complete characterization of the case X = S.
Theorem 11.3.15. Suppose that Φ is ψ-irreducible. Then the following are equivalent:
(i) The chain Φ is regular.
(ii) The drift condition (V2) holds for a petite set C and an everywhere finite function V.
(iii) There exists a petite set C such that E_x[τ_C] < ∞ for every x ∈ X.
Proof If (i) holds, then it follows that a regular set C ∈ B + (X) exists. The function
V = VC is everywhere finite and satisfies (V2), by (11.24), for a suitably large constant
b; so (ii) holds. Conversely, Theorem 11.3.11 (i) tells us that if (V2) holds for a petite
set C with V finite valued then each sublevel set of V is regular, and so (i) holds.
If the expectation is finite as described in (iii), then by (11.24) we see that the func-
tion V = VC satisfies (V2) for a suitably large constant b. Hence from Theorem 11.3.15
we see that the chain is regular; and the converse is trivial.
Proposition 11.4.1. If Φ is a random walk on a half line with finite mean increment β, then Φ is regular if
   β = ∫ w Γ(dw) < 0;
Proof By consideration of the proof of Proposition 8.5.1, we see that this result has
already been established, since (11.18) was exactly the condition verified for recurrence
in that case, whilst (11.19) is simply checked for the random walk.
From the results in Section 8.5, we know that the random walk on R+ is transient
if β > 0, and that (at least under a second moment condition) it is recurrent in the
marginal case β = 0. We shall show in Proposition 11.5.3 that it is not regular in this
marginal case.
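A quick simulation makes the dichotomy visible. The sketch below (illustrative Gaussian increments; no parameters are taken from the text) estimates the mean return time to {0} of the walk on [0, ∞): with β < 0 the sample mean settles down, while in the marginal case β = 0 the estimate is dominated by enormous excursions, consistent with the failure of regularity shown in Proposition 11.5.3.

```python
import random

def return_time(beta, x0=10.0, cap=100_000):
    """Steps for the walk Phi_{n+1} = max(Phi_n + W, 0), W ~ N(beta, 1),
    to first hit {0}, truncated at `cap` as a safety bound."""
    x, n = x0, 0
    while n < cap:
        x = max(x + random.gauss(beta, 1.0), 0.0)
        n += 1
        if x == 0.0:
            return n
    return cap

random.seed(0)
for beta in (-0.5, 0.0):
    times = [return_time(beta) for _ in range(500)]
    print(beta, sum(times) / len(times))
```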
Hence, as we already know, the chain is positive recurrent if Σ_y y p(y) < ∞. Since E_0[τ_0] = Σ_y y p(y), the drift condition with V(x) = x is also necessary, as we have seen.
The forward recurrence time chain thus provides a simple but clear example of the
need to include the second bound (11.19) in the criterion for positive recurrence.
Linear models
Consider the simple linear model defined in (SLM1) by
Xn = αXn −1 + Wn .
We have
Proposition 11.4.2. Suppose that the disturbance variable W for the simple linear
model defined in (SLM1), (SLM2) is non-singular with respect to Lebesgue measure,
and satisfies E[log(1 + |W |)] < ∞. Suppose also that |α| < 1. Then every compact set
is regular, and hence the chain itself is regular.
Proof From Proposition 6.3.5 we know that the chain X is a ψ-irreducible and
aperiodic T-chain under the given assumptions.
Let V (x) = log(1 + ε|x|), where ε > 0 will be fixed below. We will verify that (V2)
holds with this choice of V by applying the following two special properties of this test
function:
   V(x + y) ≤ V(x) + V(y),   (11.30)
and hence from (11.31) there exists r < ∞ such that whenever |X_0| ≥ r,
   V(X_1) ≤ V(X_0) − (1/2) log(|α|^{−1}) + V(W_1).
Taking expectations, and choosing ε small enough that E[V(W_1)] = E[log(1 + ε|W|)] ≤ (1/4) log(|α|^{−1}), we obtain for |x| ≥ r
   E_x[V(X_1)] ≤ V(x) − (1/4) log(|α|^{−1}).
So we have that (V2) holds with C = {x : |x| ≤ r} and the result follows.
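The choice of V can be checked numerically. The sketch below (illustrative values α = 0.5, ε = 1 and standard Gaussian noise) estimates ΔV(x) = E[V(αx + W)] − V(x) for V(x) = log(1 + ε|x|) and compares it with the bound −(1/4) log(|α|^{−1}) used above.

```python
import math
import random

ALPHA, EPS = 0.5, 1.0      # illustrative parameters with |alpha| < 1

def V(x):
    return math.log(1.0 + EPS * abs(x))

def drift(x, n=200_000):
    """Monte Carlo estimate of Delta V(x) = E[V(alpha*x + W)] - V(x)."""
    rng = random.Random(42)
    s = sum(V(ALPHA * x + rng.gauss(0.0, 1.0)) for _ in range(n))
    return s / n - V(x)

bound = -0.25 * math.log(1.0 / abs(ALPHA))
for x in (5.0, 50.0, 500.0):
    print(x, drift(x), bound)   # for large |x| the drift falls below the bound
```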
This is part of the recurrence result we proved using a stochastic comparison argu-
ment in Section 9.5.1, but in this case the direct proof enables us to avoid any restriction
on the range of the increment distribution.
We can extend this simple construction much further, and we shall do so in Chap-
ter 15 in particular, where we show that the geometric drift condition exhibited by the
linear model implies much more, including rates of convergence results, than we have
so far described.
   ρ_r := λ_r/µ < 1   (11.32)
is satisfied. This will be shown to imply positive Harris recurrence for the chain Φ.
Write [0] = (0, 0) for the state where the queue is empty. Under (11.32), for each x ∈ X, we may find m ∈ Z+ sufficiently large that P^m(x, {[0]}) > 0.
This follows because under the load constraint, there exists δ > 0 such that with positive
probability, each of the first m interarrival times exceeds each of the first m service times
by at least δ, and also none of the first m customers re-enter the queue.
Proposition 11.4.3. Suppose that the load constraint (11.32) is satisfied. Then the
Markov chain Φ is δ[0] -irreducible and aperiodic, and every compact subset of X is petite.
We let Wn denote the total amount of time that the server will spend servicing the
customers which are in the system at time Tn +. Let V (x) = Ex [W0 ]. It is easily seen
that
V (x) = E[Wn | Φn = x],
   E_x[W_k] ≤ E_x[W_0] − 1,  x ∈ A^c;
   sup_{x∈A} E_x[W_k] < ∞;   (11.34)
this implies that V (x) satisfies (V2) for the k-skeleton, and hence as in the proof of
Theorem 11.1.4 both the k-skeleton and the original chain are regular.
Proposition 11.4.4. Suppose that ρr < 1. Then (11.34) is satisfied for some compact
set A ⊂ X and some k ∈ Z+ , and hence Φ is a regular chain.
Am = {x ∈ X : |x| ≤ m}, m ∈ Z+ .
   W_k = W_0 + Σ_{i=1}^{k} Σ_{j=1}^{n_i} S(i, j) − ζ_k,   (11.35)
where ni denotes the number of times that the ith customer visits the system, and the
random variables S(i, j) are i.i.d. with mean µ−1 .
Now choose m so large that
Then by (11.35), and since λr /λ is equal to the expected number of times that a
customer will re-enter the queue,
   E_x[W_k] ≤ E_x[W_0] + Σ_{i=1}^{k} E_x[n_i](1/µ) − (E[T_k] − 1)
            = E_x[W_0] + (k λ_r/λ)(1/µ) − k/λ + 1
            = E_x[W_0] − (k/λ)(1 − ρ_r) + 1,
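The arithmetic behind this bound is worth displaying explicitly; the following fragment (with illustrative rates, not data from the text) evaluates it and confirms that the drift is linear in k with slope −(1 − ρ_r)/λ, which is what delivers (11.34) for a k-skeleton with k large.

```python
# Illustrative rates: lam = external arrival rate, lam_r = total arrival rate
# including re-entries, mu = service rate, so rho_r = lam_r / mu as in (11.32).
lam, lam_r, mu = 1.0, 1.5, 2.0
rho_r = lam_r / mu
assert rho_r < 1                       # the load condition (11.32)

def workload_bound(k, EW0):
    # The bound E_x[W_k] <= E_x[W_0] - (k/lam)(1 - rho_r) + 1 derived above.
    return EW0 - (k / lam) * (1 - rho_r) + 1

for k in (1, 10, 40):
    print(k, workload_bound(k, EW0=10.0))   # decreases at rate 0.25 per step
```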
Proof To prove regularity for this interior set, we use (V2), and show that when
(11.36)–(11.40) hold there is a function V and an interval set [−R, R] satisfying the
drift condition
   ∫ P(x, dy) V(y) ≤ V(x) − 1,  |x| > R.   (11.41)
First consider the condition (11.36). When this holds it is straightforward to calculate
that there must exist positive constants a, b such that
for all |x| sufficiently large. The sufficiency of (11.38) follows by symmetry, or directly
by choosing the test function
   V(x) = γ|x| for x ≤ 0,   V(x) = −2[φ(M)]^{−1} x for x > 0,
with
   γ > −2 |θ(1)| [φ(M)]^{−1}.
In the case (11.39), the chain is driven by the constant terms and we use the test
function "
2 [φ(1)]−1 |x| x≤0
V (x) = −1
2 [|φ(M )|] x x > 0
to give the result.
The region defined by (11.40) is the hardest to analyze. It involves the way in which
successive movements of the chain take place, and we reach the result by considering
the two-step transition matrix P 2 .
Let f_j denote the density of the noise variable W(j). Fix j and x ∈ R_j and write
   ∫ P^2(x, dy) V(y) ≤ ax + (a/2)(φ(1) + θ(1)φ(M)),  x ≥ R.
But now by assumption φ(M ) + θ(M )φ(1) > 0, and the complete set of conditions
(11.40) also give φ(1) + θ(1)φ(M ) < 0. By suitable choice of a, b we have that the drift
condition (11.41) holds for the two-step chain, and hence this chain is regular. Clearly,
this implies that the one-step chain is also regular, and we are done.
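Sample paths illustrate the classification. The sketch below simulates a two-regime SETAR chain with threshold at zero and standard Gaussian noise (illustrative parameters only): deep inside the positivity region the time-average of |X_n| stabilises, while on the margin θ(M) = 1, φ(M) = 0 the averages are far larger, reflecting recurrence without positivity.

```python
import random

def setar_path_mean(phi1, th1, phiM, thM, n=200_000, seed=7):
    """Time-average of |X_k| for a two-regime SETAR chain, threshold at 0."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        x = (phi1 + th1 * x + w) if x <= 0 else (phiM + thM * x + w)
        total += abs(x)
    return total / n

print(setar_path_mean(0.0, 0.5, 0.0, 0.5))   # interior: small, stable average
print(setar_path_mean(0.0, 0.5, 0.0, 1.0))   # margin theta(M)=1, phi(M)=0
```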
   ΔV(x) ≥ 0,  x ∈ C^c;   (11.42)
and
   sup_{x∈X} ∫ P(x, dy) |V(x) − V(y)| < ∞.   (11.43)
Then for any x_0 ∈ C^c satisfying
   V(x_0) > sup_{y∈C} V(y)   (11.44)
we have E_{x_0}[τ_C] = ∞.
Proof The proof uses a technique similar to that used to prove Dynkin’s formula.
Suppose by way of contradiction that Ex 0 [τC ] < ∞, and let Vk = V (Φk ). Then we have
   V_{τ_C} = V_0 + Σ_{k=1}^{τ_C} (V_k − V_{k−1})
           = V_0 + Σ_{k=1}^{∞} (V_k − V_{k−1}) I{τ_C ≥ k}.
Taking expectations, and using (11.43) to justify the interchange of expectation and summation, (11.42) shows that E_{x_0}[V_{τ_C}] ≥ V(x_0). But by (11.44), V_{τ_C} < V(x_0) = V_0 with probability one, and this contradiction shows that E_{x_0}[τ_C] = ∞.
This gives a criterion for a ψ-irreducible chain to be non-positive. Based on Theo-
rem 11.1.4 we have immediately
Theorem 11.5.2. Suppose that the chain Φ is ψ-irreducible and that the non-negative function V satisfies (11.42) and (11.43) where C ∈ B+(X). If the set of points satisfying (11.44) lies in B^+(X), then the chain Φ is not positive recurrent.
In practice, one would set C equal to a sublevel set of the function V so that the
condition (11.44) is satisfied automatically for all x ∈ C c .
It is not the case that this result holds without some auxiliary conditions such as (11.43). For example, take the state space to be Z+, and define P(0, i) = 2^{−i} for all i > 0; if we now choose k(i) > 2i and let the transition law from each i > 0 place its mass on {0, k(i)} in such a way that
   P_0(τ_0 ≥ n + 1) ≤ 2^{−n},
then the chain is positive recurrent even though ΔV(i) ≥ 0 for V(x) = x, and in fact we can choose k(i) to give any value of ΔV(i) we wish.
Proposition 11.5.3. If Φ is a random walk on a half line with mean increment β, then Φ is regular if and only if
   β = ∫ w Γ(dw) < 0.
Proof In Proposition 11.4.1 the sufficiency of the negative drift condition was
established. If
   β = ∫ w Γ(dw) ≥ 0,
then using V (x) = x we have (11.42), and the random walk homogeneity properties
ensure that the uniform drift condition (11.43) also holds, giving non-positivity.
We now give a much more detailed and intricate use of this result to show that the
scalar SETAR model is recurrent but not positive on the “margins” of its parameter
set, between the regions shown to be positive in Section 11.4.3 and those regions shown
to be transient in Section 9.5.2: see Figure B.1–Figure B.3 for the interpretation of the
parameter ranges. In terms of the basic SETAR model defined by
   X_n = φ(j) + θ(j) X_{n−1} + W_n(j),  X_{n−1} ∈ R_j,
we use a test function of the form V(x) = log(u + ax) for x > R, V(x) = log(v − bx) for x < −R, and V(x) = 0 in the region [−R, R], where a, b and R are positive constants and u and v are real numbers to be chosen suitably for the different regions (11.45)–(11.49).
We denote the non-random part of the motion of the chain in the two end regions
by
k(x) = φ(M ) + θ(M )x
and
h(x) = φ(1) + θ(1)x.
We first prove recurrence when (11.45) or (11.46) holds. The proof is similar in style
to that used for random walk in Section 9.5, but we need to ensure that the different
behavior in each end of the two end intervals can be handled simultaneously.
Consider first the parameter region θ(M ) = 1, φ(M ) = 0, and 0 ≤ θ(1) < 1, and
choose a = b = u = v = 1, with x > R > rM −1 . Write in this case
so that
Ex [V (X1 )] = V1 (x) + V2 (x).
In order to bound the terms in the expansion of the logarithms in V1 , V2 , we use the
further notation
For x < −R < r1 and θ(1) = 0, Ex [V (X1 )] is a constant and is therefore less than V (x)
for large enough R.
For x < −R < r1 and 0 < θ(1) < 1, consider
we have as before
Ex [V (X1 )] = V6 (x) + V7 (x). (11.55)
To handle the expansion of terms in this case we use
Hence choosing R large enough that v − bh(x) ≤ v − bx, we have from (11.55),
By Lemma 9.4.4(ii),
and thus
Finally consider the region θ(M ) = 1, φ(M ) = 0, θ(1) < 0, and choose a = −bθ(M )
and v − u = aφ(1). For x > R > rM −1 , (11.53) is obtained in a manner similar to the
above. For x < −R < r1 , we look at
By Lemma 9.4.3
and
V7 (x) ≤ Γ1 (−∞, −R − h(x))(log(−v + bh(x)) − 2) − V9 (x).
From the choice of a, b, u and v,
and thus by Lemma 8.5.3 and Lemma 9.4.4(i) for R large enough
When (11.46) holds, the recurrence of the SETAR model follows by symmetry from the
result in the region (11.45).
(ii) We now consider the region where (11.47) holds: in (11.48) the result will
again follow by symmetry.
Choose a = b = u = v = 1 in the definition of V . For x > R > rM −1 , (11.53) holds
as before. For x < −R < r1 , since 1 − h(x) ≤ 1 − x,
   Γ_M(−∞, −R − k(x)) log(v − bk(x)) = log(u + ax) − Γ_M(−R − k(x), ∞) log(u + ax),
   E_x[V(X_1)] ≤ V(x) − (b²/(2(v − bk(x))²)) E[W²(M) I{W(M) > 0}] + o(x^{−2})
             ≤ V(x),  x > R.   (11.58)
and
   V_{cd}(x) = ax + c for x > 0,   V_{cd}(x) = b|x| + d for x ≤ 0.
It is immediate that
   ∫ P(x, dy) |V(x) − V(y)| ≤ a E[|W(1)|] + b E[|W(M)|] + 2(a|θ(1)| + b|θ(M)|) + 2|d − c|,
11.6 Commentary
For countable space chains, the results of this chapter have been thoroughly explored.
The equivalence of positive recurrence and the finiteness of expected return times to
each atom is a consequence of Kac’s Theorem, and as we saw in Proposition 11.1.1, it
is then simple to deduce the regularity of all states. As usual, Feller [114] or Chung [71]
or Çinlar [59] provide excellent discussions.
Indeed, so straightforward is this in the countable case that the name “regular
chain”, or any equivalent term, does not exist as far as we are aware. The real focus
on regularity and similar properties of hitting times dates to Isaac [169] and Cogburn
[75]; the latter calls regular sets “strongly uniform”. Although many of the properties
of regular sets are derived by these authors, proving the actual existence of regular sets
for general chains is a surprisingly difficult task. It was not until the development of
the Nummelin–Athreya–Ney theory of splitting and embedded regeneration occurred
that the general result of Theorem 11.1.4, that positive recurrent chains are “almost”
regular chains was shown (see Nummelin [302]).
Chapter 5 of Nummelin [303] contains many of the equivalences between regularity
and positivity, and our development owes a lot to his approach. The more general
f -regularity condition on which he focuses is central to our Chapter 14: it seems worth
considering the probabilistic version here first.
For countable chains, the equivalence of (V2) and positive recurrence was developed
by Foster [129], although his proof of sufficiency is far less illuminating than the one we
have here. The earliest results of this type on a non-countable space appear to be those
in Lamperti [235], and the results for general ψ-irreducible chains were developed by
Tweedie [397, 398]. The use of drift criteria for continuous space chains, and the use of
Dynkin’s formula in discrete time, seem to appear for the first time in Kalashnikov [187,
189, 190]. The version used here and later was developed in Meyn and Tweedie [277],
although it is well known in continuous time for more special models such as diffusions
(see Kushner [232] or Khas’minskii [206]).
There are many rediscoveries of mean drift theorems in the literature. For operations
research models (V2) is often known as Pakes’ Lemma from [313]: interestingly, Pakes’
result rediscovers the original form buried in the discussion of Kendall’s famous queueing
paper [200], where Foster showed that a sufficient condition for positivity of a chain on
Z+ is the existence of a solution to the pair of equations
P (x, y)V (y) ≤ V (x) − 1, x≥N
P (x, y)V (y) < ∞, x < N,
although in [129] he only gives the result for N = 1. The general N form was also re-
discovered by Moustafa [289], and a form for reducible chains given by Mauldon [251].
An interesting state-dependent variation is given by Malyšev and Men'šikov [243]; we
return to this and give a proof based on Dynkin’s formula in Chapter 19.
The systematic exploitation of the various equivalences between hitting times and
mean drifts, together with the representation of π, is new in the way it appears here.
In particular, although it is implicit in the work of Tweedie [398] that one can identify
sublevel sets of test functions as regular, the current statements are much more com-
prehensive than those previously available, and generalize easily to give an appealing
approach to f -regularity in Chapter 14.
The criteria given here for chains to be non-positive have a shorter history. The
fact that drift away from a petite set implies non-positivity provided the increments are
bounded in mean appears first in Tweedie [398], with a different and less transparent
proof, although a restricted form is in Doob ([99], p. 308), and a version similar to that we give here has recently been given by Fayolle et al. [110]. All proofs we know
require bounded mean increments, although there appears to be no reason why weaker
constraints may not be as effective.
Related results on the drift condition can be found in Marlin [249], Tweedie [396],
Rosberg [336] and Szpankowski [380], and no doubt in many other places: we return to
these in Chapter 19.
Applications of the drift conditions are widespread. The first time series application
appears to be by Jones [182], and many more have followed. Laslett et al. [237] give an
overview of the application of the conditions to operations research chains on the real
line. The construction of a test function for the GI/G/1 queue given in Section 11.4.2
is taken from Meyn and Down [273] where this forms a first step in a stability analysis
of generalized Jackson networks. A test function approach is also used in Sigman [354]
and Fayolle et al. [110] to obtain stability for queueing networks: the interested reader
should also note that in Borovkov [43] the stability question is addressed using other
means.
The SETAR analysis we present here is based on a series of papers where the SETAR
model is analyzed in increasing detail. The positive recurrence and transience results
are essentially in Petruccelli et al. [315] and Chan et al. [64], and the non-positivity
analysis as we give it here is taken from Guo and Petruccelli [149]. The assumption of
finite variances in (SETAR3) is again almost certainly redundant, but an exact condition
is not obvious.
We have been rather more restricted than we could have been in discussing specific
models at this point, since many of the most interesting examples, both in operations
research and in state space and time series models, actually satisfy a stronger version
of the drift condition (V2): we discuss these in detail in Chapter 15 and Chapter 16.
However, it is not too strong a statement that Foster’s criterion (as (V2) is often known)
has been adopted as the tool of choice to classify chains as positive recurrent: for a
number of applications of interest we refer the reader to the recent books by Tong
[388] on nonlinear models and Asmussen [9] on applied probability models. Variations
for two-dimensional chains on the positive quadrant are also widespread: the first of
these seems to be due to Kingman [207], and ongoing usage is typified by, for example,
Fayolle [109].
Chapter 12
Invariance and tightness
Proof We prove (i) in Theorem 12.1.2, together with a number of consequents for
weak Feller chains. The proof of (ii) essentially occupies Section 12.4, and is concluded
in Theorem 12.4.1.
We will see that for Feller chains, and even more powerfully for e-chains, this ap-
proach based upon tightness and weak convergence of probability measures provides
a quite different method for constructing an invariant probability measure. This is
exemplified by the linear model construction which we have seen in Section 10.5.4.
From such constructions we will show in Section 12.4 that (V2) implies a form of
positivity for a Feller chain. In particular, for e-chains, if (V2) holds for a compact
set C and an everywhere finite function V then the chain is bounded in probability on
average, so that there is a collection of invariant measures as in Theorem 12.0.1 (ii).
In this chapter we also develop a class of kernels, introduced by Neveu in [295],
which extend the definition of the kernels UA . This involves extending the definition of
a stopping time to randomized stopping times. These operators have very considerable
intuitive appeal and demonstrate one way in which the results of Section 10.4 can be
applied to non-irreducible chains.
Using this approach, we will also show that (V1) gives a criterion for the existence
of a σ-finite invariant measure for a Feller chain.
if Φ is evanescent, then for some x there is an ε > 0 such that for every compact C,
   lim sup_{n→∞} P_x{ ⋃_{j=n}^∞ {Φ_j ∈ C} } ≤ 1 − ε
(i) If an invariant probability does not exist, then for any compact set C ⊂ X,
   P^n(x, C) → 0 as n → ∞,   (12.4)
   K_{a_ε}(x, C) → 0 as ε ↑ 1,   (12.5)
uniformly in x ∈ X.
Proof We prove only (12.4), since the proof of (12.5) is essentially identical. The
proof is by contradiction: we assume that no invariant probability exists, and that
(12.4) does not hold.
Fix f ∈ C_c(X) such that f ≥ 0, and fix δ > 0. Define the open sets {A_k : k ∈ Z+} by
   A_k = {x ∈ X : P^k f > δ}.
If (12.4) does not hold then for some such f there exists δ > 0 and a subsequence {N_i : i ∈ Z+} of Z+ with A_{N_i} ≠ ∅ for all i. Let x_i ∈ A_{N_i} for each i, and define
   λ_i := P^{N_i}(x_i, · ).
We see from Proposition D.5.6 that the set of sub-probabilities is sequentially compact with respect to vague convergence. Let λ_∞ be any vague limit point, so that λ_{n_i} →v λ_∞ along a further subsequence.
By regularity of finite measures on B(X) (cf Theorem D.3.2) this implies that λ∞ ≥
λ∞ P , which is only possible if λ∞ = λ∞ P . Since we have assumed that no invariant
probability exists it follows that λ∞ = 0, which contradicts (12.6). Thus we have that
Ak = ∅ for sufficiently large k.
To prove (ii), let Φ be bounded in probability on average. Since by definition we can then find ε > 0, x ∈ X and a compact set C such that P^j(x, C) > 1 − ε for all sufficiently large j, (12.4) fails and so the chain admits an invariant probability.
The following corollary easily follows: notice that the condition (12.8) is weaker
than the obvious condition of Lemma D.5.3 for boundedness in probability on average.
Proposition 12.1.3. Suppose that the Markov chain Φ has the Feller property, and
that a coercive function V exists such that for some initial condition x ∈ X the condition (12.8) holds; then an invariant probability measure exists for Φ.
These results require minimal assumptions on the chain. They do have two draw-
backs in practice.
Firstly, there is no guarantee that the invariant probability is unique. Currently,
known conditions for uniqueness involve the assumption that the chain is ψ-irreducible.
This immediately puts us in the domain of Chapter 10, and if the measure ψ has an
open set in its support, then in fact we have the full T-chain structure immediately
available, and so we would avoid the weak convergence route.
Secondly, and essentially as a consequence of the lack of uniqueness of the invariant measure π, we do not in general have guaranteed that
   P^n(x, · ) →w π.
Proposition 12.1.4. Suppose that the Markov chain Φ has the Feller property, and is
bounded in probability on average.
If the invariant measure π is unique then for every x
   P^n(x, · ) →w π.   (12.9)
Proof Since for every subsequence {nk } the set of probabilities {P n k (x, · )} is
sequentially compact in the weak topology, then as in the proof of Theorem 12.1.2,
from boundedness in probability we have that there is a further subsequence converging
weakly to a non-trivial limit which is invariant for P . Since all these limits coincide by
the uniqueness assumption on π we must have (12.9).
Recall that in Proposition 6.4.2 we came to a similar conclusion. In that result,
convergence of the distributions to a unique invariant probability, in a manner similar
to (12.9), is given as a condition under which a Feller chain Φ is an e-chain.
   = h(Φ_k),
where in the second equality we used the fact that the event {τ_h ≥ k} is measurable with respect to σ(Φ, Y_1, . . . , Y_{k−1}), and in the final equality we used the independence of Y and Φ.
Now define the kernel U_h on X × B(X) by
   U_h(x, B) = E_x[ Σ_{k=1}^{τ_h} I_B(Φ_k) ].   (12.13)
where I1−h denotes the kernel which gives multiplication by 1 − h. This final expression
for Uh defines this kernel independently of the bivariate chain.
In the special cases h ≡ 0, h = IB , and h ≡ 1 we have, respectively,
Uh = U, Uh = UB , Uh = P.
When h ≡ 1/2, so that τ_h is completely independent of Φ, we have
   U_h = Σ_{k=1}^∞ (1/2)(1/2)^{k−1} P^k = K_{a_{1/2}}.
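The randomized stopping time is easy to simulate directly from its definition. In the sketch below (an illustrative stable AR(1) chain; none of the choices come from the text) we draw τ_h for the constant function h ≡ 1/2 and recover its geometric law with mean 2, in line with the identification of U_h with K_{a_{1/2}} above.

```python
import random

def sample_tau_h(x0, step, h, rng):
    """tau_h = min{k >= 1 : Y_k <= h(Phi_k)}, with Y_k i.i.d. uniform(0,1)."""
    x, k = x0, 0
    while True:
        x = step(x, rng)
        k += 1
        if rng.random() <= h(x):
            return k

rng = random.Random(3)
step = lambda x, r: 0.8 * x + r.gauss(0.0, 1.0)   # illustrative AR(1) chain
h = lambda x: 0.5                                  # h identically one half

draws = [sample_tau_h(0.0, step, h, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))   # close to 2 = mean of a geometric(1/2) time
```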
For general functions h, the expression (12.14) defining Uh involves only the transition
function P for Φ and hence allows us to drop the bivariate chain if we are only interested
in properties of the kernel Uh . However the existence of the bivariate chain and the
construction of τh allows a transparent proof of the following resolvent equation.
Theorem 12.2.1 (Resolvent equation). Let h ≤ 1 and g ≤ 1 be two functions on X
with h ≥ g. Then the resolvent equation holds:
   U_g = U_h + U_h I_{h−g} U_g = U_h + U_g I_{h−g} U_h.
Proof To prove the theorem we will consider the bivariate chain Ψ. We will see
that the resolvent equation formalizes several relationships between the stopping times
τ_g and τ_h for Ψ. Note that since h ≥ g, we have the inclusion A_g ⊆ A_h and hence τ_g ≥ τ_h.
To prove the first resolvent equation we write
   Σ_{k=1}^{τ_g} f(Φ_k) = Σ_{k=1}^{τ_h} f(Φ_k) + I{τ_g > τ_h} Σ_{k=τ_h+1}^{τ_g} f(Φ_k),
which, after conditioning at the successive renewal opportunities, gives rise to terms of the form
   Σ_k [h(Φ_k) − g(Φ_k)] U_g(Φ_k, f) Π_{i=1}^{k−1} [1 − h(Φ_i)].
   U_g(x, f) = U_h(x, f) + E_x[ Σ_{k=1}^{τ_g} I{g(Φ_k) < Y_k ≤ h(Φ_k)} θ^k Σ_{i=1}^{τ_h} f(Φ_i) ].   (12.16)
The expectation can be transformed, using the Markov property for the bivariate chain, to give
   E_x[ Σ_{k=1}^{τ_g} I{g(Φ_k) < Y_k ≤ h(Φ_k)} θ^k Σ_{i=1}^{τ_h} f(Φ_i) ]
      = E_x[ Σ_{k=1}^∞ I{g(Φ_k) < Y_k ≤ h(Φ_k)} I{τ_g ≥ k} E_{Ψ_k}[ Σ_{i=1}^{τ_h} f(Φ_i) ] ]
      = E_x[ Σ_{k=1}^∞ [h(Φ_k) − g(Φ_k)] I{τ_g ≥ k} U_h(Φ_k, f) ]
      = (U_g I_{h−g} U_h)(x, f).
   P_h(x, A) = U_h I_h(x, A)
is a Markov transition function. This follows from (12.11), which shows that
   P_h(x, X) = U_h(x, h) = E_x[ Σ_{k=1}^∞ Π_{i=1}^{k−1} (1 − h(Φ_i)) h(Φ_k) ]
             = Σ_{k=1}^∞ P_x{τ_h = k}   (12.17)
(ii) Px {τh < ∞} = L(x, h), and hence L(x, h) ≥ Q(x, h);
(iii) if for some ε < 1 the function h satisfies h(x) ≤ ε for all x ∈ X, then L(x, h) = 1
if and only if Q(x, h) = 1.
   P_x{Ψ_k ∈ A_h i.o. | F_∞^Φ} = P_x{Y_k ≤ h(Φ_k) i.o. | F_∞^Φ}.
Conditioned on F_∞^Φ, the events {Y_k ≤ h(Φ_k)}, k ≥ 1, are mutually independent. Hence by the Borel–Cantelli Lemma,
   P_x{Ψ_k ∈ A_h i.o. | F_∞^Φ} = I{ Σ_{k=1}^∞ P_x{Y_k ≤ h(Φ_k) | F_∞^Φ} = ∞ }.
We now present an application of Theorem 12.2.2 which gives another representation
for an invariant measure, extending the development of Section 10.4.2.
(i) If µ is any σ-finite subinvariant measure, then µ is invariant and has the representation
   µ(A) = ∫ µ(dx) h(x) U_h(x, A).
(ii) The measure µ is finite on each of the sets
   C_ε = {x : K_{a_{1/2}}(x, h) > ε}.
Proof We prove (i) by considering the bivariate chain Ψ. The set Ah ⊂ Y is Harris
recurrent and in fact Px {Ψ ∈ Ah i.o.} = 1 for all x ∈ X by Theorem 12.2.2. Now define
the measure µ∗ on Y by
   µ∗(A × B) = E_ν[ Σ_{k=1}^{τ_h} I{Ψ_k ∈ A × B} ];
this measure is invariant for Ψ, and since the distribution of Φ is the marginal distribution of Ψ, the measure µ defined by µ(A) := µ∗(A × [0, 1]), A ∈ B(X), is invariant for Φ.
We now demonstrate that µ is σ-finite. From the assumptions of the theorem and Theorem 12.2.2 (ii) the sets C_ε cover X, and from the representation of µ we have, for all ε, the bound µ(C_ε) ≤ µ(h)/ε < ∞, which completes the proof of (ii).
Observe that by Proposition 9.1.1, (12.19) implies that Φ visits C infinitely often from
each initial condition, and hence Φ is at least non-evanescent.
To construct an invariant measure we essentially consider the chain ΦC obtained
by sampling Φ at consecutive visits to the compact set C. Suppose that the resulting
sampled chain on C had the Feller property. In this case, since the sampled chain
evolves on the compact set C, we could deduce from Theorem 12.1.2 that an invariant
probability existed for the sampled chain, and we would then need only a few further
steps for an existence proof for the original chain Φ.
However, the transition function PC for the sampled chain is given by
   P_C = Σ_{k=0}^∞ (P I_{C^c})^k P I_C = U_C I_C,
which does not have the Feller property in general. To proceed, we must “smooth
around the edges of the compact set C”. The kernels Ph introduced in the previous
section allow us to do just that.
Let N and O be open subsets of X with compact closure for which C ⊂ O ⊂ Ō ⊂ N, where C satisfies (12.19), and let h : X → R be a continuous function such as
   h(x) = d(x, N^c) / ( d(x, N^c) + d(x, Ō) )
for which
   I_O(x) ≤ h(x) ≤ I_N(x).   (12.20)
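On the real line this construction can be written out in a few lines. The sketch below (interval sets Ō = [1, 2] and N = (0, 3), chosen only for illustration) evaluates h and confirms the sandwich (12.20).

```python
def h(x, O=(1.0, 2.0), N=(0.0, 3.0)):
    """h(x) = d(x, N^c) / (d(x, N^c) + d(x, cl O)) for intervals on R."""
    d_Nc = min(x - N[0], N[1] - x) if N[0] < x < N[1] else 0.0
    d_O = max(O[0] - x, 0.0, x - O[1])          # distance to cl O = [1, 2]
    return d_Nc / (d_Nc + d_O) if d_Nc + d_O > 0 else 0.0

for x in (-1.0, 0.5, 1.5, 2.5, 3.5):
    print(x, h(x))   # h = 1 on cl O, h = 0 off N, and 0 <= h <= 1 throughout
```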
The kernel Ph := Uh Ih is a Markov transition function since by (12.19) we have that
Q(x, h) ≡ 1. Since Ph (x, N̄ ) = 1 for all x ∈ X, we will immediately have an invariant
measure for Ph by Theorem 12.1.2 if Ph has the weak Feller property.
Proposition 12.3.1. Suppose that the transition function P is weak Feller. If 0 ≤ h ≤
1 is continuous and if Q(x, h) ≡ 1, then Ph is also weak Feller.
Proof  By the Feller property, the kernel (P I_{1−h})^n P I_h preserves positive lower semicontinuous functions. Hence if f is positive and lower semicontinuous, then
   P_h f = Σ_{n=0}^∞ (P I_{1−h})^n P I_h f
is an increasing limit of positive lower semicontinuous functions, and is hence itself lower semicontinuous.
Proof  From Theorem 12.1.2 an invariant probability ν exists which is invariant for P_h = U_h I_h. Hence from Theorem 12.2.3, the measure µ = ν U_h is invariant for Φ and is finite on the sets {x : K_{a_{1/2}}(x, h) > ε}. Since K_{a_{1/2}}(x, h) is a continuous function of x, and is strictly positive everywhere by (12.19), it follows that µ is finite on compact sets.
Proof  If L(x, C) = 1 for all x ∈ X, then the proof follows from Theorem 12.3.2. Consider now the only other possibility, where L(x, C) < 1 for some x. In this case the adapted process {V(Φ_k) I{τ_C > k}, F_k^Φ} is a convergent supermartingale, as in the proof of Theorem 9.4.1, and since by assumption P_x{τ_C = ∞} > 0, this shows that
   P_x{ lim sup_{k→∞} V(Φ_k) < ∞ } ≥ 1 − L(x, C) > 0.
By Theorem 12.1.2, it follows that an invariant probability exists, and this completes
the proof.
Finally we prove that in the weak Feller case, the drift condition (V2) again provides
a criterion for the existence of an invariant probability measure.
Theorem 12.3.4. Suppose that the chain Φ is weak Feller. If (V2) is satisfied with a
compact set C and a positive function V which is finite at one x0 ∈ X, then an invariant
probability measure π exists.
   1 ≤ (1/n) V(x_0) + b (1/n) Σ_{k=0}^{n} P^k(x_0, C),
so that
   lim inf_{n→∞} (1/n) Σ_{k=0}^{n} P^k(x_0, C) ≥ 1/b.   (12.21)
In this case, it follows as in Proposition 6.4.2 from Ascoli’s Theorem D.4.2 that {P k f :
k ∈ Z+ } is equicontinuous on compact subsets of X whenever f ∈ C(X), and so it is
necessary that the chain Φ be an e-chain, in the sense of Section 6.4, whenever we have
convergence in the sense of (12.22).
The key to analyzing e-chains lies in the following result:
Theorem 12.4.1. Suppose that Φ is an e-chain. Then
(i) There exists a substochastic kernel Π such that
   P^k(x, · ) →v Π(x, · ) as k → ∞,   (12.23)
   K_{a_ε}(x, · ) →v Π(x, · ) as ε ↑ 1,   (12.24)
for all x ∈ X.
(ii) For each j, ℓ ∈ Z+ and k ≥ 1 we have
   P^j Π^k P^ℓ = Π,   (12.25)
and hence for all x ∈ X the measure Π(x, · ) is invariant with Π(x, X) ≤ 1.
(iii) The Markov chain is bounded in probability on average if and only if Π(x, X) = 1 for all x ∈ X.
Proof  We prove the result (12.23), the proof of (12.24) being similar. Let {f_n} ⊂ C_c(X) denote a fixed dense subset. By Ascoli's theorem and a diagonal subsequence argument, there exists a subsequence {k_i} of Z+ and functions {g_n} ⊂ C(X) such that P^{k_i} f_n → g_n as i → ∞, uniformly for x in compact subsets of X for each n ∈ Z+. The set of all subprobabilities on B(X) is sequentially compact with respect to vague convergence, and any vague limit ν of the probabilities P^{k_i}(x, · ) must satisfy ∫ f_n dν = g_n(x) for all n ∈ Z+. Since the functions {f_n} are dense in C_c(X), this shows that for each x there is exactly one vague limit point, and hence a kernel Π exists for which
   P^{k_i}(x, · ) →v Π(x, · ) as i → ∞
for each x ∈ X.
Observe that by equicontinuity, the function Πf is continuous for every function
f ∈ Cc (X). It follows that Πf is positive and lower semicontinuous whenever f has
these properties.
By the Dominated Convergence Theorem we have for all j ∈ Z+ and k ≥ 1,
   P^j Π^k = Π,
and similarly
   Π^k P^j = Π,  j ∈ Z+, k ≥ 1.
Let f ∈ C_c(X) be a continuous positive function with compact support, and let Π′ denote the kernel obtained from a second vaguely convergent subsequence, P^{m_j}(x, · ) →v Π′(x, · ). Then, since the function P f is also positive and continuous, (D.6) implies that
   Π f = lim_{j→∞} Π P^{m_j} f
       = Π Π′ f     by the Dominated Convergence Theorem
       ≤ lim inf_{i→∞} P^{k_i} Π′ f     since Π′ f is continuous and positive
       = Π′ f.
Hence by symmetry, Π′ = Π, and this completes the proof of (i) and (ii).
The result (iii) follows from (i) and Proposition D.5.6.
Proof For the first entrance time τC to the compact set C, let θτ C denote the
τC -fold shift on sample space, defined so that θτ C f (Φk ) = f (Φk +τ C ) for any function f
on X.
Fix x ∈ C, 0 < ε < 1, and observe that by conditioning at time τC and using the
strong Markov property we have for x ∈ C,
   K_{a_ε}(x, C) = (1 − ε) E_x[ Σ_{k=0}^∞ ε^k I{Φ_k ∈ C} ]
               = (1 − ε) E_x[ 1 + Σ_{k=0}^∞ ε^{τ_C + k} θ^{τ_C} I{Φ_k ∈ C} ]
               = (1 − ε) + (1 − ε) E_x[ ε^{τ_C} E_{Φ_{τ_C}}[ Σ_{k=0}^∞ ε^k I{Φ_k ∈ C} ] ]
               ≥ (1 − ε) + E_x[ε^{τ_C}] inf_{y∈C} K_{a_ε}(y, C).
Px {τC < ∞} = 1, x ∈ X,
Proof (i) If Π(x, X) > 0 for some x ∈ X, then an invariant probability π exists.
In fact, we may take π = Π(x, · )/Π(x, X).
From the definition of Π and the Dominated Convergence Theorem we have that
for any f ∈ Cc (X),
   π(f) = lim_{n→∞} π P^n(f) = π Π(f),
which shows that π = πΠ. Hence 1 = π(X) = ∫ π(dx) Π(x, X). This shows that Π(y, X) = 1 for a.e. y ∈ X [π], proving (i) of the theorem.
(ii) Let ρ = inf_{x∈X} Π(x, X), and let
   S_ρ = {x ∈ X : Π(x, X) = ρ}.
Hence from Theorem 12.4.3 (ii) we have Π(x, X) = 1 for all x ∈ X. Theorem 12.4.1
then implies that the chain is bounded in probability on average.
The next result shows that the drift criterion for positive recurrence for ψ-irreducible
chains also has an impact on the class of e-chains.
Theorem 12.4.5. Let Φ be an e-chain, and suppose that condition (V2) holds for
a compact set C and an everywhere finite function V . Then the Markov chain Φ is
bounded in probability on average.
Proof It follows from Theorem 11.3.4 that Ex [τC ] ≤ V (x) for x ∈ C c , so that a
fortiori we also have L(x, C) ≡ 1. As in the proof of Theorem 12.3.4, for any x ∈ X,
   Π(x, X) ≥ lim sup_{n→∞} (1/n) Σ_{k=0}^{n} P^k(x, C) ≥ 1/b,  x ∈ X.
From this it follows from Theorem 12.4.3 (iii) and (ii) that Π(x, X) ≡ 1, and hence Φ
is bounded in probability on average as claimed.
Since Wk +1 and Xk are independent, this together with (12.31) implies that
   E[V(X_{k+1}) | X_0, . . . , X_k] ≤ α V(X_k) + E[|G(W_{k+1} − E[W_{k+1}])|_M^2],   (12.34)
and taking expectations of both sides gives
   lim sup_{k→∞} E[V(X_k)] ≤ E[|G(W_{k+1} − E[W_{k+1}])|_M^2] / (1 − α) < ∞.
Since V is a coercive function on X, Lemma D.5.3 gives a direct proof that the chain is
bounded in probability.
We note that (12.34) also ensures immediately that (V2) is satisfied. Under the extra
conditions (LSS4) and (LCM3) we have from Proposition 6.3.5 that all compact sets
are petite, and it immediately follows from Theorem 11.3.11 that the chain is regular
and hence positive Harris.
It may be seen that stability of the linear state space model is closely tied to the
stability of the deterministic system xk +1 = F xk . For each initial condition x0 ∈ Rn of
this deterministic system, the resulting trajectory {xk } satisfies the bound
   |x_k|_M^2 ≤ α^k |x_0|_M^2
and hence is ultimately bounded in the sense of Section 11.2: in fact, in the dynamical
systems literature such a system is called globally exponentially stable. It is precisely
this stability for the deterministic “core” of the linear state space model which allows
us to obtain boundedness in probability for the stochastic process Φ.
We now generalize the model (LSS1) to include random variation in the coefficients
F and G.
For example, this is the case when y_0 = θ̃_0 = Σ_0 = 0. If (12.38) holds then it follows from (2.23) that
   E[Y_{k+1}^2 | Y_k] = Σ_k Y_k^2 + σ_w^2.   (12.39)
This identity will be used to prove the following result:
Proposition 12.5.2. For the adaptive control model satisfying (SAC1) and (SAC2),
suppose that the process Φ defined in (2.25) satisfies (12.38) and that σz2 < 1. Then we
have
   lim sup_{k→∞} E[|Φ_k|^2] < ∞
so that distributions of the chain are tight, and hence Φ is positive recurrent.
Proof  We note first that the sequence {Σ_k} is bounded below and above by Σ̲ = σ_z > 0 and Σ̄ = σ_z/(1 − α^2) < ∞, and that the process θ clearly satisfies
   lim sup_{k→∞} E[θ_k^2] = σ_z^2/(1 − α^2);
moreover
   E[Y_{k+1}^2 Σ_{k+1} | Y_k] = Σ_{k+1} E[Y_{k+1}^2 | Y_k]
Taking total expectations of each side of (12.40), we use the condition σz2 < 1 to obtain
by induction, for all k ∈ Z+ ,
In fact, we will see in Chapter 16 that not only is the process bounded in probability,
but the conditional mean of Yk2 converges to the steady state value Eπ [Y02 ] at a geometric
rate from every initial condition. These results require a more elaborate stability proof.
Note that equation (12.40) does not obviously imply that there is a solution to a
drift inequality such as (V2): the conditional expectation is taken with respect to Yk ,
which is strictly smaller than FkΦ .
The condition that σz2 < 1 cannot be omitted in this analysis: indeed, we have that
if σ_z^2 ≥ 1, then
   E[Y_k^2] ≥ [σ_z^2]^k Y_0^2 + k σ_w^2 → ∞
as k increases, so that the chain is unstable in a mean square sense, although it may
still be bounded in probability.
It is well worth observing that this is one of the few models which we have encoun-
tered where obtaining a drift inequality of the form (V2) is much more difficult than
merely proving boundedness in probability. This is due to the fact that the dynamics
of this model are extremely nonlinear, and so a direct stability proof is difficult. By
exploiting equation (12.39) we essentially linearize a portion of the dynamics, which
makes the stability proof rather straightforward. However the identity (12.39) only
holds for a restricted class of initial conditions, so in general we are forced to tackle the
nonlinear equations directly.
12.6 Commentary
The key result Theorem 12.1.2 is taken from Foguel [121]. Versions of this result have
also appeared in papers by Beneš [23, 24] and Stettner [372] which consider processes
in continuous time. For more results on Feller chains the reader is referred to Krengel
[221], and the references cited therein.
For an elegant operator-theoretic proof of results related to Theorem 12.3.2, see
Lin [238] and Foguel [123]. The method of proof based upon the use of the operator
Ph = Uh Ih to obtain a σ-finite invariant measure is taken from Rosenblatt [338]. Neveu
in [295] promoted the use of the operators Uh , and proved the resolvent equation The-
orem 12.2.1 using direct manipulations of the operators. The kernel Ph is often called
the balayage operator associated with the function h (see Krengel [221] or Revuz [326]).
In the Supplement to Krengel's text by Brunel ([221], pp. 301–309) the recurrence structure of irreducible Markov chains is developed based upon these operators. This analysis and much of [326] exploit fully the resolvent equation, illustrating the power of this simple formula; because of our emphasis on ψ-irreducible chains and probabilistic methods, however, we do not address the resolvent equation further in this book.
Obviously, as with Theorem 12.1.2, Theorem 12.3.4 can be applied to an irreducible
Markov chain on countable space to prove positive recurrence. It is of some historical
interest to note that Foster’s original proof of the sufficiency of (V2) for positivity of such
chains is essentially that in Theorem 12.3.4. Rather than showing in any direct way that
(V2) gives an invariant measure, Foster was able to use the countable space analogue
of Theorem 12.1.2 (i) to deduce positivity from the “non-nullity” of a “compact” finite
set of states as in (12.21). We will discuss more general versions of this classification of
sets as positive or null further, but not until Chapter 18.
Observe that Theorem 12.3.4 only states that an invariant probability exists. Per-
haps surprisingly, it is not known whether the hypotheses of Theorem 12.3.4 imply that
the chain is bounded in probability when V is finite valued except for e-chains as in
Theorem 12.4.5.
The theory of e-chains is still being developed, although these processes have been
the subject of several papers over the past thirty years, most notably by Jamison and
Sine [175, 178, 358, 357, 356], Rosenblatt [337], Foguel [121] and the text by Krengel
[221]. In most of the e-chain literature, however, the state space is assumed compact
so that stability is immediate. The drift criterion for boundedness in probability on
average in Theorem 12.4.5 is new. The criterion Theorem 12.3.4 for the existence of an
invariant probability for a Feller chain was first shown in Tweedie [402].
The stability analysis of the linear state space model presented here is standard. For
an early treatment see Kalman and Bertram [192], while Caines [57] contains a modern
and complete development of discrete time linear systems. Snyders [364] treats linear
models with a continuous time parameter in a manner similar to the presentation in
this book. The bilinear model has been the subject of several papers: see for example
Feigin and Tweedie [111], or the discussion in Tong [388]. The stability of the adaptive
control model was first resolved in Meyn and Caines [270], and related stability results
were described in Solo [365]. The stability proof given here is new, and is far simpler
than any previous results.
Part III
CONVERGENCE
Chapter 13
Ergodicity
(i) The chain is positive Harris: that is, the unique invariant measure π is finite.
(ii) There exists some ν-small set C ∈ B^+(X) and some P^∞(C) > 0 such that as n → ∞, for all x ∈ C,
   P^n(x, C) → P^∞(C).   (13.1)
(iii) There exists some regular set in B^+(X): equivalently, there is a petite set C ∈ B(X) such that
   sup_{x∈C} E_x[τ_C] < ∞.   (13.2)
(iv) There exists some petite set C, some b < ∞ and a non-negative function V finite at some one x_0 ∈ X, satisfying
   ΔV(x) ≤ −1 + b I_C(x),  x ∈ X.   (13.3)
Any of these conditions implies that there is a unique invariant probability measure π and that for every initial condition x ∈ X,
   sup_{A∈B(X)} |P^n(x, A) − π(A)| → 0   (13.4)
as n → ∞.
Proof That π(X) < ∞ in (i) is equivalent to the finiteness of hitting times as in
(iii) and the existence of a mean drift test function in (iv) is merely a restatement of
the overview Theorem 11.0.1 in Chapter 11.
The fact that any of these positive recurrence conditions imply the uniform con-
vergence over all sets A from all starting points x as in (13.4) is of course the main
conclusion of this theorem, and is finally shown in Theorem 13.3.3.
That (ii) holds from (13.4) is trivial by dominated convergence. The cycle
is completed by the implication that (ii) implies (13.4), which is in Theorem 13.3.5.
The extension from convergence to summability provided the initial measures are
regular is given in Theorem 13.4.4. Conditions under which π itself is regular are also
in Section 13.4.2.
There are four ideas which should be borne in mind as we embark on this third part
of the book, especially when coming from a countable space background. The first two
involve the types of limit theorems we shall address; the third involves the method of
proof of these theorems; and the fourth involves the nomenclature we shall use.
Modes of convergence
The first is that we will be considering, in this and the next three chapters, convergence
of a chain in terms of its transition probabilities. Although it is important also to
consider convergence of a chain along its sample paths, leading to strong laws, or of
normalized variables leading to central limit theorems and associated results, we do not
turn to this until Chapter 17.
This is in contrast to the traditional approach in the countable state space case.
Typically, there, the search is for conditions under which there exist pointwise limits of
the form
lim |P n (x, y) − π(y)| = 0; (13.6)
n →∞
but the results we derive are related to the signed measure (P n − π), and so concern
not merely such pointwise or even setwise convergence, but a more global convergence
in terms of the total variation norm.
   ‖µ‖ := sup_{f : |f| ≤ 1} |µ(f)| = sup_{A∈B(X)} µ(A) − inf_{A∈B(X)} µ(A),   (13.7)
and the limit of interest is then
   lim_{n→∞} ‖P^n(x, · ) − π‖ = 2 lim_{n→∞} sup_A |P^n(x, A) − π(A)| = 0.   (13.8)
Obviously when (13.8) holds on a countable space, then (13.6) also holds and indeed
holds uniformly in the end point y. This move to the total variation norm, necessitated
by the typical lack of structure of pointwise transitions in the general state space, will
actually prove exceedingly fruitful rather than restrictive.
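On a finite state space the norm in question is just an ℓ1 distance, and its decay is easy to exhibit. The sketch below (an arbitrary 3 × 3 stochastic matrix, for illustration only) computes ‖P^n(x, ·) − π‖ and watches it vanish, as in (13.8).

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.3, 0.3, 0.4]])      # an illustrative ergodic chain

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

Pn = np.eye(3)
for n in range(1, 16):
    Pn = Pn @ P
    tv = np.abs(Pn[0, :] - pi).sum()   # ||P^n(x,.) - pi|| on a countable space
    if n % 5 == 0:
        print(n, tv)                   # decays to zero, illustrating (13.8)
```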
When the space is topological, it is also the case that total variation convergence
implies weak convergence of the measures in question.
This is clear since (see Chapter 12) the latter is defined as convergence of expec-
tations of functions which are not only bounded but also continuous. Hence the weak
convergence of P n to π as in Proposition 12.1.4 will be subsumed in results such as
(13.4) provided the chain is suitably irreducible and positive.
Thus, for example, asymptotic properties of T-chains will be much stronger than
those for arbitrary weak Feller chains even when a unique invariant measure exists for
the latter.
The same type of behavior, and the need to ensure that initial distributions are
appropriately “regular” in extended ways, will be a highly visible part of the work in
Chapters 14 and 15.
Ergodic chains
Finally, a word on the term ergodic. We will adopt this term for chains where the limit
in (13.6) or (13.8) holds as the time sequence n → ∞, rather than as n → ∞ through
some subsequence.
Unfortunately, we know that in complete generality Markov chains may be periodic,
in which case the limits in (13.6) or (13.8) can hold at best as we go through a periodic sequence {nd} as n → ∞. Thus by definition, ergodic chains will be aperiodic, and a
minor, sometimes annoying but always vital change to the structure of the results is
needed in the periodic case.
We will therefore give results, typically, for the aperiodic context and give the re-
quired modification for the periodic case following the main statement when this seems
worthwhile.
the use of total variation norm convergence results comes from an extension of the first-
entrance and last-exit decompositions of Section 8.2, together with the representation
of the invariant probability given in Theorem 10.2.1.
The first-entrance last-exit decomposition, for any states x, y, α ∈ X, is given by
   P^n(x, y) = {}_α P^n(x, y) + Σ_{j=1}^{n−1} [ Σ_{k=1}^{j} {}_α P^k(x, α) P^{j−k}(α, α) ] {}_α P^{n−j}(α, y),   (13.9)
where we have used the notation α to indicate that the specific state being used for the
decomposition is distinguished from the more generic states x, y which are the starting
and end points of the decomposition.
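The decomposition can be verified mechanically on a small matrix, which also fixes the meaning of the taboo probabilities. In the sketch below (an arbitrary 3-state chain with α taken to be state 0; illustrative only) the taboo terms are built by forbidding α at intermediate times, and both sides of (13.9) agree to machine precision.

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.25, 0.25]])
alpha, x, y, n = 0, 1, 2, 8

def taboo(start, steps):
    """u[k][z] = P_start(Phi_k = z, Phi_i != alpha for 1 <= i <= k - 1)."""
    u = [None, P[start, :].copy()]
    for _ in range(2, steps + 1):
        v = u[-1].copy()
        v[alpha] = 0.0                 # forbid alpha at the intermediate time
        u.append(v @ P)
    return u

ux, ua = taboo(x, n), taboo(alpha, n)
lhs = np.linalg.matrix_power(P, n)[x, y]
rhs = ux[n][y]                         # the taboo term aP^n(x, y)
for j in range(1, n):                  # j = last visit to alpha before n
    for k in range(1, j + 1):          # k = first visit to alpha
        rhs += ux[k][alpha] * np.linalg.matrix_power(P, j - k)[alpha, alpha] \
               * ua[n - j][y]
print(lhs, rhs)                        # the two sides of (13.9) coincide
```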
We will wish in what follows to concentrate on the time variable rather than a
particular starting point or end point, and it will prove particularly useful to have
notation that reflects this. Let us hold the reference state α fixed and introduce the three forms
   a_x(n) := P_x(τ_α = n),   (13.10)
   u(n) := P^n(α, α),   (13.11)
   t_y(n) := {}_α P^n(α, y).   (13.12)
The power of these forms becomes apparent when we link them to the representation
of the invariant measure given in (13.13). The next decomposition underlies all ergodic
theorems for countable space chains.
   |P^n(x, y) − π(y)| ≤ {}_α P^n(x, y)
                      + |a_x ∗ u ∗ t_y(n) − π(α) Σ_{j=1}^n t_y(j)|
                      + |π(α) Σ_{j=1}^n t_y(j) − π(y)|.   (13.18)
provided we have (as we do for a Harris recurrent chain) that for all x
   Σ_j a_x(j) = P_x(τ_α < ∞) = 1.   (13.21)
The convergence in (13.19) will be shown to hold for all states of an aperiodic positive
chain in the next section: we first motivate our need for it, and for related results in
renewal theory, by developing the ergodic structure of chains with “ergodic atoms”.
Ergodic atoms
If Φ is positive Harris, an atom α ∈ B^+(X) is called ergodic if it satisfies
   lim_{n→∞} |P^n(α, α) − π(α)| = 0.   (13.22)
In the positive Harris case note that an atom can be ergodic only if the chain is
aperiodic.
With this notation, and the prescription for analyzing ergodic behavior inherent in
Proposition 13.1.1, we can prove surprisingly quickly the following solidarity result.
Theorem 13.1.2. If Φ is a positive Harris chain on a countable space, and if there
exists an ergodic atom α, then for every initial state x
   ‖P^n(x, · ) − π‖ → 0,  n → ∞.   (13.23)
and so by (13.17) we have the total variation norm bounded by three terms:

‖P^n(x, ·) − π‖ ≤ Σ_y {}_αP^n(x, y) + Σ_y |a_x ∗ u − π(α)| ∗ t_y(n) + π(α) Σ_y Σ_{j=n+1}^{∞} t_y(j).   (13.24)
We need to show each of these goes to zero. From the representation (13.13) of π and
Harris positivity,
∞ > Σ_y π(y) = π(α) Σ_y Σ_{j=1}^{∞} t_y(j).   (13.25)
The third term in (13.24) is the tail sum in this representation and so we must have
π(α) Σ_y Σ_{j=n+1}^{∞} t_y(j) → 0,   n → ∞.   (13.26)
The first term in (13.24) also tends to zero, for we have the interpretation

Σ_y {}_αP^n(x, y) = P_x(τ_α ≥ n),   (13.27)

and this tends to zero as n → ∞ by (13.21).
This approach may be extended to give the Ergodic Theorem for a general space
chain when there is an ergodic atom in the state space. A first-entrance last-exit
decomposition will again give us an elegant proof in this case, and we prove such a
result in Section 13.2.3, from which basis we wish to prove the same type of ergodic
result for any positive Harris chain. To do this, we must of course prove that the atom α̌ for the split skeleton chain Φ̌^m, which we always have available, is an ergodic atom.
To show that atoms for aperiodic positive chains are indeed ergodic, which is crucial
to completing this argument, we need results from renewal theory. This is therefore
necessarily the subject of the next section.
Recall that u(n) := Σ_{j=0}^{∞} p^{∗j}(n) denotes the renewal function for n ≥ 0. Since p^{∗0} = δ_0 we have u(0) = 1; by convention we will set u(−1) = 0.
If we let Z(n) denote the indicator variables

Z(n) = 1 if S_j = n for some j ≥ 0, and Z(n) = 0 otherwise,

then we have

P_a(Z(n) = 1) = a ∗ u(n),
and thus the renewal function represents the probabilities of {Z(n) = 1} when there is
no delay, or equivalently when a = δ0 .
The coupling approach involves the study of two linked renewal processes with the
same increment distribution but different initial distributions, and, most critically, de-
fined on the same probability space.
To describe this concept we define two sets of mutually independent random variables {S_0, S_1, S_2, . . .} and {S'_0, S'_1, S'_2, . . .}, where each of the variables {S_1, S_2, . . .} and {S'_1, S'_2, . . .} are independent and identically distributed with distribution {p(j)}, but where the distributions of the independent variables S_0, S'_0 are a, b respectively.
The coupling time of the two renewal processes is defined as

T_{ab} := min{n ≥ 0 : Z_a(n) = Z_b(n) = 1},

where Z_a, Z_b are the indicator sequences of each renewal process.
is the first time that a renewal takes place simultaneously in both sequences, and from
that point onwards, because of the loss of memory at the renewal epoch, the renewal
processes are identical in distribution.
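The coupling construction is easy to simulate. The following sketch (an illustration under our own choice of increment law p, not part of the original text) generates two independent delayed renewal processes and records their first simultaneous renewal T_{ab}; its sample mean is finite, in line with the results of this section.

import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.0, 0.5, 0.3, 0.2])      # aperiodic increment law on {1,2,3}

def renewal_epochs(delay, horizon):
    # Set of renewal times S_0 = delay, S_1, S_2, ... up to `horizon`.
    times, t = set(), delay
    while t <= horizon:
        times.add(t)
        t += rng.choice(len(p), p=p)
    return times

def coupling_time(a_delay, b_delay, horizon=100_000):
    # T_ab: first n at which both independent processes renew simultaneously.
    common = renewal_epochs(a_delay, horizon) & renewal_epochs(b_delay, horizon)
    return min(common) if common else None

samples = [coupling_time(0, 1) for _ in range(2_000)]
samples = [t for t in samples if t is not None]
print(np.mean(samples))                 # finite mean coupling time E[T_ab]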
The key requirement to use this method is that this coupling time be almost surely finite. In this section we will show that if we have an aperiodic positive recurrent renewal process with finite mean

m_p := Σ_{j=0}^{∞} j p(j) < ∞,   (13.28)

then the coupling time T_{ab} is indeed almost surely finite for any proper initial distributions a, b.
Proof   Consider the linked forward recurrence time chain V^∗ defined by (10.19), corresponding to the two independent renewal sequences {S_n}, {S'_n}. Let τ_{1,1} := min(n : V_n^∗ = (1, 1)). Since the first coupling takes place at τ_{1,1} + 1, we have T_{ab} = τ_{1,1} + 1; and since the linked chain V^∗ is positive recurrent, for any initial distribution µ,

P_µ(τ_{1,1} < ∞) = 1,

as required.
Theorem 13.2.2. Suppose that a, b, p are proper distributions on Z_+, and that u is the renewal function corresponding to p. Then provided p is aperiodic with mean m_p < ∞,

|a ∗ u(n) − b ∗ u(n)| → 0,   n → ∞.   (13.31)

Proof   We have the coupling bound

|a ∗ u(n) − b ∗ u(n)| = |P_a(Z(n) = 1) − P_b(Z(n) = 1)| ≤ P(T_{ab} > n).
But from Proposition 13.2.1 we have that P(Tab > n) → 0 as n → ∞, and (13.31)
follows.
We will see in Section 18.1.1 that Theorem 13.2.2 holds even without the assumption
that mp < ∞. For the moment, however, we will concentrate on further aspects of
coupling when we are in the positive recurrent case.
The distribution e on Z_+ defined by

e(n) := m_p^{-1} Σ_{j=n+1}^{∞} p(j) = m_p^{-1}(1 − p ∗ 1(n)),   n ≥ 0,

has been shown in (10.16) to be the invariant probability measure for the forward recurrence time chain V^+ associated with the renewal sequence {S_n}. It also follows that the delayed renewal distribution corresponding to the initial distribution e is given for every n ≥ 0 by
P_e(Z(n) = 1) = e ∗ u(n)
 = m_p^{-1} (1 − p ∗ 1) ∗ u(n)
 = m_p^{-1} (1 − p ∗ 1) ∗ (Σ_{j=0}^{∞} p^{∗j})(n)
 = m_p^{-1} [1 + 1 ∗ (Σ_{j=1}^{∞} p^{∗j})(n) − p ∗ 1 ∗ (Σ_{j=0}^{∞} p^{∗j})(n)]
 = m_p^{-1}.   (13.35)
For this reason the distribution e is also called the equilibrium distribution of the renewal
process.
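The identity (13.35) can be checked exactly in a few lines. In the following sketch (ours; the increment and delay laws are arbitrary illustrative choices) the renewal function u is computed from the renewal equation u(n) = Σ_k p(k) u(n − k), and e ∗ u(n) = m_p^{-1} holds to machine precision for every n, while a generic delay a gives a ∗ u(n) → m_p^{-1} only in the limit, as asserted in Theorem 13.2.3 below.

import numpy as np

p = np.array([0.0, 0.5, 0.3, 0.2])       # increment law on {1,2,3}
m_p = (np.arange(len(p)) * p).sum()      # mean m_p = 1.7
N = 80

# Renewal function from the renewal equation u(n) = sum_k p(k) u(n - k), u(0) = 1.
u = np.zeros(N + 1)
u[0] = 1.0
for n in range(1, N + 1):
    u[n] = sum(p[k] * u[n - k] for k in range(1, min(n, len(p) - 1) + 1))

# Equilibrium delay e(n) = m_p^{-1} sum_{j > n} p(j).
e = np.zeros(N + 1)
e[: len(p)] = (1.0 - np.cumsum(p)) / m_p

# (13.35): e * u (n) = 1/m_p exactly, for every n.
for n in range(N + 1):
    assert abs(sum(e[k] * u[n - k] for k in range(n + 1)) - 1.0 / m_p) < 1e-12

# a * u (n) -> 1/m_p for an arbitrary proper delay a.
a = np.zeros(N + 1); a[0], a[2] = 0.3, 0.7
au_N = sum(a[k] * u[N - k] for k in range(N + 1))
print(au_N, 1.0 / m_p)                   # close for large N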
These considerations show that in the positive recurrent case, the key quantity we
considered for Markov chains in (13.22) has the representation
|u(n) − m_p^{-1}| = |P_{δ_0}(Z(n) = 1) − P_e(Z(n) = 1)|,   (13.36)
and in order to prove an asymptotic limiting result for an expression of this kind, we
must consider the probabilities that Z(n) = 1 from the initial distributions δ0 , e.
But we have essentially evaluated this already. We have
Theorem 13.2.3. Suppose that a, p are proper distributions on Z+ , and that u is the
renewal function corresponding to p. Then provided p is aperiodic and has a finite mean
m_p,

|a ∗ u(n) − m_p^{-1}| → 0,   n → ∞.   (13.37)
Proof The result follows from Theorem 13.2.2 by substituting the equilibrium
distribution e for b and using (13.35).
This has immediate application in the case where the renewal process is the return
time process to an accessible atom for a Markov chain.
Proposition 13.2.4. (i) If Φ is a positive recurrent aperiodic Markov chain, then
any atom α in B + (X) is ergodic.
(ii) If Φ is a positive recurrent aperiodic Markov chain on a countable space, then for
every initial state x
‖P^n(x, ·) − π‖ → 0,   n → ∞.   (13.38)
Proof We know from Proposition 10.2.2 that if Φ is positive recurrent then the
mean return time to any atom in B + (X) is finite. If the chain is aperiodic then (i)
follows directly from Theorem 13.2.3 and the definition (13.22).
The conclusion in (ii) then follows from (i) and Theorem 13.1.2.
It is worth stressing explicitly that this result depends on the classification of positive
chains in terms of finite mean return times to atoms: that is, in using renewal theory
it is the equivalence of positivity and regularity of the chain that is utilized.
The general state space analogue of the first-entrance last-exit decomposition (13.9) is, for any A ∈ B(X),

P^n(x, B) = {}_AP^n(x, B) + Σ_{j=1}^{n−1} Σ_{k=1}^{j} ∫_A ∫_A {}_AP^k(x, dv) P^{j−k}(v, dw) {}_AP^{n−j}(w, B).   (13.39)
If we suppose that there is an atom α and take A = α then these forms are somewhat
simplified: the decomposition (13.39) reduces to
P^n(x, B) = {}_αP^n(x, B) + Σ_{j=1}^{n−1} Σ_{k=1}^{j} {}_αP^k(x, α) P^{j−k}(α, α) {}_αP^{n−j}(α, B).   (13.40)
In the general state space case it is natural to consider convergence from an arbitrary
initial distribution λ. It is equally natural to consider convergence of the integrals
E_λ[f(Φ_n)] = ∫ λ(dx) ∫ P^n(x, dy) f(y)   (13.41)
for arbitrary non-negative functions f . We will use either the probabilistic or the
operator-theoretic version of this quantity (as given by the two sides of (13.41)) inter-
changeably, as seems most transparent, in what follows.
We explore convergence of Eλ [f (Φn )] for general (unbounded) f in detail in Chap-
ter 14. Here we concentrate on bounded f , in view of the definition (13.7) of the total
variation norm.
When α is an atom in B + (X), let us therefore extend the notation in (13.10)–(13.12)
to the forms
a_λ(n) := P_λ(τ_α = n),   (13.42)

t_f(n) := ∫ {}_αP^n(α, dy) f(y) = E_α[f(Φ_n)I{τ_α ≥ n}]:   (13.43)
these are well defined (although possibly infinite) for any non-negative function f on X
and any probability measure λ on B(X).
As in (13.14) and (13.15) we can use this terminology to write the first-entrance and
last-exit formulations as
∫ λ(dx) P^n(x, α) = a_λ ∗ u(n),   (13.44)

∫ P^n(α, dy) f(y) = u ∗ t_f(n).   (13.45)
The first-entrance last-exit decomposition (13.40) can similarly be formulated, for any
λ, f , as
∫ λ(dx) ∫ P^n(x, dw) f(w) = ∫ λ(dx) ∫ {}_αP^n(x, dw) f(w) + a_λ ∗ u ∗ t_f(n).   (13.46)
The general state space version of Proposition 13.1.1 provides the critical bounds needed
for our approach to ergodic theorems. Using the notation of (13.41) we have two bounds
which we shall refer to as Regenerative Decompositions.
Theorem 13.2.5. Suppose that Φ admits an accessible atom α and is positive Harris recurrent with invariant probability measure π. Then for any probability measure λ and f ≥ 0,

|∫ λ(dx) ∫ P^n(x, dw) f(w) − ∫ π(dw) f(w)|
 ≤ ∫ λ(dx) ∫ {}_αP^n(x, dw) f(w) + |a_λ ∗ u − π(α)| ∗ t_f(n) + π(α) Σ_{j=n+1}^{∞} t_f(j).   (13.48)

Now in the general state space case we have the representation for π given from (10.31) by

∫ π(dw) f(w) = π(α) Σ_{j=1}^{∞} t_f(j);   (13.50)

and so to prove ergodic theorems from (13.48) we must do three things:

(E1) control the third term in (13.48): by (13.50) this is the tail sum in the representation of π(f), and so it tends to zero provided π(f) < ∞;
(E2) control the first term in (13.48), which involves questions of the finiteness of the
hitting time distribution of τα when the chain begins with distribution λ; this is
automatically finite as required for a Harris recurrent chain, even without positive
recurrence, although for chains which are only recurrent it clearly needs care;
(E3) control the middle term in (13.48), which again involves finiteness of π to bound
its last element, but more crucially then involves only the ergodicity of the atom
α, regardless of λ: for we know from Lemma D.7.1 that if the atom is ergodic so
that (13.19) holds then also
lim_{n→∞} a_λ ∗ u(n) = π(α).   (13.51)
Thus recurrence, or rather Harris recurrence, will be used twice to give bounds: positive
recurrence gives one bound; and, centrally, the equivalence of positivity and regularity
ensures the atom is ergodic, exactly as in Theorem 13.2.3.
Bounded functions are the only ones relevant to total variation convergence. The
Regenerative Decomposition is however valid for all f ≥ 0. Bounds in this decom-
position then involve integrability of f with respect to π, and a non-trivial extension
of regularity to what will be called f -regularity. This will be held over to the next
chapter, and here we formalize the above steps and incorporate them with the splitting
technique, to prove the Aperiodic Ergodic Theorem.
Proof (i) Let us first assume that there is an accessible ergodic atom in the
space. The proof is virtually identical to that in the countable case. We have
‖∫ λ(dx) P^n(x, ·) − π‖ = sup_{|f|≤1} |∫ λ(dx) ∫ P^n(x, dw) f(w) − ∫ π(dw) f(w)|,
by Lemma D.7.1; here we use the fact that α is ergodic and, again, the representation that π(X) = π(α) Σ_{j=1}^{∞} t_1(j) < ∞.
We must finally control the first term. To do this, we need only note that, again
since |f | ≤ 1, we have
E_λ[f(Φ_n)I{τ_α ≥ n}] ≤ P_λ(τ_α ≥ n),   (13.56)
and this expression tends to zero by monotone convergence as n → ∞, since α is Harris
recurrent and Px (τα < ∞) = 1 for every x.
Notice explicitly that in (13.54)–(13.56) the bounds which tend to zero are indepen-
dent of the particular |f | ≤ 1, and so we have the required supremum norm convergence.
(ii) Now assume that Φ is strongly aperiodic. Consider the split chain Φ̌: we
know this is also strongly aperiodic from Proposition 5.5.6 (ii), and positive Harris
from Proposition 10.4.2. Thus from Proposition 13.2.4 the atom α̌ is ergodic. Now our
use of total variation norm convergence renders the transfer to the original chain easy.
Using the fact that the original chain is the marginal chain of the split chain, and that
π is the marginal measure of π̌, we have immediately
‖∫ λ(dx) P^n(x, ·) − π‖ = 2 sup_{A∈B(X)} |∫_X λ(dx) P^n(x, A) − π(A)|
 = 2 sup_{A∈B(X)} |∫_X̌ λ^∗(dx_i) P̌^n(x_i, A) − π̌(A)|
 ≤ 2 sup_{B̌∈B(X̌)} |∫_X̌ λ^∗(dx_i) P̌^n(x_i, B̌) − π̌(B̌)|
 = ‖∫ λ^∗(dx_i) P̌^n(x_i, ·) − π̌‖,   (13.57)
where the inequality follows since the first supremum is over sets in B(X̌) of the form
A0 ∪ A1 and the second is over all sets in B(X̌).
Applying the result (i) for chains with accessible atoms shows that the total variation
norm in (13.57) for the split chain tends to zero, so we are finished.
Proposition 13.3.2. If π is invariant for Φ, then for any initial distribution λ the total variation norm

‖∫ λ(dx) P^n(x, ·) − π‖

is non-increasing in n.
Proof   We have from the definition of total variation and the invariance of π that

‖∫ λ(dx) P^{n+1}(x, ·) − π‖ = sup_{|f|≤1} |∫ λ(dx) ∫ P^{n+1}(x, dy) f(y) − ∫ π(dy) f(y)|
 = sup_{|f|≤1} |∫ λ(dx) ∫ P^n(x, dw) ∫ P(w, dy) f(y) − ∫ π(dw) ∫ P(w, dy) f(y)|
 ≤ sup_{|f|≤1} |∫ λ(dx) ∫ P^n(x, dw) f(w) − ∫ π(dw) f(w)|,   (13.58)

since |P f| ≤ 1 whenever |f| ≤ 1.
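This monotonicity is easy to observe numerically. The following sketch (ours, with an arbitrary three-state kernel) verifies that ‖λP^n − π‖ is non-increasing in n.

import numpy as np

P = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3],
              [0.4, 0.4, 0.2]])
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))]); pi /= pi.sum()

lam = np.array([1.0, 0.0, 0.0])         # an arbitrary initial distribution
tv = []
for n in range(25):
    tv.append(np.abs(lam - pi).sum())   # total variation norm ||lam P^n - pi||
    lam = lam @ P
assert all(t2 <= t1 + 1e-12 for t1, t2 in zip(tv, tv[1:]))
print(tv[:5])                           # monotonically decreasing to zero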
Theorem 13.3.3. If Φ is positive Harris and aperiodic, then for every initial distri-
bution λ
‖∫ λ(dx) P^n(x, ·) − π‖ → 0,   n → ∞.   (13.59)
Proof Since for some m the skeleton Φm is strongly aperiodic, and also positive
Harris by Theorem 10.4.5, we know that
‖∫ λ(dx) P^{nm}(x, ·) − π‖ → 0,   n → ∞.   (13.60)
The result for P n then follows immediately from the monotonicity in (13.58).
As we mentioned in the discussion of the periodic behavior of Markov chains, the
results are not quite as simple to state in the periodic as in the aperiodic case; but they
can be easily proved once the aperiodic case is understood.
The asymptotic behavior of positive recurrent chains which may not be Harris is
also easy to state now that we have analyzed positive Harris chains.
The final formulation of these results for quite arbitrary positive recurrent chains is
Theorem 13.3.4. (i) If Φ is positive Harris with period d ≥ 1, then for every initial
distribution λ
d^{-1} Σ_{r=0}^{d−1} ‖∫ λ(dx) P^{nd+r}(x, ·) − π‖ → 0,   n → ∞.   (13.61)
(ii) If Φ is positive recurrent with period d ≥ 1, then there is a π-null set N such that
for every initial distribution λ with λ(N ) = 0
d^{-1} Σ_{r=0}^{d−1} ‖∫ λ(dx) P^{nd+r}(x, ·) − π‖ → 0,   n → ∞.   (13.62)
Proof The result (i) is straightforward to check from the existence of cycles in
Section 5.4.3, together with the fact that the chain restricted to each cyclic set is
aperiodic and positive Harris on the d-skeleton. We then have (ii) as a direct corollary
of the decomposition of Theorem 9.1.5.
Finally, let us complete the circle by showing the last step in the equivalences in The-
orem 13.0.1. Notice that (13.63) is ensured by (13.1), using the Dominated Convergence
Theorem, so that our next result is in fact marginally stronger than the corresponding
statement of the Aperiodic Ergodic Theorem.
Theorem 13.3.5. Let Φ be ψ-irreducible and aperiodic, and suppose that there exists some ν-small set C ∈ B^+(X) and some constant P^∞(C) > 0 such that as n → ∞

∫_C ν_C(dx) (P^n(x, C) − P^∞(C)) → 0.   (13.63)

Then Φ is positive recurrent, and the conclusions of the Aperiodic Ergodic Theorem hold with P^∞(C) = π(C).
Proof Using the Nummelin splitting via the set C for the m-skeleton, we find that
(13.63) taken through the sublattice nm is equivalent to
Thus the atom α̌ is ergodic and the results of Section 13.3 all hold, with P ∞ (C) = π(C).
13.4 Sums of transition probabilities

Proposition 13.4.1. Suppose that p is an aperiodic distribution on Z_+ with finite mean m_p, and that a, b are initial distributions each with finite mean. Then

Σ_{n=0}^{∞} |a ∗ u(n) − b ∗ u(n)| < ∞.

Proof   Exactly as in the coupling bound used for Theorem 13.2.2,

Σ_{n=0}^{∞} |a ∗ u(n) − b ∗ u(n)| ≤ Σ_{n=0}^{∞} P(T_{ab} > n) = E[T_{ab}].   (13.67)
Now we know from Section 10.3.1 that when p is aperiodic and mp < ∞, the linked
forward recurrence time chain V ∗ is positive recurrent with invariant probability
e∗ (i, j) = e(i)e(j).
Hence from any state (i, j) with e^∗(i, j) > 0 we have as in Proposition 11.1.1

E_{i,j}[τ_{1,1}] < ∞.   (13.68)
Let us consider specifically the initial distributions δ0 and δ1 : these correspond to the
undelayed renewal process and the process delayed by exactly one time unit respectively.
For this choice of initial distributions we have for n > 0

δ_0 ∗ u(n) = u(n),   δ_1 ∗ u(n) = u(n − 1).
Now E[T01 ] ≤ E1,2 [τ1,1 ]+1 and it is certainly the case that e∗ (1, 2) > 0. So from (13.30),
(13.67) and (13.68)
Var(u) := Σ_{n=0}^{∞} |u(n) − u(n − 1)| ≤ E_{1,2}[τ_{1,1}] + 1 < ∞.   (13.69)
We now need to extend the result to more general initial distributions with finite mean.
By the triangle inequality it suffices to consider only one arbitrary initial distribution a
and to take the other as δ0 . To bound the resulting quantity |a ∗ u (n) − u(n)| we write
the upper tails of a for k ≥ 0 as

ā(k) := Σ_{j=k+1}^{∞} a(j) = 1 − Σ_{j=0}^{k} a(j),

and put

w(n) := |u(n) − u(n − 1)|,   n ≥ 0.

Then we have

ā ∗ w(n) ≥ |Σ_{j=0}^{n} [1 − Σ_{k=0}^{j} a(k)][u(n − j) − u(n − j − 1)]|
 = |Σ_{j=0}^{n} [u(n − j) − u(n − j − 1)] − Σ_{j=0}^{n} Σ_{k=0}^{j} a(k)[u(n − j) − u(n − j − 1)]|
 = |u(n) − Σ_{k=0}^{n} a(k) Σ_{j=k}^{n} [u(n − j) − u(n − j − 1)]|
 = |u(n) − Σ_{k=0}^{n} a(k) u(n − k)|   (13.70)
 = |u(n) − a ∗ u(n)|,

so that

Σ_n |u(n) − a ∗ u(n)| ≤ Σ_n ā ∗ w(n) = [Σ_n ā(n)][Σ_n w(n)].   (13.71)

But by assumption the mean m_a = Σ_n ā(n) is finite, and (13.69) shows that the sequence w(n) is also summable; and so we have

Σ_n |u(n) − a ∗ u(n)| ≤ m_a Var(u) < ∞   (13.72)

as required.
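The bound (13.72) is straightforward to test numerically. In this sketch (ours; the increment and delay laws are illustrative choices) the truncated sums show Σ_n |u(n) − a ∗ u(n)| sitting comfortably below m_a Var(u).

import numpy as np

p = np.array([0.0, 0.5, 0.3, 0.2])          # aperiodic increment law
N = 600

u = np.zeros(N + 1); u[0] = 1.0
for n in range(1, N + 1):
    u[n] = sum(p[k] * u[n - k] for k in range(1, min(n, len(p) - 1) + 1))

# Var(u) = sum_n |u(n) - u(n-1)| with the convention u(-1) = 0.
var_u = abs(u[0]) + np.abs(np.diff(u)).sum()

a = np.array([0.2, 0.3, 0.5])               # delay law with mean m_a
m_a = (np.arange(len(a)) * a).sum()
au = np.convolve(a, u)[: N + 1]             # (a * u)(n), n = 0..N

print(np.abs(u - au).sum(), m_a * var_u)    # the bound (13.72) holds comfortably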
It is obviously of considerable interest to know under what conditions we have
Σ_n |a ∗ u(n) − m_p^{-1}| < ∞;   (13.73)
that is, when this result holds with the equilibrium measure as one of the initial mea-
sures.
Using Proposition 13.4.1 we know that this will occur if the equilibrium distribution
e has a finite mean; and since we know the exact structure of e it is obvious that me < ∞
if and only if
s_p := Σ_n n^2 p(n) < ∞.

Since in this case

m_e = [s_p − m_p]/[2m_p],

we have from Proposition 13.4.1, and in particular the bound (13.71), the following pleasing corollary:
are finite. A result such as this requires regularity of the initial states x, y: recall from Chapter 11 that a probability measure µ on B(X) is called regular if

E_µ[τ_B] < ∞,   B ∈ B^+(X).
We will again follow the route of first considering chains with an atom, then translating
the results to strongly aperiodic and thence to general chains.
Theorem 13.4.3. Suppose Φ is an aperiodic positive Harris chain and suppose that
the chain admits an atom α ∈ B+ (X). Then for any regular initial distributions λ, µ,
Σ_{n=1}^{∞} ∫∫ λ(dx) µ(dy) ‖P^n(x, ·) − P^n(y, ·)‖ < ∞;   (13.75)
To bound this term note that Σ_{n=1}^{∞} {}_αP^n(α, X) = E_α[τ_α] < ∞, since every accessible atom is regular from Theorems 11.1.4 and 11.1.2; and so it remains only to prove that

Σ_{n=1}^{∞} ∫ λ(dx) |a_x ∗ u(n) − u(n)| < ∞.   (13.80)
Proof Consider the strongly aperiodic case. The theorem is valid for the split
chain, since the split measures λ∗ , µ∗ are regular for Φ̌: this follows from the charac-
terization in Theorem 11.3.12.
Since the result is a total variation result it remains valid when restricted to the
original chain, as in (13.57).
In the arbitrary aperiodic case we can apply Proposition 13.3.2 to move to a skeleton
chain, as in the proof of Theorem 13.2.5.
The most interesting special case of this result is given in the following theorem.
Theorem 13.4.5. Suppose Φ is an aperiodic positive Harris chain and that α is an
accessible atom. If
E_α[τ_α^2] < ∞,   (13.82)
then for any regular initial distribution λ
Σ_{n=1}^{∞} ‖λP^n − π‖ < ∞.   (13.83)
Proof In the case where there is an atom α in the space, we have as in Proposi-
tion 13.4.2 that π is a regular measure when the second-order moment (13.82) is finite,
and the result is then a consequence of Theorem 13.4.4.
13.5 Commentary*
It is hard to know where to start in describing contributions to these theorems. The
countable chain case has an immaculate pedigree: Kolmogorov [215] first proved this
result, and Feller [114] and Chung [71] give refined approaches to the single-state version
(13.6), essentially through analytic proofs of the lattice renewal theorem.
The general state space results in the positive recurrent case are largely due to
Harris [155] and to Orey [308]. Their results and related material, including a null
recurrent version in Section 18.1 below, are all discussed in a most readable way in
Orey’s monograph [309]. Prior to the development of the splitting technique, proofs
utilized the concept of the tail σ-field of the chain, which we have not discussed so far,
and will only touch on in Chapter 17.
The coupling proofs are much more recent, although they are usually dated to
Doeblin [94]. Pitman [317] first exploited the positive recurrent coupling in the way
we give it here, and his use of the result in Proposition 13.4.1 was even then new, as
was Theorem 13.4.4.
Our presentation of this material has relied heavily on Nummelin [303], and further
related results can be found in his Chapter 6. In particular, for results of this kind in a more general setting, where the renewal sequence is allowed to vary from the probabilistic structure with Σ_n p(n) = 1 which we have used, the reader is referred to Chapters 4 and 6 of [303].
It is interesting to note that the first-entrance last-exit decomposition, which shows
so clearly the role of the single ergodic atom, is a relative late-comer on the scene.
Although probably used elsewhere, it surfaces in the form given here in Nummelin [301]
and Nummelin and Tweedie [307], and appears to be less than well known even in the
countable state space case. Certainly, the proof of ergodicity is much simplified by using
the Regenerative Decomposition.
We should note, for the reader who is yet again trying to keep stability nomenclature
straight, that even the “ergodicity” terminology we use here is not quite standard: for
example, Chung [71] uses the word “ergodic” to describe certain ratio limit theorems
rather than the simple limit theorem of (13.8). We do not treat ratio limit theorems in
this book, except in passing in Chapter 17: it is a notable omission, but one dictated by
the lack of interesting examples in our areas of application. Hence no confusion should
arise, and our ergodic chains certainly coincide with those of Feller [114], Nummelin
[303] and Revuz [326]. The latter two books also have excellent treatments of ratio
limit theorems.
We have no examples in this chapter. This is deliberate. We have shown in Chap-
ter 11 how to classify specific models as positive recurrent using drift conditions: we
can say little else here other than that we now know that such models converge in the
relatively strong total variation norm to their stationary distributions. Over the course
of the next three chapters, we will however show that other much stronger ergodic
properties hold under other more restrictive drift conditions; and most of the models
in which we have been interested will fall into these more strongly stable categories.
Commentary for the second edition: We wrote in Section 13.2 that we will re-
grettably do far less than justice to the full power of renewal and regenerative processes,
or to the coupling method itself. It is true that the proof of ergodicity in this chapter
and the refinements that follow can be streamlined by using the split chain machinery
more fully. In particular, rather than prove a renewal theorem such as (13.31) and
then use this to prove an ergodic theorem such as Proposition 13.2.4, it is far simpler
to use coupling to prove the ergodic theorem directly as in [127, 128]. See also the
aforementioned book by Lindvall on the coupling method [239].
Chapter 14

f-Ergodicity and f-regularity

For a measurable function f ≥ 1 on X, the f-norm of a signed measure ν is defined as

‖ν‖_f := sup_{g:|g|≤f} |ν(g)|.

The main result of this chapter is the following.

Theorem 14.0.1 (f-Norm Ergodic Theorem). Suppose that the chain Φ is ψ-irreducible and aperiodic, and let f ≥ 1 be a measurable function on X. Then the following conditions are equivalent:

(i) The chain is positive recurrent with invariant probability measure π and π(f) := ∫ π(dx) f(x) < ∞.

(ii) There exists some petite set C ∈ B(X) such that

sup_{x∈C} E_x[Σ_{k=0}^{τ_C−1} f(Φ_k)] < ∞.   (14.2)
(iii) There exists some petite set C and some extended-valued non-negative function V
satisfying V (x0 ) < ∞ for some x0 ∈ X, and
∆V (x) ≤ −f (x) + bIC (x), x ∈ X. (14.3)
Any of these three conditions imply that the set SV = {x : V (x) < ∞} is absorbing
and full, where V is any solution to (14.3) satisfying the conditions of (iii), and any
sublevel set of V satisfies (14.2); and for any x ∈ SV ,
‖P^n(x, ·) − π‖_f → 0   (14.4)

as n → ∞. Moreover, if π(V) < ∞, then there exists a finite constant B_f such that for all x ∈ S_V,

Σ_{n=0}^{∞} ‖P^n(x, ·) − π‖_f ≤ B_f (V(x) + 1).   (14.5)
Proof The equivalence of (i) and (ii) follows from Theorem 14.1.1 and Theo-
rem 14.2.11. The equivalence of (ii) and (iii) is in Theorems 14.2.3 and 14.2.4, and the
fact that sublevel sets of V are “self-regular” as in (14.2) is shown in Theorem 14.2.3.
The limit theorems are then contained in Theorems 14.3.3, 14.3.4 and 14.3.5.
Much of this chapter is devoted to proving this result, and related f -regularity prop-
erties which follow from (14.2), and the pattern is not dissimilar to that in the previous
chapter: indeed, those ergodicity results, and the equivalences in Theorem 13.0.1, can
be viewed as special cases of the general f results we now develop.
The f -norm limit (14.4) obviously implies that the simpler limit (14.1) also holds.
In fact, if g is any function satisfying |g| ≤ c(f + 1) for some c < ∞ then E_x[g(Φ_k)] → ∫ g dπ
for states x with V (x) < ∞, for V satisfying (14.3). We formalize the behavior we will
analyze in
f -Ergodicity
We shall say that the Markov chain Φ is f-ergodic if f ≥ 1 and, for every initial state x,

lim_{k→∞} ‖P^k(x, ·) − π‖_f = 0.
The f -Norm Ergodic Theorem states that if any one of the equivalent conditions of
the Aperiodic Ergodic Theorem holds then the simple additional condition that π(f ) is
finite is enough to ensure that a full absorbing set exists on which the chain is f -ergodic.
Typically the way in which finiteness of π(f ) would be established in an application is
through finding a test function V satisfying (14.3): and if, as will typically happen, V
is finite everywhere then it follows that the chain is f -ergodic without restriction, since
then SV = X.
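As a small numerical illustration (ours, not from the text) of f-norm convergence: for a reflected walk on {0, . . . , 20} with downward drift, the norm ‖P^n(x, ·) − π‖_f with f(y) = 1 + y can be computed directly and decays to zero. On a countable space the supremum in the definition of the f-norm is attained, so that ‖µ‖_f = Σ_y f(y)|µ(y)|.

import numpy as np

# A reflected walk on {0,...,20} with downward drift, started far from the bulk of pi.
K = 20
P = np.zeros((K + 1, K + 1))
for i in range(K + 1):
    P[i, max(i - 1, 0)] += 0.6
    P[i, min(i + 1, K)] += 0.4

eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))]); pi /= pi.sum()

f = 1.0 + np.arange(K + 1)        # f(y) = 1 + y, growing with the state
row = np.eye(K + 1)[K]
for n in range(61):
    if n % 10 == 0:
        # On a countable space ||mu||_f = sum_y f(y)|mu(y)|.
        print(n, (f * np.abs(row - pi)).sum())
    row = row @ P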
and thus the finiteness of this expectation will guarantee convergence of the third term
in (14.6), as it did in the case of the ergodic theorems in Chapter 13. Also as in
Chapter 13, the central term in (14.6) is controlled by the convergence of the renewal
sequence u regardless of f , provided the expression in (14.7) is finite.
Thus it is only the first term in (14.6) that requires a condition other than ergodicity
and finiteness of (14.7). Somewhat surprisingly, for unbounded f this is a much more
troublesome term to control than for bounded f , when it is a simple consequence of
recurrence that it tends to zero. This first term can be expressed alternatively as

∫ λ(dx) ∫ {}_αP^n(x, dw) f(w) = E_λ[f(Φ_n)I(τ_α ≥ n)].   (14.8)
This is similar in form to (14.7), and if (14.9) is finite, then we have the desired con-
clusion that (14.8) does tend to zero. In fact, it is only the sum of these terms that
appears tractable, and for this reason it is in some ways more natural to consider the
summed form (14.5) rather than simple f -norm convergence.
Given this motivation to require finiteness of (14.7) and (14.9), we introduce the
concept of f -regularity which strengthens our definition of ordinary regularity.
f -Regularity
A set C ∈ B(X) is called f -regular, where f : X → [1, ∞) is a measurable
function, if for each B ∈ B+ (X),
sup_{x∈C} E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] < ∞.

A probability measure λ on B(X) is called f-regular if for each B ∈ B^+(X),

E_λ[Σ_{k=0}^{τ_B−1} f(Φ_k)] < ∞.
From this definition, an f-regular state, seen as a singleton set, is a state x for which

E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] < ∞,   B ∈ B^+(X).
As with regularity, this definition of f -regularity appears initially to be stronger
than required since it involves all sets in B + (X); but we will show this to be again
illusory.
A first consequence of f-regularity, and indeed of the weaker “self-f-regular” form in (14.2), is the following.

Proposition 14.1.1. Suppose that Φ is recurrent and that C ∈ B(X) satisfies

sup_{x∈C} E_x[Σ_{n=0}^{τ_C−1} f(Φ_n)] =: M_C < ∞.   (14.10)

Then π(f) < ∞.
Proof First of all, observe that under (14.10) the set C is Harris recurrent and
hence C ∈ B + (X) by Proposition 9.1.1. The invariant measure π then satisfies, from
Theorem 10.4.9,

π(f) = ∫_C π(dy) E_y[Σ_{n=0}^{τ_C−1} f(Φ_n)].
If C satisfies (14.10) then the expectation is uniformly bounded on C itself, so that
π(f ) ≤ π(C)MC < ∞.
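The representation used in this proof can be verified numerically on a finite state space, where the inner expectations solve a linear system. The following sketch (ours; the kernel, f, and the set C are arbitrary choices) checks the two sides of the formula from Theorem 10.4.9.

import numpy as np

P = np.array([[0.2, 0.5, 0.2, 0.1],
              [0.6, 0.1, 0.2, 0.1],
              [0.3, 0.3, 0.2, 0.2],
              [0.1, 0.4, 0.4, 0.1]])
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))]); pi /= pi.sum()

f = np.array([1.0, 2.0, 5.0, 3.0])      # any f >= 1
C, D = [0, 1], [2, 3]                   # the set C and its complement

# h(z) = E_z[ sum_{n=0}^{sigma_C - 1} f(Phi_n) ], zero on C; on D it solves
# h = f + P h, i.e. h_D = (I - P_DD)^{-1} f_D.
h = np.zeros(4)
h[D] = np.linalg.solve(np.eye(len(D)) - P[np.ix_(D, D)], f[D])

# E_y[ sum_{n=0}^{tau_C - 1} f(Phi_n) ] = f(y) + (P h)(y) for y in C.
g = f + P @ h
print(sum(pi[y] * g[y] for y in C), pi @ f)   # the two sides agree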
Although f -regularity is a requirement on the hitting times of all sets, when the
chain admits an atom it reduces to a requirement on the hitting times of the atom as
was the case with regularity.
Proposition 14.1.2. Suppose Φ is positive recurrent with π(f ) < ∞, and that an atom
α ∈ B+ (X) exists.
(i) Any set C ∈ B(X) is f-regular if and only if

sup_{x∈C} E_x[Σ_{k=0}^{σ_α} f(Φ_k)] < ∞.
(ii) There exists an increasing sequence of sets Sf (n) where each Sf (n) is f -regular
and the set Sf = ∪Sf (n) is full and absorbing.
When π(f) < ∞, by Theorem 11.3.5 the bound P G_α(x, f) ≤ G_α(x, f) + c holds for the constant c = E_α[Σ_{k=1}^{τ_α} f(Φ_k)] = π(f)/π(α) < ∞, which shows that the set {x : G_α(x, f) < ∞} is absorbing, and hence by Proposition 4.2.3 this set is full.
To prove (i), let B be any sublevel set of the function Gα (x, f ) with π(B) > 0 and
apply the bound
G_α(x, f) ≤ E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] + sup_{y∈B} E_y[Σ_{k=0}^{σ_α} f(Φ_k)].
This shows that Gα (x, f ) is bounded on C if C is f -regular, and proves the “only if”
part of (i).
We have from Theorem 10.4.9 that for any B ∈ B^+(X),

∞ > ∫_B π(dx) E_x[Σ_{k=0}^{τ_B} f(Φ_k)]
 ≥ ∫_B π(dx) E_x[I(σ_α < τ_B) Σ_{k=σ_α+1}^{τ_B} f(Φ_k)]
 = ∫_B π(dx) P_x(σ_α < τ_B) E_α[Σ_{k=1}^{τ_B} f(Φ_k)],
where to obtain the last equality we have conditioned at time σα and used the strong
Markov property.
Since α ∈ B^+(X) we have that

π(α) = ∫_B π(dx) E_x[Σ_{k=0}^{τ_B−1} I(Φ_k ∈ α)] > 0,

which shows that ∫_B π(dx) P_x(σ_α < τ_B) > 0. Hence from the previous bounds, we have E_α[Σ_{k=1}^{τ_B} f(Φ_k)] < ∞ for B ∈ B^+(X).
Using the bound τ_B ≤ σ_α + θ^{σ_α} τ_B, we have for arbitrary x ∈ X,

E_x[Σ_{k=0}^{τ_B} f(Φ_k)] ≤ E_x[Σ_{k=0}^{σ_α} f(Φ_k)] + E_α[Σ_{k=1}^{τ_B} f(Φ_k)].   (14.12)
Proof   From Proposition 14.1.2 (ii), the set of f-regular states S_f is absorbing and full when π(f) < ∞. If we can prove ‖P^k(x, ·) − π‖_f → 0 for x ∈ S_f, this will establish both (i) and (ii).

But this f-norm convergence follows from (14.6), where the first term tends to zero since x is f-regular, so that E_x[Σ_{n=1}^{τ_α} f(Φ_n)] < ∞; the third term tends to zero since Σ_{j=1}^{∞} t_f(j) = E_α[Σ_{n=1}^{τ_α} f(Φ_n)] = π(f)/π(α) < ∞; and the central term converges to zero by Lemma D.7.1 and the fact that α is an ergodic atom.
To prove the result in (iii), we use the same method of proof as for the ergodic case.
By the triangle inequality it suffices to assume that one of the initial distributions is δα .
We again use the first form of the Regenerative Decomposition Theorem to see that for any |g| ≤ f, x ∈ X, the sum

Σ_{n=1}^{∞} ∫ λ(dx) |P^n(x, g) − P^n(α, g)|

is bounded by the sum of the two terms

Σ_{n=1}^{∞} ∫ λ(dx) {}_αP^n(x, f),

[Σ_{n=1}^{∞} ∫ λ(dx) |a_x ∗ u(n) − u(n)|][Σ_{n=1}^{∞} {}_αP^n(α, f)].   (14.15)
The first of these is again finite since we have assumed λ to be f -regular; and in the
second, the right hand term is similarly finite since π(f ) < ∞, whilst the left hand term
is independent of f , and since λ is regular (given f ≥ 1), is bounded by Eλ [τα ]Var (u),
using (13.72).
Since for some finite M,

E_x[τ_α] ≤ E_x[Σ_{n=1}^{τ_α} f(Φ_n)] ≤ M G_α(x, f),
‖P^k(x, ·) − π‖_f → 0,   k → ∞,

and

Σ_{n=1}^{∞} ‖P^n(x, ·) − P^n(y, ·)‖_f < ∞.
Somewhat surprisingly, perhaps, this recipe does not work in a trivially easy way.
The most difficult step in this approach is that when we go to a split chain it is necessary
to consider an m-skeleton, but we do not yet know if the skeletons of an f -regular chain
are also f -regular. Such is indeed the case and we will prove this key result in the next
section, by exploiting drift criteria.
This may seem to be a much greater effort than we needed for the Aperiodic Ergodic
Theorem: but it should be noted that we devoted all of Chapter 11 to the equivalence
of regularity and drift conditions in the case of f ≡ 1, and the results here actually
require rather less effort. In fact, much of the work in this chapter is based on the
results already established in Chapter 11, and the duality between drift and regularity
established there will serve us well in this more complex case.
Lemma 14.2.1. Suppose that Φ is ψ-irreducible. If (14.16) holds for a positive function
V which is finite at some x0 ∈ X, then the set Sf := {x ∈ X : V (x) < ∞} is absorbing
and full.
The power of (V3) largely comes from the Comparison Theorem 14.2.2: if PV ≤ V − f + s for non-negative functions V, f, s, then for each x ∈ X, N ∈ Z_+, and any stopping time τ,

Σ_{k=0}^{N} E_x[f(Φ_k)] ≤ V(x) + Σ_{k=0}^{N} E_x[s(Φ_k)],

E_x[Σ_{k=0}^{τ−1} f(Φ_k)] ≤ V(x) + E_x[Σ_{k=0}^{τ−1} s(Φ_k)].
The first inequality in Theorem 14.2.2 bounds the mean value of f (Φk ), but says
nothing about the convergence of the mean value. We will see that the second bound
is in fact crucial for obtaining f -regularity for the chain, and we turn to this now.
In linking the drift condition (V3) with f -regularity we will consider the extended-
real-valued function G_C(x, f) defined in (11.21) as

G_C(x, f) := E_x[Σ_{k=0}^{σ_C} f(Φ_k)].   (14.18)
Theorem 14.2.3.

(i) If (V3) holds for a petite set C, then for any B ∈ B^+(X) there exists c(B) < ∞ such that

E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] ≤ V(x) + c(B).
(ii) If there exists one f -regular set C ∈ B+ (X), then C is petite and the function
V (x) = GC (x, f ) satisfies (V3) and is bounded on A for any f -regular set A.
Proof   (i) Suppose that (V3) holds, with C a ψ_a-petite set. By the Comparison Theorem 14.2.2 and Lemma 11.3.10 we have the bound

E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] ≤ V(x) + b E_x[Σ_{k=0}^{τ_B−1} I_C(Φ_k)]
 ≤ V(x) + b E_x[Σ_{k=0}^{τ_B−1} ψ_a(B)^{-1} K_a(Φ_k, B)]
 = V(x) + b ψ_a(B)^{-1} Σ_{i=0}^{∞} a_i E_x[Σ_{k=0}^{τ_B−1} I_B(Φ_{k+i})]
 ≤ V(x) + b ψ_a(B)^{-1} Σ_{i=0}^{∞} i a_i.

Since we can choose a so that m_a = Σ_{i=0}^{∞} i a_i < ∞ from Proposition 5.5.6, the result follows with c(B) = b ψ_a(B)^{-1} m_a. We then have

sup_{x∈A} E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] ≤ sup_{x∈A} V(x) + c(B),

so that A is f-regular whenever V is bounded on A.
Proof To see that (i) implies (ii), suppose that C is petite and satisfies (14.19).
By Theorem 11.3.5 we may find a constant b < ∞ such that (V3) holds for GC (x, f ).
It follows from Theorem 14.2.3 that C is f -regular.
The set C is Harris recurrent under the conditions of (i), and hence lies in B + (X)
by Proposition 9.1.1.
Conversely, if C is f-regular then it is also petite from Proposition 11.3.8, and if C ∈ B^+(X) then sup_{x∈C} E_x[Σ_{k=0}^{τ_C−1} f(Φ_k)] < ∞ by the definition of f-regularity.
As an easy corollary to Theorem 14.2.3 we obtain the following generalization of
Proposition 14.1.2.
Theorem 14.2.5. If there exists an f -regular set C ∈ B+ (X), then there exists an
increasing sequence {Sf (n) : n ∈ Z+ } of f -regular sets whose union is full. Hence there
is a decomposition
X = Sf ∪ N (14.20)
where the set Sf is full and absorbing and Φ restricted to Sf is f -regular.
Proof From Theorem 14.2.3 (i) we see that if (V3) holds for a finite-valued V then
each sublevel set of V is f -regular. This establishes f -regularity of Φ.
Conversely, if Φ is f -regular then it follows that an f -regular set C ∈ B + (X) ex-
ists. The function V (x) = GC (x, f ) is everywhere finite and satisfies (V3), by Theo-
rem 14.2.3 (ii).
As a corollary to Theorem 14.2.6 we obtain a final characterization of f -regularity
of Φ, this time in terms of petite sets:
Theorem 14.2.7. Suppose that Φ is ψ-irreducible. Then the chain is f-regular if and only if there exists a petite set C such that the expectation

E_x[Σ_{k=0}^{τ_C−1} f(Φ_k)]

is finite for each x ∈ X, and bounded on C.
This is essentially of the same form as (14.16), and provides an approach to f -regularity
for the m-skeleton which will give us the desired equivalence between f -regularity for
Φ and its skeletons.
To apply Theorem 14.2.3 and (14.21) to obtain an equivalence between f -properties
of Φ and its skeletons we must replace the function Σ_{i=0}^{m−1} P^i I_C with the indicator function of a petite set. The following result shows that this is possible whenever C is petite and the chain is aperiodic.

Let us write for any positive function g on X,

g^{(m)} := Σ_{i=0}^{m−1} P^i g.   (14.22)
Lemma 14.2.8. If Φ is aperiodic and if C ∈ B(X) is a petite set, then for any ε > 0 and m ≥ 1 there exists a petite set C_ε such that

I_C^{(m)} ≤ m I_{C_ε} + ε.
Proof   Since Φ is aperiodic, it follows from the definition of the period given in (5.40) and the fact that petite sets are small, proven in Proposition 5.5.7, that for a non-trivial measure ν and some k ∈ Z_+, we have the simultaneous bound

P^{km−i}(x, B) ≥ I_C(x) ν(B),   x ∈ X, B ∈ B(X), 0 ≤ i ≤ m − 1.

Hence we also have

P^{km}(x, B) ≥ P^i I_C(x) ν(B),   x ∈ X, B ∈ B(X), 0 ≤ i ≤ m − 1,

which shows that

P^{km}(x, ·) ≥ I_C^{(m)}(x) m^{-1} ν.

The set C_ε := {x : I_C^{(m)}(x) ≥ ε} is therefore ν_k-small for the m-skeleton, where ν_k = ε m^{-1} ν, whenever this set is non-empty. Moreover, C ⊂ C_ε for all ε < 1.

Since I_C^{(m)} ≤ m everywhere, and since I_C^{(m)}(x) < ε for x ∈ C_ε^c, we have the bound

I_C^{(m)} ≤ m I_{C_ε} + ε.
We can now put these pieces together and prove the desired solidarity for Φ and its
skeletons.
Theorem 14.2.9. Suppose that Φ is ψ-irreducible and aperiodic. Then C ∈ B+ (X) is
f -regular if and only if it is f (m ) -regular for any one, and then every, m-skeleton chain.
Proof   If C is f^{(m)}-regular for an m-skeleton then, letting τ_B^m denote the hitting time for the skeleton, we have by the Markov property, for any B ∈ B^+(X),

E_x[Σ_{k=0}^{τ_B^m−1} Σ_{i=0}^{m−1} P^i f(Φ_{km})] = E_x[Σ_{k=0}^{τ_B^m−1} Σ_{i=0}^{m−1} f(Φ_{km+i})]
 ≥ E_x[Σ_{j=0}^{τ_B−1} f(Φ_j)].
By the assumption of f (m ) -regularity, the left hand side is bounded over C and hence
the set C is f -regular.
Conversely, if C ∈ B+ (X) is f -regular then it follows from Theorem 14.2.3 that (V3)
holds for a function V which is bounded on C.
By repeatedly applying P to both sides of this inequality we obtain as in (14.21)

P^m V ≤ V − f^{(m)} + b I_C^{(m)}.

Applying Lemma 14.2.8 with ε = 1/(2b), and using f^{(m)} ≥ 1, we find a petite set C_ε for which

P^m V ≤ V − f^{(m)} + b m I_{C_ε} + 1/2
 ≤ V − (1/2) f^{(m)} + b m I_{C_ε},
and thus (V3) holds for the m-skeleton. Since V is bounded on C, we see from Theo-
rem 14.2.3 that C is f (m ) -regular for the m-skeleton.
As a simple but critical corollary we have
Theorem 14.2.10. Suppose that Φ is ψ-irreducible and aperiodic. Then Φ is f -regular
if and only if each m-skeleton is f (m ) -regular.
The importance of this result is that it allows us to shift our attention to skeleton
chains, one of which is always strongly aperiodic and hence may be split to form an
artificial atom; and this of course allows us to apply the results obtained in Section 14.1
for chains with atoms.
The next result follows this approach to obtain a converse to Proposition 14.1.1,
thus extending Proposition 14.1.2 to the non-atomic case.
Theorem 14.2.11. Suppose that Φ is positive recurrent and π(f ) < ∞. Then there
exists a sequence {Sf (n)} of f -regular sets whose union is full.
Proof   We need only look at a split chain corresponding to the m-skeleton chain, which possesses an f^{(m)}-regular atom by Proposition 14.1.2. It follows from Proposition 14.1.2 that for the split chain the required sequence of f^{(m)}-regular sets exists, and then following the proof of Proposition 11.1.3 we see that for the m-skeleton an increasing sequence {S_f(n)} of f^{(m)}-regular sets exists whose union is full.
From Theorem 14.2.9 we have that each of the sets {Sf (n)} is also f -regular for Φ
and the theorem is proved.
Proof (i) By positive recurrence we have for x lying in the maximal Harris set H,
and any m ∈ Z+ ,
lim inf_{k→∞} P^k(x, f) ≥ lim inf_{k→∞} P^k(x, m ∧ f) = π(m ∧ f),

and the result follows on letting m → ∞.
Result (ii) is now obvious using the split chain, given the results for a chain possessing
an atom, and (iii) follows directly from (ii).
We again obtain f -ergodic theorems for general aperiodic Φ by considering the m-
skeleton chain. The results obtained in the previous section show that when Φ has
appropriate f -properties then so does each m-skeleton. For aperiodic chains, there
always exists some m ≥ 1 such that the m-skeleton is strongly aperiodic, and hence
we may apply Theorem 14.3.1 to the m-skeleton chain to obtain f -ergodicity for this
skeleton. This then carries over to the process by considering the m distinct skeleton
chains embedded in Φ.
The following lemma allows us to make the desired connections between Φ and its
skeletons.
Lemma 14.3.2. (i) For any f ≥ 1 we have for n ∈ Z_+,

‖P^n(x, ·) − π‖_f ≤ ‖P^{km}(x, ·) − π‖_{f^{(m)}},

for k satisfying n = km + i with 0 ≤ i ≤ m − 1.

(ii) If for some m ≥ 1 and some x ∈ X we have

‖P^{km}(x, ·) − π‖_{f^{(m)}} → 0 as k → ∞,

then

‖P^k(x, ·) − π‖_f → 0 as k → ∞.

(iii) If the m-skeleton is f^{(m)}-ergodic, then Φ itself is f-ergodic.
Proof   Under the conditions of (i) let |g| ≤ f and write any n ∈ Z_+ as n = km + i with 0 ≤ i ≤ m − 1. Then, using the invariance of π,

|P^n(x, g) − π(g)| = |P^{km}(x, P^i g) − π(P^i g)| ≤ ‖P^{km}(x, ·) − π‖_{f^{(m)}},

since |P^i g| ≤ P^i f ≤ f^{(m)}. This proves (i), and the remaining results then follow.
This lemma and the ergodic theorems obtained for strongly aperiodic chains finally
give the result we seek.
Theorem 14.3.3. Suppose that Φ is positive recurrent and aperiodic.
(i) If π(f ) = ∞, then P k (x, f ) → ∞ for all x.
(ii) If π(f) < ∞, then the set S_f of f-regular states is full and absorbing, and if x ∈ S_f then

‖P^k(x, ·) − π‖_f → 0,   as k → ∞.
(iii) If Φ is f -regular, then Φ is f -ergodic. Conversely, if Φ is f -ergodic, then Φ
restricted to a full absorbing set is f -regular.
where V ( · ) = GC ( · , f ).
Proof Consider first the strongly aperiodic case, and construct a split chain Φ̌
using an f -regular set C. The theorem is valid from Theorem 14.1.3 for the split chain,
since the split measures µ∗ , λ∗ are f -regular for Φ̌. The bound on the sum can be taken
as
Σ_{n=1}^{∞} ∫∫ λ^∗(dx) µ^∗(dy) ‖P̌^n(x, ·) − P̌^n(y, ·)‖_f < M_f (λ^∗(V) + µ^∗(V) + 1),
and the analogous identity for µ, we see that the required bound holds in the strongly
aperiodic case.
In the arbitrary aperiodic case we can apply Lemma 14.3.2 to move to a skeleton
chain, as in the proof of Theorem 14.3.3.
The most interesting special case of this result is given in the following theorem.
where V ( · ) = GC ( · , f ).
Our final f -ergodic result, for quite arbitrary positive recurrent chains is given for
completeness in
Theorem 14.3.6. (i) If Φ is positive recurrent and if π(f ) < ∞, then there exists
a full set Sf , a cycle {Di : 1 ≤ i ≤ d} contained in Sf , and probabilities {πi : 1 ≤
i ≤ d} such that for any x ∈ Dr ,
‖P^{nd+r}(x, ·) − π_r‖_f → 0,   n → ∞,   (14.26)

and

d^{-1} Σ_{r=1}^{d} ‖P^{nd+r}(x, ·) − π‖_f → 0,   n → ∞.   (14.27)
Proof For π-a.e. x ∈ X we have from the Comparison Theorem 14.2.2, Theo-
rem 14.3.6 and (if π(f ) = ∞) the aperiodic version of Theorem 14.3.3, whether or not
π(s) < ∞,
π(f) = lim_{N→∞} (1/N) Σ_{k=1}^{N} E_x[f(Φ_k)] ≤ lim_{N→∞} (1/N) Σ_{k=1}^{N} E_x[s(Φ_k)] = π(s).
The criterion for π(X) < ∞ in Theorem 11.0.1 is a special case of this result. How-
ever, it seems easier to prove for quite arbitrary non-negative f, s using these limiting
results.
for some finite c, d. We can rewrite (14.28) in the form of (V3); namely, for some c > 0 and all large enough x,

∫ P(x, dy) y^k ≤ x^k − c x^{k−1}.
Proposition 14.4.1. If the increment distribution Γ has mean β < 0 and finite (k+1)st
moment, then the associated random walk on a half line is |x|k -regular. Hence the
process Φ admits a stationary measure π with finite moments of order k; and with
fk (y) = y k + 1,
(i) for all λ such that ∫ λ(dx) x^{k+1} < ∞,

∫ λ(dx) ‖P^n(x, ·) − π‖_{f_k} → 0,   n → ∞;
Proof The calculations preceding the proposition show that for some c0 > 0,
d0 < ∞, and a compact set C ⊂ R+ ,
Since this V is a coercive function on R, it follows that (V3) holds with the choice of
f (x) = 1 + δV (x)
Then the bilinear model is positive Harris, the invariant measure π also has finite k-th moments (that is, satisfies ∫ x^k π(dx) < ∞), and

‖P^n(x, ·) − π‖_{x^k} → 0,   n → ∞.
In the next chapter we will show that there is in fact a geometric rate of convergence
in this result. This will show that, in essence, the same drift condition gives us finiteness
of moments in the stationary case, convergence of time-dependent moments and some
conclusion about the rate at which the moments become stationary.
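These drift calculations are simple to check numerically. The sketch below (ours; the increment law, V, f, and the set C are illustrative choices satisfying the stated moment conditions) verifies the drift inequality (V3) for a random walk on the half line with V(x) = x², so that k = 2 in the notation above.

import numpy as np

# Random walk on the half line: X_{n+1} = max(X_n + W, 0), with E[W] < 0.
w_vals = np.array([-2, -1, 0, 1])
w_prob = np.array([0.2, 0.4, 0.2, 0.2])     # mean beta = -0.6, all moments finite

V = lambda x: float(x) ** 2                 # V(x) = x^2, so k = 2
f = lambda x: 1.0 + 0.1 * x                 # f comparable to x^{k-1}
b, C_top = 10.0, 2                          # C = [0, 2] works for these numbers

def drift_V(x):
    return sum(q * V(max(x + w, 0)) for w, q in zip(w_vals, w_prob)) - V(x)

# Away from the boundary Delta V(x) = 2 beta x + E[W^2], which is eventually
# below -f(x); the petite set C absorbs the finitely many exceptions.
for x in range(200):
    assert drift_V(x) <= -f(x) + b * (x <= C_top) + 1e-9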
Theorem 14.5.1. Provided Γ has a finite mean β and is not concentrated on a lattice
nh, n ∈ Z+ , h > 0, then for any interval [a, b] and any initial distribution Γ0
Γ_0 ∗ U[a + t, b + t] → β^{-1}(b − a),   t → ∞.   (14.35)
Proof This result is taken from Feller ([115], p. 360) and its proof is not one we
pursue here. We do note that it is a special case of the general Key Renewal Theorem,
which states that under these conditions on Γ, (14.34) holds for all bounded non-negative
functions f which are directly Riemann integrable, for which again see Feller ([115], p.
361); for then (14.35) is the special case with f (s) = I[a,b] (s).
This result shows us the pattern for renewal theorems: in the limit, the measure U
approximates normalized Lebesgue measure.
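The limit (14.35) is easy to see by simulation. In the following sketch (ours; the gamma increment law is an arbitrary spread-out choice with mean β) a Monte Carlo estimate of U[t, t + h] is compared with h/β.

import numpy as np

rng = np.random.default_rng(2)
beta = 1.7                                   # mean of the spread-out increment law

def renewals_in(t, h, trials=5_000):
    # Monte Carlo estimate of U[t, t + h], the expected number of renewals there.
    count = 0
    for _ in range(trials):
        s = 0.0
        while s < t + h:
            if s >= t:
                count += 1
            s += rng.gamma(2.0, beta / 2.0)  # absolutely continuous, mean beta
    return count / trials

print(renewals_in(50.0, 5.0), 5.0 / beta)    # U[t, t+h] ~ h / beta for large t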
We now show that one can trade off properties of Γ against properties of f (and to
some extent properties of Γ0 ) in asserting (14.34). We shall give a proof, based on the
ergodic properties we have been considering for Markov chains, of the following Uniform
Key Renewal Theorem.
Theorem 14.5.2. Suppose that Γ has a finite mean β and is spread out (as defined in
(RW2)).
(i) For any initial distribution Γ_0 we have the uniform convergence

lim_{t→∞} sup_{|g|≤f} |Γ_0 ∗ U ∗ g(t) − β^{-1} ∫_0^∞ g(s) ds| = 0   (14.36)

for any function f satisfying

f is bounded;   (14.37)
f is Lebesgue integrable;   (14.38)
f(t) → 0,   t → ∞.   (14.39)
(ii) In particular, for any bounded interval [a, b] and Borel sets B,

lim_{t→∞} sup_{B⊆[a,b]} |Γ_0 ∗ U(t + B) − β^{-1} µ_Leb(B)| = 0.   (14.40)
(iii) For any initial distribution Γ0 which is absolutely continuous, the convergence
(14.36) holds for f satisfying only (14.37) and (14.38).
Proof The proof of this set of results occupies the remainder of this section, and
contains a number of results of independent interest.
Before embarking on this proof, we note explicitly that we have accomplished a
number of tradeoffs in this result, compared with the Blackwell Renewal Theorem.
By considering spread-out distributions, we have exchanged the direct Riemann in-
tegrability condition for the simpler and often more verifiable smoothness conditions
(14.37)-(14.39). This is exemplified by the fact that (14.40) allows us to consider the
renewal measure of any bounded Borel set, whereas the general Γ version restricts us
to intervals as in (14.35). The extra benefits of smoothness of Γ0 in removing (14.39)
as a condition are also in this vein.
Moreover, by moving to the class of spread-out distributions, we have introduced a
uniformity into the Key Renewal Theorem which is analogous in many ways to the total
variation norm result in Markov chain limit theory. This analogy is not coincidental:
as we now show, these results are all consequences of precisely that total variation
convergence for the forward recurrence time chain associated with this renewal process.
Recall from Section 3.5.3 the forward recurrence time process V^+(t) := inf(Z_n − t : Z_n > t), t ≥ 0, where {Z_n} denotes the sequence of renewal times. We consider the skeleton chain V^+_δ := {V^+(nδ) : n ∈ Z_+} for that process, and denote its n-step transition law by P^{nδ}(x, ·). We showed that for sufficiently small δ, when Γ is spread out, the set [0, δ] is a small set for V^+_δ (Proposition 5.3.3), and V^+_δ is also aperiodic (Proposition 5.4.7).
It is trivial for this chain to see that (V2) holds with V (x) = x, so that the chain
is regular from Theorem 11.3.15, and if Γ0 has a finite mean, then Γ0 is regular from
Theorem 11.3.12.
This immediately enables us to assert from Theorem 13.4.4 that, if Γ1 , Γ2 are two
initial measures both with finite mean, and if Γ itself is spread out with finite mean,
Σ_{n=0}^{∞} ‖Γ_1 P^{nδ}(·) − Γ_2 P^{nδ}(·)‖ < ∞.   (14.41)
n =0
The crucial corollary to this example of Theorem 13.4.4, which leads to the Uniform
Key Renewal Theorem is
Proposition 14.5.3. If Γ is spread out with finite mean, and if Γ1 , Γ2 are two initial
measures both with finite mean, then
‖Γ_1 ∗ U − Γ_2 ∗ U‖ := ∫_0^∞ |Γ_1 ∗ U(dt) − Γ_2 ∗ U(dt)| < ∞.   (14.42)
Proof   Breaking the renewal measures over the intervals [nδ, (n + 1)δ), we have

‖Γ_1 ∗ U − Γ_2 ∗ U‖ = Σ_{n=0}^{∞} ∫_{[0,δ)} |(Γ_1 P^{nδ} − Γ_2 P^{nδ}) ∗ U(dt)|
 ≤ Σ_{n=0}^{∞} ∫_{[0,δ)} ∫_{[0,t]} |(Γ_1 P^{nδ} − Γ_2 P^{nδ})(du)| U(dt − u)   (14.44)
 ≤ Σ_{n=0}^{∞} ∫_{[0,δ)} |(Γ_1 P^{nδ} − Γ_2 P^{nδ})(du)| U[0, δ)
 ≤ U[0, δ) Σ_{n=0}^{∞} ‖Γ_1 P^{nδ} − Γ_2 P^{nδ}‖,

which is finite by (14.41), since U[0, δ) < ∞.
Proposition 14.5.4. If Γ is spread out with finite mean, and if Γ_1, Γ_2 are two initial measures both with finite mean, then

lim_{t→∞} sup_{|g|≤f} |Γ_1 ∗ U ∗ g(t) − Γ_2 ∗ U ∗ g(t)| = 0   (14.45)

for any f satisfying (14.37)–(14.39).
Proof   Suppose that ε is arbitrarily small but fixed. Using Proposition 14.5.3 we can fix T such that

∫_T^∞ |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| ≤ ε.   (14.46)

From (14.39) we can then choose t large enough that

f(t − u) ≤ ε,   u ∈ [0, T];
for such a t, writing d = sup f(x) < ∞ from (14.37), it follows that for any g with |g| ≤ f,

|Γ_1 ∗ U ∗ g(t) − Γ_2 ∗ U ∗ g(t)| ≤ ∫_0^T |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| f(t − u)
 + ∫_T^t |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| f(t − u)   (14.47)
 ≤ ε ‖Γ_1 ∗ U − Γ_2 ∗ U‖ + ε d =: ε′,

and since ε is arbitrary, the result follows.
Proposition 14.5.5. If Γ is spread out with finite mean, and if Γ_1, Γ_2 are any two initial measures, then the conclusion (14.45) continues to hold.

Proof   For fixed v, let Γ^v(A) := Γ(A)/Γ[0, v], A ⊆ [0, v], denote the truncation of Γ to [0, v].
For any g with |g| ≤ f,

|Γ_1 ∗ U ∗ g(t) − Γ_1^v ∗ U ∗ g(t)| ≤ ‖Γ_1 − Γ_1^v‖ sup_x U ∗ f(x),   (14.48)
which can be made smaller than ε by choosing v large enough, provided sup_x U ∗ f(x) < ∞. But if t > T, from (14.47), with Γ_1 = δ_0, Γ_2 = Γ_e^v and g = f,

U ∗ f(t) = δ_0 ∗ U ∗ f(t)
 ≤ Γ_e^v ∗ U ∗ f(t) + ε′
 ≤ Γ_e[0, v]^{-1} Γ_e ∗ U ∗ f(t) + ε′   (14.49)
 ≤ Γ_e[0, v]^{-1} β^{-1} ∫_0^∞ f(u) du + ε′,

so that sup_x U ∗ f(x) < ∞ as required.
Finally, we reconsider the first term in (14.47) without assuming (14.39). Setting A_ε(t) := {u ∈ [0, T] : f(t − u) > ε}, we have

∫_0^T |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| f(t − u) ≤ ∫_0^T |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| f(t − u) I_{[A_ε(t)]^c}(u)
 + ∫_0^T (Γ_1 ∗ U + Γ_2 ∗ U)(du) f(t − u) I_{A_ε(t)}(u)   (14.50)
 ≤ ε ‖Γ_1 ∗ U − Γ_2 ∗ U‖ + d (Γ_1 + Γ_2) ∗ U(A_ε(t)).
If we now assume the measure Γ_1 + Γ_2 to be absolutely continuous with respect to µ_Leb, then so is (Γ_1 + Γ_2) ∗ U ([115], p. 146).
Now since f is integrable, as t → ∞ for fixed T, ε we must have µ_Leb(A_ε(t)) → 0. But since T is fixed, we have that both µ_Leb[0, T] < ∞ and (Γ_1 + Γ_2) ∗ U[0, T] < ∞, and it is a standard result of measure theory ([152], p. 125) that

(Γ_1 + Γ_2) ∗ U(A_ε(t)) → 0,   t → ∞.
We can thus make the last term in (14.50) arbitrarily small for large t, even without assuming (14.39); now reconsidering (14.47), we see that Proposition 14.5.4 holds without (14.39), provided we assume the existence of densities for Γ_1 and Γ_2, and then Theorem 14.5.2 (iii) follows by the truncation argument of Proposition 14.5.5.
14.6 Commentary*
These results are largely recent. Although the question of convergence of Ex [f (Φk )] for
general f occurs in, for example, Markov reward models [25], most of the literature
on Harris chains has concentrated on convergence only for f ≤ 1 as in the previous
chapter. The results developed here are a more complete form of those in Meyn and
Tweedie [277], but there the general aperiodic case was not developed: only the strongly
aperiodic case is considered in detail. A more embryonic form of the convergence in
f -norm, indicating that if π(f ) < ∞ then Ex [f (Φk )] → π(f ), appeared as Theorem 2
of Tweedie [400].
Nummelin [303] considers f -regularity, but does not go on to apply the resulting
concepts to f -ergodicity, although in fact there are connections between the two which
are implicit through the Regenerative Decomposition in Nummelin and Tweedie [307].
That Theorem 14.1.1 admits a converse, so that when π(f ) < ∞ there exists a
sequence of f -regular sets {Sf (n)} whose union is full, is surprisingly deep. For general
state space chains, the question of the existence of f -regular sets requires the splitting
technique as did the existence of regular sets in Chapter 11. The key to their use
in analyzing chains which are not strongly aperiodic lies in the duality with the drift
condition (V3), and this is given here for the first time.
The fact that (V3) gives a criterion for finiteness of π(f ) was observed in
Tweedie [400]. Its use for asserting the second order stationarity of bilinear and other
time series models was developed in Feigin and Tweedie [111], and for analyzing random
walk in [401]. Related results on the existence of moments are also in Kalashnikov [188].
The application to the generalized Key Renewal Theorem is particularly satisfying.
By applying the ergodic theorems above to the forward recurrence time chain V^+_δ, we have “leveraged” from the discrete time renewal theory results of Section 13.2 to
the continuous time ones through the general Markov chain results. This Markovian
approach was developed in Arjas et al. [8], and the uniformity in Theorem 14.5.2,
which is a natural consequence of this approach, seems to be new there. The simpler
form without the uniformity, showing that one can exchange spread-outness of Γ for the
weaker conditions on f dates back to the original renewal theorems of Smith [361, 362,
363], whilst Breiman [47] gives a form of Theorem 14.5.2 (b). An elegant and different
approach is also possible through Stone’s Decomposition of U [374], which shows that
when Γ is spread out,

U = U_f + U_c,

where U_f is a finite measure, and U_c has a density p with respect to µ_Leb satisfying p(t) → β^{-1} as t → ∞.
The convergence, or rather summability, of the quantities ‖P^n(x, ·) − π‖_f
leads naturally to a study of rates of convergence, and this is carried out in Nummelin
and Tuominen [306]. Building on this, Tweedie [401] uses similar approaches to those
in this chapter to derive drift criteria for more subtle rate of convergence results: the
interested reader should note the result of Theorem 3 (iii) of [401]. There it is shown
(essentially by using the Comparison Theorem) that if (V3) holds for a function f such
that
f(x) ≥ E_x[r(τ_C)],   x ∈ C^c,
Commentary for the second edition: Several topics in this chapter have been
extended, or refined in specific applications, since publication of the first edition.
f -Regularity in queueing networks is the subject of [81, 264, 268, 266] – see also the
monograph [267]. The Comparison Theorem 14.2.2 is implicit in the stability analysis of
Tassiulas’s MaxWeight scheduling algorithm, now popular for routing and scheduling in
queueing networks [383, 137, 382, 268, 266, 267], and a version of Theorem 14.2.2 is used
in [145] in an early “heavy traffic” analysis of a queueing network. The Comparison
Theorem is also a component of the approach to network stability and performance
approximation developed in [273, 226, 223, 30, 31, 267]. In [81] the assumptions of
[393] are verified, provided an associated fluid model for the network is stable. This
establishes f -regularity for the network for polynomial f , as well as polynomial rates
of convergence in the f -Norm Ergodic Theorem 14.0.1.
Theory surrounding f -regularity is applied in the theory of controlled Markov models
(Markov decision processes, or MDPs) in [262, 261, 67, 263, 42, 267]. In particular, [42]
characterizes a notion of uniform f -regularity for MDPs.
Recently, Jarner and Roberts introduced a new drift criterion that can be used
to simplify the verification of polynomial rates of convergence [180]. Extensions of this
approach as well as explicit bounds on the rate of convergence are obtained in [126, 100].
The drift criterion of [180] can be expressed as an intermediate between the drift criteria (V3) and (V4):

(V4′)   ΔV(x) ≤ −β V^α(x) + b I_C(x),   x ∈ X,

for constants β > 0, b < ∞, an exponent α ∈ (0, 1), and a set C ∈ B(X).

For example, if the inter-arrival times in the GI/M/1 queue possess a finite nth moment, then (V4′) holds with V(x) = 1 + x^n and α = 1 − n^{-1}.

We consider the special case α = 1/2 to illustrate the application of (V4′):
Proposition 14.6.1. Suppose that the chain Φ is ψ-irreducible and aperiodic, and that the drift condition (V4′) holds for some extended-real-valued function V satisfying V(x_0) < ∞ for some x_0 ∈ X, with C petite, and α = 1/2. Then there exists a finite constant B_1 such that for all x ∈ S_V,

Σ_{n=0}^{∞} ‖P^n(x, ·) − π‖ ≤ B_1 √V(x).   (14.52)
Proof   We establish the assumptions of part (iii) of the f-Norm Ergodic Theorem 14.0.1, with f ≡ 1. For this it is sufficient to show that the function U := 2β^{-1} V^{1/2} satisfies (V3). By (V4′) and Jensen's inequality,

P V^{1/2}(x) ≤ (P V(x))^{1/2} ≤ V^{1/2}(x) (1 + [−β V^{1/2}(x) + b I_C(x)]/V(x))^{1/2}.

Concavity of the square root gives the bound √(1 + x) ≤ 1 + x/2. Combining this with the previous bound we obtain

P V^{1/2}(x) ≤ V^{1/2}(x) (1 + (1/2)[−β V^{1/2}(x) + b I_C(x)]/V(x))
 = V^{1/2}(x) + (1/2)[−β V^{1/2}(x) + b I_C(x)]/V^{1/2}(x),

so that

ΔU ≤ −1 + β^{-1} V^{-1/2}(x) b I_C(x) ≤ −1 + β^{-1} b I_C(x),

since V ≥ 1. Hence (V3) holds for U with f ≡ 1, and the proposition follows.

Chapter 15
Geometric ergodicity
The previous two chapters have shown that for positive Harris chains, convergence of
Ex [f (Φk )] is guaranteed from almost all initial states x provided only π(f ) < ∞. Strong
though this is, for many models used in practice even more can be said: there is often
a rate of convergence ρ such that

‖P^n(x, ·) − π‖_f = o(ρ^n),
where the rate ρ < 1 can be chosen essentially independent of the initial point x.
The purpose of this chapter is to give conditions under which convergence takes
place at such a uniform geometric rate. Because of the power of the final form of these
results, and the wide range of processes for which they hold (which include many of those
already analyzed as ergodic) it is not too strong a statement that this “geometrically
ergodic” context constitutes the most useful of all of those we present, and for this
reason we have devoted two chapters to this topic.
The following result summarizes the highlights of this chapter, where we focus on
bounds such as (15.4) and the strong relationship between such bounds and the drift
criterion given in (15.3). In Chapter 16 we will explore a number of examples in de-
tail, and describe techniques for moving from ergodicity to geometric ergodicity. The
development there is based primarily on the results of this chapter, and also on an in-
terpretation of the geometric convergence (15.4) in terms of convergence of the kernels
{P k } in a certain induced operator norm.
Theorem 15.0.1 (Geometric Ergodic Theorem). Suppose that the chain Φ is ψ-
irreducible and aperiodic. Then the following three conditions are equivalent:
(i) The chain Φ is positive recurrent with invariant probability measure π, and there
exists some ν-petite set C ∈ B^+(X), ρ_C < 1, M_C < ∞, and P^∞(C) > 0 such that
for all x ∈ C
|P^n(x, C) − P^∞(C)| ≤ M_C ρ_C^n.   (15.1)
(ii) There exists some petite set C ∈ B(X) and κ > 1 such that

sup_{x∈C} E_x[κ^{τ_C}] < ∞.   (15.2)
(iii) There exists a petite set C, constants b < ∞, β > 0 and a function V ≥ 1 finite
at some one x0 ∈ X satisfying
ΔV(x) ≤ −β V(x) + b I_C(x),   x ∈ X.   (15.3)
Any of these three conditions imply that the set S_V = {x : V(x) < ∞} is absorbing and full, where V is any solution to (15.3) satisfying the conditions of (iii), and there then exist constants r > 1, R < ∞ such that for any x ∈ S_V

Σ_n r^n ‖P^n(x, ·) − π‖_V ≤ R V(x).   (15.4)
Proof The equivalence of the local geometric rate of convergence property in (i)
and the self-geometric recurrence property in (ii) will be shown in Theorem 15.4.3.
The equivalence of the self-geometric recurrence property and the existence of so-
lutions to the drift equation (15.3) is completed in Theorems 15.2.6 and 15.2.4. It is
in Theorem 15.4.1 that this is shown to imply the geometric nature of the V -norm
convergence in (15.4), while the upper bound on the right hand side of (15.4) follows
from Theorem 15.3.3.
The notable points of this result are that we can use the same function V in (15.4),
which leads to the operator norm results in the next chapter; and that the rate r in
(15.4) can be chosen independently of the initial starting point.
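As a numerical aside, the content of the theorem is easy to see on a finite state space, where every set is petite and (15.4) always holds. The following minimal Python sketch uses a hypothetical three-state transition matrix (not taken from the text): the total variation distance to π decays geometrically from every starting point, with the common rate governed by the second-largest eigenvalue modulus of P.

```python
import numpy as np

P = np.array([[0.50, 0.30, 0.20],
              [0.10, 0.60, 0.30],
              [0.25, 0.25, 0.50]])

# Invariant probability pi: normalized left eigenvector for eigenvalue 1.
w, vl = np.linalg.eig(P.T)
pi = np.real(vl[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

Pn = np.eye(3)
for n in range(1, 31):
    Pn = Pn @ P
    if n % 10 == 0:
        tv = 0.5 * np.abs(Pn - pi).sum(axis=1)  # ||P^n(x,.) - pi|| for each x
        print(n, tv)

# The common geometric rate: second-largest eigenvalue modulus of P.
print("rate:", sorted(np.abs(w))[-2])
```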
We initially discuss conditions under which there exists for some x ∈ X a rate r > 1
such that
$$\| P^n(x, \,\cdot\,) - \pi \|_f \le M_x\, r^{-n} \qquad (15.5)$$
where Mx < ∞. Notice that we have introduced f -norm convergence immediately:
it will turn out that the methods are not much simplified by first considering the
case of bounded f . We also have another advantage in considering geometric rates
of convergence compared with the development of our previous ergodicity results. We
can exploit the useful fact that (15.5) is equivalent to the requirement that for some r̄,
M̄x ,
$$\sum_n \bar r^n\, \| P^n(x, \,\cdot\,) - \pi \|_f \le \bar M_x. \qquad (15.6)$$
Hence it is without loss of generality that we will immediately move also to consider
the summed form as in (15.6) rather than the n-step convergence as in (15.5).
f -Geometric ergodicity
We shall call Φ f -geometrically ergodic, where f ≥ 1, if Φ is positive Harris
with π(f ) < ∞ and there exists a constant rf > 1 such that
$$\sum_{n=1}^{\infty} r_f^n\, \| P^n(x, \,\cdot\,) - \pi \|_f < \infty. \qquad (15.7)$$
The development in this chapter follows a pattern similar to that of the previous
two chapters: first we consider chains which possess an atom, then move to aperiodic
chains via the Nummelin splitting.
This pattern is now well established: but in considering geometric ergodicity, the
extra complexity in introducing both unbounded functions f and exponential moments
of hitting times leads to a number of different and sometimes subtle problems. These
make the proofs a little harder in the case without an atom than was the situation with
either ergodicity or f -ergodicity. However, the final conclusion in (15.4) is well worth
this effort.
When Φ admits an accessible atom α, the Regenerative Decomposition leads, for f ≥ 1 and r > 1, to bounds on this summed convergence in terms of the three sums

$$\sum_{n=1}^{\infty} \Bigl[\int {}_{\alpha}P^n(x, dw)\, f(w)\Bigr] r^n,$$
$$\pi(\alpha) \sum_{n=1}^{\infty} \sum_{j=n+1}^{\infty} t_f(j)\, r^n, \qquad (15.8)$$
$$\sum_{n=1}^{\infty} |a_x * u - \pi(\alpha)| * t_f(n)\, r^n.$$

Now using Lemma D.7.2 and recalling that t_f(n) = ∫ _αPⁿ(α, dw)f(w), we have that the three sums in (15.8) can be bounded individually through

$$\sum_{n=1}^{\infty} \int {}_{\alpha}P^n(x, dw)\, f(w)\, r^n \le E_x\Bigl[\sum_{n=1}^{\tau_\alpha} f(\Phi_n)\, r^n\Bigr], \qquad (15.9)$$

$$\pi(\alpha) \sum_{n=1}^{\infty} \sum_{j=n+1}^{\infty} t_f(j)\, r^n \le \frac{r}{r-1}\, \pi(\alpha)\, E_\alpha\Bigl[\sum_{n=1}^{\tau_\alpha} f(\Phi_n)\, r^n\Bigr], \qquad (15.10)$$

$$\sum_{n=1}^{\infty} |a_x * u - \pi(\alpha)| * t_f(n)\, r^n = \Bigl(\sum_{n=1}^{\infty} |a_x * u(n) - \pi(\alpha)|\, r^n\Bigr)\Bigl(\sum_{n=1}^{\infty} t_f(n)\, r^n\Bigr) = \Bigl(\sum_{n=1}^{\infty} |a_x * u(n) - \pi(\alpha)|\, r^n\Bigr)\, E_\alpha\Bigl[\sum_{n=1}^{\tau_\alpha} f(\Phi_n)\, r^n\Bigr]. \qquad (15.11)$$
In order to bound the first two sums (15.9) and (15.10), and the second term in the third
sum (15.11), we will require an extension of the notion of regularity, or more exactly of
f -regularity. For fixed r ≥ 1 recall the generating function defined in (8.21) for r < 1
by
$$U_\alpha^{(r)}(x, f) := E_x\Bigl[\sum_{n=1}^{\tau_\alpha} f(\Phi_n)\, r^n\Bigr]; \qquad (15.12)$$
clearly this is defined but possibly infinite for r ≥ 1. From the inequalities (15.9)–(15.11)
above it is apparent that when Φ admits an accessible atom, establishing f-geometric ergodicity will require finding conditions such that U_α^{(r)}(x, f) is finite for some r > 1.
The first term in the right hand side of (15.11) can be reduced further. Using the
fact that

$$|a_x * u(n) - \pi(\alpha)| = \Bigl|a_x * (u - \pi(\alpha))(n) - \pi(\alpha) \sum_{j=n+1}^{\infty} a_x(j)\Bigr| \le a_x * |u - \pi(\alpha)|(n) + \pi(\alpha) \sum_{j=n+1}^{\infty} a_x(j)$$
U_α^{(r)}(x, f) < ∞ for some r = r_x > 1: and if we can choose such an r independent of x then we will be able to assert that the overall rate of convergence in (15.4) is also independent of x.
Theorem 15.1.1 (Kendall’s Theorem). Let u(n) be an ergodic renewal sequence with
increment distribution p(n), and write u(∞) = limn →∞ u(n). Then the following three
conditions are equivalent:
(i) The sequence u(n) converges to u(∞) at a geometric rate: there exists r₁ > 1 such that

$$\sum_{n=0}^{\infty} r_1^n\, |u(n) - u(\infty)| < \infty.$$

(ii) There exists r₀ > 1 such that the function U(z) defined on the complex plane for |z| < 1 by

$$U(z) := \sum_{n=0}^{\infty} u(n)\, z^n$$

has an analytic extension in the disc {|z| < r₀} except for a simple pole at z = 1.

(iii) There exists r₂ > 1 such that the increment distribution has geometric tails:

$$\sum_{n=0}^{\infty} r_2^n\, p(n) < \infty.$$
Proof Assume that (i) holds. Then by construction the function F(z) defined on the complex plane by

$$F(z) := \sum_{n=0}^{\infty} (u(n) - u(n-1))\, z^n$$

is analytic in a disc {|z| < r₀} for some r₀ > 1; since F(z) = (1 − z)U(z) (see (15.15)), we have that U(z) has no singularities in the disc {|z| < r₀} except a simple pole at z = 1, so that (ii) holds.
Conversely suppose that (ii) holds. We can then also extend F(z) analytically in the disc {|z| < r₀} using (15.15). As the Taylor series expansion is unique, necessarily F(z) = ∑_{n=0}^∞ (u(n) − u(n−1))zⁿ throughout this larger disc, and so by virtue of Cauchy's inequality

$$\sum_n |u(n) - u(n-1)|\, r^n < \infty, \qquad r < r_0,$$
so that

$$\mathrm{Re}\, P(z) \le \sum_0^{\infty} p(n)\, \mathrm{Re}(z^n) < \sum_0^{\infty} p(n) = 1.$$
Consequently only one of these roots, namely z = 1, lies on the unit circle, and hence
there is some r0 with 1 < r0 ≤ κ such that z = 1 is the only root of P (z) = 1 in the
disc {|z| < r0 }.
Moreover this is a simple root at z = 1, since

$$\lim_{z \to 1} \frac{1 - P(z)}{1 - z} = \frac{d}{dz} P(z)\Big|_{z=1} = \sum_n n\, p(n) \ne 0.$$
Now the renewal equation (8.12) shows that

$$U(z) = \bigl[1 - P(z)\bigr]^{-1}. \qquad (15.16)$$

Finally, to show that (ii) implies (iii) we again use (15.16): writing this as

$$P(z) = 1 - 1/U(z)$$

shows that P(z) is a ratio of analytic functions and so is itself analytic in the disc {|z| < κ}, where now κ is the first zero of F(z) in {|z| < r₀}; there are only finitely many such zeros and none of them occurs in the closed unit disc {|z| ≤ 1} since P(z) is bounded in this disc, so that κ > 1 as required.
It would seem that one should be able to prove this result, not only by analysis but
also by a coupling argument as in Section 13.2. Clearly one direction of this is easy: if
the renewal times are geometric then one can use coupling to get geometric convergence.
The other direction does seem to require analytic tools to the best of our knowledge,
and so we have given the classical proof here.
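As a numerical companion to Kendall's Theorem, the sketch below builds a renewal sequence u(n) from a hypothetical increment distribution p(n) with geometric tails (a truncated geometric law, chosen purely for illustration) and exhibits the geometric convergence of u(n) to u(∞) = 1/∑ n p(n).

```python
N = 200
p = [0.0] + [0.6 * 0.4 ** (n - 1) for n in range(1, N + 1)]  # p(n), n >= 1
mu = sum(n * p[n] for n in range(1, N + 1))                  # mean increment

u = [1.0]                                                    # u(0) = 1
for n in range(1, N + 1):
    u.append(sum(p[k] * u[n - k] for k in range(1, n + 1)))  # renewal equation

for n in (5, 10, 20, 40):
    print(n, abs(u[n] - 1.0 / mu))  # |u(n) - u(inf)| decays geometrically
```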
The application of Kendall's Theorem to chains admitting an atom comes from the following, which is straightforward from the assumption that f ≥ 1, so that U_α^{(κ)}(α, f) ≥ E_α[κ^{τ_α}].
This enables us to control the first term in (15.11). To exploit the other bounds in (15.9)–(15.11) we also need to establish finiteness of the quantities U_α^{(κ)}(x, f) for values of x other than α.
Proposition 15.1.3. Suppose that Φ is ψ-irreducible, and admits an f-Kendall atom α ∈ B⁺(X) of rate κ. Then the set S_f^κ := {x ∈ X : U_α^{(κ)}(x, f) < ∞} is full and absorbing.

Proof The kernel U_α^{(κ)}(x, · ) satisfies the identity

$$\int P(x, dy)\, U_\alpha^{(\kappa)}(y, B) \le \kappa^{-1} U_\alpha^{(\kappa)}(x, B) + P(x, \alpha)\, U_\alpha^{(\kappa)}(\alpha, B).$$

Thus the set S_f^κ is absorbing, and since S_f^κ is non-empty it follows from Proposition 4.2.3 that S_f^κ is full.
We now have sufficient structure to prove the geometric ergodic theorem when an
atom exists with appropriate properties.
Theorem 15.1.4. Suppose that Φ is ψ-irreducible, with invariant probability measure
π, and that there exists an f -Kendall atom α ∈ B+ (X) of rate κ.
Then there exists a decomposition X = S κ ∪ N where S κ is full and absorbing, such
that for all x ∈ S^κ, some R < ∞, and some r with r > 1

$$\sum_n r^n\, \| P^n(x, \cdot) - \pi(\cdot) \|_f \le R\, U_\alpha^{(\kappa)}(x, f) < \infty. \qquad (15.18)$$
Proof By Proposition 15.1.3 the bounds (15.9) and (15.10), and the second term
in the bound (15.11), are all finite for x ∈ S κ ; and Kendall’s Theorem, as applied in
Proposition 15.1.2, gives that for some rα > 1 the other term in (15.11) is also finite.
The result follows with r = min(κ, rα ).
There is an alternative way of stating Theorem 15.1.4 in the simple geometric er-
godicity case f = 1 which emphasizes the solidarity result in terms of ergodic properties
rather than in terms of hitting time properties. The proof uses the same steps as the
previous proof, and we omit it.
so that the chain is geometrically ergodic if and only if the distribution p(n) has geo-
metrically decreasing tails.
We will see, once we develop a drift criterion for geometric ergodicity, that this
duality between geometric tails on increments and geometric rates of convergence to
stationarity is repeated for many other models.
$$P(0, j) = \gamma_j, \qquad j \in \mathbb{Z}_+,$$
$$P(j, j) = \beta_j, \qquad j \in \mathbb{Z}_+,$$
$$P(j, 0) = 1 - \beta_j, \qquad j \in \mathbb{Z}_+, \qquad (15.20)$$

where ∑_j γ_j = 1.
The mean return time from zero to itself is given by

$$E_0[\tau_0] = \sum_j \gamma_j\,\bigl[1 + (1 - \beta_j)^{-1}\bigr]$$

and the chain is thus ergodic if γ_j > 0 for all j (ensuring irreducibility and aperiodicity), and

$$\sum_j \gamma_j\,(1 - \beta_j)^{-1} < \infty. \qquad (15.21)$$
In this example

$$E_0[r^{\tau_0}] \ge r \sum_j \gamma_j\, E_j[r^{\tau_0}]$$
and

$$P_j(\tau_0 > n) = \beta_j^n.$$

Hence if β_j → 1 as j → ∞, then the chain is not geometrically ergodic regardless of the structure of the distribution {γ_j}, even if γ_n → 0 sufficiently fast to ensure that (15.21) holds.
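This failure mode is easy to see numerically. In the sketch below we make the hypothetical choices γ_j ∝ 2^{-j} and β_j = 1 − (j + 1)^{-2}, for which (15.21) holds; since P₀(τ₀ > n + 1) = ∑_j γ_j β_j^n, the quantity log P₀(τ₀ > n)/n tends to zero, so the return time tail admits no geometric bound.

```python
import math

J = 2000
gamma = [2.0 ** (-j) for j in range(1, J + 1)]               # gamma_j, j = 1..J
gamma[-1] += 1.0 - sum(gamma)                                # normalize exactly
beta = [1.0 - 1.0 / (j + 1) ** 2 for j in range(1, J + 1)]   # beta_j -> 1

def tail(n):
    """P_0(tau_0 > n + 1) = sum_j gamma_j beta_j^n."""
    return sum(g * b ** n for g, b in zip(gamma, beta))

for n in (10, 100, 1000):
    t = tail(n)
    print(n, t, math.log(t) / n)  # log-tail / n -> 0: slower than any geometric
```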
Thus the best rates for convergence of Pⁿ(0, 0) and Pⁿ(2, 2) to their limits π(0) = π(2) = 1/4 are ρ₀ = ρ₂ = 0: the limits are indeed attained at every step. But the rate of convergence of Pⁿ(1, 1) to π(1) = 1/2 is at least ρ₁ > 1/4.
The following more complex example shows that even on an arbitrarily large finite
space {1, . . . , N + 1} there may in fact be N different rates of convergence such that
so that

$$P(k, k) = \beta_k := 1 - \sum_{j=1}^{k-1} \alpha_j - (N + 1 - k)\,\alpha_k, \qquad 1 \le k \le N + 1.$$
Since P is symmetric it is immediate that the invariant measure is given for all k by
π(k) = [N + 1]−1 .
For this example it is possible to show [384] that the eigenvalues of P are distinct and
are given by λ1 = 1 and for k = 2, . . . , N + 1
λk = βN +2−k − αN +2−k .
After considerable algebra it follows that for each k, there are positive constants s(k, j) such that

$$P^m(k, k) - [N + 1]^{-1} = \sum_{j=N+2-k}^{N+1} s(k, j)\, \lambda_j^m.$$
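Since the opening of this construction lies outside the excerpt, the following numerical check assumes the off-diagonal entries are P(k, j) = α_{k∧j} for j ≠ k (with hypothetical values α_j); under that assumption the diagonal entries are the β_k displayed above, and the spectrum computed by numpy reproduces the formula λ_k = β_{N+2−k} − α_{N+2−k}.

```python
import numpy as np

N = 5
alpha = [0.04 * 2.0 ** (-j) for j in range(1, N + 2)]  # alpha_1..alpha_{N+1}

P = np.zeros((N + 1, N + 1))
for k in range(1, N + 2):
    for j in range(1, N + 2):
        if j != k:
            P[k - 1, j - 1] = alpha[min(j, k) - 1]     # assumed structure
    P[k - 1, k - 1] = 1.0 - sum(alpha[:k - 1]) - (N + 1 - k) * alpha[k - 1]

assert np.allclose(P.sum(axis=1), 1.0) and (P >= 0).all()

# lambda_1 = 1 and lambda_k = beta_{N+2-k} - alpha_{N+2-k}, k = 2, ..., N+1.
predicted = sorted([1.0] + [P[m - 1, m - 1] - alpha[m - 1] for m in range(1, N + 1)])
print(np.round(sorted(np.linalg.eigvalsh(P)), 10))
print(np.round(predicted, 10))
```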
The crucial aspect of a Kendall atom is that the return times to the atom from itself
have a geometrically bounded distribution. There is an obvious extension of this idea
to more general, non-atomic, sets.
$$\sup_{x \in A} E_x\Bigl[\sum_{k=0}^{\tau_A - 1} f(\Phi_k)\, \kappa^k\Bigr] < \infty. \qquad (15.22)$$

$$\sup_{x \in A} E_x\Bigl[\sum_{k=0}^{\tau_B - 1} f(\Phi_k)\, r^k\Bigr] < \infty.$$
this is again well defined for r ≥ 1, although it may be infinite. We use this no-
tation in our next result, which establishes that any petite f -Kendall set is actually
f -geometrically regular. This is non-trivial to establish, and needs a somewhat delicate
“geometric trials” argument.
Theorem 15.2.1. Suppose that Φ is ψ-irreducible and f ≥ 1. Then the following are equivalent:

(i) A is a petite f-Kendall set;

(ii) A is f-geometrically regular and A ∈ B⁺(X).

Proof To prove (ii)⇒(i) it is enough to show that A is petite, and this follows from Proposition 11.3.8, since a geometrically regular set is automatically regular.

To prove (i)⇒(ii) is considerably more difficult, although obviously since a Kendall set is Harris recurrent, it follows from Proposition 9.1.1 that any Kendall set is in B⁺(X).
Suppose that C is an f-Kendall set of rate κ, let 1 < r ≤ κ, and define U^{(r)}(x) = E_x[r^{τ_C}], so that U^{(r)} is bounded on C. We set M(r) = sup_{x∈C} U^{(r)}(x) < ∞. Put ε = log(r)/log(κ): by Jensen's inequality,

$$M(r) = \sup_{x \in C} E_x[\kappa^{\varepsilon \tau_C}] \le M(\kappa)^{\varepsilon}.$$

Writing τ_C(n) for the n-th return time to C, the strong Markov property then gives

$$E_x[r^{\tau_C(n)}] \le (M(r))^{n-1}\, U^{(r)}(x), \qquad n \ge 1. \qquad (15.24)$$
To prove the theorem we will combine this bound with the sample path bound, valid for any set B ∈ B(X),

$$\sum_{i=1}^{\tau_B} r^i f(\Phi_i) \le \sum_{n=0}^{\infty} \Bigl[\sum_{j=\tau_C(n)+1}^{\tau_C(n+1)} r^j f(\Phi_j)\Bigr] I\{\tau_B > \tau_C(n)\}. \qquad (15.25)$$
For any 0 < γ < 1, n ≥ 0, and positive numbers x and y we have the bound xy ≤ γⁿx² + γ⁻ⁿy². Applying this bound with x = r^{τ_C(n)} and y = I{τ_C(n) < τ_B} in (15.25), and setting M_f(r) = sup_{x∈C} U_C^{(r)}(x, f), we obtain for any B ∈ B(X),

$$U_B^{(r)}(x, f) \le M_f(r) \sum_{n=0}^{\infty} \Bigl\{\gamma^n E_x[r^{2\tau_C(n)}] + \gamma^{-n} E_x[I\{\tau_C(n) < \tau_B\}]\Bigr\}$$
$$\le M_f(r) \Bigl\{\sum_{n=0}^{\infty} \gamma^n (M(r^2))^n\, U^{(r^2)}(x) + \sum_{n=0}^{\infty} \gamma^{-n} P_x\{\tau_C(n) < \tau_B\}\Bigr\}, \qquad (15.26)$$
where we have used (15.24). We still need to prove the right hand side of (15.26) is
finite. Suppose now that for some R < ∞, ρ < 1, and any x ∈ X,
$$P_x\{\tau_C(n) < \tau_B\} \le R\, \rho^n. \qquad (15.27)$$
With γ so fixed, we can now choose r > 1 so close to unity that γM(r²) < 1 to obtain

$$U_B^{(r)}(x, f) \le M_f(r)\Bigl\{\frac{U^{(r^2)}(x)}{1 - \gamma M(r^2)} + \frac{R}{1 - \gamma^{-1}\rho}\Bigr\},$$

$$I\{\tau_C(m m_0) < \tau_B\} = I\{\tau_C([m-1]m_0) < \tau_B\}\;\theta^{\tau_C([m-1]m_0)}\, I\{\tau_C(m_0) < \tau_B\}$$
depending on the quantity ρ in (15.27): intuitively, for a set B “far away” from C it
may take many visits to C before an excursion reaches B, and so the value of r will be
correspondingly closer to unity.
We see at once that (V4) is just (V3) in the special case where f = βV . From
this observation we can borrow several results from the previous chapter, and use the
approach there as a guide.
We first spell out some useful properties of solutions to the drift inequality in (15.28), analogous to those we found for (14.16).

Lemma 15.2.2. Suppose that Φ is ψ-irreducible.

(i) If V satisfies (15.28), then {V < ∞} is either empty or absorbing and full.

(ii) If (15.28) holds for a petite set C, then V is unbounded off petite sets.
We now begin a more detailed evaluation of the consequences of (V4). We first give
a probabilistic form for one solution to the drift condition (V4), which will prove that
(15.2) implies (15.3) has a solution.
Using the kernel U_C^{(r)} we define a further kernel G_C^{(r)} as G_C^{(r)} = I + I_{C^c} U_C^{(r)}. For any x ∈ X, B ∈ B(X), this has the interpretation

$$G_C^{(r)}(x, B) = E_x\Bigl[\sum_{k=0}^{\sigma_C} I_B(\Phi_k)\, r^k\Bigr]. \qquad (15.29)$$

The kernel G_C^{(r)}(x, B) gives us the solution we seek to (15.28).
Lemma 15.2.3. Suppose that C ∈ B(X), and let r > 1. Then the kernel G_C^{(r)} satisfies

$$P\, G_C^{(r)} = r^{-1}\bigl[G_C^{(r)} - I\bigr] + r^{-1} I_C\, U_C^{(r)}. \qquad (15.30)$$

Proof The kernel U_C^{(r)} satisfies the simple identity

$$U_C^{(r)} = rP + rP\, I_{C^c} U_C^{(r)}. \qquad (15.31)$$

Hence the kernel G_C^{(r)} satisfies the chain of identities

$$P\, G_C^{(r)} = P + P\, I_{C^c} U_C^{(r)} = r^{-1} U_C^{(r)} = r^{-1}\bigl[G_C^{(r)} - I + I_C\, U_C^{(r)}\bigr].$$
This now gives us the easier direction of the duality between the existence of f -
Kendall sets and solutions to (15.28).
Theorem 15.2.4. Suppose that Φ is ψ-irreducible, and admits an f-Kendall set C ∈ B⁺(X) for some f ≥ 1. Then the function V(x) = G_C^{(κ)}(x, f) ≥ f(x) is a solution to (V4).
Proof We have from (15.30) that, by the f-Kendall property, for some M < ∞ and r > 1,

$$\Delta V \le -\beta V + r^{-1} M\, I_C$$

and so the function V satisfies (V4).

$$P V \le r^{-1} V - \varepsilon V + b I_C$$

$$Z_k = r^k V(\Phi_k)$$
Choosing f_k(x) = εr^{k+1}V(x) and s_k(x) = br^{k+1}I_C(x), we have by Proposition 11.3.2

$$E_x\Bigl[\sum_{k=0}^{\tau_B - 1} \varepsilon r^{k+1} V(\Phi_k)\Bigr] \le Z_0(x) + E_x\Bigl[\sum_{k=0}^{\tau_B - 1} r^{k+1} b I_C(\Phi_k)\Bigr].$$
Multiplying through by ε−1 r−1 and noting that Z0 (x) = V (x), we obtain the required
bound.
The particular form with B = C is then straightforward.
We use this result to prove that in general, sublevel sets of solutions V to (15.28)
are V -geometrically regular.
Theorem 15.2.6. Suppose that Φ is ψ-irreducible, and that (V4) holds for a function
V and a petite set C.
If V is bounded on A ∈ B(X), then A is V -geometrically regular.
$$P V \le r^{-1} V + c\, I_D$$

for some c < ∞. Thus we have shown that (V4) holds with D in place of C. Hence using (15.32) there exists s > 1 and ε > 0 such that

$$E_x\Bigl[\sum_{k=0}^{\tau_D - 1} s^k V(\Phi_k)\Bigr] \le \varepsilon^{-1} s^{-1} V(x) + \varepsilon^{-1} c\, I_D(x). \qquad (15.34)$$
Theorem 15.2.7. If there exists an f -Kendall set C ∈ B+ (X), then there exists V ≥ f
and an increasing sequence {CV (i) : i ∈ Z+ } of V -geometrically regular sets whose
union is full.
Proof Let V(x) = G_C^{(r)}(x, f). Then V satisfies (V4) and by Theorem 15.2.6 the set C_V(n) := {x : V(x) ≤ n} is V-geometrically regular for each n. Since S_V = {V < ∞} is a full absorbing subset of X, the result follows.
The following alternative form of (V4) will simplify some of the calculations per-
formed later.
Lemma 15.2.8. The drift condition (V4) holds with a petite set C if and only if V is
unbounded off petite sets and
P V ≤ λV + L (15.35)
for some λ < 1, L < ∞.
Proof If (V4) holds, then (15.35) immediately follows. Lemma 15.2.2 states that the function V is unbounded off petite sets.

Conversely, if (15.35) holds for a function V which is unbounded off petite sets, then set β = ½(1 − λ) and define the petite set C as

$$C = \{x \in X : V(x) \le L/\beta\}$$
is an important one, it also has one drawback: as we have larger functions on the left, the bounds on the distance to π in the V-norm also increase.

Overall it is not clear when one can have a best common bound on the distance ‖Pⁿ(x, ·) − π‖_V independent of V; indeed, the example in Section 16.2.2 shows that as V increases then one might even lose the geometric nature of the convergence.
However, the following result shows that one can obtain a smaller x-dependent
bound in the Geometric Ergodic Theorem if one is willing to use a smaller function V
in the application of the V -norm.
Lemma 15.2.9. If (V4) holds for V, and some petite set C, then (V4) also holds for the function √V and some petite set C.

Proof If (V4) holds for the finite-valued function V then by Lemma 15.2.8 V is unbounded off petite sets and (15.35) holds for some λ < 1 and L < ∞. Letting V̄(x) = √V(x), x ∈ X, we have by Jensen's inequality,

$$P \bar V(x) \le \sqrt{P V(x)} \le \sqrt{\lambda V + L} \le \sqrt{\lambda}\,\sqrt{V} + \frac{L}{2\sqrt{\lambda V}} \le \sqrt{\lambda}\, \bar V + \frac{L}{2\sqrt{\lambda}} \quad \text{(since } V \ge 1\text{)},$$

which together with Lemma 15.2.8 implies that (V4) holds with V replaced by √V.
f -Geometric regularity of Φ
The chain Φ is called f -geometrically regular if there exists a petite set C
and a fixed constant κ > 1 such that
$$E_x\Bigl[\sum_{k=0}^{\tau_C - 1} f(\Phi_k)\, \kappa^k\Bigr] \qquad (15.36)$$

is finite for all x ∈ X and bounded on C.

Observe that when κ is taken equal to one, this definition then becomes f-regularity, whilst the boundedness on C implies f-geometric regularity of the set C from Theorem 15.2.1: it is the finiteness from arbitrary initial points that is new in this definition.
The following consequence of f -regularity follows immediately from the strong
Markov property and f -geometric regularity of the set C used in (15.36).
Proposition 15.3.2. If there is one petite f -Kendall set C, then there is a decompo-
sition
X = Sf ∪ N
where Sf is full and absorbing, and Φ restricted to Sf is f -geometrically regular.
Proof We know from Theorem 15.2.1 that when a petite f-Kendall set C exists then C is V-geometrically regular, where V(x) = G_C^{(r)}(x, f) for some r > 1. Since V then satisfies (V4) from Lemma 15.2.3, it follows from Lemma 15.2.2 that S_f = {V < ∞} is absorbing and full. Now as in (15.32) we have for some κ > 1

$$V(x) \le E_x\Bigl[\sum_{n=0}^{\tau_C - 1} V(\Phi_n)\, \kappa^n\Bigr] \le \varepsilon^{-1} \kappa^{-1} V(x) + \varepsilon^{-1} c\, I_C(x) \qquad (15.38)$$

and since the right hand side is finite on S_f the chain restricted to S_f is V-geometrically regular, and hence also f-geometrically regular since f ≤ V.
The existence of an everywhere finite solution to the drift inequality (V4) is equiv-
alent to f -geometric regularity, imitating the similar characterization of f -regularity.
We have
Theorem 15.3.3. Suppose that (V4) holds for a petite set C and a function V which is everywhere finite. Then Φ is V-geometrically regular, and for each B ∈ B⁺(X) there exists c(B) < ∞ such that

$$U_B^{(r)}(x, V) \le c(B)\, V(x).$$

Conversely, if Φ is f-geometrically regular, then there exists a petite set C and a function V ≥ f which is everywhere finite and which satisfies (V4).
Proof Suppose that (V4) holds with V everywhere finite and C petite. As in the
proof of Theorem 15.2.6, there exists a petite set D on which V is bounded, and as in
(15.34) there is then r > 1 and a constant d such that

$$E_x\Bigl[\sum_{k=0}^{\tau_D - 1} V(\Phi_k)\, r^k\Bigr] \le d\, V(x).$$
Hence Φ is V -geometrically regular, and the required bound follows from Proposi-
tion 15.3.1.
For the converse, take V(x) = G_C^{(r)}(x, f) where C is the petite set used in the definition of f-geometric regularity.
This approach, using solutions V to (V4) to bound (15.36), is in effect an extended
version of the method used in the atomic case to prove Proposition 15.1.3.
(i) If V satisfies (V4) with a petite set C, then for any n-skeleton, the function V
also satisfies (V4) for some set C which is petite for the n-skeleton.
$$P^n V \le \rho^n V + b \sum_{i=0}^{n-1} P^i I_C \le \rho^n V + b m\, I_C + \varepsilon.$$
Given this together with Theorem 15.3.3, which characterizes f -geometric regularity,
the following result is obvious:
We round out this series of equivalences by showing not only that the skeletons
inherit f -geometric regularity properties from the chain, but that we can go in the
other direction also.
Recall from (14.22) that for any positive function g on X, we write g^{(m)} = ∑_{i=0}^{m−1} Pⁱg. Then we have, as a geometric analogue of Theorem 14.2.9,
Proof Letting τ_B^m denote the hitting time for the skeleton, we have by the Markov property, for any B ∈ B⁺(X) and r > 1,

$$E_x\Bigl[\sum_{k=0}^{\tau_B^m - 1} r^{km} \sum_{i=0}^{m-1} P^i f(\Phi_{km})\Bigr] \ge r^{-m}\, E_x\Bigl[\sum_{k=0}^{\tau_B^m - 1} \sum_{i=0}^{m-1} r^{km+i} f(\Phi_{km+i})\Bigr] \ge r^{-m}\, E_x\Bigl[\sum_{j=0}^{\tau_B - 1} r^j f(\Phi_j)\Bigr].$$
If C is f (m ) -geometrically regular for an m-skeleton then the left hand side is bounded
over C for some r > 1 and hence the set C is also f -geometrically regular.
Conversely, if C ∈ B + (X) is f -geometrically regular then it follows from Theo-
rem 15.2.4 that (V4) holds for a function V ≥ f which is bounded on C.
Thus we have from (15.39) and a further application of Lemma 14.2.8 that for some petite set C and ρ < 1

$$P^m V^{(m)} \le \rho\, V^{(m)} + m b\, I_C,$$

and thus (V4) holds for the m-skeleton. Since V^{(m)} is bounded on C by (15.39), we have from Theorem 15.3.3 that C is V^{(m)}-geometrically regular for the m-skeleton.
$$\sum_n r^n\, \| P^n(x, \,\cdot\,) - \pi \|_f \le R\, E_x\Bigl[\sum_{k=0}^{\tau_C} f(\Phi_k)\, \kappa^k\Bigr]$$
Proof This proof is in several steps, from the atomic through the strongly aperiodic
to the general aperiodic case. In all cases we use the fact that the seemingly relatively
weak f -Kendall petite assumption on C implies that C is f -geometrically regular and
in B + (X) from Theorem 15.2.1.
Under the conditions of the theorem it follows from Theorem 15.2.4 that

$$V(x) = E_x\Bigl[\sum_{k=0}^{\sigma_C} f(\Phi_k)\, \kappa^k\Bigr] \ge f(x) \qquad (15.40)$$
is a solution to (V4) which is bounded on the set C, and the set Sfκ = {x : V (x) < ∞}
is absorbing, full, and contains the set C. This will turn out to be the set required for
the result.
(i) Suppose first that the set C contains an accessible atom α. We know then
that the result is true from Theorem 15.1.4, with the bound on the f -norm convergence
given from (15.18) and (15.37) by

$$E_x\Bigl[\sum_{k=0}^{\tau_\alpha - 1} f(\Phi_k)\, \kappa^k\Bigr] \le c(\alpha)\, E_x\Bigl[\sum_{k=0}^{\tau_C - 1} f(\Phi_k)\, \kappa^k\Bigr]$$
To prove the theorem we abandon the function f and prove V -geometric ergodicity
for the chain restricted to Sfκ and the function (15.40). By Theorem 15.3.3 applied to
the chain restricted to S_f^κ we have that for some constants c < ∞, r > 1,

$$E_x\Bigl[\sum_{k=1}^{\tau_C} V(\Phi_k)\, r^k\Bigr] \le c\, V(x). \qquad (15.41)$$

Now consider the chain split on C. Exactly as in the proof of Proposition 14.3.1 we have that

$$\check E_{x_i}\Bigl[\sum_{k=1}^{\tau_{C_0 \cup C_1}} \check V(\check\Phi_k)\, r^k\Bigr] \le c\, \check V(x_i)$$
From the definition of V and the bound V ≥ f this proves the theorem when C is
ν1 -small.
(iii) Now let us move to the general aperiodic case. Choose m so that the set C
is itself νm -small with νm (C c ) = 0: we know that this is possible from Theorem 5.5.7.
By Theorem 15.3.3 and Theorem 15.3.5 the chain and the m-skeleton restricted to
Sfκ are both V -geometrically regular. Moreover, by Theorem 15.3.3 and Theorem 15.3.4
we have for some constants d < ∞, r > 1,
$$E_x\Bigl[\sum_{k=1}^{\tau_C^m} V(\Phi_k)\, r^k\Bigr] \le d\, V(x) \qquad (15.42)$$
where as usual τCm denotes the hitting time for the m-skeleton. From (ii), since m is
chosen specifically so that C is “ν1 -small” for the m-skeleton, there exists c < ∞ with
$$\| P^{nm}(x, \,\cdot\,) - \pi \|_V \le c\, V(x)\, r_0^{-n}, \qquad n \in \mathbb{Z}_+, \; x \in S_f^\kappa.$$
We now need to compare this term with the convergence of the one-step transition
probabilities, and we do not have the contraction property of the total variation norm
available to do this. But if (V4) holds for V then we have that
$$\| P^{n+1}(x, \,\cdot\,) - \pi \|_V \le (1 + b)\, \| P^n(x, \,\cdot\,) - \pi \|_V. \qquad (15.43)$$

Hence for k = nm + i with 0 ≤ i < m,

$$\| P^k(x, \,\cdot\,) - \pi \|_V \le (1 + b)^m\, \| P^{nm}(x, \,\cdot\,) - \pi \|_V \le (1 + b)^m c\, V(x)\, r_0^{-n} \le (1 + b)^m c\, r_0\, V(x)\, (r_0^{1/m})^{-k},$$
Proof Let us first assume there is an accessible atom α ∈ B+ (X), and that r > 1
is such that
$$\sum_n r^n\, \| P^n(\alpha, \,\cdot\,) - \pi \|_f < \infty.$$
Using the last exit decomposition (8.19) over the times of entry to α, we have as in the
Regenerative Decomposition (13.48)
$$P^n(\alpha, f) - \pi(f) \ge (u - \pi(\alpha)) * t_f(n) + \pi(\alpha) \sum_{j=n+1}^{\infty} t_f(j). \qquad (15.44)$$
Multiplying by rn and summing both sides of (15.44) would seem to indicate that α is
an f -Kendall atom of rate r, save for the fact that the first term may be negative, so
that we could have both positive and negative infinite terms in this sum in principle.
A slightly more delicate argument is needed to get around this.
By truncating the last term and then multiplying by sⁿ, s ≤ r, and summing to N, we do have

$$\sum_{n=0}^{N} s^n\, (P^n(\alpha, f) - \pi(f)) \ge \sum_{n=0}^{N} s^n t_f(n)\Bigl[\sum_{k=0}^{N-n} s^k (u(k) - \pi(\alpha))\Bigr] + \pi(\alpha) \sum_{n=0}^{N} s^n \sum_{j=n+1}^{N} t_f(j). \qquad (15.45)$$
Let us write c_N(f, s) = ∑_{n=0}^N sⁿt_f(n), and d(s) = ∑_{n=0}^∞ sⁿ|u(n) − π(α)|. We can bound the first term in (15.45) in absolute value by d(s)c_N(f, s), so in particular as s ↓ 1, by monotonicity of d(s) we know that the middle term is no more negative than −d(r)c_N(f, s).
On the other hand, the third term is by Fubini's Theorem given by

$$\pi(\alpha)\,[s-1]^{-1} \sum_{n=0}^{N} t_f(n)(s^n - 1) \ge [s-1]^{-1}\bigl[\pi(\alpha)\, c_N(f, s) - \pi(f) - \pi(\alpha) f(\alpha)\bigr]. \qquad (15.46)$$
Suppose now that α is not f-Kendall. Then for any s > 1 we have that c_N(f, s) is unbounded as N becomes large. Fix s sufficiently small that π(α)[s − 1]⁻¹ > d(r); then we have that the right hand side of (15.45) is greater than

$$\bigl(\pi(\alpha)[s-1]^{-1} - d(r)\bigr)\, c_N(f, s) - [s-1]^{-1}\bigl[\pi(f) + \pi(\alpha) f(\alpha)\bigr],$$

which tends to infinity as N → ∞. This clearly contradicts the finiteness of the left side of (15.45). Consequently α is f-Kendall of rate s for some s < r, and then the chain is f-geometrically regular when restricted to a full absorbing set S from Proposition 15.3.2.
Now suppose that the chain does not admit an accessible atom. If the chain is f-geometrically ergodic, then it is straightforward that for every m-skeleton and every x we have

$$\sum_n r^n\, |P^{nm}(x, f) - \pi(f)| < \infty,$$

and for the split chain corresponding to one such skeleton we also have rⁿ|P̌ⁿ(x, f) − π(f)| summable.
and again trivially the m-skeleton is f (m ) -geometrically regular, at least on a full ab-
sorbing set S. We can then use Theorem 15.3.7 to deduce that the original chain is
f -geometrically regular on S as required.
One of the uses of this result is to show that even when π(f ) < ∞ there is no
guarantee that geometric ergodicity actually implies f -geometric ergodicity: rates of
convergence need not be inherited by the f -norm convergence for “large” functions f .
We will see this in the example defined by (16.24) in the next chapter.
However, we can show that local geometric ergodicity does at least give the V -
geometric ergodicity of Theorem 15.4.1, for an appropriate V . As in Chapter 13, we
conclude with what is now an easy result.
Theorem 15.4.3. Suppose that Φ is an aperiodic positive Harris chain, with invariant
probability measure π, and that there exists some ν-small set C ∈ B⁺(X), ρ_C < 1 and M_C < ∞, and P^∞(C) > 0 such that ν(C) > 0 and

$$\int_C \nu_C(dx)\,\bigl(P^n(x, C) - P^\infty(C)\bigr) \le M_C\, \rho_C^n. \qquad (15.47)$$
Proof Using the Nummelin splitting via the set C for the m-skeleton, we have
exactly as in the proof of Theorem 13.3.5 that the bound (15.47) implies that the atom
in the skeleton chain split at C is geometrically ergodic.
We can then emulate step (iii) of the proof of Theorem 15.4.1 above to reach the
conclusion.
Notice again that (15.47) is implied by (15.1), so that we have completed the circle
of results in Theorem 15.0.1.
For this chain we can consider directly Px (τ0 = n) = ax (n) in order to evaluate the
geometric tails of the distribution of the hitting times. Since we have the recurrence
relations
$$a_x(n) = (1-p)\, a_{x-1}(n-1) + p\, a_{x+1}(n-1), \qquad x > 1;$$
$$a_x(0) = 0, \qquad x \ge 1;$$
$$a_1(n) = p\, a_2(n-1), \qquad a_0(0) = 0,$$

valid for n ≥ 1, the generating functions A_x(z) = ∑_{n=0}^∞ a_x(n)zⁿ satisfy
and if p < 1/2, then [(1 − p)z −1 + pz − 1] = −β < 0 for z sufficiently close to unity, and
so (15.28) holds as desired.
In fact, this same property, that for random walks on the half line ergodic chains are
also geometrically ergodic, holds in much wider generality. The crucial property is that
the increment distribution have exponentially decreasing right tails, as we shall see in
Section 16.1.3.
Xn = αXn −1 + Wn
We noted in Proposition 11.4.2 that for large enough m, V satisfies (V2) with C =
CV (m) = {x : |x| + 1 ≤ m}, provided that
Under this condition, just as in the simple linear model, the chain is irreducible and
aperiodic and thus again in this case we have that the chain is V -geometrically ergodic
with V (x) = |x| + 1.
Suppose further that W has finite variance σ_w² satisfying

$$\theta^2 + b^2 \sigma_w^2 < 1;$$
exactly as in Section 14.4.2, we see that V (x) = x2 is a solution to (V4) and hence Φ
is V -geometrically ergodic with this V . As a consequence, the chain admits a second
order stationary distribution π with the property that for some r > 1 and c < ∞, and
all x and n,

$$r^n\, \Bigl|\int P^n(x, dy)\, y^2 - \int \pi(dy)\, y^2\Bigr| < c\,(x^2 + 1).$$
Thus not only does the chain admit a second order stationary version, but the time
dependent variances converge to the stationary variance.
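A Monte Carlo sketch of this conclusion, assuming the bilinear form X_{n+1} = θX_n + bX_nW_{n+1} + W_{n+1} analyzed in Section 15.5.2 and hypothetical parameter values satisfying θ² + b²σ_w² < 1:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, b, sigma_w = 0.5, 0.5, 1.0
assert theta ** 2 + b ** 2 * sigma_w ** 2 < 1   # the drift condition above

paths, x0 = 200_000, 5.0
X = np.full(paths, x0)
for n in range(1, 41):
    W = rng.normal(0.0, sigma_w, size=paths)
    X = theta * X + b * X * W + W
    if n % 10 == 0:
        print(n, (X ** 2).mean())  # E_x[X_n^2] settles at the stationary value
```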
15.6 Commentary*
Unlike much of the ergodic theory of Markov chains, the history of geometrically ergodic
chains is relatively straightforward. The concept was introduced by Kendall in [202],
where the existence of the solidarity property for countable space chains was first estab-
lished: that is, if one transition probability sequence P n (i, i) converges geometrically
quickly, so do all such sequences. In this seminal paper the critical renewal theorem
(Theorem 15.1.1) was established.
The central result, the existence of the common convergence rate, is due to Vere-
Jones [403] in the countable space case; the fact that no common best bound exists was
also shown by Vere-Jones [403], with the more complex example given in Section 15.1.4
being due to Teugels [384]. Vere-Jones extended much of this work to non-negative
matrices [405, 406], and this approach carries over to general state space operators
[394, 395, 303].
Nummelin and Tweedie [307] established the general state space version of geo-
metric ergodicity, and by using total variation norm convergence, showed that there
is independence of A in the bounds on |P n (x, A) − π(A)|, as well as an independent
geometric rate. These results were strengthened by Nummelin and Tuominen [305],
who also show as one important application that it is possible to use this approach to
establish geometric rates of convergence in the Key Renewal Theorem of Section 14.5 if
the increment distribution has geometric tails. Their results rely on a geometric trials
argument to link properties of skeletons and chains: the drift condition approach here
is new, as is most of the geometric regularity theory.
The upper bound in (15.4) was first observed by Chan [62]. Meyn and Tweedie [277]
developed the f -geometric ergodicity approach, thus leading to the final form of Theo-
rem 15.4.1; as discussed in the next chapter, this form has important operator-theoretic
consequences, as pointed out in the case of countable X by Hordijk and Spieksma [163].
The drift function criterion was first observed by Popov [320] for countable chains,
with general space versions given by Nummelin and Tuominen [305] and Tweedie [400].
The full set of equivalences in Theorem 15.0.1 is new, although much of it is implicit in
Nummelin and Tweedie [307] and Meyn and Tweedie [277].
Initial application of the results to queueing models can be found in Vere-Jones
[404] and Miller [284], although without the benefit of the drift criteria, such appli-
cations are hard work and restricted to rather simple structures. The bilinear model
in Section 15.5.2 is first analyzed in this form in Feigin and Tweedie [111]. Further
interpretation and exploitation of the form of (15.4) is given in the next chapter, where
we also provide a much wider variety of applications of these results.
In general, establishing exact rates of convergence or even bounds on such rates
remains (for infinite state spaces) an important open problem, although by analyzing
Kendall's Theorem in detail Spieksma [367] has recently identified upper bounds on the region of convergence for some specific queueing models.
Added in second printing: There has now been a substantial amount of work on this
problem, and quite different methods of bounding the convergence rates have been found
by Meyn and Tweedie [282], Baxendale [21], Rosenthal [343, 342] and Lund and Tweedie
[241]. However, apart from the results in [241] which apply only to stochastically
monotone chains, none of these bounds are tight, and much remains to be done in this
area.
Commentary for the second edition: This is an evolving research area, and one
that is too large to summarize here. Section 20.1 contains a partial survey of the
state-of-the-art of geometric ergodicity and its applications. Applications to queueing
networks are surveyed in [267].
Chapter 16
V -Uniform ergodicity
In this chapter we introduce the culminating form of the geometric ergodicity theorem,
and show that such convergence can be viewed as geometric convergence of an operator
norm; simultaneously, we show that the classical concept of uniform (or strong) ergod-
icity, where the convergence in (13.4) is bounded independently of the starting point,
becomes a special case of this operator norm convergence.
We also take up a number of other consequences of the geometric ergodicity proper-
ties proven in Chapter 15, and give a range of examples of this behavior. For a number
of models, including random walk, time series and state space models of many kinds,
these examples have been held back to this point precisely because the strong form of
ergodicity we now make available is met as the norm, rather than as the exception. This
is apparent in many of the calculations where we verified the ergodic drift conditions
(V2) or (V3): often we showed in these verifications that the stronger form (V4) actu-
ally held, so that unwittingly we had proved V -uniform or geometric ergodicity when
we merely looked for conditions for ergodicity.
To formalize V -uniform ergodicity, let P1 and P2 be Markov transition functions,
and for a positive function ∞ > V ≥ 1, define the V -norm distance between P1 and P2
as
$$|||P_1 - P_2|||_V := \sup_{x \in X} \frac{\| P_1(x, \,\cdot\,) - P_2(x, \,\cdot\,) \|_V}{V(x)}. \qquad (16.1)$$
The outer product of the function 1 and the measure π is denoted 1 ⊗ π: this is the kernel given by [1 ⊗ π](x, A) := π(A) for all x ∈ X, A ∈ B(X).

V-uniform ergodicity

An ergodic chain Φ is called V-uniformly ergodic if

$$|||P^n - 1 \otimes \pi|||_V \to 0, \qquad n \to \infty. \qquad (16.2)$$
(iii) There exists some n > 0 such that |||Pⁱ − 1 ⊗ π|||_V < ∞ for i ≤ n and

$$|||P^n - 1 \otimes \pi|||_V < 1.$$

(iv) The drift condition (V4) holds for some petite set C and some V₀, where V₀ is equivalent to V in the sense that for some c ≥ 1,

$$c^{-1} V \le V_0 \le c V.$$
Proof That (i), (ii) and (iii) are equivalent follows from Proposition 16.1.3. The
fact that (ii) follows from (iv) is proven in Theorem 16.1.2, and the converse, that (ii)
implies (iv), is Theorem 16.1.4.
Secondly, we show that V-uniform ergodicity implies that the chain is strongly mixing. In fact, it is shown in Theorem 16.1.5 that for a V-uniformly ergodic chain, there exist R and ρ < 1 such that for any g², h² ≤ V and k, n ∈ Z₊,

$$|E_x[g(\Phi_k) h(\Phi_{n+k})] - E_x[g(\Phi_k)]\, E_x[h(\Phi_{n+k})]| \le R\, \rho^n\, [1 + \rho^k V(x)].$$
Finally in this chapter, using the form (16.3), we connect concepts of geometric
ergodicity with one of the oldest, and strongest, forms of convergence in the study of
Markov chains, namely uniform ergodicity (sometimes called strong ergodicity).
Uniform ergodicity
A chain Φ is called uniformly ergodic if it is V -uniformly ergodic in the
special case where V ≡ 1, that is, if
$$\sup_{x \in X} \| P^n(x, \,\cdot\,) - \pi \| \to 0, \qquad n \to \infty. \qquad (16.6)$$
There are a large number of stability properties all of which hold uniformly over the
whole space when the chain is uniformly ergodic.
Theorem 16.0.2. For any Markov chain Φ the following are equivalent:
(ii) There exist r > 1 and R < ∞ such that for all x

$$\| P^n(x, \,\cdot\,) - \pi \| \le R\, r^{-n}; \qquad (16.7)$$

that is, the convergence in (16.6) takes place at a uniform geometric rate.

(iv) The chain is aperiodic and Doeblin's condition holds: that is, there is a probability measure φ on B(X) and ε < 1, δ > 0, m ∈ Z₊ such that whenever φ(A) > ε,

$$\inf_{x \in X} P^m(x, A) \ge \delta.$$

(vii) The chain is aperiodic and there is a petite set C and a κ > 1 with

$$\sup_{x \in X} E_x[\kappa^{\tau_C}] < \infty.$$

$$\| P^n(x, \,\cdot\,) - \pi \| \le 2\rho^{n/m} \qquad (16.11)$$

where ρ = 1 − ν_m(X).
$$|f|_V := \sup_{x \in X} \frac{|f(x)|}{V(x)} < \infty.$$
Proof This is largely a restatement of the result in Theorem 15.4.1. From Theo-
rem 15.4.1 for some R < ∞, ρ < 1,

$$\| P^n(x, \,\cdot\,) - \pi \|_V \le R\, V(x)\, \rho^n, \qquad n \in \mathbb{Z}_+,$$
Because ||| · |||V is a norm it is now easy to show that V -uniformly ergodic chains are
always geometrically ergodic, and in fact V -geometrically ergodic.
Proposition 16.1.3. Suppose that π is an invariant probability and that for some n0 ,
Proof Since ||| · |||_V is an operator norm we have for any m, n ∈ Z₊, using the invariance of π,

$$|||P^{n+m} - 1 \otimes \pi|||_V = |||(P - 1 \otimes \pi)^n (P - 1 \otimes \pi)^m|||_V \le |||P^n - 1 \otimes \pi|||_V\; |||P^m - 1 \otimes \pi|||_V.$$

Writing n = kn₀ + i with 0 ≤ i < n₀, and setting M = max_{i<n₀} |||Pⁱ − 1 ⊗ π|||_V and γ = |||P^{n₀} − 1 ⊗ π|||_V < 1, this gives

$$|||P^n - 1 \otimes \pi|||_V \le |||P^i - 1 \otimes \pi|||_V\; |||P^{n_0} - 1 \otimes \pi|||_V^k \le M^i \gamma^k \le M^{n_0} \gamma^{-1} (\gamma^{1/n_0})^n$$
Theorem 16.1.4. Suppose that Φ is ψ-irreducible, and that for some V ≥ 1 there exist r > 1 and R < ∞ such that for all n ∈ Z₊

$$|||P^n - 1 \otimes \pi|||_V \le R\, r^{-n}. \qquad (16.13)$$

Then the drift condition (V4) holds for some V₀, where V₀ is equivalent to V in the sense that for some c ≥ 1,

$$c^{-1} V \le V_0 \le c V. \qquad (16.14)$$
Proof Fix C ∈ B⁺(X) as any petite set. Then we have from (16.13) the bound

$$P^n(x, C) \ge \pi(C) - R\, \rho^n\, V(x), \qquad \rho := r^{-1} < 1,$$

and hence the sublevel sets of V are petite by Proposition 5.5.4 (i), and so V is unbounded off petite sets.
From the bound
$$P^n V \le R\rho^n V + \pi(V) \qquad (16.15)$$
we see that (15.35) holds for the n-skeleton whenever Rρn < 1. Fix n with Rρn < e−1 ,
and set

$$V_0(x) := \sum_{i=0}^{n-1} \exp[i/n]\, P^i V.$$

$$V_0 \le e\, n R V + n\pi(V),$$
This shows that (15.35) also holds for Φ, and hence by Lemma 15.2.8 the drift condition
(V4) holds with this V0 , and some petite set C.
Thus we have proved the equivalence of (ii) and (iv) in Theorem 16.0.1.
where the supremum is taken over all k ∈ Z+ , and all g and h such that |g(x)|, |h(x)| ≤ 1
for all x ∈ X.
In the following result we show that V-uniformly ergodic chains satisfy a much stronger property. We will call Φ V-geometrically mixing if there exist R < ∞, ρ < 1 such that

$$\sup |E_x[g(\Phi_k) h(\Phi_{n+k})] - E_x[g(\Phi_k)]\, E_x[h(\Phi_{n+k})]| \le R\, V(x)\, \rho^n, \qquad n \in \mathbb{Z}_+,$$

where we now extend the supremum to include all k ∈ Z₊, and all g and h such that g²(x), h²(x) ≤ V(x) for all x ∈ X.
Theorem 16.1.5. If Φ is V -uniformly ergodic, then there exists R < ∞ and ρ < 1
such that for any g 2 , h2 ≤ V and k, n ∈ Z+ ,
$$|E_x[g(\Phi_k) h(\Phi_{n+k})] - E_x[g(\Phi_k)]\, E_x[h(\Phi_{n+k})]| \le R\, \rho^n\, [1 + \rho^k V(x)],$$
and hence the chain Φ is V -geometrically mixing.
Proof For any h² ≤ V, g² ≤ V let h̄ = h − π(h), ḡ = g − π(g). We have by V-uniform ergodicity as in Lemma 15.2.9 that for some R < ∞, ρ < 1,

$$|E_x[\bar h(\Phi_k)\, \bar g(\Phi_{k+n})]| = \bigl|E_x\bigl[\bar h(\Phi_k)\, E_{\Phi_k}[\bar g(\Phi_n)]\bigr]\bigr| \le R\, \rho^n\, E_x\bigl[|\bar h(\Phi_k)|\, \sqrt{V(\Phi_k)}\bigr].$$

Since |h̄| ≤ (1 + ∫V^{1/2} dπ)V^{1/2} we can set R′ = R(1 + ∫V^{1/2} dπ) and apply (15.35) to obtain the bound

$$|E_x[\bar h(\Phi_k)\, \bar g(\Phi_{k+n})]| \le R'\, \rho^n\, E_x[V(\Phi_k)] \le R'\, \rho^n\, \Bigl[\frac{L}{1-\lambda} + \lambda^k V(x)\Bigr].$$

Assuming without loss of generality that ρ ≥ λ, and using the bounds

$$|\pi(h) - E_x[h(\Phi_k)]| \le R\, \rho^k \sqrt{V(x)}, \qquad |\pi(g) - E_x[g(\Phi_{k+n})]| \le R\, \rho^{k+n} \sqrt{V(x)}$$

gives the result for some R < ∞.
It follows from Theorem 16.1.5 that if the chain is V -uniformly ergodic, then for
some R1 < ∞,
$$|E_x[\bar h(\Phi_k)\, \bar g(\Phi_{k+n})]| \le R_1\, \rho^n\, [1 + \rho^k V(x)], \qquad k, n \in \mathbb{Z}_+, \qquad (16.16)$$

where h̄ = h − π(h), ḡ = g − π(g).
By integrating both sides of (16.16) over X, the initial condition x may be replaced
with a finite bound for any initial distribution µ with µ(V ) < ∞, and a mixing condition
will be satisfied for such initial conditions. In the particular case where µ = π we have
by stationarity and finiteness of π(V ) (see Theorem 14.3.7)
$$|E_\pi[\bar h(\Phi_k)\, \bar g(\Phi_{k+n})]| \le R_2\, \rho^n, \qquad k, n \in \mathbb{Z}_+, \qquad (16.17)$$
for some R2 < ∞; and hence the stationary version of the process satisfies a geometric
mixing condition under (V4).
where γ > 0.
Thus (V4) also holds on C^c, and we conclude that the chain is e^{γx}-uniformly ergodic. Moreover, from Theorem 16.0.1 we also have that

$$\Bigl|\int P^n(x, dy)\, e^{\gamma y} - \int \pi(dy)\, e^{\gamma y}\Bigr| < e^{\gamma x}\, r^{-n},$$
so that the moment-generating functions of the model, and moreover all polynomial
moments, converge geometrically quickly to their limits with known bounds on the
state-dependent constants.
This is the same result we showed in Section 15.1.4 for the forward recurrence time
chain on Z+ ; here we have used the drift conditions rather than the direct calculation
of hitting times to establish geometric ergodicity.
It is obvious from its construction that for this chain the condition Γ ∈ G + (γ) is
also necessary for geometric ergodicity.
The condition for uniform ergodicity for the forward recurrence time chain is also
trivial to establish, from the criterion in Theorem 16.0.2 (vi). We will only have this
condition holding if Γ is of bounded range so that Γ[0, c] = 1 for some finite c; in
this case we may take the state space X equal to the compact absorbing set [0, c]. The
existence of such a compact absorbing subset is typical of many uniformly ergodic chains
in practice.
Random walk on R+
Consider now the random walk on [0, ∞), defined by (RWHL1). Suppose that the model
has an increment distribution Γ such that
(a) the mean increment β = ∫ x Γ(dx) < 0;

(b) the distribution Γ is in G⁺(γ), for some γ > 0.

Let us choose V(x) = exp(sx), where 0 < s < γ is to be selected. Then we have

$$\Delta V(x)/V(x) = \int_{-x}^{\infty} \Gamma(dw)\,[e^{sw} - 1] + \Gamma(-\infty, -x]\,[e^{-sx} - 1]$$
$$\le \int_{-\infty}^{\infty} \Gamma(dw)\,[e^{sw} - 1] + \int_{-\infty}^{-x} \Gamma(dw)\,[1 - e^{sw}]. \qquad (16.19)$$
Proof Obviously (i) implies (iii); but from Proposition 16.1.3 we see that (iii)
implies (ii), which clearly implies (i) as required.
Note that uniform ergodicity implies, trivially, that the chain actually is π-irreducible
and aperiodic, since for π(A) > 0 there exists n with P n (x, A) ≥ π(A)/2 for all x.
We next prove that (v)–(viii) of Theorem 16.0.2 are equivalent to uniform ergodicity.
Theorem 16.2.2. The following are equivalent for a ψ-irreducible aperiodic chain:

(i) Φ is uniformly ergodic.

(ii) The state space X is ν_m-small for some m ∈ Z₊ and some non-trivial measure ν_m.

(iii) There is a petite set C with sup_{x∈X} E_x[τ_C] < ∞, in which case for every A ∈ B⁺(X) we have sup_{x∈X} E_x[τ_A] < ∞.

(iv) There is a petite set C and a κ > 1 with sup_{x∈X} E_x[κ^{τ_C}] < ∞, in which case for every A ∈ B⁺(X) we have sup_{x∈X} E_x[κ_A^{τ_A}] < ∞ for some κ_A > 1.

(v) There is an everywhere bounded solution V to (16.10) for some petite set C.
Proof Observe that the drift inequality (11.17) given in (V2) and the drift in-
equality (16.10) are identical for bounded V . The equivalence of (iii) and (v) is thus a
consequence of Theorem 11.3.11, whilst (iv) implies (iii) trivially and Theorem 15.2.6
shows that (v) implies (iv): such connections between boundedness of τA and solutions
of (16.10) are by now standard.
To see that (i) implies (ii), observe that if (i) holds, then Φ is π-irreducible and
hence there exists a small set A ∈ B+ (X). Then, by (i) again, for some n0 ∈ Z+ ,
inf x∈X P n 0 (x, A) > 0 which shows that X is small from Theorem 5.2.4.
The implication that (ii) implies (v) is equally simple. Let V ≡ 1, β = b = 1/2, and C = X. We then have

$$\Delta V = -\beta V + b I_C,$$
Historically, one of the most significant conditions for ergodicity of Markov chains
is Doeblin’s condition.
402 V -Uniform ergodicity
Doeblin’s condition
Suppose there exists a probability measure φ with the property that for
some m, ε < 1, δ > 0
for every x ∈ X.
From the equivalences in Theorem 16.2.1 and Theorem 16.2.2, we are now in a
position to give a very simple proof of the equivalence of uniform ergodicity and this
condition.
Theorem 16.2.3. An aperiodic ψ-irreducible chain Φ satisfies Doeblin’s condition if
and only if Φ is uniformly ergodic.
Proof Let C be any petite set with φ(C) > ε and consider the test function V(x) = 1 + I_{C^c}(x). Doeblin's condition gives P^m(x, C) ≥ δ for every x, so that for the m-skeleton

$$\Delta V(x) = P^m V(x) - V(x) \le 2 - \delta - V(x) = -\delta + I_C(x) \le -\tfrac12 \delta\, V(x) + I_C(x).$$
Hence V is a bounded solution to (16.10) for the m-skeleton, and it is thus the case that
the m-skeleton and the original chain are uniformly ergodic by the contraction property
of the total variation norm.
Conversely, we have from uniform ergodicity in the form (16.7) that for any ε > 0,
if π(A) ≥ ε then

$$P^n(x, A) \ge \varepsilon - R\rho^n \ge \varepsilon/2$$
for all n large enough that Rρn ≤ ε/2, and Doeblin’s condition holds with φ = π.
Thus we have proved the final equivalence in Theorem 16.0.2. We conclude by
exhibiting the one situation where the bounds on convergence are simply calculated.
Theorem 16.2.4. If a chain Φ satisfies

$$P^m(x, A) \ge \nu_m(A), \qquad x \in X, \; A \in \mathcal{B}(X), \qquad (16.20)$$

for a non-trivial measure ν_m, then

$$\| P^n(x, \,\cdot\,) - \pi \| \le 2\rho^{n/m} \qquad (16.21)$$

where ρ = 1 − ν_m(X).
Proof This can be shown using an elegant argument based on the assumption
(16.20) that the whole space is small which relies on a coupling method closely connected
to the way in which the split chain is constructed.
Write (16.20) as
P m (x, A) ≥ (1 − ρ)ν(A) (16.22)
where ν = νm /(1 − ρ) is a probability measure.
Assume first for simplicity that m = 1. Run two copies of the chain, one from the
initial distribution concentrated at x and the other from the initial distribution π. At
every time point either
(a) with probability 1 − ρ, choose for both chains the same next position from the
distribution ν, after which they will be coupled and then can be run with identical
sample paths; or
(b) with probability ρ, choose for each chain an independent position, using the dis-
tribution (as in the split chain construction) [P (x, · ) − (1 − ρ)ν( · )]/ρ, where x is
the current position of the chain.
If T denotes the first time that the two chains are matched using option (a), then P(T > n) = ρⁿ, so that

$$\| P^n(x, \,\cdot\,) - \pi \| \le 2 P(T > n) \le 2\rho^n \qquad (16.23)$$

which is (16.21).
When m > 1 we can use the contraction property as in Proposition 16.1.3 to give
(16.21) in the general case.
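The coupling construction in this proof is directly implementable. The sketch below runs it for a hypothetical 3 × 3 chain with m = 1, using the row-minimum minorization P(x, j) ≥ min_x P(x, j); the coupling attempts are i.i.d. Bernoulli(1 − ρ), so the empirical P(T > n) matches the bound ρⁿ in (16.23).

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
nu_m = P.min(axis=0)                     # minorization: P(x, j) >= nu_m(j)
rho = 1.0 - nu_m.sum()                   # rho = 1 - nu_m(X)
nu = nu_m / (1.0 - rho)                  # normalized coupling measure
residual = (P - (1.0 - rho) * nu) / rho  # residual kernel; rows sum to 1

def coupling_time(x, y, n_max=100):
    """Run two copies until step (a) fires; return that (coupling) time."""
    for n in range(1, n_max + 1):
        if rng.random() < 1.0 - rho:     # (a): move both together via nu
            return n
        x = rng.choice(3, p=residual[x]) # (b): independent residual moves
        y = rng.choice(3, p=residual[y])
    return n_max + 1

times = np.array([coupling_time(0, 2) for _ in range(20000)])
for n in (1, 3, 5, 8):
    print(n, (times > n).mean(), rho ** n)  # empirical P(T > n) vs rho^n
```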
The optimal use of these many equivalent conditions for uniform ergodicity depends
of course on the context of use. In practice, this last theorem, since it identifies the
exact rate of convergence, is perhaps the most powerful, and certainly gives substantial
impetus to identifying the actual minorization measure which renders the whole space
a small set.
It can also be of importance to use these conditions in assessing when uniform
convergence does not hold: for example, in the forward recurrence time chain V_δ⁺ it is immediate from Theorem 16.2.2 (iii) that, since the mean return time to [0, δ] from x is of order x, the chain cannot be uniformly ergodic unless the state space can be reduced to a compact set.
Similar remarks apply to random walk on the half line: we see this explicitly in the
simple random walk of Section 15.5, but it is a rather deeper result [69] that for general
random walk on [0, ∞), Ex [τ0 ] ∼ cx so such chains are never uniformly ergodic.
$$\ell_m(i) := \Bigl[\frac{i-1}{i\beta}\Bigr]^m, \qquad i \ge 1, \; m \ge 0.$$

Note that for i = 1 we have ℓ_m(1) = 0 for all m, but for i > 1

$$\Bigl[\frac{i-1}{i\beta}\Bigr]^{m+1} - \Bigl[\frac{i-1}{i\beta}\Bigr]^m = \Bigl[\frac{i-1}{i\beta}\Bigr]^m\, \frac{i-1-i\beta}{i\beta} \ge 1$$

since (i − 1 − iβ)/(iβ) ≥ (3i − 1)/i ≥ 2. Hence from the second rung up, this sequence ℓ_m(i) forms a strictly monotone increasing set of states along the rung.
The transition mechanism we consider provides a chain satisfying Doeblin's condition. We suppose P is given by

$$P(i, \ell_m(i);\, 0, 0) = 1 - \beta, \qquad i = 1, 2, \ldots, \; m = 1, 2, \ldots,$$
$$P(0, 0;\, i, j) = \alpha_{ij}, \qquad i, j \in X,$$
$$P(0, k;\, 0, 0) = 1, \qquad k > 0;$$

$$E_{i,\ell_m(i)}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\Bigr] \le \Bigl[\frac{i-1}{i\beta}\Bigr]^m i, \qquad i, m = 1, 2, \ldots;$$
$$E_{i,k}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\Bigr] = k, \qquad k \ne \ell_m(i), \; m = 1, 2, \ldots.$$
and all other values except α00 as zero, and where c is chosen to ensure that the αik
form a probability distribution.
With this choice we have

$$E_{0,0}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\Bigr] \le 1 + \sum_{i \ge 1} \sum_{k \ne \ell_m(i),\, m \ge 0} k\, 2^{-i-k} + \sum_{i \ge 1} \sum_{m=0}^{\infty} 2^{-i-\ell_m(i)}\, i \le 1 + 2 \sum_{i \ge 1} i\, 2^{-i} < \infty$$
so that the chain is certainly f-ergodic by Theorem 14.0.1. However for any r ∈ (1, β⁻¹),

$$E_{i,1}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\, r^n\Bigr] = (1-\beta) \sum_{n=0}^{\infty} \beta^n r^n \sum_{m=0}^{n} \ell_m(i) \ge (1-\beta) \sum_{n=0}^{\infty} (\beta r)^n \sum_{m=0}^{n} \Bigl(\Bigl[\frac{i-1}{i\beta}\Bigr]^m - 1\Bigr)$$
$$= (1-\beta) \sum_{n=0}^{\infty} (\beta r)^n \Bigl(\frac{[(i-1)/i\beta]^{n+1} - 1}{[(i-1)/i\beta] - 1} - (n+1)\Bigr),$$

which is infinite if

$$\Bigl[\frac{i-1}{i\beta}\Bigr] \beta r > 1;$$
that is, for those rungs i such that i > r/(r − 1). Since there is positive probability of reaching such rungs in one step from (0, 0) it is immediate that

$$E_{0,0}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\, r^n\Bigr] = \infty$$

for all r > 1, and hence from Theorem 15.4.2 for all r > 1

$$\sum_n r^n\, \| P^n(0, 0;\, \cdot\,) - \pi \|_f = \infty.$$
We have thus demonstrated that the strongest rate of convergence in the simple total
variation norm may not be inherited, even by the simplest of unbounded functions; and
that one really needs, when considering such functions, to use criteria such as (V4) to
ensure that these functions converge geometrically.
Theorem 16.2.5. If Φ is a ψ-irreducible and aperiodic T-chain, and if the state space
X is compact, then Φ is uniformly ergodic.
One specific model, the nonlinear state space model, is also worth analyzing in more
detail to show how we can identify other conditions for uniform ergodicity.
In a manner similar to the proof of Theorem 16.2.5 we show that the NSS(F) model defined by (NSS1) and (NSS2) is uniformly ergodic, provided that the associated control model CM(F) is stable in the sense of Lagrange, so that in effect the state space is reduced to a compact invariant subset.
Lagrange stability
The CM(F ) model is called Lagrange stable if A+ (x) is compact for each
x ∈ X.
Typically in applications, when the CM(F ) model is Lagrange stable the input
sequence will be constrained to lie in a bounded subset of Rp . We stress however that
no conditions on the input are made in the general definition of Lagrange stability.
The key to analyzing the NSS(F ) corresponding to a Lagrange stable control model
lies in the following lemma:
Lemma 16.2.6. Suppose that the CM(F ) model is forward accessible, Lagrange stable,
M -irreducible and aperiodic, and suppose that for the NSS(F ) model conditions (NSS1)–
(NSS3) are satisfied.
Then for each x ∈ X the set A+ (x) is closed, absorbing, and small.
Proof By Lagrange stability it is sufficient to show that any compact and invariant
set C ⊂ X is small. This follows from Theorem 7.3.5 (ii), which implies that compact
sets are small under the conditions of the lemma.
Using Lemma 16.2.6 we now establish geometric convergence of the expectation of
functions of Φ:
Theorem 16.2.7. Suppose the NSS(F ) model satisfies conditions (NSS1)–(NSS3) and
that the associated control model CM(F ) is forward accessible, Lagrange stable, M -
irreducible and aperiodic.
Then a unique invariant probability π exists, and the chain restricted to the absorbing
set A+ (x) is uniformly ergodic for each initial condition.
Hence also for every function f : X → R which is uniformly bounded on compact
sets, and every initial condition,

$$E_y[f(\Phi_k)] \to \int f\, d\pi$$

at a geometric rate.
Recall that VC (x) = Ex [σC ] is the minimal solution to (16.25) from Theorem 11.3.5.
and

$$\int_{V(y) < V(x)} P(x, dy)\, \bigl(V(y) - V(x)\bigr)^2 \le d, \qquad (16.27)$$
$$\cdots + \frac{\delta^2}{2}\,\bigl(V(y) - V(x)\bigr)^2 \exp\{\delta\theta_x\, (V(y) - V(x))\} \qquad (16.28)$$

for some θ_x ∈ [0, 1], by using a second order Taylor expansion. Since V satisfies (16.25), the right hand side of (16.28) is bounded for x ∈ C^c by

$$1 - \delta + \frac{\delta^2}{2} \int_{V(y) < V(x)} P(x, dy)\, \bigl(V(y) - V(x)\bigr)^2 + \int_{V(y) \ge V(x)} P(x, dy)\, \frac{\delta^2}{2}\, \bigl(V(y) - V(x)\bigr)^2 \exp\{\delta\,(V(y) - V(x))\}$$
$$\le 1 - \delta + \frac{\delta^2}{2}\, d + \frac{\delta^{2-\xi}}{2} \int_{V(y) \ge V(x)} P(x, dy)\, \exp\{(\delta + \delta^{\xi/2})\,(V(y) - V(x))\}$$
$$\le 1 - \delta + \frac{\delta^2}{2}\, d + \frac{\delta^{2-\xi}}{2}\, c, \qquad (16.29)$$

for some ξ ∈ (0, 1) such that δ + δ^{ξ/2} < β, by virtue of (16.26) and (16.27), and the fact that x² is bounded by eˣ on R₊. This proves the theorem, since we have

$$1 - \delta + \frac{\delta^2}{2}\, d + \frac{\delta^{2-\xi}}{2}\, c < 1$$

for sufficiently small δ > 0, and thus (V4) holds for V*.
The typical example of this behavior, on which this proof is modeled, is the random
walk in Section 16.1.3. In that case V (x) = x, and (16.26) is the requirement that
Γ ∈ G + (γ). In this case we do not actually need (16.27), which may not in fact hold.
It is often easier to verify the conditions of this theorem than to evaluate directly the
existence of a test function for geometric ergodicity, as we shall see in the next section.
How necessary are the conditions of this theorem on the “tails” of the increments?
By considering for example the forward recurrence time chain, we see that for some
chains Γ ∈ G + (γ) may indeed be necessary for geometric ergodicity. However, geometric
tails are certainly not always necessary for geometric ergodicity: to demonstrate this
simply consider any i.i.d. process, which is trivially uniformly ergodic, regardless of its
“increment” structure.
It is interesting to note, however, that although they seem somewhat “proof depen-
dent”, the uniform bounds (16.26) and (16.27) on P that we have imposed cannot be
weakened in general when moving from ergodicity to geometric ergodicity.
We first show that we can ensure lack of geometric ergodicity if the drift to the right
is not uniformly controlled in terms of V as in (16.26), even for a chain satisfying all our
other conditions. To see this we consider a chain on Z+ with transition matrix given
by, for each i ∈ Z₊,

$$P(0, i) = \alpha_i > 0,$$
$$P(i, i-1) = \gamma_i > 0,$$
$$P(i, i+n) = [1 - \gamma_i][1 - \beta_i]\,\beta_i^n, \qquad n \in \mathbb{Z}_+, \qquad (16.30)$$

where ∑_i α_i = 1 and γ_i, β_i are less than unity for all i.
Provided ∑_i i α_i < ∞ and we choose γ_i sufficiently large that

$$[1 - \gamma_i]\beta_i/[1 - \beta_i] - \gamma_i \le -\varepsilon$$

for some ε > 0, then the chain is ergodic since V(x) = x satisfies (V2): this can be done if we choose, for example,

$$\gamma_i \ge \beta_i + \varepsilon[1 - \beta_i].$$
so P0 (τ0 > n) does not decrease geometrically quickly, and the chain is not geometrically
ergodic from Theorem 15.4.2 (or directly from Theorem 15.1.1).
In this example we have bounded variances for the left tails of the increment dis-
tributions, and exponential tails of the right increments: it is the lack of uniformity in
these tails that fails along with the geometric convergence.
To show the need for (16.27), consider the chain on Z+ with the transition matrix
(15.20) given for all j ∈ Z+ by P (0, 0) = 0 and
then clearly the right hand increments are uniformly bounded in relation to V for j > 0: but we find that

$$\sum_j P(i, j)\,\bigl(V_0(j) - V_0(i)\bigr)^2 = P(i, 0)\,[1 - \beta_i]^{-2} = [1 - \beta_i]^{-1} \to \infty, \qquad i \to \infty.$$
Hence (16.27) is necessary in this model for the conclusion of Theorem 16.3.1 to be
valid.
We say that the chain is “g-skip-free to the left” if there is some k ∈ Z+ , such that for
all x ∈ X,
P (x, Ag ,k (x)) = 0, (16.31)
so that the chain can only move a limited amount of “distance” through the sublevel
sets of g in one step. Note that such skip-free behavior precludes Doeblin’s condition if
g is unbounded off petite sets, and requires a more random-walk-like behavior.
Theorem 16.3.2. Suppose that Φ is geometrically ergodic. Then there exists β > 0
such that
$$\int \pi(dy)\, e^{\beta V_C(y)} < \infty \qquad (16.32)$$
Proof From geometric ergodicity, we have from Theorem 15.2.4 that for any petite set C ∈ B⁺(X) there exists r > 1 such that V(y) = G_C^{(r)}(y, X) satisfies (V4). It follows from Theorem 14.3.7 that π(V) < ∞. Using the interpretation (15.29) we have that

$$\infty > \pi(V) \ge \int \pi(dy)\, E_y[r^{\sigma_C}]. \qquad (16.34)$$
we have

$$\pi(j) \propto \gamma_j\,[1 - \beta_j]^{-1}$$
and so for suitable choice of γj we can clearly ensure that the tails of π are geometric
or otherwise in the given topology, regardless of the geometric ergodicity of P .
Proof We have seen in Section 11.4 that V (i) = i is a solution to (16.25) with
C = {0}.
Let us now assume that the service time distribution H ∈ G⁺(γ). We prove that (16.26) and (16.27) hold. Application of Theorem 16.3.1 then proves V*-uniform ergodicity of the embedded Markov chain, where V*(i) = e^{δi} for some δ > 0.

Let a_k denote the probability of k arrivals within one service. Note that (16.27) trivially holds, since ∑_{j≤k} P(k, j)(j − k)² ≤ a₀. For l ≥ 0 we have

$$P(k, k+l) = a_{l+1} = \frac{1}{(l+1)!} \int_0^{\infty} e^{-\lambda t} (\lambda t)^{l+1}\, dH(t).$$

Let δ > 0, so that

$$\sum_{l \ge 0} e^{\delta(l+1)}\, P(k, k+l) \le \int_0^{\infty} \exp\{(e^{\delta} - 1)\lambda t\}\, dH(t),$$

which is finite when (e^δ − 1)λ < γ, since H ∈ G⁺(γ). Thus we have the result.
To apply the results in this section, observe that for this embedded chain there are only finitely many different possible one-step increments, depending on whether Φ_{kn} exceeds k̄ or equals x < k̄. Combined with the linearity of V, we conclude that both sums

$$\Bigl\{\sum_{j: V(j) \ge V(i)} P(i, j)\, e^{\lambda(V(j) - V(i))} : i \in X\Bigr\}$$

and

$$\Bigl\{\sum_{j: V(j) < V(i)} P(i, j)\, \bigl(V(j) - V(i)\bigr)^2 : i \in X\Bigr\}$$
have only finitely many non-zero elements. We must ensure that these expressions are
all finite, but it is straightforward to check as in Theorem 16.4.1 that convergence of
the Laplace–Stieltjes transforms of the service time distributions in a neighborhood of
0 is sufficient to achieve this, and the theorem follows.
for some constant c > 0, so that V₀(i) = c i₁ + c i₂ is thus linear in both components of the state variable for i ≠ 0.
Theorem 16.4.3. The h-approximation of the M/PH/1 queue as in (16.36) is geo-
metrically ergodic whenever it is ergodic, provided the phase distribution of the service
times is in G + (γ) for some γ > 0.
In particular if there are a finite number of phases ergodicity is equivalent to geo-
metric ergodicity for the h-approximation.
$$\Phi_{k+1} = (A + \Gamma_{k+1})\Phi_k + W_{k+1}. \qquad (16.38)$$
Such models are developed in detail in [299], and we will assume familiarity with the
Kronecker product “⊗” and the “vec” operations, used in detail there. In particular we
use the basic identities
$$\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B), \qquad (A \otimes B)' = (A' \otimes B'). \qquad (16.39)$$
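As a quick numerical illustration (not from the text), the identities (16.39) can be checked directly with random matrices; here “vec” is the column-stacking operator, matching the convention in [299].

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

def vec(M):
    # Column-stacking vec operator (Fortran order).
    return M.reshape(-1, order="F")

# vec(ABC) = (C' kron A) vec(B)
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# (A kron B)' = (A' kron B')
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
```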
(RCA2) The following expectations exist, and have the prescribed values:
Theorem 16.5.1. If the assumptions (RCA1)–(RCA3) hold for the Markov chain de-
fined in (16.38), then Φ is V -uniformly ergodic, where V (x) = |x|2 . Thus these as-
sumptions suffice for a second-order stationary version of Φ to exist.
Proof   Under the assumptions of the theorem the chain is weak Feller and we can
take ψ as $\mu_{\mathrm{Leb}}$ on $\mathbb{R}^n$. Hence from Theorem 6.2.9 the chain is an irreducible T-chain,
and compact subsets of the state space are petite. Aperiodicity is immediate from
the density assumption (RCA3). We could also apply the techniques of Chapter 7 to
conclude that Φ is a T-chain, and this would allow us to weaken (RCA3).
To prove $|x|^2$-uniform ergodicity we will use the following two results, which are
proved in [299]. Suppose that (RCA1) and (RCA2) hold, and let N be any n × n
positive definite matrix.
(i) If M is defined by (16.40), then
$$E[\Phi_k'(A + \Gamma_{k+1})' M (A + \Gamma_{k+1})\Phi_k \mid \Phi_k = x] = x'Mx - x'Nx. \qquad (16.41)$$
Now let N be any positive definite (n × n)-matrix and define M as in (16.40). Then
with $V(x) := x'Mx$ we obtain
$$PV(x) \le \lambda V(x) + L$$
for some λ < 1 and L < ∞, from which we see that (V4) follows, using Lemma 15.2.8.
Finally, note that for some constant c we must have $c^{-1}|x|^2 \le V(x) \le c|x|^2$, and the
result is proved.
This is far from (V4), but applying the operator P to the function $\tilde\theta^2 y^2$ gives
$$P\,\tilde\theta^2 y^2 = E\Big[\Big(\frac{\alpha\sigma_0^2\tilde\theta - \alpha\Sigma y W_1}{\sigma_0^2 + \Sigma y^2} + Z_1\Big)^2\big(\tilde\theta y + W_1\big)^2\Big]
= \sigma_z^2\tilde\theta^2 y^2 + \sigma_z^2\sigma_w^2 + \Big(\frac{\alpha}{\sigma_0^2+\Sigma y^2}\Big)^2 E\big[(\sigma_0^2\tilde\theta - \Sigma y W_1)^2(\tilde\theta y + W_1)^2\big]$$
When $\sigma_z^2 < 1$ we combine (16.45)–(16.47) to find, for any $1 > \rho > \max(\sigma_z^2, \alpha^4)$, con-
stants R < ∞ and ε₀ > 0 such that with V defined in (16.44), PV ≤ ρV + R. Applying
Theorem 16.1.2 and Lemma 15.2.8 we have proved
Proposition 16.5.2. The Markov chain Φ is V-uniformly ergodic whenever $\sigma_z^2 < 1$,
with V given by (16.44); and for all initial conditions x ∈ X, as k → ∞,
$$E_x[Y_k^2] \to \int y^2\, d\pi \qquad (16.48)$$
at a geometric rate.
Hence the performance of the closed loop system is characterized by the unique
invariant probability π.
From ergodicity of the model it can be shown that in steady state θ̃k = θk − E[θk |
Y0 , . . . , Yk ], and Σk = E[θ̃k2 | Y0 , . . . , Yk ]. Using these identities we now obtain bounds
on performance of the closed loop system by integrating the system equations with
respect to the invariant measure.
Taking expectations in (2.23) and (2.24) under the probability Pπ gives
Hence, by subtraction, and using the identity Eπ [|θ̃0 |2 ] = Eπ [Σ0 ], we can evaluate the
limit (16.48) as
$$E_\pi[Y_0^2] = \big(1 + \alpha^2 E_\pi[|\tilde\theta_0|^2]\big)\,\frac{\sigma_w^2}{1 - \sigma_z^2}. \qquad (16.49)$$
This shows precisely how the steady state performance is related to the disturbance
intensity σw2 , the parameter variation intensity σz2 , and the mean square parameter
estimation error Eπ [|θ̃0 |2 ].
Using obvious bounds on Eπ [Σ0 ] we obtain the following bounds on the steady state
performance in terms of the system parameters only:
[Figure 16.1 appears here: a plot of log₁₀ Y_k against k, for 0 ≤ k ≤ 1000, with the vertical axis ranging from 0 to 30.]
Figure 16.1: The output of the simple adaptive control model when the control Uk is set
equal to zero. The resulting process is equivalent to the dependent parameter bilinear
model with α = 0.99, Wk ∼ N (0, 0.01) and Zk ∼ N (0, 0.04).
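For readers who wish to reproduce Figure 16.1, the following sketch simulates the dependent parameter bilinear model with the parameters given in the caption. The recursions $\theta_{k+1} = \alpha\theta_k + Z_{k+1}$, $Y_{k+1} = \theta_k Y_k + W_{k+1}$ are assumed here, as in the dependent parameter bilinear model of Chapter 2; seeds and plotting choices are of course arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
alpha, n = 0.99, 1000
W = rng.normal(0.0, np.sqrt(0.01), n + 1)   # W_k ~ N(0, 0.01)
Z = rng.normal(0.0, np.sqrt(0.04), n + 1)   # Z_k ~ N(0, 0.04)

theta = np.zeros(n + 1)
Y = np.zeros(n + 1)
for k in range(n):
    Y[k + 1] = theta[k] * Y[k] + W[k + 1]       # bilinear output recursion
    theta[k + 1] = alpha * theta[k] + Z[k + 1]  # dependent parameter recursion

plt.plot(np.log10(np.abs(Y) + 1e-12))           # log scale, as in the figure
plt.xlabel("k"); plt.ylabel("log10 |Y_k|")
plt.show()
```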
16.6 Commentary*
This chapter brings together some of the oldest and some of the newest ergodic theorems
for Markov chains.
Initial results on uniform ergodicity for countable chains under, essentially, Doeblin’s
condition date to Markov [248]: transition matrices with a column bounded from zero
are often called Markov matrices. For general state space chains use of the condition
of Doeblin is in [93]. These ideas are strengthened in Doob [99], whose introduction
and elucidation of Doeblin’s condition as Hypothesis D (p. 192 of [99]) still guides the
analysis of many models and many applications, especially on compact spaces.
Other areas of study of uniformly ergodic (sometimes called strongly ergodic, or
quasi-compact) chains have a long history, much of it initiated by Yosida and Kakutani
[412] who considered the equivalence of (iii) and (v) in Theorem 16.0.2, as did Doob
[99]. Somewhat surprisingly, even for countable spaces the hitting time criterion of
Theorem 16.2.2 for uniformly ergodic chains appears to be as recent as the work of
Huang and Isaacson [164], with general-space extensions in Bonsdorff [38]; the obvious
value of a bounded drift function is developed in Isaacson and Tweedie [170] in the
countable space case. Nummelin ([303], Chapters 5.6 and 6.6) gives a discussion of
16.6. Commentary* 419
*
n
X= Hk ∪ E
k =0
where the Hi are disjoint absorbing sets and Φ restricted to any Hk is uniformly ergodic,
and E is uniformly transient.
The introduction to uniform ergodicity that we give here appears brief given the
history of such theory, but this is largely a consequence of the fact that we have built
up, for ψ-irreducible chains, a substantial set of tools which makes the approach to this
class of chains relatively simple.
Much of this simplicity lies in the ability to exploit the norm ||| · |||_V. This is a very
new approach. Although Kartashov [196, 197] has some initial steps in developing a
theory of general space chains using the norm ||| · |||_V, he does not link his results to
the use of drift conditions, and the appearance of V-uniform results is due largely to
recent observations of Hordijk and Spieksma [366, 163] in the countable space case.
Their methods are substantially different from the general state space version we
use, which builds on Chapter 15: the general space version was first developed in [277]
for strongly aperiodic chains. This approach shows that for V -uniformly ergodic chains,
it is in fact possible to apply the same quasi-compact operator theory that has been
exploited for uniformly ergodic chains, at least within the context of the space L∞ V .
This is far from obvious: it is interesting to note Kendall himself ([203], p. 183) saying
that “ ... the theory of quasi-compact operators is completely useless” in dealing with
geometric ergodicity, whilst Vere-Jones [406] found substantial difficulty in relating
standard operator theory to geometric ergodicity. This appears to be an area where
reasonable further advances may be expected in the theory of Markov chains.
It is shown in Athreya and Pantula [15] that an ergodic chain is always strong mixing.
The extension given in Section 16.1.2 for V -uniformly ergodic chains was proved for
bounded functions in [92], and the version given in Theorem 16.1.5 is essentially taken
from Meyn and Tweedie [277].
Verifying the V-uniform ergodicity properties is usually done through test functions
and drift conditions, as we have seen. Uniform ergodicity, by contrast, is generally either
trivial or rather difficult to verify in applications. Typically one must either take
the state space of the chain to be compact (or essentially compact), or be able to
apply the Doeblin or small set conditions, in order to obtain uniform ergodicity. The
identification of the rate of convergence in this last case is a powerful incentive to use
such an approach. The delightful proof in Theorem 16.2.4 is due to Rosenthal [341],
following the strong stopping time results of Aldous and Diaconis [1, 88], although the
result itself is inherent in Theorem 6.15 of Nummelin [303]. An application of this result
to Markov chain Monte Carlo methods is given by Tierney [385].
However, as we have shown, V -uniform ergodicity can often be obtained for some
V under much more readily obtainable conditions, such as a geometric tail for any
i.i.d. random variables generating the process. This is true for queues, general storage
models, and other random-walk-related models, as the application of the increment
analysis of Section 16.3 shows. Such chains were investigated in detail by Vere-Jones
[403] and Miller [284].
The results given in Section 16.3 and Section 16.3.2 are new in the case of general X,
but are based on a similar approach for countable spaces in Spieksma and Tweedie [368],
which also contains a partial converse to Theorem 16.3.2. There are some precursors to
these conditions: one obvious way of ensuring that P has the characteristics in (16.26)
and (16.27) is to require that the increments from any state are of bounded range, with
the range allowed depending on V, so that a bound of the form (16.50) holds for some b;
and in [243] it is shown that under the bounded range condition (16.50) an ergodic
chain is geometrically ergodic.
A detailed description of the polling system we consider here can be found in [2].
Note that in [2] the system is modeled slightly differently, with arrivals of the server at
each gate defining the times of the embedded process. The coupling construction used
to analyze the h-approximation to the phase-service model is based on [350] and clearly
is ideal for our type of argument. Further examples are given in [368].
For the adaptive control and linear models, as we have stressed, V -uniform ergodicity
is often actually equivalent to simple ergodicity: the examples in this chapter are chosen
to illustrate this. The analysis of the bilinear and the vector RCA model given here is
taken from Feigin and Tweedie [111]; the former had been previously analyzed by Tong
[387]. In a more traditional approach to RCA models through time series methods,
Nicholls and Quinn [299] also find (RCA2) appropriate when establishing conditions for
strict stationarity of Φ, and also when treating asymptotic results of estimators.
The adaptive model was introduced in [253] and a stability analysis appeared in
[270] where the performance bound (16.49) was obtained. Related results appeared in
[365, 148, 269, 130]. The stability of the multidimensional adaptive control model was
only recently resolved in Rayadurgam et al. [324].
Commentary for the second edition: In the first edition the vector-space setting
was credited to work of Kartashov (see preceding text). In fact its origin is the 1969 work
of Veinott [185] concerning controlled Markov models. Section 20.1 contains further
discussion on the recent evolution of topics in this chapter.
An early application of the skip-free condition is contained in [156], also in the
setting of controlled Markov models. Assumption (ii) of this paper is a version of the
g-skip-free property, in which the function g represents “reward” in a controlled model.
The implications of Doeblin’s condition to large deviations theory and to spectral
theory can be found in [140, 218, 408].
Chapter 17

Sample paths and limit theorems

Most of this chapter is devoted to the analysis of the series $S_n(g)$, where we define for
any function g on X,
$$S_n(g) := \sum_{k=1}^{n} g(\Phi_k). \qquad (17.1)$$
We are concerned primarily with four types of limit theorems for positive recurrent
chains possessing an invariant probability π:
(i) those which are based upon the existence of martingales associated with the chain;
(ii) the Strong Law of Large Numbers (LLN), which states that n−1 Sn (g) converges
to π(g) = Eπ [g(Φ0 )], the steady state expectation of g(Φ0 );
(iii) the Central Limit Theorem (CLT), which states that the sum Sn (g − π(g)), when
properly normalized, is asymptotically normally distributed;
(iv) the Law of the Iterated Logarithm (LIL) which gives precise upper and lower
bounds on the limit supremum of the sequence Sn (g − π(g)), again when properly
normalized.
The martingale results (i) provide insight into the structure of irreducible chains, and
make the proofs of more elementary ergodic theorems such as the LLN almost trivial.
Martingale methods will also prove to be very powerful when we come to the CLT for
appropriately stable chains.
The trilogy of the LLN, CLT and LIL provide measures of centrality and variability
for Φn as n becomes large: these complement and strengthen the distributional limit
theorems of previous chapters. The magnitude of variability is measured by the variance
given in the CLT, and one of the major contributions of this chapter is to identify the way
in which this variance is defined through the autocovariance sequence for the stationary
version of the process {g(Φk )}.
The three key limit theorems which we develop in this chapter using sample path
properties for chains which possess a unique invariant probability π are
LLN  We say that the Law of Large Numbers holds for a function g if
$$\lim_{n \to \infty} \frac{1}{n} S_n(g) = \pi(g) \quad \text{a.s. } [P_*]. \qquad (17.2)$$
CLT  We say that the Central Limit Theorem holds for g if there exists a constant
$0 < \gamma_g^2 < \infty$ such that for each initial condition x ∈ X,
$$\lim_{n \to \infty} P_x\big\{(n\gamma_g^2)^{-1/2} S_n(\bar g) \le t\big\} = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,$$
where $\bar g = g - \pi(g)$: that is, as n → ∞,
$$(n\gamma_g^2)^{-1/2} S_n(\bar g) \xrightarrow{d} N(0, 1).$$
LIL  When the CLT holds, we say that the Law of the Iterated Logarithm holds for g
if the limit infimum and limit supremum of the sequence
$$(2\gamma_g^2\, n \log\log(n))^{-1/2}\, S_n(\bar g)$$
are respectively −1 and +1 with probability one for each initial condition x ∈ X.
Strictly speaking, of course, the CLT is not a sample path limit theorem, although it
does describe the behavior of the sample path averages and these three “classical” limit
theorems obviously belong together.
Proofs of all of these results will be based upon martingale techniques involving the
path behavior of the chain, and detailed sample path analysis of the process between
visits to a recurrent atom.
Much of this chapter is devoted to proving that these limits hold under various
conditions. The following set of limit theorems summarizes a large part of this devel-
opment.
Theorem 17.0.1. Suppose that Φ is a positive Harris chain with invariant probability
π.
(i) The LLN holds for any g satisfying π(|g|) < ∞.
(ii) Suppose that Φ is V-uniformly ergodic. Let g be a function on X satisfying $g^2 \le V$,
and let $\bar g$ denote the centered function $\bar g = g - \int g\, d\pi$. Then the constant
$$\gamma_g^2 := E_\pi[\bar g^2(\Phi_0)] + 2\sum_{k=1}^{\infty} E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)] \qquad (17.3)$$
is well defined, non-negative and finite, and coincides with the asymptotic variance
$$\lim_{n \to \infty} \frac{1}{n} E_\pi\big[(S_n(\bar g))^2\big] = \gamma_g^2. \qquad (17.4)$$
(iii) If the conditions of (ii) hold and $\gamma_g^2 = 0$, then $n^{-1/2} S_n(\bar g) \to 0$ a.s. as n → ∞.
(iv) If the conditions of (ii) hold and if $\gamma_g^2 > 0$, then the CLT and LIL hold for the
function g.
Proof The LLN is proved in Theorem 17.1.7, and the CLT and LIL are proved in
Theorem 17.3.6 under conditions somewhat weaker than those assumed here.
It is shown in Lemma 17.5.2 and Theorem 17.5.3 that the asymptotic variance γg2 is
given by (17.3) under the conditions of Theorem 17.0.1, and the alternate representation
(17.4) of γg2 is given in Theorem 17.5.3. The a.s. convergence in (iii) when γg2 = 0 is
proved in Theorem 17.5.4.
While Theorem 17.0.1 summarizes the main results, the reader will find that there is
much more to be found in this chapter. We also provide here techniques for proving the
LLN and CLT in contexts far more general than given in Theorem 17.0.1. In particular,
these techniques lead to a functional CLT for f -regular chains in Section 17.4.
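Before turning to invariant σ-fields, here is a small numerical sketch of Theorem 17.0.1 (ii), using a linear AR(1) chain as an assumed test case (the chain, parameters, and names below are illustrative, not from the text): for $\Phi_{k+1} = a\Phi_k + W_{k+1}$ with W ~ N(0, 1) and g(x) = x, the autocovariance sum (17.3) evaluates in closed form to $\gamma_g^2 = (1-a)^{-2}$, which can be compared with the empirical variance of $n^{-1/2}S_n(g)$.

```python
import numpy as np

rng = np.random.default_rng(2)
a, n, reps = 0.5, 5_000, 400

def sn_over_sqrt_n():
    # One realization of n^{-1/2} S_n(g) for the AR(1) chain, with g(x) = x.
    phi, s = 0.0, 0.0
    for _ in range(n):
        phi = a * phi + rng.standard_normal()
        s += phi
    return s / np.sqrt(n)

samples = np.array([sn_over_sqrt_n() for _ in range(reps)])
gamma2 = 1.0 / (1.0 - a) ** 2   # closed form of (17.3) for this chain: gamma^2 = 4
print(samples.var(), gamma2)    # the empirical variance is close to gamma^2
```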
We begin with a discussion of invariant σ-fields, which form the basis of classical
ergodic theory.
When Y = IA for some A ∈ F, then the set A is called a Pµ -invariant event. The set
of all Pµ -invariant events is a σ-field, which we denote Σµ .
Suppose that an invariant probability measure π exists, and for now restrict attention
to the special case where µ = π. In this case, Σπ is equal to the family of invariant
events which is commonly used in ergodic theory (see for example Krengel [221]) and
is often denoted ΣI .
For a bounded, Pπ -invariant random variable Y we let hY denote the function
hY (x) := Ex [Y ], x ∈ X. (17.6)
Proof It follows from (17.7) that the adapted process (hY (Φk ), FkΦ ) is a convergent
martingale for which
lim hY (Φk ) = Y a.s. [Pπ ].
k →∞
When Φ0 ∼ π the process hY (Φk ) is also stationary, since Φ is stationary, and hence
the limit above shows that its sample paths are almost surely constant. That is, Y =
hY (Φk ) = hY (Φ0 ) a.s. [Pπ ] for all k ∈ Z+ .
It follows from Lemma 17.1.1 that if X ∈ L1 (Ω, F, Pπ ) then the Pπ -invariant random
variable E[X | Σπ ] is a function of Φ0 alone, which we shall denote X∞ (Φ0 ), or just
X∞ .
The function X∞ is significant because it describes the limit of the sample path
averages of {θk X}, as we show in the next result.
$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \theta^k X = X_\infty(x) \quad \text{a.s. } [P_x].$$
$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \theta^k X = E[X \mid \Sigma_\pi] = X_\infty(\Phi_0) \quad \text{a.s. } [P_\pi].$$
$$\int P_x\Big\{\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \theta^k X = X_\infty(x)\Big\}\, \pi(dx) = 1.$$
Since the integrand is always positive and less than or equal to one, this proves the
result.
$$Q\{A\} := \limsup_{k \to \infty} I\{\Phi_k \in A\}, \qquad \tilde\pi\{A\} := \limsup_{N \to \infty} \frac{1}{N}\sum_{k=1}^{N} I\{\Phi_k \in A\},$$
with A ∈ B(X).
A function h : X → R is called harmonic if, for all x ∈ X,
$$\int P(x, dy)\, h(y) = h(x). \qquad (17.8)$$
This is equivalent to the adapted sequence $(h(\Phi_k), \mathcal F_k^\Phi)$ possessing the martingale prop-
erty for each initial condition: that is,
$$E[h(\Phi_{k+1}) \mid \mathcal F_k^\Phi] = h(\Phi_k), \quad k \in Z_+, \quad \text{a.s. } [P_*].$$
For any measurable set A the function $h_{Q\{A\}}(x) = Q(x, A)$ is a measurable function of
x ∈ X which is easily shown to be harmonic. This correspondence is just one instance of
the following general result which shows that harmonic functions and invariant random
variables are in one-to-one correspondence in a well-defined way.
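On a finite state space this correspondence is easy to explore numerically: bounded harmonic functions are exactly the solutions of (P − I)h = 0, and for an irreducible transition matrix the solution space reduces to the constants. The following is a minimal sketch, with an assumed 3-state kernel, and of course not a substitute for the general proof.

```python
import numpy as np
from scipy.linalg import null_space

# An (assumed) irreducible 3-state transition matrix.
P = np.array([[0.1, 0.6, 0.3],
              [0.5, 0.2, 0.3],
              [0.3, 0.3, 0.4]])

H = null_space(P - np.eye(3))   # all solutions of P h = h
print(H.shape[1])               # 1: the harmonic functions form a one-dimensional space
print(H[:, 0] / H[0, 0])        # ... spanned by the constant function [1, 1, 1]
```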
Theorem 17.1.3. (i) If Y is bounded and invariant, then the function $h_Y$ is har-
monic, and
$$Y = \lim_{k \to \infty} h_Y(\Phi_k) \quad \text{a.s. } [P_*].$$
(ii) If h is bounded and harmonic, then the random variable $Y := \lim_{k\to\infty} h(\Phi_k)$ exists,
is bounded and invariant, and satisfies $h = h_Y$.
Proof For (i), first observe that by the Markov property and invariance we may
deduce as in the proof of Lemma 17.1.1 that
hY (Φk ) = E[Y | FkΦ ] a.s. [P∗ ].
Since Y is bounded, this shows that (hY (Φk ), FkΦ ) is a martingale which converges to
Y . To see that hY is harmonic, we use invariance of Y to calculate
P hY (x) = Ex [hY (Φ1 )] = Ex [E[Y | F1Φ ]] = hY (x).
To prove (ii), recall that the adapted process (h(Φk ), FkΦ ) is a martingale if h is har-
monic, and since h is assumed bounded, it is convergent. The conclusions of (ii) follow.
Theorem 17.1.3 shows that there is a one-to-one correspondence between invari-
ant random variables and harmonic functions. From this observation we have as an
immediate consequence
Proposition 17.1.4. The following two conditions are equivalent:
(i) All bounded harmonic functions are constant.
(ii) Σµ and hence Σ are Pµ -trivial for each initial distribution µ.
Finally, we show that when Φ is Harris recurrent, all bounded harmonic functions
are trivial.
Theorem 17.1.5. If Φ is Harris recurrent, then the constants are the only bounded
harmonic functions.
Proof We will give the proof for the LLN, since the proof of the result for the CLT
and LIL is identical.
Suppose that the LLN holds for the initial distribution µ₀, and let $g_\infty(x) =
P_x\{\frac{1}{n} S_n(g) \to \int g\, d\pi\}$. We have by assumption that
$$\int g_\infty\, d\mu_0 = 1.$$
We will now show that g∞ is harmonic, which together with Theorem 17.1.5 will imply
that g∞ is equal to the constant value 1, and thereby complete the proof. We have by
the Markov property and the smoothing property of the conditional expectation,
$$\begin{aligned}
P g_\infty(x) &= E_x\Big[P_{\Phi_1}\Big\{\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} g(\Phi_k) = \int g\, d\pi\Big\}\Big] \\
&= E_x\Big[P_x\Big\{\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} g(\Phi_{k+1}) = \int g\, d\pi \,\Big|\, \mathcal F_1^\Phi\Big\}\Big] \\
&= P_x\Big\{\lim_{n\to\infty} \Big[\frac{n+1}{n}\cdot\frac{1}{n+1}\sum_{k=1}^{n+1} g(\Phi_k) - \frac{g(\Phi_1)}{n}\Big] = \int g\, d\pi\Big\} \\
&= g_\infty(x).
\end{aligned}$$
From these results we may now provide a simple proof of the LLN for Harris chains.
Theorem 17.1.7. The following are equivalent when an invariant probability π exists
for Φ:
(i) Φ is positive Harris;
(ii) the LLN holds for every g satisfying π(|g|) < ∞;
(iii) every bounded invariant random variable is almost surely constant.
Proof (i) ⇒ (ii) If Φ is positive Harris with unique invariant probability π then
by Theorem 17.1.2, for each fixed f , there exists a set G ∈ B(X) of full π-measure such
that the conclusions of (ii) hold whenever the distribution of Φ0 is supported on G. By
Proposition 17.1.6 the LLN holds for every initial condition.
(ii) ⇒ (iii) Let Y be a bounded invariant random variable, and let hY be the as-
sociated bounded harmonic function defined in (17.6). By the hypotheses of (ii) and
Theorem 17.1.3 we have
$$Y = \lim_{k\to\infty} h_Y(\Phi_k) = \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} h_Y(\Phi_k) = \int h_Y\, d\pi \quad \text{a.s. } [P_*],$$
$$s_j(f) = \sum_{i=\sigma_\alpha(j)+1}^{\sigma_\alpha(j+1)} f(\Phi_i). \qquad (17.9)$$
By the strong Markov property the random variables $\{s_j(f) : j \ge 0\}$ are i.i.d. with
common mean
$$E_\alpha[s_1(f)] = E_\alpha\Big[\sum_{i=1}^{\tau_\alpha} f(\Phi_i)\Big] = \int f\, d\mu. \qquad (17.10)$$
i=1
Proof For the proof we assume that each of the functions f and g are positive.
The general case follows by decomposing f and g into their positive and negative parts.
We also assume that π is equal to the measure µ defined implicitly in (17.10). This
is without loss of generality as any invariant measure is a constant multiple of µ by
Theorem 10.0.1.
For $n \ge \sigma_\alpha$ we define
$$l_n := \max\{k : \sigma_\alpha(k) \le n\} = -1 + \sum_{k=0}^{n} I\{\Phi_k \in \alpha\}. \qquad (17.11)$$
17.2.2 The CLT and the LIL for chains possessing an atom
Here we show how the CLT and LIL may be proved under the assumption that an atom
α ∈ B + (X) exists.
The Central Limit Theorem (CLT) states that the normalized sum
$$(n\gamma_g^2)^{-1/2}\, S_n(\bar g)$$
converges in distribution to a standard Gaussian random variable, while the Law of the
Iterated Logarithm (LIL) provides sharp bounds on the sequence
$$(2\gamma_g^2\, n \log\log(n))^{-1/2}\, S_n(\bar g)$$
The actual variance of $\bar g(\Phi_k)$ in the stationary case is given by Theorem 10.0.1 as
$$\int \bar g^2\, d\pi = \pi\{\alpha\}\, E_\alpha\Big[\sum_{k=1}^{\tau_\alpha} \bar g^2(\Phi_k)\Big];$$
This condition will be generalized to obtain the CLT and LIL for general positive
Harris chains in Sections 17.3–17.5. We state here the results in the special case where
an atom is assumed to exist.
By the law of large numbers for the i.i.d. random variables $\{(s_j(|g|))^2 : j \ge 1\}$,
$$\lim_{N\to\infty} \frac{1}{N}\sum_{j=1}^{N} (s_j(|g|))^2 = E_\alpha[(s_0(|g|))^2] < \infty,$$
and hence
$$\lim_{N\to\infty}\Big[\frac{1}{N}\sum_{j=1}^{N} (s_j(|g|))^2 - \frac{1}{N-1}\sum_{j=1}^{N-1} (s_j(|g|))^2\Big] = 0.$$
From these two limits it follows that $(s_N(|g|))^2/N \to 0$ as N → ∞, and hence that
$$\limsup_{n\to\infty} \frac{s_{l_n}(|g|)}{\sqrt n} \le \limsup_{n\to\infty} \frac{s_{l_n}(|g|)}{\sqrt{l_n}} = 0 \quad \text{a.s. } [P_*]. \qquad (17.16)$$
This and (17.15) show that
$$\frac{1}{\sqrt n}\sum_{i=1}^{n} \bar g(\Phi_i) - \frac{1}{\sqrt n}\sum_{j=0}^{l_n-1} s_j(\bar g) \to 0 \quad \text{a.s. } [P_*]. \qquad (17.17)$$
We now need a more delicate argument to replace the random upper limit in the sum
$\sum_{j=0}^{l_n-1} s_j(\bar g)$ appearing in (17.17) with a deterministic upper bound.
First of all, note that
$$\frac{l_n}{\sum_{j=0}^{l_n} s_j(1)} \le \frac{l_n}{n} \le \frac{l_n}{\sum_{j=0}^{l_n-1} s_j(1)},$$
and hence
$$\lim_{n\to\infty} \frac{l_n}{n} = \Big(\lim_{n\to\infty} \frac{1}{l_n}\sum_{j=1}^{l_n} s_j(1)\Big)^{-1} = E_\alpha[s_0(1)]^{-1} = \pi\{\alpha\}. \qquad (17.18)$$
Let ε > 0, $\underline n = \lfloor(1-\varepsilon)\pi\{\alpha\}n\rfloor$, $\overline n = \lceil(1+\varepsilon)\pi\{\alpha\}n\rceil$, and $n^* = \lfloor\pi\{\alpha\}n\rfloor$, where $\lceil x\rceil$
($\lfloor x\rfloor$) denotes the smallest integer greater than (greatest integer smaller than) the real
number x. Then by the result above, for some n₀,
$$P_x\{\underline n \le l_n - 1 \le \overline n\} \ge 1 - \varepsilon, \qquad n \ge n_0. \qquad (17.19)$$
$$P_x\Big\{\Big|\frac{1}{\sqrt n}\sum_{j=0}^{l_n-1} s_j(\bar g) - \frac{1}{\sqrt n}\sum_{j=0}^{n^*} s_j(\bar g)\Big| > \beta\Big\}
\le \varepsilon + P_x\Big\{\max_{\underline n \le l \le n^*}\Big|\sum_{j=l}^{n^*} s_j(\bar g)\Big| > \beta\sqrt n\Big\}
+ P_x\Big\{\max_{n^* \le l \le \overline n}\Big|\sum_{j=n^*}^{l} s_j(\bar g)\Big| > \beta\sqrt n\Big\}$$
$$\frac{1}{\sqrt n}\sum_{j=0}^{l_n} s_j(\bar g) - \frac{1}{\sqrt n}\sum_{j=0}^{n^*} s_j(\bar g) \to 0,$$
and hence
$$\frac{1}{\sqrt n}\sum_{i=1}^{n} \bar g(\Phi_i) - \frac{1}{\sqrt n}\sum_{j=0}^{n^*} s_j(\bar g) \to 0 \qquad (17.20)$$
in probability. By the CLT for i.i.d. sequences, we may let $\sigma^2 = E_\alpha[(s_0(\bar g))^2]$, giving
$$\begin{aligned}
\lim_{n\to\infty} P_x\big\{(n\gamma_g^2)^{-1/2} S_n(\bar g) \le t\big\} &= \lim_{n\to\infty} P_x\Big\{(n\gamma_g^2)^{-1/2}\sum_{j=0}^{n^*} s_j(\bar g) \le t\Big\} \\
&= \lim_{n\to\infty} P_x\Big\{\sqrt{\frac{\lfloor n\pi\{\alpha\}\rfloor}{n\pi\{\alpha\}}}\,\frac{1}{\sqrt{n^*\sigma^2}}\sum_{j=0}^{n^*} s_j(\bar g) \le t\Big\} \\
&= \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx.
\end{aligned}$$
$$\limsup_{n\to\infty} \frac{1}{\sqrt{2\sigma^2\, l_n \log\log(l_n)}}\sum_{j=1}^{l_n} s_j(\bar g) = 1 \quad \text{a.s. } [P_*],$$
and the corresponding lim inf is −1. Equation (17.18) shows that $l_n/n \to \pi\{\alpha\} > 0$,
and hence by a simple calculation $\log\log(l_n)/\log\log(n) \to 1$ as n → ∞. These relations
together with (17.17) imply
$$\begin{aligned}
\limsup_{n\to\infty} \frac{1}{\sqrt{2\gamma_g^2\, n \log\log(n)}}\sum_{k=1}^{n} \bar g(\Phi_k)
&= \limsup_{n\to\infty} \frac{1}{\sqrt{\pi\{\alpha\}}}\,\frac{1}{\sqrt{2\sigma^2\, n \log\log(n)}}\sum_{j=1}^{l_n} s_j(\bar g) \\
&= \limsup_{n\to\infty} \sqrt{\frac{l_n \log\log(l_n)}{\pi\{\alpha\}\, n \log\log(n)}}\,\frac{1}{\sqrt{2\sigma^2\, l_n \log\log(l_n)}}\sum_{j=1}^{l_n} s_j(\bar g) \\
&= 1,
\end{aligned}$$
and the corresponding lim inf is equal to −1 by the same chain of equalities.
and hence, given that Yn = 1, the pre-nm process and post-(n + 1)m process are
independent: that is
Moreover, the distribution of the post-(n + 1)m process is the same as the P̌ν ∗ -
distribution of {(Φi , Yi ) : i ≥ 0}, with the interpretation that ν is “split” to form
ν ∗ as in (5.3) so that
Hence the set α̌ := C₁ := C × {1} behaves very much like an atom for the chain.
We let σ_α̌(0) denote the first entrance time of the split m-step chain to the set α̌,
and σ_α̌(k) the k-th entrance time to α̌ subsequent to σ_α̌(0), these random variables
being defined inductively in the usual way. For each i we set
$$s_i(f) = \sum_{j=\sigma_{\check\alpha}(i)+1}^{\sigma_{\check\alpha}(i+1)} Z_j(f),$$
where
$$Z_j(f) = \sum_{k=0}^{m-1} f(\Phi_{jm+k}).$$
From the remarks above and the strong Markov property we obtain the following result:
are independent for any m ≥ 2. The distribution of $s_i(f)$ is, for any i, equal to the
$\check P_{\check\alpha}$-distribution of the random variable $\sum_{k=m}^{\tau_{\check\alpha} m + m - 1} f(\Phi_k)$, which is equal to the $\check P_{\nu^*}$-
distribution of
$$\sum_{k=0}^{\sigma_{\check\alpha} m + m - 1} f(\Phi_k) = \sum_{k=0}^{\sigma_{\check\alpha}} Z_k(f). \qquad (17.22)$$
Proof   From the definition of $\{\sigma_{\check\alpha}(k)\}$ we have that the distribution of $s_{n+j}(f)$
given $s_0(f), \ldots, s_n(f)$ is equal to the distribution of $s_j(f)$ for all n ∈ Z₊, j ≥ 1. This
follows from the construction of $\{\sigma_{\check\alpha}(k)\}$, which makes the distribution of $\Phi_{\sigma_{\check\alpha}(n+j)m+m}$
given $\mathcal F^\Phi_{\sigma_{\check\alpha}(n+j)m} \vee \mathcal F^Y_{\sigma_{\check\alpha}(n+j)}$ equal to ν.
From this we see that $\{s_n(f) : n \ge 1\}$ is a stationary sequence and, moreover,
that $\{s_j(f)\}$ is a one-dependent process: that is, $\{s_0(f), \ldots, s_{n-1}(f)\}$ is independent
of $\{s_{n+1}(f), \ldots\}$ for all n ≥ 1.
From (17.22) we can express the common mean of $\{s_i(f)\}$ in terms of the invariant
mean of f as follows:
$$\begin{aligned}
\check E[s_i(f)] &= \check E_{\check\alpha}\Big[\sum_{k=1}^{\tau_{\check\alpha}} Z_k(f)\Big] \\
&= \check E_{\check\alpha}\Big[\sum_{k=1}^{\infty} Z_k(f)\, I\{k \le \tau_{\check\alpha}\}\Big] \\
&= \check E_{\check\alpha}\Big[\sum_{k=1}^{\infty} \check E_{\check\Phi_{mk}}[Z_1(f)]\, I\{k \le \tau_{\check\alpha}\}\Big] \\
&= \delta^{-1}\pi(C)^{-1}\int \pi(dy)\, E_y[Z_1(f)] \\
&= \delta^{-1}\pi(C)^{-1}\, m\int f\, d\pi,
\end{aligned}$$
where the fourth equality follows from the representation of π given in Theorem 10.0.1
applied to the split m-skeleton chain.
Define now, for each n ∈ Z₊, $l_n := \max\{i \ge 0 : m\sigma_{\check\alpha}(i) \le n\}$, and write
$$\sum_{k=1}^{n} f(\Phi_k) = \sum_{k=1}^{m\sigma_{\check\alpha}(0)+m-1} f(\Phi_k) + \sum_{i=0}^{l_n-1} s_i(f) + \sum_{k=m(\sigma_{\check\alpha}(l_n)+1)}^{n} f(\Phi_k). \qquad (17.24)$$
All of the ergodic theorems presented in the remainder of this section are based upon
Theorem 17.3.1 and the decomposition (17.24), valid for all n ≥ 1.
We now apply this construction to give an extension of the Law of Large Numbers.
Theorem 17.3.2. The following are equivalent when a σ-finite invariant measure π
exists for Φ:
(i) For every f, g ∈ L¹(π) with $\int g\, d\pi \ne 0$,
$$\lim_{n\to\infty} \frac{S_n(f)}{S_n(g)} = \frac{\pi(f)}{\pi(g)} \quad \text{a.s. } [P_*].$$
Proof We just prove the equivalence between (i) and (iii). The equivalence of (i)
and (ii) follows from the Chacon–Ornstein Theorem (see Theorem 3.2 of Revuz [326]),
and the same argument that was used in the proof of Theorem 17.1.7.
The “if” part is trivial: if $\int f\, d\pi > 0$, then by the ratio limit result which is assumed
to hold,
$$P_x\{f(\Phi_i) > 0 \ \text{i.o.}\} = 1$$
for all initial conditions, which is seen to be a characterization of Harris recurrence by
taking f to be an indicator function.
To prove that (iii) implies (i) we will make use of the decomposition (17.24) and
essentially the same proof that was used when an atom was assumed to exist in Theo-
rem 17.2.1.
From (17.24) we have
$$\frac{\sum_{i=1}^{n} f(\Phi_i)}{\sum_{i=1}^{n} g(\Phi_i)} \le \frac{\dfrac{1}{l_n}\Big[\sum_{j=0}^{l_n} s_j(f) + \sum_{k=1}^{m\sigma_{\check\alpha}(0)+m-1} f(\Phi_k)\Big]}{\dfrac{l_n-1}{l_n}\cdot\dfrac{1}{l_n-1}\sum_{j=0}^{l_n-1} s_j(g)}.$$
$$\begin{aligned}
\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} s_k(f) &= \lim_{N\to\infty} \frac{1}{N}\sum_{\substack{k=1 \\ k\ \mathrm{odd}}}^{N} s_k(f) + \lim_{N\to\infty} \frac{1}{N}\sum_{\substack{k=1 \\ k\ \mathrm{even}}}^{N} s_k(f) \\
&= \tfrac12\,\delta^{-1}\pi(C)^{-1}\, m\int f\, d\pi + \tfrac12\,\delta^{-1}\pi(C)^{-1}\, m\int f\, d\pi \\
&= \delta^{-1}\pi(C)^{-1}\, m\int f\, d\pi.
\end{aligned}$$
Theorem 17.3.3. Suppose that Φ is positive Harris, and suppose that π(|g|) < ∞.
Then the following limit holds:
$$\lim_{n\to\infty} \frac{1}{n}\max_{1\le k\le n} |g(\Phi_k)| = 0 \quad \text{a.s. } [P_*].$$
Since by Theorem 17.3.2 we have $\frac{1}{n}\sum_{k=1}^{n} g(\Phi_k) - \frac{1}{n-1}\sum_{k=1}^{n-1} g(\Phi_k) \to 0$, it follows that (17.25) does
hold, and the proof is complete.
To illustrate the application of the LLN to the stability of stochastic models we will
now consider a linear system with random coefficients.
Proposition 17.3.4. If (DBL1) and (DBL2) hold, then θ is positive Harris recurrent
with invariant probability $\pi_\theta$. For any f : R → R satisfying
$$\int_{\mathbb R} \{f(x) \vee 0\}\, \pi_\theta(dx) < \infty$$
we have
$$\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(\theta_k) = \int_{\mathbb R} f(x)\, \pi_\theta(dx) \quad \text{a.s. } [P_*].$$
When $\theta_0 \sim \pi_\theta$ the process is strictly stationary and may be defined on the positive
and negative time set Z. For this stationary process, the backwards LLN holds:
$$\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(\theta_{-k}) = \int_{\mathbb R} f(x)\, \pi_\theta(dx) \quad \text{a.s. } [P_{\pi_\theta}]. \qquad (17.26)$$
Proof   The positivity of θ has already been noted prior to the proposition. The
first limit then follows from Theorem 17.1.7 when $\int_{\mathbb R} f(x)\, \pi_\theta(dx) > -\infty$. Otherwise, we
have from Theorem 17.1.7 and integrability of f ∨ 0, for any M > 0,
$$\limsup_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(\theta_k) \le \limsup_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(\theta_k) \vee (-M) = \int_{\mathbb R} \{f(x) \vee (-M)\}\, \pi_\theta(dx),$$
and the result follows on letting M → ∞.
Proposition 17.3.5. Suppose that (DBL1) and (DBL2) hold, and that
$$\int_{\mathbb R} \log|x|\, \pi_\theta(dx) < 0. \qquad (17.27)$$
Then the joint process Φ = (Y, θ) is positive recurrent and aperiodic.
Proof   To begin, recall from Theorem 7.4.1 that the joint process Φ = (Y, θ) is a
ψ-irreducible and aperiodic T-chain.
For y ∈ R fixed, let $\mu_y = \pi_\theta \times \delta_y$ denote the initial distribution which makes θ a
stationary process, and Y₀ = y a.s. We will show that the distributions of Y, and hence
of Φ, are tight whenever Φ₀ ∼ µ_y. From the Feller property and Theorem 12.1.2, this
is sufficient to prove the theorem.
$$Y_{k+1} = \sum_{j=1}^{k}\Big(\prod_{i=j}^{k}\theta_i\Big)W_j + \Big(\prod_{i=0}^{k}\theta_i\Big)Y_0 + W_{k+1}. \qquad (17.28)$$
Establishing stability is then largely a matter of showing that the product $\prod_{i=j}^{k}\theta_i$
converges to zero sufficiently fast. To obtain such convergence we will apply the LLN
Proposition 17.3.4 and (17.27), which imply that as n → ∞,
$$\frac{1}{n}\log\Big(\prod_{i=0}^{n}\theta_{-i}^2\Big) = \frac{2}{n}\sum_{i=0}^{n}\log|\theta_{-i}| \to 2\int_{\mathbb R}\log|x|\, \pi_\theta(dx) < 0. \qquad (17.29)$$
We will see that this limit, together with stationarity of the parameter process, implies
exponential convergence of the product $\prod_{i=j}^{k}\theta_i$ to zero. This will give us the desired
bounds on Y.
To apply (17.29), fix constants L < ∞, 0 < ρ < 1, let $\Pi_{j,k} = \prod_{i=j}^{k}\theta_i$, and use
(17.28) and the inequality $ab \le \tfrac12(a^2 + b^2)$ to obtain the bound
$$\begin{aligned}
P_{\mu_y}\{|Y_{k+1}| \ge L\}
&\le P_{\mu_y}\Big\{\sum_{j=1}^{k}|\Pi_{j,k}||W_j| + |\Pi_{0,k}||y| + |W_{k+1}| \ge L\Big\} \\
&\le P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{-(k-j)}\Pi_{j,k}^2 + \sum_{j=0}^{k}\rho^{(k-j)}W_{j+1}^2 \ge 2L - (y^2+1)\Big\} \\
&\le P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{-(k-j)}\Pi_{j,k}^2 \ge L - \frac{1+y^2}{2}\Big\} + P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{(k-j)}W_{j+1}^2 \ge L - \frac{1+y^2}{2}\Big\}.
\end{aligned}$$
We now use stationarity of θ and independence of W to move the time indices within
the probabilities on the right hand side of this bound:
$$\begin{aligned}
P_{\mu_y}\{|Y_{k+1}| \ge L\}
&\le P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{-(k-j)}\Pi_{-(k-j),0}^2 \ge L - \frac{1+y^2}{2}\Big\} + P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{(k-j)}W_{k-j}^2 \ge L - \frac{1+y^2}{2}\Big\} \\
&\le P_{\mu_y}\Big\{\sum_{\ell=0}^{\infty}\rho^{-\ell}\Pi_{-\ell,0}^2 \ge L - \frac{1+y^2}{2}\Big\} + P_{\mu_y}\Big\{\sum_{\ell=0}^{\infty}\rho^{\ell}W_\ell^2 \ge L - \frac{1+y^2}{2}\Big\}. \qquad (17.30)
\end{aligned}$$
From Fubini's Theorem we have, for any 0 < ρ < 1, that the sum $\sum_{\ell=0}^{\infty}\rho^\ell W_\ell^2$ converges
a.s. to a random variable with finite mean $\sigma_w^2(1-\rho)^{-1}$.
We now show that the sum $\sum_{\ell=0}^{\infty}\rho^{-\ell}\Pi_{-\ell,0}^2$ converges a.s. For this we apply the root
test. The logarithm of the n-th root of the n-th term $a_n$ in this series is equal to
$$\log(a_n^{1/n}) := \log\big((\rho^{-n}\Pi_{-n,0}^2)^{1/n}\big) = -\log(\rho) + \frac{2}{n}\sum_{i=0}^{n}\log|\theta_{-i}|.$$
By (17.29) this is negative for all large n provided ρ is chosen sufficiently close to 1, and
then the sum converges a.s. It follows that
$$\sup_{k\ge 0} P_{\mu_y}\{|Y_k| \ge L\} \to 0 \quad \text{as } L \to \infty,$$
When these conditions are satisfied we will show that the CLT variance may be
written
$$\gamma_g^2 = m^{-1}\check\pi(\check\alpha)\,\check E_{\check\alpha}[(s_1(\bar g))^2] + 2m^{-1}\check\pi(\check\alpha)\,\check E_{\check\alpha}[s_1(\bar g)s_2(\bar g)], \qquad (17.32)$$
where π̌ is the invariant probability measure for the split chain and π̌(α̌) = δπ(C).
We may now present
Theorem 17.3.6. Suppose that Φ is ergodic and that (17.31) holds. Then 0 ≤ γg2 < ∞,
and if γg2 > 0 then the CLT and LIL hold for g.
Proof   The proof is only a minor modification of the previous proof: we recall that
$l_n := \max\{k : m\sigma_{\check\alpha}(k) \le n\}$ and observe that in a manner similar to the derivation of
(17.17) we may show that
$$\frac{1}{\sqrt n}\sum_{j=1}^{n}\bar g(\Phi_j) - \frac{1}{\sqrt n}\sum_{j=0}^{l_n-1} s_j(\bar g) \to 0 \quad \text{a.s.} \qquad (17.33)$$
This can be used to replace the upper limit of the second sum in (17.33) by a de-
terministic bound, just as in the proof of Theorem 17.2.2. Indeed, stationarity and
one-dependence of $\{s_j(\bar g) : j \ge 1\}$ allow us to apply Kolmogorov's inequality Theo-
rem D.6.3 to obtain the following analogue of (17.20): letting $n^* := \lfloor m^{-1}\check\pi(\check\alpha)n\rfloor$, we
have from (17.34) and (17.33) that
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\bar g(\Phi_i) - \frac{1}{\sqrt n}\sum_{j=0}^{n^*} s_j(\bar g) \to 0 \qquad (17.35)$$
in probability.
To complete the proof we will obtain a version of the CLT for one-dependent, sta-
tionary stochastic processes.
Fix an integer m ≥ 2 and define $\eta_j = s_{jm+1}(\bar g) + \cdots + s_{(j+1)m-1}(\bar g)$. For all n ∈ Z₊
we may write
$$\frac{1}{\sqrt n}\sum_{j=1}^{n} s_j(\bar g) = \frac{1}{\sqrt n}\sum_{j=0}^{\lceil n/m\rceil - 1}\eta_j + \frac{1}{\sqrt n}\sum_{j=1}^{\lceil n/m\rceil - 1} s_{mj}(\bar g) + \frac{1}{\sqrt n}\sum_{j=m\lceil n/m\rceil}^{n} s_j(\bar g). \qquad (17.36)$$
The last term converges to zero in probability, so that it is sufficient to consider the first
and second terms on the RHS of (17.36). Since $\{s_i(\bar g) : i \ge 1\}$ is stationary and one-
dependent, it follows that $\{\eta_j\}$ is an independent and identically distributed process,
and also that $\{s_{mj}(\bar g) : j \ge 1\}$ is i.i.d.
The common mean of the random variables $\{\eta_j\}$ is zero, and its variance is given
by the formula
$$\sigma_m^2 := \check E[\eta_j^2] = (m-1)\,\check E[s_1(\bar g)^2] + 2(m-2)\,\check E[s_1(\bar g)s_2(\bar g)].$$
By the CLT for i.i.d. sequences,
$$\frac{1}{\sqrt n}\sum_{j=0}^{\lceil n/m\rceil - 1}\eta_j \xrightarrow{d} N(0, m^{-1}\sigma_m^2)$$
and
$$\frac{1}{\sqrt n}\sum_{j=1}^{\lceil n/m\rceil - 1} s_{mj}(\bar g) \xrightarrow{d} N(0, m^{-1}\sigma_s^2),$$
where $\sigma_s^2 := \check E[s_1(\bar g)^2]$. Since as m → ∞
$$m^{-1}\sigma_m^2 \to \bar\sigma^2 := \check E[s_1(\bar g)^2] + 2\check E[s_1(\bar g)s_2(\bar g)], \qquad m^{-1}\sigma_s^2 \to 0,$$
it follows that
$$\frac{1}{\sqrt n}\sum_{j=1}^{n} s_j(\bar g) \xrightarrow{d} N(0, \bar\sigma^2) \quad \text{as } n \to \infty.$$
Combining these results with (17.35), we conclude that
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\bar g(\Phi_i) \xrightarrow{d} N(0, m^{-1}\check\pi(\check\alpha)\bar\sigma^2) \quad \text{as } n \to \infty,$$
which establishes the CLT with $\gamma_g^2 = m^{-1}\check\pi(\check\alpha)\bar\sigma^2$, as in (17.32).
Using an expression similar to (17.36) together with the LIL for i.i.d. sequences we
can easily show that the upper and lower limits of
$$\frac{1}{\sqrt{2n\bar\sigma^2\log\log n}}\sum_{k=1}^{n} s_k(\bar g)$$
are +1 and −1 respectively. From here the proof of Theorem 17.2.2 may be adapted to
prove the LIL, which completes the proof of Theorem 17.3.6.
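The blocking argument just given is easy to visualize in a toy case (the i.i.d. construction below is an assumption made for illustration, not the split-chain setting): if the ξ_j are i.i.d. and $s_j = \xi_j + \xi_{j+1}$, then {s_j} is stationary and one-dependent, and $n^{-1/2}\sum s_j$ is asymptotically normal with variance $\bar\sigma^2 = \check E[s_1^2] + 2\check E[s_1 s_2]$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10_000, 500

vals = []
for _ in range(reps):
    xi = rng.standard_normal(n + 1)
    s = xi[:-1] + xi[1:]            # stationary, one-dependent sequence
    vals.append(s.sum() / np.sqrt(n))

sigma_bar2 = 2.0 + 2.0 * 1.0        # E[s_1^2] + 2 E[s_1 s_2] = 2 + 2
print(np.var(vals), sigma_bar2)     # empirical variance is close to 4
```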
Proposition 17.4.1. Suppose that Φ is positive Harris, and suppose that ĝ and ĝ• are
two solutions to Poisson’s equation with π(|ĝ| + |ĝ• |) < ∞. Then for some constant c,
ĝ(x) = c + ĝ• (x) for a.e. x ∈ X [π].
Since by assumption π(|h|) < ∞, it follows from Theorem 14.3.6 that h(x) = π(h) for
a.e. x.
The expectation is well defined if the chain is f-regular for some f ≥ |g|. Since
$0 = \pi(g) = \pi(\alpha)E_\alpha[\sum_{k=1}^{\tau_\alpha} g(\Phi_k)]$, we have
$$\begin{aligned}
P\hat g(x) &= E_x\Big[\sum_{k=1}^{\sigma_\alpha} g(\Phi_k)\Big]\, I(x \in \alpha^c) + E_\alpha\Big[\sum_{k=1}^{\tau_\alpha} g(\Phi_k)\Big]\, I(x \in \alpha) \\
&= E_x\Big[\sum_{k=1}^{\sigma_\alpha} g(\Phi_k)\Big]\, I(x \in \alpha^c).
\end{aligned}$$
Since $\hat g(z) = g(z)$ for all z ∈ α, this shows that for all x,
$$P\hat g(x) = E_x\Big[\sum_{k=0}^{\sigma_\alpha} g(\Phi_k)\Big] - g(x) = \hat g(x) - g(x),$$
also satisfies the bound |ĝ| ≤ R(V + 1), and clearly satisfies Poisson’s equation. We
state a generalization of this important observation as Theorem 17.4.2. The assumption
that π(V ) < ∞ is removed in Theorem 17.7.1.
Theorem 17.4.2. Suppose that Φ is ψ-irreducible, and that (V3) holds with V every-
where finite, f ≥ 1, and C petite. If π(V ) < ∞, then for some R < ∞ and any |g| ≤ f ,
Poisson’s equation (17.37) admits a solution ĝ satisfying the bound |ĝ| ≤ R(V + 1).
Proof   The aperiodic case follows from absolute convergence of the sum in (17.39).
In the general periodic case it is convenient to consider the $K_{a_\varepsilon}$-chain, which is always
strongly aperiodic when Φ is ψ-irreducible by Proposition 5.4.5.
To begin, we will show that the resolvent or $K_{a_\varepsilon}$-chain satisfies a version of (V3)
with the same function f and a scaled version of the function V used in the theorem.
We will on two occasions apply the bound
$$K_{a_\varepsilon} V_\varepsilon \le V_\varepsilon - f + b\, K_{a_\varepsilon} I_C.$$
Since C is petite for Φ and hence also for the $K_{a_\varepsilon}$-chain by Theorem 5.5.6, the set
$C_n := \{x : K_{a_\varepsilon}(x, C) \ge 1/n\}$ is petite for the $K_{a_\varepsilon}$-chain for all n. Note that C ⊆ C_n
for n sufficiently large. Since C_n is petite we may adopt the proof of Theorem 14.2.9:
scaling $V_\varepsilon$ as necessary, we may choose n and $b_\varepsilon$ so large that
$$K_{a_\varepsilon} V_\varepsilon \le V_\varepsilon - f + b_\varepsilon I_{C_n}.$$
Thus the Ka ε -chain is f -regular. By aperiodicity there exists a constant Rε < ∞ such
that for any |g| ≤ f , we have a solution ĝε to Poisson’s equation
The second sum on the right hand side is a telescoping series, which telescopes to
$P\hat g(\Phi_0) - P\hat g(\Phi_n)$. We will prove in Theorem 17.4.3 that the first sum is a martingale,
which shall be denoted
$$M_n(g) = \sum_{k=1}^{n}\big[\hat g(\Phi_k) - P\hat g(\Phi_{k-1})\big]. \qquad (17.43)$$
Hence Sn (g) is equal to a martingale, plus a term which can be easily bounded. We
summarize these observations in
Theorem 17.4.3. Suppose that Φ is positive Harris and that a solution to Poisson’s
equation (17.37) exists with |ĝ| dπ < ∞. Then when Φ0 ∼ π, the series Sn (g) may be
written
Sn (g) = Mn (g) + P ĝ (Φ0 ) − P ĝ (Φn ) (17.44)
where (Mn (g), FnΦ ) is the martingale defined in (17.43).
Proof The expression (17.44) was established prior to the theorem statement. To
see that (Mn (g), FnΦ ) is a martingale, apply the identity
Theorem 17.4.4. Suppose that Φ is positive Harris, and suppose that g is a function
on X for which a solution ĝ to the Poisson equation exists with π(ĝ 2 ) < ∞. If the
constant
γg2 := π(ĝ 2 − {P ĝ}2 ) (17.46)
is strictly positive, then as n → ∞,
Since π(ĝ 2 ) < ∞, by Jensen’s inequality we also have π({P ĝ}2 ) < ∞. Hence by
Theorem 17.3.3 it follows that
1
max {P ĝ (Φk )}2 → 0 a.s. [Pπ ]
n 1≤k ≤n
as n → ∞. That is, $|(n\gamma_g^2)^{-1/2}(s_n - m_n)|_c \to 0$ in C[0, 1] with probability one. To prove
the theorem, it is therefore sufficient to show that $(n\gamma_g^2)^{-1/2} m_n(t) \xrightarrow{d} B$.
We complete the proof by showing that the conditions of Theorem D.6.4 hold for
the martingale $M_n(g)$.
We complete the proof by showing that the conditions of Theorem D.6.4 hold for
the martingale Mn (g).
Since we have assumed that $\hat g^2$ is π-integrable, it follows that the function $P\hat g^2 - \{P\hat g\}^2$
is also π-integrable. Hence the LLN holds:
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} E_\pi\big[(M_k(g) - M_{k-1}(g))^2 \mid \mathcal F_{k-1}^\Phi\big] = \pi(P\hat g^2 - \{P\hat g\}^2) = \gamma_g^2 \quad \text{a.s.}$$
We now establish (D.9). Again by the LLN we have for any b > 0,
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} E_\pi\big[(M_k(g) - M_{k-1}(g))^2\, I\{(M_k(g) - M_{k-1}(g))^2 \ge b\} \mid \mathcal F_{k-1}^\Phi\big]
= E_\pi\big[(\hat g(\Phi_1) - P\hat g(\Phi_0))^2\, I\{(\hat g(\Phi_1) - P\hat g(\Phi_0))^2 \ge b\}\big],$$
which tends to zero as b → ∞. It immediately follows that (D.9) holds for any ε > 0,
and this completes the proof.
As an illustration of the implications of Theorem 17.4.4 we state the following
corollary, which is an immediate consequence of the fact that both h(u) = u(1) and
h(u) = max0≤t≤1 u(t) are continuous functionals on u ∈ C[0, 1].
Theorem 17.4.5. Under the conditions of Theorem 17.4.4, the CLT holds for g with
γg2 given by (17.46), and as n → ∞,
$$\gamma_g^2 = \pi(\hat g^2 - \{\hat g - g\}^2) = 2\pi(\hat g g) - \pi(g^2) = E_\pi[2\hat g(\Phi_0)g(\Phi_0) - g^2(\Phi_0)]. \qquad (17.49)$$
This immediately gives the representation (17.3) for γg2 whenever the expectation with
respect to π and the infinite sum may be interchanged. We will give such conditions in
the next section, under which the identity (17.3) does indeed hold.
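On a finite state space all of these objects can be computed exactly, which makes the identities concrete. The sketch below is an illustration under assumed data (a random 4-state kernel), not the book's construction: it solves Poisson's equation via the fundamental matrix $Z = (I - P + \mathbf{1}\pi')^{-1}$ and checks that (17.46) and (17.49) agree with the autocovariance series (17.3).

```python
import numpy as np

rng = np.random.default_rng(4)
k = 4
P = rng.random((k, k)); P /= P.sum(axis=1, keepdims=True)   # random ergodic kernel

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()

g = rng.random(k)
gbar = g - pi @ g                                  # centered function
Z = np.linalg.inv(np.eye(k) - P + np.outer(np.ones(k), pi))
ghat = Z @ gbar                                    # fundamental-matrix solution

assert np.allclose(P @ ghat, ghat - gbar)          # Poisson's equation (17.37)

# Asymptotic variance three ways: (17.46), (17.49), and the series (17.3).
v1 = pi @ (ghat**2) - pi @ ((P @ ghat)**2)
v2 = 2 * (pi @ (ghat * gbar)) - pi @ (gbar**2)
acf = sum(pi @ (gbar * (np.linalg.matrix_power(P, m) @ gbar)) for m in range(1, 200))
v3 = pi @ (gbar**2) + 2 * acf
print(v1, v2, v3)   # all three expressions agree
```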
Note that if we substituted in a different formula for ĝ we would arrive at an entirely
different formula. We now show that by taking the specific form (17.38) for ĝ we
can connect the expression for the asymptotic variance given in Section 17.2 with the
formulas given here.
Recall that using the approach of Section 17.2 based upon the existence of an atom
we arrived at the identity
$$\gamma_g^2 = \pi(\alpha)\, E_\alpha\Big[\Big(\sum_{k=1}^{\tau_\alpha} g(\Phi_k)\Big)^2\Big]. \qquad (17.51)$$
It may seem unlikely a priori that the two expressions (17.49) and (17.51) coincide.
However, as required by the theory, it is of course true that the identity
$$\pi(\alpha)\, E_\alpha\Big[\Big(\sum_{k=1}^{\tau_\alpha} g(\Phi_k)\Big)^2\Big] = E_\pi[2\hat g(\Phi_0)g(\Phi_0) - g^2(\Phi_0)] \qquad (17.52)$$
holds. To see this directly, take
$$\hat g(x) = E_x\Big[\sum_{j=0}^{\sigma_\alpha} g(\Phi_j)\Big],$$
so that
$$\begin{aligned}
E_\pi[2\hat g(\Phi_0)g(\Phi_0) - g^2(\Phi_0)] &= \pi(\alpha)\, E_\alpha\Big[\sum_{k=1}^{\tau_\alpha}\big(2g(\Phi_k)\hat g(\Phi_k) - g^2(\Phi_k)\big)\Big] \\
&= \pi(\alpha)\, E_\alpha\Big[\sum_{k=1}^{\tau_\alpha}\Big(2g(\Phi_k)\, E_{\Phi_k}\Big[\sum_{j=0}^{\sigma_\alpha} g(\Phi_j)\Big] - g^2(\Phi_k)\Big)\Big] \\
&= \pi(\alpha)\, E_\alpha\Big[\sum_{k=1}^{\tau_\alpha}\Big(2g(\Phi_k)\, E\Big[\theta^k\sum_{j=0}^{\sigma_\alpha} g(\Phi_j)\,\Big|\,\mathcal F_k^\Phi\Big] - g^2(\Phi_k)\Big)\Big].
\end{aligned}$$
Now for $k \le \tau_\alpha$,
$$\theta^k\sum_{j=0}^{\sigma_\alpha} g(\Phi_j) = \sum_{j=k}^{\tau_\alpha} g(\Phi_j),$$
Proof It follows from Lemma 15.2.9 that the chain is V -uniform, and hence (V3)
holds with this V . The finiteness of π(V 2 ) follows from finiteness of π(V ), which is a
consequence of the f -Norm Ergodic Theorem 14.0.1.
The following result shows that (V3) provides a sufficient condition under which the
assumptions imposed in Section 17.4 and Section 17.3 are satisfied.
Lemma 17.5.2. Under the CLT moment condition on V, f above we have:
(i) there exists a constant R < ∞ such that for any function g which satisfies the
bound |g| ≤ f , Poisson’s equation (17.37) admits a solution ĝ with |ĝ| ≤ R(V +1);
(ii) the split chain satisfies the bound
$$\check E_{\check\alpha}\Big[\Big(\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)\Big)^2\Big] < \infty \qquad (17.53)$$
and hence the CLT moment condition (17.31) holds for any function g with |g| ≤
f.
Under the assumption that $\pi(V^2) < \infty$ we see from the representation of π that
$$\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\Big(\check E_{\check\Phi_{\ell+1}}\Big[\sum_{k=0}^{\tau_{\check\alpha}} Z_k(f)\Big]\Big)^2\Big] \le (\check\pi(\check\alpha))^{-1}(R_0 R_1)^2\, \pi([V+1]^2) < \infty. \qquad (17.54)$$
$$\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)^2\Big] = (\check\pi(\check\alpha))^{-1}\, E_\pi\big[Z_0(f)^2\big] \le (\check\pi(\check\alpha))^{-1}\, m^2\, \pi(f^2) < \infty. \qquad (17.55)$$
Combining (17.54) and (17.55) then shows that
$$\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\Big(Z_\ell(f) + \check E_{\check\Phi_{\ell+1}}\Big[\sum_{k=0}^{\tau_{\check\alpha}} Z_k(f)\Big]\Big)^2\Big] < \infty.$$
It is now relatively easy to show that the bound (17.53) holds. We may calculate using
the ordinary Markov property:
$$\begin{aligned}
\infty &> \check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\Big(Z_\ell(f) + \check E_{\check\Phi_{\ell+1}}\Big[\sum_{k=0}^{\tau_{\check\alpha}} Z_k(f)\Big]\Big)^2\Big] \\
&= \check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\Big(Z_\ell(f) + \check E\Big[\sum_{k=\ell+1}^{\tau_{\check\alpha}} Z_k(f)\,\Big|\,\check{\mathcal F}_{m(\ell+1)}\Big]\Big)^2\Big] \\
&\ge 2\,\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)\,\check E\Big[\sum_{k=\ell+1}^{\tau_{\check\alpha}} Z_k(f)\,\Big|\,\check{\mathcal F}_{m(\ell+1)}\Big]\Big] + \check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)^2\Big] \\
&= 2\,\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\sum_{k=\ell+1}^{\tau_{\check\alpha}} Z_\ell(f) Z_k(f)\Big] + \check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)^2\Big] \\
&\ge \check E_{\check\alpha}\Big[\Big(\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)\Big)^2\Big].
\end{aligned}$$
Theorem 17.5.3. Assume the CLT moment condition on V, f, and let g be a function
on X with |g| ≤ f. Then the constant $\gamma_g^2$ defined as
$$\gamma_g^2 = \lim_{n\to\infty} \frac{1}{n} E_\pi\big[(S_n(\bar g))^2\big] = E_\pi[\bar g^2(\Phi_0)] + 2\sum_{k=1}^{\infty} E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)] \qquad (17.56)$$
Proof To obtain the representation (17.56) for γg2 , apply the identity (17.44), from
which we obtain
Eπ [(Sn (g) − Mn (g))2 ] ≤ 4π(ĝ 2 ).
Since $E_\pi[M_n(g)^2] = \sum_{k=1}^{n} E_\pi[(M_k - M_{k-1})^2] = n\gamma_g^2$, it follows that $\frac{1}{n}E_\pi[S_n(\bar g)^2] \to \gamma_g^2$ as
n → ∞.
We now show that $\frac{1}{n}E_\pi[S_n(\bar g)^2] \to \sum_{k=-\infty}^{\infty} E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)]$.
First we show that this sum converges absolutely. By the f-Norm Ergodic Theo-
rem 14.0.1 we have for some R < ∞, and each x,
$$\sum_{k=0}^{\infty} |E_x[\bar g(\Phi_0)\bar g(\Phi_k)]| \le |\bar g(x)|\sum_{k=0}^{\infty} \|P^k(x,\,\cdot\,) - \pi\|_f \le |\bar g(x)|\, R\,(V(x)+1),$$
and hence
$$\sum_{k=0}^{\infty} |E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)]| \le R'\,(\pi(V^2)+1) < \infty.$$
Next, by stationarity,
$$\begin{aligned}
\frac{1}{n} E_\pi[S_n(\bar g)^2] &= E_\pi[\bar g(\Phi_0)^2] + \frac{2}{n}\sum_{k=1}^{n}\sum_{j=k+1}^{n} E_\pi[\bar g(\Phi_k)\bar g(\Phi_j)] \\
&= E_\pi[\bar g(\Phi_0)^2] + \frac{2}{n}\sum_{k=0}^{n-1}\sum_{i=1}^{n-1-k} E_\pi[\bar g(\Phi_0)\bar g(\Phi_i)],
\end{aligned}$$
and the right hand side converges to $\sum_{k=-\infty}^{\infty} E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)]$ as n → ∞.
To prove that the CLT and LIL hold when γg2 > 0, observe that by Lemma 17.5.2
under the conditions of this section the hypotheses of both Theorem 17.3.6 and Theo-
rem 17.4.5 are satisfied. Theorem 17.3.6 gives the CLT and LIL, and Theorem 17.4.5
shows that the asymptotic variance is equal to π(ĝ 2 − (P ĝ)2 ).
So far we have left open the question of what happens when $\gamma_g^2 = 0$. Under the
conditions of Theorem 17.5.3 it may be shown that in this case
$$\frac{1}{\sqrt n} S_n(\bar g) \xrightarrow{d} 0.$$
We leave the proof of this general result to the reader. In the next result we give a
criterion for the CLT and LIL for V-uniformly ergodic chains, and show that for such
chains $\frac{1}{\sqrt n} S_n(\bar g)$ converges to zero with probability one when $\gamma_g^2 = 0$.
Theorem 17.5.4. Suppose that Φ is V-uniformly ergodic. If $g^2 \le V$, then the conclu-
sions of Theorem 17.5.3 hold, and if $\gamma_g^2 = 0$, then
$$\frac{1}{\sqrt n} S_n(\bar g) \to 0 \quad \text{a.s. } [P_*].$$
Proof   In view of Lemma 17.5.1 and Theorem 17.5.3, the only result which requires
proof is that $(\frac{1}{\sqrt n} S_n(\bar g) : n \ge 1)$ converges to zero when $\gamma_g^2 = 0$.
Recalling (17.44), we have shown that $\frac{1}{\sqrt n} P\hat g(\Phi_n) \to 0$ a.s. in the proof of Theorem 17.4.4. To prove the
theorem we will show that $(M_n(g))$ is a convergent sequence.
We have for all n and x,
$$E_x[(M_n(g))^2] = \sum_{k=1}^{n} E_x\big[P(\Phi_{k-1}, \hat g^2) - P(\Phi_{k-1}, \hat g)^2\big].$$
Letting $G(x) = P(x, \hat g^2) - P(x, \hat g)^2$ we have 0 ≤ G ≤ RV for some R < ∞, and
$\pi(G) = \gamma_g^2 = 0$. Hence by Theorem 15.0.1,
$$E_x[(M_n(g))^2] = \sum_{k=1}^{n} E_x[G(\Phi_{k-1})] \le \sum_{k=0}^{\infty} |P^k(x, G) - \pi(G)| < \infty.$$
By the Martingale Convergence Theorem D.6.1 it follows that $(M_n(g))$ converges to a
finite limit, and is hence bounded in n with probability one.
17.6 Applications
From Theorem 17.0.1 we see that any of the V -uniform models which were studied
in the previous chapter satisfy the CLT and LIL as long as the asymptotic variance is
positive. We will consider here two models where moment conditions on the disturbance
process may be given explicitly to ensure that the CLT holds. In the first we avoid
Theorem 17.0.1 since we can obtain a stronger result by using Theorem 17.5.3, which
is based upon the CLT moment condition of the previous section.
Proposition 17.6.1. If the increment distribution Γ has mean β < 0 and finite fifth
moment, then the associated random walk on a half line is positive Harris and the CLT
and LIL hold for the process {Φk : k ≥ 0}.
The asymptotic variance may be written using (17.3) as $\gamma_g^2 = \sum_{k=-\infty}^{\infty} E_\pi[\bar\Phi_k\bar\Phi_0]$, or
using (17.13) with α = {0} we have
$$\gamma_g^2 = \pi(0)\, E_0\Big[\Big(\sum_{k=1}^{\tau_0}\big(\Phi_k - E_\pi[\Phi_k]\big)\Big)^2\Big].$$
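A regenerative Monte Carlo sketch of this variance formula (with an assumed Gaussian increment of mean −0.5, which certainly has a finite fifth moment; the names and sample sizes are illustrative): cycles between visits to the atom {0} of the reflected walk are i.i.d., so the cycle sums estimate γ_g² directly.

```python
import numpy as np

rng = np.random.default_rng(5)
beta = -0.5                        # negative drift, as positivity requires
n_cycles = 20_000

# Simulate the random walk on the half line, splitting the path into
# i.i.d. cycles between successive visits to the atom {0}.
cycles, phi, cur = [], 0.0, []
while len(cycles) < n_cycles:
    phi = max(0.0, phi + beta + rng.standard_normal())
    cur.append(phi)
    if phi == 0.0:                 # return to the atom ends the cycle
        cycles.append(np.array(cur)); cur = []

steps = sum(len(c) for c in cycles)
m_hat = sum(c.sum() for c in cycles) / steps          # estimate of E_pi[Phi]
pi0_hat = len(cycles) / steps                          # pi(0) = 1 / E_0[tau_0]
gamma2_hat = pi0_hat * np.mean([((c - m_hat).sum()) ** 2 for c in cycles])
print(m_hat, gamma2_hat)           # CLT: n^{-1/2} S_n(Phi - m) ~ N(0, gamma^2)
```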
In steady state the output has the moving average representation
$$Y_k = \sum_{\ell=0}^{\infty} h_\ell W_{k-\ell},$$
where $h_\ell = c'F^\ell G$ and $(W_k : k \in Z)$ are i.i.d. with mean zero and covariance $\Sigma_W =
E[WW']$, which is assumed to be finite in (LSS2).
Let R(k) denote the autocovariance sequence for the stationary process:
$$R(k) = E_\pi[Y_k Y_0], \qquad k \in Z.$$
If the CLT holds for the process Y , then we have seen that the asymptotic variance,
which we shall denote γc2 , is equal to
∞
γc2 = R(k). (17.57)
k =−∞
The autocovariance sequence can be analyzed through its Fourier series, and this ap-
proach gives a simple formula for the limiting variance γc2 .
The process Y has a spectral density D(ω), which is obtained from the autocovari-
ance sequence through the Fourier series
$$D(\omega) = \sum_{m=-\infty}^{\infty} R(m)\, e^{im\omega} = H(e^{i\omega})\,\Sigma_W\, H(e^{i\omega})^*,$$
where
$$H(e^{i\omega}) = \sum_{\ell=0}^{\infty} h_\ell\, e^{i\ell\omega} = c'(I - e^{i\omega}F)^{-1}G.$$
=0
From these calculations we obtain the following CLT for the linear state space model:
Theorem 17.6.2. Consider the linear state space model defined by (LSS1) and (LSS2).
If the eigenvalue condition (LSS5), the nonsingularity condition (LSS4) and the con-
trollability condition (LCM3) are satisfied, then the model is V -uniformly ergodic with
V (x) = |x|2 + 1.
For any vector c ∈ Rⁿ, the asymptotic variance is given by the formula
$$\gamma_c^2 = c'(I - F)^{-1} G\,\Sigma_W\, G'(I - F')^{-1} c,$$
and the CLT and LIL hold for the process Y when $\gamma_c^2 > 0$.
Proof We have seen in the proof of Theorem 12.5.1 that (V4) holds for the linear
state space model with V (x) = 1 + x M x, where M is a positive matrix (see (12.34)).
Under the conditions of Theorem 17.6.2 we also have that Φ is a ψ-irreducible and
aperiodic T-chain by Proposition 6.3.5. By Lemma 17.5.1 and Theorem 17.5.2 it follows
that the CLT and LIL hold for Y and that the asymptotic variance is given by (17.57).
The closed form expression for $\gamma_c^2$ follows from the chain of identities
$$\gamma_c^2 = \sum_{k=-\infty}^{\infty} R(k) = D(0) = c'(I - F)^{-1} G\,\Sigma_W\, G'(I - F')^{-1} c.$$
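This identity is easily verified numerically; the following is a minimal sketch under assumed data (a stable F satisfying (LSS5), and arbitrary G, Σ_W, c), using SciPy's discrete Lyapunov solver for the stationary state covariance.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

F = np.array([[0.5, 0.2],
              [0.0, 0.3]])         # eigenvalues inside the unit disk (LSS5)
G = np.array([[1.0], [0.5]])
SigmaW = np.array([[1.0]])
c = np.array([1.0, -1.0])

Q = G @ SigmaW @ G.T
SigmaX = solve_discrete_lyapunov(F, Q)       # Sigma_X = F Sigma_X F' + Q

# gamma_c^2 = sum_k R(k), with R(k) = c' F^|k| Sigma_X c
acv_sum = c @ SigmaX @ c + 2 * sum(c @ np.linalg.matrix_power(F, k) @ SigmaX @ c
                                   for k in range(1, 400))
IF = np.linalg.inv(np.eye(2) - F)
closed_form = c @ IF @ Q @ IF.T @ c
print(acv_sum, closed_form)                   # the two expressions agree
```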
Had we proved the CLT for vector-valued functions of the state, it would be more
natural in this example to prove directly that the CLT holds for X. In fact, an extension
of Theorem D.6.4 to vector-valued processes is possible, and from such a generalization
we have under the conditions of Theorem 17.6.2 that
$$\frac{1}{\sqrt n}\sum_{k=1}^{n} X_k \xrightarrow{d} N(0, \Sigma)$$
17.7 Commentary*
The results of this chapter may appear considerably deeper than those of other chapters,
although in truth they are often straightforward from more global stochastic process
results, given the embedded regeneration structure of the split chain, or given the
existence of a stationary version (that is, of an invariant probability measure) for the
chain.
One of the achievements of this chapter is the identification of these links, and in
particular the development of a drift-condition approach to the sample path and central
limit laws.
These laws are of value for Markov chains exactly as they are for all stochastic
processes: the LLN and CLT, in particular, provide the theoretical basis for many
results in the statistical analysis of chains as they do in related fields. In particular,
the standard proofs of asymptotic efficiency and unbiasedness for maximum likelihood
estimators are largely based upon these ergodic theorems. For this and other applications,
the reader is referred to [151].
The Law of Large Numbers has a long history whose surface we can only skim
here. Theorem 17.1.2 is a result of Doob [99], and the ratio form for Harris chains
Theorem 17.3.2 is given in Athreya and Ney [14]. Chapter 3 of Orey [309] gives a good
overview of related ratio limit theorems.
The classic text of Chung [71] gives in Section I.16 the CLT and LIL for chains on a
countable space from which we adopt many of the proofs of the results in Section 17.2
and Section 17.3. Versions of the Central Limit Theorem for Harris chains may be
found in Cogburn [74] and in Nummelin and Niemi [303, 300]. The paper [300] presents
an excellent survey of what was the state of the art at that time, and also an excellent
development of CLTs in a context more general than we have given.
Neveu remarks in [296] that “the relationship between the theory of martingales
and the theory of Markov chains is very deep”. At that time he referred mainly to
the connections between harmonic functions, martingales, and first hitting probabilities
for a Markov chain. In Section III-5 of [296] he develops fairly briefly a remarkably
strong classification of a Markov chain as either recurrent or transient, based mainly
on martingale limit theory and the existence of harmonic functions. Certainly the
connections between martingales and Markov chains are substantial. From the largely
martingale-based proof of the functional CLT described in this chapter, and the more
general implications of Poisson’s equation and its associated martingale to the ergodic
theory of Markov chains, it appears that the relationship between Markov chains and
martingales is even richer than was thought at the time of Neveu’s writing.
The martingale approach via solutions to Poisson’s equation which is developed in
Section 17.4 is adopted from Duflo [102] and Maigret [242].
For further results on the potential theory of positive kernels we refer the reader to
the seminal work of Neveu [295], Revuz [326] and Constantinescu and Cornea [77], and
to Nummelin [304] for the most current development. Applications to Markov processes
evolving in continuous time are developed in Neveu [295], Kunita [229], and Meyn and
Tweedie [278].
For an excellent account of Central Limit Theorems and versions of the Law of
the Iterated Logarithm for a variety of processes the reader is referred to Hall and
Heyde [151]. Martingale limit theory as presented in, for example, Hall and Heyde [151]
allows several obvious extensions of the results given in Section 17.4. For example, a
functional Law of the Iterated Logarithm for Markov chains can be proved in a manner
similar to the functional Central Limit Theorem given in Theorem 17.4.4. Using the
almost sure invariance principle given in Brosamler [54] and Lacey and Philipp [233], it
is likely that an almost sure Central Limit Theorem for Markov chains may be obtained
under an appropriate drift condition, such as (V4).
In work closely related to the development of Section 17.4, Kurtz [231] considers
chains arising in models found in polymer chemistry. These models evolve on the
surface of a three-dimensional sphere X = S 2 , and satisfy a multidimensional version of
Poisson's equation:
$$\int_X P(x, dy)\, y = \rho x,$$
where |ρ| < 1. Bhattacharya [35] also considers the CLT and LIL for Markov processes,
using an approach based upon the analogue of Poisson's equation in continuous time.
If a solution to Poisson’s equation cannot be found directly as in [231], then a more
general approach is needed. This is the main motivation for the development of the
drift criteria (V3) and (V4) which is central to this chapter, and all of Part III. Most of
these results are either new or very recent in this general state space context. Meyn and
Tweedie [277] use a variant of (V4) to obtain the CLT and LIL for ψ-irreducible Markov
chains giving Theorem 17.0.1, and the use of (V3) to obtain solutions to Poisson’s
equation is taken from Glynn and Meyn [139]. Applications to random walks and
linear models similar to those given in Section 17.6 are also developed in [139].
Proposition 17.3.5, which establishes stability of the dependent parameter bilinear
model, is taken from Brandt et al. [45] where further related results may be found.
The finiteness of the fifth moment of the increment process which is imposed in
Proposition 17.6.1 is close to the right condition for guaranteeing that the random walk
obey the CLT. Daley [83] shows that for the GI/G/1 queue a fourth moment condition
is necessary and sufficient for the absolute convergence of the sum
∞
Eπ [Φ̄k Φ̄0 ]
−∞
where Φ̄k = Φk − Eπ [Φk ]. Recall that this sum is precisely the asymptotic variance
used in Proposition 17.6.1. This strongly suggests that the CLT does not hold for the
random walk on the half line when the increment process does not have a finite fourth
moment, and also suggests that the CLT may indeed hold when the fourth moment is
finite. These subtleties are described further in [139].
Commentary for the second edition: Of all the topics covered in this book, those
in this chapter have seen the greatest growth since 1996. The number of recognized
open questions has grown at least as quickly as the number of papers providing answers.
Section 20.2 contains a survey of advances in simulation methodology based on theory
developed in this book.
The CLT for Markov chains is better understood today. Sufficient conditions for the
CLT are obtained in [252] under conditions that appear close to minimal, and minimal
conditions for chains that are reversible1 are established in [209].
A future edition of this book will surely draw from Jones’s survey [183], which
contains many examples along with a streamlined account of the theory. Another survey
by Landim [236] develops theory for reversible chains. The rate of convergence in the
CLT for geometrically ergodic chains is investigated in [218, 219] – see Section 20.1.5
for results concerning more exotic limit theory, such as large deviations.
Looking back at the first edition, it is a surprise to see how little attention is devoted
to Poisson’s equation (17.37). This equation is central to many areas in statistics and
engineering:
1 See discussion surrounding (20.5) in the new Chapter 20.
(ii) This equation emerges in various aspects of statistics and limit theory such as
Markov renewal theory [132, 133] and refinements of the CLT [218, 219].
(iii) The martingale property described in Theorem 17.4.3 is central to variance anal-
ysis of simulation algorithms. Section 20.2.1 contains a brief survey on the appli-
cation of Poisson’s equation to variance-reduction techniques.
(iv) In controlled Markov models (also called Markov decision processes, or MDPs), a
variant of Poisson’s equation is known as the (average cost) dynamic programming
equation. In this context, the function g appearing in (17.37) is the associated
cost function, and the solution ĝ is called the relative value function [28, 262, 261,
67, 263, 42, 27, 267].
(v) Perturbation theory is typically addressed using Poisson’s equation, following the
work of Schweitzer [347]. Suppose that {Pα : α ∈ (−1, 1)} is a family of transi-
tion kernels, each ergodic with invariant measure πα . Let c denote a measurable
function on X, let ηα = πα (c), and let ĉα denote the solution to Poisson’s equation,
Pα ĉα = ĉα − c + ηα .
(vi) The ‘multiplicative Poisson equation’ is central to the theory of large deviations
for Markov chains – see Section 20.1.5 – and also risk-sensitive optimal control
for MDP models [41]. Closely related techniques are also used in the analysis of
change-detection algorithms [131].
Theorem 17.7.1. Suppose that Φ is ψ-irreducible, and that (V3) holds with V every-
where finite, f ≥ 1, and C petite. Then, for some B < ∞ and any |g| ≤ f , the Poisson
equation (17.37) admits a solution ĝ satisfying the bound |ĝ| ≤ B(V + 1).
Proof   Suppose first that the chain is strongly aperiodic. We then consider the
split chain: the solution to Poisson's equation is given by $\hat g(x) = G_\alpha(x, g)$, as discussed
following (17.38).
In the completely general setting we proceed as in the proof of Theorem 17.4.2. The
resolvent kernel $K_{a_\varepsilon}$ defined in (3.26) is strongly aperiodic for any ε ∈ (0, 1). We can
solve Poisson's equation (17.41) for this kernel: the solution satisfies $|\hat g_\varepsilon| \le B_\varepsilon(V+1)$ for
some fixed constant $B_\varepsilon$. We then recall (17.42), which defines $\hat g = \varepsilon(1-\varepsilon)^{-1}K_{a_\varepsilon}\hat g_\varepsilon$. The
function ĝ solves Poisson's equation, and this completes the proof with $B = \varepsilon(1-\varepsilon)^{-1}B_\varepsilon$.
Application to performance approximation requires a significant strengthening of
the converse result Proposition 17.4.1. Frequently we are given an invariance equation
of the form
Ph = h − g + η (17.59)
where g and h are measurable functions and η is constant, and we hope to infer that
π(g) = η. We obtain the upper bound π(g) ≤ η by the Comparison Theorem when g
and h are each non-negative valued.
To strengthen the Comparison Theorem and deduce that π(g) = η we require bounds
on g and h. Suppose that a third function f ≥ 1 is known to be π-integrable. We have
seen in the proof of Theorem 14.2.6 that a solution to (V3) is given by
$$V^*(x) := G_C(x, f) = E_x\Big[\sum_{k=0}^{\sigma_C} f(\Phi_k)\Big], \qquad (17.60)$$
with C ∈ B⁺(X) any f-regular set. The following result is adapted from [267, Proposition
A.6.2].
Dividing by n and iterating (17.59) gives
$$n^{-1} P^n h(x) = n^{-1} h(x) + \eta - n^{-1}\sum_{k=0}^{n-1} P^k g(x).$$
The right hand side converges to η − π(g) for a.e. x by the f-Norm Ergodic Theo-
rem 14.0.1 in the aperiodic case, and by Theorem 14.3.6 in general. The left hand side
converges to zero by Lemma 17.7.3, which follows.
Lemma 17.7.3. Under the assumptions of Theorem 17.7.2, there exists a full and
absorbing set Xf such that
(iii) for x ∈ Xf ,
lim_{k→∞} k^{-1} E_x[V^*(Φ_k)] = lim_{k→∞} E_x[V^*(Φ_k) I{τ_C > k}] = 0.
Proof For (i) we take Xf equal to the set XV given in the f -Norm Ergodic Theo-
rem 14.0.1, intersected with the set Xf given in Theorem 14.2.5.
The proof of (ii) is identical to the proof of Theorem 11.3.5. Note that f ∗ is π-
integrable with zero mean by the generalized Kac’s Theorem given in (10.2).
To prove the first limit in (iii) we iterate the identity in (ii) to obtain
E_x[V^*(Φ_n)] = P^n V^*(x) = V^*(x) − Σ_{k=0}^{n−1} P^k f^*(x),   n ≥ 1.
Dividing by n and letting n → ∞, the right hand side converges to −π(f^*) = 0 for
x ∈ X_f by the f-Norm Ergodic Theorem 14.0.1. The ergodic theorem requires
aperiodicity; if this fails, we can apply the theorem to the d-skeleton chain using
Theorem 14.3.6.
By the definition of V^* and the Markov property we have for each m ≥ 1,

V^*(Φ_m) = E_{Φ_m}[ Σ_{k=0}^{σ_C} f(Φ_k) ]
         = E[ Σ_{k=m}^{τ_C} f(Φ_k) | F_m ]   on {τ_C ≥ m},      (17.63)

so that

E_x[ V^*(Φ_m) I{τ_C ≥ m} ] = E_x[ I{τ_C ≥ m} Σ_{k=m}^{τ_C} f(Φ_k) ].

If V^*(x) < ∞, then the right hand side vanishes as m → ∞ by the Dominated Conver-
gence Theorem. This proves the second limit in (iii).
Chapter 18
Positivity
Turning from the sample path and classical limit theorems for normalized sums of the
previous chapter, we now return to considering limits of the transition probabilities
P^n(x, A).
Our first goal in this chapter is to derive limit theorems for chains which are not
positive Harris recurrent. Although some results in this spirit have been derived as
ratio limit theorems such as Theorem 17.2.1 and Theorem 17.3.2, we have not to this
point considered in any detail the difference between limiting behavior of positive and
null recurrent chains.
The last five chapters have amply illustrated the power of ψ-irreducibility in the
positive case: that is, in conjunction with the existence of an invariant probability
measure. However, even in the non-positive case, powerful and elegant results can be
achieved. For Harris recurrent chains we prove a generalized version of the Aperiodic
Ergodicity Theorem of Chapter 13, which covers the null recurrent case and actually
subsumes the ergodic case also, since it applies to any Harris recurrent chain. We will
show
Theorem 18.0.1. Suppose Φ is an aperiodic Harris recurrent chain. Then for any
initial probability distributions λ, µ,
∫∫ λ(dx) µ(dy) ‖P^n(x, · ) − P^n(y, · )‖ → 0,   n → ∞.      (18.1)
If Φ is a null recurrent chain with invariant measure π, then for any constant ε > 0,
and any initial distribution λ,
lim_{n→∞} sup_{A∈B(X)} ∫ λ(dx) P^n(x, A) / [π(A) + ε] = 0.      (18.2)
Proof The first result is shown in Theorem 18.1.2 after developing some extended
coupling arguments and then applying the splitting technique. The consequence (18.2)
is proved in Theorem 18.1.3.
Our second goal in this chapter is to use these limit results to complete the charac-
terizations of positivity through a positive/null dichotomy of the local behavior of P^n
on suitable sets: not surprisingly, the sets of relevance are petite or compact sets in the
general or topological settings respectively.
In classical countable state space analysis, as in Chung [71] or Feller [114] or
Çinlar [59], it is standard to first approach positivity as an asymptotic “P^n-property”
of individual states. It is not hard to show that when Φ is irreducible, either
lim sup_{n→∞} P^n(x, y) > 0 for all x, y ∈ X or lim_{n→∞} P^n(x, y) = 0 for all x, y ∈ X.
These classifications then provide different but ultimately equivalent characterizations
of positive and null chains in the sense we have defined them, which is through the
finiteness or otherwise of π(X). In Theorem 18.2.2 we show that for ψ-irreducible
chains the positive/null dichotomy as defined in, say, Theorem 13.0.1 is equivalent to
similar dichotomous behavior of
‖P^k(x, · ) − π‖ → 0   as k → ∞,      (18.4)

lim_{n→∞} (1/n) Σ_{i=1}^n f(Φ_i) = π(f)   in probability;
This time, however, we define the second sequence {S'_1, S'_2, . . .} in a dependent way.
Let M be a (typically large, and yet to be chosen) integer. For each j define S'_j as being
exactly S_j if S_j > M, or, if S_j ≤ M, as an independent variable with the same
conditional distribution as S_j, namely

P(S'_j = k) = p(k)/(1 − p̄(M)),   k ≤ M,

where p̄(M) = Σ_{j>M} p(j).
This construction ensures that for j ≥ 1 the increments S_j and S'_j are identical in
distribution even though they are not independent. By construction, also, the quantities

W_j = S_j − S'_j

have the properties that they are identically distributed, they are bounded above by
M and below by −M, and they are symmetric around zero and in particular have zero
mean.
Let Φ^*_n = Σ_{j=0}^n W_j denote the random walk generated by this sequence of variables,
and let T^*_{ab} denote the first time that the random walk Φ^* returns to zero, when the
initial step W_0 = S_0 − S'_0 has the distribution induced by choosing a, b as the distributions
of S_0, S'_0 respectively.
As in Section 13.2 the coupling time of the two renewal processes is defined as
Tab = min{j : Za (j) = Zb (j) = 1}
where Z_a, Z_b are the indicator sequences of each renewal process, and since

Φ^*_n = Σ_{j=0}^n S_j − Σ_{j=0}^n S'_j,

the two renewal sequences have a common renewal, and hence couple, at the first time
Φ^* returns to zero.
Let us use this M in the construction of the random walk Φ∗ above. It is straightforward
to check that now Φ∗ really is irreducible, and so
P(Tab < ∞) = 1
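The dependent coupling is easy to simulate. In the sketch below the increment law p is an aperiodic example of our own choosing, not one from the text; the code builds {S_j}, {S'_j}, the bounded symmetric differences W_j, and the difference walk Φ^*, and reports the first return of Φ^* to zero.

    import numpy as np

    rng = np.random.default_rng(0)

    support = np.array([1, 2, 3, 4])        # an aperiodic renewal law p
    p = np.array([0.4, 0.3, 0.2, 0.1])
    M = 2                                   # truncation level of the coupling

    n = 10_000
    S = rng.choice(support, size=n, p=p)

    # S'_j = S_j when S_j > M; otherwise an independent draw from the
    # conditional law of S given S <= M.
    mask = support <= M
    q = p[mask] / p[mask].sum()
    S_prime = np.where(S > M, S, rng.choice(support[mask], size=n, p=q))

    W = S - S_prime                         # in [-M, M], symmetric, mean zero
    phi_star = np.cumsum(W)                 # the random walk Phi*

    start = np.flatnonzero(W != 0)[0]       # first nonzero step
    hits = np.flatnonzero(phi_star[start:] == 0)
    print("Phi* returns to zero at step",
          start + hits[0] if hits.size else "(not within horizon)")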
initial distributions indicated. It is obvious that the chains Va− , Vb− couple at the same
time Tab that Za− , Zb− couple.
Now let A be an arbitrary set in Z+ . Since the distributions of Va− and Vb− are
identical after the time Tab we have for any n ≥ 1 by decomposing over the values of
Tab and using the Markov or renewal property
P(V_a^-(n) ∈ A) = Σ_{m=1}^n P(T_ab = m) P(V_a^-(n − m) ∈ A) + P(V_a^-(n) ∈ A, T_ab > n),

P(V_b^-(n) ∈ A) = Σ_{m=1}^n P(T_ab = m) P(V_b^-(n − m) ∈ A) + P(V_b^-(n) ∈ A, T_ab > n),

so that

sup_{A⊆Z+} |P(V_a^-(n) ∈ A) − P(V_b^-(n) ∈ A)| ≤ P(T_ab > n).      (18.9)
We already know that the right hand side of (18.9) tends to zero. But the left hand
side can be written as
(1/2) ( |a ∗ u − b ∗ u| ∗ p )(n).      (18.10)
To do this we use a rather nice trick. Let us modify the distribution p(j) to form
another distribution p0 (j) on {0, 1, . . .} defined by setting
p0 (0) = p > 0;
p0 (j) = (1 − p)p(j), j ≥ 1.
Let us now carry out all of the above analysis using p0 , noting that even though this
is not a standard renewal sequence since p0 (0) > 0, all of the operations used above
remain valid.
Provided of course that p(j) is aperiodic in the usual way, we certainly have that
(18.8) holds for p0 and we can conclude that as n → ∞,
|a ∗ u0 − b ∗ u0 |(n) → 0, (18.11)
|a ∗ u0 − b ∗ u0 | ∗ p0 (n) → 0. (18.12)
Finally, by construction of p0 we have the two identities
and consequently, from (18.11) and (18.12), we have exactly (18.6) and (18.7) as re-
quired.
Note that in the null recurrent case, since we do not have Σ_n n p(n) < ∞, we cannot
prove this result from Lemma D.7.1 even though it is an identical conclusion to that
reached there in the positive recurrent case.
Theorem 18.1.2. Suppose Φ is an aperiodic Harris recurrent chain. Then for any
initial probability distributions λ, µ,
∫∫ λ(dx) µ(dy) ‖P^n(x, · ) − P^n(y, · )‖ → 0,   n → ∞.      (18.13)
Proof Yet again we begin with the assumption that there is an atom α in the
space. Then for any x we have from the Regenerative Decomposition (13.47)
‖P^n(x, · ) − P^n(α, · )‖ ≤ P_x(τ_α ≥ n) + |a_x ∗ u − u|(n) + (|a_x ∗ u − u| ∗ p)(n)      (18.14)
where now p(n) = Pα (τα > n). From Theorem 18.1.1 we know the last two terms in
(18.14) tend to zero, whilst the first tends to zero from Harris recurrence.
The result (18.13) then follows for any two specific initial starting points x, y from
the triangle inequality; it extends immediately to general initial distributions λ, µ from
dominated convergence.
As previously, the extension to strongly aperiodic chains is straightforward, whilst
the extension to general aperiodic chains follows from the contraction property of the
total variation norm.
We conclude with a consequence of this theorem which gives a uniform version of the
fact that, in the null recurrent case, we have convergence of the transition probabilities
to zero.
Theorem 18.1.3. Suppose that Φ is aperiodic and null recurrent, with invariant mea-
sure π. Then for any initial distribution λ and any constant ε > 0
lim_{n→∞} sup_{A∈B(X)} ∫ λ(dx) P^n(x, A) / [π(A) + ε] = 0.      (18.15)
and by Egorov’s Theorem and the fact that π(X) = ∞ this convergence is uniform on
a set with π-measure arbitrarily large.
In particular we can take k and D such that π(D) > δ^{-1} and

| ∫ λ(dx) P^{n_k}(x, B_k) − P^{n_k}(y, B_k) | ≤ εδ/2,   y ∈ D,      (18.18)

which gives

π(D) ≤ δ^{-1},

a contradiction.
The two results in Theorem 18.1.2 and Theorem 18.1.3 combine to tell us that, on
the one hand, the distributions of the chain from different initial conditions draw closer
together as n gets large; and on the other that, in the null recurrent case, the chain
spreads its mass ever more thinly relative to sets of finite π-measure, which describe the
“center” of the space.

and since in general, because of the cyclic behavior in Section 5.4, we may have

lim inf_{n→∞} P^n(x, A) < lim sup_{n→∞} P^n(x, A),

the condition (18.20) is often adopted as the next strongest stability condition after
(18.21).
This motivates the following definitions.
When Φ is irreducible, either all states are positive or all states are null, since for
any w, z there exist r, s such that P^r(w, x) > 0 and P^s(y, z) > 0, and

lim sup_{n→∞} P^{r+s+n}(w, z) ≥ P^r(w, x) [lim sup_{n→∞} P^n(x, y)] P^s(y, z).      (18.23)
We need to show that these solidarity properties characterize positive and null chains in
the sense we have defined them. One direction of this is easy, for if the chain is positive
recurrent, with invariant probability π, then we have for any n
π(y) = Σ_x π(x) P^n(x, y);

hence if lim_{n→∞} P^n(w, w) = 0 for some w, then by (18.23) and dominated convergence
π(y) ≡ 0, which is impossible. The other direction is easy only if one knows, not merely
that lim sup_{n→∞} P^n(x, y) > 0, but that (at least through an aperiodic class) this is
actually a limit. Theorem 18.1.3 now gives this to us.
Theorem 18.2.1. If Φ is irreducible on a countable space, then the chain is positive
recurrent if and only if some one state is positive. When Φ is positive recurrent, for
some d ≥ 1
lim_{n→∞} P^{nd+r}(x, y) = d π(y) > 0

for all x, y ∈ X, and some 0 ≤ r = r(x, y) ≤ d − 1; and when Φ is null

lim_{n→∞} P^n(x, y) = 0

for all x, y ∈ X.
Proof If the chain is transient, then since U (x, y) < ∞ for all x, y from Propo-
sition 8.1.1 we have that every state is null; whilst if the chain is null recurrent, then
since π(y) < ∞ for all y, Theorem 18.1.3 shows that every state is null.
Suppose that the chain is positive recurrent, with period d: then the Aperiodic
Ergodic Theorem for the chain on the cyclic class Dj shows that for x, y ∈ Dj we have
lim_{n→∞} P^{nd}(x, y) = d π(y) > 0
(i) The set A is called null if lim_{n→∞} P^n(x, A) = 0 for all x ∈ A.

(ii) The set A is called positive if lim sup_{n→∞} P^n(x, A) > 0 for all x ∈ A.
We now prove that these definitions are consistent with the definitions of null and
positive recurrence for general ψ-irreducible chains.
Theorem 18.2.2. Suppose that Φ is ψ-irreducible. Then:
(i) the chain Φ is positive recurrent if and only if every set B ∈ B+ (X) is positive;
(ii) if Φ is null, then every petite set is null, and hence there is a sequence of null sets
B_j with ∪_j B_j = X.
Proof If the chain is null, then either it is transient, in which case each petite set
is strongly transient and thus null by Theorem 8.3.5, or it is null and recurrent in which
case, since π exists and is finite on petite sets by Proposition 10.1.2, we have that every
petite set is again null from Theorem 18.1.3.
Suppose the chain is positive recurrent and we have A ∈ B+ (X). For x ∈ D0 ∩ H,
where H is the maximal Harris set and D0 an arbitrary cyclic set, we have for each r
which is positive for some r. Since for every x we have L(x, D0 ∩ H) > 0 we have that
the set A is positive.
Proof   In the null case we do not have boundedness in probability since P^n(x, y) → 0
for all x, y from Theorem 18.2.1.
In the positive case we have on each periodic set Dr a finite probability measure πr
such that if x ∈ D0
lim_{n→∞} P^{nd+r}(x, C) = π_r(C),      (18.24)
so by choosing a finite C such that πr (C) > 1 − ε for all 1 ≤ r ≤ d we have boundedness
in probability as required.
The identical conclusion holds for T-chains. To get the broadest presentation, recall
that a state x^* ∈ X is reachable if

U(y, O) > 0

for every y ∈ X and every open set O containing x^*.
Theorem 18.3.2. Suppose that Φ is a T-chain and admits a reachable state x^*. Then
Φ is a positive Harris chain if and only if it is bounded in probability.
Proof   First note from Proposition 6.2.1 that for a T-chain the existence of just
one reachable state x^* gives ψ-irreducibility, and thus Φ is either positive or null.
Suppose that Φ is bounded in probability. Then Φ is non-evanescent from Propo-
sition 12.1.1, and hence Harris recurrent from Theorem 9.2.2.
Moreover, boundedness in probability implies by definition that some compact set
is non-null, and hence from Theorem 18.2.2 the chain is positive Harris, since compact
sets are petite for T-chains.
Conversely, assume that the chain is positive Harris, with periodic sets Dj each
supporting a finite probability measure πj satisfying (18.24). Choose ε > 0 and compact
sets Cr ⊆ Dr such that πr (Cr ) > 1 − ε for each r.
If x ∈ Dj , then with C := ∪Cr ,
We now show that these topological properties for points can be linked to their
counterparts for the whole chain when the T-chain condition holds. This completes the
series of results begun in Theorem 9.3.3 connecting global properties of T-chains with
those at individual points.
Proof From Proposition 6.2.1 the existence of a reachable state ensures the chain
is ψ-irreducible. Assume that x∗ is positive. Since Φ is a T-chain, there exists an open
petite set C containing x∗ (take any precompact open neighborhood) and hence by
Theorem 18.2.2 the chain is also positive.
Conversely, suppose that Φ has an invariant probability π so that Φ is positive
recurrent. Since x^* is reachable it also lies in the support of π, and consequently any
neighborhood of x^* is in B^+(X). Hence x^* is positive as required, from Theorem 18.2.2.
Proof The identity P Π = Π which is proved in Theorem 12.4.1 implies that for
any f ∈ Cc (X), the adapted process (Π(Φk , f ), FkΦ ) is a bounded martingale. Hence
by the Martingale Convergence Theorem D.6.1 there exists a random variable π̃(f ) for
which
lim_{k→∞} Π(Φ_k, f) = π̃(f)   a.s. [P_*],

which gives π̃(f) = Π(x^*, f) a.s. [P_*]. Taking expectations gives Π(y, f) = E_y[π̃(f)] =
Π(x^*, f) for all y.
Since a finite measure on B(X) is determined by its values on continuous functions
with compact support, this shows that the measures Π(y, · ), y ∈ X, are identical. Let
π denote their common value.
To prove Proposition 18.4.2 we first show that (i) and (iii) are equivalent. To see
that (iii) implies (i), observe that under positivity of x^* we have Π(x^*, X) > 0, and since
Π(y, X) = π(X) does not depend on y it follows from Theorem 12.4.3 that Π(y, X) = 1
for all y. Hence π is an invariant probability, which shows that (i) does hold.
Conversely, if (i) holds, then by reachability of x^* we have x^* ∈ supp π and hence
every neighborhood of x^* is positive. This shows that (iii) also holds.
We now show that (i) is equivalent to (ii).
It is obvious that (i) implies (ii). To see the converse, observe that if (ii) holds, then
by Theorem 12.4.1 we have that π is an invariant probability. Moreover, since x^* is
reachable we must have that π(O) > 0 for any neighborhood O of x^*. Since Π(y, O) =
π(O) for every y, this shows that x^* is positive.
Hence (iii) holds, which implies that (i) also holds.
The next result justifies this definition of aperiodicity and strengthens Theo-
rem 12.4.1.
and hence v := lim_{k→∞} ∫ |P^k f| dπ exists.
Since {P^k f} is equicontinuous on compact subsets of X, there exist a continuous
function g and a subsequence {k_i} ⊂ Z+ for which P^{k_i} f → g as i → ∞ uniformly
on compact subsets of X. Hence we also have P^{k_i + ℓ} f → P^ℓ g as i → ∞ uniformly on
compact subsets of X.
By the Dominated Convergence Theorem we have for all ℓ ∈ Z+,

∫ P^ℓ g dπ = ∫ f dπ   and   ∫ |P^ℓ g| dπ = v.      (18.27)
We will now show that this implies that the function g cannot change signs on supp π.
Suppose otherwise, so that the open sets
so that g ≡ 0 on supp π. This shows that the limit (18.26) holds for all initial conditions
in supp π.
We now show that if a reachable state exists for an e-chain, then the limit in Propo-
sition 18.4.3 holds for each initial condition. A sample path version of Theorem 18.4.4
will be presented below.
Theorem 18.4.4. Suppose that Φ is an e-chain which is bounded in probability on
average. Then:

(i) A unique invariant probability π exists if and only if a reachable state x^* ∈ X
exists.

(ii) If an aperiodic reachable state x^* ∈ X exists, then for each initial state x ∈ X,

P^k(x, · ) →_w π   as k → ∞,      (18.29)

where π is the unique invariant probability for Φ. Conversely, if (18.29) holds for
all x ∈ X, then every state in supp π is reachable and aperiodic.
Proof The proof of (i) follows immediately from Proposition 18.4.2, and the con-
verse of (ii) is straightforward.
To prove the remainder, we assume that the state x^* ∈ X is reachable and aperiodic,
and show that (18.29) holds for all initial conditions.
Suppose that ∫ f dπ = 0, |f(x)| ≤ 1 for all x, and for fixed ε > 0 define the set O_ε;
it can then be shown that

lim_{N→∞} P^N(x, O_ε) = 1.
18.5 The LLN for e-chains

Define the occupation probabilities

µ̃_n{A} := S_n(I_A) = (1/n) Σ_{k=1}^n I{Φ_k ∈ A},   n ∈ Z+, A ∈ B(X).      (18.30)
Observe that {µ̃k } are not probabilities in the usual sense, but are probability-valued
random variables.
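On a finite state space the occupation probabilities can be computed along a simulated path and compared with π directly. A minimal sketch, with a kernel and test function of our own choosing:

    import numpy as np

    rng = np.random.default_rng(1)

    P = np.array([[0.2, 0.8, 0.0],
                  [0.3, 0.4, 0.3],
                  [0.0, 0.7, 0.3]])

    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()

    # Simulate the chain and form the occupation probabilities (18.30).
    n_steps = 100_000
    x, counts = 0, np.zeros(3)
    for _ in range(n_steps):
        x = rng.choice(3, p=P[x])
        counts[x] += 1
    mu_tilde = counts / n_steps

    f = np.array([0.0, 1.0, 5.0])           # a bounded test function
    print("mu_n(f) =", mu_tilde @ f, "  pi(f) =", pi @ f)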
The Law of Large Numbers (Theorem 17.1.2) states that if an invariant probability
measure π exists, then the occupation probabilities converge with probability one for
each initial condition lying in a set of full π-measure. We now present two versions of
the law of large numbers for e-chains where the null set appearing in Theorem 17.1.2
is removed by restricting consideration to continuous, bounded functions. The first is
a Weak Law of Large Numbers, since the convergence is only in probability, while the
second is a Strong Law with convergence occurring almost surely.
(ii) If for each initial condition of the Markov chain the occupation probabilities are
almost surely tight, then as k → ∞
µ̃_k →_w π   a.s. [P_*].      (18.31)
| (1/N) Σ_{k=1}^N P_M f(Φ_k) − ∫ f dπ | ≤ ε + (1/N) Σ_{i=1}^N I{Φ_i ∈ C^c},      (18.32)

where P_M := M^{-1} Σ_{n=1}^M P^n.
For each fixed n and N we may write

(1/N) Σ_{k=1}^N f(Φ_k) − ∫ f dπ = (1/N) Σ_{k=1}^N [ f(Φ_k) − P^n f(Φ_{k−n}) ]
   + (1/N) Σ_{k=1}^N [ P^n f(Φ_{k−n}) − P^n f(Φ_k) ]
   + (1/N) Σ_{k=1}^N [ P^n f(Φ_k) − ∫ f dπ ],

and hence, averaging over 1 ≤ n ≤ M,

(1/N) Σ_{k=1}^N f(Φ_k) − ∫ f dπ = (1/M) Σ_{n=1}^M (1/N) Σ_{k=1}^N [ f(Φ_k) − P^n f(Φ_{k−n}) ]
   + (1/N) Σ_{k=1}^N [ (1/M) Σ_{n=1}^M P^n f(Φ_k) − ∫ f dπ ]
   + (1/M) Σ_{n=1}^M (1/N) Σ_{k=1}^N [ P^n f(Φ_{k−n}) − P^n f(Φ_k) ].
The final term telescopes in k, and hence, recalling our definition of the transition
function P_M, we have
| (1/N) Σ_{k=1}^N f(Φ_k) − ∫ f dπ | ≤ Σ_{i=0}^{M−1} (1/N) | Σ_{k=1}^N [ P^i f(Φ_{k−i}) − P^{i+1} f(Φ_{k−i−1}) ] |

   + | (1/N) Σ_{k=1}^N P_M f(Φ_k) − ∫ f dπ |

   + 2M/N.      (18.34)
For each fixed 0 ≤ i ≤ M − 1 the sequence

( P^i f(Φ_{k−i}) − P^{i+1} f(Φ_{k−i−1}), F^Φ_{k−i} ),   k > i,

is a bounded martingale difference sequence.
≤ (1/(γ − ε)) lim sup_{N→∞} E_x[ (1/N) Σ_{i=1}^N I{Φ_i ∈ C^c} ].
Since Φ is bounded in probability on average, the right hand side decreases to zero as
C ↑ X, which completes the proof of (i).
To prove (ii), suppose that the occupation probabilities {µ̃k } are tight along some
sample path. Then we may choose the compact set C in (18.32) so that along this
sample path
lim sup_{N→∞} | (1/N) Σ_{k=1}^N P_M f(Φ_k) − ∫ f dπ | ≤ 2ε,
so that the Strong Law of Large Numbers holds for all f ∈ C(X) and all initial conditions
x ∈ X.
Let {fn } be a sequence of continuous functions with compact support which is dense
in Cc (X) in the uniform norm. Such a sequence exists by Proposition D.5.1. Then by
the preceding result,
P_x{ lim_{k→∞} ∫ f_n dµ̃_k = ∫ f_n dπ for each n ∈ Z+ } = 1,
which implies that µ̃_k →_v π as k → ∞. Since π is a probability, this shows that in fact
µ̃_k →_w π a.s. [P_*], and this completes the proof.
We conclude by stating a result which, combined with Theorem 18.5.1, provides a
test function approach to establishing the Law of Large Numbers for Φ. For a proof
see [259].
Theorem 18.5.2. If a coercive function V and a compact set C satisfy condition (V4),
then Φ is bounded in probability, and the occupation probabilities are almost surely tight
for each initial condition. Hence, if Φ is an e-chain, and if a reachable state exists,
w
µ̃k −→ π as k → ∞ a.s. [P∗ ]. (18.35)
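Condition (V4) is often easy to check by a short computation. The sketch below verifies the geometric drift ∆V(x) ≤ −βV(x) + b I_C(x) for a reflected random walk on Z+ with an increment law of our own choosing and V(x) = e^{αx}; the model and constants are illustrative, not taken from the text.

    import numpy as np

    # Reflected random walk on Z+ with increment law of our choosing:
    # P(Z=-1)=0.5, P(Z=0)=0.3, P(Z=+1)=0.2 (negative mean drift).
    alpha = 0.2
    probs = {-1: 0.5, 0: 0.3, 1: 0.2}
    V = lambda x: np.exp(alpha * x)

    def PV(x):
        # E[V(Phi_1) | Phi_0 = x], with reflection at zero
        return sum(p * V(max(x + z, 0)) for z, p in probs.items())

    # For x >= 1 the ratio PV(x)/V(x) equals E[exp(alpha Z)] = lam < 1,
    # so (V4) holds with C = {0}, beta = 1 - lam and b = PV(0).
    lam = max(PV(x) / V(x) for x in range(1, 50))
    assert lam < 1
    print("lam =", lam, " beta =", 1 - lam, " b =", PV(0))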
18.6 Commentary
Theorem 18.1.2 for positive recurrent chains is first proved in Orey [308], and the null
recurrent version we give here is in Jamison and Orey [177]. The dependent coupling
which we use to prove this result for null recurrent chains is due to Ornstein [310],
[311], and is also developed in Berbee [25]. Our presentation of this material has relied
heavily on Nummelin [303], and further related results can be found in his Chapter 6.
Theorem 18.1.3 is due to Jain [171], and our proof is taken from Orey [309].
The links between positivity of states, boundedness in probability, and positive
Harris recurrence for T-chains are taken from Meyn [259], Meyn and Tweedie [277]
and Tuominen and Tweedie [391]. In [277] analogues of Theorem 18.3.2 and Proposi-
tion 18.3.3 are obtained for non-irreducible chains.
The convergence result Theorem 18.4.4 for chains possessing an aperiodic reachable
state is based upon Theorem 8.7.2 of Feller [115].
The use of the martingale property of Π(Φk , f ) to obtain uniqueness of the invariant
probability in Proposition 18.4.2 is originally in [175]. This is a powerful technique which
is perhaps even more interesting in the absence of a reachable state.
For suppose that the chain is bounded in probability but a reachable state does
not exist, and define an equivalence relation on X as follows: x ↔ y if and only if
Π(x, · ) = Π(y, · ). It follows from the same techniques which were used in the proof of
Proposition 18.4.2 that if x is recurrent, then the set E^x of all states y for which y ↔ x
is closed. Since x ∈ E^x for every recurrent point x ∈ R, the set F = X − ∪_{x∈R} E^x consists
entirely of non-recurrent points. It then follows from Proposition 3.3 of Tuominen and
Tweedie [392] that F is transient.
From this decomposition and Proposition 18.4.3 it is straightforward to generalize
Theorem 18.4.4 to chains which do not possess a reachable state. The details of this
decomposition are spelled out in Meyn and Tweedie [281].
Such decompositions have a large literature for Feller chains and e-chains: see for
example Jamison [175] and also Rosenblatt [337] for e-chains, and Jamison and Sine
[178], Sine [358, 357, 356] and Foguel [121, 123] for Feller chains and the detailed
connections between the Feller property and the stronger e-chain property. All of these
papers consider exclusively compact state spaces. The results for non-compact state
spaces appear here for the first time.
The LLN for e-chains is originally due to Breiman [46] who considered Feller chains
on a compact state space. Also on a compact state space is Jamison’s extension of
Breiman’s result [174] where the LLN is obtained without the assumption that a unique
invariant probability exists.
One of the apparent difficulties in establishing this result is finding a candidate
limit π̃(f) of the sample path averages n^{-1} S_n(f). Jamison resolved this by considering
the transition function Π, and the associated convergent martingale (Π(Φk , A), FkΦ ). If
the chain is bounded in probability on average, then we define the random probability
π̃ as
π̃{A} := lim_{k→∞} Π(Φ_k, A),   A ∈ B(X).      (18.36)
It is then easy to show by modifying (18.34) that Theorem 18.5.1 continues to hold
with ∫ f dπ replaced by ∫ f dπ̃, even when no reachable state exists for the chain. The
proof of Theorem 18.5.1 can be adapted after it is appropriately modified using the
limit (18.36).
Chapter 19
Generalized classification
criteria
We have now developed a number of simple criteria, solely involving the one-step transi-
tion function, which enable us to classify quite general Markov chains. We have seen, for
example, that the equivalences in Theorem 11.0.1, Theorem 13.0.1, or Theorem 15.0.1
give an effective approach to the analysis of many systems.
For more complex models, however, the analysis of the simple one-step drift
∆V(x) = ∫ P(x, dy) [V(y) − V(x)]
towards petite sets may not be straightforward, or indeed may even be impracticable.
Even though we know from the powerful converse theory in the theorems just mentioned
that for most forms of stability, there must be at least one V with the one-step drift
∆V suitably negative, finding such a function may well be non-trivial.
In this chapter we conclude our approach to stochastic stability by giving a number
of more general drift criteria which enable the classification of chains where the one-step
criteria are not always straightforward to construct. All of these variations are within
the general framework described previously. The steps to be used in practice are, we
hope, clear from the preceding chapters, and follow the route reiterated in Appendix A.
There are three generalizations of the drift criteria which we consider here.
(a) State-dependent drift conditions, which allow for negative drifts after a number of
steps n(x) depending on the state x from which the chain starts.
(b) Path- or history-dependent drift conditions, which allow for functions of the whole
past of the process to show a negative drift.
(c) Mixed or “average” drift conditions, which allow for functions whose drift varies
in direction, but which is negative in a suitably “averaged” way.
For each of these we also indicate the application of the method by example. The
state-dependent drift technique is used to analyze random walk on R2+ and a model
and let Φ̂ be the corresponding Markov chain. This Markov chain may be constructed
explicitly as follows. The time n(x) is a (trivial) stopping time. Let s(k) denote its
iterates: that is, along any sample path, s(0) = 0, s(1) = n(x) and

s(k + 1) = s(k) + n(Φ_{s(k)}),   k ≥ 1.
We let τ̂A , σ̂A denote the first-return and first-entry index to A respectively for the
chain Φ̂. Clearly s(k) and the events {σ̂A ≥ k}, {τ̂A ≥ k} are F̂k −1 -measurable for any
A ∈ B(X).
Note that s(τ̂_C) denotes the time of first return to C by the original chain Φ along
an embedded path, defined by

s(τ̂_C) := Σ_{k=0}^{τ̂_C − 1} n(Φ̂_k).      (19.4)
These relations will enable us to use the drift equations (19.1), with which we will
bound the index at which Φ̂ reaches C, to bound the hitting times on C by the original
chain.
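The embedded chain is straightforward to realize along a simulated path. A minimal sketch, in which the underlying chain and the sampling function n(·) are illustrative choices of ours:

    import numpy as np

    rng = np.random.default_rng(2)

    def step(x):
        # reflected random walk on Z+ with negative mean increments
        return max(x + rng.choice([-1, 0, 1], p=[0.5, 0.3, 0.2]), 0)

    def n_of(x):
        return 1 + x                  # look further ahead from large states

    T = 10_000
    path = np.empty(T + 1, dtype=int)
    path[0] = 20
    for t in range(T):
        path[t + 1] = step(path[t])

    # s(0) = 0, s(k+1) = s(k) + n(Phi_{s(k)}); Phi_hat_k = Phi_{s(k)}.
    s, phi_hat = 0, [path[0]]
    while s + n_of(path[s]) <= T:
        s += n_of(path[s])
        phi_hat.append(path[s])

    print("first embedded samples:", phi_hat[:10])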
We first give a state-dependent criterion for Harris recurrence.
Theorem 19.1.1. Suppose that Φ is a ψ-irreducible chain on X, and let n(x) be a
function from X to Z+ . The chain is Harris recurrent if there exists a non-negative
function V unbounded off petite sets and some petite set C satisfying
∫ P^{n(x)}(x, dy) V(y) ≤ V(x),   x ∈ C^c.      (19.6)
Firstly, if the chain is transient, then by Theorem 8.3.5 each Cn is uniformly transient,
and hence V (Φk ) → ∞ as k → ∞ a.s. [P∗ ], and so (19.7) holds.
Secondly, if Φ is recurrent, then the state space may be written as
X=H ∪N (19.8)
where H = N c is a maximal Harris set and ψ(N ) = 0; this follows from Theorem 9.0.1.
Since for each n the set C_n is petite, each C_n leads to H, and hence by Theorem 9.1.3,
It follows that the inclusion {lim inf V (Φn ) < ∞} ⊂ {Φ ∈ H i.o.} holds with probability
one. Thus (19.7) holds for any x0 ∈ N , and if the chain is not Harris, we know N is
non-empty.
Now from (19.7) there exists M ∈ Z+ with

P_{x_0}( {Φ_k ∈ C^c, k ≥ M} ∩ {V(Φ_k) → ∞} ) > 0;
Hence (Mk , F̂k ) is a positive supermartingale, so that from Theorem D.6.2 there exists
an almost surely finite random variable M∞ such that Mk → M∞ a.s. as k → ∞. From
the construction of Mi , either σ̂C < ∞ in which case M∞ = 0, or σ̂C = ∞ in which
case lim supk →∞ V (Φ̂k ) = M∞ < ∞ a.s.
Since σC < ∞ whenever σ̂C < ∞, this shows that for any initial distribution µ,
P_µ( {σ_C < ∞} ∪ {lim inf_{n→∞} V(Φ_n) < ∞}^c ) = 1.
Proof The state-dependent drift criterion for positive recurrence is a direct con-
sequence of the f -ergodicity results of Theorem 14.2.2, which tell us that without any
irreducibility or other conditions on Φ, if f is a non-negative function and
∫ P(x, dy) V(y) ≤ V(x) − f(x) + b I_C(x),   x ∈ X,      (19.12)
Again define the chain Φ̂ as in (19.3). From (19.10) we can use (19.13) for Φ̂, with
f (x) taken as n(x), to deduce that
E_x[ Σ_{k=0}^{τ̂_C − 1} n(Φ̂_k) ] ≤ V(x) + b.      (19.14)
But we have by adding the lengths of the embedded times n(x) along any sample path
that from (19.4)
Σ_{k=0}^{τ̂_C − 1} n(Φ̂_k) = s(τ̂_C) ≥ τ_C.
Thus from (19.14) and the fact that V is bounded on the petite set C, we have that Φ
is positive Harris using the one-step criterion in Theorem 13.0.1, and the bound (19.11)
follows also from (19.14).
We conclude the section with a state-dependent criterion for geometric ergodicity.
Theorem 19.1.3. Suppose that Φ is a ψ-irreducible chain on X, and let n(x) be a
function from X to Z+ . The chain is geometrically ergodic if it is aperiodic and there
exists some petite set C, a non-negative function V ≥ 1 and bounded on C, and positive
constants λ < 1 and b < ∞ satisfying
∫ P^{n(x)}(x, dy) V(y) ≤ λ^{n(x)} [V(x) + b I_C(x)].      (19.15)
We now adapt the proof of Theorem 15.2.5. Define the random variables

Z_k = κ^{s(k)} V(Φ̂_k)

for k ∈ Z+. It follows from (19.18) that for κ = λ^{-1}, since κ^{s(k+1)} is F̂_k-measurable,
Collapsing the sum on the left and using the fact that only the first term in the sum
on the right is non-zero, we get
Since V < ∞ and V is assumed bounded on C, and again using the fact that s(τ̂C ) > τC ,
we have from Theorem 15.0.1 (ii) that the chain is geometrically ergodic.
The final bound in (19.16) comes from the fact that for some r, an upper bound on
the state-dependent constant term in (19.16) is shown in Theorem 15.4.1 to be given
by

R(x) = E_x[κ^{τ_C}] ≤ E_x[κ^{s(τ̂_C)}] ≤ (2 + b) V(x)

since V ≥ 1.
(Φ_n(1), Φ_n(2)) = ([Φ_{n−1}(1) + Z_n(1)]_+, [Φ_{n−1}(2) + Z_n(2)]_+).      (19.20)
Let us assume that for each coordinate we have negative mean increments: that is,

E[Z_n(1)] < 0,   E[Z_n(2)] < 0.

This assumption ensures that the chain is a δ_{(0,0)}-irreducible chain with all compact
sets petite. To see this note that there exists h > 0 such that

P(Z_k(1) < −h) > h,   P(Z_k(2) < −h) > h.

This provides δ_{(0,0)}-irreducibility, and moreover shows that S_w is small, with ν = δ_{(0,0)}
in (5.14).
We will also assume that the second moments of the increments are finite:

E[Z_n(1)²] < ∞,   E[Z_n(2)²] < ∞.

Thus it follows from Proposition 14.4.1 that each of the marginal random walks on
[0, ∞) is positive Harris with stationary measures π_1, π_2 satisfying

β_1 := ∫ z π_1(dz) < ∞,   β_2 := ∫ z π_2(dz) < ∞.      (19.21)
Choose M < ∞ and ε > 0 such that

E[Z_k(1) I{Z_k(1) ≥ −M}] < −ε,   E[Z_k(2) I{Z_k(2) ≥ −M}] < −ε.

This ensures that on the set A(M) = {x ≥ M, y ≥ M}, we have that (19.10) holds with
n(x, y) = 1 in the usual manner.
Now consider the strip A1 (M, m) = {x ≤ M, y ≥ m}, and fix (x, y) ∈ A1 (M, m).
Let us choose a given fixed number of steps n, and choose m > (M + 1)n. At each
step in the time period {0, . . . , n} the expected value of Φn (2) decreases in expectation
by at least ε. Moreover, from (19.21) and the f -norm ergodic result (14.5) we have that
by convergence there is a constant c0 such that for all n
E_{(x,y)}[Φ_n(1) I{τ_0 > n}] ≤ E_{(M,y)}[Φ_n(1) I{τ_0 > n}] =: ζ_M(n),      (19.23)

and hence

E_{(x,y)}[Φ_n(1) + Φ_n(2)] = E_{(x,y)}[Φ_n(2)] + E_{(x,y)}[Φ_n(1) I{τ_0 > n}]
   ≤ y − nε + ε_0 + c_0.
Thus for x ≤ M , we have uniform negative n-step drift in the region A1 (M, m) provided
nε > M + ε0 + c0
as required.
A similar construction enables us to find that for fixed large n the n-step drift in
the region A2 (m, M ) is negative also. Thus we have shown
Theorem 19.1.4. If the bivariate random walk on R2+ has negative mean increments
and finite second moments in both coordinates, then it is positive Harris recurrent, and
for sets A(m) = {x ≥ m, y ≥ m} with m large, and some constant c,
In this example, we do not use the full power of the results of Section 19.1. Only
three values of n(x, y) are used, and indeed it is apparent from the construction in
(19.24) that we could have treated the whole chain on the region
{x ≥ M + n} ∪ {y ≥ M + n}
for the same n. In this case the n-skeleton {Φ_{nk}} would be shown to be positive
recurrent, and it follows from the fact that the invariant measure for {Φ_k} is also
invariant for {Φ_{nk}} that the original chain is positive Harris: see Chapter 10. This
example does, however, indicate the steps that we could go through to analyze less
homogeneous models, and also indicates that it is easier to analyze the boundaries
or non-standard regions independently of the interior or standard region of the space
without the need to put the results together for a single fixed skeleton.
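The walk (19.20) is also easy to simulate, and the finiteness of mean return times guaranteed by the drift argument can be observed empirically. In the sketch below the common increment law (with negative mean in each coordinate) is an illustrative choice of ours:

    import numpy as np

    rng = np.random.default_rng(3)
    m = 5                                  # the corner set {x <= m, y <= m}

    def increment():
        # negative mean (-0.7) in each coordinate; our own choice
        return rng.choice([-2, -1, 0, 1], p=[0.3, 0.3, 0.2, 0.2], size=2)

    def return_time(start, cap=100_000):
        x, y = start
        for t in range(1, cap):
            dx, dy = increment()
            x, y = max(x + dx, 0.0), max(y + dy, 0.0)
            if x <= m and y <= m:
                return t
        return cap

    times = [return_time((20.0, 20.0)) for _ in range(200)]
    print("estimated mean return time to the corner set:", np.mean(times))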
Note that the chain is δ(0,0) -irreducible under assumptions (A1)–(A3), regardless of
the behavior at zero. Thus the model can be formulated to allow for a stationary
distribution at (0, 0) (i.e., extinction) or for rebirth and a more generally distributed
stationary distribution over the whole of Z²_+. The only restriction we place in general
is that the increments from (0, 0) have finite mean: here we will not make this more
explicit as it does not affect our analysis.
Let us, to avoid unrewarding complexities, add to (19.26) the additional condition
that the model is “left-continuous,” that is, has bounded negative increments defined
by
P((i, j), (i − l, j − k)) = 0,   i, j > 0,   k, l > 1;      (19.29)
this would be appropriate if the chain were embedded at the jumps of a continuous time
process, for example.
To evaluate positive recurrence of the model, we use the test function V (i, j) =
[i + j]/β, where β < ε is to be chosen.
Analysis of this model in the interior of the space is not difficult: by using (V2)
with V(i, j) on I = {i, j ≥ 1}, we have that E_{i,j}[τ_{I^c}] < (i + j)/β from assumption (A1).
The difficulty with such multidimensional models is that even though they reach I c in
a finite mean time, they may then “escape” along one or both of the boundaries. It is
in this region that the tools of Section 19.1 are useful in assisting with the classification
of the model.
Starting at B_1(c) = {(i, 0), i > c}, the infinite boundary edge above c, we have that
the one-step mean drift of V is negative if c > d, so that (19.10) also holds with n = 1
provided we choose c > max(d, β^{-1}).
On the other infinite boundary edge, denoted B2 (c) = {(0, j), j > c}, however, we
have positive one-step drift of the function V . Now from the starting point (0, j), let
us consider the (j + 1)-step drift. This is bounded above by [j + d − 2jε]/β and so we
have (19.10) also holds with n(j) = j + 1 provided
[j + d − 2jε]/β < −j − 1,
which will hold provided β < 2ε − 1 and we then choose c > (d + β)/(2ε − 1 − β).
Consequently we can assert that, writing C^c = I ∪ B_2(c) ∪ B_1(c) with c satisfying
both these constraints, the mean hitting time on C is bounded from each starting point,
regardless of the threshold level d, and so the invading strategy is successful in over-
coming the antibody defense.
Note that in this model there is no fixed time at which the drift from all points on
the boundary B_2(c) is uniformly negative, no matter what the value of c chosen. Thus,
state-dependent drift conditions appear to be needed to analyze this model.
To test for geometric ergodicity we use the function V (i, j) = exp(αi) + exp(αj) and
adopt the approach in Section 16.3.
We assume that the increments in the model have uniformly geometrically decreasing
tails and bounded second moments: specifically, we assume each coordinate process
satisfies, for some γ > 0,

θ_i(γ) := Σ_{k≥−1} exp(γk) P_{i,j}(Φ_1(1) = i + k) < ∞,   j ≥ 1,
θ_j(γ) := Σ_{k≥−1} exp(γk) P_{i,j}(Φ_1(2) = j + k) < ∞,   i ≥ 1,      (19.30)

and

Σ_{k≥−1} k² P_{i,j}(Φ_1(1) = i + k) < D_1,   j ≥ 1,
Σ_{k≥−1} k² P_{i,j}(Φ_1(2) = j + k) < D_2,   i ≥ 1.      (19.31)
Then on the interior set I we have, for α < γ,

Σ_{k,l} P((r, s), (k, l)) V(k, l) − V(r, s) ≤ exp(αr)[θ_r(α) − 1] + exp(αs)[θ_s(α) − 1]
   ≤ −(αε/2) exp(αr) − (αε/2) exp(αs)

for small enough α, using a Taylor series expansion and the uniform conditions (19.30)
and (19.31). Thus (19.15) holds with n = 1 and λ = 1 − αε/2.
Starting at B_1(c), (19.15) also obviously holds provided we choose c large enough.
On the other infinite boundary edge B_2(c) = {(0, j), j > c} we have a similar construc-
tion for the (j + 1)-step drift. We have, using the uniform bounds (19.31) assumed on
the variances,

Σ_{k,l} P^{j+1}((0, j), (k, l)) V(k, l) ≤ exp(α(j + d))[1 − ε/2]^j + exp(αj)[1 − ε/2]^j,

and so, for α suitably small, we have (19.15) holding again as required.
As usual the condition that σC > k means that Φi ∈ C c for each i between 0 and
k. Since C will usually be assumed “small” in some sense (either petite or compact),
(19.32) implies that there is a drift towards the “center” of the state space when Φ is
“large” in exactly the same way that (V2) does.
From these generalized drift conditions and Dynkin’s formula we find
Theorem 19.2.1. If {V_k} satisfies (19.32), then

E_x[τ_C] ≤ ε^{-1} V_0(x),   x ∈ C^c,
E_x[τ_C] ≤ 1 + ε^{-1} P V_0(x),   x ∈ C.
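In the special case V_k = V(Φ_k) of a one-step drift, the bound of Theorem 19.2.1 is easily checked by simulation. The sketch below uses a reflected random walk of our own choosing with mean increment −0.3, so that V(x) = x drifts down by ε = 0.3 off C = {0} and E_x[τ_C] ≤ x/0.3:

    import numpy as np

    rng = np.random.default_rng(4)
    eps = 0.3                              # drift of V(x) = x off C = {0}

    def tau_C(x):
        # first hitting time of C = {0} for the reflected walk
        t = 0
        while x > 0:
            x = max(x + rng.choice([-1, 0, 1], p=[0.5, 0.3, 0.2]), 0)
            t += 1
        return t

    x0 = 30
    est = np.mean([tau_C(x0) for _ in range(500)])
    print("estimated E_x[tau_C] =", est, "  bound =", x0 / eps)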
Suppose instead that the adapted process {V_k} is a submartingale off the set C:

E[V_{k+1} | F_k^Φ] ≥ V_k   when σ_C > k, k ∈ Z+.      (19.33)
Clearly the process Vk ≡ 1 satisfies (19.33), so we will need some auxiliary conditions
to prove anything specific when (19.33) holds.
Theorem 19.2.2. Suppose that {Vk } satisfies (19.33), and let x0 ∈ C c be such that
Suppose moreover that the conditional absolute increments have bounded means: that is,
for some constant B < ∞,

E[ |V_k − V_{k−1}| | F^Φ_{k−1} ] ≤ B.      (19.35)
Then Ex 0 [τC ] = ∞.
Proof The proof of Theorem 11.5.1 goes through without change, although in this
case the functions Vk in that proof are not taken simply as V (Φk ) but as Vk (Φ0 , . . . , Φk ).
Proof   Let λ < ρ < 1, and define the precompact set C and the constant ε > 0 by

C = { x ∈ X : V(x) ≤ 2L/(ρ − λ) + 1 },   ε = (ρ − λ)/2.
Then for all k ∈ Z+,

E[V_{k+1} | F_k^Φ] ≤ ρ V_k + [ L + (ρ − λ) − ((ρ − λ)/2)(V(Φ_k) + 1) ]^+ − ((ρ − λ)/2)(V(Φ_k) + 1).
0 ≤ E_x[Z_{τ_C ∧ m}] = E_x[Z_1] + E_x[ Σ_{k=2}^{τ_C ∧ m} ( E[Z_k | F^Φ_{k−1}] − Z_{k−1} ) I(τ_C ≥ 2) ]

   ≤ E_x[Z_1] − E_x[ Σ_{k=2}^{τ_C ∧ m} ε r^k f(Φ_{k−1}) I(τ_C ≥ 2) ].
This and the Monotone Convergence Theorem shows that for all x ∈ X,

E_x[ Σ_{k=1}^{τ_C} r^k f(Φ_{k−1}) ] ≤ ε^{-1} r E_x[V_1] + r V(x).
This completes the proof, since Ex [V1 ] + V (x) ≤ λV0 (x) + L + V0 (x) by (19.37) and
(19.36).
when σ_C > k, k ∈ Z+,

where µ = P^M(x, · ).
This time let Mi = Vi I{σC ≥ i}. Again we have that (Mk , FkΦ ) is a positive
supermartingale, since
Hence there exists an almost surely finite random variable M∞ such that Mk → M∞
as k → ∞.
But as in Theorem 9.4.1, either σC < ∞ in which case M∞ = 0, or σC = ∞ which
contradicts (19.39). Hence Φ is again non-evanescent.
The Harris recurrence when Φ is a T-chain follows as usual by Theorem 9.2.2.
Finally, we give a criterion for transience using a time-varying test function.
E[V_{k+1} | F_k^Φ] ≥ V_k   when σ_A > k, k ∈ Z+.      (19.40)
Theorem 19.2.5. Suppose that the process Vk satisfies (19.40) for a set A, and suppose
that for deterministic constants L > M ,
V_k ≤ L,   I{σ_A = k} V_k ≤ M,   k ∈ Z+.
and the adapted process (Mk , FkΦ ) is thus a submartingale. Hence (L − Mk , FkΦ ) is a
positive supermartingale. By Kolmogorov’s Inequality (Theorem D.6.3) it follows that
for any T > 0

P_x{ sup_{k≥0} (L − M_k) ≥ T } ≤ (L − M_0(x)) / T.
Letting T = L − M, and noting that M_0(x) ≥ V_0(x), gives

P_x{ inf_{k≥0} M_k ≤ M } ≤ (L − V_0(x)) / (L − M).
Finally, since M_k = M for all k sufficiently large whenever σ_A < ∞, it follows that

P_x{σ_A = ∞} ≥ P_x{ inf_{k≥0} M_k > M } ≥ (V_0(x) − M) / (L − M),
which is the desired bound.
Proof   It follows as in the proof of Proposition 17.3.5 that the joint process
Φ_k = (θ_k, Y_k), k ≥ 0, is an aperiodic, ψ-irreducible T-chain.
In view of Theorem 19.2.3 it is enough to show that the history-dependent drift
(19.37) holds for an adapted process {Vk }. We now indicate how to construct such a
process.
First use the estimate x ≤ e^{-1} e^x to show

| Π_{i=j}^k θ_i | ≤ Π_{i=j}^k e^{-1} e^{|θ_i|} = e^{-(k−j+1)} exp( Σ_{i=j}^k |θ_i| ).      (19.42)
Since

Σ_{i=j}^k |θ_i| ≤ |α| Σ_{i=j}^k |θ_i| + |α| |θ_{j−1}| + Σ_{i=j}^k |Z_i|,

we have

Σ_{i=j}^k |θ_i| ≤ (|α|/(1 − |α|)) |θ_{j−1}| + (1/(1 − |α|)) Σ_{i=j}^k |Z_i|,      (19.43)
| Π_{i=j}^k θ_i | ≤ e^{-(k−j+1)} exp{ (|α|/(1 − |α|)) |θ_{j−1}| } × exp{ (1/(1 − |α|)) Σ_{i=j}^k |Z_i| }.      (19.44)
Squaring both sides of (17.28) and applying (19.44), we obtain a bound on Y_k² through
the quantities

A_k = ( Σ_{j=1}^k |W_j| exp{ (|α|/(1 − |α|)) |θ_{j−1}| } exp{ Σ_{i=j}^k ( (1/(1 − |α|)) |Z_i| − 1 ) } )²,

B_k = θ_0² Y_0² exp{ (2|α|/(1 − |α|)) |θ_0| } exp{ Σ_{i=1}^k ( (2/(1 − |α|)) |Z_i| − 2 ) }.
If we define

C_k = exp{ (2|α|/(1 − |α|)) |θ_k| },
we have the three bounds, valid for any ε > 0,
This is shown in [275] and we omit the details which are too lengthy for this exposition.
The constant ε will be assumed small, but we will keep it free until we have performed
one more calculation. For k ≥ 0 we make the definition

V_k = ε³ Y_k² + ε² A_k + B_k + C_k.

E[V_{k+1} | F_k^Φ] ≤ 3ε³ A_k + 3ε³ B_k + ζ_z² ε² (1 + ε) A_k + ζ_z² ε² (1 + ε^{-1}) E[W²] C_k
   + ζ_z² B_k + |α| C_k + R.
are all well defined and finite everywhere. Obviously we need a little less than (19.46)
to guarantee this, but (19.46) will also be a convenient condition elsewhere.
Theorem 19.3.1. Suppose that Φ is ψ-irreducible and that V ≥ 0 satisfies (19.46). A
sufficient condition for the chain to be positive is that for some one x ∈ X and some
petite set C
lim inf_{n→∞} n^{-1} Σ_{k=1}^n ∫_{C^c} P^k(x, dy) ∆V(y) < 0.      (19.48)
where all the terms in (19.49) are finite by induction and (19.46). By iteration, we then
get
n^{-1} ∫ P^{n+1}(x, dy) V(y) = n^{-1} Σ_{k=1}^n ∫ P^k(x, dy) ∆V(y) + n^{-1} [∆V(x) + V(x)],

so that as n → ∞

lim inf_{n→∞} n^{-1} Σ_{k=1}^n ∫ P^k(x, dy) ∆V(y) ≥ 0.      (19.50)
Now suppose by way of contradiction that Φ is null; then from Theorem 18.2.2 we have
that the petite set C is null, and so for every x we have by the bound in (19.46)
lim_{n→∞} ∫_C P^n(x, dy) ∆V(y) = 0.
This, together with (19.50), cannot be true when we have assumed (19.48); so the chain
is indeed positive.
There is a converse to this result. We first show that for positive chains and suitable
functions V , the drift ∆V , π-averaged over the whole space, is in fact zero.
We first show that |M_z(x)| is uniformly bounded for x ∈ X and z ∈ (1/2, 1) under the
bound (19.46).
By the Mean Value Theorem and non-negativity of V we have for any 0 < z < 1,

|z^{V(x)} − z^{V(y)}| ≤ |V(x) − V(y)| sup_{t≥0} |(d/dt) z^t| = |V(x) − V(y)| |log(z)|.      (19.52)
Hence under (19.46), for all x ∈ X and z ∈ (0, 1),

|M_z(x)| ≤ (|log(z)|/(1 − z)) ∫ P(x, dy) |V(x) − V(y)| ≤ d |log(z)|/(1 − z),      (19.53)

which establishes the claimed boundedness of |M_z(x)|.
Moreover, by (19.52) and dominated convergence,

lim_{z↑1} M_z(x) = ∫ P(x, dy) lim_{z↑1} [ (z^{V(x)} − z^{V(y)}) / (1 − z) ] = ∆V(x).      (19.54)
Since ∫ π(dx) z^{V(x)} < ∞ for fixed z ∈ (0, 1), we can interchange the order of integration
and find

∫ π(dx) M_z(x) = ∫ π(dx) ∫ P(x, dy) [z^{V(x)} − z^{V(y)}] / [1 − z] = 0.
Thus, under these conditions, (19.48) is necessary and sufficient for positivity.
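The identity ∫ π(dx) ∆V(x) = 0 at the heart of this argument is transparent on a finite space, where it reduces to the invariance πP = π. A quick illustration, with kernel and function of our own choosing:

    import numpy as np

    P = np.array([[0.1, 0.9, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.3, 0.3, 0.4]])
    V = np.array([0.0, 2.0, 7.0])

    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()

    # pi(PV) = pi(V) by invariance, so the pi-average of the drift vanishes.
    delta_V = P @ V - V
    print("pi(Delta V) =", pi @ delta_V)    # ~ 0 up to rounding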
Now provided the set C^c is in B^+(X), we show the right hand side of (19.60) is strictly
positive. To see this requires two steps.
First observe that ∫_C π(dx) P(x, C^c) > 0 since C, C^c ∈ B^+(X). Since V(y) >
sup_{w∈C} V(w) for y ∈ C^c we have

∫_C π(dx) ∫_{C^c} P(x, dw) V(w) > sup_{w∈C} V(w) ∫_C π(dx) P(x, C^c),      (19.61)
provided that V does not vanish on C. If V does vanish on C, then (19.64) holds
automatically.
But now, under (19.46) we have ∫ π(dx) ∆V(x) = 0 from (19.51), and so (19.56) is
a consequence of this and (19.64). Since ∆V(y) is bounded under (19.46), (19.57) is
actually identical to (19.56) and the theorem is proved.
These results show that for a wide class of functions, our criteria for positivity and
nullity, given respectively in Section 11.3 and Section 11.5.1, are essentially the two
extreme cases of this mixed drift result. We conclude with an example where similar
mixed behavior may be exhibited quite explicitly.
so that there is a greater degree of homogeneity than in the general model, then the
operator

Λ(x, A) := Σ_{j=0}^∞ Λ_j(x, A)

is stochastic.
Thus Λ(x, A) defines a Markov chain ΦΛ , which is the marginal position of Φ ig-
noring the actual rung: by direct calculation we can check that for any B
if β(w) > 1 + δ for all w then, exactly as in our analysis of the random walk on a half
line, we have that

E_{(i,w)}[τ_C] < ∞

for all i > M, w ∈ X, where C = ∪_{j=0}^M ({j} × X) is the “bottom end” of the ladder.
But one might not have such downwards drift uniform across the rungs. The result
we prove is thus an average drift criterion.
Theorem 19.3.4. Suppose that the chain Φ is ψ-irreducible and has the structure
(19.65). If the marginal chain Φ^Λ admits an invariant probability measure ν such that

∫ ν(dw) β(w) > 1,      (19.68)

then Φ is positive recurrent.
Proof   The proof is similar to that of Theorem 19.3.1, but we do not assume bound-
edness of the drifts so we must be a little more delicate. Choosing V(i, w) = i, we have
first that

∆V(i, w) = 1 − Σ_{j=0}^i j Λ_j(w, X) − (i + 1) Σ_{j=i+1}^∞ Λ_j(w, X);
Now even though (19.46) is not assumed, because |∆V (i, w)| ≤ d + 1 for i ≤ d and
because, starting at level i, after k steps the chain cannot be above level i + k, we see
exactly as in proving (19.50) that
lim inf_{n→∞} n^{-1} Σ_{k=1}^n Σ_j ∫ P^k(i, x; j × dy) ∆V(j, y) ≥ 0.      (19.70)
We now show that this average non-negative drift is not possible under (19.68), unless
the chain is positive.
From (19.68) we have

0 > lim_{k→∞} ∫ ν(dw) ∆V(k, w).      (19.71)
≤ n^{-1} Σ_{k=1}^n d Σ_{j=0}^{d−1} P^k(i, x; j × X)
   + n^{-1} Σ_{k=1}^n ∫_{D_ν} Σ_{j≥d} P^k(i, x; j × dy) ∆V(j, y)      (19.74)
Choose M so large that ν(C_M(j)) ≥ 1 − ε for a given ε > 0. Then we have

lim n^{-1} Σ_{k=1}^n P^k(i, x; j × X) = lim n^{-1} Σ_{k=1}^n P^k(i, x; j × C_M(j))
      + lim n^{-1} Σ_{k=1}^n P^k(i, x; j × [C_M(j)]^c)
   ≤ lim n^{-1} Σ_{k=1}^n P^k(i, x; C_M) + lim n^{-1} Σ_{k=1}^n Λ^k(x, [C_M(j)]^c)
   ≤ ε,      (19.75)
which shows the rung j × X to be null as claimed.
= lim n^{-1} Σ_{k=1}^n Σ_{j=d}^∞ P^k(i, x; j × B)

≤ lim inf_{n→∞} n^{-1} Σ_{k=1}^n ∫_{D_ν} Σ_{j≥d} P^k(i, x; j × dy) ∆V(j, y)
   = ∫_{D_ν} ν(dy) ∆V(j, y) < 0
Λ0 (x, x − 1) = pq, x ≥ 1,
Λ0 (x, x + 1) = (1 − p)q, x ≥ 0,
(19.78)
Λ2 (x, x − 1) = p(1 − q), x ≥ 1,
Λ2 (x, x + 1) = (1 − p)(1 − q), x ≥ 0,
Λ0 (0, 0) = pq,
(19.79)
Λ2 (0, 0) = p(1 − q).
The marginal chain ΦΛ is a random walk on the half line {0, 1, . . .} with an invariant
measure ν if and only if p > 1/2. On the other hand, β(x) > 1 if and only if q < 1/2.
Thus (19.68) holds if q < 1/2 < p.
This chain falls into the class that we have considered in Theorem 19.3.4; but other
behaviors follow if we vary the structure at the bottom rung.
Let us then specify the boundary conditions in a manner other than (19.65): put
Λ^*_1(x, x − 1) = p(1 − q) and Λ^*_1(x, x + 1) = (1 − p)(1 − q), but
here the second term follows since, on an excursion from [0], the expected drift to the
left at every step is no more than (1 − 2p) independent of level change, and the expected
number of steps to return to [0] from 1 × X is (1 − q)/(1 − 2q).
From (19.81) we therefore have that the chain Φ[0] is transient if r and q are small
enough, and p − 1/2 is not too large.
This example shows the critical need to identify petite sets and the return times to
them in classifying any chain: here we have an example where the set [0] is not petite,
although it has many of the properties of a petite set. Yet even though we have (19.77)
proven, we do not even have enough to guarantee the chain is recurrent.
to derive positivity.
We conclude by proving through this approach a result complementing the result
found in quite another way in Proposition 11.4.4.
Theorem 19.3.5. The GI/G/1 queue with mean inter-arrival time λ and mean service
time µ satisfies (19.68) if and only if λ > µ, and in this case the chain has an invariant
measure given by (10.52).
Proof   From the representations (3.42) and (3.43), we have the kernel

Λ(x, [0, y]) = ∫_0^∞ G(dt) P_t(x, [0, y]),

= λ/µ
As in (19.77), we at least know that since (19.68) holds, the left hand side of this
equation is finite, so that η < ∞. Moreover, from the Blackwell Renewal Theorem
so that, finally, (19.82) follows from (19.84), (19.85), and the fact that the mean of H
is finite.
19.4 Commentary*
Despite the success of the simple drift, or Foster–Lyapunov, approach there is a growing
need for more subtle variations such as those we present here.
There are several cases in the literature where the analysis of state-dependent (or at
least not simple one-step) drift appears unavoidable: see Tjøstheim [386] or Chen and
Tsay [66], where m-step skeletons {Φ_{mk}} are analyzed. Analysis of this kind is simplified
if the various parts of the space can be considered separately as in Section 19.1.2.
In the countable space context, Theorem 19.1.1 was first shown as Theorem 1.3 and
Theorem 19.1.2 as Theorem 1.4 of Malyšev and Men’šikov [243]. Their proofs, espe-
cially of Theorem 19.1.2, are more complex than those based on sample path arguments,
which were developed along with Theorem 19.1.3 in [283]. As noted there, the result
can be extended by choosing n(x) as a random variable, conditionally independent of
the process, on Z+ . In the special case where n(x) has a uniform distribution on [1, n]
independent of x, we get a time-averaged result used by Meyn and Down [273] in ana-
lyzing stability of queueing networks. If the variable has a point mass at n(x) we get
the results given here.
Models of random walk on the orthant in Section 19.1.2 have been analyzed in nu-
merous different ways on the integer quadrant Z2+ by, for example, [244, 257, 243, 340,
109]. Much of their work pertains to more general models which assume different drifts
on the boundary, thus leading to more complex conditions. In [244, 257, 243] it is
assumed that the increments are bounded (although they also analyze higher dimen-
sional models), whilst in [340, 109] it is shown that one can actually choose n = 1 if a
quadratic function is used for a test function, whilst weakening the bounded increments
assumption to a second moment condition: this method appears to go back to Kingman
[207].
As we have noted, positive recurrence in the simple case illustrated here could be
established more easily given the independence of the two components. However, the
bound using linear functions in (19.25) seems to be new, as does the continuous space
methodology we use here.
The antibody model here is based on that in [283]. The attack pattern of the “in-
vaders” is modeled to a large extent on the rabies model developed in Bartoszyński [19],
although the need to be the same order of magnitude as the antibody group is a weaker
assumption than that implicit in the continuous time continuous space model there.
The results in Section 19.2 are largely taken from Meyn and Tweedie [277]: they
appear to give a fruitful approach to more complex models, and the seeming simplicity
of the presentation here is largely a function of the development of the methods based on
Dynkin’s formula for the non-time-varying case. An application to adaptive control is
given in Meyn and Guo [274], where drift functions which depend on the whole history
of the chain are used systematically. Regrettably, examples using this approach are
typically too complex to present here.
The dependent parameter bilinear time series model is analyzed in [275], from which
we adopt the proof of Theorem 19.2.6. In Karlsen [195] a decoupling inequality of [210]
is used to obtain a second order stationary solution in the Gaussian parameter case, and
Brandt [44] provides a simple argument, similar to the proof of Proposition 17.3.4, to
obtain boundedness in probability for general bilinear time series models with stationary
coefficients.
Results on mixed drifts, such as those in Section 19.3.1, have been discovered inde-
pendently several times.
Although Neuts [292] analyzed a two-drift chain in detail, on a countable space the
first approach to classifying chains with different drifts appears to be due to Marlin [249].
He considered the special case of V (x) = x and assumed a fixed finite number of dif-
ferent drifts. The form given here was developed for countable spaces by Tweedie [396]
(although the proof there is incomplete) and Rosberg [336], who gives a slightly different
converse statement. A general state space form is in Tweedie [398].
The condition (19.53) for the converse result to hold, and which also suffices to ensure
that ∆V (w) ≥ 0 on C c implies non-positivity, is known as Kaplan’s condition [193]:
the general state space version sketched here is adapted from a countable space version
in [349]. Related results are in [380].
The average mean drift criterion for the ladder process in Section 19.3.2 is due to
Neuts [293] when the rungs are finite, and is proved there by matrix methods: the
general result is in [399], and (19.68) is also shown there to be necessary for positivity
under reasonable assumptions.
The final criterion for stability of the GI/G/1 queue produced by this analysis is of
course totally standard [9]: that the very indirect Markovian approach reproduces this
result exactly brings us to a remarkably reassuring conclusion.
Added in second printing: In the past year, Dai has shown in [80] that the state-
dependent drift criterion Theorem 19.1.2 leads to a new approach to the stability of
stochastic queueing network models via the analysis of a simpler deterministic fluid
model. Related work has been developed by Chen [65] and Stolyar [373], and these
results have been strengthened in Dai and Weiss [82] and Dai and Meyn [81].
Commentary for the second edition: Over the past ten years there have been
many further improvements in the theory surrounding the multi-step drift criterion
for stability within specific applications. Applications include stochastic approximation
[40, 39], Markov chain Monte Carlo (MCMC) [100], as well as stochastic networks [267],
which was the original motivation for the technique in [243].
Chapter 20

Epilogue to the second edition
Following publication of the “Big Red Book” in the early nineties, Richard and I de-
voted more attention to applications. Each of us became interested in simulation, albeit
in entirely different contexts. In addition, Richard spent more of his time on topics in
statistics, and I became increasingly involved in topics surrounding control and perfor-
mance evaluation for networks.
Personally, I thought that I would abandon Markov chains as a research topic. This,
fortunately, has turned out to be an impossible task!
The three sections that follow can be regarded as proposals for future monographs
that will never be written by either of us. The first section comprises our biggest thrust
shortly after the book was complete, along with my own view of geometric ergodicity
and spectral theory. The second section describes how methods in this book can be
applied to construct and analyze simulation algorithms. The final section explains how
theory in continuous time can be generated from discrete time counterparts.
20.1 Geometric ergodicity and spectral theory
Among the many applications of spectral theory is the identification of the rate of
convergence in the Geometric Ergodic Theorem. One elegant bound of Diaconis and
Stroock is described below in (20.6). Spectral theory and surrounding techniques are
also used to construct finite-rank approximations of a transition kernel. These take the
form
P̂ = Σ_{i=1}^n s_i ⊗ µ_i,      (20.1)

where for a function r and measure µ we define [r ⊗ µ](x, dy) := r(x) µ(dy). In most
cases we restrict to an L_∞^V setting. In this case it is assumed that P̂ is a bounded linear
operator on L_∞^V, which amounts to the inclusions {s_i} ⊂ L_∞^V and {µ_i} ⊂ M_1^V (i.e., V
is µ_i-integrable for each i).
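In finite dimensions a kernel of the form (20.1) can be assembled from the leading eigenpairs of P. The sketch below is an illustration under our own choice of a (diagonalizable) transition matrix; s_i is a right eigenvector, µ_i a scaled left eigenvector, and [r ⊗ µ] is simply an outer product.

    import numpy as np

    P = np.array([[0.6, 0.4, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.4, 0.6]])

    evals, U = np.linalg.eig(P)      # P = U diag(evals) V with V = U^{-1}
    V = np.linalg.inv(U)
    order = np.argsort(-np.abs(evals))

    n = 2                            # rank of the approximation
    P_hat = np.zeros_like(P)
    for i in order[:n]:
        s_i = U[:, i]                # the function s_i
        mu_i = evals[i] * V[i, :]    # the (signed) measure mu_i
        P_hat += np.real(np.outer(s_i, mu_i))

    print("rank-2 approximation error:", np.max(np.abs(P - P_hat)))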
T_z := (Iz − P)^{-1},   Z_z := (Iz − [P − 1 ⊗ π])^{-1},      (20.2)

provided the inverses exist as bounded linear operators on L_∞^V. This is true whenever
|z| > 1 and |||P^n|||_V is uniformly bounded in n, since we can express the inverses as the
power series

T_z = Σ_{n=0}^∞ z^{−n−1} P^n,   Z_z = Σ_{n=0}^∞ z^{−n−1} [P − 1 ⊗ π]^n.
Hence these kernels generalize the resolvents K_{a_ε} and U_h, defined in (3.26) and (12.13),
respectively.
The kernel T_z defined in (20.2) is used to define the spectrum of the kernel P:
(i) The spectrum S_V(P) is the set of all z ∈ C such that the operator
T_z defined in (20.2) does not exist as a bounded linear operator on
L_∞^V.
Spectral theory for Markov chains arises in a finer analysis of the Geometric Ergodic
Theorem and in the theory of large deviations. Just as in the theory of linear systems
and finite state space Markov chains, the dynamics of the chain can be understood
through an analysis of the spectrum of P . We survey these ideas next.
This observation can be strengthened. If the state space is finite, it is known that
the rate of convergence to equilibrium is determined by the second largest eigenvalue,
and this result can be generalized to obtain bounds on the rate of convergence in the
Geometric Ergodic Theorem.
We have the following general result, which follows from ideas in [282, 218]. The
constant λ_* ∈ (0, 1) appearing in (20.3) is called the spectral radius of P − 1 ⊗ π.
(i) If Φ is V-uniformly ergodic, then there is a spectral gap in L_∞^V, and there exists ε₀ < 1 such that the inverse operator Z_z exists as a bounded linear operator on L_∞^V for every z ∈ C satisfying |z| > ε₀.

(ii) Conversely, suppose that there exists ε₀ < 1 such that |||Z_z|||_V < ∞ for every z ∈ C satisfying |z| > ε₀. Then Φ is V-uniformly ergodic, and the rate of convergence is bounded as follows:
where λ∗ < 1 is the minimum value of ε₀, which coincides with the minimal bound on the spectrum within the open unit disk: λ∗ = max{|z| : z ≠ 1, z ∈ S_V(P)}.
Proof For (i) we begin with the proof that Z_z is defined for this range of z. Under the assumptions of (i) we have, for some R < ∞, r > 1,

|||Pⁿ − 1 ⊗ π|||_V ≤ R r^{−n} ,    n ≥ 0,

which when combined with the sequence of bounds given in (20.4) gives

|||Z_z|||_V ≤ R Σ_{n=0}^∞ |z|^{−n−1} r^{−n} = R |z|^{−1} (1 − |z|^{−1}r^{−1})^{−1} .
Applying the Markov property, it can be shown that this invariance holds if and only if the bivariate distributions are insensitive to time reversal in steady state:

(Φ_t, Φ_{t+1}) =_{dist} (Φ_{t+1}, Φ_t) .

The bivariate distributions can be identified, leading to the more standard definition: Φ is reversible if the detailed balance equations hold,

π(dx)P(x, dy) = π(dy)P(y, dx) .    (20.5)
For a reversible chain with finite state space each of the eigenvalues is real. Diaconis and Stroock in [90] obtain bounds on the second largest eigenvalue in this setting. A striking conclusion is the following explicit bound on the rate of convergence:

‖Pⁿ(x, ·) − π‖_V ≤ √((1 − π(x))/π(x)) λ∗ⁿ ,    (20.6)

where λ∗ is the magnitude of the second largest eigenvalue, λ∗ = max{|λ| : λ ≠ 1}, and V ≡ 1. Bounds on the rate of convergence for chains that are not necessarily reversible are obtained in [119], again in the finite state space case. The bounds are based on spectral theory, but the spectrum of the symmetrized kernel P P̃ is considered, where P̃ is the transition kernel for the time-reversed chain.
See Diaconis and Saloff-Coste [89], Rosenthal’s survey [344], and Baxendale [21] for
bibliographies and further generalizations and improvements since 1996.
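The bound (20.6) is also easy to test numerically. The sketch below is our illustration only (the reflected birth–death chain and all parameter values are assumptions made for the example, not taken from [90]): it computes λ∗, the stationary distribution, and both sides of (20.6) for a small reversible chain.

    # Check the bound (20.6) for a small reversible (birth-death) chain.
    import numpy as np

    p, N = 0.3, 10
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        P[i, min(i + 1, N)] += p              # move up (reflected at N)
        P[i, max(i - 1, 0)] += 1 - p          # move down (reflected at 0)

    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmax(np.real(w))]); pi /= pi.sum()
    lam_star = np.sort(np.abs(w))[-2]         # second largest eigenvalue modulus

    x, n = 0, 25
    tv = np.abs(np.linalg.matrix_power(P, n)[x] - pi).sum()   # ||P^n(x,.) - pi||
    bound = np.sqrt((1 - pi[x]) / pi[x]) * lam_star ** n
    print(tv <= bound, tv, bound)             # the bound (20.6) holds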
Spectral theory is often cast in a Hilbert space setting in the space L²(π), defined as the set of all measurable functions f on X satisfying π(f²) < ∞. For arbitrary p ≥ 1, the L^p(π) norm of a function f : X → R is defined by ‖f‖_p = (π(|f|^p))^{1/p}. It is natural to extend the definitions of spectrum and spectral gap to the L^p(π) norm. In particular, the chain is called geometrically ergodic in L^p(π) if there exist r > 1 and R < ∞ such that for all n ∈ Z₊, f ∈ L^p(π),

‖Pⁿf − π(f)‖_p^p := ∫ |Pⁿf(x) − π(f)|^p π(dx) ≤ R r^{−n} ‖f‖_p^p .    (20.7)
(i) There is a reversible Markov chain that is V -uniformly ergodic but not geometri-
cally ergodic in L1 (π).
(ii) There is a Markov chain that is V -uniformly ergodic but not geometrically ergodic
in L2 (π).
(iii) If the chain is reversible, then it is geometrically ergodic in L2 (π) if and only if it
is V -uniformly ergodic for some V : X → [1, ∞], finite a.e. [π].
Proof The M/M/1 queue provides an example in (i). Proposition 20.1.3 below demonstrates that this model is V-uniformly ergodic and reversible provided a “load condition” holds. It is shown that there is a constant λ₀ ∈ (0, 1) such that every λ ∈ [λ₀, 1) is an eigenvalue, with corresponding eigenfunction h ∈ L_∞^V, and hence also h ∈ L¹(π). The eigenfunction property gives π(h) = 0 and

‖Pⁿh − π(h)‖₁ = λⁿ ‖h‖₁

for each n. This rules out (20.7) for a fixed r > 1.
To prove (ii), note first that if the chain is geometrically ergodic in L²(π), then (20.7) combined with the definition (17.4) gives the following bound on the asymptotic variance:

γ_f² ≤ R (r + 1)/(r − 1) ‖f‖₂² < ∞ ,    f ∈ L²(π).
In particular, the asymptotic variance is finite whenever the ordinary variance is finite.
Häggström in [150] gives an example of a V -uniformly ergodic Markov chain and a
function f such that f ∈ L2 (π), yet the asymptotic variance is not finite.
Part (iii) is established by Roberts and Rosenthal in [329].
where p denotes the probability that W_k is equal to one. In the special case n = 0 we set h(n − 1) = h(−1) = h(0) to make this formula consistent with the dynamics (1.7). We have seen in Section 16.1.3 that this chain is V-uniformly ergodic provided ρ := p/(1 − p) < 1. We can take V(n) = r₀ⁿ, n ≥ 0, for any r₀ ∈ (1, ρ⁻¹). It follows from Theorem 20.1.1 that there is a spectral gap. The unique invariant measure is geometric, π(n) = (1 − ρ)ρⁿ, n ≥ 0. The detailed balance equations (20.5) are easily verified, so that we can conclude that the M/M/1 queue is reversible.
We next consider the spectrum in L_∞^V with this V, where r₀ ∈ (ρ^{−1/2}, ρ⁻¹) is fixed. We find in Proposition 20.1.3 that the spectrum is not discrete even when the model admits a spectral gap.
The structure of eigenfunctions can be identified through the form of the transition law (20.8). This expression suggests the application of transform techniques: define for any complex z,

H(z) = Σ_{n=0}^∞ z^{−n} h(n) .

This is defined for |z| > r₀ whenever h ∈ L_∞^V. If h is an eigenfunction, then on taking transforms of each side of the eigenfunction equation P h = λh, and applying (20.8), we find that H can be expressed as the ratio of quadratic functions. If the roots of the denominator are distinct, then H can be expressed as the sum of two simpler rational functions

H(z) = c₁ z/(z − β₁) + c₂ z/(z − β₂)

for constants c₁, c₂, where β₁, β₂ are the poles of H.
Proposition 20.1.3. Suppose that ρ < 1, fix r₀ ∈ (ρ^{−1/2}, ρ⁻¹), and define V(n) = r₀ⁿ for n ≥ 0. Then

(i) The queue is V-uniformly ergodic.
(ii) For each β ∈ (ρ^{−1/2}, r₀] the value λ = pβ + (1 − p)β⁻¹ lies in (0, 1). This is also an eigenvalue for P in L_∞^V with eigenfunction given by the difference of scaled geometric series (20.9).

(iii) As β ↓ ρ^{−1/2} the eigenvalues converge, with λ ↓ 2√(p(1 − p)). This limiting value is
Proof We have already established (i). To see (ii) and (iii), first observe that the eigenfunction equation P h(n) = λh(n) holds for n ≥ 1 for the function h(n) = βⁿ, regardless of the value of β, with λ = pβ + (1 − p)β⁻¹. However, except for the special case β = 1, no single geometric series defines an eigenfunction.

To cope with the special case n = 0 we consider a pair of geometric series to cancel an “error term.” Consider h(n) = βⁿ − cβ₋ⁿ, where β and β₋ are given in the proposition. From the foregoing we do have P h(n) = λh(n) for n ≥ 1. We choose c to ensure that this also holds with n = 0, which gives the unique value c = (1 − β⁻¹)/(1 − β₋⁻¹). The resulting function is a scalar multiple of (20.9).
Finally, we note that a unique solution β₋ ∈ (1, ρ^{−1/2}) exists since the function
In conclusion, the M/M/1 queue admits a spectral gap, but the spectrum is not discrete. Moreover, the set of eigenvalues depends on the choice of V, with sup{λ : λ < 1} = pr₀ + (1 − p)r₀⁻¹ approaching unity as r₀ increases to the upper bound ρ⁻¹.
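The eigenvalue and eigenfunction formulas above are easily verified numerically. In the sketch below the values p = 0.3 and β = 1.6 are illustrative assumptions (chosen so that β ∈ (ρ^{−1/2}, ρ⁻¹)), and c is determined by the boundary equation at n = 0 as in the proof of Proposition 20.1.3.

    # Verify P h = lambda h for the M/M/1 eigenfunction h(n) = beta^n - c beta_-^n.
    p, beta = 0.3, 1.6
    lam = p * beta + (1 - p) / beta           # candidate eigenvalue (< 1 here)
    beta_m = (1 - p) / (p * beta)             # second root of p z^2 - lam z + (1-p) = 0
    c = (1 - 1 / beta) / (1 - 1 / beta_m)     # makes the n = 0 equation hold

    def h(n):
        return beta ** n - c * beta_m ** n

    def Ph(n):
        # transition law (20.8): up with prob. p, down with prob. 1 - p; h(-1) = h(0)
        return p * h(n + 1) + (1 - p) * h(max(n - 1, 0))

    print(all(abs(Ph(n) - lam * h(n)) < 1e-9 for n in range(8)))   # True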
We next demonstrate that a suitably strong drift condition implies a discrete spectrum in L²(π) and L_∞^V simultaneously.
From the definition of the nonlinear generator, the bound (20.11) can be expressed as

P e^V ≤ exp(−δf + b I_C) e^V .

If V is bounded on the set C, then this implies a version of (V4):

Δe^V(x) ≤ −β e^{V(x)} + b̄ I_C(x) ,    x ∈ X,

where b̄ = b + sup_{x∈C} e^{V(x)} and 1 − β = sup_{x∈X} e^{−δf(x)}. We have β ≥ 1 − e^{−δ} > 0 under the assumption that f ≥ 1.
Consider for example the LSS(F,G) model under the assumptions of Proposition 7.5.3. Assume in addition that the distribution of the disturbance has a “Gaussian tail”: E[exp(ε‖W_k‖²)] < ∞ for some ε > 0. One solution to (20.11) is obtained using the quadratic V(x) = 1 + ε₀|x|²_M, in which ε₀ > 0 is chosen sufficiently small and the norm |y|²_M := yᵀM y, y ∈ Rⁿ, is defined in (12.31). In this case the function f can be chosen with linear growth, f(x) = 1 + |x|_M.
For the purposes of spectral analysis, (DV3) is used to justify truncation of the transition kernel. Define for n ≥ 1,

P̂_n := I_{C_f(n)} P ,

where C_f(n) := {x : f(x) ≤ n} denotes a sublevel set of f.
Proposition 20.1.4. Suppose that (20.11) holds with f unbounded. Assume moreover that C ⊂ C_f(n) for all n ≥ 1 sufficiently large. Then

|||P − P̂_n|||_v → 0 as n → ∞,

where v := e^V.

Proof Under the assumptions of the proposition we have P v ≤ e^{−δf} v on C_f(n)^c for n sufficiently large. From the definition of the sublevel set this gives
Proposition 20.1.5. Suppose that X is countable and that (20.11) holds for a coercive function f and a finite set C. Then P has a discrete spectrum in L_∞^v.

The proof of Proposition 20.1.5 follows from Theorem 3.5 of [219]. The idea is that P̂_n can be expressed as a finite-rank operator, of the form (20.1), and hence its spectrum is finite.

Proposition 20.1.4 implies that P can be approximated by P̂_n in norm. From the proof we obtain the explicit bound |||P − P̂_n|||_v ≤ e^{−δn}. It is shown in [219] that the spectrum of P is discrete if it can be approximated by finite-rank kernels in this fashion.
Multiplicative ergodic theory is the study of the asymptotics of this quantity for large n. Under suitable conditions on the chain and the function F, the multiplicative ergodic theorem holds,

lim_{n→∞} (1/n) log λ_{n,x}(F) = Λ(F),

where the limiting log-moment generating function Λ(F) is independent of the initial condition x.
To place this problem within the context of spectral theory, introduce the positive kernel P_f(x, dy) = f(x)P(x, dy), with f = e^F. The iterates P_f^n are defined in the usual way and have the representation

P_f^n(x, A) = E_x[ exp(S_n(F)) I{Φ_n ∈ A} ] .    (20.13)

Setting A = X gives λ_{n,x}(F) = P_f^n(x, X). This is known as the Feynman–Kac semi-group, though this terminology is usually reserved for processes in continuous time. Recall that the kernel U_h was defined based on a power series with respect to this semi-group with f = 1 − h ≥ 0 (see (12.14)).
When the limit defining Λ(F ) exists, we typically have Λ(F ) = log(λ), where λ is
the largest eigenvalue of Pf (known as the Perron–Frobenius eigenvalue) [303, 297, 298].
Foundations of Perron–Frobenius theory go back to Tweedie’s earliest work on positive
operators [394, 395], following the work of Vere-Jones and Seneta for positive matrices
and Markov chains on a countable or finite state space [405, 348].
The spectrum of P_f is defined precisely as for a probabilistic kernel. Criteria for a spectral gap are developed in [17, 218, 219], along with multiplicative ergodic theorems. These take the form

lim_{n→∞} λ^{−n} λ_{n,x}(F) = lim_{n→∞} E_x[ exp(S_n(F) − nΛ(F)) ] = f̌(x) ,    x ∈ X,    (20.14)

H(F̌) = F̌ − F + Λ(F) ,

In several different settings, it is shown in [218, 17, 219] that the chain with this transition kernel is geometrically ergodic, and this implies that the convergence in (20.14) holds at a geometric rate.
The convergence of the log-moment generating functions is used in [218, 17, 219] to
prove large deviations estimates for the partial sums Sn – see also [84, 265, 141] and
their references. The simplest estimates take the following form: for c > π(F ),
The limit (20.15) is established for geometrically ergodic models in [218] provided F is
bounded and c > π(F ) is close enough to the mean π(F ). An elegant bound on the
error probability Pπ {Sn ≥ nc} is obtained in [140] for uniformly ergodic chains, similar
to the coupling bound in Theorem 16.2.4.
Another approach to obtaining bounds on the rate of convergence as well as large
deviations asymptotics for Markov chains is based on Sobolev inequalities and their gen-
eralizations [89, 407]. The relationship between these conditions and (DV3) is explored
in [58].
20.1.6 Quasi-stationarity
Metastability refers to the presence of near stationary behavior of a process during a
time period in which it remains in some restricted region of the state space. Quasi-
stationarity has the following precise definition: a set M is quasi-stationary if there
exists a probability measure πM satisfying
The existence of a limit can be established exactly as in the proof of (20.14), provided
a twisted kernel is ergodic [118].
The analysis carries over to processes in continuous time. It is shown in [166] that
the twisted semi-group is exponentially ergodic for a diffusion process when the set M is
taken to be a connected component of {x : h(x) = 0}, with h an eigenfunction. Further
analysis reveals that the exit time from M is approximately exponentially distributed
with mean |Λ|−1 , where Λ < 0 is the corresponding eigenvalue for the Markovian
generator. Generalizations to the case in which Λ is complex are contained in [276].
20.2 Simulation and MCMC
In some applications the probability measure π is given, and the question is how to
construct a Markov chain with invariant distribution π. Answers to this question are
contained in the Markov chain Monte Carlo (MCMC) literature.
The question then is how to construct a process with zero mean, and one for which the
asymptotic variance is reduced.
Henderson and Glynn introduced a collection of techniques for this purpose in [158, 160] – see also the recent monograph [10]. Suppose that the chain is V-uniformly ergodic, fix a function g ∈ L_∞^V, and define

π̂_n^{CV}(f) := n⁻¹ Σ_{t=1}^n [f(Φ_t) − W_t] ,    (20.19)

where

W_t := g(Φ_t) − P g(Φ_t) .    (20.20)

Although the mean of W_t may not be zero for arbitrary initial conditions, its steady state mean is always zero:

E_π[W_t] = −∫ Δg(x) π(dx) = π(g − P g) = 0.

See Theorem 17.7.2 for relaxed assumptions on g under which this limit holds.
With proper choice of g the variance of the resulting estimator is reduced. In fact, under mild conditions the control variate can be constructed so that the asymptotic variance of the resulting estimator is zero. Suppose that f² ∈ L_∞^V, and set g = f̂ equal to a solution to Poisson's equation, P f̂ = f̂ − f + π(f). In this case we have W_t = f(Φ_t) − π(f), so that π̂_n^{CV}(f) = π(f) for each n ≥ 1.
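The zero-variance phenomenon can be reproduced on a finite state space, where Poisson's equation is solvable by linear algebra. The sketch below is our illustration only (the three-state kernel and the function f are assumptions for the example, not taken from [158, 160]): with g an exact Poisson solution, every term f(Φ_t) − W_t equals π(f).

    # Control variate with an exact Poisson solution: zero-variance estimator.
    import numpy as np

    rng = np.random.default_rng(1)
    P = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.5, 0.3],
                  [0.0, 0.4, 0.6]])
    f = np.array([1.0, 3.0, 10.0])

    w, V = np.linalg.eig(P.T)                 # stationary distribution pi
    pi = np.real(V[:, np.argmax(np.real(w))]); pi /= pi.sum()
    pif = pi @ f
    g = np.linalg.lstsq(np.eye(3) - P, f - pif, rcond=None)[0]  # P g = g - f + pi(f)

    n, x, std, cv = 10_000, 0, [], []
    for _ in range(n):
        std.append(f[x])
        cv.append(f[x] - (g[x] - P[x] @ g))   # subtract the shadow function g - Pg
        x = rng.choice(3, p=P[x])

    print(pif, np.mean(std), np.mean(cv))     # cv has zero variance here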
[Plot: running-average estimates over 10⁵ steps; legend: “Without variance reduction” (dashed) and “Variance reduction using shadow function” (solid).]
Figure 20.1: Results for a simulation run of length 100,000 steps in a network model.
The dashed line represents the running average cost using the standard estimator. The
solid line represents the running average cost for the estimator (20.19).
Of course, if we can solve Poisson’s equation then we have computed the steady state
mean, so there is no reason to simulate. Henderson and Glynn propose approximate
solutions to reduce the variance in the standard estimator. See the survey by Glynn
and Szechtman [138], and results specialized to network models in [159, 265, 267]. The
function g − P g appearing in (20.20) is called a shadow function in [159, 267] since it
is meant to eclipse the function f to be simulated.
Figure 20.1 shows a comparison of the standard estimator and the estimator (20.19)
for a network model. Details on the model and the construction of the control variate
can be found in [159] and [267, Chapter 11]. The main idea is to use the control variate
(20.20) in which g is a fluid value function that approximates the solution to Poisson’s
equation. The introduction of the zero-mean term Wt results in a 100-fold reduction in
variance over the standard estimator in this example.
20.3 Continuous time models

where the stochastic differential equation is defined in the usual L² sense, introduced by Itô several years after Doeblin's early death. More on this history can be found in [153], and scientific details are contained in [411].
In this section we provide highlights of the general theory of ψ-irreducible Markov
models in continuous time, without examples and without discussion of specific model
classes such as the Doeblin–Itô stochastic differential equation. In particular, we leave out the fruits of Richard's collaboration with Gareth Roberts on Langevin algorithms [334], which led to fundamental results on the stability of Langevin diffusions and their discretizations [375, 376].
The focus here is on methods for translating theory from discrete to continuous time.
This is made possible through the resolvent kernels, and associated resolvent equations,
as described in one of our contributions to the 1991 Blaubeuren meeting [278].
Random sampling provides a representation for this kernel. Let {T_k} denote a Poisson process with rate α, independent of the process Φ. That is, T₀ = 0, and T_{k+1} = T_k + α⁻¹A_{k+1} for k ≥ 0, where A is an i.i.d. sequence, independent of Φ, with standard exponential distribution. The sampled chain is defined exactly as in discrete time by

Φ^α_k := Φ_{T_k} ,    k ≥ 0.

The transition kernel for the Markov chain Φ^α coincides with the normalized resolvent αU_α.
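For a finite state space this identity can be checked directly, since P^t = e^{tA} for a generator matrix A and U_α = [αI − A]⁻¹. The sketch below (the generator and the rate α are illustrative assumptions) approximates the resolvent integral by quadrature and compares it with the matrix formula.

    # Check that the Poisson-sampled kernel equals alpha * U_alpha.
    import numpy as np

    A = np.array([[-1.0, 1.0, 0.0],
                  [0.5, -1.5, 1.0],
                  [0.0, 2.0, -2.0]])          # generator: rows sum to zero
    alpha = 0.7

    w, V = np.linalg.eig(A)                   # A is diagonalizable here
    Vinv = np.linalg.inv(V)
    Pt = lambda t: np.real(V @ np.diag(np.exp(w * t)) @ Vinv)   # P^t = exp(tA)

    dt, T = 0.01, 60.0
    ts = np.arange(dt / 2, T, dt)             # midpoint rule for the integral
    U = sum(np.exp(-alpha * t) * Pt(t) for t in ts) * dt        # resolvent U_alpha
    exact = np.linalg.inv(alpha * np.eye(3) - A)
    print(np.max(np.abs(alpha * U - alpha * exact)))            # close to zero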
The definitions of irreducibility and the petiteness property of sets are then based on properties of the sampled chains:

‖P^t(x, ·) − π‖ is non-increasing in t, which shows that Φ is ergodic:

lim_{t→∞} ‖P^t(x, ·) − π‖ = 0 .    (20.26)
The existence of the limit is guaranteed under various conditions on the model and the
function V .
It is more convenient to work with a relaxed definition. Recall that in the development of limit theory in Section 17.4 we relied upon the construction of martingales, such as {M_n(g)} defined in (17.43). For a given function h : X → R define g = Δh, where Δ = P − I is the drift operator, and consider the stochastic process

M_n := h(Φ_n) − h(Φ₀) − Σ_{k=0}^{n−1} g(Φ_k) ,    n ≥ 1.
This means that there exists a sequence of stopping times {τ_n} such that the stopped process {M_{t∧τ_n} : t ≥ 0} is a martingale for each n ≥ 1, and τ_n ↑ ∞ a.s. as n → ∞. We let A denote the extended generator, and write Ah = g when the associated process M is a local martingale.
Two resolvent equations are given in the following. The second equation (20.30) implies that the domain of the extended generator includes the range of the resolvent.

Theorem 20.3.1 (Resolvent equations in continuous time). If {P^t} is a Markovian semi-group, then the following hold:

(i) For any pair of positive constants β and α,

U_α = U_β + (β − α)U_β U_α = U_β + (β − α)U_α U_β .    (20.29)

(ii) For each α > 0 and bounded measurable function g : X → R, the function U_α g is in the domain of the extended generator, with

A U_α g = αU_α g − g .    (20.30)
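In the finite-state case both identities can be verified numerically using U_α = [αI − A]⁻¹. A minimal sketch, with an assumed 3 × 3 generator and the illustrative pair α = 0.7, β = 1.9:

    # Numerical check of the resolvent equations (20.29) and (20.30).
    import numpy as np

    A = np.array([[-1.0, 1.0, 0.0],
                  [0.5, -1.5, 1.0],
                  [0.0, 2.0, -2.0]])
    I = np.eye(3)
    Ua = np.linalg.inv(0.7 * I - A)           # U_alpha
    Ub = np.linalg.inv(1.9 * I - A)           # U_beta

    print(np.allclose(Ua, Ub + (1.9 - 0.7) * Ub @ Ua))    # (20.29), first form
    print(np.allclose(Ua, Ub + (1.9 - 0.7) * Ua @ Ub))    # (20.29), second form
    g = np.array([1.0, -2.0, 0.5])
    print(np.allclose(A @ (Ua @ g), 0.7 * (Ua @ g) - g))  # (20.30)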
A set C is called petite if it is petite for some sampled chain: for a probability distribution a on R₊, some nontrivial measure ν_a, every x ∈ C, and all B ∈ B(X),

K_a(x, B) := ∫₀^∞ P^t(x, B) a(dt) ≥ ν_a(B) .
(i) The chain Φ is positive recurrent with invariant probability measure π, and there exist some ν-petite set C ∈ B⁺(X), ρ_C < 1, M_C < ∞, and P^∞(C) > 0 such that for all x ∈ C

|P^t(x, C) − P^∞(C)| ≤ M_C ρ_C^t .    (20.32)

(ii) There exists a closed petite set C ∈ B(X) and κ > 1 such that

(iii) There exists a closed petite set C, constants b < ∞, β > 0, and a function V ≥ 1 finite at some one x₀ ∈ X satisfying (20.31).
Any of these three conditions implies that the set S_V = {x : V(x) < ∞} is absorbing and full, where V is any solution to (20.31) satisfying the conditions of (iii), and there then exist constants r > 1, R < ∞ such that for any x ∈ S_V,
U_α = [αI − A]⁻¹ .

U_h = [I_h − A]⁻¹ ,
APPENDICES
Despite our best efforts, we understand that the scope of this book inevitably leads
to the potential for confusion in readers new to the subject, especially in view of the
variety of approaches to stability which we have given, the many related and perhaps
(until frequently used) forgettable versions of the “Foster–Lyapunov” drift criteria, and
the sometimes widely separated conditions on the various models which are introduced
throughout the book.
At the risk of repetition, we therefore gather together in this Appendix several
discussions which we hope will assist in giving both the big picture, and a detailed
illustration of how the structural results developed in this book may be applied in
different contexts.
We first give a succinct series of equivalences between and implications of the various
classifications we have defined, as a quick “mud map” to where we have been. In
particular, this should help to differentiate between those stability conditions which are
“almost” the same.
Secondly, we list together the drift conditions, in slightly abbreviated form, together
with references to their introduction and the key theorems which prove that they are
indeed criteria for different forms of stability and instability. As a guide to their usage
we then review the analysis of one specific model (the scalar threshold autoregression,
or SETAR model).
This model incorporates a number of sub-models (specifically, random walks and
scalar linear models) which we have already analyzed individually: thus, although not
the most complex model available, the SETAR model serves to illustrate many of the
technical steps needed to convert elegant theory into practical use in a number of fields
of application. The scalar SETAR model also has the distinct advantage that under the
finite second moment conditions we impose, it can be analyzed fully, with a complete cat-
egorization of its parameter space to place each model into an appropriate stability class.
Thirdly, we give a glossary of the assumptions employed in each of the various models
we have analyzed. This list is not completely self-contained: to do this would extend
repetition beyond reasonable bounds. However, our experience is that, when looking at
a multiply analyzed model, one can run out of hands with which to hold pages open,
so we trust that this recapitulation will serve our readers well.
We conclude with a short collection of mathematical results which underpin and are
used in proving results throughout the book: these are intended to render the book
self-contained, but make no pretence at giving any more comprehensive overview of the
areas of measure theory, analysis, topology and even number theory which contribute
to the overall development of the theory of general Markov chains.
Appendix A
Mud maps
The wide variety of approaches to and definitions of stability can be confusing. Unfor-
tunately, if one insists on non-countable spaces there is little that can be done about
the occasions when two definitions are “almost the same” except to try and delineate
the differences.
Here then is an overview of the structure of Markov chains we have developed, at
least for the class of chains on which we have concentrated, namely
We have classified chains in I using three different but (almost) equivalent properties:
P n -properties: that is, direct properties of the transition laws P n ;
τ -properties: properties couched in terms of the hitting times τA for appropriate
sets A;
drift properties: properties using one-step increments of the form of ∆V for some
function V .
A.1 Recurrence versus transience

I = T + R

where T denotes the class of transient chains and R denotes the class of recurrent chains. This is defined as a dichotomy through a P^n-property in Theorem 8.0.1:
Φ ∈ R ⟺ Σ_n Pⁿ(x, A) = ∞ ,    x ∈ X, A ∈ B⁺(X),

Φ ∈ T ⟺ Σ_n Pⁿ(x, A_j) ≤ M_j < ∞ ,    x ∈ X, X = ∪ A_j .
If Φ ∈ R, then (Theorem 9.0.1) there is a full absorbing set (a maximal Harris set) H
such that
X=H ∪N
Q(x, A) = Px (Φ ∈ A i.o.).
Φ ∈ R ⇐⇒ Q(x, A) = 1, x ∈ H, A ∈ B + (X),
Φ ∈ H ⟸ ΔV(x) ≤ 0, x ∈ C^c, with C petite and V unbounded off petite sets;

Φ ∈ T ⟺ ΔV(x) ≥ 0, x ∈ C^c, with C petite and V bounded and increasing off C.
There is thus only one gap in these classifications, namely the actual equivalence of
the drift condition for recurrence. We have shown (Theorem 9.4.2) that such equivalence
holds for Feller (including countable space) chains.
Finally, it is valuable in practice in a topological context to recall that for T-chains,
which (Proposition 6.2.8) include all Feller chains in I such that supp ψ has non-empty
interior,
Φ ∈ H ⇐⇒ Φ is non-evanescent;
that is, Harris chains in this case do not leave compact sets forever.
A.2 Positivity versus nullity

I = P + N

where N denotes the set of null chains and P ⊆ R denotes the set of positive chains. Since every transient chain is a fortiori null, this is in any real sense a breakup of R rather than the complete set I, and is defined in Chapter 10 through a P^n-property:
Φ ∈ P ⟺ π(A) = ∫ π(dy)P(y, A) ,    A ∈ B(X),

X = S ∪ N

Φ ∈ N ⟺ ∫_C π(dx)E_x[τ_C] = ∞ ,    C ∈ B⁺(X).
Again, if Φ ∈ S, then the first of these holds with S = X. We might expect that infinite expected hitting times should clearly imply that the chain is not positive, but the converse appears to be so far unknown except when C is an atom.

The drift classification is:
There is again one open question in these classifications, namely that of the equiv-
alence or otherwise of the drift condition for nullity. We do not know how close this is
to complete.
In a topological context we know again (see Chapter 18) that for T-chains, there
is a further stability property completely equivalent to positivity: if Φ is an aperiodic
T-chain in R then
Both the P n and τ properties are essentially properties involving the whole trajectory
of the chain. The drift conditions, and in particular their sufficiency for classification,
are powerful practical tools of analysis because they involve only the one-step movement
of the chain: this is summarized further in Section B.1.
A.3 Convergence properties

However, these are weak categorizations of the types of convergence which hold for chains in these classes. The central class is

H ∩ P = E

where

Φ ∈ E ⟺ lim_{n→∞} ‖Pⁿ(x, ·) − π‖ = 0 ,    x ∈ X.
The properties of E are delineated further in Part III, and in particular in our next
appendix we summarize criteria (drift conditions) for classifying sub-classes of E.
Appendix B

Testing for stability

B.1 Glossary of drift conditions
Typically, for well-behaved chains we are able without great difficulty to give conditions
showing a set C to be a “test set”; these sets are usually petite, or for T-chains, compact.
The choice of V , on the other hand, is an art form and depends strongly on intuition
regarding the movement of the chain.
ΔV(x) ≤ 0 ,    x ∈ C^c .    (8.42)
Several theorems show this to be an appropriate condition for various forms of recur-
rence, including Theorem 8.4.3, Theorem 9.4.1, and Theorem 12.3.3.
Various theorems which show this to be an appropriate condition for various forms of f -
regularity, existence of f -moments of π and f -ergodicity and even sample path results
such as the Central Limit Theorem and the Law of the Iterated Logarithm include
Theorem 14.2.3, Theorem 14.2.6, Theorem 14.3.7 and Theorem 17.5.3.
See also the drift criterion of Jarner and Roberts described on page 360.
P V ≤ λV + L (15.35)
holds for some λ < 1, L < ∞, and this is a frequently used alternative form.
Results in Section 20.1 show that (V4) characterizes the existence of a spectral
gap for the transition kernel. The stronger drift criterion of Donsker and Varadhan
introduced on page 517 is closely related to a discrete spectrum for P .
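As a concrete instance of (15.35), consider the scalar linear model X_{n+1} = aX_n + W_{n+1} with Gaussian noise and V(x) = 1 + x², for which PV = a²V + (1 − a² + σ²), so that the bound holds with λ = a². The sketch below (parameter values are our own, for illustration) confirms this by Monte Carlo.

    # Monte Carlo check of P V <= lambda V + L for the AR(1) model, V(x) = 1 + x^2.
    import numpy as np

    a, sigma = 0.8, 1.0
    V = lambda x: 1 + x ** 2
    lam, L = a ** 2, 1 - a ** 2 + sigma ** 2  # PV = lam * V + L exactly here

    rng = np.random.default_rng(2)
    for x in [-5.0, 0.0, 2.5]:
        PV = np.mean(V(a * x + sigma * rng.standard_normal(200_000)))
        print(x, PV, lam * V(x) + L)          # agree up to Monte Carlo error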
ΔV(x) ≥ 0 ,    x ∈ C^c ,    (8.41)
Exactly the same drift criterion can also be shown to give an appropriate condition
for various forms of transience, as in Theorem 8.4.2: these require, typically, that V be
bounded, and C be a sublevel set of V with both C and C c in B + (X).
These criteria form the basis for classification of the chains we have considered into the various stability classes, and despite their simplicity they appear to work well across a great range of cases. It is our experience that in the use of the two commonest criteria (V2) and (V4) for models on R^k, quadratic forms are the most useful choice, although the choice of a suitable form is not always trivial.
Finally, we mention that in some cases where identifying the test function is difficult
we may need greater subtlety: the generalizations in Chapter 19 then provide a number
of other methods of approach.
B.2 The scalar SETAR model: a complete classification

[Figure B.1 here: axes θ(1) and θ(M), with boundary curve θ(1)θ(M) = 1.]
Figure B.1: The SETAR model: stability classification of (θ(1), θ(M ))-space. The
model is regular in the shaded “interior” area (11.36), and transient in the unshaded
“exterior” (9.46), (9.47) and (9.50). The boundaries are in the figures below.
In Figure B.1, Figure B.2 and Figure B.3 we depict the parameter space in terms
of φ(1), θ(1), φ(M ), and θ(M ). The results we have proved show that in the “interior”
and “boundary” areas, the SETAR model is Harris recurrent; and it is transient in the
“exterior” of the parameter space. In accordance with intuition, the model is null on
the boundaries themselves, and regular (and indeed, in this case, geometrically ergodic)
in the strict interior of the parameter space.
[Figure B.2 here: three panels in (φ(1), φ(M))-space.]
Figure B.2: The SETAR model: stability classification of (φ(1), φ(M ))-space in the
regions (θ(M ) = 1; θ(1) ≤ 1) and (θ(M ) ≤ 1; θ(1) = 1). The model is regular in
the shaded “interior” areas, which are (clockwise starting with the plot on the far left)
(11.38), (11.37) and (11.39); transient in the unshaded “exterior” (9.49), (9.48); and null
recurrent on the “margins” described clockwise by (11.45), (11.46) and (11.47)–(11.48).
The steps taken to carry out this classification form a template for analyzing many
models, which is our reason for reproducing them in summary form here.
(STEP 1) As a first step, we show in Theorem 6.3.6 that the SETAR model is a ϕ-irreducible T-process with ϕ taken as Lebesgue measure µ_Leb on R. Thus compact sets are test sets in all of the criteria above.
(STEP 2) In the “interior” of the parameter space we are able to identify geometric ergodicity in Proposition 11.4.5, by using (V4) with linear test functions of the form

V(x) = a x for x > 0,    V(x) = b |x| for x ≤ 0,

and suitable choice of the coefficients a, b, related to the parameters of the model. Note that we only indicated that V satisfied (V2), but the stronger form is actually proved in that result.
(STEP 3) We establish transience on the “exterior” of the parameter space as in Proposition 9.5.4 using the bounded function

V(x) = 1 − 1/(a(x + u))  for x > c/a − u,
V(x) = 1 − 1/c  for −c/b − v < x < c/a − u,
V(x) = 1 + 1/(b(x + v))  for x < −c/b − v,
and V(x) = 0 in the region [−R, R], where a, b, R, u and v are constants chosen suitably for different regions of the parameter space.

[Figure B.3 here: (φ(1), φ(M))-space with boundary curve φ(1)φ(M) + φ(M) = 0.]

Figure B.3: The SETAR model: stability classification of (φ(1), φ(M))-space in the region (θ(M)θ(1) = 1; θ(1) ≤ 0). The model is regular in the shaded “interior” area (11.40); transient in the unshaded “exterior” (9.51); and null recurrent on the “margin” described by (11.49).
To complete the classification of the model, we need to prove that in this region the
model is not positive recurrent. In Proposition 11.5.5 we show that the chain is indeed
null on the margins of the parameter space, using essentially linear test functions in
(11.42).
This model, although not linear, is sufficiently so that the methods applied to the
random walk or the simple autoregressive models work here also. In this sense the
SETAR model is an example of greater complexity but not of a step change in type.
Indeed, the fact that the drift conditions only have to hold outside a compact set means that for this model we really only have to consider the two linear models, one on each of the end intervals, rendering its analysis even more straightforward.
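The drift computation in STEP 2 is easy to explore numerically. The sketch below uses a hypothetical two-regime SETAR chain with zero intercepts, θ(1) = −0.5 and θ(M) = −1.5 (so θ(1)θ(M) < 1, inside the regular region), and the piecewise-linear test function with a = 1.7 and b = 1 chosen so that 1.5b < a < 2b; the estimated drift ΔV(x) is negative for large |x|.

    # Estimated drift of a piecewise-linear V for a two-regime SETAR chain.
    import numpy as np

    rng = np.random.default_rng(3)
    step = lambda x, w: (-0.5 * x if x < 0 else -1.5 * x) + w   # threshold r = 0
    V = lambda x: 1.7 * x if x > 0 else 1.0 * abs(x)

    for x in [-50.0, -10.0, 10.0, 50.0]:
        W = rng.standard_normal(100_000)
        drift = np.mean([V(step(x, w)) for w in W]) - V(x)      # estimate of Delta V(x)
        print(x, drift)                       # negative outside a compact set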
For more detail on this model, see Tong [388]; and for some of the complications in
moving to multidimensional versions, see Brockwell, Liu and Tweedie [52].
Other generalized random coefficient models or completely nonlinear models with
which we have dealt are in many ways more difficult to classify. Nevertheless, steps
similar to those above are frequently the only ones available, and in practice linearization
to enable use of test functions of these forms will often be the approach taken.
Appendix C

Glossary of model assumptions
Here we gather together the assumptions used for the classes of models we have analyzed
as continuing examples. The equation numbering and assumption item labels (such as
(RT1)) coincide with those used in the main body of the book.
(RT3) If {Z_n} is a renewal process in continuous time with no delay, then we call the process

V⁺(t) := inf(Z_n − t : Z_n > t, n ≥ 1),    t ≥ 0

the forward recurrence time process; and for any δ > 0, the discrete time chain V⁺_δ = {V⁺(nδ), n ∈ Z₊} is called the forward recurrence time δ-skeleton.

(RT4) We call the process

V⁻(t) := inf(t − Z_n : Z_n ≤ t, n ≥ 1),    t ≥ 0

the backward recurrence time process; and for any δ > 0, the discrete time chain V⁻_δ = {V⁻(nδ), n ∈ Z₊} is called the backward recurrence time δ-skeleton.
Φ_k = Φ_{k−1} + W_k
(RW2) We call the random walk spread out (or equivalently, we call Γ spread out) if some convolution power Γ^{n∗} is non-singular with respect to µ_Leb.
(Q2) The nth customer brings a job requiring service Sn where the service times are
independent of each other and of the inter-arrival times, and are distributed as a
variable S with distribution H(−∞, t] = P(S ≤ t).
(Q3) There is one server and customers are served in order of arrival.
In such a general situation we have often considered the countable space chain consisting
of the number of customers in the queue either at arrival or at departure times. Under
some exponential assumptions these give the GI/M/1 and M/G/1 queueing systems:
In storage models we have a special case of random walk on a half line, but here we
consider the model at the times of input and break the increment into the input and
output components.
The simple storage model has the following assumptions:
(SSM1) For each n ≥ 0 let Sn and Tn be i.i.d. random variables on R with distributions
H and G.
(SSM2) Define the random variables

Φ_{n+1} = [Φ_n + S_n − J_n]⁺ .
The chain Φ = {Φn } can be interpreted as the content of the storage system at the
times {Tn −} immediately before each input and is called the content-dependent storage
model.
We also note that these models can be used to represent a number of state-dependent
queueing systems where the rate of service depends on the actual state of the system
rather than being independent.
X_n = αX_{n−1} + W_n ,    n ≥ 1;
(SLM2) the random variables {Wn } are an i.i.d. sequence with distribution Γ on R.
(LSS1) There exists an n × n matrix F and an n × p matrix G such that for each
k ∈ Z+ , the random variables Xk and Wk take values in Rn and Rp , respectively,
and satisfy inductively for k ≥ 1, and arbitrary W0 ,
X_k = F X_{k−1} + GW_k ;
(LSS2) The random variables {Wn } are i.i.d. with common finite mean, taking values
on Rp , with distribution Γ(A) = P(Wj ∈ A).
Then X is called the linear state space model driven by F, G, or the LSS(F ,G) model,
with associated control model LCM(F ,G) (defined below).
Further assumptions are required for the stability analysis of this model. These
include, at different times
(LSS3) The noise variable W has a Gaussian distribution on Rp with zero mean and
unit variance: that is, W ∼ N (0, I).
The associated (linear) control model LCM(F ,G) is defined by the following two sets
of assumptions.
Suppose x = {xk } is a deterministic process on Rn and u = {un } is a deterministic
process on Rp , for which x0 is arbitrary; then x is called the linear control model driven
by F, G, or the LCM(F ,G) model, if for k ≥ 1
(LCM1) there exists an n×n matrix F and an n×p matrix G such that for each k ∈ Z+ ,
x_{k+1} = F x_k + Gu_{k+1} ;    (1.4)
(AR1) for each n ≥ 0, Y_n and W_n are random variables on R satisfying, inductively for n ≥ k,

Y_n = α₁Y_{n−1} + α₂Y_{n−2} + · · · + α_k Y_{n−k} + W_n ,

for some α₁, . . . , α_k ∈ R;

Y_n = Σ_{j=1}^k α_j Y_{n−j} + Σ_{j=1}^ℓ β_j W_{n−j} + W_n ,

for some α₁, . . . , α_k, β₁, . . . , β_ℓ ∈ R;
(DS1) The process Φ is deterministic and generated by the nonlinear difference equa-
tion, or semi-dynamical system,
Φk +1 = F (Φk ), k ∈ Z+ , (11.16)
sup_{x∈C} V(F(x)) ≤ M.
The chain X = {Xn } is called a scalar nonlinear state space model on R driven by F ,
or SNSS(F ) model, if it satisfies
E[(Z_n, W_n)ᵀ(Z_k, W_k)] = δ_{n−k} diag(σ_z², σ_w²) ,    n ≥ 1,

with σ_z² < 1;
(SAC3) the input process satisfies U_k ∈ Y_k, k ∈ Z₊, where Y_k = σ{Y₀, . . . , Y_k}.

With the control U_k chosen as U_k = −θ̂_k Y_k, k ∈ Z₊, the closed loop system equations for the simple adaptive control model are

θ̃_{k+1} = αθ̃_k − αΣ_k Y_{k+1} Y_k (Σ_k Y_k² + σ_w²)⁻¹ + Z_{k+1}    (2.22)

Y_{k+1} = θ̃_k Y_k + W_{k+1}    (2.23)

Σ_{k+1} = σ_z² + α²σ_w²Σ_k (Σ_k Y_k² + σ_w²)⁻¹ ,    k ≥ 1    (2.24)

where the triple (Σ₀, θ̃₀, Y₀) is given as an initial condition.

The closed loop system gives rise to a Markovian system of the form (NSS1), so that Φ_k = (Σ_k, θ̃_k, Y_k) is a Markov chain with state space X = [σ_z², σ_z²/(1 − α²)] × R².
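A direct simulation of the closed loop is straightforward; the sketch below implements (2.22)–(2.24) with illustrative parameter values and Gaussian disturbances, and is intended only to show the recursion in executable form.

    # Simulation of the simple adaptive control closed loop (2.22)-(2.24).
    import numpy as np

    alpha, sigma_z, sigma_w = 0.9, 0.2, 0.1   # illustrative values
    rng = np.random.default_rng(4)
    Sigma, ttheta, Y = sigma_z ** 2, 0.0, 1.0 # initial (Sigma_0, theta~_0, Y_0)

    for k in range(10_000):
        Z = sigma_z * rng.standard_normal()
        W = sigma_w * rng.standard_normal()
        Ynew = ttheta * Y + W                                                          # (2.23)
        ttheta = alpha * ttheta - alpha * Sigma * Ynew * Y / (Sigma * Y**2 + sigma_w**2) + Z  # (2.22)
        Sigma = sigma_z**2 + alpha**2 * sigma_w**2 * Sigma / (Sigma * Y**2 + sigma_w**2)      # (2.24)
        Y = Ynew

    print(Sigma, ttheta, Y)                   # the state stays bounded in this run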
where −∞ = r₀ < r₁ < · · · < r_M = ∞ and {W_n(j)} forms an i.i.d. zero-mean error sequence for each j, independent of {W_n(i)} for i ≠ j.
For stability classification we often use
(SETAR2) For each j = 1, . . . , M , the noise variable W (j) has a density positive on the
whole real line.
(SETAR3) The variances of the noise distributions for the two end intervals are finite; that is,

E(W²(1)) < ∞,  E(W²(M)) < ∞.
X_n = θX_{n−1} + bX_{n−1}W_n + W_n

Y_{k+1} = θ_k Y_k + W_{k+1}    (2.13)

θ_{k+1} = αθ_k + Z_{k+1} .    (2.14)

X_{k+1} = (A + Γ_{k+1})X_k + W_{k+1}
(RCA1) The sequences Γ and W are i.i.d. and also independent of each other.
(RCA2) The following expectations exist, and have the prescribed values:
Appendix D

Some mathematical background
In this final section we collect together, for ease of reference, many of those mathemat-
ical results which we have used in developing our results on Markov chains and their
applications: these come from probability and measure theory, topology, stochastic
processes, the theory of probabilities on topological spaces, and even number theory.
We have tried to give results at a relevant level of generality for each of the types
of use: for example, since we assume that the leap from countable to general spaces or
topological spaces is one that this book should encourage, we have reviewed (even if
briefly) the simple aspects of this theory; conversely, we assume that only a relatively
sophisticated audience will wish to see details of sample path results, and the martingale
background provided requires some such sophistication.
Readers who are unfamiliar with any particular concepts and who wish to delve
further into them should consult the standard references cited, although in general a
deep understanding of many of these results is not vital to follow the development in
this book itself.
D.1 Some measure theory

A σ-field B(X) on a set X is a collection of subsets of X satisfying:
(a) X ∈ B(X);
(b) if A ∈ B(X), then Ac ∈ B(X);
(c) if A_k ∈ B(X), k = 1, 2, 3, . . . , then ∪_{k=1}^∞ A_k ∈ B(X).
D.1.2 Measures
A (signed) measure µ on the space (X, B(X)) is a function from B(X) to (−∞, ∞] which is countably additive: if A_k ∈ B(X), k = 1, 2, 3, . . . , and A_i ∩ A_j = ∅, i ≠ j, then

µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i).

We say that µ is positive if µ(A) ≥ 0 for any A. The measure µ is called a probability (or subprobability) measure if it is positive and µ(X) = 1 (or µ(X) ≤ 1).
A positive measure µ is σ-finite if there is a countable collection of sets {A_k} such that X = ∪A_k and µ(A_k) < ∞ for each k.

On the real line (R, B(R)), Lebesgue measure µ_Leb is a positive measure defined for intervals (a, b] by µ_Leb(a, b] = b − a, and for the other sets in B(R) by an obvious extension technique. Lebesgue measure on higher dimensional Euclidean space R^p is constructed similarly using the area of rectangles as a basic definition.
The total variation norm of a signed measure is defined as

‖µ‖ := sup ∫ f dµ,

where the supremum is taken over all measurable functions f from (X, B(X)) to (R, B(R)) such that |f(x)| ≤ 1 for all x ∈ X.

For a signed measure µ, the state space X may be written as the union of disjoint sets X⁺ and X⁻ where

µ(X⁺) − µ(X⁻) = ‖µ‖.

This is known as the Hahn decomposition.
D.1.3 Integrals
Suppose that h is a non-negative measurable function from (X, B(X)) to (R, B(R)). The
Lebesgue integral of h with respect to a positive finite measure µ is defined in three
steps.
Firstly, for A ∈ B(X) define I_A(x) = 1 if x ∈ A, and 0 otherwise: I_A is called the indicator function of the set A. In this case we define

∫_X I_A(x)µ(dx) := µ(A).

Next consider simple functions h such that there exist sets {A₁, . . . , A_N} ⊂ B(X) and positive numbers {b₁, . . . , b_N} ⊂ R₊ with h = Σ_{k=1}^N b_k I_{A_k}. If h is a simple function we can unambiguously define

∫_X h(x)µ(dx) := Σ_{k=1}^N b_k µ(A_k).

Finally, since it is possible to show that given any non-negative measurable h, there exists a sequence of simple functions {h_k}_{k=1}^∞ such that for each x ∈ X,

h_k(x) ↑ h(x),

we can take

∫_X h(x)µ(dx) := lim_k ∫_X h_k(x)µ(dx),

which always exists, though it may be infinite.

This approach works if h is non-negative. If not, write

h = h⁺ − h⁻

where h⁺ and h⁻ are both non-negative measurable functions, and define

∫_X h(x)µ(dx) := ∫_X h⁺(x)µ(dx) − ∫_X h⁻(x)µ(dx),

if both terms on the right are finite. Such functions are called µ-integrable, or just integrable if there is no possibility of confusion; and we frequently denote the integral by

∫ h dµ := ∫_X h(x)µ(dx).
The extension to σ-finite measures is then straightforward.
Convergence of sequences of integrals is central to much of this book. There are
three results which we use regularly:
Theorem D.1.1 (Monotone Convergence Theorem). If µ is a σ-finite positive measure
on (X, B(X)) and {fi : i ∈ Z+ } are measurable functions from (X, B(X)) to (R, B(R))
which satisfy 0 ≤ fi (x) ↑ f (x) for µ-almost every x ∈ X, then
∫_X f(x)µ(dx) = lim_i ∫_X f_i(x)µ(dx).    (D.1)
Note that in this result the monotone limit f may not be finite even µ-almost
everywhere, but the result continues to hold in the sense that both sides of (D.1) will
be finite or infinite together.
Theorem D.1.2 (Fatou’s Lemma). If µ is a σ-finite positive measure on (X, B(X)) and
{fi : i ∈ Z+ } are non-negative measurable functions from (X, B(X)) to (R, B(R)), then
∫_X lim inf_i f_i(x)µ(dx) ≤ lim inf_i ∫_X f_i(x)µ(dx).    (D.2)
X⁻¹{B} := {ω : X(ω) ∈ B} ∈ F
The set of real-valued random variables Y for which the expectation is well defined and finite is denoted L¹(Ω, F, P). Similarly, we use L^∞(Ω, F, P) to denote the collection of essentially bounded real-valued random variables Y, that is, those for which there is a bound M and a set A_M ∈ F with P(A_M) = 0 such that {ω : |Y(ω)| > M} ⊆ A_M.
Suppose that Y ∈ L1 (Ω, F, P) and G ⊂ F is a sub-σ-field of F. If Ŷ ∈ L1 (Ω, G, P)
and satisfies
E[Y Z] = E[Ŷ Z] for all Z ∈ L∞ (Ω, G, P),
then Ŷ is called the conditional expectation of Y given G, and denoted E[Y | G]. The
conditional expectation defined in this way exists and is unique (modulo P-null sets)
for any Y ∈ L1 (Ω, F, P) and any sub σ-field G.
Suppose now that we have another σ-field D ⊂ G ⊂ F. Then

E[E[Y | G] | D] = E[Y | D].    (D.3)

The identity (D.3) is often called “the smoothing property of conditional expectations”.
Those members of T containing a point x are called the neighborhoods of x, and the
complements of open sets are called closed .
A set C is called compact if any cover of C with open sets admits a finite subcover,
and a set D is dense if the smallest closed set containing D (the closure of D) is the
whole space. A set is called pre-compact if it has a compact closure.
When there is a topology assumed on the state spaces for the Markov chains con-
sidered in this book, it is always assumed that these render the space locally compact
and separable metric: a locally compact space is one for which each open neighborhood
of a point contains a compact neighborhood, and a separable space is one for which a
countable dense subset of X exists. A metric space is such that there is a metric d on
X which generates its topology.
For the topological spaces we consider, Lindelöf’s Theorem holds:
Theorem D.3.1 (Lindelöf’s Theorem). If X is a separable metric space, then every
cover of an open set by open sets admits a countable subcover.
If X is a topological space with topology T, then there is a natural σ-field on X containing T, namely the smallest such σ-field:

B(X) := ∩ {G : T ⊂ G, G a σ-field on X}.
Extending the terminology from R, this is often called the Borel σ-field of X:
throughout this book, we have assumed that on a topological space the Borel σ-field
is being addressed, and so our general notation B(X) is consistent in the topological
context with the conventional notation.
A measure µ is called regular if for any set E ∈ B(X),

µ(E) = inf{µ(O) : E ⊆ O, O open} = sup{µ(C) : C ⊆ E, C compact}.

For the topological spaces we consider, measures on B(X) are regular: we have ([345] p. 49).
Theorem D.3.2. If X is locally compact and separable, then every σ-finite measure on
B(X) is regular.
f_n(x) ↑ f(x) as n → ∞.
Finally, in our context one of the most frequently used of all results on continuous functions is that which assures us that the convolution operation applied to any pair of L¹(R, B(R), µ_Leb) and L^∞(R, B(R), µ_Leb) functions is continuous.

For two functions f, g : R → R, the convolution f ∗ g is the function on R defined for t ∈ R by

f ∗ g(t) = ∫_{−∞}^∞ f(s)g(t − s) ds.

This is well defined if, for example, both f and g are positive. We have (see [345], p. 196):
It follows that M can be considered as a metric space with metric |·|_w defined for ν, µ ∈ M by

|ν − µ|_w := Σ_{k=1}^∞ 2^{−k} | ∫ g_k dν − ∫ g_k dµ | .
Other metrics relevant to weak convergence are summarized in, for example, [191].
A set of probability measures A ⊂ M is called tight if for every ε > 0 there exists a compact set C ⊂ X for which

ν(C) ≥ 1 − ε  for every ν ∈ A.
The following result, which characterizes tightness with M viewed as a metric space,
follows from Proposition D.5.6 below.
where we adopt the convention that the infimum of a function over the empty set is
infinity. If X is a closed and unbounded subset of Rk , it is evident that V (x) = |x|p is
coercive for any p > 0. If X is compact, then our convention implies that any positive
function V is coercive because we may set Cn = X for all n ∈ Z+ .
It is easily verified that a collection of probabilities A ⊂ M is tight if and only if a coercive function V exists such that

sup_{ν∈A} ∫ V dν < ∞.
Theorem D.5.4. The following are equivalent for a sequence of probabilities {ν_k : k ∈ Z₊} ⊂ M:

(i) ν_k →_w ν;

(iv) for every uniformly bounded and equicontinuous family of functions C ⊂ C(X),

lim_{k→∞} sup_{f∈C} | ∫ f dν_k − ∫ f dν | = 0.
we have

lim_{k→∞} sup_{x∈C_k^c} |f(x)| = 0.

The space C₀(X) is simply the closure of C_c(X), the space of continuous functions with compact support, in the uniform norm.
A sequence of subprobability measures {ν_k : k ∈ Z₊} is said to converge vaguely to a subprobability measure ν if for all f ∈ C₀(X),

lim_{k→∞} ∫ f dν_k = ∫ f dν .
In this book we often apply the following result, which follows from the observation that positive lower semicontinuous functions on X are the pointwise supremum of a collection of positive, continuous functions with compact support (see Theorem D.4.1).

Lemma D.5.5. If ν_k →_v ν, then

lim inf_{k→∞} ∫ f dν_k ≥ ∫ f dν    (D.6)

for any positive lower semicontinuous function f.
It is obvious that weak convergence implies vague convergence. On the other hand, a sequence of probabilities converges weakly if and only if it converges vaguely and is tight.

The use and direct verification of boundedness in probability will often follow from the following results: the first of these is a consequence of our assumption that the state space is locally compact and separable (see Billingsley [36] and Revuz [326]).

(ii) If {ν_k} is tight and each ν_k is a probability measure, then ν_{n_k} →_w ν_∞ and ν_∞ is a probability measure.
Since a positive supermartingale is convergent, it follows that its sample paths are bounded with probability one. The following result gives an upper bound on the magnitude of variation of the sample paths of both positive supermartingales, and general martingales.

P{ max_{0≤k≤n} |M_k| ≥ c } ≤ c^{−p} E[|M_n|^p] .

P{ sup_{0≤k<∞} M_k ≥ c } ≤ c^{−1} E[M₀] .
These results, and related concepts, can be found in Billingsley [37], Chung [72],
Hall and Heyde [151], and of course Doob [99].
The function mn (t) is piecewise linear and is equal to Mi when t = i/n for 0 ≤ t ≤
1. In Theorem D.6.4 below we give conditions under which the normalized sequence
{n−1/2 mn (t) : n ∈ Z+ } converges to a continuous process (Brownian motion) on [0, 1].
This result requires some care in the definition of convergence for a sequence of stochastic
processes.
Let C[0, 1] denote the normed space of all continuous functions φ : [0, 1] → R under the uniform norm, which is defined as

‖φ‖ := sup_{0≤t≤1} |φ(t)| .
The vector space C[0, 1] is a complete, separable metric space, and hence the theory of
weak convergence may be applied to analyze measures on C[0, 1].
The stochastic process m_n(t) possesses a distribution µ_n, which is a probability measure on C[0, 1]. We say that m_n(t) converges in distribution to a stochastic process m_∞(t) as n → ∞, denoted m_n →_d m_∞, if the sequence of measures µ_n converges weakly to the distribution µ_∞ of m_∞. That is, for any bounded continuous functional h on C[0, 1],

E[h(m_n)] → E[h(m_∞)] as n → ∞.
The limiting process, standard Brownian motion on [0, 1], which we denote by B, is
defined as follows:
lim_{n→∞} (1/n) Σ_{k=1}^n E[(M_k − M_{k−1})² | F_{k−1}] = γ²  a.s.    (D.8)

lim_{n→∞} (1/n) Σ_{k=1}^n E[(M_k − M_{k−1})² I{(M_k − M_{k−1})² ≥ εn} | F_{k−1}] = 0  a.s.    (D.9)

Then (γ²n)^{−1/2} m_n →_d B.
Function space limits of this kind are often called invariance principles, though we
have avoided this term because functional CLT seems more descriptive.
D.7 Some results on sequences and numbers

Lemma D.7.1. If {a(n)}, {b(n)} are non-negative sequences such that b(n) → b(∞) < ∞ as n → ∞ and Σ a(j) < ∞, then

a ∗ b(n) → b(∞) Σ_{j=0}^∞ a(j) < ∞ ,    n → ∞.    (D.10)
Proof Set b(n) = 0 for n < 0. Since b(n) converges it is bounded, and so by the Dominated Convergence Theorem

lim_{n→∞} a ∗ b(n) = Σ_{j=0}^∞ a(j) lim_{n→∞} b(n − j) = b(∞) Σ_{j=0}^∞ a(j)    (D.11)

as required.
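A quick numerical illustration (our own, with assumed sequences a(n) = 2⁻ⁿ, so that Σ a(j) = 2, and b(n) = 3 − 1/(n + 1)):

    # Numerical illustration of (D.10): a*b(n) -> b(infinity) * sum_j a(j).
    import numpy as np

    N = 2_000
    a = 0.5 ** np.arange(N)                   # summable: sum a(j) = 2
    b = 3.0 - 1.0 / (np.arange(N) + 1.0)      # b(n) -> b(infinity) = 3
    conv = lambda n: sum(a[j] * b[n - j] for j in range(n + 1))
    print(conv(1000), 3.0 * a.sum())          # approximately 6 and 6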
The next lemma contains two valuable summation results for series.

Lemma D.7.2. (i) If c(n) is a non-negative sequence, then for any r > 1,

Σ_{n≥0} (Σ_{m≥n} c(m)) rⁿ ≤ (r/(r − 1)) Σ_{m≥0} c(m) r^m .
Lemma D.7.3. Let d denote the greatest common divisor (g.c.d.) of the numbers m, n.
Then there exist integers a, b such that
am + bn = d.
Lemma D.7.4. Suppose that N ⊂ Z+ is a subset of the integers which is closed under
addition: for each j, k ∈ N , j + k ∈ N . Let d denote the greatest common divisor of
the set N . Then there exists n0 < ∞ such that nd ∈ N for all n ≥ n0 .
Bibliography
[1] D. Aldous and P. Diaconis. Strong uniform times and finite random walks. Adv. Applied
Maths., 8:69–97, 1987.
[2] E. Altman, P. Konstantopoulos, and Z. Liu. Stability, monotonicity and invariant quan-
tities in general polling systems. Queueing Syst. Theory Appl., 11:35–57, 1992.
[3] B. D. O. Anderson and J. B. Moore. Optimal Control: Linear Quadratic Methods.
Prentice-Hall, Englewood Cliffs, N.J., 1990.
[4] W.J. Anderson. Continuous-Time Markov Chains: An Applications-Oriented Approach.
Springer-Verlag, New York, 1991.
[5] M. Aoki. State Space Modeling of Time Series. Springer-Verlag, Berlin, 1990.
[6] A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I. Marcus.
Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM
J. Control Optim., 31:282–344, 1993.
[7] E. Arjas and E. Nummelin. A direct construction of the R-invariant measure for a
Markov chain on a general state space. Ann. Probab., 4:674–679, 1976.
[8] E. Arjas, E. Nummelin, and R. L. Tweedie. Uniform limit theorems for non-singular
renewal and Markov renewal processes. J. Appl. Probab., 15:112–125, 1978.
[9] S. Asmussen. Applied Probability and Queues. John Wiley & Sons, New York, 1987.
[10] S. Asmussen and P. W. Glynn. Stochastic Simulation: Algorithms and Analysis, vol-
ume 57 of Stochastic Modelling and Applied Probability. Springer-Verlag, New York,
2007.
[11] R. Atar and O. Zeitouni. Lyapunov exponents for finite state nonlinear filtering. SIAM
J. Control Optim., 35(1):36–55, 1997.
[12] K. B. Athreya and P. Ney. Branching Processes. Springer-Verlag, New York, 1972.
[13] K. B. Athreya and P. Ney. A new approach to the limit theory of recurrent Markov
chains. Trans. Amer. Math. Soc., 245:493–501, 1978.
[14] K. B. Athreya and P. Ney. Some aspects of ergodic theory and laws of large numbers
for Harris recurrent Markov chains. Colloquia Mathematica Societatis János Bolyai.
Nonparametric Statistical Inference, 32:41–56, 1980. Budapest, Hungary.
[15] K. B. Athreya and S. G. Pantula. Mixing properties of Harris chains and autoregressive
processes. J. Appl. Probab., 23:880–892, 1986.
[16] K. E. Avrachenkov and J. B. Lasserre. The fundamental matrix of singularly perturbed
Markov chains. Adv. Appl. Probab., 31(3):679–697, 1999.
[17] S. Balaji and S. P. Meyn. Multiplicative ergodicity and large deviations for an irreducible
Markov chain. Stoch. Proc. Applns., 90(1):123–144, 2000.
[60] H. P. Chan and T. L. Lai. Saddlepoint approximations and nonlinear boundary crossing
probabilities of Markov random walks. Ann. Appl. Probab., 13(2):395–429, 2003.
[61] K. S. Chan. Topics in Nonlinear Time Series Analysis. PhD thesis, Princeton University,
1986.
[62] K. S. Chan. A note on the geometric ergodicity of a Markov chain. Adv. Appl. Probab.,
21:702–704, 1989.
[63] K. S. Chan. Asymptotic behaviour of the Gibbs sampler. J. Amer. Statist. Assoc.,
88:320–326, 1993.
[64] K. S. Chan, J. Petruccelli, H. Tong, and S. W. Woolford. A multiple threshold AR(1)
model. J. Appl. Probab., 22:267–279, 1985.
[65] H. Chen. Fluid approximations and stability of multiclass queueing networks: work-
conserving disciplines. Ann. Appl. Probab., 5(3):637–665, 1995.
[66] R. Chen and R. S. Tsay. On the ergodicity of TAR(1) processes. Ann. Appl. Probab.,
1:613–634, 1991.
[67] R-R. Chen and S. P. Meyn. Value iteration and optimization of multiclass queueing
networks. Queueing Syst. Theory Appl., 32(1-3):65–97, 1999.
[68] P. Chigansky, R. Liptser, and R. van Handel. Intrinsic methods in filter stability. In
Handbook of Nonlinear Filtering. Oxford University Press, 2008. To appear.
[69] Y. S. Chow and H. Robbins. A renewal theorem for random variables which are dependent
or non-identically distributed. Ann. Math. Statist., 34:390–395, 1963.
[70] K. L. Chung. The general theory of Markov processes according to Doeblin. Z.
Wahrscheinlichkeitstheorie und Verw. Geb., 2:230–254, 1964.
[71] K. L. Chung. Markov Chains with Stationary Transition Probabilities. Springer-Verlag,
Berlin, second edition, 1967.
[72] K. L. Chung. A Course in Probability Theory. Academic Press, New York, second
edition, 1974.
[73] K. L. Chung and D. Ornstein. On the recurrence of sums of random variables. Bull.
Amer. Math. Soc., 68:30–32, 1962.
[74] R. Cogburn. The Central Limit Theorem for Markov processes. In L. M. Le Cam,
J. Neyman, and E. L. Scott, editors, Proceedings of the 6th Berkeley Symposium on
Mathematical Statistics and Probability, pages 485–512. University of California Press,
Berkeley, 1972.
[75] R. Cogburn. A uniform theory for sums of Markov chain transition probabilities. Ann.
Probab., 3:191–214, 1975.
[76] J. W. Cohen. The Single Server Queue. North-Holland, Amsterdam, second edition,
1982.
[77] C. Constantinescu and A. Cornea. Potential Theory on Harmonic Spaces. Springer-
Verlag, Berlin, 1972.
[78] P. C. Consul. Evolution of surnames. Int. Statist. Rev., 59:271–278, 1991.
[79] J. N. Corcoran and R. L. Tweedie. Perfect sampling of ergodic Harris chains. Ann. Appl.
Probab., 11(2):438–451, 2001.
[80] J. G. Dai. On positive Harris recurrence of multiclass queueing networks: a unified
approach via fluid limit models. Ann. Appl. Probab., 5(1):49–77, 1995.
[81] J. G. Dai and S. P. Meyn. Stability and convergence of moments for multiclass queueing
networks via fluid limit models. IEEE Trans. Automat. Control, 40:1889–1904, 1995.
[82] J. G. Dai and G. Weiss. Stability and instability of fluid models for reentrant lines. Math.
Oper. Res., 21(1):115–134, 1996.
[83] D. Daley. The serial correlation coefficients of waiting times in a stationary single server
queue. J. Austral. Math. Soc., 8:683–699, 1968.
[84] A. de Acosta and P. Ney. Large deviation lower bounds for arbitrary additive functionals
of a Markov chain. Ann. Probab., 26(4):1660–1682, 1998.
[85] B. Delyon and O. Zeitouni. Lyapunov exponents for filtering problems. In Applied
stochastic analysis (London, 1989), volume 5 of Stochastics Monogr., pages 511–521.
Gordon and Breach, New York, 1991.
[86] C. Derman. A solution to a set of fundamental equations in Markov chains. Proc. Amer.
Math. Soc., 5:332–334, 1954.
[87] G. B. Di Masi and Ł. Stettner. Ergodicity of hidden Markov models. Math. Control Signals Systems, 17(4):269–296, 2005.
[88] P. Diaconis. Group Representations in Probability and Statistics. Institute of Mathemat-
ical Statistics, Hayward, Calif., 1988.
[89] P. Diaconis and L. Saloff-Coste. Logarithmic Sobolev inequalities for finite Markov chains. Ann. Appl. Probab., 6(3):695–750, 1996.
[90] P. Diaconis and D. Stroock. Geometric bounds for eigenvalues of Markov chains. Ann.
Appl. Probab., 1:36–61, 1991.
[91] J. Diebolt. Loi stationnaire et loi des fluctuations pour le processus autorégressif général
d’ordre un. C. R. Acad. Sci., 310:449–453, 1990.
[92] J. Diebolt and D. Guégan. Probabilistic properties of the general nonlinear Markovian
process of order one and applications to time series modeling. Technical report 125,
Laboratoire de Statistique Théorique et Appliquée, Université Paris, 1990.
[93] W. Doeblin. Sur les propriétés asymptotiques de mouvements régis par certains types de chaînes simples. Bull. Math. Soc. Roum. Sci., 39(1):57–115; 39(2):3–61, 1937.
[94] W. Doeblin. Exposé de la théorie des chaînes simples constantes de Markov à un nombre fini d’états. Revue Mathématique de l’Union Interbalkanique, 2:77–105, 1938.
[95] W. Doeblin. Eléments d’une théorie générale des chaînes simples constantes de Markoff. Ann. Sci. Ec. Norm. Sup., 57:61–111, 1940.
[96] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. I, II. Comm. Pure Appl. Math., 28:1–47; 28:279–301, 1975.
[97] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process
expectations for large time. III. Comm. Pure Appl. Math., 29(4):389–461, 1976.
[98] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process
expectations for large time. IV. Comm. Pure Appl. Math., 36(2):183–212, 1983.
[99] J. L. Doob. Stochastic Processes. John Wiley & Sons, New York, 1953.
[100] R. Douc, G. Fort, E. Moulines, and P. Soulier. Practical drift conditions for subgeometric
rates of convergence. Ann. Appl. Probab., 14(3):1353–1377, 2004.
[101] D. Down, S. P. Meyn, and R. L. Tweedie. Exponential and uniform ergodicity of Markov
processes. Ann. Probab., 23(4):1671–1691, 1995.
[123] S. R. Foguel. The ergodic theory of positive operators on continuous functions. Ann.
Scuola Norm. Sup. Pisa, 27:19–51, 1973.
[124] S. R. Foguel. Selected Topics in the Study of Markov Operators. Carolina Lecture Series.
Dept. of Mathematics, University of North Carolina at Chapel Hill, 1980.
[125] G. Fort, S. Meyn, E. Moulines, and P. Priouret. ODE methods for skip-free Markov
chain stability with applications to MCMC. Ann. Appl. Probab., 18(2):664–707, 2008.
[126] G. Fort and E. Moulines. Polynomial ergodicity of Markov transition kernels. Stoch.
Proc. Applns., 103(1):57–99, 2003.
[127] S. G. Foss and R. L. Tweedie. Perfect simulation and backward coupling. Comm. Statist.
Stochastic Models, 14(1-2):187–203, 1998. Special issue in honor of Marcel F. Neuts.
[128] S. G. Foss, R. L. Tweedie, and J. N. Corcoran. Simulating the invariant measures of
Markov chains using backward coupling at regeneration times. Probab. Engrg. Inform.
Sci., 12(3):303–320, 1998.
[129] F. G. Foster. On the stochastic matrices associated with certain queuing processes. Ann.
Math. Statist., 24:355–360, 1953.
[130] J. J. Fuchs and B. Delyon. Adaptive control of a simple time-varying system. IEEE
Trans. Automat. Control, 37:1037–1040, 1992.
[131] Cheng-Der Fuh. Asymptotic operating characteristics of an optimal change point detec-
tion in hidden Markov models. Ann. Statist., 32(5):2305–2339, 2004.
[132] Cheng-Der Fuh and Tze Leung Lai. Asymptotic expansions in multidimensional Markov
renewal theory and first passage times for Markov random walks. Adv. in Appl. Probab.,
33(3):652–673, 2001.
[133] Cheng-Der Fuh and Cun-Hui Zhang. Poisson equation, moment inequalities and quick
convergence for Markov random walks. Stoch. Proc. Applns., 87(1):53–67, 2000.
[134] H. Furstenberg and H. Kesten. Products of random matrices. Ann. Math. Statist.,
31:457–469, 1960.
[135] D. Gamarnik. Extension of the PAC framework to finite and countable Markov chains.
IEEE Trans. Inform. Theory, 49(1):338–345, 2003.
[136] J. M. Gani and I. W. Saunders. Some vocabulary studies of literary texts. Sankhyā Ser.
B, 38:101–111, 1976.
[137] L. Georgiadis, W. Szpankowski, and L. Tassiulas. A scheduling policy with maximal
stability region for ring networks with spatial reuse. Queueing Syst. Theory Appl., 19(1-
2):131–148, 1995.
[138] P. Glynn and R. Szechtman. Some new perspectives on the method of control variates. In K. T. Fang, F. J. Hickernell, and H. Niederreiter, editors, Monte Carlo and Quasi-Monte Carlo Methods 2000: Proceedings of a Conference held at Hong Kong Baptist University, Hong Kong SAR, China, pages 27–49. Springer-Verlag, Berlin, 2002.
[139] P. W. Glynn and S. P. Meyn. A Liapounov bound for solutions of the Poisson equation.
Ann. Probab., 24(2):916–931, 1996.
[140] P. W. Glynn and D. Ormoneit. Hoeffding’s inequality for uniformly ergodic Markov
chains. Statistics and Probability Letters, 56:143–146, 2002.
[141] F. Z. Gong and L. M. Wu. Spectral gap of positive operators and applications. J. Math.
Pure Appl., 85:151–191, 2006.
[142] G. C. Goodwin, P. J. Ramadge, and P. E. Caines. Discrete time stochastic adaptive
control. SIAM J. Control Optim., 19:829–853, 1981.
[164] C. Huang and D. Isaacson. Ergodicity using mean visit times. J. Lond. Math. Soc.,
14:570–576, 1976.
[165] J. Huang, I. Kontoyiannis, and S. P. Meyn. The ODE method and spectral theory
of Markov operators. In T. E. Duncan and B. Pasik-Duncan, editors, Proceedings of
the workshop held at the University of Kansas, Lawrence, October 18–20, 2001, pages
205–222. Springer-Verlag, Berlin, 2002.
[166] W. Huisinga, S. Meyn, and C. Schütte. Phase transitions and metastability in Markovian
and molecular systems. Ann. Appl. Probab., 14(1):419–458, 2004.
[167] K. Ichihara and H. Kunita. A classification of the second order degenerate elliptic opera-
tor and its probabilistic characterization. Z. Wahrscheinlichkeitstheorie und Verw. Geb.,
30:235–254, 1974.
[168] N. Ikeda and S. Watanabe. Stochastic Differential Equations and Diffusion Processes.
North-Holland, Amsterdam, 1981.
[169] R. Isaac. Some topics in the theory of recurrent Markov processes. Duke Math. J.,
35:641–652, 1968.
[170] D. Isaacson and R. L. Tweedie. Criteria for strong ergodicity of Markov chains. J. Appl.
Probab., 15:87–95, 1978.
[171] N. Jain. Some limit theorems for a general Markov process. Z. Wahrscheinlichkeitstheorie und Verw. Geb., 6:206–223, 1966.
[172] N. Jain and B. Jamison. Contributions to Doeblin’s theory of Markov processes. Z.
Wahrscheinlichkeitstheorie und Verw. Geb., 8:19–40, 1967.
[173] B. Jakubczyk and E. D. Sontag. Controllability of nonlinear discrete-time systems: a Lie-algebraic approach. SIAM J. Control Optim., 28:1–33, 1990.
[174] B. Jamison. Asymptotic behavior of successive iterates of continuous functions under a
Markov operator. J. Math. Anal. Appl., 9:203–214, 1964.
[175] B. Jamison. Ergodic decomposition induced by certain Markov operators. Trans. Amer.
Math. Soc., 117:451–468, 1965.
[176] B. Jamison. Irreducible Markov operators on C(S). Proc. Amer. Math. Soc., 24:366–370,
1970.
[177] B. Jamison and S. Orey. Markov chains recurrent in the sense of Harris. Z. Wahrschein-
lichkeitstheorie und Verw. Geb., 8:41–48, 1967.
[178] B. Jamison and R. Sine. Sample path convergence of stable Markov processes. Z.
Wahrscheinlichkeitstheorie und Verw. Geb., 28:173–177, 1974.
[179] S. F. Jarner and S. Hansen. Geometric ergodicity of Metropolis algorithms. Stoch. Proc.
Applns., 85(2):341–361, 2000.
[180] S. F. Jarner and G. O. Roberts. Polynomial convergence rates of Markov chains. Ann.
Appl. Probab., 12(1):224–247, 2002.
[181] A. A. Johnson and G. L. Jones. Gibbs sampling for a Bayesian hierarchical general linear
model. ArXiv:0712.3056 [math.PR], 2007.
[182] D. A. Jones. Non-linear autoregressive processes. Proc. Roy. Soc. A, 360:71–95, 1978.
[183] G. L. Jones. On the Markov chain Central Limit Theorem. Probab. Surv., 1:299–320
(electronic), 2004.
[184] G. L. Jones and J. P. Hobert. Honest exploration of intractable probability distributions
via Markov chain Monte Carlo. Statist. Sci., 16(4):312–334, 2001.
[185] A. F. Veinott, Jr. Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist., 40(5):1635–1660, 1969.
[186] M. Kac. On the notion of recurrence in discrete stochastic processes. Bull. Amer. Math.
Soc., 53:1002–1010, 1947.
[187] V. V. Kalashnikov. Analysis of ergodicity of queueing systems by Lyapunov’s direct method (in Russian). Avtomatica i Telemechanica, 4:46–54, 1971.
[188] V. V. Kalashnikov. The property of gamma-reflexivity for Markov sequences. Soviet
Math. Dokl., 14:1869–1873, 1973.
[189] V. V. Kalashnikov. Stability analysis in queueing problems by the method of test func-
tions. Theory Probab. Appl., 22:86–103, 1977.
[190] V. V. Kalashnikov. Qualitative Analysis of Complex Systems Behaviour by the Test Functions Method (in Russian). Nauka, Moscow, 1978.
[191] V. V. Kalashnikov and S. T. Rachev. Mathematical Methods for Construction of Queue-
ing Models. Wadsworth and Brooks/Cole, New York, 1990.
[192] R. E. Kalman and J. E. Bertram. Control system analysis and design by the second
method of Lyapunov. Trans. ASME Ser. D: J. Basic Eng., 82:371–400, 1960.
[193] M. Kaplan. A sufficient condition for nonergodicity of a Markov chain. IEEE Trans.
Inform. Theory, 25:470–471, 1979.
[194] S. Karlin and H. M. Taylor. A First Course in Stochastic Processes. Academic Press,
New York, second edition, 1975.
[195] H. A. Karlsen. Existence of moments in a stationary stochastic difference equation. Adv.
Appl. Probab., 22:129–146, 1990.
[196] N. V. Kartashov. Criteria for uniform ergodicity and strong stability of Markov chains
with a common phase space. Theory Probab. Appl., 30:71–89, 1985.
[197] N. V. Kartashov. Inequalities in theorems of ergodicity and stability for Markov chains with a common phase space. Theory Probab. Appl., 30:247–259, 1985.
[198] J. L. Kelley. General Topology. Van Nostrand, Princeton, N.J., 1955.
[199] F. P. Kelly. Reversibility and Stochastic Networks. John Wiley & Sons, Chichester, U.K.,
1979.
[200] D. G. Kendall. Some problems in the theory of queues. J. Roy. Statist. Soc. Ser. B,
13:151–185, 1951.
[201] D. G. Kendall. Stochastic processes occurring in the theory of queues and their analysis
by means of the imbedded Markov chain. Ann. Math. Statist., 24:338–354, 1953.
[202] D. G. Kendall. Unitary dilations of Markov transition operators and the correspond-
ing integral representation for transition-probability matrices. In U. Grenander, editor,
Probability and Statistics, pages 139–161. Almqvist and Wiksell, Stockholm, 1959.
[203] D. G. Kendall. Geometric ergodicity in the theory of queues. In K. J. Arrow, S. Karlin,
and P. Suppes, editors, Mathematical Methods in the Social Sciences, pages 176–195.
Stanford University Press, Stanford, 1960.
[204] D. G. Kendall. Kolmogorov as I remember him. Statist. Sci., 6:303–312, 1991.
[205] G. Kersting. On recurrence and transience of growth models. J. Appl. Probab., 23:614–
625, 1986.
[206] R. Z. Khas’minskii. Stochastic Stability of Differential Equations. Sijthoff & Noordhoff,
Netherlands, 1980.
[228] H. Kunita. Supports of diffusion processes and controllability problems. In K. Itô, editor,
Proceedings of the International Symposium on Stochastic Differential Equations, pages
163–185. John Wiley & Sons, New York, 1978.
[229] H. Kunita. Stochastic Flows and Stochastic Differential Equations. Cambridge University
Press, Cambridge, 1990.
[230] B. C. Kuo. Automatic Control Systems. Prentice-Hall, Englewood Cliffs, N.J., sixth
edition, 1990.
[231] T. G. Kurtz. The Central Limit Theorem for Markov chains. Ann. Probab., 9:557–560,
1981.
[232] H. J. Kushner. Stochastic Stability and Control. Academic Press, New York, 1967.
[233] M. T. Lacey and W. Philipp. A note on the almost sure Central Limit Theorem. Statistics
and Probability Letters, 9:201–205, 1990.
[234] J. Lamperti. Criteria for the recurrence or transience of stochastic processes I. J. Math.
Anal. Appl., 1:314–330, 1960.
[235] J. Lamperti. Criteria for stochastic processes II: passage time moments. J. Math. Anal.
Appl., 7:127–145, 1963.
[236] C. Landim. Central limit theorem for Markov processes. In From Classical to Modern
Probability, volume 54 of Progr. Probab., pages 145–205. Birkhäuser, Basel, 2003.
[237] G. M. Laslett, D. B. Pollard, and R. L. Tweedie. Techniques for establishing ergodic
and recurrence properties of continuous-valued Markov chains. Nav. Res. Log. Quart.,
25:455–472, 1978.
[238] M. Lin. Conservative Markov processes on a topological space. Israel J. Math., 8:165–186,
1970.
[239] T. Lindvall. Lectures on the Coupling Method. John Wiley & Sons, New York, 1992.
[240] R. S. Liptser and A. N. Shiryayev. Statistics of Random Processes, II: Applications. Springer-Verlag, New York, 1978.
[241] R. Lund and R. L. Tweedie. Geometric convergence rates for stochastically ordered
Markov chains. Math. Oper. Res., 20:182–194, 1996.
[242] N. Maigret. Théorème de limite centrale pour une chaîne de Markov récurrente Harris positive. Ann. Inst. Henri Poincaré Ser. B, 14:425–440, 1978.
[243] V. A. Malyšev and M. V. Men’šikov. Ergodicity, continuity and analyticity of countable Markov chains. Trudy Moskov. Mat. Obshch., 39:3–48, 235, 1979. Trans. Moscow Math. Soc., pp. 1–48, 1981.
[244] V. A. Malyšev. Classification of two-dimensional positive random walks and almost linear semi-martingales. Soviet Math. Dokl., 13:136–139, 1972.
[245] R. S. Mamon and R. J. Elliott. Hidden Markov Models in Finance, volume 104 of
International Series in Operations Research & Management Science. Springer-Verlag,
New York, 2007.
[246] P. Marbach and J. N. Tsitsiklis. Simulation-based optimization of Markov reward pro-
cesses. IEEE Trans. Automat. Control, 46(2):191–209, 2001.
[247] I. M. Y. Mareels and R. R. Bitmead. Bifurcation effects in robust adaptive control. IEEE
Trans. Circuits and Systems, 35:835–841, 1988.
[248] A. A. Markov. Extension of the law of large numbers to dependent quantities (in Rus-
sian). Izv. Fiz.-Matem. Obsch. Kazan Univ. (2nd Ser.), 15:135–156, 1906.
[249] P. G. Marlin. On the ergodic theory of Markov chains. Operations Res., 21:617–622,
1973.
[250] P. Mathé. Numerical integration using V -uniformly ergodic Markov chains. J. Appl.
Probab., 41(4):1104–1112, 2004.
[251] J. G. Mauldon. On non-dissipative Markov chains. Math. Proc. Camb. Phil. Soc., 53:825–
835, 1958.
[252] M. Maxwell and M. Woodroofe. Central limit theorems for additive functionals of Markov
chains. Ann. Probab., 28(2):713–724, 2000.
[253] D. Q. Mayne. Optimal nonstationary estimation of the parameters of a linear system
with Gaussian inputs. J. Electron. Contr., 14:101, 1963.
[254] A. Medio. Invariant probability distributions in economic models: a general result. Macroeconomic Dynamics, 8(2):162–187, 2004. Available at http://ideas.repec.org/a/cup/macdyn/v8y2004i02p162-187_03.html.
[255] A. I. Mees, editor. Nonlinear Dynamics and Statistics. Birkhäuser, Boston, 2001. Selected
papers from the workshop held at Cambridge University, Cambridge, September 1998.
[256] K. L. Mengersen and R. L. Tweedie. Rates of convergence of the Hastings and Metropolis algorithms. Ann. Statist., 24:101–121, 1996.
[257] M. V. Men’šikov. Ergodicity and transience conditions for random walks in the positive
octant of space. Soviet Math. Dokl., 15:1118–1121, 1974.
[258] J.-F. Mertens, E. Samuel-Cahn, and S. Zamir. Necessary and sufficient conditions for
recurrence and transience of Markov chains, in terms of inequalities. J. Appl. Probab.,
15:848–851, 1978.
[259] S. P. Meyn. Ergodic theorems for discrete time stochastic systems using a stochastic
Lyapunov function. SIAM J. Control Optim., 27:1409–1439, 1989.
[260] S. P. Meyn. Stability of Markov chains on topological spaces with applications to adap-
tive control and time series analysis. In L. Gerencsér and P. E. Caines, editors, Top-
ics in Stochastic Systems: Modelling, Estimation and Adaptive Control, pages 369–401.
Springer-Verlag, New York, 1991.
[261] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes
with general state space. IEEE Trans. Automat. Control, 42(12):1663–1680, 1997.
[262] S. P. Meyn. Stability and optimization of queueing networks and their fluid models.
In Mathematics of Stochastic Manufacturing Systems (Williamsburg, VA, 1996), pages
175–199. American Mathematical Society, Providence, R.I., 1997.
[263] S. P. Meyn. Algorithms for optimization and stabilization of controlled Markov chains.
Sādhanā, 24(4-5):339–367, 1999.
[264] S. P. Meyn. Sequencing and routing in multiclass queueing networks I: feedback regula-
tion. SIAM J. Control Optim., 40(3):741–776, 2001.
[265] S. P. Meyn. Large deviation asymptotics and control variates for simulating large func-
tions. Ann. Appl. Probab., 16(1):310–339, 2006.
[266] S. P. Meyn. Myopic policies and MaxWeight policies for stochastic networks. In Proc.
of the 46th Conf. on Dec. and Control, pages 639–646, 2007.
[267] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press,
Cambridge, 2008.
[268] S. P. Meyn. Stability and asymptotic optimality of generalized MaxWeight policies. To
appear in SIAM J. Control Optim., 2008.
[269] S. P. Meyn and L. J. Brown. Model reference adaptive control of time varying and
stochastic systems. IEEE Trans. Automat. Control, 38:1738–1753, 1993.
[270] S. P. Meyn and P. E. Caines. A new approach to stochastic adaptive control. IEEE
Trans. Automat. Control, AC-32:220–226, 1987.
[271] S. P. Meyn and P. E. Caines. Stochastic controllability and stochastic Lyapunov functions
with applications to adaptive and nonlinear systems. In Stochastic Differential Systems.
Proc. 4th Bad Honnef Conference, pages 235–257. Springer-Verlag, Berlin, 1989.
[272] S. P. Meyn and P. E. Caines. Asymptotic behavior of stochastic systems processing
Markovian realizations. SIAM J. Control Optim., 29:535–561, 1991.
[273] S. P. Meyn and D. G. Down. Stability of generalized Jackson networks. Ann. Appl.
Probab., 4:124–148, 1994.
[274] S. P. Meyn and L. Guo. Stability, convergence, and performance of an adaptive control
algorithm applied to a randomly varying system. IEEE Trans. Automat. Control, AC-
37:535–540, 1992.
[275] S. P. Meyn and L. Guo. Geometric ergodicity of a doubly stochastic time series model.
J. Time Ser. Analysis, 14(1):93–108, 1993.
[276] S. P. Meyn, G. Hagen, G. Mathew, and A. Banaszuk. On complex spectra and metastability of Markov models. In Proc. of the 47th Conf. on Dec. and Control, 2008. More information at https://css.paperplaza.net/conferences/scripts/abstract.pl?ConfID=32&Number=1318.
[277] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes I: discrete time chains.
Adv. Appl. Probab., 24:542–574, 1992.
[278] S. P. Meyn and R. L. Tweedie. Generalized resolvents and Harris recurrence of Markov
processes. Contemporary Mathematics, 149:227–250, 1993.
[279] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes II: continuous time
processes and sampled chains. Adv. Appl. Probab., 25:487–517, 1993.
[280] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes III: Foster–Lyapunov
criteria for continuous time processes. Adv. Appl. Probab., 25:518–548, 1993.
[281] S. P. Meyn and R. L. Tweedie. The Doeblin decomposition. Contemporary Mathematics,
149:211–225, 1993.
[282] S. P. Meyn and R. L. Tweedie. Computable bounds for convergence rates of Markov
chains. Ann. Appl. Probab., 4:981–1011, 1994.
[283] S. P. Meyn and R. L. Tweedie. State-dependent criteria for convergence of Markov
chains. Ann. Appl. Probab., 4:149–168, 1994.
[284] H. D. Miller. Geometric ergodicity in a class of denumerable Markov chains. Z.
Wahrscheinlichkeitstheorie und Verw. Geb., 4:354–373, 1966.
[285] S. Mittnik. Nonlinear time series analysis with generalized autoregressions: a state space
approach. Working paper WP-91-06, State University of New York at Stony Brook, Stony
Brook, N.Y., 1991.
[286] A. Mokkadem. Critères de mélange pour des processus stationnaires. Estimation sous des hypothèses de mélange. Entropie de processus linéaires. PhD thesis, Université Paris Sud, Centre d’Orsay, 1987.
[287] P. A. P. Moran. The statistical analysis of the Canadian lynx cycle I: structure and
prediction. Aust. J. Zool., 1:163–173, 1953.
[288] P. A. P. Moran. The Theory of Storage. Methuen, London, 1959.
[289] M. D. Moustafa. Input-output Markov processes. Proc. Koninkl. Ned. Akad. Wetensch.,
A60:112–118, 1957.
[290] P. Mykland, L. Tierney, and B. Yu. Regeneration in Markov chain samplers. J. Amer.
Statist. Assoc., 90(429), 1995.
[291] E. Nelson. The adjoint Markoff process. Duke Math. J., 25:671–690, 1958.
[292] M. F. Neuts. Two Markov chains arising from examples of queues with state dependent service times. Sankhyā Ser. A, 29:259–264, 1967.
[293] M. F. Neuts. Markov chains with applications in queueing theory, which have a matrix-
geometric invariant probability vector. Adv. Appl. Probab., 10:185–212, 1978.
[294] Marcel F. Neuts. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic
Approach. Dover Publications, New York, 1994. Corrected reprint of the 1981 original.
[295] J. Neveu. Potentiel Markovien récurrent des chaînes de Harris. Ann. Inst. Fourier, Grenoble, 22:7–130, 1972.
[296] J. Neveu. Discrete-Parameter Martingales. North-Holland, Amsterdam, 1975.
[297] P. Ney and E. Nummelin. Markov additive processes I: eigenvalue properties and limit
theorems. Ann. Probab., 15(2):561–592, 1987.
[298] P. Ney and E. Nummelin. Markov additive processes II: large deviations. Ann. Probab.,
15(2):593–609, 1987.
[299] D. F. Nicholls and B. G. Quinn. Random Coefficient Autoregressive Models: An Intro-
duction. Springer-Verlag, New York, 1982.
[300] S. Niemi and E. Nummelin. Central limit theorems for Markov random walks. Com-
mentationes Physico-Mathematicae, 54, 1982.
[301] E. Nummelin. A splitting technique for Harris recurrent chains. Z. Wahrscheinlichkeit-
stheorie und Verw. Geb., 43:309–318, 1978.
[302] E. Nummelin. Uniform and ratio limit theorems for Markov renewal and semi-
regenerative processes on a general state space. Ann. Inst. Henri Poincaré Ser. B,
14:119–143, 1978.
[303] E. Nummelin. General Irreducible Markov Chains and Nonnegative Operators. Cam-
bridge University Press, Cambridge, 1984.
[304] E. Nummelin. On the Poisson equation in the potential theory of a single kernel. Math.
Scand., 68:59–82, 1991.
[305] E. Nummelin and P. Tuominen. Geometric ergodicity of Harris recurrent Markov chains
with applications to renewal theory. Stoch. Proc. Applns., 12:187–202, 1982.
[306] E. Nummelin and P. Tuominen. The rate of convergence in Orey’s theorem for Har-
ris recurrent Markov chains with applications to renewal theory. Stoch. Proc. Applns.,
15:295–311, 1983.
[307] E. Nummelin and R. L. Tweedie. Geometric ergodicity and R-positivity for general
Markov chains. Ann. Probab., 6:404–420, 1978.
[308] S. Orey. Recurrent Markov chains. Pacific J. Math., 9:805–827, 1959.
[309] S. Orey. Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand
Reinhold, London, 1971.
[310] D. Ornstein. Random walks I. Trans. Amer. Math. Soc., 138:1–43, 1969.
[311] D. Ornstein. Random walks II. Trans. Amer. Math. Soc., 138:45–60, 1969.
[312] A. Pakes and D. Pollard. Simulation and the asymptotics of optimization estimators.
Econometrica, 57:1027–1057, 1989.
[313] A. G. Pakes. Some conditions for ergodicity and recurrence of Markov chains. Operations
Res., 17:1048–1061, 1969.
[314] K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York,
1967.
[315] J. Petruccelli and S. W. Woolford. A threshold AR(1) model. J. Appl. Probab., 21:270–
286, 1984.
[316] R. G. Phillips and P. V. Kokotovic. A singular perturbation approach to modeling and control of Markov chains. IEEE Trans. Automat. Control, 26(5), 1981.
[317] J. W. Pitman. Uniform rates of convergence for Markov chain transition probabilities.
Z. Wahrscheinlichkeitstheorie und Verw. Geb., 29:193–227, 1974.
[318] D. B. Pollard and R. L. Tweedie. R-theory for Markov chains on a topological state
space I. J. London Math. Society, 10:389–400, 1975.
[319] D. B. Pollard and R. L. Tweedie. R-theory for Markov chains on a topological state
space II. Z. Wahrscheinlichkeitstheorie und Verw. Geb., 34:269–278, 1976.
[320] N. Popov. Conditions for geometric ergodicity of countable Markov chains. Soviet Math.
Dokl., 18:676–679, 1977.
[321] M. Pourahmadi. On stationarity of the solution of a doubly stochastic model. J. Time
Ser. Anal., 7:123–131, 1986.
[322] N. U. Prabhu. Queues and Inventories. John Wiley & Sons, New York, 1965.
[323] S. Rai, P. Glynn, and J. E. Glynn. Recurrence classification for a family of non-linear storage models. Submitted for publication, 2007.
[324] R. Rayadurgam, S. P. Meyn, and L. Brown. Bayesian adaptive control of time varying
systems. In Proc. of the 31st Conf. on Dec. and Control, Tucson, Ariz., 1992.
[325] S. I. Resnick. Extreme Values, Regular Variation and Point Processes. Springer-Verlag,
New York, 1987.
[326] D. Revuz. Markov Chains. North-Holland, Amsterdam, second edition, 1984.
[327] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, New
York, second edition, 2004.
[328] G. O. Roberts and N. Polson. A note on the geometric convergence of the Gibbs sampler.
J. Roy. Statist. Soc. Ser. B, 56:377–384, 1994.
[329] G. O. Roberts and J. S. Rosenthal. Geometric ergodicity and hybrid Markov chains. Electronic Comm. Probab., 2:13–25, 1997.
[330] G. O. Roberts and J. S. Rosenthal. Convergence of the slice sampler. J. Roy. Statist.
Soc. Ser. B, 61:643–660, 1999.
[331] G. O. Roberts and J. S. Rosenthal. The polar slice sampler. Stoch. Models, 18:257–280,
2002.
[332] G. O. Roberts and J. S. Rosenthal. General state space Markov chains and MCMC
algorithms. Probab. Surv., 1:20–71 (electronic), 2004.
[333] G. O. Roberts and A. F. M. Smith. Simple conditions for the convergence of the Gibbs
sampler and Hastings–Metropolis algorithms. Stoch. Proc. Applns., 49(2):207–216, 1994.
[334] G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions
and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
[335] G. O. Roberts and R. L. Tweedie. Geometric convergence and Central Limit Theorems
for multidimensional Hastings and Metropolis algorithms. Biometrika, 83:95–100, 1996.
[336] Z. Rosberg. A note on the ergodicity of Markov chains. J. Appl. Probab., 18:112–121,
1981.
[337] M. Rosenblatt. Equicontinuous Markov operators. Teor. Verojatnost. i Primenen., 9:205–222, 1964.
[338] M. Rosenblatt. Invariant and subinvariant measures of transition probability functions
acting on continuous functions. Z. Wahrscheinlichkeitstheorie und Verw. Geb., 25:209–
221, 1973.
[339] M. Rosenblatt. Recurrent points and transition functions acting on continuous functions.
Z. Wahrscheinlichkeitstheorie und Verw. Geb., 30:173–183, 1974.
[340] W. A. Rosenkrantz. Ergodicity conditions for two-dimensional Markov chains on the
positive quadrant. Prob. Theory and Related Fields, 83:309–319, 1989.
[341] J. S. Rosenthal. Rates of Convergence for Gibbs Sampler and Other Markov Chains.
PhD thesis, Harvard University, 1992.
[342] J. S. Rosenthal. Correction: “Minorization conditions and convergence rates for Markov
chain Monte Carlo”. J. Amer. Statist. Assoc., 90(431):1136, 1995.
[343] J. S. Rosenthal. Minorization conditions and convergence rates for Markov chain Monte
Carlo. J. Amer. Statist. Assoc., 90(430):558–566, 1995.
[344] J. S. Rosenthal. Quantitative convergence rates of Markov chains: a simple account.
Electron. Comm. Probab., 7:123–128 (electronic), 2002.
[345] W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, second edition, 1974.
[346] S. H. Saperstone. Semidynamical Systems in Infinite Dimensional Spaces. Springer-
Verlag, New York, 1981.
[347] P. J. Schweitzer. Perturbation theory and finite Markov chains. J. Appl. Probab., 5:401–403, 1968.
[348] E. Seneta. Non-Negative Matrices and Markov Chains. Springer, New York, second
edition, 1981.
[349] L. I. Sennott, P. A. Humblet, and R. L. Tweedie. Mean drifts and the non-ergodicity of
Markov chains. Operations Res., 31:783–789, 1983.
[350] J. G. Shanthikumar and D. D. Yao. Second-order properties of the throughput of a
closed queueing network. Math. Oper. Res., 13:524–533, 1988.
[351] T. Shardlow and A. M. Stuart. A perturbation theory for ergodic Markov chains and
application to numerical approximations. SIAM J. Numer. Anal., 37(4):1120–1137, 2000.
[352] M. Sharpe. General Theory of Markov Processes. Academic Press, New York, 1988.
[353] Z. Šidák. Classification of Markov chains with a general state space. In Trans. 4th Prague Conf. Inf. Theory Stat. Dec. Functions, Random Proc., pages 547–571. Academia, Prague, 1967.
[354] K. Sigman. The stability of open queueing networks. Stoch. Proc. Applns., 35:11–25,
1990.
[355] G. F. Simmons. Introduction to Topology and Modern Analysis. McGraw-Hill, New York, 1963.
[356] R. Sine. Convergence theorems for weakly almost periodic Markov operators. Israel J.
Math., 19:246–255, 1974.
[357] R. Sine. On local uniform mean convergence for Markov operators. Pacific J. Math.,
60:247–252, 1975.
[358] R. Sine. Sample path convergence of stable Markov processes II. Indiana University
Math. J., 25:23–43, 1976.
[359] A. F. M. Smith and A. E. Gelfand. Bayesian statistics without tears: a sampling-
resampling perspective. Amer. Statist., 46:84–88, 1992.
[360] A. F. M. Smith and G. O. Roberts. Bayesian computation via the Gibbs sampler and
related Markov chain Monte Carlo methods (with discussion). J. Roy. Statist. Soc. Ser.
B, 55:3–23, 1993.
[361] W. L. Smith. Asymptotic renewal theorems. Proc. Roy. Soc. Edinburgh (A), 64:9–48,
1954.
[362] W. L. Smith. Regenerative stochastic processes. Proc. Roy. Soc. London (A), 232:6–31,
1955.
[363] W. L. Smith. Remarks on the paper “Regenerative stochastic processes”. Proc. Roy.
Soc. London (A), 256:296–301, 1960.
[364] J. Snyders. Stationary probability distributions for linear time-invariant systems. SIAM
J. Control Optim., 15:428–437, 1977.
[365] V. Solo. Stochastic adaptive control and martingale limit theory. IEEE Trans. Automat.
Control, 35:66–70, 1990.
[366] F. M. Spieksma. Geometrically Ergodic Markov Chains and the Optimal Control of
Queues. PhD thesis, University of Leiden, 1991.
[367] F. M. Spieksma. Spectral conditions and bounds for the rate of convergence of countable
Markov chains. Technical report, University of Leiden, 1993.
[368] F. M. Spieksma and R. L. Tweedie. Strengthening ergodicity to geometric ergodicity for
Markov chains. Stochastic Models, 10:45–75, 1994.
[369] F. Spitzer. Principles of Random Walk. Van Nostrand, Princeton, N.J., 1964.
[370] D. Steinsaltz. Locally contractive iterated function systems. Ann. Probab., 27(4):1952–
1979, 1999.
[371] Ö. Stenflo. Uniqueness of invariant measures for place-dependent random iterations of
functions. In Fractals in Multimedia (Minneapolis, MN, 2001), volume 132 of IMA Vol.
Math. Appl., pages 13–32. Springer, New York, 2002.
[372] L. Stettner. On the existence and uniqueness of invariant measures for continuous time
Markov processes. Technical report LCDS 86-18, Brown University, Providence, R.I.,
1986.
[373] A. L. Stolyar. On the stability of multiclass queueing networks: a relaxed sufficient
condition via limiting fluid processes. Markov Process. Related Fields, 1(4):491–512,
1995.
[374] C. R. Stone. On absolutely continuous components and renewal theory. Ann. Math.
Statist., 37:271–275, 1966.
[375] O. Stramer and R. L. Tweedie. Langevin-type models I: diffusions with given stationary
distributions and their discretizations. Methodol. Comput. Appl. Probab., 1(3):283–306,
1999.
[376] O. Stramer and R. L. Tweedie. Langevin-type models II: self-targeting candidates for
MCMC algorithms. Methodol. Comput. Appl. Probab., 1(3):307–328, 1999.
[398] R. L. Tweedie. Criteria for classifying general Markov chains. Adv. Appl. Probab., 8:737–
771, 1976.
[399] R. L. Tweedie. Operator geometric stationary distributions for Markov chains with
applications to queueing models. Adv. Appl. Probab., 14:368–391, 1981.
[400] R. L. Tweedie. Criteria for rates of convergence of Markov chains with application
to queueing and storage theory. In J. F. C. Kingman and G. E. H. Reuter, editors,
Probability, Statistics and Analysis. Cambridge University Press, Cambridge, 1983.
[401] R. L. Tweedie. The existence of moments for stationary Markov chains. J. Appl. Probab.,
20:191–196, 1983.
[402] R. L. Tweedie. Invariant measures for Markov chains with no irreducibility assumptions.
J. Appl. Probab., 25A:275–285, 1988.
[403] D. Vere-Jones. Geometric ergodicity in denumerable Markov chains. Quart. J. Math.
Oxford (2nd Ser.), 13:7–28, 1962.
[404] D. Vere-Jones. A rate of convergence problem in the theory of queues. Theory Probab.
Appl., 9:96–103, 1964.
[405] D. Vere-Jones. Ergodic properties of nonnegative matrices. I. Pacific J. Math., 22:361–
386, 1967.
[406] D. Vere-Jones. Ergodic properties of nonnegative matrices II. Pacific J. Math., 26:601–
620, 1968.
[407] L. Wu and N. Yao. Large deviation principles for Markov processes via Φ-Sobolev
inequalities. Electron. Commun. Probab., 13:10–23, 2008.
[408] Liming Wu. Essential spectral radius for Markov semigroups I: discrete time case. Prob.
Theory Related Fields, 128(2):255–321, 2004.
[409] L. M. Wu. Large deviations for Markov processes under superboundedness. C. R. Acad. Sci. Paris Série I, 324:777–782, 1995.
[410] B. E. Ydstie. Bifurcations and complex dynamics in adaptive control systems. In Proc.
of the 25th Conf. on Dec. and Control, Athens, Greece, 1986.
[411] M. Yor and B. Bru. Comments on the life and mathematical legacy of Wolfgang Doeblin.
Finance and Stochastics, 6(1):3–47, 2002.
[412] K. Yosida and S. Kakutani. Operator-theoretical treatment of Markov’s process and
mean ergodic theorem. Ann. Math., 42:188–228, 1941.
Indexes

General index

Symbols
Absolute continuity, 75
τ_A := min{n ≥ 1 : Φ_n ∈ A}, 64
τ_A, First entry time to A, 14
τ_A(1) := τ_A, 64
τ_A(k) := min{n > τ_A(k − 1) : Φ_n ∈ A}, 64
θ^k, kth order shift operator, 63
ϕ, Irreducibility measure, 81
h_Y, Almost everywhere invariant function, 423
m_n(t), Interpolation of M_n(g), 447
q_j, Prob. of j arrivals in one service in M/G/1 queue, 58
s_j(f), Sum of f(Φ_i) between visits to atom, 428
s_n(t), Interpolation of S_n(g), 447
B(R), Borel σ-field, 553
B^+(X), Sets with ψ(A) > 0, 84
F^Φ_ζ := {A ∈ F : {ζ = n} ∩ A ∈ F^Φ_n, n ∈ Z_+}, 66
F^Φ_n := σ(Φ_0, . . . , Φ_n), 63
G^+(γ), Distributions with transform convergent in [0, γ], 399
M, Borel probability measures, 17
P̌(x_i, A), Split transition function, 99
ĝ, Solution to Poisson’s equation, 443
π̃{A}, 425
P^k(x, · ), Cesàro average of P^k, 288
Φ, Markov chain, 3, 61
Φ^m, m-skeleton chain, 62
Φ_a, Chain with transition function K_a, 115
Φ_a, Sampled chain, 116
vec(B), 415
P_x, Prob. conditional on Φ_0 = x, 14
X, State space, 50
B(X), Borel σ-field on X, 49
C(X), Bounded continuous functions, 557
C(X), Continuous bounded functions on X, 125
C_0(X), Continuous functions vanishing at ∞, 560
C_c(X), Continuous functions on X, compact support, 140
_A P^n(x, B) := P_x(Φ_n ∈ B, τ_A ≥ n), 67