Markov Chains and Stochastic Stability
Second Edition
The bible on Markov chains in general state spaces has been brought up to date to
reflect developments in the field since 1996 – many of them sparked by publication
of the first edition.
The pursuit of more efficient simulation algorithms for complex Markovian
models, or algorithms for computation of optimal policies for controlled Markov
models, has opened new directions for research on Markov chains. As a result,
new applications have emerged across a wide range of topics including optimiza-
tion, statistics, and economics. New commentary and an epilogue by Sean Meyn
summarize recent developments, and references have been fully updated.
This second edition reflects the same discipline and style that marked out the
original and helped it to become a classic: proofs are rigorous and concise, the
range of applications is broad and knowledgeable, and key ideas are accessible to
practitioners with limited mathematical background.
Information on this title: www.cambridge.org/9780521731829
Asterisks (*) mark sections from the first edition that have been revised or augmented
in the second edition.
List of figures xi
1 Heuristics 3
1.1 A range of Markovian environments 3
1.2 Basic models in practice 6
1.3 Stochastic stability for Markov models 13
1.4 Commentary 19
2 Markov models 21
2.1 Markov models in time series 22
2.2 Nonlinear state space models* 26
2.3 Models in control and systems theory 33
2.4 Markov models with regeneration times 38
2.5 Commentary* 46
3 Transition probabilities 48
3.1 Defining a Markovian process 49
3.2 Foundations on a countable space 51
3.3 Specific transition matrices 54
3.4 Foundations for general state space chains 59
3.5 Building transition kernels for specific models 67
3.6 Commentary 72
4 Irreducibility 75
4.1 Communication and irreducibility: Countable spaces 76
4.2 ψ-Irreducibility 81
4.3 ψ-Irreducibility for random walk models 87
4.4 ψ-Irreducible linear models 89
4.5 Commentary 93
5 Pseudo-atoms 96
5.1 Splitting ϕ-irreducible chains 97
5.2 Small sets 102
5.3 Small sets for specific models 106
5.4 Cyclic behavior 110
5.5 Petite sets and sampled chains 115
5.6 Commentary 121
18 Positivity 462
18.1 Null recurrent chains 464
18.2 Characterizing positivity using P^n 469
18.3 Positivity and T-chains 471
18.4 Positivity and e-chains 473
18.5 The LLN for e-chains 477
18.6 Commentary 480
IV APPENDICES 529
A Mud maps 532
A.1 Recurrence versus transience 532
A.2 Positivity versus nullity 534
A.3 Convergence properties 536
Bibliography 567
Indexes 587
General index 587
Symbols 593
List of figures
16.1 Simple adaptive control model when the control is set equal to zero 418
20.1 Estimates of the steady state customer population for a network model 522
B.1 The SETAR model: stability classification of (θ(1), θ(M))-space 540
B.2 The SETAR model: stability classification of (θ(1), θ(M))-space 541
B.3 The SETAR model: stability classification of (θ(1), θ(M))-space 542
Prologue to the second edition
Markov Chains and Stochastic Stability is one of those rare instances of a young book
that has become a classic. In understanding why the community has come to regard
the book as a classic, it should be noted that all the key ingredients are present. Firstly,
the material that is covered is both interesting mathematically and central to a number
of important application domains. Secondly, the core mathematical content is non-
trivial and had been in constant evolution over the years and decades prior to the
first edition’s publication; key papers were scattered across the literature and had been
published in widely diverse journals. So, there was an obvious need for a thoughtful
and well-organized book on the topic. Thirdly, and most important, the topic attracted
two authors who were research experts in the area and endowed with remarkable skill
in communicating complex ideas to specialists and applications-focused users alike,
and who also exhibited superb taste in deciding which key ideas and approaches to
emphasize.
When the first edition of the book was published in 1993, Markov chains already
had a long tradition as mathematical models for stochastically evolving dynamical sys-
tems arising in the physical sciences, economics, and engineering, largely centered on
discrete state space formulations. A great deal of Markov chain theory had been developed, both in discrete state space and general state space. However,
the general state space theory had grown to include multiple (and somewhat divergent)
mathematical strands, having much to do with the fact that there are several natural
(but different) ways that one can choose to generalize the fundamental countable state
concept of irreducibility to general state space. Roughly speaking, one strand took ad-
vantage of topological ideas, compactness methods, and required Feller continuity of the
transition kernel. The second major strand, starting with the pioneering work of Harris
in the 1950s, subsequently amplified by Orey, and later simplified through the beautiful
contributions of Nummelin, Athreya, and Ney in the 1970s, can be viewed as an effort
to understand general state space Markov chains through the prism of regeneration.
Thus, Meyn and Tweedie had to make some key decisions regarding the general state
space tools that they would emphasize in the book. The span of time that has elapsed
since this book’s publication makes clear that they chose well.
While offering an excellent and accessible discussion of methods based on topologi-
cal machinery, the book focuses largely on the more widely applicable and more easily
used concept of regeneration in general state space. In addition, the book recognizes
the central role that Foster–Lyapunov functions play in verifying recurrence and bound-
ing the moments and expectations that arise naturally in development of the theory of
Markov chains. In choosing to emphasize these ideas, the authors were able to offer
the community, and especially practitioners, a convenient and easily applied roadmap
through a set of concepts and ideas that had previously been accessible only to special-
ists. Sparked by the publication of the first edition of this book, there has subsequently
been an explosion in the number of papers involving applications of general state space
Markov chains.
As it turns out, the period that has elapsed since publication of the first edition
also fortuitously coincided with the rapid development of several key applications areas
in which the tools developed in the book have played a fundamental role. Perhaps
the most important such application is that of Markov chain Monte Carlo (MCMC)
algorithms. In the MCMC setting, the basic problem at hand is the construction of an
efficient algorithm capable of sampling from a given target distribution, which is known
up to a normalization constant that is not numerically or analytically computable. The
idea is to produce a Markov chain having a unique stationary distribution that coincides
with the target distribution. Constructing such a Markov chain is typically easy, so one
has many potential choices. Since the algorithm is usually initialized with a
distribution that is atypical of equilibrium behavior, one then wishes to find a chain
that converges to its steady state rapidly. The tools discussed in this book play a
central role in answering such questions. General state space Markov chain ideas also
have been used to great effect in other rapidly developing algorithmic contexts such as
machine learning and in the analysis of the many randomized algorithms having a time
evolution described by a stochastic recursive sequence. Finally, many of the performance
engineering applications that have been explored over the past fifteen years leverage this body of theory, particularly those results that have involved trying to make
rigorous the connection between stability of deterministic fluid models and stability of
the associated stochastic queueing analogue. Given the ubiquitous nature of stochastic
systems or algorithms described through stochastic recursive sequences, it seems likely
that many more applications of the theory described in this book will arise in the years
ahead. So, the marketplace of potential consumers of this book is likely to be a healthy
one for many years to come.
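To make the MCMC construction described above concrete, here is a minimal random-walk Metropolis sampler in Python. This is an editorial sketch, not material from the book; the Gaussian target, step size, and run length are illustrative assumptions. Note how the acceptance ratio uses only the unnormalized target, so the intractable normalization constant is never computed:

```python
import numpy as np

def metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: a Markov chain whose unique stationary
    distribution is the target, known only up to a constant."""
    rng = np.random.default_rng(seed)
    x, chain = x0, np.empty(n_steps)
    for k in range(n_steps):
        y = x + step * rng.standard_normal()   # symmetric proposal
        # Accept with probability min(1, target(y)/target(x)); the
        # unknown normalization constant cancels in this ratio.
        if np.log(rng.uniform()) < log_target(y) - log_target(x):
            x = y
        chain[k] = x
    return chain

# Unnormalized standard Gaussian target, started far from equilibrium.
samples = metropolis(lambda x: -0.5 * x**2, x0=10.0, n_steps=20_000)
print(samples[5000:].mean(), samples[5000:].std())  # ≈ 0 and ≈ 1 after burn-in
```

The deliberately atypical initialization at x0 = 10.0 mirrors the situation described above: how quickly such a chain forgets its starting point is precisely the convergence question that the tools discussed in this book address.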
Even the appendices are testimony to the hard work and exacting standards the
authors brought to this project. Through additional (and very useful) discussion, these
appendices provide readers with an opportunity to see the power of the concepts of
stability and recurrence being exercised in the setting of models that are both mathe-
matically interesting and of importance in their own right. In fact, some readers will
find that the appendices are a good way to quickly remind themselves of the methods
that exist to establish a particular desired property of a Markov chain model.
This second edition remains true to the remarkable standards of scholarship estab-
lished by the first edition. As noted above, a number of application domains that
are consumers of this theory have developed rapidly since the publication of the first
edition. As one would expect with any mathematically vibrant area, there have also
been important theoretical developments over that span of time, ranging from the ex-
ploration of these ideas in studying large deviations for additive functionals of Markov
chains to the generalization of these concepts to the setting of continuous time Markov
processes. This new edition does a splendid job of making clear the most important
such developments and pointing the reader in the direction of the key references to be
studied in each area. With the background offered by this book, the reader who wishes
to explore these recent theoretical developments is well positioned both to read the
literature and to creatively apply these ideas to the problem at hand. All the elements
that made the first edition of Markov Chains and Stochastic Stability a classic are here
in the second edition, and it will no doubt be a very welcome addition to the literature.
Peter W. Glynn
Palo Alto
Preface to the second edition
Statistics <www.imstat.org/awards/tweedie.html>.
would support. Ideally I would rewrite Chapters 15 and 16 to provide a more cohesive
treatment of geometric ergodicity, and explain how these ideas lead to foundations for
multiplicative ergodic theory, Lyapunov exponents, and the theory of large deviations.
This will have to wait for a third edition or a new book devoted to these topics. In its
place, I have provided in Section 20.1 a brief survey of these directions of research.
Section 20.2: Simulation and MCMC Richard Tweedie and I became interested
in these topics soon after the first edition went to print. Section 20.2 describes applica-
tions of general state space Markov chain techniques to the construction and analysis
of simulation algorithms, such as the control variate method [10], and algorithms found
in reinforcement learning [29, 379].
Section 20.3: Continuous time models The final section explains how theory
in continuous time can be generated from discrete time counterparts developed in this
book. In particular, all of the ergodic theorems in Part III have precise analogues in
continuous time.
The significance of Poisson’s equation was not properly highlighted in the first edi-
tion. This is rectified in a detailed commentary at the close of Chapter 17, which
includes a menu of applications, and new results on existence and uniqueness of solu-
tions to Poisson’s equation, contained in Theorems 17.7.1 and 17.7.2, respectively.
The multi-step drift criterion for stability described in Section 19.1 has been im-
proved, and this technique has found many applications. The resulting “fluid model”
approach to stability of stochastic networks is one theme of the new monograph [267].
Extensions of the techniques in Section 19.1 have found application to the theory of
stochastic approximation [40, 39], and to Markov chain Monte Carlo (MCMC) [100].
It is surprising how few errors have been uncovered since the first edition went to
print. Section 2.2.3 on the gumleaf attractor contained errors in the description of the
figures. There were other minor errors in the analysis of the forward recurrence time
chains in Section 10.3.1, and the coupling bound in Theorem 16.2.4. The term limiting
variance is now replaced by the more familiar asymptotic variance in Chapter 17, and
starting in Chapter 9 the term norm-like is replaced with the more familiar coercive.
Words of thanks
Continued support from the National Science Foundation is gratefully acknowledged.
Over the past decade, support from Control, Networks and Computational Intelligence
has funded much of the theory and applications surveyed in Chapter 20 under grants
ECS 940372, ECS 9972957, ECS 0217836, and ECS 0523620. The NSF grant DMI
0085165 supported research with Shane Henderson that is surveyed in Section 20.2.1.
It is a pleasure to convey my thanks to my wonderful editor Diana Gillooly. It was
her idea to place the book in the Cambridge Mathematical Library series. In addition
to her work “behind the scenes” at Cambridge University Press, Diana dissected the
manuscript searching for typos or inconsistencies in notation. She provided valuable
advice on structure, and patiently answered all of my questions.
Jeffrey Rosenthal has maintained the website for the online version of the first edition
at probability.ca/MT. It is reassuring to know that this resource will remain in place
“till death do us part.”
In the preface to the first edition, we expressed our thanks to Peter Glynn for
his correspondence and inspiration. I am very grateful that our correspondence has
continued over the past 15 years. Much of the material contained in the surveys in the
new Chapter 20 can be regarded as part of “transcripts” from our many discussions
since the book was first put into print.
I am very grateful to Ioannis Kontoyiannis for collaborations over the past decade.
Ioannis provided comments on the new edition, including the discovery of an error in
Theorem 16.2.4. Many have sent comments over the years. In particular, Vivek Borkar,
Jan van Casteren, Peter Haas, Lars Hansen, Galin Jones, Aziz Khanchi, Tze Lai, Zhan-
Qian Lu, Abdelkader Mokkadem, Eric Moulines, Gareth Roberts, Li-Ming Wu, and
three graduates from the University of Oslo – Tore W. Larsen, Arvid Raknerud, and
Øivind Skare – all pointed out errors that have been corrected in the new edition, or
suggested recent references that are now included in the updated bibliography.
Sean Meyn
Urbana-Champaign
Preface to the first edition
(1993)
Books are individual and idiosyncratic. In trying to understand what makes a good
book, there is a limited amount that one can learn from other books; but at least one
can read their prefaces, in hope of help.
Our own research shows that authors use prefaces for many different reasons.
Prefaces can be explanations of the role and the contents of the book, as in Chung
[71] or Revuz [326] or Nummelin [303]; this can be combined with what is almost an
apology for bothering the reader, as in Billingsley [37] or Çinlar [59]; prefaces can
describe the mathematics, as in Orey [309], or the importance of the applications, as
in Tong [388] or Asmussen [9], or the way in which the book works as a text, as in
Brockwell and Davis [51] or Revuz [326]; they can be the only available outlet for
thanking those who made the task of writing possible, as in almost all of the above
(although we particularly like the familial gratitude of Resnick [325] and the dedication
of Simmons [355]); they can combine all these roles, and many more.
This preface is no different. Let us begin with those we hope will use the book.
coupling methods, allows simple renewal approaches to almost all of the hard results.
Courses on countable space Markov chains abound, not only in statistics and math-
ematics departments, but in engineering schools, operations research groups and even
business schools. This book can serve as the text in most of these environments for a
one-semester course on more general space applied Markov chain theory, provided that
some of the deeper limit results are omitted and (in the interests of a fourteen-week
semester) the class is directed only to a subset of the examples, concentrating as best
suits their discipline on time series analysis, control and systems models or operations
research models.
The prerequisite texts for such a course are certainly at no deeper level than Chung
[72], Breiman [48], or Billingsley [37] for measure theory and stochastic processes, and
Simmons [355] or Rudin [345] for topology and analysis.
Be warned: we have not provided numerous illustrative unworked examples for the
student to cut teeth on. But we have developed a rather large number of thoroughly
worked examples, ensuring applications are well understood; and the literature is lit-
tered with variations for teaching purposes, many of which we reference explicitly.
This regular interplay between theory and detailed consideration of application to
specific models is one thread that guides the development of this book, as it guides the
rapidly growing usage of Markov models on general spaces by many practitioners.
The second group of readers we envisage consists of exactly those practitioners, in
several disparate areas, for all of whom we have tried to provide a set of research and
development tools: for engineers in control theory, through a discussion of linear and
nonlinear state space systems; for statisticians and probabilists in the related areas of
time series analysis; for researchers in systems analysis, through networking models for
which these techniques are becoming increasingly fruitful; and for applied probabilists,
interested in queueing and storage models and related analyses.
We have tried from the beginning to convey the applied value of the theory rather
than let it develop in a vacuum. The practitioner will find detailed examples of tran-
sition probabilities for real models. These models are classified systematically into the
various structural classes as we define them. The impact of the theory on the models
is developed in detail, not just to give examples of that theory but because the mod-
els themselves are important and there are relatively few places outside the research
journals where their analysis is collected.
Of course, there is only so much that a general theory of Markov chains can provide
to all of these areas. The contribution is in general qualitative, not quantitative. And
in our experience, the critical qualitative aspects are those of stability of the models.
Classification of a model as stable in some sense is the first fundamental operation un-
derlying other, more model-specific, analyses. It is, we think, astonishing how powerful
and accurate such a classification can become when using only the apparently blunt
instruments of a general Markovian theory: we hope the strength of the results de-
scribed here is equally visible to the reader as to the authors, for this is why we have
chosen stability analysis as the cord binding together the theory and the applications
of Markov chains.
We have adopted two novel approaches in writing this book. The reader will find
key theorems announced at the beginning of all but the discursive chapters; if these
are understood then the more detailed theory in the body of the chapter will be better
motivated, and applications made more straightforward. And at the end of the book we
have constructed, at the risk of repetition, “mud maps” showing the crucial equivalences
between forms of stability, and we give a glossary of the models we evaluate. We trust
both of these innovations will help to make the material accessible to the full range of
readers we have considered.
of continuous time processes, but the remark is equally apt for discrete time models of
the period. We hope that it will be apparent in this book that the general space theory
has not only caught up with its countable counterpart in the areas we describe, but has
indeed added considerably to the ways in which the simpler systems are approached.
There are several themes in this book which instance both the maturity and the
novelty of the general space model, and which we feel deserve mention, even in the
restricted level of technicality available in a preface. These are, specifically,
(i) the use of the splitting technique, which provides an approach to general state
space chains through regeneration methods;
(ii) the use of “Foster–Lyapunov” drift criteria, both in improving the theory and in
enabling the classification of individual chains;
(iii) the delineation of appropriate continuity conditions to link the general theory with
the properties of chains on, in particular, Euclidean space; and
(iv) the development of control model approaches, enabling analysis of models from
their deterministic counterparts.
These are not distinct themes: they interweave to a surprising extent in the mathematics
and its implementation.
The key factor is undoubtedly the existence and consequences of the Nummelin
splitting technique of Chapter 5, whereby it is shown that if a chain {Φn } on a quite
general space satisfies the simple “ϕ-irreducibility” condition (which requires that for
some measure ϕ, there is at least positive probability from any initial point x that one
of the Φn lies in any set of positive ϕ-measure; see Chapter 4), then one can induce an
artificial “regeneration time” in the chain, allowing all of the mechanisms of discrete
time renewal theory to be brought to bear.
Part I is largely devoted to developing this theme and related concepts, and their
practical implementation.
The splitting method enables essentially all of the results known for countable space
to be replicated for general spaces. Although that by itself is a major achievement,
it also has the side benefit that it forces concentration on the aspects of the theory
that depend, not on a countable space which gives regeneration at every step, but on
a single regeneration point. Part II develops the use of the splitting method, amongst
other approaches, in providing a full analogue of the positive recurrence/null recur-
rence/transience trichotomy central in the exposition of countable space chains, together
with consequences of this trichotomy.
In developing such structures, the theory of general space chains has merely caught
up with its denumerable progenitor. Somewhat surprisingly, in considering asymptotic
results for positive recurrent chains, as we do in Part III, the concentration on a sin-
gle regenerative state leads to stronger ergodic theorems (in terms of total variation
convergence), better rates of convergence results, and a more uniform set of equivalent
conditions for the strong stability regime known as positive recurrence than is typically
realised for countable space chains.
The outcomes of this splitting technique approach are possibly best exemplified in
the case of so-called “geometrically ergodic” chains.
Let τ_C be the hitting time on any set C: that is, the first time that the chain Φ_n returns to C; and let P^n(x, A) = P(Φ_n ∈ A | Φ_0 = x) denote the probability that the chain is in a set A at time n given it starts at time zero in state x, or the “n-step transition probabilities”, of the chain. One of the goals of Part II and Part III is to link conditions under which the chain returns quickly to “small” sets C (such as finite or compact sets), measured in terms of moments of τ_C , with conditions under which the probabilities P^n(x, A) converge to limiting distributions.
Here is a taste of what can be achieved. We will eventually show, in Chapter 15,
the following elegant result:
The following conditions are all equivalent for a ϕ-irreducible “aperiodic” (see Chap-
ter 5) chain:
(A) For some one “small” set C, the return time distributions have geometric tails; that is, for some r > 1,
sup_{x∈C} E_x[r^{τ_C}] < ∞.
(B) For some one “small” set C, the transition probabilities converge geometrically quickly; that is, for some M < ∞, P^∞(C) > 0 and ρ_C < 1,
sup_{x∈C} |P^n(x, C) − P^∞(C)| ≤ M ρ_C^n .
(C) For some one “small” set C, there is “geometric drift” towards C; that is, for some function V ≥ 1 and some β > 0,
∫ P(x, dy)V(y) ≤ (1 − β)V(x) + I_C(x).
Each of these implies that there is a limiting probability measure π, a constant R < ∞
and some uniform rate ρ < 1 such that
sup_{|f|≤V} | ∫ P^n(x, dy)f(y) − ∫ π(dy)f(y) | ≤ R V(x) ρ^n .
Who do we owe?
Like most authors we owe our debts, professional and personal. A preface is a good
place to acknowledge them.
The alphabetically and chronologically younger author began studying Markov
chains at McGill University in Montréal. John Taylor introduced him to the beauty
of probability. The excellent teaching of Michael Kaplan provided a first contact with
Markov chains and a unique perspective on the structure of stochastic models.
He is especially happy to have the chance to thank Peter Caines for planting him in
one of the most fantastic cities in North America, and for the friendship and academic
environment that he subsequently provided.
In applying these results, very considerable input and insight has been provided by
Lei Guo of Academia Sinica in Beijing and Doug Down of the University of Illinois.
Some of the material on control theory and on queues in particular owes much to their
collaboration in the original derivations.
He is now especially fortunate to work in close proximity to P.R. Kumar, who has
been a consistent inspiration, particularly through his work on queueing networks and
adaptive control. Others who have helped him, by corresponding on current research, by
sharing enlightenment about a new application, or by developing new theoretical ideas,
include Venkat Anantharam, A. Ganesh, Peter Glynn, Wolfgang Kliemann, Laurent
Praly, John Sadowsky, Karl Sigman, and Victor Solo.
The alphabetically later and older author has a correspondingly longer list of in-
fluences who have led to his abiding interest in this subject. Five stand out: Chip
Heathcote and Eugene Seneta at the Australian National University, who first taught
the enjoyment of Markov chains; David Kendall at Cambridge, whose own fundamental
work exemplifies the power, the beauty and the need to seek the underlying simplicity of
such processes; Joe Gani, whose unflagging enthusiasm and support for the interaction
of real theory and real problems has been an example for many years; and probably
most significantly for the developments in this book, David Vere-Jones, who has shown
an uncanny knack for asking exactly the right questions at times when just enough was
known to be able to develop answers to them.
It was also a pleasure and a piece of good fortune for him to work with the Finnish
school of Esa Nummelin, Pekka Tuominen and Elja Arjas just as the splitting technique
was uncovered, and a large amount of the material in this book can actually be traced
to the month surrounding the First Tuusula Summer School in 1976. Applying the
methods over the years with David Pollard, Paul Feigin, Sid Resnick and Peter Brock-
well has also been both illuminating and enjoyable; whilst the ongoing stimulation and
encouragement to look at new areas given by Wojtek Szpankowski, Floske Spieksma,
Chris Adam and Kerrie Mengersen has been invaluable in maintaining enthusiasm and
energy in finishing this book.
By sheer coincidence both of us have held Postdoctoral Fellowships at the Australian
National University, albeit at somewhat different times. Both of us started much of our
own work in this field under that system, and we gratefully acknowledge those most
useful positions, even now that they are long past.
More recently, the support of our institutions has been invaluable. Bond University
facilitated our embryonic work together, whilst the Coordinated Sciences Laboratory of
the University of Illinois and the Department of Statistics at Colorado State University
have been enjoyable environments in which to do the actual writing.
Support from the National Science Foundation is gratefully acknowledged: grants
ECS 8910088 and DMS 9205687 enabled us to meet regularly, helped to fund our
students in related research, and partially supported the completion of the book.
Writing a book from multiple locations involves multiple meetings at every available
opportunity. We appreciated the support of Peter Caines in Montréal, Bozenna and
Tyrone Duncan at the University of Kansas, Will Gersch in Hawaii, Götz Kersting and
Heinrich Hering in Germany, for assisting in our meeting regularly and helping with
far-flung facilities.
Peter Brockwell, Kung-Sik Chan, Richard Davis, Doug Down, Kerrie Mengersen,
Rayadurgam Ravikanth, and Pekka Tuominen, and most significantly Vladimir Kalash-
nikov and Floske Spieksma, read fragments or reams of manuscript as we produced
them, and we gratefully acknowledge their advice, comments, corrections and encour-
agement. It is traditional, and in this case as accurate as usual, to say that any remain-
ing infelicities are there despite their best efforts.
Rayadurgam Ravikanth produced the sample path graphs for us; Bob MacFarlane
drew the remaining illustrations; and Francie Bridges produced much of the bibliogra-
phy and some of the text. The vast bulk of the material we have done ourselves: our
debt to Donald Knuth and the developers of LATEX is clear and immense, as is our debt
to Deepa Ramaswamy, Molly Shor, Rich Sutton and all those others who have kept
software, email and remote telematic facilities running smoothly.
Lastly, we are grateful to Brad Dickinson and Eduardo Sontag, and to Zvi Ruder
and Nicholas Pinfield and the Engineering and Control Series staff at Springer, for their
patience, encouragement and help.
And finally . . .
And finally, like all authors whether they say so in the preface or not, we have received
support beyond the call of duty from our families. Writing a book of this magnitude has
taken much time that should have been spent with them, and they have been unfailingly
supportive of the enterprise, and remarkably patient and tolerant in the face of our quite
unreasonable exclusion of other interests.
They have lived with family holidays where we scribbled proto-books in restaurants
and tripped over deer whilst discussing Doeblin decompositions; they have endured
sundry absences and visitations, with no idea of which was worse; they have seen come
and go a series of deadlines with all of the structure of a renewal process.
They are delighted that we are finished, although we feel they have not yet adjusted
to the fact that a similar development of the continuous time theory clearly needs to
be written next.
So to Belinda, Sydney and Sophie; to Catherine and Marianne: with thanks for the
patience, support and understanding, this book is dedicated to you.
Part I
COMMUNICATION
and
REGENERATION
Chapter 1
Heuristics
This book is about Markovian models, and particularly about the structure and stability
of such models. We develop a theoretical basis by studying Markov chains in very
general contexts; and we develop, as systematically as we can, the applications of this
theory to applied models in systems engineering, in operations research, and in time
series.
A Markov chain is, for us, a collection of random variables Φ = {Φn : n ∈ T }, where
T is a countable time set. It is customary to write T as Z+ := {0, 1, . . .}, and we will
do this henceforth.
Heuristically, the critical aspect of a Markov model, as opposed to any other set of
random variables, is that it is forgetful of all but its most immediate past. The precise
meaning of this requirement for the evolution of a Markov model in time, that the
future of the process is independent of the past given only its present value, and the
construction of such a model in a rigorous way, is taken up in Chapter 3. Until then it
is enough to indicate that for a process Φ, evolving on a space X and governed by an
overall probability law P, to be a time-homogeneous Markov chain, there must be a set
of “transition probabilities” {P^n(x, A), x ∈ X, A ⊂ X} for appropriate sets A such that for times n, m in Z+
P(Φ_{n+m} ∈ A | Φ_j , j ≤ m; Φ_m = x) = P^n(x, A); (1.1)
that is, P^n(x, A) denotes the probability that a chain at x will be in the set A after n steps, or transitions. The independence of P^n of the values of Φ_j , j ≤ m, is the Markov property, and the independence of P^n of m is the time-homogeneity property.
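On a finite state space this definition is concrete: the n-step transition probabilities are the entries of the n-th power of the one-step transition matrix. A minimal sketch in Python, using an assumed three-state toy chain (not a model from the text):

```python
import numpy as np

# One-step transition matrix P; row x is the distribution of Phi_{m+1}
# given Phi_m = x, so each row sums to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.1, 0.9]])

Pn = np.linalg.matrix_power(P, 100)   # n-step probabilities P^n(x, A)
print(Pn[0])   # distribution of Phi_100 when Phi_0 = 0
print(Pn[2])   # nearly identical row: the chain forgets its starting point
```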
We now show that systems which are amenable to modeling by discrete time Markov
chains with this structure occur frequently, especially if we take the state space of the
process to be rather general, since then we can allow auxiliary information on the past
to be incorporated to ensure the Markov property is appropriate.
(a) The cruise control system on a modern motor vehicle monitors, at each time point
k, a vector {X_k} of inputs: speed, fuel flow, and the like (see Kuo [230]). It calculates a control value U_k which adjusts the throttle, causing a change in the values of the environmental variables X_{k+1}, which in turn causes U_{k+1} to change again. The multidimensional process Φ_k = {X_k, U_k} is often a Markov chain (see
Section 2.3.2), with new values overriding those of the past, and with the next
value governed by the present value. All of this is subject to measurement error,
and the process can never be other than stochastic: stability for this chain consists
in ensuring that the environmental variables do not deviate too far, within the
limits imposed by randomness, from the pre-set goals of the control algorithm.
(b) A queue at an airport evolves through the random arrival of customers and the
service times they bring. The numbers in the queue, and the time the customer
has to wait, are critical parameters for customer satisfaction, for waiting room
design, for counter staffing (see Asmussen [9]). Under appropriate conditions (see
Section 2.4.2), variables observed at arrival times (either the queue numbers, or
a combination of such numbers and aspects of the remaining or currently uncom-
pleted service times) can be represented as a Markov chain, and the question of
stability is central to ensuring that the queue remains at a viable level. Techniques
arising from the analysis of such models have led to the now familiar single-line
multi-server counters actually used in airports, banks and similar facilities, rather
than the previous multi-line systems.
(c) The exchange rate X_n between two currencies can be and is represented as a function of its past several values X_{n−1}, . . . , X_{n−k}, modified by the volatility of the market which is incorporated as a disturbance term W_n (see Krugman and Miller [222] for models of such fluctuations). The autoregressive model
X_n = ∑_{j=1}^{k} α_j X_{n−j} + W_n
central in time series analysis (see Section 2.1) captures the essential concept of
such a system. By considering the whole k-length vector Φ_n = (X_n, . . . , X_{n−k+1}),
Markovian methods can be brought to the analysis of such time-series models.
Stability here involves relatively small fluctuations around a norm; and as we will
see, if we do not have such stability, then typically we will have instability of the
grossest kind, with the exchange rate heading to infinity.
(d) Storage models are fundamental in engineering, insurance and business. In en-
gineering one considers a dam, with input of random amounts at random times,
and a steady withdrawal of water for irrigation or power usage. This model has a
Markovian representation (see Section 2.4.3 and Section 2.4.4). In insurance, there
is a steady inflow of premiums, and random outputs of claims at random times.
This model is also a storage process, but with the input and output reversed when
compared to the engineering version, and also has a Markovian representation (see
Asmussen [9]). In business, the inventory of a firm will act in a manner between
these two models, with regular but sometimes also large irregular withdrawals,
state space, and (as always) a choice of appropriate assumptions, the methods we
give in this book become tools to analyze the stability of the system.
Simple spaces do not describe these systems in general. Integer or real-valued models
are sufficient only to analyze the simplest models in almost all of these contexts.
The methods and descriptions in this book are for chains which take their values
in a virtually arbitrary space X. We do not restrict ourselves to countable spaces, nor
even to Euclidean space Rn , although we do give specific formulations of much of our
theory in both these special cases, to aid both understanding and application.
One of the key factors that allows this generality is that, for the models we consider,
there is no great loss of power in going from a simple to a quite general space. The
reader interested in any of the areas of application above should therefore find that
the structural and stability results for general Markov chains are potentially tools of
great value, no matter what the situation, no matter how simple or complex the model
considered.
Φ_n = {Y_n, . . . , Y_{n−k}}
and setting Φ = {Φ_n, n ≥ 0} (taking obvious care in defining {Φ_0, . . . , Φ_{k−1}}), we can define from Y a Markov chain Φ. The motion in the first coordinate of Φ reflects that of Y , and in the other coordinates is trivial to identify, since Y_n becomes Y_{(n+1)−1}, and so forth; and hence Y can be analyzed by Markov chain methods.
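A sketch of this construction in Python, with assumed coefficients: a scalar process Y whose next value depends on its two previous values is not Markovian on its own, but the stacked vector Φ_n = (Y_n, Y_{n−1}) evolves by a first-order, and hence Markovian, recursion:

```python
import numpy as np

a1, a2 = 0.5, 0.3           # assumed dependence of Y_n on (Y_{n-1}, Y_{n-2})
F = np.array([[a1, a2],     # companion matrix: first row applies the
              [1.0, 0.0]])  # recursion, second row shifts Y_n into memory
G = np.array([1.0, 0.0])

rng = np.random.default_rng(0)
phi = np.zeros(2)           # Phi_0 = (Y_0, Y_{-1})
Y = []
for n in range(1000):
    phi = F @ phi + G * rng.standard_normal()  # Phi_{n+1} = F Phi_n + G W_{n+1}
    Y.append(phi[0])        # the first coordinate recovers Y_n
```

The same companion-matrix device turns the autoregressive model above into the linear state space form defined below.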
Such state space representations, despite their somewhat artificial nature in some
cases, are an increasingly important tool in deterministic and stochastic systems theory,
and in linear and nonlinear time series analysis.
As the second paradigm for constructing a Markov model representing a non-
Markovian system, we look for so-called embedded regeneration points. These are times
at which the system forgets its past in a probabilistic sense: the system viewed at such
time points is Markovian even if the overall process is not.
Consider as one such model a storage system, or dam, which fills and empties. This
is rarely Markovian: for instance, knowledge of the time since the last input, or the size
of previous inputs still being drawn down, will give information on the current level of
the dam or even the time to the next input. But at that very special sequence of times
when the dam is empty and an input actually occurs, the process may well “forget
the past”, or “regenerate”: appropriate conditions for this are that the times between
inputs and the size of each input are independent. For then one cannot forecast the
time to the next input when at an input time, and the current emptiness of the dam
means that there is no information about past input levels available at such times. The
dam content, viewed at these special times, can then be analyzed as a Markov chain.
“Regenerative models” for which such “embedded Markov chains” occur are common
in operations research, and in particular in the analysis of queueing and network models.
State space models and regeneration time representations have become increasingly
important in the literature of time series, signal processing, control theory, and opera-
tions research, and not least because of the possibility they provide for analysis through
the tools of Markov chain theory. In the remainder of this opening chapter, we will in-
troduce a number of these models in their simplest form, in order to provide a concrete
basis for further development.
x_{k+1} = F x_k (1.3)
where F is an n × n matrix.
Figure 1.1: At left is a sample path generated by the deterministic linear model on R2 .
At right is a sample path from the linear state space model on R2 with Gaussian noise.
X_{k+1} = F X_k + G W_{k+1}
where X_0 is arbitrary;
(LSS2) the random variables {W_k} are independent and identically distributed (i.i.d.), and are independent of X_0 , with common distribution Γ(A) = P(W_j ∈ A) having finite mean and variance.
Then X is called the linear state space model driven by F, G, or the
LSS(F ,G) model, with associated control model LCM(F ,G).
Such linear models with random “noise” or “innovation” are related to both the
simple deterministic model (1.3) and also the linear control model (1.4).
There are obviously two components to the evolution of a state space model. The
matrix F controls the motion in one way, but its action is modulated by the regular
input of random fluctuations which involve both the underlying variable with distribu-
tion Γ, and its adjustment through G. At right in Figure 1.1 we show a sample path corresponding to the same matrix F , with G = (2.5, 2.5)⊤, and with Γ taken as a bivariate Normal,
or Gaussian, distribution N (0, 1). This indicates that the addition of the noise variables
W can lead to types of behavior very different to that of the deterministic model, even
with the same choice of the function F .
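The contrast shown in Figure 1.1 is easy to reproduce numerically. In this editorial sketch the book's specific matrix F (defined in a portion of the chapter not reproduced here) is replaced by an assumed stable rotation-like matrix:

```python
import numpy as np

F = np.array([[0.8, -0.5],     # assumed stable matrix: eigenvalues have
              [0.5,  0.8]])    # modulus sqrt(0.89) < 1, giving a spiral
G = np.array([2.5, 2.5])
rng = np.random.default_rng(1)

x = np.array([10.0, 10.0])     # deterministic model: x_{k+1} = F x_k
X = x.copy()                   # stochastic model: X_{k+1} = F X_k + G W_{k+1}
det_path, sto_path = [x.copy()], [X.copy()]
for k in range(200):
    x = F @ x
    X = F @ X + G * rng.standard_normal()   # W_{k+1} ~ N(0, 1)
    det_path.append(x.copy())
    sto_path.append(X.copy())
# The deterministic path spirals into the origin; the noisy path keeps
# fluctuating around it, as in the two panels of Figure 1.1.
```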
Such models describe the movements of airplanes, of industrial and engineering
equipment, and even (somewhat idealistically) of economies and financial systems [3,
57]. Stability in these contexts is then understood in terms of return to level flight, or
small and (in practical terms) insignificant deviations from set engineering standards,
or minor inflation or exchange-rate variation. Because of the random nature of the noise
we cannot expect totally unvarying systems; what we seek to preclude are explosive or
wildly fluctuating operations.
We will see that, in wide generality, if the linear control model LCM(F ,G) is stable in
a deterministic way, and if we have a “reasonable” distribution Γ for our random control
sequences, then the linear state space LSS(F ,G) model is also stable in a stochastic
sense.
In Chapter 2 we will describe models which build substantially on these simple
structures, and which illustrate the development of Markovian structures for linear and
nonlinear state space model theory.
We now leave state space models, and turn to the simplest examples of another class
of models, which may be thought of collectively as models with a regenerative structure.
Φ_{k+1} = Φ_k + W_{k+1} . (1.5)
It is obvious that Φ = {Φ_k : k ∈ Z+} is a Markov chain, taking values in the real line R = (−∞, ∞); the independence of the {W_k} guarantees the Markovian nature of the chain Φ.
In this context, stability (as far as the gambling house is concerned) requires that
Φ eventually reaches (−∞, 0]; a greater degree of stability is achieved from the same
perspective if the time to reach (−∞, 0] has finite mean. Inevitably, of course, this
stability is also the gambler’s ruin.
Such a chain, defined by taking successive sums of i.i.d. random variables, provides
a model for very many different systems, and is known as random walk.
Figure 1.2: Random walk sample paths from three different models. The increment distribution is Γ = N(0, 1) for the path shown at top, Γ = N(−0.2, 1) for the path shown on the lower left, and Γ = N(+0.2, 1) for the path shown on the lower right.
Random walk
Suppose that Φ = {Φ_k ; k ∈ Z+} is a collection of random variables defined by choosing an arbitrary distribution for Φ_0 and setting for k ∈ Z+
(RW1)
Φ_{k+1} = Φ_k + W_{k+1}
where the W_k are i.i.d. random variables taking values in R with common distribution Γ(−∞, y] = P(W ≤ y).
In Figure 1.2 we give sets of three sample paths of random walks with different
distributions for Γ: all start at the same value but we choose for the winnings on each
game
(i) W having a Gaussian N(0, 1) distribution, so the game is fair;
(ii) W having a Gaussian N(−0.2, 1) distribution, so the game is not fair, with the
house winning one unit on average each five plays;
(iii) W having a Gaussian N(0.2, 1) distribution, so the game modeled is, perhaps,
one of “skill” where the player actually wins on average one unit per five games
against the house.
The sample paths clearly indicate that ruin is rather more likely under case (ii) than
under case (iii) or case (i): but when is ruin certain? And how long does it take if it is
certain?
These are questions involving the stability of the random walk model, or at least
that modification of the random walk which we now define.
This chain, defined by the recursion Φ_{k+1} = [Φ_k + W_{k+1}]^+ := max(0, Φ_k + W_{k+1}), follows the paths of a random walk, but is held at zero when the underlying random walk becomes non-positive, leaving zero again only when the next positive value occurs in the sequence {W_k}.
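Both the unrestricted walk of Figure 1.2 and this reflected version take only a few lines of Python; the following editorial sketch uses the increment laws from the figures:

```python
import numpy as np

def reflected_walk(mu, n_steps, seed):
    """Phi_{k+1} = max(0, Phi_k + W_{k+1}) with W ~ N(mu, 1) i.i.d."""
    rng = np.random.default_rng(seed)
    W = mu + rng.standard_normal(n_steps)
    phi, path = 0.0, np.empty(n_steps)
    for k in range(n_steps):
        phi = max(0.0, phi + W[k])
        path[k] = phi
    return path

losing = reflected_walk(-0.2, 1000, seed=2)   # house wins on average
winning = reflected_walk(+0.2, 1000, seed=2)  # player wins on average
print((losing == 0.0).mean(), (winning == 0.0).mean())
# The first path returns to 0 repeatedly; the second tends to drift away.
# (The unreflected walk of Figure 1.2 is simply np.cumsum of the increments.)
```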
In Figure 1.3 we again give sets of sample paths of random walks on the half line
[0, ∞), corresponding to those of the unrestricted random walk in the previous section.
The difference in the proportion of paths which hit, or return to, the state {0} is again
clear.
Figure 1.3: Random walk paths reflected at zero. The increment distribution is Γ = N(−0.2, 1) for the plot shown on the left, and Γ = N(+0.2, 1) for the plot shown on the right.

We shall see in Chapter 2 that random walk on a half line is both a model for storage systems and a model for queueing systems. For all such applications there are similar
concerns and concepts of the structure and the stability of the models: we need to know
whether a dam overflows, whether a queue ever empties, whether a computer network
jams. In the next section we give a first heuristic description of the ways in which such
stability questions might be formalized.
1.3 Stochastic stability for Markov models

Some of the concepts of stability we use are taken from the classical Markov chain literature, and some we take from dynamical or stochastic systems theory, which
is concerned with precisely these same questions under rather different conditions on
the model structures.
(I) ϕ-irreducibility for a general space chain, which we approach by requiring that
the space supports a measure ϕ with the property that for every starting point
x ∈ X
ϕ(A) > 0 ⇒ P_x(τ_A < ∞) > 0
where P_x denotes the probability of events conditional on the chain beginning with
Φ0 = x.
This condition ensures that all “reasonable sized” sets, as measured by ϕ, can be
reached from every possible starting point.
For a countable space chain ϕ-irreducibility is just the concept of irreducibility
commonly used [59, 71], with ϕ taken as counting measure.
For a state space model ϕ-irreducibility is related to the idea that we are able to
“steer” the system to every other state in R^n. The linear control LCM(F ,G) model is called controllable if for any initial state x_0 and any other x ∈ X, there exists m ∈ Z+ and a sequence of control variables u*_1 , . . . , u*_m ∈ R^p such that x_m = x when (u_1 , . . . , u_m ) = (u*_1 , . . . , u*_m ). If this does not hold then for some starting points we
are in one part of the space forever; from others we are in another part of the space.
Controllability, and analogously irreducibility, preclude this.
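For the linear model, controllability can be checked directly via the standard Kalman rank condition: LCM(F ,G) is controllable exactly when the matrix [G | F G | · · · | F^{n−1}G] has full rank n. A short sketch (the matrices below are illustrative assumptions):

```python
import numpy as np

def is_controllable(F, G):
    """Kalman rank test for LCM(F,G): full rank of [G, FG, ..., F^{n-1} G]."""
    n = F.shape[0]
    G = G.reshape(n, -1)
    ctrb = np.hstack([np.linalg.matrix_power(F, i) @ G for i in range(n)])
    return np.linalg.matrix_rank(ctrb) == n

F = np.array([[0.8, -0.5],
              [0.5,  0.8]])
print(is_controllable(F, np.array([2.5, 2.5])))        # True
print(is_controllable(np.eye(2), np.array([1.0, 0.0])))
# False: with F = I the second coordinate can never be steered anywhere.
```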
Thus under irreducibility we do not have systems so unstable in their starting po-
sition that, given a small change of initial position, they might change so dramatically
that they have no possibility of reaching the same set of states.
A study of the wide-ranging consequences of such an assumption of irreducibility
will occupy much of Part I of this book: the definition above will be shown to produce
remarkable solidity of behavior.
The next level of stability is a requirement, not only that there should be a possibility
of reaching like states from unlike starting points, but that reaching such sets of states
should be guaranteed eventually. This leads us to define and study concepts of
(II) recurrence, for which we might ask as a first step that there is a measure ϕ guaranteeing that for every starting point x ∈ X
ϕ(A) > 0 ⇒ P_x(τ_A < ∞) = 1, (1.8)
or, more strongly, that there is a measure ϕ guaranteeing that for every starting point x ∈ X
ϕ(A) > 0 ⇒ E_x[τ_A] < ∞. (1.9)
These conditions ensure that reasonable sized sets are reached with probability one, as
in (1.8), or even in a finite mean time as in (1.9). Part II of this book is devoted to
the study of such ideas, and to showing that for irreducible chains, even on a general
state space, there are solidarity results which show that either such uniform (in x)
stability properties hold, or the chain is unstable in a well-defined way: there is no
middle ground, no “partially stable” behavior available.
For deterministic models, the recurrence concepts in (II) are obviously the same.
For stochastic models they are definitely different. For “suitable” chains on spaces with
appropriate topologies (the T-chains introduced in Chapter 6), the first will turn out
to be entirely equivalent to requiring that “evanescence”, defined by
{Φ → ∞} = ⋂_{n=0}^{∞} {Φ ∈ O_n infinitely often}^c (1.10)
for a countable collection of open pre-compact sets {O_n}, has zero probability for all
starting points; the second is similarly equivalent, for the same “suitable” chains, to
requiring that for any ε > 0 and any x there is a compact set C such that
lim inf_{k→∞} P^k(x, C) ≥ 1 − ε. (1.11)
(III) the limiting, or ergodic, behavior of the chain: and it emerges that in the stronger
recurrent situation described by (1.9) there is an “invariant regime” described
by a measure π such that if the chain starts in this regime (that is, if Φ0 has
distribution π) then it remains in the regime, and moreover if the chain starts in
some other regime then it converges in a strong probabilistic sense with π as a
limiting distribution.
In Part III we largely confine ourselves to such ergodic chains, and find both theoretical
and pragmatic results ensuring that a given chain is at this level of stability. For whilst
the construction of solidarity results, as in Parts I and II, provides a vital underpinning
to the use of Markov chain theory, it is the consequences of that stability, in the form
of powerful ergodic results, that makes the concepts of very much more than academic
interest.
Let us provide motivation for such endeavors by describing, with a little more for-
mality, just how solid the solidarity results are, and how strong the consequent ergodic
theorems are. We will show, in Chapter 13, the following:
Theorem 1.3.1. The following four conditions are equivalent:
(i) The chain admits a unique probability measure π satisfying the invariant equations
π(A) = ∫ π(dx) P(x, A), A ∈ B(X); (1.12)
(ii) There exists some “small” set C ∈ B(X) and M_C < ∞ such that
sup_{x∈C} E_x[τ_C] ≤ M_C ; (1.13)
(iii) There exists some “small” set C, some b < ∞ and some non-negative “test func-
tion” V , finite ϕ-almost everywhere, satisfying
∫ P(x, dy)V(y) ≤ V(x) − 1 + b I_C(x), x ∈ X; (1.14)
(iv) There exists some “small” set C ∈ B(X) and some P^∞(C) > 0 such that as n → ∞
lim inf_{n→∞} sup_{x∈C} |P^n(x, C) − P^∞(C)| = 0. (1.15)
Any one of these conditions implies that there is a unique invariant probability measure π and that, for an “aperiodic” chain,
sup_{A∈B(X)} |P^n(x, A) − π(A)| → 0 as n → ∞ (1.16)
for every x ∈ X for which V (x) < ∞, where V is any function satisfying (1.14).
Thus “local recurrence” in terms of return times, as in (1.13), or “local convergence” as in (1.15), guarantees the uniform limits in (1.16); both are equivalent to the mere
existence of the invariant probability measure π; and moreover we have in (1.14) an
exact test based only on properties of P for checking stability of this type.
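As a numerical illustration of how a test like (1.14) is applied (an editorial sketch using the reflected random walk of Section 1.2 with increments W ~ N(−0.2, 1) and the assumed test function V (x) = 5x), the one-step drift can be estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(3)

def drift(x, V, mu=-0.2, n=200_000):
    """Estimate E[V(Phi_1) | Phi_0 = x] - V(x) for the reflected walk
    Phi_1 = max(0, x + W), with increments W ~ N(mu, 1)."""
    W = mu + rng.standard_normal(n)
    return np.mean(V(np.maximum(0.0, x + W))) - V(x)

V = lambda x: 5.0 * x   # candidate test function: 5 = 1/|E[W]|
for x in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(f"x = {x:4.1f}   drift ≈ {drift(x, V):+.2f}")
# Away from a bounded set C near 0 the drift is ≈ -1, verifying (1.14)
# with b absorbing the positive drift on C: the Foster-Lyapunov route to
# positive recurrence of the reflected walk.
```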
Each of (i)–(iv) is a type of stability: the beauty of this result lies in the fact that they
are completely equivalent. Moreover, for this irreducible form of Markovian system, it is
further possible in the “stable” situation of this theorem to develop asymptotic results,
which ensure convergence not only of the distributions of the chain, but also of very
general (and not necessarily bounded) functions of the chain (Chapter 14); to develop
global rates of convergence to these limiting values (Chapter 15 and Chapter 16); and
to link these to Laws of Large Numbers or Central Limit Theorems (Chapter 17).
Together with these consequences of stability, we also provide a systematic approach
for establishing stability in specific models in order to utilize these concepts. The exten-
sion of the so-called “Foster–Lyapunov” criteria as in (1.14) to all aspects of stability,
and application of these criteria in complex models, is a key feature of our approach to
stochastic stability.
These concepts are largely classical in the theory of countable state space Markov
chains. The extensions we give to general spaces, as described above, are neither so
well known nor, in some cases, previously known at all.
The heuristic discussion of this section will take considerable formal justification,
but the end-product will be a rigorous approach to the stability and structure of Markov
chains.
In this sense the Markov transition function P can be viewed as a deterministic map
from M to itself, and P will induce such a dynamical system if it is suitably continuous.
This interpretation can be achieved if the chain is on a suitably behaved space and
has the Feller property that P f(x) := ∫ P(x, dy)f(y) is continuous for every bounded
continuous f , and then d becomes a weak convergence metric (see Chapter 6).
As in the stronger recurrence ideas in (II) and (III) in Section 1.3.1, in discussing
the stability of Φ, we are usually interested in the behavior of the terms P^k , k ≥ 0,
when k becomes large. Our hope is that this sequence will be bounded in some sense,
or converge to some fixed probability π ∈ M, as indeed it does in (1.16).
Four traditional formulations of stability for a dynamical system, which give a frame-
work for such questions, are
(i) Lagrange stability: for each x ∈ X , the orbit starting at x is a precompact subset
of X . For the system (P, M, d) with d the weak convergence metric, this is exactly
tightness of the distributions of the chain, as defined in (1.11);
(ii) Stability in the sense of Lyapunov: for every initial condition x ∈ X ,
lim_{y→x} sup_{k≥0} d(T^k y, T^k x) = 0,
where d denotes the metric on X . This is again the requirement that the long-term
behavior of the system is not overly sensitive to a change in the initial conditions;
(iii) Asymptotic stability: there exists some fixed point x∗ so that T^k x∗ = x∗ for
all k, with trajectories {xk } starting near x∗ staying near and converging to x∗
as k → ∞. For the system (P, M, d) the existence of a fixed point is exactly
equivalent to the existence of a solution to the invariant equations (1.12);
(iv) Global asymptotic stability: the system is stable in the sense of Lyapunov and for
some fixed x∗ ∈ X and every initial condition x ∈ X ,
d(T^k x, x∗ ) → 0 as k → ∞. (1.17)
This is comparable to the result of Theorem 1.3.1 for the dynamical system
(P, M, d).
Lagrange stability requires that any limiting measure arising from the sequence {µP^k}
will be a probability measure, rather as in (1.16).
Stability in the sense of Lyapunov is most closely related to irreducibility, although
rather than placing a global requirement on every initial condition in the state space,
stability in the sense of Lyapunov only requires that two initial conditions which are
sufficiently close will then have comparable long term behavior. Stability in the sense
of Lyapunov says nothing about the actual boundedness of the orbit {T k x}, since it is
simply continuity of the maps {T k }, uniformly in k ≥ 0. An example of a system on R
which is stable in the sense of Lyapunov is the simple recursion xk +1 = xk + 1, k ≥ 0.
Although distinct trajectories stay close together if their initial conditions are similarly
close, we would not consider this system stable in most other senses of the word.
The connections between the probabilistic recurrence approach and the dynamical
systems approach become very strong in the case where the chain is both Feller and
ϕ-irreducible, and when the irreducibility measure ϕ is related to the topology by the
requirement that the support of ϕ contains an open set.
In this case, by combining the results of Chapter 6 and Chapter 18, we get for
suitable spaces
Theorem 1.3.2. For a ϕ-irreducible “aperiodic” Feller chain with supp ϕ containing
an open set, the dynamical system (P, M, d) is globally asymptotically stable if and only
if the distributions {P^k(x, · )} are tight as in (1.11); and then the uniform ergodic limit
(1.16) holds.
This result follows, not from dynamical systems theory, but by showing that such
a chain satisfies the conditions of Theorem 1.3.1; these Feller chains are an especially
useful subset of the “suitable” chains for which tightness is equivalent to the properties
described in Theorem 1.3.1, and then, of course, (1.16) gives a result rather stronger
than (1.17).
1.4 Commentary
This book does not address models where the time set is continuous (when Φ is usually
called a Markov process), despite the sometimes close relationship between discrete and
continuous time models: see Chung [71] or Anderson [4] for the classical countable space
approach.
On general spaces in continuous time, there are a totally different set of questions
that are often seen as central: these are exemplified in Sharpe [352], although the
interested reader should also see Meyn and Tweedie [279, 280, 278] for recent results
which are much closer in spirit to, and rely heavily on, the countable time approach
followed in this book.
There has also been considerable work over the past two decades on the subject of
more generally indexed Markov models (such as Markov random fields, where T is
multidimensional); these are also not covered in this book. In our development, Markov
chains always evolve through time as a scalar, discrete quantity.
The question of what to call a Markovian model, and whether to concentrate on the
denumerability of the space or the time parameter in using the word “chain”, seems to
have been resolved in the direction we take here. Doob [99] and Chung [71] reserve the
term chain for systems evolving on countable spaces with both discrete and continuous
time parameters, but usage seems to be that it is the time set that gives the “chaining”.
Revuz [326], in his Notes, gives excellent reasons for this.
The examples we begin with here are rather elementary, but equally they are com-
pletely basic, and represent the twin strands of application we will develop: the first,
from deterministic to stochastic models via a “stochasticization” within the same func-
tional framework has analogies with the approach of Stroock and Varadhan in their
analysis of diffusion processes (see [378, 377, 168]), whilst the second, from basic inde-
pendent random variables to sums and other functionals traces its roots back too far
to be discussed here. Both these models are close to identical at this simple level. We
give more diverse examples in Chapter 2.
We will typically use X and Xn to denote state space models, or their values at
time n, in accordance with rather long established conventions. We will then typically
use lower case letters to denote the values of related deterministic models. Regenerative
models such as random walk are, on the other hand, typically denoted by the symbols
Φ and Φn , which we also use for generic chains.
The three concepts described in (I)–(III) may seem to give a rather limited number
of possible versions of “stability”. Indeed, in the various generalizations of deterministic
dynamical systems theory to stochastic models which have been developed in the past
three decades (see for example Kushner [232] or Khas’minskii [206]) there have been
many other forms of stability considered. All of them are, however, qualitatively similar,
and fall broadly within the regimes we describe, even though they differ in detail.
It will become apparent in the course of our development of the theory of irreducible
chains that in fact, under fairly mild conditions, the number of different types of behav-
ior is indeed limited to precisely those sketched above in (I)–(III). Our aim is to unify
many of the partial approaches to stability and structural analysis, to indicate how
they are in many cases equivalent, and to develop both criteria for stability to hold for
individual models, and limit theorems indicating the value of achieving such stability.
With this rather optimistic statement, we move forward to consider some of the
specific models whose structure we will elucidate as examples of our general results.
Chapter 2
Markov models
The results presented in this book have been written in the desire that practitioners
will use them. We have tried therefore to illustrate the use of the theory in a systematic
and accessible way, and so this book concentrates not only on the theory of general
space Markov chains, but on the application of that theory in considerable detail.
We will apply the results which we develop across a range of specific applications:
typically, after developing a theoretical construct, we apply it to models of increas-
ing complexity in the areas of systems and control theory, both linear and nonlinear,
both scalar and vector valued; traditional “applied probability” or operations research
models, such as random walks, storage and queueing models, and other regenerative
schemes; and models which are in both domains, such as classical and recent time series
models.
These are not given merely as “examples” of the theory: in many cases, the appli-
cation is difficult and deep of itself, whilst applications across such a diversity of areas
have often driven the definition of general properties and the links between them. Our
goal has been to develop the analysis of applications on a step-by-step basis as the
theory becomes richer throughout the book.
To motivate the general concepts, then, and to introduce the various areas of appli-
cation, we leave until Chapter 3 the normal and necessary foundations of the subject,
and first introduce a cross-section of the models for which we shall be developing those
foundations.
These models are still described in a somewhat heuristic way. The full mathematical
description of their dynamics must await the development in the next chapter of the
concepts of transition probabilities, and the reader may on occasion benefit by moving
to some of those descriptions in parallel with the outlines here.
It is also worth observing immediately that the descriptive definitions here are from
time to time supplemented by other assumptions in order to achieve specific results:
these assumptions, and those in this chapter and the last, are collected for ease of
reference in Appendix C.
As the definitions are developed, it will be apparent immediately that very many of
these models have a random additive component, such as the i.i.d. sequence {Wn } in
both the linear state space model and the random walk model. Such a component goes
by various names, such as error, noise, innovation, disturbance or increment sequence,
across the various model areas we consider. We shall use the nomenclature relevant to
the context of each model.
We will save considerable repetitive definition if we adopt a global convention im-
mediately to cover these sequences.
It will also be apparent that many models are defined inductively from their own past
in combination with such innovation sequences. In order to commence the induction,
initial values are needed. We adopt a second convention immediately to avoid repetition
in defining our models.
Initialization
Unless specifically defined otherwise, the initial state {Φ0 } of a Markov
model will be taken as independent of the error, noise, innovation, distur-
bance or increments process, and will have an arbitrary distribution.
Figure 2.1: Shown on the left is a sample path from the linear model with α = 0.85, and
shown on the right is a sample path obtained with α = 1.05. The increment distribution
is N (0, 1) in each case.
Autoregressive model
A process Y = {Yn } is called a (scalar) autoregression of order k, or AR(k)
model, if it satisfies, for each set of initial values (Y0 , . . . , Y−k +1 ),
(AR1) for each n ∈ Z+ , Yn and Wn are random variables on R satisfying
inductively for n ≥ 1
Yn = α1 Yn −1 + α2 Yn −2 + . . . + αk Yn −k + Wn ,
for some α1 , . . . , αk ∈ R;
Setting
Xn = (Yn , . . . , Yn−k+1 )
and X = {Xn , n ≥ 0}, we define X as a Markov chain whose first component has
exactly the sample paths of the autoregressive process. Note that the general convention
that X0 has an arbitrary distribution implies that the first k variables (Y0 , . . . , Y−k +1 )
are also considered arbitrary.
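To make this embedding concrete, the following minimal sketch (in Python; the AR(2)
coefficients and the N (0, 1) innovations are illustrative choices, not taken from the text)
iterates the vector chain X directly:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = np.array([0.5, -0.3])          # hypothetical AR(2) coefficients
    k = len(alpha)
    X = np.zeros(k)                        # arbitrary initial values (Y0, Y-1)
    Y_path = []
    for n in range(1000):
        Y = alpha @ X + rng.normal()       # Yn = a1 Y(n-1) + ... + ak Y(n-k) + Wn
        X = np.concatenate(([Y], X[:-1]))  # Xn = (Yn, ..., Y(n-k+1)): shift the lags
        Y_path.append(Y)

The first coordinate of X traces exactly the AR(k) sample path, as the construction above
guarantees.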
The autoregressive model can then be viewed as a specific version of the vector-valued
linear state space model. A richer class is obtained by adding a moving average part: a
process Y = {Yn } is called an autoregressive-moving average process of order (k, ℓ), or
ARMA(k, ℓ) model, if it satisfies, inductively for n ≥ 1,
Yn = α1 Yn−1 + α2 Yn−2 + · · · + αk Yn−k + Wn + β1 Wn−1 + β2 Wn−2 + · · · + βℓ Wn−ℓ ,
for some α1 , . . . , αk , β1 , . . . , βℓ ∈ R;
In this case more care must be taken to obtain a suitable Markovian description of
the process. One approach is to take
Xn = (Yn , . . . , Yn−k+1 , Wn , . . . , Wn−ℓ+1 ).
Although the resulting state process X is Markovian, the dimension of this realization
may be overly large for effective analysis. A realization of lower dimension may be
obtained by defining the stochastic process Z inductively by
Zn = α1 Zn −1 + α2 Zn −2 + · · · + αk Zn −k + Wn . (2.2)
When the initial conditions are defined appropriately, it is a matter of simple algebra
and an inductive argument to show that
Yn = Zn + β1 Zn−1 + β2 Zn−2 + · · · + βℓ Zn−ℓ .
Hence the probabilistic structure of the ARMA(k, ℓ) process is completely determined
by the Markov chain {(Zn , . . . , Zn−k+1 ) : n ∈ Z+ } which takes values in Rk .
The behavior of the general ARMA(k, ℓ) model can thus be placed in the Markovian
context, and we will develop the stability theory of this, and more complex versions of
this model, in the sequel.
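The lower-dimensional realization lends itself to direct simulation. A minimal sketch,
assuming illustrative coefficients, Gaussian noise, and zero initial conditions so that the
identity above holds exactly:

    import numpy as np

    rng = np.random.default_rng(1)
    alpha = [0.4, 0.2]                     # hypothetical AR coefficients a1..ak
    beta = [0.5]                           # hypothetical MA coefficients b1..bl
    k, l = len(alpha), len(beta)
    m = max(k, l + 1)                      # number of lags of Z to retain
    Z = [0.0] * m                          # zero initial conditions
    Y_path = []
    for n in range(500):
        Zn = sum(a * z for a, z in zip(alpha, Z)) + rng.normal()   # recursion (2.2)
        Z = [Zn] + Z[:m - 1]
        # Yn = Zn + b1 Z(n-1) + ... + bl Z(n-l)
        Y_path.append(Z[0] + sum(b * z for b, z in zip(beta, Z[1:])))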
xn +1 = F (xn ), n ∈ Z+ , (2.3)
for some continuous function F : R → R. Hence the simple linear model defined in
(SLM1) may be interpreted as a linear dynamical system perturbed by the “noise”
sequence W .
The theory of time series is in this sense closely related to the general theory of
dynamical systems: it has developed essentially as that subset of stochastic dynamical
systems theory for which the relationships between the variables are linear, and even
with the nonlinear models from the time series literature which we consider below, there
is still a large emphasis on linear substructures.
The theory of dynamical systems, in contrast to time series theory, has grown from
a deterministic base, considering initially the type of linear relationship in (1.3) with
which we started our examples in Section 1.2, but progressing to models allowing a very
general (but still deterministic) relationship between the variables in the present and
in the past, as in (2.3). It is in the more recent development that “noise” variables,
allowing the system to be random in some part of its evolution, have been introduced.
Nonlinear state space models are stochastic versions of dynamical systems where a
Markovian realization of the model is both feasible and explicit: thus they satisfy a
generalization of (2.3) such as
Xn+1 = F (Xn , Wn+1 ), n ∈ Z+ , (2.4)
or, equivalently, Xn = F (Xn−1 , Wn ).
Figure 2.2: Simple bilinear model path with F (x, w) = (0.707 + w)x + w.
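A path such as that in Figure 2.2 is easily generated. A minimal sketch, assuming (purely
for illustration, since the increment distribution is not specified here) N (0, 0.01) noise:

    import numpy as np

    rng = np.random.default_rng(2)
    x, path = 0.1, []
    for n in range(400):
        w = rng.normal(scale=0.1)          # assumed noise level
        x = (0.707 + w) * x + w            # F(x, w) = (0.707 + w)x + w
        path.append(x)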
SETAR model
The chain X = {Xn } is called a scalar self-exciting threshold autoregres-
sion (SETAR) model if it satisfies
(SETAR1) for each 1 ≤ j ≤ M , Xn and Wn (j) are random variables on
R, satisfying, inductively for n ≥ 1,
Xn = φ(j) + θ(j)Xn−1 + Wn (j),   rj−1 < Xn−1 ≤ rj ,
where −∞ = r0 < r1 < · · · < rM = ∞.
Because of lack of continuity, the SETAR models do not fall into the class of non-
linear state space models, although they can often be analyzed using essentially the
same methods. The SETAR model will prove to be a useful example on which to test
the various stability criteria we develop, and the overall outcome of that analysis is
gathered together in Section B.2.
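A minimal two-regime sketch of such a threshold autoregression (the threshold, the
coefficients φ(j), θ(j), and the N (0, 1) noise are all hypothetical choices):

    import numpy as np

    rng = np.random.default_rng(3)
    r = 0.0                                # hypothetical threshold (M = 2 regimes)
    phi = [1.0, -1.0]                      # intercepts phi(j)
    theta = [0.5, 0.5]                     # slopes theta(j)
    x, path = 0.0, []
    for n in range(500):
        j = 0 if x <= r else 1             # regime selected by X(n-1)
        x = phi[j] + theta[j] * x + rng.normal()   # Wn(j) ~ N(0,1), say
        path.append(x)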
Many nonlinear processes cannot be modeled by a scalar Markovian model such as the
SNSS(F ) model. The more general multidimensional NSS(F ) model is defined quite
analogously, satisfying inductively for k ≥ 1
Xk = F (Xk−1 , Wk ),
with X and W now allowed to be vector valued.
The general nonlinear state space model can often be analyzed by the same meth-
ods that are used for the scalar SNSS(F ) model, under appropriate conditions on the
disturbance process W and the function F .
It is a central observation of such analysis that the structure of the NSS(F ) model
(and of course its scalar counterpart) is governed under suitable conditions by an asso-
ciated deterministic control model, defined analogously to the linear control model and
the linear state space model.
xk = Fk (x0 , u1 , . . . , uk ), k ∈ Z+ , (2.8)
The general ARMA model may be generalized to obtain a class of nonlinear models,
all of which may be “Markovianized”, as in the linear case.
Yn = G(Yn−1 , Yn−2 , . . . , Yn−k , Wn , Wn−1 , Wn−2 , . . . , Wn−ℓ )
so that the components of [z] lie within the open unit disk in R2 for any z ∈ R2 .
Following this transformation we obtain the nonlinear state space model
xn = ( xan , xbn ) = F (xn−1 ) = ( 1/xan−1 − 1/xbn−1 , xan−1 ).   (2.11)
Figure 2.3: (a) a sample path of V plotted against t; (b) shown on the left is the gumleaf
attractor, and on the right is the gumleaf attractor perturbed by noise.
A typical sample path of this model is given on the left hand side of Figure 2.3 (b).
In this figure 40,000 consecutive sample points of {xn } have been indicated by points
to illustrate the qualitative behavior of the model. Because of its similarity to some
Australian flora, the authors call the resulting plot the gumleaf attractor. Ydstie in
[410] also finds that such chaotic behavior can easily occur in adaptive systems.
One way that noise can enter the model (2.11) is to perturb (2.10) by noise. The
resulting two-dimensional recursion becomes
Xn = ( Xan , Xbn ) = ( 1/Xan−1 − 1/Xbn−1 , Xan−1 ) + ( Wn , 0 ),   (2.12)
where W is i.i.d. The special case where for each n the disturbance Wn is uniformly
distributed on [−1/2, 1/2] is illustrated on the right in Figure 2.3 (b). As in the previous
figure, we have plotted 40,000 values of the sequence X which takes values in R2 . Note
that the qualitative behavior of the process remains similar to the noise-free model,
although some of the detailed behavior is “smeared out” by the noise.
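The perturbed recursion (2.12) is immediate to simulate. A minimal sketch, using the
Uniform[−1/2, 1/2] disturbance of the right-hand panel and an arbitrary nonzero initial
condition:

    import numpy as np

    rng = np.random.default_rng(4)
    xa, xb = 1.0, 2.0                      # a nonzero initial condition
    points = []
    for n in range(40_000):
        w = rng.uniform(-0.5, 0.5)         # Wn ~ Uniform[-1/2, 1/2]
        xa, xb = 1.0 / xa - 1.0 / xb + w, xa   # recursion (2.12)
        points.append((xa, xb))

Plotting the collected points reproduces the smeared-out gumleaf shape described above.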
The analysis of general models of this type is a regular feature in what follows, and
in Chapter 7 we give a detailed analysis of the path structure that might be expected
under suitable assumptions on the noise and the associated deterministic model.
Yk +1 = θk Yk + Wk +1 , (2.13)
θk +1 = αθk + Zk +1 ; (2.14)
It is assumed that W has a finite second moment, and that E[log(1+|Z|)] <
∞.
As usual, the control set Ow ⊆ R2 depends upon the specific distribution of W and Z.
A plot of the joint process (Y , θ) is given in Figure 2.4. In this simulation we have
α = 0.933, Wk ∼ N (0, 0.14) and Zk ∼ N (0, 0.01).
The dark line is a plot of the parameter process θ, and the lighter, more explosive
path is the resulting output Y . One feature of this model is that the output oscillates
rapidly when θk takes on large negative values, which occurs in this simulation for time
values between 80 and 100.
Figure 2.4: Dependent parameter bilinear model paths with α = 0.933, Wk ∼ N (0, 0.14)
and Zk ∼ N (0, 0.01).
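The pair of recursions (2.13), (2.14) translates directly into code. A minimal sketch using
the parameter values quoted above (α = 0.933, Wk ∼ N (0, 0.14), Zk ∼ N (0, 0.01)):

    import numpy as np

    rng = np.random.default_rng(5)
    alpha = 0.933
    theta, y = 0.0, 0.0
    theta_path, y_path = [], []
    for k in range(150):
        y = theta * y + rng.normal(scale=np.sqrt(0.14))          # (2.13): W ~ N(0, 0.14)
        theta = alpha * theta + rng.normal(scale=np.sqrt(0.01))  # (2.14): Z ~ N(0, 0.01)
        y_path.append(y); theta_path.append(theta)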
We call the input sequence U sample mean square stabilizing if the input-output
process satisfies
lim sup_{N→∞} (1/N ) Σ_{k=1}^{N} [ Yk2 + Uk2 ] < ∞   a.s.
for every initial condition. The control law is then said to be minimum variance if it is
sample mean square stabilizing, and the sample path average
lim sup_{N→∞} (1/N ) Σ_{k=1}^{N} Yk2   (2.17)
is minimized over all control laws with the property that, for each k, the input Uk is a
function of Yk , . . . , Y0 , and the initial conditions.
Such controls are often called “causal”, and for causal controls there is some pos-
sibility of a Markovian representation. We now specialize this general framework to a
situation where a Markovian analysis through state space representation is possible.
θ = (−α1 , . . . , −αn1 , β1 , . . . , βn2 )
denote the time invariant parameter vector. Suppose for the moment that the parameter
θ is known. If we set
φk−1 := (Yk−1 , . . . , Yk−n1 , Uk−1 , . . . , Uk−n2 ),
and choose the control Uk at each time so that
φk θ = 0,   (2.18)
then this will result in Yk = Wk for all k. This control law obviously minimizes the
performance criterion (2.17) and hence is a minimum variance control law if it is sample
mean square stabilizing.
It is also possible to obtain a minimum variance control law, even when θ is not
available directly for the computation of the control Uk . One such algorithm (developed
in [142]) has a recursive form given by first estimating the parameters through the
following stochastic gradient algorithm:
and then choosing the control at each time so that
φk θ̂k = 0.
Yk +1 = θk Yk + Uk + Wk +1 , (2.20)
θk +1 = αθk + Zk +1 , k ≥ 1, (2.21)
E[ (Zn , Wn ) (Zk , Wk ) ] = δn−k diag(σz2 , σw2 ),   n ≥ 1;
The time varying parameter process θ here is not observed directly but is partially
observed through the input and output processes U and Y .
The ultimate goal with such a model is to find a mean square stabilizing, minimum
variance control law. If the parameter sequence θ were completely observed then this
goal could be easily achieved by setting Uk = −θk Yk for each k ∈ Z+ , as in (2.18).
Since θ is only partially observed, we instead obtain recursive estimates of the
parameter process and choose a control law based upon these estimates. To do this
36 Markov models
we note that by viewing θ as a state process, as defined in [57], then because of the
assumptions made on (W , Z), the conditional expectation
θ̂k := E[θk | Yk ]
is computable using the Kalman filter (see [253, 240]) provided the initial distribution
of (U0 , Y0 , θ0 ) for (2.20), (2.21) is Gaussian.
In this scalar case, the Kalman filter estimates are obtained recursively by the pair
of equations
The closed loop system gives rise to a nonlinear state space model of the form
(NSS1). It follows then that the triple
(Σk , θ̃k , Yk ),
where θ̃k := θk − θ̂k denotes the parameter estimation error,
is a Markov chain with state space X = [σz2 , σz2 /(1 − α2 )] × R2 . Although the state space is not
open, as required in (NSS1), when necessary we can restrict the chain to the interior
of X to apply the general results which will be developed for the nonlinear state space
model.
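Since the filter recursions are standard under these Gaussian assumptions, the closed
loop is straightforward to simulate. The sketch below is written from the generic scalar
Kalman filter for (2.20), (2.21), combined with the certainty-equivalence control
Uk = −θ̂k Yk ; it is an illustration only, not a transcription of the equations (2.22)–(2.24)
analyzed in the text.

    import numpy as np

    rng = np.random.default_rng(6)
    alpha, sz, sw = 0.99, 0.2, 0.1         # the "stable" case of Figure 2.5
    theta, theta_hat = 0.0, 0.0
    Sigma = sz**2 / (1 - alpha**2)         # prior variance of theta
    y, y_path = 0.0, []
    for k in range(1000):
        u = -theta_hat * y                 # certainty-equivalence control
        y_next = theta * y + u + rng.normal(scale=sw)   # (2.20)
        # scalar Kalman update for the unobserved parameter theta
        gain = alpha * Sigma * y / (sw**2 + Sigma * y**2)
        theta_hat = alpha * theta_hat + gain * (y_next - u - theta_hat * y)
        Sigma = sz**2 + alpha**2 * sw**2 * Sigma / (sw**2 + Sigma * y**2)
        theta = alpha * theta + rng.normal(scale=sz)    # (2.21)
        y = y_next
        y_path.append(y)

Note that the recursion for Sigma keeps it in the interval [σz2 , σz2 /(1 − α2 )], in accordance
with the state space X just described.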
Figure 2.5: Output Y of the SAC model. The sample path shown on the left was obtained
using σz = 0.2, and the one shown on the right used σz = 1.1. In each case α = 0.99 and
σw = 0.1.
Figure 2.6: Disturbance W for the SAC model: N (0, 0.01) Gaussian white noise.
In Figure 2.5 we have illustrated two typical sample paths of the output process Y ,
identical but for the different values of σz chosen. The disturbance process W in both
instances is i.i.d. N (0, 0.01); that is, σw = 0.1. A typical sample path of W is given in
Figure 2.6.
In both simulations we take α = 0.99. In the “stable” case shown on the left we
have σz = 0.2. In this case the output Y is barely distinguishable from the noise W .
In the second simulation, where σz = 1.1, we see that the output exhibits occasional
large bursts due to the more unpredictable behavior of the parameter process.
As we develop the general theory of Markov processes we will return to this example
to obtain fairly detailed properties of the closed loop system described by (2.22)-(2.24).
In Chapter 16 we characterize the mean square performance (2.17): when the pa-
rameter σz2 which defines the parameter variation is strictly less than unity, the limit
supremum is in fact a limit in this example, and this limit is independent of the initial
conditions of the system.
This limit, which is the expectation of Y0 with respect to an invariant measure,
cannot be calculated exactly due to the complexity of the closed loop system equations.
Using invariance, however, we may obtain explicit bounds on the limit, and give a
38 Markov models
characterization of the performance of the closed loop system which this limit describes.
Such characterizations are helpful in understanding how the performance varies as a
function of the disturbance intensity W and the parameter estimation error θ̃.
A chain which is a special form of the random walk chain in Section 1.2.3 is the renewal
process. Such chains will be fundamental in our later analysis of the structure of even
the most general of Markov chains, and here we describe the specific case where the
state space is countable.
Let {Y1 , Y2 , . . .} be a sequence of independent and identically distributed random
variables, with distribution p concentrated, not on the positive and negative integers, but
rather on Z+ . It is customary to assume that p(0) = 0. Let Y0 be a further independent
random variable, with the distribution of Y0 being a, also concentrated on Z+ . The
random variables
Zn := Σ_{i=0}^{n} Yi
form an increasing sequence taking values in Z+ , and are called a delayed renewal
process, with a being the delay in the first variable: if a = p then the sequence {Zn } is
merely referred to as a renewal process.
As with the two-sided random walk, Zn is a Markov chain: not a particularly
interesting one in some respects, since it is evanescent in the sense of Section 1.3.1 (II),
but with associated structure which we will use frequently, especially in Part III.
With this notation we have P(Z0 = n) = a(n) and by considering the value of Z0
and the independence of Y0 and Y1 , we find
P(Z1 = n) = Σ_{j=0}^{n} a(j)p(n − j).
To describe the n-step dynamics of the process {Zn } we need convolution notation.
Convolutions
We write a ∗ b for the convolution of two sequences a and b given by
a ∗ b (n) := Σ_{j=0}^{n} b(j)a(n − j) = Σ_{j=0}^{n} a(j)b(n − j)
so that, iterating, P(Zk = n) = a ∗ pk∗ (n), where pk∗ denotes the k-fold convolution of p
with itself.
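These convolution formulae compute directly. A minimal sketch (the increment law p
and the delay distribution a are illustrative) evaluating P(Zk = n) = a ∗ pk∗ (n):

    import numpy as np

    p = np.array([0.0, 0.3, 0.5, 0.2])     # an illustrative increment law on Z+, p(0) = 0
    a = np.array([0.5, 0.5])               # an illustrative delay distribution
    k = 3
    dist = a.copy()
    for _ in range(k):                     # a * p^{k*} by repeated convolution
        dist = np.convolve(dist, p)
    # dist[n] is now P(Z_k = n)
    assert abs(dist.sum() - 1.0) < 1e-12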
Two chains with appropriate regeneration associated with the renewal process are the
forward recurrence time chain, sometimes called the residual lifetime process, and the
backward recurrence time chain, sometimes called the age process.
We only use those aspects which we require in what follows, but for a much fuller
treatment of renewal and regeneration see Kingman [208] or Lindvall [239].
(Q2) The nth customer brings a job requiring service Sn where the ser-
vice times are independent of each other and of the interarrival times, and
are distributed as a variable S with distribution H(−∞, t] = P(S ≤ t).
(Q3) There is one server and customers are served in order of arrival.
Then the system is called a GI/G/1 queue.
The notation and many of the techniques here were introduced by Kendall [200, 201]:
GI for general independent input, G for general service time distributions, and 1 for a
single server system. There are many ways of analyzing this system: see Asmussen [9]
or Cohen [76] for comprehensive treatments.
Let N (t) be the number of customers in the queue at time t, including the customers
being served. This is clearly a process in continuous time. A typical sample path for
{N (t), t ≥ 0}, under the assumption that the first customer arrives at t = 0, is shown
in Figure 2.7.
Figure 2.7: A typical sample path of the queue length process N (t).
T̄i = T1 + · · · + Ti ,   i ≥ 1,   (2.26)
S̄i = S0 + · · · + Si ,   i ≥ 0.   (2.27)
Note that, in the sample path illustrated, because the queue empties at S̄2 , due to
T̄3 > S̄2 , the point x = T̄3 + S3 is not S̄3 , and the point T̄4 + S4 is not S̄4 , and so on.
Although the process {N (t)} occurs in continuous time, one key to its analysis
through Markov chain theory is the use of embedded Markov chains.
Consider the random variable Nn = N (T̄n −), which counts customers immediately
before each arrival. By convention we will set N0 = 0 unless otherwise indicated. We
will show that under appropriate circumstances for k ≥ −j
P(Nn +1 = j + k | Nn = j, Nn −1 , Nn −2 , . . . , N0 ) = pk , (2.28)
regardless of the values of {Nn −1 , . . . , N0 }. This will establish the Markovian nature of
the process, and indeed will indicate that it is a random walk on Z+ .
Since we consider N (t) immediately before every arrival time, Nn +1 can only increase
from Nn by one unit at most; hence, equation (2.28) holds trivially for k > 1.
For Nn+1 to increase by one unit we need there to be no departures in the time
period (T̄n , T̄n+1 ], and obviously this happens if the job in progress at T̄n is still in
progress at T̄n+1 .
It is here that some assumption on the service times will be crucial. For it is easy
to show, as we now sketch, that for a general GI/G/1 queue the probability of the
remaining service of the job in progress taking any specific length of time depends,
typically, on when the job began. In general, the past history {Nn −1 , . . . , N0 } will
provide information on when the customer began service, and this in turn provides
information on how long the customer will continue to be served.
To see this, consider, for example, a trajectory such as that up to (T1 −) on Fig-
ure 2.7, where {Nn = 1, Nn −1 = 0, . . .}. This tells us that the current job began exactly
so N = {Nn } is not a Markov chain, since from equation (2.29) and equation (2.30)
the different information in the events {Nn = 1, Nn −1 = 0} and {Nn = 1, Nn −1 =
1, Nn −2 = 0} (which only differ in the past rather than the present position) leads to
different probabilities of transition.
There is one case where this does not happen. If both sides of (2.30) are identical
so that the time until completion of service is quite independent of the time already
taken, then the extra information from the past is of no value.
This leads us to define a specific class of models for which N is Markovian.
GI/M/1 assumption
(Q4) If the distribution of service times is exponential with
H(−∞, t] = 1 − e−µt ,   t ≥ 0,
for some µ > 0, then the queue is called a GI/M/1 queue.
Here the M stands for Markovian, as opposed to the previous “general” assumption.
If we can now make assumption (Q4) that we have a GI/M/1 queue, then the well-
known “loss of memory” property of the exponential shows that, for any t, z ≥ 0,
P(S > t + z | S > z) = P(S > t) = e−µt .
In this way, the independence and identical distribution structure of the service times
show that, no matter which previous customer was being served, and when their service
started, there will be some z such that
M/G/1 assumption
(Q5) If the distribution of inter-arrival times is exponential with
G(−∞, t] = 1 − e−λt ,   t ≥ 0,
for some λ > 0, then the queue is called an M/G/1 queue.
The actual probabilities governing the motion of these queueing models will be
developed in Chapter 3.
When a path of the contents process reaches zero, the process continues to take the
value zero until it is replenished by a positive input.
This model is a simplified version of the way in which a dam works; it is also a
model for an inventory, or for any other similar storage system.
The basic storage process operates in continuous time: to render it Markovian we
analyze it at specific time points when it (probabilistically) regenerates, as follows.
Φn +1 = [Φn + Sn − Jn ]+ ,
where the variables {Jn } are independent and identically distributed.
Figure 2.8: Storage system paths. The plot shown on the left uses α/β = 2, and on the
right α/β = 0.5. In each case r = 1.
Usually we assume R(x) to be finite for all x. Since R is strictly increasing the inverse
function R−1 (t) is well defined for all t, and it follows that the drop in level in a time
period t with no input is given by
Jx (t) = x − q(x, t)
where
q(x, t) = R−1 (R(x) − t).
This enables us to use the same type of random walk calculation as for the Moran dam.
As before, when a path of this storage process reaches zero, the process continues
to take the value zero until it is replenished by a positive input.
It is again necessary to analyze such a model at the times immediately before each
input in order to ensure a Markovian model. The assumptions we might use for such a
model are
Then the chain Φ = {Φn } represents the contents of the storage system at
the times {Tn −} immediately before each input, and is called the content-
dependent storage model.
Such models are studied in [157, 53]. In considering the connections between queue-
ing and storage models, it is then immediately useful to realize that this is also a model
of the waiting times in a model where the service time varies with the level of demand,
as studied in [56].
2.5 Commentary*
We have skimmed the Markovian models in the areas in which we are interested, try-
ing to tread the thin line between accessibility and triviality. The research literature
abounds with variations on the models we present here, and many of them would benefit
by a more thorough approach along Markovian lines.
For many more models with time series applications, the reader should see Brockwell
and Davis [51], especially Chapter 12; Granger and Anderson for bilinear models [143];
and for nonlinear models see Tong [388], who considers models similar to those we have
introduced from a Markovian viewpoint, and in particular discusses the bilinear and
SETAR models. Linear and bilinear models are also developed by Duflo in [102], with
a view towards stability similar to ours. For a development of general linear systems
theory the reader is referred to Caines [57] for a control perspective, or Aoki [5] for a
view towards time series analysis.
Bilinear models have received a great deal of attention in recent years in both time
series and systems theory. The dependent parameter bilinear model defined by (2.14,
2.13) is called a doubly stochastic autoregressive process of order 1, or DSAR(1), in
Tjøstheim [386]. Realization theory for related models is developed in Guégan [146]
and Mittnik [285], and the papers Pourahmadi [321], Brandt [44], Meyn and Guo [275],
and Karlsen [195] provide various stability conditions for bilinear models.
The idea of analyzing the nonlinear state space model by examining an associated
control model goes back to Stroock and Varadhan [378] and Kunita [227, 228] in con-
tinuous time. In control and systems models, linear state space models have always
played a central role, while nonlinear models have taken a much more significant role
over the past decade: see Kumar and Varaiya [225], Duflo [102], and Caines [57] for a
development of both linear adaptive control models, and (nonlinear) controlled Markov
chains.
The embedded regeneration time approach has been enormously significant since its
introduction by Kendall in [200, 201]. There are many more sophisticated variations
than those we shall analyze available in the literature. A good recent reference is
Asmussen [9], whilst Cohen [76] is encyclopedic.
The interested reader will find that, although we restrict ourselves to these relatively
less complicated models in illustrating the value of Markov chain modeling, virtually
all of our general techniques apply across more complex systems. As one example, note
that the stability of models which are state dependent, such as the content-dependent
storage model of Section 2.4.4, has only recently received attention [56], but using the
methods developed in later chapters it is possible to characterize it in considerable detail
[277, 279, 280].
The storage models described here can also be thought of, virtually by renaming
the terms, as models for state-dependent inventories, insurance models, and models of
the residual service in a GI/G/1 queue. To see the last of these, consider the amount of
service brought by each customer as the input to the “store” of work to be processed,
and note that the server works through this store of work at a constant rate.
The residual service can be, however, a somewhat minor quantity in a queueing
model, and in Section 3.5.4 below we develop a more complex model which is a better
representation of the dynamics of the GI/G/1 queue.
Added in second printing: In the last two years there has been a virtual explosion
in the use of general state space Markov chains in simulation methods, and especially
in Markov chain Monte Carlo methods which include Metropolis–Hastings and Gibbs
sampling techniques, which were touched on in Section 1.1 (f). Any future edition will
need to add these to the collection of models here and examine them in more detail:
the interested reader might look at the recent results [63, 290, 360, 333, 328, 256, 335],
which all provide examples of the type of chains studied in this book.
Chapter 3
Transition probabilities
As with all stochastic processes, there are two directions from which to approach the
formal definition of a Markov chain.
The first is via the process itself, by constructing (perhaps by heuristic arguments
at first, as in the descriptions in Chapter 2) the sample path behavior and the dynamics
of movement in time through the state space on which the chain lives. In some of our
examples, such as models for queueing processes or models for controlled stochastic
systems, this is the approach taken. From this structural definition of a Markov chain,
we can then proceed to define the probability laws governing the evolution of the chain.
The second approach is via those very probability laws. We define them to have
the structure appropriate to a Markov chain, and then we must show that there is
indeed a process, properly defined, which is described by the probability laws initially
constructed. In effect, this is what we have done with the forward recurrence time chain
in Section 2.4.1.
From a practitioner’s viewpoint there may be little difference between the ap-
proaches. In many books on stochastic processes, such as Çinlar [59] or Karlin and
Taylor [194], the two approaches are used, as they usually can be, almost interchange-
ably; and advanced monographs such as Nummelin [303] also often assume some of the
foundational aspects touched on here to be well understood.
Since one of our goals in this book is to provide a guide to modern general space
Markov chain theory and methods for practitioners, we give in this chapter only a sketch
of the full mathematical construction which provides the underpinning of Markov chain
theory.
However, we also have as another, and perhaps somewhat contradictory, goal the
provision of a thorough and rigorous exposition of results on general spaces, and for
these it is necessary to develop both notation and concepts with some care, even if some
of the more technical results are omitted.
Our approach has therefore been to develop the technical detail in so far as it is
relevant to specific Markov models, and where necessary, especially in techniques which
are rather more measure theoretic or general stochastic process theoretic in nature, to
refer the reader to the classic texts of Doob [99], and Chung [71], or the more recent
exposition of Markov chain theory by Revuz [326] for the foundations we need. Whilst
such an approach renders this chapter slightly less than self-contained, it is our hope
that the gaps in these foundations will be either accepted or easily filled by such external
sources.
Our main goals in this chapter are thus
(i) to demonstrate that the dynamics of a Markov chain {Φn } can be completely
defined by its one step “transition probabilities”
P (x, A) = P(Φn+1 ∈ A | Φn = x),
which are well defined for appropriate initial points x and sets A;
(ii) to develop the functional forms of these transition probabilities for many of the
specific models in Chapter 2, based in some cases on heuristic analysis of the chain
and in other cases on development of the probability laws; and
(iii) to develop some formal concepts of hitting times on sets, and the “strong Markov
property” for these and related stopping times, which will enable us to address
issues of stability and structure in subsequent chapters.
We shall start first with the formal concept of a Markov chain as a stochastic process,
and move then to the development of the transition laws governing the motion of the
chain; and complete the cycle by showing that if one starts from a set of possible
transition laws then it is possible to move from these to a chain which is well defined
and governed by these laws.
It may on the face of it seem odd to introduce quite general spaces before rather
than after topological (or more structured) spaces.
This is however quite deliberate, since (perhaps surprisingly) we rarely find the extra
structure actually increasing the ease of approach. From our point of view, we introduce
topological spaces largely because specific applied models evolve on such spaces, and
for such spaces we will give specific interpretations of our general results, rather than
extending specific topological results to more general contexts.
For example, after framing general properties of sets, we identify these general prop-
erties as holding for compact or open sets if the chain is on a topological space; or after
framing general properties of Φ, we develop the consequences of these when Φ is suitably
continuous with respect to the topology considered.
The first formal introduction of such topological concepts is given in Chapter 6, and
is exemplified by an analysis of linear and nonlinear state space models in Chapter 7.
Prior to this we concentrate on countable and general spaces: for purposes of exposi-
tion, our approach will often involve the description of behavior on a countable space,
followed by the development of analogous behavior on a general space, and completed
by specialization of results, where suitable, to more structured topological spaces in due
course.
For some readers, countable space models will be familiar: nonetheless, by develop-
ing the results first in this context, and then the analogues for the less familiar general
space processes on a systematic basis we intend to make the general context more acces-
sible. By then specializing where appropriate to topological spaces, we trust the results
will be found more applicable for, say, those models which evolve on multidimensional
Euclidean space Rk , or one of its subsets.
There is one caveat to be made in giving this description. One of the major observa-
tions for Markov chains is that in many cases, the full force of a countable space is not
needed: we merely require one “accessible atom” in the space, such as we might have
with the state {0} in the storage models in Section 2.4.1. To avoid repetition we will
often assume, especially later in the book, not the full countable space structure but
just the existence of one such point: the results then carry over with only notational
changes to the countable case.
In formalizing the concept of a Markov chain we pursue this pattern now, first
developing the countable space foundations and then moving on to the slightly more
complex basis for general space chains.
Pµ (Φ ∈ A | Φ0 = x0 ) = Px 0 (Φ ∈ A) (3.1)
where Px 0 is the probability distribution on F which is obtained when the initial distri-
bution is the point mass δx 0 at x0 .
The defining characteristic of a Markov chain is that its future trajectories depend
on its present and its past only through the current value.
To commence to formalize this, we first consider only the laws governing a trajectory
of fixed length n ≥ 1. The random variables {Φ0 , . . . , Φn }, thought of as a sequence, take
values in the space Xn+1 = X0 × · · · × Xn , the (n + 1)-fold product of copies Xi of the
countable space X, equipped with the product σ-field B(Xn +1 ) which consists again of
all subsets of Xn +1 .
The conditional probability
defined for any sequence {x0 , . . . , xn } ∈ Xn +1 and x0 ∈ X, and the initial probability
distribution µ on B(X) completely determine the distributions of {Φ0 , . . . , Φn }.
Pµ (Φ0 = x0 , Φ1 = x1 , Φ2 = x2 , . . . , Φn = xn )
= µ(x0 )Px0 (Φ1 = x1 )Px1 (Φ1 = x2 ) · · · Pxn−1 (Φ1 = xn ).   (3.3)
Writing P (x, y) := Px (Φ1 = y) for the one-step transition probabilities, this becomes
Pµ (Φ0 = x0 , Φ1 = x1 , . . . , Φn = xn )
= µ(x0 )P (x0 , x1 )P (x1 , x2 ) · · · P (xn−1 , xn ),   (3.4)
Equation (3.5) incorporates both the “loss of memory” of Markov chains and the “time
homogeneity” embodied in our definitions. It is possible to mimic this definition, asking
that the Px j (Φ1 = xj +1 ) depend on the time j at which the transition takes place; but
the theory for such inhomogeneous chains is neither so ripe nor so clean as for the chains
we study, and we restrict ourselves solely to the time-homogeneous case in this book.
For a given model we will almost always define the probability Px 0 for a fixed x0 by
defining the one-step transition probabilities for the process, and building the overall
distribution using (3.4).
This is done using a Markov transition matrix.
In the next section we show how to take an initial distribution µ and a transition matrix
P and construct a distribution Pµ so that the conditional distributions of the process
may be computed as in (3.1), and so that for any x, y,
Pµ (Φn = y | Φ0 = x) = P n (x, y) (3.8)
For this reason, P n is called the n-step transition matrix. For A ⊆ X, we also put
P n (x, A) := Σ_{y∈A} P n (x, y).
to govern the overall evolution of Φ. The formula (3.1) and the interpretation of the
transition function given in (3.8) follow immediately from this construction.
A careful construction is in Chung [71], Chapter I.2. This leads to
Theorem 3.2.1. If µ = {µ(x), x ∈ X} and P = {P (x, y), x, y ∈ X}
are an initial measure on X and a Markov transition matrix satisfying (3.6) then there
exists a Markov chain Φ on (Ω, F) with probability law Pµ satisfying
The backward recurrence time chain V − has a similarly simple structure. For any
n ∈ Z+ , let us write
p̄(n) = Σ_{j≥n+1} p(j).   (3.9)
Write M = sup{m ≥ 1 : p(m) > 0}; if M < ∞ then for this chain the state space
X = {0, 1, . . . , M − 1}; otherwise X = Z+ . In either case, for x ∈ X we have (with Y as
a generic increment variable in the renewal process)
P (x, x + 1) = P(Y > x + 1 | Y > x) = p̄(x + 1)/p̄(x),
P (x, 0) = P(Y = x + 1 | Y > x) = p(x + 1)/p̄(x),   (3.10)
and zero otherwise. Writing b(x + 1) := p(x + 1)/p̄(x), this gives a superdiagonal matrix
of the form

         b(1)   1 − b(1)      0          0       · · ·
         b(2)      0       1 − b(2)      0       · · ·
P   =    b(3)      0          0       1 − b(3)   · · ·
          ..                                      ..
           .                                       .
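For numerical work the matrix can be assembled directly from p; a minimal sketch, using
the identification b(x + 1) = p(x + 1)/p̄(x) noted above and an illustrative increment law
with M = 3:

    import numpy as np

    p = np.array([0.0, 0.2, 0.5, 0.3])     # illustrative increment law with M = 3
    pbar = 1.0 - np.cumsum(p)              # pbar(n) = P(Y > n) = sum_{j > n} p(j)
    M = 3
    P = np.zeros((M, M))                   # state space {0, 1, ..., M-1}
    for x in range(M):
        b = p[x + 1] / pbar[x]             # b(x+1) = p(x+1)/pbar(x)
        P[x, 0] = b                        # renewal: the chain returns to zero
        if x + 1 < M:
            P[x, x + 1] = 1.0 - b          # otherwise the age increases by one
    assert np.allclose(P.sum(axis=1), 1.0)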
whilst for y = 0,
P (x, 0) = P(Φ0 + W1 ≤ 0 | Φ0 = x)
= P(W1 ≤ −x)
= Γ(−∞, −x].
Γ(x) = P(Sn − Jn = x)
     = Σ_{i=0}^{∞} H(i)G(x + i).
We have rather forced the storage model into our countable space context by assuming
that the variables concerned are integer valued. We will rectify this in later sections.
For k > 1, on the other hand,
P(Nn+1 = j + k | Nn = j, Nn−1 , Nn−2 , . . . , N0 ) = 0.   (3.16)
The independence and identical distribution structure of the service times show as in
Section 2.4.2 that, no matter which previous customer was being served, and when their
service started,
P(Nn+1 = j + 1 | Nn = j, Nn−1 , Nn−2 , . . . , N0 ) = ∫_0^∞ e−µt G(dt) = p0   (3.17)
Proposition 3.3.2. For the M/G/1 queue, the sequence N∗ = {Nn∗ , n ≥ 0} can be
constructed as a Markov chain with state space Z+ and transition matrix
         q0   q1   q2   q3   q4   · · ·
         q0   q1   q2   q3   q4   · · ·
P   =    0    q0   q1   q2   q3   · · ·
         0    0    q0   q1   q2   · · ·
         ..   ..   ..   ..   ..
          .    .    .    .    .
Hence N∗ is similar to a random walk on a half line, but with a different modification
of the transitions away from zero.
P (r, q) = P(Xn +1 = q | Xn = r)
= Γ(q − αr), r, q ∈ Q.
Again, once we have P = {P (r, q), r, q ∈ Q}, we are guaranteed the existence of the
Markov chain X, using the results of Theorem 3.2.1 with P as transition probability
matrix.
This autoregression highlights immediately the shortcomings of the countable state
space structure. Although Q is countable, so that in a formal sense we can construct
a linear model satisfying (SLM1) and (SLM2) on Q in such a way that we can use
countable space Markov chain theory, it is clearly more natural to take, say, α as real
and the variable W as real valued also, so that Xn is real valued for any initial x0 ∈ R.
To model such processes, and the more complex autoregressions and nonlinear mod-
els which generalize them in Chapter 2, and which are clearly Markovian but continuous
valued in conception, we need a theory for continuous-valued Markov chains. We turn
to this now.
These are all well defined by the measurability of the integrands P ( · , · ) in the first
variable, and the fact that the kernels are measures in the second variable.
If we now extend Px^n to all of the product σ-field ⊗_{i=0}^{n} B(Xi ) in the usual way [37]
and repeat this procedure for increasing n, we find
Theorem 3.4.1. For any initial measure µ on B(X), and any transition probability
kernel P = {P (x, A), x ∈ X, A ∈ B(X)}, there exists a stochastic process Φ = {Φ0 , Φ1 , . . .}
on Ω = ∏_{i=0}^{∞} Xi , measurable with respect to F = ⊗_{i=0}^{∞} B(Xi ), and a probability
measure Pµ on F such that Pµ (B) is the probability of the event {Φ ∈ B} for B ∈ F; and
for measurable Ai ⊆ Xi , i = 0, . . . , n, and any n,
Pµ (Φ0 ∈ A0 , Φ1 ∈ A1 , . . . , Φn ∈ An )
= ∫_{y0 ∈A0} · · · ∫_{yn−1 ∈An−1} µ(dy0 )P (y0 , dy1 ) · · · P (yn−1 , An ).   (3.20)
Proof Because of the consistency of definition of the set functions Px^n , there is an
overall measure Px for which the Px^n are finite-dimensional distributions, which leads to
the result: the details are relatively standard measure-theoretic constructions, and are
given in the general case by Revuz [326], Theorem 2.8 and Proposition 2.11; whilst if the
space has a suitable topology, as in (MC1), then the existence of Φ is a straightforward
consequence of Kolmogorov’s Consistency Theorem for construction of probabilities on
topological spaces.
The details of this construction are omitted here, since it suffices for our purposes
to have indicated why transition probabilities generate processes, and to have spelled
out that the key equation (3.20) is a reasonable representation of the behavior of the
process in terms of the kernel P .
We can now formally define
We write P n for the n-step transition probability kernel {P n (x, A), x ∈ X, A ∈ B(X)}:
note that P n is defined analogously to the n-step transition probability matrix for the
countable space case.
As a first application of the construction equations (3.20) and (3.22), we have the
celebrated Chapman–Kolmogorov equations. These underlie, in one form or another,
virtually all of the solidarity structures we develop.
Theorem 3.4.2. For any m with 0 ≤ m ≤ n,
P n (x, A) = ∫_X P m (x, dy)P n−m (y, A),   x ∈ X, A ∈ B(X).   (3.23)
Px (Φmn ∈ A) = P mn (x, A).   (3.25)
This, and several other transition functions obtained from P , will be used widely in the
sequel.
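On a finite state space the Chapman–Kolmogorov equations reduce to matrix multiplica-
tion, and are easily checked numerically; a minimal sketch with an arbitrary two-state
matrix:

    import numpy as np

    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])             # an arbitrary two-state transition matrix
    n, m = 5, 2
    lhs = np.linalg.matrix_power(P, n)
    rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n - m)
    assert np.allclose(lhs, rhs)           # (3.23): P^n = P^m P^(n-m)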
This nomenclature is taken from the continuous time literature, but we will see that
in discrete time the m-skeletons and resolvents of the chain also provide a useful tool
for analysis.
There is one substantial difference in moving to the general case from the countable
case, which flows from the fact that the kernel P n can no longer be viewed as symmetric
in its two arguments.
In the general case the kernel P n operates on quite different entities from the left
and the right. As an operator P n acts on both bounded measurable functions f on X
and on σ-finite measures µ on B(X) via
P n f (x) = ∫_X P n (x, dy)f (y),        µP n (A) = ∫_X µ(dx)P n (x, A),
and we shall use the notation P n f , µP n to denote these operations. We shall also write
P n (x, f ) := ∫_X P n (x, dy)f (y) = δx P n f.
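On a finite space the two actions are simply multiplication of a matrix by a column
vector on one side and a row vector on the other; a minimal sketch:

    import numpy as np

    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])
    f = np.array([1.0, 5.0])               # a bounded function on X = {0, 1}
    mu = np.array([0.3, 0.7])              # an initial measure on X
    Pf = P @ f                             # P f(x)  = sum_y P(x, y) f(y)
    muP = mu @ P                           # mu P(A) = sum_x mu(x) P(x, A)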
Proposition 3.4.3. If Φ is a Markov chain on (Ω, F), with initial measure µ, and
h : Ω → R is bounded and measurable, then
The formulation of the Markov concept itself is made much simpler if we develop
more systematic notation for the information encompassed in the past of the process,
and if we introduce the “shift operator” on the space Ω.
For a given initial distribution, define the σ-field
FnΦ := σ(Φ0 , . . . , Φn ),
which is the smallest σ-field for which the random variables {Φ0 , . . . , Φn } are measurable.
In many cases, FnΦ will coincide with B(Xn ), although this depends in particular on the
initial measure µ chosen for a particular chain.
The shift operator θ is defined to be the mapping on Ω given by
θ({x0 , x1 , . . .}) = {x1 , x2 , . . .},
with iterates θ1 = θ, θk+1 = θ ◦ θk , k ≥ 1, so that for H = h(Φ0 , Φ1 , . . .),
θk H = h(Φk , Φk+1 , . . .).
The simple Markov property
Eµ [θn H | FnΦ ] = EΦn [H],   (3.28)
valid for any bounded measurable h and fixed n ∈ Z+ , describes the time-homogeneous
Markov property in a succinct way.
It is not always the case that FnΦ is complete: that is, contains every set of Pµ -
measure zero. We adopt the following convention as in [326]. For any initial measure
µ we say that an event A occurs Pµ -a.s. to indicate that Ac is a set contained in an
element of FnΦ which is of Pµ -measure zero.
If A occurs Px -a.s. for all x ∈ X then we write that A occurs P∗ -a.s.
(i) For any set A ∈ B(X), the occupation time ηA is the number of visits
by Φ to A after time zero, and is given by
ηA := Σ_{n=1}^{∞} I{Φn ∈ A}.
τA := min{n ≥ 1 : Φn ∈ A},
σA := min{n ≥ 0 : Φn ∈ A}
are called the first return and first hitting times on A, respectively.
τA (1) := τA ,
τA (k) := min{n > τA (k − 1) : Φn ∈ A}. (3.29)
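Along a simulated trajectory these quantities are straightforward to compute (up to
truncation at the simulation horizon); a minimal sketch for an illustrative three-state
chain:

    import numpy as np

    rng = np.random.default_rng(7)
    P = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.3, 0.5],
                  [0.4, 0.0, 0.6]])        # an illustrative three-state chain
    x, path = 1, [1]                       # start at state 1
    for n in range(1000):
        x = rng.choice(3, p=P[x])
        path.append(x)
    A = {0}
    visits = [n for n, s in enumerate(path) if s in A]
    sigma_A = visits[0]                            # first hitting time (min n >= 0)
    tau_A = next(n for n in visits if n >= 1)      # first return time (min n >= 1)
    eta_A = sum(1 for n in visits if n >= 1)       # occupation time, truncated at horizon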
In order to analyze numbers of visits to sets, we often need to consider the behavior
after the first visit τA to a set A (which is a random time), rather than behavior
after fixed times. One of the most crucial aspects of Markov chain theory is that the
“forgetfulness” properties in equation (3.20) or equation (3.27) hold, not just for fixed
times n, but for the chain interrupted at certain random times, called stopping times,
and we now introduce these ideas.
Stopping times
A function ζ : Ω → Z+ ∪ {∞} is a stopping time for Φ if for any initial
distribution µ the event {ζ = n} ∈ FnΦ for all n ∈ Z+ .
The first return and the hitting times on sets provide simple examples of stopping
times.
Proposition 3.4.4. For any set A ∈ B(X), the variables τA and σA are stopping times
for Φ.
We write L(x, A) := Px (τA < ∞)
for the return time probability to a set A starting from the state x.
The simple Markov property (3.28) holds for any bounded measurable h and fixed
n ∈ Z+ . We now extend (3.28) to stopping times.
If ζ is an arbitrary stopping time, then the fact that our time set is Z+ enables us
to define the random variable Φζ by setting Φζ = Φn on the event {ζ = n}. For a
stopping time ζ the property which tells us that the future evolution of Φ after the
stopping time depends only on the value of Φζ , rather than on any other past values,
is called the strong Markov property.
To describe this formally, we need to define the σ-field
FζΦ := {A ∈ F : {ζ = n} ∩ A ∈ FnΦ , n ∈ Z+ },
which describes events which happen “up to time ζ”.
Proposition 3.4.6. For a Markov chain Φ with discrete time parameter, the strong
Markov property always holds.
We are not always interested only in the times of visits to particular sets. Often the
quantities of interest involve conditioning on such visits being in the future.
Taboo probabilities
We define the n-step taboo probabilities as
AP n (x, B) := Px (Φn ∈ B, τA ≥ n),   x ∈ X, A, B ∈ B(X).
Γ(A) = P(Sn − Jn ∈ A)
     = ∫_0^∞ G(A/r + y/r) H(dy),   (3.37)
Zn := Σ_{i=0}^{n} Yi
are again called a delayed renewal process, with Γ0 being the distribution of the delay
described by the first variable. If Γ0 = Γ then the sequence {Zn } is again referred to
as a renewal process.
As with the integer-valued case, write Γ0 ∗ Γ for the convolution of Γ0 and Γ given
by
Γ0 ∗ Γ (dt) := ∫_0^t Γ(dt − s) Γ0 (ds) = ∫_0^t Γ0 (dt − s) Γ(ds)   (3.38)
and Γn∗ for the nth convolution of Γ with itself. By decomposing successively over the
values of the first n variables Z0 , . . . , Zn−1 we have that
P(Zn ≤ t) = Γ0 ∗ Γn∗ (−∞, t],
and so the renewal measure given by U (−∞, t] := Σ_{n=0}^{∞} Γn∗ (−∞, t] has the interpretation
U [0, t] = E0 [number of renewals in [0, t]],
and
Γ0 ∗ U [0, t] = EΓ0 [number of renewals in [0, t]],
where E0 refers to the expectation when the first renewal is at 0, and EΓ 0 refers to the
expectation when the first renewal has distribution Γ0 .
It is clear that Zn is a Markov chain: its transition probabilities are given by
P (x, A) = Γ(A − x),
and so Zn is a random walk. It is not a very stable one, however, as it moves inexorably
to infinity with each new step.
The forward and backward recurrence time chains, in contrast to the renewal process
itself, exhibit a much greater degree of stability: they grow, then they diminish, then
they grow again.
We call the process
(RT3)   V + (t) := inf(Zn − t : Zn > t, n ≥ 1),   t ≥ 0,
the forward recurrence time process; and for any δ > 0, the discrete time
chain Vδ+ = {Vδ+ (n) = V + (nδ), n ∈ Z+ } is called the forward recurrence
time δ-skeleton.
We call the process
(RT4)   V − (t) := inf(t − Zn : Zn ≤ t, n ≥ 1),   t ≥ 0,
the backward recurrence time process; and for any δ > 0, the discrete time
chain Vδ− = {Vδ− (n) = V − (nδ), n ∈ Z+ } is called the backward recurrence
time δ-skeleton.
No matter what the structure of the renewal sequence (and in particular, even if Γ
is not exponential), the forward and backward recurrence time δ-skeletons Vδ+ and Vδ−
are Markovian.
To see this for the forward chain, note that if x > δ, then the transition probabilities
P δ of Vδ+ are merely
P δ (x, {x − δ}) = 1,
whilst if x ≤ δ we have, by decomposing over the time and the index of the last
renewal in the period after the current forward recurrence time finishes, and using the
whilst we have a similarly embedded chain after the service times if the inter-arrival
time is exponential. However, the numbers in the queue, even at the arrival or departure
times, are not Markovian without such exponential assumptions.
The key step in the general case is to augment {Nn } so that we do get a Markov
model. This augmentation involves combining the information on the numbers in the
queue with the information in the residual service time.
To do this we introduce a bivariate “ladder chain” on a “ladder” space Z+ × R,
with a countable number of rungs indexed by the first variable and with each rung
constituting a copy of the real line.
This construction is in fact more general than that for the GI/G/1 queue alone, and
we shall use the ladder chain model for illustrative purposes on a number of occasions.
Define the Markov chain Φ = {Φn } on Z+ × R with motion defined by the transition
probabilities P (i, x; j × A), i, j ∈ Z+ , x ∈ R, A ∈ B(R) given by
P (i, x; j × A) = 0, j > i + 1,
P (i, x; j × A) = Λi−j +1 (x, A), j = 1, . . . , i + 1, (3.40)
P (i, x; 0 × A) = Λ∗i (x, A).
where each of the Λi , Λ∗i is a substochastic transition probability kernel on R in its own
right.
The translation invariant and “skip-free to the right” nature of the movement of
this chain, incorporated in (3.41), indicates that it is a generalization of those random
walks which occur in the GI/M/1 queue, as delineated in Proposition 3.3.1. We have
         Λ∗0   Λ0    0     0    · · ·
         Λ∗1   Λ1    Λ0    0    · · ·
P   =    Λ∗2   Λ2    Λ1    Λ0   · · ·
          ..    ..    ..    ..   ..
           .     .     .     .    .
where now the Λi , Λ∗i are substochastic transition probability kernels rather than mere
scalars.
To use this construction in the GI/G/1 context we write
Φn = (Nn , Rn ), n ≥ 1,
then Φ = {Φn ; n ∈ Z+ } can be realized as a Markov chain with the structure (3.41), as
we now demonstrate by constructing the transition kernel P explicitly.
As in (Q1)–(Q3) let H denote the distribution function of service times, and G
denote the distribution function of inter-arrival times; and let Z1 , Z2 , Z3 , . . . denote an
undelayed renewal process with Zn −Zn −1 = Sn having the service distribution function
H, as in (2.27). This differs from the process of completion points of services in that the
latter may have longer intervals when there is no customer present, after completion of
a busy cycle.
Let Rt denote the forward recurrence time in the renewal process {Zk } at time t
in this process, i.e., Rt = ZN (t)+1 − t, where N (t) = sup{n : Zn ≤ t} as in (RT3). If
R0 = x then Z1 = x. Now write
Pnt (x, [0, y]) := P(N (t) = n, Rt ≤ y | R0 = x)
for the probability that, in this renewal process, n “service times” are completed in [0, t]
and that the residual time of the current service at t is in [0, y], given R0 = x.
With these definitions it is easy to verify that the chain Φ has the form (3.41) with
the specific choice of the substochastic transition kernels Λi , Λ∗i given by
Λn (x, [0, y]) = ∫_0^∞ Pnt (x, [0, y]) G(dt)   (3.42)
and
Λ∗n (x, [0, y]) = [ Σ_{j=n+1}^{∞} Λj (x, [0, ∞)) ] H[0, y].   (3.43)
in particular cases. The general functional form which we construct here for the scalar
SNSS(F ) model of Section 2.2.1 will be used extensively, as will the techniques which
are used in constructing its form.
For any bounded and measurable function h : X → R we have from (SNSS1),
P h (x) = E[h(Xn +1 ) | Xn = x]
= E[h(F (x, W ))]
where W is a generic noise variable. Since Γ denotes the distribution of W , this becomes
P h (x) = ∫_{−∞}^{∞} h(F (x, w)) Γ(dw)
and by specializing to the case where h = IA , we see that for any measurable set A and
any x ∈ X,
P (x, A) = ∫_{−∞}^{∞} I{F (x, w) ∈ A} Γ(dw).
To construct the k-step transition probability, recall from (2.5) that the transition
maps for the SNSS(F ) model are defined by setting F0 (x) = x, F1 (x0 , w1 ) = F (x0 , w1 ),
and for k ≥ 1,
Fk+1 (x0 , w1 , . . . , wk+1 ) := F (Fk (x0 , w1 , . . . , wk ), wk+1 ),
where x0 and wi are arbitrary real numbers.
initial condition X0 = x0 and any k ∈ Z+ ,
Xk = Fk (x0 , W1 , . . . , Wk ),
which immediately implies that the k-step transition function may be expressed as
P k (x0 , A) = P(Fk (x0 , W1 , . . . , Wk ) ∈ A).
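This composition formula yields an immediate Monte Carlo scheme for the k-step kernel:
draw (W1 , . . . , Wk ), apply the maps, and average. A minimal sketch, for a hypothetical
map F and the set A = [0, ∞):

    import numpy as np

    rng = np.random.default_rng(8)

    def F(x, w):                           # a hypothetical SNSS map
        return 0.5 * np.tanh(x) + w

    def Fk(x0, ws):                        # F_k(x0, w1, ..., wk) by composition
        x = x0
        for w in ws:
            x = F(x, w)
        return x

    x0, k, N = 0.2, 5, 10_000
    draws = [Fk(x0, rng.normal(size=k)) for _ in range(N)]
    Pk_A = np.mean([x >= 0.0 for x in draws])   # estimate of P^k(x0, [0, infinity))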
3.6 Commentary
The development of foundations in this chapter is standard. The existence of the
excellent accounts in Chung [71] and Revuz [326] renders it far less necessary for us to
fill in specific details.
The one real assumption in the general case is that the σ-field B(X) is countably
generated. For many purposes, even this condition can be relaxed, using the device
of “admissible σ-fields” discussed in Orey [309], Chapter 1. We shall not require, for
the models we develop, the greater generality of non-countably generated σ-fields, and
leave this expansion of the concepts to the reader if necessary.
The Chapman–Kolmogorov equations, simple though they are, hold the key to much
of the analysis of Markov chains. The general formulation of these dates to Kolmogorov
[215]: David Kendall comments [204] that the physicist Chapman was not aware of his
role in this terminology, which appears to be due to work on the thermal diffusion of
grains in a non-uniform fluid.
The Chapman–Kolmogorov equations indicate that the family {P^n} is a semigroup of
operators, just as the corresponding matrices are, and in the general case this observation enables an approach to the theory of Markov chains through the mathematical
structures of semi-groups of operators. This has proved a very fruitful method, espe-
cially for continuous time models. However, we do not pursue that route directly in
this book, nor do we pursue the possibilities of the matrix structure in the countable
case.
This is largely because, as general non-negative operators, the P^n often do not act
on useful spaces for our purposes. The one real case where the P^n operate success-
fully on a normed space occurs in Chapter 16, and even there the space only emerges
after a probabilistic argument is completed, rather than providing a starting point for
analysis.
Foguel [122, 124] has a thorough exposition of the operator-theoretic approach to
chains in discrete time, based on their operation on L1 spaces. Vere-Jones [405, 406]
has a number of results based on the action of a matrix P as a non-negative operator
on sequence spaces suitably structured, but even in this countable case results are
limited. Nummelin [303] couches many of his results in a general non-negative operator
context, as does Tweedie [394, 395], but the methods are probabilistic rather than using
traditional operator theory.
The topological spaces we introduce here will not be considered in more detail until
Chapter 6. Very many of the properties we derive will actually need less structure
than we have imposed in our definition of “topological” spaces: often (see for example
Tuominen and Tweedie [391]) all that is required is a countably generated topology with
the T1 separability property. The assumptions we make seem unrestrictive in practice,
however, and avoid occasional technicalities of proof.
Hitting times and their properties are of prime importance in all that follows. On
a countable space Chung [71] has a detailed account of taboo probabilities, and much
of our usage follows his lead and that of Nummelin [303], although our notation differs
in minor ways from the latter. In particular our τA is, regrettably, Nummelin’s SA and
our σA is Nummelin’s TA ; our usage of τA agrees, however, with that of Chung [71] and
Asmussen [9], and we hope is the more standard.
The availability of the strong Markov property is vital for much of what follows.
Kac is reported as saying [50] that he was fortunate, for in his day all processes had the
strong Markov property: we are equally fortunate that, with a countable time set, all
chains still have the strong Markov property.
The various transition matrices that we construct are well known. The reader who
is not familiar with such concepts should read, say, Çinlar [59], Karlin and Taylor [194]
or Asmussen [9] for these and many other not dissimilar constructions in the queue-
ing and storage area. For further information on linear stochastic systems the reader
is referred to Caines [57]. The control and systems areas have concentrated more in-
tensively on controlled Markov chains which have an auxiliary input which is chosen
to control the state process Φ. Once a control is applied in this way, the “closed loop” system is again of the Markovian form studied in this book.
Irreducibility
This chapter is devoted to the fundamental concept of irreducibility: the idea that all
parts of the space can be reached by a Markov chain, no matter what the starting
point. Although the initial results are relatively simple, the impact of an appropriate
irreducibility structure will have wide-ranging consequences, and it is therefore of critical
importance that such structures be well understood.
The results summarized in Theorem 4.0.1 are the highlights of this chapter from a
theoretical point of view. An equally important aspect of the chapter is, however, to
show through the analysis of a number of models just what techniques are available in
practice to ensure the initial condition of Theorem 4.0.1 (“ϕ-irreducibility”) holds, and
we believe that these will repay equally careful consideration.
Theorem 4.0.1. If there exists an “irreducibility” measure ϕ on B(X) such that for
every state x
ϕ(A) > 0 ⇒ L(x, A) > 0 (4.1)
then there exists an essentially unique “maximal” irreducibility measure ψ on B(X) such
that
(i) for every state x we have L(x, A) > 0 whenever ψ(A) > 0, and also
(ii) if ψ(A) = 0, then ψ(Ā) = 0, where
Ā := {y : L(y, A) > 0} ;
(iii) if ψ(Ac ) = 0, then A = A0 ∪ N where the set N is also ψ-null, and the set A0 is
absorbing in the sense that
P (x, A0 ) ≡ 1, x ∈ A0 .
Proof The existence of a measure ψ satisfying the irreducibility conditions (i) and
(ii) is shown in Proposition 4.2.2, and that (iii) holds is in Proposition 4.2.3.
The term “maximal” is justified since we will see that ϕ is absolutely continuous
with respect to ψ, written ψ ≻ ϕ, for every ϕ satisfying (4.1); here absolute continuity
of ϕ with respect to ψ means that ψ(A) = 0 implies ϕ(A) = 0.
Verifying (4.1) is often relatively painless. State space models on Rk for which the
noise or disturbance distribution has a density with respect to Lebesgue measure will
typically have such a property, with ϕ taken as Lebesgue measure restricted to an open
set (see Section 4.4, or in more detail, Chapter 7); chains with a regeneration point
α reached from everywhere will satisfy (4.1) with the trivial choice of ϕ = δα (see
Section 4.3).
The extra benefit of defining much more accurately the sets which are avoided by
“most” points, as in Theorem 4.0.1 (ii), or of knowing that one can omit ψ-null sets and
restrict oneself to an absorbing set of “good” points as in Theorem 4.0.1 (iii), is then of
surprising value, and we use these properties again and again. These are however far
from the most significant consequences of the seemingly innocuous assumption (4.1):
far more will flow in Chapter 5, and thereafter.
The most basic structural results for Markov chains, which lead to this formalization
of the concept of irreducibility, involve the analysis of communicating states and sets. If
one can tell which sets can be reached with positive probability from particular starting
points x ∈ X, then one can begin to have an idea of how the chain behaves in the longer
term, and then give a more detailed description of that longer term behavior.
Our approach therefore commences with a description of communication between
sets and states which precedes the development of irreducibility.
Consider, for example, the random walk on a half line, defined by
Φ_n = [Φ_{n−1} + W_n]^+.   (4.2)
In this example, we might single out the set {0} and ask: can the chain ever reach the
state {0}?
It is transparent from the definition of P (x, 0) that {0} can be reached with positive
probability, and in one step, provided the distribution Γ of the increment {Wn } has an
4.1. Communication and irreducibility: Countable spaces 77
infinite negative tail. But suppose we have, not such a long tail, but only P(Wn < 0) > 0,
with, say,
Γ(w) = δ > 0   (4.4)
for some w < 0. Then we have for any x that after n ≥ |x/w| steps,
P_x(Φ_n = 0) ≥ P(W_1 = w, W_2 = w, . . . , W_n = w) = δ^n > 0,
so that {0} is always reached with positive probability.
On the other hand, if P(Wn < 0) = 0 then it is equally clear that {0} cannot be
reached with positive probability from any starting point other than 0. Hence L(x, 0) >
0 for all states x or for none, depending on whether (4.4) holds or not.
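This geometric-trials bound is easy to check by simulation. The following sketch, with an assumed increment law putting mass δ on w = −1, estimates the probability of reaching {0} from a fixed starting point within a finite horizon; the particular support and probabilities are illustrative only.

```python
# Simulation sketch: when P(W = -1) = delta > 0, the chain
# Phi_n = [Phi_{n-1} + W_n]^+ reaches {0} from any x with positive
# probability.  The increment law below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
increments = np.array([-1, 0, 2])          # support of W
probs = np.array([0.3, 0.3, 0.4])          # P(W = -1) = delta = 0.3

def hits_zero(x, horizon=500):
    phi = x
    for _ in range(horizon):
        phi = max(phi + rng.choice(increments, p=probs), 0)
        if phi == 0:
            return True
    return False

x0 = 10
est = np.mean([hits_zero(x0) for _ in range(2000)])
print(f"estimated L({x0}, {{0}}) over 500 steps: {est:.3f}")
# The crude bound in the text already gives L(x,{0}) >= delta^x > 0.
```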
But we might also focus on points other than {0}, and it is then possible that a
number of different sorts of behavior may occur, depending on the distribution of W .
If we have P(W = y) > 0 for all y ∈ Z then from any state there is positive probability
of Φ reaching any other state at the next step. But suppose we have the distribution
of the increments {Wn } concentrated on the even integers, with
P(W = 2y) > 0, P(W = 2y + 1) = 0, y ∈ Z,
and consider any odd valued state, say w. In this case w cannot be reached from any
even valued state, even though from w itself it is possible to reach every state with
positive probability, via transitions of the chain through {0}.
Thus for this rather trivial example, we already see X breaking into two subsets with
substantially different behavior: writing Z0+ = {2y, y ∈ Z+ } and Z1+ = {2y + 1, y ∈ Z+ }
for the set of non-negative even and odd integers respectively, we have
Z+ = Z0+ ∪ Z1+ ,
and from y ∈ Z1+ , every state may be reached, whilst for y ∈ Z0+ , only states in Z0+ may
be reached with positive probability.
Why are these questions of importance?
As we have already seen, the random walk on a half line above is one with many
applications: recall that the transition matrices of N = {N_n} and N* = {N_n*}, the
chains introduced in Section 2.4.2 to describe the number of customers in GI/M/1 and
M/G/1 queues, have exactly the structure described by (4.3).
The question of reaching {0} is then clearly one of considerable interest, since it rep-
resents exactly the question of whether the queue will empty with positive probability.
Equally, the fact that when {Wn } is concentrated on the even integers (representing
some degenerate form of batch arrival process) we will always have an even number of
customers has design implications for the number of servers (do we always want to have
two?), waiting rooms and the like.
But our efforts should and will go into finding conditions to preclude such oddities,
and we turn to these in the next section, where we develop the concepts of communi-
cation and irreducibility in the countable space context.
Proposition 4.1.1. The relation “↔” is an equivalence relation, and so the equivalence
classes C(x) = {y : x ↔ y} cover X, with x ∈ C(x).
When states do not all communicate, then although each state in C(x) communicates
with every other state in C(x), it is possible that there are states y ∈ [C(x)]c such that
x → y. This happens, of course, if and only if C(x) is not absorbing.
Suppose that X is not irreducible for Φ. If we reorder the states according to the
equivalence classes defined by the communication operation, and if we further order the
classes with absorbing classes coming first, then we have a decomposition of P such as
that depicted in Figure 4.1.
Here, for example, the blocks C(1), C(2) and C(3) correspond to absorbing classes,
and block D contains those states which are not contained in an absorbing class. In the
extreme case, a state in D may communicate only with itself, although it must lead to
some other state from which it does not return. We can write this decomposition as
X = ⋃_{x∈I} C(x) ∪ D   (4.5)
P =
    ⎛ C(1)    0      0     0 ⎞
    ⎜   0    C(2)    0     0 ⎟
    ⎜   0     0     C(3)   0 ⎟
    ⎝   ∗     ∗      ∗     D ⎠

Figure 4.1: The block decomposition of P, with the absorbing classes C(1), C(2), C(3) on the diagonal and the non-absorbing states collected in the final block D.
Proof We merely need to note that the elements of PC are positive, and
Σ_{y∈C} P(x, y) ≡ 1,   x ∈ C,
because C is absorbing: the existence of ΦC then follows from Theorem 3.2.1, and
irreducibility of ΦC is an obvious consequence of the communicating class structure of
C.
Thus for non-irreducible chains, we can analyze at least the absorbing subsets in the
decomposition (4.5) as separate chains.
The virtue of the block decomposition described above lies largely in this assur-
ance that any chain on a countable space can be studied assuming irreducibility. The
“irreducible absorbing” pieces C(x) can then be put together to deduce most of the
properties of a reducible chain.
Only the behavior of the remaining states in D must be studied separately, and
in analyzing stability D may often be ignored. For let J denote the indices of the
states for which the communicating classes are not absorbing. If the chain starts in
D = ⋃_{y∈J} C(y), then one of two things happens: either it reaches one of the absorbing
sets C(x), x ∈ I, in which case it gets absorbed; or, as the only other alternative,
the chain leaves every finite subset of D and “heads to infinity”.
To see why this might hold, observe that, for any fixed y ∈ J, there is some state
z ∈ C(y) with P (z, [C(y)]c ) = δ > 0 (since C(y) is not an absorbing class), and
P m (y, z) = β > 0 for some m > 0 (since C(y) is a communicating class). Suppose that
in fact the chain returns a number of times to y: then, on each of these returns, one
has a probability greater than βδ of leaving C(y) exactly m + 1 steps later, and this
probability is independent of the past due to the Markov property.
Now, as is well known, if one tosses a coin with probability of a head given by βδ
infinitely often, then one eventually actually gets a head: similarly, one eventually leaves
the class C(y), and because of the nature of the relation x ↔ y, one never returns.
Repeating this argument for any finite set of states in D indicates that the chain
leaves such a finite set with probability one.
There are a number of things that need to be made more rigorous in order for this
argument to be valid: the forgetfulness of the chain at the random time of returning
to y, giving the independence of the trials, is a form of the strong Markov property in
Proposition 3.4.6, and the so-called “geometric trials argument” must be formalized, as
we will do in Proposition 8.3.1 (iii).
Basically, however, this heuristic sketch is sound, and shows the directions in which
we need to go: we find absorbing irreducible sets, and then restrict our attention to
them, with the knowledge that the remainder of the states lead to clearly understood
and (at least from a stability perspective) somewhat irrelevant behavior.
Queueing models
Consider the number of customers N in the GI/M/1 queue. As shown in Proposi-
tion 3.3.1, we have P (x, x + 1) = p0 > 0, and so the structure of N ensures that by
iteration, for any x > 0
P^x(0, x) ≥ P(0, 1) P(1, 2) · · · P(x − 1, x) = [p_0]^x > 0.
But we also have P (x, 0) > 0 for any x ≥ 0: hence we conclude that for any pair
x, y ∈ X, we have
P^{y+1}(x, y) ≥ P(x, 0) P^y(0, y) > 0.
Thus the chain N is irreducible no matter what the distribution of the inter-arrival
times.
A similar approach shows that the embedded chain N∗ of the M/G/1 queue is always
irreducible.
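The same reachability argument can be verified mechanically on a finite truncation of such a transition matrix; in the sketch below the entries standing in for p_0 and P(x, 0) are illustrative values, not derived from any particular inter-arrival distribution.

```python
# Reachability check on a finite truncation of a GI/M/1-type matrix:
# every state reaches every other because P(x, x+1) = p0 > 0 and
# P(x, 0) > 0.  The numerical entries are illustrative assumptions.
import numpy as np

n = 6
P = np.zeros((n, n))
for x in range(n - 1):
    P[x, x + 1] = 0.4                     # p0 > 0: one more customer
    P[x, 0] = 0.6                         # positive chance of emptying
P[n - 1, 0], P[n - 1, n - 1] = 0.6, 0.4   # crude boundary at truncation

def irreducible(P):
    """Check x -> y for all pairs by accumulating boolean powers of P."""
    reach = (P > 0).astype(int)
    acc = reach.copy()
    for _ in range(P.shape[0]):
        acc = ((acc + acc @ reach) > 0).astype(int)
    return bool(acc.all())

print("irreducible on the truncated space:", irreducible(P))
```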
4.2 ψ-Irreducibility
4.2.1 The concept of ϕ-irreducibility
We now wish to develop similar concepts of irreducibility on a general space X. The
obvious problem with extending the ideas of Section 4.1.2 is that we cannot define an
analogue of “↔”, since, although we can look at L(x, A) to decide whether a set A
is reached from a point x with positive probability, we cannot say in general that we
return to single states x.
This is particularly the case for models such as the linear models for which the
n-step transition laws typically have densities; and even for some of the models such
as storage models where there is a distinguished reachable point, there are usually no
other states to which the chain returns with positive probability.
This means that we cannot develop a decomposition such as (4.5) based on a count-
able equivalence class structure: and indeed the question of existence of a so-called
“Doeblin decomposition”
X = ⋃_{x∈I} C(x) ∪ D,   (4.8)
with the sets C(x) being a countable collection of absorbing sets in B(X) and the
“remainder” D being a set which is in some sense ephemeral, is a non-trivial one.
We shall not discuss such reducible decompositions in this book although, remarkably,
under a variety of reasonable conditions such a countable decomposition does hold for
chains on quite general state spaces.
Rather than developing this type of decomposition structure, it is much more fruitful
to concentrate on irreducibility analogues. The one which forms the basis for much
modern general state space analysis is ϕ-irreducibility.
this is a special case of the resolvent of Φ introduced in Section 3.4.2, which we
consider in Section 5.5.1 in more detail. The kernel K_{a_{1/2}} defines for each x a probability
measure equivalent to I(x, A) + U(x, A) = Σ_{n=0}^∞ P^n(x, A), which may be infinite for
many sets A.
(ii) for all x ∈ X, whenever ϕ(A) > 0, there exists some n > 0, possibly depending on
both A and x, such that P^n(x, A) > 0;
Proof   The only point that needs to be proved is that if L(x, A) > 0 for all x ∈ A^c
then, since L(x, A) = P(x, A) + ∫_{A^c} P(x, dy) L(y, A), we have L(x, A) > 0 for all x ∈ X:
thus the inclusion of the zero-time term in K_{a_{1/2}} does not affect the irreducibility.
This is clearly rather weaker than normal irreducibility on countable spaces, which
demands two-way communication. Thus we now look to measures which are extensions,
not restrictions, of irreducibility measures, and show that the ϕ-irreducibility condition
extends in such a way that, if we do have an irreducible chain in the sense of Section 4.1,
then the natural irreducibility measure (namely counting measure) is generated as a
“maximal” irreducibility measure.
The maximal irreducibility measure will be seen to define the range of the chain much
more completely than some of the other more arbitrary (or pragmatic) irreducibility
measures one may construct initially.
(i) Φ is ψ-irreducible;
(ii) for any other measure ϕ′, the chain Φ is ϕ′-irreducible if and only if ψ ≻ ϕ′;
Proof Since any probability measure which is equivalent to the irreducibility mea-
sure ϕ is also an irreducibility measure, we can assume without loss of generality that
ϕ(X) = 1. Consider the measure ψ constructed as
ψ(A) := ∫_X ϕ(dy) K_{a_{1/2}}(y, A).   (4.10)
It is obvious that ψ is also a probability measure on B(X). To prove that ψ has all the
required properties, we use the sets
" #
k
n −1
Ā(k) = y : P (y, A) > k .
n =1
The stated properties now involve repeated use of the Chapman–Kolmogorov equations.
To see (i), observe that when ψ(A) > 0, then from (4.10) there exists some k such
that ϕ(Ā(k)) > 0, since Ā(k) ↑ {y : Σ_{n≥1} P^n(y, A) > 0} = X. For any fixed x, by
ϕ-irreducibility there is thus some m such that P^m(x, Ā(k)) > 0. Then we have
Σ_{n=1}^{k} P^{m+n}(x, A) = ∫_X P^m(x, dy) Σ_{n=1}^{k} P^n(y, A) ≥ k^{−1} P^m(x, Ā(k)) > 0,
so that Φ is ψ-irreducible.
Next let ϕ′ be such that Φ is ϕ′-irreducible. If ϕ′(A) > 0, we have Σ_n P^n(y, A) > 0
for all y, and by its definition ψ(A) > 0, whence ψ ≻ ϕ′. Conversely, suppose that
the chain is ψ-irreducible and that ψ ≻ ϕ′. If ϕ′(A) > 0 then ψ(A) > 0 also, and by
ψ-irreducibility it follows that K_{a_{1/2}}(x, A) > 0 for any x ∈ X. Hence Φ is ϕ′-irreducible,
as required in (ii).
Result (iv) follows from the construction (4.10) and the fact that any two maximal
irreducibility measures are equivalent, which is a consequence of (ii).
Finally, we have for each m that
∫_X ψ(dy) P^m(y, A) 2^{−m} = ∫_X ϕ(dy) Σ_n P^{m+n}(y, A) 2^{−(n+m+1)} ≤ ψ(A),
so that if ψ(A) = 0, then ∫_X ψ(dy) P^m(y, A) = 0 for every m, and hence ψ(Ā) = 0.
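On a finite state space the construction (4.10) can be carried out in closed form, since K_{a_{1/2}} = Σ_n 2^{−(n+1)} P^n = ½(I − P/2)^{−1}. The sketch below starts from the "pragmatic" irreducibility measure ϕ = δ_0 and produces a maximal measure ψ charging every reachable state; the three-state kernel is an illustrative assumption.

```python
# Finite-state sketch of (4.10): psi = phi K_{a_{1/2}} is maximal.
# Here K_{a_{1/2}} = sum_n 2^{-(n+1)} P^n = (1/2)(I - P/2)^{-1},
# computable in closed form; the 3-state kernel P is illustrative.
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

K = 0.5 * np.linalg.inv(np.eye(3) - 0.5 * P)   # resolvent K_{a_{1/2}}

phi = np.array([1.0, 0.0, 0.0])    # a "pragmatic" irreducibility measure:
                                   # point mass at state 0
psi = phi @ K                      # the maximal measure of (4.10)
print("psi =", psi)                # charges every state the chain reaches
assert (psi > 0).all()             # here B+(X) is all nonempty subsets
```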
Although there are other approaches to irreducibility, we will generally restrict our-
selves, in the general space case, to the concept of ϕ-irreducibility; or rather, we will
seek conditions under which it holds. We will consistently use ψ to denote an arbitrary
maximal irreducibility measure for Φ.
ψ-Irreducibility notation
(ii) We write
B⁺(X) := {A ∈ B(X) : ψ(A) > 0}
for the sets of positive ψ-measure; the equivalence of maximal irreducibility measures means that B⁺(X) is uniquely defined.
The following result indicates the links between absorbing and full sets. This result
seems somewhat academic, but we will see that it is often the key to showing that very
many properties hold for ψ-almost all states.
Proof If A is absorbing, then were ψ(Ac ) > 0, it would contradict the definition
of ψ as an irreducibility measure: hence A is full.
Suppose now that A is full, and set
B = { y ∈ X : Σ_{n=0}^∞ P^n(y, A^c) = 0 },
which is positive: but this is impossible, and thus B is the required absorbing set.
Proposition 4.2.4. Suppose that A is an absorbing set. Let PA denote the kernel P
restricted to the states in A. Then there exists a Markov chain Φ_A whose state space
is A and whose transition kernel is given by P_A. Moreover, if Φ is ψ-irreducible then
ΦA is ψ-irreducible.
The effect of these two propositions is to guarantee the effective analysis of restric-
tions of chains to full sets, and we shall see that this is indeed a fruitful avenue of
approach.
Accessibility
We say that a set B ∈ B(X) is accessible from another set A ∈ B(X) if
L(x, B) > 0 for every x ∈ A.
We say that a set B ∈ B(X) is uniformly accessible from another set
A ∈ B(X) if there exists a δ > 0 such that
inf_{x∈A} U_B(x, B) > δ,
where U_A( · , · ) is the kernel introduced in (3.34); when this holds we write A ⇝ B.
Lemma 4.2.5. If A ⇝ B and B ⇝ C, then A ⇝ C.
Proof   Since the probability of ever reaching C is greater than the probability of
ever reaching C after the first visit to B, we have
inf_{x∈A} U_C(x, C) ≥ inf_{x∈A} ∫_B U_B(x, dy) U_C(y, C) ≥ inf_{x∈A} U_B(x, B) · inf_{y∈B} U_C(y, C) > 0
as required.
We shall use the following notation to describe the communication structure of the
chain.
Communicating sets
The set Ā := {x ∈ X : L(x, A) > 0} is the set of points from which A is
accessible.
The set Ā(m) := { x ∈ X : Σ_{n=1}^{m} P^n(x, A) ≥ m^{−1} }.
The set A0 := {x ∈ X : L(x, A) = 0} = [Ā]c is the set of points from which
A is not accessible.
Lemma 4.2.6. The set Ā = ⋃_m Ā(m), and for each m we have Ā(m) ⇝ A.
Proof The first statement is obvious, whilst the second follows by noting that for
all x ∈ Ā(m) we have
L(x, A) ≥ P_x(τ_A ≤ m) ≥ m^{−2}.
It follows that if the chain is ψ-irreducible, then we can find a countable cover of
X with sets from which any other given set A in B⁺(X) is uniformly accessible, since
Ā = X in this case.
Proof   The necessity of (4.12) is trivial. Conversely, suppose for some δ, ε > 0,
Γ(−∞, −ε) > δ. Then for any n with n > x/ε,
P^n(x, {0}) ≥ δ^n > 0.
If C = [0, c] for some c, then this implies for all x ∈ C that
P_x(τ_0 ≤ c/ε) ≥ δ^{1+c/ε},
so that C ⇝ {0} as in Lemma 4.2.6.
It is often as simple as this to establish ϕ-irreducibility: it is not a difficult condition
to confirm, or rather, it is often easy to set up “grossly sufficient” conditions such as
(4.12) for ϕ-irreducibility.
Such a construction guarantees ϕ-irreducibility, but it does not tell us very much
about the motion of the chain. There are clearly many sets other than {0} which the
chain will reach from any starting point. To describe them in this model we can easily
construct the maximal irreducibility measure. By considering the motion of the chain
after it reaches {0} we see that Φ is also ψ-irreducible, where
ψ(A) = Σ_n P^n(0, A) 2^{−n};
Provided there is some probability that no input takes place over a period long enough
to ensure that the effect of the increment Sn is eroded, we will achieve δ0 -irreducibility
in one step. This amounts to saying that we can “turn off” the input for a period longer
than s whenever the last input amount was s, or that we need a positive probability of
the input remaining turned off for longer than s/r. One sufficient condition for this is
obviously that the distribution H have infinite tails.
Such a construction may fail without the type of conditions imposed here. If, for
example, the input times are deterministic, occurring at every integer time point, and
if the input amounts are always greater than unity, then we will not have an irreducible
system: in fact we will have, in the terms of Chapter 9 below, an evanescent system
which always avoids compact sets below the initial state.
An underlying structure as pathological as this seems intuitively implausible, of
course, and is in any case easily analyzed. But in the case of content-dependent release
rules, it is not so obvious that the chain is always ϕ-irreducible. If we assume
R(x) = ∫_0^x [r(y)]^{−1} dy < ∞
as in (2.32), then again if we can "turn off" the input process for longer than R(x) we
will hit {0}; so if we have
G(R(x), ∞) > 0
for all x, then we have a δ_0-irreducible model. But if we allow R(x) = ∞, as we may wish
to do for some release rules where r(x) → 0 slowly as x → 0, which is not unrealistic,
then even if the inter-input times Ti have infinite tails, this simple construction will fail.
The empty state will never be reached, and some other approach is needed if we are to
establish ϕ-irreducibility.
In such a situation, we will still get µ_Leb-irreducibility, where µ_Leb is Lebesgue measure, if the inter-input times T_i have a density with respect to µ_Leb: this can be determined by modifying the "turning off" construction above. Exact conditions for
ϕ-irreducibility in the completely general case appear to be unknown to date.
Φ_{k+1} = Φ_k + W_{k+1},
and satisfying the assumption (RW1), let us suppose the increment distribution Γ of
{W_n} has an absolutely continuous part with respect to Lebesgue measure µ_Leb on R,
with a density γ which is positive and bounded from zero at the origin; that is, for some
β > 0, δ > 0,
P(W_n ∈ A) ≥ ∫_A γ(x) dx,
and
γ(x) ≥ δ > 0,   |x| < β.
Set C = {x : |x| ≤ β/2}: if B ⊆ C and x ∈ C, then
P(x, B) = P(W_1 ∈ B − x) ≥ ∫_{B−x} γ(y) dy ≥ δ µ_Leb(B).
But now, exactly as in the previous example, from any x we can reach C in at most n =
2|x|/β steps with positive probability, so that µ_Leb restricted to C forms an irreducibility
measure for the unrestricted random walk.
Such behavior might not hold without a density. Suppose we take Γ concentrated
on the rationals Q, with Γ(r) > 0, r ∈ Q. After starting at a value r ∈ Q the chain Φ
“lives” on the set {r + q, q ∈ Q} = Q so that Q is absorbing. But for any x ∈ R the
set {x + q, q ∈ Q} = x + Q is also absorbing, and thus we can produce, for this random
walk on R, an uncountably infinite number of absorbing irreducible sets.
It is precisely this type of behavior we seek to exclude for chains on a general space,
by introducing the concepts of ψ-irreducibility above.
Y_n = α_1 Y_{n−1} + α_2 Y_{n−2} + · · · + α_k Y_{n−k} + W_n,
This same argument applies to the general model (2.1) if the zeros of the polynomial
A(z) = 1 − α_1 z − · · · − α_k z^k lie outside the closed unit disk in the complex plane C.
In this case Yn → 0 as n → ∞ when Wn is set equal to zero, and from this observation
it follows that it is possible for the chain to reach [−1, 1] at some time in the future
from every initial condition. If some root of A(z) lies within the open unit disk in C
then again “explosion” will occur and the chain will not be irreducible.
Our argument here is rather like that in the dam model, where we considered de-
terministic behavior with the input "turned off". We need to be able to drive the chain
deterministically towards a center of the space, and then to ensure that the random
mechanism makes the behavior of the chain from initial conditions in that center
comparable.
We formalize this for multidimensional linear models in the rest of this section.
C_n := [F^{n−1}G | · · · | FG | G]   (4.13)
is called the controllability matrix, and the pair of matrices (F, G) is called
controllable if the controllability matrix Cn has rank n.
It is a consequence of the Cayley–Hamilton Theorem, which states that any power F^k
is equal to a linear combination of {I, F, . . . , F^{n−1}}, where n is equal to the dimension
of F (see [57] for details), that (F, G) is controllable if and only if
[F^{k−1}G | · · · | FG | G]
has rank n for some k, in which case one may take k = n.
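The rank computation is immediate numerically; a brief sketch, for an assumed pair (F, G), building C_n and testing the rank condition:

```python
# Build the controllability matrix C_n = [F^{n-1}G | ... | FG | G] and
# test its rank.  The pair (F, G) is an illustrative assumption.
import numpy as np

def controllability_matrix(F, G):
    n = F.shape[0]
    blocks = []
    Fi_G = G
    for _ in range(n):               # G, FG, ..., F^{n-1}G
        blocks.append(Fi_G)
        Fi_G = F @ Fi_G
    return np.hstack(blocks[::-1])   # ordered [F^{n-1}G | ... | FG | G]

F = np.array([[0.9, 1.0],
              [0.0, 0.9]])
G = np.array([[0.0],
              [1.0]])

Cn = controllability_matrix(F, G)
print("rank C_n =", np.linalg.matrix_rank(Cn), "of n =", F.shape[0])
# By Cayley-Hamilton, blocks F^k G with k >= n cannot raise the rank.
```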
Proposition 4.4.1. The linear control model LCM(F ,G) is controllable if the pair
(F, G) satisfies the rank condition (LCM3).
Proof When this rank condition holds it is straightforward that in the LCM(F ,G)
model any state can be reached from any initial condition in k steps using some control
sequence (u_1, . . . , u_k); for we have, by iteration,
x_k = F^k x_0 + [F^{k−1}G | · · · | FG | G] (u_1, . . . , u_k)^⊤,   (4.14)
and the rank condition implies that the range space of the matrix [F^{k−1}G | · · · | FG | G]
is equal to R^n.
This gives us as an immediate application
Proposition 4.4.2. The autoregressive AR(k) model may be described by a linear con-
trol model (LCM1), which can always be constructed so that it is controllable.
Proof For the linear control model associated with the autoregressive model de-
scribed by (2.1), the state process x is defined inductively by
x_n =
    ⎛ α_1  α_2  · · ·  α_k ⎞           ⎛ 1 ⎞
    ⎜  1    0   · · ·   0  ⎟           ⎜ 0 ⎟
    ⎜       ⋱              ⎟ x_{n−1} + ⎜ ⋮ ⎟ u_n ,
    ⎝  0   · · ·   1    0  ⎠           ⎝ 0 ⎠
with
η_j = Σ_{i=1}^{k} α_i η_{j−i}.
The triangular structure of the controllability matrix now implies that the linear control
system associated with the AR(k) model is controllable.
If the dimension p of the noise were the same as the dimension n of the space, and if
the matrix G were full rank, then the argument for scalar models in Section 4.4 would
immediately imply that the chain is µ_Leb-irreducible. In more general situations we use
controllability to ensure that the chain is µ_Leb-irreducible.
Proposition 4.4.3. Suppose that the LSS(F ,G) model is Gaussian and the associated
control model is controllable.
Then the LSS(F ,G) model is ϕ-irreducible for any non-trivial measure ϕ which
possesses a density on Rn , Lebesgue measure is a maximal irreducibility measure, and
for any compact set A and any set B with positive Lebesgue measure we have A ⇝ B.
Proof   If we can prove that the distribution P^k(x, · ) is absolutely continuous with
respect to Lebesgue measure, and has a density which is everywhere positive on R^n, it
will follow that for any ϕ which is non-trivial and also possesses a density, P^k(x, · ) ≻ ϕ
for all x ∈ R^n: for any such ϕ the chain is then ϕ-irreducible. This argument also shows
that Lebesgue measure is a maximal irreducibility measure for the chain.
Under condition (LSS3), for each deterministic initial condition x_0 ∈ X = R^n, the
distribution of X_k is also Gaussian for each k ∈ Z+ by linearity, and so we need only
to prove that P^k(x, · ) is not concentrated on some lower-dimensional subspace of R^n.
This will happen if and only if the variance of the distribution P^k(x, · ) is of full rank
for each x.
We can compute the mean and variance of X_k to obtain conditions under which this
occurs. Using (4.14) and (LSS3), for each initial condition x_0 ∈ X the conditional mean
of X_k is easily computed as
E[X_k | X_0 = x_0] = F^k x_0,
and the conditional variance as
Σ_k := Σ_{i=0}^{k−1} F^i G G′ (F′)^i,   (4.16)
which does not depend on x_0.
Using (4.16), the variance of X_k has full rank n for some k if and only if the controllability
grammian, defined as
Σ_{i=0}^{∞} F^i G G′ (F′)^i,   (4.17)
has rank n. From the Cayley–Hamilton Theorem again, the conditional variance of X_k
has rank n for some k if and only if the pair (F, G) is controllable and, if this is the
case, then one can take k = n.
Under (LSS1)–(LSS3), it thus follows that the k-step transition function possesses
a smooth density; we have P^k(x, dy) = p_k(x, y) dy, where
p_k(x, y) = (2π)^{−n/2} |Σ_k|^{−1/2} exp{ −(1/2) (y − F^k x)′ Σ_k^{−1} (y − F^k x) }   (4.18)
and |Σ_k| denotes the determinant of the matrix Σ_k. Hence P^k(x, · ) has a density which
is everywhere positive, as required, and this implies finally that for any compact set A
and any set B with positive Lebesgue measure we have A ⇝ B.
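The proof reduces to a finite matrix computation which is easily carried out numerically. The sketch below accumulates Σ_k = Σ_{i<k} F^i G G′ (F′)^i for the illustrative pair (F, G) used above and confirms that full rank is attained at k = n exactly as the controllability argument predicts.

```python
# Compute Sigma_k = sum_{i<k} F^i G G' (F')^i for the Gaussian LSS(F,G)
# model, where P^k(x, .) = N(F^k x, Sigma_k); controllability makes
# Sigma_n full rank, so the density (4.18) is everywhere positive.
# F and G are the illustrative pair used earlier.
import numpy as np

F = np.array([[0.9, 1.0],
              [0.0, 0.9]])
G = np.array([[0.0],
              [1.0]])
n = F.shape[0]

def sigma_k(F, G, k):
    S = np.zeros((n, n))
    Fi = np.eye(n)
    for _ in range(k):               # accumulate F^i G G' (F')^i
        S += Fi @ G @ G.T @ Fi.T
        Fi = F @ Fi
    return S

for k in (1, 2):
    print(f"rank Sigma_{k} =", np.linalg.matrix_rank(sigma_k(F, G, k)))
# rank Sigma_1 = 1 < n, while rank Sigma_2 = 2 = n: one may take k = n.
```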
Assuming, as we do in the result above, that W has a density which is everywhere
positive is clearly something of a sledgehammer approach to obtaining ψ-irreducibility,
even though it may be widely satisfied. We will introduce more delicate methods in
Chapter 7 which will allow us to relax the conditions of Proposition 4.4.3.
Even if (F, G) is not controllable, we can obtain an irreducible process, by
appropriate restriction of the space on which the chain evolves, under the Gaussian
assumption. To define this formally, we let X0 ⊂ X denote the range space of the
controllability matrix:
X_0 = R( [F^{n−1}G | · · · | FG | G] ) = { Σ_{i=0}^{n−1} F^i G w_i : w_i ∈ R^p }.
4.5 Commentary
The communicating class concept was introduced in the initial development of countable
chains by Kolmogorov [216] and used systematically by Feller [114] and Chung [71] in
developing solidarity properties of states in such a class.
The use of ψ-irreducibility as a basic tool for general chains was essentially developed
by Doeblin [93, 95], and followed up by many authors, including Doob [99], Harris [155],
Chung [70], Orey [308]. Much of their analysis is considered in greater detail in later
chapters. The maximal irreducibility measure was introduced by Tweedie [394], and the
result on full sets is given in the form we use by Nummelin [303]. Although relatively
simple, these results have wide-ranging implications.
Other notions of irreducibility exist for general state space Markov chains. One can,
for example, require that the transition probabilities
K_{a_{1/2}}(x, ·) = Σ_{n=0}^{∞} P^n(x, ·) 2^{−(n+1)}
all have the same null sets. In this case the maximal measure ψ will be equivalent to
K_{a_{1/2}}(x, ·) for every x. This was used by Nelson [291] and Šidák [353] to derive solidarity
properties for general state space chains similar to those we will consider in Part II. This
condition, though, is hard to check, since one needs to know the structure of P^n(x, ·)
in some detail; and it appears too restrictive for the minor gains it leads to.
In the other direction, one might weaken ϕ-irreducibility by requiring only that,
whenever ϕ(A) > 0, we have Σ_n P^n(x, A) > 0 for ϕ-almost all x ∈ X. Whilst
this expands the class of “irreducible” models, it does not appear to be noticeably more
useful in practice, and has the drawback that many results are much harder to prove
as one tracks the uncountably many null sets which may appear. Revuz [326], Chapter
3, has a discussion of some of the results of using this weakened form.
The existence of a block decomposition of the form
X = ⋃_{x∈I} C(x) ∪ D
such as that for countable chains, where the sum is of disjoint irreducible sets and D is
in some sense ephemeral, has been widely studied. A recent overview is in Meyn and
Tweedie [281], and the original ideas go back, as so often, to Doeblin [95], after whom
such decompositions are named. Orey [309], Chapter 9, gives a very accessible account
of the measure-theoretic approach to the Doeblin decomposition.
Application of results for ψ-irreducible chains has become more widespread recently,
but the actual usage has suffered a little because of the somewhat inadequate available
discussion in the literature of practical methods of verifying ψ-irreducibility. Typically
the assumptions are far too restrictive, as is the case in assuming that innovation pro-
cesses have everywhere positive densities or that accessible regenerative atoms exist (see
for example Laslett et al. [237] for simple operations research models, or Tong [388] in
time series analysis).
The detailed analysis of the linear model begun here illustrates one of the recur-
ring themes of this book: the derivation of stability properties for stochastic models
by consideration of the properties of analogous controlled deterministic systems. The
methods described here have surprisingly complete generalizations to nonlinear mod-
els. We will come back to this in Chapter 7 when we characterize irreducibility for the
NSS(F ) model using ideas from nonlinear control theory.
Pseudo-atoms
Much Markov chain theory on a general state space can be developed in complete
analogy with the countable state situation when X contains an atom for the chain Φ.
Atoms
A set α ∈ B(X) is called an atom for Φ if there exists a measure ν on B(X)
such that
P (x, A) = ν(A), x ∈ α.
A single point α is always an atom. Clearly, when X is countable and the chain is
irreducible then every point is an accessible atom.
On a general state space, accessible atoms are less frequent. For the random walk
on a half line as in (RWHL1), the set {0} is an accessible atom when Γ(−∞, 0) > 0:
as we have seen in Proposition 4.3.1, this chain has ψ({0}) > 0. But for the random
walk on R when Γ has a density, accessible atoms do not exist.
It is not too strong to say that the single result which makes general state space
Markov chain theory as powerful as countable space theory is that there exists an
“artificial atom” for ϕ-irreducible chains, even in cases such as the random walk with
absolutely continuous increments. The highlight of this chapter is the development of
this result, and some of its immediate consequences.
Atoms are found for “strongly aperiodic” chains by constructing a “split chain” Φ̌
evolving on a split state space X̌ = X0 ∪ X1 , where X0 and X1 are copies of the state
space X, in such a way that
(i) the chain Φ is the marginal chain of Φ̌, in the sense that P(Φk ∈ A) = P(Φ̌k ∈
A_0 ∪ A_1) for appropriate initial distributions, and
(ii) the bottom level X_1 contains an accessible atom for Φ̌.
The existence of a splitting of the state space in such a way that the bottom level is an
atom is proved in the next section. The proof requires the existence of so-called "small
sets" C, which have the property that there exists an m > 0 and a minorizing measure
ν on B(X) such that for any x ∈ C,
P^m(x, B) ≥ ν(B),   B ∈ B(X).
For ψ-irreducible chains there is moreover a countable cover
X = ⋃_i C_i,
where each C_i is small: thus we have that the splitting is always possible for such chains.
Another non-trivial consequence of the introduction of small sets is that on a general
space we have a finite cyclic decomposition for ψ-irreducible chains: there is a cycle of
sets Di , i = 0, 1, . . . , d − 1 such that
X = N ∪ ⋃_{i=0}^{d−1} D_i,
where ψ(N) = 0 and P(x, D_i) ≡ 1 for x ∈ D_{i−1} (mod d). A more general and more
tractable class of sets called petite sets are introduced in Section 5.5: these are used
extensively in the sequel, and in Theorem 5.5.7 we show that every petite set is small
if the chain is aperiodic.
Proposition 5.1.2. If L(x, A) > 0 for some state x ∈ α, where α is an atom, then
α ⇝ A.
In many cases the “atoms” in a state space will be real atoms: that is, single points
which are reached with positive probability.
Consider the level in a dam in any of the storage models analyzed in Section 4.3.2.
It follows from Proposition 4.3.1 that the single point {0} forms an accessible atom
satisfying the hypotheses of Proposition 5.1.1, even when the input and output processes
are continuous.
However, our reason for featuring atoms is not because some models have singletons
which can be reached with probability one: it is because even in the completely general
ψ-irreducible case, by suitably extending the probabilistic structure of the chain, we are
able to artificially construct sets which have an atomic structure and this allows much
of the critical analysis to follow the form of the countable chain theory.
This unexpected result is perhaps the major innovation in the analysis of general
Markov chains in the last two decades. It was discovered in slightly different forms,
independently and virtually simultaneously, by Nummelin [301] and by Athreya and
Ney [13].
Although the two methods are almost identical in a formal sense, in what follows we
will concentrate on the Nummelin splitting, touching only briefly on the Athreya–Ney
random renewal time method as it fits less well into the techniques of the rest of this
book.
Minorization condition
For some δ > 0, some C ∈ B(X) and some probability measure ν with
ν(C^c) = 0 and ν(C) = 1,
P(x, A) ≥ δ ν(A),   x ∈ C, A ∈ B(X).   (5.2)
The form (5.2) ensures that the chain has probabilities uniformly bounded below
by multiples of ν for every x ∈ C. The crucial question is, of course, whether any
chains ever satisfy the minorization condition. This is answered in the positive in
Theorem 5.2.2 below: for ϕ-irreducible chains “small sets” for which the minorization
condition holds exist, at least for the m-skeleton. The existence of such small sets is
a deep and difficult result: by indicating first how the minorization condition provides
the promised atomic structure to a split chain, we motivate rather more strongly the
development of Theorem 5.2.2.
In order to construct a split chain, we split both the space and all measures that
are defined on B(X).
We first split the space X itself by writing X̌ = X × {0, 1}, where X0 := X × {0} and
X1 := X × {1} are thought of as copies of X equipped with copies B(X0 ), B(X1 ) of the
σ-field B(X).
We let B(X̌) be the σ-field of subsets of X̌ generated by B(X0 ), B(X1 ): that is, B(X̌)
is the smallest σ-field containing sets of the form A0 :=A×{0}, A1 :=A×{1}, A ∈ B(X).
We will write xi , i = 0, 1 for elements of X̌, with x0 denoting members of the upper
level X0 and x1 denoting members of the lower level X1 . In order to describe more
easily the calculations associated with moving between the original and the split chain,
we will also sometimes call X0 the copy of X, and we will say that A ∈ B(X) is a copy
of the corresponding set A0 ⊆ X0 .
If λ is any measure on B(X), then the next step in the construction is to split the
measure λ into two measures on each of X0 and X1 by defining the measure λ∗ on B(X̌)
through
λ*(A_0) = λ(A ∩ C)[1 − δ] + λ(A ∩ C^c),
λ*(A_1) = λ(A ∩ C) δ,   (5.3)
where δ and C are the constant and the set in (5.2). Note that in this sense the splitting
is dependent on the choice of the set C, and although in general the set chosen is not
relevant, we will on occasion need to make explicit the set in (5.2) when we use the split
chain.
It is critical to note that λ is the marginal measure induced by λ∗ , in the sense that
for any A in B(X) we have
λ∗ (A0 ∪ A1 ) = λ(A). (5.4)
In the case when A ⊆ C c , we have λ∗ (A0 ) = λ(A); only subsets of C are really effectively
split by this construction.
Now the third, and most subtle, step in the construction is to split the chain Φ to
form a chain Φ̌ which lives on (X̌, B(X̌)). Define the split kernel P̌(x_i, A) for x_i ∈ X̌ and
A ∈ B(X̌) by
P̌(x_0, · ) = P(x, · )*,   x_0 ∈ X_0 \ C_0;   (5.5)
P̌(x_0, · ) = [1 − δ]^{−1} [P(x, · )* − δ ν*( · )],   x_0 ∈ C_0;   (5.6)
P̌(x_1, · ) = ν*( · ),   x_1 ∈ X_1,   (5.7)
where C, δ and ν are the set, the constant and the measure in the minorization condition.
Outside C the chain {Φ̌n } behaves just like {Φn }, moving on the “top” half X0 of
the split space. Each time it arrives in C, it is “split”; with probability 1 − δ it remains
in C0 , with probability δ it drops to C1 . We can think of this splitting of the chain as
tossing a δ-weighted coin to decide which level to choose on each arrival in the set C
where the split takes place.
When the chain remains on the top level its next step has the modified law (5.6).
That (5.6) is always non-negative follows from (5.2). This is the sole use of the mi-
norization condition, although without it this chain cannot be defined.
Note here the whole point of the construction: the bottom level X1 is an atom,
with ϕ*(X_1) = δ ϕ(C) > 0 whenever the chain Φ is ϕ-irreducible. By (5.3) we have
P̌^n(x_i, X_1 \ C_1) = 0 for all n ≥ 1 and all x_i ∈ X̌, so that the atom C_1 ⊆ X_1 is the only
part of the bottom level which is reached with positive probability. We will use the
notation
α̌ := C1 (5.8)
when we wish to emphasize the fact that all transitions out of C1 are identical, so that
C1 is an atom in X̌.
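The mechanics of the split chain are easy to simulate for a kernel which satisfies (5.2) by construction, namely one of the mixture form P(x, ·) = δν(·) + (1 − δ)Q(x, ·) on C, so that the residual law (5.6) is exactly Q. All concrete laws in the sketch below are illustrative assumptions.

```python
# Minimal simulation sketch of the split chain for a kernel satisfying
# (5.2) by construction: P(x, .) = delta*nu(.) + (1 - delta)*Q(x, .)
# for x in C, so the residual law (5.6) is exactly Q.  The set C, the
# constant delta, and the laws nu and Q are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
delta, C = 0.2, (-1.0, 1.0)

def sample_nu():
    return rng.uniform(*C)               # nu(C) = 1: mass only on C

def sample_Q(x):
    return 0.5 * x + rng.normal()        # residual kernel Q(x, .)

def split_step(x):
    """One step of the split chain; marginally this is one step of P."""
    if C[0] <= x <= C[1] and rng.random() < delta:
        return sample_nu(), True         # dropped to the atom alpha-check
    return sample_Q(x), False            # top level, law (5.6)

x, regen = 5.0, 0
for _ in range(10_000):
    x, hit = split_step(x)
    regen += hit
print("visits to the atom in 10000 steps:", regen)
```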
Proof (i) From the linearity of the splitting operation we only need to check the
equivalence in the special case of λ = δx , and k = 1. This follows by direct computation.
We analyze two cases separately.
Suppose first that x ∈ C c . Then, by (5.5) and (5.4),
∫_X̌ δ_x*(dy_i) P̌(y_i, A_0 ∪ A_1) = P̌(x_0, A_0 ∪ A_1) = P(x, A).
On the other hand suppose x ∈ C. Then, from (5.6), (5.7) and (5.4) again,
∫_X̌ δ_x*(dy_i) P̌(y_i, A_0 ∪ A_1)
  = (1 − δ) P̌(x_0, A_0 ∪ A_1) + δ P̌(x_1, A_0 ∪ A_1)
  = (1 − δ) [1 − δ]^{−1} [P*(x, A_0 ∪ A_1) − δ ν*(A_0 ∪ A_1)] + δ ν*(A_0 ∪ A_1)
  = P(x, A).
(ii) If the split chain is ϕ∗ -irreducible it is straightforward that the original chain
is ϕ-irreducible from (i). The converse follows from the fact that α̌ is an accessible
atom if ϕ(C) > 0, which is easy to check, and Proposition 5.1.1.
The following identity will prove crucial in later development. For any measure µ
on B(X) we have
∫_X̌ µ*(dx_i) P̌(x_i, · ) = [ ∫_X µ(dx) P(x, · ) ]*   (5.10)
or, using operator notation, µ*P̌ = (µP)*. This follows from the definition of the * operation and the transition function P̌, and is in effect a restatement of Theorem 5.1.3 (i).
Since it is only the marginal chain Φ which is really of interest, we will usually
consider only sets of the form Ǎ = A_0 ∪ A_1, where A ∈ B(X), and we will largely restrict
ourselves to functions on X̌ of the form fˇ(x_i) = f(x), where f is some function on X;
that is, fˇ is identical on the two copies of X. By (5.9) we have, for any k, any initial
distribution λ, and any function fˇ identical on X_0 and X_1,
E_{λ*}[fˇ(Φ̌_k)] = E_λ[f(Φ_k)].
The details are, however, slightly less easy than for the approach we give above although
there are some other advantages to the approach through (5.11): the interested reader
should consult Nummelin [303] for more details.
The construction of a split chain is of some value in the next several chapters,
although much of the analysis will be done directly using the small sets of the next
section. The Nummelin splitting technique will, however, be central in our approach to
the asymptotic results of Part III.
To construct τ , let Φ run until it hits C; from (5.12) this happens eventually with
probability one. The time and place of first hitting C will be, say, k and x. Then with
probability δ, distribute Φ_{k+1} according to ν; and with probability 1 − δ, distribute
Φ_{k+1} according to Q(x, · ) := [1 − δ]^{−1} [P(x, · ) − δ ν( · )]:
from (5.2) Q is a probability measure, as in (5.6). Repeat this procedure each time
Φ enters C; since this happens infinitely often from (5.12) (a fact yet to be proven in
Chapter 9), and each time there is an independent probability δ of choosing ν, it is
intuitively clear that sooner or later this version of Φk is chosen. Let the time when it
occurs be τ . Then Px (τ < ∞) = 1 and (5.13) clearly holds; and (5.13) says that τ is a
regeneration time for the chain.
The two constructions are very close in spirit: if we consider the split chain con-
struction then we can take the random time τ as τα̌ , which is identical to the hitting
time on the bottom level of the split space.
There are advantages to both approaches, but the Nummelin splitting does not re-
quire the recurrence assumption (5.12), and more pertinently, it exploits the rather deep
fact that some m-skeleton always obeys the minorization condition when ψ-irreducibility
holds, as we now see.
Small sets
A set C ∈ B(X) is called a small set if there exists an m > 0, and a
non-trivial measure ν_m on B(X), such that for all x ∈ C, B ∈ B(X),
P^m(x, B) ≥ ν_m(B).   (5.14)
When (5.14) holds we say that C is ν_m-small.
The central result (Theorem 5.2.2 below), on which a great deal of the subsequent
development rests, is that for a ψ-irreducible chain, every set A ∈ B⁺(X) contains
a small set in B⁺(X). As a consequence, every ψ-irreducible chain admits some m-
skeleton which can be split, and for which the atomic structure of the split chain can
be exploited.
In order to prove this result, we need for the first time to consider the densities of
the transition probability kernels. Being a probability measure on (X, B(X)) for each
individual x and each n, the transition probability kernel P n (x, ·) admits a Lebesgue
decomposition into its absolutely continuous and singular parts, with respect to any
finite non-trivial measure φ on B(X): we have, for any fixed x and B ∈ B(X),
P^n(x, B) = ∫_B p^n(x, y) φ(dy) + P^n_⊥(x, B).   (5.15)
Theorem 5.2.1. Suppose φ is a σ-finite measure on (X, B(X)). Suppose A is any set
in B(X) with φ(A) > 0 such that
φ(B) > 0, B ⊆ A ⇒ Σ_{k=1}^{∞} P^k(x, B) > 0,   x ∈ A.
Then, for every n, the function pn defined in (5.15) can be chosen to be a measurable
function on X2 , and there exists C ⊆ A, m > 1, and δ > 0 such that φ(C) > 0 and
p^m(x, y) > δ,   x, y ∈ C.   (5.16)
Proof We include a detailed proof because of the central place small sets hold in
the development of the theory of ψ-irreducible Markov chains. However, the proof is
somewhat complex, and may be omitted without interrupting the flow of understanding
at this point.
It is a standard result that the densities pn (x, y) of P n (x, · ) with respect to φ exist
for each x ∈ X, and are unique except for definition on φ-null sets. We first need to
verify that
(i) the densities pn (x, y) can be chosen jointly measurable in x and y, for each n;
(ii) the densities pn (x, y) can be chosen to satisfy an appropriate form of the
Chapman–Kolmogorov property, namely for n, m ∈ Z+ , and all x, z
p^{n+m}(x, z) ≥ ∫_X p^n(x, y) p^m(y, z) φ(dy).   (5.17)
To see (i), we appeal to the fact that B(X) is assumed countably generated. This means
that there exists a sequence {Bi ; i ≥ 1} of finite partitions of X, such that Bi+1 is a
refinement of Bi , and which generate B(X). Fix x ∈ X, and let Bi (x) denote the element
in Bi with x ∈ Bi (x).
For each i, the functions
p^1_i(x, y) =
    0,                          φ(B_i(y)) = 0,
    P(x, B_i(y)) / φ(B_i(y)),   φ(B_i(y)) > 0
are non-negative, and are clearly jointly measurable in x and y. The Basic Differenti-
ation Theorem for measures (cf. Doob [99], Chapter 7, Section 8) now assures us that
for y outside a φ-null set N ,
p^1_∞(x, y) = lim_{i→∞} p^1_i(x, y)   (5.18)
One can now check (see Orey [309] p. 6) that the collection {p^n(x, y), x, y ∈ X, n ∈ Z+}
satisfies both (i) and (ii).
We next verify (5.16). The constraints on φ in the statement of Theorem 5.2.1 imply
that
Σ_{n=1}^{∞} p^n(x, y) > 0,   x ∈ A, a.e. y ∈ A [φ];
We suppress the notational dependence on η from now on, since η is fixed for the
remainder of the proof.
For any x, y, set Bi (x, y) = Bi (x) × Bi (y), where Bi (x) is again the element con-
taining x of the finite partition Bi above. By the Basic Differentiation Theorem as in
(5.18), this time for measures on B(X) × B(X), there are φ2 -null sets Nk ⊆ X × X such
that for any k and (x, y) ∈ Ak \Nk ,
≥ [η²/2] φ(B_j(v)) ≥ δ_1, say.   (5.24)
To finish the proof, note that since φ(En ) > 0, there is an integer k and a set C ⊆ Dm
with P k (x, En ) > δ2 > 0, for all x ∈ C. It then follows from the construction of the
densities above that for all x, z ∈ C
p^{k+n+m}(x, z) ≥ ∫_{E_n} P^k(x, dy) p^{n+m}(y, z) ≥ δ_1 δ_2,
Proof When Φ is ψ-irreducible, every set in B + (X) satisfies the conditions of The-
orem 5.2.1, with the measure φ = ψ. The result then follows immediately from (5.16).
As a direct corollary of this result we have
Theorem 5.2.3. If Φ is ψ-irreducible, then the minorization condition holds for some
m-skeleton, and for every K_{a_ε}-chain, 0 < ε < 1.
Any Φ which is ψ-irreducible is well endowed with small sets from Theorem 5.2.1,
even though it is far from clear from the initial definition that this should be the case.
Given the existence of just one small set from Theorem 5.2.2, we now show that it is
further possible to cover the whole of X with small sets in the ψ-irreducible case.
(ii) Since Φ is ψ-irreducible, there exists a ν_m-small set C ∈ B⁺(X) from Theo-
rem 5.2.2. Moreover from the definition of ψ-irreducibility the sets
For spread-out random walks, we find that small sets are in general relatively easy
to find.
Proposition 5.3.1. If Φ is a spread-out random walk, with Γ^{n*} non-singular with
respect to µ_Leb, then there is a neighborhood C_β = {x : |x| ≤ β} of the origin which is
ν_{2n}-small, where ν_{2n}( · ) = ε µ_Leb( · ∩ [s, t]) for some interval [s, t] and some ε > 0.
Proof   Since Γ is spread out, we have for some bounded non-negative function γ
with ∫ γ(x) dx > 0, and some n > 0,
P^n(0, A) ≥ ∫_A γ(x) dx,   A ∈ B(R),
and hence
P^{2n}(0, A) ≥ ∫_A γ ∗ γ(x) dx;
but since from Lemma D.4.3 the convolution γ ∗ γ(x) is continuous and not identically
zero, there exists an interval [a, b] and a δ with γ ∗ γ(x) ≥ δ on [a, b]. Choose β = [b − a]/4,
and [s, t] = [a + β, b − β], to prove the result using the translation invariance
of the random walk.
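The key analytic step, that γ ∗ γ is continuous and hence bounded below on some interval, is easy to visualize numerically; in the sketch below the density component γ is an illustrative assumption.

```python
# Numerical sketch of the convolution step: gamma * gamma of a bounded
# density component is continuous, so it is bounded below on an interval
# [a, b], which yields the small set.  gamma here is an illustrative
# (substochastic) density component of a spread-out increment law.
import numpy as np

grid = np.linspace(-3, 3, 1201)
h = grid[1] - grid[0]
gamma = np.where(np.abs(grid - 1.0) < 0.5, 0.4, 0.0)   # density component

conv = np.convolve(gamma, gamma) * h                   # gamma * gamma
conv_grid = np.linspace(2 * grid[0], 2 * grid[-1], conv.size)
support = conv_grid[conv > 0.05]
print(f"gamma*gamma >= 0.05 roughly on [{support.min():.2f}, {support.max():.2f}]")
```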
For spread out random walks, a far stronger irreducibility result will be provided in
Chapter 6: there we will show that if Φ is a random walk with spread-out increment
distribution Γ, with Γ(−∞, 0) > 0, Γ(0, ∞) > 0, then Φ is µL e b -irreducible, and every
compact set is a small set.
P(i, x; j × A) = 0,   j > i + 1,
P(i, x; j × A) = Λ_{i−j+1}(x, A),   j = 1, . . . , i + 1,
P(i, x; 0 × A) = Λ*_i(x, A),
where
Λ_n(x, [0, y]) = ∫_0^∞ P_n^t(x, y) G(dt),   (5.29)
Λ*_n(x, [0, y]) = [ Σ_{j=n+1}^∞ Λ_j(x, [0, ∞)) ] H[0, y],   (5.30)
and since
Λ_0(x, [0, ∞)) = ∫ G(dt) P(0 ≤ t < σ_1 | R_0 = x) = ∫ G(dt) I{t < x} = G(−∞, x],
we have
Λ*_0(x, [0, · ]) = H[0, · ] G(x, ∞).
The result follows immediately, since for x < β, Λ*_0(x, [0, · ]) ≥ H[0, · ] G(β, ∞).
Proof   As in (5.28), since Γ is spread out there exists n ∈ Z+, an interval [a, b], and
a constant β > 0 such that, for some k,
P_0(Z_n ∈ du) ≥ β du,   u ∈ [kδ, (k + 4)δ] ⊆ [a, b].   (5.32)
Now choose m ≥ 1 such that Γ[mδ, (m + 1)δ) = γ > 0; and set M = k + m + 2. Then
for x ∈ [0, δ), by considering the occurrence of the nth renewal where n is the index so
that (5.32) holds we find
P_x(V^+(Mδ) ∈ du ∩ [0, δ))
  ≥ P_0(x + Z_{n+1} − Mδ ∈ du ∩ [0, δ), Y_{n+1} ≥ δ)
  = ∫_{y∈[δ,∞)} Γ(dy) P_0(x + y − Mδ + Z_n ∈ du ∩ [0, δ))   (5.33)
  ≥ ∫_{y∈[mδ,(m+1)δ)} Γ(dy) P_0(Z_n ∈ du ∩ {[0, δ) − x − y + Mδ}).
Hence [0, δ) is a small set, and the measure ν can be chosen as a multiple of Lebesgue
measure over [0, δ).
In this proof we have demanded that (5.32) holds for u ∈ [kδ, (k + 4)δ] and in (5.34)
we only used the fact that the equation holds for u ∈ [kδ, (k + 3)δ]. This is not an
oversight: we will use the larger range in showing in Proposition 5.4.5 that the chain is
also aperiodic.
P^k(x_0, · ) = N( F^k x_0, Σ_{i=0}^{k−1} F^i G G′ (F′)^i );   (5.36)
and if (F, G) is controllable then from (4.18) the n-step transition function possesses a
smooth density p_n(x, y) which is continuous and everywhere positive on R^{2n}. It follows
from continuity that for any pair of bounded open balls B_1 and B_2 ⊂ R^n, there exists
ε > 0 such that
p_n(x, y) ≥ ε,   (x, y) ∈ B_1 × B_2.
Letting νn denote the normalized uniform distribution on B2 we see that B1 is νn -small.
This shows that for the controllable, Gaussian LSS(F ,G) model, all compact subsets
of the state space are small.
Here, if we start in x then we have P^n(x, x) > 0 if and only if n = 0, d, 2d, . . ., and the
chain Φ is said to cycle through the states of X.
On a continuous state space the same phenomenon can be constructed equally easily:
let X = [0, d), let U_i denote the uniform distribution on [i, i + 1), and define
P(x, · ) = U_{i+1 (mod d)}( · ),   x ∈ [i, i + 1).
In this example, the chain again cycles through a fixed finite number of sets. We now
prove a series of results which indicate that, no matter how complex the behavior of a
ψ-irreducible chain, or a chain on an irreducible absorbing set, the finite cyclic behavior
of these examples is typical of the worst behavior to be found.
This does not guarantee that P^{md(α)}(α, α) > 0 for all m, but it does imply P^n(α, α) =
0 unless n = md(α) for some m.
We call d(α) the period of α. The result we now show is that the value of d(α) is
common to all states y in the class C(α) = {y : α ↔ y}, rather than taking a separate
value for each y.
Proposition 5.4.1. Suppose α has period d(α): then for any y ∈ C(α), d(α) = d(y).
Proof   Since α ↔ y, we can find m and n such that P^m(α, y) > 0 and P^n(y, α) > 0.
By the Chapman–Kolmogorov equations we have P^{m+n}(α, α) ≥ P^m(α, y) P^n(y, α) > 0,
so that m + n is a multiple of d(α); moreover, for any k with P^k(y, y) > 0,
P^{m+k+n}(α, α) ≥ P^m(α, y) P^k(y, y) P^n(y, α) > 0, so that m + k + n is also a multiple
of d(α). Hence, unless k is a multiple of d(α),
we have P^k(y, y) = 0, which proves d(y) ≥ d(α). Reversing the roles of α and y shows
d(α) ≥ d(y), which gives the result.
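For a finite chain the period is computable directly from its definition as a greatest common divisor, and the solidarity asserted by Proposition 5.4.1 can be observed; the three-state kernel below is an illustrative chain of period 3.

```python
# Compute d(x) = g.c.d.{ n >= 1 : P^n(x, x) > 0 } for each state of a
# finite chain, checking that the value is common to the communicating
# class.  The 3-state kernel P is an illustrative chain of period 3.
import numpy as np
from math import gcd

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

def period(P, x, n_max=50):
    d, Pn = 0, np.eye(P.shape[0])
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[x, x] > 0:
            d = gcd(d, n)       # gcd(0, n) = n starts the accumulation
    return d

print([period(P, x) for x in range(3)])   # -> [3, 3, 3]
```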
This result leads to a further decomposition of the transition probability matrix for
an irreducible chain; or, equivalently, within a communicating class.
Proposition 5.4.2. Let Φ be an irreducible Markov chain on a countable space, and
let d denote the common period of the states in X. Then there exist disjoint sets
D1 , . . . , Dd ⊆ X such that
X = ⋃_{k=1}^{d} D_k,
and
P(x, D_{k+1}) = 1,   x ∈ D_k, k = 0, . . . , d − 1 (mod d).   (5.39)
Proof   Fix one state α ∈ X, and for each y ∈ X choose M = M(y) such that
P^M(y, α) > 0.
Let k be any other integer such that P^k(α, y) > 0. Then P^{k+M}(α, α) > 0, and thus
k + M = jd for some j; equivalently, k = jd − M. Now M is fixed, and so we must
have P^k(α, y) > 0 only for k in the sequence {r, r + d, r + 2d, . . .}, where the integer
r = r(y) ∈ {1, . . . , d} is uniquely defined for y.
Call Dr the set of states which are reached with positive probability from α only
at points in the sequence {r, r + d, r + 2d, . . .} for each r ∈ {1, 2, . . . , d}. By definition
α ∈ D_d, and P(α, D_1^c) = 0 so that P(α, D_1) = 1. Similarly, for any y ∈ D_r we have
P(y, D_{r+1}^c) = 0, giving our result.
The sets {Di } covering X and satisfying (5.39) are called cyclic classes, or a d-cycle,
of Φ. With probability one, each sample path of the process Φ “cycles” through values
in the sets D1 , D2 , . . . , Dd , D1 , D2 , . . ..
Diagrammatically, we have shown that we can write an irreducible transition prob-
ability matrix in “super-diagonal” form
P =
    ⎛  0    P_1   0    · · ·     0     ⎞
    ⎜  0     0   P_2   · · ·     0     ⎟
    ⎜  ⋮               ⋱               ⎟
    ⎜  0     0    0    · · ·  P_{d−1}  ⎟
    ⎝ P_d    0    0    · · ·     0     ⎠
where each block Pi is a square matrix whose dimension may depend upon i.
Aperiodicity
An irreducible chain on a countable space X is called aperiodic if d = 1, and strongly aperiodic if P(x, x) > 0 for some x ∈ X.
Whilst cyclic behavior can certainly occur, as illustrated in the examples at the
beginning of this section, and the periodic behavior of the control systems in Theo-
rem 7.3.3 below, most of our results will be given for aperiodic chains. The justification
for using such chains is contained in the following, whose proof is obvious.
Suppose now that Φ is ψ-irreducible, and let C ∈ B^+(X) be a ν_M-small set with ν(C) > 0. Let

    E_C := {n ≥ 1 : C is ν_n-small, with ν_n = δ_n ν for some δ_n > 0}

be the set of time points for which C is a small set with minorizing measure proportional to ν. Notice that for B ⊆ C, n, m ∈ E_C implies
    P^{n+m}(x, B) ≥ ∫_C P^m(x, dy) P^n(y, B) ≥ [δ_m δ_n ν(C)] ν(B),    x ∈ C;
so that E_C is closed under addition. Thus there is a natural “period” for the set C, given by the greatest common divisor of E_C; and from Lemma D.7.4, C is ν_{nd}-small for all large enough n.
We show that this value is in fact a property of the whole chain Φ, and is independent
of the particular small set chosen, in the following analogue of Proposition 5.4.2.
Theorem 5.4.4. Suppose that Φ is a ψ-irreducible Markov chain on X. Let C ∈ B^+(X) be a ν_M-small set and let d be the greatest common divisor of the set E_C. Then there exist disjoint sets D_1, . . . , D_d ∈ B(X) (a “d-cycle”) such that

(i) for x ∈ D_i, P(x, D_{i+1}) = 1, i = 0, . . . , d − 1 (mod d);

(ii) the set N = [⋃_{i=1}^d D_i]^c is ψ-null.
The d-cycle {D_i} is maximal in the sense that for any other collection {d′, D_k′, k = 1, . . . , d′} satisfying (i)–(ii), we have d′ dividing d; whilst if d = d′ then, by reordering the indices if necessary, D_i = D_i′ a.e. ψ.
Proof For i = 1, . . . , d, set D_i^* := {y : Σ_n P^{nd−i}(y, C) > 0}; by irreducibility, X = ⋃ D_i^*.
The D_i^* are in general not disjoint, but we can show that their intersection is ψ-null. For suppose there exist i, k such that ψ(D_i^* ∩ D_k^*) > 0. Then for some fixed m, n > 0, there is a subset A ⊆ D_i^* ∩ D_k^* with ψ(A) > 0 such that

    P^{md−i}(y, C) ≥ δ_m > 0,    P^{nd−k}(y, C) ≥ δ_n > 0,    y ∈ A;    (5.41)
and since ψ is the maximal irreducibility measure, we can also find r such that
    ∫_C ν(dy) P^r(y, A) = δ_c > 0.    (5.42)
Now we use the fact that C is a ν_M-small set: for x ∈ C, B ⊆ C, from (5.41), (5.42),

    P^{2M+md−i+r}(x, B) ≥ ∫_C P^M(x, dy) ∫_A P^r(y, dw) ∫_C P^{md−i}(w, dz) P^M(z, B)
                        ≥ [δ_c δ_m] ν(B),
This result shows that it is clearly desirable to work with strongly aperiodic chains. Regrettably, this condition is not satisfied in general, even for simple chains; and we will often have to prove results for strongly aperiodic chains and then use special methods to extend them to general chains through the m-skeleton or the K_{a_ε}-chain.
We will however concentrate almost exclusively on aperiodic chains. In practice this
is not greatly restrictive, since we have as in the countable case
Proposition 5.4.6. Suppose Φ is a ψ-irreducible chain with period d and d-cycle {D_i, i = 1, . . . , d}. Then each of the sets D_i is an absorbing ψ-irreducible set for the chain Φ^d corresponding to the transition probability kernel P^d, and Φ^d on each D_i is aperiodic.
As an example, consider again the forward recurrence time chain V_δ^+ defined in Section 3.5.3. Here, we can find explicit conditions for aperiodicity even though the chain has no atom in the space. We have
Proposition 5.4.7. If F is spread out, then V_δ^+ is aperiodic for sufficiently small δ.
Proof In Proposition 5.3.3 we showed that for sufficiently small δ, the set [0, δ) is a ν_M-small set, where ν is a multiple of Lebesgue measure restricted to [0, δ].
But since the bounds on the densities in (5.35) hold, not just for the range [kδ, (k + 3)δ) for which they were used, but by construction for the greater range [kδ, (k + 4)δ), the same proof shows that [0, δ) is a ν_{M+1}-small set also, and thus aperiodicity follows from the definition of the period of V_δ^+ as the g.c.d. in (5.40).
5.5 Petite sets and sampled chains

Sampled chains

For a distribution, or probability measure, a = {a(n)} on Z_+, we define the sampled chain with transition kernel

    K_a(x, A) := Σ_{n=0}^∞ P^n(x, A) a(n),    x ∈ X, A ∈ B(X).

If a = a_ε is the geometric sampling distribution given by a_ε(n) = (1 − ε)ε^n for some 0 < ε < 1, then the kernel K_{a_ε} is the resolvent K_ε which was defined in Chapter 3. The concept
of sampled chains immediately enables us to develop useful conditions under which one
set is uniformly accessible from another. We say that a set B ∈ B(X) is uniformly
accessible using a from another set A ∈ B(X) if there exists a δ > 0 such that

    inf_{x∈A} K_a(x, B) ≥ δ,    (5.44)

and when (5.44) holds we write A ⇝_a B.
Lemma 5.5.1. If A ⇝_a B for some distribution a, then A ⇝ B.
Lemma 5.5.2. (i) If a and b are distributions on Z_+, then the sampled chains with transition laws K_a and K_b satisfy the generalized Chapman–Kolmogorov equations

    K_{a∗b}(x, A) = ∫ K_a(x, dy) K_b(y, A).    (5.46)

(ii) If A ⇝_a B and B ⇝_b C, then A ⇝_{a∗b} C.

(iii) If a is a distribution on Z_+, then the sampled chain with transition law K_a satisfies the relation

    U(x, A) ≥ ∫ U(x, dy) K_a(y, A).    (5.47)
Proof To see (i), observe that by definition and the Chapman–Kolmogorov equation

    K_{a∗b}(x, A) = Σ_{n=0}^∞ P^n(x, A) (a ∗ b)(n)
                  = Σ_{n=0}^∞ Σ_{m=0}^n P^n(x, A) a(m) b(n − m)
                  = Σ_{n=0}^∞ Σ_{m=0}^n ∫ P^m(x, dy) P^{n−m}(y, A) a(m) b(n − m)
                  = Σ_{m=0}^∞ ∫ P^m(x, dy) a(m) Σ_{n=m}^∞ P^{n−m}(y, A) b(n − m)
                  = ∫ K_a(x, dy) K_b(y, A),    (5.48)
as required.
The result (ii) follows directly from (5.46) and the definitions.
For (iii), note that for fixed m, n,

    P^{m+n}(x, A) a(n) = ∫ P^m(x, dy) P^n(y, A) a(n)
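On a finite state space the kernels K_a are finite matrices, and the identity (5.46) can be checked directly. A small numerical sketch (with an arbitrary randomly generated transition matrix; illustrative only):

    import numpy as np

    rng = np.random.default_rng(1)
    P = rng.random((4, 4)); P /= P.sum(axis=1, keepdims=True)

    def K(P, a):
        # a is a dict {n: a(n)} of sampling weights summing to one
        Pn = {0: np.eye(len(P))}
        for n in range(1, max(a) + 1):
            Pn[n] = Pn[n - 1] @ P
        return sum(w * Pn[n] for n, w in a.items())

    def convolve(a, b):
        # the convolution a*b of two sampling distributions
        c = {}
        for m, am in a.items():
            for k, bk in b.items():
                c[m + k] = c.get(m + k, 0.0) + am * bk
        return c

    a = {1: 0.5, 3: 0.5}
    b = {0: 0.25, 2: 0.75}
    # numerical check of (5.46): K_{a*b} = K_a K_b
    assert np.allclose(K(P, convolve(a, b)), K(P, a) @ K(P, b))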
Petite sets
We will call a set C ∈ B(X) ν_a-petite if the sampled chain satisfies the bound

    K_a(x, B) ≥ ν_a(B)

for all x ∈ C, B ∈ B(X), where ν_a is a non-trivial measure on B(X).
From the definitions we see that a small set is petite, with the sampling distribution
a taken as δm for some m. Hence the property of being a small set is in general stronger
than the property of being petite. We state this formally as
Proposition 5.5.3. If C ∈ B(X) is ν_m-small, then C is ν_{δ_m}-petite.

The operation “⇝” interacts usefully with the petiteness property. We have

Proposition 5.5.4. (i) If A ∈ B(X) is ν_a-petite and D ⇝_b A, then D is ν_{b∗a}-petite, where ν_{b∗a} can be chosen as a multiple of ν_a.
where νb∗a can be chosen as a multiple of νa .
(ii) If Φ is ψ-irreducible and if A ∈ B+ (X) is νa -petite, then νa is an irreducibility
measure for Φ.
Proof To prove (i) choose δ > 0 such that for x ∈ D we have K_b(x, A) ≥ δ. By Lemma 5.5.2 (i),

    K_{b∗a}(x, B) = ∫_X K_b(x, dy) K_a(y, B)
                  ≥ ∫_A K_b(x, dy) K_a(y, B)    (5.49)
                  ≥ δ ν_a(B).

To see (ii), suppose A is ν_a-petite and ν_a(B) > 0. For x ∈ A(n, m) as in (5.27) we have

    P^n K_a(x, B) ≥ ∫_A P^n(x, dy) K_a(y, B) ≥ m^{−1} ν_a(B) > 0
Thus there is an increasing sequence {C_i} of ψ_c-petite sets, all with the same sampling distribution c and minorizing measure equivalent to ψ, with ⋃ C_i = X.
Proof To prove (i) we first show that we can assume without loss of generality that ν_a is an irreducibility measure, even if ψ(A) = 0.
From Proposition 5.2.4 there exists a ν_b-petite set C with C ∈ B^+(X). We have K_{a_ε}(y, C) > 0 for any y ∈ X and any ε > 0, and hence for x ∈ A,

    K_{a∗a_ε}(x, C) ≥ ∫ ν_a(dy) K_{a_ε}(y, C) > 0.

This shows that A ⇝_{a∗a_ε} C, and hence from Proposition 5.5.4 we see that A is ν_{a∗a_ε∗b}-petite, where ν_{a∗a_ε∗b} is a constant multiple of ν_b. Now, from Proposition 5.5.4 (ii), the measure ν_{a∗a_ε∗b} is an irreducibility measure, as claimed.
We now assume that ν_a is an irreducibility measure, which is justified by the discussion above, and use Lemma 5.5.2 (i) to obtain the bound, valid for any 0 < ε < 1,

    K_a(x, A_0) ≥ (1/2) min(ψ_{a_1}(A_0), ψ_{a_2}(A_0)) > 0,

so that A_1 ∪ A_2 ⇝_a A_0. From Proposition 5.5.4 we see that A_1 ∪ A_2 is petite.
For (iii), first apply Theorem 5.2.2 to construct a ν_n-small set C ∈ B^+(X). By (i) above we may assume that C is ψ_b-petite with ψ_b a maximal irreducibility measure. Hence K_b(y, · ) ≥ I_C(y) ψ_b( · ) for all y ∈ X.
By irreducibility and the definitions we also have K_{a_ε}(x, C) > 0 for all 0 < ε < 1 and all x ∈ X. Combining these bounds gives, for any x ∈ X, B ∈ B(X),

    K_{a_ε∗b}(x, B) ≥ ∫_C K_{a_ε}(x, dz) K_b(z, B) ≥ K_{a_ε}(x, C) ψ_b(B)
(i) Without loss of generality we can take a to be either the uniform sampling distribution a_m(i) = 1/m, 1 ≤ i ≤ m, or the geometric sampling distribution a_ε. In either case, there is a finite mean sampling time

    m_a = Σ_i i a(i).
(ii) If Φ is strongly aperiodic, then the set C_0 ∪ C_1 ⊆ X̌ corresponding to C is ν_{a∗}-petite for the split chain Φ̌.
Proof To see (i), let A ∈ B^+(X) be ν_n-small. By Proposition 5.5.5 (i) we have

    Σ_{k=1}^{N+n} P^k(x, B) ≥ Σ_{k=1}^{N} P^{k+n}(x, B) ≥ (1/2) ψ_b(A) ν_n(B)
Theorem 5.5.7. If Φ is irreducible and aperiodic, then every petite set is small.
Proof Let A be a petite set. From Proposition 5.5.5 we may assume that A is ψ_a-petite, where ψ_a is a maximal irreducibility measure.
Let C denote the small set used in (5.40). Since the chain is aperiodic, it follows from Theorem 5.4.4 and Lemma D.7.4 that for some n_0 ∈ Z_+, the set C is ν_k-small, with ν_k = δν for some δ > 0, for all n_0/2 − 1 ≤ k ≤ n_0.
Since C ∈ B^+(X), we may also assume that n_0 is so large that

    Σ_{k=n_0/2}^∞ a(k) ≤ (1/2) ψ_a(C).
Then, for x ∈ A and any B ∈ B(X),

    P^{n_0}(x, B) ≥ Σ_{k=0}^{n_0/2} ∫_C P^k(x, dy) P^{n_0−k}(y, B) a(k)
                 ≥ Σ_{k=0}^{n_0/2} P^k(x, C) a(k) δν(B)
                 ≥ (1/2) ψ_a(C) δν(B),

which shows that A is ν_{n_0}-small, with ν_{n_0} = (1/2) δ ψ_a(C) ν.
This somewhat surprising result, together with Proposition 5.5.5, indicates that
the class of small sets can be used for different purposes, depending on the choice of
sampling distribution we make: if we sample at a fixed finite time we may get small
sets with their useful fixed time point properties; and if we extend the sampling as in
Proposition 5.5.5, we develop a petite structure with a maximal irreducibility measure.
We shall use this duality frequently.
5.6 Commentary
We have already noted that the split chain and the random renewal time approaches
to regeneration were independently discovered by Nummelin [301] and Athreya and
Ney [13]. The opportunities opened up by this approach are exploited with growing
frequency in later chapters.
However, the split chain only works in the generality of ϕ-irreducible chains because
of the existence of small sets, and the ideas for the proof of their existence go back to
Doeblin [95], although the actual existence as we have it here is from Jain and Jamison
[172]. Our proof is based on that in Orey [309], where small sets are called C-sets.
Nummelin [303] Chapter 2 has a thorough discussion of conditions equivalent to that
we use here for small sets; Bonsdorff [38] also provides connections between the various
small set concepts.
Our discussion of cycles follows that in Nummelin [303] closely. A thorough study
of cyclic behavior, expanding on the original approach of Doeblin [95], is given also in
Chung [70].
Petite sets as defined here were introduced in Meyn and Tweedie [277]. The “small
sets” defined in Nummelin and Tuominen [305] as well as the petits ensembles developed
in Duflo [102] are also special instances of petite sets, where the sampling distribution a is chosen as a(i) = 1/N for 1 ≤ i ≤ N, and as a(i) = (1 − α)α^i, respectively. To a French speaker, the term “petite set” might be disturbing, since the gender of ensemble is masculine; however, the nomenclature does fit normal English usage, since in [26] the word “petit” is likened to “puny”, while “petite” is more closely akin to “small”.
It might seem from Theorem 5.5.7 that there is little reason to consider both petite
sets and small sets. However, we will see that the two classes of sets are useful in distinct
ways. Petite sets are easy to work with for several reasons: most particularly, they span
periodic classes so that we do not have to assume aperiodicity, they are always closed
under unions for irreducible chains (Nummelin [303] also finds that unions of small sets
are small under aperiodicity), and by Proposition 5.5.5 we may assume that the petite
measure is a maximal irreducibility measure whenever the chain is irreducible.
Perhaps most importantly, when in the next chapter we introduce a class of Markov
chains with desirable topological properties, we will see that the structure of these
chains is closely linked to petiteness properties of compact sets.
Chapter 6

Topology and continuity
A typical, and frequently used, lower semicontinuous function is the indicator function I_O(x) of an open set O in B(X).
We will use the following continuity properties of the transition kernel, couched in these terms.
Theorem 6.0.1. (i) If Φ is a T-chain and L(x, O) > 0 for all x and all open sets O ∈ B(X), then Φ is ψ-irreducible.

(ii) If Φ is a ψ-irreducible T-chain, then every compact subset of X is petite.

(iii) If Φ is a ψ-irreducible Feller chain such that supp ψ has non-empty interior, then Φ is a ψ-irreducible T-chain.
Proof Proposition 6.2.2 proves (i); (ii) is in Theorem 6.2.5; (iii) is in Theorem 6.2.9.
In order to have any such links as those in Theorem 6.0.1 between the measure-
theoretic and topological properties of a chain, it is vital that there be at least a minimal
adaptation of the dynamics of the chain to the topology of the space on which it lives.
For consider the chain on [0, 1] with transition law which, from each point n^{−1}, moves to (n + 1)^{−1} with probability 1 − α_n. Such a chain fails to be ψ-irreducible in an obvious way if Σ_n α_n < ∞, since then it moves monotonically down the sequence {n^{−1}} with positive probability.
Of course, the dynamics of this chain are quite wrong for the space on which we have embedded it: its structure is adapted to the normal topology on the integers, not to that on the unit interval or the set {n^{−1} : n ∈ Z_+}. The Feller property obviously fails at {0}, as does any continuous component property if α_n → 0.
This is a trivial and pathological example, but one which proves valuable in exhibit-
ing the need for the various conditions we now consider, which do link the dynamics to
the structure of the space.
6.1 Feller properties and forms of stability

Suppose that X is a (locally compact separable metric) topological space, and let us
denote the class of bounded continuous functions from X to R by C(X).
The (weak) Feller property is frequently defined by requiring that the transition
probability kernel P maps C(X) to C(X). If the transition probability kernel P maps all
bounded measurable functions to C(X) then P (and also Φ) is called strong Feller.
That this is consistent with the definition above follows from
Proposition 6.1.1. (i) The transition kernel P I_O is lower semicontinuous for every open set O ∈ B(X) (that is, Φ is weak Feller) if and only if P maps C(X) to C(X); and P maps all bounded measurable functions to C(X) (that is, Φ is strong Feller) if and only if the function P I_A is lower semicontinuous for every set A ∈ B(X).
(ii) If the chain is weak Feller, then for any closed set C ⊂ X and any non-decreasing
function m : Z+ → Z+ the function Ex [m(τC )] is lower semicontinuous in x.
Hence for any closed set C ⊂ X, r > 1 and n ∈ Z_+, the functions E_x[r^{τ_C}] and P_x{τ_C ≥ n} are lower semicontinuous in x.
(iii) If the chain is weak Feller, then for any open set O ⊂ X, the function Px {τO ≤ n}
and hence also the functions Ka (x, O) and L(x, O) are lower semicontinuous.
Since by assumption ∆m(k) ≥ 0 for each k > 0, the proof of (ii) will be complete once we have shown that P_x{τ_C ≥ k} is lower semicontinuous in x for all k.
Since C is closed, and hence I_{C^c}(x) is lower semicontinuous, by Theorem D.4.1 there exist positive continuous functions f_i, i ≥ 1, such that f_i(x) ↑ I_{C^c}(x) for each x ∈ X.
Extend the definition of the kernel I_A, given by

    I_A(x, B) = I_{A∩B}(x),

by writing for any positive function g

    I_g(x, B) := g(x) I_B(x).

Then for all k ∈ Z_+,

    P_x{τ_C ≥ k} = (P I_{C^c})^{k−1}(x, X) = lim_{i→∞} (P I_{f_i})^{k−1}(x, X).
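This identity is also easy to verify numerically on a finite state space, where I_{C^c} is a diagonal matrix. A small sketch (a randomly generated five-state chain with C = {0}; illustrative only):

    import numpy as np

    rng = np.random.default_rng(2)
    P = rng.random((5, 5)); P /= P.sum(axis=1, keepdims=True)
    C = [0]
    I_Cc = np.diag([0.0 if i in C else 1.0 for i in range(5)])

    PI = P @ I_Cc          # the taboo kernel P I_{C^c}
    k = 4
    # P_x{tau_C >= k} = (P I_{C^c})^{k-1}(x, X), evaluated for every state x
    prob = np.linalg.matrix_power(PI, k - 1).sum(axis=1)
    print(prob)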
Proposition 6.1.2. The nonlinear state space model NSS(F), defined by

    X_k = F(X_{k−1}, W_k),

is always weak Feller.
Proof We have by definition that the mapping x → F (x, w) is continuous for each
fixed w ∈ R. Thus whenever h : X → R is bounded and continuous, h ◦ F (x, w) is
also bounded and continuous for each fixed w ∈ R. It follows from the Dominated Convergence Theorem that

    P h(x) = E[h(F(x, W))] = ∫ h(F(x, w)) Γ(dw)

is a continuous function of x ∈ X.
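As a concrete illustration (not part of the formal argument), one may estimate P h by Monte Carlo for a hypothetical scalar model F(x, w) = x/2 + sin x + w with Gaussian noise; the estimates vary continuously with the initial state, as the proposition guarantees:

    import numpy as np

    rng = np.random.default_rng(3)
    W = rng.standard_normal(100_000)        # noise sample: here W ~ N(0, 1)
    h = lambda x: np.tanh(x)                # a bounded continuous test function
    F = lambda x, w: 0.5 * x + np.sin(x) + w

    xs = np.linspace(-2, 2, 9)
    Ph = [h(F(x, W)).mean() for x in xs]    # P h(x) = E[h(F(x, W))] on a grid
    print(np.round(Ph, 3))                  # varies smoothly across the grid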
This simple proof of weak continuity can be emulated for many models. It implies
that this aspect of the topological analysis of many models is almost independent of
the random nature of the inputs. Indeed, we could rephrase Proposition 6.1.2 as saying
that since the associated control model CM(F ) is a continuous function of the state for
each fixed control sequence, the stochastic nonlinear state space model NSS(F ) is weak
Feller.
We shall see in Chapter 7 that this reflection of deterministic properties of CM(F )
by NSS(F ) is, under appropriate conditions, a powerful and exploitable feature of the
nonlinear state space model structure.
Proposition 6.1.3. The random walk Φ with increment distribution Γ is always weak Feller; and Φ is strong Feller if and only if Γ is absolutely continuous with respect to µ_Leb on R.

Proof Suppose that h ∈ C(X): the structure (3.35) of the transition kernel for the random walk shows that

    P h(x) = ∫_R h(y) Γ(dy − x)
           = ∫_R h(y + x) Γ(dy)    (6.5)
and since h is bounded and continuous, P h is also bounded and continuous, again from
the Dominated Convergence Theorem. Hence Φ is always weak Feller, as we also know
from Proposition 6.1.2.
Suppose next that Γ possesses a density γ with respect to µ_Leb on R. Taking h in (6.5) to be any bounded function, we have

    P h(x) = ∫_R h(y) γ(y − x) dy;    (6.6)

but now from Lemma D.4.3 it follows that the convolution P h = γ ∗ h is continuous, and the chain is strong Feller.
Conversely, suppose the random walk is strong Feller. Then for any B such that Γ(B) = δ > 0, by the lower semicontinuity of P(x, B) there exists a neighborhood O of {0} such that

    P(x, B) ≥ P(0, B)/2 = Γ(B)/2 = δ/2,    x ∈ O.    (6.7)
By Fubini’s Theorem and the translation invariance of µ_Leb we have, for any A ∈ B(X),

    ∫_R µ_Leb(dy) Γ(A − y) = ∫_R µ_Leb(dy) ∫_R I_{A−y}(x) Γ(dx)
                           = ∫_R Γ(dx) ∫_R I_{A−x}(y) µ_Leb(dy)
                           = µ_Leb(A)    (6.8)
since Γ(R) = 1. Thus we have in particular, from (6.7) and (6.8),

    µ_Leb(B) = ∫_R µ_Leb(dy) Γ(B − y)
             ≥ ∫_O µ_Leb(dy) Γ(B − y)
             ≥ δ µ_Leb(O)/2,

and hence Γ ≺ µ_Leb (that is, Γ is absolutely continuous with respect to µ_Leb), as required.
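A concrete instance of the smoothing in (6.6): with standard Gaussian increment density γ, even the discontinuous function h = I_{[0,∞)} is mapped to the smooth function P h(x) = Φ(x), the Gaussian distribution function. A minimal sketch (illustrative only):

    import numpy as np
    from math import erf, sqrt

    # P h(x) = integral of gamma(y - x) over [0, inf) = Phi(x) when gamma
    # is the standard normal density: the convolution gamma * h is smooth
    # although h itself is discontinuous at 0.
    Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
    xs = np.linspace(-3, 3, 7)
    print([round(Phi(x), 4) for x in xs])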
(i) A point x^* ∈ X is called reachable if for every open set O ∈ B(X) containing x^*,

    Σ_n P^n(x, O) > 0    for all x ∈ X.

(ii) The chain Φ is called open set irreducible if every point is reachable.
We will use often the following result, which is a simple consequence of the definition
of support.
Proposition 6.1.4. If Φ is ψ-irreducible, then a point x^* is reachable if and only if x^* ∈ supp ψ.

Proof If x^* ∈ supp (ψ) then, for any open set O containing x^*, we have ψ(O) > 0
by the definition of the support. By ψ-irreducibility it follows that L(x, O) > 0 for all
x, and hence x∗ is reachable.
Conversely, suppose that x^* ∉ supp (ψ), and let O = supp (ψ)^c. The set O is open
by the definition of the support, and contains the state x∗ . By Proposition 4.2.3 there
exists an absorbing, full set A ⊆ supp (ψ). Since L(x, O) = 0 for x ∈ A it follows that
x∗ is not reachable.
It is easily checked that open set irreducibility is equivalent to irreducibility when
the state space of the chain is countable and is equipped with the discrete topology.
The open set irreducibility definition is conceptually similar to the ψ-irreducibility
definition: they both imply that “large” sets can be reached from every point in the
space. In the ψ-irreducible case large sets are those of positive ψ-measure, whilst in the
open set irreducible case, large sets are open non-empty sets.
In this book our focus is on the property of ψ-irreducibility as a fundamental struc-
tural property. The next result, despite its simplicity, begins to link that property to
the properties of open set irreducible chains.
Proposition 6.1.5. If Φ is a strong Feller chain, and X contains one reachable point
x∗ , then Φ is ψ-irreducible, with ψ = P (x∗ , · ).
We will see below in Proposition 6.2.2 that this strong Feller condition, which (as is
clear from Proposition 6.1.3) may be unsatisfied for many models, is not needed in full
to get this result, and that Proposition 6.1.5 and Proposition 6.1.6 hold for T-chains
also.
There are now two different approaches we can take in connecting the topological and
continuity properties of Feller chains with the stochastic or measure-theoretic properties
of the chain. We can either weaken the strong Feller property by requiring in essence
that it only hold partially; or we could strengthen the weak Feller condition whilst
retaining its essential flavor.
It will become apparent that the former, T-chain, route is usually far more pro-
ductive, and we move on to this next. A strengthening of the Feller property to give
e-chains will then be developed in Section 6.4.
6.2 T-chains
6.2.1 T-chains and open set irreducibility
The calculations for NSS(F ) models and random walks show that the majority of the
chains we have considered to date have the weak Feller property.
However, we clearly need more than just the weak Feller property to connect
measure-theoretic and topological irreducibility concepts: every random walk is weak
Feller, and we know from Section 4.3.3 that any chain with increment measure concen-
trated on the rationals enters every open set but is not ψ-irreducible.
Moving from the weak to the strong Feller property is however excessive. Using the
ideas of sampled chains introduced in Section 5.5.1 we now develop properties of the
class of T-chains, which we shall find includes virtually all models we will investigate,
and which appears almost ideally suited to link the general space attributes of the chain
with the topological structure of the space.
The T-chain definition describes a class of chains which are not totally adapted
to the topology of the space, in that the strongly continuous kernel T , being only a
“component” of P , may ignore many discontinuous aspects of the motion of Φ: but it
does ensure that the chain is not completely singular in its motion, with respect to the
normal topology on the space, and the strong continuity of T links set properties such
as ψ-irreducibility to the topology in a way that is not natural for weak continuity.
We illustrate precisely this point now, with the analogue of Proposition 6.1.5.
Proposition 6.2.1. If Φ is a T-chain, and X contains one reachable point x∗ , then Φ
is ψ-irreducible, with ψ = T (x∗ , · ).
In the next two results we show that the existence of sufficient open petite sets
implies that Φ is a T-chain.
Proposition 6.2.3. If an open ν_a-petite set A exists, then K_a possesses a continuous component non-trivial on all of A.
Proof Since A is ν_a-petite, we have by definition

    K_a( · , · ) ≥ I_A( · ) ν_a{ · }.
Proof Since X is σ-compact, there is a countable covering of open petite sets, and
the result (i) follows from Proposition 6.2.3 and Proposition 6.2.4.
Now suppose that Φ is ψ-irreducible, so that there exists some petite A ∈ B+ (X),
and let Ka have an everywhere non-trivial continuous component T .
By irreducibility K_{a_ε}(x, A) > 0 for all x, and hence from (5.46),

    K_{a∗a_ε}(x, A) ≥ ∫ T(x, dy) K_{a_ε}(y, A) = T K_{a_ε}(x, A) > 0

for all x ∈ X.
The function T K_{a_ε}( · , A) is lower semicontinuous and positive everywhere on X. Hence K_{a∗a_ε}(x, A) is uniformly bounded from below on compact subsets of X. Proposition 5.2.4 completes the proof that each compact set is petite.
The fact that we can weaken the irreducibility condition to open set irreducibility
follows from Proposition 6.2.2.
The following factorization, which generalizes Proposition 5.5.5, further links the
continuity and petiteness properties of T-chains.
Proposition 6.2.6. If Φ is a ψ-irreducible T-chain, then there is a sampling distribution b, an everywhere strictly positive, continuous function s : X → R, and a maximal irreducibility measure ψ_b such that

    K_b(x, B) ≥ s(x) ψ_b(B),    x ∈ X, B ∈ B(X).
    K_a(x, C) ≥ δ,    x ∈ A.
Proof To see (i), let A be an open petite set of positive ψ-measure. Then K_{a_ε}( · , A) is lower semicontinuous and positive everywhere, and hence bounded from below on compact sets. Proposition 5.5.4 again completes the proof.
To see (ii), let A be a ψ-positive petite set, and define

    A_k := closure{x : K_{a_ε}(x, A) ≥ 1/k} ∩ supp ψ.
By Proposition 5.2.4 and Lemma 6.2.7, each Ak is petite. Since supp ψ has non-empty
interior it is of the second category, and hence there exists k ∈ Z+ and an open set
O ⊂ Ak ⊂ supp ψ. The set O is an open ψ-positive petite set, and hence we may apply
(i) to conclude (ii).
A surprising, and particularly useful, conclusion from this cycle of results concerning
petite sets and continuity properties of the transition probabilities is the following result,
showing that Feller chains are in many circumstances also T-chains. We have as a
corollary of Proposition 6.2.8 (ii) and Proposition 6.2.5 (ii) that
Theorem 6.2.9. If a ψ-irreducible chain Φ is weak Feller and if supp ψ has nonempty
interior then Φ is a T-chain.
These results indicate that the Feller property, which is a relatively simple condition
to verify in many applications, provides some strong consequences for ψ-irreducible
chains.
Since we may cover the state space of a ψ-irreducible Markov chain by a countable
collection of petite sets, and since by Lemma 6.2.7 the closure of a petite set is itself
petite, it might seem that Theorem 6.2.9 could be strengthened to provide an open
covering of X by petite sets without additional hypotheses on the chain. It would then
follow by Theorem 6.2.5 that any ψ-irreducible Feller chain is a T-chain.
Unfortunately, this is not the case, as is shown by the following counterexample.
Let X = [0, 1] with the usual topology, let 0 < |α| < 1, and define the Markov transition
function P for x > 0 by
    P(x, {0}) = 1 − P(x, {αx}) = x.
We set P (0, {0}) = 1. The transition function P is Feller and δ0 -irreducible. But for
any n ∈ Z_+ we have

    lim_{x→0} P_x(τ_{{0}} ≥ n) = 1,
from which it follows that there does not exist an open petite set containing the point
{0}.
Thus we have constructed a ψ-irreducible Feller chain on a compact state space
which is not a T-chain.
6.3 Continuous components for specific models

Exactly the same argument for a storage model with general state-dependent release
rule r(x), as discussed in Section 2.4.4, shows these models to be δ0 -irreducible T-chains
when the integral R(x) of (2.32) is finite for all x.
Thus the virtual equivalence of the petite compact set condition and the T-chain
condition provides an easy path to showing the existence of continuous components for
many models with a real atom in the space.
Assessing conditions for non-atomic chains to be T-chains is not quite as simple in
general. However, we can describe exactly what the continuous component condition
defining T-chains means in the case of the random walk. Recall that the random walk
is called spread out if some convolution power Γ^{n∗} is non-singular with respect to µ_Leb on R.
Proposition 6.3.2. The unrestricted random walk is a T-chain if and only if it is
spread out.
Proof If Γ is spread out, then for some M and some positive function γ we have

    P^M(x, A) = Γ^{M∗}(A − x) ≥ ∫_{A−x} γ(y) dy =: T(x, A),
and exactly as in the proof of Proposition 6.1.3, it follows that T is strong Feller: the spread-out assumption ensures that T(x, X) > 0 for all x, and so by choosing the sampling distribution as a = δ_M we find that Φ is a T-chain.
The converse is somewhat harder, since we do not know a priori that when Φ is a
T-chain, the component T can be chosen to be translation invariant. So let us assume
that the result is false, and choose A such that µ_Leb(A) = 0 but Γ^{n∗}(A) = 1 for every n. Then Γ^{n∗}(A^c) = 0 for all n, and so for the sampling distribution a associated with the component T,

    T(0, A^c) ≤ K_a(0, A^c) = Σ_n Γ^{n∗}(A^c) a(n) = 0.
The non-triviality of the component T thus ensures T(0, A) > 0, and since T(x, A) is lower semicontinuous, there exists a neighborhood O of {0} and a δ > 0 such that

    T(x, A) ≥ δ > 0,    x ∈ O.

Since T is a component of K_a, this ensures

    K_a(x, A) ≥ δ > 0,    x ∈ O.
But as in (6.8), by Fubini’s Theorem and the translation invariance of µ_Leb we have

    µ_Leb(A) = ∫_R µ_Leb(dy) Γ^{n∗}(A − y)
             = ∫_R µ_Leb(dy) P^n(y, A).    (6.9)

Multiplying both sides of (6.9) by a(n) and summing gives

    µ_Leb(A) = ∫_R µ_Leb(dy) K_a(y, A)
             ≥ ∫_O µ_Leb(dy) K_a(y, A)    (6.10)
             ≥ δ µ_Leb(O),
and since µL e b (O) > 0, we have a contradiction.
This example illustrates clearly the advantage of requiring only a continuous com-
ponent, rather than the Feller property for the chain itself.
To obtain a continuous component for the LSS(F ,G) model, our approach is similar
to that in deriving its irreducibility properties in Section 4.4. We require that the
set of possible reachable states be large for the associated deterministic linear control
system, and we also require that the set of reachable states remain large when the
control sequence u is replaced by the random disturbance W . One condition sufficient
to ensure this is
Using (6.11) we now show that the n-step transition kernel itself possesses a contin-
uous component provided, firstly, Γ is nonsingular with respect to Lebesgue measure
and secondly, the chain X can be driven to a sufficiently large set of states in Rn
through the action of the disturbance process W = {Wk } as described in the last term
of (6.11). This second property is a consequence of the controllability of the associated
model LCM(F ,G).
In Chapter 7 we will show that this construction extends further to more complex
nonlinear models.
Proposition 6.3.3. Suppose the deterministic control model LCM(F ,G) on Rn satisfies
the controllability condition (LCM3), and the associated LSS(F ,G) model X satisfies
the nonsingularity condition (LSS4).
Then the n-skeleton possesses a continuous component which is everywhere non-
trivial, so that X is a T-chain.
Proof We will prove this result in the special case where W is a scalar. The general
case with W ∈ Rp is proved using the same methods as in the case where p = 1, but
much more notation is needed for the required change of variables [272].
Let f denote an arbitrary positive function on X = Rn . From (6.11) together with
non-singularity of the disturbance process W we may bound the conditional mean of
f(Φ_n) as follows:

    P^n f(x_0) = E[f(F^n x_0 + Σ_{i=0}^{n−1} F^i G W_{n−i})]    (6.12)

               ≥ ∫ · · · ∫ f(F^n x_0 + Σ_{i=0}^{n−1} F^i G w_{n−i}) γ_w(w_1) · · · γ_w(w_n) dw_1 . . . dw_n.
Letting C_n denote the controllability matrix in (4.13) and defining the vector-valued random variable W̄_n = (W_1, . . . , W_n)′, we define the kernel T as

    T f(x) := ∫ f(F^n x + C_n w̄_n) γ_w(w̄_n) dw̄_n,

where γ_w(w̄_n) := γ_w(w_1) · · · γ_w(w_n). We have T(x, X) = {∫ γ_w(x) dx}^n > 0, which shows that T is everywhere non-trivial; and T is a component of P^n since (6.12) may be written in terms of T as

    P^n f(x_0) ≥ ∫ f(F^n x_0 + C_n w̄_n) γ_w(w̄_n) dw̄_n = T f(x_0).    (6.13)
Let |C_n| denote the determinant of C_n, which is non-zero since the pair (F, G) is controllable. Making the change of variables

    v̄_n = C_n w̄_n,    dv̄_n = |C_n| dw̄_n,

we obtain

    T f(x_0) = |C_n|^{−1} ∫ f(F^n x_0 + v̄_n) γ_w(C_n^{−1} v̄_n) dv̄_n.
By Lemma D.4.3 and the Dominated Convergence Theorem, the right hand side of this
identity is a continuous function of x0 whenever f is bounded. This combined with
(6.13) shows that T is a continuous component of P n .
In particular this shows that the ARMA process (ARMA1) and any of its variations
may be modeled as a T-chain if the noise process W is sufficiently rich with respect to
Lebesgue measure, since they possess a controllable realization from Proposition 4.4.2.
In general, we can also obtain a T-chain by restricting the process to a controllable
subspace of the state space in the manner indicated after Proposition 4.4.3.
We will use the following lemma to control the growth of the models below.
Lemma 6.3.4. Let ρ(F) denote the modulus of the eigenvalue of F of maximum modulus, where F is an n × n matrix. Then for any matrix norm ‖ · ‖ we have the limit

    log ρ(F) = lim_{n→∞} (1/n) log ‖F^n‖.    (6.14)
Proof The existence of the limit (6.14) follows from the Jordan Decomposition
and is a standard result from linear systems theory: see [57] or Exercises 2.I.2 and 2.I.5
of [102] for details.
A consequence of Lemma 6.3.4 is that for any constants ρ_0, ρ_1 satisfying ρ_0 < ρ(F) < ρ_1, there exists c > 1 such that

    c^{−1} ρ_0^n ≤ ‖F^n‖ ≤ c ρ_1^n.    (6.15)
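The limit (6.14) and the bounds (6.15) are easy to observe numerically. A minimal sketch for a hypothetical 2 × 2 matrix F (illustrative only):

    import numpy as np

    F = np.array([[0.9, 1.0],
                  [0.0, 0.5]])
    rho = max(abs(np.linalg.eigvals(F)))     # spectral radius rho(F) = 0.9
    for n in (1, 5, 25, 125):
        Fn = np.linalg.matrix_power(F, n)
        # (1/n) log ||F^n|| approaches log rho(F) as n grows
        print(n, np.log(np.linalg.norm(Fn)) / n, np.log(rho))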
Hence for the linear state space model, under the eigenvalue condition (LSS5), the convergence F^n → 0 takes place at a geometric rate. This property is used in the following result to give conditions under which the linear state space model is irreducible.
Proposition 6.3.5. Suppose that the LSS(F ,G) model X satisfies the density condition
(LSS4) and the eigenvalue condition (LSS5), and that the associated control system
LCM(F ,G) is controllable.
Then X is a ψ-irreducible T-chain and every compact subset of X is small.
Proof We have seen in Proposition 6.3.3 that the linear state space model is a
T-chain under these conditions. To obtain irreducibility we will construct a reachable
state and use Proposition 6.2.1.
Let w^* denote any element of the support of the distribution Γ of W, and let

    x^* = Σ_{k=0}^∞ F^k G w^*.
If in (1.4) the control u_k = w^* for all k, then the system x_k converges to x^* uniformly for initial conditions in compact subsets of X.
By (pointwise) continuity of the model, it follows that for any bounded set A ⊂ X and open set O containing x^*, there exists ε > 0 sufficiently small and N ∈ Z_+ sufficiently large such that x_N ∈ O whenever x_0 ∈ A and u_i ∈ w^* + εB for 1 ≤ i ≤ N, where B denotes the open unit ball centered at the origin in X. Since w^* lies in the support of the distribution of W_k we can conclude that P^N(x_0, O) ≥ Γ(w^* + εB)^N > 0 for x_0 ∈ A.
Hence x^* is reachable, which by Proposition 6.2.1 and Proposition 6.3.3 implies that Φ is ψ-irreducible for some ψ.
We now show that all bounded sets are small, rather than merely petite. Propo-
sition 6.3.3 shows that P n possesses a strong Feller component T . By Theorem 5.2.2
there exists a small set C for which T(x^*, C) > 0 and hence, by the continuity of T( · , C), an open set O containing x^* exists for which

    T(x, C) ≥ δ > 0,    x ∈ O.

By Proposition 5.2.4, O is also a small set. If A is a bounded set, then we have already shown that A ⇝_{δ_N} O for some N, so applying Proposition 5.2.4 once more we have the desired conclusion that A is small.
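The quantities appearing in this proof are readily computed for a concrete model. The sketch below (a hypothetical stable pair (F, G); illustrative only) forms the controllability matrix, checks its rank, and computes the reachable state x^* = Σ_k F^k G w^* = (I − F)^{−1}Gw^*:

    import numpy as np

    F = np.array([[0.5, 0.2],
                  [0.0, 0.4]])          # rho(F) < 1, so (LSS5) holds
    G = np.array([[1.0],
                  [1.0]])
    n = F.shape[0]

    # controllability matrix C_n = [F^{n-1}G | ... | G]
    Cn = np.hstack([np.linalg.matrix_power(F, n - 1 - i) @ G for i in range(n)])
    print(np.linalg.matrix_rank(Cn))    # -> 2: the pair (F, G) is controllable

    w_star = 1.0                        # any point in supp Gamma
    # sum_k F^k G w* = (I - F)^{-1} G w*, valid since rho(F) < 1
    x_star = np.linalg.solve(np.eye(n) - F, G @ np.array([w_star]))
    print(x_star.ravel())               # the reachable state x*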
Even though this model is not Feller, due to the possible presence of discontinuities
at the boundary points {ri }, we can establish
Proposition 6.3.6. Under (SETAR1) and (SETAR2), the SETAR model is a ϕ-irreducible T-chain with ϕ taken as Lebesgue measure µ_Leb on R.
6.4 e-Chains
Now that we have developed some of the structural properties of T-chains that we will
require, we move on to a class of Feller chains which also have desirable structural
properties, namely e-chains.
Weak convergence
A sequence of probabilities {µ_k : k ∈ Z_+} ⊂ M converges weakly to µ_∞ ∈ M (denoted µ_k →_w µ_∞) if

    lim_{k→∞} ∫ f dµ_k = ∫ f dµ_∞

for every f ∈ C(X).
Due to our restrictions on the state space X, the topology of weak convergence is
induced by a number of metrics on M; see Section D.5. One such metric may be
expressed
    d_m(µ, ν) = Σ_{k=0}^∞ |∫ f_k dµ − ∫ f_k dν| 2^{−k},    µ, ν ∈ M,    (6.16)

where {f_k} is an appropriate set of functions in C_c(X), the set of continuous functions on X with compact support.
For (P, M, dm ) to be a dynamical system we require that P be a continuous map
on M. If P is continuous, then we must have in particular that if a sequence of point
masses {δ_{x_k} : k ∈ Z_+} ⊂ M converges to some point mass δ_{x_∞} ∈ M, then

    δ_{x_k} P →_w δ_{x_∞} P    as k → ∞,

or equivalently, lim_{k→∞} P f(x_k) = P f(x_∞) for all f ∈ C(X). That is, if the Markov
transition function induces a continuous map on M, then P f must be continuous for
any bounded continuous function f .
This is exactly the weak Feller property. Conversely, it is obvious that for any weak
Feller Markov transition function P , the associated operator P on M is continuous.
We have thus shown
Proposition 6.4.1. The triple (P, M, dm ) is a dynamical system if and only if the
Markov transition function P has the weak Feller property.
Although we do not get further immediate value from this result, since there do not
exist a great number of results in the dynamical systems theory literature to be exploited
in this context, these observations guide us to stronger and more useful continuity
conditions.
There is one striking result which very largely justifies our focus on e-chains, espe-
cially in the context of more stable chains.
Proposition 6.4.2. Suppose that the Markov chain Φ has the Feller property, and that
there exists a unique probability measure π such that for every x
    P^n(x, · ) →_w π.    (6.17)
Then Φ is an e-chain.
Proof Since the limit in (6.17) is continuous (and in fact constant) it follows from
Ascoli’s Theorem D.4.2 that the sequence of functions {P k f : k ∈ Z+ } is equicontinuous
on compact subsets of X whenever f ∈ C(X). Thus the chain Φ is an e-chain.
Thus chains with good limiting behavior, such as those in Part III in particular, are
forced to be e-chains, and in this sense the e-chain assumption is for many purposes a
minor extra step after the original Feller property is assumed.
Recall from Chapter 1 that the dynamical system (P, M, d_m) is called stable in the sense of Lyapunov if for each measure µ ∈ M,

    lim_{ν→µ} sup_{k≥0} d_m(νP^k, µP^k) = 0.
The following result creates a further link between classical dynamical systems theory,
and the theory of Markov chains on topological state spaces. The proof is routine and
we omit it.
Proposition 6.4.3. The Markov chain is an e-chain if and only if the dynamical system
(P, M, dm ) is stable in the sense of Lyapunov.
Conditions for such an invariant measure π to exist are the subject of considerable study
for ψ-irreducible chains in Chapter 10, and in Chapter 12 we return to this question for
weak Feller chains and e-chains.
A more immediately useful concept is that of Lagrange stability. Recall from Sec-
tion 1.3.2 that (P, M, dm ) is Lagrange stable if, for every µ ∈ M, the orbit of measures
µP k is a precompact subset of M. One way to investigate Lagrange stability for weak
Feller chains is to utilize the following concept, which will have much wider applicability
in due course.
Proposition 6.4.4. The chain Φ is bounded in probability if and only if the dynamical
system (P, M, dm ) is Lagrange stable.
For e-chains, the concepts of boundedness in probability and Lagrange stability also
interact to give a useful stability result for a somewhat different dynamical system.
The space C(X) can be considered as a normed linear space, where we take the norm
| · |_c to be defined for f ∈ C(X) as

    |f|_c := Σ_{k=0}^∞ 2^{−k} sup_{x∈C_k} |f(x)|,
where {Ck } is a sequence of open precompact sets whose union is equal to X. The
associated metric dc generates the topology of uniform convergence on compact subsets
of X.
If P is a weak Feller kernel, then the mapping P on C(X) is continuous with respect
to this norm, and in this case the triple (P, C(X), dc ) is a dynamical system.
By Ascoli’s Theorem D.4.2, (P, C(X), dc ) will be Lagrange stable if and only if for
each initial condition f ∈ C(X), the orbit {P k f : k ∈ Z+ } is uniformly bounded, and
equicontinuous on compact subsets of X. This fact easily implies
To summarize, for weak Feller chains boundedness in probability and the equiconti-
nuity assumption are, respectively, exactly the same as Lagrange stability and stability
in the sense of Lyapunov for the dynamical system (P, M, dm ); and these stability con-
ditions are both simultaneously satisfied if and only if the dynamical system (P, M, dm )
and its dual (P, C(X), dc ) are simultaneously Lagrange stable.
These connections suggest that equicontinuity will be a useful tool for studying the
limiting behavior of the distributions governing the Markov chain Φ, a belief which will
be justified in the results in Chapter 12 and Chapter 18.
Proof Let f ∈ Cc (X). By uniform continuity of f , for any ε > 0 we can find δ > 0
so that |f (x) − f (y)| ≤ ε whenever |x − y| ≤ δ. It follows from (6.19) that for any
n ∈ Z+ , and any x, y ∈ R with |x − y| ≤ δ,
    P^k f(x_0) → f_∞,    k → ∞,
where f_∞ is a constant. When x_0 = 0 we have that P^k f(x_0) = f(x_0) = f(0) for all k.
From these observations it is easy to see that X is not an e-chain. Take f ∈ C_c(X) with f(0) = 0 and f(x) ≥ 0 for all x > 0: we may assume without loss of generality that f_∞ > 0. Since the one-point set {0} is absorbing we have P^k(0, {0}) = 1 for all k, and it immediately follows that P^k f converges to a discontinuous function. By Ascoli’s Theorem the sequence of functions {P^k f : k ∈ Z_+} cannot be equicontinuous on compact subsets of R_+, which shows that X is not an e-chain.
However by modifying the topology on X = R+ we do obtain an e-chain as follows.
Define the topology on the strictly positive real line (0, ∞) in the usual way, and define
{0} to be open, so that X becomes a disconnected set with two open components. Then, in this topology, P^k f converges to a uniformly continuous function which is constant on each component of X. From this and Ascoli’s Theorem it follows that X is an e-chain.
It appears in general that such pathologies are typical of “non-e” Feller chains, and
this again reinforces the value of our results for e-chains, which constitute the more
typical behavior of Feller chains.
6.5 Commentary
The weak Feller chain has been a basic starting point in certain approaches to Markov
chain theory for many years. The work of Foguel [121, 123], Jamison [174, 175, 176], Lin [238], Rosenblatt [339] and Sine [356, 357, 358] has established a relatively rich
theory based on this approach, and the seminal book of Dynkin [105] uses the Feller
property extensively.
We will revisit this in much greater detail in Chapter 12, where we will also take up
the consequences of the e-chain assumption: this will be shown to have useful attributes
in the study of limiting behavior of chains.
The equicontinuity results here, which relate this condition to the dynamical systems
viewpoint, are developed by Meyn [260]. Equicontinuity may be compared to uniform
stability [174] or regularity [115]. Whilst e-chains have also been developed in detail,
particularly by Rosenblatt [337], Jamison [174, 175] and Sine [356, 357] they do not have
particularly useful connections with the ψ-irreducible chains we are about to explore,
which explains their relatively brief appearance at this stage.
The concept of continuous components appears first in Pollard and Tweedie [318,
319], and some practical applications are given in Laslett et al. [237]. The real exploitation of this concept begins in Tuominen and Tweedie [391], from which we take Proposition 6.2.2. The connection between T-chains and the existence of compact petite sets is a recent result of Meyn and Tweedie [277].
In practice the identification of ψ-irreducible Feller chains as T-chains provided only
that supp ψ has non-empty interior is likely to make the application of the results for
such chains very much more common. This identification is new. The condition that
supp ψ have non-empty interior has however proved useful in a number of associated
areas in [319] and in Cogburn [75].
We note in advance here the results of Chapter 9 and Chapter 18, where we will
show that a number of stability criteria for general space chains have “topological”
analogues which, for T-chains, are exact equivalences. Thus T-chains will prove of
on-going interest.
Chapter 7

The nonlinear state space model

In applying the results and concepts of Part I in the domains of time series or systems
theory, we have so far analyzed only linear models in any detail, albeit rather general
and multidimensional ones. This chapter is intended as a relatively complete description
of the way in which nonlinear models may be analyzed within the Markovian context
developed thus far. We will consider both the general nonlinear state space model, and
some specific applications which take on this particular form.
The pattern of this analysis is to consider first some particular structural or sta-
bility aspect of the associated deterministic control, or CM(F ), model and then under
appropriate choice of conditions on the disturbance or noise process (typically a den-
sity condition as in the linear models of Section 6.3.2) to verify a related structural or
stability aspect of the stochastic nonlinear state space NSS(F ) model.
Highlights of this duality are
(ii) a form of irreducibility (the existence of a globally attracting state for the CM(F )
model) is then equivalent to the associated NSS(F ) model being a ψ-irreducible
T-chain (Section 7.2);
(iii) the existence of periodic classes for the forward accessible CM(F ) model is fur-
ther equivalent to the associated NSS(F ) model being a periodic Markov chain,
with the periodic classes coinciding for the deterministic and the stochastic model
(Section 7.3).
Thus we can reinterpret some of the concepts which we have introduced for Markov
chains in this deterministic setting; and conversely, by studying the deterministic model
we obtain criteria for our basic assumptions to be valid in the stochastic case.
In Section 7.4.3 the adaptive control model is considered to illustrate how these
results may be applied in specific applications: for this model we exploit the fact that
7.1 Forward accessibility and continuous components
We define A+ (x) to be the set of all states which are reachable from x at some time in
the future, given by
    A_+(x) := ⋃_{k=0}^∞ A_+^k(x).    (7.3)
The analogue of controllability that we use for the nonlinear model is called forward
accessibility.
Forward accessibility
The associated control model CM(F ) is called forward accessible if for each
x0 ∈ X, the set A+ (x0 ) ⊂ X has non-empty interior.
For general nonlinear models, forward accessibility depends critically on the partic-
ular control set Ow chosen. This is in contrast to the linear state space model, where
conditions on the driving matrix pair (F, G) sufficed for controllability.
Nonetheless, for the scalar nonlinear state space model we may show that forward
accessibility is equivalent to the following “rank condition”, similar to (LCM3):
In the scalar linear case the control system (7.1) has the form
xk = F xk −1 + Guk ,
with F and G scalars. In this special case the derivative in (CM2) becomes exactly
[F k −1 G| · · · |F G|G], which shows that the rank condition (CM2) is a generalization of
the controllability condition (LCM3) for the linear state space model. This connection
will be strengthened when we consider multidimensional nonlinear models below.
Theorem 7.1.1. The control model CM(F ) is forward accessible if and only if the rank
condition (CM2) is satisfied.
A proof of this result would take us too far from the purpose of this book. It is
similar to that of Proposition 7.1.2, and details may be found in [271, 272].
We know from the definitions that, with probability one, Wk ∈ Ow for all k ∈ Z+ .
Commonly assumed noise distributions satisfying this assumption include those which
possess a continuous density, such as the Gaussian model, or uniform distributions on
bounded open intervals in R.
We can now develop an explicit continuous component for such scalar nonlinear
state space models.
Proposition 7.1.2. Suppose that for the SNSS(F ) model, the noise distribution satis-
fies (SNSS3), and that the associated control system CM(F ) is forward accessible. Then
the SNSS(F ) model is a T-chain.
Proof Since CM(F ) is forward accessible we have from Theorem 7.1.1 that the
rank condition (CM2) holds. For simplicity of notation, assume that the derivative
with respect to the kth disturbance variable is non-zero:

    (∂F_k/∂w_k)(x_0^0, w_1^0, . . . , w_k^0) ≠ 0.    (7.5)

Consider then the map

    F^k(x_0, w_1, . . . , w_k) := (x_0, w_1, . . . , w_{k−1}, F_k(x_0, w_1, . . . , w_k)),

whose Jacobian is lower triangular with diagonal entries (1, . . . , 1, ∂F_k/∂w_k), and which is evidently full rank at (x_0^0, w_1^0, . . . , w_k^0). It follows from the Inverse Function Theorem that there exists an open set

    B = B_{x_0^0} × B_{w_1^0} × · · · × B_{w_k^0}

containing (x_0^0, w_1^0, . . . , w_k^0), and a smooth function G_k : F^k{B} → R^{k+1} such that

    G_k(F^k(x_0, w_1, . . . , w_k)) = (x_0, w_1, . . . , w_k).
We now make a change of variables, similar to the linear case. For any x_0 ∈ B_{x_0^0} and any positive function f : R → R_+,

    P^k f(x_0) = ∫ · · · ∫ f(F_k(x_0, w_1, . . . , w_k)) γ_w(w_k) · · · γ_w(w_1) dw_1 · · · dw_k    (7.6)

              ≥ ∫_{B_{w_1^0}} · · · ∫_{B_{w_k^0}} f(F_k(x_0, w_1, . . . , w_k)) γ_w(w_k) · · · γ_w(w_1) dw_1 · · · dw_k.
We will first integrate over w_k, keeping the remaining variables fixed. By making the change of variables

    x_k = F_k(x_0, w_1, . . . , w_k),    w_k = G_k(x_0, w_1, . . . , w_{k−1}, x_k),

so that

    dw_k = |(∂G_k/∂x_k)(x_0, w_1, . . . , w_{k−1}, x_k)| dx_k,

we obtain, for (x_0, w_1, . . . , w_{k−1}) ∈ B_{x_0^0} × · · · × B_{w_{k−1}^0},

    ∫_{B_{w_k^0}} f(F_k(x_0, w_1, . . . , w_k)) γ_w(w_k) dw_k = ∫_R f(x_k) q_k(x_0, w_1, . . . , w_{k−1}, x_k) dx_k,    (7.7)
where ξ^0 = (x_0^0, w_1^0, . . . , w_{k−1}^0, x_k^0). We will show that T_0 f is lower semicontinuous on R whenever f is positive and bounded.
Since q_k(x_0, w_1, . . . , w_{k−1}, x_k) γ_w(w_1) · · · γ_w(w_{k−1}) is a lower semicontinuous function of its arguments in R^{k+1}, there exists a sequence of positive, continuous functions r_i : R^{k+1} → R_+, i ∈ Z_+, such that for each i, the function r_i has bounded support and, as i ↑ ∞,
It follows from the dominated convergence theorem that T_i f is continuous for any bounded function f. If f is also positive, then as i ↑ ∞,

    T_i f(x_0) ↑ T_0 f(x_0),    x_0 ∈ R,
    X_{k+1} = θX_k + bW_{k+1}X_k + W_{k+1},
where W is a disturbance process. To place this bilinear model into the framework of
this chapter we assume
Under (SBL1) and (SBL2), the bilinear model X is an SNSS(F ) model with F
defined in (2.7).
First observe that the one-step transition kernel P for this model cannot possess
an everywhere non-trivial continuous component. This may be seen from the fact that
P(−1/b, {−θ/b}) = 1, yet P(x, {−θ/b}) = 0 for all x ≠ −1/b. It follows that the only positive lower semicontinuous function which is majorized by P( · , {−θ/b}) is zero, and thus any continuous component T of P must be trivial at −1/b: that is, T(−1/b, R) = 0.
This could be anticipated by looking at the controllability vector (7.4). The first
order controllability vector is

    (∂F/∂u)(x_0, u_1) = bx_0 + 1,

which is zero at x_0 = −1/b, and thus the first order test for forward accessibility fails.
Hence we must take k ≥ 2 in (7.4) if we hope to construct a continuous component. When k = 2 the vector (7.4) can be computed using the chain rule to give

    [ (∂F/∂x)(x_1, u_2)(∂F/∂u)(x_0, u_1) | (∂F/∂u)(x_1, u_2) ]
        = [ (θ + bu_2)(bx_0 + 1) | bx_1 + 1 ]
        = [ (θ + bu_2)(bx_0 + 1) | θbx_0 + b^2 u_1 x_0 + bu_1 + 1 ],
which is non-zero for almost every (u_1, u_2)′ ∈ R^2. Hence the associated control model is forward accessible, and this together with Proposition 7.1.2 gives
Proposition 7.1.3. If (SBL1) and (SBL2) hold, then the bilinear model is a T-chain.
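The computation leading to Proposition 7.1.3 can be reproduced symbolically. The sketch below (using sympy; illustrative only) builds x_2 = F(F(x_0, u_1), u_2) and differentiates, recovering the second order controllability vector displayed above:

    import sympy as sp

    x0, u1, u2, theta, b = sp.symbols('x0 u1 u2 theta b')
    F = lambda x, u: theta * x + b * u * x + u    # the bilinear map

    x1 = F(x0, u1)
    x2 = F(x1, u2)
    # second order controllability vector: derivatives of x2 in u1 and u2
    C2 = [sp.simplify(sp.diff(x2, u1)), sp.simplify(sp.diff(x2, u2))]
    print(C2)    # the two entries match the display above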
    x_k = F_k(x_0, u_1, . . . , u_k),    k ∈ Z_+.    (7.9)

Let C_{x_0}^k = C_{x_0}^k(u_1, . . . , u_k) denote the generalized controllability matrix (along the sequence u_1, . . . , u_k); for the linear model it reduces to

    C_{x_0}^k = [F^{k−1}G | · · · | G],
Proposition 7.1.4. The nonlinear control model CM(F) satisfying (7.9) is forward accessible if and only if the rank condition (CM3) holds.
Using an argument which is similar to, but more complicated than the proof of
Proposition 7.1.2, we may obtain the following consequence of forward accessibility.
Proposition 7.1.5. If the NSS(F ) model satisfies the density assumption (NSS3), and
the associated control model is forward accessible, then the state space X may be written
as the union of open small sets, and hence the NSS(F ) model is a T-chain.
Note that this only guarantees the T-chain property: we now move on to consider the equally needed irreducibility properties of the NSS(F) models.
is also invariant.
The following result summarizes these observations:
Proposition 7.2.1. For the control system (7.9) we have for any C ⊂ X,
(i) A_+(C) and its closure are invariant;

(ii) Ω_+(C) is invariant;

(iii) C^0 is invariant, and C^0 is also closed if the set C is open.
As a consequence of the assumption that the map F is smooth, and hence continuous,
we then have immediately
Proposition 7.2.2. If the associated CM(F ) model is forward accessible, then for the
NSS(F ) model:
(i) a closed subset A ⊂ X is absorbing for NSS(F ) if and only if it is invariant for
CM(F );
(ii) if U ⊂ X is open, then for each k ≥ 1 and x ∈ X,

    A_+^k(x) ∩ U ≠ ∅  ⟺  P^k(x, U) > 0;
Minimal sets
We call a set minimal for the deterministic control model CM(F ) if it is
(topologically) closed, invariant, and does not contain any closed invariant
set as a proper subset.
For example, consider the LCM(F ,G) model introduced in (1.4). The assumption
(LCM2) simply states that the control set Ow is equal to Rp .
In this case the system possesses a unique minimal set M which is equal to X0 , the
range space of the controllability matrix, as described after Proposition 4.4.3. If the
eigenvalue condition (LSS5) holds then this is the only minimal set for the LCM(F ,G)
model.
The following characterizations of minimality follow directly from the definitions,
and the fact that both A+ (x) and Ω+ (x) are closed and invariant.
Proposition 7.2.3. The following are equivalent for a nonempty set M ⊂ X:
(i) M is minimal for CM(F );
(ii) A+ (x) = M for all x ∈ M ;
(iii) Ω+ (x) = M for all x ∈ M .
Theorem 7.2.4. Let M ⊂ X be a minimal set for CM(F ). If CM(F ) is forward acces-
sible and the disturbance process of the associated NSS(F ) model satisfies the density
condition (NSS3), then
(ii) the NSS(F ) model restricted to M is an open set irreducible (and so ψ-irreducible)
T-chain.
Consider, for example, a control system on X = R whose control set consists of non-negative values, of the form

    x_{k+1} = F(x_k, u_{k+1}) = x_k + u_{k+1},
so that all proper closed invariant sets are of the form [t, ∞) for some t ∈ R. This
system is indecomposable, yet no minimal sets exist.
Globally attracting states

A state x^* ∈ X is called globally attracting if, for every initial condition y ∈ X,

    x^* ∈ Ω_+(y).
Proposition 7.2.5. (i) The nonlinear control system (7.9) is M-irreducible if and only if a globally attracting state exists.

(ii) If a globally attracting state x^* exists, then the unique minimal set is equal to A_+(x^*) = Ω_+(x^*).
We can now provide the desired connection between irreducibility of the nonlinear
control system and ψ-irreducibility for the corresponding Markov chain.
Theorem 7.2.6. Suppose that CM(F ) is forward accessible and the disturbance process
of the associated NSS(F ) model satisfies the density condition (NSS3).
Then the NSS(F ) model is ψ-irreducible if and only if CM(F ) is M -irreducible.
Proof If the NSS(F) model is ψ-irreducible, let x^* be any state in supp ψ, and let U be any open set containing x^*. By definition we have ψ(U) > 0, which implies that K_{a_ε}(x, U) > 0 for all x ∈ X. By Proposition 7.2.2 it follows that x^* is globally attracting, and hence CM(F) is M-irreducible by Proposition 7.2.5.
Conversely, suppose that CM(F) possesses a globally attracting state x^*, and let U be an open petite set containing x^*. Then A_+(x) ∩ U ≠ ∅ for all x ∈ X, which by Proposition 7.2.2 and Proposition 5.5.4 implies that the NSS(F) model is ψ-irreducible for some ψ.
This will be denoted B →_k C. From the Implicit Function Theorem, in a manner similar to the proof of Proposition 7.1.2, we can immediately connect k-accessibility with forward accessibility.
Proposition 7.3.1. Suppose that the CM(F) model is forward accessible. Then for each x ∈ X there exist open sets B_x, C_x ⊂ X, with x ∈ B_x, and an integer k_x ∈ Z_+ such that B_x →_{k_x} C_x.
Proof Using Proposition 7.3.1 we find that there exist open sets B and C, and an integer k with B →_k C, such that B ∩ M ≠ ∅. Since M is invariant, it follows that

    C ⊂ A_+(B ∩ M) ⊂ M,    (7.15)

and by Proposition 7.2.1, minimality, and the hypothesis that the set B is open,

    A_+(x) ∩ B ≠ ∅    (7.16)

for every x ∈ M.
Combining (7.15) and (7.16), it follows that A_+^m(c) ∩ B ≠ ∅ for some m ∈ Z_+ and c ∈ C. By continuity of the function F we conclude that there exists an open set E ⊂ C such that

    A_+^m(x) ∩ B ≠ ∅    for all x ∈ E.

The set E satisfies the conditions of the lemma with n = m + k since, by the semi-group property,

    A_+^n(x) = A_+^k(A_+^m(x)) ⊃ A_+^k(A_+^m(x) ∩ B) ⊃ C ⊃ E

for all x ∈ E.
Call a finite ordered collection of disjoint closed sets G := {G_i : 1 ≤ i ≤ d} a periodic orbit if for each i,

    A_+^1(G_i) ⊆ G_{i+1},    i = 1, . . . , d  (mod d).
Proof Using Lemma 7.3.2 we can fix an open set E with E ⊂ M, and an integer k such that E →_k E. Define I ⊂ Z_+ by

    I := {n ≥ 1 : E →_n E}.    (7.17)
The semi-group property implies that the set I is closed under addition: for if i, j ∈ I, then for all x ∈ E,

    A_+^{i+j}(x) = A_+^j(A_+^i(x)) ⊃ A_+^j(E) ⊃ E.
Let d denote g.c.d.(I). The integer d will be called the period of M , and M will be
called aperiodic when d = 1.
For 1 ≤ i ≤ d we define

    G_i := ⋃_{k=1}^∞ {x ∈ M : A_+^{kd−i}(x) ∩ E ≠ ∅}.    (7.18)

By Proposition 7.2.1 it follows that M = ⋃_{i=1}^d G_i.
Since E is an open subset of M , it follows that for each i ∈ Z+ , the set Gi is open
in the relative topology on M . Once we have shown that the sets {Gi } are disjoint, it
will follow that they are closed in the relative topology on M . Since M itself is closed,
this will imply that for each i, the set Gi is closed.
We now show that the sets {G_i} are disjoint. Suppose that on the contrary x ∈ G_i ∩ G_j for some i ≠ j. Then there exist k_i, k_j ∈ Z_+ such that

    A_+^{k_i d−i}(y) ∩ E ≠ ∅    and    A_+^{k_j d−j}(y) ∩ E ≠ ∅    (7.19)

when y = x. Since E is open, we may find an open set O ⊂ X containing x such that (7.19) holds for all y ∈ O.
By Proposition 7.2.1, there exist v ∈ E and n ∈ Z_+ such that

    A_+^n(v) ∩ O ≠ ∅.    (7.20)
By (7.20), (7.19), and since E →_{k_0} E, we have for δ = i, j, and all z ∈ E,

    A_+^{k_0 + k_δ d − δ + n + k_0}(z) ⊃ A_+^{k_0 + k_δ d − δ + n}(E)
                                      ⊃ A_+^{k_0 + k_δ d − δ}(A_+^n(v) ∩ O)
                                      ⊃ A_+^{k_0}(A_+^{k_δ d − δ}(A_+^n(v) ∩ O) ∩ E) ⊃ E.

This shows that

    2k_0 + k_δ d − δ + n ∈ I
for δ = i, j, and this contradicts the definition of d. We conclude that the sets {Gi }
are disjoint.
We now show that G is a periodic orbit. Let x ∈ G_i and u ∈ O_w. Since the sets {G_i} form a disjoint cover of M, and since M is invariant, there exists a unique 1 ≤ j ≤ d such that F(x, u) ∈ G_j. It follows from the semi-group property that x ∈ G_{j−1}, and hence i = j − 1.
The uniqueness of this construction follows from the definition given in equation
(7.18).
The following consequence of Theorem 7.3.3 further illustrates the topological struc-
ture of minimal sets.
Proposition 7.3.4. Under the conditions of Theorem 7.3.3, if the control set Ow is
connected, then the periodic orbit G constructed in Theorem 7.3.3 is precisely equal to
the connected components of the minimal set M .
In particular, in this case M is aperiodic if and only if it is connected.
Proof   First suppose that M is aperiodic. Let E →^n E, and consider a fixed state
v ∈ E.
By aperiodicity and Lemma D.7.4 there exists an integer N_0 with the property that
k ∈ I for all k ≥ N_0. Since A^k_+(v) is the continuous image of the connected set
{v} × O_w^k, the set

A_+(A^{N_0}_+(v)) = ⋃_{k=N_0}^∞ A^k_+(v)    (7.22)

is connected. Its closure is therefore also connected, and by Proposition 7.2.1 the closure
of the set (7.22) is equal to M.
The periodic case is treated similarly. First we show that for some N_0 ∈ Z_+ we have

G_d = ⋃_{k=N_0}^∞ A^{kd}_+(v),

where d is the period of M, and each of the sets A^{kd}_+(v), k ≥ N_0, contains v.
This shows that G_d is connected. Next, observe that

G_1 = A^1_+(G_d),

and since the control set O_w and G_d are both connected, it follows that G_1 is also
connected. By induction, each of the sets {G_i : 1 ≤ i ≤ d} is connected.
7.3.2 Periodicity
All of the results described above dealing with periodicity of minimal sets were posed
in a purely deterministic framework. We now return to the stochastic model described
by (NSS1)–(NSS3) to see how the deterministic formulation of periodicity relates to the
stochastic definition which was introduced for Markov chains in Section 5.4.
As one might hope, the connections are very strong.
Theorem 7.3.5. If the NSS(F ) model satisfies conditions (NSS1)–(NSS3) and the
associated control model CM(F ) is forward accessible then:
(i) if M is a minimal set, then the restriction of the NSS(F ) model to M is a ψ-
irreducible T-chain, and the periodic orbit {Gi : 1 ≤ i ≤ d} ⊂ M whose existence
is guaranteed by Theorem 7.3.3 is ψ-a.e. equal to the d-cycle constructed in The-
orem 5.4.4;
(ii) if CM(F ) is M -irreducible, and if its unique minimal set M is aperiodic, then the
NSS(F ) model is a ψ-irreducible aperiodic T-chain.
Proof The proof of (i) follows directly from the definitions, and the observation
that by reducing E if necessary, we may assume that the set E which is used in the proof
of Theorem 7.3.3 is small. Hence the set E plays the same role as the small set used in
the proof of Theorem 5.2.1. The proof of (ii) follows from (i) and Theorem 7.2.4.
The control model is thus forward accessible, and hence Φ = (θ, Y) is a T-chain.
Suppose now that the bound (7.24) holds for z^∗ and let w^∗ denote any element of
O_w ⊆ R. If Z_k and W_k are set equal to z^∗ and w^∗ respectively in (7.23) then as k → ∞

(θ_k, Y_k)′ → x^∗ := ( z^∗(1 − α)^{−1}, w^∗(1 − α)(1 − α − z^∗)^{−1} )′.
The state x∗ is globally attracting, and it immediately follows from Proposition 7.2.5
and Theorem 7.2.6 that the chain is ψ-irreducible. Aperiodicity then follows from the
fact that any cycle must contain the state x∗ .
which is of the form (NSS1), with the associated CM(F) model defined as

F((x^a, x^b)′, u) = (−1/x^a + 1/x^b, x^a)′ + (u, 0)′.    (7.25)
Proposition 7.4.2. The NSS(F ) model (2.12) is a T-chain if the disturbance sequence
W satisfies condition (NSS3).
Proposition 7.4.3. If (SAC1) and (SAC2) hold for the adaptive control model defined
by (2.22)–(2.24), and if σ_z^2 < 1, then Φ is a ψ-irreducible and aperiodic T-chain.
Proof To prove the result we show that the associated deterministic control model
for the nonlinear state space model defined by (2.22)–(2.24) is forward accessible and,
for the associated deterministic control system, a globally attracting point exists.
The second-order controllability matrix has the form

C^2_{Φ_0}(Z_2, W_2, Z_1, W_1) := ∂(Σ_2, θ̃_2, Y_2)/∂(Z_2, W_2, Z_1, W_1)

    = [ −2α^2 σ_w^2 Σ_1^2 Y_1/(Σ_1 Y_1^2 + σ_w^2)^2   0   0   0 ]
      [ •                                             •   1   • ]
      [ •                                             •   0   1 ]

where “•” denotes a variable which does not affect the rank of the controllability
matrix. It is evident that C^2_{Φ_0} is full rank whenever Y_1 = θ̃_0 Y_0 + W_1 is non-zero.
This shows that for each initial condition Φ_0 ∈ X, the matrix C^2_{Φ_0} is full rank for a.e.
{(Z_1, W_1), (Z_2, W_2)} ∈ R^4, and so the associated control model is forward accessible,
and hence the stochastic model is a T-chain by Proposition 7.1.5.
It is easily checked that if W is set equal to zero in (2.22)–(2.23) then, since α < 1
and σ_z^2 < 1,

Φ_k → (σ_z^2/(1 − α^2), 0, 0)  as k → ∞.
This shows that the control model associated with the Markov chain Φ is M-irreducible,
and hence by Proposition 7.2.6 the chain itself is ψ-irreducible. The limit above also
shows that every element of a cycle {G_i} for the unique minimal set must contain the
point (σ_z^2/(1 − α^2), 0, 0). From Proposition 7.3.4 it follows that the chain is aperiodic.
where ∇Φ takes values in the set of n × n matrices, and DF denotes the derivative of
F with respect to its first variable.
Since ∇Φ_0 = I it follows from the chain rule and induction that the sensitivity process
is in fact the derivative of the present state with respect to the initial state: that is,

∇Φ_k = dΦ_k/dΦ_0  for all k ∈ Z_+.
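As a purely illustrative aside (not from the original text), the chain-rule recursion behind this identity, ∇Φ_{k+1} = DF(Φ_k, W_{k+1})∇Φ_k, can be computed along a simulated path; the scalar model F(x, w) = 0.5 sin(x) + w below is an assumed toy example, and the finite-difference comparison is only a numerical sanity check.

import numpy as np

# A toy scalar NSS(F) model (an assumed example): F(x, w) = 0.5*sin(x) + w,
# with DF(x, w) = 0.5*cos(x) its derivative in the first variable.
def F(x, w):
    return 0.5 * np.sin(x) + w

def DF(x, w):
    return 0.5 * np.cos(x)

rng = np.random.default_rng(0)
W = rng.normal(size=50)            # one disturbance sample path
x, grad = 1.0, 1.0                 # Phi_0 and d Phi_0 / d Phi_0 = 1
for w in W:
    grad = DF(x, w) * grad         # grad_{k+1} = DF(Phi_k, W_{k+1}) * grad_k
    x = F(x, w)                    # Phi_{k+1} = F(Phi_k, W_{k+1})

# Finite-difference sanity check that grad = d Phi_k / d Phi_0 along this path
h, xp = 1e-6, 1.0 + 1e-6
for w in W:
    xp = F(xp, w)
print(grad, (xp - x) / h)          # the two values should agree closely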
The main result in this section connects stability of the derivative process with
equicontinuity of the transition function for Φ. Since the system (7.26) is closely related
to the system (NSS1), linearized about the sample path (Φ0 , Φ1 , . . . ), it is reasonable
to expect that the stability of Φ will be closely related to the stability of ∇Φ .
Theorem 7.5.1. Suppose that (NSS1)–(NSS3) hold for the NSS(F) model. Then, letting
∇Φ_k denote the derivative of Φ_k with respect to Φ_0, k ∈ Z_+, we have

(i) if for some open convex set N ⊂ X,

E[ sup_{Φ_0 ∈ N} ‖∇Φ_k‖ ] < ∞    (7.27)

sup_{y ∈ C} sup_{k ≥ 0} E_y[ ‖∇Φ_k‖ ] < ∞.

Then Φ is an e-chain.
Then letting ∇Φ_k denote the derivative of Φ_k with respect to Φ_0, we have for any compact
set C ⊂ X, and any k ≥ 0,

E[ sup_{Φ_0 ∈ C} ‖∇Φ_k‖ ] < ∞.
∇Φ_k = F^k,

which tends to zero exponentially fast, by Lemma 6.3.4. The conditions of Theo-
rem 7.5.1 are thus satisfied, which completes the proof.
Observe that Proposition 7.5.3 uses the eigenvalue condition (LSS5), the same as-
sumption which was used in Proposition 4.4.3 to obtain ψ-irreducibility for the Gaussian
model, and the same condition that will be used to obtain stability in later chapters.
The analogous Proposition 6.3.3 uses controllability to give conditions under which
the linear state space model is a T-chain. Note that controllability is not required here.
Other specific nonlinear models, such as bilinear models, can be analyzed similarly
using this approach.
7.6 Commentary*
We have already noted that in the degenerate case where the control set Ow consists
of a single point, the NSS(F ) model defines a semi-dynamical system with state space
X, and in fact many of the concepts introduced in this chapter are generalizations of
standard concepts from dynamical systems theory.
Three standard approaches to the qualitative theory of dynamical systems are topo-
logical dynamics whose principal tool is point set topology; ergodic theory, where one
assumes (or proves, frequently using a compactness argument) the existence of an er-
godic invariant measure; and finally, the direct method of Lyapunov, which concerns
criteria for stability.
The latter two approaches will be developed in a stochastic setting in Parts II and
III. This chapter essentially focused on generalizations of the first approach, which is
also based upon, to a large extent, the structure and existence of minimal sets. Two
excellent expositions in a purely deterministic and control-free setting are the books by
Bhatia and Szegö [34] and Brown [55]. Saperstone [346] considers infinite dimensional
spaces so that, in particular, the methods may be applied directly to the dynamical
system on the space of probability measures which is generated by a Markov process.
The connections between control theory and irreducibility described here are taken
from Meyn [259] and Meyn and Caines [272, 271]. The dissertations of Chan [61] and
Mokkadem [286], and also Diebolt and Guégan [92], treat discrete time nonlinear state
space models and their associated control models. Diebolt in [91] considers nonlinear
models with additive noise of the form Φk +1 = F (Φk ) + Wk +1 using an approach which
is very different to that described here.
Jakubczyk and Sontag in [173] present a survey of the results obtainable for forward
accessible discrete time control systems in a purely deterministic setting. They give a
different characterization of forward accessibility, based upon the rank of an associated
Lie algebra, rather than a controllability matrix.
The origin of the approach taken in this chapter lies in the often cited paper by
Stroock and Varadhan [378]. There it is shown that the support of the distribution of
a diffusion process may be characterized by considering an associated control model.
Ichihara and Kunita in [167] and Kliemann in [211] use this approach to develop an
ergodic theory for diffusions. The invariant control sets of [211] may be compared to
minimal sets as defined here.
At this stage, introduction of the e-chain class of models is not well motivated. The
reader who wishes to explore them immediately should move to Chapter 12.
In Duflo [102], a condition closely related to the stability condition which we impose
on ∇Φ is used to obtain the Central Limit Theorem for a nonlinear state space model.
Duflo assumes that the function F satisfies a contraction bound of the form

|F(x, w) − F(y, w)| ≤ α(w)|x − y|,  with  E[α(W)^m] < 1  for some m ≥ 1.
It is easy to see that any process Φ generated by a nonlinear state space model satisfying
this bound is an e-chain.
For models more complex than the linear model of Section 7.5.2 it will not be as easy
to prove that ∇Φ converges to zero, so a lengthier stability analysis of this sensitivity
process may be necessary. Since ∇Φ is essentially generated by a random linear system
it is therefore likely to either converge to zero or evanesce.
It seems probable that the stochastic Lyapunov function approach of Kushner [232]
or Khas’minskii [206], or a more direct analysis based upon limit theorems for products
of random matrices as developed in, for instance, Furstenberg and Kesten [134] will be
well suited for assessing the stability of ∇Φ .
Commentary for the second edition: The conjecture voiced in the first edition
was confirmed ten years after it was first put into print. A stochastic Lyapunov approach
is introduced in [165] for verification of stability of the sensitivity process¹ for a class
of Markov models.
A significant omission in the first edition is any discussion of the relationship between
stability of the sensitivity process ∇Φ and Lyapunov exponents (see [212, 255]). For a
¹ The sensitivity process was called the derivative process in the first edition.
given initial condition x, the top Lyapunov exponent is defined as the random variable

Λ_x := lim sup_{n→∞} (1/n) log ‖∇Φ_n‖.
The choice of norm is arbitrary. There is also a version defined in expectation: for any
p > 0 denote

Λ_x(p) := lim sup_{n→∞} (1/n) log E_x[ ‖∇Φ_n‖^p ].
One approach to establishing the e-chain property is to show that Λx (p) is independent
of x, and negative for all p sufficiently small [165].
Methods for estimating the Lyapunov exponent and conditions for verifying equicon-
tinuity are established for versions of the NSS(F ) model, in continuous or discrete time,
in several recent papers under a variety of assumptions [370, 371, 22, 165, 20, 323].
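As a rough illustration of these quantities (a sketch under stated assumptions, not a method taken from the papers cited), the top Lyapunov exponent of a scalar model can be estimated by Monte Carlo, using the fact that for scalar chains log ‖∇Φ_n‖ is the sum of the terms log |DF(Φ_k, W_{k+1})|; the model is the same assumed toy example F(x, w) = 0.5 sin(x) + w.

import numpy as np

# Monte Carlo estimate of Lambda_x for an assumed toy model.
# For a scalar chain, log|grad Phi_n| = sum_k log|DF(Phi_k, W_{k+1})|.
rng = np.random.default_rng(1)
n, paths, x0 = 2000, 200, 1.0
est = []
for _ in range(paths):
    x, s = x0, 0.0
    for _ in range(n):
        w = rng.normal()
        s += np.log(abs(0.5 * np.cos(x)) + 1e-300)  # log |DF(Phi_k, W_{k+1})|
        x = 0.5 * np.sin(x) + w                     # Phi_{k+1} = F(Phi_k, W_{k+1})
    est.append(s / n)
print(np.mean(est))  # negative here, consistent with stability of grad Phi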
A hidden Markov model (HMM) is a Markov chain Φ, along with an observation
process Y evolving on a state space Y. It is assumed that there is an i.i.d. sequence D
evolving on its own state space D, along with a function G : X × D → Y such that the
observation process can be expressed as a noisy function of the chain
Yn = G(Φn , Dn ), n ≥ 0.
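A minimal simulation sketch of this structure, with an assumed two-state chain and Gaussian observation noise (the transition matrix P and the function G are illustrative choices only):

import numpy as np

# Assumed ingredients: a two-state chain Phi with transition matrix P, and
# an observation function G(x, d) = x + 0.5*d with i.i.d. Gaussian D_n.
rng = np.random.default_rng(2)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # chain on X = {0, 1}

def G(x, d):                        # observations on Y = R
    return float(x) + 0.5 * d

x, chain, obs = 0, [], []
for _ in range(10):
    d = rng.normal()                # D_n, i.i.d. on D = R
    chain.append(x)
    obs.append(G(x, d))             # Y_n = G(Phi_n, D_n)
    x = rng.choice(2, p=P[x])       # Phi_{n+1} ~ P(Phi_n, .)
print(chain)
print([round(y, 2) for y in obs])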
STABILITY STRUCTURES

Chapter 8

Transience and recurrence
(ii) there is a countable cover of X with uniformly transient sets, in which case we call
Φ transient, and every petite set is uniformly transient.
A second goal of this chapter is the development of criteria based on the drift function
for both transience and recurrence.
(i) The chain Φ is transient if and only if there exists a bounded non-negative function
V and a set C ∈ B^+(X) such that for all x ∈ C^c,

ΔV(x) ≥ 0    (8.2)

and

D = {x : V(x) > sup_{y ∈ C} V(y)} ∈ B^+(X).    (8.3)
(ii) The chain Φ is recurrent if there exists a petite set C ⊂ X, and a function V
which is unbounded off petite sets in the sense that C_V(n) := {y : V(y) ≤ n} is
petite for all n, such that

ΔV(x) ≤ 0,  x ∈ C^c.    (8.4)
Proof The drift criterion for transience is proved in Theorem 8.4.2, whilst the
condition for recurrence is in Theorem 8.4.3.
Such conditions were developed by Lyapunov as criteria for stability in deterministic
systems, by Khas’minskii and others for stochastic differential equations [206, 232], and
by Foster as criteria for stability for Markov chains on a countable space: Theorem 8.0.2
is originally due (for countable spaces) to Foster [129] in essentially the form given above.
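To make the criterion concrete, here is a minimal numerical sketch (an illustrative example, not from the text) checking the recurrence condition (8.4) for a random walk on Z_+ reflected at 0 with negative mean increment, using V(x) = x:

# An assumed chain: random walk on Z_+ reflected at 0, increments -1, 0, +1
# with probabilities 0.4, 0.3, 0.3, so the mean increment is -0.1.
probs = {-1: 0.4, 0: 0.3, 1: 0.3}

def drift(x):
    # Delta V(x) = sum_y P(x, y) V(y) - V(x) for V(x) = x, reflection at 0
    return sum(p * max(x + w, 0) for w, p in probs.items()) - x

for x in [1, 5, 50]:
    print(x, drift(x))   # -0.1 at each x >= 1, so (8.4) holds off C = {0}

Here ΔV(x) = −0.1 for every x ≥ 1, and the sublevel sets of V are finite (hence petite for this irreducible chain), so Theorem 8.0.2 (ii) applies with this V and C = {0}.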
There is in fact a converse to Theorem 8.0.2 (ii) also, but only for ψ-irreducible
Feller chains (which include all countable space chains): we prove this in Section 9.4.2.
It is not known whether a converse holds in general.
Recurrence is also often phrased in terms of the hitting time variables τA = inf{k ≥
1 : Φk ∈ A}, with “recurrence” for a set A being defined by L(x, A) = Px (τA < ∞) = 1
for all x ∈ A. The connections between this condition and recurrence as we have defined
it above are simple in the countable state space case: the conditions are in fact equivalent
when A is an atom. In general spaces we do not have such complete equivalence.
Recurrence properties in terms of τA (which we call Harris recurrence properties) are
much deeper and we devote much of the next chapter to them. In this chapter we do
however give some of the simpler connections: for example, if L(x, A) = 1 for all x ∈ A
then ηA = ∞ a.s. when Φ0 ∈ A, and hence A is recurrent (see Proposition 8.3.1).
Classification of states
The state α is called transient if Eα (ηα ) < ∞, and recurrent if Eα (ηα ) =
∞.
From the definition U(x, y) = Σ_{n=1}^∞ P^n(x, y) we have immediately that for any
states x, y ∈ X

E_x[η_y] = U(x, y).    (8.5)
The following result gives a structural dichotomy which enables us to consider, not just
the stability of states, but of chains as a whole.
Proposition 8.1.1. When X is countable and Φ is irreducible, either U (x, y) = ∞ for
all x, y ∈ X or U (x, y) < ∞ for all x, y ∈ X.
Hence the series U (x, y) and U (u, v) all converge or diverge simultaneously, and the
result is proved.
Now we can extend these stability concepts for states to the whole chain.
The solidarity results of Proposition 8.1.3 and Proposition 8.1.1 enable us to classify
irreducible chains by the property possessed by one and then all states.
Theorem 8.1.2. When Φ is irreducible, either Φ is transient or Φ is recurrent.
We can say, in the countable case, exactly what recurrence or transience means in
terms of the return time probabilities L(x, x). In order to connect these concepts, for
a fixed n consider the event {Φn = α}, and decompose this event over the mutually
exclusive events {Φn = α, τα = j} for j = 1, . . . , n. Since Φ is a Markov chain, this
provides the first-entrance decomposition of P^n given for n ≥ 1 by

P^n(x, α) = P_x{τ_α = n} + Σ_{j=1}^{n−1} P_x{τ_α = j} P^{n−j}(α, α).    (8.7)
then multiplying (8.7) by z n and summing from n = 1 to ∞ gives for |z| < 1
Proof   Consider the first entrance decomposition in (8.10) with x = α: this gives

U^{(z)}(α, α) = L^{(z)}(α, α) / [1 − L^{(z)}(α, α)].    (8.11)
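Written out (a restatement of this step in LaTeX, assuming only the standard definitions of U^{(z)} and L^{(z)} as the generating functions of P^n(α, α) and P_α(τ_α = n)):

\begin{align*}
U^{(z)}(\alpha,\alpha)
  &= \sum_{n\ge 1} z^n\Big( P_\alpha\{\tau_\alpha = n\}
     + \sum_{j=1}^{n-1} P_\alpha\{\tau_\alpha = j\}\, P^{n-j}(\alpha,\alpha)\Big)\\
  &= L^{(z)}(\alpha,\alpha) + L^{(z)}(\alpha,\alpha)\,U^{(z)}(\alpha,\alpha),
\end{align*}

and solving gives (8.11); letting z ↑ 1 then shows U(α, α) = ∞ exactly when L(α, α) = 1.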
Proof From Proposition 8.1.3 and Proposition 8.1.1, we have L(x, x) < 1 for all x
or L(x, x) = 1 for all x. Suppose in the latter case, we have L(x, y) < 1 for some pair
x, y: by irreducibility, U (y, x) > 0 and thus for some n we have Py (Φn = x, τy > n) > 0,
from which we have L(y, y) < 1, which is a contradiction.
In Chapter 9 we will define Harris recurrence as the property that L(x, A) ≡ 1 for
all x ∈ A and A ∈ B+ (X): for countable chains, we have thus shown that recurrent
chains are also Harris recurrent, a theme we return to in the next chapter when we
explore stability in terms of L(x, A) in more detail.
also. Hence the forward recurrence time chain is always recurrent if p is a proper
distribution.
The calculation in the proof of Proposition 8.1.3 is actually a special case of the use
of the renewal equation. Let Z_n be a renewal process with increment distribution p as
defined in Section 2.4. By breaking up the event {Z_k = n} over the last time before n
that a renewal occurred we have

u(n) := Σ_{k=0}^∞ P(Z_k = n) = δ_0(n) + u ∗ p(n)
Random walk on Z
In order to classify general random walk on the integers we will use the laws of large
numbers. Proving these is outside the scope of this book: see, for example, Billingsley
[37] or Chung [72] for these results.
Suppose that Φ_n is a random walk such that the increment distribution Γ has a
mean which is zero. The form of the Weak Law of Large Numbers that we will use can
be stated in our notation as

P^n(0, A(εn)) → 1    (8.14)

for any ε > 0, where A(k) = {y : |y| ≤ k}. From this we prove
Theorem 8.1.5. If Φ is an irreducible random walk on Z whose increment distribution
Γ has mean zero, then Φ is recurrent.
P^k(0, A(ak)) → 1, and hence

lim_{N→∞} (2aN)^{−1} Σ_{k=1}^N P^k(0, A(ak)) = [2a]^{−1}.    (8.17)

Since a can be chosen arbitrarily small, we have U(0, 0) = ∞ and the chain is recurrent.
This proof clearly uses special properties of random walk. If Γ has simpler structure
then we shall see that simpler procedures give recurrence in Section 8.4.3.
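A quick simulation sketch of this recurrence (illustrative only): for the simple random walk with increments ±1, the mean number of visits to 0 grows without bound as the horizon increases, consistent with U(0, 0) = ∞.

import numpy as np

# Simple random walk on Z: increments +1/-1 with probability 1/2 each.
rng = np.random.default_rng(3)
paths = 400
for N in [100, 1000, 10000]:
    visits = 0
    for _ in range(paths):
        path = np.cumsum(rng.choice([-1, 1], size=N))
        visits += np.count_nonzero(path == 0)
    print(N, visits / paths)   # grows roughly like sqrt(N), without bound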
_AP^n(x, B) = P_x{Φ_n ∈ B, τ_A ≥ n},
P^n(x, B) = _AP^n(x, B) + Σ_{j=1}^{n−1} ∫_A _AP^j(x, dw) P^{n−j}(w, B)    (8.18)

P^n(x, B) = _AP^n(x, B) + Σ_{j=1}^{n−1} ∫_A P^j(x, dw) _AP^{n−j}(w, B).    (8.19)
U_A^{(z)}(x, B) := Σ_{n=1}^∞ _AP^n(x, B) z^n,  |z| < 1.    (8.21)
We will prove the solidarity results we require by exploiting the convolution forms in
(8.18) and (8.19). Multiplying by z^n in (8.18) and (8.19) and summing, the first entrance
and last exit decompositions give, respectively, for |z| < 1

U^{(z)}(x, B) = U_A^{(z)}(x, B) + ∫_A U_A^{(z)}(x, dw) U^{(z)}(w, B),    (8.25)

U^{(z)}(x, B) = U_A^{(z)}(x, B) + ∫_A U^{(z)}(x, dw) U_A^{(z)}(w, B).    (8.26)
Proof   (i) If A ∈ B^+(X) then for any x we have r, s such that P^r(x, α) > 0,
P^s(α, A) > 0, and so

Σ_n P^{r+s+n}(x, A) ≥ P^r(x, α) [Σ_n P^n(α, α)] P^s(α, A) = ∞.    (8.27)

Hence the series U(x, A) diverges for every x, A when U(α, α) diverges.
(ii) To prove the converse, we first note that for an atom, transience is equivalent
to L(α, α) < 1, exactly as in Proposition 8.1.3.
Now consider the last exit decomposition (8.26) with A, B = α. We have for any
x ∈ X

U^{(z)}(x, α) = U_α^{(z)}(x, α) + U^{(z)}(x, α) U_α^{(z)}(α, α)

and so by rearranging terms we have for all z < 1

U^{(z)}(x, α) = U_α^{(z)}(x, α)[1 − U_α^{(z)}(α, α)]^{−1} ≤ [1 − L(α, α)]^{−1} < ∞.
α(j) = {y : Σ_{n=1}^j P^n(y, α) > j^{−1}}.
Transient sets
If A ∈ B(X) can be covered with a countable number of uniformly transient
sets, then we call A transient.
We first check that the split chain and the original chain have mutually consistent
recurrent/transient classifications.
Proposition 8.2.2. Suppose that Φ is ψ-irreducible and strongly aperiodic. Then either
both Φ and Φ̌ are recurrent, or both Φ and Φ̌ are transient.
If B ∈ B + (X) then since ψ ∗ (B0 ) > 0 it follows from (8.28) that if Φ̌ is recurrent, so is
Φ. Conversely, if Φ̌ is transient, by taking a cover of X̌ with uniformly transient sets it
is equally clear from (8.28) that Φ is transient.
We know from Theorem 8.2.1 that Φ̌ is either transient or recurrent, and so the
dichotomy extends in this way to Φ.
To extend this result to general chains without atoms we first require a link between
the recurrence of the chain and its resolvent.
Lemma 8.2.3. For any 0 < ε < 1 the following identity holds:

Σ_{n=1}^∞ K_{a_ε}^n = ((1 − ε)/ε) Σ_{n=0}^∞ P^n.
we see that B(z) = ((1 − ε)/ε)(1 − z)−1 . By uniqueness of the power series expansion
it follows that b(n) = (1 − ε)/ε for all n, which completes the proof.
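The identity can also be checked numerically on a finite, strictly substochastic kernel, where every series converges (a sketch; the matrix P is an arbitrary illustrative choice):

import numpy as np

# Assumed example: a strictly substochastic 2x2 kernel P.
eps = 0.3
P = np.array([[0.2, 0.5],
              [0.1, 0.3]])                      # row sums < 1
I = np.eye(2)

K = (1 - eps) * np.linalg.inv(I - eps * P)      # K_{a_eps} = (1-eps) sum_n eps^n P^n
lhs = K @ np.linalg.inv(I - K)                  # sum_{n>=1} K^n
rhs = (1 - eps) / eps * np.linalg.inv(I - P)    # ((1-eps)/eps) sum_{n>=0} P^n
print(np.allclose(lhs, rhs))                    # True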
As an immediate consequence of Lemma 8.2.3 we have
Proposition 8.2.4. Suppose that Φ is ψ-irreducible.
(i) The chain Φ is transient if and only if each K_{a_ε}-chain is transient.

(ii) The chain Φ is recurrent if and only if each K_{a_ε}-chain is recurrent.
Proof   From Proposition 5.4.5 we are assured that the K_{a_ε}-chain is strongly ape-
riodic. Using Proposition 8.2.2 we know then that each K_{a_ε}-chain can be classified
dichotomously as recurrent or transient.
Since Lemma 8.2.3 shows that the K_{a_ε}-chain passes on either of these properties
to Φ itself, the result is proved.
We also have the following analogue of Proposition 8.2.4:
(i) The chain Φ is transient if and only if one, and then every, m-skeleton Φm is
transient.
(ii) The chain Φ is recurrent if and only if one, and then every, m-skeleton Φm is
recurrent.
Proof
(i) If A is a uniformly transient set for the m-skeleton Φ^m, with Σ_j P^{jm}(x, A) ≤ M,
then we have from the Chapman–Kolmogorov equations

Σ_{j=1}^∞ P^j(x, A) = Σ_{r=1}^m ∫ P^r(x, dy) Σ_j P^{jm}(y, A) ≤ mM.    (8.29)
(ii) If the m-skeleton is recurrent then from the equality in (8.29) we again have
that

Σ_j P^j(x, A) = ∞,  x ∈ X, A ∈ B^+(X),    (8.30)
Proposition 8.3.1. Suppose that Φ is a Markov chain, but not necessarily irreducible.
(i) If any set A ∈ B(X) is uniformly transient with U (x, A) ≤ M for x ∈ A, then
U (x, A) ≤ 1 + M for every x ∈ X.
(ii) If any set A ∈ B(X) satisfies L(x, A) = 1 for all x ∈ A, then A is recurrent. If Φ
is ψ-irreducible, then A ∈ B+ (X) and we have U (x, A) ≡ ∞ for x ∈ X.
(iii) If any set A ∈ B(X) satisfies L(x, A) ≤ ε < 1 for x ∈ A, then we have U (x, A) ≤
1/[1 − ε] for x ∈ X, so that in particular A is uniformly transient.
(iv) Let τ_A(k) denote the k-th return time to A, and suppose that for some m
as claimed.
(iii) Suppose on the other hand that L(x, A) ≤ ε < 1, x ∈ A. The last exit
decomposition again gives

U^{(z)}(x, A) = U_A^{(z)}(x, A) + ∫_A U^{(z)}(x, dy) U_A^{(z)}(y, A) ≤ 1 + εU^{(z)}(x, A)
≤ ε^{k+1},

and so for x ∈ A

U(x, A) = Σ_{n=1}^∞ P_x(η_A ≥ n)
       ≤ m[1 + Σ_{k=1}^∞ P_x(η_A ≥ km)]    (8.35)
       ≤ m/[1 − ε].
We now use (i) to give the required bound over all of X.
If there is one uniformly transient set then it is easy to identify other such sets, even
without irreducibility. We have
Proposition 8.3.2. If A is uniformly transient, and B ⇝^a A for some a, then B is
uniformly transient. Hence if A is uniformly transient, there is a countable covering of
Ā by uniformly transient sets.

Proof   From Lemma 5.5.2 (iii), we have when B ⇝^a A that for some δ > 0,

U(x, A) ≥ ∫ U(x, dy) K_a(y, A) ≥ δ U(x, B)

and thus (8.31) holds for B(m); from (8.35) it follows that B(m) is uniformly transient.
These results have direct application in the ψ-irreducible case. We next give a
number of such consequences.
Proof   Suppose Φ is not recurrent: that is, there exists some pair A ∈ B^+(X),
x^∗ ∈ X with U(x^∗, A) < ∞. If A^∗ = {y : U(y, A) = ∞}, then ψ(A^∗) = 0: for otherwise
A^∗ ∈ B^+(X), and then U(x^∗, A) ≥ ∫_{A^∗} P^n(x^∗, dy) U(y, A) = ∞ for some n, which is a
contradiction.
Set A_r = {y ∈ A : U(y, A) ≤ r}. Since ψ(A) > 0, and A_r ↑ A ∩ [A^∗]^c, there must exist
some r such that ψ(A_r) > 0, and by Proposition 8.3.1 (i) we have for all y,

U(y, A_r) ≤ 1 + r.    (8.37)
Consider now A_r(M) = {y : Σ_{m=1}^M P^m(y, A_r) > M^{−1}}. For any x, from (8.37)

M(1 + r) ≥ M U(x, A_r) ≥ Σ_{m=1}^M Σ_{n=m}^∞ P^n(x, A_r)
         = Σ_{n=0}^∞ ∫_X P^n(x, dw) Σ_{m=1}^M P^m(w, A_r)    (8.38)
         ≥ Σ_{n=0}^∞ ∫_{A_r(M)} P^n(x, dw) Σ_{m=1}^M P^m(w, A_r)
         ≥ M^{−1} Σ_{n=0}^∞ P^n(x, A_r(M)).
Since ψ(A_r) > 0 we have ⋃_m A_r(m) = X, and so the {A_r(m)} form a cover of X by
uniformly transient sets as required.

The cover of X by uniformly transient sets given in Proposition 8.3.2 and in
Theorem 8.3.4 leads immediately to
Theorem 8.3.5. If Φ is ψ-irreducible and transient, then every petite set is uniformly
transient.
Proof   If C is petite, then by Proposition 5.5.5 (iii) there exists a sampling distri-
bution a such that C ⇝^a B for any B ∈ B^+(X). If Φ is transient then there exists at
least one B ∈ B^+(X) which is uniformly transient, so that C is uniformly transient from
Proposition 8.3.2.
Thus petite sets are also “small” within the transience definitions. This gives us a
criterion for recurrence which we shall use in practice for many models; we combine it
with a criterion for transience in
(i) Φ is recurrent if there exists some petite set C ∈ B(X) such that L(x, C) ≡ 1 for
all x ∈ C.
(ii) Φ is transient if and only if there exist two sets D, C in B + (X) with L(x, C) < 1
for all x ∈ D.
Proof (i) From Proposition 8.3.1 (ii) C is recurrent. Since C is petite Theo-
rem 8.3.5 shows Φ is recurrent. Note that we do not assume that C is in B + (X), but
that this follows also.
(ii) Suppose the sets C, D exist in B^+(X). There must exist D_ε ⊂ D such that
ψ(D_ε) > 0 and L(x, C) ≤ 1 − ε for all x ∈ D_ε. If also ψ(D_ε ∩ C) > 0 then since
L(x, C) ≥ L(x, D_ε ∩ C) we have that D_ε ∩ C is uniformly transient from Proposition 8.3.1
and the chain is transient.
Otherwise we must have ψ(Dε ∩ C c ) > 0. The maximal nature of ψ then implies
that for some δ > 0 and some n ≥ 1 the set C_δ := {y ∈ C : _CP^n(y, D_ε ∩ C^c) > δ} also
has positive ψ-measure. Since, for x ∈ C_δ,

1 − L(x, C_δ) ≥ ∫_{D_ε ∩ C^c} _CP^n(x, dy)[1 − L(y, C_δ)] ≥ δε
Proof The existence of the periodic sets Di is guaranteed by Theorem 5.4.4, and
fact that the set N is transient is then a consequence of Proposition 8.3.3, since
⋃_{i=1}^d D_i is itself absorbing.
In the main, transient sets and chains are ones we wish to exclude in practice. The
results of this section have formalized the situation we would hope would hold: sets
which appear to be irrelevant to the main dynamics of the chain are indeed so, in many
different ways. But one cannot exclude them all, and for all of the statements where
ψ-null (and hence transient) exceptional sets occur, one can construct examples to show
that the “bad” sets need not be empty.
h(x) ≥ Σ_{j=1}^N ∫_C _CP^j(x, dy) h(y) + ∫_{C^c} _CP^N(x, dy) h(y).    (8.40)
Letting N → ∞ shows that h(x) ≥ h∗ (x) for all x.
This gives the required drift criterion for transience. Recall the definition of the
drift operator as ΔV(x) = ∫ P(x, dy)V(y) − V(x); obviously Δ is well defined if V is
bounded. We define the sublevel set C_V(r) of any function V for r ≥ 0 by C_V(r) := {x ∈ X : V(x) ≤ r}.
Proof   Suppose that V is an arbitrary bounded solution of (i) and (ii), and let M
be a bound for V over X. Clearly M > r. Set C = C_V(r), D = C^c, and

h_V(x) = [M − V(x)]/[M − r],  x ∈ D;     h_V(x) = 1,  x ∈ C.
ΔV(x) ≤ 0,  x ∈ C^c.    (8.42)
We will find frequently that, in order to test such drift for the process Φ, we need
to consider functions V : X → R such that the set CV (M ) = {y ∈ X : V (y) ≤ M }
is “finite” for each M . Such a function on a countable space or topological space is
easy to define: in this abstract setting we first need to define a class of functions with
this property, and we will find that they recur frequently, giving further meaning to the
intuitive meaning of petite sets.
Note that since, for an irreducible chain, a finite union of petite sets is petite, and
since any subset of a petite set is itself petite, a function V : X → R_+ will be unbounded
off petite sets for Φ if there merely exists a sequence {C_j} of petite sets such that, for
any n < ∞,

C_V(n) ⊆ ⋃_{j=1}^N C_j    (8.43)

for some N < ∞.
Proof We will show that L(x, C) ≡ 1 which will give recurrence from Theo-
rem 8.3.6. Note that by replacing the set C by C ∪ CV (n) for n suitably large, we
can assume without loss of generality that C ∈ B+ (X).
Suppose by way of contradiction that the chain is transient, and thus that there
exists some x∗ ∈ C c with L(x∗ , C) < 1.
Set CV (n) = {y ∈ X : V (y) ≤ n}: we know this is petite, by definition of V , and
hence it follows from Theorem 8.3.5 that CV (n) is uniformly transient for any n. Now
fix M large enough that
M > V (x∗ )/[1 − L(x∗ , C)]. (8.44)
Let us modify P to define a kernel P̃ with entries P̃(x, A) = P(x, A) for x ∈ C^c and
P̃(x, x) = 1, x ∈ C. This defines a chain Φ̃ with C as an absorbing set, and with the
property that for all x ∈ X

∫ P̃(x, dy)V(y) ≤ V(x).    (8.45)
whilst for A ⊆ C^c

P̃^n(x, A) ≤ P^n(x, A),  x ∈ C^c.    (8.47)

By iterating (8.45) we thus get, for fixed x ∈ C^c

V(x) ≥ ∫ P̃^n(x, dy)V(y)
     ≥ ∫_{C^c ∩ [C_V(M)]^c} P̃^n(x, dy)V(y)    (8.48)
     ≥ M [1 − P̃^n(x, C_V(M) ∪ C)].
Letting n → ∞ in (8.48) for x = x∗ provides a contradiction with (8.50) and our choice
of M . Hence we must have L(x, C) ≡ 1, and Φ is recurrent, as required.
Proof   In Theorem 8.4.3 choose the test function V(x) = |x|. Then for x > r we
have that

Σ_y P(x, y)[V(y) − V(x)] = Σ_w w Γ(w),
Then the conditions of Theorem 8.4.3 are satisfied with C = {−r, . . . , r} and with (8.42)
holding for x ∈ C c , and so the chain is recurrent.
Proof   Suppose Γ has non-zero mean β > 0. We will establish for some bounded
monotone increasing V that

Σ_y P(x, y)V(y) = V(x)    (8.51)

for x ≥ r.
This time choose the test function V(x) = 1 − ρ^x for x ≥ 0, and V(x) = 0 elsewhere.
The sublevel sets of V are of the form (−∞, r] with r ≥ 0. This function satisfies (8.51)
if and only if for x ≥ r

Σ_y P(x, y)[ρ^y/ρ^x] = 1    (8.52)

so that this V can be constructed as a valid test function if (and only if) there is a
ρ < 1 with

Σ_w Γ(w)ρ^w = 1.    (8.53)
Therefore the existence of a solution to (8.53) will imply that the chain is transient,
since return to the whole half line (−∞, r] is less than sure from Proposition 8.4.2.
Write β(s) = Σ_w Γ(w)s^w: then β(s) is well defined for s ∈ (0, 1] by the bounded range
assumption. By irreducibility, we must have Γ(w) > 0 for some w < 0, so that β(s) → ∞
as s → 0. Since β(1) = 1, and β′(1) = Σ_w wΓ(w) = β > 0, it follows that such a ρ
exists, and hence the chain is transient.
Similarly, if the mean of Γ is negative, we can by symmetry prove transience because
the chain fails to return to the half line [−r, ∞).
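As a concrete illustration (with an assumed increment distribution, not taken from the text), the root ρ of (8.53) can be found numerically:

from scipy.optimize import brentq

# Assumed increment law: Gamma(-1) = 0.3, Gamma(+1) = 0.7, so beta = 0.4 > 0.
Gamma = {-1: 0.3, 1: 0.7}

def beta_fn(s):
    return sum(p * s**w for w, p in Gamma.items())

rho = brentq(lambda s: beta_fn(s) - 1.0, 1e-6, 1.0 - 1e-9)
print(rho)   # 3/7 here: beta_fn(s) = 1 reduces to 0.7 s^2 - s + 0.3 = 0

The resulting V(x) = 1 − ρ^x is then a bounded, non-constant solution of (8.51), witnessing transience.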
For random walk on the half line Z+ with bounded range, as defined by (RWHL1)
we find
Proposition 8.4.6. If the random walk increment distribution Γ on the integers has
mean β and a bounded range, then the random walk on Z+ is recurrent if and only if
β ≤ 0.
but since, in this case, the set {x ≤ r} is finite, we have (8.42) holding and the chain is
recurrent.
The first part of this proof involves a so-called “stochastic comparison” argument:
we use the return time probabilities for one chain to bound the same probabilities for
another chain. This is simple but extremely effective, and we shall use it a number
of times in classifying random walk. A more general formulation will be given in Sec-
tion 9.5.1.
Relaxing the condition that the range of the increment is bounded requires a much
more delicate argument, and indeed the known result of Theorem 8.1.5 for a general
random walk on Z, that recurrence is equivalent to the mean β = 0, appears difficult if
not impossible to prove by drift methods without some bounds on the spread of Γ.
then Φ is recurrent.
Proof Clearly the chain is ϕ-irreducible when β < 0 with ϕ = δ0 , and all compact
sets are small as in Chapter 5. To prove recurrence we use Theorem 8.4.3, and show
that we can in fact find a suitably unbounded function V and a compact set C satisfying
∫ P(x, dy)V(y) ≤ V(x) − ε,  x ∈ C^c,    (8.54)

for some ε > 0. As in the countable case we note that since β < 0 there exists x_0 < ∞
such that

∫_{−x_0}^∞ w Γ(dw) < β/2 < 0,

and thus if V(x) = x, for x > x_0

∫ P(x, dy)[V(y) − V(x)] ≤ ∫_{−x_0}^∞ w Γ(dw).    (8.55)
Lemma 8.5.2. Let W be a random variable with law Γ, s a positive number and t any
real number. Then for any A ⊆ {w ∈ R : s + tw > 0},

Proof   For all x > −1, log(1 + x) ≤ x − (x²/2)I{x < 0}. Thus

log(s + tW)I{W ∈ A} ≤ [log(s) + tW/s]I{W ∈ A}
Lemma 8.5.3. Let W be a random variable with law Γ and finite variance. Let s be a
positive number and t a real number. Then

lim_{x→∞} −x E[W I{W < t − sx}] = lim_{x→∞} x E[W I{W > t + sx}] = 0.    (8.56)

If in addition E[W] = 0, then

lim_{x→∞} −x E[W I{W > t − sx}] = lim_{x→∞} x E[W I{W < t + sx}] = 0.    (8.57)

Proof   We have

0 ≤ lim_{x→−∞} (t + sx) ∫_{−∞}^{t+sx} wΓ(dw) ≤ lim_{x→−∞} ∫_{−∞}^{t+sx} w²Γ(dw) = 0.
If E[W ] = 0, then E[W I{W > t + sx}] = −E[W I{W < t + sx}], giving the second
result.
We now prove
Proof We can assume without loss of generality that Γ(−∞, 0) > 0: for clearly, if
Γ[0, ∞) = 1 then Px (τ0 < ∞) = 0, x > 0 and the chain moves inexorably to infinity;
hence it is not irreducible, and it is transient in every meaning of the word.
We will show that for a chain which is skip-free to the right the condition β > 0 is
sufficient for transience, by examining the solutions of the equations
Σ_y P(x, y)V(y) = V(x),  x ≥ 1,    (8.62)

which here takes the form

V(x) = Γ(−x + 1)V(1) + Γ(−x + 2)V(2) + · · · + Γ(1)V(1 + x).    (8.63)
Once the first value in the V (x) sequence is chosen, we therefore have the remaining
values given by an iterative process. Our goal is to show that we can define the sequence
in a way that gives us a non-constant positive bounded solution to (8.63).
In order to do this we first write

V^*(z) = Σ_{x=0}^∞ V(x)z^x,   Γ^*(z) = Σ_{x=−∞}^∞ Γ(x)z^x,

where V^*(z) has yet to be shown to be defined for any z and Γ^*(z) is clearly defined at
least for |z| ≥ 1. Multiplying by z^x in (8.63) and summing then yields a relation between
V^*(z) and Γ^*(z^{−1}).
Now suppose that we can show (as we do below) that there is an analytic expansion of
the function
z^{−1}[1 − z]/[Γ^*(z^{−1}) − 1] = Σ_{n=0}^∞ b_n z^n    (8.65)

in the region 0 < z < 1 with b_n ≥ 0. Then we will have the identity
From this, we will be able to identify the form of the solution V . Explicitly, from (8.66)
we have

V^*(z) = zΓ(1)V(1) Σ_{n=0}^∞ z^n Σ_{m=0}^n b_m    (8.67)

so that equating coefficients of z^n in (8.67) gives

V(x) = Γ(1)V(1) Σ_{m=0}^{x−1} b_m.
Thus we have reduced the question of transience to identifying conditions under which
the expansion in (8.65) holds with the coefficients bj positive and summable.
Let us write a_j = Γ(1 − j) so that

A(z) := Σ_{j=0}^∞ a_j z^j = zΓ^*(z^{−1})

and

B(z) := 1 − [1 − A(z)]/[1 − z]    (8.69)
      = 1 − Σ_{j=0}^∞ z^j Σ_{n=j+1}^∞ a_n,

and so B(z)^{−1} is well defined for |z| < 1; moreover, by the expansion in (8.69)

B(z)^{−1} = Σ_j b_j z^j
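The construction can be carried out numerically for a concrete skip-free walk (an illustrative choice of Γ, not from the text; the recursion below inverts the power series B(z)):

# Assumed skip-free increment law: Gamma(1) = 0.5, Gamma(0) = 0.2, Gamma(-1) = 0.3,
# so beta = 0.2 > 0.  Then a_j = Gamma(1 - j) and B(z) = 1 - sum_j z^j sum_{n>j} a_n.
a = {0: 0.5, 1: 0.2, 2: 0.3}             # a_j = Gamma(1 - j)
J = 60
q = [sum(p for j, p in a.items() if j > k) for k in range(J)]
B = [1.0 - q[0]] + [-qk for qk in q[1:]]  # coefficients of B(z)

b = [1.0 / B[0]]                          # invert the series: B(z) * sum_j b_j z^j = 1
for n in range(1, J):
    b.append(-sum(B[k] * b[n - k] for k in range(1, n + 1)) / B[0])

print(b[:5])    # nonnegative, as required for (8.65)
print(sum(b))   # partial sums of b_m stay bounded, so V is bounded and non-constant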
8.6 Commentary*
On countable spaces the solidarity results we generalize here are classical, and thorough
expositions are in Feller [114], Chung [71], Çinlar [59] and many more places. Recurrence
is called persistence by Feller, but the terminology we use here seems to have become
the more standard. The first entrance, and particularly the last exit, decomposition are
vital tools introduced and exploited in a number of ways by Chung [71].
There are several approaches to the transience/recurrence dichotomy. A common
one, which can be shown to be virtually identical to the one we present here, uses the
concept of inessential sets (sets for which ηA is almost surely finite). These play the
role of transient parts of the space, with recurrent parts of the space being sets which
are not inessential. This is the approach in Orey [309], based on the original methods
of Doeblin [95] and Doob [99].
Our presentation of transience, stressing the role of uniformly transient sets, is
new, although it is implicit in many places. Most of the individual calculations are in
Nummelin [303], and a number are based on the more general approach in Tweedie
[394]. Equivalences between properties of the kernel U (x, A), which we have called
recurrence and transience properties, and the properties of essential and inessential sets
are studied in Tuominen [390].
The uniform transience property is inherently stronger than the inessential property,
and it certainly aids in showing that the skeletons and the original chain share the
dichotomy between recurrence and transience. For use of the properties of skeleton
chains in direct application, see Tjøstheim [386].
The drift conditions we give here are due in the countable case to Foster [129],
and the versions for more general spaces were introduced in Tweedie [397, 398] and in
Kalashnikov [189]. We shall revisit these drift conditions, and expand somewhat on
their implications in the next chapter. Stronger versions of (V1) will play a central role
in classifying chains as yet more stable in due course.
The test functions for classifying random walk in the bounded range case are directly
based on those introduced by Foster [129]. The evaluation of the transience condition for
skip-free walks, given in Proposition 8.5.5, is also due to Foster. The approximations in
the case of zero drift are taken from Guo and Petruccelli [149] and are reused in analyzing
SETAR models in Section 9.5.2.
The proof of recurrence of random walk in Theorem 8.1.5, using the weak law of
large numbers, is due to Chung and Ornstein [73]. It appears difficult to prove this
using the elementary drift methods.
The drift condition in the case of negative mean gives, as is well known, a stronger
form of recurrence: the concerned reader will find that this is taken up in detail in
Chapter 11, where it is a central part of our analysis.
Commentary for the second edition: The drift operator (8.1) is analogous to the
generator for a Markov process in continuous time. Some of the theory surrounding
continuous time models is summarized in Section 20.3, including some foundations of
generators and resolvents.
Chapter 9

Harris and topological recurrence
In this chapter we consider stronger concepts of recurrence and link them with the
dichotomy proved in Chapter 8. We also consider several obvious definitions of global
and local recurrence and transience for chains on topological spaces, and show that they
also link to the fundamental dichotomy.
In developing concepts of recurrence for sets A ∈ B(X), we will consider not just
the first hitting time τA , or the expected value U ( · , A) of ηA , but also the event that
Φ ∈ A infinitely often (i.o.), or ηA = ∞, defined by
{Φ ∈ A i.o.} := ⋂_{N=1}^∞ ⋃_{k=N}^∞ {Φ_k ∈ A},

and we write Q(x, A) := P_x{Φ ∈ A i.o.}. Obviously, for any x, A we have Q(x, A) ≤ L(x, A), and by the strong Markov property
we have

Q(x, A) = E_x[ P_{Φ_{τ_A}}{Φ ∈ A i.o.} I{τ_A < ∞} ] = ∫_A U_A(x, dy) Q(y, A).    (9.2)
Harris recurrence
The set A is called Harris recurrent if
Q(x, A) = Px (ηA = ∞) = 1, x ∈ A.
We will see in Theorem 9.1.4 that when A ∈ B+ (X) and Φ is Harris recurrent then
in fact we have the seemingly stronger and perhaps more commonly used property that
Q(x, A) = 1 for every x ∈ X.
It is obvious from the definitions that if a set is Harris recurrent, then it is recurrent.
Indeed, in the formulation above the strengthening from recurrence to Harris recurrence
is quite explicit, indicating a move from an expected infinity of visits to an almost surely
infinite number of visits to a set.
This definition of Harris recurrence appears on the face of it to be stronger than
requiring L(x, A) ≡ 1 for x ∈ A, which is a standard alternative definition of Harris
recurrence. In one of the key results of this section, Proposition 9.1.1, we prove that
they are in fact equivalent.
The highlight of the Harris recurrence analysis is
X = H ∪ N    (9.3)

where H is absorbing and non-empty and every subset of H in B^+(X) is Harris recur-
rent; and N is ψ-null and transient.
Theorem 9.0.2. For a ψ-irreducible T-chain, the chain is Harris recurrent if and only
if Px {Φ → ∞} = 0 for each x ∈ X.
Proposition 9.1.1. Suppose for some one set A ∈ B(X) we have L(x, A) ≡ 1, x ∈ A.
Then Q(x, A) = L(x, A) for every x ∈ X, and in particular A is Harris recurrent.
Proof   Using the strong Markov property, we have that if L(y, A) = 1, y ∈ A, then
for any x ∈ A

P_x(τ_A(2) < ∞) = ∫_A U_A(x, dy) L(y, A) = 1;

inductively this gives for x ∈ A, again using the strong Markov property,

P_x(τ_A(k + 1) < ∞) = ∫_A U_A(x, dy) P_y(τ_A(k) < ∞) = 1.
Write C_n for the event {|n^{−1}Φ_n − β| > β/2}. We only use the result, which follows
from the strong law, that

P_0(lim sup_{n→∞} C_n) = 0.    (9.4)
Now let D_n denote the event {Φ_n = 0}, and notice that D_n ⊆ C_n for each n. Immedi-
ately from (9.4) we have

P_0(lim sup_{n→∞} D_n) = 0    (9.5)
which says exactly Q(0, 0) = 0.
Hence we have an elegant proof of the general result
Proposition 9.1.2. If Φ denotes random walk on Z and if

β = Σ_w w Γ(w) > 0,

then Φ is transient.
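A simulation sketch of this transience (with an assumed increment law, purely for illustration): the fraction of simulated paths that ever return to 0 stabilizes strictly below one.

import numpy as np

# Assumed increment law: +1 w.p. 0.6, -1 w.p. 0.4, so beta = 0.2 > 0.
rng = np.random.default_rng(4)
paths, N, returned = 2000, 5000, 0
for _ in range(paths):
    path = np.cumsum(rng.choice([1, -1], size=N, p=[0.6, 0.4]))
    returned += bool(np.any(path == 0))
print(returned / paths)  # near 0.8 = 1 - |p - q| here, i.e. L(0, 0) < 1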
The most difficult of the results we prove in this section, and the strongest, provides
a rather more delicate link between the probabilities L(x, A) and Q(x, A) than that in
Proposition 9.1.1.
Theorem 9.1.3. (i) Suppose that D ⇝ A for any sets D and A in B(X). Then

{Φ ∈ D i.o.} ⊆ {Φ ∈ A i.o.}  a.s. [P_∗]    (9.6)

and hence Q(y, D) ≤ Q(y, A) for all y ∈ X.

(ii) If X ⇝ A, then A is Harris recurrent, and in fact Q(x, A) ≡ 1 for every x ∈ X.
Proof Since the event {Φ ∈ A i.o.} involves the whole path of Φ, we cannot deduce
this result merely by considering P n for fixed n. We need to consider all the events
En = {Φn +1 ∈ A}, n ∈ Z+
and evaluate the probability of those paths such that an infinite number of the En hold.
We first show that, if F_n^Φ is the σ-field generated by {Φ_0, . . . , Φ_n}, then as n → ∞

P[ ⋃_{i=n}^∞ E_i | F_n^Φ ] → I{ ⋂_{m=1}^∞ ⋃_{i=m}^∞ E_i }  a.s. [P_∗].    (9.7)
Now apply the Martingale Convergence Theorem (see Theorem D.6.1) to the extreme
elements of the inequalities (9.8) to give
I{ ⋃_{i=k}^∞ E_i } ≥ lim sup_n P[ ⋃_{i=n}^∞ E_i | F_n^Φ ]
              ≥ lim inf_n P[ ⋃_{i=n}^∞ E_i | F_n^Φ ]    (9.9)
              ≥ I{ ⋂_{m=1}^∞ ⋃_{i=m}^∞ E_i }.
As k → ∞, the two extreme terms in (9.9) converge, which shows the limit in (9.7)
holds as required.
By the strong Markov property, P_∗[ ⋃_{i=n}^∞ E_i | F_n^Φ ] = L(Φ_n, A) a.s. [P_∗]. From our
assumption that D ⇝ A we have that L(Φ_n, A) is bounded from 0 whenever Φ_n ∈ D.
Thus, using (9.7) we have P_∗-a.s.

I{ ⋂_{m=1}^∞ ⋃_{i=m}^∞ {Φ_i ∈ D} } ≤ I{ lim sup_n L(Φ_n, A) > 0 }
                                  = I{ lim_n L(Φ_n, A) = 1 }    (9.10)
                                  = I{ ⋂_{m=1}^∞ ⋃_{i=m}^∞ E_i },
which is (9.6).
The proof of (ii) is then immediate, by taking D = X in (9.6).
As an easy consequence of Theorem 9.1.3 we have the following strengthening of
Harris recurrence:
Theorem 9.1.4. If Φ is Harris recurrent, then Q(x, B) = 1 for every x ∈ X and every
B ∈ B + (X).
Proof Let {Cn : n ∈ Z+ } be petite sets with ∪Cn = X. Since the finite union of
petite sets is petite for an irreducible chain by Proposition 5.5.5, we may assume that
Cn ⊂ Cn +1 and that Cn ∈ B + (X) for each n.
For any B ∈ B^+(X) and any n ∈ Z_+ we have from Lemma 5.5.1 that C_n ⇝ B, and
hence, since Cn is Harris recurrent, we see from Theorem 9.1.3 (i) that Q(x, B) = 1 for
any x ∈ Cn . Because the sets {Ck } cover X, it follows that Q(x, B) = 1 for all x as
claimed.
Having established these stability concepts, and conditions implying they hold for
individual sets, we now move on to consider transience and recurrence of the overall
chain in the ψ-irreducible context.
For if not, and ψ(C1 ) = 0, then by Proposition 4.2.3 there exists an absorbing full
set F ⊂ C1c . We have by definition that L(x, C) = 1 for any x ∈ C ∩ F , and since
F is absorbing we must have L(x, C ∩ F ) = 1 for x ∈ C ∩ F . From Proposition 9.1.1
it follows that Q(x, C ∩ F ) = 1 for x ∈ C ∩ F , which gives a contradiction, since
Q(x, C) ≥ Q(x, C ∩ F ). This shows that in fact ψ(C1 ) > 0.
But now, since C1 ∈ B + (X) there exists B ⊆ C1 , B ∈ B + (X) and δ > 0 with
L(x, C1 ) ≤ δ < 1 for all x ∈ B: accordingly
L(x, B) ≤ L(x, C1 ) ≤ δ, x ∈ B.
Now Proposition 8.3.1 (iii) gives U (x, B) ≤ [1 − δ]−1 , x ∈ B and this contradicts the
assumed recurrence of Φ.
Thus H is a non-empty maximal absorbing set, and by Proposition 4.2.3 H is full:
from Proposition 8.3.7 we have immediately that N is transient. It remains to prove
that H is Harris.
For any set A in B^+(X) we have C ⇝ A. It follows from Theorem 9.1.3 that if
Q(x, C) = 1 then Q(x, A) = 1 for every A ∈ B+ (X). Since by construction Q(x, C) = 1
for x ∈ H, we have also that Q(x, A) = 1 for any x ∈ H and A ∈ B+ (X): so Φ restricted
to H is Harris recurrent, which is the required result.
We now strengthen the connection between properties of Φ and those of its skeletons.
Proof   If the m-skeleton is Harris recurrent then, since mτ_A^m ≥ τ_A for any A ∈ B(X),
where τ_A^m is the first entrance time for the m-skeleton, it immediately follows that Φ is
also Harris recurrent.
Suppose now that Φ is Harris recurrent. For any m ≥ 2 we know from Proposi-
tion 8.2.6 that Φm is recurrent, and hence a Harris set Hm exists for this skeleton.
Since Hm is full, there exists a subset H ⊂ Hm which is absorbing and full for Φ, by
Proposition 4.2.3.
Since Φ is Harris recurrent we have that Px {τH < ∞} ≡ 1, and since H is absorbing
we know that mτHm ≤ τH + m. This shows that
(i) If some petite set C is recurrent, then Φ is recurrent; and the set C∩N is uniformly
transient, where N is the transient set in the Harris decomposition (9.11).
(ii) If there exists some petite set in B(X) such that L(x, C) ≡ 1, x ∈ X, then Φ is
Harris recurrent.
Proof (i) If C is recurrent then so is the chain, from Theorem 8.3.5. Let D =
C ∩ N denote the part of C not in H. Since N is ψ-null, and ν is an irreducibility
measure we must have ν(N ) = 0 by the maximality of ψ; hence (8.33) holds and from
(8.35) we have a uniform bound on U (x, D), x ∈ X so that D is uniformly transient.
(ii) If L(x, C) ≡ 1, x ∈ X for some ψ_a-petite set C, then from Theorem 9.1.3 C
is Harris recurrent. Since C is petite we have C ⇝ A for each A ∈ B^+(X). The Harris
recurrence of C, together with Theorem 9.1.3 (ii), gives Q(x, A) ≡ 1 for all x, so Φ is
Harris recurrent.
Proof In Theorem 8.4.3 we showed that L(x, C ∪CV (n)) ≡ 1, for some n, so Harris
recurrence has already been proved in view of Proposition 9.1.7.
Non-evanescent chains
A Markov chain Φ will be called non-evanescent if Px {Φ → ∞} = 0 for
each x ∈ X.
We first show that for a T-chain, either sample paths converge to infinity or they
enter a recurrent part of the space. Recall that for any A, we have A^0 = {y : L(y, A) = 0}.
Theorem 9.2.1. Suppose that Φ is a T-chain. For any A ∈ B(X) which is transient,
and for each x ∈ X,

P_x({Φ → ∞} ∪ {Φ enters A^0}) = 1.    (9.12)
Thus if Φ is a non-evanescent T-chain, then X is not transient.
Proof   Let A = ⋃_j B_j, with each B_j uniformly transient; then from Proposi-
tion 8.3.2, the sets B̄_i(M) = {x ∈ X : Σ_{j=1}^M P^j(x, B_i) > M^{−1}} are also uniformly
transient, for any i, M. Thus Ā = ⋃_i A_i where each A_i is uniformly transient.
Since T is lower semicontinuous, the sets O_{ij} := {x ∈ X : T(x, A_i) > j^{−1}} are open,
as is O_j := {x ∈ X : T(x, A^0) > j^{−1}}, i, j ∈ Z_+. Since T is everywhere non-trivial we
have for all x ∈ X,

T(x, ⋃_i A_i ∪ A^0) = T(x, X) > 0

and hence the sets {O_{ij}, O_j} form an open cover of X.
Let C be a compact subset of X, and choose M such that {O_M, O_{iM} : 1 ≤ i ≤ M}
is a finite subcover of C. Since each A_i is uniformly transient, and

K_a(x, A_i) ≥ T(x, A_i) ≥ j^{−1},  x ∈ O_{ij},    (9.13)

we know from Proposition 8.3.2 that each of the sets O_{ij} is uniformly transient. It
follows that with probability one, every trajectory that enters C infinitely often must
enter O_M infinitely often: that is,

{Φ ∈ C i.o.} ⊂ {Φ ∈ O_M i.o.}  a.s. [P_∗].

But since L(x, A^0) > 1/M for x ∈ O_M we have by Theorem 9.1.3 that

{Φ ∈ O_M i.o.} ⊂ {Φ ∈ A^0 i.o.}  a.s. [P_∗]
and this completes the proof of (9.12).
Proof   Since Φ is a T-chain, there exists some distribution a such that for all x,
K_a(x, B) ≥ T(x, B).
But since T(x^∗, B) > 0 and T(x, B) is lower semicontinuous, it follows that for some
neighborhood O of x^∗,

inf_{x∈O} T(x, B) > 0

and thus, as in (5.45),

inf_{x∈O} L(x, B) ≥ inf_{x∈O} K_a(x, B) ≥ inf_{x∈O} T(x, B)
Proof   (i) Assume the state x^∗ is topologically recurrent but that O is a neigh-
borhood of x^∗ with Q(x^∗, O) = 0. Let O^∞ = {y : Q(y, O) = 1}, so that L(x^∗, O^∞) = 0.
Since

L(x, A) ≥ K_a(x, A) ≥ T(x, A),  x ∈ X, A ∈ B(X),

this implies T(x^∗, O^∞) = 0, and since T is non-trivial, we must have

T(x^∗, [O^∞]^c) > 0.    (9.25)
Let D_n := {y : P_y(η_O < n) > n^{−1}}: since D_n ↑ [O^∞]^c, we must have T(x^∗, D_n) > 0 for
some n. The continuity of T now ensures that there exists some δ and a neighborhood
O_δ ⊆ O of x^∗ such that

T(x, D_n) > δ,  x ∈ O_δ.    (9.26)

Let us take m large enough that Σ_{j=m}^∞ a(j) ≤ δ/2: then from (9.26) we have
It follows that

P_x(η_{O_δ} ≤ m + n) ≥ P_x(η_O ≤ m + n)
                    ≥ Σ_{k=1}^m ∫_{D_n} _{D_n}P^k(x, dy) P_y(η_O ≤ n)    (9.29)
                    ≥ n^{−1} P_x(τ_{D_n} ≤ m)
                    ≥ n^{−1} δ/2,  x ∈ O_δ.
With (9.29) established we can apply Proposition 8.3.1 to see that Oδ is uniformly
transient.
This contradicts our assumption that x∗ is topologically recurrent, and so in fact
Q(x∗ , O) > 0 for all neighborhoods O.
(ii) Suppose now that P (x∗ , · ) and T (x∗ , · ) are equivalent. Choose x∗ topolog-
ically recurrent and assume we can find a neighborhood O with Q(x∗ , O) < 1. Define
O^∞ as before, and note that now P(x^∗, [O^∞]^c) > 0 since otherwise

Q(x^∗, O) ≥ ∫_{O^∞} P(x^∗, dy) Q(y, O) = 1;

and so also T(x^∗, [O^∞]^c) > 0. Thus we again have (9.25) holding, and the argument in
(i) shows that there is a uniformly transient neighborhood of x∗ , again contradicting the
assumption of topological recurrence. Hence x∗ is topologically Harris recurrent.
The examples (9.22) and (9.24) show that we do not get, in general, the second
conclusion of this proposition if the chain is merely weak Feller or has only a strong
Feller component.
In these examples, it is the lack of irreducibility which allows such obvious “patholog-
ical” behavior, and we shall see in Theorem 9.3.6 that when the chain is a ψ-irreducible
T-chain then this behavior is excluded. Even so, without any irreducibility assump-
tions we are able to derive a reasonable analogue of Theorem 9.1.5, showing that the
non-Harris recurrent states form a transient set.
Theorem 9.3.5. For any chain Φ there is a decomposition
X = R ∪ N,
where R denotes the set of states which are topologically Harris recurrent and N is
transient.
X = H ∪ N

where H is either empty or a maximal Harris set, N is transient, the set of Harris
recurrent states R is contained in H, and every state in N is topologically transient.
Proof The decomposition has already been shown to exist in Theorem 9.2.2. Let
x^∗ ∈ R be a topologically Harris recurrent state. Then from (9.14), we must have
L(x^∗, H) = 1, and so x^∗ ∈ H by maximality of H.
We can write N = NE ∪ NH where NH = {y ∈ N : T (y, H) > 0} and NE = {y ∈
N : T (y, H) = 0}. For fixed x∗ ∈ NH there exists δ > 0 and an open set Oδ such that
x∗ ∈ Oδ and T (y, H) > δ for all y ∈ Oδ , by the lower semicontinuity of T ( · , H).
Hence also the sampled kernel K_a minorized by T satisfies K_a(y, H) > δ for all
y ∈ O_δ. Now choose M such that Σ_{n>M} a(n) ≤ δ/2. Then for all y ∈ O_δ

Σ_{n≤M} P^n(y, H) a(n) ≥ δ/2,
Coercive functions
A function V is called coercive if V (x) → ∞ as x → ∞: this means that
the sublevel sets {x : V (x) ≤ r} are precompact for each r > 0.
This nomenclature is designed to remind the user that we seek functions which
behave like norms: they are large as the distance from the center of the space increases.
Typically in practice, a coercive function will be a norm on Euclidean space, or at least a
monotone function of a norm. For irreducible T-chains, functions unbounded off petite
sets certainly include coercive functions, since compacta are petite in that case; but of
course coercive functions are independent of the structure of the chain itself.
Even without irreducibility we get a useful conclusion from applying (V1).
Theorem 9.4.1. If condition (V1) holds for a coercive function V and a compact set
C, then Φ is non-evanescent.
Proof Suppose that in fact Px {Φ → ∞} > 0 for some x ∈ X. Then, since the set
C is compact, there exists M ∈ Z+ with
P_x({Φ_k ∈ C^c, k ≥ M} ∩ {Φ → ∞}) > 0.
when σC > k, k ∈ Z+ .
Now let M_i = V(Φ_i)I{σ_C ≥ i}. Using the fact that {σ_C ≥ k} ∈ F_{k−1}^Φ, we may show
that (M_k, F_k^Φ) is a positive supermartingale.
Hence there exists an almost surely finite random variable M∞ such that Mk → M∞
as k → ∞.
There are two possibilities for the limit M_∞. Either σ_C < ∞, in which case M_∞ = 0,
or σ_C = ∞, in which case lim sup_{k→∞} V(Φ_k) = M_∞ < ∞ and in particular Φ ̸→ ∞,
since V is coercive. Thus we have shown that

P_µ({σ_C < ∞} ∪ {Φ → ∞}^c) = 1,
Theorem 9.4.2. Suppose that Φ is a weak Feller chain, and suppose that there exists
a compact set C satisfying σC < ∞ a.s. [P∗ ].
Then there exists a compact set C0 containing C and a coercive function V , bounded
on compacta, such that
ΔV(x) ≤ 0,  x ∈ C_0^c.    (9.31)
For any fixed n and any x ∈ A_0^c we have from the Markov property that the sequence
V_n(x) satisfies, for x ∈ A_0^c ∩ D_n^c,

∫ P(x, dy)V_n(y) = E_x[ P_{Φ_1}{σ_{D_n} < σ_{A_0}} ]
                = P_x{σ_{D_n} < σ_{A_0}}    (9.33)
                = V_n(x).

The function V, which clearly satisfies the appropriate drift condition by linearity from
(9.34) if finitely defined, gives the required converse result.
Since Vn (x) = 1 on Dn , it is clear that V is coercive. To complete the proof we must
show that the sequence {ni } can be chosen to ensure that V is bounded on compact
sets, and it is for this we require the Feller property.
Let m ∈ Z+ and take the upper bound
Choose the sequence {ni } as follows. By Proposition 6.1.1, the function Px {σA 0 > m}
is an upper semicontinuous function of x, which converges to zero as m → ∞ for all
x. Hence the convergence is uniform on compacta, and thus we can choose mi so large
that
P_x{σ_{A_0} > m_i} < 2^{−(i+1)},  x ∈ A_i.    (9.37)

Now for m_i fixed for each i, consider P_x{σ_{D_n} < m_i}: as a function of x this is also
upper semicontinuous and converges to zero as n → ∞ for all x. Hence again we see
that the convergence is uniform on compacta, which implies we may choose n_i so large
that

P_x{σ_{D_{n_i}} < m_i} < 2^{−(i+1)},  x ∈ A_i.    (9.38)
Combining (9.36), (9.37) and (9.38) we see that V_{n_i}(x) ≤ 2^{−i} for x ∈ A_i. From (9.35) this
implies, finally, for all k ∈ Z_+ and x ∈ A_k

V(x) ≤ k + Σ_{i=k}^∞ V_{n_i}(x)
     ≤ k + Σ_{i=k}^∞ 2^{−i}
     ≤ k + 1,    (9.39)
Lemma 9.4.4. Let W be a random variable with distribution function Γ and finite
variance. Let s, c, u2 , and v2 be positive numbers, and let t1 ≥ t2 and u1 , v1 , t be real
numbers. Then
(i)

lim_{x→∞} x²[ −Γ(t_2 + sx, ∞) log(v_1 + v_2 x) + Γ(t_1 + sx, ∞)(log(u_1 + u_2 x) − c) ] ≤ 0.    (9.41)
and

lim_{x→−∞} log[(u_1 − u_2 x)/(v_1 − v_2 x)] = log(u_2/v_2),

we have

lim_{x→−∞} x²[ −Γ(−∞, t_1 + sx) log(u_1 − u_2 x) + Γ(−∞, t_2 + sx)(log(v_1 − v_2 x) − c) ]
  = lim_{x→−∞} [ −x²(Γ(−∞, t_1 + sx) − Γ(−∞, t_2 + sx)) log(u_1 − u_2 x)
                 − x²Γ(−∞, t_2 + sx) log[(u_1 − u_2 x)/(v_1 − v_2 x)] − c x²Γ(−∞, t_2 + sx) ]
and V (x) = 0 in the region [−R, R], where R > 1 is again a positive constant to be
chosen.
We need to evaluate the behavior of Ex [V (X1 )] near both ∞ and −∞ in this case,
and we write
V1 (x) = Ex [log(1 + x + W )I{x + W > R}]
V2 (x) = Ex [log(1 − x − W )I{x + W < −R}]
so that
Ex [V (X1 )] = V1 (x) + V2 (x).
This time we develop bounds using the functions
and by Lemma 8.5.3, both V_3 and V_5 are also o(x^{−2}). By Lemma 9.4.4 (i) we also have
The situation with x < −R is exactly symmetric, and thus we have that V is a coercive
function satisfying (V1); and so the chain is non-evanescent from Theorem 9.4.1.
already somewhat linear in structure, such as those based on the random walk: we have
already seen this in our analysis of random walk on the half line in Section 8.4.3.
Such increment analysis is of value in many models, especially if combined with
“stochastic comparison” arguments, which rely heavily on the classification of chains
through return time probabilities.
In this section we will further use the stochastic comparison approach to discuss
the structure of scalar linear models and general random walk on R, and the special
nonlinear SETAR models; we will then consider an increment analysis of general models
on R+ which have no inherent linearity in their structure.
This is not uncommon if the chains have similarly defined structure, as is the case with
random walk and the associated walk on a half line.
The stochastic comparison method tells us that a classification of one of the chains
may automatically classify the other.
In one direction we have, provided C is a petite set for both chains, that when
P_x(τ_C ≥ n) → 0 as n → ∞ for x ∈ C^c, then not only is Φ Harris recurrent, but Φ′ is
also Harris recurrent.
This is obvious. Its value arises in cases where the first chain Φ has a (relatively)
simpler structure so that its analysis is straightforward through, say, drift conditions,
and when the validation of (9.44) is also relatively easy.
In many ways stochastic comparison arguments are even more valuable in the tran-
sient context: as we have seen with random walk, establishing transience may need a
rather delicate argument, and it is then useful to be able to classify “more transient”
chains easily.
Suppose that (9.44) holds, and again that C is a ϕ-irreducible petite set for both
chains. Then if Φ is transient, we know from Theorem 8.3.6 that there exists
D ⊂ C^c such that L(x, C) < 1 − ε for x ∈ D where ϕ(D) > 0; it then follows that Φ′
is also transient.
We first illustrate the strengths and drawbacks of this method in proving transience
for the general random walk on the half line R+ .
Proof Consider the discretized version W_h of the increment variable W with distribution

P(W_h = nh) = Γ_h(nh),

where Γ_h(nh) is constructed by setting, for every n,

Γ_h(nh) = ∫_{nh}^{(n+1)h} Γ(dw),
and let Φ_h be the corresponding random walk on the countable half line {nh, n ∈ Z_+}. Then we have firstly that for any starting point nh, the chain Φ_h is "stochastically smaller" than Φ, in the sense that if τ_0^h is the first return time to zero by Φ_h then
Proposition 9.5.2. Suppose the increment variable W in the scalar linear model is
symmetric with density positive everywhere on [−R, R] and zero elsewhere. Then the
scalar linear model is Harris recurrent if and only if |α| ≤ 1.
Proof The linear model is, under the conditions on W, a µ_Leb-irreducible chain on R with all compact sets petite.
Suppose α > 1. By stochastic comparison of this model with a random walk Φ on
a half line with mean increment α − 1 it is obvious that provided the starting point
x > 1, then (9.44) holds with C = (−∞, 1]. Since this set is transient for the random
walk, as we have just shown, it must therefore be transient for the scalar linear model.
Provided the starting point x < −1, then by symmetry, the hitting times on the set
C = [−1, ∞) are also infinite with positive probability. This argument does not require
bounded increments.
If α < −1 then the chain oscillates. If the range of W is contained in [−R, R], with
R > 1, then by choosing x > R we have by symmetry that the hitting time of the chain
X0 , −X1 , X2 , −X3 , . . . on C = (−∞, 1] is stochastically bounded below by the hitting
time of the previous linear model with parameter |α|; thus the set [−R, R] is uniformly
transient for both models.
Thirdly, suppose that 0 < α ≤ 1. Then by stochastic comparison with random
walk on a half line and mean increment α − 1, from x > R we have that the hitting
time on [−R, R] of the linear model is bounded above by the hitting time on [−R, R] of
the random walk; whilst by symmetry the same is true from x < −R. Since we know
random walk is Harris recurrent it follows that the linear model is Harris recurrent.
Finally, by considering an oscillating chain we have the same recurrence result for
−1 ≤ α ≤ 0.
The points to note in this example are
(ii) even with α > 0, recurrence arguments on the whole line are also difficult to
get to work. They tend to guarantee that the hitting times on half lines such as
C = (−∞, 1] are finite, and since these sets are not compact, we do not have a
guarantee of recurrence: indeed, for transient oscillating linear systems such half
lines are reached on alternate steps with higher and higher probability.
Thus in the case of unbounded increments more delicate arguments are usually needed,
and we illustrate one such method of analysis next.
Φ_n = Φ_{n−1} + W_n.
This is easy to analyze in the transient situation using stochastic comparison arguments,
given the results already proved.
Proof Suppose that the mean increment β of the random walk Φ is positive. Then the hitting time τ_{(−∞,0]} on (−∞, 0] from an initial point x > 0 is the same as the hitting time on {0} itself for the associated random walk on the half line; and we have shown this to be infinite with positive probability. So the unrestricted walk is also transient.
The argument if β < 0 is clearly symmetric.
This model is non-evanescent when β = 0, as we showed under a finite variance
assumption in Proposition 9.4.5.
Now let us consider the more complex SETAR model

X_n = φ(j) + θ(j)X_{n−1} + W_n(j), X_{n−1} ∈ R_j,

where −∞ = r_0 < r_1 < · · · < r_M = ∞ and R_j = (r_{j−1}, r_j]; recall that for each j, the noise variables {W_n(j)} form independent zero-mean noise sequences, and again let W(j) denote a generic variable in the sequence {W_n(j)}, with distribution Γ_j.
We will see in due course that under a second-order moment condition (SETAR3),
we can identify exactly the regions of the parameter space where this nonlinear chain
is transient, recurrent and so on.
Here we establish the parameter combinations under which transience will hold:
these are extensions of the non-zero mean increment regions of the random walk we
have just looked at.
As suggested by Figure B.1–Figure B.3, let us call the exterior of the parameter space the area defined by

θ(1) > 1 (9.46)
θ(M) > 1 (9.47)
θ(1) = 1, θ(M) ≤ 1, φ(1) < 0 (9.48)
θ(1) ≤ 1, θ(M) = 1, φ(M) > 0 (9.49)
θ(1) < 0, θ(1)θ(M) > 1 (9.50)
θ(1) < 0, θ(1)θ(M) = 1, φ(M) + θ(M)φ(1) < 0 (9.51)
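Read as a predicate on (θ(1), θ(M), φ(1), φ(M)), the exterior is the union of the six regions above; a minimal sketch (the helper name is ours, not from the text):

    def in_exterior(theta1, thetaM, phi1, phiM):
        # Direct transcription of the regions (9.46)-(9.51); the equality
        # tests mirror the boundary cases (9.48), (9.49) and (9.51).
        return (theta1 > 1                                          # (9.46)
                or thetaM > 1                                       # (9.47)
                or (theta1 == 1 and thetaM <= 1 and phi1 < 0)       # (9.48)
                or (theta1 <= 1 and thetaM == 1 and phiM > 0)       # (9.49)
                or (theta1 < 0 and theta1 * thetaM > 1)             # (9.50)
                or (theta1 < 0 and theta1 * thetaM == 1
                    and phiM + thetaM * phi1 < 0))                  # (9.51)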
In order to make the analysis more straightforward we will make the following assumption as appropriate.

(SETAR3) The variances of the noise distributions for the two end intervals are finite; that is,

E[W(1)^2] < ∞, E[W(M)^2] < ∞.
Proposition 9.5.4. For the SETAR model satisfying the assumptions (SETAR1)–
(SETAR3), the chain is transient in the exterior of the parameter space.
Proof Suppose (9.47) holds. Then the chain is transient, as we show by stochastic comparison arguments. For until the first time the chain enters (−∞, r_{M−1}] it follows the sample paths of a model

X_n = φ(M) + θ(M)X_{n−1} + W_n(M),

and for this linear model P_x(τ_{(−∞,0)} < ∞) < 1 for all sufficiently large x, as in the proof of Proposition 9.5.2, by comparison with random walk.
When (9.46) holds, the chain is transient by symmetry: we find P_x(τ_{(0,∞)} < ∞) < 1 for all sufficiently negative x.
When (9.50) holds the same argument can be used, but now for the two step chain:
the one-step chain undergoes larger and larger oscillations and thus there is a positive
probability of never returning to the set [r1 , rM −1 ] for starting points of sufficiently
large magnitude.
Suppose (9.48) holds and begin the process at x_0 < min(0, r_1). Then until the first time the process exits (−∞, min(0, r_1)), it has exactly the sample paths of a random walk with negative drift, which we showed to be transient in Section 8.5. The proof of transience when (9.49) holds is similar.
We finally show the chain is transient if (9.51) holds, and for this we need (SETAR3).
Here we also need to exploit Theorem 8.4.2 directly rather than construct a stochastic
comparison argument.
Let a and b be positive constants such that −b/a = θ(1) = 1/θ(M). Since φ(M) + θ(M)φ(1) < 0 we can choose u and v such that −aφ(1) < au + bv < −bφ(M). Choose c positive such that

and

δ(x) = φ(M) + θ(M)x + u.
If we write

V_0(x) = −a^{−1} E[(1/(δ(x) + W(M))) I{W(M) > c/a − δ(x)}],
V_1(x) = −c^{−1} P(−c/b − λ(x) < W(M) < c/a − δ(x)),
V_2(x) = 1/(a(x + u)) + b^{−1} E[(1/(λ(x) + W(M))) I{W(M) < −c/b − λ(x)}],

then we get

E_x[V(X_1)] = V(x) + V_0(x) + V_1(x) + V_2(x). (9.52)
It is easy to show that both V_0(x) and V_1(x) are o(x^{−2}). Since

0 ≥ −x^2 W(M)/[λ(x)(λ(x) + W(M))]
  ≥ −x^2 W(M)(1 + bW(M)/c)/λ^2(x)
  ≥ −2W(M)(1 + bW(M)/c)/θ^2(M); (9.53)
1/(1 + W(M)/λ(x)) ≤ 1
and so

0 ≤ −x^2 W(M)/[λ(x)(λ(x) + W(M))]
  ≤ −x^2 W(M)/λ^2(x)
  ≤ −2W(M)/θ^2(M). (9.54)
E_x[V(X_1)] ≥ V(x).
We may thus apply Theorem 8.4.2 with the set C taken to be [−R, R] and the test
function V above to conclude that the process is transient.
W_x = {Φ_1 − Φ_0 | Φ_0 = x}, (9.57)

where W_x has distribution Γ_x, with mean m(x) and variance v(x).
We will now show that there is a threshold or detailed balance effect between these two
quantities in considering the stability of the chain.
For ease of exposition let us consider the case where the increments again have
uniformly bounded range: that is, for some R and all x,
Γ_x[−R, R] = 1. (9.58)
To avoid somewhat messy calculations such as those for the random walk or SETAR models above we will fix the state space as R_+ and we will make the assumption that the measures Γ_x give sufficient weight to the negative half line to ensure that the chain is a δ_0-irreducible T-chain and also that v(x) is bounded away from zero: this ensures that recurrence means that τ_0 is finite with probability one and that transience means that P_0(τ_0 < ∞) < 1. The δ_0-irreducibility and T-chain properties will of course follow from assuming, for example, that Γ_x(−∞, −ε) > ε for some ε > 0.
(i) if there exists θ < 1 and x_0 such that for all x > x_0

m(x) ≤ θ v(x)/(2x),

then Φ is recurrent;

(ii) if there exists θ > 1 and x_0 such that for all x > x_0

m(x) ≥ θ v(x)/(2x),

then Φ is transient.
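Under the reading of the conditions given above, the threshold is easy to probe by simulation; a minimal Monte Carlo sketch (entirely illustrative) of a chain with m(x) ≈ c/x and v(x) ≈ 1, for which the boundary falls at c = 1/2:

    import random

    def first_return(c, x0=10.0, horizon=10**5):
        # Chain on R+ whose increment at x is +/-1 plus a drift c/x, so that
        # m(x) ~ c/x and v(x) ~ 1: recurrence is expected for c < 1/2 and
        # transience for c > 1/2.
        x = x0
        for n in range(1, horizon + 1):
            w = random.choice((-1.0, 1.0)) + c / max(x, 1.0)
            x = max(x + w, 0.0)
            if x <= 1.0:
                return n        # time of return to the bottom set [0, 1]
        return None             # no return observed within the horizon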
and using the bounded range of the increments, the integral in (9.62) after a Taylor series expansion is, for x > R,

∫_{−R}^{R} Γ_x(dw) [w/(x + 1) − w^2/(2(x + 1)^2) + o(x^{−2})]
    = m(x)/(x + 1) − v(x)/(2(x + 1)^2) + o(x^{−2}). (9.63)
and hence from Theorem 9.1.8 we have that the chain is recurrent.
(ii) It is obvious with the assumption of positive mean for Γ_x that for any x the sets [0, x] and [x, ∞) are both in B^+(X).
In order to use Theorem 9.1.8, we will establish that for some suitable monotonic increasing V

∫_X P(x, dy)V(y) ≥ V(x). (9.64)
Applying Taylor's Theorem we see that the integral in (9.66) equals

αm(x)/(x + 1)^{1+α} − αv(x)/(2(x + 1)^{2+α}) + O(x^{−3−α}). (9.67)
Now choose α < θ − 1. For sufficiently large x_0, if x > x_0 then from (9.67) we have that (9.66) holds, and so the chain is transient.
The fact that this detailed balance between first and second moments is a determi-
nant of the stability properties of the chain is not surprising: on the space R+ all of the
drift conditions are essentially linearizations of the motion of the chain, and virtually
independently of the test functions chosen, a two-term Taylor series expansion will lead
to the results we have described.
One of the more interesting and rather counter-intuitive facets of these results is
that it is possible for the first-order mean drift m(x) to be positive and for the chain to
still be recurrent: in such circumstances it is the occasional negative jump thrown up
by a distribution with a variance large in proportion to its general positive drift which
will give recurrence.
Some weakening of the bounded range assumption is obviously possible for these results: the proofs then necessitate a rather more subtle analysis and expansion of the integrals involved. By choosing the iterated logarithm as the test function for recurrence, and by more detailed analysis of the function

V(x) = 1 − [1 + x]^{−α}

as a test for transience, it is in fact possible to develop the following result, whose proof we omit.
(i) if there exists δ > 0 and x_0 such that for all x > x_0
then Φ is recurrent;
(ii) if there exists θ > 1 and x_0 such that for all x > x_0
then Φ is transient.
The bounds on the spread of Γ_x may seem to be artifacts of the methods of proof used, and of course we well know that the zero-mean random walk is recurrent even though a proof using an approach based upon a drift condition has not yet been developed, to our knowledge.
We conclude this section with a simple example showing that we cannot expect to
drop the higher moment conditions completely.
Let X = Z_+, and let

P(x, x + 1) = 1 − c/x, P(x, 0) = c/x, x ≥ 1,

with P(0, 1) = 1.
Then the chain is easily shown to be recurrent by a direct calculation that for all n > 1

P_0(τ_0 > n) = Π_{x=1}^{n} [1 − c/x],

which tends to zero as n → ∞. On the other hand, m(x) = 1 − c − c/x and v(x) ∼ cx as x → ∞, so that 2x m(x) − v(x) ∼ (2 − 3c)x, which is clearly positive for c < 2/3: hence if Theorem 9.5.6 were applicable we should have the chain transient.
Of course, in this case we have E[|W_x|^3] = (1 − c/x) + (c/x)x^3 ∼ cx^2, and the bound on this higher moment, required in the proof of Theorem 9.5.6, is obviously violated.
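Under the transition law as reconstructed above (the reading P(x, x + 1) = 1 − c/x, P(x, 0) = c/x is an assumption), the sure-but-slow returns are easy to see by simulation:

    import random

    def return_time(c=0.5, max_steps=10**7):
        # From 0 the chain steps to 1; from x >= 1 it returns to 0 with
        # probability c/x and otherwise moves up to x + 1 (assumed law).
        x = 1
        for n in range(1, max_steps):
            if random.random() < c / x:
                return n + 1    # the step back to 0
            x += 1
        return None

    # Returns are sure -- prod_x (1 - c/x) tends to zero -- even though the
    # one-step mean drift 1 - c - c/x is positive for c < 1.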
9.6 Commentary
Harris chains are named after T. E. Harris who introduced many of the essential ideas
in [155]. The important result in Theorem 9.1.3, which enables the properties of Q to
be linked to those of L, is due to Orey [308], and our proof follows that in [309]. That
recurrent chains are “almost” Harris was shown by Tuominen [390], although the key
links between the powerful Harris properties and other seemingly weaker recurrence
properties were developed initially by Jain and Jamison [172].
We have taken the proof of transience for random walk on Z using the Strong Law
of Large Numbers from Spitzer [369].
Non-evanescence is a common form of recurrence for chains on R^k: see, for example,
Khas’minskii [206]. The links between evanescent and transient chains, and the equiva-
lence between Harris and non-evanescent chains under the T-chain condition, are taken
from Meyn and Tweedie [277], who proved Theorem 9.2.2. Most of the connections
between neighborhood and global behavior of chains are given by Rosenblatt [338, 339]
and Tuominen and Tweedie [391].
The criteria for non-evanescence or Harris recurrence here are of course closely re-
lated to those in the previous chapter. The martingale argument for non-evanescence
is in [277] and [398], but can be traced back in essentially the same form to Lamperti
[234]. The converse to the recurrence criterion under the Feller condition, and the fact
that it does not hold in general, are new: the construction of the converse function V
is however based on a similar result for countable chains, in Mertens et al. [258].
The term “coercive” to describe functions whose sublevel sets are precompact is
new. The justification for the terminology is that coercive functions do, in most of
our contexts, measure the distance from a point to a compact “center” of the state
space. This will become clearer in later chapters when we see that under a suitable
drift condition, the mean time to reach some compact set from Φ0 = x is bounded by
a constant multiple of V (x). Hence V (x) bounds the mean “distance” to this compact
set, measured in units of time. Beneš in [24] uses the term moment for these functions.
Since “moments” are standard in referring to the expectations of random variables, this
terminology is obviously inappropriate here.
Stochastic comparison arguments have been used for far too long to give a detailed
attribution. For proving transience, in particular, they are a most effective tool. The
analysis we present here of the SETAR model is essentially in Petruccelli et al. [315]
and Chan et al. [64].
The analysis of chains via their increments, and the delicate balance required be-
tween m(x) and v(x) for recurrence and transience, is found in Lamperti [234]; see also
Tweedie [398]. Growth models for which m(x) ≥ θv(x)/2x are studied by, for example,
Kersting (see [205]), and their analysis via suitable renormalization proves a fruitful
approach to such transient chains.
It may appear that we are devoting a disproportionate amount of space to unstable
chains, and too little to chains with stability properties. This will be rectified in the
rest of the book, where we will be considering virtually nothing but chains with ever
stronger stability properties.
Chapter 10
The existence of π
In our treatment of the structure and stability concepts for irreducible chains we have
to this point considered only the dichotomy between transient and recurrent chains.
For transient chains there are many areas of theory that we shall not investigate
further, despite the flourishing research that has taken place in both the mathematical
development and the application of transient chains in recent years. Areas which are
notable omissions from our treatment of Markovian models thus include the study of
potential theory and boundary theory [326], as well as the study of renormalized models
approximated by diffusions and the quasi-stationary theory of transient processes [108,
4].
Rather, we concentrate on recurrent chains which have stable properties without
renormalization of any kind, and develop the consequences of the concept of recurrence.
In this chapter we further divide recurrent chains into positive and null recurrent
chains, and show here and in the next chapter that the former class provide stochastic
stability of a far stronger kind than the latter.
For many purposes, the strongest possible form of stability that we might require
in the presence of persistent variation is that the distribution of Φn does not change as
n takes on different values. If this is the case, then by the Markov property it follows
that the finite dimensional distributions of Φ are invariant under translation in time.
Such considerations lead us to the consideration of invariant measures.
Invariant measures
A σ-finite measure π on B(X) with the property

π(A) = ∫_X π(dx) P(x, A), A ∈ B(X), (10.1)

will be called invariant.
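On a finite state space, (10.1) reduces to the left-eigenvector equation πP = π with π normalized to a probability; a minimal numerical sketch (the matrix is illustrative, not from the text):

    import numpy as np

    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])
    # pi P = pi: take the left eigenvector of P for eigenvalue 1 and
    # normalize it to a probability vector.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    assert np.allclose(pi @ P, pi)   # exactly (10.1) in matrix form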
Theorem 10.0.1. If the chain Φ is recurrent then it admits a unique (up to constant multiples) invariant measure π, and the measure π has the representation, for any A ∈ B^+(X),

π(B) = ∫_A π(dw) E_w[ Σ_{n=1}^{τ_A} I{Φ_n ∈ B} ], B ∈ B(X). (10.2)

The invariant measure π is finite (rather than merely σ-finite) if there exists a petite set C such that

sup_{x∈C} E_x[τ_C] < ∞.
Proof The existence and representation of invariant measures for recurrent chains
is proved in full generality in Theorem 10.4.9: the proof exploits, via the Nummelin split-
ting technique, the corresponding theorem for chains with atoms as in Theorem 10.2.1,
in conjunction with a representation for invariant measures given in Theorem 10.4.9.
The criterion for finiteness of π is in Theorem 10.4.10.
If an invariant measure is finite, then it may be normalized to a stationary probabil-
ity measure, and in practice this is the main stable situation of interest. If an invariant
measure has infinite total mass, then its probabilistic interpretation is much more dif-
ficult, although for recurrent chains, there is at least the interpretation as described in
(10.2).
These results lead us to define the following classes of chains.
Proof Suppose that the chain is positive and let π be an invariant probability measure. If the chain is also transient, let A_j be a countable cover of X with uniformly transient sets, as guaranteed by Theorem 8.3.4, with U(x, A_j) ≤ M_j, say.
Using (10.4) we have for any j, k

k π(A_j) = Σ_{n=1}^{k} ∫ π(dw) P^n(w, A_j) ≤ M_j,

and since the left hand side remains finite as k → ∞, we have π(A_j) = 0. This implies π is trivial, so we have a contradiction.
Positive chains are often called “positive recurrent” to reinforce the fact that they
are recurrent. This also naturally gives the definition
It is of course not yet clear that an invariant probability measure π ever exists, or
whether it will be unique when it does exist. It is the major purpose of this chapter to
find conditions for the existence of π, and to prove that for any positive (and indeed
recurrent) chain, π is essentially unique.
Invariant probability measures are important not merely because they define sta-
tionary processes. They will also turn out to be the measures which define the long
term or ergodic behavior of the chain. To understand why this should be plausible,
Subinvariant measures

If µ is σ-finite and satisfies

µ(A) ≥ ∫_X µ(dx) P(x, A), A ∈ B(X), (10.5)

then µ will be called subinvariant.
(ii) µ ≻ ψ;
Proof Suppose µ(A) < ∞ for some A with ψ(A) > 0. Using A*(j) = {y : K_{a_{1/2}}(y, A) > j^{−1}}, we have by (10.6),

∞ > µ(A) ≥ ∫_{A*(j)} µ(dw) K_{a_{1/2}}(w, A) ≥ j^{−1} µ(A*(j));

since ∪_j A*(j) = X when ψ(A) > 0, such a µ must be σ-finite.
To prove (ii) observe that, by (10.6), if B ∈ B^+(X) we have µ(B) > 0, so µ ≻ ψ.
Thirdly, if C is ν_a-petite then there exists a set B with ν_a(B) > 0 and µ(B) < ∞, from (i). By (10.6) we have

µ(B) ≥ ∫ µ(dw) K_a(w, B) ≥ µ(C) ν_a(B), (10.7)

so that µ(C) < ∞.
The major questions of interest in studying subinvariant measures lie with recurrent
chains, for we always have
Proposition 10.1.3. If the chain Φ is transient, then there exists a strictly subinvari-
ant measure for Φ.
Proof Suppose that Φ is transient: then by Theorem 8.3.4, we have that the measures µ_x given by

µ_x(A) = U(x, A), A ∈ B(X),

are σ-finite; and since

U(x, A) = P(x, A) + ∫_X U(x, dy) P(y, A) ≥ ∫_X µ_x(dy) P(y, A),

each µ_x is subinvariant (and obviously strictly subinvariant, since there is some A with µ_x(A) < ∞ such that P(x, A) > 0).
(ii) The measure µ◦α is minimal in the sense that if µ is subinvariant with µ(α) = 1, then

µ(A) ≥ µ◦α(A), A ∈ B(X).

(iii) The measure µ◦α is finite if and only if

E_α[τ_α] < ∞.

where the inequality comes from the bound µ◦α(α) ≤ 1. Thus µ◦α is subinvariant, and is invariant if and only if µ◦α(α) = P_α(τ_α < ∞) = 1; that is, from Proposition 8.3.1, if and only if the chain is recurrent.
(ii) Let µ be any subinvariant measure with µ(α) = 1. By subinvariance,

µ(A) ≥ ∫_X µ(dw) P(w, A) ≥ µ(α) P(α, A) = P(α, A).

Assume inductively that µ(A) ≥ Σ_{m=1}^{n} αP^m(α, A) for all A. Then by subinvariance,

µ(A) ≥ µ(α) P(α, A) + ∫_{α^c} µ(dw) P(w, A)
     ≥ P(α, A) + ∫_{α^c} [ Σ_{m=1}^{n} αP^m(α, dw) ] P(w, A)
     = Σ_{m=1}^{n+1} αP^m(α, A).

Hence, letting n → ∞, we must have µ ≥ µ◦α; and thus when Φ is recurrent, µ◦α is the unique (sub)invariant measure.
(iii) If µ◦α is finite it follows from Proposition 10.1.2 (iv) that µ◦α is invariant. Finally,

µ◦α(X) = Σ_{n=1}^{∞} P_α(τ_α ≥ n), (10.12)

and so an invariant probability measure exists if and only if the mean return time to α is finite, as stated.
We shall use π to denote the unique invariant measure in the recurrent case. Unless
stated otherwise we will assume π is normalized to be a probability measure when π(X)
is finite.
The invariant measure µ◦α has an equivalent sample path representation for recurrent chains:

µ◦α(A) = E_α[ Σ_{n=1}^{τ_α} I{Φ_n ∈ A} ], A ∈ B(X). (10.13)

This follows from the definition of the taboo probabilities αP^n.
As an immediate consequence of this construction we have the following elegant
criterion for positivity.
Theorem 10.2.2 (Kac's Theorem). If Φ is ψ-irreducible and admits an atom α ∈ B^+(X), then Φ is positive recurrent if and only if E_α[τ_α] < ∞; and if π is the invariant probability measure for Φ, then

π(α) = (E_α[τ_α])^{−1}. (10.14)

Proof If E_α[τ_α] < ∞, then also L(α, α) = 1, and by Proposition 8.3.1 Φ is recurrent; it follows from the structure of π in (10.10) that π is finite so that the chain is positive.
Conversely, E_α[τ_α] < ∞ when the chain is positive from the structure of the unique invariant measure.
By the uniqueness of the invariant measure normalized to be a probability measure π we have

π(α) = µ◦α(α)/µ◦α(X) = U_α(α, α)/U_α(α, X) = 1/E_α[τ_α],

which is (10.14).
The relationship (10.14) is often known as Kac’s Theorem. For countable state space
models it immediately gives us
Proposition 10.2.3. For a positive recurrent irreducible Markov chain on a countable
space, there is a unique (up to constant multiples) invariant measure π given by
π(x) = [E_x[τ_x]]^{−1}
for every x ∈ X.
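Kac's formula is easy to check by simulation on a small chain; a minimal Monte Carlo sketch (the transition law is illustrative, not from the text), comparing the empirical occupation frequency of a state with the reciprocal of its mean return time:

    import random

    # Illustrative three-state chain.
    P = {0: [(0, 0.5), (1, 0.5)],
         1: [(0, 0.25), (1, 0.5), (2, 0.25)],
         2: [(1, 0.5), (2, 0.5)]}

    def step(x):
        u, acc = random.random(), 0.0
        for y, p in P[x]:
            acc += p
            if u < acc:
                return y
        return P[x][-1][0]

    x0, x, visits, last, steps = 0, 0, 0, 0, 10**6
    returns = []
    for n in range(1, steps + 1):
        x = step(x)
        if x == x0:
            visits += 1
            returns.append(n - last)
            last = n
    # Empirical pi(x0) and mean return time: their product should be close
    # to one, exactly as (10.14) asserts.
    print(visits / steps, sum(returns) / len(returns))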
We now illustrate the use of the representation of π for a number of countable space
models.
As noted in Section 8.1.2, this chain is always recurrent since Σ_j p(j) = 1.
By construction we have that

1P^n(1, j) = p(j + n − 1), j, n ≥ 1,

so that, taking α = {1}, the minimal invariant measure is µ◦α(j) = Σ_{n≥1} 1P^n(1, j) = Σ_{k≥j} p(k); this measure is finite if and only if Σ_j Σ_{k≥j} p(k) = Σ_k k p(k) < ∞, that is, if and only if the renewal distribution {p(i)} has finite mean.
It is, of course, equally easy to deduce this formula by solving the invariant equations themselves, but the result is perhaps more illuminating from this approach.
Now suppose that the distribution {p(j)} is periodic with period d: that is, the greatest common divisor of the set N_p = {n : p(n) > 0} is d. Let [N_p] denote the span of N_p,

[N_p] = { Σ_i m_i r_i : m_i ∈ Z_+, r_i ∈ N_p }.

We have P^n(j, 1) > 0 whenever n − j + 1 ∈ [N_p].
By Lemma D.7.4 there exists an integer n_0 < ∞ such that nd ∈ [N_p] for all n ≥ n_0. If d = 1 it follows that the forward recurrence time process V^+ is aperiodic, since in this case

P^n(j, 1) > 0, n − j + 1 ≥ n_0. (10.18)
This chain is constructed by taking the two independent copies V_1^+(n), V_2^+(n) of the forward recurrence time chain and running them independently. It then follows from (10.18) that V^* is ψ-irreducible if {p(j)} has period d = 1.
Moreover V^* is positive Harris recurrent on X^* provided only Σ_k k p(k) < ∞, as was the case for the single copy of the forward recurrence time chain. To prove this we need only note that the product measure π^*(i, j) = π(i)π(j) is invariant for V^*, where

π(j) = Σ_{k≥j} p(k) / Σ_k k p(k)
is the invariant probability measure for the forward recurrence time process from (10.16) and (10.17); positive Harris recurrence follows since π^*(X^*) = [π(X)]^2 = 1.
These conditions for positive recurrence of the bivariate forward time process will
be of critical use in the development of the asymptotic properties of general chains in
Part III.
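The invariance of π(j) ∝ Σ_{k≥j} p(k) is easy to confirm numerically for a truncated renewal distribution; a small sketch (illustrative only), recalling that the forward recurrence time chain moves from j > 1 to j − 1 and jumps from 1 to j with probability p(j):

    import numpy as np

    p = np.array([0.0, 0.2, 0.3, 0.5])   # p(k), k = 0,...,3; p(0) = 0
    K = len(p) - 1
    # Forward recurrence time chain on {1, ..., K}: deterministic count-down,
    # renewal jump from state 1.
    P = np.zeros((K, K))
    P[0, :] = p[1:]
    for j in range(1, K):
        P[j, j - 1] = 1.0
    pi = np.array([p[j:].sum() for j in range(1, K + 1)])
    pi = pi / pi.sum()                    # normalization by sum_k k p(k)
    assert np.allclose(pi @ P, pi)        # pi is invariant, as claimed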
where q_i = P(Z = i − 1) for the increment variable in the chain when the server is busy; that is, for transitions from states other than {0}. The chain N^* is always ψ-irreducible if q_0 > 0, and irreducible in the standard sense if also q_0 + q_1 < 1, and we shall assume this to be the case to avoid trivialities.
In this case, we can actually solve the invariant equations explicitly. For j ≥ 1, (10.1) can be written

π(j) = Σ_{k=0}^{j+1} π(k) q_{j+1−k}, (10.20)

and if we define

q̄_j = Σ_{n=j+1}^{∞} q_n,

so, since β > −1, we must have β < 0. Conversely, if β < 0, and we take

π(0) = −β,

then the same summation (10.21) indicates that the invariant measure π is finite.
Thus we have

Proposition 10.3.1. The chain N^* is positive if and only if the increment distribution satisfies β = Σ_j j q_j − 1 < 0.
This same type of direct calculation can be carried out for any so-called “skip-free”
chain with P (i, j) = 0 for j < i − 1, such as the forward recurrence time chain above.
For other chains it can be far less easy to get a direct approach to the invariant measure
through the invariant equations, and we turn to the representation in (10.10) for our
results.
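The direct calculation for a skip-free chain can be mimicked numerically; a minimal sketch (the function name is ours, and the boundary law P(0, j) = q_j is an assumption, since the display defining the chain is not reproduced above), solving the balance equation at each level for the next value of π:

    def stationary_skip_free(q, n):
        # q[i] = P(Z = i - 1). Assumed law: P(0, j) = q[j] and
        # P(i, j) = q[j - i + 1] for i >= 1 (skip-free to the left).
        # Solve the balance equation at level j for pi(j + 1); needs q[0] > 0.
        pi = [1.0]
        for j in range(n - 1):
            s = pi[0] * (q[j] if j < len(q) else 0.0)
            s += sum(pi[k] * q[j + 1 - k]
                     for k in range(1, j + 1) if 0 <= j + 1 - k < len(q))
            pi.append((pi[j] - s) / q[0])
        z = sum(pi)
        return [v / z for v in pi]

    # Example: q = [0.6, 0.2, 0.2] has mean increment -0.4 < 0, so the
    # normalized solution is the invariant probability measure.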
[k]P^r(k, j) = αP^r(α, j − k). (10.23)
and since µ◦α is minimal it must be the smallest solution to (10.25). As is well known, there are two cases to consider: since the function of s on the right hand side of (10.25) is strictly convex, a solution s ∈ (0, 1) exists if and only if

Σ_{j=0}^{∞} j p_j > 1,

whilst if Σ_j j p_j ≤ 1 then the minimal solution to (10.25) is s_α = 1.
One can then verify directly that in each of these cases µ◦α solves all of the invariant equations, as required. In particular, if Σ_j j p_j = 1 so that the chain is recurrent from the remarks following Proposition 9.1.2, the unique invariant measure is µ◦α(x) ≡ 1, x ∈ X: note that in this case, in fact, the first invariant equation is exactly

1 = Σ_{j≥0} Σ_{n>j} p_n = Σ_j j p_j.
Hence for recurrent chains (those for which Σ_j j p_j ≥ 1) we have shown
Proposition 10.3.2. The unique subinvariant measure for N is given by µ_α(k) = s_α^k, where s_α is the minimal solution to (10.25) in (0, 1]; and N is positive recurrent if and only if Σ_j j p_j > 1.
The geometric form (10.24), as a “trial solution” to the equation (10.1), is often
presented in an arbitrary way: the use of Theorem 10.2.1 motivates this solution, and
also shows that sα in (10.24) has an interpretation as the expected number of visits to
state k + 1 from state k, for any k.
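Assuming (10.25) is the usual generating-function fixed-point equation s = Σ_j p_j s^j (its display is not reproduced above), the minimal solution can be computed by iterating from s = 0, which converges monotonically to the smallest root of this convex map:

    def minimal_root(p, tol=1e-12, max_iter=10**6):
        # Fixed-point iteration for s = sum_j p[j] * s**j starting at 0;
        # the iterates increase to the minimal solution in (0, 1].
        s = 0.0
        for _ in range(max_iter):
            s_next = sum(pj * s**j for j, pj in enumerate(p))
            if abs(s_next - s) < tol:
                return s_next
            s = s_next
        return s

    # Example: p = [0.25, 0.25, 0.5] has mean 1.25 > 1, and minimal_root(p)
    # returns 0.5 < 1, consistent with positive recurrence above.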
Proof To prove (i) note that by (5.5), (5.6), and (5.7), we have that the measure P̌(x_i, ·) is of the form µ*_{x_i} for any x_i ∈ X̌, where µ_{x_i} is a probability measure on X. By linearity of the splitting and invariance of π̌, for any Ǎ ∈ B(X̌),

π̌(Ǎ) = ∫ π̌(dx_i) P̌(x_i, Ǎ) = ∫ π̌(dx_i) µ*_{x_i}(Ǎ) = ( ∫ π̌(dx_i) µ_{x_i}(·) )*(Ǎ).

Thus π̌ = π_0^*, where π_0 = ∫ π̌(dx_i) µ_{x_i}(·).
By (10.26) we have that π(A) = π_0^*(A_0 ∪ A_1) = π_0(A), so that in fact π̌ = π^*. This proves one part of (i), and we now show that π is invariant for Φ. For any A ∈ B(X) we have by invariance of π^* and (5.10),

π(A) = π^*(A_0 ∪ A_1) = π^*P̌(A_0 ∪ A_1) = (πP)*(A_0 ∪ A_1) = πP(A),
Proof Assume that Φ is strongly aperiodic, and split the chain as in Section 5.1.
If Φ is recurrent then it follows from Proposition 8.2.2 that Φ̌ is also recurrent.
We have from Theorem 10.2.1 that Φ̌ has a unique subinvariant measure π̌ which is
invariant. Thus we have from Proposition 10.4.1 that Φ also has an invariant measure.
The uniqueness is equally easy. If Φ has another subinvariant measure µ, then by
Proposition 10.4.1 the split measure µ∗ is subinvariant for Φ̌, and since from Theo-
rem 10.2.1, the invariant measure π̌ is unique (up to constant multiples) for Φ̌, we must
have for some c > 0 that µ∗ = cπ̌. By linearity this gives µ = cπ as required.
We can, quite easily, lift this result to the whole chain even in the case where we do
not have strong aperiodicity by considering the resolvent chain, since the chain and the
resolvent share the same invariant measures.
Theorem 10.4.3. For any ε ∈ (0, 1), a measure π is invariant for the resolvent K_{a_ε} if and only if it is invariant for P.
This now gives us immediately
Proof Using Theorem 5.2.3, we have that the K_{a_ε}-chain is strongly aperiodic, and from Theorem 8.2.4 we know that the K_{a_ε}-chain is recurrent. Let π be the unique invariant measure for the K_{a_ε}-chain, guaranteed from Proposition 10.4.2. From Theorem 10.4.3, π is also invariant for Φ.
Suppose that µ is subinvariant for Φ. Then by (10.6) we have that µ is also subinvariant for the K_{a_ε}-chain, and so there is a constant c > 0 such that µ = cπ. Hence we have shown that π is the unique (up to constant multiples) invariant measure for Φ.
We may now equate positivity of Φ to positivity for its skeletons as well as the
resolvent chains.
Theorem 10.4.5. Suppose that Φ is ψ-irreducible and aperiodic. Then, for each m, a
measure π is invariant for the m-skeleton if and only if it is invariant for Φ.
Hence, under aperiodicity, the chain Φ is positive if and only if each of the m-
skeletons Φm is positive.
and so π_m = π.
Proposition 10.4.6. The measure µ◦A is subinvariant, and minimal in the sense that
µ(B) ≥ µ◦A (B) for all B ∈ B(X).
Hence the induction holds for all n, and taking n ↑ ∞ shows that

µ(B) ≥ ∫_A µ(dw) U_A(w, B).

Hence, the inequality µ(B) ≥ µ◦A(B) must be an equality for all B ⊆ A. Thus the measure µ satisfies

µ(B) = ∫_A µ(dw) U_A(w, B) (10.29)

whenever B ⊆ A.
We now use (10.29) to prove invariance of µ◦A. For any B ∈ B(X),

∫_X µ◦A(dy) P(y, B) = ∫_A µ◦A(dy) P(y, B) + ∫_{A^c} [ ∫_A µ◦A(dw) U_A(w, dy) ] P(y, B)
    = ∫_A µ◦A(dy) [ P(y, B) + Σ_{n=2}^{∞} AP^n(y, B) ]
    = µ◦A(B), (10.30)

and so µ◦A is invariant for Φ. It follows by definition that µ◦A(A^c) = 0, so (ii) is proved.
We now prove (i) by contradiction. Suppose that B ⊆ A with µ(B) > µ◦A(B). Then we have from invariance of the resolvent chain in Theorem 10.4.3 and minimality of µ◦A, and the assumption that K_{a_ε}(x, A) > 0 for x ∈ B,

µ(A) ≥ ∫_X µ(dy) K_{a_ε}(y, A) > ∫_X µ◦A(dy) K_{a_ε}(y, A) = µ◦A(A) = µ(A),

and we thus have a contradiction.
An interesting consequence of this approach is the identity (10.29). This has the
following interpretation. Assume A is Harris recurrent, and define the process on A, denoted by Φ^A = {Φ_n^A}, by starting with Φ_0^A = x ∈ A, then setting Φ_1^A as the value of Φ at the next visit to A, and so on. Since return to A is sure for Harris recurrent sets, this is well defined.
Formally, Φ^A is actually constructed from the transition law

U_A(x, B) = Σ_{n=1}^{∞} AP^n(x, B) = P_x{Φ_{τ_A} ∈ B},
B ⊆ A, B ∈ B(X). Theorem 10.4.7 thus states that for a Harris recurrent set A, any
subinvariant measure restricted to A is actually invariant for the process on A.
One can also go in the reverse direction, starting off with an invariant measure for
the process on A. The following result is proved using the same calculations used in
(10.30):
Proposition 10.4.8. Suppose that ν is an invariant probability measure supported on the set A with

∫_A ν(dx) U_A(x, B) = ν(B), B ⊆ A.

Then the measure ν° defined as

ν°(B) := ∫_A ν(dx) U_A(x, B), B ∈ B(X),

is invariant for Φ.
Theorem 10.4.9. Suppose Φ is recurrent. Then the unique (up to constant multiples) invariant measure π for Φ is equivalent to ψ and satisfies, for any A ∈ B^+(X), B ∈ B(X),

π(B) = ∫_A π(dy) U_A(y, B)
     = ∫_A π(dy) E_y[ Σ_{k=1}^{τ_A} I{Φ_k ∈ B} ] (10.31)
     = ∫_A π(dy) E_y[ Σ_{k=0}^{τ_A − 1} I{Φ_k ∈ B} ].
Proof The construction in Theorem 10.2.1 ensures that the invariant measure π exists. Hence from Theorem 10.4.7 we see that π = π◦A for any Harris recurrent set A, and π then satisfies the first equality in (10.31) by construction. The second equality is just the definition of U_A. To see the third equality,

∫_A π(dy) E_y[ Σ_{k=1}^{τ_A} I{Φ_k ∈ B} ] = ∫_A π(dy) E_y[ Σ_{k=0}^{τ_A − 1} I{Φ_k ∈ B} ],
We finally prove that π is equivalent to ψ. From Proposition 10.1.2 we need only show that if ψ(B) = 0 then also π(B) = 0. But since ψ(B̄) = 0, we have that B^0 ∈ B^+(X), and so from the representation (10.31),

π(B) = ∫_{B^0} π(dy) U_{B^0}(y, B) = 0,
Theorem 10.4.10. Suppose that Φ is ψ-irreducible, and let µ denote any subinvariant
measure.
(i) The chain Φ is positive if and only if for one, and then every, set with µ(A) > 0,

∫_A µ(dy) E_y[τ_A] < ∞. (10.32)

(ii) The measure µ is finite and thus Φ is positive recurrent if for some petite set C ∈ B^+(X)

sup_{y∈C} E_y[τ_C] < ∞. (10.33)
if this is finite then µ◦A is finite and the chain is positive by definition. Conversely, if the chain is positive then by Theorem 10.4.9 we know that µ must be a finite invariant measure and (10.32) then holds for every A.
The second result now follows since we know from Proposition 10.1.2 that µ(C) < ∞
for petite C; and hence we have positive recurrence from (10.33) and (i), whilst the chain
is also Harris if (10.34) holds from the criterion in Theorem 9.1.7.
In Chapter 11 we find a variety of usable and useful conditions for (10.33) and
(10.34) to hold, based on a drift approach which strengthens those in Chapter 8.
since Γ(R) = 1. We have already used this formula in (6.8): here it shows that Lebesgue
measure is invariant for unrestricted random walk in either the transient or the recurrent
case.
Since Lebesgue measure on R is infinite, we immediately have from Theorem 10.4.9
that there is no finite invariant measure for this chain: this proves
Proposition 10.5.1. The random walk on R is never positive recurrent.
If we put this together with the results in Section 9.5, then we have that when the
mean β of the increment distribution is zero, then the chain is null recurrent.
Finally, we note that this is one case where the interpretation in (10.31) can be
expressed in another way. We have, as an immediate consequence of this interpretation
Proposition 10.5.2. Suppose Φ is a random walk on R, with spread-out increment
measure Γ having zero mean and finite variance.
Let A be any bounded set in R with µ_Leb(A) > 0, and let the initial distribution of Φ_0 be the uniform distribution on A. If we let N_A(B) denote the number of visits to a set B prior to return to A, then for any two bounded sets B, C with µ_Leb(C) > 0 we have

E[N_A(B)]/E[N_A(C)] = µ_Leb(B)/µ_Leb(C).
Proof Under the given conditions on Γ we have from Proposition 9.4.5 that the chain is non-evanescent, and hence recurrent.
Using (10.35) we have that the unique invariant measure with π(A) = 1 is π = µ_Leb/µ_Leb(A), and then the result follows from the form (10.31) of π.
Now we use uniqueness of the invariant measure to note that, since the chain V_δ^+ is the "two-step" chain for the chain V_{δ/2}^+, the invariant measures π_δ and π_{δ/2} must coincide. Thus letting δ go to zero through the values δ/2^n we find that for any δ the invariant measure is given by

π_δ(dy) = m^{−1} F[y, ∞) dy (10.36)

where m = ∫_0^∞ t F(dt); and π_δ is a probability measure provided m < ∞.
By direct integration it is also straightforward to show that this is indeed the invariant measure for V_δ^+.
This form of the invariant measure thus reinforces the fact that the quantity F[y, ∞)dy is the expected amount of time spent in the infinitesimal set dy on each excursion from the point {0}, even though in the discretized chain V_δ^+ the point {0} is never actually reached.
P(i, x; j × A) = 0, j > i + 1,
P(i, x; j × A) = Λ_{i−j+1}(x, A), j = 1, . . . , i + 1, (10.37)
P(i, x; 0 × A) = Λ*_i(x, A).
Let us consider the general chain defined by (10.37), where we can treat x and A
as general points in and subsets of X, so that the chain Φ now moves on a ladder
whose (countable number of) rungs are general in nature. In the special case of the
GI/G/1 model the results specialize to the situation where X = R+ , and there are many
countable models where the rungs are actually finite and matrix methods are used to
achieve the following results.
Using the representation of π, it is possible to construct an invariant measure for
this chain in an explicit way; this then gives the structure of the invariant measure for
the GI/G/1 queue also.
Since we are interested in the structure of the invariant probability measure we make
the assumption in this section that the chain defined by (10.37) is positive Harris and
ψ([0]) > 0, where [0] := {0 × X} is the bottom “rung” of the ladder. We shall explore
conditions for this to hold in Chapter 19.
Our assumption ensures we can reach the bottom of the ladder with probability one.
Let us denote by π0 the invariant probability measure for the process on [0], so that π0
can be thought of as a measure on B(X).
Our goal will be to prove that the structure of the invariant measure for Φ is an
“operator-geometric” one, mimicking the structure of the invariant measure developed
in Section 10.3 for skip-free random walk on the integers.
where

S^k(y, A) = ∫_X S(y, dz) S^{k−1}(z, A), (10.39)

so that if we write

S^{(k)}(y, A) := U_{[0]}(0, y; k × A) (10.42)

we have by definition

π(k × A) = ∫_{[0]} π_0(dy) S^{(k)}(y, A). (10.43)
Now if we define the set [n] = {0, 1, . . . , n} × X, by the fact that the chain is translation invariant above the zero level we have that the functions

are independent of n. Using a last-exit decomposition over visits to [k], together with the skip-free property which ensures that the last visit to [k] prior to reaching (k + 1) × X takes place at the level k × X, we find

[0]P^n(0, x; (k + 1) × A) = Σ_{j=1}^{n−1} ∫_X [0]P^j(0, x; k × dy) [k]P^{n−j}(k, y; (k + 1) × A) (10.45)
    = Σ_{j=1}^{n−1} ∫_X [0]P^j(0, x; k × dy) [0]P^{n−j}(0, y; 1 × A).

Summing over n and using (10.44) shows that the operators S^{(k)}(y, A) have the geometric form in (10.39) as stated.
To see that the operator S satisfies (10.40), we decompose [0]P^n over the position at time n − 1. By construction [0]P^1(0, x; 1 × B) = Λ_0(x, B), and for n > 1,

[0]P^n(0, x; 1 × B) = Σ_{k≥1} ∫_X [0]P^{n−1}(0, x; k × dy) Λ_k(y, B); (10.46)
so that

S_{N−1}(x; (k + 1) × B) ≤ ∫_X S^k_{N−1}(x; 1 × dy) S_{N−1}(y; 1 × B). (10.48)

Now let S* be any other solution of (10.40). Notice that S_1(x; 1 × B) = Λ_0(x, B) ≤ S*(x, B), from (10.40). Assume inductively that S_{N−1}(x; 1 × B) ≤ S*(x, B) for all x, B: then we have from (10.50) that

S_N(x; 1 × B) ≤ Σ_k ∫_X [S*]^k(x, dy) Λ_k(y, B) = S*(x, B). (10.51)
Φ_n = (N_n, R_n), n ≥ 1,

where N_n is the number of customers at T_n− and R_n is the residual service time at T_n+. In this case the representation of π_0 can also be made explicit.
For the GI/G/1 chain we have that the chain on [0] has the distribution of R_n at a time point {T_n+} where there were no customers at {T_n−}: so at these time points R_n has precisely the distribution of the service brought by the customer arriving at T_n, namely H.
So in this case we have that the process on [0], provided [0] is recurrent, is a process of i.i.d. random variables with distribution H, and thus is very clearly positive Harris with invariant probability H.
Theorem 10.5.3 then gives us
Theorem 10.5.4. The ladder chain Φ describing the GI/G/1 queue has an invariant probability if and only if the measure π given by

π(k × A) = ∫_X H(dy) S^k(y, A) (10.52)

is finite. In this case π suitably normalized is the unique invariant probability measure for Φ.

Proof Using the proof of Theorem 10.5.3 we have that π is the minimal subinvariant measure for the GI/G/1 queue, and the result is then obvious.
X_k = F^k x_0 + Σ_{i=0}^{k−1} F^i G W_{k−i} ∼ F^k x_0 + Σ_{i=0}^{k−1} F^i G W_i,

where ∼ denotes equality in distribution, and hence

P^k g(x_0) = E_{x_0}[g(X_k)] = E[g(F^k x_0 + Σ_{i=0}^{k−1} F^i G W_i)].
Under the additional hypothesis that the eigenvalue condition (LSS5) holds, it follows from Lemma 6.3.4 that F^i → 0 as i → ∞ at a geometric rate. Since W has a finite mean, it then follows from Fubini's Theorem that the sum

X_∞ := Σ_{i=0}^{∞} F^i G W_i

converges absolutely, with E[|X_∞|] ≤ E[|W|] Σ_{i=0}^{∞} ‖F^i G‖ < ∞, with ‖ · ‖ an appropriate matrix norm. Hence by the Dominated Convergence Theorem, and the assumption that g is continuous,

lim_{k→∞} P^k g(x_0) = E[g(X_∞)].
Since π is determined by its values on continuous bounded functions, this proves that
π is invariant.
In the Gaussian case (LSS3) we can express the invariant probability more explicitly. In this case X_∞ itself is Gaussian with mean zero and covariance

E[X_∞ X_∞^T] = Σ_{i=0}^{∞} F^i G G^T (F^i)^T.

That is, π = N(0, Σ) where Σ is equal to the controllability grammian for the linear state space model, defined in (4.17).
The covariance matrix Σ is full rank if and only if the controllability condition (LCM3) holds, and in this case, for any k greater than or equal to the dimension of the state space, P^k(x, dy) possesses the density p_k(x, y)dy given in (4.18). It follows immediately that when (LCM3) holds, the probability π possesses the density p on R^n given by

p(y) = ((2π)^n |Σ|)^{−1/2} exp( −½ y^T Σ^{−1} y ), (10.54)

while if the controllability condition (LCM3) fails to hold then the invariant probability is concentrated on the controllable subspace X_0 = R(Σ) ⊂ X and is hence singular with respect to Lebesgue measure.
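The grammian Σ = Σ_i F^i G G^T (F^i)^T is the fixed point of a Lyapunov recursion, which gives a simple way to compute it; a minimal numerical sketch (the function name is ours):

    import numpy as np

    def controllability_grammian(F, G, tol=1e-12, max_iter=10**5):
        # Iterate Sigma <- F Sigma F^T + G G^T, whose fixed point is
        # Sigma = sum_i F^i G G^T (F^i)^T; converges geometrically when the
        # eigenvalue condition (LSS5) holds (spectral radius of F below one).
        Sigma = np.zeros((F.shape[0], F.shape[0]))
        for _ in range(max_iter):
            nxt = F @ Sigma @ F.T + G @ G.T
            if np.max(np.abs(nxt - Sigma)) < tol:
                return nxt
            Sigma = nxt
        return Sigma

    # np.linalg.matrix_rank of the result checks the controllability
    # condition (LCM3): full rank means pi has the density (10.54).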
10.6 Commentary
The approach to positivity given here is by no means standard. It is much more common,
especially with countable spaces, to classify chains either through the behavior of the
sequence P n , with null chains being those for which P n (x, A) → 0 for, say, petite sets
A and all x, and positive chains being those for which such limits are not always zero;
a limiting argument such as that in (10.4), which we have illustrated in Section 10.5.4,
then shows the existence of π in the positive case.
Alternatively, positivity is often defined through the behavior of the expected return
times to petite or other suitable sets.
We will show in Chapter 11 and Chapter 18 that even on a general space all of
these approaches are identical. Our view is that the invariant measure approach is
much more straightforward to understand than the P n approach, and since one can
now develop through the splitting technique a technically simple set of results this gives
an appropriate classification of recurrent chains.
The existence of invariant probability measures has been a central topic of Markov
chain theory since the inception of the subject. Doob [99] and Orey [309] give some good
background. The approach to countable recurrent chains through last-exit probabilities
as in Theorem 10.2.1 is due to Derman [86], and has not changed much since, although
the uniqueness proofs we give owe something to Vere-Jones [406]. The construction of π
given here is of course one of our first serious uses of the splitting method of Nummelin
[301]; for strongly aperiodic chains the result is also derived in Athreya and Ney [13].
The fact that one identifies the actual structure of π in Theorem 10.4.9 will also be of
great use, and Kac’s Theorem [186] provides a valuable insight into the probabilistic
difference between positive and null chains: this is pursued in the next chapter in
considerably more detail.
Before the splitting technique, verifying conditions for the existence of π had ap-
peared to be a deep and rather difficult task. It was recognized in the relatively early
development of general state space Markov chains that one could prove the existence
of an invariant measure for Φ from the existence of an invariant probability measure
for the “process on A”. The approach pioneered by Harris [155] for finding the latter
involves using deeper limit theorems for the "process on A" in the special case where A is a ν_n-small set with a_n = δ_n and ν_n{A} > 0 (called a C-set in Orey [309]). In this
methodology, it is first shown that limiting probabilities for the process on A exist, and
the existence of such limits then provides an invariant measure for the process on A:
by the construction described in this chapter this can be lifted to an invariant measure
for the whole chain. Orey [309] remains an excellent exposition of the development of
this approach.
This “process on A” method is still the only one available without some regeneration,
and we will develop this further in a topological setting in Chapter 12, using many of
the constructions above.
We have shown that invariant measures exist without using such deep asymptotic
properties of the chain, indicating that the existence and uniqueness of such measures
is in fact a result requiring less of the detailed structure of the chain.
The minimality approach of Section 10.4.2 of course would give another route to
Theorem 10.4.4, provided we had some method of proving that a “starting” subinvari-
ant measure existed. There is one such approach, which avoids splitting and remains
conceptually simple. This involves using the kernels

U^{(r)}(x, A) = Σ_{n=1}^{∞} P^n(x, A) r^n ≥ r ∫_X U^{(r)}(x, dy) P(y, A), (10.55)

defined for 0 < r < 1. One can then define a subinvariant measure for Φ as a limit

lim_{r↑1} π_r(·) := lim_{r↑1} [ ∫_C ν_n(dy) U^{(r)}(y, ·) ] / [ ∫_C ν_n(dy) U^{(r)}(y, C) ],
where C is a ν_n-small set. The key is the observation that this limit gives a non-trivial σ-finite measure due to the inequalities

M_j ≥ π_r(C̄(j)) (10.56)
and

π_r(A) ≥ r^n ν_n(A), A ∈ B(X), (10.57)
which are valid for all r large enough. Details of this construction are in Arjas and
Nummelin [7], as is a neat alternative proof of uniqueness.
All of these approaches are now superseded by the splitting approach, but of course
only when the chain is ψ-irreducible. If this is not the case then the existence of an
invariant measure is not simple. The methods of Section 10.4.2, which are based on
Tweedie [402], do not use irreducibility, and in conjunction with those in Chapter 12
they give some ways of establishing uniqueness and structure for the invariant measures
from limiting operations, as illustrated in Section 10.5.4.
The general question of existence and, more particularly, uniqueness of invariant
measures for non-irreducible chains remains open at this stage of theoretical develop-
ment.
The invariance of Lebesgue measure for random walk is well known, as is the form
(10.36) for models in renewal theory. The invariant measures for queues are derived
directly in [59], but the motivation through the minimal measure of the geometric form
is not standard. The extension to the operator-geometric form for ladder chains is in
[399], and in the case where the rungs are finite, the development and applications are
given by Neuts [293, 294].
The linear model is analyzed in Snyders [364] using ideas from control theory, and
the more detailed analysis given there allows a generalization of the construction given in
Section 10.5.4. Essentially, if the noise does not enter the “unstable” region of the state
space then the stability condition on the driving matrix F can be slightly weakened.
Chapter 11
Drift and regularity
Using the finiteness of the invariant measure to classify two different levels of stabil-
ity is intuitively appealing. It is simple, and it also involves a fundamental stability
requirement of many classes of models. Indeed, in time series analysis for example, a
standard starting point, rather than an end point, is the requirement that the model be
stationary, and it follows from (10.4) that for a stationary version of a model to exist
we are in effect requiring that the structure of the model be positive recurrent.
In this chapter we consider two other descriptions of positive recurrence which we
show to be equivalent to that involving finiteness of π.
The first is in terms of regular sets.

Regularity

A set C ∈ B(X) is called regular, when Φ is ψ-irreducible, if

sup_{x∈C} E_x[τ_B] < ∞, B ∈ B^+(X). (11.1)

We know from Theorem 10.2.1 that when there is a finite invariant measure and an atom α ∈ B^+(X) then E_α[τ_α] < ∞. A regular set C ∈ B^+(X) as defined by (11.1) has the property not only that the return times to C itself, but indeed the mean hitting times on any set in B^+(X), are bounded from starting points in C.
We will see that there is a second, equivalent, approach in terms of conditions on the one-step "mean drift"

∆V(x) = ∫_X P(x, dy)V(y) − V(x) = E_x[V(Φ_1) − V(Φ_0)]. (11.2)
We have already shown in Chapter 8 and Chapter 9 that for ψ-irreducible chains, drift
towards a petite set implies that the chain is recurrent or Harris recurrent, and drift
away from such a set implies that the chain is transient. The high points in this chapter
are the following much more wide ranging equivalences.
Theorem 11.0.1. Suppose that Φ is a Harris recurrent chain, with invariant measure
π. Then the following three conditions are equivalent:
(i) The measure π has finite total mass;
(ii) There exists some petite set C ∈ B(X) and M_C < ∞ such that

sup_{x∈C} E_x[τ_C] ≤ M_C; (11.3)

(iii) There exists some petite set C, a constant b < ∞, and some extended-real-valued, non-negative test function V, which is finite for at least one state in X, satisfying

∆V(x) ≤ −1 + b I_C(x), x ∈ X. (11.4)
When (iii) holds then V is finite on an absorbing full set S and the chain restricted to
S is regular; and any sublevel set of V satisfies (11.3).
Proof That (ii) is equivalent to (i) is shown by combining Theorem 10.4.10 with
Theorem 11.1.4, which also shows that some full absorbing set exists on which Φ is
regular. The equivalence of (ii) and (iii) is in Theorem 11.3.11, whilst the identification
of the set S as the set where V is finite is in Proposition 11.3.13, where we also show
that sublevel sets of V satisfy (11.3).
Both of these approaches, as well as giving more insight into the structure of positive
recurrent chains, provide tools for further analysis of asymptotic properties in Part III.
In this chapter, the equivalence of existence of solutions of the drift condition (11.4)
and the existence of regular sets is motivated, and explained to a large degree, by the
deterministic results in Section 11.2. Although there are a variety of proofs of such
results available, we shall develop a particularly powerful approach via a discrete time
form of Dynkin’s formula.
Because it involves only the one-step transition kernel, (11.4) provides an invaluable
practical criterion for evaluating the positive recurrence of specific models: we illustrate
this in Section 11.4.
There exists a matching, although less important, criterion for the chain to be non-
positive rather than positive: we shall also prove in Section 11.5.1 that if a test function
satisfies the reverse drift condition
∆V(x) ≥ 0, x ∈ C^c, (11.5)

then, provided the increments are bounded in mean, in the sense that

sup_{x∈X} ∫ P(x, dy) |V(x) − V(y)| < ∞, (11.6)

the chain is non-positive.
Since the left hand side is finite for any x, and by irreducibility for any y there is some n with xP^n(x, y) > 0, we must have E_y[τ_x] < ∞ for all y also.
It will require more work to find the connections between positive recurrence and
regularity in general.
It is not implausible that positive chains might admit regular sets. It follows imme-
diately from (10.32) that in the positive recurrent case for any A ∈ B+ (X) we have
Thus we have from the form of π more than enough “almost-regular” sets in the positive
recurrent case.
To establish the existence of true regular sets we first consider ψ-irreducible chains
which possess a recurrent atom α ∈ B + (X). Although it appears that regularity may
be a difficult criterion to meet since in principle it is necessary to test the hitting time
of every set in B + (X), when an atom exists it is only necessary to consider the first
hitting time to the atom.
Theorem 11.1.2. Suppose that there exists an accessible atom α ∈ B^+(X).

(i) If Φ is positive recurrent then there exists a decomposition

X = S ∪ N (11.8)

where the set S is full and absorbing, and Φ restricted to S is regular.

(ii) The chain Φ is regular if and only if

E_x[τ_α] < ∞ (11.9)

for every x ∈ X.
Proof Let

S := {x : E_x[τ_α] < ∞};

obviously S is absorbing, and since the chain is positive recurrent we have from Theorem 10.4.10 (ii) that E_α[τ_α] < ∞, and hence α ∈ S. This also shows immediately that S is full by Proposition 4.2.3.
Let B be any set in B^+(X) with B ⊆ α^c, so that for π-almost all y ∈ B we have E_y[τ_B] < ∞ from (11.7). From ψ-irreducibility there must then exist amongst these values one w and some n such that BP^n(w, α) > 0. Since
so that each Sn is a regular set, and since {Sn } is a cover of S, we have that Φ restricted
to S is regular.
This proves (i): to see (ii) note that under (11.9) we have X = S, so the chain is
regular; whilst the converse is obvious.
It is unfortunate that the ψ-null set N in Theorem 11.1.2 need not be empty. For
consider a chain on Z_+ with

P(0, 0) = 1,
P(j, 0) = β_j > 0,
P(j, j + 1) = 1 − β_j. (11.12)

Then the chain restricted to {0} is trivially regular, and the whole chain is positive recurrent; but if

Σ_{j≥1} Π_{k=1}^{j} (1 − β_k) = ∞

then E_j[τ_0] = ∞ for j ≥ 1, and the null set N in (11.8) is not empty.
Proposition 11.1.3. Suppose that Φ is strongly aperiodic and positive recurrent. Then
there exists a decomposition
X=S∪N (11.13)
Proof We know from Proposition 10.4.2 that the split chain is also positive recurrent with invariant probability measure π̌; and thus for π̌-a.e. x_i ∈ X̌, by (11.7) we have that

Ě_{x_i}[τ_α̌] < ∞. (11.14)
Let Š ⊆ X̌ denote the set where (11.14) holds. Then it is obvious that Š is absorbing,
and by Theorem 11.1.2 the chain Φ̌ is regular on Š. Let {Šn } denote the cover of Š
with regular sets.
Now we have Ň = X̌ \ Š ⊆ X_0, and so if we write N as the copy of Ň and define S = X \ N, we can cover S with the matching copies S_n. We then have for x ∈ S_n and any B ∈ B^+(X)

E_x[τ_B] ≤ Ě_{x_0}[τ_B] + Ě_{x_1}[τ_B],

which is bounded for x_0 ∈ Š_n and all x_1 ∈ α̌, and hence for x ∈ S_n.
Thus S is the required full absorbing set for (11.13) to hold.
It is now possible, by the device we have used before of analyzing the m-skeleton,
to show that this proposition holds for arbitrary positive recurrent chains.
Theorem 11.1.4. Suppose that Φ is ψ-irreducible. Then the following are equivalent:
(i) The chain Φ is positive recurrent.
(ii) There exists a decomposition
X=S∪N (11.15)
where the set S is full and absorbing, and Φ restricted to S is regular.
Proof Assume Φ is positive recurrent. Then the Nummelin splitting exists for
some m-skeleton from Proposition 5.4.5, and so we have from Proposition 11.1.3 that
there is a decomposition as in (11.15) where the set S = ∪Sn and each Sn is regular for
the m-skeleton.
But if τ_B^m denotes the number of steps needed for the m-skeleton to reach B, then we have that

τ_B ≤ m τ_B^m

and so each S_n is also regular for Φ as required.
The converse is almost trivial: when the chain is regular on S then there exists
a petite set C inside S with supx∈C Ex [τC ] < ∞, and the result follows from Theo-
rem 10.4.10.
Just as we may restrict any recurrent chain to an absorbing set H on which the
chain is Harris recurrent, we have here shown that we can further restrict a positive
recurrent chain to an absorbing set where it is regular.
We will now turn to the equivalence between regularity and mean drift conditions.
This has the considerable benefit that it enables us to identify exactly the null set on
which regularity fails, and thus to eliminate from consideration annoying and patho-
logical behavior in many models. It also provides, as noted earlier, a sound practical
approach to assessing stability of the chain.
To motivate and perhaps give more insight into the connections between hitting
times and mean drift conditions we first consider deterministic models.
In this section we analyze a deterministic state space model, indicating the role we might
expect the drift conditions (11.4) on ∆V to play. As we have seen in Chapter 4 and
Chapter 7 in examining irreducibility structures, the underlying deterministic models
for state space systems foreshadow the directions to be followed for systems with a noise
component.
Let us then assume that there is a topology on X, and consider the deterministic
process known as a semi-dynamical system.
Φ_{k+1} = F(Φ_k), k ∈ Z_+, (11.16)

P f(·) = f(F(·)).
Since we have assumed the function F to be continuous, the Markov chain Φ has the
Feller property, although in general it will not be a T-chain.
For such a deterministic system it is standard to consider two forms of stability
known as recurrence and ultimate boundedness. We shall call the deterministic system
(11.16) recurrent if there exists a compact subset C ⊂ X such that σC (x) < ∞ for each
initial condition x ∈ X. Such a concept of recurrence here is almost identical to the
definition of recurrence for stochastic models. We shall call the system (11.16) ultimately
bounded if there exists a compact set C ⊂ X such that for each fixed initial condition
Φ0 ∈ X, the trajectory starting at Φ0 eventually enters and remains in C. Ultimate
boundedness is loosely related to positive recurrence: it requires that the limit points of
the process all lie within a compact set C, which is somewhat analogous to the positivity
requirement that there be an invariant probability measure π with π(C) > 1 − ε for
some small ε.
   sup_{x∈C} V(F(x)) ≤ M.
If we consider the sequence V(Φ_n) on R+ then this condition requires that this sequence move monotonically downwards at a uniform rate until the first time that Φ enters C. It is therefore not surprising that Φ hits C in a finite time under this condition.
(ii) If Φ is recurrent, then there exists a positive function V such that (DS2) holds.
Proof To prove (i), let Φ(x, n) = F n (x) denote the deterministic position of Φn if
the chain starts at Φ_0 = x. We first show that the compact set C̄ defined as
   C̄ := {Φ(x, i) : x ∈ C, 1 ≤ i ≤ M + 1} ∪ C
is invariant. Because V is positive and decreases on C^c, every trajectory must enter the set C, and hence also C̄, at some finite time. We conclude that Φ is ultimately bounded.
We now prove (ii). Suppose that a compact set C_1 exists such that σ_{C_1}(x) < ∞ for each initial condition x ∈ X. Let O be an open pre-compact set containing C_1, and set C := cl O. Then the test function
   V(x) := σ_O(x)
satisfies (DS2). To see this, observe that if x ∈ C c , then V (F (x)) = V (x) − 1 and hence
the first inequality is satisfied. By assumption the function V is everywhere finite,
and since O is open it follows that V is upper semicontinuous from Proposition 6.1.1.
This implies that the second inequality in (DS2) holds, since a finite-valued upper
semicontinuous function is uniformly bounded on compact sets.
For a semi-dynamical system, this result shows that recurrence is actually equiva-
lent to ultimate boundedness. In this the deterministic system differs from the general
NSS(F ) model with a non-trivial random component. More pertinently, we have also
shown that the semi-dynamical system is ultimately bounded if and only if a test func-
tion exists satisfying (DS2).
This test function may always be taken to be the time to reach a certain compact
set. As an almost exact analogue, we now go on to see that the expected time to
reach a petite set is the appropriate test function to establish positive recurrence in the
stochastic framework; and that, as we show in Theorem 11.3.4 and Theorem 11.3.5, the
existence of a test function similar to (DS2) is equivalent to positive recurrence.
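To make the role of such a test function concrete, here is a minimal computational sketch: it takes an illustrative contracting map F(x) = x/2 + 1 on R (not a model from the text) and computes V(x) = σ_O(x), the hitting time of an open set O, checking the defining property V(F(x)) = V(x) − 1 off O that drives (DS2).

```python
def F(x):
    # An illustrative contracting semi-dynamical system on R; fixed point 2.
    return x / 2.0 + 1.0

def sigma_O(x, lo=0.0, hi=3.0, max_steps=10_000):
    """First n >= 0 with F^n(x) in the open set O = (lo, hi)."""
    n = 0
    while not (lo < x < hi):
        x = F(x)
        n += 1
        if n > max_steps:
            raise RuntimeError("trajectory failed to reach O")
    return n

# V(x) = sigma_O(x) decreases by exactly one along the flow outside O,
# which is the first inequality in (DS2).
for x0 in (-100.0, -5.0, 10.0, 1000.0):
    assert sigma_O(F(x0)) == sigma_O(x0) - 1   # each x0 here lies outside O
    print(x0, sigma_O(x0))
```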
for some non-negative function V and some set C ∈ B(X); and for some M < ∞,
∆V (x) ≤ M, x ∈ C. (11.19)
Thus we might hope that (V2) might have something of the same impact for stochastic
models as (DS2) has for deterministic chains.
In essentially the form (11.18) and (11.19) these conditions were introduced by Foster
[129] for countable state space chains, and shown to imply positive recurrence. Use of
the form (V2) will actually make it easier to show that the existence of everywhere
finite solutions to (11.17) is equivalent to regularity and moreover we will identify the
sublevel sets of the test function V as regular sets.
The central technique we will use to make connections between one-step mean drifts
and moments of first entrance times to appropriate (usually petite) sets hinges on
a discrete time version of a result known for continuous time processes as Dynkin’s
formula.
This formula yields not only those criteria for positive Harris chains and regularity
which we discuss in this chapter, but also leads in due course to necessary and sufficient
conditions for rates of convergence of the distributions of the process; necessary and
sufficient conditions for finiteness of moments; and sample path ergodic theorems such
as the Central Limit Theorem and Law of the Iterated Logarithm. All of these are
considered in Part III.
Dynkin’s formula is a sample path formula, rather than a formula involving proba-
bilistic operators. We need to introduce a little more notation to handle such situations.
Recall from Section 3.4 the definition
   F_k^Φ := σ(Φ_0, . . . , Φ_k),
and let {Z_k, F_k^Φ} be an adapted sequence of positive random variables. For each k, Z_k will denote a fixed Borel measurable function of (Φ_0, . . . , Φ_k), although in applications this will usually (although not always) be a function of the last position, so that
   Z_k(Φ_0, . . . , Φ_k) = Z(Φ_k)
for some measurable function Z. We will somewhat abuse notation and let Z_k denote both the random variable and the function on X^{k+1}.
For any stopping time τ define the truncated time
   τ^n := min{n, τ, inf{k ≥ 0 : Z_k ≥ n}}.
The random time τ^n is also a stopping time, since it is the minimum of stopping times, and the random variable Σ_{i=0}^{τ^n−1} Z_i is essentially bounded by n².
Dynkin’s formula will now tell us that we can evaluate the expected value of Zτ n by
taking the initial value Z0 and adding on to this the average increments at each time
until τ n . This is almost obvious, but has widespread consequences: in particular it
enables us to use (V2) to control these one-step average increments, leading to control
of the expected overall hitting time.
   E_x[Z_{τ^n}] = E_x[Z_0] + E_x[ Σ_{i=1}^{τ^n} ( E[Z_i | F_{i−1}^Φ] − Z_{i−1} ) ].
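The formula is easy to test numerically. The sketch below (an illustrative reflected random walk, not a model from the text) estimates both sides of Dynkin's formula by Monte Carlo, with Z_k = Φ_k, τ the hitting time of {0}, and the truncation τ^n; the agreement of the two estimates is exactly the content of the formula.

```python
import random

def run(x0, n):
    """One path up to tau^n = min(n, tau_0, first k with Z_k >= n); returns
    (Z at tau^n, accumulated conditional drifts E[Z_i | F_{i-1}] - Z_{i-1})."""
    x, drift = x0, 0.0
    for _ in range(n):
        if x == 0 or x >= n:            # tau_0 reached, or Z_k >= n
            break
        # One-step conditional mean for the walk: up w.p. 0.4, down w.p. 0.6.
        drift += 0.4 * (x + 1) + 0.6 * (x - 1) - x      # = -0.2 for x >= 1
        x = x + 1 if random.random() < 0.4 else x - 1
    return x, drift

random.seed(1)
paths = [run(5, 50) for _ in range(100_000)]
lhs = sum(z for z, _ in paths) / len(paths)          # E_x[Z_{tau^n}]
rhs = 5 + sum(d for _, d in paths) / len(paths)      # Z_0 + expected drifts
print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```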
As an immediate corollary we have
Proposition 11.3.2. Suppose that there exist two sequences of positive functions {s_k, f_k : k ≥ 0} on X such that
   E[Z_{k+1} | F_k^Φ] ≤ Z_k − f_k(Φ_k) + s_k(Φ_k),  k ∈ Z+.
Then for each initial condition x and any stopping time τ,
   E_x[ Σ_{k=0}^{τ−1} f_k(Φ_k) ] ≤ Z_0(x) + E_x[ Σ_{k=0}^{τ−1} s_k(Φ_k) ].
Proof  By Dynkin's formula, for any fixed N ∈ Z+,
   0 ≤ E_x[Z_{τ^n}] ≤ Z_0(x) + E_x[ Σ_{i=1}^{τ^n} ( s_{i−1}(Φ_{i−1}) − [f_{i−1}(Φ_{i−1}) ∧ N] ) ]
Proposition 11.3.3. Suppose that there exist a sequence of positive functions {ε_k : k ≥ 0} on X, a constant c < ∞, and a set A ∈ B(X) such that
   E[Z_k | F_{k−1}^Φ] − Z_{k−1} ≤ −ε_{k−1}(Φ_{k−1}),  1 ≤ k ≤ σ_A.
Then
   E_x[ Σ_{i=0}^{τ_A−1} ε_i(Φ_i) ] ≤ Z_0(x),  x ∈ A^c;
   E_x[ Σ_{i=0}^{τ_A−1} ε_i(Φ_i) ] ≤ ε_0(x) + c P Z_0(x),  x ∈ X.
Proof Let Zk and εk denote the random variables Zk (Φ0 , . . . , Φk ) and εk (Φk )
respectively.
By hypothesis E[Z_k | F_{k−1}^Φ] − Z_{k−1} ≤ −ε_{k−1}(Φ_{k−1}) whenever 1 ≤ k ≤ σ_A. Hence for all n ∈ Z+ and x ∈ X we have by Dynkin's formula
   0 ≤ E_x[Z_{τ_A^n}] ≤ Z_0(x) − E_x[ Σ_{i=1}^{τ_A^n} ε_{i−1}(Φ_{i−1}) ],  x ∈ A^c.
By the Monotone Convergence Theorem it follows that for all initial conditions,
   E_x[ Σ_{i=1}^{τ_A} ε_{i−1}(Φ_{i−1}) ] ≤ Z_0(x),  x ∈ A^c.
for all x. Hence if C is petite and V is everywhere finite and bounded on C, then Φ is
positive Harris recurrent.
P GA = GA − I + IA UA .
Lemma 11.3.6. Any solution of (11.17) is finite ψ-almost everywhere or infinite ev-
erywhere.
P V (x) ≤ V (x) + b
for all x ∈ X, and it then follows that the set {x : V (x) < ∞} is absorbing. If this set
is non-empty then it is full by Proposition 4.2.3.
Lemma 11.3.7. If the set C is petite, then the function V_C(x) is unbounded off petite sets.
Proof  We have from Chebyshev's inequality that for each of the sublevel sets C_V(ℓ) := {x : V_C(x) ≤ ℓ},
   sup_{x ∈ C_V(ℓ)} P_x{σ_C ≥ n} ≤ ℓ/n.
Since the right hand side is less than 1/2 for sufficiently large n, this shows that C_V(ℓ) ⇝_a C for a sampling distribution a, and hence, by Proposition 5.5.4, the set C_V(ℓ) is petite.
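The Chebyshev (Markov) bound invoked in this proof is elementary, and a throwaway numerical check makes it vivid; the sketch below uses an arbitrary illustrative hitting-time law, not one from the text.

```python
import random

random.seed(5)
# An illustrative positive "hitting time" sample; any law with finite mean works.
sigma = [1 + int(random.expovariate(0.3)) for _ in range(100_000)]
mean = sum(sigma) / len(sigma)
for n in (10, 20, 40):
    tail = sum(s >= n for s in sigma) / len(sigma)
    print(n, tail, mean / n)   # empirical tail lies below the Markov bound
```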
Lemma 11.3.7 will typically be applied to show that a given petite set is regular.
The converse is always true, as the next result shows:
   sup_{x∈A} P_x{σ_C > n} ≤ (1/n) sup_{x∈A} E_x[τ_C].
   E_x[τ_B] ≤ V(x) + b E_x[ Σ_{k=0}^{τ_B−1} I_C(Φ_k) ].   (11.26)
But now we have that I(k < τB ) is measurable with respect to Fk and so by the
smoothing property of expectations this becomes
   Σ_{i=0}^∞ Σ_{k=0}^∞ a_i E_x[ E[ f(Φ_{k+i}) I{k < τ_B} | F_k ] ]
      = Σ_{i=0}^∞ Σ_{k=0}^∞ a_i E_x[ f(Φ_{k+i}) I{k < τ_B} ]
      = Σ_{i=0}^∞ a_i E_x[ Σ_{k=0}^{τ_B−1} f(Φ_{k+i}) ].
We now have a relatively simple task in proving
Proof  To prove (i), suppose that (V2) holds, with V bounded on A and C a ψ_a-petite set. Without loss of generality, from Proposition 5.5.6 we can assume Σ_{i=0}^∞ i a_i < ∞. We also use the simple but critical bound from the definition of petiteness:
   I_C(x) ≤ ψ_a(B)^{−1} K_a(x, B),  x ∈ X, B ∈ B^+(X).   (11.27)
By Lemma 11.3.9 and the bound (11.27) we then have
   E_x[τ_B] ≤ V(x) + b E_x[ Σ_{k=0}^{τ_B−1} I_C(Φ_k) ]
            ≤ V(x) + b E_x[ Σ_{k=0}^{τ_B−1} ψ_a(B)^{−1} K_a(Φ_k, B) ]
            = V(x) + b ψ_a(B)^{−1} Σ_{i=0}^∞ a_i E_x[ Σ_{k=0}^{τ_B−1} I_B(Φ_{k+i}) ]
            ≤ V(x) + b ψ_a(B)^{−1} Σ_{i=0}^∞ (i + 1) a_i
Regularity of measures
A probability measure µ is called regular if E_µ[τ_B] < ∞ for each B ∈ B^+(X).
The proof of the following result for regular measures µ is identical to that of the
previous theorem and we omit it.
Theorem 11.3.12. Suppose that Φ is ψ-irreducible.
(i) If (V2) holds for a petite set C and a function V , and if µ(V ) < ∞, then the
measure µ is regular.
(ii) If µ is regular, and if there exists one regular set C ∈ B+ (X), then there exists an
extended-valued function V satisfying (V2) with µ(V ) < ∞.
Proof Suppose that a regular set C ∈ B+ (X) exists. Since C is regular it is also
ψa -petite, and we can assume without loss of generality that the sampling distribution
a has a finite mean. By regularity of C we also have, by Theorem 11.3.11 (ii), that (V2) holds with V = V_C. From Theorem 11.3.11 each of the sets C_V(ℓ) is regular, and by Lemma 11.3.6 the set S_C = {y : V_C(y) < ∞} is full and absorbing.
Theorem 11.3.11 gives a characterization of regular sets in terms of a drift condition.
Theorem 11.3.14 now gives such a characterization in terms of the mean hitting times
to petite sets.
Theorem 11.3.14. If Φ is ψ-irreducible, then the following are equivalent:
(i) The set C ∈ B(X) is petite and supx∈C Ex [τC ] < ∞.
(ii) The set C is regular and C ∈ B+ (X).
Proof (i) Suppose that C is petite, and let as before VC (x) = 1 + Ex [σC ]. By
Theorem 11.3.5 and the conditions of the theorem we may find a constant b < ∞ such
that
P VC ≤ VC − 1 + bIC .
Since VC is bounded on C by construction, it follows from Theorem 11.3.11 that C is
regular. Since the set C is Harris recurrent it follows from Proposition 8.3.1 (ii) that
C ∈ B+ (X).
(ii) Suppose that C is regular. Since C ∈ B+ (X), it follows from regularity that
supx∈C Ex [τC ] < ∞, and that C is petite follows from Proposition 11.3.8.
We can now give the following complete characterization of the case X = S.
Theorem 11.3.15. Suppose that Φ is ψ-irreducible. Then the following are equivalent:
(i) The chain Φ is regular.
(ii) The drift condition (V2) holds for a petite set C and an everywhere finite function V.
(iii) There exists a petite set C such that E_x[τ_C] < ∞ for every x ∈ X.
Proof If (i) holds, then it follows that a regular set C ∈ B + (X) exists. The function
V = VC is everywhere finite and satisfies (V2), by (11.24), for a suitably large constant
b; so (ii) holds. Conversely, Theorem 11.3.11 (i) tells us that if (V2) holds for a petite
set C with V finite valued then each sublevel set of V is regular, and so (i) holds.
If the expectation is finite as described in (iii), then by (11.24) we see that the func-
tion V = VC satisfies (V2) for a suitably large constant b. Hence from Theorem 11.3.15
we see that the chain is regular; and the converse is trivial.
Proposition 11.4.1. If Φ is a random walk on a half line with finite mean increment β, then Φ is regular if
   β = ∫ w Γ(dw) < 0;
Proof By consideration of the proof of Proposition 8.5.1, we see that this result has
already been established, since (11.18) was exactly the condition verified for recurrence
in that case, whilst (11.19) is simply checked for the random walk.
From the results in Section 8.5, we know that the random walk on R+ is transient
if β > 0, and that (at least under a second moment condition) it is recurrent in the
marginal case β = 0. We shall show in Proposition 11.5.3 that it is not regular in this
marginal case.
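A quick simulation makes the dichotomy visible. The sketch below (illustrative Gaussian increments; no parameters are taken from the text) estimates the mean return time to {0} of the walk on [0, ∞): with β < 0 the sample mean settles down, while in the marginal case β = 0 the estimate is dominated by enormous excursions, consistent with the failure of regularity shown in Proposition 11.5.3.

```python
import random

def return_time(beta, x0=10.0, cap=100_000):
    """Steps for the walk Phi_{n+1} = max(Phi_n + W, 0), W ~ N(beta, 1),
    to first hit {0}, truncated at `cap` as a safety bound."""
    x, n = x0, 0
    while n < cap:
        x = max(x + random.gauss(beta, 1.0), 0.0)
        n += 1
        if x == 0.0:
            return n
    return cap

random.seed(0)
for beta in (-0.5, 0.0):
    times = [return_time(beta) for _ in range(500)]
    print(beta, sum(times) / len(times))
```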
Hence, as we already know, the chain is positive recurrent if Σ_y y p(y) < ∞. Since E_0[τ_0] = Σ_y y p(y), the drift condition with V(x) = x is also necessary, as we have seen.
The forward recurrence time chain thus provides a simple but clear example of the
need to include the second bound (11.19) in the criterion for positive recurrence.
Linear models
Consider the simple linear model defined in (SLM1) by
Xn = αXn −1 + Wn .
We have
Proposition 11.4.2. Suppose that the disturbance variable W for the simple linear
model defined in (SLM1), (SLM2) is non-singular with respect to Lebesgue measure,
and satisfies E[log(1 + |W |)] < ∞. Suppose also that |α| < 1. Then every compact set
is regular, and hence the chain itself is regular.
Proof From Proposition 6.3.5 we know that the chain X is a ψ-irreducible and
aperiodic T-chain under the given assumptions.
Let V (x) = log(1 + ε|x|), where ε > 0 will be fixed below. We will verify that (V2)
holds with this choice of V by applying the following two special properties of this test
function:
   V(x + y) ≤ V(x) + V(y),   (11.30)
and hence from (11.31) there exists r < ∞ such that whenever |X_0| ≥ r,
   V(X_1) ≤ V(X_0) − (1/2) log(|α|^{−1}) + V(W_1).
Taking expectations, and choosing ε small enough that E[V(W_1)] = E[log(1 + ε|W|)] ≤ (1/4) log(|α|^{−1}), we obtain for |x| ≥ r
   E_x[V(X_1)] ≤ V(x) − (1/4) log(|α|^{−1}).
So we have that (V2) holds with C = {x : |x| ≤ r} and the result follows.
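The choice of V can be checked numerically. The sketch below (illustrative values α = 0.5, ε = 1 and standard Gaussian noise) estimates ΔV(x) = E[V(αx + W)] − V(x) for V(x) = log(1 + ε|x|) and compares it with the bound −(1/4) log(|α|^{−1}) used above.

```python
import math
import random

ALPHA, EPS = 0.5, 1.0      # illustrative parameters with |alpha| < 1

def V(x):
    return math.log(1.0 + EPS * abs(x))

def drift(x, n=200_000):
    """Monte Carlo estimate of Delta V(x) = E[V(alpha*x + W)] - V(x)."""
    rng = random.Random(42)
    s = sum(V(ALPHA * x + rng.gauss(0.0, 1.0)) for _ in range(n))
    return s / n - V(x)

bound = -0.25 * math.log(1.0 / abs(ALPHA))
for x in (5.0, 50.0, 500.0):
    print(x, drift(x), bound)   # for large |x| the drift falls below the bound
```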
This is part of the recurrence result we proved using a stochastic comparison argu-
ment in Section 9.5.1, but in this case the direct proof enables us to avoid any restriction
on the range of the increment distribution.
We can extend this simple construction much further, and we shall do so in Chap-
ter 15 in particular, where we show that the geometric drift condition exhibited by the
linear model implies much more, including rates of convergence results, than we have
so far described.
   ρ_r := λ_r/µ < 1   (11.32)
is satisfied. This will be shown to imply positive Harris recurrence for the chain Φ.
Write [0] = (0, 0) for the state where the queue is empty. Under (11.32), for each x ∈ X, we may find m ∈ Z+ sufficiently large that P^m(x, {[0]}) > 0.
This follows because under the load constraint, there exists δ > 0 such that with positive
probability, each of the first m interarrival times exceeds each of the first m service times
by at least δ, and also none of the first m customers re-enter the queue.
Proposition 11.4.3. Suppose that the load constraint (11.32) is satisfied. Then the
Markov chain Φ is δ[0] -irreducible and aperiodic, and every compact subset of X is petite.
We let Wn denote the total amount of time that the server will spend servicing the
customers which are in the system at time Tn +. Let V (x) = Ex [W0 ]. It is easily seen
that
V (x) = E[Wn | Φn = x],
   E_x[W_k] ≤ E_x[W_0] − 1,  x ∈ A^c;
   sup_{x∈A} E_x[W_k] < ∞;   (11.34)
this implies that V (x) satisfies (V2) for the k-skeleton, and hence as in the proof of
Theorem 11.1.4 both the k-skeleton and the original chain are regular.
Proposition 11.4.4. Suppose that ρr < 1. Then (11.34) is satisfied for some compact
set A ⊂ X and some k ∈ Z+ , and hence Φ is a regular chain.
Am = {x ∈ X : |x| ≤ m}, m ∈ Z+ .
   W_k = W_0 + Σ_{i=1}^{k} Σ_{j=1}^{n_i} S(i, j) − ζ_k,   (11.35)
where ni denotes the number of times that the ith customer visits the system, and the
random variables S(i, j) are i.i.d. with mean µ−1 .
Now choose m so large that
Then by (11.35), and since λr /λ is equal to the expected number of times that a
customer will re-enter the queue,
   E_x[W_k] ≤ E_x[W_0] + Σ_{i=1}^{k} E_x[n_i](1/µ) − (E[T_k] − 1)
            = E_x[W_0] + (k λ_r/λ)(1/µ) − k/λ + 1
            = E_x[W_0] − (k/λ)(1 − ρ_r) + 1,
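The arithmetic behind this bound is worth displaying explicitly; the following fragment (with illustrative rates, not data from the text) evaluates it and confirms that the drift is linear in k with slope −(1 − ρ_r)/λ, which is what delivers (11.34) for a k-skeleton with k large.

```python
# Illustrative rates: lam = external arrival rate, lam_r = total arrival rate
# including re-entries, mu = service rate, so rho_r = lam_r / mu as in (11.32).
lam, lam_r, mu = 1.0, 1.5, 2.0
rho_r = lam_r / mu
assert rho_r < 1                       # the load condition (11.32)

def workload_bound(k, EW0):
    # The bound E_x[W_k] <= E_x[W_0] - (k/lam)(1 - rho_r) + 1 derived above.
    return EW0 - (k / lam) * (1 - rho_r) + 1

for k in (1, 10, 40):
    print(k, workload_bound(k, EW0=10.0))   # decreases at rate 0.25 per step
```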
Proof To prove regularity for this interior set, we use (V2), and show that when
(11.36)–(11.40) hold there is a function V and an interval set [−R, R] satisfying the
drift condition
   ∫ P(x, dy) V(y) ≤ V(x) − 1,  |x| > R.   (11.41)
First consider the condition (11.36). When this holds it is straightforward to calculate
that there must exist positive constants a, b such that
for all |x| sufficiently large. The sufficiency of (11.38) follows by symmetry, or directly
by choosing the test function
   V(x) = γ|x| for x ≤ 0,   V(x) = −2[φ(M)]^{−1} x for x > 0,
with
   γ > −2 |θ(1)| [φ(M)]^{−1}.
In the case (11.39), the chain is driven by the constant terms and we use the test
function "
2 [φ(1)]−1 |x| x≤0
V (x) = −1
2 [|φ(M )|] x x > 0
to give the result.
The region defined by (11.40) is the hardest to analyze. It involves the way in which
successive movements of the chain take place, and we reach the result by considering
the two-step transition matrix P 2 .
Let f_j denote the density of the noise variable W(j). Fix j and x ∈ R_j and write
   ∫ P^2(x, dy) V(y) ≤ ax + (a/2)(φ(1) + θ(1)φ(M)),  x ≥ R.
But now by assumption φ(M ) + θ(M )φ(1) > 0, and the complete set of conditions
(11.40) also give φ(1) + θ(1)φ(M ) < 0. By suitable choice of a, b we have that the drift
condition (11.41) holds for the two-step chain, and hence this chain is regular. Clearly,
this implies that the one-step chain is also regular, and we are done.
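Sample paths illustrate the classification. The sketch below simulates a two-regime SETAR chain with threshold at zero and standard Gaussian noise (illustrative parameters only): deep inside the positivity region the time-average of |X_n| stabilises, while on the margin θ(M) = 1, φ(M) = 0 the averages are far larger, reflecting recurrence without positivity.

```python
import random

def setar_path_mean(phi1, th1, phiM, thM, n=200_000, seed=7):
    """Time-average of |X_k| for a two-regime SETAR chain, threshold at 0."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        x = (phi1 + th1 * x + w) if x <= 0 else (phiM + thM * x + w)
        total += abs(x)
    return total / n

print(setar_path_mean(0.0, 0.5, 0.0, 0.5))   # interior: small, stable average
print(setar_path_mean(0.0, 0.5, 0.0, 1.0))   # margin theta(M)=1, phi(M)=0
```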
   ΔV(x) ≥ 0,  x ∈ C^c;   (11.42)
and
   sup_{x∈X} ∫ P(x, dy) |V(x) − V(y)| < ∞.   (11.43)
Then for any x_0 ∈ C^c satisfying
   V(x_0) > sup_{y∈C} V(y)   (11.44)
we have E_{x_0}[τ_C] = ∞.
Proof The proof uses a technique similar to that used to prove Dynkin’s formula.
Suppose by way of contradiction that Ex 0 [τC ] < ∞, and let Vk = V (Φk ). Then we have
   V_{τ_C} = V_0 + Σ_{k=1}^{τ_C} (V_k − V_{k−1})
           = V_0 + Σ_{k=1}^{∞} (V_k − V_{k−1}) I{τ_C ≥ k}.
Taking expectations, and using (11.43) to justify the interchange of expectation and summation, (11.42) shows that E_{x_0}[V_{τ_C}] ≥ V(x_0). But by (11.44), V_{τ_C} < V(x_0) = V_0 with probability one, and this contradiction shows that E_{x_0}[τ_C] = ∞.
This gives a criterion for a ψ-irreducible chain to be non-positive. Based on Theo-
rem 11.1.4 we have immediately
Theorem 11.5.2. Suppose that the chain Φ is ψ-irreducible and that the non-negative function V satisfies (11.42) and (11.43) where C ∈ B+(X). If the set of points satisfying (11.44) lies in B^+(X), then the chain Φ is not positive recurrent.
In practice, one would set C equal to a sublevel set of the function V so that the
condition (11.44) is satisfied automatically for all x ∈ C c .
It is not the case that this result holds without some auxiliary conditions such as (11.43). For example, take the state space to be Z+, and define P(0, i) = 2^{−i} for all i > 0; if we now choose k(i) > 2i and let the transition law from each i > 0 place its mass on {0, k(i)} in such a way that
   P_0(τ_0 ≥ n + 1) ≤ 2^{−n},
then the chain is positive recurrent even though ΔV(i) ≥ 0 for V(x) = x, and in fact we can choose k(i) to give any value of ΔV(i) we wish.
Proposition 11.5.3. If Φ is a random walk on a half line with mean increment β, then Φ is regular if and only if
   β = ∫ w Γ(dw) < 0.
Proof In Proposition 11.4.1 the sufficiency of the negative drift condition was
established. If
   β = ∫ w Γ(dw) ≥ 0,
then using V (x) = x we have (11.42), and the random walk homogeneity properties
ensure that the uniform drift condition (11.43) also holds, giving non-positivity.
We now give a much more detailed and intricate use of this result to show that the
scalar SETAR model is recurrent but not positive on the “margins” of its parameter
set, between the regions shown to be positive in Section 11.4.3 and those regions shown
to be transient in Section 9.5.2: see Figure B.1–Figure B.3 for the interpretation of the
parameter ranges. In terms of the basic SETAR model defined by
   X_n = φ(j) + θ(j) X_{n−1} + W_n(j),  X_{n−1} ∈ R_j,
we use a test function of the form V(x) = log(u + ax) for x > R, V(x) = log(v − bx) for x < −R, and V(x) = 0 in the region [−R, R], where a, b and R are positive constants and u and v are real numbers to be chosen suitably for the different regions (11.45)–(11.49).
We denote the non-random part of the motion of the chain in the two end regions
by
k(x) = φ(M ) + θ(M )x
and
h(x) = φ(1) + θ(1)x.
We first prove recurrence when (11.45) or (11.46) holds. The proof is similar in style
to that used for random walk in Section 9.5, but we need to ensure that the different
behavior in each end of the two end intervals can be handled simultaneously.
Consider first the parameter region θ(M ) = 1, φ(M ) = 0, and 0 ≤ θ(1) < 1, and
choose a = b = u = v = 1, with x > R > rM −1 . Write in this case
so that
Ex [V (X1 )] = V1 (x) + V2 (x).
In order to bound the terms in the expansion of the logarithms in V1 , V2 , we use the
further notation
For x < −R < r1 and θ(1) = 0, Ex [V (X1 )] is a constant and is therefore less than V (x)
for large enough R.
For x < −R < r1 and 0 < θ(1) < 1, consider
we have as before
Ex [V (X1 )] = V6 (x) + V7 (x). (11.55)
To handle the expansion of terms in this case we use
Hence choosing R large enough that v − bh(x) ≤ v − bx, we have from (11.55),
By Lemma 9.4.4(ii),
and thus
Finally consider the region θ(M ) = 1, φ(M ) = 0, θ(1) < 0, and choose a = −bθ(M )
and v − u = aφ(1). For x > R > rM −1 , (11.53) is obtained in a manner similar to the
above. For x < −R < r1 , we look at
By Lemma 9.4.3
and
V7 (x) ≤ Γ1 (−∞, −R − h(x))(log(−v + bh(x)) − 2) − V9 (x).
From the choice of a, b, u and v,
and thus by Lemma 8.5.3 and Lemma 9.4.4(i) for R large enough
When (11.46) holds, the recurrence of the SETAR model follows by symmetry from the
result in the region (11.45).
(ii) We now consider the region where (11.47) holds: in (11.48) the result will
again follow by symmetry.
Choose a = b = u = v = 1 in the definition of V . For x > R > rM −1 , (11.53) holds
as before. For x < −R < r1 , since 1 − h(x) ≤ 1 − x,
   Γ_M(−∞, −R − k(x)) log(v − bk(x)) = log(u + ax) − Γ_M(−R − k(x), ∞) log(u + ax),
   E_x[V(X_1)] ≤ V(x) − (b²/(2(v − bk(x))²)) E[W²(M) I{W(M) > 0}] + o(x^{−2})
             ≤ V(x),  x > R.   (11.58)
and
   V_{cd}(x) = ax + c for x > 0,   V_{cd}(x) = b|x| + d for x ≤ 0.
It is immediate that
   ∫ P(x, dy) |V(x) − V(y)| ≤ a E[|W(1)|] + b E[|W(M)|] + 2(a|θ(1)| + b|θ(M)|) + 2|d − c|,
11.6 Commentary
For countable space chains, the results of this chapter have been thoroughly explored.
The equivalence of positive recurrence and the finiteness of expected return times to
each atom is a consequence of Kac’s Theorem, and as we saw in Proposition 11.1.1, it
is then simple to deduce the regularity of all states. As usual, Feller [114] or Chung [71]
or Çinlar [59] provide excellent discussions.
Indeed, so straightforward is this in the countable case that the name “regular
chain”, or any equivalent term, does not exist as far as we are aware. The real focus
on regularity and similar properties of hitting times dates to Isaac [169] and Cogburn
[75]; the latter calls regular sets “strongly uniform”. Although many of the properties
of regular sets are derived by these authors, proving the actual existence of regular sets
for general chains is a surprisingly difficult task. It was not until the development of
the Nummelin–Athreya–Ney theory of splitting and embedded regeneration occurred
that the general result of Theorem 11.1.4, that positive recurrent chains are “almost”
regular chains was shown (see Nummelin [302]).
Chapter 5 of Nummelin [303] contains many of the equivalences between regularity
and positivity, and our development owes a lot to his approach. The more general
f -regularity condition on which he focuses is central to our Chapter 14: it seems worth
considering the probabilistic version here first.
For countable chains, the equivalence of (V2) and positive recurrence was developed
by Foster [129], although his proof of sufficiency is far less illuminating than the one we
have here. The earliest results of this type on a non-countable space appear to be those
in Lamperti [235], and the results for general ψ-irreducible chains were developed by
Tweedie [397, 398]. The use of drift criteria for continuous space chains, and the use of
Dynkin’s formula in discrete time, seem to appear for the first time in Kalashnikov [187,
189, 190]. The version used here and later was developed in Meyn and Tweedie [277],
although it is well known in continuous time for more special models such as diffusions
(see Kushner [232] or Khas’minskii [206]).
There are many rediscoveries of mean drift theorems in the literature. For operations
research models (V2) is often known as Pakes’ Lemma from [313]: interestingly, Pakes’
result rediscovers the original form buried in the discussion of Kendall’s famous queueing
paper [200], where Foster showed that a sufficient condition for positivity of a chain on
Z+ is the existence of a solution to the pair of equations
P (x, y)V (y) ≤ V (x) − 1, x≥N
P (x, y)V (y) < ∞, x < N,
although in [129] he only gives the result for N = 1. The general N form was also re-
discovered by Moustafa [289], and a form for reducible chains given by Mauldon [251].
An interesting state-dependent variation is given by Malyšev and Men'šikov [243]; we
return to this and give a proof based on Dynkin’s formula in Chapter 19.
The systematic exploitation of the various equivalences between hitting times and
mean drifts, together with the representation of π, is new in the way it appears here.
In particular, although it is implicit in the work of Tweedie [398] that one can identify
sublevel sets of test functions as regular, the current statements are much more com-
prehensive than those previously available, and generalize easily to give an appealing
approach to f -regularity in Chapter 14.
The criteria given here for chains to be non-positive have a shorter history. The
fact that drift away from a petite set implies non-positivity provided the increments are
bounded in mean appears first in Tweedie [398], with a different and less transparent
proof, although a restricted form is in Doob ([99], p. 308), and a version similar to that we give here has recently been given by Fayolle et al. [110]. All proofs we know
require bounded mean increments, although there appears to be no reason why weaker
constraints may not be as effective.
Related results on the drift condition can be found in Marlin [249], Tweedie [396],
Rosberg [336] and Szpankowski [380], and no doubt in many other places: we return to
these in Chapter 19.
Applications of the drift conditions are widespread. The first time series application
appears to be by Jones [182], and many more have followed. Laslett et al. [237] give an
overview of the application of the conditions to operations research chains on the real
line. The construction of a test function for the GI/G/1 queue given in Section 11.4.2
is taken from Meyn and Down [273] where this forms a first step in a stability analysis
of generalized Jackson networks. A test function approach is also used in Sigman [354]
and Fayolle et al. [110] to obtain stability for queueing networks: the interested reader
should also note that in Borovkov [43] the stability question is addressed using other
means.
The SETAR analysis we present here is based on a series of papers where the SETAR
model is analyzed in increasing detail. The positive recurrence and transience results
are essentially in Petruccelli et al. [315] and Chan et al. [64], and the non-positivity
analysis as we give it here is taken from Guo and Petruccelli [149]. The assumption of
finite variances in (SETAR3) is again almost certainly redundant, but an exact condition
is not obvious.
We have been rather more restricted than we could have been in discussing specific
models at this point, since many of the most interesting examples, both in operations
research and in state space and time series models, actually satisfy a stronger version
of the drift condition (V2): we discuss these in detail in Chapter 15 and Chapter 16.
However, it is not too strong a statement that Foster’s criterion (as (V2) is often known)
has been adopted as the tool of choice to classify chains as positive recurrent: for a
number of applications of interest we refer the reader to the recent books by Tong
[388] on nonlinear models and Asmussen [9] on applied probability models. Variations
for two-dimensional chains on the positive quadrant are also widespread: the first of
these seems to be due to Kingman [207], and ongoing usage is typified by, for example,
Fayolle [109].
Chapter 12
Invariance and tightness
Proof We prove (i) in Theorem 12.1.2, together with a number of consequents for
weak Feller chains. The proof of (ii) essentially occupies Section 12.4, and is concluded
in Theorem 12.4.1.
We will see that for Feller chains, and even more powerfully for e-chains, this ap-
proach based upon tightness and weak convergence of probability measures provides
a quite different method for constructing an invariant probability measure. This is
exemplified by the linear model construction which we have seen in Section 10.5.4.
From such constructions we will show in Section 12.4 that (V2) implies a form of
positivity for a Feller chain. In particular, for e-chains, if (V2) holds for a compact
set C and an everywhere finite function V then the chain is bounded in probability on
average, so that there is a collection of invariant measures as in Theorem 12.0.1 (ii).
In this chapter we also develop a class of kernels, introduced by Neveu in [295],
which extend the definition of the kernels UA . This involves extending the definition of
a stopping time to randomized stopping times. These operators have very considerable
intuitive appeal and demonstrate one way in which the results of Section 10.4 can be
applied to non-irreducible chains.
Using this approach, we will also show that (V1) gives a criterion for the existence
of a σ-finite invariant measure for a Feller chain.
if Φ is evanescent, then for some x there is an ε > 0 such that for every compact C,
   lim sup_{n→∞} P_x{ ⋃_{j=n}^∞ {Φ_j ∈ C} } ≤ 1 − ε
(i) If an invariant probability does not exist, then for any compact set C ⊂ X,
   P^n(x, C) → 0 as n → ∞,   (12.4)
   K_{a_ε}(x, C) → 0 as ε ↑ 1,   (12.5)
uniformly in x ∈ X.
Proof We prove only (12.4), since the proof of (12.5) is essentially identical. The
proof is by contradiction: we assume that no invariant probability exists, and that
(12.4) does not hold.
Fix f ∈ C_c(X) such that f ≥ 0, and fix δ > 0. Define the open sets {A_k : k ∈ Z+} by
   A_k = {x ∈ X : P^k f > δ}.
If (12.4) does not hold then for some such f there exists δ > 0 and a subsequence {N_i : i ∈ Z+} of Z+ with A_{N_i} ≠ ∅ for all i. Let x_i ∈ A_{N_i} for each i, and define
   λ_i := P^{N_i}(x_i, · ).
We see from Proposition D.5.6 that the set of sub-probabilities is sequentially compact with respect to vague convergence. Let λ_∞ be any vague limit point, so that λ_{n_i} →v λ_∞ along a further subsequence.
By regularity of finite measures on B(X) (cf Theorem D.3.2) this implies that λ∞ ≥
λ∞ P , which is only possible if λ∞ = λ∞ P . Since we have assumed that no invariant
probability exists it follows that λ∞ = 0, which contradicts (12.6). Thus we have that
Ak = ∅ for sufficiently large k.
To prove (ii), let Φ be bounded in probability on average. Since by definition we can then find ε > 0, x ∈ X and a compact set C such that P^j(x, C) > 1 − ε for all sufficiently large j, (12.4) fails and so the chain admits an invariant probability.
The following corollary easily follows: notice that the condition (12.8) is weaker
than the obvious condition of Lemma D.5.3 for boundedness in probability on average.
Proposition 12.1.3. Suppose that the Markov chain Φ has the Feller property, and
that a coercive function V exists such that for some initial condition x ∈ X the condition (12.8) holds; then an invariant probability measure exists for Φ.
These results require minimal assumptions on the chain. They do have two draw-
backs in practice.
Firstly, there is no guarantee that the invariant probability is unique. Currently,
known conditions for uniqueness involve the assumption that the chain is ψ-irreducible.
This immediately puts us in the domain of Chapter 10, and if the measure ψ has an
open set in its support, then in fact we have the full T-chain structure immediately
available, and so we would avoid the weak convergence route.
Secondly, and essentially as a consequence of the lack of uniqueness of the invariant measure π, we do not in general have guaranteed that
   P^n(x, · ) →w π.
Proposition 12.1.4. Suppose that the Markov chain Φ has the Feller property, and is
bounded in probability on average.
If the invariant measure π is unique then for every x
   P^n(x, · ) →w π.   (12.9)
Proof Since for every subsequence {nk } the set of probabilities {P n k (x, · )} is
sequentially compact in the weak topology, then as in the proof of Theorem 12.1.2,
from boundedness in probability we have that there is a further subsequence converging
weakly to a non-trivial limit which is invariant for P . Since all these limits coincide by
the uniqueness assumption on π we must have (12.9).
Recall that in Proposition 6.4.2 we came to a similar conclusion. In that result,
convergence of the distributions to a unique invariant probability, in a manner similar
to (12.9), is given as a condition under which a Feller chain Φ is an e-chain.
   = h(Φ_k),
where in the second equality we used the fact that the event {τ_h ≥ k} is measurable with respect to σ(Φ, Y_1, . . . , Y_{k−1}), and in the final equality we used the independence of Y and Φ.
Now define the kernel U_h on X × B(X) by
   U_h(x, B) = E_x[ Σ_{k=1}^{τ_h} I_B(Φ_k) ].   (12.13)
where I1−h denotes the kernel which gives multiplication by 1 − h. This final expression
for Uh defines this kernel independently of the bivariate chain.
In the special cases h ≡ 0, h = IB , and h ≡ 1 we have, respectively,
Uh = U, Uh = UB , Uh = P.
When h ≡ 1/2, so that τ_h is completely independent of Φ, we have
   U_h = Σ_{k=1}^∞ (1/2)(1/2)^{k−1} P^k = K_{a_{1/2}}.
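The randomized stopping time is easy to simulate directly from its definition. In the sketch below (an illustrative stable AR(1) chain; none of the choices come from the text) we draw τ_h for the constant function h ≡ 1/2 and recover its geometric law with mean 2, in line with the identification of U_h with K_{a_{1/2}} above.

```python
import random

def sample_tau_h(x0, step, h, rng):
    """tau_h = min{k >= 1 : Y_k <= h(Phi_k)}, with Y_k i.i.d. uniform(0,1)."""
    x, k = x0, 0
    while True:
        x = step(x, rng)
        k += 1
        if rng.random() <= h(x):
            return k

rng = random.Random(3)
step = lambda x, r: 0.8 * x + r.gauss(0.0, 1.0)   # illustrative AR(1) chain
h = lambda x: 0.5                                  # h identically one half

draws = [sample_tau_h(0.0, step, h, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))   # close to 2 = mean of a geometric(1/2) time
```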
For general functions h, the expression (12.14) defining Uh involves only the transition
function P for Φ and hence allows us to drop the bivariate chain if we are only interested
in properties of the kernel Uh . However the existence of the bivariate chain and the
construction of τh allows a transparent proof of the following resolvent equation.
Theorem 12.2.1 (Resolvent equation). Let h ≤ 1 and g ≤ 1 be two functions on X
with h ≥ g. Then the resolvent equation holds:
   U_g = U_h + U_h I_{h−g} U_g = U_h + U_g I_{h−g} U_h.
Proof To prove the theorem we will consider the bivariate chain Ψ. We will see
that the resolvent equation formalizes several relationships between the stopping times
τ_g and τ_h for Ψ. Note that since h ≥ g, we have the inclusion A_g ⊆ A_h and hence τ_g ≥ τ_h.
To prove the first resolvent equation we write
   Σ_{k=1}^{τ_g} f(Φ_k) = Σ_{k=1}^{τ_h} f(Φ_k) + I{τ_g > τ_h} Σ_{k=τ_h+1}^{τ_g} f(Φ_k),
which, after conditioning at the successive renewal opportunities, gives rise to terms of the form
   Σ_k [h(Φ_k) − g(Φ_k)] U_g(Φ_k, f) Π_{i=1}^{k−1} [1 − h(Φ_i)].
   U_g(x, f) = U_h(x, f) + E_x[ Σ_{k=1}^{τ_g} I{g(Φ_k) < Y_k ≤ h(Φ_k)} θ^k Σ_{i=1}^{τ_h} f(Φ_i) ].   (12.16)
The expectation can be transformed, using the Markov property for the bivariate chain, to give
   E_x[ Σ_{k=1}^{τ_g} I{g(Φ_k) < Y_k ≤ h(Φ_k)} θ^k Σ_{i=1}^{τ_h} f(Φ_i) ]
      = E_x[ Σ_{k=1}^∞ I{g(Φ_k) < Y_k ≤ h(Φ_k)} I{τ_g ≥ k} E_{Ψ_k}[ Σ_{i=1}^{τ_h} f(Φ_i) ] ]
      = E_x[ Σ_{k=1}^∞ [h(Φ_k) − g(Φ_k)] I{τ_g ≥ k} U_h(Φ_k, f) ]
      = (U_g I_{h−g} U_h)(x, f).
   P_h(x, A) = U_h I_h(x, A)
is a Markov transition function. This follows from (12.11), which shows that
   P_h(x, X) = U_h(x, h) = E_x[ Σ_{k=1}^∞ Π_{i=1}^{k−1} (1 − h(Φ_i)) h(Φ_k) ]
             = Σ_{k=1}^∞ P_x{τ_h = k}   (12.17)
(ii) Px {τh < ∞} = L(x, h), and hence L(x, h) ≥ Q(x, h);
(iii) if for some ε < 1 the function h satisfies h(x) ≤ ε for all x ∈ X, then L(x, h) = 1
if and only if Q(x, h) = 1.
   P_x{Ψ_k ∈ A_h i.o. | F_∞^Φ} = P_x{Y_k ≤ h(Φ_k) i.o. | F_∞^Φ}.
Conditioned on F_∞^Φ, the events {Y_k ≤ h(Φ_k)}, k ≥ 1, are mutually independent. Hence by the Borel–Cantelli Lemma,
   P_x{Ψ_k ∈ A_h i.o. | F_∞^Φ} = I{ Σ_{k=1}^∞ P_x{Y_k ≤ h(Φ_k) | F_∞^Φ} = ∞ }.
We now present an application of Theorem 12.2.2 which gives another representation
for an invariant measure, extending the development of Section 10.4.2.
(i) If µ is any σ-finite subinvariant measure, then µ is invariant and has the representation
   µ(A) = ∫ µ(dx) h(x) U_h(x, A).
(ii) The measure µ is finite on each of the sets
   C_ε = {x : K_{a_{1/2}}(x, h) > ε}.
Proof We prove (i) by considering the bivariate chain Ψ. The set Ah ⊂ Y is Harris
recurrent and in fact Px {Ψ ∈ Ah i.o.} = 1 for all x ∈ X by Theorem 12.2.2. Now define
the measure µ∗ on Y by
   µ∗(A × B) = E_ν[ Σ_{k=1}^{τ_h} I{Ψ_k ∈ A × B} ];
this measure is invariant for Ψ, and since the distribution of Φ is the marginal distribution of Ψ, the measure µ defined by µ(A) := µ∗(A × [0, 1]), A ∈ B(X), is invariant for Φ.
We now demonstrate that µ is σ-finite. From the assumptions of the theorem and Theorem 12.2.2 (ii) the sets C_ε cover X, and from the representation of µ we have, for all ε, the bound µ(C_ε) ≤ µ(h)/ε < ∞, which completes the proof of (ii).
Observe that by Proposition 9.1.1, (12.19) implies that Φ visits C infinitely often from
each initial condition, and hence Φ is at least non-evanescent.
To construct an invariant measure we essentially consider the chain ΦC obtained
by sampling Φ at consecutive visits to the compact set C. Suppose that the resulting
sampled chain on C had the Feller property. In this case, since the sampled chain
evolves on the compact set C, we could deduce from Theorem 12.1.2 that an invariant
probability existed for the sampled chain, and we would then need only a few further
steps for an existence proof for the original chain Φ.
However, the transition function PC for the sampled chain is given by
   P_C = Σ_{k=0}^∞ (P I_{C^c})^k P I_C = U_C I_C,
which does not have the Feller property in general. To proceed, we must “smooth
around the edges of the compact set C”. The kernels Ph introduced in the previous
section allow us to do just that.
Let N and O be open subsets of X with compact closure for which C ⊂ O ⊂ Ō ⊂ N, where C satisfies (12.19), and let h : X → R be a continuous function such as
   h(x) = d(x, N^c) / ( d(x, N^c) + d(x, Ō) )
for which
   I_O(x) ≤ h(x) ≤ I_N(x).   (12.20)
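On the real line this construction can be written out in a few lines. The sketch below (interval sets Ō = [1, 2] and N = (0, 3), chosen only for illustration) evaluates h and confirms the sandwich (12.20).

```python
def h(x, O=(1.0, 2.0), N=(0.0, 3.0)):
    """h(x) = d(x, N^c) / (d(x, N^c) + d(x, cl O)) for intervals on R."""
    d_Nc = min(x - N[0], N[1] - x) if N[0] < x < N[1] else 0.0
    d_O = max(O[0] - x, 0.0, x - O[1])          # distance to cl O = [1, 2]
    return d_Nc / (d_Nc + d_O) if d_Nc + d_O > 0 else 0.0

for x in (-1.0, 0.5, 1.5, 2.5, 3.5):
    print(x, h(x))   # h = 1 on cl O, h = 0 off N, and 0 <= h <= 1 throughout
```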
The kernel Ph := Uh Ih is a Markov transition function since by (12.19) we have that
Q(x, h) ≡ 1. Since Ph (x, N̄ ) = 1 for all x ∈ X, we will immediately have an invariant
measure for Ph by Theorem 12.1.2 if Ph has the weak Feller property.
Proposition 12.3.1. Suppose that the transition function P is weak Feller. If 0 ≤ h ≤
1 is continuous and if Q(x, h) ≡ 1, then Ph is also weak Feller.
Proof  By the Feller property, the kernel (P I_{1−h})^n P I_h preserves positive lower semicontinuous functions. Hence if f is positive and lower semicontinuous, then
   P_h f = Σ_{n=0}^∞ (P I_{1−h})^n P I_h f
is an increasing limit of positive lower semicontinuous functions, and is hence itself lower semicontinuous.
Proof  From Theorem 12.1.2 an invariant probability ν exists which is invariant for P_h = U_h I_h. Hence from Theorem 12.2.3, the measure µ = ν U_h is invariant for Φ and is finite on the sets {x : K_{a_{1/2}}(x, h) > ε}. Since K_{a_{1/2}}(x, h) is a continuous function of x, and is strictly positive everywhere by (12.19), it follows that µ is finite on compact sets.
Proof  If L(x, C) = 1 for all x ∈ X, then the proof follows from Theorem 12.3.2. Consider now the only other possibility, where L(x, C) < 1 for some x. In this case the adapted process {V(Φ_k) I{τ_C > k}, F_k^Φ} is a convergent supermartingale, as in the proof of Theorem 9.4.1, and since by assumption P_x{τ_C = ∞} > 0, this shows that
   P_x{ lim sup_{k→∞} V(Φ_k) < ∞ } ≥ 1 − L(x, C) > 0.
By Theorem 12.1.2, it follows that an invariant probability exists, and this completes
the proof.
Finally we prove that in the weak Feller case, the drift condition (V2) again provides
a criterion for the existence of an invariant probability measure.
Theorem 12.3.4. Suppose that the chain Φ is weak Feller. If (V2) is satisfied with a
compact set C and a positive function V which is finite at one x0 ∈ X, then an invariant
probability measure π exists.
   1 ≤ (1/n) V(x_0) + b (1/n) Σ_{k=0}^{n} P^k(x_0, C),
so that
   lim inf_{n→∞} (1/n) Σ_{k=0}^{n} P^k(x_0, C) ≥ 1/b.   (12.21)
In this case, it follows as in Proposition 6.4.2 from Ascoli’s Theorem D.4.2 that {P k f :
k ∈ Z+ } is equicontinuous on compact subsets of X whenever f ∈ C(X), and so it is
necessary that the chain Φ be an e-chain, in the sense of Section 6.4, whenever we have
convergence in the sense of (12.22).
The key to analyzing e-chains lies in the following result:
Theorem 12.4.1. Suppose that Φ is an e-chain. Then
(i) There exists a substochastic kernel Π such that
   P^k(x, · ) →v Π(x, · ) as k → ∞,   (12.23)
   K_{a_ε}(x, · ) →v Π(x, · ) as ε ↑ 1,   (12.24)
for all x ∈ X.
(ii) For each j, ℓ ∈ Z+ and k ≥ 1 we have
   P^j Π^k P^ℓ = Π,   (12.25)
and hence for all x ∈ X the measure Π(x, · ) is invariant with Π(x, X) ≤ 1.
(iii) The Markov chain is bounded in probability on average if and only if Π(x, X) = 1 for all x ∈ X.
Proof  We prove the result (12.23), the proof of (12.24) being similar. Let {f_n} ⊂ C_c(X) denote a fixed dense subset. By Ascoli's theorem and a diagonal subsequence argument, there exists a subsequence {k_i} of Z+ and functions {g_n} ⊂ C(X) such that P^{k_i} f_n → g_n as i → ∞, uniformly for x in compact subsets of X for each n ∈ Z+. The set of all subprobabilities on B(X) is sequentially compact with respect to vague convergence, and any vague limit ν of the probabilities P^{k_i}(x, · ) must satisfy ∫ f_n dν = g_n(x) for all n ∈ Z+. Since the functions {f_n} are dense in C_c(X), this shows that for each x there is exactly one vague limit point, and hence a kernel Π exists for which
   P^{k_i}(x, · ) →v Π(x, · ) as i → ∞
for each x ∈ X.
Observe that by equicontinuity, the function Πf is continuous for every function
f ∈ Cc (X). It follows that Πf is positive and lower semicontinuous whenever f has
these properties.
By the Dominated Convergence Theorem we have for all j ∈ Z+ and k ≥ 1,
   P^j Π^k = Π,
and similarly
   Π^k P^j = Π,  j ∈ Z+, k ≥ 1.
Let f ∈ C_c(X) be a continuous positive function with compact support, and let Π′ denote the kernel obtained from a second vaguely convergent subsequence, P^{m_j}(x, · ) →v Π′(x, · ). Then, since the function P f is also positive and continuous, (D.6) implies that
   Π f = lim_{j→∞} Π P^{m_j} f
       = Π Π′ f     by the Dominated Convergence Theorem
       ≤ lim inf_{i→∞} P^{k_i} Π′ f     since Π′ f is continuous and positive
       = Π′ f.
Hence by symmetry, Π′ = Π, and this completes the proof of (i) and (ii).
The result (iii) follows from (i) and Proposition D.5.6.
Proof For the first entrance time τC to the compact set C, let θτ C denote the
τC -fold shift on sample space, defined so that θτ C f (Φk ) = f (Φk +τ C ) for any function f
on X.
Fix x ∈ C, 0 < ε < 1, and observe that by conditioning at time τC and using the
strong Markov property we have for x ∈ C,
   K_{a_ε}(x, C) = (1 − ε) E_x[ Σ_{k=0}^∞ ε^k I{Φ_k ∈ C} ]
               = (1 − ε) E_x[ 1 + Σ_{k=0}^∞ ε^{τ_C + k} θ^{τ_C} I{Φ_k ∈ C} ]
               = (1 − ε) + (1 − ε) E_x[ ε^{τ_C} E_{Φ_{τ_C}}[ Σ_{k=0}^∞ ε^k I{Φ_k ∈ C} ] ]
               ≥ (1 − ε) + E_x[ε^{τ_C}] inf_{y∈C} K_{a_ε}(y, C).
Px {τC < ∞} = 1, x ∈ X,
Proof (i) If Π(x, X) > 0 for some x ∈ X, then an invariant probability π exists.
In fact, we may take π = Π(x, · )/Π(x, X).
From the definition of Π and the Dominated Convergence Theorem we have that
for any f ∈ Cc (X),
   π(f) = lim_{n→∞} π P^n(f) = π Π(f),
which shows that π = πΠ. Hence 1 = π(X) = ∫ π(dx) Π(x, X). This shows that Π(y, X) = 1 for a.e. y ∈ X [π], proving (i) of the theorem.
(ii) Let ρ = inf_{x∈X} Π(x, X), and let
   S_ρ = {x ∈ X : Π(x, X) = ρ}.
Hence from Theorem 12.4.3 (ii) we have Π(x, X) = 1 for all x ∈ X. Theorem 12.4.1
then implies that the chain is bounded in probability on average.
The next result shows that the drift criterion for positive recurrence for ψ-irreducible
chains also has an impact on the class of e-chains.
Theorem 12.4.5. Let Φ be an e-chain, and suppose that condition (V2) holds for
a compact set C and an everywhere finite function V . Then the Markov chain Φ is
bounded in probability on average.
Proof It follows from Theorem 11.3.4 that Ex [τC ] ≤ V (x) for x ∈ C c , so that a
fortiori we also have L(x, C) ≡ 1. As in the proof of Theorem 12.3.4, for any x ∈ X,
   Π(x, X) ≥ lim sup_{n→∞} (1/n) Σ_{k=0}^{n} P^k(x, C) ≥ 1/b,  x ∈ X.
From this it follows from Theorem 12.4.3 (iii) and (ii) that Π(x, X) ≡ 1, and hence Φ
is bounded in probability on average as claimed.
Since Wk +1 and Xk are independent, this together with (12.31) implies that
   E[V(X_{k+1}) | X_0, . . . , X_k] ≤ α V(X_k) + E[|G(W_{k+1} − E[W_{k+1}])|_M^2],   (12.34)
and taking expectations of both sides gives
   lim sup_{k→∞} E[V(X_k)] ≤ E[|G(W_{k+1} − E[W_{k+1}])|_M^2] / (1 − α) < ∞.
Since V is a coercive function on X, Lemma D.5.3 gives a direct proof that the chain is
bounded in probability.
We note that (12.34) also ensures immediately that (V2) is satisfied. Under the extra
conditions (LSS4) and (LCM3) we have from Proposition 6.3.5 that all compact sets
are petite, and it immediately follows from Theorem 11.3.11 that the chain is regular
and hence positive Harris.
It may be seen that stability of the linear state space model is closely tied to the
stability of the deterministic system xk +1 = F xk . For each initial condition x0 ∈ Rn of
this deterministic system, the resulting trajectory {xk } satisfies the bound
   |x_k|_M^2 ≤ α^k |x_0|_M^2
and hence is ultimately bounded in the sense of Section 11.2: in fact, in the dynamical
systems literature such a system is called globally exponentially stable. It is precisely
this stability for the deterministic “core” of the linear state space model which allows
us to obtain boundedness in probability for the stochastic process Φ.
We now generalize the model (LSS1) to include random variation in the coefficients
F and G.
For example, this is the case when y_0 = θ̃_0 = Σ_0 = 0. If (12.38) holds then it follows from (2.23) that
   E[Y_{k+1}^2 | Y_k] = Σ_k Y_k^2 + σ_w^2.   (12.39)
This identity will be used to prove the following result:
Proposition 12.5.2. For the adaptive control model satisfying (SAC1) and (SAC2),
suppose that the process Φ defined in (2.25) satisfies (12.38) and that σz2 < 1. Then we
have
   lim sup_{k→∞} E[|Φ_k|^2] < ∞
so that distributions of the chain are tight, and hence Φ is positive recurrent.
Proof  We note first that the sequence {Σ_k} is bounded below and above by Σ̲ = σ_z > 0 and Σ̄ = σ_z/(1 − α^2) < ∞, and that the process θ clearly satisfies
   lim sup_{k→∞} E[θ_k^2] = σ_z^2/(1 − α^2);
moreover
   E[Y_{k+1}^2 Σ_{k+1} | Y_k] = Σ_{k+1} E[Y_{k+1}^2 | Y_k]
Taking total expectations of each side of (12.40), we use the condition σz2 < 1 to obtain
by induction, for all k ∈ Z+ ,
In fact, we will see in Chapter 16 that not only is the process bounded in probability,
but the conditional mean of Yk2 converges to the steady state value Eπ [Y02 ] at a geometric
rate from every initial condition. These results require a more elaborate stability proof.
Note that equation (12.40) does not obviously imply that there is a solution to a
drift inequality such as (V2): the conditional expectation is taken with respect to Yk ,
which is strictly smaller than FkΦ .
The condition that σz2 < 1 cannot be omitted in this analysis: indeed, we have that
if σ_z^2 ≥ 1, then
   E[Y_k^2] ≥ [σ_z^2]^k Y_0^2 + k σ_w^2 → ∞
as k increases, so that the chain is unstable in a mean square sense, although it may
still be bounded in probability.
It is well worth observing that this is one of the few models which we have encoun-
tered where obtaining a drift inequality of the form (V2) is much more difficult than
merely proving boundedness in probability. This is due to the fact that the dynamics
of this model are extremely nonlinear, and so a direct stability proof is difficult. By
exploiting equation (12.39) we essentially linearize a portion of the dynamics, which
makes the stability proof rather straightforward. However the identity (12.39) only
holds for a restricted class of initial conditions, so in general we are forced to tackle the
nonlinear equations directly.
12.6 Commentary
The key result Theorem 12.1.2 is taken from Foguel [121]. Versions of this result have
also appeared in papers by Beneš [23, 24] and Stettner [372] which consider processes
in continuous time. For more results on Feller chains the reader is referred to Krengel
[221], and the references cited therein.
For an elegant operator-theoretic proof of results related to Theorem 12.3.2, see
Lin [238] and Foguel [123]. The method of proof based upon the use of the operator
Ph = Uh Ih to obtain a σ-finite invariant measure is taken from Rosenblatt [338]. Neveu
in [295] promoted the use of the operators Uh , and proved the resolvent equation The-
orem 12.2.1 using direct manipulations of the operators. The kernel Ph is often called
the balayage operator associated with the function h (see Krengel [221] or Revuz [326]).
In the Supplement to Krengel's text by Brunel ([221], pp. 301–309) the recurrence structure of irreducible Markov chains is developed based upon these operators. This analysis and much of [326] exploit fully the resolvent equation, illustrating the power of this simple formula; because of our emphasis on ψ-irreducible chains and probabilistic methods, however, we do not address the resolvent equation further in this book.
Obviously, as with Theorem 12.1.2, Theorem 12.3.4 can be applied to an irreducible
Markov chain on countable space to prove positive recurrence. It is of some historical
interest to note that Foster’s original proof of the sufficiency of (V2) for positivity of such
chains is essentially that in Theorem 12.3.4. Rather than showing in any direct way that
(V2) gives an invariant measure, Foster was able to use the countable space analogue
of Theorem 12.1.2 (i) to deduce positivity from the “non-nullity” of a “compact” finite
set of states as in (12.21). We will discuss more general versions of this classification of
sets as positive or null further, but not until Chapter 18.
Observe that Theorem 12.3.4 only states that an invariant probability exists. Per-
haps surprisingly, it is not known whether the hypotheses of Theorem 12.3.4 imply that
the chain is bounded in probability when V is finite valued except for e-chains as in
Theorem 12.4.5.
The theory of e-chains is still being developed, although these processes have been
the subject of several papers over the past thirty years, most notably by Jamison and
Sine [175, 178, 358, 357, 356], Rosenblatt [337], Foguel [121] and the text by Krengel
[221]. In most of the e-chain literature, however, the state space is assumed compact
so that stability is immediate. The drift criterion for boundedness in probability on
average in Theorem 12.4.5 is new. The criterion Theorem 12.3.4 for the existence of an
invariant probability for a Feller chain was first shown in Tweedie [402].
The stability analysis of the linear state space model presented here is standard. For
an early treatment see Kalman and Bertram [192], while Caines [57] contains a modern
and complete development of discrete time linear systems. Snyders [364] treats linear
models with a continuous time parameter in a manner similar to the presentation in
this book. The bilinear model has been the subject of several papers: see for example
Feigin and Tweedie [111], or the discussion in Tong [388]. The stability of the adaptive
control model was first resolved in Meyn and Caines [270], and related stability results
were described in Solo [365]. The stability proof given here is new, and is far simpler
than any previous results.
Part III
CONVERGENCE
Chapter 13
Ergodicity
(i) The chain is positive Harris: that is, the unique invariant measure π is finite.
(ii) There exists some ν-small set C ∈ B^+(X) and some P^∞(C) > 0 such that as n → ∞, for all x ∈ C,
   P^n(x, C) → P^∞(C).   (13.1)
(iii) There exists some regular set in B^+(X): equivalently, there is a petite set C ∈ B(X) such that
   sup_{x∈C} E_x[τ_C] < ∞.   (13.2)
(iv) There exists some petite set C, some b < ∞ and a non-negative function V finite at some one x_0 ∈ X, satisfying
   ΔV(x) ≤ −1 + b I_C(x),  x ∈ X.   (13.3)
Any of these conditions implies that there is a unique invariant probability measure π and that for every initial condition x ∈ X,
   sup_{A∈B(X)} |P^n(x, A) − π(A)| → 0   (13.4)
as n → ∞.
Proof That π(X) < ∞ in (i) is equivalent to the finiteness of hitting times as in
(iii) and the existence of a mean drift test function in (iv) is merely a restatement of
the overview Theorem 11.0.1 in Chapter 11.
The fact that any of these positive recurrence conditions imply the uniform con-
vergence over all sets A from all starting points x as in (13.4) is of course the main
conclusion of this theorem, and is finally shown in Theorem 13.3.3.
That (ii) holds from (13.4) is trivial by dominated convergence. The cycle
is completed by the implication that (ii) implies (13.4), which is in Theorem 13.3.5.
The extension from convergence to summability provided the initial measures are
regular is given in Theorem 13.4.4. Conditions under which π itself is regular are also
in Section 13.4.2.
There are four ideas which should be borne in mind as we embark on this third part
of the book, especially when coming from a countable space background. The first two
involve the types of limit theorems we shall address; the third involves the method of
proof of these theorems; and the fourth involves the nomenclature we shall use.
Modes of convergence
The first is that we will be considering, in this and the next three chapters, convergence
of a chain in terms of its transition probabilities. Although it is important also to
consider convergence of a chain along its sample paths, leading to strong laws, or of
normalized variables leading to central limit theorems and associated results, we do not
turn to this until Chapter 17.
This is in contrast to the traditional approach in the countable state space case.
Typically, there, the search is for conditions under which there exist pointwise limits of
the form
lim |P n (x, y) − π(y)| = 0; (13.6)
n →∞
but the results we derive are related to the signed measure (P n − π), and so concern
not merely such pointwise or even setwise convergence, but a more global convergence
in terms of the total variation norm.
   ‖µ‖ := sup_{f : |f| ≤ 1} |µ(f)| = sup_{A∈B(X)} µ(A) − inf_{A∈B(X)} µ(A),   (13.7)
and the limit of interest is then
   lim_{n→∞} ‖P^n(x, · ) − π‖ = 2 lim_{n→∞} sup_A |P^n(x, A) − π(A)| = 0.   (13.8)
Obviously when (13.8) holds on a countable space, then (13.6) also holds and indeed
holds uniformly in the end point y. This move to the total variation norm, necessitated
by the typical lack of structure of pointwise transitions in the general state space, will
actually prove exceedingly fruitful rather than restrictive.
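On a finite state space the norm in question is just an ℓ1 distance, and its decay is easy to exhibit. The sketch below (an arbitrary 3 × 3 stochastic matrix, for illustration only) computes ‖P^n(x, ·) − π‖ and watches it vanish, as in (13.8).

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.3, 0.3, 0.4]])      # an illustrative ergodic chain

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

Pn = np.eye(3)
for n in range(1, 16):
    Pn = Pn @ P
    tv = np.abs(Pn[0, :] - pi).sum()   # ||P^n(x,.) - pi|| on a countable space
    if n % 5 == 0:
        print(n, tv)                   # decays to zero, illustrating (13.8)
```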
When the space is topological, it is also the case that total variation convergence
implies weak convergence of the measures in question.
This is clear since (see Chapter 12) the latter is defined as convergence of expec-
tations of functions which are not only bounded but also continuous. Hence the weak
convergence of P n to π as in Proposition 12.1.4 will be subsumed in results such as
(13.4) provided the chain is suitably irreducible and positive.
Thus, for example, asymptotic properties of T-chains will be much stronger than
those for arbitrary weak Feller chains even when a unique invariant measure exists for
the latter.
The same type of behavior, and the need to ensure that initial distributions are
appropriately “regular” in extended ways, will be a highly visible part of the work in
Chapters 14 and 15.
Ergodic chains
Finally, a word on the term ergodic. We will adopt this term for chains where the limit
in (13.6) or (13.8) holds as the time sequence n → ∞, rather than as n → ∞ through
some subsequence.
Unfortunately, we know that in complete generality Markov chains may be periodic,
in which case the limits in (13.6) or (13.8) can hold at best as we go through a periodic sequence {nd} as n → ∞. Thus by definition, ergodic chains will be aperiodic, and a
minor, sometimes annoying but always vital change to the structure of the results is
needed in the periodic case.
We will therefore give results, typically, for the aperiodic context and give the re-
quired modification for the periodic case following the main statement when this seems
worthwhile.
the use of total variation norm convergence results comes from an extension of the first-
entrance and last-exit decompositions of Section 8.2, together with the representation
of the invariant probability given in Theorem 10.2.1.
The first-entrance last-exit decomposition, for any states x, y, α ∈ X, is given by
   P^n(x, y) = {}_α P^n(x, y) + Σ_{j=1}^{n−1} [ Σ_{k=1}^{j} {}_α P^k(x, α) P^{j−k}(α, α) ] {}_α P^{n−j}(α, y),   (13.9)
where we have used the notation α to indicate that the specific state being used for the
decomposition is distinguished from the more generic states x, y which are the starting
and end points of the decomposition.
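The decomposition can be verified mechanically on a small matrix, which also fixes the meaning of the taboo probabilities. In the sketch below (an arbitrary 3-state chain with α taken to be state 0; illustrative only) the taboo terms are built by forbidding α at intermediate times, and both sides of (13.9) agree to machine precision.

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.25, 0.25]])
alpha, x, y, n = 0, 1, 2, 8

def taboo(start, steps):
    """u[k][z] = P_start(Phi_k = z, Phi_i != alpha for 1 <= i <= k - 1)."""
    u = [None, P[start, :].copy()]
    for _ in range(2, steps + 1):
        v = u[-1].copy()
        v[alpha] = 0.0                 # forbid alpha at the intermediate time
        u.append(v @ P)
    return u

ux, ua = taboo(x, n), taboo(alpha, n)
lhs = np.linalg.matrix_power(P, n)[x, y]
rhs = ux[n][y]                         # the taboo term aP^n(x, y)
for j in range(1, n):                  # j = last visit to alpha before n
    for k in range(1, j + 1):          # k = first visit to alpha
        rhs += ux[k][alpha] * np.linalg.matrix_power(P, j - k)[alpha, alpha] \
               * ua[n - j][y]
print(lhs, rhs)                        # the two sides of (13.9) coincide
```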
We will wish in what follows to concentrate on the time variable rather than a
particular starting point or end point, and it will prove particularly useful to have
notation that reflects this. Let us hold the reference state α fixed and introduce the three forms
   a_x(n) := P_x(τ_α = n),   (13.10)
   u(n) := P^n(α, α),   (13.11)
   t_y(n) := {}_α P^n(α, y).   (13.12)
The power of these forms becomes apparent when we link them to the representation
of the invariant measure given in (13.13). The next decomposition underlies all ergodic
theorems for countable space chains.
   |P^n(x, y) − π(y)| ≤ {}_α P^n(x, y)
                      + |a_x ∗ u ∗ t_y(n) − π(α) Σ_{j=1}^n t_y(j)|
                      + |π(α) Σ_{j=1}^n t_y(j) − π(y)|.   (13.18)
provided we have (as we do for a Harris recurrent chain) that for all x
   Σ_j a_x(j) = P_x(τ_α < ∞) = 1.   (13.21)
The convergence in (13.19) will be shown to hold for all states of an aperiodic positive
chain in the next section: we first motivate our need for it, and for related results in
renewal theory, by developing the ergodic structure of chains with “ergodic atoms”.
Ergodic atoms
If Φ is positive Harris, an atom α ∈ B^+(X) is called ergodic if it satisfies
   lim_{n→∞} |P^n(α, α) − π(α)| = 0.   (13.22)
In the positive Harris case note that an atom can be ergodic only if the chain is
aperiodic.
With this notation, and the prescription for analyzing ergodic behavior inherent in
Proposition 13.1.1, we can prove surprisingly quickly the following solidarity result.
Theorem 13.1.2. If Φ is a positive Harris chain on a countable space, and if there
exists an ergodic atom α, then for every initial state x
   ‖P^n(x, · ) − π‖ → 0,  n → ∞.   (13.23)
and so by (13.17) we have the total variation norm bounded by three terms:

‖P^n(x, ·) − π‖ ≤ Σ_y {}_αP^n(x, y) + Σ_y |a_x ∗ u − π(α)| ∗ t_y(n) + π(α) Σ_y Σ_{j=n+1}^{∞} t_y(j).   (13.24)
We need to show each of these goes to zero. From the representation (13.13) of π and
Harris positivity,
∞ > Σ_y π(y) = π(α) Σ_y Σ_{j=1}^{∞} t_y(j).   (13.25)
The third term in (13.24) is the tail sum in this representation and so we must have
π(α) Σ_y Σ_{j=n+1}^{∞} t_y(j) → 0,   n → ∞.   (13.26)
The first term in (13.24) also tends to zero, for we have the interpretation

Σ_y {}_αP^n(x, y) = P_x(τ_α ≥ n),   (13.27)

and this tends to zero as n → ∞ by (13.21).
This approach may be extended to give the Ergodic Theorem for a general space
chain when there is an ergodic atom in the state space. A first-entrance last-exit
decomposition will again give us an elegant proof in this case, and we prove such a
result in Section 13.2.3, from which basis we wish to prove the same type of ergodic
result for any positive Harris chain. To do this, we must of course prove that the atom α̌ for the split skeleton chain Φ̌^m, which we always have available, is an ergodic atom.
To show that atoms for aperiodic positive chains are indeed ergodic, which is crucial
to completing this argument, we need results from renewal theory. This is therefore
necessarily the subject of the next section.
Recall that u(n) := Σ_{j=0}^{∞} p^{∗j}(n) denotes the renewal function for n ≥ 0. Since p^{∗0} = δ_0 we have u(0) = 1; by convention we will set u(−1) = 0.
If we let Z(n) denote the indicator variables

Z(n) = 1 if S_j = n for some j ≥ 0, and Z(n) = 0 otherwise,

then we have

P_a(Z(n) = 1) = a ∗ u(n),
and thus the renewal function represents the probabilities of {Z(n) = 1} when there is
no delay, or equivalently when a = δ0 .
The coupling approach involves the study of two linked renewal processes with the
same increment distribution but different initial distributions, and, most critically, de-
fined on the same probability space.
To describe this concept we define two sets of mutually independent random variables {S_0, S_1, S_2, . . .} and {S'_0, S'_1, S'_2, . . .}, where each of the variables {S_1, S_2, . . .} and {S'_1, S'_2, . . .} are independent and identically distributed with distribution {p(j)}, but where the distributions of the independent variables S_0, S'_0 are a, b respectively.
The coupling time of the two renewal processes is defined as

T_{ab} := min{n ≥ 0 : Z_a(n) = Z_b(n) = 1},

where Z_a, Z_b are the indicator sequences of each renewal process.
is the first time that a renewal takes place simultaneously in both sequences, and from
that point onwards, because of the loss of memory at the renewal epoch, the renewal
processes are identical in distribution.
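The coupling construction is easy to simulate. The following sketch (an illustration under our own choice of increment law p, not part of the original text) generates two independent delayed renewal processes and records their first simultaneous renewal T_{ab}; its sample mean is finite, in line with the results of this section.

import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.0, 0.5, 0.3, 0.2])      # aperiodic increment law on {1,2,3}

def renewal_epochs(delay, horizon):
    # Set of renewal times S_0 = delay, S_1, S_2, ... up to `horizon`.
    times, t = set(), delay
    while t <= horizon:
        times.add(t)
        t += rng.choice(len(p), p=p)
    return times

def coupling_time(a_delay, b_delay, horizon=100_000):
    # T_ab: first n at which both independent processes renew simultaneously.
    common = renewal_epochs(a_delay, horizon) & renewal_epochs(b_delay, horizon)
    return min(common) if common else None

samples = [coupling_time(0, 1) for _ in range(2_000)]
samples = [t for t in samples if t is not None]
print(np.mean(samples))                 # finite mean coupling time E[T_ab]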
The key requirement to use this method is that this coupling time be almost surely finite. In this section we will show that if we have an aperiodic positive recurrent renewal process with finite mean

m_p := Σ_{j=0}^{∞} j p(j) < ∞,   (13.28)

then the coupling time T_{ab} is indeed almost surely finite for any proper initial distributions a, b.
Proof   Consider the linked forward recurrence time chain V^∗ defined by (10.19), corresponding to the two independent renewal sequences {S_n}, {S'_n}. Let τ_{1,1} := min(n : V_n^∗ = (1, 1)). Since the first coupling takes place at τ_{1,1} + 1, we have T_{ab} = τ_{1,1} + 1; and since the linked chain V^∗ is positive recurrent, for any initial distribution µ,

P_µ(τ_{1,1} < ∞) = 1,

as required.
Theorem 13.2.2. Suppose that a, b, p are proper distributions on Z_+, and that u is the renewal function corresponding to p. Then provided p is aperiodic with mean m_p < ∞,

|a ∗ u(n) − b ∗ u(n)| → 0,   n → ∞.   (13.31)

Proof   We have the coupling bound

|a ∗ u(n) − b ∗ u(n)| = |P_a(Z(n) = 1) − P_b(Z(n) = 1)| ≤ P(T_{ab} > n).
But from Proposition 13.2.1 we have that P(Tab > n) → 0 as n → ∞, and (13.31)
follows.
We will see in Section 18.1.1 that Theorem 13.2.2 holds even without the assumption
that mp < ∞. For the moment, however, we will concentrate on further aspects of
coupling when we are in the positive recurrent case.
The distribution e on Z_+ defined by

e(n) := m_p^{-1} Σ_{j=n+1}^{∞} p(j) = m_p^{-1}(1 − p ∗ 1(n)),   n ≥ 0,

has been shown in (10.16) to be the invariant probability measure for the forward recurrence time chain V^+ associated with the renewal sequence {S_n}. It also follows that the delayed renewal distribution corresponding to the initial distribution e is given for every n ≥ 0 by
P_e(Z(n) = 1) = e ∗ u(n)
 = m_p^{-1} (1 − p ∗ 1) ∗ u(n)
 = m_p^{-1} (1 − p ∗ 1) ∗ (Σ_{j=0}^{∞} p^{∗j})(n)
 = m_p^{-1} [1 + 1 ∗ (Σ_{j=1}^{∞} p^{∗j})(n) − p ∗ 1 ∗ (Σ_{j=0}^{∞} p^{∗j})(n)]
 = m_p^{-1}.   (13.35)
For this reason the distribution e is also called the equilibrium distribution of the renewal
process.
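The identity (13.35) can be checked exactly in a few lines. In the following sketch (ours; the increment and delay laws are arbitrary illustrative choices) the renewal function u is computed from the renewal equation u(n) = Σ_k p(k) u(n − k), and e ∗ u(n) = m_p^{-1} holds to machine precision for every n, while a generic delay a gives a ∗ u(n) → m_p^{-1} only in the limit, as asserted in Theorem 13.2.3 below.

import numpy as np

p = np.array([0.0, 0.5, 0.3, 0.2])       # increment law on {1,2,3}
m_p = (np.arange(len(p)) * p).sum()      # mean m_p = 1.7
N = 80

# Renewal function from the renewal equation u(n) = sum_k p(k) u(n - k), u(0) = 1.
u = np.zeros(N + 1)
u[0] = 1.0
for n in range(1, N + 1):
    u[n] = sum(p[k] * u[n - k] for k in range(1, min(n, len(p) - 1) + 1))

# Equilibrium delay e(n) = m_p^{-1} sum_{j > n} p(j).
e = np.zeros(N + 1)
e[: len(p)] = (1.0 - np.cumsum(p)) / m_p

# (13.35): e * u (n) = 1/m_p exactly, for every n.
for n in range(N + 1):
    assert abs(sum(e[k] * u[n - k] for k in range(n + 1)) - 1.0 / m_p) < 1e-12

# a * u (n) -> 1/m_p for an arbitrary proper delay a.
a = np.zeros(N + 1); a[0], a[2] = 0.3, 0.7
au_N = sum(a[k] * u[N - k] for k in range(N + 1))
print(au_N, 1.0 / m_p)                   # close for large N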
These considerations show that in the positive recurrent case, the key quantity we
considered for Markov chains in (13.22) has the representation
|u(n) − m_p^{-1}| = |P_{δ_0}(Z(n) = 1) − P_e(Z(n) = 1)|,   (13.36)
and in order to prove an asymptotic limiting result for an expression of this kind, we
must consider the probabilities that Z(n) = 1 from the initial distributions δ0 , e.
But we have essentially evaluated this already. We have
Theorem 13.2.3. Suppose that a, p are proper distributions on Z+ , and that u is the
renewal function corresponding to p. Then provided p is aperiodic and has a finite mean
m_p,

|a ∗ u(n) − m_p^{-1}| → 0,   n → ∞.   (13.37)
Proof The result follows from Theorem 13.2.2 by substituting the equilibrium
distribution e for b and using (13.35).
This has immediate application in the case where the renewal process is the return
time process to an accessible atom for a Markov chain.
Proposition 13.2.4. (i) If Φ is a positive recurrent aperiodic Markov chain, then
any atom α in B + (X) is ergodic.
(ii) If Φ is a positive recurrent aperiodic Markov chain on a countable space, then for
every initial state x
‖P^n(x, ·) − π‖ → 0,   n → ∞.   (13.38)
Proof We know from Proposition 10.2.2 that if Φ is positive recurrent then the
mean return time to any atom in B + (X) is finite. If the chain is aperiodic then (i)
follows directly from Theorem 13.2.3 and the definition (13.22).
The conclusion in (ii) then follows from (i) and Theorem 13.1.2.
It is worth stressing explicitly that this result depends on the classification of positive
chains in terms of finite mean return times to atoms: that is, in using renewal theory
it is the equivalence of positivity and regularity of the chain that is utilized.
The general state space analogue of the first-entrance last-exit decomposition (13.9) is, for any A ∈ B(X),

P^n(x, B) = {}_AP^n(x, B) + Σ_{j=1}^{n−1} Σ_{k=1}^{j} ∫_A ∫_A {}_AP^k(x, dv) P^{j−k}(v, dw) {}_AP^{n−j}(w, B).   (13.39)
If we suppose that there is an atom α and take A = α then these forms are somewhat
simplified: the decomposition (13.39) reduces to
P^n(x, B) = {}_αP^n(x, B) + Σ_{j=1}^{n−1} Σ_{k=1}^{j} {}_αP^k(x, α) P^{j−k}(α, α) {}_αP^{n−j}(α, B).   (13.40)
In the general state space case it is natural to consider convergence from an arbitrary
initial distribution λ. It is equally natural to consider convergence of the integrals
E_λ[f(Φ_n)] = ∫ λ(dx) ∫ P^n(x, dy) f(y)   (13.41)
for arbitrary non-negative functions f . We will use either the probabilistic or the
operator-theoretic version of this quantity (as given by the two sides of (13.41)) inter-
changeably, as seems most transparent, in what follows.
We explore convergence of Eλ [f (Φn )] for general (unbounded) f in detail in Chap-
ter 14. Here we concentrate on bounded f , in view of the definition (13.7) of the total
variation norm.
When α is an atom in B + (X), let us therefore extend the notation in (13.10)–(13.12)
to the forms
a_λ(n) := P_λ(τ_α = n),   (13.42)

t_f(n) := ∫ {}_αP^n(α, dy) f(y) = E_α[f(Φ_n)I{τ_α ≥ n}]:   (13.43)
these are well defined (although possibly infinite) for any non-negative function f on X
and any probability measure λ on B(X).
As in (13.14) and (13.15) we can use this terminology to write the first-entrance and
last-exit formulations as
∫ λ(dx) P^n(x, α) = a_λ ∗ u(n),   (13.44)

∫ P^n(α, dy) f(y) = u ∗ t_f(n).   (13.45)
The first-entrance last-exit decomposition (13.40) can similarly be formulated, for any
λ, f , as
∫ λ(dx) ∫ P^n(x, dw) f(w) = ∫ λ(dx) ∫ {}_αP^n(x, dw) f(w) + a_λ ∗ u ∗ t_f(n).   (13.46)
The general state space version of Proposition 13.1.1 provides the critical bounds needed
for our approach to ergodic theorems. Using the notation of (13.41) we have two bounds
which we shall refer to as Regenerative Decompositions.
Theorem 13.2.5. Suppose that Φ admits an accessible atom α and is positive Harris recurrent with invariant probability measure π. Then for any probability measure λ and f ≥ 0,

|∫ λ(dx) ∫ P^n(x, dw) f(w) − ∫ π(dw) f(w)|
 ≤ ∫ λ(dx) ∫ {}_αP^n(x, dw) f(w) + |a_λ ∗ u − π(α)| ∗ t_f(n) + π(α) Σ_{j=n+1}^{∞} t_f(j).   (13.48)

Now in the general state space case we have the representation for π given from (10.31) by

∫ π(dw) f(w) = π(α) Σ_{j=1}^{∞} t_f(j);   (13.50)

and so to prove ergodic theorems from (13.48) we must do three things:

(E1) control the third term in (13.48): by (13.50) this is the tail sum in the representation of π(f), and so it tends to zero provided π(f) < ∞;
(E2) control the first term in (13.48), which involves questions of the finiteness of the
hitting time distribution of τα when the chain begins with distribution λ; this is
automatically finite as required for a Harris recurrent chain, even without positive
recurrence, although for chains which are only recurrent it clearly needs care;
(E3) control the middle term in (13.48), which again involves finiteness of π to bound
its last element, but more crucially then involves only the ergodicity of the atom
α, regardless of λ: for we know from Lemma D.7.1 that if the atom is ergodic so
that (13.19) holds then also
lim_{n→∞} a_λ ∗ u(n) = π(α).   (13.51)
Thus recurrence, or rather Harris recurrence, will be used twice to give bounds: positive
recurrence gives one bound; and, centrally, the equivalence of positivity and regularity
ensures the atom is ergodic, exactly as in Theorem 13.2.3.
Bounded functions are the only ones relevant to total variation convergence. The
Regenerative Decomposition is however valid for all f ≥ 0. Bounds in this decom-
position then involve integrability of f with respect to π, and a non-trivial extension
of regularity to what will be called f -regularity. This will be held over to the next
chapter, and here we formalize the above steps and incorporate them with the splitting
technique, to prove the Aperiodic Ergodic Theorem.
Proof (i) Let us first assume that there is an accessible ergodic atom in the
space. The proof is virtually identical to that in the countable case. We have
‖∫ λ(dx) P^n(x, ·) − π‖ = sup_{|f|≤1} |∫ λ(dx) ∫ P^n(x, dw) f(w) − ∫ π(dw) f(w)|,
by Lemma D.7.1; here we use the fact that α is ergodic and, again, the representation that π(X) = π(α) Σ_{j=1}^{∞} t_1(j) < ∞.
We must finally control the first term. To do this, we need only note that, again
since |f | ≤ 1, we have
E_λ[f(Φ_n)I{τ_α ≥ n}] ≤ P_λ(τ_α ≥ n),   (13.56)
and this expression tends to zero by monotone convergence as n → ∞, since α is Harris
recurrent and Px (τα < ∞) = 1 for every x.
Notice explicitly that in (13.54)–(13.56) the bounds which tend to zero are indepen-
dent of the particular |f | ≤ 1, and so we have the required supremum norm convergence.
(ii) Now assume that Φ is strongly aperiodic. Consider the split chain Φ̌: we
know this is also strongly aperiodic from Proposition 5.5.6 (ii), and positive Harris
from Proposition 10.4.2. Thus from Proposition 13.2.4 the atom α̌ is ergodic. Now our
use of total variation norm convergence renders the transfer to the original chain easy.
Using the fact that the original chain is the marginal chain of the split chain, and that
π is the marginal measure of π̌, we have immediately
‖∫ λ(dx) P^n(x, ·) − π‖ = 2 sup_{A∈B(X)} |∫_X λ(dx) P^n(x, A) − π(A)|
 = 2 sup_{A∈B(X)} |∫_X̌ λ^∗(dx_i) P̌^n(x_i, A) − π̌(A)|
 ≤ 2 sup_{B̌∈B(X̌)} |∫_X̌ λ^∗(dx_i) P̌^n(x_i, B̌) − π̌(B̌)|
 = ‖∫ λ^∗(dx_i) P̌^n(x_i, ·) − π̌‖,   (13.57)
where the inequality follows since the first supremum is over sets in B(X̌) of the form
A0 ∪ A1 and the second is over all sets in B(X̌).
Applying the result (i) for chains with accessible atoms shows that the total variation
norm in (13.57) for the split chain tends to zero, so we are finished.
Proposition 13.3.2. If π is invariant for Φ, then for any initial distribution λ the total variation norm

‖∫ λ(dx) P^n(x, ·) − π‖

is non-increasing in n.
Proof   We have from the definition of total variation and the invariance of π that

‖∫ λ(dx) P^{n+1}(x, ·) − π‖ = sup_{|f|≤1} |∫ λ(dx) ∫ P^{n+1}(x, dy) f(y) − ∫ π(dy) f(y)|
 = sup_{|f|≤1} |∫ λ(dx) ∫ P^n(x, dw) ∫ P(w, dy) f(y) − ∫ π(dw) ∫ P(w, dy) f(y)|
 ≤ sup_{|f|≤1} |∫ λ(dx) ∫ P^n(x, dw) f(w) − ∫ π(dw) f(w)|,   (13.58)

since |P f| ≤ 1 whenever |f| ≤ 1.
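This monotonicity is easy to observe numerically. The following sketch (ours, with an arbitrary three-state kernel) verifies that ‖λP^n − π‖ is non-increasing in n.

import numpy as np

P = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3],
              [0.4, 0.4, 0.2]])
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))]); pi /= pi.sum()

lam = np.array([1.0, 0.0, 0.0])         # an arbitrary initial distribution
tv = []
for n in range(25):
    tv.append(np.abs(lam - pi).sum())   # total variation norm ||lam P^n - pi||
    lam = lam @ P
assert all(t2 <= t1 + 1e-12 for t1, t2 in zip(tv, tv[1:]))
print(tv[:5])                           # monotonically decreasing to zero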
Theorem 13.3.3. If Φ is positive Harris and aperiodic, then for every initial distri-
bution λ
‖∫ λ(dx) P^n(x, ·) − π‖ → 0,   n → ∞.   (13.59)
Proof Since for some m the skeleton Φm is strongly aperiodic, and also positive
Harris by Theorem 10.4.5, we know that
‖∫ λ(dx) P^{nm}(x, ·) − π‖ → 0,   n → ∞.   (13.60)
The result for P n then follows immediately from the monotonicity in (13.58).
As we mentioned in the discussion of the periodic behavior of Markov chains, the
results are not quite as simple to state in the periodic as in the aperiodic case; but they
can be easily proved once the aperiodic case is understood.
The asymptotic behavior of positive recurrent chains which may not be Harris is
also easy to state now that we have analyzed positive Harris chains.
The final formulation of these results for quite arbitrary positive recurrent chains is
Theorem 13.3.4. (i) If Φ is positive Harris with period d ≥ 1, then for every initial
distribution λ
d^{-1} Σ_{r=0}^{d−1} ‖∫ λ(dx) P^{nd+r}(x, ·) − π‖ → 0,   n → ∞.   (13.61)
(ii) If Φ is positive recurrent with period d ≥ 1, then there is a π-null set N such that
for every initial distribution λ with λ(N ) = 0
d^{-1} Σ_{r=0}^{d−1} ‖∫ λ(dx) P^{nd+r}(x, ·) − π‖ → 0,   n → ∞.   (13.62)
Proof The result (i) is straightforward to check from the existence of cycles in
Section 5.4.3, together with the fact that the chain restricted to each cyclic set is
aperiodic and positive Harris on the d-skeleton. We then have (ii) as a direct corollary
of the decomposition of Theorem 9.1.5.
Finally, let us complete the circle by showing the last step in the equivalences in The-
orem 13.0.1. Notice that (13.63) is ensured by (13.1), using the Dominated Convergence
Theorem, so that our next result is in fact marginally stronger than the corresponding
statement of the Aperiodic Ergodic Theorem.
Theorem 13.3.5. Let Φ be ψ-irreducible and aperiodic, and suppose that there exists some ν-small set C ∈ B^+(X) and some constant P^∞(C) > 0 such that as n → ∞

∫_C ν_C(dx) (P^n(x, C) − P^∞(C)) → 0.   (13.63)

Then Φ is positive recurrent, and the conclusions of the Aperiodic Ergodic Theorem hold with P^∞(C) = π(C).
Proof Using the Nummelin splitting via the set C for the m-skeleton, we find that
(13.63) taken through the sublattice nm is equivalent to
Thus the atom α̌ is ergodic and the results of Section 13.3 all hold, with P ∞ (C) = π(C).
13.4 Sums of transition probabilities

Proposition 13.4.1. Suppose that p is an aperiodic distribution on Z_+ with finite mean m_p, and that a, b are initial distributions each with finite mean. Then

Σ_{n=0}^{∞} |a ∗ u(n) − b ∗ u(n)| < ∞.

Proof   Exactly as in the coupling bound used for Theorem 13.2.2,

Σ_{n=0}^{∞} |a ∗ u(n) − b ∗ u(n)| ≤ Σ_{n=0}^{∞} P(T_{ab} > n) = E[T_{ab}].   (13.67)
Now we know from Section 10.3.1 that when p is aperiodic and mp < ∞, the linked
forward recurrence time chain V ∗ is positive recurrent with invariant probability
e∗ (i, j) = e(i)e(j).
Hence from any state (i, j) with e^∗(i, j) > 0 we have as in Proposition 11.1.1

E_{i,j}[τ_{1,1}] < ∞.   (13.68)
Let us consider specifically the initial distributions δ0 and δ1 : these correspond to the
undelayed renewal process and the process delayed by exactly one time unit respectively.
For this choice of initial distributions we have for n > 0

δ_0 ∗ u(n) = u(n),   δ_1 ∗ u(n) = u(n − 1).
Now E[T01 ] ≤ E1,2 [τ1,1 ]+1 and it is certainly the case that e∗ (1, 2) > 0. So from (13.30),
(13.67) and (13.68)
Var(u) := Σ_{n=0}^{∞} |u(n) − u(n − 1)| ≤ E_{1,2}[τ_{1,1}] + 1 < ∞.   (13.69)
We now need to extend the result to more general initial distributions with finite mean.
By the triangle inequality it suffices to consider only one arbitrary initial distribution a
and to take the other as δ0 . To bound the resulting quantity |a ∗ u (n) − u(n)| we write
the upper tails of a for k ≥ 0 as

ā(k) := Σ_{j=k+1}^{∞} a(j) = 1 − Σ_{j=0}^{k} a(j),

and put

w(n) := |u(n) − u(n − 1)|,   n ≥ 0.

Then we have

ā ∗ w(n) ≥ |Σ_{j=0}^{n} [1 − Σ_{k=0}^{j} a(k)][u(n − j) − u(n − j − 1)]|
 = |Σ_{j=0}^{n} [u(n − j) − u(n − j − 1)] − Σ_{j=0}^{n} Σ_{k=0}^{j} a(k)[u(n − j) − u(n − j − 1)]|
 = |u(n) − Σ_{k=0}^{n} a(k) Σ_{j=k}^{n} [u(n − j) − u(n − j − 1)]|
 = |u(n) − Σ_{k=0}^{n} a(k) u(n − k)|   (13.70)
 = |u(n) − a ∗ u(n)|,

so that

Σ_n |u(n) − a ∗ u(n)| ≤ Σ_n ā ∗ w(n) = [Σ_n ā(n)][Σ_n w(n)].   (13.71)

But by assumption the mean m_a = Σ_n ā(n) is finite, and (13.69) shows that the sequence w(n) is also summable; and so we have

Σ_n |u(n) − a ∗ u(n)| ≤ m_a Var(u) < ∞   (13.72)

as required.
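The bound (13.72) is straightforward to test numerically. In this sketch (ours; the increment and delay laws are illustrative choices) the truncated sums show Σ_n |u(n) − a ∗ u(n)| sitting comfortably below m_a Var(u).

import numpy as np

p = np.array([0.0, 0.5, 0.3, 0.2])          # aperiodic increment law
N = 600

u = np.zeros(N + 1); u[0] = 1.0
for n in range(1, N + 1):
    u[n] = sum(p[k] * u[n - k] for k in range(1, min(n, len(p) - 1) + 1))

# Var(u) = sum_n |u(n) - u(n-1)| with the convention u(-1) = 0.
var_u = abs(u[0]) + np.abs(np.diff(u)).sum()

a = np.array([0.2, 0.3, 0.5])               # delay law with mean m_a
m_a = (np.arange(len(a)) * a).sum()
au = np.convolve(a, u)[: N + 1]             # (a * u)(n), n = 0..N

print(np.abs(u - au).sum(), m_a * var_u)    # the bound (13.72) holds comfortably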
It is obviously of considerable interest to know under what conditions we have
Σ_n |a ∗ u(n) − m_p^{-1}| < ∞;   (13.73)
that is, when this result holds with the equilibrium measure as one of the initial mea-
sures.
Using Proposition 13.4.1 we know that this will occur if the equilibrium distribution
e has a finite mean; and since we know the exact structure of e it is obvious that me < ∞
if and only if
s_p := Σ_n n^2 p(n) < ∞.

Since in this case

m_e = [s_p − m_p]/[2m_p],

we have from Proposition 13.4.1, and in particular the bound (13.71), the following pleasing corollary:
are finite. A result such as this requires regularity of the initial states x, y: recall from Chapter 11 that a probability measure µ on B(X) is called regular if

E_µ[τ_B] < ∞,   B ∈ B^+(X).
We will again follow the route of first considering chains with an atom, then translating
the results to strongly aperiodic and thence to general chains.
Theorem 13.4.3. Suppose Φ is an aperiodic positive Harris chain and suppose that
the chain admits an atom α ∈ B+ (X). Then for any regular initial distributions λ, µ,
Σ_{n=1}^{∞} ∫∫ λ(dx) µ(dy) ‖P^n(x, ·) − P^n(y, ·)‖ < ∞;   (13.75)
To bound this term note that Σ_{n=1}^{∞} {}_αP^n(α, X) = E_α[τ_α] < ∞, since every accessible atom is regular from Theorems 11.1.4 and 11.1.2; and so it remains only to prove that

Σ_{n=1}^{∞} ∫ λ(dx) |a_x ∗ u(n) − u(n)| < ∞.   (13.80)
Proof Consider the strongly aperiodic case. The theorem is valid for the split
chain, since the split measures λ∗ , µ∗ are regular for Φ̌: this follows from the charac-
terization in Theorem 11.3.12.
Since the result is a total variation result it remains valid when restricted to the
original chain, as in (13.57).
In the arbitrary aperiodic case we can apply Proposition 13.3.2 to move to a skeleton
chain, as in the proof of Theorem 13.2.5.
The most interesting special case of this result is given in the following theorem.
Theorem 13.4.5. Suppose Φ is an aperiodic positive Harris chain and that α is an
accessible atom. If
E_α[τ_α^2] < ∞,   (13.82)
then for any regular initial distribution λ
Σ_{n=1}^{∞} ‖λP^n − π‖ < ∞.   (13.83)
Proof In the case where there is an atom α in the space, we have as in Proposi-
tion 13.4.2 that π is a regular measure when the second-order moment (13.82) is finite,
and the result is then a consequence of Theorem 13.4.4.
13.5 Commentary*
It is hard to know where to start in describing contributions to these theorems. The
countable chain case has an immaculate pedigree: Kolmogorov [215] first proved this
result, and Feller [114] and Chung [71] give refined approaches to the single-state version
(13.6), essentially through analytic proofs of the lattice renewal theorem.
The general state space results in the positive recurrent case are largely due to
Harris [155] and to Orey [308]. Their results and related material, including a null
recurrent version in Section 18.1 below, are all discussed in a most readable way in
Orey’s monograph [309]. Prior to the development of the splitting technique, proofs
utilized the concept of the tail σ-field of the chain, which we have not discussed so far,
and will only touch on in Chapter 17.
The coupling proofs are much more recent, although they are usually dated to
Doeblin [94]. Pitman [317] first exploited the positive recurrent coupling in the way
we give it here, and his use of the result in Proposition 13.4.1 was even then new, as
was Theorem 13.4.4.
Our presentation of this material has relied heavily on Nummelin [303], and further
related results can be found in his Chapter 6. In particular, for results of this kind in a more general setting, where the renewal sequence is allowed to vary from the probabilistic structure with Σ_n p(n) = 1 which we have used, the reader is referred to Chapters 4 and 6 of [303].
It is interesting to note that the first-entrance last-exit decomposition, which shows
so clearly the role of the single ergodic atom, is a relative late-comer on the scene.
Although probably used elsewhere, it surfaces in the form given here in Nummelin [301]
and Nummelin and Tweedie [307], and appears to be less than well known even in the
countable state space case. Certainly, the proof of ergodicity is much simplified by using
the Regenerative Decomposition.
We should note, for the reader who is yet again trying to keep stability nomenclature
straight, that even the “ergodicity” terminology we use here is not quite standard: for
example, Chung [71] uses the word “ergodic” to describe certain ratio limit theorems
rather than the simple limit theorem of (13.8). We do not treat ratio limit theorems in
this book, except in passing in Chapter 17: it is a notable omission, but one dictated by
the lack of interesting examples in our areas of application. Hence no confusion should
arise, and our ergodic chains certainly coincide with those of Feller [114], Nummelin
[303] and Revuz [326]. The latter two books also have excellent treatments of ratio
limit theorems.
We have no examples in this chapter. This is deliberate. We have shown in Chap-
ter 11 how to classify specific models as positive recurrent using drift conditions: we
can say little else here other than that we now know that such models converge in the
relatively strong total variation norm to their stationary distributions. Over the course
of the next three chapters, we will however show that other much stronger ergodic
properties hold under other more restrictive drift conditions; and most of the models
in which we have been interested will fall into these more strongly stable categories.
Commentary for the second edition: We wrote in Section 13.2 that we will re-
grettably do far less than justice to the full power of renewal and regenerative processes,
or to the coupling method itself. It is true that the proof of ergodicity in this chapter
and the refinements that follow can be streamlined by using the split chain machinery
more fully. In particular, rather than prove a renewal theorem such as (13.31) and
then use this to prove an ergodic theorem such as Proposition 13.2.4, it is far simpler
to use coupling to prove the ergodic theorem directly as in [127, 128]. See also the
aforementioned book by Lindvall on the coupling method [239].
Chapter 14

f-Ergodicity and f-regularity

For a measurable function f ≥ 1 on X, the f-norm of a signed measure ν is defined as

‖ν‖_f := sup_{g:|g|≤f} |ν(g)|.

The main result of this chapter is the following.

Theorem 14.0.1 (f-Norm Ergodic Theorem). Suppose that the chain Φ is ψ-irreducible and aperiodic, and let f ≥ 1 be a measurable function on X. Then the following conditions are equivalent:

(i) The chain is positive recurrent with invariant probability measure π and π(f) := ∫ π(dx) f(x) < ∞.

(ii) There exists some petite set C ∈ B(X) such that

sup_{x∈C} E_x[Σ_{k=0}^{τ_C−1} f(Φ_k)] < ∞.   (14.2)
(iii) There exists some petite set C and some extended-valued non-negative function V
satisfying V (x0 ) < ∞ for some x0 ∈ X, and
∆V (x) ≤ −f (x) + bIC (x), x ∈ X. (14.3)
Any of these three conditions imply that the set SV = {x : V (x) < ∞} is absorbing
and full, where V is any solution to (14.3) satisfying the conditions of (iii), and any
sublevel set of V satisfies (14.2); and for any x ∈ SV ,
‖P^n(x, ·) − π‖_f → 0   (14.4)

as n → ∞. Moreover, if π(V) < ∞, then there exists a finite constant B_f such that for all x ∈ S_V,

Σ_{n=0}^{∞} ‖P^n(x, ·) − π‖_f ≤ B_f (V(x) + 1).   (14.5)
Proof The equivalence of (i) and (ii) follows from Theorem 14.1.1 and Theo-
rem 14.2.11. The equivalence of (ii) and (iii) is in Theorems 14.2.3 and 14.2.4, and the
fact that sublevel sets of V are “self-regular” as in (14.2) is shown in Theorem 14.2.3.
The limit theorems are then contained in Theorems 14.3.3, 14.3.4 and 14.3.5.
Much of this chapter is devoted to proving this result, and related f -regularity prop-
erties which follow from (14.2), and the pattern is not dissimilar to that in the previous
chapter: indeed, those ergodicity results, and the equivalences in Theorem 13.0.1, can
be viewed as special cases of the general f results we now develop.
The f -norm limit (14.4) obviously implies that the simpler limit (14.1) also holds.
In fact, if g is any function satisfying |g| ≤ c(f + 1) for some c < ∞ then E_x[g(Φ_k)] → ∫ g dπ
for states x with V (x) < ∞, for V satisfying (14.3). We formalize the behavior we will
analyze in
f -Ergodicity
We shall say that the Markov chain Φ is f-ergodic if f ≥ 1 and, for every initial state x,

lim_{k→∞} ‖P^k(x, ·) − π‖_f = 0.
The f -Norm Ergodic Theorem states that if any one of the equivalent conditions of
the Aperiodic Ergodic Theorem holds then the simple additional condition that π(f ) is
finite is enough to ensure that a full absorbing set exists on which the chain is f -ergodic.
Typically the way in which finiteness of π(f ) would be established in an application is
through finding a test function V satisfying (14.3): and if, as will typically happen, V
is finite everywhere then it follows that the chain is f -ergodic without restriction, since
then SV = X.
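As a small numerical illustration (ours, not from the text) of f-norm convergence: for a reflected walk on {0, . . . , 20} with downward drift, the norm ‖P^n(x, ·) − π‖_f with f(y) = 1 + y can be computed directly and decays to zero. On a countable space the supremum in the definition of the f-norm is attained, so that ‖µ‖_f = Σ_y f(y)|µ(y)|.

import numpy as np

# A reflected walk on {0,...,20} with downward drift, started far from the bulk of pi.
K = 20
P = np.zeros((K + 1, K + 1))
for i in range(K + 1):
    P[i, max(i - 1, 0)] += 0.6
    P[i, min(i + 1, K)] += 0.4

eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))]); pi /= pi.sum()

f = 1.0 + np.arange(K + 1)        # f(y) = 1 + y, growing with the state
row = np.eye(K + 1)[K]
for n in range(61):
    if n % 10 == 0:
        # On a countable space ||mu||_f = sum_y f(y)|mu(y)|.
        print(n, (f * np.abs(row - pi)).sum())
    row = row @ P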
and thus the finiteness of this expectation will guarantee convergence of the third term
in (14.6), as it did in the case of the ergodic theorems in Chapter 13. Also as in
Chapter 13, the central term in (14.6) is controlled by the convergence of the renewal
sequence u regardless of f , provided the expression in (14.7) is finite.
Thus it is only the first term in (14.6) that requires a condition other than ergodicity
and finiteness of (14.7). Somewhat surprisingly, for unbounded f this is a much more
troublesome term to control than for bounded f , when it is a simple consequence of
recurrence that it tends to zero. This first term can be expressed alternatively as

∫ λ(dx) ∫ {}_αP^n(x, dw) f(w) = E_λ[f(Φ_n)I(τ_α ≥ n)].   (14.8)
This is similar in form to (14.7), and if (14.9) is finite, then we have the desired con-
clusion that (14.8) does tend to zero. In fact, it is only the sum of these terms that
appears tractable, and for this reason it is in some ways more natural to consider the
summed form (14.5) rather than simple f -norm convergence.
Given this motivation to require finiteness of (14.7) and (14.9), we introduce the
concept of f -regularity which strengthens our definition of ordinary regularity.
f -Regularity
A set C ∈ B(X) is called f -regular, where f : X → [1, ∞) is a measurable
function, if for each B ∈ B+ (X),
sup_{x∈C} E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] < ∞.

A probability measure λ on B(X) is called f-regular if for each B ∈ B^+(X),

E_λ[Σ_{k=0}^{τ_B−1} f(Φ_k)] < ∞.
From this definition, an f-regular state, seen as a singleton set, is a state x for which

E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] < ∞,   B ∈ B^+(X).
As with regularity, this definition of f -regularity appears initially to be stronger
than required since it involves all sets in B + (X); but we will show this to be again
illusory.
A first consequence of f-regularity, and indeed of the weaker “self-f-regular” form in (14.2), is the following.

Proposition 14.1.1. Suppose that Φ is recurrent and that C ∈ B(X) satisfies

sup_{x∈C} E_x[Σ_{n=0}^{τ_C−1} f(Φ_n)] =: M_C < ∞.   (14.10)

Then π(f) < ∞.
Proof First of all, observe that under (14.10) the set C is Harris recurrent and
hence C ∈ B + (X) by Proposition 9.1.1. The invariant measure π then satisfies, from
Theorem 10.4.9,

π(f) = ∫_C π(dy) E_y[Σ_{n=0}^{τ_C−1} f(Φ_n)].
If C satisfies (14.10) then the expectation is uniformly bounded on C itself, so that
π(f ) ≤ π(C)MC < ∞.
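The representation used in this proof can be verified numerically on a finite state space, where the inner expectations solve a linear system. The following sketch (ours; the kernel, f, and the set C are arbitrary choices) checks the two sides of the formula from Theorem 10.4.9.

import numpy as np

P = np.array([[0.2, 0.5, 0.2, 0.1],
              [0.6, 0.1, 0.2, 0.1],
              [0.3, 0.3, 0.2, 0.2],
              [0.1, 0.4, 0.4, 0.1]])
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))]); pi /= pi.sum()

f = np.array([1.0, 2.0, 5.0, 3.0])      # any f >= 1
C, D = [0, 1], [2, 3]                   # the set C and its complement

# h(z) = E_z[ sum_{n=0}^{sigma_C - 1} f(Phi_n) ], zero on C; on D it solves
# h = f + P h, i.e. h_D = (I - P_DD)^{-1} f_D.
h = np.zeros(4)
h[D] = np.linalg.solve(np.eye(len(D)) - P[np.ix_(D, D)], f[D])

# E_y[ sum_{n=0}^{tau_C - 1} f(Phi_n) ] = f(y) + (P h)(y) for y in C.
g = f + P @ h
print(sum(pi[y] * g[y] for y in C), pi @ f)   # the two sides agree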
Although f -regularity is a requirement on the hitting times of all sets, when the
chain admits an atom it reduces to a requirement on the hitting times of the atom as
was the case with regularity.
Proposition 14.1.2. Suppose Φ is positive recurrent with π(f ) < ∞, and that an atom
α ∈ B+ (X) exists.
(i) Any set C ∈ B(X) is f-regular if and only if

sup_{x∈C} E_x[Σ_{k=0}^{σ_α} f(Φ_k)] < ∞.
(ii) There exists an increasing sequence of sets Sf (n) where each Sf (n) is f -regular
and the set Sf = ∪Sf (n) is full and absorbing.
When π(f) < ∞, by Theorem 11.3.5 the bound P G_α(x, f) ≤ G_α(x, f) + c holds for the constant c = E_α[Σ_{k=1}^{τ_α} f(Φ_k)] = π(f)/π(α) < ∞, which shows that the set {x : G_α(x, f) < ∞} is absorbing, and hence by Proposition 4.2.3 this set is full.
To prove (i), let B be any sublevel set of the function Gα (x, f ) with π(B) > 0 and
apply the bound
G_α(x, f) ≤ E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] + sup_{y∈B} E_y[Σ_{k=0}^{σ_α} f(Φ_k)].
This shows that Gα (x, f ) is bounded on C if C is f -regular, and proves the “only if”
part of (i).
We have from Theorem 10.4.9 that for any B ∈ B^+(X),

∞ > ∫_B π(dx) E_x[Σ_{k=0}^{τ_B} f(Φ_k)]
 ≥ ∫_B π(dx) E_x[I(σ_α < τ_B) Σ_{k=σ_α+1}^{τ_B} f(Φ_k)]
 = ∫_B π(dx) P_x(σ_α < τ_B) E_α[Σ_{k=1}^{τ_B} f(Φ_k)],
where to obtain the last equality we have conditioned at time σα and used the strong
Markov property.
Since α ∈ B^+(X) we have that

π(α) = ∫_B π(dx) E_x[Σ_{k=0}^{τ_B−1} I(Φ_k ∈ α)] > 0,

which shows that ∫_B π(dx) P_x(σ_α < τ_B) > 0. Hence from the previous bounds, we have E_α[Σ_{k=1}^{τ_B} f(Φ_k)] < ∞ for B ∈ B^+(X).
Using the bound τ_B ≤ σ_α + θ^{σ_α} τ_B, we have for arbitrary x ∈ X,

E_x[Σ_{k=0}^{τ_B} f(Φ_k)] ≤ E_x[Σ_{k=0}^{σ_α} f(Φ_k)] + E_α[Σ_{k=1}^{τ_B} f(Φ_k)].   (14.12)
Proof   From Proposition 14.1.2 (ii), the set of f-regular states S_f is absorbing and full when π(f) < ∞. If we can prove ‖P^k(x, ·) − π‖_f → 0 for x ∈ S_f, this will establish both (i) and (ii).

But this f-norm convergence follows from (14.6), where the first term tends to zero since x is f-regular, so that E_x[Σ_{n=1}^{τ_α} f(Φ_n)] < ∞; the third term tends to zero since Σ_{j=1}^{∞} t_f(j) = E_α[Σ_{n=1}^{τ_α} f(Φ_n)] = π(f)/π(α) < ∞; and the central term converges to zero by Lemma D.7.1 and the fact that α is an ergodic atom.
To prove the result in (iii), we use the same method of proof as for the ergodic case.
By the triangle inequality it suffices to assume that one of the initial distributions is δα .
We again use the first form of the Regenerative Decomposition Theorem to see that for any |g| ≤ f, x ∈ X, the sum

Σ_{n=1}^{∞} ∫ λ(dx) |P^n(x, g) − P^n(α, g)|

is bounded by the sum of the two terms

Σ_{n=1}^{∞} ∫ λ(dx) {}_αP^n(x, f),

[Σ_{n=1}^{∞} ∫ λ(dx) |a_x ∗ u(n) − u(n)|][Σ_{n=1}^{∞} {}_αP^n(α, f)].   (14.15)
The first of these is again finite since we have assumed λ to be f -regular; and in the
second, the right hand term is similarly finite since π(f ) < ∞, whilst the left hand term
is independent of f , and since λ is regular (given f ≥ 1), is bounded by Eλ [τα ]Var (u),
using (13.72).
Since for some finite M,

E_x[τ_α] ≤ E_x[Σ_{n=1}^{τ_α} f(Φ_n)] ≤ M G_α(x, f),
‖P^k(x, ·) − π‖_f → 0,   k → ∞,

and

Σ_{n=1}^{∞} ‖P^n(x, ·) − P^n(y, ·)‖_f < ∞.
Somewhat surprisingly, perhaps, this recipe does not work in a trivially easy way.
The most difficult step in this approach is that when we go to a split chain it is necessary
to consider an m-skeleton, but we do not yet know if the skeletons of an f -regular chain
are also f -regular. Such is indeed the case and we will prove this key result in the next
section, by exploiting drift criteria.
This may seem to be a much greater effort than we needed for the Aperiodic Ergodic
Theorem: but it should be noted that we devoted all of Chapter 11 to the equivalence
of regularity and drift conditions in the case of f ≡ 1, and the results here actually
require rather less effort. In fact, much of the work in this chapter is based on the
results already established in Chapter 11, and the duality between drift and regularity
established there will serve us well in this more complex case.
Lemma 14.2.1. Suppose that Φ is ψ-irreducible. If (14.16) holds for a positive function
V which is finite at some x0 ∈ X, then the set Sf := {x ∈ X : V (x) < ∞} is absorbing
and full.
The power of (V3) largely comes from the Comparison Theorem 14.2.2: if PV ≤ V − f + s for non-negative functions V, f, s, then for each x ∈ X, N ∈ Z_+, and any stopping time τ,

Σ_{k=0}^{N} E_x[f(Φ_k)] ≤ V(x) + Σ_{k=0}^{N} E_x[s(Φ_k)],

E_x[Σ_{k=0}^{τ−1} f(Φ_k)] ≤ V(x) + E_x[Σ_{k=0}^{τ−1} s(Φ_k)].
The first inequality in Theorem 14.2.2 bounds the mean value of f (Φk ), but says
nothing about the convergence of the mean value. We will see that the second bound
is in fact crucial for obtaining f -regularity for the chain, and we turn to this now.
In linking the drift condition (V3) with f -regularity we will consider the extended-
real-valued function G_C(x, f) defined in (11.21) as

G_C(x, f) := E_x[Σ_{k=0}^{σ_C} f(Φ_k)].   (14.18)
Theorem 14.2.3.

(i) If (V3) holds for a petite set C, then for any B ∈ B^+(X) there exists c(B) < ∞ such that

E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] ≤ V(x) + c(B).
(ii) If there exists one f -regular set C ∈ B+ (X), then C is petite and the function
V (x) = GC (x, f ) satisfies (V3) and is bounded on A for any f -regular set A.
Proof   (i) Suppose that (V3) holds, with C a ψ_a-petite set. By the Comparison Theorem 14.2.2 and Lemma 11.3.10 we have the bound

E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] ≤ V(x) + b E_x[Σ_{k=0}^{τ_B−1} I_C(Φ_k)]
 ≤ V(x) + b E_x[Σ_{k=0}^{τ_B−1} ψ_a(B)^{-1} K_a(Φ_k, B)]
 = V(x) + b ψ_a(B)^{-1} Σ_{i=0}^{∞} a_i E_x[Σ_{k=0}^{τ_B−1} I_B(Φ_{k+i})]
 ≤ V(x) + b ψ_a(B)^{-1} Σ_{i=0}^{∞} i a_i.

Since we can choose a so that m_a = Σ_{i=0}^{∞} i a_i < ∞ from Proposition 5.5.6, the result follows with c(B) = b ψ_a(B)^{-1} m_a. We then have

sup_{x∈A} E_x[Σ_{k=0}^{τ_B−1} f(Φ_k)] ≤ sup_{x∈A} V(x) + c(B),

so that A is f-regular whenever V is bounded on A.
Proof To see that (i) implies (ii), suppose that C is petite and satisfies (14.19).
By Theorem 11.3.5 we may find a constant b < ∞ such that (V3) holds for GC (x, f ).
It follows from Theorem 14.2.3 that C is f -regular.
The set C is Harris recurrent under the conditions of (i), and hence lies in B + (X)
by Proposition 9.1.1.
Conversely, if C is f-regular then it is also petite from Proposition 11.3.8, and if C ∈ B^+(X) then sup_{x∈C} E_x[Σ_{k=0}^{τ_C−1} f(Φ_k)] < ∞ by the definition of f-regularity.
As an easy corollary to Theorem 14.2.3 we obtain the following generalization of
Proposition 14.1.2.
Theorem 14.2.5. If there exists an f -regular set C ∈ B+ (X), then there exists an
increasing sequence {Sf (n) : n ∈ Z+ } of f -regular sets whose union is full. Hence there
is a decomposition
X = Sf ∪ N (14.20)
where the set Sf is full and absorbing and Φ restricted to Sf is f -regular.
Proof From Theorem 14.2.3 (i) we see that if (V3) holds for a finite-valued V then
each sublevel set of V is f -regular. This establishes f -regularity of Φ.
Conversely, if Φ is f -regular then it follows that an f -regular set C ∈ B + (X) ex-
ists. The function V (x) = GC (x, f ) is everywhere finite and satisfies (V3), by Theo-
rem 14.2.3 (ii).
As a corollary to Theorem 14.2.6 we obtain a final characterization of f -regularity
of Φ, this time in terms of petite sets:
Theorem 14.2.7. Suppose that Φ is ψ-irreducible. Then the chain is f-regular if and only if there exists a petite set C such that the expectation

E_x[Σ_{k=0}^{τ_C−1} f(Φ_k)]

is finite for each x ∈ X, and bounded on C.
This is essentially of the same form as (14.16), and provides an approach to f -regularity
for the m-skeleton which will give us the desired equivalence between f -regularity for
Φ and its skeletons.
To apply Theorem 14.2.3 and (14.21) to obtain an equivalence between f -properties
of Φ and its skeletons we must replace the function Σ_{i=0}^{m−1} P^i I_C with the indicator function of a petite set. The following result shows that this is possible whenever C is petite and the chain is aperiodic.

Let us write for any positive function g on X,

g^{(m)} := Σ_{i=0}^{m−1} P^i g.   (14.22)
Lemma 14.2.8. If Φ is aperiodic and if C ∈ B(X) is a petite set, then for any ε > 0 and m ≥ 1 there exists a petite set C_ε such that

I_C^{(m)} ≤ m I_{C_ε} + ε.
Proof   Since Φ is aperiodic, it follows from the definition of the period given in (5.40) and the fact that petite sets are small, proven in Proposition 5.5.7, that for a non-trivial measure ν and some k ∈ Z_+, we have the simultaneous bound

P^{km−i}(x, B) ≥ I_C(x) ν(B),   x ∈ X, B ∈ B(X), 0 ≤ i ≤ m − 1.

Hence we also have

P^{km}(x, B) ≥ P^i I_C(x) ν(B),   x ∈ X, B ∈ B(X), 0 ≤ i ≤ m − 1,

which shows that

P^{km}(x, ·) ≥ I_C^{(m)}(x) m^{-1} ν.

The set C_ε := {x : I_C^{(m)}(x) ≥ ε} is therefore ν_k-small for the m-skeleton, where ν_k = ε m^{-1} ν, whenever this set is non-empty. Moreover, C ⊂ C_ε for all ε < 1.

Since I_C^{(m)} ≤ m everywhere, and since I_C^{(m)}(x) < ε for x ∈ C_ε^c, we have the bound

I_C^{(m)} ≤ m I_{C_ε} + ε.
We can now put these pieces together and prove the desired solidarity for Φ and its
skeletons.
Theorem 14.2.9. Suppose that Φ is ψ-irreducible and aperiodic. Then C ∈ B+ (X) is
f -regular if and only if it is f (m ) -regular for any one, and then every, m-skeleton chain.
Proof   If C is f^{(m)}-regular for an m-skeleton then, letting τ_B^m denote the hitting time for the skeleton, we have by the Markov property, for any B ∈ B^+(X),

E_x[Σ_{k=0}^{τ_B^m−1} Σ_{i=0}^{m−1} P^i f(Φ_{km})] = E_x[Σ_{k=0}^{τ_B^m−1} Σ_{i=0}^{m−1} f(Φ_{km+i})]
 ≥ E_x[Σ_{j=0}^{τ_B−1} f(Φ_j)].
By the assumption of f (m ) -regularity, the left hand side is bounded over C and hence
the set C is f -regular.
Conversely, if C ∈ B+ (X) is f -regular then it follows from Theorem 14.2.3 that (V3)
holds for a function V which is bounded on C.
By repeatedly applying P to both sides of this inequality we obtain as in (14.21)

P^m V ≤ V − f^{(m)} + b I_C^{(m)}.

Applying Lemma 14.2.8 with ε = 1/(2b), and using f^{(m)} ≥ 1, we find a petite set C_ε for which

P^m V ≤ V − f^{(m)} + b m I_{C_ε} + 1/2
 ≤ V − (1/2) f^{(m)} + b m I_{C_ε},
and thus (V3) holds for the m-skeleton. Since V is bounded on C, we see from Theo-
rem 14.2.3 that C is f (m ) -regular for the m-skeleton.
As a simple but critical corollary we have
Theorem 14.2.10. Suppose that Φ is ψ-irreducible and aperiodic. Then Φ is f -regular
if and only if each m-skeleton is f (m ) -regular.
The importance of this result is that it allows us to shift our attention to skeleton
chains, one of which is always strongly aperiodic and hence may be split to form an
artificial atom; and this of course allows us to apply the results obtained in Section 14.1
for chains with atoms.
The next result follows this approach to obtain a converse to Proposition 14.1.1,
thus extending Proposition 14.1.2 to the non-atomic case.
Theorem 14.2.11. Suppose that Φ is positive recurrent and π(f ) < ∞. Then there
exists a sequence {Sf (n)} of f -regular sets whose union is full.
Proof   We need only look at a split chain corresponding to the m-skeleton chain, which possesses an f^{(m)}-regular atom by Proposition 14.1.2. It follows from Proposition 14.1.2 that for the split chain the required sequence of f^{(m)}-regular sets exists, and then following the proof of Proposition 11.1.3 we see that for the m-skeleton an increasing sequence {S_f(n)} of f^{(m)}-regular sets exists whose union is full.
From Theorem 14.2.9 we have that each of the sets {Sf (n)} is also f -regular for Φ
and the theorem is proved.
Proof (i) By positive recurrence we have for x lying in the maximal Harris set H,
and any m ∈ Z+ ,
lim inf_{k→∞} P^k(x, f) ≥ lim inf_{k→∞} P^k(x, m ∧ f) = π(m ∧ f),

and the result follows on letting m → ∞.
Result (ii) is now obvious using the split chain, given the results for a chain possessing
an atom, and (iii) follows directly from (ii).
We again obtain f -ergodic theorems for general aperiodic Φ by considering the m-
skeleton chain. The results obtained in the previous section show that when Φ has
appropriate f -properties then so does each m-skeleton. For aperiodic chains, there
always exists some m ≥ 1 such that the m-skeleton is strongly aperiodic, and hence
we may apply Theorem 14.3.1 to the m-skeleton chain to obtain f -ergodicity for this
skeleton. This then carries over to the process by considering the m distinct skeleton
chains embedded in Φ.
The following lemma allows us to make the desired connections between Φ and its
skeletons.
Lemma 14.3.2. (i) For any f ≥ 1 we have for n ∈ Z_+,

‖P^n(x, ·) − π‖_f ≤ ‖P^{km}(x, ·) − π‖_{f^{(m)}},

for k satisfying n = km + i with 0 ≤ i ≤ m − 1.

(ii) If for some m ≥ 1 and some x ∈ X we have

‖P^{km}(x, ·) − π‖_{f^{(m)}} → 0 as k → ∞,

then

‖P^k(x, ·) − π‖_f → 0 as k → ∞.

(iii) If the m-skeleton is f^{(m)}-ergodic, then Φ itself is f-ergodic.
Proof   Under the conditions of (i) let |g| ≤ f and write any n ∈ Z_+ as n = km + i with 0 ≤ i ≤ m − 1. Then, using the invariance of π,

|P^n(x, g) − π(g)| = |P^{km}(x, P^i g) − π(P^i g)| ≤ ‖P^{km}(x, ·) − π‖_{f^{(m)}},

since |P^i g| ≤ P^i f ≤ f^{(m)}. This proves (i), and the remaining results then follow.
This lemma and the ergodic theorems obtained for strongly aperiodic chains finally
give the result we seek.
Theorem 14.3.3. Suppose that Φ is positive recurrent and aperiodic.
(i) If π(f ) = ∞, then P k (x, f ) → ∞ for all x.
(ii) If π(f) < ∞, then the set S_f of f-regular states is full and absorbing, and if x ∈ S_f then

‖P^k(x, ·) − π‖_f → 0,   as k → ∞.
(iii) If Φ is f -regular, then Φ is f -ergodic. Conversely, if Φ is f -ergodic, then Φ
restricted to a full absorbing set is f -regular.
where V ( · ) = GC ( · , f ).
Proof Consider first the strongly aperiodic case, and construct a split chain Φ̌
using an f -regular set C. The theorem is valid from Theorem 14.1.3 for the split chain,
since the split measures µ∗ , λ∗ are f -regular for Φ̌. The bound on the sum can be taken
as
Σ_{n=1}^{∞} ∫∫ λ^∗(dx) µ^∗(dy) ‖P̌^n(x, ·) − P̌^n(y, ·)‖_f < M_f (λ^∗(V) + µ^∗(V) + 1),
and the analogous identity for µ, we see that the required bound holds in the strongly
aperiodic case.
In the arbitrary aperiodic case we can apply Lemma 14.3.2 to move to a skeleton
chain, as in the proof of Theorem 14.3.3.
The most interesting special case of this result is given in the following theorem.
where V ( · ) = GC ( · , f ).
Our final f -ergodic result, for quite arbitrary positive recurrent chains is given for
completeness in
Theorem 14.3.6. (i) If Φ is positive recurrent and if π(f ) < ∞, then there exists
a full set Sf , a cycle {Di : 1 ≤ i ≤ d} contained in Sf , and probabilities {πi : 1 ≤
i ≤ d} such that for any x ∈ Dr ,
‖P^{nd+r}(x, ·) − π_r‖_f → 0,   n → ∞,   (14.26)

and

d^{-1} Σ_{r=1}^{d} ‖P^{nd+r}(x, ·) − π‖_f → 0,   n → ∞.   (14.27)
Proof For π-a.e. x ∈ X we have from the Comparison Theorem 14.2.2, Theo-
rem 14.3.6 and (if π(f ) = ∞) the aperiodic version of Theorem 14.3.3, whether or not
π(s) < ∞,
π(f) = lim_{N→∞} (1/N) Σ_{k=1}^{N} E_x[f(Φ_k)] ≤ lim_{N→∞} (1/N) Σ_{k=1}^{N} E_x[s(Φ_k)] = π(s).
The criterion for π(X) < ∞ in Theorem 11.0.1 is a special case of this result. How-
ever, it seems easier to prove for quite arbitrary non-negative f, s using these limiting
results.
for some finite c, d. We can rewrite (14.28) in the form of (V3); namely, for some c > 0 and all large enough x,

∫ P(x, dy) y^k ≤ x^k − c x^{k−1}.
Proposition 14.4.1. If the increment distribution Γ has mean β < 0 and finite (k+1)st
moment, then the associated random walk on a half line is |x|k -regular. Hence the
process Φ admits a stationary measure π with finite moments of order k; and with
fk (y) = y k + 1,
(i) for all λ such that ∫ λ(dx) x^{k+1} < ∞,

∫ λ(dx) ‖P^n(x, ·) − π‖_{f_k} → 0,   n → ∞;
Proof The calculations preceding the proposition show that for some c0 > 0,
d0 < ∞, and a compact set C ⊂ R+ ,
Since this V is a coercive function on R, it follows that (V3) holds with the choice of
f (x) = 1 + δV (x)
Then the bilinear model is positive Harris, the invariant measure π also has finite k-th moments (that is, satisfies ∫ x^k π(dx) < ∞), and

‖P^n(x, ·) − π‖_{x^k} → 0,   n → ∞.
In the next chapter we will show that there is in fact a geometric rate of convergence
in this result. This will show that, in essence, the same drift condition gives us finiteness
of moments in the stationary case, convergence of time-dependent moments and some
conclusion about the rate at which the moments become stationary.
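These drift calculations are simple to check numerically. The sketch below (ours; the increment law, V, f, and the set C are illustrative choices satisfying the stated moment conditions) verifies the drift inequality (V3) for a random walk on the half line with V(x) = x², so that k = 2 in the notation above.

import numpy as np

# Random walk on the half line: X_{n+1} = max(X_n + W, 0), with E[W] < 0.
w_vals = np.array([-2, -1, 0, 1])
w_prob = np.array([0.2, 0.4, 0.2, 0.2])     # mean beta = -0.6, all moments finite

V = lambda x: float(x) ** 2                 # V(x) = x^2, so k = 2
f = lambda x: 1.0 + 0.1 * x                 # f comparable to x^{k-1}
b, C_top = 10.0, 2                          # C = [0, 2] works for these numbers

def drift_V(x):
    return sum(q * V(max(x + w, 0)) for w, q in zip(w_vals, w_prob)) - V(x)

# Away from the boundary Delta V(x) = 2 beta x + E[W^2], which is eventually
# below -f(x); the petite set C absorbs the finitely many exceptions.
for x in range(200):
    assert drift_V(x) <= -f(x) + b * (x <= C_top) + 1e-9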
Theorem 14.5.1. Provided Γ has a finite mean β and is not concentrated on a lattice
nh, n ∈ Z+ , h > 0, then for any interval [a, b] and any initial distribution Γ0
Γ_0 ∗ U[a + t, b + t] → β^{-1}(b − a),   t → ∞.   (14.35)
Proof This result is taken from Feller ([115], p. 360) and its proof is not one we
pursue here. We do note that it is a special case of the general Key Renewal Theorem,
which states that under these conditions on Γ, (14.34) holds for all bounded non-negative
functions f which are directly Riemann integrable, for which again see Feller ([115], p.
361); for then (14.35) is the special case with f (s) = I[a,b] (s).
This result shows us the pattern for renewal theorems: in the limit, the measure U
approximates normalized Lebesgue measure.
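The limit (14.35) is easy to see by simulation. In the following sketch (ours; the gamma increment law is an arbitrary spread-out choice with mean β) a Monte Carlo estimate of U[t, t + h] is compared with h/β.

import numpy as np

rng = np.random.default_rng(2)
beta = 1.7                                   # mean of the spread-out increment law

def renewals_in(t, h, trials=5_000):
    # Monte Carlo estimate of U[t, t + h], the expected number of renewals there.
    count = 0
    for _ in range(trials):
        s = 0.0
        while s < t + h:
            if s >= t:
                count += 1
            s += rng.gamma(2.0, beta / 2.0)  # absolutely continuous, mean beta
    return count / trials

print(renewals_in(50.0, 5.0), 5.0 / beta)    # U[t, t+h] ~ h / beta for large t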
We now show that one can trade off properties of Γ against properties of f (and to
some extent properties of Γ0 ) in asserting (14.34). We shall give a proof, based on the
ergodic properties we have been considering for Markov chains, of the following Uniform
Key Renewal Theorem.
Theorem 14.5.2. Suppose that Γ has a finite mean β and is spread out (as defined in
(RW2)).
(i) For any initial distribution Γ_0 we have the uniform convergence

lim_{t→∞} sup_{|g|≤f} |Γ_0 ∗ U ∗ g(t) − β^{-1} ∫_0^∞ g(s) ds| = 0   (14.36)

for any function f satisfying

f is bounded;   (14.37)
f is Lebesgue integrable;   (14.38)
f(t) → 0,   t → ∞.   (14.39)
(ii) In particular, for any bounded interval [a, b] and Borel sets B,

lim_{t→∞} sup_{B⊆[a,b]} |Γ_0 ∗ U(t + B) − β^{-1} µ_Leb(B)| = 0.   (14.40)
(iii) For any initial distribution Γ0 which is absolutely continuous, the convergence
(14.36) holds for f satisfying only (14.37) and (14.38).
Proof The proof of this set of results occupies the remainder of this section, and
contains a number of results of independent interest.
Before embarking on this proof, we note explicitly that we have accomplished a
number of tradeoffs in this result, compared with the Blackwell Renewal Theorem.
By considering spread-out distributions, we have exchanged the direct Riemann in-
tegrability condition for the simpler and often more verifiable smoothness conditions
(14.37)-(14.39). This is exemplified by the fact that (14.40) allows us to consider the
renewal measure of any bounded Borel set, whereas the general Γ version restricts us
to intervals as in (14.35). The extra benefits of smoothness of Γ0 in removing (14.39)
as a condition are also in this vein.
Moreover, by moving to the class of spread-out distributions, we have introduced a
uniformity into the Key Renewal Theorem which is analogous in many ways to the total
variation norm result in Markov chain limit theory. This analogy is not coincidental:
as we now show, these results are all consequences of precisely that total variation
convergence for the forward recurrence time chain associated with this renewal process.
Recall from Section 3.5.3 the forward recurrence time process V^+(t) := inf(Z_n − t : Z_n > t), t ≥ 0, where {Z_n} denotes the sequence of renewal times. We consider the skeleton chain V^+_δ := {V^+(nδ) : n ∈ Z_+} for that process, and denote its n-step transition law by P^{nδ}(x, ·). We showed that for sufficiently small δ, when Γ is spread out, the set [0, δ] is a small set for V^+_δ (Proposition 5.3.3), and V^+_δ is also aperiodic (Proposition 5.4.7).
It is trivial for this chain to see that (V2) holds with V (x) = x, so that the chain
is regular from Theorem 11.3.15, and if Γ0 has a finite mean, then Γ0 is regular from
Theorem 11.3.12.
This immediately enables us to assert from Theorem 13.4.4 that, if Γ1 , Γ2 are two
initial measures both with finite mean, and if Γ itself is spread out with finite mean,
Σ_{n=0}^{∞} ‖Γ_1 P^{nδ}(·) − Γ_2 P^{nδ}(·)‖ < ∞.   (14.41)
n =0
The crucial corollary to this example of Theorem 13.4.4, which leads to the Uniform
Key Renewal Theorem is
Proposition 14.5.3. If Γ is spread out with finite mean, and if Γ1 , Γ2 are two initial
measures both with finite mean, then
‖Γ_1 ∗ U − Γ_2 ∗ U‖ := ∫_0^∞ |Γ_1 ∗ U(dt) − Γ_2 ∗ U(dt)| < ∞.   (14.42)
Proof   Breaking the renewal measures over the intervals [nδ, (n + 1)δ), we have

‖Γ_1 ∗ U − Γ_2 ∗ U‖ = Σ_{n=0}^{∞} ∫_{[0,δ)} |(Γ_1 P^{nδ} − Γ_2 P^{nδ}) ∗ U(dt)|
 ≤ Σ_{n=0}^{∞} ∫_{[0,δ)} ∫_{[0,t]} |(Γ_1 P^{nδ} − Γ_2 P^{nδ})(du)| U(dt − u)   (14.44)
 ≤ Σ_{n=0}^{∞} ∫_{[0,δ)} |(Γ_1 P^{nδ} − Γ_2 P^{nδ})(du)| U[0, δ)
 ≤ U[0, δ) Σ_{n=0}^{∞} ‖Γ_1 P^{nδ} − Γ_2 P^{nδ}‖,

which is finite by (14.41), since U[0, δ) < ∞.
Proposition 14.5.4. If Γ is spread out with finite mean, and if Γ_1, Γ_2 are two initial measures both with finite mean, then

lim_{t→∞} sup_{|g|≤f} |Γ_1 ∗ U ∗ g(t) − Γ_2 ∗ U ∗ g(t)| = 0   (14.45)

for any f satisfying (14.37)–(14.39).
Proof   Suppose that ε is arbitrarily small but fixed. Using Proposition 14.5.3 we can fix T such that

∫_T^∞ |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| ≤ ε.   (14.46)

From (14.39) we can then choose t large enough that

f(t − u) ≤ ε,   u ∈ [0, T];
for such a t, writing d = sup f(x) < ∞ from (14.37), it follows that for any g with |g| ≤ f,

|Γ_1 ∗ U ∗ g(t) − Γ_2 ∗ U ∗ g(t)| ≤ ∫_0^T |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| f(t − u)
 + ∫_T^t |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| f(t − u)   (14.47)
 ≤ ε ‖Γ_1 ∗ U − Γ_2 ∗ U‖ + ε d =: ε′,

and since ε is arbitrary, the result follows.
Proposition 14.5.5. If Γ is spread out with finite mean, and if Γ_1, Γ_2 are any two initial measures, then the conclusion (14.45) continues to hold.

Proof   For fixed v, let Γ^v(A) := Γ(A)/Γ[0, v], A ⊆ [0, v], denote the truncation of Γ to [0, v].
For any g with |g| ≤ f,

|Γ_1 ∗ U ∗ g(t) − Γ_1^v ∗ U ∗ g(t)| ≤ ‖Γ_1 − Γ_1^v‖ sup_x U ∗ f(x),   (14.48)
which can be made smaller than ε by choosing v large enough, provided sup_x U ∗ f(x) < ∞. But if t > T, from (14.47), with Γ_1 = δ_0, Γ_2 = Γ_e^v and g = f,

U ∗ f(t) = δ_0 ∗ U ∗ f(t)
 ≤ Γ_e^v ∗ U ∗ f(t) + ε′
 ≤ Γ_e[0, v]^{-1} Γ_e ∗ U ∗ f(t) + ε′   (14.49)
 ≤ Γ_e[0, v]^{-1} β^{-1} ∫_0^∞ f(u) du + ε′,

so that sup_x U ∗ f(x) < ∞ as required.
Finally, we reconsider the first term in (14.47) without assuming (14.39). Setting A_ε(t) := {u ∈ [0, T] : f(t − u) > ε}, we have

∫_0^T |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| f(t − u) ≤ ∫_0^T |(Γ_1 ∗ U − Γ_2 ∗ U)(du)| f(t − u) I_{[A_ε(t)]^c}(u)
 + ∫_0^T (Γ_1 ∗ U + Γ_2 ∗ U)(du) f(t − u) I_{A_ε(t)}(u)   (14.50)
 ≤ ε ‖Γ_1 ∗ U − Γ_2 ∗ U‖ + d (Γ_1 + Γ_2) ∗ U(A_ε(t)).
If we now assume the measure Γ_1 + Γ_2 to be absolutely continuous with respect to µ_Leb, then so is (Γ_1 + Γ_2) ∗ U ([115], p. 146).
Now since f is integrable, as t → ∞ for fixed T, ε we must have µ_Leb(A_ε(t)) → 0. But since T is fixed, we have that both µ_Leb[0, T] < ∞ and (Γ_1 + Γ_2) ∗ U[0, T] < ∞, and it is a standard result of measure theory ([152], p. 125) that

(Γ_1 + Γ_2) ∗ U(A_ε(t)) → 0,   t → ∞.
We can thus make the last term in (14.50) arbitrarily small for large t, even without assuming (14.39); now reconsidering (14.47), we see that Proposition 14.5.4 holds without (14.39), provided we assume the existence of densities for Γ_1 and Γ_2, and then Theorem 14.5.2 (iii) follows by the truncation argument of Proposition 14.5.5.
14.6 Commentary*
These results are largely recent. Although the question of convergence of Ex [f (Φk )] for
general f occurs in, for example, Markov reward models [25], most of the literature
on Harris chains has concentrated on convergence only for f ≤ 1 as in the previous
chapter. The results developed here are a more complete form of those in Meyn and
Tweedie [277], but there the general aperiodic case was not developed: only the strongly
aperiodic case is considered in detail. A more embryonic form of the convergence in
f -norm, indicating that if π(f ) < ∞ then Ex [f (Φk )] → π(f ), appeared as Theorem 2
of Tweedie [400].
Nummelin [303] considers f -regularity, but does not go on to apply the resulting
concepts to f -ergodicity, although in fact there are connections between the two which
are implicit through the Regenerative Decomposition in Nummelin and Tweedie [307].
That Theorem 14.1.1 admits a converse, so that when π(f ) < ∞ there exists a
sequence of f -regular sets {Sf (n)} whose union is full, is surprisingly deep. For general
state space chains, the question of the existence of f -regular sets requires the splitting
technique as did the existence of regular sets in Chapter 11. The key to their use
in analyzing chains which are not strongly aperiodic lies in the duality with the drift
condition (V3), and this is given here for the first time.
The fact that (V3) gives a criterion for finiteness of π(f ) was observed in
Tweedie [400]. Its use for asserting the second order stationarity of bilinear and other
time series models was developed in Feigin and Tweedie [111], and for analyzing random
walk in [401]. Related results on the existence of moments are also in Kalashnikov [188].
The application to the generalized Key Renewal Theorem is particularly satisfying.
By applying the ergodic theorems above to the forward recurrence time chain V^+_δ, we have “leveraged” from the discrete time renewal theory results of Section 13.2 to
the continuous time ones through the general Markov chain results. This Markovian
approach was developed in Arjas et al. [8], and the uniformity in Theorem 14.5.2,
which is a natural consequence of this approach, seems to be new there. The simpler
form without the uniformity, showing that one can exchange spread-outness of Γ for the
weaker conditions on f dates back to the original renewal theorems of Smith [361, 362,
363], whilst Breiman [47] gives a form of Theorem 14.5.2 (b). An elegant and different
approach is also possible through Stone’s Decomposition of U [374], which shows that
when Γ is spread out,

U = U_f + U_c,

where U_f is a finite measure, and U_c has a density p with respect to µ_Leb satisfying p(t) → β^{-1} as t → ∞.
The convergence, or rather summability, of the quantities ‖P^n(x, ·) − π‖_f
leads naturally to a study of rates of convergence, and this is carried out in Nummelin
and Tuominen [306]. Building on this, Tweedie [401] uses similar approaches to those
in this chapter to derive drift criteria for more subtle rate of convergence results: the
interested reader should note the result of Theorem 3 (iii) of [401]. There it is shown
(essentially by using the Comparison Theorem) that if (V3) holds for a function f such
that
f(x) ≥ E_x[r(τ_C)],   x ∈ C^c,
Commentary for the second edition: Several topics in this chapter have been
extended, or refined in specific applications, since publication of the first edition.
f -Regularity in queueing networks is the subject of [81, 264, 268, 266] – see also the
monograph [267]. The Comparison Theorem 14.2.2 is implicit in the stability analysis of
Tassiulas’s MaxWeight scheduling algorithm, now popular for routing and scheduling in
queueing networks [383, 137, 382, 268, 266, 267], and a version of Theorem 14.2.2 is used
in [145] in an early “heavy traffic” analysis of a queueing network. The Comparison
Theorem is also a component of the approach to network stability and performance
approximation developed in [273, 226, 223, 30, 31, 267]. In [81] the assumptions of
[393] are verified, provided an associated fluid model for the network is stable. This
establishes f -regularity for the network for polynomial f , as well as polynomial rates
of convergence in the f -Norm Ergodic Theorem 14.0.1.
Theory surrounding f -regularity is applied in the theory of controlled Markov models
(Markov decision processes, or MDPs) in [262, 261, 67, 263, 42, 267]. In particular, [42]
characterizes a notion of uniform f -regularity for MDPs.
Recently, Jarner and Roberts introduced a new drift criterion that can be used
to simplify the verification of polynomial rates of convergence [180]. Extensions of this
approach as well as explicit bounds on the rate of convergence are obtained in [126, 100].
The drift criterion of [180] can be expressed as an intermediate between the drift criteria (V3) and (V4):

(V4′)   ΔV(x) ≤ −β V^α(x) + b I_C(x),   x ∈ X,

for constants β > 0, b < ∞, an exponent α ∈ (0, 1), and a set C ∈ B(X).

For example, if the inter-arrival times in the GI/M/1 queue possess a finite nth moment, then (V4′) holds with V(x) = 1 + x^n and α = 1 − n^{-1}.

We consider the special case α = 1/2 to illustrate the application of (V4′):
Proposition 14.6.1. Suppose that the chain Φ is ψ-irreducible and aperiodic, and that the drift condition (V4′) holds for some extended-real-valued function V satisfying V(x_0) < ∞ for some x_0 ∈ X, with C petite, and α = 1/2. Then there exists a finite constant B_1 such that for all x ∈ S_V,

Σ_{n=0}^{∞} ‖P^n(x, ·) − π‖ ≤ B_1 √V(x).   (14.52)
Proof   We establish the assumptions of part (iii) of the f-Norm Ergodic Theorem 14.0.1, with f ≡ 1. For this it is sufficient to show that the function U := 2β^{-1} V^{1/2} satisfies (V3). By (V4′) and Jensen's inequality,

P V^{1/2}(x) ≤ (P V(x))^{1/2} ≤ V^{1/2}(x) (1 + [−β V^{1/2}(x) + b I_C(x)]/V(x))^{1/2}.

Concavity of the square root gives the bound √(1 + x) ≤ 1 + x/2. Combining this with the previous bound we obtain

P V^{1/2}(x) ≤ V^{1/2}(x) (1 + (1/2)[−β V^{1/2}(x) + b I_C(x)]/V(x))
 = V^{1/2}(x) + (1/2)[−β V^{1/2}(x) + b I_C(x)]/V^{1/2}(x),

so that

ΔU ≤ −1 + β^{-1} V^{-1/2}(x) b I_C(x) ≤ −1 + β^{-1} b I_C(x),

since V ≥ 1. Hence (V3) holds for U with f ≡ 1, and the proposition follows.

Chapter 15
Geometric ergodicity
The previous two chapters have shown that for positive Harris chains, convergence of
Ex [f (Φk )] is guaranteed from almost all initial states x provided only π(f ) < ∞. Strong
though this is, for many models used in practice even more can be said: there is often
a rate of convergence ρ such that

‖P^n(x, ·) − π‖_f = o(ρ^n),
where the rate ρ < 1 can be chosen essentially independent of the initial point x.
The purpose of this chapter is to give conditions under which convergence takes
place at such a uniform geometric rate. Because of the power of the final form of these
results, and the wide range of processes for which they hold (which include many of those
already analyzed as ergodic) it is not too strong a statement that this “geometrically
ergodic” context constitutes the most useful of all of those we present, and for this
reason we have devoted two chapters to this topic.
The following result summarizes the highlights of this chapter, where we focus on
bounds such as (15.4) and the strong relationship between such bounds and the drift
criterion given in (15.3). In Chapter 16 we will explore a number of examples in de-
tail, and describe techniques for moving from ergodicity to geometric ergodicity. The
development there is based primarily on the results of this chapter, and also on an in-
terpretation of the geometric convergence (15.4) in terms of convergence of the kernels
{P k } in a certain induced operator norm.
Theorem 15.0.1 (Geometric Ergodic Theorem). Suppose that the chain Φ is ψ-
irreducible and aperiodic. Then the following three conditions are equivalent:
(i) The chain Φ is positive recurrent with invariant probability measure π, and there
exists some ν-petite set C ∈ B^+(X), ρ_C < 1, M_C < ∞, and P^∞(C) > 0 such that
for all x ∈ C
|P^n(x, C) − P^∞(C)| ≤ M_C ρ_C^n.   (15.1)
(ii) There exists some petite set C ∈ B(X) and κ > 1 such that

sup_{x∈C} E_x[κ^{τ_C}] < ∞.   (15.2)
(iii) There exists a petite set C, constants b < ∞, β > 0 and a function V ≥ 1 finite
at some one x0 ∈ X satisfying
ΔV(x) ≤ −β V(x) + b I_C(x),   x ∈ X.   (15.3)
Any of these three conditions imply that the set S_V = {x : V(x) < ∞} is absorbing and full, where V is any solution to (15.3) satisfying the conditions of (iii), and there then exist constants r > 1, R < ∞ such that for any x ∈ S_V

Σ_n r^n ‖P^n(x, ·) − π‖_V ≤ R V(x).   (15.4)
Proof The equivalence of the local geometric rate of convergence property in (i)
and the self-geometric recurrence property in (ii) will be shown in Theorem 15.4.3.
The equivalence of the self-geometric recurrence property and the existence of so-
lutions to the drift equation (15.3) is completed in Theorems 15.2.6 and 15.2.4. It is
in Theorem 15.4.1 that this is shown to imply the geometric nature of the V -norm
convergence in (15.4), while the upper bound on the right hand side of (15.4) follows
from Theorem 15.3.3.
The notable points of this result are that we can use the same function V in (15.4),
which leads to the operator norm results in the next chapter; and that the rate r in
(15.4) can be chosen independently of the initial starting point.
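As a numerical aside, the content of the theorem is easy to see on a finite state space, where every set is petite and (15.4) always holds. The following minimal Python sketch uses a hypothetical three-state transition matrix (not taken from the text): the total variation distance to π decays geometrically from every starting point, with the common rate governed by the second-largest eigenvalue modulus of P.

```python
import numpy as np

P = np.array([[0.50, 0.30, 0.20],
              [0.10, 0.60, 0.30],
              [0.25, 0.25, 0.50]])

# Invariant probability pi: normalized left eigenvector for eigenvalue 1.
w, vl = np.linalg.eig(P.T)
pi = np.real(vl[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

Pn = np.eye(3)
for n in range(1, 31):
    Pn = Pn @ P
    if n % 10 == 0:
        tv = 0.5 * np.abs(Pn - pi).sum(axis=1)  # ||P^n(x,.) - pi|| for each x
        print(n, tv)

# The common geometric rate: second-largest eigenvalue modulus of P.
print("rate:", sorted(np.abs(w))[-2])
```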
We initially discuss conditions under which there exists for some x ∈ X a rate r > 1
such that
$$\| P^n(x, \,\cdot\,) - \pi \|_f \le M_x\, r^{-n} \qquad (15.5)$$
where Mx < ∞. Notice that we have introduced f -norm convergence immediately:
it will turn out that the methods are not much simplified by first considering the
case of bounded f . We also have another advantage in considering geometric rates
of convergence compared with the development of our previous ergodicity results. We
can exploit the useful fact that (15.5) is equivalent to the requirement that for some r̄,
M̄x ,
$$\sum_n \bar r^n\, \| P^n(x, \,\cdot\,) - \pi \|_f \le \bar M_x. \qquad (15.6)$$
Hence it is without loss of generality that we will immediately move also to consider
the summed form as in (15.6) rather than the n-step convergence as in (15.5).
f -Geometric ergodicity
We shall call Φ f -geometrically ergodic, where f ≥ 1, if Φ is positive Harris
with π(f ) < ∞ and there exists a constant rf > 1 such that
$$\sum_{n=1}^{\infty} r_f^n\, \| P^n(x, \,\cdot\,) - \pi \|_f < \infty. \qquad (15.7)$$
The development in this chapter follows a pattern similar to that of the previous
two chapters: first we consider chains which possess an atom, then move to aperiodic
chains via the Nummelin splitting.
This pattern is now well established: but in considering geometric ergodicity, the
extra complexity in introducing both unbounded functions f and exponential moments
of hitting times leads to a number of different and sometimes subtle problems. These
make the proofs a little harder in the case without an atom than was the situation with
either ergodicity or f -ergodicity. However, the final conclusion in (15.4) is well worth
this effort.
When Φ admits an accessible atom α, the Regenerative Decomposition leads, for f ≥ 1 and r > 1, to bounds on this summed convergence in terms of the three sums

$$\sum_{n=1}^{\infty} \Bigl[\int {}_{\alpha}P^n(x, dw)\, f(w)\Bigr] r^n,$$
$$\pi(\alpha) \sum_{n=1}^{\infty} \sum_{j=n+1}^{\infty} t_f(j)\, r^n, \qquad (15.8)$$
$$\sum_{n=1}^{\infty} |a_x * u - \pi(\alpha)| * t_f(n)\, r^n.$$

Now using Lemma D.7.2 and recalling that t_f(n) = ∫ _αPⁿ(α, dw)f(w), we have that the three sums in (15.8) can be bounded individually through

$$\sum_{n=1}^{\infty} \int {}_{\alpha}P^n(x, dw)\, f(w)\, r^n \le E_x\Bigl[\sum_{n=1}^{\tau_\alpha} f(\Phi_n)\, r^n\Bigr], \qquad (15.9)$$

$$\pi(\alpha) \sum_{n=1}^{\infty} \sum_{j=n+1}^{\infty} t_f(j)\, r^n \le \frac{r}{r-1}\, \pi(\alpha)\, E_\alpha\Bigl[\sum_{n=1}^{\tau_\alpha} f(\Phi_n)\, r^n\Bigr], \qquad (15.10)$$

$$\sum_{n=1}^{\infty} |a_x * u - \pi(\alpha)| * t_f(n)\, r^n = \Bigl(\sum_{n=1}^{\infty} |a_x * u(n) - \pi(\alpha)|\, r^n\Bigr)\Bigl(\sum_{n=1}^{\infty} t_f(n)\, r^n\Bigr) = \Bigl(\sum_{n=1}^{\infty} |a_x * u(n) - \pi(\alpha)|\, r^n\Bigr)\, E_\alpha\Bigl[\sum_{n=1}^{\tau_\alpha} f(\Phi_n)\, r^n\Bigr]. \qquad (15.11)$$
In order to bound the first two sums (15.9) and (15.10), and the second term in the third
sum (15.11), we will require an extension of the notion of regularity, or more exactly of
f -regularity. For fixed r ≥ 1 recall the generating function defined in (8.21) for r < 1
by
$$U_\alpha^{(r)}(x, f) := E_x\Bigl[\sum_{n=1}^{\tau_\alpha} f(\Phi_n)\, r^n\Bigr]; \qquad (15.12)$$
clearly this is defined but possibly infinite for r ≥ 1. From the inequalities (15.9)–(15.11)
above it is apparent that when Φ admits an accessible atom, establishing f-geometric ergodicity will require finding conditions such that U_α^{(r)}(x, f) is finite for some r > 1.
The first term in the right hand side of (15.11) can be reduced further. Using the
fact that

$$|a_x * u(n) - \pi(\alpha)| = \Bigl|a_x * (u - \pi(\alpha))(n) - \pi(\alpha) \sum_{j=n+1}^{\infty} a_x(j)\Bigr| \le a_x * |u - \pi(\alpha)|(n) + \pi(\alpha) \sum_{j=n+1}^{\infty} a_x(j)$$
U_α^{(r)}(x, f) < ∞ for some r = r_x > 1: and if we can choose such an r independent of x then we will be able to assert that the overall rate of convergence in (15.4) is also independent of x.
Theorem 15.1.1 (Kendall’s Theorem). Let u(n) be an ergodic renewal sequence with
increment distribution p(n), and write u(∞) = limn →∞ u(n). Then the following three
conditions are equivalent:
(i) The sequence u(n) converges to u(∞) at a geometric rate: there exists r₁ > 1 such that

$$\sum_{n=0}^{\infty} r_1^n\, |u(n) - u(\infty)| < \infty.$$

(ii) There exists r₀ > 1 such that the function U(z) defined on the complex plane for |z| < 1 by

$$U(z) := \sum_{n=0}^{\infty} u(n)\, z^n$$

has an analytic extension in the disc {|z| < r₀} except for a simple pole at z = 1.

(iii) There exists r₂ > 1 such that the increment distribution has geometric tails:

$$\sum_{n=0}^{\infty} r_2^n\, p(n) < \infty.$$
Proof Assume that (i) holds. Then by construction the function F(z) defined on the complex plane by

$$F(z) := \sum_{n=0}^{\infty} (u(n) - u(n-1))\, z^n$$

is analytic in a disc {|z| < r₀} for some r₀ > 1; since F(z) = (1 − z)U(z) (see (15.15)), we have that U(z) has no singularities in the disc {|z| < r₀} except a simple pole at z = 1, so that (ii) holds.
Conversely suppose that (ii) holds. We can then also extend F(z) analytically in the disc {|z| < r₀} using (15.15). As the Taylor series expansion is unique, necessarily F(z) = ∑_{n=0}^∞ (u(n) − u(n−1))zⁿ throughout this larger disc, and so by virtue of Cauchy's inequality

$$\sum_n |u(n) - u(n-1)|\, r^n < \infty, \qquad r < r_0,$$
so that

$$\mathrm{Re}\, P(z) \le \sum_0^{\infty} p(n)\, \mathrm{Re}(z^n) < \sum_0^{\infty} p(n) = 1.$$
Consequently only one of these roots, namely z = 1, lies on the unit circle, and hence
there is some r0 with 1 < r0 ≤ κ such that z = 1 is the only root of P (z) = 1 in the
disc {|z| < r0 }.
Moreover this is a simple root at z = 1, since

$$\lim_{z \to 1} \frac{1 - P(z)}{1 - z} = \frac{d}{dz} P(z)\Big|_{z=1} = \sum_n n\, p(n) \ne 0.$$
Now the renewal equation (8.12) shows that

$$U(z) = \bigl[1 - P(z)\bigr]^{-1}. \qquad (15.16)$$

Finally, to show that (ii) implies (iii) we again use (15.16): writing this as

$$P(z) = 1 - 1/U(z)$$

shows that P(z) is a ratio of analytic functions and so is itself analytic in the disc {|z| < κ}, where now κ is the first zero of F(z) in {|z| < r₀}; there are only finitely many such zeros and none of them occurs in the closed unit disc {|z| ≤ 1} since P(z) is bounded in this disc, so that κ > 1 as required.
It would seem that one should be able to prove this result, not only by analysis but
also by a coupling argument as in Section 13.2. Clearly one direction of this is easy: if
the renewal times are geometric then one can use coupling to get geometric convergence.
The other direction does seem to require analytic tools to the best of our knowledge,
and so we have given the classical proof here.
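As a numerical companion to Kendall's Theorem, the sketch below builds a renewal sequence u(n) from a hypothetical increment distribution p(n) with geometric tails (a truncated geometric law, chosen purely for illustration) and exhibits the geometric convergence of u(n) to u(∞) = 1/∑ n p(n).

```python
N = 200
p = [0.0] + [0.6 * 0.4 ** (n - 1) for n in range(1, N + 1)]  # p(n), n >= 1
mu = sum(n * p[n] for n in range(1, N + 1))                  # mean increment

u = [1.0]                                                    # u(0) = 1
for n in range(1, N + 1):
    u.append(sum(p[k] * u[n - k] for k in range(1, n + 1)))  # renewal equation

for n in (5, 10, 20, 40):
    print(n, abs(u[n] - 1.0 / mu))  # |u(n) - u(inf)| decays geometrically
```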
The application of Kendall's Theorem to chains admitting an atom comes from the following, which is straightforward from the assumption that f ≥ 1, so that U_α^{(κ)}(α, f) ≥ E_α[κ^{τ_α}].
This enables us to control the first term in (15.11). To exploit the other bounds in (15.9)–(15.11) we also need to establish finiteness of the quantities U_α^{(κ)}(x, f) for values of x other than α.
Proposition 15.1.3. Suppose that Φ is ψ-irreducible, and admits an f-Kendall atom α ∈ B⁺(X) of rate κ. Then the set S_f^κ := {x ∈ X : U_α^{(κ)}(x, f) < ∞} is full and absorbing.

Proof The kernel U_α^{(κ)}(x, · ) satisfies the identity

$$\int P(x, dy)\, U_\alpha^{(\kappa)}(y, B) \le \kappa^{-1} U_\alpha^{(\kappa)}(x, B) + P(x, \alpha)\, U_\alpha^{(\kappa)}(\alpha, B).$$

Thus the set S_f^κ is absorbing, and since S_f^κ is non-empty it follows from Proposition 4.2.3 that S_f^κ is full.
We now have sufficient structure to prove the geometric ergodic theorem when an
atom exists with appropriate properties.
Theorem 15.1.4. Suppose that Φ is ψ-irreducible, with invariant probability measure
π, and that there exists an f -Kendall atom α ∈ B+ (X) of rate κ.
Then there exists a decomposition X = S κ ∪ N where S κ is full and absorbing, such
that for all x ∈ S^κ, some R < ∞, and some r with r > 1

$$\sum_n r^n\, \| P^n(x, \cdot) - \pi(\cdot) \|_f \le R\, U_\alpha^{(\kappa)}(x, f) < \infty. \qquad (15.18)$$
Proof By Proposition 15.1.3 the bounds (15.9) and (15.10), and the second term
in the bound (15.11), are all finite for x ∈ S κ ; and Kendall’s Theorem, as applied in
Proposition 15.1.2, gives that for some rα > 1 the other term in (15.11) is also finite.
The result follows with r = min(κ, rα ).
There is an alternative way of stating Theorem 15.1.4 in the simple geometric er-
godicity case f = 1 which emphasizes the solidarity result in terms of ergodic properties
rather than in terms of hitting time properties. The proof uses the same steps as the
previous proof, and we omit it.
so that the chain is geometrically ergodic if and only if the distribution p(n) has geo-
metrically decreasing tails.
We will see, once we develop a drift criterion for geometric ergodicity, that this
duality between geometric tails on increments and geometric rates of convergence to
stationarity is repeated for many other models.
$$P(0, j) = \gamma_j, \qquad j \in \mathbb{Z}_+,$$
$$P(j, j) = \beta_j, \qquad j \in \mathbb{Z}_+,$$
$$P(j, 0) = 1 - \beta_j, \qquad j \in \mathbb{Z}_+, \qquad (15.20)$$

where ∑_j γ_j = 1.
The mean return time from zero to itself is given by

$$E_0[\tau_0] = \sum_j \gamma_j\,\bigl[1 + (1 - \beta_j)^{-1}\bigr]$$

and the chain is thus ergodic if γ_j > 0 for all j (ensuring irreducibility and aperiodicity), and

$$\sum_j \gamma_j\,(1 - \beta_j)^{-1} < \infty. \qquad (15.21)$$
In this example

$$E_0[r^{\tau_0}] \ge r \sum_j \gamma_j\, E_j[r^{\tau_0}]$$
and

$$P_j(\tau_0 > n) = \beta_j^n.$$

Hence if β_j → 1 as j → ∞, then the chain is not geometrically ergodic regardless of the structure of the distribution {γ_j}, even if γ_n → 0 sufficiently fast to ensure that (15.21) holds.
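This failure mode is easy to see numerically. In the sketch below we make the hypothetical choices γ_j ∝ 2^{-j} and β_j = 1 − (j + 1)^{-2}, for which (15.21) holds; since P₀(τ₀ > n + 1) = ∑_j γ_j β_j^n, the quantity log P₀(τ₀ > n)/n tends to zero, so the return time tail admits no geometric bound.

```python
import math

J = 2000
gamma = [2.0 ** (-j) for j in range(1, J + 1)]               # gamma_j, j = 1..J
gamma[-1] += 1.0 - sum(gamma)                                # normalize exactly
beta = [1.0 - 1.0 / (j + 1) ** 2 for j in range(1, J + 1)]   # beta_j -> 1

def tail(n):
    """P_0(tau_0 > n + 1) = sum_j gamma_j beta_j^n."""
    return sum(g * b ** n for g, b in zip(gamma, beta))

for n in (10, 100, 1000):
    t = tail(n)
    print(n, t, math.log(t) / n)  # log-tail / n -> 0: slower than any geometric
```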
Thus the best rates for convergence of Pⁿ(0, 0) and Pⁿ(2, 2) to their limits π(0) = π(2) = 1/4 are ρ₀ = ρ₂ = 0: the limits are indeed attained at every step. But the rate of convergence of Pⁿ(1, 1) to π(1) = 1/2 is at least ρ₁ > 1/4.
The following more complex example shows that even on an arbitrarily large finite
space {1, . . . , N + 1} there may in fact be N different rates of convergence such that
so that

$$P(k, k) = \beta_k := 1 - \sum_{j=1}^{k-1} \alpha_j - (N + 1 - k)\,\alpha_k, \qquad 1 \le k \le N + 1.$$
Since P is symmetric it is immediate that the invariant measure is given for all k by
π(k) = [N + 1]−1 .
For this example it is possible to show [384] that the eigenvalues of P are distinct and
are given by λ1 = 1 and for k = 2, . . . , N + 1
λk = βN +2−k − αN +2−k .
After considerable algebra it follows that for each k, there are positive constants s(k, j) such that

$$P^m(k, k) - [N + 1]^{-1} = \sum_{j=N+2-k}^{N+1} s(k, j)\, \lambda_j^m.$$
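Since the opening of this construction lies outside the excerpt, the following numerical check assumes the off-diagonal entries are P(k, j) = α_{k∧j} for j ≠ k (with hypothetical values α_j); under that assumption the diagonal entries are the β_k displayed above, and the spectrum computed by numpy reproduces the formula λ_k = β_{N+2−k} − α_{N+2−k}.

```python
import numpy as np

N = 5
alpha = [0.04 * 2.0 ** (-j) for j in range(1, N + 2)]  # alpha_1..alpha_{N+1}

P = np.zeros((N + 1, N + 1))
for k in range(1, N + 2):
    for j in range(1, N + 2):
        if j != k:
            P[k - 1, j - 1] = alpha[min(j, k) - 1]     # assumed structure
    P[k - 1, k - 1] = 1.0 - sum(alpha[:k - 1]) - (N + 1 - k) * alpha[k - 1]

assert np.allclose(P.sum(axis=1), 1.0) and (P >= 0).all()

# lambda_1 = 1 and lambda_k = beta_{N+2-k} - alpha_{N+2-k}, k = 2, ..., N+1.
predicted = sorted([1.0] + [P[m - 1, m - 1] - alpha[m - 1] for m in range(1, N + 1)])
print(np.round(sorted(np.linalg.eigvalsh(P)), 10))
print(np.round(predicted, 10))
```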
The crucial aspect of a Kendall atom is that the return times to the atom from itself
have a geometrically bounded distribution. There is an obvious extension of this idea
to more general, non-atomic, sets.
$$\sup_{x \in A} E_x\Bigl[\sum_{k=0}^{\tau_A - 1} f(\Phi_k)\, \kappa^k\Bigr] < \infty. \qquad (15.22)$$

$$\sup_{x \in A} E_x\Bigl[\sum_{k=0}^{\tau_B - 1} f(\Phi_k)\, r^k\Bigr] < \infty.$$
this is again well defined for r ≥ 1, although it may be infinite. We use this no-
tation in our next result, which establishes that any petite f -Kendall set is actually
f -geometrically regular. This is non-trivial to establish, and needs a somewhat delicate
“geometric trials” argument.
Theorem 15.2.1. Suppose that Φ is ψ-irreducible and f ≥ 1. Then the following are equivalent:

(i) A is a petite f-Kendall set;

(ii) A is f-geometrically regular and A ∈ B⁺(X).

Proof To prove (ii)⇒(i) it is enough to show that A is petite, and this follows from Proposition 11.3.8, since a geometrically regular set is automatically regular.

To prove (i)⇒(ii) is considerably more difficult, although obviously since a Kendall set is Harris recurrent, it follows from Proposition 9.1.1 that any Kendall set is in B⁺(X).
Suppose that C is an f-Kendall set of rate κ, let 1 < r ≤ κ, and define U^{(r)}(x) = E_x[r^{τ_C}], so that U^{(r)} is bounded on C. We set M(r) = sup_{x∈C} U^{(r)}(x) < ∞. Put ε = log(r)/log(κ): by Jensen's inequality,

$$M(r) = \sup_{x \in C} E_x[\kappa^{\varepsilon \tau_C}] \le M(\kappa)^{\varepsilon}.$$

Writing τ_C(n) for the n-th return time to C, the strong Markov property then gives

$$E_x[r^{\tau_C(n)}] \le (M(r))^{n-1}\, U^{(r)}(x), \qquad n \ge 1. \qquad (15.24)$$
To prove the theorem we will combine this bound with the sample path bound, valid for any set B ∈ B(X),

$$\sum_{i=1}^{\tau_B} r^i f(\Phi_i) \le \sum_{n=0}^{\infty} \Bigl[\sum_{j=\tau_C(n)+1}^{\tau_C(n+1)} r^j f(\Phi_j)\Bigr] I\{\tau_B > \tau_C(n)\}. \qquad (15.25)$$
For any 0 < γ < 1, n ≥ 0, and positive numbers x and y we have the bound xy ≤ γⁿx² + γ⁻ⁿy². Applying this bound with x = r^{τ_C(n)} and y = I{τ_C(n) < τ_B} in (15.25), and setting M_f(r) = sup_{x∈C} U_C^{(r)}(x, f), we obtain for any B ∈ B(X),

$$U_B^{(r)}(x, f) \le M_f(r) \sum_{n=0}^{\infty} \Bigl\{\gamma^n E_x[r^{2\tau_C(n)}] + \gamma^{-n} E_x[I\{\tau_C(n) < \tau_B\}]\Bigr\}$$
$$\le M_f(r) \Bigl\{\sum_{n=0}^{\infty} \gamma^n (M(r^2))^n\, U^{(r^2)}(x) + \sum_{n=0}^{\infty} \gamma^{-n} P_x\{\tau_C(n) < \tau_B\}\Bigr\}, \qquad (15.26)$$
where we have used (15.24). We still need to prove the right hand side of (15.26) is
finite. Suppose now that for some R < ∞, ρ < 1, and any x ∈ X,
$$P_x\{\tau_C(n) < \tau_B\} \le R\, \rho^n. \qquad (15.27)$$
With γ so fixed, we can now choose r > 1 so close to unity that γM(r²) < 1 to obtain

$$U_B^{(r)}(x, f) \le M_f(r)\Bigl\{\frac{U^{(r^2)}(x)}{1 - \gamma M(r^2)} + \frac{R}{1 - \gamma^{-1}\rho}\Bigr\},$$

$$I\{\tau_C(m m_0) < \tau_B\} = I\{\tau_C([m-1]m_0) < \tau_B\}\;\theta^{\tau_C([m-1]m_0)}\, I\{\tau_C(m_0) < \tau_B\}$$
depending on the quantity ρ in (15.27): intuitively, for a set B “far away” from C it
may take many visits to C before an excursion reaches B, and so the value of r will be
correspondingly closer to unity.
We see at once that (V4) is just (V3) in the special case where f = βV . From
this observation we can borrow several results from the previous chapter, and use the
approach there as a guide.
We first spell out some useful properties of solutions to the drift inequality in (15.28), analogous to those we found for (14.16).

Lemma 15.2.2. Suppose that Φ is ψ-irreducible.

(i) If V satisfies (15.28), then {V < ∞} is either empty or absorbing and full.

(ii) If (15.28) holds for a petite set C, then V is unbounded off petite sets.
We now begin a more detailed evaluation of the consequences of (V4). We first give
a probabilistic form for one solution to the drift condition (V4), which will prove that
(15.2) implies (15.3) has a solution.
Using the kernel U_C^{(r)} we define a further kernel G_C^{(r)} as G_C^{(r)} = I + I_{C^c} U_C^{(r)}. For any x ∈ X, B ∈ B(X), this has the interpretation

$$G_C^{(r)}(x, B) = E_x\Bigl[\sum_{k=0}^{\sigma_C} I_B(\Phi_k)\, r^k\Bigr]. \qquad (15.29)$$

The kernel G_C^{(r)}(x, B) gives us the solution we seek to (15.28).
Lemma 15.2.3. Suppose that C ∈ B(X), and let r > 1. Then the kernel G_C^{(r)} satisfies

$$P\, G_C^{(r)} = r^{-1}\bigl[G_C^{(r)} - I\bigr] + r^{-1} I_C\, U_C^{(r)}. \qquad (15.30)$$

Proof The kernel U_C^{(r)} satisfies the simple identity

$$U_C^{(r)} = rP + rP\, I_{C^c} U_C^{(r)}. \qquad (15.31)$$

Hence the kernel G_C^{(r)} satisfies the chain of identities

$$P\, G_C^{(r)} = P + P\, I_{C^c} U_C^{(r)} = r^{-1} U_C^{(r)} = r^{-1}\bigl[G_C^{(r)} - I + I_C\, U_C^{(r)}\bigr].$$
This now gives us the easier direction of the duality between the existence of f -
Kendall sets and solutions to (15.28).
Theorem 15.2.4. Suppose that Φ is ψ-irreducible, and admits an f-Kendall set C ∈ B⁺(X) for some f ≥ 1. Then the function V(x) = G_C^{(κ)}(x, f) ≥ f(x) is a solution to (V4).
Proof We have from (15.30) that, by the f-Kendall property, for some M < ∞ and r > 1,

$$\Delta V \le -\beta V + r^{-1} M\, I_C$$

and so the function V satisfies (V4).

$$P V \le r^{-1} V - \varepsilon V + b I_C$$

$$Z_k = r^k V(\Phi_k)$$
Choosing f_k(x) = εr^{k+1}V(x) and s_k(x) = br^{k+1}I_C(x), we have by Proposition 11.3.2

$$E_x\Bigl[\sum_{k=0}^{\tau_B - 1} \varepsilon r^{k+1} V(\Phi_k)\Bigr] \le Z_0(x) + E_x\Bigl[\sum_{k=0}^{\tau_B - 1} r^{k+1} b I_C(\Phi_k)\Bigr].$$
Multiplying through by ε−1 r−1 and noting that Z0 (x) = V (x), we obtain the required
bound.
The particular form with B = C is then straightforward.
We use this result to prove that in general, sublevel sets of solutions V to (15.28)
are V -geometrically regular.
Theorem 15.2.6. Suppose that Φ is ψ-irreducible, and that (V4) holds for a function
V and a petite set C.
If V is bounded on A ∈ B(X), then A is V -geometrically regular.
$$P V \le r^{-1} V + c\, I_D$$

for some c < ∞. Thus we have shown that (V4) holds with D in place of C. Hence using (15.32) there exists s > 1 and ε > 0 such that

$$E_x\Bigl[\sum_{k=0}^{\tau_D - 1} s^k V(\Phi_k)\Bigr] \le \varepsilon^{-1} s^{-1} V(x) + \varepsilon^{-1} c\, I_D(x). \qquad (15.34)$$
Theorem 15.2.7. If there exists an f -Kendall set C ∈ B+ (X), then there exists V ≥ f
and an increasing sequence {CV (i) : i ∈ Z+ } of V -geometrically regular sets whose
union is full.
Proof Let V(x) = G_C^{(r)}(x, f). Then V satisfies (V4) and by Theorem 15.2.6 the set C_V(n) := {x : V(x) ≤ n} is V-geometrically regular for each n. Since S_V = {V < ∞} is a full absorbing subset of X, the result follows.
The following alternative form of (V4) will simplify some of the calculations per-
formed later.
Lemma 15.2.8. The drift condition (V4) holds with a petite set C if and only if V is
unbounded off petite sets and
P V ≤ λV + L (15.35)
for some λ < 1, L < ∞.
Proof If (V4) holds, then (15.35) immediately follows. Lemma 15.2.2 states that the function V is unbounded off petite sets.

Conversely, if (15.35) holds for a function V which is unbounded off petite sets, then set β = ½(1 − λ) and define the petite set C as

$$C = \{x \in X : V(x) \le L/\beta\}$$
is an important one, it also has one drawback: as we have larger functions on the left, the bounds on the distance to π in the V-norm also increase.

Overall it is not clear when one can have a best common bound on the distance ‖Pⁿ(x, ·) − π‖_V independent of V; indeed, the example in Section 16.2.2 shows that as V increases then one might even lose the geometric nature of the convergence.
However, the following result shows that one can obtain a smaller x-dependent
bound in the Geometric Ergodic Theorem if one is willing to use a smaller function V
in the application of the V -norm.
Lemma 15.2.9. If (V4) holds for V, and some petite set C, then (V4) also holds for the function √V and some petite set C.

Proof If (V4) holds for the finite-valued function V then by Lemma 15.2.8 V is unbounded off petite sets and (15.35) holds for some λ < 1 and L < ∞. Letting V̄(x) = √V(x), x ∈ X, we have by Jensen's inequality,

$$P \bar V(x) \le \sqrt{P V(x)} \le \sqrt{\lambda V + L} \le \sqrt{\lambda}\,\sqrt{V} + \frac{L}{2\sqrt{\lambda V}} \le \sqrt{\lambda}\, \bar V + \frac{L}{2\sqrt{\lambda}} \quad \text{(since } V \ge 1\text{)},$$

which together with Lemma 15.2.8 implies that (V4) holds with V replaced by √V.
f -Geometric regularity of Φ
The chain Φ is called f -geometrically regular if there exists a petite set C
and a fixed constant κ > 1 such that
$$E_x\Bigl[\sum_{k=0}^{\tau_C - 1} f(\Phi_k)\, \kappa^k\Bigr] \qquad (15.36)$$

is finite for all x ∈ X and bounded on C.

Observe that when κ is taken equal to one, this definition then becomes f-regularity, whilst the boundedness on C implies f-geometric regularity of the set C from Theorem 15.2.1: it is the finiteness from arbitrary initial points that is new in this definition.
The following consequence of f -regularity follows immediately from the strong
Markov property and f -geometric regularity of the set C used in (15.36).
Proposition 15.3.2. If there is one petite f -Kendall set C, then there is a decompo-
sition
X = Sf ∪ N
where Sf is full and absorbing, and Φ restricted to Sf is f -geometrically regular.
Proof We know from Theorem 15.2.1 that when a petite f-Kendall set C exists then C is V-geometrically regular, where V(x) = G_C^{(r)}(x, f) for some r > 1. Since V then satisfies (V4) from Lemma 15.2.3, it follows from Lemma 15.2.2 that S_f = {V < ∞} is absorbing and full. Now as in (15.32) we have for some κ > 1

$$V(x) \le E_x\Bigl[\sum_{n=0}^{\tau_C - 1} V(\Phi_n)\, \kappa^n\Bigr] \le \varepsilon^{-1} \kappa^{-1} V(x) + \varepsilon^{-1} c\, I_C(x) \qquad (15.38)$$

and since the right hand side is finite on S_f the chain restricted to S_f is V-geometrically regular, and hence also f-geometrically regular since f ≤ V.
The existence of an everywhere finite solution to the drift inequality (V4) is equiv-
alent to f -geometric regularity, imitating the similar characterization of f -regularity.
We have
Theorem 15.3.3. Suppose that (V4) holds for a petite set C and a function V which is everywhere finite. Then Φ is V-geometrically regular, and for each B ∈ B⁺(X) there exists c(B) < ∞ such that

$$U_B^{(r)}(x, V) \le c(B)\, V(x).$$

Conversely, if Φ is f-geometrically regular, then there exists a petite set C and a function V ≥ f which is everywhere finite and which satisfies (V4).
Proof Suppose that (V4) holds with V everywhere finite and C petite. As in the
proof of Theorem 15.2.6, there exists a petite set D on which V is bounded, and as in
(15.34) there is then r > 1 and a constant d such that

$$E_x\Bigl[\sum_{k=0}^{\tau_D - 1} V(\Phi_k)\, r^k\Bigr] \le d\, V(x).$$
Hence Φ is V -geometrically regular, and the required bound follows from Proposi-
tion 15.3.1.
For the converse, take V(x) = G_C^{(r)}(x, f) where C is the petite set used in the definition of f-geometric regularity.
This approach, using solutions V to (V4) to bound (15.36), is in effect an extended
version of the method used in the atomic case to prove Proposition 15.1.3.
(i) If V satisfies (V4) with a petite set C, then for any n-skeleton, the function V
also satisfies (V4) for some set C which is petite for the n-skeleton.
$$P^n V \le \rho^n V + b \sum_{i=0}^{n-1} P^i I_C \le \rho^n V + b m\, I_C + \varepsilon.$$
Given this together with Theorem 15.3.3, which characterizes f -geometric regularity,
the following result is obvious:
We round out this series of equivalences by showing not only that the skeletons
inherit f -geometric regularity properties from the chain, but that we can go in the
other direction also.
Recall from (14.22) that for any positive function g on X, we write g^{(m)} = ∑_{i=0}^{m−1} Pⁱg. Then we have, as a geometric analogue of Theorem 14.2.9,
Proof Letting τ_B^m denote the hitting time for the skeleton, we have by the Markov property, for any B ∈ B⁺(X) and r > 1,

$$E_x\Bigl[\sum_{k=0}^{\tau_B^m - 1} r^{km} \sum_{i=0}^{m-1} P^i f(\Phi_{km})\Bigr] \ge r^{-m}\, E_x\Bigl[\sum_{k=0}^{\tau_B^m - 1} \sum_{i=0}^{m-1} r^{km+i} f(\Phi_{km+i})\Bigr] \ge r^{-m}\, E_x\Bigl[\sum_{j=0}^{\tau_B - 1} r^j f(\Phi_j)\Bigr].$$
If C is f (m ) -geometrically regular for an m-skeleton then the left hand side is bounded
over C for some r > 1 and hence the set C is also f -geometrically regular.
Conversely, if C ∈ B + (X) is f -geometrically regular then it follows from Theo-
rem 15.2.4 that (V4) holds for a function V ≥ f which is bounded on C.
Thus we have from (15.39) and a further application of Lemma 14.2.8 that for some petite set C and ρ < 1

$$P^m V^{(m)} \le \rho\, V^{(m)} + m b\, I_C,$$

and thus (V4) holds for the m-skeleton. Since V^{(m)} is bounded on C by (15.39), we have from Theorem 15.3.3 that C is V^{(m)}-geometrically regular for the m-skeleton.
$$\sum_n r^n\, \| P^n(x, \,\cdot\,) - \pi \|_f \le R\, E_x\Bigl[\sum_{k=0}^{\tau_C} f(\Phi_k)\, \kappa^k\Bigr]$$
Proof This proof is in several steps, from the atomic through the strongly aperiodic
to the general aperiodic case. In all cases we use the fact that the seemingly relatively
weak f -Kendall petite assumption on C implies that C is f -geometrically regular and
in B + (X) from Theorem 15.2.1.
Under the conditions of the theorem it follows from Theorem 15.2.4 that

$$V(x) = E_x\Bigl[\sum_{k=0}^{\sigma_C} f(\Phi_k)\, \kappa^k\Bigr] \ge f(x) \qquad (15.40)$$
is a solution to (V4) which is bounded on the set C, and the set Sfκ = {x : V (x) < ∞}
is absorbing, full, and contains the set C. This will turn out to be the set required for
the result.
(i) Suppose first that the set C contains an accessible atom α. We know then
that the result is true from Theorem 15.1.4, with the bound on the f -norm convergence
given from (15.18) and (15.37) by

$$E_x\Bigl[\sum_{k=0}^{\tau_\alpha - 1} f(\Phi_k)\, \kappa^k\Bigr] \le c(\alpha)\, E_x\Bigl[\sum_{k=0}^{\tau_C - 1} f(\Phi_k)\, \kappa^k\Bigr]$$
To prove the theorem we abandon the function f and prove V -geometric ergodicity
for the chain restricted to Sfκ and the function (15.40). By Theorem 15.3.3 applied to
the chain restricted to S_f^κ we have that for some constants c < ∞, r > 1,

$$E_x\Bigl[\sum_{k=1}^{\tau_C} V(\Phi_k)\, r^k\Bigr] \le c\, V(x). \qquad (15.41)$$

Now consider the chain split on C. Exactly as in the proof of Proposition 14.3.1 we have that

$$\check E_{x_i}\Bigl[\sum_{k=1}^{\tau_{C_0 \cup C_1}} \check V(\check\Phi_k)\, r^k\Bigr] \le c\, \check V(x_i)$$
From the definition of V and the bound V ≥ f this proves the theorem when C is
ν1 -small.
(iii) Now let us move to the general aperiodic case. Choose m so that the set C
is itself νm -small with νm (C c ) = 0: we know that this is possible from Theorem 5.5.7.
By Theorem 15.3.3 and Theorem 15.3.5 the chain and the m-skeleton restricted to
Sfκ are both V -geometrically regular. Moreover, by Theorem 15.3.3 and Theorem 15.3.4
we have for some constants d < ∞, r > 1,
$$E_x\Bigl[\sum_{k=1}^{\tau_C^m} V(\Phi_k)\, r^k\Bigr] \le d\, V(x) \qquad (15.42)$$
where as usual τCm denotes the hitting time for the m-skeleton. From (ii), since m is
chosen specifically so that C is “ν1 -small” for the m-skeleton, there exists c < ∞ with
$$\| P^{nm}(x, \,\cdot\,) - \pi \|_V \le c\, V(x)\, r_0^{-n}, \qquad n \in \mathbb{Z}_+, \; x \in S_f^\kappa.$$
We now need to compare this term with the convergence of the one-step transition
probabilities, and we do not have the contraction property of the total variation norm
available to do this. But if (V4) holds for V then we have that
$$\| P^{n+1}(x, \,\cdot\,) - \pi \|_V \le (1 + b)\, \| P^n(x, \,\cdot\,) - \pi \|_V. \qquad (15.43)$$

Hence for k = nm + i with 0 ≤ i < m,

$$\| P^k(x, \,\cdot\,) - \pi \|_V \le (1 + b)^m\, \| P^{nm}(x, \,\cdot\,) - \pi \|_V \le (1 + b)^m c\, V(x)\, r_0^{-n} \le (1 + b)^m c\, r_0\, V(x)\, (r_0^{1/m})^{-k},$$
Proof Let us first assume there is an accessible atom α ∈ B+ (X), and that r > 1
is such that
$$\sum_n r^n\, \| P^n(\alpha, \,\cdot\,) - \pi \|_f < \infty.$$
Using the last exit decomposition (8.19) over the times of entry to α, we have as in the
Regenerative Decomposition (13.48)
$$P^n(\alpha, f) - \pi(f) \ge (u - \pi(\alpha)) * t_f(n) + \pi(\alpha) \sum_{j=n+1}^{\infty} t_f(j). \qquad (15.44)$$
Multiplying by rn and summing both sides of (15.44) would seem to indicate that α is
an f -Kendall atom of rate r, save for the fact that the first term may be negative, so
that we could have both positive and negative infinite terms in this sum in principle.
A slightly more delicate argument is needed to get around this.
By truncating the last term and then multiplying by sⁿ, s ≤ r, and summing to N, we do have

$$\sum_{n=0}^{N} s^n\, (P^n(\alpha, f) - \pi(f)) \ge \sum_{n=0}^{N} s^n t_f(n)\Bigl[\sum_{k=0}^{N-n} s^k (u(k) - \pi(\alpha))\Bigr] + \pi(\alpha) \sum_{n=0}^{N} s^n \sum_{j=n+1}^{N} t_f(j). \qquad (15.45)$$
Let us write c_N(f, s) = ∑_{n=0}^N sⁿt_f(n), and d(s) = ∑_{n=0}^∞ sⁿ|u(n) − π(α)|. We can bound the first term in (15.45) in absolute value by d(s)c_N(f, s), so in particular as s ↓ 1, by monotonicity of d(s) we know that the middle term is no more negative than −d(r)c_N(f, s).
On the other hand, the third term is by Fubini's Theorem given by

$$\pi(\alpha)\,[s-1]^{-1} \sum_{n=0}^{N} t_f(n)(s^n - 1) \ge [s-1]^{-1}\bigl[\pi(\alpha)\, c_N(f, s) - \pi(f) - \pi(\alpha) f(\alpha)\bigr]. \qquad (15.46)$$
Suppose now that α is not f-Kendall. Then for any s > 1 we have that c_N(f, s) is unbounded as N becomes large. Fix s sufficiently small that π(α)[s − 1]⁻¹ > d(r); then we have that the right hand side of (15.45) is greater than

$$\bigl(\pi(\alpha)[s-1]^{-1} - d(r)\bigr)\, c_N(f, s) - [s-1]^{-1}\bigl[\pi(f) + \pi(\alpha) f(\alpha)\bigr],$$

which tends to infinity as N → ∞. This clearly contradicts the finiteness of the left side of (15.45). Consequently α is f-Kendall of rate s for some s < r, and then the chain is f-geometrically regular when restricted to a full absorbing set S from Proposition 15.3.2.
Now suppose that the chain does not admit an accessible atom. If the chain is f-geometrically ergodic, then it is straightforward that for every m-skeleton and every x we have

$$\sum_n r^n\, |P^{nm}(x, f) - \pi(f)| < \infty,$$

and for the split chain corresponding to one such skeleton we also have rⁿ|P̌ⁿ(x, f) − π(f)| summable.
and again trivially the m-skeleton is f (m ) -geometrically regular, at least on a full ab-
sorbing set S. We can then use Theorem 15.3.7 to deduce that the original chain is
f -geometrically regular on S as required.
One of the uses of this result is to show that even when π(f ) < ∞ there is no
guarantee that geometric ergodicity actually implies f -geometric ergodicity: rates of
convergence need not be inherited by the f -norm convergence for “large” functions f .
We will see this in the example defined by (16.24) in the next chapter.
However, we can show that local geometric ergodicity does at least give the V -
geometric ergodicity of Theorem 15.4.1, for an appropriate V . As in Chapter 13, we
conclude with what is now an easy result.
Theorem 15.4.3. Suppose that Φ is an aperiodic positive Harris chain, with invariant
probability measure π, and that there exists some ν-small set C ∈ B⁺(X), ρ_C < 1 and M_C < ∞, and P^∞(C) > 0 such that ν(C) > 0 and

$$\int_C \nu_C(dx)\,\bigl(P^n(x, C) - P^\infty(C)\bigr) \le M_C\, \rho_C^n. \qquad (15.47)$$
Proof Using the Nummelin splitting via the set C for the m-skeleton, we have
exactly as in the proof of Theorem 13.3.5 that the bound (15.47) implies that the atom
in the skeleton chain split at C is geometrically ergodic.
We can then emulate step (iii) of the proof of Theorem 15.4.1 above to reach the
conclusion.
Notice again that (15.47) is implied by (15.1), so that we have completed the circle
of results in Theorem 15.0.1.
For this chain we can consider directly Px (τ0 = n) = ax (n) in order to evaluate the
geometric tails of the distribution of the hitting times. Since we have the recurrence
relations
$$a_x(n) = (1-p)\, a_{x-1}(n-1) + p\, a_{x+1}(n-1), \qquad x > 1;$$
$$a_x(0) = 0, \qquad x \ge 1;$$
$$a_1(n) = p\, a_2(n-1), \qquad a_0(0) = 0,$$

valid for n ≥ 1, the generating functions A_x(z) = ∑_{n=0}^∞ a_x(n)zⁿ satisfy
and if p < 1/2, then [(1 − p)z −1 + pz − 1] = −β < 0 for z sufficiently close to unity, and
so (15.28) holds as desired.
In fact, this same property, that for random walks on the half line ergodic chains are
also geometrically ergodic, holds in much wider generality. The crucial property is that
the increment distribution have exponentially decreasing right tails, as we shall see in
Section 16.1.3.
Xn = αXn −1 + Wn
We noted in Proposition 11.4.2 that for large enough m, V satisfies (V2) with C =
CV (m) = {x : |x| + 1 ≤ m}, provided that
Under this condition, just as in the simple linear model, the chain is irreducible and
aperiodic and thus again in this case we have that the chain is V -geometrically ergodic
with V (x) = |x| + 1.
Suppose further that W has finite variance σ_w² satisfying

$$\theta^2 + b^2 \sigma_w^2 < 1;$$
exactly as in Section 14.4.2, we see that V (x) = x2 is a solution to (V4) and hence Φ
is V -geometrically ergodic with this V . As a consequence, the chain admits a second
order stationary distribution π with the property that for some r > 1 and c < ∞, and
all x and n,

$$r^n\, \Bigl|\int P^n(x, dy)\, y^2 - \int \pi(dy)\, y^2\Bigr| < c\,(x^2 + 1).$$
Thus not only does the chain admit a second order stationary version, but the time
dependent variances converge to the stationary variance.
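A Monte Carlo sketch of this conclusion, assuming the bilinear form X_{n+1} = θX_n + bX_nW_{n+1} + W_{n+1} analyzed in Section 15.5.2 and hypothetical parameter values satisfying θ² + b²σ_w² < 1:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, b, sigma_w = 0.5, 0.5, 1.0
assert theta ** 2 + b ** 2 * sigma_w ** 2 < 1   # the drift condition above

paths, x0 = 200_000, 5.0
X = np.full(paths, x0)
for n in range(1, 41):
    W = rng.normal(0.0, sigma_w, size=paths)
    X = theta * X + b * X * W + W
    if n % 10 == 0:
        print(n, (X ** 2).mean())  # E_x[X_n^2] settles at the stationary value
```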
15.6 Commentary*
Unlike much of the ergodic theory of Markov chains, the history of geometrically ergodic
chains is relatively straightforward. The concept was introduced by Kendall in [202],
where the existence of the solidarity property for countable space chains was first estab-
lished: that is, if one transition probability sequence P n (i, i) converges geometrically
quickly, so do all such sequences. In this seminal paper the critical renewal theorem
(Theorem 15.1.1) was established.
The central result, the existence of the common convergence rate, is due to Vere-
Jones [403] in the countable space case; the fact that no common best bound exists was
also shown by Vere-Jones [403], with the more complex example given in Section 15.1.4
being due to Teugels [384]. Vere-Jones extended much of this work to non-negative
matrices [405, 406], and this approach carries over to general state space operators
[394, 395, 303].
Nummelin and Tweedie [307] established the general state space version of geo-
metric ergodicity, and by using total variation norm convergence, showed that there
is independence of A in the bounds on |P n (x, A) − π(A)|, as well as an independent
geometric rate. These results were strengthened by Nummelin and Tuominen [305],
who also show as one important application that it is possible to use this approach to
establish geometric rates of convergence in the Key Renewal Theorem of Section 14.5 if
the increment distribution has geometric tails. Their results rely on a geometric trials
argument to link properties of skeletons and chains: the drift condition approach here
is new, as is most of the geometric regularity theory.
The upper bound in (15.4) was first observed by Chan [62]. Meyn and Tweedie [277]
developed the f -geometric ergodicity approach, thus leading to the final form of Theo-
rem 15.4.1; as discussed in the next chapter, this form has important operator-theoretic
consequences, as pointed out in the case of countable X by Hordijk and Spieksma [163].
The drift function criterion was first observed by Popov [320] for countable chains,
with general space versions given by Nummelin and Tuominen [305] and Tweedie [400].
The full set of equivalences in Theorem 15.0.1 is new, although much of it is implicit in
Nummelin and Tweedie [307] and Meyn and Tweedie [277].
Initial application of the results to queueing models can be found in Vere-Jones
[404] and Miller [284], although without the benefit of the drift criteria, such appli-
cations are hard work and restricted to rather simple structures. The bilinear model
in Section 15.5.2 is first analyzed in this form in Feigin and Tweedie [111]. Further
interpretation and exploitation of the form of (15.4) is given in the next chapter, where
we also provide a much wider variety of applications of these results.
In general, establishing exact rates of convergence or even bounds on such rates
remains (for infinite state spaces) an important open problem, although by analyzing
Kendall's Theorem in detail Spieksma [367] has recently identified upper bounds on the region of convergence for some specific queueing models.
Added in second printing: There has now been a substantial amount of work on this
problem, and quite different methods of bounding the convergence rates have been found
by Meyn and Tweedie [282], Baxendale [21], Rosenthal [343, 342] and Lund and Tweedie
[241]. However, apart from the results in [241] which apply only to stochastically
monotone chains, none of these bounds are tight, and much remains to be done in this
area.
Commentary for the second edition: This is an evolving research area, and one
that is too large to summarize here. Section 20.1 contains a partial survey of the
state-of-the-art of geometric ergodicity and its applications. Applications to queueing
networks are surveyed in [267].
Chapter 16
V -Uniform ergodicity
In this chapter we introduce the culminating form of the geometric ergodicity theorem,
and show that such convergence can be viewed as geometric convergence of an operator
norm; simultaneously, we show that the classical concept of uniform (or strong) ergod-
icity, where the convergence in (13.4) is bounded independently of the starting point,
becomes a special case of this operator norm convergence.
We also take up a number of other consequences of the geometric ergodicity proper-
ties proven in Chapter 15, and give a range of examples of this behavior. For a number
of models, including random walk, time series and state space models of many kinds,
these examples have been held back to this point precisely because the strong form of
ergodicity we now make available is met as the norm, rather than as the exception. This
is apparent in many of the calculations where we verified the ergodic drift conditions
(V2) or (V3): often we showed in these verifications that the stronger form (V4) actu-
ally held, so that unwittingly we had proved V -uniform or geometric ergodicity when
we merely looked for conditions for ergodicity.
To formalize V -uniform ergodicity, let P1 and P2 be Markov transition functions,
and for a positive function ∞ > V ≥ 1, define the V -norm distance between P1 and P2
as
$$|||P_1 - P_2|||_V := \sup_{x \in X} \frac{\| P_1(x, \,\cdot\,) - P_2(x, \,\cdot\,) \|_V}{V(x)}. \qquad (16.1)$$
The outer product of the function 1 and the measure π is denoted 1 ⊗ π: this is the kernel given by [1 ⊗ π](x, A) := π(A) for all x ∈ X, A ∈ B(X).

V-uniform ergodicity

An ergodic chain Φ is called V-uniformly ergodic if

$$|||P^n - 1 \otimes \pi|||_V \to 0, \qquad n \to \infty. \qquad (16.2)$$
(iii) There exists some n > 0 such that |||Pⁱ − 1 ⊗ π|||_V < ∞ for i ≤ n and

$$|||P^n - 1 \otimes \pi|||_V < 1.$$

(iv) The drift condition (V4) holds for some petite set C and some V₀, where V₀ is equivalent to V in the sense that for some c ≥ 1,

$$c^{-1} V \le V_0 \le c V.$$
Proof That (i), (ii) and (iii) are equivalent follows from Proposition 16.1.3. The
fact that (ii) follows from (iv) is proven in Theorem 16.1.2, and the converse, that (ii)
implies (iv), is Theorem 16.1.4.
Secondly, we show that V-uniform ergodicity implies that the chain is strongly mixing. In fact, it is shown in Theorem 16.1.5 that for a V-uniformly ergodic chain, there exist R and ρ < 1 such that for any g², h² ≤ V and k, n ∈ Z₊,

$$|E_x[g(\Phi_k) h(\Phi_{n+k})] - E_x[g(\Phi_k)]\, E_x[h(\Phi_{n+k})]| \le R\, \rho^n\, [1 + \rho^k V(x)].$$
Finally in this chapter, using the form (16.3), we connect concepts of geometric
ergodicity with one of the oldest, and strongest, forms of convergence in the study of
Markov chains, namely uniform ergodicity (sometimes called strong ergodicity).
Uniform ergodicity
A chain Φ is called uniformly ergodic if it is V -uniformly ergodic in the
special case where V ≡ 1, that is, if
$$\sup_{x \in X} \| P^n(x, \,\cdot\,) - \pi \| \to 0, \qquad n \to \infty. \qquad (16.6)$$
There are a large number of stability properties all of which hold uniformly over the
whole space when the chain is uniformly ergodic.
Theorem 16.0.2. For any Markov chain Φ the following are equivalent:
(ii) There exist r > 1 and R < ∞ such that for all x

$$\| P^n(x, \,\cdot\,) - \pi \| \le R\, r^{-n}; \qquad (16.7)$$

that is, the convergence in (16.6) takes place at a uniform geometric rate.

(iv) The chain is aperiodic and Doeblin's condition holds: that is, there is a probability measure φ on B(X) and ε < 1, δ > 0, m ∈ Z₊ such that whenever φ(A) > ε,

$$\inf_{x \in X} P^m(x, A) \ge \delta.$$

(vii) The chain is aperiodic and there is a petite set C and a κ > 1 with

$$\sup_{x \in X} E_x[\kappa^{\tau_C}] < \infty.$$

$$\| P^n(x, \,\cdot\,) - \pi \| \le 2\rho^{n/m} \qquad (16.11)$$

where ρ = 1 − ν_m(X).
$$|f|_V := \sup_{x \in X} \frac{|f(x)|}{V(x)} < \infty.$$
Proof This is largely a restatement of the result in Theorem 15.4.1. From Theo-
rem 15.4.1 for some R < ∞, ρ < 1,

$$\| P^n(x, \,\cdot\,) - \pi \|_V \le R\, V(x)\, \rho^n, \qquad n \in \mathbb{Z}_+,$$
Because ||| · |||V is a norm it is now easy to show that V -uniformly ergodic chains are
always geometrically ergodic, and in fact V -geometrically ergodic.
Proposition 16.1.3. Suppose that π is an invariant probability and that for some n0 ,
Proof Since ||| · |||_V is an operator norm we have for any m, n ∈ Z₊, using the invariance of π,

$$|||P^{n+m} - 1 \otimes \pi|||_V = |||(P - 1 \otimes \pi)^n (P - 1 \otimes \pi)^m|||_V \le |||P^n - 1 \otimes \pi|||_V\; |||P^m - 1 \otimes \pi|||_V.$$

Writing n = kn₀ + i with 0 ≤ i < n₀, and setting M = max_{i<n₀} |||Pⁱ − 1 ⊗ π|||_V and γ = |||P^{n₀} − 1 ⊗ π|||_V < 1, this gives

$$|||P^n - 1 \otimes \pi|||_V \le |||P^i - 1 \otimes \pi|||_V\; |||P^{n_0} - 1 \otimes \pi|||_V^k \le M^i \gamma^k \le M^{n_0} \gamma^{-1} (\gamma^{1/n_0})^n$$
Theorem 16.1.4. Suppose that Φ is ψ-irreducible, and that for some V ≥ 1 there exist r > 1 and R < ∞ such that for all n ∈ Z₊

$$|||P^n - 1 \otimes \pi|||_V \le R\, r^{-n}. \qquad (16.13)$$

Then the drift condition (V4) holds for some V₀, where V₀ is equivalent to V in the sense that for some c ≥ 1,

$$c^{-1} V \le V_0 \le c V. \qquad (16.14)$$
Proof Fix C ∈ B⁺(X) as any petite set. Then we have from (16.13) the bound

$$P^n(x, C) \ge \pi(C) - R\, \rho^n\, V(x), \qquad \rho := r^{-1} < 1,$$

and hence the sublevel sets of V are petite by Proposition 5.5.4 (i), and so V is unbounded off petite sets.
From the bound
$$P^n V \le R\rho^n V + \pi(V) \qquad (16.15)$$
we see that (15.35) holds for the n-skeleton whenever Rρn < 1. Fix n with Rρn < e−1 ,
and set

$$V_0(x) := \sum_{i=0}^{n-1} \exp[i/n]\, P^i V.$$

$$V_0 \le e\, n R V + n\pi(V),$$
This shows that (15.35) also holds for Φ, and hence by Lemma 15.2.8 the drift condition
(V4) holds with this V0 , and some petite set C.
Thus we have proved the equivalence of (ii) and (iv) in Theorem 16.0.1.
where the supremum is taken over all k ∈ Z+ , and all g and h such that |g(x)|, |h(x)| ≤ 1
for all x ∈ X.
In the following result we show that V-uniformly ergodic chains satisfy a much stronger property. We will call Φ V-geometrically mixing if there exist R < ∞, ρ < 1 such that

$$\sup |E_x[g(\Phi_k) h(\Phi_{n+k})] - E_x[g(\Phi_k)]\, E_x[h(\Phi_{n+k})]| \le R\, V(x)\, \rho^n, \qquad n \in \mathbb{Z}_+,$$

where we now extend the supremum to include all k ∈ Z₊, and all g and h such that g²(x), h²(x) ≤ V(x) for all x ∈ X.
Theorem 16.1.5. If Φ is V -uniformly ergodic, then there exists R < ∞ and ρ < 1
such that for any g 2 , h2 ≤ V and k, n ∈ Z+ ,
$$|E_x[g(\Phi_k) h(\Phi_{n+k})] - E_x[g(\Phi_k)]\, E_x[h(\Phi_{n+k})]| \le R\, \rho^n\, [1 + \rho^k V(x)],$$
and hence the chain Φ is V -geometrically mixing.
Proof For any h² ≤ V, g² ≤ V let h̄ = h − π(h), ḡ = g − π(g). We have by V-uniform ergodicity as in Lemma 15.2.9 that for some R < ∞, ρ < 1,

$$|E_x[\bar h(\Phi_k)\, \bar g(\Phi_{k+n})]| = \bigl|E_x\bigl[\bar h(\Phi_k)\, E_{\Phi_k}[\bar g(\Phi_n)]\bigr]\bigr| \le R\, \rho^n\, E_x\bigl[|\bar h(\Phi_k)|\, \sqrt{V(\Phi_k)}\bigr].$$

Since |h̄| ≤ (1 + ∫V^{1/2} dπ)V^{1/2} we can set R′ = R(1 + ∫V^{1/2} dπ) and apply (15.35) to obtain the bound

$$|E_x[\bar h(\Phi_k)\, \bar g(\Phi_{k+n})]| \le R'\, \rho^n\, E_x[V(\Phi_k)] \le R'\, \rho^n\, \Bigl[\frac{L}{1-\lambda} + \lambda^k V(x)\Bigr].$$

Assuming without loss of generality that ρ ≥ λ, and using the bounds

$$|\pi(h) - E_x[h(\Phi_k)]| \le R\, \rho^k \sqrt{V(x)}, \qquad |\pi(g) - E_x[g(\Phi_{k+n})]| \le R\, \rho^{k+n} \sqrt{V(x)}$$

gives the result for some R < ∞.
It follows from Theorem 16.1.5 that if the chain is V -uniformly ergodic, then for
some R1 < ∞,
$$|E_x[\bar h(\Phi_k)\, \bar g(\Phi_{k+n})]| \le R_1\, \rho^n\, [1 + \rho^k V(x)], \qquad k, n \in \mathbb{Z}_+, \qquad (16.16)$$

where h̄ = h − π(h), ḡ = g − π(g).
By integrating both sides of (16.16) over X, the initial condition x may be replaced
with a finite bound for any initial distribution µ with µ(V ) < ∞, and a mixing condition
will be satisfied for such initial conditions. In the particular case where µ = π we have
by stationarity and finiteness of π(V ) (see Theorem 14.3.7)
$$|E_\pi[\bar h(\Phi_k)\, \bar g(\Phi_{k+n})]| \le R_2\, \rho^n, \qquad k, n \in \mathbb{Z}_+, \qquad (16.17)$$
for some R2 < ∞; and hence the stationary version of the process satisfies a geometric
mixing condition under (V4).
where γ > 0.
Thus (V4) also holds on C^c, and we conclude that the chain is e^{γx}-uniformly ergodic. Moreover, from Theorem 16.0.1 we also have that

$$\Bigl|\int P^n(x, dy)\, e^{\gamma y} - \int \pi(dy)\, e^{\gamma y}\Bigr| < e^{\gamma x}\, r^{-n},$$
so that the moment-generating functions of the model, and moreover all polynomial
moments, converge geometrically quickly to their limits with known bounds on the
state-dependent constants.
This is the same result we showed in Section 15.1.4 for the forward recurrence time
chain on Z+ ; here we have used the drift conditions rather than the direct calculation
of hitting times to establish geometric ergodicity.
It is obvious from its construction that for this chain the condition Γ ∈ G + (γ) is
also necessary for geometric ergodicity.
The condition for uniform ergodicity for the forward recurrence time chain is also
trivial to establish, from the criterion in Theorem 16.0.2 (vi). We will only have this
condition holding if Γ is of bounded range so that Γ[0, c] = 1 for some finite c; in
this case we may take the state space X equal to the compact absorbing set [0, c]. The
existence of such a compact absorbing subset is typical of many uniformly ergodic chains
in practice.
Random walk on R+
Consider now the random walk on [0, ∞), defined by (RWHL1). Suppose that the model
has an increment distribution Γ such that
(a) the mean increment β = ∫ x Γ(dx) < 0;

(b) the distribution Γ is in G⁺(γ), for some γ > 0.

Let us choose V(x) = exp(sx), where 0 < s < γ is to be selected. Then we have

$$\Delta V(x)/V(x) = \int_{-x}^{\infty} \Gamma(dw)\,[e^{sw} - 1] + \Gamma(-\infty, -x]\,[e^{-sx} - 1]$$
$$\le \int_{-\infty}^{\infty} \Gamma(dw)\,[e^{sw} - 1] + \int_{-\infty}^{-x} \Gamma(dw)\,[1 - e^{sw}]. \qquad (16.19)$$
Proof Obviously (i) implies (iii); but from Proposition 16.1.3 we see that (iii)
implies (ii), which clearly implies (i) as required.
Note that uniform ergodicity implies, trivially, that the chain actually is π-irreducible
and aperiodic, since for π(A) > 0 there exists n with P n (x, A) ≥ π(A)/2 for all x.
We next prove that (v)–(viii) of Theorem 16.0.2 are equivalent to uniform ergodicity.
Theorem 16.2.2. The following are equivalent for a ψ-irreducible aperiodic chain:

(i) Φ is uniformly ergodic.

(ii) The state space X is ν_m-small for some m ∈ Z₊ and some non-trivial measure ν_m.

(iii) There is a petite set C with sup_{x∈X} E_x[τ_C] < ∞, in which case for every A ∈ B⁺(X) we have sup_{x∈X} E_x[τ_A] < ∞.

(iv) There is a petite set C and a κ > 1 with sup_{x∈X} E_x[κ^{τ_C}] < ∞, in which case for every A ∈ B⁺(X) we have sup_{x∈X} E_x[κ_A^{τ_A}] < ∞ for some κ_A > 1.

(v) There is an everywhere bounded solution V to (16.10) for some petite set C.
Proof Observe that the drift inequality (11.17) given in (V2) and the drift in-
equality (16.10) are identical for bounded V . The equivalence of (iii) and (v) is thus a
consequence of Theorem 11.3.11, whilst (iv) implies (iii) trivially and Theorem 15.2.6
shows that (v) implies (iv): such connections between boundedness of τA and solutions
of (16.10) are by now standard.
To see that (i) implies (ii), observe that if (i) holds, then Φ is π-irreducible and
hence there exists a small set A ∈ B+ (X). Then, by (i) again, for some n0 ∈ Z+ ,
inf x∈X P n 0 (x, A) > 0 which shows that X is small from Theorem 5.2.4.
The implication that (ii) implies (v) is equally simple. Let V ≡ 1, β = b = 1/2, and C = X. We then have

$$\Delta V = -\beta V + b I_C,$$
Historically, one of the most significant conditions for ergodicity of Markov chains
is Doeblin’s condition.
402 V -Uniform ergodicity
Doeblin’s condition
Suppose there exists a probability measure φ with the property that for
some m, ε < 1, δ > 0
for every x ∈ X.
From the equivalences in Theorem 16.2.1 and Theorem 16.2.2, we are now in a
position to give a very simple proof of the equivalence of uniform ergodicity and this
condition.
Theorem 16.2.3. An aperiodic ψ-irreducible chain Φ satisfies Doeblin’s condition if
and only if Φ is uniformly ergodic.
Proof Let C be any petite set with φ(C) > ε and consider the test function V(x) = 1 + I_{C^c}(x). Doeblin's condition gives P^m(x, C) ≥ δ for every x, so that for the m-skeleton

$$\Delta V(x) = P^m V(x) - V(x) \le 2 - \delta - V(x) = -\delta + I_C(x) \le -\tfrac12 \delta\, V(x) + I_C(x).$$
Hence V is a bounded solution to (16.10) for the m-skeleton, and it is thus the case that
the m-skeleton and the original chain are uniformly ergodic by the contraction property
of the total variation norm.
Conversely, we have from uniform ergodicity in the form (16.7) that for any ε > 0,
if π(A) ≥ ε then

$$P^n(x, A) \ge \varepsilon - R\rho^n \ge \varepsilon/2$$
for all n large enough that Rρn ≤ ε/2, and Doeblin’s condition holds with φ = π.
Thus we have proved the final equivalence in Theorem 16.0.2. We conclude by
exhibiting the one situation where the bounds on convergence are simply calculated.
Theorem 16.2.4. If a chain Φ satisfies

$$P^m(x, A) \ge \nu_m(A), \qquad x \in X, \; A \in \mathcal{B}(X), \qquad (16.20)$$

for a non-trivial measure ν_m, then

$$\| P^n(x, \,\cdot\,) - \pi \| \le 2\rho^{n/m} \qquad (16.21)$$

where ρ = 1 − ν_m(X).
Proof This can be shown using an elegant argument based on the assumption
(16.20) that the whole space is small which relies on a coupling method closely connected
to the way in which the split chain is constructed.
Write (16.20) as
P m (x, A) ≥ (1 − ρ)ν(A) (16.22)
where ν = νm /(1 − ρ) is a probability measure.
Assume first for simplicity that m = 1. Run two copies of the chain, one from the
initial distribution concentrated at x and the other from the initial distribution π. At
every time point either
(a) with probability 1 − ρ, choose for both chains the same next position from the
distribution ν, after which they will be coupled and then can be run with identical
sample paths; or
(b) with probability ρ, choose for each chain an independent position, using the dis-
tribution (as in the split chain construction) [P (x, · ) − (1 − ρ)ν( · )]/ρ, where x is
the current position of the chain.
If T denotes the first time that the two chains are matched using option (a), then P(T > n) = ρⁿ, so that

$$\| P^n(x, \,\cdot\,) - \pi \| \le 2 P(T > n) \le 2\rho^n \qquad (16.23)$$

which is (16.21).
When m > 1 we can use the contraction property as in Proposition 16.1.3 to give
(16.21) in the general case.
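The coupling construction in this proof is directly implementable. The sketch below runs it for a hypothetical 3 × 3 chain with m = 1, using the row-minimum minorization P(x, j) ≥ min_x P(x, j); the coupling attempts are i.i.d. Bernoulli(1 − ρ), so the empirical P(T > n) matches the bound ρⁿ in (16.23).

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
nu_m = P.min(axis=0)                     # minorization: P(x, j) >= nu_m(j)
rho = 1.0 - nu_m.sum()                   # rho = 1 - nu_m(X)
nu = nu_m / (1.0 - rho)                  # normalized coupling measure
residual = (P - (1.0 - rho) * nu) / rho  # residual kernel; rows sum to 1

def coupling_time(x, y, n_max=100):
    """Run two copies until step (a) fires; return that (coupling) time."""
    for n in range(1, n_max + 1):
        if rng.random() < 1.0 - rho:     # (a): move both together via nu
            return n
        x = rng.choice(3, p=residual[x]) # (b): independent residual moves
        y = rng.choice(3, p=residual[y])
    return n_max + 1

times = np.array([coupling_time(0, 2) for _ in range(20000)])
for n in (1, 3, 5, 8):
    print(n, (times > n).mean(), rho ** n)  # empirical P(T > n) vs rho^n
```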
The optimal use of these many equivalent conditions for uniform ergodicity depends
of course on the context of use. In practice, this last theorem, since it identifies the
exact rate of convergence, is perhaps the most powerful, and certainly gives substantial
impetus to identifying the actual minorization measure which renders the whole space
a small set.
It can also be of importance to use these conditions in assessing when uniform
convergence does not hold: for example, in the forward recurrence time chain V_δ⁺ it is immediate from Theorem 16.2.2 (iii) that, since the mean return time to [0, δ] from x is of order x, the chain cannot be uniformly ergodic unless the state space can be reduced to a compact set.
Similar remarks apply to random walk on the half line: we see this explicitly in the
simple random walk of Section 15.5, but it is a rather deeper result [69] that for general
random walk on [0, ∞), Ex [τ0 ] ∼ cx so such chains are never uniformly ergodic.
$$\ell_m(i) := \Bigl[\frac{i-1}{i\beta}\Bigr]^m, \qquad i \ge 1, \; m \ge 0.$$

Note that for i = 1 we have ℓ_m(1) = 0 for all m, but for i > 1

$$\Bigl[\frac{i-1}{i\beta}\Bigr]^{m+1} - \Bigl[\frac{i-1}{i\beta}\Bigr]^m = \Bigl[\frac{i-1}{i\beta}\Bigr]^m\, \frac{i-1-i\beta}{i\beta} \ge 1$$

since (i − 1 − iβ)/(iβ) ≥ (3i − 1)/i ≥ 2. Hence from the second rung up, this sequence ℓ_m(i) forms a strictly monotone increasing set of states along the rung.
The transition mechanism we consider provides a chain satisfying Doeblin's condition. We suppose P is given by

$$P(i, \ell_m(i);\, 0, 0) = 1 - \beta, \qquad i = 1, 2, \ldots, \; m = 1, 2, \ldots,$$
$$P(0, 0;\, i, j) = \alpha_{ij}, \qquad i, j \in X,$$
$$P(0, k;\, 0, 0) = 1, \qquad k > 0;$$

$$E_{i,\ell_m(i)}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\Bigr] \le \Bigl[\frac{i-1}{i\beta}\Bigr]^m i, \qquad i, m = 1, 2, \ldots;$$
$$E_{i,k}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\Bigr] = k, \qquad k \ne \ell_m(i), \; m = 1, 2, \ldots.$$
and all other values except α00 as zero, and where c is chosen to ensure that the αik
form a probability distribution.
With this choice we have

$$E_{0,0}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\Bigr] \le 1 + \sum_{i \ge 1} \sum_{k \ne \ell_m(i),\, m \ge 0} k\, 2^{-i-k} + \sum_{i \ge 1} \sum_{m=0}^{\infty} 2^{-i-\ell_m(i)}\, i \le 1 + 2 \sum_{i \ge 1} i\, 2^{-i} < \infty$$
so that the chain is certainly f-ergodic by Theorem 14.0.1. However for any r ∈ (1, β⁻¹),

$$E_{i,1}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\, r^n\Bigr] = (1-\beta) \sum_{n=0}^{\infty} \beta^n r^n \sum_{m=0}^{n} \ell_m(i) \ge (1-\beta) \sum_{n=0}^{\infty} (\beta r)^n \sum_{m=0}^{n} \Bigl(\Bigl[\frac{i-1}{i\beta}\Bigr]^m - 1\Bigr)$$
$$= (1-\beta) \sum_{n=0}^{\infty} (\beta r)^n \Bigl(\frac{[(i-1)/i\beta]^{n+1} - 1}{[(i-1)/i\beta] - 1} - (n+1)\Bigr),$$

which is infinite if

$$\Bigl[\frac{i-1}{i\beta}\Bigr] \beta r > 1;$$
that is, for those rungs i such that i > r/(r − 1). Since there is positive probability of reaching such rungs in one step from (0, 0) it is immediate that

$$E_{0,0}\Bigl[\sum_{n=0}^{\tau_{0,0}-1} f(\Phi_n)\, r^n\Bigr] = \infty$$

for all r > 1, and hence from Theorem 15.4.2 for all r > 1

$$\sum_n r^n\, \| P^n(0, 0;\, \cdot\,) - \pi \|_f = \infty.$$
We have thus demonstrated that the strongest rate of convergence in the simple total
variation norm may not be inherited, even by the simplest of unbounded functions; and
that one really needs, when considering such functions, to use criteria such as (V4) to
ensure that these functions converge geometrically.
Theorem 16.2.5. If Φ is a ψ-irreducible and aperiodic T-chain, and if the state space
X is compact, then Φ is uniformly ergodic.
One specific model, the nonlinear state space model, is also worth analyzing in more
detail to show how we can identify other conditions for uniform ergodicity.
In a manner similar to the proof of Theorem 16.2.5 we show that the NSS(F) model defined by (NSS1) and (NSS2) is uniformly ergodic, provided that the associated control model CM(F) is stable in the sense of Lagrange, so that in effect the state space is reduced to a compact invariant subset.
Lagrange stability
The CM(F ) model is called Lagrange stable if A+ (x) is compact for each
x ∈ X.
Typically in applications, when the CM(F ) model is Lagrange stable the input
sequence will be constrained to lie in a bounded subset of Rp . We stress however that
no conditions on the input are made in the general definition of Lagrange stability.
The key to analyzing the NSS(F ) corresponding to a Lagrange stable control model
lies in the following lemma:
Lemma 16.2.6. Suppose that the CM(F ) model is forward accessible, Lagrange stable,
M -irreducible and aperiodic, and suppose that for the NSS(F ) model conditions (NSS1)–
(NSS3) are satisfied.
Then for each x ∈ X the set A+ (x) is closed, absorbing, and small.
Proof By Lagrange stability it is sufficient to show that any compact and invariant
set C ⊂ X is small. This follows from Theorem 7.3.5 (ii), which implies that compact
sets are small under the conditions of the lemma.
Using Lemma 16.2.6 we now establish geometric convergence of the expectation of
functions of Φ:
Theorem 16.2.7. Suppose the NSS(F ) model satisfies conditions (NSS1)–(NSS3) and
that the associated control model CM(F ) is forward accessible, Lagrange stable, M -
irreducible and aperiodic.
Then a unique invariant probability π exists, and the chain restricted to the absorbing
set A+ (x) is uniformly ergodic for each initial condition.
Hence also for every function f : X → R which is uniformly bounded on compact
sets, and every initial condition,

$$E_y[f(\Phi_k)] \to \int f\, d\pi$$

at a geometric rate.
Recall that VC (x) = Ex [σC ] is the minimal solution to (16.25) from Theorem 11.3.5.
and

$$\int_{V(y) < V(x)} P(x, dy)\, \bigl(V(y) - V(x)\bigr)^2 \le d, \qquad (16.27)$$
$$\cdots + \frac{\delta^2}{2}\,\bigl(V(y) - V(x)\bigr)^2 \exp\{\delta\theta_x\, (V(y) - V(x))\} \qquad (16.28)$$

for some θ_x ∈ [0, 1], by using a second order Taylor expansion. Since V satisfies (16.25), the right hand side of (16.28) is bounded for x ∈ C^c by

$$1 - \delta + \frac{\delta^2}{2} \int_{V(y) < V(x)} P(x, dy)\, \bigl(V(y) - V(x)\bigr)^2 + \int_{V(y) \ge V(x)} P(x, dy)\, \frac{\delta^2}{2}\, \bigl(V(y) - V(x)\bigr)^2 \exp\{\delta\,(V(y) - V(x))\}$$
$$\le 1 - \delta + \frac{\delta^2}{2}\, d + \frac{\delta^{2-\xi}}{2} \int_{V(y) \ge V(x)} P(x, dy)\, \exp\{(\delta + \delta^{\xi/2})\,(V(y) - V(x))\}$$
$$\le 1 - \delta + \frac{\delta^2}{2}\, d + \frac{\delta^{2-\xi}}{2}\, c, \qquad (16.29)$$

for some ξ ∈ (0, 1) such that δ + δ^{ξ/2} < β, by virtue of (16.26) and (16.27), and the fact that x² is bounded by eˣ on R₊. This proves the theorem, since we have

$$1 - \delta + \frac{\delta^2}{2}\, d + \frac{\delta^{2-\xi}}{2}\, c < 1$$

for sufficiently small δ > 0, and thus (V4) holds for V*.
The typical example of this behavior, on which this proof is modeled, is the random
walk in Section 16.1.3. In that case V (x) = x, and (16.26) is the requirement that
Γ ∈ G + (γ). In this case we do not actually need (16.27), which may not in fact hold.
It is often easier to verify the conditions of this theorem than to evaluate directly the
existence of a test function for geometric ergodicity, as we shall see in the next section.
How necessary are the conditions of this theorem on the “tails” of the increments?
By considering for example the forward recurrence time chain, we see that for some
chains Γ ∈ G + (γ) may indeed be necessary for geometric ergodicity. However, geometric
tails are certainly not always necessary for geometric ergodicity: to demonstrate this
simply consider any i.i.d. process, which is trivially uniformly ergodic, regardless of its
“increment” structure.
It is interesting to note, however, that although they seem somewhat “proof depen-
dent”, the uniform bounds (16.26) and (16.27) on P that we have imposed cannot be
weakened in general when moving from ergodicity to geometric ergodicity.
We first show that we can ensure lack of geometric ergodicity if the drift to the right
is not uniformly controlled in terms of V as in (16.26), even for a chain satisfying all our
other conditions. To see this we consider a chain on Z+ with transition matrix given
by, for each i ∈ Z₊,

$$P(0, i) = \alpha_i > 0,$$
$$P(i, i-1) = \gamma_i > 0,$$
$$P(i, i+n) = [1 - \gamma_i][1 - \beta_i]\,\beta_i^n, \qquad n \in \mathbb{Z}_+, \qquad (16.30)$$

where ∑_i α_i = 1 and γ_i, β_i are less than unity for all i.
Provided ∑_i i α_i < ∞ and we choose γ_i sufficiently large that

$$[1 - \gamma_i]\beta_i/[1 - \beta_i] - \gamma_i \le -\varepsilon$$

for some ε > 0, then the chain is ergodic since V(x) = x satisfies (V2): this can be done if we choose, for example,

$$\gamma_i \ge \beta_i + \varepsilon[1 - \beta_i].$$
so P0 (τ0 > n) does not decrease geometrically quickly, and the chain is not geometrically
ergodic from Theorem 15.4.2 (or directly from Theorem 15.1.1).
In this example we have bounded variances for the left tails of the increment dis-
tributions, and exponential tails of the right increments: it is the lack of uniformity in
these tails that fails along with the geometric convergence.
To show the need for (16.27), consider the chain on Z+ with the transition matrix
(15.20) given for all j ∈ Z+ by P (0, 0) = 0 and
then clearly the right hand increments are uniformly bounded in relation to V for j > 0: but we find that

$$\sum_j P(i, j)\,\bigl(V_0(j) - V_0(i)\bigr)^2 = P(i, 0)\,[1 - \beta_i]^{-2} = [1 - \beta_i]^{-1} \to \infty, \qquad i \to \infty.$$
Hence (16.27) is necessary in this model for the conclusion of Theorem 16.3.1 to be
valid.
We say that the chain is “g-skip-free to the left” if there is some k ∈ Z+ , such that for
all x ∈ X,
P (x, Ag ,k (x)) = 0, (16.31)
so that the chain can only move a limited amount of “distance” through the sublevel
sets of g in one step. Note that such skip-free behavior precludes Doeblin’s condition if
g is unbounded off petite sets, and requires a more random-walk-like behavior.
Theorem 16.3.2. Suppose that Φ is geometrically ergodic. Then there exists β > 0
such that
$$\int \pi(dy)\, e^{\beta V_C(y)} < \infty \qquad (16.32)$$
Proof From geometric ergodicity, we have from Theorem 15.2.4 that for any petite set C ∈ B⁺(X) there exists r > 1 such that V(y) = G_C^{(r)}(y, X) satisfies (V4). It follows from Theorem 14.3.7 that π(V) < ∞. Using the interpretation (15.29) we have that

$$\infty > \pi(V) \ge \int \pi(dy)\, E_y[r^{\sigma_C}]. \qquad (16.34)$$
we have

$$\pi(j) \propto \gamma_j\,[1 - \beta_j]^{-1}$$
and so for suitable choice of γj we can clearly ensure that the tails of π are geometric
or otherwise in the given topology, regardless of the geometric ergodicity of P .
Proof We have seen in Section 11.4 that V (i) = i is a solution to (16.25) with
C = {0}.
Let us now assume that the service time distribution H ∈ G⁺(γ). We prove that (16.26) and (16.27) hold. Application of Theorem 16.3.1 then proves V*-uniform ergodicity of the embedded Markov chain, where V*(i) = e^{δi} for some δ > 0.

Let a_k denote the probability of k arrivals within one service. Note that (16.27) trivially holds, since ∑_{j≤k} P(k, j)(j − k)² ≤ a₀. For l ≥ 0 we have

$$P(k, k+l) = a_{l+1} = \frac{1}{(l+1)!} \int_0^{\infty} e^{-\lambda t} (\lambda t)^{l+1}\, dH(t).$$

Let δ > 0, so that

$$\sum_{l \ge 0} e^{\delta(l+1)}\, P(k, k+l) \le \int_0^{\infty} \exp\{(e^{\delta} - 1)\lambda t\}\, dH(t),$$

which is finite when (e^δ − 1)λ < γ, since H ∈ G⁺(γ). Thus we have the result.
To apply the results in this section, observe that for this embedded chain there are only finitely many different possible one-step increments, depending on whether Φ_{kn} exceeds k̄ or equals x < k̄. Combined with the linearity of V, we conclude that both sums

$$\Bigl\{\sum_{j: V(j) \ge V(i)} P(i, j)\, e^{\lambda(V(j) - V(i))} : i \in X\Bigr\}$$

and

$$\Bigl\{\sum_{j: V(j) < V(i)} P(i, j)\, \bigl(V(j) - V(i)\bigr)^2 : i \in X\Bigr\}$$
have only finitely many non-zero elements. We must ensure that these expressions are
all finite, but it is straightforward to check as in Theorem 16.4.1 that convergence of
the Laplace–Stieltjes transforms of the service time distributions in a neighborhood of
0 is sufficient to achieve this, and the theorem follows.
for some constant c > 0, so that V₀(i) = c i₁ + c i₂ is thus linear in both components of the state variable for i ≠ 0.
Theorem 16.4.3. The h-approximation of the M/PH/1 queue as in (16.36) is geo-
metrically ergodic whenever it is ergodic, provided the phase distribution of the service
times is in G + (γ) for some γ > 0.
In particular if there are a finite number of phases ergodicity is equivalent to geo-
metric ergodicity for the h-approximation.
$$\Phi_{k+1} = (A + \Gamma_{k+1})\Phi_k + W_{k+1}. \qquad (16.38)$$
Such models are developed in detail in [299], and we will assume familiarity with the
Kronecker product “⊗” and the “vec” operations, used in detail there. In particular we
use the basic identities
$$\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B), \qquad (A \otimes B)' = (A' \otimes B'). \qquad (16.39)$$
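As a quick numerical illustration (not from the text), the identities (16.39) can be checked directly with random matrices; here “vec” is the column-stacking operator, matching the convention in [299].

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

def vec(M):
    # Column-stacking vec operator (Fortran order).
    return M.reshape(-1, order="F")

# vec(ABC) = (C' kron A) vec(B)
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# (A kron B)' = (A' kron B')
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
```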
(RCA2) The following expectations exist, and have the prescribed values:
Theorem 16.5.1. If the assumptions (RCA1)–(RCA3) hold for the Markov chain de-
fined in (16.38), then Φ is V -uniformly ergodic, where V (x) = |x|2 . Thus these as-
sumptions suffice for a second-order stationary version of Φ to exist.
Proof   Under the assumptions of the theorem the chain is weak Feller and we can
take ψ as $\mu_{\mathrm{Leb}}$ on $\mathbb{R}^n$. Hence from Theorem 6.2.9 the chain is an irreducible T-chain,
and compact subsets of the state space are petite. Aperiodicity is immediate from
the density assumption (RCA3). We could also apply the techniques of Chapter 7 to
conclude that Φ is a T-chain, and this would allow us to weaken (RCA3).
To prove $|x|^2$-uniform ergodicity we will use the following two results, which are
proved in [299]. Suppose that (RCA1) and (RCA2) hold, and let N be any n × n
positive definite matrix.
(i) If M is defined by (16.40), then
$$E[\Phi_k'(A + \Gamma_{k+1})' M (A + \Gamma_{k+1})\Phi_k \mid \Phi_k = x] = x'Mx - x'Nx. \qquad (16.41)$$
Now let N be any positive definite (n × n)-matrix and define M as in (16.40). Then
with $V(x) := x'Mx$ we obtain
$$PV(x) \le \lambda V(x) + L$$
for some λ < 1 and L < ∞, from which we see that (V4) follows, using Lemma 15.2.8.
Finally, note that for some constant c we must have $c^{-1}|x|^2 \le V(x) \le c|x|^2$, and the
result is proved.
This is far from (V4), but applying the operator P to the function $\tilde\theta^2 y^2$ gives
$$P\,\tilde\theta^2 y^2 = E\Big[\Big(\frac{\alpha\sigma_0^2\tilde\theta - \alpha\Sigma y W_1}{\sigma_0^2 + \Sigma y^2} + Z_1\Big)^2\big(\tilde\theta y + W_1\big)^2\Big]
= \sigma_z^2\tilde\theta^2 y^2 + \sigma_z^2\sigma_w^2 + \Big(\frac{\alpha}{\sigma_0^2+\Sigma y^2}\Big)^2 E\big[(\sigma_0^2\tilde\theta - \Sigma y W_1)^2(\tilde\theta y + W_1)^2\big]$$
When $\sigma_z^2 < 1$ we combine (16.45)–(16.47) to find, for any $1 > \rho > \max(\sigma_z^2, \alpha^4)$, con-
stants R < ∞ and ε₀ > 0 such that with V defined in (16.44), PV ≤ ρV + R. Applying
Theorem 16.1.2 and Lemma 15.2.8 we have proved
Proposition 16.5.2. The Markov chain Φ is V-uniformly ergodic whenever $\sigma_z^2 < 1$,
with V given by (16.44); and for all initial conditions x ∈ X, as k → ∞,
$$E_x[Y_k^2] \to \int y^2\, d\pi \qquad (16.48)$$
at a geometric rate.
Hence the performance of the closed loop system is characterized by the unique
invariant probability π.
From ergodicity of the model it can be shown that in steady state θ̃k = θk − E[θk |
Y0 , . . . , Yk ], and Σk = E[θ̃k2 | Y0 , . . . , Yk ]. Using these identities we now obtain bounds
on performance of the closed loop system by integrating the system equations with
respect to the invariant measure.
Taking expectations in (2.23) and (2.24) under the probability Pπ gives
Hence, by subtraction, and using the identity Eπ [|θ̃0 |2 ] = Eπ [Σ0 ], we can evaluate the
limit (16.48) as
$$E_\pi[Y_0^2] = \big(1 + \alpha^2 E_\pi[|\tilde\theta_0|^2]\big)\,\frac{\sigma_w^2}{1 - \sigma_z^2}. \qquad (16.49)$$
This shows precisely how the steady state performance is related to the disturbance
intensity σw2 , the parameter variation intensity σz2 , and the mean square parameter
estimation error Eπ [|θ̃0 |2 ].
Using obvious bounds on Eπ [Σ0 ] we obtain the following bounds on the steady state
performance in terms of the system parameters only:
[Figure 16.1 appears here: a plot of log₁₀ Y_k against k, for 0 ≤ k ≤ 1000, with the vertical axis ranging from 0 to 30.]
Figure 16.1: The output of the simple adaptive control model when the control Uk is set
equal to zero. The resulting process is equivalent to the dependent parameter bilinear
model with α = 0.99, Wk ∼ N (0, 0.01) and Zk ∼ N (0, 0.04).
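For readers who wish to reproduce Figure 16.1, the following sketch simulates the dependent parameter bilinear model with the parameters given in the caption. The recursions $\theta_{k+1} = \alpha\theta_k + Z_{k+1}$, $Y_{k+1} = \theta_k Y_k + W_{k+1}$ are assumed here, as in the dependent parameter bilinear model of Chapter 2; seeds and plotting choices are of course arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
alpha, n = 0.99, 1000
W = rng.normal(0.0, np.sqrt(0.01), n + 1)   # W_k ~ N(0, 0.01)
Z = rng.normal(0.0, np.sqrt(0.04), n + 1)   # Z_k ~ N(0, 0.04)

theta = np.zeros(n + 1)
Y = np.zeros(n + 1)
for k in range(n):
    Y[k + 1] = theta[k] * Y[k] + W[k + 1]       # bilinear output recursion
    theta[k + 1] = alpha * theta[k] + Z[k + 1]  # dependent parameter recursion

plt.plot(np.log10(np.abs(Y) + 1e-12))           # log scale, as in the figure
plt.xlabel("k"); plt.ylabel("log10 |Y_k|")
plt.show()
```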
16.6 Commentary*
This chapter brings together some of the oldest and some of the newest ergodic theorems
for Markov chains.
Initial results on uniform ergodicity for countable chains under, essentially, Doeblin’s
condition date to Markov [248]: transition matrices with a column bounded from zero
are often called Markov matrices. For general state space chains use of the condition
of Doeblin is in [93]. These ideas are strengthened in Doob [99], whose introduction
and elucidation of Doeblin’s condition as Hypothesis D (p. 192 of [99]) still guides the
analysis of many models and many applications, especially on compact spaces.
Other areas of study of uniformly ergodic (sometimes called strongly ergodic, or
quasi-compact) chains have a long history, much of it initiated by Yosida and Kakutani
[412] who considered the equivalence of (iii) and (v) in Theorem 16.0.2, as did Doob
[99]. Somewhat surprisingly, even for countable spaces the hitting time criterion of
Theorem 16.2.2 for uniformly ergodic chains appears to be as recent as the work of
Huang and Isaacson [164], with general-space extensions in Bonsdorff [38]; the obvious
value of a bounded drift function is developed in Isaacson and Tweedie [170] in the
countable space case. Nummelin ([303], Chapters 5.6 and 6.6) gives a discussion of
16.6. Commentary* 419
*
n
X= Hk ∪ E
k =0
where the Hi are disjoint absorbing sets and Φ restricted to any Hk is uniformly ergodic,
and E is uniformly transient.
The introduction to uniform ergodicity that we give here appears brief given the
history of such theory, but this is largely a consequence of the fact that we have built
up, for ψ-irreducible chains, a substantial set of tools which makes the approach to this
class of chains relatively simple.
Much of this simplicity lies in the ability to exploit the norm ||| · |||_V. This is a very
new approach. Although Kartashov [196, 197] has some initial steps in developing a
theory of general space chains using the norm ||| · |||_V, he does not link his results to
the use of drift conditions, and the appearance of V-uniform results is due largely to
recent observations of Hordijk and Spieksma [366, 163] in the countable space case.
Their methods are substantially different from the general state space version we
use, which builds on Chapter 15: the general space version was first developed in [277]
for strongly aperiodic chains. This approach shows that for V -uniformly ergodic chains,
it is in fact possible to apply the same quasi-compact operator theory that has been
exploited for uniformly ergodic chains, at least within the context of the space L∞ V .
This is far from obvious: it is interesting to note Kendall himself ([203], p. 183) saying
that “ ... the theory of quasi-compact operators is completely useless” in dealing with
geometric ergodicity, whilst Vere-Jones [406] found substantial difficulty in relating
standard operator theory to geometric ergodicity. This appears to be an area where
reasonable further advances may be expected in the theory of Markov chains.
It is shown in Athreya and Pantula [15] that an ergodic chain is always strong mixing.
The extension given in Section 16.1.2 for V -uniformly ergodic chains was proved for
bounded functions in [92], and the version given in Theorem 16.1.5 is essentially taken
from Meyn and Tweedie [277].
Verifying the V-uniform ergodicity properties is usually done through test functions
and drift conditions, as we have seen. Uniform ergodicity, by contrast, is generally either
trivial or rather difficult to verify in applications. Typically one must either take
the state space of the chain to be compact (or essentially compact), or be able to
apply the Doeblin or small set conditions, in order to obtain uniform ergodicity. The
identification of the rate of convergence in this last case is a powerful incentive to use
such an approach. The delightful proof in Theorem 16.2.4 is due to Rosenthal [341],
following the strong stopping time results of Aldous and Diaconis [1, 88], although the
result itself is inherent in Theorem 6.15 of Nummelin [303]. An application of this result
to Markov chain Monte Carlo methods is given by Tierney [385].
However, as we have shown, V -uniform ergodicity can often be obtained for some
V under much more readily obtainable conditions, such as a geometric tail for any
i.i.d. random variables generating the process. This is true for queues, general storage
models, and other random-walk-related models, as the application of the increment
analysis of Section 16.3 shows. Such chains were investigated in detail by Vere-Jones
[403] and Miller [284].
The results given in Section 16.3 and Section 16.3.2 are new in the case of general X,
but are based on a similar approach for countable spaces in Spieksma and Tweedie [368],
which also contains a partial converse to Theorem 16.3.2. There are some precursors to
these conditions: one obvious way of ensuring that P has the characteristics in (16.26)
and (16.27) is to require that the increments from any state are of bounded range, with
the range allowed depending on V, so that a bound of the form (16.50) holds for some b;
and in [243] it is shown that under the bounded range condition (16.50) an ergodic
chain is geometrically ergodic.
A detailed description of the polling system we consider here can be found in [2].
Note that in [2] the system is modeled slightly differently, with arrivals of the server at
each gate defining the times of the embedded process. The coupling construction used
to analyze the h-approximation to the phase-service model is based on [350] and clearly
is ideal for our type of argument. Further examples are given in [368].
For the adaptive control and linear models, as we have stressed, V -uniform ergodicity
is often actually equivalent to simple ergodicity: the examples in this chapter are chosen
to illustrate this. The analysis of the bilinear and the vector RCA model given here is
taken from Feigin and Tweedie [111]; the former had been previously analyzed by Tong
[387]. In a more traditional approach to RCA models through time series methods,
Nicholls and Quinn [299] also find (RCA2) appropriate when establishing conditions for
strict stationarity of Φ, and also when treating asymptotic results of estimators.
The adaptive model was introduced in [253] and a stability analysis appeared in
[270] where the performance bound (16.49) was obtained. Related results appeared in
[365, 148, 269, 130]. The stability of the multidimensional adaptive control model was
only recently resolved in Rayadurgam et al. [324].
Commentary for the second edition: In the first edition the vector-space setting
was credited to work of Kartashov (see preceding text). In fact its origin is the 1969 work
of Veinott [185] concerning controlled Markov models. Section 20.1 contains further
discussion on the recent evolution of topics in this chapter.
An early application of the skip-free condition is contained in [156], also in the
setting of controlled Markov models. Assumption (ii) of this paper is a version of the
g-skip-free property, in which the function g represents “reward” in a controlled model.
The implications of Doeblin’s condition to large deviations theory and to spectral
theory can be found in [140, 218, 408].
Chapter 17

Sample paths and limit theorems

Most of this chapter is devoted to the analysis of the series $S_n(g)$, where we define for
any function g on X,
$$S_n(g) := \sum_{k=1}^{n} g(\Phi_k). \qquad (17.1)$$
We are concerned primarily with four types of limit theorems for positive recurrent
chains possessing an invariant probability π:
(i) those which are based upon the existence of martingales associated with the chain;
(ii) the Strong Law of Large Numbers (LLN), which states that n−1 Sn (g) converges
to π(g) = Eπ [g(Φ0 )], the steady state expectation of g(Φ0 );
(iii) the Central Limit Theorem (CLT), which states that the sum Sn (g − π(g)), when
properly normalized, is asymptotically normally distributed;
(iv) the Law of the Iterated Logarithm (LIL) which gives precise upper and lower
bounds on the limit supremum of the sequence Sn (g − π(g)), again when properly
normalized.
The martingale results (i) provide insight into the structure of irreducible chains, and
make the proofs of more elementary ergodic theorems such as the LLN almost trivial.
Martingale methods will also prove to be very powerful when we come to the CLT for
appropriately stable chains.
The trilogy of the LLN, CLT and LIL provide measures of centrality and variability
for Φn as n becomes large: these complement and strengthen the distributional limit
theorems of previous chapters. The magnitude of variability is measured by the variance
given in the CLT, and one of the major contributions of this chapter is to identify the way
in which this variance is defined through the autocovariance sequence for the stationary
version of the process {g(Φk )}.
The three key limit theorems which we develop in this chapter using sample path
properties for chains which possess a unique invariant probability π are
LLN  We say that the Law of Large Numbers holds for a function g if
$$\lim_{n \to \infty} \frac{1}{n} S_n(g) = \pi(g) \quad \text{a.s. } [P_*]. \qquad (17.2)$$
CLT  We say that the Central Limit Theorem holds for g if there exists a constant
$0 < \gamma_g^2 < \infty$ such that for each initial condition x ∈ X,
$$\lim_{n \to \infty} P_x\big\{(n\gamma_g^2)^{-1/2} S_n(\bar g) \le t\big\} = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,$$
where $\bar g = g - \pi(g)$: that is, as n → ∞,
$$(n\gamma_g^2)^{-1/2} S_n(\bar g) \xrightarrow{d} N(0, 1).$$
LIL  When the CLT holds, we say that the Law of the Iterated Logarithm holds for g
if the limit infimum and limit supremum of the sequence
$$(2\gamma_g^2\, n \log\log(n))^{-1/2}\, S_n(\bar g)$$
are respectively −1 and +1 with probability one for each initial condition x ∈ X.
Strictly speaking, of course, the CLT is not a sample path limit theorem, although it
does describe the behavior of the sample path averages and these three “classical” limit
theorems obviously belong together.
Proofs of all of these results will be based upon martingale techniques involving the
path behavior of the chain, and detailed sample path analysis of the process between
visits to a recurrent atom.
Much of this chapter is devoted to proving that these limits hold under various
conditions. The following set of limit theorems summarizes a large part of this devel-
opment.
Theorem 17.0.1. Suppose that Φ is a positive Harris chain with invariant probability
π.
(i) The LLN holds for any g satisfying π(|g|) < ∞.
(ii) Suppose that Φ is V-uniformly ergodic. Let g be a function on X satisfying $g^2 \le V$,
and let $\bar g$ denote the centered function $\bar g = g - \int g\, d\pi$. Then the constant
$$\gamma_g^2 := E_\pi[\bar g^2(\Phi_0)] + 2\sum_{k=1}^{\infty} E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)] \qquad (17.3)$$
is well defined, non-negative and finite, and coincides with the asymptotic variance
$$\lim_{n \to \infty} \frac{1}{n} E_\pi\big[(S_n(\bar g))^2\big] = \gamma_g^2. \qquad (17.4)$$
(iii) If the conditions of (ii) hold and $\gamma_g^2 = 0$, then $n^{-1/2} S_n(\bar g) \to 0$ a.s. as n → ∞.
(iv) If the conditions of (ii) hold and if $\gamma_g^2 > 0$, then the CLT and LIL hold for the
function g.
Proof The LLN is proved in Theorem 17.1.7, and the CLT and LIL are proved in
Theorem 17.3.6 under conditions somewhat weaker than those assumed here.
It is shown in Lemma 17.5.2 and Theorem 17.5.3 that the asymptotic variance γg2 is
given by (17.3) under the conditions of Theorem 17.0.1, and the alternate representation
(17.4) of γg2 is given in Theorem 17.5.3. The a.s. convergence in (iii) when γg2 = 0 is
proved in Theorem 17.5.4.
While Theorem 17.0.1 summarizes the main results, the reader will find that there is
much more to be found in this chapter. We also provide here techniques for proving the
LLN and CLT in contexts far more general than given in Theorem 17.0.1. In particular,
these techniques lead to a functional CLT for f -regular chains in Section 17.4.
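Before turning to invariant σ-fields, here is a small numerical sketch of Theorem 17.0.1 (ii), using a linear AR(1) chain as an assumed test case (the chain, parameters, and names below are illustrative, not from the text): for $\Phi_{k+1} = a\Phi_k + W_{k+1}$ with W ~ N(0, 1) and g(x) = x, the autocovariance sum (17.3) evaluates in closed form to $\gamma_g^2 = (1-a)^{-2}$, which can be compared with the empirical variance of $n^{-1/2}S_n(g)$.

```python
import numpy as np

rng = np.random.default_rng(2)
a, n, reps = 0.5, 5_000, 400

def sn_over_sqrt_n():
    # One realization of n^{-1/2} S_n(g) for the AR(1) chain, with g(x) = x.
    phi, s = 0.0, 0.0
    for _ in range(n):
        phi = a * phi + rng.standard_normal()
        s += phi
    return s / np.sqrt(n)

samples = np.array([sn_over_sqrt_n() for _ in range(reps)])
gamma2 = 1.0 / (1.0 - a) ** 2   # closed form of (17.3) for this chain: gamma^2 = 4
print(samples.var(), gamma2)    # the empirical variance is close to gamma^2
```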
We begin with a discussion of invariant σ-fields, which form the basis of classical
ergodic theory.
When Y = IA for some A ∈ F, then the set A is called a Pµ -invariant event. The set
of all Pµ -invariant events is a σ-field, which we denote Σµ .
Suppose that an invariant probability measure π exists, and for now restrict attention
to the special case where µ = π. In this case, Σπ is equal to the family of invariant
events which is commonly used in ergodic theory (see for example Krengel [221]) and
is often denoted ΣI .
For a bounded, Pπ -invariant random variable Y we let hY denote the function
hY (x) := Ex [Y ], x ∈ X. (17.6)
Proof It follows from (17.7) that the adapted process (hY (Φk ), FkΦ ) is a convergent
martingale for which
lim hY (Φk ) = Y a.s. [Pπ ].
k →∞
When Φ0 ∼ π the process hY (Φk ) is also stationary, since Φ is stationary, and hence
the limit above shows that its sample paths are almost surely constant. That is, Y =
hY (Φk ) = hY (Φ0 ) a.s. [Pπ ] for all k ∈ Z+ .
It follows from Lemma 17.1.1 that if X ∈ L1 (Ω, F, Pπ ) then the Pπ -invariant random
variable E[X | Σπ ] is a function of Φ0 alone, which we shall denote X∞ (Φ0 ), or just
X∞ .
The function X∞ is significant because it describes the limit of the sample path
averages of {θk X}, as we show in the next result.
$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \theta^k X = X_\infty(x) \quad \text{a.s. } [P_x].$$
$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \theta^k X = E[X \mid \Sigma_\pi] = X_\infty(\Phi_0) \quad \text{a.s. } [P_\pi].$$
$$\int P_x\Big\{\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \theta^k X = X_\infty(x)\Big\}\, \pi(dx) = 1.$$
Since the integrand is always positive and less than or equal to one, this proves the
result.
$$Q\{A\} := \limsup_{k \to \infty} I\{\Phi_k \in A\}, \qquad \tilde\pi\{A\} := \limsup_{N \to \infty} \frac{1}{N}\sum_{k=1}^{N} I\{\Phi_k \in A\},$$
with A ∈ B(X).
A function h : X → R is called harmonic if, for all x ∈ X,
$$\int P(x, dy)\, h(y) = h(x). \qquad (17.8)$$
This is equivalent to the adapted sequence $(h(\Phi_k), \mathcal F_k^\Phi)$ possessing the martingale prop-
erty for each initial condition: that is,
$$E[h(\Phi_{k+1}) \mid \mathcal F_k^\Phi] = h(\Phi_k), \quad k \in Z_+, \quad \text{a.s. } [P_*].$$
For any measurable set A the function $h_{Q\{A\}}(x) = Q(x, A)$ is a measurable function of
x ∈ X which is easily shown to be harmonic. This correspondence is just one instance of
the following general result which shows that harmonic functions and invariant random
variables are in one-to-one correspondence in a well-defined way.
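On a finite state space this correspondence is easy to explore numerically: bounded harmonic functions are exactly the solutions of (P − I)h = 0, and for an irreducible transition matrix the solution space reduces to the constants. The following is a minimal sketch, with an assumed 3-state kernel, and of course not a substitute for the general proof.

```python
import numpy as np
from scipy.linalg import null_space

# An (assumed) irreducible 3-state transition matrix.
P = np.array([[0.1, 0.6, 0.3],
              [0.5, 0.2, 0.3],
              [0.3, 0.3, 0.4]])

H = null_space(P - np.eye(3))   # all solutions of P h = h
print(H.shape[1])               # 1: the harmonic functions form a one-dimensional space
print(H[:, 0] / H[0, 0])        # ... spanned by the constant function [1, 1, 1]
```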
Theorem 17.1.3. (i) If Y is bounded and invariant, then the function $h_Y$ is har-
monic, and
$$Y = \lim_{k \to \infty} h_Y(\Phi_k) \quad \text{a.s. } [P_*].$$
(ii) If h is bounded and harmonic, then the random variable $Y := \lim_{k\to\infty} h(\Phi_k)$ exists,
is bounded and invariant, and satisfies $h = h_Y$.
Proof For (i), first observe that by the Markov property and invariance we may
deduce as in the proof of Lemma 17.1.1 that
hY (Φk ) = E[Y | FkΦ ] a.s. [P∗ ].
Since Y is bounded, this shows that (hY (Φk ), FkΦ ) is a martingale which converges to
Y . To see that hY is harmonic, we use invariance of Y to calculate
P hY (x) = Ex [hY (Φ1 )] = Ex [E[Y | F1Φ ]] = hY (x).
To prove (ii), recall that the adapted process (h(Φk ), FkΦ ) is a martingale if h is har-
monic, and since h is assumed bounded, it is convergent. The conclusions of (ii) follow.
Theorem 17.1.3 shows that there is a one-to-one correspondence between invari-
ant random variables and harmonic functions. From this observation we have as an
immediate consequence
Proposition 17.1.4. The following two conditions are equivalent:
(i) All bounded harmonic functions are constant.
(ii) Σµ and hence Σ are Pµ -trivial for each initial distribution µ.
Finally, we show that when Φ is Harris recurrent, all bounded harmonic functions
are trivial.
Theorem 17.1.5. If Φ is Harris recurrent, then the constants are the only bounded
harmonic functions.
Proof We will give the proof for the LLN, since the proof of the result for the CLT
and LIL is identical.
Suppose that the LLN holds for the initial distribution µ₀, and let $g_\infty(x) =
P_x\{\frac{1}{n} S_n(g) \to \int g\, d\pi\}$. We have by assumption that
$$\int g_\infty\, d\mu_0 = 1.$$
We will now show that g∞ is harmonic, which together with Theorem 17.1.5 will imply
that g∞ is equal to the constant value 1, and thereby complete the proof. We have by
the Markov property and the smoothing property of the conditional expectation,
$$\begin{aligned}
P g_\infty(x) &= E_x\Big[P_{\Phi_1}\Big\{\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} g(\Phi_k) = \int g\, d\pi\Big\}\Big] \\
&= E_x\Big[P_x\Big\{\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} g(\Phi_{k+1}) = \int g\, d\pi \,\Big|\, \mathcal F_1^\Phi\Big\}\Big] \\
&= P_x\Big\{\lim_{n\to\infty} \Big[\frac{n+1}{n}\cdot\frac{1}{n+1}\sum_{k=1}^{n+1} g(\Phi_k) - \frac{g(\Phi_1)}{n}\Big] = \int g\, d\pi\Big\} \\
&= g_\infty(x).
\end{aligned}$$
From these results we may now provide a simple proof of the LLN for Harris chains.
Theorem 17.1.7. The following are equivalent when an invariant probability π exists
for Φ:
(i) Φ is positive Harris;
(ii) the LLN holds for every g satisfying π(|g|) < ∞;
(iii) every bounded invariant random variable is almost surely constant.
Proof (i) ⇒ (ii) If Φ is positive Harris with unique invariant probability π then
by Theorem 17.1.2, for each fixed f , there exists a set G ∈ B(X) of full π-measure such
that the conclusions of (ii) hold whenever the distribution of Φ0 is supported on G. By
Proposition 17.1.6 the LLN holds for every initial condition.
(ii) ⇒ (iii) Let Y be a bounded invariant random variable, and let hY be the as-
sociated bounded harmonic function defined in (17.6). By the hypotheses of (ii) and
Theorem 17.1.3 we have
$$Y = \lim_{k\to\infty} h_Y(\Phi_k) = \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} h_Y(\Phi_k) = \int h_Y\, d\pi \quad \text{a.s. } [P_*],$$
$$s_j(f) = \sum_{i=\sigma_\alpha(j)+1}^{\sigma_\alpha(j+1)} f(\Phi_i). \qquad (17.9)$$
By the strong Markov property the random variables $\{s_j(f) : j \ge 0\}$ are i.i.d. with
common mean
$$E_\alpha[s_1(f)] = E_\alpha\Big[\sum_{i=1}^{\tau_\alpha} f(\Phi_i)\Big] = \int f\, d\mu. \qquad (17.10)$$
i=1
Proof For the proof we assume that each of the functions f and g are positive.
The general case follows by decomposing f and g into their positive and negative parts.
We also assume that π is equal to the measure µ defined implicitly in (17.10). This
is without loss of generality as any invariant measure is a constant multiple of µ by
Theorem 10.0.1.
For $n \ge \sigma_\alpha$ we define
$$l_n := \max\{k : \sigma_\alpha(k) \le n\} = -1 + \sum_{k=0}^{n} I\{\Phi_k \in \alpha\}. \qquad (17.11)$$
17.2.2 The CLT and the LIL for chains possessing an atom
Here we show how the CLT and LIL may be proved under the assumption that an atom
α ∈ B + (X) exists.
The Central Limit Theorem (CLT) states that the normalized sum
$$(n\gamma_g^2)^{-1/2}\, S_n(\bar g)$$
converges in distribution to a standard Gaussian random variable, while the Law of the
Iterated Logarithm (LIL) provides sharp bounds on the sequence
$$(2\gamma_g^2\, n \log\log(n))^{-1/2}\, S_n(\bar g)$$
The actual variance of $\bar g(\Phi_k)$ in the stationary case is given by Theorem 10.0.1 as
$$\int \bar g^2\, d\pi = \pi\{\alpha\}\, E_\alpha\Big[\sum_{k=1}^{\tau_\alpha} \bar g^2(\Phi_k)\Big];$$
This condition will be generalized to obtain the CLT and LIL for general positive
Harris chains in Sections 17.3–17.5. We state here the results in the special case where
an atom is assumed to exist.
By the law of large numbers for the i.i.d. random variables $\{(s_j(|g|))^2 : j \ge 1\}$,
$$\lim_{N\to\infty} \frac{1}{N}\sum_{j=1}^{N} (s_j(|g|))^2 = E_\alpha[(s_0(|g|))^2] < \infty,$$
and hence
$$\lim_{N\to\infty}\Big[\frac{1}{N}\sum_{j=1}^{N} (s_j(|g|))^2 - \frac{1}{N-1}\sum_{j=1}^{N-1} (s_j(|g|))^2\Big] = 0.$$
From these two limits it follows that $(s_N(|g|))^2/N \to 0$ as N → ∞, and hence that
$$\limsup_{n\to\infty} \frac{s_{l_n}(|g|)}{\sqrt n} \le \limsup_{n\to\infty} \frac{s_{l_n}(|g|)}{\sqrt{l_n}} = 0 \quad \text{a.s. } [P_*]. \qquad (17.16)$$
This and (17.15) show that
$$\frac{1}{\sqrt n}\sum_{i=1}^{n} \bar g(\Phi_i) - \frac{1}{\sqrt n}\sum_{j=0}^{l_n-1} s_j(\bar g) \to 0 \quad \text{a.s. } [P_*]. \qquad (17.17)$$
We now need a more delicate argument to replace the random upper limit in the sum
$\sum_{j=0}^{l_n-1} s_j(\bar g)$ appearing in (17.17) with a deterministic upper bound.
First of all, note that
$$\frac{l_n}{\sum_{j=0}^{l_n} s_j(1)} \le \frac{l_n}{n} \le \frac{l_n}{\sum_{j=0}^{l_n-1} s_j(1)},$$
and hence
$$\lim_{n\to\infty} \frac{l_n}{n} = \Big(\lim_{n\to\infty} \frac{1}{l_n}\sum_{j=1}^{l_n} s_j(1)\Big)^{-1} = E_\alpha[s_0(1)]^{-1} = \pi\{\alpha\}. \qquad (17.18)$$
Let ε > 0, $\underline n = \lfloor(1-\varepsilon)\pi\{\alpha\}n\rfloor$, $\overline n = \lceil(1+\varepsilon)\pi\{\alpha\}n\rceil$, and $n^* = \lfloor\pi\{\alpha\}n\rfloor$, where $\lceil x\rceil$
($\lfloor x\rfloor$) denotes the smallest integer greater than (greatest integer smaller than) the real
number x. Then by the result above, for some n₀,
$$P_x\{\underline n \le l_n - 1 \le \overline n\} \ge 1 - \varepsilon, \qquad n \ge n_0. \qquad (17.19)$$
$$P_x\Big\{\Big|\frac{1}{\sqrt n}\sum_{j=0}^{l_n-1} s_j(\bar g) - \frac{1}{\sqrt n}\sum_{j=0}^{n^*} s_j(\bar g)\Big| > \beta\Big\}
\le \varepsilon + P_x\Big\{\max_{\underline n \le l \le n^*}\Big|\sum_{j=l}^{n^*} s_j(\bar g)\Big| > \beta\sqrt n\Big\}
+ P_x\Big\{\max_{n^* \le l \le \overline n}\Big|\sum_{j=n^*}^{l} s_j(\bar g)\Big| > \beta\sqrt n\Big\}$$
$$\frac{1}{\sqrt n}\sum_{j=0}^{l_n} s_j(\bar g) - \frac{1}{\sqrt n}\sum_{j=0}^{n^*} s_j(\bar g) \to 0,$$
and hence
$$\frac{1}{\sqrt n}\sum_{i=1}^{n} \bar g(\Phi_i) - \frac{1}{\sqrt n}\sum_{j=0}^{n^*} s_j(\bar g) \to 0 \qquad (17.20)$$
in probability. By the CLT for i.i.d. sequences, we may let $\sigma^2 = E_\alpha[(s_0(\bar g))^2]$, giving
$$\begin{aligned}
\lim_{n\to\infty} P_x\big\{(n\gamma_g^2)^{-1/2} S_n(\bar g) \le t\big\} &= \lim_{n\to\infty} P_x\Big\{(n\gamma_g^2)^{-1/2}\sum_{j=0}^{n^*} s_j(\bar g) \le t\Big\} \\
&= \lim_{n\to\infty} P_x\Big\{\sqrt{\frac{\lfloor n\pi\{\alpha\}\rfloor}{n\pi\{\alpha\}}}\,\frac{1}{\sqrt{n^*\sigma^2}}\sum_{j=0}^{n^*} s_j(\bar g) \le t\Big\} \\
&= \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx.
\end{aligned}$$
$$\limsup_{n\to\infty} \frac{1}{\sqrt{2\sigma^2\, l_n \log\log(l_n)}}\sum_{j=1}^{l_n} s_j(\bar g) = 1 \quad \text{a.s. } [P_*],$$
and the corresponding lim inf is −1. Equation (17.18) shows that $l_n/n \to \pi\{\alpha\} > 0$,
and hence by a simple calculation $\log\log(l_n)/\log\log(n) \to 1$ as n → ∞. These relations
together with (17.17) imply
$$\begin{aligned}
\limsup_{n\to\infty} \frac{1}{\sqrt{2\gamma_g^2\, n \log\log(n)}}\sum_{k=1}^{n} \bar g(\Phi_k)
&= \limsup_{n\to\infty} \frac{1}{\sqrt{\pi\{\alpha\}}}\,\frac{1}{\sqrt{2\sigma^2\, n \log\log(n)}}\sum_{j=1}^{l_n} s_j(\bar g) \\
&= \limsup_{n\to\infty} \sqrt{\frac{l_n \log\log(l_n)}{\pi\{\alpha\}\, n \log\log(n)}}\,\frac{1}{\sqrt{2\sigma^2\, l_n \log\log(l_n)}}\sum_{j=1}^{l_n} s_j(\bar g) \\
&= 1,
\end{aligned}$$
and the corresponding lim inf is equal to −1 by the same chain of equalities.
and hence, given that Yn = 1, the pre-nm process and post-(n + 1)m process are
independent: that is
Moreover, the distribution of the post-(n + 1)m process is the same as the P̌ν ∗ -
distribution of {(Φi , Yi ) : i ≥ 0}, with the interpretation that ν is “split” to form
ν ∗ as in (5.3) so that
Hence the set α̌ := C₁ := C × {1} behaves very much like an atom for the chain.
We let σ_α̌(0) denote the first entrance time of the split m-step chain to the set α̌,
and σ_α̌(k) the k-th entrance time to α̌ subsequent to σ_α̌(0), these random variables
being defined inductively in the usual way. For each i we set
$$s_i(f) = \sum_{j=\sigma_{\check\alpha}(i)+1}^{\sigma_{\check\alpha}(i+1)} Z_j(f),$$
where
$$Z_j(f) = \sum_{k=0}^{m-1} f(\Phi_{jm+k}).$$
From the remarks above and the strong Markov property we obtain the following result:
are independent for any m ≥ 2. The distribution of $s_i(f)$ is, for any i, equal to the
$\check P_{\check\alpha}$-distribution of the random variable $\sum_{k=m}^{\tau_{\check\alpha} m + m - 1} f(\Phi_k)$, which is equal to the $\check P_{\nu^*}$-
distribution of
$$\sum_{k=0}^{\sigma_{\check\alpha} m + m - 1} f(\Phi_k) = \sum_{k=0}^{\sigma_{\check\alpha}} Z_k(f). \qquad (17.22)$$
Proof   From the definition of $\{\sigma_{\check\alpha}(k)\}$ we have that the distribution of $s_{n+j}(f)$
given $s_0(f), \ldots, s_n(f)$ is equal to the distribution of $s_j(f)$ for all n ∈ Z₊, j ≥ 1. This
follows from the construction of $\{\sigma_{\check\alpha}(k)\}$, which makes the distribution of $\Phi_{\sigma_{\check\alpha}(n+j)m+m}$
given $\mathcal F^\Phi_{\sigma_{\check\alpha}(n+j)m} \vee \mathcal F^Y_{\sigma_{\check\alpha}(n+j)}$ equal to ν.
From this we see that $\{s_n(f) : n \ge 1\}$ is a stationary sequence and, moreover,
that $\{s_j(f)\}$ is a one-dependent process: that is, $\{s_0(f), \ldots, s_{n-1}(f)\}$ is independent
of $\{s_{n+1}(f), \ldots\}$ for all n ≥ 1.
From (17.22) we can express the common mean of $\{s_i(f)\}$ in terms of the invariant
mean of f as follows:
$$\begin{aligned}
\check E[s_i(f)] &= \check E_{\check\alpha}\Big[\sum_{k=1}^{\tau_{\check\alpha}} Z_k(f)\Big] \\
&= \check E_{\check\alpha}\Big[\sum_{k=1}^{\infty} Z_k(f)\, I\{k \le \tau_{\check\alpha}\}\Big] \\
&= \check E_{\check\alpha}\Big[\sum_{k=1}^{\infty} \check E_{\check\Phi_{mk}}[Z_1(f)]\, I\{k \le \tau_{\check\alpha}\}\Big] \\
&= \delta^{-1}\pi(C)^{-1}\int \pi(dy)\, E_y[Z_1(f)] \\
&= \delta^{-1}\pi(C)^{-1}\, m\int f\, d\pi,
\end{aligned}$$
where the fourth equality follows from the representation of π given in Theorem 10.0.1
applied to the split m-skeleton chain.
Define now, for each n ∈ Z₊, $l_n := \max\{i \ge 0 : m\sigma_{\check\alpha}(i) \le n\}$, and write
$$\sum_{k=1}^{n} f(\Phi_k) = \sum_{k=1}^{m\sigma_{\check\alpha}(0)+m-1} f(\Phi_k) + \sum_{i=0}^{l_n-1} s_i(f) + \sum_{k=m(\sigma_{\check\alpha}(l_n)+1)}^{n} f(\Phi_k). \qquad (17.24)$$
All of the ergodic theorems presented in the remainder of this section are based upon
Theorem 17.3.1 and the decomposition (17.24), valid for all n ≥ 1.
We now apply this construction to give an extension of the Law of Large Numbers.
Theorem 17.3.2. The following are equivalent when a σ-finite invariant measure π
exists for Φ:
(i) For every f, g ∈ L¹(π) with $\int g\, d\pi \ne 0$,
$$\lim_{n\to\infty} \frac{S_n(f)}{S_n(g)} = \frac{\pi(f)}{\pi(g)} \quad \text{a.s. } [P_*].$$
Proof We just prove the equivalence between (i) and (iii). The equivalence of (i)
and (ii) follows from the Chacon–Ornstein Theorem (see Theorem 3.2 of Revuz [326]),
and the same argument that was used in the proof of Theorem 17.1.7.
The “if” part is trivial: if $\int f\, d\pi > 0$, then by the ratio limit result which is assumed
to hold,
$$P_x\{f(\Phi_i) > 0 \ \text{i.o.}\} = 1$$
for all initial conditions, which is seen to be a characterization of Harris recurrence by
taking f to be an indicator function.
To prove that (iii) implies (i) we will make use of the decomposition (17.24) and
essentially the same proof that was used when an atom was assumed to exist in Theo-
rem 17.2.1.
From (17.24) we have
$$\frac{\sum_{i=1}^{n} f(\Phi_i)}{\sum_{i=1}^{n} g(\Phi_i)} \le \frac{\dfrac{1}{l_n}\Big[\sum_{j=0}^{l_n} s_j(f) + \sum_{k=1}^{m\sigma_{\check\alpha}(0)+m-1} f(\Phi_k)\Big]}{\dfrac{l_n-1}{l_n}\cdot\dfrac{1}{l_n-1}\sum_{j=0}^{l_n-1} s_j(g)}.$$
$$\begin{aligned}
\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} s_k(f) &= \lim_{N\to\infty} \frac{1}{N}\sum_{\substack{k=1 \\ k\ \mathrm{odd}}}^{N} s_k(f) + \lim_{N\to\infty} \frac{1}{N}\sum_{\substack{k=1 \\ k\ \mathrm{even}}}^{N} s_k(f) \\
&= \tfrac12\,\delta^{-1}\pi(C)^{-1}\, m\int f\, d\pi + \tfrac12\,\delta^{-1}\pi(C)^{-1}\, m\int f\, d\pi \\
&= \delta^{-1}\pi(C)^{-1}\, m\int f\, d\pi.
\end{aligned}$$
Theorem 17.3.3. Suppose that Φ is positive Harris, and suppose that π(|g|) < ∞.
Then the following limit holds:
$$\lim_{n\to\infty} \frac{1}{n}\max_{1\le k\le n} |g(\Phi_k)| = 0 \quad \text{a.s. } [P_*].$$
Since by Theorem 17.3.2 we have $\frac{1}{n}\sum_{k=1}^{n} g(\Phi_k) - \frac{1}{n-1}\sum_{k=1}^{n-1} g(\Phi_k) \to 0$, it follows that (17.25) does
hold, and the proof is complete.
To illustrate the application of the LLN to the stability of stochastic models we will
now consider a linear system with random coefficients.
Proposition 17.3.4. If (DBL1) and (DBL2) hold, then θ is positive Harris recurrent
with invariant probability $\pi_\theta$. For any f : R → R satisfying
$$\int_{\mathbb R} \{f(x) \vee 0\}\, \pi_\theta(dx) < \infty$$
we have
$$\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(\theta_k) = \int_{\mathbb R} f(x)\, \pi_\theta(dx) \quad \text{a.s. } [P_*].$$
When $\theta_0 \sim \pi_\theta$ the process is strictly stationary and may be defined on the positive
and negative time set Z. For this stationary process, the backwards LLN holds:
$$\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(\theta_{-k}) = \int_{\mathbb R} f(x)\, \pi_\theta(dx) \quad \text{a.s. } [P_{\pi_\theta}]. \qquad (17.26)$$
Proof   The positivity of θ has already been noted prior to the proposition. The
first limit then follows from Theorem 17.1.7 when $\int_{\mathbb R} f(x)\, \pi_\theta(dx) > -\infty$. Otherwise, we
have from Theorem 17.1.7 and integrability of f ∨ 0, for any M > 0,
$$\limsup_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(\theta_k) \le \limsup_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} f(\theta_k) \vee (-M) = \int_{\mathbb R} \{f(x) \vee (-M)\}\, \pi_\theta(dx),$$
and the result follows on letting M → ∞.
Proposition 17.3.5. Suppose that (DBL1) and (DBL2) hold, and that
$$\int_{\mathbb R} \log|x|\, \pi_\theta(dx) < 0. \qquad (17.27)$$
Then the joint process Φ = (Y, θ) is positive recurrent and aperiodic.
Proof   To begin, recall from Theorem 7.4.1 that the joint process Φ = (Y, θ) is a
ψ-irreducible and aperiodic T-chain.
For y ∈ R fixed, let $\mu_y = \pi_\theta \times \delta_y$ denote the initial distribution which makes θ a
stationary process, and Y₀ = y a.s. We will show that the distributions of Y, and hence
of Φ, are tight whenever Φ₀ ∼ µ_y. From the Feller property and Theorem 12.1.2, this
is sufficient to prove the theorem.
$$Y_{k+1} = \sum_{j=1}^{k}\Big(\prod_{i=j}^{k}\theta_i\Big)W_j + \Big(\prod_{i=0}^{k}\theta_i\Big)Y_0 + W_{k+1}. \qquad (17.28)$$
Establishing stability is then largely a matter of showing that the product $\prod_{i=j}^{k}\theta_i$
converges to zero sufficiently fast. To obtain such convergence we will apply the LLN
Proposition 17.3.4 and (17.27), which imply that as n → ∞,
$$\frac{1}{n}\log\Big(\prod_{i=0}^{n}\theta_{-i}^2\Big) = \frac{2}{n}\sum_{i=0}^{n}\log|\theta_{-i}| \to 2\int_{\mathbb R}\log|x|\, \pi_\theta(dx) < 0. \qquad (17.29)$$
We will see that this limit, together with stationarity of the parameter process, implies
exponential convergence of the product $\prod_{i=j}^{k}\theta_i$ to zero. This will give us the desired
bounds on Y.
To apply (17.29), fix constants L < ∞, 0 < ρ < 1, let $\Pi_{j,k} = \prod_{i=j}^{k}\theta_i$, and use
(17.28) and the inequality $ab \le \tfrac12(a^2 + b^2)$ to obtain the bound
$$\begin{aligned}
P_{\mu_y}\{|Y_{k+1}| \ge L\}
&\le P_{\mu_y}\Big\{\sum_{j=1}^{k}|\Pi_{j,k}||W_j| + |\Pi_{0,k}||y| + |W_{k+1}| \ge L\Big\} \\
&\le P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{-(k-j)}\Pi_{j,k}^2 + \sum_{j=0}^{k}\rho^{(k-j)}W_{j+1}^2 \ge 2L - (y^2+1)\Big\} \\
&\le P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{-(k-j)}\Pi_{j,k}^2 \ge L - \frac{1+y^2}{2}\Big\} + P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{(k-j)}W_{j+1}^2 \ge L - \frac{1+y^2}{2}\Big\}.
\end{aligned}$$
We now use stationarity of θ and independence of W to move the time indices within
the probabilities on the right hand side of this bound:
$$\begin{aligned}
P_{\mu_y}\{|Y_{k+1}| \ge L\}
&\le P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{-(k-j)}\Pi_{-(k-j),0}^2 \ge L - \frac{1+y^2}{2}\Big\} + P_{\mu_y}\Big\{\sum_{j=0}^{k}\rho^{(k-j)}W_{k-j}^2 \ge L - \frac{1+y^2}{2}\Big\} \\
&\le P_{\mu_y}\Big\{\sum_{\ell=0}^{\infty}\rho^{-\ell}\Pi_{-\ell,0}^2 \ge L - \frac{1+y^2}{2}\Big\} + P_{\mu_y}\Big\{\sum_{\ell=0}^{\infty}\rho^{\ell}W_\ell^2 \ge L - \frac{1+y^2}{2}\Big\}. \qquad (17.30)
\end{aligned}$$
From Fubini's Theorem we have, for any 0 < ρ < 1, that the sum $\sum_{\ell=0}^{\infty}\rho^\ell W_\ell^2$ converges
a.s. to a random variable with finite mean $\sigma_w^2(1-\rho)^{-1}$.
We now show that the sum $\sum_{\ell=0}^{\infty}\rho^{-\ell}\Pi_{-\ell,0}^2$ converges a.s. For this we apply the root
test. The logarithm of the n-th root of the n-th term $a_n$ in this series is equal to
$$\log(a_n^{1/n}) := \log\big((\rho^{-n}\Pi_{-n,0}^2)^{1/n}\big) = -\log(\rho) + \frac{2}{n}\sum_{i=0}^{n}\log|\theta_{-i}|.$$
By (17.29) this is negative for all large n provided ρ is chosen sufficiently close to 1, and
then the sum converges a.s. It follows that
$$\sup_{k\ge 0} P_{\mu_y}\{|Y_k| \ge L\} \to 0 \quad \text{as } L \to \infty,$$
When these conditions are satisfied we will show that the CLT variance may be
written
$$\gamma_g^2 = m^{-1}\check\pi(\check\alpha)\,\check E_{\check\alpha}[(s_1(\bar g))^2] + 2m^{-1}\check\pi(\check\alpha)\,\check E_{\check\alpha}[s_1(\bar g)s_2(\bar g)], \qquad (17.32)$$
where π̌ is the invariant probability measure for the split chain and π̌(α̌) = δπ(C).
We may now present
Theorem 17.3.6. Suppose that Φ is ergodic and that (17.31) holds. Then 0 ≤ γg2 < ∞,
and if γg2 > 0 then the CLT and LIL hold for g.
Proof   The proof is only a minor modification of the previous proof: we recall that
$l_n := \max\{k : m\sigma_{\check\alpha}(k) \le n\}$ and observe that in a manner similar to the derivation of
(17.17) we may show that
$$\frac{1}{\sqrt n}\sum_{j=1}^{n}\bar g(\Phi_j) - \frac{1}{\sqrt n}\sum_{j=0}^{l_n-1} s_j(\bar g) \to 0 \quad \text{a.s.} \qquad (17.33)$$
This can be used to replace the upper limit of the second sum in (17.33) by a de-
terministic bound, just as in the proof of Theorem 17.2.2. Indeed, stationarity and
one-dependence of $\{s_j(\bar g) : j \ge 1\}$ allow us to apply Kolmogorov's inequality Theo-
rem D.6.3 to obtain the following analogue of (17.20): letting $n^* := \lfloor m^{-1}\check\pi(\check\alpha)n\rfloor$, we
have from (17.34) and (17.33) that
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\bar g(\Phi_i) - \frac{1}{\sqrt n}\sum_{j=0}^{n^*} s_j(\bar g) \to 0 \qquad (17.35)$$
in probability.
To complete the proof we will obtain a version of the CLT for one-dependent, sta-
tionary stochastic processes.
Fix an integer m ≥ 2 and define $\eta_j = s_{jm+1}(\bar g) + \cdots + s_{(j+1)m-1}(\bar g)$. For all n ∈ Z₊
we may write
$$\frac{1}{\sqrt n}\sum_{j=1}^{n} s_j(\bar g) = \frac{1}{\sqrt n}\sum_{j=0}^{\lceil n/m\rceil - 1}\eta_j + \frac{1}{\sqrt n}\sum_{j=1}^{\lceil n/m\rceil - 1} s_{mj}(\bar g) + \frac{1}{\sqrt n}\sum_{j=m\lceil n/m\rceil}^{n} s_j(\bar g). \qquad (17.36)$$
The last term converges to zero in probability, so that it is sufficient to consider the first
and second terms on the RHS of (17.36). Since $\{s_i(\bar g) : i \ge 1\}$ is stationary and one-
dependent, it follows that $\{\eta_j\}$ is an independent and identically distributed process,
and also that $\{s_{mj}(\bar g) : j \ge 1\}$ is i.i.d.
The common mean of the random variables $\{\eta_j\}$ is zero, and its variance is given
by the formula
$$\sigma_m^2 := \check E[\eta_j^2] = (m-1)\,\check E[s_1(\bar g)^2] + 2(m-2)\,\check E[s_1(\bar g)s_2(\bar g)].$$
By the CLT for i.i.d. sequences,
$$\frac{1}{\sqrt n}\sum_{j=0}^{\lceil n/m\rceil - 1}\eta_j \xrightarrow{d} N(0, m^{-1}\sigma_m^2)$$
and
$$\frac{1}{\sqrt n}\sum_{j=1}^{\lceil n/m\rceil - 1} s_{mj}(\bar g) \xrightarrow{d} N(0, m^{-1}\sigma_s^2),$$
where $\sigma_s^2 := \check E[s_1(\bar g)^2]$. Since as m → ∞
$$m^{-1}\sigma_m^2 \to \bar\sigma^2 := \check E[s_1(\bar g)^2] + 2\check E[s_1(\bar g)s_2(\bar g)], \qquad m^{-1}\sigma_s^2 \to 0,$$
it follows that
$$\frac{1}{\sqrt n}\sum_{j=1}^{n} s_j(\bar g) \xrightarrow{d} N(0, \bar\sigma^2) \quad \text{as } n \to \infty.$$
Combining these results with (17.35), we conclude that
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\bar g(\Phi_i) \xrightarrow{d} N(0, m^{-1}\check\pi(\check\alpha)\bar\sigma^2) \quad \text{as } n \to \infty,$$
which establishes the CLT with $\gamma_g^2 = m^{-1}\check\pi(\check\alpha)\bar\sigma^2$, as in (17.32).
Using an expression similar to (17.36) together with the LIL for i.i.d. sequences we
can easily show that the upper and lower limits of
$$\frac{1}{\sqrt{2n\bar\sigma^2\log\log n}}\sum_{k=1}^{n} s_k(\bar g)$$
are +1 and −1 respectively. From here the proof of Theorem 17.2.2 may be adapted to
prove the LIL, which completes the proof of Theorem 17.3.6.
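The blocking argument just given is easy to visualize in a toy case (the i.i.d. construction below is an assumption made for illustration, not the split-chain setting): if the ξ_j are i.i.d. and $s_j = \xi_j + \xi_{j+1}$, then {s_j} is stationary and one-dependent, and $n^{-1/2}\sum s_j$ is asymptotically normal with variance $\bar\sigma^2 = \check E[s_1^2] + 2\check E[s_1 s_2]$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10_000, 500

vals = []
for _ in range(reps):
    xi = rng.standard_normal(n + 1)
    s = xi[:-1] + xi[1:]            # stationary, one-dependent sequence
    vals.append(s.sum() / np.sqrt(n))

sigma_bar2 = 2.0 + 2.0 * 1.0        # E[s_1^2] + 2 E[s_1 s_2] = 2 + 2
print(np.var(vals), sigma_bar2)     # empirical variance is close to 4
```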
Proposition 17.4.1. Suppose that Φ is positive Harris, and suppose that ĝ and ĝ• are
two solutions to Poisson’s equation with π(|ĝ| + |ĝ• |) < ∞. Then for some constant c,
ĝ(x) = c + ĝ• (x) for a.e. x ∈ X [π].
Since by assumption π(|h|) < ∞, it follows from Theorem 14.3.6 that h(x) = π(h) for
a.e. x.
The expectation is well defined if the chain is f-regular for some f ≥ |g|. Since
$0 = \pi(g) = \pi(\alpha)E_\alpha[\sum_{k=1}^{\tau_\alpha} g(\Phi_k)]$, we have
$$\begin{aligned}
P\hat g(x) &= E_x\Big[\sum_{k=1}^{\sigma_\alpha} g(\Phi_k)\Big]\, I(x \in \alpha^c) + E_\alpha\Big[\sum_{k=1}^{\tau_\alpha} g(\Phi_k)\Big]\, I(x \in \alpha) \\
&= E_x\Big[\sum_{k=1}^{\sigma_\alpha} g(\Phi_k)\Big]\, I(x \in \alpha^c).
\end{aligned}$$
Since $\hat g(z) = g(z)$ for all z ∈ α, this shows that for all x,
$$P\hat g(x) = E_x\Big[\sum_{k=0}^{\sigma_\alpha} g(\Phi_k)\Big] - g(x) = \hat g(x) - g(x),$$
also satisfies the bound |ĝ| ≤ R(V + 1), and clearly satisfies Poisson’s equation. We
state a generalization of this important observation as Theorem 17.4.2. The assumption
that π(V ) < ∞ is removed in Theorem 17.7.1.
Theorem 17.4.2. Suppose that Φ is ψ-irreducible, and that (V3) holds with V every-
where finite, f ≥ 1, and C petite. If π(V ) < ∞, then for some R < ∞ and any |g| ≤ f ,
Poisson’s equation (17.37) admits a solution ĝ satisfying the bound |ĝ| ≤ R(V + 1).
Proof   The aperiodic case follows from absolute convergence of the sum in (17.39).
In the general periodic case it is convenient to consider the $K_{a_\varepsilon}$-chain, which is always
strongly aperiodic when Φ is ψ-irreducible by Proposition 5.4.5.
To begin, we will show that the resolvent or $K_{a_\varepsilon}$-chain satisfies a version of (V3)
with the same function f and a scaled version of the function V used in the theorem.
We will on two occasions apply the bound
$$K_{a_\varepsilon} V_\varepsilon \le V_\varepsilon - f + b\, K_{a_\varepsilon} I_C.$$
Since C is petite for Φ and hence also for the $K_{a_\varepsilon}$-chain by Theorem 5.5.6, the set
$C_n := \{x : K_{a_\varepsilon}(x, C) \ge 1/n\}$ is petite for the $K_{a_\varepsilon}$-chain for all n. Note that C ⊆ C_n
for n sufficiently large. Since C_n is petite we may adopt the proof of Theorem 14.2.9:
scaling $V_\varepsilon$ as necessary, we may choose n and $b_\varepsilon$ so large that
$$K_{a_\varepsilon} V_\varepsilon \le V_\varepsilon - f + b_\varepsilon I_{C_n}.$$
Thus the Ka ε -chain is f -regular. By aperiodicity there exists a constant Rε < ∞ such
that for any |g| ≤ f , we have a solution ĝε to Poisson’s equation
The second sum on the right hand side is a telescoping series, which telescopes to
$P\hat g(\Phi_0) - P\hat g(\Phi_n)$. We will prove in Theorem 17.4.3 that the first sum is a martingale,
which shall be denoted
$$M_n(g) = \sum_{k=1}^{n}\big[\hat g(\Phi_k) - P\hat g(\Phi_{k-1})\big]. \qquad (17.43)$$
Hence Sn (g) is equal to a martingale, plus a term which can be easily bounded. We
summarize these observations in
Theorem 17.4.3. Suppose that Φ is positive Harris and that a solution to Poisson’s
equation (17.37) exists with |ĝ| dπ < ∞. Then when Φ0 ∼ π, the series Sn (g) may be
written
Sn (g) = Mn (g) + P ĝ (Φ0 ) − P ĝ (Φn ) (17.44)
where (Mn (g), FnΦ ) is the martingale defined in (17.43).
Proof The expression (17.44) was established prior to the theorem statement. To
see that (Mn (g), FnΦ ) is a martingale, apply the identity
Theorem 17.4.4. Suppose that Φ is positive Harris, and suppose that g is a function
on X for which a solution ĝ to the Poisson equation exists with π(ĝ 2 ) < ∞. If the
constant
γg2 := π(ĝ 2 − {P ĝ}2 ) (17.46)
is strictly positive, then as n → ∞,
Since π(ĝ 2 ) < ∞, by Jensen’s inequality we also have π({P ĝ}2 ) < ∞. Hence by
Theorem 17.3.3 it follows that
1
max {P ĝ (Φk )}2 → 0 a.s. [Pπ ]
n 1≤k ≤n
as n → ∞. That is, $|(n\gamma_g^2)^{-1/2}(s_n - m_n)|_c \to 0$ in C[0, 1] with probability one. To prove
the theorem, it is therefore sufficient to show that $(n\gamma_g^2)^{-1/2} m_n(t) \xrightarrow{d} B$.
We complete the proof by showing that the conditions of Theorem D.6.4 hold for
the martingale $M_n(g)$.
We complete the proof by showing that the conditions of Theorem D.6.4 hold for
the martingale Mn (g).
Since we have assumed that $\hat g^2$ is π-integrable, it follows that the function $P\hat g^2 - \{P\hat g\}^2$
is also π-integrable. Hence the LLN holds:
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} E_\pi\big[(M_k(g) - M_{k-1}(g))^2 \mid \mathcal F_{k-1}^\Phi\big] = \pi(P\hat g^2 - \{P\hat g\}^2) = \gamma_g^2 \quad \text{a.s.}$$
We now establish (D.9). Again by the LLN we have for any b > 0,
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} E_\pi\big[(M_k(g) - M_{k-1}(g))^2\, I\{(M_k(g) - M_{k-1}(g))^2 \ge b\} \mid \mathcal F_{k-1}^\Phi\big]
= E_\pi\big[(\hat g(\Phi_1) - P\hat g(\Phi_0))^2\, I\{(\hat g(\Phi_1) - P\hat g(\Phi_0))^2 \ge b\}\big],$$
which tends to zero as b → ∞. It immediately follows that (D.9) holds for any ε > 0,
and this completes the proof.
As an illustration of the implications of Theorem 17.4.4 we state the following
corollary, which is an immediate consequence of the fact that both h(u) = u(1) and
h(u) = max0≤t≤1 u(t) are continuous functionals on u ∈ C[0, 1].
Theorem 17.4.5. Under the conditions of Theorem 17.4.4, the CLT holds for g with
γg2 given by (17.46), and as n → ∞,
$$\gamma_g^2 = \pi(\hat g^2 - \{\hat g - g\}^2) = 2\pi(\hat g g) - \pi(g^2) = E_\pi[2\hat g(\Phi_0)g(\Phi_0) - g^2(\Phi_0)]. \qquad (17.49)$$
This immediately gives the representation (17.3) for γg2 whenever the expectation with
respect to π and the infinite sum may be interchanged. We will give such conditions in
the next section, under which the identity (17.3) does indeed hold.
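On a finite state space all of these objects can be computed exactly, which makes the identities concrete. The sketch below is an illustration under assumed data (a random 4-state kernel), not the book's construction: it solves Poisson's equation via the fundamental matrix $Z = (I - P + \mathbf{1}\pi')^{-1}$ and checks that (17.46) and (17.49) agree with the autocovariance series (17.3).

```python
import numpy as np

rng = np.random.default_rng(4)
k = 4
P = rng.random((k, k)); P /= P.sum(axis=1, keepdims=True)   # random ergodic kernel

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()

g = rng.random(k)
gbar = g - pi @ g                                  # centered function
Z = np.linalg.inv(np.eye(k) - P + np.outer(np.ones(k), pi))
ghat = Z @ gbar                                    # fundamental-matrix solution

assert np.allclose(P @ ghat, ghat - gbar)          # Poisson's equation (17.37)

# Asymptotic variance three ways: (17.46), (17.49), and the series (17.3).
v1 = pi @ (ghat**2) - pi @ ((P @ ghat)**2)
v2 = 2 * (pi @ (ghat * gbar)) - pi @ (gbar**2)
acf = sum(pi @ (gbar * (np.linalg.matrix_power(P, m) @ gbar)) for m in range(1, 200))
v3 = pi @ (gbar**2) + 2 * acf
print(v1, v2, v3)   # all three expressions agree
```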
Note that if we substituted in a different formula for ĝ we would arrive at an entirely
different formula. We now show that by taking the specific form (17.38) for ĝ we
can connect the expression for the asymptotic variance given in Section 17.2 with the
formulas given here.
Recall that using the approach of Section 17.2 based upon the existence of an atom
we arrived at the identity
$$\gamma_g^2 = \pi(\alpha)\, E_\alpha\Big[\Big(\sum_{k=1}^{\tau_\alpha} g(\Phi_k)\Big)^2\Big]. \qquad (17.51)$$
It may seem unlikely a priori that the two expressions (17.49) and (17.51) coincide.
However, as required by the theory, it is of course true that the identity
$$\pi(\alpha)\, E_\alpha\Big[\Big(\sum_{k=1}^{\tau_\alpha} g(\Phi_k)\Big)^2\Big] = E_\pi[2\hat g(\Phi_0)g(\Phi_0) - g^2(\Phi_0)] \qquad (17.52)$$
holds. To see this directly, take
$$\hat g(x) = E_x\Big[\sum_{j=0}^{\sigma_\alpha} g(\Phi_j)\Big],$$
so that
$$\begin{aligned}
E_\pi[2\hat g(\Phi_0)g(\Phi_0) - g^2(\Phi_0)] &= \pi(\alpha)\, E_\alpha\Big[\sum_{k=1}^{\tau_\alpha}\big(2g(\Phi_k)\hat g(\Phi_k) - g^2(\Phi_k)\big)\Big] \\
&= \pi(\alpha)\, E_\alpha\Big[\sum_{k=1}^{\tau_\alpha}\Big(2g(\Phi_k)\, E_{\Phi_k}\Big[\sum_{j=0}^{\sigma_\alpha} g(\Phi_j)\Big] - g^2(\Phi_k)\Big)\Big] \\
&= \pi(\alpha)\, E_\alpha\Big[\sum_{k=1}^{\tau_\alpha}\Big(2g(\Phi_k)\, E\Big[\theta^k\sum_{j=0}^{\sigma_\alpha} g(\Phi_j)\,\Big|\,\mathcal F_k^\Phi\Big] - g^2(\Phi_k)\Big)\Big].
\end{aligned}$$
Now for $k \le \tau_\alpha$,
$$\theta^k\sum_{j=0}^{\sigma_\alpha} g(\Phi_j) = \sum_{j=k}^{\tau_\alpha} g(\Phi_j),$$
Proof It follows from Lemma 15.2.9 that the chain is V -uniform, and hence (V3)
holds with this V . The finiteness of π(V 2 ) follows from finiteness of π(V ), which is a
consequence of the f -Norm Ergodic Theorem 14.0.1.
The following result shows that (V3) provides a sufficient condition under which the
assumptions imposed in Section 17.4 and Section 17.3 are satisfied.
Lemma 17.5.2. Under the CLT moment condition on V, f above we have:
(i) there exists a constant R < ∞ such that for any function g which satisfies the
bound |g| ≤ f , Poisson’s equation (17.37) admits a solution ĝ with |ĝ| ≤ R(V +1);
(ii) the split chain satisfies the bound
$$\check E_{\check\alpha}\Big[\Big(\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)\Big)^2\Big] < \infty \qquad (17.53)$$
and hence the CLT moment condition (17.31) holds for any function g with |g| ≤
f.
Under the assumption that $\pi(V^2) < \infty$ we see from the representation of π that
$$\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\Big(\check E_{\check\Phi_{\ell+1}}\Big[\sum_{k=0}^{\tau_{\check\alpha}} Z_k(f)\Big]\Big)^2\Big] \le (\check\pi(\check\alpha))^{-1}(R_0 R_1)^2\, \pi([V+1]^2) < \infty. \qquad (17.54)$$
$$\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)^2\Big] = (\check\pi(\check\alpha))^{-1}\, E_\pi\big[Z_0(f)^2\big] \le (\check\pi(\check\alpha))^{-1}\, m^2\, \pi(f^2) < \infty. \qquad (17.55)$$
Combining (17.54) and (17.55) then shows that
$$\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\Big(Z_\ell(f) + \check E_{\check\Phi_{\ell+1}}\Big[\sum_{k=0}^{\tau_{\check\alpha}} Z_k(f)\Big]\Big)^2\Big] < \infty.$$
It is now relatively easy to show that the bound (17.53) holds. We may calculate using
the ordinary Markov property:
$$\begin{aligned}
\infty &> \check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\Big(Z_\ell(f) + \check E_{\check\Phi_{\ell+1}}\Big[\sum_{k=0}^{\tau_{\check\alpha}} Z_k(f)\Big]\Big)^2\Big] \\
&= \check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\Big(Z_\ell(f) + \check E\Big[\sum_{k=\ell+1}^{\tau_{\check\alpha}} Z_k(f)\,\Big|\,\check{\mathcal F}_{m(\ell+1)}\Big]\Big)^2\Big] \\
&\ge 2\,\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)\,\check E\Big[\sum_{k=\ell+1}^{\tau_{\check\alpha}} Z_k(f)\,\Big|\,\check{\mathcal F}_{m(\ell+1)}\Big]\Big] + \check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)^2\Big] \\
&= 2\,\check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1}\sum_{k=\ell+1}^{\tau_{\check\alpha}} Z_\ell(f) Z_k(f)\Big] + \check E_{\check\alpha}\Big[\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)^2\Big] \\
&\ge \check E_{\check\alpha}\Big[\Big(\sum_{\ell=0}^{\tau_{\check\alpha}-1} Z_\ell(f)\Big)^2\Big].
\end{aligned}$$
Theorem 17.5.3. Assume the CLT moment condition on V, f, and let g be a function
on X with |g| ≤ f. Then the constant $\gamma_g^2$ defined as
$$\gamma_g^2 = \lim_{n\to\infty} \frac{1}{n} E_\pi\big[(S_n(\bar g))^2\big] = E_\pi[\bar g^2(\Phi_0)] + 2\sum_{k=1}^{\infty} E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)] \qquad (17.56)$$
Proof To obtain the representation (17.56) for γg2 , apply the identity (17.44), from
which we obtain
Eπ [(Sn (g) − Mn (g))2 ] ≤ 4π(ĝ 2 ).
Since $E_\pi[M_n(g)^2] = \sum_{k=1}^{n} E_\pi[(M_k - M_{k-1})^2] = n\gamma_g^2$, it follows that $\frac{1}{n}E_\pi[S_n(\bar g)^2] \to \gamma_g^2$ as
n → ∞.
We now show that $\frac{1}{n}E_\pi[S_n(\bar g)^2] \to \sum_{k=-\infty}^{\infty} E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)]$.
First we show that this sum converges absolutely. By the f-Norm Ergodic Theo-
rem 14.0.1 we have for some R < ∞, and each x,
$$\sum_{k=0}^{\infty} |E_x[\bar g(\Phi_0)\bar g(\Phi_k)]| \le |\bar g(x)|\sum_{k=0}^{\infty} \|P^k(x,\,\cdot\,) - \pi\|_f \le |\bar g(x)|\, R\,(V(x)+1),$$
and hence
$$\sum_{k=0}^{\infty} |E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)]| \le R'\,(\pi(V^2)+1) < \infty.$$
Next, by stationarity,
$$\begin{aligned}
\frac{1}{n} E_\pi[S_n(\bar g)^2] &= E_\pi[\bar g(\Phi_0)^2] + \frac{2}{n}\sum_{k=1}^{n}\sum_{j=k+1}^{n} E_\pi[\bar g(\Phi_k)\bar g(\Phi_j)] \\
&= E_\pi[\bar g(\Phi_0)^2] + \frac{2}{n}\sum_{k=0}^{n-1}\sum_{i=1}^{n-1-k} E_\pi[\bar g(\Phi_0)\bar g(\Phi_i)],
\end{aligned}$$
and the right hand side converges to $\sum_{k=-\infty}^{\infty} E_\pi[\bar g(\Phi_0)\bar g(\Phi_k)]$ as n → ∞.
To prove that the CLT and LIL hold when γg2 > 0, observe that by Lemma 17.5.2
under the conditions of this section the hypotheses of both Theorem 17.3.6 and Theo-
rem 17.4.5 are satisfied. Theorem 17.3.6 gives the CLT and LIL, and Theorem 17.4.5
shows that the asymptotic variance is equal to π(ĝ 2 − (P ĝ)2 ).
So far we have left open the question of what happens when $\gamma_g^2 = 0$. Under the
conditions of Theorem 17.5.3 it may be shown that in this case
$$\frac{1}{\sqrt n} S_n(\bar g) \xrightarrow{d} 0.$$
We leave the proof of this general result to the reader. In the next result we give a
criterion for the CLT and LIL for V-uniformly ergodic chains, and show that for such
chains $\frac{1}{\sqrt n} S_n(\bar g)$ converges to zero with probability one when $\gamma_g^2 = 0$.
Theorem 17.5.4. Suppose that Φ is V-uniformly ergodic. If $g^2 \le V$, then the conclu-
sions of Theorem 17.5.3 hold, and if $\gamma_g^2 = 0$, then
$$\frac{1}{\sqrt n} S_n(\bar g) \to 0 \quad \text{a.s. } [P_*].$$
Proof   In view of Lemma 17.5.1 and Theorem 17.5.3, the only result which requires
proof is that $(\frac{1}{\sqrt n} S_n(\bar g) : n \ge 1)$ converges to zero when $\gamma_g^2 = 0$.
Recalling (17.44), we have shown that $\frac{1}{\sqrt n} P\hat g(\Phi_n) \to 0$ a.s. in the proof of Theorem 17.4.4. To prove the
theorem we will show that $(M_n(g))$ is a convergent sequence.
We have for all n and x,
$$E_x[(M_n(g))^2] = \sum_{k=1}^{n} E_x\big[P(\Phi_{k-1}, \hat g^2) - P(\Phi_{k-1}, \hat g)^2\big].$$
Letting $G(x) = P(x, \hat g^2) - P(x, \hat g)^2$ we have 0 ≤ G ≤ RV for some R < ∞, and
$\pi(G) = \gamma_g^2 = 0$. Hence by Theorem 15.0.1,
$$E_x[(M_n(g))^2] = \sum_{k=1}^{n} E_x[G(\Phi_{k-1})] \le \sum_{k=0}^{\infty} |P^k(x, G) - \pi(G)| < \infty.$$
By the Martingale Convergence Theorem D.6.1 it follows that $(M_n(g))$ converges to a
finite limit, and is hence bounded in n with probability one.
17.6 Applications
From Theorem 17.0.1 we see that any of the V -uniform models which were studied
in the previous chapter satisfy the CLT and LIL as long as the asymptotic variance is
positive. We will consider here two models where moment conditions on the disturbance
process may be given explicitly to ensure that the CLT holds. In the first we avoid
Theorem 17.0.1 since we can obtain a stronger result by using Theorem 17.5.3, which
is based upon the CLT moment condition of the previous section.
Proposition 17.6.1. If the increment distribution Γ has mean β < 0 and finite fifth
moment, then the associated random walk on a half line is positive Harris and the CLT
and LIL hold for the process {Φk : k ≥ 0}.
The asymptotic variance may be written using (17.3) as $\gamma_g^2 = \sum_{k=-\infty}^{\infty} E_\pi[\bar\Phi_k\bar\Phi_0]$, or
using (17.13) with α = {0} we have
$$\gamma_g^2 = \pi(0)\, E_0\Big[\Big(\sum_{k=1}^{\tau_0}\big(\Phi_k - E_\pi[\Phi_k]\big)\Big)^2\Big].$$
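A regenerative Monte Carlo sketch of this variance formula (with an assumed Gaussian increment of mean −0.5, which certainly has a finite fifth moment; the names and sample sizes are illustrative): cycles between visits to the atom {0} of the reflected walk are i.i.d., so the cycle sums estimate γ_g² directly.

```python
import numpy as np

rng = np.random.default_rng(5)
beta = -0.5                        # negative drift, as positivity requires
n_cycles = 20_000

# Simulate the random walk on the half line, splitting the path into
# i.i.d. cycles between successive visits to the atom {0}.
cycles, phi, cur = [], 0.0, []
while len(cycles) < n_cycles:
    phi = max(0.0, phi + beta + rng.standard_normal())
    cur.append(phi)
    if phi == 0.0:                 # return to the atom ends the cycle
        cycles.append(np.array(cur)); cur = []

steps = sum(len(c) for c in cycles)
m_hat = sum(c.sum() for c in cycles) / steps          # estimate of E_pi[Phi]
pi0_hat = len(cycles) / steps                          # pi(0) = 1 / E_0[tau_0]
gamma2_hat = pi0_hat * np.mean([((c - m_hat).sum()) ** 2 for c in cycles])
print(m_hat, gamma2_hat)           # CLT: n^{-1/2} S_n(Phi - m) ~ N(0, gamma^2)
```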
In steady state the output has the moving average representation
$$Y_k = \sum_{\ell=0}^{\infty} h_\ell W_{k-\ell},$$
where $h_\ell = c'F^\ell G$ and $(W_k : k \in Z)$ are i.i.d. with mean zero and covariance $\Sigma_W =
E[WW']$, which is assumed to be finite in (LSS2).
Let R(k) denote the autocovariance sequence for the stationary process:
$$R(k) = E_\pi[Y_k Y_0], \qquad k \in Z.$$
If the CLT holds for the process Y , then we have seen that the asymptotic variance,
which we shall denote γc2 , is equal to
∞
γc2 = R(k). (17.57)
k =−∞
The autocovariance sequence can be analyzed through its Fourier series, and this ap-
proach gives a simple formula for the limiting variance γc2 .
The process Y has a spectral density D(ω), which is obtained from the autocovari-
ance sequence through the Fourier series
$$D(\omega) = \sum_{m=-\infty}^{\infty} R(m)\, e^{im\omega} = H(e^{i\omega})\,\Sigma_W\, H(e^{i\omega})^*,$$
where
$$H(e^{i\omega}) = \sum_{\ell=0}^{\infty} h_\ell\, e^{i\ell\omega} = c'(I - e^{i\omega}F)^{-1}G.$$
=0
From these calculations we obtain the following CLT for the linear state space model:
Theorem 17.6.2. Consider the linear state space model defined by (LSS1) and (LSS2).
If the eigenvalue condition (LSS5), the nonsingularity condition (LSS4) and the con-
trollability condition (LCM3) are satisfied, then the model is V -uniformly ergodic with
V (x) = |x|2 + 1.
For any vector c ∈ Rⁿ, the asymptotic variance is given by the formula
$$\gamma_c^2 = c'(I - F)^{-1} G\,\Sigma_W\, G'(I - F')^{-1} c,$$
and the CLT and LIL hold for the process Y when $\gamma_c^2 > 0$.
Proof We have seen in the proof of Theorem 12.5.1 that (V4) holds for the linear
state space model with V (x) = 1 + x M x, where M is a positive matrix (see (12.34)).
Under the conditions of Theorem 17.6.2 we also have that Φ is a ψ-irreducible and
aperiodic T-chain by Proposition 6.3.5. By Lemma 17.5.1 and Theorem 17.5.2 it follows
that the CLT and LIL hold for Y and that the asymptotic variance is given by (17.57).
The closed form expression for $\gamma_c^2$ follows from the chain of identities
$$\gamma_c^2 = \sum_{k=-\infty}^{\infty} R(k) = D(0) = c'(I - F)^{-1} G\,\Sigma_W\, G'(I - F')^{-1} c.$$
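This identity is easily verified numerically; the following is a minimal sketch under assumed data (a stable F satisfying (LSS5), and arbitrary G, Σ_W, c), using SciPy's discrete Lyapunov solver for the stationary state covariance.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

F = np.array([[0.5, 0.2],
              [0.0, 0.3]])         # eigenvalues inside the unit disk (LSS5)
G = np.array([[1.0], [0.5]])
SigmaW = np.array([[1.0]])
c = np.array([1.0, -1.0])

Q = G @ SigmaW @ G.T
SigmaX = solve_discrete_lyapunov(F, Q)       # Sigma_X = F Sigma_X F' + Q

# gamma_c^2 = sum_k R(k), with R(k) = c' F^|k| Sigma_X c
acv_sum = c @ SigmaX @ c + 2 * sum(c @ np.linalg.matrix_power(F, k) @ SigmaX @ c
                                   for k in range(1, 400))
IF = np.linalg.inv(np.eye(2) - F)
closed_form = c @ IF @ Q @ IF.T @ c
print(acv_sum, closed_form)                   # the two expressions agree
```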
Had we proved the CLT for vector-valued functions of the state, it would be more
natural in this example to prove directly that the CLT holds for X. In fact, an extension
of Theorem D.6.4 to vector-valued processes is possible, and from such a generalization
we have under the conditions of Theorem 17.6.2 that
$$\frac{1}{\sqrt n}\sum_{k=1}^{n} X_k \xrightarrow{d} N(0, \Sigma)$$
17.7 Commentary*
The results of this chapter may appear considerably deeper than those of other chapters,
although in truth they are often straightforward from more global stochastic process
results, given the embedded regeneration structure of the split chain, or given the
existence of a stationary version (that is, of an invariant probability measure) for the
chain.
One of the achievements of this chapter is the identification of these links, and in
particular the development of a drift-condition approach to the sample path and central
limit laws.
These laws are of value for Markov chains exactly as they are for all stochastic
processes: the LLN and CLT, in particular, provide the theoretical basis for many
results in the statistical analysis of chains as they do in related fields. In particular,
the standard proofs of asymptotic efficiency and unbiasedness for maximum likelihood
estimators are largely based upon these ergodic theorems. For this and other applications,
the reader is referred to [151].
The Law of Large Numbers has a long history whose surface we can only skim
here. Theorem 17.1.2 is a result of Doob [99], and the ratio form for Harris chains
Theorem 17.3.2 is given in Athreya and Ney [14]. Chapter 3 of Orey [309] gives a good
overview of related ratio limit theorems.
The classic text of Chung [71] gives in Section I.16 the CLT and LIL for chains on a
countable space from which we adopt many of the proofs of the results in Section 17.2
and Section 17.3. Versions of the Central Limit Theorem for Harris chains may be
found in Cogburn [74] and in Nummelin and Niemi [303, 300]. The paper [300] presents
an excellent survey of what was the state of the art at that time, and also an excellent
development of CLTs in a context more general than we have given.
Neveu remarks in [296] that “the relationship between the theory of martingales
and the theory of Markov chains is very deep”. At that time he referred mainly to
the connections between harmonic functions, martingales, and first hitting probabilities
for a Markov chain. In Section III-5 of [296] he develops fairly briefly a remarkably
strong classification of a Markov chain as either recurrent or transient, based mainly
on martingale limit theory and the existence of harmonic functions. Certainly the
connections between martingales and Markov chains are substantial. From the largely
martingale-based proof of the functional CLT described in this chapter, and the more
general implications of Poisson’s equation and its associated martingale to the ergodic
theory of Markov chains, it appears that the relationship between Markov chains and
martingales is even richer than was thought at the time of Neveu’s writing.
The martingale approach via solutions to Poisson’s equation which is developed in
Section 17.4 is adopted from Duflo [102] and Maigret [242].
For further results on the potential theory of positive kernels we refer the reader to
the seminal work of Neveu [295], Revuz [326] and Constantinescu and Cornea [77], and
to Nummelin [304] for the most current development. Applications to Markov processes
evolving in continuous time are developed in Neveu [295], Kunita [229], and Meyn and
Tweedie [278].
For an excellent account of Central Limit Theorems and versions of the Law of
the Iterated Logarithm for a variety of processes the reader is referred to Hall and
Heyde [151]. Martingale limit theory as presented in, for example, Hall and Heyde [151]
allows several obvious extensions of the results given in Section 17.4. For example, a
functional Law of the Iterated Logarithm for Markov chains can be proved in a manner
similar to the functional Central Limit Theorem given in Theorem 17.4.4. Using the
almost sure invariance principle given in Brosamler [54] and Lacey and Philipp [233], it
is likely that an almost sure Central Limit Theorem for Markov chains may be obtained
under an appropriate drift condition, such as (V4).
In work closely related to the development of Section 17.4, Kurtz [231] considers
chains arising in models found in polymer chemistry. These models evolve on the
surface of a three-dimensional sphere X = S 2 , and satisfy a multidimensional version of
Poisson's equation:
$$\int_X P(x, dy)\, y = \rho x,$$
where |ρ| < 1. Bhattacharya [35] also considers the CLT and LIL for Markov processes,
using an approach based upon the analogue of Poisson's equation in continuous time.
If a solution to Poisson’s equation cannot be found directly as in [231], then a more
general approach is needed. This is the main motivation for the development of the
drift criteria (V3) and (V4) which is central to this chapter, and all of Part III. Most of
these results are either new or very recent in this general state space context. Meyn and
Tweedie [277] use a variant of (V4) to obtain the CLT and LIL for ψ-irreducible Markov
chains giving Theorem 17.0.1, and the use of (V3) to obtain solutions to Poisson’s
equation is taken from Glynn and Meyn [139]. Applications to random walks and
linear models similar to those given in Section 17.6 are also developed in [139].
Proposition 17.3.5, which establishes stability of the dependent parameter bilinear
model, is taken from Brandt et al. [45] where further related results may be found.
The finiteness of the fifth moment of the increment process which is imposed in
Proposition 17.6.1 is close to the right condition for guaranteeing that the random walk
obey the CLT. Daley [83] shows that for the GI/G/1 queue a fourth moment condition
is necessary and sufficient for the absolute convergence of the sum
∞
Eπ [Φ̄k Φ̄0 ]
−∞
where Φ̄k = Φk − Eπ [Φk ]. Recall that this sum is precisely the asymptotic variance
used in Proposition 17.6.1. This strongly suggests that the CLT does not hold for the
random walk on the half line when the increment process does not have a finite fourth
moment, and also suggests that the CLT may indeed hold when the fourth moment is
finite. These subtleties are described further in [139].
Commentary for the second edition: Of all the topics covered in this book, those
in this chapter have seen the greatest growth since 1996. The number of recognized
open questions has grown at least as quickly as the number of papers providing answers.
Section 20.2 contains a survey of advances in simulation methodology based on theory
developed in this book.
The CLT for Markov chains is better understood today. Sufficient conditions for the
CLT are obtained in [252] under conditions that appear close to minimal, and minimal
conditions for chains that are reversible1 are established in [209].
A future edition of this book will surely draw from Jones’s survey [183], which
contains many examples along with a streamlined account of the theory. Another survey
by Landim [236] develops theory for reversible chains. The rate of convergence in the
CLT for geometrically ergodic chains is investigated in [218, 219] – see Section 20.1.5
for results concerning more exotic limit theory, such as large deviations.
Looking back at the first edition, it is a surprise to see how little attention is devoted
to Poisson’s equation (17.37). This equation is central to many areas in statistics and
engineering:
1 See discussion surrounding (20.5) in the new Chapter 20.
(ii) This equation emerges in various aspects of statistics and limit theory such as
Markov renewal theory [132, 133] and refinements of the CLT [218, 219].
(iii) The martingale property described in Theorem 17.4.3 is central to variance anal-
ysis of simulation algorithms. Section 20.2.1 contains a brief survey on the appli-
cation of Poisson’s equation to variance-reduction techniques.
(iv) In controlled Markov models (also called Markov decision processes, or MDPs), a
variant of Poisson’s equation is known as the (average cost) dynamic programming
equation. In this context, the function g appearing in (17.37) is the associated
cost function, and the solution ĝ is called the relative value function [28, 262, 261,
67, 263, 42, 27, 267].
(v) Perturbation theory is typically addressed using Poisson’s equation, following the
work of Schweitzer [347]. Suppose that {Pα : α ∈ (−1, 1)} is a family of transi-
tion kernels, each ergodic with invariant measure πα . Let c denote a measurable
function on X, let ηα = πα (c), and let ĉα denote the solution to Poisson’s equation,
Pα ĉα = ĉα − c + ηα .
(vi) The ‘multiplicative Poisson equation’ is central to the theory of large deviations
for Markov chains – see Section 20.1.5 – and also risk-sensitive optimal control
for MDP models [41]. Closely related techniques are also used in the analysis of
change-detection algorithms [131].
Theorem 17.7.1. Suppose that Φ is ψ-irreducible, and that (V3) holds with V every-
where finite, f ≥ 1, and C petite. Then, for some B < ∞ and any |g| ≤ f , the Poisson
equation (17.37) admits a solution ĝ satisfying the bound |ĝ| ≤ B(V + 1).
Proof   Suppose first that the chain is strongly aperiodic. We then consider the
split chain: the solution to Poisson's equation is given by $\hat g(x) = G_\alpha(x, g)$, as discussed
following (17.38).
In the completely general setting we proceed as in the proof of Theorem 17.4.2. The
resolvent kernel $K_{a_\varepsilon}$ defined in (3.26) is strongly aperiodic for any ε ∈ (0, 1). We can
solve Poisson's equation (17.41) for this kernel: the solution satisfies $|\hat g_\varepsilon| \le B_\varepsilon(V+1)$ for
some fixed constant $B_\varepsilon$. We then recall (17.42), which defines $\hat g = \varepsilon(1-\varepsilon)^{-1}K_{a_\varepsilon}\hat g_\varepsilon$. The
function ĝ solves Poisson's equation, and this completes the proof with $B = \varepsilon(1-\varepsilon)^{-1}B_\varepsilon$.
Application to performance approximation requires a significant strengthening of
the converse result Proposition 17.4.1. Frequently we are given an invariance equation
of the form
Ph = h − g + η (17.59)
where g and h are measurable functions and η is constant, and we hope to infer that
π(g) = η. We obtain the upper bound π(g) ≤ η by the Comparison Theorem when g
and h are each non-negative valued.
To strengthen the Comparison Theorem and deduce that π(g) = η we require bounds
on g and h. Suppose that a third function f ≥ 1 is known to be π-integrable. We have
seen in the proof of Theorem 14.2.6 that a solution to (V3) is given by
$$V^*(x) := G_C(x, f) = E_x\Big[\sum_{k=0}^{\sigma_C} f(\Phi_k)\Big], \qquad (17.60)$$
with C ∈ B⁺(X) any f-regular set. The following result is adapted from [267, Proposition
A.6.2].
Dividing by n and iterating (17.59) gives
$$n^{-1} P^n h(x) = n^{-1} h(x) + \eta - n^{-1}\sum_{k=0}^{n-1} P^k g(x).$$
The right hand side converges to η − π(g) for a.e. x by the f-Norm Ergodic Theo-
rem 14.0.1 in the aperiodic case, and by Theorem 14.3.6 in general. The left hand side
converges to zero by Lemma 17.7.3, which follows.
Lemma 17.7.3. Under the assumptions of Theorem 17.7.2, there exists a full and
absorbing set Xf such that
(iii) for x ∈ Xf ,
lim_{k→∞} k^{-1} E_x[V^*(Φ_k)] = lim_{k→∞} E_x[V^*(Φ_k) I{τ_C > k}] = 0.
Proof For (i) we take Xf equal to the set XV given in the f -Norm Ergodic Theo-
rem 14.0.1, intersected with the set Xf given in Theorem 14.2.5.
The proof of (ii) is identical to the proof of Theorem 11.3.5. Note that f ∗ is π-
integrable with zero mean by the generalized Kac’s Theorem given in (10.2).
To prove the first limit in (iii) we iterate the identity in (ii) to obtain
E_x[V^*(Φ_n)] = P^n V^*(x) = V^*(x) − Σ_{k=0}^{n−1} P^k f^*(x),   n ≥ 1.
Dividing by n and letting n → ∞, the right hand side converges to −π(f^*) = 0 for
x ∈ X_f by the f-Norm Ergodic Theorem 14.0.1. The ergodic theorem requires
aperiodicity; if this fails, we can apply the theorem to the d-skeleton chain using
Theorem 14.3.6.
By the definition of V^* and the Markov property we have for each m ≥ 1,

V^*(Φ_m) = E_{Φ_m}[ Σ_{k=0}^{σ_C} f(Φ_k) ]
         = E[ Σ_{k=m}^{τ_C} f(Φ_k) | F_m ]   on {τ_C ≥ m},      (17.63)

so that

E_x[ V^*(Φ_m) I{τ_C ≥ m} ] = E_x[ I{τ_C ≥ m} Σ_{k=m}^{τ_C} f(Φ_k) ].

If V^*(x) < ∞, then the right hand side vanishes as m → ∞ by the Dominated Conver-
gence Theorem. This proves the second limit in (iii).
Chapter 18
Positivity
Turning from the sample path and classical limit theorems for normalized sums of the
previous chapter, we now return to considering limits of the transition probabilities
P^n(x, A).
Our first goal in this chapter is to derive limit theorems for chains which are not
positive Harris recurrent. Although some results in this spirit have been derived as
ratio limit theorems such as Theorem 17.2.1 and Theorem 17.3.2, we have not to this
point considered in any detail the difference between limiting behavior of positive and
null recurrent chains.
The last five chapters have amply illustrated the power of ψ-irreducibility in the
positive case: that is, in conjunction with the existence of an invariant probability
measure. However, even in the non-positive case, powerful and elegant results can be
achieved. For Harris recurrent chains we prove a generalized version of the Aperiodic
Ergodicity Theorem of Chapter 13, which covers the null recurrent case and actually
subsumes the ergodic case also, since it applies to any Harris recurrent chain. We will
show
Theorem 18.0.1. Suppose Φ is an aperiodic Harris recurrent chain. Then for any
initial probability distributions λ, µ,
∫∫ λ(dx) µ(dy) ‖P^n(x, · ) − P^n(y, · )‖ → 0,   n → ∞.      (18.1)
If Φ is a null recurrent chain with invariant measure π, then for any constant ε > 0,
and any initial distribution λ,
lim_{n→∞} sup_{A∈B(X)} ∫ λ(dx) P^n(x, A) / [π(A) + ε] = 0.      (18.2)
Proof The first result is shown in Theorem 18.1.2 after developing some extended
coupling arguments and then applying the splitting technique. The consequence (18.2)
is proved in Theorem 18.1.3.
Our second goal in this chapter is to use these limit results to complete the charac-
terizations of positivity through a positive/null dichotomy of the local behavior of P^n
on suitable sets: not surprisingly, the sets of relevance are petite or compact sets in the
general or topological settings respectively.
In classical countable state space analysis, as in Chung [71] or Feller [114] or
Çinlar [59], it is standard to first approach positivity as an asymptotic “P^n-property”
of individual states. It is not hard to show that when Φ is irreducible, either
lim sup_{n→∞} P^n(x, y) > 0 for all x, y ∈ X or lim_{n→∞} P^n(x, y) = 0 for all x, y ∈ X.
These classifications then provide different but ultimately equivalent characterizations
of positive and null chains in the sense we have defined them, which is through the
finiteness or otherwise of π(X). In Theorem 18.2.2 we show that for ψ-irreducible
chains the positive/null dichotomy as defined in, say, Theorem 13.0.1 is equivalent to
similar dichotomous behavior of
‖P^k(x, · ) − π‖ → 0   as k → ∞,      (18.4)

lim_{n→∞} (1/n) Σ_{i=1}^n f(Φ_i) = π(f)   in probability;
This time, however, we define the second sequence {S'_1, S'_2, . . .} in a dependent way.
Let M be a (typically large, and yet to be chosen) integer. For each j define S'_j as being
exactly S_j if S_j > M, or, if S_j ≤ M, as an independent variable with the same
conditional distribution as S_j, namely

P(S'_j = k) = p(k)/(1 − p̄(M)),   k ≤ M,

where p̄(M) = Σ_{j>M} p(j).
This construction ensures that for j ≥ 1 the increments S_j and S'_j are identical in
distribution even though they are not independent. By construction, also, the quantities

W_j = S_j − S'_j

have the properties that they are identically distributed, they are bounded above by
M and below by −M, and they are symmetric around zero and in particular have zero
mean.
Let Φ^*_n = Σ_{j=0}^n W_j denote the random walk generated by this sequence of variables,
and let T^*_{ab} denote the first time that the random walk Φ^* returns to zero, when the
initial step W_0 = S_0 − S'_0 has the distribution induced by choosing a, b as the distributions
of S_0, S'_0 respectively.
As in Section 13.2 the coupling time of the two renewal processes is defined as
Tab = min{j : Za (j) = Zb (j) = 1}
where Z_a, Z_b are the indicator sequences of each renewal process, and since

Φ^*_n = Σ_{j=0}^n S_j − Σ_{j=0}^n S'_j,

the two renewal sequences have a common renewal, and hence couple, at the first time
Φ^* returns to zero.
Let us use this M in the construction of the random walk Φ∗ above. It is straightforward
to check that now Φ∗ really is irreducible, and so
P(Tab < ∞) = 1
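The dependent coupling is easy to simulate. In the sketch below the increment law p is an aperiodic example of our own choosing, not one from the text; the code builds {S_j}, {S'_j}, the bounded symmetric differences W_j, and the difference walk Φ^*, and reports the first return of Φ^* to zero.

    import numpy as np

    rng = np.random.default_rng(0)

    support = np.array([1, 2, 3, 4])        # an aperiodic renewal law p
    p = np.array([0.4, 0.3, 0.2, 0.1])
    M = 2                                   # truncation level of the coupling

    n = 10_000
    S = rng.choice(support, size=n, p=p)

    # S'_j = S_j when S_j > M; otherwise an independent draw from the
    # conditional law of S given S <= M.
    mask = support <= M
    q = p[mask] / p[mask].sum()
    S_prime = np.where(S > M, S, rng.choice(support[mask], size=n, p=q))

    W = S - S_prime                         # in [-M, M], symmetric, mean zero
    phi_star = np.cumsum(W)                 # the random walk Phi*

    start = np.flatnonzero(W != 0)[0]       # first nonzero step
    hits = np.flatnonzero(phi_star[start:] == 0)
    print("Phi* returns to zero at step",
          start + hits[0] if hits.size else "(not within horizon)")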
initial distributions indicated. It is obvious that the chains Va− , Vb− couple at the same
time Tab that Za− , Zb− couple.
Now let A be an arbitrary set in Z+ . Since the distributions of Va− and Vb− are
identical after the time Tab we have for any n ≥ 1 by decomposing over the values of
Tab and using the Markov or renewal property
P(V_a^-(n) ∈ A) = Σ_{m=1}^n P(T_ab = m) P(V_a^-(n − m) ∈ A) + P(V_a^-(n) ∈ A, T_ab > n),

P(V_b^-(n) ∈ A) = Σ_{m=1}^n P(T_ab = m) P(V_b^-(n − m) ∈ A) + P(V_b^-(n) ∈ A, T_ab > n),

so that

sup_{A⊆Z+} |P(V_a^-(n) ∈ A) − P(V_b^-(n) ∈ A)| ≤ P(T_ab > n).      (18.9)
We already know that the right hand side of (18.9) tends to zero. But the left hand
side can be written as
(1/2) ( |a ∗ u − b ∗ u| ∗ p )(n).      (18.10)
To do this we use a rather nice trick. Let us modify the distribution p(j) to form
another distribution p0 (j) on {0, 1, . . .} defined by setting
p0 (0) = p > 0;
p0 (j) = (1 − p)p(j), j ≥ 1.
Let us now carry out all of the above analysis using p0 , noting that even though this
is not a standard renewal sequence since p0 (0) > 0, all of the operations used above
remain valid.
Provided of course that p(j) is aperiodic in the usual way, we certainly have that
(18.8) holds for p0 and we can conclude that as n → ∞,
|a ∗ u0 − b ∗ u0 |(n) → 0, (18.11)
|a ∗ u0 − b ∗ u0 | ∗ p0 (n) → 0. (18.12)
Finally, by construction of p0 we have the two identities
and consequently, from (18.11) and (18.12), we have exactly (18.6) and (18.7) as re-
quired.
Note that in the null recurrent case, since we do not have Σ_n n p(n) < ∞, we cannot
prove this result from Lemma D.7.1 even though it is an identical conclusion to that
reached there in the positive recurrent case.
Theorem 18.1.2. Suppose Φ is an aperiodic Harris recurrent chain. Then for any
initial probability distributions λ, µ,
∫∫ λ(dx) µ(dy) ‖P^n(x, · ) − P^n(y, · )‖ → 0,   n → ∞.      (18.13)
Proof Yet again we begin with the assumption that there is an atom α in the
space. Then for any x we have from the Regenerative Decomposition (13.47)
‖P^n(x, · ) − P^n(α, · )‖ ≤ P_x(τ_α ≥ n) + |a_x ∗ u − u|(n) + (|a_x ∗ u − u| ∗ p)(n)      (18.14)
where now p(n) = Pα (τα > n). From Theorem 18.1.1 we know the last two terms in
(18.14) tend to zero, whilst the first tends to zero from Harris recurrence.
The result (18.13) then follows for any two specific initial starting points x, y from
the triangle inequality; it extends immediately to general initial distributions λ, µ from
dominated convergence.
As previously, the extension to strongly aperiodic chains is straightforward, whilst
the extension to general aperiodic chains follows from the contraction property of the
total variation norm.
We conclude with a consequence of this theorem which gives a uniform version of the
fact that, in the null recurrent case, we have convergence of the transition probabilities
to zero.
Theorem 18.1.3. Suppose that Φ is aperiodic and null recurrent, with invariant mea-
sure π. Then for any initial distribution λ and any constant ε > 0
lim_{n→∞} sup_{A∈B(X)} ∫ λ(dx) P^n(x, A) / [π(A) + ε] = 0.      (18.15)
and by Egorov’s Theorem and the fact that π(X) = ∞ this convergence is uniform on
a set with π-measure arbitrarily large.
In particular we can take k and D such that π(D) > δ^{-1} and

| ∫ λ(dx) P^{n_k}(x, B_k) − P^{n_k}(y, B_k) | ≤ εδ/2,   y ∈ D,      (18.18)

which gives

π(D) ≤ δ^{-1},

a contradiction.
The two results in Theorem 18.1.2 and Theorem 18.1.3 combine to tell us that, on
the one hand, the distributions of the chain from different initial conditions draw closer
together as n gets large; and on the other that, in the null recurrent case, the chain
spreads its mass ever more thinly relative to sets of finite π-measure, which describe the
“center” of the space.

and since in general, because of the cyclic behavior in Section 5.4, we may have

lim inf_{n→∞} P^n(x, A) < lim sup_{n→∞} P^n(x, A),

the condition (18.20) is often adopted as the next strongest stability condition after
(18.21).
This motivates the following definitions.
When Φ is irreducible, either all states are positive or all states are null, since for
any w, z there exist r, s such that P^r(w, x) > 0 and P^s(y, z) > 0, and

lim sup_{n→∞} P^{r+s+n}(w, z) ≥ P^r(w, x) [lim sup_{n→∞} P^n(x, y)] P^s(y, z).      (18.23)
We need to show that these solidarity properties characterize positive and null chains in
the sense we have defined them. One direction of this is easy, for if the chain is positive
recurrent, with invariant probability π, then we have for any n
π(y) = Σ_x π(x) P^n(x, y);

hence if lim_{n→∞} P^n(w, w) = 0 for some w, then by (18.23) and dominated convergence
π(y) ≡ 0, which is impossible. The other direction is easy only if one knows, not merely
that lim sup_{n→∞} P^n(x, y) > 0, but that (at least through an aperiodic class) this is
actually a limit. Theorem 18.1.3 now gives this to us.
Theorem 18.2.1. If Φ is irreducible on a countable space, then the chain is positive
recurrent if and only if some one state is positive. When Φ is positive recurrent, for
some d ≥ 1
lim_{n→∞} P^{nd+r}(x, y) = d π(y) > 0

for all x, y ∈ X, and some 0 ≤ r = r(x, y) ≤ d − 1; and when Φ is null

lim_{n→∞} P^n(x, y) = 0

for all x, y ∈ X.
Proof If the chain is transient, then since U (x, y) < ∞ for all x, y from Propo-
sition 8.1.1 we have that every state is null; whilst if the chain is null recurrent, then
since π(y) < ∞ for all y, Theorem 18.1.3 shows that every state is null.
Suppose that the chain is positive recurrent, with period d: then the Aperiodic
Ergodic Theorem for the chain on the cyclic class Dj shows that for x, y ∈ Dj we have
lim_{n→∞} P^{nd}(x, y) = d π(y) > 0
(i) The set A is called null if lim_{n→∞} P^n(x, A) = 0 for all x ∈ A.

(ii) The set A is called positive if lim sup_{n→∞} P^n(x, A) > 0 for all x ∈ A.
We now prove that these definitions are consistent with the definitions of null and
positive recurrence for general ψ-irreducible chains.
Theorem 18.2.2. Suppose that Φ is ψ-irreducible. Then:
(i) the chain Φ is positive recurrent if and only if every set B ∈ B+ (X) is positive;
(ii) if Φ is null, then every petite set is null, and hence there is a sequence of null sets
B_j with ∪_j B_j = X.
Proof If the chain is null, then either it is transient, in which case each petite set
is strongly transient and thus null by Theorem 8.3.5, or it is null and recurrent in which
case, since π exists and is finite on petite sets by Proposition 10.1.2, we have that every
petite set is again null from Theorem 18.1.3.
Suppose the chain is positive recurrent and we have A ∈ B+ (X). For x ∈ D0 ∩ H,
where H is the maximal Harris set and D0 an arbitrary cyclic set, we have for each r
which is positive for some r. Since for every x we have L(x, D0 ∩ H) > 0 we have that
the set A is positive.
Proof   In the null case we do not have boundedness in probability since P^n(x, y) → 0
for all x, y from Theorem 18.2.1.
In the positive case we have on each periodic set Dr a finite probability measure πr
such that if x ∈ D0
lim_{n→∞} P^{nd+r}(x, C) = π_r(C),      (18.24)
so by choosing a finite C such that πr (C) > 1 − ε for all 1 ≤ r ≤ d we have boundedness
in probability as required.
The identical conclusion holds for T-chains. To get the broadest presentation, recall
that a state x^* ∈ X is reachable if

U(y, O) > 0

for every y ∈ X and every open set O containing x^*.
Theorem 18.3.2. Suppose that Φ is a T-chain and admits a reachable state x^*. Then
Φ is a positive Harris chain if and only if it is bounded in probability.
Proof   First note from Proposition 6.2.1 that for a T-chain the existence of just
one reachable state x^* gives ψ-irreducibility, and thus Φ is either positive or null.
Suppose that Φ is bounded in probability. Then Φ is non-evanescent from Propo-
sition 12.1.1, and hence Harris recurrent from Theorem 9.2.2.
Moreover, boundedness in probability implies by definition that some compact set
is non-null, and hence from Theorem 18.2.2 the chain is positive Harris, since compact
sets are petite for T-chains.
Conversely, assume that the chain is positive Harris, with periodic sets Dj each
supporting a finite probability measure πj satisfying (18.24). Choose ε > 0 and compact
sets Cr ⊆ Dr such that πr (Cr ) > 1 − ε for each r.
If x ∈ Dj , then with C := ∪Cr ,
We now show that these topological properties for points can be linked to their
counterparts for the whole chain when the T-chain condition holds. This completes the
series of results begun in Theorem 9.3.3 connecting global properties of T-chains with
those at individual points.
Proof From Proposition 6.2.1 the existence of a reachable state ensures the chain
is ψ-irreducible. Assume that x∗ is positive. Since Φ is a T-chain, there exists an open
petite set C containing x∗ (take any precompact open neighborhood) and hence by
Theorem 18.2.2 the chain is also positive.
Conversely, suppose that Φ has an invariant probability π so that Φ is positive
recurrent. Since x^* is reachable it also lies in the support of π, and consequently any
neighborhood of x^* is in B^+(X). Hence x^* is positive as required, from Theorem 18.2.2.
Proof The identity P Π = Π which is proved in Theorem 12.4.1 implies that for
any f ∈ Cc (X), the adapted process (Π(Φk , f ), FkΦ ) is a bounded martingale. Hence
by the Martingale Convergence Theorem D.6.1 there exists a random variable π̃(f ) for
which
lim_{k→∞} Π(Φ_k, f) = π̃(f)   a.s. [P_*],

which gives π̃(f) = Π(x^*, f) a.s. [P_*]. Taking expectations gives Π(y, f) = E_y[π̃(f)] =
Π(x^*, f) for all y.
Since a finite measure on B(X) is determined by its values on continuous functions
with compact support, this shows that the measures Π(y, · ), y ∈ X, are identical. Let
π denote their common value.
To prove Proposition 18.4.2 we first show that (i) and (iii) are equivalent. To see
that (iii) implies (i), observe that under positivity of x^* we have Π(x^*, X) > 0, and since
Π(y, X) = π(X) does not depend on y it follows from Theorem 12.4.3 that Π(y, X) = 1
for all y. Hence π is an invariant probability, which shows that (i) does hold.
Conversely, if (i) holds, then by reachability of x^* we have x^* ∈ supp π and hence
every neighborhood of x^* is positive. This shows that (iii) also holds.
We now show that (i) is equivalent to (ii).
It is obvious that (i) implies (ii). To see the converse, observe that if (ii) holds, then
by Theorem 12.4.1 we have that π is an invariant probability. Moreover, since x^* is
reachable we must have that π(O) > 0 for any neighborhood O of x^*. Since Π(y, O) =
π(O) for every y, this shows that x^* is positive.
Hence (iii) holds, which implies that (i) also holds.
The next result justifies this definition of aperiodicity and strengthens Theo-
rem 12.4.1.
and hence v := lim_{k→∞} ∫ |P^k f| dπ exists.
Since {P^k f} is equicontinuous on compact subsets of X, there exist a continuous
function g and a subsequence {k_i} ⊂ Z+ for which P^{k_i} f → g as i → ∞ uniformly
on compact subsets of X. Hence we also have P^{k_i + ℓ} f → P^ℓ g as i → ∞ uniformly on
compact subsets of X.
By the Dominated Convergence Theorem we have for all ℓ ∈ Z+,

∫ P^ℓ g dπ = ∫ f dπ   and   ∫ |P^ℓ g| dπ = v.      (18.27)
We will now show that this implies that the function g cannot change signs on supp π.
Suppose otherwise, so that the open sets
so that g ≡ 0 on supp π. This shows that the limit (18.26) holds for all initial conditions
in supp π.
We now show that if a reachable state exists for an e-chain, then the limit in Propo-
sition 18.4.3 holds for each initial condition. A sample path version of Theorem 18.4.4
will be presented below.
Theorem 18.4.4. Suppose that Φ is an e-chain which is bounded in probability on
average. Then:

(i) A unique invariant probability π exists if and only if a reachable state x^* ∈ X
exists.

(ii) If an aperiodic reachable state x^* ∈ X exists, then for each initial state x ∈ X,

P^k(x, · ) →_w π   as k → ∞,      (18.29)

where π is the unique invariant probability for Φ. Conversely, if (18.29) holds for
all x ∈ X, then every state in supp π is reachable and aperiodic.
Proof The proof of (i) follows immediately from Proposition 18.4.2, and the con-
verse of (ii) is straightforward.
To prove the remainder, we assume that the state x^* ∈ X is reachable and aperiodic,
and show that (18.29) holds for all initial conditions.
Suppose that ∫ f dπ = 0, |f(x)| ≤ 1 for all x, and for fixed ε > 0 define the set O_ε;
it can then be shown that

lim_{N→∞} P^N(x, O_ε) = 1.
18.5 The LLN for e-chains

Define the occupation probabilities

µ̃_n{A} := S_n(I_A) = (1/n) Σ_{k=1}^n I{Φ_k ∈ A},   n ∈ Z+, A ∈ B(X).      (18.30)
Observe that {µ̃k } are not probabilities in the usual sense, but are probability-valued
random variables.
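On a finite state space the occupation probabilities can be computed along a simulated path and compared with π directly. A minimal sketch, with a kernel and test function of our own choosing:

    import numpy as np

    rng = np.random.default_rng(1)

    P = np.array([[0.2, 0.8, 0.0],
                  [0.3, 0.4, 0.3],
                  [0.0, 0.7, 0.3]])

    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()

    # Simulate the chain and form the occupation probabilities (18.30).
    n_steps = 100_000
    x, counts = 0, np.zeros(3)
    for _ in range(n_steps):
        x = rng.choice(3, p=P[x])
        counts[x] += 1
    mu_tilde = counts / n_steps

    f = np.array([0.0, 1.0, 5.0])           # a bounded test function
    print("mu_n(f) =", mu_tilde @ f, "  pi(f) =", pi @ f)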
The Law of Large Numbers (Theorem 17.1.2) states that if an invariant probability
measure π exists, then the occupation probabilities converge with probability one for
each initial condition lying in a set of full π-measure. We now present two versions of
the law of large numbers for e-chains where the null set appearing in Theorem 17.1.2
is removed by restricting consideration to continuous, bounded functions. The first is
a Weak Law of Large Numbers, since the convergence is only in probability, while the
second is a Strong Law with convergence occurring almost surely.
(ii) If for each initial condition of the Markov chain the occupation probabilities are
almost surely tight, then as k → ∞
µ̃_k →_w π   a.s. [P_*].      (18.31)
| (1/N) Σ_{k=1}^N P_M f(Φ_k) − ∫ f dπ | ≤ ε + (1/N) Σ_{i=1}^N I{Φ_i ∈ C^c},      (18.32)

where P_M := M^{-1} Σ_{n=1}^M P^n.
For each fixed n and N we may write

(1/N) Σ_{k=1}^N f(Φ_k) − ∫ f dπ = (1/N) Σ_{k=1}^N [ f(Φ_k) − P^n f(Φ_{k−n}) ]
   + (1/N) Σ_{k=1}^N [ P^n f(Φ_{k−n}) − P^n f(Φ_k) ]
   + (1/N) Σ_{k=1}^N [ P^n f(Φ_k) − ∫ f dπ ],

and hence, averaging over 1 ≤ n ≤ M,

(1/N) Σ_{k=1}^N f(Φ_k) − ∫ f dπ = (1/M) Σ_{n=1}^M (1/N) Σ_{k=1}^N [ f(Φ_k) − P^n f(Φ_{k−n}) ]
   + (1/N) Σ_{k=1}^N [ (1/M) Σ_{n=1}^M P^n f(Φ_k) − ∫ f dπ ]
   + (1/M) Σ_{n=1}^M (1/N) Σ_{k=1}^N [ P^n f(Φ_{k−n}) − P^n f(Φ_k) ].
The final term telescopes in k, and hence, recalling our definition of the transition
function P_M, we have
| (1/N) Σ_{k=1}^N f(Φ_k) − ∫ f dπ | ≤ Σ_{i=0}^{M−1} (1/N) | Σ_{k=1}^N [ P^i f(Φ_{k−i}) − P^{i+1} f(Φ_{k−i−1}) ] |

   + | (1/N) Σ_{k=1}^N P_M f(Φ_k) − ∫ f dπ |

   + 2M/N.      (18.34)
For each fixed 0 ≤ i ≤ M − 1 the sequence

( P^i f(Φ_{k−i}) − P^{i+1} f(Φ_{k−i−1}), F^Φ_{k−i} ),   k > i,

is a bounded martingale difference sequence.
≤ (1/(γ − ε)) lim sup_{N→∞} E_x[ (1/N) Σ_{i=1}^N I{Φ_i ∈ C^c} ].
Since Φ is bounded in probability on average, the right hand side decreases to zero as
C ↑ X, which completes the proof of (i).
To prove (ii), suppose that the occupation probabilities {µ̃k } are tight along some
sample path. Then we may choose the compact set C in (18.32) so that along this
sample path
lim sup_{N→∞} | (1/N) Σ_{k=1}^N P_M f(Φ_k) − ∫ f dπ | ≤ 2ε,
so that the Strong Law of Large Numbers holds for all f ∈ C(X) and all initial conditions
x ∈ X.
Let {fn } be a sequence of continuous functions with compact support which is dense
in Cc (X) in the uniform norm. Such a sequence exists by Proposition D.5.1. Then by
the preceding result,
P_x{ lim_{k→∞} ∫ f_n dµ̃_k = ∫ f_n dπ for each n ∈ Z+ } = 1,
which implies that µ̃_k →_v π as k → ∞. Since π is a probability, this shows that in fact
µ̃_k →_w π a.s. [P_*], and this completes the proof.
We conclude by stating a result which, combined with Theorem 18.5.1, provides a
test function approach to establishing the Law of Large Numbers for Φ. For a proof
see [259].
Theorem 18.5.2. If a coercive function V and a compact set C satisfy condition (V4),
then Φ is bounded in probability, and the occupation probabilities are almost surely tight
for each initial condition. Hence, if Φ is an e-chain, and if a reachable state exists,
w
µ̃k −→ π as k → ∞ a.s. [P∗ ]. (18.35)
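Condition (V4) is often easy to check by a short computation. The sketch below verifies the geometric drift ∆V(x) ≤ −βV(x) + b I_C(x) for a reflected random walk on Z+ with an increment law of our own choosing and V(x) = e^{αx}; the model and constants are illustrative, not taken from the text.

    import numpy as np

    # Reflected random walk on Z+ with increment law of our choosing:
    # P(Z=-1)=0.5, P(Z=0)=0.3, P(Z=+1)=0.2 (negative mean drift).
    alpha = 0.2
    probs = {-1: 0.5, 0: 0.3, 1: 0.2}
    V = lambda x: np.exp(alpha * x)

    def PV(x):
        # E[V(Phi_1) | Phi_0 = x], with reflection at zero
        return sum(p * V(max(x + z, 0)) for z, p in probs.items())

    # For x >= 1 the ratio PV(x)/V(x) equals E[exp(alpha Z)] = lam < 1,
    # so (V4) holds with C = {0}, beta = 1 - lam and b = PV(0).
    lam = max(PV(x) / V(x) for x in range(1, 50))
    assert lam < 1
    print("lam =", lam, " beta =", 1 - lam, " b =", PV(0))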
18.6 Commentary
Theorem 18.1.2 for positive recurrent chains is first proved in Orey [308], and the null
recurrent version we give here is in Jamison and Orey [177]. The dependent coupling
which we use to prove this result for null recurrent chains is due to Ornstein [310],
[311], and is also developed in Berbee [25]. Our presentation of this material has relied
heavily on Nummelin [303], and further related results can be found in his Chapter 6.
Theorem 18.1.3 is due to Jain [171], and our proof is taken from Orey [309].
The links between positivity of states, boundedness in probability, and positive
Harris recurrence for T-chains are taken from Meyn [259], Meyn and Tweedie [277]
and Tuominen and Tweedie [391]. In [277] analogues of Theorem 18.3.2 and Proposi-
tion 18.3.3 are obtained for non-irreducible chains.
The convergence result Theorem 18.4.4 for chains possessing an aperiodic reachable
state is based upon Theorem 8.7.2 of Feller [115].
The use of the martingale property of Π(Φk , f ) to obtain uniqueness of the invariant
probability in Proposition 18.4.2 is originally in [175]. This is a powerful technique which
is perhaps even more interesting in the absence of a reachable state.
For suppose that the chain is bounded in probability but a reachable state does
not exist, and define an equivalence relation on X as follows: x ↔ y if and only if
Π(x, · ) = Π(y, · ). It follows from the same techniques which were used in the proof of
Proposition 18.4.2 that if x is recurrent, then the set E^x of all states y for which y ↔ x
is closed. Since x ∈ E^x for every recurrent point x ∈ R, the set F = X − ∪_{x∈R} E^x consists
entirely of non-recurrent points. It then follows from Proposition 3.3 of Tuominen and
Tweedie [392] that F is transient.
From this decomposition and Proposition 18.4.3 it is straightforward to generalize
Theorem 18.4.4 to chains which do not possess a reachable state. The details of this
decomposition are spelled out in Meyn and Tweedie [281].
Such decompositions have a large literature for Feller chains and e-chains: see for
example Jamison [175] and also Rosenblatt [337] for e-chains, and Jamison and Sine
[178], Sine [358, 357, 356] and Foguel [121, 123] for Feller chains and the detailed
connections between the Feller property and the stronger e-chain property. All of these
papers consider exclusively compact state spaces. The results for non-compact state
spaces appear here for the first time.
The LLN for e-chains is originally due to Breiman [46] who considered Feller chains
on a compact state space. Also on a compact state space is Jamison’s extension of
Breiman’s result [174] where the LLN is obtained without the assumption that a unique
invariant probability exists.
One of the apparent difficulties in establishing this result is finding a candidate
limit π̃(f) of the sample path averages n^{-1} S_n(f). Jamison resolved this by considering
the transition function Π, and the associated convergent martingale (Π(Φk , A), FkΦ ). If
the chain is bounded in probability on average, then we define the random probability
π̃ as
π̃{A} := lim_{k→∞} Π(Φ_k, A),   A ∈ B(X).      (18.36)
It is then easy to show by modifying (18.34) that Theorem 18.5.1 continues to hold
with ∫ f dπ replaced by ∫ f dπ̃, even when no reachable state exists for the chain. The
proof of Theorem 18.5.1 can be adapted after it is appropriately modified using the
limit (18.36).
Chapter 19
Generalized classification
criteria
We have now developed a number of simple criteria, solely involving the one-step transi-
tion function, which enable us to classify quite general Markov chains. We have seen, for
example, that the equivalences in Theorem 11.0.1, Theorem 13.0.1, or Theorem 15.0.1
give an effective approach to the analysis of many systems.
For more complex models, however, the analysis of the simple one-step drift
∆V(x) = ∫ P(x, dy) [V(y) − V(x)]
towards petite sets may not be straightforward, or indeed may even be impracticable.
Even though we know from the powerful converse theory in the theorems just mentioned
that for most forms of stability, there must be at least one V with the one-step drift
∆V suitably negative, finding such a function may well be non-trivial.
In this chapter we conclude our approach to stochastic stability by giving a number
of more general drift criteria which enable the classification of chains where the one-step
criteria are not always straightforward to construct. All of these variations are within
the general framework described previously. The steps to be used in practice are, we
hope, clear from the preceding chapters, and follow the route reiterated in Appendix A.
There are three generalizations of the drift criteria which we consider here.
(a) State-dependent drift conditions, which allow for negative drifts after a number of
steps n(x) depending on the state x from which the chain starts.
(b) Path- or history-dependent drift conditions, which allow for functions of the whole
past of the process to show a negative drift.
(c) Mixed or “average” drift conditions, which allow for functions whose drift varies
in direction, but which is negative in a suitably “averaged” way.
For each of these we also indicate the application of the method by example. The
state-dependent drift technique is used to analyze random walk on R2+ and a model
and let Φ̂ be the corresponding Markov chain. This Markov chain may be constructed
explicitly as follows. The time n(x) is a (trivial) stopping time. Let s(k) denote its
iterates: that is, along any sample path, s(0) = 0, s(1) = n(x) and

s(k + 1) = s(k) + n(Φ_{s(k)}),   k ≥ 1.
We let τ̂A , σ̂A denote the first-return and first-entry index to A respectively for the
chain Φ̂. Clearly s(k) and the events {σ̂A ≥ k}, {τ̂A ≥ k} are F̂k −1 -measurable for any
A ∈ B(X).
Note that s(τ̂_C) denotes the time of first return to C by the original chain Φ along
an embedded path, defined by

s(τ̂_C) := Σ_{k=0}^{τ̂_C − 1} n(Φ̂_k).      (19.4)
These relations will enable us to use the drift equations (19.1), with which we will
bound the index at which Φ̂ reaches C, to bound the hitting times on C by the original
chain.
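The embedded chain is straightforward to realize along a simulated path. A minimal sketch, in which the underlying chain and the sampling function n(·) are illustrative choices of ours:

    import numpy as np

    rng = np.random.default_rng(2)

    def step(x):
        # reflected random walk on Z+ with negative mean increments
        return max(x + rng.choice([-1, 0, 1], p=[0.5, 0.3, 0.2]), 0)

    def n_of(x):
        return 1 + x                  # look further ahead from large states

    T = 10_000
    path = np.empty(T + 1, dtype=int)
    path[0] = 20
    for t in range(T):
        path[t + 1] = step(path[t])

    # s(0) = 0, s(k+1) = s(k) + n(Phi_{s(k)}); Phi_hat_k = Phi_{s(k)}.
    s, phi_hat = 0, [path[0]]
    while s + n_of(path[s]) <= T:
        s += n_of(path[s])
        phi_hat.append(path[s])

    print("first embedded samples:", phi_hat[:10])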
We first give a state-dependent criterion for Harris recurrence.
Theorem 19.1.1. Suppose that Φ is a ψ-irreducible chain on X, and let n(x) be a
function from X to Z+ . The chain is Harris recurrent if there exists a non-negative
function V unbounded off petite sets and some petite set C satisfying
∫ P^{n(x)}(x, dy) V(y) ≤ V(x),   x ∈ C^c.      (19.6)
Firstly, if the chain is transient, then by Theorem 8.3.5 each Cn is uniformly transient,
and hence V (Φk ) → ∞ as k → ∞ a.s. [P∗ ], and so (19.7) holds.
Secondly, if Φ is recurrent, then the state space may be written as
X=H ∪N (19.8)
where H = N c is a maximal Harris set and ψ(N ) = 0; this follows from Theorem 9.0.1.
Since for each n the set C_n is petite, each C_n leads to H, and hence by Theorem 9.1.3,
It follows that the inclusion {lim inf V (Φn ) < ∞} ⊂ {Φ ∈ H i.o.} holds with probability
one. Thus (19.7) holds for any x0 ∈ N , and if the chain is not Harris, we know N is
non-empty.
Now from (19.7) there exists M ∈ Z+ with

P_{x_0}( {Φ_k ∈ C^c, k ≥ M} ∩ {V(Φ_k) → ∞} ) > 0;
Hence (Mk , F̂k ) is a positive supermartingale, so that from Theorem D.6.2 there exists
an almost surely finite random variable M∞ such that Mk → M∞ a.s. as k → ∞. From
the construction of Mi , either σ̂C < ∞ in which case M∞ = 0, or σ̂C = ∞ in which
case lim supk →∞ V (Φ̂k ) = M∞ < ∞ a.s.
Since σC < ∞ whenever σ̂C < ∞, this shows that for any initial distribution µ,
P_µ( {σ_C < ∞} ∪ {lim inf_{n→∞} V(Φ_n) < ∞}^c ) = 1.
Proof The state-dependent drift criterion for positive recurrence is a direct con-
sequence of the f -ergodicity results of Theorem 14.2.2, which tell us that without any
irreducibility or other conditions on Φ, if f is a non-negative function and
∫ P(x, dy) V(y) ≤ V(x) − f(x) + b I_C(x),   x ∈ X,      (19.12)
Again define the chain Φ̂ as in (19.3). From (19.10) we can use (19.13) for Φ̂, with
f (x) taken as n(x), to deduce that
E_x[ Σ_{k=0}^{τ̂_C − 1} n(Φ̂_k) ] ≤ V(x) + b.      (19.14)
But we have by adding the lengths of the embedded times n(x) along any sample path
that from (19.4)
Σ_{k=0}^{τ̂_C − 1} n(Φ̂_k) = s(τ̂_C) ≥ τ_C.
Thus from (19.14) and the fact that V is bounded on the petite set C, we have that Φ
is positive Harris using the one-step criterion in Theorem 13.0.1, and the bound (19.11)
follows also from (19.14).
We conclude the section with a state-dependent criterion for geometric ergodicity.
Theorem 19.1.3. Suppose that Φ is a ψ-irreducible chain on X, and let n(x) be a
function from X to Z+ . The chain is geometrically ergodic if it is aperiodic and there
exists some petite set C, a non-negative function V ≥ 1 and bounded on C, and positive
constants λ < 1 and b < ∞ satisfying
∫ P^{n(x)}(x, dy) V(y) ≤ λ^{n(x)} [V(x) + b I_C(x)].      (19.15)
We now adapt the proof of Theorem 15.2.5. Define the random variables

Z_k = κ^{s(k)} V(Φ̂_k)

for k ∈ Z+. It follows from (19.18) that for κ = λ^{-1}, since κ^{s(k+1)} is F̂_k-measurable,
Collapsing the sum on the left and using the fact that only the first term in the sum
on the right is non-zero, we get
Since V < ∞ and V is assumed bounded on C, and again using the fact that s(τ̂C ) > τC ,
we have from Theorem 15.0.1 (ii) that the chain is geometrically ergodic.
The final bound in (19.16) comes from the fact that for some r, an upper bound on
the state-dependent constant term in (19.16) is shown in Theorem 15.4.1 to be given
by

R(x) = E_x[κ^{τ_C}] ≤ E_x[κ^{s(τ̂_C)}] ≤ (2 + b) V(x)

since V ≥ 1.
(Φ_n(1), Φ_n(2)) = ([Φ_{n−1}(1) + Z_n(1)]_+, [Φ_{n−1}(2) + Z_n(2)]_+).      (19.20)
Let us assume that for each coordinate we have negative mean increments: that is,

E[Z_n(1)] < 0,   E[Z_n(2)] < 0.

This assumption ensures that the chain is a δ_{(0,0)}-irreducible chain with all compact
sets petite. To see this note that there exists h > 0 such that

P(Z_k(1) < −h) > h,   P(Z_k(2) < −h) > h.

This provides δ_{(0,0)}-irreducibility, and moreover shows that S_w is small, with ν = δ_{(0,0)}
in (5.14).
We will also assume that the second moments of the increments are finite:

E[Z_n(1)²] < ∞,   E[Z_n(2)²] < ∞.

Thus it follows from Proposition 14.4.1 that each of the marginal random walks on
[0, ∞) is positive Harris with stationary measures π_1, π_2 satisfying

β_1 := ∫ z π_1(dz) < ∞,   β_2 := ∫ z π_2(dz) < ∞.      (19.21)
Choose M < ∞ and ε > 0 such that

E[Z_k(1) I{Z_k(1) ≥ −M}] < −ε,   E[Z_k(2) I{Z_k(2) ≥ −M}] < −ε.

This ensures that on the set A(M) = {x ≥ M, y ≥ M}, we have that (19.10) holds with
n(x, y) = 1 in the usual manner.
Now consider the strip A1 (M, m) = {x ≤ M, y ≥ m}, and fix (x, y) ∈ A1 (M, m).
Let us choose a given fixed number of steps n, and choose m > (M + 1)n. At each
step in the time period {0, . . . , n} the expected value of Φn (2) decreases in expectation
by at least ε. Moreover, from (19.21) and the f -norm ergodic result (14.5) we have that
by convergence there is a constant c0 such that for all n
E_{(x,y)}[Φ_n(1) I{τ_0 > n}] ≤ E_{(M,y)}[Φ_n(1) I{τ_0 > n}] =: ζ_M(n),      (19.23)

and hence

E_{(x,y)}[Φ_n(1) + Φ_n(2)] = E_{(x,y)}[Φ_n(2)] + E_{(x,y)}[Φ_n(1) I{τ_0 > n}]
   ≤ y − nε + ε_0 + c_0.
Thus for x ≤ M , we have uniform negative n-step drift in the region A1 (M, m) provided
nε > M + ε0 + c0
as required.
A similar construction enables us to find that for fixed large n the n-step drift in
the region A2 (m, M ) is negative also. Thus we have shown
Theorem 19.1.4. If the bivariate random walk on R2+ has negative mean increments
and finite second moments in both coordinates, then it is positive Harris recurrent, and
for sets A(m) = {x ≥ m, y ≥ m} with m large, and some constant c,
In this example, we do not use the full power of the results of Section 19.1. Only
three values of n(x, y) are used, and indeed it is apparent from the construction in
(19.24) that we could have treated the whole chain on the region
{x ≥ M + n} ∪ {y ≥ M + n}
for the same n. In this case the n-skeleton {Φ_{nk}} would be shown to be positive
recurrent, and it follows from the fact that the invariant measure for {Φ_k} is also
invariant for {Φ_{nk}} that the original chain is positive Harris: see Chapter 10. This
example does, however, indicate the steps that we could go through to analyze less
homogeneous models, and also indicates that it is easier to analyze the boundaries
or non-standard regions independently of the interior or standard region of the space
without the need to put the results together for a single fixed skeleton.
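The walk (19.20) is also easy to simulate, and the finiteness of mean return times guaranteed by the drift argument can be observed empirically. In the sketch below the common increment law (with negative mean in each coordinate) is an illustrative choice of ours:

    import numpy as np

    rng = np.random.default_rng(3)
    m = 5                                  # the corner set {x <= m, y <= m}

    def increment():
        # negative mean (-0.7) in each coordinate; our own choice
        return rng.choice([-2, -1, 0, 1], p=[0.3, 0.3, 0.2, 0.2], size=2)

    def return_time(start, cap=100_000):
        x, y = start
        for t in range(1, cap):
            dx, dy = increment()
            x, y = max(x + dx, 0.0), max(y + dy, 0.0)
            if x <= m and y <= m:
                return t
        return cap

    times = [return_time((20.0, 20.0)) for _ in range(200)]
    print("estimated mean return time to the corner set:", np.mean(times))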
Note that the chain is δ(0,0) -irreducible under assumptions (A1)–(A3), regardless of
the behavior at zero. Thus the model can be formulated to allow for a stationary
distribution at (0, 0) (i.e., extinction) or for rebirth and a more generally distributed
stationary distribution over the whole of Z²_+. The only restriction we place in general
is that the increments from (0, 0) have finite mean: here we will not make this more
explicit as it does not affect our analysis.
Let us, to avoid unrewarding complexities, add to (19.26) the additional condition
that the model is “left-continuous,” that is, has bounded negative increments defined
by
P((i, j), (i − l, j − k)) = 0,   i, j > 0,   k, l > 1;      (19.29)
this would be appropriate if the chain were embedded at the jumps of a continuous time
process, for example.
To evaluate positive recurrence of the model, we use the test function V (i, j) =
[i + j]/β, where β < ε is to be chosen.
Analysis of this model in the interior of the space is not difficult: by using (V2)
with V(i, j) on I = {i, j ≥ 1}, we have that E_{i,j}[τ_{I^c}] < (i + j)/β from assumption (A1).
The difficulty with such multidimensional models is that even though they reach I c in
a finite mean time, they may then “escape” along one or both of the boundaries. It is
in this region that the tools of Section 19.1 are useful in assisting with the classification
of the model.
Starting at B_1(c) = {(i, 0), i > c}, the infinite boundary edge above c, we have that
the one-step mean drift of V is negative if c > d, so that (19.10) also holds with n = 1
provided we choose c > max(d, β^{-1}).
On the other infinite boundary edge, denoted B2 (c) = {(0, j), j > c}, however, we
have positive one-step drift of the function V . Now from the starting point (0, j), let
us consider the (j + 1)-step drift. This is bounded above by [j + d − 2jε]/β and so we
have (19.10) also holds with n(j) = j + 1 provided
[j + d − 2jε]/β < −j − 1,
which will hold provided β < 2ε − 1 and we then choose c > (d + β)/(2ε − 1 − β).
Consequently we can assert that, writing C^c = I ∪ B_2(c) ∪ B_1(c) with c satisfying
both these constraints, the mean hitting time on C is bounded from each starting point,
regardless of the threshold level d, and so the invading strategy is successful in over-
coming the antibody defense.
Note that in this model there is no fixed time at which the drift from all points on
the boundary B_2(c) is uniformly negative, no matter what the value of c chosen. Thus,
state-dependent drift conditions appear to be needed to analyze this model.
To test for geometric ergodicity we use the function V (i, j) = exp(αi) + exp(αj) and
adopt the approach in Section 16.3.
We assume that the increments in the model have uniformly geometrically decreasing
tails and bounded second moments: specifically, we assume each coordinate process
satisfies, for some γ > 0,

θ_i(γ) := Σ_{k≥−1} exp(γk) P_{i,j}(Φ_1(1) = i + k) < ∞,   j ≥ 1,
θ_j(γ) := Σ_{k≥−1} exp(γk) P_{i,j}(Φ_1(2) = j + k) < ∞,   i ≥ 1,      (19.30)

and

Σ_{k≥−1} k² P_{i,j}(Φ_1(1) = i + k) < D_1,   j ≥ 1,
Σ_{k≥−1} k² P_{i,j}(Φ_1(2) = j + k) < D_2,   i ≥ 1.      (19.31)
Then on the interior set I we have, for α < γ,

Σ_{k,l} P((r, s), (k, l)) V(k, l) − V(r, s) ≤ exp(αr)[θ_r(α) − 1] + exp(αs)[θ_s(α) − 1]
   ≤ −(αε/2) exp(αr) − (αε/2) exp(αs)

for small enough α, using a Taylor series expansion and the uniform conditions (19.30)
and (19.31). Thus (19.15) holds with n = 1 and λ = 1 − αε/2.
Starting at B_1(c), (19.15) also obviously holds provided we choose c large enough.
On the other infinite boundary edge B_2(c) = {(0, j), j > c} we have a similar construc-
tion for the (j + 1)-step drift. We have, using the uniform bounds (19.31) assumed on
the variances,

Σ_{k,l} P^{j+1}((0, j), (k, l)) V(k, l) ≤ exp(α(j + d))[1 − ε/2]^j + exp(αj)[1 − ε/2]^j,

and so, for α suitably small, we have (19.15) holding again as required.
As usual the condition that σC > k means that Φi ∈ C c for each i between 0 and
k. Since C will usually be assumed “small” in some sense (either petite or compact),
(19.32) implies that there is a drift towards the “center” of the state space when Φ is
“large” in exactly the same way that (V2) does.
From these generalized drift conditions and Dynkin’s formula we find
Theorem 19.2.1. If {V_k} satisfies (19.32), then

E_x[τ_C] ≤ ε^{-1} V_0(x),   x ∈ C^c,
E_x[τ_C] ≤ 1 + ε^{-1} P V_0(x),   x ∈ C.
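In the special case V_k = V(Φ_k) of a one-step drift, the bound of Theorem 19.2.1 is easily checked by simulation. The sketch below uses a reflected random walk of our own choosing with mean increment −0.3, so that V(x) = x drifts down by ε = 0.3 off C = {0} and E_x[τ_C] ≤ x/0.3:

    import numpy as np

    rng = np.random.default_rng(4)
    eps = 0.3                              # drift of V(x) = x off C = {0}

    def tau_C(x):
        # first hitting time of C = {0} for the reflected walk
        t = 0
        while x > 0:
            x = max(x + rng.choice([-1, 0, 1], p=[0.5, 0.3, 0.2]), 0)
            t += 1
        return t

    x0 = 30
    est = np.mean([tau_C(x0) for _ in range(500)])
    print("estimated E_x[tau_C] =", est, "  bound =", x0 / eps)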
Suppose instead that the adapted process {V_k} is a submartingale off the set C:

E[V_{k+1} | F_k^Φ] ≥ V_k   when σ_C > k, k ∈ Z+.      (19.33)
Clearly the process Vk ≡ 1 satisfies (19.33), so we will need some auxiliary conditions
to prove anything specific when (19.33) holds.
Theorem 19.2.2. Suppose that {Vk } satisfies (19.33), and let x0 ∈ C c be such that
Suppose moreover that the conditional absolute increments have bounded means: that is,
for some constant B < ∞,

E[ |V_k − V_{k−1}| | F^Φ_{k−1} ] ≤ B.      (19.35)
Then Ex 0 [τC ] = ∞.
Proof The proof of Theorem 11.5.1 goes through without change, although in this
case the functions Vk in that proof are not taken simply as V (Φk ) but as Vk (Φ0 , . . . , Φk ).
Proof   Let λ < ρ < 1, and define the precompact set C and the constant ε > 0 by

C = { x ∈ X : V(x) ≤ 2L/(ρ − λ) + 1 },   ε = (ρ − λ)/2.
Then for all k ∈ Z+,

E[V_{k+1} | F_k^Φ] ≤ ρ V_k + [ L + (ρ − λ) − ((ρ − λ)/2)(V(Φ_k) + 1) ]^+ − ((ρ − λ)/2)(V(Φ_k) + 1).
0 ≤ E_x[Z_{τ_C ∧ m}] = E_x[Z_1] + E_x[ Σ_{k=2}^{τ_C ∧ m} ( E[Z_k | F^Φ_{k−1}] − Z_{k−1} ) I(τ_C ≥ 2) ]

   ≤ E_x[Z_1] − E_x[ Σ_{k=2}^{τ_C ∧ m} ε r^k f(Φ_{k−1}) I(τ_C ≥ 2) ].
This and the Monotone Convergence Theorem shows that for all x ∈ X,

E_x[ Σ_{k=1}^{τ_C} r^k f(Φ_{k−1}) ] ≤ ε^{-1} r E_x[V_1] + r V(x).
This completes the proof, since Ex [V1 ] + V (x) ≤ λV0 (x) + L + V0 (x) by (19.37) and
(19.36).
when σ_C > k, k ∈ Z+,

where µ = P^M(x, · ).
This time let Mi = Vi I{σC ≥ i}. Again we have that (Mk , FkΦ ) is a positive
supermartingale, since
Hence there exists an almost surely finite random variable M∞ such that Mk → M∞
as k → ∞.
But as in Theorem 9.4.1, either σC < ∞ in which case M∞ = 0, or σC = ∞ which
contradicts (19.39). Hence Φ is again non-evanescent.
The Harris recurrence when Φ is a T-chain follows as usual by Theorem 9.2.2.
Finally, we give a criterion for transience using a time-varying test function.
E[V_{k+1} | F_k^Φ] ≥ V_k   when σ_A > k, k ∈ Z+.      (19.40)
Theorem 19.2.5. Suppose that the process Vk satisfies (19.40) for a set A, and suppose
that for deterministic constants L > M ,
V_k ≤ L,   I{σ_A = k} V_k ≤ M,   k ∈ Z+.
and the adapted process (Mk , FkΦ ) is thus a submartingale. Hence (L − Mk , FkΦ ) is a
positive supermartingale. By Kolmogorov’s Inequality (Theorem D.6.3) it follows that
for any T > 0

P_x{ sup_{k≥0} (L − M_k) ≥ T } ≤ (L − M_0(x)) / T.
Letting T = L − M, and noting that M_0(x) ≥ V_0(x), gives

P_x{ inf_{k≥0} M_k ≤ M } ≤ (L − V_0(x)) / (L − M).
Finally, since M_k = M for all k sufficiently large whenever σ_A < ∞, it follows that

P_x{σ_A = ∞} ≥ P_x{ inf_{k≥0} M_k > M } ≥ (V_0(x) − M) / (L − M),
which is the desired bound.
Proof   It follows as in the proof of Proposition 17.3.5 that the joint process
Φ_k = (θ_k, Y_k), k ≥ 0, is an aperiodic, ψ-irreducible T-chain.
In view of Theorem 19.2.3 it is enough to show that the history-dependent drift
(19.37) holds for an adapted process {Vk }. We now indicate how to construct such a
process.
First use the estimate x ≤ e^{-1} e^x to show

| Π_{i=j}^k θ_i | ≤ Π_{i=j}^k e^{-1} e^{|θ_i|} = e^{-(k−j+1)} exp( Σ_{i=j}^k |θ_i| ).      (19.42)
Since

Σ_{i=j}^k |θ_i| ≤ |α| Σ_{i=j}^k |θ_i| + |α| |θ_{j−1}| + Σ_{i=j}^k |Z_i|,

we have

Σ_{i=j}^k |θ_i| ≤ (|α|/(1 − |α|)) |θ_{j−1}| + (1/(1 − |α|)) Σ_{i=j}^k |Z_i|,      (19.43)
| Π_{i=j}^k θ_i | ≤ e^{-(k−j+1)} exp{ (|α|/(1 − |α|)) |θ_{j−1}| } × exp{ (1/(1 − |α|)) Σ_{i=j}^k |Z_i| }.      (19.44)
Squaring both sides of (17.28) and applying (19.44), we obtain a bound on Y_k² through
the quantities

A_k = ( Σ_{j=1}^k |W_j| exp{ (|α|/(1 − |α|)) |θ_{j−1}| } exp{ Σ_{i=j}^k ( (1/(1 − |α|)) |Z_i| − 1 ) } )²,

B_k = θ_0² Y_0² exp{ (2|α|/(1 − |α|)) |θ_0| } exp{ Σ_{i=1}^k ( (2/(1 − |α|)) |Z_i| − 2 ) }.
If we define

C_k = exp{ (2|α|/(1 − |α|)) |θ_k| },
we have the three bounds, valid for any ε > 0,
This is shown in [275] and we omit the details which are too lengthy for this exposition.
The constant ε will be assumed small, but we will keep it free until we have performed
one more calculation. For k ≥ 0 we make the definition

V_k = ε³ Y_k² + ε² A_k + B_k + C_k.

E[V_{k+1} | F_k^Φ] ≤ 3ε³ A_k + 3ε³ B_k + ζ_z² ε² (1 + ε) A_k + ζ_z² ε² (1 + ε^{-1}) E[W²] C_k
   + ζ_z² B_k + |α| C_k + R.
are all well defined and finite everywhere. Obviously we need a little less than (19.46)
to guarantee this, but (19.46) will also be a convenient condition elsewhere.
Theorem 19.3.1. Suppose that Φ is ψ-irreducible and that V ≥ 0 satisfies (19.46). A
sufficient condition for the chain to be positive is that for some one x ∈ X and some
petite set C
lim inf_{n→∞} n^{-1} Σ_{k=1}^n ∫_{C^c} P^k(x, dy) ∆V(y) < 0.      (19.48)
where all the terms in (19.49) are finite by induction and (19.46). By iteration, we then
get
n^{-1} ∫ P^{n+1}(x, dy) V(y) = n^{-1} Σ_{k=1}^n ∫ P^k(x, dy) ∆V(y) + n^{-1} [∆V(x) + V(x)],

so that as n → ∞

lim inf_{n→∞} n^{-1} Σ_{k=1}^n ∫ P^k(x, dy) ∆V(y) ≥ 0.      (19.50)
Now suppose by way of contradiction that Φ is null; then from Theorem 18.2.2 we have
that the petite set C is null, and so for every x we have by the bound in (19.46)
lim_{n→∞} ∫_C P^n(x, dy) ∆V(y) = 0.
This, together with (19.50), cannot be true when we have assumed (19.48); so the chain
is indeed positive.
There is a converse to this result. We first show that for positive chains and suitable
functions V , the drift ∆V , π-averaged over the whole space, is in fact zero.
We first show that |M_z(x)| is uniformly bounded for x ∈ X and z ∈ (1/2, 1) under the
bound (19.46).
By the Mean Value Theorem and non-negativity of V we have for any 0 < z < 1,

|z^{V(x)} − z^{V(y)}| ≤ |V(x) − V(y)| sup_{t≥0} |(d/dt) z^t| = |V(x) − V(y)| |log(z)|.      (19.52)
Hence under (19.46), for all x ∈ X and z ∈ (0, 1),

|M_z(x)| ≤ (|log(z)|/(1 − z)) ∫ P(x, dy) |V(x) − V(y)| ≤ d |log(z)|/(1 − z),      (19.53)

which establishes the claimed boundedness of |M_z(x)|.
Moreover, by (19.52) and dominated convergence,

lim_{z↑1} M_z(x) = ∫ P(x, dy) lim_{z↑1} [ (z^{V(x)} − z^{V(y)}) / (1 − z) ] = ∆V(x).      (19.54)
Since ∫ π(dx) z^{V(x)} < ∞ for fixed z ∈ (0, 1), we can interchange the order of integration
and find

∫ π(dx) M_z(x) = ∫ π(dx) ∫ P(x, dy) [z^{V(x)} − z^{V(y)}] / [1 − z] = 0.
Thus, under these conditions, (19.48) is necessary and sufficient for positivity.
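The identity ∫ π(dx) ∆V(x) = 0 at the heart of this argument is transparent on a finite space, where it reduces to the invariance πP = π. A quick illustration, with kernel and function of our own choosing:

    import numpy as np

    P = np.array([[0.1, 0.9, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.3, 0.3, 0.4]])
    V = np.array([0.0, 2.0, 7.0])

    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()

    # pi(PV) = pi(V) by invariance, so the pi-average of the drift vanishes.
    delta_V = P @ V - V
    print("pi(Delta V) =", pi @ delta_V)    # ~ 0 up to rounding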
Now provided the set C^c is in B^+(X), we show the right hand side of (19.60) is strictly
positive. To see this requires two steps.
First observe that ∫_C π(dx) P(x, C^c) > 0 since C, C^c ∈ B^+(X). Since V(y) >
sup_{w∈C} V(w) for y ∈ C^c we have

∫_C π(dx) ∫_{C^c} P(x, dw) V(w) > sup_{w∈C} V(w) ∫_C π(dx) P(x, C^c),      (19.61)
provided that V does not vanish on C. If V does vanish on C, then (19.64) holds
automatically.
But now, under (19.46) we have ∫ π(dx) ∆V(x) = 0 from (19.51), and so (19.56) is
a consequence of this and (19.64). Since ∆V(y) is bounded under (19.46), (19.57) is
actually identical to (19.56) and the theorem is proved.
These results show that for a wide class of functions, our criteria for positivity and
nullity, given respectively in Section 11.3 and Section 11.5.1, are essentially the two
extreme cases of this mixed drift result. We conclude with an example where similar
mixed behavior may be exhibited quite explicitly.
so that there is a greater degree of homogeneity than in the general model, then the
operator

Λ(x, A) := Σ_{j=0}^∞ Λ_j(x, A)

is stochastic.
Thus Λ(x, A) defines a Markov chain ΦΛ , which is the marginal position of Φ ig-
noring the actual rung: by direct calculation we can check that for any B
if β(w) > 1 + δ for all w then, exactly as in our analysis of the random walk on a half
line, we have that

E_{(i,w)}[τ_C] < ∞

for all i > M, w ∈ X, where C = ∪_{j=0}^M ({j} × X) is the “bottom end” of the ladder.
But one might not have such downwards drift uniform across the rungs. The result
we prove is thus an average drift criterion.
Theorem 19.3.4. Suppose that the chain Φ is ψ-irreducible and has the structure
(19.65). If the marginal chain Φ^Λ admits an invariant probability measure ν such that

∫ ν(dw) β(w) > 1,      (19.68)

then Φ is positive recurrent.
Proof   The proof is similar to that of Theorem 19.3.1, but we do not assume bound-
edness of the drifts so we must be a little more delicate. Choosing V(i, w) = i, we have
first that

∆V(i, w) = 1 − Σ_{j=0}^i j Λ_j(w, X) − (i + 1) Σ_{j=i+1}^∞ Λ_j(w, X);
Now even though (19.46) is not assumed, because |∆V (i, w)| ≤ d + 1 for i ≤ d and
because, starting at level i, after k steps the chain cannot be above level i + k, we see
exactly as in proving (19.50) that
lim inf_{n→∞} n^{-1} Σ_{k=1}^n Σ_j ∫ P^k(i, x; j × dy) ∆V(j, y) ≥ 0.      (19.70)
We now show that this average non-negative drift is not possible under (19.68), unless
the chain is positive.
From (19.68) we have

0 > lim_{k→∞} ∫ ν(dw) ∆V(k, w).      (19.71)
≤ n^{-1} Σ_{k=1}^n d Σ_{j=0}^{d−1} P^k(i, x; j × X)
   + n^{-1} Σ_{k=1}^n ∫_{D_ν} Σ_{j≥d} P^k(i, x; j × dy) ∆V(j, y)      (19.74)
Choose M so large that ν(C_M(j)) ≥ 1 − ε for a given ε > 0. Then we have

lim n^{-1} Σ_{k=1}^n P^k(i, x; j × X) = lim n^{-1} Σ_{k=1}^n P^k(i, x; j × C_M(j))
      + lim n^{-1} Σ_{k=1}^n P^k(i, x; j × [C_M(j)]^c)
   ≤ lim n^{-1} Σ_{k=1}^n P^k(i, x; C_M) + lim n^{-1} Σ_{k=1}^n Λ^k(x, [C_M(j)]^c)
   ≤ ε,      (19.75)
which shows the rung j × X to be null as claimed.
= lim n^{-1} Σ_{k=1}^n Σ_{j=d}^∞ P^k(i, x; j × B)

≤ lim inf_{n→∞} n^{-1} Σ_{k=1}^n ∫_{D_ν} Σ_{j≥d} P^k(i, x; j × dy) ∆V(j, y)
   = ∫_{D_ν} ν(dy) ∆V(j, y) < 0
Λ0 (x, x − 1) = pq, x ≥ 1,
Λ0 (x, x + 1) = (1 − p)q, x ≥ 0,
(19.78)
Λ2 (x, x − 1) = p(1 − q), x ≥ 1,
Λ2 (x, x + 1) = (1 − p)(1 − q), x ≥ 0,
Λ0 (0, 0) = pq,
(19.79)
Λ2 (0, 0) = p(1 − q).
The marginal chain ΦΛ is a random walk on the half line {0, 1, . . .} with an invariant
measure ν if and only if p > 1/2. On the other hand, β(x) > 1 if and only if q < 1/2.
Thus (19.68) holds if q < 1/2 < p.
This chain falls into the class that we have considered in Theorem 19.3.4; but other
behaviors follow if we vary the structure at the bottom rung.
Let us then specify the boundary conditions in a manner other than (19.65): put
Λ^*_1(x, x − 1) = p(1 − q) and Λ^*_1(x, x + 1) = (1 − p)(1 − q), but
here the second term follows since, on an excursion from [0], the expected drift to the
left at every step is no more than (1 − 2p) independent of level change, and the expected
number of steps to return to [0] from 1 × X is (1 − q)/(1 − 2q).
From (19.81) we therefore have that the chain Φ[0] is transient if r and q are small
enough, and p − 1/2 is not too large.
This example shows the critical need to identify petite sets and the return times to
them in classifying any chain: here we have an example where the set [0] is not petite,
although it has many of the properties of a petite set. Yet even though we have (19.77)
proven, we do not even have enough to guarantee the chain is recurrent.
to derive positivity.
We conclude by proving through this approach a result complementing the result
found in quite another way in Proposition 11.4.4.
Theorem 19.3.5. The GI/G/1 queue with mean inter-arrival time λ and mean service
time µ satisfies (19.68) if and only if λ > µ, and in this case the chain has an invariant
measure given by (10.52).
Proof   From the representations (3.42) and (3.43), we have the kernel

Λ(x, [0, y]) = ∫_0^∞ G(dt) P_t(x, [0, y]),

= λ/µ
As in (19.77), we at least know that since (19.68) holds, the left hand side of this
equation is finite, so that η < ∞. Moreover, from the Blackwell Renewal Theorem
so that, finally, (19.82) follows from (19.84), (19.85), and the fact that the mean of H
is finite.
19.4 Commentary*
Despite the success of the simple drift, or Foster–Lyapunov, approach there is a growing
need for more subtle variations such as those we present here.
There are several cases in the literature where the analysis of state-dependent (or at
least not simple one-step) drift appears unavoidable: see Tjøstheim [386] or Chen and
Tsay [66], where m-step skeletons {Φ_{mk}} are analyzed. Analysis of this kind is simplified
if the various parts of the space can be considered separately as in Section 19.1.2.
In the countable space context, Theorem 19.1.1 was first shown as Theorem 1.3 and
Theorem 19.1.2 as Theorem 1.4 of Malyšev and Men’šikov [243]. Their proofs, espe-
cially of Theorem 19.1.2, are more complex than those based on sample path arguments,
which were developed along with Theorem 19.1.3 in [283]. As noted there, the result
can be extended by choosing n(x) as a random variable, conditionally independent of
the process, on Z+ . In the special case where n(x) has a uniform distribution on [1, n]
independent of x, we get a time-averaged result used by Meyn and Down [273] in ana-
lyzing stability of queueing networks. If the variable has a point mass at n(x) we get
the results given here.
Models of random walk on the orthant in Section 19.1.2 have been analyzed in nu-
merous different ways on the integer quadrant Z2+ by, for example, [244, 257, 243, 340,
109]. Much of their work pertains to more general models which assume different drifts
on the boundary, thus leading to more complex conditions. In [244, 257, 243] it is
assumed that the increments are bounded (although they also analyze higher dimen-
sional models), whilst in [340, 109] it is shown that one can actually choose n = 1 if a
quadratic function is used for a test function, whilst weakening the bounded increments
assumption to a second moment condition: this method appears to go back to Kingman
[207].
As we have noted, positive recurrence in the simple case illustrated here could be
established more easily given the independence of the two components. However, the
bound using linear functions in (19.25) seems to be new, as does the continuous space
methodology we use here.
The antibody model here is based on that in [283]. The attack pattern of the “in-
vaders” is modeled to a large extent on the rabies model developed in Bartoszyński [19],
although the need to be the same order of magnitude as the antibody group is a weaker
assumption than that implicit in the continuous time continuous space model there.
The results in Section 19.2 are largely taken from Meyn and Tweedie [277]: they
appear to give a fruitful approach to more complex models, and the seeming simplicity
of the presentation here is largely a function of the development of the methods based on
Dynkin’s formula for the non-time-varying case. An application to adaptive control is
given in Meyn and Guo [274], where drift functions which depend on the whole history
of the chain are used systematically. Regrettably, examples using this approach are
typically too complex to present here.
The dependent parameter bilinear time series model is analyzed in [275], from which
we adopt the proof of Theorem 19.2.6. In Karlsen [195] a decoupling inequality of [210]
is used to obtain a second order stationary solution in the Gaussian parameter case, and
Brandt [44] provides a simple argument, similar to the proof of Proposition 17.3.4, to
obtain boundedness in probability for general bilinear time series models with stationary
coefficients.
Results on mixed drifts, such as those in Section 19.3.1, have been discovered inde-
pendently several times.
Although Neuts [292] analyzed a two-drift chain in detail, on a countable space the
first approach to classifying chains with different drifts appears to be due to Marlin [249].
He considered the special case of V (x) = x and assumed a fixed finite number of dif-
ferent drifts. The form given here was developed for countable spaces by Tweedie [396]
(although the proof there is incomplete) and Rosberg [336], who gives a slightly different
converse statement. A general state space form is in Tweedie [398].
The condition (19.53) for the converse result to hold, and which also suffices to ensure
that ∆V (w) ≥ 0 on C c implies non-positivity, is known as Kaplan’s condition [193]:
the general state space version sketched here is adapted from a countable space version
in [349]. Related results are in [380].
The average mean drift criterion for the ladder process in Section 19.3.2 is due to
Neuts [293] when the rungs are finite, and is proved there by matrix methods: the
general result is in [399], and (19.68) is also shown there to be necessary for positivity
under reasonable assumptions.
The final criterion for stability of the GI/G/1 queue produced by this analysis is of
course totally standard [9]: that the very indirect Markovian approach reproduces this
result exactly brings us to a remarkably reassuring conclusion.
Added in second printing: In the past year, Dai has shown in [80] that the state-
dependent drift criterion Theorem 19.1.2 leads to a new approach to the stability of
stochastic queueing network models via the analysis of a simpler deterministic fluid
model. Related work has been developed by Chen [65] and Stolyar [373], and these
results have been strengthened in Dai and Weiss [82] and Dai and Meyn [81].
Commentary for the second edition: Over the past ten years there have been
many further improvements in the theory surrounding the multi-step drift criterion
for stability within specific applications. Applications include stochastic approximation
[40, 39], Markov chain Monte Carlo (MCMC) [100], as well as stochastic networks [267],
which was the original motivation for the technique in [243].
Chapter 20

Epilogue to the second edition
Following publication of the “Big Red Book” in the early nineties, Richard and I de-
voted more attention to applications. Each of us became interested in simulation, albeit
in entirely different contexts. In addition, Richard spent more of his time on topics in
statistics, and I became increasingly involved in topics surrounding control and perfor-
mance evaluation for networks.
Personally, I thought that I would abandon Markov chains as a research topic. This,
fortunately, has turned out to be an impossible task!
The three sections that follow can be regarded as proposals for future monographs
that will never be written by either of us. The first section comprises our biggest thrust
shortly after the book was complete, along with my own view of geometric ergodicity
and spectral theory. The second section describes how methods in this book can be
applied to construct and analyze simulation algorithms. The final section explains how
theory in continuous time can be generated from discrete time counterparts.
20.1 Geometric ergodicity and spectral theory
Among the many applications of spectral theory is the identification of the rate of
convergence in the Geometric Ergodic Theorem. One elegant bound of Diaconis and
Stroock is described below in (20.6). Spectral theory and surrounding techniques are
also used to construct finite-rank approximations of a transition kernel. These take the
form
P̂ = Σ_{i=1}^n s_i ⊗ µ_i,      (20.1)

where for a function r and measure µ we define [r ⊗ µ](x, dy) := r(x) µ(dy). In most
cases we restrict to an L_∞^V setting. In this case it is assumed that P̂ is a bounded linear
operator on L_∞^V, which amounts to the inclusions {s_i} ⊂ L_∞^V and {µ_i} ⊂ M_1^V (i.e., V
is µ_i-integrable for each i).
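In finite dimensions a kernel of the form (20.1) can be assembled from the leading eigenpairs of P. The sketch below is an illustration under our own choice of a (diagonalizable) transition matrix; s_i is a right eigenvector, µ_i a scaled left eigenvector, and [r ⊗ µ] is simply an outer product.

    import numpy as np

    P = np.array([[0.6, 0.4, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.4, 0.6]])

    evals, U = np.linalg.eig(P)      # P = U diag(evals) V with V = U^{-1}
    V = np.linalg.inv(U)
    order = np.argsort(-np.abs(evals))

    n = 2                            # rank of the approximation
    P_hat = np.zeros_like(P)
    for i in order[:n]:
        s_i = U[:, i]                # the function s_i
        mu_i = evals[i] * V[i, :]    # the (signed) measure mu_i
        P_hat += np.real(np.outer(s_i, mu_i))

    print("rank-2 approximation error:", np.max(np.abs(P - P_hat)))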
T_z := (Iz − P)^{-1},   Z_z := (Iz − [P − 1 ⊗ π])^{-1},      (20.2)

provided the inverses exist as bounded linear operators on L_∞^V. This is true whenever
|z| > 1 and |||P^n|||_V is uniformly bounded in n, since we can express the inverses as the
power series

T_z = Σ_{n=0}^∞ z^{−n−1} P^n,   Z_z = Σ_{n=0}^∞ z^{−n−1} [P − 1 ⊗ π]^n.
Hence these kernels generalize the resolvents K_{a_ε} and U_h, defined in (3.26) and (12.13),
respectively.
The kernel T_z defined in (20.2) is used to define the spectrum of the kernel P:
(i) The spectrum S_V(P) is the set of all z ∈ C such that the operator
T_z defined in (20.2) does not exist as a bounded linear operator on
L_∞^V.
Spectral theory for Markov chains arises in a finer analysis of the Geometric Ergodic
Theorem and in the theory of large deviations. Just as in the theory of linear systems
and finite state space Markov chains, the dynamics of the chain can be understood
through an analysis of the spectrum of P . We survey these ideas next.
This observation can be strengthened. If the state space is finite, it is known that
the rate of convergence to equilibrium is determined by the second largest eigenvalue,
and this result can be generalized to obtain bounds on the rate of convergence in the
Geometric Ergodic Theorem.
We have the following general result, which follows from ideas in [282, 218]. The
constant λ_* ∈ (0, 1) appearing in (20.3) is called the spectral radius of P − 1 ⊗ π.
(i) If Φ is V-uniformly ergodic, then there is a spectral gap in L_∞^V, and there exists ε₀ < 1 such that the inverse operator Z_z exists as a bounded linear operator on L_∞^V for every z ∈ C satisfying |z| > ε₀.

(ii) Conversely, suppose that there exists ε₀ < 1 such that |||Z_z|||_V < ∞ for every z ∈ C satisfying |z| > ε₀. Then Φ is V-uniformly ergodic, and the rate of convergence is bounded as follows:
where λ∗ < 1 is the minimum value of ε₀, which coincides with the minimal bound on the spectrum within the open unit disk: λ∗ = max{|z| : z ≠ 1, z ∈ S_V(P)}.
Proof For (i) we begin with the proof that Z_z is defined for this range of z. Under the assumptions of (i) we have, for some R < ∞, r > 1,

|||Pⁿ − 1 ⊗ π|||_V ≤ R r^{−n} ,    n ≥ 0,

which when combined with the sequence of bounds given in (20.4) gives

|||Z_z|||_V ≤ R Σ_{n=0}^∞ |z|^{−n−1} r^{−n} = R |z|^{−1} (1 − |z|^{−1}r^{−1})^{−1} .
Applying the Markov property, it can be shown that this invariance holds if and only if the bivariate distributions are insensitive to time reversal in steady state:

(Φ_t, Φ_{t+1}) =_{dist} (Φ_{t+1}, Φ_t) .

The bivariate distributions can be identified, leading to the more standard definition: Φ is reversible if the detailed balance equations hold,

π(dx)P(x, dy) = π(dy)P(y, dx) .    (20.5)
For a reversible chain with finite state space each of the eigenvalues is real. Diaconis and Stroock in [90] obtain bounds on the second largest eigenvalue in this setting. A striking conclusion is the following explicit bound on the rate of convergence:

‖Pⁿ(x, ·) − π‖_V ≤ √((1 − π(x))/π(x)) λ∗ⁿ ,    (20.6)

where λ∗ is the magnitude of the second largest eigenvalue, λ∗ = max{|λ| : λ ≠ 1}, and V ≡ 1. Bounds on the rate of convergence for chains that are not necessarily reversible are obtained in [119], again in the finite state space case. The bounds are based on spectral theory, but the spectrum of the symmetrized kernel P P̃ is considered, where P̃ is the transition kernel for the time-reversed chain.
See Diaconis and Saloff-Coste [89], Rosenthal’s survey [344], and Baxendale [21] for
bibliographies and further generalizations and improvements since 1996.
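The bound (20.6) is also easy to test numerically. The sketch below is our illustration only (the reflected birth–death chain and all parameter values are assumptions made for the example, not taken from [90]): it computes λ∗, the stationary distribution, and both sides of (20.6) for a small reversible chain.

    # Check the bound (20.6) for a small reversible (birth-death) chain.
    import numpy as np

    p, N = 0.3, 10
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        P[i, min(i + 1, N)] += p              # move up (reflected at N)
        P[i, max(i - 1, 0)] += 1 - p          # move down (reflected at 0)

    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmax(np.real(w))]); pi /= pi.sum()
    lam_star = np.sort(np.abs(w))[-2]         # second largest eigenvalue modulus

    x, n = 0, 25
    tv = np.abs(np.linalg.matrix_power(P, n)[x] - pi).sum()   # ||P^n(x,.) - pi||
    bound = np.sqrt((1 - pi[x]) / pi[x]) * lam_star ** n
    print(tv <= bound, tv, bound)             # the bound (20.6) holds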
Spectral theory is often cast in a Hilbert space setting in the space L²(π), defined as the set of all measurable functions f on X satisfying π(f²) < ∞. For arbitrary p ≥ 1, the L^p(π) norm of a function f : X → R is defined by ‖f‖_p = (π(|f|^p))^{1/p}. It is natural to extend the definitions of spectrum and spectral gap to the L^p(π) norm. In particular, the chain is called geometrically ergodic in L^p(π) if there exist r > 1 and R < ∞ such that for all n ∈ Z₊, f ∈ L^p(π),

‖Pⁿf − π(f)‖_p^p := ∫ |Pⁿf(x) − π(f)|^p π(dx) ≤ R r^{−n} ‖f‖_p^p .    (20.7)
(i) There is a reversible Markov chain that is V -uniformly ergodic but not geometri-
cally ergodic in L1 (π).
(ii) There is a Markov chain that is V -uniformly ergodic but not geometrically ergodic
in L2 (π).
(iii) If the chain is reversible, then it is geometrically ergodic in L2 (π) if and only if it
is V -uniformly ergodic for some V : X → [1, ∞], finite a.e. [π].
Proof The M/M/1 queue provides an example in (i). Proposition 20.1.3 below demonstrates that this model is V-uniformly ergodic and reversible provided a “load condition” holds. It is shown that there is a constant λ₀ ∈ (0, 1) such that every λ ∈ [λ₀, 1) is an eigenvalue, with corresponding eigenfunction h ∈ L_∞^V, and hence also h ∈ L¹(π). The eigenfunction property gives π(h) = 0 and

‖Pⁿh − π(h)‖₁ = λⁿ ‖h‖₁

for each n. This rules out (20.7) for a fixed r > 1.
To prove (ii), note first that if the chain is geometrically ergodic in L²(π), then (20.7) combined with the definition (17.4) gives the following bound on the asymptotic variance:

γ_f² ≤ R (r + 1)/(r − 1) ‖f‖₂² < ∞ ,    f ∈ L²(π).
In particular, the asymptotic variance is finite whenever the ordinary variance is finite.
Häggström in [150] gives an example of a V -uniformly ergodic Markov chain and a
function f such that f ∈ L2 (π), yet the asymptotic variance is not finite.
Part (iii) is established by Roberts and Rosenthal in [329].
where p denotes the probability that W_k is equal to one. In the special case n = 0 we set h(n − 1) = h(−1) = h(0) to make this formula consistent with the dynamics (1.7). We have seen in Section 16.1.3 that this chain is V-uniformly ergodic provided ρ := p/(1 − p) < 1. We can take V(n) = r₀ⁿ, n ≥ 0, for any r₀ ∈ (1, ρ⁻¹). It follows from Theorem 20.1.1 that there is a spectral gap. The unique invariant measure is geometric, π(n) = (1 − ρ)ρⁿ, n ≥ 0. The detailed balance equations (20.5) are easily verified, so that we can conclude that the M/M/1 queue is reversible.
We next consider the spectrum in L_∞^V with this V, where r₀ ∈ (ρ^{−1/2}, ρ⁻¹) is fixed. We find in Proposition 20.1.3 that the spectrum is not discrete even when the model admits a spectral gap.
The structure of eigenfunctions can be identified through the form of the transition law (20.8). This expression suggests the application of transform techniques: define for any complex z,

H(z) = Σ_{n=0}^∞ z^{−n} h(n) .

This is defined for |z| > r₀ whenever h ∈ L_∞^V. If h is an eigenfunction, then on taking transforms of each side of the eigenfunction equation P h = λh, and applying (20.8), we find that H can be expressed as the ratio of quadratic functions. If the roots of the denominator are distinct, then H can be expressed as the sum of two simpler rational functions

H(z) = c₁ z/(z − β₁) + c₂ z/(z − β₂)

for constants c₁, c₂, where β₁, β₂ are the poles of H.
Proposition 20.1.3. Suppose that ρ < 1, fix r₀ ∈ (ρ^{−1/2}, ρ⁻¹), and define V(n) = r₀ⁿ for n ≥ 0. Then

(i) The queue is V-uniformly ergodic.
(ii) For each β ∈ (ρ^{−1/2}, r₀] the value λ = pβ + (1 − p)β⁻¹ lies in (0, 1). This is also an eigenvalue for P in L_∞^V with eigenfunction given by the difference of scaled geometric series (20.9).

(iii) As β ↓ ρ^{−1/2} the eigenvalues converge, with λ ↓ 2√(p(1 − p)). This limiting value is
Proof We have already established (i). To see (ii) and (iii), first observe that the eigenfunction equation P h(n) = λh(n) holds for n ≥ 1 for the function h(n) = βⁿ, regardless of the value of β, with λ = pβ + (1 − p)β⁻¹. However, except for the special case β = 1, no single geometric series defines an eigenfunction.

To cope with the special case n = 0 we consider a pair of geometric series to cancel an “error term.” Consider h(n) = βⁿ − cβ₋ⁿ, where β and β₋ are given in the proposition. From the foregoing we do have P h(n) = λh(n) for n ≥ 1. We choose c to ensure that this also holds with n = 0, which gives the unique value c = (1 − β⁻¹)/(1 − β₋⁻¹). The resulting function is a scalar multiple of (20.9).
Finally, we note that a unique solution β₋ ∈ (1, ρ^{−1/2}) exists since the function
In conclusion, the M/M/1 queue admits a spectral gap, but the spectrum is not discrete. Moreover, the set of eigenvalues depends on the choice of V, with sup{λ : λ < 1} = pr₀ + (1 − p)r₀⁻¹ approaching unity as r₀ increases to the upper bound ρ⁻¹.
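The eigenvalue and eigenfunction formulas above are easily verified numerically. In the sketch below the values p = 0.3 and β = 1.6 are illustrative assumptions (chosen so that β ∈ (ρ^{−1/2}, ρ⁻¹)), and c is determined by the boundary equation at n = 0 as in the proof of Proposition 20.1.3.

    # Verify P h = lambda h for the M/M/1 eigenfunction h(n) = beta^n - c beta_-^n.
    p, beta = 0.3, 1.6
    lam = p * beta + (1 - p) / beta           # candidate eigenvalue (< 1 here)
    beta_m = (1 - p) / (p * beta)             # second root of p z^2 - lam z + (1-p) = 0
    c = (1 - 1 / beta) / (1 - 1 / beta_m)     # makes the n = 0 equation hold

    def h(n):
        return beta ** n - c * beta_m ** n

    def Ph(n):
        # transition law (20.8): up with prob. p, down with prob. 1 - p; h(-1) = h(0)
        return p * h(n + 1) + (1 - p) * h(max(n - 1, 0))

    print(all(abs(Ph(n) - lam * h(n)) < 1e-9 for n in range(8)))   # True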
We next demonstrate that a suitably strong drift condition implies a discrete spectrum in L²(π) and L_∞^V simultaneously.
From the definition of the nonlinear generator, the bound (20.11) can be expressed as

P e^V ≤ exp(−δf + b I_C) e^V .

If V is bounded on the set C, then this implies a version of (V4):

Δe^V(x) ≤ −β e^{V(x)} + b̄ I_C(x) ,    x ∈ X,

where b̄ = b + sup_{x∈C} e^{V(x)} and 1 − β = sup_{x∈X} e^{−δf(x)}. We have β ≥ 1 − e^{−δ} > 0 under the assumption that f ≥ 1.
Consider for example the LSS(F,G) model under the assumptions of Proposition 7.5.3. Assume in addition that the distribution of the disturbance has a “Gaussian tail”: E[exp(ε‖W_k‖²)] < ∞ for some ε > 0. One solution to (20.11) is obtained using the quadratic V(x) = 1 + ε₀|x|²_M, in which ε₀ > 0 is chosen sufficiently small and the norm |y|²_M := yᵀM y, y ∈ Rⁿ, is defined in (12.31). In this case the function f can be chosen with linear growth, f(x) = 1 + |x|_M.
For the purposes of spectral analysis, (DV3) is used to justify truncation of the transition kernel. Define for n ≥ 1,

P̂_n := I_{C_f(n)} P ,

where C_f(n) := {x : f(x) ≤ n} denotes a sublevel set of f.
Proposition 20.1.4. Suppose that (20.11) holds with f unbounded. Assume moreover that C ⊂ C_f(n) for all n ≥ 1 sufficiently large. Then

|||P − P̂_n|||_v → 0 as n → ∞,

where v := e^V.

Proof Under the assumptions of the proposition we have P v ≤ e^{−δf} v on C_f(n)^c for n sufficiently large. From the definition of the sublevel set this gives
Proposition 20.1.5. Suppose that X is countable and that (20.11) holds for a coercive function f and a finite set C. Then P has a discrete spectrum in L_∞^v.

The proof of Proposition 20.1.5 follows from Theorem 3.5 of [219]. The idea is that P̂_n can be expressed as a finite-rank operator, of the form (20.1), and hence its spectrum is finite.

Proposition 20.1.4 implies that P can be approximated by P̂_n in norm. From the proof we obtain the explicit bound |||P − P̂_n|||_v ≤ e^{−δn}. It is shown in [219] that the spectrum of P is discrete if it can be approximated by finite-rank kernels in this fashion.
Multiplicative ergodic theory is the study of the asymptotics of this quantity for large n. Under suitable conditions on the chain and the function F, the multiplicative ergodic theorem holds,

lim_{n→∞} (1/n) log λ_{n,x}(F) = Λ(F),

where the limiting log-moment generating function Λ(F) is independent of the initial condition x.
To place this problem within the context of spectral theory, introduce the positive kernel P_f(x, dy) = f(x)P(x, dy), with f = e^F. The iterates P_f^n are defined in the usual way and have the representation

P_f^n(x, A) = E_x[ exp(S_n(F)) I{Φ_n ∈ A} ] .    (20.13)

Setting A = X gives λ_{n,x}(F) = P_f^n(x, X). This is known as the Feynman–Kac semi-group, though this terminology is usually reserved for processes in continuous time. Recall that the kernel U_h was defined based on a power series with respect to this semi-group with f = 1 − h ≥ 0 (see (12.14)).
When the limit defining Λ(F ) exists, we typically have Λ(F ) = log(λ), where λ is
the largest eigenvalue of Pf (known as the Perron–Frobenius eigenvalue) [303, 297, 298].
Foundations of Perron–Frobenius theory go back to Tweedie’s earliest work on positive
operators [394, 395], following the work of Vere-Jones and Seneta for positive matrices
and Markov chains on a countable or finite state space [405, 348].
The spectrum of P_f is defined precisely as for a probabilistic kernel. Criteria for a spectral gap are developed in [17, 218, 219], along with multiplicative ergodic theorems. These take the form

lim_{n→∞} λ^{−n} λ_{n,x}(F) = lim_{n→∞} E_x[ exp(S_n(F) − nΛ(F)) ] = f̌(x) ,    x ∈ X,    (20.14)

H(F̌) = F̌ − F + Λ(F) ,

In several different settings, it is shown in [218, 17, 219] that the chain with this transition kernel is geometrically ergodic, and this implies that the convergence in (20.14) holds at a geometric rate.
The convergence of the log-moment generating functions is used in [218, 17, 219] to
prove large deviations estimates for the partial sums Sn – see also [84, 265, 141] and
their references. The simplest estimates take the following form: for c > π(F ),
The limit (20.15) is established for geometrically ergodic models in [218] provided F is
bounded and c > π(F ) is close enough to the mean π(F ). An elegant bound on the
error probability Pπ {Sn ≥ nc} is obtained in [140] for uniformly ergodic chains, similar
to the coupling bound in Theorem 16.2.4.
Another approach to obtaining bounds on the rate of convergence as well as large
deviations asymptotics for Markov chains is based on Sobolev inequalities and their gen-
eralizations [89, 407]. The relationship between these conditions and (DV3) is explored
in [58].
20.1.6 Quasi-stationarity
Metastability refers to the presence of near stationary behavior of a process during a
time period in which it remains in some restricted region of the state space. Quasi-
stationarity has the following precise definition: a set M is quasi-stationary if there
exists a probability measure πM satisfying
The existence of a limit can be established exactly as in the proof of (20.14), provided
a twisted kernel is ergodic [118].
The analysis carries over to processes in continuous time. It is shown in [166] that
the twisted semi-group is exponentially ergodic for a diffusion process when the set M is
taken to be a connected component of {x : h(x) = 0}, with h an eigenfunction. Further
analysis reveals that the exit time from M is approximately exponentially distributed
with mean |Λ|−1 , where Λ < 0 is the corresponding eigenvalue for the Markovian
generator. Generalizations to the case in which Λ is complex are contained in [276].
20.2 Simulation and MCMC
In some applications the probability measure π is given, and the question is how to
construct a Markov chain with invariant distribution π. Answers to this question are
contained in the Markov chain Monte Carlo (MCMC) literature.
The question then is how to construct a process with zero mean, and one for which the
asymptotic variance is reduced.
Henderson and Glynn introduced a collection of techniques for this purpose in [158, 160] – see also the recent monograph [10]. Suppose that the chain is V-uniformly ergodic, fix a function g ∈ L_∞^V, and define

π̂_n^{CV}(f) := n⁻¹ Σ_{t=1}^n [f(Φ_t) − W_t] ,    (20.19)

where

W_t := g(Φ_t) − P g(Φ_t) .    (20.20)

Although the mean of W_t may not be zero for arbitrary initial conditions, its steady state mean is always zero:

E_π[W_t] = −∫ Δg(x) π(dx) = π(g − P g) = 0.

See Theorem 17.7.2 for relaxed assumptions on g under which this limit holds.
With proper choice of g the variance of the resulting estimator is reduced. In fact, under mild conditions the control variate can be constructed so that the asymptotic variance of the resulting estimator is zero. Suppose that f² ∈ L_∞^V, and set g = f̂ equal to a solution to Poisson's equation, P f̂ = f̂ − f + π(f). In this case we have W_t = f(Φ_t) − π(f), so that π̂_n^{CV}(f) = π(f) for each n ≥ 1.
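The zero-variance phenomenon can be reproduced on a finite state space, where Poisson's equation is solvable by linear algebra. The sketch below is our illustration only (the three-state kernel and the function f are assumptions for the example, not taken from [158, 160]): with g an exact Poisson solution, every term f(Φ_t) − W_t equals π(f).

    # Control variate with an exact Poisson solution: zero-variance estimator.
    import numpy as np

    rng = np.random.default_rng(1)
    P = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.5, 0.3],
                  [0.0, 0.4, 0.6]])
    f = np.array([1.0, 3.0, 10.0])

    w, V = np.linalg.eig(P.T)                 # stationary distribution pi
    pi = np.real(V[:, np.argmax(np.real(w))]); pi /= pi.sum()
    pif = pi @ f
    g = np.linalg.lstsq(np.eye(3) - P, f - pif, rcond=None)[0]  # P g = g - f + pi(f)

    n, x, std, cv = 10_000, 0, [], []
    for _ in range(n):
        std.append(f[x])
        cv.append(f[x] - (g[x] - P[x] @ g))   # subtract the shadow function g - Pg
        x = rng.choice(3, p=P[x])

    print(pif, np.mean(std), np.mean(cv))     # cv has zero variance here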
[Plot: running-average estimates over 10⁵ steps; legend: “Without variance reduction” (dashed) and “Variance reduction using shadow function” (solid).]
Figure 20.1: Results for a simulation run of length 100,000 steps in a network model.
The dashed line represents the running average cost using the standard estimator. The
solid line represents the running average cost for the estimator (20.19).
Of course, if we can solve Poisson’s equation then we have computed the steady state
mean, so there is no reason to simulate. Henderson and Glynn propose approximate
solutions to reduce the variance in the standard estimator. See the survey by Glynn
and Szechtman [138], and results specialized to network models in [159, 265, 267]. The
function g − P g appearing in (20.20) is called a shadow function in [159, 267] since it
is meant to eclipse the function f to be simulated.
Figure 20.1 shows a comparison of the standard estimator and the estimator (20.19)
for a network model. Details on the model and the construction of the control variate
can be found in [159] and [267, Chapter 11]. The main idea is to use the control variate
(20.20) in which g is a fluid value function that approximates the solution to Poisson’s
equation. The introduction of the zero-mean term Wt results in a 100-fold reduction in
variance over the standard estimator in this example.
20.3 Continuous time models

where the stochastic differential equation is defined in the usual L² sense, introduced by Itô several years after Doeblin's early death. More on this history can be found in [153], and scientific details are contained in [411].
In this section we provide highlights of the general theory of ψ-irreducible Markov
models in continuous time, without examples and without discussion of specific model
classes such as the Doeblin–Itô stochastic differential equation. In particular, we leave out the fruits of Richard's collaboration with Gareth Roberts on Langevin algorithms [334], which led to fundamental results on the stability of Langevin diffusions and their discretizations [375, 376].
The focus here is on methods for translating theory from discrete to continuous time.
This is made possible through the resolvent kernels, and associated resolvent equations,
as described in one of our contributions to the 1991 Blaubeuren meeting [278].
Random sampling provides a representation for this kernel. Let {T_k} denote a Poisson process with rate α, independent of the process Φ. That is, T₀ = 0, and T_{k+1} = T_k + α⁻¹A_{k+1} for k ≥ 0, where A is an i.i.d. sequence, independent of Φ, with standard exponential distribution. The sampled chain is defined exactly as in discrete time by

Φ^α_k := Φ_{T_k} ,    k ≥ 0.

The transition kernel for the Markov chain Φ^α coincides with the normalized resolvent αU_α.
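For a finite state space this identity can be checked directly, since P^t = e^{tA} for a generator matrix A and U_α = [αI − A]⁻¹. The sketch below (the generator and the rate α are illustrative assumptions) approximates the resolvent integral by quadrature and compares it with the matrix formula.

    # Check that the Poisson-sampled kernel equals alpha * U_alpha.
    import numpy as np

    A = np.array([[-1.0, 1.0, 0.0],
                  [0.5, -1.5, 1.0],
                  [0.0, 2.0, -2.0]])          # generator: rows sum to zero
    alpha = 0.7

    w, V = np.linalg.eig(A)                   # A is diagonalizable here
    Vinv = np.linalg.inv(V)
    Pt = lambda t: np.real(V @ np.diag(np.exp(w * t)) @ Vinv)   # P^t = exp(tA)

    dt, T = 0.01, 60.0
    ts = np.arange(dt / 2, T, dt)             # midpoint rule for the integral
    U = sum(np.exp(-alpha * t) * Pt(t) for t in ts) * dt        # resolvent U_alpha
    exact = np.linalg.inv(alpha * np.eye(3) - A)
    print(np.max(np.abs(alpha * U - alpha * exact)))            # close to zero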
The definitions of irreducibility and the petiteness property of sets are then based on properties of the sampled chains:

‖P^t(x, ·) − π‖ is non-increasing in t, which shows that Φ is ergodic:

lim_{t→∞} ‖P^t(x, ·) − π‖ = 0 .    (20.26)
The existence of the limit is guaranteed under various conditions on the model and the
function V .
It is more convenient to work with a relaxed definition. Recall that in the development of limit theory in Section 17.4 we relied upon the construction of martingales, such as {M_n(g)} defined in (17.43). For a given function h : X → R define g = Δh, where Δ = P − I is the drift operator, and consider the stochastic process

M_n := h(Φ_n) − h(Φ₀) − Σ_{k=0}^{n−1} g(Φ_k) ,    n ≥ 1.
This means that there exists a sequence of stopping times {τ_n} such that the stopped process {M_{t∧τ_n} : t ≥ 0} is a martingale for each n ≥ 1, and τ_n ↑ ∞ a.s. as n → ∞. We let A denote the extended generator, and write Ah = g when the associated process M is a local martingale.
Two resolvent equations are given in the following. The second equation (20.30) implies that the domain of the extended generator includes the range of the resolvent.

Theorem 20.3.1 (Resolvent equations in continuous time). If {P^t} is a Markovian semi-group, then the following hold:

(i) For any pair of positive constants β and α,

U_α = U_β + (β − α)U_β U_α = U_β + (β − α)U_α U_β .    (20.29)

(ii) For each α > 0 and bounded measurable function g : X → R, the function U_α g is in the domain of the extended generator, with

A U_α g = αU_α g − g .    (20.30)
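In the finite-state case both identities can be verified numerically using U_α = [αI − A]⁻¹. A minimal sketch, with an assumed 3 × 3 generator and the illustrative pair α = 0.7, β = 1.9:

    # Numerical check of the resolvent equations (20.29) and (20.30).
    import numpy as np

    A = np.array([[-1.0, 1.0, 0.0],
                  [0.5, -1.5, 1.0],
                  [0.0, 2.0, -2.0]])
    I = np.eye(3)
    Ua = np.linalg.inv(0.7 * I - A)           # U_alpha
    Ub = np.linalg.inv(1.9 * I - A)           # U_beta

    print(np.allclose(Ua, Ub + (1.9 - 0.7) * Ub @ Ua))    # (20.29), first form
    print(np.allclose(Ua, Ub + (1.9 - 0.7) * Ua @ Ub))    # (20.29), second form
    g = np.array([1.0, -2.0, 0.5])
    print(np.allclose(A @ (Ua @ g), 0.7 * (Ua @ g) - g))  # (20.30)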
A set C is called petite if it is petite for some sampled chain: for a probability distribution a on R₊, some nontrivial measure ν_a, every x ∈ C, and all B ∈ B(X),

K_a(x, B) := ∫₀^∞ P^t(x, B) a(dt) ≥ ν_a(B) .
(i) The chain Φ is positive recurrent with invariant probability measure π, and there exist some ν-petite set C ∈ B⁺(X), ρ_C < 1, M_C < ∞, and P^∞(C) > 0 such that for all x ∈ C

|P^t(x, C) − P^∞(C)| ≤ M_C ρ_C^t .    (20.32)

(ii) There exists a closed petite set C ∈ B(X) and κ > 1 such that

(iii) There exists a closed petite set C, constants b < ∞, β > 0, and a function V ≥ 1 finite at some one x₀ ∈ X satisfying (20.31).
Any of these three conditions implies that the set S_V = {x : V(x) < ∞} is absorbing and full, where V is any solution to (20.31) satisfying the conditions of (iii), and there then exist constants r > 1, R < ∞ such that for any x ∈ S_V,
U_α = [αI − A]⁻¹ .

U_h = [I_h − A]⁻¹ ,
APPENDICES
Despite our best efforts, we understand that the scope of this book inevitably leads
to the potential for confusion in readers new to the subject, especially in view of the
variety of approaches to stability which we have given, the many related and perhaps
(until frequently used) forgettable versions of the “Foster–Lyapunov” drift criteria, and
the sometimes widely separated conditions on the various models which are introduced
throughout the book.
At the risk of repetition, we therefore gather together in this Appendix several
discussions which we hope will assist in giving both the big picture, and a detailed
illustration of how the structural results developed in this book may be applied in
different contexts.
We first give a succinct series of equivalences between and implications of the various
classifications we have defined, as a quick “mud map” to where we have been. In
particular, this should help to differentiate between those stability conditions which are
“almost” the same.
Secondly, we list together the drift conditions, in slightly abbreviated form, together
with references to their introduction and the key theorems which prove that they are
indeed criteria for different forms of stability and instability. As a guide to their usage
we then review the analysis of one specific model (the scalar threshold autoregression,
or SETAR model).
This model incorporates a number of sub-models (specifically, random walks and
scalar linear models) which we have already analyzed individually: thus, although not
the most complex model available, the SETAR model serves to illustrate many of the
technical steps needed to convert elegant theory into practical use in a number of fields
of application. The scalar SETAR model also has the distinct advantage that under the
finite second moment conditions we impose, it can be analyzed fully, with a complete cat-
egorization of its parameter space to place each model into an appropriate stability class.
Thirdly, we give a glossary of the assumptions employed in each of the various models
we have analyzed. This list is not completely self-contained: to do this would extend
repetition beyond reasonable bounds. However, our experience is that, when looking at
a multiply analyzed model, one can run out of hands with which to hold pages open,
so we trust that this recapitulation will serve our readers well.
We conclude with a short collection of mathematical results which underpin and are
used in proving results throughout the book: these are intended to render the book
self-contained, but make no pretence at giving any more comprehensive overview of the
areas of measure theory, analysis, topology and even number theory which contribute
to the overall development of the theory of general Markov chains.
Appendix A
Mud maps
The wide variety of approaches to and definitions of stability can be confusing. Unfor-
tunately, if one insists on non-countable spaces there is little that can be done about
the occasions when two definitions are “almost the same” except to try and delineate
the differences.
Here then is an overview of the structure of Markov chains we have developed, at
least for the class of chains on which we have concentrated, namely
We have classified chains in I using three different but (almost) equivalent properties:
P n -properties: that is, direct properties of the transition laws P n ;
τ -properties: properties couched in terms of the hitting times τA for appropriate
sets A;
drift properties: properties using one-step increments of the form of ∆V for some
function V .
A.1 Recurrence versus transience

I = T + R

where T denotes the class of transient chains and R denotes the class of recurrent chains. This is defined as a dichotomy through a P^n-property in Theorem 8.0.1:
Φ ∈ R ⟺ Σ_n Pⁿ(x, A) = ∞ ,    x ∈ X, A ∈ B⁺(X),

Φ ∈ T ⟺ Σ_n Pⁿ(x, A_j) ≤ M_j < ∞ ,    x ∈ X, X = ∪ A_j .
If Φ ∈ R, then (Theorem 9.0.1) there is a full absorbing set (a maximal Harris set) H
such that
X=H ∪N
Q(x, A) = Px (Φ ∈ A i.o.).
Φ ∈ R ⇐⇒ Q(x, A) = 1, x ∈ H, A ∈ B + (X),
Φ ∈ H ⟸ ΔV(x) ≤ 0, x ∈ C^c, with C petite and V unbounded off petite sets;

Φ ∈ T ⟺ ΔV(x) ≥ 0, x ∈ C^c, with C petite and V bounded and increasing off C.
There is thus only one gap in these classifications, namely the actual equivalence of
the drift condition for recurrence. We have shown (Theorem 9.4.2) that such equivalence
holds for Feller (including countable space) chains.
Finally, it is valuable in practice in a topological context to recall that for T-chains,
which (Proposition 6.2.8) include all Feller chains in I such that supp ψ has non-empty
interior,
Φ ∈ H ⇐⇒ Φ is non-evanescent;
that is, Harris chains in this case do not leave compact sets forever.
A.2 Positivity versus nullity

I = P + N

where N denotes the set of null chains and P ⊆ R denotes the set of positive chains. Since every transient chain is a fortiori null, this is in any real sense a breakup of R rather than the complete set I, and is defined in Chapter 10 through a P^n-property:
Φ ∈ P ⟺ π(A) = ∫ π(dy)P(y, A) ,    A ∈ B(X),

X = S ∪ N

Φ ∈ N ⟺ ∫_C π(dx)E_x[τ_C] = ∞ ,    C ∈ B⁺(X).
Again, if Φ ∈ S, then the first of these holds with S = X. We might expect that infinite expected hitting times should clearly imply that the chain is not positive, but the converse appears to be so far unknown except when C is an atom.

The drift classification is:
There is again one open question in these classifications, namely that of the equiv-
alence or otherwise of the drift condition for nullity. We do not know how close this is
to complete.
In a topological context we know again (see Chapter 18) that for T-chains, there
is a further stability property completely equivalent to positivity: if Φ is an aperiodic
T-chain in R then
Both the P n and τ properties are essentially properties involving the whole trajectory
of the chain. The drift conditions, and in particular their sufficiency for classification,
are powerful practical tools of analysis because they involve only the one-step movement
of the chain: this is summarized further in Section B.1.
A.3 Convergence properties

However, these are weak categorizations of the types of convergence which hold for chains in these classes. The central class is

H ∩ P = E

where

Φ ∈ E ⟺ lim_{n→∞} ‖Pⁿ(x, ·) − π‖ = 0 ,    x ∈ X.
The properties of E are delineated further in Part III, and in particular in our next
appendix we summarize criteria (drift conditions) for classifying sub-classes of E.
Appendix B

Testing for stability

B.1 Glossary of drift conditions
Typically, for well-behaved chains we are able without great difficulty to give conditions
showing a set C to be a “test set”; these sets are usually petite, or for T-chains, compact.
The choice of V , on the other hand, is an art form and depends strongly on intuition
regarding the movement of the chain.
ΔV(x) ≤ 0 ,    x ∈ C^c .    (8.42)
Several theorems show this to be an appropriate condition for various forms of recur-
rence, including Theorem 8.4.3, Theorem 9.4.1, and Theorem 12.3.3.
Various theorems which show this to be an appropriate condition for various forms of f -
regularity, existence of f -moments of π and f -ergodicity and even sample path results
such as the Central Limit Theorem and the Law of the Iterated Logarithm include
Theorem 14.2.3, Theorem 14.2.6, Theorem 14.3.7 and Theorem 17.5.3.
See also the drift criterion of Jarner and Roberts described on page 360.
P V ≤ λV + L (15.35)
holds for some λ < 1, L < ∞, and this is a frequently used alternative form.
Results in Section 20.1 show that (V4) characterizes the existence of a spectral
gap for the transition kernel. The stronger drift criterion of Donsker and Varadhan
introduced on page 517 is closely related to a discrete spectrum for P .
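As a concrete instance of (15.35), consider the scalar linear model X_{n+1} = aX_n + W_{n+1} with Gaussian noise and V(x) = 1 + x², for which PV = a²V + (1 − a² + σ²), so that the bound holds with λ = a². The sketch below (parameter values are our own, for illustration) confirms this by Monte Carlo.

    # Monte Carlo check of P V <= lambda V + L for the AR(1) model, V(x) = 1 + x^2.
    import numpy as np

    a, sigma = 0.8, 1.0
    V = lambda x: 1 + x ** 2
    lam, L = a ** 2, 1 - a ** 2 + sigma ** 2  # PV = lam * V + L exactly here

    rng = np.random.default_rng(2)
    for x in [-5.0, 0.0, 2.5]:
        PV = np.mean(V(a * x + sigma * rng.standard_normal(200_000)))
        print(x, PV, lam * V(x) + L)          # agree up to Monte Carlo error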
ΔV(x) ≥ 0 ,    x ∈ C^c ,    (8.41)
Exactly the same drift criterion can also be shown to give an appropriate condition
for various forms of transience, as in Theorem 8.4.2: these require, typically, that V be
bounded, and C be a sublevel set of V with both C and C c in B + (X).
These criteria form the basis for classification of the chains we have considered into the various stability classes, and despite their simplicity they appear to work well across a great range of cases. It is our experience that in the use of the two commonest criteria (V2) and (V4) for models on R^k, quadratic forms are the most useful choice, although the choice of a suitable form is not always trivial.
Finally, we mention that in some cases where identifying the test function is difficult
we may need greater subtlety: the generalizations in Chapter 19 then provide a number
of other methods of approach.
B.2 The scalar SETAR model: a complete classification

[Figure B.1 here: axes θ(1) and θ(M), with boundary curve θ(1)θ(M) = 1.]
Figure B.1: The SETAR model: stability classification of (θ(1), θ(M ))-space. The
model is regular in the shaded “interior” area (11.36), and transient in the unshaded
“exterior” (9.46), (9.47) and (9.50). The boundaries are in the figures below.
In Figure B.1, Figure B.2 and Figure B.3 we depict the parameter space in terms
of φ(1), θ(1), φ(M ), and θ(M ). The results we have proved show that in the “interior”
and “boundary” areas, the SETAR model is Harris recurrent; and it is transient in the
“exterior” of the parameter space. In accordance with intuition, the model is null on
the boundaries themselves, and regular (and indeed, in this case, geometrically ergodic)
in the strict interior of the parameter space.
[Figure B.2 here: three panels in (φ(1), φ(M))-space.]
Figure B.2: The SETAR model: stability classification of (φ(1), φ(M ))-space in the
regions (θ(M ) = 1; θ(1) ≤ 1) and (θ(M ) ≤ 1; θ(1) = 1). The model is regular in
the shaded “interior” areas, which are (clockwise starting with the plot on the far left)
(11.38), (11.37) and (11.39); transient in the unshaded “exterior” (9.49), (9.48); and null
recurrent on the “margins” described clockwise by (11.45), (11.46) and (11.47)–(11.48).
The steps taken to carry out this classification form a template for analyzing many
models, which is our reason for reproducing them in summary form here.
(STEP 1) As a first step, we show in Theorem 6.3.6 that the SETAR model is a ϕ-irreducible T-process with ϕ taken as Lebesgue measure µ_Leb on R. Thus compact sets are test sets in all of the criteria above.
(STEP 2) In the “interior” of the parameter space we are able to identify geometric ergodicity in Proposition 11.4.5, by using (V4) with linear test functions of the form

V(x) = a x for x > 0,    V(x) = b |x| for x ≤ 0,

and suitable choice of the coefficients a, b, related to the parameters of the model. Note that we only indicated that V satisfied (V2), but the stronger form is actually proved in that result.
(STEP 3) We establish transience on the “exterior” of the parameter space as in Proposition 9.5.4 using the bounded function

V(x) = 1 − 1/(a(x + u))  for x > c/a − u,
V(x) = 1 − 1/c  for −c/b − v < x < c/a − u,
V(x) = 1 + 1/(b(x + v))  for x < −c/b − v,
and V(x) = 0 in the region [−R, R], where a, b, R, u and v are constants chosen suitably for different regions of the parameter space.

[Figure B.3 here: (φ(1), φ(M))-space with boundary curve φ(1)φ(M) + φ(M) = 0.]

Figure B.3: The SETAR model: stability classification of (φ(1), φ(M))-space in the region (θ(M)θ(1) = 1; θ(1) ≤ 0). The model is regular in the shaded “interior” area (11.40); transient in the unshaded “exterior” (9.51); and null recurrent on the “margin” described by (11.49).
To complete the classification of the model, we need to prove that in this region the
model is not positive recurrent. In Proposition 11.5.5 we show that the chain is indeed
null on the margins of the parameter space, using essentially linear test functions in
(11.42).
This model, although not linear, is sufficiently so that the methods applied to the
random walk or the simple autoregressive models work here also. In this sense the
SETAR model is an example of greater complexity but not of a step change in type.
Indeed, the fact that the drift conditions only have to hold outside a compact set means that for this model we really only have to consider the two linear models, one on each of the end intervals, rendering its analysis even more straightforward.
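The drift computation in STEP 2 is easy to explore numerically. The sketch below uses a hypothetical two-regime SETAR chain with zero intercepts, θ(1) = −0.5 and θ(M) = −1.5 (so θ(1)θ(M) < 1, inside the regular region), and the piecewise-linear test function with a = 1.7 and b = 1 chosen so that 1.5b < a < 2b; the estimated drift ΔV(x) is negative for large |x|.

    # Estimated drift of a piecewise-linear V for a two-regime SETAR chain.
    import numpy as np

    rng = np.random.default_rng(3)
    step = lambda x, w: (-0.5 * x if x < 0 else -1.5 * x) + w   # threshold r = 0
    V = lambda x: 1.7 * x if x > 0 else 1.0 * abs(x)

    for x in [-50.0, -10.0, 10.0, 50.0]:
        W = rng.standard_normal(100_000)
        drift = np.mean([V(step(x, w)) for w in W]) - V(x)      # estimate of Delta V(x)
        print(x, drift)                       # negative outside a compact set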
For more detail on this model, see Tong [388]; and for some of the complications in
moving to multidimensional versions, see Brockwell, Liu and Tweedie [52].
Other generalized random coefficient models or completely nonlinear models with
which we have dealt are in many ways more difficult to classify. Nevertheless, steps
similar to those above are frequently the only ones available, and in practice linearization
to enable use of test functions of these forms will often be the approach taken.
Appendix C

Glossary of model assumptions
Here we gather together the assumptions used for the classes of models we have analyzed
as continuing examples. The equation numbering and assumption item labels (such as
(RT1)) coincide with those used in the main body of the book.
(RT3) If {Z_n} is a renewal process in continuous time with no delay, then we call the process

V⁺(t) := inf(Z_n − t : Z_n > t, n ≥ 1),    t ≥ 0

the forward recurrence time process; and for any δ > 0, the discrete time chain V⁺_δ = {V⁺(nδ), n ∈ Z₊} is called the forward recurrence time δ-skeleton.

(RT4) We call the process

V⁻(t) := inf(t − Z_n : Z_n ≤ t, n ≥ 1),    t ≥ 0

the backward recurrence time process; and for any δ > 0, the discrete time chain V⁻_δ = {V⁻(nδ), n ∈ Z₊} is called the backward recurrence time δ-skeleton.
Φ_k = Φ_{k−1} + W_k
(RW2) We call the random walk spread out (or equivalently, we call Γ spread out) if some convolution power Γ^{n∗} is non-singular with respect to µ_Leb.
(Q2) The nth customer brings a job requiring service Sn where the service times are
independent of each other and of the inter-arrival times, and are distributed as a
variable S with distribution H(−∞, t] = P(S ≤ t).
(Q3) There is one server and customers are served in order of arrival.
In such a general situation we have often considered the countable space chain consisting
of the number of customers in the queue either at arrival or at departure times. Under
some exponential assumptions these give the GI/M/1 and M/G/1 queueing systems:
In storage models we have a special case of random walk on a half line, but here we
consider the model at the times of input and break the increment into the input and
output components.
The simple storage model has the following assumptions:
(SSM1) For each n ≥ 0 let Sn and Tn be i.i.d. random variables on R with distributions
H and G.
(SSM2) Define the random variables

Φ_{n+1} = [Φ_n + S_n − J_n]⁺ .
The chain Φ = {Φn } can be interpreted as the content of the storage system at the
times {Tn −} immediately before each input and is called the content-dependent storage
model.
We also note that these models can be used to represent a number of state-dependent
queueing systems where the rate of service depends on the actual state of the system
rather than being independent.
X_n = αX_{n−1} + W_n ,    n ≥ 1;
(SLM2) the random variables {Wn } are an i.i.d. sequence with distribution Γ on R.
(LSS1) There exists an n × n matrix F and an n × p matrix G such that for each
k ∈ Z+ , the random variables Xk and Wk take values in Rn and Rp , respectively,
and satisfy inductively for k ≥ 1, and arbitrary W0 ,
X_k = F X_{k−1} + GW_k ;
(LSS2) The random variables {Wn } are i.i.d. with common finite mean, taking values
on Rp , with distribution Γ(A) = P(Wj ∈ A).
Then X is called the linear state space model driven by F, G, or the LSS(F ,G) model,
with associated control model LCM(F ,G) (defined below).
Further assumptions are required for the stability analysis of this model. These
include, at different times
(LSS3) The noise variable W has a Gaussian distribution on Rp with zero mean and
unit variance: that is, W ∼ N (0, I).
The associated (linear) control model LCM(F ,G) is defined by the following two sets
of assumptions.
Suppose x = {xk } is a deterministic process on Rn and u = {un } is a deterministic
process on Rp , for which x0 is arbitrary; then x is called the linear control model driven
by F, G, or the LCM(F ,G) model, if for k ≥ 1
(LCM1) there exists an n×n matrix F and an n×p matrix G such that for each k ∈ Z+ ,
x_{k+1} = F x_k + Gu_{k+1} ;    (1.4)
(AR1) for each n ≥ 0, Y_n and W_n are random variables on R satisfying, inductively for n ≥ k,

Y_n = α₁Y_{n−1} + α₂Y_{n−2} + · · · + α_k Y_{n−k} + W_n ,

for some α₁, . . . , α_k ∈ R;

Y_n = Σ_{j=1}^k α_j Y_{n−j} + Σ_{j=1}^ℓ β_j W_{n−j} + W_n ,

for some α₁, . . . , α_k, β₁, . . . , β_ℓ ∈ R;
(DS1) The process Φ is deterministic and generated by the nonlinear difference equa-
tion, or semi-dynamical system,
Φk +1 = F (Φk ), k ∈ Z+ , (11.16)
sup_{x∈C} V(F(x)) ≤ M.
The chain X = {Xn } is called a scalar nonlinear state space model on R driven by F ,
or SNSS(F ) model, if it satisfies
E[(Z_n, W_n)ᵀ(Z_k, W_k)] = δ_{n−k} diag(σ_z², σ_w²) ,    n ≥ 1,

with σ_z² < 1;
(SAC3) the input process satisfies U_k ∈ Y_k, k ∈ Z₊, where Y_k = σ{Y₀, . . . , Y_k}.

With the control U_k chosen as U_k = −θ̂_k Y_k, k ∈ Z₊, the closed loop system equations for the simple adaptive control model are

θ̃_{k+1} = αθ̃_k − αΣ_k Y_{k+1} Y_k (Σ_k Y_k² + σ_w²)⁻¹ + Z_{k+1}    (2.22)

Y_{k+1} = θ̃_k Y_k + W_{k+1}    (2.23)

Σ_{k+1} = σ_z² + α²σ_w²Σ_k (Σ_k Y_k² + σ_w²)⁻¹ ,    k ≥ 1    (2.24)

where the triple (Σ₀, θ̃₀, Y₀) is given as an initial condition.

The closed loop system gives rise to a Markovian system of the form (NSS1), so that Φ_k = (Σ_k, θ̃_k, Y_k) is a Markov chain with state space X = [σ_z², σ_z²/(1 − α²)] × R².
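A direct simulation of the closed loop is straightforward; the sketch below implements (2.22)–(2.24) with illustrative parameter values and Gaussian disturbances, and is intended only to show the recursion in executable form.

    # Simulation of the simple adaptive control closed loop (2.22)-(2.24).
    import numpy as np

    alpha, sigma_z, sigma_w = 0.9, 0.2, 0.1   # illustrative values
    rng = np.random.default_rng(4)
    Sigma, ttheta, Y = sigma_z ** 2, 0.0, 1.0 # initial (Sigma_0, theta~_0, Y_0)

    for k in range(10_000):
        Z = sigma_z * rng.standard_normal()
        W = sigma_w * rng.standard_normal()
        Ynew = ttheta * Y + W                                                          # (2.23)
        ttheta = alpha * ttheta - alpha * Sigma * Ynew * Y / (Sigma * Y**2 + sigma_w**2) + Z  # (2.22)
        Sigma = sigma_z**2 + alpha**2 * sigma_w**2 * Sigma / (Sigma * Y**2 + sigma_w**2)      # (2.24)
        Y = Ynew

    print(Sigma, ttheta, Y)                   # the state stays bounded in this run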
where −∞ = r₀ < r₁ < · · · < r_M = ∞ and {W_n(j)} forms an i.i.d. zero-mean error sequence for each j, independent of {W_n(i)} for i ≠ j.
For stability classification we often use
(SETAR2) For each j = 1, . . . , M , the noise variable W (j) has a density positive on the
whole real line.
(SETAR3) The variances of the noise distributions for the two end intervals are finite; that is,

E(W²(1)) < ∞,  E(W²(M)) < ∞.
X_n = θX_{n−1} + bX_{n−1}W_n + W_n

Y_{k+1} = θ_k Y_k + W_{k+1}    (2.13)

θ_{k+1} = αθ_k + Z_{k+1} .    (2.14)

X_{k+1} = (A + Γ_{k+1})X_k + W_{k+1}
(RCA1) The sequences Γ and W are i.i.d. and also independent of each other.
(RCA2) The following expectations exist, and have the prescribed values:
Appendix D

Some mathematical background
In this final section we collect together, for ease of reference, many of those mathemat-
ical results which we have used in developing our results on Markov chains and their
applications: these come from probability and measure theory, topology, stochastic
processes, the theory of probabilities on topological spaces, and even number theory.
We have tried to give results at a relevant level of generality for each of the types
of use: for example, since we assume that the leap from countable to general spaces or
topological spaces is one that this book should encourage, we have reviewed (even if
briefly) the simple aspects of this theory; conversely, we assume that only a relatively
sophisticated audience will wish to see details of sample path results, and the martingale
background provided requires some such sophistication.
Readers who are unfamiliar with any particular concepts and who wish to delve
further into them should consult the standard references cited, although in general a
deep understanding of many of these results is not vital to follow the development in
this book itself.
D.1 Some measure theory

A σ-field B(X) on a set X is a collection of subsets of X satisfying:
(a) X ∈ B(X);
(b) if A ∈ B(X), then Ac ∈ B(X);
(c) if A_k ∈ B(X), k = 1, 2, 3, . . . , then ∪_{k=1}^∞ A_k ∈ B(X).
D.1.2 Measures
A (signed) measure µ on the space (X, B(X)) is a function from B(X) to (−∞, ∞] which is countably additive: if A_k ∈ B(X), k = 1, 2, 3, . . . , and A_i ∩ A_j = ∅, i ≠ j, then

µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i).

We say that µ is positive if µ(A) ≥ 0 for any A. The measure µ is called a probability (or subprobability) measure if it is positive and µ(X) = 1 (or µ(X) ≤ 1).
A positive measure µ is σ-finite if there is a countable collection of sets {A_k} such that X = ∪A_k and µ(A_k) < ∞ for each k.

On the real line (R, B(R)), Lebesgue measure µ_Leb is a positive measure defined for intervals (a, b] by µ_Leb(a, b] = b − a, and for the other sets in B(R) by an obvious extension technique. Lebesgue measure on higher dimensional Euclidean space R^p is constructed similarly using the area of rectangles as a basic definition.
The total variation norm of a signed measure is defined as

‖µ‖ := sup ∫ f dµ,

where the supremum is taken over all measurable functions f from (X, B(X)) to (R, B(R)) such that |f(x)| ≤ 1 for all x ∈ X.

For a signed measure µ, the state space X may be written as the union of disjoint sets X⁺ and X⁻ where

µ(X⁺) − µ(X⁻) = ‖µ‖.

This is known as the Hahn decomposition.
D.1.3 Integrals
Suppose that h is a non-negative measurable function from (X, B(X)) to (R, B(R)). The
Lebesgue integral of h with respect to a positive finite measure µ is defined in three
steps.
Firstly, for A ∈ B(X) define I_A(x) = 1 if x ∈ A, and 0 otherwise: I_A is called the indicator function of the set A. In this case we define

∫_X I_A(x)µ(dx) := µ(A).

Next consider simple functions h such that there exist sets {A₁, . . . , A_N} ⊂ B(X) and positive numbers {b₁, . . . , b_N} ⊂ R₊ with h = Σ_{k=1}^N b_k I_{A_k}. If h is a simple function we can unambiguously define

∫_X h(x)µ(dx) := Σ_{k=1}^N b_k µ(A_k).

Finally, since it is possible to show that given any non-negative measurable h, there exists a sequence of simple functions {h_k}_{k=1}^∞ such that for each x ∈ X,

h_k(x) ↑ h(x),

we can take

∫_X h(x)µ(dx) := lim_k ∫_X h_k(x)µ(dx),

which always exists, though it may be infinite.

This approach works if h is non-negative. If not, write

h = h⁺ − h⁻

where h⁺ and h⁻ are both non-negative measurable functions, and define

∫_X h(x)µ(dx) := ∫_X h⁺(x)µ(dx) − ∫_X h⁻(x)µ(dx),

if both terms on the right are finite. Such functions are called µ-integrable, or just integrable if there is no possibility of confusion; and we frequently denote the integral by

∫ h dµ := ∫_X h(x)µ(dx).
The extension to σ-finite measures is then straightforward.
Convergence of sequences of integrals is central to much of this book. There are
three results which we use regularly:
Theorem D.1.1 (Monotone Convergence Theorem). If µ is a σ-finite positive measure
on (X, B(X)) and {fi : i ∈ Z+ } are measurable functions from (X, B(X)) to (R, B(R))
which satisfy 0 ≤ fi (x) ↑ f (x) for µ-almost every x ∈ X, then
∫_X f(x)µ(dx) = lim_i ∫_X f_i(x)µ(dx).    (D.1)
Note that in this result the monotone limit f may not be finite even µ-almost
everywhere, but the result continues to hold in the sense that both sides of (D.1) will
be finite or infinite together.
Theorem D.1.2 (Fatou’s Lemma). If µ is a σ-finite positive measure on (X, B(X)) and
{fi : i ∈ Z+ } are non-negative measurable functions from (X, B(X)) to (R, B(R)), then
∫_X lim inf_i f_i(x)µ(dx) ≤ lim inf_i ∫_X f_i(x)µ(dx).    (D.2)
X⁻¹{B} := {ω : X(ω) ∈ B} ∈ F
The set of real-valued random variables Y for which the expectation is well defined and finite is denoted L¹(Ω, F, P). Similarly, we use L^∞(Ω, F, P) to denote the collection of essentially bounded real-valued random variables Y, that is, those for which there is a bound M and a set A_M ∈ F with P(A_M) = 0 such that {ω : |Y(ω)| > M} ⊆ A_M.
Suppose that Y ∈ L1 (Ω, F, P) and G ⊂ F is a sub-σ-field of F. If Ŷ ∈ L1 (Ω, G, P)
and satisfies
E[Y Z] = E[Ŷ Z] for all Z ∈ L∞ (Ω, G, P),
then Ŷ is called the conditional expectation of Y given G, and denoted E[Y | G]. The
conditional expectation defined in this way exists and is unique (modulo P-null sets)
for any Y ∈ L1 (Ω, F, P) and any sub σ-field G.
Suppose now that we have another σ-field D ⊂ G ⊂ F. Then

E[E[Y | G] | D] = E[Y | D].    (D.3)

The identity (D.3) is often called “the smoothing property of conditional expectations”.
Those members of T containing a point x are called the neighborhoods of x, and the
complements of open sets are called closed .
A set C is called compact if any cover of C with open sets admits a finite subcover,
and a set D is dense if the smallest closed set containing D (the closure of D) is the
whole space. A set is called pre-compact if it has a compact closure.
When there is a topology assumed on the state spaces for the Markov chains con-
sidered in this book, it is always assumed that these render the space locally compact
and separable metric: a locally compact space is one for which each open neighborhood
of a point contains a compact neighborhood, and a separable space is one for which a
countable dense subset of X exists. A metric space is such that there is a metric d on
X which generates its topology.
For the topological spaces we consider, Lindelöf’s Theorem holds:
Theorem D.3.1 (Lindelöf’s Theorem). If X is a separable metric space, then every
cover of an open set by open sets admits a countable subcover.
If X is a topological space with topology T, then there is a natural σ-field on X containing T, namely the smallest such σ-field:

B(X) := ∩ {G : T ⊂ G, G a σ-field on X}.
Extending the terminology from R, this is often called the Borel σ-field of X:
throughout this book, we have assumed that on a topological space the Borel σ-field
is being addressed, and so our general notation B(X) is consistent in the topological
context with the conventional notation.
A measure µ is called regular if for any set E ∈ B(X),

µ(E) = inf{µ(O) : E ⊆ O, O open} = sup{µ(C) : C ⊆ E, C compact}.

For the topological spaces we consider, measures on B(X) are regular: we have ([345] p. 49).
Theorem D.3.2. If X is locally compact and separable, then every σ-finite measure on
B(X) is regular.
f_n(x) ↑ f(x) as n → ∞.
Finally, in our context one of the most frequently used of all results on continuous functions is that which assures us that the convolution operation applied to any pair of L¹(R, B(R), µ_Leb) and L^∞(R, B(R), µ_Leb) functions is continuous.

For two functions f, g : R → R, the convolution f ∗ g is the function on R defined for t ∈ R by

f ∗ g(t) = ∫_{−∞}^∞ f(s)g(t − s) ds.

This is well defined if, for example, both f and g are positive. We have (see [345], p. 196):
It follows that M can be considered as a metric space with metric |·|_w defined for ν, µ ∈ M by

|ν − µ|_w := Σ_{k=1}^∞ 2^{−k} | ∫ g_k dν − ∫ g_k dµ | .
Other metrics relevant to weak convergence are summarized in, for example, [191].
A set of probability measures A ⊂ M is called tight if for every ε > 0 there exists a compact set C ⊂ X for which

ν(C) ≥ 1 − ε  for every ν ∈ A.
The following result, which characterizes tightness with M viewed as a metric space,
follows from Proposition D.5.6 below.
where we adopt the convention that the infimum of a function over the empty set is
infinity. If X is a closed and unbounded subset of Rk , it is evident that V (x) = |x|p is
coercive for any p > 0. If X is compact, then our convention implies that any positive
function V is coercive because we may set Cn = X for all n ∈ Z+ .
It is easily verified that a collection of probabilities A ⊂ M is tight if and only if a coercive function V exists such that

sup_{ν∈A} ∫ V dν < ∞.
Theorem D.5.4. The following are equivalent for a sequence of probabilities {ν_k : k ∈ Z₊} ⊂ M:

(i) ν_k →_w ν;

(iv) for every uniformly bounded and equicontinuous family of functions C ⊂ C(X),

lim_{k→∞} sup_{f∈C} | ∫ f dν_k − ∫ f dν | = 0.
we have

lim_{k→∞} sup_{x∈C_k^c} |f(x)| = 0.

The space C₀(X) is simply the closure of C_c(X), the space of continuous functions with compact support, in the uniform norm.
A sequence of subprobability measures {ν_k : k ∈ Z₊} is said to converge vaguely to a subprobability measure ν if for all f ∈ C₀(X),

lim_{k→∞} ∫ f dν_k = ∫ f dν .
In this book we often apply the following result, which follows from the observation that positive lower semicontinuous functions on X are the pointwise supremum of a collection of positive, continuous functions with compact support (see Theorem D.4.1).

Lemma D.5.5. If ν_k →_v ν, then

lim inf_{k→∞} ∫ f dν_k ≥ ∫ f dν    (D.6)

for any positive lower semicontinuous function f.
It is obvious that weak convergence implies vague convergence. On the other hand, a sequence of probabilities converges weakly if and only if it converges vaguely and is tight.

The use and direct verification of boundedness in probability will often follow from the following results: the first of these is a consequence of our assumption that the state space is locally compact and separable (see Billingsley [36] and Revuz [326]).

(ii) If {ν_k} is tight and each ν_k is a probability measure, then ν_{n_k} →_w ν_∞ and ν_∞ is a probability measure.
Since a positive supermartingale is convergent, it follows that its sample paths are bounded with probability one. The following result gives an upper bound on the magnitude of variation of the sample paths of both positive supermartingales, and general martingales.

P{ max_{0≤k≤n} |M_k| ≥ c } ≤ c^{−p} E[|M_n|^p] .

P{ sup_{0≤k<∞} M_k ≥ c } ≤ c^{−1} E[M₀] .
These results, and related concepts, can be found in Billingsley [37], Chung [72],
Hall and Heyde [151], and of course Doob [99].
The function mn (t) is piecewise linear and is equal to Mi when t = i/n for 0 ≤ t ≤
1. In Theorem D.6.4 below we give conditions under which the normalized sequence
{n−1/2 mn (t) : n ∈ Z+ } converges to a continuous process (Brownian motion) on [0, 1].
This result requires some care in the definition of convergence for a sequence of stochastic
processes.
Let C[0, 1] denote the normed space of all continuous functions φ : [0, 1] → R under the uniform norm, which is defined as

‖φ‖ := sup_{0≤t≤1} |φ(t)| .
The vector space C[0, 1] is a complete, separable metric space, and hence the theory of
weak convergence may be applied to analyze measures on C[0, 1].
The stochastic process m_n(t) possesses a distribution µ_n, which is a probability measure on C[0, 1]. We say that m_n(t) converges in distribution to a stochastic process m_∞(t) as n → ∞, denoted m_n →_d m_∞, if the sequence of measures µ_n converges weakly to the distribution µ_∞ of m_∞. That is, for any bounded continuous functional h on C[0, 1],

E[h(m_n)] → E[h(m_∞)] as n → ∞.
The limiting process, standard Brownian motion on [0, 1], which we denote by B, is
defined as follows:
lim_{n→∞} (1/n) Σ_{k=1}^n E[(M_k − M_{k−1})² | F_{k−1}] = γ²  a.s.    (D.8)

lim_{n→∞} (1/n) Σ_{k=1}^n E[(M_k − M_{k−1})² I{(M_k − M_{k−1})² ≥ εn} | F_{k−1}] = 0  a.s.    (D.9)

Then (γ²n)^{−1/2} m_n →_d B.
Function space limits of this kind are often called invariance principles, though we
have avoided this term because functional CLT seems more descriptive.
D.7 Some results on sequences and numbers

Lemma D.7.1. If {a(n)}, {b(n)} are non-negative sequences such that b(n) → b(∞) < ∞ as n → ∞ and Σ a(j) < ∞, then

a ∗ b(n) → b(∞) Σ_{j=0}^∞ a(j) < ∞ ,    n → ∞.    (D.10)
Proof Set b(n) = 0 for n < 0. Since b(n) converges it is bounded, and so by the Dominated Convergence Theorem

lim_{n→∞} a ∗ b(n) = Σ_{j=0}^∞ a(j) lim_{n→∞} b(n − j) = b(∞) Σ_{j=0}^∞ a(j)    (D.11)

as required.
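A quick numerical illustration (our own, with assumed sequences a(n) = 2⁻ⁿ, so that Σ a(j) = 2, and b(n) = 3 − 1/(n + 1)):

    # Numerical illustration of (D.10): a*b(n) -> b(infinity) * sum_j a(j).
    import numpy as np

    N = 2_000
    a = 0.5 ** np.arange(N)                   # summable: sum a(j) = 2
    b = 3.0 - 1.0 / (np.arange(N) + 1.0)      # b(n) -> b(infinity) = 3
    conv = lambda n: sum(a[j] * b[n - j] for j in range(n + 1))
    print(conv(1000), 3.0 * a.sum())          # approximately 6 and 6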
The next lemma contains two valuable summation results for series.

Lemma D.7.2. (i) If c(n) is a non-negative sequence, then for any r > 1,

Σ_{n≥0} (Σ_{m≥n} c(m)) rⁿ ≤ (r/(r − 1)) Σ_{m≥0} c(m) r^m .
Lemma D.7.3. Let d denote the greatest common divisor (g.c.d.) of the numbers m, n.
Then there exist integers a, b such that
am + bn = d.
Lemma D.7.4. Suppose that N ⊂ Z+ is a subset of the integers which is closed under
addition: for each j, k ∈ N , j + k ∈ N . Let d denote the greatest common divisor of
the set N . Then there exists n0 < ∞ such that nd ∈ N for all n ≥ n0 .
Bibliography
[1] D. Aldous and P. Diaconis. Strong uniform times and finite random walks. Adv. Applied
Maths., 8:69–97, 1987.
[2] E. Altman, P. Konstantopoulos, and Z. Liu. Stability, monotonicity and invariant quan-
tities in general polling systems. Queueing Syst. Theory Appl., 11:35–57, 1992.
[3] B. D. O. Anderson and J. B. Moore. Optimal Control: Linear Quadratic Methods.
Prentice-Hall, Englewood Cliffs, N.J., 1990.
[4] W.J. Anderson. Continuous-Time Markov Chains: An Applications-Oriented Approach.
Springer-Verlag, New York, 1991.
[5] M. Aoki. State Space Modeling of Time Series. Springer-Verlag, Berlin, 1990.
[6] A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I. Marcus.
Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM
J. Control Optim., 31:282–344, 1993.
[7] E. Arjas and E. Nummelin. A direct construction of the R-invariant measure for a
Markov chain on a general state space. Ann. Probab., 4:674–679, 1976.
[8] E. Arjas, E. Nummelin, and R. L. Tweedie. Uniform limit theorems for non-singular
renewal and Markov renewal processes. J. Appl. Probab., 15:112–125, 1978.
[9] S. Asmussen. Applied Probability and Queues. John Wiley & Sons, New York, 1987.
[10] S. Asmussen and P. W. Glynn. Stochastic Simulation: Algorithms and Analysis, vol-
ume 57 of Stochastic Modelling and Applied Probability. Springer-Verlag, New York,
2007.
[11] R. Atar and O. Zeitouni. Lyapunov exponents for finite state nonlinear filtering. SIAM
J. Control Optim., 35(1):36–55, 1997.
[12] K. B. Athreya and P. Ney. Branching Processes. Springer-Verlag, New York, 1972.
[13] K. B. Athreya and P. Ney. A new approach to the limit theory of recurrent Markov
chains. Trans. Amer. Math. Soc., 245:493–501, 1978.
[14] K. B. Athreya and P. Ney. Some aspects of ergodic theory and laws of large numbers
for Harris recurrent Markov chains. Colloquia Mathematica Societatis János Bolyai.
Nonparametric Statistical Inference, 32:41–56, 1980. Budapest, Hungary.
[15] K. B. Athreya and S. G. Pantula. Mixing properties of Harris chains and autoregressive
processes. J. Appl. Probab., 23:880–892, 1986.
[16] K. E. Avrachenkov and J. B. Lasserre. The fundamental matrix of singularly perturbed
Markov chains. Adv. Appl. Probab., 31(3):679–697, 1999.
[17] S. Balaji and S. P. Meyn. Multiplicative ergodicity and large deviations for an irreducible
Markov chain. Stoch. Proc. Applns., 90(1):123–144, 2000.
[60] H. P. Chan and T. L. Lai. Saddlepoint approximations and nonlinear boundary crossing
probabilities of Markov random walks. Ann. Appl. Probab., 13(2):395–429, 2003.
[61] K. S. Chan. Topics in Nonlinear Time Series Analysis. PhD thesis, Princeton University,
1986.
[62] K. S. Chan. A note on the geometric ergodicity of a Markov chain. Adv. Appl. Probab.,
21:702–704, 1989.
[63] K. S. Chan. Asymptotic behaviour of the Gibbs sampler. J. Amer. Statist. Assoc.,
88:320–326, 1993.
[64] K. S. Chan, J. Petruccelli, H. Tong, and S. W. Woolford. A multiple threshold AR(1)
model. J. Appl. Probab., 22:267–279, 1985.
[65] H. Chen. Fluid approximations and stability of multiclass queueing networks: work-
conserving disciplines. Ann. Appl. Probab., 5(3):637–665, 1995.
[66] R. Chen and R. S. Tsay. On the ergodicity of TAR(1) processes. Ann. Appl. Probab.,
1:613–634, 1991.
[67] R-R. Chen and S. P. Meyn. Value iteration and optimization of multiclass queueing
networks. Queueing Syst. Theory Appl., 32(1-3):65–97, 1999.
[68] P. Chigansky, R. Liptser, and R. van Handel. Intrinsic methods in filter stability. In
Handbook of Nonlinear Filtering. Oxford University Press, 2008. To appear.
[69] Y. S. Chow and H. Robbins. A renewal theorem for random variables which are dependent
or non-identically distributed. Ann. Math. Statist., 34:390–395, 1963.
[70] K. L. Chung. The general theory of Markov processes according to Doeblin. Z.
Wahrscheinlichkeitstheorie und Verw. Geb., 2:230–254, 1964.
[71] K. L. Chung. Markov Chains with Stationary Transition Probabilities. Springer-Verlag,
Berlin, second edition, 1967.
[72] K. L. Chung. A Course in Probability Theory. Academic Press, New York, second
edition, 1974.
[73] K. L. Chung and D. Ornstein. On the recurrence of sums of random variables. Bull.
Amer. Math. Soc., 68:30–32, 1962.
[74] R. Cogburn. The Central Limit Theorem for Markov processes. In L. M. Le Cam,
J. Neyman, and E. L. Scott, editors, Proceedings of the 6th Berkeley Symposium on
Mathematical Statistics and Probability, pages 485–512. University of California Press,
Berkeley, 1972.
[75] R. Cogburn. A uniform theory for sums of Markov chain transition probabilities. Ann.
Probab., 3:191–214, 1975.
[76] J. W. Cohen. The Single Server Queue. North-Holland, Amsterdam, second edition,
1982.
[77] C. Constantinescu and A. Cornea. Potential Theory on Harmonic Spaces. Springer-
Verlag, Berlin, 1972.
[78] P. C. Consul. Evolution of surnames. Int. Statist. Rev., 59:271–278, 1991.
[79] J. N. Corcoran and R. L. Tweedie. Perfect sampling of ergodic Harris chains. Ann. Appl.
Probab., 11(2):438–451, 2001.
[80] J. G. Dai. On positive Harris recurrence of multiclass queueing networks: a unified
approach via fluid limit models. Ann. Appl. Probab., 5(1):49–77, 1995.
[81] J. G. Dai and S. P. Meyn. Stability and convergence of moments for multiclass queueing
networks via fluid limit models. IEEE Trans. Automat. Control, 40:1889–1904, 1995.
[82] J. G. Dai and G. Weiss. Stability and instability of fluid models for reentrant lines. Math.
Oper. Res., 21(1):115–134, 1996.
[83] D. Daley. The serial correlation coefficients of waiting times in a stationary single server
queue. J. Austral. Math. Soc., 8:683–699, 1968.
[84] A. de Acosta and P. Ney. Large deviation lower bounds for arbitrary additive functionals
of a Markov chain. Ann. Probab., 26(4):1660–1682, 1998.
[85] B. Delyon and O. Zeitouni. Lyapunov exponents for filtering problems. In Applied
stochastic analysis (London, 1989), volume 5 of Stochastics Monogr., pages 511–521.
Gordon and Breach, New York, 1991.
[86] C. Derman. A solution to a set of fundamental equations in Markov chains. Proc. Amer.
Math. Soc., 5:332–334, 1954.
[87] G. B. Di Masi and Ł. Stettner. Ergodicity of hidden Markov models. Math. Control Signals Systems, 17(4):269–296, 2005.
[88] P. Diaconis. Group Representations in Probability and Statistics. Institute of Mathemat-
ical Statistics, Hayward, Calif., 1988.
[89] P. Diaconis and L. Saloff-Coste. Logarithmic Sobolev inequalities for finite Markov chains. Ann. Appl. Probab., 6(3):695–750, 1996.
[90] P. Diaconis and D. Stroock. Geometric bounds for eigenvalues of Markov chains. Ann.
Appl. Probab., 1:36–61, 1991.
[91] J. Diebolt. Loi stationnaire et loi des fluctuations pour le processus autorégressif général
d’ordre un. C. R. Acad. Sci., 310:449–453, 1990.
[92] J. Diebolt and D. Guégan. Probabilistic properties of the general nonlinear Markovian
process of order one and applications to time series modeling. Technical report 125,
Laboratoire de Statistique Théorique et Appliquée, Université Paris, 1990.
[93] W. Doeblin. Sur les propriétés asymptotiques de mouvements régis par certains types de chaînes simples. Bull. Math. Soc. Roum. Sci., 39(1):57–115; 39(2):3–61, 1937.
[94] W. Doeblin. Exposé de la théorie des chaînes simples constantes de Markov à un nombre fini d’états. Revue Mathématique de l’Union Interbalkanique, 2:77–105, 1938.
[95] W. Doeblin. Eléments d’une théorie générale des chaînes simples constantes de Markoff. Ann. Sci. Ec. Norm. Sup., 57:61–111, 1940.
[96] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. I, II. Comm. Pure Appl. Math., 28:1–47; 28:279–301, 1975.
[97] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process
expectations for large time. III. Comm. Pure Appl. Math., 29(4):389–461, 1976.
[98] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process
expectations for large time. IV. Comm. Pure Appl. Math., 36(2):183–212, 1983.
[99] J. L. Doob. Stochastic Processes. John Wiley & Sons, New York, 1953.
[100] R. Douc, G. Fort, E. Moulines, and P. Soulier. Practical drift conditions for subgeometric
rates of convergence. Ann. Appl. Probab., 14(3):1353–1377, 2004.
[101] D. Down, S. P. Meyn, and R. L. Tweedie. Exponential and uniform ergodicity of Markov
processes. Ann. Probab., 23(4):1671–1691, 1995.
[123] S. R. Foguel. The ergodic theory of positive operators on continuous functions. Ann.
Scuola Norm. Sup. Pisa, 27:19–51, 1973.
[124] S. R. Foguel. Selected Topics in the Study of Markov Operators. Carolina Lecture Series.
Dept. of Mathematics, University of North Carolina at Chapel Hill, 1980.
[125] G. Fort, S. Meyn, E. Moulines, and P. Priouret. ODE methods for skip-free Markov
chain stability with applications to MCMC. Ann. Appl. Probab., 18(2):664–707, 2008.
[126] G. Fort and E. Moulines. Polynomial ergodicity of Markov transition kernels. Stoch.
Proc. Applns., 103(1):57–99, 2003.
[127] S. G. Foss and R. L. Tweedie. Perfect simulation and backward coupling. Comm. Statist.
Stochastic Models, 14(1-2):187–203, 1998. Special issue in honor of Marcel F. Neuts.
[128] S. G. Foss, R. L. Tweedie, and J. N. Corcoran. Simulating the invariant measures of
Markov chains using backward coupling at regeneration times. Probab. Engrg. Inform.
Sci., 12(3):303–320, 1998.
[129] F. G. Foster. On the stochastic matrices associated with certain queuing processes. Ann.
Math. Statist., 24:355–360, 1953.
[130] J. J. Fuchs and B. Delyon. Adaptive control of a simple time-varying system. IEEE
Trans. Automat. Control, 37:1037–1040, 1992.
[131] Cheng-Der Fuh. Asymptotic operating characteristics of an optimal change point detec-
tion in hidden Markov models. Ann. Statist., 32(5):2305–2339, 2004.
[132] Cheng-Der Fuh and Tze Leung Lai. Asymptotic expansions in multidimensional Markov
renewal theory and first passage times for Markov random walks. Adv. in Appl. Probab.,
33(3):652–673, 2001.
[133] Cheng-Der Fuh and Cun-Hui Zhang. Poisson equation, moment inequalities and quick
convergence for Markov random walks. Stoch. Proc. Applns., 87(1):53–67, 2000.
[134] H. Furstenberg and H. Kesten. Products of random matrices. Ann. Math. Statist.,
31:457–469, 1960.
[135] D. Gamarnik. Extension of the PAC framework to finite and countable Markov chains.
IEEE Trans. Inform. Theory, 49(1):338–345, 2003.
[136] J. M. Gani and I. W. Saunders. Some vocabulary studies of literary texts. Sankhyā Ser.
B, 38:101–111, 1976.
[137] L. Georgiadis, W. Szpankowski, and L. Tassiulas. A scheduling policy with maximal
stability region for ring networks with spatial reuse. Queueing Syst. Theory Appl., 19(1-
2):131–148, 1995.
[138] P. Glynn and R. Szechtman. Some new perspectives on the method of control variates. In K. T. Fang, F. J. Hickernell, and H. Niederreiter, editors, Monte Carlo and Quasi-Monte Carlo Methods 2000: Proceedings of a Conference held at Hong Kong Baptist University, Hong Kong SAR, China, pages 27–49. Springer-Verlag, Berlin, 2002.
[139] P. W. Glynn and S. P. Meyn. A Liapounov bound for solutions of the Poisson equation.
Ann. Probab., 24(2):916–931, 1996.
[140] P. W. Glynn and D. Ormoneit. Hoeffding’s inequality for uniformly ergodic Markov
chains. Statistics and Probability Letters, 56:143–146, 2002.
[141] F. Z. Gong and L. M. Wu. Spectral gap of positive operators and applications. J. Math.
Pure Appl., 85:151–191, 2006.
[142] G. C. Goodwin, P. J. Ramadge, and P. E. Caines. Discrete time stochastic adaptive
control. SIAM J. Control Optim., 19:829–853, 1981.
[164] C. Huang and D. Isaacson. Ergodicity using mean visit times. J. Lond. Math. Soc.,
14:570–576, 1976.
[165] J. Huang, I. Kontoyiannis, and S. P. Meyn. The ODE method and spectral theory
of Markov operators. In T. E. Duncan and B. Pasik-Duncan, editors, Proceedings of
the workshop held at the University of Kansas, Lawrence, October 18–20, 2001, pages
205–222. Springer-Verlag, Berlin, 2002.
[166] W. Huisinga, S. Meyn, and C. Schütte. Phase transitions and metastability in Markovian
and molecular systems. Ann. Appl. Probab., 14(1):419–458, 2004.
[167] K. Ichihara and H. Kunita. A classification of the second order degenerate elliptic opera-
tor and its probabilistic characterization. Z. Wahrscheinlichkeitstheorie und Verw. Geb.,
30:235–254, 1974.
[168] N. Ikeda and S. Watanabe. Stochastic Differential Equations and Diffusion Processes.
North-Holland, Amsterdam, 1981.
[169] R. Isaac. Some topics in the theory of recurrent Markov processes. Duke Math. J.,
35:641–652, 1968.
[170] D. Isaacson and R. L. Tweedie. Criteria for strong ergodicity of Markov chains. J. Appl.
Probab., 15:87–95, 1978.
[171] N. Jain. Some limit theorems for a general Markov process. Z. Wahrscheinlichkeitstheorie und Verw. Geb., 6:206–223, 1966.
[172] N. Jain and B. Jamison. Contributions to Doeblin’s theory of Markov processes. Z.
Wahrscheinlichkeitstheorie und Verw. Geb., 8:19–40, 1967.
[173] B. Jakubczyk and E. D. Sontag. Controllability of nonlinear discrete-time systems: a Lie-algebraic approach. SIAM J. Control Optim., 28:1–33, 1990.
[174] B. Jamison. Asymptotic behavior of successive iterates of continuous functions under a
Markov operator. J. Math. Anal. Appl., 9:203–214, 1964.
[175] B. Jamison. Ergodic decomposition induced by certain Markov operators. Trans. Amer.
Math. Soc., 117:451–468, 1965.
[176] B. Jamison. Irreducible Markov operators on C(S). Proc. Amer. Math. Soc., 24:366–370,
1970.
[177] B. Jamison and S. Orey. Markov chains recurrent in the sense of Harris. Z. Wahrschein-
lichkeitstheorie und Verw. Geb., 8:41–48, 1967.
[178] B. Jamison and R. Sine. Sample path convergence of stable Markov processes. Z.
Wahrscheinlichkeitstheorie und Verw. Geb., 28:173–177, 1974.
[179] S. F. Jarner and S. Hansen. Geometric ergodicity of Metropolis algorithms. Stoch. Proc.
Applns., 85(2):341–361, 2000.
[180] S. F. Jarner and G. O. Roberts. Polynomial convergence rates of Markov chains. Ann.
Appl. Probab., 12(1):224–247, 2002.
[181] A. A. Johnson and G. L. Jones. Gibbs sampling for a Bayesian hierarchical general linear
model. ArXiv:0712.3056 [math.PR], 2007.
[182] D. A. Jones. Non-linear autoregressive processes. Proc. Roy. Soc. A, 360:71–95, 1978.
[183] G. L. Jones. On the Markov chain Central Limit Theorem. Probab. Surv., 1:299–320
(electronic), 2004.
[184] G. L. Jones and J. P. Hobert. Honest exploration of intractable probability distributions
via Markov chain Monte Carlo. Statist. Sci., 16(4):312–334, 2001.
[185] A. F. Veinott, Jr. Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist., 40(5):1635–1660, 1969.
[186] M. Kac. On the notion of recurrence in discrete stochastic processes. Bull. Amer. Math.
Soc., 53:1002–1010, 1947.
[187] V. V. Kalashnikov. Analysis of ergodicity of queueing systems by Lyapunov’s direct method (in Russian). Avtomatica i Telemechanica, 4:46–54, 1971.
[188] V. V. Kalashnikov. The property of gamma-reflexivity for Markov sequences. Soviet
Math. Dokl., 14:1869–1873, 1973.
[189] V. V. Kalashnikov. Stability analysis in queueing problems by the method of test func-
tions. Theory Probab. Appl., 22:86–103, 1977.
[190] V. V. Kalashnikov. Qualitative Analysis of Complex Systems Behaviour by the Test Functions Method (in Russian). Nauka, Moscow, 1978.
[191] V. V. Kalashnikov and S. T. Rachev. Mathematical Methods for Construction of Queue-
ing Models. Wadsworth and Brooks/Cole, New York, 1990.
[192] R. E. Kalman and J. E. Bertram. Control system analysis and design by the second
method of Lyapunov. Trans. ASME Ser. D: J. Basic Eng., 82:371–400, 1960.
[193] M. Kaplan. A sufficient condition for nonergodicity of a Markov chain. IEEE Trans.
Inform. Theory, 25:470–471, 1979.
[194] S. Karlin and H. M. Taylor. A First Course in Stochastic Processes. Academic Press,
New York, second edition, 1975.
[195] H. A. Karlsen. Existence of moments in a stationary stochastic difference equation. Adv.
Appl. Probab., 22:129–146, 1990.
[196] N. V. Kartashov. Criteria for uniform ergodicity and strong stability of Markov chains
with a common phase space. Theory Probab. Appl., 30:71–89, 1985.
[197] N. V. Kartashov. Inequalities in theorems of ergodicity and stability for Markov chains with a common phase space. Theory Probab. Appl., 30:247–259, 1985.
[198] J. L. Kelley. General Topology. Van Nostrand, Princeton, N.J., 1955.
[199] F. P. Kelly. Reversibility and Stochastic Networks. John Wiley & Sons, Chichester, U.K.,
1979.
[200] D. G. Kendall. Some problems in the theory of queues. J. Roy. Statist. Soc. Ser. B,
13:151–185, 1951.
[201] D. G. Kendall. Stochastic processes occurring in the theory of queues and their analysis
by means of the imbedded Markov chain. Ann. Math. Statist., 24:338–354, 1953.
[202] D. G. Kendall. Unitary dilations of Markov transition operators and the correspond-
ing integral representation for transition-probability matrices. In U. Grenander, editor,
Probability and Statistics, pages 139–161. Almqvist and Wiksell, Stockholm, 1959.
[203] D. G. Kendall. Geometric ergodicity in the theory of queues. In K. J. Arrow, S. Karlin,
and P. Suppes, editors, Mathematical Methods in the Social Sciences, pages 176–195.
Stanford University Press, Stanford, 1960.
[204] D. G. Kendall. Kolmogorov as I remember him. Statist. Sci., 6:303–312, 1991.
[205] G. Kersting. On recurrence and transience of growth models. J. Appl. Probab., 23:614–
625, 1986.
[206] R. Z. Khas’minskii. Stochastic Stability of Differential Equations. Sijthoff & Noordhoff,
Netherlands, 1980.
[228] H. Kunita. Supports of diffusion processes and controllability problems. In K. Itô, editor,
Proceedings of the International Symposium on Stochastic Differential Equations, pages
163–185. John Wiley & Sons, New York, 1978.
[229] H. Kunita. Stochastic Flows and Stochastic Differential Equations. Cambridge University
Press, Cambridge, 1990.
[230] B. C. Kuo. Automatic Control Systems. Prentice-Hall, Englewood Cliffs, N.J., sixth
edition, 1990.
[231] T. G. Kurtz. The Central Limit Theorem for Markov chains. Ann. Probab., 9:557–560,
1981.
[232] H. J. Kushner. Stochastic Stability and Control. Academic Press, New York, 1967.
[233] M. T. Lacey and W. Philipp. A note on the almost sure Central Limit Theorem. Statistics
and Probability Letters, 9:201–205, 1990.
[234] J. Lamperti. Criteria for the recurrence or transience of stochastic processes I. J. Math.
Anal. Appl., 1:314–330, 1960.
[235] J. Lamperti. Criteria for stochastic processes II: passage time moments. J. Math. Anal.
Appl., 7:127–145, 1963.
[236] C. Landim. Central limit theorem for Markov processes. In From Classical to Modern
Probability, volume 54 of Progr. Probab., pages 145–205. Birkhäuser, Basel, 2003.
[237] G. M. Laslett, D. B. Pollard, and R. L. Tweedie. Techniques for establishing ergodic
and recurrence properties of continuous-valued Markov chains. Nav. Res. Log. Quart.,
25:455–472, 1978.
[238] M. Lin. Conservative Markov processes on a topological space. Israel J. Math., 8:165–186,
1970.
[239] T. Lindvall. Lectures on the Coupling Method. John Wiley & Sons, New York, 1992.
[240] R. S. Liptser and A. N. Shiryayev. Statistics of Random Processes, II: Applications. Springer-Verlag, New York, 1978.
[241] R. Lund and R. L. Tweedie. Geometric convergence rates for stochastically ordered
Markov chains. Math. Oper. Res., 20:182–194, 1996.
[242] N. Maigret. Théorème de limite centrale pour une chaîne de Markov récurrente Harris positive. Ann. Inst. Henri Poincaré Ser. B, 14:425–440, 1978.
[243] V. A. Malyšev and M. V. Men’šikov. Ergodicity, continuity and analyticity of countable Markov chains. Trudy Moskov. Mat. Obshch., 39:3–48, 235, 1979. Trans. Moscow Math. Soc., pp. 1–48, 1981.
[244] V. A. Malyšev. Classification of two-dimensional positive random walks and almost linear semi-martingales. Soviet Math. Dokl., 13:136–139, 1972.
[245] R. S. Mamon and R. J. Elliott. Hidden Markov Models in Finance, volume 104 of
International Series in Operations Research & Management Science. Springer-Verlag,
New York, 2007.
[246] P. Marbach and J. N. Tsitsiklis. Simulation-based optimization of Markov reward pro-
cesses. IEEE Trans. Automat. Control, 46(2):191–209, 2001.
[247] I. M. Y. Mareels and R. R. Bitmead. Bifurcation effects in robust adaptive control. IEEE
Trans. Circuits and Systems, 35:835–841, 1988.
[248] A. A. Markov. Extension of the law of large numbers to dependent quantities (in Rus-
sian). Izv. Fiz.-Matem. Obsch. Kazan Univ. (2nd Ser.), 15:135–156, 1906.
[249] P. G. Marlin. On the ergodic theory of Markov chains. Operations Res., 21:617–622,
1973.
[250] P. Mathé. Numerical integration using V -uniformly ergodic Markov chains. J. Appl.
Probab., 41(4):1104–1112, 2004.
[251] J. G. Mauldon. On non-dissipative Markov chains. Math. Proc. Camb. Phil. Soc., 53:825–
835, 1958.
[252] M. Maxwell and M. Woodroofe. Central limit theorems for additive functionals of Markov
chains. Ann. Probab., 28(2):713–724, 2000.
[253] D. Q. Mayne. Optimal nonstationary estimation of the parameters of a linear system
with Gaussian inputs. J. Electron. Contr., 14:101, 1963.
[254] A. Medio. Invariant probability distributions in economic models: a general result. Macroeconomic Dynamics, 8(2):162–187, 2004. Available at http://ideas.repec.org/a/cup/macdyn/v8y2004i02p162-187_03.html.
[255] A. I. Mees, editor. Nonlinear Dynamics and Statistics. Birkhäuser, Boston, 2001. Selected
papers from the workshop held at Cambridge University, Cambridge, September 1998.
[256] K. L. Mengersen and R. L. Tweedie. Rates of convergence of the Hastings and Metropolis algorithms. Ann. Statist., 24:101–121, 1996.
[257] M. V. Men’šikov. Ergodicity and transience conditions for random walks in the positive
octant of space. Soviet Math. Dokl., 15:1118–1121, 1974.
[258] J.-F. Mertens, E. Samuel-Cahn, and S. Zamir. Necessary and sufficient conditions for
recurrence and transience of Markov chains, in terms of inequalities. J. Appl. Probab.,
15:848–851, 1978.
[259] S. P. Meyn. Ergodic theorems for discrete time stochastic systems using a stochastic
Lyapunov function. SIAM J. Control Optim., 27:1409–1439, 1989.
[260] S. P. Meyn. Stability of Markov chains on topological spaces with applications to adap-
tive control and time series analysis. In L. Gerencsér and P. E. Caines, editors, Top-
ics in Stochastic Systems: Modelling, Estimation and Adaptive Control, pages 369–401.
Springer-Verlag, New York, 1991.
[261] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes
with general state space. IEEE Trans. Automat. Control, 42(12):1663–1680, 1997.
[262] S. P. Meyn. Stability and optimization of queueing networks and their fluid models.
In Mathematics of Stochastic Manufacturing Systems (Williamsburg, VA, 1996), pages
175–199. American Mathematical Society, Providence, R.I., 1997.
[263] S. P. Meyn. Algorithms for optimization and stabilization of controlled Markov chains.
Sādhanā, 24(4-5):339–367, 1999.
[264] S. P. Meyn. Sequencing and routing in multiclass queueing networks I: feedback regula-
tion. SIAM J. Control Optim., 40(3):741–776, 2001.
[265] S. P. Meyn. Large deviation asymptotics and control variates for simulating large func-
tions. Ann. Appl. Probab., 16(1):310–339, 2006.
[266] S. P. Meyn. Myopic policies and MaxWeight policies for stochastic networks. In Proc.
of the 46th Conf. on Dec. and Control, pages 639–646, 2007.
[267] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press,
Cambridge, 2008.
[268] S. P. Meyn. Stability and asymptotic optimality of generalized MaxWeight policies. To
appear in SIAM J. Control Optim., 2008.
[269] S. P. Meyn and L. J. Brown. Model reference adaptive control of time varying and
stochastic systems. IEEE Trans. Automat. Control, 38:1738–1753, 1993.
[270] S. P. Meyn and P. E. Caines. A new approach to stochastic adaptive control. IEEE
Trans. Automat. Control, AC-32:220–226, 1987.
[271] S. P. Meyn and P. E. Caines. Stochastic controllability and stochastic Lyapunov functions
with applications to adaptive and nonlinear systems. In Stochastic Differential Systems.
Proc. 4th Bad Honnef Conference, pages 235–257. Springer-Verlag, Berlin, 1989.
[272] S. P. Meyn and P. E. Caines. Asymptotic behavior of stochastic systems processing
Markovian realizations. SIAM J. Control Optim., 29:535–561, 1991.
[273] S. P. Meyn and D. G. Down. Stability of generalized Jackson networks. Ann. Appl.
Probab., 4:124–148, 1994.
[274] S. P. Meyn and L. Guo. Stability, convergence, and performance of an adaptive control
algorithm applied to a randomly varying system. IEEE Trans. Automat. Control, AC-
37:535–540, 1992.
[275] S. P. Meyn and L. Guo. Geometric ergodicity of a doubly stochastic time series model.
J. Time Ser. Analysis, 14(1):93–108, 1993.
[276] S. P. Meyn, G. Hagen, G. Mathew, and A. Banaszuk. On complex spectra and metastability of Markov models. In Proc. of the 47th Conf. on Dec. and Control, 2008. More information at https://css.paperplaza.net/conferences/scripts/abstract.pl?ConfID=32&Number=1318.
[277] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes I: discrete time chains.
Adv. Appl. Probab., 24:542–574, 1992.
[278] S. P. Meyn and R. L. Tweedie. Generalized resolvents and Harris recurrence of Markov
processes. Contemporary Mathematics, 149:227–250, 1993.
[279] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes II: continuous time
processes and sampled chains. Adv. Appl. Probab., 25:487–517, 1993.
[280] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes III: Foster–Lyapunov
criteria for continuous time processes. Adv. Appl. Probab., 25:518–548, 1993.
[281] S. P. Meyn and R. L. Tweedie. The Doeblin decomposition. Contemporary Mathematics,
149:211–225, 1993.
[282] S. P. Meyn and R. L. Tweedie. Computable bounds for convergence rates of Markov
chains. Ann. Appl. Probab., 4:981–1011, 1994.
[283] S. P. Meyn and R. L. Tweedie. State-dependent criteria for convergence of Markov
chains. Ann. Appl. Probab., 4:149–168, 1994.
[284] H. D. Miller. Geometric ergodicity in a class of denumerable Markov chains. Z.
Wahrscheinlichkeitstheorie und Verw. Geb., 4:354–373, 1966.
[285] S. Mittnik. Nonlinear time series analysis with generalized autoregressions: a state space
approach. Working paper WP-91-06, State University of New York at Stony Brook, Stony
Brook, N.Y., 1991.
[286] A. Mokkadem. Critères de mélange pour des processus stationnaires. Estimation sous des hypothèses de mélange. Entropie de processus linéaires. PhD thesis, Université Paris Sud, Centre d’Orsay, 1987.
[287] P. A. P. Moran. The statistical analysis of the Canadian lynx cycle I: structure and
prediction. Aust. J. Zool., 1:163–173, 1953.
[288] P. A. P. Moran. The Theory of Storage. Methuen, London, 1959.
[289] M. D. Moustafa. Input-output Markov processes. Proc. Koninkl. Ned. Akad. Wetensch.,
A60:112–118, 1957.
[290] P. Mykland, L. Tierney, and B. Yu. Regeneration in Markov chain samplers. J. Amer.
Statist. Assoc., 90(429), 1995.
[291] E. Nelson. The adjoint Markoff process. Duke Math. J., 25:671–690, 1958.
[292] M. F. Neuts. Two Markov chains arising from examples of queues with state dependent service times. Sankhyā Ser. A, 29:259–264, 1967.
[293] M. F. Neuts. Markov chains with applications in queueing theory, which have a matrix-
geometric invariant probability vector. Adv. Appl. Probab., 10:185–212, 1978.
[294] Marcel F. Neuts. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic
Approach. Dover Publications, New York, 1994. Corrected reprint of the 1981 original.
[295] J. Neveu. Potentiel Markovien récurrent des chaînes de Harris. Ann. Inst. Fourier, Grenoble, 22:7–130, 1972.
[296] J. Neveu. Discrete-Parameter Martingales. North-Holland, Amsterdam, 1975.
[297] P. Ney and E. Nummelin. Markov additive processes I: eigenvalue properties and limit
theorems. Ann. Probab., 15(2):561–592, 1987.
[298] P. Ney and E. Nummelin. Markov additive processes II: large deviations. Ann. Probab.,
15(2):593–609, 1987.
[299] D. F. Nicholls and B. G. Quinn. Random Coefficient Autoregressive Models: An Intro-
duction. Springer-Verlag, New York, 1982.
[300] S. Niemi and E. Nummelin. Central limit theorems for Markov random walks. Com-
mentationes Physico-Mathematicae, 54, 1982.
[301] E. Nummelin. A splitting technique for Harris recurrent chains. Z. Wahrscheinlichkeit-
stheorie und Verw. Geb., 43:309–318, 1978.
[302] E. Nummelin. Uniform and ratio limit theorems for Markov renewal and semi-
regenerative processes on a general state space. Ann. Inst. Henri Poincaré Ser. B,
14:119–143, 1978.
[303] E. Nummelin. General Irreducible Markov Chains and Nonnegative Operators. Cam-
bridge University Press, Cambridge, 1984.
[304] E. Nummelin. On the Poisson equation in the potential theory of a single kernel. Math.
Scand., 68:59–82, 1991.
[305] E. Nummelin and P. Tuominen. Geometric ergodicity of Harris recurrent Markov chains
with applications to renewal theory. Stoch. Proc. Applns., 12:187–202, 1982.
[306] E. Nummelin and P. Tuominen. The rate of convergence in Orey’s theorem for Har-
ris recurrent Markov chains with applications to renewal theory. Stoch. Proc. Applns.,
15:295–311, 1983.
[307] E. Nummelin and R. L. Tweedie. Geometric ergodicity and R-positivity for general
Markov chains. Ann. Probab., 6:404–420, 1978.
[308] S. Orey. Recurrent Markov chains. Pacific J. Math., 9:805–827, 1959.
[309] S. Orey. Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand
Reinhold, London, 1971.
[310] D. Ornstein. Random walks I. Trans. Amer. Math. Soc., 138:1–43, 1969.
[311] D. Ornstein. Random walks II. Trans. Amer. Math. Soc., 138:45–60, 1969.
[312] A. Pakes and D. Pollard. Simulation and the asymptotics of optimization estimators.
Econometrica, 57:1027–1057, 1989.
[313] A. G. Pakes. Some conditions for ergodicity and recurrence of Markov chains. Operations
Res., 17:1048–1061, 1969.
[314] K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York,
1967.
[315] J. Petruccelli and S. W. Woolford. A threshold AR(1) model. J. Appl. Probab., 21:270–
286, 1984.
[316] R. G. Phillips and P. V. Kokotovic. A singular perturbation approach to modeling and control of Markov chains. IEEE Trans. Automat. Control, 26(5), 1981.
[317] J. W. Pitman. Uniform rates of convergence for Markov chain transition probabilities.
Z. Wahrscheinlichkeitstheorie und Verw. Geb., 29:193–227, 1974.
[318] D. B. Pollard and R. L. Tweedie. R-theory for Markov chains on a topological state
space I. J. London Math. Society, 10:389–400, 1975.
[319] D. B. Pollard and R. L. Tweedie. R-theory for Markov chains on a topological state
space II. Z. Wahrscheinlichkeitstheorie und Verw. Geb., 34:269–278, 1976.
[320] N. Popov. Conditions for geometric ergodicity of countable Markov chains. Soviet Math.
Dokl., 18:676–679, 1977.
[321] M. Pourahmadi. On stationarity of the solution of a doubly stochastic model. J. Time
Ser. Anal., 7:123–131, 1986.
[322] N. U. Prabhu. Queues and Inventories. John Wiley & Sons, New York, 1965.
[323] S. Rai, P. Glynn, and J. E. Glynn. Recurrence classification for a family of non-linear storage models. Submitted for publication, 2007.
[324] R. Rayadurgam, S. P. Meyn, and L. Brown. Bayesian adaptive control of time varying
systems. In Proc. of the 31st Conf. on Dec. and Control, Tucson, Ariz., 1992.
[325] S. I. Resnick. Extreme Values, Regular Variation and Point Processes. Springer-Verlag,
New York, 1987.
[326] D. Revuz. Markov Chains. North-Holland, Amsterdam, second edition, 1984.
[327] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, New
York, second edition, 2004.
[328] G. O. Roberts and N. Polson. A note on the geometric convergence of the Gibbs sampler.
J. Roy. Statist. Soc. Ser. B, 56:377–384, 1994.
[329] G. O. Roberts and J. S. Rosenthal. Geometric ergodicity and hybrid Markov chains. Electronic Comm. Probab., 2:13–25, 1997.
[330] G. O. Roberts and J. S. Rosenthal. Convergence of the slice sampler. J. Roy. Statist.
Soc. Ser. B, 61:643–660, 1999.
[331] G. O. Roberts and J. S. Rosenthal. The polar slice sampler. Stoch. Models, 18:257–280,
2002.
[332] G. O. Roberts and J. S. Rosenthal. General state space Markov chains and MCMC
algorithms. Probab. Surv., 1:20–71 (electronic), 2004.
[333] G. O. Roberts and A. F. M. Smith. Simple conditions for the convergence of the Gibbs
sampler and Hastings–Metropolis algorithms. Stoch. Proc. Applns., 49(2):207–216, 1994.
[334] G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions
and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
[335] G. O. Roberts and R. L. Tweedie. Geometric convergence and Central Limit Theorems
for multidimensional Hastings and Metropolis algorithms. Biometrika, 83:95–100, 1996.
[336] Z. Rosberg. A note on the ergodicity of Markov chains. J. Appl. Probab., 18:112–121,
1981.
[337] M. Rosenblatt. Equicontinuous Markov operators. Teor. Verojatnost. i Primenen., 9:205–222, 1964.
[338] M. Rosenblatt. Invariant and subinvariant measures of transition probability functions
acting on continuous functions. Z. Wahrscheinlichkeitstheorie und Verw. Geb., 25:209–
221, 1973.
[339] M. Rosenblatt. Recurrent points and transition functions acting on continuous functions.
Z. Wahrscheinlichkeitstheorie und Verw. Geb., 30:173–183, 1974.
[340] W. A. Rosenkrantz. Ergodicity conditions for two-dimensional Markov chains on the
positive quadrant. Prob. Theory and Related Fields, 83:309–319, 1989.
[341] J. S. Rosenthal. Rates of Convergence for Gibbs Sampler and Other Markov Chains.
PhD thesis, Harvard University, 1992.
[342] J. S. Rosenthal. Correction: “Minorization conditions and convergence rates for Markov
chain Monte Carlo”. J. Amer. Statist. Assoc., 90(431):1136, 1995.
[343] J. S. Rosenthal. Minorization conditions and convergence rates for Markov chain Monte
Carlo. J. Amer. Statist. Assoc., 90(430):558–566, 1995.
[344] J. S. Rosenthal. Quantitative convergence rates of Markov chains: a simple account.
Electron. Comm. Probab., 7:123–128 (electronic), 2002.
[345] W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, second edition, 1974.
[346] S. H. Saperstone. Semidynamical Systems in Infinite Dimensional Spaces. Springer-
Verlag, New York, 1981.
[347] P. J. Schweitzer. Perturbation theory and finite Markov chains. J. Appl. Probab., 5:401–403, 1968.
[348] E. Seneta. Non-Negative Matrices and Markov Chains. Springer, New York, second
edition, 1981.
[349] L. I. Sennott, P. A. Humblet, and R. L. Tweedie. Mean drifts and the non-ergodicity of
Markov chains. Operations Res., 31:783–789, 1983.
[350] J. G. Shanthikumar and D. D. Yao. Second-order properties of the throughput of a
closed queueing network. Math. Oper. Res., 13:524–533, 1988.
[351] T. Shardlow and A. M. Stuart. A perturbation theory for ergodic Markov chains and
application to numerical approximations. SIAM J. Numer. Anal., 37(4):1120–1137, 2000.
[352] M. Sharpe. General Theory of Markov Processes. Academic Press, New York, 1988.
[353] Z. Šidák. Classification of Markov chains with a general state space. In Trans. 4th Prague Conf. Inf. Theory Stat. Dec. Functions, Random Proc., pages 547–571. Academia, Prague, 1967.
[354] K. Sigman. The stability of open queueing networks. Stoch. Proc. Applns., 35:11–25,
1990.
[355] G. F. Simmons. Introduction to Topology and Modern Analysis. McGraw-Hill, New York, 1963.
[356] R. Sine. Convergence theorems for weakly almost periodic Markov operators. Israel J.
Math., 19:246–255, 1974.
[357] R. Sine. On local uniform mean convergence for Markov operators. Pacific J. Math.,
60:247–252, 1975.
[358] R. Sine. Sample path convergence of stable Markov processes II. Indiana University
Math. J., 25:23–43, 1976.
[359] A. F. M. Smith and A. E. Gelfand. Bayesian statistics without tears: a sampling-
resampling perspective. Amer. Statist., 46:84–88, 1992.
[360] A. F. M. Smith and G. O. Roberts. Bayesian computation via the Gibbs sampler and
related Markov chain Monte Carlo methods (with discussion). J. Roy. Statist. Soc. Ser.
B, 55:3–23, 1993.
[361] W. L. Smith. Asymptotic renewal theorems. Proc. Roy. Soc. Edinburgh (A), 64:9–48,
1954.
[362] W. L. Smith. Regenerative stochastic processes. Proc. Roy. Soc. London (A), 232:6–31,
1955.
[363] W. L. Smith. Remarks on the paper “Regenerative stochastic processes”. Proc. Roy.
Soc. London (A), 256:296–301, 1960.
[364] J. Snyders. Stationary probability distributions for linear time-invariant systems. SIAM
J. Control Optim., 15:428–437, 1977.
[365] V. Solo. Stochastic adaptive control and martingale limit theory. IEEE Trans. Automat.
Control, 35:66–70, 1990.
[366] F. M. Spieksma. Geometrically Ergodic Markov Chains and the Optimal Control of
Queues. PhD thesis, University of Leiden, 1991.
[367] F. M. Spieksma. Spectral conditions and bounds for the rate of convergence of countable
Markov chains. Technical report, University of Leiden, 1993.
[368] F. M. Spieksma and R. L. Tweedie. Strengthening ergodicity to geometric ergodicity for
Markov chains. Stochastic Models, 10:45–75, 1994.
[369] F. Spitzer. Principles of Random Walk. Van Nostrand, Princeton, N.J., 1964.
[370] D. Steinsaltz. Locally contractive iterated function systems. Ann. Probab., 27(4):1952–
1979, 1999.
[371] Ö. Stenflo. Uniqueness of invariant measures for place-dependent random iterations of
functions. In Fractals in Multimedia (Minneapolis, MN, 2001), volume 132 of IMA Vol.
Math. Appl., pages 13–32. Springer, New York, 2002.
[372] L. Stettner. On the existence and uniqueness of invariant measures for continuous time
Markov processes. Technical report LCDS 86-18, Brown University, Providence, R.I.,
1986.
[373] A. L. Stolyar. On the stability of multiclass queueing networks: a relaxed sufficient
condition via limiting fluid processes. Markov Process. Related Fields, 1(4):491–512,
1995.
[374] C. R. Stone. On absolutely continuous components and renewal theory. Ann. Math.
Statist., 37:271–275, 1966.
[375] O. Stramer and R. L. Tweedie. Langevin-type models I: diffusions with given stationary
distributions and their discretizations. Methodol. Comput. Appl. Probab., 1(3):283–306,
1999.
[376] O. Stramer and R. L. Tweedie. Langevin-type models II: self-targeting candidates for
MCMC algorithms. Methodol. Comput. Appl. Probab., 1(3):307–328, 1999.
[398] R. L. Tweedie. Criteria for classifying general Markov chains. Adv. Appl. Probab., 8:737–
771, 1976.
[399] R. L. Tweedie. Operator geometric stationary distributions for Markov chains with
applications to queueing models. Adv. Appl. Probab., 14:368–391, 1981.
[400] R. L. Tweedie. Criteria for rates of convergence of Markov chains with application
to queueing and storage theory. In J. F. C. Kingman and G. E. H. Reuter, editors,
Probability, Statistics and Analysis. Cambridge University Press, Cambridge, 1983.
[401] R. L. Tweedie. The existence of moments for stationary Markov chains. J. Appl. Probab.,
20:191–196, 1983.
[402] R. L. Tweedie. Invariant measures for Markov chains with no irreducibility assumptions.
J. Appl. Probab., 25A:275–285, 1988.
[403] D. Vere-Jones. Geometric ergodicity in denumerable Markov chains. Quart. J. Math.
Oxford (2nd Ser.), 13:7–28, 1962.
[404] D. Vere-Jones. A rate of convergence problem in the theory of queues. Theory Probab.
Appl., 9:96–103, 1964.
[405] D. Vere-Jones. Ergodic properties of nonnegative matrices. I. Pacific J. Math., 22:361–
386, 1967.
[406] D. Vere-Jones. Ergodic properties of nonnegative matrices II. Pacific J. Math., 26:601–
620, 1968.
[407] L. Wu and N. Yao. Large deviation principles for Markov processes via Φ-Sobolev
inequalities. Electron. Commun. Probab., 13:10–23, 2008.
[408] Liming Wu. Essential spectral radius for Markov semigroups I: discrete time case. Prob.
Theory Related Fields, 128(2):255–321, 2004.
[409] L. M. Wu. Large deviations for Markov processes under superboundedness. C. R. Acad. Sci. Paris Série I, 324:777–782, 1995.
[410] B. E. Ydstie. Bifurcations and complex dynamics in adaptive control systems. In Proc.
of the 25th Conf. on Dec. and Control, Athens, Greece, 1986.
[411] M. Yor and B. Bru. Comments on the life and mathematical legacy of Wolfgang Doeblin.
Finance and Stochastics, 6(1):3–47, 2002.
[412] K. Yosida and S. Kakutani. Operator-theoretical treatment of Markov’s process and
mean ergodic theorem. Ann. Math., 42:188–228, 1941.
Indexes

General index

Symbols
Absolute continuity, 75
τ_A := min{n ≥ 1 : Φ_n ∈ A}, 64
τ_A, First entry time to A, 14
τ_A(1) := τ_A, 64
τ_A(k) := min{n > τ_A(k − 1) : Φ_n ∈ A}, 64
θ^k, kth order shift operator, 63
ϕ, Irreducibility measure, 81
h_Y, Almost everywhere invariant function, 423
m_n(t), Interpolation of M_n(g), 447
q_j, Prob. of j arrivals in one service in M/G/1 queue, 58
s_j(f), Sum of f(Φ_i) between visits to atom, 428
s_n(t), Interpolation of S_n(g), 447
B(R), Borel σ-field, 553
B^+(X), Sets with ψ(A) > 0, 84
F^Φ_ζ := {A ∈ F : {ζ = n} ∩ A ∈ F^Φ_n, n ∈ Z_+}, 66
F^Φ_n := σ(Φ_0, . . . , Φ_n), 63
G^+(γ), Distributions with transform convergent in [0, γ], 399
M, Borel probability measures, 17
P̌(x_i, A), Split transition function, 99
ĝ, Solution to Poisson’s equation, 443
π̃{A}, 425
P^k(x, · ), Cesàro average of P^k, 288
Φ, Markov chain, 3, 61
Φ^m, m-skeleton chain, 62
Φ_a, Chain with transition function K_a, 115
Φ_a, Sampled chain, 116
vec(B), 415
P_x, Prob. conditional on Φ_0 = x, 14
X, State space, 50
B(X), Borel σ-field on X, 49
C(X), Bounded continuous functions, 557
C(X), Continuous bounded functions on X, 125
C_0(X), Continuous functions vanishing at ∞, 560
C_c(X), Continuous functions on X, compact support, 140
_A P^n(x, B) := P_x(Φ_n ∈ B, τ_A ≥ n), 67