The document presents 'Introduction to Probability and Statistics for Data Science,' a textbook designed to teach fundamental statistical concepts and methods to students in various fields. It emphasizes practical applications through examples and R code, covering modern topics like regression trees and bootstrapping while providing a solid theoretical foundation. The authors, Steven E. Rigdon, Ronald D. Fricker, Jr., and Douglas C. Montgomery, are esteemed professionals in their respective fields, contributing to the book's credibility and depth.


Cambridge University Press & Assessment

978-1-107-11304-6 — Introduction to Probability and Statistics for Data Science

Steven E. Rigdon, Ronald D. Fricker, Jr, Douglas C. Montgomery
Frontmatter
More Information

Introduction to Probability and Statistics for Data Science

Introduction to Probability and Statistics for Data Science provides a solid course in the fundamental
concepts, methods, and theory of statistics for students in statistics, data science, biostatistics,
engineering, and physical science programs. It teaches students to understand, use, and build on modern
statistical techniques for complex problems. The authors develop the methods from both an intuitive
and a mathematical angle, illustrating with simple examples how and why the methods work. More
complicated examples, many of which incorporate data and code in R, show how the methods are used
in practice. Through this guidance, students get the big picture of how statistics works and how it can be
applied. This text covers more modern topics, such as regression trees, large-scale hypothesis testing,
bootstrapping, MCMC, and time series, and fewer theoretical ones, such as the Cramér–Rao lower bound
and the Rao–Blackwell theorem. It features more than 250 high-quality figures, 180 of which involve
actual data. Data and R code are available on the book’s website so that students can reproduce the
examples and complete hands-on exercises.

Steven E. Rigdon is Professor of Biostatistics at Saint Louis University. He is a fellow of the
American Statistical Association and is the author of Statistical Methods for the Reliability of Repairable
Systems, Calculus, 8th and 9th editions, Monitoring the Health of Populations by Tracking Disease
Outbreaks (2020), and Design of Experiments for Reliability Achievement (2022). He has received the
Waldo Vizeau Award for technical contributions to quality, the Soren Bisgaard Award, and the Paul
Simon Award for linking teaching and research. He is also Distinguished Research Professor Emeritus
at Southern Illinois University Edwardsville.

Ronald D. Fricker, Jr. is Vice Provost for Faculty Affairs at Virginia Tech, where he has served as
head of the Department of Statistics, Senior Associate Dean in the College of Science, and, subsequently,
interim dean of the college. He is the author of Introduction to Statistical Methods for Biosurveillance
(2013) and, with Steve Rigdon, Monitoring the Health of Populations by Tracking Disease Outbreaks
(2020). He is a fellow of the American Statistical Association, a fellow of the American Association for
the Advancement of Science, and an elected member of the Virginia Academy of Science, Engineering,
and Medicine.

Douglas C. Montgomery is Regents’ Professor and ASU Foundation Professor of Engineering at
Arizona State University. He is an Honorary Member of the American Society for Quality, a fellow of the
American Statistical Association, a fellow of the Institute of Industrial and Systems Engineers, and a
fellow of the Royal Statistical Society. He is the author of 15 other books, including Design and Analysis
of Experiments, 10th edition (2020) and Design of Experiments for Reliability Achievement (2022). He
has received the Shewhart Medal, the Distinguished Service Medal, and the Brumbaugh Award from
the ASQ, the Deming Lecture Award from the ASA, the Greenfield Medal from the Royal Statistical
Society, and the George Box Medal from the European Network for Business and Industrial Statistics.

© in this web service Cambridge University Press & Assessment www.cambridge.org


“This book serves as an excellent resource for students with diverse backgrounds, offering a thorough exploration of
fundamental topics in statistics. The clear explanation of concepts, methods, and theory, coupled with an abundance
of practical examples, provides a solid foundation to help students understand statistical principles and bridge the
gap between theory and application. This book offers invaluable insights and guidance for anyone seeking to master
the principles of statistics. I highly recommend adopting this book for my future statistics class.”
Haijun Gong, Saint Louis University
“Professors Rigdon, Fricker and Montgomery have put together an impressive volume that covers not only basic
probability and basic statistics, but also includes extensions in a number of directions, all of which have immediate
relevance to the work of practitioners in quantitative fields. Suffused with common sense and insights about real
data and problems, it is both approachable and precise. I’m excited about the inclusion of material on power and
on multiple testing, both of which will help users become smarter about what their analyses can do, and I applaud
their omission of too much theory. I also appreciate their use of R and of real data. This would be an excellent text
for undergraduate or graduate-level data analysts.”
Sam Buttrey, Naval Postgraduate School (NPS)
“This is a comprehensive and rich book that extends foundational concepts in statistics and probability in easily
accessible form into data science as an integrated discipline. The reader applies and validates theoretical concepts
in R and connects results from R back to the theory across many methods: from descriptive statistics to Bayesian
models, time series, generalized linear models and more. Thoroughly enjoyable!”
Oliver Schabenberger, Virginia Tech Academy of Data Science


Introduction to Probability and Statistics for Data Science
with R

Steven E. Rigdon
Saint Louis University

Ronald D. Fricker, Jr.
Virginia Polytechnic Institute and State University

Douglas C. Montgomery
Arizona State University


Shaftesbury Road, Cambridge CB2 8EA, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314-321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi - 110025, India
103 Penang Road, #05–06/07, Visioncrest Commercial, Singapore 238467

Cambridge University Press is part of Cambridge University Press & Assessment,
a department of the University of Cambridge.
We share the University’s mission to contribute to society through the pursuit of
education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/highereducation/isbn/9781107113046
DOI: 10.1017/9781316286166
© Steven E. Rigdon, Ronald D. Fricker, Jr., and Douglas C. Montgomery 2025
This publication is in copyright. Subject to statutory exception and to the provisions
of relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press & Assessment.
When citing this work, please include a reference to the DOI 10.1017/9781316286166
First published 2025
Printed in Mexico by Litográfica Ingramex, S.A. de C.V.
A catalogue record for this publication is available from the British Library.
A Cataloging-in-Publication data record for this book is available from the Library of Congress.
ISBN 978-1-107-11304-6 Hardback
ISBN 978-1-009-56835-7 Paperback
Additional resources for this publication at www.cambridge.org/ProbStatsforDS.
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.


Steve Rigdon

To my wife Pat, who has always supported me and been at my side.

Ron Fricker

To my spouse, Christine: Tu ventus sub alis meis es. (You are the wind under my wings.)

And to my first statistics professor, Randy Spoeri: You introduced me to the subject and made it fun.

Doug Montgomery

To Cheryl, who has always supported and encouraged me. And to the memory of my first statistics
professor, Ray Myers, mentor, colleague, collaborator, and friend.


Contents

Preface xiii

1 Introduction 1
1.1 Data Science and Statistics 2
1.2 More on Statistics 3
1.2.1 Populations and Samples 4
1.2.2 Descriptive versus Inferential Statistics 5
1.3 An Introduction to R 6
1.4 Descriptive Statistics 7
1.4.1 Types of Data 8
1.4.2 Example Data: US Domestic Flights from 1987 to 2008 9
1.5 Cross-Sectional Data 9
1.5.1 Measures of Location 9
1.5.2 Measures of Variation 16
1.5.3 Measures of How Two Variables Co-vary 18
1.5.4 Other Summary Statistics 23
1.6 Tabular Summaries of Data 25
1.7 Chapter Summary 27
1.8 Problems 29

2 Data Visualization 31
2.1 Introduction 31
2.2 Traditional Statistical Graphics 32
2.2.1 Bar Charts 32
2.2.2 Pie Charts 35
2.2.3 Histograms 35
2.2.4 Lattice (or Trellis) Plots 38
2.2.5 Box Plots 40
2.2.6 Scatterplots 43
2.3 Graphics for Longitudinal Data 46
2.3.1 Time Series Plots 47
2.3.2 Repeated Cross-Sectional Plots 48
2.3.3 Autocorrelation Plots 50
2.4 Chapter Summary 50
2.5 Problems 52

3 Basic Probability 54
3.1 Introduction 54
3.2 Events and Sample Spaces 54
3.2.1 Probability Axioms 56
3.2.2 Union of Events 57
3.2.3 Intersection of Independent Events 59
3.2.4 Complementary Events 60
3.2.5 Conditional Probability 62
3.3 Calculating Probabilities 63
3.3.1 Sample Point Method 64
3.3.2 Counting Sample Points 66
3.3.3 Combining Events 70
3.4 Bringing It All Together 71
3.4.1 Law of Total Probability 71
3.4.2 Bayes’ Theorem 74
3.5 Chapter Summary 77
3.6 Problems 78

4 Random Variables 82
4.1 Introduction 82
4.2 Discrete Random Variables 82
4.2.1 Probability Mass Function 82
4.2.2 Cumulative Distribution Function 86
4.2.3 Expected Value 88
4.2.4 Variance and Standard Deviation 91
4.3 Continuous Random Variables 93
4.3.1 Probability Density Function 93
4.3.2 Cumulative Distribution Function 97
4.3.3 Expected Value 101
4.3.4 Variance and Standard Deviation 101
4.4 Expected Value and Variance Properties 102
4.5 Joint Distributions for Discrete Random Variables 105
4.6 Conditional Distributions for Discrete Random Variables 111
4.7 Joint Distributions for Continuous Random Variables 113
4.8 Conditional Distributions for Continuous Random Variables 121
4.9 Conditioning on a Random Variable 123
4.10 Chapter Summary 126
4.11 Problems 127

5 Discrete Distributions 132
5.1 Introduction 132
5.2 Binomial Distribution 132
5.3 Geometric Distribution 141
5.4 Negative Binomial 146
5.5 Hypergeometric Distribution 149
5.6 Poisson Distribution 154
5.7 Multinomial Distribution 160
5.8 Chapter Summary 164
5.9 Problems 166

6 Continuous Distributions 170
6.1 Introduction 170
6.2 Uniform Distribution 170
6.3 Exponential Distribution 173
6.4 Normal Distribution 180
6.4.1 Standardizing 183
6.4.2 Bivariate and Multivariate Normal Distributions 186
6.5 Gamma and Weibull Distributions 189
6.5.1 Gamma Distribution 189
6.5.2 Weibull Distribution 192
6.6 Distributions Related to the Normal 196
6.6.1 Chi-square (χ²) Distribution 196
6.6.2 t Distribution 198
6.6.3 F Distribution 201
6.7 Beta Distribution 202
6.8 Transformations 205
6.8.1 Simulating from Distributions 207
6.9 Moment Generating Functions 209
6.10 Quantile–Quantile Plots 215
6.11 Chapter Summary 220
6.12 Problems 221

7 About Data and Data Collection 226
7.1 Introduction 226
7.2 Data and the Scientific Method 228
7.3 Experimental vs. Observational Data 231
7.3.1 Convenience vs. Probability Sampling 234
7.4 Accuracy vs. Precision 234
7.5 Types of Random Samples 236
7.5.1 Sources of Bias 237
7.6 Types of Error 239
7.7 Historical Gaffes in Data Collection 240
7.8 Chapter Summary 241
7.9 Problems 242

8 Sampling Distributions 244
8.1 Introduction 244
8.2 Linear Combinations of Random Variables 244
8.3 Sampling Distributions for Sums and Means 249
8.4 Sampling Distribution for the Sample Variance 253
8.5 The Central Limit Theorem 255
8.6 Normal Approximation to the Binomial 259
8.7 Sampling Distributions for Proportions 263
8.8 Tchebysheff’s Theorem and the Law of Large Numbers 265
8.9 Chapter Summary 268
8.10 Problems 269

9 Point Estimation 273
9.1 Introduction and Intuitive Estimators 273
9.2 Estimation Criteria 275
9.2.1 Unbiased Estimators 275
9.2.2 Consistent Estimators 277
9.3 Method of Moments 279
9.4 Maximum Likelihood 283
9.5 Approximating MLEs 289
9.6 Sufficiency 294
9.7 Chapter Summary 298
9.8 Problems 299

10 Confidence Intervals 302
10.1 Introduction 302
10.2 Basic Properties 303
10.3 Large Sample Confidence Intervals 307
10.4 Small Sample Confidence Intervals 312
10.5 Confidence Intervals for Differences 316
10.5.1 Confidence Intervals for Differences of Proportions 317
10.5.2 Confidence Intervals for Differences in Means 319
10.5.3 Confidence Interval for Paired Data 324
10.6 Determining the Sample Size 326
10.7 Confidence Intervals from Complex Survey Data 330
10.7.1 Sampling from a Finite Population 330
10.7.2 Stratified Random Samples 333
10.7.3 Cluster Sampling 338
10.7.4 Secondary Data Sources 339
10.7.5 Software for Analyzing Data from Complex Surveys 341
10.8 Chapter Summary 343
10.9 Problems 344

11 Hypothesis Testing 348
11.1 Introduction 348
11.2 Elements of a Statistical Test 350
11.3 Power 354
11.4 P-values 356
11.5 Testing the Mean: Variance Known 357
11.5.1 Hypothesis Tests for the Mean from a Population with Known Variance 357
11.5.2 Power 360
11.6 Testing the Mean: Variance Unknown 364
11.7 Testing a Proportion 370
11.8 Testing the Variance 373
11.9 Likelihood Ratio Tests 374
11.10 Chapter Summary 380
11.11 Problems 381

12 Hypothesis Tests for Two or More Populations 386
12.1 Introduction 386
12.2 Testing Two Independent Samples 386
12.2.1 Comparing Two Means 386
12.2.2 Comparing Two Proportions 398
12.2.3 Comparing Variances 402
12.3 Testing Paired Samples 404
12.4 Single-Factor Analysis of Variance 409
12.5 Two-Factor ANOVA 425
12.6 Other Designs for Experiments 430
12.6.1 Two-Level Factorial Designs 431
12.6.2 Fractional Factorial Designs 436
12.6.3 Block Designs 439
12.6.4 Some Experimental Design Principles 442
12.7 Power 444
12.7.1 Power for Two-Sample t-Test 445
12.7.2 Power for One-Factor ANOVA 448
12.8 Chapter Summary 451
12.9 Problems 453

13 Hypothesis Tests for Categorical Data 459
13.1 Introduction 459
13.2 Goodness-of-Fit Tests 460
13.3 Contingency Tables: Testing Independence 465
13.4 Contingency Tables: Homogeneity 471
13.5 Fisher’s Exact Test 473
13.6 The Continuity Correction and Simulation 479
13.7 McNemar’s Test 481
13.8 Higher-Dimensional Tables and Simpson’s Paradox 485
13.9 Chapter Summary 488
13.10 Problems 489

14 Regression 493
14.1 Introduction 493
14.1.1 Prediction vs. Explanation 493
14.1.2 Terminology 495
14.1.3 A Working Example 495
14.2 Simple Linear Regression 496
14.3 Properties of the Least Squares Estimators 503
14.4 Inference for Parameters of the Simple Linear Regression Model 510
14.5 Matrix Formulation of Simple Linear Regression 516
14.6 Joint Confidence Regions 518
14.7 Confidence and Prediction Intervals for Responses 520
14.8 Optimal Selection of Levels of Predictor Variables 526
14.9 The ANOVA Table for Simple Linear Regression 528
14.10 Linear Models in More than One Predictor 531
14.11 Indicator Variables 539
14.12 Polynomial and Nonlinear Regression 542
14.13 Inference for a Linear Combination of Model Parameters 546
14.14 Correlation 551
14.15 R² and Adjusted R² 561
14.16 Model Checking 565
14.16.1 Normal Probability Plots 566
14.16.2 Plot of Residuals against the Fitted Values 567
14.17 Chapter Summary 568
14.18 Problems 570

15 Bayesian Methods 574
15.1 Introduction 574
15.2 Bayes’ Theorem 575
15.3 The Bayesian Paradigm 581
15.4 Two Paradoxes 585
15.5 Conjugate Priors 588
15.6 Noninformative Priors 596
15.7 Simulation Methods 599
15.7.1 Metropolis–Hastings Algorithm 601
15.7.2 The Gibbs Sampling Algorithm 604
15.8 Hierarchical Bayes Models 606
15.9 Chapter Summary 613
15.10 Problems 614

16 Time Series Methods 618
16.1 Introduction 618
16.2 Using R for Time Series 622
16.3 Numerical Description of Time Series 623
16.4 Exponential Smoothing Methods 628
16.5 Autoregressive Integrated Moving Average (ARIMA) Models 635
16.6 Chapter Summary 640
16.7 Problems 642

17 Estimating the Standard Error: Analytic Approximations, the Jackknife, and the Bootstrap 645
17.1 Introduction 645
17.2 Analytic Approximations to the Standard Error of an Estimator 646
17.3 The Jackknife 656
17.4 The Bootstrap 662
17.4.1 Bootstrap Confidence Intervals Based on Percentiles 667
17.5 Parametric Bootstrap 670
17.6 Bootstrapping in R 674
17.7 Chapter Summary 681
17.8 Problems 682

18 Generalized Linear Models and Regression Trees 684
18.1 Logistic Regression 684
18.2 Multinomial Logistic Regression 698
18.3 Poisson Regression 703
18.4 Generalized Linear Models 706
18.5 Regression Trees 707
18.6 Discrimination and Classification 714
18.6.1 K = 2 Groups and p = 1 Variable 718
18.6.2 K Groups and p = 1 Variable 724
18.6.3 K Groups and p Variables 725
18.6.4 Quadratic Discriminant Analysis 737
18.6.5 Dealing with Estimated Parameters 739
18.6.6 Choosing Between Linear and Quadratic Discriminant Analysis 739
18.7 Logistic Regression for Classification 740
18.8 Chapter Summary 745
18.9 Problems 746

19 Cross-Validation and Estimates of Prediction Error 751
19.1 Overfitting and Underfitting 751
19.2 Cross-Validation 754
19.2.1 Splitting the Data at Random 755
19.3 Leave-One-Out Cross-Validation 760
19.4 k-Fold Cross-Validation 762
19.5 Cross-Validation with Classification Data 765
19.6 Chapter Summary 767
19.7 Problems 769

20 Large-Scale Hypothesis Testing 773
20.1 Review of Hypothesis Testing 773
20.2 Testing Multiple Hypotheses 775
20.3 The FWER and the Bonferroni Correction 780
20.4 Holm’s Method 783
20.5 The False Discovery Rate 785
20.6 Simultaneous Confidence Intervals 789
20.7 Tukey’s Method 791
20.8 Scheffé’s Method 794
20.9 Chapter Summary 801
20.10 Problems 802

References 805

Index 809

Preface

This book is designed for students in statistics, data science, biostatistics, engineering, and mathematics
programs who need a solid course in the fundamental concepts, methods, and theory of statistics.
Our goal is to give students enough background in the methods and theory of statistics that they can
understand modern techniques used in statistics and be able to apply them in the practice of data science.
We had to make some difficult choices regarding topic coverage. We do cover the important
concepts of statistics, including maximum likelihood, the information matrix, power, etc., because
these are needed for a student to be a successful statistician. When we cover maximum likelihood
estimation, we specifically cover the method of approximating the maximum of the (log) likelihood
function. Nowadays, data are so plentiful that we are often faced with testing multiple null hypotheses.
Holm’s method and the Benjamini–Hochberg method are derived and applied to real problems. There
are a number of statistical methods that were developed in the late twentieth and early twenty-
first centuries, including regression trees, large-scale hypothesis testing, methods of cross-validation,
the bootstrap, Markov chain Monte Carlo, and others. We address the optimal selection of levels
of a predictor variable to maximize the information we obtain; this leads to an introduction to
the topic of optimal design. With some exceptions, these techniques have not found their way
into introductory textbooks, especially those that emphasize theory. Throughout, we have tried to
include topics that a statistician would use in the practice of statistics and to cover these thoroughly.
We don’t develop every aspect of statistical theory; for example, we cover very little of the limit
theorems in statistics (convergence in probability, convergence in distribution, almost sure convergence,
Slutsky’s theorem, etc.). We don’t cover the Cramér–Rao lower bound or the Rao–Blackwell theorem.
We cover joint continuous distributions using multiple integration, but we do not go into great
depth.
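To make the Holm and Benjamini–Hochberg procedures mentioned above concrete, here is a minimal R sketch. It is our illustration, not the book's derivation: it applies each decision rule directly to sorted p-values and cross-checks the result against base R's p.adjust(). The five p-values are made-up numbers, not results from any data set, and the helper names holm_reject() and bh_reject() are hypothetical.

```r
# Hypothetical p-values from m = 5 tests (illustrative only)
p <- c(0.001, 0.008, 0.029, 0.041, 0.27)
alpha <- 0.05

# Holm (step-down): compare the k-th smallest p-value with alpha / (m - k + 1);
# reject while the comparison succeeds and stop at the first failure.
holm_reject <- function(p, alpha) {
  m <- length(p)
  ord <- order(p)
  reject <- logical(m)                   # all FALSE to start
  for (k in seq_len(m)) {
    if (p[ord[k]] <= alpha / (m - k + 1)) {
      reject[ord[k]] <- TRUE
    } else {
      break
    }
  }
  reject
}

# Benjamini-Hochberg (step-up): find the largest k with p_(k) <= k * alpha / m
# and reject the hypotheses with the k smallest p-values.
bh_reject <- function(p, alpha) {
  m <- length(p)
  ord <- order(p)
  ok <- which(p[ord] <= seq_len(m) * alpha / m)
  reject <- logical(m)
  if (length(ok) > 0) reject[ord[seq_len(max(ok))]] <- TRUE
  reject
}

holm_reject(p, alpha)
bh_reject(p, alpha)

# Cross-check: base R's adjusted p-values give the same decisions.
p.adjust(p, method = "holm") <= alpha
p.adjust(p, method = "BH") <= alpha
```

With these p-values, Holm's method, which controls the familywise error rate, rejects two hypotheses. Benjamini–Hochberg, which controls the false discovery rate, rejects three, because its thresholds grow with k.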
The emphasis is on modern methods of statistical inference. We develop enough theory so that
students will understand these methods. If a statistician or data scientist is to work effectively with
practitioners, it is the statistician who must explain how the methods work, what assumptions
underlie them, what their limitations are, and how (or whether) the assumptions can be checked.
Subject matter experts (i.e., the nonstatisticians) are not trained to do this. This is why it is important for
students of statistics to understand the underlying theory behind the methods.
The flip side of our approach is that we do not develop theory for theory’s sake. No theory is
developed for the purpose that it might be usable in a future course. We have found that students
who understand probability and the foundational concepts of statistical theory can understand and use
advanced statistical methods. Without a solid grounding in the theory and concepts of statistics, it is
difficult to pick up new methods.
Calculus is used in a number of places in the book, so students will need at least one or two semesters
of calculus. There are a few uses of multiple integrals when we discuss joint continuous distributions,
and for these the third semester of calculus will be needed. An instructor can skip these topics or sidestep
the use of multiple integrals. We use calculus when it is necessary, for example in getting expected values
of continuous random variables. We use R throughout the book. Although we do cover an introduction
to R, it would be helpful if students had some prior background in R.
We use data extensively throughout the book. Most of the data sets are real (although at times we
give small data sets to introduce a method). Many of these data sets are large. In most cases, we have
provided a csv (comma-separated values) file for the data. We also provide the R code used in the book
to analyze the data sets that we provide. This can be found at: www.cambridge.org/ProbStatsforDS
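Because most of the data come as csv files, here is a small sketch of loading one with base R's read.csv(). To keep it self-contained, it uses R's temporary directory in place of your own data folder, and the file name example.csv and its values are hypothetical, not one of the book's data sets. It also uses setwd(), discussed below, to show that a bare file name is resolved against the working directory.

```r
# R's temporary directory stands in for your own data folder.
data_dir <- tempdir()
csv_path <- file.path(data_dir, "example.csv")  # file.path() joins parts with "/"

# Illustrative values only, written to disk so the example runs anywhere.
toy <- data.frame(carrier = c("AA", "UA", "DL"),
                  delay   = c(12.5, -3.0, 7.25))
write.csv(toy, csv_path, row.names = FALSE)

# Option 1: read the file using its full path.
df_full <- read.csv(csv_path)

# Option 2: set the working directory, then use only the file name.
old_wd <- setwd(data_dir)     # setwd() invisibly returns the previous directory
df_rel <- read.csv("example.csv")
setwd(old_wd)                 # restore the original working directory

str(df_full)
```

Both options produce the same data frame. Setting the working directory once simply saves typing the full path on every read or write.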
While the book’s website contains information about getting R up and running, we offer the
following advice about loading in data sets and packages. First, it is always good practice to set the
working directory to the directory on your computer that contains your data files. You can do this with
the setwd() command. For example,

setwd("C:/Users/Documents/Rfiles")

will make R read files from, and write files to, this directory whenever you give a file name without
a full path. Note two things: (1) the path must be enclosed in quotes, and (2) directories are separated
by forward slashes, not single backslashes (in an R string a single backslash starts an escape sequence;
doubled backslashes, as in "C:\\Users\\Documents\\Rfiles", also work on Windows). Second, many of
the methods we apply in this book require special R packages to run. These packages are collections of
functions, data frames, etc. Before you can use a package you must (1) install it, and (2) load it during
each R session. To install a package, such as dplyr, type

install.packages("dplyr")

Then, every time you start a new R session, you will have to load this package using

library(dplyr)

You need only install a package once on your computer, but you must call library() each time you
begin an R session.
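The install-once, load-every-session pattern can be wrapped in a small helper. In this sketch, use_package() is a hypothetical name that is not part of base R or any package. It relies on requireNamespace(), which returns FALSE instead of raising an error when a package is missing.

```r
# use_package() is a hypothetical helper, not part of base R or any package.
use_package <- function(pkg) {
  # requireNamespace() returns FALSE (rather than an error) if pkg is missing.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # one-time step per computer; needs a CRAN mirror
  }
  # library() is still required in every new R session.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "splines" ships with R, so this call downloads nothing.
use_package("splines")
"package:splines" %in% search()   # the package is now attached
```

For the packages used in this book, you would write, for example, use_package("dplyr") near the top of a script. The install step runs only the first time the script runs on a new computer.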
If you call library() for a package you haven’t installed, you will get an error. For example, if you
haven’t installed the testassay package and if you type library(testassay), then you will get an
error like this:

Error in library(testassay) : there is no package called ‘testassay’

The remedy is to first install the package by typing install.packages("testassay") and then typing
library(testassay). If you ever get an error like the following

Error in arrange(df, y) : could not find function "arrange"

there is a good chance you forgot to load the package that contains the function arrange(), which is in
the dplyr package. The remedy is to first type library(dplyr).
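The "could not find function" error can be reproduced without installing anything. This sketch uses toTitleCase() from the tools package, which ships with R but is not attached at startup, as a stand-in for arrange() from dplyr. The input strings are arbitrary.

```r
# toTitleCase() lives in "tools", which ships with R but is not attached
# at startup, so it plays the role of arrange() from dplyr here.
exists("toTitleCase")   # normally FALSE in a fresh session

# Remedy 1: call the function through its namespace with "::",
# without attaching the package.
tools::toTitleCase("statistics for data science")

# Remedy 2: attach the package for the rest of the session.
library(tools)
exists("toTitleCase")   # TRUE once the package is attached
toTitleCase("statistics for data science")
```

The same two remedies apply to the arrange() error: either run library(dplyr) first, or call dplyr::arrange(df, y) directly.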

Most two-semester courses will include a fairly standard first semester, which would likely cover
the following chapters:

Semester 1
Chapter Topic
1 Introduction
2 Data Visualization
3 Basic Probability
4 Random Variables
5 Discrete Distributions
6 Continuous Distributions
7 About Data and Data Collection
8 Sampling Distributions
9 Point Estimation
10 Confidence Intervals
11 Hypothesis Testing
12 Hypothesis Tests for Two or More Populations

The choice of topics for a second course would depend on the nature of the course. For example, our
book could be used in a mathematical statistics course that emphasizes applications of statistics without
sacrificing any of the underlying theory. Such a course could use the following material in the second
term:

Semester 2
Chapter Topic
13 Hypothesis Tests for Categorical Data
14 Regression
15 Bayesian Methods
17 The Jackknife and Bootstrap
18 Generalized Linear Models and Regression Trees
20 Large-Scale Hypothesis Testing

For a course that leans toward data science, the second semester coverage might include:

Semester 2
Chapter Topic
13 Hypothesis Tests for Categorical Data
14 Regression
16 Time Series Methods
17 The Jackknife and Bootstrap
18 Generalized Linear Models and Regression Trees
19 Cross-Validation and Estimates of Prediction Error
20 Large-Scale Hypothesis Testing

A course for scientists or engineers could include selected topics in the above chapters, with
additional methods from Chapter 15. For example, a course in biostatistics might emphasize the sections
on logistic regression, discrimination, and classification since these are frequently used in medical and
public health research. Such a course could minimize or skip material on regression trees. Instructors
could also use this as a textbook for a one-semester course by selecting (and omitting) material in the
early part of the book. For example, the following chapters could be covered in a one-semester course:
One-semester course emphasizing statistics
Chapter Topic
1 Introduction
2 Data Visualization (omitting data visualization for survey data, geospatial
data, and network data)
3 Basic Probability
4 Random Variables
5 Discrete Distributions (possibly omitting the hypergeometric and
multinomial distributions)
6 Continuous Distributions (possibly skipping the Weibull and beta distributions
and the sections on transformations, moment-generating functions, and QQ
plots)
7 About Data and Data Collection (hitting just the main ideas)
8 Sampling Distributions (skipping the proof of the Central Limit Theorem)
9 Point Estimation
10 Confidence Intervals
11 Hypothesis Testing
12 Hypothesis Tests for Two or More Populations
13 or 14 Hypothesis Tests for Categorical Data/Regression

For situations where students have had a prior course on statistics (possibly one that did not use
calculus), a course could be designed to emphasize data science:
One-semester course emphasizing data science
Chapter Topic
4–6 Select topics in these chapters to bring students up to speed
7 About Data and Data Collection (hitting just the main ideas)
8 Sampling Distributions (skipping the proof of the Central Limit Theorem)
9 Point Estimation
10 Confidence Intervals
11 Hypothesis Testing
12 Hypothesis Tests for Two or More Populations
13 Hypothesis Tests for Categorical Data
14 Regression
17 The Jackknife and Bootstrap
18 Generalized Linear Models and Regression Trees
20 Large-Scale Hypothesis Testing

This book was typeset in LaTeX using a modified version of The Legrand Orange Book template
originally created by Mathias Legrand and modified by Vel and the authors.
We would like to thank Emily Rigdon for LaTeXing much of the material in the book and Gary Smith
for his careful reading and editing of the manuscript. We would also like to thank the staff at Cambridge,
especially Lauren Cowles, Maggie Jeffers, and Lucy Edwards for their help in molding this book into
what it has become, and for their patience through the process.
Steven E. Rigdon
Ronald D. Fricker, Jr.
Douglas C. Montgomery

100% (7)
Full download (Ebook) Introductory Statistics for Data Analysis by Warren J. Ewens, Katherine Brumberg ISBN 9783031281884, 3031281888 pdf docx
81 pages
Introduction to probability and statistics Second Edition, Revised And Expanded Edition Giri pdf download
No ratings yet
Introduction to probability and statistics Second Edition, Revised And Expanded Edition Giri pdf download
84 pages
Practice of Statistics in the Life Sciences Brigitte Baldi pdf download
100% (4)
Practice of Statistics in the Life Sciences Brigitte Baldi pdf download
78 pages
Introduction to probability and statistics Second Edition, Revised And Expanded Edition Giri download
No ratings yet
Introduction to probability and statistics Second Edition, Revised And Expanded Edition Giri download
79 pages
Probability & Statistics for Engineers & Scientists, 9th Edition Ronald E. Walpole pdf download
100% (1)
Probability & Statistics for Engineers & Scientists, 9th Edition Ronald E. Walpole pdf download
62 pages
Introduction to Probability Models 9th Edition by Sheldon Ross 0125980620 9780125980623 download
100% (2)
Introduction to Probability Models 9th Edition by Sheldon Ross 0125980620 9780125980623 download
45 pages
Theory of stochastic objects: probability, stochastic processes, and inference First Edition Micheas - Discover the ebook with all chapters in just a few seconds
100% (3)
Theory of stochastic objects: probability, stochastic processes, and inference First Edition Micheas - Discover the ebook with all chapters in just a few seconds
70 pages
Solutions Manual to accompany Miller & Freund’s Probability and Statistics for Engineers 8th edition 0321640772 - Download Instantly To Experience The Full Content
100% (7)
Solutions Manual to accompany Miller & Freund’s Probability and Statistics for Engineers 8th edition 0321640772 - Download Instantly To Experience The Full Content
51 pages
(eBook PDF) Statistical Reasoning for Everyday Life 4th Edition by Jeffrey O. Bennett instant download
100% (2)
(eBook PDF) Statistical Reasoning for Everyday Life 4th Edition by Jeffrey O. Bennett instant download
27 pages
Probability And Statistics For Stem 2nd Edition Emmanuel N Barron instant download
No ratings yet
Probability And Statistics For Stem 2nd Edition Emmanuel N Barron instant download
86 pages
1probability and Statistics For Pre-Engineers Course Outline
No ratings yet
1probability and Statistics For Pre-Engineers Course Outline
3 pages
Sample Intro Statistics Intuitive Guide
50% (2)
Sample Intro Statistics Intuitive Guide
25 pages
Untitled
No ratings yet
Untitled
177 pages
Stat Lesson1
No ratings yet
Stat Lesson1
10 pages
Complete Theory of Stochastic Objects Probability Stochastic Processes and Inference 1st Edition Micheas PDF For All Chapters
100% (3)
Complete Theory of Stochastic Objects Probability Stochastic Processes and Inference 1st Edition Micheas PDF For All Chapters
62 pages
Introduction To Probability And Statistics Second Edition Revised And Expanded Giri download
No ratings yet
Introduction To Probability And Statistics Second Edition Revised And Expanded Giri download
90 pages
Q3 Statistics and Probability Week 1
No ratings yet
Q3 Statistics and Probability Week 1
19 pages
Statistical Methods in Medical Research New Edition PDF
No ratings yet
Statistical Methods in Medical Research New Edition PDF
15 pages
Full Download Probability and Statistics by Example Volume 1 Basic Probability and Statistics 2nd Revised edition Edition Yuri Suhov PDF DOCX
100% (2)
Full Download Probability and Statistics by Example Volume 1 Basic Probability and Statistics 2nd Revised edition Edition Yuri Suhov PDF DOCX
67 pages
Data Analysis A Bayesian Tutorial 2nd Edition Devinderjit Sivia - The complete ebook set is ready for download today
50% (2)
Data Analysis A Bayesian Tutorial 2nd Edition Devinderjit Sivia - The complete ebook set is ready for download today
61 pages
2018 Book ProbabilityAndStatisticsForCom
No ratings yet
2018 Book ProbabilityAndStatisticsForCom
374 pages
David Williams - Weighing The Odds A Course in Probability and Statistics
100% (1)
David Williams - Weighing The Odds A Course in Probability and Statistics
567 pages
Text Book
No ratings yet
Text Book
2 pages
Instructor s Solutions Manual for Probability and Statistics with R for Engineers and Scientists 1st edition Michael Akritas download pdf
100% (3)
Instructor s Solutions Manual for Probability and Statistics with R for Engineers and Scientists 1st edition Michael Akritas download pdf
79 pages
(eBook PDF) Probability and Statistics for Economists - Download the ebook now to never miss important content
100% (1)
(eBook PDF) Probability and Statistics for Economists - Download the ebook now to never miss important content
44 pages
DLMDSAS01 - Advanced Statistics.
100% (1)
DLMDSAS01 - Advanced Statistics.
248 pages
Instructor s Solutions Manual for Probability and Statistics with R for Engineers and Scientists 1st edition Michael Akritas instant download
100% (1)
Instructor s Solutions Manual for Probability and Statistics with R for Engineers and Scientists 1st edition Michael Akritas instant download
36 pages
q3 Stat Prob Week 1 7
No ratings yet
q3 Stat Prob Week 1 7
95 pages
S1B 16 All Lectures
No ratings yet
S1B 16 All Lectures
221 pages
(eBook PDF) Statistical Reasoning for Everyday Life 4th Edition by Jeffrey O. Bennett download
100% (1)
(eBook PDF) Statistical Reasoning for Everyday Life 4th Edition by Jeffrey O. Bennett download
41 pages
BookonProbailityRandomProcessesandStatisticalAnalysis
No ratings yet
BookonProbailityRandomProcessesandStatisticalAnalysis
3 pages
(Ebook) Practice of Statistics in the Life Sciences by Brigitte Baldi, David S. Moore ISBN 9781319067496, 1319067492 pdf download
No ratings yet
(Ebook) Practice of Statistics in the Life Sciences by Brigitte Baldi, David S. Moore ISBN 9781319067496, 1319067492 pdf download
54 pages
David Forsyth - Probability and Statistics For Computer Science (2018, Springer)
No ratings yet
David Forsyth - Probability and Statistics For Computer Science (2018, Springer)
368 pages
Data Modeling For The Sciences Applications Basics Computations 1st Edition Steve Press pdf download
No ratings yet
Data Modeling For The Sciences Applications Basics Computations 1st Edition Steve Press pdf download
80 pages
Probability and Statistics by Example Volume 1 Basic Probability and Statistics 2nd Revised edition Edition Yuri Suhov all chapter instant download
100% (3)
Probability and Statistics by Example Volume 1 Basic Probability and Statistics 2nd Revised edition Edition Yuri Suhov all chapter instant download
67 pages
Understanding Statistics As A Language
From Everand
Understanding Statistics As A Language
Robert Andrews
No ratings yet
Big Data, Little Data, No Data: Scholarship in the Networked World
From Everand
Big Data, Little Data, No Data: Scholarship in the Networked World
Christine L. Borgman
2.5/5 (2)
Dokumen.pub an Introduction to Symbolic Logic
No ratings yet
Dokumen.pub an Introduction to Symbolic Logic
298 pages
13sec2_5
No ratings yet
13sec2_5
8 pages
apc_gauss_law_recitations_key
No ratings yet
apc_gauss_law_recitations_key
12 pages
1 (1)
No ratings yet
1 (1)
13 pages
Dinamica
No ratings yet
Dinamica
7 pages
Introduction to Probability Second Edition Joseph K Blitzstein Jessica Hwang pdf download
100% (3)
Introduction to Probability Second Edition Joseph K Blitzstein Jessica Hwang pdf download
26 pages
MBA - Syllabus For Specialisation Courses
No ratings yet
MBA - Syllabus For Specialisation Courses
79 pages
Boolean Algebra and Logic Simplification
No ratings yet
Boolean Algebra and Logic Simplification
11 pages
Major Problems of the Indian Election System
No ratings yet
Major Problems of the Indian Election System
3 pages
Tips in Making MCQ
No ratings yet
Tips in Making MCQ
28 pages
Jurnal Alkadimat Posko13 - (BISMILLAH)
No ratings yet
Jurnal Alkadimat Posko13 - (BISMILLAH)
10 pages
2 General Thesis Data of Factchecks On 2023 Eleection in Nigeria
No ratings yet
2 General Thesis Data of Factchecks On 2023 Eleection in Nigeria
18 pages
Fabula Chronicles S0C06 Nel Vortice Dei Ricordi EN
No ratings yet
Fabula Chronicles S0C06 Nel Vortice Dei Ricordi EN
10 pages
CBSE-Class-10-Maths-Chapter-9-Some-Applications-of-Trigonometry-Revision-Notes
No ratings yet
CBSE-Class-10-Maths-Chapter-9-Some-Applications-of-Trigonometry-Revision-Notes
5 pages
ABAKADA V Ermita
No ratings yet
ABAKADA V Ermita
6 pages
Pressure Vessel Hand Book 10th Edition
No ratings yet
Pressure Vessel Hand Book 10th Edition
489 pages
Final Poly 1
No ratings yet
Final Poly 1
14 pages
DHL 1
No ratings yet
DHL 1
17 pages
BUGNAY
No ratings yet
BUGNAY
33 pages
What Is Whole-Body Vibration?
No ratings yet
What Is Whole-Body Vibration?
2 pages
Marginson - Studies in HE Final 30 June 2017
No ratings yet
Marginson - Studies in HE Final 30 June 2017
20 pages
Les Temps Du Verbe Et Les Modes
No ratings yet
Les Temps Du Verbe Et Les Modes
22 pages
Total Intravenous Anesthesia and TCI
100% (1)
Total Intravenous Anesthesia and TCI
25 pages
Ode To Nightingale by Keats
No ratings yet
Ode To Nightingale by Keats
4 pages
520l0553 PDF
No ratings yet
520l0553 PDF
52 pages
Aquarius User Manual PDF
No ratings yet
Aquarius User Manual PDF
16 pages
15ee232 - Eecs - Unit 1-Electric Circuits PDF
No ratings yet
15ee232 - Eecs - Unit 1-Electric Circuits PDF
85 pages
Tang VS Ca
No ratings yet
Tang VS Ca
1 page
Concealment Notes
No ratings yet
Concealment Notes
1 page
Modelo 4 - Parte I
No ratings yet
Modelo 4 - Parte I
6 pages
Check List: For Processing of RA Bills @
No ratings yet
Check List: For Processing of RA Bills @
9 pages
Unit-6.PDF Engg Math
No ratings yet
Unit-6.PDF Engg Math
56 pages
BHArathiar BBA VI Sem Service Marketing Model Papers
100% (1)
BHArathiar BBA VI Sem Service Marketing Model Papers
7 pages
RayMix Concrete Case Study
No ratings yet
RayMix Concrete Case Study
3 pages
Caring For The Mechanically Ventilated Patient Tip Card - January 2019
100% (1)
Caring For The Mechanically Ventilated Patient Tip Card - January 2019
3 pages

Introduction to Probability and Statistics

Steven E. Rigdon is Professor of Biostatistics at Saint Louis University. He is a fellow of the

Douglas C. Montgomery is Regents’ Professor and ASU Foundation Professor of Engineering at

© in this web service Cambridge University Press & Assessment www.cambridge.org

Introduction to Probability and

Ronald D. Fricker, Jr.

Shaftesbury Road, Cambridge CB2 8EA, United Kingdom

Cambridge University Press is part of Cambridge University Press & Assessment,

To my wife Pat, who has always supported me and been at my side.

To my spouse, Christine: Tu ventus sub alis meis es.

Preface xiii
2.2.3 Histograms 35

4 Random Variables 82
6.5 Gamma and Weibull Distribu-

8.9 Chapter Summary 268
11.5 Testing the Mean: Variance

13.6 The Continuity Correction and
15.6 Noninformative Priors 596

18.6.1 K = 2 Groups and p = 1
19.5 Cross-Validation with Classifica-

Error in library(testassay) : there is no package called ‘testassay’

Error in arrange(df, y) : could not find function "arrange"
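The two messages above are standard R errors. The sketch below shows the usual fixes.

- The first error means the package is not installed in any library on the search path.
- The second error means that dplyr, which defines `arrange()`, has not been attached.

The example assumes the packages are installed from CRAN. The data frame `df` and its column `y` are hypothetical and chosen only to match the error text. If dplyr is unavailable, the code falls back to a base-R equivalent.

```r
# Hypothetical toy data; `df` and `y` only mirror the names in the error message.
df <- data.frame(x = c("a", "b", "c"), y = c(3, 1, 2))

# "there is no package called 'testassay'": the package is not installed.
# Installing it once from CRAN (needs an internet connection) fixes this:
#   install.packages("testassay")
if (!requireNamespace("testassay", quietly = TRUE)) {
  message("testassay is not installed; run install.packages(\"testassay\") first.")
}

# 'could not find function "arrange"': arrange() is defined in dplyr, which must
# be attached with library(dplyr) before the call. If dplyr is not available,
# fall back to base R.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  sorted <- arrange(df, y)
} else {
  sorted <- df[order(df$y), ]  # base-R equivalent: rows in ascending order of y
}
print(sorted)
```

Calling `library()` only attaches a package that is already installed; `install.packages()` needs to be run once per R library, not in every session.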
