# Vital Statistics

## Probability and Statistics for Economics and Business

### William Sandholm and Brett Saraniti

## Table of Contents

\*Asterisks indicate sections tangential to the main line of argument.

Preface for Students

Preface for Instructors

Acknowledgments

**1. Random Reasoning**

1.1 Introduction

1.2 Probability

1.3 Statistics

1.4 Conclusion

**2. Probability Models**

2.1 Ex Ante vs. Ex Post

2.2 Sample Spaces

2.2.1 Sample spaces, outcomes, and events

2.2.2 New events from old

2.3 Probability Measures

2.3.1 The axioms of probability

2.3.2 Further properties of probability measures

2.3.3 Interpreting and assigning probabilities

2.4 Conditional Probability

2.4.1 What is conditional probability?

2.4.2 Joint, marginal, and conditional probabilities

2.4.3 The total probability rule

2.4.4 Bayes' rule

2.5 Independence

2.5.1 Independence of pairs of events

2.5.2 Independence of many events

2.5.3 Independence of many events: A formal treatment*

2.6 Constructing Probability Models*

2.6.1 Two probability problems

2.6.2 Discussion of the Linda problem

2.6.3 Discussion of the Monty Hall problem

2.A Appendix: Finite and Countable Additivity

2.E Exercises

**3. Random Variables**

3.1 Random Variables

3.1.1 What exactly is a random variable?

3.1.2 Ex ante vs. ex post revisited

3.1.3 The distribution of a random variable

3.2 Traits of Random Variables

3.2.1 Expected value

3.2.2 Variance and standard deviation

3.2.3 An alternate formula for expected values*

3.3 Functions of Random Variables

3.4 Independent Random Variables

3.4.1 Independence of two random variables

3.4.2 Independence of many random variables

3.4.3 Sums of independent random variables

3.4.4 New independent random variables from old

3.E Exercises

**4. Multiple Random Variables**

4.1 Multiple Random Variables

4.1.1 Joint distributions and marginal distributions

4.1.2 Conditional distributions

4.1.3 Conditional traits and the law of iterated expectations

4.2 Traits of Random Variable Pairs

4.2.1 Covariance

4.2.2 Correlation

4.2.3 Some useful facts

4.2.4 Independence and zero correlation

4.3 Functions of Multiple Random Variables

4.4 Portfolio Selection*

4.4.1 A simple model of a financial market

4.4.2 Portfolio selection and diversification

4.4.3 Efficient portfolios

4.4.4 The benefits of diversification

4.A Appendix

4.A.1 Definitions, formulas, and facts about random variables

4.A.2 Derivations of formulas and facts

4.B The Capital Asset Pricing Model **[ONLINE]**

4.E Exercises

**5. Bernoulli Trials Processes and Discrete Distributions**

5.1 Families of Distributions

5.1.1 Indicator random variables

5.1.2 Bernoulli distributions

5.1.3 Traits of Bernoulli random variables

5.2 Bernoulli Trials Processes

5.3 How to Count

5.3.1 Choice sequences

5.3.2 Orderings

5.3.3 Permutations

5.3.4 Combinations

5.4 Binomial Distributions

5.4.1 Definition

5.4.2 Another way to represent binomial distributions

5.4.3 Traits of binomial random variables

5.5 Simulation and Mathematical Analysis of Probability Models*

5.5.1 The birthday problem

5.5.2 Simulations

5.5.3 Mathematical analysis

5.5.4 Simulation versus mathematical analysis

5.E Exercises

**6. Continuous Random Variables and Distributions**

6.1 Continuous Probability Models

6.1.1 Why bother with continuous probability models?

6.1.2 "Probability zero" and "impossible"

6.2 Continuous Random Variables and Distributions

6.2.1 Cumulative probabilities

6.2.2 Density functions

6.2.3 Density functions: Intuition

6.2.4 Percentiles of continuous distributions

6.2.5 Traits of continuous random variables

6.3 Uniform Distributions

6.3.1 Definitions

6.3.2 Traits

6.3.3 Shifting and scaling

6.4 Normal Distributions

6.4.1 Shifting, scaling, and the standard normal distribution

6.4.2 Standard normal probabilities

6.4.3 Normal probabilities

6.5 Calculating Normal Probabilities Using the Table

6.5.1 The standard normal distribution table

6.5.2 Calculating standard normal probabilities

6.5.3 Calculating normal probabilities

6.6 Sums of Independent Normal Random Variables

6.6.1 Distributions of sums of independent random variables

6.6.2 Brownian motion*

6.A Continuous Distributions (using calculus) **[ONLINE]**

6.B Continuous Joint Distributions (using calculus) **[ONLINE]**

6.E Exercises

**7. The Central Limit Theorem**

7.1 I.I.D. Random Variables

7.2 Sums and Sample Means of I.I.D. Random Variables

7.2.1 Definition

7.2.2 Traits of sums and sample means of i.i.d. random variables

7.3 The Law of Large Numbers

7.3.1 Statement of the law of large numbers

7.3.2 The law of large numbers and the "law of averages"

7.3.3 Proving the law of large numbers*

7.4 The Central Limit Theorem

7.4.1 Convergence in distribution

7.4.2 Statement of the central limit theorem

7.4.3 Simulations with continuous trials

7.4.4 The continuity correction

7.4.5 Simulations with discrete trials

7.5 The Central Limit Theorem: Applications

7.5.1 Normal approximation of binomial distributions

7.5.2 Gambling

7.5.3 Queues

7.5.4 Statistical inference

7.A Proof of the Central Limit Theorem **[ONLINE]**

7.E Exercises

**8. Poisson and Exponential Distributions**

8.1 Poisson Distributions and the Poisson Limit Theorem

8.1.1 e

8.1.2 Poisson distributions

8.1.3 The Poisson limit theorem

8.2 Exponential Distributions

8.2.1 Definition

8.2.2 Probabilities and traits

8.2.3 Peculiar properties

8.3 The Exponential Interarrival Model and the Poisson Process*

8.A Appendix

8.E Exercises

**9. The Psychology of Probability**

9.1 Thought Experiments

9.2 Framing Effects

9.3 Overconfidence

9.4 Misestimating the Impact of Evidence

9.5 The "Law of Small Numbers"

9.6 Gambling Systems and Technical Trading Strategies

9.E Exercises

**10. How to Lie with Statistics**

10.1 Introduction

10.2 Variation

10.2.1 Variation within a population

10.2.2 Variation within subgroups: Simpson's paradox

10.2.3 Variation in the results of random samples

10.3 Polls and Sampling

10.3.1 Sampling from the wrong population

10.3.2 Designing polls: Wording of questions

10.3.3 Designing polls: Selection of response alternatives

10.3.4 Designing polls: Arrangement of questions

10.3.5 Administering polls: Ensuring honest reporting

10.3.6 When can I trust a poll?

10.4 Endogenous Sampling Biases

10.5 Causal Inference and Extrapolation

10.5.1 Confounding variables

10.5.2 Spurious correlation and data mining

10.5.3 Linear extrapolation of nonlinear data

10.E Exercises

**11. Data Graphics**

11.1 Data

11.1.1 Types of variables

11.1.2 Types of data sets

11.1.3 Sources of economic and business data

11.2 Graphics for Univariate Data

11.2.1 Graphics that display every observation

11.2.2 Graphics for absolute and relative frequencies

11.2.3 Graphics for cumulative frequencies

11.3 Graphics for Multivariate Data

11.3.1 Graphics for frequencies

11.3.2 Graphics that display every observation

11.4 Principles for Data Graphics Design

11.4.1 First, do no harm

11.4.2 Infographics

11.4.3 One step beyond

11.A Appendix: Creating Data Graphics in Excel **[ONLINE]**

11.E Exercises

**12. Descriptive Statistics**

12.1 Descriptive Statistics for Univariate Data

12.1.1 Measures of relative standing: Percentiles and ranges

12.1.2 Measures of centrality: Mean and median

12.1.3 Measures of dispersion: Variance and standard deviation

12.2 Descriptive Statistics for Bivariate Data

12.2.1 Measures of linear association: Covariance and correlation

12.2.2 Visualizing correlations

12.2.3 Computing correlations: Arithmetic, pictures, or computer

12.2.4 The road ahead: Regression analysis

12.E Exercises

**13. Probability Models for Statistical Inference**

13.1 Introduction

13.2 The I.I.D. Trials Model for Statistical Inference

13.3 Inference about Inherently Random Processes

13.3.1 Bernoulli trials

13.3.2 Trials with an unknown distribution

13.4 Random Sampling and Inference about Populations

13.4.1 Random sampling

13.4.2 The trials' traits equal the data set's descriptive statistics

13.4.3 Bernoulli trials

13.4.4 Trials with an unknown distribution

13.5 Random Sampling in Practice

13.E Exercises

**14. Point Estimation**

14.1 Parameters, Estimators, and Estimates

14.2 Desirable Properties of Point Estimators

14.3 The Sample Mean

14.3.1 Unbiasedness and consistency

14.3.2 Efficiency

14.3.3 The distribution of the sample mean

14.4 The Sample Variance

14.4.1 Defining the sample variance

14.4.2 Unbiasedness and consistency of the sample variance

14.5 Classical Statistics and Bayesian Statistics*

14.A Appendix: A Short Introduction to Bayesian Statistics

14.B Appendix: Derivations of Properties of the Sample Variance

14.E Exercises

**15. Interval Estimation and Confidence Intervals**

15.1 What Is Interval Estimation?

15.2 Constructing Interval Estimators

15.2.1 The 95% interval estimator for μ when σ² is known

15.2.2 The 95% interval estimator for μ when σ² is unknown

15.2.3 The (1 - α) interval estimator for μ when σ² is unknown

15.2.4 Looking ahead: Standard errors and t distributions

15.3 Interval Estimators for Bernoulli Trials

15.4 Interpreting Confidence

15.5 Choosing Sample Sizes

15.5.1 Sample sizes for general i.i.d. trials

15.5.2 Sample sizes for Bernoulli trials processes

15.6 A Better Interval Estimator for Bernoulli Trials*

15.E Exercises

**16. Hypothesis Testing**

16.1 What Is Hypothesis Testing?

16.2 Hypothesis Testing: Basic Concepts

16.2.1 The probability model

16.2.2 Null and alternative hypotheses

16.2.3 One-tailed and two-tailed tests

16.2.4 Hypothesis tests and their significance levels

16.3 Designing Hypothesis Tests

16.3.1 Hypothesis tests for μ when σ² is known

16.3.2 Hypothesis tests for μ when σ² is unknown

16.3.3 Hypothesis tests for Bernoulli trials

16.4 Two-Tailed Hypothesis Tests

16.4.1 Two-tailed tests vs. one-tailed tests

16.4.2 Comparing two-tailed hypothesis tests and confidence intervals

16.5 Alternate Ways of Expressing Hypothesis Tests

16.5.1 z-statistics

16.5.2 P-values

16.6 Interpreting Hypothesis Tests

16.6.1 The meaning of significance

16.6.2 "Do not reject" vs. "accept"

16.6.3 Statistical significance versus practical significance

16.6.4 P-value .049 vs. P-value .051

16.6.5 Hypothesis testing in a vacuum

16.7 Significance and Power

16.7.1 Type I and Type II errors

16.7.2 Evaluating error probabilities

16.7.3 Power and the power curve

16.7.4 Underpowered studies

16.8 Choosing Sample Sizes

16.8.1 Sample sizes for general i.i.d. trials

16.8.2 Sample sizes for Bernoulli trials processes

16.9 Summary and Preview

16.E Exercises

**17. Inference from Small Samples**

17.1 The t-Statistic

17.2 t Distributions

17.3 Small-Sample Inference about the Mean of Normal Trials

17.3.1 The t-statistic and the t distribution

17.3.2 Interval estimation

17.3.3 Hypothesis testing

17.4 Sort-of-Normal Trials: The Robustness of the t-Statistic

17.5 Evaluating Normality of Trials*

17.A Appendix: Descendants of the Standard Normal Distribution **[ONLINE]**

17.E Exercises

**18. Inference about Differences in Means**

18.1 Inference from Two Separate Samples

18.1.1 The basic two-sample model

18.1.2 Bernoulli trials

18.1.3 Small samples, normal trials, equal variances*

18.2 Inference from Paired Samples

18.2.1 Constructing paired samples

18.2.2 The basic paired-sample model

18.2.3 Small samples, normal trials*

18.3 Choosing between Separate and Paired Samples

18.3.1 A general rule

18.3.2 Paired sampling using two observations per individual

18.3.3 Pairing samples using observable characteristics*

18.4 Causal Inference: Treatment Effects*

18.4.1 Randomized controlled experiments and observational studies

18.4.2 Interventions and causal assumptions

18.4.3 Potential outcomes and average treatment effects

18.4.4 A probability model of an observational study

18.4.5 Selection bias in observational studies

18.4.6 Random assignment eliminates selection bias

18.4.7 Controlling for observable confounding variables

18.A Appendix

18.B Appendix: The Distribution of the Pooled Sample Variance **[ONLINE]**

18.E Exercises

**19. Simple Regression: Descriptive Statistics**

19.1 The Regression Line

19.1.1 A brief review of descriptive statistics

19.1.2 The regression line

19.1.3 Examples, computations, and simulations

19.2 Prediction and Residuals

19.2.1 Predictors, predictions, and residuals

19.2.2 Best-in-class predictors

19.2.3 Further characterizations of the regression line

19.2.4 Deriving the best constant and best linear predictors*

19.3 The Conditional Mean Function

19.3.1 Best unrestricted prediction

19.3.2 Best linear prediction of conditional means

19.4 Analysis of Residuals

19.4.1 Sums of squares and variances of residuals for best-in-class predictors

19.4.2 Relative quality for best-in-class predictors

19.4.3 Decomposition of variance for regression

19.4.4 Sums of squares revisited

19.5 Pitfalls in Interpreting Regressions

19.5.1 Nonlinear relationships

19.5.2 Regression to the mean

19.5.3 Correlation and causation

19.6 Three Lines of Best Fit*

19.6.1 The reverse regression line

19.6.2 The neutral line

19.6.3 The three lines compared

19.A Appendix

19.A.1 Equivalence of the characterizations of the regression line

19.A.2 Best linear prediction of conditional means

19.A.3 Relative quality for best-in-class predictors: Derivation

19.A.4 Decomposition of variance for regression: Derivation

19.B Appendix: Characterization of the Neutral Line **[ONLINE]**

19.E Exercises

**20. Simple Regression: Statistical Inference [ONLINE]**

20.1 The Classical and Random Sampling Regression Models

20.1.1 Fixed x sampling versus random sampling

20.1.2 Linearity of conditional means

20.1.3 Constant conditional variances

20.1.4 How reasonable are the assumptions?

20.2 The OLS Estimators

20.2.1 Defining the OLS estimators

20.2.2 Basic properties of the OLS estimators

20.2.3 Estimating conditional means

20.2.4 Approximate normality of the OLS estimators

20.2.5 Efficiency of the OLS estimators: The Gauss-Markov Theorem*

20.3 The Sample Conditional Variance

20.4 Interval Estimators and Hypothesis Tests

20.4.1 Review: Inference about an unknown mean

20.4.2 Interval estimators and hypothesis tests for β

20.4.3 Interval estimators and hypothesis tests for conditional means

20.4.4 Population regressions versus sample regressions

20.5 Small Samples and the Classical Normal Regression Model

20.5.1 The classical normal regression model

20.5.2 Interval estimators and hypothesis tests for β

20.5.3 Interval estimators and hypothesis tests for conditional means

20.5.4 Prediction intervals*

20.6 Analysis of Residuals, R², and F Tests

20.6.1 Sums of squares and R²

20.6.2 The F test for β = 0

20.6.3 What happens without normality? The robustness of the F-statistic*

20.7 Regression and Causation

20.7.1 An alternate description of the classical regression model

20.7.2 Causal regression models

20.7.3 Multiple regression

20.A Appendix

20.A.1 Analysis of the random sampling regression model

20.A.2 The unstructured regression model

20.A.3 Computation of the mean and variance of B

20.A.4 Proof of the Gauss-Markov Theorem

20.A.5 Proof that the sample conditional variance is unbiased

20.A.6 Deriving the distribution of the F-statistic

20.E Exercises

Index