Vital Statistics
Probability and Statistics for Economics and Business
First Edition
William Sandholm and Brett Saraniti
Table of Contents
*Asterisks indicate sections tangential to the main line of argument
Preface for Students
Preface for Instructors
Acknowledgments
1. Random Reasoning
1.1 Introduction
1.2 Probability
1.3 Statistics
1.4 Conclusion
2. Probability Models
2.1 Ex Ante vs. Ex Post
2.2 Sample Spaces
2.2.1 Sample spaces, outcomes, and events
2.2.2 New events from old
2.3 Probability Measures
2.3.1 The axioms of probability
2.3.2 Further properties of probability measures
2.3.3 Interpreting and assigning probabilities
2.4 Conditional Probability
2.4.1 What is conditional probability?
2.4.2 Joint, marginal, and conditional probabilities
2.4.3 The total probability rule
2.4.4 Bayes' rule
2.5 Independence
2.5.1 Independence of pairs of events
2.5.2 Independence of many events
2.5.3 Independence of many events: A formal treatment*
2.6 Constructing Probability Models*
2.6.1 Two probability problems
2.6.2 Discussion of the Linda problem
2.6.3 Discussion of the Monty Hall problem
2.A Appendix: Finite and Countable Additivity
2.E Exercises
3. Random Variables
3.1 Random Variables
3.1.1 What exactly is a random variable?
3.1.2 Ex ante vs. ex post revisited
3.1.3 The distribution of a random variable
3.2 Traits of Random Variables
3.2.1 Expected value
3.2.2 Variance and standard deviation
3.2.3 An alternate formula for expected values*
3.3 Functions of Random Variables
3.4 Independent Random Variables
3.4.1 Independence of two random variables
3.4.2 Independence of many random variables
3.4.3 Sums of independent random variables
3.4.4 New independent random variables from old
3.E Exercises
4. Multiple Random Variables
4.1 Multiple Random Variables
4.1.1 Joint distributions and marginal distributions
4.1.2 Conditional distributions
4.1.3 Conditional traits and the law of iterated expectations
4.2 Traits of Random Variable Pairs
4.2.1 Covariance
4.2.2 Correlation
4.2.3 Some useful facts
4.2.4 Independence and zero correlation
4.3 Functions of Multiple Random Variables
4.4 Portfolio Selection*
4.4.1 A simple model of a financial market
4.4.2 Portfolio selection and diversification
4.4.3 Efficient portfolios
4.4.4 The benefits of diversification
4.A Appendix
4.A.1 Definitions, formulas, and facts about random variables
4.A.2 Derivations of formulas and facts
4.B The Capital Asset Pricing Model [ONLINE]
4.E Exercises
5. Bernoulli Trials Processes and Discrete Distributions
5.1 Families of Distributions
5.1.1 Indicator random variables
5.1.2 Bernoulli distributions
5.1.3 Traits of Bernoulli random variables
5.2 Bernoulli Trials Processes
5.3 How to Count
5.3.1 Choice sequences
5.3.2 Orderings
5.3.3 Permutations
5.3.4 Combinations
5.4 Binomial Distributions
5.4.1 Definition
5.4.2 Another way to represent binomial distributions
5.4.3 Traits of binomial random variables
5.5 Simulation and Mathematical Analysis of Probability Models*
5.5.1 The birthday problem
5.5.2 Simulations
5.5.3 Mathematical analysis
5.5.4 Simulation versus mathematical analysis
5.E Exercises
6. Continuous Random Variables and Distributions
6.1 Continuous Probability Models
6.1.1 Why bother with continuous probability models?
6.1.2 "Probability zero" and "impossible"
6.2 Continuous Random Variables and Distributions
6.2.1 Cumulative probabilities
6.2.2 Density functions
6.2.3 Density functions: Intuition
6.2.4 Percentiles of continuous distributions
6.2.5 Traits of continuous random variables
6.3 Uniform Distributions
6.3.1 Definitions
6.3.2 Traits
6.3.3 Shifting and scaling
6.4 Normal Distributions
6.4.1 Shifting, scaling, and the standard normal distribution
6.4.2 Standard normal probabilities
6.4.3 Normal probabilities
6.5 Calculating Normal Probabilities Using the Table
6.5.1 The standard normal distribution table
6.5.2 Calculating standard normal probabilities
6.5.3 Calculating normal probabilities
6.6 Sums of Independent Normal Random Variables
6.6.1 Distributions of sums of independent random variables
6.6.2 Brownian motion*
6.A Continuous Distributions (using calculus) [ONLINE]
6.B Continuous Joint Distributions (using calculus) [ONLINE]
6.E Exercises
7. The Central Limit Theorem
7.1 I.I.D. Random Variables
7.2 Sums and Sample Means of I.I.D. Random Variables
7.2.1 Definition
7.2.2 Traits of sums and sample means of i.i.d. random variables
7.3 The Law of Large Numbers
7.3.1 Statement of the law of large numbers
7.3.2 The law of large numbers and the "law of averages"
7.3.3 Proving the law of large numbers*
7.4 The Central Limit Theorem
7.4.1 Convergence in distribution
7.4.2 Statement of the central limit theorem
7.4.3 Simulations with continuous trials
7.4.4 The continuity correction
7.4.5 Simulations with discrete trials
7.5 The Central Limit Theorem: Applications
7.5.1 Normal approximation of binomial distributions
7.5.2 Gambling
7.5.3 Queues
7.5.4 Statistical inference
7.A Proof of the Central Limit Theorem [ONLINE]
7.E Exercises
8. Poisson and Exponential Distributions
8.1 Poisson Distributions and the Poisson Limit Theorem
8.1.1 e
8.1.2 Poisson distributions
8.1.3 The Poisson limit theorem
8.2 Exponential Distributions
8.2.1 Definition
8.2.2 Probabilities and traits
8.2.3 Peculiar properties
8.3 The Exponential Interarrival Model and the Poisson Process*
8.A Appendix
8.E Exercises
9. The Psychology of Probability
9.1 Thought Experiments
9.2 Framing Effects
9.3 Overconfidence
9.4 Misestimating the Impact of Evidence
9.5 The "Law of Small Numbers"
9.6 Gambling Systems and Technical Trading Strategies
9.E Exercises
10. How to Lie with Statistics
10.1 Introduction
10.2 Variation
10.2.1 Variation within a population
10.2.2 Variation within subgroups: Simpson's paradox
10.2.3 Variation in the results of random samples
10.3 Polls and Sampling
10.3.1 Sampling from the wrong population
10.3.2 Designing polls: Wording of questions
10.3.3 Designing polls: Selection of response alternatives
10.3.4 Designing polls: Arrangement of questions
10.3.5 Administering polls: Ensuring honest reporting
10.3.6 When can I trust a poll?
10.4 Endogenous Sampling Biases
10.5 Causal Inference and Extrapolation
10.5.1 Confounding variables
10.5.2 Spurious correlation and data mining
10.5.3 Linear extrapolation of nonlinear data
10.E Exercises
11. Data Graphics
11.1 Data
11.1.1 Types of variables
11.1.2 Types of data sets
11.1.3 Sources of economic and business data
11.2 Graphics for Univariate Data
11.2.1 Graphics that display every observation
11.2.2 Graphics for absolute and relative frequencies
11.2.3 Graphics for cumulative frequencies
11.3 Graphics for Multivariate Data
11.3.1 Graphics for frequencies
11.3.2 Graphics that display every observation
11.4 Principles for Data Graphics Design
11.4.1 First, do no harm
11.4.2 Infographics
11.4.3 One step beyond
11.A Appendix: Creating Data Graphics in Excel [ONLINE]
11.E Exercises
12. Descriptive Statistics
12.1 Descriptive Statistics for Univariate Data
12.1.1 Measures of relative standing: Percentiles and ranges
12.1.2 Measures of centrality: Mean and median
12.1.3 Measures of dispersion: Variance and standard deviation
12.2 Descriptive Statistics for Bivariate Data
12.2.1 Measures of linear association: Covariance and correlation
12.2.2 Visualizing correlations
12.2.3 Computing correlations: Arithmetic, pictures, or computer
12.2.4 The road ahead: Regression analysis
12.E Exercises
13. Probability Models for Statistical Inference
13.1 Introduction
13.2 The I.I.D. Trials Model for Statistical Inference
13.3 Inference about Inherently Random Processes
13.3.1 Bernoulli trials
13.3.2 Trials with an unknown distribution
13.4 Random Sampling and Inference about Populations
13.4.1 Random sampling
13.4.2 The trials' traits equal the data set's descriptive statistics
13.4.3 Bernoulli trials
13.4.4 Trials with an unknown distribution
13.5 Random Sampling in Practice
13.E Exercises
14. Point Estimation
14.1 Parameters, Estimators, and Estimates
14.2 Desirable Properties of Point Estimators
14.3 The Sample Mean
14.3.1 Unbiasedness and consistency
14.3.2 Efficiency
14.3.3 The distribution of the sample mean
14.4 The Sample Variance
14.4.1 Defining the sample variance
14.4.2 Unbiasedness and consistency of the sample variance
14.5 Classical Statistics and Bayesian Statistics*
14.A Appendix: A Short Introduction to Bayesian Statistics
14.B Appendix: Derivations of Properties of the Sample Variance
14.E Exercises
15. Interval Estimation and Confidence Intervals
15.1 What Is Interval Estimation?
15.2 Constructing Interval Estimators
15.2.1 The 95% interval estimator for mu when sigma2 is known
15.2.2 The 95% interval estimator for mu when sigma2 is unknown
15.2.3 The (1 - alpha) interval estimator for mu when sigma2 is unknown
15.2.4 Looking ahead: Standard errors and t distributions
15.3 Interval Estimators for Bernoulli Trials
15.4 Interpreting Confidence
15.5 Choosing Sample Sizes
15.5.1 Sample sizes for general i.i.d. trials
15.5.2 Sample sizes for Bernoulli trials processes
15.6 A Better Interval Estimator for Bernoulli Trials*
15.E Exercises
16. Hypothesis Testing
16.1 What Is Hypothesis Testing?
16.2 Hypothesis Testing: Basic Concepts
16.2.1 The probability model
16.2.2 Null and alternative hypotheses
16.2.3 One-tailed and two-tailed tests
16.2.4 Hypothesis tests and their significance levels
16.3 Designing Hypothesis Tests
16.3.1 Hypothesis tests for mu when sigma2 is known
16.3.2 Hypothesis tests for mu when sigma2 is unknown
16.3.3 Hypothesis tests for Bernoulli trials
16.4 Two-Tailed Hypothesis Tests
16.4.1 Two-tailed tests vs. one-tailed tests
16.4.2 Comparing two-tailed hypothesis tests and confidence intervals
16.5 Alternate Ways of Expressing Hypothesis Tests
16.5.1 z-statistics
16.5.2 P-values
16.6 Interpreting Hypothesis Tests
16.6.1 The meaning of significance
16.6.2 "Do not reject" vs. "accept"
16.6.3 Statistical significance versus practical significance
16.6.4 P-value .049 vs. P-value .051
16.6.5 Hypothesis testing in a vacuum
16.7 Significance and Power
16.7.1 Type I and Type II errors
16.7.2 Evaluating error probabilities
16.7.3 Power and the power curve
16.7.4 Underpowered studies
16.8 Choosing Sample Sizes
16.8.1 Sample sizes for general i.i.d. trials
16.8.2 Sample sizes for Bernoulli trials processes
16.9 Summary and Preview
16.E Exercises
17. Inference from Small Samples
17.1 The t-Statistic
17.2 t Distributions
17.3 Small-Sample Inference about the Mean of Normal Trials
17.3.1 The t-statistic and the t distribution
17.3.2 Interval estimation
17.3.3 Hypothesis testing
17.4 Sort-of-Normal Trials: The Robustness of the t-Statistic
17.5 Evaluating Normality of Trials*
17.A Appendix: Descendants of the Standard Normal Distribution [ONLINE]
17.E Exercises
18. Inference about Differences in Means
18.1 Inference from Two Separate Samples
18.1.1 The basic two-sample model
18.1.2 Bernoulli trials
18.1.3 Small samples, normal trials, equal variances*
18.2 Inference from Paired Samples
18.2.1 Constructing paired samples
18.2.2 The basic paired-sample model
18.2.3 Small samples, normal trials*
18.3 Choosing between Separate and Paired Samples
18.3.1 A general rule
18.3.2 Paired sampling using two observations per individual
18.3.3 Pairing samples using observable characteristics*
18.4 Causal Inference: Treatment Effects*
18.4.1 Randomized controlled experiments and observational studies
18.4.2 Interventions and causal assumptions
18.4.3 Potential outcomes and average treatment effects
18.4.4 A probability model of an observational study
18.4.5 Selection bias in observational studies
18.4.6 Random assignment eliminates selection bias
18.4.7 Controlling for observable confounding variables
18.A Appendix
18.B Appendix: The Distribution of the Pooled Sample Variance [ONLINE]
18.E Exercises
19. Simple Regression: Descriptive Statistics
19.1 The Regression Line
19.1.1 A brief review of descriptive statistics
19.1.2 The regression line
19.1.3 Examples, computations, and simulations
19.2 Prediction and Residuals
19.2.1 Predictors, predictions, and residuals
19.2.2 Best-in-class predictors
19.2.3 Further characterizations of the regression line
19.2.4 Deriving the best constant and best linear predictors*
19.3 The Conditional Mean Function
19.3.1 Best unrestricted prediction
19.3.2 Best linear prediction of conditional means
19.4 Analysis of Residuals
19.4.1 Sums of squares and variances of residuals for best-in-class predictors
19.4.2 Relative quality for best-in-class predictors
19.4.3 Decomposition of variance for regression
19.4.4 Sums of squares revisited
19.5 Pitfalls in Interpreting Regressions
19.5.1 Nonlinear relationships
19.5.2 Regression to the mean
19.5.3 Correlation and causation
19.6 Three Lines of Best Fit*
19.6.1 The reverse regression line
19.6.2 The neutral line
19.6.3 The three lines compared
19.A Appendix
19.A.1 Equivalence of the characterizations of the regression line
19.A.2 Best linear prediction of conditional means
19.A.3 Relative quality for best-in-class predictors: Derivation
19.A.4 Decomposition of variance for regression: Derivation
19.B Appendix: Characterization of the Neutral Line [ONLINE]
19.E Exercises
20. Simple Regression: Statistical Inference [ONLINE]
20.1 The Classical and Random Sampling Regression Models
20.1.1 Fixed x sampling versus random sampling
20.1.2 Linearity of conditional means
20.1.3 Constant conditional variances
20.1.4 How reasonable are the assumptions?
20.2 The OLS Estimators
20.2.1 Defining the OLS estimators
20.2.2 Basic properties of the OLS estimators
20.2.3 Estimating conditional means
20.2.4 Approximate normality of the OLS estimators
20.2.5 Efficiency of the OLS estimators: The Gauss-Markov Theorem*
20.3 The Sample Conditional Variance
20.4 Interval Estimators and Hypothesis Tests
20.4.1 Review: Inference about an unknown mean
20.4.2 Interval estimators and hypothesis tests for beta
20.4.3 Interval estimators and hypothesis tests for conditional means
20.4.4 Population regressions versus sample regressions
20.5 Small Samples and the Classical Normal Regression Model
20.5.1 The classical normal regression model
20.5.2 Interval estimators and hypothesis tests for beta
20.5.3 Interval estimators and hypothesis tests for conditional means
20.5.4 Prediction intervals*
20.6 Analysis of Residuals, R2, and F Tests
20.6.1 Sums of squares and R2
20.6.2 The F test for beta = 0
20.6.3 What happens without normality? The robustness of the F-statistic*
20.7 Regression and Causation
20.7.1 An alternate description of the classical regression model
20.7.2 Causal regression models
20.7.3 Multiple regression
20.A Appendix
20.A.1 Analysis of the random sampling regression model
20.A.2 The unstructured regression model
20.A.3 Computation of the mean and variance of B
20.A.4 Proof of the Gauss-Markov Theorem
20.A.5 Proof that the sample conditional variance is unbiased
20.A.6 Deriving the distribution of the F-statistic
20.E Exercises
Index