The R procedures and datasets provided here correspond to many of the examples discussed in R.K. Pearson, *Exploring Data in Engineering, the Sciences, and Medicine*. The R procedures are provided as text files (.txt) that may be copied and pasted into an interactive R session, and the datasets are provided as comma-separated value (.csv) files. These files are easily read in R via the **read.csv** command, or they may be examined by opening them in Microsoft Excel. Note that the R procedures described here are built on commands available in base R and the add-on packages designated as recommended, and do not require any other add-on packages. These commands were implemented in R version 2.11.1, installed as binary files in a Microsoft Windows environment.
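
As a minimal sketch of the workflow described above, the following writes a small stand-in CSV file to a temporary location and reads it back with **read.csv**. The column names and values are made up for illustration and are not the contents of any file listed below; with a downloaded dataset, its path would replace the temporary one. A procedure file saved locally could likewise be loaded with **source** rather than by copy and paste, assuming the file contains only R code.

```r
# Stand-in for a downloaded dataset; these columns and values are hypothetical.
tmp <- tempfile()
writeLines(c("voltage,current",
             "1.0,0.5",
             "2.0,1.0",
             "3.0,1.5"), tmp)

# read.csv assumes a header row and comma separators by default.
dat <- read.csv(tmp)

str(dat)       # column names and types
summary(dat)   # quick numerical summary of each column
```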

Note that versions of a number of these datasets are available as built-in datasets in a variety of R packages (e.g., the von Bortkewitsch horsekick deaths data is available in the R add-on package **vcd** as the dataset VonBort). In addition, three of these datasets (federalist.csv, horsekick.csv, and bitterpit.csv) were constructed from datasets described in the book *Data* by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985) and available from the following website:

http://lib.stat.cmu.edu/datasets/Andrews/

Similarly, the datasets mushroom.csv and pima.csv were constructed from datasets available from the UCI Machine Learning Repository (Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science).

Chapter 1 – The Art of Analyzing Data

- The Ohm's law dataset (ohmdata.csv)
- The first industrial pressure dataset (pressure1.csv)
- The second industrial pressure dataset (pressure2.csv)
- The third industrial pressure dataset (pressure3.csv)
- The fourth industrial pressure dataset (pressure4.csv)
- R code to generate Fig. 1.8 boxplot (ch1fig8proc.txt)
- The physical property dataset (physprop.csv)
- R code to generate Fig. 1.9 with lowess smoother (ch1fig9proc.txt)
- The brain/body weight dataset (brainbody.csv)

Chapter 2 – Data: Types, Uncertainty, and Quality

- The UCI mushroom dataset (mushroom.csv)
- The UCI Pima Indians diabetes dataset (pima.csv)
- The helicopter dataset (helicopter.csv)
- The makeup flow rate dataset (makeup.csv)

Chapter 3 – Characterizing Categorical Variables

- The *Federalist Papers* dataset (federalist.csv) was generated from Table 4.1 in the book *Data* by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).
- The von Bortkewitsch horsekick deaths data (horsekick.csv) was generated from Table 69.1 in the book *Data* by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).
- R procedure for the normalized Shannon measure (shannonproc.txt)
- R procedure for the normalized Simpson measure (simpsonproc.txt)
- R procedure for the normalized Gini measure (giniproc.txt)
- R procedure for the normalized Bray measure (brayproc.txt)
- R procedure to generate Fig. 3.3 (ch3fig3proc.txt)
- R procedure to generate Fig. 3.4 (ch3fig4proc.txt)

Chapter 4 – Uncertainty in Real Variables

Chapter 5 – Fitting Straight Lines

- R procedure to generate Fig. 5.1 (ch5fig1proc.txt)

Chapter 6 – A Brief Introduction to Estimation Theory

- R procedure to generate Fig. 6.7 (ch6fig7proc.txt)
- R procedure to generate Fig. 6.8 (ch6fig8proc.txt)

Chapter 7 – Outliers: Distributional Monsters (?) That Lurk in Data

- R code for the 3 sigma edit rule (threesigmaproc.txt)
- R code for the Hampel outlier identifier (hampelproc.txt)
- R code for moment-based skewness measure (skewnessproc.txt)
- R code for Galton's skewness measure (galtonskewproc.txt)
- R code for Hotelling's skewness measure (hotellingskewproc.txt)
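
For readers who want a feel for the two outlier detectors named in this list before opening the procedure files, here is a minimal sketch using only base R. It is not the code in threesigmaproc.txt or hampelproc.txt; the function names, the default threshold `t = 3`, and the data are assumptions made for illustration.

```r
# Three-sigma edit rule: flag points more than t standard deviations
# from the mean.
three_sigma_flags <- function(x, t = 3) {
  abs(x - mean(x)) > t * sd(x)
}

# Hampel identifier: replace the mean with the median and the standard
# deviation with the MAD scale estimate. mad() in the stats package applies
# the 1.4826 factor by default, making it consistent for Gaussian data.
hampel_flags <- function(x, t = 3) {
  abs(x - median(x)) > t * mad(x)
}

# Made-up data: nine nominal values and one gross outlier in position 10.
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 50)

which(three_sigma_flags(x))   # indices flagged by the three-sigma rule
which(hampel_flags(x))        # indices flagged by the Hampel identifier
```

On this sample the three-sigma rule flags nothing, because the outlier inflates both the mean and the standard deviation, while the Hampel identifier flags only the tenth point.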

Chapter 8 – Characterizing a Dataset

- R code to generate Poissonness plots (poissonnessplot.txt)
- The Old Faithful geyser dataset is **faithful** in base R (**datasets** package)
- R code to generate data comparison plots (dataqqplotproc.txt)
- R code to generate negative binomialness plots (negbinessplot.txt)

Chapter 9 – Confidence Intervals and Hypothesis Testing

- R code for modified Poissonness plots (modpoissonnessplot.txt)
- R code for binomial confidence intervals (binomCIproc.txt)
- R code for Beal's Method (BealsMethodproc.txt)

Chapter 10 – Relations among Variables

- The World Almanac election dataset (elections.csv)

Chapter 11 – Regression Models I: Real Data

Chapter 12 – Reexpression: Data Transformations

- R code for Box-Cox transformations (boxcoxtransform.txt)
- R code for Aranda-Ordaz transformations (arandaordazproc.txt)
- R code for angular transformations (angulartransform.txt)
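
The following sketch illustrates two of the transformations named in this list, using their commonly quoted textbook forms. It is not the code in boxcoxtransform.txt or angulartransform.txt, and the function names and data are hypothetical. The angular transformation in particular has several variants (some include a factor of 2), so the form here is an assumption.

```r
# Box-Cox transformation for positive x: (x^lambda - 1) / lambda,
# with the natural log as the limiting case lambda = 0.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Angular (arcsine square root) transformation for proportions in [0, 1].
angular <- function(p) {
  stopifnot(all(p >= 0 & p <= 1))
  asin(sqrt(p))
}

x <- c(0.5, 1, 2, 4, 8)       # made-up positive values
box_cox(x, 1)                 # lambda = 1 only shifts the data: x - 1
box_cox(x, 0)                 # lambda = 0 gives log(x)
angular(c(0, 0.25, 0.5, 1))   # ranges from 0 to pi/2
```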

Chapter 13 – Regression Models II: Mixed Data Types

- R code for odds ratio characterizations (oddsratioWaldproc.txt)
- The apple tree/bitter pit dataset (bitterpit.csv) was generated from Table 59.1 in the book *Data* by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).

Chapter 14 – Characterizing Analysis Results

Chapter 15 – Regression Models III: Diagnostics and Refinements

Chapter 16 – Dealing with Missing Data