
R programs and datasets

The R procedures and datasets provided here correspond to many of the examples discussed in R.K. Pearson, Exploring Data in Engineering, the Sciences, and Medicine. The R procedures are provided as text files (.txt) that may be copied and pasted into an interactive R session, and the datasets are provided as comma-separated value (.csv) files. These files can be read into R with the read.csv function, or they may be examined by opening them in Microsoft Excel. The R procedures described here use only functions available in base R and in the add-on packages designated as recommended; they do not require any other add-on packages. The procedures were implemented in R version 2.11.1, installed from binary files in a Microsoft Windows environment.
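As a rough illustration of the workflow described above, the sketch below reads a .csv file with read.csv and loads a procedure from a .txt file with source(), which is an alternative to copying and pasting. The file contents, the column names, and the function total_deaths are invented stand-ins so that the example runs on its own; they are not taken from the files provided here.

```r
# Hypothetical stand-in for a downloaded dataset; the column names are
# invented for illustration and are not those of the real files.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("corps,year,deaths", "A,1875,0", "B,1875,2", "A,1876,1"), csv_path)

# read.csv returns a data frame; the header row supplies the column names.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)
str(dat)

# A procedure stored as plain text can be loaded with source() instead of
# being pasted into the session. This stand-in file defines one function.
proc_path <- tempfile(fileext = ".txt")
writeLines("total_deaths <- function(df) sum(df$deaths)", proc_path)
source(proc_path)

total_deaths(dat)
```

With a real download, csv_path and proc_path would simply be replaced by the paths to the saved files.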

Note that versions of a number of these datasets are available as built-in datasets in a variety of R packages (e.g., the von Bortkewitsch horsekick deaths data is available in the R add-on package vcd as the dataset VonBort).  In addition, three of these datasets (federalist.csv, horsekick.csv, and bitterpit.csv) were constructed from datasets described in the book Data by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985) and available from the following website:

http://lib.stat.cmu.edu/datasets/Andrews/

Similarly, the datasets mushroom.csv and pima.csv were constructed from datasets available from the UCI Machine Learning Repository (Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science).

Chapter 1 – The Art of Analyzing Data

Chapter 2 – Data: Types, Uncertainty, and Quality

Chapter 3 – Characterizing Categorical Variables

  • The Federalist Papers dataset (federalist.csv) was generated from Table 4.1 in the book Data by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).
  • The von Bortkewitsch horsekick deaths data (horsekick.csv) was generated from Table 69.1 in the book Data by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).
  • R procedure for the normalized Shannon measure (shannonproc.txt)
  • R procedure for the normalized Simpson measure (simpsonproc.txt)
  • R procedure for the normalized Gini measure (giniproc.txt)
  • R procedure for the normalized Bray measure (brayproc.txt)
  • R procedure to generate Fig. 3.3 (ch3fig3proc.txt)
  • R procedure to generate Fig. 3.4 (ch3fig4proc.txt)
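For readers who want a feel for what a normalized measure of this kind computes before opening the procedure files, here is a minimal sketch of one common normalization of Shannon entropy for a categorical variable. It is not the contents of shannonproc.txt: the function name normalized_shannon is hypothetical, and the scaling assumed here (0 when all levels are equally frequent, approaching 1 as the observations concentrate in one level) may differ from the definition used in the book.

```r
# Hypothetical helper, not the code in shannonproc.txt.
# Assumed convention: 1 - H / log(M), where H is the Shannon entropy of the
# observed level proportions and M is the number of levels.
normalized_shannon <- function(x) {
  counts <- table(x)
  m <- length(counts)
  if (m < 2) return(NA_real_)        # undefined for a single level
  p <- as.numeric(counts) / sum(counts)
  p <- p[p > 0]                      # treat 0 * log(0) as 0
  entropy <- -sum(p * log(p))
  1 - entropy / log(m)
}

# Equal frequencies give a value of zero (up to rounding error).
normalized_shannon(c("a", "b", "c", "d"))

# An uneven distribution gives a value strictly between 0 and 1.
normalized_shannon(c("a", "a", "a", "b"))
```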

Chapter 4 – Uncertainty in Real Variables

Chapter 5 – Fitting Straight Lines

Chapter 6 – A Brief Introduction to Estimation Theory

Chapter 7 – Outliers: Distributional Monsters (?) That Lurk in Data

Chapter 8 – Characterizing a Dataset

Chapter 9 – Confidence Intervals and Hypothesis Testing

Chapter 10 – Relations among Variables

Chapter 11 – Regression Models I: Real Data

Chapter 12 – Reexpression: Data Transformations

Chapter 13 – Regression Models II: Mixed Data Types

  • R code for odds ratio characterizations (oddsratioWaldproc.txt)
  • The apple tree/bitter pit dataset (bitterpit.csv) was generated from Table 59.1 in the book Data by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).
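The file name oddsratioWaldproc.txt suggests a Wald-type characterization of the odds ratio. As a hedged sketch of that standard technique, and not a copy of the provided procedure, the code below computes the odds ratio of a 2x2 table of counts together with a Wald confidence interval built on the log scale. The function name odds_ratio_wald and the counts in the example are made up for illustration.

```r
# Hypothetical helper, not the code in oddsratioWaldproc.txt.
# Standard Wald interval: log(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d).
odds_ratio_wald <- function(tab, conf_level = 0.95) {
  stopifnot(identical(dim(tab), c(2L, 2L)), all(tab > 0))
  n11 <- tab[1, 1]; n12 <- tab[1, 2]
  n21 <- tab[2, 1]; n22 <- tab[2, 2]
  or <- (n11 * n22) / (n12 * n21)
  se_log <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
  z <- qnorm(1 - (1 - conf_level) / 2)
  c(estimate = or,
    lower = exp(log(or) - z * se_log),
    upper = exp(log(or) + z * se_log))
}

# Made-up counts, for illustration only (rows: group, columns: outcome).
tab <- matrix(c(20, 10,
                 5, 15), nrow = 2, byrow = TRUE)
res <- odds_ratio_wald(tab)
res
```

The interval is symmetric on the log scale, so the estimate always lies between the two limits; the sketch deliberately rejects tables with zero cells, where the Wald standard error is undefined.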

Chapter 14 – Characterizing Analysis Results

Chapter 15 – Regression Models III: Diagnostics and Refinements

Chapter 16 – Dealing with Missing Data


