### Grafen & Hails: Modern Statistics for the Life Sciences

# Preface

There are five reasons you should learn to do statistics the way this book teaches it.
It teaches a language, with which you can communicate about statistics. You use it day-to-day in telling your computer which tests you wish to do; and you use the same language to discuss with statistical advisers what tests you have been doing, might do, and should do. It is the language of **model formulae**, which was developed in the 1960s by statistical theorists, and is today universally employed by statisticians. The commonly used computer packages all have commands that use these model formulae, so this powerful language can now be usefully learnt by all users of statistics. This book teaches you that language.

The language of model formulae is based on a grand conceptual scheme, called the General Linear Model or GLM. This contains within it all the usual parametric tests, including t-tests, analysis of variance, contrast analysis, linear regression, multiple regression, analysis of covariance, and polynomial regression. Instead of learning these as separate tests with broadly similar features but maddening differences, this book will teach you a single coherent framework. Instead of a mish-mash of eccentrically named accidents, this book presents statistics as a meaningful whole. This is intellectually satisfying, but it is also practically useful in two ways. It's all much easier to remember, and the unifying approach allows a lot more material to be covered in the same amount of time. Learn more faster!
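As a small illustration of how one framework can contain tests that are usually taught separately, the sketch below computes a two-sample t statistic twice: once with the classic pooled-variance formula, and once as the slope t statistic from a least-squares regression of the response on a 0/1 group indicator. The data values and the function names `pooled_t` and `slope_t` are invented for this example and do not come from the book.

```python
import math

def pooled_t(a, b):
    """Classic two-sample t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    s2 = (ssa + ssb) / (na + nb - 2)          # pooled error variance
    return (mb - ma) / math.sqrt(s2 * (1 / na + 1 / nb))

def slope_t(x, y):
    """t statistic for the slope in the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx                            # fitted slope
    b0 = my - b1 * mx                         # fitted intercept
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                        # residual error variance
    return b1 / math.sqrt(s2 / sxx)

# Made-up measurements for two groups (illustrative only).
control = [4.1, 5.0, 4.6, 5.3, 4.8]
treated = [5.9, 6.4, 5.2, 6.8, 6.1]

# Code group membership as a 0/1 indicator and stack the responses.
x = [0] * len(control) + [1] * len(treated)
y = control + treated

t_test = pooled_t(control, treated)
t_regression = slope_t(x, y)
print(t_test, t_regression)  # the two values agree up to rounding error
```

The agreement is not a coincidence of these numbers: with a 0/1 indicator the fitted slope is the difference between the group means, and the residual variance is the pooled variance, so the two calculations describe the same model.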

Statistics textbooks tend to be divided into cookbooks (for those who want to do but not understand) and spookbooks (vice versa). The problem faced by the writers is that the obvious way to explain why a test works is to give a mathematical proof. But the mathematics is not important in everyday use of statistics, and is anyway not accessible to most users. This book chooses a different conceptual plane on which to explain statistics, one that is only possible because of the new language and conceptual unity. Some of the ideas in this plane are introduced, using geometrical pictures, to explain a bit about how GLM works. The ideas and concepts we chose to explain geometrically in this book are those that you really do need to use statistics properly. Concentrate your learning on what really matters.

If you are a student, and you learn the old way, then you are very likely to have the following experience at some stage in your course. You plan and carry out an experiment or survey, and go to ask for help from the person who taught you statistics. Unfortunately, the course didn't actually cover anything so sophisticated as the relatively simple project you've done. The advice is either "Do this simpler test that was covered, even though it isn't actually the right test" or "Give the data to me, and I'll do it for you". However welcome this second response may be, you really would be better off knowing how to do it yourself! The power of GLM, using model formulae, is such that using the basic toolkit this book provides, you have an excellent chance of being able to analyse your project yourself.

The final reason is that GLM does not cover all of statistics. But the conceptual framework you learn from this book transfers almost unchanged into Generalised Linear Models. These cover logistic and probit regressions, log-linear models and many more. So if you just want to learn basic statistics at the moment, the advantage of learning the way this book teaches is that, if you ever do want to go further, you will be well prepared. If, on the other hand, you already possess the laudable ambition to learn Generalised Linear Models, but lack the mathematical skills or sheer technical courage needed to tackle the textbooks on the subject, then get a firm conceptual grounding in General Linear Models from this book first. You will find the extension plain sailing.

It is important to say that many kinds of test are not covered in this book. The main classes are the too simple and the too complex. The too simple are the nonparametric tests, which do not belong to GLM. If your datasets are always going to be simple enough to handle this way, then you're probably better off sticking with them; but do be aware of the danger of doing simple tests, when you should be doing more sophisticated ones, just because you don't know how. For example, in nonparametric statistics the options for statistical elimination of variables are extremely limited, and estimation is usually based on fairly dodgy logic.

Some of the tests that are too complex have already been mentioned: those that come under the umbrella of Generalised Linear Models. The others include factor analysis, principal component analysis and time series analysis. These branches of statistics are all based on the simple General Linear Model, and though not directly covered here, the concepts and skills you will learn from this book are a good preparation for tackling them later on.

We have taught this course to first- and second-year biology undergraduates at Oxford University for about ten years. Interest around the world in the lecture notes has confirmed us in our belief that this is the right way to teach statistics today. Those notes have here been completely rewritten with simplicity and the logical order of presentation in mind. This book represents a major attempt to make the ideas of General Linear Models accessible to life sciences undergraduates everywhere.