Grafen & Hails: Modern Statistics for the Life Sciences
07.04.06 JF David, CNRS, France
Q: In the analysis of the tulips dataset, why didn't you test the fixed factors water and shade using the interactions bed*water and bed*shade as error terms? In my own data, I have two fixed factors crossed with a random factor (several broods of an invertebrate, considered as blocks). Do you think it's correct if I pool the SS and df for all the interactions with blocks, and use the pooled MS as an error term? That's what you did in box 7.5, and more generally in your book you never examine interactions between a factor and blocks.
A: Jean-Francois's question is a good one. If the book were about statistics in ecology rather than biology more generally, we would definitely have to discuss the whole question of how to treat blocks in this kind of dataset. Two demonstrators in the statistics course on which the book is based were ecologists, and raised this issue about ten years ago: the result is a paper devoted to the question of whether blocks should be fixed or random, and whether block interactions should be included in model formulae (J.R. Newman, J. Bergelson & A. Grafen. 1997. ‘Blocking factors and hypothesis tests in ecology: Is your statistics text wrong?’ Ecology, 78, 1312-1320).
So, for the long answer, read the paper! Coming to the particular question, there are differing possible views, as follows:
(1) The principled view. Jean-Francois should not regard his broods as blocks. Rather, he should regard each brood as one datapoint; the different treatments within a brood then lead to repeated measures on the brood, and the book discusses analyses like this one in Section 8.2. (A traditional block is a subdivision of the experimental material, made by the experimenter, in order to reduce experimental error. A brood, however, is more like a biological individual.)
(2) Standard ecology view 1. Jean-Francois's current analysis follows this view: declare blocks as a random effect, and include the block-by-treatment interactions in the model formula. This leads to tests for main effects that use the interaction with block as the denominator.
This analysis makes the strong assumption that treatment effects are independently and Normally distributed across broods: this would not be true if, for example, the response of broods was bimodal as if resulting from a major gene effect; or if some broods shared a father; or if some treatments are related and so some pairs of levels will have more similar responses than other pairs. This view is natural (but not necessarily right) when the 'blocks' are interpretable as biological individuals.
(3) Standard ecology view 2. This is the analysis Jean-Francois assumed the book was recommending: omit all the block interactions. (It then doesn't matter whether block is declared as random or not, but we'd usually leave it as fixed.) The weakness of this approach lies in the assumption implied by omitting the block interactions. The analysis is fine if we are happy to assume that all broods have the same difference in their responses to each pair of treatment levels. But if we want to allow the possibility that broods differ in their response to treatment, then this analysis won't be valid. (One circumstance in which this view can be valid is if we're testing a null hypothesis that there are *no* treatment effects: the test is valid under the null hypothesis, and we may be particularly interested in consistent treatment differences across broods.) This view is natural (and some would argue right) when the blocks are truly subdivisions of the experimental material decided on by the experimenter solely to reduce experimental error (the "agricultural" type of block).
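To make the difference between views (2) and (3) concrete, here is a minimal sketch in Python of the two F-ratios for one fixed factor. It is an illustration under stated assumptions, not the book's or the paper's procedure: the design is assumed balanced with one observation per brood-by-treatment cell, the function name `f_ratios_for_factor_a` and the `broods` data are invented for this example (they are not the tulips data or Jean-Francois's data), and only the test for the first fixed factor (A) is shown.

```python
from itertools import product


def f_ratios_for_factor_a(y):
    """Compare two F-tests for fixed factor A in a block x A x B design.

    y[k][i][j] is the single response for block (brood) k, level i of A
    and level j of B. Assumes a balanced design, one value per cell.
    (Hypothetical helper written for illustration only.)
    """
    n, a, b = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(n), range(a), range(b)))

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Grand mean and the marginal means needed for the sums of squares.
    grand = mean(y[k][i][j] for k, i, j in cells)
    m_blk = [mean(y[k][i][j] for i in range(a) for j in range(b)) for k in range(n)]
    m_a = [mean(y[k][i][j] for k in range(n) for j in range(b)) for i in range(a)]
    m_b = [mean(y[k][i][j] for k in range(n) for i in range(a)) for j in range(b)]
    m_ab = [[mean(y[k][i][j] for k in range(n)) for j in range(b)] for i in range(a)]
    m_blk_a = [[mean(y[k][i]) for i in range(a)] for k in range(n)]

    ss_total = sum((y[k][i][j] - grand) ** 2 for k, i, j in cells)
    ss_blk = a * b * sum((m - grand) ** 2 for m in m_blk)
    ss_a = n * b * sum((m - grand) ** 2 for m in m_a)
    ss_b = n * a * sum((m - grand) ** 2 for m in m_b)
    ss_ab = n * sum((m_ab[i][j] - m_a[i] - m_b[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_blk_a = b * sum((m_blk_a[k][i] - m_blk[k] - m_a[i] + grand) ** 2
                       for k in range(n) for i in range(a))

    ms_a = ss_a / (a - 1)

    # View (2): the block x A interaction is the error term for A.
    df_blk_a = (n - 1) * (a - 1)
    f_view2 = ms_a / (ss_blk_a / df_blk_a)

    # View (3): all block interactions (block x A, block x B, block x A x B)
    # are pooled into a single error term.
    ss_pooled = ss_total - ss_blk - ss_a - ss_b - ss_ab
    df_pooled = (n - 1) * (a * b - 1)
    f_view3 = ms_a / (ss_pooled / df_pooled)

    return {
        "view2": (f_view2, (a - 1, df_blk_a)),
        "view3": (f_view3, (a - 1, df_pooled)),
    }


# Invented data: 3 broods, 2 levels of A, 2 levels of B.
# The effect of A differs between broods (2, 4 and 6 units).
broods = [
    [[1, 2], [3, 4]],
    [[2, 3], [6, 7]],
    [[3, 4], [9, 10]],
]

result = f_ratios_for_factor_a(broods)
for view in ("view2", "view3"):
    f, df = result[view]
    print(f"{view}: F = {f:.2f} on {df} df")
# view2: F = 12.00 on (1, 2) df
# view3: F = 36.00 on (1, 6) df
```

With these made-up numbers the two tests disagree: 12 is below the 5% critical value of F on (1, 2) df (about 18.5), while 36 is well above the critical value on (1, 6) df (about 5.99). Because the broods here differ in their response to A, pooling the block interactions gives a much smaller error mean square, which is exactly the situation in which view (3) is questionable.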
The paper discusses different examples of blocks in ecology, and suggests when the different approaches might be appropriate for them. You'll see from the discussion of the three views above that this is quite a fundamental and complex issue, with many ramifications; there is probably no single answer, and the right approach will vary from case to case. Probably, different sub-literatures within ecology are firmly attached to one or other of the two 'standard ecology views', and this may often be linked to the crucial issue of the nature of the blocking. If you read the paper, you may also gather that the authors do not always agree on the right answers!
In general, different areas of science have their own special topics in statistics that need discussing. The core ideas of linear models and model formulae apply very widely, and it is usually possible to express the topics in particular areas in those terms.