Grafen & Hails: Modern Statistics for the Life Sciences
21.01.03 Veronique Perrot, Emory University, Atlanta, GA, USA
Chapter 04, page 61
Q:You are very careful to describe 2 types of SSs, and to give them names much more informative than type I and type III (sequential and adjusted SS respectively). Unfortunately, you don't explain how to calculate them, nor do you draw the parallel with the SSs you described in the 1st chapter. I looked in standard stats textbooks (Zar 2nd ed, Sokal & Rohlf 3rd ed, GLIM for ecologists, Statistical models in S), and that topic doesn't show up. Where did I miss something?
The statistical package I use (R) clearly does not use adj SS, so I guess it uses seq SS. So I can alter the significance of a factor by changing its place in the model... I wouldn't even have suspected it had I not read your book, but now I don't know how to proceed! Help! thanks.
A:The book is aimed at users of the mainstream packages, which all supply Seq and Adj sums of squares. Other programs, such as R, S and GLIM, are more technical and designed for use by people with substantial statistical training. They are wonderful in many ways, but probably not to learn with.
These technical packages produce Seq SS, as these are logically simpler. Adj SS can be obtained with a little extra work. To see what we need to do to obtain Adj SS, we need to look at the definitions. Consider first a model without interactions, say Y=A+B+C. A model has a sequence of terms in it, in this case A, B and C. The Seq SS for a term is adjusted for all preceding terms, while the Adj SS is adjusted for all other terms. So in Y=A+B+C, the Seq and Adj SS for C must be equal because the preceding terms are A and B, and 'all other terms' are also A and B. Thus, to find the Adj SS for A and B, we need to fit models in which the same terms appear but in a different order, with the term of interest placed last. We can obtain the Adj SS for A from the model Y=B+C+A and for B from the model Y=A+C+B.
There is therefore one value for the Adj SS for a variable no matter what order the variables are fitted in, and this is why they are often called "Unique SS". In models without interactions, the automatic production of Adj SS by the mainstream packages is very helpful.
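The reordering recipe for Y=A+B+C can be sketched in a few lines. This is only an illustration in plain Python, not the output of any particular package: the data are invented, A, B and C are treated as simple covariates, and the helpers `rss` and `seq_ss` are hypothetical names introduced here. Each Seq SS is the drop in residual sum of squares when a term is added to the terms before it, and each Adj SS is read off as the Seq SS of the term when it is fitted last.

```python
def rss(y, columns):
    """Residual SS of y after a least-squares fit on an intercept plus `columns`."""
    n = len(y)
    basis = []  # orthonormal basis of the model space, built by Gram-Schmidt
    for col in [[1.0] * n] + [list(map(float, c)) for c in columns]:
        size = sum(v * v for v in col)
        for q in basis:
            coef = sum(v * w for v, w in zip(col, q))
            col = [v - coef * w for v, w in zip(col, q)]
        norm2 = sum(v * v for v in col)
        if norm2 > 1e-10 * size:  # skip columns already in the model space
            basis.append([v / norm2 ** 0.5 for v in col])
    resid = list(map(float, y))
    for q in basis:
        coef = sum(r * w for r, w in zip(resid, q))
        resid = [r - coef * w for r, w in zip(resid, q)]
    return sum(r * r for r in resid)


# Invented data (an assumption for illustration only).
A = [1, 2, 3, 4, 5, 6, 7, 8]
B = [2, 1, 4, 3, 6, 5, 9, 7]
C = [1, 0, 0, 1, 1, 0, 1, 1]
Y = [3.1, 2.9, 6.2, 6.8, 9.9, 9.1, 14.2, 13.0]
terms = {"A": A, "B": B, "C": C}


def seq_ss(order):
    """Seq SS of each term when the terms are fitted in the given order."""
    result, fitted = {}, []
    for name in order:
        before = rss(Y, fitted)
        fitted = fitted + [terms[name]]
        result[name] = before - rss(Y, fitted)
    return result


seq = seq_ss(["A", "B", "C"])  # what a Seq-only package reports for Y=A+B+C
adj = {
    "A": seq_ss(["B", "C", "A"])["A"],  # A fitted last: Y=B+C+A
    "B": seq_ss(["A", "C", "B"])["B"],  # B fitted last: Y=A+C+B
    "C": seq["C"],                      # C is already last in Y=A+B+C
}
for name in "ABC":
    print(f"{name}: Seq SS = {seq[name]:.3f}, Adj SS = {adj[name]:.3f}")
```

In R the same effect should be obtainable by changing the order of the terms in the model formula and reading the last line of the sequential table each time.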
Once a model has interactions, life becomes more complicated, and in a way that is very relevant to the question. Consider the model Y=A+B+C+A*B. If we follow the earlier suggestion, we would try to obtain the Adj SS for A by fitting the model Y=B+C+A*B+A. But this model violates marginality, and quite what will happen will depend on your package. Some refuse to fit a "non-hierarchical model". Others fit it but include A in A*B and so there is nothing left for A. Others (very wickedly) use whatever dummy variables represent A*B and fit those first, and then A.
The main point about these problems, though, is that (again according to marginality) a sum of squares for A does not make sense if it has been adjusted for A*B. A more general principle is "every meaningful sum of squares is a sequential sum of squares in some model". So Veronique can generate all meaningful SS using R; it's just that she may have to fit models with the terms in different orders. And when Veronique cannot reorder the model terms to produce an Adj SS, that is because the Adj SS is meaningless. Thus the routine production of Adj SS by mainstream packages frequently generates Adj SS that users should refuse to interpret. Perhaps these packages should be encouraged to omit, or at least flag, these dubious Adj SS.
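To make this concrete for Y=A+B+C+A*B, here is a rough sketch in plain Python. Everything in it is an assumption made for illustration: the unbalanced data are invented, A and B are two-level factors, C is a covariate, and `rss` and `ss_added` are hypothetical helper names. The first part computes only sums of squares that respect marginality, each one a Seq SS from some legitimate ordering. The second part imitates the "wicked" behaviour by fitting the interaction dummy before A, and suggests why the result is hard to interpret: it changes with the arbitrary coding chosen for the dummy variables, even though the full model fits equally well under both codings.

```python
def rss(y, columns):
    """Residual SS of y after a least-squares fit on an intercept plus `columns`."""
    n = len(y)
    basis = []  # orthonormal basis of the model space, built by Gram-Schmidt
    for col in [[1.0] * n] + [list(map(float, c)) for c in columns]:
        size = sum(v * v for v in col)
        for q in basis:
            coef = sum(v * w for v, w in zip(col, q))
            col = [v - coef * w for v, w in zip(col, q)]
        norm2 = sum(v * v for v in col)
        if norm2 > 1e-10 * size:  # skip columns already in the model space
            basis.append([v / norm2 ** 0.5 for v in col])
    resid = list(map(float, y))
    for q in basis:
        coef = sum(r * w for r, w in zip(resid, q))
        resid = [r - coef * w for r, w in zip(resid, q)]
    return sum(r * r for r in resid)


def ss_added(y, before, new):
    """SS explained by the `new` columns once the `before` columns are fitted."""
    return rss(y, before) - rss(y, before + new)


# Invented, deliberately unbalanced data (an assumption for illustration only).
a = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # factor A, coded 0/1
b = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]   # factor B, coded 0/1
c = [1.0, 2.0, 0.5, 1.5, 3.0, 2.5, 0.0, 1.0, 2.0, 0.5]  # covariate C
y = [4.2, 5.9, 4.8, 6.1, 9.0, 8.1, 5.2, 6.6, 12.3, 10.4]
ab_01 = [i * j for i, j in zip(a, b)]                      # A*B dummy, 0/1 coding
ab_pm = [(2 * i - 1) * (2 * j - 1) for i, j in zip(a, b)]  # A*B dummy, -1/+1 coding

# Meaningful SS: every one is a Seq SS in an ordering that respects marginality.
ss_A = ss_added(y, [b, c], [a])             # from Y=B+C+A+A*B
ss_B = ss_added(y, [a, c], [b])             # from Y=A+C+B+A*B
ss_C = ss_added(y, [a, b, ab_01], [c])      # from Y=A+B+A*B+C
ss_AB = ss_added(y, [a, b, c], [ab_01])     # from Y=A+B+C+A*B
print("A:", ss_A, "B:", ss_B, "C:", ss_C, "A*B:", ss_AB)

# "Wicked" version: A adjusted for the interaction dummy itself.
wicked_01 = ss_added(y, [b, c, ab_01], [a])
wicked_pm = ss_added(y, [b, c, ab_pm], [a])
print("A adjusted for A*B, 0/1 coding:  ", wicked_01)
print("A adjusted for A*B, -1/+1 coding:", wicked_pm)
```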
One final wrinkle. In one common situation with interactions, the Adj SS can be trusted. That is where the design of an experiment is orthogonal. The reason they can be trusted is that orthogonality guarantees that the Adj SS and Seq SS are equal -- and so we can trust them because they are Seq SS.
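A small numerical check of the orthogonal case, again only as a sketch under stated assumptions: the data are invented, the design is a balanced 2x2 layout with two replicates per cell, the factors are coded -1/+1 (a coding chosen here so that the columns for A, B and A*B are mutually orthogonal), and `rss` and `seq_ss` are hypothetical helper names. Under these assumptions, fitting the terms in any order should give the same sum of squares for each term.

```python
def rss(y, columns):
    """Residual SS of y after a least-squares fit on an intercept plus `columns`."""
    n = len(y)
    basis = []  # orthonormal basis of the model space, built by Gram-Schmidt
    for col in [[1.0] * n] + [list(map(float, c)) for c in columns]:
        size = sum(v * v for v in col)
        for q in basis:
            coef = sum(v * w for v, w in zip(col, q))
            col = [v - coef * w for v, w in zip(col, q)]
        norm2 = sum(v * v for v in col)
        if norm2 > 1e-10 * size:  # skip columns already in the model space
            basis.append([v / norm2 ** 0.5 for v in col])
    resid = list(map(float, y))
    for q in basis:
        coef = sum(r * w for r, w in zip(resid, q))
        resid = [r - coef * w for r, w in zip(resid, q)]
    return sum(r * r for r in resid)


# Invented balanced 2x2 design, two replicates per cell, -1/+1 coding (assumptions).
a = [-1, -1, -1, -1, 1, 1, 1, 1]
b = [-1, -1, 1, 1, -1, -1, 1, 1]
ab = [i * j for i, j in zip(a, b)]
y = [4.1, 3.7, 6.0, 6.4, 7.2, 6.8, 12.1, 11.5]
terms = {"A": a, "B": b, "A*B": ab}


def seq_ss(order):
    """Seq SS of each term when the terms are fitted in the given order."""
    result, fitted = {}, []
    for name in order:
        before = rss(y, fitted)
        fitted = fitted + [terms[name]]
        result[name] = before - rss(y, fitted)
    return result


seq = seq_ss(["A", "B", "A*B"])
# Each term fitted last, i.e. adjusted for all the other columns.
adj = {
    "A": seq_ss(["B", "A*B", "A"])["A"],
    "B": seq_ss(["A", "A*B", "B"])["B"],
    "A*B": seq["A*B"],
}
for name in terms:
    print(f"{name}: Seq SS = {seq[name]:.4f}, Adj SS = {adj[name]:.4f}")
```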