## Chapter 9

Analyzing Simple Epidemiologic Data
1. With person-time data, the numerators of rates are considered Poisson random variables, and the denominators are treated as if they were constants, not subject to variability. Nevertheless, the person-time must be measured and is therefore subject to measurement error. Why are the denominators treated as constants if they are subject to measurement error? What would be the effect on the confidence interval of taking this measurement error into account instead of ignoring it?

2. The approximate formulas for confidence intervals described in this chapter do not work well with small numbers. Suppose 20 people are followed, and 1 develops a disease of interest, giving a risk estimate of 1/20 = 0.05. The binomial model would give a 90% confidence interval for the risk from −0.03 to 0.13. The lower limit implies a negative risk, which does not make sense. The lower limit should never go below zero, and the upper limit should never go above 1. These risk estimates, based on only one case, are too small for these approximate formulas. Instead, exact formulas based on the binomial distribution can be used. Would you expect a confidence interval for risk calculated from an exact formula to be symmetric around the point estimate (0.05), as the approximate confidence interval is?
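The arithmetic behind the limits quoted in question 2 can be sketched as follows. This is a minimal illustration that assumes the approximate formula is the usual normal-approximation interval, risk ± Z × SE with SE = sqrt(risk × (1 − risk) / N); the function name `wald_risk_interval` is hypothetical and not taken from the chapter.

```python
import math
from statistics import NormalDist


def wald_risk_interval(cases, at_risk, level=0.90):
    """Approximate (normal-theory) confidence interval for a risk.

    Assumes the simple binomial standard error; nothing here keeps
    the limits inside the range 0 to 1.
    """
    # Two-sided multiplier, e.g. about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    risk = cases / at_risk
    se = math.sqrt(risk * (1 - risk) / at_risk)
    return risk - z * se, risk + z * se


lower, upper = wald_risk_interval(1, 20)
print(f"{lower:.2f} to {upper:.2f}")  # prints "-0.03 to 0.13"
```

The interval is symmetric around 0.05 by construction, which is how the lower limit ends up below zero.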

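3. There is another approximation for obtaining the confidence interval for a binomial proportion that comes closer to the exact method. It is an expression that was proposed in 1927 by Wilson:

$$\frac{N}{N+Z^{2}}\left[\frac{a}{N}+\frac{Z^{2}}{2N}\pm Z\sqrt{\frac{a(N-a)}{N^{3}}+\frac{Z^{2}}{4N^{2}}}\right]$$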

In this formula, a is the number of cases (numerator), N is the number at risk (denominator), and Z is the multiplier from the standard normal distribution that corresponds to the confidence level. The ± sign gives the lower bound when the minus sign is used and the upper bound when the plus sign is used. Even with only 1 case among 20 people, this formula gives results very close to the exact confidence interval, and its accuracy only improves with larger numbers. What is the 90% confidence interval for the risk estimate of 1/20 using Wilson’s equation? If Wilson’s equation is so accurate, why do you suppose that it has not been adopted more widely as the usual approach to getting confidence limits for a binomial variable?
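For readers who want to check their hand calculation, Wilson's interval can be sketched in a few lines. This is a hedged illustration that assumes the standard score-interval form, with a shrinkage factor N / (N + Z²) applied to a/N + Z²/(2N) ± Z × sqrt(a(N − a)/N³ + Z²/(4N²)); the function name `wilson_interval` and its `level` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(a, n, level=0.90):
    """Wilson (1927) score interval for a binomial proportion.

    a: number of cases (numerator)
    n: number at risk (denominator)
    level: confidence level, used to look up the multiplier Z
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    shrink = n / (n + z**2)              # pulls the interval toward 0.5
    center = a / n + z**2 / (2 * n)
    half_width = z * math.sqrt(a * (n - a) / n**3 + z**2 / (4 * n**2))
    return shrink * (center - half_width), shrink * (center + half_width)
```

Calling `wilson_interval(1, 20)` gives the limits asked for in question 3. Because of the shrinkage factor and the Z² terms, the limits always stay between 0 and 1.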

4. Why are the estimation equations to obtain confidence intervals the same for prevalence data and for risk data (see Equations 9–2 and 9–3)?

5. Why do the estimation equations for confidence intervals differ for risk data and case-control data (see Equations 9–3 and 9–6), whereas the formula for obtaining a χ statistic to test the null hypothesis is the same for risk data and case-control data (see Equation 9–7)?

6. Does it lend a false sense of precision to present a 90% confidence interval instead of a 95% confidence interval?

7. Calculate a 90% confidence interval and a 95% confidence interval for the odds ratio from the following crude case-control data relating to the effect of exposure to magnetic fields on risk of acute leukemia in children:

Median Nighttime Exposure

Adapted from: Michaelis J, Schüz J, Meinert R, Zemann E, Grigat J-P, Kaatsch P, Kaletsch U, Miesner A, Brinkmann K, Kalkner W, Kärner H. Combined risk estimates for two German population-based case-control studies on residential magnetic fields and childhood acute leukemia. Epidemiology 1997;9:92–94.
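The calculation asked for in question 7 can be set up as sketched below. The sketch assumes the usual large-sample interval for an odds ratio, computed on the log scale with standard error sqrt(1/a + 1/b + 1/c + 1/d); whether this matches Equation 9–6 exactly is an assumption. The function name and the four cell counts are hypothetical placeholders, not the study's data.

```python
import math
from statistics import NormalDist


def odds_ratio_interval(exp_cases, unexp_cases, exp_controls,
                        unexp_controls, level=0.90):
    """Odds ratio with an approximate log-scale confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    odds_ratio = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    # Standard error of ln(OR): square root of the summed reciprocal counts.
    se_log = math.sqrt(1 / exp_cases + 1 / unexp_cases
                       + 1 / exp_controls + 1 / unexp_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not from the cited study).
or_hat, lo90, hi90 = odds_ratio_interval(20, 80, 10, 90, level=0.90)
_, lo95, hi95 = odds_ratio_interval(20, 80, 10, 90, level=0.95)
print(f"OR = {or_hat:.2f}")
print(f"90% CI: {lo90:.2f} to {hi90:.2f}")
print(f"95% CI: {lo95:.2f} to {hi95:.2f}")
```

Substituting the four cell counts from the table gives both intervals; the 95% interval is always the wider of the two because only the multiplier Z changes.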