# Lesson 7 — Count and Rate Data (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5,196 words • ~28 min audio*

---

**Sarah:** Welcome back to Office Hours. I'm Sarah.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson 7, Count and Rate Data. And this is the lesson where we leave continuous outcomes and binary outcomes behind, and we take on a whole new outcome type, counts.

**Sarah:** Right. So far in this course we've covered linear regression for continuous outcomes, like blood pressure or BMI. Then logistic regression for binary outcomes, like disease versus no disease. Then in Lesson 6 we extended that to ordinal and multinomial models for outcomes with more than two categories. And now we're moving to counts.

**Kiffer:** And counts are everywhere in epidemiology. Number of new disease cases in a region. Number of doctor visits in a year. Number of asthma attacks a child has over six months. Number of falls in elderly people. Number of parasites on a host animal. Whenever the outcome is the number of times something happened, you're in count territory.

**Sarah:** And counts have some properties that make them really different from continuous or binary outcomes. They're discrete, you can't have two and a half doctor visits. They're non-negative, you can't have minus three asthma attacks. And in real epidemiological data they tend to be right-skewed. Most people have zero or one event, and a small number of people have a lot.

**Kiffer:** Which means linear regression is the wrong tool. Linear regression assumes the outcome can be any real number, positive or negative, with errors that are roughly normally distributed. Counts violate both of those. So we need a different family of models. That's what this lesson is about.

**Sarah:** And the lesson opens by naming four types of count and rate data you'll see in epi work. Worth listing because they each demand slightly different analytic moves. First, simple counts. Just the number of events, with no denominator. Like the number of cases in a hospital this week.

**Kiffer:** Second, rates with person-time denominators. Number of events divided by accumulated person-time at risk. Person-years, person-months, person-days. The standard rate measure when follow-up varies. And the offset trick we'll get to later is exactly how you model this kind of data.

**Sarah:** Third, population rates. Number of events divided by the population size, often per 100,000 people. Routine surveillance data, like cancer registry rates by province, falls in this category.

**Kiffer:** And fourth, area-based counts. Counts of events per geographic area, like number of asthma cases per neighborhood. These often have spatial structure that needs special handling, but the basic count machinery still applies.

**Sarah:** The lesson walks through four sections, but conceptually they hang together as three big moves. First, Poisson regression, the default model for counts. Second, evaluating the Poisson model and diagnosing overdispersion, which is the most common problem you'll run into. And third, the standard fixes, negative binomial regression for overdispersion and zero-adjusted models for excess zeros, plus the offset trick for handling subjects with different observation periods.

**Kiffer:** Let's start with Poisson. The Poisson distribution is named after the French mathematician Siméon Denis Poisson, who introduced it in the 1830s. It's the foundational probability distribution for count data.

**Sarah:** And what does the Poisson distribution actually describe?

**Kiffer:** It describes the probability of observing a given number of events in a fixed interval of time or space, assuming events occur independently at a constant average rate. So if you know that on average a hospital emergency department gets about three pediatric asthma admissions per night, the Poisson distribution will tell you the probability that tonight you'll see exactly zero, exactly one, exactly two, exactly five, exactly ten admissions.

**Sarah:** And the parameter that controls the whole distribution is lambda, the Greek letter. Lambda is both the average number of events and, importantly, the variance of the number of events. We'll come back to that in a minute because it's load-bearing.

**Kiffer:** Right. So lambda is the rate. And the Poisson distribution gives you the probability of observing k events when events occur randomly with rate lambda. In the textbook this is written as a formula with lambda raised to the k, times e to the negative lambda, divided by k factorial. We don't need to memorize the formula. What matters is the intuition. Higher lambda, more events on average. Lower lambda, fewer events on average.
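For anyone following along on paper, here is a minimal sketch of the formula Kiffer just read out, using the three-admissions-per-night example. It uses only the Python standard library; the function name `poisson_pmf` is just an illustrative choice.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # average pediatric asthma admissions per night, from the example

# Probability of seeing exactly k admissions tonight
for k in (0, 1, 2, 5, 10):
    print(f"P({k} admissions) = {poisson_pmf(k, lam):.4f}")
```

Raising `lam` shifts the whole distribution to the right; lowering it piles probability onto zero and one.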

**Sarah:** And the Poisson is defined for non-negative integers only. Zero, one, two, three, and so on. You can't have a fractional Poisson outcome. Which matches counts perfectly.

**Kiffer:** There are some assumptions baked into the Poisson distribution that are worth naming, because they're going to come back when we talk about why the Poisson model so often breaks down.

**Sarah:** Yeah. There are basically four assumptions. One, events are independent of each other. Two, the rate at which events occur is constant over the observation period. Three, two events can't occur at exactly the same instant. And four, the probability of an event in a short interval is proportional to the length of the interval.

**Kiffer:** And in real epidemiological settings, those assumptions are often only approximately true. Which is why so much of this lesson is about diagnosing and fixing the situations where the Poisson model doesn't quite work.

**Sarah:** Now let's talk about Poisson regression. The model that lets us bring predictors into the picture.

**Kiffer:** Right. The plain Poisson distribution just describes counts. It doesn't tell you why one person has more events than another. Poisson regression lets you model how the expected count varies with predictors. Like age, sex, smoking status, neighborhood, whatever you think matters.

**Sarah:** And the way the model is built is through a log link function. Let me try to put it in plain words. The Poisson regression model says the log of the expected count equals beta zero plus beta one times X one plus beta two times X two, and so on through your predictors.

**Kiffer:** Beta zero is the intercept, the log of the expected count when all predictors are zero. Beta one is the slope for the first predictor. Beta two is the slope for the second predictor. Same structure as linear regression. The only twist is that we're modelling the log of the expected count, not the count itself.

**Sarah:** And why the log? Why not just model the expected count directly?

**Kiffer:** Two reasons. One, the log keeps the predictions on the right side of zero. Counts can't be negative. If we modelled the expected count directly with a linear function of predictors, nothing would stop the prediction from going negative for some combination of predictors. Taking the log and then exponentiating at the end guarantees a non-negative prediction.

**Sarah:** Okay, and what's the second reason?

**Kiffer:** The log link gives us a really clean interpretation on the rate-ratio scale. Because of how exponentials work, if you exponentiate beta one, you get the multiplicative effect of a one-unit change in X one on the expected count. And that multiplicative effect is what we call the rate ratio, or in this context, the incidence rate ratio. Often shortened to IRR.
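Written out, the model and the step from a coefficient to a rate ratio look roughly like this. The notation ($Y$ for the count, $X_1, X_2$ for predictors, $x$ for a particular value of $X_1$) is an assumption for illustration, not necessarily the lesson's own.

$$
\log \mathbb{E}[Y \mid X] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots
$$

$$
\frac{\mathbb{E}[Y \mid X_1 = x + 1]}{\mathbb{E}[Y \mid X_1 = x]}
= \frac{e^{\beta_0 + \beta_1 (x + 1) + \beta_2 X_2 + \dots}}{e^{\beta_0 + \beta_1 x + \beta_2 X_2 + \dots}}
= e^{\beta_1}
$$

Everything except $\beta_1$ cancels in the ratio, which is why the exponentiated coefficient reads as a multiplicative effect that does not depend on where you start.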

**Sarah:** So let's pause and define incidence rate ratio. Incidence rate ratio is the ratio of the rate of events in one group compared to another. If smokers have an incidence rate ratio of 1.4 for lung cancer compared to non-smokers, that means smokers experience the event 40 percent more often per unit of follow-up time.

**Kiffer:** And it's analogous to the odds ratio in logistic regression and the risk ratio in cohort studies. It's the multiplicative summary measure for rates. Greater than one means more events in the exposed group. Less than one means fewer. Equal to one means no difference.

**Sarah:** So the workflow for Poisson regression is, you fit the model, R or Stata or whatever software gives you back the betas. You exponentiate each beta. And those exponentiated betas are your incidence rate ratios.

**Kiffer:** And the lesson gives a nice worked example. A study of mastitis in dairy herds. Mastitis is an inflammation of the udder, very common in dairy cows. They model the number of mastitis cases per herd over one year, with herd size as the predictor. They get a beta of 0.012 for herd size.

**Sarah:** And to get the IRR you exponentiate. e to the 0.012 is 1.012. So for each additional cow in the herd, the rate of mastitis goes up by about 1.2 percent. Small per-cow effect, but if you scale up to a hundred more cows in the herd, you compound that effect. e to the 0.012 times 100, which is e to the 1.2, gives you about 3.32. So a herd that has a hundred more cows than another has roughly a 3.3-fold higher mastitis rate.

**Kiffer:** And that compounding is just what exponentials do. The model is multiplicative on the rate scale, additive on the log scale. Once you internalize that, the interpretation comes easily.
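The mastitis arithmetic can be checked in a few lines. This sketch only reproduces the numbers quoted above; the variable names are illustrative.

```python
import math

beta_herd_size = 0.012  # log rate ratio per additional cow, from the example

irr_per_cow = math.exp(beta_herd_size)
irr_per_100_cows = math.exp(beta_herd_size * 100)

# Multiplying the per-cow IRR by itself 100 times gives the same answer,
# which is what "additive on the log scale" means in practice.
assert math.isclose(irr_per_cow ** 100, irr_per_100_cows)

print(f"IRR per cow:      {irr_per_cow:.3f}")
print(f"IRR per 100 cows: {irr_per_100_cows:.2f}")
```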

**Sarah:** Okay. Now we get to the part of the Poisson setup that drives the rest of the lesson. The key Poisson assumption.

**Kiffer:** Yeah. The Poisson distribution has this very specific property. The mean equals the variance. Both of them equal lambda. There's only one parameter, and it controls both the average and the spread.

**Sarah:** And let me unpack what that means in real terms. If you're modelling the number of doctor visits per person per year, and the average is two, the Poisson distribution says the variance also has to be two. Which constrains how spread out the data can be.

**Kiffer:** And in real epidemiological data, that constraint is rarely satisfied. Almost always, when you actually look at count data in the wild, the variance is bigger than the mean. Sometimes a lot bigger.

**Sarah:** Which is the phenomenon called overdispersion. The observed variance is larger than the variance the Poisson model would predict. Over, as in too much. Dispersion, as in spread.

**Kiffer:** And it really matters because it's not just an aesthetic problem. Overdispersion has direct consequences for inference. If you fit a Poisson model to data that are overdispersed, the standard errors come out too small.

**Sarah:** And too-small standard errors mean confidence intervals that are too narrow. Which means p-values that are too small. Which means you're going to falsely declare results statistically significant when they really aren't.

**Kiffer:** Right. The technical term for that mistake is inflated Type One error. Let me describe what Type One error means in plain words. A Type One error is rejecting the null hypothesis when the null is actually true. Saying you found an effect when there isn't one. A false positive. And if your standard errors are too small, you'll commit Type One errors more often than your nominal alpha level says you should.

**Sarah:** So overdispersion is the most consequential problem in count modelling. And much of what's left in this lesson is essentially about how to detect it and how to fix it.

**Kiffer:** Before we move on, the lesson gives a couple of intuitions for why overdispersion happens. The most common reason in epidemiology is what's called unobserved heterogeneity.

**Sarah:** Unobserved heterogeneity. Walk me through what that actually means in plain terms.

**Kiffer:** It just means that different people have genuinely different underlying rates, and you don't have enough predictors in your model to capture those differences. So in the doctor-visits example, you might have age and sex in your model, but the people with chronic conditions, severe anxiety, or social isolation are going to have much higher visit rates than people without those characteristics. If you don't measure and adjust for those characteristics, the variation between people leaks into the residual variance and shows up as overdispersion.

**Sarah:** Other sources of overdispersion include clustering, like animals within herds or kids within schools, where events within a cluster are correlated and that violates the independence assumption.

**Kiffer:** And there can be excess zeros, which we'll come back to later in the lesson. Lots of people having literally zero events, more than the Poisson distribution can comfortably accommodate.

**Sarah:** The lesson is also careful to flag a really important distinction. Apparent overdispersion versus real overdispersion.

**Kiffer:** Yeah, this is worth slowing down on. Apparent overdispersion looks like overdispersion, but it's actually a sign that your model is misspecified. You can fix it by fixing the model. Real overdispersion is genuine extra-Poisson variation that no amount of model improvement is going to make go away. You have to use a different distribution.

**Sarah:** Examples of apparent overdispersion. A few outliers in the data inflating the dispersion statistic. Missing important predictors so unexplained variation looks like overdispersion. Wrong functional form, like a linear predictor when the relationship is actually curved.

**Kiffer:** And the warning the lesson really wants you to internalize is this. Before you reach for a fancier distribution like negative binomial, check whether you can fix the Poisson model first. Add the missing predictors. Investigate the outliers. Try a quadratic term. If you can resolve the overdispersion that way, that's the principled solution.

**Sarah:** Because if you apply an overdispersion correction to a misspecified model, you can mask the misspecification. The corrected standard errors look fine, but you're hiding a real signal in the data.

**Kiffer:** Right. The warning is, fix the model first. Then if overdispersion remains, that's real overdispersion, and that's what the negative binomial section coming up is for.

**Sarah:** So how do you actually detect overdispersion in practice?

**Kiffer:** The standard diagnostic is the dispersion parameter. You compute the sum of squared Pearson residuals, which are the standardized differences between observed and expected counts, and divide by the residual degrees of freedom. Under the Poisson assumption, that ratio should be about one. If it's substantially bigger than one, you have overdispersion. If it's less than one, you have underdispersion, which is rarer but possible.
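A minimal sketch of that calculation, assuming you already have observed counts and fitted values from a Poisson model. The visit counts below are made up, and the fit is the simplest possible one: an intercept-only Poisson model, whose fitted value for everyone is the sample mean.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Sum of squared Pearson residuals over residual degrees of freedom."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)

# Hypothetical doctor-visit counts for 12 people (illustration only)
visits = [0, 0, 1, 0, 2, 9, 0, 1, 12, 0, 3, 0]

# Intercept-only Poisson fit: every fitted value equals the sample mean
mean_visits = sum(visits) / len(visits)
fitted = [mean_visits] * len(visits)

phi = pearson_dispersion(visits, fitted, n_params=1)
print(f"dispersion statistic: {phi:.2f}")  # well above 1 for these counts
```

With real predictors in the model, `fitted` would hold each subject's own predicted count and `n_params` would count every estimated coefficient.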

**Sarah:** And there are formal tests too. In R, the dispersiontest function in the AER package tests overdispersion explicitly. If it's significant, you've got statistical evidence of overdispersion, and once you've ruled out misspecification, you should consider switching distributions.

**Kiffer:** There are also a couple of residual types worth knowing. Pearson residuals are the basic standardized residuals, observed minus expected divided by the square root of expected. Deviance residuals are based on the log-likelihood contribution of each point. They tend to be more nicely behaved when expected counts are small. And Anscombe residuals use a power transformation of the counts, the two-thirds power in the Poisson case, to make the residuals as close to normally distributed as possible.
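For reference, here is a sketch of the three residuals for a single observation with count `y` and fitted mean `mu`. These are the forms usually written for the Poisson case; in particular, the Anscombe version shown uses the two-thirds power that applies to the Poisson. The function names are illustrative.

```python
import math

def pearson_residual(y, mu):
    """Observed minus expected, divided by the Poisson standard deviation."""
    return (y - mu) / math.sqrt(mu)

def deviance_residual(y, mu):
    """Signed square root of the observation's contribution to the deviance."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0  # y*log(y/mu) is 0 at y = 0
    contribution = max(0.0, 2.0 * (log_term - (y - mu)))  # guard against rounding
    return math.copysign(math.sqrt(contribution), y - mu)

def anscombe_residual(y, mu):
    """Poisson Anscombe residual, built on a two-thirds power transformation."""
    return 1.5 * (y ** (2 / 3) - mu ** (2 / 3)) / mu ** (1 / 6)

for y in (0, 2, 6):
    mu = 2.0  # hypothetical fitted mean
    print(y, round(pearson_residual(y, mu), 3),
          round(deviance_residual(y, mu), 3),
          round(anscombe_residual(y, mu), 3))
```

All three are zero when the observed count equals the fitted mean and share the same sign otherwise; they differ in how they treat large positive departures.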

**Sarah:** Different residuals for different diagnostic plots. Pearson for goodness of fit, deviance and Anscombe for normal probability plots. The lesson notes that a thorough evaluation uses all three.

**Kiffer:** And there's a hierarchy of corrections worth knowing about. The simplest correction is to scale the standard errors by the square root of the dispersion parameter, which is what a quasi-Poisson fit does. Coefficients don't change, but confidence intervals widen. That's a quick fix when overdispersion is mild to moderate and you trust your point estimates.
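To see what that scaling does to an interval, here is a small sketch. The coefficient, standard error, and dispersion value are all hypothetical numbers chosen for illustration.

```python
import math

beta = 0.30          # hypothetical log rate ratio from a Poisson fit
se_poisson = 0.10    # hypothetical model-based standard error
dispersion = 4.0     # hypothetical Pearson dispersion statistic

# Scale the standard error; the point estimate is left alone
se_scaled = se_poisson * math.sqrt(dispersion)

def irr_ci(beta, se, z=1.96):
    """Approximate 95% confidence interval for the incidence rate ratio."""
    return math.exp(beta - z * se), math.exp(beta + z * se)

naive = irr_ci(beta, se_poisson)
scaled = irr_ci(beta, se_scaled)
print(f"IRR = {math.exp(beta):.2f}")
print(f"naive CI:  {naive[0]:.2f} to {naive[1]:.2f}")
print(f"scaled CI: {scaled[0]:.2f} to {scaled[1]:.2f}")
```

With these numbers the unscaled interval excludes one and the scaled interval does not, which is the false-positive risk described earlier in miniature.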

**Sarah:** More principled options include negative binomial regression, which we'll get to next. Random effects models, which use a random intercept to capture unobserved heterogeneity, especially useful in clustered data. And generalized estimating equations, often called GEE, which use sandwich standard errors to handle clustering when you want population-averaged estimates rather than subject-specific ones.

**Kiffer:** Each of those is appropriate in slightly different situations. Scaling standard errors is the quickest. Negative binomial is the standard distributional alternative. Random effects are right when the clustering structure is hierarchical. GEE when you want marginal interpretation. Different tools, same underlying problem.

**Kiffer:** Good. So that wraps the Poisson part. Log of the expected count is a linear function of predictors. Exponentiate the betas to get incidence rate ratios. Watch out for overdispersion, which violates the mean-equals-variance assumption, makes standard errors too small, and inflates Type One error.

**Sarah:** Now on to negative binomial regression. The standard solution for overdispersed counts.

**Kiffer:** Yeah. The negative binomial distribution extends the Poisson by adding an extra parameter, often called the dispersion parameter and written as alpha. That extra parameter explicitly allows the variance to be larger than the mean.

**Sarah:** And here's a really intuitive way to think about what the negative binomial is doing. It's a Poisson distribution where the rate parameter itself varies across individuals. Each subject has their own lambda, drawn from a separate distribution, usually a Gamma distribution. When you mix a Poisson with a Gamma like that, you get a negative binomial.

**Kiffer:** Which connects back to the unobserved heterogeneity intuition. Different people genuinely have different rates. The negative binomial bakes that heterogeneity into the model directly, instead of pretending it doesn't exist.
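The mixture idea is easy to simulate. This sketch, using only the standard library, compares counts where everyone shares one rate with counts where each person's rate is drawn from a Gamma distribution with the same average. The mean of 2, the dispersion of 0.5, the seed, and the helper name `draw_poisson` are all assumptions for illustration.

```python
import math
import random
import statistics

def draw_poisson(lam, rng):
    """One Poisson draw by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(7)
n = 20_000
mu, alpha = 2.0, 0.5  # hypothetical mean and dispersion

# Everyone shares the same rate: plain Poisson
same_rate = [draw_poisson(mu, rng) for _ in range(n)]

# Each person gets their own rate from a Gamma with shape 1/alpha and
# scale alpha * mu, which has mean mu
mixed_rate = [draw_poisson(rng.gammavariate(1 / alpha, alpha * mu), rng)
              for _ in range(n)]

for label, counts in (("same rate", same_rate), ("varying rate", mixed_rate)):
    m, v = statistics.fmean(counts), statistics.variance(counts)
    print(f"{label:>12}: mean = {m:.2f}, variance = {v:.2f}")
```

Both sets of counts have about the same mean, but the varying-rate counts should show a variance near mu plus alpha times mu squared, which is 4 here, rather than 2.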

**Sarah:** And operationally, the negative binomial regression model uses the same log-linear form as Poisson regression. Log of the expected count equals beta zero plus beta one times X one and so on. Same coefficients, same interpretation. The exponentiated betas are still incidence rate ratios.

**Kiffer:** The only difference is the assumed variance structure. The negative binomial allows the variance to exceed the mean, controlled by alpha. When alpha equals zero, the negative binomial reduces to the Poisson. So the Poisson is a special case of the negative binomial, which means you can formally test one against the other.

**Sarah:** How do you test it?

**Kiffer:** A likelihood ratio test for whether alpha equals zero. If the test is significant, the extra parameter is doing real work, and the negative binomial fits significantly better than the Poisson. If it's not significant, you don't have evidence of overdispersion and you can stick with Poisson. The lesson notes there's a small technicality, because alpha equal to zero is at the boundary of the parameter space, the standard chi-squared p-value is conservative. But the test still works.
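A sketch of that test by hand, assuming you have the two maximized log-likelihoods. The values below are hypothetical. The usual adjustment for the boundary problem is to halve the chi-squared p-value on one degree of freedom, which is what the last line does.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

loglik_poisson = -412.6  # hypothetical maximized log-likelihood, Poisson
loglik_negbin = -409.9   # hypothetical maximized log-likelihood, negative binomial

lr_stat = 2.0 * (loglik_negbin - loglik_poisson)
p_naive = chi2_sf_1df(lr_stat)   # conservative, ignores the boundary
p_boundary = 0.5 * p_naive       # halved because alpha = 0 sits on the boundary

print(f"LR statistic = {lr_stat:.2f}")
print(f"naive p = {p_naive:.4f}, boundary-adjusted p = {p_boundary:.4f}")
```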

**Sarah:** There are actually two flavors of negative binomial, called NB-1 and NB-2. They differ in how the variance grows with the mean.

**Kiffer:** NB-1 says the variance grows linearly with the mean. Variance equals mu plus alpha times mu. So the ratio of variance to mean is constant, just bigger than one. That's similar to a quasi-Poisson model with a fixed dispersion factor.

**Sarah:** And NB-2, which is the default in most software, including R's glm dot nb function and Stata's nbreg, says the variance grows quadratically with the mean. Variance equals mu plus alpha times mu squared. So observations with higher expected counts have proportionally more variance.

**Kiffer:** And NB-2 tends to fit biological data better in practice, because variability really does seem to grow faster than the average in most settings. So unless you have a specific reason to use NB-1, NB-2 is the standard choice.
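The two variance functions side by side, as a small sketch. The dispersion value of 0.5 is hypothetical.

```python
def variance_nb1(mu, alpha):
    """NB-1: variance grows linearly with the mean."""
    return mu + alpha * mu

def variance_nb2(mu, alpha):
    """NB-2: variance grows quadratically with the mean."""
    return mu + alpha * mu ** 2

alpha = 0.5  # hypothetical dispersion parameter
for mu in (1.0, 5.0, 20.0):
    ratio1 = variance_nb1(mu, alpha) / mu
    ratio2 = variance_nb2(mu, alpha) / mu
    print(f"mean {mu:>4}: variance/mean is {ratio1:.1f} under NB-1, {ratio2:.1f} under NB-2")
```

Under NB-1 the variance-to-mean ratio stays at one plus alpha whatever the mean; under NB-2 it climbs as the mean climbs. Setting alpha to zero gives back the Poisson variance in both.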

**Sarah:** And the practical payoff is that the negative binomial gives you correct standard errors when overdispersion is present. Your point estimates of the incidence rate ratios are usually similar to what Poisson would have given you. But your confidence intervals widen appropriately. So you stop falsely declaring results significant when they shouldn't be.

**Kiffer:** Now alongside negative binomial, the lesson also covers a separate problem that sometimes shows up alongside or instead of overdispersion. Excess zeros.

**Sarah:** Excess zeros. Walk me through how that shows up in real data.

**Kiffer:** Sometimes the data have way more zero counts than even a Poisson or negative binomial would predict. Like if you're modelling the number of cigarettes smoked per day in the general population, an enormous fraction of people smoke zero. Not because they happened to smoke zero on this particular day, but because they're never-smokers. They will never smoke. They're a fundamentally different population from the smokers.

**Sarah:** And the standard count models can struggle with that, because they're trying to fit one distribution to two populations. The structural zeros from never-smokers, plus the count process for everyone else.

**Kiffer:** Right. And there are two model families designed to handle that. Zero-inflated models and hurdle models. Let's take them in order.

**Sarah:** Zero-inflated models first. There's zero-inflated Poisson, often abbreviated ZIP, and zero-inflated negative binomial, abbreviated ZINB. Same idea, different count distribution underneath.

**Kiffer:** And the structure of a zero-inflated model is what's called a mixture. Two components stitched together. The first component is a logistic regression that predicts the probability that an observation is a structural zero. A guaranteed zero, regardless of anything else. The second component is a count model, Poisson or negative binomial, that predicts the count for everyone who isn't a structural zero.

**Sarah:** And here's the subtle bit. In a zero-inflated model, zeros can come from either component. Some zeros are structural, the never-smokers in our example. And some zeros are from the count process, smokers who happened to smoke nothing today.

**Kiffer:** Right. The model accommodates both kinds of zeros. The logistic part determines which population you came from. And then if you came from the count population, the count part predicts your number, which might happen to be zero.
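As a sketch, the zero-inflated Poisson probabilities can be written in a few lines. The structural-zero probability of 0.6 and the count mean of 2 are hypothetical, and `zip_pmf` is just an illustrative name.

```python
import math

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

def zip_pmf(k, lam, pi_structural):
    """Zero-inflated Poisson: structural zeros mixed with an ordinary Poisson."""
    count_part = (1.0 - pi_structural) * poisson_pmf(k, lam)
    # A zero can come from either component; positive counts only from the Poisson
    return pi_structural + count_part if k == 0 else count_part

lam, pi_structural = 2.0, 0.6  # hypothetical values
print(f"P(0) under plain Poisson: {poisson_pmf(0, lam):.3f}")
print(f"P(0) under ZIP:           {zip_pmf(0, lam, pi_structural):.3f}")
print(f"  of which structural:    {pi_structural:.3f}")
print(f"  of which count zeros:   {(1 - pi_structural) * poisson_pmf(0, lam):.3f}")
```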

**Sarah:** Then hurdle models. Same general idea, but a different structure.

**Kiffer:** In a hurdle model, the first part also predicts whether the count is zero or non-zero. But here, all zeros come from that first part. The second part is a truncated count distribution for people with at least one event. Truncated meaning the count distribution is conditioned on being greater than zero.

**Sarah:** So in a hurdle model the two parts are clean. Zero versus non-zero, and then for the non-zeros, how big is the count. Whereas in a zero-inflated model, zeros can come from either side.
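The hurdle structure, sketched with a Poisson count part. The zero probability of 0.6 and the mean of 2 are hypothetical, and the function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

def hurdle_pmf(k, lam, p_zero):
    """Hurdle Poisson: a binary part for zero, a truncated Poisson for the rest."""
    if k == 0:
        return p_zero  # every zero comes from the binary part
    # Poisson conditioned on being at least one
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated

lam, p_zero = 2.0, 0.6  # hypothetical values
for k in range(4):
    print(f"P({k}) = {hurdle_pmf(k, lam, p_zero):.3f}")
```

Here the probability of zero is exactly `p_zero`, with no contribution from the count part. That is the difference from the zero-inflated model, where the count part adds its own zeros.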

**Kiffer:** And that subtle distinction has practical consequences. If your data-generating process really does have two distinct kinds of zeros, structural zeros plus count zeros, the zero-inflated model is more faithful. If the zero-versus-non-zero decision and the count-magnitude decision are best understood as separate processes, the hurdle model is the cleaner fit.

**Sarah:** An epidemiological example where the distinction matters. Healthcare utilization. Some people have zero doctor visits this year because they have no access to care, no insurance, no usual provider. That's a structural zero. Other people have zero visits because they had access but happened not to use it. Those are count zeros.

**Kiffer:** And if you wanted to model the two decisions separately, with covariates predicting any use versus none in one part and intensity of use among users in the other, a hurdle model gives you a cleaner mapping. But if you think both kinds of zeros coexist and are mixed together, zero-inflated is the way.

**Sarah:** There's also a third small case the lesson mentions. Zero-truncated models. Used when zeros literally cannot occur in your data.

**Kiffer:** Yeah, this comes up when the sampling design excludes zeros. Hospital length of stay, for example. If you're sampling hospital admissions, every admission has at least one day. There's no such thing as a zero-day admission. Or number of items purchased among customers who walked out with at least one item.

**Sarah:** In those settings the standard count distributions don't quite work, because they put nonzero probability on the value zero. Zero-truncated models adjust the distribution so that probability is redistributed onto positive integers.

**Kiffer:** Different problem from zero inflation. Same family of fixes.

**Sarah:** How do you decide between a standard count model and a zero-inflated alternative? Is there a formal test?

**Kiffer:** There's the Vuong test, which is specifically designed for non-nested model comparisons. Standard Poisson and zero-inflated Poisson aren't strictly nested, so the Vuong test is the conventional way to compare them. AIC and BIC also help. And just looking at the proportion of zeros in your data versus what the basic Poisson predicts is a useful first diagnostic.
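The zero-proportion check is quick to do by hand. This sketch uses made-up counts and the simplest case, an intercept-only Poisson model, where the predicted share of zeros is e to the minus sample mean.

```python
import math

# Hypothetical counts for 20 people (illustration only)
counts = [0] * 14 + [1, 2, 2, 3, 4, 6]

observed_zero_share = counts.count(0) / len(counts)

# Under an intercept-only Poisson fit, P(0) = exp(-mean)
mean_count = sum(counts) / len(counts)
predicted_zero_share = math.exp(-mean_count)

print(f"observed share of zeros:  {observed_zero_share:.2f}")
print(f"Poisson-predicted share:  {predicted_zero_share:.2f}")
```

With covariates in the model, the predicted share would instead be the average of exp(-mu) across subjects, using each subject's own fitted mean.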

**Sarah:** Good. So that's the negative binomial and its zero-adjusted relatives. Negative binomial when you have overdispersion. Zero-inflated and hurdle when you have excess zeros. Zero-truncated when zeros can't occur by design. The choice depends on what's actually happening in your data.

**Kiffer:** Now let's talk about rate models and offsets. This is where we handle a really common practical problem. Different subjects observed for different amounts of time.

**Sarah:** And this comes up constantly in real epidemiology. People drop out, people enroll late, people die before the study ends, people move away. So when you tally up the number of events for each person, you're tallying it up over different observation windows.

**Kiffer:** Right. And if you just compare counts directly, you're going to get misleading answers. Someone followed for two years and someone followed for one year shouldn't be treated equivalently for a count of events. The person followed twice as long has twice as much opportunity to have an event, even at the same underlying rate.

**Sarah:** So what do you do? The natural move is to model the rate, events per unit time, instead of the raw count.

**Kiffer:** And the trick that makes that possible inside a Poisson or negative binomial regression is called the offset.

**Sarah:** Walk me through what the offset actually is.

**Kiffer:** An offset is a fixed term that you add to the linear predictor in your regression, with its coefficient locked at one. It doesn't get estimated. It just enters the model as a known quantity, shifting the intercept onto the rate scale.

**Sarah:** And specifically, the offset you add is the log of the exposure time. Log of follow-up years, or log of person-years, or log of person-months, depending on the units.

**Kiffer:** Right. So your Poisson regression model with an offset reads, the log of the expected count equals log of exposure time plus beta zero plus beta one times X one and so on. The log of exposure time is on the right-hand side as a fixed quantity, not estimated.

**Sarah:** And the algebra works out beautifully. Because log of expected count minus log of exposure time equals log of expected count divided by exposure time. Which is the log of the expected rate. So when you include the offset, you're effectively modelling the rate, even though the outcome variable is still the raw count.
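On paper, the step Sarah describes is one line of algebra. Here $t$ stands for exposure time and $Y$ for the count; both symbols are assumptions for illustration.

$$
\log \mathbb{E}[Y] = \log t + \beta_0 + \beta_1 X_1 + \dots
\quad\Longrightarrow\quad
\log \frac{\mathbb{E}[Y]}{t} = \beta_0 + \beta_1 X_1 + \dots
$$

The left-hand side of the second equation is the log of the expected rate, so $e^{\beta_0}$ is the baseline rate per unit of exposure and $e^{\beta_1}$ is a ratio of rates.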

**Kiffer:** So the offset converts a count regression into a rate regression. Same machinery. Same software. Same likelihood. The only thing that changes is that exposure time has been factored out.

**Sarah:** And the interpretation changes accordingly. The exponentiated betas are now ratios of rates, events per unit of follow-up time. So if your offset is log person-years, the incidence rate ratio for smoking compares events per person-year in smokers with events per person-year in non-smokers. The ratio itself has no units. Which is what you actually want when follow-up varies.

**Kiffer:** And one really common confusion to clear up. People sometimes ask, why don't I just include log of exposure time as a regular predictor? Why does it have to be an offset?

**Sarah:** Yeah, that's a question that comes up a lot. What's the answer?

**Kiffer:** Because if you let the regression estimate a coefficient for log of exposure time, the model will fit some coefficient that isn't necessarily one. Maybe it'll be 0.85, maybe 1.12, depending on noise in the data. And that doesn't have a clean interpretation. The whole conceptual point of the offset is that, by definition, doubling your exposure time should double your expected count, holding everything else constant. That's a coefficient of exactly one. Locking it at one with the offset enforces the right conceptual structure. Letting it float wastes a degree of freedom and gives you something that isn't a rate model anymore.
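A small sketch of what a coefficient of exactly one buys you. The baseline rate of 1.5 events per year and the rate ratio of 1.4 are hypothetical, as is the function name.

```python
import math

def expected_count(exposure_years, x, beta0, beta1):
    """Expected count with log(exposure) entering at a fixed coefficient of one."""
    return math.exp(math.log(exposure_years) + beta0 + beta1 * x)

# Hypothetical coefficients: baseline 1.5 events per year, rate ratio 1.4
beta0, beta1 = math.log(1.5), math.log(1.4)

one_year = expected_count(1.0, 0, beta0, beta1)
two_years = expected_count(2.0, 0, beta0, beta1)
print(f"expected events in 1 year:  {one_year:.2f}")
print(f"expected events in 2 years: {two_years:.2f}")

# The rate ratio does not depend on how long anyone was followed
for t in (0.5, 1.0, 3.0):
    ratio = expected_count(t, 1, beta0, beta1) / expected_count(t, 0, beta0, beta1)
    print(f"follow-up {t} years: rate ratio = {ratio:.2f}")
```

If the coefficient on log exposure were estimated as, say, 0.85, doubling follow-up would no longer double the expected count, and the other coefficients would stop being clean rate ratios.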

**Sarah:** The lesson actually has a nice R activity that walks through this. There's a companion dataset called phaa underscore followup dot csv. It records how many GP visits each participant had during their follow-up. And because follow-up varies, you need an offset of log of follow-up years.

**Kiffer:** And the workflow is, fit a Poisson regression with offset of log follow-up years. Get the incidence rate ratios by exponentiating the coefficients. Check the dispersion. If you have overdispersion, refit as negative binomial with the same offset. Compare AICs. The R code is straightforward. The conceptual move is the offset.
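The lesson's activity is in R, and its code isn't reproduced here. As a language-neutral illustration of the first two steps, this is a minimal sketch that fits a Poisson model with an offset by Newton-Raphson, using only the Python standard library. The rows are made up and are not the companion dataset; the variable names and the single binary predictor are assumptions.

```python
import math

# Hypothetical rows (NOT the companion dataset):
# (gp_visits, followup_years, smoker)
rows = [
    (3, 2.0, 0), (1, 1.0, 0), (0, 0.5, 0), (4, 2.5, 0),
    (5, 1.5, 1), (2, 0.5, 1), (6, 2.0, 1), (3, 1.0, 1),
]

def fit_poisson_with_offset(rows, n_iter=25):
    """Newton-Raphson for log E[y] = log(t) + b0 + b1 * x, one predictor."""
    total_events = sum(y for y, _, _ in rows)
    total_time = sum(t for _, t, _ in rows)
    b0, b1 = math.log(total_events / total_time), 0.0  # start at the overall rate
    for _ in range(n_iter):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for y, t, x in rows:
            mu = t * math.exp(b0 + b1 * x)  # offset enters with coefficient one
            s0 += y - mu                    # score for the intercept
            s1 += (y - mu) * x              # score for the slope
            i00 += mu                       # information matrix entries
            i01 += mu * x
            i11 += mu * x * x
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det   # Newton step: inverse information times score
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

b0, b1 = fit_poisson_with_offset(rows)
print(f"baseline rate per year: {math.exp(b0):.3f}")
print(f"IRR for smokers:        {math.exp(b1):.3f}")
```

With a single binary predictor, the fitted rate ratio matches the crude one: 16 visits over 5 person-years in smokers against 8 visits over 6 person-years in non-smokers, a ratio of 2.4. In practice you would let R's glm or an equivalent routine do the fitting. The point of the sketch is only that the offset sits inside `mu` and is never estimated.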

**Sarah:** And one more practical note. The lesson points out that Poisson regression can also be used to estimate relative risks directly from binary data when the outcome is rare. With robust standard errors. This is useful when you want a relative risk, which is more interpretable than an odds ratio, but you don't want the limitations of log-binomial regression.

**Kiffer:** When the outcome is rare, the Poisson rate approximates the binomial probability. So you can fit a Poisson model to binary data, with sandwich standard errors to handle the fact that the data aren't really Poisson. And you get back relative risks directly. It's a useful trick to have in your toolkit.

**Sarah:** Okay. Let me try to pull the takeaways together.

**Kiffer:** Yeah, please. Walk us through them.

**Sarah:** First takeaway. When the outcome is a count, reach for Poisson regression. The model says the log of the expected count equals beta zero plus beta one times X one and so on. Exponentiate the betas to get incidence rate ratios. Same multiplicative interpretation as odds ratios from logistic regression, except now we're talking about rates of events.

**Kiffer:** Second. The Poisson model assumes the mean equals the variance. In real data this is rarely true. The variance is almost always bigger than the mean, which is the phenomenon called overdispersion.

**Sarah:** Third. Overdispersion matters because it makes Poisson standard errors too small, which inflates Type One error. You'll declare effects significant that aren't really. So always check for overdispersion. The dispersion parameter, the ratio of sum of squared Pearson residuals to residual degrees of freedom, should be about one. If it's substantially greater than one, you have overdispersion.

**Kiffer:** Fourth. Before you correct for overdispersion, check whether your model is misspecified. Apparent overdispersion comes from missing predictors, outliers, or wrong functional form. Real overdispersion is genuine extra-Poisson variation. Apply a correction to a misspecified model and you can hide a real signal.

**Sarah:** Fifth. The standard fix for real overdispersion is negative binomial regression. Adds a dispersion parameter that lets the variance exceed the mean. Same log-linear form, same incidence rate ratio interpretation. Just gives you correct standard errors when overdispersion is present.

**Kiffer:** Sixth. For excess zeros, use zero-inflated or hurdle models. Zero-inflated Poisson and zero-inflated negative binomial mix a logistic regression for structural zeros with a count regression for the rest. Hurdle models are cleaner two-part structures, with the binary part predicting zero versus non-zero and the count part predicting the magnitude given non-zero. Use the Vuong test, AIC, and BIC to compare.

**Sarah:** Seventh. For variable observation periods, use an offset. Add log of exposure time as a fixed term in the regression with coefficient one. Doesn't get estimated. Effectively converts count regression into rate regression. Exponentiated coefficients are ratios of rates, events per unit of exposure time.

**Kiffer:** And eighth. Poisson regression with robust standard errors can estimate relative risks directly from binary data when the outcome is rare. Useful trick when you want a relative risk instead of an odds ratio.

**Sarah:** And the practical recommendation. When you sit down with count data, the workflow is roughly. Step one, fit the Poisson model with the right offset if exposure varies. Step two, check the dispersion parameter and look at residuals. Step three, if overdispersion is present, investigate whether it's apparent. Add missing predictors, check outliers, try non-linear terms. Step four, if it's still there, refit as negative binomial. Step five, look at the proportion of zeros versus what the model predicts, and if there are excess zeros, consider zero-inflated or hurdle models.

**Kiffer:** And keep the conceptual moves clear. The Poisson is the foundation. Negative binomial is the standard fix for overdispersion. Zero-inflated and hurdle handle the special case of excess zeros. The offset is the trick for variable exposure time. That's the whole map.

**Sarah:** And if you can fit and diagnose a Poisson model, then refit as negative binomial when needed, then add an offset when exposure varies, you've covered probably 90 percent of count modelling problems you'll encounter in epidemiology.

**Kiffer:** Next up is Lesson 8. Survival Data. Time-to-event outcomes. We'll extend the regression toolkit to handle censoring and the rich information in the timing of events, not just whether they occurred.

**Sarah:** Take care, everyone.

**Kiffer:** See you there.
