# Lesson 9 — Introduction to Clustered Data (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5338 words • ~28.9 min audio*

---

**Sarah:** Welcome back to Office Hours. I'm Sarah.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson 9, Introduction to Clustered Data. And this lesson is a real hinge in the course, because everything we've done up to now has quietly relied on an assumption that almost never holds in real epidemiology.

**Sarah:** Okay, you have to tell me which assumption that is.

**Kiffer:** Independence. Lessons three through eight built up the regression toolkit for different outcome types. Continuous outcomes with linear regression. Binary outcomes with logistic regression. Ordered and multi-category outcomes with their respective generalizations. Counts with Poisson and negative binomial. Time-to-event with survival analysis. Six lessons, six different outcome shapes. But every one of those models assumed that the observations you fed in were independent of one another.

**Sarah:** Meaning that knowing the value for person A tells you nothing extra about person B.

**Kiffer:** Right. The technical phrase is independent and identically distributed observations. Sometimes shortened to i.i.d. The math is much easier under that assumption, and most software defaults to it. But once you start looking at how epi data actually arrive, you realize the assumption is almost always wrong.

**Sarah:** And that's what Lesson 9 is here to do. Take that assumption head on. Show why it breaks. Show how to detect when it's broken. Show what to do about it.

**Kiffer:** And then preview the family of methods, mixed models, generalized estimating equations, robust standard errors, that we'll spend the next three lessons developing in depth. So Lesson 9 is the on-ramp. The structural understanding of clustered data that everything in Lessons 10, 11, and 12 builds on.

**Sarah:** Okay. Let's start with the basics. What does it actually mean for data to be clustered? Define the term carefully.

**Kiffer:** Clustered data, sometimes called hierarchical data, arises whenever observations are grouped within higher-level units in a way that makes observations in the same group more similar to each other than they are to observations in other groups.

**Sarah:** So the key word there is similar. Observations in the same group share something that observations across groups don't.

**Kiffer:** Exactly. They share an environment, an exposure, a process, a history, a caregiver, a building, a neighbourhood. Whatever it is, that shared influence makes their outcomes correlated. And once outcomes are correlated, the math we've been doing for six lessons starts to give us the wrong answers.

**Sarah:** Let's make this concrete. Walk us through the canonical examples of clustered data in epidemiology.

**Kiffer:** Sure. Five archetypes worth memorizing. First. Patients within hospitals. If you study, say, surgical complication rates, the patients in any given hospital share the same surgeons, the same operating rooms, the same protocols, the same nursing staff, the same infection control practices. Two patients at the same hospital are more alike, in terms of their risk of complications, than two patients drawn at random from different hospitals.

**Sarah:** Got it. What's the second example?

**Kiffer:** Students within classrooms within schools. A classic education and child-health example. Kids in the same classroom share a teacher, a curriculum, a set of classmates, a daily schedule. Kids in different classrooms within the same school share a principal, a building, a neighbourhood. And kids in different schools within the same district share a set of policies and resources. So you've got three levels of nesting at once.

**Sarah:** And this matters for any school-based health intervention study. A study of asthma management programs delivered through schools, for example.

**Kiffer:** Right. Third example. Repeated measurements on the same person over time. This is its own form of clustering, and a really important one. If you measure someone's blood pressure once a month for a year, those twelve measurements are not independent. They share the same person. The same genetics. The same baseline behaviour. The same chronic conditions. The person is the cluster, and the repeated measurements within them are correlated.

**Sarah:** And what's the fourth one?

**Kiffer:** Children within families. Two siblings share genes, household environment, parental income, neighbourhood, dietary patterns, language exposure. They are not independent observations on, say, body mass index or asthma diagnosis.

**Sarah:** And number five rounds it out?

**Kiffer:** Households within neighbourhoods. Two households on the same block share air pollution exposure, walkability, food access, water source, local crime rates, social capital. Their health outcomes will be correlated. And researchers who do not account for that will systematically overstate their precision. The technical phrase the textbook uses is within-cluster correlation. The fact that observations sharing a cluster have an outcome correlation that's bigger than zero. That positive correlation is what makes the math break, and it's also what gives us a quantity to measure.

**Sarah:** Worth saying explicitly. Within-cluster correlation can technically be slightly negative in unusual situations, but in epi we almost always see positive correlation, where cluster mates are more alike than randomly chosen pairs. That's the case the methods are designed for. And other examples worth flagging. Animals in herds, which is really common in veterinary epidemiology. Patients seen by the same physician. Workers in the same factory. People in the same census tract. The list is long. But the structural feature is always the same. There's a higher-level unit, the cluster, and observations inside it share some common influence.

**Kiffer:** And here's the moment to slow down on what goes wrong if you ignore that structure. Because the consequence is more severe than students usually expect.

**Sarah:** Walk us through what breaks.

**Kiffer:** Standard errors come out too small. And once standard errors are too small, every downstream piece of statistical inference is distorted. Confidence intervals are too narrow. Test statistics are too large. P-values are too small. Significance is overstated. You end up with false confidence in findings that may not be real.

**Sarah:** Let me make sure I have the intuition for why standard errors shrink. The standard error is, roughly, your uncertainty about an estimate. It depends on how much information you have. If you have a thousand independent observations, you have a thousand pieces of information.

**Kiffer:** Right, that's the baseline picture.

**Sarah:** But if those thousand observations come from twenty hospitals, fifty patients each, and patients within a hospital are correlated, then you don't really have a thousand independent pieces of information. Patients within the same hospital are partly redundant. They tell you about hospital-level effects more than they tell you about person-level effects.

**Kiffer:** Exactly. So your effective sample size is smaller than your nominal sample size. And the standard error formula assumes you have the nominal sample size. Which means it assumes more information than you actually have. Which means it gives you a smaller standard error than you should have. Which means it gives you false precision.

**Sarah:** And then everything cascades from there. The test statistic, which is the estimate divided by the standard error, gets too big. The p-value gets too small. The confidence interval gets too narrow. Significance is reported when it shouldn't be.

**Kiffer:** And this isn't a small effect. Even a quite mild correlation within clusters can dramatically inflate Type I error rates. Researchers who run a naive analysis on clustered data, ignoring the clustering, may declare findings significant at five percent when their actual false positive rate is fifteen, twenty, sometimes thirty percent. We'll come back to that with numbers in a bit.

**Sarah:** Now before we get to the quantitative stuff, the lesson distinguishes two main types of clustering. Hierarchical and cross-classified. Walk through that distinction.

**Kiffer:** Hierarchical clustering is strict nesting. Each unit at the lower level belongs to exactly one unit at the higher level. A patient is admitted to one hospital. A student is in one classroom. A child belongs to one family. Whatever the lower-level unit is, it sits inside one and only one cluster.

**Sarah:** And cross-classified clustering is when observations belong to multiple grouping factors at the same time, and those factors are not nested within one another.

**Kiffer:** Right. The classic example is a patient who is seen by multiple physicians. Or a child who attends multiple schools over the course of a study. Or a student classified both by school and by neighbourhood, where students from the same school may live in different neighbourhoods, and students from the same neighbourhood may attend different schools.

**Sarah:** So in a hierarchical structure, if you tell me the school, you know the neighbourhood, because schools are inside neighbourhoods. In a cross-classified structure, that doesn't follow. School and neighbourhood are two crossing dimensions, not nested ones.

**Kiffer:** Exactly. And the methods you use to handle the two are similar in spirit but different in implementation. Most of Lessons 10 through 12 will focus on the hierarchical case, because that's the most common in epi data. Cross-classified models exist, the software handles them, but they're a more advanced corner of the field.

**Sarah:** There's also a category the lesson mentions called split-plot designs, which come out of agricultural experiments. Some factors are applied at the cluster level and others at the individual level. We'll bracket that for now. The key thing is that not all clustering is strict nesting.

**Kiffer:** Okay. So we've set the stage. Clustered data is everywhere in epidemiology. Ignoring it makes standard errors too small and p-values too small. There are two main flavours, hierarchical and cross-classified. Now let's turn to the question, how do we measure how much clustering matters in a particular dataset? And how does that translate into effective sample size?

**Sarah:** Right. The next piece introduces two of the most important quantities in this whole topic. The intracluster correlation coefficient, and the design effect. Let's take them one at a time.

**Kiffer:** The intracluster correlation coefficient. Often abbreviated as ICC. Sometimes called the intraclass correlation coefficient. Same animal, two names. We'll say ICC after the first use.

**Sarah:** What is it?

**Kiffer:** It's a number between zero and one that tells you how similar observations within the same cluster are, relative to the total variability in the data. The way I'd describe it in words is, the intracluster correlation coefficient is the fraction of total variance in the outcome that is attributable to differences between clusters.

**Sarah:** Let's unpack that. The total variance in the outcome can be split into two pieces.

**Kiffer:** Yes. There's variance between clusters, the differences in average outcomes from one cluster to the next. And there's variance within clusters, the differences from one observation to the next inside a single cluster. Add them together and you get total variance.

**Sarah:** And the ICC is the between-cluster variance divided by the total variance.

**Kiffer:** Right. Mathematically, in words, ICC equals variance between clusters, divided by the sum of variance between clusters and variance within clusters.

**Sarah:** So if all the variance in the data is between clusters, the ICC is one. Every observation in a cluster is identical, and the only differences are from cluster to cluster.

**Kiffer:** And if there is no between-cluster variance, if clusters are all the same on average, the ICC is zero. Then the cluster structure doesn't matter, because the clustering carries no information.
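To make the ratio concrete, here is a minimal Python sketch of the ICC computed from the two variance components. It assumes you already have estimates of the between-cluster and within-cluster variances in hand. The helper name `icc` is hypothetical.

```python
def icc(var_between: float, var_within: float) -> float:
    """Intracluster correlation: share of total variance that lies between clusters."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# The two boundary cases described above, plus an in-between value.
print(icc(4.0, 0.0))  # all variance between clusters
print(icc(0.0, 4.0))  # no between-cluster variance
print(icc(1.0, 9.0))  # a tenth of the variance is between clusters
```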

**Sarah:** Most real ICCs in epi land somewhere in between. What are typical values?

**Kiffer:** It varies enormously by domain. For something like blood pressure measured in primary care clinics, you might see ICC values around five to fifteen percent. For something like vaccination uptake measured at the household level, ICC can be much higher, twenty or thirty percent, because households make decisions together. For outcomes driven mostly by individual-level factors, like death from a specific disease, ICC at the cluster level might be quite low, one or two percent.

**Sarah:** And here's the part that surprises students. Even very small ICCs, like one or two percent, can have substantial consequences if cluster sizes are large. Which brings us to the second key quantity.

**Kiffer:** The design effect. Sometimes abbreviated as deff. The design effect is the factor by which the variance of an estimate is inflated, relative to what it would be under simple random sampling, because of the clustering.

**Sarah:** And the formula in words?

**Kiffer:** The design effect equals one plus the quantity, average cluster size minus one, multiplied by the ICC.

**Sarah:** Let me say that back. Take the average number of observations per cluster. Subtract one. Multiply by the ICC. Add one. That's your design effect.

**Kiffer:** Right. And the way to interpret it is, your effective sample size is your nominal sample size divided by the design effect. So if your design effect is two, your effective sample size is half what it looks like. If your design effect is four, you have only a quarter of the information you thought you had.
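In symbols, the three quantities discussed so far can be written as follows. The notation is chosen here for illustration: $\rho$ is the ICC, $\sigma^2_b$ and $\sigma^2_w$ are the between- and within-cluster variances, $\bar{m}$ is the average cluster size, and $n$ is the nominal sample size.

$$
\rho = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_w}, \qquad
\mathrm{deff} = 1 + (\bar{m} - 1)\,\rho, \qquad
n_{\mathrm{eff}} = \frac{n}{\mathrm{deff}}.
$$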

**Sarah:** Let's do the worked example from the lesson. Imagine a study with an ICC of zero point zero five, that's five percent, and average cluster size of twenty observations per cluster.

**Kiffer:** Plug into the formula. Cluster size minus one is nineteen. Multiply by the ICC, which is zero point zero five. Nineteen times zero point zero five is zero point nine five. Add one. The design effect is one point nine five. Call it roughly two.

**Sarah:** So an ICC of just five percent, with twenty observations per cluster, almost doubles the variance of every estimate. The effective sample size is roughly half what the nominal sample size suggests.

**Kiffer:** Right. If you ran the study with a thousand subjects, your effective sample size is about five hundred. You have half the information you thought.

**Sarah:** And the consequences scale up quickly when cluster size gets bigger. The lesson does another example. Twenty hospitals, fifty patients each, total of a thousand. ICC of zero point zero five.

**Kiffer:** Cluster size minus one is forty nine. Multiply by zero point zero five. That's two point four five. Add one. Design effect is three point four five. Effective sample size is one thousand divided by three point four five, which is about two hundred and ninety. So you don't have a thousand independent patients, you have the equivalent of about two hundred and ninety.

**Sarah:** That's a big drop. And it's worth pausing to notice. The ICC didn't change. It's still five percent. What changed was the cluster size. Bigger clusters mean bigger design effects, even at the same ICC.

**Kiffer:** Which is why hospital-based studies, school-based studies, household-based studies with lots of observations per cluster need particularly careful attention to clustering. The same correlation that's nearly invisible with cluster size of two becomes catastrophic with cluster size of fifty.
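The worked examples, and the point about cluster size, can be checked with a short Python sketch. The helper names `design_effect` and `effective_sample_size` are hypothetical. The formula assumes clusters of roughly equal size. Holding the ICC at five percent and the nominal sample at a thousand, the loop varies only the cluster size.

```python
def design_effect(icc: float, avg_cluster_size: float) -> float:
    """Variance inflation relative to simple random sampling."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: int, icc: float, avg_cluster_size: float) -> float:
    """Nominal sample size discounted by the design effect."""
    return n / design_effect(icc, avg_cluster_size)


# Same ICC (5%), same nominal n (1000), different cluster sizes.
for m in (2, 20, 50):
    deff = design_effect(0.05, m)
    n_eff = effective_sample_size(1000, 0.05, m)
    print(f"cluster size {m:>2}: deff = {deff:.2f}, effective n = {n_eff:.0f}")
```

The cluster sizes of twenty and fifty reproduce the design effects of one point nine five and three point four five from the examples above. A cluster size of two barely moves the needle.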

**Sarah:** Now there's a really nice way to feel this in practice. In R, you can fit a null mixed model, which is just a model with no predictors and a random intercept for cluster. You ask the software to estimate the between-cluster variance and the within-cluster variance, and the ICC falls out as their ratio. Then you can compute the design effect by hand using the average cluster size.
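The lesson does this in R with a null mixed model. This Python sketch avoids a modelling library by using a different but classical technique: the one-way ANOVA (method-of-moments) estimator of the ICC, applied to simulated balanced data. For balanced designs it generally agrees closely with the REML fit a mixed model would give. The simulation settings and function names are assumptions made for illustration. They do not describe the lesson's dataset.

```python
import numpy as np


def simulate_clustered(n_clusters, cluster_size, icc, seed=0):
    """Outcome = cluster effect + individual noise, scaled so total variance is 1."""
    rng = np.random.default_rng(seed)
    cluster_effects = rng.normal(0.0, np.sqrt(icc), size=(n_clusters, 1))
    noise = rng.normal(0.0, np.sqrt(1.0 - icc), size=(n_clusters, cluster_size))
    return cluster_effects + noise  # rows are clusters, columns are subjects


def anova_icc(y):
    """One-way ANOVA estimator of the ICC for a balanced (clusters x subjects) array."""
    k, m = y.shape
    cluster_means = y.mean(axis=1)
    grand_mean = y.mean()
    ms_between = m * np.sum((cluster_means - grand_mean) ** 2) / (k - 1)
    ms_within = np.sum((y - cluster_means[:, None]) ** 2) / (k * (m - 1))
    var_between = max((ms_between - ms_within) / m, 0.0)  # truncate negative estimates at zero
    return var_between / (var_between + ms_within)


y = simulate_clustered(n_clusters=300, cluster_size=20, icc=0.10, seed=42)
rho_hat = anova_icc(y)
deff = 1.0 + (y.shape[1] - 1) * rho_hat  # design effect from the estimated ICC
print(f"estimated ICC = {rho_hat:.3f}, design effect = {deff:.2f}")
```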

**Kiffer:** And the lesson includes an R activity using a simulated primary care dataset. Thirty clinics, somewhere between eighteen and forty five patients each. The outcome is systolic blood pressure. The patients have age, smoking status, body mass index. The clinics have size and an urban or rural indicator. Once you fit the null mixed model, you find a design effect around three point six. With nine hundred and sixty patients across thirty clinics, that's an average of thirty two per clinic, so a design effect of three point six corresponds to an ICC of roughly eight percent. And the effective sample size comes out at roughly two hundred and seventy, only a bit more than a quarter of the nominal nine hundred and sixty.

**Sarah:** And the lesson is sharp on the implication. Pretending you have nine hundred and sixty independent observations when you really have the information equivalent of three hundred or so is going to give you wildly overconfident inference.

**Kiffer:** And one more piece worth mentioning before we move on. The design effect concept extends to discrete outcomes too, not just continuous ones. The ICC for binary data is defined slightly differently and is more involved to estimate, but the practical consequence is the same. Ignoring clustering shrinks standard errors and inflates Type I error.
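For binary outcomes, one common convention is the latent-variable ICC from a random-intercept logistic model. It replaces the within-cluster variance with pi squared over three, the variance of the standard logistic distribution. Treat this as an assumption: the lesson doesn't say which definition it uses, and other definitions exist. The sketch below implements this convention. The function name is hypothetical.

```python
import math


def latent_icc_logistic(var_random_intercept: float) -> float:
    """Latent-scale ICC for a random-intercept logistic model.

    The individual-level residual on the logit scale is taken to follow a
    standard logistic distribution, whose variance is pi**2 / 3 (about 3.29).
    """
    within = math.pi ** 2 / 3
    return var_random_intercept / (var_random_intercept + within)


# A hypothetical random-intercept variance of 0.5 on the logit scale.
print(round(latent_icc_logistic(0.5), 3))
```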

**Sarah:** Okay. That gives us the vocabulary. ICC, the intracluster correlation coefficient. Design effect, the variance inflation factor. Effective sample size, the nominal size divided by the design effect. Now let's turn to what we actually do about clustering. The methods that give you honest standard errors, and honest inference, when observations are correlated.

**Kiffer:** And the lesson previews three main approaches. Marginal models, also called generalized estimating equations or GEE. Random effects models, also called mixed models or multilevel models. And robust standard errors, also called sandwich estimators. Three different philosophies, three different sets of trade-offs.

**Sarah:** Let's take them one at a time. Start with the first one. Generalized estimating equations. Spell that out.

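**Kiffer:** Generalized estimating equations. We'll abbreviate as GEE after this. The basic idea is that you fit a regression model similar to what you would have fit ignoring clustering. The difference is that you also specify a working correlation structure for observations within a cluster. The estimator uses that structure when estimating the coefficients. It then computes robust standard errors, which stay valid even if the working structure is somewhat off.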

**Sarah:** And the key feature of GEE is what it estimates. It estimates what's called a population-averaged effect, sometimes called a marginal effect. That's the average effect of the exposure across the whole population, not within any specific cluster.

**Kiffer:** Right. So if you're studying, say, the effect of a smoking cessation program implemented across many clinics, GEE answers the question, on average across all the clinics in this population, what is the effect of the program? It treats the cluster effect as a nuisance to be accounted for, not as a quantity of intrinsic interest.

**Sarah:** Which is useful when the question is policy-level. When you want to know the effect at the population level, the average across clusters, GEE is a natural fit.

**Kiffer:** Now the second approach. Mixed models. Also called random effects models, also called multilevel models. The mixed in mixed models means a mix of fixed and random effects. Fixed effects are the regression coefficients you've always estimated. Random effects are coefficients that vary across clusters according to a probability distribution.

**Sarah:** And the language of fixed and random can be confusing at first. Let's slow down on it.

**Kiffer:** A fixed effect is a parameter you want to estimate directly. The effect of age on blood pressure, say. There's a single number that captures it across the whole population, and you fit it to the data. A random effect is a deviation from the overall average that varies across clusters according to some assumed distribution. So instead of estimating a separate intercept for every clinic, you assume the clinic-specific intercepts are draws from a normal distribution around an overall mean, and you estimate the variance of that distribution.

**Sarah:** And mixed models give you what's called cluster-specific effects. Effects that capture variation between clusters, not just the average across them.

**Kiffer:** Exactly. A mixed model says, the effect of age on blood pressure is, on average, this much, but it varies a bit from clinic to clinic, and we'll quantify that variation. Or, the average blood pressure varies across clinics around an overall mean, and we'll estimate how much.

**Sarah:** Useful when the cluster-level dynamics are themselves of interest. When you want to ask, are some clinics doing better than others? Or, does the effect of the intervention vary across schools? Mixed models give you direct access to that question.

**Kiffer:** And the third approach is robust standard errors. Sandwich estimators. The idea here is that you fit a perfectly ordinary regression, ignoring clustering, but then you replace the standard error formula with one that's adjusted for the fact that observations within clusters are correlated.

**Sarah:** Why is it called a sandwich estimator?

**Kiffer:** Because the formula has a particular shape. There's a piece in the middle, the variance of the score, and it's flanked by two outer pieces, copies of the inverse Fisher information. So you have bread, filling, bread. Sandwich. The name has stuck for decades.
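Here is a minimal numpy sketch of the cluster-robust sandwich for ordinary least squares. For OLS the bread reduces to the inverse of X-transpose-X. The filling sums, over clusters, the outer product of each cluster's score contribution. The sketch assumes a linear model and an exposure that varies only at the cluster level. It applies no small-sample correction, although software packages usually do. Function and variable names and the simulation settings are illustrative.

```python
import numpy as np


def ols_with_cluster_robust_se(X, y, cluster_ids):
    """OLS coefficients with naive and cluster-robust (sandwich) standard errors."""
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    # Naive (model-based) SEs: assume independent, equal-variance errors.
    sigma2 = resid @ resid / (n - p)
    se_naive = np.sqrt(np.diag(sigma2 * bread))

    # Filling: sum over clusters of (X_g' u_g)(X_g' u_g)'.
    meat = np.zeros((p, p))
    for g in np.unique(cluster_ids):
        idx = cluster_ids == g
        score_g = X[idx].T @ resid[idx]
        meat += np.outer(score_g, score_g)
    se_robust = np.sqrt(np.diag(bread @ meat @ bread))  # bread, filling, bread
    return beta, se_naive, se_robust


rng = np.random.default_rng(7)
n_clusters, m, icc = 50, 20, 0.3
true_effect = 0.0
cluster_ids = np.repeat(np.arange(n_clusters), m)
x_cluster = rng.normal(size=n_clusters)                 # exposure varies by cluster only
u_cluster = rng.normal(0.0, np.sqrt(icc), n_clusters)   # shared cluster influence
y = (true_effect * x_cluster[cluster_ids]
     + u_cluster[cluster_ids]
     + rng.normal(0.0, np.sqrt(1.0 - icc), n_clusters * m))
X = np.column_stack([np.ones(n_clusters * m), x_cluster[cluster_ids]])

beta, se_naive, se_robust = ols_with_cluster_robust_se(X, y, cluster_ids)
print("slope:", round(beta[1], 3))
print("naive SE:", round(se_naive[1], 3), " robust SE:", round(se_robust[1], 3))
```

With a cluster-level exposure and this much clustering, the robust standard error for the slope should come out well above the naive one.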

**Sarah:** And what's the appeal?

**Kiffer:** It's a compromise. You don't have to specify the within-cluster correlation structure. You don't have to commit to a particular probability model for the random effects. You just say, my regression coefficients are estimated as usual, but my standard errors will account for clustering. It's the easiest fix when you don't want to refit the analysis as a mixed model or a GEE.

**Sarah:** And the trade-off?

**Kiffer:** Two main ones. First, robust standard errors don't give you cluster-specific information. You can't ask how much clinics differ. You just get a coefficient and an adjusted standard error for the population average. Second, robust standard errors require enough clusters. The rule of thumb is at least twenty to thirty clusters. With fewer than that, the sandwich estimator can actually underestimate the true variance, which defeats the purpose.

**Sarah:** So robust standard errors are a clean, quick fix when you have lots of clusters and don't need cluster-specific inference. GEE is for when the question is population-averaged. Mixed models are for when you want cluster-specific structure. Three tools, three jobs.

**Kiffer:** And the lesson also mentions a few other approaches worth knowing about, even if we don't go deep on them. Fixed effects for clusters, where you literally include a dummy variable for each cluster. That handles cluster-level confounding completely but uses a lot of degrees of freedom and doesn't let you estimate the effect of cluster-level predictors.

**Sarah:** Right. If you put a fixed effect for hospital in your model, you've controlled for everything that varies at the hospital level, measured or unmeasured. But you can't then ask how the urban versus rural status of the hospital affects the outcome, because urban versus rural is a hospital-level variable, and the hospital fixed effects have absorbed it.
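A tiny numpy sketch shows why. A hospital-level variable is an exact linear combination of the hospital dummies, so adding it leaves the design matrix rank-deficient, and its coefficient can't be identified. The five hospitals and their urban flags are made up for illustration.

```python
import numpy as np

n_hospitals, patients_per_hospital = 5, 4
hospital = np.repeat(np.arange(n_hospitals), patients_per_hospital)
urban_by_hospital = np.array([1, 0, 1, 1, 0])  # hypothetical hospital-level flag
urban = urban_by_hospital[hospital]            # copied down to every patient

# One dummy column per hospital (no separate intercept), then the urban indicator.
dummies = (hospital[:, None] == np.arange(n_hospitals)).astype(float)
X = np.column_stack([dummies, urban])

# Six columns but only five independent ones: urban = sum of the urban hospitals' dummies.
print("columns:", X.shape[1], " rank:", np.linalg.matrix_rank(X))
```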

**Kiffer:** And the lesson also mentions survey methods, which are design-based approaches to clustering used when the data come from complex sample surveys with stratification, clustering, and unequal selection probabilities. Stata's svy commands, SAS's survey procedures such as PROC SURVEYREG, R's survey package. Specialized but important when you're dealing with national surveys.

**Sarah:** Now the lesson previews mixed models in particular for the rest of the course. Lessons 10, 11, and 12 develop mixed models in depth, first for continuous outcomes, then for discrete outcomes, then for repeated measures and longitudinal designs. So it's worth previewing two key concepts that will recur.

**Kiffer:** Random intercepts and random slopes.

**Sarah:** Define them carefully.

**Kiffer:** A random intercept means the baseline level of the outcome varies across clusters. So in a clinic study, the average blood pressure differs from clinic to clinic, and the random intercept captures that. The model still has one slope, one effect of, say, age on blood pressure, but each clinic has its own starting point.

**Sarah:** And a random slope?

**Kiffer:** A random slope means the effect of a predictor varies across clusters. The effect of age on blood pressure might be steeper in some clinics than others. Or the effect of an intervention might be larger in some schools than others. The random slope lets the model capture that variation explicitly.
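The two building blocks are easiest to see as a data-generating process. This numpy sketch simulates clinics whose intercepts and age slopes are each drawn around an overall value. All parameter values are invented for illustration. The per-clinic least-squares lines at the end are only a crude check, not how a mixed model is actually fit.

```python
import numpy as np

rng = np.random.default_rng(3)
n_clinics, patients = 200, 30
beta0, beta1 = 120.0, 2.0            # hypothetical overall intercept and slope
sd_u0, sd_u1, sd_e = 5.0, 0.5, 1.0   # hypothetical SDs: random intercept, random slope, residual

u0 = rng.normal(0.0, sd_u0, n_clinics)       # clinic-specific shift in baseline (random intercept)
u1 = rng.normal(0.0, sd_u1, n_clinics)       # clinic-specific shift in slope (random slope)
x = rng.normal(size=(n_clinics, patients))   # standardized predictor, e.g. age
y = (beta0 + u0[:, None]) + (beta1 + u1[:, None]) * x + rng.normal(0.0, sd_e, x.shape)

# Crude check: fit a separate least-squares line within each clinic.
slopes = np.array([np.polyfit(x[j], y[j], 1)[0] for j in range(n_clinics)])
print(f"mean clinic slope = {slopes.mean():.2f}, SD across clinics = {slopes.std(ddof=1):.2f}")
```

Setting `sd_u1` to zero turns this into a random-intercept-only process: the clinics then differ in baseline but share one slope.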

**Sarah:** And these two pieces, random intercepts and random slopes, are the building blocks of multilevel modelling. Most mixed models in epi have a random intercept. Some also have one or more random slopes, depending on whether you think key predictors actually vary in their effect across clusters.

**Kiffer:** And one more thing worth flagging. The lesson mentions that clustering can sometimes introduce confounding, not just inflate standard errors. If a cluster-level variable is associated with both the exposure and the outcome, it acts as a confounder, and ignoring the cluster structure means that confounding is unaddressed.

**Sarah:** Give an example.

**Kiffer:** Suppose you're studying the effect of an exercise program on diabetes risk, and the data come from clinics in different regions. Some regions are more affluent and have lower diabetes risk overall. Those same regions might also be the ones where the exercise program is more available. So if you don't account for region, the association between exposure and outcome will be partly driven by regional differences, which have nothing to do with the program itself.

**Sarah:** And that's a different kind of damage from the standard error issue. Standard errors can be patched up by sandwich estimators or GEE or mixed models. But if there's confounding by cluster-level variables, you need to either include those variables in the model or use a method that handles cluster-level confounding directly.

**Kiffer:** Fixed effects for clusters do that. Mixed models with appropriate covariate adjustment can do it. Robust standard errors alone don't, because they only fix variance, not point estimates.
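A small simulation makes the point. In this numpy sketch the program has no effect at all. A hypothetical region-level affluence variable makes the program more available and also lowers diabetes risk. Ignoring region produces a spurious protective slope. Cluster fixed effects, implemented here by demeaning within region (equivalent to including region dummies), remove it. All settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
n_regions, per_region = 100, 20
region = np.repeat(np.arange(n_regions), per_region)
affluence = rng.normal(size=n_regions)                        # hypothetical region-level confounder

exposure = affluence[region] + rng.normal(size=region.size)   # program more available where affluent
outcome = -affluence[region] + rng.normal(size=region.size)   # lower risk where affluent; no program effect


def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


def demean(v):
    """Subtract each region's mean, which is what region fixed effects do."""
    means = np.bincount(region, weights=v) / np.bincount(region)
    return v - means[region]


naive = slope(exposure, outcome)                           # ignores region entirely
fixed_effects = slope(demean(exposure), demean(outcome))   # within-region comparison only
print(f"naive slope = {naive:.2f}, fixed-effects slope = {fixed_effects:.2f}")
```

A cluster-robust standard error on the naive slope would widen its interval, but the slope itself would stay biased. That is the point Kiffer makes about robust standard errors fixing variance but not point estimates.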

**Sarah:** Important distinction. Now the simulation evidence in the lesson is striking. Let me run through some of the numbers, because they make the abstract problem feel concrete.

**Kiffer:** Please, go ahead and walk us through them.

**Sarah:** Scenario one. Small ICC, large clusters. ICC of one percent, fifty subjects per cluster. The design effect is one plus forty nine times zero point zero one, which is one point four nine. Doesn't sound like much. But simulation studies show the actual Type I error rate, instead of the nominal five percent, can climb to ten or fifteen percent.

**Kiffer:** Scenario two. Moderate ICC, moderate clusters. ICC of five percent, twenty per cluster. Design effect of about one point nine five. Type I error rates of fifteen to twenty five percent. Meaning that when there is truly no effect, somewhere between one in seven and one in four analyses will still come out significant.

**Sarah:** Scenario three. Larger ICC, larger clusters. ICC of ten percent, thirty per cluster. Design effect of about three point nine. Type I error rates of thirty to forty percent. At that level, the naive analysis is essentially unreliable.
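Where might numbers like these come from? Here is a rough large-sample approximation. Assume the exposure varies at the cluster level, so the true variance of the comparison is the design effect times the naive one. Then the naive z-statistic is inflated by the square root of the design effect. The Python sketch below computes that approximation. It is a back-of-envelope check, not the lesson's simulation. Small numbers of clusters or other designs can give different rates. The function name is hypothetical.

```python
from statistics import NormalDist


def naive_type1_error(icc, cluster_size, alpha=0.05):
    """Approximate rejection rate of a naive test when the true variance is deff times larger."""
    z = NormalDist()
    deff = 1.0 + (cluster_size - 1.0) * icc
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # The naive statistic is too large by sqrt(deff), so it crosses z_crit too often.
    return 2.0 * (1.0 - z.cdf(z_crit / deff ** 0.5))


for icc, m in [(0.01, 50), (0.05, 20), (0.10, 30)]:
    print(f"ICC {icc:.2f}, cluster size {m}: naive type I error ~ {naive_type1_error(icc, m):.3f}")
```

Under these assumptions, the three scenarios come out at roughly 11, 16, and 32 percent.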

**Kiffer:** And the takeaway is the same in all three cases. Even small ICCs, when combined with moderate or large cluster sizes, can produce alarming inflation of false positive rates. There's no safe threshold below which you can ignore clustering. You always have to think about it.

**Sarah:** Okay. Before we pull the takeaways together, there's one more piece of the lesson I want to highlight.

**Kiffer:** Yeah, what's that?

**Sarah:** The interactive simulation widget. Students can pick a number of clusters, a cluster size, an ICC, and an alpha level, and then run five hundred simulated studies under the null hypothesis, where there is no real treatment effect.

**Kiffer:** And the widget compares two analyses run on the same simulated data. A naive analysis that pretends observations are independent, and a cluster-aware analysis that accounts for the clustering. Under the null hypothesis, both analyses should reject the null about five percent of the time, the nominal alpha level.

**Sarah:** And what students see is that the cluster-aware test stays close to five percent, right where it should be. Whereas the naive test, when you crank up the ICC, balloons. At an ICC of five percent with twenty clusters of fifty subjects each, the naive test rejects the null around fifteen to twenty percent of the time.

**Kiffer:** So three to four times the rate it should. And the data the test is operating on are designed to have no real signal at all. Every rejection is a false positive. The clustering alone is generating spurious significance.
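The widget itself isn't reproduced here, but a rough Python imitation of the same experiment can be sketched. It assumes treatment is assigned at the cluster level, a normal outcome, and a z cutoff for both tests. The naive test compares individual means as if every subject were independent. The cluster-aware test compares cluster means. With only twenty clusters per arm, a t reference distribution would be slightly more accurate for the cluster-level test, so expect its rate to sit a little above alpha. The function name and defaults are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def simulate_null_rejections(n_clusters_per_arm=20, cluster_size=50, icc=0.05,
                             alpha=0.05, n_sims=500, seed=1):
    """Share of null simulations in which each analysis rejects at level alpha."""
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    shape = (2, n_clusters_per_arm)  # two arms, clusters within each arm
    naive_hits = cluster_hits = 0
    for _ in range(n_sims):
        # No treatment effect: both arms share the same distribution.
        cluster_effects = rng.normal(0.0, np.sqrt(icc), size=shape + (1,))
        y = cluster_effects + rng.normal(0.0, np.sqrt(1.0 - icc), size=shape + (cluster_size,))

        # Naive analysis: every subject treated as independent.
        a, b = y[0].ravel(), y[1].ravel()
        se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
        naive_hits += abs(b.mean() - a.mean()) / se > z_crit

        # Cluster-aware analysis: one summary value (the mean) per cluster.
        ca, cb = y[0].mean(axis=1), y[1].mean(axis=1)
        se = np.sqrt(ca.var(ddof=1) / ca.size + cb.var(ddof=1) / cb.size)
        cluster_hits += abs(cb.mean() - ca.mean()) / se > z_crit
    return naive_hits / n_sims, cluster_hits / n_sims


naive_rate, cluster_rate = simulate_null_rejections()
print(f"naive rejection rate = {naive_rate:.3f}, cluster-aware rejection rate = {cluster_rate:.3f}")
```

As with the widget, raising `icc` or `cluster_size` should push the naive rate up while the cluster-aware rate stays near alpha.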

**Sarah:** And that's the felt version of what the design effect formula was telling us in equations. The same intuition, just rendered in pixels and counts.

**Kiffer:** I'd encourage every student to spend ten minutes with that widget. Move the sliders. Push ICC up and watch the naive false positive rate climb. Push cluster size up and watch the same thing happen. Once you've felt that pattern in your hands, you won't forget it.

**Sarah:** Okay. One more piece worth flagging before we land. The lesson talks about how to detect clustering in your data when you're not sure how much there is. Three approaches.

**Kiffer:** Yeah. First, visual inspection. Plot your outcome by cluster. Box plots, scatter plots with cluster colour, whatever helps you see the data. If clusters look systematically different from each other, you've got clustering.

**Sarah:** Second, ICC estimation. Fit a null mixed model with a random intercept for cluster. The software gives you the between-cluster variance and the within-cluster variance. Compute the ICC. If it's well above zero, you have clustering that needs to be addressed.

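**Kiffer:** Third, a likelihood ratio test. Fit two models. One with a random intercept for cluster. One without. Compare the likelihoods. If the model with the random intercept fits significantly better, the between-cluster variance is unlikely to be just chance. One technical wrinkle: a variance can't be negative, so you're testing at the boundary of its possible values, and the usual chi-square p-value is conservative. The common fix is to halve it.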
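The comparison, including the boundary correction, can be sketched in a few lines of Python. The sketch assumes you already have the maximized log-likelihoods from two fits of the same data with the same fixed effects. The function name and the example log-likelihood values are hypothetical. The chi-square survival function with one degree of freedom is computed from the complementary error function.

```python
import math


def lrt_random_intercept(loglik_without, loglik_with):
    """Likelihood ratio test for a random-intercept variance, with boundary correction.

    The variance is tested at its boundary (zero), so the usual chi-square(1)
    p-value is conservative; halving it gives the 50:50 mixture of chi-square(0)
    and chi-square(1) that is commonly used instead.
    """
    stat = max(2.0 * (loglik_with - loglik_without), 0.0)
    p_chi2_1 = math.erfc(math.sqrt(stat / 2.0))  # P(chi-square with 1 df > stat)
    return stat, p_chi2_1, 0.5 * p_chi2_1


# Hypothetical log-likelihoods from two fits of the same data.
stat, p_naive, p_boundary = lrt_random_intercept(-1523.4, -1518.2)
print(f"LR = {stat:.1f}, naive p = {p_naive:.4f}, boundary-corrected p = {p_boundary:.4f}")
```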

**Sarah:** And the practical recommendation is, in any clustered design, just do all three. Look at the data. Compute the ICC. Run the likelihood ratio test. Then choose your method based on what you find and what your scientific question is.

**Kiffer:** Good. Let's land the takeaways from this lesson.

**Sarah:** First takeaway. Clustering is the rule, not the exception, in epidemiologic data. Patients in hospitals, students in classrooms, repeated measurements in people, children in families, households in neighbourhoods. Anywhere your sampling design or the natural structure of the population creates groups of similar observations, you have clustering.

**Kiffer:** Second. Ignoring clustering has predictable and serious consequences. Standard errors come out too small. Confidence intervals are too narrow. P-values are too small. Type I error rates inflate. False findings are reported as significant. The damage is not subtle, especially when cluster sizes are moderate or large.

**Sarah:** Third. Two key quantities organize how to think about clustering. The intracluster correlation coefficient, which measures how similar observations within clusters are relative to total variability. And the design effect, which equals one plus the ICC multiplied by the quantity average cluster size minus one. The design effect tells you how much your nominal sample size needs to be discounted.

**Kiffer:** Fourth. Even small intracluster correlation coefficients can have large consequences when cluster sizes are big. An ICC of five percent with cluster size of fifty gives you a design effect of three point four five, which means your effective sample size is less than a third of your nominal sample size. There's no safe threshold below which clustering can be ignored.

**Sarah:** Fifth. Two main types of clustering exist. Hierarchical, where each unit nests inside exactly one cluster at the next level up. And cross-classified, where units belong to multiple non-nested grouping factors at once. Most epi work is hierarchical, but cross-classified structures are increasingly common in education, neighbourhood, and physician-network research.

**Kiffer:** Sixth. Three main families of methods exist for handling clustered data. Marginal models, particularly generalized estimating equations, for population-averaged effects. Mixed models, also called multilevel or random effects models, for cluster-specific effects. And robust standard errors, also called sandwich estimators, as a quick fix when you don't need cluster-specific structure and you have enough clusters.

**Sarah:** Seventh. Mixed models are the focus of the next three lessons in this course. The two key building blocks to keep in mind are random intercepts, where the baseline level of the outcome varies across clusters, and random slopes, where the effect of a predictor varies across clusters. Most mixed models include a random intercept. Whether you also include random slopes depends on your scientific question and the structure of your data.

**Kiffer:** Eighth. Always check for clustering before you finalize an analysis. Estimate the ICC. Compute the design effect. Pick a method that fits the structure of your data and the question you want to answer. When in doubt, fit it both ways and compare. If your conclusions are robust to the choice of method, you can be more confident in them. If they aren't, the methodological choice itself becomes an important part of the discussion.

**Sarah:** And one practical note. Robust standard errors require a moderate-to-large number of clusters, typically at least twenty to thirty. With fewer than that, the sandwich estimator can actually underestimate the variance and give you false reassurance. When you only have a handful of clusters, mixed models or fixed effects are usually safer.

**Kiffer:** And one more practical note. Cluster-level confounding is a separate problem from cluster-level variance inflation. Robust standard errors fix the variance but not the confounding. If a cluster-level variable affects both your exposure and your outcome, you need to either include that variable in the model or use a method, like cluster fixed effects, that handles the confounding directly.

**Sarah:** Next up is Lesson 10. Mixed models for continuous outcomes. We'll go deep on random intercepts and random slopes, fit them in R, interpret the variance components, and learn how to choose between mixed and marginal approaches when both are reasonable.

**Kiffer:** Take care, everyone.

**Sarah:** See you in Lesson 10.
