Ecological and Group-Level Studies

Evaluating Epidemiological Research

Learning objectives for this lesson:

List the 3 major categories of variables used in ecologic models and describe their attributes
Describe the constructs of a linear model at the individual and group levels and constraints on estimating incidence rate ratios at the group level
Describe how within-group misclassification, group-level confounding, and group-level interaction can affect causal inferences
Describe the basis of the ecologic and atomistic fallacies
Identify scenarios where ecologic studies are less likely to produce cross-level inferential errors
Describe how to integrate individual-level studies with ecologic studies to prevent cross-level inferential errors

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University.

Reference

Glossary: Key Terms, People & Concepts


This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Key Concepts & Ideas

Ecological (Group-Level) Study An observational study in which the unit of analysis is a group (country, province, school, neighborhood) rather than an individual. Exposure and outcome are measured as group-level summaries.

Unit of Analysis The entity for which data are aggregated and analyzed. In ecological studies the unit is a group; in cohort and case-control studies it is the individual.

Aggregate Variable A group-level variable derived from individual data, e.g., mean income, percent vaccinated. Summarizes individuals into a single number for the group.

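Environmental Variable A physical characteristic of the place itself, e.g., air pollution, latitude, water hardness. Unlike a global variable, it usually has an individual-level analogue (each person's actual exposure or dose), which can vary among group members even when it goes unmeasured.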

Global (Contextual) Variable A group-level attribute that characterizes the group as a whole, not its members, e.g., presence of a smoking ban, type of healthcare system. Cannot be derived from individuals.

Ecological Fallacy The error of inferring individual-level relationships from group-level data, demonstrated by Robinson (1950), named by Selvin (1958), and later refined by Schwartz (1994). A correlation between average income and average health across countries does not mean richer individuals are healthier within a country.

Atomistic (Individualistic) Fallacy The reverse error: inferring group-level relationships from individual-level data alone, ignoring contextual effects. Articulated by Schwartz (1994) and Diez Roux (1998). Smoking causes lung cancer in individuals, but country-level smoking rates do not by themselves explain country-level cancer rates.

Cross-Level Inference An inference that crosses levels of analysis, from group to individual, or from individual to group. The central methodological challenge of multilevel epidemiology.

Within-Group Misclassification Bias arising because group-level summaries treat all members as if exposed (or unexposed) to the same degree. The greater the within-group variability, the more misclassification distorts ecological associations.

Group-Level Confounding Confounding by a variable that operates at the group level, e.g., wealth, climate, healthcare system. Even if individual-level confounders are controlled, group-level confounding can remain.

Group-Level Interaction (Effect Modification) When the effect of an exposure differs across groups because of group-level conditions. Treating these effects as if uniform produces misleading pooled estimates.

Multilevel (Hierarchical) Model A statistical model that simultaneously incorporates individual- and group-level variables, with random effects representing group-level variation. The modern toolkit for sidestepping the ecological/atomistic fallacies (Diez Roux, 1998).

Contextual Effect An effect on individuals that depends on properties of their group, beyond their own individual exposure, e.g., the effect on health of living in a low-income neighborhood independent of one's own income.

Compositional Effect A group-level pattern that arises simply because of who is in the group. If a city has poor average health because more poor people live there, the effect is compositional, not contextual.

Rate-Based Ecological Comparison Comparing incidence or mortality rates across groups (regions, time periods). Often the simplest ecological design.

Risk-Based Ecological Comparison Comparing risks (proportions) across groups, typically when the population is closed and follow-up is uniform.

Time-Trend (Time-Series) Study An ecological design that follows a single group over time, comparing exposures and outcomes across time periods. Often used to evaluate policy interventions.

Multi-Group Comparison An ecological design that compares many groups at one point in time, e.g., disease rates across countries.

Mixed (Multi-Group/Time) Design An ecological design that varies both place and time, e.g., country×year panels, to leverage both kinds of variation while making confounding harder to ignore.

Hybrid (Multilevel) Study A design that combines individual- and group-level data to triangulate inferences and avoid pure ecological or atomistic interpretations.

Key People

William S. Robinson (1913–1949) American sociologist whose 1950 paper “Ecological Correlations and the Behavior of Individuals” exposed the problem later named the ecological fallacy (Selvin, 1958) and demonstrated it dramatically with US Census data on literacy and race; the paper was reissued in the International Journal of Epidemiology in 2009.

Hal Morgenstern American epidemiologist whose work in the 1980s and 1990s, notably Morgenstern (1995), formalized the typology of ecological designs and clarified when ecological inferences are (and are not) defensible.

Ana Diez Roux Argentine-American epidemiologist whose work on neighborhood effects, especially Diez Roux (1998), helped bring multilevel modeling into mainstream public-health practice as a remedy for ecological/atomistic confusion.

Émile Durkheim (1858–1917) French sociologist whose 1897 study Suicide compared suicide rates across European regions and religions; one of the foundational ecological analyses in social science. A classic example of using group-level data to argue for social causation; Selvin (1958) later used it to illustrate the cross-level inference problem that came to be called the ecological fallacy.


Section 1 of 5

Introduction & Rationale for Group-Level Studies

⏱ Estimated reading time: 45 minutes


What ecologic studies are, why they exist, and what they cannot tell you.

The central problem

Group data, individual inference

When exposure and outcome are both measured at the group level, the association between them need not describe individuals at all.

Ignoring that gap between the level of the data and the level of the inference is the ecological fallacy, demonstrated by Robinson in 1950. Understanding when and why it appears is the spine of this lesson.

The design defined

What makes a study ecologic

Unit of analysis

A group: county, nation, school, neighbourhood. Each data point is a population summary, not a person.

Variants

Exploratory (exposure not measured directly), analytic (exposure measured and modelled), or partial ecologic (mixing individual- and group-level variables).

The primary limitation: we do not know the joint distribution of risk factors and disease within groups. That ignorance is the source of potential bias.

Two examples

Ecologic designs in practice

Idaho groundwater (Ex. 29.1)

County-level arsenic exposure vs. cancer incidence. No significant association after adjustment. A group-level null need not mean no individual-level effect.

US bladder cancer (Ex. 29.2)

State-level predictors: smoking prevalence, insurance coverage, UV index, water type. Associations found, but do they hold for individuals?

Why use it

Four rationales, one warning

Measurement constraint

Individual exposure is impractical or impossible to record directly, e.g., historical pollution or dietary intake across a whole population.

Exposure homogeneity

All group members share the same exposure, e.g., same water supply, same school curriculum.

Group-level interest

The research question is inherently about groups: vaccination coverage and population-level incidence.

Simplicity

Fast and low-cost. But simplicity alone does not justify an ecologic design when individual-level data are available.

Design types

Three ecologic designs

Time-trend

One group over time. Common for policy evaluations: did the intervention change the trajectory?

Multi-group

Many groups at one time point. Classic cross-national comparisons of disease rates.

Mixed (panel)

Varies both place and time. Harder to confound; more analytic complexity.

Morgenstern (1995) formalized this typology and the conditions under which ecologic inference is defensible.

Carry forward

What to take into the next section

Ecologic studies are justified by measurement constraints, exposure homogeneity, or a genuine group-level question, not by convenience alone.
The primary limitation is ignorance of the within-group joint distribution of exposure and disease.
Three design types (time-trend, multi-group, mixed) shape how much of the bias machinery applies.

Introduction and Overview

Earlier lessons worked with one unit of analysis: the individual person. Cross-sectional, case-control, and cohort designs all sample people, measure exposures and outcomes on people, and make inferences about people. This lesson changes the unit of analysis to the group (counties, schools, nations) and immediately introduces a problem the previous designs did not have: even when group-level associations are strong and well measured, you cannot, in general, conclude anything about individuals from them. The error you will spend most of this lesson learning to recognise is the ecological fallacy, demonstrated by Robinson (1950) and named by Selvin (1958). Across the four content sections we move from the rationale for these designs (this section), to the kinds of variables they use, to the inferential traps they create, and finally to the analytic strategies that mitigate those traps and the related-but-distinct world of group-level studies that do not commit the fallacy at all.

Learning Objectives

Define ecologic studies and distinguish exploratory, analytic, and partial-ecologic variants (Morgenstern, 1995).
Explain why the unit of analysis matters and how it shapes the inferences a design can support.
Articulate four legitimate reasons to run an ecologic study (measurement constraint, exposure homogeneity, group-level interest, simplicity) and the cost each carries.
Read group-level examples (arsenic in groundwater, bladder cancer across U.S. states) with the right inferential caveats from the start.

14.1 What Are Ecologic Studies?

In a fully ecologic study, exposure, outcome, and confounders are all measured at the group level (e.g., townships, counties, nations), frequently with the aim of drawing inferences about individuals. The groups serve as cluster samples of the population.

Ecologic studies can be exploratory (no direct exposure measurement, looking for associations to guide future research) or analytic (exposure factor is measured and included in the analysis). Some studies are partial ecologic, combining some individual-level variables with group-level variables, which introduces unique inferential challenges.

Key Limitation

The primary limitation of ecologic studies is that we do not know the joint distribution of risk factors and disease within groups. This ignorance about within-group associations creates the potential for severe bias when inferring to the individual level.

In plain terms, we know each group's average exposure and its overall disease rate, but not whether it was the exposed members who actually developed the disease. That missing link is what lets a group-level pattern and the individual-level truth point in different directions.
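To make the missing joint distribution concrete, here is a small sketch with made-up numbers: one hypothetical group of 1,000 people, half exposed, with 100 cases in total. The group-level summary (50% exposed, 10% diseased) is fixed, but the individual-level risk ratio depends entirely on how the cases split between exposed and unexposed members, which an ecologic study never observes. It uses risks rather than rates, assuming a closed group with uniform follow-up.

```python
# Hypothetical group: 1,000 people, 500 exposed, 100 cases in total (illustrative numbers only).
N, N_EXPOSED, TOTAL_CASES = 1000, 500, 100


def individual_risk_ratio(exposed_cases: int) -> float:
    """Risk ratio for one possible split of the group's cases between exposed and unexposed members."""
    unexposed_cases = TOTAL_CASES - exposed_cases
    risk_exposed = exposed_cases / N_EXPOSED
    risk_unexposed = unexposed_cases / (N - N_EXPOSED)
    return risk_exposed / risk_unexposed


# The group-level summary is identical in every scenario below;
# only the unobserved within-group split of cases changes.
for exposed_cases in (80, 50, 20):
    rr = individual_risk_ratio(exposed_cases)
    print(f"{exposed_cases} of {TOTAL_CASES} cases exposed -> individual RR = {rr:.2f}")
```

The three splits give individual risk ratios of 4, 1 and 0.25 from exactly the same group-level data, which is why the group summary alone cannot settle the individual-level question.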

14.1.1 Examples of Ecologic Studies

Two short examples make the design concrete before we discuss why anyone would ever use it.

Example 29.1: Arsenic in Idaho Groundwater

County-level data on cancer incidence and arsenic levels in groundwater were examined. After adjusting for confounders, no significant relationship was found between county-level arsenic exposure and cancer incidence. This illustrates how group-level analyses may fail to detect individual-level associations.

Example 29.2: Bladder Cancer Across US States

Bladder cancer mortality rates across US states were examined in relation to state-level predictors: smoking prevalence, health insurance coverage, UV index, and water supply type. The ecological analysis identified associations that may or may not reflect individual-level causal mechanisms.

14.1.2 Rationale for Ecologic Studies

Given the inferential limit just stated, why does anyone run an ecologic study at all? Four reasons recur, each unpacked below. The first two describe situations where group-level exposure data are the only practical option or are nearly as informative as individual data; the third covers research questions that are genuinely about groups; the fourth is a warning that simplicity has a real cost. Susser (1994) provides the foundational defence of ecologic analysis as both an outlook and a method in modern epidemiology.

Measurement Constraints at Individual Level

Individual-level measurement of some exposures is impractical or impossible. For example, measuring historical pollution levels or dietary intake for an entire population is expensive. Group-level aggregates (e.g., county-level average pollutant concentration, regional disease prevalence) can serve as proxies.

Exposure Homogeneity Within Groups

In some situations, exposure is relatively homogeneous within groups. For instance, all residents of a region receive water from the same supply, all schoolchildren in a district receive the same curriculum-based intervention, or all patients in a clinic receive the same standard of care.

Interest in Group-Level Effects

Sometimes the research question is fundamentally about group-level phenomena: Do communities with water fluoridation have lower dental caries rates? Do nations with higher vaccination coverage have lower measles incidence? The group itself is the unit of scientific interest.

Simplicity of Analysis

Ecologic analysis is often simpler and faster than acquiring and analyzing individual-level data across many groups. However, this simplicity may hide serious methodological problems and inferential errors.

Reflection

Think of a public health issue in your community. How might you design an ecologic study to examine it? What would be your unit of analysis (e.g., neighbourhood, city, province)? What group-level variables would you measure?

Model answer: A defensible example is an ecologic study of municipal cycling infrastructure (km of protected lanes per 100,000 residents) and bicycle-related serious injury rates across BC cities, using ICBC injury claims as the outcome. Unit of analysis: city / health region. Variables: (a) aggregate: average commute distance; (b) environmental: km of protected lanes; (c) global: population density, presence of a Vision Zero policy, and provincial law (mandatory helmet, e-bike rules), since policy varies by jurisdiction. This design is feasible because the exposure exists only at the city level and individual exposure ("did you ride past a protected lane this morning?") is not the relevant scale of inference. Limitation to name: ecologic findings here speak to group-level policy questions, not individual behaviour-change recommendations.


Knowledge Check: this section

1. What distinguishes an ecologic study from other observational study designs?

It always uses randomization
Exposures and outcomes are measured at the group level rather than the individual level
It can only study environmental factors

In ecologic studies, the unit of analysis is the group (e.g., county, nation), not the individual. Measurements are aggregated or summarized at the group level.

2. What is a "partial ecologic study"?

A study that only measures the outcome
A study where some variables are measured at the individual level and others at the group level
A study with incomplete data collection

Partial ecologic studies mix individual- and group-level measurements, which introduces unique inferential challenges.

3. Which of the following is NOT a rationale for conducting ecologic studies?

Difficulty measuring individual-level exposures
Interest in group-level effects like public health interventions
The ability to determine individual-level causal mechanisms

Ecologic studies cannot establish individual-level causal mechanisms because the data are aggregated. They are useful when individual-level measurement is impractical or when group-level effects are of interest.

Section 2 of 5

Types of Ecologic Variables & The Linear Model

⏱ Estimated reading time: 45 minutes


Three variable categories and the group-level regression that links them to outcomes.

Variable typology

Three categories of ecologic variable

Aggregate

Derived by summarizing individual data within the group. Examples: proportion vaccinated, mean BMI, disease rate.

Environmental

Physical characteristics of the group itself. Examples: UV index, pollutant level, water supply type.

Global

No individual-level analogue. Examples: population density, healthcare policy, presence of a smoking ban.

Global variables exist only at the group level; there is nothing to assign to an individual person.

The model

The ecologic linear model

Ecologic linear model

\[ \color{#0B7B6B}{Y_j} = \color{#C2410C}{\beta_0} + \color{#6D28D9}{\beta_1} \color{#1D4ED8}{X_{1j}} + \beta_2 X_{2j} + \color{#BE185D}{\varepsilon_j} \]

Y_j = group disease rate; β₀ = baseline rate; β₁ = exposure slope; X_1j = group exposure; ε_j = group error

Where \(Y_j\) is the outcome rate for group \(j\), \(X_{1j}\) is the exposure proportion, \(X_{2j}\) is a confounder proportion, and \(\varepsilon_j\) is the error term.

\(\beta_0\) = predicted rate when \(X_1 = 0\) (no group member exposed).
\(\beta_1\) = change in outcome rate per unit increase in exposure proportion.

The rate ratio

Group-level incidence rate ratio

Group-level IR_G (Eq. 29.1)

\[ \color{#0B7B6B}{\text{IR}_G} = \frac{\color{#C2410C}{\beta_0} + \color{#6D28D9}{\beta_1}}{\color{#C2410C}{\beta_0}} = 1 + \frac{\color{#6D28D9}{\beta_1}}{\color{#C2410C}{\beta_0}} \]

IR_G = group rate ratio; β₀ = baseline rate; β₁ = exposure slope

This is the ratio of predicted rates in a fully exposed group versus a fully unexposed group.

The problem: most observed exposure proportions lie between those extremes. Estimating \(\text{IR}_G\) requires extrapolation to \(X_1 = 0\) and \(X_1 = 1\), often well outside the observed data range.

Modelling pitfalls

Three issues in published ecologic work

Correlation vs. regression: About one-third of ecologic studies report a correlation coefficient. Only a regression coefficient estimates the incidence rate difference directly.
Standardised outcomes: Using standardised mortality ratios rather than crude rates adds complexity and can distort \(\text{IR}_G\).
Interaction form mismatch: A linear group-level model and a logit individual-level model handle interaction terms differently. Estimates may not be comparable across levels.

Carry forward

What to take into the next section

Three variable types: aggregate (derived from individuals), environmental (place-level), global (no individual analogue).
The ecologic linear model estimates \(\text{IR}_G = 1 + \beta_1/\beta_0\), which requires extrapolation beyond most observed data.
Regression coefficients are preferable to correlation coefficients; interaction forms differ between levels.

Introduction and Overview

An earlier section motivated the design and named its central liability. To use it carefully, we need a vocabulary for the variables it operates on, a model that connects them, and a clear sense of what that model can and cannot say. This section provides all three.

Learning Objectives

Distinguish aggregate, environmental, and global ecologic variables and explain why the distinction matters for cross-level inference.
Read and interpret the ecologic linear model Y_j = β₀ + β₁X_1j + ε_j and the group-level incidence rate ratio it implies.
Explain why estimating IR_G requires extrapolation beyond observed exposure ranges and what that means for inference.
Identify modelling pitfalls: correlation vs regression, standardised outcomes, and cross-level interaction differences.

14.2 Categories of Ecologic Variables

Three major categories of variables can be used in ecologic models, each with different attributes and interpretations. They differ chiefly in whether the variable has any individual-level analogue at all, a distinction that becomes important when we reach the ecologic fallacy in a later section.


In brief: an aggregate variable summarizes the individuals in a group (the percent vaccinated, the mean BMI, the disease rate); an environmental variable is a physical feature of the place itself (the UV index, a pollutant level, the water supply type); and a global variable has no individual version at all (a law, a policy, or the population density). Aggregate variables are built from individuals, and environmental variables usually have an individual analogue (each person's actual exposure or dose, which can vary within the group); only global variables have no individual counterpart, and that difference is what will matter when we reach the ecologic fallacy.

With the variable types in hand, the standard analytic move is a linear regression at the group level. The model below is the workhorse; the equations that follow it describe what the regression coefficients mean and where their interpretation gets uncomfortable.

14.2.1 The Linear Model in Ecologic Studies

Ecologic studies often use linear regression to model the relationship between group-level exposure and group-level outcome:

Ecologic linear model

\[ \color{#0B7B6B}{Y_j} = \color{#C2410C}{\beta_0} + \color{#6D28D9}{\beta_1} \color{#1D4ED8}{X_{1j}} + \beta_2 X_{2j} + \color{#BE185D}{\varepsilon_j} \]

The group rate of disease equals a baseline level plus the slope times the group exposure proportion (and other group predictors), plus group-level error. Each unit is a group, not a person.

Where Y is the outcome rate for group j, X₁ is the exposure proportion, X₂ is a confounder, and ε is the error term. The group-level incidence rate ratio (IR_G) is estimated as:

Group-level incidence rate ratio (Eq. 29.1)

\[ \color{#0B7B6B}{\text{IR}_G} = \frac{\color{#C2410C}{\beta_0} + \color{#6D28D9}{\beta_1}}{\color{#C2410C}{\beta_0}} = 1 + \frac{\color{#6D28D9}{\beta_1}}{\color{#C2410C}{\beta_0}} \]

The group-level rate ratio compares the rate when everyone is exposed against the baseline rate when no one is. It is driven entirely by the ratio of the exposure slope to the intercept.
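As a minimal sketch of how IR_G falls out of a fitted model, the snippet below fits the ecologic linear model by ordinary least squares with NumPy's `polyfit` and then applies Eq. 29.1. The county data are hypothetical and deliberately noise-free (they lie exactly on Y = 50 + 100X) so the coefficients are easy to check.

```python
import numpy as np

# Hypothetical groups (e.g., counties): proportion exposed and disease rate per 100,000 person-years.
# Rates are constructed to lie exactly on Y = 50 + 100*X so the fit is easy to verify.
x_exposed = np.array([0.20, 0.30, 0.40, 0.50, 0.60])
y_rate = 50 + 100 * x_exposed


def ecologic_rate_ratio(x, y):
    """Fit Y_j = b0 + b1*X_j by ordinary least squares and return (b0, b1, IR_G) per Eq. 29.1."""
    b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest-degree coefficient first
    return b0, b1, 1 + b1 / b0


b0, b1, ir_g = ecologic_rate_ratio(x_exposed, y_rate)
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}, IR_G = {ir_g:.2f}")
```

Here b1 (100 per 100,000 per unit of exposure proportion) is the group-level rate difference, and IR_G = 1 + 100/50 = 3. Note that no group has exposure below 0.2 or above 0.6, so both ends of the comparison are extrapolations.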

A major limitation of this approach is that IR_G requires extrapolation to groups with 0% and 100% exposure, which may extend far beyond the range of observed data. Additionally, different group sizes may require weighted regression for valid inference.
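The sketch below illustrates both caveats with hypothetical data: one tiny, noisy group distorts an unweighted fit, while weighting each group by its population size keeps the estimate close to the pattern in the large groups. Weighting by population is an assumption chosen here for illustration (it treats the variance of a group rate as roughly proportional to 1/n); other weighting schemes exist. NumPy's `polyfit` applies `w` to the unsquared residual, so passing the square root of the population gives squared-residual weights proportional to population.

```python
import numpy as np

# Hypothetical groups: exposure proportion, rate per 100,000, and population size.
x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y = np.array([60.0, 80.0, 100.0, 120.0, 80.0])  # last group is tiny and off-pattern
n = np.array([100_000, 100_000, 100_000, 100_000, 500])


def fit_ir_g(x, y, pop=None):
    """Return (b0, b1, IR_G); if pop is given, weight each squared residual by group size."""
    # polyfit weights the *unsquared* residual, so sqrt(pop) yields squared-residual weights ~ pop.
    w = None if pop is None else np.sqrt(pop)
    b1, b0 = np.polyfit(x, y, deg=1, w=w)
    return b0, b1, 1 + b1 / b0


for label, pop in (("unweighted", None), ("weighted", n)):
    b0, b1, ir_g = fit_ir_g(x, y, pop)
    print(f"{label:>10}: b0 = {b0:.1f}, b1 = {b1:.1f}, IR_G = {ir_g:.2f}")

# IR_G compares X = 0 with X = 1; flag when the observed groups do not span that range.
extrapolating = bool(x.min() > 0 or x.max() < 1)
print(f"observed exposure range {x.min():.1f}-{x.max():.1f}; extrapolating: {extrapolating}")
```

The unweighted slope is dragged down to 40, while the population-weighted slope stays close to 100. Even the weighted IR_G still rests on extrapolation, because no group is fully unexposed or fully exposed.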

14.2.2 Modelling Issues

Several issues arise when modelling ecologic data:

Correlation vs. regression: About 33% of ecologic studies use correlation coefficients instead of regression coefficients. Regression coefficients estimate the incidence rate difference, which correlation does not provide directly.
Standardized outcomes: Some studies use standardized mortality ratios (SMRs) rather than crude rates, which may introduce additional complexity.
Interaction terms: The form of interaction at the group level may differ from the individual level when using linear models at group level and logit models at individual level.
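A quick way to see why a correlation coefficient is less informative than a regression slope: in the hypothetical sketch below, two sets of group rates share exactly the same correlation with exposure, yet one implies a rate difference ten times larger than the other. The numbers are invented for illustration.

```python
import numpy as np

# Hypothetical groups: exposure proportion and two sets of rates per 100,000.
x = np.array([0.2, 0.3, 0.4, 0.5, 0.6])
noise = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
y_weak = 50 + 10 * x + noise          # modest rate difference
y_strong = 50 + 10 * (y_weak - 50)    # same pattern stretched tenfold: 50 + 100x + 10*noise

results = {}
for label, y in (("weak", y_weak), ("strong", y_strong)):
    r = np.corrcoef(x, y)[0, 1]           # Pearson correlation: unit-free, ignores scale
    slope, _ = np.polyfit(x, y, deg=1)    # slope = group-level rate difference per unit X
    results[label] = (r, slope)
    print(f"{label:>6}: r = {r:.3f}, rate difference b1 = {slope:.1f} per 100,000")
```

Both datasets give r of about 0.72, but rate differences of 9.5 and 95 per 100,000. Only the regression coefficient carries the magnitude needed for ID_G and, with the intercept, for IR_G.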

Reflection

Consider the three types of ecologic variables. For a study on the relationship between income inequality and mental health outcomes across Canadian provinces, classify each: (a) provincial median income, (b) provincial mental health policy score, (c) average winter temperature.

Model answer: (a) Provincial median income: aggregate variable (a summary of individual incomes). (b) Provincial mental health policy score: global / contextual variable (exists only at the group level; no individual analogue). (c) Average winter temperature: environmental variable (a feature of the place itself). The distinction matters because aggregate variables, and the individual exposures that environmental variables stand in for, can in principle be related back to individuals (with strong assumptions), while global variables cannot; their effects on health are inherently contextual and require multilevel models if you want individual-level inference.


Knowledge Check: this section

1. Which type of ecologic variable has NO analogue at the individual level?

Aggregate variable
Environmental variable
Global variable

Global variables (e.g., population density, laws, organizational policies) are characteristics of the group that cannot be meaningfully measured for an individual.

2. In an ecologic linear regression model Y_j = β₀ + β₁X_1j + ε_j, what does the group-level incidence rate ratio IR_G estimate?

The ratio of disease rates in exposed vs non-exposed individuals
The ratio of predicted disease rates when the group proportion exposed changes from 0 to 1
The individual-level odds ratio

IR_G = 1 + β₁/β₀, representing the ratio of the predicted rate in a fully exposed group to the rate in a fully unexposed group. This requires extrapolation beyond observed data ranges.

3. Why is using correlation coefficients rather than regression coefficients problematic in ecologic studies?

Correlation coefficients are always larger
Regression coefficients provide an estimate of the incidence rate difference, which correlation coefficients do not
Correlation coefficients require larger sample sizes

Regression coefficients provide an estimate of the incidence rate difference (ID_G), which can be used to estimate the incidence rate ratio. Correlation coefficients do not provide this directly and are considered less informative.

Section 3 of 5

Inferential Errors & Sources of Ecologic Bias

⏱ Estimated reading time: 45 minutes


The ecological fallacy, the atomistic fallacy, and the three structural reasons ecologic estimates mislead.

The fallacy named

The ecological fallacy (Robinson, 1950)

The error of assuming a group-level association applies to individuals.

Robinson's 1950 demonstration: US states with more foreign-born residents had higher average literacy at the group level. Within states, foreign-born individuals were less literate on average. Same data, opposite conclusions at different levels.

The mirror error

The atomistic fallacy (Schwartz, 1994)

The error of assuming individual-level findings hold at the group level. Populations have properties individuals do not.

Herd immunity

A population-level phenomenon. There is no individual-level analogue: you cannot be herd-immune; your community can be.

The symmetry

Ecological fallacy: group data wrongly applied to individuals. Atomistic fallacy: individual data wrongly applied to groups. Both are cross-level inference errors.

Bias source 1

Within-group misclassification

Effect on IR_G (Eq. 29.3)

\[ \color{#0B7B6B}{\text{IR}_G} = 1 + \frac{\color{#1D4ED8}{\text{IR}} - 1}{\color{#C2410C}{\text{Se}} + \color{#6D28D9}{\text{Sp}} \cdot \color{#1D4ED8}{\text{IR}} - \color{#1D4ED8}{\text{IR}}} \]

IR_G = observed group rate ratio; IR = true rate ratio; Se = sensitivity; Sp = specificity

Where IR is the true individual-level rate ratio, Se is sensitivity, and Sp is specificity.

Key reversal: non-differential misclassification at the individual level biases toward the null. At the group level it biases away from the null, inflating the ecologic association.
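Eq. 29.3 is easy to explore numerically. The sketch below simply codes the equation as given, under its own assumptions (linear group-level model, binary exposure misclassified non-differentially), and evaluates it for a few true rate ratios. The guard on the denominator is an addition for illustration: with Se + Sp > 1, a non-positive denominator corresponds to a non-positive intercept in the misclassified model, where the ratio stops being meaningful.

```python
def observed_ir_g(true_ir: float, se: float, sp: float) -> float:
    """Eq. 29.3: ecologic rate ratio when a binary exposure is non-differentially misclassified."""
    denom = se + sp * true_ir - true_ir
    if denom <= 0:
        # With Se + Sp > 1 this means the misclassified model's intercept is not positive.
        raise ValueError("Se + Sp*IR - IR must be positive for Eq. 29.3 to give a usable ratio")
    return 1 + (true_ir - 1) / denom


# Sensitivity and specificity of 0.9 are illustrative values, not taken from any study.
for ir in (0.5, 2.0, 4.0):
    print(f"true IR = {ir}: observed IR_G = {observed_ir_g(ir, se=0.9, sp=0.9):.2f}")
```

With Se = Sp = 0.9, true rate ratios of 0.5, 2 and 4 become about 0.41, 2.43 and 7.00: every estimate moves away from the null, which is the reversal described above. With perfect classification (Se = Sp = 1) the function returns the true IR unchanged.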

Bias sources 2 and 3

Group-level confounding and effect modification

Group-level confounding

A factor not confounding at the individual level can confound at the group level if its distribution varies across groups. Ecologic adjustment only partly removes the bias (Ex. 29.5).
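A toy version of group-level confounding, with entirely hypothetical numbers: exposure has no effect on anyone, but a group-level factor z raises every member's rate and happens to be more common where exposure is common. The crude ecologic regression then reports a sizeable exposure effect. This toy has only a purely group-level confounder, so it does not reproduce the partial-adjustment problem of Ex. 29.5.

```python
import numpy as np

# Hypothetical groups. Exposed and unexposed members of a group share the same rate,
# so the true individual-level rate difference is 0 and the true rate ratio is 1.
x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # proportion exposed
z = np.array([0.0, 0.2, 0.5, 0.6, 1.0])   # hypothetical group-level confounder (e.g., an index)
y = 20 + 40 * z                            # group rate per 100,000: driven by z only

b1, b0 = np.polyfit(x, y, deg=1)           # crude ecologic model: Y_j = b0 + b1*X_j
print(f"crude ecologic rate difference = {b1:.1f} per 100,000 (true individual effect: 0)")
print(f"crude IR_G = {1 + b1 / b0:.2f} (true individual rate ratio: 1.00)")
```

The crude fit gives a slope of 48 per 100,000 and an IR_G of about 4.33, an effect created entirely by how z is distributed across groups.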

Effect modification by group

When the rate difference varies across groups, the linear model introduces bias. In Ex. 29.6, the individual IR was 5.0; the ecologic IR_G was 0.67, a complete reversal.
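The mechanism behind a reversal like Ex. 29.6 can be reproduced with a few invented groups. The sketch below is not the data from Ex. 29.6; it assumes the same unexposed rate in every group but a rate difference that shrinks as the proportion exposed rises, so exposure is harmful or null within every group.

```python
import numpy as np

# Hypothetical groups: identical unexposed rate, but a rate difference
# (exposed minus unexposed) that varies by group, i.e., effect modification by group.
p = np.array([0.1, 0.3, 0.5, 0.7, 0.9])                    # proportion exposed
rate_unexposed = 10.0                                       # per 1,000 person-years
rate_difference = np.array([40.0, 30.0, 20.0, 10.0, 0.0])   # shrinks as exposure becomes common
group_rate = rate_unexposed + rate_difference * p           # what an ecologic study observes

individual_ir = (rate_unexposed + rate_difference) / rate_unexposed  # within-group rate ratios
b1, b0 = np.polyfit(p, group_rate, deg=1)
ir_g = 1 + b1 / b0

print("within-group IR:", individual_ir)
print(f"ecologic IR_G = {ir_g:.2f}")
```

Every within-group rate ratio is at least 1 (5, 4, 3, 2, 1), yet the fitted line has slope -5 and intercept 18.5, giving IR_G of about 0.73: the ecologic analysis makes a harmful exposure look protective.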

When bias is smaller

Conditions that reduce ecologic bias

Wide range of exposure across groups: more signal relative to within-group noise.
Small within-group exposure variance: group summaries are better proxies for individuals.
Strong exposure effect: large signal-to-bias ratio.
Similar confounder distributions across groups: limits group-level confounding.
Uniform rate difference across groups: avoids the effect-modification mechanism.

These conditions are arguments for choosing settings and groups carefully, not for ignoring the problem.

Carry forward

What to take into the next section

Ecological fallacy: group-level association wrongly applied to individuals (Robinson, 1950).
Atomistic fallacy: individual-level finding wrongly applied to groups (Schwartz, 1994).
Three bias sources: within-group misclassification (inflates away from null), group-level confounding, and effect modification by group.
Use the embedded simulators to build these failure modes before moving on.

Introduction and Overview

Earlier sections set up the design and the model. This section is about the trap. It introduces two complementary errors (the ecologic fallacy and the atomistic fallacy), then catalogues the three structural reasons ecologic estimates can mislead. Two interactive simulators let you build the failure modes yourself rather than just reading about them.

Learning Objectives

Define the ecologic fallacy (Robinson, 1950) and explain how a strong group-level association can mislead about individual-level effects.
Define the atomistic fallacy and identify population-level emergent properties (e.g., herd immunity) that have no individual analogue.
Identify the three structural sources of ecologic bias: within-group exposure misclassification, group-level confounding, and effect modification by group.
Predict the direction and magnitude of bias under each source and use that prediction to read published ecologic studies critically.

14.3 The Ecologic Fallacy

The ecologic fallacy is the error of assuming that a group-level association applies to individuals. A finding at the group level (e.g., exposure associated with a 3× increase in disease risk) does not necessarily hold for individuals. The problem was formally demonstrated by Robinson (1950), and the term itself was coined by Selvin (1958).

▸ INTERACTIVE STORY, THE ECOLOGICAL FALLACY

Watch a country-level pattern flip when you zoom into individuals.

A 6-scene visualization of Simpson's paradox: wine consumption and life expectancy across countries shows a strong positive trend; zoom into France and the individual-level pattern reverses. Aggregate data answers aggregate questions.

The group-level bias typically exaggerates the association away from the null, but can occasionally reverse the direction of association.

14.3.1 The Atomistic Fallacy

The atomistic fallacy is the opposite error: assuming individual-level findings apply at the group level (Schwartz, 1994; Diez Roux, 1998). Populations have emergent properties not found in individuals. A classic example is herd immunity, a population-level phenomenon with no individual-level counterpart.

Key Distinction

The ecologic fallacy occurs when group-level findings are incorrectly applied to individuals. The atomistic fallacy occurs when individual-level findings are incorrectly applied to groups.

Hands-on: Ecological Fallacy Explorer

What you'll do: use the simulator below to set a within-group slope (how X relates to Y for individuals inside the same group) and a between-group slope (how group means relate), then watch the ecological regression line that an analyst would actually report. What to take away: when the within-group and between-group slopes have opposite signs, the ecological regression points the wrong way for individuals, and you have built the canonical failure mode by hand. Try the “Robinson 1950” preset first; it reproduces the original demonstration that gave the fallacy its name.

📊 Interactive: Ecological Fallacy Explorer

Each colored dot is a person, nested inside a group (e.g., a country or neighborhood). Adjust the within-group slope (how X & Y relate inside a group) and the between-group slope (how group means relate). When the two slopes have opposite signs, the ecological regression lies about the individual reality.

Individual-level data

Each dot = a person, colored by group. Black line = individual-level regression slope.

Group-level (ecological) data

Each large dot = a group's mean. Red line = ecological regression, what an ecological study reports.

Within-group slope (β_W), default +0.7: how X relates to Y for individuals inside the same group.

Between-group slope (β_B), default -1.00: how group means of X relate to group means of Y.

Number of groups, default 6: how many group clusters to draw.

People per group, default 25: how many individuals per group.

Within-group spread, default 1.00: how tightly individuals cluster inside their group.

Try this: set the within-group slope to +1.0 and the between-group slope to −1.0. Individuals show a clear positive relationship, but the ecological regression line points the wrong way. That is the ecological fallacy.
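If you want to see the mechanism outside the widget, here is a minimal R sketch of the same setup. It is not the simulator's source code. The function `simulate_groups()` is hypothetical, group means of X are assumed to be evenly spaced, and Gaussian noise is added to Y. With β_W = +1 and β_B = −1, the within-group and ecological slopes come out with opposite signs.

```r
# Hypothetical sketch of the Explorer's setup (not the widget's own code).
set.seed(1950)

simulate_groups <- function(beta_w = 1, beta_b = -1, n_groups = 6,
                            n_per_group = 25, spread = 1) {
  gid <- rep(seq_len(n_groups), each = n_per_group)
  gx  <- seq(0, 10, length.out = n_groups)   # group means of X (assumed evenly spaced)
  gy  <- beta_b * gx                         # group means of Y follow the between slope
  dx  <- rnorm(length(gid), sd = spread)     # each person's deviation from the group mean
  data.frame(
    group = factor(gid),
    x     = gx[gid] + dx,
    y     = gy[gid] + beta_w * dx + rnorm(length(gid), sd = 0.3)
  )
}

d <- simulate_groups(beta_w = 1, beta_b = -1)

# Within-group slope: a separate intercept per group, one common slope for X
within_slope <- coef(lm(y ~ x + group, data = d))[["x"]]

# Ecological slope: regress the group means of Y on the group means of X
gm <- aggregate(cbind(x, y) ~ group, data = d, FUN = mean)
eco_slope <- coef(lm(y ~ x, data = gm))[["x"]]

# Pooled slope: ignore the grouping altogether
pooled_slope <- coef(lm(y ~ x, data = d))[["x"]]

round(c(within = within_slope, ecological = eco_slope, pooled = pooled_slope), 2)
```

The group-fixed-effect regression isolates the within-group slope because each group gets its own intercept. The regression on group means sees only the between-group pattern.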

The simulator shows that the fallacy is real and structural: the same dataset can support opposite conclusions at the two levels. The next section names the three specific mechanisms by which this structural problem becomes a quantitative bias in real ecologic estimates.

14.4 Three Sources of Ecologic Bias

Each mechanism below corresponds to a different way the within-group slope and the between-group slope can come apart. The first concerns exposure measurement. The second concerns confounders that behave differently at the two levels. The third concerns the mismatch between the additive group-level linear model and a non-additive individual-level model. The canonical treatments of these mechanisms are Greenland & Robins (1994) and Morgenstern (1995).

14.4.1 Within-Group Misclassification (Bias)

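Non-differential exposure misclassification at the individual level biases group-level estimates AWAY from the null. This is the opposite of its usual effect in individual-level studies. For a binary exposure under the ecologic linear model, the expected group-level rate ratio is: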

Effect of misclassification on IR_G (Eq. 29.3)

\[ \color{#0B7B6B}{\text{IR}_G} = 1 + \frac{\color{#1D4ED8}{\text{IR}} - 1}{\color{#C2410C}{\text{Se}} + \color{#6D28D9}{\text{Sp}}\,\color{#1D4ED8}{\text{IR}} - \color{#1D4ED8}{\text{IR}}} \]

The observed group-level rate ratio is distorted by exposure misclassification. It depends on the true rate ratio and on the sensitivity and specificity with which individuals' exposure is classified when group exposure prevalence is measured. The usual effect is to inflate the ratio away from the null. When the true effect is strong, even modest errors can drive the denominator below zero and reverse the apparent direction of the association.

Here IR is the true individual-level incidence rate ratio, Se is the sensitivity, and Sp is the specificity of exposure classification. Example 29.4 (a school CRD study) demonstrated how misclassification at the individual level inflates group-level estimates.
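Eq. 29.3 is easy to explore numerically. The short R sketch below codes the formula directly and evaluates it for a few illustrative values of Se and Sp. The function name `ir_g_misclassified` is our own, not from any package. The sketch assumes a binary exposure, the ecologic linear model, and no other source of ecologic bias.

```r
# Eq. 29.3: expected group-level rate ratio (IR_G) under non-differential
# exposure misclassification, given the true individual-level rate ratio (ir),
# sensitivity (se) and specificity (sp). Assumes the ecologic linear model
# and no other source of ecologic bias.
ir_g_misclassified <- function(ir, se, sp) {
  1 + (ir - 1) / (se + sp * ir - ir)
}

ir_g_misclassified(ir = 2,  se = 1,   sp = 1)    # 2: no misclassification, no bias
ir_g_misclassified(ir = 2,  se = 0.8, sp = 1)    # 2.25: pushed away from the null
ir_g_misclassified(ir = 2,  se = 0.9, sp = 0.9)  # about 2.43: further from the null
ir_g_misclassified(ir = 10, se = 0.9, sp = 0.9)  # about -89: denominator below zero
```

The last call shows the breakdown described above. Once Se + Sp·IR − IR turns negative, the group-level estimate is no longer an interpretable rate ratio at all.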

Here is the intuition behind that reversal of the usual rule. Mismeasuring exposure squeezes the groups' measured exposure into a narrower range, while their disease rates stay where they are. Fitting a line through the same vertical spread over a shorter horizontal span forces the slope to be steeper, and a steeper slope relative to the intercept is exactly what a larger group-level rate ratio means.
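The compression argument can be checked with a small, fully hypothetical example. It uses five groups with a common baseline rate of 0.01 and a true rate ratio of 2, whose exposure prevalences are then misclassified with Se = Sp = 0.9. The helper `ir_g()` is illustrative. It computes the ecologic rate ratio from the fitted linear model as IR_G = (B0 + B1)/B0, the ratio of predicted rates at 100% and 0% exposure.

```r
# Hypothetical groups: no confounding, no effect modification.
x_true <- c(0.1, 0.3, 0.5, 0.7, 0.9)    # true exposure prevalence per group
r0     <- 0.01                          # rate in the unexposed (same in every group)
ir     <- 2                             # true individual-level rate ratio
rate   <- r0 * (1 + (ir - 1) * x_true)  # group rate = r0 + (r1 - r0) * x

# Non-differential misclassification with Se = Sp = 0.9
se    <- 0.9
sp    <- 0.9
x_obs <- se * x_true + (1 - sp) * (1 - x_true)  # measured prevalence

# Ecologic rate ratio from the linear model: IR_G = (b0 + b1) / b0
ir_g <- function(x, y) {
  b <- coef(lm(y ~ x))
  unname((b[1] + b[2]) / b[1])
}

rbind(true = range(x_true), measured = range(x_obs))  # measured range is narrower
ir_g(x_true, rate)   # 2: correct exposure data recover the true ratio
ir_g(x_obs, rate)    # about 2.43, the value Eq. 29.3 predicts
```

The measured prevalences (0.1 + 0.8 times the true values) span a narrower range while the rates are unchanged. The fitted slope is therefore steeper and the intercept smaller, which pushes the estimate above 2.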

14.4.2 Group-Level Confounding

Group-level confounding arises from differential distribution of individual-level risk factors across groups. Critically, even factors that are NOT confounders at the individual level can cause confounding at the group level.

Controlling for extraneous risk factors in an ecologic analysis generally removes only part of the bias. Example 29.5 showed confounding that produces a biased IR_G even when there is no confounding at the individual level.
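A hypothetical five-group example makes both points concrete. Within every group, the exposure X and a second risk factor Z are assumed independent, so Z cannot confound X at the individual level. The true rate ratio for X is 2 everywhere. Across groups, however, the prevalence of Z rises with the prevalence of X. The sketch also assumes that individual rates combine multiplicatively. This is one simple way to see why adjusting for Z in a group-level model does not fully remove the bias.

```r
# Hypothetical groups. Within every group X and Z are independent,
# so Z does not confound X at the individual level.
px   <- c(0.1, 0.3, 0.5, 0.7, 0.9)   # prevalence of exposure X
pz   <- c(0.2, 0.3, 0.5, 0.6, 0.9)   # prevalence of risk factor Z (tracks px)
r0   <- 0.01                         # rate when X = 0 and Z = 0
ir_x <- 2                            # true individual-level rate ratio for X
rr_z <- 3                            # rate ratio for Z

# Multiplicative individual model (an assumption of this sketch). With X and Z
# independent within a group, the group rate factorises as below.
rate <- r0 * (1 + (ir_x - 1) * px) * (1 + (rr_z - 1) * pz)

# Crude ecologic estimate: IR_G = 1 + b1 / b0 from rate ~ px
b <- coef(lm(rate ~ px))
ir_crude <- unname(1 + b[2] / b[1])

# "Adjusted" ecologic estimate: add pz to the group-level model and compare
# fitted rates at px = 1 versus px = 0, holding pz at its mean
fit_adj <- lm(rate ~ px + pz)
pred <- predict(fit_adj, newdata = data.frame(px = c(1, 0), pz = mean(pz)))
ir_adj <- unname(pred[1] / pred[2])

round(c(true = ir_x, crude = ir_crude, adjusted = ir_adj), 2)
```

With these inputs, the crude ecologic rate ratio lands well above 2. Adding pz moves the estimate closer to the truth but still does not return 2. The additive group-level model cannot represent the product term that the multiplicative individual model produces.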

14.4.3 Effect Modification (Interaction) by Group

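When the individual-level rate difference varies across groups, the group-level relationship between exposure prevalence and disease rate is no longer linear. The ecologic linear model assumes one additive effect shared by every group. That assumption fails, for example, when individual-level risks follow a non-additive (logit or multiplicative) model and baseline risks differ between groups.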

Example 29.6 is striking: effect modification by group completely reversed the direction of association. The true individual-level IR was 5.0, but the ecologic IR_G was 0.67, making a harmful exposure appear protective at the group level.
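A reversal of this kind can be reproduced with invented numbers. These are not the textbook's Example 29.6 data. In the sketch, three groups share the same rate in the unexposed, so there is no group-level confounding. The rate difference, however, is largest in the group with the least exposure. Exposure is harmful in every group, yet the ecologic linear model returns IR_G below 1.

```r
# Hypothetical groups: identical baseline rate (no group-level confounding),
# but the rate difference shrinks as exposure prevalence rises.
px <- c(0.2, 0.5, 0.8)         # exposure prevalence
r0 <- c(0.01, 0.01, 0.01)      # rate in the unexposed, identical in every group
r1 <- c(0.09, 0.03, 0.02)      # rate in the exposed, varies by group

# Individual-level rate ratio within each group: 9, 3 and 2 (all harmful)
r1 / r0

# Group rates and the ecologic linear model: IR_G = 1 + b1 / b0
rate <- r0 + (r1 - r0) * px
b <- coef(lm(rate ~ px))
ir_g <- unname(1 + b[2] / b[1])
ir_g   # about 0.52: the harmful exposure looks protective
```

Here IR_G comes out at 11/21 (about 0.52). The group with the largest effect happens to have the lowest exposure prevalence, so group rates fall as prevalence rises.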

14.4.4 When Cross-Level Bias Is Less Likely

Conditions Minimizing Ecologic Bias

Cross-level (ecologic) bias will NOT occur if:

The incidence rate difference within groups is uniform across groups, AND
There is no correlation between group-level exposure and the rate of the outcome in the unexposed
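The two conditions above can be checked with a small, hypothetical example under the ecologic linear model. Baseline rates vary across four groups but are constructed to be uncorrelated with exposure prevalence. The rate difference is the same everywhere, so the ecologic slope equals the individual-level rate difference exactly. Letting the baseline rate rise with prevalence violates the second condition and biases the slope.

```r
px <- c(0.2, 0.4, 0.6, 0.8)          # exposure prevalence in four hypothetical groups
r0 <- c(0.01, 0.02, 0.02, 0.01)      # baseline rates: vary, but uncorrelated with px
rd <- 0.03                           # same individual-level rate difference in every group

rate <- r0 + rd * px
cor(px, r0)                          # 0 (up to floating-point rounding)
coef(lm(rate ~ px))[["px"]]          # 0.03: the ecologic slope equals the true rate difference

# Violate condition 2: the baseline rate now rises with exposure prevalence
r0_conf   <- c(0.010, 0.012, 0.014, 0.016)
rate_conf <- r0_conf + rd * px
coef(lm(rate_conf ~ px))[["px"]]     # 0.04: biased by the baseline-rate trend
```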

Ecologic bias is LESS likely when:

There is a large observed range of exposure across groups
There is small within-group variance of exposure (homogeneous groups)
Exposure is a strong risk factor varying in prevalence across groups
Distribution of extraneous risk factors is similar among groups (little group-level confounding)
Positive and negative health controls are included to strengthen the ecologic evidence

Hands-on: MAUP Sandbox

What you'll do: the second simulator below holds the underlying individual-level data fixed and changes only how zone boundaries are drawn around the same people. What to take away: the area-level correlation can change dramatically (even flip sign) without anything about the people changing. This is the modifiable areal unit problem (Fotheringham & Wong, 1991), and it is the reason ecologic results are sensitive to choices that look administrative rather than scientific. Try each zoning preset; same individuals, different ecological “findings.”

🧭 Interactive: MAUP Sandbox, Same People, Different Zones

A 12×12 grid of people, each with an exposure value (X) and an outcome (Y). The individual correlation is fixed, but the area-level correlation depends entirely on how you draw the boundaries. Try each zoning scheme: same people, very different ecological "findings." That is the modifiable areal unit problem (Fotheringham & Wong, 1991).

Individual data (with zoning overlay)

Each tile = a person, shaded by their X value. Black borders show the chosen zones.

Zone-level (ecological) scatter

Each dot = one zone's mean X vs. mean Y. Red line = ecological regression.

True individual correlation (ρ), default +0.40

Spatial autocorrelation in X, default 0.50

Same 144 people. Now click through each zoning scheme and watch the ecological correlation jump around (sometimes flipping sign) while the individual correlation stays put. That's MAUP.
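The same idea can be sketched in R. The grid size matches the sandbox. The data-generating model (an east-west gradient in X, an unrelated north-south gradient in Y, Gaussian noise) and the four zoning schemes are illustrative assumptions, not the sandbox's own code. Only the aggregation changes between schemes; the 144 individual records never do.

```r
# Hypothetical MAUP sketch: 144 people on a 12 x 12 grid.
set.seed(1991)
n <- 12
people <- expand.grid(row = 1:n, col = 1:n)

# X rises from west to east (spatially autocorrelated); Y depends on X at the
# individual level but also falls from north to south for unrelated reasons.
people$x <- people$col / n + rnorm(n * n, sd = 0.3)
people$y <- 0.4 * people$x - 0.5 * people$row / n + rnorm(n * n, sd = 0.3)

# Four ways to draw zone boundaries around exactly the same people
zonings <- list(
  rows     = people$row,                                              # 12 strips
  columns  = people$col,                                              # 12 strips
  blocks   = paste((people$row - 1) %/% 3, (people$col - 1) %/% 3),   # 16 blocks
  diagonal = (people$row + people$col) %/% 4                          # 7 bands
)

# Zone-level (ecologic) correlation between zone means of X and Y
zone_cor <- function(zone) {
  zx <- as.vector(tapply(people$x, zone, mean))
  zy <- as.vector(tapply(people$y, zone, mean))
  cor(zx, zy)
}

cor(people$x, people$y)               # individual level: one fixed value
round(sapply(zonings, zone_cor), 2)   # zone level: changes with the zoning
```

Compare the printed zone-level correlations with the single individual-level value. Rerunning with other seeds or zonings is a quick way to see how unstable area-level correlations can be.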

R Build the ecological fallacy from scratch

What you'll do: simulate three groups whose means rise together but whose individuals within each group are uncorrelated. Compute the within-group correlations, the pooled (overall) correlation, and the group-level (ecologic) correlation, then visualise all three on one scatterplot.

What to take away: a strong group-level correlation can coexist with a near-zero individual-level association; that gap is the ecological fallacy in numbers.

set.seed(230)

# Three groups; means line up positively, individuals are flat within group.
group <- rep(c("A", "B", "C"), each = 50)
x <- c(rnorm(50, 2), rnorm(50, 5), rnorm(50, 8))
y <- c(rnorm(50, 3), rnorm(50, 6), rnorm(50, 9))

# Pooled (individual-level) correlation -- dominated by group means
cor(x, y)

# Within each group (truth at the person level)
tapply(seq_along(x), group, function(i) cor(x[i], y[i]))

# Group-level (ecologic) correlation: nearly perfect by construction
gmean <- aggregate(cbind(x, y), list(group = group), mean)
cor(gmean$x, gmean$y)

# Stretch: visualise the discrepancy
plot(x, y, col = factor(group), pch = 19,
     xlab = "X", ylab = "Y",
     main = "Ecological fallacy: groups separate, individuals flat")
points(gmean$x, gmean$y, pch = 8, cex = 3, col = "black")

Console output (key lines)

[1] 0.937        # pooled cor(x, y) -- looks very strong
     A      B      C
-0.069  0.014 -0.041   # within-group correlations -- near zero
[1] 0.99999      # group-mean (ecologic) correlation -- nearly perfect

Reading the three numbers. The within-group correlations are essentially zero. The group-mean correlation is essentially 1. The pooled correlation lies in between but is dominated by between-group variation. Concluding from the near-perfect ecologic correlation (0.99999) that there is an individual-level relationship would be the ecological fallacy.

R Reflect on what you just ran

Use the questions below to interpret the output you produced. Look at your console and plot before answering.

1. Compare the three within-group correlations (close to 0) with the group-level correlation (~1.0). Which of these two numbers describes the relationship for an individual person, and which describes the relationship between groups?

Model answer: The within-group correlations (close to 0) describe the relationship for individual people. Within each cluster, exposure and outcome do not move together, and the slope is flat. The group-level correlation of ~1.0 describes only the relationship between the group means. The three coloured clusters sit almost perfectly along a diagonal even though no individual contributes to that diagonal. Two units of analysis give two different answers, and that structural fact is what produces the ecological fallacy.

2. Look at the scatterplot. The three coloured clusters appear flat (no slope) within themselves but rise diagonally as a whole. Describe in your own words why the pooled correlation (~0.94) is so close to the ecologic correlation rather than to the within-group correlations.

Model answer: The pooled correlation mixes within-group (flat) and between-group (steep) variation. The between-group variance dominates the overall spread of the points, because the three clusters are far apart on both axes. As a result, the pooled regression line tracks the diagonal between cluster centres rather than the flat within-cluster lines. An overall correlation is dominated by whichever level of variation is largest. Here the spread of the group means is so much bigger than the within-cluster spread that the pooled r sits far closer to the ecologic r than to the within-group values.

3. Imagine a researcher only had access to the three group-mean stars (no individual dots), e.g., country-level averages of two variables. State the conclusion they would draw and, using the simulation, explain precisely how that conclusion would be wrong at the individual level.

Model answer: Looking only at the three group-mean stars, the researcher would conclude that more exposure means more outcome, a strong positive individual-level relationship. The simulation shows that conclusion is wrong for individuals: within any of the three groups, the exposure-outcome slope is zero. The mistake is the ecological fallacy: an inference about people made from aggregated data assumes that within-group structure matches between-group structure. Whenever the unit of analysis differs from the unit of inference, this risk is built in, and the only sure remedy is individual-level data.
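The variance argument in the second model answer can be checked directly on the same simulated data. The sketch below splits X and Y into group means and within-group deviations with `ave()`; the variable names such as `x_between` are our own. The total covariance is then exactly the sum of a between-group piece and a within-group piece, and with these data the between-group piece carries almost all of it.

```r
# Same simulated data as above
set.seed(230)
group <- rep(c("A", "B", "C"), each = 50)
x <- c(rnorm(50, 2), rnorm(50, 5), rnorm(50, 8))
y <- c(rnorm(50, 3), rnorm(50, 6), rnorm(50, 9))

# Split each variable into a between-group part (the group mean, repeated
# for every member) and a within-group part (the deviation from that mean)
x_between <- ave(x, group)
y_between <- ave(y, group)
x_within  <- x - x_between
y_within  <- y - y_between

# Share of the total variance in X that lies between groups
var(x_between) / var(x)

# The total covariance is exactly the sum of a between and a within piece
c(total   = cov(x, y),
  between = cov(x_between, y_between),
  within  = cov(x_within, y_within))
```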

Reflection

A researcher finds that countries with higher per-capita chocolate consumption have more Nobel Prize winners. They conclude that eating chocolate makes individuals smarter. Identify the inferential error being made and explain why this conclusion is problematic. What confounders might explain the group-level association?

Model answer: This is the textbook ecological fallacy: an inference about individuals ("eating chocolate makes you smarter") drawn from a correlation between group averages. The fallacy assumes that within-country variation mirrors between-country variation, which the data do not show. Plausible group-level confounders include:

GDP per capita and educational investment: rich countries can afford both more chocolate and more research universities.

Cultural factors: European countries with strong scientific traditions also have established confectionery industries.

Reporting completeness: chocolate-consumption statistics are more complete in OECD countries, where Nobel-winning research is also concentrated.

Any of these would generate the country-level correlation with no individual-level effect of chocolate on cognition.

Knowledge Check: this section

1. What is the ecologic fallacy?

Assuming that individual-level findings apply to groups
Assuming that group-level findings apply to individuals
A type of selection bias in ecologic studies

The ecologic fallacy occurs when associations observed at the group level are incorrectly assumed to hold at the individual level. Robinson (1950) famously demonstrated this error.

2. How does non-differential exposure misclassification at the individual level affect ecologic study estimates?

It biases the association toward the null, just as in individual-level studies
It biases the group-level association AWAY from the null
It has no effect on group-level estimates

Unlike individual-level studies where non-differential misclassification biases toward the null, in ecologic studies it biases the group-level IR and ID away from the null, the opposite direction.

3. In Example 29.6, effect modification by group caused the ecologic IR_G to be 0.67 when the true individual-level IR was 5.0. What does this demonstrate?

That ecologic studies always underestimate effects
That group-level bias can actually reverse the direction of an association
That individual-level data is always unreliable

This is a striking example of how ecologic bias from effect modification can distort the magnitude of an association and even completely reverse its direction, making a harmful exposure appear protective at the group level.

Section 4 of 5

Reducing Bias & Non-Ecologic Group Studies

⏱ Estimated reading time: 45 minutes

Analytic strategies for recovering individual-level signal, and the design that sidesteps the fallacy entirely.

Strategy 1

Multilevel modelling (Diez Roux, 1998)

Individuals are nested within groups. The model estimates effects at both levels simultaneously:

Individual predictors explain within-group variation.
Group predictors explain between-group variation.
Random effects capture residual group-level variance.

Diez Roux (1998) argued this framework distinguishes contextual effects (due to the group) from compositional effects (due to who is in it).
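The contextual/compositional split can be sketched with group-mean centring, which gives the fixed-effect part of a simple multilevel model. Everything below is an assumption made for illustration. The data are simulated with an individual effect of +0.5 and a contextual effect of −1.0, and the model is fitted with ordinary least squares in base R. A full multilevel model would add a random intercept per group, for example with the lme4 package's `lmer()` function. That mainly affects the standard errors.

```r
# Hypothetical data: 30 groups of 20 people. Assumed truth: the individual
# effect of X on Y is +0.5 and the contextual effect of the group's mean X is -1.0.
set.seed(1998)
n_groups <- 30
n_per    <- 20
gid      <- rep(seq_len(n_groups), each = n_per)
group_x  <- rnorm(n_groups, mean = 5)             # each group's typical exposure
x        <- group_x[gid] + rnorm(length(gid))     # individual exposure
u        <- rnorm(n_groups, sd = 0.5)             # unexplained group-level variation
y        <- 0.5 * x - 1.0 * group_x[gid] + u[gid] + rnorm(length(gid))

# Group-mean centring: split X into a between-group and a within-group part
x_mean <- ave(x, gid)      # observed group mean of X, repeated for each member
x_dev  <- x - x_mean       # each person's deviation from their group mean

fit <- lm(y ~ x_dev + x_mean)
b <- coef(fit)
c(within     = b[["x_dev"]],                  # individual-level effect
  between    = b[["x_mean"]],                 # slope on the group means
  contextual = b[["x_mean"]] - b[["x_dev"]])  # group effect beyond composition
```

The coefficient on `x_dev` recovers the individual-level effect. The coefficient on `x_mean` has the opposite sign; in this balanced design it equals the slope an ecologic regression of group means would report. The difference between the two estimates the contextual effect.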

Strategy 2

The Wakefield & Haneuse two-phase design (2008)

The idea

Outcome-dependent sampling within groups links individual-level data to group-level data, without requiring complete individual information across all groups.

The payoff

Individual-level signal is recovered at a fraction of the cost of a full cohort. Particularly useful when group-level data already exist from routine surveillance.

Non-ecologic group studies

When groups are the right unit

The etiology of a case: why did this person get sick? The etiology of incidence: why do populations have different disease rates? (Geoffrey Rose, 2001)

Data collected at the group level with inferences directed at the group level do not produce the ecological fallacy. The fallacy only appears in the cross-level move.

Vaccination coverage vs. measles incidence across districts: a genuine group-level question.
Community bike-share program vs. community-level cardiovascular mortality: group inference, no cross-level error.

Reading checklist

Quality of published ecologic research (Dufault & Klar, 2011)

18%: justified their choice of ecologic units

42%: adequately justified the ecologic design

54%: used fewer than 100 group observations

~0%: sufficiently discussed ecologic bias

Use these as a reading checklist: did the authors justify the unit, state the inference level, discuss bias sources, and report enough groups?

Wrapping up

What to take into the final section

Multilevel modelling (Diez Roux, 1998): estimates individual and group effects simultaneously, separates contextual from compositional effects.
Two-phase design (Wakefield & Haneuse, 2008): links individual and group data through outcome-dependent subsampling.
Group-level inference avoids the fallacy entirely when both data and conclusions stay at the group level.
Rose's distinction: etiology of a case (individual) versus etiology of incidence (population).

Introduction and Overview

An earlier section catalogued the failure modes. This section is the constructive response. The first half names analytic strategies that can recover part of the individual-level signal from group-level data. The second half makes a distinction that textbook treatments often gloss over: some studies use group-level data without committing the ecologic fallacy at all, because their inferences also stay at the group level.

Learning Objectives

Describe analytic strategies for reducing ecologic bias, including multilevel modelling (Diez Roux, 1998) and the Wakefield & Haneuse two-phase design (2008).
Distinguish ecologic from non-ecologic group-level studies based on the level at which inferences are drawn, not the level at which data are collected.
Apply Rose’s distinction between “the etiology of a case” and “the etiology of incidence” to research questions you encounter.
Use the Dufault & Klar (2011) reporting-quality findings as a reading checklist for any ecologic study you appraise.

14.5 Minimizing Ecologic Bias

Ecologic bias is less of a problem when the conditions listed in Section 14.4.4 are met. Researchers can also use specific analytic strategies:

14.5.1 Analysing Ecologic Data

Multilevel modelling (MLM): combines individual-level and group-level data to distinguish individual-level effects from contextual (group-level) effects (Diez Roux, 1998). It also lets analysts check model assumptions and estimate group-level random effects.
Two-phase design (Wakefield & Haneuse, 2008): links individual-level data with ecologic data using outcome-dependent sampling within groups, reducing the need for complete individual-level information.
Prior information: credible inferences depend on prior knowledge about within-area probabilities and contextual effects.

The strategies above try to repair ecologic data so it can speak about individuals. The next subsection makes the alternative move: keep the inference at the group level and acknowledge that groups are sometimes the right unit of scientific interest in their own right.

14.6 Non-Ecologic Group-Level Studies

Not all studies using group-level data are ecologic studies. A critical distinction:

The Key Difference

When variables are measured at the group level AND inferences remain at the group level → NOT ecologic. Here the group itself is the scale of interest: the study asks how group-level characteristics (population density, policies, social environments) affect group-level outcomes.

Examples of non-ecologic group-level studies include:

Health promotion programs targeting communities, with outcomes measured at the community level
Vaccination campaigns evaluated by population-level coverage and population-level incidence
Organizational interventions in clinics or hospitals, with organization-level outcomes

14.6.1 The Question of Inference Level

Rose (2001) distinguished two key epidemiological questions:

"What is the etiology of a case?" This is an individual-level question, seeking to understand why a particular person became ill.
"What is the etiology of incidence?" This is a population-level question, seeking to understand why populations have different disease rates.

Both questions are important; the appropriate level of analysis depends on the research question. The atomistic fallacy arises when researchers reduce all phenomena to individual-level explanations, ignoring emergent group properties.

14.6.2 Quality of Current Ecologic Research

Dufault & Klar (2011) reviewed the reporting quality of ecologic studies and found concerning patterns:

Only 18% explicitly justified their choice of ecologic units
97% of outcomes were aggregate in nature
54% relied on fewer than 100 group-level observations
Only 42% adequately justified why an ecologic design was necessary
Most studies did not sufficiently inform readers about possible ecologic bias

Reflection

Consider a city that wants to evaluate whether its new bicycle-sharing program has reduced cardiovascular disease rates. Would an ecologic design or individual-level design be more appropriate? What are the trade-offs? How might you combine both approaches using multilevel modelling?

Model answer: For evaluating a city-wide bike-share programme, an ecologic design is the natural fit: an interrupted time series, or a comparative interrupted time series with control cities. The intervention exists only at the city level; you cannot randomise individuals to live in a city with bike-share.

Strengths: low cost, use of routinely collected CVD outcome data, and direct capture of the population-level effect of interest.

Weakness: the ecological fallacy, since you cannot tell whether riders or non-riders changed.

Combining: a multilevel design with individuals nested in neighbourhoods nested in cities lets you estimate (a) the city-level effect of bike-share access on CVD rates, (b) the individual-level effect of personal bike-share use on the same outcome, and (c) the interaction (does the city-level effect depend on individual use?). This is the design that bridges the two scales of inference.

Knowledge Check: this section

1. Which of the following conditions makes ecologic bias LESS likely?

Large within-group variance of exposure
Small observed range of exposure across groups
Distribution of extraneous risk factors is similar among groups

When the distribution of extraneous risk factors is similar across groups, there is little group-level confounding, which is one of the major sources of ecologic bias.

2. What is multilevel modelling (MLM) in the context of ecologic studies?

A method that only uses group-level data
An approach that combines individual- and group-level data to reduce cross-level inferential errors
A technique for increasing sample size in ecologic studies

MLM integrates data from multiple levels (individual and group), helping to distinguish individual-level effects from group-level (contextual) effects and reducing the risk of ecologic fallacy.

3. When is a group-level study NOT considered an ecologic study?

When the study uses aggregate variables
When variables are measured at the group level and inferences remain at the group level
When the sample size is very large

Ecologic studies involve making inferences about individuals from group-level data. If the variables are measured at the group level and the inferences are also directed at the group level, this is a non-ecologic group-level study and does not pose the same cross-level inferential problems.

Section 5 of 5

Final Assessment

⏱ Estimated reading time: 20 minutes

Bringing It All Together

This lesson moved from the rationale for ecologic designs through the variables and models they use, to the structural traps that make them treacherous, and finally to the analytic and conceptual responses to those traps. The arc was deliberate: every step was preparation for being able to read a published group-level study without overclaiming or underclaiming what its data actually support.

The single most important idea to carry forward is the one Robinson demonstrated in 1950: a finding that holds at the group level need not hold at the individual level. The reverse error (the atomistic fallacy) is equally damaging. The three structural sources of ecologic bias (within-group misclassification, group-level confounding, effect modification by group) explain why the cross-level move can fail. Multilevel modelling and the two-phase design show how to recover some of that signal without abandoning the group-level data we already have. Rose’s distinction between the etiology of a case and the etiology of incidence reminds us that the right unit of analysis depends on the question being asked, not on which is more familiar from earlier lessons.

A later lesson takes the next step: how the constructs we measure (exposure, outcome, confounder) are defined and operationalised in the first place, the conceptualisation step that determines whether any of the cross-level inference machinery in this lesson can do the work we want it to.

R Activity, Seeing the ecological fallacy in simulated data

The companion R script r-activities/HSCI_230_Lesson_6_Ecological_and_Group_Level_Studies.R simulates three groups whose group means line up almost perfectly while individuals within each group are essentially uncorrelated. You will compute the pooled correlation, the within-group correlations, and the group-mean (ecologic) correlation, and watch the three numbers diverge. This is a worked demonstration of the cross-level inference trap that defines this lesson.

set.seed(230)

# Three groups; means line up positively, individuals are flat within group.
group <- rep(c("A", "B", "C"), each = 50)
x <- c(rnorm(50, 2), rnorm(50, 5), rnorm(50, 8))
y <- c(rnorm(50, 3), rnorm(50, 6), rnorm(50, 9))

# Individual-level correlation (overall - mostly driven by group means)
cor(x, y)

# Individual-level correlation WITHIN each group (truth at the person level)
tapply(seq_along(x), group, function(i) cor(x[i], y[i]))

# Group-level (ecologic) correlation: nearly perfect by construction
gmean <- aggregate(cbind(x, y), list(group = group), mean)
cor(gmean$x, gmean$y)

Key Takeaways from this lesson

Ecologic studies measure exposure and outcome at the group level; partial ecologic studies mix individual- and group-level variables, which introduces its own inferential challenges.
The ecologic fallacy (Robinson, 1950) is the error of assuming a group-level association applies to individuals; the atomistic fallacy (Schwartz, 1994) is the symmetric error in the other direction.
Three structural sources of ecologic bias, within-group exposure misclassification, group-level confounding, and effect modification by group, explain why the cross-level move so often fails.
The ecologic linear model estimates a group-level rate ratio that requires extrapolation to 0% and 100% exposure, well beyond most observed data.
Multilevel modelling and the Wakefield–Haneuse two-phase design are the principal analytic strategies for recovering individual-level signal from group-level data.
Not every group-level study is ecologic: when both data and inferences stay at the group level, the cross-level fallacies do not apply, and Rose’s “etiology of incidence” is a legitimate and important question in its own right.

Reflection

Reflecting on this lesson, describe a scenario from public health or your field of interest where an ecologic study design would be the most practical and informative approach. What safeguards would you implement to minimize the risk of the ecologic fallacy?

Model answer: A strong scenario is evaluating the public-health impact of a province-wide sugar-sweetened-beverage tax. The intervention is purely group-level (the policy applies uniformly within the province), and individual purchase data are not centrally available. Outcomes (obesity prevalence, diabetes incidence) are tracked through administrative data. An ecologic interrupted time-series design is therefore the most informative and feasible option. Safeguards against the ecologic fallacy:

(a) pre-specify that conclusions concern the population-level effect, not individual choice;

(b) use control jurisdictions (other provinces without the tax) for a comparative interrupted time series;

(c) supplement with cross-sectional individual-level surveys (CHMS, CCHS) where possible;

(d) include sub-population analyses by SES;

(e) discuss the inferential limits explicitly in the abstract.

Final Knowledge Assessment

Congratulations!

You have successfully completed this lesson: Ecological and Group-Level Studies. You now understand the principles, strengths, and limitations of ecologic studies, including the three types of variables, sources of bias, the ecologic and atomistic fallacies, and strategies for reducing inferential errors through integration of individual and group-level data.

Earlier lessons covered the four major observational designs. A later lesson (Conceptualization, Measurement, and Causal Specification) steps back from study design to the prior question every design depends on: how do we move from a vague research question to a measurable variable, and how do we specify the causal model the analysis is supposed to test? It is the bridge into the bias material that occupies later lessons.

HSCI 230, Lesson 6
