# Lesson 6 — Ordinal & Multinomial Models (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5,421 words • ~29 min audio*

---

**Sarah:** Welcome back to Office Hours. I'm Sarah.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson six, Ordinal and Multinomial Models. And honestly, this is one of those lessons where the conceptual move is small but the practical payoff is huge.

**Sarah:** What do you mean by that, small move but big payoff?

**Kiffer:** Well, up to this point we've been living inside binary logistic regression. Disease, no disease. Event, no event. Two outcomes. And that framework is incredibly powerful, but it leaves a lot of real public health outcomes on the table. Because in the wild, outcomes very often have more than two categories. Sometimes those categories are ordered, sometimes they aren't. The conceptual move in this lesson is just acknowledging that, and the payoff is a whole new family of models that handle multi-category outcomes properly.

**Sarah:** Right. And the way the lesson splits the world is exactly along that line. If the categories are ordered, you reach for an ordinal model. The workhorse there is the proportional odds model. If the categories aren't ordered, you reach for a multinomial model. Multinomial logistic regression, sometimes called polytomous logistic regression.

**Kiffer:** And we're going to spend the bulk of this episode on those two model families. The lesson actually mentions two more specialized cousins, the adjacent-category model and the continuation-ratio model, and we'll touch on those briefly at the end. But the two big workhorses you'll see in the literature, ninety percent of the time, are proportional odds and multinomial. So that's where we'll focus.

**Sarah:** Before we dive in, I want to set up a piece of vocabulary that's going to be everywhere in this episode. Categorical outcomes.

**Kiffer:** Yeah, define that for someone who's brand new.

**Sarah:** A categorical outcome is just an outcome that takes one of a fixed set of discrete values. Categories. Not numbers on a continuous scale. So height in centimetres is continuous. But pain severity coded as none, mild, moderate, severe is categorical. Type of insurance, like public, private, or uninsured, is categorical. Voting choice in an election with five candidates is categorical.

**Kiffer:** And within categorical outcomes, the lesson draws a line between two flavors. Nominal and ordinal.

**Sarah:** Nominal categories have no natural ordering. Type of cancer. Lung, breast, colon, prostate. There's no sense in which lung is greater than breast, or colon is between the two. They're just different categories.

**Kiffer:** And ordinal categories do have a natural order, but the spacing between them isn't necessarily numeric. None, mild, moderate, severe pain. Those are clearly ranked. Severe is more than mild. But you can't say severe is exactly two point three times mild. The order is real but the distances between categories are vague.

**Sarah:** And that distinction is going to drive everything that follows. If your outcome is ordinal, exploit the ordering. You'll get a more parsimonious model. If your outcome is nominal, don't try to force an ordering that isn't there. Use the model that respects the absence of order.

**Kiffer:** There's also a tempting middle path that I want to flag and warn against. Some people, when faced with an ordinal outcome, just treat it as numeric. They code none as zero, mild as one, moderate as two, severe as three, and run an ordinary linear regression. And that approach can produce reasonable-looking output, but it makes a really strong assumption that's almost always wrong.

**Sarah:** What's the assumption that's almost always wrong?

**Kiffer:** That the spacing between categories is equal. That the difference between none and mild is exactly the same magnitude as the difference between moderate and severe. And in almost every real outcome, that's not true. The jump from no pain to mild pain is not equivalent to the jump from moderate pain to severe pain. One is the threshold of perception, the other is closer to incapacitating. Treating them as equal numeric distances throws away information and can mislead your inferences.

**Sarah:** So an ordinal model gets you the ordering without committing to a specific numeric spacing. That's the right tool for the job.

**Kiffer:** Okay. Section one of the episode. Ordinal models. Let's set the scene with examples first, because I find this is the easiest way to recognize when you'd reach for an ordinal model.

**Sarah:** Yeah, and the lesson gives several. Disease severity coded as none, mild, moderate, severe. That's classic ordinal. A Likert agreement scale where someone responds strongly disagree, disagree, neutral, agree, strongly agree. Self-rated health, where respondents report poor, fair, good, very good, excellent.

**Kiffer:** And these come up everywhere in epidemiology. Self-rated health is one of the most studied outcome measures in all of population health research. There's decades of evidence that a single ordinal item, where someone rates their own health on a five-point scale, predicts mortality and morbidity better than many objective measures. So this isn't a niche tool. This is core machinery for a huge class of public health questions.

**Sarah:** So the question becomes, given an ordered categorical outcome with say four or five levels, how do you model it? And the dominant answer in epidemiology is the proportional odds model. Sometimes called the cumulative logit model. Sometimes called ordinal logistic regression. Three names for essentially the same beast.

**Kiffer:** Let's build it from the ground up. Imagine we have an outcome with K categories, where K is some whole number bigger than two. Let's say K equals four. Pain severity. None, mild, moderate, severe.

**Sarah:** And the trick the proportional odds model plays is to convert a four-category problem into a series of binary problems. K minus one of them. So if K equals four, we have three binary problems.

**Kiffer:** And the way you set up those binary problems is the key. Each one asks the question, is the outcome at or below this category, versus above this category? So with our pain example, the first binary problem is, is the patient at none, versus mild or higher? The second is, is the patient at none or mild, versus moderate or higher? And the third is, is the patient at none, mild, or moderate, versus severe? Three cumulative cutpoints.

**Sarah:** And those cumulative probabilities are the foundation. We're not modelling the probability of being in any single category directly. We're modelling the cumulative probability of being at or below each cutpoint.

**Kiffer:** Right. And then on top of that, we apply a logit transformation, just like in binary logistic regression. The logit of the cumulative probability gives us a linear scale we can model with predictors. So at each of those three cutpoints, we have a logistic regression equation.
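
For readers following along in the transcript, here is a minimal sketch of the cumulative set-up just described. The counts are hypothetical and chosen only for illustration. It computes the cumulative probability at each of the three cutpoints and its logit, with no predictors yet.

```python
import math

# Hypothetical counts for the four pain categories (illustrative only).
counts = {"none": 40, "mild": 30, "moderate": 20, "severe": 10}

def cumulative_logits(counts):
    """Return (cutpoint label, P(Y <= cutpoint), logit) for the K-1 cutpoints."""
    labels = list(counts)
    total = sum(counts.values())
    rows, running = [], 0
    # The last category is skipped: P(Y <= K) is 1, so it has no finite logit.
    for label in labels[:-1]:
        running += counts[label]
        p = running / total
        rows.append((label, p, math.log(p / (1 - p))))
    return rows

for label, p, logit in cumulative_logits(counts):
    print(f"at or below {label!r}: P = {p:.2f}, logit = {logit:+.3f}")
```

In a fitted proportional odds model these three logits play the role of the cutpoint-specific intercepts when all predictors are at zero.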

**Sarah:** There's also a nice intuition the lesson offers, which is to think about the proportional odds model as having a hidden continuous variable underneath. A latent variable. Imagine that pain severity is really a continuous quantity, but we only observe a coarsened version of it through the four categories. The cutpoints divide the latent continuum into the four observed categories.

**Kiffer:** And that latent variable framing is really useful pedagogically because it explains why the slopes are shared across cutpoints. The predictors are acting on the latent variable. The cutpoints just slice the latent variable into observable bins. The same slope on the latent variable produces the same shift in the cumulative log-odds at every cutpoint, because the cutpoints are fixed positions.

**Sarah:** Okay. And here's where the magic happens. The proportional odds assumption.

**Kiffer:** Yeah, this is the key constraint. The proportional odds assumption says that the predictor coefficients are the same at every cutpoint. The slope on age is the same whether you're modelling the cutpoint between none and mild-or-higher, between mild-or-lower and moderate-or-higher, or between moderate-or-lower and severe. Only the intercept changes from cutpoint to cutpoint.

**Sarah:** Geometrically you can think of it as a set of parallel lines on the logit scale. K minus one parallel lines. Same slopes. Different intercepts.
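
In symbols, one common way to write the parallel-lines structure is sketched below. The notation ($\alpha_j$ for the cutpoint intercepts, $\beta$ for the shared slopes) and the "at or below" convention with a minus sign are assumptions made here for illustration. Textbooks and software differ on the sign.

$$
\operatorname{logit} P(Y \le j \mid x) = \alpha_j - \beta^{\top} x, \qquad j = 1, \dots, K-1
$$

Under this convention $e^{\beta}$ is the odds ratio for being in a higher category per one-unit increase in the predictor, and it does not depend on $j$.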

**Kiffer:** And what that buys you is enormous parsimony. Instead of K minus one separate sets of coefficients, one for each cutpoint, you have a single set of coefficients that applies across all of them. One odds ratio per predictor, period.

**Sarah:** And the interpretation of that single odds ratio is really clean. Take a predictor like age, with an odds ratio of one point five. That means a one-unit increase in age multiplies the odds of being in a higher pain category, versus a lower one, by one point five. And that one point five applies at every cutpoint. From none versus everything-else, all the way up to severe versus everything-else.

**Kiffer:** And contrast that with what you'd have to report if you didn't have the proportional odds assumption. You'd have three different odds ratios per predictor, one at each cutpoint. For ten predictors, that's thirty odds ratios to report and interpret, and you can see why parsimony matters. The proportional odds assumption gives you a clean, interpretable summary.

**Sarah:** But of course, you only get to claim that clean interpretation if the assumption actually holds. So how do you check it?

**Kiffer:** Yeah, this is where things get practical. The two main tests in the literature are the Brant test and the score test.

**Sarah:** Walk us through the Brant test first, because it's the most commonly used.

**Kiffer:** The Brant test, developed by Rollin Brant in 1990, fits the proportional odds model and then does a Wald-type comparison. It asks, if we let each predictor have its own coefficient at each cutpoint, would those coefficients differ significantly from a single shared coefficient? It gives you both an overall test for the model as a whole, and individual tests for each predictor, so you can see exactly which variables are violating the assumption.

**Sarah:** And the score test is a related approach. It looks at whether the score function, which is essentially the slope of the log-likelihood, is consistent with the proportional odds restriction. Different test statistic, similar idea.

**Kiffer:** And the practical workflow is, fit the proportional odds model, run a Brant test, look at the overall p-value and the per-predictor p-values. If everything is consistent with proportional odds, great, you report the model as is. If one or two predictors violate the assumption, that's where partial proportional odds comes in.

**Sarah:** One caveat about the Brant test that's worth mentioning. With very large sample sizes, the test will reject the proportional odds assumption even when the deviations are tiny and substantively unimportant. Statistical significance and practical significance diverge. So a smart analyst doesn't just look at the p-value. They also look at the magnitude of the deviation. Plot the per-cutpoint coefficients and see whether they're really different in a way that matters, or just statistically distinguishable in a huge sample.

**Kiffer:** Yeah, that's a great point. The test is a tool, not an oracle. Use it alongside judgment about whether the violations matter substantively.
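
One informal way to look at the magnitude of a deviation, for a single binary predictor, is to dichotomize the outcome at each cutpoint and compare the crude odds ratios. The sketch below is not the Brant test, and the counts are hypothetical. It only illustrates the idea of eyeballing per-cutpoint effects.

```python
# Hypothetical 2 x 4 table of counts (illustrative only, not the textbook data).
# Each list runs over the ordered outcome categories from lowest to highest.
unexposed = [30, 30, 25, 15]
exposed = [15, 25, 30, 30]

def cutpoint_odds_ratios(exposed, unexposed):
    """Crude odds ratio of being ABOVE each cutpoint, exposed vs unexposed."""
    ratios = []
    for cut in range(1, len(exposed)):
        odds_exposed = sum(exposed[cut:]) / sum(exposed[:cut])
        odds_unexposed = sum(unexposed[cut:]) / sum(unexposed[:cut])
        ratios.append(odds_exposed / odds_unexposed)
    return ratios

for cut, ratio in enumerate(cutpoint_odds_ratios(exposed, unexposed), start=1):
    print(f"cutpoint {cut}: OR = {ratio:.2f}")
```

If the three odds ratios are close to each other, a single shared odds ratio is a reasonable summary. If they drift apart in a way that matters substantively, that predictor is a candidate for relaxing the constraint.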

**Sarah:** Let's define partial proportional odds carefully. This is one of the most useful pragmatic tools in the ordinal modeling toolkit.

**Kiffer:** A partial proportional odds model relaxes the proportional odds assumption for selected predictors only. The ones that fail the Brant test. While maintaining the proportional odds constraint for all the rest.

**Sarah:** So you get the parsimony of proportional odds for the predictors that play nicely. And you get the flexibility of separate coefficients at each cutpoint for the predictors that don't. It's a compromise, but it's often the right compromise.

**Kiffer:** And the lesson points out something really important. The proportional odds assumption is often violated in practice. Especially when you have many predictors, or when the outcome categories represent very different phenomena. Like a self-rated health scale where the difference between poor and fair might really be about chronic illness, while the difference between very good and excellent might really be about subjective wellbeing. Those are different mechanisms, and they may not respond to the same predictors in the same way.

**Sarah:** And if the assumption fails badly across many predictors, the alternatives keep going. You can fit a fully multinomial model, which we'll get to in a minute, and just ignore the ordering. You give up the parsimony entirely, but you get an honest model that doesn't impose constraints the data don't support.

**Kiffer:** Or you can use what's called a generalized ordinal logistic regression model, which essentially fits K minus one binary logistic regressions simultaneously, with separate coefficients at each cutpoint. That's basically the unconstrained version of proportional odds.

**Sarah:** There are a couple of other specialty options the lesson mentions. The stereotype logistic model. The heterogeneous choice logistic model. These are more advanced and less common, but it's worth knowing they exist. The point is, if proportional odds fails, you have options. You're not stuck.

**Kiffer:** Quick aside on naming, because students get confused. Some textbooks define the proportional odds model in terms of being at or above each cutpoint. Others define it in terms of being at or below. Mathematically they're equivalent. Just signs flipping. But it can make a difference for whether your odds ratio is a number above or below one. So when you read a paper, check the convention before you interpret the magnitude.

**Sarah:** Yeah that's a good catch. Practical detail that can confuse you when you're starting out.

**Kiffer:** Let me ground all of this in a concrete example, because I think the proportional odds idea really clicks once you see numbers.

**Sarah:** Yeah, let's see the numbers, that always helps it click.

**Kiffer:** The lesson uses the example of Apgar scores. An Apgar score is a quick assessment of newborn health, measured at one and five minutes after birth, scoring zero to ten on five domains. Higher is better. The textbook collapses the score into four ordinal categories. One through six, which is low. Seven. Eight. And nine through ten, which is high. So we have an ordered four-category outcome.

**Sarah:** And the question is whether the number of prenatal visits is associated with the Apgar score category. Specifically, comparing mothers with six or more prenatal visits to those with fewer than six.

**Kiffer:** Run the proportional odds model and you get a single odds ratio for prenatal visits of one point five nine. The interpretation is, mothers with six or more prenatal visits have one point five nine times the odds of being in a higher Apgar category, compared to mothers with fewer prenatal visits. And critically, that one point five nine applies at every cutpoint. The cutpoint between low and seven-or-higher, the cutpoint between seven-or-lower and eight-or-higher, and the cutpoint between eight-or-lower and high.

**Sarah:** One number. Three cutpoints. That's the whole appeal of the proportional odds model.
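
To see what "one number, three cutpoints" does to the category probabilities, here is a minimal sketch. Only the odds ratio of 1.59 comes from the episode. The baseline distribution for the fewer-visits group is hypothetical.

```python
# Baseline category probabilities for the comparison group are HYPOTHETICAL;
# only the odds ratio of 1.59 comes from the episode.
categories = ["1-6", "7", "8", "9-10"]
baseline = [0.10, 0.20, 0.40, 0.30]
ODDS_RATIO = 1.59

def shift_by_proportional_odds(probs, odds_ratio):
    """Multiply the odds of being above every cutpoint by the same odds ratio."""
    shifted_above = []
    for cut in range(1, len(probs)):
        p_above = sum(probs[cut:])
        odds = odds_ratio * p_above / (1 - p_above)
        shifted_above.append(odds / (1 + odds))
    # Convert P(Y > cutpoint) back into per-category probabilities.
    bounds = [1.0] + shifted_above + [0.0]
    return [bounds[i] - bounds[i + 1] for i in range(len(probs))]

shifted = shift_by_proportional_odds(baseline, ODDS_RATIO)
for name, before, after in zip(categories, baseline, shifted):
    print(f"Apgar {name}: {before:.3f} -> {after:.3f}")
```

Probability mass moves from the lower categories toward the higher ones, and the odds ratio recovered at each of the three cutpoints is the same 1.59.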

**Kiffer:** Okay. Section two of the episode. Multinomial models. Now we're switching from outcomes that are ordered to outcomes that aren't.

**Sarah:** And the examples here look really different. Insurance type. Public, private, employer-based, uninsured. Cancer type. Lung, breast, colon, prostate. Voting choice. Liberal, Conservative, NDP, Green, Bloc. Mode of transportation to work. Drive, transit, bike, walk. None of those have a natural order.

**Kiffer:** And in epidemiology and health services research, you see multinomial outcomes all the time. Choice of treatment among several alternatives. Healthcare facility used. Reason for visit. The methodology was actually developed largely by economists and political scientists, and there's a beautiful Nobel-prize-winning history there with Daniel McFadden, who shared the 2000 prize partly for this work.

**Sarah:** So how does multinomial logistic regression work? Let's build it up the same way we built up proportional odds.

**Kiffer:** Take K categories. Pick one of them as the reference category. Then fit K minus one logistic regressions simultaneously, each one comparing one of the non-reference categories to the reference.

**Sarah:** So if you have four insurance categories and you pick uninsured as the reference, you fit three logistic regressions. Public versus uninsured. Private versus uninsured. Employer-based versus uninsured. Three separate sets of coefficients. K minus one sets total.

**Kiffer:** And critically, all three regressions are estimated jointly. That matters because the categories aren't independent. If a person is in private insurance, by definition they're not in any other category. So the model accounts for the correlation among the comparisons in the joint estimation.

**Sarah:** Now contrast this with the proportional odds model. In proportional odds, we have one set of coefficients that applies across K minus one cutpoints. In multinomial, we have K minus one separate sets of coefficients, one per non-reference category.

**Kiffer:** And that difference is the cost of generality. Multinomial doesn't impose any constraints across categories, so it works for ordered or unordered outcomes. But you pay for that flexibility in parameters. With ten predictors and four categories, you've got three coefficients per predictor, so ten times three, that's thirty slope coefficients across the model. Lots of numbers to interpret.
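
A minimal sketch of how K minus one linear predictors plus a reference category turn into K probabilities that sum to one. The coefficients and the single predictor are hypothetical.

```python
import math

# Hypothetical coefficients (intercept, slope on age in decades) for each
# non-reference insurance category; "uninsured" is the reference.
coefs = {
    "public": (-0.5, 0.30),
    "private": (0.2, 0.10),
    "employer": (0.8, -0.05),
}

def multinomial_probs(coefs, x, reference="uninsured"):
    """Category probabilities from K-1 linear predictors plus a reference."""
    scores = {reference: 0.0}  # the reference linear predictor is fixed at zero
    for category, (intercept, slope) in coefs.items():
        scores[category] = intercept + slope * x
    denom = sum(math.exp(s) for s in scores.values())
    return {category: math.exp(s) / denom for category, s in scores.items()}

probs = multinomial_probs(coefs, x=4.0)
for category, p in probs.items():
    print(f"{category:>9}: {p:.3f}")
```

Each non-reference category carries its own intercept and slope, which is where the K minus one coefficients per predictor come from.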

**Sarah:** Speaking of interpretation, this is actually one of the trickiest parts of multinomial models. The exponentiated coefficients in a multinomial model are technically called relative risk ratios. Or R R R. Not strictly odds ratios.

**Kiffer:** Define that distinction carefully because students stumble here.

**Sarah:** In a binary logistic regression, the exponentiated coefficient is an odds ratio. The ratio of the odds of the outcome under one condition versus another. Clean. In a multinomial logistic regression, the exponentiated coefficient is the factor by which the ratio of two probabilities, the probability of being in category j over the probability of being in the reference category, is multiplied for a one-unit change in the predictor. Each of those probability ratios is a relative risk of category j versus the reference, so the ratio of two of them is a relative risk ratio.

**Kiffer:** But honestly, in a lot of applied work, people just call them odds ratios anyway. Stata calls them relative risk ratios. R will sometimes call them odds ratios, sometimes log-odds. The numbers themselves are computed the same way. Just be aware of the strict terminology when you're reading a careful methodology paper.
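
A small numerical check of that definition, using hypothetical coefficients: the ratio of P(category j) to P(reference) changes by exactly exp(slope) for a one-unit change in the predictor, whatever the other categories are doing.

```python
import math

# Hypothetical coefficients for category j; the reference has both fixed at 0.
INTERCEPT_J, SLOPE_J = -0.5, 0.30
OTHER = (0.2, 0.10)  # a third category, included to show it does not matter

def probs(x):
    """Return [P(reference), P(j), P(other)] at predictor value x."""
    scores = [0.0, INTERCEPT_J + SLOPE_J * x, OTHER[0] + OTHER[1] * x]
    denom = sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores]

def ratio_j_to_reference(x):
    p = probs(x)
    return p[1] / p[0]

rrr_from_probs = ratio_j_to_reference(3.0) / ratio_j_to_reference(2.0)
print(rrr_from_probs, math.exp(SLOPE_J))
```

The two printed values agree, which is why the exponentiated coefficient can be read directly as the relative risk ratio.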

**Sarah:** Now the choice of reference category in a multinomial model matters a lot for ease of interpretation. You're free to choose any category as the reference. But some choices make the resulting comparisons way easier to communicate.

**Kiffer:** Yeah. The general rule is, pick the reference category that makes the comparisons most natural. If you're studying treatment choice and one option is no treatment, no treatment is a great reference, because each comparison reads as the effect on choosing some active treatment versus not. If you're studying voting choice in Canada, the largest party in your sample makes a natural reference, because each other party is a comparison against the dominant alternative.

**Sarah:** Another consideration is statistical. You generally want the reference category to be reasonably common in your sample. If you pick a rare category as the reference, the comparisons against it will have wide confidence intervals because there's not much data anchoring the reference. So pragmatic advice is, choose a reference that's both substantively meaningful and well-populated in your data.

**Kiffer:** Right. And practically, you can always re-estimate the model with a different reference if you want a different set of comparisons. The fit of the model doesn't change. Just the way the coefficients are labelled.
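
A sketch of why the fit doesn't change: switching reference just subtracts the new reference's coefficient from every other one. The slopes below are hypothetical and intercepts are left out for brevity.

```python
import math

# Hypothetical slopes-only model (intercepts omitted for brevity).
# Reference = "none"; keys are the non-reference treatment options.
betas = {"drug_a": 0.40, "drug_b": -0.25, "surgery": 0.90}

def rebase(betas, old_reference, new_reference):
    """Re-express coefficients against a different reference category."""
    full = {old_reference: 0.0, **betas}
    shift = full[new_reference]
    return {k: v - shift for k, v in full.items() if k != new_reference}

def probs(betas, reference, x):
    """Category probabilities with the reference score fixed at zero."""
    scores = {reference: 0.0, **{k: b * x for k, b in betas.items()}}
    denom = sum(math.exp(s) for s in scores.values())
    return {k: math.exp(s) / denom for k, s in scores.items()}

rebased = rebase(betas, "none", "drug_a")
p_old = probs(betas, "none", x=1.5)
p_new = probs(rebased, "drug_a", x=1.5)
print(rebased)
print(all(abs(p_old[k] - p_new[k]) < 1e-12 for k in p_old))
```

The coefficients are relabelled but the predicted probabilities are identical, so the likelihood is too.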

**Sarah:** Okay. Goodness of fit. How do we tell whether a multinomial model is any good?

**Kiffer:** There are a few tools, and they each tell you something different. The first is the likelihood ratio test for nested models.

**Sarah:** Let me define that. A likelihood ratio test compares two models, where one is nested inside the other, meaning the simpler model is a special case of the more complex one. You take the log-likelihood of the simpler model minus the log-likelihood of the more complex model and multiply by negative two. Under the null hypothesis that the simpler model is adequate, that statistic approximately follows a chi-squared distribution with degrees of freedom equal to the difference in number of parameters between the two models.

**Kiffer:** And in the multinomial context, you'd typically use likelihood ratio tests to test whether a predictor as a whole is significant. Because each predictor in a multinomial model has K minus one coefficients, the question, is age significant, isn't a single test. You're testing K minus one coefficients simultaneously. The likelihood ratio test handles that joint hypothesis cleanly.
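
A minimal sketch of that joint test, using only the Python standard library. The log-likelihoods are hypothetical, and `chi2_sf` is a small helper written here for integer degrees of freedom. In practice a statistics package would supply the chi-squared tail.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-squared probability for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step up two df at a time.
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1
    else:
        tail, k = math.exp(-half), 2
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return the LR statistic and its chi-squared p-value."""
    stat = -2.0 * (loglik_reduced - loglik_full)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: dropping one predictor from a four-category
# multinomial model removes K - 1 = 3 coefficients, so df = 3.
stat, p_value = likelihood_ratio_test(loglik_reduced=-512.4, loglik_full=-505.1, df=3)
print(f"LR statistic = {stat:.2f}, p = {p_value:.4f}")
```

The degrees of freedom equal the number of coefficients removed, which is K minus one for each predictor dropped.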

**Sarah:** The second tool is McFadden's pseudo R-squared. Quick definition. Real R-squared, the kind you see in linear regression, is the proportion of variance in the outcome explained by the model. McFadden's pseudo R-squared is an analogue for likelihood-based models. It's based on the log-likelihood of your model compared to the log-likelihood of an intercept-only null model.

**Kiffer:** And the values aren't directly interpretable as proportion of variance the way real R-squared is. McFadden himself suggested that values between zero point two and zero point four indicate excellent fit. So don't be alarmed if your pseudo R-squared is zero point one five. That can still be a reasonable model, depending on what you're studying.
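
Written out, with $\ln \hat{L}_{\text{model}}$ and $\ln \hat{L}_{\text{null}}$ as notation assumed here for the maximized log-likelihoods of the fitted model and the intercept-only model:

$$
R^2_{\text{McFadden}} = 1 - \frac{\ln \hat{L}_{\text{model}}}{\ln \hat{L}_{\text{null}}}
$$

Both log-likelihoods are negative, so the value is zero when the predictors add nothing and moves toward one as the fitted model's log-likelihood approaches zero.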

**Sarah:** And then there's a really important negative point. The Hosmer-Lemeshow test, which you may remember from Lesson five on binary logistic regression, doesn't directly apply to multinomial models.

**Kiffer:** Why not, what breaks when you try to apply it?

**Sarah:** The Hosmer-Lemeshow test divides the data into deciles of predicted probability and compares observed versus expected counts in a chi-squared style test. That works cleanly when there's a single predicted probability per observation, like in binary logistic regression. But in a multinomial model, each observation has K predicted probabilities, one for each category, and they have to sum to one. So you'd need to extend the binning logic to a multivariate setting. Some extensions exist in the literature, but they're not as standardized or widely implemented as the binary version.

**Kiffer:** So in practice, for multinomial models, people lean on likelihood ratio tests, McFadden's pseudo R-squared, and graphical comparisons of predicted versus observed proportions across covariate patterns. There's no single dominant goodness-of-fit test.

**Sarah:** Now let's talk about the most famous assumption in multinomial logistic regression. Independence of Irrelevant Alternatives, often abbreviated I I A.

**Kiffer:** And this name is a mouthful, so let's slow down.

**Sarah:** Independence of Irrelevant Alternatives means, the relative odds of choosing one category over another do not depend on what other categories are available. The other alternatives in the choice set are, in this technical sense, irrelevant to the comparison between any two specific alternatives.

**Kiffer:** And the classic example, which goes back to McFadden, is the so-called red bus, blue bus problem. Imagine commuters choose between car and red bus, and the model says the split is fifty-fifty, so the odds of car versus red bus are one-to-one. Now you add a blue bus, identical to the red bus in every way except color.

**Sarah:** Right, and what does common sense say should happen to the split?

**Kiffer:** Common sense says, half the people who took the red bus would now take the blue bus. So you'd end up with half by car, a quarter by red bus, a quarter by blue bus. The odds of car versus red bus change from even, one-to-one, to two-to-one, because the blue bus took half of the red bus market.

**Sarah:** But what does multinomial logistic regression say, under the I I A assumption?

**Kiffer:** It says, the relative odds of car versus red bus stay the same. Even, one-to-one. So once the blue bus joins the choice set, looking exactly as attractive as the red bus, the model has to redistribute probabilities so that car drops to a third and red bus drops to a third and blue bus is also a third. Even though, intuitively, the blue bus shouldn't be cannibalizing the car at all.
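
The red bus, blue bus arithmetic can be reproduced in a few lines. This sketch assumes all alternatives have the same systematic utility (set to zero), which is what an even split implies.

```python
import math

def logit_shares(utilities):
    """Multinomial-logit choice probabilities from systematic utilities."""
    denom = sum(math.exp(u) for u in utilities.values())
    return {name: math.exp(u) / denom for name, u in utilities.items()}

# Equal utilities reproduce the even split in the example.
before = logit_shares({"car": 0.0, "red_bus": 0.0})
after = logit_shares({"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0})

print(before)
print(after)
print(after["car"] / after["red_bus"])  # car : red bus ratio after adding blue bus
```

The car-to-red-bus ratio is the same before and after, and the car's share falls from a half to a third, which is the behaviour intuition objects to.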

**Sarah:** So that's why I I A is such a famous problem in choice modeling. When two alternatives are very similar, like the two buses, I I A fails badly. The model treats them as if they're independent of each other when they really aren't.

**Kiffer:** So how do we test for I I A violations in practice?

**Sarah:** The classic test is the Hausman-McFadden test, developed by Jerry Hausman and Daniel McFadden in 1984. The logic is, if I I A holds, then dropping one of the alternatives from the choice set shouldn't substantially change the coefficients of the model fit on the remaining alternatives. So you fit the full model, then you fit a restricted model dropping one category, and you compare the coefficient estimates. If they differ substantially, I I A is rejected.
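
For reference, the comparison is usually summarized in a Hausman-type quadratic form. The notation is assumed here: $\hat{\beta}_r$ and $\hat{V}_r$ are the coefficients and their covariance matrix from the restricted fit, and $\hat{\beta}_f$ and $\hat{V}_f$ are the corresponding coefficients and covariance from the full fit.

$$
H = (\hat{\beta}_r - \hat{\beta}_f)^{\top} \left[ \hat{V}_r - \hat{V}_f \right]^{-1} (\hat{\beta}_r - \hat{\beta}_f)
$$

Under I I A, $H$ is approximately chi-squared with degrees of freedom equal to the number of coefficients being compared. In finite samples the matrix in brackets may fail to be positive definite, which is one reason the test can behave erratically.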

**Kiffer:** There's also the Small-Hsiao test, which uses a different sample-splitting approach. The two tests sometimes give conflicting results in practice, and the lesson is honest that the testing landscape isn't perfect. If both tests reject, that's a clear signal. If they disagree, you have to use judgment.

**Sarah:** And here's the key practical point. I I A is mainly a concern for genuinely substitutable alternatives. The bus example is the classic case. If you're modeling cancer type, I I A is less of a concern, because lung cancer and breast cancer aren't really substitutes for each other in the same sense.

**Kiffer:** And when I I A does fail, what are the alternatives we can reach for?

**Sarah:** The most common alternative is the nested logit model. The idea is, you group similar alternatives into nests, and you model the choice in two stages. First you model which nest someone is in, then you model which alternative within the nest.

**Kiffer:** So in the bus example, you'd put the two buses in a nest called bus, and put car in its own nest. Then your top-level model is, car versus bus, and the within-nest model is, red bus versus blue bus, conditional on choosing bus. Now adding a blue bus only affects the within-bus comparison. It doesn't change the car-versus-bus odds. Which is exactly what intuition says should happen.
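
A minimal sketch of a two-level nested logit for the bus example. The function name and the dissimilarity parameter per nest are assumptions made for illustration. A value of 1 reproduces ordinary multinomial logit, and values near 0 treat the alternatives in a nest as near-perfect substitutes.

```python
import math

def nested_logit_shares(nests, dissimilarity):
    """Two-level nested logit. `nests` maps nest -> {alternative: utility}."""
    inclusive = {}
    for nest, alts in nests.items():
        lam = dissimilarity[nest]
        # Inclusive value: log-sum of scaled utilities within the nest.
        inclusive[nest] = math.log(sum(math.exp(u / lam) for u in alts.values()))
    top_denom = sum(math.exp(dissimilarity[n] * iv) for n, iv in inclusive.items())
    shares = {}
    for nest, alts in nests.items():
        lam = dissimilarity[nest]
        p_nest = math.exp(lam * inclusive[nest]) / top_denom
        within_denom = sum(math.exp(u / lam) for u in alts.values())
        for alt, u in alts.items():
            shares[alt] = p_nest * math.exp(u / lam) / within_denom
    return shares

nests = {"car": {"car": 0.0}, "bus": {"red_bus": 0.0, "blue_bus": 0.0}}
mnl_like = nested_logit_shares(nests, {"car": 1.0, "bus": 1.0})
near_identical = nested_logit_shares(nests, {"car": 1.0, "bus": 0.01})
print(mnl_like)
print(near_identical)
```

With the bus dissimilarity at 1 every alternative gets a third. As it approaches 0 the shares approach a half for car and a quarter for each bus, matching the common-sense answer.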

**Sarah:** And there are more flexible models too, like mixed logit, which allows the coefficients to vary across individuals according to a distribution. That's beyond the scope of this lesson, but it's worth knowing the field is rich. If standard multinomial fails, there are well-developed alternatives.

**Kiffer:** Let me ground multinomial in the Apgar example, the same one we used for proportional odds.

**Sarah:** Yeah, this is illuminating because you can see what you gain and what you lose by using multinomial instead of proportional odds on an ordered outcome.

**Kiffer:** So the same Apgar score data. Four categories. One through six, seven, eight, nine through ten. The textbook fits a multinomial model with the highest category as the reference. The relative risk ratios for prenatal visits, comparing six or more visits to fewer than six, are zero point two four for the lowest category, zero point six five for category seven, and zero point seven two for category eight.

**Sarah:** And what's striking about those numbers is the gradient. The strongest effect is at the lowest Apgar category, where having more prenatal visits drops the relative risk by seventy-six percent. The effect is more modest at the middle categories. So prenatal visits are most protective against the worst outcomes, less impactful against intermediate outcomes.
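
The percentage quoted above is just one minus the relative risk ratio. A quick sketch of the same arithmetic for all three ratios from the episode:

```python
# Relative risk ratios quoted in the episode (reference: Apgar 9-10).
rrr = {"1-6": 0.24, "7": 0.65, "8": 0.72}

# Percent change in P(category) / P(reference) for six or more visits
# compared with fewer than six.
percent_change = {cat: round((value - 1.0) * 100) for cat, value in rrr.items()}
print(percent_change)
```

Each figure is a change in the ratio of that category's probability to the reference category's probability, not a change in the category's probability itself.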

**Kiffer:** And this gradient is exactly what you'd hope to see for an ordered outcome. The proportional odds model summarizes this with a single number, one point five nine. The multinomial model gives you a richer picture, three numbers, one per non-reference category, and you can see the gradient explicitly.

**Sarah:** But here's the trade-off. If the proportional odds assumption holds, the proportional odds model is more efficient. You're estimating fewer parameters from the same data, so each parameter is estimated more precisely. If the assumption doesn't hold, the multinomial model is more honest, because it doesn't impose a constraint the data don't support.

**Kiffer:** Which is why testing the proportional odds assumption is so important. It's the deciding factor for which model you should report.

**Sarah:** Quick mention of the two specialty cousins we promised at the top. The adjacent-category model and the continuation-ratio model.

**Kiffer:** Yeah, give them a paragraph each so people know when to reach for them.

**Sarah:** The adjacent-category model is for ordinal outcomes, but instead of comparing each category to all-lower or all-higher categories, it compares each category to its immediate neighbor. Mild versus none. Moderate versus mild. Severe versus moderate. The constraint is similar in spirit to proportional odds. Like proportional odds, it gives you a single coefficient per predictor. It's a constrained version of multinomial logistic regression, and the validity of the constraint can be tested with a likelihood ratio test against the unconstrained multinomial fit.
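
In symbols, with $\alpha_j$ and $\beta$ as notation assumed here for the comparison-specific intercepts and the shared slopes, the adjacent-category model with a common slope can be sketched as:

$$
\log \frac{P(Y = j + 1 \mid x)}{P(Y = j \mid x)} = \alpha_j + \beta^{\top} x, \qquad j = 1, \dots, K-1
$$

Letting $\beta$ vary with $j$ recovers the unconstrained multinomial model, which is why the constraint can be tested against that fit.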

**Kiffer:** And the continuation-ratio model is for outcomes where each level represents a sequential stage that must be passed through to reach the next. Number of attempts to pass an exam. Stages of a multi-step intervention. Career advancement levels. The model compares each level to all lower levels combined, in a way that respects the sequential structure. It's not appropriate when categories can be reached without passing through lower levels. So the Apgar example wouldn't be a continuation-ratio target, because a baby doesn't pass through each Apgar score on the way to the highest.
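
A sketch of the comparisons a continuation-ratio model is built on, following the description in the episode (each level versus all lower levels combined). The counts are hypothetical, and some texts define the ratios in the other direction, conditioning on having reached a level.

```python
# Hypothetical counts of students by number of exam attempts needed (levels 1..4).
counts = [50, 30, 15, 5]

def continuation_odds(counts):
    """Odds of being at level j versus all lower levels combined, for j = 2..K."""
    return [counts[j] / sum(counts[:j]) for j in range(1, len(counts))]

for level, odds in enumerate(continuation_odds(counts), start=2):
    print(f"level {level} vs all lower levels: odds = {odds:.3f}")
```

A fitted model would put these K minus one odds on the log scale and relate them to predictors.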

**Sarah:** Both are useful tools when you have a specific reason to think the structure of the outcome justifies them. But for most ordinal outcomes in epidemiology, proportional odds is the default first attempt, and you only escalate to one of these specialized models if proportional odds doesn't fit or if the substantive structure of the outcome calls for them.

**Kiffer:** Okay. Let me try to pull the whole arc together because there are a lot of moving pieces.

**Sarah:** Yeah. Let me anchor the takeaways. There are six big ones I'd want a student to walk away with.

**Kiffer:** Go ahead, lay them out for us.

**Sarah:** First. The first decision in any multi-category outcome problem is whether the categories are ordered or not. Ordered means an ordinal model, with proportional odds as the default. Not ordered means a multinomial model. The decision tree starts here, and it's almost always determined by the substance of your outcome, not by anything statistical.

**Kiffer:** Second. The proportional odds model converts a K-category problem into K minus one logistic regressions, each modeling a cumulative probability at a cutpoint. The proportional odds assumption is that the predictor coefficients are the same across all cutpoints. Only the intercepts change. The payoff is one odds ratio per predictor, with a clean interpretation that holds across all cutpoints.

**Sarah:** Third. The proportional odds assumption has to be tested. The Brant test and the score test are the usual tools. If the assumption holds for all predictors, report the proportional odds model. If a few predictors fail, use a partial proportional odds model that relaxes the constraint just for those. If many predictors fail, fall back to a generalized ordinal logistic model or a fully multinomial model.

**Kiffer:** Fourth. Multinomial logistic regression fits K minus one logistic regressions simultaneously, comparing each non-reference category to a chosen reference. K minus one sets of coefficients per predictor. The exponentiated coefficients are technically relative risk ratios, though many people call them odds ratios in casual usage. Choose the reference category strategically to make the contrasts easy to interpret, and make sure it's well-populated in your data.
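
As a rough illustration of how K minus one coefficient sets turn into K predicted probabilities, the sketch below fixes the reference category's linear predictor at zero and normalizes. The insurance categories echo the earlier example, but the choice of reference, the intercepts, and the slopes are all hypothetical.

```python
import math

def multinomial_probs(intercepts, slopes, x):
    """Predicted probabilities for K categories from K - 1 coefficient sets.

    The reference category comes first and has its linear predictor fixed at 0.
    """
    etas = [0.0] + [a + b * x for a, b in zip(intercepts, slopes)]
    denom = sum(math.exp(eta) for eta in etas)
    return [math.exp(eta) / denom for eta in etas]

# Hypothetical setup: "private" is the reference; one predictor x.
categories = ["private", "public", "uninsured"]
intercepts = [-0.4, -1.2]   # public vs private, uninsured vs private
slopes = [0.3, 0.9]         # one slope per non-reference category

p0 = multinomial_probs(intercepts, slopes, 0.0)
p1 = multinomial_probs(intercepts, slopes, 1.0)

# How the ratio P(category) / P(reference) changes for a one-unit increase in x.
ratio_change = [(p1[k] / p1[0]) / (p0[k] / p0[0]) for k in (1, 2)]
print(dict(zip(categories, p1)))
print(ratio_change)          # each equals exp(slope) up to rounding
```

The last line is the exponentiated coefficient for each contrast. It describes a ratio relative to the reference category, so changing the reference changes every number in the table without changing the fitted probabilities.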

**Sarah:** Fifth. Goodness of fit for multinomial models lives in likelihood ratio tests for nested model comparisons and McFadden's pseudo R-squared for overall fit. Hosmer-Lemeshow doesn't directly apply because there are multiple outcome categories with multiple predicted probabilities per observation. There are extensions but they're not as standardized.
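
Both quantities are simple functions of log-likelihoods that any fitting routine reports. The sketch below uses hypothetical log-likelihood values; the function names are made up for illustration. The closed-form p-value applies only when the test has exactly 2 degrees of freedom, as when one predictor is added to a three-category model.

```python
import math

def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - logL(model) / logL(intercept-only)."""
    return 1.0 - ll_model / ll_null

def lr_statistic(ll_reduced, ll_full):
    """Likelihood ratio statistic for nested models: 2 * (logL_full - logL_reduced)."""
    return 2.0 * (ll_full - ll_reduced)

# Hypothetical log-likelihoods (illustration only).
ll_null = -520.0      # intercept-only model
ll_reduced = -480.0   # model without the predictor of interest
ll_full = -468.0      # model with it

r2 = mcfadden_r2(ll_full, ll_null)
lr = lr_statistic(ll_reduced, ll_full)

# For a chi-square with exactly 2 degrees of freedom, P(X > x) = exp(-x / 2).
# Other degrees of freedom need a proper chi-square routine.
p_value_df2 = math.exp(-lr / 2.0)
print(r2, lr, p_value_df2)
```

In a multinomial model each added predictor contributes K minus one parameters, so the degrees of freedom for the likelihood ratio test grow with the number of outcome categories.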

**Kiffer:** And sixth. The Independence of Irrelevant Alternatives assumption matters when alternatives are genuinely substitutable. The Hausman-McFadden test, sometimes alongside the Small-Hsiao test, checks the assumption. When I I A fails, the most common alternative is the nested logit model, which groups similar alternatives into nests and models the choice in stages.
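
What the assumption means in practice can be shown with a few lines of arithmetic. In the sketch below, the alternatives and their linear predictors are hypothetical. It illustrates the property built into multinomial logit, not the Hausman-McFadden or Small-Hsiao tests themselves.

```python
import math

def choice_probs(linear_predictors):
    """Multinomial logit probabilities from a dict of alternative -> linear predictor."""
    denom = sum(math.exp(v) for v in linear_predictors.values())
    return {name: math.exp(v) / denom for name, v in linear_predictors.items()}

# Hypothetical linear predictors for three alternatives.
full_set = {"car": 1.0, "bus": 0.5, "train": 0.5}
reduced_set = {name: v for name, v in full_set.items() if name != "train"}

p_full = choice_probs(full_set)
p_reduced = choice_probs(reduced_set)

# The car-to-bus ratio is identical with or without "train" in the choice set.
ratio_full = p_full["car"] / p_full["bus"]
ratio_reduced = p_reduced["car"] / p_reduced["bus"]
print(ratio_full, ratio_reduced)
```

The model forces that ratio to stay fixed. If two alternatives are close substitutes, removing one should shift people mostly toward the other, and the fixed ratio cannot represent that. This is the situation where a nested logit is usually considered.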

**Sarah:** And there are a few practical recommendations I'd add. One. When you fit an ordinal model, always test the proportional odds assumption before you report results. The Brant test is one line of code in R. There's no excuse for skipping it. But also, look at the magnitude of any deviations, not just the p-value, because in a large sample the test will reject for trivial differences.

**Kiffer:** Two. When you write up a multinomial model, present the results in a way that respects the choice of reference. Tables that lay out each non-reference comparison clearly. Or visualizations that show predicted probabilities at meaningful covariate patterns, since predicted probabilities are often more interpretable than the raw relative risk ratios for general audiences.

**Sarah:** Three. Don't treat an ordinal outcome as numeric just because it's coded with numbers. Doing that implicitly assumes the categories are equally spaced, and that assumption is almost always wrong. An ordinal model is the right tool for outcomes with order but no numeric metric.

**Kiffer:** And one connecting point back to the rest of the course. Lesson five covered binary logistic regression. The models we covered today are direct generalizations. Proportional odds is binary logistic regression repeated K minus one times across cumulative cutpoints, with a shared coefficient constraint. Multinomial is binary logistic regression repeated K minus one times across category contrasts, with no shared constraint. Both are members of the generalized linear model family, sometimes abbreviated G L M, and both inherit the maximum likelihood machinery, the standard errors, and the inference apparatus you already know.

**Sarah:** And next up is Lesson seven. Count and Rate Data. Where we stay in the generalized linear model family but pivot to outcomes that are counts of events. Number of hospital visits. Number of incident cases. The standard tools there are Poisson and negative binomial regression. And the offset terms you'll meet there are how rate data become tractable in the same framework.

**Kiffer:** Take care, everyone, and we'll see you in Lesson seven.

**Sarah:** See you there.
