Ordinal Regression: Breathing Test Results.

Fahrmeir and Tutz (2001) Examples 3.2 & 3.4

Published

April 23, 2026

Data Loading and Setup

Code
library(conflicted)
library(dplyr)
library(MASS)
library(purrr)
library(readr)
library(tidyr)

We load the BTR dataset. It contains breathing test results with 3 levels: normal, borderline, and abnormal (which we can consider as ordered). We want to analyze the effects of the covariates age and smoking status.

Code
btr_dat <- read_csv(
  "data/Example3-2BTR.csv"
)
Rows: 18 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): BTR, Age, Smoking
dbl (1): Freq

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

As it stands, the variables BTR, Age, and Smoking are all read as characters. This is not what we want, so we use the col_types argument to specify that we want them as factors.

Code
btr_dat <- read_csv(
  "data/Example3-2BTR.csv",
  col_types = cols(
    BTR = col_factor(),
    Age = col_factor(),
    Smoking = col_factor(),
    Freq = col_double()
  )
)

btr_dat
# A tibble: 18 × 4
   BTR     Age    Smoking   Freq
   <fct>   <fct>  <fct>    <dbl>
 1 1Normal <40    1Never     577
 2 2Border <40    1Never      27
 3 3Abnorm <40    1Never       7
 4 1Normal <40    2Former    192
 5 2Border <40    2Former     20
 6 3Abnorm <40    2Former      3
 7 1Normal <40    3Current   682
 8 2Border <40    3Current    46
 9 3Abnorm <40    3Current    11
10 1Normal 40to59 1Never     164
11 2Border 40to59 1Never       4
12 3Abnorm 40to59 1Never       0
13 1Normal 40to59 2Former    145
14 2Border 40to59 2Former     15
15 3Abnorm 40to59 2Former      7
16 1Normal 40to59 3Current   245
17 2Border 40to59 3Current    47
18 3Abnorm 40to59 3Current    27

BTR, Age, and Smoking are all factors now. We next check if the breathing test result (BTR) is encoded as an ordered factor, and if not, we convert it to an ordered factor:

Code
is.ordered(btr_dat$BTR)
[1] FALSE
Code
btr_dat$BTR <- as.ordered(btr_dat$BTR)

We can now confirm the conversion was successful.

Code
class(btr_dat$BTR)
[1] "ordered" "factor" 
Code
btr_dat$BTR
 [1] 1Normal 2Border 3Abnorm 1Normal 2Border 3Abnorm 1Normal 2Border 3Abnorm
[10] 1Normal 2Border 3Abnorm 1Normal 2Border 3Abnorm 1Normal 2Border 3Abnorm
Levels: 1Normal < 2Border < 3Abnorm

The default ordering happens to be good for us, but if we needed to change the order, we could specify it with factor(..., levels = c(...), ordered = TRUE).
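As a quick sketch of that alternative (with made-up labels, not the BTR data), an explicit ordering can be imposed like this:

```r
# Hypothetical example: impose an explicit level ordering on a character vector
x <- c("low", "high", "medium", "low")
x_ord <- factor(x, levels = c("low", "medium", "high"), ordered = TRUE)
x_ord
# Levels: low < medium < high
is.ordered(x_ord)
```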

We now examine the current contrast coding scheme:

Code
getOption("contrasts")
        unordered           ordered 
"contr.treatment"      "contr.poly" 

"contr.treatment" corresponds to dummy coding for unordered factors. For ordered factors, the default is "contr.poly" (orthogonal polynomials); we won’t cover polynomial contrasts in this subject, but see p. 33 of Chambers and Hastie (1992) for details. We treat Age and Smoking as unordered factors and use dummy coding.

Code
contrasts(btr_dat$Smoking)
         2Former 3Current
1Never         0        0
2Former        1        0
3Current       0        1
Code
contrasts(btr_dat$Age)
       40to59
<40         0
40to59      1

The default reference categories are 1Never for Smoking and <40 for Age.

Note
  • Dummy coding: the reference category is coded 0 in every indicator column, and each other category gets a 1 in its own column; coefficients represent differences from the reference category
  • Effect coding: the last category is coded -1 in every column, and each other category gets a 1 in its own column; coefficients represent differences from the overall mean
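The two schemes can be inspected directly via the contrast-generating functions in base R; for a three-level factor:

```r
# Dummy coding: each non-reference level gets its own indicator column;
# the first level (the reference) is 0 everywhere
contr.treatment(3)
#   2 3
# 1 0 0
# 2 1 0
# 3 0 1

# Effect (sum-to-zero) coding: the last level is coded -1 in every column
contr.sum(3)
#   [,1] [,2]
# 1    1    0
# 2    0    1
# 3   -1   -1
```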

Model Fitting: Proportional Odds Models

The MASS::polr() function fits cumulative logit (and other) models. Looking at the documentation (?polr), in particular how the linear predictor \(\eta\) is defined there, note that the signs of the beta coefficients in MASS::polr() are opposite to those of the corresponding coefficients in the formula of our textbook (Fahrmeir and Tutz (2001)).

We first fit a cumulative logistic (proportional odds) model with an intercept, age, smoking. We also include an interaction effect between age and smoking.

Code
btr_logit_inter <- polr(
  BTR ~ Age * Smoking,
  data = btr_dat,
  weights = Freq,
  Hess = TRUE,
  method = "logistic"
)
  • formula: Model structure response ~ predictors; response must be an ordered factor
  • data: Data frame containing the variables
  • weights: Case weights (here, frequency counts for each covariate combination)
  • Hess = TRUE: Return the Hessian matrix; needed for summary() and confidence intervals
  • method: Link function—"logistic" (default, proportional odds), "probit", "cloglog", "loglog", or "cauchit"
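Swapping the link function is just a matter of changing the method argument. The following is a minimal sketch fitting a cumulative probit model; the btr data frame here is a hypothetical reconstruction built from the counts printed earlier, not the original CSV file:

```r
library(MASS)

# Rebuild the BTR frequency table from the counts printed earlier
btr <- data.frame(
  BTR = factor(rep(c("1Normal", "2Border", "3Abnorm"), 6), ordered = TRUE),
  Age = factor(rep(c("<40", "40to59"), each = 9), levels = c("<40", "40to59")),
  Smoking = factor(rep(rep(c("1Never", "2Former", "3Current"), each = 3), 2)),
  Freq = c(577, 27, 7, 192, 20, 3, 682, 46, 11,
           164, 4, 0, 145, 15, 7, 245, 47, 27)
)

# Same model structure, cumulative probit link instead of logit
fit_probit <- polr(BTR ~ Age * Smoking, data = btr, weights = Freq,
                   Hess = TRUE, method = "probit")
fit_probit$method
```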

The MASS::polr() function is limited: it can fit standard cumulative models (see section 3.3.1 of Fahrmeir and Tutz (2001)), but it does not allow for extended cumulative models with threshold variables (see section 3.3.2 of Fahrmeir and Tutz (2001)).

Code
summary(btr_logit_inter)
Call:
polr(formula = BTR ~ Age * Smoking, data = btr_dat, weights = Freq, 
    Hess = TRUE, method = "logistic")

Coefficients:
                            Value Std. Error t value
Age40to59                 -0.8857     0.5359  -1.653
Smoking2Former             0.6973     0.2822   2.471
Smoking3Current            0.3472     0.2238   1.551
Age40to59:Smoking2Former   1.1458     0.6228   1.840
Age40to59:Smoking3Current  2.2007     0.5689   3.868

Intercepts:
                Value   Std. Error t value
1Normal|2Border  2.8334  0.1764    16.0603
2Border|3Abnorm  4.3078  0.2130    20.2210

Residual Deviance: 1564.968 
AIC: 1578.968 

The fitted model has cumulative logit link of the form \[\begin{multline*} \log\frac{P(\texttt{BTR}=\texttt{Normal})}{P(\texttt{BTR}>\texttt{Normal})} = \theta_1 + \gamma_1\texttt{Age40to59}+\gamma_2\texttt{Smoking2Former}+\gamma_3\texttt{Smoking3Current} +\\ \gamma_4\texttt{Age40to59}\cdot \texttt{Smoking2Former} + \gamma_5\texttt{Age40to59}\cdot \texttt{Smoking3Current} \end{multline*}\] and \[\begin{multline*} \log\frac{P(\texttt{BTR} \leq \texttt{Borderline})}{P(\texttt{BTR}>\texttt{Borderline})} = \theta_2 + \gamma_1\texttt{Age40to59}+\gamma_2\texttt{Smoking2Former}+\gamma_3\texttt{Smoking3Current} +\\ \gamma_4\texttt{Age40to59}\cdot \texttt{Smoking2Former} + \gamma_5\texttt{Age40to59}\cdot \texttt{Smoking3Current}, \end{multline*}\] where the MLE of the parameters are

\[ \hat{\theta}_1=2.8334, \hat{\theta}_2=4.3078, \hat{\gamma}_1=0.8857, \hat{\gamma}_2=-0.6973 , \hat{\gamma}_3=-0.3472 , \hat{\gamma}_4=-1.1458 , \hat{\gamma}_5=-2.2007. \]

Given that a worker is in the 40–59 age group, the cumulative odds of having a normal test result for a former smoker is \(\exp(-0.6973 - 1.1458)= 0.158\) times that for a person who has never smoked.

As it is a “proportional odds” model, the same is true for the cumulative odds of having “\(\leq\) Borderline” testing result.
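We can check this numerically from the reported estimates: with polr’s parameterisation, the cumulative logit at threshold \(\theta_j\) is \(\theta_j - \eta\), so the cumulative odds ratio \(\exp\bigl((\theta_j - \eta_{\text{former}}) - (\theta_j - \eta_{\text{never}})\bigr) = \exp(\eta_{\text{never}} - \eta_{\text{former}})\) does not depend on \(j\). A quick sketch plugging in the values reported above for the 40–59 age group:

```r
# Linear predictors (polr parameterisation) for the 40-59 age group
eta_never  <- -0.8857462
eta_former <-  0.9573393
theta <- c(2.833443, 4.307832)  # thresholds 1Normal|2Border, 2Border|3Abnorm

# Cumulative odds ratio (former vs never) at each threshold
or <- exp((theta - eta_former) - (theta - eta_never))
or  # identical at both cutpoints, ~0.158
```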

No residuals are readily available, but we can inspect the linear predictor \(\eta\), as defined in the documentation page for the MASS::polr() function, without the intercept.

Code
btr_logit_inter$lp
         1          2          3          4          5          6          7 
 0.0000000  0.0000000  0.0000000  0.6972823  0.6972823  0.6972823  0.3472245 
         8          9         10         11         12         13         14 
 0.3472245  0.3472245 -0.8857462 -0.8857462 -0.8857462  0.9573393  0.9573393 
        15         16         17         18 
 0.9573393  1.6621466  1.6621466  1.6621466 

The \(\zeta\)’s are the threshold parameters \(\theta\)’s as defined in our textbook (Fahrmeir and Tutz (2001)) and slides.

Code
btr_logit_inter$zeta
1Normal|2Border 2Border|3Abnorm 
       2.833443        4.307832 

We verify the linear predictors by computing them by hand.

Code
Z <- cbind(
  (btr_dat$Age == "40to59"),
  (btr_dat$Smoking == "2Former"),
  (btr_dat$Smoking == "3Current"),
  (btr_dat$Age == "40to59") * (btr_dat$Smoking == "2Former"),
  (btr_dat$Age == "40to59") * (btr_dat$Smoking == "3Current")
)

tibble(
  manual = c(Z %*% coef(btr_logit_inter)),
  polr = btr_logit_inter$lp
)
# A tibble: 18 × 2
   manual   polr
    <dbl>  <dbl>
 1  0      0    
 2  0      0    
 3  0      0    
 4  0.697  0.697
 5  0.697  0.697
 6  0.697  0.697
 7  0.347  0.347
 8  0.347  0.347
 9  0.347  0.347
10 -0.886 -0.886
11 -0.886 -0.886
12 -0.886 -0.886
13  0.957  0.957
14  0.957  0.957
15  0.957  0.957
16  1.66   1.66 
17  1.66   1.66 
18  1.66   1.66 

We see that they line up.

Fit Model without Interaction (Main Effects Only)

We fit a main-effects-only model without interaction.

Code
btr_logit_main <- polr(
  BTR ~ Age + Smoking,
  data = btr_dat,
  weights = Freq,
  Hess = TRUE,
  method = "logistic"
)

summary(btr_logit_main)
Call:
polr(formula = BTR ~ Age + Smoking, data = btr_dat, weights = Freq, 
    Hess = TRUE, method = "logistic")

Coefficients:
                 Value Std. Error t value
Age40to59       0.7772     0.1482   5.243
Smoking2Former  0.7815     0.2333   3.350
Smoking3Current 0.9607     0.1918   5.010

Intercepts:
                Value   Std. Error t value
1Normal|2Border  3.1927  0.1749    18.2544
2Border|3Abnorm  4.6543  0.2125    21.9076

Residual Deviance: 1589.744 
AIC: 1599.744 

Again, the signs of the covariate coefficients are opposite to those of the model defined in lectures. Since the main-effects model is nested within the interaction model, we can compare them with a likelihood ratio test.

The test has 2 degrees of freedom since we are adding two interaction terms (Age40to59:Smoking2Former and Age40to59:Smoking3Current).

We can use anova():

Code
anova(btr_logit_inter, btr_logit_main)
Likelihood ratio tests of ordinal regression models

Response: BTR
          Model Resid. df Resid. Dev   Test    Df LR stat.      Pr(Chi)
1 Age + Smoking      2214   1589.744                                   
2 Age * Smoking      2212   1564.968 1 vs 2     2 24.77585 4.168619e-06

We reject the null hypothesis that there is no interaction, at all reasonable significance levels.

Note

The LR statistic in the anova() output is the difference between the deviances of the two models. Each residual deviance reported is \(-2 \times \log(\text{likelihood})\) for that model (more on this to follow).
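Using the two reported residual deviances, the test can be reproduced by hand:

```r
# LR statistic = difference of the two reported residual deviances
lr_stat <- 1589.744 - 1564.968
# p-value from the chi-squared distribution with 2 degrees of freedom
p_value <- pchisq(lr_stat, df = 2, lower.tail = FALSE)
c(lr_stat = lr_stat, p_value = p_value)
# matches the anova() output: LR stat ~24.78, p ~4.2e-06
```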

Matching our Results to Fahrmeir and Tutz (2001) Example 3.4

By default, R uses dummy coding. To match to the numerical results in Table 3.3 of Fahrmeir and Tutz (2001) Example 3.4, we need to use effect coding. We switch the global contrast option and refit.

Code
options(contrasts = c("contr.sum", "contr.poly"))

# Verify the contrast change
contrasts(btr_dat$Smoking)
         [,1] [,2]
1Never      1    0
2Former     0    1
3Current   -1   -1
Code
contrasts(btr_dat$Age)
       [,1]
<40       1
40to59   -1

We now have the last categories coded as -1, as also done in the textbook. Refitting both models:

Code
btr_logit_inter_eff <- polr(
  BTR ~ Age + Smoking + Age:Smoking,
  data = btr_dat,
  weights = Freq,
  Hess = TRUE,
  method = "logistic"
)

btr_logit_main_eff <- polr(
  BTR ~ Age + Smoking,
  data = btr_dat,
  weights = Freq,
  Hess = TRUE,
  method = "logistic"
)

btr_logit_inter_eff_summary <- summary(btr_logit_inter_eff)
btr_logit_inter_eff_summary
Call:
polr(formula = BTR ~ Age + Smoking + Age:Smoking, data = btr_dat, 
    weights = Freq, Hess = TRUE, method = "logistic")

Coefficients:
                 Value Std. Error t value
Age1          -0.11487     0.1086 -1.0580
Smoking1      -0.90596     0.1890 -4.7933
Smoking2       0.36428     0.1421  2.5644
Age1:Smoking1  0.55777     0.1890  2.9512
Age1:Smoking2 -0.01517     0.1421 -0.1068

Intercepts:
                Value   Std. Error t value
1Normal|2Border  2.3704  0.1086    21.8209
2Border|3Abnorm  3.8447  0.1599    24.0486

Residual Deviance: 1564.968 
AIC: 1578.968 
Code
btr_logit_main_eff_summary <- summary(btr_logit_main_eff)
btr_logit_main_eff_summary
Call:
polr(formula = BTR ~ Age + Smoking, data = btr_dat, weights = Freq, 
    Hess = TRUE, method = "logistic")

Coefficients:
           Value Std. Error t value
Age1     -0.3886    0.07412  -5.243
Smoking1 -0.5807    0.12807  -4.535
Smoking2  0.2008    0.12549   1.600

Intercepts:
                Value   Std. Error t value
1Normal|2Border  2.2234  0.0835    26.6418
2Border|3Abnorm  3.6850  0.1434    25.6938

Residual Deviance: 1589.744 
AIC: 1599.744 

The values reported in the summaries now match the textbook (up to rounding errors and sign change of the covariate coefficients). In addition, we can redo the likelihood ratio test with the new models to confirm we get the same result.

Code
anova(btr_logit_inter_eff, btr_logit_main_eff)
Likelihood ratio tests of ordinal regression models

Response: BTR
                        Model Resid. df Resid. Dev   Test    Df LR stat.
1               Age + Smoking      2214   1589.744                      
2 Age + Smoking + Age:Smoking      2212   1564.968 1 vs 2     2 24.77585
       Pr(Chi)
1             
2 4.168618e-06

As expected, the likelihood ratio test results are the same as before, because using effect coding, as opposed to dummy coding, amounts only to a reparametrization of the model.
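We can also see the reparametrization directly in the fitted probabilities. The following is a minimal sketch; the btr data frame is a hypothetical reconstruction from the counts shown earlier, and both codings are fit back to back by switching the global contrasts option:

```r
library(MASS)

# Rebuild the BTR frequency table from the counts shown earlier
btr <- data.frame(
  BTR = factor(rep(c("1Normal", "2Border", "3Abnorm"), 6), ordered = TRUE),
  Age = factor(rep(c("<40", "40to59"), each = 9), levels = c("<40", "40to59")),
  Smoking = factor(rep(rep(c("1Never", "2Former", "3Current"), each = 3), 2)),
  Freq = c(577, 27, 7, 192, 20, 3, 682, 46, 11,
           164, 4, 0, 145, 15, 7, 245, 47, 27)
)

# Fit the interaction model under dummy coding, then under effect coding
old <- options(contrasts = c("contr.treatment", "contr.poly"))
fit_dummy <- polr(BTR ~ Age * Smoking, data = btr, weights = Freq, Hess = TRUE)
options(contrasts = c("contr.sum", "contr.poly"))
fit_effect <- polr(BTR ~ Age * Smoking, data = btr, weights = Freq, Hess = TRUE)
options(old)

# The coefficients differ, but the fitted probabilities describe the same model
max(abs(fitted(fit_dummy) - fitted(fit_effect)))  # ~0 up to optimiser tolerance
```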

Saturated Model and Deviance Calculation

In each of the fitted model summaries, a value labelled “Residual Deviance” is reported. We need to be careful about what this means exactly; as usual, it can be highly package dependent.

The deviance component (reported as “Residual Deviance” by the summary() function) contained in a polr object is \(-2 \times \log(\text{likelihood})\) for the fitted model. We can manually compute the (scaled) deviance as defined in class, i.e. twice the difference between the maximum log-likelihoods of the saturated model and the GLM.

We aim to compute the maximum log-likelihood of the saturated model. We first create a matrix with each row corresponding to a unique covariate combination, with three columns counting the frequencies of each response category for that covariate combination. This can be accomplished elegantly by the function pivot_wider() in the tidyr package.

Code
btr_counts <- btr_dat |>
  pivot_wider(
    id_cols = c(Age, Smoking),
    names_from = BTR,
    values_from = Freq
  )
btr_counts
# A tibble: 6 × 5
  Age    Smoking  `1Normal` `2Border` `3Abnorm`
  <fct>  <fct>        <dbl>     <dbl>     <dbl>
1 <40    1Never         577        27         7
2 <40    2Former        192        20         3
3 <40    3Current       682        46        11
4 40to59 1Never         164         4         0
5 40to59 2Former        145        15         7
6 40to59 3Current       245        47        27
Code
data_mat <- btr_counts |>
  dplyr::select(-Age, -Smoking) |>
  as.matrix()
data_mat
     1Normal 2Border 3Abnorm
[1,]     577      27       7
[2,]     192      20       3
[3,]     682      46      11
[4,]     164       4       0
[5,]     145      15       7
[6,]     245      47      27

We can also represent the data as a matrix of empirical proportions (dividing each row by its row sum).

Code
probs <- data_mat / rowSums(data_mat)
probs
       1Normal    2Border    3Abnorm
[1,] 0.9443535 0.04418985 0.01145663
[2,] 0.8930233 0.09302326 0.01395349
[3,] 0.9228687 0.06224628 0.01488498
[4,] 0.9761905 0.02380952 0.00000000
[5,] 0.8682635 0.08982036 0.04191617
[6,] 0.7680251 0.14733542 0.08463950

This is also the matrix of maximum likelihood estimates under the saturated model (why?). We can manually calculate the maximum log-likelihood of the saturated model under the multinomial distribution by first calculating the log-likelihood contribution of each cell (treating \(0 \log(0) := 0\), as one of the cells in data_mat is a zero entry), and then summing over all cells to get the total log-likelihood for the saturated model.

Code
loglike_matrix <- ifelse(data_mat > 0, data_mat * log(probs), 0)
loglike_matrix
        1Normal    2Border   3Abnorm
[1,] -33.035958  -84.22002 -31.28431
[2,] -21.723390  -47.49812 -12.81608
[3,] -54.742955 -127.72620 -46.28143
[4,]  -3.951998  -14.95068   0.00000
[5,] -20.482710  -36.14915 -22.20459
[6,] -64.663559  -90.00704 -66.67256
Code
loglike_saturated <- sum(loglike_matrix)
loglike_saturated
[1] -778.4107

The above computation is under the ungrouped view of the multinomial data as in Appendix A2. The log-likelihood is then: \[ \sum_{i = 1}^6 \sum_{j = 1}^3 \Bigl(\log(p_{ij}) \tilde{y}_{ij} \Bigr), \] where \(\tilde{y}_{ij}\) (the entries in data_mat) is the count of workers in response category \(j\) (one of “Normal”, “Border” or “Abnormal”) within covariate group \(i\) (defined by “Age” and “Smoking”), and we treat \(0 \log(0) = 0\).

We can recover the deviance reported in Table 3.3 of Fahrmeir and Tutz (2001) as twice the difference between the saturated-model log-likelihood (just computed) and the fitted-model log-likelihood; since the reported “deviance” is \(-2\) times the latter, this amounts to adding twice the saturated log-likelihood to it.

Code
residual_dev_logit <- 2 * loglike_saturated + deviance(btr_logit_inter_eff)
residual_dev_logit
[1] 8.146686

This confirms that the so-called deviance component of a polr object is just \(-2\) times the log-likelihood of the fitted model (under the ungrouped view in Appendix A2). This is very different from what we usually consider the deviance.
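We can also confirm this relationship with the logLik() generic. The following is a minimal sketch; as before, the btr data frame is a hypothetical reconstruction from the counts shown earlier:

```r
library(MASS)

# Rebuild the BTR frequency table from the counts shown earlier
btr <- data.frame(
  BTR = factor(rep(c("1Normal", "2Border", "3Abnorm"), 6), ordered = TRUE),
  Age = factor(rep(c("<40", "40to59"), each = 9), levels = c("<40", "40to59")),
  Smoking = factor(rep(rep(c("1Never", "2Former", "3Current"), each = 3), 2)),
  Freq = c(577, 27, 7, 192, 20, 3, 682, 46, 11,
           164, 4, 0, 145, 15, 7, 245, 47, 27)
)
fit <- polr(BTR ~ Age * Smoking, data = btr, weights = Freq, Hess = TRUE)

# polr's "Residual Deviance" is just -2 * log-likelihood of the fitted model
all.equal(deviance(fit), -2 * as.numeric(logLik(fit)))
```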

Other Tricks and Commands

Residuals

The MASS::polr() function does not readily provide residuals.

Code
residuals(btr_logit_inter_eff)
NULL

Fitted Values and Predictions

Fitted probabilities can be extracted directly or using the generic predict() function:

Code
head(fitted(btr_logit_inter_eff))
    1Normal    2Border    3Abnorm
1 0.9444565 0.04225911 0.01328436
2 0.9444565 0.04225911 0.01328436
3 0.9444565 0.04225911 0.01328436
4 0.8943660 0.07930714 0.02632687
5 0.8943660 0.07930714 0.02632687
6 0.8943660 0.07930714 0.02632687
Code
head(predict(btr_logit_inter_eff, type = "probs"))
    1Normal    2Border    3Abnorm
1 0.9444565 0.04225911 0.01328436
2 0.9444565 0.04225911 0.01328436
3 0.9444565 0.04225911 0.01328436
4 0.8943660 0.07930714 0.02632687
5 0.8943660 0.07930714 0.02632687
6 0.8943660 0.07930714 0.02632687

Both approaches yield identical results.

Model Selection with AIC

The step() function performs automatic model selection using the AIC criterion (not directly covered in this course unless time allows).

Code
step(btr_logit_inter_eff)
Start:  AIC=1578.97
BTR ~ Age + Smoking + Age:Smoking

              Df    AIC
<none>           1579.0
- Age:Smoking  2 1599.7
Call:
polr(formula = BTR ~ Age + Smoking + Age:Smoking, data = btr_dat, 
    weights = Freq, Hess = TRUE, method = "logistic")

Coefficients:
         Age1      Smoking1      Smoking2 Age1:Smoking1 Age1:Smoking2 
  -0.11486771   -0.90596380    0.36428448    0.55777234   -0.01516692 

Intercepts:
1Normal|2Border 2Border|3Abnorm 
       2.370384        3.844735 

Residual Deviance: 1564.968 
AIC: 1578.968 

Variance-Covariance Matrix and Standard Errors

We can extract the estimated variance-covariance matrix of the parameter estimates using vcov(). This should roughly line up with the inverse of the Fisher information matrix.

Code
logit_vcov <- vcov(btr_logit_inter_eff)
logit_vcov
                        Age1    Smoking1     Smoking2 Age1:Smoking1
Age1             0.011787741 -0.01243168  0.005998232    0.01214182
Smoking1        -0.012431684  0.03572298 -0.020537642   -0.02505560
Smoking2         0.005998232 -0.02053764  0.020179329    0.01274490
Age1:Smoking1    0.012141822 -0.02505560  0.012744904    0.03571939
Age1:Smoking2   -0.003394051  0.01274270 -0.006617613   -0.02053578
1Normal|2Border  0.006312767 -0.01213435  0.003392219    0.01242837
2Border|3Abnorm  0.006164206 -0.01244460  0.003424437    0.01263740
                Age1:Smoking2 1Normal|2Border 2Border|3Abnorm
Age1             -0.003394051     0.006312767     0.006164206
Smoking1          0.012742700    -0.012134352    -0.012444602
Smoking2         -0.006617613     0.003392219     0.003424437
Age1:Smoking1    -0.020535784     0.012428367     0.012637403
Age1:Smoking2     0.020179722    -0.006001108    -0.005918600
1Normal|2Border  -0.006001108     0.011800241     0.011347044
2Border|3Abnorm  -0.005918600     0.011347044     0.025559519

From this we can match the standard errors reported in the summary output, since they are the square root of the diagonal elements of the variance-covariance matrix:

Code
btr_logit_inter_eff_coefs <- coef(btr_logit_inter_eff_summary)
btr_logit_inter_eff_coefs
                      Value Std. Error    t value
Age1            -0.11486771  0.1085714 -1.0579927
Smoking1        -0.90596380  0.1890052 -4.7933268
Smoking2         0.36428448  0.1420540  2.5644091
Age1:Smoking1    0.55777234  0.1889957  2.9512429
Age1:Smoking2   -0.01516692  0.1420553 -0.1067677
1Normal|2Border  2.37038447  0.1086289 21.8209350
2Border|3Abnorm  3.84473524  0.1598734 24.0486169
Code
se_from_vcov <- sqrt(diag(logit_vcov))
se_from_summary <- btr_logit_inter_eff_coefs[, "Std. Error"]

se_from_vcov == se_from_summary
           Age1        Smoking1        Smoking2   Age1:Smoking1   Age1:Smoking2 
           TRUE            TRUE            TRUE            TRUE            TRUE 
1Normal|2Border 2Border|3Abnorm 
           TRUE            TRUE 
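
Exact equality with == holds here only because both sets of numbers trace back to the same internal computation; in general, comparing floating-point results with == is fragile, and all.equal() (which allows a small tolerance) is the safer habit. A generic illustration, unrelated to the model above:

```r
# Floating-point round-off makes exact comparison fail
sqrt(2)^2 == 2                   # FALSE
# all.equal() compares up to a small numerical tolerance
isTRUE(all.equal(sqrt(2)^2, 2))  # TRUE
```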

We should, in principle, be able to construct confidence intervals from the variance-covariance matrix and a normal approximation (Wald intervals).

We can check the covariance matrix against the inverse of the Hessian matrix (observed Fisher information) returned by MASS::polr(); note that we can refer to it as btr_logit_inter_eff$Hess instead of btr_logit_inter_eff$Hessian because of partial matching of list component names in R:

Code
hessian_inv <- solve(btr_logit_inter_eff$Hess)
hessian_inv
                         Age1     Smoking1      Smoking2 Age1:Smoking1
Age1             0.0117877412 -0.012431684  5.998232e-03  0.0121418216
Smoking1        -0.0124316842  0.035722975 -2.053764e-02 -0.0250556010
Smoking2         0.0059982315 -0.020537642  2.017933e-02  0.0127449043
Age1:Smoking1    0.0121418216 -0.025055601  1.274490e-02  0.0357193912
Age1:Smoking2   -0.0033940507  0.012742700 -6.617613e-03 -0.0205357845
1Normal|2Border  0.0063127668 -0.012134352  3.392219e-03  0.0124283674
2Border|3Abnorm -0.0001007637 -0.000210432  2.185198e-05  0.0001417814
                Age1:Smoking2 1Normal|2Border 2Border|3Abnorm
Age1            -3.394051e-03    0.0063127668   -1.007637e-04
Smoking1         1.274270e-02   -0.0121343518   -2.104320e-04
Smoking2        -6.617613e-03    0.0033922194    2.185198e-05
Age1:Smoking1   -2.053578e-02    0.0124283674    1.417814e-04
Age1:Smoking2    2.017972e-02   -0.0060011075    5.596186e-05
1Normal|2Border -6.001108e-03    0.0118002410   -3.073874e-04
2Border|3Abnorm  5.596186e-05   -0.0003073874    6.746838e-03

We can see that the two matrices hessian_inv and logit_vcov differ slightly in some of the smaller entries. (The exact reason isn’t known to us; it is most likely numerical.)

Code
all.equal(hessian_inv, logit_vcov)
[1] "Mean relative difference: 0.2576063"

Confidence Intervals

R provides the generic method confint() to compute confidence intervals for model parameters.

Code
confint(btr_logit_inter_eff) |> round(3)
Waiting for profiling to be done...
               2.5 % 97.5 %
Age1          -0.315  0.118
Smoking1      -1.328 -0.570
Smoking2       0.088  0.651
Age1:Smoking1  0.221  0.980
Age1:Smoking2 -0.305  0.257

We can attempt to recover the same confidence intervals using a Wald approximation based on the coefficient estimates and their standard errors:

Code
# Manual Wald CIs (lower and upper limits)
coeff_est <- btr_logit_inter_eff_coefs[, 1]
wald_upper <- coeff_est + qnorm(0.975) * sqrt(diag(vcov(btr_logit_inter_eff)))
wald_lower <- coeff_est + qnorm(0.025) * sqrt(diag(vcov(btr_logit_inter_eff)))
tibble(wald_lower, wald_upper)
# A tibble: 7 × 2
  wald_lower wald_upper
       <dbl>      <dbl>
1    -0.328      0.0979
2    -1.28      -0.536 
3     0.0859     0.643 
4     0.187      0.928 
5    -0.294      0.263 
6     2.16       2.58  
7     3.53       4.16  

These Wald confidence intervals are different from those computed by confint(); in particular, the Wald ones are symmetric around the parameter estimates, but those by confint() aren’t.

This is because the confint() function uses profile likelihood confidence intervals, constructed by inverting likelihood ratio tests, which are often more suitable for non-linear regression problems. See Agresti (2013) section 1.3.4 for constructing CIs by inverting tests, and Agresti (2013) section 3.2.6 for constructing profile likelihood confidence intervals.

Run profile(btr_logit_inter_eff) and see Venables and Ripley (2002), p. 220 for more details on the profile() function.

Package Citations and Session Information

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.7.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] tidyr_1.3.2      readr_2.1.6      purrr_1.2.1      MASS_7.3-61     
[5] dplyr_1.2.0      conflicted_1.2.0

loaded via a namespace (and not attached):
 [1] crayon_1.5.3      vctrs_0.7.1       cli_3.6.5         knitr_1.51       
 [5] rlang_1.1.7       xfun_0.56         renv_1.1.7        generics_0.1.4   
 [9] jsonlite_2.0.0    bit_4.6.0         glue_1.8.0        htmltools_0.5.9  
[13] hms_1.1.4         rmarkdown_2.30    evaluate_1.0.5    tibble_3.3.1     
[17] tzdb_0.5.0        fastmap_1.2.0     yaml_2.3.12       lifecycle_1.0.5  
[21] memoise_2.0.1     compiler_4.4.2    pkgconfig_2.0.3   rstudioapi_0.18.0
[25] digest_0.6.39     R6_2.6.1          utf8_1.2.6        tidyselect_1.2.1 
[29] parallel_4.4.2    vroom_1.7.0       pillar_1.11.1     magrittr_2.0.4   
[33] withr_3.0.2       bit64_4.6.0-1     tools_4.4.2       cachem_1.1.0     
Code
packages <- c("conflicted", "dplyr", "MASS", "purrr", "readr", "tidyr")
for (pkg in packages) {
  cit <- citation(pkg)

  # Check if multiple citations
  if (length(cit) > 1) {
    cat("- **", pkg, "**:\n", sep = "")
    for (i in seq_along(cit)) {
      cit_text <- format(cit[i], style = "text")
      cat("    - ", cit_text, "\n\n", sep = "")
    }
  } else {
    cit_text <- format(cit, style = "text")
    cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
  }
}
  • conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.

  • dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.

  • MASS: Venables WN, Ripley BD (2002). Modern Applied Statistics with S, Fourth edition. Springer, New York. ISBN 0-387-95457-0, https://www.stats.ox.ac.uk/pub/MASS4/.

  • purrr: Wickham H, Henry L (2026). purrr: Functional Programming Tools. R package version 1.2.1, https://github.com/tidyverse/purrr, https://purrr.tidyverse.org/.

  • readr: Wickham H, Hester J, Bryan J (2025). readr: Read Rectangular Text Data. R package version 2.1.6, https://github.com/tidyverse/readr, https://readr.tidyverse.org.

  • tidyr: Wickham H, Vaughan D, Girlich M (2025). tidyr: Tidy Messy Data. R package version 1.3.2, https://github.com/tidyverse/tidyr, https://tidyr.tidyverse.org.

References

Agresti, Alan. 2013. Categorical Data Analysis. Third. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ.
Chambers, John M., and Trevor J. Hastie. 1992. Statistical Models in s. Wadsworth & Brooks/Cole.
Fahrmeir, Ludwig, and Gerhard Tutz. 2001. Multivariate Statistical Modelling Based on Generalized Linear Models. Second Edition. Springer Series in Statistics. Springer. https://doi.org/10.1007/978-1-4757-3454-6.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. Springer. https://www.stats.ox.ac.uk/pub/MASS4/.