library(conflicted)
library(dplyr)
library(MASS)
library(purrr)
library(readr)
library(tidyr)

Fahrmeir and Tutz (2001) Examples 3.2 & 3.4
April 23, 2026
We load the BTR dataset. It contains breathing test results with 3 levels: normal, borderline, and abnormal (which we can consider as ordered). We want to analyze the effects of the covariates age and smoking status.
Rows: 18 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): BTR, Age, Smoking
dbl (1): Freq
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
As it is, we can see that the variables BTR, Age, and Smoking are all read as characters. This is not what we want, and we will use the col_types argument to specify that we want them as factors.
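A sketch of the re-read with explicit column types (the file name "btr.csv" is a placeholder; substitute the actual path to the data file):

```r
# Re-read the data, coercing the three character columns to factors.
# "btr.csv" is a placeholder file name, not the actual path.
btr_dat <- read_csv(
  "btr.csv",
  col_types = cols(
    BTR = col_factor(),
    Age = col_factor(),
    Smoking = col_factor(),
    Freq = col_double()
  )
)
btr_dat
```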
# A tibble: 18 × 4
BTR Age Smoking Freq
<fct> <fct> <fct> <dbl>
1 1Normal <40 1Never 577
2 2Border <40 1Never 27
3 3Abnorm <40 1Never 7
4 1Normal <40 2Former 192
5 2Border <40 2Former 20
6 3Abnorm <40 2Former 3
7 1Normal <40 3Current 682
8 2Border <40 3Current 46
9 3Abnorm <40 3Current 11
10 1Normal 40to59 1Never 164
11 2Border 40to59 1Never 4
12 3Abnorm 40to59 1Never 0
13 1Normal 40to59 2Former 145
14 2Border 40to59 2Former 15
15 3Abnorm 40to59 2Former 7
16 1Normal 40to59 3Current 245
17 2Border 40to59 3Current 47
18 3Abnorm 40to59 3Current 27
BTR, Age, and Smoking are all factors now. We next check if the breathing test result (BTR) is encoded as an ordered factor, and if not, we convert it to an ordered factor:
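One way to perform the check-and-convert (a sketch; the data frame name btr_dat matches the later code):

```r
# Convert BTR to an ordered factor if it is not one already; the
# default level order 1Normal < 2Border < 3Abnorm is what we want
if (!is.ordered(btr_dat$BTR)) {
  btr_dat$BTR <- factor(btr_dat$BTR, ordered = TRUE)
}
class(btr_dat$BTR)
btr_dat$BTR
```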
We can now confirm the conversion was successful.
[1] "ordered" "factor"
[1] 1Normal 2Border 3Abnorm 1Normal 2Border 3Abnorm 1Normal 2Border 3Abnorm
[10] 1Normal 2Border 3Abnorm 1Normal 2Border 3Abnorm 1Normal 2Border 3Abnorm
Levels: 1Normal < 2Border < 3Abnorm
The default ordering happens to be good for us, but if we needed to change the order, we could specify it with factor(..., levels = c(...), ordered = TRUE).
We now examine the current contrast coding scheme:
"contr.treatment" corresponds to dummy coding for unordered factors. For ordered factors, the default is "contr.poly" (orthogonal polynomials); we won’t cover polynomial contrasts in this subject, but see p.33 of Chambers and Hastie (1992) for more details. We treat Age and Smoking as unordered factors, and we will use dummy coding for them.
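The global contrast settings and the per-factor contrast matrices can be inspected as follows (a sketch):

```r
# Global defaults: treatment coding for unordered factors,
# polynomial coding for ordered factors
options("contrasts")
# Dummy-coding matrices used for the two unordered covariates
contrasts(btr_dat$Smoking)
contrasts(btr_dat$Age)
```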
2Former 3Current
1Never 0 0
2Former 1 0
3Current 0 1
40to59
<40 0
40to59 1
The default reference categories are 1Never for Smoking and <40 for Age.
The MASS::polr() function fits cumulative logit (and other) models. Look at the documentation via ?polr, particularly how the linear predictor \(\eta\) is defined there: the signs of the beta coefficients in MASS::polr() are opposite to the corresponding coefficients in the textbook's (Fahrmeir and Tutz (2001)) formula.
We first fit a cumulative logistic (proportional odds) model with an intercept, age, smoking. We also include an interaction effect between age and smoking.
MASS::polr() Function Syntax

- formula: Model structure response ~ predictors; the response must be an ordered factor
- data: Data frame containing the variables
- weights: Case weights (here, frequency counts for each covariate combination)
- Hess = TRUE: Return the Hessian matrix; needed for summary() and confidence intervals
- method: Link function, one of "logistic" (default, proportional odds), "probit", "cloglog", "loglog", or "cauchit"

The MASS::polr() function is limited. It can fit standard cumulative models (see section 3.3.1 of Fahrmeir and Tutz (2001)), but does not allow for extended cumulative models with threshold variables (see section 3.3.2 of Fahrmeir and Tutz (2001)).
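The fit producing the output below (the object name btr_logit_inter matches the later verification code):

```r
# Cumulative logit (proportional odds) model with
# main effects and an Age x Smoking interaction
btr_logit_inter <- polr(
  BTR ~ Age * Smoking,
  data = btr_dat,
  weights = Freq,
  Hess = TRUE,
  method = "logistic"
)
summary(btr_logit_inter)
```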
Call:
polr(formula = BTR ~ Age * Smoking, data = btr_dat, weights = Freq,
Hess = TRUE, method = "logistic")
Coefficients:
Value Std. Error t value
Age40to59 -0.8857 0.5359 -1.653
Smoking2Former 0.6973 0.2822 2.471
Smoking3Current 0.3472 0.2238 1.551
Age40to59:Smoking2Former 1.1458 0.6228 1.840
Age40to59:Smoking3Current 2.2007 0.5689 3.868
Intercepts:
Value Std. Error t value
1Normal|2Border 2.8334 0.1764 16.0603
2Border|3Abnorm 4.3078 0.2130 20.2210
Residual Deviance: 1564.968
AIC: 1578.968
The fitted model has cumulative logit link of the form \[\begin{multline*} \log\frac{P(\texttt{BTR}=\texttt{Normal})}{P(\texttt{BTR}>\texttt{Normal})} = \theta_1 + \gamma_1\texttt{Age40to59}+\gamma_2\texttt{Smoking2Former}+\gamma_3\texttt{Smoking3Current} +\\ \gamma_4\texttt{Age40to59}\cdot \texttt{Smoking2Former} + \gamma_5\texttt{Age40to59}\cdot \texttt{Smoking3Current} \end{multline*}\] and \[\begin{multline*} \log\frac{P(\texttt{BTR} \leq \texttt{Borderline})}{P(\texttt{BTR}>\texttt{Borderline})} = \theta_2 + \gamma_1\texttt{Age40to59}+\gamma_2\texttt{Smoking2Former}+\gamma_3\texttt{Smoking3Current} +\\ \gamma_4\texttt{Age40to59}\cdot \texttt{Smoking2Former} + \gamma_5\texttt{Age40to59}\cdot \texttt{Smoking3Current}, \end{multline*}\] where the MLE of the parameters are
\[ \hat{\theta}_1=2.8334, \hat{\theta}_2=4.3078, \hat{\gamma}_1=0.8857, \hat{\gamma}_2=-0.6973 , \hat{\gamma}_3=-0.3472 , \hat{\gamma}_4=-1.1458 , \hat{\gamma}_5=-2.2007. \]
Given that a worker is in the 40–59 age group, the cumulative odds of having a normal test result for a former smoker is \(\exp(-0.6973 - 1.1458)= 0.158\) times that for a person who has never smoked.
As it is a “proportional odds” model, the same is true for the cumulative odds of having “\(\leq\) Borderline” testing result.
There is no residual readily available, but we can observe the linear predictor \(\eta\) as defined in the documentation page for the MASS::polr() function without the intercept.
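The linear predictor is stored in the lp component of the fitted object:

```r
# Linear predictor eta (excluding the intercepts), one value per data row
btr_logit_inter$lp
```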
1 2 3 4 5 6 7
0.0000000 0.0000000 0.0000000 0.6972823 0.6972823 0.6972823 0.3472245
8 9 10 11 12 13 14
0.3472245 0.3472245 -0.8857462 -0.8857462 -0.8857462 0.9573393 0.9573393
15 16 17 18
0.9573393 1.6621466 1.6621466 1.6621466
The \(\zeta\)’s are the threshold parameters \(\theta\)’s as defined in our textbook (Fahrmeir and Tutz (2001)) and slides.
We verify these linear predictors by computing them by hand.
Z <- cbind(
(btr_dat$Age == "40to59"),
(btr_dat$Smoking == "2Former"),
(btr_dat$Smoking == "3Current"),
(btr_dat$Age == "40to59") * (btr_dat$Smoking == "2Former"),
(btr_dat$Age == "40to59") * (btr_dat$Smoking == "3Current")
)
tibble(
manual = c(Z %*% coef(btr_logit_inter)),
polr = btr_logit_inter$lp
)
# A tibble: 18 × 2
manual polr
<dbl> <dbl>
1 0 0
2 0 0
3 0 0
4 0.697 0.697
5 0.697 0.697
6 0.697 0.697
7 0.347 0.347
8 0.347 0.347
9 0.347 0.347
10 -0.886 -0.886
11 -0.886 -0.886
12 -0.886 -0.886
13 0.957 0.957
14 0.957 0.957
15 0.957 0.957
16 1.66 1.66
17 1.66 1.66
18 1.66 1.66
We see that they line up.
We fit a main-effects-only model without interaction.
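A sketch of the fit (the object name btr_logit_main is an assumption):

```r
# Main-effects-only cumulative logit model
btr_logit_main <- polr(
  BTR ~ Age + Smoking,
  data = btr_dat,
  weights = Freq,
  Hess = TRUE,
  method = "logistic"
)
summary(btr_logit_main)
```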
Call:
polr(formula = BTR ~ Age + Smoking, data = btr_dat, weights = Freq,
Hess = TRUE, method = "logistic")
Coefficients:
Value Std. Error t value
Age40to59 0.7772 0.1482 5.243
Smoking2Former 0.7815 0.2333 3.350
Smoking3Current 0.9607 0.1918 5.010
Intercepts:
Value Std. Error t value
1Normal|2Border 3.1927 0.1749 18.2544
2Border|3Abnorm 4.6543 0.2125 21.9076
Residual Deviance: 1589.744
AIC: 1599.744
Again the signs of the covariate coefficients are the opposite to our model defined in lectures. Since the main effects model is nested within the interaction model, we can compare them with a likelihood ratio test.
The test has 2 degrees of freedom since we are adding two interaction terms (Age40to59:Smoking2Former and Age40to59:Smoking3Current).
We can use anova():
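Assuming the two fits are stored as btr_logit_main and btr_logit_inter (the former name is an assumption), the test is:

```r
# Likelihood ratio test: main-effects model vs interaction model
anova(btr_logit_main, btr_logit_inter)
```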
Likelihood ratio tests of ordinal regression models
Response: BTR
Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
1 Age + Smoking 2214 1589.744
2 Age * Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168619e-06
We reject the null hypothesis that there is no interaction, at all reasonable significance levels.
The LR statistic in the anova() output is the difference between the deviances of the two models. Each residual deviance reported is \(-2 \times \log(\text{likelihood})\) for that model (more on this to follow).
The proportional odds assumption can be tested by fitting models with different link functions. The MASS::polr() function supports:
As discussed in lecture, these can be motivated by the so-called “threshold approach”, where the response is a categorized version of an underlying latent continuous variable. Figure 3.1 of Fahrmeir and Tutz (2001) displays the shapes of the error densities of the latent variable corresponding to these models.
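The fits producing the two summaries below (the object names btr_ph and btr_extm are assumptions):

```r
# Complementary log-log link (grouped proportional hazards model)
btr_ph <- polr(
  BTR ~ Age + Smoking + Age:Smoking,
  data = btr_dat, weights = Freq,
  Hess = TRUE, method = "cloglog"
)
summary(btr_ph)

# Log-log link (extreme maximal-value distribution)
btr_extm <- polr(
  BTR ~ Age + Smoking + Age:Smoking,
  data = btr_dat, weights = Freq,
  Hess = TRUE, method = "loglog"
)
summary(btr_extm)
```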
Call:
polr(formula = BTR ~ Age + Smoking + Age:Smoking, data = btr_dat,
weights = Freq, Hess = TRUE, method = "cloglog")
Coefficients:
Value Std. Error t value
Age40to59 -0.2865 0.14264 -2.008
Smoking2Former 0.2125 0.10054 2.114
Smoking3Current 0.1088 0.07301 1.490
Age40to59:Smoking2Former 0.4333 0.18941 2.288
Age40to59:Smoking3Current 0.8379 0.16357 5.122
Intercepts:
Value Std. Error t value
1Normal|2Border 1.0477 0.0558 18.7612
2Border|3Abnorm 1.5528 0.0629 24.6916
Residual Deviance: 1559.949
AIC: 1573.949
Call:
polr(formula = BTR ~ Age + Smoking + Age:Smoking, data = btr_dat,
weights = Freq, Hess = TRUE, method = "loglog")
Coefficients:
Value Std. Error t value
Age40to59 -0.8672 0.5286 -1.640
Smoking2Former 0.6755 0.2700 2.501
Smoking3Current 0.3368 0.2167 1.554
Age40to59:Smoking2Former 1.1003 0.6070 1.813
Age40to59:Smoking3Current 2.0724 0.5573 3.719
Intercepts:
Value Std. Error t value
1Normal|2Border 2.8614 0.1715 16.6838
2Border|3Abnorm 4.2761 0.2080 20.5605
Residual Deviance: 1566.335
AIC: 1580.335
By default, R uses dummy coding. To match to the numerical results in Table 3.3 of Fahrmeir and Tutz (2001) Example 3.4, we need to use effect coding. We switch the global contrast option and refit.
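Effect coding corresponds to sum-to-zero contrasts ("contr.sum") in R:

```r
# Switch unordered factors to effect (sum-to-zero) coding;
# keep polynomial contrasts for ordered factors
options(contrasts = c("contr.sum", "contr.poly"))
contrasts(btr_dat$Smoking)
contrasts(btr_dat$Age)
```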
[,1] [,2]
1Never 1 0
2Former 0 1
3Current -1 -1
[,1]
<40 1
40to59 -1
We now have the last categories coded as -1, as also done in the textbook. Refitting both models:
btr_logit_inter_eff <- polr(
BTR ~ Age + Smoking + Age:Smoking,
data = btr_dat,
weights = Freq,
Hess = TRUE,
method = "logistic"
)
btr_logit_main_eff <- polr(
BTR ~ Age + Smoking,
data = btr_dat,
weights = Freq,
Hess = TRUE,
method = "logistic"
)
btr_logit_inter_eff_summary <- summary(btr_logit_inter_eff)
btr_logit_inter_eff_summary
Call:
polr(formula = BTR ~ Age + Smoking + Age:Smoking, data = btr_dat,
weights = Freq, Hess = TRUE, method = "logistic")
Coefficients:
Value Std. Error t value
Age1 -0.11487 0.1086 -1.0580
Smoking1 -0.90596 0.1890 -4.7933
Smoking2 0.36428 0.1421 2.5644
Age1:Smoking1 0.55777 0.1890 2.9512
Age1:Smoking2 -0.01517 0.1421 -0.1068
Intercepts:
Value Std. Error t value
1Normal|2Border 2.3704 0.1086 21.8209
2Border|3Abnorm 3.8447 0.1599 24.0486
Residual Deviance: 1564.968
AIC: 1578.968
Call:
polr(formula = BTR ~ Age + Smoking, data = btr_dat, weights = Freq,
Hess = TRUE, method = "logistic")
Coefficients:
Value Std. Error t value
Age1 -0.3886 0.07412 -5.243
Smoking1 -0.5807 0.12807 -4.535
Smoking2 0.2008 0.12549 1.600
Intercepts:
Value Std. Error t value
1Normal|2Border 2.2234 0.0835 26.6418
2Border|3Abnorm 3.6850 0.1434 25.6938
Residual Deviance: 1589.744
AIC: 1599.744
The values reported in the summaries now match the textbook (up to rounding errors and sign change of the covariate coefficients). In addition, we can redo the likelihood ratio test with the new models to confirm we get the same result.
Likelihood ratio tests of ordinal regression models
Response: BTR
Model Resid. df Resid. Dev Test Df LR stat.
1 Age + Smoking 2214 1589.744
2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585
Pr(Chi)
1
2 4.168618e-06
As expected, the likelihood ratio test results are the same as before, because using effect coding, as opposed to dummy coding, amounts only to a reparametrization of the model.
In each of the fitted model summaries, a “Residual Deviance” value is reported. We need to be careful about what this means exactly; as usual, it can be highly package-dependent.
The deviance component (reported as “Residual Deviance” by the summary() function) contained in a polr object is \(-2 \times \log(\text{likelihood})\) for the fitted model. We can manually compute the (scaled) deviance as defined in class, i.e. twice the difference between the maximum log-likelihoods of the saturated model and the GLM.
We aim to compute the maximum log-likelihood of the saturated model. We first create a matrix with each row corresponding to a unique covariate combination, with three columns counting the frequencies of each response category for that covariate combination. This can be accomplished elegantly by the function pivot_wider() in the tidyr package.
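A sketch of the reshaping (data_mat matches the later code; btr_wide is an assumed intermediate name):

```r
# One row per covariate combination, one column per response category
btr_wide <- btr_dat |>
  pivot_wider(names_from = BTR, values_from = Freq)
btr_wide

# Keep only the count columns as a numeric matrix
data_mat <- as.matrix(btr_wide[, c("1Normal", "2Border", "3Abnorm")])
data_mat
```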
# A tibble: 6 × 5
Age Smoking `1Normal` `2Border` `3Abnorm`
<fct> <fct> <dbl> <dbl> <dbl>
1 <40 1Never 577 27 7
2 <40 2Former 192 20 3
3 <40 3Current 682 46 11
4 40to59 1Never 164 4 0
5 40to59 2Former 145 15 7
6 40to59 3Current 245 47 27
1Normal 2Border 3Abnorm
[1,] 577 27 7
[2,] 192 20 3
[3,] 682 46 11
[4,] 164 4 0
[5,] 145 15 7
[6,] 245 47 27
We can also represent the data as a matrix of empirical proportions (dividing each row by its row sum).
1Normal 2Border 3Abnorm
[1,] 0.9443535 0.04418985 0.01145663
[2,] 0.8930233 0.09302326 0.01395349
[3,] 0.9228687 0.06224628 0.01488498
[4,] 0.9761905 0.02380952 0.00000000
[5,] 0.8682635 0.08982036 0.04191617
[6,] 0.7680251 0.14733542 0.08463950
This is also the matrix of maximum likelihood estimates under the saturated model (Why?). We can manually calculate the maximum log-likelihood of the saturated model using the multinomial distribution by first calculating the log-likelihood for each cell (treating \(0 \log(0) := 0\), as one of the cells in data_mat is a zero entry), and summing over all cells to get the total log-likelihood for the saturated model.
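A sketch of the cell-wise computation (loglike_saturated matches the later code; prop_mat is an assumed name for the proportion matrix above):

```r
# Empirical proportions = saturated-model MLEs
prop_mat <- data_mat / rowSums(data_mat)
# Cell-wise log-likelihood contributions, with the convention 0 * log(0) = 0
loglike_mat <- ifelse(data_mat == 0, 0, data_mat * log(prop_mat))
loglike_mat
# Maximum log-likelihood of the saturated model
loglike_saturated <- sum(loglike_mat)
```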
1Normal 2Border 3Abnorm
[1,] -33.035958 -84.22002 -31.28431
[2,] -21.723390 -47.49812 -12.81608
[3,] -54.742955 -127.72620 -46.28143
[4,] -3.951998 -14.95068 0.00000
[5,] -20.482710 -36.14915 -22.20459
[6,] -64.663559 -90.00704 -66.67256
The above computation is under the ungrouped view of the multinomial data as in Appendix A2. The log-likelihood is then: \[
\sum_{i = 1}^6 \sum_{j = 1}^3 \Bigl(\log(p_{ij}) \tilde{y}_{ij} \Bigr),
\] where \(\tilde{y}_{ij}\) (the entries in data_mat) is the count of workers in response category \(j\) (one of “Normal”, “Border” or “Abnormal”) within covariate group \(i\) (defined by “Age” and “Smoking”), and we treat \(0 \log(0) = 0\).
We can recover the deviance reported in Table 3.3 of Fahrmeir and Tutz (2001) by computing twice the difference between the reported deviance of the fitted model and the log-likelihood of the saturated model (just manually computed).
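In code (using the effect-coded fit; dummy coding gives the same deviance):

```r
# Textbook (scaled) deviance: 2 * (l_saturated - l_fitted).
# deviance() on a polr object returns -2 * l_fitted, so:
2 * loglike_saturated + deviance(btr_logit_inter_eff)
```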
[1] 8.146686
This confirms that the so-called deviance component of a polr object is actually just twice the negative log-likelihood of the fitted model (under the ungrouped view in Appendix A2). This is very different from what we usually call the deviance.
We also fit the models with complementary log-log and loglog link functions using effect coding, then summarize results in a comparison table similar to Table 3.3 of Fahrmeir and Tutz (2001), p.90:
The table is built in three steps:

- Extract each model's coefficient estimates with coef(summary(...))[, "Value"].
- Stack the coefficient rows and the deviance row with dplyr::bind_rows().
- Apply dplyr::mutate(across(where(is.numeric), ~ round(., 3))) to round all numeric columns to 3 decimal places for cleaner display.

rownames(coef(...)) supplies the parameter labels so each estimate aligns across link functions.
# Fit complementary log-log (proportional hazards)
btr_ph_eff <- polr(
BTR ~ Age + Smoking + Age:Smoking,
data = btr_dat,
weights = Freq,
Hess = TRUE,
method = "cloglog"
)
btr_ph_eff_summary <- summary(btr_ph_eff)
# Fit loglog (extreme maximal-value)
btr_extm_eff <- polr(
BTR ~ Age + Smoking + Age:Smoking,
data = btr_dat,
weights = Freq,
Hess = TRUE,
method = "loglog"
)
btr_extm_eff_summary <- summary(btr_extm_eff)
# Create a list of models for easy iteration
models <- list(
Logistic = btr_logit_inter_eff_summary,
`Complementary Log-Log` = btr_ph_eff_summary,
Loglog = btr_extm_eff_summary
)
# Extract variable names
coef_names <- rownames(coef(models[[1]]))
results_table <- bind_rows(
# Coefficients
tibble(
Parameter = coef_names,
`Cum Log` = coef(btr_logit_inter_eff_summary)[, "Value"],
`Prop Haz` = coef(btr_ph_eff_summary)[, "Value"],
`Ext Max Val` = coef(btr_extm_eff_summary)[, "Value"]
),
# Residual Deviance
tibble(
Parameter = "Deviance",
`Cum Log` = 2 * loglike_saturated + deviance(btr_logit_inter_eff),
`Prop Haz` = 2 * loglike_saturated + deviance(btr_ph_eff),
`Ext Max Val` = 2 * loglike_saturated + deviance(btr_extm_eff)
)
) |>
mutate(across(where(is.numeric), ~ round(., 3)))
results_table
# A tibble: 8 × 4
Parameter `Cum Log` `Prop Haz` `Ext Max Val`
<chr> <dbl> <dbl> <dbl>
1 Age1 -0.115 -0.069 -0.095
2 Smoking1 -0.906 -0.319 -0.866
3 Smoking2 0.364 0.11 0.359
4 Age1:Smoking1 0.558 0.212 0.529
5 Age1:Smoking2 -0.015 -0.005 -0.021
6 1Normal|2Border 2.37 0.872 2.43
7 2Border|3Abnorm 3.84 1.38 3.84
8 Deviance 8.15 3.13 9.51
This can be written in a nested/list-column style using functional programming. The idea is to store one row per link function, fit each model via purrr::map(), then map again to extract coefficients and residual deviance.
link_specs <- tibble(
label = c("Cum Log", "Prop Haz", "Ext Max Val"),
method = c("logistic", "cloglog", "loglog")
)
# Fit one model per row (list-column of models)
model_tbl <- link_specs |>
mutate(
model = map(
method,
~ polr(
BTR ~ Age + Smoking + Age:Smoking,
data = btr_dat,
weights = Freq,
Hess = TRUE,
method = .x
)
),
coef_value = map(model, ~ coef(summary(.x))[, "Value"]),
residual_dev = map_dbl(model, ~ 2 * loglike_saturated + deviance(.x))
)
# Expand mapped coefficients to long format
coef_long <- model_tbl |>
mutate(
Parameter = map(coef_value, names),
estimate = coef_value
) |>
dplyr::select(label, Parameter, estimate) |>
tidyr::unnest(cols = c(Parameter, estimate))
# Add residual deviance row in matching long format
dev_long <- model_tbl |>
transmute(
label,
Parameter = "Residual Deviance",
estimate = residual_dev
)
results_table_map <- bind_rows(coef_long, dev_long) |>
pivot_wider(
names_from = label,
values_from = estimate
) |>
mutate(across(where(is.numeric), ~ round(., 4)))
results_table_map
# A tibble: 8 × 4
Parameter `Cum Log` `Prop Haz` `Ext Max Val`
<chr> <dbl> <dbl> <dbl>
1 Age1 -0.115 -0.0686 -0.0952
2 Smoking1 -0.906 -0.319 -0.866
3 Smoking2 0.364 0.110 0.359
4 Age1:Smoking1 0.558 0.212 0.529
5 Age1:Smoking2 -0.0152 -0.0048 -0.0214
6 1Normal|2Border 2.37 0.872 2.43
7 2Border|3Abnorm 3.84 1.38 3.84
8 Residual Deviance 8.15 3.13 9.51
We have now matched Table 3.3 of Fahrmeir and Tutz (2001) (with reverse signs for the non-threshold coefficients).
It is highly recommended that you read through Example 3.4 in the textbook to get an idea of how to interpret the results. It is worth noting here that the proportional hazards (complementary log-log) model has the lowest deviance and hence the best fit.
The MASS::polr() function does not readily provide residuals.
Fitted probabilities can be extracted directly or using the generic predict() function:
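Both extraction routes, as a sketch:

```r
# Fitted category probabilities stored in the model object
head(btr_logit_inter_eff$fitted.values)
# The same probabilities via the generic predict()
head(predict(btr_logit_inter_eff, type = "probs"))
```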
1Normal 2Border 3Abnorm
1 0.9444565 0.04225911 0.01328436
2 0.9444565 0.04225911 0.01328436
3 0.9444565 0.04225911 0.01328436
4 0.8943660 0.07930714 0.02632687
5 0.8943660 0.07930714 0.02632687
6 0.8943660 0.07930714 0.02632687
1Normal 2Border 3Abnorm
1 0.9444565 0.04225911 0.01328436
2 0.9444565 0.04225911 0.01328436
3 0.9444565 0.04225911 0.01328436
4 0.8943660 0.07930714 0.02632687
5 0.8943660 0.07930714 0.02632687
6 0.8943660 0.07930714 0.02632687
Both approaches yield identical results.
The step() function performs automatic model selection using the AIC criterion. (not directly covered in this course unless time allows)
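Starting from the interaction model:

```r
# Backward selection by AIC; here only dropping the
# interaction is considered, and it is not dropped
step(btr_logit_inter_eff)
```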
Start: AIC=1578.97
BTR ~ Age + Smoking + Age:Smoking
Df AIC
<none> 1579.0
- Age:Smoking 2 1599.7
Call:
polr(formula = BTR ~ Age + Smoking + Age:Smoking, data = btr_dat,
weights = Freq, Hess = TRUE, method = "logistic")
Coefficients:
Age1 Smoking1 Smoking2 Age1:Smoking1 Age1:Smoking2
-0.11486771 -0.90596380 0.36428448 0.55777234 -0.01516692
Intercepts:
1Normal|2Border 2Border|3Abnorm
2.370384 3.844735
Residual Deviance: 1564.968
AIC: 1578.968
We can extract the estimated variance-covariance matrix of the parameter estimates using vcov(). This should roughly line up with the inverse of the Fisher information matrix.
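In code:

```r
# Estimated variance-covariance matrix of all parameter
# estimates (coefficients and thresholds)
vcov(btr_logit_inter_eff)
```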
Age1 Smoking1 Smoking2 Age1:Smoking1
Age1 0.011787741 -0.01243168 0.005998232 0.01214182
Smoking1 -0.012431684 0.03572298 -0.020537642 -0.02505560
Smoking2 0.005998232 -0.02053764 0.020179329 0.01274490
Age1:Smoking1 0.012141822 -0.02505560 0.012744904 0.03571939
Age1:Smoking2 -0.003394051 0.01274270 -0.006617613 -0.02053578
1Normal|2Border 0.006312767 -0.01213435 0.003392219 0.01242837
2Border|3Abnorm 0.006164206 -0.01244460 0.003424437 0.01263740
Age1:Smoking2 1Normal|2Border 2Border|3Abnorm
Age1 -0.003394051 0.006312767 0.006164206
Smoking1 0.012742700 -0.012134352 -0.012444602
Smoking2 -0.006617613 0.003392219 0.003424437
Age1:Smoking1 -0.020535784 0.012428367 0.012637403
Age1:Smoking2 0.020179722 -0.006001108 -0.005918600
1Normal|2Border -0.006001108 0.011800241 0.011347044
2Border|3Abnorm -0.005918600 0.011347044 0.025559519
From this we can match the standard errors reported in the summary output, since they are the square root of the diagonal elements of the variance-covariance matrix:
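A sketch of the comparison (the tolerance 1e-10 is an assumption):

```r
# Standard errors = square roots of the diagonal of the vcov matrix
se_manual <- sqrt(diag(vcov(btr_logit_inter_eff)))
coef(btr_logit_inter_eff_summary)
# Check agreement with the Std. Error column of the summary
abs(se_manual - coef(btr_logit_inter_eff_summary)[, "Std. Error"]) < 1e-10
```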
Value Std. Error t value
Age1 -0.11486771 0.1085714 -1.0579927
Smoking1 -0.90596380 0.1890052 -4.7933268
Smoking2 0.36428448 0.1420540 2.5644091
Age1:Smoking1 0.55777234 0.1889957 2.9512429
Age1:Smoking2 -0.01516692 0.1420553 -0.1067677
1Normal|2Border 2.37038447 0.1086289 21.8209350
2Border|3Abnorm 3.84473524 0.1598734 24.0486169
Age1 Smoking1 Smoking2 Age1:Smoking1 Age1:Smoking2
TRUE TRUE TRUE TRUE TRUE
1Normal|2Border 2Border|3Abnorm
TRUE TRUE
We should be able to recover the same confidence intervals by using the variance-covariance matrix and a normal approximation (Wald intervals).
We can check the covariance matrix against the inverse of the Hessian matrix (Fisher information) returned by MASS::polr(); note that I can simply refer to it as btr_logit_inter_eff$Hess instead of btr_logit_inter_eff$Hessian because of partial matching in R:
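In code (hessian_inv matches the name used below):

```r
# Invert the Hessian returned by polr; $Hess works in place
# of $Hessian because of partial matching in R
hessian_inv <- solve(btr_logit_inter_eff$Hess)
hessian_inv
```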
Age1 Smoking1 Smoking2 Age1:Smoking1
Age1 0.0117877412 -0.012431684 5.998232e-03 0.0121418216
Smoking1 -0.0124316842 0.035722975 -2.053764e-02 -0.0250556010
Smoking2 0.0059982315 -0.020537642 2.017933e-02 0.0127449043
Age1:Smoking1 0.0121418216 -0.025055601 1.274490e-02 0.0357193912
Age1:Smoking2 -0.0033940507 0.012742700 -6.617613e-03 -0.0205357845
1Normal|2Border 0.0063127668 -0.012134352 3.392219e-03 0.0124283674
2Border|3Abnorm -0.0001007637 -0.000210432 2.185198e-05 0.0001417814
Age1:Smoking2 1Normal|2Border 2Border|3Abnorm
Age1 -3.394051e-03 0.0063127668 -1.007637e-04
Smoking1 1.274270e-02 -0.0121343518 -2.104320e-04
Smoking2 -6.617613e-03 0.0033922194 2.185198e-05
Age1:Smoking1 -2.053578e-02 0.0124283674 1.417814e-04
Age1:Smoking2 2.017972e-02 -0.0060011075 5.596186e-05
1Normal|2Border -6.001108e-03 0.0118002410 -3.073874e-04
2Border|3Abnorm 5.596186e-05 -0.0003073874 6.746838e-03
Visually, the two matrices hessian_inv and logit_vcov show slight differences in some of the smaller entries. (The exact reason is not known to us yet; it is most likely numerical.)
R provides the generic method confint() to compute confidence intervals for model parameters.
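For the fitted model (rounding to 3 decimal places, matching the display below, is an assumption):

```r
# Profile likelihood confidence intervals for the coefficients
round(confint(btr_logit_inter_eff), 3)
```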
Waiting for profiling to be done...
2.5 % 97.5 %
Age1 -0.315 0.118
Smoking1 -1.328 -0.570
Smoking2 0.088 0.651
Age1:Smoking1 0.221 0.980
Age1:Smoking2 -0.305 0.257
We can attempt to recover the same confidence intervals using a Wald approximation based on the coefficient estimates and their standard errors:
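A sketch of the Wald construction:

```r
# Wald intervals: estimate +/- z_{0.975} * standard error
est <- coef(btr_logit_inter_eff_summary)[, "Value"]
se <- coef(btr_logit_inter_eff_summary)[, "Std. Error"]
tibble(
  wald_lower = est - qnorm(0.975) * se,
  wald_upper = est + qnorm(0.975) * se
)
```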
# A tibble: 7 × 2
wald_lower wald_upper
<dbl> <dbl>
1 -0.328 0.0979
2 -1.28 -0.536
3 0.0859 0.643
4 0.187 0.928
5 -0.294 0.263
6 2.16 2.58
7 3.53 4.16
These Wald confidence intervals are different from those computed by confint(); in particular, the Wald ones are symmetric around the parameter estimates, but those by confint() aren’t.
This is because the confint() function computes profile likelihood confidence intervals, constructed by inverting likelihood ratio tests, which is more suitable for some non-linear regression problems. See Agresti (2013) section 1.3.4 for constructing CIs by inverting tests, and Agresti (2013) section 3.2.6 for constructing profile likelihood confidence intervals.
Run profile(btr_logit_inter_eff) and see Venables and Ripley (2002), p. 220 for more details on the profile() function.
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.7.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Melbourne
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] tidyr_1.3.2 readr_2.1.6 purrr_1.2.1 MASS_7.3-61
[5] dplyr_1.2.0 conflicted_1.2.0
loaded via a namespace (and not attached):
[1] crayon_1.5.3 vctrs_0.7.1 cli_3.6.5 knitr_1.51
[5] rlang_1.1.7 xfun_0.56 renv_1.1.7 generics_0.1.4
[9] jsonlite_2.0.0 bit_4.6.0 glue_1.8.0 htmltools_0.5.9
[13] hms_1.1.4 rmarkdown_2.30 evaluate_1.0.5 tibble_3.3.1
[17] tzdb_0.5.0 fastmap_1.2.0 yaml_2.3.12 lifecycle_1.0.5
[21] memoise_2.0.1 compiler_4.4.2 pkgconfig_2.0.3 rstudioapi_0.18.0
[25] digest_0.6.39 R6_2.6.1 utf8_1.2.6 tidyselect_1.2.1
[29] parallel_4.4.2 vroom_1.7.0 pillar_1.11.1 magrittr_2.0.4
[33] withr_3.0.2 bit64_4.6.0-1 tools_4.4.2 cachem_1.1.0
packages <- c("conflicted", "dplyr", "MASS", "purrr", "readr", "tidyr")
for (pkg in packages) {
cit <- citation(pkg)
# Check if multiple citations
if (length(cit) > 1) {
cat("- **", pkg, "**:\n", sep = "")
for (i in seq_along(cit)) {
cit_text <- format(cit[i], style = "text")
cat(" - ", cit_text, "\n\n", sep = "")
}
} else {
cit_text <- format(cit, style = "text")
cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
}
}
conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.
dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.
MASS: Venables WN, Ripley BD (2002). Modern Applied Statistics with S, Fourth edition. Springer, New York. ISBN 0-387-95457-0, https://www.stats.ox.ac.uk/pub/MASS4/.
purrr: Wickham H, Henry L (2026). purrr: Functional Programming Tools. R package version 1.2.1, https://github.com/tidyverse/purrr, https://purrr.tidyverse.org/.
readr: Wickham H, Hester J, Bryan J (2025). readr: Read Rectangular Text Data. R package version 2.1.6, https://github.com/tidyverse/readr, https://readr.tidyverse.org.
tidyr: Wickham H, Vaughan D, Girlich M (2025). tidyr: Tidy Messy Data. R package version 1.3.2, https://github.com/tidyverse/tidyr, https://tidyr.tidyverse.org.