Fahrmeir and Tutz (2001) Example 3.12
April 15, 2026
We will fit a simple asymmetric model to a dataset on happiness levels from Table 3.10 on p.115 of Fahrmeir and Tutz (2001). We want to determine the association between gender, level of education, and happiness levels. We will do this in two parts: first by setting gender as a predictor for years of schooling, and then by setting years of schooling as a predictor for happiness levels.
The model is specified more formally in equation (3.5.3) of Fahrmeir and Tutz (2001).
As always, we start by loading the necessary packages (and installing any we don’t already have).
library(conflicted)
library(dplyr)
library(Fahrmeir)
library(tidyr)
library(VGAM)
Loading required package: stats4
Loading required package: splines
We load the happy dataset from the Fahrmeir package, which contains data on happiness levels. As usual, we will convert it to a tibble first.
# A tibble: 24 × 4
Rep.happiness School Sex n
<ord> <fct> <fct> <int>
1 Not to happy <12 Males 40
2 Not to happy 12 Males 21
3 Not to happy 13-16 Males 14
4 Not to happy >16 Males 3
5 Pretty happy <12 Males 131
6 Pretty happy 12 Males 116
7 Pretty happy 13-16 Males 112
8 Pretty happy >16 Males 27
9 Very happy <12 Males 82
10 Very happy 12 Males 61
# ℹ 14 more rows
We want to fit a cumulative logit model to the marginal distribution of schooling, predicted by gender. To do this, we first need to arrange the data into wide form.
# A tibble: 2 × 5
Sex school1 school2 school3 school4
<fct> <int> <int> <int> <int>
1 Males 253 198 181 57
2 Females 304 309 183 33
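The chunk that produced this table is not shown in the rendered output. A self-contained sketch of the reshaping step, with the Sex-by-School totals re-entered from the table above (in the real analysis they come from summing n in happy_tbl over the happiness levels), might look like this:

```r
library(dplyr)
library(tidyr)

# Sex-by-School totals, copied from the wide table above
school_counts <- tribble(
  ~Sex,      ~School, ~n,
  "Males",   "<12",   253,
  "Males",   "12",    198,
  "Males",   "13-16", 181,
  "Males",   ">16",    57,
  "Females", "<12",   304,
  "Females", "12",    309,
  "Females", "13-16", 183,
  "Females", ">16",    33
)

happy_wide <- school_counts |>
  # rename the school levels so they can serve as column names
  mutate(School = recode(School,
    "<12" = "school1", "12" = "school2",
    "13-16" = "school3", ">16" = "school4"
  )) |>
  pivot_wider(names_from = School, values_from = n, values_fill = 0)
```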
- dplyr::mutate() recodes factor levels so the column names match the model inputs.
- dplyr::summarize(..., .by = ...) aggregates counts within each group.
- tidyr::pivot_wider() spreads groups into columns and fills missing combinations with 0 (if there are any).
Effect coding is needed to reproduce the numbers in the book: contr.treatment is the default (corresponding to dummy coding), but we need contr.sum for effect coding:
        Males
Females     0
Males       1

        [,1]
Females    1
Males     -1
We now have effect coding; however, we need to switch the reference group to Females in order to match the book.
Now we can fit the model with school1 as the reference level.
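The fitting chunk itself is not shown in the rendered output. A self-contained sketch (with the wide counts re-entered from the table above, and Sex effect-coded so that Females is the reference group) is:

```r
library(VGAM)

# Wide-form counts as in the table shown earlier
happy_wide <- data.frame(
  Sex = factor(c("Males", "Females"), levels = c("Males", "Females")),
  school1 = c(253, 304), school2 = c(198, 309),
  school3 = c(181, 183), school4 = c(57, 33)
)
# Effect coding with Males = +1 and Females = -1
contrasts(happy_wide$Sex) <- contr.sum(2)

# Non-parallel cumulative logit model (the VGAM default, parallel = FALSE)
happy_logit1_vglm <- vglm(
  cbind(school1, school2, school3, school4) ~ Sex,
  family = cumulative(), data = happy_wide
)
summary(happy_logit1_vglm)
```

Standard errors of these estimates are the square roots of the diagonal entries of vcov(happy_logit1_vglm).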
Call:
vglm(formula = cbind(school1, school2, school3, school4) ~ Sex,
family = cumulative(), data = happy_wide)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 -0.545312 0.053480 -10.197 < 2e-16 ***
(Intercept):2 0.841142 0.056303 14.940 < 2e-16 ***
(Intercept):3 2.794465 0.112567 24.825 < 2e-16 ***
Sex1:1 0.001059 0.053480 0.020 0.984203
Sex1:2 -0.201945 0.056303 -3.587 0.000335 ***
Sex1:3 -0.388627 0.112567 -3.452 0.000556 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]),
logitlink(P[Y<=3])
Residual deviance: -1.225e-13 on 0 degrees of freedom
Log-likelihood: -19.0017 on 0 degrees of freedom
Number of Fisher scoring iterations: 4
Exponentiated coefficients:
Sex1:1 Sex1:2 Sex1:3
1.0010595 0.8171399 0.6779873
The coefficient values are the same as those reported in the first six rows of Table 3.11 on p.117 of Fahrmeir and Tutz (2001).
We can reproduce the standard errors (and hence the \(p\)-values) reported in the first six rows of Table 3.11, e.g. by taking the square roots of the diagonal entries of vcov(happy_logit1_vglm):
(Intercept):1 (Intercept):2 (Intercept):3 Sex1:1 Sex1:2
0.05347986 0.05630259 0.11256715 0.05347986 0.05630259
Sex1:3
0.11256715
However, we probably should treat these standard errors and \(p\)-values in Table 3.11 with a grain of salt. See the end for more explanation.
We can also access the linear predictors of the fitted model (predict() on a vglm object returns them by default).
Now we are interested in predicting happiness levels by years of schooling. We again arrange our data into a wide form with the happiness levels as columns.
happy2 <- happy_tbl |>
mutate(
School = factor(School, levels = c("<12", "12", "13-16", ">16")),
Rep.happiness = recode(
Rep.happiness,
"Not to happy" = "happy_low",
"Pretty happy" = "happy_medium",
"Very happy" = "happy_high"
)
) |>
summarize(n = sum(n), .by = c(Sex, School, Rep.happiness)) |>
pivot_wider(
names_from = Rep.happiness,
values_from = n,
values_fill = 0
)
happy2
# A tibble: 8 × 5
Sex School happy_low happy_medium happy_high
<fct> <fct> <int> <int> <int>
1 Males <12 40 131 82
2 Males 12 21 116 61
3 Males 13-16 14 112 55
4 Males >16 3 27 27
5 Females <12 62 155 87
6 Females 12 26 156 127
7 Females 13-16 12 95 76
8 Females >16 3 15 15
We check the coding scheme for School:
Here we actually want to switch from effect to dummy coding with <12 as reference level:
12 13-16 >16
<12 0 0 0
12 1 0 0
13-16 0 1 0
>16 0 0 1
<12 is the reference level.
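A sketch of how this coding can be set explicitly, using a standalone factor with the School levels (in the analysis this would be happy2$School):

```r
# Treatment (dummy) coding with "<12" as the reference level
school <- factor(c("<12", "12", "13-16", ">16"),
                 levels = c("<12", "12", "13-16", ">16"))
contrasts(school) <- contr.treatment(levels(school))
contrasts(school)
```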
We now fit the model.
This regression ignores the Sex column (which is what we want).
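The fitting chunk is again not shown. A self-contained sketch, with the counts re-entered from the happy2 table above (School uses default treatment coding with <12 as the reference level, and Sex is deliberately omitted from the formula):

```r
library(VGAM)

# Counts from happy2, males (rows 1-4) then females (rows 5-8)
happy2 <- data.frame(
  School = factor(rep(c("<12", "12", "13-16", ">16"), times = 2),
                  levels = c("<12", "12", "13-16", ">16")),
  happy_low    = c(40, 21, 14, 3, 62, 26, 12, 3),
  happy_medium = c(131, 116, 112, 27, 155, 156, 95, 15),
  happy_high   = c(82, 61, 55, 27, 87, 127, 76, 15)
)

# Cumulative logit model for happiness given schooling
happy_logit2_vglm <- vglm(
  cbind(happy_low, happy_medium, happy_high) ~ School,
  family = cumulative, data = happy2
)
summary(happy_logit2_vglm)
```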
Call:
vglm(formula = cbind(happy_low, happy_medium, happy_high) ~ School,
family = cumulative, data = happy2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 -1.49532 0.10955 -13.649 < 2e-16 ***
(Intercept):2 0.83111 0.09217 9.018 < 2e-16 ***
School12:1 -0.78575 0.18829 -4.173 3.0e-05 ***
School12:2 -0.30236 0.13019 -2.323 0.02021 *
School13-16:1 -1.06962 0.23113 -4.628 3.7e-06 ***
School13-16:2 -0.25527 0.14290 -1.786 0.07404 .
School>16:1 -1.14373 0.43655 -2.620 0.00879 **
School>16:2 -0.69758 0.23052 -3.026 0.00248 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2])
Residual deviance: 13.2761 on 8 degrees of freedom
Log-likelihood: -45.6358 on 8 degrees of freedom
Number of Fisher scoring iterations: 4
Exponentiated coefficients:
School12:1 School12:2 School13-16:1 School13-16:2 School>16:1
0.4557758 0.7390738 0.3431373 0.7747108 0.3186275
School>16:2
0.4977909
When we compare the coefficients here to those in Table 3.11, only the two intercept estimates (corresponding to the reference level <12 of School) match the numbers in the book. This is because under the dummy coding scheme we take fewer than 12 years of school as the reference level, whereas the book treats each schooling level's intercepts as their own absolute levels (see the second line of (3.5.3) in Fahrmeir and Tutz (2001)).
Therefore, to recover the original estimates, we add the intercepts to each pair of coefficients for the other schooling levels:
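As a sketch, with the coefficient values copied from coef(happy_logit2_vglm):

```r
# Coefficients of happy_logit2_vglm: two intercepts, then slope pairs
# for 12, 13-16, and >16 years of schooling
cf <- c(-1.4953246, 0.8311066,   # (Intercept):1, (Intercept):2
        -0.7857543, -0.3023575,  # School12:1, School12:2
        -1.0696248, -0.2552655,  # School13-16:1, School13-16:2
        -1.1437327, -0.6975752)  # School>16:1, School>16:2

cf[1:2] + cf[3:4]  # absolute thresholds for 12 years
cf[1:2] + cf[5:6]  # absolute thresholds for 13-16 years
cf[1:2] + cf[7:8]  # absolute thresholds for >= 17 years
```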
(Intercept):1 (Intercept):2 School12:1 School12:2 School13-16:1
-1.4953246 0.8311066 -0.7857543 -0.3023575 -1.0696248
School13-16:2 School>16:1 School>16:2
-0.2552655 -1.1437327 -0.6975752
(Intercept):1 (Intercept):2
   -2.2810789     0.5287491
(Intercept):1 (Intercept):2
   -2.5649494     0.5758411
(Intercept):1 (Intercept):2
   -2.6390573     0.1335314
These recover the values reported in the last six rows of Table 3.11 in Fahrmeir and Tutz (2001) for the other three schooling levels, i.e. 12 years, 13-16 years, and \(\geq\) 17 years.
We now attempt to reproduce the deviance.
To retrieve the likelihood of the full GLM, we sum the log-likelihoods resulting from the two cumulative models in the first and second parts of display \((3.5.3)\) in Fahrmeir and Tutz (2001).
Make sure you know why the log likelihood of the full GLM can be computed by simply summing the log likelihoods of the two components.
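For instance, using the (rounded) log-likelihood values reported by the two fits above:

```r
# Log-likelihoods of the two component fits, as reported in their summaries
ll1 <- -19.0017   # schooling given sex
ll2 <- -45.6358   # happiness given schooling
ll_fit <- ll1 + ll2
ll_fit  # -64.6375
```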
Alternatively we can extract the log-likelihood of each of the individual groups separately:
1 2
-9.523349 -9.478311
1 2 3 4 5 6 7 8
-6.192923 -6.887815 -6.194003 -3.790702 -6.302572 -6.621565 -6.122023 -3.524187
Now we just need to compute the log-likelihood of the saturated model. We split the dataset into a male group and female group. Within each group we have 12 observations corresponding to the 12 combinations of schooling and happiness levels (meaning we have 11 degrees of freedom for the male and female groups separately). The maximum likelihood estimators for each group are exactly the observed proportions for the different combinations of schooling and happiness levels within that group.
Therefore we can calculate the log-likelihood of the saturated model by summing the log-likelihoods as follows:
Rep.happiness School Sex n
1 Not to happy <12 Males 40
2 Not to happy 12 Males 21
3 Not to happy 13-16 Males 14
4 Not to happy >16 Males 3
5 Pretty happy <12 Males 131
6 Pretty happy 12 Males 116
7 Pretty happy 13-16 Males 112
8 Pretty happy >16 Males 27
9 Very happy <12 Males 82
10 Very happy 12 Males 61
11 Very happy 13-16 Males 55
12 Very happy >16 Males 27
Rep.happiness School Sex n
13 Not to happy <12 Females 62
14 Not to happy 12 Females 26
15 Not to happy 13-16 Females 12
16 Not to happy >16 Females 3
17 Pretty happy <12 Females 155
18 Pretty happy 12 Females 156
19 Pretty happy 13-16 Females 95
20 Pretty happy >16 Females 15
21 Very happy <12 Females 87
22 Very happy 12 Females 127
23 Very happy 13-16 Females 76
24 Very happy >16 Females 15
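The chunk producing the value below is not shown; a sketch of one way to compute it is to evaluate each group's multinomial log-likelihood at the observed proportions (the 12 cell counts per group are re-entered from the two tables above):

```r
# 12 School-by-happiness cell counts, in the row order printed above
n_males   <- c(40, 21, 14, 3, 131, 116, 112, 27, 82, 61, 55, 27)
n_females <- c(62, 26, 12, 3, 155, 156, 95, 15, 87, 127, 76, 15)

# The saturated MLE for each group is the vector of observed proportions
ll_sat <- dmultinom(n_males,   prob = n_males   / sum(n_males),   log = TRUE) +
          dmultinom(n_females, prob = n_females / sum(n_females), log = TRUE)
ll_sat
```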
[1] -57.99938
Finally, we can compute the deviance as twice the difference between the log-likelihood of the saturated model and the log-likelihood of the fitted model:
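Numerically, using the rounded values obtained above:

```r
ll_fit <- -19.0017 + (-45.6358)  # sum of the two component fits
ll_sat <- -57.99938              # saturated log-likelihood from above
2 * (ll_sat - ll_fit)            # deviance, approximately 13.276
```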
This is the same as the deviance reported on p.116 of Fahrmeir and Tutz (2001).
The model prescribed by display \((3.5.3)\) on p.116 of Fahrmeir and Tutz (2001) is by nature one single GLM with \((Y_1, Y_2)\) (years of schooling and happiness level) as the response (precisely, a multinomial response with 1 trial and 12 categories) and \(x\) (sex) as the covariate. To properly get the standard errors for all 14 parameter estimates in Table 3.11, strictly speaking, one should first estimate the \(14 \times 14\) Fisher information matrix of this GLM (e.g. using the observed information matrix), and take the 14 diagonal entries of its inverse as the estimated variances of these parameter estimates.
Apparently, the standard errors presented in Table 3.11 are not obtained this way: from summary(happy_logit1_vglm) (which reproduces all the standard errors for \(\theta_1, \theta_2, \theta_3, \beta_1^{(1)}, \beta_2^{(1)}, \beta_3^{(1)}\) in Table 3.11) and summary(happy_logit2_vglm) (which also reproduces the same standard errors \(0.109\) and \(0.092\) for \(\theta_{11}\) and \(\theta_{12}\) in Table 3.11), we can see that these standard errors are computed by treating the two conditional models in \((3.5.3)\) as two separate GLMs. These may not be the same standard errors we would derive by using the (proper) "one single GLM" approach described in the previous paragraph; I leave it to the interested student to check whether the two approaches actually coincide.
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.7.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Melbourne
tzcode source: internal
attached base packages:
[1] splines stats4 stats graphics grDevices datasets utils
[8] methods base
other attached packages:
[1] VGAM_1.1-14 tidyr_1.3.2 Fahrmeir_2016.5.31 dplyr_1.2.0
[5] conflicted_1.2.0
loaded via a namespace (and not attached):
[1] vctrs_0.7.1 cli_3.6.5 knitr_1.51 rlang_1.1.7
[5] xfun_0.56 purrr_1.2.1 renv_1.1.7 generics_0.1.4
[9] jsonlite_2.0.0 glue_1.8.0 htmltools_0.5.9 rmarkdown_2.30
[13] evaluate_1.0.5 tibble_3.3.1 fastmap_1.2.0 yaml_2.3.12
[17] lifecycle_1.0.5 memoise_2.0.1 compiler_4.4.2 pkgconfig_2.0.3
[21] rstudioapi_0.18.0 digest_0.6.39 R6_2.6.1 utf8_1.2.6
[25] tidyselect_1.2.1 pillar_1.11.1 magrittr_2.0.4 withr_3.0.2
[29] tools_4.4.2 cachem_1.1.0
packages <- c(
"conflicted",
"dplyr",
"Fahrmeir",
"tidyr",
"VGAM"
)
for (pkg in packages) {
cit <- citation(pkg)
# Check if multiple citations
if (length(cit) > 1) {
cat("- **", pkg, "**:\n", sep = "")
for (i in seq_along(cit)) {
cit_text <- format(cit[i], style = "text")
cat(" - ", cit_text, "\n\n", sep = "")
}
} else {
cit_text <- format(cit, style = "text")
cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
}
}
conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.
dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.
Fahrmeir: Halvorsen cbKB (2016). Fahrmeir: Data from the Book “Multivariate Statistical Modelling Based on Generalized Linear Models”, First Edition, by Ludwig Fahrmeir and Gerhard Tutz. R package version 2016.5.31.
tidyr: Wickham H, Vaughan D, Girlich M (2025). tidyr: Tidy Messy Data. R package version 1.3.2, https://github.com/tidyverse/tidyr, https://tidyr.tidyverse.org.
VGAM:
Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.
Yee TW, Wild CJ (1996). “Vector Generalized Additive Models.” Journal of Royal Statistical Society, Series B, 58(3), 481-493.
Yee TW (2010). “The VGAM Package for Categorical Data Analysis.” Journal of Statistical Software, 32(10), 1-34. doi:10.18637/jss.v032.i10 https://doi.org/10.18637/jss.v032.i10.
Yee TW, Hadi AF (2014). “Row-column interaction models, with an R implementation.” Computational Statistics, 29(6), 1427-1445.
Yee TW (2025). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-14, https://CRAN.R-project.org/package=VGAM.
Yee TW (2013). “Two-parameter reduced-rank vector generalized linear models.” Computational Statistics and Data Analysis, 71, 889-902.
Yee TW, Stoklosa J, Huggins RM (2015). “The VGAM Package for Capture-Recapture Data Using the Conditional Likelihood.” Journal of Statistical Software, 65(5), 1-33. doi:10.18637/jss.v065.i05 https://doi.org/10.18637/jss.v065.i05.
Yee TW (2020). “The VGAM package for negative binomial regression.” Australian and New Zealand Journal of Statistics, 62(1), 116-131.
Yee TW (2022). “On the Hauck-Donner effect in Wald tests: Detection, tipping points and parameter space characterization.” Journal of the American Statistical Association, 117(540), 1763-1774.
Yee TW, Ma C (2024). “Generally altered, inflated, truncated and deflated regression.” Statistical Science, 39(4), 568-588.
Yee TW, Frigau L, Ma C (2025). “Heaping and seeping, GAITD regression and doubly constrained reduced-rank vector generalized linear models.” Annals of Applied Statistics, (in press), 1-25.