Multinomial Logistic Regression: Caesarian Dataset Analysis

Fahrmeir and Tutz (2001) Examples 3.1 and 3.11

Published

March 12, 2026

Overview

This script replicates the analyses from Examples 3.1 and 3.11 in Fahrmeir and Tutz (2001).

Packages and Data

Load all required packages and the prepared Caesar dataset. As always, we load conflicted to manage any function name conflicts across packages.

Code
library(conflicted)

Then we load the necessary packages for data manipulation and modelling.

Code
library(dplyr)
library(Fahrmeir)
library(glmnet)
Loading required package: Matrix
Loaded glmnet 4.1-10
Code
library(nnet)
library(tidyr)
library(readr)
library(VGAM)
Loading required package: stats4
Loading required package: splines

Example 3.1: Caesarian Birth Study (p. 73)

These are the same data we examined in the first script, but now we distinguish between the two infection types Inf1 and Inf2 instead of collapsing them into a single binary Inf variable. We import the ready-made dataset.

We model the infection type using a basic multinomial logit model with no infection (Noinf) as the reference category.

\[ \log\frac{P(\texttt{Inf1})}{P(\texttt{Noinf})} = \beta_{10} + \beta_{1N}\mathbb{1}_{\texttt{NoPlan}} + \beta_{1F}\mathbb{1}_{\texttt{RiskF}} + \beta_{1A}\mathbb{1}_{\texttt{Antib}} \] \[ \log\frac{P(\texttt{Inf2})}{P(\texttt{Noinf})} = \beta_{20} + \beta_{2N}\mathbb{1}_{\texttt{NoPlan}} + \beta_{2F}\mathbb{1}_{\texttt{RiskF}} + \beta_{2A}\mathbb{1}_{\texttt{Antib}} \]
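To see how these two equations determine all three category probabilities, here is a minimal base-R sketch; the coefficient values are hypothetical, chosen to be near the estimates fitted below:

```r
# Hypothetical coefficients (beta_10, beta_1N, beta_1F, beta_1A) and
# (beta_20, beta_2N, beta_2F, beta_2A) for the two non-reference categories:
beta1 <- c(-2.6, 1.2, 1.8, -3.5)
beta2 <- c(-2.6, 1.0, 2.2, -3.1)
z <- c(1, 1, 1, 0)   # intercept, NoPlan = 1, RiskF = 1, Antib = 0

# The two log-odds relative to Noinf:
eta <- c(Inf1 = sum(beta1 * z), Inf2 = sum(beta2 * z))

# Inverting the logit equations gives the category probabilities,
# which necessarily sum to 1:
probs <- c(Noinf = 1, exp(eta)) / (1 + sum(exp(eta)))
probs
```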

Code
caes_dat <- read_csv("data/Example3-1Caes.csv")
Rows: 7 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (7): size, noInf, Inf1, Inf2, NoPlan, Antib, RiskF

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
caes_dat
# A tibble: 7 × 7
   size noInf  Inf1  Inf2 NoPlan Antib RiskF
  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1    40    32     4     4      0     0     0
2    58    30    11    17      0     0     1
3     2     2     0     0      0     1     0
4    18    17     0     1      0     1     1
5     9     9     0     0      1     0     0
6    26     3    10    13      1     0     1
7    98    87     4     7      1     1     1

Instead of using the pre-prepared CSV file, we could also start from the original caesar dataset in the Fahrmeir package, and manipulate it into the required format for modeling.

Code
head(caesar)
  y  w noplan       factor       antib yl patco
1 3 87    not risk factors antibiotics  0     1
2 1  4    not risk factors antibiotics  1     1
3 2  7    not risk factors antibiotics  1     1
4 2 13    not risk factors     without  1     2
5 1 10    not risk factors     without  1     2
6 3  3    not risk factors     without  0     2

The code below:

  1. Converts the caesar data to a tibble and removes unused columns
  2. Removes any rows with zero counts
  3. Pivots from long format (one row per infection type) to wide format (one row per covariate combination)
  4. Renames infection type codes 1, 2, and 3 to Inf1, Inf2, and Noinf respectively
  5. Computes total counts and recodes factors to integers matching the textbook coding
Code
caesar_clean <- caesar |>
    as_tibble() |>
    select(-patco, -yl) |>
    dplyr::filter(w > 0) |>
    pivot_wider(
        names_from = y,
        values_from = w,
        values_fill = 0
    ) |>
    rename(Noinf = `3`, Inf1 = `1`, Inf2 = `2`) |>
    mutate(
        size = Noinf + Inf1 + Inf2,
        factor = as.integer(factor == "risk factors"),
        noplan = as.integer(noplan == "not"),
        antib = as.integer(antib == "antibiotics")
    ) |>
    arrange(noplan, antib, factor) |>
    relocate(size, Noinf, Inf1, Inf2, noplan, antib, factor)

caesar_clean
# A tibble: 7 × 7
   size Noinf  Inf1  Inf2 noplan antib factor
  <dbl> <dbl> <dbl> <dbl>  <int> <int>  <int>
1    40    32     4     4      0     0      0
2    58    30    11    17      0     0      1
3     2     2     0     0      0     1      0
4    18    17     0     1      0     1      1
5     9     9     0     0      1     0      0
6    26     3    10    13      1     0      1
7    98    87     4     7      1     1      1

Fit with multinom() from nnet

To fit the multinomial log-linear model we use the multinom() function from the nnet package, which is written by the authors of, and well documented in, Venables and Ripley (2002). To see more information about the multinom() function, you can refer to the documentation by running ?multinom in your R environment. It is especially useful to check the arguments controlling optimisation (maxit, trace, Hess) when reproducing textbook results.

The multinom() function uses the BFGS optimization method, a quasi-Newton method, to find the maximum likelihood estimates of the model parameters. This is typically quicker than the Fisher scoring algorithm, which requires computing the expected information matrix at each iteration.
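As an illustration (not nnet's internal implementation, which uses a neural-network parametrisation), we can minimise the same grouped multinomial negative log-likelihood directly with base R's optim() and BFGS; the counts and covariate patterns from the table above are hard-coded here:

```r
# Grouped counts (noInf, Inf1, Inf2) and the design matrix from the data above.
Y <- matrix(c(32,  4,  4,
              30, 11, 17,
               2,  0,  0,
              17,  0,  1,
               9,  0,  0,
               3, 10, 13,
              87,  4,  7), ncol = 3, byrow = TRUE)
X <- cbind(1,                         # intercept
           c(0, 0, 0, 0, 1, 1, 1),    # NoPlan
           c(0, 0, 1, 1, 0, 0, 1),    # Antib
           c(0, 1, 0, 1, 0, 1, 1))    # RiskF

# Negative log-likelihood of the baseline-category logit model;
# theta holds 8 coefficients, 4 per non-reference category.
negll <- function(theta) {
  B <- matrix(theta, nrow = 2, byrow = TRUE)  # 2 x 4 coefficient matrix
  eta <- X %*% t(B)                           # linear predictors for Inf1, Inf2
  P <- cbind(1, exp(eta)) / (1 + rowSums(exp(eta)))  # probs, noInf first
  -sum(Y * log(P))
}

fit <- optim(rep(0, 8), negll, method = "BFGS")
fit$value
```

At the starting point rep(0, 8) the objective is 251 log 3 ≈ 275.7517, matching the "initial value" printed by multinom() below, and the minimised value matches its "final value" of about 160.9371.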

We extract the response variables as a matrix for the multinom() function,

Code
caes_response <- caes_dat |>
    select(noInf, Inf1, Inf2) |>
    as.matrix()

caes_response
     noInf Inf1 Inf2
[1,]    32    4    4
[2,]    30   11   17
[3,]     2    0    0
[4,]    17    0    1
[5,]     9    0    0
[6,]     3   10   13
[7,]    87    4    7

and fit the model.

Code
caes_nnet <- multinom(
    caes_response ~ NoPlan + Antib + RiskF,
    data = caes_dat
)
# weights:  15 (8 variable)
initial  value 275.751684 
iter  10 value 161.068578
final  value 160.937147 
converged
Code
caes_nnet_summary <- summary(caes_nnet)
caes_nnet_summary
Call:
multinom(formula = caes_response ~ NoPlan + Antib + RiskF, data = caes_dat)

Coefficients:
     (Intercept)    NoPlan     Antib    RiskF
Inf1   -2.621012 1.1742513 -3.520249 1.829241
Inf2   -2.559917 0.9959762 -3.087160 2.195458

Std. Errors:
     (Intercept)    NoPlan     Antib     RiskF
Inf1   0.5567209 0.5213014 0.6717416 0.6023322
Inf2   0.5462995 0.4813634 0.5498675 0.5869601

Residual Deviance: 321.8743 
AIC: 337.8743 

The first column (noInf) of the response is treated as the reference category, as desired. We can print the coefficients in a format that matches the table in Fahrmeir and Tutz (2001), p. 75.

Code
round(t(coef(caes_nnet_summary)), 3)
              Inf1   Inf2
(Intercept) -2.621 -2.560
NoPlan       1.174  0.996
Antib       -3.520 -3.087
RiskF        1.829  2.195

We can also have a look at the deviance of the fitted model.

Code
caes_nnet$deviance
[1] 321.8743
Deviance Interpretation

Looking at the documentation of the multinom() function by running ?multinom, we see that the deviance reported by multinom() is \(-2 \times \text{log-likelihood}\) and has not been adjusted by the saturated model (or the scale parameter). This differs from the deviance reported by glm() and from our usual definition of deviance. Always check the documentation for how deviance is calculated.

Use attributes(caes_nnet) to see all available components of a model object. This reveals what elements you can access (like $deviance, $fitted.values, $residuals) and helps you understand the object’s structure. Similarly, str() shows the internal structure, and names() lists just the accessible components.

Code
attributes(caes_nnet)
$names
 [1] "n"             "nunits"        "nconn"         "conn"         
 [5] "nsunits"       "decay"         "entropy"       "softmax"      
 [9] "censored"      "value"         "wts"           "convergence"  
[13] "fitted.values" "residuals"     "call"          "terms"        
[17] "weights"       "deviance"      "rank"          "lab"          
[21] "coefnames"     "vcoefnames"    "xlevels"       "edf"          
[25] "AIC"          

$class
[1] "multinom" "nnet"    

This section is not examinable, but just for fun. glmnet is written by Jerome Friedman and Trevor Hastie, and is a popular package for regularized regression (that is, Lasso and Ridge regression). The codebase is in Fortran, and is very fast.

We can use glmnet to fit the same multinomial model by setting lambda = 0 (giving no weight to the regularisation step) to get the unregularized estimates. If you look up ?glmnet, pay attention to what it says about the symmetric multinomial model, which has a different parametrization; see the section on “Multinomial Regression” in the package vignette for its precise form. Because of this different parametrization, the returned coefficients won’t be the same as those from multinom().

Code
caes_predictors <- caes_dat |>
    select(NoPlan, Antib, RiskF) |>
    as.matrix()

caes_glmnet <- glmnet(
    x = caes_predictors,
    y = caes_response,
    family = "multinomial",
    lambda = 0
)

# Extract and display coefficients
coef(caes_glmnet, s = 0)
$noInf
4 x 1 sparse Matrix of class "dgCMatrix"
                   s=0
(Intercept)  1.7269999
NoPlan      -0.9960274
Antib        3.0871833
RiskF       -1.8290859

$Inf1
4 x 1 sparse Matrix of class "dgCMatrix"
                   s=0
(Intercept) -0.8940574
NoPlan       0.1784734
Antib       -0.4329985
RiskF        .        

$Inf2
4 x 1 sparse Matrix of class "dgCMatrix"
                   s=0
(Intercept) -0.8329425
NoPlan       .        
Antib        .        
RiskF        0.3663593

These coefficients correspond to the alphas in

\[ \tilde{u}_{ir} = u_{ir} - u_{ik} = \underbrace{(\alpha_{r0} - \alpha_{k0})}_{\beta_{r0}} + \mathbf{z}_i^T\underbrace{ (\boldsymbol{\alpha}_r - \boldsymbol{\alpha}_k) }_{\boldsymbol{\beta}_r} = \beta_{r0} + \mathbf{z}_i^T\boldsymbol{\beta}_r. \]

Therefore, we can retrieve the multinom() coefficients by taking differences of the glmnet alphas:

  1. Extract slope coefficients for each category and compute differences relative to the reference category
  2. Extract intercept coefficients and compute differences relative to the reference category intercept
Code
beta_matrix <- cbind(
    caes_glmnet$beta[[2]] - caes_glmnet$beta[[1]],
    caes_glmnet$beta[[3]] - caes_glmnet$beta[[1]]
)
beta_matrix <- rbind(
    caes_glmnet$a0[-1] - caes_glmnet$a0[1],
    beta_matrix
)
round(beta_matrix, 3)
4 x 2 sparse Matrix of class "dgCMatrix"
           s0     s0
       -2.621 -2.560
NoPlan  1.175  0.996
Antib  -3.520 -3.087
RiskF   1.829  2.195

These also match the coefficients in Fahrmeir and Tutz (2001) p. 75.

We can fit instead using the caesar data from the Fahrmeir package, setting preferred reference levels before modeling.

Code
caesar_prepared <- caesar |>
    as_tibble() |>
    mutate(
        y = relevel(y, ref = "3"),
        noplan = relevel(noplan, ref = "planned"),
        antib = relevel(antib, ref = "without"),
        factor = relevel(factor, ref = "without")
    )

caes_nnet_prepared <- multinom(
    y ~ noplan + antib + factor,
    data = caesar_prepared,
    weights = w
)
# weights:  15 (8 variable)
initial  value 275.751684 
iter  10 value 161.068578
final  value 160.937147 
converged
Code
summary(caes_nnet_prepared)
Call:
multinom(formula = y ~ noplan + antib + factor, data = caesar_prepared, 
    weights = w)

Coefficients:
  (Intercept) noplannot antibantibiotics factorrisk factors
1   -2.621012 1.1742513        -3.520249           1.829241
2   -2.559917 0.9959762        -3.087160           2.195458

Std. Errors:
  (Intercept) noplannot antibantibiotics factorrisk factors
1   0.5567209 0.5213014        0.6717416          0.6023322
2   0.5462995 0.4813634        0.5498675          0.5869601

Residual Deviance: 321.8743 
AIC: 337.8743 
Code
round(t(coef(caes_nnet_prepared)), 3)
                        1      2
(Intercept)        -2.621 -2.560
noplannot           1.174  0.996
antibantibiotics   -3.520 -3.087
factorrisk factors  1.829  2.195

This again matches the textbook.

We can also check the deviance of this fit, which should be the same as before since it’s the same model.

Code
caes_nnet_prepared$deviance
[1] 321.8743
Code
caes_nnet$deviance
[1] 321.8743

Explore the Fitted Values and Residuals

We now inspect the fitted values and residuals of the caes_nnet model. The fitted values are the predicted probabilities for each category.

Code
caes_nnet$fitted.values
      noInf        Inf1        Inf2
1 0.8695347 0.063240568 0.067224753
2 0.4656329 0.210951192 0.323415876
3 0.9943521 0.002140051 0.003507889
4 0.9568456 0.012827892 0.030326540
5 0.6922135 0.162899513 0.144886971
6 0.2300766 0.337273035 0.432650355
7 0.8855925 0.038416539 0.075990954

The residuals are the differences between the observed proportions and the fitted probabilities.

Code
caes_nnet$residuals
         noInf         Inf1         Inf2
1 -0.069534679  0.036759432  0.032775247
2  0.051608447 -0.021296019 -0.030312428
3  0.005647940 -0.002140051 -0.003507889
4 -0.012401124 -0.012827892  0.025229016
5  0.307786484 -0.162899513 -0.144886971
6 -0.114691994  0.047342349  0.067349645
7  0.002162595  0.002399788 -0.004562382

We can manually validate the residuals by computing the observed proportions and subtracting the fitted probabilities.

Code
caes_nnet_prop <- caes_dat |>
    mutate(
        noInf = noInf / size,
        Inf1 = Inf1 / size,
        Inf2 = Inf2 / size
    ) |>
    select(noInf, Inf1, Inf2)

caes_nnet_prop - caes_nnet$fitted
         noInf         Inf1         Inf2
1 -0.069534679  0.036759432  0.032775247
2  0.051608447 -0.021296019 -0.030312428
3  0.005647940 -0.002140051 -0.003507889
4 -0.012401124 -0.012827892  0.025229016
5  0.307786484 -0.162899513 -0.144886971
6 -0.114691994  0.047342349  0.067349645
7  0.002162595  0.002399788 -0.004562382

Given a new observation, we can either predict the most likely category (with type = "class") or the probabilities for each category (with type = "probs"). Checking ?predict.multinom is recommended here, because these two type options return different output scales and structures.

Code
new_case <- tibble(NoPlan = 1, Antib = 1, RiskF = 0)

predict(caes_nnet, newdata = new_case, type = "probs")
      noInf        Inf1        Inf2 
0.983753294 0.006850796 0.009395909 
Code
predict(caes_nnet, newdata = new_case, type = "class")
[1] noInf
Levels: noInf Inf1 Inf2

For a sanity check, we can also compute the predicted probabilities manually using the coefficients.

Code
coef_matrix <- coef(caes_nnet)
ratios <- exp(coef_matrix %*% c(1, 1, 1, 0))
prob_no_inf <- 1 / (sum(ratios) + 1)

prob_no_inf
[1] 0.9837533
Caution

The manually calculated probability for noInf agrees with the one returned by predict() up to the printed number of digits; any apparent discrepancy between 0.9837533 and 0.983753294 is only display rounding, not a difference in the underlying calculation.

The VGAM package will be used a lot more later in the course, because it fits a wide variety of multivariate response regression models. It is not as fast as glmnet or nnet for large datasets, because it uses the Fisher scoring algorithm (the same algorithm used in the book).

It is good practice to read the documentation for the vglm() function with ?vglm to understand the arguments and output structure. Also check ?multinomial, especially refLevel and parallel, since these choices directly affect model interpretation.

We use the raw data caes_dat extracted from the CSV file. multinomial() has a refLevel argument to specify the reference category.

Code
caes_vgam <- vglm(
    formula = cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
    family = multinomial(refLevel = "noInf"),
    data = caes_dat
)

Looking at the documentation for multinomial(), we can see that the default value of refLevel is "(Last)". This means that the last category in the response matrix is treated as the reference. In our case, this would be Inf2, which is not what we want. We ensure that noInf is treated as the reference category by setting refLevel = "noInf".

Code
summary(caes_vgam)

Call:
vglm(formula = cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF, 
    family = multinomial(refLevel = "noInf"), data = caes_dat)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  -2.6210     0.5567  -4.708 2.50e-06 ***
(Intercept):2  -2.5599     0.5463  -4.686 2.79e-06 ***
NoPlan:1        1.1742     0.5213   2.253 0.024289 *  
NoPlan:2        0.9960     0.4814   2.069 0.038539 *  
Antib:1        -3.5202     0.6717  -5.240 1.60e-07 ***
Antib:2        -3.0872     0.5499  -5.614 1.97e-08 ***
RiskF:1         1.8292     0.6023   3.037 0.002390 ** 
RiskF:2         2.1955     0.5870   3.740 0.000184 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])

Residual deviance: 11.8295 on 6 degrees of freedom

Log-likelihood: -20.8871 on 6 degrees of freedom

Number of Fisher scoring iterations: 5 


Reference group is level  1  of the response
Log-Likelihood Differences Between Packages

Remember that the deviance reported by nnet::multinom() is twice the negative log-likelihood. Therefore the log-likelihood from nnet can be computed as:

Code
-caes_nnet$deviance / 2
[1] -160.9371

This is significantly different from the log-likelihood reported by VGAM::vglm() for the same data and model. This is because different packages treat grouped vs. ungrouped multinomial data differently. See Appendix A2 for a detailed explanation of how this arises and why you must be careful when comparing results across packages.
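To make the relationship concrete, here is a hedged sketch in base R using the counts and fitted probabilities printed above: the nnet-style figure is \(-2\sum y \log \hat{p}\), while the grouped (glm-style) deviance reported by vglm() additionally compares against the saturated model, whose fitted cell probabilities are the observed proportions \(y/n\).

```r
# Counts and fitted probabilities copied from the output above.
Y <- matrix(c(32,  4,  4,   30, 11, 17,   2, 0, 0,   17, 0, 1,
               9,  0,  0,    3, 10, 13,  87, 4, 7), ncol = 3, byrow = TRUE)
P <- matrix(c(0.8695347, 0.063240568, 0.067224753,
              0.4656329, 0.210951192, 0.323415876,
              0.9943521, 0.002140051, 0.003507889,
              0.9568456, 0.012827892, 0.030326540,
              0.6922135, 0.162899513, 0.144886971,
              0.2300766, 0.337273035, 0.432650355,
              0.8855925, 0.038416539, 0.075990954), ncol = 3, byrow = TRUE)

# nnet-style deviance: -2 x log-likelihood, no saturated-model adjustment.
m2ll <- -2 * sum(Y * log(P))

# Grouped deviance: adjust by the saturated log-likelihood, where each
# cell's fitted probability is the observed proportion y/n (0 log 0 = 0).
n <- rowSums(Y)
sat <- sum(ifelse(Y > 0, Y * log(Y / n), 0))
dev_grouped <- m2ll + 2 * sat

round(c(m2ll = m2ll, dev_grouped = dev_grouped), 4)
```

The first number reproduces nnet's deviance of 321.8743, and the second reproduces vglm()'s grouped residual deviance of 11.8295.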

Extracting the coefficients from the vglm fit, we see they also match those in Fahrmeir and Tutz (2001) p. 75, and the fitted probabilities match those from multinom().

Code
round(coef(caes_vgam, matrix.out = TRUE), 3)
            log(mu[,2]/mu[,1]) log(mu[,3]/mu[,1])
(Intercept)             -2.621             -2.560
NoPlan                   1.174              0.996
Antib                   -3.520             -3.087
RiskF                    1.829              2.195
Code
fitted(caes_vgam)
      noInf        Inf1        Inf2
1 0.8695343 0.063240687 0.067224963
2 0.4656328 0.210951544 0.323415652
3 0.9943520 0.002140059 0.003507904
4 0.9568455 0.012827935 0.030326545
5 0.6922135 0.162899155 0.144887296
6 0.2300770 0.337272714 0.432650306
7 0.8855926 0.038416501 0.075990881

Example 3.11

We now move on to Example 3.11. Looking at the coefficients from the multinomial logit model, we can see that the coefficients for NoPlan and RiskF are very similar across the two infection types. Therefore we can constrain the coefficients for NoPlan and RiskF to be equal across infection types using the parallel argument.

\[ \log\frac{P(\texttt{Inf1})}{P(\texttt{Noinf})} = \beta_{10} + \beta_{N}\mathbb{1}_{\texttt{NoPlan}} + \beta_{F}\mathbb{1}_{\texttt{RiskF}} + \beta_{1A}\mathbb{1}_{\texttt{Antib}} \] \[ \log\frac{P(\texttt{Inf2})}{P(\texttt{Noinf})} = \beta_{20} + \beta_{N}\mathbb{1}_{\texttt{NoPlan}} + \beta_{F}\mathbb{1}_{\texttt{RiskF}} + \beta_{2A}\mathbb{1}_{\texttt{Antib}} \]

We run the regression, and extract the coefficients.

Code
caes_vgam_parallel <- vglm(
    formula = cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
    data = caes_dat,
    family = multinomial(
        refLevel = "noInf",
        parallel = TRUE ~ NoPlan + RiskF - 1
    )
)

coefvlm(caes_vgam_parallel, matrix.out = TRUE)
            log(mu[,2]/mu[,1]) log(mu[,3]/mu[,1])
(Intercept)          -2.751287          -2.443802
NoPlan                1.071967           1.071967
Antib                -3.494351          -3.108688
RiskF                 2.029896           2.029896

We can try running the vglm() function on the original caesar data we manipulated earlier from the Fahrmeir package. We don’t fit the parallel model here.

Code
vglm(
    formula = y ~ noplan + antib + factor,
    data = caesar_prepared,
    weights = w,
    family = multinomial(refLevel = "3")
)
Error in `if (max(abs(ycounts - round(ycounts))) > smallno) ...`:
! missing value where TRUE/FALSE needed

This error is due to the zero-count rows. We remove them and try again.

Code
caesar_no_zero_w <- caesar_prepared |> dplyr::filter(w > 0)
caesar_no_zero_w
# A tibble: 16 × 7
   y         w noplan  factor       antib          yl patco
   <fct> <dbl> <fct>   <fct>        <fct>       <dbl> <dbl>
 1 3        87 not     risk factors antibiotics     0     1
 2 1         4 not     risk factors antibiotics     1     1
 3 2         7 not     risk factors antibiotics     1     1
 4 2        13 not     risk factors without         1     2
 5 1        10 not     risk factors without         1     2
 6 3         3 not     risk factors without         0     2
 7 3         9 not     without      without         0     4
 8 2         1 planned risk factors antibiotics     1     5
 9 3        17 planned risk factors antibiotics     0     5
10 2        17 planned risk factors without         1     6
11 3        30 planned risk factors without         0     6
12 1        11 planned risk factors without         1     6
13 3         2 planned without      antibiotics     0     7
14 2         4 planned without      without         1     8
15 3        32 planned without      without         0     8
16 1         4 planned without      without         1     8
Code
levels(caesar_no_zero_w$noplan)
[1] "planned" "not"    
Code
levels(caesar_no_zero_w$factor)
[1] "without"      "risk factors"
Code
levels(caesar_no_zero_w$antib)
[1] "without"     "antibiotics"
Code
caes_vgam2 <- vglm(
    formula = y ~ noplan + antib + factor,
    data = caesar_no_zero_w,
    weights = w,
    family = multinomial(refLevel = "3")
)

summary(caes_vgam2)

Call:
vglm(formula = y ~ noplan + antib + factor, family = multinomial(refLevel = "3"), 
    data = caesar_no_zero_w, weights = w)

Coefficients: 
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept):1         -2.6210     0.5567  -4.708 2.50e-06 ***
(Intercept):2         -2.5599     0.5463  -4.686 2.79e-06 ***
noplannot:1            1.1742     0.5213   2.253 0.024288 *  
noplannot:2            0.9960     0.4814   2.069 0.038539 *  
antibantibiotics:1    -3.5202     0.6717  -5.241 1.60e-07 ***
antibantibiotics:2    -3.0872     0.5499  -5.614 1.97e-08 ***
factorrisk factors:1   1.8292     0.6023   3.037 0.002390 ** 
factorrisk factors:2   2.1955     0.5870   3.740 0.000184 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])

Residual deviance: 321.8743 on 24 degrees of freedom

Log-likelihood: -160.9371 on 24 degrees of freedom

Number of Fisher scoring iterations: 5 


Reference group is level  1  of the response

We can compare the coefficients from this fit to the earlier vglm fit on the wide-format data.

Code
coefvlm(caes_vgam2, matrix.out = TRUE)
                   log(mu[,2]/mu[,1]) log(mu[,3]/mu[,1])
(Intercept)                 -2.621010         -2.5599132
noplannot                    1.174247          0.9959748
antibantibiotics            -3.520248         -3.0871595
factorrisk factors           1.829241          2.1954542
Code
coefvlm(caes_vgam, matrix.out = TRUE)
            log(mu[,2]/mu[,1]) log(mu[,3]/mu[,1])
(Intercept)          -2.621010         -2.5599132
NoPlan                1.174247          0.9959748
Antib                -3.520248         -3.0871595
RiskF                 1.829241          2.1954542
Code
caes_vgam@criterion
$deviance
[1] 11.82954

$loglikelihood
[1] -20.88715
Code
caes_vgam2@criterion
$deviance
[1] 321.8743

$loglikelihood
[1] -160.9371

We can see that the deviance and log-likelihoods are different between the fits. The deviance of caes_vgam matches that in the book; the deviance of caes_vgam2, however, is just twice the negative log-likelihood of the fitted model.

In R, @ accesses slots of S4 objects, while $ accesses named elements in lists, data frames, and many S3 objects.

vglm() returns an S4 object, so criterion is stored as a slot, and we use caes_vgam@criterion rather than caes_vgam$criterion.
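A minimal illustration, with a hypothetical S4 class and a plain list:

```r
library(methods)  # provides setClass()/new(); loaded by default in most sessions

# A hypothetical S4 class with one slot:
setClass("Point", slots = c(x = "numeric"))
p_s4 <- new("Point", x = 1)
p_s4@x       # slot access on an S4 object uses @

# A plain list with one named element:
p_list <- list(x = 1)
p_list$x     # element access on a list uses $
```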

When fitting multinomial models, the data can be structured in either long or wide format. We can convert the data back into long format and confirm we still get the same fit.

The manipulation below:

  1. Removes the total-count column (size) because the long-form weight column will carry counts
  2. Pivots infection outcomes (noInf, Inf1, Inf2) into two columns: category name (inf_type) and count (w)
  3. Sets an explicit factor order for inf_type so the reference and comparisons are reproducible
Code
caesar_long <- caes_dat |>
    select(-size) |>
    pivot_longer(
        cols = noInf:Inf2,
        names_to = "inf_type",
        values_to = "w"
    ) |>
    mutate(inf_type = factor(inf_type, levels = c("noInf", "Inf1", "Inf2")))

caesar_long
# A tibble: 21 × 5
   NoPlan Antib RiskF inf_type     w
    <dbl> <dbl> <dbl> <fct>    <dbl>
 1      0     0     0 noInf       32
 2      0     0     0 Inf1         4
 3      0     0     0 Inf2         4
 4      0     0     1 noInf       30
 5      0     0     1 Inf1        11
 6      0     0     1 Inf2        17
 7      0     1     0 noInf        2
 8      0     1     0 Inf1         0
 9      0     1     0 Inf2         0
10      0     1     1 noInf       17
# ℹ 11 more rows
Code
caes_nnet_long <- multinom(
    inf_type ~ NoPlan + Antib + RiskF,
    data = caesar_long,
    weights = w
)
# weights:  15 (8 variable)
initial  value 275.751684 
iter  10 value 161.068578
final  value 160.937147 
converged
Code
coef(caes_nnet_long)
     (Intercept)    NoPlan     Antib    RiskF
Inf1   -2.621012 1.1742513 -3.520249 1.829241
Inf2   -2.559917 0.9959762 -3.087160 2.195458
Code
coef(caes_nnet_prepared)
  (Intercept) noplannot antibantibiotics factorrisk factors
1   -2.621012 1.1742513        -3.520249           1.829241
2   -2.559917 0.9959762        -3.087160           2.195458

Package citations and references

Code
packages <- c(
    "conflicted",
    "dplyr",
    "Fahrmeir",
    "glmnet",
    "nnet",
    "tidyr",
    "readr",
    "VGAM"
)
for (pkg in packages) {
    cit <- citation(pkg)

    # Check if multiple citations
    if (length(cit) > 1) {
        cat("- **", pkg, "**:\n", sep = "")
        for (i in seq_along(cit)) {
            cit_text <- format(cit[i], style = "text")
            cat("    - ", cit_text, "\n\n", sep = "")
        }
    } else {
        cit_text <- format(cit, style = "text")
        cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
    }
}
  • conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.

  • dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.

  • Fahrmeir: Halvorsen cbKB (2016). Fahrmeir: Data from the Book “Multivariate Statistical Modelling Based on Generalized Linear Models”, First Edition, by Ludwig Fahrmeir and Gerhard Tutz. R package version 2016.5.31.

  • glmnet:

    • Friedman J, Hastie T, Tibshirani R (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1-22. doi:10.18637/jss.v033.i01 https://doi.org/10.18637/jss.v033.i01.

    • Simon N, Friedman J, Hastie T, Tibshirani R (2011). “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software, 39(5), 1-13. doi:10.18637/jss.v039.i05 https://doi.org/10.18637/jss.v039.i05.

    • Tay JK, Narasimhan B, Hastie T (2023). “Elastic Net Regularization Paths for All Generalized Linear Models.” Journal of Statistical Software, 106(1), 1-31. doi:10.18637/jss.v106.i01 https://doi.org/10.18637/jss.v106.i01.

  • nnet: Venables WN, Ripley BD (2002). Modern Applied Statistics with S, Fourth edition. Springer, New York. ISBN 0-387-95457-0, https://www.stats.ox.ac.uk/pub/MASS4/.

  • tidyr: Wickham H, Vaughan D, Girlich M (2025). tidyr: Tidy Messy Data. R package version 1.3.2, https://github.com/tidyverse/tidyr, https://tidyr.tidyverse.org.

  • readr: Wickham H, Hester J, Bryan J (2025). readr: Read Rectangular Text Data. R package version 2.1.6, https://github.com/tidyverse/readr, https://readr.tidyverse.org.

  • VGAM:

    • Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.

    • Yee TW, Wild CJ (1996). “Vector Generalized Additive Models.” Journal of Royal Statistical Society, Series B, 58(3), 481-493.

    • Yee TW (2010). “The VGAM Package for Categorical Data Analysis.” Journal of Statistical Software, 32(10), 1-34. doi:10.18637/jss.v032.i10 https://doi.org/10.18637/jss.v032.i10.

    • Yee TW, Hadi AF (2014). “Row-column interaction models, with an R implementation.” Computational Statistics, 29(6), 1427-1445.

    • Yee TW (2025). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-14, https://CRAN.R-project.org/package=VGAM.

    • Yee TW (2013). “Two-parameter reduced-rank vector generalized linear models.” Computational Statistics and Data Analysis, 71, 889-902.

    • Yee TW, Stoklosa J, Huggins RM (2015). “The VGAM Package for Capture-Recapture Data Using the Conditional Likelihood.” Journal of Statistical Software, 65(5), 1-33. doi:10.18637/jss.v065.i05 https://doi.org/10.18637/jss.v065.i05.

    • Yee TW (2020). “The VGAM package for negative binomial regression.” Australian and New Zealand Journal of Statistics, 62(1), 116-131.

    • Yee TW (2022). “On the Hauck-Donner effect in Wald tests: Detection, tipping points and parameter space characterization.” Journal of the American Statistical Association, 117(540), 1763-1774.

    • Yee TW, Ma C (2024). “Generally altered, inflated, truncated and deflated regression.” Statistical Science, 39(4), 568-588.

    • Yee TW, Frigau L, Ma C (2025). “Heaping and seeping, GAITD regression and doubly constrained reduced-rank vector generalized linear models.” Annals of Applied Statistics, (in press), 1-25.

References

Fahrmeir, Ludwig, and Gerhard Tutz. 2001. Multivariate Statistical Modelling Based on Generalized Linear Models. Second Edition. Springer Series in Statistics. New York, NY: Springer. https://doi.org/10.1007/978-1-4757-3454-6.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.