Multinomial Logistic Regression: Caesarian Dataset Analysis

Fahrmeir and Tutz (2001) Examples 3.1 and 3.11

Published

March 12, 2026

Overview

This script replicates the analyses from Examples 3.1 and 3.11 in Fahrmeir and Tutz (2001).

Packages and Data

Load all required packages and the prepared Caesar dataset. As always, we load conflicted to manage any function name conflicts across packages.

Code
library(conflicted)

Then we load the necessary packages for data manipulation and modelling.

Code
library(dplyr)
library(Fahrmeir)
library(glmnet)
Loading required package: Matrix
Loaded glmnet 4.1-10
Code
library(nnet)
library(tidyr)
library(readr)
library(VGAM)
Loading required package: stats4
Loading required package: splines

Example 3.1: Caesarian Birth Study (p. 73)

These are the same data we examined in the first script, but now we distinguish between the two infection types Inf1 and Inf2 instead of collapsing them into a single binary Inf variable. We import the ready-made dataset.

We model the infection type using a basic multinomial logit model with no infection (Noinf) as the reference category.

\[ \log\frac{P(\texttt{Inf1})}{P(\texttt{Noinf})} = \beta_{10} + \beta_{1N}\mathbb{1}_{\texttt{NoPlan}} + \beta_{1F}\mathbb{1}_{\texttt{RiskF}} + \beta_{1A}\mathbb{1}_{\texttt{Antib}} \] \[ \log\frac{P(\texttt{Inf2})}{P(\texttt{Noinf})} = \beta_{20} + \beta_{2N}\mathbb{1}_{\texttt{NoPlan}} + \beta_{2F}\mathbb{1}_{\texttt{RiskF}} + \beta_{2A}\mathbb{1}_{\texttt{Antib}} \]
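To see how these two equations determine all three category probabilities, here is a minimal base-R sketch; the coefficient values are hypothetical, chosen to be near the estimates fitted below:

```r
# Hypothetical coefficients (beta_10, beta_1N, beta_1F, beta_1A) and
# (beta_20, beta_2N, beta_2F, beta_2A) for the two non-reference categories:
beta1 <- c(-2.6, 1.2, 1.8, -3.5)
beta2 <- c(-2.6, 1.0, 2.2, -3.1)
z <- c(1, 1, 1, 0)   # intercept, NoPlan = 1, RiskF = 1, Antib = 0

# The two log-odds relative to Noinf:
eta <- c(Inf1 = sum(beta1 * z), Inf2 = sum(beta2 * z))

# Inverting the logit equations gives the category probabilities,
# which necessarily sum to 1:
probs <- c(Noinf = 1, exp(eta)) / (1 + sum(exp(eta)))
probs
```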

Code
caes_dat <- read_csv("data/Example3-1Caes.csv")
Rows: 7 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (7): size, noInf, Inf1, Inf2, NoPlan, Antib, RiskF

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
caes_dat
# A tibble: 7 × 7
   size noInf  Inf1  Inf2 NoPlan Antib RiskF
  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1    40    32     4     4      0     0     0
2    58    30    11    17      0     0     1
3     2     2     0     0      0     1     0
4    18    17     0     1      0     1     1
5     9     9     0     0      1     0     0
6    26     3    10    13      1     0     1
7    98    87     4     7      1     1     1

Instead of using the pre-prepared CSV file, we could also start from the original caesar dataset in the Fahrmeir package, and manipulate it into the required format for modeling.

Code
head(caesar)
  y  w noplan       factor       antib yl patco
1 3 87    not risk factors antibiotics  0     1
2 1  4    not risk factors antibiotics  1     1
3 2  7    not risk factors antibiotics  1     1
4 2 13    not risk factors     without  1     2
5 1 10    not risk factors     without  1     2
6 3  3    not risk factors     without  0     2

The code below:

  1. Converts the caesar data to a tibble and removes unused columns
  2. Removes any rows with zero counts
  3. Pivots from long format (one row per infection type) to wide format (one row per covariate combination)
  4. Renames infection type codes 1, 2, and 3 to Inf1, Inf2, and Noinf respectively
  5. Computes total counts and recodes factors to integers matching the textbook coding
Code
caesar_clean <- caesar |>
    as_tibble() |>
    select(-patco, -yl) |>
    dplyr::filter(w > 0) |>
    pivot_wider(
        names_from = y,
        values_from = w,
        values_fill = 0
    ) |>
    rename(Noinf = `3`, Inf1 = `1`, Inf2 = `2`) |>
    mutate(
        size = Noinf + Inf1 + Inf2,
        factor = as.integer(factor == "risk factors"),
        noplan = as.integer(noplan == "not"),
        antib = as.integer(antib == "antibiotics")
    ) |>
    arrange(noplan, antib, factor) |>
    relocate(size, Noinf, Inf1, Inf2, noplan, antib, factor)

caesar_clean
# A tibble: 7 × 7
   size Noinf  Inf1  Inf2 noplan antib factor
  <dbl> <dbl> <dbl> <dbl>  <int> <int>  <int>
1    40    32     4     4      0     0      0
2    58    30    11    17      0     0      1
3     2     2     0     0      0     1      0
4    18    17     0     1      0     1      1
5     9     9     0     0      1     0      0
6    26     3    10    13      1     0      1
7    98    87     4     7      1     1      1

Fit with multinom() from nnet

To fit the multinomial log-linear model we use the multinom() function from the nnet package, which is written by the authors of, and well documented in, Venables and Ripley (2002). To see more information about the multinom() function, you can refer to the documentation by running ?multinom in your R environment. It is especially useful to check the arguments controlling optimisation (maxit, trace, Hess) when reproducing textbook results.

The multinom() function uses the BFGS optimization method, a quasi-Newton method, to find the maximum likelihood estimates of the model parameters. This is typically quicker than the Fisher scoring algorithm, which requires computing the expected information matrix at each iteration.
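As an illustration (not nnet's internal implementation, which uses a neural-network parametrisation), we can minimise the same grouped multinomial negative log-likelihood directly with base R's optim() and BFGS; the counts and covariate patterns from the table above are hard-coded here:

```r
# Grouped counts (noInf, Inf1, Inf2) and the design matrix from the data above.
Y <- matrix(c(32,  4,  4,
              30, 11, 17,
               2,  0,  0,
              17,  0,  1,
               9,  0,  0,
               3, 10, 13,
              87,  4,  7), ncol = 3, byrow = TRUE)
X <- cbind(1,                         # intercept
           c(0, 0, 0, 0, 1, 1, 1),    # NoPlan
           c(0, 0, 1, 1, 0, 0, 1),    # Antib
           c(0, 1, 0, 1, 0, 1, 1))    # RiskF

# Negative log-likelihood of the baseline-category logit model;
# theta holds 8 coefficients, 4 per non-reference category.
negll <- function(theta) {
  B <- matrix(theta, nrow = 2, byrow = TRUE)  # 2 x 4 coefficient matrix
  eta <- X %*% t(B)                           # linear predictors for Inf1, Inf2
  P <- cbind(1, exp(eta)) / (1 + rowSums(exp(eta)))  # probs, noInf first
  -sum(Y * log(P))
}

fit <- optim(rep(0, 8), negll, method = "BFGS")
fit$value
```

At the starting point rep(0, 8) the objective is 251 log 3 ≈ 275.7517, matching the "initial value" printed by multinom() below, and the minimised value matches its "final value" of about 160.9371.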

We extract the response variables as a matrix for the multinom() function,

Code
caes_response <- caes_dat |>
    select(noInf, Inf1, Inf2) |>
    as.matrix()

caes_response
     noInf Inf1 Inf2
[1,]    32    4    4
[2,]    30   11   17
[3,]     2    0    0
[4,]    17    0    1
[5,]     9    0    0
[6,]     3   10   13
[7,]    87    4    7

and fit the model.

Code
caes_nnet <- multinom(
    caes_response ~ NoPlan + Antib + RiskF,
    data = caes_dat
)
# weights:  15 (8 variable)
initial  value 275.751684 
iter  10 value 161.068578
final  value 160.937147 
converged
Code
caes_nnet_summary <- summary(caes_nnet)
caes_nnet_summary
Call:
multinom(formula = caes_response ~ NoPlan + Antib + RiskF, data = caes_dat)

Coefficients:
     (Intercept)    NoPlan     Antib    RiskF
Inf1   -2.621012 1.1742513 -3.520249 1.829241
Inf2   -2.559917 0.9959762 -3.087160 2.195458

Std. Errors:
     (Intercept)    NoPlan     Antib     RiskF
Inf1   0.5567209 0.5213014 0.6717416 0.6023322
Inf2   0.5462995 0.4813634 0.5498675 0.5869601

Residual Deviance: 321.8743 
AIC: 337.8743 

The first column (noInf) of the response is treated as the reference category, as desired. We can print the coefficients in a format that matches the table in Fahrmeir and Tutz (2001), p. 75.

Code
round(t(coef(caes_nnet_summary)), 3)
              Inf1   Inf2
(Intercept) -2.621 -2.560
NoPlan       1.174  0.996
Antib       -3.520 -3.087
RiskF        1.829  2.195

We can also have a look at the deviance of the fitted model.

Code
caes_nnet$deviance
[1] 321.8743
Deviance Interpretation

Looking at the documentation of the multinom() function by running ?multinom, we see that the deviance reported by multinom() is \(-2 \times \text{log-likelihood}\) and has not been adjusted by the saturated model (or the scale parameter). This differs from the deviance reported by glm() and from our usual definition of deviance. Always check the documentation for how deviance is calculated.

Use attributes(caes_nnet) to see all available components of a model object. This reveals what elements you can access (like $deviance, $fitted.values, $residuals) and helps you understand the object’s structure. Similarly, str() shows the internal structure, and names() lists just the accessible components.

Code
attributes(caes_nnet)
$names
 [1] "n"             "nunits"        "nconn"         "conn"         
 [5] "nsunits"       "decay"         "entropy"       "softmax"      
 [9] "censored"      "value"         "wts"           "convergence"  
[13] "fitted.values" "residuals"     "call"          "terms"        
[17] "weights"       "deviance"      "rank"          "lab"          
[21] "coefnames"     "vcoefnames"    "xlevels"       "edf"          
[25] "AIC"          

$class
[1] "multinom" "nnet"    

This section is not examinable, but just for fun. glmnet is written by Jerome Friedman and Trevor Hastie, and is a popular package for regularized regression (that is, Lasso and Ridge regression). The codebase is in Fortran, and is very fast.

We can use glmnet to fit the same multinomial model by setting lambda = 0 (giving no weight to the regularisation step) to get the unregularized estimates. If you look up ?glmnet, pay attention to what it says about the symmetric multinomial model, which has a different parametrization; see the section on “Multinomial Regression” in the package vignette for its precise form. Because of this different parametrization, the returned coefficients won’t be the same as those from multinom().

Code
caes_predictors <- caes_dat |>
    select(NoPlan, Antib, RiskF) |>
    as.matrix()

caes_glmnet <- glmnet(
    x = caes_predictors,
    y = caes_response,
    family = "multinomial",
    lambda = 0
)

# Extract and display coefficients
coef(caes_glmnet, s = 0)
$noInf
4 x 1 sparse Matrix of class "dgCMatrix"
                   s=0
(Intercept)  1.7269999
NoPlan      -0.9960274
Antib        3.0871833
RiskF       -1.8290859

$Inf1
4 x 1 sparse Matrix of class "dgCMatrix"
                   s=0
(Intercept) -0.8940574
NoPlan       0.1784734
Antib       -0.4329985
RiskF        .        

$Inf2
4 x 1 sparse Matrix of class "dgCMatrix"
                   s=0
(Intercept) -0.8329425
NoPlan       .        
Antib        .        
RiskF        0.3663593

These coefficients correspond to the alphas in

\[ \tilde{u}_{ir} = u_{ir} - u_{ik} = \underbrace{(\alpha_{r0} - \alpha_{k0})}_{\beta_{r0}} + \mathbf{z}_i^T\underbrace{ (\boldsymbol{\alpha}_r - \boldsymbol{\alpha}_k) }_{\boldsymbol{\beta}_r} = \beta_{r0} + \mathbf{z}_i^T\boldsymbol{\beta}_r. \]

Therefore, we can retrieve the multinom() coefficients by taking differences of the glmnet alphas:

  1. Extract slope coefficients for each category and compute differences relative to the reference category
  2. Extract intercept coefficients and compute differences relative to the reference category intercept
Code
beta_matrix <- cbind(
    caes_glmnet$beta[[2]] - caes_glmnet$beta[[1]],
    caes_glmnet$beta[[3]] - caes_glmnet$beta[[1]]
)
beta_matrix <- rbind(
    caes_glmnet$a0[-1] - caes_glmnet$a0[1],
    beta_matrix
)
round(beta_matrix, 3)
4 x 2 sparse Matrix of class "dgCMatrix"
           s0     s0
       -2.621 -2.560
NoPlan  1.175  0.996
Antib  -3.520 -3.087
RiskF   1.829  2.195

These also match the coefficients in Fahrmeir and Tutz (2001) p. 75.

We can fit instead using the caesar data from the Fahrmeir package, setting preferred reference levels before modeling.

Code
caesar_prepared <- caesar |>
    as_tibble() |>
    mutate(
        y = relevel(y, ref = "3"),
        noplan = relevel(noplan, ref = "planned"),
        antib = relevel(antib, ref = "without"),
        factor = relevel(factor, ref = "without")
    )

caes_nnet_prepared <- multinom(
    y ~ noplan + antib + factor,
    data = caesar_prepared,
    weights = w
)
# weights:  15 (8 variable)
initial  value 275.751684 
iter  10 value 161.068578
final  value 160.937147 
converged
Code
summary(caes_nnet_prepared)
Call:
multinom(formula = y ~ noplan + antib + factor, data = caesar_prepared, 
    weights = w)

Coefficients:
  (Intercept) noplannot antibantibiotics factorrisk factors
1   -2.621012 1.1742513        -3.520249           1.829241
2   -2.559917 0.9959762        -3.087160           2.195458

Std. Errors:
  (Intercept) noplannot antibantibiotics factorrisk factors
1   0.5567209 0.5213014        0.6717416          0.6023322
2   0.5462995 0.4813634        0.5498675          0.5869601

Residual Deviance: 321.8743 
AIC: 337.8743 
Code
round(t(coef(caes_nnet_prepared)), 3)
                        1      2
(Intercept)        -2.621 -2.560
noplannot           1.174  0.996
antibantibiotics   -3.520 -3.087
factorrisk factors  1.829  2.195

This again matches the textbook.

We can also check the deviance of this fit, which should be the same as before since it’s the same model.

Code
caes_nnet_prepared$deviance
[1] 321.8743
Code
caes_nnet$deviance
[1] 321.8743

Explore the Fitted Values and Residuals

We now inspect the fitted values and residuals of the caes_nnet model. The fitted values are the predicted probabilities for each category.

Code
caes_nnet$fitted.values
      noInf        Inf1        Inf2
1 0.8695347 0.063240568 0.067224753
2 0.4656329 0.210951192 0.323415876
3 0.9943521 0.002140051 0.003507889
4 0.9568456 0.012827892 0.030326540
5 0.6922135 0.162899513 0.144886971
6 0.2300766 0.337273035 0.432650355
7 0.8855925 0.038416539 0.075990954

The residuals are the differences between the observed proportions and the fitted probabilities.

Code
caes_nnet$residuals
         noInf         Inf1         Inf2
1 -0.069534679  0.036759432  0.032775247
2  0.051608447 -0.021296019 -0.030312428
3  0.005647940 -0.002140051 -0.003507889
4 -0.012401124 -0.012827892  0.025229016
5  0.307786484 -0.162899513 -0.144886971
6 -0.114691994  0.047342349  0.067349645
7  0.002162595  0.002399788 -0.004562382

We can manually validate the residuals by computing the observed proportions and subtracting the fitted probabilities.

Code
caes_nnet_prop <- caes_dat |>
    mutate(
        noInf = noInf / size,
        Inf1 = Inf1 / size,
        Inf2 = Inf2 / size
    ) |>
    select(noInf, Inf1, Inf2)

caes_nnet_prop - caes_nnet$fitted
         noInf         Inf1         Inf2
1 -0.069534679  0.036759432  0.032775247
2  0.051608447 -0.021296019 -0.030312428
3  0.005647940 -0.002140051 -0.003507889
4 -0.012401124 -0.012827892  0.025229016
5  0.307786484 -0.162899513 -0.144886971
6 -0.114691994  0.047342349  0.067349645
7  0.002162595  0.002399788 -0.004562382

Given a new observation, we can either predict the most likely category (with type = "class") or the probabilities for each category (with type = "probs"). Checking ?predict.multinom is recommended here, because these two type options return different output scales and structures.

Code
new_case <- tibble(NoPlan = 1, Antib = 1, RiskF = 0)

predict(caes_nnet, newdata = new_case, type = "probs")
      noInf        Inf1        Inf2 
0.983753294 0.006850796 0.009395909 
Code
predict(caes_nnet, newdata = new_case, type = "class")
[1] noInf
Levels: noInf Inf1 Inf2

For a sanity check, we can also compute the predicted probabilities manually using the coefficients.

Code
coef_matrix <- coef(caes_nnet)
ratios <- exp(coef_matrix %*% c(1, 1, 1, 0))
prob_no_inf <- 1 / (sum(ratios) + 1)

prob_no_inf
[1] 0.9837533
Caution

The manually calculated probability for noInf agrees with the one returned by predict() up to the printed number of digits; any apparent discrepancy between 0.9837533 and 0.983753294 is only display rounding, not a difference in the underlying calculation.

The VGAM package will be used a lot more later in the course, because it fits a wide variety of multivariate response regression models. It is not as fast as glmnet or nnet for large datasets, because it uses the Fisher scoring algorithm (the same algorithm used in the book).

It is good practice to read the documentation for the vglm() function with ?vglm to understand the arguments and output structure. Also check ?multinomial, especially refLevel and parallel, since these choices directly affect model interpretation.

We use the raw data caes_dat extracted from the CSV file. multinomial() has a refLevel argument to specify the reference category.

Code
caes_vgam <- vglm(
    formula = cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
    family = multinomial(refLevel = "noInf"),
    data = caes_dat
)

Looking at the documentation for multinomial(), we can see that the default value of refLevel is "(Last)". This means that the last category in the response matrix is treated as the reference. In our case, this would be Inf2, which is not what we want. We ensure that noInf is treated as the reference category by setting refLevel = "noInf".

Code
summary(caes_vgam)

Call:
vglm(formula = cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF, 
    family = multinomial(refLevel = "noInf"), data = caes_dat)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  -2.6210     0.5567  -4.708 2.50e-06 ***
(Intercept):2  -2.5599     0.5463  -4.686 2.79e-06 ***
NoPlan:1        1.1742     0.5213   2.253 0.024289 *  
NoPlan:2        0.9960     0.4814   2.069 0.038539 *  
Antib:1        -3.5202     0.6717  -5.240 1.60e-07 ***
Antib:2        -3.0872     0.5499  -5.614 1.97e-08 ***
RiskF:1         1.8292     0.6023   3.037 0.002390 ** 
RiskF:2         2.1955     0.5870   3.740 0.000184 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])

Residual deviance: 11.8295 on 6 degrees of freedom

Log-likelihood: -20.8871 on 6 degrees of freedom

Number of Fisher scoring iterations: 5 


Reference group is level  1  of the response
Log-Likelihood Differences Between Packages

Remember that the deviance reported by nnet::multinom() is twice the negative log-likelihood. Therefore the log-likelihood from nnet can be computed as:

Code
-caes_nnet$deviance / 2
[1] -160.9371

This is significantly different from the log-likelihood reported by VGAM::vglm() for the same data and model. This is because different packages treat grouped vs. ungrouped multinomial data differently. See Appendix A2 for a detailed explanation of how this arises and why you must be careful when comparing results across packages.
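To make the relationship concrete, here is a hedged sketch in base R using the counts and fitted probabilities printed above: the nnet-style figure is \(-2\sum y \log \hat{p}\), while the grouped (glm-style) deviance reported by vglm() additionally compares against the saturated model, whose fitted cell probabilities are the observed proportions \(y/n\).

```r
# Counts and fitted probabilities copied from the output above.
Y <- matrix(c(32,  4,  4,   30, 11, 17,   2, 0, 0,   17, 0, 1,
               9,  0,  0,    3, 10, 13,  87, 4, 7), ncol = 3, byrow = TRUE)
P <- matrix(c(0.8695347, 0.063240568, 0.067224753,
              0.4656329, 0.210951192, 0.323415876,
              0.9943521, 0.002140051, 0.003507889,
              0.9568456, 0.012827892, 0.030326540,
              0.6922135, 0.162899513, 0.144886971,
              0.2300766, 0.337273035, 0.432650355,
              0.8855925, 0.038416539, 0.075990954), ncol = 3, byrow = TRUE)

# nnet-style deviance: -2 x log-likelihood, no saturated-model adjustment.
m2ll <- -2 * sum(Y * log(P))

# Grouped deviance: adjust by the saturated log-likelihood, where each
# cell's fitted probability is the observed proportion y/n (0 log 0 = 0).
n <- rowSums(Y)
sat <- sum(ifelse(Y > 0, Y * log(Y / n), 0))
dev_grouped <- m2ll + 2 * sat

round(c(m2ll = m2ll, dev_grouped = dev_grouped), 4)
```

The first number reproduces nnet's deviance of 321.8743, and the second reproduces vglm()'s grouped residual deviance of 11.8295.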

Extracting the coefficients from the vglm fit, we see they also match those in Fahrmeir and Tutz (2001) p. 75, and the fitted probabilities match those from multinom().

Code
round(coef(caes_vgam, matrix.out = TRUE), 3)
            log(mu[,2]/mu[,1]) log(mu[,3]/mu[,1])
(Intercept)             -2.621             -2.560
NoPlan                   1.174              0.996
Antib                   -3.520             -3.087
RiskF                    1.829              2.195
Code
fitted(caes_vgam)
      noInf        Inf1        Inf2
1 0.8695343 0.063240687 0.067224963
2 0.4656328 0.210951544 0.323415652
3 0.9943520 0.002140059 0.003507904
4 0.9568455 0.012827935 0.030326545
5 0.6922135 0.162899155 0.144887296
6 0.2300770 0.337272714 0.432650306
7 0.8855926 0.038416501 0.075990881

Example 3.11

We now move on to Example 3.11. Looking at the coefficients from the multinomial logit model, we can see that the coefficients for NoPlan and RiskF are very similar across the two infection types. Therefore we can constrain the coefficients for NoPlan and RiskF to be equal across infection types using the parallel argument.

\[ \log\frac{P(\texttt{Inf1})}{P(\texttt{Noinf})} = \beta_{10} + \beta_{N}\mathbb{1}_{\texttt{NoPlan}} + \beta_{F}\mathbb{1}_{\texttt{RiskF}} + \beta_{1A}\mathbb{1}_{\texttt{Antib}} \] \[ \log\frac{P(\texttt{Inf2})}{P(\texttt{Noinf})} = \beta_{20} + \beta_{N}\mathbb{1}_{\texttt{NoPlan}} + \beta_{F}\mathbb{1}_{\texttt{RiskF}} + \beta_{2A}\mathbb{1}_{\texttt{Antib}} \]

We run the regression, and extract the coefficients.

Code
caes_vgam_parallel <- vglm(
    formula = cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
    data = caes_dat,
    family = multinomial(
        refLevel = "noInf",
        parallel = TRUE ~ NoPlan + RiskF - 1
    )
)

coefvlm(caes_vgam_parallel, matrix.out = TRUE)
            log(mu[,2]/mu[,1]) log(mu[,3]/mu[,1])
(Intercept)          -2.751287          -2.443802
NoPlan                1.071967           1.071967
Antib                -3.494351          -3.108688
RiskF                 2.029896           2.029896

We can try running the vglm() function on the original caesar data we manipulated earlier from the Fahrmeir package. We don’t fit the parallel model here.

Code
vglm(
    formula = y ~ noplan + antib + factor,
    data = caesar_prepared,
    weights = w,
    family = multinomial(refLevel = "3")
)
Error in `if (max(abs(ycounts - round(ycounts))) > smallno) ...`:
! missing value where TRUE/FALSE needed

This error is due to the zero-count rows. We remove them and try again.

Code
caesar_no_zero_w <- caesar_prepared |> dplyr::filter(w > 0)
caesar_no_zero_w
# A tibble: 16 × 7
   y         w noplan  factor       antib          yl patco
   <fct> <dbl> <fct>   <fct>        <fct>       <dbl> <dbl>
 1 3        87 not     risk factors antibiotics     0     1
 2 1         4 not     risk factors antibiotics     1     1
 3 2         7 not     risk factors antibiotics     1     1
 4 2        13 not     risk factors without         1     2
 5 1        10 not     risk factors without         1     2
 6 3         3 not     risk factors without         0     2
 7 3         9 not     without      without         0     4
 8 2         1 planned risk factors antibiotics     1     5
 9 3        17 planned risk factors antibiotics     0     5
10 2        17 planned risk factors without         1     6
11 3        30 planned risk factors without         0     6
12 1        11 planned risk factors without         1     6
13 3         2 planned without      antibiotics     0     7
14 2         4 planned without      without         1     8
15 3        32 planned without      without         0     8
16 1         4 planned without      without         1     8
Code
levels(caesar_no_zero_w$noplan)
[1] "planned" "not"    
Code
levels(caesar_no_zero_w$factor)
[1] "without"      "risk factors"
Code
levels(caesar_no_zero_w$antib)
[1] "without"     "antibiotics"
Code
caes_vgam2 <- vglm(
    formula = y ~ noplan + antib + factor,
    data = caesar_no_zero_w,
    weights = w,
    family = multinomial(refLevel = "3")
)

summary(caes_vgam2)

Call:
vglm(formula = y ~ noplan + antib + factor, family = multinomial(refLevel = "3"), 
    data = caesar_no_zero_w, weights = w)

Coefficients: 
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept):1         -2.6210     0.5567  -4.708 2.50e-06 ***
(Intercept):2         -2.5599     0.5463  -4.686 2.79e-06 ***
noplannot:1            1.1742     0.5213   2.253 0.024288 *  
noplannot:2            0.9960     0.4814   2.069 0.038539 *  
antibantibiotics:1    -3.5202     0.6717  -5.241 1.60e-07 ***
antibantibiotics:2    -3.0872     0.5499  -5.614 1.97e-08 ***
factorrisk factors:1   1.8292     0.6023   3.037 0.002390 ** 
factorrisk factors:2   2.1955     0.5870   3.740 0.000184 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])

Residual deviance: 321.8743 on 24 degrees of freedom

Log-likelihood: -160.9371 on 24 degrees of freedom

Number of Fisher scoring iterations: 5 


Reference group is level  1  of the response

We can compare the coefficients from this fit to the earlier vglm fit on the wide-format data.

Code
coefvlm(caes_vgam2, matrix.out = TRUE)
                   log(mu[,2]/mu[,1]) log(mu[,3]/mu[,1])
(Intercept)                 -2.621010         -2.5599132
noplannot                    1.174247          0.9959748
antibantibiotics            -3.520248         -3.0871595
factorrisk factors           1.829241          2.1954542
Code
coefvlm(caes_vgam, matrix.out = TRUE)
            log(mu[,2]/mu[,1]) log(mu[,3]/mu[,1])
(Intercept)          -2.621010         -2.5599132
NoPlan                1.174247          0.9959748
Antib                -3.520248         -3.0871595
RiskF                 1.829241          2.1954542
Code
caes_vgam@criterion
$deviance
[1] 11.82954

$loglikelihood
[1] -20.88715
Code
caes_vgam2@criterion
$deviance
[1] 321.8743

$loglikelihood
[1] -160.9371

We can see that the deviance and log-likelihoods are different between the fits. The deviance of caes_vgam matches that in the book; the deviance of caes_vgam2, however, is just twice the negative log-likelihood of the fitted model.

In R, @ accesses slots of S4 objects, while $ accesses named elements in lists, data frames, and many S3 objects.

vglm() returns an S4 object, so criterion is stored as a slot, and we use caes_vgam@criterion rather than caes_vgam$criterion.
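A minimal illustration, with a hypothetical S4 class and a plain list:

```r
library(methods)  # provides setClass()/new(); loaded by default in most sessions

# A hypothetical S4 class with one slot:
setClass("Point", slots = c(x = "numeric"))
p_s4 <- new("Point", x = 1)
p_s4@x       # slot access on an S4 object uses @

# A plain list with one named element:
p_list <- list(x = 1)
p_list$x     # element access on a list uses $
```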

When fitting multinomial models, the data can be structured in either long or wide format. We can convert the data back into long format and confirm we still get the same fit.

The manipulation below:

  1. Removes the total-count column (size) because the long-form weight column will carry counts
  2. Pivots infection outcomes (noInf, Inf1, Inf2) into two columns: category name (inf_type) and count (w)
  3. Sets an explicit factor order for inf_type so the reference and comparisons are reproducible
Code
caesar_long <- caes_dat |>
    select(-size) |>
    pivot_longer(
        cols = noInf:Inf2,
        names_to = "inf_type",
        values_to = "w"
    ) |>
    mutate(inf_type = factor(inf_type, levels = c("noInf", "Inf1", "Inf2")))

caesar_long
# A tibble: 21 × 5
   NoPlan Antib RiskF inf_type     w
    <dbl> <dbl> <dbl> <fct>    <dbl>
 1      0     0     0 noInf       32
 2      0     0     0 Inf1         4
 3      0     0     0 Inf2         4
 4      0     0     1 noInf       30
 5      0     0     1 Inf1        11
 6      0     0     1 Inf2        17
 7      0     1     0 noInf        2
 8      0     1     0 Inf1         0
 9      0     1     0 Inf2         0
10      0     1     1 noInf       17
# ℹ 11 more rows
Code
caes_nnet_long <- multinom(
    inf_type ~ NoPlan + Antib + RiskF,
    data = caesar_long,
    weights = w
)
# weights:  15 (8 variable)
initial  value 275.751684 
iter  10 value 161.068578
final  value 160.937147 
converged
Code
coef(caes_nnet_long)
     (Intercept)    NoPlan     Antib    RiskF
Inf1   -2.621012 1.1742513 -3.520249 1.829241
Inf2   -2.559917 0.9959762 -3.087160 2.195458
Code
coef(caes_nnet_prepared)
  (Intercept) noplannot antibantibiotics factorrisk factors
1   -2.621012 1.1742513        -3.520249           1.829241
2   -2.559917 0.9959762        -3.087160           2.195458

Package citations and references

Code
packages <- c(
    "conflicted",
    "dplyr",
    "Fahrmeir",
    "glmnet",
    "nnet",
    "tidyr",
    "readr",
    "VGAM"
)
for (pkg in packages) {
    cit <- citation(pkg)

    # Check if multiple citations
    if (length(cit) > 1) {
        cat("- **", pkg, "**:\n", sep = "")
        for (i in seq_along(cit)) {
            cit_text <- format(cit[i], style = "text")
            cat("    - ", cit_text, "\n\n", sep = "")
        }
    } else {
        cit_text <- format(cit, style = "text")
        cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
    }
}
  • conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.

  • dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.

  • Fahrmeir: Halvorsen cbKB (2016). Fahrmeir: Data from the Book “Multivariate Statistical Modelling Based on Generalized Linear Models”, First Edition, by Ludwig Fahrmeir and Gerhard Tutz. R package version 2016.5.31.

  • glmnet:

    • Friedman J, Hastie T, Tibshirani R (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1-22. doi:10.18637/jss.v033.i01 https://doi.org/10.18637/jss.v033.i01.

    • Simon N, Friedman J, Hastie T, Tibshirani R (2011). “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software, 39(5), 1-13. doi:10.18637/jss.v039.i05 https://doi.org/10.18637/jss.v039.i05.

    • Tay JK, Narasimhan B, Hastie T (2023). “Elastic Net Regularization Paths for All Generalized Linear Models.” Journal of Statistical Software, 106(1), 1-31. doi:10.18637/jss.v106.i01 https://doi.org/10.18637/jss.v106.i01.

  • nnet: Venables WN, Ripley BD (2002). Modern Applied Statistics with S, Fourth edition. Springer, New York. ISBN 0-387-95457-0, https://www.stats.ox.ac.uk/pub/MASS4/.

  • tidyr: Wickham H, Vaughan D, Girlich M (2025). tidyr: Tidy Messy Data. R package version 1.3.2, https://github.com/tidyverse/tidyr, https://tidyr.tidyverse.org.

  • readr: Wickham H, Hester J, Bryan J (2025). readr: Read Rectangular Text Data. R package version 2.1.6, https://github.com/tidyverse/readr, https://readr.tidyverse.org.

  • VGAM:

    • Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.

    • Yee TW, Wild CJ (1996). “Vector Generalized Additive Models.” Journal of Royal Statistical Society, Series B, 58(3), 481-493.

    • Yee TW (2010). “The VGAM Package for Categorical Data Analysis.” Journal of Statistical Software, 32(10), 1-34. doi:10.18637/jss.v032.i10 https://doi.org/10.18637/jss.v032.i10.

    • Yee TW, Hadi AF (2014). “Row-column interaction models, with an R implementation.” Computational Statistics, 29(6), 1427-1445.

    • Yee TW (2025). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-14, https://CRAN.R-project.org/package=VGAM.

    • Yee TW (2013). “Two-parameter reduced-rank vector generalized linear models.” Computational Statistics and Data Analysis, 71, 889-902.

    • Yee TW, Stoklosa J, Huggins RM (2015). “The VGAM Package for Capture-Recapture Data Using the Conditional Likelihood.” Journal of Statistical Software, 65(5), 1-33. doi:10.18637/jss.v065.i05 https://doi.org/10.18637/jss.v065.i05.

    • Yee TW (2020). “The VGAM package for negative binomial regression.” Australian and New Zealand Journal of Statistics, 62(1), 116-131.

    • Yee TW (2022). “On the Hauck-Donner effect in Wald tests: Detection, tipping points and parameter space characterization.” Journal of the American Statistical Association, 117(540), 1763-1774.

    • Yee TW, Ma C (2024). “Generally altered, inflated, truncated and deflated regression.” Statistical Science, 39(4), 568-588.

    • Yee TW, Frigau L, Ma C (2025). “Heaping and seeping, GAITD regression and doubly constrained reduced-rank vector generalized linear models.” Annals of Applied Statistics, (in press), 1-25.

References

Fahrmeir, Ludwig, and Gerhard Tutz. 2001. Multivariate Statistical Modelling Based on Generalized Linear Models. Second Edition. Springer Series in Statistics. New York, NY: Springer. https://doi.org/10.1007/978-1-4757-3454-6.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.