library(conflicted)
library(readr)
library(VGAM)
library(nnet)
Differences Between Packages
Dennis Leung
April 19, 2026
Let \[ \mathbf{y}_i = (y_{i1}, \dots, y_{ik}), \qquad i = 1, \dots, n \] be i.i.d. random \(k\)-vectors distributed as \[ \mathbf{y}_i \sim \operatorname{Multinom}(1, \boldsymbol{\pi}). \] That is, each \(\mathbf{y}_i\) is a multinomial random vector from a single trial, with class-probability \(k\)-vector \[ \boldsymbol{\pi} = ({\pi}_1, \dots, {\pi}_k). \] For any given \(\mathbf{y}_i\), exactly one of \(y_{i1}, \ldots, y_{ik}\) equals 1 and the rest equal 0.
There are two ways to view the likelihood of multinomial data:
Ungrouped View: If we compute the likelihood of the data as is, we get: \[ L_{\text{u}}(\boldsymbol{\pi}) = \prod_{i=1}^{n} {\pi}_1^{y_{i1}} \cdots {\pi}_k^{y_{ik}} \] with log-likelihood: \[ \ell_{\text{u}}(\boldsymbol{\pi}) = \sum_{j=1}^{k} \log({\pi}_j) \sum_{i=1}^{n} y_{ij} \tag{1}\]
Grouped View: Alternatively, since the \(\mathbf{y}_i\) are i.i.d., we can sum them into a single count vector: \[ \tilde{\mathbf{y}} = \sum_{i=1}^{n} \mathbf{y}_i \] so \[ \tilde{\mathbf{y}} = ( \tilde{y}_1, \ldots, \tilde{y}_k) = \biggl(\sum_{i=1}^n y_{i1}, \ldots, \sum_{i=1}^n y_{ik}\biggr) \sim \operatorname{Multinom}(n, \boldsymbol{\pi}), \] and treat \(\tilde{\mathbf{y}}\) as our raw data. In this view, the likelihood is: \[ L_{\text{g}}(\boldsymbol{\pi}) = \frac{n!}{\tilde{y}_1! \cdots \tilde{y}_k!} {\pi}_1^{\tilde{y}_1} \cdots {\pi}_k^{\tilde{y}_k} \] with log-likelihood: \[ \ell_{\text{g}}(\boldsymbol{\pi}) = \log \biggl(\frac{n!}{\tilde{y}_1! \cdots \tilde{y}_k!} \biggr) + \sum_{j=1}^{k} \log({\pi}_j)\sum_{i=1}^{n} y_{ij} \tag{2}\]
Comparing log-likelihood equations (1) and (2), we see that the grouped and ungrouped log-likelihoods differ by:
\[ \log \biggl(\frac{n!}{\tilde{y}_1! \cdots \tilde{y}_k!} \biggr) \]
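This term is the log of the multinomial coefficient: it counts the distinct orderings of the \(n\) individual observations that give rise to the same grouped counts \(\tilde{\mathbf{y}}\). It does not depend on \(\boldsymbol{\pi}\), so both views lead to the same maximum likelihood estimate; only the reported value of the log-likelihood changes.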
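As a quick numerical check of (1) and (2), here is a minimal sketch in base R. The toy indicator matrix `y` and the probability vector `p` are made up for illustration and are not taken from the C-section dataset.

```r
# Hypothetical toy data: n = 6 one-trial multinomial draws over k = 3 classes.
y <- rbind(c(1, 0, 0), c(0, 1, 0), c(1, 0, 0),
           c(0, 0, 1), c(1, 0, 0), c(0, 1, 0))
p <- c(0.5, 0.3, 0.2)   # arbitrary class probabilities (an assumption)

# Ungrouped view, equation (1): sum the n one-trial log-probabilities.
ll_u <- sum(apply(y, 1, dmultinom, prob = p, log = TRUE))

# Grouped view, equation (2): a single multinomial with n trials
# evaluated at the column totals.
y_tilde <- colSums(y)
ll_g <- dmultinom(y_tilde, prob = p, log = TRUE)

# Log multinomial coefficient: log(n! / (y~_1! ... y~_k!)).
perm <- lfactorial(sum(y_tilde)) - sum(lfactorial(y_tilde))

c(ungrouped = ll_u, grouped = ll_g, difference = ll_g - ll_u, perm = perm)
```

The last two entries of the printed vector should agree, whatever value of `p` is plugged in.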
Consider the original wide format of the C-section dataset:
Rows: 7 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (7): size, noInf, Inf1, Inf2, NoPlan, Antib, RiskF
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 7 × 7
size noInf Inf1 Inf2 NoPlan Antib RiskF
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 40 32 4 4 0 0 0
2 58 30 11 17 0 0 1
3 2 2 0 0 0 1 0
4 18 17 0 1 0 1 1
5 9 9 0 0 1 0 0
6 26 3 10 13 1 0 1
7 98 87 4 7 1 1 1
When we fit a multinomial model using nnet::multinom():
# weights: 15 (8 variable)
initial value 275.751684
iter 10 value 161.068578
final value 160.937147
converged
[1] -160.9371
The reported log-likelihood is -160.937.
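The fitting code is folded away in the rendered page, so the following is only a sketch of a call that could produce such a fit. The data frame `csec` is typed in by hand from the table, and the model formula (three main effects, with the three outcome counts as a matrix response) is an assumption, chosen because it gives 8 free parameters as in the trace output.

```r
library(nnet)

# C-section counts, copied by hand from the wide-format table.
csec <- data.frame(
  size   = c(40, 58, 2, 18, 9, 26, 98),
  noInf  = c(32, 30, 2, 17, 9, 3, 87),
  Inf1   = c(4, 11, 0, 0, 0, 10, 4),
  Inf2   = c(4, 17, 0, 1, 0, 13, 7),
  NoPlan = c(0, 0, 0, 0, 1, 1, 1),
  Antib  = c(0, 0, 1, 1, 0, 0, 1),
  RiskF  = c(0, 1, 0, 1, 0, 1, 1)
)

# Assumed model: main effects only, count matrix as the response.
# trace = FALSE suppresses the iteration log.
fit_nnet <- multinom(cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
                     data = csec, trace = FALSE)

ll_nnet <- as.numeric(logLik(fit_nnet))
ll_nnet
```

If the assumed formula matches the one actually used, `ll_nnet` should be close to the value reported above.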
When we fit using VGAM::vglm():
[1] -20.88715
The reported log-likelihood is now -20.887, a different number!
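Again the fitting code is not shown, so this is a sketch under assumptions: `csec` is typed in by hand from the table, and the formula (main effects only, count matrix as the response) is a guess at the model behind the output above.

```r
library(VGAM)

# C-section counts, copied by hand from the wide-format table.
csec <- data.frame(
  size   = c(40, 58, 2, 18, 9, 26, 98),
  noInf  = c(32, 30, 2, 17, 9, 3, 87),
  Inf1   = c(4, 11, 0, 0, 0, 10, 4),
  Inf2   = c(4, 17, 0, 1, 0, 13, 7),
  NoPlan = c(0, 0, 0, 0, 1, 1, 1),
  Antib  = c(0, 0, 1, 1, 0, 0, 1),
  RiskF  = c(0, 1, 0, 1, 0, 1, 1)
)

# Assumed model: main effects only. Each row of the count matrix is
# treated as one grouped multinomial observation with `size` trials.
fit_vglm <- vglm(cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
                 family = multinomial, data = csec)

ll_vglm <- as.numeric(logLik(fit_vglm))
ll_vglm
```

Note that `VGAM::multinomial` uses the last response column as the baseline category by default; this changes the coefficients but not the maximized log-likelihood.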
To understand the difference, we compute the “permutation factor” \[ \log \biggl(\frac{n!}{\tilde{y}_1! \cdots \tilde{y}_k!} \biggr) \] in (2) for each of the seven covariate groups, and sum them:
[1] 140.05
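The sum above can be reproduced in base R along the following lines. This is a sketch rather than the original chunk; the count columns are typed in by hand from the table, and `log_perm` is a name introduced here for illustration.

```r
# Outcome counts per covariate group, copied by hand from the table.
csec <- data.frame(
  size  = c(40, 58, 2, 18, 9, 26, 98),
  noInf = c(32, 30, 2, 17, 9, 3, 87),
  Inf1  = c(4, 11, 0, 0, 0, 10, 4),
  Inf2  = c(4, 17, 0, 1, 0, 13, 7)
)

# Log multinomial coefficient for each group (row):
# log(size! / (noInf! * Inf1! * Inf2!)), computed on the log scale
# with lfactorial() to avoid overflow.
log_perm <- with(csec,
  lfactorial(size) - lfactorial(noInf) - lfactorial(Inf1) - lfactorial(Inf2))

round(log_perm, 3)   # one term per covariate group
sum(log_perm)        # total permutation factor
```

Groups whose observations all fall in one category (rows 3 and 5) contribute zero, since there is only one ordering of identical outcomes.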
This is precisely the difference between the two log-likelihoods: \[ -20.887 - (-160.937) = 140.05 \]
For the original “wide format data”:
nnet::multinom() treats it as ungrouped data
VGAM::vglm() treats it as grouped data
Moral: One must be very careful when interpreting numbers reported by different packages. The same data can produce different values for what seems to be the same quantity.
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.7.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Melbourne
tzcode source: internal
attached base packages:
[1] splines stats4 stats graphics grDevices datasets utils
[8] methods base
other attached packages:
[1] nnet_7.3-19 VGAM_1.1-14 readr_2.1.6 conflicted_1.2.0
loaded via a namespace (and not attached):
[1] crayon_1.5.3 vctrs_0.7.1 cli_3.6.5 knitr_1.51
[5] rlang_1.1.7 xfun_0.56 renv_1.1.7 jsonlite_2.0.0
[9] bit_4.6.0 glue_1.8.0 htmltools_0.5.9 hms_1.1.4
[13] rmarkdown_2.30 evaluate_1.0.5 tibble_3.3.1 tzdb_0.5.0
[17] fastmap_1.2.0 yaml_2.3.12 lifecycle_1.0.5 memoise_2.0.1
[21] compiler_4.4.2 pkgconfig_2.0.3 rstudioapi_0.18.0 digest_0.6.39
[25] R6_2.6.1 tidyselect_1.2.1 parallel_4.4.2 vroom_1.7.0
[29] pillar_1.11.1 magrittr_2.0.4 bit64_4.6.0-1 tools_4.4.2
[33] cachem_1.1.0