Likelihood for Grouped and Ungrouped Multinomial Data

Differences Between Packages

Author

Dennis Leung

Published

April 19, 2026

Overview

Let \[ \mathbf{y}_i = (y_{i1}, \dots, y_{ik}), \qquad i = 1, \dots, n \] be i.i.d. random \(k\)-vectors distributed as \[ \mathbf{y}_i \sim \operatorname{Multinom}(1, \boldsymbol{\pi}). \] That is, each \(\mathbf{y}_i\) is a multinomial random vector for a single trial, with class-probability \(k\)-vector \[ \boldsymbol{\pi} = ({\pi}_1, \dots, {\pi}_k), \qquad \sum_{j=1}^{k} {\pi}_j = 1. \] For any given \(\mathbf{y}_i\), exactly one of \(y_{i1}, \ldots, y_{ik}\) equals 1 and the rest equal 0.

There are two ways to view the likelihood of multinomial data:

  1. Ungrouped View: If we compute the likelihood of the data as is, we get: \[ L_{\text{u}}(\boldsymbol{\pi}) = \prod_{i=1}^{n} {\pi}_1^{y_{i1}} \cdots {\pi}_k^{y_{ik}} \] with log-likelihood: \[ \ell_{\text{u}}(\boldsymbol{\pi}) = \sum_{j=1}^{k} \log({\pi}_j) \sum_{i=1}^{n} y_{ij} \tag{1}\]

  2. Grouped View: Alternatively, since they are i.i.d., we can group all the \(\mathbf{y}_i\)’s as: \[ \tilde{\mathbf{y}} = \sum_{i=1}^{n} \mathbf{y}_i \] so \[ \tilde{\mathbf{y}} = ( \tilde{y}_1, \ldots, \tilde{y}_k) = \biggl(\sum_{i=1}^n y_{i1}, \ldots, \sum_{i=1}^n y_{ik}\biggr) \sim \operatorname{Multinom}(n, \boldsymbol{\pi}), \] and treat \(\tilde{\mathbf{y}}\) as our raw data. In this view, the likelihood is: \[ L_{\text{g}}(\boldsymbol{\pi}) = \frac{n!}{\tilde{y}_1! \cdots \tilde{y}_k!} {\pi}_1^{\tilde{y}_1} \cdots {\pi}_k^{\tilde{y}_k} \] with log-likelihood: \[ \ell_{\text{g}}(\boldsymbol{\pi}) = { \log \biggl(\frac{n!}{\tilde{y}_1! \cdots \tilde{y}_k!} \biggr) + \sum_{j=1}^{k} \log({\pi}_j)\sum_{i=1}^{n} y_{ij} } \tag{2}\]

Comparing log-likelihood equations (1) and (2), we see that the grouped and ungrouped log-likelihoods differ by:

\[ \log \biggl(\frac{n!}{\tilde{y}_1! \cdots \tilde{y}_k!} \biggr) \]

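This term is the log of the multinomial coefficient: it counts the distinct orderings of the \(n\) individual observations that give rise to the same grouped counts. It does not depend on \(\boldsymbol{\pi}\), so both views are maximised at the same \(\boldsymbol{\pi}\); only the reported value of the log-likelihood differs.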
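The gap between (1) and (2) can be checked numerically. The sketch below uses only base R; the probabilities `pi_true`, the sample size `n`, and the seed are arbitrary values chosen for illustration and are not taken from any dataset.

```r
# Illustrative values (assumptions, not from the C-section data)
set.seed(1)
pi_true <- c(0.5, 0.3, 0.2)
n <- 25

# Each column of y is one Multinom(1, pi) draw, i.e. a one-hot k-vector
y <- rmultinom(n, size = 1, prob = pi_true)

# Ungrouped view: sum the log-probabilities of the n single-trial outcomes
loglik_u <- sum(apply(y, 2, dmultinom, size = 1, prob = pi_true, log = TRUE))

# Grouped view: a single Multinom(n, pi) observation of the class totals
y_tilde <- rowSums(y)
loglik_g <- dmultinom(y_tilde, size = n, prob = pi_true, log = TRUE)

# Log multinomial coefficient: log(n! / (y_1! ... y_k!))
log_coef <- lfactorial(n) - sum(lfactorial(y_tilde))

# The two views should differ by exactly this constant
all.equal(loglik_g - loglik_u, log_coef)
```

Changing `pi_true` changes both log-likelihoods but should leave their difference equal to `log_coef`, since that term depends only on the counts.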

Example: C-Section Dataset

Code
library(conflicted)
library(readr)
library(VGAM)
library(nnet)

Consider the original wide format of the C-section dataset:

Code
caes_dat <- read_csv("data/Example3-1Caes.csv")
Rows: 7 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (7): size, noInf, Inf1, Inf2, NoPlan, Antib, RiskF

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
caes_dat
# A tibble: 7 × 7
   size noInf  Inf1  Inf2 NoPlan Antib RiskF
  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1    40    32     4     4      0     0     0
2    58    30    11    17      0     0     1
3     2     2     0     0      0     1     0
4    18    17     0     1      0     1     1
5     9     9     0     0      1     0     0
6    26     3    10    13      1     0     1
7    98    87     4     7      1     1     1
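For comparison, the same table can be written in an ungrouped "long" layout with one row per individual. The following is a rough base-R sketch: the data frame is retyped from the printout above so the block runs without the CSV file, and the names `caes_wide`, `caes_long`, and `outcome` are illustrative rather than part of the original analysis.

```r
# Wide-format counts, retyped from the printed tibble
caes_wide <- data.frame(
  size   = c(40, 58, 2, 18, 9, 26, 98),
  noInf  = c(32, 30, 2, 17, 9, 3, 87),
  Inf1   = c(4, 11, 0, 0, 0, 10, 4),
  Inf2   = c(4, 17, 0, 1, 0, 13, 7),
  NoPlan = c(0, 0, 0, 0, 1, 1, 1),
  Antib  = c(0, 0, 1, 1, 0, 0, 1),
  RiskF  = c(0, 1, 0, 1, 0, 1, 1)
)

outcomes <- c("noInf", "Inf1", "Inf2")

# For each outcome, repeat every covariate row as many times as its count
caes_long <- do.call(rbind, lapply(outcomes, function(lev) {
  idx <- rep(seq_len(nrow(caes_wide)), times = caes_wide[[lev]])
  data.frame(
    outcome = rep(lev, length(idx)),
    caes_wide[idx, c("NoPlan", "Antib", "RiskF")],
    row.names = NULL
  )
}))
caes_long$outcome <- factor(caes_long$outcome, levels = outcomes)

# One row per individual: the row count equals the total of the group sizes
nrow(caes_long) == sum(caes_wide$size)
table(caes_long$outcome)
```

In this layout each row corresponds to a single \(\mathbf{y}_i\), which is the ungrouped view of the data; the wide table is its grouped summary.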

Using nnet::multinom()

When we fit a multinomial model using nnet::multinom():

Code
caes_nnet <- nnet::multinom(
  cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
  data = caes_dat
)
# weights:  15 (8 variable)
initial  value 275.751684 
iter  10 value 161.068578
final  value 160.937147 
converged
Code
# multinom() stores the deviance as -2 * log-likelihood, so negate and halve it
nnet_logLik <- -deviance(caes_nnet) / 2
nnet_logLik
[1] -160.9371

The reported log-likelihood is -160.937.

Using VGAM::vglm()

When we fit using VGAM::vglm():

Code
# Same model as above; refLevel makes noInf the baseline category
caes_vgam <- VGAM::vglm(
  formula = cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
  data = caes_dat,
  family = VGAM::multinomial(refLevel = "noInf")
)
vgam_logLik <- logLik(caes_vgam)
vgam_logLik
[1] -20.88715

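The reported log-likelihood is now -20.887, a very different number from the -160.937 reported by nnet::multinom(), even though the data and the model formula are the same.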

Explanation

To understand the difference, we compute the “permutation factor” \[ \log \biggl(\frac{n!}{\tilde{y}_1! \cdots \tilde{y}_k!} \biggr) \] from (2) for each of the seven covariate groups, with \(n\) taken to be that group's size, and sum the results:

Code
# Work on the log scale with lfactorial() so large counts cannot overflow
perm_fact_sum <- sum(
  lfactorial(caes_dat$size) -
    lfactorial(caes_dat$noInf) -
    lfactorial(caes_dat$Inf1) -
    lfactorial(caes_dat$Inf2)
)
perm_fact_sum
[1] 140.05

To the precision shown, this is the difference between the two log-likelihoods:

Code
vgam_logLik - nnet_logLik
[1] 140.05
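Because the permutation factor does not involve \(\boldsymbol{\pi}\), the same gap should appear whatever probabilities are plugged in, not only at the fitted values. The sketch below illustrates this in base R. The counts are retyped from the table above; the helper names `loglik_ungrouped` and `loglik_grouped` and the two probability matrices are hypothetical choices for illustration, not output from either package.

```r
# Outcome counts for the seven covariate groups (columns: noInf, Inf1, Inf2)
counts <- cbind(
  noInf = c(32, 30, 2, 17, 9, 3, 87),
  Inf1  = c(4, 11, 0, 0, 0, 10, 4),
  Inf2  = c(4, 17, 0, 1, 0, 13, 7)
)

# Ungrouped log-likelihood: sum of count * log(probability), as in (1)
loglik_ungrouped <- function(y, p) sum(y * log(p))

# Grouped log-likelihood: one multinomial density per covariate group, as in (2)
loglik_grouped <- function(y, p) {
  sum(sapply(seq_len(nrow(y)), function(i) {
    dmultinom(y[i, ], prob = p[i, ], log = TRUE)
  }))
}

# Two arbitrary sets of class probabilities (each row sums to 1)
p_equal  <- matrix(1 / 3, nrow = 7, ncol = 3)
p_skewed <- matrix(c(0.7, 0.1, 0.2), nrow = 7, ncol = 3, byrow = TRUE)

gap_equal  <- loglik_grouped(counts, p_equal) - loglik_ungrouped(counts, p_equal)
gap_skewed <- loglik_grouped(counts, p_skewed) - loglik_ungrouped(counts, p_skewed)

round(c(gap_equal, gap_skewed), 2)
```

Both entries should reproduce the 140.05 computed above. As a side check, the ungrouped value at equal probabilities is 251 × log(1/3) ≈ -275.75, which is consistent with the `initial  value 275.751684` (a negative log-likelihood) printed by nnet::multinom().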

Conclusion

For data in the original wide format:

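  • nnet::multinom() treats it as ungrouped data, so its log-likelihood omits the permutation factor
  • VGAM::vglm() treats it as grouped data, so its log-likelihood includes the permutation factor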

Moral: One must be very careful when comparing numbers reported by different packages. The same data and the same model can produce different values for what seems to be the same quantity.

Package Citations and Session Information

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.7.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] splines   stats4    stats     graphics  grDevices datasets  utils    
[8] methods   base     

other attached packages:
[1] nnet_7.3-19      VGAM_1.1-14      readr_2.1.6      conflicted_1.2.0

loaded via a namespace (and not attached):
 [1] crayon_1.5.3      vctrs_0.7.1       cli_3.6.5         knitr_1.51       
 [5] rlang_1.1.7       xfun_0.56         renv_1.1.7        jsonlite_2.0.0   
 [9] bit_4.6.0         glue_1.8.0        htmltools_0.5.9   hms_1.1.4        
[13] rmarkdown_2.30    evaluate_1.0.5    tibble_3.3.1      tzdb_0.5.0       
[17] fastmap_1.2.0     yaml_2.3.12       lifecycle_1.0.5   memoise_2.0.1    
[21] compiler_4.4.2    pkgconfig_2.0.3   rstudioapi_0.18.0 digest_0.6.39    
[25] R6_2.6.1          tidyselect_1.2.1  parallel_4.4.2    vroom_1.7.0      
[29] pillar_1.11.1     magrittr_2.0.4    bit64_4.6.0-1     tools_4.4.2      
[33] cachem_1.1.0