Likelihood for Grouped and Ungrouped Multinomial Data

Differences Between Packages

Author

Dennis Leung

Published

March 12, 2026

Overview

Let \[ \mathbf{y}_i = (y_{i1}, \dots, y_{ik}), \qquad i = 1, \dots, n \] be i.i.d. random \(k\)-vectors distributed as \[ \mathbf{y}_i \sim \operatorname{Multinom}(1, \boldsymbol{\pi}). \] That is, each \(\mathbf{y}_i\) is a multinomially distributed random vector from a single trial, with class-probability \(k\)-vector \[ \boldsymbol{\pi} = ({\pi}_1, \dots, {\pi}_k). \] For any given \(\mathbf{y}_i\), exactly one of \(y_{i1}, \ldots, y_{ik}\) equals 1 and the rest equal 0.

There are two ways to view the likelihood of multinomial data:

  1. Ungrouped View: If we compute the likelihood of the data as is, we get: \[ L_{\text{u}}(\boldsymbol{\pi}) = \prod_{i=1}^{n} {\pi}_1^{y_{i1}} \cdots {\pi}_k^{y_{ik}} \] with log-likelihood: \[ \ell_{\text{u}}(\boldsymbol{\pi}) = \sum_{j=1}^{k} \log({\pi}_j) \sum_{i=1}^{n} y_{ij} \tag{1}\]

  2. Grouped View: Alternatively, we can group the \(\mathbf{y}_i\)’s as: \[ \mathbf{x} = \sum_{i=1}^{n} \mathbf{y}_i \] so \[ \mathbf{x} = (x_1, \ldots, x_k) = \biggl(\sum_i y_{i1}, \ldots, \sum_i y_{ik}\biggr) \sim \operatorname{Multinom}(n, \boldsymbol{\pi}), \] and treat \(\mathbf{x}\) as our raw data. In this view, the likelihood is: \[ L_{\text{g}}(\boldsymbol{\pi}) = \frac{n!}{x_1! \cdots x_k!} {\pi}_1^{x_1} \cdots {\pi}_k^{x_k} \] with log-likelihood: \[ \ell_{\text{g}}(\boldsymbol{\pi}) = { \log \biggl(\frac{n!}{x_1! \cdots x_k!} \biggr) + \sum_{j=1}^{k} \log({\pi}_j)\sum_{i=1}^{n} y_{ij} } \tag{2}\]

Comparing log-likelihood equations (1) and (2), we see that the grouped and ungrouped log-likelihoods differ by:

\[ \log\biggl(\frac{n!}{x_1! \cdots x_k!}\biggr) \]

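This term counts the distinct orderings of the individual observations that give rise to the same grouped counts. Because it does not depend on \(\boldsymbol{\pi}\), both views yield the same maximum likelihood estimate; only the reported value of the log-likelihood differs.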
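
The relationship between (1) and (2) can be checked numerically. The following is a minimal base-R sketch, not part of the original analysis: the sample size `n`, the probability vector `pi_vec`, and the seed are arbitrary illustrative choices. It simulates one-trial multinomial draws, groups them, and compares the two log-likelihoods.

```r
# Illustrative values only (assumptions, not taken from the C-section data)
set.seed(1)
n <- 20
pi_vec <- c(0.5, 0.3, 0.2)

# Ungrouped data: each column of y is one Multinom(1, pi) draw (a one-hot vector)
y <- rmultinom(n, size = 1, prob = pi_vec)

# Grouped data: category totals x_j = sum_i y_ij
x <- rowSums(y)

# Equation (1): ungrouped log-likelihood
loglik_u <- sum(x * log(pi_vec))

# Equation (2): grouped log-likelihood from the multinomial pmf
loglik_g <- dmultinom(x, size = n, prob = pi_vec, log = TRUE)

# Log multinomial coefficient, computed on the log scale
log_coef <- lfactorial(n) - sum(lfactorial(x))

# The gap between the two views should equal the log coefficient
all.equal(loglik_g - loglik_u, log_coef)
```

The final comparison uses `all.equal()` rather than `==` so that floating-point rounding does not produce a spurious mismatch.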

Example: C-Section Dataset

Code
library(conflicted)
library(readr)
library(VGAM)
library(nnet)

Consider the original wide format of the C-section dataset:

Code
caes_dat <- read_csv("data/Example3-1Caes.csv")
Rows: 7 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (7): size, noInf, Inf1, Inf2, NoPlan, Antib, RiskF

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
caes_dat
# A tibble: 7 × 7
   size noInf  Inf1  Inf2 NoPlan Antib RiskF
  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1    40    32     4     4      0     0     0
2    58    30    11    17      0     0     1
3     2     2     0     0      0     1     0
4    18    17     0     1      0     1     1
5     9     9     0     0      1     0     0
6    26     3    10    13      1     0     1
7    98    87     4     7      1     1     1

Using nnet::multinom()

We first fit a multinomial model with nnet::multinom(), which treats the count matrix as ungrouped data:

Code
caes_nnet <- nnet::multinom(
    cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
    data = caes_dat
)
# weights:  15 (8 variable)
initial  value 275.751684 
iter  10 value 161.068578
final  value 160.937147 
converged
Code
nnet_logLik <- caes_nnet$deviance / (-2)
nnet_logLik
[1] -160.9371

The reported log-likelihood is -160.937.

Using VGAM::vglm()

Next we fit the same model with VGAM::vglm(), which treats the data as grouped:

Code
Caes_vgam <- VGAM::vglm(
    formula = cbind(noInf, Inf1, Inf2) ~ NoPlan + Antib + RiskF,
    data = caes_dat,
    family = VGAM::multinomial(refLevel = "noInf")
)
vgam_logLik <- logLik(Caes_vgam)
vgam_logLik
[1] -20.88715

The reported log-likelihood is now -20.887, a very different number from the -160.937 reported by nnet::multinom().

Explanation

To understand the difference, we compute the sum of the “permutation factors” \[ \log\biggl(\frac{n!}{x_1! \cdots x_k!}\biggr) \] in (2) for the seven covariate groups:

Code
perm_fact_sum <- sum(log(
    factorial(caes_dat$size) /
        (factorial(caes_dat$noInf) *
            factorial(caes_dat$Inf1) *
            factorial(caes_dat$Inf2))
))
perm_fact_sum
[1] 140.05
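
The direct computation above works here because the largest group size is 98, but factorial() overflows to Inf in double precision for arguments above 170. A variant on the log scale avoids this. The sketch below is one possible way to write it, assuming the counts are re-entered by hand from the table printed earlier; the names `caes_counts`, `log_perm_fact`, and `perm_fact_sum_log` are illustrative and do not appear in the original code.

```r
# Counts copied from the printed table (one row per covariate group)
caes_counts <- data.frame(
    size  = c(40, 58, 2, 18, 9, 26, 98),
    noInf = c(32, 30, 2, 17, 9, 3, 87),
    Inf1  = c(4, 11, 0, 0, 0, 10, 4),
    Inf2  = c(4, 17, 0, 1, 0, 13, 7)
)

# log(n! / (x1! x2! x3!)) for each row, computed with log-factorials
log_perm_fact <- with(
    caes_counts,
    lfactorial(size) - lfactorial(noInf) - lfactorial(Inf1) - lfactorial(Inf2)
)

# Sum over the seven covariate groups
perm_fact_sum_log <- sum(log_perm_fact)
perm_fact_sum_log
```

Up to rounding, this should agree with the value of `perm_fact_sum` reported above.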

This is precisely the difference between the two log-likelihoods:

Code
vgam_logLik - nnet_logLik
[1] 140.05

Conclusion

For the original “wide format data”:

  • nnet::multinom() treats it as ungrouped data
  • VGAM::vglm() treats it as grouped data

Moral: One must be very careful when interpreting numbers reported by different packages. The same data can produce different values for what seems to be the same quantity.
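
One practical consequence is that comparisons made within a single convention are unaffected by the permutation factor, since it cancels in any difference of log-likelihoods evaluated on the same data. The base-R sketch below illustrates this under stated assumptions: it uses the counts from the second row of the table, and the two candidate probability vectors `pi_a` and `pi_b` are arbitrary choices for illustration, not fitted values from either package.

```r
# Counts from the second row of the printed table
x <- c(noInf = 30, Inf1 = 11, Inf2 = 17)
n <- sum(x)

# Two candidate probability vectors (illustrative assumptions)
pi_a <- x / n          # sample proportions
pi_b <- rep(1 / 3, 3)  # equal probabilities

# Ungrouped and grouped log-likelihoods as functions of the probabilities
loglik_u <- function(p) sum(x * log(p))
loglik_g <- function(p) dmultinom(x, size = n, prob = p, log = TRUE)

# Differences between the two candidates under each view
diff_u <- loglik_u(pi_a) - loglik_u(pi_b)
diff_g <- loglik_g(pi_a) - loglik_g(pi_b)

# The permutation factor cancels, so the two differences should agree
all.equal(diff_u, diff_g)
```

Mixing conventions is what causes trouble: subtracting a grouped log-likelihood from an ungrouped one leaves the permutation factor in the result.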