Binomial Generalized Linear Models: Caesarian Birth Study

Fahrmeir and Tutz (2001) Examples 2.1 and 2.4

Published

March 12, 2026

Overview

The Fahrmeir & Tutz book is available to download for free from the University of Melbourne Library

This script reproduces Examples 2.1 and 2.4 from Fahrmeir and Tutz (2001) (pages 30 and 51). We consider the Caesarian birth study with 251 births, and the binary response variable \(y\), which is 1 for infection and 0 for no infection.

There are 3 categorical variables:

  1. NOPLAN: whether or not the Caesarian was planned
  2. FACTOR: whether there are risk factors present or not
  3. ANTIB: whether antibiotics were given to prevent infection (rather than treat infection)

We fit the logistic regression model \[ \ln\biggl(\frac{\Pr(\text{infection})}{\Pr(\text{no infection})}\biggr) = \beta_{0} + \beta_{1}\texttt{NOPLAN} + \beta_{2}\texttt{FACTOR} +\beta_{3}\texttt{ANTIB} \] to the data.

We want to replicate the following table from page 30:

Covariate   Effect
1            -1.89
NOPLAN        1.07
FACTOR        2.03
ANTIB        -3.25
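To read these effects on the probability scale, apply the inverse logit. A quick sketch using the rounded coefficients from the table above (plogis() is base R's inverse-logit function):

```r
# Rounded coefficients from the table above.
b0 <- -1.89; b_noplan <- 1.07; b_factor <- 2.03; b_antib <- -3.25

# Unplanned Caesarian with risk factors present, no antibiotics:
plogis(b0 + b_noplan + b_factor) # about 0.77

# Same case, but with prophylactic antibiotics:
plogis(b0 + b_noplan + b_factor + b_antib) # about 0.12
```

Antibiotics cut the infection probability dramatically, consistent with the large negative ANTIB effect.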

Data Setup

We start by loading the Caesar dataset from the Fahrmeir package. If you don’t already have the packages installed, run

Code
install.packages(c("conflicted", "dplyr", "Fahrmeir"))

The conflicted package forces you to choose which package a function should come from when it is exported by two or more loaded packages. This is especially important when reading legacy code from someone else who may have had different packages attached.
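For example, after loading conflicted, an ambiguous call such as filter() (defined in both dplyr and stats) raises an error until you state which one you mean. A small sketch (conflicts_prefer() is part of conflicted 1.2.0):

```r
library(conflicted)
library(dplyr)

# filter() exists in both dplyr and stats. With conflicted attached, a bare
# filter() call errors and lists the clashing packages. Declare a winner:
conflicts_prefer(dplyr::filter)

# Alternatively, fully qualify each call; this never needs a preference:
dplyr::filter(mtcars, cyl == 4)
```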

Code
library(conflicted) # avoid calling functions in multiple libraries
library(dplyr) # manipulate dataframes
library(forcats) # deal with categorical data (factors)
library(Fahrmeir) # load datasets from the book

We load the caesar dataset (from the Fahrmeir package), and inspect the top few rows using head().

Code
data(caesar)
head(caesar)
  y  w noplan       factor       antib yl patco
1 3 87    not risk factors antibiotics  0     1
2 1  4    not risk factors antibiotics  1     1
3 2  7    not risk factors antibiotics  1     1
4 2 13    not risk factors     without  1     2
5 1 10    not risk factors     without  1     2
6 3  3    not risk factors     without  0     2

You can inspect information about the data set by running the following command in RStudio

Code
?caesar

Data Transformation

The raw caesar dataset has one row per observed outcome-and-covariate pattern (24 rows), with the column w counting the births in each pattern (251 births in total). For binomial regression, we need to aggregate the data: one row per unique combination of covariates (noplan, factor, antib), with columns for total cases and infection counts in each group.

First, convert to a tibble for clearer printing and easier data manipulation. Note that tibbles print the data type of each column.

Code
caesar_tbl <- as_tibble(caesar)
caesar_tbl
# A tibble: 24 × 7
   y         w noplan factor       antib          yl patco
   <fct> <dbl> <fct>  <fct>        <fct>       <dbl> <dbl>
 1 3        87 not    risk factors antibiotics     0     1
 2 1         4 not    risk factors antibiotics     1     1
 3 2         7 not    risk factors antibiotics     1     1
 4 2        13 not    risk factors without         1     2
 5 1        10 not    risk factors without         1     2
 6 3         3 not    risk factors without         0     2
 7 2         0 not    without      antibiotics     1     3
 8 3         0 not    without      antibiotics     0     3
 9 1         0 not    without      antibiotics     1     3
10 2         0 not    without      without         1     4
# ℹ 14 more rows

Grouping and Summarizing

Now we use a pipe (|>) to chain operations together. Think of the pipe as saying “and then…”:

  • Take caesar_tbl and then
  • Group rows by covariate combinations and then
  • Count totals and infections per group
Code
caesar_tbl <- caesar_tbl |>
    group_by(noplan, factor, antib) |>
    summarize(
        size = sum(w),
        Infect_count = sum(yl * w),
        .groups = "drop"
    )
caesar_tbl
# A tibble: 8 × 5
  noplan  factor       antib        size Infect_count
  <fct>   <fct>        <fct>       <dbl>        <dbl>
1 not     risk factors antibiotics    98           11
2 not     risk factors without        26           23
3 not     without      antibiotics     0            0
4 not     without      without         9            0
5 planned risk factors antibiotics    18            1
6 planned risk factors without        58           28
7 planned without      antibiotics     2            0
8 planned without      without        40            8

Setting .groups = "drop" removes the grouping after summarize(), giving us a normal tibble, and avoiding confusing behaviour in later operations.

The third row has size zero, so we remove it (and any other empty covariate combinations).

Code
caesar_tbl <- caesar_tbl |>
    dplyr::filter(size != 0)

caesar_tbl
# A tibble: 7 × 5
  noplan  factor       antib        size Infect_count
  <fct>   <fct>        <fct>       <dbl>        <dbl>
1 not     risk factors antibiotics    98           11
2 not     risk factors without        26           23
3 not     without      without         9            0
4 planned risk factors antibiotics    18            1
5 planned risk factors without        58           28
6 planned without      antibiotics     2            0
7 planned without      without        40            8

The following code is the equivalent unpiped version.

Code
# Step 1: Group by covariates
caesar_tbl <- group_by(caesar_tbl, noplan, factor, antib)

# Step 2: Compute totals per group
caesar_tbl <- summarize(
    caesar_tbl,
    size = sum(w),
    Infect_count = sum(yl * w),
    .groups = "drop"
)

# Step 3: Remove empty cells
caesar_tbl <- dplyr::filter(caesar_tbl, size != 0)

Understanding Factor Coding

It is good practice to understand the structure of the data we have. From the tibble above we can see that we have 7 observations of 5 variables. The benefit of tibbles is that we can see the class of each variable: noplan, factor, and antib have class factor, while size and Infect_count have class double (numeric).

Inspect the coding scheme of each variable of type “factor” using the function contrasts().

Code
contrasts(caesar_tbl$noplan)
        planned
not           0
planned       1
Code
contrasts(caesar_tbl$factor)
             without
risk factors       0
without            1
Code
contrasts(caesar_tbl$antib)
            without
antibiotics       0
without           1

We can see that these are coded in the exact opposite way to the one F & T describe, so we need to redefine the reference levels of these variables. The reference level (the first level of each factor) becomes the baseline category in the regression coefficients.

Code
caesar_tbl <- caesar_tbl |>
    mutate(
        noplan = fct_relevel(noplan, "planned"),
        factor = fct_relevel(factor, "without"),
        antib = fct_relevel(antib, "without")
    )

contrasts(caesar_tbl$noplan)
        not
planned   0
not       1
Code
contrasts(caesar_tbl$factor)
             risk factors
without                 0
risk factors            1
Code
contrasts(caesar_tbl$antib)
            antibiotics
without               0
antibiotics           1
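These contrasts are exactly what glm() uses, via model.matrix(), to build 0/1 design columns: the non-reference level gets a 1. A self-contained sketch with a toy factor mimicking noplan (the variable name x is illustrative only):

```r
# Toy factor with "planned" as the reference (first) level,
# matching the releveling above.
x <- factor(c("planned", "not", "not"), levels = c("planned", "not"))

# The dummy column "xnot" is 1 exactly where x == "not": rows coded 0, 1, 1.
model.matrix(~ x)[, "xnot"]
```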

Main Effects Model

Now that our dataset looks good, we are ready to run the regression. Using Infect_count / size as our \(y\) gives the proportion of women with an infection in each covariate group. With grouped data, the weight for each row must equal the number of cases in that group, hence weights = size.

Fit the regression with the logit link (the default for the binomial family). For binomial and Poisson families the dispersion is fixed at 1, so the scaled and unscaled deviances coincide.

Code
caesar_glm_fct <- glm(
    Infect_count / size ~ antib + factor + noplan,
    family = binomial,
    data = caesar_tbl,
    weights = size
)

caesar_glm_fct

Call:  glm(formula = Infect_count/size ~ antib + factor + noplan, family = binomial, 
    data = caesar_tbl, weights = size)

Coefficients:
       (Intercept)    antibantibiotics  factorrisk factors           noplannot  
            -1.893              -3.254               2.030               1.072  

Degrees of Freedom: 6 Total (i.e. Null);  3 Residual
Null Deviance:      83.49 
Residual Deviance: 11   AIC: 36.18

These results line up with the ones in the book (yay).
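In case the proportion-plus-weights form looks unfamiliar: an equivalent, standard glm() specification uses a two-column response of successes and failures. A self-contained sketch, re-entering the aggregated counts from the table above so it can run on its own:

```r
# Aggregated counts keyed in from the table above (7 non-empty cells),
# with reference levels set as in the main text.
caesar_counts <- data.frame(
    noplan = factor(
        c("not", "not", "not", "planned", "planned", "planned", "planned"),
        levels = c("planned", "not")
    ),
    factor = factor(
        c("risk factors", "risk factors", "without",
          "risk factors", "risk factors", "without", "without"),
        levels = c("without", "risk factors")
    ),
    antib = factor(
        c("antibiotics", "without", "without",
          "antibiotics", "without", "antibiotics", "without"),
        levels = c("without", "antibiotics")
    ),
    size = c(98, 26, 9, 18, 58, 2, 40),
    Infect_count = c(11, 23, 0, 1, 28, 0, 8)
)

# cbind(successes, failures) as the response replaces both the
# proportion response and the weights argument.
fit_cbind <- glm(
    cbind(Infect_count, size - Infect_count) ~ antib + factor + noplan,
    family = binomial,
    data = caesar_counts
)
round(coef(fit_cbind), 3) # same estimates as caesar_glm_fct above
```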

Alternatively, we can recode the factors as 0/1 integers, so that the dataframe matches the book’s notation exactly.

Code
caesar_tbl_int <- caesar_tbl |>
    mutate(
        noplan = as.integer(noplan == "not"),
        factor = as.integer(factor == "risk factors"),
        antib = as.integer(antib == "antibiotics")
    )

caesar_tbl_int
# A tibble: 7 × 5
  noplan factor antib  size Infect_count
   <int>  <int> <int> <dbl>        <dbl>
1      1      1     1    98           11
2      1      1     0    26           23
3      1      0     0     9            0
4      0      1     1    18            1
5      0      1     0    58           28
6      0      0     1     2            0
7      0      0     0    40            8
Code
caesar_tbl
# A tibble: 7 × 5
  noplan  factor       antib        size Infect_count
  <fct>   <fct>        <fct>       <dbl>        <dbl>
1 not     risk factors antibiotics    98           11
2 not     risk factors without        26           23
3 not     without      without         9            0
4 planned risk factors antibiotics    18            1
5 planned risk factors without        58           28
6 planned without      antibiotics     2            0
7 planned without      without        40            8

We can see that we get the same results for the integer version

Code
caesar_int_glm <- glm(
    Infect_count / size ~ antib + factor + noplan,
    family = binomial,
    data = caesar_tbl_int,
    weights = size
)

caesar_int_glm

Call:  glm(formula = Infect_count/size ~ antib + factor + noplan, family = binomial, 
    data = caesar_tbl_int, weights = size)

Coefficients:
(Intercept)        antib       factor       noplan  
     -1.893       -3.254        2.030        1.072  

Degrees of Freedom: 6 Total (i.e. Null);  3 Residual
Null Deviance:      83.49 
Residual Deviance: 11   AIC: 36.18

We can see the results more fully by using summary()

Code
summary(caesar_int_glm)

Call:
glm(formula = Infect_count/size ~ antib + factor + noplan, family = binomial, 
    data = caesar_tbl_int, weights = size)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.8926     0.4124  -4.589 4.45e-06 ***
antib        -3.2544     0.4813  -6.761 1.37e-11 ***
factor        2.0299     0.4553   4.459 8.25e-06 ***
noplan        1.0720     0.4254   2.520   0.0117 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 83.491  on 6  degrees of freedom
Residual deviance: 10.997  on 3  degrees of freedom
AIC: 36.178

Number of Fisher Scoring iterations: 5

It is important to understand how glm() reports the summary. We can see that the reported deviance is 10.997. This reported deviance is what we call the unscaled deviance. In the binomial and Poisson cases, the scaled deviance will be the same as the unscaled deviance.

Note

The documentation of glm() about what it is reporting for the deviance may be misleading, because it ignores the fact that deviance only becomes twice the log-likelihood difference when rescaled by the dispersion.

We can now consider the significance of each of the variables

Code
anova(caesar_int_glm, test = "Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: Infect_count/size

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                       6     83.491              
antib   1   38.736         5     44.756 4.852e-10 ***
factor  1   26.881         4     17.874 2.164e-07 ***
noplan  1    6.878         3     10.997  0.008728 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This table closely matches the one in the notes. All three effects are highly significant.

Code
anova(caesar_int_glm, test = "F")
Warning in anova.glm(caesar_int_glm, test = "F"): using F test with a
'binomial' family is inappropriate
Analysis of Deviance Table

Model: binomial, link: logit

Response: Infect_count/size

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev       F    Pr(>F)    
NULL                       6     83.491                      
antib   1   38.736         5     44.756 38.7358 4.852e-10 ***
factor  1   26.881         4     17.874 26.8811 2.164e-07 ***
noplan  1    6.878         3     10.997  6.8777  0.008728 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For binomial GLMs, the natural test for nested models is based on differences in deviance, which asymptotically follows a chi-square distribution. The F test is appropriate for Gaussian linear models with unknown dispersion (and certain quasi-likelihood cases), but not for binomial GLMs with fixed dispersion.
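The same deviance-difference reasoning applies to any nested comparison. As a sketch, using the rounded deviances printed in the summary above, we can test all three main effects jointly against the null model:

```r
# Null deviance 83.491 on 6 df vs residual deviance 10.997 on 3 df.
# The drop in deviance is chi-squared with 6 - 3 = 3 df under the null.
pchisq(83.491 - 10.997, df = 6 - 3, lower.tail = FALSE) # effectively zero
```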

Model Diagnostics

We can extract residuals of different types; here we collect them into a single tibble and round to three decimals for readability.

Code
tibble(
    deviance = resid(caesar_int_glm, type = "deviance"),
    pearson = resid(caesar_int_glm, type = "pearson"),
    working = resid(caesar_int_glm, type = "working"),
    response = resid(caesar_int_glm, type = "response")
)
# A tibble: 7 × 4
  deviance pearson working response
     <dbl>   <dbl>   <dbl>    <dbl>
1  -0.0716 -0.0714 -0.0226 -0.00230
2   1.50    1.39    0.647   0.114  
3  -2.56   -1.99   -1.44   -0.306  
4   0.265   0.277   0.324   0.0131 
5  -0.785  -0.786  -0.207  -0.0515 
6  -0.152  -0.108  -1.01   -0.00578
7   1.22    1.29    0.607   0.0691 

As a sanity check, summing the squares of the deviance residuals should recover the reported deviance.

Code
sum(resid(caesar_int_glm, type = "deviance")^2)
[1] 10.99674
Code
deviance(caesar_int_glm)
[1] 10.99674

We can also test the goodness of fit using the deviance and the Pearson statistic; both converge to the same asymptotic chi-squared distribution. If we assume that the number of observations in each group is sufficiently large, the degrees of freedom of the chi-squared distribution are the difference between the number of parameters in the saturated model (7, make sure you can explain why) and the number of parameters in the fitted glm model (4), giving 3.

Both the deviance test

Code
1 - pchisq(caesar_int_glm$deviance, df = 3)
[1] 0.01174351

and the Pearson test

Code
1 - pchisq(sum(resid(caesar_int_glm, type = "pearson")^2), df = 3)
[1] 0.04069089

return values under 0.05, and so under both tests (assuming the asymptotics apply) we reject the null hypothesis that the model adequately explains the data, at significance level 0.05.

We can investigate why this might be by comparing the observed and fitted proportions side by side, as at the top of p.52 of the book:

Code
tibble(
    observed = round(caesar_tbl_int$Infect_count / caesar_tbl_int$size, 2),
    fitted = round(caesar_int_glm$fitted.values, 2)
)
# A tibble: 7 × 2
  observed fitted
     <dbl>  <dbl>
1     0.11   0.11
2     0.88   0.77
3     0      0.31
4     0.06   0.04
5     0.48   0.53
6     0      0.01
7     0.2    0.13

We can see that this lines up with what we have in the book (Example 2.4), with a significant discrepancy in the 3rd row (unplanned, no risk factors, without antibiotics).

Interaction Effects

We now follow Example 2.4 and add in interaction effects.

Adding noplan:antib Interaction

We test the interaction between noplan (planned/emergency surgery) and antib (antibiotic use), aiming to replicate the results in the book.

The residual deviance cannot increase, since the main-effects model is nested within this larger model.

Code
caesar_glm_noplan_antib <- glm(
    Infect_count / size ~ factor + noplan * antib,
    data = caesar_tbl_int,
    weights = size,
    family = binomial
)

anova(caesar_glm_noplan_antib)
Analysis of Deviance Table

Model: binomial, link: logit

Response: Infect_count/size

Terms added sequentially (first to last)

             Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                             6     83.491              
factor        1    5.483         5     78.008   0.01920 *  
noplan        1    3.745         4     74.263   0.05297 .  
antib         1   63.267         3     10.997 1.805e-15 ***
noplan:antib  1    0.078         2     10.918   0.77949    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The change in deviance agrees with the numbers in Example 2.4.

To test the significance of this interaction effect, we use the limiting chi-squared distribution with one degree of freedom.

Tip

Make sure you know why the degree of freedom is one.

We extract the deviance for the interaction term and compute the \(p\)-value manually:

Code
anova_result <- anova(caesar_glm_noplan_antib)
interaction_deviance <- tail(anova_result$Deviance, 1)
interaction_deviance
[1] 0.07838813
Code
1 - pchisq(interaction_deviance, df = 1)
[1] 0.7794938

We do not reject the null hypothesis that the interaction effect is zero.

Adding noplan:factor Interaction

This is not the only possible interaction effect. We now try the interaction effect noplan:factor.

Code
caesar_glm_noplan_factor <- glm(
    Infect_count / size ~ noplan + factor + antib + noplan:factor,
    data = caesar_tbl_int,
    weights = size,
    family = binomial
)

# equivalently, you can do
# caesar_glm_noplan_factor <- glm(Infect_count/size ~ antib + noplan*factor,
#                     data = caesar_tbl_int, weights = size, family = binomial)

summary(caesar_glm_noplan_factor)

Call:
glm(formula = Infect_count/size ~ noplan + factor + antib + noplan:factor, 
    family = binomial, data = caesar_tbl_int, weights = size)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -1.3880     0.3952  -3.512 0.000444 ***
noplan          -20.6379 12262.9882  -0.002 0.998657    
factor            1.3622     0.4729   2.881 0.003968 ** 
antib            -3.8294     0.6025  -6.355 2.08e-10 ***
noplan:factor    22.4866 12262.9882   0.002 0.998537    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 83.49142  on 6  degrees of freedom
Residual deviance:  0.95499  on 2  degrees of freedom
AIC: 28.136

Number of Fisher Scoring iterations: 19

We see enormous estimates and standard errors for some coefficients. The standard errors are the square roots of the diagonal of the inverse Fisher information matrix, so such huge values mean the corresponding entries of the Fisher information are very small. The log-likelihood therefore has a very small second derivative, and is very flat, in these directions.

This means that the estimates are likely to be very unreliable. Read the book in detail to understand why, but in summary, there are two rows with no infection counts, which means that certain conditions are not satisfied. See the section Uniqueness and existence of MLE’s (p.43 of textbook) for more information.
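To see the link between standard errors and the inverse Fisher information concretely: for a glm, vcov() returns the inverse Fisher information evaluated at the estimates (times the dispersion, which is 1 here), and summary()’s “Std. Error” column is the square root of its diagonal. A self-contained sketch on a base R dataset (mtcars, chosen purely for illustration):

```r
# Logistic regression of transmission type on weight, base R data.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# summary()'s "Std. Error" column equals sqrt of the diagonal of vcov(fit).
se_from_vcov <- sqrt(diag(vcov(fit)))
se_reported <- coef(summary(fit))[, "Std. Error"]
all.equal(se_from_vcov, se_reported) # TRUE
```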

The book says ‘The reason is that data are too sparse with responses for the cell NOPLAN = 0, FACTOR = 0 and ANTIB = 1 all in the “yes” category, and for the cell NOPLAN = 1, FACTOR= 0 and ANTIB = 0 all in the “no” category.’ however, there is a typo. The first “yes” should instead say “no”.

If we ignore the problem for the time being, we can continue by testing goodness-of-fit and then comparing the fitted and realized proportions, as on p.53.

Code
1 - pchisq(sum(resid(caesar_glm_noplan_factor, type = "pearson")^2), df = 2)
[1] 0.5293846
Code
1 - pchisq(sum(resid(caesar_glm_noplan_factor, type = "deviance")^2), df = 2)
[1] 0.6203355

Both statistics give a large \(p\)-value, so we do not reject the model that includes the interaction.

Code
tibble(
    observed = round(caesar_tbl_int$Infect_count / caesar_tbl_int$size, 2),
    fitted = round(caesar_glm_noplan_factor$fitted.values, 2)
)
# A tibble: 7 × 2
  observed fitted
     <dbl>  <dbl>
1     0.11   0.12
2     0.88   0.86
3     0      0   
4     0.06   0.02
5     0.48   0.49
6     0      0.01
7     0.2    0.2 

We can see that the observed and fitted values are very similar.

Handling Separation Issues

The book recommends that, to make the analysis more robust, we add fictitious data to the zero-infection cells that cause the MLE existence issue. We do this by adding a ‘count’ of 0.5 infections and 1 extra case to the problematic third row.

Code
caesar_tbl_int
# A tibble: 7 × 5
  noplan factor antib  size Infect_count
   <int>  <int> <int> <dbl>        <dbl>
1      1      1     1    98           11
2      1      1     0    26           23
3      1      0     0     9            0
4      0      1     1    18            1
5      0      1     0    58           28
6      0      0     1     2            0
7      0      0     0    40            8
Code
caesar_augment <- caesar_tbl_int
caesar_augment$Infect_count[3] <- 0.5
caesar_augment$size[3] <- caesar_augment$size[3] + 1

caesar_augment
# A tibble: 7 × 5
  noplan factor antib  size Infect_count
   <int>  <int> <int> <dbl>        <dbl>
1      1      1     1    98         11  
2      1      1     0    26         23  
3      1      0     0    10          0.5
4      0      1     1    18          1  
5      0      1     0    58         28  
6      0      0     1     2          0  
7      0      0     0    40          8  
Code
caesar_glm_noplan_factor_augment <- glm(
    Infect_count / size ~ noplan + factor + antib + noplan:factor,
    data = caesar_augment,
    weights = size,
    family = binomial
)
Warning in eval(family$initialize): non-integer #successes in a binomial glm!
Code
summary(caesar_glm_noplan_factor_augment)

Call:
glm(formula = Infect_count/size ~ noplan + factor + antib + noplan:factor, 
    family = binomial, data = caesar_augment, weights = size)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -1.3880     0.3952  -3.512 0.000444 ***
noplan         -1.5565     1.5038  -1.035 0.300661    
factor          1.3622     0.4729   2.881 0.003968 ** 
antib          -3.8294     0.6025  -6.355 2.08e-10 ***
noplan:factor   3.4052     1.6138   2.110 0.034861 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 81.11546  on 6  degrees of freedom
Residual deviance:  0.95499  on 2  degrees of freedom
AIC: 29.162

Number of Fisher Scoring iterations: 5

Compare observed and fitted proportions:

Code
tibble(
    observed = round(caesar_augment$Infect_count / caesar_augment$size, 2),
    fitted = round(caesar_glm_noplan_factor_augment$fitted.values, 2)
)
# A tibble: 7 × 2
  observed fitted
     <dbl>  <dbl>
1     0.11   0.12
2     0.88   0.86
3     0.05   0.05
4     0.06   0.02
5     0.48   0.49
6     0      0.01
7     0.2    0.2 

Test goodness-of-fit using Pearson and deviance statistics:

Code
pearson_pval <- 1 -
    pchisq(
        sum(resid(caesar_glm_noplan_factor_augment, type = "pearson")^2),
        df = 2
    )
deviance_pval <- 1 -
    pchisq(
        sum(resid(caesar_glm_noplan_factor_augment, type = "deviance")^2),
        df = 2
    )

tibble(
    test = c("Pearson", "Deviance"),
    p_value = c(pearson_pval, deviance_pval)
)
# A tibble: 2 × 2
  test     p_value
  <chr>      <dbl>
1 Pearson    0.529
2 Deviance   0.620

We do not reject the new model with noplan:factor interaction at usual levels such as 0.1 or 0.05.

Reproducibility

Session Information

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS 26.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] Fahrmeir_2016.5.31 forcats_1.0.1      dplyr_1.2.0        conflicted_1.2.0  

loaded via a namespace (and not attached):
 [1] vctrs_0.7.1      cli_3.6.5        knitr_1.51       rlang_1.1.7     
 [5] xfun_0.56        otel_0.2.0       renv_1.1.7       generics_0.1.4  
 [9] jsonlite_2.0.0   glue_1.8.0       htmltools_0.5.9  rmarkdown_2.30  
[13] evaluate_1.0.5   tibble_3.3.1     fastmap_1.2.0    yaml_2.3.12     
[17] lifecycle_1.0.5  memoise_2.0.1    compiler_4.4.2   pkgconfig_2.0.3 
[21] digest_0.6.39    R6_2.6.1         utf8_1.2.6       tidyselect_1.2.1
[25] pillar_1.11.1    magrittr_2.0.4   tools_4.4.2      cachem_1.1.0    

Package Citations

Below are the formal citations for the packages used in this analysis:

Code
packages <- c("conflicted", "dplyr", "forcats", "Fahrmeir")
for (pkg in packages) {
    cit <- citation(pkg)

    # Check if multiple citations
    if (length(cit) > 1) {
        cat("- **", pkg, "**:\n", sep = "")
        for (i in seq_along(cit)) {
            cit_text <- format(cit[i], style = "text")
            cat("    - ", cit_text, "\n\n", sep = "")
        }
    } else {
        cit_text <- format(cit, style = "text")
        cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
    }
}
  • conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.

  • dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.

  • forcats: Wickham H (2025). forcats: Tools for Working with Categorical Variables (Factors). R package version 1.0.1, https://github.com/tidyverse/forcats, https://forcats.tidyverse.org/.

  • Fahrmeir: Halvorsen cbKB (2016). Fahrmeir: Data from the Book “Multivariate Statistical Modelling Based on Generalized Linear Models”, First Edition, by Ludwig Fahrmeir and Gerhard Tutz. R package version 2016.5.31.

References

Fahrmeir, Ludwig, and Gerhard Tutz. 2001. Multivariate Statistical Modelling Based on Generalized Linear Models. Second Edition. Springer Series in Statistics. New York, NY: Springer. https://doi.org/10.1007/978-1-4757-3454-6.