Generalized Estimating Equations: CTSIB Balance Test Data

Faraway (2016) Section 13.5

Published

April 15, 2026

Overview

We now look at the CTSIB balance test dataset from the very applied book by Faraway (2016), which can be accessed online. In the second edition of this book, it is in section 13.5, page 292.

Data Setup

First, we load the data and set the default contrasts to effect coding:

Code
library(conflicted)
library(dplyr)
library(faraway)
library(gee)

options(contrasts = c("contr.treatment", "contr.poly"))

data(ctsib)

As always, we should spend some time reading the documentation to understand the data set.

Code
?ctsib

We can also inspect the data types by converting the data to a tibble:

Code
ctsib_tbl <- as_tibble(ctsib)
head(ctsib_tbl, 20)
# A tibble: 20 × 8
   Subject Sex     Age Height Weight Surface Vision CTSIB
     <int> <fct> <int>  <dbl>  <dbl> <fct>   <fct>  <int>
 1       1 male     22    176   68.2 norm    open       1
 2       1 male     22    176   68.2 norm    open       1
 3       1 male     22    176   68.2 norm    closed     2
 4       1 male     22    176   68.2 norm    closed     2
 5       1 male     22    176   68.2 norm    dome       1
 6       1 male     22    176   68.2 norm    dome       2
 7       1 male     22    176   68.2 foam    open       2
 8       1 male     22    176   68.2 foam    open       2
 9       1 male     22    176   68.2 foam    closed     2
10       1 male     22    176   68.2 foam    closed     2
11       1 male     22    176   68.2 foam    dome       2
12       1 male     22    176   68.2 foam    dome       2
13       2 male     22    181   67.6 norm    open       1
14       2 male     22    181   67.6 norm    open       1
15       2 male     22    181   67.6 norm    closed     2
16       2 male     22    181   67.6 norm    closed     2
17       2 male     22    181   67.6 norm    dome       2
18       2 male     22    181   67.6 norm    dome       2
19       2 male     22    181   67.6 foam    open       2
20       2 male     22    181   67.6 foam    open       2

Creating Binary Response

We convert the ordinal CTSIB outcome into a binary variable indicating “stable” (when CTSIB == 1) vs. “not stable” balance.

Code
ctsib_tbl <- ctsib_tbl |>
  mutate(stable = (CTSIB == 1))

ctsib_tbl
# A tibble: 480 × 9
   Subject Sex     Age Height Weight Surface Vision CTSIB stable
     <int> <fct> <int>  <dbl>  <dbl> <fct>   <fct>  <int> <lgl> 
 1       1 male     22    176   68.2 norm    open       1 TRUE  
 2       1 male     22    176   68.2 norm    open       1 TRUE  
 3       1 male     22    176   68.2 norm    closed     2 FALSE 
 4       1 male     22    176   68.2 norm    closed     2 FALSE 
 5       1 male     22    176   68.2 norm    dome       1 TRUE  
 6       1 male     22    176   68.2 norm    dome       2 FALSE 
 7       1 male     22    176   68.2 foam    open       2 FALSE 
 8       1 male     22    176   68.2 foam    open       2 FALSE 
 9       1 male     22    176   68.2 foam    closed     2 FALSE 
10       1 male     22    176   68.2 foam    closed     2 FALSE 
# ℹ 470 more rows

Fitting the Model with gee::gee()

We fit a model using the generalised estimating equations assuming exchangeable correlation structure among repeated measurements within each subject.

The exchangeable correlation structure assumes that any pair of observations within the same cluster (subject) has the same correlation.

Code
gg <- gee(
  stable ~ Sex + Age + Height + Weight + Surface + Vision,
  id = Subject,
  data = ctsib_tbl,
  family = binomial,
  corstr = "exchangeable",
  scale.fix = TRUE
)
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
 (Intercept)      Sexmale          Age       Height       Weight  Surfacenorm 
 7.277448835  1.401577281  0.002521209 -0.096413369  0.043503033  3.967515232 
  Visiondome   Visionopen 
 0.363752768  3.187500719 
Code
coef(gg)
(Intercept)     Sexmale         Age      Height      Weight Surfacenorm 
 8.60287367  1.64107984 -0.01184236 -0.10202012  0.04365469  3.91725411 
 Visiondome  Visionopen 
 0.35896088  3.18012592 
Code
summary(gg)

 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998) 

Model:
 Link:                      Logit 
 Variance to Mean Relation: Binomial 
 Correlation Structure:     Exchangeable 

Call:
gee(formula = stable ~ Sex + Age + Height + Weight + Surface + 
    Vision, id = Subject, data = ctsib_tbl, family = binomial, 
    corstr = "exchangeable", scale.fix = TRUE)

Summary of Residuals:
         Min           1Q       Median           3Q          Max 
-0.936596515 -0.140171442 -0.010831509 -0.001490722  0.958070067 


Coefficients:
               Estimate Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept)  8.60287367 5.19900613  1.6547150  5.91126348  1.4553359
Sexmale      1.64107984 0.70144427  2.3395727  0.90284040  1.8176854
Age         -0.01184236 0.03302210 -0.3586191  0.04798622 -0.2467866
Height      -0.10202012 0.03631470 -2.8093341  0.04233629 -2.4097558
Weight       0.04365469 0.02463001  1.7724186  0.03396364  1.2853359
Surfacenorm  3.91725411 0.41250076  9.4963562  0.56633339  6.9168694
Visiondome   0.35896088 0.33580384  1.0689600  0.40417466  0.8881331
Visionopen   3.18012592 0.37710855  8.4329192  0.46036532  6.9078312

Estimated Scale Parameter:  1
Number of Iterations:  4

Working Correlation
           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
 [1,] 1.0000000 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [2,] 0.2138948 1.0000000 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [3,] 0.2138948 0.2138948 1.0000000 0.2138948 0.2138948 0.2138948 0.2138948
 [4,] 0.2138948 0.2138948 0.2138948 1.0000000 0.2138948 0.2138948 0.2138948
 [5,] 0.2138948 0.2138948 0.2138948 0.2138948 1.0000000 0.2138948 0.2138948
 [6,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 1.0000000 0.2138948
 [7,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 1.0000000
 [8,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [9,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[10,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[11,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[12,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
           [,8]      [,9]     [,10]     [,11]     [,12]
 [1,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [2,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [3,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [4,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [5,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [6,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [7,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
 [8,] 1.0000000 0.2138948 0.2138948 0.2138948 0.2138948
 [9,] 0.2138948 1.0000000 0.2138948 0.2138948 0.2138948
[10,] 0.2138948 0.2138948 1.0000000 0.2138948 0.2138948
[11,] 0.2138948 0.2138948 0.2138948 1.0000000 0.2138948
[12,] 0.2138948 0.2138948 0.2138948 0.2138948 1.0000000
  • stable ~ Sex + Age + Height + Weight + Surface + Vision: a main effects model, with stable as the response.
  • id = Subject: identifies the subject-level clusters for repeated measures.
  • data = ctsib_tbl: the data frame used to evaluate the variables in the formula.
  • family = binomial: a binomial mean-variance model for the binary response.
  • corstr = "exchangeable": assumes a common within-subject correlation for all repeated observations.
  • scale.fix = TRUE: fixes the scale parameter fixed to 1.

DENNIS: you finished the demonstration of this script here last year

Empirical Scale Estimation

Alternatively, we can also estimate the scale from the data (which Faraway (2016) does not do).

Code
gg_scale <- gee(
  stable ~ Sex + Age + Height + Weight + Surface + Vision,
  id = Subject,
  family = binomial,
  data = ctsib_tbl,
  corstr = "exchangeable",
  scale.fix = FALSE
)
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
 (Intercept)      Sexmale          Age       Height       Weight  Surfacenorm 
 7.277448835  1.401577281  0.002521209 -0.096413369  0.043503033  3.967515232 
  Visiondome   Visionopen 
 0.363752768  3.187500719 
Code
gg_scale$scale
[1] 0.7438231

We can compare the coefficient estimates from the fixed scale vs. empirical scale models:

Code
coef_compare <- tibble(
  term = names(coef(gg)),
  fixed_scale = coef(gg),
  empirical_scale = coef(gg_scale),
)

coef_compare
# A tibble: 8 × 3
  term        fixed_scale empirical_scale
  <chr>             <dbl>           <dbl>
1 (Intercept)      8.60            8.60  
2 Sexmale          1.64            1.64  
3 Age             -0.0118         -0.0118
4 Height          -0.102          -0.102 
5 Weight           0.0437          0.0437
6 Surfacenorm      3.92            3.92  
7 Visiondome       0.359           0.359 
8 Visionopen       3.18            3.18  

The coefficient estimates are actually identical.

The geepack package provides an alternative implementation of GEE with some methodological differences. This is the same package used in Faraway (2016). The code is provided here without commentary.

Fitting with geepack (Fixed Scale)

Code
library(geepack)

modgeep <- geeglm(
  stable ~ Sex + Age + Height + Weight + Surface + Vision,
  id = Subject,
  corstr = "exchangeable",
  scale.fix = TRUE,
  data = ctsib_tbl,
  family = binomial
)

coef(modgeep)
(Intercept)     Sexmale         Age      Height      Weight Surfacenorm 
 8.62332261  1.64487784 -0.01205120 -0.10210508  0.04365101  3.91631510 
 Visiondome  Visionopen 
 0.35888245  3.17990085 
Code
summary(modgeep)

Call:
geeglm(formula = stable ~ Sex + Age + Height + Weight + Surface + 
    Vision, family = binomial, data = ctsib_tbl, id = Subject, 
    corstr = "exchangeable", scale.fix = TRUE)

 Coefficients:
            Estimate  Std.err   Wald Pr(>|W|)    
(Intercept)  8.62332  5.91992  2.122   0.1452    
Sexmale      1.64488  0.90347  3.315   0.0687 .  
Age         -0.01205  0.04802  0.063   0.8019    
Height      -0.10211  0.04239  5.801   0.0160 *  
Weight       0.04365  0.03399  1.649   0.1991    
Surfacenorm  3.91632  0.56682 47.738 4.87e-12 ***
Visiondome   0.35888  0.40403  0.789   0.3744    
Visionopen   3.17990  0.46063 47.657 5.08e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Scale is fixed.

  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha   0.2185 0.04467
Number of clusters:   40  Maximum cluster size: 12 
Code
summary(modgeep)$scale.value
NULL
Code
summary(modgeep)$dispersion
            Estimate Std.err
(Intercept)   0.7262       0

Fitting with geepack (Empirical Scale)

Code
modgeep_scale <- geeglm(
  stable ~ Sex + Age + Height + Weight + Surface + Vision,
  id = Subject,
  corstr = "exchangeable",
  scale.fix = FALSE,
  data = ctsib_tbl,
  family = binomial
)

coef(modgeep_scale)
(Intercept)     Sexmale         Age      Height      Weight Surfacenorm 
    8.61677     1.64358    -0.01198    -0.10208     0.04365     3.91664 
 Visiondome  Visionopen 
    0.35891     3.17998 
Code
summary(modgeep_scale)$scale.value
NULL
Code
summary(modgeep_scale)$dispersion
            Estimate Std.err
(Intercept)   0.7315  0.6742

Final Comparison: All Four Models

Code
all_terms <- sort(unique(c(
  names(coef(gg)),
  names(coef(gg_scale)),
  names(coef(modgeep)),
  names(coef(modgeep_scale))
)))

coef_compare_all <- tibble(
  term = all_terms,
  gee_fixed = unname(coef(gg)[all_terms]),
  gee_empirical = unname(coef(gg_scale)[all_terms]),
  geepack_fixed = unname(coef(modgeep)[all_terms]),
  geepack_empirical = unname(coef(modgeep_scale)[all_terms])
)

coef_compare_all
# A tibble: 8 × 5
  term        gee_fixed gee_empirical geepack_fixed geepack_empirical
  <chr>           <dbl>         <dbl>         <dbl>             <dbl>
1 (Intercept)    8.60          8.60          8.62              8.62  
2 Age           -0.0118       -0.0118       -0.0121           -0.0120
3 Height        -0.102        -0.102        -0.102            -0.102 
4 Sexmale        1.64          1.64          1.64              1.64  
5 Surfacenorm    3.92          3.92          3.92              3.92  
6 Visiondome     0.359         0.359         0.359             0.359 
7 Visionopen     3.18          3.18          3.18              3.18  
8 Weight         0.0437        0.0437        0.0437            0.0437

Package Citations and Session Information

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.7.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] geepack_1.3.13   gee_4.13-29      faraway_1.0.9    dplyr_1.2.0     
[5] conflicted_1.2.0

loaded via a namespace (and not attached):
 [1] Matrix_1.7-1      jsonlite_2.0.0    compiler_4.4.2    renv_1.1.7       
 [5] tidyselect_1.2.1  Rcpp_1.1.1        tidyr_1.3.2       splines_4.4.2    
 [9] boot_1.3-31       yaml_2.3.12       fastmap_1.2.0     lattice_0.22-6   
[13] R6_2.6.1          generics_0.1.4    knitr_1.51        backports_1.5.0  
[17] rbibutils_2.4.1   MASS_7.3-61       tibble_3.3.1      nloptr_2.2.1     
[21] minqa_1.2.8       pillar_1.11.1     rlang_1.1.7       utf8_1.2.6       
[25] broom_1.0.12      cachem_1.1.0      xfun_0.56         memoise_2.0.1    
[29] cli_3.6.5         magrittr_2.0.4    Rdpack_2.6.5      digest_0.6.39    
[33] grid_4.4.2        rstudioapi_0.18.0 lme4_1.1-38       lifecycle_1.0.5  
[37] nlme_3.1-168      reformulas_0.4.4  vctrs_0.7.1       evaluate_1.0.5   
[41] glue_1.8.0        purrr_1.2.1       rmarkdown_2.30    tools_4.4.2      
[45] pkgconfig_2.0.3   htmltools_0.5.9  
Code
packages <- c("conflicted", "gee", "geepack", "dplyr", "faraway")
for (pkg in packages) {
  cit <- citation(pkg)

  # Check if multiple citations
  if (length(cit) > 1) {
    cat("- **", pkg, "**:\n", sep = "")
    for (i in seq_along(cit)) {
      cit_text <- format(cit[i], style = "text")
      cat("    - ", cit_text, "\n\n", sep = "")
    }
  } else {
    cit_text <- format(cit, style = "text")
    cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
  }
}
  • conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.

  • gee: Carey VJ (2024). gee: Generalized Estimation Equation Solver. R package version 4.13-29.

  • geepack:

    • Halekoh U, Højsgaard S, Yan J (2006). “The R Package geepack for Generalized Estimating Equations.” Journal of Statistical Software, 15/2, 1-11.

    • Yan J, Fine JP (2004). “Estimating Equations for Association Structures.” Statistics in Medicine, 23, 859-880.

    • Yan J (2002). “geepack: Yet Another Package for Generalized Estimating Equations.” R-News, 2/3, 12-14.

    • Xu, Z., Fine, P. J, Song, W., Yan, J. (2025). “On GEE for mean-variance-correlation models: Variance estimation and model selection.” Statistics in Medicine, 44, 1-2.

  • dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.

  • faraway: Faraway J (2025). faraway: Datasets and Functions for Books by Julian Faraway. R package version 1.0.9, https://github.com/julianfaraway/faraway.

References

Faraway, Julian J. 2016. Extending the Linear Model with r : Generalized Linear, Mixed Effects and Nonparametric Regression Models, Second Edition. Second edition. Chapman & Hall/CRC Texts in Statistical Science Series. Chapman; Hall/CRC. https://research.ebsco.com/plink/3806c0de-f79d-330d-9c4e-7bb136efaa53.