Code
library(conflicted)
library(dplyr)
library(faraway)
library(gee)
options(contrasts = c("contr.treatment", "contr.poly"))
data(ctsib)Faraway (2016) Section 13.5
April 15, 2026
We now look at the CTSIB balance test dataset from the very applied book by Faraway (2016), which can be accessed online. In the second edition of this book, it is in section 13.5, page 292.
First, we load the data and set the default contrasts to effect coding:
As always, we should spend some time reading the documentation to understand the data set.
We can also inspect the data types by converting the data to a tibble:
# A tibble: 20 × 8
Subject Sex Age Height Weight Surface Vision CTSIB
<int> <fct> <int> <dbl> <dbl> <fct> <fct> <int>
1 1 male 22 176 68.2 norm open 1
2 1 male 22 176 68.2 norm open 1
3 1 male 22 176 68.2 norm closed 2
4 1 male 22 176 68.2 norm closed 2
5 1 male 22 176 68.2 norm dome 1
6 1 male 22 176 68.2 norm dome 2
7 1 male 22 176 68.2 foam open 2
8 1 male 22 176 68.2 foam open 2
9 1 male 22 176 68.2 foam closed 2
10 1 male 22 176 68.2 foam closed 2
11 1 male 22 176 68.2 foam dome 2
12 1 male 22 176 68.2 foam dome 2
13 2 male 22 181 67.6 norm open 1
14 2 male 22 181 67.6 norm open 1
15 2 male 22 181 67.6 norm closed 2
16 2 male 22 181 67.6 norm closed 2
17 2 male 22 181 67.6 norm dome 2
18 2 male 22 181 67.6 norm dome 2
19 2 male 22 181 67.6 foam open 2
20 2 male 22 181 67.6 foam open 2
We convert the ordinal CTSIB outcome into a binary variable indicating “stable” (when CTSIB == 1) vs. “not stable” balance.
# A tibble: 480 × 9
Subject Sex Age Height Weight Surface Vision CTSIB stable
<int> <fct> <int> <dbl> <dbl> <fct> <fct> <int> <lgl>
1 1 male 22 176 68.2 norm open 1 TRUE
2 1 male 22 176 68.2 norm open 1 TRUE
3 1 male 22 176 68.2 norm closed 2 FALSE
4 1 male 22 176 68.2 norm closed 2 FALSE
5 1 male 22 176 68.2 norm dome 1 TRUE
6 1 male 22 176 68.2 norm dome 2 FALSE
7 1 male 22 176 68.2 foam open 2 FALSE
8 1 male 22 176 68.2 foam open 2 FALSE
9 1 male 22 176 68.2 foam closed 2 FALSE
10 1 male 22 176 68.2 foam closed 2 FALSE
# ℹ 470 more rows
gee::gee()We fit a model using the generalised estimating equations assuming exchangeable correlation structure among repeated measurements within each subject.
The exchangeable correlation structure assumes that any pair of observations within the same cluster (subject) has the same correlation.
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) Sexmale Age Height Weight Surfacenorm
7.277448835 1.401577281 0.002521209 -0.096413369 0.043503033 3.967515232
Visiondome Visionopen
0.363752768 3.187500719
(Intercept) Sexmale Age Height Weight Surfacenorm
8.60287367 1.64107984 -0.01184236 -0.10202012 0.04365469 3.91725411
Visiondome Visionopen
0.35896088 3.18012592
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Logit
Variance to Mean Relation: Binomial
Correlation Structure: Exchangeable
Call:
gee(formula = stable ~ Sex + Age + Height + Weight + Surface +
Vision, id = Subject, data = ctsib_tbl, family = binomial,
corstr = "exchangeable", scale.fix = TRUE)
Summary of Residuals:
Min 1Q Median 3Q Max
-0.936596515 -0.140171442 -0.010831509 -0.001490722 0.958070067
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 8.60287367 5.19900613 1.6547150 5.91126348 1.4553359
Sexmale 1.64107984 0.70144427 2.3395727 0.90284040 1.8176854
Age -0.01184236 0.03302210 -0.3586191 0.04798622 -0.2467866
Height -0.10202012 0.03631470 -2.8093341 0.04233629 -2.4097558
Weight 0.04365469 0.02463001 1.7724186 0.03396364 1.2853359
Surfacenorm 3.91725411 0.41250076 9.4963562 0.56633339 6.9168694
Visiondome 0.35896088 0.33580384 1.0689600 0.40417466 0.8881331
Visionopen 3.18012592 0.37710855 8.4329192 0.46036532 6.9078312
Estimated Scale Parameter: 1
Number of Iterations: 4
Working Correlation
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1.0000000 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[2,] 0.2138948 1.0000000 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[3,] 0.2138948 0.2138948 1.0000000 0.2138948 0.2138948 0.2138948 0.2138948
[4,] 0.2138948 0.2138948 0.2138948 1.0000000 0.2138948 0.2138948 0.2138948
[5,] 0.2138948 0.2138948 0.2138948 0.2138948 1.0000000 0.2138948 0.2138948
[6,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 1.0000000 0.2138948
[7,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 1.0000000
[8,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[9,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[10,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[11,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[12,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[,8] [,9] [,10] [,11] [,12]
[1,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[2,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[3,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[4,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[5,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[6,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[7,] 0.2138948 0.2138948 0.2138948 0.2138948 0.2138948
[8,] 1.0000000 0.2138948 0.2138948 0.2138948 0.2138948
[9,] 0.2138948 1.0000000 0.2138948 0.2138948 0.2138948
[10,] 0.2138948 0.2138948 1.0000000 0.2138948 0.2138948
[11,] 0.2138948 0.2138948 0.2138948 1.0000000 0.2138948
[12,] 0.2138948 0.2138948 0.2138948 0.2138948 1.0000000
gee::gee() syntax
stable ~ Sex + Age + Height + Weight + Surface + Vision: a main effects model, with stable as the response.id = Subject: identifies the subject-level clusters for repeated measures.data = ctsib_tbl: the data frame used to evaluate the variables in the formula.family = binomial: a binomial mean-variance model for the binary response.corstr = "exchangeable": assumes a common within-subject correlation for all repeated observations.scale.fix = TRUE: fixes the scale parameter fixed to 1.DENNIS: you finished the demonstration of this script here last year
Alternatively, we can also estimate the scale from the data (which Faraway (2016) does not do).
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) Sexmale Age Height Weight Surfacenorm
7.277448835 1.401577281 0.002521209 -0.096413369 0.043503033 3.967515232
Visiondome Visionopen
0.363752768 3.187500719
[1] 0.7438231
We can compare the coefficient estimates from the fixed scale vs. empirical scale models:
# A tibble: 8 × 3
term fixed_scale empirical_scale
<chr> <dbl> <dbl>
1 (Intercept) 8.60 8.60
2 Sexmale 1.64 1.64
3 Age -0.0118 -0.0118
4 Height -0.102 -0.102
5 Weight 0.0437 0.0437
6 Surfacenorm 3.92 3.92
7 Visiondome 0.359 0.359
8 Visionopen 3.18 3.18
The coefficient estimates are actually identical.
The geepack package provides an alternative implementation of GEE with some methodological differences. This is the same package used in Faraway (2016). The code is provided here without commentary.
(Intercept) Sexmale Age Height Weight Surfacenorm
8.62332261 1.64487784 -0.01205120 -0.10210508 0.04365101 3.91631510
Visiondome Visionopen
0.35888245 3.17990085
Call:
geeglm(formula = stable ~ Sex + Age + Height + Weight + Surface +
Vision, family = binomial, data = ctsib_tbl, id = Subject,
corstr = "exchangeable", scale.fix = TRUE)
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 8.62332 5.91992 2.122 0.1452
Sexmale 1.64488 0.90347 3.315 0.0687 .
Age -0.01205 0.04802 0.063 0.8019
Height -0.10211 0.04239 5.801 0.0160 *
Weight 0.04365 0.03399 1.649 0.1991
Surfacenorm 3.91632 0.56682 47.738 4.87e-12 ***
Visiondome 0.35888 0.40403 0.789 0.3744
Visionopen 3.17990 0.46063 47.657 5.08e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation structure = exchangeable
Scale is fixed.
Link = identity
Estimated Correlation Parameters:
Estimate Std.err
alpha 0.2185 0.04467
Number of clusters: 40 Maximum cluster size: 12
NULL
Estimate Std.err
(Intercept) 0.7262 0
(Intercept) Sexmale Age Height Weight Surfacenorm
8.61677 1.64358 -0.01198 -0.10208 0.04365 3.91664
Visiondome Visionopen
0.35891 3.17998
NULL
Estimate Std.err
(Intercept) 0.7315 0.6742
all_terms <- sort(unique(c(
names(coef(gg)),
names(coef(gg_scale)),
names(coef(modgeep)),
names(coef(modgeep_scale))
)))
coef_compare_all <- tibble(
term = all_terms,
gee_fixed = unname(coef(gg)[all_terms]),
gee_empirical = unname(coef(gg_scale)[all_terms]),
geepack_fixed = unname(coef(modgeep)[all_terms]),
geepack_empirical = unname(coef(modgeep_scale)[all_terms])
)
coef_compare_all# A tibble: 8 × 5
term gee_fixed gee_empirical geepack_fixed geepack_empirical
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 8.60 8.60 8.62 8.62
2 Age -0.0118 -0.0118 -0.0121 -0.0120
3 Height -0.102 -0.102 -0.102 -0.102
4 Sexmale 1.64 1.64 1.64 1.64
5 Surfacenorm 3.92 3.92 3.92 3.92
6 Visiondome 0.359 0.359 0.359 0.359
7 Visionopen 3.18 3.18 3.18 3.18
8 Weight 0.0437 0.0437 0.0437 0.0437
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.7.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Melbourne
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] geepack_1.3.13 gee_4.13-29 faraway_1.0.9 dplyr_1.2.0
[5] conflicted_1.2.0
loaded via a namespace (and not attached):
[1] Matrix_1.7-1 jsonlite_2.0.0 compiler_4.4.2 renv_1.1.7
[5] tidyselect_1.2.1 Rcpp_1.1.1 tidyr_1.3.2 splines_4.4.2
[9] boot_1.3-31 yaml_2.3.12 fastmap_1.2.0 lattice_0.22-6
[13] R6_2.6.1 generics_0.1.4 knitr_1.51 backports_1.5.0
[17] rbibutils_2.4.1 MASS_7.3-61 tibble_3.3.1 nloptr_2.2.1
[21] minqa_1.2.8 pillar_1.11.1 rlang_1.1.7 utf8_1.2.6
[25] broom_1.0.12 cachem_1.1.0 xfun_0.56 memoise_2.0.1
[29] cli_3.6.5 magrittr_2.0.4 Rdpack_2.6.5 digest_0.6.39
[33] grid_4.4.2 rstudioapi_0.18.0 lme4_1.1-38 lifecycle_1.0.5
[37] nlme_3.1-168 reformulas_0.4.4 vctrs_0.7.1 evaluate_1.0.5
[41] glue_1.8.0 purrr_1.2.1 rmarkdown_2.30 tools_4.4.2
[45] pkgconfig_2.0.3 htmltools_0.5.9
packages <- c("conflicted", "gee", "geepack", "dplyr", "faraway")
for (pkg in packages) {
cit <- citation(pkg)
# Check if multiple citations
if (length(cit) > 1) {
cat("- **", pkg, "**:\n", sep = "")
for (i in seq_along(cit)) {
cit_text <- format(cit[i], style = "text")
cat(" - ", cit_text, "\n\n", sep = "")
}
} else {
cit_text <- format(cit, style = "text")
cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
}
}conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.
gee: Carey VJ (2024). gee: Generalized Estimation Equation Solver. R package version 4.13-29.
geepack:
Halekoh U, Højsgaard S, Yan J (2006). “The R Package geepack for Generalized Estimating Equations.” Journal of Statistical Software, 15/2, 1-11.
Yan J, Fine JP (2004). “Estimating Equations for Association Structures.” Statistics in Medicine, 23, 859-880.
Yan J (2002). “geepack: Yet Another Package for Generalized Estimating Equations.” R-News, 2/3, 12-14.
Xu, Z., Fine, P. J, Song, W., Yan, J. (2025). “On GEE for mean-variance-correlation models: Variance estimation and model selection.” Statistics in Medicine, 44, 1-2.
dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.
faraway: Faraway J (2025). faraway: Datasets and Functions for Books by Julian Faraway. R package version 1.0.9, https://github.com/julianfaraway/faraway.