Survival Analysis: Breast Cancer

Kaplan-Meier, Exponential and Weibull Models

Published

April 15, 2026

DENNIS: You may want to add more exposition to this code. I’m not exactly sure what happens in the lectures but when you are explaining it you are quite succinct (particularly in the sections on parametric models). You may also want to add some bibliographic information in the front end, such as the Cox and Oakes 84 paper, Kalbfleisch book, and Venables and Ripley (2002).

Data Loading and Setup

We load packages and breast cancer survival data adapted from Leathem and Brooks (1987) (referenced by Collett (2023)), and inspect the structure:

Code

library(conflicted)
library(dplyr)
library(ggplot2)
library(readr)
library(scales)
library(survival)

bcancer <- read_csv("data/survival.csv")

Rows: 45 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): group
dbl (2): time, status

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

bcancer

# A tibble: 45 × 3
    time status group
   <dbl>  <dbl> <chr>
 1    23      1 N    
 2    47      1 N    
 3    69      1 N    
 4    70      0 N    
 5    71      0 N    
 6   100      0 N    
 7   101      0 N    
 8   148      1 N    
 9   181      1 N    
10   198      0 N    
# ℹ 35 more rows

Code

tail(bcancer)

# A tibble: 6 × 3
   time status group
  <dbl>  <dbl> <chr>
1   154      0 P    
2   162      0 P    
3   188      0 P    
4   212      0 P    
5   217      0 P    
6   225      0 P

Note

Data structure for survival analysis:

time: Time in days until event (death) or censoring (last follow-up)
status: Censoring indicator (0 = censored, 1 = event observed)
group: Grouping variable (here: staining status: Positive (P) or Negative (N)) this indicates whether metastasis was observed in the lymph nodes, which is a prognostic factor for breast cancer survival.

The survival::Surv() function combines time and status into a survival object for modelling.

Diagnostics

Kaplan-Meier Estimation

We fit the Kaplan-Meier (non-parametric) estimator for the survival function by group:

Code

cancer_km <- survfit(Surv(time, status) ~ group, data = bcancer)

cancer_km

Call: survfit(formula = Surv(time, status) ~ group, data = bcancer)

         n events median 0.95LCL 0.95UCL
group=N 13      5     NA     148      NA
group=P 32     21   64.5      41      NA

Code

summary(cancer_km)

Call: survfit(formula = Surv(time, status) ~ group, data = bcancer)

                group=N 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   23     13       1    0.923  0.0739        0.789        1.000
   47     12       1    0.846  0.1001        0.671        1.000
   69     11       1    0.769  0.1169        0.571        1.000
  148      6       1    0.641  0.1522        0.402        1.000
  181      5       1    0.513  0.1673        0.271        0.972

                group=P 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    5     32       1    0.969  0.0308        0.910        1.000
    8     31       1    0.938  0.0428        0.857        1.000
   10     30       1    0.906  0.0515        0.811        1.000
   13     29       1    0.875  0.0585        0.768        0.997
   18     28       1    0.844  0.0642        0.727        0.979
   24     27       1    0.812  0.0690        0.688        0.960
   26     26       2    0.750  0.0765        0.614        0.916
   31     24       1    0.719  0.0795        0.579        0.893
   35     23       1    0.688  0.0819        0.544        0.868
   40     22       1    0.656  0.0840        0.511        0.843
   41     21       1    0.625  0.0856        0.478        0.817
   48     20       1    0.594  0.0868        0.446        0.791
   50     19       1    0.562  0.0877        0.414        0.764
   59     18       1    0.531  0.0882        0.384        0.736
   61     17       1    0.500  0.0884        0.354        0.707
   68     16       1    0.469  0.0882        0.324        0.678
   71     15       1    0.438  0.0877        0.295        0.648
  113     10       1    0.394  0.0892        0.253        0.614
  118      8       1    0.345  0.0906        0.206        0.577
  143      7       1    0.295  0.0900        0.162        0.537

Here the survfit object contains the estimated survival probabilities at each event time, along with confidence intervals and the number at risk. So for group=N at time 23, the estimated survival probability is 0.923.

n.risk is the number of patients still at risk just before that time, and n.event is the number of events (deaths) that occur at that time. n.risk can decrease due to events or censoring, which is why it doesn’t always decrease by the number of events.

The generic summary() method for survfit objects also has a data.frame = TRUE option which we can use and convert to a tibble for easier manipulation:

Code

km_summary_tbl <- summary(cancer_km, data.frame = TRUE) |> as_tibble()
str(km_summary_tbl)

tibble [25 × 11] (S3: tbl_df/tbl/data.frame)
 $ time    : num [1:25] 23 47 69 148 181 5 8 10 13 18 ...
 $ n.risk  : num [1:25] 13 12 11 6 5 32 31 30 29 28 ...
 $ n.event : num [1:25] 1 1 1 1 1 1 1 1 1 1 ...
 $ n.censor: num [1:25] 0 0 0 4 0 0 0 0 0 0 ...
 $ surv    : num [1:25] 0.923 0.846 0.769 0.641 0.513 ...
 $ cumhaz  : num [1:25] 0.0769 0.1603 0.2512 0.4178 0.6178 ...
 $ std.err : num [1:25] 0.0739 0.1001 0.1169 0.1522 0.1673 ...
 $ std.chaz: num [1:25] 0.0769 0.1134 0.1453 0.2211 0.2982 ...
 $ lower   : num [1:25] 0.789 0.671 0.571 0.402 0.271 ...
 $ upper   : num [1:25] 1 1 1 1 0.972 ...
 $ strata  : Factor w/ 2 levels "group=N","group=P": 1 1 1 1 1 2 2 2 2 2 ...

We use str() to inspect the structure of the summary data frame, because our tibble has many columns, and doesn’t display neatly.

Plotting Setup

Build a tidy data frame from the Kaplan-Meier fit so we can reuse it across the plots below:

Code

km_plot_data <- km_summary_tbl |>
    mutate(
        group = factor(
            strata,
            levels = c("group=N", "group=P"),
            labels = c("Negative Stain", "Positive Stain")
        )
    ) |>
    select(group, time, surv, lower, upper) |>
    arrange(group, time)

km_plot_data

# A tibble: 25 × 5
   group           time  surv lower upper
   <fct>          <dbl> <dbl> <dbl> <dbl>
 1 Negative Stain    23 0.923 0.789 1    
 2 Negative Stain    47 0.846 0.671 1    
 3 Negative Stain    69 0.769 0.571 1    
 4 Negative Stain   148 0.641 0.402 1    
 5 Negative Stain   181 0.513 0.271 0.972
 6 Positive Stain     5 0.969 0.910 1    
 7 Positive Stain     8 0.938 0.857 1    
 8 Positive Stain    10 0.906 0.811 1    
 9 Positive Stain    13 0.875 0.768 0.997
10 Positive Stain    18 0.844 0.727 0.979
# ℹ 15 more rows

Since we’ll be making multiple survival plots, we can define a common theme and colour palette. We use colourblind-friendly colours as suggested by Paul Tol:

Code

stain_colours <- c(
    "Negative Stain" = "#EE6677",
    "Positive Stain" = "#4477AA"
)

survival_theme <- theme_minimal(base_size = 14) +
    theme(
        legend.position = "top",
        legend.title = element_text(face = "bold"),
        plot.title = element_text(face = "bold"),
        panel.grid.minor = element_blank()
    )

Syntax walkthrough (shared plot settings):

stain_colours <- c(...) creates a named vector: the names ("Negative Stain", "Positive Stain") must match the group labels in the data, and the values are the colours used in the legend and lines.
ggplot2::theme_minimal(base_size = 14) sets a clean base style and readable default text size.
+ ggplot2::theme(...) modifies specific theme elements on top of ggplot2::theme_minimal().
legend.position = "top" moves the legend above the plot, which is usually easier to scan for two-group comparisons.
legend.title = ggplot2::element_text(face = "bold") and plot.title = ggplot2::element_text(face = "bold") make key text elements stand out.
panel.grid.minor = ggplot2::element_blank() removes minor gridlines to reduce visual clutter.

Plot Kaplan-Meier survival curves for the two groups:

Code

ggplot(km_plot_data, aes(x = time, y = surv, colour = group)) +
    geom_ribbon(aes(ymin = lower, ymax = upper, fill = group), alpha = 0.18, colour = NA) +
    geom_step(linewidth = 0.9) +
    scale_colour_manual(values = stain_colours, name = "Group") +
    scale_fill_manual(values = stain_colours, name = "Group") +
    scale_y_continuous(
        limits = c(0, 1),
        breaks = seq(0, 1, 0.2)
    ) +
    labs(
        title = "Kaplan-Meier survival curves",
        subtitle = "Step drops show the estimated survival probability over time;\nshaded bands show 95% confidence intervals",
        x = "Time",
        y = "Survival probability",
        colour = NULL
    ) +
    survival_theme

Plot syntax walkthrough (Kaplan-Meier)

Read this plot from top to bottom as a sequence of layers:

ggplot2::ggplot(km_plot_data, ggplot2::aes(...)) picks the data and default mappings (time on x-axis, survival surv on y-axis, colour by group).
ggplot2::geom_ribbon(ggplot2::aes(ymin = lower, ymax = upper, fill = group), ...) draws the 95% confidence interval band between the lower and upper survival limits.
ggplot2::geom_step(...) draws Kaplan-Meier style step curves.
ggplot2::scale_colour_manual(...) applies the named palette so the step-line legend labels match the colours.
ggplot2::scale_fill_manual(...) uses the same palette for the confidence-band fill.
ggplot2::scale_y_continuous(...) constrains and labels the y-axis to survival-probability values between 0 and 1.
ggplot2::labs(...) adds title, subtitle, and axis labels.
survival_theme applies shared styling (legend placement, text emphasis, and cleaner gridlines).

Weibull Suitability Check

Before fitting parametric models, check if the Weibull distribution is suitable. The log-log transformation of a Weibull survival curve is linear: \[\log(-\log(S(t))) = \alpha \log(\lambda t),\] and so plotting \(\log(-\log(S(t)))\) vs. \(\log(t)\) should yield a straight line if the Weibull model is appropriate.

Code

weibull_plot_data <- km_plot_data |>
    dplyr::filter(time > 0, surv > 0) |>
    mutate(
        log_time = log(time),
        log_log_surv = log(-log(surv))
    )

ggplot(weibull_plot_data, aes(x = log_time, y = log_log_surv, colour = group)) +
    geom_point(size = 2.2, alpha = 0.8) +
    geom_line(linewidth = 0.9) +
    scale_colour_manual(values = stain_colours, name = "Group") +
    labs(
        title = "Weibull diagnostic plot",
        subtitle = "A roughly straight pattern suggests a Weibull model is reasonable",
        x = "log(Time)",
        y = "log(-log(Survival))",
        colour = NULL
    ) +
    survival_theme

Plot syntax walkthrough (Weibull diagnostic)

ggplot2::ggplot(weibull_plot_data, ggplot2::aes(x = log_time, y = log_log_surv, colour = group)) sets the data source and maps transformed variables to the axes.
ggplot2::geom_point(size = 2.2, alpha = 0.8) adds plotted observations with size and transparency controls.
ggplot2::geom_line(linewidth = 0.9) connects points within each colour group.
ggplot2::scale_colour_manual(values = stain_colours, name = "Group") applies a named colour vector and legend title.
ggplot2::labs(...) sets the plot title, subtitle, and axis labels.
survival_theme applies the shared theme object.

Here we have a roughly linear pattern for both groups, suggesting the Weibull model may be a reasonable parametric choice for modeling survival in this dataset.

Proportional Hazards Assumption

Many survival models (e.g., Cox) assume proportional hazards: the hazard ratio between groups is constant over time. We can check this by plotting the complementary log-log transformation of the survival curves: \(\log(-\log(S(t)))\) vs. time, and looking for a parallel shift between groups.

Code

cloglog_plot_data <- km_plot_data |>
    dplyr::filter(time > 0, surv > 0) |>
    mutate(cloglog = log(-log(surv)))

ggplot(cloglog_plot_data, aes(x = time, y = cloglog, colour = group)) +
    geom_step(linewidth = 0.9) +
    scale_colour_manual(values = stain_colours, name = "Group") +
    labs(
        title = "Proportional hazards check via cloglog plot",
        subtitle = "A parallel shift supports the proportional hazards assumption",
        x = "Time",
        y = "log(-log(Survival))",
        colour = NULL
    ) +
    survival_theme

Plot syntax walkthrough (cloglog PH check)

ggplot2::ggplot(cloglog_plot_data, ggplot2::aes(x = time, y = cloglog, colour = group)) maps time and the transformed cloglog response.
ggplot2::geom_step(linewidth = 0.9) draws the observed transformed values as step functions.
ggplot2::scale_colour_manual(values = stain_colours, name = "Group") uses the predefined palette and legend title.
ggplot2::labs(...) defines the title, subtitle, and axis labels.
survival_theme adds the common visual styling.

Here is it plausible that the curves are roughly parallel, which supports the proportional hazards assumption.

Exponential Regression

Now we fit parametric survival models to estimate the effect of group on survival time. The first model we fit is the exponential survival model:

Code

cancer_exp <- survreg(
    Surv(time, status) ~ group,
    data = bcancer,
    dist = "exponential"
)

summary(cancer_exp)


Call:
survreg(formula = Surv(time, status) ~ group, data = bcancer, 
    dist = "exponential")
             Value Std. Error     z      p
(Intercept)  5.800      0.447 12.97 <2e-16
groupP      -0.952      0.498 -1.91  0.056

Scale fixed at 1 

Exponential distribution
Loglik(model)= -156.8   Loglik(intercept only)= -159
    Chisq= 4.36 on 1 degrees of freedom, p= 0.037 
Number of Newton-Raphson Iterations: 4 
n= 45

SELF TEST: What are the model assumptions for the exponential survival model?

The time-to-event follows an exponential distribution, which implies a constant hazard rate over time.
The effect of the group is multiplicative on the hazard scale (i.e., it shifts the rate parameter \(\lambda\) by a factor of \(e^{\beta}\)).
Censoring is non-informative (the reason for censoring is unrelated to the survival time).

We can recover the p-value for the group effect using the Wald test.

Wald Test:

Code

exp_coefs <- coef(cancer_exp)
exp_coefs

(Intercept)      groupP 
  5.8003040  -0.9516276

Code

exp_cov <- vcov(cancer_exp)
exp_cov

            (Intercept)    groupP
(Intercept)         0.2 -0.200000
groupP             -0.2  0.247619

Code

beta <- exp_coefs[2]
var_beta <- exp_cov[2, 2]

wald_stat <- beta / sqrt(var_beta)
2 * pnorm(-abs(wald_stat)) |> round(3)

groupP 
 0.056

We can also do a likelihood ratio test comparing the exponential model with group to a null model without group:

Code

exp_loglik <- cancer_exp$loglik
lr_stat <- -2 * (exp_loglik[1] - exp_loglik[2])
pchisq(q = lr_stat, df = 1, lower.tail = FALSE) |> round(3)

[1] 0.037

Extracting Rate Parameters

For the exponential model, the rate parameter (hazard) is \[\lambda = e^{(\beta_0 + \beta_1 \cdot \text{group})}\]:

Code

lambda_n <- exp(-exp_coefs[1])
lambda_p <- exp(-sum(exp_coefs))

lambda_n

(Intercept) 
0.003026634

Code

lambda_p

[1] 0.007838746

Note

Negative stain group: \(\lambda_N =\) 0.003 events/time
Positive stain group: \(\lambda_P =\) 0.0078 events/time

Higher \(\lambda\) means shorter survival (higher hazard rate).

Verification: MLE via Method of Moments

For the exponential distribution, the MLE is: \[\hat{\lambda} = \frac{\text{\# events}}{\text{total time at risk}}.\] Therefore, we can verify our MLE estimates by calculating the event rates directly from the data:

Code

bcancer |>
    group_by(group) |>
    summarise(rate = sum(status) / sum(time), .groups = "drop")

# A tibble: 2 × 2
  group    rate
  <chr>   <dbl>
1 N     0.00303
2 P     0.00784

These match our model-based estimates, confirming that the MLE for the exponential model is consistent with the method of moments calculation.

Delta Method Confidence Interval

Recall there are two methods we use to construct confidence intervals. Here, we use the delta method (linearization). \[\text{var}(g(X)) \approx (g'(X))^2 \cdot \text{var}(X)\]

Code

var_lambda <- (exp(-beta))^2 * var_beta

CI <- exp(-beta) + qnorm(0.975) * sqrt(var_lambda) * c(-1, 1)
CI |> round(3)

[1] 0.064 5.116

Note

The lower bound is positive (doesn’t cross zero), which is good since \(\lambda\) must be positive. However, the delta method doesn’t always respect parameter constraints; a safer approach is to transform:

Better approach: Transform to enforce positivity:

Code

# Build CI on log scale, then exponentiate
exp(-(beta - qnorm(0.975) * sqrt(var_beta) * c(-1, 1))) |>
    round(3)

[1] 0.977 6.868

Weibull Regression

The Weibull model generalizes the exponential by allowing a shape parameter \(\alpha > 0\). When \(\alpha = 1\), it reduces to exponential; \(\alpha > 1\) means hazard increases with time; \(\alpha < 1\) means hazard decreases (realistic in diseases that primarily effect infants such as malaria).

We fit the Weibull survival model, and extract some of its parameters:

Code

cancer_wbl <- survreg(
    Surv(time, status) ~ group,
    data = bcancer,
    dist = "weibull"
)

summary(cancer_wbl)


Call:
survreg(formula = Surv(time, status) ~ group, data = bcancer, 
    dist = "weibull")
              Value Std. Error     z      p
(Intercept)  5.8544     0.4989 11.74 <2e-16
groupP      -0.9967     0.5441 -1.83  0.067
Log(scale)   0.0646     0.1674  0.39  0.699

Scale= 1.07 

Weibull distribution
Loglik(model)= -156.7   Loglik(intercept only)= -158.8
    Chisq= 4.14 on 1 degrees of freedom, p= 0.042 
Number of Newton-Raphson Iterations: 5 
n= 45

Code

wbl_coef <- coef(cancer_wbl)
wbl_scale <- cancer_wbl$scale
wbl_cov <- vcov(cancer_wbl)

wbl_coef

(Intercept)      groupP 
  5.8543638  -0.9966647

Code

wbl_scale

[1] 1.066777

Code

wbl_cov

            (Intercept)      groupP  Log(scale)
(Intercept)  0.24887910 -0.24501139  0.02441408
groupP      -0.24501139  0.29603784 -0.01997602
Log(scale)   0.02441408 -0.01997602  0.02801424

Caution

Note: survreg parameterizes the scale as \(\sigma = 1/\alpha\), so the shape parameter is \(\alpha = 1/\text{scale}\).

Rate and Shape Parameters

Extract and interpret the Weibull parameters:

Code

lambda_n <- exp(-wbl_coef[1])
lambda_p <- exp(-sum(wbl_coef))

alpha <- 1 / wbl_scale

s_n_100 <- exp(-(lambda_n * 100)^alpha)
s_p_100 <- exp(-(lambda_p * 100)^alpha)

lambda_n

(Intercept) 
0.002867359

Code

lambda_p

[1] 0.007768337

Code

alpha

[1] 0.9374033

Code

s_n_100

(Intercept) 
  0.7334049

Code

s_p_100

[1] 0.454203

SELF TEST: What do these survival probabilities mean?

s_n_100 is the estimated survival probability at time 100 for negative-stain patients. s_p_100 is the same for positive-stain patients.

Hazard Ratio

We can also estimate the hazard ratio (positive stain vs. negative stain):

Code

hazard_ratio <- (lambda_p / lambda_n)^alpha
hazard_ratio

(Intercept) 
   2.545372

Note

A hazard ratio > 1 means positive-stain patients have higher hazard (die sooner). < 1 means better survival for positive-stain.

Testing the Shape Parameter

We can do a Wald test or likelihood ratio test to check if the shape parameter \(\alpha\) is significantly different from 1, which would indicate that the Weibull model provides a better fit than the exponential model.

Wald Test

Code

wald_z <- log(wbl_scale) / sqrt(wbl_cov[3, 3])

# p-value for H0: log(scale) = 0, i.e., scale = 1, i.e., alpha = 1
2 * pnorm(-abs(wald_z))

[1] 0.6993412

Here we don’t reject the null hypothesis that \(\alpha = 1\), which suggests the exponential model may be sufficient.

Likelihood Ratio Test

We also check this with a likelihood ratio test comparing the Weibull model to the exponential model.

Code

wbl_loglik <- logLik(cancer_wbl)
wbl_loglik

'log Lik.' -156.747 (df=3)

Code

lr_stat <- 2 * (wbl_loglik - exp_loglik[2])

pchisq(
    lr_stat,
    df = 1,
    lower.tail = FALSE
)

'log Lik.' 0.6952189 (df=3)

Again, we don’t reject the null hypothesis that \(\alpha = 1\).

Package Citations and Session Information

Session Information

Code

sessionInfo()

R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.7.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] survival_3.7-0   scales_1.4.0     readr_2.1.6      ggplot2_4.0.2   
[5] dplyr_1.2.0      conflicted_1.2.0

loaded via a namespace (and not attached):
 [1] bit_4.6.0          Matrix_1.7-1       gtable_0.3.6       jsonlite_2.0.0    
 [5] crayon_1.5.3       compiler_4.4.2     renv_1.1.7         tidyselect_1.2.1  
 [9] parallel_4.4.2     splines_4.4.2      yaml_2.3.12        fastmap_1.2.0     
[13] lattice_0.22-6     R6_2.6.1           labeling_0.4.3     generics_0.1.4    
[17] knitr_1.51         tibble_3.3.1       pillar_1.11.1      RColorBrewer_1.1-3
[21] tzdb_0.5.0         rlang_1.1.7        utf8_1.2.6         cachem_1.1.0      
[25] xfun_0.56          S7_0.2.1           bit64_4.6.0-1      memoise_2.0.1     
[29] cli_3.6.5          withr_3.0.2        magrittr_2.0.4     digest_0.6.39     
[33] grid_4.4.2         vroom_1.7.0        rstudioapi_0.18.0  hms_1.1.4         
[37] lifecycle_1.0.5    vctrs_0.7.1        evaluate_1.0.5     glue_1.8.0        
[41] farver_2.1.2       rmarkdown_2.30     tools_4.4.2        pkgconfig_2.0.3   
[45] htmltools_0.5.9

Code

packages <- c("conflicted", "dplyr", "ggplot2", "readr", "scales", "survival")
for (pkg in packages) {
    cit <- citation(pkg)

    # Check if multiple citations
    if (length(cit) > 1) {
        cat("- **", pkg, "**:\n", sep = "")
        for (i in seq_along(cit)) {
            cit_text <- format(cit[i], style = "text")
            cat("    - ", cit_text, "\n\n", sep = "")
        }
    } else {
        cit_text <- format(cit, style = "text")
        cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "")
    }
}

conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://github.com/r-lib/conflicted, https://conflicted.r-lib.org/.
dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2026). dplyr: A Grammar of Data Manipulation. R package version 1.2.0, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.
ggplot2: Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.
readr: Wickham H, Hester J, Bryan J (2025). readr: Read Rectangular Text Data. R package version 2.1.6, https://github.com/tidyverse/readr, https://readr.tidyverse.org.
scales: Wickham H, Pedersen T, Seidel D (2025). scales: Scale Functions for Visualization. R package version 1.4.0, https://github.com/r-lib/scales, https://scales.r-lib.org.
survival:
- Therneau T (2024). A Package for Survival Analysis in R. R package version 3.7-0, https://CRAN.R-project.org/package=survival.
- Terry M. Therneau, Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York. ISBN 0-387-98784-3.

References

Collett, David. 2023. Modelling Survival Data in Medical Research. Fourth edition. Chapman & Hall/CRC Texts in Statistical Science. Chapman; Hall/CRC. https://research.ebsco.com/linkprocessor/plink?id=d1071170-8881-33d1-a2fe-2cff38fe2053.

Leathem, A. J., and SusanA. Brooks. 1987. “Predictive Value of Lectin Binding on Breast-Cancer Recurrence and Survival.” The Lancet 329 (8541): 1054–56. https://doi.org/10.1016/S0140-6736(87)90482-X.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. Springer. https://www.stats.ox.ac.uk/pub/MASS4/.