Skip to content

Discrepancies in reporting for models of class svycoxph #1174

@ejvalencia

Description

@ejvalencia

tl;dr

When applying parameters::parameters() to a model of class svycoxph, the reported SE is taken as the model-based SE instead of the robust SE, 95% CIs are incorrectly estimated, p-values are incorrectly estimated, and degrees of freedom may be incorrectly reported.

The problem in-depth

Take the following data, which is adapted from the survey package vignette for svycoxph

# Loading packages
library(parameters)
library(survival)
library(survey)
library(broom)

# Importing data
df <- survival::pbc

# Defining radomization
df$randomized <- with(df, !is.na(trt) & trt > 0)

# Defining randomization probability weights
df$randprob <- fitted(glm(randomized ~ age * edema, data = df, family = binomial))

# Generating survey design object
d.svy <- svydesign(id = ~1, 
                   prob = ~randprob, 
                   strata = ~edema,
                   data = subset(df, randomized))

# The design-based cox proportional hazards model
m.index <- svycoxph(Surv(time, status > 0) ~ log(bili) + protime + albumin, design = d.svy)

I first use summary() to generate a coefficient summary table that will be taken as the correct summary of the model.

# A model summary produced via `summary()`
m.index |> summary()

#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Call:
#> svycoxph(formula = Surv(time, status > 0) ~ log(bili) + protime + 
#>     albumin, design = d.svy)
#> 
#>   n= 312, number of events= 144 
#> 
#>               coef exp(coef) se(coef) robust se      z Pr(>|z|)    
#> log(bili)  0.88592   2.42522  0.09140   0.09048  9.791  < 2e-16 ***
#> protime    0.24487   1.27745  0.07825   0.08122  3.015  0.00257 ** 
#> albumin   -1.04298   0.35240  0.21211   0.20454 -5.099 3.41e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>           exp(coef) exp(-coef) lower .95 upper .95
#> log(bili)    2.4252     0.4123     2.031    2.8958
#> protime      1.2775     0.7828     1.089    1.4979
#> albumin      0.3524     2.8377     0.236    0.5262
#> 
#> Concordance= 0.808  (se = 0.019 )
#> Likelihood ratio test= NA  on 3 df,   p=NA
#> Wald test            = 178.3  on 3 df,   p=<2e-16
#> Score (logrank) test = NA  on 3 df,   p=NA
#> 
#>   (Note: the likelihood ratio and score tests assume independence of
#>      observations within a cluster, the Wald and robust score tests do not).

When I deploy broom::tidy() I can obtain estimates of the HRs, SEs, Robust SEs, and p-values that match the content of the first coefficient summary table from summary() and 95% CIs that match those from the second coefficients summary table from summary().

# A model summary produced via `tidy()`
m.index |> tidy(exponentiate = TRUE, conf.int = TRUE)

#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> # A tibble: 3 × 8
#>   term      estimate std.error robust.se statistic  p.value conf.low conf.high
#>   <chr>        <dbl>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 log(bili)    2.43     0.0914    0.0905      9.79 1.23e-22    2.03      2.90 
#> 2 protime      1.28     0.0782    0.0812      3.01 2.57e- 3    1.09      1.50 
#> 3 albumin      0.352    0.212     0.205      -5.10 3.41e- 7    0.236     0.526

However, when I deploy parameters::parameters() expecting to obtain the same estimates, the reported output departs from expectation in a few notable ways:

  1. Information on the survey design object is iteratively printed, a departure from parameter summaries of svyglm models.
  2. The SEs are consistent with the model-based SEs, but I believe the Robust SEs are recommended here.
  3. The CIs are inconsistent with those from summary() or broom::tidy(), I believe owing to the fact that they are estimated using the model-based SEs instead of the robust SEs (though in this example, they are also 95% CIs for the log-HR and not the HR).
  4. The p-values are inconsistent with those from summary() or broom::tidy().
  5. In this example, the degrees of freedom reported by parameters::parameters() are consistent with the degrees of freedom estimated by survey::degf() which accounts for the complex sampling architecture. However, for a separate project using prepared NHANES data set, when I summarized a svycoxph model using parameters::parameters() the degrees of freedom reported by parameters::parameters() were inconsistent with the DoF calculated by survey::degf(). While the later example is not included here, I believe the issue might stem from the fact that the design object produced for the current example contains no cluster information (e.g., svydesign(id = ~1, ...) is specified instead of setting to a vector (or vectors) reflecting a different psu).
# A model summary produced via `parameters()`
m.index |> parameters(digits = 5, exponentiate = FALSE)

#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Parameter  | Coefficient |      SE |               95% CI | Statistic |  df |      p
#> ------------------------------------------------------------------------------------
#> bili [log] |     0.88592 | 0.09140 | [ 0.70608,  1.06576] |   9.79099 | 309 | 0.002 
#> protime    |     0.24487 | 0.07825 | [ 0.09091,  0.39883] |   3.01490 | 309 | 0.083 
#> albumin    |    -1.04298 | 0.21211 | [-1.46034, -0.62562] |  -5.09909 | 309 | > .999
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald distribution approximation.

I do not have immediate recommendations on how to resolve this issue, but I thought to bring it to your attention in case it has gone overlooked.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions