Discrepancies in reporting for models of class svycoxph

# tl;dr
When applying `parameters::parameters()` to a model of class `svycoxph`, the reported SE is taken as the model-based SE instead of the robust SE, 95% CIs are incorrectly estimated, p-values are incorrectly estimated, and degrees of freedom may be incorrectly reported.

# The problem in-depth

Take the following data, which is adapted from the `survey` package vignette for `svycoxph`

```
# Loading packages
library(parameters)
library(survival)
library(survey)
library(broom)

# Importing data
df <- survival::pbc

# Defining radomization
df$randomized <- with(df, !is.na(trt) & trt > 0)

# Defining randomization probability weights
df$randprob <- fitted(glm(randomized ~ age * edema, data = df, family = binomial))

# Generating survey design object
d.svy <- svydesign(id = ~1, 
                   prob = ~randprob, 
                   strata = ~edema,
                   data = subset(df, randomized))

# The design-based cox proportional hazards model
m.index <- svycoxph(Surv(time, status > 0) ~ log(bili) + protime + albumin, design = d.svy)
```

I first use `summary()` to generate a coefficient summary table that will be taken as the correct summary of the model.

```
# A model summary produced via `summary()`
m.index |> summary()

#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Call:
#> svycoxph(formula = Surv(time, status > 0) ~ log(bili) + protime + 
#>     albumin, design = d.svy)
#> 
#>   n= 312, number of events= 144 
#> 
#>               coef exp(coef) se(coef) robust se      z Pr(>|z|)    
#> log(bili)  0.88592   2.42522  0.09140   0.09048  9.791  < 2e-16 ***
#> protime    0.24487   1.27745  0.07825   0.08122  3.015  0.00257 ** 
#> albumin   -1.04298   0.35240  0.21211   0.20454 -5.099 3.41e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>           exp(coef) exp(-coef) lower .95 upper .95
#> log(bili)    2.4252     0.4123     2.031    2.8958
#> protime      1.2775     0.7828     1.089    1.4979
#> albumin      0.3524     2.8377     0.236    0.5262
#> 
#> Concordance= 0.808  (se = 0.019 )
#> Likelihood ratio test= NA  on 3 df,   p=NA
#> Wald test            = 178.3  on 3 df,   p=<2e-16
#> Score (logrank) test = NA  on 3 df,   p=NA
#> 
#>   (Note: the likelihood ratio and score tests assume independence of
#>      observations within a cluster, the Wald and robust score tests do not).
```

When I deploy `broom::tidy()` I can obtain estimates of the HRs, SEs, Robust SEs, and p-values that match the content of the first coefficient summary table from `summary()` and 95% CIs that match those from the second coefficients summary table from `summary()`.

```
# A model summary produced via `tidy()`
m.index |> tidy(exponentiate = TRUE, conf.int = TRUE)

#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> # A tibble: 3 × 8
#>   term      estimate std.error robust.se statistic  p.value conf.low conf.high
#>   <chr>        <dbl>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 log(bili)    2.43     0.0914    0.0905      9.79 1.23e-22    2.03      2.90 
#> 2 protime      1.28     0.0782    0.0812      3.01 2.57e- 3    1.09      1.50 
#> 3 albumin      0.352    0.212     0.205      -5.10 3.41e- 7    0.236     0.526
```

However, when I deploy `parameters::parameters()` expecting to obtain the same estimates, the reported output departs from expectation in a few notable ways:

1. Information on the survey design object is iteratively printed, a departure from `parameter` summaries of `svyglm` models.
2. The SEs are consistent with the model-based SEs, but I believe the Robust SEs are recommended here.
3. The CIs are inconsistent with those from `summary()` or `broom::tidy()`, I believe owing to the fact that they are estimated using the model-based SEs instead of the robust SEs (though in this example, they are also 95% CIs for the log-HR and not the HR).
4. The p-values are inconsistent with those from `summary()` or `broom::tidy()`.
5. In this example, the degrees of freedom reported by `parameters::parameters()` are consistent with the degrees of freedom estimated by `survey::degf()` which accounts for the complex sampling architecture. However, for a separate project using prepared NHANES data set, when I summarized a `svycoxph` model using `parameters::parameters()` the degrees of freedom reported by `parameters::parameters()` were inconsistent with the DoF calculated by `survey::degf()`. While the later example is not included here, I believe the issue might stem from the fact that the design object produced for the current example contains no cluster information (e.g., `svydesign(id = ~1, ...)` is specified instead of setting to a vector (or vectors) reflecting a different psu).

```
# A model summary produced via `parameters()`
m.index |> parameters(digits = 5, exponentiate = FALSE)

#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(df, 
#>     randomized))
#> Parameter  | Coefficient |      SE |               95% CI | Statistic |  df |      p
#> ------------------------------------------------------------------------------------
#> bili [log] |     0.88592 | 0.09140 | [ 0.70608,  1.06576] |   9.79099 | 309 | 0.002 
#> protime    |     0.24487 | 0.07825 | [ 0.09091,  0.39883] |   3.01490 | 309 | 0.083 
#> albumin    |    -1.04298 | 0.21211 | [-1.46034, -0.62562] |  -5.09909 | 309 | > .999
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald distribution approximation.
```

I do not have immediate recommendations on how to resolve this issue, but I thought to bring it to your attention in case it has gone overlooked.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Discrepancies in reporting for models of class svycoxph #1174

tl;dr

The problem in-depth

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Discrepancies in reporting for models of class svycoxph #1174

Description

tl;dr

The problem in-depth

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions