Neuroaxial vs GA: A Bayesian take

Lies, damned lies, and P-values that don’t mean what you think they do

Author

Nick Plummer

Published

January 19, 2025

Every intensivist should be reading Critical Care Reviews. It’s a great resource where the week’s published trials, reviews, and guidelines are listed, allowing you to flick through and appraise (as critically as you like) anything that peaks your interest.

One that caught my eye was an RTC comparing the incidence of “postoperative pulmonary complications” (PPCs) following vascular surgery under neuroaxial vs general anaesthetics. The abstract concludes:

A total of 128 patients were included in the study, with 123 patients completing the study protocol. Approximately 26.7% of patients who received general anesthesia experienced PPC, compared with 12.7% of those who received spinal anesthesia (p = 0.051)… Spinal anesthesia did not significantly reduce the incidence of PPCs in patients undergoing peripheral vascular surgery compared with general anesthesia

No significant reduction in PPCs, despite on the face of it a fall from 27% to 13%? That feels “significant”, even if the t-test they used says it isn’t. This sounds like a job for the Reverend.

We’ll start by mimicking their data. We’re only concerned about the primary outcome, and there was no adjustment in their statistics, so we can simply make a table showing 8 PPCs in the 63 patients who received neuroaxial anaesthesia, and 16 in the 60 patients who recieved a GA.

library(tidyverse)
library(brms)
library(tidybayes)

df <- tibble(
  anesthesia = factor(c(rep("neuroaxial", 63), # 63 neuroaxial pts
                        rep("general", 60))),  # 60 GA pts
  ppc = c(rep(1, 8), rep(0, 63 - 8),           # 8 experienced a PPC
          rep(1, 16), rep(0, 60 - 16))         # 16 experienced a PPC        
)

Next, and what makes this Bayesian, is we need to define our priors. We’ll go for:

  1. A non-informative prior (i.e. we know nothing and expect nothing as a result)
  2. An optimistic prior, based on the paper’s power calculation (i.e, we think neuroaxial anaesthesia will reduce PPCs from 40% to 27.5%)
  3. A skeptical prior, i.e. an expectation that neuroaxial anaesthesia will make it worse by the same amount as the optimists think it’ll reduce.

Let’s have a look at these:

# Non-informative prior
prior_noninformative <- function(x) dnorm(x, mean = 0, sd = 10)

# Optimistic prior (reduction from 40% to 27.5%)
prior_optimistic_logodds <- log(0.275 / (1 - 0.275)) - log(0.40 / (1 - 0.40))
prior_optimistic <- function(x) dnorm(x, mean = prior_optimistic_logodds, sd = 1)

# Skeptical prior (opposite to optimistic)
prior_skeptical_logodds <- -prior_optimistic_logodds
prior_skeptical <- function(x) dnorm(x, mean = prior_skeptical_logodds, sd = 1)

# Calculate prior densities for each (for plotting)
x_values <- seq(-5, 5, length.out = 1000)

prior_densities <- tibble(
  x = rep(x_values, 3),
  prior = factor(rep(c("Non-informative", "Optimistic", "Skeptical"), each = 1000)),
  density = c(prior_noninformative(x_values), prior_optimistic(x_values), prior_skeptical(x_values))
)

# Plot the priors
ggplot(prior_densities) +
  aes(x = x, y = density, color = prior) +
  geom_line() +
  labs(
    x = "Effect of Neuroaxial Anesthesia (log-odds ratio)",
    y = "Density",
    title = "Prior Distributions for the Effect of Neuroaxial Anesthesia",
    color = "Prior"
  ) +
  theme_classic()

The non-informative prior believes all likelihoods are equal, the optimistic is centred around an OR of 0.6, and the skeptical around 1.8 for PPCs.

We now need to apply these models to the new data using {brms}. I’m using logistic regression because it’s essentially the same thing as a t-test in this context.

# Model with non-informative prior
model_noninformative <- brm(
  ppc ~ anesthesia, 
  data = df, 
  family = bernoulli(),
  prior = c(prior(normal(0, 10), class = "b")),
  chains = 4, 
  iter = 2000, 
  warmup = 1000,
  seed = 123
)

# Model with optimistic prior
model_optimistic <- brm(
  ppc ~ anesthesia, 
  data = df, 
  family = bernoulli(),
  prior = c(prior(normal(log(0.275 / (1 - 0.275)) - log(0.40 / (1 - 0.40)), 1), class = "b")), 
  chains = 4, 
  iter = 2000, 
  warmup = 1000,
  seed = 123
)

# Model with skeptical prior
model_skeptical <- brm(
  ppc ~ anesthesia, 
  data = df, 
  family = bernoulli(),
  prior = c(prior(normal(-(log(0.275 / (1 - 0.275)) - log(0.40 / (1 - 0.40))), 1), class = "b")),
  chains = 4, 
  iter = 2000, 
  warmup = 1000,
  seed = 123
)

The big problem with Bayesian analyses as opposed to frequentist approaches is no longer the computation, because these models execute quicktly - but it’s how do we then interpret them.

We can start with the t-test used in the article, as most clinicians “understand” these:

# Create a subset of the data for each anesthesia group
neuroaxial_group <- df %>% filter(anesthesia == "neuroaxial") %>% pull(ppc)
general_group <- df %>% filter(anesthesia == "general") %>% pull(ppc)

# Perform the t-test (Assuming unequal variances)
t_test_result <- t.test(neuroaxial_group, general_group, var.equal = FALSE)

# Print the results
print(t_test_result)

    Welch Two Sample t-test

data:  neuroaxial_group and general_group
t = -1.9555, df = 109.5, p-value = 0.05308
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.281251458  0.001886378
sample estimates:
mean of x mean of y 
0.1269841 0.2666667 

This spits out a nice answer. We think that there’s a 5.3% that the data we observed might occur if the true PPC difference was actually zero, and as such we reject the hypothesis that there is a difference to avoid making a type 1 error (but risk - and indeed probably make - a type 2 error).

However, the t-test doesn’t tell us the probability of either of the hypotheses being true, or the true difference between groups. It is also heavily effected by both the magnitude of the difference we’re looking for (i.e. the power calculation, which assumed a 12.5% reduction in PPC rate) and correspondingly the sample sizes. The smaller the true difference, the more samples you’re going to need to find it - as such one wonders if with a larger sample size the value would have crept under the magic 5%?

Now we’ll look at the Bayesian test result instead. For the time being lets stick with just the non-informative (unbiased) prior.

summary(model_noninformative)
 Family: bernoulli 
  Links: mu = logit 
Formula: ppc ~ anesthesia 
   Data: df (Number of observations: 123) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept               -1.02      0.30    -1.60    -0.46 1.00     4008
anesthesianeuroaxial    -0.94      0.49    -1.93    -0.05 1.00     2442
                     Tail_ESS
Intercept                2646
anesthesianeuroaxial     2468

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The Bayesian coefficient (and its posterior distribution) represents our updated belief about the effect of neuroaxial anesthesia on PPCs, after considering both the new data and our prior knowledge (or in this case, lack of knowledge). It’s a statement about the plausible values of the effect, given the evidence - i.e. the opposite of the frequentist t-test above.

In this case, we can be 95% confident that the natural logarithm of the odds ratio for the effect of neuroaxial anaesthesia on PPCs is between -1.93 and -0.05. Or to convert it into numbers that are more understandable:

# Extract posterior samples for the relevant parameter (anesthesianeuroaxial)
posterior_noninformative <- model_noninformative %>% 
  spread_draws(b_anesthesianeuroaxial) 

# Calculate the odds ratio (OR)
posterior_noninformative <- posterior_noninformative %>% 
  mutate(OR = exp(b_anesthesianeuroaxial))

# Now you can summarize and interpret the OR
posterior_noninformative %>% 
  summarize(
    median_OR = median(OR),
    lower_CI = quantile(OR, 0.025),
    upper_CI = quantile(OR, 0.975)
  )
# A tibble: 1 × 3
  median_OR lower_CI upper_CI
      <dbl>    <dbl>    <dbl>
1     0.397    0.145    0.953

So the most probable OR is 0.4, and we’re 95% confident that the true value is somewhere between 0.15 and 0.95.

But this is where it gets more complex. Bayesian thinking relies on the prior probability too, and updating this “belief” with the new data. So lets’s look at what we now believe if we used the other priors:

# Extract other priors
posterior_optimistic <- model_optimistic %>% 
  spread_draws(b_anesthesianeuroaxial) 

posterior_skeptical <- model_skeptical %>% 
  spread_draws(b_anesthesianeuroaxial) 

# Calculate the proportion of posterior samples where the coefficient is negative 
# (i.e. fewer PPCs with neuroaxial)
confidence_noninformative <- posterior_noninformative %>% 
  summarize(prob_fewer_ppc = mean(b_anesthesianeuroaxial < 0)) %>% 
  pull(prob_fewer_ppc)

confidence_optimistic <- posterior_optimistic %>% 
  summarize(prob_fewer_ppc = mean(b_anesthesianeuroaxial < 0)) %>% 
  pull(prob_fewer_ppc)

confidence_skeptical <- posterior_skeptical %>% 
  summarize(prob_fewer_ppc = mean(b_anesthesianeuroaxial < 0)) %>% 
  pull(prob_fewer_ppc)

And for each one in turn, how confident we are now that neuroaxial anaesthesia reduces PPCs:

print(paste0("Confidence (non-informative prior): ", round(confidence_noninformative * 100, 2), "%"))
[1] "Confidence (non-informative prior): 98.28%"
print(paste0("Confidence (optimistic prior): ", round(confidence_optimistic * 100, 2), "%"))
[1] "Confidence (optimistic prior): 97.9%"
print(paste0("Confidence (skeptical prior): ", round(confidence_skeptical * 100, 2), "%"))
[1] "Confidence (skeptical prior): 94.12%"

So in summary:

Why is that? Well, to understand that we need to look at the probability distributions:

# Combine the posterior samples into a single data frame for plotting
posterior_combined <- bind_rows(
  posterior_noninformative %>% mutate(prior = "Non-informative"),
  posterior_optimistic %>% mutate(prior = "Optimistic"),
  posterior_skeptical %>% mutate(prior = "Skeptical")
)

# Create the plot
ggplot(posterior_combined, aes(x = b_anesthesianeuroaxial, fill = prior)) +
  geom_density(alpha = 0.5) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(
    x = "Effect of Neuroaxial Anesthesia (log-odds ratio)",
    y = "Density",
    title = "Posterior Distributions of Treatment Effect with Different Priors",
    fill = "Prior"
  ) +
  theme_classic()

Despite what you learned in basic statistics, probability isn’t just a binary p-value - it’s a distribution of confidence. We’re increasingly confident about the middle, and less confident towards the tails.

What we’re seeing here is that the optimistic prior’s mean is slightly lower than the non-informative prior’s - but it is narrower (i.e. more certain), and has shifted slightly away from the mean it had before (as the difference is less stark than the 40% vs 27.5% it was expecting).

Similarly, the skeptic is more confident that the true difference is slightly less than the other two, but that’s because they’ve been dragged all the way from the other side of the line of no effect.

So to summarise - if the stats in a paper look wrong, they probably are. The authors do address this in their discussion:

Given the observed trend toward a reduction in the overall PPC incidence in the neuraxial anesthesia group, it is conceivable that a type II error may have occurred, and a larger sample size might be necessary to detect a significant difference in PPC incidence between the groups. Based on these findings, a minimum sample size of 252 patients (126 patients enrolled in each study arm) would be required to detect an absolute difference of 14% (with α = 0.05 and β = 0.20) in PPC incidence between neuraxial and general anesthesia in patients undergoing peripheral bypass surgery.

However, if you want to be strictly frequentist about things, then sure, there is no “significant” difference between neuroaxial and general anaesthesia in the incidence of PPCs in vascular patients (but then there’s also no such thing as a “trend” in frequentist stats). But if you want to think in a probabilistic way, based on this data, we can be at least 94% sure that neuroaxial anaesthesia is better in this context.