library(tidyverse)
library(brms)
library(tidybayes)
<- tibble(
df anesthesia = factor(c(rep("neuroaxial", 63), # 63 neuroaxial pts
rep("general", 60))), # 60 GA pts
ppc = c(rep(1, 8), rep(0, 63 - 8), # 8 experienced a PPC
rep(1, 16), rep(0, 60 - 16)) # 16 experienced a PPC
)
Every intensivist should be reading Critical Care Reviews. It’s a great resource where the week’s published trials, reviews, and guidelines are listed, allowing you to flick through and appraise (as critically as you like) anything that peaks your interest.
One that caught my eye was an RTC comparing the incidence of “postoperative pulmonary complications” (PPCs) following vascular surgery under neuroaxial vs general anaesthetics. The abstract concludes:
A total of 128 patients were included in the study, with 123 patients completing the study protocol. Approximately 26.7% of patients who received general anesthesia experienced PPC, compared with 12.7% of those who received spinal anesthesia (p = 0.051)… Spinal anesthesia did not significantly reduce the incidence of PPCs in patients undergoing peripheral vascular surgery compared with general anesthesia
No significant reduction in PPCs, despite on the face of it a fall from 27% to 13%? That feels “significant”, even if the t-test they used says it isn’t. This sounds like a job for the Reverend.
We’ll start by mimicking their data. We’re only concerned about the primary outcome, and there was no adjustment in their statistics, so we can simply make a table showing 8 PPCs in the 63 patients who received neuroaxial anaesthesia, and 16 in the 60 patients who recieved a GA.
Next, and what makes this Bayesian, is we need to define our priors. We’ll go for:
- A non-informative prior (i.e. we know nothing and expect nothing as a result)
- An optimistic prior, based on the paper’s power calculation (i.e, we think neuroaxial anaesthesia will reduce PPCs from 40% to 27.5%)
- A skeptical prior, i.e. an expectation that neuroaxial anaesthesia will make it worse by the same amount as the optimists think it’ll reduce.
Let’s have a look at these:
# Non-informative prior
<- function(x) dnorm(x, mean = 0, sd = 10)
prior_noninformative
# Optimistic prior (reduction from 40% to 27.5%)
<- log(0.275 / (1 - 0.275)) - log(0.40 / (1 - 0.40))
prior_optimistic_logodds <- function(x) dnorm(x, mean = prior_optimistic_logodds, sd = 1)
prior_optimistic
# Skeptical prior (opposite to optimistic)
<- -prior_optimistic_logodds
prior_skeptical_logodds <- function(x) dnorm(x, mean = prior_skeptical_logodds, sd = 1)
prior_skeptical
# Calculate prior densities for each (for plotting)
<- seq(-5, 5, length.out = 1000)
x_values
<- tibble(
prior_densities x = rep(x_values, 3),
prior = factor(rep(c("Non-informative", "Optimistic", "Skeptical"), each = 1000)),
density = c(prior_noninformative(x_values), prior_optimistic(x_values), prior_skeptical(x_values))
)
# Plot the priors
ggplot(prior_densities) +
aes(x = x, y = density, color = prior) +
geom_line() +
labs(
x = "Effect of Neuroaxial Anesthesia (log-odds ratio)",
y = "Density",
title = "Prior Distributions for the Effect of Neuroaxial Anesthesia",
color = "Prior"
+
) theme_classic()
The non-informative prior believes all likelihoods are equal, the optimistic is centred around an OR of 0.6, and the skeptical around 1.8 for PPCs.
We now need to apply these models to the new data using {brms}. I’m using logistic regression because it’s essentially the same thing as a t-test in this context.
# Model with non-informative prior
<- brm(
model_noninformative ~ anesthesia,
ppc data = df,
family = bernoulli(),
prior = c(prior(normal(0, 10), class = "b")),
chains = 4,
iter = 2000,
warmup = 1000,
seed = 123
)
# Model with optimistic prior
<- brm(
model_optimistic ~ anesthesia,
ppc data = df,
family = bernoulli(),
prior = c(prior(normal(log(0.275 / (1 - 0.275)) - log(0.40 / (1 - 0.40)), 1), class = "b")),
chains = 4,
iter = 2000,
warmup = 1000,
seed = 123
)
# Model with skeptical prior
<- brm(
model_skeptical ~ anesthesia,
ppc data = df,
family = bernoulli(),
prior = c(prior(normal(-(log(0.275 / (1 - 0.275)) - log(0.40 / (1 - 0.40))), 1), class = "b")),
chains = 4,
iter = 2000,
warmup = 1000,
seed = 123
)
The big problem with Bayesian analyses as opposed to frequentist approaches is no longer the computation, because these models execute quicktly - but it’s how do we then interpret them.
We can start with the t-test used in the article, as most clinicians “understand” these:
# Create a subset of the data for each anesthesia group
<- df %>% filter(anesthesia == "neuroaxial") %>% pull(ppc)
neuroaxial_group <- df %>% filter(anesthesia == "general") %>% pull(ppc)
general_group
# Perform the t-test (Assuming unequal variances)
<- t.test(neuroaxial_group, general_group, var.equal = FALSE)
t_test_result
# Print the results
print(t_test_result)
Welch Two Sample t-test
data: neuroaxial_group and general_group
t = -1.9555, df = 109.5, p-value = 0.05308
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.281251458 0.001886378
sample estimates:
mean of x mean of y
0.1269841 0.2666667
This spits out a nice answer. We think that there’s a 5.3% that the data we observed might occur if the true PPC difference was actually zero, and as such we reject the hypothesis that there is a difference to avoid making a type 1 error (but risk - and indeed probably make - a type 2 error).
However, the t-test doesn’t tell us the probability of either of the hypotheses being true, or the true difference between groups. It is also heavily effected by both the magnitude of the difference we’re looking for (i.e. the power calculation, which assumed a 12.5% reduction in PPC rate) and correspondingly the sample sizes. The smaller the true difference, the more samples you’re going to need to find it - as such one wonders if with a larger sample size the value would have crept under the magic 5%?
Now we’ll look at the Bayesian test result instead. For the time being lets stick with just the non-informative (unbiased) prior.
summary(model_noninformative)
Family: bernoulli
Links: mu = logit
Formula: ppc ~ anesthesia
Data: df (Number of observations: 123)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept -1.02 0.30 -1.60 -0.46 1.00 4008
anesthesianeuroaxial -0.94 0.49 -1.93 -0.05 1.00 2442
Tail_ESS
Intercept 2646
anesthesianeuroaxial 2468
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The Bayesian coefficient (and its posterior distribution) represents our updated belief about the effect of neuroaxial anesthesia on PPCs, after considering both the new data and our prior knowledge (or in this case, lack of knowledge). It’s a statement about the plausible values of the effect, given the evidence - i.e. the opposite of the frequentist t-test above.
In this case, we can be 95% confident that the natural logarithm of the odds ratio for the effect of neuroaxial anaesthesia on PPCs is between -1.93 and -0.05. Or to convert it into numbers that are more understandable:
# Extract posterior samples for the relevant parameter (anesthesianeuroaxial)
<- model_noninformative %>%
posterior_noninformative spread_draws(b_anesthesianeuroaxial)
# Calculate the odds ratio (OR)
<- posterior_noninformative %>%
posterior_noninformative mutate(OR = exp(b_anesthesianeuroaxial))
# Now you can summarize and interpret the OR
%>%
posterior_noninformative summarize(
median_OR = median(OR),
lower_CI = quantile(OR, 0.025),
upper_CI = quantile(OR, 0.975)
)
# A tibble: 1 × 3
median_OR lower_CI upper_CI
<dbl> <dbl> <dbl>
1 0.397 0.145 0.953
So the most probable OR is 0.4, and we’re 95% confident that the true value is somewhere between 0.15 and 0.95.
But this is where it gets more complex. Bayesian thinking relies on the prior probability too, and updating this “belief” with the new data. So lets’s look at what we now believe if we used the other priors:
# Extract other priors
<- model_optimistic %>%
posterior_optimistic spread_draws(b_anesthesianeuroaxial)
<- model_skeptical %>%
posterior_skeptical spread_draws(b_anesthesianeuroaxial)
# Calculate the proportion of posterior samples where the coefficient is negative
# (i.e. fewer PPCs with neuroaxial)
<- posterior_noninformative %>%
confidence_noninformative summarize(prob_fewer_ppc = mean(b_anesthesianeuroaxial < 0)) %>%
pull(prob_fewer_ppc)
<- posterior_optimistic %>%
confidence_optimistic summarize(prob_fewer_ppc = mean(b_anesthesianeuroaxial < 0)) %>%
pull(prob_fewer_ppc)
<- posterior_skeptical %>%
confidence_skeptical summarize(prob_fewer_ppc = mean(b_anesthesianeuroaxial < 0)) %>%
pull(prob_fewer_ppc)
And for each one in turn, how confident we are now that neuroaxial anaesthesia reduces PPCs:
print(paste0("Confidence (non-informative prior): ", round(confidence_noninformative * 100, 2), "%"))
[1] "Confidence (non-informative prior): 98.28%"
print(paste0("Confidence (optimistic prior): ", round(confidence_optimistic * 100, 2), "%"))
[1] "Confidence (optimistic prior): 97.9%"
print(paste0("Confidence (skeptical prior): ", round(confidence_skeptical * 100, 2), "%"))
[1] "Confidence (skeptical prior): 94.12%"
So in summary:
- If you believed there’s no effect, you’re now 98% confident that neuroaxial anaesthesia reduces PPCs.
- More interestingly, if you believed it made things worse, you’re now 94% confident that it reduces PPCs.
- And if you previously believes there was an effect, oddly you’re now (slightly) less confident than those who didn’t believe there was any effect at all.
Why is that? Well, to understand that we need to look at the probability distributions:
# Combine the posterior samples into a single data frame for plotting
<- bind_rows(
posterior_combined %>% mutate(prior = "Non-informative"),
posterior_noninformative %>% mutate(prior = "Optimistic"),
posterior_optimistic %>% mutate(prior = "Skeptical")
posterior_skeptical
)
# Create the plot
ggplot(posterior_combined, aes(x = b_anesthesianeuroaxial, fill = prior)) +
geom_density(alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dashed") +
labs(
x = "Effect of Neuroaxial Anesthesia (log-odds ratio)",
y = "Density",
title = "Posterior Distributions of Treatment Effect with Different Priors",
fill = "Prior"
+
) theme_classic()
Despite what you learned in basic statistics, probability isn’t just a binary p-value - it’s a distribution of confidence. We’re increasingly confident about the middle, and less confident towards the tails.
What we’re seeing here is that the optimistic prior’s mean is slightly lower than the non-informative prior’s - but it is narrower (i.e. more certain), and has shifted slightly away from the mean it had before (as the difference is less stark than the 40% vs 27.5% it was expecting).
Similarly, the skeptic is more confident that the true difference is slightly less than the other two, but that’s because they’ve been dragged all the way from the other side of the line of no effect.
So to summarise - if the stats in a paper look wrong, they probably are. The authors do address this in their discussion:
Given the observed trend toward a reduction in the overall PPC incidence in the neuraxial anesthesia group, it is conceivable that a type II error may have occurred, and a larger sample size might be necessary to detect a significant difference in PPC incidence between the groups. Based on these findings, a minimum sample size of 252 patients (126 patients enrolled in each study arm) would be required to detect an absolute difference of 14% (with α = 0.05 and β = 0.20) in PPC incidence between neuraxial and general anesthesia in patients undergoing peripheral bypass surgery.
However, if you want to be strictly frequentist about things, then sure, there is no “significant” difference between neuroaxial and general anaesthesia in the incidence of PPCs in vascular patients (but then there’s also no such thing as a “trend” in frequentist stats). But if you want to think in a probabilistic way, based on this data, we can be at least 94% sure that neuroaxial anaesthesia is better in this context.