Won’t somebody please think of the pie charts?

After being called out at the EMSA Winter Scientific Meeting for my hatred of pie charts, I think it’s important to point out why they are the worst way to visualise your audit data on a poster or presentation, and more importantly how straightforward it is to do better.

Edward Tufte, the don of data visualisation, had this to say about pie charts:

A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies… Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used.

Why would he say such a thing? Humans love a circle, and pie charts are insanely popular in just about any medium where data needs to be presented. Excel especially loves to generate a pie chart for you. So they can’t be that bad?

Pie charts are notoriously difficult to read, and are horrendously ambiguous:

When there are too many categories tiny slices become indistinguishable
People tend to interpret them by area, rather than angle (as intended). This leads to subjective interpretation, especially when people start adding 3D effects (shudder).
Similarly they can be made even more misleading - for example the angle of a slice affects how you interpret its size
And ultimately, unless percentages are added (in which case, just use a table) comparing similar proportions is almost impossible. Good luck comparing that 22% slice to the 24% one.

Don’t just take my rant for it - turns out there’s a decent evidence base behind why they’re bad too.

But all I know is a pie chart!

What’s the alternative then?

For pretty much every data set I’ve seen recently presented as a pie chart on a poster or presentation, a bar chart would be far, far less ambiguous. Let’s use some data from one of my own audits as an example. This is sedation regimes used during transfer of a critically unwell patient.

library(tidyverse)

raw_data <- read_csv("raw_data.csv")

processed <- raw_data %>% 
  mutate(Sedation = case_when(
    # If nothing documented
    is.na(Hypnotic) & is.na(Opiate) & is.na(Benzo) ~ "No sedation documented",
    # Otherwise combine the hypnotic/opiate/and benzodiazepines
    TRUE ~ paste(ifelse(!is.na(Hypnotic), Hypnotic, ""), 
                 ifelse(!is.na(Opiate), Opiate, ""), 
                 ifelse(!is.na(Benzo), Benzo, ""), 
                 sep = " ")
  )) %>% 
  # Get rid of leading and trailing whitespace
  mutate(Sedation = trimws(Sedation))

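# Tabulate how often each regime appears
# (dplyr's count() needs a data frame, so use base R's table() on the column)
table(processed$Sedation)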


         Alfentanil Midazolam            Fentanyl Midazolam 
                            1                             4 
           Morphine Midazolam        No sedation documented 
                           10                             8 
                     Propofol           Propofol  Midazolam 
                           72                             7 
          Propofol Alfentanil Propofol Alfentanil Midazolam 
                           20                             3 
            Propofol Fentanyl             Propofol Morphine 
                           16                             8 
  Propofol Morphine Midazolam         Propofol Remifentanil 
                            1                             6 
                 Remifentanil 
                            1
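
If you don’t have the audit spreadsheet to hand, the same steps can be tried on a toy data frame. This is only a sketch: the rows in toy_data are invented, the column names are the ones used above, and I’ve swapped in coalesce() and str_squish() (both loaded with the tidyverse) as one way of avoiding the double spaces that paste() leaves behind when a drug class is missing.

```r
library(tidyverse)

# Hypothetical stand-in for raw_data.csv: one row per transfer, NA = not documented
toy_data <- tribble(
  ~Hypnotic,  ~Opiate,    ~Benzo,
  "Propofol", NA,         NA,
  "Propofol", "Fentanyl", NA,
  "Propofol", NA,         "Midazolam",
  NA,         "Morphine", "Midazolam",
  NA,         NA,         NA,
  "Propofol", NA,         NA
)

toy_counts <- toy_data %>%
  mutate(Sedation = case_when(
    # If nothing documented
    is.na(Hypnotic) & is.na(Opiate) & is.na(Benzo) ~ "No sedation documented",
    # Otherwise combine the drugs, treating missing ones as empty strings
    TRUE ~ paste(coalesce(Hypnotic, ""), coalesce(Opiate, ""), coalesce(Benzo, ""))
  )) %>%
  # str_squish() trims both ends and collapses repeated internal spaces
  mutate(Sedation = str_squish(Sedation)) %>%
  # One row per regime, most common first
  count(Sedation, sort = TRUE)

toy_counts
```

Using count() on the data frame rather than table() on the column gives a tibble back, which is what the plotting code expects.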

Let’s try displaying this as a pie chart. Note that pie charts are so bad that ggplot2 doesn’t allow you to make them natively, so you instead have to make a bar chart and then play with the coordinate system to make it into a pie!

processed %>% 
  # Count each type of sedation
  count(Sedation) %>% 
  # Pass to ggplot() to plot
  ggplot() +
  aes(x= "", y = n, fill = Sedation) +
  geom_bar(stat="identity", width=1) +
  coord_polar("y", start=0) +
  # Remove background, grid, numeric labels to neaten
  theme_void()

Let’s be honest - this is essentially useless. We can see that propofol alone seems to make up the bulk of the sedation approaches used, but the smaller segments are essentially uninterpretable. The table is more interpretable at a glance.

We could try to fiddle with the pie chart to make it easier to interpret by combining groups we’re not interested in together (all types of opiates, for example) and then using colours to highlight the things we’re worried about (single-agent sedation, and times that no sedation has been documented).

cleaner <- raw_data %>% 
  mutate(Sedation = case_when(
    is.na(Hypnotic) & is.na(Opiate) & is.na(Benzo) ~ "No sedation documented",
    TRUE ~ paste(ifelse(!is.na(Hypnotic), Hypnotic, ""), 
                 # Collapse all opiates into one group
                 ifelse(!is.na(Opiate), paste("+", "Opiate"), ""), 
                 ifelse(!is.na(Benzo), paste("+", Benzo), ""), 
                 sep = " ")
  )) %>%
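  # Tidy up the names of the groups: trim the ends and collapse the double
  # spaces paste() leaves when a drug is missing (e.g. "Propofol  + Midazolam"),
  # so the names match the labels used to order the bars later
  mutate(Sedation = str_squish(Sedation)) %>% 
  mutate(Sedation = gsub("^\\+\\s*", "", Sedation)) %>% # Remove leading "+ "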
  count(Sedation)

cleaner %>% 
  ggplot() +
  aes(x= "", y = n, fill = Sedation) +
  geom_bar(stat="identity", width=1, colour = "black") +
  coord_polar("y", start=0) +
  theme_void() +
  # Add a colour scale to highlight things we're concerned about
  scale_fill_manual(values = c("red", "#77B8DF", "grey", "yellow", "grey", "grey", "grey"))

It’s still not the clearest - there’s no indication of absolute or relative numbers, we lose information by making all the mixed regimes the same colour, and the legend takes up space that could have gone to the data.

What are our options? Add labels to the chart with numbers or percentages? Or even the sedation type for each section?

Luckily there’s another visualisation available to us that contains all this information, and more:

cleaner %>%
  mutate(
    # Turn the absolute numbers into percentage
    Percentage = n / sum(n) * 100,
    # Order the sedation types in how we want them to appear
    Sedation = fct_relevel(Sedation,
                           "Propofol + Opiate + Midazolam",
                           "Propofol + Midazolam",
                           "Opiate + Midazolam",
                           "Opiate",
                           "Propofol + Opiate",
                           "Propofol",
                           "No sedation documented")
    ) %>% 
  ggplot() +
  aes(x = Sedation, y = Percentage, fill = Sedation) +
  geom_bar(stat = "identity", color="black") +
  # Add meaningful labels
  labs(title = "Documented sedation regime during transfer", x = "", y="Used on % of transfers") +
  scale_fill_manual(values = c("grey", "grey", "grey", "#77B8DF", "grey", "yellow", "red")) +
  # Rotate 90deg
  coord_flip() +
  theme_classic() +
  # Get rid of the legend as it's no longer needed
  theme(legend.position = "none")

Why is this so much better?

Enables direct comparison: The length of the bars makes it crystal clear which category dominates
Values are explicit: No need to estimate the size of a slice; the axis has you covered
Space for detail: Want to annotate specific points? A bar chart offers all the room you need
Clarity: The key bars are highlighted, and the rest can still be read straight off the axis labels without the need for complex colour schemes and legends
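
As a rough sketch of the “space for detail” point, here is one way to print the count and percentage at the end of each bar. The counts are typed in by hand from the frequency table earlier in the post (collapsing the opiates as before), and the names sedation_counts, Label and labelled_plot are just ones I’ve made up for this example.

```r
library(tidyverse)

# Counts rebuilt by hand from the frequency table earlier in the post
sedation_counts <- tribble(
  ~Sedation,                        ~n,
  "Propofol + Opiate + Midazolam",   4,
  "Propofol + Midazolam",            7,
  "Opiate + Midazolam",             15,
  "Opiate",                          1,
  "Propofol + Opiate",              50,
  "Propofol",                       72,
  "No sedation documented",          8
) %>%
  mutate(
    Percentage = n / sum(n) * 100,
    # Keep the bars in the order the rows are written above
    Sedation = fct_inorder(Sedation),
    # Text for each bar: the count, then the rounded percentage in brackets
    Label = paste0(n, " (", round(Percentage), "%)")
  )

labelled_plot <- ggplot(sedation_counts) +
  aes(x = Sedation, y = Percentage) +
  geom_col(fill = "grey", colour = "black") +
  # Print the label just past the end of each bar
  geom_text(aes(label = Label), hjust = -0.1) +
  # Leave headroom on the percentage axis so the labels are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.2))) +
  labs(title = "Documented sedation regime during transfer",
       x = "", y = "Used on % of transfers") +
  # Rotate 90deg
  coord_flip() +
  theme_classic()

labelled_plot
```

Try doing that on the pie chart without the labels colliding in the small slices.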

Although I’ve done this in R, the same principles of data visualisation apply when using something like Excel to make graphs. Only you’d never do that, because the concept of reproducible research applies to audits etc too, so you’re going to use R to manipulate your data in a reproducible way? Yes?

But what’s the problem with multiple pie charts?

Back to Professor Tufte:

the only worse design than a pie chart is several of them

There seems to be a trend of showing changes over time with a series of pie charts, kind of like this (made up) example of results from a teaching session over time that I’ve mostly borrowed from Yan Holtz:

Don’t pretend that you can actually make head or tail of the trends here!

Luckily our old friend the bar chart is here to help:

This is objectively a much clearer representation - it shows the feedback for the teaching session improving from left to right, and allows the reader to extract the values.
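
A minimal sketch of how a grouped bar chart like this can be put together. The numbers here are invented purely for illustration (they are not the figures behind the charts in this post), and the names feedback and feedback_bars are my own.

```r
library(tidyverse)

# Hypothetical feedback percentages, invented purely for illustration
feedback <- tibble(
  Session = rep(paste("Session", 1:4), each = 3),
  Rating = rep(c("Poor", "Average", "Good"), times = 4),
  Percentage = c(40, 35, 25,
                 30, 35, 35,
                 20, 30, 50,
                 10, 25, 65)
) %>%
  # Fix the rating order so it does not default to alphabetical
  mutate(Rating = fct_relevel(Rating, "Poor", "Average", "Good"))

feedback_bars <- ggplot(feedback) +
  aes(x = Session, y = Percentage, fill = Rating) +
  # Side-by-side bars: one cluster per session, one bar per rating
  geom_col(position = "dodge", colour = "black") +
  labs(title = "Teaching session feedback (made-up data)",
       x = "", y = "% of responses") +
  theme_classic()

feedback_bars
```

All the bars share one axis, so the reader compares heights rather than hunting for the same coloured slice across four separate circles.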

There are other alternatives you could consider here, including line charts (below) or area graphs, but for most of the work we do around audits and QIPs bar charts are ideal (unless you’re considering using a run chart to show evidence of progress over time…)
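
For the line chart option, a sketch along these lines would do, following a single measure across sessions. Again the data are made up for illustration and good_ratings and feedback_line are hypothetical names.

```r
library(tidyverse)

# Hypothetical data: % of attendees rating each session "Good"
good_ratings <- tibble(
  Session = 1:4,
  Percentage = c(25, 35, 50, 65)
)

feedback_line <- ggplot(good_ratings) +
  aes(x = Session, y = Percentage) +
  # A line for the trend, plus points so each session's value can be read off
  geom_line() +
  geom_point(size = 3) +
  # Show the full 0-100% range so the change is not exaggerated
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Sessions rated \"Good\" (made-up data)",
       x = "Session", y = "% of responses") +
  theme_classic()

feedback_line
```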

To wrap up

In conclusion, pie charts are perfect… if your goal is to obfuscate data, frustrate viewers, and add unnecessary pizzazz to presentations.

If you care about communicating information succinctly and clearly, please stick to bar charts (unless an even better visualisation is appropriate). And if you’re ever tempted, even after all this, I’ll point you towards this summary from The University of Melbourne: