The Spade

Tutorial: How to Make Sankey Diagrams in R with ggalluvial


More code hot off the press

Ray Carpenter
Jul 18, 2025

Hey everyone, we’re gonna jump right into this R tutorial on how to make the Sankey-style charts seen in my cash for clunkers write-up from last week.

1. Choose Some Data

If you’d like to use the actual cash for clunkers dataset, please contact me and I will send it to you. Below, I’m going to provide an even simpler walkthrough in case you have a dataset of your own you’d like to run this on.

A Sankey diagram shows flows: one thing into another, or one thing into another thing into another thing, and so on. The most common use out there on the internet is probably the folks who create them to document their job searches. One bucket of applications flows into several buckets such as ‘rejected’, ‘ghosted’, and ‘failed technical interview’, until the ‘offered’ bucket shows up and one record flows into an ‘accepted offer’ end bucket.
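To make the job-search example concrete, here’s a minimal sketch of that kind of flow in ggalluvial. The counts and bucket names are invented purely for illustration, and this is not the code behind the cash for clunkers charts; it just shows the basic shape of a two-axis alluvial plot.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical job-search counts, invented for illustration only
flows <- data.frame(
  stage   = rep("Applications", 4),
  outcome = c("Rejected", "Ghosted", "Failed interview", "Offered"),
  n       = c(40, 35, 8, 2)
)

# axis1 is the left-hand column, axis2 the right-hand column;
# y sets how thick each flow is
p <- ggplot(flows, aes(axis1 = stage, axis2 = outcome, y = n)) +
  geom_alluvium(aes(fill = outcome), width = 1/4) +
  geom_stratum(width = 1/4, fill = "grey90", color = "grey40") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3) +
  scale_x_discrete(limits = c("Stage", "Outcome"), expand = c(0.1, 0.1)) +
  theme_minimal()

print(p)
```

Each row of the data frame becomes one ribbon, so the only real requirement is one row per source/destination pair plus a count.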

For this example, I’m going to use Bill Radjewski’s new college football starter pack dataset. It’s paywalled, so I won’t share it, out of respect for Bill and what he’s built. However, a very similar dataset is available through the API on his website; here’s the link. Perhaps that’ll be the topic of a later tutorial if you need some help grabbing data from there. You can also easily adapt this script to work on the nflfastR dataset, which I’ve covered in a previous tutorial. I’ll show you how to do that below. Basically, step 1 is to pick your dataset.

Pick a dataset that has an entity and a result. For example, if you’re using nflfastR, choose play-by-play data from any season, filter it to a list of your favorite quarterbacks, and use DuckDB to count each quarterback’s fixed_drive_result values. The visualization I’ll show you how to make maps the quarterbacks to their drive results.
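Here’s a rough sketch of that entity-and-result counting step. It uses dplyr rather than DuckDB to keep the example small (a GROUP BY in DuckDB would give the same shape), and it runs on a tiny made-up data frame instead of real play-by-play data. The column names passer_player_name, game_id, fixed_drive, and fixed_drive_result are my assumption of how nflfastR names them, so check them against your own data.

```r
library(dplyr)

# Toy stand-in for play-by-play rows; every value here is invented
pbp <- data.frame(
  passer_player_name = c("QB A", "QB A", "QB A", "QB B", "QB B", "QB B"),
  game_id            = c("g1", "g1", "g1", "g1", "g1", "g2"),
  fixed_drive        = c(1, 1, 3, 2, 2, 1),
  fixed_drive_result = c("Touchdown", "Touchdown", "Punt",
                         "Punt", "Punt", "Field goal")
)

favorite_qbs <- c("QB A", "QB B")  # hypothetical list of quarterbacks

drive_counts <- pbp %>%
  filter(passer_player_name %in% favorite_qbs) %>%
  # play-by-play has one row per play, so collapse to one row per drive first
  distinct(passer_player_name, game_id, fixed_drive, fixed_drive_result) %>%
  # then count drives for each quarterback/result pair
  count(passer_player_name, fixed_drive_result, name = "n")

print(drive_counts)
```

The result has one row per quarterback and drive result with a count, which is the entity, result, and weight layout an alluvial plot needs.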

2. Load Libraries and Font

You know what’s going on here:

if (!require("readr", quietly = TRUE)) install.packages("readr"); library(readr) # reading the csv
if (!require("dplyr", quietly = TRUE)) install.packages("dplyr"); library(dplyr) # filtering down
if (!require("purrr", quietly = TRUE)) install.packages("purrr"); library(purrr) # loop over files in folder
if (!require("stringr", quietly = TRUE)) install.packages("stringr"); library(stringr) # string parsing
if (!require("ggplot2", quietly = TRUE)) install.packages("ggplot2"); library(ggplot2) # plotting
if (!require("ggalluvial", quietly = TRUE)) install.packages("ggalluvial"); library(ggalluvial) # sankey
if (!require("ggtext", quietly = TRUE)) install.packages("ggtext"); library(ggtext) # markdown
if (!require("showtext", quietly = TRUE)) install.packages("showtext"); library(showtext) # add custom font


font_add_google("Karla", "karla")
showtext_auto()

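Just loading the libraries and the font. Each line installs the package if it’s missing and then loads it; font_add_google() registers the Karla font under the family name "karla", and showtext_auto() turns on showtext rendering so plots can use it.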
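If it helps to see where the font family name ends up, here’s a small sketch that applies it through a ggplot2 theme. The tryCatch fallback to "sans" is my own addition for the case where the Google Fonts download fails (say, if you’re offline), and the mtcars plot is just a placeholder.

```r
library(ggplot2)
library(showtext)

# Register Karla; fall back to the default sans family if the download errors
font_family <- tryCatch({
  font_add_google("Karla", "karla")
  "karla"
}, error = function(e) "sans")
showtext_auto()

# Placeholder plot: base_family applies the font to all theme text
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Title text rendered in the registered font") +
  theme_minimal(base_family = font_family)
```

Text drawn by geoms such as geom_text() doesn’t inherit the theme font, so pass family = font_family there as well if you want labels to match.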

© 2025 Raymond Carpenter