The Spade: A Football Analytics Newsletter

The Spade: A Football Analytics Newsletter

Tutorial: Football Data Engineering Part 4 - Airflow & SQLite

More data engineering in your inbox

Ray Carpenter's avatar
Ray Carpenter
Jun 05, 2026
∙ Paid

Good afternoon everyone and happy Friday,

Sorry for the missing tutorial last week, sometimes life gets busy. I’m here with more info on a) why I am doubling data in the football_data_engineering repo and b) how to use Airflow.

The Spade: A Football Analytics Newsletter is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Why the Data Infrastructure Change?

If you’re tech-adjacent or a tech worker and you’ve been in it for more than a year, you’re familiar with the never-ending supply of new, shiny tools that solve all your business problems. DuckDB is the new, shiny data tool of 2026. On a package import level, it’s great. It reads through .csv files quicker than any other Python query engine I’ve used, and I didn’t uncover a roadblock until a reader informed me that you guys can’t open my .duckdb file because it only allows one reader and one writer (me). In response to this constraint, I’ve returned to ol’ reliable query engine, SQLite. SQLite has been around forever and just gets the job done plain and simple. It can’t handle a billion row CSV using computer memory like DuckDB can, but you folks will be able to open this .sqlite file and that’s all I care about. I’m not running an enterprise here, just giving you football data engineering tutorials. So from now on, I’ll be supporting the .duckdb NFL data file as well as the .sqlite file. I am aware I am creating future tech debt for myself, but it’s all good.

Here’s the code I used to export the .duckdb to SQLite, just a few lines:

import duckdb

  GOLD_TABLES = [
      "jumbo_plays", "jumbo_plays_full", "jumbo_by_team_season",
      "epa_per_dropback", "rushing_player_stats", "rushing_gap_stats",
  ]

  con = duckdb.connect("data/football.duckdb")
  con.execute("ATTACH 'data/football.sqlite' AS sqlite_db (TYPE sqlite)")

  for table in GOLD_TABLES:
      con.execute(f"CREATE OR REPLACE TABLE sqlite_db.{table} AS SELECT *
  FROM {table}")

  con.close()
  print("wrote data/football.sqlite")

DuckDB actually has a nifty ATTACH function for this exact scenario. And here’s a quick little script you can use to fetch data from this:

 import sqlite3
  con = sqlite3.connect("data/football.sqlite")
  print(con.execute(
      "SELECT posteam, count(*) FROM jumbo_plays GROUP BY 1 ORDER BY 2
  DESC LIMIT 5"
  ).fetchall())

Now to our next part, what we’ll be learning about today: Airflow.

What is Airflow?

User's avatar

Continue reading this post for free, courtesy of Ray Carpenter.

Or purchase a paid subscription.
© 2026 Raymond Carpenter · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture