Tutorial: Football Data Engineering Part 4 - Airflow & SQLite
More data engineering in your inbox
Good afternoon everyone and happy Friday,
Sorry for the missing tutorial last week, sometimes life gets busy. I’m here with more info on a) why I am doubling data in the football_data_engineering repo and b) how to use Airflow.
Why the Data Infrastructure Change?
If you’re tech-adjacent or a tech worker and you’ve been in it for more than a year, you’re familiar with the never-ending supply of new, shiny tools that solve all your business problems. DuckDB is the new, shiny data tool of 2026. On a package import level, it’s great. It reads through .csv files quicker than any other Python query engine I’ve used, and I didn’t uncover a roadblock until a reader informed me that you guys can’t open my .duckdb file because it only allows one reader and one writer (me). In response to this constraint, I’ve returned to ol’ reliable query engine, SQLite. SQLite has been around forever and just gets the job done plain and simple. It can’t handle a billion row CSV using computer memory like DuckDB can, but you folks will be able to open this .sqlite file and that’s all I care about. I’m not running an enterprise here, just giving you football data engineering tutorials. So from now on, I’ll be supporting the .duckdb NFL data file as well as the .sqlite file. I am aware I am creating future tech debt for myself, but it’s all good.
Here’s the code I used to export the .duckdb to SQLite, just a few lines:
import duckdb
GOLD_TABLES = [
"jumbo_plays", "jumbo_plays_full", "jumbo_by_team_season",
"epa_per_dropback", "rushing_player_stats", "rushing_gap_stats",
]
con = duckdb.connect("data/football.duckdb")
con.execute("ATTACH 'data/football.sqlite' AS sqlite_db (TYPE sqlite)")
for table in GOLD_TABLES:
con.execute(f"CREATE OR REPLACE TABLE sqlite_db.{table} AS SELECT *
FROM {table}")
con.close()
print("wrote data/football.sqlite")DuckDB actually has a nifty ATTACH function for this exact scenario. And here’s a quick little script you can use to fetch data from this:
import sqlite3
con = sqlite3.connect("data/football.sqlite")
print(con.execute(
"SELECT posteam, count(*) FROM jumbo_plays GROUP BY 1 ORDER BY 2
DESC LIMIT 5"
).fetchall())Now to our next part, what we’ll be learning about today: Airflow.


