I’m not usually one to complain about paywalls, but today I’m going to. The world of sports data has been overrun by middlemen trying to get between you and the data. Whether it’s AWS Next Gen Stats, PFF or, in this case, Data Golf, someone always feels the need to build a useless internet toll booth in the form of a $20 monthly subscription that they’re praying you forget exists so they can post record profits year after year. Where possible, you should fight back against this.
I became enamored with the sport of golf approximately two weeks ago, when I drove to my mom’s house and retrieved from her attic the bag of clubs I was gifted for my 12th birthday. I have officially declared this summer to be the Summer of Golf™, and I am going to play as much golf as possible once I fix this lefty slice. With golf on my mind, I thought it would be fun to create a golf-themed data project for my passionate readers while we wait for football season to return. Much to my dismay, like everything else golf-related in this world, I was met with a paywall.
But, unlike country clubs across the nation, this paywall doesn’t have security guards keeping me out. There’s a nice little backdoor I can use to retrieve the golf data I desire. I’ll leave out all of the dorky details but as always, the code is on my GitHub.
I’ll give you some dorky details. Every website has photos. Let’s say you’re a pro golf website and you need a picture of Tiger Woods to pop up next to his name and his stats from the most recent tour event. To make that happen, the photo gets saved in a non-relational image database with an ID attached. To a computer, the name Tiger Woods is just a string that can be misspelled or shared with someone else, but golfer #08793 is a unique key. Looking up 08793’s stats and headshot is quicker, and the same ID lets separate databases stay in sync so the site can pull everything about a player from one number. If we inspect the HTML of the website, we can see that the golfer ID is used for everything. So all we have to do to get these numbers is inspect the image elements on the page.
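If you’d rather not click through the inspector by hand, here is a minimal sketch of pulling IDs out of image tags with nothing but the standard library. The URL pattern (`headshots_<id>.png`), the use of the `alt` text as the player’s name, and the `example.com` sample HTML are all assumptions for illustration; check the real page and adjust the regex to match what you actually see.

```python
import re
from html.parser import HTMLParser

# Assumption: headshot URLs end in something like "headshots_46046.png".
# The real site's pattern may differ; adjust after inspecting the page.
HEADSHOT_ID = re.compile(r"headshots_(\d+)\.(?:png|jpg|webp)")


class HeadshotParser(HTMLParser):
    """Collects {player name: player ID} from <img> tags."""

    def __init__(self):
        super().__init__()
        self.players = {}

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        match = HEADSHOT_ID.search(attrs.get("src") or "")
        if match:
            # Assumption: the alt text holds the player's name.
            name = attrs.get("alt") or "unknown"
            # Keep the ID as a string so leading zeros survive.
            self.players[name] = match.group(1)


def extract_player_ids(html):
    parser = HeadshotParser()
    parser.feed(html)
    return parser.players


# Made-up sample markup standing in for the real page source.
sample_html = """
<div class="player"><img src="https://example.com/img/headshots_46046.png" alt="Scottie Scheffler"></div>
<div class="player"><img src="https://example.com/img/headshots_08793.png" alt="Tiger Woods"></div>
<img src="https://example.com/img/logo.png" alt="Logo">
"""

print(extract_player_ids(sample_html))
```

The IDs stay as strings on purpose: turn 08793 into an integer and the leading zero disappears, which would break any lookup that expects the original form.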
So what we see here is that Scottie Scheffler is golfer #46046 to my developer brothers in arms over at the PGA Tour. If we scrape a few of these IDs for some famous players, we can compare their stats to each other using Python.
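Here’s a rough sketch of what that comparison can look like. To be loud about it: the endpoint URL, the GraphQL query, and the field names (`playerStats`, `statName`, `value`) are placeholders I made up for illustration, not the site’s real schema, so copy the actual request from your browser’s network tab. The numbers in `demo` are invented too, just so the comparison runs offline.

```python
import json
import urllib.request

# Hypothetical values: replace with whatever the site's network tab shows.
GRAPHQL_URL = "https://example.com/graphql"
QUERY = """
query PlayerStats($playerId: ID!) {
  playerStats(playerId: $playerId) { statName value }
}
"""

# IDs read off the headshot elements, kept as strings.
PLAYER_IDS = {"Scottie Scheffler": "46046", "Tiger Woods": "08793"}


def build_payload(player_id):
    """Standard GraphQL-over-HTTP body: a query plus its variables."""
    return {"query": QUERY, "variables": {"playerId": player_id}}


def fetch_stats(player_id, url=GRAPHQL_URL):
    """POST the query and return {stat name: value}.

    Assumes the response matches the hypothetical query above.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(player_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    rows = body["data"]["playerStats"]
    return {row["statName"]: float(row["value"]) for row in rows}


def compare(stats_by_player, stat_name):
    """Rank players by one stat, highest first; skip players lacking it."""
    ranked = [
        (name, stats[stat_name])
        for name, stats in stats_by_player.items()
        if stat_name in stats
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up numbers so the comparison runs without hitting the network.
demo = {
    "Player A": {"SG: Total": 1.5, "SG: Putting": 0.2},
    "Player B": {"SG: Total": 0.8, "SG: Putting": 0.6},
}
for name, value in compare(demo, "SG: Total"):
    print(f"{name}: {value:+.2f}")
```

Against the real thing you would build the input with something like `{name: fetch_stats(pid) for name, pid in PLAYER_IDS.items()}` and hand that to `compare`. Add a short sleep between requests so you’re not hammering anyone’s server.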
And now you can go hit the driving range and fix your slice while your machine scrapes Strokes Gained data for you using GraphQL. AND you just saved yourself $20 a month.
See you out there on the range!