Interested in analyzing the play-by-play football stats? I pushed some code to GitHub to help you parse the plays and output them to your choice of CSV (for use in a spreadsheet), JSON, or a SQL database. You can check out the code (and a brief README) here. The repository also contains all the source code of my play-by-play effectiveness webapp.
How to Parse Plays to CSV
If you don’t need the data in a SQL database, you can quickly parse the plays to a CSV file (comma-separated values). You could then open this file in a spreadsheet application or read it into the scripting environment of your choice (R, MATLAB, Python, etc.).
I created a specific branch of the repository that will parse the plays without any dependencies on database libraries (i.e. SQLAlchemy). Assuming you’re on Linux and have Git installed, just clone the “nodatabase” branch of the repository to get started like this:
$ git clone --branch=nodatabase https://github.com/10flow/playbyplay.git
Then you can run the parser script:
$ cd playbyplay/play-parser/ $ ./play_parser.py
After a minute or two of processing, you’ll have a “output.csv” ready to play with.
How to Ingest Plays into a Database
If you want to use a SQL database, I’ll assume you have one installed. If not, I’ll refer you here for Ubuntu’s guide to MySQL.
My ingest script requires SQLAlchemy, a popular database library for Python. This is easy to install with the “easy_install” package manager. If you don’t have easy install, you can install it on Ubuntu as follows: “sudo apt-get install python-setuptools”. Next, get SQLAlchemy like this:
$ sudo easy_install SQLAlchemy
Now you’re ready to download the code from GitHub. Grab the master branch if you’re planning on using a database:
$ git clone https://github.com/10flow/playbyplay.git
Before you run the parser script, there’s a few things to do. First, specify your database information in “play_parser.py” on line 28. And next, you’ll need to create the database before you can insert the data. For example, if your database connection string (line 28) looks like this:
database_engine = create_engine('mysql+mysqldb://root:password@localhost/playtest', echo=False)
Then create the database like this:
$ mysql -u root -p mysql> create database playtest; Query OK, 1 row affected (0.06 sec) mysql> exit
Now you’re ready to run the script:
$ cd playbyplay/play-parser/ $ ./play_parser.py
Inserting the plays will take a long time. But once it’s done, you can issue queries this:
$ mysql -u root -p mysql> use playtest; Database changed mysql> select result,count(result) from plays where season=2012 and result!='play' group by result; +--------+---------------+ | result | count(result) | +--------+---------------+ | downs | 209 | | fg | 855 | | fumble | 184 | | half | 185 | | int | 378 | | loss | 61 | | miss | 193 | | ot | 14 | | other | 154 | | punt | 2480 | | td | 1269 | | tie | 2 | | win | 148 | +--------+---------------+ 13 rows in set (0.02 sec)
And now you’re ready to analyze.
Let me know if you come with any interesting stats in time for the big game!