NFL Data Bowl

December 2019 - present

In December of 2019, I submitted a 12-page report for the student subcompetition of the 2020 NFL Data Bowl, a data analytics competition organized by the NFL, open to teams of undergraduate and graduate students. The student competition's problem statement was simple: using any dataset, produce a report with innovative findings regarding the run game in the NFL. That is to say, produce some data analysis that leads to new conclusions about the usage of running backs in football. In late January of 2020, the NFL announced I was selected as one of the six finalists in this competition. Next, I'll be presenting my findings in person at the 2020 Scouting Combine on February 26th. The overall winner will receive four NFL game tickets as well as $1,000 to spend in the NFL gift shop. Below is the official announcement of my selection as a finalist.

My approach in this competition leveraged the newly available dataset of player tracking data for 23,000 run plays alongside a "pitch control" model, initially developed by soccer analytics researchers, to compute each team's control over every point on the field. I then used these field control values to fit a model predicting the outcome of run plays, finding that the most relevant predictor of a successful run play is the presence of a gap at the running back's expected point of intersection with the line of scrimmage. In other words, it matters most that there is space where the running back is already headed, and less that space exists somewhere else that he could adjust his run toward. My complete report detailing my motivations, computational methods, and eventual results is available in full below.
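To give a rough sense of what a field control value is, here is a minimal Python sketch. It is not the model from my report: it assumes each player moves at a constant speed in a straight line, and it compares the earliest arrival time of each side with a logistic curve. The function names, the `sharpness` parameter, and the player positions and speeds are all hypothetical and chosen only for illustration.

```python
import math


def time_to_reach(player, point):
    """Crude arrival time: straight-line distance over an assumed constant speed."""
    (px, py), speed = player
    return math.hypot(point[0] - px, point[1] - py) / speed


def offense_control(offense, defense, point, sharpness=2.0):
    """Offense's share of control at `point`, a value between 0 and 1.

    Illustrative assumption: only the fastest arriver on each side matters,
    and the arrival-time gap is squashed through a logistic function.
    """
    t_off = min(time_to_reach(p, point) for p in offense)
    t_def = min(time_to_reach(p, point) for p in defense)
    return 1.0 / (1.0 + math.exp(-sharpness * (t_def - t_off)))


# Hypothetical players: ((x, y) position in yards, speed in yards per second).
offense = [((10.0, 20.0), 6.0)]
defense = [((14.0, 20.0), 6.0)]

# Evaluate control at three points between the two players.
for point in [(11.0, 20.0), (12.0, 20.0), (13.0, 20.0)]:
    print(point, round(offense_control(offense, defense, point), 3))
```

Points closer to the blocker than to the defender come out above 0.5, the midpoint comes out at exactly 0.5, and points closer to the defender fall below it. A fuller pitch control model would also account for each player's current velocity and reaction time, which this sketch deliberately leaves out.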

My findings have several key implications for the offensive design of football plays, and open avenues for future work. First, my results are in line with what many football researchers have previously concluded: the most critical factor in creating successful run plays is the space created by the offensive line. Additionally, my results call into question the sustainability of the running style of players like Le'Veon Bell, who are known for their impressive ability to stall, change directions, and bide their time until a gap presents itself. With respect to future work, the most obvious avenue for further exploration is an augmented dataset providing tracking data for the entirety of the play. My work inherently adopts a simplification of the running back's motion, treating it as a straight line. Complete tracking data would allow for more realistic modeling of running back motion, and a better understanding of how field control changes as a play progresses.
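The straight-line simplification can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the code behind my report: it takes x to run along the length of the field and y across its width, treats the line of scrimmage as the vertical line x = los_x, and extrapolates the running back's current velocity without any change of direction. The function name and all of the numbers are hypothetical.

```python
def crossing_point(x, y, vx, vy, los_x):
    """Return the y-coordinate where a straight-line path reaches x = los_x.

    Returns None if the runner has no motion along the field (vx == 0)
    or is moving away from the line of scrimmage.
    """
    if vx == 0:
        return None
    t = (los_x - x) / vx  # seconds until the path reaches the line
    if t < 0:
        return None
    return y + vy * t


# Hypothetical running back 4 yards behind the line, drifting slightly sideways.
print(crossing_point(x=20.0, y=26.0, vx=4.0, vy=1.0, los_x=24.0))  # 27.0
```

The returned coordinate is the point at which one would look for a gap under this simplification. With tracking data for the whole play, the extrapolation could be replaced by the running back's actual path.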