First Update on College Football Recruiting Analysis

Any data analysis project begins with collecting and compiling data. Mine is no exception: the goal is to determine the impact of distance from home on college football player performance. Fortunately, the website has a fairly complete database of recruits, including each recruit’s rating, position, hometown, and college. Unfortunately, it presents this information in a format that is very unfriendly to scraping. The major challenge of this first segment was writing a program that pulled those attributes from the website and wrote them to a spreadsheet in a workable format (Microsoft Excel). I did this with the BeautifulSoup module in Python to scrape the data and the xlwt package to write it out to Excel.
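To give a feel for the scraping step, here is a minimal sketch of pulling recruit attributes out of a page with BeautifulSoup. The HTML snippet, the class names, and the `parse_recruits` helper are all made up for illustration; the real site's markup is different and messier, and this is not the exact program I ran.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real recruiting site's HTML is not reproduced here.
SAMPLE_HTML = """
<ul>
  <li class="recruit">
    <span class="name">Example Player</span>
    <span class="position">QB</span>
    <span class="rating">0.9512</span>
    <span class="hometown">Atlanta, GA</span>
    <span class="college">Example University</span>
  </li>
  <li class="recruit">
    <span class="name">Another Player</span>
    <span class="position">WR</span>
    <span class="rating">0.8840</span>
    <span class="hometown">Dallas, TX</span>
    <span class="college">Sample State</span>
  </li>
</ul>
"""

# The attributes mentioned in the post; the class names are assumptions.
FIELDS = ["name", "position", "rating", "hometown", "college"]


def parse_recruits(html):
    """Return one dict per recruit found in the (hypothetical) markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="recruit"):
        row = {}
        for field in FIELDS:
            tag = item.find("span", class_=field)
            # Missing fields become empty strings instead of raising.
            row[field] = tag.get_text(strip=True) if tag else ""
        # Store the rating as a number so it can be analyzed later.
        row["rating"] = float(row["rating"]) if row["rating"] else None
        rows.append(row)
    return rows


recruits = parse_recruits(SAMPLE_HTML)
for recruit in recruits:
    print(recruit)
```

Each dict then maps naturally onto one spreadsheet row, with one column per attribute.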

I also needed to create a spreadsheet with data on all the colleges that compete in NCAA Division I FBS football. This was an easier bout of scraping, since the data sits in a table on Wikipedia. I then used the two spreadsheets together with the geopy package in Python to obtain the distance in miles from each recruit’s hometown to their college.
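For readers curious what a hometown-to-college distance calculation looks like, here is a dependency-free sketch using the haversine (great-circle) formula. This is a stand-in for illustration, not the geopy call I used, and its result can differ slightly from a distance computed on an ellipsoidal Earth model. The function name, the Earth radius constant, and the approximate coordinates below are all assumptions.

```python
import math

# Mean Earth radius in miles (an approximation; the Earth is not a perfect sphere).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Illustrative, approximate coordinates (not taken from my spreadsheets).
hometown = (33.749, -84.388)   # roughly Atlanta, GA
college = (33.951, -83.357)    # roughly Athens, GA

distance = haversine_miles(*hometown, *college)
print(f"Distance from home: {distance:.1f} miles")
```

In the real pipeline the coordinates are not typed in by hand; they have to be looked up from the hometown and college names in the spreadsheets first.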


In essence, I have created my independent variable—distance from home to college. The next step is to create my dependent variable—player performance. Once that is finished, I will analyze the relationship and come to a conclusion!


  1. eclawrence says:

    Nice post, I’m interested to see the results. Distance from home must affect every player differently, so I hope you’re able to find a good trend. If you do find a negative correlation between distance from home and performance, it will be cool to see which colleges are outliers, and whether players adapt after their first year. How are you comparing performance in high school vs. college? Do the stats match up pretty directly, or on average do players’ stats go down?

  2. cmahlbacher says:

    Thanks for your response. There is not a good way to directly compare high school stats and college stats, because high school stats are often very difficult to come by, whereas there are databases devoted to cataloging college football stats. My method for comparing high school performance to college performance is based on recruiting ratings. Four recruiting services (Rivals, ESPN, 247, and Scout) evaluate high school prospects and rate their skills. I am using the composite of these four services as the measure of a high school player’s skill. Previous studies have shown a strong correlation between the caliber of recruits on a college team and that team’s success. Although this seems obvious, it serves as confirmation that recruiting ratings can be used as a metric for high school players’ ability.
