Second Panera post: Fun with Stata!

Hello everyone! It’s been about a month and a half since I last posted on this site; I apologize for my tardiness, but my enrollment in a summer computer science class here at the College in June took up much more of my time than I’d anticipated. Luckily though, I’ve made some decent progress on my project!

I left off in June describing the process of obtaining vote ‘swing’ calculations for every county or local jurisdiction in the United States. As I mentioned in my last blog post (http://freshmanmonroe.blogs.wm.edu/2018/06/09/first-panera-post-summer/), I originally attempted to calculate the vote swing for each county by hand. This was tedious, and after a few hours of mindlessly subtracting vote percentages with my calculator, I decided it was time to teach myself the basics of Stata (woo!).

Stata is an incredibly useful statistical software package that can be used for regression analysis. As I mentioned in my last post, the primary goal of this project is to determine whether there’s a correlation between the number of Panera locations in a county and that county’s vote ‘swing’ from 2012 to 2016. To determine that relationship, I intended to construct a very simple regression model comparing those two variables. However, while Stata can work with various file formats, I wanted to use .csv files exclusively, as that’s the format I’m most familiar with. As such, there were two problems that threatened to impede my research.

First, after an exhaustive search, I couldn’t locate a .csv data file online containing Panera locations by county, as the official Panera Bread website only listed locations by city and state. Second, I still hadn’t calculated the vote swing for every county and consolidated that information into a .csv file. Until I’d accomplished those two tasks, running the regression wouldn’t be feasible.

Tackling the first problem was relatively challenging for someone who’d only been tooling around with Stata for a few hours. First, I compiled a spreadsheet of each city in the United States with at least one Panera Bread location (fun fact: there are 1,477 cities in the country that have at least one Panera, including Williamsburg, Virginia). Then, using Stata, I merged that .csv spreadsheet with another .csv file (which was publicly accessible online) that listed the cities in every county in the United States. The final .csv file that this merge command produced had exactly what I wanted: a numerical value for the number of Paneras in every county in the United States.
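For anyone curious about the logic behind that merge, here’s a rough sketch of the idea. It’s written in Python rather than Stata, so it is not the actual command used for this project. The column names (city, state, county, paneras), the place names, and the sample rows are all made up for illustration, and the sketch assumes each city belongs to exactly one county, which won’t always hold for real data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the real files and their column names are not
# shown in this post, so everything here is invented for illustration.
PANERA_CSV = """city,state,paneras
Alpha,VA,1
Beta,VA,3
Gamma,VA,2
Delta,VA,1
"""

CROSSWALK_CSV = """city,state,county
Alpha,VA,Adams
Beta,VA,Adams
Gamma,VA,Baker
Epsilon,VA,Clark
"""


def paneras_by_county(panera_file, crosswalk_file):
    """Return ({(state, county): Panera count}, [cities with no county match])."""
    city_to_county = {}
    totals = Counter()
    for row in csv.DictReader(crosswalk_file):
        county_key = (row["state"], row["county"])
        # Assumption: each (city, state) pair maps to a single county.
        city_to_county[(row["city"], row["state"])] = county_key
        # Make sure every county shows up, even if it has no Paneras.
        totals.setdefault(county_key, 0)

    unmatched = []
    for row in csv.DictReader(panera_file):
        county_key = city_to_county.get((row["city"], row["state"]))
        if county_key is None:
            # City is missing from the crosswalk; keep it for manual review.
            unmatched.append((row["city"], row["state"]))
            continue
        totals[county_key] += int(row["paneras"])
    return dict(totals), unmatched


counts, unmatched = paneras_by_county(
    io.StringIO(PANERA_CSV), io.StringIO(CROSSWALK_CSV)
)
for (state, county), n in sorted(counts.items()):
    print(state, county, n)
print("unmatched cities:", unmatched)
```

Keeping a list of unmatched cities is a deliberate choice in this sketch: any city that fails to match would otherwise silently drop its Paneras from the county totals.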

Tackling the second problem was easier. I already had .csv data files that contained the vote percentages in every American county for both the 2012 and 2016 elections. All I had to do here was merge the two data sets, then create a new variable for each county by subtracting the 2012 Republican vote share from the 2016 Republican vote share, storing the result in the new data set.
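The swing calculation itself is just one subtraction per county. Here’s a minimal sketch of it, again in Python rather than Stata and again with made-up data: the fips and gop_share column names and the county codes are hypothetical, chosen only to show the arithmetic.

```python
import csv
import io

# Hypothetical sample data: Republican vote share (in percent) by county.
VOTES_2012 = """fips,gop_share
00001,45.5
00002,60.25
00003,38.0
"""

VOTES_2016 = """fips,gop_share
00001,52.0
00002,58.75
00004,41.0
"""


def read_shares(csv_file):
    """Map each county code to its Republican vote share."""
    return {row["fips"]: float(row["gop_share"]) for row in csv.DictReader(csv_file)}


def vote_swing(shares_2012, shares_2016):
    """Swing = 2016 share minus 2012 share, for counties present in both years."""
    both_years = shares_2012.keys() & shares_2016.keys()
    return {fips: shares_2016[fips] - shares_2012[fips] for fips in both_years}


swing = vote_swing(
    read_shares(io.StringIO(VOTES_2012)),
    read_shares(io.StringIO(VOTES_2016)),
)
for fips in sorted(swing):
    print(fips, swing[fips])
```

A positive swing means the county moved toward the Republican candidate between 2012 and 2016, and a negative swing means it moved away. Counties that appear in only one of the two files are left out here rather than given a guessed value.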

So now, moving forward, I have the number of Paneras by county and the vote swing for each county. All that’s required now is actually running the regression, which I intend to do in the coming days once I learn more about the Stata commands that make that happen. To anyone reading, thanks for bearing with my technical ineptitude as I meander through the basics of Stata! I’ll be back with my final post of the summer in the coming weeks.

Comments

  1. mmpincombe says:

    Hello, Ethan! What an incredible fusion of relevant political theories and statistical analysis! Your research topic is fascinating, and the tedious work you have done to compile your data is impressive. It is exciting that you have both discovered a software tool that can make your calculations easier and learned how to use it! Hopefully the tips and tricks built into Stata, along with the tutorials and instructions available online, are plentiful enough to make the next stage of your research go smoothly.
    Given the lack of available data on either vote swings from 2012 to 2016 or Panera locations by county, it seems that Fallon’s theory may not have been developed much before you began your research. Did Fallon present any evidence as to why this demographic may be the ideal target? Also, in the discussion of how to most effectively persuade individuals to swap partisan allegiance in upcoming elections, was there any regression analysis testing other possibilities? I am curious to see if the Panera theory holds up, and I would also be interested in seeing the strength of this theory compared to other prevalent theories (such as the political swing experienced by minority voters and women as you mentioned previously). As you prepare to run your regression analysis, it may also be interesting to look into potential confounding variables that may affect the outcome.
    Fallon’s reference to “Panera voters” could have simply been an analogy for the target demographic, but I am truly interested to see whether you find statistical backing for his theory. Good luck on the final portion of your project!
