# Second Panera post: Fun with Stata!

Hello everyone! It’s been about a month and a half since I last posted on this site; I apologize for my tardiness, but my enrollment in a summer computer science class here at the College in June took up much more of my time than I’d anticipated. Luckily though, I’ve made some decent progress on my project!

I left off in June describing the process of obtaining vote ‘swing’ calculations for every county/local jurisdiction in the United States. As I mentioned in my last blog post (http://freshmanmonroe.blogs.wm.edu/2018/06/09/first-panera-post-summer/), I originally attempted to calculate the vote swing for each county by hand. This was tedious, and after a few hours of mindlessly subtracting vote percentages with my calculator, I decided it was time to teach myself the basics of Stata (woo!).

Stata is an incredibly useful statistical software package that can be used for regression analysis. As I mentioned in my last post, the primary goal of this project is to determine whether there’s a correlation between the number of Panera franchises in a county and that county’s vote ‘swing’ from 2012 to 2016. To determine that relationship, I intended to construct a very simple regression model relating those two variables. However, while Stata can work with various file formats, I wanted to use .csv files exclusively, as I’m most familiar with that format. As such, there were two problems that threatened to impede my research.

First, after an exhaustive search, I couldn’t locate a .csv data file online containing Panera locations by county, as the official Panera Bread website only listed franchises by city and state. Second, I still hadn’t calculated the vote swing for every county and consolidated that information into a .csv file. Until I’d accomplished those two tasks, running the regression wouldn’t be feasible.

Tackling the first problem was relatively challenging for someone who’d only been tooling around with Stata for a few hours. First, I compiled a spreadsheet of each city in the United States with at least one Panera Bread location (fun fact: there are 1,477 cities in the country that have at least one Panera, Williamsburg, Virginia included). Then, using Stata, I merged that .csv spreadsheet with another .csv file (which was publicly accessible online) that listed the cities in every county in the United States. The final .csv file that this merge produced had exactly what I wanted: the number of Paneras in every county in the United States.
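For anyone who wants to see the logic of that merge spelled out, here’s a rough sketch in Python rather than Stata. To be clear, this isn’t the actual command I ran: the column names (`city`, `state`, `county`, `locations`) and the handful of rows below are made up for illustration, and I’m assuming the city spreadsheet carries a count of locations per city. The one real lesson in it is that city names repeat across states, so the match has to be on city and state together.

```python
import csv
import io

# Toy stand-ins for the two .csv files. Column names and values are
# hypothetical, not the real data.
PANERA_CITIES_CSV = """city,state,locations
Williamsburg,VA,1
Richmond,VA,3
Springfield,IL,2
"""

CITY_COUNTY_CSV = """city,state,county
Williamsburg,VA,Williamsburg City
Richmond,VA,Richmond City
Springfield,IL,Sangamon
Springfield,MO,Greene
Chatham,IL,Sangamon
"""


def paneras_by_county(panera_csv, lookup_csv):
    """Return {(county, state): number of Paneras}, including zero-Panera counties."""
    county_of = {}
    counts = {}
    for row in csv.DictReader(io.StringIO(lookup_csv)):
        county = (row["county"], row["state"])
        # Key on (city, state) so Springfield, IL and Springfield, MO stay separate.
        county_of[(row["city"], row["state"])] = county
        counts.setdefault(county, 0)

    for row in csv.DictReader(io.StringIO(panera_csv)):
        county = county_of.get((row["city"], row["state"]))
        if county is not None:  # skip cities missing from the lookup file
            counts[county] += int(row["locations"])
    return counts


panera_counts = paneras_by_county(PANERA_CITIES_CSV, CITY_COUNTY_CSV)
for (county, state), n in sorted(panera_counts.items()):
    print(f"{county}, {state}: {n}")
```

Counties with no Panera are kept with a count of zero on purpose, since they still need to show up in the regression later.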

Tackling the second problem was easier. I already had .csv data files that contained the vote percentages in every American county for both the 2012 and 2016 elections. All I had to do here was merge the two data sets, then create a new variable by subtracting the 2012 Republican vote share from the 2016 Republican vote share and store that information for each county in the new data set.
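The swing calculation itself is just one subtraction per county. Here’s a minimal sketch of the idea, again in Python rather than Stata, assuming each file has a county identifier column (I’ve called it `fips`) and a Republican vote share column (`gop_share`). Those names and the numbers are placeholders, not my real data.

```python
import csv
import io

# Toy stand-ins for the 2012 and 2016 files; identifiers and shares are made up.
VOTES_2012_CSV = """fips,gop_share
00001,52.0
00002,40.5
00003,61.0
"""

VOTES_2016_CSV = """fips,gop_share
00001,58.5
00002,38.0
00004,47.0
"""


def load_shares(text):
    """Read a csv into {county id: Republican vote share in percent}."""
    return {row["fips"]: float(row["gop_share"])
            for row in csv.DictReader(io.StringIO(text))}


def vote_swing(shares_2012, shares_2016):
    """Swing = 2016 share minus 2012 share, for counties present in both years."""
    both_years = shares_2012.keys() & shares_2016.keys()
    return {fips: round(shares_2016[fips] - shares_2012[fips], 2)
            for fips in both_years}


swing = vote_swing(load_shares(VOTES_2012_CSV), load_shares(VOTES_2016_CSV))
for fips in sorted(swing):
    print(fips, swing[fips])
```

A positive number means the county moved toward the Republican candidate between 2012 and 2016, and a negative number means it moved away. Counties that appear in only one of the two files are dropped here, since there’s nothing to subtract.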

So now I have the number of Paneras in each county and the vote swing for each county. All that’s left is actually running the regression, which I intend to do in the coming days once I learn more about the Stata commands that make that happen. To anyone reading, thanks for bearing with my technical ineptitude as I meander through the basics of Stata! I’ll be back with my final post of the summer in the coming weeks.