Final Panera post: My thoughts on the Panera Theory

Hi everyone! It’s time for my last blog post discussing Brian Fallon’s ‘Panera Theory’. Thanks for bearing with me as I meandered through the wonderful world of Stata this summer. I’m tremendously grateful to the Charles Center for its financial support of my research, and I feel as if I’m walking away from the Freshman Monroe grant with a greater appreciation (and understanding) of statistical modeling. I certainly intend to pursue more sophisticated research along these lines in the future, so I’m very pleased I was able to get a little experience with the statistical techniques that work will require. But anyways, let’s talk about Panera!

So I left off in my last blog post talking about the various spreadsheet merging commands I had to complete before I could actually run a linear regression comparing vote swing and Panera Bread locations by county. It was comparatively easy to obtain a vote swing percentage for every county, as all that required was a simple command to create a new variable equal to the GOP vote share in 2016 – GOP vote share in 2012.

As you can see here, calculating the vote swing only required two lines of code (look underneath *calculating vote swing per county).
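
For anyone following along without the screen grab, here is a minimal sketch of that calculation. It is not the exact code from my do-file: the variable names gop2012 and gop2016, the fips codes, and the vote shares are all made-up placeholders.

```stata
* Toy data: placeholder county IDs and invented GOP vote shares (percent)
clear
input long fips float gop2012 float gop2016
99001 36.5 33.0
99002 20.0 17.5
99003 60.0 72.0
end

* calculating vote swing per county (2016 share minus 2012 share)
generate voteswing = gop2016 - gop2012

* A negative value means the 2016 share fell below the 2012 share
list fips gop2012 gop2016 voteswing
```

A positive voteswing means the county moved towards the Republican candidate between the two elections, and a negative one means it moved away.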

In my last blog post, I also briefly discussed the process of obtaining a spreadsheet that contained a list of Panera Bread locations by county. As I believe it may be useful context for the final merging of the Panera/vote swing spreadsheets, I’ll discuss a little more about specifics of that process here:

(First, I merged two spreadsheets. The first, which I created myself, contained a list of Panera Bread locations by city (there are over two thousand in the United States), and the second contained listings of all the cities, counties, and zip codes in the United States. Using the 1:m merge command, I was able to obtain a single spreadsheet that matched each city in the first spreadsheet to a county in the second spreadsheet. This could have posed a problem since city names are not unique. There are several instances of Panera Bread locations in identically named cities (take Portland, Maine and Portland, Oregon for example), but merging the two spreadsheets on a unique county identification number (called ‘fips’) instead of on city name allowed me to bypass this potential issue.)
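
To show why the key matters, here is a rough sketch of a 1:m merge on fips using toy data. It is an illustration under my own assumptions, not a copy of my actual files: the fips codes are placeholders, the location counts are invented, and the tempfile name is hypothetical.

```stata
* "Using" file: several cities can share one county ID (toy data)
clear
input str14 city str2 state long fips
"Portland"       "ME" 99001
"South Portland" "ME" 99001
"Portland"       "OR" 99002
end
tempfile places
save `places'

* "Master" file: one row per county with an invented Panera count
clear
input long fips int location
99001 2
99002 5
end
* isid stops with an error if fips is not unique in the master file
isid fips

* 1:m means one master row may match many rows in the using file
merge 1:m fips using `places'
tabulate _merge
list city state fips location
```

Because the match is on fips, the two Portlands stay in separate counties. Merging on city alone would not work here, since Stata has no way to tell which Portland a row refers to.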

Unfortunately, I soon ran into another problem with this consolidated Panera spreadsheet. Since the merge command only matched up cities in the United States that DID have a Panera location, more than 35,000 cities in the uscities.dta file were left with a ‘null’ (missing) value for the location variable, rather than simply reporting zero Panera Bread locations within city limits. Even though these cities have no Panera Bread locations (how tragic!), omitting them from the regression analysis would skew the results significantly. The Panera Theory’s central argument is that places WITHOUT Panera Bread locations would report a larger swing towards Republican candidates, so dropping those cities and examining only the ones WITH a Panera would leave nothing to compare against.

So I had to tinker around and figure out a way to replace the ‘null’ value with a zero; I’ll put a screen grab of the few lines of code that accomplished that here.

You can see that it took me SEVERAL attempts, but the line that says 35,337 changes made is the correct one!
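
In case the screen grab is hard to read, this is a minimal sketch of the kind of command that does the job, run on three made-up cities. It assumes the count variable is numeric and called location, and it is not necessarily the exact line I ended up with.

```stata
* Toy data: a dot (.) is how Stata shows a missing numeric value
clear
input str12 city int location
"Portland"   3
"Smallville" .
"Nowhere"    .
end

* See how many rows are missing before changing anything
count if missing(location)

* Turn every missing count into an explicit zero
replace location = 0 if missing(location)
list
```

After the replace, Stata reports how many values it changed, which is where a figure like 35,337 would come from on the full file.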

Once that was done, I just had to merge the Panera spreadsheet and the vote records/swing spreadsheet into one spreadsheet; then I could run a regression comparing two variables within that single spreadsheet (phew!). I did this using a 1:m merge, merging the two spreadsheets on the basis of the unique ‘fips’ county identification number that I mentioned earlier. Lo and behold, I had the data set I’d been working towards all along!

Now, all that was left was actually running the regression. The simple linear regression I used follows the format reg [dependent variable] [independent variable]. Since I wanted to examine how the quantity of Panera Bread locations related to vote swing, I simply ran reg voteswing location. I still don’t have any formal experience working with regression tables, so I quickly found a graphical way of portraying the regression’s results.
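
Here is a small sketch of the regression step plus one common way to plot it, a scatterplot with a fitted line. The six data points are invented purely so the example runs, and the axis titles are my own wording; my real graph came from the merged county data.

```stata
* Toy data: invented vote swings (percentage points) and Panera counts
clear
input float voteswing int location
 6.0 0
 4.5 0
 3.0 1
 1.5 2
-1.0 4
-2.5 6
end

* Dependent variable first, then the independent variable
regress voteswing location

* Scatter the points and overlay the fitted regression line
twoway (scatter voteswing location) (lfit voteswing location), ///
    ytitle("Vote swing, 2012 to 2016") xtitle("Panera Bread locations")
```

The coefficient on location in the regression table is the slope of the fitted line, so a negative coefficient shows up in the graph as a line sloping downward.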


So, it appears from this regression that there may be a negative relationship between the vote swing towards the Republican presidential candidate and the quantity of Panera Bread locations in a given county. A negative vote swing value (shown on the y-axis) means that Donald Trump under-performed Mitt Romney in that county, and the trend appears to show that as a county possesses more and more Panera Bread locations (shown on the x-axis), the vote swing for that county becomes more negative.

Obviously, there are many, many caveats to this research. I have no formal training in Stata and this was an experimental exercise, so I will not claim to disprove or prove the Panera theory. Furthermore, I am not attempting to showcase Panera Bread as the sole predictor of electoral performance. There are so many other variables in a given county that play a role in electoral outcomes (from education and occupation to race and income) that claiming Panera supersedes all of them is foolish.

But I do find it interesting that according to my attempted research, Brian Fallon’s theory may hold some weight. Donald Trump seems to have under-performed in counties with a higher prevalence of Panera Bread locations; whether that can be attributed to the franchise is doubtful, but the ramifications for Republicans’ future performance in suburban, affluent, ‘Panera’ districts may be significant.

Anyways, thanks for bearing with me as I tried to work my way through Stata this summer. I had a wonderful time delving into the field of political/statistical analysis and I look forward to similar work in the future once I’ve finished the relevant coursework.


  1. Hi! I enjoyed reading your blog posts about your project; it is such a creative idea. During the summer, I wrote a paper on the populist rises in the Democratic and Republican parties, largely inspired by my curiosity about what types of votes the Democrats actually need going forward, so this project caught my attention quickly.

    I think the results of your research lead to a lot more questions. Since the counties with more Paneras were less favorable toward Trump, Democrats need to win over more Panera customers. Or could we look at it differently? Since the places with more Paneras were less favorable toward Trump, Democrats can’t win over too many more Panera-goers and should focus on other groups more.

    One point about this concept, though: I think the difference between “need” and “can get” is really important. The biggest group of voters that do not align with the Democratic party may be the group the Democrats need the most (as they are a large bloc), but that does not mean the party can actually secure those votes. If winning certain votes is not actually feasible, there is hardly a need to bother with targeting those votes. I think this tugs at the bigger question of social identity and how far from a person’s base group they are willing to go in their vote.

    I would also be curious to see the correlation between Panera customers and voting patterns, as opposed to the number of Panera stores. Obviously this data is harder to get, but I think it could provide a clearer view of the Democratic party’s base and next steps. Just like you said, there are a lot of other variables at play that take away from any sort of causal relationship. Still, I think this project was awesome and thought-provoking, and I can see you worked hard and got a lot out of it. I would be really interested in understanding more of your findings if you ever choose to continue your research.