Update #1: Economic Trends and Relationships in Broadway Theater and Movie Musicals

This is the first update post for my Monroe summer research project. My project is split into three main phases: data collection, calculation of summary and inferential statistics, and final analysis and presentation development. I decided to post my first update now because I am about to finish the data collection phase. As a reminder, for my research this summer, I am analyzing public data on movie musicals and Broadway musicals to see if there is a relationship or correlation between the two. Does one form of entertainment influence the other? Which one has grown more steadily in the past 17 years? Questions like these can be explored through exploratory data analysis.

My first step in the data collection phase was to gather available data from the internet and put it into an Excel spreadsheet. With the data in Excel, I can manipulate it and run various statistical tests. This part alone has taken me many hours to complete. I have finally finished compiling the data into two separate spreadsheets: one for Broadway data (from the Broadway League), and one for movie musicals (from Box Office Mojo). This was a great learning process for me. It took a very long time, unfortunately, to compile the data, but that alone taught me a lot. I now have a much better understanding of how I should go about collecting large datasets in the future.

My research will be based on data collected since the year 2000. I stopped collecting data at the end of the year 2017, so that I am analyzing a complete year of data. So, in the end, the data I compiled for both movie musicals and Broadway was from 2000-2017.

Now that I have collected the data, I will organize it so that it makes sense and remove the unnecessary columns/variables. This is called QC, or Quality Control. In a new Excel spreadsheet, I am going to organize the data in chronological order with a consistent date format that can be easily interpreted in tables and graphs. I decided to format the date as “year-week#”. For example, the week of January 14, 2001 will appear in my dataset as 2001-03 because it is the third week of 2001. The organization of this new spreadsheet will make it easier for me to use Pivot Tables and Pivot Charts in Excel. These tools are extremely helpful in data analysis, and they will help me manipulate the data and change which variables I am analyzing without having to change the dataset itself.
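
For anyone who would rather script this labeling step than do it by hand, here is a minimal Python sketch of the “year-week#” format. The function name `year_week_label` is hypothetical, and the week-numbering rule is an assumption chosen to reproduce the example above: weeks start on Sunday, and the week containing January 1 counts as week 01. Other conventions give different labels (ISO week numbering, for instance, would put January 14, 2001 in week 02), so the rule should be checked against however the source data defines its weeks.

```python
from datetime import date


def year_week_label(d: date) -> str:
    """Return a 'YYYY-WW' label for the week containing d.

    Assumed convention (not confirmed by the post): weeks start on Sunday,
    and the week containing January 1 counts as week 01.
    """
    jan1 = date(d.year, 1, 1)
    # Python's weekday() uses Monday=0 ... Sunday=6; shift so Sunday=0.
    jan1_offset = (jan1.weekday() + 1) % 7
    day_of_year = d.timetuple().tm_yday
    # Count how many Sunday-to-Saturday blocks have started by this date.
    week = (day_of_year + jan1_offset - 1) // 7 + 1
    # Zero-pad the week so labels sort chronologically as plain text.
    return f"{d.year}-{week:02d}"


print(year_week_label(date(2001, 1, 14)))  # 2001-03
```

Because the week number is zero-padded, the labels sort in chronological order even when a spreadsheet treats them as text, which is convenient for Pivot Tables and charts.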

So far I am still in the preparation phase, but it is an important step for my research. The more I familiarize myself with the data, the more easily I will be able to analyze it later. I am excited to start the statistical analysis portion of my project. Once the data is organized in a way that can be understood, I can quickly apply formulas and statistical tests in Excel. My next post will hopefully come after I have found some results, so I look forward to sharing that information later.


  1. klsheridan says:

    Hi, Lauren! This sounds like a really interesting project, and I’m looking forward to seeing your results!

    I was wondering if you could elaborate a little bit on the data you’ve collected – are you looking purely at the box office revenue and ticket sales, or other information?

  2. lgkohout says:

    Hi Kelsey!
    Thank you for your comment and question.
    The original data I collected contained many different values, including growth from prior weeks, % change, budget, % Cap, etc. I will, however, be focusing on two main data points for both Broadway and movie musicals. In the Broadway data, I will be analyzing the total gross for each production as well as the number of people in attendance. For movie musicals, I will be analyzing the total gross for each film and the theater count, which is the number of theaters where the film was being shown.
    These pieces of data will help me look at the relationship between Broadway and movie musicals in terms of revenue, but also in terms of the growth or decline of the number of followers by looking at the attendance and theater count data.

  3. This definitely sounds like a really cool and useful project! I’m interested to see what you find.

    I know from my project that data collection can be a pain and is often harder than all the analysis. Did you just have to copy and paste directly from the websites or did you find a quicker way?

    Also, this might be answered in the next update, but I’m curious about how you plan to compare the markets for Broadway and movies. Do you have any predictions about which might influence the other?

  4. lgkohout says:

    Thank you for your comment. Yes, the data collection was a grueling process, and I am glad it is complete. Unfortunately, I was not able to find a quicker way to compile the data. I know some websites have a download option where you can simply press a download button and then select the type of file you would like to have downloaded onto your computer; however, that was not the case for the data that I used.

    I will start the comparison between Broadway and movies visually, using a line graph that shows the revenue of both industries from 2000 to 2017. This line graph will show where there were peaks and dips in sales. I will focus on the parts of the graph that stand out, then look back through the data and do additional research to try to figure out what was going on at that time. I am at the very start of doing that, so I have yet to see what kind of relationships I may be able to find. I initially predicted that Broadway would influence the movie musicals that are released; however, I have been doing a lot of literature research this past week, and it seems that movies actually are the “influencer”, especially in the past century. Movies not only make a lot more money, but they are also shown in many more theaters to a larger population of people. In addition, more movies are being turned into Broadway musicals than vice versa (for example, Frozen and SpongeBob are now on Broadway). I am going to post my second update soon, which should explain this a little more clearly. I will also try to include the preliminary graphs that I have created.

    Thank you again!