Update #1: Economic Trends and Relationships in Broadway Theater and Movie Musicals

This is the first update post for my Monroe summer research project. My project is split into three main phases: data collection, calculation of summary and inferential statistics, and the final analysis and presentation development. I decided to post my first update now because I am about to finish the data collection portion. As a reminder, for my research this summer I am analyzing public data on movie musicals and Broadway musicals to see if there is a relationship or correlation between the two. Does one form of entertainment influence the other? Which one has grown more steadily over the past 17 years? Questions like these can be explored through exploratory data analysis.

My first step in the data collection phase was to gather available data from the internet and put it into an Excel spreadsheet. With the data in Excel, I can manipulate it and run various statistical tests. This part alone has taken me many hours to complete. I have finally finished compiling the data into two separate spreadsheets: one for Broadway data (from the Broadway League) and one for movie musicals (from Box Office Mojo). This was a great learning process for me. Unfortunately, it took a very long time to compile the data, but that alone taught me a lot. I now have a much better understanding of how I should go about collecting large data sets in the future.

My research will be based on data collected since the year 2000. I stopped collecting data at the end of 2017 so that the final year I analyze is a complete one. In the end, the data I compiled for both movie musicals and Broadway runs from 2000 through 2017.

Now that I have collected the data, I will organize it so that it makes sense and remove the unnecessary columns and variables. This step is called Quality Control (QC). In a new Excel spreadsheet, I am going to organize the data in chronological order with a consistent date format that can be easily interpreted in tables and graphs. I decided to format the date as “year-week#”. For example, the week of January 14, 2001 will appear in my data set as 2001-03, because it is the third week of 2001. Organizing the new spreadsheet this way will make it easier for me to use Pivot Tables and Pivot Charts in Excel. These tools are extremely helpful in data analysis, and they will let me manipulate the data and change which variables I am analyzing without having to change the data set itself.
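For anyone who wants to build the same “year-week#” label outside of Excel, here is a minimal Python sketch. It assumes that weeks start on Sunday and that the week containing January 1 counts as week 1. I picked that convention only because it reproduces the 2001-03 example above; other week-numbering systems (such as ISO weeks) would give a different number. The function name `year_week_label` is just a placeholder for illustration.

```python
from datetime import date


def year_week_label(d: date) -> str:
    """Return a 'YYYY-WW' label for a date.

    Assumption: weeks start on Sunday, and the week that contains
    January 1 is week 1 of that year.
    """
    jan1 = date(d.year, 1, 1)
    # date.weekday() uses Monday=0 ... Sunday=6; shift so Sunday=0.
    jan1_offset = (jan1.weekday() + 1) % 7
    day_of_year = d.timetuple().tm_yday
    week = (day_of_year + jan1_offset - 1) // 7 + 1
    return f"{d.year}-{week:02d}"


print(year_week_label(date(2001, 1, 14)))  # 2001-03
```

Zero-padding the week number keeps the labels in chronological order when they are sorted as plain text, which helps when they are used as row labels in a table or chart.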

So far I am still in the preparation phase, but it is an important step for my research. The more I familiarize myself with the data, the more easily I will be able to analyze it later. I am excited to start the statistical analysis portion of my project. Once the data is organized in a way that can be understood, I can quickly apply formulas and statistical tests in Excel. I plan to write my next post once I have some results to share.


  1. klsheridan says:

    Hi, Lauren! This sounds like a really interesting project, and I’m looking forward to seeing your results!

    I was wondering if you could elaborate a little bit on the data you’ve collected – are you looking purely at the box office revenue and ticket sales, or other information?

  2. lgkohout says:

    Hi Kelsey!
    Thank you for your comment and question.
    The original data I collected contained many different values including growth from prior weeks, % change, budget, % Cap, etc. I, however, will be focusing on two main data points for both Broadway and movie musicals. In the Broadway data, I will be analyzing the total gross for each production as well as the number of people in attendance. For movie musicals, I will be analyzing the total gross for each film and the theater count, which is the number of theaters where the film was being shown.
    These pieces of data will help me look at the relationship between Broadway and movie musicals in terms of revenue, but also in terms of the growth or decline of the number of followers by looking at the attendance and theater count data.
