The taste test: Do local beers come out on top?


By: Thibaut Stoltz, Achraf Berrada, Claire Payoux, Mathis Lettermann

Group: Datalcoholic


Introduction

Did you know that former President Barack Obama was the first president to brew beer in the White House? Beer is deeply ingrained in American culture: the US ranks 15th in beer consumption per capita. With the booming craft beer industry, there are now thousands of local breweries throughout the country producing interesting, unique and flavorful beers. Beer reviews let you learn about the history and backstory of each brewery, as well as the beers they offer and their flavor profiles. By reading reviews, you can discover which beers best suit your tastes and which local breweries are worth a visit.

There is reason to believe that investigating beer reviews will allow us to understand whether Americans are more likely to give better grades to local beers. We will also investigate whether political tendencies affect beer preferences. Our analysis relies on almost 10 million reviews from two worldwide review websites, which we introduce below.

Reviews in the States

Taking a closer look at the dataset, we observe a total of 5277 breweries. However, they do not seem to be the largest ones in the US. For instance, Molson Coors was the 2nd largest brewery in the US in 2021, yet only 0.26% of the reviews concern beers from its breweries. The dataset seems to gather mostly reviews of craft beers and will not necessarily be representative of all beer drinkers. It is always important to know the bias you are working with.

---

Research questions

Do American reviewers give better grades to local beers?
Are some states more represented than others in terms of number of reviews?
Which part of the grading system is the most important?
Are reviews more explicit when people liked the beer?

How are political tendencies distributed across the USA?
Can we observe a correlation between beer reviews and political tendencies?
Is there a trend over time?

Where do the reviews come from?

To better understand the data we are working with, we first introduce the data sources. The reviews come from two websites, BeerAdvocate and RateBeer, which are two of the biggest beer review websites in the world.

BeerAdvocate and RateBeer provide valuable insights into the world of beer reviews. The datasets contain beer ratings, reviews, and other information about beers from users of the sites. This includes the type of beer, its style, its region of origin, its ABV, and its ratings from users. The datasets also contain information about the user who wrote the review, such as their location, age, and gender. They will allow us to analyse how beer reviews influence the perception of a beer. For example, comparing ratings and reviews of a particular beer can show whether users in different regions have different opinions about the same beer. The datasets can also be used to examine how factors such as age, gender, and location affect user opinion, and to determine which types of beer are popular in different regions.

Pre-processing

For both datasets, we have geographical information about the country of users and breweries. For US-based locations, we also have the name of the state. We extracted this information and created an extra column, for both breweries and users, containing the US postal abbreviation of each state. We then finished processing the locations by dropping the state name from the location column. A quick look at the distribution of user locations shows that some US territories have only 1 review each. We therefore removed them from the dataframe, since they are not useful for our analysis. We end up with 2 dataframes named BA_US and RB_US. We also converted the column to a datetime object.
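The location processing described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes location strings of the form "United States, <State>", and the `STATE_ABBREVIATIONS` table, the `split_location` helper and the column names are hypothetical.

```python
from typing import Optional, Tuple

import pandas as pd

# Hypothetical lookup table; only a few states are listed for brevity.
STATE_ABBREVIATIONS = {
    "California": "CA",
    "New York": "NY",
    "Massachusetts": "MA",
}


def split_location(location: str) -> Tuple[str, Optional[str]]:
    """Return (country, state abbreviation or None) for one location string."""
    country, _, state = location.partition(", ")
    return country, STATE_ABBREVIATIONS.get(state)


# Toy data; the "United States, <State>" format is an assumption.
users = pd.DataFrame(
    {"location": ["United States, California", "Belgium", "United States, New York"]}
)

parsed = [split_location(loc) for loc in users["location"]]
users["location"] = [country for country, _ in parsed]  # state name dropped
users["state"] = [state for _, state in parsed]         # postal abbreviation

print(users)
```

The same helper can be applied to the brewery locations, so that users and breweries end up with comparable state columns.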

After our processing, we wanted to know the proportion of NaN values for the and columns in the dataframes. If we look at how much these NaN values are represented in the dataframes, we can see that for RB_US the NaN values represent 0.005% of the data and for BA_US they represent 0.6% of the data. Thus we can drop these rows without losing too much information. As our story concerns US reviews, we decided to get rid of the reviews made by users not living in the US.
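A check of this kind might look like the sketch below. The dataframe and the column names (`user_state`, `brewery_state`) are placeholders chosen for illustration and are not taken from the original notebooks.

```python
import pandas as pd

# Toy data with placeholder column names.
reviews = pd.DataFrame(
    {
        "user_state": ["CA", None, "NY", "MA"],
        "brewery_state": ["CA", "TX", None, "MA"],
        "rating": [4.0, 3.5, 4.5, 3.0],
    }
)

required = ["user_state", "brewery_state"]

# Share of rows with a missing value in at least one required column.
missing_share = reviews[required].isna().any(axis=1).mean()
print(f"{missing_share:.1%} of rows have a missing value")

# Drop those rows once the share is judged small enough.
cleaned = reviews.dropna(subset=required)
```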

Then comes the time to merge the dataframes. The rating distributions differ between the two websites, so we first needed to normalize the data. Following the same procedure as the Lederrey-West paper, and thus assuming that the inherent quality of the beers being rated stays roughly constant, we performed a z-score normalization of the ratings. Merging the BA_US and RB_US dataframes with pandas' concat() is then trivial. From now on we work with a dataframe called df_ratings.
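A minimal sketch of this normalize-then-merge step is shown below. It assumes the z-score is computed once per website; the statistics could equally be computed per website and per year. The toy ratings and the `rating_z` and `source` column names are illustrative assumptions.

```python
import pandas as pd


def zscore(ratings: pd.Series) -> pd.Series:
    """Centre ratings on their mean and scale them by their standard deviation."""
    return (ratings - ratings.mean()) / ratings.std(ddof=0)


# Toy ratings on two different scales, standing in for the two websites.
BA_US = pd.DataFrame({"rating": [3.0, 4.0, 5.0]})
RB_US = pd.DataFrame({"rating": [10.0, 14.0, 18.0]})

for name, frame in (("BeerAdvocate", BA_US), ("RateBeer", RB_US)):
    frame["rating_z"] = zscore(frame["rating"])  # normalised within each site
    frame["source"] = name                       # keep track of the origin

df_ratings = pd.concat([BA_US, RB_US], ignore_index=True)
print(df_ratings)
```

After normalization, both sites have mean 0 and unit variance, so the ratings can be compared on a common scale.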

Numbers about the data

Find out meaningful numbers about the reviews

Number of reviews about US beers

Number of US users that made reviews

Number of 5 stars beers in the US

Extract of the reviews

Pours a hazy dark brown with huge creamy white head that fades slowly and leaves ample walls of creamy lace. The aroma is of banana bread with a touch of cocoa. The flavor follows the nose, but lacks on banana. Clove comes out with a touch of cinnamon and cocoa. The feel is creamy and smooth.Delicious and enjoyable. Easy drinking.

A spicy heffe we have here. Cloudy dark straw colored beer with thin white head that retains constantly throughout the drinking experience. It leaves nice creamy walls. Beautiful for the style.

My first dunkel weizen. I was shocked when I realized recently that there was a an entire style of beer available in Alabama that I had never tried. So, I set out to remedy that oversight and I bought this example from Tucher yesterday at Bruno's and enjoyed it last night.

Most reviewed breweries of the USA

Rogue Ales

Miller Brewing Company

Anheuser-Busch InBev

Brooklyn Brewery

Boston Beer Company (Samuel Adams)

Amongst the top 5 breweries present in the dataframe, we find some well-known companies such as Anheuser-Busch InBev, but also craft breweries such as Rogue Ales and Brooklyn Brewery.

Some more data about the number of reviews related to the location

Comparing the number of reviews made by users from a state with the number of reviews made about beers from that state, we see that the top 10 of both distributions include California, Washington, Pennsylvania, New York and Massachusetts. Some of them are not surprising, as they are among the most populous states. Nonetheless, these are valuable insights that strengthen our research question about a possible link between the location of the user and the location of the beer reviewed.

Foreign or Local ?

To answer our research question (do Americans give better grades to local beers?), let's perform some tests. As a first step, we divide the dataset in 2: one part composed of American reviews of American beers and another of American reviews of foreign beers. Our strategy is to compare the 2 datasets by applying different statistical tests to them. The null hypothesis of these tests is that both samples come from the same probability distribution. If the p-value is less than 0.05, we reject it: the 2 samples do not come from the same distribution, and there is therefore an effect, namely that Americans grade American beers differently from foreign beers.

Here we plotted the number of reviews per grade for each beer aspect for both datasets:
  • Blue: Foreign beers
  • Red: US beers

The distributions look identical. However, one or more statistical tests are necessary to check whether they come from the same probability distribution. The distributions are also approximately Gaussian, so we can use a t-test. More precisely, we use 2 different statistical tests (t-test and Wilcoxon test) and compute their p-values. If a p-value is lower than 0.05, the null hypothesis (the 2 samples come from the same distribution) is rejected, and we conclude that there is a general effect: American beers are rated differently by Americans.

    The Wilcoxon test is a non-parametric statistical test. In its signed-rank form, it compares the medians of two related or paired samples: it computes the difference within each pair of observations, ranks the absolute values of these differences, and derives the test statistic from the sum of the ranks of the positive differences. For two independent samples, the counterpart is the Wilcoxon rank-sum (Mann-Whitney U) test.
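As a rough illustration of the two tests, the sketch below runs them on synthetic data with SciPy. It is not the authors' code: the two groups are treated as independent samples of unequal size, so the rank-sum variant (`scipy.stats.ranksums`) is used, and the t-test is Welch's version (`equal_var=False`). Both choices are assumptions, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the normalised ratings of each group.
rng = np.random.default_rng(0)
us_ratings = rng.normal(loc=0.0, scale=1.0, size=500)       # American beers
foreign_ratings = rng.normal(loc=0.0, scale=1.0, size=400)  # foreign beers

# Welch's t-test: compares the means without assuming equal variances.
t_stat, t_pvalue = stats.ttest_ind(us_ratings, foreign_ratings, equal_var=False)

# Wilcoxon rank-sum test: non-parametric comparison of two independent samples.
w_stat, w_pvalue = stats.ranksums(us_ratings, foreign_ratings)

alpha = 0.05
for name, p in (("t-test", t_pvalue), ("Wilcoxon rank-sum", w_pvalue)):
    verdict = "reject" if p < alpha else "cannot reject"
    print(f"{name}: p-value = {p:.4f} -> {verdict} the null hypothesis")
```

In practice the same two calls would be repeated for each beer aspect (aroma, appearance, palate, taste, overall, rating).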

    T-test

    p-value aroma 0.265992
    p-value appearance 0.344700
    p-value palate 0.417915
    p-value taste 0.394902
    p-value overall 0.437345
    p-value rating 0.358212

    Wilcoxon test

    p-value aroma 0.227786
    p-value appearance 0.400466
    p-value palate 0.499953
    p-value taste 0.366176
    p-value overall 0.469101
    p-value rating 0.404500
    Whatever the test and the beer aspect, the p-value is greater than 0.05. We cannot reject the null hypothesis, so these two samples could come from the same probability distribution. We find no evidence that Americans give better ratings to their local beers.

    Sentiment analysis

    First of all, we investigate whether the reviews written by users are positive or negative and whether they match the grade given by the user (Rating). Sentiment analysis uses Natural Language Processing, text analysis and other algorithms to identify, extract, quantify and study the polarity of subjective information in a given text (e.g. a positive or negative opinion). Performing a sentiment analysis on beer reviews will show us whether the reviews tend to be positive or negative. We use the TextBlob library, which is built on the Natural Language Toolkit (NLTK). We chose it because it is simple, easy to deploy, light on resources, and usable even for small applications. When a sentence is passed to TextBlob, it returns two outputs: polarity and subjectivity. Polarity lies in [-1, 1], where -1 indicates negative sentiment and +1 positive sentiment. Subjectivity lies in [0, 1], where 0 is very objective and 1 very subjective, and reflects how much the text expresses personal opinions and judgments.

    This too is a great beer! Around Christmas time they release a 3 Litre Bottle! Once bottle is finished you can still bring it into any Stone Brewery and get it refilled!
    This particular review had a sentiment score of 1.0. This is expected, as it uses the words "great" and "Christmas", which are often used in a positive way. The use of "!" also shows that the reviewer is excited about the beer. We can also notice that the quantity of beer is mentioned positively, as the reviewer is excited about being able to get the bottle refilled. This is consistent with the grade they gave to the beer: 5 stars. Let's investigate the relation between the sentiment score and the rating, to see if there is a correlation.
    In this figure, we observe that the sentiment scores are concentrated around the median, with the maximum and the minimum quite close to it. Rating scores, however, are more spread out around the median, with outliers on both sides. From observation alone we cannot conclude that there is a correlation between the sentiment score and the rating. We will use a regression analysis, as well as statistical tests such as Pearson's correlation coefficient and Spearman's rank correlation coefficient, to see if there is one.

    Linear Regression

    For this linear regression, we decided to plot the sentiment score, calculated previously, against the rating. We used the statsmodels library to perform the regression analysis. The results are shown below for 6 different beers having at least 500 reviews (an arbitrary number chosen as the minimum significant number of reviews). In the first column we chose 3 beers amongst the top 10 most reviewed beers, while in the second column we chose 3 beers with far fewer reviews and coming from different countries.

    Sentiment score vs Rating

    The R-squared values are really low, which means that the sentiment score explains very little of the variation in the rating. The more reviews a beer has, the worse the regression model fits the data.

    Statistical tests

    Using the pearsonr function from the scipy.stats library, we can calculate the Pearson's correlation coefficient and the p-value for the sentiment score and the rating.

    p-value 0.000000
    Pearson's coefficient 0.352315


    The p-value is very low, meaning that the correlation between the sentiment score and the rating is statistically significant. This is nuanced by Pearson's coefficient, which is positive and close to 0.35: the correlation is moderate. Note that Pearson's correlation test assumes a linear relationship between the variables and normally distributed data, which is not the case here.

    The Spearman correlation only assumes that the data are at least ordinal (we can tell without doubt whether a value is lower or higher than another). It tests whether the variables are monotonically related (one variable consistently increases, or consistently decreases, as the other increases).

    Spearman's coefficient 0.334857


    We obtained a Spearman coefficient of 0.33, which is close to Pearson's coefficient. It represents a medium association between the sentiment score and the rating. A positive coefficient indicates that as the rating increases, the sentiment score also tends to increase, even if the effect is moderate.
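Both coefficients can be computed with `scipy.stats`, as in the sketch below. The data are synthetic and only illustrate the calls; the variable names are hypothetical. The last line uses the fact that, for a simple linear regression with one predictor, R-squared equals the square of Pearson's coefficient, which is one way to see why a moderate correlation goes together with a low R-squared.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data: sentiment weakly follows the rating, plus noise.
rng = np.random.default_rng(42)
rating = rng.uniform(1.0, 5.0, size=1000)
noise = rng.normal(0.0, 0.2, size=1000)
sentiment = np.clip(0.1 * (rating - 3.0) + noise, -1.0, 1.0)

# Pearson: strength of the linear relationship.
pearson_r, pearson_p = pearsonr(sentiment, rating)

# Spearman: strength of the monotonic relationship, computed on ranks.
spearman_rho, spearman_p = spearmanr(sentiment, rating)

# R-squared of a one-predictor linear regression.
r_squared = pearson_r ** 2

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3g})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3g})")
print(f"R-squared of the simple regression = {r_squared:.3f}")
```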

    And what about local beers ?

    To validate (or invalidate) our hypothesis that users from one state prefer beers brewed in the same state, let's perform the same linear regression as before. Since we expect users to be more positive in their reviews of local beers, we check whether the sentiment score is more strongly related to the rating.

    Sentiment score vs Rating

    The first remark we can make is that the total number of reviews is much lower than in the previous regression. This is because we only kept the reviews from users living in the same state as the brewery. The second remark is that the R-squared values are still very low, meaning that the linear regressions do not fit the data.

    Performing the same two tests as before on every review made by a user on a local beer (relative to their state), we obtain correlation coefficients in the same range as before. The p-value of Pearson's correlation test is still very low, meaning that the correlation between the sentiment score and the rating is statistically significant.

    p-value 0.000000
    Pearson's coefficient 0.367722
    Spearman's coefficient 0.351841


    Beer Aspects

    Reviewers are asked to fill in 5 different grades: 'Aroma', 'Taste', 'Palate', 'Appearance' and 'Overall'. The rating is then computed as a weighted average of these 5 aspects. In this section, we investigate which of 'Aroma', 'Taste', 'Palate' and 'Appearance' matters most for the overall score. This will give us insight into which aspects reviewers tend to value in beers. To do so, we use 2 different methods so that we can cross-check our findings: the first is a Principal Component Analysis (PCA), the second a multilinear regression.

    Principal Component Analysis

    For the PCA, we use the 4 different aspects as features and 'overall' as the target. We use the PCA class from the sklearn.decomposition module with 2 components. After fitting the PCA to the features, we extract the explained variance ratios, which gives the following vector:
    0.73276579, 0.12129502
    This means the first principal component explains 73.3% of the variance and the second 12.1%. We finish our PCA analysis by plotting the principal components 1 and 2.
    It seems that taste is the most important aspect and appearance the least important. There is almost no difference between aroma and palate, so we cannot conclude much about which of the two is more important.
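The PCA step can be sketched with scikit-learn as follows. The data are synthetic (a single latent "quality" factor with made-up loadings), so the printed numbers will not match the values reported above; only the calls and attributes (`PCA`, `explained_variance_ratio_`, `components_`) mirror the described procedure.

```python
import numpy as np
from sklearn.decomposition import PCA

aspects = ["appearance", "aroma", "taste", "palate"]

# Synthetic grades: one latent quality factor drives all four aspects.
rng = np.random.default_rng(0)
quality = rng.normal(size=(300, 1))
loadings = np.array([[0.4, 0.8, 1.0, 0.8]])  # made-up values
X = quality * loadings + rng.normal(scale=0.5, size=(300, 4))

pca = PCA(n_components=2)
components = pca.fit_transform(X)  # projection on the 2 principal components

# Share of the total variance explained by each component.
print(pca.explained_variance_ratio_)

# Contribution of each aspect to the first principal component.
for name, weight in zip(aspects, pca.components_[0]):
    print(f"{name}: {weight:.3f}")
```

Reading the loadings of the first component is one way to compare how much each aspect contributes to the main axis of variation.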

    Multilinear Regression

    We perform a multilinear regression using sklearn’s LinearRegression class. We fit the regression to the data and, to evaluate its quality, compute the coefficient of determination: 0.75, which is considered a good value, so the model should be reasonable. We then retrieve the regression coefficients and print the weights we obtain:
    0.04819957 0.14019966 0.58786571 0.17108195
    These correspond respectively to 'appearance', 'aroma', 'taste' and 'palate'. As expected, taste is the most important, followed by palate and aroma, with appearance last. Because the weights of palate and aroma are close, it would be risky to state that one is more important than the other. To push this a little further, we computed a heat map of the correlation between the different variables.
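The regression step can be sketched in the same way. The data below are synthetic and the "true" weights are made up so that the fit has something to recover; they are not the weights reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

aspects = ["appearance", "aroma", "taste", "palate"]

# Synthetic grades between 1 and 5 for each aspect.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(500, 4))

# Made-up weights used to generate the 'overall' grade, plus some noise.
true_weights = np.array([0.05, 0.15, 0.60, 0.20])
y = X @ true_weights + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)

# Coefficient of determination (R-squared) on the fitted data.
r_squared = model.score(X, y)

# One regression weight per aspect, in the same order as the features.
weights = dict(zip(aspects, model.coef_))

print(f"R-squared = {r_squared:.3f}")
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Because all aspects are graded on the same scale, the coefficients can be compared directly; with features on different scales, they would need to be standardized first.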
    As we can see, taste is still the most important aspect and appearance the least. Aroma and palate have very similar scores, with aroma slightly ahead this time. With this in mind, the only conclusion we can draw is that taste is the most important aspect and appearance the least important.
    Our follow-up question is whether there are notable differences between beers with a strong taste and beers with a mild taste. To find out, we chose four beer styles: Pale Lager and American Adjunct Lager as mild-tasting beers, and Stout and India Pale Ale (IPA) as strong-tasting beers. We then inspect their scores and plot a heat map of the correlation matrix for each beer style.
    We see that the two strong beer styles get better grades than the two styles with a lighter taste. We also see that the correlation matrices are quite similar to our previous one: taste is the most important aspect and appearance the least. Taste tends to be slightly more important for the two lighter beer styles.
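The per-style correlation matrices behind such heat maps can be obtained with a pandas `groupby`, as in this sketch. The data are synthetic, only two aspects and two styles are included for brevity, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

aspects = ["taste", "appearance"]

# Synthetic reviews for two styles; 'overall' mostly follows 'taste'.
rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame(
    {
        "style": rng.choice(["Stout", "Pale Lager"], size=n),
        "taste": rng.uniform(1.0, 5.0, size=n),
        "appearance": rng.uniform(1.0, 5.0, size=n),
    }
)
df["overall"] = (
    0.8 * df["taste"] + 0.1 * df["appearance"] + rng.normal(0.0, 0.3, size=n)
)

# Correlation of each aspect with 'overall', computed separately per style.
corr_by_style = {}
for style, group in df.groupby("style"):
    corr = group[aspects + ["overall"]].corr()
    corr_by_style[style] = corr.loc[aspects, "overall"]
    print(style)
    print(corr_by_style[style].round(2))
```

Each full `corr` matrix is what would be passed to a heat-map plotting function.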

    One more analysis of these four beer styles consists of plotting the number of reviews and the mean rating of each style. The first map shows where the reviews for each beer style come from, and the second shows the average rating that users in each state give to these styles. These maps could be interpreted in different ways, but there does not seem to be any reliable information to extract from them. Our initial objective was to investigate a potential link between the reviews and the political tendencies of the reviewers. Even if these maps do not point to such a link, we will branch out a bit further and see if we find anything interesting.

    Political Tendencies

    Here, we try to discover whether there is a pattern in beer reviews based on the political tendencies of American reviewers. We use a dataset scraped from Wikipedia containing statistics on the US presidential elections from 2000 to 2016: the number of votes for each candidate in each state. More precisely, we only use the statistics for the Democratic and Republican parties, because together they account for more than 95% of the votes in each state for each election. We then want to see whether there is a pattern in preferring local beers (brewed in the same state as the reviewer, or in the US) to foreign beers (brewed in another US state, or in another country).

    We obviously don't have direct access to the political inclination of the reviewers, but we know the percentage of the population that voted for one candidate or the other in each election. We can use this information to estimate the political inclination of the reviewers based on their geographical location. If there is a pattern in the preference for local beers, we should be able to see it in the states where one party or the other gets a higher percentage of votes. We first need to know the percentage of votes for each party in each state.

    We find 6 states where the mean percentage of votes for the Republican party over the 5 elections is higher than 60%: Wyoming, Idaho, Oklahoma, Utah, Alabama and Nebraska. There are almost no reviews of beers brewed in these states, so we cannot make an analysis at this level of locality. However, the numbers are quite balanced when we compare reviews from those states of beers brewed in the US and of beers brewed in the rest of the world.
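Selecting the strongly partisan states amounts to averaging the vote share per state and applying the 60% threshold. The sketch below uses made-up numbers and placeholder state names; the `republican_pct` column name is an assumption.

```python
import pandas as pd

# Made-up vote shares for two placeholder states over two elections.
votes = pd.DataFrame(
    {
        "state": ["State A", "State A", "State B", "State B"],
        "year": [2012, 2016, 2012, 2016],
        "republican_pct": [65.0, 68.0, 45.0, 48.0],
    }
)

# Mean share per state over all elections, then the 60% threshold.
mean_share = votes.groupby("state")["republican_pct"].mean()
strongly_republican = mean_share[mean_share > 60].index.tolist()

print(mean_share)
print(strongly_republican)
```

The same two lines, applied to the Democratic vote share, give the strongly Democratic states.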

    We already removed the single review from Washington DC during the pre-processing part. We still find 5 states where the mean percentage of votes for the Democratic party over the 5 elections is higher than 60%: Hawaii, Massachusetts, New York, Rhode Island and Vermont. Now we will try to find out whether there is a pattern in beer reviews based on the political tendencies of American reviewers. For this, we simply separate the reviews into 2 dataframes for each party: one with the reviews of local beers and one with the reviews of foreign beers. We then perform a linear regression to see whether there is a correlation between the rating of local vs foreign beers and the political tendency of the reviewers. Now that all the data is gathered, the linear regression can be performed for the Red states (Republican).

  • WY p-value: 0.05908908717717479 confidence interval: [-0.13913697 0.00050374]
  • OK p-value: 0.7247659726982489 confidence interval: [-0.07528982 -0.01418895]
  • ID p-value: 0.951618871006537 confidence interval: [-0.07811525 -0.03316912]
  • UT p-value: 0.6231749171668464 confidence interval: [0.08216246 0.16026653]
  • NE p-value: 0.2111609862100128 confidence interval: [-0.14643203 -0.0833191 ]
  • AL p-value: 0.037036910800166245 confidence interval: [-0.08008461 -0.03260173]
    The p-values are very high for almost all states but AL. For AL there could be a pattern: the confidence interval of the coefficient is entirely negative, so beer ratings in Alabama are likely to be a bit higher for local beers (brewed in the US) than for foreign beers (brewed in another country). However, the p-value is still high (p > 0.05) for all the other strongly Republican states, so we cannot say that there is a pattern for Republican states in general based on this analysis. Moreover, there could be unknown confounders, such as the price or the type of beer, that explain the pattern we see in Alabama.


  • HI p-value: 0.03618986125294816 confidence interval: [-0.03887986 0.09044383]
  • MA p-value: 9.713442706946788e-07 confidence interval: [0.0613177 0.0834096]
  • NY p-value: 7.311745691626443e-06 confidence interval: [0.00766482 0.02497248]
  • RI p-value: 0.004618993287231016 confidence interval: [0.04654619 0.11169225]
  • VT p-value: 0.3485235272459348 confidence interval: [0.01631276 0.07483121]
    Here we see that 2 of the strongly Democratic states (Massachusetts and New York) have p-values far below 0.05. This means that there could be a pattern in beer reviews based on the political tendencies of American reviewers. The coefficient is positive for both states, which means that this time the ratings of foreign beers are higher than the ratings of local beers. One possible interpretation is that Democratic-leaning voters may be more open to foreign cultures than more conservative Republican-leaning voters, which could explain the pattern. But there could also be other biases, such as the price of the beer, the type of beer or other geographical factors, that explain the pattern we see in those 2 states. Moreover, the evidence is weaker in the other strongly Democratic states (Hawaii, Rhode Island and Vermont), so we cannot say that there is a pattern for Democratic states in general based on this analysis.
    In the end, based on our preliminary results and the p-values, there does not seem to be a pattern between higher local ratings and political tendencies in the US.
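For one state, a regression of this kind can be sketched with `scipy.stats.linregress`, which returns the slope, its p-value and its standard error. The data are synthetic and the encoding is an assumption: the predictor is an indicator equal to 1 for foreign beers and 0 for local ones, so a positive slope means foreign beers are rated higher. The 95% level of the confidence interval is also an assumption.

```python
import numpy as np
from scipy import stats

# Synthetic reviews from one state: 1 = foreign beer, 0 = local beer.
rng = np.random.default_rng(3)
is_foreign = rng.integers(0, 2, size=200)
rating = 0.05 * is_foreign + rng.normal(0.0, 1.0, size=200)

# Simple linear regression of the rating on the foreign/local indicator.
fit = stats.linregress(is_foreign, rating)

# 95% confidence interval of the slope from the Student t distribution.
t_crit = stats.t.ppf(0.975, df=len(rating) - 2)
ci_low = fit.slope - t_crit * fit.stderr
ci_high = fit.slope + t_crit * fit.stderr

print(f"slope = {fit.slope:.4f}, p-value = {fit.pvalue:.4f}")
print(f"95% confidence interval: [{ci_low:.4f}, {ci_high:.4f}]")
```

With this encoding, the slope is simply the difference between the mean rating of foreign beers and the mean rating of local beers in that state.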

    Limitations and Future work

    Overall, we were not able to explore everything we initially set out to do. Here are some of the limitations that kept us from doing more in the time we had.

  • Consumption: one of the main limitations of our research is that the dataset contains only reviews, with no consumption information. We searched for external datasets on this but were not able to find a reliable one.
  • Time: we could not do many of the things we wanted to because of limited time. Part of the reason is that we had not done much research before the 2nd Milestone and had to make up for it later.

    Conclusion

    In this study, we analyzed 2 large datasets of beer reviews from two well-known websites: RateBeer and BeerAdvocate. Our aim was to discover patterns between the geographical origin of the reviewers and that of the beers. We had to preprocess the data a bit, especially to adapt the ratings coming from the 2 websites so that we could use them together in our analysis. As we wanted to discover a correlation between the location of a user and the location of the beer they rated, we performed statistical tests on the "local reviews" and the "foreign reviews". We performed sentiment analysis on the data and found that, as we could expect, a good sentiment score is correlated with a good rating. Looking more closely at local beers, the correlation between rating and sentiment score is not higher for local beers than for all beers. Next, we investigated the importance of the different beer aspects using PCA and multilinear regression. We found that taste is, in general, the most important aspect of a beer, whereas appearance is the least important by far. Finally, we wanted to see whether we could find patterns between the political inclination of reviewers and a preference for local or foreign beers. We found no clear evidence of such patterns.

    Even if our hypotheses seemed plausible, our analysis does not lead to a clear conclusion. Next time let’s just go grab a beer together instead of crushing our heads on such datasets ;)