Introduction
Did you know that former President Barack Obama was the first president ever to brew a beer in the White House?
Beer is deeply ingrained in American culture: the US ranks 15th in beer consumption per capita.
With the booming craft beer industry, there are now thousands of local breweries throughout the country producing interesting, unique and flavorful beers.
With beer reviews, you can learn about the history and backstory of each brewery, as well as the various beers they offer and their flavor profiles.
By reading reviews, you can discover which beers are best suited to your particular tastes and which local breweries are worth a visit.
Investigating beer reviews should therefore let us test whether Americans are more likely to give better grades to local beers.
We will also investigate whether political tendencies affect beer preferences. Our analysis relies on almost 10 million reviews from two worldwide review sites, which we introduce below.
Reviews in the States
Taking a closer look at the dataset, we observe a total of 5277 breweries. However, they do not seem to be the largest ones in the US. For instance, the 2nd largest brewery in the US in 2021 was Molson Coors, yet only 0.26% of the reviews concern beers coming out of its breweries. The dataset seems to gather more reviews about craft beers and won’t necessarily be representative of all beer drinkers. It's always important to know the bias you are working with.
Research questions
Do American reviewers give better grades to local beers?
Are some states more represented than others in terms of number of reviews?
Which part of the grading system is the most important?
Are reviews more explicit when people liked the beer?
How are political tendencies distributed across the USA?
Can we observe a correlation between beer reviews and political tendencies?
Is there a trend over time?
Where do the reviews come from?
To better understand the data we are working with, we first introduce the data sources. The reviews come from two different websites, BeerAdvocate and RateBeer, which are the two biggest beer review websites in the world.
BeerAdvocate and RateBeer provide valuable insights into the world of beer reviews.
The datasets contain data on beer ratings, reviews, and other information about beer from users of the sites.
This includes information about the type of beer, its style, its region of origin, its ABV, and its ratings from users.
Additionally, the datasets also contain information about the user who wrote the review, such as their location, age, and gender.
This will allow us to analyse how beer reviews influence the perception of a beer. For example, a comparison of ratings and reviews of a particular beer can be used to determine whether users in different regions have different opinions about the same beer.
Additionally, the datasets can be used to examine how factors such as age, gender, and location affect user opinion.
Information can be drawn to determine the different types of beers popular in different regions.
Pre-processing
For both datasets, we have geographical information about the country of users and breweries. Moreover, for US-based locations, we also have the name of the state. We extracted this information and created an extra column for both breweries and users containing the US postal abbreviation of each state. We then finished processing the locations by dropping the state name from the location column. A quick look at the distribution of user locations shows that some US territories have 1 review each. We therefore removed them from the dataframe since they will not be useful for our analysis. We end up with 2 dataframes named BA_US and RB_US.
We also converted the column to a datetime object.
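The location and date steps described above could look roughly like the sketch below. It assumes, purely for illustration, that US locations are stored as "United States, <State>" and that the columns are called `location` and `date`; the `STATE_ABBR` mapping is a hypothetical three-state excerpt, not the full table.

```python
import pandas as pd

# Hypothetical excerpt of a state-name -> postal-abbreviation mapping.
STATE_ABBR = {"California": "CA", "New York": "NY", "Texas": "TX"}

# Toy data; the column names and the "Country, State" format are assumptions.
users = pd.DataFrame({
    "location": [
        "United States, California",
        "United States, New York",
        "Canada",
        "United States, Texas",
    ],
    "date": ["2015-03-01", "2016-07-15", "2014-01-20", "2017-11-05"],
})

# Split "Country, State" once; rows without a state get a missing value.
parts = users["location"].str.split(", ", n=1, expand=True)
users["location"] = parts[0]                   # keep only the country
users["state"] = parts[1].map(STATE_ABBR)      # postal abbreviation or NaN

# Convert the date strings to proper datetime objects.
users["date"] = pd.to_datetime(users["date"])

print(users)
```

Keeping the country in `location` and the abbreviation in a separate `state` column makes later filtering on US users a simple comparison.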
After our processing, we wanted to know the proportion of NaN values for the and columns in the dataframes.
If we look at how much these NaN values are represented in the dataframes, we can see that for RB_US the NaN values represent 0.005% of the data and for BA_US they represent 0.6% of the data. Thus we can drop these rows without losing too much information.
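As a minimal sketch of that check, the share of missing values can be measured before dropping the affected rows. The column names `rating` and `text` are placeholders chosen for the example, not necessarily the columns inspected in the analysis.

```python
import numpy as np
import pandas as pd

# Toy dataframe; "rating" and "text" are hypothetical column names.
ratings = pd.DataFrame({
    "rating": [4.0, np.nan, 3.5, 4.5, 2.0],
    "text": ["Nice hop aroma", "Too sweet", None, "Great stout", "Flat"],
})
cols = ["rating", "text"]

# Percentage of missing values per column.
nan_share = ratings[cols].isna().mean() * 100

# Percentage of rows with at least one missing value in those columns.
rows_with_nan = ratings[cols].isna().any(axis=1).mean() * 100

# Drop the incomplete rows once the loss is judged acceptable.
cleaned = ratings.dropna(subset=cols)

print(nan_share)
print(f"rows affected: {rows_with_nan:.1f}%")
```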
As our story concerns US reviews, we decided to get rid of the reviews made by users not living in the US.
Then comes the time to merge the dataframes. We needed to normalize the data because the rating distributions differ between the two websites. Following the same procedure as in the Lederrey-West paper, and thus assuming that the inherent quality of the beers being rated stays roughly constant, we performed a z-score normalization of the ratings. Merging the BA_US and RB_US dataframes with the concat() function of pandas is then straightforward.
From now on we have a dataframe called df_ratings to work with.
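The normalization and merge can be sketched as follows. This is a simplified version that standardizes each site as a whole; the procedure actually followed may group the ratings differently (for example per year), and the column names and the `add_zscore` helper are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the two per-site dataframes.
ba_us = pd.DataFrame({"beer": ["A", "B", "C"], "rating": [3.0, 4.0, 5.0]})
rb_us = pd.DataFrame({"beer": ["A", "B", "C"], "rating": [2.0, 3.0, 4.0]})


def add_zscore(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Return a copy with a source label and a z-scored rating column."""
    out = df.copy()
    out["source"] = source
    mean = out["rating"].mean()
    std = out["rating"].std(ddof=0)  # population standard deviation
    out["rating_z"] = (out["rating"] - mean) / std
    return out


# Normalize each site separately, then stack the rows.
df_ratings = pd.concat(
    [add_zscore(ba_us, "BA"), add_zscore(rb_us, "RB")],
    ignore_index=True,
)

print(df_ratings)
```

After this step both sites have ratings with mean 0 and unit variance, so a given z-score means "the same distance from that site's average" regardless of the original scale.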
Numbers about the data
Find out meaningful numbers about the reviews
Number of reviews about US beers
Number of US users that made reviews
Number of 5 stars beers in the US
Extract of the reviews
Most reviewed breweries of the USA
Rogue Ales
Miller Brewing Company
Anheuser-Busch InBev
Brooklyn Brewery
Boston Beer Company (Samuel Adams)
Some more data about the number of reviews related to the location
Foreign or Local ?
To answer our research question (do Americans give better grades to local beers?), let's perform some tests. As a first step, we divide the dataset in 2: one part composed of American reviews of American beers and another composed of American reviews of foreign beers. Our strategy is to compare the 2 subsets by applying different statistical tests to them. The null hypothesis of these tests is that both samples come from the same probability distribution. If the p-value is less than 0.05, we reject that hypothesis: the 2 samples do not follow the same law and there is an effect, meaning that Americans grade American beers differently from foreign ones.
The Wilcoxon signed-rank test is a non-parametric statistical test used to compare the medians of two related or paired samples. It computes the difference within each pair of observations and then ranks the absolute values of these differences. The test statistic is based on the sum of the ranks of the positive differences.
T-test
p-value aroma 0.265992
p-value appearance 0.344700
p-value palate 0.417915
p-value taste 0.394902
p-value overall 0.437345
p-value rating 0.358212
Wilcoxon test
p-value aroma 0.227786
p-value appearance 0.400466
p-value palate 0.499953
p-value taste 0.366176
p-value overall 0.469101
p-value rating 0.404500
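For readers who want to reproduce this kind of comparison, here is a minimal sketch on synthetic grades. Because the two toy groups are unpaired and of different sizes, it uses Welch's t-test and the Mann-Whitney U (Wilcoxon rank-sum) test from scipy.stats rather than the signed-rank variant; the data and group sizes are invented and say nothing about the real results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic grades: both groups are drawn from the same distribution.
local = rng.normal(loc=3.8, scale=0.5, size=500)    # US reviews of US beers
foreign = rng.normal(loc=3.8, scale=0.5, size=400)  # US reviews of foreign beers

# Welch's t-test: compares means without assuming equal variances.
t_stat, t_p = stats.ttest_ind(local, foreign, equal_var=False)

# Mann-Whitney U (rank-sum): non-parametric test for unpaired samples.
u_stat, u_p = stats.mannwhitneyu(local, foreign, alternative="two-sided")

alpha = 0.05
for name, p in [("t-test", t_p), ("rank-sum", u_p)]:
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p-value = {p:.4f} -> {verdict}")
```

With a threshold of 0.05, a p-value above it means the test gives no evidence that the two groups are graded differently.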
Sentiment analysis
First of all, we investigate whether the reviews written by users are positive or negative and whether they match the grade given by the user (Rating). Sentiment analysis is a method that uses Natural Language Processing, text analysis and other algorithms to identify, extract, quantify and study the polarity of subjective information in a given text (e.g. a positive or negative opinion). Performing a sentiment analysis on beer reviews will allow us to see if the reviews tend to be positive or negative. We use the TextBlob library, which is built on the Natural Language Toolkit (NLTK). We chose it because it is simple, easy to deploy, light on resources, and suitable even for small applications. When a sentence is passed to TextBlob, it returns two outputs: polarity and subjectivity. Polarity lies in [-1,1], where -1 refers to negative sentiment and +1 refers to positive sentiment. Subjectivity lies in [0,1] and measures how much the text expresses personal opinions and judgments, from 0 (objective) to 1 (subjective).
Linear Regression
For this linear regression, we plot the sentiment score calculated previously against the rating. We used the statsmodels library to perform the regression analysis. The results are shown below for 6 different beers having at least 500 reviews (an arbitrary number chosen as the minimum significant number of reviews). In the first column we chose 3 beers among the top 10 most reviewed beers, while in the second column we chose 3 beers with far fewer reviews and coming from different countries.
Sentiment score vs Rating
Statistical tests
Using the pearsonr function from the scipy.stats library, we can calculate Pearson's correlation coefficient and the p-value for the sentiment score and the rating.
p-value 0.000000
Pearson's coefficient 0.352315
The p-value is very low, meaning that the correlation between the sentiment score and the rating is statistically significant. This is nuanced by Pearson's coefficient, which is positive and close to 0.35: the correlation is moderate. Pearson's correlation test assumes that the data is normally distributed, which is not the case here, and that the relationship between the variables is linear. We therefore also compute the Spearman correlation.
The Spearman correlation only assumes that the data is at least ordinal (we can tell without doubt if a value is lower or higher than another). It tests whether the variables are monotonically related (one variable consistently increases, or consistently decreases, as the other increases).
Spearman's coefficient 0.334857
We obtained a Spearman coefficient of 0.33, which is close to the Pearson's coefficient. It represents a medium association between the sentiment score and the rating. A positive coefficient indicates that as the rating increases, the sentiment score also increases, even if it's a moderate effect.
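Both coefficients can be computed with scipy.stats, as in this minimal sketch. The sentiment scores and ratings are synthetic, generated with an arbitrary weak linear link plus noise, so the printed values are unrelated to the figures reported above.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)

# Synthetic data: sentiment loosely follows the rating, plus noise.
rating = rng.uniform(1, 5, size=1000)
sentiment = 0.1 * (rating - 3) + rng.normal(0, 0.25, size=1000)

# Pearson measures linear association.
r, p_r = pearsonr(sentiment, rating)

# Spearman measures monotonic association, using ranks only.
rho, p_rho = spearmanr(sentiment, rating)

print(f"Pearson's coefficient  {r:.3f} (p-value {p_r:.3g})")
print(f"Spearman's coefficient {rho:.3f} (p-value {p_rho:.3g})")
```

With samples this large, even a moderate coefficient yields a tiny p-value, which is why the coefficient itself is the more informative number.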
And what about local beers ?
Sentiment score vs Rating
Performing the same two tests as before, on every review made by a user on a local beer (relative to its state), we obtain correlation coefficients in the same range as before. The p-value of the Pearson's correlation test is still very low, meaning that there is a correlation between the sentiment score and the rating.
p-value 0.000000
Pearson's coefficient 0.367722
Spearman's coefficient 0.351841
Beer Aspects
Reviewers are asked to fill in 5 different grades: 'Aroma', 'Taste', 'Palate', 'Appearance' and 'Overall'. A rating is then computed as the weighted average of these 5 aspects. In this section we investigate which of 'Aroma', 'Taste', 'Palate' and 'Appearance' is the most important for the overall score. This will give us insights into which aspects reviewers tend to value in beers. To do so, we use 2 different methods so that we can cross-check our findings. The first method is a PCA, or Principal Component Analysis; the second is a multilinear regression.
Principal Component Analysis
For the PCA, we use the 4 different aspects as features and ‘overall’ as the target. We use the PCA class from the sklearn.decomposition library with 2 components. After fitting the PCA to the features, we extract the explained variance ratios, which gives us the following vector:
Multilinear Regression
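The two methods of this section, the PCA and the multilinear regression, can be sketched together on synthetic data. The aspect grades below are random, and the weights in `true_w` are arbitrary values chosen for the example, so the output does not reproduce the results of the analysis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
aspects = ["aroma", "taste", "palate", "appearance"]

# Synthetic standardized aspect grades and an overall score built from them.
n = 2000
X = rng.normal(size=(n, len(aspects)))
true_w = np.array([0.25, 0.50, 0.20, 0.05])   # arbitrary illustrative weights
overall = X @ true_w + rng.normal(0, 0.1, size=n)

# PCA on the four aspects, keeping 2 components.
pca = PCA(n_components=2).fit(X)
print("explained variance ratios:", pca.explained_variance_ratio_)

# Multilinear regression of the overall score on the four aspects.
reg = LinearRegression().fit(X, overall)
r2 = reg.score(X, overall)                    # coefficient of determination
print(f"R^2 = {r2:.3f}")
for name, weight in zip(aspects, reg.coef_):
    print(f"{name:<10} {weight:.3f}")
```

One point worth keeping in mind when reading the two outputs side by side: PCA never sees the target, so its explained variance ratios describe how the aspect grades vary together, whereas the regression weights directly relate each aspect to the overall score.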
We perform a multilinear regression using sklearn’s LinearRegression class. We fit the regression to the data and, to evaluate its quality, we calculate the coefficient of determination: 0.75, which is considered a good value. The model therefore explains most of the variance in the overall score. We then simply read the coefficients of the regression and print the weights we obtain:
One more analysis on these four beer styles is plotting the number of reviews and the mean rating of these styles. The first map shows where the reviews for each beer style come from, and the second shows the average rating that users in each state give to these beer styles. We could interpret these maps in different ways, but there doesn’t seem to be any reliable information we could draw from them. Our initial objective was to investigate a potential link between the reviews and the political tendencies of the reviewers. Even if these maps don’t point to a possible link between politics and beer reviews, we will branch out a bit further and see if we find anything interesting.
Political Tendencies
Here, we try to discover if there is a pattern in the reviews of beers based on the political tendencies of the American reviewers.
We use a dataset scraped from Wikipedia containing statistics on the US presidential elections from 2000 to 2016, which gives the number of votes for each candidate in each state.
More precisely, we only use the statistics for the Democratic and Republican parties because together they account for more than 95% of the votes in each state for each election.
Then we want to see if there is a pattern in preferring local beers (brewed in the same state as the reviewer/in the US) to foreign beers (another state of the US/another country).
We obviously don't have direct access to the political inclination of the reviewers, but we know the percentage of the population that voted for one candidate or the other in each election. We can use this information to estimate the political inclination of the reviewers based on their geographical location.
If there is a pattern in the preference for local beers, we should be able to see it in the states where there is a higher percentage of votes for one party or the other.
We first need to know the percentage of votes for each party in each state.
We find 6 states where the mean percentage of votes over the 5 elections for the Republican party is higher than 60%: Wyoming, Idaho, Oklahoma, Utah, Alabama and Nebraska.
There are almost no reviews for beers brewed in these states, so we cannot make an analysis at this level of locality. However, the numbers are quite balanced when we compare reviews from those states of beers brewed in the US and beers brewed in the rest of the world.
We already removed the single review from Washington DC during the pre-processing step. We still find 5 states where the mean percentage of votes over the 5 elections for the Democratic party is higher than 60%: Hawaii, Massachusetts, New York, Rhode Island and Vermont. Now we will try to find if there is a pattern in the reviews of beers based on the political tendencies of the American reviewers.
For this, we can simply separate the reviews into 2 dataframes for each party: one with the reviews of local beers and one with the reviews of foreign beers. We then perform a linear regression to see if there is a correlation between the rating of local vs foreign beers and the political tendency of the reviewers.
Now that all the data is gathered, the linear regression can be performed for the Red states (Republicans).
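The state selection and the local/foreign split described above can be sketched as follows. The vote shares, the state codes "AA", "BB" and "CC", and all column names are invented for the example; only the 60% threshold on the mean vote share comes from the text.

```python
import pandas as pd

# Invented vote shares (in %) for three fictional states over two elections.
votes = pd.DataFrame({
    "state": ["AA", "AA", "BB", "BB", "CC", "CC"],
    "year": [2012, 2016, 2012, 2016, 2012, 2016],
    "republican_pct": [65.0, 67.0, 35.0, 33.0, 49.0, 51.0],
    "democratic_pct": [33.0, 31.0, 63.0, 65.0, 50.0, 48.0],
})

# Mean vote share per state across elections, then the 60% threshold.
mean_pct = votes.groupby("state")[["republican_pct", "democratic_pct"]].mean()
republican_states = mean_pct.index[mean_pct["republican_pct"] > 60].tolist()
democratic_states = mean_pct.index[mean_pct["democratic_pct"] > 60].tolist()

# Invented reviews with the reviewer's state and the brewery's country.
reviews = pd.DataFrame({
    "user_state": ["AA", "AA", "BB", "BB", "CC"],
    "brewery_country": ["United States", "Belgium", "United States",
                        "Germany", "United States"],
    "rating_z": [0.2, 0.4, -0.1, 0.3, 0.0],
})

# "Local" here means brewed in the US; "foreign" means the rest of the world.
reviews["is_local"] = reviews["brewery_country"].eq("United States")

# Mean normalized rating of local vs foreign beers in the Republican group.
rep_reviews = reviews[reviews["user_state"].isin(republican_states)]
summary = rep_reviews.groupby("is_local")["rating_z"].mean()

print(republican_states, democratic_states)
print(summary)
```

The same split can be repeated with `democratic_states` to build the second pair of dataframes before fitting any regression.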
Limitations and Future work
Overall, we were not able to explore everything we initially wanted to. Here are some of the limitations that restricted us from doing more in the amount of time we had.
Conclusion
In this study we analyzed 2 large datasets of beer reviews from two well-known websites: RateBeer and BeerAdvocate. Our aim in this case study was to discover patterns between the geographical origin of the reviewers and the beers. We had to preprocess the data a bit, especially to adapt the ratings coming from the 2 different websites so that we could use them together in our analysis. As we wanted to discover a correlation between the location of a user and the location of the beer they rated, we performed statistical tests on the “local reviews” and the “foreign reviews”. We performed sentiment analysis on the data and found that, as we could expect, a good sentiment score is correlated with a good rating. Then, looking more precisely at local beers, the correlation between rating and sentiment score is not higher for local beers than for all beers globally. Next, we wanted to investigate the importance of the different beer aspects. We performed a PCA and a multilinear regression. We found that taste is, in general, the most important aspect of a beer, whereas appearance is the least important by far. Finally, we wanted to see if we could find some patterns between the political inclination of reviewers and a preference for local or foreign beers. We found no clear evidence of those kinds of patterns.
Even if our hypotheses seemed plausible, the data does not lead to a clear conclusion. Next time let’s just go grab a beer together instead of crushing our heads on such datasets ;)