Looking at 5 point rating scales: TripAdvisor, Google Reviews, and Yelp
How did we get here?
Using rating scales to measure attitudes goes back about 90 years. In the 1930s, social psychologist Rensis Likert refined the idea of using a fixed set of possible responses to survey questions designed to measure attitudes. Today, "Likert-like" scales are ubiquitous in surveys and consumer reviews. Most often, we see 5 point rating scales used in these applications, as on TripAdvisor, Yelp, and Google Reviews.
One of the pitfalls of these scales is that the labels attached to them can affect how we interpret the data from the ratings. Back in the day, Likert addressed this concept (ordinal vs. interval scales), but in 2022 people often don't consider it. Let me explain by taking a closer look at the 5 point rating scales for TripAdvisor, Yelp, and Google Reviews.
Comparing the rating scales
The TripAdvisor review scale consists of 5 "bubbles". When you go to review a hotel or restaurant and choose how many bubbles to fill in, you can see the labels attached to them:
- Filling in all 5 bubbles is labeled "Excellent"
- 4 bubbles, "Very Good"
- 3 bubbles, "Average"
- 2 bubbles, "Poor"
- 1 bubble, "Terrible"
Starting at the 3 bubble, or "Average" rating, if we go 1 rating step better, we get to "Very Good" (4 bubbles filled in); 2 rating steps better gets us to "Excellent" (all 5 bubbles filled in). So far, so good. But now let's go in the other direction. Starting again at the 3 bubble "Average", if we move 1 rating step worse, we get to "Poor" (2), and then another step worse gets us to "Terrible" (1 bubble filled in).
Do these labels make sense? More importantly, do the strengths of the words match up with the number values of the "bubble" scale? This is the problem: the increments between bubbles (exactly 1) do not equal the "word value" increments in the descriptive labels assigned to them. "Very Good" is not the same "distance" above "Average" as "Poor" is below it. Perhaps more problematic is the difference in strength between the negative ratings ("Poor" and "Terrible") and the positive ratings ("Very Good" and "Excellent").
Why is this important?
A hotel's average (mathematical average, or mean) TripAdvisor bubble score is a key metric that many consumers look at when deciding where to stay. But how accurate is that mean score when the labels assigned to the bubble values sit at mismatched intervals? And how many TripAdvisor users actually consider the labels when rating a business? Certainly some, but we really don't know how many, so doing mathematical calculations on the numeric bubble values is very messy.
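To make the concern concrete, here is a minimal Python sketch. The `LABEL_SPACING` values are invented purely for illustration; they are not measured data and not anything TripAdvisor publishes. The names `mean_score`, `hotel_a`, and `hotel_b` are hypothetical too. The point is only to show the mechanism: two hotels that tie on the usual bubble mean can come apart once reviewers are assumed to rate by the words rather than by evenly spaced numbers.

```python
from statistics import mean

# Standard assumption behind a mean bubble score: every step is worth exactly 1.
EQUAL_SPACING = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}

# ASSUMPTION: made-up "word strength" values for Terrible, Poor, Average,
# Very Good, Excellent. Chosen only to illustrate uneven intervals.
LABEL_SPACING = {1: 0.5, 2: 2.0, 3: 3.0, 4: 4.5, 5: 5.0}


def mean_score(ratings, values):
    """Average the ratings after mapping each bubble count to a value."""
    return mean(values[r] for r in ratings)


hotel_a = [3, 3, 3, 3]  # four "Average" reviews
hotel_b = [5, 5, 1, 1]  # two "Excellent" and two "Terrible" reviews

for name, ratings in (("Hotel A", hotel_a), ("Hotel B", hotel_b)):
    print(name,
          "equal spacing:", mean_score(ratings, EQUAL_SPACING),
          "label spacing:", mean_score(ratings, LABEL_SPACING))
```

With equal spacing both hotels average 3.0. Under the made-up label values, Hotel A stays at 3.0 while Hotel B drops to 2.75. Different assumed values would shift the result in other ways, which is exactly why a mean computed on labeled bubbles is hard to interpret.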
Google Reviews also uses a 5 point rating scale, simply presented as stars, with no descriptive labels attached to them. This makes for a cleaner mathematical analysis; a restaurant's mean star score is straightforward.
Finally, Yelp's 5 point scale is also presented as stars. Yelp provides the following descriptive label for each rating:
- 5 stars = Great
- 4 = Good
- 3 = OK
- 2 = Could've been better
- 1 = Not good
Compared to TripAdvisor's labels, I think the "distance", or interval, between these labels is more even, so a Yelp mean star score is cleaner from this perspective.
I like Google's approach of not providing labels best.
The idea of a 5 point rating scale, whether in stars or bubbles, has become ubiquitous in our society. Since 90+% of consumers look at reviews before purchasing goods and services (including hospitality), almost everyone has developed their own personal rating algorithm as they subconsciously match up their experiences to reviews. And since it's impossible to know how many TripAdvisor or Yelp reviewers consider, or even pay attention to, the labels provided, I think TripAdvisor and Yelp are just muddying the waters.
I've just scratched the surface of this topic here. If you want a little bit of a deeper dive, please watch or listen to this 16 minute episode of our podcast, Feedback Matters: