3 Reasons Why You Should Not Track the Average Star-Rating of Your Brand
September 10, 2018
Brand owners who realize the impact star-ratings have on consumer purchase intention in their category try to improve their ratings.
One of the first questions we usually hear from clients is "What is the optimal star-rating?" There are many theories about what the optimal rating for a product should be. Many believe it should be above 4.0; some believe it should be between 4.2 and 4.6. The right number depends on many factors and should ideally be adjusted to the competition (a product with a star-rating of 4.2 does not look attractive if a competing alternative enjoys 4.8). Still, most agree that a star-rating below 4.0 is undoubtedly harmful to product sales, so 4.0 is the minimum level to achieve as a first step.
After clients figure out what rating they want to achieve, they face another dilemma: how to synthesize information about the star-ratings of hundreds of products in dozens of stores? Unfortunately, the answer they usually find first is intuitive yet wrong: the average.
3 Reasons Why the average star-rating metric is a mistake
Each metric must serve some purpose. So, why do we analyze star-ratings?
Most clients want to understand how strong their brand is (namely, how attractive it is for shoppers) and to identify problems which can be fixed.
An average star-rating metric produces unclear and often misleading results: it neither gives a good assessment of the situation nor indicates how to fix problems with underperforming products.
In order to illustrate the issue, let's have a look at example star-ratings for two brands.
Success criteria for star-rating per product: above 4.0 Green, below 4.0 Red.
1. Consumers / Shoppers never look at the average star-rating for a total brand; they have a product-by-product view.
If you want your products to stay competitive, you need to ensure they ALL (one-by-one) exceed a specific star-rating threshold, not simply that their overall average does.
Example: In the above table, Brand A has 2 strong products (4.8 and 5.0 average stars) and an issue with the other three (3.9 each). The total Brand A average would be 4.3, which looks acceptable, so this approach would not highlight any issues. Brand B is the exact opposite: an average score of 3.9 would raise an alarm even though only 1 out of 5 products has an issue.
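The Brand A arithmetic can be checked with a short Python sketch. The ratings are the ones quoted in the example; the product names are placeholders, and treating a rating of exactly 4.0 as passing is an assumption made for illustration.

```python
from statistics import mean

MINIMUM = 4.0  # success criterion used in this article

# Brand A as described in the example; product names are placeholders.
brand_a = {
    "Product 1": 4.8,
    "Product 2": 5.0,
    "Product 3": 3.9,
    "Product 4": 3.9,
    "Product 5": 3.9,
}

brand_average = mean(brand_a.values())

# Assumption: a rating of exactly 4.0 counts as meeting the criterion.
below_minimum = [name for name, stars in brand_a.items() if stars < MINIMUM]

print(f"Brand average: {brand_average:.1f}")
print(f"Products below {MINIMUM}: {len(below_minimum)} of {len(brand_a)}")
```

The average looks healthy, while the per-product check shows that three of the five products sit below the minimum.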
2. Portfolio choices might be improperly impacted by maximizing average rating.
"Average" as a formula is sensitive to outliers. Therefore, a single product with an ultra-low or ultra-high score may shift the result greatly and consequently mislead the analyst. Furthermore, delisting old products with high ratings would drive the average rating down. This may encourage e-commerce managers to artificially maintain product cards for products which are no longer available.
Example: The brand manager of Brand A may be hesitant to discontinue product 1, as it is a powerhouse for the average star-rating metric.
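A minimal Python sketch of the delisting effect, using the Brand A ratings quoted earlier (4.8, 5.0 and three times 3.9). The article does not say which rating product 1 carries, so the sketch assumes it is the 4.8-star product; `average_without` is a hypothetical helper, not part of any tool.

```python
from statistics import mean

# Brand A ratings from the earlier example. Assumption: "Product 1" is the
# 4.8-star product; the names are placeholders.
brand_a = {
    "Product 1": 4.8,
    "Product 2": 5.0,
    "Product 3": 3.9,
    "Product 4": 3.9,
    "Product 5": 3.9,
}


def average_without(ratings, delisted):
    """Return the brand average after removing one product card."""
    remaining = [stars for name, stars in ratings.items() if name != delisted]
    return mean(remaining)


before = mean(brand_a.values())
after = average_without(brand_a, "Product 1")

print(f"Average before delisting: {before:.3f}")
print(f"Average after delisting:  {after:.3f}")
```

Under these assumptions the average falls from 4.3 to 4.175, although no remaining product has changed.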
3. Average star-rating does not support differentiated star-rating objectives for different products.
Assume you set one star-rating objective for the premium line of your brand and a different objective for value variants. Synthesizing the data with an average star-rating is then highly misleading: the average does not tell you whether the performance of individual products is satisfactory or requires intervention (and where).
Example: Some products of Brand A are premium and are expected to reach at least 4.5 stars (especially as competing variants are also at 4.5), whereas value variants may be acceptable at 4.0. The combined average of both product lines may fall between 4.0 and 4.5, which is good enough for value products but not for premium ones. Does it tell us if and where we have a problem?
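The following Python sketch illustrates the point with the two objectives from the example (4.5 for premium, 4.0 for value). The product names and ratings are invented purely for illustration and do not come from any real brand.

```python
from statistics import mean

# Objectives per product line, as in the example above.
TARGETS = {"premium": 4.5, "value": 4.0}

# Hypothetical products: (name, product line, star-rating).
products = [
    ("Premium 1", "premium", 4.3),
    ("Premium 2", "premium", 4.4),
    ("Value 1", "value", 4.1),
    ("Value 2", "value", 4.2),
]

# One blended number for both lines.
combined_average = mean(stars for _, _, stars in products)

# Each product compared against the objective of its own line.
needs_attention = [
    name for name, line, stars in products if stars < TARGETS[line]
]

print(f"Combined average: {combined_average:.2f}")
print("Needs attention:", needs_attention)
```

The combined average lands between the two objectives and says nothing about either line, whereas the per-product comparison points directly at the premium products.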
How to measure star-ratings properly?
We recommend the following approach:
- Determine the minimum star-rating needed to stay competitive (the success criteria) in your category/countries. This may be one number for all your products in all stores, or you may decide to differentiate it down to the product/store level.
- eStoreCheck checks whether the given success criteria are met and flags each product as True or False (above or below the accepted minimum level of star-rating per product).
- Dashboards calculate the percentage of products which deliver on success criteria and those that are creating issues.
Example: See color coding and line "% of products that meet success criteria" for reference.
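The flag-and-percentage logic described in the steps above can be sketched in a few lines of Python. This is only an illustration, not eStoreCheck's actual implementation: the `Listing` class, the function names, the store names and the ratings are all hypothetical, and treating a rating exactly at the minimum as passing is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One product card in one store (hypothetical structure)."""
    product: str
    store: str
    stars: float


def meets_criteria(listing, minimum=4.0):
    """True if the listing reaches the minimum star-rating."""
    return listing.stars >= minimum


def percent_meeting(listings, minimum=4.0):
    """Percentage of listings that meet the success criteria."""
    if not listings:
        raise ValueError("no listings to evaluate")
    passed = sum(meets_criteria(item, minimum) for item in listings)
    return 100 * passed / len(listings)


# Invented data for illustration only.
listings = [
    Listing("Product 1", "Store X", 4.6),
    Listing("Product 1", "Store Y", 3.8),
    Listing("Product 2", "Store X", 4.2),
    Listing("Product 2", "Store Y", 4.0),
]

flags = {(item.product, item.store): meets_criteria(item) for item in listings}
share = percent_meeting(listings)

print(flags)
print(f"% of products that meet success criteria: {share:.0f}%")
```

The dictionary of flags gives the list of cards that need attention, and the percentage is the single figure a dashboard would track over time.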
Benefits of the above approach
- Looking through shoppers' eyes, i.e. product by product: a clear picture of how close the brand is to reaching the desired success criteria.
- Natural link to products that require attention ("False" on success criteria).
- Listing/delisting of a single product card seldom changes the metric significantly.
One more thing
There are two ways to calculate an average star-rating: numeric and weighted. The numeric average is simply the average of the products' star-ratings, while the weighted average is the average of every review's star-rating, i.e. products with more reviews weigh more in the final value. Although both have limited application for commercial teams due to the issues discussed in this article, the weighted average may be useful for teams designing products, as it quantifies and averages shoppers' feedback at brand level.
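The difference between the two calculations can be seen in a small Python sketch. The ratings and review counts are invented for illustration, and the sketch assumes each product's star-rating is the mean of its own reviews, so that weighting by review count equals averaging over all reviews.

```python
def numeric_average(products):
    """Plain average of product star-ratings, one vote per product."""
    return sum(stars for stars, _ in products) / len(products)


def weighted_average(products):
    """Average over all reviews: each product weighted by its review count."""
    total_reviews = sum(reviews for _, reviews in products)
    return sum(stars * reviews for stars, reviews in products) / total_reviews


# Hypothetical products: (star-rating, number of reviews).
products = [
    (5.0, 2),   # highly rated, but only two reviews
    (4.0, 98),  # lower rating backed by many reviews
]

numeric = numeric_average(products)
weighted = weighted_average(products)

print(f"Numeric average:  {numeric:.2f}")
print(f"Weighted average: {weighted:.2f}")
```

With these numbers the numeric average is 4.5, while the weighted average is 4.02, because the product with 98 reviews dominates the result.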