Cultured Solutions


PART V – Turning a judgement call into a value call


Having read the last segment, think back to the situation you might be in. Whether you are a restaurateur managing a wine list, the consummate host of a large party, or simply looking for a wine for tomorrow night’s meal, quality and price are going to be among the top factors in your buying decision. You may have a specific grape or region in mind but, ultimately, the choice comes down to a budget. Let’s recap the tasting scores and ‘winners’, this time including their retail prices. This way, we can better judge which buying decisions might be ‘worth it’ and whether there are wines of great value to be had.

Flight 1 (6 wines, 6 tasters)

Wine | Producer       | Origin | Varietal | Avg score | Result | Retail price
A    | La Follette    | CAL    | Chard    | 17.2      | win    | $44.95
B    | Veramonte      | CHL    | Chard    | 11.0      | loss   | $12.95
C    | Santa Carolina | CHL    | Cabsauv  | 13.0      | loss   | $18.95
D    | Rodney Strong  | CAL    | Cabsauv  | 16.7      | win    | $59.95
E    | Gloria Ferrer  | CAL    | Pnoir    | 16.7      | win    | $26.95
F    | Errazuriz      | CHL    | Pnoir    | 14.5      | loss   | $13.95

Flight 2 (8 wines, 10 tasters)

Wine | Producer       | Origin | Varietal | Avg score | Result | Retail price
A    | Errazuriz Mx   | CHL    | Svblanc  | 12.5      | tie    | $15.95
B    | Ch. St. Jean   | CAL    | Svblanc  | 12.8      | tie    | $19.95
C    | Cakebread      | CAL    | Chard    | 12.8      | loss   | $59.95
D    | Maycas Limarí  | CHL    | Chard    | 16.8      | win    | $19.95
E    | Chilcas        | CHL    | Pnoir    | 15.0      | loss   | $17.95
F    | Dierberg       | CAL    | Pnoir    | 16.7      | win    | $47.95
G    | Dom. Napa      | CAL    | Cabsauv  | 17.3      | tie    | $77.95
H    | V. San Esteban | CHL    | Cabsauv  | 16.0      | tie    | $18.95

Comparisons, comparisons…is that price ‘worth it’?

During the tasting I also asked the judges to estimate a retail price based on their scores. The estimates for any given wine varied widely, and most quoted no higher than $30. A crowd less versed than professionals in the market valuation of wine brings an inherent bias: consumers tend to be influenced by what they would want to pay for the bottle, rather than assigning value based on the perceived quality of the wine. Consumers buy wine with their own money for their personal enjoyment, whereas commercial buyers spend their establishments’ funds to resell the wine at a profit. Both are still ‘value for money’ conscious, but with different goals in mind.

For some of the comparisons, like the Sauvignon blanc matchup, there’s not much added value in paying the $4 more if quality and complexity are your angle. If, in reading the tasting notes, you like the style of one of the wines (the Chilean wine was made in a light-bodied, crisp style; the Californian in a weightier, softer one), then you may prefer one over the other, and that could justify the difference in expense.

What about those situations where there was a clear favourite in the scoring? Looking at the red wines in both flights, California won three matchups (some by a clear 2- or 3-point lead) and practically tied a fourth. Yet the Californian reds retailed for roughly double to almost quadruple the price of the Chilean reds. Perhaps some people would consider those differences ‘worth it’, but it would be difficult to convince a mass of consumers to part with that kind of money when they know there is good wine to be had for a quarter of the price. Keep in mind these Chilean wines scored at least 13 out of 20, and one scored 16 points – very respectable! In some competitions, 13 and 16 points might earn bronze and silver medals, respectively.

Some matchups were heavily in favour of the Chilean wine, such as the Chardonnay matchup of Flight 2. When one wine so outshines another on both score (by 4 points) and price (undercutting it by $40), it can make potential buyers timid about paying premium prices. In this case, the Chilean offering delivered wholeheartedly and the Californian wine left much to be desired.*

*In its defence, I tried the Cakebread Chardonnay again the night after and it did improve, perhaps by two points, but I still didn’t believe it warranted its $60 price.

Taste around to find great value

Wines of stellar value can sometimes be found by doing something as simple as reading. Perhaps your favourite wine columnist mentions the three or four stars they gave a $10 bottle. Critics’ ratings are a great start; however, they still represent someone else’s opinions, and you may feel differently once the wine makes it into your own glass. Statistical analyses help us see the big picture a little more clearly, but they are far from perfect methods – and, I admit, certainly not the most practical either. Only when we taste and compare different wines ourselves can we truly appreciate what great value means. It is worth keeping a small record or wine journal to help you recall the wines that were gems to both your palate and your pocketbook. In future articles, it would be interesting to discuss how wines are priced in the first place and what factors influence whether you’ll be asked to pay more (or less) for your next great find.

PART IV – Introductory methods for scoring analysis


At face value, California edged out Chile overall. In a basic competition, scores would be totalled up or, going one step further, averaged to declare a winner – quite simple. But do these comparisons go far enough? Can we say with certainty that if we repeated these flights with other groups over and over again (and you know how we’d love to!), California would remain on top?

Statistical sip #1 – Averaging and standard error

I never thought statistics could be this much fun to work with until I used them in a wine context. The number crunching becomes even more enjoyable with a refreshing glass in hand! Taking a closer look at the breakdown, the individual scores are below along with an average. Accompanying each average is another measure known as the standard error of the mean (SEM), one of many ways to describe the spread – how much the scores vary from one another. If the SEM were zero, the scores would be completely identical; if the scores varied greatly, the SEM would be much greater too. I have labelled each wine as either Californian (CAL) or Chilean (CHL).
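If you’d like to follow the number crunching at home, here’s a minimal Python sketch of how an average and its SEM can be computed (the helper name sem is mine; the score list is Wine A’s from Table 1 below):

```python
from math import sqrt
from statistics import mean, stdev

def sem(scores):
    # Standard error of the mean: sample standard deviation
    # divided by the square root of the number of scores
    return stdev(scores) / sqrt(len(scores))

# The scores listed for Wine A (La Follette) in Table 1 below
wine_a = [16, 15, 19, 17, 18, 15, 18]
print(f"average = {mean(wine_a):.1f}, SEM = {sem(wine_a):.2f}")
```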

Table 1: Flight 1 – Individual scores, average and standard error (out of 8 tasters)

Wine | Producer       | Origin | Varietal | Individual scores          | Average | SEM
A    | La Follette    | CAL    | Chard    | 16, 15, 19, 17, 18, 15, 18 | 17.1    | 0.6
B    | Veramonte      | CHL    | Chard    | 11, 10, 12, 11, 9, 10, 15  | 11.3    | 0.6
C    | Santa Carolina | CHL    | Cabsauv  | 11, 15, 12, 12, 13, 18, 14 | 13.4    | 0.8
D    | Rodney Strong  | CAL    | Cabsauv  | 17, 15, 16, 16, 5, 10, 19  | 14.4    | 1.6
E    | Gloria Ferrer  | CAL    | Pnoir    | 16, 17, 16, 18, 17, 13, 19 | 16.5    | 0.6
F    | Errazuriz      | CHL    | Pnoir    | 15, 14, 15, 14, 15, 14, 18 | 14.9    | 0.5

Table 2: Flight 2 – Individual scores, average and standard error (out of 12 tasters)

Wine | Producer       | Origin | Varietal | Individual scores                              | Average | SEM
A    | Errazuriz Mx   | CHL    | Svblanc  | 11, 12, 14, 11, 3, 9, 14, 15, 13, 12, 14, 15   | 11.9    | 1.0
B    | Ch. St. Jean   | CAL    | Svblanc  | 13, 12, 16, 11, 12, 6, 15, 16, 12, 11, 10, 17  | 12.6    | 0.9
C    | Cakebread      | CAL    | Chard    | 13, 13, 12, 12, 10, 14, 13, 12, 15, 16, 12, 12 | 12.8    | 0.5
D    | Maycas Limarí  | CHL    | Chard    | 19, 18, 16, 16, 16, 14, 17, 16, 18, 17, 16, 18 | 16.8    | 0.4
E    | Chilcas        | CHL    | Pnoir    | 13, 14, 15, 14, 16, 11, 14, 17, 16, 15, 17, 16 | 14.8    | 0.5
F    | Dierberg       | CAL    | Pnoir    | 18, 19, 18, 12, 17, 17, 17, 17, 15, 15, 15, 18 | 16.5    | 0.6
G    | Dom. Napa      | CAL    | Cabsauv  | 19, 18, 19, 18, 16, 17, 13, 14, 19, 18, 15, 19 | 17.1    | 0.6
H    | V. San Esteban | CHL    | Cabsauv  | 16, 17, 15, 14, 17, 18, 18, 13, 17, 15, 14, 17 | 15.9    | 0.5

With a closer look at the numbers, one can tell that in some matchups Chile actually beat California, while in other pairings the scores were nearly tied, with a difference of less than one point. So, going beyond a simple average of the raw scores, let’s use some rudimentary statistics to analyze the tasting further.

Statistical sip #2 – Corrected score, removing the outliers

You’ve often seen this in figure skating, or at least suspected it: one or more judges score far from their peers. In wine tasting, it’s possible that a judge is having a great day and scores a wine as though it’s made from angel tears. On the other hand, another might have caught a really bad cold, can’t taste much at all, and scores the wines far below the panel average. By removing the top and bottom outliers, we arrive at a more representative average. Notice also how the SEM becomes smaller, because we’ve essentially removed some of the spread. Statisticians have many ways of selecting which values to tag as outliers for removal; some involve mathematical complexities that should only be discussed over a fine bottle of port (a full bottle, if one wishes to keep one’s companion’s attention), so we’ll keep it simple – no method is the best method.
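As a rough sketch of the simple route taken here – my reading being that each wine’s single highest and single lowest score is dropped before recomputing – something like this would do (helper names are mine):

```python
from math import sqrt
from statistics import mean, stdev

def drop_outliers(scores):
    # Remove the single highest and single lowest score
    s = sorted(scores)
    return s[1:-1]

def sem(scores):
    return stdev(scores) / sqrt(len(scores))

raw = [16, 15, 19, 17, 18, 15, 18]  # Wine A's listed scores
corrected = drop_outliers(raw)      # drops the lowest (a 15) and the highest (19)
print(f"corrected average = {mean(corrected):.1f}, SEM = {sem(corrected):.2f}")
```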

Table 3: Flight 1 – Corrected average and standard error (out of 6 tasters)

Wine | Producer       | Origin | Varietal | Avg score | SEM
A    | La Follette    | CAL    | Chard    | 17.2      | 0.6
B    | Veramonte      | CHL    | Chard    | 11.0      | 0.4
C    | Santa Carolina | CHL    | Cabsauv  | 13.0      | 0.5
D    | Rodney Strong  | CAL    | Cabsauv  | 16.7      | 0.6
E    | Gloria Ferrer  | CAL    | Pnoir    | 16.7      | 0.3
F    | Errazuriz      | CHL    | Pnoir    | 14.5      | 0.2

Table 4: Flight 2 – Corrected average and standard error (out of 10 tasters)

Wine | Producer       | Origin | Varietal | Avg score | SEM
A    | Errazuriz Mx   | CHL    | Svblanc  | 12.5      | 0.6
B    | Ch. St. Jean   | CAL    | Svblanc  | 12.8      | 0.7
C    | Cakebread      | CAL    | Chard    | 12.8      | 0.3
D    | Maycas Limarí  | CHL    | Chard    | 16.8      | 0.3
E    | Chilcas        | CHL    | Pnoir    | 15.0      | 0.4
F    | Dierberg       | CAL    | Pnoir    | 16.7      | 0.4
G    | Dom. Napa      | CAL    | Cabsauv  | 17.3      | 0.6
H    | V. San Esteban | CHL    | Cabsauv  | 16.0      | 0.4


Statistical sip #3 – Visualizing the spread

One can visualize the judges’ scores by simply charting each average score as a point and using bars to depict the size of the standard error we discussed earlier. Here’s one set of many possible examples:

[Chart: Flight 1 scores, with dots representing each wine’s average score and bars depicting the spread as standard error. Wines A through F, left to right.]

[Chart: Flight 2 scores, with dots representing each wine’s average score and bars depicting the spread as standard error. Wines A through H, left to right.]
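If you’d like to recreate a chart like these, a minimal matplotlib sketch along these lines, using the corrected Flight 1 numbers from Table 3, would do the job:

```python
import matplotlib.pyplot as plt

# Corrected averages and SEMs for Flight 1, wines A-F (Table 3)
wines = ["A", "B", "C", "D", "E", "F"]
avgs = [17.2, 11.0, 13.0, 16.7, 16.7, 14.5]
sems = [0.6, 0.4, 0.5, 0.6, 0.3, 0.2]

# Dots for the averages, bars spanning +/- one standard error
plt.errorbar(wines, avgs, yerr=sems, fmt="o", capsize=4)
plt.ylabel("Average score (out of 20)")
plt.title("Flight 1 scores with standard error")
plt.show()
```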

Statistical sip #4 – Testing the spread

Together, the judges’ individual scores are like repeated measurements of a wine’s quality, and the spread of their scores indicates how varied they are. If we repeated this tasting on another day, we might find differences in both the scores and the spread. But until you gather the time and money for another tasting, you can use statistics to determine whether the averages you gathered are significantly different from one another – and thereby determine the winners. One such method is formally known as Student’s t-distribution, devised by William Sealy Gosset, who, coincidentally, had been applying his statistical knowledge while working at Guinness. Without getting into too much detail, to use his method one needs two things to compare and a series of repeated measurements of each… The wine scores! Eureka!

The numerical result of this method is called the t-test statistic, and it can be converted into a measure called a probability value (p-value) that tells us how sure we can be that the difference between the wine scores in a matchup is not due to mere chance. It represents the level of certainty behind the judgement call that one wine truly scored higher than the other: the smaller the number, the more confident we can be. For example, a p-value of 0.05 tells us there is a ‘5 out of 100’, or 1-in-20, likelihood that the difference in score is due to random chance rather than one wine truly winning out. Most of the time, a 1-in-20 chance is chosen as the (admittedly arbitrary) cut-off. How did the wines fare? (For the curious, a short code sketch after the tables shows how to run the test yourself.)

Table 5: Flight 1 – Testing the difference in scores for significance (out of 6 tasters)

Matchup | Wines                                             | Avg scores (SEM)         | t-test p-value* | Result
Chard   | A: La Follette (CAL) vs B: Veramonte (CHL)        | 17.2 (0.6) vs 11.0 (0.4) | p<0.01          | Winner CAL
Cabsauv | C: Santa Carolina (CHL) vs D: Rodney Strong (CAL) | 13.0 (0.5) vs 16.7 (0.6) | p<0.01          | Winner CAL
Pnoir   | E: Gloria Ferrer (CAL) vs F: Errazuriz (CHL)      | 16.7 (0.3) vs 14.5 (0.2) | p<0.01          | Winner CAL

Table 6: Flight 2 – Testing the difference in scores for significance (out of 10 tasters)

Matchup | Wines                                            | Avg scores (SEM)         | t-test p-value* | Result
Svblanc | A: Errazuriz Mx (CHL) vs B: Ch. St. Jean (CAL)   | 12.5 (0.6) vs 12.8 (0.7) | p=0.74          | tie
Chard   | C: Cakebread (CAL) vs D: Maycas Limarí (CHL)     | 12.8 (0.3) vs 16.8 (0.3) | p<0.01          | Winner CHL
Pnoir   | E: Chilcas (CHL) vs F: Dierberg (CAL)            | 15.0 (0.4) vs 16.7 (0.4) | p<0.01          | Winner CAL
Cabsauv | G: Dom. Napa (CAL) vs H: V. San Esteban (CHL)    | 17.3 (0.6) vs 16.0 (0.4) | p=0.09          | tie

* The term ‘unpaired’ means the scores are not partners of each other: in each matchup we are tasting two separate wines. We might consider a paired test if we tasted the same wine before and after decanting, for example. The test also did not assume that the two sets of scores necessarily had an equal spread (unequal variance).
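For anyone wishing to run the test themselves, scipy’s ttest_ind performs an unpaired t-test, and with equal_var=False it does not assume equal variances (Welch’s version). A quick sketch using the twelve raw Chardonnay scores from Table 2 (the tables above use the corrected ten, but the verdict comes out the same):

```python
from scipy.stats import ttest_ind

# Flight 2 Chardonnay matchup, raw individual scores from Table 2
cakebread = [13, 13, 12, 12, 10, 14, 13, 12, 15, 16, 12, 12]      # Wine C (CAL)
maycas_limari = [19, 18, 16, 16, 16, 14, 17, 16, 18, 17, 16, 18]  # Wine D (CHL)

# Unpaired t-test without assuming equal variance
t_stat, p_value = ttest_ind(cakebread, maycas_limari, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")  # p falls far below 0.01: winner CHL
```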

Briefly, in Flight 1 all the scores were in favour of California, but in Flight 2 they were more varied – the Chardonnay clearly went to Chile and the Pinot noir to California. However, the judges couldn’t separate the Sauvignon blancs or the Cabernets in this flight: even though the numbers favoured California at face value, the differences in the judges’ scores were just not significant. With more people tasting the wines, perhaps we’d have found a clearer favourite.

So the last drop of this simple statistics exercise is that the numbers may not always be what they seem. The scores we see in reviews aren’t really meant to take the comparative angle into account. A blind comparison tasting adds a new element not only for our palate but also some perspective to inform our purchasing choices. From a sensory standpoint, these scores help us decide whether paying a few (or many) dollars more for a wine is justified. We may even find some value gems.