How Consistent Are Wine Judges?

Survey shows only 30 of 65 judging panels achieved similar results

by Peter Mitham
Dr. Robert Hodgson
Walla Walla, Wash. -- Quantitative standards guiding awards in wine competitions could help bolster the value of award-winning wines to wineries and consumers.

Writing in the current issue of the Journal of Wine Economics, Dr. Robert Hodgson documents the significant variability in decisions by judges at the California State Fair Commercial Wine Competition.

A survey of approximately 65 judging panels between 2005 and 2008 yielded just 30 panels that achieved anything close to similar results, with the data pointing to "judge inconsistency, lack of concordance--or both" as reasons for the variation. The phenomenon was so pronounced, in fact, that one panel of judges rejected two samples of identical wine, only to award the same wine a double gold in a third tasting.

Hodgson, a retired professor in the Department of Oceanography at Humboldt State University in Arcata, Calif., and proprietor of Fieldbrook Winery in Fieldbrook, Calif., became curious about the phenomenon as a result of his own experience as a judge for the California fair's competition.

"Sitting there with 30 wines in front of you, trying to make decisions and going back from one to another, I just didn't think I was very capable of it," he said. He eventually resigned as a judge, joining the competition's advisory board at the invitation of G.M. "Pooch" Pucilowski, the competition's chief judge.

While the lack of concordance among competition judges is well known, Hodgson was intrigued at the specific causes, and suggested examining judge reliability at the California competition. To do so, Hodgson developed two sets of trials that saw triplicate samples embedded in flights of wines the judges tasted during regular competitions at the California State Fair. The goal was to see if judges could replicate their results. "It turns out to be a fairly difficult task," Hodgson said.

In the first trial, wines were being qualified for the final judging. A single wine was given to four judges in triplicate, and a sample rejected by at least half the judges would not proceed to the final round. Three of the four judges rejected the first two of the triplicate samples, yet the judges accepted the third sample for the final judging, where it received a double-gold medal.

In the second trial, each judge received four triplicate samples, all poured from the same bottle to reduce the potential for variation among the samples. The triplicate samples were typically delivered during the second flight of the day, Hodgson explained, with a view to making it easy for the judges to reach similar conclusions.

"We decided to try to make it as easy as possible," he said. "The results show that there are some judges that are more consistent than others, but there's a lot of variation or inconsistency in general."

Hodgson made clear that the results don't necessarily reflect the skill of the judges.

"It's just a terribly difficult task," he said of the judging process. "It's not that the judges are bad judges. I think the format of having a judge taste 30 wines four times a day exceeds the limits of their abilities."

Indeed, the conditions are such that the same judges enjoying the same wines over the course of a dinner might reach different conclusions.

Pucilowski agreed. Although the wine competition's advisory board debated whether or not to let Hodgson publish the study's results, initially discussed at the August 2008 meeting of the American Association of Wine Economists in Portland, Ore., it considered the question of inconsistent results a significant issue--not just at the California fair's competition, but at competitions across the industry.

"I'm happy we did it. I'm happy we'll continue to do it. And I'd be a whole lot happier if other competitions would step up and also do it," Pucilowski said.

His support of Hodgson's work stems from an interest in identifying a baseline for judges that would reduce inconsistent results and thereby improve the professional rigor of decisions at the competition, started in 1855. "Frankly, the reason I did this is I wanted to get the best judges I could possibly get. How do I know by looking at you or by looking at your scores that you're a good judge? I have no idea," Pucilowski told Wines & Vines. "The one thing I want is consistency."

Hodgson said the question of judge reliability is important because competitions are not cheap for participants, but winning gold medals or other honors stands to significantly improve a winery's sales. Consistent results could underscore the merits of the wines receiving awards.

The question of inconsistent results has drawn attention from researchers in the past. Hodgson's own interest was stirred by a report in the California Grapevine in 2003 surveying medals won by wines at competitions in various states.

More recently, Richard Gawel and P.W. Godden examined the results of "expert wine tasters" over a 15-year period in the Australian Journal of Grape and Wine Research. Gawel and Godden concluded that consistency varied greatly among individuals, but improved through the combination of scores from a small team of tasters.

On the personal side, Hodgson feels better about his own performance as a judge after seeing the results of his research.
"After doing the study, I don't think I'm any worse than anyone else," he said.
Posted on 01.27.2009 - 13:12:28 PST
I think the previous poster missed one of the key points of the summary. This isn't so much an issue of differences of opinion between judges about quality; rather, it is that the same judge cannot replicate his or her own opinion.
Davis, CA USA

Posted on 01.27.2009 - 08:11:31 PST
My response to the article on wine-judge consistency (or inconsistency) is that inconsistency is by far the most common criticism of the entire wine-judging environment. Like art, which may be in the eye of the beholder, wine quality may well be in the eye, nose and mouth of the taster. Just look at what happened when Jancis Robinson and Robert Parker disagreed on a Bordeaux that one described as near-perfect and the other as nearly undrinkable plonk. There must be standards of excellence in wine somewhere, but we don't seem to have discovered them yet. Guess we'll just have to keep on tasting and drinking until something definitive turns up!
Stamford, CT USA

Posted on 01.27.2009 - 21:44:46 PST
Retired after chairing local and international shows for 30 years. Not a great palate, I have tried to be consistent, with some success. Wines change when opened, and in the bottle over time. This is often neglected in assessing judges. As chairman I justified the confidence placed in me by assessing this factor above all.
Sydney, NSW Australia

Posted on 01.28.2009 - 12:45:55 PST
This is why consensus judging and small flights of wine make so much more sense. The problem is often the structure of the competition more than the judges.
Competitions would do far better and be much fairer if the judging was done by panels and not individuals. We saw the improvement when we switched a 30-year-old competition from individual judging and no discussion, to four-person panels with a leader. It's still less than perfect, but vastly better and fairer because we try to bring a mix of experience to each panel: a winemaker, a retailer, a journalist, a sommelier, etc. When a wine is graded by four palates who must find agreement, it makes a big difference.
I would argue that had the article's researchers put those wines before panels instead of individuals, three or four times during a competition, the chances are vastly greater that the panel would have spotted the similarity in the wines.
In that case, that would make for much more accurate and better judging. No?
Ann Arbor, MI USA

Posted on 01.29.2009 - 12:06:53 PST
Points worth making.

--The attack on wine writers is absurd if one praises Jancis and criticizes Parker. For whatever else he may be, he is consistent.

--Wine judging is not a science. Big competitions use lots of judges and ask them to rate hundreds of wines. Very few people could ever do that consistently, but experienced palates who are used to judging lots of wines will do it better.

--Many people, including some who have posted here, no longer will submit themselves to the agony of trying to judge hundreds of wines at a time. The dreaded Petite Sirah 75 at nine in the morning, or the Sparkling Shiraz as the first order of the day, were enough to send me screaming into the night.

--These various judgings are no substitute for one's own analysis. Nor is the informed opinion of Robinson, Parker, Tanzer, Laube, Connoisseurs' Guide or anyone else, but, consumers want and need help. Hence, the variety of beauty contests.
San Francisco, CA USA

Posted on 01.31.2009 - 09:50:28 PST
I agree that the selection of judges is ONE of the key issues, but it's silly and unfair to slam wine writers who judge. I judge a half dozen major and mid-size competitions in the US and occasionally abroad every year. I have judged alongside two MWs and one MSW who should never be allowed on a panel because of a lack of skill or myopic views of wines and grapes. And, I've sat side-by-side with vastly more skilled journalists. I've also been surprised by weak winemakers and admired the brilliance of several retailers. The point is that there are good judges and poor judges in all these areas of wine. Regardless of their professions, good judges are those with a vast range of experience, a nose and mouth for flaws, and an understanding of different wines. And most of all, they accept regional characteristics as they are, rather than impose one region as the benchmark on what they feel the other region should do.
Ann Arbor, MI USA

Posted on 03.03.2009 - 05:45:56 PST
Dionysus, you are missing the point of this editorial, which is on the consistency of wine judges. The certain lawyer you are so sad about has shown over the years to be very consistent in his judging. You may have a different preference than him for wine, but you cannot fault his consistency.
Rochester Hills, MI USA

Posted on 01.27.2009 - 10:47:24 PST
This is why I stopped judging competitions. The only consistency I've seen is that judges reward high-alcohol, residual-sugar wines that are stronger only in the literal sense against their peers. Wines of complexity and elegance have no chance in these marathon lineups.
Sacramento, CA USA

Posted on 01.29.2009 - 06:33:33 PST
One of the major difficulties with wine judging lies with the wine judges and competition organisers. Simply put, the wrong people are too frequently selected to be judges. Why choose journalists whose skills lie in putting words into the public domain? Well, usually because the journalists make the most noise about their expertise when many can scarcely differentiate between a riesling and a rice pudding. Wine judging requires the services of people whose tasting ability is proven, not some hack who wants to see their names in self-congratulatory print. Thank goodness for the expertise of the likes of Jancis, and sad is the day that a lawyer could set himself up as an infallible wine expert.
Westmount, QC Canada