Rethinking the value of SCRABBLE® tiles

By Joshua Lewis

Editor’s note: Originally published on his blog on December 30, Josh Lewis’s suggestion that SCRABBLE® tile values be changed to reflect current word usage has gone viral, prompting media commentary all over the world. Here we reprint Josh’s original piece; links to responses from NASPA co-president John Chew (published on the NASPAwiki) and Word Freak author Stefan Fatsis (published by Slate); and Josh’s response to John Chew’s remarks. Stay tuned: the debate is ongoing!

When Alfred Butts invented Scrabble in 1938, he based the values and distribution of letters on the frequency of their appearance on the front page of the New York Times. Today, Butts' distribution is still the standard for English play.

What has changed in the intervening years is the set of acceptable words, the corpus, for competitive play. As an enthusiastic amateur player I've annoyed several relatives with words like QI and ZA, and I think the annoyance is justified: the values for Scrabble tiles were set when such words weren't acceptable, and they make challenging letters much easier to play.

So what would a modern distribution look like? To find out, I've developed an open source package called Valett for determining letter valuations in word games based on statistical analyses of corpora. In addition to calculating the frequency of each letter in a corpus, Valett calculates the frequency by word length and the incoming and outgoing entropy for each letter's transition probabilities. One can then weight these properties of the corpus based on the structure of the game and arrive at a suggested value for each letter.
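As a rough illustration of the first two statistics described above, here is a minimal Python sketch. It is not Valett's own code: the function names and the seven-word toy corpus are assumptions made for this example, and a real analysis would load a full tournament word list.

```python
from collections import Counter, defaultdict


def letter_frequencies(words):
    """Return each letter's share of all letters in the word list."""
    counts = Counter(letter for word in words for letter in word.upper())
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def letter_frequencies_by_length(words):
    """Return {word_length: {letter: share of letters in words of that length}}."""
    by_length = defaultdict(list)
    for word in words:
        by_length[len(word)].append(word)
    return {length: letter_frequencies(group) for length, group in by_length.items()}


# Toy corpus for illustration only (hypothetical, not a real word list).
words = ["QI", "ZA", "AX", "EX", "OX", "QUIZ", "QUAY"]
overall = letter_frequencies(words)
by_length = letter_frequencies_by_length(words)
```

In this toy corpus X makes up a larger share of the letters in two-letter words than of the corpus as a whole, which is the kind of length-specific signal that can then be weighted separately.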

For Scrabble, Valett provides three advantages over Butts' original methodology. First, it bases letter frequency on the exact frequency in the corpus, rather than on an estimate. Second, it allows one to selectively weight frequency based on word length. This is desirable because in a game like Scrabble, the presence of a letter in two- or three-letter words is valuable for playability (one can more easily play alongside tiles on the board), and the presence of a letter in seven- or eight-letter words is valuable for bingos. Finally, by calculating the transition probabilities into and out of letters it quantifies the likelihood of a letter fitting well with other tiles in a rack. So, for example, the probability distribution out of Q is steeply peaked at U, and thus the entropy of Q's outgoing distribution is quite low.

Intuitively, I've long felt that letters like Z and X were overvalued in Scrabble, especially X since it is prevalent in the two-letter word list: XI XU AX EX OX. In contrast, V and C seem undervalued, with no two-letter words. Using Valett with an even weighting of letter frequency, frequency by length, and transition entropy I've generated a new value distribution that roughly matches my intuition:

A: 1  B: 3  C: 2  D: 2  E: 1  F: 3  G: 3  H: 2  I: 1  J: 6  K: 4  L: 2  M: 2

N: 1  O: 1  P: 2  Q: 10 R: 1  S: 1  T: 1  U: 2  V: 5  W: 4  X: 5  Y: 3  Z: 6

Note: This distribution is calculated with TWL06. G drops from 3 to 2 using SOWPODS.

For the weighting of frequency by length, I most heavily favor two-letter words, three- and seven-letter words, and eight-letter words, in that order. Incoming and outgoing entropy are weighted evenly.
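One way such a weighting could be turned into point values is sketched below. This is only a guess at the general shape, not Valett's actual formula: the normalization of each metric to the range 0 to 1, the linear mapping onto 1 to 10 points, the function names, and the three-letter scores are all assumptions invented for illustration.

```python
def combine_scores(metrics, weights):
    """Weighted average of per-letter metric scores (each assumed to lie in [0, 1],
    with higher meaning easier to play)."""
    total_weight = sum(weights.values())
    letters = next(iter(metrics.values())).keys()
    return {
        letter: sum(weights[name] * metrics[name][letter] for name in weights)
        / total_weight
        for letter in letters
    }


def to_point_values(playability, max_value=10):
    """Map playability linearly onto 1..max_value; the least playable letter
    receives max_value and the most playable receives 1."""
    low, high = min(playability.values()), max(playability.values())
    span = (high - low) or 1.0  # avoid dividing by zero if all scores are equal
    return {
        letter: 1 + round((max_value - 1) * (high - score) / span)
        for letter, score in playability.items()
    }


# Made-up scores for three letters, for illustration only.
metrics = {
    "frequency": {"E": 1.0, "X": 0.2, "Q": 0.0},
    "frequency_by_length": {"E": 1.0, "X": 0.5, "Q": 0.0},
    "entropy": {"E": 1.0, "X": 0.5, "Q": 0.0},
}
# Even weighting of the three properties.
weights = {"frequency": 1.0, "frequency_by_length": 1.0, "entropy": 1.0}
values = to_point_values(combine_scores(metrics, weights))
```

Changing the entries in `weights` shifts how much each property contributes, which is the knob the discussion above is about.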

There are several things I like about this new distribution. Looking at the statistics, Q is clearly an outlier both in frequency and entropy, and in this distribution it is also an outlier in value. V bumps up to five points, matching X and coming close to J and Z, which have dropped to six. U, as the most challenging vowel, jumps up to two points. Overall there is downward pressure on the valuations to keep the justified separation from Q at ten points.

Most mysteriously to me, C drops to two points despite its absence on the two-letter word list. G jumping to three points is also surprising, though it stays at two using the SOWPODS corpus instead of TWL06. (As a side note, it's nice that the distribution changes are minor from TWL06 to SOWPODS, as they should be for word lists based on the same language.)

While this distribution is interesting, I'm not suggesting that it's the most justified one. I'm an amateur player and my perspective on the relative importance of frequency vs. transition entropy and frequency at various word lengths is informed by my imperfect knowledge of the game. By publishing the code, which easily allows one to set all the weights, I hope to enable a data-driven discussion around letter valuation in Scrabble.

More broadly, I think Valett can provide the foundation for answering other interesting questions in word games, such as how to quantify the difficulty of Boggle boards (perhaps useful in a tournament setting as a means of normalization). To that end I would welcome any pull requests on GitHub that add to the statistics generated from corpora, or add game-specific analyses like the included Scrabble analysis. If you'd rather not write code but have ideas regarding Valett, just drop me a line!

To read NASPA co-president John Chew’s response, “Catastrophic Outrage: A Reply to Joshua Lewis’ ‘Rethinking the value of Scrabble tiles,’” and additional comments, click here.

To read Word Freak author Stefan Fatsis’s response, published in Slate as “What is a Z Really Worth?: Why efforts to assign Scrabble tiles their ‘real value’ miss the point of the game,” click here.

The following is Joshua Lewis’s response to John Chew’s comments:

A response to Chew's "Catastrophic Outrage"

John Chew, co-president of the North American Scrabble Players Association, recently replied to my post on Valett, which suggested altering the point values of some Scrabble tiles based on a statistical analysis of tournament-legal Scrabble words. Chew's post highlights how an intelligent and skilled Scrabble player might misunderstand what Valett does and its intent. I'd like to clarify these issues, if possible, and show how Valett's results suggest changes to Scrabble that are very much in line with its rich tradition and dynamic play.

Fundamentally there are two related but separate aspects of Scrabble: the structure of the game (its rules, board, tile distribution, word list, etc.) and the play of the game (rack composition, board position, remaining tiles in the bag, strategy, etc.). Because essentially all of the formal analysis of Scrabble (mainly via the excellent software package Quackle) is on the play of the game, John mistakenly conflates the goals of Valett and Quackle, and tile value and equity value, when they in fact deal with two separate domains of analysis: structure vs. play.

Valett puts two specific aspects of the structure of the game in harmony: the word list and tile values. One could also revisit tile distribution, as John suggests, but that relationship is a bit less formal since you need to artificially limit the number of S tiles.

We could run Quackle on a version of Scrabble whose structure (specifically, its tile point values) has been changed as per Valett's suggestions, and we'd end up with different equity values. So the concepts of equity value and tile value are closely related (tile value partially determines equity value), but they're absolutely not interchangeable. Tile value is a basic component of the structure of the game, and equity value is a calculated value based on extensive simulation of game play, taking into account rack composition, board position, and so forth. John ignores the distinction between these values in his reply: "Given that the 'I' currently has a face value of 1, if you wanted to create a 'fair' SCRABBLE game that didn't penalize players for drawing an 'I', you'd want to increase the face value by 2 points to 3." Changing some tile values in Scrabble wouldn't alter equity values in the direct manner Chew suggests.

Later on, Chew mischaracterizes Valett's analysis of tile frequency by word length: "Valett's requirement that you specify the rates at which words of each length are played is also problematic..." When weighting frequency by length, Valett is not making any assumptions about how frequently words of those lengths are played, but rather about how important words of different lengths are given the rules of Scrabble. Two- and three-letter words are important in Scrabble because it is a crossword game, and the presence of a letter in a two- or three-letter word makes that letter easier to hook off of existing letters on the board. Similarly, because there is a substantial bonus for playing all the tiles in one's rack (which has seven tiles), seven- and eight-letter words are particularly powerful. Without saying anything about how they might be played, Valett allows one to weight the frequency of letters in these word lengths more highly than at less notable lengths like five or twelve.

Chew is also a bit disingenuous here: "If you did this, you'd reduce a little bit of the luck of the draw, but at the same time you'd be reducing the skill involved in recognizing which tiles are good or bad and playing accordingly. You'd end up with a game that was a little closer to just rolling a die to determine the winner." In fact, reducing the luck of the draw by aligning tile values more closely with the word list (and causing the related changes in equity value and game strategy) would make the game less about luck and more about skill, and therefore further from just rolling a die to determine the winner.

Tournament players benefit from a system with a little less luck because it makes tournaments more accurate. While something like Elo is reliable in a game with a lot of luck when it draws from a very large sample of games, in tournaments only a certain number of games can be played due to time and stamina constraints. So the more luck in the game, the less accurate tournaments are in determining who's the best.
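The sample-size argument can be illustrated with a deliberately simplified model. This is not Elo or any real tournament format: it assumes independent games, a fixed per-game win probability for the stronger player, and that "more luck" simply means that probability sits closer to one half. The function name is hypothetical.

```python
from math import comb


def stronger_player_wins_match(p_game, n_games):
    """Probability that a player who wins each game with probability p_game
    wins more than half of n_games (n_games odd, games assumed independent)."""
    needed = n_games // 2 + 1
    return sum(
        comb(n_games, k) * p_game**k * (1 - p_game) ** (n_games - k)
        for k in range(needed, n_games + 1)
    )


# Compare a luckier game (p closer to 0.5) with a less lucky one over 15 games.
for p in (0.55, 0.65):
    print(p, round(stronger_player_wins_match(p, 15), 3))
```

Under these assumptions, for a fixed number of games the stronger player comes out ahead more often when the per-game edge is larger, and for a fixed edge more games are needed to reach the same reliability.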

Now, one might object: if I want to reduce the luck in the game, why am I not suggesting that we remove the blanks, or have a computer distribute tiles to players' racks fairly? Well, I like Scrabble, and a game without blanks isn't Scrabble. Alfred Butts clearly went to a lot of trouble to get the structure of the game to reflect English usage in the 1930s, but he wanted the game to have entertaining aspects like blank tiles and the luck of the draw. Slightly modifying tile values to track changes in the set of legal Scrabble plays respects this tradition and maintains the lucky elements of the game, such as drawing a blank, and the unlucky, such as drawing all vowels.

Valett is an attempt to keep the intentional luck in the game, and remove the unintentional luck that has crept in over time as the use of English has changed. I hope Chew and I might see eye to eye on that goal, and I look forward to further spirited discussion.

Joshua Lewis is a postdoctoral scholar at the UC San Diego Cognitive Science Department, where he received his Ph.D. in 2011. His research investigates the role of human perception and insight in the data analysis process. Joshua is also a cofounder of Ost with Galen Wolfe-Pauly. Ost connects to your online services, such as Dropbox and Twitter, and puts them into spaces where you control what you see and share.