## 07 December 2010

### Dice and Information

There is a concept in statistics and the information sciences called information. Several concepts actually, as there are different types of information, but I want to focus specifically on Shannon information, or Entropy. Entropy is a way of measuring the amount of variability or uncertainty in a probability distribution, and a simple way to illustrate this is with the example of a coin flip.

But first, a comment on notation, since the Blogger editor is not too equation friendly. Calculating entropy requires a logarithm function. The natural (base-e) logarithm is usually denoted ln(x) or loge(x), but Shannon Information specifically uses a base-2 logarithm, which I denote here as log2(x). If my equations are not clear, any mention of the log function (outside this paragraph) always means the base-2 logarithm. If you are following along with a calculator, you probably have a natural-log button ln(x), and can calculate the base-2 log as log2(x) = ln(x)/ln(2).

Assuming a fair coin with a 0.50 probability each of heads and tails, the first step is to calculate a quantity called the Self-information or "surprisal" of each event. This is a measure of how surprising a given event is relative to the other possible events in the distribution. The less likely the event, the higher the value of its surprisal.

Surprisal is equal to -log2(p), where p is the probability of a given outcome. Calculating ...
log2(0.5) = -1, so -log2(0.5) = -(-1) = 1
... and the NOT so surprising result here is that heads and tails are equally surprising, with a value of 1 each.
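
For anyone who would rather check this with a few lines of code than a calculator, here is a minimal Python sketch of the surprisal calculation. The function names are my own for illustration; the second function uses the ln(x)/ln(2) conversion described above.

```python
import math

def surprisal(p):
    """Self-information, in bits, of an outcome with probability p (0 < p <= 1)."""
    return -math.log2(p)

def surprisal_via_ln(p):
    """The same quantity using only natural logs, as on a pocket calculator."""
    return -math.log(p) / math.log(2)

# Fair coin: heads and tails each have probability 0.5.
print("heads:", surprisal(0.5), "bits")
print("tails:", surprisal(0.5), "bits")
print("via ln:", surprisal_via_ln(0.5), "bits")
```

Halving the probability adds one bit of surprisal, so an outcome with probability 0.25 comes out at 2 bits.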

Shannon Information is measured in "bits", the basic unit of information used in calculation by computers. To relate this to games it might help to think of one bit of information being equal to the amount of variability in the flip of a coin. Now that we have the surprisal, we can calculate the Entropy as the average or expected value of the surprisal over the entire distribution. This is p times the surprisal -log2(p) of each event, summed over all events. For this example the calculation is trivial; 0.5 times the surprisal of 1 (for heads) plus another 0.5 times a surprisal of 1 (for tails), is just 1, so a fair coin flip has 1 bit of entropy.
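
The "expected value of the surprisal" can be sketched in Python as follows. This is only an illustration with a function name of my choosing; it takes a list of probabilities that is assumed to sum to 1.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: p * -log2(p), summed over all outcomes.

    Outcomes with p == 0 are skipped, since they contribute nothing.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]  # heads, tails
print(entropy(fair_coin), "bit")
```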

Here I have a table representing the information in discrete uniform distributions from 1 to N. In gaming terms this is the information in single N-sided dice, with each face of the die being equally likely as all others. I included all the values representing true polyhedral dice, and some additional values for comparison (most of these are powers of 2 or 10).
The second column p(x) gives the probability of each "face", the third the surprisal, and the fourth the entropy.
Here we can see that a 2-sided die (a coin!) again has 1 bit of entropy, a 4-sided die (d4) has 2 bits, a d8 has 3 bits, and a hypothetical d16 has 4 bits, following powers of 2 as you might expect. I put in some extreme values just for fun - the final row, a one million-sided die, would have nearly 20 bits (or 20 coin-flips) of entropy.
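
A table along these lines can be regenerated with a short Python sketch. The list of die sizes below is my own assumption for illustration and is not meant to match the rows of the table exactly.

```python
import math

# Assumed selection of die sizes (illustrative, not the table's exact rows).
SIDES = [2, 4, 6, 8, 10, 12, 16, 20, 100, 1_000_000]

def uniform_die_row(sides):
    """Return (p, surprisal, entropy) in bits for a fair die with `sides` faces."""
    p = 1 / sides
    surprisal = -math.log2(p)
    # Entropy is p * surprisal summed over all faces; every face adds the same term.
    entropy = sides * (p * surprisal)
    return p, surprisal, entropy

print(f"{'die':<10}{'p(x)':>12}{'surprisal':>12}{'entropy':>10}")
for n in SIDES:
    p, s, h = uniform_die_row(n)
    print(f"{'d' + str(n):<10}{p:>12.6f}{s:>12.2f}{h:>10.2f}")
```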

As in the example of the fair coin, when all outcomes are equally likely, the surprisal and entropy are equal. This also maximizes the value of the entropy - meaning that if any result was more or less likely than another, the result can only become more predictable, and the value of the entropy must be less, as will be seen in the next example.
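
A quick way to see the "uniform is maximum" claim is to bias the coin. The sketch below is illustrative only; the probabilities tried are arbitrary values I picked.

```python
import math

def coin_entropy(p_heads):
    """Entropy in bits of a coin that lands heads with probability p_heads."""
    probs = (p_heads, 1 - p_heads)
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Arbitrary example biases, from fair to completely predictable.
for p in (0.5, 0.6, 0.75, 0.9, 1.0):
    print(f"P(heads) = {p:.2f}: {coin_entropy(p):.3f} bits")
```

The fair coin gives the full 1 bit, every biased coin gives less, and a coin that always lands heads has no entropy at all.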

For the second example I'm calculating the entropy of the sum of two six-sided dice. This table shows the possible results from 2 to 12 and the probability of each result, given twice: as odds-in-36 and as a probability. Next (4th column) is the surprisal of each result, and unlike in the uniform distributions this value varies with the probability of the outcome. A roll of 7 has a surprisal of 2.58 bits, and a roll of 12 (or 2) has 5.17 bits; a 12 is the more surprising result, relatively speaking.
The final column is the surprisal multiplied by the probability, and these are summed to determine the Entropy at the bottom, which is 3.27 bits.
In terms of information, a 2d6 roll is in-between the d9 and d10 rolls from the first table. This doesn't mean they are the same, but that they have a similar amount of variability.
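
The 2d6 calculation can be reproduced by counting all 36 equally likely rolls. This is a minimal Python sketch with variable names of my own choosing; the column layout just mirrors the description above.

```python
import math
from collections import Counter
from itertools import product

# Count how many of the 36 equally likely (die 1, die 2) rolls give each sum.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

entropy_2d6 = 0.0
print(f"{'sum':>3} {'odds/36':>8} {'p':>7} {'surprisal':>10} {'p*surprisal':>12}")
for result in sorted(ways):
    p = ways[result] / 36
    s = -math.log2(p)
    entropy_2d6 += p * s
    print(f"{result:>3} {ways[result]:>8} {p:>7.4f} {s:>10.2f} {p * s:>12.4f}")

print(f"Entropy of 2d6: {entropy_2d6:.2f} bits")
```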

For the third table I have calculated the entropy for some commonly used dice-rolls in games and listed them in order of increasing entropy. The 2d10- designates the difference of two ten-sided dice, as used for penetration damage in Squadron Strike.
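
A comparison like this can be sketched in Python by enumerating every possible roll. The selection of rolls below is hypothetical and not the list from the table, and I am assuming that "2d10-" means the signed difference (first die minus second die).

```python
import math
from collections import Counter
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a {result: probability} mapping."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def roll_distribution(sides, combine=sum):
    """Distribution of combine(faces) over all equally likely rolls of fair dice."""
    rolls = product(*(range(1, n + 1) for n in sides))
    counts = Counter(combine(roll) for roll in rolls)
    total = sum(counts.values())
    return {result: count / total for result, count in counts.items()}

# Hypothetical selection of rolls (an assumption, not the original table's rows).
ROLLS = {
    "1d6": roll_distribution([6]),
    "2d6": roll_distribution([6, 6]),
    "3d6": roll_distribution([6, 6, 6]),
    "4d6": roll_distribution([6, 6, 6, 6]),
    "1d10": roll_distribution([10]),
    "1d20": roll_distribution([20]),
    # Assumed meaning of "2d10-": first die minus second die (signed).
    "2d10-": roll_distribution([10, 10], combine=lambda r: r[0] - r[1]),
    "1d100": roll_distribution([100]),
}

# List the rolls in order of increasing entropy.
table = sorted((entropy(dist), name) for name, dist in ROLLS.items())
for bits, name in table:
    print(f"{name:<6} {bits:.2f} bits")
```

Brute-force enumeration is fine at this scale (4d6 is only 1,296 rolls), though it would not be the way to handle large handfuls of dice.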

Note the entropy of 4d6 is less than twice that of 2d6, and likewise the entropy of 2d6 is less than twice that of 1d6. As numbers from single dice are summed, the distribution becomes less uniform and more like the bell curve of the normal distribution; it is more predictable, and therefore has less entropy than the separate rolls taken together. If we were using two separate d6 rolls to generate a uniform random number between 1 and 36, we should expect the d36 entropy to be twice that of a d6, and it is: -log2(1/36) = 5.17. We also see this with the entropy of d100 being twice that of d10.
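
The same two d6 can be read either as an ordered pair (a "d36") or as a sum, and the difference in entropy is easy to check. This is a small Python sketch; the names are mine.

```python
import math
from collections import Counter
from itertools import product

def entropy_from_counts(counts):
    """Entropy in bits of a distribution given as {outcome: number of ways}."""
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

pairs = list(product(range(1, 7), repeat=2))  # all 36 (die 1, die 2) rolls

one_d6 = math.log2(6)
as_d36 = entropy_from_counts(Counter(pairs))  # keep both dice distinct
as_2d6 = entropy_from_counts(Counter(a + b for a, b in pairs))  # add them up

print(f"1d6:            {one_d6:.2f} bits")
print(f"two d6 as d36:  {as_d36:.2f} bits (twice 1d6)")
print(f"two d6 summed:  {as_2d6:.2f} bits")
```

Summing throws away which die showed which face, and that discarded detail is where the missing bits go.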

What strikes me from this is that rolls of 1d6, 3d6, and everything in-between vary by only about 1 coin-flip of entropy, so maybe the many variations of dice used in games really don't make so much difference in terms of the variability of play.

A final note: Just because there might be more information in some combinations of dice does not mean the game takes full advantage of that variability. For instance, if you are making a to-hit roll with some probability of success (hit or miss), then there is at most 1 bit of information in that result no matter what kind of dice you roll it with. There are a full 6.64 bits of information in a d100 roll only if there are 100 unique outcomes.
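
To make that concrete, here is a Python sketch that collapses a d100 roll into hit or miss. The "roll at or under the target number" rule and the target values are assumptions picked purely for illustration.

```python
import math
from collections import Counter

def entropy_of_outcomes(outcomes):
    """Entropy in bits when each listed outcome is one equally likely roll."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def to_hit(target):
    """Hypothetical rule: a d100 roll at or under `target` is a hit."""
    return ["hit" if roll <= target else "miss" for roll in range(1, 101)]

raw_d100 = entropy_of_outcomes(list(range(1, 101)))
hit_50 = entropy_of_outcomes(to_hit(50))
hit_80 = entropy_of_outcomes(to_hit(80))

print(f"d100, 100 unique outcomes: {raw_d100:.2f} bits")
print(f"to-hit at 50%:             {hit_50:.2f} bits")
print(f"to-hit at 80%:             {hit_80:.2f} bits")
```

Only the 50% to-hit roll reaches the full 1 bit; any other target number gives less.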

More: Dice and Information, So What?

Ashley said...

Fascinating stuff. Before I post my house rules I will need to think through the probabilities that I want, and whether they are best served by the mechanisms I choose.

My brain feels mushy now.

Nunya B. said...

Head = Ouch. :D

I like to use percentile dice. Say I have a game and two of the units are soldiers with a rifle and a scope each. One has a slightly newer and improved scope than the other. The new scope's improvements are subtle. (Even a slight edge over an opponent is good.) So let's say the guy with the old scope has an 80 percent chance to hit. The guy with the new one can have that better scope represented subtly by giving him an 82 percent chance to hit. It works well for my game systems to use percentiles. The players also find it helps to have an action's chance of success explained in clear and easy to conceptualize terms. A 50 percent chance to be successful is simple and clear for them. Those are the two reasons why I use percentiles.
So what this information is telling me is that the 2 percent difference is a lot more subtle than I originally thought...? Perhaps to the point of being virtually meaningless...?

Dan Eastwood said...

I actually thought I had this scheduled for tomorrow morning, so I was a bit surprised to wake up to two comments. :-)

I think there is a bit more to say about this, because after all that work I'm left with the feeling that it really isn't good for anything. It is good for something (I should know) but I need to try a little harder to relate it to games. Part 2 here I come!

@Nunya B.: The difference in information between 80% and 82% success percentage will be trivial, so I don't think this will help in that comparison. There is a statistic called "number needed to treat" that might be useful though. I'll work that into part 2 (or you could Google it).

Ashley said...

Well here is another comment. Awesome serendipity occurred this week after attending a lecture by Professor Jamie Angus, which I will talk about in my next blog post. Thanks again for a most interesting and relevant post.

Dan Eastwood said...

Ashley: Now you have me curious! I'm looking forward to your post.

skiltao said...

I'm not surprised that there's so little difference in information between 2d6 and 4d6. I used to play a homebrewed fistful-of-d6 RPG, and while it was impossible to calculate the exact odds for any given roll, everybody developed a "gut feeling" for it without too much trouble.

But as for whether "the many variations of dice used in games really don't make so much difference in terms of the variability of play," well, I'm a little curious as to what you mean by "variability of play." Games are all about manipulating the odds into your favor, and the dice a game uses does tend to dictate how a player can increment those odds.

skiltao said...

does = do. Gah, grammaticality fails me at 5am.

Dan Eastwood said...

Edit: I re-wrote the paragraph describing how entropy is summed up to be more clear.