## 07 December 2010

### Dice and Information

There is a concept in statistics and the information sciences of information. Several concepts actually, as there are different types of information, but I want to focus specifically on Shannon information, or Entropy. Entropy is a way of measuring the amount of variability or uncertainty in a probability distribution, and a simple way to illustrate this is with the example of a coin flip.

But first, a comment on notation, since the Blogger editor is not too equation friendly. Calculating entropy requires a logarithm function, usually denoted ln(x) or loge(x) for the base-e or natural log. Shannon Information specifically uses a base-2 logarithm, which I denote here as log2(x). If my equations are not clear, any mention of the log function (outside this paragraph) always means the base-2 logarithm. If you are following along with a calculator, you probably have a natural-log button ln(x), and can calculate the base-2 log as log2(x) = ln(x)/ln(2).
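For anyone following along in code rather than on a calculator, the change-of-base rule above can be sketched in Python. The function name `log2_via_ln` is my own, chosen for illustration. Python also has a built-in `math.log2`.

```python
import math

def log2_via_ln(x):
    """Base-2 logarithm using only the natural log, as on a calculator."""
    return math.log(x) / math.log(2)

# The two approaches should agree to within floating-point rounding.
print(log2_via_ln(8))
print(math.log2(8))
```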

Assuming a fair coin with a 0.50 probability of heads or tails, the first step is to calculate a quantity called the Self-information or "surprisal" of each event. This is a measure of how surprising a given event is relative to the other possible events in the distribution. The less likely the event, the higher the value of its surprisal.

Surprisal is equal to -log2(p), where p is the probability of a given outcome. Calculating ...
log2(.5) = -1,
-(-1) = 1
... and the NOT so surprising result here is that heads and tails are equally surprising, with a value of 1 each.
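The same calculation as a minimal Python sketch (the function name `surprisal` is my own choice for illustration):

```python
import math

def surprisal(p):
    """Self-information, in bits, of an outcome with probability p."""
    return -math.log2(p)

print(surprisal(0.5))  # 1.0, for either heads or tails
```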

Shannon Information is measured in "bits", the basic unit of information used in calculation by computers. To relate this to games it might help to think of one bit of information being equal to the amount of variability in the flip of a coin. Now that we have the surprisal, we can calculate the Entropy as the average or expected value of the surprisal over the entire distribution. This is p times the surprisal -log2(p) of each event, summed over all events. For this example the calculation is trivial; 0.5 times the surprisal of 1 (for heads) plus another 0.5 times a surprisal of 1 (for tails), is just 1, so a fair coin flip has 1 bit of entropy.
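As a rough sketch, the expected-surprisal calculation just described might look like this in Python. The function name `entropy` and the biased-coin probabilities are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: p times -log2(p), summed over all outcomes."""
    # Outcomes with zero probability contribute nothing, so skip them.
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
print(entropy(fair_coin))  # 1.0

# A hypothetical biased coin is more predictable, so it comes out below 1 bit.
biased_coin = [0.9, 0.1]
print(entropy(biased_coin))
```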

Here I have a table representing the information in discrete uniform distributions from 1 to N. In gaming terms this is the information in a single N-sided die, with each face of the die being equally likely as all others. I included all the values representing true polyhedral dice, and some additional values for comparison (most of these are powers of 2 or 10).
The second column p(x) gives the probability of each "face", the third the surprisal, and the fourth the entropy.
Here we can see that a 2-sided die (a coin!) again has 1 bit of entropy, a 4-sided die (d4) has 2 bits, a d8 has 3 bits, and a hypothetical d16 has 4 bits, following powers of 2 as you might expect. I put in some extreme values just for fun - the final row, a one million-sided die, would have nearly 20 bits (or 20 coin-flips) of entropy.
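A table like this can be regenerated with a few lines of Python. This is only a sketch: the list of die sizes in `SIDES` is my own selection for illustration and may not match the rows of the table exactly, and `uniform_row` is a hypothetical helper name.

```python
import math

# Die sizes picked for illustration: polyhedral dice plus some powers of 2 and 10.
SIDES = [2, 4, 6, 8, 10, 12, 16, 20, 100, 1_000_000]

def uniform_row(n):
    """Return (p, surprisal, entropy) for a fair n-sided die."""
    p = 1 / n
    s = -math.log2(p)
    # Every face contributes p * s, and there are n faces.
    h = n * (p * s)
    return p, s, h

print(f"{'die':<10}{'p(x)':>12}{'surprisal':>12}{'entropy':>12}")
for n in SIDES:
    p, s, h = uniform_row(n)
    print(f"{'d' + str(n):<10}{p:>12.6f}{s:>12.2f}{h:>12.2f}")
```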

As in the example of the fair coin, when all outcomes are equally likely, the surprisal and entropy are equal. This also maximizes the value of the entropy - meaning that if any result were more or less likely than another, the result can only become more predictable, and the value of the entropy must be less, as will be seen in the next example.

For the second example I'm calculating the entropy of the sum of two six-sided dice. This table shows the possible results from 2 to 12, and the probability of each result (twice), as the odds-in-36 and as a probability. Next (4th column) is the surprisal of each result, and unlike the uniform distributions this value varies with the probability of the outcome. A roll of 7 has a surprisal of 2.58 bits, and a roll of 12 (or 2) has 5.17 bits; a 12 is the more surprising result, relatively speaking.
The final column is the surprisal multiplied by the probability, and these are summed to determine the Entropy at the bottom, which is 3.27 bits.
In terms of information, a 2d6 roll is in-between the d9 and d10 rolls from the first table. This doesn't mean they are the same, but that they have a similar amount of variability.
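The 2d6 figures can be checked with a short Python sketch that counts the ways to make each total out of the 36 equally likely rolls. The variable names here are my own.

```python
import math
from collections import Counter

# Count how many of the 36 equally likely (die1, die2) pairs give each total.
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))

entropy_2d6 = 0.0
for total in sorted(ways):
    p = ways[total] / 36
    s = -math.log2(p)      # surprisal of this total, in bits
    entropy_2d6 += p * s   # probability-weighted surprisal
    print(f"{total:>2}  {ways[total]}/36  p={p:.4f}  surprisal={s:.2f}  p*s={p * s:.4f}")

print(round(entropy_2d6, 2))  # 3.27
```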

For the third table I have calculated the entropy for some commonly used dice-rolls in games and listed them in order of increasing entropy. The 2d10- designates the difference of two ten-sided dice, as used for penetration damage in Squadron Strike.

Note the entropy of 4d6 is less than twice that of 2d6, and likewise 2d6 is not twice that of 1d6. As numbers from single dice are summed, the distribution becomes less uniform and more like the bell curve of the normal distribution, so it is more predictable and has less entropy than the individual rolls taken together. If we were using two separate d6 rolls to generate a uniform random number between 1 and 36, we should expect the d36 entropy to be twice that of a d6, and it is: -log2(1/36) = 5.17, which is twice the 2.58 bits of a d6. We also see this with the entropy of d100 being twice that of d10.
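To see the effect of summing, here is a sketch that builds the distribution of a sum of dice by brute-force counting and compares entropies. The helper names `sum_distribution` and `dice_entropy` are assumptions of mine for illustration.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def sum_distribution(num_dice, sides):
    """Probability of each total when rolling num_dice fair dice and adding them."""
    counts = Counter({0: 1})
    for _ in range(num_dice):
        nxt = Counter()
        for total, n_ways in counts.items():
            for face in range(1, sides + 1):
                nxt[total + face] += n_ways
        counts = nxt
    n_outcomes = sides ** num_dice
    return {total: n_ways / n_outcomes for total, n_ways in counts.items()}

def dice_entropy(num_dice, sides):
    return entropy(sum_distribution(num_dice, sides).values())

for num_dice in (1, 2, 3, 4):
    print(f"{num_dice}d6: {dice_entropy(num_dice, 6):.2f} bits")

# Two d6 read as separate digits act like a d36, which is exactly twice a d6.
print(f"d36: {dice_entropy(1, 36):.2f} bits")
```

Summing throws away which die showed which face, and that discarded information is why the entropy of the sum falls short of the entropy of the separate rolls.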

What strikes me from this is rolls of 1d6, 3d6, and everything in-between, vary by only about 1 coin-flip of entropy, so maybe the many variations of dice used in games really don't make so much difference in terms of the variability of play.

A final note: Just because there might be more information in some combinations of dice does not mean the game takes full advantage of that variability. For instance, if you are making a to-hit roll with some probability of success (hit or miss), then there is at most 1 bit of information in that result no matter what kind of dice you roll it with. There are only a full 6.64 bits of information in a d100 roll if there are 100 unique outcomes.
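A small sketch of that last point: collapsing a roll into hit or miss leaves a two-outcome distribution, whatever die was rolled. The roll-under rule and the target numbers below are hypothetical, chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def to_hit_entropy(sides, target):
    """Entropy of hit/miss when a roll of `target` or less on a fair die hits."""
    p_hit = target / sides
    return entropy([p_hit, 1 - p_hit])

# Hypothetical targets on a d100: none of these can exceed 1 bit.
for target in (5, 25, 50, 65, 95):
    print(f"hit on {target} or less: {to_hit_entropy(100, target):.2f} bits")

# The full d100, with all 100 results kept distinct, for comparison.
print(f"full d100: {entropy([1 / 100] * 100):.2f} bits")
```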

More:
Dice and Information, So What?