**Probabilities** are numbers between zero and one [0,1]. This is sometimes also expressed as a percentage between 0% and 100%, but percentages are sometimes used to represent proportions less than zero or greater than one, so I generally present probabilities as a number between zero and one to avoid that confusion (and if you ever teach intro stats, it IS a confusion for some).

**Odds** are another way of expressing probability. For some event *A* that occurs with probability *p*, the "odds of A" are the ratio of (the probability of) *A* happening to *A* *not*-happening, so the odds of event *A* are *p*/(1-*p*). The odds transform a probability *p* between zero and one into a number that is between zero and positive infinity, and can represent any probability *except* for zero and one exactly. Fortunately, this isn't much of a limitation, because random events that never occur or always occur are not really random.

The odds are also often expressed as the ratio of two whole numbers. For example: if the probability of event A is p=0.6, then the odds of A are 0.6/(1-0.6) = 0.6/0.4 = 1.5. In whole numbers, 1.5 is equal to 3/2, and the odds are expressed as "3-to-2" odds, or sometimes just "3:2 odds". It's OK to skip the whole number step and just express the odds as "1.5-to-1" or "1.5:1" or just plain old "1.5". (Some people just don't like fractions I guess.)
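A minimal Python sketch of this conversion (the function name `odds` is my own, not from any particular library; the `fractions` module handles the whole-number step exactly):

```python
from fractions import Fraction

def odds(p):
    """Convert a probability p (strictly between 0 and 1) to odds p/(1-p)."""
    if not 0 < p < 1:
        raise ValueError("odds are only defined for 0 < p < 1")
    return p / (1 - p)

# The worked example: p = 0.6 gives odds of 1.5 ...
print(round(odds(0.6), 6))  # 1.5

# ... and exact fraction arithmetic recovers the "3-to-2" form.
ratio = Fraction(3, 5) / (1 - Fraction(3, 5))
print(f"{ratio.numerator}-to-{ratio.denominator}")  # 3-to-2
```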

**Odds ratios** are the ratio of two odds (I bet you didn't need me to tell you that). These might also be important, but not for today. Maybe I will come back and fill this in later.

**Statistics** are what I do to fund my gaming habit. Totally unrelated, I just thought I would throw this in for fun. :-)

Now, statisticians like to do regression models, and that usually means fitting a line equation to data that may range between negative and positive infinity. Numbers that are probabilities or odds present a problem because they have limited ranges, and a fitted line does not respect those limits.

Enter the logarithmic transform, that funny little button on your calculator that most everyone learned about in school and promptly forgot about because they never use it. Logarithms transform numbers between zero and positive infinity to numbers between negative and positive infinity. They have some other nice properties too, like changing equations that are a series of multiplications into a series of sums, that are often easier to deal with mathematically.
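As a quick illustration of that product-to-sum property (plain Python, nothing specific to this article):

```python
import math

a, b = 1.5, 4.0
# log(a*b) equals log(a) + log(b), so a chain of multiplications
# becomes a sum, which is often easier to work with.
print(math.isclose(math.log(a * b), math.log(a) + math.log(b)))  # True
```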

Taking the logarithm of the odds turns this number into something statisticians know well. This facilitates regression models predicting the probability of an event occurring in much the same way as we might create any other regression model. Usually we use the natural log (log base *e*) for this, but the base doesn't matter too much. There are other functions that can also be used for this purpose (e.g., the probit), but that is a tale for another day.

Back to the example: We started with a probability of p=0.6, which gave an odds of 1.5. The log-odds are then log(1.5)=0.4055. The charts below show the relationships between probability, odds, and log-odds.
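The worked example can be checked in a few lines of Python. The names `logit` and `inv_logit` here are my own helpers (equivalents exist in most stats libraries):

```python
import math

def logit(p):
    """Probability -> log-odds (natural log of p/(1-p))."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Log-odds -> probability (the logistic/sigmoid function)."""
    return 1 / (1 + math.exp(-x))

p = 0.6
lo = logit(p)
print(round(p / (1 - p), 4))    # 1.5    (the odds)
print(round(lo, 4))             # 0.4055 (the log-odds)
print(round(inv_logit(lo), 4))  # 0.6    (round trip back to p)
```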

This chart isn't very useful, because on a linear (our usual) scale the odds are relatively "flat", and then they explode to infinity as the probability of success approaches one.

Here is the same chart with the Y-axis changed to a logarithmic scale. Here it is easy to see what the odds are doing at the low end of the scale, and the symmetry of the relationship is clear.

Now the log-odds. Surprise! (well, maybe not.) This chart is identical to the last. All I have done is to switch the Y-axis back to our familiar linear scale, and substitute the natural log of the odds in place of the odds. Six of one, or a half-dozen of the other.

It is interesting to note that for probabilities between 0.25 and 0.75, the log-odds are nearly a straight line on this graph. In this range you can use a simple no-calculator conversion as a pretty good approximation between the two:

**log-odds** = 4 × (*p* - 0.25) - 1, and

*p* = 0.25 + (**log-odds** + 1)/4.
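A quick sanity check of that shortcut in Python (a sketch; the exact log-odds uses the natural log as above, and the function names are my own):

```python
import math

def logit(p):
    """Exact log-odds."""
    return math.log(p / (1 - p))

def approx_logit(p):
    """No-calculator approximation, good for roughly 0.25 <= p <= 0.75."""
    return 4 * (p - 0.25) - 1

# The worst error on [0.25, 0.75] occurs at the endpoints, about 0.1.
err = max(abs(logit(p) - approx_logit(p))
          for p in [0.25, 0.35, 0.45, 0.5, 0.55, 0.65, 0.75])
print(round(err, 3))  # 0.099
```

The approximation is exact at *p* = 0.5 (it is just the tangent line to the log-odds curve there, which has slope 4), and drifts toward an error of about 0.1 at the edges of the range.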