Ben Fulton
Conceptually, the expectation maximization algorithm is simple: you take a guess at the values you want to find, then run your calculations as though they were correct. Then, use those results to improve your guess.
What is the expectation maximization algorithm?
That article does a good job of explaining the algorithm in more detail, but I couldn’t make sense of the example. So I wrote some code to help me figure out.
The example is like this: You have two coins. They’re not exactly balanced, and you want to know how heavily weighted they are. The only information you have, though, are five sequences of ten flips each. And, you don’t even know which coin is responsible for which sequence!
Okay, let’s write some code. I’ll be working in Python for this example.
rolls = [ "HTTTHHTHTH",
"HHHHTHHHHH",
"HTHHHHHTHH",
"HTHTTTHHTT",
"THHHTHHHTH" ]
Our coins are A and B. Let’s make our initial guess that coin A comes up heads 60% of the time, and that coin B is fair. The tradition is that the unknown variables are called Theta, so:
theta = { "A":.6, "B":.5 }
We need a method to calculate the likelihood of a sequence given a weight. That’s straightforward binomial probability:
def likelihoodOfRoll( roll, prob ):
numHeads = roll.count( "H" )
n = len(roll)
return pow( prob, numHeads ) * pow( 1-prob, n - numHeads )
Now what? One easy way to a solution would be to figure out which sequences were most likely to come from which coins, then credit all the heads and tails from those sequences to those coins. Using that method, rolls 1,2, and 4 are assigned to coin A, and rolls 0 and 3 are assigned to B. 24 of the 30 flips assigned to coin A are heads, and 9 of 20 flips for coin B. So with that strategy, our new value for theta becomes { “A”:.8, “B”:.45}.
But we can do better. Rather than assign an entire sequence to a single coin, we’ll use the calculated probabilities to weight the sequence. So – and this is a tricky bit – we have to compare the calculated probabilities to each other rather than looking at them as absolutes. So if one coin has a probability of 20% and the other has a probability of 30%, we’ll assign them weights of .4 and .6, respectively. Here’s how:
for roll in rolls:
s = sum( likelihoodOfRoll( roll, k) for k in theta.values())
for hidden in theta.keys():
a = likelihoodOfRoll( roll, theta[hidden])
aChance = a / s
I didn’t figure out that part until I came across a StackExchange question about it. How does expectation maximization work?
The variable aChance holds the weighted probability of the sequence coming from the coin. Next, we’ll assign that percentage of the heads in the sequence to that coin (We’ll keep a couple of dictionaries to hold the information):
headSum[hidden] += roll.count( "H" ) * aChance
tailSum[hidden] += roll.count( "T" ) * aChance
This is all we need to calculate new values for theta.
for hidden in theta.keys():
theta[hidden] = headSum[hidden] / (headSum[hidden] + tailSum[hidden])
Iterate the whole thing a few times, or a whole lot of times, and eventually it will converge to values. Not necessarily the best values – it may end up in a local maximum. But some values.
That’s all there is to it. The code is a lot simpler to understand than the math, I think. You can download the code from GitHub.