Sunday, December 13, 2009

Computing combinatorials

I was commenting on this fun post about efficiently computing n-choose-c but my attempt to post some code seems to have been eaten by the comment system (I suspect use of square and angle brackets upset it), so I'll post something here instead.

Here's why c always takes integer values and more generally why (k + 1) * (k + 2) * ... * (k + n) is always divisible by factorial(n). There are three ways to show this.

Apologies for the crappy maths typesetting, I should really figure out how to do this nicely.

Proof by hand-waving.

First off, something hand-wavy. One of (k + 1)*(k + 2) is even so 2 divides into that product. One of (k + 1)*(k + 2)*(k + 3) is threeven (not a real word!) so 3 divides into that product. 4 is not so simple. One of (k + 1)*(k + 2)*(k + 3)*(k + 4) is fourven so 4 divides into that product - however maybe the fourven number was also the even number we used to cancel the 2 earlier on so it only has one 2 left for division but if that's the case then since one of the remaining 3 numbers is also even and you can get a 2 from that to make up the 4. 5 works just like 2 and 3. 6 contends with both 2 and 3 but you can resolve that too. This gets messy quickly but essentially, for any prime p, every time you come to the next multiple of p in the divisor you will have already added at least one multiple of p in the k+i product and by the time you hit a multiple of p2 you will have already hit a corresponding one in the k+i product too, so you'll still have enough ps in the product overall to cancel all the ps in the divisor. Still a bit hand-wavy but it could be made rigorous, however there is another proof that's easily rigorous but less insightful.

Proof by counting prime factors.

There is a well known formula for finding the power of p in factorial(n). It's

sum(i=1..infinity, [n/pi])

where [] means take the integer part, throwing away any fractions. So if p = 5, and n=30 then this is
[30/5] + [30/25} + [30/125] + [30/625] + ... = 6 + 1 + 0 + 0 + ...

It's all zeros after that, so the end result is 7. So 57 divides factorial(30) and 7 is the highest power of 5 that divides it.

So rewriting the k + i product as factorial(k + n)/factorial(k) we now know that for any prime p, it contains exactly

sum(i=1..infinity, [(n+k)/pi]) - sum(i=1..infinity, [(k)/pi])

powers of p. If this is greater than
sum(i=1..infinity, [(k)/pi])

then we know that factorial(n) divides factorial(k + n)/factorial(k).

Well, [x + y] >= [x] + [y] so

[(n+k)/pi] >= [(n)/pi] + [(k)/pi]

now bring one term over to the left hand side to get

[(n+k)/pi] - [(n)/pi] >= [(k)/pi]

Now sum over i=1..infinity and you get exactly what we needed to show. So for any prime p, there are at least as many powers of p in the k+i product as there are in factorial(n), all the ps in the divisor cancel out and the divisor divides in evenly.

Proof by "it just is".

The final and least insightful way is to note that factorial(n+k)/(factorial(k) * factorial(n)) is n+k choose n and so must be an integer


Here's a python snippet that checks that the division is always even with an assert.

Give it an n and it will print out comb(n, i) for all i. The assert never fires for me.

As it loops, c gets the values comb(k + 1, 1), comb(k + 2, 2), ..., comb(n, n - k), all of which are integers (and remember that comb(n, n - k) = comb(n, k) so c ends up with the desired value.)

If your lanugage does tail-call optimisation, you could define

comb(n, 0) = 1
comb(n, i) = comb(n - 1, i - 1) * n / (n - k)

and be done. With memoisation you could avoid lots of repeated calculations.


colmmacc said...

Yep, you're definitely right, the proof makes sense. I wasn't assuming that k must be divisable, that was the comment box eating my text, plus having a borken arm ;-)

Now I'm wondering what cases caused me to add the float() in my implementation to begin with, time to search my editor backups!

Fergal said...

Hope the arms gets better soon! Funny how you're so on my radar this week, what with lunch with Wilmer and Carlo and me chatting with Josh Glover.

Nelnik said...

I'm sure you've thought of this but, if we want to work out
comb(500,499) efficiently, then we would like to use the fact that comb(500,499) = comb(500,1) first, before applying the recursive definition:
comb(n, i) = comb(n - 1, i - 1) * n / (n - k)

how about
comb(n, i) = comb(n - 1, min(n-i,i) - 1) * n / (n - k)

Nelnik said...

would need to define k

Replacing k:

comb(n, i) = comb(n - 1, min(n-i,i) - 1) * n / min(n-i,i)

From an efficiency point of view that wouldn't be great, calling min(,) much more often than necessary.

In my experience, functional languages give really terrible (unhelpful) error messages. In big systems that can be a very serious problem.

Anonymous said...

BTW, if you need C(n,k) values for all k, then the well-known formula
comes really handy.