This post is an addendum to the recent post Reviewing a thousand things. We’re going to look again at the coupon collector problem, randomly sampling a set of *N* things with replacement until we’ve collected one of everything.

As noted before, for large *N* the expected number of draws before you’ve seen everything at least once is approximately

*N*( log(*N*) + γ )

where γ is Euler’s constant.

A stronger result using a Chernoff bound says that for any constant *c*, the probability that the number of draws exceeds

*N*( log(*N*) + *c* )

approaches

1 − exp( − exp(−*c*))

as *N* increases. See [1].

I said in the earlier post that if you were reviewing a thousand things by sampling flash cards with replacement, there’s a 96% chance that after drawing 10,000 cards you would have seen everything. We could arrive at this result using the theorem above.

If *N* = 1,000 and we want *N*( log(*N*) + *c* ) to equal 10,000, we set *c* = 3.092. Then we find

1 − exp( − exp(−c) ) = 0.0444

which is consistent with saying there’s a 96% chance of having seen everything, 95.56% if you want to be more precise.

[1] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.

The post Collecting a large number of coupons first appeared on John D. Cook.