Bob Carpenter wrote today that Markov chains cannot thoroughly cover high-dimensional spaces, and that they do not need to. That’s kinda the point of Monte Carlo integration. If you want systematic coverage, you have to sample systematically, and that’s impractical in high dimensions.

Bob gives the example that if you want one integration point in each quadrant of a 20-dimensional space, you need over a million samples (2^{20} = 1,048,576 to be precise). But you may be able to get sufficiently accurate results with far fewer samples.
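The quadrants of a 20-dimensional space correspond to the 2^{20} possible sign patterns of the coordinates. A quick sketch of this count in Python, enumerating one representative point per quadrant:

```python
from itertools import product

d = 20  # number of dimensions

# One deterministic point per quadrant: every combination of signs
# (+1 or -1) in each coordinate. product() yields the 2^d sign
# vectors lazily, so memory use stays constant.
points = product((-1, 1), repeat=d)

count = sum(1 for _ in points)
print(count)  # 2**20 = 1048576
```

This is exactly the (±1, ±1, …, ±1) grid mentioned below: each tuple lands in a distinct quadrant, so a million-odd points are needed just to touch each one once.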

If you wanted to be certain to have one sample in each quadrant, you could sample at (±1, ±1, ±1, …, ±1). But if for some weird reason you wanted to sample randomly *and* hit each quadrant, you have a coupon collector problem. The expected number of samples to hit each of *N* cells with uniform [1] random sampling is

*N*( log(*N*) + γ )

where log is the natural logarithm and γ is Euler’s constant. So if *N* = 2^{20}, the expected number of draws would be just over 15 million.
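A quick check of this number in Python. The exact expectation is *N* times the *N*th harmonic number; the formula above is its asymptotic approximation. (The value of γ is hardcoded because the standard `math` module does not expose it.)

```python
import math

N = 2**20  # one cell per quadrant in 20 dimensions
EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

# Asymptotic coupon-collector expectation: N (log N + gamma)
approx = N * (math.log(N) + EULER_GAMMA)

# Exact expectation: N * H_N, where H_N is the Nth harmonic number
exact = N * sum(1.0 / k for k in range(1, N + 1))

print(f"approximate: {approx:,.0f}")  # a bit over 15 million
print(f"exact:       {exact:,.0f}")
```

The two values agree to within a fraction of a draw, since *H_N* = log *N* + γ + *O*(1/*N*).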

## Related posts

- The coupon collector problem and π
- Collecting a large number of coupons
- Consecutive coupon collector problem

[1] We’re assuming each cell is sampled independently with equal probability each time. Markov chain Monte Carlo is more complicated than that because the draws are *not* independent.

The post MCMC and the coupon collector problem first appeared on John D. Cook.