Coupon collector problem
Suppose you have a bag of balls labeled 1 through 1,000. You draw draw balls one at a time and put them back after each draw. How many draws would you have to make before you’ve seen every ball at least once?
This is the coupon collector problem with N = 1000, and the expected number of draws is
N HN
where
HN = 1 + 1/2 + 1/3 + … + 1/N
is the Nth harmonic number.
As N increases, HN approaches log(N) + γ where γ = 0.577… is the Euler-Mascheroni constant, and so the expected time for the coupon collector problem is approximately
N (log(N) + γ).
Consecutive draws
Now suppose that instead of drawing single items, you draw blocks of consecutive items. For example, suppose the 1,000 balls are arranged in a circle. You pick a random starting point on the circle, then scoop up 10 consecutive balls, then put them back. Now how long would it take to see everything?
By choosing consecutive balls, you make it harder for a single ball to be a hold out. Filling in the holes becomes easier.
Bucketed problem
Now suppose the 1,000 balls are placed in 100 buckets and the buckets are arranged in a circle. Now instead of choosing 10 consecutive balls, you choose a bucket of 10 balls. Now you have a new coupon collector problem with N = 100.
This is like the problem above, except you are constraining your starting point to be a multiple of n.
Upper and lower bounds
I’ll use the word “scoop” to mean a selection of n balls at a time to avoid possible confusion over drawing individual balls or groups of balls.
If you scoop n balls at a time by making n independent draws, then you just have the original coupon collector problem, with the expected time divided by n.
If you scoop up n consecutively numbered balls each time, you reduce the expected time to see everything at least once. But your scoops can still overlap. For example, maybe you selected 13 through 22 on one draw, and 19 through 38 on the next.
In the bucketed problem, you reduce the expected time even further. Now your scoops will not partially overlap. (But they may entirely overlap, so it’s not clear that this reduces the total time.)
It would seem that we have sandwiched our problem between two other problems we have the solution to. The longest expected time would be if our scoop is made of n independent draws. Then the expected number of scoops is
N HN / n.
The shortest time is the bucketed problem in which the expected number of scoops is
(N/n) H(N/n).
It seems the problem of scooping n consecutive balls, with no constraint on the starting point, would have expected time somewhere between these two bounds. I say “it seems” because I haven’t proven anything here, just given plausibility arguments.
By the way, we can see how much bucketing reduces the expected time by using the log approximation above. With n independent draws each time, the expected number of scoops is roughly
(N/n) log(N)
whereas with the bucketed problem the expected number of scoops is roughly
(N/n) log(N/n).
Expected number of scoops
I searched a bit on this topic, and I found many problems with titles like “A variation on the coupon collector problem,” but none of the papers I found considered the variation I’ve written about here. If you work out the expected number of scoops, or find a paper where someone has worked this out, please let me know.
The continuous analog seems like an easier problem, and one that would provide a good approximation. Suppose you have a circle of circumference N and randomly place arcs of length n on the circle. What is the expected time until the circle is covered? I imagine this problem has been worked out many times and may even have a name.
Simulation results
When N = 1000 and n = 10, the upper and lower bounds work out to 748 and 518.
When I simulated the consecutive coupon collector problem I got an average of 675 scoops, a little more than the average of the upper and lower bounds.
The post Consecutive coupon collector problem first appeared on John D. Cook.