SoatDev IT Consulting
SoatDev IT Consulting
  • About us
  • Expertise
  • Services
  • How it works
  • Contact Us
  • News
  • May 25, 2023
  • Rss Fetcher

Suppose you have a random number generator that returns numbers between 1 and N. The birthday problem asks how many random numbers would you have to output before there’s a 50-50 chance that you’ll repeat a number. The coupon collector problem asks how many numbers you expect to generate before you’ve seen all N numbers at least once.

These are examples of occupancy problems. The name comes from imagining N urns, randomly assigning balls to each, then asking how many urns are occupied.

Suppose people are walking into a room one at a time. The birthday problem asks at what point is there even odds that two people in the room have the same birthday. The coupon collector problem asks the expected number of people to enter the room before all birthdays are represented. But we could ask, for example, after 100 people enter the room, how likely it is that there are 70 different birthdays represented.

When we draw r random samples with replacement from a set of N items, let X(N, r) be the random variable that represents the number of distinct items in the sample. For example, suppose a hashing algorithm returns one of N possible hash codes. The number of distinct hash codes after hashing r documents would be X(N, r).

The probability mass function (pmf) for X(N, r) has been calculated, for example in [1], but it’s very complicated and you’re not likely to get much understanding of X(N, r) by looking at the expression for the pmf. The mean and variance of X(N, r) are somewhat complicated [2], but easier to work with than the pmf.

The mean of X(N, r) is

Nleft( 1 - left(frac{N-1}{N}right)^r right)

In the special case that N = r and N is large, the mean is approximately N(1 – 1/e). For example, suppose you had a deck of 52 cards. You draw one card, put it back in the deck, and shuffle the deck. Repeat this 52 times. You would get about 33 distinct cards on average.

The variance of X(N, r) is more complicated than the mean.

frac{(N-1)^r}{N^{r-1}} + frac{(N-1)(N-2)^r}{N^{r-1}} - frac{(N-1)^{2r}}{N^{2r-2}}

As with the mean, the case N = r with N large is simpler. In that case the variance is approximately N(1/e – 1/e²). In the example above, this works out to about 12. The standard deviation is about 3.5, and so you’d often see 33 ± 7 distinct cards.

Related posts

  • Probability of secure hash collisions
  • The coupon collector problem and π
  • Serious applications of a party trick

[1] Paul G. Hoel, Sidney C. Port, and Charles J. Stone, Introduction to Probability TheoryX(N, r), Houghton Mifflin, Boston, 1971, pp. 43–45.

[2] Emily S. Murphree. Replacement Costs: The Inefficiencies of Sampling with Replacement. Mathematics Magazine, Vol. 78, No. 1 (Feb., 2005), pp. 51-57.

The post Occupancy problem distribution first appeared on John D. Cook.

Previous Post
Next Post

Recent Posts

  • Trump pulls Musk ally’s NASA Administrator nomination
  • Left-leaning influencers embrace Bluesky without abandoning X, Pew says
  • NAACP calls on Memphis officials to halt operations at xAI’s ‘dirty data center’
  • Meta plans to automate many of its product risk assessments
  • The ellipse hidden inside Pascal’s triangle

Categories

  • Industry News
  • Programming
  • RSS Fetched Articles
  • Uncategorized

Archives

  • May 2025
  • April 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023

Tap into the power of Microservices, MVC Architecture, Cloud, Containers, UML, and Scrum methodologies to bolster your project planning, execution, and application development processes.

Solutions

  • IT Consultation
  • Agile Transformation
  • Software Development
  • DevOps & CI/CD

Regions Covered

  • Montreal
  • New York
  • Paris
  • Mauritius
  • Abidjan
  • Dakar

Subscribe to Newsletter

Join our monthly newsletter subscribers to get the latest news and insights.

© Copyright 2023. All Rights Reserved by Soatdev IT Consulting Inc.