SoatDev IT Consulting
SoatDev IT Consulting
  • About us
  • Expertise
  • Services
  • How it works
  • Contact Us
  • News
  • February 17, 2024
  • Rss Fetcher

Suppose you have a list of encrypted surnames names of US citizens. If the list is long enough, the encrypted name that occurs most often probably corresponds to Smith. The second most common encrypted name probably corresponds to Johnson, and so forth. This kind of inference is analogous to solving a cryptogram puzzle by counting letter frequencies.

The probability of correctly guessing the most common names based on frequency analysis depends critically in the sample size. In a small sample, there may be no Smiths. In a larger sample, the name Smith may be common, but not the most common.

I did some simulations to estimate how well frequency analysis would work at identifying the 10 most common names as a function of the sample size N. For each N, I simulated 100 data sets using probabilities derived from the surname frequencies derived from US Census Bureau data.

When N = 1,000, there was a 53% chance that the most common name in the population, Smith, would be the most common name in the sample. The second most common name in the population, Johnson, was the second most common name in the sample only 14% of the time.

When N = 10,000, there was a 94% chance of identifying Smith, and at least a 30% chance of identifying the five most common names.

When N = 1,000,000, the three most common names were identified every time in the simulation. And each of the 10 most common names were correctly identified most of the time. In fact, the 18 most common names were correctly identified most of the time.

A consequence of this analysis is that hashing names does not protect privacy if the same size is large. Hashing names along with other information, so that the combined data has a more uniform distribution, may protect privacy.

The post Frequency analysis first appeared on John D. Cook.

Previous Post
Next Post

Recent Posts

  • Advantages and Limitations of Trading CFDs in Volatile Markets
  • Navigating the Cloud: Upcoming Trends, Challenges, and Strategies
  • Kaspersky Appoints General Manager for Sub-Saharan Africa
  • Solarise Africa & RUBiSOL Named “Deal of the Year 2024 – East Africa
  • Monzo’s pivot from cool to corporate: ‘freshness is not about gimmicks’

Categories

  • Industry News
  • Programming
  • RSS Fetched Articles
  • Uncategorized

Archives

  • May 2025
  • April 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023

Tap into the power of Microservices, MVC Architecture, Cloud, Containers, UML, and Scrum methodologies to bolster your project planning, execution, and application development processes.

Solutions

  • IT Consultation
  • Agile Transformation
  • Software Development
  • DevOps & CI/CD

Regions Covered

  • Montreal
  • New York
  • Paris
  • Mauritius
  • Abidjan
  • Dakar

Subscribe to Newsletter

Join our monthly newsletter subscribers to get the latest news and insights.

© Copyright 2023. All Rights Reserved by Soatdev IT Consulting Inc.