SoatDev IT Consulting
SoatDev IT Consulting
  • About us
  • Expertise
  • Services
  • How it works
  • Contact Us
  • News
  • August 5, 2023
  • Rss Fetcher

A few days ago I wrote about using the CMU Pronouncing Dictionary to search for words that decode to certain numbers in the Major mnemonic system. You can find a brief description of the Major system in that post.

As large as the CMU dictionary is, it did not contain words mapping to some three-digit numbers, so it would be good to explore a larger, or at least different, dictionary. But the CMU dictionary is apparently the largest dictionary with pronunciation openly available.

To get more pronunciation data, you’ll need to generate it. This is what linguists call the grapheme to phoneme problem. There are software packages that create phonetic spellings using large neural network models, including models trained on the CMU data.

Why quick-and-dirty is OK

However, it’s possible to do a good enough job with much simpler software. There are several reasons why we don’t need the sophistication of research software. First and foremost, we can tolerate errors. If we get a few false positives, we can skim through those and ignore them. And if we get a few false negatives, that’s OK as long as we find a few of the words we’re looking for.

Another thing in our favor is that we’re not looking for pronunciation per se, only the numbers generated from the pronunciation. The hardest part of the grapheme to phoneme problem is vowel sounds, and we don’t care about vowel sounds at all. And we don’t care about distinguishing, for example, between voiced and unvoiced variations on the th sound because they both map to 1.

Code

The Major mnemonic system is based on pronunciation, not spelling. Nevertheless, you can do a rough-and-ready conversion, adequate for our purposes, based on spelling. I take into account a minimal amount of context, such as noting that c is soft before i, e, and y, but hard before a, o, and u. The handling of ch is probably biggest source of errors because the sound of ch depends on etymology.

I wrote this as a Python script initially because I wanted to share it with someone who knows Python. But I’ll present it here in Perl because the Perl code is much more compact.

sub word2num {
    $_ = shift;
    
    tr/A-Z/a-z/; # lower case
    
    s/ng/n/g;
    s/sch/j/g;
    s/che/k/g;
    s/[cs]h/j/g;
    s/g[iey]/j/g; # soft g -> j
    s/c[eiy]/s/g; # soft c -> s
    s/c[aou]/k/g; # hard c -> k
    s/ph/f/g;
    s/([flmrv])1+/1/g; # condense double letters
    s/qu/k/g;
    s/x/ks/g;

    tr/szdnmrljgkfvpb/00123456778899/;
    tr/a-z//d; # remove remaining letters
    
    return $_
}

Perl has implicit variables, for better and for worse, and here it’s for the better. All the translation (tr//) and substitution (s//) operate in place on the implicit argument, the word sent to the function.

The corresponding Python code is more verbose:

def word2num(w):

    w = w.lower()

    w = w.replace('ng', 'n')
    w = w.replace('sch', 'j')
    w = w.replace('che', 'k')

    for x in ['gi', 'ge', 'gy', 'ch', 'sh']:
        w = w.replace(x, 'j')

    ...

The order of the replacement statements matters. For example, you want to decide whether c and g are hard before you discard the vowels.

This script works better than I expected it would for being such a dirty hack. I ran it on some large word lists looking for more alternatives to the three-digit numbers not in the output of the script processing the CMU dictionary. I list a few of the words I found here. The most amusing find was phobophobia, the fear of phobias, for 898.

Aside from filling in gaps in three-digit numbers, you could also use a script like this to search for mnemonic words in specialized lists of words, such as baseball players, or animal species, or brand names.

The post Code to convert words to Major system numbers first appeared on John D. Cook.

Previous Post
Next Post

Recent Posts

  • Lawyers could face ‘severe’ penalties for fake AI-generated citations, UK court warns
  • At the Bitcoin Conference, the Republicans were for sale
  • Week in Review: Why Anthropic cut access to Windsurf
  • Will Musk vs. Trump affect xAI’s $5 billion debt deal?
  • Superblocks CEO: How to find a unicorn idea by studying AI system prompts

Categories

  • Industry News
  • Programming
  • RSS Fetched Articles
  • Uncategorized

Archives

  • June 2025
  • May 2025
  • April 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023

Tap into the power of Microservices, MVC Architecture, Cloud, Containers, UML, and Scrum methodologies to bolster your project planning, execution, and application development processes.

Solutions

  • IT Consultation
  • Agile Transformation
  • Software Development
  • DevOps & CI/CD

Regions Covered

  • Montreal
  • New York
  • Paris
  • Mauritius
  • Abidjan
  • Dakar

Subscribe to Newsletter

Join our monthly newsletter subscribers to get the latest news and insights.

© Copyright 2023. All Rights Reserved by Soatdev IT Consulting Inc.