ARPAbet and the Major mnemonic system

July 29, 2023

ARPAbet is a phonetic spelling system developed by, you guessed it, ARPA, before it became DARPA.

The ARPAbet system is less expressive than IPA, but much easier for English speakers to understand. Every sound is encoded as one or two English letters. So, for example, the sound denoted ʒ in IPA is ZH in ARPAbet.

In ARPAbet notation, the Major mnemonic system can be summarized as follows:

0: S or Z
1: D, DH, T, or TH
2: N or NG
3: M
4: R
5: L
6: CH, JH, SH, or ZH
7: G or K
8: F or V
9: P or B

Numbers are encoded using the consonants above. You can insert any vowels or semivowels (e.g. w or y) you like. For example, you could encode 648 as “giraffe” or 85 as “waffle.”
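As a rough illustration of the rule above, here is a minimal sketch that maps a list of ARPAbet phonemes to digits. The names `MAJOR_DIGITS` and `major_encode` are made up for this example, and the two transcriptions at the bottom are illustrative ones for "waffle" and "sheriff" rather than lines quoted from any particular file.

```python
# Hypothetical lookup table built from the list above.
# The digit-1 row is taken to be D, DH, T, TH (TH is the voiceless "th").
MAJOR_DIGITS = {
    "S": "0", "Z": "0",
    "D": "1", "DH": "1", "T": "1", "TH": "1",
    "N": "2", "NG": "2",
    "M": "3",
    "R": "4",
    "L": "5",
    "CH": "6", "JH": "6", "SH": "6", "ZH": "6",
    "G": "7", "K": "7",
    "F": "8", "V": "8",
    "P": "9", "B": "9",
}

def major_encode(phonemes):
    """Return the Major-system digit string for a sequence of ARPAbet phonemes."""
    digits = []
    for p in phonemes:
        base = p.rstrip("012")  # drop a trailing stress digit, if any
        # Vowels and semivowels are not in the table, so they add nothing.
        digits.append(MAJOR_DIGITS.get(base, ""))
    return "".join(digits)

print(major_encode("W AA1 F AH0 L".split()))   # waffle  -> 85
print(major_encode("SH EH1 R IH0 F".split()))  # sheriff -> 648
```

A dictionary lookup is used here only to keep the sketch short; any structure that sends each consonant to its digit and ignores everything else would do.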

The CMU Pronouncing Dictionary lists 134,373 words along with their ARPAbet pronunciation. The Python code below will read in the pronouncing dictionary and produce a Major mnemonic dictionary. The resulting file is available here as a zip compressed text file.
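Each dictionary entry has to be split into a word and its phonemes before it can be encoded. The sketch below assumes the layout of `cmudict-0.7b` as I understand it: comment lines begin with `;;;`, each entry is a word followed by whitespace-separated phonemes, and alternate pronunciations are marked with a suffix such as `(1)`. The function name `parse_cmudict_line` is hypothetical, and the sample lines are illustrative.

```python
import re

def parse_cmudict_line(line):
    """Split one dictionary line into (word, phonemes); return None to skip it."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None  # blank line or comment
    head, *phonemes = line.split()
    # Assumed convention: alternates look like WORD(1), WORD(2), ...
    word = re.sub(r"\(\d+\)$", "", head)
    return word, phonemes

print(parse_cmudict_line("WAFFLE  W AA1 F AH0 L"))
print(parse_cmudict_line("READ(1)  R IY1 D"))
print(parse_cmudict_line(";;; a comment line"))
```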

To find a word that encodes a number, search the code output for that number. For example,

    grep ' 648' cmu_major.txt

will find words whose Major encoding begins with 648, and

    grep ' 648$' cmu_major.txt

will find words whose Major encoding is exactly 648.

From this we learn that “sherriff” is another possible encoding for 648.
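If you would rather do the lookup in Python than with grep, one option is to index the output by digit string. This is only a sketch: `build_index` and `words_with_prefix` are hypothetical names, and the `sample` list stands in for lines read from `cmu_major.txt`, assumed to have the form `WORD digits`.

```python
from collections import defaultdict

def build_index(lines):
    """Map each digit string to the list of words that encode it."""
    index = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # words with no consonants have no digits; skip them
            word, digits = parts
            index[digits].append(word)
    return index

def words_with_prefix(index, prefix):
    """Words whose encoding begins with prefix, like grep ' 648'."""
    return sorted(w for d, ws in index.items() if d.startswith(prefix) for w in ws)

# Stand-in data; in practice, pass in the lines of cmu_major.txt.
sample = ["SHERIFF 648", "SHERIFFS 6480", "WAFFLE 85", "A"]
index = build_index(sample)

print(index.get("648", []))             # exact match, like grep ' 648$'
print(words_with_prefix(index, "648"))  # prefix match
```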

Here’s the Python code.

    # Requires Python 3.10+ for the match statement.
    # NB: File encoding is Latin-1, not UTF-8.
    with open("cmudict-0.7b", "r", encoding="latin-1") as f:
        lines = f.readlines()

    for line in lines:
        # Skip blank lines and the ";;;" comment lines in the file.
        if not line.strip() or line.startswith(";;;"):
            continue

        pieces = line.split()
        numstr = ""
        for p in pieces[1:]:
            # Remove stress notation: a trailing 0, 1, or 2 on a phoneme.
            # Strings are immutable, so the stripped value is matched directly.
            match p.rstrip("012"):
                case "S" | "Z":
                    numstr += "0"
                case "D" | "DH" | "T" | "TH":
                    numstr += "1"
                case "N" | "NG":
                    numstr += "2"
                case "M":
                    numstr += "3"
                case "R":
                    numstr += "4"
                case "L":
                    numstr += "5"
                case "CH" | "JH" | "SH" | "ZH":
                    numstr += "6"
                case "G" | "K":
                    numstr += "7"
                case "F" | "V":
                    numstr += "8"
                case "P" | "B":
                    numstr += "9"
        # Vowels and semivowels match no case and contribute no digit.
        print(pieces[0], numstr)

The post ARPAbet and the Major mnemonic system first appeared on John D. Cook.
