Telephone keypad word conversion script

  • By Jason Spencer
  • First published: 13 July 2012
  • Last update: 16 January 2015
bash DTMF gawk IVR keypad SIP telephony VoIP

On a standard telephone handset and many mobile dialers there is a set of alphabetic characters assigned to the different digits.
This little shell script does the mapping at the command line:

jsp@machina:~$ echo "john paul george ringo pete stuart" | sed -e 's/[aAbBcC]/2/g' -e 's/[dDeEfF]/3/g' -e 's/[gGhHiI]/4/g' -e 's/[jJkKlL]/5/g' -e 's/[mMnNoO]/6/g' -e 's/[pPqQrRsS]/7/g' -e 's/[tTuUvV]/8/g' -e 's/[wWxXyYzZ]/9/g'
5646 7285 436743 74646 7383 788278
jsp@machina:~$ 
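The same mapping can also be written with tr instead of a chain of sed substitutions. This is only a sketch of an alternative, not the downloadable script itself; the function name map_keypad is my own invention for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): maps letters on stdin to
# keypad digits. Anything that is not a letter passes through unchanged.
map_keypad() {
    tr '[:upper:]' '[:lower:]' |
        tr 'abcdefghijklmnopqrstuvwxyz' '22233344455566677778889999'
}

echo "john paul george ringo pete stuart" | map_keypad
# prints: 5646 7285 436743 74646 7383 788278
```

The second tr argument lists one digit per letter of the alphabet, so the four-letter keys (7 and 9) simply repeat their digit four times.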

I wrote it because I wanted to come up with a quick and memorable telephone extension numbering scheme for a small office client.
You can download the script here (just be sure to rename to .sh), or a GAWK version here. The GAWK version also prints the name alongside the converted code.
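For readers without the download, something along these lines would print each name next to its code, one pair per line. It is a rough sketch using plain awk, not the actual mapchars.gawk; the function name map_names is an assumption.

```shell
#!/bin/sh
# Hypothetical stand-in (not the real mapchars.gawk): for every word on
# stdin print "name code" on its own line.
map_names() {
    awk '{
        for (i = 1; i <= NF; i++) {
            code = tolower($i)
            gsub(/[abc]/,  "2", code)
            gsub(/[def]/,  "3", code)
            gsub(/[ghi]/,  "4", code)
            gsub(/[jkl]/,  "5", code)
            gsub(/[mno]/,  "6", code)
            gsub(/[pqrs]/, "7", code)
            gsub(/[tuv]/,  "8", code)
            gsub(/[wxyz]/, "9", code)
            print $i, code
        }
    }'
}

echo "john pete" | map_names
# prints:
# john 5646
# pete 7383
```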

If there aren't too many names/extensions there is no need to wait for the entire sequence, as the extension can be identified unambiguously after a few keystrokes anyway. It might be an idea to incorporate a confirmation such as "To connect to john press hash, or press 0 to return to main menu" in the IVR (after 5 and 6 have been dialled) to guard against mis-dialling.

To check for a numbering collision this next one-liner does the conversion, truncates the number sequences and looks for duplicates:

jsp@machina:~$ echo "john paul george ringo pete stuart" | ./mapchars.sh | xargs -n 1 | cut -c -2 | sort -n | uniq -c
      1 43
      1 56
      1 72
      1 73
      1 74
      1 78
jsp@machina:~$ 

For this set of names the extension can already be identified unambiguously after only two digits. Adjust the value in the cut command to try different lengths. The first column is the number of times each n-digit sequence appears, and the second column is the sequence itself.
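Rather than adjusting the cut length by hand, a small loop can try each length in turn and report the first one with no duplicates. This is a sketch under my own assumptions: the function name min_unique_len is made up, and it expects already-converted codes on stdin. It does not notice a short code that is a complete prefix of a longer one (such as 766 and 766643).

```shell
#!/bin/sh
# Hypothetical helper: print the smallest prefix length at which all
# codes read from stdin are distinct; fail if no such length exists.
min_unique_len() {
    codes=$(cat)
    # Longest code length bounds the search.
    max=$(printf '%s\n' $codes |
        awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }')
    n=1
    while [ "$n" -le "$max" ]; do
        # uniq -d prints only the duplicated prefixes; empty means none.
        if [ -z "$(printf '%s\n' $codes | cut -c "-$n" | sort | uniq -d)" ]; then
            echo "$n"
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}

echo "5646 7285 436743 74646 7383 788278" | min_unique_len
# prints: 2
```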

To check the depth of all collisions, here's a gawk script:

jsp@machina:~$ echo "john paul george ringo pete stuart mick keith ron brian charlie john eric michael terry graham ronnie reggie" | gawk -f mapchars.gawk | gawk '(NF==2) {mapping[$1]=$2; len = length($1); if(maxlen<len)maxlen=len; } END{ for(i in mapping) { print i "\t\t" mapping[i]; } print "\nCollisions:"; for(i=maxlen;i>=1;i--){ delete arr; delete tally; for(str in mapping) { num=""+mapping[str]; tnum=substr(num,1,i); tnums[tnum]=1; tally[tnum]++; arr[tnum,tally[tnum]]=str; } for(tn in tnums){ if(tally[tn]>1){ printf("%s :\t\t",tn); tn_len = length(tn); for(ti=1;ti<=tally[tn];ti++) { if(tn_len==length(arr[tn,ti]))alias="*"; else alias=""; printf (" %s%s", arr[tn,ti], alias);} printf("\n");  } }  }  }'
paul		7285
terry		83779
mick		6425
brian		27426
reggie		734443
john		5646
keith		53484
eric		3742
ringo		74646
george		436743
graham		472426
michael		6424235
ron		766
ronnie		766643
charlie		2427543
stuart		788278
pete		7383

Collisions:
766 :		 ron* ronnie
642 :		 mick michael
73 :		 reggie pete
76 :		 ron ronnie
64 :		 mick michael
2 :		 brian charlie
4 :		 george graham
5 :		 john keith
6 :		 mick michael
7 :		 paul reggie ringo ron ronnie stuart pete
jsp@machina:~$ 

The script first lists all the conversions in full, then shortens them one digit at a time from the end and checks for collisions at each length. The digits on the left are the shared leading digits over which the collision occurs.

We can see that collisions occur when we consider only the first three digits (or fewer) of the converted extension (mick and michael both start with 642), so for these names a four-digit extension number would be enough to unambiguously select a person. (The keypad letter-to-number mapping is a little like a hash function, and this is a way of checking how big each bucket will be.)

The script additionally labels names with an asterisk if they are fully aliased - as is the case with ron and ronnie in the example above. In such a case it may be an idea to terminate the extension with a hash/pound sign (#) - so 766# would unambiguously identify ron and 7666 would identify ronnie.
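To list just the names that would need the # terminator, one can look for codes that are a complete prefix of another, longer code. The sketch below assumes "name code" lines on stdin, as the GAWK version produces; the function name find_prefix_aliases is hypothetical, and names with identical full codes are deliberately ignored.

```shell
#!/bin/sh
# Hypothetical helper: report each name whose whole code is a prefix of
# another name's longer code, with the # terminator appended.
find_prefix_aliases() {
    awk 'NF == 2 { code[$1] = $2 }
         END {
             for (a in code)
                 for (b in code)
                     if (length(code[a]) < length(code[b]) &&
                         index(code[b], code[a]) == 1)
                         print code[a] "#", a, "(prefix of", b, code[b] ")"
         }'
}

printf 'ron 766\nronnie 766643\npete 7383\n' | find_prefix_aliases
# prints: 766# ron (prefix of ronnie 766643)
```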

The collision detection script can also be accessed here so you can do:

jsp@machina:~$ echo "john paul george ringo pete stuart mick keith ron brian charlie john eric michael terry graham ronnie reggie" | gawk -f mapchars.gawk | gawk -f mapchars_collisions.gawk	

This gives the same output as before.

I wouldn't propose this as the only or primary extension numbering scheme, but it may be a handy speed-dialling scheme alongside a properly numbered set of extensions. One could program the PBX with a primary set of extensions starting with, say, 1, and a set of alias extensions all prefixed with, say, 3. Dialling 37383 would then speed dial pete's extension, 36425 would dial mick's and 36424 would dial michael's. This way you don't have to remember the extension numbers, just the 3 prefix, and the dialpad letters do the rest.
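Generating the alias list is a one-liner once the codes exist. This sketch assumes "name code" lines on stdin; alias_ext and its two arguments (the prefix digit and the number of code digits to keep) are names I made up for illustration. How the aliases are then loaded into a particular PBX is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: alias_ext PREFIX DIGITS
# Prints each name with PREFIX followed by the first DIGITS digits of its code.
alias_ext() {
    awk -v prefix="$1" -v n="$2" 'NF == 2 { print $1, prefix substr($2, 1, n) }'
}

printf 'pete 7383\nmick 6425\nmichael 6424235\n' | alias_ext 3 4
# prints:
# pete 37383
# mick 36425
# michael 36424
```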
If there are collisions that cannot be avoided, say between someone called aaaaaaa (2222222) and someone called bbbbbbb (2222222), then I propose an IVR prompt saying "For aaaaaaa press 1 and for bbbbbbb press 2, to return to the main menu press 0" once the full sequence has been entered. Anyway, that's how I did it. It seemed to work.
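To find out which names would need such a prompt, the full codes can be grouped and any code shared by more than one name listed with a menu digit per name. Again this is only a sketch: ivr_menu and its output format are my own assumptions, and it prints plain text rather than configuration for any real IVR.

```shell
#!/bin/sh
# Hypothetical helper: reads "name code" lines and, for every code shared
# by two or more names, prints the code followed by a numbered menu.
ivr_menu() {
    awk 'NF == 2 {
             n = ++count[$2]          # names seen so far with this code
             names[$2, n] = $1
         }
         END {
             for (c in count) {
                 if (count[c] < 2) continue
                 line = c ":"
                 for (i = 1; i <= count[c]; i++)
                     line = line " " i "=" names[c, i]
                 print line " 0=main menu"
             }
         }'
}

printf 'aaaaaaa 2222222\nbbbbbbb 2222222\npete 7383\n' | ivr_menu
# prints: 2222222: 1=aaaaaaa 2=bbbbbbb 0=main menu
```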

Bonne chance.

Version history:

Fri 13 July 2012 19:57 UTC: initial version
Fri 16 January 2015 21:19 UTC: added keypad image