Random Pronounceable Text Generator
JavaScript

This is a simple library that generates random, pronounceable words of a given length. For example:

  generate.word(8); // shadedus
  generate.word(4); // thim
  generate.word(8); // citurcho

It can also do sentences and paragraphs by combining words of different lengths.
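
The sentence API itself is not shown here, so the following is only a rough sketch of the idea. The name `sentence`, its parameters, and the 2 to 9 character length range are all assumptions made for illustration; `wordFn` stands in for a word generator such as generate.word.

```javascript
// Hypothetical helper, not the library's actual API: builds a sentence
// from words of varying lengths. `wordFn(n)` is assumed to return a
// word of n characters, like generate.word above.
function sentence(wordFn, wordCount) {
    var words = [], i, len, text;
    for (i = 0; i < wordCount; i++) {
        // Word lengths between 2 and 9 characters; the range is arbitrary.
        len = 2 + Math.floor(Math.random() * 8);
        words.push(wordFn(len));
    }
    text = words.join(' ');
    // Capitalize the first letter and end with a full stop.
    return text.charAt(0).toUpperCase() + text.slice(1) + '.';
}
```

A paragraph could be built the same way, by joining several such sentences.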


Why?

It was first written to generate random nicknames for anonymous users in Scrollback chat. This lets them chat without registering an account, and allows others to remember and address them naturally and distinctly.

We also use it extensively in mocks (for testing) to generate text that matches the characteristics of user-generated content.

Another possible use is in applications where passwords or keys are generated by the system but should be remembered by the user — most people will find it easier to remember a pronounceable word than a string of random characters.


Overview

From a corpus of English words, we extracted three lists: common bigrams (two-character sequences) that occur at the beginning of words, common trigrams that occur anywhere in a word, and common trigrams that occur at the end of a word.
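
The extraction script is not part of this write-up, so here is a minimal sketch of how such lists could be derived from a word list. The function name `buildTables` and the `minCount` cutoff (what counts as "common") are assumptions, not the script we actually used.

```javascript
// Sketch: derive the three n-gram tables from an array of words.
// `minCount` is a hypothetical threshold for what counts as "common".
function buildTables(words, minCount) {
    var initialCounts = {}, middleCounts = {}, finalCounts = {};

    function bump(table, key) {
        table[key] = (table[key] || 0) + 1;
    }

    words.forEach(function (w) {
        var word = w.toUpperCase(), i;
        // Skip anything that is not at least three plain letters.
        if (!/^[A-Z]{3,}$/.test(word)) return;
        bump(initialCounts, word.slice(0, 2));
        // Every trigram in the word, wherever it occurs.
        for (i = 0; i + 3 <= word.length; i++) {
            bump(middleCounts, word.slice(i, i + 3));
        }
        bump(finalCounts, word.slice(-3));
    });

    // Group common trigrams by their first two characters,
    // keeping only the third character in each list.
    function group(counts) {
        var out = {};
        Object.keys(counts).forEach(function (tri) {
            var key = tri.slice(0, 2);
            if (counts[tri] < minCount) return;
            (out[key] = out[key] || []).push(tri.charAt(2));
        });
        return out;
    }

    return {
        initialBigrams: Object.keys(initialCounts).filter(function (bi) {
            return initialCounts[bi] >= minCount;
        }),
        middleTrigrams: group(middleCounts),
        finalTrigrams: group(finalCounts)
    };
}
```

The output uses the same shape as the tables in the implementation: an array of bigrams, and two maps from a two-character prefix to the characters that may follow it.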

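The word generator starts with a random choice from the first list (the two initial characters of the word). It then fills out the rest of the word one character at a time: at each step it looks at the last two characters of the string built so far and appends the third character of a randomly picked trigram that begins with them. Trigrams come from the second list, except for the last character of the word, which comes from the third.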


Implementation

The code is in JavaScript (we’re a Node.js shop) but it’s easily portable to just about any language.

var initialBigrams = ["TH","OF","AN","IN","TO","CO", /* many more */ ],
    /* We group the trigrams by initial two characters for ease of lookup */
    middleTrigrams = {TH:['E','A','I','O','R'], AN:['D','T','Y','C','S','G','N','I','O','E','A','K'], /* more */},
    finalTrigrams = {TH:['E','O'],AN:['D','T','Y','S','G','O','E','A','K'], /* more */};

function pickRandomItem(array) {
    return array[Math.floor(array.length * Math.random())];
}

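function word(n) {
    var str = '', len, lookupDigram, candidates;

    while ((len = str.length) < n) {
        if (len < 2) {
            // If we are starting off, or we have backtracked to fewer
            // than two characters (see below), start over.
            str = pickRandomItem(initialBigrams);
            continue;
        }
        // The last two characters select which trigrams can follow.
        lookupDigram = str.slice(len - 2);
        if (len === n - 1) {
            // Only one character left: use a word-ending trigram.
            candidates = finalTrigrams[lookupDigram];
        } else {
            candidates = middleTrigrams[lookupDigram];
        }

        // The digram may be missing from the table altogether, so
        // check that the lookup succeeded before reading its length.
        if (candidates && candidates.length) {
            str += pickRandomItem(candidates);
        } else {
            // Occasionally, we might reach a state where there is
            // no possible way to continue. If that happens,
            // backtrack by three characters and try again.
            str = str.slice(0, Math.max(0, len - 3));
        }
    }
    // For n < 2 the initial bigram is already too long, so trim it.
    return str.slice(0, n).toLowerCase();
}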

The actual code can be found in the lib/generate.js file in the project at github.com/scrollback/scrollback. It also contains the full bigram and trigram lists we use.


Beginner

While the average running time of the code above is O(n), where n is the number of characters, the backtracking step means the worst case is unbounded: in theory the loop could keep backtracking forever. It is also ugly, since the choice of three characters to chop off is pretty arbitrary.

We could eliminate backtracking entirely by removing from middleTrigrams every character that can put the string in a blocked state, that is, a two-character ending with no entry to continue from. You will need to write a small script to do that. If you do, send a pull request to github.com/scrollback/scrollback and we may have something for you :-)
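
As a starting point, here is one way such a script might look. It is a sketch under assumptions, not a tested patch for the project: the name `pruneDeadEnds` is made up, and the rule it applies is deliberately conservative. A digram is treated as safe only if it has at least one final letter and at least one middle letter that leads to another safe digram, so it may remove more than is strictly necessary.

```javascript
// Sketch: prune the tables so that a generator using them never
// reaches a digram it cannot continue from.
function pruneDeadEnds(initialBigrams, middleTrigrams, finalTrigrams) {
    var pruned = {}, changed = true;

    // Safe = can end here (final letter exists) and can go on
    // (at least one middle letter survives).
    function isSafe(digram) {
        return !!(pruned[digram] && pruned[digram].length &&
                  finalTrigrams[digram] && finalTrigrams[digram].length);
    }

    // Work on a copy so the caller's tables are left untouched.
    Object.keys(middleTrigrams).forEach(function (digram) {
        pruned[digram] = middleTrigrams[digram].slice();
    });

    // Removing a letter can make another digram unsafe, so repeat
    // until a full pass removes nothing.
    while (changed) {
        changed = false;
        Object.keys(pruned).forEach(function (digram) {
            var kept = pruned[digram].filter(function (ch) {
                // Appending ch makes the last two characters digram[1] + ch.
                return isSafe(digram.charAt(1) + ch);
            });
            if (kept.length !== pruned[digram].length) {
                pruned[digram] = kept;
                changed = true;
            }
        });
    }

    return {
        // Words must also start from a safe digram.
        initialBigrams: initialBigrams.filter(isSafe),
        middleTrigrams: pruned
    };
}
```

With tables pruned this way, every letter picked from middleTrigrams leads to a digram that still has both a middle and a final continuation, so the backtracking branch should never run for words of three or more characters.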
