JeanJoe
VMK Community Leader!
- Joined
- Apr 27, 2005
- Messages
- 2,027
Thanks to ElectricLime, who originally posted how to obtain the VMK dictionary as well as the tag method VMK uses to expand it, I've been playing with ways to work with the VMK dictionary, and have written some analysis tools. For people who might be curious, these are some things that I've been messing around with.
Because it was too long to post here, you can download the current full VMK dictionary here. (Edited to add: that's not a link that's kept current.) If anyone finds it of any use or interest, let me know.
The dictionary file that your computer downloads from VMK consists of a bunch of "root" words along with tags that specify how the root words can be modified. ElectricLime proposed, and I confirmed, that VMK uses "okspell" encoding. The tags modify the root words, for example by prefixing them with "re-", "in-", or "un-", or by suffixing them with "-ed", "-ing", "-th", "-ings", "-s", "-est", "-ive", "-ions", "-ly", "-ers", etc.
I wrote a script to expand all the root words to ALL the words allowable in VMK. Because they sometimes put tags on root words they shouldn't, like the "in-" tag onto "lalala", we can actually say things like "inlalala".
I expanded the script to do the following:
- download the dictionary file from VMK
- expand the dictionary from root words only, to ALL allowed words
Then, you can ask the program to:
- list all VMK words that contain a certain text
- do fancier searches, like all VMK words that begin or end a particular way, or even WAY more convoluted (see below)
- list all VMK words of a certain length
- take a block of text, and flag all words that are not allowed
Here's an example of how it looks:
Starting the program:
This shows that there are a total of 10,106 words that are allowed by VMK, expanded from the tags attached to 3,134 root words.
Welcome to Darwin!
~ $ /tmp/501/Cleanup\ At\ Startup/dictsearchtool3-164413260.01.pl.command; exit
3134 root words loaded from dictionary.
10106 unique words expanded from dictionary.
This listed all words that contained 'rip' anywhere within them.
Search regular expression, or length, TEXT, or <RET> to exit: rip
description description's descriptions trip trip's triple triply trips
This listed all words that END with -eth (the $ tells the program that matches have to be at the end).
Search regular expression, or length, TEXT, or <RET> to exit: eth$
cometh giveth liveth taketh teeth
OK, why you'd want to, I don't know. But here are all the allowed words in the VMK dictionary whose 3rd letter is 'a' AND that end with either 'f' or 'h'.
Search regular expression, or length, TEXT, or <RET> to exit: ^.{2}a.*[fh]$
branch coach crash dearth flash health hearth leaf search smash staff teach yeah
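The three searches above are all plain regular-expression matches against the expanded word list. A rough Python equivalent, as a sketch only (the function name `search_words` and the tiny sample list are made up for illustration):

```python
import re


def search_words(words, pattern):
    """Return the words in which the regular expression matches anywhere."""
    regex = re.compile(pattern)
    # re.search matches anywhere in the word; ^ and $ anchor when asked for.
    return [word for word in words if regex.search(word)]


if __name__ == "__main__":
    # Small hand-picked sample, not the real 10,106-word list.
    words = ["branch", "cometh", "description", "leaf", "teeth", "trip"]
    print(search_words(words, "rip"))            # contains "rip"
    print(search_words(words, "eth$"))           # ends with "eth"
    print(search_words(words, "^.{2}a.*[fh]$"))  # 3rd letter "a", ends in f or h
```

Using an unanchored search by default is what makes a bare "rip" behave like "contains", while ^ and $ let you pin the match to the start or end of the word.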
This tells the program to list all allowed words that are exactly 15 characters long.
Search regular expression, or length, TEXT, or <RET> to exit: 15
adventureland's alternativeness appropriateness comfortableness congratulations entertainment's environmentally fortuneteller's imaginativeness interestingness saint patrick's understandingly
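Since the same prompt accepts a regular expression, a length, TEXT, or an empty line, the program has to decide what you typed. One way that decision could be made, sketched in Python (the names `classify_query` and `words_of_length` are hypothetical, and the order of the checks is my assumption here):

```python
import re


def classify_query(query):
    """Decide how to treat one line typed at the search prompt."""
    if query == "":
        return ("exit", None)    # bare <RET> quits
    if query == "TEXT":
        return ("text", None)    # switch to block-of-text checking
    if re.fullmatch(r"[0-9]+", query):
        return ("length", int(query))  # all digits means a word length
    return ("regex", query)      # anything else is a search pattern


def words_of_length(words, length):
    """Return the entries that are exactly `length` characters long."""
    return [word for word in words if len(word) == length]


if __name__ == "__main__":
    print(classify_query("15"))
    print(words_of_length(["saint patrick's", "trip", "adventureland's"], 15))
```

Note that the length counts every character in the entry, which is why "saint patrick's" (space and apostrophe included) shows up in the 15-character list above.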
The program took a block of text I pasted in, and then flagged the words that are not allowed by VMK.
Search regular expression, or length, TEXT, or <RET> to exit: TEXT
Text to test against dictionary, EXIT when done
When you wish upon a star
Makes no difference who you are
Anything your heart desires
Will come to you
If your heart is in your dream
No request is too extreme
When you wish upon a star
As dreamers do
Fate is kind
She brings to those who love
The sweet fulfillment of
Their secret longing
Like a bolt out of the blue
Fate steps in and sees you through
When you wish upon a star
Your dreams come true
EXIT
########################
when you wish upon a star
makes no difference who you are
anything your heart desires
will come to you
if your heart is in your dream
no request is too extreme
when you wish upon a star
as dreamers do
fate is kind
she brings to those who [##love##]
the sweet [##fulfillment##] of
their secret longing
like a [##bolt##] out of the blue
fate steps in and sees you through
when you wish upon a star
your dreams come true
########################
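The flagging shown above boils down to lowercasing the text, dropping punctuation, and checking each word against the expanded list. A minimal Python sketch of that idea, assuming words are made of letters and apostrophes only (the function name `flag_text` and the tiny `allowed` set are made up for illustration):

```python
import re


def flag_text(text, allowed):
    """Lowercase the text, drop punctuation, and wrap disallowed words in [##...##]."""
    flagged_lines = []
    for line in text.splitlines():
        # Keep runs of letters and apostrophes; anything else acts as a word break.
        tokens = re.findall(r"[a-z']+", line.lower())
        flagged_lines.append(
            " ".join(tok if tok in allowed else f"[##{tok}##]" for tok in tokens)
        )
    return "\n".join(flagged_lines)


if __name__ == "__main__":
    # Tiny stand-in for the expanded dictionary.
    allowed = {"she", "brings", "to", "those", "who"}
    print(flag_text("She brings to those who love", allowed))
```

Keeping `allowed` as a set makes each lookup fast even with 10,106 words. This sketch checks one word at a time, so multi-word entries such as "saint patrick's" would need extra handling.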
And this is how the VMK dictionary handles more complex text.
Search regular expression, or length, TEXT, or <RET> to exit: TEXT
Text to test against dictionary, EXIT when done
Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation or any nation so conceived and so dedicated can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting-place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But in a larger sense, we cannot dedicate, we cannot consecrate, we cannot hallow this ground. The brave men, living and dead who struggled here have consecrated it far above our poor power to add or detract. The world will little note nor long remember what we say here, but it can never forget what they did here.
It is for us the living rather to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us--that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion--that we here highly resolve that these dead shall not have died in vain, that this nation under God shall have a new birth of freedom, and that government of the people, by the people, for the people shall not perish from the earth.
EXIT
########################
[##four##] score and [##seven##] [##years##] ago our [##fathers##] brought [##forth##] on this [##continent##] a new [##nation##] [##conceived##] in liberty and [##dedicated##] to the [##proposition##] that all [##men##] are created equal
now we are [##engaged##] in a great [##civil##] war testing whether that [##nation##] or any [##nation##] so [##conceived##] and so [##dedicated##] can long [##endure##] we are met on a great [##battlefield##] of that war we have come to [##dedicate##] a [##portion##] of that field as a final resting place for those who here gave their lives that that [##nation##] might live it is altogether [##fitting##] and proper that we should do this but in a [##larger##] sense we cannot [##dedicate##] we cannot [##consecrate##] we cannot [##hallow##] this ground the brave [##men##] living and dead who [##struggled##] here have [##consecrated##] it far above our poor power to add or [##detract##] the world will little note nor long remember what we say here but it can never forget what they did here
it is for us the living rather to be [##dedicated##] here to the [##unfinished##] work which they who fought here have thus far so [##nobly##] advanced it is rather for us to be here [##dedicated##] to the great task remaining before us that from these [##honored##] dead we take [##increased##] [##devotion##] to that cause for which they gave the last full measure of [##devotion##] that we here [##highly##] [##resolve##] that these dead shall not have [##died##] in [##vain##] that this [##nation##] under [##god##] shall have a new [##birth##] of freedom and that [##government##] of the people by the people for the people shall not [##perish##] from the earth
########################
Search regular expression, or length, TEXT, or <RET> to exit:
It'd be easy enough to have a separate list of "dictionary dances" that would substitute words not allowed by the VMK dictionary with commonly used "dances." Of course, I, as a law-abiding VMK individual who would never try to bypass the VMK dictionary, would never ever ever consider doing something like that.
Go ahead and say it... I am a geek, old, lame... AND PROUD!
