Staredit Network > Forums > Technology & Computers > Topic: Spell Checker Algorithm
Spell Checker Algorithm
Jan 4 2011, 7:56 pm
By: A_of-s_t  

Jan 4 2011, 7:56 pm A_of-s_t Post #1

aka idmontie

I've never had to make a spell checker before, so I was wondering if anyone knew the best algorithm for this. It will be written in Javascript, and no, I don't want a premade spell checker. If anyone can post a link to some different algorithms, it would be appreciated.

And yes, I am googling it right now, I was just wondering if any one here knew some algorithms.

Thanks.

EDIT: Oh, and time efficiency is important.



Personal GitHub
Starcraft GitHub Organization - Feel free to request member status!
TwitchTV

Jan 4 2011, 8:26 pm The Starport Post #2



Algorithm? Isn't just a simple lookup table?

Unless you want to fill up your own dictionary with all the words in the english language, I'd suggest getting ahold of someone else's if you can.



None.

Jan 4 2011, 8:38 pm Decency Post #3



Yeah pretty sure Tux is right, there's no algorithm. It's a dictionary lookup and if there is an algorithm, it's just based off adding suffixes to words to form plurals, participles, etc., while keeping track of exceptions. This is pretty much the last thing I'd want to create.



None.

Jan 4 2011, 9:02 pm A_of-s_t Post #4

aka idmontie

Finding out if a word doesn't exist in the dictionary is the easy part, I was talking about the part where it suggests what word you meant to type if it doesn't exist in the look up table.



Personal GitHub
Starcraft GitHub Organization - Feel free to request member status!
TwitchTV

Jan 4 2011, 9:39 pm Sand Wraith Post #5

she/her

You could brute-force it by checking for the adjacent keys of each letter of the word.

Example: I wanted to spell "dog", but typed "eog" instead. "eog" is not in the dictionary, go through each letter of "eog" (e, o, g) to find respective adjacent letter-keys (e's adjacent keys would be w, s, d, f, r; o's adjacent keys would be p, l, k, i), and if any of the new combination of letters makes a valid word from the dictionary, suggest that word.

But, that doesn't sound very efficient. :x

(wog), sog, (dog), (fog), rog
eig, ekg, elg, epg
eot, eor, eof, eov, eob, (eon), eoh, eoy

The next (more inefficient but comprehensive) method would be to replace every single letter of the word with every single letter from the alphabet.
But that would still be slow. :x




Jan 4 2011, 9:46 pm A_of-s_t Post #6

aka idmontie

Quote from Sand Wraith
The next (more inefficient but comprehensive) method would be to replace every single letter of the word with every single letter from the alphabet.
But that would still be slow. :x
That's what most spell checkers do from what my research shows. I like your idea though.

I've also noticed that some methods add and subtract letters, and swap letters, along with replacing letters. They then repeat this process two or three times.

But that sounds like a lot of checking...



Personal GitHub
Starcraft GitHub Organization - Feel free to request member status!
TwitchTV

Jan 4 2011, 10:34 pm The Starport Post #7



Oh word suggest? Maybe you could dissect a given word into its syllables, then check alternate spellings for given syllables and return the resulting set of words that also happen to exist in the dictionary. Or find similar-sounding or closely-spelled syllables, try them out, and check if they produce actual words as well.

If it's only 1 word at a time, it shouldn't be hugely problematic performance wise.

Post has been edited 2 time(s), last time on Jan 4 2011, 10:41 pm by Tuxedo-Templar.



None.

Jan 5 2011, 1:57 am NudeRaider Post #8

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

How about making a 2nd row in the word database which has an integer value associated to each word. In the first step you look for the checked word itself and if it wasn't found (e.g. typo) you compare the 2nd row with the integer value of the checked word and if it's only different by a small amount suggest this word.

This would be among the fastest methods possible, I believe, but the would probably also generate false positives - depending on the algorithm creating the integer. The problem is finding a clever algorithm that creates a useful integer value for you. Ideally it'd be something like a MD5 hash where every word has a unique number but similar words (differ only in 1 or 2 letters) have close values. I'm not sure if something like this exists or is even possible, but if you're interested I can put more thought / research into the idea.




Jan 5 2011, 2:00 am O)FaRTy1billion[MM] Post #9

👻 👾 👽 💪

The point of something like MD5 is so similar things have drastically different results ... You'd probably have to make your own hashing algorithm.



TinyMap2 - Latest in map compression! ( 7/09/14 - New build! )
EUD Action Enabler - Lightweight EUD/EPD support! (ChaosLauncher/MPQDraft support!)
EUDDB - topic - Help out by adding your EUDs! Or Submit reference files in the References tab!
MapSketch - New image->map generator!
EUDTrig - topic - Quickly and easily convert offsets to EUDs! (extended players supported)
SC2 Map Texture Mask Importer/Exporter - Edit texture placement in an image editor!
\:farty\: This page has been viewed [img]http://farty1billion.dyndns.org/Clicky.php?img.gif[/img] times!

Jan 5 2011, 2:06 am NudeRaider Post #10

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

Quote from O)FaRTy1billion[MM]
The point of something like MD5 is so similar things have drastically different results ...
I know how MD5 works. My point was that it should create a unique ID for each word (like MD5) but NOT a completely different ID for similar words (UNlike MD5)

Quote from O)FaRTy1billion[MM]
You'd probably have to make your own hashing algorithm.
Quote from NudeRaider
The problem is finding a clever algorithm that creates a useful integer value for you.





Jan 5 2011, 2:39 am A_of-s_t Post #11

aka idmontie

Quote from NudeRaider
Quote from O)FaRTy1billion[MM]
The point of something like MD5 is so similar things have drastically different results ...
I know how MD5 works. My point was that it should create a unique ID for each word (like MD5) but NOT a completely different ID for similar words (UNlike MD5)

Quote from O)FaRTy1billion[MM]
You'd probably have to make your own hashing algorithm.
Quote from NudeRaider
The problem is finding a clever algorithm that creates a useful integer value for you.
There appears to be something like this already: http://en.wikipedia.org/wiki/Levenshtein_distance . I'm currently looking into it. There likewise exists the function ratio() in the python library that returns a ratio of whether two strings are similar.

Is this what you are talking about?

Btw, thanks everyone for the suggestions. Most of these I've never even thought of.



Personal GitHub
Starcraft GitHub Organization - Feel free to request member status!
TwitchTV

Jan 5 2011, 12:30 pm NudeRaider Post #12

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

Well, kinda, yes. But the levenshtein distance requires you to run a function involving both strings which I was trying to avoid because that's far slower than comparing a value with a row in a table.
However there's several optimizations, most of which can be applied to your problem since you're only interested in near distances (close matches). You'll have to try it out if performance is good enough for you.

EDIT:
These seem also relevant:
http://en.wikipedia.org/wiki/Levenshtein_automaton
http://en.wikipedia.org/wiki/Hash_function#Finding_similar_records

Post has been edited 1 time(s), last time on Jan 5 2011, 12:41 pm by NudeRaider.




Jan 5 2011, 8:04 pm A_of-s_t Post #13

aka idmontie

My dictionary of words also has a frequency associated with each word, so, I'm thinking of ordering possible word suggestions by frequency. Good idea, or no?



Personal GitHub
Starcraft GitHub Organization - Feel free to request member status!
TwitchTV

Jan 5 2011, 8:17 pm DavidJCobb Post #14



Typos are generally of a similar length to the misspelled words.

DOG
FOGG

So take all words that are within two characters of input's length, and narrow it down from there, or something. Prefixes and suffixes could complicate this, though.

FAMNING

Did you mean:
FARMING
DAMNING
CANNING
FAMINE

etc.



None.

Jan 5 2011, 8:22 pm A_of-s_t Post #15

aka idmontie

Quote from DavidJCobb
Typos are generally of a similar length to the misspelled words.

DOG
FOGG

So take all words that are within two characters of input's length, and narrow it down from there, or something. Prefixes and suffixes could complicate this, though.

FAMNING

Did you mean:
FARMING
DAMNING
CANNING
FAMINE

etc.


Wouldn't FAMNING be two mutations away from FARMING? That would be a lot of mutations to check. Not sure if there is a better way. And technically DAMNING is only one mutation away, so it should be first on the list.



Personal GitHub
Starcraft GitHub Organization - Feel free to request member status!
TwitchTV

Options
  Back to forum
Please log in to reply to this topic or to report it.
Members in this topic: None.
[01:56 am]
Oh_Man -- cool bit of history, spellsword creator talking about the history of EUD ^
[09:24 pm]
Moose -- denis
[05:00 pm]
lil-Inferno -- benis
[10:41 am]
v9bettel -- Nice
[2024-4-19. : 1:39 am]
Ultraviolet -- no u elky skeleton guy, I'll use em better
[2024-4-18. : 10:50 pm]
Vrael -- Ultraviolet
Ultraviolet shouted: How about you all send me your minerals instead of washing them into the gambling void? I'm saving up for a new name color and/or glow
hey cut it out I'm getting all the minerals
[2024-4-18. : 10:11 pm]
Ultraviolet -- :P
[2024-4-18. : 10:11 pm]
Ultraviolet -- How about you all send me your minerals instead of washing them into the gambling void? I'm saving up for a new name color and/or glow
[2024-4-17. : 11:50 pm]
O)FaRTy1billion[MM] -- nice, now i have more than enough
Please log in to shout.


Members Online: Roy, lil-Inferno, Ultraviolet, Oh_Man