
Re: arabic ispell



> I tried once to create a Hebrew ispell dictionary (see
> http://ivrix.org.il/projects/spell-checker/) but never finished the project.
> I'm not aware of any other complete Hebrew ispell dictionary...

Yours is the only one that I am aware of (I have it listed on the
ispell Web page).

> Unlike languages like (say) English, where you have a very small number of
> forms to each noun or verb, Hebrew and Arabic have complex conjugation and
> inflection rules based on 3-letter "roots", rather than basewords and affixes.

This is somewhat similar to Turkish and Finnish.  In those languages,
words are formed by gluing together small morphemes, so a single,
tremendously complex word can mean something like "the gray-haired
man who sold a cow to my brother-in-law's neighbor".

I have a paper in my files on the subject of spell-checking such languages:

@Article{Oflazer96a,
	author =	"Kemal Oflazer",
	title =		"Error-tolerant Finite State Recognition with
			 Applications to Morphological Analysis and
			 Spelling Correction",
	journal =	"Computational Linguistics",
	volume =	22,
	number =	1,
	month =		mar,
	year =		1996,
	pages =		"73--89",
	note =		"Previously published as TR cmp-lg/9504031",
	keywords =	"spell-checking, approximate search",
}

The basic idea is that you represent the dictionary as a huge
finite-state machine.  The inflection and conjugation rules, as well
as the Hebrew particles, can then be represented as relatively simple
state transitions.  
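
To make that concrete, here is a minimal sketch, not taken from
Oflazer's paper.  It assumes a toy English word list and a single
shared suffix machine; the names State, build, and accepts are
hypothetical.  Each stem ends in an empty (epsilon) transition into
the suffix machine, so an inflection rule is stored once rather than
once per word.  Spelling changes at the stem boundary are ignored.

```python
class State:
    """One state of the recognizer."""
    def __init__(self):
        self.edges = {}      # character -> next State
        self.eps = []        # empty transitions (consume no input)
        self.accept = False  # True if a word may end here


def add_path(start, text):
    """Follow or create transitions spelling `text`; return the last state."""
    state = start
    for ch in text:
        state = state.edges.setdefault(ch, State())
    return state


def build(stems, suffixes):
    """Build a machine accepting any stem followed by an optional suffix."""
    root = State()
    suffix_root = State()
    suffix_root.accept = True            # a bare stem is itself a word
    for suffix in suffixes:
        add_path(suffix_root, suffix).accept = True
    for stem in stems:
        # One epsilon edge per stem replaces a copy of every suffix.
        add_path(root, stem).eps.append(suffix_root)
    return root


def closure(states):
    """Extend a set of states with everything reachable via epsilon edges."""
    seen = set(states)
    stack = list(states)
    while stack:
        for target in stack.pop().eps:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(root, word):
    """Return True if the machine recognizes `word`."""
    current = closure({root})
    for ch in word:
        current = closure({s.edges[ch] for s in current if ch in s.edges})
        if not current:
            return False
    return any(s.accept for s in current)
```

With stems "walk", "talk", "jump" and suffixes "s", "ed", "ing", the
machine accepts "walked" and "jumping" but rejects "walkz" and a bare
"ed".  Prefix particles could presumably be handled the same way,
with a particle machine placed in front of the stems.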

The approach also works for affix-based languages such as English.
It's fast, requires relatively little storage, and allows correction
of multiple errors.  Unfortunately, Oflazer's paper doesn't give a lot
of detail.  That, combined with the small amount of time I have for
working on ispell, has meant that I haven't been able to look into
adopting the idea.
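
As a rough illustration of how such a machine can correct multiple
errors, here is a sketch that walks a trie depth-first, carries one
row of the standard Levenshtein edit-distance table along each path,
and abandons a path as soon as every entry in the row exceeds the
error limit.  This is the well-known trie-plus-edit-distance
technique, written in the spirit of the approach; it is not claimed
to be Oflazer's exact algorithm.  The names build_trie and suggest
and the word list are made up for the example.

```python
def build_trie(words):
    """Store words in a nested-dict trie; the key None marks end of word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def suggest(trie, word, max_errors):
    """Return (candidate, distance) pairs within max_errors edits of word."""
    results = []

    def walk(node, prefix, prev_row):
        # prev_row[i] is the edit distance between prefix and word[:i].
        if None in node and prev_row[-1] <= max_errors:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.items():
            if ch is None:
                continue
            row = [prev_row[0] + 1]
            for i in range(1, len(word) + 1):
                substitution = prev_row[i - 1] + (word[i - 1] != ch)
                row.append(min(row[i - 1] + 1,   # letter missing from candidate
                               prev_row[i] + 1,  # extra letter in candidate
                               substitution))
            # Cut-off: no extension of this path can get back under the limit.
            if min(row) <= max_errors:
                walk(child, prefix + ch, row)

    walk(trie, "", list(range(len(word) + 1)))
    return sorted(results, key=lambda pair: (pair[1], pair[0]))
```

The cut-off is what keeps the search cheap: most branches of the
dictionary are dropped after only a few letters.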
-- 
    Geoff Kuenning   geoff at cs dot hmc dot edu   http://www.cs.hmc.edu/~geoff/

If you find yourself wondering whether your partner really consented to
having sex with you...then they didn't.