
Re: arabic ispell



On Wed, Oct 17, 2001, Nadim Shaikli wrote about "Re: arabic ispell":
> > There is already a Hebrew ispell dictionary, so clearly someone else
> > has found a way to surmount the problem.  But I would need to know a
> > few more things.

I once tried to create a Hebrew ispell dictionary (see
http://ivrix.org.il/projects/spell-checker/) but never finished the project.
I'm not aware of any other complete Hebrew ispell dictionary...

The bidi issues are irrelevant to a spell checker, which should read the
input in "logical" order. But Semitic languages (like Hebrew, and also
Arabic) pose other problems when one tries to create an ispell-style
word list.

Unlike a language such as English, where each noun or verb has only a
handful of forms, Hebrew and Arabic have complex conjugation and inflection
rules based on 3-letter "roots", rather than on base words and affixes.
Creating a word list by hand, where each root has dozens of words derived
from it, is a sure way to make mistakes and to forget many of the words.
Instead I tried to write a program that takes a base noun and inflects it
in all possible ways (this is not easy, because in Hebrew there are nearly
a hundred different cases to handle) and generates a "full" list of nouns.
Dan Kenigsberg then did the same thing for verbs. We then added many
other words we found in online newspapers and the like. But our word list
is *very* far from complete: it doesn't handle all the cases yet, and
the base-word list is very short.
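A minimal sketch of the noun-expansion idea, in Python. It is not the
program described above: the suffix list and the names `attach` and
`inflect` are hypothetical, and they cover only one regular pattern in
unpointed spelling (real Hebrew needs the many cases mentioned above). One
real detail it does handle: a final-form letter at the end of the stem
(ך ם ן ף ץ) turns into its regular form once a suffix is attached.

```python
# Each Hebrew final (sofit) letter and its regular, non-final form.
FINAL_TO_REGULAR = {"ך": "כ", "ם": "מ", "ן": "נ", "ף": "פ", "ץ": "צ"}

# Hypothetical, heavily simplified suffix set for one regular masculine
# noun pattern: bare form, plural -im, and possessive suffixes
# (my, your m./f. sg., his, her, our, your m./f. pl., their m./f.).
# Suffixes are written with their own final letters already in place.
MASC_SUFFIXES = ["", "ים", "י", "ך", "ו", "ה", "נו", "כם", "כן", "ם", "ן"]


def attach(stem: str, suffix: str) -> str:
    """Append a suffix, de-finalizing the stem's last letter if needed."""
    if not suffix or not stem:
        return stem + suffix
    last = stem[-1]
    return stem[:-1] + FINAL_TO_REGULAR.get(last, last) + suffix


def inflect(base: str, suffixes: list[str] = MASC_SUFFIXES) -> list[str]:
    """Expand one base noun into one word-list entry per suffix."""
    return [attach(base, s) for s in suffixes]
```

For example, `inflect("מלך")` produces מלך, מלכים, מלכי and so on: every
base noun multiplies into a dozen entries, which is why generating the
list mechanically is far more reliable than typing it in.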

There's another problem I faced and to this day don't know how to fix
(and which makes this whole Hebrew word list unusable): in Hebrew, the
particles (like the English "and", "or", "at", "in", "the", etc.) are
single letters (mem, kaf, bet, etc.) attached to the beginning of the word.
So I would like ispell to accept any valid word with (say) a bet in front
of it as valid too. So far I haven't been able to do so, even though I did
try to create a matching hebrew.aff file; I probably did something wrong.
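A minimal sketch of the acceptance rule wanted here, written as a
stand-alone Python check rather than as an ispell affix file (it does not
show how to express this in hebrew.aff). The prefix letters follow the
traditional mnemonic "משה וכלב" (mem, shin, he, vav, kaf, lamed, bet);
`is_valid` and the tiny lexicon are hypothetical. It strips prefix letters
one at a time, so stacked particles such as vav+bet are accepted, but it
does not enforce the order in which particles may combine, so it will also
accept some impossible stacks.

```python
# Single-letter prefix particles: mem, shin, he, vav, kaf, lamed, bet.
PREFIXES = set("משהוכלב")


def is_valid(word: str, lexicon: set[str]) -> bool:
    """Accept a word if it, or it with leading particles removed, is listed."""
    if word in lexicon:
        return True
    # Peel off one particle and retry; recursion handles stacked prefixes.
    # A lone particle letter is never accepted as a word by itself.
    if len(word) > 1 and word[0] in PREFIXES:
        return is_valid(word[1:], lexicon)
    return False
```

With a lexicon containing only ספר ("book"), both בספר ("in a book")
and ובספר ("and in a book") are accepted, while תספר is rejected because
tav is not a prefix particle.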

-- 
Nadav Har'El                        |    Thursday, Oct 18 2001, 1 Heshvan 5762
nyh at math dot technion dot ac dot il             |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |Why do doctors call what they do
http://nadav.harel.org.il           |practice? Think about it.