Re: Arabic spellchecker

Asalamu alaikum.

On Tuesday 15 November 2005 20:55, Ahmad Khalifa wrote:
> This is helpful if you want to collect a fat wordlist which
> describes the verb 'كتب'(KTB) and all its derivatives in 15 entries.

I did it exactly the same way, because it was feasible. ;)

I agree the approach is far from being organized.

> But, such a dictionary would not be very efficient given that a verb 
> such as 'كتب'(KTB) can be written as 1 entry with special flags.
> This is where its difficulty lies. Defining the AFFIX rules and
> writing a *flagged* wordlist.

This is a real problem.
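
To make the "one entry with special flags" idea concrete, here is a minimal sketch. The flag names, the rule table, and the unvowelled Latin skeletons (ktb, yktb, ...) are all made up for illustration; this is not the syntax of any real spellchecker's affix file.

```python
# Hypothetical affix table: flag -> list of (prefix, suffix) pairs.
# Letters are an unvowelled Latin stand-in for the Arabic script.
AFFIX_RULES = {
    "P": [("y", ""), ("t", ""), ("n", "")],    # illustrative prefixes
    "S": [("", "t"), ("", "na"), ("", "wa")],  # illustrative suffixes
}

def expand(stem, flags):
    """Expand one flagged entry into the surface forms its flags allow."""
    forms = [stem]
    for flag in flags:
        for prefix, suffix in AFFIX_RULES[flag]:
            forms.append(prefix + stem + suffix)
    return forms

# One flagged entry, "ktb/PS", stands in for seven plain wordlist entries.
print(expand("ktb", "PS"))
```

The table only prepends or appends letters, which is exactly the limitation that matters for Arabic.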


is the root for:

chances for finding a pragmatic way, or a decent pattern, could be difficult. Not
to mention that the AFFIX rules would be useless, in my humble opinion (don't let
that discourage you).

For fun, consider modern Arabic terms -- one that I can't forget was "maykanat"
(automating). The root is MKN (e.g., wallatheena inn makkannaahum fil ardh...).
The problem is that the yaa comes exactly in the middle of the root. The same goes
for kitaab: the alif comes in the middle of the root. If you could solve such cases,
I would be very much interested to see your work.
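
A rough sketch of why these cases fall outside prefix/suffix rules: the helper below (a hypothetical name, working on unvowelled Latin skeletons that I am assuming for the two words) splits a word around its root consonants and reports any letters that land inside the root.

```python
def split_around_root(root, word):
    """Return (prefix, inner, suffix) for `word` relative to `root`.

    `inner` holds the non-root letters that sit between the first and
    last root consonant. Returns None if the root consonants do not
    occur in order. Matching is greedy (leftmost), so it can mis-assign
    letters when an affix repeats a root consonant.
    """
    positions = []
    start = 0
    for consonant in root:
        idx = word.find(consonant, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + 1
    first, last = positions[0], positions[-1]
    inner = "".join(
        ch for i, ch in enumerate(word[first:last + 1], start=first)
        if i not in positions
    )
    return word[:first], inner, word[last + 1:]

print(split_around_root("mkn", "myknt"))  # yaa lands inside the root
print(split_around_root("ktb", "ktab"))   # alif lands inside the root
print(split_around_root("ktb", "yktb"))   # plain prefix, nothing inside
```

Whenever `inner` is non-empty, no rule that only adds letters at the start or end of a stem can produce the word from its root.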

I recently had to solve a problem related to laam al-jarr and al atta'reef. It was
a nightmare for me. The two cases involve unrelated grammatical rules, and I was
handling those rules side by side. In the end, I found that the computation I wanted
applies to both grammatical rules; I had not noticed that at first. :)

I don't think this is a disadvantage or an example of how complicated the language is.
It just shows how flexible the language can be. For programmers, that can be quite
a challenge: More flexibility = more cases to handle.

> Right now, we are in the 'Define Arabic as AFFIX rules' phase. Next we
> would be in the 'populate the flagged dictionary list' phase.

I wish you good luck insha-Allah.

Abdalla Alothman