
Re: Arabic spellchecker



Wa alaikum asalam wa rahmatullaah

On Tuesday 15 November 2005 22:20, Ahmad Khalifa wrote:
 
> You mean you already have such a wordlist ? I would be interested
> in taking a look at it, if you don't mind. I would like to see how it
> performs in OO.o.

Do you want to see the source or the results? The source is part of a personal
project that I want to release. It contains more than 30 C++ files. It's not
ready yet, but I would love it if you could test the whole thing.

The results span 32 files that together form a comprehensive word index. They
are generated automatically in XML and HTML. I can show you some samples if you
want.
 
 
> Right now, ammar is working on elzubeir's "Arabic Grammer Rules"
> document,
> http://cvs.arabeyes.org/viewcvs/projects/duali/doc/arabic-grammar

As I mentioned earlier, when it comes to computation there *might* be
shortcuts available. The problem I was referring to was producing a
transliteration from Arabic to English. When the user enters:

مُوسى
it will construct:

Mousa

if the input is:

لِلقمرِ ضياء
the result will be:

lilqamari dhiyaa-a

This is fine. However, if the input is:

لِلشمس بريق

or:

والشمس والضحى

I had a problem, and it irritated me. A naive transliteration outputs:

LILshamsi bareeq
walshams waldhuHaa.

The laams here should be silent (hidden). We have two different grammatical rules,
but only one small function is needed to produce the proper result:

lish-shamsi bareeq
wash-shamsi wadh-dhuHaa.

It appeared to me that regardless of the two grammatical rules, there are some
letters that:

1. Appear after the laam in both cases.

2. Hide the laam by doubling themselves.

Examples are Seen, Sheen, THaad, dhaad, ra, nuun, etc. So I don't have to write
two functions that represent the two grammatical cases; I only need to identify
those letters and check whether one of them follows xxLL or xxAL, where xx is an
optional prefix of one or two letters, such as wa, fa, ka, etc.
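
The single check described above could be sketched roughly as follows. This is
only an illustration in Python, not the C++ class from my project. The names
romanized_prefix, PREFIXES and SUN_LETTERS are made up for the example, the
prefix "bi" is an assumed member of the "etc.", and the romanization table is
an assumption apart from the spellings already used in this message.

```python
import re

ALIF = "\u0627"  # Arabic letter alif
LAAM = "\u0644"  # Arabic letter laam

# Arabic diacritics (tashkeel), U+064B..U+0652, removed before matching.
DIACRITICS = re.compile("[\u064B-\u0652]")

# Hypothetical romanization of the 14 letters that hide a preceding laam.
# Only s, sh, dh (dhaad), r and n echo spellings used in this message.
SUN_LETTERS = {
    "ت": "t", "ث": "th", "د": "d", "ذ": "dh", "ر": "r", "ز": "z", "س": "s",
    "ش": "sh", "ص": "S", "ض": "dh", "ط": "T", "ظ": "DH", "ل": "l", "ن": "n",
}

# Assumed one-letter prefixes (the "xx" part); "bi" is a guess at the "etc.".
PREFIXES = {"و": "wa", "ف": "fa", "ك": "ka", "ب": "bi"}

# Optional prefix (0-2 letters), then AL or LL, then one of the hiding letters.
PATTERN = re.compile(
    "^([%s]{0,2})(%s|%s)([%s])"
    % ("".join(PREFIXES), ALIF + LAAM, LAAM + LAAM, "".join(SUN_LETTERS))
)


def romanized_prefix(word):
    """Return the romanized start of the word with the laam hidden and the
    following letter doubled (e.g. "wash-"), or None if the rule does not
    apply to this word."""
    bare = DIACRITICS.sub("", word)
    m = PATTERN.match(bare)
    if m is None:
        return None
    prefix, marker, letter = m.groups()
    lead = "".join(PREFIXES[c] for c in prefix)
    if marker == LAAM + LAAM:
        lead += "li"   # the LL case (laam al-jarr)
    elif not lead:
        lead = "a"     # bare AL with no prefix in front of it
    # The hiding letter takes the place of the laam; the caller then
    # romanizes the rest of the word starting from that same letter.
    return lead + SUN_LETTERS[letter] + "-"
```

With this sketch, romanized_prefix("والشمس") gives "wash-" and
romanized_prefix("لِلشمس") gives "lish-", while a word such as "لِلقمرِ"
returns None and keeps its laam. It makes no attempt to tell a real AL or LL
from a word whose root happens to begin with the same letters.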

The first example above involves laam al-jarr, while the second involves AL atta`reef.
Two grammatical rules, and one computational rule handles both. I hope you will be
lucky enough to come across such cases, unrelated grammatically but closely related
computationally. If you do, insha-Allah, I would be very interested in taking a look,
if you don't mind. I've been talking about something you have never seen, and I know
this can be confusing. Insha-Allah I will post this transliteration class once I'm
done describing it.


Wishing you and your family peace and good health.


Salam,
Abdalla Alothman