Re: Arabic spellchecker
- To: Ahmad Khalifa <ahmad at khalifa dot ws>
- Subject: Re: Arabic spellchecker
- From: Abdalla Alothman <abdalla at pheye dot net>
- Date: Wed, 16 Nov 2005 09:57:49 +0300
- Cc: Development Discussions <developer at arabeyes dot org>
- Organization: Pheye Technologique, GT&C
- User-agent: KMail/1.8
Wa alaikum asalam wa rahmatullaah
On Tuesday 15 November 2005 22:20, Ahmad Khalifa wrote:
> You mean you already have such a wordlist ? I would be interested
> in taking a look at it, if you don't mind. I would like to see how it
> performs in OO.o.
Do you want to see the source or the results? The source is part of a personal
project that I wanted to release. It contains more than 30 C++ files. It's not
ready, but I would love it if you could test it as a whole.
The results are within 32 files; it's a comprehensive word index. They are
automatically generated in XML and HTML. I can show you some samples if you
want.
> Right now, ammar is working on elzubeir's "Arabic Grammer Rules"
> document,
> http://cvs.arabeyes.org/viewcvs/projects/duali/doc/arabic-grammar
As I mentioned earlier, when it comes to computation there *might* be
shortcuts available. The problem I was referring to was to produce
transliteration from Arabic to English. When the user enters:
مُوسى
it will construct:
Mousa
if the input is:
لِلقمرِ ضياء
the result will be:
lilqamari dhiyaa-a
This is fine. However, if the input is:
لِلشمس بريق
or:
والشمس والضحى
I had a problem, and it irritated me. A plain letter-by-letter conversion outputs:
LILshamsi bareeq
walshams waldhuHaa.
The laams here should be silenced (hidden). We have two different grammatical rules,
but only one small function is needed to produce the proper result:
lish-shamsi bareeq
wash-shamsi wadh-dhuHaa.
It appeared to me that, regardless of the two grammatical rules, there are some
letters that:
1. Appear after the laam in both cases.
2. Hide the laam by doubling themselves.
Examples are Seen, Sheen, THaad, dhaad, ra, nuun, etc. So I don't have to write
two functions, one for each grammatical case. I only need to identify those letters
and check whether one of them follows an xxLL or xxAL, where xx is an optional prefix
of one or two letters, such as wa, fa, ka, etc.
The first example above is related to laam al-jarr, while the second is AL atta`reef.
Two rules, and one computational rule to solve them. I hope that you will be lucky to
face such cases that are unrelated grammatically, but highly related computationally.
If you do, insha-Allah, I would be very interested to take a look, if you don't mind.
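To make the single check described above more concrete, here is a minimal sketch in Python. It is not the C++ class mentioned in this message. It works on the plain romanized output rather than on the Arabic letters themselves, and the romanized spellings in SUN_LETTERS, the prefix list (wa, fa, ka, bi) and the function names are all assumptions made for illustration.

```python
import re

# Romanized spellings of some "sun letters" (assumed for this sketch; the
# message names only a few of them and gives no complete scheme).
SUN_LETTERS = ["sh", "th", "dh", "TH", "s", "S", "t", "T", "d", "r", "z", "n"]

# Longest spellings first, so that "sh" is tried before "s".
_SUN = "|".join(sorted(SUN_LETTERS, key=len, reverse=True))

# head = optional proclitic (wa, fa, ka, bi) plus whatever carries the laam:
#   "a"  for AL at-ta`reef          (al...)
#   "li" for laam al-jarr + laam    (lil...)
#   ""   when the proclitic's own vowel carries it (wal..., bil...)
_LAAM = re.compile(r"^(?P<head>(?:wa|fa|ka|bi)?(?:a|li)?)l(?P<sun>" + _SUN + r")")


def assimilate_laam(word: str) -> str:
    """Silence the laam and double the sun letter that follows it."""
    m = _LAAM.match(word)
    if not m or not m.group("head"):
        return word  # no prefix, or no sun letter after the laam
    sun = m.group("sun")
    return f"{m.group('head')}{sun}-{sun}{word[m.end():]}"


def assimilate_text(text: str) -> str:
    """Apply the check to every space-separated word."""
    return " ".join(assimilate_laam(w) for w in text.split())


print(assimilate_text("lilshamsi bareeq"))    # lish-shamsi bareeq
print(assimilate_text("walshams waldhuHaa"))  # wash-shams wadh-dhuHaa
print(assimilate_text("lilqamari dhiyaa-a"))  # lilqamari dhiyaa-a (unchanged)
```

One pattern covers both the xxLL and the xxAL case, which is the point made above. A check this simple cannot tell a real prefix from a word that merely begins the same way: "falsafa", for example, would wrongly come out as "fas-safa", so a real implementation would need a word list or a look at the Arabic input to rule such words out.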
I've talked to you about something you have never seen; I know this can be confusing.
Insha-Allah I will post this transliteration class once I'm done describing it.
Wishing you and your family peace and good health.
Salam,
Abdalla Alothman