
Re: Arabic spellchecker

To: abdalla at pheye dot net, Development Discussions <developer at arabeyes dot org>
Subject: Re: Arabic spellchecker
From: Ahmad Khalifa <ahmad at khalifa dot ws>
Date: Tue, 15 Nov 2005 19:55:13 +0200
User-agent: Mozilla Thunderbird 1.0.6 (Windows/20050716)

Abdalla Alothman wrote:

> I read somewhere about someone who did a very clever move: He

[...]

> I would rather spend a week or two developing such tools than type large amounts of data by hand.


I think duali's dataset was collected that way, by crawling Arabic news
sites like http://www.ahram.org.eg/

This is helpful if you want to collect a fat wordlist, one that
describes the verb 'كتب' (KTB) and all its derivatives in 15 separate entries.
See http://www.khalifa.ws/files/public/arabic-dictionary.txt

You can get such a dictionary in a few hours. Just parse duali's dataset
and generate a wordlist.

But such a dictionary would not be very efficient, given that a verb such as 'كتب' (KTB) can be written as a single entry with special flags.

This is where the difficulty lies: defining the AFFIX rules and
writing a *flagged* wordlist.
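
As a rough illustration of the idea, here is a minimal Python sketch of how one flagged entry could expand into several surface forms. The flag letters, the `AFFIX_RULES` table, and the `stem/FLAGS` entry syntax are all hypothetical and made up for this example; they are not the rules we are defining, nor any real affix-file format.

```python
# Hypothetical affix table: flag letter -> (kind, affix).
# These flags are invented for illustration only.
AFFIX_RULES = {
    "Y": ("prefix", "ي"),
    "T": ("prefix", "ت"),
    "A": ("suffix", "ت"),
    "W": ("suffix", "وا"),
}


def expand(entry):
    """Expand a flagged entry such as 'stem/FLAGS' into its surface forms."""
    stem, _, flags = entry.partition("/")
    forms = [stem]  # the bare stem is always a valid form in this sketch
    for flag in flags:
        kind, affix = AFFIX_RULES[flag]
        if kind == "prefix":
            forms.append(affix + stem)
        else:
            forms.append(stem + affix)
    return forms


# One flagged entry stands in for five fat-wordlist entries.
forms = expand("كتب/YTAW")
for form in forms:
    print(form)
```

The point is only that the dictionary stores one line per stem while the rules live in a small shared table, which is where the size saving comes from.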

Right now, we are in the 'Define Arabic as AFFIX rules' phase. Next, we
will be in the 'populate the flagged dictionary list' phase.

If all else fails, or this takes too long, we will fall back to the fat
wordlist option, which would then require a small *PERL* script to parse
duali's space-delimited 'stems' file.
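
For what it's worth, the fallback could look something like the sketch below. It is written in Python rather than Perl, purely for illustration. I am assuming each line of the 'stems' file holds whitespace-separated fields with the word in the first column; the real column layout should be checked, and `build_wordlist` and the sample lines are made up for this example.

```python
def build_wordlist(lines, column=0):
    """Collect the unique words found in one column of space-delimited lines."""
    words = set()
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if len(fields) > column:  # skips blank or short lines
            words.add(fields[column])
    return sorted(words)


# Made-up sample input; the real file would be read with
# open(path, encoding="utf-8") and passed in line by line.
sample = [
    "كتب verb",
    "كتاب noun",
    "كتب verb",
    "",
]
wordlist = build_wordlist(sample)
for word in wordlist:
    print(word)
```

Using a set removes duplicate stems, and sorting gives a stable output that is easy to diff between runs.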

> This is just a simple idea that I have never tried. I hope it's helpful.


It would actually work :) but we're going for the better solution. We
need this to be as efficient as possible, since it will (hopefully) be
part of OO.o someday.

--
Salam,
Ahmad Khalifa
