
Re: Arabic Support For Aspell



On Sat, 17 Apr 2004, Mohammed Elzubeir wrote:

> On Sat, 2004-04-17 at 13:21, Kevin Atkinson wrote:
> > Hi, I am contacting you because you are the author of Duali.
> > 
> > Would you be interested in working with me to add support for Arabic to 
> > Aspell (http://aspell.net)?
> > 
> 
> Very interested. It was my original intention to do it this way, but I
> have received very little feedback during my initial attempts to
> establish contact and so I have abandoned that to work on Duali.

I searched my mailbox and found some brief discussion.  Back then Aspell
lacked both affix support and Unicode support.  This has now changed.

> > I believe Aspell can handle it now, however, I am not sure.  The Arabic 
> > encoding in Unicode is very complicated and I do not understand all the 
> > issues involved.  Would you be willing to explain to me what I need to 
> > know about the encoding for spell checking.  In particular should I expect 
> > the "Arabic Presentation Forms" to be used?  Should words in the 
> > dictionary be encoded with the presentation forms? Etc.
> 
> Presentation forms are just what they say they are, for visual
> presentation. So, they would not be used. The dictionary is encoded in
> UTF-8 format (at least that's what I use for Duali). 
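To illustrate the point about presentation forms, here is a minimal sketch (not taken from Aspell or Duali) of how input text could be folded back to the base Arabic letters before lookup. It assumes Unicode NFKC normalization is acceptable for this purpose; NFKC also folds other compatibility characters, so a real checker might want something narrower. The function names are hypothetical.

```python
import unicodedata

# Arabic Presentation Forms-A and Forms-B blocks (Unicode code point ranges).
PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFF))

def has_presentation_forms(word: str) -> bool:
    """Return True if any character lies in a presentation-forms block."""
    return any(lo <= ord(ch) <= hi
               for ch in word
               for lo, hi in PRESENTATION_RANGES)

def fold_presentation_forms(word: str) -> str:
    """Map presentation forms to base letters via compatibility normalization.

    Assumption: NFKC is an acceptable folding here; it changes other
    compatibility characters too, not only Arabic presentation forms.
    """
    return unicodedata.normalize("NFKC", word)

# U+FE8D is ALEF ISOLATED FORM; it folds to the base letter U+0627.
folded = fold_presentation_forms("\ufe8d")
print(f"U+{ord(folded):04X}")
```

With folding of this kind applied to input, the dictionary itself would only need to contain the base letters from the 0600-06FF block.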

Aspell supports Unicode, but internally it is still 8-bit.  So the first 
order of business is to establish an internal encoding.  Is ISO-8859-6 
sufficient?  If not, a new character set can be made up.  You can use up to 
210 characters (128 upper 8-bit, 30 control, 52 Latin letters).  If you 
could tell me which parts of the Unicode block 0600-06FF Arabic needs for 
words, I can create a mapping for you.
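One way to answer the ISO-8859-6 question empirically is to scan a word list and report every character that the 8-bit encoding cannot represent. The following is only a sketch using Python's built-in "iso8859_6" codec; the function name and the sample words are made up for illustration and are not part of Aspell or Duali.

```python
def unencodable_chars(words, codec="iso8859_6"):
    """Return the set of characters in `words` that `codec` cannot encode."""
    missing = set()
    for word in words:
        for ch in word:
            try:
                ch.encode(codec)
            except UnicodeEncodeError:
                missing.add(ch)
    return missing

# Hypothetical sample: the first word uses only letters found in
# ISO-8859-6; the second starts with U+067E (PEH), which is not in it.
sample = ["\u0643\u062a\u0627\u0628", "\u067e\u0627\u0631\u0633"]

for ch in sorted(unencodable_chars(sample)):
    print(f"U+{ord(ch):04X}")
```

If the scan of a real dictionary comes back empty, ISO-8859-6 would be enough as the internal encoding; otherwise the reported code points are the ones a made-up character set would need to include.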

> > Once we establish that Aspell can indeed handle Arabic the next thing to do 
> > is to convert the "prefixes, suffixes, etc" into an affix file.
> > 
> 
> You may want to have a look at the lexicon class [1] in CVS. Those are
> the main methods performed in Duali.

OK.  That looks a lot like the Aspell affix code.  I believe Aspell can now 
handle it.  However, the affix data needs to be converted into a single 
affix file.  See 
http://aspell.sourceforge.net/devel-doc/man/Affix-Compression.html.

> > If you would rather spend your effort working on Duali I fully understand.  
> 
> Our (Arabeyes') goal (and mine) is to make Arabic spell checking
> available to as large an audience as possible. I believe adding it to
> aspell would make it more reachable. Having said that, I don't plan to
> completely abandon Duali itself, but would be more than happy to
> actively contribute to Arabic spell checking in aspell.

OK, great.

> P.S. Please do contact me via the 'developer' [2] list as some people
> may be interested to know about such issues.

Will do.

--- 
http://kevin.atkinson.dhs.org