Re: Arabic Support For Aspell
- To: Mohammed Elzubeir <elzubeir at arabeyes dot org>
- Subject: Re: Arabic Support For Aspell
- From: Kevin Atkinson <kevin at atkinson dot dhs dot org>
- Date: Sat, 17 Apr 2004 06:10:24 -0400 (EDT)
- Cc: developer at arabeyes dot org
On Sat, 17 Apr 2004, Mohammed Elzubeir wrote:
> On Sat, 2004-04-17 at 13:21, Kevin Atkinson wrote:
> > Hi, I am contacting you because you are the author of Duali.
> >
> > Would you be interested in working with me to add support for Arabic to
> > Aspell (http://aspell.net)?
> >
>
> Very interested. It was my original intention to do it this way, but I
> have received very little feedback during my initial attempts to
> establish contact and so I have abandoned that to work on Duali.
I searched my mailbox and found some brief discussion. Back then, Aspell
lacked both affix support and Unicode support. This has now changed.
> > I believe Aspell can handle it now, however, I am not sure. The Arabic
> > encoding in Unicode is very complicated and I do not understand all the
> > issues involved. Would you be willing to explain to me what I need to
> > know about the encoding for spell checking? In particular, should I expect
> > the "Arabic Presentation Forms" to be used? Should words in the
> > dictionary be encoded with the presentation forms? Etc.
>
> Presentation forms are just what they say they are, for visual
> presentation. So, they would not be used. The dictionary is encoded in
> UTF-8 format (at least that's what I use for Duali).
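To illustrate the point about presentation forms, here is a minimal Python
sketch of how input containing them could be folded back to the base letters
before lookup. It assumes Unicode NFKC normalization is acceptable for this
purpose (NFKC also rewrites other compatibility characters, so it is broader
than strictly needed). The function names are hypothetical and are not part of
Aspell or Duali.

```python
import unicodedata

# Arabic Presentation Forms-A and -B blocks in Unicode.
PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFF))


def has_presentation_forms(word: str) -> bool:
    """Return True if any character lies in a presentation-forms block."""
    return any(
        lo <= ord(ch) <= hi
        for ch in word
        for lo, hi in PRESENTATION_RANGES
    )


def fold_presentation_forms(word: str) -> str:
    """Replace presentation forms with base letters using NFKC.

    Assumption: compatibility normalization is an acceptable way to do
    this; a real checker might prefer an explicit table instead.
    """
    return unicodedata.normalize("NFKC", word)
```

With this, the isolated form U+FE8D folds to the base letter alef (U+0627),
and the lam-alef ligature U+FEFB folds to the two letters U+0644 U+0627.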
Aspell supports Unicode, but internally it is still 8-bit. So the first
order of business is to establish an internal encoding. Is iso-8859-6
sufficient? If not, a new character set can be made up. You can use up to
210 characters (128 upper 8-bit, 30 control, 52 Latin letters). If you
could tell me which parts of the Unicode block 0600-06FF Arabic needs for
words, I can create a mapping for you.
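The question of whether iso-8859-6 is sufficient can be checked mechanically.
The following Python sketch reports which characters from a word list cannot
be encoded in iso-8859-6, and, if a made-up character set is needed, assigns
each required character a slot in the upper 128 byte values. Both helpers are
hypothetical, and the resulting mapping is only an illustration; it is not the
format of an actual Aspell character set file.

```python
def unencodable(chars, codec="iso8859_6"):
    """Return the characters (in input order) that the codec cannot encode."""
    missing = []
    for ch in chars:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def build_mapping(chars, first_slot=0x80):
    """Assign each distinct character one byte value from first_slot upward.

    Assumption: only the upper 128 byte values (0x80-0xFF) are used here;
    the control and Latin-letter slots mentioned above are left alone.
    """
    unique = sorted(set(chars))
    if len(unique) > 0x100 - first_slot:
        raise ValueError("too many characters for an 8-bit encoding")
    return {ch: first_slot + i for i, ch in enumerate(unique)}
```

For example, unencodable("\u0627\u067e") reports only U+067E (peh), since alef
(U+0627) is present in iso-8859-6.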
> > Once we establish that Aspell can indeed handle Arabic the next thing to do
> > is to convert the "prefixes, suffixes, etc" into an affix file.
> >
>
> You may want to have a look at the lexicon class [1] in CVS. Those are
> the main methods performed in Duali.
OK. That looks a lot like Aspell's affix code. I believe Aspell can now
handle it. However, the affix data needs to be converted into a single
affix file. See
http://aspell.sourceforge.net/devel-doc/man/Affix-Compression.html.
> > If you would rather spend your effort working on Duali, I fully understand.
>
> Our (Arabeyes') goal (and mine) is to make Arabic spell checking
> available to as big an audience as possible. I believe adding it to
> aspell would make it more reachable. Having said that, I don't plan to
> completely abandon Duali itself, but would be more than happy to
> actively contribute to Arabic spell checking in aspell.
OK great.
> P.S. Please do contact me via the 'developer' [2] list as some people
> may be interested to know about such issues.
Will do.
---
http://kevin.atkinson.dhs.org