
Re: arabic ispell



Please note, I'm CC'ing a mailing list which might refute or correct
what I'm about to reply with (i.e. they'll keep me honest :-)

--- Geoff Kuenning wrote:
> >
> > If I wanted to add arabic support to ispell, would that be possible
> > (arabic -> UTF-8) ? do note that arabic is a right-to-left language.
> 
> There have been a couple of attempts at adapting ispell to Arabic in
> the past.  The first was by somebody in Egypt, probably over 10 years
> ago.  I received another mail about 2-3 years ago.  Both people
> eventually disappeared (from my point of view).  I don't know if that
> was because the task turned out to be too difficult, or because they
> were distracted by other things.
> 
> There is already a Hebrew ispell dictionary, so clearly someone else
> has found a way to surmount the problem.  But I would need to know a
> few more things.
> 
> First, is Arabic text stored right-to-left or left-to-right
> internally?  In other words, if the Arabic equivalent of "Cairo" were
> stored in a file, would the first byte of the first line be the "C" or
> the "o"?  If it's the latter, then it would be trivial to add
> rudimentary Arabic support to ispell.

Well, in Arabic, text is stored in the order it was entered.  To use your
example, "Cairo" would be displayed as "oriaC" but stored as "Cairo", so
the first byte of the first line in the file you describe would be the "C"
(the leftmost character in that file per normal usage).
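
A minimal sketch of the storage-order point, assuming the file is UTF-8
(the sample word and the variable names are only my illustration, not
anything from ispell).  The Arabic name of Cairo is typed in reading order,
encoded, and the first stored character is inspected:

```python
import unicodedata

# Arabic name of Cairo, typed in reading (logical) order:
# alef, lam, qaf, alef, heh, reh, teh marbuta.
word = "\u0627\u0644\u0642\u0627\u0647\u0631\u0629"

# What would be written to a UTF-8 file.
encoded = word.encode("utf-8")

# The first stored character is the first letter typed, even though a
# bidi-aware display draws it at the right-hand end of the word.
first_char = encoded.decode("utf-8")[0]
first_name = unicodedata.name(first_char)
first_bidi = unicodedata.bidirectional(first_char)

print(first_name, first_bidi, encoded[:2].hex())
```

The bidirectional class "AL" marks a strong right-to-left Arabic letter;
the reordering happens only at display time, not in the stored bytes.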

A quick note about Arabic: there are 2 notions of order, logical/physical
order and visual order (logical/physical order being the storage order).
Visual order reverses the order of the stored characters so that they are
displayed correctly (there are various optional hint bytes to better
control this transformation).  There is also the notion of shaping, which
applies in the visual order only (and does not pertain to physical/logical
order).  Shaping's main function is to morph or change a character's
appearance (by calling a corresponding positional glyph from the font
library) depending on where that character is within a word; so you might
have, say, "help" stored, but when it's displayed it will appear as "PleH",
for example (where 'P' and 'H' are transformations of the stored letters).
For the purposes of a spell-checker, I don't think shaping is of any
concern (what's saved in memory/on disk is).
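
To illustrate why shaping should not matter to a spell-checker, here is a
small sketch assuming Unicode text.  Unicode happens to carry legacy
"presentation form" code points for the positional shapes of each letter;
normally these are never stored, but if they do turn up, NFKC normalization
folds them back to the single stored letter.  The helper name
to_stored_form is hypothetical:

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the letter as it is normally stored

# Positional shapes of BEH from the Arabic Presentation Forms-B block.
presentation_forms = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

def to_stored_form(text):
    """Fold compatibility (shaped) characters back to their base letters."""
    return unicodedata.normalize("NFKC", text)

folded = {pos: to_stored_form(glyph) for pos, glyph in presentation_forms.items()}

for pos, glyph in presentation_forms.items():
    print(pos, unicodedata.name(glyph), "->", unicodedata.name(folded[pos]))
```

All four shapes map to the same stored letter, so a dictionary lookup only
ever needs to know about the unshaped form.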

> Second, is Arabic normally displayed with the right-hand margin
> justified and the left one unjustified, as a mirror image of the way
> English does it?

Yes.  There is another quirk to this, though (which ought to be ignored;
I'm mentioning it simply for completeness).  One can intermix Arabic and
English words in the same document (using Bidi, i.e. bi-directionality).
I'm not sure how the spell-checker would distinguish between Arabic and
English bytes/words (I don't think it will be able to), and thus I'm
inclined to think a single language (user-defined, given on the command
line, or whatever) would be all that one could spell-check against at any
one moment.
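
For what it's worth, if the text is Unicode (UTF-8), one possible way to
tell Arabic words from English ones is to look at each character's
bidirectional class.  This is only a sketch under that assumption: the
function name word_direction and the sample line are made up for
illustration, and it says nothing about how ispell itself would do it.

```python
import unicodedata

def word_direction(word):
    """Classify a word by its first strongly directional character."""
    for ch in word:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-style or Arabic letters
            return "rtl"
        if bidi == "L":           # Latin and other left-to-right letters
            return "ltr"
    return "neutral"              # digits, punctuation, empty strings

# A mixed line: an English word, an Arabic word, and a number.
line = "ispell \u0639\u0631\u0628\u064a 2024"
tags = [(w, word_direction(w)) for w in line.split()]
print([direction for _, direction in tags])
```

Even with such a test, checking against a single user-selected dictionary
at a time may well remain the simpler design.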

> It seems to me, thinking only superficially about the problem, that it
> should be quite easy to add some sort of right-to-left support to
> ispell.  From ispell's point of view, it's mostly a question of
> how to display the text.  There's a small problem in that you'd like
> to spell-check the first (right-hand) word on the line before the last
> one, but I can think of a couple of simple ways to solve that problem.
> For example, it might be enough to just reverse the contents of the
> line before spell-checking, and un-reverse it afterwards.
> 
> I think the bigger problem you might encounter is in constructing the
> affix file.  I don't know a thing about Arabic, but I know that some
> languages just don't do well with the way ispell handles affixes.  I
> would recommend that you investigate that problem early in the
> project.  However, regardless of that question, I think I could be
> talked into helping with a project to add right-to-left support to
> ispell.

I (along with others) will look into your point.

Many thanks for your rapid response.

 - Nadim
