
Re: Vim-6.0 & Arabic



On Mon, 26 Nov 2001 12:01:49 -0800
 "Ron Aaron" <raaron at GOAHEAD dot COM> wrote:
> 
> Nadim Shaikli <shaikli at yahoo dot com> writes:
> >
> >In any regard, I had the following initial questions regarding adding
> >Arabic support to VIM-6,
> >
> >  http://www.arabeyes.org/archives/developer/2001/November/msg00007.html
> >
> 
> 
> Nadim, 
> 
> Have you got any responses?  

I received one reply that wasn't very comprehensive (CC'ed to the list); all
in all, nothing that has really helped yet :-/

Where's Bram? :-)

> I am heavily involved in the Hebrew support in vim, and I think we have some
> parallel issues.
> 
> One example:
> 
> Normally the word "shalom" would be spelled "shlm", without any vowels.
> This is how maybe 95% of the Hebrew out there is written (i.e. without vowels).
> But in religious texts or beginners' texts, or when there is ambiguity, it
> would be written "shalom".
> 
> The problem arises when one wishes to search the text for "shlm" and find
> also "shalom".  Essentially, I argue that the "combining characters" should
> be ignored (by an option, probably).  I think Arabic has a very similar
> issue, true?
> 
> 
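The "ignore combining characters" option can be illustrated outside Vim. A minimal Python sketch, assuming the text is already Unicode: decompose with NFD, drop every character whose Unicode general category is Mn (nonspacing mark, which covers Hebrew vowel points and Arabic harakat), and search on what remains. The function names are hypothetical; this is not Vim code.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Return text with all nonspacing marks (category Mn) removed."""
    # NFD splits any precomposed letter into base letter + marks first.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def contains_ignoring_marks(haystack: str, needle: str) -> bool:
    """Substring test that treats vowelled and unvowelled spellings alike."""
    return strip_marks(needle) in strip_marks(haystack)


# "shalom" with vowel points: shin + shin dot + qamats, lamed, vav + holam, final mem
SHALOM_VOWELLED = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
# the same word as it is usually written, without vowel points
SHALOM_PLAIN = "\u05e9\u05dc\u05d5\u05dd"

print(contains_ignoring_marks(SHALOM_VOWELLED, SHALOM_PLAIN))  # True
```

Dropping every Mn character also drops the shin/sin dots, so shin and sin become indistinguishable; a real option might want a finer-grained list of ignorable marks.
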
> A worse problem is that in Unicode, there are different /equivalent/
> encodings of some of the consonant-vowel pairs.  For example:
> 
>         "b" followed by "." may be equivalently:
> 
>                 "b."
>         or
>                 "B"
> 
> The screen representation looks (almost) the same, but one is two characters
> and the other is just one (internally; the "b" is the important part for
> most purposes).  I just found this problem by upgrading my version of
> "libiconv", which very helpfully converts to the shortest representation,
> and which vim has no mechanism to deal with.
> 
> Here, when I search for "b" I should find "b", "b." and "B" equally since
> they are the same as far as the searcher is concerned.
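The "b." versus "B" case is what Unicode calls canonical equivalence. In Hebrew, bet plus dagesh can be stored as U+05D1 U+05BC or as the precomposed U+FB31, and NFD maps both to the same two-character sequence. A minimal Python sketch, assuming the searcher normalizes both the pattern and the text to NFD before matching (`search_key` is a hypothetical helper):

```python
import unicodedata


def search_key(text: str) -> str:
    """Canonical decomposition, so equivalent spellings compare equal."""
    return unicodedata.normalize("NFD", text)


BET = "\u05d1"               # "b": bet alone
BET_DAGESH = "\u05d1\u05bc"  # "b.": bet followed by combining dagesh
BET_PRECOMPOSED = "\ufb31"   # "B": HEBREW LETTER BET WITH DAGESH

# The two-character and one-character spellings become identical...
print(search_key(BET_DAGESH) == search_key(BET_PRECOMPOSED))  # True
# ...and a search for plain bet now hits all three variants.
print([BET in search_key(v) for v in (BET, BET_DAGESH, BET_PRECOMPOSED)])  # [True, True, True]
```

U+FB31 is on Unicode's composition exclusion list, so even NFC turns it into the two-character form; decomposing is the safer common ground for matching.
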

I am not familiar with Hebrew, but it sounds like you'd benefit from what we
are trying to do as well: having two representations of the words, an array
holding what's typed and an array holding what's displayed.  In our case
(Arabic) we'd like to search the "what's typed" array, whereas it sounds like
you'd search the "what's displayed" array.  In any regard, I had wanted to get
some feedback from Bram (I've mailed him personally a couple of times), but it
seems he's either away from his PC or busy with other things.

> In Hebrew we also have the idea of a character assuming different glyphs
> based on its position in the word, but it is /much/ simpler than in Arabic!
> We have five letters which are different when appearing at the end of a
> word.  It seems that what is desired is an option to 'normalize' the searched
> text so that all variants (all four in Arabic!) are found as equivalent.
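Folding positional variants for search could look like the minimal Python sketch below. It assumes two separate mechanisms: the positional letter forms in Unicode's Arabic Presentation Forms-B block carry compatibility decompositions, so NFKC maps each isolated, final, initial, and medial form back to its base letter, whereas Hebrew's five final letters are distinct characters with no decomposition and need an explicit table. `fold_positional_forms` is a hypothetical name.

```python
import unicodedata

# Hebrew final letters -> ordinary (non-final) letters.
HEBREW_FINAL_TO_BASE = str.maketrans({
    "\u05da": "\u05db",  # final kaf   -> kaf
    "\u05dd": "\u05de",  # final mem   -> mem
    "\u05df": "\u05e0",  # final nun   -> nun
    "\u05e3": "\u05e4",  # final pe    -> pe
    "\u05e5": "\u05e6",  # final tsadi -> tsadi
})


def fold_positional_forms(text: str) -> str:
    """Map positional glyph variants to one search key per letter."""
    # NFKC folds Arabic presentation forms (isolated/final/initial/medial)
    # to the base letter; note it also folds other compatibility characters.
    text = unicodedata.normalize("NFKC", text)
    return text.translate(HEBREW_FINAL_TO_BASE)


# The four presentation forms of beh all fold to U+0628 ARABIC LETTER BEH.
beh_forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]
print({fold_positional_forms(f) for f in beh_forms} == {"\u0628"})  # True
# Final mem and ordinary mem now match each other.
print(fold_positional_forms("\u05dd") == fold_positional_forms("\u05de"))  # True
```

Because NFKC also rewrites unrelated compatibility characters (ligatures, full-width forms), an editor would probably apply it only to the search key, never to the stored text.
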

As noted, we've decided to take the approach of having two arrays: one that
contains what the user enters, entirely in ISO-8859-6, and another that
contains what's displayed on screen (shaped letters, etc.).  We're opting to
go in this direction to simplify what comes next (search, insertion, replace,
etc.).  In Arabic we'll have a one-to-one correspondence (or we'll try to work
toward that, anyway) between the typed array and the display array.
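The two-array design can be illustrated with a toy Python sketch. All of its details are assumptions: the typed array holds logical Unicode letters (the plan above uses ISO-8859-6; Unicode code points are used here for brevity), the hypothetical shaping table covers only three dual-joining letters (beh, teh, noon), and every typed character maps to exactly one display glyph. Under that one-to-one rule, a match found in the typed array addresses the same index in the display array.

```python
# Hypothetical shaping table: base letter -> (isolated, final, initial, medial).
# Only three dual-joining letters are covered; a real table must also handle
# right-joining letters (alef, dal, waw, ...) and the lam-alef ligature.
FORMS = {
    "\u0628": ("\ufe8f", "\ufe90", "\ufe91", "\ufe92"),  # beh
    "\u062a": ("\ufe95", "\ufe96", "\ufe97", "\ufe98"),  # teh
    "\u0646": ("\ufee5", "\ufee6", "\ufee7", "\ufee8"),  # noon
}


def shape(typed: str) -> str:
    """Produce exactly one display glyph per typed character."""
    glyphs = []
    for i, ch in enumerate(typed):
        if ch not in FORMS:
            glyphs.append(ch)  # unknown characters are displayed as typed
            continue
        # Every letter in FORMS is dual-joining, so "neighbour is in FORMS"
        # is enough to decide whether the two letters connect.
        joins_prev = i > 0 and typed[i - 1] in FORMS
        joins_next = i + 1 < len(typed) and typed[i + 1] in FORMS
        isolated, final, initial, medial = FORMS[ch]
        if joins_prev and joins_next:
            glyphs.append(medial)
        elif joins_prev:
            glyphs.append(final)
        elif joins_next:
            glyphs.append(initial)
        else:
            glyphs.append(isolated)
    return "".join(glyphs)


class Line:
    """A line kept as a typed array plus a parallel display array."""

    def __init__(self, typed: str):
        self.typed = typed
        self.display = shape(typed)
        # The one-to-one correspondence the design relies on.
        assert len(self.display) == len(self.typed)

    def find(self, needle: str) -> int:
        """Search the typed array; the index is valid in both arrays."""
        return self.typed.find(needle)


line = Line("\u0628\u0646\u062a")  # beh, noon, teh
pos = line.find("\u0646")          # search for the plain letter noon
print(pos, hex(ord(line.display[pos])))  # 1 0xfee8 (noon, medial form)
```

The lam-alef ligature is the classic case that breaks this one-to-one rule (two typed letters, one glyph), so a fuller implementation would likely need an explicit index map rather than identical positions.
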

> Perhaps these are all the same problem, but slightly different issues?

I believe your statement above to be true.

> Does Farsi use the same rules for letter shaping etc. as Arabic?

Very close to it.  The Farsi code in VIM currently is not Unicode compliant,
which gives it all sorts of "non-sanctioned" shortcuts that we, on the Arabic
side, are NOT willing to take.  I know there are a few Farsi speakers on this
list who might be interested in helping if enough progress is made.  The key,
as far as we're concerned, is the display and typed/stored arrays.
 
> If we combine our efforts perhaps we will be able to make vim an excellent
> BiDi editor (it's really getting there).

Agreed, though note that VIM is not a BiDi editor :-)  It's unidirectional
(either 'rightleft' or left-to-right).  BiDi is something we don't want to get
into right now, since Arabic is SO far behind and we'd like to catch up and at
least be able to read/write simple Arabic text (but BiDi does need to be
addressed, maybe via FriBidi?).

What are your thoughts on how to proceed?  It is incumbent upon us to get
Arabic working ASAP, since a number of people have been waiting (for a very
long time) to get their hands on something that works.
 
 - Nadim

