
Re: Submitted papers



Behdad Esfahbod wrote:

> On Thu, 16 Aug 2001, Chahine M. Hamila wrote:
>
> > Yes, the issue has been discussed with other people using other
> > Semitic based language. For non Hebrew users, the problem is not
> > really important since 1) what is done on Arabization benefits
> > every Arabic-based language
>
> not necessarily, bidi proposal as a counter-example ;-).

I'm missing something here. Except for translations, I don't see how you can
separate Arabic from Persian, for example. I know there are a few more
characters in Persian (which btw are also needed for transcribing spoken
Arabic), but they don't have any impact on coding.

> > 2) non Hebrew and non Arabic scripts are so little used for now
> > that any work done is what will actually define how things will
> > work.
>
> not true, but not too annoying because we (in Iran) have decided
> to just use unicode and other *standards*.

When I speak about non-Arabic scripts, I am not including Persian, for example.
I am referring rather to languages that are not Arabic-based, like Syriac.
It doesn't really matter whether I call them Persian-based, Urdu-based,
Arabic-based... If you have another, less ambiguous name, I will use it.
Now when it comes to standards, yes, using standards is best. But when
standards are obviously buggy, it can be worthwhile to change them (i.e.
we're still working with standards, but new ones ;), even if they have to be
de facto ones).

> > Now, with Hebrew users, the feedback we got wasn't very
> > enthusiastic about the idea indeed. The main disagreement is about
> > Hebrew users wanting to stick to the already defined standards
> > (which isn't a bad thing in itself),
>
> and also persian guys, maybe all other non arabs :-).

Okay, now I hope I am not sensing regional sensitivities here.
I would like Persian users/developers to coordinate their efforts with us,
since, unlike with Hebrew users, our alphabets overlap (again, when I speak
about Arabic stuff, I am pretty much talking about Arabic-based alphabets, not
necessarily the ones used by Arabs only).
I think it would be a pity to throw out a new proposal altogether just because
of conservatism towards some (not that bad but improvable) standard that has
been defined before. Especially when the transition can be smooth.

> > and the Arabeyes people who are more into making one minor change
> > to the bidi standard which would allow us to benefit from
> > thousands of programs in a row without a single line of code added
> > to the existing programs and without any overhead or addition of
> > complexity.
>
> is this true? would you please first contribute us a sample/reference
> implementation of your bidi alg.?

As barely usable as Akka is for now, you can already edit what is called
'simple text' (i.e. text that's not really bidirectional).
The problem is that if you use text edited as 'simple text', the conventional
Unicode bidi algorithm won't read it correctly if it contains any word made of
numbers or Latin chars. For example, in a 'simple text' with an assumed RTL
orientation, text containing 2001 will be stored as 1002, and the Unicode bidi
algorithm will keep it as such when displaying it. In the same spirit, the same
text containing Linux will store it as xuniL, and the Unicode bidi algorithm
will keep it as such. The whole idea of the proposal is to make the Unicode
bidi algorithm aware of the assumed orientation and act accordingly.
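
To make the 2001/1002 example concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions, not Akka's code and not the
proposal itself: the helper names are made up, ASCII letters and digits stand
in for "numbers or Latin chars", and crude_bidi_display is a very rough
stand-in for a bidi renderer, not an implementation of UAX#9.

```python
import re

# Assumption for this sketch: ASCII letters and digits are the
# left-to-right "words made of numbers or Latin chars"; everything else
# (Arabic letters, spaces, punctuation) is left untouched.
LTR_RUN = re.compile(r"[A-Za-z0-9]+")


def to_simple_text_rtl(logical: str) -> str:
    """Hypothetical helper: storage order of 'simple text' typed on a
    terminal with an assumed RTL orientation. Each LTR run is reversed."""
    return LTR_RUN.sub(lambda m: m.group(0)[::-1], logical)


def mirrored_display(stored: str) -> str:
    """What an RTL-only terminal shows, read left to right on screen:
    the stored characters laid out starting from the right edge."""
    return stored[::-1]


def crude_bidi_display(stored: str) -> str:
    """Very rough stand-in for a bidi renderer (NOT UAX#9): lay the line
    out right to left, but keep every LTR run in its stored order."""
    return LTR_RUN.sub(lambda m: m.group(0)[::-1], stored[::-1])


# Three Arabic letters, a space, then the year.
stored = to_simple_text_rtl("\u0633\u0646\u0629 2001")
print(repr(stored))                      # the year is stored as 1002
print(repr(mirrored_display(stored)))    # mirrored terminal: reads 2001
print(repr(crude_bidi_display(stored)))  # bidi-style renderer: reads 1002
```

The mirrored terminal shows the year correctly, while the bidi-style renderer
keeps the stored digit order and shows 1002, which is the mismatch described
above.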

> an example of those thousands of
> programs...???

With Akka (even though it's far from ready for production use now), you can
already edit Arabic text with vi or emacs, read a text with Lynx, and so on.
IOW, if you want a list of the thousands of programs that you can use, you just
have to take a list of existing applications for the Linux console :)

> about a year later, I started to develop my own bidi
> algorithm, the ideas were great in mind, but when I started to
> implement...,

The idea is not to reinvent the wheel but to make it possible for an Arab (or a
Persian) to write the same way an English speaker would, i.e. so that writing
monodirectional text (or, to be more accurate, what has been defined as
'simple text') would not need an implementation of the Unicode bidi algorithm
at every level (which is impossible anyway; see Akka, a terminal driver), and
so that monodirectional text could still be read with the Unicode bidi
algorithm. The Unicode bidi algorithm was designed to be compatible with purely
English text edited with non-bidi-aware programs, but its designers forgot that
monodirectional texts exist in Arabic too. This is not an ideological matter
but a practical one, as progress in Arabization would be a lot faster if that
were possible (since every piece of English software would be usable for
Arabic).

> it was impossible, Unicode's UAX#9 is a masterpiece as a
> bidi algorithm, just run your algorithm (not well-defined on paper)

Okay, that's the whole point of sending the proposal to this list. I am not a
professional proposal-writer, and I need your input on things that seem unclear
in the proposal. Note that the proposal is not to write something new from
scratch, but to bring a minor change to the existing Unicode algorithm, so it is
assumed that you read it with the Unicode standard in mind.

> on
> unicode's test data, and send us your output.

The test data for 'complex texts' would be slightly different, as the logical
order would be different (for example, in an assumed LTR orientation, Arabic
"RAC EHT" would be stored as "EHT RAC" instead of "THE CAR").

> > If we still keep parting direction, we will indeed have two
> > different bidi algorithms. IMHO, even though it would be better to
> > have one only algorithm, it's not that important, as the algorithm
> > would just be language dependent.
>
> ok, you can use your own bidi alg. with your charsets/codepages, but
> don't think about using more than one bidi alg. with unicode text.

There wouldn't be any point then in using a different algorithm. Please
reconsider this after reading the points above.

> > and the pros of getting all these thousands of existing programs
> > at once are more important than the cons of having two bidi algos
> > or being slightly different from what the unicode consortium has
> > defined.
>
> :-O.
>
> <snip>

> > > if Hebrew users chose another bidirectional system,
> > > you could easily get to the point where half the systems use one bidi
> > > system and half the other, messing with any attempt at reliable RTL/bidi
> > > use.
>
> no one can get to that point, because I believe the main unicode
> arabic script users will be for persian, because no iso8859 encoding
> supports persian well, and we stick ourselves to use unicode,
> currently we have lots of text encoded in unicode in our projects, ...

Actually, the Unicode bidi algorithm (or the proposed modified one) is
charset-independent. This means it applies to ISO-8859-6 as well as to the
Persian ISIRI encoding via a conversion to Unicode (not necessary in an actual
implementation, of course, but that's the idea behind it). In other words,
implementing such an algorithm on an encoding-dependent basis is pointless and
would better be abandoned.
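
As a small illustration of the charset-independence point, the Python sketch
below decodes the same word from ISO-8859-6 and from UTF-8 and looks up the
Unicode bidi class of each character. The helper name bidi_classes is made up
for this example. I am not aware of an ISIRI codec in the Python standard
library, so only ISO-8859-6 is shown; the principle would be the same for any
charset that can be converted to Unicode.

```python
import unicodedata
from typing import List


def bidi_classes(raw: bytes, encoding: str) -> List[str]:
    """Hypothetical helper: decode bytes from any charset to Unicode,
    then return the Unicode bidi class of every character."""
    text = raw.decode(encoding)
    return [unicodedata.bidirectional(ch) for ch in text]


# The same four Arabic letters (seen, lam, alef, meem) in two encodings.
salam_8859_6 = bytes([0xD3, 0xE4, 0xC7, 0xE5])
salam_utf8 = "\u0633\u0644\u0627\u0645".encode("utf-8")

# Once decoded, both inputs carry identical bidi information.
print(bidi_classes(salam_8859_6, "iso-8859-6"))
print(bidi_classes(salam_utf8, "utf-8"))
```

Both calls give the same list of classes, so anything built on top of those
classes does not need to know which encoding the text came from.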

Salam,
Chahine