Re: Submitted papers

To: <general at arabeyes dot org>
Subject: Re: Submitted papers
From: Behdad Esfahbod <behdad at bamdad dot org>
Date: Sun, 19 Aug 2001 21:28:58 +0430 (IRST)
On Sat, 18 Aug 2001, Chahine M. Hamila wrote:
> Behdad Esfahbod wrote:
> > not necessarily, bidi proposal as a counter-example ;-).
>
> I'm missing something here. Except for translations, I don't see how you can
> separate Arabic from Persian for example. I know there are a few more
> characters in Persian (which btw are needed for spoken Arabic transcription),
> but they don't have any impact on coding.

The main difference is that we are so confused by a myriad of visual,
homemade, non-standard, unsophisticated 8-bit codepages/encodings (each
company develops its software with its own encoding...), and the
standard ISIRI-3342 is not much better either, so we have decided to
just use Unicode (with UTF-8 as the encoding) and be conformant...
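A minimal Python sketch of why an 8-bit Arabic charset is not enough for Persian: the Persian-specific letters PEH, TCHEH, JEH and GAF (U+067E, U+0686, U+0698, U+06AF) have no slot in ISO-8859-6, while UTF-8 encodes them without trouble. The helper name `encodable` is hypothetical, and ISIRI-3342 is not tested here because Python ships no codec for it.

```python
# Persian letters that are not part of the basic Arabic alphabet.
PERSIAN_EXTRA = "\u067e\u0686\u0698\u06af"  # PEH, TCHEH, JEH, GAF


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for ch in PERSIAN_EXTRA:
    print(f"U+{ord(ch):04X}",
          "iso8859_6:", encodable(ch, "iso8859_6"),
          "utf-8:", encodable(ch, "utf-8"))
```

Plain Arabic letters such as ALEF and BEH do encode in ISO-8859-6, so the gap only shows up once Persian text is involved.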

> When I speak about non-Arabic speakers, I am not including Persian
> for example. I am referring more to other non-Arabic based
> languages, like Syriac for example. It doesn't really matter
> whether I call them Persian-based, Urdu-based, Arabic-based... If
> you have another less ambiguous name, I will use it. Now when it
> comes to standards, yes, using standards is the best. But when
> standards are obviously buggy, it can be interesting to change
> standards (i.e. we're still working on standards, but new ones;),
> even if they have to be de facto ones).

1. But I can't see the necessity for another bidi algorithm; Unicode's
is good enough in my opinion. What you describe as cons of the
algorithm is nonsense: you are skipping the level named
"reverse-bidi" and want to define simple text as visual form, ...

2. I wonder whether you can get the Unicode people to change the bidi
algorithm UAX#9 completely. Small corrections would be possible, but
the changes you talk about are against their policies...

  http://www.unicode.org/unicode/standard/policies.html

> Okay, now I hope I am not sensing regional sensitivities in here.
> I would want Persian users/developers to coordinate their efforts with us,
> since unlike Hebrew users, our alphabets overlap (again when I speak about
> Arabic stuff, I am pretty much talking about Arabic-based alphabets, not
> necessarily the ones used by Arabs only).
> I think it would be a pity to throw a new proposal altogether just because of
> conservatism due to some (not that bad but improvable) standard that has been
> defined before. Especially when the transition can be smooth.

See above.

> As little usable as Akka is for now, you can already edit what is
> called 'simple text' (i.e text that's not really bidirectional).
> The problem is that if you use text edited in 'simple text',
> conventional unicode bidi algorithm won't read it correctly if it
> contains any word made of numbers or Latin chars. For example a
> 'simple text' in a RTL assumed orientation, text containing 2001
> will be stored 1002 and unicode bidi algo will keep it as such
> when displaying it.
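The effect described above can be reproduced with a small Python sketch. Both functions are hypothetical illustrations: `simple_rtl_store` models an editor that stores an RTL-oriented line starting from the right edge of the screen, and `toy_rtl_display` is a heavily simplified stand-in for UAX#9 reordering in an RTL paragraph (no explicit embeddings, neutrals simply keep the paragraph level), not a conformant implementation.

```python
import unicodedata


def simple_rtl_store(on_screen: str) -> str:
    """Hypothetical non-bidi-aware RTL editor: characters are stored from the
    right screen edge, i.e. the reverse of the left-to-right visual order."""
    return on_screen[::-1]


def toy_rtl_display(logical: str) -> str:
    """Simplified UAX#9-style reordering for an RTL paragraph (level 1).
    Assumptions: no explicit embeddings; R/AL stay at level 1; L, EN and AN
    are raised to level 2 (rule I2); neutrals keep level 1 (simplified N1/N2)."""
    chars = list(logical)
    levels = [2 if unicodedata.bidirectional(ch) in ("L", "EN", "AN") else 1
              for ch in chars]
    # Rule L2: from the highest level down to 1, reverse every maximal run
    # of characters at that level or higher.
    for level in range(max(levels, default=1), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = chars[i:j][::-1]
                levels[i:j] = levels[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)


for on_screen in ("2001", "Linux"):
    stored = simple_rtl_store(on_screen)
    print(on_screen, "->", stored, "->", toy_rtl_display(stored))
```

Both words come back reversed on display, because a Unicode-style renderer treats the stored characters as logical order, whereas the same words stored in logical order ("2001", "Linux") display unchanged.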

Then let's first discuss how a text should be recorded in a
computer: in logical or visual order? For many trivial and important
reasons we prefer logical order, but you are talking about visual.

> In the same spirit, the same text containing Linux will store it
> xuniL and unicode bidi will keep it as such. The whole idea of the
> proposal is to make unicode bidi alg. aware of the assumed
> orientation and act accordingly.
>
> > an example of those thousands of
> > programs...???
>
> With Akka (even though it's far from being production use now),
> you can already edit Arabic text with vi, emacs, read a text with
> Lynx and so on. IOW, if you want a list of the thousands of
> programs that you can use, you just have to take a list of
> existing applications for linux console:)

I have read the Akka code, and I have also implemented a UTF-8 filter
like that for the console, but it's far from being production-ready,
and it never can be, because that's not the way to do this...

> > about a year later, I started to develop my own bidi
> > algorithm, the ideas were great in mind, but when I started to
> > implement...,
>
> the idea is not to reinvent the wheel but to make it possible for
> an Arab (or a Persian) to write the same way an English person
> would, i.e. that writing monodirectional (more or less, to be more
> accurate, what has been defined as 'simple text')

Where is 'simple text' defined???

> would not need the implementation of the unicode bidi algorithm at
> every level (which is impossible anyway, see Akka, a terminal
> driver)

This is important: Akka is not a terminal driver; if it were, life
would be much better for you. Akka is just a hack that doesn't even
take care of line breaking and the terminal's width. Only the terminal
itself can take care of all of these, and I think that would be enough
for us to run old applications almost without any modifications...

> and that monodirectional text could still be read with
> unicode bidi algo. The way the unicode bidi alg was done is
> compatible with purely English text edited with non bidi-aware
> progs, they forgot that monodirectional texts exist in Arabic too.
> It is not ideological stuff but practical stuff, as advancement in
> Arabization would be a lot faster if that was possible (since
> every English soft would be usable for Arabic).

A bidi algorithm MUST be compatible with pure English text..., so that
you can add a bidi layer to your system and everything English still
works well. They didn't forget that monodirectional Arabic texts exist;
they chose English compatibility because, in comparison with English
texts/applications, the amount of Arabic ones is small enough to be
negligible.
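One practical consequence of that compatibility requirement: a bidi layer can skip reordering entirely for text that contains nothing able to trigger right-to-left behavior, so pure English text passes through untouched. A minimal Python sketch of such a check, using the standard `unicodedata` module; `needs_bidi` and `RTL_TRIGGERS` are hypothetical names, and the set of triggering classes is a conservative assumption for a left-to-right paragraph.

```python
import unicodedata

# Bidi classes that can make visual order differ from logical order in a
# left-to-right paragraph (explicit RTL controls included conservatively).
RTL_TRIGGERS = {"R", "AL", "AN", "RLE", "RLO", "RLI"}


def needs_bidi(text: str) -> bool:
    """Return True if `text` contains any character that could require
    reordering under the Unicode bidi algorithm in an LTR paragraph."""
    return any(unicodedata.bidirectional(ch) in RTL_TRIGGERS for ch in text)


print(needs_bidi("Hello, world 2001"))
print(needs_bidi("Linux \u0627\u0628"))
```

For any purely English string the check is False, which is exactly the case where the algorithm must leave existing text and applications as they were.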

> > it was impossible, Unicode's UAX#9 is a masterpiece as a
> > bidi algorithm, just run your algorithm (not well-defined on paper)
>
> Okay, that's the whole point in sending the proposal on this list.
> I am not a professional proposal-writer, and I need your input on
> things that seem unclear in the proposal. Note that the proposal
> is not to write something new from scratch, but bring a minor
> change to the existing unicode algorithm, so it is assumed that
> you read it with the unicode standard in mind.

I have UAX#9 completely in my mind; I even remember the rules' names
and numbers... Would you please prepare a diff against the algorithm?
Something like the "Retaining Format Codes" section of the standard,
covering both the algorithm itself and the definitions/goals.

> > on unicode's test datas, and send us your output.
>
> the test data for 'complex texts' would be slightly different as
> logical order would be different (for example, in l2r assumed
> orientation, Arabic "RAC EHT" would be stored "EHT RAC" instead of
> "THE CAR").

:O What about breaking lines? How do you break the lines of a
paragraph? How do you cut & paste? How do you insert one complex text
inside another?????? You will need lots of explicit bidirectional
marks to *emulate* these...
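The line-breaking objection can be made concrete. UAX#9 breaks lines on the logical text and only then reorders each line for display. The Python sketch below contrasts that with wrapping a right-to-left paragraph stored in visual order, using the standard `textwrap` module. As an illustrative assumption, uppercase Latin letters stand in for Arabic ones (as in the "RAC EHT" example quoted above), and a purely RTL line is displayed by simple reversal.

```python
import textwrap

# Logical (reading) order of an RTL paragraph; uppercase Latin letters stand
# in for Arabic letters that would be displayed right-to-left.
logical = "THE CAR IS IN THE GARAGE"
WIDTH = 12


def display_rtl_line(line: str) -> str:
    """Toy display of one all-RTL line: visual order is the reverse."""
    return line[::-1]


# Logical storage: break lines first, then reorder each line for display.
logical_lines = textwrap.wrap(logical, WIDTH)
shown_from_logical = [display_rtl_line(line) for line in logical_lines]

# Visual storage: the paragraph is kept pre-reversed, so wrapping it puts
# the *end* of the sentence on the first line.
visual = logical[::-1]
shown_from_visual = textwrap.wrap(visual, WIDTH)

print(shown_from_logical)
print(shown_from_visual)
```

Read right-to-left, the first line of the visually stored paragraph says "THE GARAGE", i.e. the end of the sentence; fixing that requires re-deriving logical order first, which is what logical storage gives for free.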

> > ok, you can use your own bidi alg. with your charsets/codepages, but
> > don't think about using more than one bidi alg. with unicode text.
>
> There wouldn't be any point then in using any different algo.
> Please reconsider this after reading the above mentioned stuff.

You cannot force them to change the memory representation to your
mixture of logical and visual order. Does this representation have
any benefits? You cannot even search text in it.
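A short Python sketch of the search problem, reusing the stored forms from the quoted examples ("xuniL" for "Linux", "1002" for "2001"). The sample line is hypothetical: an Arabic word (ALEF, BEH) followed by "Linux 2001", with `visual_line` being its left-to-right screen order in an RTL paragraph, and `stored` modelling a monodirectional RTL editor that saves the screen from the right edge.

```python
# What the user means (logical/reading order): an Arabic word, "Linux", "2001".
intended = "\u0627\u0628 Linux 2001"

# Left-to-right screen order of that line when laid out in an RTL paragraph.
visual_line = "2001 Linux \u0628\u0627"

# A monodirectional RTL editor stores characters from the right screen edge,
# i.e. the reverse of the left-to-right visual order.
stored = visual_line[::-1]

for needle in ("\u0627\u0628", "Linux", "2001"):
    print(repr(needle), "in intended:", needle in intended,
          "| in stored:", needle in stored)
```

Only the Arabic word is found in the stored buffer; "Linux" and "2001" would have to be searched for in reversed form, so every search routine would need to know the layout rules.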

> > <snip>
>
> > if Hebrew users chose another bidirectional system,
> > > > you could easily get to the point where half the systems use one bidi
> > > > system and half the other, messing with any attempt at reliable RTL/bidi
> > > > use.
> >
> > no one can get to that point, because I believe the main unicode
> > arabic script users will be for persian, because no iso8859 encoding
> > supports persian well, and we stick ourselves to use unicode,
> > currently we have lots of text encoded in unicode in our projects, ...
>
> Actually, the unicode (or the changed proposed) bidi algo is
> charset independent. Which means, it applies to iso-8859-6 as well
> as the Persian ISIRI via a conversion to unicode (not necessary
> when implementing of course but that's the idea behind it). In
> other words, implementing such an algo on an encoding-dependent
> basis is pointless and would better be abandoned.
>
> Salam,
> Chahine
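The charset-independence point can be shown in a few lines of Python: the same Arabic text in ISO-8859-6 and in UTF-8 decodes to identical code points, so the bidi classes a bidi implementation consumes (queried here with the standard `unicodedata` module) are identical too. The sample bytes and the helper name `bidi_classes` are illustrative assumptions.

```python
import unicodedata

# ALEF, BEH, a space and "2001", in two different charsets.
iso_bytes = b"\xc7\xc8 2001"                      # ISO-8859-6
utf8_bytes = "\u0627\u0628 2001".encode("utf-8")  # UTF-8

iso_text = iso_bytes.decode("iso8859_6")
utf8_text = utf8_bytes.decode("utf-8")


def bidi_classes(text: str) -> list:
    """Bidi class of each character: the input a UAX#9 implementation uses."""
    return [unicodedata.bidirectional(ch) for ch in text]


print(iso_text == utf8_text)
print(bidi_classes(iso_text))
```

Once decoded, nothing in the bidi step depends on the original charset, which is why implementing it per encoding buys nothing.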

My recommendation: resolve the ambiguities, find rationales for your
suggestions, and show how the basic functions everyone needs (like
search, paste, comparison, substring...) could be implemented
efficiently with them.... In a few words: prove its usefulness.


Yours,
-- 
Behdad
28 Mordad 1380, 2001 Aug 19

Follow-Ups:
- Re: Submitted papers
  - From: Chahine M. Hamila
References:
- Re: Submitted papers
  - From: Chahine M. Hamila