Re: Submitted papers

To: general at arabeyes dot org
Subject: Re: Submitted papers
From: "Chahine M. Hamila" <mch at chaham dot com>
Date: Sun, 19 Aug 2001 20:55:03 +0200
Behdad Esfahbod wrote:

>  we have decided to just use
> Unicode (and utf-8 as encoding), and be conformant...

Note that UTF-8 is not the most practical internal encoding for an application,
especially in terms of algorithmic complexity: finding the n-th character means
scanning the string from the start, whereas fixed-width encodings such as UCS
allow direct access.
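
A minimal sketch of that complexity difference, assuming "UCS" means the fixed-width UCS-4 form (stored big-endian here). The helper names utf8_char_at and ucs4_char_at are hypothetical, chosen only for illustration:

```python
def utf8_char_at(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 `data`. O(n): a linear scan is unavoidable."""
    count = -1
    for i, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte, so a new code point starts here
            count += 1
            if count == n:
                end = i + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1  # include this code point's continuation bytes
                return data[i:end].decode("utf-8")
    raise IndexError("code point index out of range")


def ucs4_char_at(data: bytes, n: int) -> str:
    """Return the n-th code point of big-endian UCS-4 `data`. O(1): every unit is 4 bytes."""
    if n < 0 or 4 * n + 4 > len(data):
        raise IndexError("code point index out of range")
    return data[4 * n:4 * n + 4].decode("utf-32-be")


sample = "abc\u0633\u0644\u0627\u0645"  # "abc" followed by the Arabic word "salam"
print(utf8_char_at(sample.encode("utf-8"), 4) == ucs4_char_at(sample.encode("utf-32-be"), 4))  # True
```

The trade-off is memory: UCS-4 spends four bytes on every ASCII character that UTF-8 stores in one.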

> 1. But I can't see the necessity for another bidi algorithm, Unicode's
> is good enough in my opinion, what you talk about as cons of the
> algorithm, is nonsense, you are skipping the level named
> "reverse-bidi" and want to define simple text as visual form, ...

Not really.

First, correct me if I'm wrong, but the reverse bidi stuff is not part of any
standard. It's definitely not part of the Unicode specification, and all I have
found by looking on the web (this is the first time I've heard of it) is that
IBM is supposed to have developed something along those lines (which would
actually be great and would render the purpose of this proposal void).

Now, simple text is not visual text. It is the subset of bidirectional texts
whose visual storage can match the logical storage of an equivalent, more
complex bidirectional text. In other words, it is the set of all texts minus
those that contain at least two expressions, each made of at least two
consecutive words of the same orientation (i.e. Latin Latin or Arabic Arabic).
Why exclude this set? Simply because such texts involve line breaks that make
logical storage of bidirectional text incompatible with visual storage. As you
can see, it's a bit more than what you state, but this is not the point.
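
The definition above can be read as a simple predicate. The sketch below is one hypothetical reading of it: words are whitespace-separated tokens, a word's orientation comes from its first strong character (Unicode bidi class L versus R/AL, used here as a stand-in for "Latin" versus "Arabic"), and words with no strong character, such as numbers, are simply skipped. None of these choices come from the proposal itself:

```python
import unicodedata


def word_direction(word):
    """Orientation of a word from its first strong character: 'L', 'R', or None if neutral."""
    for ch in word:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "L"
        if bidi in ("R", "AL"):
            return "R"
    return None


def is_simple_text(text):
    """Hypothetical check: a text is NOT simple if it contains at least two
    expressions, each made of two or more consecutive same-orientation words."""
    long_runs = 0
    current_dir, run_len = None, 0
    for word in text.split():
        d = word_direction(word)
        if d is None:
            continue  # assumption: neutral words neither extend nor break a run
        if d == current_dir:
            run_len += 1
        else:
            if run_len >= 2:
                long_runs += 1
            current_dir, run_len = d, 1
    if run_len >= 2:
        long_runs += 1
    return long_runs < 2


print(is_simple_text("the car \u0631\u062c\u0644"))  # True: only one multi-word expression
print(is_simple_text("the car \u0631\u062c\u0644 \u0643\u0628\u064a\u0631"))  # False
```

Under this reading every monodirectional text, Latin or Arabic, is a single run and therefore simple, which matches the statement that simple texts are a superset of monodirectional texts.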

Second, I have written quite a few times about why such an algorithm is
necessary: it gives access to all existing software with no modifications other
than very limited hacks in a very limited number of libraries/drivers. Now, we
definitely should support the Unicode specification, but supporting ONLY the
Unicode specification amounts to voluntarily limiting the amount of software
available to us. Unless you give me a good reason for limiting the software we
have access to, I think that implementing this proposal is the way to go
(unless the reverse bidi algorithm fulfills our needs, in which case all we need
is for you to point us to where it is defined/implemented).

> 2. I wonder if you can force Unicode guys to change the bidi algorithm
> UAX#9 completely, small corrections would be possible, but changes you
> talk about are against their policies...
>
>   http://www.unicode.org/unicode/standard/policies.html

To be honest, I don't think we'll change their minds; I'm quite aware of that
policy. I have a hard enough time convincing those who would benefit from such a
change, so I won't indulge in the wishful thinking that I could convince those
who *made* the standard :) Seriously though, the goal would rather be to reach
a critical mass of users in order to make it a de facto standard (and maybe
then a formal one). In any case, I suggest keeping support for the current
spec and *extending it*.

> Then let's first discuss how a text should be recorded in a
> computer, in logical or visual order?  For many trivial and important
> reasons we prefer logical order, but you are talking about visual.

Okay, here I think you misunderstand the whole thing. In most texts, visual and
logical order can match. The problem is that a difference between them is
imposed in the case of Arabic. The idea here is that as long as the difference
between logical and visual order is not necessary, it shouldn't be made, for the
reasons mentioned before. So, to answer your question directly: monodirectional
texts (actually what was defined as 'simple texts', which is a superset of
monodirectional texts) normally have no notion of visual versus logical order;
the two are the same, keeping in mind that Arabic texts can be monodirectional
too. Bidirectional text should, of course, be stored in logical order.

Why not store everything as bidirectional text then? Again (repetition doesn't
hurt), for two reasons. First, not all software can deal with bidirectional
text: it requires capabilities that some software/interfaces don't necessarily
have (in fact, correctly displaying a bidirectional text always requires
software that has a global view of your text). The second and more important
reason is that most of today's software (and probably a good deal of the
software of the short-to-mid-term future) is developed with monodirectional
capabilities only.

> I have read the Akka code, also I have implemented a utf-8 filter like
> that for the console, but it's far from being productive, it cannot be
> at any time, because it's not the way to do this...

It would be interesting to study it, if you have published anything on
that.

> Where is it defined ('simple text')???

It is defined in contrast to "complex text" in the proposal, and it is
also redefined above in this mail :)

> So important: Akka is not a terminal driver,

duh, this is about semantics...

> if it were, life would be much
> better for you, Akka is just a hack that does not even take care
> of line breaking and the terminal's width,

Yes, remember that Akka is meant to treat the console as a character-oriented
device. If we were to give it word capabilities, it would confuse a great deal of
software (any common text editor, for example, keeps its own mapping between
the text being edited and the way it is displayed).

> just the terminal itself can
> take care of all of these, that I think would be enough for us to run
> old applications almost without any modifications...

No: acon, for example, tries to do that. You might want to try acon and test it.
Acon is a very useful hack, but it is not foolproof. It bases its display on an
algorithm more or less inspired by the Unicode bidi algorithm (which, by the
way, I once tried to replace with fribidi before realizing the problems
involved). Try editing a text with VI, then try displaying the same text at
different terminal widths, and so on. The problem is that, in the general case,
the terminal driver (hack, server, whatever you call it) cannot have the same
concept of lines and words as the application running on top of it.
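
A toy illustration of that width dependence. It is a sketch under heavy assumptions, not real bidi: uppercase letters stand for right-to-left characters (the convention used in the UAX#9 examples), the paragraph is assumed left-to-right, "reordering" just reverses each run of uppercase words, and the function names are hypothetical:

```python
import re

# Toy stand-in for bidi reordering: a run of uppercase words is an RTL run
RTL_RUN = re.compile(r"[A-Z]+(?: [A-Z]+)*")


def reorder(line):
    """Reverse each RTL run character by character (LTR paragraph assumed)."""
    return RTL_RUN.sub(lambda m: m.group(0)[::-1], line)


def chunk(text, width):
    """Cut a string into rows of `width` cells, the way a terminal wraps a long line."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def terminal_view(logical, width):
    # A filter in the terminal only sees rows, so it reorders each row on its own
    return [reorder(row) for row in chunk(logical, width)]


def line_view(logical, width):
    # Reordering the complete logical line first, then cutting it into rows
    return chunk(reorder(logical), width)


print(terminal_view("abc DEF", 10) == line_view("abc DEF", 10))  # True
print(terminal_view("abc DEF", 5) == line_view("abc DEF", 5))  # False
```

At width 10 the two views agree, but at width 5 the RTL run straddles the row boundary and the terminal's per-row result no longer matches the line-level one, so what the user sees depends on a width the application never told the terminal about.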

> A bidi algorithm MUST be compatible with pure english text..., to let
> you add a bidi level to your system and still everything english work
> well, they didn't forget that Arabic monodirectional texts exist,
> because in comparison with english texts/applications, the amount of
> arabic ones is small enough to be negligible.

The amount of written Arabic text might be negligible, but the amount of
monodirectional software that has been (and will be) written is far from
negligible.

> > Okay, that's the whole point in sending the proposal on this list.
> > I am not a professional proposal-writer, and I need your input on
> > things that seem unclear in the proposal. Note that the proposal
> > is not to write something new from scratch, but bring a minor
> > change to the existing unicode algorithm, so it is assumed that
> > you read it with the unicode standard in mind.
>
> I have UAX#9 completely in my mind, even remember the rules' names and
> numbers..., would you please prepare a diff for the algorithm?
> Something like the "Retaining Format Codes" of the standard, both for
> the algorithm itself, and definitions/goals.

Okay, this involves some work that I can't do in the very short term (the next
few weeks), but I will nonetheless do it.

> > > on unicode's test data, and send us your output.
> >
> > the test data for 'complex texts' would be slightly different as
> > logical order would be different (for example, in l2r assumed
> > orientation, Arabic "RAC EHT" would be stored "EHT RAC" instead of
> > "THE CAR").
>
> :O, what about breaking lines? how do you break lines of a paragraph?

As in the Unicode spec. The only difference is that you don't reorder the
letters of the reversed words. This allows a monodirectional Arabic text to be
rendered correctly (in other words, that trick is what makes visual and logical
storage match in monodirectional texts, without any impact on the
bidirectional abilities or the complexity of the algorithm).
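
A minimal sketch of that difference, reusing the "THE CAR" example above. Assumptions, not taken from the proposal text: the whole phrase is a single right-to-left run, Latin letters stand in for Arabic, lines are broken greedily on stored text before any reordering (UAX#9 likewise breaks lines before reordering each line), and all function names are hypothetical:

```python
def unicode_display(stored):
    # UAX#9 reverses a right-to-left run character by character
    return stored[::-1]


def proposal_storage(logical):
    # Proposal: logical word order, but each RTL word's letters stored in visual order
    return " ".join(word[::-1] for word in logical.split(" "))


def proposal_display(stored):
    # Proposal: reorder the words of the RTL run, leave each word's letters untouched
    return " ".join(reversed(stored.split(" ")))


def wrap_words(text, width):
    """Greedy word wrap on stored text, done before any reordering."""
    lines, current = [], ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


stored = proposal_storage("THE CAR")
print(stored)  # EHT RAC
print(unicode_display("THE CAR") == proposal_display(stored))  # True: both show RAC EHT
```

Because every word keeps its length, wrapping "THE BIG CAR" and its proposal storage "EHT GIB RAC" at the same width gives the same line breaks, and both approaches display the lines as "GIB EHT" and "RAC".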

> how do you cut & paste? how do you insert a complex text, between
> another one?????? you will need lots of explicit bidirectional marks
> to *emulate* these...

No more and no fewer than with bidi.
An implementation will be ready soon anyway (I hope), and you will be able to
see for yourself.

> You cannot force them to change the memory representation to your
> mixture of logical and visual, does this representation have any
> benefits? you cannot even search text in it.

You can search it just as you search any English text. If you're using a
bidirectional-capable app, it'll be searchable the same way as in any existing
bidi app.
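
A minimal sketch of such a search, under stated assumptions: uppercase words again stand in for Arabic words, the storage follows the "EHT RAC" convention (logical word order, each RTL word's letters reversed), and queries are made of whole words. A query fragment that starts or ends in the middle of a word would need extra handling. The names are hypothetical:

```python
import re

RTL_WORD = re.compile(r"[A-Z]+")  # hypothetical stand-in: uppercase words play the role of Arabic words


def to_storage(text):
    # Logical word order, each RTL word's letters kept in visual (reversed) order
    return RTL_WORD.sub(lambda m: m.group(0)[::-1], text)


def find_in_storage(stored, query):
    # Transform the query the same way, then use an ordinary substring search
    return stored.find(to_storage(query))


stored = to_storage("the THE CAR is red")
print(stored)  # the EHT RAC is red
print(find_in_storage(stored, "THE CAR"))  # 4
```

The search itself is the plain `str.find` any monodirectional program already uses; only the query is transformed.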

> My recommendation: fill the ambiguities, find rationales for your
> suggestions, show how trivial needed functions (like search, paste,
> comparison, substring...) could be implemented efficiently with them,

Recommendations taken. I need time for that.

> ...., in few words: prove its usefulness.

I think the usefulness is obvious and has been stated quite a few times already,
and several descriptions have been given that should show it involves no
overhead or drawbacks. Still, I understand the need for a more formal proof;
it will be provided as soon as my time allows.

Anyway, I'm waiting for the reverse bidi spec, which might save me some work.

Salam,
Chahine

PS: excuse the typos and language errors; I can't reread myself right now. Later.
Follow-Ups:
- Re: Submitted papers
  - From: David Starner
References:
- Re: Submitted papers
  - From: Behdad Esfahbod